Find duplicate files in given directory trees, where "duplicate" is defined as
having the same (non-zero) file size and MD5 hash digest.
It is roughly equivalent to the following one-liner:
    find . -type f -exec md5sum '{}' \; | awk '{digest = $1; path = $2; paths[digest, ++count[digest]] = path} END {for (digest in count) {n = count[digest]; if (n > 1) {print(digest, n); for (i=1; i<=n; i++) {print " ", paths[digest, i]} } } }'
which, when indented, looks like:
    find . -type f -exec md5sum '{}' \; \
    | awk '
        {
            digest = $1
            path = $2
            paths[digest, ++count[digest]] = path
        }

        END {
            for (digest in count) {
                n = count[digest]
                if (n > 1) {
                    print(digest, n)
                    for (i=1; i<=n; i++) {
                        print " ", paths[digest, i]
                    }
                }
            }
        }
    '
and works well enough, until you start getting weird file paths that are more
of a pain to handle quoting for than rewriting this thing in OCaml :)
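For a flavor of that rewrite, here is a minimal, hypothetical OCaml sketch of
the same idea, assuming the two-pass shape suggested by the report counters
below: group paths by file size first, then MD5-hash only the files whose size
collides with another's. All names here (`walk`, `push`, `file_size`) are
illustrative; this is not the actual `dups` source:

    (* dedup_sketch.ml -- stdlib only; run with: ocaml dedup_sketch.ml *)

    (* Recursively collect the file paths under [root]. A real version
       would also guard against unreadable files, symlink loops, etc. *)
    let rec walk root =
      if Sys.is_directory root then
        Sys.readdir root
        |> Array.to_list
        |> List.concat_map (fun name -> walk (Filename.concat root name))
      else [root]

    (* Prepend [v] onto the list stored under [key]. *)
    let push tbl key v =
      let vs = Option.value (Hashtbl.find_opt tbl key) ~default:[] in
      Hashtbl.replace tbl key (v :: vs)

    let file_size path =
      let ic = open_in_bin path in
      let size = in_channel_length ic in
      close_in ic;
      size

    let () =
      (* Pass 1: group paths by size, dropping 0-sized files. *)
      let by_size = Hashtbl.create 64 in
      List.iter
        (fun path ->
           let size = file_size path in
           if size > 0 then push by_size size path)
        (walk ".");
      (* Pass 2: a file with a unique size cannot be a duplicate, so
         hash only the size collisions, regrouping by MD5 digest. *)
      let by_digest = Hashtbl.create 64 in
      Hashtbl.iter
        (fun _size paths ->
           if List.length paths > 1 then
             List.iter
               (fun p -> push by_digest (Digest.to_hex (Digest.file p)) p)
               paths)
        by_size;
      (* Report every digest shared by more than one path. *)
      Hashtbl.iter
        (fun digest paths ->
           let n = List.length paths in
           if n > 1 then begin
             Printf.printf "%s %d\n" digest n;
             List.iter (Printf.printf "    %S\n") paths
           end)
        by_digest

Note that quoting stops being a concern: paths stay ordinary OCaml strings end
to end, and `%S` escapes them on output.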
After building, run `dups` on the current directory tree:
    Finished, 0 targets (0 cached) in 00:00:00.
    Finished, 5 targets (0 cached) in 00:00:00.

    $ ./dups .
    e40e3c4330857e2762d043427b499301 2
        "./_build/dups.native"
    3d1c679e5621b8150f54d21f3ef6dcad 2
    Time                       : 0.031084 seconds
    Skipped due to 0 size      : 2
    Skipped due to unique size : 74
    Ignored due to regex match : 0
Note that the report lines are written to `stderr`, so that `stdout` is safely
processable by other tools:
    $ ./dups . 2> /dev/null
    e40e3c4330857e2762d043427b499301 2
        "./_build/dups.native"
    3d1c679e5621b8150f54d21f3ef6dcad 2
    $ ./dups . 1> /dev/null
    Time                       : 0.070765 seconds
    Skipped due to 0 size      : 2
    Skipped due to unique size : 74
    Ignored due to regex match : 0
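Since `stdout` carries only the digest/path report, it pipes straight into
other tools. For example (an illustrative pipeline, relying on the fact that
digest lines start with a hex digit while path lines are indented), counting
the duplicate groups from the run above:

    $ ./dups . 2> /dev/null | grep -c '^[0-9a-f]'
    2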