Find duplicate files in given directory trees, where "duplicate" is defined as
having the same MD5 hash digest.
It is roughly equivalent to the following one-liner:

```sh
find . -type f -exec md5sum '{}' \; | awk '{paths[$1, ++cnt[$1]] = $2} END {for (path in cnt) {n = cnt[path]; if (n > 1) {print(path, n); for (i=1; i<=n; i++) {print(" ", paths[path, i])} } } }'
```
which, when indented, looks like:

```sh
find . -type f -exec md5sum '{}' \; \
| awk '
    {
        paths[$1, ++cnt[$1]] = $2
    }

    END {
        for (path in cnt) {
            n = cnt[path]
            if (n > 1) {
                print(path, n)
                for (i=1; i<=n; i++) {
                    print(" ", paths[path, i])
                }
            }
        }
    }'
```
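As a sanity check, the pipeline can be exercised on a throwaway directory with a known duplicate pair (the directory layout, file names, and contents below are invented for the illustration):

```sh
# Create a throwaway tree containing one duplicated file content.
tmp=$(mktemp -d)
mkdir -p "$tmp/dir-a" "$tmp/dir-b"
echo "same content"   > "$tmp/dir-a/file-1"
echo "same content"   > "$tmp/dir-b/file-1"
echo "unique content" > "$tmp/dir-a/file-2"

# Run the same pipeline as above against the test tree.
out=$(cd "$tmp" && find . -type f -exec md5sum '{}' \; \
  | awk '{paths[$1, ++cnt[$1]] = $2}
         END {for (path in cnt) {
                n = cnt[path]
                if (n > 1) {
                  print(path, n)
                  for (i = 1; i <= n; i++) print(" ", paths[path, i])
                }}}')
printf '%s\n' "$out"
rm -rf "$tmp"
```

The duplicated content is reported as one digest line with a count of 2, followed by the two paths; the unique file is not reported.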
and works well enough, until you start getting weird file paths that are more
of a pain to handle quoting for than re-writing this thing in OCaml :)
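The quoting trouble is easy to reproduce (the filename below is invented for the demonstration): `md5sum` separates the digest from the path with whitespace, so awk's `$2` captures only the first word of a path that contains spaces.

```sh
tmp=$(mktemp -d)
echo hello > "$tmp/name with spaces.txt"

# awk splits fields on whitespace, so $2 is only the first
# word of the path printed by md5sum.
got=$(md5sum "$tmp/name with spaces.txt" | awk '{print $2}')
printf '%s\n' "$got"   # truncated at the first space in the filename

rm -rf "$tmp"
```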
After building, run `dups` on the current directory tree:
```
Finished, 0 targets (0 cached) in 00:00:00.
Finished, 5 targets (0 cached) in 00:00:00.

df4235f3da793b798095047810153c6b 2
d41d8cd98f00b204e9800998ecf8427e 2
087809b180957ce812a39a5163554502 2
    "./_build/dups.native"

Processed 102 files in 0.025761 seconds.
```
Note that the report line (`Processed 102 files in 0.025761 seconds.`) is
written to `stderr`, so that `stdout` is safely processable by other tools.
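That stream split is what makes downstream pipelines safe. A minimal sketch of the convention (the `fake_dups` function below is a hypothetical stand-in mimicking the tool's output, not part of the tool itself):

```sh
# Hypothetical stand-in: digest data on stdout, report on stderr,
# mimicking the tool's output convention.
fake_dups() {
  printf '%s\n' 'd41d8cd98f00b204e9800998ecf8427e 2'
  printf 'Processed 2 files in 0.01 seconds.\n' >&2
}

# Downstream tools see only the data; the report can be silenced:
data=$(fake_dups 2>/dev/null)
printf '%s\n' "$data"
```

Because command substitution captures only `stdout`, the report line never leaks into `$data`.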