dups
====

Find duplicate files in the given directory trees, where "duplicate" is defined
as having the same MD5 hash digest.

It is roughly equivalent to the following one-liner:
```sh
find . -type f -exec md5sum '{}' \; | awk '{paths[$1, ++cnt[$1]] = $2} END {for (path in cnt) {n = cnt[path]; if (n > 1) {print(path, n); for (i=1; i<=n; i++) {print(" ", paths[path, i])} } } }'
```

which, when indented, looks like:
```sh
find . -type f -exec md5sum '{}' \; \
| awk '
    # For every input line: $1 is the digest, $2 is the path.
    {
        paths[$1, ++cnt[$1]] = $2
    }
    # At the end, report every digest seen more than once.
    # Note: "path" below iterates over the digest keys of cnt.
    END {
        for (path in cnt) {
            n = cnt[path]
            if (n > 1) {
                print(path, n)
                for (i=1; i<=n; i++) {
                    print(" ", paths[path, i])
                }
            }
        }
    }'
```

and works well enough, until you start getting weird file paths (containing
spaces, quotes, or newlines) that are more of a pain to quote correctly than
rewriting this thing in OCaml :)
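
To make the quoting problem concrete, here is a small sketch (not part of `dups`; the temporary directory and the file names are made up for illustration). It runs the same pipeline on two identical files whose names contain a space. Since awk splits fields on whitespace, `$2` holds only the first word of each path:

```sh
# Hypothetical demo: two identical files whose names contain a space.
demo_dir=$(mktemp -d)
printf 'same content\n' > "$demo_dir/a file.txt"
printf 'same content\n' > "$demo_dir/b file.txt"

# Same pipeline as above, run inside the demo directory.
report=$(
    cd "$demo_dir" &&
    find . -type f -exec md5sum '{}' \; \
    | awk '{paths[$1, ++cnt[$1]] = $2}
           END {for (d in cnt) {n = cnt[d]; if (n > 1) {print(d, n); for (i=1; i<=n; i++) {print(" ", paths[d, i])}}}}'
)
printf '%s\n' "$report"

# Clean up the demo directory.
rm -rf "$demo_dir"
```

The duplicate group is still detected, but the listed paths come out as `./a` and `./b` rather than the full file names.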

Example
-------
After building, run `dups` on the current directory tree:

```sh
$ make
Finished, 0 targets (0 cached) in 00:00:00.
Finished, 5 targets (0 cached) in 00:00:00.

$ ./dups .
df4235f3da793b798095047810153c6b 2
"./_build/dups.ml"
"./dups.ml"
d41d8cd98f00b204e9800998ecf8427e 2
"./_build/dups.mli"
"./dups.mli"
087809b180957ce812a39a5163554502 2
"./_build/dups.native"
"./dups"
Processed 102 files in 0.025761 seconds.
```
Note that the report line (`Processed 102 files in 0.025761 seconds.`) is
written to `stderr`, so that `stdout` can be safely processed by other tools.
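
As a rough illustration of that separation, the sketch below counts duplicate groups by piping `stdout` into `grep`. `dups_stub` is a hypothetical stand-in that only mimics the output shown above, so the snippet runs without a built binary; in real use you would substitute `./dups .`:

```sh
# Hypothetical stand-in for `./dups .`, mimicking the output shown above:
# the duplicate listing goes to stdout, the report line to stderr.
dups_stub() {
    printf '%s\n' 'df4235f3da793b798095047810153c6b 2' \
                  '"./_build/dups.ml"' \
                  '"./dups.ml"' \
                  'd41d8cd98f00b204e9800998ecf8427e 2' \
                  '"./_build/dups.mli"' \
                  '"./dups.mli"'
    echo 'Processed 102 files in 0.025761 seconds.' >&2
}

# Count duplicate groups: header lines start with a 32-hex-digit MD5 digest.
# The report line never reaches grep because it is written to stderr.
group_count=$(dups_stub 2>/dev/null | grep -c -E '^[0-9a-f]{32} ')
echo "duplicate groups: $group_count"
```

Dropping `2>/dev/null` would still give the same count, since the pipe carries only `stdout`; the redirect just keeps the report line off the terminal.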