dups
====

Find duplicate files in the given directory trees, where "duplicate" means
having the same MD5 hash digest.
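For instance, with GNU coreutils `md5sum` (assumed to be installed; the file names are invented for this sketch), two files with identical contents share a digest regardless of their names, while a file with different contents does not:

```sh
# Scratch directory with three small files.
dir=$(mktemp -d)
printf 'hello\n' > "$dir/a.txt"
printf 'hello\n' > "$dir/b.txt"   # same contents as a.txt
printf 'world\n' > "$dir/c.txt"   # different contents

# Prints "<digest>  <path>" per file. a.txt and b.txt get the same digest,
# so they count as duplicates; c.txt gets a different one.
md5sum "$dir/a.txt" "$dir/b.txt" "$dir/c.txt"
```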

`dups` is roughly equivalent to the following one-liner:
```sh
find . -type f -exec md5sum '{}' \; | awk '{paths[$1, ++cnt[$1]] = $2} END {for (digest in cnt) {n = cnt[digest]; if (n > 1) {print(digest, n); for (i=1; i<=n; i++) {print(" ", paths[digest, i])} } } }'
```

which, when indented, looks like:
```sh
find . -type f -exec md5sum '{}' \; \
    | awk '
        {
            paths[$1, ++cnt[$1]] = $2
        }
        END {
            for (digest in cnt) {
                n = cnt[digest]
                if (n > 1) {
                    print(digest, n)
                    for (i=1; i<=n; i++) {
                        print(" ", paths[digest, i])
                    }
                }
            }
        }'
```

and works well enough, until you start running into weird file paths that are
more of a pain to quote correctly than it is to rewrite this thing in OCaml :)
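A minimal sketch of where the one-liner breaks, assuming GNU coreutils `md5sum` and a POSIX shell (the file names are made up for illustration). Because awk splits each line on whitespace, `$2` holds only the first word of a path that contains spaces, and patching that by hand just moves the problem to the next class of odd names:

```sh
# Scratch directory with two identical files, one with a space in its name.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'same\n' > 'my file.txt'
printf 'same\n' > 'other.txt'

# awk splits on whitespace, so $2 is only the first word of the path:
# "./my file.txt" is reported as "./my".
find . -type f -exec md5sum '{}' \; | awk '{print $2}'

# Partial workaround: in GNU md5sum output the name starts at column 35
# (32 hex digits, a space, then a space or "*" for the read mode). This
# copes with spaces, but names containing a backslash or newline are
# escaped and the line gets a leading "\", so those still come out wrong.
find . -type f -exec md5sum '{}' \; | awk '{print substr($0, 35)}'
```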

Example
-------
After building, run `dups` on the current directory tree:

```sh
$ make
Finished, 0 targets (0 cached) in 00:00:00.
Finished, 5 targets (0 cached) in 00:00:00.

$ ./dups .
df4235f3da793b798095047810153c6b 2
    "./_build/dups.ml"
    "./dups.ml"
d41d8cd98f00b204e9800998ecf8427e 2
    "./_build/dups.mli"
    "./dups.mli"
087809b180957ce812a39a5163554502 2
    "./_build/dups.native"
    "./dups"
Processed 102 files in 0.025761 seconds.
```
Note that the report line (`Processed 102 files in 0.025761 seconds.`) is
written to `stderr`, so that `stdout` can be safely piped into other tools.
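Since only the report goes to `stderr`, it can be dropped with `2>/dev/null` while the listing is piped onward. The sketch below counts duplicate groups by matching the `<digest> <count>` header lines. To keep it self-contained, a hypothetical `fake_dups` function replays the sample output above (assuming the paths are indented) instead of running the real binary:

```sh
# Hypothetical stand-in for ./dups: replays the sample output above,
# with the listing on stdout and the report line on stderr.
fake_dups() {
    printf '%s\n' \
        'df4235f3da793b798095047810153c6b 2' \
        '    "./_build/dups.ml"' \
        '    "./dups.ml"' \
        'd41d8cd98f00b204e9800998ecf8427e 2' \
        '    "./_build/dups.mli"' \
        '    "./dups.mli"' \
        '087809b180957ce812a39a5163554502 2' \
        '    "./_build/dups.native"' \
        '    "./dups"'
    echo 'Processed 102 files in 0.025761 seconds.' >&2
}

# Discard the report, then count header lines (a 32-digit hex digest
# followed by a space); indented path lines never match.
fake_dups . 2>/dev/null | grep -c '^[0-9a-f]\{32\} '
```

Against a real build, replace `fake_dups` with `./dups`.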