dups
====

Find duplicate files in given directory trees, where "duplicate" is defined as
having the same MD5 hash digest.

It is roughly equivalent to the following one-liner:
```sh
find . -type f -exec md5sum '{}' \; | awk '{digest = $1; path = $2; paths[digest, ++count[digest]] = path} END {for (digest in count) {n = count[digest]; if (n > 1) {print(digest, n); for (i=1; i<=n; i++) {print " ", paths[digest, i]} } } }'
```

which, when indented, looks like:
```sh
find . -type f -exec md5sum '{}' \; \
| awk '
  {
    digest = $1
    path = $2
    paths[digest, ++count[digest]] = path
  }

  END {
    for (digest in count) {
      n = count[digest]
      if (n > 1) {
        print(digest, n)
        for (i=1; i<=n; i++) {
          print " ", paths[digest, i]
        }
      }
    }
  }'
```

and works well enough, until you start getting weird file paths whose quoting
is more of a pain to handle than re-writing this thing in OCaml :)
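One partial mitigation, short of going full OCaml, is to make the pipe null-delimited so paths containing spaces survive field splitting. A sketch, assuming GNU `find`, `xargs`, and `md5sum` (the scratch directory and file names are purely illustrative; paths containing newlines would still break the line-oriented parsing of `md5sum` output):

```sh
# Scratch directory with two identical files (one with spaces in its name)
# and one unique file -- hypothetical names, for illustration only.
tmp=$(mktemp -d)
printf 'same' > "$tmp/a.txt"
printf 'same' > "$tmp/copy of a.txt"
printf 'diff' > "$tmp/b.txt"

# -print0 / -0 pass NUL-terminated paths, so whitespace never splits them;
# awk then strips the digest prefix from $0 instead of trusting $2.
find "$tmp" -type f -print0 \
| xargs -0 md5sum \
| awk '
  {
    digest = $1
    sub(/^[^ ]+  /, "")             # drop "digest  ", keep the whole path
    paths[digest, ++count[digest]] = $0
  }

  END {
    for (d in count) {
      n = count[d]
      if (n > 1) {
        print(d, n)
        for (i = 1; i <= n; i++) {
          print " ", paths[d, i]
        }
      }
    }
  }'
```

This reports only the two `same` files as a group of 2, spaces and all; the unique `b.txt` is never printed.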

Example
-------
After building, run `dups` on the current directory tree:

```sh
$ make
Finished, 0 targets (0 cached) in 00:00:00.
Finished, 5 targets (0 cached) in 00:00:00.

$ ./dups .
df4235f3da793b798095047810153c6b 2
    "./_build/dups.ml"
    "./dups.ml"
d41d8cd98f00b204e9800998ecf8427e 2
    "./_build/dups.mli"
    "./dups.mli"
087809b180957ce812a39a5163554502 2
    "./_build/dups.native"
    "./dups"
Processed 102 files in 0.025761 seconds.
```
Note that the summary line (`Processed 102 files in 0.025761 seconds.`) is
written to `stderr`, so that `stdout` is safely processable by other tools.
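Because the two streams are separate, the usual redirections apply. A sketch with a hypothetical `mimic_dups` stand-in function (it only imitates how `./dups` writes to both streams, so the example runs without building anything):

```sh
# Stand-in for ./dups: report lines on stdout, summary line on stderr.
mimic_dups() {
    echo 'df4235f3da793b798095047810153c6b 2'
    echo 'Processed 102 files in 0.025761 seconds.' >&2
}

mimic_dups 2>/dev/null    # keep only the report
mimic_dups 2>timing.log   # pipe the report onward, capture the summary
```

With the real binary, `./dups . 2>/dev/null | sort` (for example) sorts the report without the summary line ever entering the pipe.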