X-Git-Url: https://git.xandkar.net/?p=dups.git;a=blobdiff_plain;f=README.md;fp=README.md;h=a6e6c54804ace51f30c790c11ec56ad992fbbdb5;hp=0000000000000000000000000000000000000000;hb=b839d582481df4861b7bdf123f404dcf13ee5bbd;hpb=cfcdf90aaf8b18958ae0d0a1b1cb9793ca9f7909

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..a6e6c54
--- /dev/null
+++ b/README.md
@@ -0,0 +1,57 @@
+dups
+====
+
+Find duplicate files in given directory trees. Where "duplicate" is defined as
+having the same MD5 hash digest.
+
+It is roughly equivalent to the following one-liner:
+```sh
+find . -type f -exec md5sum '{}' \; | awk '{paths[$1, ++cnt[$1]] = $2} END {for (path in cnt) {n = cnt[path]; if (n > 1) {print(path, n); for (i=1; i<=n; i++) {print(" ", paths[path, i])} } } }'
+```
+
+which, when indented, looks like:
+```sh
+find . -type f -exec md5sum '{}' \; \
+| awk '
+    {
+        paths[$1, ++cnt[$1]] = $2
+    }
+    END {
+        for (path in cnt) {
+            n = cnt[path]
+            if (n > 1) {
+                print(path, n)
+                for (i=1; i<=n; i++) {
+                    print(" ", paths[path, i])
+                }
+            }
+        }
+    }'
+```
+
+and works well-enough, until you start getting weird file paths that are more
+of a pain to handle quoting for than re-writing this thing in OCaml :)
+
+Example
+-------
+After building, run `dups` on the current directory tree:
+
+```sh
+$ make
+Finished, 0 targets (0 cached) in 00:00:00.
+Finished, 5 targets (0 cached) in 00:00:00.
+
+$ ./dups .
+df4235f3da793b798095047810153c6b 2
+    "./_build/dups.ml"
+    "./dups.ml"
+d41d8cd98f00b204e9800998ecf8427e 2
+    "./_build/dups.mli"
+    "./dups.mli"
+087809b180957ce812a39a5163554502 2
+    "./_build/dups.native"
+    "./dups"
+Processed 102 files in 0.025761 seconds.
+```
+Note that the report line (`Processed 102 files in 0.025761 seconds.`) is
+written to `stderr`, so that `stdout` is safely processable by other tools.
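
The `stdout`/`stderr` split described in the README can be illustrated with a small shell sketch. It is not part of the patch and is not how `dups` itself is implemented (that is OCaml); it only extends the README's `find | md5sum | awk` pipeline. The function name `dups_sketch` is hypothetical, and the sketch assumes GNU `md5sum`, whose output lines have the form `<32-hex-digit digest><2 separator characters><path>`. Unlike the one-liner, it takes the path from a fixed column, so paths containing spaces survive. It does not handle paths containing backslashes or newlines, which `md5sum` escapes.

```sh
#!/bin/sh
# Hypothetical helper (an illustration, not part of dups): group the files
# under directory "$1" by MD5 digest. Duplicate groups go to stdout; the
# summary line goes to stderr, mirroring the behaviour the README describes.
dups_sketch() {
    find "$1" -type f -exec md5sum '{}' + \
    | awk '
        {
            digest = $1
            # Assumes GNU md5sum output: a 32-character digest followed by
            # two separator characters, so the path starts at column 35.
            # Taking the rest of the line keeps spaces in the path intact.
            path = substr($0, 35)
            paths[digest, ++cnt[digest]] = path
            total++
        }
        END {
            for (digest in cnt) {
                n = cnt[digest]
                if (n > 1) {
                    print digest, n
                    for (i = 1; i <= n; i++) {
                        printf "    \"%s\"\n", paths[digest, i]
                    }
                }
            }
            # Send the report line to stderr through a portable pipe.
            printf "Processed %d files.\n", total | "cat 1>&2"
        }'
}
```

Because the report line never reaches `stdout`, a consumer can count duplicate groups with something like `dups_sketch . 2>/dev/null | grep -c '^[0-9a-f]'` without having to filter it out.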