X-Git-Url: https://git.xandkar.net/?p=dups.git;a=blobdiff_plain;f=README.md;h=6677b476c5a529dc3ece5dbf5de8c965b339290b;hp=04b84ecbbef4b1b95bef57749c4a0ae7a70172d4;hb=HEAD;hpb=dbb52e5c345aeafd3b7a2f142ca6bf2039616574

diff --git a/README.md b/README.md
index 04b84ec..6677b47 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,12 @@
 dups
 ====
 
-Find duplicate files in given directory trees. Where "duplicate" is defined as
-having the same (and non-0) file size and MD5 hash digest.
+Find duplicate files in N given directory trees. Where "duplicate" is defined
+as having the same (and non-0) file size and MD5 hash digest.
 
-It is roughly equivalent to the following one-liner:
+It is roughly equivalent to the following one-liner (included as `dups.sh`):
 ```sh
-find . -type f -print0 | xargs -0 -P 6 -I % md5sum % | awk '{digest = $1; sub("^" $1 " +", ""); path = $0; paths[digest, ++cnt[digest]] = path} END {for (digest in cnt) {n = cnt[digest]; if (n > 1) {print(digest, n); for (i=1; i<=n; i++) {printf " %s\n", paths[digest, i]} } } }'
+find . -type f -print0 | xargs -0 -P $(nproc) -I % md5sum % | awk '{digest = $1; sub("^" $1 " +", ""); path = $0; paths[digest, ++cnt[digest]] = path} END {for (digest in cnt) {n = cnt[digest]; if (n > 1) {print(digest, n); for (i=1; i<=n; i++) {printf " %s\n", paths[digest, i]} } } }'
 ```
 
 which, when indented, looks like:
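The README defines a duplicate as a file with the same non-zero size and the same MD5 digest, while the one-liner above keys on the digest alone and hashes every file it finds. The sketch below is a hypothetical illustration of that definition, not the project's `dups` implementation or its `dups.sh`. It is a shell function (the name `find_dups` is made up for this example) that records file sizes first, hashes only files whose size is shared with at least one other file, and groups the hashes by size and digest. It assumes GNU findutils (`find -printf`, `xargs -d` and `-r`) and GNU coreutils (`md5sum`, `nproc`), and it does not handle paths containing tabs, newlines, or backslashes.

```sh
#!/bin/sh
# find_dups DIR...: hypothetical sketch, NOT the project's dups.sh.
# Reports groups of files that share the same non-zero size AND MD5 digest.
# Assumes GNU find (-printf), GNU xargs (-d, -r), GNU md5sum and nproc.
# Limitation: paths containing tabs, newlines or backslashes are not handled.
find_dups() {
    list=$(mktemp) || return 1
    # 1. Record "size<TAB>path" for every non-empty regular file.
    find "$@" -type f -size +0c -printf '%s\t%p\n' > "$list"
    # 2. First pass counts sizes; second pass emits only paths whose size
    #    occurs more than once, so files with a unique size are never read.
    awk -F '\t' '
        NR == FNR { n[$1]++; next }
        n[$1] > 1 { print $2 }
    ' "$list" "$list" |
    # 3. Hash the candidates in parallel, one md5sum per file (as the
    #    one-liner does with -I %), so each output line is written whole.
    xargs -d '\n' -r -P "$(nproc)" -n 1 md5sum |
    # 4. Map path -> size from the list, then group md5sum output by
    #    (size, digest) and print every group with more than one member.
    awk -F '\t' '
        NR == FNR { size[$2] = $1; next }
        {
            digest = substr($0, 1, 32)  # md5sum prints 32 hex chars,
            path = substr($0, 35)       # two separator chars, then the path
            key = size[path] " " digest
            paths[key, ++cnt[key]] = path
        }
        END {
            for (key in cnt) if (cnt[key] > 1) {
                print key, cnt[key]
                for (i = 1; i <= cnt[key]; i++) print "    " paths[key, i]
            }
        }
    ' "$list" -
    rm -f "$list"
}
```

For example, `find_dups ~/photos /mnt/backup` would print one `size digest count` header per group, followed by that group's paths indented by four spaces; group order and the order of paths within a group are unspecified. Comparing sizes before hashing means files with a unique size are never read, which is typically much cheaper than hashing everything when most files differ.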