… and also all files in your home. :/

As fdupes annoyingly doesn’t have an --exclude switch, I use the following snippet to scan my home directory for duplicate files.

(I use duff instead of fdupes, simply because the duff man page has a better example to start from.)

Place a file called ‘.dupes.exclude’ in your home directory and list the directories to exclude in it, one per line.

It is important to exclude the cwd as well, by putting a . in the exclude file.

Note that this also excludes all files that sit directly in the current working directory.

The problem is that you either exclude the cwd, or duff will happily ignore all your efforts to keep it out of certain directories: find passes . to duff, which then scans every directory below it because of the -r switch.
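To see the effect, it can help to run the find part on its own first. A minimal sketch, assuming a throwaway directory tree (the names demo, media and docs are made up for illustration):

```shell
# Throwaway tree; all names here are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/media" "$demo/docs"
cd "$demo" || exit 1

# Without excluding ".", find lists "." itself, and a recursive
# scanner fed with that list would walk everything below it.
without_dot=$(find . -type d ! -name media | sort)

# With "." excluded as well, only the remaining directories are listed.
with_dot=$(find . -type d ! -name . ! -name media | sort)

printf 'without:\n%s\nwith:\n%s\n' "$without_dot" "$with_dot"
```

The first listing contains . next to ./docs, the second only ./docs.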

Basic example:

    .
    VirtualBox VMs
    media

Then you can use

    find . -type d $(printf "! -name %s " $(cat ~/.dupes.exclude)) -print0 | duff -0rz | xargs -0 -n1 echo

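to print a list of duplicate files.

Two caveats: the unquoted $(cat ~/.dupes.exclude) is split at whitespace, so an entry such as ‘VirtualBox VMs’ is treated as two separate names, ‘VirtualBox’ and ‘VMs’. Also, ! -name only leaves out directories whose own name matches; find still descends into them, so their subdirectories are still handed to duff.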
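The one-liner splits the exclude list at whitespace, so a name such as ‘VirtualBox VMs’ does not survive intact. One possible way around that is to build the find arguments one name at a time and let find hand over files instead of directories. The following is only a sketch and not the snippet from above: the function name list_scan_files is made up for illustration, it assumes a find that supports -print0 (GNU and BSD find do), and it uses -prune, so listed directories are skipped together with everything below them. Because a . entry would then prune the whole tree, . lines in the exclude file are ignored here.

```shell
# list_scan_files EXCLUDE_FILE [START_DIR]   (hypothetical helper)
# Prints NUL-terminated regular files under START_DIR, skipping every
# directory whose name is listed (one per line) in EXCLUDE_FILE.
list_scan_files() {
    exclude_file=$1
    start_dir=${2:-.}

    # Collect the find expression in "$@", one intact argument per
    # name, so that a line containing spaces is not split.
    set --
    while IFS= read -r name || [ -n "$name" ]; do
        # Skip blank lines and ".", which would prune the start dir.
        case $name in ''|.) continue ;; esac
        if [ "$#" -gt 0 ]; then set -- "$@" -o; fi
        set -- "$@" -name "$name"
    done < "$exclude_file"

    if [ "$#" -eq 0 ]; then
        # Nothing to exclude: list every regular file.
        find "$start_dir" -type f -print0
    else
        # Prune matching directories, print all other regular files.
        find "$start_dir" -type d \( "$@" \) -prune -o -type f -print0
    fi
}
```

Assuming duff keeps reading NUL-terminated names from standard input with -0, as in the pipeline above, usage would look like `list_scan_files ~/.dupes.exclude | duff -0z | xargs -0 -n1 echo`. The -r switch is dropped because find already lists the individual files, and files sitting directly in the cwd are scanned again.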

Sample output:

    ~/findtest % ls -lgG
    total 24
    drwxrwxr-x 2 4096 Feb 25 23:14 1
    drwxrwxr-x 2 4096 Feb 25 23:14 2
    -rw-rw-r-- 1    6 Feb 26 11:25 exclude
    -rw-rw-r-- 1    5 Feb 25 23:14 testfile1
    -rw-rw-r-- 1    5 Feb 25 23:14 testfile2
    drwxrwxr-x 2 4096 Feb 26 09:45 zxc

    ~/findtest % find -type d $(printf "! -name %s " $(cat exclude)) -print0 | duff -0rz | xargs -0 -n1 echo
    4 files in cluster 1 (5 bytes, digest 4e1243bd22c66e76c2ba9eddc1f91394e57f9f83)
    ./1/testfile2
    ./1/testfile1
    ./2/testfile2
    ./2/testfile1