README ¶
lsidups
...is a barebone tool for finding image duplicates (or just similar images) from your terminal.
How to use
Pipe a list of files to compare into stdin or just input (-i
) a directory you want to check. It then will output images grouped by similarity so you can process them as you please.
It mainly relies on images (MIT).
Phash from goimagehash (BSD-2) is used to catch cropped duplicates (with that images
tends to struggle) and to allow for variable similarity threshold.
lsidups itself is just a wrapper that tries to provide a way to compare a lot (10k+) images reasonably fast from cli, and then allow you to process found duplicates in some other more convenient tool, like e.g. nsxiv or with some custom script (see examples directory).
Image formats support
At the moment of writing, it supports only jpeg, png, gif and webp (but not some of the ones with ICC profile).
Video
If you want to find video duplicates instead - try lsvdups (it's not very good, though).
Install
NOTE: If upgrading - consider deleting cache:
rm $XDG_CACHE_HOME/lsidups/*
Arch way
lsidups-git
Go way
Make sure you have go and git installed, and $(go env GOPATH)/bin
is in your $PATH
.
go install github.com/MahouShoujoMivutilde/lsidups@latest
Options
Usage of lsidups:
-T int
number of processing threads (default number of logical cores)
-c use caching (works per file path, honors mtime)
-cache-path string
where cache file will be stored (default "$XDG_CACHE_HOME/lsidups/" with fallback
-> "$HOME/.cache/lsidups/" -> "$APPDATA/lsidups" -> current directory)
-ct
remove missing/changed (on drive) files from cache and exit
-d int
phash threshold distance (less = more precise match, but more false negatives) (default 8)
-e value
image extensions (with dots) to look for (default .jpg,.jpeg,.png,.gif,.webp)
-g do not merge groups if some of the items are the same (default will merge)
-i string
directory to search (recursively) for duplicates, when set to - can take list of images
to compare from stdin (default "-")
-j output duplicates as json instead of standard flat list
-v show time it took to complete key parts of the search
Examples
find and list duplicates in ~/Pictures
lsidups -i ~/Pictures > dups.txt
dups.txt
/home/username/Pictures/image1.jpg
/home/username/Pictures/dir/image1.jpg
/home/username/Pictures/wdwd720p.jpg
/home/username/Pictures/wdwd1080p.jpg
/home/username/Pictures/wdwd1440p.jpg
# ...
you could also export json
lsidups -j -i ~/Pictures > dups.json
dups.txt
[
[
"/home/username/Pictures/image1.jpg",
"/home/username/Pictures/dir/image1.jpg"
],
[
"/home/username/Pictures/wdwd720p.jpg",
"/home/username/Pictures/wdwd1080p.jpg",
"/home/username/Pictures/wdwd1440p.jpg"
]
]
// ...
you can then sort images in groups e.g. by file size with sortsize.py, see examples directory for more
sortsize.py < dups.json > dups.txt
or compare just selected (e.g. with fd) images
fd 'mashu' -e png --changed-within 2weeks ~/Pictures > yourlist.txt
lsidups < yourlist.txt > dups.txt
then process them in any image viewer that can read stdin (nsxiv, imv)
nsxiv -io < dups.txt
or
imv < dups.txt
Both of them allow you to map shell commands to keys, so the possibilities are endless. E.g. you could macgyver some dmenu/fzf based mover, use trash-cli for deletion, etc.
Or a more complex example - find images present in folderA
, but not in folderB
:
comm -23 \
<(fd -t f -e png -e jpeg -e jpg -e webp . ~/pics/folderA | sort) \
<(fd -t f -e png -e jpeg -e jpg -e webp . ~/pics/folderA ~/folderB | lsidups -c | sort)
Also it is worth noting that lsidups merges groups if some of their items are the same. I think it makes sense from the user perspective, but the resulting group might contain images that are not all actually similar with each other.
Let's say we have 3 images: 1.png, 2.png, 3.png.
Hashes of 1 and 2 are similar enough to be considered related, and 2 and 3 are also similar enough, but 1 and 3 are far apart enough to be considered different.
By default they will be grouped like [1.png 2.png 3.png].
If you want to get 2 groups: [1.png 2.png] and [2.png 3.png] - pass flag -g
.
Caching
If you're planning to run lsidups on the same directory multiple times - consider using cache to speed things up.
Note, that cache is stored in form of a hash table with pairs like *absoluteFilepath*: *imageProperties*, so you don't need to have different caches for different directories, because irrelevant images will be just filtered out, and new will be added to cache at the end of the run.
It is also smart enough to not use image from cache if it appears to has changed.
check for default cache file location on your system
lsidups -h
run with caching enabled
lsidups -c -i ~/Pictures > dups.txt
store cache file in the custom location (directories will be created for you if necessary)
lsidups -c -cache-path ~/where/to/store/cache.gob -i ~/Pictures > dups.txt
Cache from older versions might become invalid after upgrades.