undouble is a Python library to detect (near-)identical images. It works in multiple steps: pre-processing the images (grayscaling, normalizing, and scaling), computing the image hash, and grouping images by the similarity of their hashes. A threshold of 0 groups only images with an identical image hash. The results can be explored with the plotting functionality, and images can be moved with the move functionality. When moving images, the image with the largest resolution in each group is copied to the **undouble** subdirectory, and all other images in the group are moved there. ⭐️Star it if you like it⭐️
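
As a rough illustration of the move rule described above, the sketch below picks the highest-resolution image of one group as the file to copy and marks the rest to be moved. The `plan_move` helper and the `(name, width, height)` tuples are hypothetical and are not part of the undouble API. The sketch only plans the operation and touches no files.

```python
from typing import Dict, List, Tuple

# Hypothetical record for illustration: (file name, width in pixels, height in pixels).
ImageInfo = Tuple[str, int, int]


def plan_move(group: List[ImageInfo]) -> Dict[str, List[str]]:
    """Split one non-empty group of similar images into a file to copy and files to move."""
    # Rank by pixel count (width * height), largest first.
    ranked = sorted(group, key=lambda img: img[1] * img[2], reverse=True)
    return {
        "copy": [ranked[0][0]],                       # largest resolution
        "move": [name for name, _, _ in ranked[1:]],  # everything else
    }


group = [
    ("a_small.png", 320, 240),
    ("a_large.png", 1280, 960),
    ("a_medium.png", 640, 480),
]
plan = plan_move(group)
print(plan)
```

Here `a_large.png` ends up under `copy` and the two smaller files under `move`.
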
The following steps are taken in the undouble library:
- Read all images from the directory recursively with the specified extensions.
- Compute image hash.
- Group similar images.
- Automatically organize the images in your folder if desired.

For a structured overview of how to detect duplicate images using image hash functions, read the blog.
The documentation pages describe in detail how undouble works, with many examples.
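
The hash and group steps listed above can be sketched in plain Python. This is a minimal illustration, not undouble's implementation: it uses an average hash as a simple stand-in for whichever hash function is configured, assumes the images are already grayscaled and scaled to the same small size, and the names `average_hash`, `hamming`, and `group_images` are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale pixel values (0-255), row by row


def average_hash(image: Image) -> List[int]:
    """One bit per pixel: 1 if the pixel is brighter than the image mean, else 0."""
    pixels = [value for row in image for value in row]
    mean = sum(pixels) / len(pixels)
    return [1 if value > mean else 0 for value in pixels]


def hamming(hash_a: List[int], hash_b: List[int]) -> int:
    """Number of bit positions in which two hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def group_images(hashes: List[List[int]], threshold: int = 0) -> List[List[int]]:
    """Greedy grouping: an image joins the first group whose first member
    is within `threshold` differing bits; otherwise it starts a new group."""
    groups: List[List[int]] = []
    for index, image_hash in enumerate(hashes):
        for members in groups:
            if hamming(hashes[members[0]], image_hash) <= threshold:
                members.append(index)
                break
        else:
            groups.append([index])
    # Only groups with more than one image contain (near-)duplicates.
    return [members for members in groups if len(members) > 1]


images = [
    [[10, 200], [220, 30]],   # 0: original
    [[12, 198], [225, 28]],   # 1: slightly different pixel values, same hash
    [[200, 10], [30, 220]],   # 2: different image
    [[10, 200], [60, 30]],    # 3: near-duplicate of 0, one bit differs
]
hashes = [average_hash(image) for image in images]
exact = group_images(hashes, threshold=0)
near = group_images(hashes, threshold=1)
print(exact, near)
```

With `threshold=0` only images 0 and 1 are grouped, because their hashes are identical. Raising the threshold to 1 also pulls in image 3, whose hash differs in a single bit.
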

    conda create -n env_undouble python=3.8
    conda activate env_undouble

    pip install undouble      # new install
    pip install -U undouble   # update to latest version

    pip install git+https://github.com/erdogant/undouble

    from undouble import Undouble

    # -------------------------------------------------
    # >You are at the point of physically moving files.
    # -------------------------------------------------
    # >[7] similar images are detected over [3] groups.
    # >[4] images will be moved to the [undouble] subdirectory.
    # >[3] images will be copied to the [undouble] subdirectory.
    # >[C]ontinue moving all files.
    # >[W]ait in each directory.
    # >[Q]uit
    # >Answer: w

The input can be the following three types:
* Path to directory
* List of file locations
* Numpy array containing images
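
Of the three input types above, a directory path is the one that requires file discovery first. The sketch below shows one way to read files recursively with a set of extensions, and to filter a list of file locations the same way. The `collect_images` helper and its default extensions are assumptions made for illustration and are not undouble's own code. A NumPy array is not handled here, since it already contains pixel data.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Union


def collect_images(
    source: Union[str, Path, Iterable[str]],
    extensions: Iterable[str] = (".png", ".jpg", ".jpeg"),  # assumed defaults
) -> List[Path]:
    """Return image paths from a directory (searched recursively) or a list of files."""
    allowed = {ext.lower() for ext in extensions}
    if isinstance(source, (str, Path)):
        # Directory input: walk all subdirectories and keep regular files only.
        candidates = (p for p in Path(source).rglob("*") if p.is_file())
    else:
        # List input: take the given locations as they are.
        candidates = (Path(item) for item in source)
    # Compare extensions case-insensitively, so "b.JPG" matches ".jpg".
    return sorted(p for p in candidates if p.suffix.lower() in allowed)


# Small demonstration on a temporary directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for name in ("a.png", "notes.txt", "sub/b.JPG"):
        (root / name).touch()
    found = [p.relative_to(root).as_posix() for p in collect_images(root)]

from_list = collect_images(["x.png", "y.txt"])
print(found, from_list)
```
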
Please cite undouble in your publications if it is useful for your research (see citation).
- Erdogan Taskesen, github: erdogant
- All kinds of contributions are welcome!
- If you wish to buy me a coffee for this work, it is much appreciated :)
See LICENSE for details.
