- Summary
- What's New
- Features
- Usage
- Usage Examples
- Configuration Options
- Advanced Features
- Performance
- Installation
- Building
- Supported Formats
This script scans a source directory (recursively) for files (all types by default, or specified extensions), extracts their creation date (preferably from EXIF metadata, or falls back to the file system date), and copies or moves them into subfolders in a destination directory, organized by date (YYYY_MM_DD).
Key features: Comprehensive SHA-256 duplicate detection, intelligent conflict resolution, and flexible duplicate handling modes.
- 📊 Enhanced Log Headers: Professional session headers in log files with clear version information
- 🎯 Improved Readability: Structured log formatting with visual separators between sessions
- 📝 Better Tracking: Each log session now clearly shows program version and timestamps
- 🧠 Smart Master Selection: Automatically determines the best "master" file among duplicates based on:
- ✅ No duplicate keywords ("copy", "duplicate", "(1)", etc.) - highest priority
- ✅ Shortest filename - simpler names are typically originals
- ✅ Oldest creation/modification date - earlier files are likely originals
- 🔄 Automatic File Demotion: When incoming file is better master, existing files are automatically demoted
- 🛡️ Master Protection: Master files are protected from being overwritten by inferior duplicates
- 🔍 Comprehensive Logging: Detailed logs of master selection criteria, conflict reasons, and demotion actions
- ⚡ Intelligent Conflict Resolution: Handles complex scenarios with multiple duplicates
- 📌 Version Visibility: Version displays in help output and when run without arguments
- 🐛 Bug Fixes: Fixed version output duplication issues
- 🎯 Enhanced User Experience: Improved version tracking throughout the interface
- 🎯 All File Types by Default: No need to specify extensions - processes all supported formats automatically
- 📖 Enhanced Help System: Comprehensive help text with detailed explanations and real-world examples
- 🔧 Better User Experience: Streamlined workflow for both beginners and power users
A common use case might be to move them from a mobile device into archive folders, or to reorganize archives.
It will prefer the EXIF date embedded in the file. If none is present, it will skip the file unless the flag -x no (do not skip files without an EXIF date) is passed, in which case it will use the file system creation date. By default, it performs comprehensive duplicate detection using SHA-256 hashing to prevent storing identical files. All operations are logged to a text file saved in the target directory.
Note: this is a major rewrite of the upstream project skorokithakis/photocopy; this code is no longer kept downstream of it.
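The date-extraction flow described above can be sketched as follows. This is a minimal illustration of the `-x` behavior, not the actual op.py code: `creation_date`, `_exif_date`, and `date_folder` are hypothetical helper names, and the hachoir calls are the library's documented parser/metadata API.

```python
import datetime
from pathlib import Path

def _exif_date(path: Path):
    """Try to read a creation date from embedded metadata via hachoir.

    Returns None when hachoir is missing, the file can't be parsed,
    or no creation date is present.
    """
    try:
        from hachoir.parser import createParser
        from hachoir.metadata import extractMetadata
    except ImportError:
        return None  # hachoir not installed; treat as "no EXIF"
    parser = createParser(str(path))
    if parser is None:
        return None
    with parser:
        meta = extractMetadata(parser)
    if meta is not None and meta.has("creation_date"):
        return meta.get("creation_date")
    return None

def creation_date(path: Path, exif_mode: str = "yes"):
    """Date used for sorting, or None to skip (mirrors the -x flag):
    'yes' skips files without EXIF, 'no' falls back to the filesystem
    date, 'fs' processes only files *without* EXIF.
    """
    exif = _exif_date(path)
    if exif is not None:
        return None if exif_mode == "fs" else exif
    if exif_mode == "yes":
        return None  # skip: no EXIF date and fallback disabled
    return datetime.datetime.fromtimestamp(path.stat().st_mtime)

def date_folder(d: datetime.datetime) -> str:
    """Destination subfolder name, e.g. 2023_05_01."""
    return f"{d:%Y_%m_%d}"
```

Files whose date resolves to None are skipped; the rest land in a `YYYY_MM_DD` subfolder of the destination.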
- 🌟 Processes ALL file types by default - no extension specification needed (supports 33+ formats via hachoir)
- 📁 Smart file filtering - optionally specify extensions when targeted processing is needed
- Recursively processes all subfolders in the source directory.
- Uses EXIF metadata for creation date if available; otherwise, uses the file system's modification date.
- Can skip, only process, or fallback to file system date for files without EXIF metadata (configurable).
- Optionally moves files instead of copying.
- 🔍 Dry run mode with detailed preview of all operations before execution.
- Progress reporting and comprehensive logging to events.log in destination directory.
- 📖 Comprehensive help system with detailed explanations and real-world examples.
- Robust error handling for file operations, directory creation, and metadata extraction.
- 🎛️ User-friendly command-line interface with extensive help and examples.
- Uses pathlib for modern, robust path handling.
- 🧠 Intelligent Master File Selection (NEW in v2.0.0):
- Automatic master identification: Determines best file to keep based on multiple criteria
- Priority ranking: No duplicate keywords > Shortest filename > Oldest date
- Keyword detection: Recognizes "copy", "duplicate", "(1)", "_copy", and international variations
- Smart demotion: Automatically moves inferior files when better master arrives
- Master protection: Prevents accidental overwriting of master files
- Comprehensive logging: Details master selection decisions and criteria
- Comprehensive SHA256 checking: By default, checks each incoming file against ALL existing files in target directory (not just filename conflicts)
- Content-based detection: Uses SHA-256 hashing to detect truly identical files regardless of filename or location
- Hash caching: Builds and maintains an in-memory hash database of target files for efficient duplicate detection
- Multiple duplicate handling modes:
  - `skip` (default) - Skip if filename exists or identical content found anywhere
  - `overwrite` - Master-aware: protects master files, renames inferior duplicates
  - `rename` - Add numeric suffix to duplicates (e.g., `photo_001.jpg`)
  - `content` - Compare file hashes; skip identical content, rename different content
  - `interactive` - Prompt user for each duplicate with full context
  - `redirect` - Move duplicates to separate directory with intelligent renaming
- Intelligent duplicate redirection: Configurable directory and keyword for duplicate organization
- Performance control: Use the `-N` flag to disable comprehensive checking for large target directories
- Smart conflict resolution: Automatically generates unique filenames when needed
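The six modes above can be summarized as a small decision function. This is an illustrative sketch of the behaviors listed, using assumed names, not op.py's internals:

```python
def handle_duplicate(mode: str, identical_content: bool) -> str:
    """Return the action taken for a conflicting incoming file.

    Sketch only: the real script also applies master selection and
    comprehensive hash checks around this decision.
    """
    if mode == "skip":
        return "skip"                      # safest: never place a duplicate
    if mode == "content":
        # identical bytes -> skip; same name, different bytes -> rename
        return "skip" if identical_content else "rename"
    if mode in ("overwrite", "rename", "redirect"):
        return mode                        # unconditional strategy
    if mode == "interactive":
        return "prompt-user"               # defer to the user
    raise ValueError(f"unknown duplicate-handling mode: {mode}")
```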
The help output below comes from the packaged .exe; the script itself is the same code.
C:\Users\user\Github\orgphoto\output>op.exe -h
usage: op.py [-h] [-m | -c] [-j EXT] [-v] [-x {yes,no,fs}] [-d]
[-D DUPLICATE_HANDLING] [-N] [-R DIR] [-K WORD] [--examples]
[--version]
SOURCE_DIR DEST_DIR
Organize files by date with comprehensive duplicate detection
positional arguments:
SOURCE_DIR Source directory containing images/videos to organize
DEST_DIR Destination directory where organized files will be placed
options:
-h, --help show this help message and exit
-m, --move Move files (cannot be used with --copy)
-c, --copy Copy files (cannot be used with --move)
-j, --extensions EXT Extension list - comma separated [default: jpeg,jpg]. Supports all extensions of hachoir
-v, --verbose Talk more
-x, --exifOnly {yes,no,fs}
'yes': skip files with no EXIF, 'no': process all files (fallback to filesystem date), 'fs': only process
files with no EXIF [default: yes]
-d, --dryrun Dry run mode: simulate actions, do not move/copy files
-D, --duplicate-handling DUPLICATE_HANDLING
How to handle duplicates: skip, overwrite, rename, content, interactive, redirect [default: skip]
-N, --no-comprehensive-check
Disable comprehensive SHA256 checking for better performance
-R, --redirect-dir DIR
Directory for redirected duplicates [default: Duplicates]
-K, --duplicate-keyword WORD
Keyword for duplicate filenames [default: duplicate]
--examples Show usage examples and exit
--version Show program version and exit
If neither --move nor --copy is specified, the script will prompt to run in dryrun mode simulating moving files.
Note: Version information displays when running without arguments. Use --version to see the version number.

- Process ALL file types (new default behavior - no extension filtering required):
  `python op.py -c Z:\photosync target/`
- Move JPG files only (specify extensions when filtering needed):
  `python op.py -m -j jpg Z:\photosync target/`
- Copy various file types, using file system date if EXIF is missing:
  `python op.py -c -x no -j gif,png,jpg,mov,mp4 Z:\photosync target/`
- Dry run: Simulate moving files without making changes:
  `python op.py -m -d -j jpg Z:\photosync target/`
- Process only files without EXIF data (using file system date):
  `python op.py -c -x fs -j jpg Z:\photosync target/`
- Move PNG and JPEG files with verbose logging:
  `python op.py -m -v -j png,jpeg Z:\photosync target/`
- Content-based duplicate detection (skip identical, rename different):
  `python op.py -c -D content -j jpg Z:\photosync target/`
  Compares SHA-256 hashes to detect truly identical files regardless of filename.
- Content-based with custom keyword for different files:
  `python op.py -c -D content -K version -j jpg Z:\photosync target/`
  Different content with the same filename becomes: photo_version.jpg
- Interactive duplicate handling (ask user for each conflict):
  `python op.py -m -D interactive -j jpg Z:\photosync target/`
  Prompts user with options: Skip, Overwrite, Rename, or Redirect.
- Interactive mode with verbose context:
  `python op.py -m -D interactive -v -j jpg,png,heic Z:\photosync target/`
  Provides detailed information about each duplicate for informed decisions.
- Always rename duplicates (never skip or overwrite):
  `python op.py -c -D rename -j jpg Z:\photosync target/`
  Generates: photo.jpg → photo_duplicate.jpg → photo_duplicate_001.jpg
- Rename with custom keyword:
  `python op.py -c -D rename -K copy -j jpg Z:\photosync target/`
  Generates: photo.jpg → photo_copy.jpg → photo_copy_001.jpg
- Redirect duplicates to separate directory:
  `python op.py -c -D redirect -j jpg Z:\photosync target/`
  Creates: target/Duplicates/YYYY_MM_DD/filename_duplicate.jpg
- Redirect with custom directory and keyword:
  `python op.py -c -D redirect -R MyDuplicates -K copy -j jpg Z:\photosync target/`
  Creates: target/MyDuplicates/YYYY_MM_DD/filename_copy.jpg
- Redirect to absolute path:
  `python op.py -c -D redirect -R /backup/duplicates -j jpg Z:\photosync target/`
  Creates: /backup/duplicates/YYYY_MM_DD/filename_duplicate.jpg
- Redirect with dry run to see what would happen:
  `python op.py -c -d -D redirect -R TestDupes -j jpg Z:\photosync target/`
  Shows redirect actions in the log without making changes.
- Overwrite all duplicates (replace existing files):
  `python op.py -m -D overwrite -j jpg Z:\photosync target/`
  Warning: this will replace existing files without backup.
- Overwrite with verbose logging for audit trail:
  `python op.py -c -v -D overwrite -j jpg,png Z:\photosync target/`
- Disable comprehensive checking for large target directories:
  `python op.py -c -N -j jpg Z:\photosync target/`
  Skips SHA-256 hashing of existing files for faster processing.
- Fast mode: disable comprehensive checking + rename duplicates:
  `python op.py -c -N -D rename -j jpg Z:\photosync target/`
  Fastest processing - only checks filename conflicts.
- Performance mode with redirect:
  `python op.py -c -N -D redirect -R FastDupes -j jpg Z:\photosync target/`
- Content-based detection with verbose logging:
  `python op.py -m -v -D content -j png,jpeg Z:\photosync target/`
- Process files without EXIF, redirect duplicates:
  `python op.py -c -x fs -D redirect -R DuplicatesNoExif -j jpg Z:\photosync target/`
- Multi-format processing with custom duplicate handling:
  `python op.py -c -x no -D content -K backup -j jpg,png,gif,heic,mov,mp4 Z:\photosync target/`
- Maximum safety mode (comprehensive + interactive):
  `python op.py -c -D interactive -v -j jpg,png,heic,mov Z:\photosync target/`
- Mobile device photo import with comprehensive deduplication:
  `python op.py -c -x no -D content -j jpg,png,heic,mov /sdcard/DCIM target/photos/`
- Archive consolidation with duplicate redirection:
  `python op.py -c -D redirect -R Archive/Duplicates -j jpg,png,gif,tiff old_archive/ consolidated_archive/`
- Large photo library processing (performance optimized):
  `python op.py -c -N -D rename -K alt -j jpg,png,heic source/ target/`
- Cautious migration with dry-run and interactive:
  `python op.py -d -D interactive -v -j jpg,png,heic,mov source/ target/`
- UV with comprehensive duplicate detection:
  `uv run op.py -c -D content -j jpg source/ target/`
- UV with redirect and custom settings:
  `uv run op.py -c -D redirect -R MyDupes -K copy -j jpg,heic source/ target/`
Short Flag Reference:
`-m` = move, `-c` = copy, `-d` = dry run, `-v` = verbose, `-j` = extensions, `-x` = EXIF handling, `-D` = duplicate handling, `-N` = disable comprehensive check, `-R` = redirect directory, `-K` = duplicate keyword
See `python op.py --help` or `python op.py --examples` for all options.
orgphoto provides six different modes for handling duplicate files, each optimized for different use cases:
Behavior: Skip files if filename exists OR identical content found anywhere in target
- Use case: Safest option, avoids any duplicate content
- Performance: Medium (requires comprehensive hash checking)
- Output: Logs "skipped - duplicate detected"
`python op.py -c -D skip -j jpg source/ target/`

Behavior: Skip if identical content exists, rename if same filename but different content
- Content identical: Skip processing (logged as "skipped - identical content")
- Filename conflict, different content: Rename with suffix
- Use case: Preserve all unique content while avoiding true duplicates
- Performance: Medium (requires comprehensive hash checking)
python op.py -c -D content -j jpg source/ target/
# photo.jpg with same content → skipped
# photo.jpg with different content → photo_duplicate.jpg

Behavior: Always rename duplicates, never skip or overwrite
- Filename exists: Add suffix `_duplicate` (or custom keyword)
- Multiple conflicts: Incremental numbering `_duplicate_001`, `_duplicate_002`
- Use case: Preserve all files, never lose anything
- Performance: Fast (can work without comprehensive checking)
python op.py -c -D rename -K backup -j jpg source/ target/
# photo.jpg → photo_backup.jpg → photo_backup_001.jpg

Behavior: Replace existing files without confirmation
- Filename exists: Overwrite the existing file
- Content duplicates: Still overwrite (if comprehensive checking enabled)
- Use case: Always use the newest version of files
- ⚠️ Warning: Data loss possible - existing files are replaced
- Performance: Fast
`python op.py -c -D overwrite -j jpg source/ target/`

Behavior: Prompt user for each duplicate with full context
- Shows: Filename conflicts, content duplicates, file sizes, dates
- Options: Skip, Overwrite, Rename, Redirect
- Use case: Maximum control, good for one-time migrations
- Performance: Depends on user interaction speed
`python op.py -c -D interactive -v -j jpg source/ target/`

Interactive prompt example:
Duplicate detected for: photo.jpg
Target location: /target/2023_05_01/photo.jpg
Content duplicates found at:
- /target/2023_05_01/photo.jpg
Filename conflict at: /target/2023_05_01/photo.jpg
Choose action:
s) Skip this file
o) Overwrite existing file(s)
r) Rename with suffix
R) Redirect to duplicates directory
Your choice [s/o/r/R]:
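The prompt's single-letter answers might map to actions along these lines (a hypothetical helper, not op.py's actual input handling; note that `r` and `R` select different actions):

```python
# Hypothetical mapping for the interactive prompt shown above.
ACTIONS = {"s": "skip", "o": "overwrite", "r": "rename", "R": "redirect"}

def resolve_choice(raw: str, default: str = "s") -> str:
    """Translate a raw prompt answer into an action, defaulting to skip."""
    choice = raw.strip()
    if not choice:
        choice = default
    # Case matters only for r/R; normalize everything else to lowercase.
    if choice not in ACTIONS:
        choice = choice.lower()
    return ACTIONS.get(choice, ACTIONS[default])
```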
Behavior: Move duplicates to separate directory structure
- Directory: Creates `Duplicates/` (or custom with `-R`)
- Structure: Maintains date organization in redirect location
- Naming: Uses intelligent suffix generation
- Use case: Keep organized but separate duplicate files
- Performance: Fast, minimal overhead
`python op.py -c -D redirect -R MyDupes -K copy -j jpg source/ target/`

The redirect mode provides sophisticated duplicate management by moving duplicate files to a separate directory structure while maintaining organization and applying intelligent renaming.
When --duplicate-handling redirect (or -D redirect) is used:
- Directory Creation: Creates a redirect directory (default: `Duplicates/` in target root)
- Duplicate Detection: Uses comprehensive SHA-256 checking or filename-based detection
- Intelligent Redirection: Moves duplicates to redirect directory with smart renaming
- Organized Structure: Maintains date-based organization within redirect directory
target/
├── 2023_01_01/ # Main organized files
│ ├── photo1.jpg
│ └── photo2.jpg
├── 2023_01_02/
│ └── photo3.jpg
└── Duplicates/ # Redirect directory
├── 2023_01_01/
│ ├── photo1_duplicate.jpg # Duplicate of main photo1.jpg
│ └── photo1_duplicate_001.jpg # Another copy of photo1.jpg
└── 2023_01_02/
└── photo3_copy.jpg # Custom keyword example
| Option | Short | Default | Description |
|---|---|---|---|
| `--redirect-dir` | `-R` | `Duplicates` | Directory name for redirected duplicates |
| `--duplicate-keyword` | `-K` | `duplicate` | Keyword inserted in duplicate filenames |
Basic redirect usage:
`python op.py -c -D redirect -j jpg source/ target/`
# Creates: target/Duplicates/YYYY_MM_DD/filename_duplicate.jpg

Custom redirect directory:
`python op.py -c -D redirect -R Archive/Duplicates -j jpg source/ target/`
# Creates: target/Archive/Duplicates/YYYY_MM_DD/filename_duplicate.jpg

Custom duplicate keyword:
`python op.py -c -D redirect -K copy -j jpg source/ target/`
# Creates: target/Duplicates/YYYY_MM_DD/filename_copy.jpg

Absolute path redirect:
`python op.py -c -D redirect -R /backup/duplicates -j jpg source/ target/`
# Creates: /backup/duplicates/YYYY_MM_DD/filename_duplicate.jpg

Redirect mode uses intelligent filename generation:
- Base duplicate name: `filename_duplicate.ext`
- If name exists: `filename_duplicate_001.ext`
- Multiple duplicates: `filename_duplicate_002.ext`, `filename_duplicate_003.ext`, etc.
- Custom keyword: `filename_copy.ext` (with `-K copy`)
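The naming scheme above amounts to probing candidate names until one is free. This is an illustrative sketch; op.py's actual helper may differ:

```python
from pathlib import Path

def unique_duplicate_name(dest_dir: Path, filename: str,
                          keyword: str = "duplicate") -> Path:
    """photo.jpg -> photo_duplicate.jpg -> photo_duplicate_001.jpg -> ..."""
    stem, suffix = Path(filename).stem, Path(filename).suffix
    candidate = dest_dir / f"{stem}_{keyword}{suffix}"
    counter = 1
    while candidate.exists():  # keep counting until a free name is found
        candidate = dest_dir / f"{stem}_{keyword}_{counter:03d}{suffix}"
        counter += 1
    return candidate
```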
- Works with comprehensive checking: Detects true content duplicates via SHA-256
- Works with filename conflicts: Handles traditional duplicate scenarios
- Interactive mode support: User can choose redirect option when prompted
- Dry-run compatible: Shows what would be redirected without making changes
- Logging integration: Clear indication of redirect actions in log files
orgphoto automatically determines which file should be the "master" (definitive version) when duplicates are detected. This intelligent system ensures you keep the best quality original files while properly handling copies and duplicates.
When duplicates are detected, orgphoto evaluates ALL conflicting files (both incoming and existing) using a three-tier priority system:
Files WITHOUT duplicate keywords are strongly preferred as masters:
Detected keywords:
- Word-based: `copy`, `duplicate`, `version`, `backup`, `alt`, `alternative`
- International: `copie` (French), `kopie` (German), `copia` (Spanish/Italian)
- Numbered patterns at end: `(1)`, `(2)`, `_copy_1`, `_duplicate_001`, `2`
Examples:
photo.jpg → NO keywords (score: 0) ✓ BEST
photo_copy.jpg → Has "copy" (score: 1)
vacation (1).jpg → Has "(1)" (score: 1)
sunset_duplicate.jpg → Has "duplicate" (score: 1)
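An approximation of this keyword check, for illustration only — op.py's exact matcher may differ, especially around trailing numbers:

```python
import re

# Word-based and international duplicate markers listed above.
_WORDS = {"copy", "duplicate", "version", "backup", "alt", "alternative",
          "copie", "kopie", "copia"}
# Trailing "(1)"-style or "_001"-style counters.
_NUMBERED = re.compile(r"(\(\d+\)|_\d{1,3})\s*$")

def has_duplicate_keyword(stem: str) -> bool:
    """True if a filename stem looks like a copy rather than an original."""
    tokens = re.split(r"[_\s]+", stem.lower())
    if any(tok in _WORDS for tok in tokens):
        return True
    return bool(_NUMBERED.search(stem))
```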
Among files with same keyword status, shorter names are preferred (originals are typically shorter):
Examples:
photo.jpg → Length 9 ✓ BEST
photo_edited.jpg → Length 16
photo_edited_final.jpg → Length 22
If names are equally simple, older files are preferred (first created is typically original):
Examples:
photo.jpg (2023-01-15 10:30) → Older ✓ BEST
photo.jpg (2023-01-15 11:45) → Newer
photo.jpg (2023-01-16 09:00) → Newest
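The three tiers combine naturally into a sort key, where the lowest tuple wins. This is a sketch under assumed data shapes (filename plus modification timestamp), not the actual implementation:

```python
def master_key(name: str, mtime: float) -> tuple:
    """Lower sorts first: (has duplicate keyword, name length, age)."""
    keywords = ("copy", "duplicate", "version", "backup", "(")
    has_kw = any(k in name.lower() for k in keywords)
    return (has_kw, len(name), mtime)

def pick_master(candidates):
    """candidates: iterable of (filename, modification_time) pairs."""
    return min(candidates, key=lambda c: master_key(*c))
```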
If the incoming file is determined to be the better master:
- Existing files are demoted - Moved according to duplicate handling mode
- Incoming file takes primary position - Placed in intended location
- Logged as promotion: `[PROMOTED TO MASTER]`
Log example:
MASTER SELECTION: Chose photo.jpg as master (incoming)
Criteria: has_dup_keywords=False, name_length=9, date=2023-01-15 10:30:00
Non-masters (1): ['photo_copy.jpg']
MASTER PROMOTION: Incoming file photo.jpg is the better master
DEMOTION: photo_copy.jpg will be moved to duplicate location
DEMOTED: photo_copy.jpg -> Duplicates/photo_copy_duplicate.jpg
If an existing file is the better master:
- Master file is protected - Cannot be overwritten
- Incoming file follows duplicate mode - Skipped, renamed, or redirected
- Logged as retention: `[SKIPPED - not master]` or `[RENAMED - master protected]`
Log example:
MASTER SELECTION: Chose photo.jpg as master (existing)
Criteria: has_dup_keywords=False, name_length=9, date=2023-01-15 10:30:00
Non-masters (1): ['photo copy.jpg']
MASTER RETAINED: Existing file photo.jpg remains as master
photo copy.jpg -> skipped - existing file is better master
All duplicate handling modes now respect master file selection:
| Mode | Master is Existing | Master is Incoming |
|---|---|---|
| `skip` | Skip incoming | Demote existing, place incoming |
| `overwrite` | Protect master, rename incoming | Demote existing, place incoming |
| `rename` | Rename incoming | Demote existing, place incoming |
| `content` | Check content, handle accordingly | Demote existing if different |
| `interactive` | Master indicated in prompt | Demote existing with user confirmation |
| `redirect` | Redirect incoming | Demote existing to redirect |
# Situation: Three archives with duplicates
archive1/vacation.jpg
archive2/vacation_copy.jpg
archive3/vacation (1).jpg
# Result: Master selection picks vacation.jpg (no keywords)
# Others demoted: vacation_copy_duplicate.jpg, vacation (1)_duplicate.jpg

# Phone creates: IMG_1234.jpg (original)
# Computer backup: IMG_1234 (1).jpg (duplicate)
# Result: Original IMG_1234.jpg recognized as master
# Backup version demoted automatically

# High quality original: photo.jpg (5MB, 2023-01-15)
# Lower quality copy: photo_compressed.jpg (1MB, 2023-01-16)
# Result: photo.jpg selected (shorter name, older)
# Compressed version handled as duplicate

Master selection is automatic and always enabled. No configuration is needed, but behavior adapts to the duplicate handling mode:

# Master selection with skip mode
python op.py -c -D skip -j jpg source/ target/
# Master selection with redirect (demoted files go to redirect dir)
python op.py -c -D redirect -R Duplicates -j jpg source/ target/
# Master selection with rename (demoted files renamed in place)
python op.py -c -D rename -K old -j jpg source/ target/

- Intelligent organization: Best files automatically prioritized
- Prevents data loss: Never lose original files to inferior duplicates
- Automatic cleanup: Inferior duplicates properly categorized
- Audit trail: Comprehensive logging of all decisions
- Time saving: No manual sorting of duplicates needed
orgphoto's comprehensive duplicate detection goes beyond simple filename checking by using SHA-256 content hashing:
- Scans ALL existing files in target directory at startup
- Calculates SHA-256 hash for each file
- Builds in-memory database: `{hash: [file_path1, file_path2, ...]}`
- Stores file modification times for cache invalidation
For each source file:
1. Calculate SHA-256 hash
2. Check hash cache for matching content
3. Check filename conflicts in destination
4. Apply duplicate handling strategy
5. Update hash cache if file is processed
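The cache build and lookup steps above amount to a few lines with `hashlib`. This is a simplified sketch under assumed function names; the real script also tracks modification times for cache invalidation:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large videos don't load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def build_hash_cache(target: Path) -> dict:
    """Map each SHA-256 hash to every target file sharing that content."""
    cache = {}
    for p in target.rglob("*"):
        if p.is_file():
            cache.setdefault(sha256_of(p), []).append(p)
    return cache

def is_content_duplicate(cache: dict, incoming: Path) -> bool:
    """One hash of the incoming file, then an O(1) dictionary lookup."""
    return sha256_of(incoming) in cache
```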
- Content duplicates: Same SHA-256 hash, any filename, anywhere in target
- Filename conflicts: Same filename in same date directory
- Both types: Can be detected simultaneously
# Example log output:
Building comprehensive hash cache of target directory...
Hash cache built: 15,432 files indexed, 14,891 unique hashes
# Shows: 541 files had duplicate content

- Hash storage: ~64 bytes per hash (SHA-256)
- Path storage: Variable, ~100-200 bytes per file path
- Cache metadata: File modification times, ~16 bytes per file
- Total estimate: ~200-300 bytes per target file
- Cache building: ~500-2000 files/second (depends on storage speed)
- Hash calculation: ~50-200 MB/second (depends on CPU)
- Duplicate checking: Near-instant lookup in memory cache
By default, orgphoto performs comprehensive SHA-256 checking of each incoming file against ALL existing files in the target directory.
Benefits:
- True duplicate detection: Finds identical files regardless of filename or location
- Space efficiency: Prevents storing duplicate content under different names
- Data integrity: Ensures you're not losing unique content
- Cross-directory detection: Finds duplicates anywhere in target tree
Performance Impact:
- Startup time: 10-60 seconds for 10,000 files (depends on storage speed)
- Memory usage: ~200-300 bytes per target file for hash cache
- Processing time: Each incoming file hashed once (~10-50 MB/second)
- Disk I/O: One-time read of all target files during cache build
| Target Files | Cache Build Time | Memory Usage | Processing Speed |
|---|---|---|---|
| 1,000 files | 2-5 seconds | ~300 KB | 200-500 files/min |
| 10,000 files | 20-60 seconds | ~3 MB | 150-300 files/min |
| 50,000 files | 2-5 minutes | ~15 MB | 100-200 files/min |
| 100,000 files | 5-15 minutes | ~30 MB | 50-150 files/min |
Benchmarks vary by storage type (SSD vs HDD), network latency, and CPU speed
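The memory column follows directly from the ~300-bytes-per-file estimate given earlier — simple arithmetic, rounded as in the table:

```python
# ~300 bytes per target file: hex hash + path string + mtime metadata
# (upper-bound estimate from the breakdown above).
BYTES_PER_FILE = 300

for n in (1_000, 10_000, 50_000, 100_000):
    print(f"{n:>7} files ≈ {n * BYTES_PER_FILE / 1_000_000:.1f} MB")
```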
Use the -N flag to disable comprehensive checking when:
Large Target Directories:
- >50,000 files: Cache building becomes time-consuming
- Network storage: Reading all files over network is slow
- Limited memory: Cache may use significant RAM
Performance Priority:
- Frequent runs: Cache rebuilt each time (not persistent between runs)
- Fast import needed: Only care about filename conflicts
- Batch processing: Processing speed more important than deduplication
Use Cases for -N:
# Fast processing, only filename-based duplicate detection
python op.py -c -N -D rename -j jpg source/ target/
# Large target directory, redirect filename conflicts only
python op.py -c -N -D redirect -j jpg source/ huge_target/
# Speed-optimized batch import
python op.py -c -N -D overwrite -j jpg,png,heic batch_source/ target/

# Maximum safety (comprehensive + interactive)
python op.py -c -D interactive -j jpg source/ target/
# Balanced (comprehensive + automatic handling)
python op.py -c -D content -j jpg source/ target/
# Speed optimized (filename-only + rename)
python op.py -c -N -D rename -j jpg source/ target/
# Maximum speed (filename-only + overwrite)
python op.py -c -N -D overwrite -j jpg source/ target/

- Separate by date: Use different target directories for different time periods
- Archive old files: Move older files to separate directories
- Clean duplicates: Periodically clean up redirect directories
- SSD storage: 5-10x faster cache building than HDD
- More RAM: Allows larger caches without performance impact
- Faster CPU: Improves hash calculation speed
- Network storage: Consider local staging for large operations
| Mode | Requires Comprehensive | Speed | Memory | Safety |
|---|---|---|---|---|
| `skip` | Yes (default) | Medium | High | Maximum |
| `content` | Yes (recommended) | Medium | High | Maximum |
| `interactive` | Yes (optional) | Slow* | High | Maximum |
| `rename` | No (optional) | Fast | Low | High |
| `redirect` | No (optional) | Fast | Low | High |
| `overwrite` | No (optional) | Fastest | Low | Minimal |
*Interactive mode speed depends on user response time
- Initial import: Use comprehensive checking for first-time setup
- Regular updates: Consider `-N` for frequent incremental updates
- Archive consolidation: Use `content` mode for merging archives
- Mobile import: Use default settings for safety
- Bulk processing: Use `-N -D rename` for speed
- Migration projects: Use `interactive` mode for control
The hash cache provides excellent performance for most use cases, typically processing 100-500 files per minute even with comprehensive checking enabled.
Using pip:
1. Clone the repo, or just download `op.py`
2. Run `pip install hachoir`
3. Execute the script with Python: `python op.py`

Using uv:
1. Clone the repo (you'll also want the supporting files)
2. Make sure uv is installed; it will handle dependencies
3. Execute the script with uv: `uv run op.py`
This project supports building Windows executables using PyInstaller with proper dependency resolution:
# Recommended approach - ensures proper dependency resolution
uv run pyinstaller --noconfirm --onefile --console --collect-all hachoir --exclude-module hachoir.wx.tree_view --icon "doc/favicon.ico" "op.py"
# Alternative using existing spec file (after running uv sync)
uv run pyinstaller op.spec

Why use `uv run pyinstaller`?
- Dependency Resolution: `uv run` ensures PyInstaller runs within the project's virtual environment where `hachoir` and other dependencies are properly installed
- Module Discovery: The `--collect-all hachoir` flag tells PyInstaller to include all hachoir submodules and data files
- Reliability: Avoids "ModuleNotFoundError" issues that occur when PyInstaller can't find project dependencies
- Consistency: Uses the same dependency versions as your development environment
# Using auto-py-to-exe with existing config (may have dependency issues)
auto-py-to-exe op/pyinstallerconfig.json
# Manual PyInstaller command (not recommended - missing dependencies)
pyinstaller --noconfirm --onefile --console --icon "doc/favicon.ico" "op.py"

Note: Legacy methods may fail with `ModuleNotFoundError: No module named 'hachoir'` because they don't properly resolve dependencies from the uv environment.
Here's an example of running the built .exe on Windows, where op.exe is asked to Move all files even if No eXif data is found (e.g., HEIC files), matching the extensions (case-insensitive) jpg,png,jpeg,heic,mov, from src1 (and its subdirectories) into date-named folders under target:
# With comprehensive duplicate detection (default)
op.exe -m -x no -j jpg,png,jpeg,heic,mov C:\Users\user\Github\orgphoto\testing\src1 C:\Users\user\Github\orgphoto\testing\target
# Redirect duplicates to separate folder
op.exe -m -x no -D redirect -j jpg,png,jpeg,heic,mov C:\Users\user\Github\orgphoto\testing\src1 C:\Users\user\Github\orgphoto\testing\target
# Custom redirect directory and duplicate keyword
op.exe -m -x no -D redirect -R Archive\Duplicates -K copy -j jpg,png,jpeg,heic,mov C:\Users\user\Github\orgphoto\testing\src1 C:\Users\user\Github\orgphoto\testing\target
# For faster processing on large target directories
op.exe -m -x no -N -D rename -j jpg,png,jpeg,heic,mov C:\Users\user\Github\orgphoto\testing\src1 C:\Users\user\Github\orgphoto\testing\target

This version of orgphoto (op) uses the https://pypi.org/project/hachoir/ software to extract EXIF metadata. Hachoir supports the following file formats as of version 3.3.0 (November 2024).
Total: 33 file formats, from https://hachoir.readthedocs.io/en/latest/metadata.html#supported-file-formats
- bzip2: bzip2 archive
- cab: Microsoft Cabinet archive
- gzip: gzip archive
- mar: Microsoft Archive
- tar: TAR archive
- zip: ZIP archive
- aiff: Audio Interchange File Format (AIFF)
- mpeg_audio: MPEG audio version 1, 2, 2.5
- real_audio: Real audio (.ra)
- sun_next_snd: Sun/NeXT audio
- matroska: Matroska multimedia container
- ogg: Ogg multimedia container
- real_media: RealMedia (.rm) container file
- riff: Microsoft RIFF container
- bmp: Microsoft bitmap (BMP) picture
- gif: GIF picture
- ico: Microsoft Windows icon or cursor
- jpeg: JPEG picture
- pcx: PC Paintbrush (PCX) picture
- png: Portable Network Graphics (PNG) picture
- psd: Photoshop (PSD) picture
- targa: Truevision Targa Graphic (TGA)
- tiff: TIFF picture
- wmf: Microsoft Windows Metafile (WMF)
- xcf: Gimp (XCF) picture
- ole2: Microsoft Office document
- pcf: X11 Portable Compiled Font (pcf)
- torrent: Torrent metainfo file
- ttf: TrueType font
- exe: Microsoft Windows Portable Executable
- asf: Advanced Streaming Format (ASF), used for WMV (video) and WMA (audio)
- flv: Macromedia Flash video
- mov: Apple QuickTime movie