A small Rust CLI that walks a directory in parallel, groups files by type, flags stale ones, finds duplicates (hardlink-aware), and renders a color-coded summary in the terminal. Cross-platform: Linux, macOS, Windows.
- Interactive TUI (`bigfiles tui`): ncdu-style directory browser with arrow-key navigation, `/` filter, `o` to reveal in OS file manager, `d` to send to Trash, `D` for dupes in the current subtree, `r` to re-scan
- Quick audit (`bigfiles audit`): severity-coded "what's eating your disk" insights in one screen
- `bigfiles top`: flat list of the N largest files, no category grouping
- Safe by default: `delete` and `dupes --delete` send to Trash by default; `--force` opts into permanent deletion
- Walks a directory tree in parallel and collects file sizes, extensions, and modified timestamps
- Respects `.gitignore` and `.ignore` files by default (use `--no-ignore` to disable)
- Skips symlinks (no double-counting, no follow-link footguns)
- Groups files into categories: video, images, archives, audio, documents, code, junk, other
- Flags files not modified in the last N years as stale
- Renders a color-coded table with size bars, optionally with the largest files per category
- Sortable category table (`--sort size|count|stale-size|stale-count|name`, `--reverse`)
- Finds duplicate files by content hash with parallel BLAKE3 hashing, hardlink awareness, and a persistent on-disk cache so re-scans are near-instant
- Interactively deletes stale files or duplicate copies with explicit confirmation
- Emits JSON for piping into other tools
- Colorized `--help` output via clap styles
crates.io (requires Rust via rustup):

cargo install bigfiles

To upgrade:

cargo install bigfiles --force

Download from the releases page for Linux (x86_64, aarch64), macOS (Intel, Apple Silicon), and Windows (x86_64). Extract and move `bigfiles` (or `bigfiles.exe`) onto your `$PATH`.

From source:

git clone https://github.com/Par-python/bigfiles
cd bigfiles
cargo install --path .

# Scan current directory
bigfiles
# Scan a specific path
bigfiles ~/Downloads
# Skip hidden files and dirs, only descend 3 levels
bigfiles ~ --skip-hidden --depth 3
# Show the 5 largest files per category alongside the summary
bigfiles ~/Downloads --top 5
# Exclude paths via glob (repeatable)
bigfiles ~ --exclude 'node_modules' --exclude '*.log' --exclude 'target'
# Don't respect .gitignore / .ignore
bigfiles ~/some-project --no-ignore
# Treat anything not modified in 5+ years as stale (default: 2)
bigfiles ~/Documents --stale-years 5
# Pipe JSON into jq (envelope: { version, root, total_size, skipped, categories })
bigfiles ~/Movies --json | jq '.categories[] | select(.stale_size > 1000000000)'
# Sort the breakdown by file count instead of size; reverse for smallest-first
bigfiles ~ --sort count
bigfiles ~ --sort size --reverse
# Quick "what's eating my disk" insights view
bigfiles audit ~
# Show just the 10 largest files anywhere under the path
bigfiles top ~/Downloads -n 10
# Send stale files to Trash (default), or permanently delete with --force
bigfiles delete ~/Downloads
bigfiles delete ~/Downloads --force

By default bigfiles uses the same `ignore` crate that ripgrep uses, so `.gitignore`, `.ignore`, and global git excludes are respected automatically. Scanning a Rust project? `target/` is skipped. Node project? `node_modules` is skipped. No flag needed.
Use --no-ignore to walk everything regardless.
bigfiles tui <path> opens a full-screen ncdu-style directory browser. Sizes are aggregated per directory; the largest entries float to the top.
bigfiles tui ~

Keys:

- `↑`/`↓` or `j`/`k`: move
- `Enter` or `→`: descend into directory
- `←` or `Backspace`: go up
- `/`: filter children by substring (`Esc` cancels, `Enter` keeps)
- `o`: reveal selected entry in your OS file manager (`open -R` on macOS, `explorer /select,` on Windows, `xdg-open` on Linux)
- `d`: send the selected file or directory to Trash (a yellow confirm bar appears; `y`/`Enter` confirms, any other key cancels). Trash-only in the TUI for safety. For permanent deletion, use `bigfiles delete --force` or `bigfiles dupes --delete --force` from the CLI.
- `D`: open a duplicate-detection popup scoped to the currently highlighted directory. Uses the persistent hash cache, so repeat runs over the same subtree are near-instant. `j`/`k` or `PgUp`/`PgDn` to scroll, `Esc`/`q` to close.
- `r`: re-run the scan from disk (the TUI exits briefly, shows the spinner, then re-enters with fresh data)
- `q`/`Esc`: quit
- `?`: toggle help
bigfiles audit <path> runs a normal scan, then prints a short list of severity-coded insights about what's eating your disk — heaviest category, top extensions, installer-junk total (.dmg/.pkg/.iso/.exe/.msi/.deb/.rpm), top-N-file concentration, and the share of stale data. Useful as a first-run "where do I start?" view.
bigfiles audit ~

Insights are bulleted by severity: red `!` for heavy (≥40% of total), yellow `•` for notable (≥20%), dimmed `·` for informational. Respects `--stale-years` and all global filters (`--skip-hidden`, `--exclude`, `--depth`, etc.).
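The severity thresholds above can be illustrated with a minimal Rust sketch. The `Severity` enum and `severity` function are hypothetical names chosen for this example, not taken from the bigfiles source, and the zero-total handling is an assumption.

```rust
/// Severity tiers, mirroring the thresholds described in the text.
/// (Names are illustrative; they are not bigfiles' actual types.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Severity {
    Heavy,   // >= 40% of total, rendered as a red "!"
    Notable, // >= 20% of total, rendered as a yellow "•"
    Info,    // everything else, rendered as a dimmed "·"
}

/// Classify an insight by the share of the scanned total it accounts for.
/// Integer math avoids floating-point edge cases at the exact thresholds.
fn severity(part_bytes: u64, total_bytes: u64) -> Severity {
    if total_bytes == 0 {
        return Severity::Info; // assumption: an empty scan flags nothing
    }
    let part = part_bytes as u128 * 100;
    let total = total_bytes as u128;
    if part >= total * 40 {
        Severity::Heavy
    } else if part >= total * 20 {
        Severity::Notable
    } else {
        Severity::Info
    }
}

fn main() {
    let total = 8_000_000_000u64;
    let samples = [
        ("video", 3_300_000_000u64),
        ("archives", 2_000_000_000),
        ("code", 700_000),
    ];
    for (label, bytes) in samples {
        println!("{label}: {:?}", severity(bytes, total));
    }
}
```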
bigfiles top <path> prints a flat list of the N largest files under the given path, sorted by size descending. No category grouping, no bars, no stale flags — just the biggest files. Pairs well with | head, | grep, or pipelines.
# Default: top 20
bigfiles top ~/Downloads
# Top 5
bigfiles top ~/Movies -n 5
# Pipe into other tools
bigfiles top ~ -n 100 | grep '\.mp4'

Respects all global filters (`--skip-hidden`, `--exclude`, `--depth`, `--no-ignore`).
bigfiles dupes finds files with identical content. It uses a fast three-stage check, parallelized with rayon:
- Group by size
- Hash first/last 4 KB (`partial_hash`)
- Full BLAKE3 hash on remaining candidates
Hardlinks are collapsed by inode before hashing, so multiple paths pointing to the same on-disk file are reported as a single entry (and don't inflate "reclaimable" numbers). When a duplicate group includes hardlinks, the additional paths are shown indented under the primary path.
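The three stages listed above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the bigfiles implementation: it works on in-memory byte buffers rather than files, runs sequentially instead of using rayon, skips hardlink collapsing, and substitutes `DefaultHasher` (a 64-bit, non-cryptographic hash) for BLAKE3. The names `File`, `digest`, `refine`, and `find_dupes` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

const EDGE: usize = 4096; // bytes taken from each end for the partial hash

type File = (String, Vec<u8>); // (name, contents); in-memory stand-in for a file

// Stand-in digest. The real tool uses BLAKE3; DefaultHasher keeps this std-only
// and is NOT collision-resistant enough for production dedup.
fn digest(chunks: &[&[u8]]) -> u64 {
    let mut h = DefaultHasher::new();
    for c in chunks {
        c.hash(&mut h);
    }
    h.finish()
}

// Hash only the first and last 4 KB; small inputs are hashed whole.
fn partial_hash(data: &[u8]) -> u64 {
    if data.len() <= 2 * EDGE {
        return digest(&[data]);
    }
    digest(&[&data[..EDGE], &data[data.len() - EDGE..]])
}

// Split every group by `key` and drop sub-groups that have a single member.
fn refine<'a, K, F>(groups: Vec<Vec<&'a File>>, key: F) -> Vec<Vec<&'a File>>
where
    K: Hash + Eq,
    F: Fn(&[u8]) -> K,
{
    let mut out = Vec::new();
    for group in groups {
        let mut buckets: HashMap<K, Vec<&'a File>> = HashMap::new();
        for file in group {
            buckets.entry(key(file.1.as_slice())).or_default().push(file);
        }
        out.extend(buckets.into_values().filter(|b| b.len() > 1));
    }
    out
}

fn find_dupes(files: &[File]) -> Vec<Vec<String>> {
    let all: Vec<&File> = files.iter().collect();
    let by_size = refine(vec![all], |d| d.len());       // stage 1: size
    let by_partial = refine(by_size, partial_hash);      // stage 2: first/last 4 KB
    let by_full = refine(by_partial, |d| digest(&[d]));  // stage 3: full content
    let mut result: Vec<Vec<String>> = by_full
        .into_iter()
        .map(|g| {
            let mut names: Vec<String> = g.into_iter().map(|f| f.0.clone()).collect();
            names.sort();
            names
        })
        .collect();
    result.sort(); // deterministic order for display
    result
}

fn main() {
    let files = vec![
        ("a.bin".to_string(), vec![7u8; 10_000]),
        ("b.bin".to_string(), vec![7u8; 10_000]),
        ("c.txt".to_string(), b"hello".to_vec()),
    ];
    for group in find_dupes(&files) {
        println!("duplicates: {}", group.join(", "));
    }
}
```

Each stage only sees candidates that survived the previous one, so the expensive full hash runs on a small fraction of the input.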
# Find dupes >= 1 MB in Downloads
bigfiles dupes ~/Downloads --min-size 1048576
# Default min-size is 1 KB; tune as needed
bigfiles dupes ~/Documents --min-size 1

`bigfiles dupes --delete` walks each duplicate group and lets you pick which copy to keep; the rest are queued for removal. After all groups are processed, you get a summary and a `y/N` confirm before any file is touched.
By default, removed copies are moved to your OS Trash (recoverable). Pair with --force to delete permanently.
# Default: moves duplicate copies to Trash
bigfiles dupes ~/Downloads --delete
# Permanent removal (not recoverable)
bigfiles dupes ~/Downloads --delete --force

Safety guarantees:

- Per-group single-choice picker: you choose the one copy to keep, and only the copies you did not pick are removed
- Every group offers a skip option that keeps all copies; `Esc` also skips
- Always keeps ≥1 copy per group (it's structurally impossible to empty a group)
- No removal happens until the final `y/N` confirm; default is No
- Files are re-stat'd immediately before removal; non-regular files (symlinks, sockets, devices) are refused
- Trash by default: moved copies can be restored from your OS Trash unless you pass `--force`
- If Trash is unavailable (e.g. on some network mounts), the operation refuses and points you at `--force`
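The re-stat guarantee in the list above might look roughly like the following sketch. It assumes a hypothetical helper named `check_removable`; the error kind and message are illustrative choices, not bigfiles' actual behavior.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Re-check a path right before removal. Returns Ok(()) only for regular files.
/// (Hypothetical helper; name and error text are assumptions.)
fn check_removable(path: &Path) -> io::Result<()> {
    // symlink_metadata does not follow symlinks, so a link is reported as a
    // link rather than as whatever it points to.
    let meta = fs::symlink_metadata(path)?;
    if meta.file_type().is_file() {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("refusing non-regular file: {}", path.display()),
        ))
    }
}

fn main() {
    // Check every path given on the command line.
    for arg in std::env::args().skip(1) {
        match check_removable(Path::new(&arg)) {
            Ok(()) => println!("ok to remove: {arg}"),
            Err(e) => println!("skipped: {e}"),
        }
    }
}
```

Checking at removal time rather than scan time narrows the window in which a file could be swapped for a symlink or deleted between the two steps.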
Note that dupes are only ever paired within the scan root. If two copies live in separate trees, scan a common parent.
bigfiles dupes caches full-file BLAKE3 hashes to your OS cache directory so subsequent runs over the same tree are near-instant.
- Location: `~/Library/Caches/bigfiles/hashes.json` (macOS), `$XDG_CACHE_HOME/bigfiles/hashes.json` (Linux), `%LOCALAPPDATA%\bigfiles\Cache\hashes.json` (Windows)
- Cache key: `(path, mtime, size)`. Any change invalidates the entry, forcing a re-hash. Entries are tagged with the hash algorithm so the cache survives version bumps.
- Pruning: entries for paths that no longer exist are dropped on the next save.
- `--no-cache` runs without touching the cache (no read, no write).
- `--clear-cache` deletes the cache file before running.
In local testing, the warm cache is roughly 40× faster than a cold first run on the same tree.
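To make the `(path, mtime, size)` invalidation rule concrete, here is a minimal in-memory sketch. The `HashCache` and `CacheEntry` types are hypothetical, the on-disk JSON format and the algorithm tag are omitted, and second-resolution mtimes are an assumption.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// One cached hash plus the metadata it was computed against.
#[derive(Debug, Clone, PartialEq, Eq)]
struct CacheEntry {
    mtime_secs: u64, // assumption: mtime stored as whole seconds
    size: u64,
    hash: String,
}

/// Illustrative in-memory cache; persistence is left out of this sketch.
#[derive(Default)]
struct HashCache {
    entries: HashMap<PathBuf, CacheEntry>,
}

impl HashCache {
    /// Return the cached hash only if mtime and size both still match.
    fn lookup(&self, path: &Path, mtime_secs: u64, size: u64) -> Option<&str> {
        self.entries
            .get(path)
            .filter(|e| e.mtime_secs == mtime_secs && e.size == size)
            .map(|e| e.hash.as_str())
    }

    /// Record a freshly computed hash, replacing any stale entry.
    fn store(&mut self, path: &Path, mtime_secs: u64, size: u64, hash: String) {
        self.entries
            .insert(path.to_path_buf(), CacheEntry { mtime_secs, size, hash });
    }

    /// Drop entries whose paths no longer exist (per the supplied check).
    fn prune(&mut self, exists: impl Fn(&Path) -> bool) {
        self.entries.retain(|p, _| exists(p.as_path()));
    }
}

fn main() {
    let mut cache = HashCache::default();
    let path = Path::new("/tmp/example.bin");
    cache.store(path, 1_700_000_000, 4096, "deadbeef".to_string());

    println!("unchanged: {:?}", cache.lookup(path, 1_700_000_000, 4096));
    println!("modified:  {:?}", cache.lookup(path, 1_700_000_100, 4096));

    cache.prune(|p| p.exists());
    println!("entries after prune: {}", cache.entries.len());
}
```

A miss on either field simply falls through to a re-hash, so a stale entry can cost time but never returns a hash for changed content, as long as mtime or size actually changed.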
bigfiles delete shows you every file older than --stale-years (default 2) in an interactive checklist. You tick which ones to remove, see a confirmation summary, and only then are files touched.
By default, selected files are moved to your OS Trash (recoverable). Pair with --force to delete permanently.
# Default: moves stale files to Trash
bigfiles delete ~/Downloads --stale-years 3
# Permanent removal
bigfiles delete ~/Downloads --stale-years 3 --force

The flow: list → tick boxes (`Space`) → `Enter` → review summary → type `y` to confirm. Hit `Ctrl-C` any time to bail. If Trash is unavailable, the operation refuses and points you at `--force`.
| Flag | Default | Description |
|---|---|---|
| `<PATH>` | `.` | Directory to scan |
| `-s, --stale-years <N>` | `2` | Flag files not modified in this many years as stale |
| `-H, --skip-hidden` | off | Skip dotfiles and dot-directories |
| `-d, --depth <N>` | unlimited | Limit traversal depth (1 = only files directly in root) |
| `--no-ignore` | off | Do not respect `.gitignore` / `.ignore` files |
| `--no-pager` | off | Don't auto-page output through `$PAGER` |
| `-e, --exclude <GLOB>` | none | Skip files/dirs matching this glob; repeatable |
| `--units <STYLE>` | `default` | Byte unit style: `default` (1024, KB/MB), `iec` (1024, KiB/MiB), `si` (1000, KB/MB) |
| `--color <WHEN>` | `auto` | Color output: `auto`, `always`, `never`. Also respects `NO_COLOR`. |
| `-t, --top <N>` | off | Show N largest files per category (default scan only) |
| `-j, --json` | off | Emit raw JSON (default scan only) |
| `--sort <KEY>` | `size` | Sort categories by `size`, `count`, `stale-size`, `stale-count`, or `name` (default scan only) |
| `--reverse` | off | Reverse the sort order (default scan only) |
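
The three `--units` styles in the table differ only in divisor and suffix, which a short sketch can show. The `Units` enum and `format_size` function are hypothetical names, and the fixed one-decimal precision is a simplification; bigfiles' own formatter may round differently.

```rust
/// The three unit styles described for `--units` (illustrative enum).
#[derive(Clone, Copy)]
enum Units {
    Default, // divide by 1024, label KB/MB
    Iec,     // divide by 1024, label KiB/MiB
    Si,      // divide by 1000, label KB/MB
}

/// Format a byte count in the chosen style.
/// Assumption: one decimal place for everything above plain bytes.
fn format_size(bytes: u64, units: Units) -> String {
    let (base, labels): (f64, [&str; 5]) = match units {
        Units::Default => (1024.0, ["B", "KB", "MB", "GB", "TB"]),
        Units::Iec => (1024.0, ["B", "KiB", "MiB", "GiB", "TiB"]),
        Units::Si => (1000.0, ["B", "KB", "MB", "GB", "TB"]),
    };
    let mut value = bytes as f64;
    let mut idx = 0;
    // Step up one unit at a time until the value fits or labels run out.
    while value >= base && idx < labels.len() - 1 {
        value /= base;
        idx += 1;
    }
    if idx == 0 {
        format!("{bytes} B")
    } else {
        format!("{value:.1} {}", labels[idx])
    }
}

fn main() {
    for bytes in [512u64, 1_536, 1_000_000, 8_180_000_000] {
        println!(
            "{bytes:>13} bytes  default={}  iec={}  si={}",
            format_size(bytes, Units::Default),
            format_size(bytes, Units::Iec),
            format_size(bytes, Units::Si),
        );
    }
}
```
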
| Flag | Default | Description |
|---|---|---|
| `--min-size <BYTES>` | `1024` | Minimum file size to consider |
| `--delete` | off | Interactively remove duplicate copies (keep one per group). Moves to Trash by default. |
| `--force` | off | When paired with `--delete`, permanently deletes instead of moving to Trash. Cannot be undone. |
| `--no-cache` | off | Skip the persistent hash cache for this run |
| `--clear-cache` | off | Delete the persistent hash cache before running |
| Flag | Default | Description |
|---|---|---|
| `--force` | off | Permanently delete selected files instead of moving them to Trash. Cannot be undone. |
| Flag | Default | Description |
|---|---|---|
| `-n, --n <N>` | `20` | Number of largest files to show |
When stdout is a real terminal, bigfiles auto-pages output through `$PAGER` (default `less -FRX`), the same UX as `git log`. Short output passes through instantly thanks to `-F`; long output (e.g. `bigfiles ~ --top 20`) opens in a scrollable view. Use the arrow keys to scroll, `/` to search, and `q` to quit.
The pager is automatically skipped when:
- output is piped (`bigfiles ... | jq` works as expected)
- `--json` is set
- the `delete` subcommand is running (interactive)
- `--no-pager` is passed
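
The skip conditions above reduce to a single boolean decision. The sketch below assumes a hypothetical `PagerInputs` struct and `should_page` function; it uses the standard library's `IsTerminal` trait for the terminal check and hard-codes the other inputs for demonstration.

```rust
use std::io::IsTerminal;

/// Inputs to the paging decision (hypothetical struct for illustration).
struct PagerInputs {
    stdout_is_tty: bool, // false when output is piped or redirected
    json: bool,          // --json was passed
    is_delete: bool,     // the interactive `delete` subcommand is running
    no_pager: bool,      // --no-pager was passed
}

/// Page only when stdout is a terminal and nothing in the list disables it.
fn should_page(i: &PagerInputs) -> bool {
    i.stdout_is_tty && !i.json && !i.is_delete && !i.no_pager
}

fn main() {
    let inputs = PagerInputs {
        stdout_is_tty: std::io::stdout().is_terminal(),
        json: false,
        is_delete: false,
        no_pager: false,
    };
    println!("would page: {}", should_page(&inputs));
}
```
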
bigfiles 8.18 GB /Users/you/Downloads
category size files stale
────────────────────────────────────────────────────────────────────────
video 3.30 GB ██████████ 45
archives 2.81 GB ████████ 44
documents 1.23 GB ███ 362
audio 410.3 MB █ 29
images 326.9 MB 300 ⚠ 91.9 MB (12 files)
other 115.5 MB 358 ⚠ 2.5 MB (302 files)
code 721.3 KB 25 ⚠ 26.0 KB (14 files)
bigfiles uses the file's modified time (mtime), not access time. Many systems limit or disable access-time updates by default (Linux `relatime`/`noatime` mounts, modern macOS volumes), so atime is unreliable for staleness. mtime is updated whenever a file's contents change, which is a better signal for "this file is forgotten."
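
An mtime-based staleness check along these lines can be sketched as follows. The `is_stale` function is a hypothetical name, and two details are assumptions: years are treated as 365 days, and a file exactly at the threshold counts as stale.

```rust
use std::time::{Duration, SystemTime};

const SECS_PER_YEAR: u64 = 365 * 24 * 60 * 60; // assumption: 365-day years

/// True when `mtime` is at least `stale_years` before `now`.
/// A modification time in the future is never treated as stale.
fn is_stale(mtime: SystemTime, now: SystemTime, stale_years: u64) -> bool {
    match now.duration_since(mtime) {
        Ok(age) => age >= Duration::from_secs(stale_years * SECS_PER_YEAR),
        Err(_) => false, // mtime is later than `now` (clock skew, etc.)
    }
}

fn main() {
    let now = SystemTime::now();
    let three_years_ago = now - Duration::from_secs(3 * SECS_PER_YEAR);
    let last_month = now - Duration::from_secs(30 * 24 * 60 * 60);

    println!("3 years old, threshold 2: {}", is_stale(three_years_ago, now, 2));
    println!("1 month old, threshold 2: {}", is_stale(last_month, now, 2));
}
```

For a real file, `std::fs::metadata(path)?.modified()?` yields the `SystemTime` to pass as `mtime`.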
src/
main.rs # CLI entry, subcommand dispatch, clap styles
walker.rs # Parallel directory traversal, file collection, inode capture
classifier.rs # Extension → category mapping
analyzer.rs # Grouping, sorting, stale detection
renderer.rs # Default scan output
dupes.rs # Duplicate detection (parallel, hardlink-aware) + interactive delete
delete.rs # Interactive stale-file deletion
format.rs # Shared byte-size formatter
- Linux & macOS: full feature set, including hardlink-aware dupe detection and pager auto-launch.
- Windows: builds and runs cleanly via `cargo build --release` (CI covers `windows-latest`). Two caveats:
  - Pager is disabled on Windows (there's no portable `less`). Output prints straight to stdout; pipe to `more` or use Windows Terminal's scrollback. The `--no-pager` flag is a no-op there.
  - Hardlink detection is currently inactive: the inode/file-index API is nightly-only in `std`. Dupe detection still works, but hardlinks are treated as separate entries instead of being collapsed.
bigfiles uses a parallel directory walker (the ignore crate, same engine as ripgrep) which is fast and portable across macOS, Linux, and Windows. It is not the theoretical fastest approach on every platform:
- Windows / NTFS: Reading the Master File Table (MFT) directly enumerates every file on a volume in one sequential pass. Tools like `everything.exe` use this. bigfiles does not, yet. See #1.
- macOS / APFS: The volume catalog can be read in bulk via `getattrlistbulk`, which is faster than `readdir` on large trees. bigfiles does not exploit this. See #2.
- Linux (ext4 / btrfs / xfs): No comparable shortcut. Standard parallel walking is close to optimal.
If you have benchmarks against fd, dust, fclones, or other scanners, open an issue. Honest numbers are welcome.
A common suggestion is to swap the ignore-based walker for jwalk for higher throughput. The numbers don't support it for bigfiles' workload. Run with cargo bench --bench walker_bench.
| Tree shape | `ignore` (gitignore on) | `ignore` (gitignore off) | `jwalk` |
|---|---|---|---|
| Shallow-wide (10k files in one dir) | 11.0 ms | 7.2 ms | 7.4 ms |
| Deep-narrow (50 levels × 1 file) | 1.73 ms | 1.02 ms | 1.43 ms |
| Realistic (`src` + ignored `node_modules`) | 0.34 ms | 9.7 ms | 4.6 ms |
Takeaways:
- With gitignore parsing disabled, `ignore` and `jwalk` are effectively tied; `ignore` actually wins the deep-narrow case.
- The realistic workload is where `ignore` pulls ahead: with gitignore on it is ~28× faster than with gitignore off (0.34 ms vs 9.7 ms) and ~13.5× faster than `jwalk` (4.6 ms). A `.gitignore` excluding `node_modules` lets the walker skip ~5,000 files without ever calling `stat`. `jwalk` doesn't have gitignore support out of the box, so it walks everything.
- For bigfiles' actual users (dev machines with `node_modules`, `target/`, `.venv/`), the gitignore-aware skip wins by a margin no raw-throughput improvement could close.
Measured on macOS (Apple Silicon) with Criterion, 100 samples each. Numbers are wall-clock per iteration.
- Removal is to Trash by default for both `delete` and `dupes --delete`. Pair with `--force` for permanent deletion (cannot be undone). The TUI is Trash-only; use the CLI with `--force` if you need permanent removal.
- If Trash is unavailable (some network mounts, certain restricted environments), the operation refuses cleanly and asks you to re-run with `--force`. bigfiles will not silently fall back to permanent deletion.
- Dupe pairing is relative to the scan root. If two copies live in separate trees (e.g. `~/A/file` and `~/B/file`), running `bigfiles dupes ~/A` won't find them. Scan a common parent.
- `--top`, `--json`, `--sort`, `--reverse` only apply to the default scan. They're accepted but ignored under `dupes`/`delete`/`audit`/`top`/`tui` (a stderr note is printed).
- Symlinks are skipped entirely. If you rely on symlink farms for organization, walking through them isn't supported; point bigfiles at the real paths.
- Per-directory breakdown ("top 10 heaviest subdirectories")
- `--watch` mode that re-scans on an interval
- A full TUI with `ratatui` (expand/collapse categories, arrow-key navigation)
- Persistent index in `~/.cache/bigfiles/` to diff scans over time
- Replace dupes with hardlinks (`--link` mode) instead of deleting
Starting with 1.0, the CLI surface and JSON schema follow semver:
- CLI flags: removing a flag, changing its short form, or changing default behavior requires a major version bump. New flags are minor.
- JSON output: the `"version": 1` envelope is stable. Breaking changes ship as `"version": 2`. Adding new fields is minor.
- Exit codes: `0` success, `1` runtime error, `2` usage error.
- Internal Rust API: not stable. Use the binary, not the library crate.
AGPL-3.0-or-later — see LICENSE.