bigfiles

A small Rust CLI that walks a directory in parallel, groups files by type, flags stale ones, finds duplicates (hardlink-aware), and renders a color-coded summary in the terminal. Cross-platform: Linux, macOS, Windows.

What it does

Interactive TUI (bigfiles tui) — ncdu-style directory browser with arrow-key navigation, / filter, o to reveal in OS file manager, d to send to Trash, D for dupes in the current subtree, r to re-scan
Quick audit (bigfiles audit) — severity-coded "what's eating your disk" insights in one screen
bigfiles top — flat list of the N largest files, no category grouping
Safe by default: delete and dupes --delete move files to Trash; --force opts into permanent deletion
Walks a directory tree in parallel and collects file sizes, extensions, and modified timestamps
Respects .gitignore and .ignore files by default (use --no-ignore to disable)
Skips symlinks (no double-counting, no follow-link footguns)
Groups files into categories: video, images, archives, audio, documents, code, junk, other
Flags files not modified in the last N years as stale
Renders a color-coded table with size bars, optionally with the largest files per category
Sortable category table (--sort size|count|stale-size|stale-count|name, --reverse)
Finds duplicate files by content hash with parallel BLAKE3 hashing, hardlink awareness, and a persistent on-disk cache so re-scans are near-instant
Interactively deletes stale files or duplicate copies with explicit confirmation
Emits JSON for piping into other tools
Colorized --help output via clap styles

Install

crates.io (requires Rust via rustup)

cargo install bigfiles

To upgrade:

cargo install bigfiles --force

Pre-built binaries

Download from the releases page for Linux (x86_64, aarch64), macOS (Intel, Apple Silicon), and Windows (x86_64). Extract and move bigfiles (or bigfiles.exe) onto your $PATH.

From source:

git clone https://github.com/Par-python/bigfiles
cd bigfiles
cargo install --path .

Usage

# Scan current directory
bigfiles

# Scan a specific path
bigfiles ~/Downloads

# Skip hidden files and dirs, only descend 3 levels
bigfiles ~ --skip-hidden --depth 3

# Show the 5 largest files per category alongside the summary
bigfiles ~/Downloads --top 5

# Exclude paths via glob (repeatable)
bigfiles ~ --exclude 'node_modules' --exclude '*.log' --exclude 'target'

# Don't respect .gitignore / .ignore
bigfiles ~/some-project --no-ignore

# Treat anything not modified in 5+ years as stale (default: 2)
bigfiles ~/Documents --stale-years 5

# Pipe JSON into jq (envelope: { version, root, total_size, skipped, categories })
bigfiles ~/Movies --json | jq '.categories[] | select(.stale_size > 1000000000)'

# Sort the breakdown by file count instead of size; reverse for smallest-first
bigfiles ~ --sort count
bigfiles ~ --sort size --reverse

# Quick "what's eating my disk" insights view
bigfiles audit ~

# Show just the 10 largest files anywhere under the path
bigfiles top ~/Downloads -n 10

# Send stale files to Trash (default), or permanently delete with --force
bigfiles delete ~/Downloads
bigfiles delete ~/Downloads --force

.gitignore awareness

By default bigfiles uses the same ignore crate that ripgrep uses, so .gitignore, .ignore, and global git excludes are respected automatically. Scanning a Rust project whose .gitignore lists target/? It is skipped. Node project that ignores node_modules? Skipped too. No flag needed.

Use --no-ignore to walk everything regardless.

Interactive TUI

bigfiles tui <path> opens a full-screen ncdu-style directory browser. Sizes are aggregated per directory; the largest entries float to the top.

bigfiles tui ~

Keys:

↑/↓ or j/k — move
Enter or → — descend into directory
← or Backspace — go up
/ — filter children by substring (Esc cancels, Enter keeps)
o — reveal selected entry in your OS file manager (open -R on macOS, explorer /select, on Windows, xdg-open on Linux)
d — send the selected file or directory to Trash (yellow confirm bar appears; y/Enter confirms, any other key cancels). Trash-only in the TUI for safety. For permanent delete, use bigfiles delete --force or bigfiles dupes --delete --force from the CLI.
D — open a duplicate-detection popup scoped to the currently-highlighted directory. Uses the persistent hash cache, so repeat runs over the same subtree are near-instant. j/k or PgUp/PgDn to scroll, Esc/q to close.
r — re-run the scan from disk (the TUI exits briefly, shows the spinner, then re-enters with fresh data)
q/Esc — quit
? — toggle help

Quick audit

bigfiles audit <path> runs a normal scan, then prints a short list of severity-coded insights about what's eating your disk — heaviest category, top extensions, installer-junk total (.dmg/.pkg/.iso/.exe/.msi/.deb/.rpm), top-N-file concentration, and the share of stale data. Useful as a first-run "where do I start?" view.

bigfiles audit ~

Insights are bulleted by severity: red ! for heavy (≥40% of total), yellow • for notable (≥20%), dimmed · for informational. Respects --stale-years and all global filters (--skip-hidden, --exclude, --depth, etc.).
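
The thresholds above can be sketched as a small pure function. This is only an illustration of the 40% / 20% rule as described; the `Severity` enum and `severity_for` name are hypothetical and not taken from the bigfiles source.

```rust
/// Severity levels as described above (hypothetical type, not the tool's own).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Severity {
    Heavy,   // shown as a red "!"
    Notable, // shown as a yellow "•"
    Info,    // shown as a dimmed "·"
}

/// Classify an insight by the share of the scanned total it accounts for.
/// `part` and `total` are byte counts. Integer math avoids float rounding
/// at the exact 20% and 40% boundaries.
fn severity_for(part: u64, total: u64) -> Severity {
    if total == 0 {
        return Severity::Info; // assumption: an empty scan is informational
    }
    let (p, t) = (part as u128, total as u128);
    if p * 100 >= t * 40 {
        Severity::Heavy
    } else if p * 100 >= t * 20 {
        Severity::Notable
    } else {
        Severity::Info
    }
}
```

For example, a category holding 3.3 GB of an 8.18 GB scan would land in `Heavy` under this rule, since it is just over 40% of the total.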

Top N largest files

bigfiles top <path> prints a flat list of the N largest files under the given path, sorted by size descending. No category grouping, no bars, no stale flags — just the biggest files. Pairs well with | head, | grep, or pipelines.

# Default: top 20
bigfiles top ~/Downloads

# Top 5
bigfiles top ~/Movies -n 5

# Pipe into other tools
bigfiles top ~ -n 100 | grep '\.mp4'

Respects all global filters (--skip-hidden, --exclude, --depth, --no-ignore).
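
One common way to pick the N largest files without sorting the whole list is a bounded min-heap. The sketch below shows that technique in isolation; it is an assumption about how such a listing could be built, not a description of how bigfiles implements `top`. The `top_n` name and the `(size, path)` tuple shape are hypothetical.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Return the `n` largest (size, path) pairs, largest first.
/// The heap never holds more than `n + 1` entries, so memory stays
/// bounded even when the input iterator yields millions of files.
fn top_n(files: impl IntoIterator<Item = (u64, String)>, n: usize) -> Vec<(u64, String)> {
    // `Reverse` turns the max-heap into a min-heap: pop() removes the smallest.
    let mut heap: BinaryHeap<Reverse<(u64, String)>> = BinaryHeap::new();
    for entry in files {
        heap.push(Reverse(entry));
        if heap.len() > n {
            heap.pop(); // evict the smallest entry currently held
        }
    }
    let mut out: Vec<(u64, String)> = heap.into_iter().map(|Reverse(e)| e).collect();
    out.sort_by(|a, b| b.cmp(a)); // size descending, path as tie-breaker
    out
}
```

Calling `top_n(files, 20)` would mirror the default of `-n 20`.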

Find duplicate files

bigfiles dupes finds files with identical content. It uses a fast three-stage check, parallelized with rayon:

Group by size
Hash first/last 4 KB (partial_hash)
Full BLAKE3 hash on remaining candidates
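
The three stages above can be sketched as repeated regrouping, where each stage only looks at candidates that survived the previous one. This is a simplified, std-only illustration under loud assumptions: file contents are held in memory instead of being read from disk, `DefaultHasher` stands in for BLAKE3, there is no rayon parallelism, and all names (`regroup`, `find_dupes`, `digest`) are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

const EDGE: usize = 4096; // 4 KB window hashed at each end in stage 2

type File<'a> = (&'a str, &'a [u8]); // (name, contents), in-memory stand-in

// Stand-in digest. Assumption: the real tool uses BLAKE3 here.
fn digest(parts: &[&[u8]]) -> u64 {
    let mut h = DefaultHasher::new();
    for p in parts {
        p.hash(&mut h);
    }
    h.finish()
}

// Bucket candidates by a key and keep only buckets with two or more members.
fn regroup<'a, K: Hash + Eq>(files: Vec<File<'a>>, key: impl Fn(&[u8]) -> K) -> Vec<Vec<File<'a>>> {
    let mut buckets: HashMap<K, Vec<File<'a>>> = HashMap::new();
    for f in files {
        buckets.entry(key(f.1)).or_default().push(f);
    }
    buckets.into_values().filter(|b| b.len() > 1).collect()
}

/// Groups of names whose contents match, found by size, then edges, then full hash.
fn find_dupes<'a>(files: Vec<File<'a>>) -> Vec<Vec<&'a str>> {
    let mut groups = Vec::new();
    for by_size in regroup(files, |c| c.len()) {
        let by_edges = regroup(by_size, |c| {
            let head = &c[..c.len().min(EDGE)];
            let tail = &c[c.len().saturating_sub(EDGE)..];
            digest(&[head, tail])
        });
        for candidates in by_edges {
            for full in regroup(candidates, |c| digest(&[c])) {
                let mut names: Vec<&str> = full.into_iter().map(|(n, _)| n).collect();
                names.sort();
                groups.push(names);
            }
        }
    }
    groups.sort();
    groups
}
```

The point of the ordering is cost: size comes for free from metadata, the edge hash reads at most 8 KB per file, and only files that still collide pay for a full read.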

Hardlinks are collapsed by inode before hashing, so multiple paths pointing to the same on-disk file are reported as a single entry (and don't inflate "reclaimable" numbers). When a duplicate group includes hardlinks, the additional paths are shown indented under the primary path.
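
A minimal sketch of the collapse step, assuming each scanned file carries a device id and inode number (on Unix these come from `std::os::unix::fs::MetadataExt::dev()` and `ino()`). Keying on the device as well as the inode is an assumption added here, since inode numbers are only unique within one filesystem. The `Entry` struct and `collapse_hardlinks` name are hypothetical.

```rust
use std::collections::BTreeMap;

/// Minimal file record (hypothetical; the real walker's type is not shown here).
#[derive(Debug)]
struct Entry {
    path: String,
    dev: u64, // device id
    ino: u64, // inode number
}

/// Collapse paths that share (dev, ino) into one primary path plus extra links.
/// The lexicographically first path is picked as primary; that choice is arbitrary.
fn collapse_hardlinks(entries: Vec<Entry>) -> Vec<(String, Vec<String>)> {
    let mut by_inode: BTreeMap<(u64, u64), Vec<String>> = BTreeMap::new();
    for e in entries {
        by_inode.entry((e.dev, e.ino)).or_default().push(e.path);
    }
    by_inode
        .into_values()
        .map(|mut paths| {
            paths.sort();
            let primary = paths.remove(0); // every bucket has at least one path
            (primary, paths)
        })
        .collect()
}
```

Only the primary path of each bucket would then go on to hashing, which is why extra links do not inflate the reclaimable total.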

# Find dupes >= 1 MB in Downloads
bigfiles dupes ~/Downloads --min-size 1048576

# Default min-size is 1024 bytes (1 KB); lower it to include tiny files
bigfiles dupes ~/Documents --min-size 1

Remove duplicate copies (interactive)

bigfiles dupes --delete walks each duplicate group and lets you pick which copy to keep; the rest are queued for removal. After all groups are processed, you get a summary and a y/N confirm before any file is touched.

By default, removed copies are moved to your OS Trash (recoverable). Pair with --force to delete permanently.

# Default: moves duplicate copies to Trash
bigfiles dupes ~/Downloads --delete

# Permanent removal (not recoverable)
bigfiles dupes ~/Downloads --delete --force

Safety guarantees:

Per-group single-choice picker: you choose the one copy to keep, and only the copies you did not pick are queued for removal
Every group offers a "skip — keep all" option; Esc also skips
Always keeps ≥1 copy per group (it's structurally impossible to empty a group)
No removal happens until the final y/N confirm; default is No
Files are re-stat'd immediately before removal; non-regular files (symlinks, sockets, devices) are refused
Trash by default: moved copies can be restored from your OS Trash unless you pass --force
If Trash is unavailable (e.g. on some network mounts), the operation refuses and points you at --force
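
The "structurally impossible to empty a group" guarantee can be illustrated with a type that has no way to express "remove everything". This is a sketch of the idea only; `Choice` and `queue_for_removal` are hypothetical names, and the real picker may be modeled differently.

```rust
/// The user's answer for one duplicate group (hypothetical type).
enum Choice {
    Keep(usize), // index of the single copy to keep
    Skip,        // "skip: keep all", also what Esc maps to
}

/// Paths to queue for removal in one group.
/// The only possible inputs are "keep exactly this one" or "keep all",
/// so at least one copy always survives.
fn queue_for_removal(group: &[String], choice: Choice) -> Vec<String> {
    match choice {
        Choice::Keep(i) if i < group.len() => group
            .iter()
            .enumerate()
            .filter(|(idx, _)| *idx != i)
            .map(|(_, p)| p.clone())
            .collect(),
        // Skip, or an out-of-range index: queue nothing.
        _ => Vec::new(),
    }
}
```

Nothing in the returned queue is touched until the final y/N confirm, so this function only decides what would be removed.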

Note that dupes are only ever paired within the scan root. If two copies live in separate trees, scan a common parent.

Persistent hash cache

bigfiles dupes caches full-file BLAKE3 hashes to your OS cache directory so subsequent runs over the same tree are near-instant.

Location: ~/Library/Caches/bigfiles/hashes.json (macOS), $XDG_CACHE_HOME/bigfiles/hashes.json (Linux), %LOCALAPPDATA%\bigfiles\Cache\hashes.json (Windows)
Cache key: (path, mtime, size). A change to any of the three invalidates the entry and forces a re-hash. Entries are tagged with the hash algorithm so the cache survives version bumps.
Pruning: entries for paths that no longer exist are dropped on the next save.
--no-cache runs without touching the cache (no read, no write).
--clear-cache deletes the cache file before running.
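
The lookup, invalidation, and pruning rules above can be sketched with an in-memory map. This is an assumption-laden illustration: the on-disk JSON layout, the algorithm tag, and the real type names are not shown in this README, so `HashCache`, `CacheEntry`, and the use of whole seconds for mtime are all hypothetical.

```rust
use std::collections::HashMap;

/// One cached digest plus the metadata it was computed for (hypothetical layout).
struct CacheEntry {
    mtime_secs: u64,
    size: u64,
    hash: String,
}

#[derive(Default)]
struct HashCache {
    entries: HashMap<String, CacheEntry>, // keyed by path
}

impl HashCache {
    /// A hit requires path, mtime, and size to all match; otherwise the caller re-hashes.
    fn lookup(&self, path: &str, mtime_secs: u64, size: u64) -> Option<&str> {
        self.entries
            .get(path)
            .filter(|e| e.mtime_secs == mtime_secs && e.size == size)
            .map(|e| e.hash.as_str())
    }

    /// Record a freshly computed hash, replacing any outdated entry for the path.
    fn store(&mut self, path: &str, mtime_secs: u64, size: u64, hash: String) {
        self.entries
            .insert(path.to_string(), CacheEntry { mtime_secs, size, hash });
    }

    /// Drop entries whose path is gone. `exists` is injected to keep this testable;
    /// real code could pass `|p| std::path::Path::new(p).exists()`.
    fn prune(&mut self, exists: impl Fn(&str) -> bool) {
        self.entries.retain(|path, _| exists(path.as_str()));
    }
}
```

With this shape, `--no-cache` would correspond to never calling `lookup` or `store`, and `--clear-cache` to starting from `HashCache::default()`.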

In local testing, the warm cache is roughly 40× faster than a cold first run on the same tree.

Remove stale files (interactive)

bigfiles delete shows you every file older than --stale-years (default 2) in an interactive checklist. You tick which ones to remove, see a confirmation summary, and only then are files touched.

By default, selected files are moved to your OS Trash (recoverable). Pair with --force to delete permanently.

# Default: moves stale files to Trash
bigfiles delete ~/Downloads --stale-years 3

# Permanent removal
bigfiles delete ~/Downloads --stale-years 3 --force

The flow: list → tick boxes (Space) → Enter → review summary → type y to confirm. Hit Ctrl-C any time to bail. If Trash is unavailable, the operation refuses and points you at --force.

Flags (global)

Flag	Default	Description
`<PATH>`	`.`	Directory to scan
`-s, --stale-years <N>`	`2`	Flag files not modified in this many years as stale
`-H, --skip-hidden`	off	Skip dotfiles and dot-directories
`-d, --depth <N>`	unlimited	Limit traversal depth (1 = only files directly in root)
`--no-ignore`	off	Do not respect `.gitignore` / `.ignore` files
`--no-pager`	off	Don't auto-page output through `$PAGER`
`-e, --exclude <GLOB>`	none	Skip files/dirs matching this glob; repeatable
`--units <STYLE>`	`default`	Byte unit style: `default` (1024, KB/MB), `iec` (1024, KiB/MiB), `si` (1000, KB/MB)
`--color <WHEN>`	`auto`	Color output: `auto`, `always`, `never`. Also respects `NO_COLOR`.
`-t, --top <N>`	off	Show N largest files per category (default scan only)
`-j, --json`	off	Emit raw JSON (default scan only)
`--sort <KEY>`	`size`	Sort categories by `size`, `count`, `stale-size`, `stale-count`, or `name` (default scan only)
`--reverse`	off	Reverse the sort order (default scan only)
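
To make the three `--units` styles concrete, here is a rough sketch of a byte formatter that follows the bases and suffixes listed in the table. The `Units` enum, the `format_size` name, and the fixed one-decimal precision are assumptions for illustration; the tool's own formatter may round differently.

```rust
/// Unit styles from the `--units` row (hypothetical enum).
#[derive(Clone, Copy)]
enum Units {
    Default, // base 1024, KB/MB suffixes
    Iec,     // base 1024, KiB/MiB suffixes
    Si,      // base 1000, KB/MB suffixes
}

/// Format a byte count in the chosen style with one decimal place.
fn format_size(bytes: u64, units: Units) -> String {
    let (base, labels): (f64, [&str; 4]) = match units {
        Units::Default => (1024.0, ["KB", "MB", "GB", "TB"]),
        Units::Iec => (1024.0, ["KiB", "MiB", "GiB", "TiB"]),
        Units::Si => (1000.0, ["KB", "MB", "GB", "TB"]),
    };
    let mut value = bytes as f64;
    if value < base {
        return format!("{} B", bytes); // small values are printed as whole bytes
    }
    let mut label = "B";
    for l in labels {
        if value < base {
            break;
        }
        value /= base; // step up one unit
        label = l;
    }
    format!("{:.1} {}", value, label)
}
```

The same 1,048,576 bytes reads as `1.0 MB` in the default style, `1.0 MiB` in `iec`, and still `1.0 MB` in `si` (because 1.048576 rounds down at one decimal).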

Flags (dupes subcommand)

Flag	Default	Description
`--min-size <BYTES>`	`1024`	Minimum file size to consider
`--delete`	off	Interactively remove duplicate copies (keep one per group). Moves to Trash by default.
`--force`	off	When paired with `--delete`, permanently deletes instead of moving to Trash. Cannot be undone.
`--no-cache`	off	Skip the persistent hash cache for this run
`--clear-cache`	off	Delete the persistent hash cache before running

Flags (delete subcommand)

Flag	Default	Description
`--force`	off	Permanently delete selected files instead of moving them to Trash. Cannot be undone.

Flags (top subcommand)

Flag	Default	Description
`-n, --n <N>`	`20`	Number of largest files to show

Pager

When stdout is a real terminal, bigfiles auto-pages output through $PAGER (default less -FRX), the same UX as git log. Short output passes through instantly thanks to -F; long output (e.g. bigfiles ~ --top 20) opens in a scrollable view. Use the arrow keys to scroll, / to search, and q to quit.

The pager is automatically skipped when:

output is piped (bigfiles ... | jq works as expected)
--json is set
the delete subcommand is running (interactive)
--no-pager is passed
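
The skip conditions above amount to a single boolean check. The sketch below shows one way to express it; the `PagerInputs` struct and both function names are hypothetical, and the terminal check itself is left to the caller (in real code, `std::io::IsTerminal::is_terminal` on stdout is one way to get it).

```rust
/// Inputs the pager decision depends on (hypothetical struct).
#[derive(Clone, Copy)]
struct PagerInputs {
    stdout_is_tty: bool,          // false when output is piped
    json: bool,                   // --json was set
    interactive_subcommand: bool, // e.g. the delete subcommand
    no_pager: bool,               // --no-pager was passed
}

/// Page only when stdout is a terminal and none of the skip conditions hold.
fn should_page(i: &PagerInputs) -> bool {
    i.stdout_is_tty && !i.json && !i.interactive_subcommand && !i.no_pager
}

/// Pick the pager command: $PAGER if set and non-empty, else the documented default.
fn pager_command(env_pager: Option<&str>) -> String {
    match env_pager {
        Some(p) if !p.trim().is_empty() => p.to_string(),
        _ => "less -FRX".to_string(),
    }
}
```

A caller could feed `pager_command` with `std::env::var("PAGER").ok().as_deref()`.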

Example output

  bigfiles 8.18 GB  /Users/you/Downloads

  category           size                            files    stale
  ────────────────────────────────────────────────────────────────────────
  video           3.30 GB  ██████████                   45
  archives        2.81 GB  ████████                     44
  documents       1.23 GB  ███                         362
  audio         410.3 MB   █                            29
  images        326.9 MB                               300    ⚠ 91.9 MB (12 files)
  other         115.5 MB                               358    ⚠ 2.5 MB (302 files)
  code          721.3 KB                                25    ⚠ 26.0 KB (14 files)

How "stale" is detected

bigfiles uses the file's modified time (mtime), not access time. Many systems limit or disable access-time updates (Linux relatime or noatime mounts, modern macOS volumes), so atime is unreliable for staleness. mtime changes whenever a file's contents change, which makes it a better signal for "this file is forgotten."
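
As a rough sketch, the staleness check reduces to comparing a file's age against a cutoff. The code below assumes a year is approximated as 365 days and that a file with an mtime in the future is never stale; both are assumptions made here, not documented behavior. `is_stale` and `SECS_PER_YEAR` are hypothetical names.

```rust
use std::time::{Duration, SystemTime};

// Assumption: one "year" is 365 days. The tool's exact cutoff may differ.
const SECS_PER_YEAR: u64 = 365 * 24 * 60 * 60;

/// True when `mtime` is at least `stale_years` years before `now`.
fn is_stale(mtime: SystemTime, now: SystemTime, stale_years: u64) -> bool {
    match now.duration_since(mtime) {
        Ok(age) => age >= Duration::from_secs(stale_years.saturating_mul(SECS_PER_YEAR)),
        // mtime lies in the future (clock skew, restored backups): treat as fresh.
        Err(_) => false,
    }
}
```

In a real scan, `mtime` would come from `std::fs::Metadata::modified()` and `now` from `SystemTime::now()`, captured once per run so every file is judged against the same instant.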

Project layout

src/
  main.rs        # CLI entry, subcommand dispatch, clap styles
  walker.rs      # Parallel directory traversal, file collection, inode capture
  classifier.rs  # Extension → category mapping
  analyzer.rs    # Grouping, sorting, stale detection
  renderer.rs    # Default scan output
  dupes.rs       # Duplicate detection (parallel, hardlink-aware) + interactive delete
  delete.rs      # Interactive stale-file deletion
  format.rs      # Shared byte-size formatter
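
For readers new to the layout, the "extension → category" step in classifier.rs can be pictured as a case-insensitive match on the file extension. The sketch below is illustrative only: the category names come from this README, but the extension lists are guesses (including treating installer formats as junk), and `Category` and `categorize` are hypothetical names.

```rust
use std::path::Path;

/// Categories named in this README (hypothetical enum).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Category {
    Video,
    Images,
    Archives,
    Audio,
    Documents,
    Code,
    Junk,
    Other,
}

/// Map a path to a category by its lowercased extension.
/// The extension lists below are illustrative, not the tool's real table.
fn categorize(path: &Path) -> Category {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("mp4" | "mkv" | "mov" | "avi") => Category::Video,
        Some("jpg" | "jpeg" | "png" | "gif") => Category::Images,
        Some("zip" | "tar" | "gz" | "7z") => Category::Archives,
        Some("mp3" | "flac" | "wav") => Category::Audio,
        Some("pdf" | "docx" | "txt" | "md") => Category::Documents,
        Some("rs" | "py" | "js" | "ts" | "c") => Category::Code,
        Some("dmg" | "pkg" | "msi") => Category::Junk, // assumption: installers count as junk
        _ => Category::Other, // unknown or missing extension
    }
}
```

Lowercasing first means `movie.MP4` and `movie.mp4` land in the same bucket.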

Platform notes

Linux & macOS: full feature set, including hardlink-aware dupe detection and pager auto-launch.
Windows: builds and runs cleanly via cargo build --release (CI covers windows-latest). Two caveats:
- Pager is disabled on Windows (there's no portable less). Output prints straight to stdout — pipe to more or use Windows Terminal's scrollback. The --no-pager flag is a no-op there.
- Hardlink detection is currently inactive on Windows because the inode/file-index API is nightly-only in std. Dupe detection still works, but hardlinks are treated as separate entries instead of being collapsed.

Performance notes

bigfiles uses a parallel directory walker (the ignore crate, same engine as ripgrep) which is fast and portable across macOS, Linux, and Windows. It is not the theoretical fastest approach on every platform:

Windows / NTFS: Reading the Master File Table (MFT) directly enumerates every file on a volume in one sequential pass. Tools like everything.exe use this. bigfiles does not, yet. See #1.
macOS / APFS: The volume catalog can be read in bulk via getattrlistbulk, which is faster than readdir on large trees. bigfiles does not exploit this. See #2.
Linux (ext4 / btrfs / xfs): No comparable shortcut. Standard parallel walking is close to optimal.

If you have benchmarks against fd, dust, fclones, or other scanners, open an issue. Honest numbers are welcome.

Benchmarks: `ignore` vs `jwalk`

A common suggestion is to swap the ignore-based walker for jwalk for higher throughput. The numbers don't support it for bigfiles' workload. Run with cargo bench --bench walker_bench.

Tree shape	`ignore` (gitignore on)	`ignore` (gitignore off)	`jwalk`
Shallow-wide (10k files in one dir)	11.0 ms	7.2 ms	7.4 ms
Deep-narrow (50 levels × 1 file)	1.73 ms	1.02 ms	1.43 ms
Realistic (src + ignored `node_modules`)	0.34 ms	9.7 ms	4.6 ms

Takeaways:

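With gitignore parsing disabled, ignore and jwalk are effectively tied on the shallow-wide tree (7.2 ms vs 7.4 ms), and ignore wins the deep-narrow case (1.02 ms vs 1.43 ms). On the realistic tree with gitignore off, jwalk is faster (4.6 ms vs 9.7 ms).
The realistic workload with gitignore on is where ignore pulls ahead: 0.34 ms, which is ~28× faster than ignore with gitignore off and ~13× faster than jwalk. A .gitignore excluding node_modules lets the walker skip ~5,000 files without ever calling stat. jwalk doesn't have gitignore support out of the box, so it walks everything.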
For bigfiles' actual users (dev machines with node_modules, target/, .venv/), the gitignore-aware skip wins by a margin no raw-throughput improvement could close.

Measured on macOS (Apple Silicon) with Criterion, 100 samples each. Numbers are wall-clock per iteration.

Caveats

Removal is to Trash by default for both delete and dupes --delete. Pair with --force for permanent deletion (cannot be undone). The TUI is Trash-only — use the CLI with --force if you need permanent removal.
If Trash is unavailable (some network mounts, certain restricted environments), the operation refuses cleanly and asks you to re-run with --force. bigfiles will not silently fall back to permanent deletion.
Dupe pairing is relative to the scan root. If two copies live in separate trees (e.g. ~/A/file and ~/B/file), running bigfiles dupes ~/A won't find them. Scan a common parent.
--top, --json, --sort, --reverse only apply to the default scan. They're accepted but ignored under dupes/delete/audit/top/tui (a stderr note is printed).
Symlinks are skipped entirely. If you rely on symlink farms for organization, walking through them isn't supported — point bigfiles at the real paths.

Future ideas

Per-directory breakdown ("top 10 heaviest subdirectories")
--watch mode that re-scans on an interval
A full TUI with ratatui (expand/collapse categories, arrow-key navigation)
Persistent index in ~/.cache/bigfiles/ to diff scans over time
Replace dupes with hardlinks (--link mode) instead of deleting

Stability

Starting with 1.0, the CLI surface and JSON schema follow semver:

CLI flags: removing a flag, changing its short form, or changing default behavior requires a major version bump. New flags are minor.
JSON output: the "version": 1 envelope is stable. Breaking changes ship as "version": 2. Adding new fields is minor.
Exit codes: 0 success, 1 runtime error, 2 usage error.
Internal Rust API: not stable. Use the binary, not the library crate.

License

AGPL-3.0-or-later — see LICENSE.
