perf(fs,db,model): streaming chunked scan with O(1) memory to eliminate per-file DB queries #10516
bxff wants to merge 13 commits into syncthing:main
Conversation
During `scanSubdirsDeletedAndIgnored`, we were calling `IsDeleted` for every file in the database. Each call made one `Lstat` for the file plus N more `Lstat`s for `TraversesSymlink` (one per parent path component). For 114k files, this meant over 1 million syscalls per scan. On macOS, where QoS throttling amplifies syscall latency, scans hit 47 seconds.

This change adds two caching layers to fix the bottleneck:

1. `DirExistenceCache`: caches `DirNames()` per directory, replacing per-file `Lstat` calls with in-memory set lookups.
2. `SymlinkCache`: caches `Lstat` results for path components, so parent directories are only checked once.

Results for 114,589 files:

- Syscalls: 1,083,884 -> 35,467 (30.6x reduction)
- Scan time: 47s -> 15s (3.1x faster)
- Per-operation: 8,434 ns -> 228 ns (37x faster)
Even with cached syscalls, Phase 2 delete detection still required 35k+ `Lstat` calls per scan. While caching helped, filesystem overhead remained a bottleneck. This change eliminates syscalls entirely from Phase 2:

1. During the Phase 1 walk, we now build an `ExistingFiles` map containing every visited path.
2. Phase 2 replaces all `Lstat` calls with simple map lookups, reducing delete detection to pure memory operations.

To validate the optimization, we've added comprehensive benchmarking:

- benchmark-fast.sh: A/B/C comparison script to test all three delete detection strategies (original, cached, zero-syscall)
- Results tracking: automatically logs performance metrics
- Git integration: updated .gitignore to exclude benchmark results

Results for 135,715 files:

- Total scan: 34.9s -> 29.1s (17% faster than cached)
- Phase 2 time: 5.0s -> 4.5s (10% reduction)
- Delete detection: 150ms with zero syscalls
- Syscalls in Phase 2: eliminated completely

The optimization only applies to Phase 2 (17% of total scan time), since Phase 1 must still walk the filesystem. For further improvements, filesystem watchers would be needed to avoid walking entirely.
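The zero-syscall Phase 2 boils down to a set difference between the DB's view and the set of paths Phase 1 actually visited. A minimal sketch, with an illustrative `detectDeleted` helper (not the PR's actual code):

```go
package main

import "fmt"

// detectDeleted returns every DB path that the Phase 1 walk did not visit.
// No Lstat calls: pure map lookups.
func detectDeleted(existing map[string]struct{}, dbPaths []string) []string {
	var deleted []string
	for _, p := range dbPaths {
		if _, ok := existing[p]; !ok {
			deleted = append(deleted, p)
		}
	}
	return deleted
}

func main() {
	// Phase 1: the walk populated this set of every visited path.
	existing := map[string]struct{}{"a": {}, "a/b.txt": {}}
	// Phase 2: the DB still knows about a path that is gone from disk.
	fmt.Println(detectDeleted(existing, []string{"a", "a/b.txt", "a/gone.txt"})) // [a/gone.txt]
}
```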
During the Phase 1 walk, the scanner was calling `CurrentFile()` for every file in the folder. Each call resulted in an individual DB query to retrieve the file's current metadata. For 135,715 files, this meant 135,715 separate database hits, which was the dominant cost of the 22.5s walk time. This change fundamentally re-architects the scan by preloading all file metadata into memory once, then using map lookups throughout:

1. `AllLocalFilesMap`: bulk-loads all file infos from a single query. Excludes block data to optimize memory (270MB -> 40MB for 135K files).
2. `mapCFiler`: replaces individual DB queries with O(1) map lookups during the walk, eliminating the per-file database bottleneck.
3. Phase 2 reuse: the preloaded map now drives delete detection too, avoiding the 3.8s DB iteration entirely.
4. Sorted iteration: file names are returned in order to maintain deterministic behavior for tests and rename detection.
5. Optimized filtering: prefix matching now uses a single pass instead of nested loops.
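Point 4 matters because Go deliberately randomizes map iteration order. The deterministic iteration described above can be sketched as iterating over pre-sorted keys (`sortedNames` is an illustrative helper, not the PR's actual code):

```go
package main

import (
	"fmt"
	"sort"
)

// sortedNames returns the map's keys in lexicographic order, so callers get a
// stable iteration order regardless of Go's randomized map traversal.
func sortedNames(files map[string]int) []string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	files := map[string]int{"b": 2, "a": 1, "c": 3}
	for _, name := range sortedNames(files) { // deterministic order: a, b, c
		fmt.Println(name, files[name])
	}
}
```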
It sounds like you're effectively loading the entire file database into memory, which would be a significant amount of memory for larger installations. That's effectively the approach we moved away from with the introduction of the first database layer some 8-10 years ago.
The preload optimization eliminated per-file DB queries but introduced unbounded memory usage that scaled linearly with folder size. While the 40-50MB footprint for 135K files was manageable, this approach was unsustainable for larger installations and represented a regression to patterns we moved away from years ago. This change completely re-architects the scan pipeline to use streaming parallel iteration, addressing the memory concern while maintaining O(n) performance and preserving deterministic behavior:

1. Lexicographic filesystem walk: modified `ReadDir` sorting to treat directories as `name + "/"` so DFS produces the same order as `ORDER BY name`. This enables lockstep iteration without buffering the entire filesystem tree.
2. Streaming DB iterator: replaced the bulk preload with cursor-based iteration that streams rows one at a time via a 1000-item buffer, excluding unused block data that was previously loaded and discarded.
3. Parallel merge scan: walks the filesystem and database simultaneously in a single pass. Deleted files are detected naturally as skipped DB entries, eliminating the need for a separate Phase 2 DB iteration and the `ExistingFiles` map.
4. O(1) memory footprint: reduces memory from ~45MB for 135K files to a constant buffer size. The streaming approach maintains constant memory regardless of folder size.
5. Eliminated redundant work: removed the `DirExistenceCache` and `SymlinkCache` layers that were re-checking existence during Phase 2, plus the per-file `Lstat` calls in delete detection.
6. Dead code removal: deleted the `AllLocalFilesMap`, `mapCFiler`, and `ExistingFiles` infrastructure that existed solely to support the preload approach.

The scan now completes in a single filesystem walk with streaming DB iteration, maintaining deterministic order for move detection while using constant memory. All tests pass, including rename sequence validation.
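The parallel merge scan (point 3) is essentially a merge of two streams that yield paths in the same sorted order. A self-contained sketch, using slices where the real code streams from the walker and a DB cursor (`mergeScan` is an illustrative name):

```go
package main

import "fmt"

// mergeScan walks two sorted path streams in lockstep. Paths present only on
// the filesystem side are new; DB entries skipped over by the walk are
// deleted. O(1) extra memory beyond the two output lists.
func mergeScan(fs, db []string) (added, deleted []string) {
	i, j := 0, 0
	for i < len(fs) && j < len(db) {
		switch {
		case fs[i] == db[j]: // present on both sides: unchanged (or modified)
			i++
			j++
		case fs[i] < db[j]: // on disk but not in DB: new file
			added = append(added, fs[i])
			i++
		default: // in DB but skipped by the walk: deleted
			deleted = append(deleted, db[j])
			j++
		}
	}
	added = append(added, fs[i:]...)
	deleted = append(deleted, db[j:]...)
	return added, deleted
}

func main() {
	fs := []string{"a", "b", "d"}
	db := []string{"a", "c", "d", "e"}
	add, del := mergeScan(fs, db)
	fmt.Println(add, del) // [b] [c e]
}
```

This is why identical ordering on both sides is load-bearing: the moment the walk and the DB disagree on order, the lockstep comparison misclassifies entries.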
Fair point about the unbounded memory. Preloading everything into a map was trading memory for speed, and that's not sustainable for larger installations. I should clarify, though: this isn't just "load the DB into memory and call it a day." The PR completely re-architects the scan pipeline to eliminate a whole class of inefficiencies that were making the old code do far more work than necessary. The memory impact is actually smaller than it looks, because I intentionally exclude block hashes from the preload. For 135K files, the napkin math comes out to roughly 40-50MB total, which is significant but not the hundreds of MB it would be if I loaded everything. The old code was loading block hashes for every file during Phase 1, then immediately discarding them, since the scanner runs with `IgnoreBlocks: true`. Beyond that, the rearchitecture fixes some deeper problems, like the per-file syscalls and N+1 DB queries described above.
The new streaming approach addresses your concern head-on: instead of preloading everything, I'm now using a single streaming DB iterator (`AllLocalFilesOrdered`). The key insight was that the filesystem walk order wasn't deterministic, which broke the merge strategy. The lex-order fix (treating directories as `name + "/"` during sorting) makes the DFS walk produce the same order as `ORDER BY name`. Memory should drop from ~45MB to just the buffer size, I keep a single FS walk, and move detection still works deterministically because both sides iterate in sorted order. All tests pass, including the rename sequence ones that caught the ordering bug initially.
Unfortunately, a long-running database read transaction is also a bit of a no-go, as it blocks compaction and results in unbounded database growth for the duration. We did a fair amount of work to avoid that during the SQLite transition.
Long-running read transactions block compaction and cause unbounded WAL growth. `AllLocalFilesOrdered` previously held a transaction open for 30+ seconds, preventing `PRAGMA wal_checkpoint` from running. This implements chunked keyset pagination to release transactions between chunks:

1. Process results in 10K-record chunks via keyset pagination
2. Each chunk uses: `SELECT ... WHERE name > ? ORDER BY name LIMIT 10000`
3. The transaction is released after each chunk, allowing checkpoints to run
4. `lastName` tracking enables deterministic ordering across chunks

The pattern matches the existing `gcChunkSize` approach in db_service.go and `periodicCheckpointLocked` for WAL management. Memory footprint is O(10K) per chunk vs O(1) for pure streaming, but avoids the O(n) full preload. This restores normal WAL behavior while maintaining streaming semantics.
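The chunk loop can be sketched as follows. To keep the sketch runnable without a SQL driver, a sorted slice stands in for the table, and `fetchChunk` (an illustrative name) plays the role of one short-lived `WHERE name > ? ORDER BY name LIMIT n` query:

```go
package main

import (
	"fmt"
	"sort"
)

const chunkSize = 3 // 10_000 in the change described above

// fetchChunk emulates: SELECT name FROM files WHERE name > ? ORDER BY name
// LIMIT chunkSize. In the real code each call is its own transaction.
func fetchChunk(table []string, lastName string) []string {
	i := sort.SearchStrings(table, lastName) // first index >= lastName
	for i < len(table) && table[i] <= lastName {
		i++ // strict ">" semantics: skip the cursor row itself
	}
	end := i + chunkSize
	if end > len(table) {
		end = len(table)
	}
	return table[i:end]
}

func main() {
	table := []string{"a", "b", "c", "d", "e", "f", "g"} // already ORDER BY name
	lastName := ""
	for {
		chunk := fetchChunk(table, lastName) // transaction opens and closes here
		if len(chunk) == 0 {
			break
		}
		fmt.Println(chunk) // rows are processed with no transaction held
		lastName = chunk[len(chunk)-1]
	}
}
```

Because the `lastName` cursor is carried in application code rather than in a held cursor, WAL checkpoints can run between any two chunks.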
Implemented chunked keyset pagination to address the long-running transaction concern. The query now processes results in 10K-record chunks (adjustable), releasing the transaction between each chunk. This allows `PRAGMA wal_checkpoint` to run between chunks, restoring normal WAL behavior.
Quick warning: beware that this doesn't deal with plain lexicographical order only, but with paths, i.e. a combination of hierarchy and lexicographical order. I haven't actually checked if/how you are handling this aspect, just want to quickly bring it up. I once tried to do almost the same, detecting changes and deletions by walking both the DB and filesystem together (just did it entirely in the scanner), but abandoned it due to the complexities around ordering/hierarchy. I am obviously not saying it's not doable, and also not that you aren't yet handling it; I just didn't immediately see that it is correct/handling it on a quick skim.

Also, personally I'd recommend spending some effort on keeping the diff minimal/readable. The scanning logic is very central to syncthing (and also not straightforward - I am ok saying that, as a lot of it is my mess xD), so any change here carries a lot of risk. Reviewing and trying to make sure it's correct is easier if the diff is more focused. An obvious example is the filesystem modernisation change: it doesn't seem connected to the actual change here, so imo that should be done separately (not making any statement on its viability in general here). Possibly some changes in the scan/folder code could also be made smaller/easier to read, given you seem to re-use most of the logic there and not entirely rewrite it (then of course there'd be no point).
Also, I am somewhat skeptical of loading even just the file infos without blocks into memory. The 99th-percentile folder according to usage reporting has ~1M files, and there will be significantly larger ones still. At the same time, syncthing is often run on resource-constrained NAS devices. While I don't think we should go to extremes to support outlier use-cases on underpowered devices, I still think we should keep that constraint in mind, especially when we aren't trading it against simplicity/maintainability or data safety, but "just" performance like here. For your initial changes to avoid the per-file Lstat calls, some numbers demonstrating the improvement would still be good to have.
On path hierarchy and ordering: that's a valid concern and exactly what I ran into initially. The fix ensures the DFS walk order matches the database's `ORDER BY name` output.

For diff cleanup, could you clarify what you have in mind? Are you suggesting the filesystem API modernization should be a separate PR entirely, or is there a way to structure the scan changes to reuse more existing logic? I tried to keep the core scanning logic intact while changing the orchestration, but I'm happy to refactor if you can point me toward a cleaner approach that preserves the performance gains with a smaller surface area.

On memory usage, the chunked pagination addresses this directly - it processes 10K-record batches in streaming fashion with bounded memory. No unbounded growth, and the WAL can checkpoint between chunks.

Regarding benchmarks, I actually stopped providing numbers because run-to-run variance was huge - sometimes 30 seconds, sometimes 2 minutes for the same folder, likely due to OS caching and DRAM state. App-level vs local testing also showed different characteristics. The 3x Lstat speedup was real but used minimal cache memory (proportional to max directory depth, not file count). I can't prove each change's individual impact due to this variance, but together they eliminate the redundant syscalls, N+1 DB queries, and wasted block hash loading that the old code was doing. The streaming chunked approach maintains these wins without the memory cost of full preloading.
I don't think this helps, really. The time it takes to process a given chunk is much more dependent on what's new or changed on disk than what's in the database -- we can be stuck for hours scanning large files in the middle of a chunk, no matter how small the chunk. You might be able to optimise it per directory somehow, so that you can correlate the listdir for one directory with the database query for the same path. Even then though, we scan new files before processing deletions, and your listdir might be long out of date by the time you get there. |
lib/model/folder.go
```go
errFn   func() error        // Error check function
current *protocol.FileInfo  // Current DB entry (nil if exhausted)
hasMore bool                // Whether iterator has more entries
deleted []protocol.FileInfo // Files skipped (deleted from disk)
```
Comment secondary to open fundamental/design questions aka I'd suggest not investing time into addressing this comment until reaching a consensus there:
This should be bounded. As in flush/handle them when some size is reached. Probably shouldn't happen in the CFiler itself but instead somehow return the found deleted elements to handle it in the caller (or callback).
Besides the already pointed-out filesystem change, I don't have anything concrete in mind - definitely not generically asking you to refactor. Just wanted to point it out as something to consider, which apparently you already did. In any case, the fundamental/design questions and concerns brought up are the mainly relevant bits now, as those need to be sorted out, i.e. a consensus reached, first. Otherwise polish/details quite likely end up being wasted time.
Ah right, I didn't notice the ordering change in walkfs before. Below is a simple example where the current logic produces a different order when walking the FS than the DB with `ORDER BY name` - or I am wrong, in which case the concrete examples should make it easy to point out that I am wrong and how :)

Example files in folder: `a` (dir), `a/aaa`, `a/bbb`, `a.d` (dir), `a.d/aaa`, `a.txt`

Database entries (`ORDER BY name`): `a`, `a.d`, `a.d/aaa`, `a.txt`, `a/aaa`, `a/bbb`

Filesystem walk order with sorting and slash appended: `a.d`, `a.d/aaa`, `a.txt`, `a`, `a/aaa`, `a/bbb`

Filesystem walk order with sorting without slash appended (mostly just for my curiosity): `a`, `a/aaa`, `a/bbb`, `a.d`, `a.d/aaa`, `a.txt`
That makes sense, getting relevant/real-world-equivalent benchmarking is always hard, even more so when involving the filesystem and a database. Nevertheless, I'd expect a basic benchmark (similar or potentially exactly the same as the benchmark-fast.sh comparison you already used) to still be useful.
Doing bounded-size batch loads from the DB into memory without a transaction like this indeed seems like a good option to lower lookup cost without locking and with limited memory overhead. No complicated logic needed, just do the same fixed-size, ordered query as the method does now at once (if the ordering/lockstep works, but that concern is the same either way).
Is that an issue though? Just means we update the DB to the state of the filesystem at the time of walking the filesystem, instead of after having scanned and hashed all the changes. If a file is resurrected in the meantime, that will be picked up/resurrected in the following scan.
If you're benchmarking on Linux, you can simply clear the page cache before each run:

```shell
sudo sync && echo 3 | sudo tee -a /proc/sys/vm/drop_caches
```

Things like NVMe thermal throttling aside, this should give you much more consistent results.
DFS walk with sorted entries could not reproduce the DB's `ORDER BY name` behavior, producing different traversal orders that broke merge scan assumptions. The "/"-suffix trick made directories sort after their contents, causing divergence from SQLite's collation. This implements a min-heap based walk that yields DB-consistent order:

1. Min-heap tracking pending entries by full-path lexicographic order
2. Iteratively popping the smallest path ensures global ordering
3. Directory children are pushed after the parent is processed
4. Replaced DFS recursion with an O(W) memory heap, where W = max pending entries

Complexity:

- Time: O(N log W), where N = total files, W = max directory width
- Memory: O(W) vs O(depth) for DFS, typically 100-1000 entries
- Worst case (flat 10K directory): ~1MB heap, in exchange for a correct order guarantee

The algorithm produces exactly the same order as SQLite's `ORDER BY name`, verified against SELECT queries. Removed `ancestorDirList`, the `walk()` recursion, and the fragile "/"-suffix sorting hack.
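The min-heap walk can be sketched with `container/heap`. This is an illustrative reduction (an in-memory `tree` stands in for `ReadDir`, and `heapWalk` is a made-up name), not the PR's actual implementation: always emit the lexicographically smallest pending full path, and push a directory's children only when that directory itself is popped:

```go
package main

import (
	"container/heap"
	"fmt"
)

// In-memory stand-in for ReadDir: each directory maps to its child names.
var tree = map[string][]string{
	"":    {"a", "a.d", "a.txt"},
	"a":   {"aaa", "bbb"},
	"a.d": {"aaa"},
}

// pathHeap is a min-heap of full paths ordered lexicographically.
type pathHeap []string

func (h pathHeap) Len() int           { return len(h) }
func (h pathHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h pathHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *pathHeap) Push(x any)        { *h = append(*h, x.(string)) }
func (h *pathHeap) Pop() any {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// heapWalk yields every path in global lexicographic order, matching what
// SELECT name ... ORDER BY name would return.
func heapWalk() []string {
	h := &pathHeap{}
	for _, name := range tree[""] {
		heap.Push(h, name)
	}
	var out []string
	for h.Len() > 0 {
		p := heap.Pop(h).(string)
		out = append(out, p)
		for _, child := range tree[p] { // non-empty only for directories
			heap.Push(h, p+"/"+child)
		}
	}
	return out
}

func main() {
	fmt.Println(heapWalk()) // [a a.d a.d/aaa a.txt a/aaa a/bbb]
}
```

On the review example this emits `a`, `a.d`, `a.d/aaa`, `a.txt`, `a/aaa`, `a/bbb`: the `a.d` subtree is interleaved between `a` and its children, which plain DFS cannot do.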
Test expectations reflected the incorrect DFS-based walk order rather than true DB `ORDER BY name` collation. This updates assertions to match SQLite's lexicographic ordering and prevents regression. Changes:

1. Corrected expected slice order: ".stfolder", "a", "a.txt", ... (was "a.txt", "a", ...)
2. Added imsodin's test case with the exact ordering from review feedback
3. Verified the expected array against actual SQLite: `SELECT name FROM files ORDER BY name`

The new test captures the exact scenario that exposed the bug:

- Input: a, a/aaa, a/bbb, a.d, a.d/aaa, a.txt
- DB order: a, a.d, a.d/aaa, a.txt, a/aaa, a/bbb
- Old walk: a, a/aaa, a/bbb, a.d, a.d/aaa, a.txt <- wrong!

Ensures the heap-based walk maintains DB-consistent ordering going forward.
The deleted-files collection was unbounded, scaling O(total_deleted) and potentially consuming significant memory during large sync operations. While Phase 2 currently processes all deletions together, the collection should not grow without bound. Changes:

1. Added a `deletedBatchSize = 1000` constant
2. Batch flush in `addDeleted()` when the threshold is reached
3. An `onDeletedBatch` callback enables future streaming processing
4. Memory: O(1000) constant vs O(total_deleted) unbounded

API: `newStreamingCFiler` now takes an `onDeletedBatch` callback parameter. Current usage passes nil to retain the Phase 2 collection behavior, but the batching mechanism is ready for future streaming improvements without API changes. This prevents unbounded memory growth during large operations while preserving existing Phase 2 semantics.
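The batching mechanism can be sketched as follows. The `deletedBatcher` type and its string-based payload are simplifications for illustration (the real code deals in `protocol.FileInfo` and lives in the streaming CFiler):

```go
package main

import "fmt"

const deletedBatchSize = 3 // 1000 in the change described above

// deletedBatcher buffers deleted entries and flushes them through a callback
// once the batch size is reached, keeping memory O(batchSize) instead of
// O(total deleted).
type deletedBatcher struct {
	batch   []string
	onBatch func([]string)
}

func (b *deletedBatcher) addDeleted(name string) {
	b.batch = append(b.batch, name)
	if len(b.batch) >= deletedBatchSize {
		b.flush()
	}
}

func (b *deletedBatcher) flush() {
	if len(b.batch) == 0 {
		return
	}
	b.onBatch(b.batch)
	b.batch = nil
}

func main() {
	var flushed [][]string
	b := &deletedBatcher{onBatch: func(names []string) {
		flushed = append(flushed, append([]string(nil), names...))
	}}
	for _, n := range []string{"a", "b", "c", "d"} {
		b.addDeleted(n)
	}
	b.flush() // final partial batch
	fmt.Println(flushed) // [[a b c] [d]]
}
```

The caller-supplied callback is what keeps the door open for streaming: today it can simply append to the Phase 2 collection, later it can process each batch immediately.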
My original "/" suffix approach falls apart with your example. I replaced the whole approach with a min-heap walk instead. Now it globally orders by complete path lexicographically, which matches your DB output exactly. The heap ensures we're always pulling the next smallest path from any directory, not just depth-first. Your test case is now in the test suite and passes against actual SQLite `ORDER BY name` output. Appreciate you catching this early - the rename tests were flaky, but I didn't see why until your concrete examples made it obvious.
I might be dense here, but I don't follow the connection. My implementation releases the transaction immediately after fetching each 10K chunk - the transaction is held for milliseconds, not hours. The file scanning happens entirely after the transaction closes. What scenario are you envisioning where scanning files keeps the transaction alive? That's not how I wrote it, but maybe I'm misunderstanding something fundamental.
What do you mean by "at once"? My implementation already does a fixed-size ordered query, loads those 10K rows into memory, then closes the transaction before processing. |
Add realistic folder scanning benchmarks:

- BenchmarkScanRealistic_Small: 2,100 folders / 13,500 files
- BenchmarkScanRealistic_Medium: 4,200 folders / 27,000 files
- BenchmarkScanRealistic_Full: 21,000 folders / 135,000 files

Uses FakeFS to avoid filesystem caching effects and ensure reproducible results across different machines and OS versions. Compatible with any Syncthing version for performance comparison.
Added a scan benchmark suite using FakeFS to measure performance across realistic folder/file ratios (21K folders / 135K files) at three scales (Small, Medium, Full). FakeFS eliminates disk I/O and OS caching variance for reproducible results, enabling cross-version comparisons of scan time, memory allocations, and file operation counts.
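The shape of the approach can be sketched with the standard library alone. `buildTree` and `countEntries` below are hypothetical stand-ins for FakeFS and the scanner; the point is only that the measured work is purely in-memory, so disk I/O and page-cache state cannot skew the numbers:

```go
package main

import (
	"fmt"
	"testing"
)

// buildTree generates a synthetic in-memory tree: dirs directories, each
// holding filesPerDir files. A stand-in for FakeFS population.
func buildTree(dirs, filesPerDir int) map[string][]string {
	tree := make(map[string][]string, dirs)
	for d := 0; d < dirs; d++ {
		files := make([]string, filesPerDir)
		for f := 0; f < filesPerDir; f++ {
			files[f] = fmt.Sprintf("file%05d", f)
		}
		tree[fmt.Sprintf("dir%05d", d)] = files
	}
	return tree
}

// countEntries walks the whole tree; a stand-in for the scan under test.
func countEntries(tree map[string][]string) int {
	n := len(tree)
	for _, files := range tree {
		n += len(files)
	}
	return n
}

func main() {
	tree := buildTree(210, 65) // miniature "Small"-like folder/file ratio
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			countEntries(tree)
		}
	})
	fmt.Println("entries:", countEntries(tree), "ns/op:", res.NsPerOp())
}
```

In the real suite these would be `Benchmark*` functions run via `go test -bench`, with the tree built once outside the timed loop, exactly as sketched here with `testing.Benchmark`.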
Purpose
This PR represents the culmination of an iterative optimization journey initially undertaken to support sushitrain's extremely tight iOS background task constraints. Along the way, I discovered better approaches and completely re-architected the folder scanning pipeline.
Evolution of Approaches
I started by exploring different delete detection strategies, implementing and measuring each one:
1. Original: `osutil.IsDeleted` making `Lstat()` calls per file during Phase 2
2. Cached: `DirExistenceCache` and `SymlinkCache` to reduce redundant syscalls
3. Zero-syscall: an `ExistingFiles` map built during the Phase 1 walk to eliminate Phase 2 syscalls entirely

While the zero-syscall version showed improvement, I realized there was another major opportunity for optimization. The scanner was making individual database queries for every single file during the walk:
I decided to restructure the entire scan around preloading to address this:
1. Preload All File Metadata
I added `AllLocalFilesMap` to the database interface to bulk-load all file infos in a single query. Key optimizations include:
- `IgnoreBlocks: true` for comparisons, so blocks aren't used

2. Map-Based CurrentFiler
I replaced per-file DB queries with map lookups:
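A minimal sketch of the map-based CurrentFiler idea. The `FileInfo` type here is simplified for illustration (the real one is `protocol.FileInfo`), and `mapCFiler` answers `CurrentFile` from the preloaded map instead of issuing a DB query per call:

```go
package main

import "fmt"

// FileInfo is a simplified stand-in for protocol.FileInfo.
type FileInfo struct {
	Name string
	Size int64
}

// CurrentFiler is the lookup interface the scanner consults for each file's
// known DB state.
type CurrentFiler interface {
	CurrentFile(name string) (FileInfo, bool)
}

// mapCFiler serves CurrentFile from a preloaded map: O(1), no DB round-trip.
type mapCFiler map[string]FileInfo

func (m mapCFiler) CurrentFile(name string) (FileInfo, bool) {
	fi, ok := m[name]
	return fi, ok
}

func main() {
	var cf CurrentFiler = mapCFiler{
		"a.txt": {Name: "a.txt", Size: 42},
	}
	if fi, ok := cf.CurrentFile("a.txt"); ok {
		fmt.Println(fi.Name, fi.Size) // a.txt 42
	}
	_, ok := cf.CurrentFile("missing")
	fmt.Println(ok) // false
}
```

Because the scanner only depends on the interface, swapping the per-file DB query for a map lookup requires no change to the walk itself.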
3. Reuse Preload in Phase 2
Phase 2 previously spent significant time iterating the database. Now it reuses the preloaded map, eliminating this entirely:
4. Filesystem API Modernization
As groundwork, I migrated the codebase to Go's modern `os.ReadDir` API:

- Added `ReadDir` to the `Filesystem` interface
- While `DirEntry` still requires a second call (`Info()` returning a `FileInfo` for metadata), it's a cleaner API for future implementations

Bug Fixes
Deterministic Iteration Order
Tests initially failed because Go's random map iteration broke rename sequence ordering. I fixed this by:

- Adding `ORDER BY n.name` to the SQL query
- Iterating `for _, name := range sortedNames` instead of `range preloadedFiles`
- Deterministic ordering for `findRename` in Phase 1

Efficient Subdir Filtering
I optimized the prefix matching from nested loops to a single pass with early exit, reducing unnecessary iterations when scanning specific subdirectories.
Testing
All existing tests pass with these changes:

- `TestRenameSequenceOrder` validates the sorted iteration requirement
- `TestScanDeletedROChangedOnSR` confirms delete detection works correctly

To test manually, monitor scan logs on a folder with many files. The logs show internal timing breakdowns for each phase. The changes are internal refactorings that should not affect any user-visible behavior.
Future Work
Once this PR merges, I plan to implement an optimistic scanner.
This will be particularly valuable for sushitrain's iOS background scan requirements, where scan time is severely limited.