
draft: use db instead of storing items to be handled in memory while pulling#10183

Draft
imsodin wants to merge 14 commits into syncthing:main from imsodin:simplify-pull

Conversation

@imsodin
Member

@imsodin imsodin commented Jun 16, 2025

This is not ready to be merged, both because it's likely not finished and because it contains changes that probably shouldn't be there. I just realized it was a bit pointless not to show what I am talking about on the forum just because it's not ready, especially because I likely won't have time to get it anywhere closer to ready this week.

@calmh
Member

calmh commented Jun 17, 2025

This could do with just a couple of sentences describing the new intended mechanism, for context when sifting through the diffs :)

@imsodin
Member Author

imsodin commented Jun 17, 2025

To make the diff at least not entirely unreadable, look at the two sequences of commits separately; really only the second one is interesting.

Some quick pointers for that "actual change":

  1. I removed the deletion slices/maps and the file queue, instead using AllNeededGlobalFiles on every pass (three times).
  2. Along with the deletion map, the bucket used to detect renames is gone too. Instead, for every file to be pulled, I now check for already existing local files with the same hash using AllLocalFilesWithBlocksHash, and check with GetGlobalFile whether any of those will be deleted.
  3. As the files to be pulled aren't held in a queue anymore, we can't serve the needed-files listing straight from it either. That now happens from the DB, via a newly introduced variant of AllNeededGlobalFiles that returns only metadata, for efficiency.
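The rename detection in point 2 can be sketched roughly as follows. This is a hedged, self-contained sketch: the types and method signatures here (`FileInfo`, `DB`, the in-memory backing store) are hypothetical stand-ins; only the method names `AllLocalFilesWithBlocksHash` and `GetGlobalFile` come from the discussion, and the real Syncthing signatures differ.

```go
package main

import "fmt"

// FileInfo is a hypothetical, minimal stand-in for Syncthing's file
// metadata: just a name, a hash over all blocks, and a deleted flag.
type FileInfo struct {
	Name       string
	BlocksHash string
	Deleted    bool
}

// DB is an in-memory stand-in for the real database.
type DB struct {
	local  []FileInfo          // local files
	global map[string]FileInfo // global state, keyed by name
}

// AllLocalFilesWithBlocksHash returns local files whose content hash
// matches h (name taken from the PR discussion; signature is made up).
func (db *DB) AllLocalFilesWithBlocksHash(h string) []FileInfo {
	var out []FileInfo
	for _, f := range db.local {
		if f.BlocksHash == h {
			out = append(out, f)
		}
	}
	return out
}

// GetGlobalFile returns the global entry for a name, if any.
func (db *DB) GetGlobalFile(name string) (FileInfo, bool) {
	f, ok := db.global[name]
	return f, ok
}

// renameSource looks for a local file with identical content that is
// globally deleted: pulling "need" then becomes a local rename instead
// of a network transfer.
func renameSource(db *DB, need FileInfo) (string, bool) {
	for _, cand := range db.AllLocalFilesWithBlocksHash(need.BlocksHash) {
		if g, ok := db.GetGlobalFile(cand.Name); ok && g.Deleted {
			return cand.Name, true
		}
	}
	return "", false
}

func main() {
	db := &DB{
		local: []FileInfo{{Name: "old.txt", BlocksHash: "abc"}},
		global: map[string]FileInfo{
			"old.txt": {Name: "old.txt", BlocksHash: "abc", Deleted: true},
		},
	}
	src, ok := renameSource(db, FileInfo{Name: "new.txt", BlocksHash: "abc"})
	fmt.Println(src, ok) // old.txt true
}
```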

The changes above are already a lot, but there were also some inconsistencies I changed/fixed/papered over on the way, plus some unrelated tweaks; all of that needs weeding out. Again, this PR is very much just for reference in case you want to look something up about it, I wouldn't recommend investing time into it otherwise yet.

@calmh
Member

calmh commented Jun 17, 2025

Right, it sounds mostly reasonable. I need to look at the details of course, but two things strike me right away:

  • You can't do a lot of work while holding a db iterator, because all writes go to the WAL file while a select is open. This was one of the initial points of contention with the SQLite implementation: the WAL file can grow unbounded if we're receiving updates or making changes with an open iterator. Short iterator passes are key (in time, so think "this should take at most a couple of seconds" per loop).
  • There may be consistency issues with the multiple passes, like: we create needed directories in pass one, but pass two may include more files, and their required directories, than we knew about in pass one...

@imsodin
Member Author

imsodin commented Jun 17, 2025

Argh, yeah, the first aspect will likely be problematic with my approach. Actually it is problematic as-is for sure: nothing short-lived happens during the second phase of the iteration. But it's probably solvable. As we do ordered passes, maybe I'll just add a timeout, release the iterator and start a new one? I'll have to think about this a bit more and try some things.

The second one I did consider, and from what I can see we diligently check for issues like that. That is, an inconsistency can happen, but it won't cause problems beyond single items failing, and those will be redone/fixed on the next pull iteration.

@calmh
Member

calmh commented Jun 17, 2025

I think targeted queries with limit clauses can work around some of it, e.g. grab the first 25 needed items of type directory and process those, then grab the first 25 files and process them in memory, etc.
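The per-type LIMIT idea might look roughly like this. A hedged sketch only: `needed` and `firstN` are hypothetical, with `firstN` standing in for a `SELECT ... WHERE type = ? ORDER BY name LIMIT n` query against the real database.

```go
package main

import "fmt"

// needed is a hypothetical needed-item record.
type needed struct {
	name  string
	isDir bool
}

// firstN stands in for a LIMIT-bounded query: return at most n needed
// items of the requested type. Each call is a short, cheap DB pass.
func firstN(all []needed, dir bool, n int) []needed {
	var out []needed
	for _, f := range all {
		if f.isDir == dir {
			out = append(out, f)
			if len(out) == n {
				break
			}
		}
	}
	return out
}

func main() {
	all := []needed{{"d1", true}, {"f1", false}, {"d2", true}, {"f2", false}}
	dirs := firstN(all, true, 25)   // create needed directories first
	files := firstN(all, false, 25) // then pull files, batch by batch
	fmt.Println(len(dirs), len(files)) // 2 2
}
```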

@imsodin
Member Author

imsodin commented Jun 17, 2025

I was thinking timeout because an item count isn't necessarily well correlated with time spent processing. Anyway, details; we could always combine both if necessary. However, what seems simple at first isn't, I think: we don't handle all the needed items while iterating, so when we iterate over, say, the first 50, we may handle only 10 right away. On the next iteration we should then skip the first 40 needed items. We'd have to keep track of that, which is already annoying, and other changes like index updates might also throw a spanner in the works. As we do an ordered iteration, we could remember the last handled file and, instead of using an offset, search until that one. Better, but also not great (Edit: also doesn't work, we might have handled that file, so it's no longer present. We'd have to use the value of the ordered field, urgh). Ideally we'd have read snapshots :P
Or go back to having a queue, but put it in a (separate) DB: do one main DB iteration pass to put all relevant info into that separate db, then operate on that until done, then discard it. That's a bit ugly/heavy-handed though.
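The "use the value of the ordered field" idea from the edit above is keyset pagination, which survives items disappearing mid-run. A hypothetical sketch (not Syncthing code): resume each batch strictly after the last seen key instead of using an offset, so it stays correct even when already-handled items are gone.

```go
package main

import (
	"fmt"
	"sort"
)

// nextBatch returns up to limit names that sort strictly after the
// resume key. Because we compare against the ordered field's value
// rather than counting an offset, items removed since the previous
// batch don't shift the window.
func nextBatch(names []string, after string, limit int) []string {
	sort.Strings(names)
	var out []string
	for _, n := range names {
		if n <= after {
			continue
		}
		out = append(out, n)
		if len(out) == limit {
			break
		}
	}
	return out
}

func main() {
	need := []string{"a", "b", "c", "d", "e"}
	b1 := nextBatch(need, "", 2) // first batch: a, b
	last := b1[len(b1)-1]

	// Suppose "a" was handled and is no longer present; resuming by
	// key still lands exactly where we left off.
	need = []string{"b", "c", "d", "e"}
	fmt.Println(nextBatch(need, last, 2)) // [c d]
}
```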
