draft: use db instead of storing items to be handled in memory while pulling #10183
imsodin wants to merge 14 commits into syncthing:main from
Conversation
This could do with just a couple of sentences describing the new intended mechanism, for context when sifting through the diffs :)
To make the diff at least not entirely unreadable, look at these two sequences of commits separately; only the second one is interesting:
Some quick pointers for that "actual change":
The changes above are already a lot, but there were also some inconsistencies I just changed/fixed/papered over on the way, plus some random actions along the way; all of that needs weeding out. Again, this PR is very much just for reference if you happen to want to check out something about it. I wouldn't recommend investing time into it otherwise yet.
Right, it sounds mostly reasonable. I need to look at the details of course; two things that strike me right away:
Argh, yeah, the first aspect will likely be problematic with my approach. Actually it is problematic as-is for sure; there's nothing short-lived about what happens during the second phase of the iteration, but it's probably solvable. Since we do ordered passes, maybe I'll just add a timeout, release, and start a new iteration? I'll have to think about this a bit more and try some things. The second one I did consider, but from what I see we diligently check for issues like that. That is, an inconsistency can happen, but it won't cause problems beyond single items failing, and those will be redone/fixed on the next pull iteration.
I think targeted queries with LIMIT clauses can work around some of it, e.g. grab the first 25 needed items of type directory and process those, then grab the first 25 files and process them in memory, etc.
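A minimal sketch of such a batched, per-type pass, with an in-memory slice standing in for a `SELECT ... WHERE type = ? ORDER BY name LIMIT ?` query. All names and the schema here are illustrative assumptions, not Syncthing's actual DB layout.

```go
package main

import "fmt"

// kind distinguishes item types, like the "type" column would in the DB.
type kind int

const (
	dirKind kind = iota
	fileKind
)

// row stands in for one row of a hypothetical "needed" table.
type row struct {
	name string
	typ  kind
}

// queryNeeded mimics a LIMIT query: return at most limit rows of the
// given type, in table order.
func queryNeeded(table []row, typ kind, limit int) []row {
	var out []row
	for _, r := range table {
		if r.typ == typ {
			out = append(out, r)
			if len(out) == limit {
				break
			}
		}
	}
	return out
}

func main() {
	table := []row{
		{"d1", dirKind}, {"f1", fileKind},
		{"d2", dirKind}, {"f2", fileKind}, {"f3", fileKind},
	}
	// Handle directories first, then files, in small batches.
	fmt.Println(queryNeeded(table, dirKind, 25))
	fmt.Println(queryNeeded(table, fileKind, 2))
}
```

Repeating such a query until it returns empty gives a bounded working set per batch instead of holding all needed items in memory at once.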
I was thinking of a timeout because an item count isn't necessarily well correlated with time spent processing. Anyway, details; we could always combine both if necessary. However, what seems simple at first isn't, I think: we don't handle all the needed items when iterating, so when we iterate over, say, the first 50, we may handle only 10 right away. On the next iteration we should then skip the first 40 needed items. We'd have to keep track of that, which is already annoying, plus other changes like index updates might also throw a spanner in the works. As we do an ordered iteration, we could remember the last handled file and then, instead of using an offset, search until that one. Better, but also not great. (Edit: That also doesn't work either; we might have handled that file, so it's no longer present. We'd have to use the value of the ordered field, urgh.) Ideally we'd have read snapshots :P
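The "use the value of the ordered field" idea is essentially keyset pagination: resume from a `WHERE name > ? ORDER BY name LIMIT ?` cursor instead of an offset, which stays correct even when earlier rows have since been handled and deleted. A toy sketch, with an in-memory slice in place of the real query (names are made up):

```go
package main

import (
	"fmt"
	"sort"
)

// resumeAfter mimics "SELECT name FROM needed WHERE name > ?
// ORDER BY name LIMIT ?": keyset pagination on the ordered field,
// so the cursor stays valid even if rows before it were deleted.
func resumeAfter(names []string, after string, limit int) []string {
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)
	var out []string
	for _, n := range sorted {
		if n > after && len(out) < limit {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	needed := []string{"a", "b", "c", "d", "e"}
	fmt.Println(resumeAfter(needed, "", 2)) // first page
	// Suppose "a" and "b" were handled and removed from the table;
	// resuming after "b" still yields the right next page, even though
	// row offsets have shifted.
	remaining := []string{"c", "d", "e"}
	fmt.Println(resumeAfter(remaining, "b", 2))
}
```

Unlike an offset, the cursor value doesn't need to refer to a row that still exists, which addresses the "we might have handled that file" problem; it just doesn't help with concurrent index updates the way a real read snapshot would.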
This is not ready to be merged, both because it's likely not finished and because it contains changes that probably shouldn't be there. I just realized it was a bit pointless not to show what I'm talking about on the forum just because it's not ready, especially since I likely won't have time to get it anywhere closer to ready this week.