Description
For the past couple of months I have been investigating reports of "stuck" transfers and overall poor performance for high-throughput users. These reports come to me as GitHub Issues, Discord messages, and Reddit posts.
I have painstakingly refactored the upload and download logic in #1456 and #1462 to try to ensure that there is no code path in which a transfer can conclude without transitioning the corresponding database record into a terminal (Completed) state. The problem persists.
Copy-on-write filesystems can reorder writes, and this is apparently causing transfer record updates to arrive out of order in the SQLite write-ahead log. The effect is that two writes that happen within a few milliseconds of one another, such as the transition to InProgress followed by the transition to Completed, can be reversed, leaving the transfer record in the InProgress state after the transfer has finished. I haven't proven this to be true, but the research that I have done suggests that it is possible (sycophantic LLMs agree, fwiw).
Next is the problem of user limits. When deciding whether to allow a user to enqueue a file, slskd must query the transfers database and compute how many files and megabytes that user has downloaded or enqueued over the past day and week. In testing against a transfers database with ~150k records, these decisions take up to 15 seconds. This causes all kinds of odd behavior, and it causes transfers to fail at various stages.
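To make the shape of that limit check concrete, here is a minimal sketch in Python using the standard `sqlite3` module. It is not slskd's actual code or schema: the `transfers` table, its columns (`username`, `size`, `requested_at`, `state`), and the `usage_since` helper are all hypothetical names chosen for illustration.

```python
import sqlite3

DAY = 86_400  # seconds


def usage_since(conn, username, since_epoch):
    """Return (file_count, megabytes) for a user's transfers at or after since_epoch.

    Hypothetical schema: timestamps are stored as integer epoch seconds.
    """
    count, total_bytes = conn.execute(
        """
        SELECT COUNT(*), COALESCE(SUM(size), 0)
        FROM transfers
        WHERE username = ? AND requested_at >= ?
        """,
        (username, since_epoch),
    ).fetchone()
    return count, total_bytes / 1_000_000


# Small in-memory database standing in for the real transfers database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transfers (username TEXT, size INTEGER, requested_at INTEGER, state TEXT)"
)

now = 1_700_000_000  # fixed "current time" so the example is deterministic
conn.executemany(
    "INSERT INTO transfers VALUES (?, ?, ?, ?)",
    [
        ("alice", 5_000_000, now - 2 * 3600, "Completed"),   # 2 hours ago
        ("alice", 3_000_000, now - 3 * DAY, "Completed"),    # 3 days ago
        ("alice", 2_000_000, now - 10 * DAY, "Completed"),   # outside both windows
        ("bob", 1_000_000, now - 3600, "InProgress"),        # different user
    ],
)

daily = usage_since(conn, "alice", now - DAY)
weekly = usage_since(conn, "alice", now - 7 * DAY)
```

Each enqueue decision needs at least the daily and weekly aggregates, so under this assumed design every incoming request pays for two scans of the user's transfer history before it can be accepted or rejected.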
I don't have a solution yet, but it is very clear that SQLite needs to be removed from any "hot path" in the application and relegated to historical information storage only.
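One way to picture what "off the hot path" could mean for the limit check is an in-memory rolling counter that answers the question without touching the database. This is only an illustrative sketch in Python, not a proposed or chosen design; the `RollingUsage` class and its methods are hypothetical, and it assumes events are recorded in timestamp order.

```python
from collections import defaultdict, deque


class RollingUsage:
    """Per-user file count and byte total over a sliding time window (hypothetical)."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = defaultdict(deque)  # username -> deque of (timestamp, size)
        self.totals = defaultdict(int)    # username -> bytes currently inside the window

    def record(self, username, size, now):
        # Assumes calls arrive in non-decreasing timestamp order.
        self.events[username].append((now, size))
        self.totals[username] += size

    def usage(self, username, now):
        """Evict expired events, then return (file_count, total_bytes)."""
        queue = self.events[username]
        cutoff = now - self.window
        while queue and queue[0][0] < cutoff:
            _, size = queue.popleft()
            self.totals[username] -= size
        return len(queue), self.totals[username]


daily = RollingUsage(window_seconds=86_400)
daily.record("alice", 5_000_000, now=0)
daily.record("alice", 3_000_000, now=50_000)

inside_window = daily.usage("alice", now=60_000)    # both events still count
after_expiry = daily.usage("alice", now=100_000)    # the first event has aged out
```

In a sketch like this, SQLite would only be written to for history and read at startup to rebuild the counters; how the real application should handle that is still an open question.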