You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Use sharding for sqlite cache (16 shards) (#21292)
SQLite writes can become a major bottleneck for parallel runs, since
only one write can be active at any time. Sharding helps a lot and was
pretty easy to implement and reason about.
We need to be a bit careful to not have transactions that span multiple
shards, as it might cause deadlocks. Sharding is based on path name
without file name extension(s), so cache data for a single module goes
to the same shard always.
Use a predictable string hash function that is tuned for mypyc. It's
much faster than say SHA-1 (though hashing probably isn't a huge
bottleneck).
A version of this with 8 shards was on the order of 20% faster in some
cases when using 8 workers. The impact was bigger on macOS, but Linux
was also better (at least on a cloud VM). Before merging, I'll run some
benchmarks to validate that 16 shards don't regress anything.
Used coding agent assist here, but did things here in small reviewed
increments.
Also updated the cache conversion and diff scripts (tested manually
using coding agent).
Related to #21215.
0 commit comments