Create database indices for unread items and deleted urls #2057
noctux wants to merge 1 commit into newsboat:master
As reported by user Evil_Bob on IRC, we have several SQL queries on our
critical startup path that query the rss_items table on 1) unread
items and 2) a combination of feedurl and the deleted-flag.
He suggests that these queries can be sped up significantly by two
additional indices, which this commit introduces.
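A minimal sketch of what two such indices could look like, using Python's built-in sqlite3 module on an in-memory database. The reduced rss_items schema and the index names (idx_sketch_unread, idx_sketch_feedurl_deleted) are assumptions made for illustration; they are not taken from this commit.

```python
import sqlite3

# Hypothetical, heavily reduced stand-in for the rss_items table;
# the real table has more columns.
SCHEMA = """
CREATE TABLE rss_items (
    id INTEGER PRIMARY KEY,
    feedurl TEXT NOT NULL,
    unread INTEGER NOT NULL DEFAULT 1,
    deleted INTEGER NOT NULL DEFAULT 0
);
"""

# Index names are invented for this sketch.
INDICES = """
CREATE INDEX IF NOT EXISTS idx_sketch_unread
    ON rss_items(unread);
CREATE INDEX IF NOT EXISTS idx_sketch_feedurl_deleted
    ON rss_items(feedurl, deleted);
"""


def create_db(with_indices: bool) -> sqlite3.Connection:
    """Create an in-memory database, optionally with the two indices."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    if with_indices:
        conn.executescript(INDICES)
    return conn


def index_names(conn: sqlite3.Connection) -> list:
    """Return the names of all indices defined on rss_items, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'rss_items' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(index_names(create_db(with_indices=True)))
```

IF NOT EXISTS keeps the statements safe to run more than once, which is convenient when they are part of a schema migration.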
Some initial (not very scientific) measurements on my personal database:
(for i in {0..9}; do time newsboat -x print-unread; done):
Without indices:
newsboat -x print-unread 5,46s user 5,80s system 95% cpu 11,830 total
newsboat -x print-unread 5,44s user 5,83s system 95% cpu 11,827 total
newsboat -x print-unread 5,37s user 5,76s system 95% cpu 11,691 total
newsboat -x print-unread 5,33s user 5,84s system 95% cpu 11,748 total
newsboat -x print-unread 5,54s user 5,77s system 95% cpu 11,889 total
newsboat -x print-unread 5,36s user 5,86s system 94% cpu 11,931 total
newsboat -x print-unread 5,32s user 6,02s system 95% cpu 11,912 total
newsboat -x print-unread 5,22s user 5,95s system 95% cpu 11,735 total
newsboat -x print-unread 5,48s user 5,82s system 95% cpu 11,854 total
newsboat -x print-unread 5,34s user 5,85s system 95% cpu 11,756 total
With indices:
newsboat -x print-unread 1,27s user 0,34s system 74% cpu 2,179 total
newsboat -x print-unread 1,22s user 0,35s system 74% cpu 2,113 total
newsboat -x print-unread 1,21s user 0,34s system 73% cpu 2,099 total
newsboat -x print-unread 1,22s user 0,34s system 73% cpu 2,120 total
newsboat -x print-unread 1,24s user 0,35s system 72% cpu 2,201 total
newsboat -x print-unread 1,25s user 0,31s system 73% cpu 2,125 total
newsboat -x print-unread 1,19s user 0,38s system 74% cpu 2,124 total
newsboat -x print-unread 1,23s user 0,32s system 73% cpu 2,115 total
newsboat -x print-unread 1,21s user 0,33s system 73% cpu 2,095 total
newsboat -x print-unread 1,19s user 0,36s system 73% cpu 2,099 total
On the other hand, the indices consume a bit more space in the
.db file:
Before: ./cache.db: 911M
After: ./cache.db: 924M
Still, this seems like an adequate trade-off to make.
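To put the numbers above side by side, here is a small sketch that averages the "total" (wall-clock) column of both runs and computes the relative growth of the file. The values are copied from the output above, with decimal commas converted to decimal points; nothing else is assumed.

```python
from statistics import mean

# "total" wall-clock seconds from the timing output above.
without_indices = [11.830, 11.827, 11.691, 11.748, 11.889,
                   11.931, 11.912, 11.735, 11.854, 11.756]
with_indices = [2.179, 2.113, 2.099, 2.120, 2.201,
                2.125, 2.124, 2.115, 2.095, 2.099]

# cache.db sizes in MB, as reported above.
size_before_mb, size_after_mb = 911, 924


def summarize(before, after):
    """Return (mean before, mean after, speedup factor)."""
    return mean(before), mean(after), mean(before) / mean(after)


mean_before, mean_after, speedup = summarize(without_indices, with_indices)
growth_percent = (size_after_mb - size_before_mb) / size_before_mb * 100

print(f"mean without indices: {mean_before:.2f} s")
print(f"mean with indices:    {mean_after:.2f} s")
print(f"speedup:              {speedup:.1f}x")
print(f"cache.db growth:      {growth_percent:.1f} %")
```

On these figures the mean drops from roughly 11.8 s to roughly 2.1 s (about 5.6x), while the file grows by about 1.4 %.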
Weird, I can't reproduce the speedup. For me, this PR is consistently 1% faster than the current master. What I can reproduce is the size increase :) 4.4% for my database. It's okay as long as we get a comparable speed increase, I think.
Ideas from IRC discussion with noctux and Evil_Bob:
Nope, can't reproduce. Things I tried:
Each test run shows that this PR is 1% faster than the baseline, which is slightly comforting :) I'm starting to wonder if my data is somehow non-representative. @noctux, does your cache.db contain anything private? :) Would it be possible to share it with me (privately, under a promise that I won't pass it on and will destroy my copy once the mystery here is solved)? Also, if you have a minute, can you please re-run your tests with
Notes from a couple discussions we had with Evil_Bob and noctux on IRC:
I want to get to the bottom of this, because I feel uneasy merging something that doesn't quite behave the way we expect it to. OTOH we haven't seen a case where this PR is slower than current master, so maybe I'll just cave in and merge it even if I don't understand the behaviour. I think the next step is to look at what SQL queries should use these indices, and compare the output of SQL
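One way to check whether a query would use an index is SQLite's EXPLAIN QUERY PLAN. The sketch below compares plans before and after creating the indices on an empty in-memory table. The schema, the two queries and the index names are assumptions for illustration, not Newsboat's actual statements, and a plan on a toy table says nothing about timings on a real cache.db.

```python
import sqlite3

# Reduced, hypothetical schema; the real rss_items table has more columns.
SCHEMA = (
    "CREATE TABLE rss_items ("
    "id INTEGER PRIMARY KEY, feedurl TEXT, unread INTEGER, deleted INTEGER)"
)

# Illustrative queries, not copied from Newsboat.
QUERIES = {
    "unread": ("SELECT count(id) FROM rss_items WHERE unread = 1", ()),
    "feed": (
        "SELECT id FROM rss_items WHERE feedurl = ? AND deleted = 0",
        ("https://example.com/feed.xml",),
    ),
}

# Index names are invented for this sketch.
INDICES = [
    "CREATE INDEX idx_sketch_unread ON rss_items(unread)",
    "CREATE INDEX idx_sketch_feedurl_deleted ON rss_items(feedurl, deleted)",
]


def query_plan(conn, sql, params):
    """Return the human-readable plan; the last column holds the detail text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


def compare_plans():
    """Collect the plan of every query before and after adding the indices."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    before = {k: query_plan(conn, sql, p) for k, (sql, p) in QUERIES.items()}
    for statement in INDICES:
        conn.execute(statement)
    after = {k: query_plan(conn, sql, p) for k, (sql, p) in QUERIES.items()}
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = compare_plans()
    for name in QUERIES:
        print(f"{name}: {before[name]}  ->  {after[name]}")
```

A plan that changes from a full table scan to a search using one of the indices shows that the planner picks the index up for that query. Running the same comparison against a copy of a real cache.db would be needed to say anything about the queries on the startup path.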
Please note: the release number embedded in the SQL migration has to be bumped on the next release. I don't know what the release engineering process is at this point :)