
Create database indices for unread items and deleted urls #2057

Open

noctux wants to merge 1 commit into newsboat:master from noctux:additional-sqlite-indices

Conversation

@noctux (Contributor) commented May 6, 2022

As reported by user Evil_Bob on IRC, we have several SQL queries on our
critical startup path that query the rss_items table on 1) unread
items and 2) a combination of feedurl and the deleted flag.
He suggests that this can be sped up significantly by introducing two
additional indices, which this commit adds.
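
For reference, a minimal sketch of what two such indices look like in SQLite, as run in the sqlite3 shell; the index names here are made up, and the actual table/column names in the commit may differ from the rss_items/unread/feedurl/deleted naming used above:

  CREATE INDEX IF NOT EXISTS idx_rss_items_unread ON rss_items(unread);
  CREATE INDEX IF NOT EXISTS idx_rss_items_feedurl_deleted ON rss_items(feedurl, deleted);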

Some initial (not very scientific) measurements on my personal database
(for i in {0..9}; do time newsboat -x print-unread; done):

Without indices:

newsboat -x print-unread  5,46s user 5,80s system 95% cpu 11,830 total
newsboat -x print-unread  5,44s user 5,83s system 95% cpu 11,827 total
newsboat -x print-unread  5,37s user 5,76s system 95% cpu 11,691 total
newsboat -x print-unread  5,33s user 5,84s system 95% cpu 11,748 total
newsboat -x print-unread  5,54s user 5,77s system 95% cpu 11,889 total
newsboat -x print-unread  5,36s user 5,86s system 94% cpu 11,931 total
newsboat -x print-unread  5,32s user 6,02s system 95% cpu 11,912 total
newsboat -x print-unread  5,22s user 5,95s system 95% cpu 11,735 total
newsboat -x print-unread  5,48s user 5,82s system 95% cpu 11,854 total
newsboat -x print-unread  5,34s user 5,85s system 95% cpu 11,756 total

With indices:

newsboat -x print-unread  1,27s user 0,34s system 74% cpu 2,179 total
newsboat -x print-unread  1,22s user 0,35s system 74% cpu 2,113 total
newsboat -x print-unread  1,21s user 0,34s system 73% cpu 2,099 total
newsboat -x print-unread  1,22s user 0,34s system 73% cpu 2,120 total
newsboat -x print-unread  1,24s user 0,35s system 72% cpu 2,201 total
newsboat -x print-unread  1,25s user 0,31s system 73% cpu 2,125 total
newsboat -x print-unread  1,19s user 0,38s system 74% cpu 2,124 total
newsboat -x print-unread  1,23s user 0,32s system 73% cpu 2,115 total
newsboat -x print-unread  1,21s user 0,33s system 73% cpu 2,095 total
newsboat -x print-unread  1,19s user 0,36s system 73% cpu 2,099 total

Of course, on the other hand, the indices consume a bit more space in the
.db file:

Before: ./cache.db: 911M
After: ./cache.db: 924M

Still, this seems like an adequate trade-off to make.

Please note: the release number embedded in the SQL migration has to be bumped for the next release. I don't know what the release-engineering process is at this point :)

@coveralls

Coverage Status

Coverage increased (+0.009%) to 59.291% when pulling 3cc6bbc on noctux:additional-sqlite-indices into 78506f2 on newsboat:master.

@Minoru (Member) commented May 9, 2022

Weird, I can't reproduce the speedup. For me, this PR is consistently 1% faster than the current master, but that's it. This might be due to some settings I have in my config, or maybe I should experiment on a real disk instead of tmpfs. I'll try different configurations later and report back.

What I can reproduce is the size increase :) 4.4% for my database. It's okay as long as we get a comparable speed increase, I think.

@Minoru (Member) commented May 9, 2022

Ideas from IRC discussion with noctux and Evil_Bob:

  • try this on a real disk, not tmpfs
  • try applying the patch to the latest release, not master
    • noctux's measurements above weren't even done with this PR; the indices were added manually. I wonder: could our DB versioning be the culprit? (A sketch of how to check which indices a database actually contains follows this list.)
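
A quick way to see which indices a database already has (a hedged sketch for the sqlite3 shell; the table name follows the description above and may differ in the real schema):

  SELECT name, sql FROM sqlite_master WHERE type = 'index' AND tbl_name = 'rss_items';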

@Minoru (Member) commented May 11, 2022

Nope, can't reproduce. Things I tried:

  • running the tests on HDD rather than tmpfs (with plenty of free RAM for the page cache, and one warm-up run before each 10-run test)
  • cherry-picking this PR onto r2.27 tag (i.e. applying it to the latest release rather than master)
  • adding the indices manually (actually, I ran the same binary on two different databases, one "stock", the other one already converted by an earlier invocation of the code from this PR)
  • running with default settings (--config-file=/dev/null)

Each test run shows that this PR is 1% faster than the baseline, which is slightly comforting :)

I'm starting to wonder if my data is somehow non-representative. @noctux, does your cache.db contain anything private? :) Would it be possible to share it with me (privately, under a promise that I won't pass it on and will destroy my copy once the mystery here is solved)?

Also, if you have a minute, can you please re-run your tests with --config-file=/dev/null to make sure that your config doesn't affect this? (It's not very likely that both you and Evil_Bob have a setting that I don't, but hey, it's easy enough to check.)

@Minoru (Member) commented May 28, 2022

Notes from a couple of discussions we had with Evil_Bob and noctux on IRC:

  • I made a subset of my cache.db, and neither Evil_Bob nor noctux sees the expected speedup on it either! So my data is to blame for my (lack of) results above
  • Evil_Bob dumped my subset cache.db, re-created it as a fresh database, and still couldn't observe the promised speedup: the run went from 17 seconds on my file to 14 on his, but there was no difference between the indexed and non-indexed cases
  • my subset cache.db has a page size of 1024 bytes, whereas newly created databases have 4096 (I found that out using .dbinfo in the sqlite3 shell). If I take my subset cache.db, open the sqlite3 shell, and type pragma page_size=4096; vacuum; (the full sequence is sketched after this list), then echo qy | newsboat goes from 20 seconds to 16.8 (on tmpfs). Still no difference between the indexed and non-indexed cases though. The database with 4K pages is 0.9% smaller
  • Evil_Bob wrote a script that generates a bunch of feeds and a urls file that includes them. With this setup, I can reproduce the speedup promised by this PR: 15s went down to 3.6s (12s to 3.18s for Evil_Bob)
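
The page-size change mentioned above, spelled out as a minimal sketch for the sqlite3 shell (back up cache.db first; VACUUM rewrites the entire file):

  PRAGMA page_size;          -- show the current page size (1024 for the subset database)
  PRAGMA page_size = 4096;   -- request the new page size
  VACUUM;                    -- rebuild the file so the new page size takes effect
  PRAGMA page_size;          -- should now report 4096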

I want to get to the bottom of this, because I feel uneasy merging something that doesn't quite behave the way we expect it to. OTOH we haven't seen a case where this PR is slower than current master, so maybe I'll just cave in and merge it even if I don't understand the behaviour.

I think the next step is to look at what SQL queries should use these indices, and compare the output of SQLite's ANALYZE between my subset DB and the one generated by the script.
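
One way to check whether a given query can actually use the new indices (not necessarily the comparison meant above; the query and index name below are made up for illustration) is EXPLAIN QUERY PLAN in the sqlite3 shell:

  EXPLAIN QUERY PLAN SELECT count(*) FROM rss_items WHERE unread = 1;
  -- with a usable index, the plan reads something like:
  --   SEARCH rss_items USING COVERING INDEX idx_rss_items_unread (unread=?)
  -- without it, it falls back to a full table scan:
  --   SCAN rss_items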
