@JackDanger

Database Migration

NO

Description

With ~200,000 albums in the database, the queries that fetch missing albums become very slow, even on a fast PostgreSQL database.

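| Problem               | Cause                                                                      |
|-----------------------|----------------------------------------------------------------------------|
| High row count        | One row per missing track is selected before grouping and sorting          |
| Large sort in memory  | Quicksort uses about 1.5 GB (1,493,490 kB) to sort about 2.4 million rows  |
| Expensive anti join   | On the `TrackFiles.Id IS NULL` condition                                   |
| Grouping before LIMIT | The entire result is grouped before LIMIT/OFFSET is applied                |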
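
The row-count problem and the grouping-before-LIMIT problem come from the same fan-out, which a toy dataset reproduces. The following is a minimal sketch, assuming Python's built-in `sqlite3` as a stand-in for PostgreSQL, the table names from the query with a trimmed synthetic schema, and hypothetical helpers `build_db` and `row_counts`: each missing track contributes one joined row, and `GROUP BY` must collapse all of them before `LIMIT`/`OFFSET` can apply.

```python
import sqlite3

# Trimmed, synthetic schema: only the columns the missing-track join touches.
SCHEMA = """
CREATE TABLE Albums (Id INTEGER PRIMARY KEY);
CREATE TABLE AlbumReleases (Id INTEGER PRIMARY KEY, AlbumId INTEGER, Monitored INTEGER);
CREATE TABLE Tracks (Id INTEGER PRIMARY KEY, AlbumReleaseId INTEGER, TrackFileId INTEGER);
CREATE TABLE TrackFiles (Id INTEGER PRIMARY KEY);
"""

# Same join shape as the original query, before its GROUP BY.
JOINED = """
SELECT Albums.Id FROM Albums
JOIN AlbumReleases ON Albums.Id = AlbumReleases.AlbumId
JOIN Tracks ON AlbumReleases.Id = Tracks.AlbumReleaseId
LEFT JOIN TrackFiles ON Tracks.TrackFileId = TrackFiles.Id
WHERE TrackFiles.Id IS NULL AND AlbumReleases.Monitored = 1
"""


def build_db(albums: int, tracks_per_album: int) -> sqlite3.Connection:
    """Each album gets one monitored release whose tracks all lack a file."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    track_id = 0
    for album_id in range(1, albums + 1):
        conn.execute("INSERT INTO Albums VALUES (?)", (album_id,))
        conn.execute("INSERT INTO AlbumReleases VALUES (?, ?, 1)", (album_id, album_id))
        for _ in range(tracks_per_album):
            track_id += 1
            # TrackFileId 0 has no TrackFiles row, so the LEFT JOIN leaves it NULL (missing).
            conn.execute("INSERT INTO Tracks VALUES (?, ?, 0)", (track_id, album_id))
    return conn


def row_counts(conn: sqlite3.Connection) -> tuple[int, int]:
    """Return (rows produced by the join, rows left after GROUP BY Albums.Id)."""
    joined = conn.execute(f"SELECT COUNT(*) FROM ({JOINED})").fetchone()[0]
    grouped = conn.execute(f"SELECT COUNT(*) FROM ({JOINED} GROUP BY Albums.Id)").fetchone()[0]
    return joined, grouped


print(row_counts(build_db(100, 12)))  # (1200, 100)
```

With 100 albums of 12 missing tracks each, the join produces 1,200 rows that `GROUP BY` reduces to 100. The original plan shows the same pattern at scale: 2,359,531 rows sorted and grouped down to 149,339 before the `LIMIT`.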

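This is one approach to optimizing these queries without significantly changing the query builder: the outer query joins only `Albums` and `Artists`, and the join through `AlbumReleases`, `Tracks`, and `TrackFiles` that finds missing tracks moves into an `IN (...)` subquery. In the plans below, execution time drops from about 23.9 s to about 2.2 s.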

The original query's `EXPLAIN ANALYZE` output:

lidarr_main=> explain analyze SELECT "Albums".* FROM "Albums" JOIN "Artists" ON ("Albums"."ArtistMetadataId" = "Artists"."ArtistMetadataId") JOIN "AlbumReleases" ON ("Albums"."Id" = "AlbumReleases"."AlbumId") JOIN "Tracks" ON ("AlbumReleases"."Id" = "Tracks"."AlbumReleaseId") LEFT JOIN "TrackFiles" ON ("Tracks"."TrackFileId" = "TrackFiles"."Id") WHERE ("TrackFiles"."Id" IS NULL) AND ("AlbumReleases"."Monitored" = 't') AND ("Albums"."ReleaseDate" <= '2025-04-01') AND (("Albums"."Monitored" = 't') AND ("Artists"."Monitored" = 't')) GROUP BY "Albums"."Id" , "Artists"."SortName" ORDER BY "Albums"."ReleaseDate" DESC LIMIT 20 OFFSET 20;

                                                                                                              QUERY PLAN                                                                                                              
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=89169.19..89169.19 rows=1 width=571) (actual time=23750.310..23752.152 rows=20 loops=1)
   ->  Sort  (cost=89169.18..89169.19 rows=1 width=571) (actual time=23750.307..23752.148 rows=40 loops=1)
         Sort Key: "Albums"."ReleaseDate" DESC
         Sort Method: top-N heapsort  Memory: 81kB
         ->  Group  (cost=89169.16..89169.17 rows=1 width=571) (actual time=23395.390..23692.280 rows=149339 loops=1)
               Group Key: "Albums"."Id", "Artists"."SortName"
               ->  Sort  (cost=89169.16..89169.17 rows=1 width=571) (actual time=23395.381..23527.284 rows=2359531 loops=1)
                     Sort Key: "Albums"."Id", "Artists"."SortName"
                     Sort Method: quicksort  Memory: 1493490kB
                     ->  Nested Loop  (cost=9761.97..89169.15 rows=1 width=571) (actual time=120.457..21173.061 rows=2359531 loops=1)
                           ->  Nested Loop  (cost=9761.68..89168.84 rows=1 width=560) (actual time=120.439..17482.768 rows=2558513 loops=1)
                                 ->  Nested Loop  (cost=9761.26..89168.04 rows=1 width=4) (actual time=113.705..11849.661 rows=2945179 loops=1)
                                       ->  Gather  (cost=9760.84..89167.59 rows=1 width=4) (actual time=111.690..1867.697 rows=5526730 loops=1)
                                             Workers Planned: 5
                                             Workers Launched: 0
                                             ->  Parallel Hash Anti Join  (cost=8760.84..88167.49 rows=1 width=4) (actual time=111.120..1544.392 rows=5526730 loops=1)
                                                   Hash Cond: ("Tracks"."TrackFileId" = "TrackFiles"."Id")
                                                   ->  Parallel Index Only Scan using idx_tracks_albumreleaseid_trackfileid on "Tracks"  (cost=0.43..75618.48 rows=1010293 width=8) (actual time=0.016..597.486 rows=5826988 loops=1)
                                                         Heap Fetches: 58839
                                                   ->  Parallel Hash  (cost=7533.59..7533.59 rows=98145 width=4) (actual time=107.524..107.524 rows=300865 loops=1)
                                                         Buckets: 524288  Batches: 1  Memory Usage: 15872kB
                                                         ->  Parallel Index Only Scan using idx_trackfiles_id on "TrackFiles"  (cost=0.42..7533.59 rows=98145 width=4) (actual time=0.033..37.543 rows=300865 loops=1)
                                                               Heap Fetches: 16938
                                       ->  Index Scan using "PK_AlbumReleases" on "AlbumReleases"  (cost=0.42..0.45 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=5526730)
                                             Index Cond: ("Id" = "Tracks"."AlbumReleaseId")
                                             Filter: "Monitored"
                                             Rows Removed by Filter: 0
                                 ->  Index Scan using "PK_Albums" on "Albums"  (cost=0.42..0.80 rows=1 width=560) (actual time=0.002..0.002 rows=1 loops=2945179)
                                       Index Cond: ("Id" = "AlbumReleases"."AlbumId")
                                       Filter: ("Monitored" AND ("ReleaseDate" <= '2025-04-01 00:00:00+00'::timestamp with time zone))
                                       Rows Removed by Filter: 0
                           ->  Index Scan using idx_artists_artistmetadataid on "Artists"  (cost=0.29..0.32 rows=1 width=15) (actual time=0.001..0.001 rows=1 loops=2558513)
                                 Index Cond: ("ArtistMetadataId" = "Albums"."ArtistMetadataId")
                                 Filter: "Monitored"
                                 Rows Removed by Filter: 0
 Planning Time: 3.742 ms
 Execution Time: 23850.137 ms
(37 rows)

The new query's `EXPLAIN ANALYZE` output:

lidarr_main=> explain analyze SELECT "Albums".* FROM "Albums" JOIN "Artists" ON ("Albums"."ArtistMetadataId" = "Artists"."ArtistMetadataId") WHERE (("Albums"."Monitored" = true) AND ("Albums"."ReleaseDate" <= '2025-04-01')) AND ("Artists"."Monitored" = true) AND "Albums"."Id" IN (SELECT "AlbumReleases"."AlbumId" FROM "AlbumReleases" JOIN "Tracks" ON ("AlbumReleases"."Id" = "Tracks"."AlbumReleaseId") LEFT JOIN "TrackFiles" ON ("Tracks"."TrackFileId" = "TrackFiles"."Id") WHERE "TrackFiles" IS NULL AND "AlbumReleases"."Monitored" = true GROUP BY "AlbumReleases"."AlbumId") AND (("Albums"."Monitored" = true) AND ("Artists"."Monitored" = true)) GROUP BY "Albums"."Id" , "Artists"."SortName" ORDER BY "Albums"."ReleaseDate" DESC LIMIT 20 OFFSET 100 ;
                                                                                                                      QUERY PLAN                                                                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=125886.44..125886.49 rows=20 width=571) (actual time=2165.526..2176.292 rows=20 loops=1)
   ->  Sort  (cost=125886.19..125911.44 rows=10103 width=571) (actual time=2147.578..2158.345 rows=120 loops=1)
         Sort Key: "Albums"."ReleaseDate" DESC
         Sort Method: top-N heapsort  Memory: 176kB
         ->  Group  (cost=125411.00..125486.77 rows=10103 width=571) (actual time=2062.566..2115.053 rows=149337 loops=1)
               Group Key: "Albums"."Id", "Artists"."SortName"
               ->  Sort  (cost=125411.00..125436.25 rows=10103 width=571) (actual time=2062.539..2085.123 rows=149337 loops=1)
                     Sort Key: "Albums"."Id", "Artists"."SortName"
                     Sort Method: quicksort  Memory: 89482kB
                     ->  Hash Join  (cost=105232.94..124739.02 rows=10103 width=571) (actual time=1853.816..1996.874 rows=149337 loops=1)
                           Hash Cond: ("Albums"."ArtistMetadataId" = "Artists"."ArtistMetadataId")
                           ->  Hash Join  (cost=104202.74..123682.23 rows=10132 width=560) (actual time=1847.397..1970.359 rows=154380 loops=1)
                                 Hash Cond: ("Albums"."Id" = "AlbumReleases"."AlbumId")
                                 ->  Seq Scan on "Albums"  (cost=0.00..19065.85 rows=157571 width=560) (actual time=0.013..72.567 rows=175607 loops=1)
                                       Filter: ("Monitored" AND "Monitored" AND ("ReleaseDate" <= '2025-04-01 00:00:00+00'::timestamp with time zone))
                                       Rows Removed by Filter: 24177
                                 ->  Hash  (cost=104041.49..104041.49 rows=12900 width=4) (actual time=1847.314..1858.076 rows=174924 loops=1)
                                       Buckets: 262144 (originally 16384)  Batches: 1 (originally 1)  Memory Usage: 8198kB
                                       ->  Group  (cost=102449.31..104041.49 rows=12900 width=4) (actual time=1589.835..1843.704 rows=174924 loops=1)
                                             Group Key: "AlbumReleases"."AlbumId"
                                             ->  Gather Merge  (cost=102449.31..104009.24 rows=12900 width=4) (actual time=1589.810..1770.925 rows=2945157 loops=1)
                                                   Workers Planned: 5
                                                   Workers Launched: 5
                                                   ->  Sort  (cost=101449.23..101455.68 rows=2580 width=4) (actual time=1569.655..1582.641 rows=490860 loops=6)
                                                         Sort Key: "AlbumReleases"."AlbumId"
                                                         Sort Method: quicksort  Memory: 12289kB
                                                         Worker 0:  Sort Method: quicksort  Memory: 24577kB
                                                         Worker 1:  Sort Method: quicksort  Memory: 12289kB
                                                         Worker 2:  Sort Method: quicksort  Memory: 12289kB
                                                         Worker 3:  Sort Method: quicksort  Memory: 12289kB
                                                         Worker 4:  Sort Method: quicksort  Memory: 24577kB
                                                         ->  Nested Loop  (cost=20746.34..101303.04 rows=2580 width=4) (actual time=83.970..1538.354 rows=490860 loops=6)
                                                               ->  Parallel Hash Left Join  (cost=20745.92..99018.92 rows=5051 width=4) (actual time=82.755..240.181 rows=921118 loops=6)
                                                                     Hash Cond: ("Tracks"."TrackFileId" = "TrackFiles"."Id")
                                                                     Filter: ("TrackFiles".* IS NULL)
                                                                     Rows Removed by Filter: 50047
                                                                     ->  Parallel Index Only Scan using idx_tracks_albumreleaseid_trackfileid on "Tracks"  (cost=0.43..75621.36 rows=1010298 width=8) (actual time=0.085..62.140 rows=971165 loops=6)
                                                                           Heap Fetches: 58975
                                                                     ->  Parallel Hash  (cost=19518.55..19518.55 rows=98155 width=543) (actual time=82.169..82.170 rows=50148 loops=6)
                                                                           Buckets: 524288  Batches: 1  Memory Usage: 158944kB
                                                                           ->  Parallel Seq Scan on "TrackFiles"  (cost=0.00..19518.55 rows=98155 width=543) (actual time=5.097..28.659 rows=50148 loops=6)
                                                               ->  Index Scan using "PK_AlbumReleases" on "AlbumReleases"  (cost=0.42..0.45 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=5526708)
                                                                     Index Cond: ("Id" = "Tracks"."AlbumReleaseId")
                                                                     Filter: "Monitored"
                                                                     Rows Removed by Filter: 0
                           ->  Hash  (cost=736.01..736.01 rows=23535 width=15) (actual time=6.399..6.399 rows=23573 loops=1)
                                 Buckets: 32768  Batches: 1  Memory Usage: 1392kB
                                 ->  Seq Scan on "Artists"  (cost=0.00..736.01 rows=23535 width=15) (actual time=0.017..4.288 rows=23573 loops=1)
                                       Filter: ("Monitored" AND "Monitored")
                                       Rows Removed by Filter: 33
 Planning Time: 1.050 ms
 JIT:
   Functions: 122
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 3.874 ms, Inlining 0.000 ms, Optimization 2.900 ms, Emission 45.741 ms, Total 52.515 ms
 Execution Time: 2194.403 ms
(56 rows)

```csharp
.Join<Album, Artist>((l, r) => l.ArtistMetadataId == r.ArtistMetadataId)
.Where<Album>(a => a.Monitored == true && a.ReleaseDate <= currentTime)
.Where<Artist>(a => a.Monitored == true)
.Where("\"Albums\".\"Id\" IN (SELECT \"AlbumReleases\".\"AlbumId\" FROM \"AlbumReleases\" JOIN \"Tracks\" ON (\"AlbumReleases\".\"Id\" = \"Tracks\".\"AlbumReleaseId\") LEFT JOIN \"TrackFiles\" ON (\"Tracks\".\"TrackFileId\" = \"TrackFiles\".\"Id\") WHERE \"TrackFiles\" IS NULL AND \"AlbumReleases\".\"Monitored\" = true GROUP BY \"AlbumReleases\".\"AlbumId\")")
```
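
Because `IN (...)` is a membership test, an album either matches the subquery or it does not, no matter how many of its tracks are missing, so the per-track rows stay inside the subquery instead of feeding the outer sort and `GROUP BY`. A minimal sketch that checks both query shapes return the same albums, assuming Python's `sqlite3` as a stand-in for PostgreSQL, a trimmed schema, synthetic data, and hypothetical helpers `seed` and `album_ids`; the subquery's own `GROUP BY` from the PR is left out because `IN` does not need it:

```python
import sqlite3

SCHEMA = """
CREATE TABLE Artists (ArtistMetadataId INTEGER PRIMARY KEY, Monitored INTEGER, SortName TEXT);
CREATE TABLE Albums (Id INTEGER PRIMARY KEY, ArtistMetadataId INTEGER, Monitored INTEGER, ReleaseDate TEXT);
CREATE TABLE AlbumReleases (Id INTEGER PRIMARY KEY, AlbumId INTEGER, Monitored INTEGER);
CREATE TABLE Tracks (Id INTEGER PRIMARY KEY, AlbumReleaseId INTEGER, TrackFileId INTEGER);
CREATE TABLE TrackFiles (Id INTEGER PRIMARY KEY);
"""

# Original shape: join down to tracks, then GROUP BY to collapse one row per missing track.
JOIN_THEN_GROUP = """
SELECT Albums.Id FROM Albums
JOIN Artists ON Albums.ArtistMetadataId = Artists.ArtistMetadataId
JOIN AlbumReleases ON Albums.Id = AlbumReleases.AlbumId
JOIN Tracks ON AlbumReleases.Id = Tracks.AlbumReleaseId
LEFT JOIN TrackFiles ON Tracks.TrackFileId = TrackFiles.Id
WHERE TrackFiles.Id IS NULL AND AlbumReleases.Monitored = 1
  AND Albums.Monitored = 1 AND Artists.Monitored = 1
GROUP BY Albums.Id, Artists.SortName
ORDER BY Albums.ReleaseDate DESC, Albums.Id
"""

# PR shape: the outer query joins only Albums and Artists; missing tracks stay in the IN (...) subquery.
SEMI_JOIN = """
SELECT Albums.Id FROM Albums
JOIN Artists ON Albums.ArtistMetadataId = Artists.ArtistMetadataId
WHERE Albums.Monitored = 1 AND Artists.Monitored = 1
  AND Albums.Id IN (
    SELECT AlbumReleases.AlbumId FROM AlbumReleases
    JOIN Tracks ON AlbumReleases.Id = Tracks.AlbumReleaseId
    LEFT JOIN TrackFiles ON Tracks.TrackFileId = TrackFiles.Id
    WHERE TrackFiles.Id IS NULL AND AlbumReleases.Monitored = 1)
GROUP BY Albums.Id, Artists.SortName
ORDER BY Albums.ReleaseDate DESC, Albums.Id
"""


def seed(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Artists VALUES (?, ?, ?)", [(1, 1, "a"), (2, 0, "b")])
    conn.executemany("INSERT INTO Albums VALUES (?, ?, ?, ?)", [
        (10, 1, 1, "2024-01-01"),  # one of two tracks missing
        (11, 1, 1, "2023-01-01"),  # every track has a file
        (12, 1, 0, "2022-01-01"),  # album unmonitored
        (13, 2, 1, "2021-01-01"),  # artist unmonitored
        (14, 1, 1, "2025-01-01"),  # three tracks missing
    ])
    conn.executemany("INSERT INTO AlbumReleases VALUES (?, ?, 1)", [(i, i) for i in (10, 11, 12, 13, 14)])
    conn.executemany("INSERT INTO TrackFiles VALUES (?)", [(1,), (2,), (3,)])
    # (Id, AlbumReleaseId, TrackFileId); TrackFileId 0 has no TrackFiles row, i.e. missing.
    conn.executemany("INSERT INTO Tracks VALUES (?, ?, ?)", [
        (1, 10, 1), (2, 10, 0), (3, 11, 2), (4, 11, 3),
        (5, 12, 0), (6, 13, 0), (7, 14, 0), (8, 14, 0), (9, 14, 0),
    ])


def album_ids(conn: sqlite3.Connection, sql: str) -> list[int]:
    return [row[0] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
seed(conn)
print(album_ids(conn, JOIN_THEN_GROUP))  # [14, 10]
print(album_ids(conn, SEMI_JOIN))        # [14, 10]
```

On PostgreSQL the planner may execute the `IN` as a semi-join or, as in the new plan, by grouping the subquery and hash-joining the result; either way the outer `GROUP BY` sees about 149k rows instead of 2.36 million.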
Author

This is gross, but I didn't know whether y'all would prefer that I fix the Builder.

Contributor

Sadly, this breaks the current behavior.

`missingTracksSubquery` seems to be unused, and you made albums mandatorily monitored in order to show up in Missing. Currently, depending on the filter the user selects, unmonitored albums should show up when Unmonitored is selected.
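
One way to address the review is to keep the subquery but leave the monitored condition to the user's filter instead of hardcoding it in the base query. A minimal sketch in Python/`sqlite3`, where `missing_albums_sql`, `missing_albums`, and the `hardcode_monitored` switch are hypothetical, and treating Unmonitored as "album or artist unmonitored" is an assumption rather than confirmed Lidarr behavior:

```python
import sqlite3

SCHEMA = """
CREATE TABLE Artists (ArtistMetadataId INTEGER PRIMARY KEY, Monitored INTEGER);
CREATE TABLE Albums (Id INTEGER PRIMARY KEY, ArtistMetadataId INTEGER, Monitored INTEGER, ReleaseDate TEXT);
CREATE TABLE AlbumReleases (Id INTEGER PRIMARY KEY, AlbumId INTEGER, Monitored INTEGER);
CREATE TABLE Tracks (Id INTEGER PRIMARY KEY, AlbumReleaseId INTEGER, TrackFileId INTEGER);
CREATE TABLE TrackFiles (Id INTEGER PRIMARY KEY);
"""

MISSING_SUBQUERY = """
SELECT AlbumReleases.AlbumId FROM AlbumReleases
JOIN Tracks ON AlbumReleases.Id = Tracks.AlbumReleaseId
LEFT JOIN TrackFiles ON Tracks.TrackFileId = TrackFiles.Id
WHERE TrackFiles.Id IS NULL AND AlbumReleases.Monitored = 1
"""


def missing_albums_sql(monitored: bool, hardcode_monitored: bool = False) -> str:
    # Assumption: "Unmonitored" matches albums whose album OR artist is unmonitored.
    if monitored:
        user_filter = "Albums.Monitored = 1 AND Artists.Monitored = 1"
    else:
        user_filter = "Albums.Monitored = 0 OR Artists.Monitored = 0"
    # hardcode_monitored=True mimics baking "monitored" into the base query itself.
    base = "Albums.Monitored = 1 AND Artists.Monitored = 1 AND " if hardcode_monitored else ""
    return f"""
SELECT Albums.Id FROM Albums
JOIN Artists ON Albums.ArtistMetadataId = Artists.ArtistMetadataId
WHERE {base}({user_filter}) AND Albums.ReleaseDate <= ?
  AND Albums.Id IN ({MISSING_SUBQUERY})
ORDER BY Albums.ReleaseDate DESC, Albums.Id
"""


def missing_albums(conn, monitored, cutoff="2025-04-01", hardcode_monitored=False):
    sql = missing_albums_sql(monitored, hardcode_monitored)
    return [row[0] for row in conn.execute(sql, (cutoff,))]


def seed(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Artists VALUES (?, ?)", [(1, 1), (2, 0)])
    conn.executemany("INSERT INTO Albums VALUES (?, ?, ?, ?)", [
        (10, 1, 1, "2024-01-01"),  # monitored album and artist
        (11, 1, 1, "2023-01-01"),  # monitored, but no track is missing
        (12, 1, 0, "2022-01-01"),  # album unmonitored
        (13, 2, 1, "2021-01-01"),  # artist unmonitored
        (15, 1, 1, "2099-01-01"),  # released after the cutoff
    ])
    conn.executemany("INSERT INTO AlbumReleases VALUES (?, ?, 1)", [(i, i) for i in (10, 11, 12, 13, 15)])
    conn.execute("INSERT INTO TrackFiles VALUES (1)")
    # TrackFileId 0 has no TrackFiles row, so that track counts as missing.
    conn.executemany("INSERT INTO Tracks VALUES (?, ?, ?)", [
        (1, 10, 0), (2, 11, 1), (3, 12, 0), (4, 13, 0), (5, 15, 0),
    ])


conn = sqlite3.connect(":memory:")
seed(conn)
print(missing_albums(conn, True))                            # [10]
print(missing_albums(conn, False))                           # [12, 13]
print(missing_albums(conn, False, hardcode_monitored=True))  # []
```

With `hardcode_monitored=True` the Unmonitored filter can never match anything, which is the regression the review describes.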

@bakerboy448 marked this pull request as draft on August 21, 2025 03:15
