
Conversation

dureuill
Contributor

@dureuill dureuill commented Jan 11, 2023

Pull Request

Related issue

Relevant to #1841, fixes #3382

What does this PR do?

User standpoint

  • Limit the number of concurrently opened indexes (the maximum number of concurrently opened indexes is currently computed at startup)
  • When one index too many is opened, the least recently used one is closed and its virtual memory is released.
  • This allows a user to have an arbitrary number of indexes, each of arbitrary size

Implementation standpoint

  • Added an LRU cache map in index-scheduler::lru. A more complete implementation (e.g., with helper functions not used here) is available, but it would better fit a dedicated crate.
  • Use the LRU cache map in the IndexScheduler. To simplify the lifecycle of indexes, they are never removed from the cache while they are in the middle of a resize or delete operation. To achieve this, an intermediate Vec stores the UUIDs of the indexes that are in the middle of such an operation.
  • Upon creating the index scheduler object, compute the total virtual memory that is addressable by running a dichotomic search on the maximum size of an index. Use this as a base to compute the number of indexes that can be opened with 2TiB per index. If the virtual memory address space is smaller than 2TiB, then only allow a single index using a fraction of that size.
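For readers unfamiliar with the approach, here is a rough, illustrative sketch of an LRU map of the kind described above. The names (`Lru`, `get`, `insert`) and the generation-stamp eviction strategy are stand-ins and not the actual `index-scheduler::lru` API; the real map stores index handles keyed by UUID.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Illustrative LRU map: each entry carries a "generation" stamp; when the
// capacity is exceeded, the entry with the oldest stamp is evicted.
struct Lru<K, V> {
    capacity: usize,
    generation: u64,
    map: HashMap<K, (u64, V)>,
}

impl<K: Hash + Eq + Clone, V> Lru<K, V> {
    fn new(capacity: usize) -> Self {
        Self { capacity, generation: 0, map: HashMap::new() }
    }

    // Accessing an entry refreshes its stamp, making it the most recent.
    fn get(&mut self, key: &K) -> Option<&V> {
        self.generation += 1;
        let generation = self.generation;
        self.map.get_mut(key).map(|(stamp, value)| {
            *stamp = generation;
            &*value
        })
    }

    // Inserting may evict the least recently used entry, which is
    // returned so the caller can close that index and release its
    // virtual memory.
    fn insert(&mut self, key: K, value: V) -> Option<(K, V)> {
        self.generation += 1;
        self.map.insert(key.clone(), (self.generation, value));
        if self.map.len() > self.capacity {
            let lru_key = self
                .map
                .iter()
                .min_by_key(|(_, (stamp, _))| *stamp)
                .map(|(k, _)| k.clone())?;
            return self.map.remove(&lru_key).map(|(_, v)| (lru_key, v));
        }
        None
    }
}

fn main() {
    let mut cache = Lru::new(2);
    cache.insert("movies", "handle A");
    cache.insert("products", "handle B");
    cache.get(&"movies"); // refresh "movies": "products" is now the LRU
    // Capacity exceeded: the least recently used entry is evicted.
    let evicted = cache.insert("users", "handle C");
    assert_eq!(evicted, Some(("products", "handle B")));
}
```

A linear scan on eviction is fine for a cache capped at a few hundred entries; a doubly linked list would make eviction O(1) at the cost of more complex code.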

@dureuill dureuill marked this pull request as draft January 11, 2023 18:19
@dureuill dureuill force-pushed the resize-full-indexes branch from d0037d8 to 952d0d1 Compare January 12, 2023 10:01
@dureuill dureuill force-pushed the use-lru branch 2 times, most recently from 911d88f to 1603e29 Compare January 12, 2023 14:30
@dureuill dureuill force-pushed the resize-full-indexes branch from 952d0d1 to a69fbf8 Compare January 25, 2023 08:23
@dureuill dureuill force-pushed the use-lru branch 2 times, most recently from 3f51ffb to e5aeaea Compare January 25, 2023 09:42
@dureuill dureuill added this to the v1.1.0 milestone Jan 25, 2023
@dureuill
Contributor Author

I feel like with the tests I did, this PR is now ready for review.

@dureuill dureuill marked this pull request as ready for review January 30, 2023 09:31
@dureuill dureuill force-pushed the resize-full-indexes branch from a69fbf8 to e1cc4e3 Compare February 7, 2023 12:43
@github-actions

github-actions bot commented Feb 7, 2023

Uffizzi Preview deployment-14601 was deleted.

Member

@Kerollmops Kerollmops left a comment


Thank you very much; maybe some small changes to make.

@dureuill
Contributor Author

dureuill commented Feb 9, 2023

  • Change default index size to 10GiB
  • Default to 100 concurrent indexes in Unixes, 10 in Windows

Comment on lines +528 to +172
if tries >= 100 {
panic!("Too many attempts to close index {name} prior to deletion.")
}
Contributor


I'm not sure I understand in which cases this can happen, and what follows if we panic here?

Contributor Author


what follows if we panic here?

I expect we will respawn a scheduler thread after some logging but you're right that I should check that.

I'm not sure I understood in which case this can happen

This should be a comment (I will add it), but I see two situations where this can happen:

  1. There's a bug in the code, and the index is never adequately closed. I believe this actually happened during development, and it could happen again during refactoring. It is better to eventually get an error message than a full-on "infinite loop".
  2. There's a reader that won't relinquish the Index object for more than 600 seconds.
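The bounded retry guarded by that panic can be sketched as follows. This is a simplified stand-in, not the PR's actual code: `try_close`, the closure-based simulation, and the wait duration are illustrative (the real scheduler waits on a signal with a timeout, hence the 600-second figure for 100 attempts).

```rust
use std::time::Duration;

// Sketch of a bounded close-retry: keep retrying until the last reader
// drops its handle, but give up (panic) after 100 attempts rather than
// loop forever if a handle leaks.
fn close_index_with_retries(name: &str, mut try_close: impl FnMut() -> bool) {
    let mut tries = 0;
    loop {
        if try_close() {
            return; // every reader released its handle: the index is closed
        }
        tries += 1;
        if tries >= 100 {
            panic!("Too many attempts to close index {name} prior to deletion.")
        }
        // A plain sleep keeps the sketch self-contained; the real code
        // would wait on a condition with a much longer timeout.
        std::thread::sleep(Duration::from_millis(1));
    }
}

fn main() {
    // Simulate an index whose last reader goes away on the third attempt.
    let mut remaining_readers = 3;
    close_index_with_retries("movies", || {
        remaining_readers -= 1;
        remaining_readers == 0
    });
}
```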

Contributor Author


I added a comment stating the situations where the panic could possibly occur.

I also tested manually triggering a panic at this location. It fails the deletion task, and kills the batch thread, which is renewed for each task, so no other side effect.

Comment on lines 606 to 262
if tries > 100 {
panic!("Too many spurious wake ups while the index is being resized");
}
Contributor


Same question

Contributor Author


I added a comment stating the situations where the panic could possibly occur.

I also tested manually triggering the panic and trying to search/add documents in an index. In both cases it kills an "actix arbiter" thread and fails the request (not very gracefully; I basically get curl: (56) Recv failure: Connection reset by peer).

It looks like actix respawns the arbiter threads, so we're not at risk of running out of threads in this situation.

@dureuill
Contributor Author

Added an update with the following:

  • We compute an "index budget" when the index scheduler is created, to know how much cumulated virtual memory we can allocate through memmap before exhausting the virtual address space. A fraction of that budget is then split into 2TiB slots to determine how many indexes can be open simultaneously in the cache. If that number is 0, then there can be only one index in the cache, sized to whatever budget is available.
  • As computing the budget adds 2 seconds at startup, it is disabled in tests.

This should result in the following:

  • No more performance hit due to index resize, except for users with indexes bigger than 2 TiB (if this impedes your use case, please contact us)
  • Much lower risk of filling the entirety of the cache
  • Adapts to machines with smaller virtual memory address space
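The budget computation boils down to a dichotomic (binary) search for the largest reservable amount of address space. A minimal sketch under stated assumptions: `max_reservable`, `can_reserve`, and the bounds are illustrative names, where `can_reserve` stands in for the real probe that attempts a memory map of the given size.

```rust
// Dichotomic search for the largest virtual memory budget: `can_reserve`
// must be monotone (if a size is reservable, every smaller size is too).
fn max_reservable(mut can_reserve: impl FnMut(u64) -> bool, upper_bound: u64) -> u64 {
    let (mut lo, mut hi) = (0u64, upper_bound);
    // Invariant: every size <= lo is assumed reservable, every size > hi is not.
    while lo < hi {
        let mid = lo + (hi - lo + 1) / 2; // round up so the loop terminates
        if can_reserve(mid) {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    lo
}

fn main() {
    const TIB: u64 = 1 << 40;
    // Pretend the OS lets us reserve up to 7 TiB of address space.
    let budget = max_reservable(|size| size <= 7 * TIB, 128 * TIB);
    assert_eq!(budget, 7 * TIB);
    // Split a budget into 2 TiB slots; if none fit, fall back to a single
    // smaller index, as described above.
    let index_count = (budget / (2 * TIB)).max(1);
    assert_eq!(index_count, 3);
}
```

The ~2-second startup cost mentioned above comes from the repeated probing, which is why it is disabled in tests.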

I will do a second prototype with these changes soon.


remaining TODO

  • Some comments on the PR left unaddressed
  • "Invasive" logs that are good for the prototype but should be removed before merging
  • Handle the methods that attempt to open all indexes at once
  • Check that the added dynamic budget computation doesn't impede usage for Cloud

@dureuill
Contributor Author

dureuill commented Feb 20, 2023

Pushed an update that handles the methods that attempt to open all indexes at once.

There is now a try_for_each_index method that allows iterating over all indexes. For callers that only need the names, there is also a new index_names method that doesn't open the indexes at all.

The stats, indexes, dumps routes as well as snapshot creation/import and dump import seem to be fixed by this change.
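The shape of these two accessors can be sketched as follows. Only the method names come from the PR; the `Index` struct, the signatures, and the cache lookup here are simplified stand-ins.

```rust
// Illustrative stand-in for an index environment.
struct Index {
    doc_count: u64,
}

struct IndexScheduler {
    names: Vec<String>,
}

impl IndexScheduler {
    // Returns index names without opening any index environment.
    fn index_names(&self) -> impl Iterator<Item = &str> {
        self.names.iter().map(String::as_str)
    }

    // Visits every index one at a time; because each index would be
    // opened through the LRU cache, only a bounded number are ever
    // open at once.
    fn try_for_each_index<E>(
        &self,
        mut f: impl FnMut(&str, &Index) -> Result<(), E>,
    ) -> Result<(), E> {
        for name in &self.names {
            // Stand-in for "open the index through the cache, possibly
            // evicting the least recently used one first".
            let index = Index { doc_count: 2 };
            f(name, &index)?;
        }
        Ok(())
    }
}

fn main() {
    let scheduler = IndexScheduler {
        names: vec!["movies".into(), "products".into()],
    };

    // Routes that only need names (e.g. listing indexes) open nothing.
    let names: Vec<&str> = scheduler.index_names().collect();
    assert_eq!(names, ["movies", "products"]);

    // Routes that need each index (e.g. stats) visit them one by one.
    let mut total = 0;
    let res: Result<(), String> = scheduler.try_for_each_index(|_name, index| {
        total += index.doc_count;
        Ok(())
    });
    assert!(res.is_ok());
    assert_eq!(total, 4);
}
```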


Calling stats on a Meilisearch instance with 1000+ indexes has a noticeable cost: about 8 s on macOS (0.8 s for 100 indexes, unnoticeable for 10), and about 0.5 s on Linux (unnoticeable for 100 indexes or fewer). I cannot compare with v1, because it couldn't have more than 200 indexes, so the stats route would fail. Interestingly, on macOS the response time decreases when limiting the total virtual memory with ulimit. Possibly the macOS allocator is simply slow for big allocations, whereas the Linux one is very fast by comparison for this use case.

Member

@Kerollmops Kerollmops left a comment


Thank you very much!

@dureuill
Contributor Author

dureuill commented Feb 23, 2023

TODO:

  • It appears that indexes are not currently being registered correctly as BeingDeleted; this needs checking
  • Rebase on main
  • Squash redundant commits

Follow-ups for subsequent PRs (to be added in the tracking issue and/or opened as different issues):

  • Increased Windows/macOS startup time. This could be mitigated by first trying a budget value that should be "good enough" and skipping the dichotomy if it passes.
  • Degraded stats performance. This could be mitigated by caching the stats of indexes (eg in the tasks db).
  • Finer-grained error handling in the dichotomy: distinguish between allocation errors (continue the dichotomy after marking this value as "bad") and other kinds of errors (stop the dichotomy and report the error to the caller)

Member

@Kerollmops Kerollmops left a comment


Beautiful work @dureuill 🎉 I think we can merge this PR and address your comment (☝️) in other PRs. I let you bors merge.

@dureuill
Contributor Author

Thank you for the review @Kerollmops!

bors merge

@bors
Contributor

bors bot commented Feb 23, 2023

Build succeeded:

@bors bors bot merged commit ca25904 into main Feb 23, 2023
@bors bors bot deleted the use-lru branch February 23, 2023 15:03
@meili-bot meili-bot added the v1.1.0 PRs/issues solved in v1.1.0 released on 2023-04-03 label Apr 6, 2023

Successfully merging this pull request may close these issues.

Don't limit the number or size of indexes