Remove the in-memory database #15109
Comments
@hugodutka You might want to also look at references to |
Can you share more details? Like, how you tested and what the numbers were? In CI we use a 4 core runner for the in-memory tests and an 8 core runner for the postgres tests and they both take 4 minutes. Just anecdotally, when I run the coderd test suite with the in-memory database most tests take less than 1 sec, but with |
For a single test run in isolation here’s what I did:
@spikecurtis when running tests with |
For the full suite, I changed this line to Postgres Test:
Took 108s without build cache. In-Memory Test:
Took 121s without build cache. I ran all tests on a local machine - AMD Ryzen 7 5800X, 32 GB RAM, and some NVME. |
Some tests using the memory implementation intentionally do not insert all the data dependencies to run a test. The real db has foreign key dependents, so those tests would probably now fail. On the other side of things, tests that are insert heavy use the in memory implementation as I found many inserts make the db tests slower. |
I wonder if it makes sense to allow seeding the database with some raw SQL? e.g. |
@johnstcn I think that would be huge, but I just do not want to have to write raw sql for a test setup 😢 |
@hugodutka I was not running with I ran It's also a challenge to use These issues are probably soluble with some effort, but just understand that it's going to be more complicated than just deleting a bunch of code. |
@spikecurtis Thanks for the details! I checked the Docker image for Postgres that I totally agree with you - there are a lot of little details to consider, and it's not as simple as just deleting a bunch of code. For instance, to make Postgres tests snappy, we can't afford to run migrations for every test. Since using The issue description was a bit light on specifics, but I was planning to work out those details in the PR. Appreciate your input - it helps to make sure the PR will be useful. |
@hugodutka Just some extra random information. If you run Coder locally, we default to running a local Postgres database. This database persists its data somewhere on disk, I can't recall exactly where. ( You mentioned saving a template cache, would this be something like a pg_dump? Curious if we could have the We would have to make sure to clean up said resources though if we don't use ephemeral docker volumes. Might be more work than it's worth 🤷♂ |
@Emyrk Regarding template cache I was thinking about a regular database in whatever postgres instance the user is already running. My plan is to name the template database with the hash of the contents of all migration files. Then when a test is run, it checks if such a database already exists and if not, creates it. Then it creates its own test db with |
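To make the naming idea concrete, here is a minimal sketch of deriving a template database name from the contents of the migration files. It is not the code from the eventual PR: the function name `templateDBName`, the `tpl_` prefix, the 32-character truncation, and the sample migration files are all assumptions made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// templateDBName derives a deterministic database name from migration files.
// Keys are file names, values are file contents. (Hypothetical helper.)
func templateDBName(migrations map[string][]byte) string {
	names := make([]string, 0, len(migrations))
	for name := range migrations {
		names = append(names, name)
	}
	// Map iteration order is random, so sort the names for a stable hash.
	sort.Strings(names)

	h := sha256.New()
	for _, name := range names {
		// Length-prefix each part so that different splits of the same
		// bytes between name and content cannot produce the same hash.
		fmt.Fprintf(h, "%d:%s%d:", len(name), name, len(migrations[name]))
		h.Write(migrations[name])
	}
	sum := hex.EncodeToString(h.Sum(nil))

	// The "tpl_" prefix and the truncation are arbitrary choices that keep
	// the name well under PostgreSQL's 63-byte identifier limit.
	return "tpl_" + sum[:32]
}

func main() {
	// Sample migrations, invented for this example.
	migrations := map[string][]byte{
		"000001_init.up.sql":  []byte("CREATE TABLE users (id uuid PRIMARY KEY);"),
		"000002_email.up.sql": []byte("ALTER TABLE users ADD COLUMN email text;"),
	}
	fmt.Println(templateDBName(migrations))
}
```

Because the name depends only on file names and contents, any change to a migration yields a new name, so a stale template is never picked up by accident; old templates would still need to be cleaned up separately.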
Progress Update

I've completed the conversion of the entire test suite to use PostgreSQL, fully removing As I started enabling PostgreSQL in CI for Windows, macOS, and race tests, I learned that:
Additionally, I discovered that the 10s+ latency reported by some developers on individual tests was likely due to each test instance creating a new PostgreSQL container if My plan is to proceed with a series of PRs to:
|
This PR is the first in a series aimed at closing [#15109](#15109).

### Changes

- **Template Database Creation:** `dbtestutil.Open` now has the ability to create a template database if none is provided via `DB_FROM`. The template database’s name is derived from a hash of the migration files, ensuring that it can be reused across tests and is automatically updated whenever migrations change.
- **Optimized Database Handling:** Previously, `dbtestutil.Open` would spin up a new container for each test when `DB_FROM` was unset. Now, it first checks for an active PostgreSQL instance on `localhost:5432`. If none is found, it creates a single container that remains available for subsequent tests, eliminating repeated container startups.

These changes address the long individual test times (10+ seconds) reported by some users, likely due to the time Docker took to start and complete migrations.
This PR is the second in a series aimed at closing #15109.

## Changes

- adds `scripts/embedded-pg/main.go`, which can start a native Postgres database. This is used to set up PG on Windows and macOS, as these platforms don't support Docker in GitHub Actions.
- runs the `test-go-pg` job on macOS and Windows too
- adds the `test-go-race-go` job, which runs race tests with Postgres on Linux
Another PR to address #15109.

- adds the `DisableForeignKeysAndTriggers` utility, which simplifies converting tests from in-mem to postgres
- converts the dbauthz test suite to pass on both the in-mem db and Postgres
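One way such a utility could work is by issuing `ALTER TABLE ... DISABLE TRIGGER ALL` for each table. In PostgreSQL this also disables the internally generated constraint triggers that enforce foreign keys, and it requires superuser privileges. The sketch below only builds the statements. It is an assumption about the general technique, not the PR's implementation, and `quoteIdent`, `disableTriggerStatements`, and the table names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent quotes a PostgreSQL identifier, doubling embedded double quotes.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// disableTriggerStatements builds one ALTER TABLE statement per table.
// Running them (as a superuser) turns off user triggers and the internal
// triggers behind foreign key checks for those tables.
func disableTriggerStatements(tables []string) []string {
	stmts := make([]string, 0, len(tables))
	for _, t := range tables {
		stmts = append(stmts, fmt.Sprintf("ALTER TABLE %s DISABLE TRIGGER ALL;", quoteIdent(t)))
	}
	return stmts
}

func main() {
	// Table names are illustrative only.
	for _, s := range disableTriggerStatements([]string{"workspaces", "templates"}) {
		fmt.Println(s)
	}
}
```

With triggers disabled, a test can insert a row without first creating every row it references, which is the behavior the in-memory tests relied on. The trade-off is that such a test no longer checks referential integrity.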
We have an effort underway to replace `dbmem` (#15109), and consequently we've begun running our full test-suite (with Postgres) on all supported OSs - Windows, MacOS, and Linux, since #15520.

Since this change, we've seen a marked decrease in the success rate of our builds on `main` (note how the Windows/MacOS failures account for the vast majority of failed builds):

[image]

We're still investigating why these OSs are a lot less reliable. It's likely that the VMs on which the builds are run have different characteristics from our Ubuntu runners such as disk I/O, network latency, or something else.

**In the meantime, we need to start trusting CI failures in `main` again, as the current failures are too noisy / vague for us to correct.**

We've also considered hosting our own runners where possible so we can get OS-level observability to rule out some possibilities. See the [meeting notes](https://www.notion.so/coderhq/CI-Investigation-Call-Notes-17dd579be59280d8897cc9fe4bb46695?pvs=6&utm_content=17dd579b-e592-80d8-897c-c9fe4bb46695&utm_campaign=T1ZPT2FL0&n=slack&n=slack_link_unfurl) where we linked into this for more detail.

This PR introduces several changes:

1. Moves the full test-suite with Postgres on Windows/MacOS to the `nightly-gauntlet` workflow (tradeoff: this means that any regressions may be more difficult to discover since we merge to main several times a day)
2. Run only the CLI test-suite on each PR / merge to `main` on Windows/MacOS
3. `test-go` is still running the full test-suite against all OSs (including the CLI ones), but will soon be removed once #15109 is completed since it uses `dbmem`
4. Changes `nightly-gauntlet` to run at 4AM: we've seen several instances of the runner being stopped externally, and we're _guessing_ this may have something to do with the midnight UTC execution time, when other cron jobs may run
5. Removes the existing `nightly-gauntlet` jobs since they haven't passed in a long time, indicating that nobody cares enough to fix them and they don't provide diagnostic value; we can restore them later if necessary

I've manually run both these new workflows successfully:

- `ci`: https://github.com/coder/coder/actions/runs/12825874176/job/35764724907
- `nightly-gauntlet`: https://github.com/coder/coder/actions/runs/12825539092

---------

Signed-off-by: Danny Kopping <[email protected]>
Co-authored-by: Muhammad Atif Ali <[email protected]>
Another PR to address #15109.

Changes:

- Introduces the `--ephemeral` flag, which changes the Coder config directory to a temporary location. The config directory is where the built-in PostgreSQL stores its data, so using a new one results in a deployment with a fresh state. The `--ephemeral` flag is set to replace the `--in-memory` flag once the in-memory database is removed.
The in-memory database is currently only used in tests. It was originally added to ensure tests that do not depend on complex db logic can pass quickly.
However, I recently ran performance tests on the postgres and in-memory test suites (both on full suites and on individual tests), and there seems to be almost no timing difference between them. After talking to @kylecarbs, we've decided to get rid of the in-memory database and see what happens. If it turns out there was a good reason for it to exist, we'll bring it back.
Currently, coder supports the `--in-memory` flag. It should be deprecated and replaced by the `--ephemeral` flag, which would initialize a new postgres db every time.