You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We have an effort underway to replace `dbmem` (#15109), and consequently
we've begun running our full test-suite (with Postgres) on all supported
OSs - Windows, MacOS, and Linux, since #15520.
Since this change, we've seen a marked decrease in the success rate of
our builds on `main` (note how the Windows/MacOS failures account for
the vast majority of failed builds):

We're still investigating why these OSs are a lot less reliable. It's
likely that the VMs on which the builds are run have different
characteristics from our Ubuntu runners such as disk I/O, network
latency, or something else.
**In the meantime, we need to start trusting CI failures in `main`
again, as the current failures are too noisy / vague for us to
correct.**
We've also considered hosting our own runners where possible so we can
get OS-level observability to rule out some possibilities.
See the [meeting
notes](https://www.notion.so/coderhq/CI-Investigation-Call-Notes-17dd579be59280d8897cc9fe4bb46695?pvs=6&utm_content=17dd579b-e592-80d8-897c-c9fe4bb46695&utm_campaign=T1ZPT2FL0&n=slack&n=slack_link_unfurl)
where we linked into this for more detail.
This PR introduces several changes:
1. Moves the full test-suite with Postgres on Windows/MacOS to the
`nightly-gauntlet` workflow
tradeoff: this means that any regressions may be more difficult to
discover since we merge to main several times a day
2. Run only the CLI test-suite on each PR / merge to `main` on
Windows/MacOS
3. `test-go` is still running the full test-suite against all OSs
(including the CLI ones), but will soon be removed once #15109 is
completed since it uses `dbmem`
4. Changes `nightly-gauntlet` to run at 4AM: we've seen several
instances of the runner being stopped externally, and we're _guessing_
this may have something to do with the midnight UTC execution time, when
other cron jobs may run
5. Removes the existing `nightly-gauntlet` jobs since they haven't
passed in a long time, indicating that nobody cares enough to fix them
and they don't provide diagnostic value; we can restore them later if
necessary
I've manually run both these new workflows successfully:
- `ci`:
https://github.com/coder/coder/actions/runs/12825874176/job/35764724907
- `nightly-gauntlet`:
https://github.com/coder/coder/actions/runs/12825539092
---------
Signed-off-by: Danny Kopping <[email protected]>
Co-authored-by: Muhammad Atif Ali <[email protected]>
0 commit comments