
fix: reduce parallelism and increase worker size on go-test-race #15106


Merged: @spikecurtis merged 1 commit into main from spike/race-parallel on Oct 18, 2024


@spikecurtis (Contributor) commented Oct 16, 2024

Sets parallelism on go-test-race to 4 concurrent tests (-parallel 4) and 2 concurrent packages (-p 2).
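A rough way to see what these limits mean (a sketch, not code from this PR): go test -p caps how many test binaries (packages) run at once, and -parallel caps how many t.Parallel() tests run at once inside each binary, so their product is an approximate ceiling on parallel tests running simultaneously. Both flags default to GOMAXPROCS, which is why an unrestricted run fans out much further on a bigger machine. The helper maxConcurrentTests is hypothetical, the 16-CPU figure assumes GOMAXPROCS=16 on a 16-core worker, the "-p 4 -parallel 4" line assumes the per-package test limit stayed at 4 after the later change to 4 packages, and the bound ignores goroutines the tests start themselves.

```go
package main

import "fmt"

// maxConcurrentTests is a hypothetical helper. With `go test -p packages
// -parallel perPackage`, up to `packages` test binaries run at once and each
// runs at most `perPackage` t.Parallel() tests at a time, so the product is a
// rough ceiling on parallel test functions running simultaneously. It does
// not count goroutines that the tests start themselves.
func maxConcurrentTests(packages, perPackage int) int {
	return packages * perPackage
}

func main() {
	// Assumption: GOMAXPROCS=16 on a 16-core worker; both -p and -parallel
	// default to GOMAXPROCS when they are not set explicitly.
	const gomaxprocs = 16

	fmt.Println("defaults (-p 16 -parallel 16):", maxConcurrentTests(gomaxprocs, gomaxprocs))
	fmt.Println("first pass of this PR (-p 2 -parallel 4):", maxConcurrentTests(2, 4))
	fmt.Println("after the later change (-p 4 -parallel 4):", maxConcurrentTests(4, 4))
}
```

The exact numbers matter less than the gap between the default ceiling and an explicitly chosen one.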


@spikecurtis (Contributor Author) commented

@Emyrk @johnstcn @dannykopping @ammario WDYT?

My hope is that this will reduce occurrences of flakes like coder/internal#102 where things just get inexplicably slow for multiple seconds.

We could also consider throwing beefier hardware at the race tests, but I still think we need to explicitly limit parallelism, or Go will see the increased HW threads and do even more stuff in parallel.

@johnstcn (Member) left a comment

Let's give it a try! This was suggested previously in #12761 (without -p 2).

For the benefit of future readers, I'd also suggest adding a clarifying comment and linking to this PR for context.

@spikecurtis Do you happen to know the approximate difference in execution time with and without? Do we maybe also need to update the value for -timeout?

EDIT: should we also update test-race to be more in line with the CI version?

@matifali (Member) commented Oct 16, 2024

We can also experiment with a 4-core runner to reduce CI costs if we get the same execution time. For context, using a smaller runner on test-e2e didn't impact execution time.

Edit:

Regarding "We could also consider throwing beefier hardware at the race tests": my suggestion was the exact opposite.

@spikecurtis (Contributor Author) commented

With this PR, go-test-race goes from about 6 min to 10 min of elapsed time.

@spikecurtis (Contributor Author) commented

@matifali I think what's going on is that in CI we are running too many things in parallel and starving individual tests of the CPU they need to finish in roughly the time they were designed for. That is, the go-test-race target needs more resources: either a longer elapsed time or, if that is unacceptable, better hardware.

@matifali (Member) commented

@spikecurtis Feel free to use a larger runner if it reduces CI time. If a 16-core runner does this in 5 minutes, it will cost us the same amount while reducing time.
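The "same cost" claim can be checked with core-minute arithmetic, under an assumption the comment does not state: that runner billing scales linearly with cores multiplied by wall-clock minutes. The 8-core baseline below is hypothetical (the thread does not say how many cores the original worker had); the 10-minute figure is the reduced-parallelism run time reported above. The helper coreMinutes is also hypothetical.

```go
package main

import "fmt"

// coreMinutes is a hypothetical cost proxy. It assumes runner billing is
// proportional to the number of cores multiplied by wall-clock minutes.
func coreMinutes(cores int, minutes float64) float64 {
	return float64(cores) * minutes
}

func main() {
	// Hypothetical baseline: an 8-core runner taking ~10 min with reduced parallelism.
	baseline := coreMinutes(8, 10)
	// Larger runner from the comment: 16 cores finishing in ~5 min.
	larger := coreMinutes(16, 5)

	fmt.Printf("8 cores x 10 min = %.0f core-minutes\n", baseline)
	fmt.Printf("16 cores x 5 min = %.0f core-minutes\n", larger)
	fmt.Println("same cost under linear pricing:", baseline == larger)
}
```

If the provider's pricing is not linear in cores, the comparison has to be redone with the actual per-minute rates.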

@Emyrk (Member) left a comment

How many parallel GitHub Actions jobs can run on one machine?

Is it possible that two go test runs from different branches are running on the same machine, e.g. two GitHub Actions workflows firing and running in parallel?

@matifali (Member) commented

@Emyrk, each job in the workflow runs on a new ephemeral VM. They are single-tenant. See https://depot.dev/docs/github-actions/overview

@dannykopping (Contributor) commented

@spikecurtis do you know if we're setting GOMAXPROCS on our test runners?

@spikecurtis changed the title from "fix: reduce parallelism to 4x2 on go-test-race" to "fix: reduce parallelism and increase worker size on go-test-race" on Oct 17, 2024
@spikecurtis (Contributor Author) commented

Quoting @dannykopping: "do you know if we're setting GOMAXPROCS on our test runners?"

I don't think we are, so it defaults to the number of hardware threads.

@spikecurtis (Contributor Author) commented

I increased concurrency to 4 packages and the worker to a 16-core box, and we are back at ~6 min.

@spikecurtis marked this pull request as ready for review on October 17, 2024 12:53
@dannykopping (Contributor) commented Oct 17, 2024

Quoting @spikecurtis: "I think we are not, which makes it default to the number of HW threads"

Are we only running one CI job per host, or multiple in parallel? I assume the latter.

In that case, we could consider tuning GOMAXPROCS a bit.

@spikecurtis (Contributor Author) commented

Quoting @dannykopping: "Are we only running one CI job per host, or multiple in parallel? I assume the latter."

It's one CI job per host, running on depot.dev VMs.

@dannykopping (Contributor) commented

Quoting @spikecurtis: "It's one CI job per host, running on depot.dev VMs"

Ah. In that case, the default GOMAXPROCS is fine.

@dannykopping (Contributor) left a comment

LGTM

@Emyrk (Member) left a comment

I'm happy to experiment 👍

The slowness in that example is a bit crazy.

For that example in particular, I do think we could refactor the test to better isolate the agent and Coder ends, so it does less 🤷‍♂️

@spikecurtis merged commit d18e830 into main on Oct 18, 2024
31 of 33 checks passed
@spikecurtis deleted the spike/race-parallel branch on October 18, 2024 06:45
@github-actions bot locked and limited conversation to collaborators on Oct 18, 2024
Labels: None yet
Projects: None yet

5 participants