Mac presubmit queues are out of SLO #114656
Comments
2h is probably too much, but a huge increase in execution/queue time is expected after #113539, where we explicitly clean the local Xcode cache.
Setting as P1. It seems like Mac capacity is at 100% utilization from 6:00 AM PST to 5:00 PM PST. As a next step, we need to identify whether this is normal usage or something strange is happening.
From the data, it seems we've always had ~30-minute peaks of 100% utilization at ~9:00 AM PST, but we started to have 3+ peaks per day on 10/17/2022. Things got really bad on 10/24/2022, when we started seeing 100% utilization from 9:00 to 5:00.
The local cache cleanup happened on 10/19 (#113729), and the 90th-percentile queue time was <1 min for 10/19-10/24.
One change that correlates with the timing is the quota increase (cl/483782210, which landed on 10/25). @godofredoc Are you comfortable with a revert?
Ths is
Can you please post daily data for: 10/25, 10/27, 11/03?
The engine source checkout was not improved with this quota increase CL. It has been taking 5-7 mins before and after. See |
That is expected; cl/483782210 fixes peak quota with checkouts taking 40 mins in try. 5-7 mins is expected in cold checkouts. If we are consistently getting 5-7 min checkouts, then that is a bug that needs to be fixed.
10/25: 6 min |
Are these 90th percentiles? Can we also get the 50th?
Here is the comparison between the 90th and 50th percentiles, plus the number of builds. The queue time correlates with the number of builds.
When did you enable the Mac engine v2 builders?
Nice, in this case the 50th %tile is giving us a better signal.
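For anyone following along, here is a minimal sketch of why the 50th and 90th percentiles can tell different stories. The queue times below are made-up illustration values, not data from the dashboards, and `percentile` is a hypothetical helper using the nearest-rank definition (the dashboards may use a different method).

```python
def percentile(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical queue times in minutes for one day of builds (not real data).
queue_minutes = [1, 1, 2, 2, 3, 3, 4, 5, 45, 120]

p50 = percentile(queue_minutes, 50)
p90 = percentile(queue_minutes, 90)
print(f"p50={p50} min, p90={p90} min")
```

With a few slow multi-bot builds in the mix, the 90th percentile is dominated by those outliers while the 50th reflects what a typical build sees.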
Higher queue times are expected in engine_v2 under the current conditions because those builds require multiple bots to complete. The decrease in the 50th percentile, which I believe most of the legacy builds fit into (all except the engine_v2, web engine, and Fuchsia builders), was caused by the following PRs:
We are probably making things a bit worse for a short period of time (~4h) while we re-allocate Mac machines to different pools. The plan to address the queue time is as follows:
@khyati82 this is the impact we were expecting during the transition of legacy to engine_v2.
https://chrome-internal-review.googlesource.com/c/infradata/config/+/5074772 to move half (8) bots from staging to try.
Do you mean to use linux VMs to host Mac engine_v2 builders? @godofredoc
The current implementation will make this look weird. Basically I meant to change https://github.com/flutter/engine/blob/main/.ci.yaml#L327 to use a linux machine rather than mac. Which I believe requires to change |
After all the caches have been cleaned and the logic to cleanup runtimes has landed we may not need to always delete the caches. Bug: #114656
CL to default to not cleaning source code caches in Web Engine builds: https://flutter-review.googlesource.com/c/recipes/+/35661
This property was previously hardcoded in the recipe. Bug: flutter/flutter#114656
I'm not sure how you are distributing Mac resources, but fwiw https://flutter-review.git.corp.google.com/c/recipes/+/35760 should lower Mac clang-tidy bot execution time by 5-10 min. The number of linted files is going from 800 to 600. Hopefully that helps a bit.
The json translation of gclient_vars is failing because it can't parse the boolean value as True. Bug: flutter/flutter#114656
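As an illustration only: one common way a failure like this arises is a Python-style `True` ending up in text that is later parsed as JSON, which only accepts lowercase `true`. The sketch below assumes that scenario; the key name `example_flag` is hypothetical and the actual recipe code is not shown in this thread.

```python
import json

# Hypothetical payload with a Python-style boolean literal (not valid JSON).
bad_payload = '{"example_flag": True}'

try:
    json.loads(bad_payload)
    parsed_ok = True
except json.JSONDecodeError:
    # JSON only recognizes lowercase true/false, so parsing fails here.
    parsed_ok = False

# Serializing the Python dict with json.dumps emits a valid lowercase literal.
good_payload = json.dumps({"example_flag": True})
round_tripped = json.loads(good_payload)

print(parsed_ok, good_payload, round_tripped)
```

Building the JSON text with `json.dumps` rather than string formatting sidesteps the mismatch, assuming the values start out as Python objects.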
This is to make the behavior consistent for json lists and dictionaries. Bug: flutter/flutter#114656
The support for per-build custom gclient variable overrides is removed from the build recipes, so the unused fields in the build configuration json are removed here. This completes the change made in flutter#37351 Bug: flutter/flutter#114656
Queue time has dropped to 20 mins over the past two weeks. Moving to TD to track further optimization.
After all the caches have been cleaned and the logic to cleanup runtimes has landed we may not need to always delete the caches. Bug: flutter#114656
This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of |
I've had several PRs the past 2 weeks be out of SLO (stuck in queue for 2 hours). I don't send PRs during the peak hours, and I expect this is worse for those contributing during those hours.
Example PR: #114646 (first commit was queued for 2 hours)