Allow configuration of commit batch size in ci.yaml #130499
Comments
Cocoon already batch-schedules framework tasks with a batch size of 6; it is the back-filling logic that back-fills the skipped tasks within a batch. Question: if we can live with a test running at a batch size of 30 commits, or even not running at all for some days, is the test important enough to validate the tree?
A context question about the particular test on the linked PR: I see that it was enabled on presubmit and therefore running in the checks on PRs, so it was probably running a good bit more than 30 times a day, right? I ask because 1) I want to confirm that we do in fact run Firebase tests on presubmit (I didn't know this, if so) and 2) I don't have a good understanding of the ratio of presubmit runs to postsubmit runs.
@keyonghan The request is to configure the batch size on a per-test basis. The tests are important, but for cases where FTL lacks capacity for specific devices, running more frequently will cause tests to time out while waiting for available devices, and close the tree.
This should be doable, but I am concerned about the case where a real tree-breaking change exists within the batch. Say a test has a batch size of 30 and the 2nd commit contains a breaking change; then we can only catch the breakage up to 30 commits later. Also, is there any other use case in addition to this FTL test? Does it make more sense to mark the test as flaky (as Dan has done now) and have the staging pool validate it all the time? Though it may fail consistently, it will not block the tree and will not miss validation on any breaking change. Marking a test as flaky or changing the batch size (if supported) each needs a PR change, and each needs a revert PR when bot/device capacity is back.
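For reference, a minimal sketch of marking a target as flaky in .ci.yaml, assuming the `bringup` flag is how flaky targets are routed to the staging pool; the target name is taken from the staging build linked later in this thread, and the rest is illustrative:

```yaml
# A minimal sketch, assuming .ci.yaml's `bringup` flag marks a target
# as flaky; `bringup: true` moves the target to the staging pool so
# failures do not close the tree.
targets:
  - name: Linux firebase_oriol33_abstract_method_smoke_test
    bringup: true # treat as flaky: validate in staging, non-blocking
```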
The basic problem is that we'd like to test on hardware that does not have enough availability to test on every commit. We cannot run those tests on presubmit, as that will be too much. We currently have two options:
If we run the test less frequently, it should be less flaky (because we won't be overloading the availability of devices in FTL).
Ideally FTL would give us an API to check whether a device is available, but that does not exist AFAIK.
I started a thread in the internal Flutter/FTL group to see if we can figure out why this test took so long to time out, and whether the FTL team can give us a better option.
The target is being validated in the staging pool (after being marked as flaky). See a passing build: https://ci.chromium.org/ui/p/flutter/builders/staging/Linux%20firebase_oriol33_abstract_method_smoke_test/484/overview
Another thing we want to do in the future is to disable the presubmit run with `presubmit: false`.
Ahh so maybe we should just mark the ones with lower availability as presubmit: false? |
That should help. The number of presubmit runs is much larger than the number of post-submit runs.
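For reference, a minimal sketch of what that opt-out might look like in .ci.yaml, assuming the per-target `presubmit` field; the target name is illustrative:

```yaml
# A minimal sketch, assuming .ci.yaml's per-target `presubmit` field;
# with `presubmit: false` the target is skipped in PR checks and only
# runs post-submit.
targets:
  - name: Linux firebase_oriol33_abstract_method_smoke_test
    presubmit: false # skip PR checks; post-submit runs still happen
```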
The issue here is that the test is not flaky; the infrastructure running the test is flaky, and we have a way to deal with that. We also have the option of extending the timeout.
Right now, here's the list of availability. The device we're running this test on has "medium" availability; there is a high-availability API 33 device (panther) we should try instead.
Should we disable FTL on the release branches? RCs don't batch tasks. |
RCs also only run pretty infrequently, right?
Based on Q2 data, there were about 2 runs per workday from RCs.
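If we did want to skip the release branches, a minimal sketch assuming .ci.yaml supports a per-target `enabled_branches` list might look like this (target name illustrative):

```yaml
# A minimal sketch, assuming a per-target `enabled_branches` list in
# .ci.yaml; listing only `master` would keep this target off the
# release-candidate branches.
targets:
  - name: Linux firebase_oriol33_abstract_method_smoke_test
    enabled_branches:
      - master # no candidate-branch pattern, so RCs skip this target
```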
Based on the comments, it seems like the solution was to mark the test as `presubmit: false`.
This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of `flutter doctor -v` and a minimal reproduction of the issue.
Context: Firebase Test Lab has physical devices, only some of which are highly available. If we run tests too frequently on a device that is not highly available, the tests will time out. This is not an issue for virtual devices.
#130497 appears to be the result of running a test on a busy day on a device that probably isn't highly available right now.
It would be great if we could tell Cocoon to run this test in batches of, say, 30 commits, so that it runs more or less once a day on a busy day (it's OK if it doesn't run at all on some days). Today the yaml file says that all targets run on every commit.
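For illustration, the requested knob might look something like the following; note that `batch_size` is hypothetical and does not exist in ci.yaml today, and the target name is illustrative:

```yaml
# Hypothetical only: `batch_size` does not exist in ci.yaml today.
# This sketches what the requested per-target configuration might
# look like.
targets:
  - name: Linux firebase_oriol33_abstract_method_smoke_test
    batch_size: 30 # proposed: schedule at most once per 30 commits
```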
@godofredoc @CaseyHillers for input
@reidbaker @gmackall @zanderso @jonahwilliams fyi