
Allow configuration of commit batch size in ci.yaml #130499


Closed

dnfield opened this issue Jul 13, 2023 · 18 comments
Labels
a: tests ("flutter test", flutter_test, or one of our tests)
team-infra (Owned by Infrastructure team)

Comments

@dnfield
Contributor

dnfield commented Jul 13, 2023

Context: Firebase Test Lab has physical devices, only some of which are highly available. If we run tests too frequently on a device that is not highly available, they will time out. This is not an issue for virtual devices.

#130497 appears to be the result of running a test on a busy day on a device that probably isn't highly available right now.

It would be great if we could tell Cocoon to run this test in batches of, say, 30 commits, so that it runs more or less once a day on a busy day (it's okay if it doesn't run at all on some days). Today the yaml file says that all targets run on every commit.
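
A purely hypothetical sketch of what such a knob could look like in .ci.yaml (the batch_size key does not exist today; it is the feature being requested here):

  targets:
    - name: Linux firebase_oriol33_abstract_method_smoke_test
      # Hypothetical: schedule this target at most once per 30 commits,
      # instead of on every commit.
      batch_size: 30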

@godofredoc @CaseyHillers for input

@reidbaker @gmackall @zanderso @jonahwilliams fyi

@dnfield dnfield added the a: tests and team-infra labels Jul 13, 2023
@keyonghan
Contributor

Cocoon already batch-schedules framework tasks with a batch size of 6; it is the back-filling logic that later fills in the tasks skipped within a batch.
One thing we can do is add a flag in .ci.yaml to skip backfilling for the target.
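
A sketch of that proposal, with the flag name backfill chosen for illustration (this is the flag being proposed here, not something .ci.yaml supports yet):

  targets:
    - name: Linux firebase_oriol33_abstract_method_smoke_test
      properties:
        # Proposed: don't back-fill the commits this target skipped
        # within a scheduling batch (the batch size is currently 6).
        backfill: "false"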

Question: if we can tolerate a test running at a batch size of 30 commits, or even not running at all on some days, is the test important enough to validate the tree?

@gmackall
Member

A context question about the particular test on the linked PR: I see that it was enabled in presubmit and therefore running in the checks on PRs. It was probably running a good bit more than 30 times a day then, right?

I ask because 1) I want to confirm that we do in fact run Firebase tests in presubmit (I didn't know this, if so), and 2) I don't have a good understanding of the ratio of presubmit runs to postsubmit runs.

@zanderso
Member

@keyonghan The request is to configure the batch size on a per-test basis. The tests are important, but for cases where FTL lacks capacity for specific devices, running more frequently will cause tests to time out while waiting for available devices, and close the tree.

@keyonghan
Contributor

keyonghan commented Jul 13, 2023

@keyonghan The request is to configure the batch size on a per-test basis. The tests are important, but for cases where FTL lacks capacity for specific devices, running more frequently will cause tests to time out while waiting for available devices, and close the tree.

This should be doable, but I am concerned about the case where a real tree-breaking change lands within the batch. Say a test has a batch size of 30 and the 2nd commit in the batch contains a breaking change; then we can only catch the breakage up to ~30 commits later.

Also, is there any other use case in addition to this FTL test? Does it make more sense to mark it as flaky (as Dan has done now) so that the staging pool validates it all the time? Though it may fail consistently there, it will not block the tree and will not miss validation of any breaking change. Marking a test as flaky or changing the batch size (if supported) each needs a PR, and each needs a revert PR when bot/device capacity is back.

@dnfield
Contributor Author

dnfield commented Jul 13, 2023

The basic problem is that we'd like to test on hardware that does not have enough availability to test on every commit.

We cannot run those tests on presubmit, as that will be too much.

We currently have two options:

  • Mark the test as flaky indefinitely and manually check if it's failing.
  • Mark the test as non-flaky and deal with flakes.

If we run the test less frequently, it should be less flaky (because we won't be overloading the availability of devices in FTL).
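
For reference, a minimal sketch of the first option as it looks in .ci.yaml today, assuming the existing bringup flag (which routes a target to the staging pool so it doesn't gate the tree):

  targets:
    - name: Linux firebase_oriol33_abstract_method_smoke_test
      # Marks the target as flaky: it runs in the staging pool and
      # does not block the tree.
      bringup: true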

@dnfield
Contributor Author

dnfield commented Jul 13, 2023

Ideally, FTL would give us an API to check whether a device is available, but that does not exist AFAIK.

@dnfield
Contributor Author

dnfield commented Jul 13, 2023

I started a thread in the internal Flutter/FTL group to see if we can figure out why this test took so long to time out and whether there's a better option the FTL team can give us too.

@keyonghan
Contributor

Mark the test as flaky indefinitely and manually check if it's failing.

The target is being validated in the staging pool (after being marked as flaky). See a passing build: https://ci.chromium.org/ui/p/flutter/builders/staging/Linux%20firebase_oriol33_abstract_method_smoke_test/484/overview

@keyonghan
Contributor

Another thing we want to do in the future is to disable the presubmit run with presubmit: false. While it was enabled, it was being validated in the try pool: https://github.com/flutter/flutter/pull/130497/files
That may be one reason for the high consumption of device capacity.
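
A minimal sketch of that change in .ci.yaml, assuming the per-target presubmit flag described above (other fields are illustrative):

  targets:
    - name: Linux firebase_oriol33_abstract_method_smoke_test
      # Skip the try pool on PRs; run in postsubmit only.
      presubmit: false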

@dnfield
Contributor Author

dnfield commented Jul 13, 2023

Ahh so maybe we should just mark the ones with lower availability as presubmit: false?

@keyonghan
Contributor

Ahh so maybe we should just mark the ones with lower availability as presubmit: false?

That should help. The number of presubmit runs is much larger than the number of postsubmit runs.

@reidbaker
Contributor

The basic problem is that we'd like to test on hardware that does not have enough availability to test on every commit.

We cannot run those tests on presubmit, as that will be too much.

We currently have two options:

  • Mark the test as flaky indefinitely and manually check if it's failing.
  • Mark the test as non-flaky and deal with flakes.

If we run the test less frequently, it should be less flaky (because we won't be overloading the availability of devices in FTL).

The issue here is that the test is not flaky; the infrastructure running the test is flaky, and we have a way to deal with that. We also have the option of extending the timeout.
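
A minimal sketch of the timeout option in .ci.yaml, assuming the per-target timeout field (in minutes; the value here is illustrative):

  targets:
    - name: Linux firebase_oriol33_abstract_method_smoke_test
      # Give the task more headroom to wait for a device to free up.
      timeout: 60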

@dnfield
Contributor Author

dnfield commented Jul 13, 2023

Right now, here's the list of device availability. The device we're using for this test has "Medium" availability; there is a high-availability API 33 device (panther) we should try instead (see the sketch after the table).

 gcloud firebase test android list-device-capacities
┌──────────────┬─────────────────────────────┬───────────────┬─────────────────┐
│   MODEL_ID   │          MODEL_NAME         │ OS_VERSION_ID │ DEVICE_CAPACITY │
├──────────────┼─────────────────────────────┼───────────────┼─────────────────┤
│ 1610         │ vivo 1610                   │ 23            │ Medium          │
│ F01L         │ F-01L                       │ 27            │ High            │
│ FRT          │ Nokia 1                     │ 27            │ High            │
│ G8142        │ G8142                       │ 25            │ Low             │
│ HWCOR        │ COR-L29                     │ 27            │ Medium          │
│ HWMHA        │ MHA-L29                     │ 24            │ Medium          │
│ SH-01L       │ SH-01L                      │ 28            │ High            │
│ TC77         │ TC77                        │ 27            │ Low             │
│ a10          │ SM-A105FN                   │ 29            │ High            │
│ a51          │ SM-A515U                    │ 31            │ Low             │
│ b0q          │ SM-S908U1                   │ 33            │ High            │
│ bluejay      │ Pixel 6a                    │ 32            │ Medium          │
│ blueline     │ Pixel 3                     │ 28            │ High            │
│ cactus       │ Redmi 6A                    │ 27            │ High            │
│ cheetah      │ Pixel 7 Pro                 │ 33            │ Medium          │
│ crownqlteue  │ SM-N960U1                   │ 29            │ Medium          │
│ dreamlte     │ SM-G950F                    │ 28            │ High            │
│ f2q          │ SM-F916U1                   │ 30            │ Medium          │
│ felix        │ Pixel Fold                  │ 33            │ High            │
│ felix_camera │ Pixel Fold (Camera-enabled) │ 33            │ Low             │
│ grandppltedx │ SM-G532G                    │ 23            │ High            │
│ griffin      │ XT1650                      │ 24            │ Low             │
│ gts3lltevzw  │ SM-T827V                    │ 28            │ Low             │
│ gts8uwifi    │ SM-X900                     │ 33            │ High            │
│ hammerhead   │ Nexus 5                     │ 23            │ Medium          │
│ harpia       │ Moto G Play                 │ 23            │ Medium          │
│ java         │ Motorola G20                │ 30            │ High            │
│ lv0          │ LG-AS110                    │ 23            │ High            │
│ oriole       │ Pixel 6                     │ 31            │ High            │
│ oriole       │ Pixel 6                     │ 32            │ Medium          │
│ oriole       │ Pixel 6                     │ 33            │ Medium          │
│ panther      │ Pixel 7                     │ 33            │ High            │
│ pettyl       │ moto e5 play                │ 27            │ Medium          │
│ q2q          │ SM-F926U1                   │ 30            │ Low             │
│ q2q          │ SM-F926U1                   │ 31            │ Low             │
│ r11          │ Google Pixel Watch          │ 30            │ Medium          │
│ redfin       │ Pixel 5                     │ 30            │ High            │
│ sailfish     │ Pixel                       │ 25            │ Medium          │
│ starqlteue   │ SM-G960U1                   │ 26            │ High            │
│ tangorpro    │ Pixel Tablet                │ 33            │ High            │
│ x1q          │ SM-G981U1                   │ 29            │ High            │
└──────────────┴─────────────────────────────┴───────────────┴─────────────────┘
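
A sketch of pointing the target at panther instead, reusing the target name from the staging build linked above; the physical_devices property name and its argument format are assumptions for illustration, not confirmed from the real .ci.yaml:

  targets:
    - name: Linux firebase_oriol33_abstract_method_smoke_test
      properties:
        # Device-selection args forwarded to `gcloud firebase test android run`;
        # panther (Pixel 7, API 33) is listed as High capacity above.
        physical_devices: >-
          ["--device", "model=panther,version=33"]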

@CaseyHillers
Contributor

Should we disable FTL on the release branches? RCs don't batch tasks.

@dnfield
Contributor Author

dnfield commented Jul 13, 2023

RCs also only run pretty infrequently, right?

@CaseyHillers
Contributor

RCs also only run pretty infrequently, right?

Based on Q2 data, there were about 2 runs per workday from RCs.

@godofredoc
Contributor

Based on the comments, it seems the solution was to mark the test as presubmit: false.

@github-actions

github-actions bot commented Sep 5, 2023

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of flutter doctor -v and a minimal reproduction of the issue.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 5, 2023