
"Skia Gold received an unapproved image in post-submit" failures incorrectly reported as flaky #105915


Open
HansMuller opened this issue Jun 13, 2022 · 16 comments
Labels
c: flake - Tests that sometimes, but not always, incorrectly pass
infra: auto flake bot - Issues with the bot that files flake issues
infra: metrics - Infrastructure metrics-related issues
P2 - Important issues not at the top of the work list
team-infra - Owned by Infrastructure team
triaged-infra - Triaged by Infrastructure team

Comments

@HansMuller
Contributor

These three issues were automatically marked as P1 flakes: https://github.com/flutter/flutter/issues?q=is%3Aissue+issues%3A+105613+105608+105614. That's in part because of "unapproved image in post-submit" failures reported by Skia Gold. These Skia Gold failures should not contribute to the flaky-test statistics; see for example #105613 (comment).

@HansMuller HansMuller added the infra: auto flake bot Issues with the bot that files flake issues label Jun 13, 2022
@Piinks Piinks added team-infra Owned by Infrastructure team infra: metrics Infrastructure metrics-related issues labels Jun 13, 2022
@HansMuller HansMuller added the P1 label Jun 13, 2022
@HansMuller
Contributor Author

I've marked this issue P1 because the linked issues (https://github.com/flutter/flutter/issues?q=is%3Aissue+issues%3A+105613+105608+105614) were all reported as P1.

@keyonghan
Contributor

Unfortunately there is no easy way for the flake bot to analyze raw error messages and distinguish among them when detecting flakes; it sees only pass or fail.
Instead, I think we should enforce the Gold image workflow, given that skipping it causes a task failure in CI and a red tree. Does that make sense? /cc @Piinks
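
To illustrate (a simplified sketch, not Cocoon's actual code; the `Run` type and the detection rule here are illustrative):

```python
# A simplified sketch of why the flake bot cannot tell an unapproved-image
# failure from a real flake: it classifies a task as flaky purely from the
# pass/fail statuses of its runs on a commit, never from error text.
from dataclasses import dataclass

@dataclass
class Run:
    status: str  # "pass" or "fail" -- the only signal the bot sees

def looks_flaky(runs: list[Run]) -> bool:
    """A task that both failed and passed on the same commit looks flaky,
    even if the failure was an unapproved Gold image and the pass was a
    retry after someone approved it."""
    statuses = {run.status for run in runs}
    return "pass" in statuses and "fail" in statuses
```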

@Piinks
Contributor

Piinks commented Jun 13, 2022

I think we should enforce the Gold image workflow if that will cause task failure in CI and tree red

We already do this. If the workflow is not followed, the tree turns red. That is why it is being accidentally marked flaky.

@keyonghan
Contributor

We already do this. If the workflow is not followed, the tree turns red. That is why it is being accidentally marked flaky.

I am wondering: should this be caught in presubmit tests?

@Piinks
Contributor

Piinks commented Jun 13, 2022

Please see #93515 for more context.
We cannot catch this reliably in presubmit.

@keyonghan
Contributor

Thanks for the context, @Piinks.

I would repurpose this issue to support filtering customized test failure errors out of flake detection, maybe via a flag to skip flake counts for such failures. But this will need non-trivial support to parse error messages, or to make test output parsable/general.
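
Something along these lines (a hypothetical sketch of the proposal; the signature list and the idea of scanning logs are not an existing Cocoon API):

```python
# Hypothetical filter: before counting a failed run toward flake statistics,
# scan its log for known "not actually a flake" signatures.
KNOWN_NON_FLAKE_SIGNATURES = [
    "Skia Gold received an unapproved image in post-submit",
]

def counts_toward_flake_stats(log_text: str) -> bool:
    """Exclude failures whose logs match a known non-flake signature."""
    return not any(sig in log_text for sig in KNOWN_NON_FLAKE_SIGNATURES)
```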

As this issue happened on only one commit and is not blocking anything, I'm decreasing it to P4 and moving it to technical debt.

@keyonghan keyonghan added P2 Important issues not at the top of the work list and removed P1 labels Jun 13, 2022
@keyonghan
Contributor

Trying to understand the workflow: if people don't follow the correct workflow, post-submit CI (the tree) goes red, and the expectation is that it stays red. However, our auto-retry reruns the failed builds and tries to green the tree ASAP, and the successful retry counts as a flake.
My question is: how would those unapproved images pass on the retry?

Regarding the presubmit checks:

  1. For Fix lerp to eccentric circle. #108743, the bot complained about a golden file change and added the will affect goldens label. Does it make sense to fail the Gold status check based on that?
  2. For Deprecate toggleableActiveColor #97972, there was no complaint about a golden file change, but it failed post-submit CI due to unapproved golden images. Is this expected?

As there is no easy way for the flake bot to exclude such failures/flakes, I am looking for potential workarounds for these golden image errors. Based on the dashboard, this issue has occurred on multiple commits in recent days and has intermittently turned the tree red.

@Piinks
Contributor

Piinks commented Aug 2, 2022

Sometimes golden file tests can be flaky, in that they do not produce the same image every time. I have only seen this happen on CanvasKit image tests, but @yjbanov is adding some fuzziness to the image tests to reduce that flakiness. That is what happened in case 2 above.

In the first case, the PR did introduce image changes, but the flutter-gold check can never go red in presubmit. Doing so would break the engine auto roller when it introduces image changes. That is why flutter-gold holds a pending state in presubmit until images are approved.

There may not be a way to filter these tests out of flaky reports; they are not easy to distinguish. If the tree goes (correctly) red on an unapproved image, someone can go to https://flutter-gold.skia.org/, approve the image, and then the test will pass on a retry.
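
For reference, the presubmit behavior amounts to something like this (a minimal sketch using GitHub's commit status REST API; the helper and its parameters are illustrative, not the actual flutter-gold implementation):

```python
# Sketch: a check like flutter-gold reports "pending" rather than "failure"
# in presubmit, so unapproved images never turn a PR red there.
import requests

def report_gold_status(owner: str, repo: str, sha: str, token: str,
                       images_approved: bool) -> None:
    state = "success" if images_approved else "pending"  # never "failure"
    requests.post(
        f"https://api.github.com/repos/{owner}/{repo}/statuses/{sha}",
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
        },
        json={
            "state": state,
            "context": "flutter-gold",
            "description": "All golden images approved" if images_approved
                           else "Waiting for golden image triage",
        },
        timeout=30,
    )
```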

@keyonghan
Contributor

keyonghan commented Aug 3, 2022

In the first case, the PR did introduce image changes, but the flutter-gold check can never go red in presubmit. Doing so would break the engine auto roller when it introduces image changes. That is why flutter-gold holds a pending state in presubmit until images are approved.

I see. So it seems we choose to block the framework tree later on, instead of failing the engine roller? IIUC, when a golden image changes in an engine roll PR, it needs either a manual approval in presubmit (which we are not doing) or a manual approval in post-submit when it turns the tree red (which is what we are doing now).

Why do we not fail earlier? The latter blocks the entire development workflow, and both need manual intervention anyway. Did I miss anything? /cc @zanderso

@Piinks
Contributor

Piinks commented Aug 3, 2022

either it needs a manual approval in presubmit (which we are not doing)

Oh no, this is what we are doing. The engine sheriff approves images if the roll introduces new images.

@zanderso
Member

zanderso commented Aug 3, 2022

Yeah, the Engine roll shouldn't turn red, but should be held pending waiting for manual approval. (I think the notification that a roll is in this state probably needs improvement. Sometimes the sheriff doesn't notice for several hours.)

@keyonghan
Contributor

keyonghan commented Aug 3, 2022

Synced with @Piinks; the workflow does make sense.

Now we have two issues on the Gold server side that contribute to flakiness on our side.

@yjbanov
Contributor

yjbanov commented Sep 2, 2022

I have only seen this happen on canvas kit image tests, but @yjbanov adding some fuzziness to the image test to reduce that flakiness.

Fuzzy matching is only used for the HTML renderer. CanvasKit uses strict matching, like the non-web version.
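
To make the distinction concrete (a simplified sketch; the thresholds and function names are illustrative, not the actual matchers in flutter/engine):

```python
# Strict matching (CanvasKit, non-web): every pixel must be identical.
# Fuzzy matching (HTML renderer): tolerate small per-channel deltas on a
# small fraction of pixels. Images are assumed to be uint8 RGBA arrays.
import numpy as np

def strict_match(actual: np.ndarray, golden: np.ndarray) -> bool:
    return actual.shape == golden.shape and np.array_equal(actual, golden)

def fuzzy_match(actual: np.ndarray, golden: np.ndarray,
                max_channel_delta: int = 4,
                max_diff_pixel_ratio: float = 0.01) -> bool:
    if actual.shape != golden.shape:
        return False
    delta = np.abs(actual.astype(np.int16) - golden.astype(np.int16))
    diff_pixels = (delta > max_channel_delta).any(axis=-1)  # bool per pixel
    return diff_pixels.mean() <= max_diff_pixel_ratio
```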

@ricardoamador
Contributor

@godofredoc Any update?

@godofredoc godofredoc removed their assignment Jun 14, 2023
@flutter-triage-bot flutter-triage-bot bot added the c: flake Tests that sometimes, but not always, incorrectly pass label Jul 7, 2023
@ricardoamador ricardoamador added the triaged-infra Triaged by Infrastructure team label Aug 23, 2023
@matanlurey
Contributor

Please see #93515 for more context. We cannot catch this reliably in presubmit.

@Piinks Low priority; do you think this is still important?

@Piinks
Contributor

Piinks commented Feb 26, 2025

I think so, though I agree it's low priority right now. In an ideal world, our flake detection bot could be a bit smarter about this scenario.

@Piinks Piinks removed their assignment Feb 26, 2025