"Skia Gold received an unapproved image in post-submit" failures incorrectly reported as flaky #105915
I've marked this issue P1 because the linked issues (https://github.com/flutter/flutter/issues?q=is%3Aissue+issues%3A+105613+105608+105614) were all reported as P1.
Unfortunately there is no easy way for the flake bot to analyze raw error messages and distinguish between failure types when detecting flakes (it sees only pass or fail).
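One possible direction, sketched here as a minimal illustration: if the flake bot had access to failure logs, it could match them against a list of known non-flake error patterns before counting a retried-then-green build as a flake. All names and the pattern list below are hypothetical, not Cocoon's actual API.

```python
import re

# Hypothetical list of error patterns that indicate a legitimate red
# (e.g. an unapproved Skia Gold image) rather than a flake.
KNOWN_NON_FLAKE_PATTERNS = [
    re.compile(r"Skia Gold received an unapproved image in post-submit"),
]


def is_known_non_flake(failure_log: str) -> bool:
    """Return True if the failure log matches a known non-flake error."""
    return any(p.search(failure_log) for p in KNOWN_NON_FLAKE_PATTERNS)


def count_as_flake(failed_then_passed: bool, failure_log: str) -> bool:
    """A build that failed and then passed on retry counts as a flake
    only if its original failure is not a known non-flake error."""
    return failed_then_passed and not is_known_non_flake(failure_log)
```

With this kind of filter, an "unapproved image" failure that later passes after manual approval would no longer inflate the flake statistics, while genuine flakes would still be counted.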
We already do this. If the workflow is not followed, the tree turns red. That is why it is being accidentally marked flaky.
I am wondering if this should be caught in presubmit tests?
Please see #93515 for more context.
Thanks for the context @Piinks. I would repurpose this issue to support filtering customized test failure errors out of flake detection. Maybe a flag to skip
As this issue happened on only one commit, and is not blocking anything, decreasing to P4 and moving to technical debt.
Trying to understand the workflow. If people don't follow the correct way, this turns the post-submit CI (tree) red, and the tree is expected to stay red. However, our auto-retry will rerun the failed builds and try to green the tree as soon as possible; a successful retry then counts toward the flake statistics. Regarding the pre-submit check:
As there is no easy way for the flake bot to exclude such failures/flakes, I am looking for potential workarounds for these golden image errors. Based on the dashboard, this issue has happened on multiple commits in recent days and caused the tree to go red intermittently.
Sometimes golden file tests can be flaky, in that they do not produce the same image every time. I have only seen this happen on CanvasKit image tests, but @yjbanov is adding some fuzziness to the image test to reduce that flakiness. That is what happened in the case of 2 above. In the first case, the PR did introduce image changes, but the flutter-gold check can never go red in presubmit; doing so would break the engine auto-roller when it introduces image changes. That is why flutter-gold holds a pending state in presubmit until images are approved. There may not be a way to filter these tests out of flaky reports; they are not easy to distinguish. If the tree goes (correctly) red on an unapproved image, someone can go to https://flutter-gold.skia.org/, approve the image, and then it will pass on a retry.
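For readers unfamiliar with "fuzziness" in image tests: the idea is to tolerate small per-channel differences and a small fraction of differing bytes instead of requiring byte-exact equality. A minimal sketch follows; the thresholds and function name are assumptions for illustration, not the actual matcher used by flutter-gold or the web engine tests.

```python
def images_match(golden: list, candidate: list,
                 max_channel_delta: int = 2,
                 max_diff_ratio: float = 0.001) -> bool:
    """Fuzzy comparison of two flat RGBA byte buffers of equal length.

    A byte 'differs' only if it deviates from the golden by more than
    max_channel_delta; the images match if the fraction of differing
    bytes is at most max_diff_ratio.
    """
    if len(golden) != len(candidate) or not golden:
        return False
    differing = sum(
        1 for g, c in zip(golden, candidate) if abs(g - c) > max_channel_delta
    )
    return differing / len(golden) <= max_diff_ratio
```

A strict matcher is the special case `max_channel_delta=0, max_diff_ratio=0.0`; loosening either knob trades false reds (flakes) for the risk of missing tiny real regressions.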
I see. So it seems we choose to block the framework tree later on, instead of failing the engine roller? IIUC, when a golden image changes in an engine roll PR, it needs either manual approval in presubmit (which we are not doing), or manual approval in postsubmit when it turns the tree red (which is what we are doing now). Why do we not fail earlier? The latter blocks the entire development workflow, and both need manual intervention anyway. Did I miss anything? /cc @zanderso
Oh no, this is what we are doing. The engine sheriff approves images if the roll introduces new images.
Yeah, the engine roll shouldn't turn red, but should be held pending, waiting for manual approval. (I think the notification that a roll is in this state probably needs improvement. Sometimes the sheriff doesn't notice for several hours.)
Synced with @Piinks; the workflow does make sense. Now we have two issues on the gold server side which contribute to flakiness on our side.
Fuzzy matching is only used for the HTML renderer. CanvasKit uses strict matching, like the non-web version.
@godofredoc Any update?
I think so. I agree, though, that it's low priority right now. In an ideal world, our flaky bot detector could be a bit smarter about this scenario.
These three issues were automatically marked as P1 flakes: https://github.com/flutter/flutter/issues?q=is%3Aissue+issues%3A+105613+105608+105614. That is, in part, because of "unapproved image in post-submit" failures reported by Skia Gold. These Skia Gold failures should not contribute to the flaky test statistics; see for example #105613 (comment)