Need a way to verify flaky golden test fixes #111325
Comments
For reference, there has been some discussion in this thread: https://discord.com/channels/608014603317936148/1017957368182624297
On that note, it might be reasonable to accept a certain percentage of variation. I also remember seeing some image-diff projects that use machine learning to detect false positives in golden diffs (as that's not a Flutter-specific problem). Maybe that's something to look into.
@rrousselGit We allow fuzzy matching of images for the HTML renderer on the web, where we have limited control over how browsers render pixels and goldens are frequently flaky. For Skia, however, including CanvasKit, we expect pixel-perfect output. With a couple of exceptions, all our goldens are stable, and we treat the exceptions as bugs. We also found that using a percentage was quite risky: a golden may consist largely of empty space around and/or inside the content, so even 0.1% can amount to a large number of pixels. Instead, we use absolute pixel counts and color deltas.
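The approach described above (absolute pixel counts and color deltas rather than a percentage) can be sketched roughly as follows. This is an illustrative sketch, not Flutter's actual comparator; the function name and default thresholds are made up:

```dart
import 'dart:typed_data';

/// Compares two raw RGBA images of identical dimensions.
///
/// A pixel counts as "different" when any channel differs by more than
/// [maxColorDelta]; the images match when at most [maxDifferentPixels]
/// pixels differ. Both thresholds are absolute, not percentages, so a
/// golden that is mostly empty space cannot hide a large diff.
bool imagesMatch(
  Uint8List expected,
  Uint8List actual, {
  int maxColorDelta = 4,
  int maxDifferentPixels = 10,
}) {
  if (expected.length != actual.length) return false;
  var differentPixels = 0;
  for (var i = 0; i < expected.length; i += 4) {
    var delta = 0;
    for (var c = 0; c < 4; c++) {
      final d = (expected[i + c] - actual[i + c]).abs();
      if (d > delta) delta = d;
    }
    if (delta > maxColorDelta) differentPixels++;
  }
  return differentPixels <= maxDifferentPixels;
}
```

The design choice here is that both knobs bound the error in absolute terms, so the tolerance does not scale with image size the way a percentage would.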
Reopening since the fix was rolled back. |
@harryterkelsen is overhauling how we take screenshots, so I'm going to hold off on anything screenshot related for now. |
This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug.
Currently if a golden test is flaky we just skip it, e.g.:
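(The original example link is not preserved here; the following is a hypothetical sketch of what such a skipped golden test looks like, using the `skip` parameter of `testWidgets`. The widget under test and the golden file name are placeholders:)

```dart
import 'package:flutter/widgets.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  testWidgets('renders as expected', (WidgetTester tester) async {
    await tester.pumpWidget(const Placeholder());
    await expectLater(
      find.byType(Placeholder),
      matchesGoldenFile('placeholder.png'),
    );
    // Skipped entirely: no golden is generated, and nothing is
    // sent to Skia Gold while the skip is in place.
  }, skip: true); // Flaky golden.
}
```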
Let's say the flake is fixed (perhaps in Skia or in the engine). Skia Gold has a handy feature that, for a given test, shows how stable generated goldens are by giving each golden variant a unique color. For example, in the screenshot below the black, orange, and green circles indicate that the test generated three variations of a golden, i.e. it is flaky:
A non-flaky golden test will show a continuous string of dots of the same color, e.g.:
Unfortunately, we can't use this feature, because when we skip a test we stop sending images to Skia Gold entirely. The only option is to speculatively unskip the test and hope that it's no longer flaky. The cost of a mistake is a closed tree, P0s, wasted time, and other sadness.
Feature request
Add an optional parameter to `matchesGoldenFile`: `{ bool isFlaky = false }`. When set to `true`, we continue generating the golden, and we continue sending it to Skia Gold, but we don't fail the test. This has the same effect as skipping it, but it allows us to monitor the test over time, and when the flake is fixed the `isFlaky` argument can be removed.

Additionally, `flutter test` could print a warning to the console about the flaky golden, and we can include these warnings in our technical debt calculation.
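Under the proposal, the same test could stay active instead of being skipped. Note that `isFlaky` is the hypothetical parameter requested in this issue; it does not exist in `matchesGoldenFile` today, and the widget and file name below are placeholders:

```dart
await expectLater(
  find.byType(Placeholder),
  matchesGoldenFile(
    'placeholder.png',
    // Proposed, not yet implemented: the golden is still generated
    // and uploaded to Skia Gold, so its stability can be monitored,
    // but a mismatch no longer fails the test or closes the tree.
    isFlaky: true,
  ),
);
```

Once Skia Gold shows a single stable dot color for the test, `isFlaky: true` can simply be deleted.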