Need a way to verify flaky golden test fixes #111325


Closed · yjbanov opened this issue Sep 10, 2022 · 9 comments · Fixed by #114450

Labels: c: contributor-productivity, c: flake, infra: auto flake bot, P1, team-infra

Comments


yjbanov commented Sep 10, 2022

Currently, if a golden test is flaky, we just skip it, as sketched below.
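A minimal sketch of such a skip using flutter_test's skip flag; the widget and golden file name are placeholders, not a real test from the repo:

```dart
import 'package:flutter/widgets.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  testWidgets('golden', (WidgetTester tester) async {
    await tester.pumpWidget(const Placeholder());
    await expectLater(
      find.byType(Placeholder),
      matchesGoldenFile('placeholder.png'), // hypothetical golden file
    );
  },
  // Skipping disables the test entirely: no golden is generated
  // and nothing is sent to Skia Gold.
  skip: true); // https://github.com/flutter/flutter/issues/111325
}
```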

Let's say the flake is fixed (perhaps in Skia or in the engine). Skia Gold has a handy feature that, for a given test, shows how stable generated goldens are by giving each golden variant a unique color. For example, in the screenshot below the black, orange, and green circles indicate that the test generated three variations of a golden, i.e. it is flaky:

[Screenshot, Sep 2, 2022: a Skia Gold trace with black, orange, and green dots, indicating three golden variants]

A non-flaky golden test will show a continuous string of dots of the same color, e.g.:

[Screenshot, Sep 9, 2022: a Skia Gold trace with a continuous string of same-color dots]

Unfortunately, we can't use this feature, because when we skip a test we stop sending images to Skia Gold entirely. The only option is to speculatively unskip the test and hope that it's no longer flaky. The cost of a mistake is a closed tree, P0s, wasted time, and other sadness.

Feature request

Add an optional parameter to matchesGoldenFile: { bool isFlaky = false }. When set to true, we continue generating the golden and sending it to Skia Gold, but we don't fail the test. This has the same effect as skipping the test, while still letting us monitor it over time; once the flake is fixed, the isFlaky argument can be removed.
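Sketched at a call site, the proposal might look like this (isFlaky does not exist in matchesGoldenFile today, and the golden name is illustrative):

```dart
await expectLater(
  find.byType(Placeholder),
  matchesGoldenFile(
    'placeholder.png',
    // Proposed: keep generating and uploading the golden to
    // Skia Gold, but never fail the test on a mismatch.
    isFlaky: true,
  ),
);
```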

Additionally, flutter test could print a warning to the console about the flaky golden, and we could include such warnings in our technical-debt calculation.

yjbanov added the c: contributor-productivity, team: flakes, team-infra, and infra: auto flake bot labels Sep 10, 2022

Piinks commented Sep 12, 2022

For reference, there has been some discussion in this thread: https://discord.com/channels/608014603317936148/1017957368182624297

rrousselGit commented:

On that note, it might be reasonable to accept a certain percentage of variation. Currently a test fails even if there's only a 0.1% difference between the images; an error margin could help (sketched below).

I also remember seeing some image diff projects using machine learning to detect false positives in golden diffs (as that's not a Flutter-specific problem). Maybe that's something to look into.
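For illustration, the error margin suggested above could be sketched as a custom comparator built on flutter_test's LocalFileComparator; the class name and default tolerance here are hypothetical:

```dart
import 'dart:typed_data';
import 'package:flutter_test/flutter_test.dart';

/// Accepts a golden mismatch as long as the fraction of differing
/// pixels stays at or under [tolerance] (0.001 == 0.1%).
class TolerantGoldenComparator extends LocalFileComparator {
  TolerantGoldenComparator(super.testFile, {this.tolerance = 0.001});

  final double tolerance;

  @override
  Future<bool> compare(Uint8List imageBytes, Uri golden) async {
    final ComparisonResult result = await GoldenFileComparator.compareLists(
      imageBytes,
      await getGoldenBytes(golden),
    );
    // diffPercent is a fraction in [0, 1], not a percentage.
    return result.passed || result.diffPercent <= tolerance;
  }
}
```

A test suite could opt in by assigning an instance to flutter_test's top-level goldenFileComparator in its setup.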


yjbanov commented Sep 16, 2022

@rrousselGit We allow fuzzy matching of images for the HTML renderer on the web, where we have limited control over how browsers render pixels and the results are frequently flaky. However, for Skia, including CanvasKit, we expect pixel-perfect output. With a couple of exceptions, all our goldens are stable. We treat the exceptions as bugs.

We also found that using a percentage was quite risky: a golden often contains a lot of empty space around and/or inside the content, so even a 0.1% difference can amount to a large change in the content itself. Instead, we use absolute pixel counts and color deltas.
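As an illustration of the absolute-count idea (not Flutter's actual web-engine matcher, which is more involved), the sketch below derives a pixel count from diffPercent and the decoded image size; the class name and threshold are hypothetical, and the color-delta half is omitted for brevity:

```dart
import 'dart:typed_data';
import 'dart:ui' as ui;
import 'package:flutter_test/flutter_test.dart';

/// Accepts a golden mismatch only if the absolute number of differing
/// pixels is at most [maxDifferentPixels], regardless of image size.
class PixelCountGoldenComparator extends LocalFileComparator {
  PixelCountGoldenComparator(super.testFile, {this.maxDifferentPixels = 16});

  final int maxDifferentPixels;

  @override
  Future<bool> compare(Uint8List imageBytes, Uri golden) async {
    final ComparisonResult result = await GoldenFileComparator.compareLists(
      imageBytes,
      await getGoldenBytes(golden),
    );
    if (result.passed) {
      return true;
    }
    // Decode the screenshot to learn its dimensions, then turn the
    // fractional diff into an absolute pixel count.
    final ui.Codec codec = await ui.instantiateImageCodec(imageBytes);
    final ui.Image image = (await codec.getNextFrame()).image;
    final int differingPixels =
        (result.diffPercent * image.width * image.height).round();
    return differingPixels <= maxDifferentPixels;
  }
}
```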

yjbanov self-assigned this Sep 27, 2022
yjbanov added the P1 label Sep 27, 2022

yjbanov commented Nov 11, 2022

Reopening since the fix was rolled back.

yjbanov reopened this Nov 11, 2022

Piinks commented Dec 2, 2022

I am unassigning myself right now, as I am not actively working on this issue. Landing #115004 would close this again, but it is blocked on #93263. We can revisit #115004 after that is resolved.

Piinks removed their assignment Dec 2, 2022
ricardoamador commented:

It looks like #115004 is not blocking this issue, but rather that this issue is blocked directly by #93263. Is that correct?


Piinks commented Feb 2, 2023

Can confirm! #115004 would fix this issue, but it is blocked on #93263.

flutter-triage-bot added the c: flake label Jul 7, 2023

yjbanov commented Sep 28, 2023

@harryterkelsen is overhauling how we take screenshots, so I'm going to hold off on anything screenshot related for now.

yjbanov closed this as completed Sep 28, 2023
github-actions bot commented:

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of flutter doctor -v and a minimal reproduction of the issue.

github-actions bot locked as resolved and limited conversation to collaborators Oct 12, 2023