Need a way to verify flaky golden test fixes #111325


Closed · yjbanov opened this issue Sep 10, 2022 · 9 comments · Fixed by #114450

Labels: c: contributor-productivity, c: flake, infra: auto flake bot, P1, team-infra

Comments


yjbanov commented Sep 10, 2022

Currently, if a golden test is flaky, we just skip it, as sketched below.
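A minimal sketch of such a skip using flutter_test's skip flag; the widget and golden file name are placeholders, not a real test from the repo:

```dart
import 'package:flutter/widgets.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  testWidgets('golden', (WidgetTester tester) async {
    await tester.pumpWidget(const Placeholder());
    await expectLater(
      find.byType(Placeholder),
      matchesGoldenFile('placeholder.png'), // hypothetical golden file
    );
  },
  // Skipping disables the test entirely: no golden is generated
  // and nothing is sent to Skia Gold.
  skip: true); // https://github.com/flutter/flutter/issues/111325
}
```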

Let's say the flake is fixed (perhaps in Skia or in the engine). Skia Gold has a handy feature that, for a given test, shows how stable generated goldens are by giving each golden variant a unique color. For example, in the screenshot below the black, orange, and green circles indicate that the test generated three variations of a golden, i.e. it is flaky:

[Screenshot, Sep 2, 2022: a Skia Gold trace with black, orange, and green dots, indicating three golden variants]

A non-flaky golden test will show a continuous string of dots of the same color, e.g.:

[Screenshot, Sep 9, 2022: a Skia Gold trace with a continuous string of same-color dots]

Unfortunately, we can't use this feature, because when we skip a test we stop sending images to Skia Gold entirely. The only option is to speculatively unskip the test and hope that it's no longer flaky. The cost of a mistake is a closed tree, P0s, wasted time, and other sadness.

Feature request

Add an optional parameter to matchesGoldenFile: { bool isFlaky = false }. When set to true, we continue generating the golden and sending it to Skia Gold, but we don't fail the test. This has the same effect as skipping the test, while still letting us monitor it over time; once the flake is fixed, the isFlaky argument can be removed.
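Sketched at a call site, the proposal might look like this (isFlaky does not exist in matchesGoldenFile today, and the golden name is illustrative):

```dart
await expectLater(
  find.byType(Placeholder),
  matchesGoldenFile(
    'placeholder.png',
    // Proposed: keep generating and uploading the golden to
    // Skia Gold, but never fail the test on a mismatch.
    isFlaky: true,
  ),
);
```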

Additionally, flutter test could print a warning to the console about the flaky golden, and we could include such warnings in our technical-debt calculation.

yjbanov added the c: contributor-productivity, team: flakes, team-infra, and infra: auto flake bot labels Sep 10, 2022

Piinks commented Sep 12, 2022

For reference, there has been some discussion in this thread: https://discord.com/channels/608014603317936148/1017957368182624297

rrousselGit commented:

On that note, it might be reasonable to accept a certain percentage of variation. Currently a test fails even if there's only a 0.1% difference between the images; an error margin could help (sketched below).

I also remember seeing some image diff projects using machine learning to detect false positives in golden diffs (as that's not a Flutter-specific problem). Maybe that's something to look into.
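For illustration, the error margin suggested above could be sketched as a custom comparator built on flutter_test's LocalFileComparator; the class name and default tolerance here are hypothetical:

```dart
import 'dart:typed_data';
import 'package:flutter_test/flutter_test.dart';

/// Accepts a golden mismatch as long as the fraction of differing
/// pixels stays at or under [tolerance] (0.001 == 0.1%).
class TolerantGoldenComparator extends LocalFileComparator {
  TolerantGoldenComparator(super.testFile, {this.tolerance = 0.001});

  final double tolerance;

  @override
  Future<bool> compare(Uint8List imageBytes, Uri golden) async {
    final ComparisonResult result = await GoldenFileComparator.compareLists(
      imageBytes,
      await getGoldenBytes(golden),
    );
    // diffPercent is a fraction in [0, 1], not a percentage.
    return result.passed || result.diffPercent <= tolerance;
  }
}
```

A test suite could opt in by assigning an instance to flutter_test's top-level goldenFileComparator in its setup.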


yjbanov commented Sep 16, 2022

@rrousselGit We allow fuzzy matching of images for the HTML renderer on the web, where we have limited control over how browsers render pixels and the results are frequently flaky. However, for Skia, including CanvasKit, we expect pixel-perfect output. With a couple of exceptions, all our goldens are stable. We treat the exceptions as bugs.

We also found that using a percentage was quite risky: a golden often contains a lot of empty space around and/or inside the content, so even a 0.1% difference can amount to a large change in the content itself. Instead, we use absolute pixel counts and color deltas.
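As an illustration of the absolute-count idea (not Flutter's actual web-engine matcher, which is more involved), the sketch below derives a pixel count from diffPercent and the decoded image size; the class name and threshold are hypothetical, and the color-delta half is omitted for brevity:

```dart
import 'dart:typed_data';
import 'dart:ui' as ui;
import 'package:flutter_test/flutter_test.dart';

/// Accepts a golden mismatch only if the absolute number of differing
/// pixels is at most [maxDifferentPixels], regardless of image size.
class PixelCountGoldenComparator extends LocalFileComparator {
  PixelCountGoldenComparator(super.testFile, {this.maxDifferentPixels = 16});

  final int maxDifferentPixels;

  @override
  Future<bool> compare(Uint8List imageBytes, Uri golden) async {
    final ComparisonResult result = await GoldenFileComparator.compareLists(
      imageBytes,
      await getGoldenBytes(golden),
    );
    if (result.passed) {
      return true;
    }
    // Decode the screenshot to learn its dimensions, then turn the
    // fractional diff into an absolute pixel count.
    final ui.Codec codec = await ui.instantiateImageCodec(imageBytes);
    final ui.Image image = (await codec.getNextFrame()).image;
    final int differingPixels =
        (result.diffPercent * image.width * image.height).round();
    return differingPixels <= maxDifferentPixels;
  }
}
```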

yjbanov self-assigned this Sep 27, 2022
yjbanov added the P1 label Sep 27, 2022

yjbanov commented Nov 11, 2022

Reopening since the fix was rolled back.

yjbanov reopened this Nov 11, 2022

Piinks commented Dec 2, 2022

I am unassigning myself right now, as I am not actively working on this issue. Landing #115004 would close this again, but it is blocked on #93263. We can revisit #115004 after that is resolved.

Piinks removed their assignment Dec 2, 2022
ricardoamador commented:

It looks like #115004 is not blocking this issue, but rather that this issue is blocked directly by #93263. Is that correct?


Piinks commented Feb 2, 2023

Can confirm! #115004 would fix this issue, but it is blocked on #93263.

flutter-triage-bot added the c: flake label Jul 7, 2023

yjbanov commented Sep 28, 2023

@harryterkelsen is overhauling how we take screenshots, so I'm going to hold off on anything screenshot related for now.

yjbanov closed this as completed Sep 28, 2023
github-actions bot commented:

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of flutter doctor -v and a minimal reproduction of the issue.

github-actions bot locked as resolved and limited conversation to collaborators Oct 12, 2023