[ty] Render all changed diagnostics in conformance.py #23613
WillDuke wants to merge 17 commits into astral-sh:main from
Conversation
Typing conformance results improved 🎉

Summary

How are test cases classified?

Each test case represents one expected error annotation or a group of annotations sharing a tag. Counts are per test case, not per diagnostic: multiple diagnostics on the same line count as one. Required annotations (`E`) are true positives when ty flags the expected location and false negatives when it does not. Optional annotations (`E?`) are true positives when flagged but true negatives (not false negatives) when not. Tagged annotations (`E[tag]`) require ty to flag exactly one of the tagged lines; tagged multi-annotations (`E[tag+]`) allow any number up to the tag count. Flagging unexpected locations counts as a false positive.
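The classification rules above can be sketched in code. This is an illustrative sketch only, not the actual `conformance.py` implementation; the `TestCase` shape and `classify` function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    # "required" (E), "optional" (E?), "tagged" (E[tag]), "tagged_multi" (E[tag+])
    kind: str
    # lines carrying the expected-error annotation (or the tag group)
    expected_lines: set[int]

def classify(case: TestCase, flagged: set[int]) -> str:
    """Classify one test case given the set of lines ty flagged."""
    hits = case.expected_lines & flagged
    if case.kind == "required":
        # E: must be flagged; a miss is a false negative
        return "TP" if hits else "FN"
    if case.kind == "optional":
        # E?: not flagging an optional annotation is a true negative, not a miss
        return "TP" if hits else "TN"
    if case.kind == "tagged":
        # E[tag]: exactly one of the tagged lines must be flagged
        return "TP" if len(hits) == 1 else "FN"
    if case.kind == "tagged_multi":
        # E[tag+]: any number of the tagged lines (at least one) counts once
        return "TP" if hits else "FN"
    raise ValueError(f"unknown kind: {case.kind}")
```

Flagging a line outside every test case's `expected_lines` would be counted separately as a false positive; that bookkeeping is omitted here for brevity.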
Files changed

True positives added (1): 1 diagnostic
False positives removed (4): 4 diagnostics
True positives changed (4): 4 diagnostics
False positives changed (1): 1 diagnostic
a757c4f to 9752926
Memory usage report

Summary

Significant changes: prefect, sphinx, trio, flake8
Thanks @WillDuke for working on improving this! It seems to me that having the same line(s) show up as both e.g. "true positives added" and "true positives removed" is pretty confusing. I would rather have a separate "diagnostics changed" section where we give the diff for any line which didn't change true/false positive status, but where the diagnostic is now different.

Will this PR also fix the issue observed just now in #23611, where a line went from one false positive to two false positives, resulting in a "1 added false positive" in the summary table, but no details below? I think in that case it could also just show up in "diagnostics changed", since the line did not change status.

(This should maybe be separate, but while I'm wishing for features, it would be awesome if this job were to notify us if a PR moves an entire conformance-suite file into -- or I guess out of -- fully-passing state.)

This job is already a massive improvement over what we had before, thank you so much!
Thanks, Carl! Yeah, I wasn't sure how best to render the changed ones. Maybe a wider table that has both versions? Or perhaps just show the new diagnostics?

And yes, this would fix that issue. At least, it would now show one row under each of "false positives removed" and "false positives added", with the latter having two grouped diagnostics in the same confusing way as seen here. I can also look at adding a file-level summary!

Edit: I went ahead and tried one approach: changed diagnostics get their own table, with rows alternating between the diagnostics removed and added for each test case. The current conformance comment has been updated to reflect these changes. The changes I added to demonstrate the fix didn't cause a new file to fully pass, but such files will be listed now with this change.

Given the confusion that the other approach was causing, I've changed the metrics to count test cases rather than diagnostics. This means that you won't see spurious regressions appearing in the summary table when in fact ty's performance on the test cases improved. However, you will also not see any change in the summary table if e.g. false positive diagnostics were removed without changing the status of the test case. You would have to expand the "false positives changed" tab to see that some of the false positive diagnostics had been removed.
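The alternating-row rendering described above could look roughly like the following. This is a hypothetical sketch, not the actual `conformance.py` code; `render_changed` and its input shape are invented for illustration.

```python
def render_changed(changes: dict[str, tuple[list[str], list[str]]]) -> str:
    """Render a Markdown table where each test case contributes two rows:
    one for the diagnostics removed (old) and one for those added (new)."""
    rows = ["| Test case | Diagnostics |", "| --- | --- |"]
    for case, (removed, added) in changes.items():
        rows.append(f"| {case} (old) | {'<br>'.join(removed) or '(none)'} |")
        rows.append(f"| {case} (new) | {'<br>'.join(added) or '(none)'} |")
    return "\n".join(rows)
```

Keeping the old and new diagnostics on adjacent rows makes the diff for each test case readable at a glance without widening the table.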
The latest conformance comment here looks pretty good to me -- curious what others think. @AlexWaygood?
Summary
This is meant to address #2786. Previously, the conformance.py script would only render changed diagnostics if one of the ty versions raised no diagnostics in the test group. This PR updates the change-detection logic and the renderer so that we display both the old and new diagnostics whenever they are not identical. I've also created separate sections for cases where diagnostics changed without changing the classification, as well as a new "Files changed" section that shows how performance at the file level has changed.

Test Plan
I'll try cherry-picking #23088 and take a look at the comment.
I also ran the following command to see how the output would look against 0.0.18:
Which returned the following:
conformance.md
Typing conformance results improved 🎉
Summary
Each test case represents one expected error annotation or a group of annotations sharing a tag. Counts are per test case, not per diagnostic: multiple diagnostics on the same line count as one. Required annotations (`E`) are true positives when ty flags the expected location and false negatives when it does not. Optional annotations (`E?`) are true positives when flagged but true negatives (not false negatives) when not. Tagged annotations (`E[tag]`) require ty to flag exactly one of the tagged lines; tagged multi-annotations (`E[tag+]`) allow any number up to the tag count. Flagging unexpected locations counts as a false positive.

Files changed
True positives added (1): 1 diagnostic
False positives removed (1): 1 diagnostic
True positives changed (6): 6 diagnostics
False positives changed (1): 1 diagnostic
False positives added (1): 1 diagnostic
Optional diagnostics added (2): 2 diagnostics
Optional diagnostics removed (1): 1 diagnostic
Optional diagnostics changed (1): 1 diagnostic