497 stable sorted CPE array (JSON and SPDX) #522
Signed-off-by: Christopher Angelo Phillips <[email protected]>
Benchmark Test Results: benchmark results from the latest changes vs base branch
Question: If the objective of this PR is to ensure that SPDX output is consistently sorted on the whole, is there a test we can introduce to make that assertion, beyond CPEs?
func dashIndex(cpe wfn.Attributes) int {
	count := 0
	dash := "-"
nit: This could be a const since we never need to assign a new value to it.
I need to rename the PR. The bug was reported as unsorted SPDX output, but that was not actually the issue. The issue was that the diffs found when scanning the same image multiple times were located in the cpes array.
Updated to: stable sorted CPE array (JSON and SPDX)
Yeah, let me also see if I can add an integration-level test that asserts there are no regressions on this front, as far as producing a consistent document across multiple runs.
	return iScore > jScore
}

func dashIndex(cpe wfn.Attributes) int {
A comment explaining why this is here would go a long way. It isn't obvious that this exists purely to stabilize the sort.
Comment added here. I'm also open to discussing whether this is a pretty fragile assumption. It was a patch targeted very narrowly at the alpine example I listed in the PR. I'm still running some other images to see where and how this might break, and whether I can find a larger, more general solution.
Agreed, it's always worth a little extra time up front to try to avoid solving the same problem twice!
I'm mostly just confused about why we are not simply text sorting. We could sort on each field, splitting on : or some such.
Yeah, there was some conversation about using
return c[i].BindToFmtString() > c[j].BindToFmtString()
I'll check that rather than sorting on a seemingly random - character.
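The text-sort idea above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: it assumes the CPEs are already available as formatted strings, and sortCPEStrings is a hypothetical helper name. It keeps the descending comparison from the line quoted above.

```go
package main

import (
	"fmt"
	"sort"
)

// sortCPEStrings is a hypothetical helper (not part of syft). It orders
// formatted CPE strings by plain text comparison, descending, mirroring
// the comparison quoted in the comment above. Two distinct strings never
// compare as equal, so the result does not depend on the input order.
func sortCPEStrings(cpes []string) {
	sort.Slice(cpes, func(i, j int) bool {
		return cpes[i] > cpes[j]
	})
}

func main() {
	cpes := []string{
		"cpe:2.3:a:alpine:alpine-keys:2.3-r1:*:*:*:*:*:*:*",
		"cpe:2.3:a:alpine_keys:alpine_keys:2.3-r1:*:*:*:*:*:*:*",
		"cpe:2.3:a:alpine-keys:alpine_keys:2.3-r1:*:*:*:*:*:*:*",
	}
	sortCPEStrings(cpes)
	for _, c := range cpes {
		fmt.Println(c)
	}
}
```

A pure text comparison ignores how specific each CPE is, so in practice it would probably serve as the final tie-breaker rather than the whole ordering.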
3ce3b28 to 8ae8026
@kzantow I added text sort as the base case if our
@anchore/tools I found two candidate places to add regression tests for this PR. The first would be in the existing presenter tests:
syft/internal/presenter/packages/json_presenter_test.go, lines 29 to 37 in 6480f06
syft/internal/presenter/packages/utils_test.go, lines 106 to 108 in 6480f06
The other is that we write a new integration test under
I'm happy to add either. I just wanted your thoughts going into Friday on whether one is good enough or whether we want both levels checked.
Great ideas. I'm liking the second route a bit more at the moment, but I could be convinced of either direction. My initial thinking is if we can avoid the fragility that sometimes comes with snapshot testing, these tests would be more reliable and thus more trusted by the team.
Re: the PR's purpose...
My thinking is this: Right now, merging this PR will cause #497 to be closed. 497 is expressing that the SPDX output's sorting should be stable in general (which I agree with). So if we want to consider 497 done, I think we should have test(s) that ensure 497 is fully addressed. If we don't want to tackle all of 497 in this particular PR, I'm cool with that, too! In that case, 497 would still be unfinished (IMHO) upon merging this PR. Curious for your latest thinking!
If we can get an integration test added that shows two runs of
re: snapshot testing: I agree with @luhring on this one. Snapshot testing is great for change detection and seeing what changed, but is not great at communicating why something changed. Leaning more on unit tests for depth in cases and specific business-logic verification is the way to go here. If we have a concern about how things are wired up, maybe an integration or CLI-level test would be useful here: generate an SBOM for the same input a few times and ensure the output is stable as a whole (like you suggested, @spiffcs). It doesn't need to be limited to spdx-json, though; we could test all SBOM outputs (
I think if we add this integration test in this PR, we can call #497 closed, since we will have addressed the root of the problem and added a regression test. I'll write it up today.
mustCPE("cpe:2.3:a:alpine-keys:alpine_keys:2.3-r1:*:*:*:*:*:*:*"),
mustCPE("cpe:2.3:a:alpine-keys:alpine-keys:2.3-r1:*:*:*:*:*:*:*"),
mustCPE("cpe:2.3:a:alpine_keys:alpine_keys:2.3-r1:*:*:*:*:*:*:*"),
mustCPE("cpe:2.3:a:alpine_keys:alpine-keys:2.3-r1:*:*:*:*:*:*:*"),
mustCPE("cpe:2.3:a:alpine:alpine_keys:2.3-r1:*:*:*:*:*:*:*"),
mustCPE("cpe:2.3:a:alpine:alpine-keys:2.3-r1:*:*:*:*:*:*:*"),
maybe make this obviously more out of order?
To me, this looks good -- super simple and I think it should work for solving the original issue 👍
* add small sorting change to our specificity Signed-off-by: Christopher Angelo Phillips <[email protected]>
Attempts to Solve #497
To see the current issue you can:
This will produce a diff where the cpes field for some packages is sorted differently from run to run.

In the case where countFieldLength is equal for two sequential CPEs in a given list of CPEs, our sort function would return a non-deterministic order for cases as seen below:

If the below change is too specific or fragile, I can explore using the Stable interface to see if that at least keeps us consistent across runs.

The array order coming into the sort is always consistent, but where we get the diff between runs is when Less(i, j) and Less(j, i) are both false.

This PR assigns a priority based on the character - for the Vendor and Product fields.

Given the above example, the ideal sorted output would be:
Currently I'm trying to find images which break the assumptions found in this code.
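The tie-breaking approach described in this PR can be sketched as below. This is a hedged illustration, not the PR's code: cpeEntry, fieldLength, dashCount and sortEntries are hypothetical names, the length-based score only approximates what countFieldLength might do, and ranking entries with more - characters first is an assumption about direction. The final text comparison is what guarantees that Less(i, j) and Less(j, i) are never both false for distinct entries.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// cpeEntry is a hypothetical, simplified stand-in for wfn.Attributes.
type cpeEntry struct {
	Vendor, Product, Version string
}

func (c cpeEntry) String() string {
	return fmt.Sprintf("cpe:2.3:a:%s:%s:%s:*:*:*:*:*:*:*", c.Vendor, c.Product, c.Version)
}

// fieldLength is an assumed specificity score: total length of the fields.
func fieldLength(c cpeEntry) int {
	return len(c.Vendor) + len(c.Product) + len(c.Version)
}

// dashCount counts "-" characters in the Vendor and Product fields.
func dashCount(c cpeEntry) int {
	return strings.Count(c.Vendor, "-") + strings.Count(c.Product, "-")
}

// sortEntries orders by score, then dash count (more dashes first, an
// assumed direction), then formatted text as a total-order tie-breaker.
func sortEntries(cpes []cpeEntry) {
	sort.SliceStable(cpes, func(i, j int) bool {
		a, b := cpes[i], cpes[j]
		if la, lb := fieldLength(a), fieldLength(b); la != lb {
			return la > lb
		}
		if da, db := dashCount(a), dashCount(b); da != db {
			return da > db
		}
		return a.String() < b.String()
	})
}

func main() {
	cpes := []cpeEntry{
		{"alpine_keys", "alpine-keys", "2.3-r1"},
		{"alpine", "alpine_keys", "2.3-r1"},
		{"alpine-keys", "alpine-keys", "2.3-r1"},
	}
	sortEntries(cpes)
	for _, c := range cpes {
		fmt.Println(c)
	}
}
```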