-
Couldn't load subscription status.
- Fork 727
feat: 1944 - update purl generation to use a consistent groupID #2033
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Keith Zantow <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
f03e747 to
f8a6582
Compare
|
@westonsteimel I combined the work from 2032 and the single groupID answer code together in this branch - this does not alter any CPE generation or change how java packages are deduped - I'll be updating those in a separate PR I'm adding two commits here -
Syft Maindb crawler results : currently running:
^ I have to run these again but my ISP just went out and I don't want to pull this down over data 😬 Latest Comparison |
Benchmark Test ResultsBenchmark results from the latest changes vs base branch |
2ce6400 to
fe85967
Compare
Signed-off-by: Christopher Phillips <[email protected]>
fe85967 to
9f3157c
Compare
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
| // 2. The group ID from the POM project | ||
| // 3. The group ID from a select map of known group IDs | ||
| // 3. The group ID from the Java manifest | ||
| func GroupIDFromJavaMetadata(pkgName string, metadata pkg.JavaMetadata) (groupID string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all of these new functions probably don't belong in the cpe package since they have been replicated to not affect the CPE generation logic today. My vote would be to put this under the java cataloger package (as near duplicate functions) and unexport all of the functions (I'd duplicate the looksLikeGroupID function too... since there isn't a specific definition for this and the impl may change based on the use case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I parsed this comment correctly - check f1e05da
If there are some edits or things to move around from here we can do a quick sync on the package organization and exports/imports
| parentKey := fmt.Sprintf("%s:%s:%s", groupID, parentPkg.Name, parentPkg.Version) | ||
| pomProjectKey := fmt.Sprintf("%s:%s:%s", pomProperties.GroupID, pomProperties.ArtifactID, pomProperties.Version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to be trying to correlate the parent name and the current pom properties artifact ID and changing the virtual path when this occurs. Can you leave a code comment to why this is important? (and this might look test worthy)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated with commentary and test updates in 06bd7b9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to clarify: why is the parent package name and pom properties artifact ID being compared? (instead of artifact id vs artifact id or name vs name?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment explaining the significance and why we use the values we're using: 1bbb6c5
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay -- I meant to submit this a few days ago!
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
|
Latest run: 1e4f54a |
Signed-off-by: Christopher Phillips <[email protected]>
|
Ratio when removing ArtifactID: 77a6fd4 Conclusion is we had less correct by about 10,000 with a larger sample size so I'm reverting the commit. |
05f7f67 to
c3b3e79
Compare
This reverts commit 77a6fd4. Signed-off-by: Christopher Phillips <[email protected]>
c3b3e79 to
1bbb6c5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the deep dive on this one @spiffcs 🙌
…ore#2033) Separate the logic for CPE and PURL generation. PURL generation needs a single answer for groupID based on a priority of discovering the field. CPE generation still uses multiple potential groupID to populate the candidate cpe. Improve GroupID detection. Currently syft does not use any hierarchy for GroupID detection and treats all sources as equal. It treats fields from the manifest file with priority. This change adds a hierarchy to the fields and returns a single answer based on that hierarchy. --------- Signed-off-by: Christopher Phillips <[email protected]> Signed-off-by: Keith Zantow <[email protected]> Co-authored-by: Keith Zantow <[email protected]>
Summary
This PR aims to improve 3 related aspects of syft's java cataloging. Pulls in #2032
Separate the logic for CPE and PURL generation. PURL generation needs a single answer for groupID based on a priority of discovering the field. CPE generation still uses multiple potential groupID to populate the candidate cpe.
Improve GroupID detection. Currently syft does not use any hierarchy for GroupID detection and treats all sources as equal. It treats fields from the manifest file with priority. We should try to obtain a GroupID answer from Pom Properties, then the Pom Project and finally, if a GroupID has not been found, the Manifest. Manifest should not return an answer if one was found in PomProject, and PomProject should not return a GroupID answer if PomProperties found one.
Note: we've removed the
child matches parent by virtual pathcase because the virtual path now has more uniqueness that depends on a packages metadata. The parent did not fulfill this requirement and thus a new package was created, virtual paths no longer match, and the two packages were not merged