(#495) Update documentNamespace uniqueness for spdx-json output
#528
Signed-off-by: Christopher Angelo Phillips <[email protected]>
I think there is a bug when scanning |
	namespace = strings.Trim(fmt.Sprintf("%s-%s", name, uID.String()), "-")
case source.DirectoryScheme:
	name = srcMetadata.Path
	namespace = uID.String()
Why do we construct this value using two different approaches here?
Good question - I opted to keep the ImageMetadata.UserInput as the namespace prefix for images since that's what exists on main.
For the directory, I could not think of a great metadata identifier besides the digested contents path. The path presents a few challenges when appending it to our https://anchore.com/syft/image/%s format:
A user could specify an absolute path /foo/bar, which would result in ...image//foo/bar-<uuid>.
A user could also specify a relative path ./foo/bar, which would result in ...image/./foo/bar-<uuid>.
Rather than parse and clean every possible filesystem input, I opted for simplicity and supply just the UUID. The documentNamespace field is currently not set on main for a directory input, and other fields can help identify the context of the scanned contents.
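To make the path problem concrete, here is a minimal Go sketch. It is not the code in this PR: naiveNamespace is a hypothetical helper, and the UUID is passed in as a plain string so the example needs only the standard library.

```go
package main

import "fmt"

// baseFormat mirrors the "https://anchore.com/syft/image/%s" format quoted above.
const baseFormat = "https://anchore.com/syft/image/%s"

// naiveNamespace is a hypothetical helper (not syft's API). It appends the raw
// user input and a UUID string to the base URL without cleaning the input.
func naiveNamespace(userInput, uid string) string {
	return fmt.Sprintf(baseFormat, userInput+"-"+uid)
}

func main() {
	const uid = "<uuid>" // placeholder, a real run would use a generated UUID

	// An absolute path produces a double slash after "image".
	fmt.Println(naiveNamespace("/foo/bar", uid))
	// A relative path leaves a "./" segment in the URL.
	fmt.Println(naiveNamespace("./foo/bar", uid))
}
```

Both results are awkward as URLs, which is the motivation for dropping the path and using only the UUID for directory inputs.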
I think either case (image or directory) can include FS paths, so we might need to consider that in this solution. For example, here's a run of this branch while scanning an OCI directory:
// ...
"documentNamespace": "https://anchore.com/syft/image//Users/dan/Desktop/ubuntu-3414a09d-7f4e-4632-8fa0-6d440cf2c71e",
// ...
Curious for your thoughts here...
That's a good callout.
This behavior also exists currently on main so I'm happy to use this branch to squash this bug. I was able to reproduce it on my machine with alpine and saw this generated with the latest version of syft:
"documentNamespace": "https://anchore.com/syft/image//Users/hal/development/images/alpine",
We could run the resulting namespace string through a cleaner function that strips all bad prefixes.
We could also adopt a common method of just appending a uuid since the generated sbom has other fields that provide context as to what was scanned.
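As a rough illustration of the first option, a cleaner function could look something like the sketch below. cleanName is a hypothetical name, and the rules it applies (collapse with path.Clean, drop a leading "/" and any leading "../") are assumptions for discussion, not an agreed design. It ignores Windows-style paths entirely.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// cleanName is a hypothetical cleaner (not part of syft). It normalizes a
// user-supplied path or image reference so it can follow a URL prefix without
// producing "//" or "./" segments. It returns "" when nothing usable remains.
func cleanName(input string) string {
	cleaned := path.Clean(input) // collapses "./", "//" and trailing slashes
	cleaned = strings.TrimPrefix(cleaned, "/")
	for strings.HasPrefix(cleaned, "../") {
		cleaned = strings.TrimPrefix(cleaned, "../")
	}
	if cleaned == "." || cleaned == ".." {
		return ""
	}
	return cleaned
}

func main() {
	inputs := []string{
		"/Users/hal/development/images/alpine",
		"./foo/bar",
		".",
		"./",
		"alpine:latest",
	}
	for _, in := range inputs {
		fmt.Printf("%q -> %q\n", in, cleanName(in))
	}
}
```

Even this small version has to make judgment calls (for example, what to do with ".."), which is part of why appending only a UUID is attractive.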
These are good ideas. I don't have a strong suggestion here. Here's what I do think...
- We should probably use the same logic for building this string, regardless of an image or directory path. This lets us factor out this string generation as a function, if we want. Also, we should probably remove "image" from "https://anchore.com/syft/image", since this value is getting used in both paths (this is true both already and potentially going forward, too).
- We could go the route of cleaning the user input value, but I could also see that potentially getting hairy. It'd be great if there was a convention we could lean on that exists already, one that has accounted for producing safe "job identifying" strings. I'm not sure if this exists today. So perhaps for now, UUIDs alone are enough (instead of incorporating user input)?
Yea - I think UUIDs alone are good enough until we start seeing more of a standard being adopted across the sbom community. I'll also update the path so we have syft/dir vs syft/image depending on the input contents
@luhring updated the verbosity and organization based on PR feedback. Let me know what you think of my comment RE: different value construction. Happy to make any edits that we settle on if we need more metadata in the namespace field for directory inputs.
5bc213b to e4cd021
Left some comments in the issue about this, too: #495 (comment)
switch srcMetadata.Scheme {
case source.ImageScheme:
	name = srcMetadata.ImageMetadata.UserInput
	identifier = fmt.Sprintf("image/%s", uniqueID.String())
Note: I think this identifier is meant to include the name, e.g. it should be something like http://anchore.com/syft/image/alpine-latest-SOMEUUID. See the DocumentNamespace spec, which says it should have the form http://[CreatorWebsite]/[pathToSpdx]/[DocumentName]-[UUID]. This was why I rolled it into the same issue as the missing DocumentName.
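For reference, the spec format above could be assembled roughly like this. This is only a sketch: documentNamespace and its parameters are hypothetical names, and the strings.Trim call borrows the approach from the earlier diff so that an empty document name falls back to the UUID alone.

```go
package main

import (
	"fmt"
	"strings"
)

// documentNamespace is a hypothetical helper (not syft's API) that follows the
// http://[CreatorWebsite]/[pathToSpdx]/[DocumentName]-[UUID] shape.
// When documentName is empty, the leading "-" is trimmed so only the UUID remains.
func documentNamespace(creatorWebsite, pathToSpdx, documentName, uid string) string {
	identifier := strings.Trim(fmt.Sprintf("%s-%s", documentName, uid), "-")
	return fmt.Sprintf("http://%s/%s/%s", creatorWebsite, pathToSpdx, identifier)
}

func main() {
	// Image input with a usable name.
	fmt.Println(documentNamespace("anchore.com", "syft/image", "alpine-latest", "SOMEUUID"))
	// Directory input with no usable name: the UUID stands alone.
	fmt.Println(documentNamespace("anchore.com", "syft/dir", "", "SOMEUUID"))
}
```

The open question in this thread is what to pass as documentName for directory and OCI directory sources. The sketch leaves that decision to the caller.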
Nice! Good to see it so explicitly formatted in their docs. This format is pretty easy for images, but one of the review comments I got was that we want it to be similar across directory and image inputs.
@luhring what are your thoughts given the spec information? Is it ok if we have some different formatting options for image vs directory?
What should the document name be when we scan a directory or OCI directory source?
2ca49ba to 0675307
🙌
Huge step forward for mankind!
…put (anchore#528) * add unique namespace identifier Signed-off-by: Christopher Angelo Phillips <[email protected]>
Fixes: #495
In the case where an input directory is "." or "./", the name is stripped and a uuidv4 becomes the namespace identifier.
When a URL is an input, here is an example of the format output:
Since "." is unreserved, I believe we can get away with keeping the baseURL for image inputs in the path: https://datatracker.ietf.org/doc/html/rfc3986#section-2.3
Signed-off-by: Christopher Angelo Phillips [email protected]
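To illustrate the point about unreserved characters, the sketch below checks a string against the RFC 3986 section 2.3 unreserved set (ALPHA, DIGIT, "-", ".", "_", "~"). The helper names are hypothetical and nothing here is taken from the PR's code. It only shows which characters of an input fall outside that set.

```go
package main

import "fmt"

// isUnreserved reports whether r is in the RFC 3986 section 2.3 unreserved set:
// ALPHA / DIGIT / "-" / "." / "_" / "~".
func isUnreserved(r rune) bool {
	switch {
	case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9':
		return true
	case r == '-', r == '.', r == '_', r == '~':
		return true
	}
	return false
}

// outsideUnreserved is a hypothetical helper that returns, in order, the
// characters of s that are not in the unreserved set.
func outsideUnreserved(s string) []rune {
	var out []rune
	for _, r := range s {
		if !isUnreserved(r) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	// A registry host such as "docker.io" contains only unreserved characters.
	fmt.Printf("%q\n", outsideUnreserved("docker.io"))
	// A full reference also contains "/" and ":", which are not unreserved.
	fmt.Printf("%q\n", outsideUnreserved("docker.io/library/alpine:latest"))
}
```

A "." in the input is therefore safe to keep as-is, while "/" and ":" are outside the unreserved set and are the characters worth a second look.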