Conversation

@spiffcs
Contributor

@spiffcs spiffcs commented Oct 4, 2021

Fixes: #495

When the input directory is . or ./, the name is stripped and a UUIDv4 alone becomes the namespace identifier.

When the input is a URL, here is an example of the output format:

"documentNamespace": "https:/anchore.com/syft/image/gcr.io/distroless/nodejs-debian10-6d671418-314c-480b-a620-c86d985d5441",

Since . is unreserved I believe we can get away with keeping the baseURL for image inputs in the path:
https://datatracker.ietf.org/doc/html/rfc3986#section-2.3
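
As a quick check of the unreserved-character point, here is a minimal sketch (not code from this PR) that tests each path segment against the RFC 3986 section 2.3 unreserved set: ALPHA, DIGIT, "-", ".", "_" and "~". The function names are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// isUnreserved reports whether r is an RFC 3986 section 2.3 unreserved
// character: ALPHA / DIGIT / "-" / "." / "_" / "~".
func isUnreserved(r rune) bool {
	switch {
	case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9':
		return true
	case r == '-', r == '.', r == '_', r == '~':
		return true
	}
	return false
}

// segmentsUnreserved (hypothetical helper) splits p on "/" and reports
// whether every segment is non-empty and uses only unreserved characters.
func segmentsUnreserved(p string) bool {
	for _, seg := range strings.Split(p, "/") {
		if seg == "" {
			return false
		}
		for _, r := range seg {
			if !isUnreserved(r) {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println(segmentsUnreserved("gcr.io/distroless/nodejs-debian10")) // true
	fmt.Println(segmentsUnreserved("/foo/bar"))                          // false: leading empty segment
}
```

An input such as alpine:latest would fail this check because ":" is not in the unreserved set.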

Signed-off-by: Christopher Angelo Phillips [email protected]

Signed-off-by: Christopher Angelo Phillips <[email protected]>
@spiffcs spiffcs linked an issue Oct 4, 2021 that may be closed by this pull request
@spiffcs spiffcs requested a review from a team October 4, 2021 14:10
@spiffcs
Contributor Author

spiffcs commented Oct 4, 2021

I think there is a bug when scanning a directory.

@github-actions

github-actions bot commented Oct 4, 2021

Benchmark Test Results

Benchmark results from the latest changes vs base branch
name                                                   old time/op    new time/op    delta
ImagePackageCatalogers/ruby-gemspec-cataloger-2          1.04ms ± 3%    1.21ms ±11%  +16.98%  (p=0.008 n=5+5)
ImagePackageCatalogers/python-package-cataloger-2        1.75ms ± 0%    1.93ms ± 6%  +10.37%  (p=0.008 n=5+5)
ImagePackageCatalogers/javascript-package-cataloger-2     498µs ± 3%     553µs ± 5%  +11.06%  (p=0.008 n=5+5)
ImagePackageCatalogers/dpkgdb-cataloger-2                 489µs ± 2%     599µs ± 8%  +22.36%  (p=0.008 n=5+5)
ImagePackageCatalogers/rpmdb-cataloger-2                  486µs ± 1%     569µs ±13%  +17.00%  (p=0.008 n=5+5)
ImagePackageCatalogers/java-cataloger-2                  10.5ms ± 1%    12.0ms ± 4%  +14.42%  (p=0.008 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                  821µs ± 3%     881µs ± 5%   +7.31%  (p=0.008 n=5+5)
ImagePackageCatalogers/go-cataloger-2                     256µs ± 3%     299µs ± 9%  +17.00%  (p=0.008 n=5+5)
ImagePackageCatalogers/rust-cataloger-2                   457µs ± 0%     536µs ± 8%  +17.06%  (p=0.008 n=5+5)

name                                                   old alloc/op   new alloc/op   delta
ImagePackageCatalogers/ruby-gemspec-cataloger-2           146kB ± 0%     146kB ± 0%     ~     (p=0.841 n=5+5)
ImagePackageCatalogers/python-package-cataloger-2         754kB ± 0%     755kB ± 0%     ~     (p=0.421 n=5+5)
ImagePackageCatalogers/javascript-package-cataloger-2     118kB ± 0%     118kB ± 0%   +0.12%  (p=0.016 n=5+5)
ImagePackageCatalogers/dpkgdb-cataloger-2                 132kB ± 0%     132kB ± 0%     ~     (p=0.095 n=5+5)
ImagePackageCatalogers/rpmdb-cataloger-2                  140kB ± 0%     140kB ± 0%   -0.00%  (p=0.024 n=5+5)
ImagePackageCatalogers/java-cataloger-2                  2.73MB ± 0%    2.74MB ± 0%     ~     (p=0.095 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                 1.18MB ± 0%    1.18MB ± 0%     ~     (p=0.056 n=5+5)
ImagePackageCatalogers/go-cataloger-2                    54.8kB ± 0%    54.9kB ± 0%   +0.16%  (p=0.016 n=5+5)
ImagePackageCatalogers/rust-cataloger-2                   123kB ± 0%     123kB ± 0%   -0.01%  (p=0.008 n=5+5)

name                                                   old allocs/op  new allocs/op  delta
ImagePackageCatalogers/ruby-gemspec-cataloger-2           2.41k ± 0%     2.41k ± 0%     ~     (all equal)
ImagePackageCatalogers/python-package-cataloger-2         9.58k ± 0%     9.58k ± 0%     ~     (p=0.889 n=5+5)
ImagePackageCatalogers/javascript-package-cataloger-2     1.99k ± 0%     1.99k ± 0%     ~     (all equal)
ImagePackageCatalogers/dpkgdb-cataloger-2                 2.54k ± 0%     2.54k ± 0%     ~     (all equal)
ImagePackageCatalogers/rpmdb-cataloger-2                  3.25k ± 0%     3.25k ± 0%     ~     (all equal)
ImagePackageCatalogers/java-cataloger-2                   37.5k ± 0%     37.5k ± 0%     ~     (p=0.516 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                  2.48k ± 0%     2.48k ± 0%     ~     (all equal)
ImagePackageCatalogers/go-cataloger-2                     1.46k ± 0%     1.46k ± 0%     ~     (all equal)
ImagePackageCatalogers/rust-cataloger-2                   3.21k ± 0%     3.21k ± 0%     ~     (all equal)

Signed-off-by: Christopher Angelo Phillips <[email protected]>
@spiffcs spiffcs changed the title from "(#495) Updates documentNamespace uniqueness for spdx-json output" to "(#495) Update documentNamespace uniqueness for spdx-json output" Oct 4, 2021
Comment on lines 52 to 55
namespace = strings.Trim(fmt.Sprintf("%s-%s", name, uID.String()), "-")
case source.DirectoryScheme:
name = srcMetadata.Path
namespace = uID.String()
Contributor

Why do we construct this value using two different approaches here?

Contributor Author

@spiffcs spiffcs Oct 4, 2021

Good question - I opted to keep the ImageMetadata.UserInput as the namespace prefix for images since that's what exists on main.

For the directory, I could not think of a great metadata identifier besides the digested contents path. The path presents a few challenges when appending it to our https://anchore.com/syft/image/%s format.

A user could specify an absolute path /foo/bar which would result in ...image//foo/bar-<uuid>.
A user could also specify a relative path ./foo/bar which would result in ...image/./foo/bar-<uuid>

Rather than parse and clean all possible filesystem inputs, I opted for simplicity and supply just the UUID. The documentNamespace field for a directory input is currently not set on main, and other fields can help identify the context of the scanned contents.
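
To make the two path cases above concrete, here is a small sketch of what happens when user input is dropped straight into the https://anchore.com/syft/image/%s format. The function name naiveNamespace is hypothetical and the literal "<uuid>" stands in for a real UUID.

```go
package main

import "fmt"

// naiveNamespace (hypothetical) appends the raw user input and an ID to the
// base URL without any cleaning, which is the approach being argued against.
func naiveNamespace(userInput, id string) string {
	return fmt.Sprintf("https://anchore.com/syft/image/%s-%s", userInput, id)
}

func main() {
	const id = "<uuid>" // placeholder, not a real UUID

	// Absolute path: the leading "/" produces a double slash.
	fmt.Println(naiveNamespace("/foo/bar", id))
	// https://anchore.com/syft/image//foo/bar-<uuid>

	// Relative path: the "./" prefix leaks into the URL.
	fmt.Println(naiveNamespace("./foo/bar", id))
	// https://anchore.com/syft/image/./foo/bar-<uuid>
}
```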

Contributor

I think either case (image or directory) can include FS paths, so we might need to consider that in this solution. For example, here's a run of this branch while scanning an OCI directory:

// ...
"documentNamespace": "https://anchore.com/syft/image//Users/dan/Desktop/ubuntu-3414a09d-7f4e-4632-8fa0-6d440cf2c71e",
// ...

Curious for your thoughts here...

Contributor Author

@spiffcs spiffcs Oct 4, 2021

That's a good callout.

This behavior also exists currently on main so I'm happy to use this branch to squash this bug. I was able to reproduce it on my machine with alpine and saw this generated with the latest version of syft:

"documentNamespace": "https://anchore.com/syft/image//Users/hal/development/images/alpine",

We could run the resulting namespace string through a cleaner function that strips all bad prefixes.

We could also adopt a common method of just appending a uuid since the generated sbom has other fields that provide context as to what was scanned.
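
For the first option, a cleaner function might look roughly like the sketch below. This is an assumption about what "bad prefixes" would cover (leading "/", "./" and "../"), not code from this branch, and cleanNamespaceName is a hypothetical name. It relies only on path.Clean and strings helpers from the standard library.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// cleanNamespaceName (hypothetical) normalizes a user-supplied path so it can
// be appended to a base URL without producing "//", "/./" or "/../".
func cleanNamespaceName(input string) string {
	// path.Clean collapses repeated slashes and "." elements and drops
	// trailing slashes; an empty input becomes ".".
	cleaned := path.Clean(input)

	// Drop any leading parent-directory references.
	for strings.HasPrefix(cleaned, "../") {
		cleaned = strings.TrimPrefix(cleaned, "../")
	}

	// Drop the leading slash of an absolute path.
	cleaned = strings.TrimPrefix(cleaned, "/")

	// Nothing meaningful is left for ".", "./", "..", or "/".
	if cleaned == "." || cleaned == ".." {
		return ""
	}
	return cleaned
}

func main() {
	for _, in := range []string{"/Users/hal/development/images/alpine", "./foo/bar", ".", "./"} {
		fmt.Printf("%q -> %q\n", in, cleanNamespaceName(in))
	}
}
```

This sketch leaves other characters (for example the ":" in an image tag) untouched, which is part of why cleaning every possible input could get complicated.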

Contributor

These are good ideas. I don't have a strong suggestion here. Here's what I do think...

  1. We should probably use the same logic for building this string, regardless of an image or directory path. This lets us factor out this string generation as a function, if we want. Also, we should probably remove "image" from "https://anchore.com/syft/image", since this value is getting used in both paths (this is true both already and potentially going forward, too).

  2. We could go the route of cleaning the user input value, but I could also see that potentially getting hairy. It'd be great if there was a convention we could lean on that exists already — one that has accounted for producing safe "job identifying" strings. I'm not sure if this exists today. So perhaps for now, UUIDs alone are enough (instead of incorporating user input)?

Contributor Author

Yeah, I think UUIDs alone are good enough until we start seeing a standard adopted across the SBOM community. I'll also update the path so we have syft/dir vs syft/image, depending on the input contents.
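
A rough sketch of the UUID-only approach with a per-input path segment could look like this. It is an illustration under assumptions, not the merged code: the constant and function names are hypothetical, and newUUIDv4 is a standard-library stand-in for whatever UUID package the project actually uses.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// Hypothetical path segments for the two input kinds discussed above.
const (
	imageSegment     = "image"
	directorySegment = "dir"
)

// newUUIDv4 builds a random (version 4) UUID string from crypto/rand.
// It is a stand-in so the sketch has no third-party dependencies.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// documentNamespace uses the same logic for every input kind: base URL,
// a segment naming the input kind, and the UUID. No user input is included.
func documentNamespace(segment, id string) string {
	return fmt.Sprintf("https://anchore.com/syft/%s/%s", segment, id)
}

func main() {
	id, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	fmt.Println(documentNamespace(imageSegment, id))
	fmt.Println(documentNamespace(directorySegment, id))
}
```

Because no user input reaches the URL, the double-slash and ./ cases from earlier in the thread cannot occur here.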

Signed-off-by: Christopher Angelo Phillips <[email protected]>
@spiffcs
Contributor Author

spiffcs commented Oct 4, 2021

@luhring updated the verbosity and organization based on PR feedback.

Let me know what you think of my comment RE: different value construction. Happy to make any edits that we settle on if we need more metadata in the namespace field for directory inputs.

Signed-off-by: Christopher Angelo Phillips <[email protected]>
Signed-off-by: Christopher Angelo Phillips <[email protected]>
@spiffcs spiffcs force-pushed the 495-spdx-document-namespace branch from 5bc213b to e4cd021 Compare October 4, 2021 20:38
kzantow previously requested changes Oct 5, 2021
Contributor

@kzantow kzantow left a comment

Left some comments in the issue about this, too: #495 (comment)

switch srcMetadata.Scheme {
case source.ImageScheme:
name = srcMetadata.ImageMetadata.UserInput
identifier = fmt.Sprintf("image/%s", uniqueID.String())
Contributor

Note: I think this identifier is meant to include the name, e.g. it should be something like http://anchore.com/syft/image/alpine-latest-SOMEUUID. See the DocumentNamespace spec, which says it should have the form http://[CreatorWebsite]/[pathToSpdx]/[DocumentName]-[UUID]. This was why I rolled it into the same issue as the missing DocumentName.
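
To illustrate the template quoted above, here is a minimal sketch that fills in the four parts. Both function names are hypothetical, and turning alpine:latest into alpine-latest by replacing ":" with "-" is an assumption inferred from the example URL, not something the spec or this PR states.

```go
package main

import (
	"fmt"
	"strings"
)

// specNamespace fills in the template
// http://[CreatorWebsite]/[pathToSpdx]/[DocumentName]-[UUID].
func specNamespace(creatorWebsite, pathToSpdx, documentName, id string) string {
	return fmt.Sprintf("http://%s/%s/%s-%s", creatorWebsite, pathToSpdx, documentName, id)
}

// documentNameFromImage (hypothetical) derives a document name from an image
// reference by replacing the tag separator ":" with "-". This is an assumed
// convention based on the "alpine-latest" example.
func documentNameFromImage(userInput string) string {
	return strings.ReplaceAll(userInput, ":", "-")
}

func main() {
	name := documentNameFromImage("alpine:latest")
	fmt.Println(specNamespace("anchore.com", "syft/image", name, "SOMEUUID"))
	// http://anchore.com/syft/image/alpine-latest-SOMEUUID
}
```

This works cleanly for simple image names; it does not answer what the document name should be for a directory or OCI directory source.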

Contributor Author

Nice! Good to see it so explicitly formatted in their docs. This format is pretty easy for images, but one of the review comments I got was that we want it to be similar across directory and image inputs.

@luhring what are your thoughts given the spec information? Is it ok if we have some different formatting options for image vs directory?

What should the document name be when we scan a directory or OCI directory source?

Signed-off-by: Christopher Angelo Phillips <[email protected]>
Signed-off-by: Christopher Angelo Phillips <[email protected]>
@spiffcs spiffcs force-pushed the 495-spdx-document-namespace branch from 2ca49ba to 0675307 Compare October 5, 2021 15:59
Signed-off-by: Christopher Angelo Phillips <[email protected]>
Signed-off-by: Christopher Angelo Phillips <[email protected]>
Signed-off-by: Christopher Angelo Phillips <[email protected]>
Signed-off-by: Christopher Angelo Phillips <[email protected]>
Signed-off-by: Christopher Angelo Phillips <[email protected]>
Signed-off-by: Christopher Angelo Phillips <[email protected]>
Signed-off-by: Christopher Angelo Phillips <[email protected]>
Contributor

@luhring luhring left a comment

🙌

@kzantow kzantow dismissed their stale review October 5, 2021 17:06

No longer valid

Contributor

@kzantow kzantow left a comment

Huge step forward for mankind!

@spiffcs spiffcs merged commit f47a6a8 into main Oct 5, 2021
@spiffcs spiffcs deleted the 495-spdx-document-namespace branch October 5, 2021 17:10
@spiffcs spiffcs restored the 495-spdx-document-namespace branch October 6, 2021 16:09
@spiffcs spiffcs deleted the 495-spdx-document-namespace branch October 28, 2021 19:42
GijsCalis pushed a commit to GijsCalis/syft that referenced this pull request Feb 19, 2024
…put (anchore#528)

* add unique namespace identifier

Signed-off-by: Christopher Angelo Phillips <[email protected]>
Development

Successfully merging this pull request may close these issues.

Missing/incorrect SPDX fields: DocumentName, DocumentNamespace

3 participants