(#495) Update `documentNamespace` uniqueness for spdx-json output #528

spiffcs · 2021-10-04T14:09:37Z

Fixes: #495

When the input directory is `.` or `./`, the name is stripped and a UUIDv4 alone becomes the namespace identifier.

When the input is a URL, here is an example of the output format:

"documentNamespace": "https:/anchore.com/syft/image/gcr.io/distroless/nodejs-debian10-6d671418-314c-480b-a620-c86d985d5441",

Since `.` is an unreserved character, I believe we can get away with keeping the base URL for image inputs in the path:
https://datatracker.ietf.org/doc/html/rfc3986#section-2.3
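As a rough illustration of the RFC 3986 point, here is a minimal sketch of the section 2.3 "unreserved" set. The helper names `isUnreserved` and `allUnreserved` are hypothetical and are not part of syft.

```go
package main

import "fmt"

// isUnreserved reports whether r belongs to the RFC 3986 section 2.3
// "unreserved" set: ALPHA / DIGIT / "-" / "." / "_" / "~".
func isUnreserved(r rune) bool {
	switch {
	case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9':
		return true
	case r == '-', r == '.', r == '_', r == '~':
		return true
	}
	return false
}

// allUnreserved reports whether every character of s is unreserved.
func allUnreserved(s string) bool {
	for _, r := range s {
		if !isUnreserved(r) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(allUnreserved("gcr.io"))            // true
	fmt.Println(allUnreserved("gcr.io/distroless")) // false: "/" is a reserved delimiter
}
```

The registry host is safe as-is, while `/` still acts as a path delimiter, so an image reference keeps its segments in the resulting URI.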

Signed-off-by: Christopher Angelo Phillips <[email protected]>

spiffcs · 2021-10-04T14:11:37Z

I think there is a bug when scanning a directory.

github-actions · 2021-10-04T14:12:40Z

Benchmark Test Results

Benchmark results from the latest changes vs base branch

name                                                   old time/op    new time/op    delta
ImagePackageCatalogers/ruby-gemspec-cataloger-2          1.04ms ± 3%    1.21ms ±11%  +16.98%  (p=0.008 n=5+5)
ImagePackageCatalogers/python-package-cataloger-2        1.75ms ± 0%    1.93ms ± 6%  +10.37%  (p=0.008 n=5+5)
ImagePackageCatalogers/javascript-package-cataloger-2     498µs ± 3%     553µs ± 5%  +11.06%  (p=0.008 n=5+5)
ImagePackageCatalogers/dpkgdb-cataloger-2                 489µs ± 2%     599µs ± 8%  +22.36%  (p=0.008 n=5+5)
ImagePackageCatalogers/rpmdb-cataloger-2                  486µs ± 1%     569µs ±13%  +17.00%  (p=0.008 n=5+5)
ImagePackageCatalogers/java-cataloger-2                  10.5ms ± 1%    12.0ms ± 4%  +14.42%  (p=0.008 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                  821µs ± 3%     881µs ± 5%   +7.31%  (p=0.008 n=5+5)
ImagePackageCatalogers/go-cataloger-2                     256µs ± 3%     299µs ± 9%  +17.00%  (p=0.008 n=5+5)
ImagePackageCatalogers/rust-cataloger-2                   457µs ± 0%     536µs ± 8%  +17.06%  (p=0.008 n=5+5)

name                                                   old alloc/op   new alloc/op   delta
ImagePackageCatalogers/ruby-gemspec-cataloger-2           146kB ± 0%     146kB ± 0%     ~     (p=0.841 n=5+5)
ImagePackageCatalogers/python-package-cataloger-2         754kB ± 0%     755kB ± 0%     ~     (p=0.421 n=5+5)
ImagePackageCatalogers/javascript-package-cataloger-2     118kB ± 0%     118kB ± 0%   +0.12%  (p=0.016 n=5+5)
ImagePackageCatalogers/dpkgdb-cataloger-2                 132kB ± 0%     132kB ± 0%     ~     (p=0.095 n=5+5)
ImagePackageCatalogers/rpmdb-cataloger-2                  140kB ± 0%     140kB ± 0%   -0.00%  (p=0.024 n=5+5)
ImagePackageCatalogers/java-cataloger-2                  2.73MB ± 0%    2.74MB ± 0%     ~     (p=0.095 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                 1.18MB ± 0%    1.18MB ± 0%     ~     (p=0.056 n=5+5)
ImagePackageCatalogers/go-cataloger-2                    54.8kB ± 0%    54.9kB ± 0%   +0.16%  (p=0.016 n=5+5)
ImagePackageCatalogers/rust-cataloger-2                   123kB ± 0%     123kB ± 0%   -0.01%  (p=0.008 n=5+5)

name                                                   old allocs/op  new allocs/op  delta
ImagePackageCatalogers/ruby-gemspec-cataloger-2           2.41k ± 0%     2.41k ± 0%     ~     (all equal)
ImagePackageCatalogers/python-package-cataloger-2         9.58k ± 0%     9.58k ± 0%     ~     (p=0.889 n=5+5)
ImagePackageCatalogers/javascript-package-cataloger-2     1.99k ± 0%     1.99k ± 0%     ~     (all equal)
ImagePackageCatalogers/dpkgdb-cataloger-2                 2.54k ± 0%     2.54k ± 0%     ~     (all equal)
ImagePackageCatalogers/rpmdb-cataloger-2                  3.25k ± 0%     3.25k ± 0%     ~     (all equal)
ImagePackageCatalogers/java-cataloger-2                   37.5k ± 0%     37.5k ± 0%     ~     (p=0.516 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                  2.48k ± 0%     2.48k ± 0%     ~     (all equal)
ImagePackageCatalogers/go-cataloger-2                     1.46k ± 0%     1.46k ± 0%     ~     (all equal)
ImagePackageCatalogers/rust-cataloger-2                   3.21k ± 0%     3.21k ± 0%     ~     (all equal)

luhring · 2021-10-04T15:18:08Z

internal/presenter/packages/spdx_json_presenter.go

+		namespace = strings.Trim(fmt.Sprintf("%s-%s", name, uID.String()), "-")
 	case source.DirectoryScheme:
 		name = srcMetadata.Path
+		namespace = uID.String()


Why do we construct this value using two different approaches here?

Good question - I opted to keep the ImageMetadata.UserInput as the namespace prefix for images since that's what exists on main.

For the directory, I could not think of a great metadata identifier besides the digested contents path. The path presents a few challenges when appending it to our https://anchore.com/syft/image/%s format.

A user could specify an absolute path /foo/bar which would result in ...image//foo/bar-<uuid>.
A user could also specify a relative path ./foo/bar which would result in ...image/./foo/bar-<uuid>

Rather than parse and clean all possible filesystem inputs, I opted for simplicity and supply just the UUID. The documentNamespace field on main is currently not set for directory inputs, and other fields can help identify the context of the scanned contents.
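To make the two path cases concrete, here is a minimal sketch that appends raw user input to the format string quoted above. `naiveNamespace` is a hypothetical helper and `<uuid>` is a placeholder, not a real identifier.

```go
package main

import "fmt"

// namespaceFormat mirrors the format string quoted in this comment.
const namespaceFormat = "https://anchore.com/syft/image/%s"

// naiveNamespace is a hypothetical helper that appends the raw input
// and an ID without any cleaning of the input.
func naiveNamespace(input, id string) string {
	return fmt.Sprintf(namespaceFormat, fmt.Sprintf("%s-%s", input, id))
}

func main() {
	id := "<uuid>" // placeholder for illustration only

	// Absolute path: produces a double slash after "image".
	fmt.Println(naiveNamespace("/foo/bar", id))
	// prints: https://anchore.com/syft/image//foo/bar-<uuid>

	// Relative path: leaves a "./" segment in the URI.
	fmt.Println(naiveNamespace("./foo/bar", id))
	// prints: https://anchore.com/syft/image/./foo/bar-<uuid>
}
```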

I think either case (image or directory) can include FS paths, so we might need to consider that in this solution. For example, here's a run of this branch while scanning an OCI directory:

// ... "documentNamespace": "https://anchore.com/syft/image//Users/dan/Desktop/ubuntu-3414a09d-7f4e-4632-8fa0-6d440cf2c71e", // ...

Curious for your thoughts here...

That's a good callout.

This behavior also exists currently on main so I'm happy to use this branch to squash this bug. I was able to reproduce it on my machine with alpine and saw this generated with the latest version of syft:

"documentNamespace": "https://anchore.com/syft/image//Users/hal/development/images/alpine",

We could run the resulting namespace string through a cleaner function that strips all bad prefixes.

We could also adopt a common method of just appending a uuid since the generated sbom has other fields that provide context as to what was scanned.
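A cleaner along the lines of the first option might look like the sketch below. `cleanName` is a hypothetical name, and this is only one possible set of rules, not what the branch implements.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// cleanName is a hypothetical cleaner: it normalizes the input with
// path.Clean, drops leading slashes, and returns "" when nothing
// meaningful is left (for example for "." or "./").
func cleanName(input string) string {
	cleaned := path.Clean(input)             // "./foo/bar" -> "foo/bar"; "./" -> "."
	cleaned = strings.TrimLeft(cleaned, "/") // "/foo/bar" -> "foo/bar"
	if cleaned == "." {
		return ""
	}
	return cleaned
}

func main() {
	for _, in := range []string{"/foo/bar", "./foo/bar", ".", "./"} {
		fmt.Printf("%q -> %q\n", in, cleanName(in))
	}
}
```

Inputs such as `../foo` pass through this sketch unchanged, which hints at how many cases a real cleaner would need to cover.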

These are good ideas. I don't have a strong suggestion here. Here's what I do think...

We should probably use the same logic for building this string, regardless of an image or directory path. This lets us factor out this string generation as a function, if we want. Also, we should probably remove "image" from "https://anchore.com/syft/image", since this value is getting used in both paths (this is true both already and potentially going forward, too).

We could go the route of cleaning the user input value, but I could also see that potentially getting hairy. It'd be great if there was a convention we could lean on that exists already — one that has accounted for producing safe "job identifying" strings. I'm not sure if this exists today. So perhaps for now, UUIDs alone are enough (instead of incorporating user input)?

Yea - I think UUIDs alone are good enough until we start seeing more of a standard being adopted across the SBOM community. I'll also update the path so we have syft/dir vs syft/image depending on the input contents.
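A minimal sketch of the UUID-only approach with an input-specific path segment could look like this. `newUUIDv4` is a standard-library stand-in for whatever UUID package the code actually uses, and `uuidOnlyNamespace` is a hypothetical name.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 returns a random (version 4) UUID string. It is a
// stdlib-only stand-in for a UUID library, used here for illustration.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set the version to 4
	b[8] = (b[8] & 0x3f) | 0x80 // set the RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// uuidOnlyNamespace builds a namespace from the input kind ("image"
// or "dir") and a UUID, with no user input included.
func uuidOnlyNamespace(kind, id string) string {
	return fmt.Sprintf("https://anchore.com/syft/%s/%s", kind, id)
}

func main() {
	id, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	fmt.Println(uuidOnlyNamespace("image", id))
	fmt.Println(uuidOnlyNamespace("dir", id))
}
```

Because no user-supplied path enters the string, none of the absolute or relative path edge cases apply.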

spiffcs · 2021-10-04T15:38:48Z

@luhring updated the verbosity and organization based on PR feedback.

Let me know what you think of my comment RE: different value construction. Happy to make any edits that we settle on if we need more metadata in the namespace field for directory inputs.

kzantow

Left some comments in the issue about this, too: #495 (comment)

kzantow · 2021-10-05T01:09:15Z

internal/presenter/packages/spdx_json_presenter.go

 	switch srcMetadata.Scheme {
 	case source.ImageScheme:
 		name = srcMetadata.ImageMetadata.UserInput
+		identifier = fmt.Sprintf("image/%s", uniqueID.String())


Note: I think this identifier is meant to include the name, e.g. it should be something like: http://anchore.com/syft/image/alpine-latest-SOMEUUID. See the DocumentNamespace spec, which says it should have the form: http://[CreatorWebsite]/[pathToSpdx]/[DocumentName]-[UUID]. This is why I rolled it into the same issue as the missing DocumentName.
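For reference, the spec's template can be assembled roughly as follows. This is a sketch under assumptions: `documentNamespace` and its parameter names are hypothetical, and `SOMEUUID` is the placeholder from the comment above.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// documentNamespace assembles
// http://[CreatorWebsite]/[pathToSpdx]/[DocumentName]-[UUID]
// using net/url so the scheme separator is kept intact.
func documentNamespace(creatorWebsite, pathToSpdx, documentName, id string) string {
	u := url.URL{
		Scheme: "http",
		Host:   creatorWebsite,
		// path.Join only touches the path part, never the "http://" prefix.
		Path: path.Join("/", pathToSpdx, documentName+"-"+id),
	}
	return u.String()
}

func main() {
	fmt.Println(documentNamespace("anchore.com", "syft/image", "alpine-latest", "SOMEUUID"))
	// prints: http://anchore.com/syft/image/alpine-latest-SOMEUUID
}
```

Building the URI from parts keeps the `//` after the scheme; calling `path.Join` on a full URL string would collapse it to a single slash.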

Nice! Good to see it so explicitly formatted in their docs. This format is pretty easy for images, but one of the review comments I got was that we want it to be similar across directory and image inputs.

@luhring what are your thoughts given the spec information? Is it ok if we have some different formatting options for image vs directory?

What should the document name be when we scan a directory or OCI directory source?

internal/presenter/packages/spdx_json_presenter.go

luhring

🙌

internal/presenter/packages/spdx_json_presenter.go

No longer valid

kzantow

Huge step forward for mankind!

…put (anchore#528) * add unique namespace identifier Signed-off-by: Christopher Angelo Phillips <[email protected]>

add unique namespace identifier

d401167

Signed-off-by: Christopher Angelo Phillips <[email protected]>

spiffcs linked an issue Oct 4, 2021 that may be closed by this pull request

Missing/incorrect SPDX fields: DocumentName, DocumentNamespace #495

Closed

spiffcs requested a review from a team October 4, 2021 14:10

update namespace for directory path

07a171a

Signed-off-by: Christopher Angelo Phillips <[email protected]>

spiffcs changed the title ~~(#495) Updates documentNamespace uniqueness for spdx-json output~~ (#495) Update documentNamespace uniqueness for spdx-json output Oct 4, 2021

luhring reviewed Oct 4, 2021

View reviewed changes

update code organization from pr comments

5d7a234

Signed-off-by: Christopher Angelo Phillips <[email protected]>

luhring reviewed Oct 4, 2021

View reviewed changes

internal/presenter/packages/spdx_json_presenter.go Outdated Show resolved Hide resolved

spiffcs added 2 commits October 4, 2021 11:56

update to const

e2453ee

Signed-off-by: Christopher Angelo Phillips <[email protected]>

update identifier to be input specific with uuid

e4cd021

Signed-off-by: Christopher Angelo Phillips <[email protected]>

spiffcs force-pushed the 495-spdx-document-namespace branch from 5bc213b to e4cd021 Compare October 4, 2021 20:38

kzantow previously requested changes Oct 5, 2021

View reviewed changes

spiffcs added 2 commits October 5, 2021 11:01

add name back into uri

1e0ca22

Signed-off-by: Christopher Angelo Phillips <[email protected]>

add small clean function

0675307

Signed-off-by: Christopher Angelo Phillips <[email protected]>

spiffcs force-pushed the 495-spdx-document-namespace branch from 2ca49ba to 0675307 Compare October 5, 2021 15:59

spiffcs added 7 commits October 5, 2021 12:05

handle edge case for relative/absolute

d94f673

Signed-off-by: Christopher Angelo Phillips <[email protected]>

update failing snapshot

f68c159

Signed-off-by: Christopher Angelo Phillips <[email protected]>

small changes

f1dd70a

Signed-off-by: Christopher Angelo Phillips <[email protected]>

update fixture back

221b808

Signed-off-by: Christopher Angelo Phillips <[email protected]>

add conditional after clean

7587a79

Signed-off-by: Christopher Angelo Phillips <[email protected]>

go simple fix

e9a475b

Signed-off-by: Christopher Angelo Phillips <[email protected]>

generalize identifier construction

87c46fc

Signed-off-by: Christopher Angelo Phillips <[email protected]>

kzantow reviewed Oct 5, 2021

View reviewed changes

internal/presenter/packages/spdx_json_presenter.go Show resolved Hide resolved

luhring approved these changes Oct 5, 2021

View reviewed changes

kzantow reviewed Oct 5, 2021

View reviewed changes

internal/presenter/packages/spdx_json_presenter.go Show resolved Hide resolved

kzantow approved these changes Oct 5, 2021

View reviewed changes

spiffcs merged commit f47a6a8 into main Oct 5, 2021

spiffcs deleted the 495-spdx-document-namespace branch October 5, 2021 17:10

spiffcs restored the 495-spdx-document-namespace branch October 6, 2021 16:09

spiffcs deleted the 495-spdx-document-namespace branch October 28, 2021 19:42

GijsCalis pushed a commit to GijsCalis/syft that referenced this pull request Feb 19, 2024

(anchore#495) Update documentNamespace uniqueness for spdx-json out…

205e8ed

…put (anchore#528) * add unique namespace identifier Signed-off-by: Christopher Angelo Phillips <[email protected]>
