Fix CPE encode/decode when it contains special chars #714
Conversation
Force-pushed from ca5c64a to 5f2624c
Currently syft seems to generate invalid CPEs which do not conform to the official CPE spec. This is because the underlying nvdtools library is not a completely spec-compliant implementation and has some interesting bugs/issues. These are the issues I have encountered with nvdtools:

1. It incorrectly parses strings which are not CPEs as valid CPEs. This breaks our filter function, which is supposed to filter out any incorrect CPEs we generate. To fix this, I have introduced a new regex in the NewCPE function which follows the upstream spec and filters out any incorrect CPEs.
2. Introduce wfn.WFNize for any CPE attributes we infer from packages. This ensures that we escape and quote any special characters before putting them into CPEs. Note that nvdtools has yet another bug in the WFNize function, specifically in the "addSlashesAt" part, which stops the loop as soon as it encounters ":" (a valid character for a WFN attribute after quoting); the way nvdtools handles it causes it to truncate strings that contain ":". As a result, strings like "prefix:1.2", which should have been quoted as "prefix\:1.2", end up becoming just "prefix", causing loss of information and incorrect CPEs being generated. For now, in such cases we drop strings containing ":" entirely. This is similar to the way we handled CPE filtering in the past with http URLs as vendor strings.
3. Add special handling for versions which contain ":" due to epochs in debian and rpm. In this case, we strip out the part before the ":" (the epoch) and only output the actual version. This ensures we are not discarding valid version strings due to point 2.

In the future we should look at moving to a more spec-compliant CPE parsing library to avoid such shenanigans.

Signed-off-by: Sambhav Kothari <[email protected]>
Force-pushed from 5f2624c to d79f933
WFNize seems to not be part of the standard as per https://pkg.go.dev/github.com/facebookincubator/[email protected]/wfn#WFNize and seems to have bugs/issues with encode/decode cycles, so I am just removing it at this point and relying on the CPE regex to filter out invalid CPEs for now. Signed-off-by: Sambhav Kothari <[email protected]>
@samj1912 thanks for jumping on this! I have a few reflections on the PR specifically:

- (regarding epoch) I'm not certain it is safe to drop epochs from versions, though I see exactly why it was done. If a tool downstream of syft extracts the version from the CPE and uses it as a basis of comparison with other versions of the same product, without the epoch it is not clear whether the comparison is valid (since only versions within the same epoch are directly comparable; otherwise the epoch values themselves should be compared).
- (regarding WFNize) You're right that wfn.WFNize doesn't quite do the trick, and keeping the encode/decode path stable is an important item for us. Unless I'm missing something, I think the lib is missing something akin to a "wfn.UnWFNize" for escaped values... something that isn't easily bolted onto syft (but possible, I think).
- (regarding an nvdtools replacement?) You made an observation in the community slack that I wanted to highlight:
I think this is definitely an option that should be on the table. If we dive down this path, I think we need to consider how it will impact grype's matching by CPE (which is, I think, one of the reasons this lib was chosen to begin with).
https://github.com/knqyf263/go-cpe seems to be the only other CPE parsing lib that I can find. From a cursory look, it seems to handle quoting and parsing better than the current library, but it seems mostly unmaintained at this point.
I've found a few others, but the one you found is at the top of my list of ones to try as well 👍. The cases that a CPE library needs to cover (at least) are:
(there is a chance I missed a case here, but these are the big ones) The third case comes up in grype, specifically here: https://github.com/anchore/grype/blob/4f964c4ee26ad01a80b8bcffb6bf23c0afb71d09/grype/vulnerability/store_adapter.go#L103 This function wraps nvdtools' comparison method. If the new CPE lib doesn't have the exact comparison method we're interested in, that may be OK, as long as we can craft it with the components that the lib provides (ideally easily 😄). Happy to help on this one -- odds are this is a pretty big undertaking (could be wrong though!). Feel free to reach out to me on slack!
Force-pushed from 47f6ac3 to 21f6611
From slack https://anchorecommunity.slack.com/archives/C4PJFNEEM/p1640131075279100?thread_ts=1640100243.273100&cid=C4PJFNEEM -
I also noticed an issue with the regex check I introduced in an earlier commit. The regex check, however, is still necessary: it ensures that any CPEs that remain invalid even after the quoting logic don't make it through to the end.
Force-pushed from 21f6611 to 8b64bf9
Signed-off-by: Sambhav Kothari <[email protected]>
Force-pushed from 8b64bf9 to a36c0b4
Force-pushed from edc4db1 to af4a1cc
Signed-off-by: Sambhav Kothari <[email protected]>
Force-pushed from af4a1cc to 82015bf
@wagoodman this should be ready for review now. I had to update the decode function (normalize) to match the encode function (sanitize), since StripSlashes was also not doing what it is supposed to do.
Signed-off-by: Sambhav Kothari <[email protected]>
Small nits for discussion and thanks so much for the PR @samj1912!
Core functionality of the PR looks like a great update to help with the issues in the underlying nvdtools library.
```go
	CPEUrl string `json:"cpe-url"`
	WFN    CPE    `json:"wfn"`
}{}
out, err := ioutil.ReadFile("test-fixtures/cpe-data.json")
```
this test fixture file has a lot of cases! Is there a way to reduce this to a minimum (or at least smaller) set of cases that still reflect all of the features tested?
Really great work @samj1912 ! I only have #714 (comment) as a comment (as a side note, I don't ever remember leaving a comment on a review that says "this may be too many tests" ... it's a nice problem to have!). If you think that the set of tests can't easily be reduced, we can get this in as is and maybe work on reducing it later (having the tests is better than not).
```go
const cpeRegexString = ((`^([c][pP][eE]:/[AHOaho]?(:[A-Za-z0-9\._\-~%]*){0,6})`) +
	// Or match the CPE binding string
	// Note that we had to replace '`' with '\x60' to escape the backticks
	`|(cpe:2\.3:[aho\*\-](:(((\?*|\*?)([a-zA-Z0-9\-\._]|(\\[\\\*\?!"#$$%&'\(\)\+,/:;<=>@\[\]\^\x60\{\|}~]))+(\?*|\*?))|[\*\-])){5}(:(([a-zA-Z]{2,3}(-([a-zA-Z]{2}|[0-9]{3}))?)|[\*\-]))(:(((\?*|\*?)([a-zA-Z0-9\-\._]|(\\[\\\*\?!"#$$%&'\(\)\+,/:;<=>@\[\]\^\x60\{\|}~]))+(\?*|\*?))|[\*\-])){4})$`)
```
this is quite a scary regex! Is this check 100% necessary? or can it be left off during CPE construction? (I presume the very-large test fixture of CPEs covers some of these regex paths in testing)
@wagoodman I think it is necessary since we use the NewCPE to validate and filter out invalid CPEs. This ensures that syft never puts/parses an invalid CPE in an sbom. I have seen issues where syft outputs CPEs which are not accepted/matched by https://nvd.nist.gov/products/cpe/search
Though this is a large test fixture, you're right that this is making things better. We can always cull it back in a future PR, or remove it if we replace the CPE library in the future.
Signed-off-by: Sambhav Kothari <[email protected]>
Force-pushed from 3a544ec to 666fd69
Solid work, thanks for the contribution! 💯
Fixes #712
Fixes #426
Currently syft seems to generate invalid CPEs which do not conform to the official CPE spec. This is because the underlying nvdtools library is not a completely spec-compliant implementation and has some interesting bugs/issues.

These are the issues I have encountered with nvdtools:

1. It incorrectly parses strings which are not CPEs as valid CPEs. This breaks our filter function, which is supposed to filter out any incorrect CPEs we generate. To fix this, I have introduced a new regex in the NewCPE function which follows the upstream spec and filters out any incorrect CPEs.
2. Introduce wfn.WFNize for any CPE attributes we infer from packages. This ensures that we escape and quote any special characters before putting them into CPEs. Note that nvdtools has yet another bug in the WFNize function, specifically in the "addSlashesAt" part, which stops the loop as soon as it encounters ":" (a valid character for a WFN attribute after quoting); the way nvdtools handles it causes it to truncate strings that contain ":". As a result, strings like "prefix:1.2", which should have been quoted as "prefix\:1.2", end up becoming just "prefix", causing loss of information and incorrect CPEs being generated. For now, in such cases we drop strings containing ":" entirely. This is similar to the way we handled CPE filtering in the past with http URLs as vendor strings.
3. Add special handling for versions which contain ":" due to epochs in debian and rpm. In this case, we strip out the part before the ":" (the epoch) and only output the actual version. This ensures we are not discarding valid version strings due to point 2.

In the future we should look at moving to a more spec-compliant CPE parsing library to avoid such shenanigans.

Edit on point 2 after the second commit:

Remove WFNize for input strings.

WFNize seems not to be part of the standard as per
pkg.go.dev/github.com/facebookincubator/[email protected]/wfn#WFNize
and seems to have bugs/issues with encode/decode cycles, so I am
just removing it at this point and relying on the CPE regex to filter
out invalid CPEs for now.