Fix CPE encode/decode when it contains special chars #714
Conversation
Force-pushed from ca5c64a to 5f2624c
Currently syft seems to generate invalid CPEs which do not conform to the official CPE spec. This is because the underlying nvdtools library is not a completely spec-compliant implementation and has some interesting bugs/issues. These are the issues I have encountered with nvdtools:

1. It incorrectly parses strings which are not CPEs as valid CPEs. This breaks our filter function, which is supposed to filter out any incorrect CPEs we generate. To fix this, I have introduced a new regex in the NewCPE function which follows the upstream spec and filters out any incorrect CPEs.
2. Introduce wfn.WFNize for any CPE attributes we infer from packages. This ensures that we escape and quote any special characters before putting them into CPEs. Note that nvdtools has yet another bug in the WFNize function, specifically in the "addSlashesAt" part, which stops the loop as soon as it encounters ":" (a valid character for a WFN attribute after quoting); the way nvdtools handles it causes it to truncate strings that contain ":". As a result, strings like "prefix:1.2", which should have been quoted as "prefix\:1.2", end up becoming just "prefix", causing loss of information and incorrect CPEs being generated. For now, in such cases we drop strings containing ":" entirely. This is similar to the way we handled CPE filtering in the past with http URLs as vendor strings.
3. Add special handling for versions which contain ":" due to epochs in debian and rpm. In this case, we strip out the part before the ":" (the epoch) and only output the actual version. This ensures we are not discarding valid version strings due to point 2.

In the future we should look at moving to a more spec-compliant CPE parsing library to avoid such shenanigans.

Signed-off-by: Sambhav Kothari <[email protected]>
Force-pushed from 5f2624c to d79f933
WFNize seems to not be part of the standard as per https://pkg.go.dev/github.com/facebookincubator/[email protected]/wfn#WFNize and seems to have bugs/issues with encode/decode cycles, so I am just removing it at this point and relying on the CPE regex to filter out invalid CPEs for now. Signed-off-by: Sambhav Kothari <[email protected]>
@samj1912 thanks for jumping on this! I have a few reflections on the PR specifically:

- (regarding epoch) I'm not certain it is safe to drop epochs from versions, though I see exactly why it was done. If a tool downstream of syft extracts the version from the CPE and uses it as a basis of comparison with other versions of the same product, without the epoch it is not clear whether the comparison is valid (since only versions within the same epoch are directly comparable; otherwise the epoch values themselves should be compared).
- (regarding WFNize) You're right that wfn.WFNize doesn't quite do the trick, and keeping the encode/decode path stable is an important item for us. Unless I'm missing something, I think the lib is missing something akin to a "wfn.UnWFNize" for escaped values... something that isn't easily bolted onto syft (but possible, I think).
- (regarding an nvdtools replacement?) You made an observation in the community slack that I wanted to highlight:
I think this is definitely an option that should be on the table. If we dive down this path, I think we need to consider how it will impact grype's matching by CPE (which is, I think, one of the reasons this lib was chosen to begin with).
https://github.com/knqyf263/go-cpe seems to be the only other CPE parsing lib that I can find. From a cursory look, it seems to handle quoting and parsing better than the current library, but it seems mostly unmaintained at this point.
I've found a few others, but the one you found is at the top of my list of ones to try as well 👍. The cases that a CPE library needs to cover (at least) are:
(there is a chance I missed a case here, but these are the big ones) The third case comes up in grype, specifically here: https://github.com/anchore/grype/blob/4f964c4ee26ad01a80b8bcffb6bf23c0afb71d09/grype/vulnerability/store_adapter.go#L103 This function wraps nvdtools' comparison method. If the new CPE lib doesn't have the exact comparison method we're interested in, that may be OK, as long as we can craft it with the components that the lib provides (ideally easily 😄). Happy to help on this one -- odds are this is a pretty big undertaking (could be wrong though!). Feel free to reach out to me on slack!
Force-pushed from 47f6ac3 to 21f6611
From slack https://anchorecommunity.slack.com/archives/C4PJFNEEM/p1640131075279100?thread_ts=1640100243.273100&cid=C4PJFNEEM -
I also noticed an issue with the regex check I introduced in an earlier commit. The regex check, however, is still necessary: it ensures that any CPEs that remain invalid even after the quoting logic don't make it through to the end.
Force-pushed from 21f6611 to 8b64bf9
Signed-off-by: Sambhav Kothari <[email protected]>
Force-pushed from 8b64bf9 to a36c0b4
Force-pushed from edc4db1 to af4a1cc
Signed-off-by: Sambhav Kothari <[email protected]>
Force-pushed from af4a1cc to 82015bf
@wagoodman this should be ready for review now. I had to update the decode function (normalize) to match the encode function (sanitize), since StripSlashes was also not doing what it is supposed to do.
Signed-off-by: Sambhav Kothari <[email protected]>
Small nits for discussion and thanks so much for the PR @samj1912!
Core functionality of the PR looks like a great update to help with the issues in the underlying nvdtools library.
```go
	CPEUrl string `json:"cpe-url"`
	WFN    CPE    `json:"wfn"`
}{}
out, err := ioutil.ReadFile("test-fixtures/cpe-data.json")
```
this test fixture file has a lot of cases! Is there a way to reduce this to a minimum (or at least smaller) set of cases that still reflect all of the features tested?
Really great work @samj1912 ! I only have #714 (comment) as a comment (as a side note, I don't ever remember leaving a comment on a review that says "this may be too many tests" ... it's a nice problem to have!). If you think that the set of tests can't easily be reduced, we can get this in as is and maybe work on reducing it later (having the tests is better than not).
```go
const cpeRegexString = ((`^([c][pP][eE]:/[AHOaho]?(:[A-Za-z0-9\._\-~%]*){0,6})`) +
	// Or match the CPE binding string
	// Note that we had to replace '`' with '\x60' to escape the backticks
	`|(cpe:2\.3:[aho\*\-](:(((\?*|\*?)([a-zA-Z0-9\-\._]|(\\[\\\*\?!"#$$%&'\(\)\+,/:;<=>@\[\]\^\x60\{\|}~]))+(\?*|\*?))|[\*\-])){5}(:(([a-zA-Z]{2,3}(-([a-zA-Z]{2}|[0-9]{3}))?)|[\*\-]))(:(((\?*|\*?)([a-zA-Z0-9\-\._]|(\\[\\\*\?!"#$$%&'\(\)\+,/:;<=>@\[\]\^\x60\{\|}~]))+(\?*|\*?))|[\*\-])){4})$`)
```
this is quite a scary regex! Is this check 100% necessary? or can it be left off during CPE construction? (I presume the very-large test fixture of CPEs covers some of these regex paths in testing)
@wagoodman I think it is necessary since we use the NewCPE to validate and filter out invalid CPEs. This ensures that syft never puts/parses an invalid CPE in an sbom. I have seen issues where syft outputs CPEs which are not accepted/matched by https://nvd.nist.gov/products/cpe/search
Though this is a large test fixture, you're right that this is making things better. We can always cull it back in a future PR, or remove it if we replace the CPE library in the future.
Signed-off-by: Sambhav Kothari <[email protected]>
Force-pushed from 3a544ec to 666fd69
Solid work, thanks for the contribution! 💯
Fixes #712
Fixes #426
Currently syft seems to generate invalid CPEs which do not conform to the official CPE spec. This is because the underlying nvdtools library is not a completely spec-compliant implementation and has some interesting bugs/issues.

These are the issues I have encountered with nvdtools:

1. It incorrectly parses strings which are not CPEs as valid CPEs. This breaks our filter function, which is supposed to filter out any incorrect CPEs we generate. To fix this, I have introduced a new regex in the NewCPE function which follows the upstream spec and filters out any incorrect CPEs.
2. Introduce wfn.WFNize for any CPE attributes we infer from packages. This ensures that we escape and quote any special characters before putting them into CPEs. Note that nvdtools has yet another bug in the WFNize function, specifically in the "addSlashesAt" part, which stops the loop as soon as it encounters ":" (a valid character for a WFN attribute after quoting); the way nvdtools handles it causes it to truncate strings that contain ":". As a result, strings like "prefix:1.2", which should have been quoted as "prefix\:1.2", end up becoming just "prefix", causing loss of information and incorrect CPEs being generated. For now, in such cases we drop strings containing ":" entirely. This is similar to the way we handled CPE filtering in the past with http URLs as vendor strings.
3. Add special handling for versions which contain ":" due to epochs in debian and rpm. In this case, we strip out the part before the ":" (the epoch) and only output the actual version. This ensures we are not discarding valid version strings due to point 2.

In the future we should look at moving to a more spec-compliant CPE parsing library to avoid such shenanigans.

Edit on point 2 after the second commit:

Remove WFNize for input strings.

WFNize seems not to be part of the standard as per
pkg.go.dev/github.com/facebookincubator/[email protected]/wfn#WFNize
and seems to have bugs/issues with encode/decode cycles, so I am
just removing it at this point and relying on the CPE regex to filter
out invalid CPEs for now.