
Conversation

@sambhav
Contributor

@sambhav sambhav commented Dec 21, 2021

Fixes #712
Fixes #426

Currently syft seems to generate invalid CPEs which do not
conform to the official CPE spec. This is because the underlying
nvdtools library is not a completely spec-compliant implementation
and has some interesting bugs/issues.

The following is a list of issues I have encountered with nvdtools:

  1. It incorrectly parses strings which are not CPEs as valid CPEs. This
    messes up our filter function, which is supposed to filter out any
    incorrect CPEs we generate. To fix this, I have introduced
    a new regex in the NewCPE function which follows the upstream spec and
    filters out any incorrect CPEs.

  2. Introduce wfn.WFNize for any CPE attributes we infer from packages.
    This ensures that we escape and quote any special characters
    before putting them into CPEs. Note that nvdtools has yet another bug
    in the WFNize function, specifically the "addSlashesAt" part, which
    stops its loop as soon as it encounters ":" (a valid character for a
    WFN attribute after quoting). The way nvdtools handles this causes it
    to truncate strings that contain ":". As a result, strings like
    "prefix:1.2", which should have been quoted as "prefix\:1.2", end up
    becoming "prefix" instead, causing loss of information and incorrect
    CPEs to be generated. In such cases we therefore remove strings
    containing ":" anywhere in them entirely for now. This is similar
    to the way we handled CPE filtering in the past for http URLs used as
    vendor strings.

  3. Add special handling for versions which contain ":" due to epochs in
    debian and rpm. In this case, we strip out the part before the ":",
    i.e. the epoch, and only output the actual version. This ensures we
    are not discarding valid version strings due to point 2.

In the future we should look at moving to a more spec-compliant CPE
parsing library to avoid such shenanigans.
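The epoch handling described in point 3 can be sketched roughly as follows (`stripEpoch` is a hypothetical name for illustration, not syft's actual helper):

```go
package main

import (
	"fmt"
	"strings"
)

// stripEpoch drops the leading "<epoch>:" prefix used by deb/rpm versions
// (e.g. "1:2.30-93.el8" -> "2.30-93.el8") so that the remaining version
// string no longer contains ":" and is not discarded by the CPE filter.
func stripEpoch(version string) string {
	if idx := strings.Index(version, ":"); idx != -1 {
		return version[idx+1:]
	}
	return version
}

func main() {
	fmt.Println(stripEpoch("1:2.30-93.el8")) // prints "2.30-93.el8"
	fmt.Println(stripEpoch("2.30-93.el8"))   // no epoch: unchanged
}
```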

Edit on Point 2 after the second commit -

Remove WFNize for input strings
WFNize seems to not be part of the standard as per
pkg.go.dev/github.com/facebookincubator/[email protected]/wfn#WFNize
and seems to have bugs/issues with encode/decode cycles, so I am
just removing it at this point and relying on the CPE regex to filter
out invalid CPEs for now.

@sambhav sambhav force-pushed the sanitize-cpe-version branch 5 times, most recently from ca5c64a to 5f2624c Compare December 21, 2021 18:43
@sambhav sambhav force-pushed the sanitize-cpe-version branch from 5f2624c to d79f933 Compare December 21, 2021 18:46
@wagoodman
Contributor

@samj1912 thanks for jumping on this!

I have a few reflections on the PR specifically:

(regarding epoch) I'm not certain it is safe to drop epochs from versions, though I see exactly why it was done. If a tool downstream of syft extracts the version from the CPE and uses that as a basis of comparison with other versions from the same product, without epoch it is not clear if the comparison is valid or not (since only versions within the same epoch are directly comparable, otherwise the epoch values themselves should be compared).
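The comparability concern can be sketched with a toy model (numeric epochs and lexical upstream comparison are simplifying assumptions here; real deb/rpm version comparison is more involved):

```go
package main

import "fmt"

// toy model of a deb/rpm version: an epoch plus an upstream version
type epochVersion struct {
	epoch    int
	upstream string
}

// less compares epochs first; upstream versions are only compared when
// the epochs match. Dropping the epoch from a CPE loses the information
// needed for that first step.
func less(a, b epochVersion) bool {
	if a.epoch != b.epoch {
		return a.epoch < b.epoch
	}
	return a.upstream < b.upstream // toy lexical comparison
}

func main() {
	older := epochVersion{epoch: 1, upstream: "9.9"}
	newer := epochVersion{epoch: 2, upstream: "1.0"}
	// with epochs: 1:9.9 < 2:1.0, even though "9.9" > "1.0" upstream
	fmt.Println(less(older, newer)) // prints "true"
}
```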

(regarding WFNize) You're right that wfn.WFNize doesn't quite do the trick, and the encode/decode path remaining stable is an important item for us. Unless I'm missing something I think the lib is missing something akin to "wfn.UnWFNize" for escaped values... something that isn't easily bolted on in syft (but possible I think).

(regarding nvdtools replacement?) You made an observation in the community slack that I wanted to highlight:

I think it might be best to use a diff. go cpe library given all the bugs that I pointed out

I think this is definitely an option that should be on the table. If we dive down this path I think we need to consider how that will impact grype matching by CPE (which is I think one of the reasons this lib was chosen to begin with).

@sambhav
Contributor Author

sambhav commented Dec 21, 2021

https://github.com/knqyf263/go-cpe seems to be the only other cpe parsing lib that I can find. From a cursory look, it seems to handle quoting and parsing better than the current library but it seems mostly unmaintained at this point.

@wagoodman
Contributor

wagoodman commented Dec 21, 2021

I've found a few others, but the one you found is at the top of my list of ones to try as well 👍 . The cases that a CPE library needs to cover (at least) are:

  1. valid decode of a CPE formatted string
  2. be able to encode a CPE object into a formatted string
  3. be able to compare two CPEs (objects or strings)

(there is a chance I missed a case here, but these are the big ones)

The third case comes up in Grype, specifically here: https://github.com/anchore/grype/blob/4f964c4ee26ad01a80b8bcffb6bf23c0afb71d09/grype/vulnerability/store_adapter.go#L103

This function wraps the nvdtools MatchWithoutVersion method: https://github.com/facebookincubator/nvdtools/blob/952d0696a19961ffb4bc18c415df62ea4d00d2fa/wfn/matcher.go#L39-L50
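If a replacement library lacked an equivalent, a version-ignoring comparison could be rebuilt from parts; a rough sketch (the struct and field names are illustrative, not nvdtools' API):

```go
package main

import "fmt"

// minimal CPE attribute set for illustration
type attrs struct {
	Part, Vendor, Product, Version string
}

// matchWithoutVersion reports whether two CPEs refer to the same
// part/vendor/product, ignoring version; the wildcard "*" matches anything.
func matchWithoutVersion(a, b attrs) bool {
	eq := func(x, y string) bool { return x == y || x == "*" || y == "*" }
	return eq(a.Part, b.Part) && eq(a.Vendor, b.Vendor) && eq(a.Product, b.Product)
}

func main() {
	a := attrs{"a", "busybox", "busybox", "1.35.0"}
	b := attrs{"a", "busybox", "busybox", "*"}
	fmt.Println(matchWithoutVersion(a, b)) // prints "true"
}
```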

If the new CPE lib doesn't have the exact comparison method we're interested in, that may be OK, as long as we can craft it from the components the lib provides (ideally easily 😄 ).

Happy to help on this one -- odds are this is a pretty big undertaking (could be wrong though!). Feel free to reach out to me on slack!

@sambhav sambhav force-pushed the sanitize-cpe-version branch 3 times, most recently from 47f6ac3 to 21f6611 Compare December 21, 2021 23:58
@sambhav
Contributor Author

sambhav commented Dec 21, 2021

From slack https://anchorecommunity.slack.com/archives/C4PJFNEEM/p1640131075279100?thread_ts=1640100243.273100&cid=C4PJFNEEM -

@Alex Goodman I think I managed to solve the issue - lmk if my reasoning makes sense -

Instead of sanitizing at CPE generation time, I now do so at decode. This ensures that an encode<>decode cycle produces the same CPE reproducibly, and we do not rely on CPE generation to sanitize the output. WFNized values should now be correctly un-WFNized by the StripSlashes function that is called when NewCPE is used, and similarly all format decoders now use the new CPEString function, which correctly escapes all unwanted chars.
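The round-trip property being described is: for any attribute, decode(encode(x)) == x. A toy sketch with a minimal escape/unescape pair (not syft's actual CPEString/StripSlashes implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// characters treated as special for this toy example
const special = `\:*?`

// escape backslash-quotes each special character.
func escape(s string) string {
	var b strings.Builder
	for _, r := range s {
		if strings.ContainsRune(special, r) {
			b.WriteByte('\\')
		}
		b.WriteRune(r)
	}
	return b.String()
}

// unescape reverses escape by dropping quoting backslashes. Applying it
// on decode (not just at generation time) is what keeps the cycle stable.
func unescape(s string) string {
	var b strings.Builder
	escaped := false
	for _, r := range s {
		if !escaped && r == '\\' {
			escaped = true
			continue
		}
		escaped = false
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	in := "prefix:1.2"
	fmt.Println(escape(in))           // prints "prefix\:1.2"
	fmt.Println(unescape(escape(in))) // prints "prefix:1.2"
}
```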

@sambhav
Contributor Author

sambhav commented Dec 22, 2021

I also noticed that the regex check I introduced in NewCPE was causing a lot of unquoted CPEs to be filtered out before my last change, which is why the encode/decode cycle test was passing - for all problematic CPEs, it was simply filtering them out instead of properly encoding/decoding them. My latest changes should ensure that all CPEs are preserved as before, while still being valid now that they are quoted properly.

The regex check however is still necessary as it ensures that any possible CPEs that are invalid even after the quoting logic don't make it through to the end.

@sambhav sambhav force-pushed the sanitize-cpe-version branch from 21f6611 to 8b64bf9 Compare December 22, 2021 15:49
@sambhav sambhav force-pushed the sanitize-cpe-version branch from af4a1cc to 82015bf Compare December 22, 2021 18:33
@sambhav sambhav changed the title Fix CPE generation when version contains epoch Fix CPE encode/decode when it contains special chars Dec 22, 2021
@sambhav
Contributor Author

sambhav commented Dec 22, 2021

@wagoodman this should be ready for review now - I had to update the decode function (normalize) to match the encode function (sanitize), since StripSlashes was also not doing what it is supposed to do.

@sambhav sambhav requested a review from wagoodman December 23, 2021 22:49
Contributor

@spiffcs spiffcs left a comment

Small nits for discussion and thanks so much for the PR @samj1912!

@spiffcs
Contributor

spiffcs commented Jan 3, 2022

Core functionality of the PR looks like a great update to help with the issues in the nvdtools library. Test cases look good and round trip has been implemented correctly (sanitize <-> original). Small nits on the string building (buffer vs in-library builder). Excellent addition 🥳 .
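The string-building nit refers to preferring `strings.Builder` over `bytes.Buffer` when only a string result is needed; e.g. (illustrative, not syft's actual encoder):

```go
package main

import (
	"fmt"
	"strings"
)

// buildBinding joins CPE attributes with ":" using strings.Builder,
// which avoids the extra []byte->string copy that bytes.Buffer.String()
// incurs for this write-once use case.
func buildBinding(fields []string) string {
	var b strings.Builder
	b.WriteString("cpe:2.3")
	for _, f := range fields {
		b.WriteByte(':')
		b.WriteString(f)
	}
	return b.String()
}

func main() {
	fmt.Println(buildBinding([]string{"a", "vendor", "product", "1.0"}))
	// prints "cpe:2.3:a:vendor:product:1.0"
}
```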

CPEUrl string `json:"cpe-url"`
WFN CPE `json:"wfn"`
}{}
out, err := ioutil.ReadFile("test-fixtures/cpe-data.json")
Contributor

this test fixture file has a lot of cases! Is there a way to reduce this to a minimum (or at least smaller) set of cases that still reflect all of the features tested?

Contributor

@wagoodman wagoodman left a comment

Really great work @samj1912 ! I only have #714 (comment) as a comment (as a side note, I don't ever remember leaving a comment on a review that says "this may be too many tests" ... it's a nice problem to have!). If you think that the set of tests can't easily be reduced, we can get this in as is and maybe work on reducing it later (having the tests is better than not).

const cpeRegexString = ((`^([c][pP][eE]:/[AHOaho]?(:[A-Za-z0-9\._\-~%]*){0,6})`) +
// Or match the CPE binding string
// Note that we had to replace '`' with '\x60' to escape the backticks
`|(cpe:2\.3:[aho\*\-](:(((\?*|\*?)([a-zA-Z0-9\-\._]|(\\[\\\*\?!"#$%&'\(\)\+,/:;<=>@\[\]\^\x60\{\|}~]))+(\?*|\*?))|[\*\-])){5}(:(([a-zA-Z]{2,3}(-([a-zA-Z]{2}|[0-9]{3}))?)|[\*\-]))(:(((\?*|\*?)([a-zA-Z0-9\-\._]|(\\[\\\*\?!"#$%&'\(\)\+,/:;<=>@\[\]\^\x60\{\|}~]))+(\?*|\*?))|[\*\-])){4})$`)
Contributor

this is quite a scary regex! Is this check 100% necessary? or can it be left off during CPE construction? (I presume the very-large test fixture of CPEs covers some of these regex paths in testing)

Contributor Author

@wagoodman I think it is necessary, since we use NewCPE to validate and filter out invalid CPEs. This ensures that syft never puts/parses an invalid CPE in an SBOM. I have seen issues where syft outputs CPEs which are not accepted/matched by https://nvd.nist.gov/products/cpe/search
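Used as a gate in NewCPE, the regex acts roughly like this (the pattern below is a deliberately simplified stand-in for illustration; the real cpeRegexString above is far stricter about which characters and quoting forms are allowed):

```go
package main

import (
	"fmt"
	"regexp"
)

// simplifiedCPE: a cpe:2.3 prefix, a part of a/h/o/*/-, and exactly
// ten further colon-separated components. Much looser than the real
// spec-derived regex, but shows the gating idea.
var simplifiedCPE = regexp.MustCompile(`^cpe:2\.3:[aho*\-](:[^:]+){10}$`)

// validate reports whether s looks like a CPE 2.3 binding string;
// anything that fails is filtered out rather than emitted into the SBOM.
func validate(s string) bool {
	return simplifiedCPE.MatchString(s)
}

func main() {
	fmt.Println(validate("cpe:2.3:a:vendor:product:1.0:*:*:*:*:*:*:*")) // prints "true"
	fmt.Println(validate("not-a-cpe"))                                  // prints "false"
}
```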

Contributor

Though this is a large test fixture you're right in the sense that this is making things better. We can always cull back in a future PR or remove if we replace the CPE library we're using in the future.

@sambhav sambhav force-pushed the sanitize-cpe-version branch from 3a544ec to 666fd69 Compare January 5, 2022 21:53
@sambhav sambhav requested review from spiffcs and wagoodman January 6, 2022 09:08
Contributor

@wagoodman wagoodman left a comment

Solid work, thanks for the contribution! 💯

@wagoodman wagoodman merged commit 2a7325a into anchore:main Jan 6, 2022
@sambhav sambhav deleted the sanitize-cpe-version branch January 6, 2022 16:08
spiffcs added a commit that referenced this pull request Jan 19, 2022
…hub.com/hectorj2f/syft into hectorj2f/add_dependencies_to_cyclonedx

* 'hectorj2f/add_dependencies_to_cyclonedx' of https://github.com/hectorj2f/syft: (29 commits)
  Improve CycloneDX format output (#710)
  Add additional PHP metadata (#753)
  Update Syft formats for SyftJson (#752)
  Add support for "file" source type in syftjson unmarshaling (#750)
  remove contains file from spdx dependency generation
  support .sar for java ecosystem (#748)
  Start developer documentation (#746)
  Align SPDX export more with SPDX 2.2 specification (#743)
  Replace distro type (#742)
  update goreleaser with windows checksums (#740)
  bump stereoscope version to remove old containerd (#741)
  Add support for multiple output files in different formats (#732)
  Add support for searching for jars within archives (#734)
  683 windows filepath (#735)
  Fix CPE encode/decode when it contains special chars (#714)
  support .par for java ecosystems (#727)
  Add arm64 support to install script (#729)
  Revert "bump goreleaser to v1.2 (#720)" (#731)
  Add a version flag (#722)
  Add lpkg as java package format (#694)
  ...
fengshunli pushed a commit to fengshunli/syft that referenced this pull request Jan 24, 2022
* Fix CPE generation when the generated CPE contains invalid characters
* Remove WFNize for input strings
* Quote the string on decode to ensure consistent CPE string generation
* Add test cases for round-tripping the CPE and fix strip slashes
* Add comprehensive tests for cpe parsing
* Use strings.Builder instead of byte buffer
spiffcs pushed a commit that referenced this pull request Jan 24, 2022
spiffcs pushed a commit that referenced this pull request Jan 25, 2022
jonasagx pushed a commit to jonasagx/syft that referenced this pull request Jan 28, 2022
GijsCalis pushed a commit to GijsCalis/syft that referenced this pull request Feb 19, 2024

Development

Successfully merging this pull request may close these issues.

Syft can generate invalid CPEs for debs and rpms with epochs
Failed to parse CPE - unbind formatted string