Prioritize hashes and download URL for PurlDB mapping #430
To map a DejaCode package accurately to PurlDB entries, the patched query prioritizes hashes. This is needed when the same PURL (without query parameters) has multiple download URLs, as with Python packages and binaries built for different hardware architectures or interpreter versions. Lookups by SHA-256 and MD5 are added as well, since SHA-1 may not always be populated. Hashes from SBOM imports generated by tools such as cdxgen commonly no longer use SHA-1, which is largely deprecated because of the risk of hash collisions. SHA-512 could not be added yet because PurlDB does not support lookups by it.

The lookups are ordered by accuracy: a hash identifies the exact content of the package; the download URL at least points to the download location, which still distinguishes between target architectures; and the PURL itself is the fallback when no more accurate match is found.

The results are then filtered by checking that the PURLs match. This check now also strips the query parameters from the PurlDB PURL, since PurlDB PURLs may contain them too, which previously caused matches to be missed. For reference see the following issues:
Checks for existing packages are also extended to compare against the other hash types.
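
The lookup order described above (hashes, then download URL, then PURL, followed by a PURL comparison that ignores query parameters) can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: `strip_qualifiers`, `find_purldb_entries`, and the `lookup` callable are hypothetical names, entries are modelled as plain dicts, and the order among the individual hash types is an assumption.

```python
from typing import Callable

# Hypothetical lookup signature: (field name, value) -> list of PurlDB entries as dicts.
Lookup = Callable[[str, str], list]


def strip_qualifiers(purl: str) -> str:
    """Drop the '?key=value' qualifiers from a PURL string, keeping any '#subpath'."""
    base, sep, subpath = purl.partition("#")
    base = base.split("?", 1)[0]
    return f"{base}{sep}{subpath}"


def find_purldb_entries(package: dict, lookup: Lookup) -> list:
    """Try the most specific fields first and return the first non-empty match."""
    plain_purl = strip_qualifiers(package["purl"])

    def same_purl(entries: list) -> list:
        # Compare PURLs with qualifiers stripped on both sides.
        return [e for e in entries if strip_qualifiers(e["purl"]) == plain_purl]

    # Hashes first, then the download URL. The order among hash types is an assumption.
    for field in ("sha256", "sha1", "md5", "download_url"):
        value = package.get(field)
        if not value:
            continue
        results = same_purl(lookup(field, value))
        if results:
            return results

    # Fallback: the PURL alone, which may match several builds of the same version.
    return same_purl(lookup("purl", plain_purl))
```

With a lookup like this, a package that carries only a SHA-256 value can still be mapped to a single PurlDB entry, while a package without hashes or a download URL falls back to every entry that shares its PURL.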