Conversation

@wagoodman
Contributor

@wagoodman wagoodman commented Oct 29, 2025

When SYFT_EXP_CAPABILITIES=true is set, this adds an internal syft cataloger caps command that describes cataloger capabilities, such as:

  • what catalogers exist
  • what globs / evidence are searched for
  • whether a cataloger finds licenses
  • whether a cataloger detects particular package manager claims (listing of files, digests, package integrity hash)
  • what dependencies (if any) can be detected (depth of nodes, topology of the edges, and kinds of dependencies included)
  • what API and app-level configurations exist for each cataloger

This is available as an ASCII table or as JSON output.

Caution

This is an experimental feature and can change without warning and could be removed entirely. Do not depend on this command in production.

The way the capabilities are tracked is described in depth in the internal tooling's readme.

A quick summary: we use the source code and test observations as the basis for what catalogers exist, how they are configured, and what they output. These are used to cross-validate a partially generated packages/*.yaml (some items are auto-generated, some are filled in manually) against a set of completion tests, which ensure that the full universe of things is defined and self-consistent. The validated YAML then drives the cataloger caps command.
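The completion-test idea can be sketched as a set comparison between what is discovered in the source and what is documented in the YAML. This is only an illustration under assumed inputs (plain lists of cataloger names); the function name and the real checks in the internal tooling are not taken from this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// completionGaps is a HYPOTHETICAL helper. It compares cataloger names
// discovered from source code with names documented in the YAML and reports:
//   - missing: discovered in source but absent from the YAML
//   - stale:   present in the YAML but no longer found in source
func completionGaps(discovered, documented []string) (missing, stale []string) {
	disc := make(map[string]bool, len(discovered))
	for _, n := range discovered {
		disc[n] = true
	}
	doc := make(map[string]bool, len(documented))
	for _, n := range documented {
		doc[n] = true
	}
	for n := range disc {
		if !doc[n] {
			missing = append(missing, n)
		}
	}
	for n := range doc {
		if !disc[n] {
			stale = append(stale, n)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(missing)
	sort.Strings(stale)
	return missing, stale
}

func main() {
	missing, stale := completionGaps(
		[]string{"alpha-cataloger", "beta-cataloger"}, // illustrative names
		[]string{"beta-cataloger", "gamma-cataloger"},
	)
	fmt.Println("missing from YAML:", missing)
	fmt.Println("stale in YAML:", stale)
}
```

A test built on this would fail whenever either list is non-empty, which is what keeps the YAML and the code from drifting apart.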

Fixes #4155

@wagoodman wagoodman added the changelog-ignore Don't include this issue in the release changelog label Oct 29, 2025
wagoodman and others added 3 commits October 29, 2025 15:18
Signed-off-by: Alex Goodman <[email protected]>
Signed-off-by: Alex Goodman <[email protected]>
Signed-off-by: Keith Zantow <[email protected]>
  default: true
- name: dependency.depth
  default:
    - direct
Contributor

I don't see relationships being returned from

return []pkg.Package{p}, nil, nil
so I'm not sure this is correct.

Contributor Author

You're right. I think the fact that the parser tracked dependencies while parsing threw me. Technically, for the dependency depth description we only need to look at the list of packages; there don't necessarily need to be relationships. So the fact that this returns a single package is the clue.

@github-actions github-actions bot added the json-schema Changes the json schema label Nov 14, 2025
@wagoodman wagoodman force-pushed the ast-parse-cataloger-capabilities branch from f41cb60 to 725b0df Compare November 14, 2025 22:35
@github-actions github-actions bot removed the json-schema Changes the json schema label Nov 14, 2025
@anchore anchore deleted a comment from github-actions bot Nov 15, 2025
wagoodman and others added 4 commits November 14, 2025 23:45
Signed-off-by: Alex Goodman <[email protected]>
Signed-off-by: Alex Goodman <[email protected]>
Signed-off-by: Alex Goodman <[email protected]>
@wagoodman wagoodman changed the title Add internal cataloger capability descriptions Add experimental cataloger capabilities command Dec 4, 2025
@wagoodman wagoodman marked this pull request as ready for review December 4, 2025 15:08
Contributor

@kzantow kzantow left a comment

I don't want to hold this up and overall I think this is fine, especially since there are no changes to the external API, so that's great. In terms of the actual configuration files, I think they are ok -- I'd like them to be split up more (one per parser or cataloger entry), and when a new one is generated it would be nice to have empty defaults for all the fields that need to be manually filled out, but this can always be refined later. Two main thoughts:

I still think we should put the capabilities files in the same directory as the cataloger. It would probably take the slight weirdness of a top-level embed (e.g. //go:embed syft/pkg/cataloger/*/*.capability.yaml or similar) that calls back to set the results on the internal package, but it could be done without any external API. I'm pretty sure that having these in separate locations is a large part of what makes editing them confusing to me. If they were in the same directory as the catalogers, there would be a significantly higher chance that people update them properly when making changes that affect these files, especially changes that aren't detected by the analysis.

The other holdup I have is the level of integration while there is still a dichotomy between detected capability configuration and required manual configuration. This adds a test hook to all PRs for what ends up being a moderately small amount of configuration. In the example cataloger, this is the amount of configuration that is generated, less than half of what I think is needed to be complete. (And in this particular case, it's wrong because of an oddity that required writing tests differently than expected, but that's probably a bug that should be fixed.) Integrating this into the PR process gives us the maintenance burden of all the generator code: almost 10,000 fairly complicated (AST parsing, etc.) LoC to generate on the order of 2,500 LoC. That gives me pause from a maintenance perspective. I wouldn't want to get rid of this, but I wonder: if it were a hint rather than an error, would that reduce the required maintenance for something that is likely to change significantly with 2.0? For example: a user adds a new cataloger without any capabilities, the generator detects what it can and gives the user a new file to edit, and it doesn't really matter if it's wrong because the user just manually edits it afterwards. I acknowledge the desire to prevent drift, though. I foresee this causing some friction for reasons I can't put my finger on, though probably not a large amount. More likely, some new cataloger needs to do something unique and generation doesn't work quite right. But because it's all internal, we can always change it later.

All that said, this is so very useful for many people, and we can iterate on the details later.

@wagoodman wagoodman self-assigned this Dec 16, 2025
@wagoodman wagoodman added this to OSS Dec 16, 2025
@wagoodman wagoodman moved this to In Progress in OSS Dec 16, 2025
@wagoodman wagoodman moved this from In Progress to In Review in OSS Dec 16, 2025
@wagoodman wagoodman removed this from OSS Dec 16, 2025
Signed-off-by: Alex Goodman <[email protected]>
Signed-off-by: Alex Goodman <[email protected]>
Signed-off-by: Alex Goodman <[email protected]>
Signed-off-by: Alex Goodman <[email protected]>
@wagoodman
Contributor Author

I took another shot at moving the cataloger.yaml files. I couldn't easily get it working earlier, but did find a way in the end. It did require introducing an init function to register the capabilities into an internal package, which is not ideal, but it does not introduce a breaking change and does not affect the public API.
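The init-based registration pattern described here might look roughly like the following. This is a single-file sketch with invented names; in the actual change the registry lives in an internal package and each cataloger package would call into it, details that are not shown in this conversation.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// registry stands in for state held by an internal package. HYPOTHETICAL.
var (
	mu       sync.Mutex
	registry = map[string][]byte{}
)

// registerCapabilities records a capabilities document under a name.
// It panics on duplicates, as is common for init-time registries.
func registerCapabilities(name string, doc []byte) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := registry[name]; dup {
		panic("duplicate capabilities registration: " + name)
	}
	registry[name] = doc
}

// registeredNames returns the registered names in sorted order.
func registeredNames() []string {
	mu.Lock()
	defer mu.Unlock()
	names := make([]string, 0, len(registry))
	for n := range registry {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// init runs after package-level variables are initialized, so registry is
// ready here. In a multi-package layout, each cataloger package would have
// its own init that registers its embedded YAML.
func init() {
	registerCapabilities("example-cataloger", []byte("name: example-cataloger\n"))
}

func main() {
	fmt.Println(registeredNames())
}
```

The usual caveat with this pattern is that registration only happens for packages that are actually imported, which is one reason init-based wiring is often considered less than ideal.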

@wagoodman wagoodman enabled auto-merge (squash) December 19, 2025 14:48

Labels

changelog-ignore Don't include this issue in the release changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Command output to give more information on what catalogers look for and what they can find

4 participants