feat: add support for cataloging GGUF models

## Summary

Add a new ai-artifact cataloger to Syft that detects and parses GGUF files (`.gguf`). Extraction is header-only: it reads metadata quickly, without loading the full weights or downloading the whole file. Results are emitted in:

- Syft JSON (native) using a new metadata type for GGUF
- CycloneDX 1.6 (ML-BOM) as machine-learning-model components with basic properties.

### Goals / Scope

Detect `.gguf` files from supported sources. This issue covers the local filesystem and container filesystems. A second issue will cover OCI media types and a new Syft source that parses the Docker layer API for efficient cataloging.

## Notes
- Parse only the GGUF header (magic, version, KV count, KV table) to capture identity & key facts.
- Create a new package type `model` and a new metadata type `gguf-file-metadata`.

Emit Syft JSON package(s) with:
- type: "model"
- metadataType: "gguf-file-metadata"
- metadata: minimal but stable fields (see below).

Emit CycloneDX 1.6 with:
- type: "machine-learning-model"
- minimal modelCard.modelParameters and properties mapping (see below).

Make zero network calls for local/container sources.


We're also looking for a stable global identifier across remotes. This will be obtained by taking a hash of the metadata extracted from the model.
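
One way such an identifier could be derived is sketched below: hash the header KVs in sorted key order so the same metadata always yields the same digest, regardless of where the file was fetched from. The choice of SHA-256, the length-prefixed encoding, and the `metadataDigest` name are all assumptions for illustration; the issue does not specify them.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// metadataDigest (hypothetical helper) returns a hex SHA-256 digest of
// the given header KVs. Keys are sorted so Go's random map iteration
// order cannot change the result.
func metadataDigest(kv map[string]string) string {
	keys := make([]string, 0, len(kv))
	for k := range kv {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		// Length prefixes keep "a=b" + "c" distinct from "a" + "b=c".
		fmt.Fprintf(h, "%d:%s=%d:%s\n", len(k), k, len(kv[k]), kv[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	header := map[string]string{
		"general.architecture": "qwen3moe",
		"general.name":         "Qwen3-Coder-30B-A3B-Instruct",
		"general.license":      "apache-2.0",
	}
	fmt.Println(metadataDigest(header))
}
```

Which keys feed the hash matters: including fields that vary between otherwise identical uploads would make the identifier unstable, so a real implementation would need to pick the set deliberately.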

## Examples
### Syft JSON example (native):
```json
{
  "name": "Qwen3-Coder-30B-A3B-Instruct-IQ4_NL.gguf",
  "type": "model",
  "foundBy": "ai-artifact-cataloger",
  "locations": [{"path": "/models/Qwen3-Coder-30B-A3B-Instruct-IQ4_NL.gguf"}],
  "licenses": [],
  "purl": "",
  "metadataType": "gguf-file-metadata",
  "metadata": {
    "ModelFormat": "gguf",
    "ModelName": "Qwen3-Coder-30B-A3B-Instruct",
    "ModelVersion": "unknown",
    "FileSize": 0,                       // best-effort if available from resolver
    "Hash": "",                          // leave blank unless already computed upstream
    "License": "apache-2.0",
    "GGUFVersion": 3,
    "Architecture": "qwen3moe",
    "Quantization": "IQ4_NL",
    "Parameters": 0,                     // if present in header
    "TensorCount": 579,                  // derived from header tensor entries
    "Header": {                          // raw KVs (namespaced)
      "general.architecture": "qwen3moe",
      "general.name": "Qwen3-Coder-30B-A3B-Instruct",
      "general.license": "apache-2.0",
      "general.quantized_by": "Unsloth"
    },
    "TruncatedHeader": false
  }
}
```

### CycloneDX 1.6 (ML-BOM) mapping:
component:
- type = "machine-learning-model"
- name = general.name || filename
- version = header field if available (else "unknown")
- modelCard.modelParameters (best-effort):
  - architectureFamily from general.architecture (map common values: llama/qwen/gemma → "transformer" family)
  - modelArchitecture freeform (e.g., "decoder-only", if inferable; else omit)

Note: Keep CycloneDX output minimal & typed; avoid dumping the entire KV bag to properties.
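
The name fallback and architecture-family mapping above could look roughly like the following. The helper names are hypothetical, and prefix matching (so that `qwen3moe` maps like `qwen`) is an assumption about how "common values" would be recognized.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// architectureFamily (hypothetical helper) maps a general.architecture
// value to a CycloneDX architectureFamily. It returns "" when the value
// is not recognized, so the caller can omit the field.
func architectureFamily(arch string) string {
	a := strings.ToLower(arch)
	for _, prefix := range []string{"llama", "qwen", "gemma"} {
		if strings.HasPrefix(a, prefix) {
			return "transformer"
		}
	}
	return ""
}

// componentName (hypothetical helper) implements
// "name = general.name || filename".
func componentName(header map[string]string, location string) string {
	if name := header["general.name"]; name != "" {
		return name
	}
	return path.Base(location)
}

func main() {
	header := map[string]string{"general.architecture": "qwen3moe"}
	loc := "/models/Qwen3-Coder-30B-A3B-Instruct-IQ4_NL.gguf"
	fmt.Println(componentName(header, loc))
	fmt.Println(architectureFamily(header["general.architecture"]))
}
```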

### CLI UX

Works out of the box for local files or a Hugging Face URL:
- `syft dir:./path/to/models -o json`
- `go run cmd/syft/main.go -o json https://huggingface.co/janhq/Jan-v1-4B-GGUF/blob/main/Jan-v1-4B-Q4_K_M.gguf`

Add `--select-catalogers=ai-artifact` to limit runs if needed (optional).

### Follow-ups
- OCI Artifact (local | remote)
- PURL strategy (e.g., pkg:huggingface/...) once we add remote/registry context.
- Safetensors & ONNX parsers.
