4 releases (breaking)
Uses new Rust 2024
| Version | Released |
|---|---|
| 0.5.0 (new) | May 18, 2026 |
| 0.4.0 | May 13, 2026 |
| 0.3.0 | May 12, 2026 |
| 0.2.0 | May 6, 2026 |
#1667 in Parser implementations
Used in 2 crates
3.5MB, 8K SLoC
openvet-lockfile
Lockfile parsing into OpenVet audit Subjects.
Modern dependency managers split their dependency resolution into two files: a manifest, which contains the project's direct dependencies and version requirements, and a lockfile. Once resolution is complete, the resolved versions for every dependency, direct and transitive, along with their hashes, are written into a lockfile. This makes subsequent installs reproducible.
This crate parses the lockfiles emitted by various dependency managers and extracts parsed subjects from them. A subject uniquely identifies a resolved dependency via the tuple (registry, package, version, variant, hash), where variant is optional and carries an ecosystem-specific per-artefact discriminator: a PEP 425 wheel tag for Python, a platform suffix for RubyGems, etc. Because OpenVet anchors audits against the bytes of a specific published artefact, this crate only supports lockfile formats that carry an integrity hash for each entry. For ecosystems where hashing is opt-in (notably RubyGems), the project must be re-locked with the appropriate flag before its lockfile can be ingested.
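As a rough illustration of the subject tuple described above, the sketch below models it as a plain Rust struct. The type name, field names, `key` helper, and the `algorithm:hex` hash encoding are all assumptions made for the example; they are not taken from the crate's actual API.

```rust
/// Hypothetical model of the (registry, package, version, variant, hash)
/// tuple. Names and encodings here are illustrative assumptions only.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Subject {
    pub registry: String,
    pub package: String,
    pub version: String,
    /// Optional per-artefact discriminator, e.g. a PEP 425 wheel tag.
    pub variant: Option<String>,
    /// Integrity hash, written here as "<algorithm>:<hex digest>" (assumed).
    pub hash: String,
}

impl Subject {
    /// Renders the tuple as one display string, using "-" when the
    /// variant slot is empty.
    pub fn key(&self) -> String {
        format!(
            "{}/{}@{} [{}] {}",
            self.registry,
            self.package,
            self.version,
            self.variant.as_deref().unwrap_or("-"),
            self.hash
        )
    }
}
```

Because the variant is part of the identity, two artefacts of the same release (say, a wheel and an sdist) compare as different subjects even when registry, package, and version match.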
Supported lockfiles
The supported dependency-manager lockfiles are:
- Cargo (`Cargo.lock`)
- npm (`package-lock.json`)
- pnpm (`pnpm-lock.yaml`)
- Yarn Classic (`yarn.lock` v1)
- Yarn Berry (`yarn.lock` v2+)
- Bun (`bun.lock`, text format)
- uv (`uv.lock`)
- PEP 751 (`pylock.toml`)
- Go modules (`go.sum`)
- RubyGems (`Gemfile.lock` with `CHECKSUMS`)
Where a lockfile format has gone through major shape revisions, every adapter accepts the modern revisions transparently and rejects the older ones with a clear "re-lock with a more recent tool" message. Variants that are not parsed include:
- `bun.lockb`: Bun's binary lockfile, used before bun 1.2. The format is officially undocumented, so we reject it and recommend re-locking on bun ≥ 1.2.
- `Gemfile.lock` without a `CHECKSUMS` section: Bundler's lockfile format did not include hashes for most of its history; the `CHECKSUMS` section landed in Bundler 2.5 (Dec 2023). Without it, nothing in the lockfile is auditable, so we reject and recommend `bundle lock --add-checksums`.
- `pnpm-lock.yaml` with `lockfileVersion` < 5: used by pnpm < 7. Pre-v5 lockfiles use a substantially different shape; rather than carry a second parser for a long-deprecated format, we reject and recommend re-locking with pnpm ≥ 7.
- `package-lock.json` with `lockfileVersion` 1: used by npm 5 and 6. The v1 shape only carries a nested `dependencies` tree without the flat `packages` map that modern npm produces; rejecting v1 keeps the parser focused on the npm-7+ shape.
In every rejection case the returned error includes the suggested re-lock command, so the user knows exactly what to run.
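One way to picture an error that carries its own remedy is sketched below. The `Rejected` enum, its variant names, and the two version-gate helpers are hypothetical and only mirror the four rejection cases listed above; the crate's real error type may look quite different. Only `bundle lock --add-checksums` is a command quoted from this README; the other remedies are plain descriptions.

```rust
use std::fmt;

/// Hypothetical rejection reasons, one per unsupported lockfile variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rejected {
    BunBinaryLockfile,
    GemfileWithoutChecksums,
    PnpmLockfileTooOld,
    NpmLockfileV1,
}

impl Rejected {
    /// The suggested way to produce a lockfile the parser accepts.
    pub fn remedy(self) -> &'static str {
        match self {
            Rejected::BunBinaryLockfile => "re-lock with bun >= 1.2",
            Rejected::GemfileWithoutChecksums => "run `bundle lock --add-checksums`",
            Rejected::PnpmLockfileTooOld => "re-lock with pnpm >= 7",
            Rejected::NpmLockfileV1 => "re-lock with npm >= 7",
        }
    }
}

impl fmt::Display for Rejected {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let what = match self {
            Rejected::BunBinaryLockfile => "bun.lockb (binary) is not supported",
            Rejected::GemfileWithoutChecksums => "Gemfile.lock has no CHECKSUMS section",
            Rejected::PnpmLockfileTooOld => "pnpm-lock.yaml lockfileVersion is below 5",
            Rejected::NpmLockfileV1 => "package-lock.json uses lockfileVersion 1",
        };
        // The remedy is always part of the message shown to the user.
        write!(f, "{}; {}", what, self.remedy())
    }
}

impl std::error::Error for Rejected {}

/// Accepts npm lockfileVersion 2 and later, rejects version 1.
pub fn check_npm_lockfile_version(version: u64) -> Result<(), Rejected> {
    if version < 2 {
        Err(Rejected::NpmLockfileV1)
    } else {
        Ok(())
    }
}

/// Accepts pnpm lockfiles whose major lockfileVersion is 5 or later.
pub fn check_pnpm_lockfile_major(major: u64) -> Result<(), Rejected> {
    if major < 5 {
        Err(Rejected::PnpmLockfileTooOld)
    } else {
        Ok(())
    }
}
```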
Subject construction
In the common case every dependency in a lockfile maps to a single
audit subject, but a few ecosystem-specific patterns make one
dependency expand to several subjects. The parser passes all of them
through unchanged rather than picking one. The policy layer in
`openvet check` decides, on a per-project basis, whether to require
audits for all of them, any of them, or just the canonical artefact.
- Multi-SRI integrity sets. npm, pnpm, yarn classic, and bun all permit space-separated SRI strings such as `"sha256-… sha512-…"`, where every entry must hash the same bytes. We emit one subject per algorithm so an auditor inspecting only one digest can still match.
- Per-platform Python wheels. Python publishes a separate wheel per (Python version × platform × ABI) tuple, each with its own hash, in addition to the source distribution. `uv.lock` and `pylock.toml` therefore emit one subject per wheel plus one for the sdist of every package.
- Per-artefact hash tables. PEP 751 explicitly allows multiple hash algorithms per artefact (e.g. `hashes = { sha256 = …, sha512 = … }`). Each entry becomes its own subject.
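The multi-SRI case can be sketched as a small splitting step. The function name `split_sri` is hypothetical, and the sketch deliberately ignores digest validation; it only shows how one integrity string fans out into one (algorithm, digest) pair per entry, each of which would then become its own subject.

```rust
/// Splits a whitespace-separated SRI string into (algorithm, base64 digest)
/// pairs, one per entry. Entries without a '-' separator are dropped here;
/// a real parser would more likely report them as malformed.
pub fn split_sri(integrity: &str) -> Vec<(&str, &str)> {
    integrity
        .split_whitespace()
        // The algorithm name never contains '-', so the first '-' is the
        // boundary between algorithm and digest.
        .filter_map(|entry| entry.split_once('-'))
        .collect()
}
```

For an input such as `"sha256-AAAA sha512-BBBB"` (placeholder digests), this yields two pairs, so a lockfile entry carrying both algorithms produces two subjects for the same bytes.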
In addition, a few ecosystems publish multiple binary artefacts under
one (package, version) — the same logical release shipped per
platform, per Python version, etc. These show up as separate subjects
distinguished by the variant slot:
- Python wheels (`uv.lock`, `pylock.toml`): the wheel filename's PEP 425 tag (`py3-none-any`, `cp39-abi3-manylinux_2_28_x86_64`) becomes the variant; the source distribution gets `sdist`.
- RubyGems platform suffixes (`Gemfile.lock`): Bundler encodes the platform as a version suffix (`1.17.1-aarch64-linux-gnu`); the parser splits on the first `-` and lands the platform in the variant. Pure-Ruby gems and pre-release versions (`1.16.0.rc1`; RubyGems pre-releases use `.`, not `-`) carry no variant.
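The two variant rules above can be approximated with a pair of small helpers. Both function names are hypothetical, and the wheel helper assumes the standard wheel filename layout in which the last three dash-separated components of the stem are the Python, ABI, and platform tags; it is a sketch, not the crate's parser.

```rust
/// Splits a Gemfile.lock version string on the first '-' into
/// (version, optional platform). Versions without '-' have no variant.
pub fn split_gem_version(raw: &str) -> (&str, Option<&str>) {
    match raw.split_once('-') {
        Some((version, platform)) => (version, Some(platform)),
        None => (raw, None),
    }
}

/// Extracts the PEP 425 tag (python-abi-platform) from a wheel filename.
/// Returns None when the name does not end in ".whl" or has too few parts.
pub fn wheel_variant(filename: &str) -> Option<String> {
    let stem = filename.strip_suffix(".whl")?;
    // Walk from the right: platform, ABI, Python tag, then the remaining
    // "name-version" prefix (which may itself contain dashes).
    let mut parts = stem.rsplitn(4, '-');
    let platform = parts.next()?;
    let abi = parts.next()?;
    let python = parts.next()?;
    parts.next()?; // require a name-version prefix before the tags
    Some(format!("{}-{}-{}", python, abi, platform))
}
```

Splitting the gem version on the first `-` is what keeps a pre-release such as `1.16.0.rc1` intact: it contains no `-`, so the whole string stays in the version slot.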
Skip reasons
Some lockfile entries don't survive the trip into an audit subject —
either because the data needed for a subject isn't there, or because
the entry refers to something that doesn't have a stable
registry-published artefact for an auditor to have reviewed. Rather
than dropping these silently, the parser surfaces them as skipped
entries with a reason, so that openvet check can warn about the
untracked deps instead of letting them through unnoticed.
- Path dependencies: workspace members and local-path deps in every ecosystem (npm `link:true`, yarn `workspace:`, pnpm `directory:`, uv `editable:`, RubyGems `PATH`/`GIT`, and so on). Skipped because they aren't published immutably, so there's nothing for an auditor to have reviewed.
- Git dependencies: packages pulled directly from a git repository. Currently always skipped, because we don't yet have a canonical answer for what the subject tuple of a git revision looks like. The long-term goal is to audit git revisions too.
- Unknown source: recognised URL scheme but no canonical registry coordinate. Most common case: a direct `https://` tarball where the URL is the only identity.
- No checksum: a registry-sourced dep whose lockfile entry has no integrity field. Tends to come from proxy registries that omit hashes from their index responses.
- Bad checksum: integrity present but malformed: bad base64 / hex, wrong length for the named algorithm, or an unknown algorithm prefix.
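The difference between the "no checksum" and "bad checksum" reasons can be sketched as a small classification step. The `SkipReason` enum, the `check_hex_checksum` helper, and the `algorithm:hex` field layout are assumptions for illustration; the sketch covers only hex digests for two algorithms and says nothing about how the crate actually validates integrity fields.

```rust
/// Hypothetical skip reasons matching the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SkipReason {
    PathDependency,
    GitDependency,
    UnknownSource,
    NoChecksum,
    BadChecksum,
}

/// Classifies an optional "<algorithm>:<hex digest>" integrity field.
/// A missing field is NoChecksum; anything malformed is BadChecksum.
pub fn check_hex_checksum(field: Option<&str>) -> Result<(), SkipReason> {
    let field = field.ok_or(SkipReason::NoChecksum)?;
    let (algorithm, digest) = field.split_once(':').ok_or(SkipReason::BadChecksum)?;
    // Expected hex length for each algorithm this sketch knows about.
    let expected_len = match algorithm {
        "sha256" => 64,
        "sha512" => 128,
        _ => return Err(SkipReason::BadChecksum), // unknown algorithm prefix
    };
    let well_formed =
        digest.len() == expected_len && digest.bytes().all(|b| b.is_ascii_hexdigit());
    if well_formed {
        Ok(())
    } else {
        Err(SkipReason::BadChecksum)
    }
}
```

Returning the reason as a value, rather than silently dropping the entry, is what lets a caller collect skipped entries and warn about them later.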
Dependencies: ~7–12MB, ~246K SLoC