An analysis of emerging standards for describing open source software in source repositories, with the goal of enabling a crawlable software index for procurement, supply chain security, and ecosystem transparency — particularly for the public sector.
Create a single format with a good chance of wide adoption from the open source community and the public sector that enables:
- Procurement discovery: help procurement people find software relevant to their goals (e.g., a CMS that is also a CRM)
- Vendor/contributor credit metadata: attach additional metadata (e.g., a Drupal-style credit system) that gives procurement people insight into which vendors are knowledgeable about the software
- Supply chain steering: enable federal authorities to steer money toward securing the supply chain and to identify gaps
- Usage declarations: enable procurement offices to declare which software they use (inspired by openCode.de badges)
- Security and compliance assessment: enable procurement offices to evaluate the security posture, dependency transparency, and licensing compliance of software before adoption (e.g., OpenSSF Scorecard, SBOM availability, REUSE compliance)
| Standard | Format | Status | Scope |
|---|---|---|---|
| publiccode.yml | YAML | Active, EU-adopted | Public sector software metadata |
| OSS Taxonomy | YAML vocabulary | Early stage | Faceted classification for OSS |
| CodeMeta | JSON-LD | Active, academic adoption | Research software citation/metadata |
| gov-codejson | JSON | Active, U.S.-specific | U.S. federal agency software inventory |
| contribute.json | JSON | Decommissioned (2024) | Contribution onboarding metadata |
What it is: A YAML metadata file placed in the repository root, originally developed by Italy's Digital Transformation Team. Defines software name, description, categories, software type, intended audience, maintenance status/contacts, legal/licensing info, dependencies, and localized descriptions.
- Spec: https://yml.publiccode.tools/schema.core.html
- GitHub: https://github.com/publiccodeyml/publiccode.yml
Strengths:
- Strongest real-world traction by far. Mandatory in Italy since 2020 (powering Developers Italia catalog), adopted by Germany's openCode.de, and since March 2025, a prerequisite for the EU Open Source Solutions Catalogue (640+ solutions across 30+ public sector domains).
- Purpose-built for procurement. Has fields for `intendedAudience` (countries, scope), `categories` (controlled vocabulary like "content-management", "crm"), `softwareType` (standalone/web, library, addon, etc.), `maintenance` (internal/contract/community/none with contractor details and expiry dates), and `usedBy` (list of public administrations using the software).
- Federated crawling architecture. National catalogs feed into the EU-wide catalog via public APIs, an approach proven at scale.
- Human-readable YAML. Low barrier for maintainers. Tooling exists: publiccode.yml editor, validators, crawlers.
- Country-specific extensions. The schema has a modular design where each country can define additional fields without breaking the core.
- Badge ecosystem proven. openCode.de already runs automated badges for maintenance, reuse (Level 1–3 based on confirmed adopters), security, and open source compliance — directly relevant to procurement trust signals.
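
To make the procurement-oriented fields more concrete, here is a minimal sketch of how a catalog might filter already-parsed publiccode.yml records. The field names are the ones listed above; the sample entries, the `find_software` helper, and the rule that an empty country list means "no restriction" are assumptions made for illustration, not part of the spec or of any existing crawler.

```python
# Hypothetical, already-parsed publiccode.yml records (plain dicts).
# Field names follow those listed above; the values are invented.
CATALOG = [
    {
        "name": "ExampleCMS",
        "categories": ["content-management", "crm"],
        "softwareType": "standalone/web",
        "intendedAudience": {"countries": ["de", "it"]},
        "maintenance": {"type": "contract"},
    },
    {
        "name": "ExampleWiki",
        "categories": ["content-management"],
        "softwareType": "standalone/web",
        "intendedAudience": {"countries": ["it"]},
        "maintenance": {"type": "none"},
    },
]


def find_software(catalog, required_categories, country=None):
    """Return names of maintained entries that carry every required category."""
    matches = []
    for entry in catalog:
        if not set(required_categories) <= set(entry.get("categories", [])):
            continue
        # Treat maintenance type "none" as unmaintained.
        if entry.get("maintenance", {}).get("type") == "none":
            continue
        countries = entry.get("intendedAudience", {}).get("countries", [])
        # Assumption: an empty country list means "no restriction".
        if country and countries and country not in countries:
            continue
        matches.append(entry["name"])
    return matches


print(find_software(CATALOG, ["content-management", "crm"], country="de"))
```

Because `categories` is a flat list, the "CMS that is also a CRM" query only works when maintainers happen to tag both values.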
Weaknesses:
- Public-sector-centric naming and framing. "publiccode" signals government software; open source projects outside government may not feel it applies to them. Fields like `usedBy` assume public administration context.
- No vendor/contributor credit system. No fields for tracking which companies contribute, their expertise level, or anything resembling Drupal's credit system. The `maintenance.contractors` field is rudimentary (name + expiry date).
- Limited taxonomy depth. The `categories` list is flat and coarse (e.g., "content-management"), with no faceted classification like domain + function + audience.
- No supply-chain / dependency-security fields. No SBOM reference, no OpenSSF scorecard integration, no vulnerability policy metadata beyond what badges add externally.
- Small governance community. The spec is maintained by the publiccodeyml GitHub organization. The separately-governed Foundation for Public Code (a Dutch foundation) maintains The Standard for Public Code, a broader framework that recommends publiccode.yml as a tool — but does not control the spec itself. The spec's maintainer community is small relative to its institutional adoption.
Chances of success: High for the public sector. It is already the de facto EU standard. The question is whether it can expand beyond government contexts.
What it is: A faceted classification system for categorizing open source projects across 6 dimensions: domain, role, technology, audience, layer, and function. Stored as YAML in a GitHub repo, designed to integrate with CodeMeta via namespaced keywords (e.g., "domain:web-development").
See also: The Dependency Layer in Digital Sovereignty
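
The namespaced-keyword convention described above (e.g., "domain:web-development") can be sketched as a small parser that separates facet keywords from ordinary ones. The `split_keywords` helper and the sample keyword list are hypothetical; only the six facet names come from the description above.

```python
from collections import defaultdict

# The six dimensions named above.
FACETS = {"domain", "role", "technology", "audience", "layer", "function"}


def split_keywords(keywords):
    """Separate 'facet:value' keywords from ordinary free-text keywords."""
    facets = defaultdict(list)
    plain = []
    for keyword in keywords:
        prefix, sep, value = keyword.partition(":")
        if sep and prefix in FACETS and value:
            facets[prefix].append(value)
        else:
            # Unknown prefixes are kept as plain keywords, not dropped.
            plain.append(keyword)
    return dict(facets), plain


facets, plain = split_keywords(
    ["domain:web-development", "role:library", "python", "c++:17"]
)
print(facets)
print(plain)
```

Keeping unrecognized prefixes as plain keywords means existing keyword lists are not broken by the convention, which matches the "pure additive" design goal.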
Strengths:
- Solves the taxonomy problem well. Multi-faceted classification is far more expressive than flat category lists. A project can be `domain:healthcare`, `function:authentication`, `layer:backend`, `role:library` simultaneously.
- Designed for funding/gap analysis. If "function:authentication" is widely depended on but has few maintained options, funders can identify this.
- Integrates with existing standards. Designed to work as a layer on top of CodeMeta without schema changes — pure additive.
- Backed by ecosyste.ms infrastructure. Andrew Nesbitt runs the largest open dataset of OSS metadata, tracking packages, repos, and dependencies across ecosystems.
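
The funding/gap analysis idea can be illustrated with a rough sketch: group projects by their function facet, sum how heavily each function is depended on, and flag functions with few maintained options. The sample data, the thresholds, and the `find_gaps` helper are all invented for illustration; a real analysis would draw on a dataset such as the one ecosyste.ms maintains.

```python
from collections import defaultdict

# Invented sample data: (project, function facet, maintained?, dependent count).
PROJECTS = [
    ("auth-lib-a", "authentication", True, 1200),
    ("auth-lib-b", "authentication", False, 800),
    ("pdf-kit", "pdf-rendering", True, 300),
    ("pdf-lite", "pdf-rendering", True, 150),
]


def find_gaps(projects, min_dependents=1000, max_maintained=1):
    """Flag functions with many dependents but few maintained projects."""
    dependents = defaultdict(int)
    maintained = defaultdict(int)
    for _name, function, is_maintained, dependent_count in projects:
        dependents[function] += dependent_count
        maintained[function] += int(is_maintained)
    return sorted(
        function
        for function in dependents
        if dependents[function] >= min_dependents
        and maintained[function] <= max_maintained
    )


print(find_gaps(PROJECTS))
```

With this sample data only "authentication" is flagged: it has 2,000 dependents in total but a single maintained project.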
Weaknesses:
- Not a metadata file standard. It's a classification vocabulary, not a repo-level manifest. Projects don't place an "oss-taxonomy.yml" in their repo — classification happens externally or via CodeMeta keywords.
- Early stage, single-person governance. Community-contributed via GitHub PRs but no institutional backing yet.
- No procurement, vendor, or maintenance metadata. It classifies what software does, not who maintains it, how it's supported, or who has expertise.
Chances of success: High as a vocabulary/taxonomy layer that other standards adopt. Low as a standalone standard. Best outcome: publiccode.yml and CodeMeta adopt its classification dimensions.
What it is: A JSON-LD metadata schema based on schema.org terms for describing software. A codemeta.json file in the repo root. Maintained by a research community consortium, affiliated with the SciCodes Consortium.
Strengths:
- Strong academic/research adoption. Used by Software Heritage, Zenodo, and FAIRCORE4EOSC. The primary standard for research software citation and discovery.
- Semantically rich. Built on JSON-LD and schema.org, enabling linked-data interoperability. Machine-readable in a way that integrates with the broader web of data.
- Crosswalks exist. Mappings between CodeMeta and other metadata formats (npm, PyPI, CRAN, Maven, etc.) enable automated extraction.
- Active effort to merge upstream. Work is ongoing to get CodeMeta terms adopted directly into schema.org, which would make it the web-native standard.
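
As an illustration of how a crosswalk enables automated extraction, the sketch below converts a parsed npm `package.json` into a minimal CodeMeta-style record. The mapping shown is a small illustrative subset chosen for this example, not a copy of the official crosswalk tables, and the `npm_to_codemeta` helper is hypothetical.

```python
import json

# Illustrative subset of a crosswalk: npm package.json key -> CodeMeta term.
# The authoritative mappings live in the CodeMeta crosswalk tables.
NPM_TO_CODEMETA = {
    "name": "name",
    "version": "version",
    "description": "description",
    "license": "license",
    "keywords": "keywords",
}


def npm_to_codemeta(package):
    """Build a minimal codemeta.json-style dict from package.json content."""
    record = {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
    }
    for npm_key, codemeta_term in NPM_TO_CODEMETA.items():
        if npm_key in package:
            record[codemeta_term] = package[npm_key]
    # package.json allows "repository" as a string or as an object with "url".
    repository = package.get("repository")
    if isinstance(repository, dict):
        repository = repository.get("url")
    if repository:
        record["codeRepository"] = repository
    return record


package = {
    "name": "example-pkg",
    "version": "1.2.0",
    "license": "MIT",
    "repository": {"type": "git", "url": "https://example.org/example-pkg.git"},
}
print(json.dumps(npm_to_codemeta(package), indent=2))
```

The sketch copies the license identifier as-is for brevity; a production converter would likely normalize it (for example to an SPDX license URL).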
Weaknesses:
- Academic focus, not procurement focus. Core use case is citation and reproducibility, not "help a procurement officer find a CMS with good vendor support."
- JSON-LD complexity. The format is powerful but intimidating for average maintainers compared to simple YAML. The `@context`, `@type` boilerplate adds friction.
- No maintenance/vendor/support metadata. No fields for contractor info, maintenance contracts, vendor expertise, or adopter declarations.
- No controlled category vocabulary. Software descriptions are free-text or use schema.org `applicationCategory`, which is not standardized for procurement domains.
- Low adoption outside academia. Most mainstream open source projects don't ship a `codemeta.json`.
Chances of success: High in academia, low for procurement. It's the right semantic foundation but lacks the procurement-specific fields. Best outcome: serve as the underlying linked-data vocabulary that publiccode.yml maps to.
What it is: A JSON schema for collecting metadata about U.S. federal agency software projects. Evolved from the code.gov code.json standard mandated by M-16-21 and the SHARE IT Act.
Strengths:
- U.S. federal mandate. Agencies are legally required to publish this metadata, creating a compliance-driven adoption floor.
- Active development. Schema v2.0.0 released September 2025, showing ongoing iteration.
- Government-specific fields. Designed for the compliance context of federal software inventory.
Weaknesses:
- U.S.-specific scope. Designed for U.S. federal compliance, not international use. No localization, no multi-country architecture.
- Agency-centric, not project-centric. Metadata describes what an agency has, not what a project is. It's an inventory format, not a discovery format.
- Tiny community. 5 stars, 2 forks — the team is "still building the core team and defining roles and responsibilities."
- No ecosystem for open source broadly. Not designed to be placed in open source repositories by their maintainers; it's for agency reporting.
- Overlaps with publiccode.yml but with less maturity, less tooling, and narrower geographic scope.
Chances of success: Low as a global standard. Will persist as a U.S. compliance artifact. The smart move would be for DSACMS to adopt publiccode.yml with a U.S. country extension, similar to how Italy and Germany did.
What it is: A JSON schema for describing how to contribute to an open source project — communication channels, bug tracking, deployment URLs, tech stack.
Strengths:
- None that are currently relevant.
Weaknesses:
- Decommissioned and archived (March 2024). The website (contributejson.org) is gone. The schema was never stabilized.
- Narrow scope. Only covered contribution onboarding metadata, not software classification, procurement, or supply chain.
- Zero adoption beyond Mozilla's own projects.
Chances of success: Zero. This is a dead project. Mentioned here only as a cautionary tale about standards that don't find a community.
| Standard | Focus | Relevance |
|---|---|---|
| CITATION.cff | Software citation (YAML) | GitHub-native support. Covers authorship/citation only. Complementary, not competing. |
| REUSE (FSFE) | Per-file licensing compliance using SPDX | Adopted by KDE, SAP, Nextcloud, DLR. Already integrated as a badge on openCode.de. Complementary. |
| OpenSSF Scorecard | Automated security health scoring | Scans 1M+ projects weekly. Security metadata that procurement officers need. Complementary. |
| CycloneDX / SPDX | SBOM (Software Bill of Materials) | Supply chain transparency, mandated by U.S. executive order. Answers "what are the dependencies" not "what is this software for." Complementary. |
| schema.org SoftwareSourceCode | Web-native linked data vocabulary | The semantic backbone that CodeMeta builds on. Not a standalone file format. |
| Canada Open Resource Exchange | Canadian government open source catalog | Uses its own schema, but has discussed adopting publiccode.yml. |
| Drupal Credit System | Vendor contribution tracking and marketplace ranking | The most mature model for linking contributions to vendor credibility. Not a file format but a system design to learn from. |
A growing number of initiatives assess digital sovereignty at the organization, provider, or country level. These operate above the project metadata layer — they are consumers of the kind of data publiccode.yml provides, not competing standards. The EU Cloud Sovereignty Framework's SOV-5 (Supply Chain, 20% weight) and SOV-6 (Technology Sovereignty — requires open source components, open APIs, open protocols) directly need project-level metadata like SBOM references, license information, and open standard declarations.
| Initiative | Scope | What it assesses |
|---|---|---|
| EU Cloud Sovereignty Framework | Cloud providers (EU) | 8 sovereignty objectives (SOV-1–SOV-8), SEAL 0–4 rating. Basis for a €180M EU procurement tender. SOV-6 explicitly requires open source. |
| Munich Digital Sovereignty Score | Municipal IT services | Nutri-Score-style rating of vendor lock-in, jurisdiction risk, open standards. Applied to 194 services; integrated into procurement. |
| Bechtle Index für digitale Souveränität | Enterprise / public sector orgs | Data sovereignty, technological independence, design freedom. Software-based assessment launching Q1 2026. |
| Nextcloud Digital Sovereignty Index | Countries | Self-hosted tool deployments per 100k citizens across ~60 countries. Measures adoption of 50 collaboration tools. |
| SUSE CSF Assessment | Organizations | Free self-assessment tool scoring against the EU Cloud Sovereignty Framework. Produces SEAL rating and gap analysis. |
These frameworks strengthen the case for publiccode.yml extensions: sovereignty assessors need machine-readable project metadata (licenses, SBOMs, open standard compliance, dependency transparency) to automate their evaluations. The proposed supplyChain section and faceted classification directly feed this need.
- publiccode.yml has `categories` but it's flat. Adopting the OSS Taxonomy faceted approach (domain + function + audience + layer) would make this dramatically better.
- Missing: No marketplace-style search UX exists beyond openCode.de and the EU OSS Catalogue.
- Nobody covers this today. This is the biggest gap. Drupal's system works because it's centralized — the credit system tracks contributions per-issue and rolls up to marketplace rankings.
- What's needed: A way for projects to point to external credit registries that track vendor/contributor data. The ecosyste.ms funds leaderboard proposal and the Open Source Pledge integration discussion are working toward this from the funding side.
- Possible approach: Extend publiccode.yml with a `creditRegistries` section of project-endorsed pointers to external systems (Drupal-style credit APIs, ecosyste.ms, forge stats) where the dynamic credit data actually lives.
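
One way to picture the `creditRegistries` idea is the sketch below, which reads the pointers from a parsed record and keeps only those a crawler could safely follow. Everything here is hypothetical: the entry shape (`type`, `url`), the HTTPS-only rule, and the `endorsed_registries` helper are assumptions, since no such section exists in the published schema.

```python
from urllib.parse import urlparse

# Hypothetical shape for the proposed section; none of these keys are part of
# the published publiccode.yml schema.
record = {
    "name": "ExampleCMS",
    "creditRegistries": [
        {"type": "credit-api", "url": "https://credits.example.org/api/examplecms"},
        {"type": "forge-stats", "url": "ftp://stats.example.org/examplecms"},
    ],
}


def endorsed_registries(record):
    """Return the registry URLs a crawler could follow (HTTPS only)."""
    urls = []
    for registry in record.get("creditRegistries", []):
        parsed = urlparse(registry.get("url", ""))
        if parsed.scheme == "https" and parsed.netloc:
            urls.append(registry["url"])
    return urls


print(endorsed_registries(record))
```

The project file only carries the pointers; the credit data itself stays in the external registry, so it can change without a commit to the repository.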
- OSS Taxonomy is purpose-built for this (identify underinvested function areas).
- OpenSSF Scorecard provides the security dimension.
- CycloneDX/SPDX SBOMs provide the dependency graph.
- Missing: A way to link these together in a single metadata file. publiccode.yml could reference SBOM locations and scorecard URLs.
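
Linking these signals in one metadata file could look roughly like the sketch below, which checks a parsed record for the references a procurement office might want before adoption. The `supplyChain` key and its sub-keys (`sbom`, `scorecard`, `reuse`) are placeholder names for this sketch, as is the `missing_signals` helper; the published schema defines none of them.

```python
# Placeholder key names for a possible supply chain section.
REQUIRED_SIGNALS = ("sbom", "scorecard", "reuse")

record = {
    "name": "ExampleCMS",
    "supplyChain": {
        "sbom": "https://example.org/examplecms/sbom.cdx.json",
        "scorecard": "https://example.org/examplecms/scorecard.json",
        # No "reuse" entry: REUSE compliance status is not declared.
    },
}


def missing_signals(record):
    """List the supply chain references a record does not provide."""
    supply_chain = record.get("supplyChain", {})
    return [signal for signal in REQUIRED_SIGNALS if not supply_chain.get(signal)]


print(missing_signals(record))
```

A crawler could run such a check across a whole catalog to report which projects lack an SBOM reference or a scorecard link.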
- publiccode.yml has `usedBy` but it's manually maintained and unverified.
- openCode.de's badge system is the most advanced implementation: confirmed adopter declarations drive the Reuse Badge (Level 1–3).
- Missing: A federated protocol for procurement offices to publish their software declarations. Usage data doesn't belong in the project's publiccode.yml (the project has no authority over who uses it) — it needs decentralized usage registries with a standardized discovery mechanism.
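
A minimal sketch of the aggregation step such a federated protocol would need: merge declarations from several usage registries and count distinct adopters per project. The declaration shape (`organization`, `software`), the sample registries, and the `count_adopters` helper are assumptions for illustration; no such protocol is defined yet.

```python
from collections import defaultdict

# Invented declarations, as two hypothetical registries might publish them.
REGISTRY_A = [
    {"organization": "City of Example", "software": "https://example.org/examplecms"},
    {"organization": "Example County", "software": "https://example.org/examplecms"},
]
REGISTRY_B = [
    {"organization": "City of Example", "software": "https://example.org/examplecms"},
    {"organization": "Example Ministry", "software": "https://example.org/examplewiki"},
]


def count_adopters(*registries):
    """Count distinct declaring organizations per software URL."""
    adopters = defaultdict(set)
    for registry in registries:
        for declaration in registry:
            # A set de-duplicates the same organization across registries.
            adopters[declaration["software"]].add(declaration["organization"])
    return {software: len(orgs) for software, orgs in adopters.items()}


print(count_adopters(REGISTRY_A, REGISTRY_B))
```

Keying on the software URL rather than its name avoids collisions between unrelated projects that share a name, and the per-project counts could feed a reuse-badge calculation.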
Build on publiccode.yml. It has the strongest momentum:
- Legal mandate in Italy, adopted in Germany, prerequisite for the EU OSS Catalogue since 2025
- Proven federated crawler architecture
- Active tooling ecosystem (editors, validators, crawlers)
- Already designed for the public sector audience
But extend it with:
- Faceted taxonomy (from OSS Taxonomy) replacing or supplementing the flat `categories` list, enabling "CMS that's also a CRM" queries
- Credit registry discovery: a `creditRegistries` section pointing to external systems (Drupal-style credit APIs, ecosyste.ms) where vendor/contributor data lives, endorsed by the project
- Supply chain references: fields pointing to SBOM locations, OpenSSF Scorecard results, and REUSE compliance status
- A companion Registry Discovery Standard: decentralized usage registries where procurement offices declare what they deploy, discoverable by crawlers without per-project configuration, enabling the reuse badge system to scale beyond openCode.de
Make publiccode.yml attractive to non-government open source projects by:
- Reframing it (or creating a profile/alias) that doesn't signal "government only"
- Making the public-sector-specific fields optional while keeping the core useful for any project
- Integrating with package registries (npm, PyPI, crates.io) through CodeMeta crosswalks so that metadata can be auto-populated
The worst outcome would be fragmentation — five competing standards each with 10% adoption. publiccode.yml is the only candidate that has crossed the threshold from "proposal" to "infrastructure with legal backing," and the EU's adoption makes it the gravitational center.
See PROPOSAL.md for a concrete proposal with schema extensions for faceted classification, supply chain references, credit registry discovery, a Registry Discovery Standard for usage declarations, and optional linked data representation.
See PITCH.md for stakeholder pitches — what each actor (maintainers, vendors, procurement offices, funders, registries) gains economically and politically from this proposal.
See METHODOLOGY.md for research provenance — full documentation of the AI-assisted research process, prompts, sources, editorial decisions, and limitations.
See ROADMAP.md for an illustrative phased implementation plan that works toward the proposal through achievable milestones and addresses critical risks.
See RISK_ANALYSIS.md for a systematic examination of risks to the proposal; each risk includes an assessment of likelihood and impact, plus potential mitigations.