PyCon Talk: GitHub Actions Security in Python Packages

Data collection and analysis for a PyCon talk on GitHub Actions security across Python packages.

Uses ecosyste.ms to identify Python packages, then scans their GitHub Actions workflows with zizmor to find common security misconfigurations.

Slides

The deck is slides.md, rendered with Marp and a custom theme in theme.css.

npx @marp-team/marp-cli slides.md --theme theme.css -o slides.html
npx @marp-team/marp-cli slides.md --theme theme.css -o slides.html --watch
npx @marp-team/marp-cli slides.md --theme theme.css -o slides.pdf --allow-local-files

Open slides.html in a browser. Press f for fullscreen or p for presenter view with speaker notes.

Data collection

Requires uv.

cd collect

# run everything for a registry (fetch, scan, load, report)
uv run run.py pypi.org
uv run run.py rubygems.org --critical

# or run steps individually
uv run main.py                      # fetch packages from ecosyste.ms (resumable)
uv run scan.py                      # clone repos, run zizmor, extract actions (resumable)
uv run load_db.py                   # zizmor findings -> data/pypi_org.db
uv run report.py                    # findings report -> data/report_pypi_org.md
uv run load_actions_db.py           # action uses -> data/actions_pypi_org.db
uv run report_actions.py            # actions report -> data/report_actions_pypi_org.md

Every script defaults to pypi.org and accepts an optional registry argument and a --critical flag.
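The shared command-line convention could look roughly like the following. This is a minimal sketch, not the repository's code: parse_args and db_path are hypothetical names, and the dots-to-underscores rule is only inferred from the data/pypi_org.db path shown above. What --critical selects is not described here, so the sketch just records the flag.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the optional registry argument and the --critical flag."""
    parser = argparse.ArgumentParser(description="Illustrative CLI sketch")
    # nargs="?" makes the positional optional, so it falls back to pypi.org.
    parser.add_argument("registry", nargs="?", default="pypi.org")
    parser.add_argument("--critical", action="store_true")
    return parser.parse_args(argv)


def db_path(registry, data_dir="data"):
    """Map a registry host to a database path (assumed naming rule)."""
    slug = registry.replace(".", "_")
    return Path(data_dir) / f"{slug}.db"
```

With no arguments, parse_args([]) yields registry "pypi.org" and critical False, matching the documented default.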

Additional scripts, run ad-hoc against the same databases:

slide_data.py — every number that appears on a slide, regenerable after a scan
bucket_cves.py — fetches GHSA advisories (ecosystem=actions) and buckets them by zizmor audit
report_brief.py — toolchain/brief analysis report
report_token_risk.py — packages ranked by PyPI token-hygiene risk
slice_publish_jobs.py — third-party actions running in jobs that also run pypa/gh-action-pypi-publish
resolve_actions.py — resolves transitive uses: dependencies inside composite actions
typosquat.py — typosquat variants of popular actions that exist in the actions inventory
export_workflows.py — exports workflow files from the worst repos for review
compare.py — cross-registry comparison of zizmor findings
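As an illustration of the typosquat check described in the list above, the sketch below generates simple typo variants of a popular action's owner and repository names and keeps those that occur in an inventory. It is an assumption about the general approach, not a copy of typosquat.py: the function names, the three typo classes (deletion, doubled character, adjacent swap), and the owner/repo string format are all hypothetical.

```python
def variants(name):
    """Return simple typo variants: one deletion, one doubled char, one adjacent swap."""
    out = set()
    for i in range(len(name)):
        out.add(name[:i] + name[i + 1:])                 # drop one character
        out.add(name[:i] + name[i] * 2 + name[i + 1:])   # double one character
        if i + 1 < len(name):
            # swap two neighbouring characters
            out.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    out.discard(name)  # swapping identical neighbours reproduces the name
    out.discard("")    # deleting from a one-character name leaves nothing
    return out


def find_typosquats(popular, inventory):
    """Map each popular 'owner/repo' to its typo variants present in inventory."""
    seen = set(inventory)
    hits = {}
    for action in popular:
        owner, _, repo = action.partition("/")
        candidates = {f"{v}/{repo}" for v in variants(owner)}
        candidates |= {f"{owner}/{v}" for v in variants(repo)}
        found = sorted((candidates & seen) - {action})
        if found:
            hits[action] = found
    return hits
```

A hit only means a similarly named action is in use somewhere; it still needs manual review before being called a typosquat.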

scan.py writes per-package results to data/zizmor_results/<registry>/<pkg>.json with a <pkg>.sha sidecar recording the commit scanned. Clones over 500 MB are skipped and recorded in failed.json. Clones retry up to three times on transient errors (timeouts, early EOF, 5xx, rate limits) with a 300-second per-attempt timeout.

Pass --workers N to clone and scan N repos concurrently. Pass --no-brief to skip the toolchain analysis and clone with --sparse so only .github/ is ever materialised; use this for full-registry scans.

By default scan.py skips any package that already has results, so an interrupted run can be resumed by re-running the same command. Pass --force to refresh: each repo's HEAD is checked with git ls-remote and only repos with new commits since the recorded SHA are re-cloned and re-scanned.
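The skip and --force behaviour can be sketched as a small decision function. This is a hedged illustration under stated assumptions, not scan.py itself: remote_head and needs_scan are hypothetical names, the sidecar is assumed to hold a bare SHA, and "new commits since the recorded SHA" is approximated as "the remote HEAD differs from the recorded SHA".

```python
import subprocess
from pathlib import Path


def remote_head(repo_url, timeout=60):
    """Return the SHA the remote HEAD points at, using git ls-remote."""
    out = subprocess.run(
        ["git", "ls-remote", repo_url, "HEAD"],
        capture_output=True, text=True, check=True, timeout=timeout,
    ).stdout
    # Output has the form "<sha>\tHEAD"; an empty repository prints nothing.
    return out.split()[0] if out.strip() else None


def needs_scan(results_dir, pkg, head_lookup, force=False):
    """Decide whether pkg should be cloned and scanned on this run.

    head_lookup is a zero-argument callable returning the current remote
    HEAD SHA; it is only called when force is set and a sidecar exists.
    """
    result = Path(results_dir) / f"{pkg}.json"
    sidecar = Path(results_dir) / f"{pkg}.sha"
    if not result.exists():
        return True    # never scanned
    if not force:
        return False   # keep existing results, which is what makes runs resumable
    if not sidecar.exists():
        return True    # no recorded commit to compare against
    recorded = sidecar.read_text().strip()
    return head_lookup() != recorded
```

A caller might pass head_lookup=lambda: remote_head(url), so the network request happens only for packages that actually need the comparison.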

For a clean point-in-time snapshot, move data/zizmor_results/<registry>/ aside before running.
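One way to move the results aside is to rename the directory with a timestamp suffix. The helper below is a hypothetical convenience, not part of the repository; the suffix format and the choice to keep the renamed directory next to the original are assumptions.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def move_results_aside(registry, data_dir="data", now=None):
    """Rename data/zizmor_results/<registry>/ to a timestamped sibling."""
    src = Path(data_dir) / "zizmor_results" / registry
    if not src.is_dir():
        return None  # nothing to move; the next scan starts from scratch anyway
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
    dest = src.with_name(f"{src.name}.{stamp}")
    shutil.move(str(src), str(dest))
    return dest
```

After the move, scan.py finds no existing results for the registry, so every package is scanned afresh.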

zizmor is pinned to a specific version in scan.py so results are comparable across runs.

Run the unit tests in test_scan with:

uv run python -m unittest test_scan -v

Restoring data on another machine

The full collect/data/ tree is ~41 GB and not in git. A travel archive (pycon-travel.zip, ~610 MB) contains everything except data/zizmor_results*/ and data/pages/ — enough to run all reports, slide_data.py, and sqlite queries without re-scanning.

The archive paths are rooted at pycon/, so unzip from the parent directory of an existing clone to overlay the data:

git pull                    # make sure the clone is current first
cd ..                       # parent of pycon/
unzip pycon-travel.zip      # overlays collect/data/ into the clone
cd pycon/collect
uv sync

git status should be clean afterwards. If you need raw zizmor output or fetched pages, re-run scan.py / main.py — both are resumable.
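Because the overlay depends on the archive being rooted at pycon/, it can be worth checking the member paths before extracting. The following is a minimal sketch using the standard zipfile module; check_archive_root and overlay are hypothetical helpers, not scripts in this repository.

```python
import zipfile


def check_archive_root(zip_path, root="pycon/"):
    """Return archive members that would land outside root when extracted."""
    with zipfile.ZipFile(zip_path) as zf:
        return [n for n in zf.namelist() if not n.startswith(root)]


def overlay(zip_path, parent_dir, root="pycon/"):
    """Extract zip_path into parent_dir, refusing archives not rooted at root."""
    stray = check_archive_root(zip_path, root)
    if stray:
        raise ValueError(
            f"{len(stray)} members are not under {root!r}, e.g. {stray[0]!r}"
        )
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(parent_dir)
```

Calling overlay("pycon-travel.zip", "..") from inside the clone would mirror the unzip step above, with the extra guard that nothing is written outside pycon/.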

License

MIT
