feat: Add new in-house GitHub action to detect secrets exposure #3112
I've reviewed this and run
This pull request has merge conflicts that must be resolved before it can be merged. @courtneypacheco please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
This GitHub action detects any secret accidentally exposed through a GH workflow that auto-triggers on PR builds. The action's logic relies on a "deny list" approach when deciding if a workflow is problematic or not.

Also, isort is disabled for the new GitHub action because it tries to add random comments to the Python scripts.

Signed-off-by: Courtney Pacheco <[email protected]>
booxter left a comment
Impressive work. I love how this is structured and tested.
if read_dir and file_path:
    raise NotImplementedError(
        "This tool currently does not support both --dir and --file being passed in simultaneously."
    )
I think this (lines 68-77) can be done at the argparse stage with parser.add_mutually_exclusive_group(required=True).
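A minimal sketch of that suggestion, using only the standard `argparse` module. The `--file` and `--dir` flags come from the PR description; the `dest` names mirror the variables in the snippet above, and the description and help strings are assumptions made for illustration, not the action's actual parser.

```python
import argparse


def build_parser():
    """Build a parser that requires exactly one of --file or --dir."""
    parser = argparse.ArgumentParser(
        description="Check workflow files for exposed secrets (illustrative description)."
    )
    # required=True makes argparse reject a call that supplies neither flag;
    # the group itself rejects a call that supplies both.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--file", dest="file_path", help="Path to a single workflow file.")
    group.add_argument("--dir", dest="read_dir", help="Directory to read workflow files from.")
    return parser


args = build_parser().parse_args(["--file", ".github/workflows/lint.yml"])
print(args.file_path, args.read_dir)
```

With this in place, passing both flags makes argparse print a usage error and exit, so the manual `NotImplementedError` check would no longer be needed.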
# For each YAML file we've found, store it in a dictionary for later parsing. We need the yaml file *path* as the
# key so we can provide actionable error messages if an exception is caught.
workflow_files_found = 0
nit: you use it as a bool (either found or not found), so it could be a bool
        f"YAML file '{file_path}' is not recognized as a Git workflow file."
    )
print(f" - Successfully found and loaded workflow file: {file_path}")
git_workflow_files = {file_path: loaded_file}
These two code paths (for single file and for a directory) are somewhat duplicate. We could probably reuse the code to handle a single yaml file in the for loop above (for directory case).
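One way to read this suggestion, as a rough sketch: normalize both inputs into a single list of candidate paths first, so that one loop handles loading and validation for either case. The function name `collect_workflow_paths` and the `.yml`/`.yaml` suffix filter are assumptions made for illustration; they are not taken from the PR's code.

```python
from pathlib import Path


def collect_workflow_paths(file_path=None, read_dir=None):
    """Return candidate YAML paths for either the single-file or the directory case."""
    if file_path is not None:
        # Single-file case: a one-element list lets the caller reuse the same loop.
        return [Path(file_path)]
    # Directory case: keep only files with a YAML extension, in a stable order.
    return sorted(p for p in Path(read_dir).iterdir() if p.suffix in {".yml", ".yaml"})


# The per-file loading and validation would then live in one loop over this list.
for candidate in collect_workflow_paths(file_path=".github/workflows/lint.yml"):
    print(candidate.name)
```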
# worry about a bad actor editing the repo contents to retrieve and/or use our GitHub secrets without their
# knowledge or consent.
if workflow_auto_triggers_on_pull_request(trigger_conditions) is False:
    filename_without_path = file_path.split("/")[-1]
os.sep? (Perhaps a better way would be to switch to pathlib.Path objects for this code and then use their name attribute to access path parts platform-independently.)
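For reference, a small sketch of the `pathlib` variant. The example path is hypothetical; `filename_without_path` is the variable name from the snippet above.

```python
from pathlib import Path

# Hypothetical workflow path, used only for illustration.
file_path = ".github/workflows/lint.yml"

# Path.name returns the final path component, so the code does not need to
# hard-code "/" as the separator.
filename_without_path = Path(file_path).name
print(filename_without_path)
```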
class MissingTriggerConditionsError(Exception):
    "Raised when trigger conditions are not defined in a Git workflow file"
nit: the `pass` statements here are redundant.
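To illustrate the nit: a docstring is already a valid class body in Python, so the exception class works without a trailing `pass`. The class name and docstring are from the snippet above; the error message in the usage lines is made up for the example.

```python
class MissingTriggerConditionsError(Exception):
    """Raised when trigger conditions are not defined in a Git workflow file"""


# Illustrative usage; the message text is an assumption, not from the PR.
try:
    raise MissingTriggerConditionsError("no trigger conditions found")
except MissingTriggerConditionsError as error:
    print(error)
```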
"""
workflow_file = None
try:
    with open(workflow_filename, "r") as file:
nit: "r" is the default mode for open().
trigger_conditions: list
    List of the trigger conditions, but not their configs.
"""
# Note that because GitHub workflow files use the key "on" to denote trigger conditions,
Love YAML! /s
# https://github.com/actions/checkout/issues/249
fetch-depth: 0

# In-house method to detect and identfy exposed secrets in Git workflow files that
identify!
#3228)

**Checklist:**

- [x] **Commit Message Formatting**: Commit titles and messages follow guidelines in the [conventional commits](https://www.conventionalcommits.org/en/v1.0.0/#summary).
- [ ] [Changelog](https://github.com/instructlab/instructlab/blob/main/CHANGELOG.md) updated with breaking and/or notable changes for the next minor release.
- [ ] Documentation has been updated, if necessary.
- [ ] Unit tests have been added, if necessary.
- [ ] Functional tests have been added, if necessary.
- [ ] E2E Workflow tests have been added, if necessary.

## Background

On Feb 11, 2025, I created an in-house GitHub action called `detect-exposed-secrets`: #3112

I have since taken the contents of this `detect-exposed-secrets` action and migrated them to our `ci-actions` repo here: https://github.com/instructlab/ci-actions/tree/main/actions/detect-exposed-workflow-secrets

During this migration process, I also updated the name of the action from `detect-exposed-secrets` to `detect-exposed-workflow-secrets` so that the name of the action is accurate. (The original name implied the action might detect any type of exposed secret, when that isn't accurate.)

## Proposed Changes

* Remove the in-house GitHub action that I created ~30 days ago since it now exists in the `ci-actions` repo
* Update this repository's `lint.yml` file to reference the action from the `ci-actions` repo instead of from this repository.
* Pin the version of the reusable action to `v0.1.0` so that any updates to the action are not automatically consumed without anyone's knowledge

Approved-by: danmcp
Approved-by: ktdreyer
Checklist:
conventional commits.
Overview
This GitHub action is used to:
main.

NOTE: When we talk about "auto-triggered workflows", we're essentially talking about the small E2E job, medium E2E job, unit tests, etc. because they automatically run any time a given contributor creates a pull request and/or pushes a new commit to an existing pull request. In the cases of the large E2E and x-large E2E jobs, neither of them auto-trigger on PR builds, which means if a PR author or reviewer wants the large/x-large E2E test to run on a PR, a trusted repo maintainer must review the PR's proposed changes and manually trigger the desired E2E job if they do not see any security concerns.
What is Specifically Analyzed
This action looks to see if any secrets are loaded into Git workflow environment variables, like in this case with top-level environment secrets:
or secrets embedded into individual steps:
Since GitHub secrets are referenced using the form `${{ secrets.SECRET_NAME }}`, we can iterate through the env blocks in the file and use regular expressions to detect GitHub secrets.
What is Not Analyzed
This check does not look to see if a user outright copy-pastes a token or a password. In other words, it does NOT detect something like:
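To make the boundary concrete, here is a minimal sketch of the kind of regular-expression check described above, applied to one env block. The pattern, the function name `find_secret_references`, and the sample values are all assumptions made for illustration; the action's actual regex may differ.

```python
import re

# Assumed pattern for the `${{ secrets.NAME }}` expression form.
SECRET_REFERENCE = re.compile(r"\$\{\{\s*secrets\.[A-Za-z_][A-Za-z0-9_]*\s*\}\}")


def find_secret_references(env):
    """Return the names of env variables whose value references a GitHub secret."""
    return [name for name, value in env.items() if SECRET_REFERENCE.search(str(value))]


# Hypothetical env block, written as the dict a YAML parser would produce.
env_block = {
    "API_TOKEN": "${{ secrets.API_TOKEN }}",  # a secret reference: flagged
    "PASSWORD": "pasted-literal-value",       # a copy-pasted value: not flagged
    "LOG_LEVEL": "debug",
}
print(find_secret_references(env_block))
```

Only the entry that uses the secrets expression is reported; the pasted literal passes through, which is the limitation this section describes.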
How it Works
This new in-house GitHub action allows the repository maintainers to either provide a valid path to a Git workflow file (via the `--file` flag) OR provide a directory to read Git workflow files from (via the `--dir` flag). Providing both flags simultaneously is not allowed or supported, and is blocked by the CLI. (Just call the CLI tool each time you want to check a new location.)

The tool then validates that: (1.) it can find the desired YAML file(s), and (2.) that the YAML files are indeed Git workflow files, not any random YAML file. If any valid Git workflow files are found, the tool then parses those workflow files to determine if any GitHub secrets are loaded into environment variables, either at the top-level of the workflow file OR through any one of the jobs' steps.
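The two locations mentioned above (the top-level env block and each step's env block) could be walked roughly as follows. This is a sketch under assumptions: the workflow is taken to be already parsed into a dict, and the function name `iter_env_blocks`, the location labels, and the sample workflow are hypothetical rather than the action's actual code.

```python
def iter_env_blocks(workflow):
    """Yield (location, env_dict) pairs for the top-level and per-step env blocks."""
    if isinstance(workflow.get("env"), dict):
        yield "top-level", workflow["env"]
    for job_name, job in (workflow.get("jobs") or {}).items():
        for index, step in enumerate(job.get("steps") or []):
            if isinstance(step.get("env"), dict):
                yield f"jobs.{job_name}.steps[{index}]", step["env"]


# Hypothetical parsed workflow, used only for illustration.
workflow = {
    "env": {"TOP_LEVEL_TOKEN": "${{ secrets.TOP_LEVEL_TOKEN }}"},
    "jobs": {
        "unit-tests": {
            "steps": [
                {"run": "make test"},
                {"run": "make upload", "env": {"UPLOAD_KEY": "${{ secrets.UPLOAD_KEY }}"}},
            ]
        }
    },
}

for location, env in iter_env_blocks(workflow):
    print(location, sorted(env))
```

Each yielded env block can then be checked for secret references, with the location string available for an actionable error message.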
Example output from this PR's build:
Additional Info
I named this check `security-lint` because the tool that powers it technically falls under the category of "linters". However, I don't think it falls 100% under the existing "lint" category, as this linter is dedicated more towards security and not formatting etc. I figure we can put future security lint steps in the same section of the `lint.yml` file later on as well, if we need to add more.