Conversation

@courtneypacheco courtneypacheco commented Feb 1, 2025

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the
    conventional commits.
  • Changelog updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Functional tests have been added, if necessary.
  • E2E Workflow tests have been added, if necessary.

Overview

This GitHub action is used to:

  1. Detect when GitHub secrets would be accidentally exposed through GitHub workflows that auto-trigger on PR builds, and
  2. Notify the PR author and reviewers about the security concerns before the PR gets merged into main.

NOTE: When we talk about "auto-triggered workflows", we're essentially talking about the small E2E job, medium E2E job, unit tests, etc. because they automatically run any time a given contributor creates a pull request and/or pushes a new commit to an existing pull request. In the cases of the large E2E and x-large E2E jobs, neither of them auto-trigger on PR builds, which means if a PR author or reviewer wants the large/x-large E2E test to run on a PR, a trusted repo maintainer must review the PR's proposed changes and manually trigger the desired E2E job if they do not see any security concerns.
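To make the "auto-triggered" distinction concrete, here is a minimal sketch of how a trigger check might look. It is not the action's actual code: the function names and the contents of the deny list are assumptions for illustration. It relies only on the fact that a workflow's `on` value can be a single event name, a list of names, or a mapping of names to configs.

```python
# Hypothetical deny list: events assumed to start a workflow automatically on PR activity.
PR_AUTO_TRIGGERS = {"pull_request", "pull_request_target"}


def trigger_names(on_value):
    """Normalize the shapes GitHub accepts for `on` into a set of event names."""
    if isinstance(on_value, str):
        return {on_value}
    if isinstance(on_value, (list, tuple)):
        return set(on_value)
    if isinstance(on_value, dict):
        return set(on_value.keys())
    return set()


def auto_triggers_on_pr(on_value):
    """True when at least one trigger is on the (assumed) deny list."""
    return bool(trigger_names(on_value) & PR_AUTO_TRIGGERS)
```

Under these assumptions, a workflow that only declares `workflow_dispatch` would be treated as manually triggered, while one that lists `pull_request` in any of the three shapes would be flagged for the secrets check.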

What is Specifically Analyzed

This action checks whether any secrets are loaded into GitHub workflow environment variables, as in this example with top-level environment secrets:

name: "Some E2E Job"

# <<< Exposed secrets at the top level (global) >>>
env:
  SECRET_TOKEN: ${{ secrets.SECRET_TOKEN }}
  SECRET_KEY: ${{ secrets.SECRET_KEY }}

jobs:
  some-job: {}
  some-other-job: {}

or secrets embedded into individual steps:

name: "Some E2E Job"

jobs:

  some-job:
    steps:
      # <<< Exposed secrets at the step level (local to the job, but still an issue) >>>
      - name: First step
        env:
          SECRET_TOKEN: ${{ secrets.SECRET_TOKEN }}
          SECRET_KEY: ${{ secrets.SECRET_KEY }}
        run: some-cmd

  some-other-job:
    steps:
      # <<< Exposed secrets at the step level (local to the job, but still an issue) >>>
      - name: First step
        env:
          SECRET_TOKEN: ${{ secrets.SECRET_TOKEN }}
          SECRET_KEY: ${{ secrets.SECRET_KEY }}
        uses: some-instructlab-gh-action

Since GitHub secrets are referenced using the form ${{ secrets.SECRET_NAME }}, we can iterate through the env blocks in each workflow file and use regular expressions to detect them.
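As an illustration of the regular-expression idea, the sketch below scans the values of one already-parsed env block. The pattern and the helper name `secrets_in_env` are assumptions, not the action's actual implementation, and the pattern deliberately handles only the simple `${{ secrets.NAME }}` form shown above.

```python
import re

# Assumed pattern: ${{ secrets.NAME }} with optional whitespace inside the braces.
SECRET_REF = re.compile(r"\$\{\{\s*secrets\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def secrets_in_env(env):
    """Return the sorted secret names referenced by the values of an env mapping."""
    found = set()
    for value in (env or {}).values():
        # Values may be non-strings (numbers, booleans), so coerce before matching.
        found.update(SECRET_REF.findall(str(value)))
    return sorted(found)
```

A more complex expression inside the braces (for example, one combining two secrets with an operator) would not match this pattern, so a real check might need a looser expression.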

What is Not Analyzed

This check does not detect tokens or passwords that a user hardcodes directly into a workflow file. In other words, it does NOT detect something like:

name: "Some E2E Job"

jobs:
  some-job:
    steps:

      # Password strings are not detected
      - name: Store my password to the GitHub env
        run: echo "MY_PASSWORD=my-super-secret-password" >> $GITHUB_ENV

      # Neither are tokens or API keys, like git tokens
      - name: Store my git token to the GitHub env
        run: echo "git_token=ghp_1234567890" >> $GITHUB_ENV

How it Works

This new in-house GitHub action allows repository maintainers to provide either a path to a single GitHub workflow file (via the --file flag) or a directory to read GitHub workflow files from (via the --dir flag). Passing both flags at once is not supported and is blocked by the CLI. (To check more than one location, call the CLI tool once per location.)

The tool then validates that (1) it can find the desired YAML file(s), and (2) the YAML files are indeed GitHub workflow files rather than arbitrary YAML files. If any valid workflow files are found, the tool parses them to determine whether any GitHub secrets are loaded into environment variables, either at the top level of the workflow file or in any of the jobs' steps.
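The two stages described above (validate, then scan the top-level and step-level env blocks) could be sketched roughly as follows. This operates on an already-parsed workflow mapping and is only an approximation under stated assumptions: the helper names, the "looks like a workflow" heuristic, and the location labels are all hypothetical.

```python
import re

SECRET_REF = re.compile(r"\$\{\{\s*secrets\.(\w+)\s*\}\}")


def looks_like_workflow(doc):
    """Assumed heuristic: a workflow is a mapping with `jobs` and a trigger key."""
    # Some YAML 1.1 loaders turn the bare key `on` into boolean True, so accept both.
    return isinstance(doc, dict) and "jobs" in doc and ("on" in doc or True in doc)


def exposed_secrets(doc):
    """Map a location label to the secret names found in that env block."""
    findings = {}

    def scan(label, env):
        names = sorted(
            {name for value in (env or {}).values() for name in SECRET_REF.findall(str(value))}
        )
        if names:
            findings[label] = names

    scan("top-level env", doc.get("env"))
    for job_name, job in (doc.get("jobs") or {}).items():
        for index, step in enumerate((job or {}).get("steps") or []):
            scan(f"jobs.{job_name}.steps[{index}].env", step.get("env"))
    return findings
```

Keying the findings by location keeps enough context to tell the PR author exactly which env block to change.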

Example output from this PR's build:

[Image: Screenshot 2025-02-01 at 4 21 42 PM]

Additional Info

I named this check security-lint because the tool that powers it technically falls under the category of "linters". However, I don't think it fits neatly under the existing "lint" category, since this linter targets security rather than formatting and the like. I figure we can put future security lint steps in the same section of the lint.yml file later on, if we need to add more.

@mergify mergify bot added CI/CD Affects CI/CD configuration ci-failure PR has at least one CI failure labels Feb 1, 2025
@courtneypacheco courtneypacheco force-pushed the courtneypacheco-detect-secrets-in-workflow-envs branch from 22ebf1e to 1c46a0f Compare February 1, 2025 16:57
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Feb 1, 2025
@courtneypacheco courtneypacheco force-pushed the courtneypacheco-detect-secrets-in-workflow-envs branch 7 times, most recently from a7660ab to 4bad68c Compare February 1, 2025 17:36
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Feb 1, 2025
@courtneypacheco courtneypacheco force-pushed the courtneypacheco-detect-secrets-in-workflow-envs branch 2 times, most recently from 4c67f61 to e89a4bb Compare February 1, 2025 18:01
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Feb 1, 2025
@courtneypacheco courtneypacheco force-pushed the courtneypacheco-detect-secrets-in-workflow-envs branch from e89a4bb to 4c8d763 Compare February 1, 2025 18:03
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Feb 1, 2025
@courtneypacheco courtneypacheco force-pushed the courtneypacheco-detect-secrets-in-workflow-envs branch 2 times, most recently from 42477cd to a205e02 Compare February 1, 2025 18:12
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Feb 1, 2025
@courtneypacheco courtneypacheco force-pushed the courtneypacheco-detect-secrets-in-workflow-envs branch from a205e02 to e5d1237 Compare February 1, 2025 18:29
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Feb 1, 2025
@courtneypacheco courtneypacheco force-pushed the courtneypacheco-detect-secrets-in-workflow-envs branch 2 times, most recently from bc8af8e to ca91aca Compare February 1, 2025 18:41
@nathan-weinberg nathan-weinberg self-requested a review February 3, 2025 15:24
@courtneypacheco courtneypacheco force-pushed the courtneypacheco-detect-secrets-in-workflow-envs branch from e51c402 to a9dee01 Compare February 3, 2025 21:10
@mergify mergify bot removed the needs-rebase This Pull Request needs to be rebased label Feb 3, 2025
@courtneypacheco courtneypacheco force-pushed the courtneypacheco-detect-secrets-in-workflow-envs branch from a9dee01 to 7fce67f Compare February 3, 2025 21:16
@mergify mergify bot added the ci-failure PR has at least one CI failure label Feb 3, 2025
@courtneypacheco courtneypacheco force-pushed the courtneypacheco-detect-secrets-in-workflow-envs branch from 7fce67f to ba3cf76 Compare February 3, 2025 21:22
@mergify mergify bot removed the ci-failure PR has at least one CI failure label Feb 3, 2025
ktdreyer commented Feb 3, 2025

I've reviewed this and run pytest locally to independently verify it. Great job.

@courtneypacheco courtneypacheco force-pushed the courtneypacheco-detect-secrets-in-workflow-envs branch from ba3cf76 to 122116a Compare February 4, 2025 01:16
@mergify mergify bot added the container Affects containization aspects label Feb 4, 2025
mergify bot commented Feb 4, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @courtneypacheco please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added documentation Improvements or additions to documentation needs-rebase This Pull Request needs to be rebased labels Feb 4, 2025
@courtneypacheco courtneypacheco force-pushed the courtneypacheco-detect-secrets-in-workflow-envs branch from 122116a to 0bc940e Compare February 4, 2025 01:19
@mergify mergify bot added ci-failure PR has at least one CI failure and removed needs-rebase This Pull Request needs to be rebased labels Feb 4, 2025
This GitHub action detects any secret accidentally exposed through a GH workflow that auto-triggers on PR builds. The action's logic relies on a "deny list" approach when deciding if a workflow is problematic or not.

Also, isort is disabled for the new GitHub action because it tries to add random comments to the Python scripts.

Signed-off-by: Courtney Pacheco <[email protected]>
@courtneypacheco courtneypacheco force-pushed the courtneypacheco-detect-secrets-in-workflow-envs branch from 0bc940e to 3b74acc Compare February 4, 2025 01:21
@mergify mergify bot removed the ci-failure PR has at least one CI failure label Feb 4, 2025
@courtneypacheco courtneypacheco removed documentation Improvements or additions to documentation container Affects containization aspects labels Feb 4, 2025
@mergify mergify bot added the one-approval PR has one approval from a maintainer label Feb 10, 2025
@booxter booxter left a comment

Impressive work. I love how this is structured and tested.

if read_dir and file_path:
    raise NotImplementedError(
        "This tool currently does not support both --dir and --file being passed in simultaneously."
    )

I think this (lines 68-77) can be done with parser.add_mutually_exclusive_group(required=True) at the argparse stage.
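A minimal sketch of that suggestion, assuming the flags map to `file_path` and `read_dir` as in the snippet under review (the `dest` names, help strings, and the `build_parser` helper are assumptions, not the tool's actual code):

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of the --file / --dir interface.")
    # required=True makes argparse insist on exactly one of the two flags.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--file", dest="file_path", help="Path to one workflow file.")
    group.add_argument("--dir", dest="read_dir", help="Directory containing workflow files.")
    return parser
```

With this in place, argparse itself prints a usage error and exits when both flags (or neither) are given, so the manual NotImplementedError check would no longer be needed.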


# For each YAML file we've found, store it in a dictionary for later parsing. We need the yaml file *path* as the
# key so we can provide actionable error messages if an exception is caught.
workflow_files_found = 0

nit: you use it as a bool (either found or not found), so it could be a bool

f"YAML file '{file_path}' is not recognized as a Git workflow file."
)
print(f" - Successfully found and loaded workflow file: {file_path}")
git_workflow_files = {file_path: loaded_file}

These two code paths (for a single file and for a directory) are somewhat duplicated. We could probably reuse the single-YAML-file handling inside the for loop above (the directory case).
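One way the two paths might be merged, as a rough sketch: first turn either input into a list of paths, then run every path through the same loader. The helper names and the `load_one` callback are hypothetical, and the real tool's validation and error handling are not reproduced here.

```python
from pathlib import Path


def collect_yaml_paths(file_path=None, read_dir=None):
    """Return the YAML paths to check, whichever input was given."""
    if file_path is not None:
        return [Path(file_path)]
    directory = Path(read_dir)
    return sorted(p for p in directory.iterdir() if p.suffix in {".yml", ".yaml"})


def load_workflows(paths, load_one):
    """Load every path with the same helper so both modes share one code path."""
    # Keyed by path string so later error messages can name the offending file.
    return {str(path): load_one(path) for path in paths}
```

The single-file case then becomes a one-element list handled by the same loop as the directory case.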

# worry about a bad actor editing the repo contents to retrieve and/or use our GitHub secrets without their
# knowledge or consent.
if workflow_auto_triggers_on_pull_request(trigger_conditions) is False:
    filename_without_path = file_path.split("/")[-1]

os.sep? (Perhaps a better way would be to switch to pathlib.Path objects for this code and then use their name attribute to access path parts platform-independently.)
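To show the difference, the sketch below uses the pure path classes so that both separator styles can be compared on any machine. The example path is illustrative only.

```python
from pathlib import PurePosixPath, PureWindowsPath

posix_path = ".github/workflows/lint.yml"
windows_path = r".github\workflows\lint.yml"

# Splitting on "/" only works when the separator happens to be "/".
split_posix = posix_path.split("/")[-1]
split_windows = windows_path.split("/")[-1]  # no "/" present, so the whole string comes back

# The .name attribute returns the final path component for either flavour.
name_posix = PurePosixPath(posix_path).name
name_windows = PureWindowsPath(windows_path).name
```

In the tool itself, `Path(file_path).name` would pick the right flavour for the platform it runs on.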


class MissingTriggerConditionsError(Exception):
    "Raised when trigger conditions are not defined in a Git workflow file"

nit: the pass statements here are redundant.
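For reference, a docstring on its own is a complete class body, so an exception class like the one above works without a trailing `pass`. A small check, with the error message text invented for the example:

```python
class MissingTriggerConditionsError(Exception):
    "Raised when trigger conditions are not defined in a Git workflow file"


# The class can be raised and caught as usual, and keeps its docstring.
try:
    raise MissingTriggerConditionsError("no 'on' key")
except MissingTriggerConditionsError as err:
    caught_message = str(err)
```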

"""
workflow_file = None
try:
    with open(workflow_filename, "r") as file:

nit: "r" is the default mode.

trigger_conditions: list
List of the trigger conditions, but not their configs.
"""
# Note that because GitHub workflow files use the key "on" to denote trigger conditions,

Love YAML! /s

# https://github.com/actions/checkout/issues/249
fetch-depth: 0

# In-house method to detect and identfy exposed secrets in Git workflow files that

identify!

@mergify mergify bot removed the one-approval PR has one approval from a maintainer label Feb 11, 2025
@booxter booxter removed the request for review from nathan-weinberg February 11, 2025 17:29
@mergify mergify bot merged commit 4d6549d into main Feb 11, 2025
29 checks passed
@mergify mergify bot deleted the courtneypacheco-detect-secrets-in-workflow-envs branch February 11, 2025 17:30
mergify bot added a commit that referenced this pull request Mar 25, 2025
#3228)

**Checklist:**

- [x] **Commit Message Formatting**: Commit titles and messages follow guidelines in the
  [conventional commits](https://www.conventionalcommits.org/en/v1.0.0/#summary).
- [ ] [Changelog](https://github.com/instructlab/instructlab/blob/main/CHANGELOG.md) updated with breaking and/or notable changes for the next minor release.
- [ ] Documentation has been updated, if necessary.
- [ ] Unit tests have been added, if necessary.
- [ ] Functional tests have been added, if necessary.
- [ ] E2E Workflow tests have been added, if necessary.

## Background

On Feb 11, 2025, I created an in-house GitHub action called `detect-exposed-secrets`: #3112

I have since taken the contents of this `detect-exposed-secrets` action and migrated them to our `ci-actions` repo here: https://github.com/instructlab/ci-actions/tree/main/actions/detect-exposed-workflow-secrets

During this migration, I also renamed the action from `detect-exposed-secrets` to `detect-exposed-workflow-secrets` so that the name is accurate. (The original name implied the action might detect any type of exposed secret, which isn't the case.)

## Proposed Changes

* Remove the in-house GitHub action that I created ~30 days ago since it now exists in the `ci-actions` repo
* Update this repository's `lint.yml` file to reference the action from the `ci-actions` repo instead of from this repository.
* Pin the version of the reusable action to `v0.1.0` so that any updates to the action are not automatically consumed without anyone's knowledge


Approved-by: danmcp

Approved-by: ktdreyer
Labels

CI/CD Affects CI/CD configuration testing Relates to testing

7 participants