ci: Revise docs-preview-deploy.yml
#4247
Conversation
polarathene
left a comment
Thank you for the contribution, but I am not sure it makes sense to resolve these unless there is a clear benefit over the current implementation?
- `if` can exclude `${{ }}`, yes, but I would rather maintainers not have to think about that; I would prefer to keep it simple for them to recognize, without questioning when it is valid to omit the expression syntax. These workflows are rarely modified beyond dependabot.
- Replacing the `env` context usage with explicit shell variables could be done, but unless there is a valid improvement, I would again prefer to keep it as-is since it'll be easier to grok and less error-prone. It can be quite easy for these little mistakes to slip through.
| pull_requests: ${{ tojson(github.event.workflow_run.pull_requests) }} | ||
| run: | | ||
| PR_NUMBER=$(jq -r '[.[] | select(.head.sha == "${{ env.head_sha }}")][0].number' <<< '${{ env.pull_requests }}') | ||
| PR_NUMBER=$(jq -r '[.[] | select(.head.sha == "${head_sha}")][0].number' <<< '${pull_requests}') |
I'm not sure what the value of this would be, and I don't think your suggestion here would work as-is: because ' single quotes wrap this expression, the variables would not be interpolated. Thus your suggestion would break the intended functionality.
The expression syntax to use env context works as that is pre-processed before the run executes.
Both of these variables are referencing the steps own env vars set directly above, which use context that shouldn't be possible to tamper with? Could you please explain the value of this suggestion?
I think there is less risk for error for maintainers with the env context expressions used, since it is distinctively clear for us to grok vs the subtle error with quotes in shell scripts affecting interpolation which can introduce bugs as this PR has done unintentionally?
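The quoting concern can be checked outside of a workflow. The following is a minimal sketch in plain shell, assuming a sample `head_sha` value chosen only for illustration; it shows that single quotes keep `${head_sha}` literal while double quotes (with the inner quotes escaped) expand it.

```shell
# Sample value for illustration only; in the workflow this comes from the step's env.
head_sha="113f63be4028584a1528e63ff33f821d990c28d2"

# Single quotes: the shell performs no expansion, so jq would receive the
# literal text ${head_sha}.
single_quoted='select(.head.sha == "${head_sha}")'

# Double quotes with escaped inner quotes: the shell expands the variable.
double_quoted="select(.head.sha == \"${head_sha}\")"

printf '%s\n' "${single_quoted}" "${double_quoted}"
```

The first printed line still contains `${head_sha}` verbatim, which is the breakage described above.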
The SHA is totally safe since it is only an alphanumeric value. github.event.workflow_run.pull_requests contains untrusted data from the triggering pull request though (e.g. title and body). An attacker can add a body like foo'`id`'bar which will pollute the value of the pull_requests env var. If you use the ${{ }} expression interpolation, the workflow expressions get interpolated into the final bash script, which is then passed to bash for execution. So if you use ${{ env.pull_requests }} the attacker will be able to use the PR body to close the bash single quote and add new commands (e.g. foo';`id`;bar).
You are right that '${pull_requests}' needs to be replaced with "${pull_requests}" since otherwise it won't get expanded. Will fix that in the PR. More info here
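The difference between the two approaches can be reproduced locally. This is a rough sketch, not the workflow itself: `payload` is a hypothetical attacker-controlled string, and the `printf` call only simulates what expression templating does when it pastes a value into the script text before the shell parses it.

```shell
# Hypothetical attacker-controlled text, standing in for any untrusted field.
payload="foo'; echo INJECTED; echo 'bar"

# 1) Templating (simulated): the value is pasted into the script text, so its
#    single quote closes the string and the rest is parsed as commands.
templated_script="$(printf "echo '%s'" "${payload}")"
templated_output="$(sh -c "${templated_script}")"

# 2) Runtime variable: the script text is fixed and the value arrives through
#    the environment, so it is only ever treated as data.
runtime_output="$(pull_requests="${payload}" sh -c 'echo "${pull_requests}"')"

printf 'templated script: %s\n' "${templated_script}"
printf 'templated output:\n%s\n' "${templated_output}"
printf 'runtime output:\n%s\n' "${runtime_output}"
```

In the templated case the injected `echo INJECTED` runs as its own command; in the runtime case the payload is printed back unchanged.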
> So if you use ${{ env.pull_requests }} the attacker will be able to use the PR body to close the bash single quote and add new commands (eg: foo';`id`;bar).
While I get what you're saying here, how is it different from ${pull_requests}? Both would be a string of JSON key/value pairs?
Ah ok, templating to generate script with content vs runtime variable.
> The github.event.workflow_run.pull_requests contains untrusted data from the triggering Pull Request though (eg: title and body).
I don't think it does, pull_request context has specific metadata like that but not for this workflow_run event:
echo "PRs: ${{ tojson(github.event.workflow_run.pull_requests) }}"
PRs: [
{
base: {
ref: main,
repo: {
id: 506839796,
name: actions-example,
url: https://api.github.com/repos/polarathene/actions-example
},
sha: e2e0c6555bec8ca661b59057443cfa5a54b8da75
},
head: {
ref: test-branch,
repo: {
id: 506839796,
name: actions-example,
url: https://api.github.com/repos/polarathene/actions-example
},
sha: 113f63be4028584a1528e63ff33f821d990c28d2
},
id: 2157169857,
number: 7,
url: https://api.github.com/repos/polarathene/actions-example/pulls/7
}
]
No PR title in this data?
Good to know. The docs just mention that it's an array of pull request objects, so I assumed they contained the title and body. I would stay on the safe side though; perhaps untrusted data will be added to these objects in the future.
| { | ||
| echo "PR_NUMBER=${PR_NUMBER}" | ||
| echo 'PR_HEADSHA=${{ env.head_sha }}' | ||
| echo 'PR_HEADSHA=${head_sha}' |
Likewise, the content is wrapped in single quotes, so this would not interpolate; the value of PR_HEADSHA as an ENV would now be the literal string ${head_sha} instead of the actual head SHA.
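For the runtime-variable variant to write the real SHA, the line would need double quotes. A minimal sketch, assuming stand-in values: `GITHUB_ENV` is normally provided by the runner and is replaced here by a temporary file, and `head_sha` and `PR_NUMBER` are sample values.

```shell
# Stand-ins for values the workflow provides; GITHUB_ENV is normally set by the runner.
head_sha="113f63be4028584a1528e63ff33f821d990c28d2"
PR_NUMBER="7"
GITHUB_ENV="$(mktemp)"

{
  echo "PR_NUMBER=${PR_NUMBER}"
  # Double quotes so that ${head_sha} is expanded before the line is written.
  echo "PR_HEADSHA=${head_sha}"
} >> "${GITHUB_ENV}"

cat "${GITHUB_ENV}"
```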
| pull_requests: ${{ tojson(github.event.workflow_run.pull_requests) }} | ||
| run: | | ||
| PR_NUMBER=$(jq -r '[.[] | select(.head.sha == "${{ env.head_sha }}")][0].number' <<< '${{ env.pull_requests }}') | ||
| PR_NUMBER=$(jq -r '[.[] | select(.head.sha == "${head_sha}")][0].number' <<< "${pull_requests}") |
This will still be invalid:
jq -r '[.[] | select(.head.sha == "${head_sha}")][0].number'
jq is splatting (.[]) all array items from the pull_requests variable, and selecting only the one whose .head.sha value matches the expected checksum, but due to the single-quote wrapping this will not work. Flipping the quotes alone will not work either; you'd need to escape the inner " AFAIK.
The env input appears fairly trustworthy though? I don't think the branch name can be used as an attack vector with quotes?
| PR_NUMBER=$(jq -r '[.[] | select(.head.sha == "${head_sha}")][0].number' <<< "${pull_requests}") | |
| PR_NUMBER=$(jq -r "[.[] | select(.head.sha == \"${head_sha}\")][0].number" <<< "${pull_requests}") |
vs
| PR_NUMBER=$(jq -r '[.[] | select(.head.sha == "${head_sha}")][0].number' <<< "${pull_requests}") | |
| PR_NUMBER=$(jq -r '[.[] | select(.head.sha == "${{ env.head_sha }}")][0].number' <<< '${{ env.pull_requests }}') |
The latter seems simpler to grok; so long as the input is not exploitable, that seems less prone to human error during maintenance?
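A third option, not proposed in this thread, would be jq's `--arg` flag, which passes the shell value in as a jq variable so the filter can stay single-quoted with no escaping. This is only a sketch under assumptions: the JSON below is a trimmed, hypothetical payload shaped like the sample output earlier in the conversation, and the pipe from `printf` is used instead of a here-string purely for portability.

```shell
# Hypothetical sample data shaped like the payload shown in this thread.
head_sha="113f63be4028584a1528e63ff33f821d990c28d2"
pull_requests='[{"head":{"sha":"113f63be4028584a1528e63ff33f821d990c28d2"},"number":7}]'

# --arg binds the shell value to the jq variable $sha, so the filter itself can
# stay in single quotes and no inner quotes need escaping.
PR_NUMBER="$(printf '%s' "${pull_requests}" \
  | jq -r --arg sha "${head_sha}" '[.[] | select(.head.sha == $sha)][0].number')"

echo "PR_NUMBER=${PR_NUMBER}"
```

Whether this is easier for maintainers to read than the `${{ env.head_sha }}` form is a judgment call; it mainly removes the quote-flipping pitfall.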
> I don't think the branch name can be used as an attack vector with quotes?
The branch name can, but the SHA is safe to use, so in this case it's OK to use the env workflow expression.
Co-authored-by: Brennan Kinney <[email protected]>
polarathene
left a comment
I'll apply this. LGTM, thank you for contributing the fix and improvements! ❤️
docs-preview-deploy.yml
| github.event.workflow_run.conclusion == 'success' | ||
| && github.event.workflow_run.event == 'pull_request' | ||
| && contains(github.event.workflow_run.pull_requests.*.head.sha, github.event.workflow_run.head_sha) |
This condition appears to be triggering a skip for some reason.
docs-preview-prepare ran successfully on the PR pre-merge commit:
$ git -c protocol.version=2 fetch --no-tags --prune --no-recurse-submodules --depth=1 origin +a721146229a6adfb6f18cd28f14b8396c54ec6f7:refs/remotes/pull/4183/merge
a721146229a6adfb6f18cd28f14b8396c54ec6f7 -> pull/4183/merge
When cloning this repo and checking out that PR pre-merge commit, we can run git log:
$ git log
commit a721146229a6adfb6f18cd28f14b8396c54ec6f7 (grafted, HEAD, pull/4183/merge)
Author: RoelSG <email-address-here>
Date: Sun Nov 10 00:56:27 2024 +0000
Merge a5682e7c805397493aec6e5f333b236fb13c7cb3 into 0ff9c0132a8914d6756739a7a3b085e47870b93d
docs-preview-deploy then triggers for commit 0ff9c01, and the if condition above evaluates to false, skipping the workflow 🤔
The doc-preview-* workflows were triggered by a merge commit from master into the PR (that I did via GitHub's Web UI button to update the branch).
Commit a5682e7 reflects that, while a721146 would be the pre-merge commit GitHub created for the PR sync event, and commit 0ff9c01 is the current latest commit on the master branch (the squash merge commit from this very merged PR updating the workflow).
Looking at an older workflow run that did pass, this commit was the trigger, even though it shouldn't have met the docs-preview-prepare path requirement to trigger. It was the commit merging the PR into the master branch, rather than a pre-merge commit or, as in this case, a commit merging master into the PR branch (which has been a valid trigger in the past too, unrelated to the commit content).
The previous working if condition was:
if: ${{ github.event.workflow_run.event == 'pull_request' && github.event.workflow_run.conclusion == 'success' }}
Presumably the contains addition is not compatible here. If that's the case then the context may not be valid below either 🤔
I suppose it depends on whether a721146 (the GitHub-generated pre-merge commit) or the actual latest commit on the PR branch (a5682e7?) is treated as the head_sha. Perhaps in this case it's a compatibility issue with the master to PR merge commit only, I'll verify that shortly 😕
UPDATE:
- The condition did not skip on my own test repo when I created a PR and sync'd to changes of the primary branch main.
- Opened a PR with a small change to trigger the workflow, that worked fine.
Perhaps it has something to do with the failing PR having been opened prior to this CI workflow update, or it's something related to a third-party contributor/branch 🤷♂️
UPDATE 2:
Opened another PR, this time from my own fork of DMS. This triggered the same problem with the if condition skipping the workflow, and that was just the PR commits itself, no new pushes/updates.
I removed the third condition:
&& contains(github.event.workflow_run.pull_requests.*.head.sha, github.event.workflow_run.head_sha)
The workflow runs but shows that the context we want is missing completely:
PR_NUMBER=$(jq -r '[.[] | select(.head.sha == "45f63b686fb61d85fec2ddab7ac4504f8b191555")][0].number' <<< "${pull_requests}")
{
echo "PR_NUMBER=${PR_NUMBER}"
echo 'PR_HEADSHA=45f63b686fb61d85fec2ddab7ac4504f8b191555'
} >> "${GITHUB_ENV}"
shell: /usr/bin/bash -e {0}
env:
head_sha: 45f63b686fb61d85fec2ddab7ac4504f8b191555
pull_requests: []
Whereas for the working PR (branch on the same repo) we have that extra context available:
PR_NUMBER=$(jq -r '[.[] | select(.head.sha == "8bef84e53657709eda5d35729399817035362efd")][0].number' <<< "${pull_requests}")
{
echo "PR_NUMBER=${PR_NUMBER}"
echo 'PR_HEADSHA=8bef84e53657709eda5d35729399817035362efd'
} >> "${GITHUB_ENV}"
shell: /usr/bin/bash -e {0}
env:
head_sha: 8bef84e53657709eda5d35729399817035362efd
pull_requests: [
{
"base": {
"ref": "master",
"repo": {
"id": 33037215,
"name": "docker-mailserver",
"url": "https://api.github.com/repos/docker-mailserver/docker-mailserver"
},
"sha": "0ff9c0132a8914d6756739a7a3b085e47870b93d"
},
"head": {
"ref": "docs/smtp-bind-fix-snippet-titles",
"repo": {
"id": 33037215,
"name": "docker-mailserver",
"url": "https://api.github.com/repos/docker-mailserver/docker-mailserver"
},
"sha": "8bef84e53657709eda5d35729399817035362efd"
},
"id": 2171261491,
"number": 4258,
"url": "https://api.github.com/repos/docker-mailserver/docker-mailserver/pulls/4258"
}
]
So perhaps some metadata does need to be passed through, but then anyone could adjust the PR/issue provided as input 🤷♂️
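To illustrate why the removed third condition skipped the run, here is a rough local approximation of that check using jq. It is a sketch only: `has_matching_pr` is a hypothetical helper, not part of the workflow, and the second payload is trimmed from the working run's log above to the fields that matter.

```shell
# Hypothetical helper: succeeds only if some entry's head.sha equals the given
# SHA, roughly mirroring the intent of the workflow's contains(...) check.
has_matching_pr() {
  printf '%s' "$1" | jq -e --arg sha "$2" 'any(.[]; .head.sha == $sha)' > /dev/null
}

# Payload as logged for the failing run: no entries, so nothing can match.
if has_matching_pr '[]' "45f63b686fb61d85fec2ddab7ac4504f8b191555"; then
  fork_result="run"
else
  fork_result="skip"
fi

# Payload trimmed from the working run's log.
same_repo='[{"head":{"sha":"8bef84e53657709eda5d35729399817035362efd"},"number":4258}]'
if has_matching_pr "${same_repo}" "8bef84e53657709eda5d35729399817035362efd"; then
  same_repo_result="run"
else
  same_repo_result="skip"
fi

echo "empty payload: ${fork_result}, populated payload: ${same_repo_result}"
```

With an empty array there is no element to compare against, so the condition can never be true regardless of the SHA.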
Sorry for the late response, I totally missed the notification.
The third condition can be safely removed.
I don't know why github.event.workflow_run.pull_requests is empty for that case, but I don't see how passing it via an env var rather than interpolating it directly could change the value it receives from the runner.
I found this answer that says that pull_requests is not filled for PRs coming from a fork and they had to iterate through all open PRs to find the right one: https://stackoverflow.com/a/79017997
Here we go, from the GHA workflow event trigger docs
> The pull_request webhook event payload is empty for merged pull requests and pull requests that come from forked repositories.
There has also been an ongoing discussion about this particular issue for years:
- GH blog advises passing the PR number via artifact as ENV from untrusted to trusted workflows (which led to the security concern in the first place 😅 )
- GH CLI or REST API via JS script (rate limit risk)
- pull_request_target + re-usable workflows with restricted permissions for untrusted builds
- gh run view provides the PR number in an awkward manner, but is only compatible with forks apparently 🤷♂️
Perhaps this is something GH can properly address, as community discussions like that with the confusion and various solutions probably isn't benefiting anyone 😅
The blog post referenced by solution 1 above does touch on pull_request_target caveats:
> You may ask yourself: if the pull_request_target workflow only checks out and builds the PR, i.e. runs untrusted code but doesn’t reference any secrets, is it still vulnerable?
> Yes it is, because a workflow triggered on pull_request_target still has the read/write repository token in memory that is potentially available to any running program. If the workflow uses actions/checkout and does not pass the optional parameter persist-credentials as false, it makes it even worse. The default for the parameter is true. It means that in any subsequent steps any running code can simply read the stored repository token from the disk. If you don’t need a repository write access or secrets, just stick to the pull_request trigger.
The persist-credentials setting appears to have that related logic here.
The blog post only refers to steps within a job, while solution 3 above shows a workflow with two jobs: the build job of the 1st workflow provides inputs to the 2nd via the workflow call trigger, which performs a git checkout of the PR without secrets or permissions. That should avoid the persist-credentials concern for their preview job that runs next?
EDIT: I've been looking into solution 3 (pull_request_target) and it seems like it might be the right way to go, I'll get a PR up for this and ping you for a glance over? :)
> The third condition can be safely removed.
Yeah, it was more of a safe-guard and in this case it did correctly skip the workflow because the expected metadata was missing. I dropped it to verify that it was in fact lacking the metadata needed, so a different solution is required.
> I don't know why github.event.workflow_run.pull_requests is empty for that case, but I don't see how passing it via an env var rather than interpolating it directly could change the value it receives from the runner.
The previous solution that did work was using the untrusted pull_request workflow (prepare) to get the pull request number and related info, then send that over to the trusted workflow_run workflow (deploy) which appended the lines to the filepath of $GITHUB_ENV.
While that did work, there was the security concern with LD_PRELOAD you mentioned, and I had various concerns about how to approach that properly, so I tried the alternative approach that we have in place currently.
I'd rather not go back to the previous method since the PR contributor can manipulate what those values would be.
> I found this answer that says that pull_requests is not filled for PRs coming from a fork and they had to iterate through all open PRs to find the right one: https://stackoverflow.com/a/79017997
Yes, as my prior message posted at roughly the same time as your comment notes, that appears to be the case. I'll push ahead with the pull_request_target solution 👍
Description
- `if` condition so that it actually works as intended.
- `pull_requests` ENV at runtime instead of embedding content into the script via GHA context expression. This is a better practice which prevents exploits from untrusted inputs (notably for context objects which might introduce new fields in future).