
Conversation

@nikola-jokic
Collaborator

@nikola-jokic nikola-jokic commented Aug 15, 2025

As part of the effort to remove the volume as a dependency, this PR intends to fully replace the volumes by copying files over the exec API (a kubectl cp-style copy).
This change eliminates the need for workflow pods and container steps to execute on the same node as the runner (the case when ReadWriteOnce volumes are used). When ReadWriteMany volumes are used instead, affinities or a custom scheduler had to be configured, which was more complicated and not ideal.

Because in most environments the workflow pods need to be able to land on different nodes, we use the exec API with retries. This will increase the duration of the workflow, but it eliminates that whole set of issues.

To reduce the amount of data being copied, only the temp dir is copied for each run step, while the whole work directory is copied during setup.
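
For readers skimming the thread, here is a minimal sketch of what an exec-based directory copy looks like, assuming @kubernetes/client-node's Exec class and the tar-fs package (which the review summary below notes is added as a dependency). The function name and arguments are illustrative, not the hook's actual API.

```ts
// Sketch: stream a local directory into a running pod over the exec API,
// the same mechanism kubectl cp uses (local -> pod). Illustrative only.
import * as k8s from '@kubernetes/client-node'
import * as tar from 'tar-fs'

export async function copyDirIntoPod(
  kc: k8s.KubeConfig,
  namespace: string,
  podName: string,
  containerName: string,
  localDir: string,
  remoteDir: string
): Promise<void> {
  const exec = new k8s.Exec(kc)
  // Pack the local directory into a tar stream and pipe it to `tar -xf -`
  // running inside the target container.
  const source = tar.pack(localDir)
  await new Promise<void>((resolve, reject) => {
    exec
      .exec(
        namespace,
        podName,
        containerName,
        ['tar', '-xf', '-', '-C', remoteDir],
        process.stdout, // remote stdout
        process.stderr, // remote stderr
        source,         // tar stream fed to the remote command's stdin
        false,          // no TTY
        status => {
          // The status callback fires when the exec session reports completion.
          if (status.status === 'Success') resolve()
          else reject(new Error(status.message ?? 'copy failed'))
        }
      )
      .catch(reject)
  })
}
```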

@nikola-jokic nikola-jokic mentioned this pull request Aug 15, 2025

@austinpray-mixpanel austinpray-mixpanel left a comment


Before this gets too far: can you address the concerns with this approach I raised over in #160 (comment) ?

@zarko-a
Contributor

zarko-a commented Aug 18, 2025

Before this gets too far: can you address the concerns with this approach I raised over in #160 (comment) ?

I'll take a stab at responding as I'd really like to get this feature out as soon as possible :)

Cloning a volume via your cloud provider's API and then mounting it inside K8S is FAR more complicated than doing a simple copy via the exec API. My understanding is that the runner copies only the job "spec" (for lack of a better word) and maybe the nodejs binary to what used to be a shared volume. Although maybe node is actually copied from the init container; I don't have the full picture of Nikola's implementation yet. In any case, the size of this is relatively small and I don't see why it shouldn't be reliable. Doing a whole PV clone for <100MB of files seems like huge overkill. Potentially heavy operations like repo cloning actually happen in the workflow pod and wouldn't be copied using the kube exec API.

Most importantly, runner container hooks are written to be pretty generic and not to prefer one cloud provider over another.
Be careful what you wish for: even if they decided to implement something like you are suggesting, GCP/GKE would likely be the last to get support for it. Both AWS and Azure are bigger, and I'm sure GH has more customers on those two clouds than on GCP.

@austinpray-mixpanel

Hey @zarko-a! Yeah, thanks for braining this out with me.

my main concern was
"I have significant doubts that this will be a stable approach. At scale we observe even trivial use cases for the exec api (like exec into a pod and check for existence of a file on a cron) to fail for all sorts of reasons."

To expand on that:

  • Anecdotally, the exec API is super flaky. We experienced lots of random connection issues and timeouts when we issued execs 100s of times per day as part of our deploy workflows. This is anecdotal on Kube 1.25-1.27, though; we removed exec API stuff from the hot path around the time 1.27 was released.
    • I'm happy to burn some $$$ if we want to stress test this, like spin up hundreds of pod pairs and use this code to copy files between the pods.
  • Logically the exec API is dependent on control plane uptime, which is not 100%
    • For instance, in GKE land the control plane has a 99.5% and 99.95% monthly uptime SLA for zonal and regional clusters respectively. Intentional control plane upgrades and other things like that could also cause API downtime, which would fail worker setup.

👉 So at minimum I would expect this implementation to expect these execs to fail or be interrupted. Heavily integrate backoff retries or something like that.
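
(For illustration: a minimal backoff wrapper around an exec-based copy could look like the sketch below. The attempt count, delays, and the `copyDirIntoPod` helper are assumptions, not anything taken from this PR.)

```ts
// Sketch: retry a flaky exec-based operation with exponential backoff.
async function withRetries<T>(
  op: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 1000
): Promise<T> {
  let lastError: unknown
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await op()
    } catch (err) {
      lastError = err
      if (attempt < maxAttempts) {
        // Exponential backoff: 1s, 2s, 4s, 8s, ...
        await new Promise(r => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)))
      }
    }
  }
  throw lastError
}

// Usage: retry the copy if the exec connection drops or times out.
// await withRetries(() => copyDirIntoPod(kc, ns, pod, container, localDir, remoteDir))
```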

GCP/GKE would likely be the last to get support for it. Both AWS and Azure are bigger, and I'm sure GH has more customers on those two clouds than on GCP.

Well, yeah, if there was an ADR out for cloud-specific providers, my team would for sure contribute a GCP one in short order.

@anlesk

anlesk commented Sep 3, 2025

Adding my 5 cents here. As we have altered the hooks code on our end and implemented the copy via the k8s API, similar to how it was originally proposed in #160, we had to implement a retry mechanism, and reliability is still not 100%: sometimes the copy fails even after a number of retries, with or without a sleep/wait between attempts.

We see the retry kick in in roughly 20% of executions.

Copying the nodejs distro still takes a decent amount of time, and the overall lag for workflow container kickstart varies between 40s and 3min on our setup.

@nikola-jokic
Collaborator Author

Hey @anlesk,

Thank you for your feedback! That is one of the reasons I wanted to use an init container to copy the runner assets, so that we only have to copy the temporary directory. The _temp dir should be much smaller in size, resulting in fewer errors, but we will still add retries to ensure it works.
During the initial setup, we copy the workdir, which has the actions downloaded. That one is slightly larger, but the node assets would still be much larger than that.

@nikola-jokic
Collaborator Author

Quick update:

Retries are added. We are trying to minimize the number of files being copied by leveraging the init container as much as possible. It is challenging with container actions, where the workspace should be mounted and we don't know in advance what the workspace will look like at the time the container step is invoked.

But for most use cases, a single workspace copy at the time prepare-job is called should be okay, and each run-step basically copies only the temp dir to the container and back to the runner. This approach may lead to fewer issues.
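
(As a rough illustration of the "copy the temp dir back to the runner" direction, assuming the same Exec/tar-fs approach sketched earlier; the names are placeholders rather than the hook's real functions.)

```ts
// Sketch: stream a directory out of a pod back to the runner,
// the reverse of the copy above. Illustrative only.
import * as k8s from '@kubernetes/client-node'
import * as tar from 'tar-fs'

export async function copyDirFromPod(
  kc: k8s.KubeConfig,
  namespace: string,
  podName: string,
  containerName: string,
  remoteDir: string,
  localDir: string
): Promise<void> {
  const exec = new k8s.Exec(kc)
  // Whatever the remote `tar -cf -` writes to stdout gets unpacked into localDir.
  const sink = tar.extract(localDir)
  await new Promise<void>((resolve, reject) => {
    sink.on('error', reject)
    exec
      .exec(
        namespace,
        podName,
        containerName,
        ['tar', '-cf', '-', '-C', remoteDir, '.'],
        sink,           // remote stdout (a tar archive) -> local extract stream
        process.stderr, // remote stderr
        null,           // no stdin
        false,          // no TTY
        status => {
          // A real implementation would also wait for `sink` to finish flushing.
          if (status.status === 'Success') resolve()
          else reject(new Error(status.message ?? 'copy failed'))
        }
      )
      .catch(reject)
  })
}
```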

@ukhl

ukhl commented Sep 5, 2025

I was discussing this issue with someone and they mentioned it sounds similar to this problem AWS is looking to solve for their mountpoint-s3-csi-driver.
https://github.com/awslabs/mountpoint-s3-csi-driver/blob/0753c0635a38b68a4433683bef53b769ad2c7b40/docs/HEADROOM_FOR_MPPOD.md

This solution wouldn't get rid of the requirement that the runner and workflow pods be on the same node, but it does provide a potential solution for setting requests and limits on workflow pods and ensuring there is enough space on a node for these pods. I think everyone would prefer the flexibility of workflow pods going to whatever node has room for them, but if the issues people are raising about exec being too flaky end up holding true, or the spin-up time of these pods is heavily impacted by the copy solution, maybe this "headroom" pod solution is a decent alternative.

@nikola-jokic
Collaborator Author

Hey @ukhl,

There is no way we can afford to build on top of a cloud-specific solution. However, the intention of this repo is to provide a solution that should work in most cases and allow you to customize/modify the implementation to suit your needs.

The copy is needed because there are many user environments where read-write-many volumes simply don't exist. If we had the luxury of relying on read-write-many volumes, we could avoid all the headaches of scheduling the workflow pod on the same node as the runner: we would simply rely on the shared filesystem maintained by a driver, allowing workflow pods to be scheduled on nodes with enough capacity to handle them.

Since copying over exec is supported by Kubernetes itself, and many people have faced issues using the volume, we are trying our best to remove this dependency. I would personally love to avoid it, since we now have to make sure the file permissions are properly set, that the binaries needed to execute the action are available in arbitrary job containers, that user volume mounts are applied to the correct places, etc. But we can't. That is why using the exec API with retries, transferring the fewest files possible, does seem to be the best approach.

We truly appreciate the feedback, though! It is amazing to see this many people coming up with such helpful and thoughtful feedback and suggestions!

@ukhl

ukhl commented Sep 6, 2025

Hi @nikola-jokic, I think you might have jumped to conclusions based on the repository I linked to. The doc I linked isn't cloud-proprietary; it's a method of getting multiple pods to schedule together via pod affinity.

The idea here would be to schedule the runner pod and a dummy workflow pod at the same time. This pod would then be replaced with a real workflow pod if/when required.

This approach would be less efficient on resources, but only if you don't use workflow/step pods that often. It would solve the problem of trying to schedule a workflow/step pod on a node and there not being enough resources for it.
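
(To make the idea concrete, a hypothetical "headroom" placeholder pod could pin itself next to the runner with pod affinity along these lines. The labels, priority class, image, and resource sizes are made-up assumptions, not something the hooks ship.)

```ts
// Sketch of a placeholder "headroom" pod that co-schedules with a runner pod
// via pod affinity and reserves room for a future workflow pod.
import * as k8s from '@kubernetes/client-node'

const headroomPod: k8s.V1Pod = {
  metadata: {
    name: 'runner-headroom',
    labels: {app: 'runner-headroom'} // hypothetical label
  },
  spec: {
    // Land on the same node as the runner pod (assumed label app=github-runner).
    affinity: {
      podAffinity: {
        requiredDuringSchedulingIgnoredDuringExecution: [
          {
            topologyKey: 'kubernetes.io/hostname',
            labelSelector: {matchLabels: {app: 'github-runner'}}
          }
        ]
      }
    },
    // Low priority so a real workflow pod can preempt/replace it when needed.
    priorityClassName: 'headroom-low-priority', // hypothetical PriorityClass
    containers: [
      {
        name: 'pause',
        image: 'registry.k8s.io/pause:3.9',
        resources: {
          requests: {cpu: '1', memory: '2Gi'},
          limits: {cpu: '1', memory: '2Gi'}
        }
      }
    ]
  }
}
```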

@nikola-jokic nikola-jokic marked this pull request as ready for review September 24, 2025 13:38
Copilot AI review requested due to automatic review settings September 24, 2025 13:38
Contributor

Copilot AI left a comment


Pull Request Overview

This PR removes the dependency on Kubernetes persistent volumes by implementing a local filesystem approach for sharing data between the runner and pods. The changes simplify the architecture by using emptyDir volumes and file copying operations instead of persistent volumes.

  • Replaces persistent volume mounts with emptyDir volumes and file copying operations
  • Adds new functions for copying files to/from pods using tar streams
  • Refactors volume mounting logic to use container-specific volumes instead of shared work volumes

Reviewed Changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 5 comments.

Summary per file:

  • packages/k8s/tests/test-setup.ts — Removes volume creation logic and adds local directory setup for testing
  • packages/k8s/tests/run-container-step-test.ts — Updates tests to use prepareJob before running container steps
  • packages/k8s/tests/prepare-job-test.ts — Removes volume mount validation tests and updates user volume mount tests
  • packages/k8s/tests/k8s-utils-test.ts — Removes containerVolumes tests and renames writeEntryPointScript to writeRunScript
  • packages/k8s/src/k8s/utils.ts — Replaces volume mount logic with script generation for file operations
  • packages/k8s/src/k8s/index.ts — Adds file copying functions and removes persistent volume creation
  • packages/k8s/src/index.ts — Updates runScriptStep call signature
  • packages/k8s/src/hooks/run-script-step.ts — Adds file copying operations before and after script execution
  • packages/k8s/src/hooks/run-container-step.ts — Refactors to use pod-based execution with file copying
  • packages/k8s/src/hooks/prepare-job.ts — Updates job preparation to use file copying instead of volume mounts
  • packages/k8s/package.json — Adds tar-fs dependency and updates dev dependencies
  • examples/extension.yaml — Removes security context from extension example



kc.loadFromDefault()

const k8sApi = kc.makeApiClient(k8s.CoreV1Api)

Copilot AI Sep 24, 2025


The k8sStorageApi variable declaration was removed but may still be referenced elsewhere in the codebase. Ensure all references to k8sStorageApi are also removed or updated.

),
targetVolumePath: '/volume_mount',
sourceVolumePath: userVolumeMount,
targetVolumePath: '/__w/myvolume',

Copilot AI Sep 24, 2025


The targetVolumePath changed from '/volume_mount' to '/__w/myvolume' which appears to be a significant change in the volume mounting strategy. Ensure this path change is intentional and consistent with the new architecture.

Suggested change
targetVolumePath: '/__w/myvolume',
targetVolumePath: '/volume_mount',

}

export function listDirAllCommand(dir: string): string {
return `cd ${dir} && find . -not -path '*/_runner_hook_responses*' -printf '%b %p\n'`

Copilot AI Sep 24, 2025


The listDirAllCommand function uses string interpolation without input validation. The dir parameter should be validated or escaped to prevent command injection vulnerabilities.

Suggested change
return `cd ${dir} && find . -not -path '*/_runner_hook_responses*' -printf '%b %p\n'`
return `cd ${shlex.quote(dir)} && find . -not -path '*/_runner_hook_responses*' -printf '%b %p\n'`

Comment on lines +334 to +336
const child = spawn(commands[0], commands.slice(1), {
stdio: ['ignore', 'pipe', 'ignore']
})

Copilot AI Sep 24, 2025


The localCalculateOutputHash function spawns a child process using the first element of the commands array without validation. This could lead to command injection if the commands parameter is not properly validated by the caller.

@nikola-jokic nikola-jokic requested a review from a team as a code owner September 25, 2025 13:05
@nikola-jokic nikola-jokic merged commit 96c35e7 into main Oct 2, 2025
5 checks passed