-
Notifications
You must be signed in to change notification settings - Fork 892
fix: Add reaper to coder agent #2441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- The coder agent runs as PID 1 in some of our Docker workspaces. In such cases it is the responsibility of the init process to reap dead processes. Failing to do so can result in an inability to create new processes by running out of PIDs. This PR adds a reaper to our agent that is only spawned if it detects that it is PID1.
@kylecarbs what do you think about skipping the test if it's in CI, I'm a little concerned about having a forkexec in our tests... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice solution!
pattrs := &syscall.ProcAttr{ | ||
Dir: pwd, | ||
// Add our marker for identifying the child process. | ||
Env: append(os.Environ(), fmt.Sprintf("%s=true", agentEnvMark)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we filter this out from sub-process envs (i.e. when we launch shells)? It probably doesn't matter for correctness, more of a cleanliness thing.
We could also add CODER_AGENT_PID
as part of the sub-process envs for programmatic use (e.g. to enable pprof, kill -USR1 $CODER_AGENT_PID
). We'd need to do that always though, not only when forking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it's certainly cleaner to filter it out. The only reason I didn't is because I did not want implementation details of the reaper
package to leak outside. In other words, if you're developing the agent I don't want people to have to think about using a special function that filters out arcane environment variables that you don't understand.
I think if it becomes an issue we devote some time to figuring out something unobtrusive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already merged this but I think a cleaner solution is to have a --noreap
flag that we append to the forkexec cmd. Then we don't pollute subprocess envs or leak to the world whats happening under the hood. What do you think @mafredri
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense and I agree that's a clean solution @sreya. 👍🏻
@@ -50,6 +52,23 @@ func workspaceAgent() *cobra.Command { | |||
} | |||
defer logWriter.Close() | |||
logger := slog.Make(sloghuman.Sink(cmd.ErrOrStderr()), sloghuman.Sink(logWriter)).Leveled(slog.LevelDebug) | |||
|
|||
isLinux := runtime.GOOS == "linux" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to limit this to linux
only? I suppose this should work on Darwin and other BSD flavors too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was causing things to fail when it ran on macos. I don't think limiting it to linux
is going to be a big deal because the reaper is primarily going to be used for docker containers and on mac that's going to be run in a linux vm anyway.
The coder agent runs as PID 1 in some of our Docker workspaces.
In such cases it is the responsibility of the init process to
reap dead processes. Failing to do so can result in an inability
to create new processes by running out of PIDs.
This PR adds a reaper to our agent that is only spawned if it
detects that it is PID1.
resolves #2398