"Graceful" shutdown with SIGTERM appears to interrupt Terraform provider #14433

Closed
@aaronlehmann

Description

Sending SIGTERM to the coder server is supposed to trigger a graceful shutdown that drains build jobs before exiting. However, it seems like when a build job is running at the time SIGTERM is received, the job gets interrupted anyway:

Stop caught, waiting for provisioner jobs to complete and gracefully exiting. Use ctrl+\ to force quit
Shutting down API server...

2024-08-23 15:12:17.146 [info]  provisionerd-40d0ef3f-5f61-40ea-838a-45d20073363d-3.runner: workspace provisioner job logged  job_id=4b457a13-609f-413b-bf61-fd29bf86bebd  template_name=workspace-v1  template_version=zealous_borg5  workspace_build_id=60a52a9c-e60b-4a0a-85f8-7eb3a1775151  workspace_id=d8b32732-8313-47a1-b12e-61a5be6ea289  workspace_name=[redacted]  workspace_owner=[redacted]  workspace_transition=start  level=INFO  workspace_build_id=60a52a9c-e60b-4a0a-85f8-7eb3a1775151 ...
    output= Interrupt received.
            Please wait for Terraform to exit or data loss may occur.
            Gracefully shutting down...
2024-08-23 15:12:17.146 [info]  provisionerd-40d0ef3f-5f61-40ea-838a-45d20073363d-3.runner: workspace provisioner job logged  job_id=4b457a13-609f-413b-bf61-fd29bf86bebd  template_name=workspace-v1  template_version=zealous_borg5  workspace_build_id=60a52a9c-e60b-4a0a-85f8-7eb3a1775151  workspace_id=d8b32732-8313-47a1-b12e-61a5be6ea289  workspace_name=[redacted]  workspace_owner=[redacted]  workspace_transition=start  level=INFO  output="Stopping operation..."  workspace_build_id=60a52a9c-e60b-4a0a-85f8-7eb3a1775151
2024-08-23 15:12:17.146 [info]  provisionerd-40d0ef3f-5f61-40ea-838a-45d20073363d-3.runner: workspace provisioner job logged  job_id=4b457a13-609f-413b-bf61-fd29bf86bebd  template_name=workspace-v1  template_version=zealous_borg5  workspace_build_id=60a52a9c-e60b-4a0a-85f8-7eb3a1775151  workspace_id=d8b32732-8313-47a1-b12e-61a5be6ea289  workspace_name=[redacted]  workspace_owner=[redacted]  workspace_transition=start  level=INFO  output="netflix_ec2.dev: Modifications errored after 24s"  workspace_build_id=60a52a9c-e60b-4a0a-85f8-7eb3a1775151

This occurred with systemd configured to send the coder server SIGTERM and wait 10 minutes before following up with a kill signal. However, the interrupt and the "Stopping operation..." log message appear to be immediate, well before the 10-minute timeout. The provider log also showed that its operation was cancelled partway through.

KillSignal=SIGTERM
SendSIGKILL=yes
TimeoutStopSec=10min
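
One thing that may be worth ruling out (an assumption, not something confirmed in this report): systemd's default KillMode=control-group delivers the stop signal to every process in the unit's control group, not only to the main process. If Terraform runs as a child of the coder server inside the same unit, it would receive SIGTERM at the same moment as the server, which would be consistent with the immediate "Interrupt received." output above. A sketch of a unit fragment that signals only the main process first:

```ini
[Service]
# Assumption: terraform runs as a child process inside this unit's cgroup.
# KillMode=mixed sends KillSignal to the main process only; the SIGKILL
# that follows TimeoutStopSec still goes to all remaining processes.
KillMode=mixed
KillSignal=SIGTERM
SendSIGKILL=yes
TimeoutStopSec=10min
```

Whether this changes the behavior depends on how the server itself propagates the signal to its provisioners, so it is a diagnostic step rather than a confirmed fix.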

This is a high priority issue for us as it limits our ability to safely deploy updates.

Labels: must-do (Issues that must be completed by the end of the Sprint. Or else. Only humans may set this.)