Documentation ¶
Constants ¶
const (
	// HungJobDuration is the duration of time since the last update to a job
	// before it is considered hung.
	HungJobDuration = 5 * time.Minute

	// HungJobExitTimeout is the duration of time that provisioners should allow
	// for a graceful exit upon cancellation due to failing to send an update to
	// a job.
	//
	// Provisioners should avoid keeping a job "running" for longer than this
	// time after failing to send an update to the job.
	HungJobExitTimeout = 3 * time.Minute

	// MaxJobsPerRun is the maximum number of hung jobs that the detector will
	// terminate in a single run.
	MaxJobsPerRun = 10
)
Variables ¶
var HungJobLogMessages = []string{
	"",
	"====================",
	"Coder: Build has been detected as hung for 5 minutes and will be terminated.",
	"====================",
	"",
}
HungJobLogMessages are written to provisioner job logs when a job is hung and terminated.
Functions ¶
This section is empty.
Types ¶
type Detector ¶
type Detector struct {
// contains filtered or unexported fields
}
Detector automatically detects hung provisioner jobs, sends messages into the build log and terminates them as failed.
func New ¶
func New(ctx context.Context, db database.Store, pub pubsub.Pubsub, log slog.Logger, tick <-chan time.Time) *Detector
New returns a new hang detector.
func (*Detector) Start ¶
func (d *Detector) Start()
Start will cause the detector to detect and unhang provisioner jobs on every tick from its channel. It will stop when its context is Done, or when its channel is closed.
Start should only be called once.
func (*Detector) WithStatsChannel ¶
WithStatsChannel will cause the detector to push a Stats value to ch after every tick. This push is blocking, so if ch is not read, the detector will hang. This should only be used in tests.
type Stats ¶
type Stats struct {
	// TerminatedJobIDs contains the IDs of all jobs that were detected as hung and
	// terminated.
	TerminatedJobIDs []uuid.UUID

	// Error is the fatal error that occurred during the last run of the
	// detector, if any. Error may be set to AcquireLockError if the detector
	// failed to acquire a lock.
	Error error
}
Stats contains statistics about the last run of the detector.