Description
Description of the problem/feature request
Pods spawned by Job objects might stick around for a long time if they are not explicitly deleted after finishing what they are supposed to do. Users should be encouraged to deliberately set how long Pods created by Job objects should live, rather than leaving it to other clean-up mechanisms that might be triggered.
Description of the existing behavior vs. expected behavior
Job and CronJob Kubernetes objects spawn Pods to perform whatever job they are meant to execute.
In standalone Job objects, the Pod's TTL is controlled by the field ttlSecondsAfterFinished, which has no default value. When it is unset, the finished Pod won't be deleted automatically unless garbage collection thresholds are triggered. On nodes with large filesystems backing container storage, that may never happen before issues involving the gRPC message buffer size appear.
In managed Job objects (created by CronJob objects), setting ttlSecondsAfterFinished might interfere with successfulJobsHistoryLimit and failedJobsHistoryLimit from the CronJob. The final behaviour is determined by the stricter of the two settings, which can easily cause confusion and unexpected behaviour.
Therefore, reasonable linting targets are:
- Advise setting ttlSecondsAfterFinished for standalone Job objects whenever it's not set
- Advise unsetting ttlSecondsAfterFinished for managed Job objects whenever it's set
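The two targets above can be sketched as a single decision function. This is only an illustration of the proposed rule, not the implementation planned for the PR: the function name `ttl_advice`, the message wording, and the choice to treat a CronJob's `spec.jobTemplate.spec` as the "managed Job" are all assumptions made for the example. Manifests are assumed to be already parsed into plain dictionaries.

```python
from typing import Optional

TTL_FIELD = "ttlSecondsAfterFinished"


def ttl_advice(manifest: dict) -> Optional[str]:
    """Return a lint message for a Job or CronJob manifest, or None if it passes.

    Hypothetical helper: the name and messages are illustrative only.
    """
    kind = manifest.get("kind")
    spec = manifest.get("spec") or {}

    if kind == "Job":
        # Standalone Job: without a TTL, finished Pods are only removed
        # by other clean-up mechanisms, so advise setting the field.
        if spec.get(TTL_FIELD) is None:
            return f"standalone Job: set {TTL_FIELD}"
        return None

    if kind == "CronJob":
        # Managed Job (assumption): the Job template under spec.jobTemplate.spec.
        job_spec = (spec.get("jobTemplate") or {}).get("spec") or {}
        # A TTL here competes with the CronJob history limits, so advise unsetting it.
        if job_spec.get(TTL_FIELD) is not None:
            return f"managed Job: unset {TTL_FIELD} and rely on the CronJob history limits"
        return None

    # Any other kind is out of scope for this check.
    return None
```

Comparing against `None` rather than testing truthiness matters here, because a TTL of 0 is a value the user set explicitly and should count as "set".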
Additional context
I'll be creating a PR to implement the new check