How to configure alerts for a job

Reviewed on September 19, 2025

This page shows you how to configure alerts for Scaleway Serverless Jobs using Scaleway Cockpit and Grafana.

Before you start

To complete the actions presented below, you must have:

A Scaleway account logged into the console
Owner status or IAM permissions allowing you to perform actions in the intended Organization
Scaleway resources you can monitor
Created Grafana credentials with the Editor role
Enabled the alert manager
Added at least one contact in the Scaleway console or contact points in Grafana
Selected the Scaleway Alerting alert manager in Grafana

Log in to Grafana using your credentials.
Click the Grafana icon in the top left corner of your screen to open the menu.
Click the arrow next to Alerting on the left-side menu, then click Alert rules.
Click + New alert rule.
Enter a name for your alert.
In the Define query and alert condition section, toggle Advanced options.
Select the data source you want to configure alerts for. For the sake of this documentation, we are choosing the Scaleway Metrics data source.
In the Rule type subsection, click the Data source-managed tab.

Important
Data source-managed alert rules are configured and evaluated by the data source of your choice rather than by Grafana's built-in alerting system. This step is mandatory because Cockpit does not support Grafana-managed alerts, only alerts configured and evaluated by the data source itself.
In the query field next to the Loading metrics... > button, select the metric you want to configure an alert for. Refer to the table below for details on each alert for Serverless Jobs.

AnyJobError

Pending period
5s
Summary
Job run {{ $labels.resource_id }} is in error.
Query and alert condition
(serverless_job_run:state_failed == 1) OR (serverless_job_run:state_internal_error == 1)
Description
Job run {{ $labels.resource_id }} from the job definition {{ $labels.resource_name }} finished in error. Check the console to find the error message.

JobError

Pending period
5s
Summary
Job run {{ $labels.resource_id }} is in error.
Query and alert condition
(serverless_job_run:state_failed{resource_name="your-job-name-here"} == 1) OR (serverless_job_run:state_internal_error{resource_name="your-job-name-here"} == 1)
Description
Job run {{ $labels.resource_id }} from the job definition {{ $labels.resource_name }} finished in error. Check the console to find the error message.

AnyJobHighCPUUsage

Pending period
10s
Summary
High CPU usage for job run {{ $labels.resource_id }}.
Query and alert condition
serverless_job_run:cpu_usage_seconds_total:rate30s / serverless_job_run:cpu_limit * 100 > 90
Description
Job run {{ $labels.resource_id }} from the job definition {{ $labels.resource_name }} has been using more than {{ printf "%.0f" $value }}% of its available CPU for the last 10s.

JobHighCPUUsage

Pending period
10s
Summary
High CPU usage for job run {{ $labels.resource_id }}.
Query and alert condition
serverless_job_run:cpu_usage_seconds_total:rate30s{resource_name="your-job-name-here"} / serverless_job_run:cpu_limit{resource_name="your-job-name-here"} * 100 > 90
Description
Job run {{ $labels.resource_id }} from the job definition {{ $labels.resource_name }} has been using more than {{ printf "%.0f" $value }}% of its available CPU for the last 10s.

AnyJobHighMemoryUsage

Pending period
10s
Summary
High memory usage for job run {{ $labels.resource_id }}.
Query and alert condition
(serverless_job_run:memory_usage_bytes / serverless_job_run:memory_limit_bytes ) * 100 > 80
Description
Job run {{ $labels.resource_id }} from the job definition {{ $labels.resource_name }} has been using more than {{ printf "%.0f" $value }}% of its available RAM for the last 10s.

JobHighMemoryUsage

Pending period
10s
Summary
High memory usage for job run {{ $labels.resource_id }}.
Query and alert condition
(serverless_job_run:memory_usage_bytes{resource_name="your-job-name-here"} / serverless_job_run:memory_limit_bytes{resource_name="your-job-name-here"}) * 100 > 80
Description
Job run {{ $labels.resource_id }} from the job definition {{ $labels.resource_name }} has been using more than {{ printf "%.0f" $value }}% of its available RAM for the last 10s.
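
To make the threshold arithmetic in the memory queries concrete, here is a minimal Python sketch of the same calculation. It only illustrates the expression `(usage / limit) * 100 > 80`; the function names and sample values are hypothetical and are not part of Cockpit or Grafana.

```python
def memory_usage_percent(usage_bytes: float, limit_bytes: float) -> float:
    """Return memory usage as a percentage of the configured limit."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    return usage_bytes / limit_bytes * 100


def exceeds_threshold(
    usage_bytes: float, limit_bytes: float, threshold_percent: float = 80.0
) -> bool:
    """Mirror the strict '>' comparison used in the alert condition."""
    return memory_usage_percent(usage_bytes, limit_bytes) > threshold_percent


# Hypothetical job run: 900 MiB used out of a 1024 MiB limit.
MIB = 1024 ** 2
percent = memory_usage_percent(900 * MIB, 1024 * MIB)
rendered = "%.0f" % percent  # same rounding as printf "%.0f" in the description
```

With these sample values the condition is met, and the description template would display the rounded percentage. The CPU queries follow the same pattern with a 90 threshold.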
Make sure that the values for the labels you have selected correspond to those of the target resource.
In the Set alert evaluation behavior section, specify how long the condition must be met before triggering the alert.
Enter a name in the Namespace and Group fields to categorize and manage your alert rules. Rules that share the same group will use the same configuration, including the evaluation interval which determines how often the rule is evaluated (by default: every 1 minute). You can modify this interval later in the group settings.

Note
The evaluation interval is different from the pending period set in the Set alert evaluation behavior section. The evaluation interval controls how often the rule is checked, while the pending period defines how long the condition must be continuously met before the alert fires.
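
The interaction between the two settings can be sketched with a short Python simulation. This is a simplified model that assumes Prometheus-style rule evaluation (the alert becomes pending at the first evaluation where the condition is met, and fires at the first later evaluation where it has been met continuously for at least the pending period). The function name and inputs are hypothetical.

```python
from typing import Optional, Sequence


def first_firing_time(
    condition_results: Sequence[bool],
    evaluation_interval_s: int,
    pending_period_s: int,
) -> Optional[int]:
    """Return the time in seconds of the evaluation at which the alert fires.

    condition_results[i] is the outcome of the query at evaluation number i,
    which happens at time i * evaluation_interval_s. Returns None if the
    alert never fires.
    """
    pending_since: Optional[int] = None
    for index, condition_met in enumerate(condition_results):
        now = index * evaluation_interval_s
        if not condition_met:
            # The condition was interrupted, so the pending timer resets.
            pending_since = None
            continue
        if pending_since is None:
            pending_since = now
        if now - pending_since >= pending_period_s:
            return now
    return None


# 1-minute evaluation interval with a 5-second pending period:
# the condition is first seen at t=0, but the alert can only fire
# at the next evaluation, at t=60.
fires_at = first_firing_time([True, True, True], 60, 5)
```

Under this model, a pending period shorter than the evaluation interval still delays the alert by one full interval, because the rule is only checked once per interval.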
In the Configure labels and notifications section, click + Add labels. A pop-up appears.
Enter a label name and a value, then click Save. You can skip this step if you want your alerts to be sent to the contacts you may already have created in the Scaleway console.

Note
In Grafana, notifications are sent by matching alerts to notification policies based on labels. This step is about deciding how alerts will reach you or your team (Slack, email, etc.) based on labels you attach to them. Then, you can set up rules that define who receives notifications in the Notification policies page. Find out how to configure notification policies in Grafana.
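
The idea of label-based routing can be illustrated with a minimal Python sketch. It is a deliberately simplified model (exact-match labels only, first matching policy wins, with a default fallback) and does not reproduce Grafana's full notification policy behavior. The label names, values, and contact point names below are hypothetical.

```python
from typing import Dict, List, Tuple

# Each policy pairs a set of required labels with a contact point name.
Policy = Tuple[Dict[str, str], str]


def route_alert(
    alert_labels: Dict[str, str],
    policies: List[Policy],
    default_contact_point: str,
) -> str:
    """Return the contact point of the first policy whose labels all match."""
    for matchers, contact_point in policies:
        if all(alert_labels.get(name) == value for name, value in matchers.items()):
            return contact_point
    # No policy matched: fall back to the default contact point.
    return default_contact_point


# Hypothetical policies and alert labels.
policies: List[Policy] = [
    ({"team": "data", "severity": "critical"}, "data-team-oncall"),
    ({"team": "data"}, "data-team-slack"),
]
target = route_alert(
    {"team": "data", "alertname": "JobError"}, policies, "default-email"
)
```

In this example the alert carries the label `team=data` but no `severity` label, so it skips the first policy and is routed to the second one.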
Click Save rule and exit in the top right corner of your screen to save and activate your alert. Once your alert meets the requirements you have configured, you will receive an email to inform you that your alert has been triggered.