

How to configure alerts for a job

This page shows you how to configure alerts for Scaleway Serverless Jobs using Scaleway Cockpit and Grafana.

Before you start

To complete the actions presented below, you must have:

  • A Scaleway account logged into the console
  • Owner status or IAM permissions allowing you to perform actions in the intended Organization
  • Scaleway resources you can monitor
  • Created Grafana credentials with the Editor role
  • Enabled the alert manager
  • Added at least one contact in the Scaleway console or contact points in Grafana
  • Selected the Scaleway Alerting alert manager in Grafana

  1. Log in to Grafana using your credentials.

  2. Click the Grafana icon in the top left side of your screen to open the menu.

  3. Click the arrow next to Alerting on the left-side menu, then click Alert rules.

  4. Click + New alert rule.

  5. Enter a name for your alert.

  6. In the Define query and alert condition section, toggle Advanced options.

  7. Select the data source you want to configure alerts for. In this documentation, we use the Scaleway Metrics data source.

  8. In the Rule type subsection, click the Data source-managed tab.

    Important

    Data source-managed alert rules are configured and evaluated by the data source itself rather than by Grafana's built-in alerting system. This step is mandatory because Cockpit only supports data source-managed alerts, not Grafana-managed ones.

  9. In the query field next to the Loading metrics... > button, select the metric you want to configure an alert for. Refer to the table below for details on each alert for Serverless Jobs.

    AnyJobError

    Pending period: 5s
    Summary: Job run {{ $labels.resource_id }} is in error.
    Query and alert condition: (serverless_job_run:state_failed == 1) OR (serverless_job_run:state_internal_error == 1)
    Description: Job run {{ $labels.resource_id }} from the job definition {{ $labels.resource_name }} finished in error. Check the console to find out the error message.

    JobError

    Pending period: 5s
    Summary: Job run {{ $labels.resource_id }} is in error.
    Query and alert condition: (serverless_job_run:state_failed{resource_name="your-job-name-here"} == 1) OR (serverless_job_run:state_internal_error{resource_name="your-job-name-here"} == 1)
    Description: Job run {{ $labels.resource_id }} from the job definition {{ $labels.resource_name }} finished in error. Check the console to find out the error message.

    AnyJobHighCPUUsage

    Pending period: 10s
    Summary: High CPU usage for job run {{ $labels.resource_id }}.
    Query and alert condition: serverless_job_run:cpu_usage_seconds_total:rate30s / serverless_job_run:cpu_limit * 100 > 90
    Description: Job run {{ $labels.resource_id }} from the job definition {{ $labels.resource_name }} has been using more than {{ printf "%.0f" $value }}% of its available CPU for 10 seconds.

    JobHighCPUUsage

    Pending period: 10s
    Summary: High CPU usage for job run {{ $labels.resource_id }}.
    Query and alert condition: serverless_job_run:cpu_usage_seconds_total:rate30s{resource_name="your-job-name-here"} / serverless_job_run:cpu_limit{resource_name="your-job-name-here"} * 100 > 90
    Description: Job run {{ $labels.resource_id }} from the job definition {{ $labels.resource_name }} has been using more than {{ printf "%.0f" $value }}% of its available CPU for 10 seconds.

    AnyJobHighMemoryUsage

    Pending period: 10s
    Summary: High memory usage for job run {{ $labels.resource_id }}.
    Query and alert condition: (serverless_job_run:memory_usage_bytes / serverless_job_run:memory_limit_bytes) * 100 > 80
    Description: Job run {{ $labels.resource_id }} from the job definition {{ $labels.resource_name }} has been using more than {{ printf "%.0f" $value }}% of its available RAM for 10 seconds.

    JobHighMemoryUsage

    Pending period: 10s
    Summary: High memory usage for job run {{ $labels.resource_id }}.
    Query and alert condition: (serverless_job_run:memory_usage_bytes{resource_name="your-job-name-here"} / serverless_job_run:memory_limit_bytes{resource_name="your-job-name-here"}) * 100 > 80
    Description: Job run {{ $labels.resource_id }} from the job definition {{ $labels.resource_name }} has been using more than {{ printf "%.0f" $value }}% of its available RAM for 10 seconds.
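
    For reference, data source-managed rules roughly map onto the standard Prometheus rule-file format, so the fields in the table above correspond to fields of a rule definition. The sketch below shows the JobHighMemoryUsage alert in that form; the group name is a placeholder, and in practice Grafana builds the rule from what you enter in the console and in the following steps.

```yaml
# Minimal sketch of the JobHighMemoryUsage alert as a Prometheus-style rule.
# "serverless-jobs-alerts" is a placeholder group name.
groups:
  - name: serverless-jobs-alerts
    rules:
      - alert: JobHighMemoryUsage
        # Query and alert condition from the table above
        expr: (serverless_job_run:memory_usage_bytes{resource_name="your-job-name-here"} / serverless_job_run:memory_limit_bytes{resource_name="your-job-name-here"}) * 100 > 80
        # Pending period: the condition must hold for 10 seconds before the alert fires
        for: 10s
        annotations:
          summary: 'High memory usage for job run {{ $labels.resource_id }}.'
          description: 'Job run {{ $labels.resource_id }} from the job definition {{ $labels.resource_name }} is using more than {{ printf "%.0f" $value }}% of its available RAM.'
```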
  10. Make sure that the values for the labels you have selected correspond to those of the target resource.

  11. In the Set alert evaluation behavior section, specify how long the condition must be met before triggering the alert.

  12. Enter a name in the Namespace and Group fields to categorize and manage your alert rules. Rules that share the same group will use the same configuration, including the evaluation interval which determines how often the rule is evaluated (by default: every 1 minute). You can modify this interval later in the group settings.

    Note

    The evaluation interval is different from the pending period set in the previous step. The evaluation interval controls how often the rule is checked, while the pending period defines how long the condition must be continuously met before the alert fires.
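
    To illustrate the difference, here is a hypothetical group sketch in the same Prometheus-style rule format: the group-level interval is the evaluation interval, while each rule's for field is its pending period.

```yaml
# Hypothetical group settings illustrating evaluation interval vs. pending period.
groups:
  - name: serverless-jobs-alerts   # the Group name entered in this step
    interval: 1m                   # evaluation interval: how often every rule in the group is checked
    rules:
      - alert: AnyJobHighCPUUsage
        expr: serverless_job_run:cpu_usage_seconds_total:rate30s / serverless_job_run:cpu_limit * 100 > 90
        for: 10s                   # pending period: how long the condition must hold before the alert fires
```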

  13. In the Configure labels and notifications section, click + Add labels. A pop-up appears.

  14. Enter a label name and value, then click Save. You can skip this step if you want your alerts to be sent to the contacts you may already have created in the Scaleway console.

    Note

    In Grafana, notifications are sent by matching alerts to notification policies based on labels. This step is about deciding how alerts will reach you or your team (Slack, email, etc.) based on labels you attach to them. Then, you can set up rules that define who receives notifications in the Notification policies page. Find out how to configure notification policies in Grafana.
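
    As an illustration, labels attached to a rule might look like the following; the names and values (team, severity) are arbitrary examples, not predefined keys.

```yaml
# Arbitrary example labels on an alert rule.
# A notification policy matching, for example, team = backend can route these alerts
# to a specific contact point (Slack, email, etc.).
labels:
  team: backend
  severity: critical
```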

  15. Click Save rule and exit in the top right corner of your screen to save and activate your alert. Once your alert meets the requirements you have configured, you will receive an email to inform you that your alert has been triggered.
