8 issues found across 18 files
Prompt for AI agents (all 8 issues)
Understand the root cause of the following 8 issues and fix them.
<file name="docs/alerts/alert-configuration-syntax/variables-and-special-symbols.md">
<violation number="1" location="docs/alerts/alert-configuration-syntax/variables-and-special-symbols.md:5">
The `:::tip` block that explains why status values are 1/3/4 is never closed before the following `:::note`, so the admonition markup is unbalanced and renders incorrectly.</violation>
<violation number="2" location="docs/alerts/alert-configuration-syntax/variables-and-special-symbols.md:127">
`calc` converts `$mem_raw` into a boolean, so `$this` is never above 4M and the example alert can never fire.</violation>
</file>
<file name="docs/alerts/alert-configuration-syntax/calculations-and-transformations.md">
<violation number="1" location="docs/alerts/alert-configuration-syntax/calculations-and-transformations.md:90">
The swap alert example labels the result as `MB/s`, but the stock alert actually reports a percentage of RAM swapped out, so the documented units are incorrect.</violation>
<violation number="2" location="docs/alerts/alert-configuration-syntax/calculations-and-transformations.md:91">
The documented warning threshold (`$this > 200`) does not match the real stock alert, which warns once the percentage exceeds roughly 20–30; documenting the wrong cutoff misleads anyone copying the example.</violation>
<violation number="3" location="docs/alerts/alert-configuration-syntax/calculations-and-transformations.md:92">
This example adds a critical threshold that the underlying stock alert does not define, so the doc is attributing behavior that doesn’t exist.</violation>
</file>
<file name="docs/alerts/alert-configuration-syntax/optional-metadata.md">
<violation number="1" location="docs/alerts/alert-configuration-syntax/optional-metadata.md:176">
The workflow example uses `type: latency`, but `latency` is a `class` value; using it under `type` contradicts the taxonomy defined earlier in the document and misdirects readers.</violation>
<violation number="2" location="docs/alerts/alert-configuration-syntax/optional-metadata.md:193">
Section 3.6.6 Step 2 recommends `type` values like `latency`/`error`, contradicting the earlier definition of `type` as a functional domain (System, Database, etc.), which will confuse alert authors.</violation>
</file>
<file name="docs/alerts/creating-alerts-pages/creating-and-editing-alerts-via-config-files.md">
<violation number="1" location="docs/alerts/creating-alerts-pages/creating-and-editing-alerts-via-config-files.md:166">
Chart IDs are documented as never using underscores even though real Netdata chart IDs (for example `disk_space._run`) contain underscores; this misleads readers trying to locate the correct `chart` name.</violation>
</file>
Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR
| Alert expressions in Netdata can reference variables that represent metric values, alert state, time information, and chart metadata. Understanding which variables are available and how to discover them is essential for writing effective alerts. | ||
| :::tip |
The :::tip block that explains why status values are 1/3/4 is never closed before the following :::note, so the admonition markup is unbalanced and renders incorrectly.
Prompt for AI agents
Address the following comment on docs/alerts/alert-configuration-syntax/variables-and-special-symbols.md at line 5:
<comment>The `:::tip` block that explains why status values are 1/3/4 is never closed before the following `:::note`, so the admonition markup is unbalanced and renders incorrectly.</comment>
<file context>
@@ -0,0 +1,607 @@
+
+Alert expressions in Netdata can reference variables that represent metric values, alert state, time information, and chart metadata. Understanding which variables are available and how to discover them is essential for writing effective alerts.
+
+:::tip
+
+Refer to this section when you're writing `calc`, `warn`, or `crit` expressions and need to know which variables exist, debugging alerts that reference missing or incorrect variable names, building alerts that combine multiple dimensions or chart metadata, or using the `alarm_variables` API to explore what's available on a chart.
</file context>
| **Example:** | ||
| ```conf | ||
| on: system.mem | ||
| calc: $mem_raw > 5000000 # Raw collected value in KiB |
calc converts $mem_raw into a boolean, so $this is never above 4M and the example alert can never fire.
Prompt for AI agents
Address the following comment on docs/alerts/alert-configuration-syntax/variables-and-special-symbols.md at line 127:
<comment>`calc` converts `$mem_raw` into a boolean, so `$this` is never above 4M and the example alert can never fire.</comment>
<file context>
@@ -0,0 +1,607 @@
+**Example:**
+```conf
+on: system.mem
+calc: $mem_raw > 5000000 # Raw collected value in KiB
+warn: $this > 4000000
+```
</file context>
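The finding above can be reproduced outside Netdata. The following is a minimal Python sketch, assuming (as the review comment states) that a comparison inside `calc` yields a boolean-like 1 or 0 that then becomes `$this`. The function names are hypothetical and only model the two expression lines; this is not Netdata's evaluator.

```python
def calc_as_comparison(mem_raw: float) -> float:
    """Model `calc: $mem_raw > 5000000`: the comparison result becomes $this."""
    return 1.0 if mem_raw > 5_000_000 else 0.0


def warn_fires(this: float) -> bool:
    """Model `warn: $this > 4000000`."""
    return this > 4_000_000


# However large the raw value is, $this is only ever 0 or 1,
# so the warn threshold is out of reach.
for mem_raw in (1_000_000, 6_000_000, 64_000_000):
    this = calc_as_comparison(mem_raw)
    print(f"mem_raw={mem_raw} this={this} warn={warn_fires(this)}")
```

Under the same assumption, keeping `calc` numeric (or dropping the comparison from it) and leaving the comparison to `warn` would let `$this` carry the raw value and reach the threshold.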
| calc: $this / 1024 * 100 / ( $system.ram.used + $system.ram.cached + $system.ram.free ) | ||
| units: MB/s | ||
| warn: $this > 200 | ||
| crit: $this > 400 |
This example adds a critical threshold that the underlying stock alert does not define, so the doc is attributing behavior that doesn’t exist.
Prompt for AI agents
Address the following comment on docs/alerts/alert-configuration-syntax/calculations-and-transformations.md at line 92:
<comment>This example adds a critical threshold that the underlying stock alert does not define, so the doc is attributing behavior that doesn’t exist.</comment>
<file context>
@@ -0,0 +1,352 @@
+ calc: $this / 1024 * 100 / ( $system.ram.used + $system.ram.cached + $system.ram.free )
+ units: MB/s
+ warn: $this > 200
+ crit: $this > 400
+```
+
</file context>
| lookup: sum -30m unaligned absolute of out | ||
| calc: $this / 1024 * 100 / ( $system.ram.used + $system.ram.cached + $system.ram.free ) | ||
| units: MB/s | ||
| warn: $this > 200 |
The documented warning threshold ($this > 200) does not match the real stock alert, which warns once the percentage exceeds roughly 20–30; documenting the wrong cutoff misleads anyone copying the example.
Prompt for AI agents
Address the following comment on docs/alerts/alert-configuration-syntax/calculations-and-transformations.md at line 91:
<comment>The documented warning threshold (`$this > 200`) does not match the real stock alert, which warns once the percentage exceeds roughly 20–30; documenting the wrong cutoff misleads anyone copying the example.</comment>
<file context>
@@ -0,0 +1,352 @@
+lookup: sum -30m unaligned absolute of out
+ calc: $this / 1024 * 100 / ( $system.ram.used + $system.ram.cached + $system.ram.free )
+ units: MB/s
+ warn: $this > 200
+ crit: $this > 400
+```
</file context>
| # From swap.conf | ||
| lookup: sum -30m unaligned absolute of out | ||
| calc: $this / 1024 * 100 / ( $system.ram.used + $system.ram.cached + $system.ram.free ) | ||
| units: MB/s |
The swap alert example labels the result as MB/s, but the stock alert actually reports a percentage of RAM swapped out, so the documented units are incorrect.
Prompt for AI agents
Address the following comment on docs/alerts/alert-configuration-syntax/calculations-and-transformations.md at line 90:
<comment>The swap alert example labels the result as `MB/s`, but the stock alert actually reports a percentage of RAM swapped out, so the documented units are incorrect.</comment>
<file context>
@@ -0,0 +1,352 @@
+# From swap.conf
+lookup: sum -30m unaligned absolute of out
+ calc: $this / 1024 * 100 / ( $system.ram.used + $system.ram.cached + $system.ram.free )
+ units: MB/s
+ warn: $this > 200
+ crit: $this > 400
</file context>
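The three swap findings all follow from what the `calc` line computes. Here is a small worked example in Python, assuming the `lookup` sum is in KiB and the `system.ram` dimensions are in MiB (units are not stated in the quoted snippet); the function name and the sample numbers are hypothetical.

```python
def swapped_out_percent(swap_out_kib: float, used_mib: float,
                        cached_mib: float, free_mib: float) -> float:
    """Mirror `$this / 1024 * 100 / ($used + $cached + $free)`."""
    total_ram_mib = used_mib + cached_mib + free_mib
    return swap_out_kib / 1024 * 100 / total_ram_mib


# 400 MiB swapped out over the window on a host with 8 GiB of RAM.
percent = swapped_out_percent(409_600, used_mib=4096, cached_mib=2048, free_mib=2048)
print(round(percent, 2))  # about 4.88
```

The result is a ratio scaled by 100, that is, a percentage of RAM rather than a throughput. Read that way, a cutoff of `$this > 200` would require swapping out twice the host's RAM within the window, which is why the review flags both the `MB/s` label and the thresholds.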
docs/alerts/creating-alerts-pages/creating-and-editing-alerts-via-config-files.md
This comment was marked as resolved.
Addressing Ralph's comments
Addressing Ralph's comments
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="docs/alerts/understanding-alerts/alert-types-alarm-vs-template.md">
<violation number="1" location="docs/alerts/understanding-alerts/alert-types-alarm-vs-template.md:83">
P2: The code block is labeled as `yaml` but Netdata health configuration files use a custom format, not YAML. This could mislead users into thinking they can use YAML syntax features. Consider using a generic code fence (no language) or a custom label like `conf` or `netdata`.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
docs/alerts/understanding-alerts/alert-types-alarm-vs-template.md
Addressing Ralph's comments
Addressing cubic's comment
…via-config-files.md Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
59bb8d1 to 106da90
10 issues found across 11 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="docs/alerts/controlling-alerts-noise.md">
<violation number="1">
P2: `($this) = 0` is a valid equality check, so the alert will still flip states whenever the metric equals zero; use expressions that always evaluate to false if you want the alert to stay inactive.</violation>
</file>
<file name="docs/alerts/troubleshooting-alerts.md">
<violation number="1">
P2: The corrected calc example still divides by zero because the division occurs before the ternary multiplier is applied. Wrap the division in the conditional so it only executes when the denominator is non-zero.</violation>
<violation number="2">
P2: The notification-troubleshooting command queries `.alerts` even though `/api/v1/alarms` exposes data under `.alarms`. This typo makes the example unusable—update the jq path to `.alarms.your_alert_name`.</violation>
</file>
<file name="docs/alerts/alert-examples.md">
<violation number="1">
P2: The “Disk I/O latency” example queries the `disk.io` `avgsz` dimension, which measures average I/O size, yet it is documented as latency with millisecond units—so anyone copying this alert will never detect slow disks and will see incorrect units in notifications. Use the latency dimension (e.g., `await`) or update the text to describe size correctly.</violation>
</file>
<file name="docs/alerts/cloud-alert-features.md">
<violation number="1">
P2: The silencing rule example uses `alerts: *`, which is invalid YAML because `*` expects an anchor name. Quote the asterisk (e.g., `alerts: '*'`) so the sample can be copied without parse errors.</violation>
</file>
<file name="docs/alerts/built-in-alerts.md">
<violation number="1">
P2: `disk_space_usage` thresholds are documented incorrectly. The stock alert warns at >80% used and goes critical at >98% used with <5 GiB free, not at “free <20% / <10%”.</violation>
</file>
<file name="docs/alerts/receiving-notifications.md">
<violation number="1">
P2: Remove the leading space from the PagerDuty recipient example so the documented configuration works when copied into `health_alarm_notify.conf`.</violation>
</file>
<file name="docs/alerts/architecture.md">
<violation number="1">
P2: The reload flow incorrectly states that `/api/v1/health` expects a POST. The documented command is a GET with `?cmd=reload`, so the diagram should reflect the actual request or readers will call the API with the wrong method.</violation>
</file>
<file name="docs/alerts/advanced-techniques.md">
<violation number="1">
P2: The hysteresis example doesn’t actually implement different exit thresholds—alerts clear as soon as CPU drops below the enter value, contradicting the documented behavior. Update the expressions so the condition remains true until the metric crosses the lower “clear” thresholds.</violation>
</file>
<file name="docs/alerts/apis-alerts-events.md">
<violation number="1">
P2: The Prometheus scrape example points to `/api/v1/alarms`, but that endpoint only returns JSON and cannot be ingested by Prometheus; use `/api/v1/allmetrics` with `format=prometheus` instead.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
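The divide-by-zero finding for `troubleshooting-alerts.md` is about evaluation order, which can be illustrated in Python. This is only an analogy under stated assumptions: Python raises an exception where Netdata's expression evaluator may instead yield a non-finite value, and both function names are hypothetical.

```python
def unguarded(numerator: float, denominator: float) -> float:
    # The division runs first; the conditional factor is applied too late to help.
    return numerator / denominator * (1 if denominator != 0 else 0)


def guarded(numerator: float, denominator: float) -> float:
    # The conditional wraps the division, so it is skipped for a zero denominator.
    return (numerator / denominator) if denominator != 0 else 0.0


print(guarded(10, 4))   # 2.5
print(guarded(10, 0))   # 0.0
try:
    unguarded(10, 0)
except ZeroDivisionError as exc:
    print("unguarded form still divides by zero:", exc)
```

The guarded form corresponds to the reviewer's advice to wrap the division in the conditional rather than multiplying its result by a conditional afterwards.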
Chapters added with proper subdirectory structure:
- 4: controlling-alerts-noise (5 files)
- 5: receiving-notifications (5 files)
- 6: alert-examples (6 files)
- 7: troubleshooting-alerts (6 files)
- 8: advanced-techniques (6 files)
- 9: apis-alerts-events (6 files)
- 10: cloud-alert-features (5 files)
- 11: built-in-alerts.md
- 12: best-practices.md
- 13: architecture.md

Update map.csv with new entries.
24 issues found across 51 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="docs/alerts/controlling-alerts-noise.md">
<violation number="1" location="docs/alerts/controlling-alerts-noise.md:84">
P3: Section title and description contradict the behavior of the example: the config actually suppresses notifications entirely. Update the heading/text to reflect that this technique only keeps the alert visible while preventing notifications.</violation>
</file>
<file name="docs/alerts/apis-alerts-events.md">
<violation number="1" location="docs/alerts/apis-alerts-events.md:259">
P2: Prometheus cannot scrape `/api/v1/alarms` because it emits JSON rather than the required text exposition format; point users to Netdata’s Prometheus endpoint (e.g., `/api/v1/allmetrics?format=prometheus`) or a JSON exporter instead.</violation>
</file>
<file name="docs/alerts/troubleshooting-alerts.md">
<violation number="1" location="docs/alerts/troubleshooting-alerts.md:255">
P2: The troubleshooting command queries `.alerts` on the `/api/v1/alarms` response, but the API exposes the `alarms` object; this typo makes the example unusable.</violation>
</file>
<file name="docs/alerts/cloud-alert-features/index.md">
<violation number="1" location="docs/alerts/cloud-alert-features/index.md:15">
P3: The “What’s Next” block should include links for every subsection introduced above; 10.3 Alert Deduplication and 10.4 Room-Based Alerting are missing, making the navigation incomplete.</violation>
</file>
<file name="docs/alerts/receiving-notifications.md">
<violation number="1" location="docs/alerts/receiving-notifications.md:195">
P2: Remove the stray leading space from the PagerDuty recipient example so the documented value matches the intended recipient name.</violation>
<violation number="2" location="docs/alerts/receiving-notifications.md:198">
P3: Correct the typo so the PagerDuty integration is consistently named and readers can match it with the preceding configuration snippet.</violation>
</file>
<file name="docs/alerts/alert-examples/index.md">
<violation number="1" location="docs/alerts/alert-examples/index.md:28">
P3: The “What’s Next” list omits sections 6.4 and 6.5 that are introduced earlier, so the navigation is incomplete and can mislead readers.</violation>
</file>
<file name="docs/alerts/troubleshooting-alerts/notifications-not-sent.md">
<violation number="1" location="docs/alerts/troubleshooting-alerts/notifications-not-sent.md:8">
P2: The jq example targets `.alerts.your_alert_name`, but the Alarms API response nests data under `.alarms`, so the documented command will always return `null`.</violation>
</file>
<file name="docs/alerts/advanced-techniques/performance.md">
<violation number="1" location="docs/alerts/advanced-techniques/performance.md:7">
P2: The table misstates how the `every` interval impacts CPU usage. Since `every` expresses the time between evaluations, lower values (more frequent checks) consume more CPU, as your example right below demonstrates. Please flip the statement to reflect that shorter intervals increase CPU load.</violation>
</file>
<file name="docs/alerts/cloud-alert-features/silencing-rules.md">
<violation number="1" location="docs/alerts/cloud-alert-features/silencing-rules.md:12">
P2: Quote the wildcard in `alerts: *`; unquoted `*` is treated as an alias and makes the YAML example invalid.</violation>
</file>
<file name="docs/alerts/troubleshooting-alerts/always-critical.md">
<violation number="1" location="docs/alerts/troubleshooting-alerts/always-critical.md:28">
P2: The “RIGHT” calc example still performs the division before applying the guard, so it will divide by zero when `$var == $var2`. The fix example therefore remains incorrect and can cause the same runtime error the section is trying to address.</violation>
</file>
<file name="docs/alerts/troubleshooting-alerts/alert-never-triggers.md">
<violation number="1" location="docs/alerts/troubleshooting-alerts/alert-never-triggers.md:14">
P2: The jq filter queries `.alerts.your_alert_name`, but the `/api/v1/alarms` response nests data under `alarms`, so this command always returns null and prevents checking the alert’s value.</violation>
</file>
<file name="docs/alerts/best-practices.md">
<violation number="1" location="docs/alerts/best-practices.md:50">
P2: The example treats the raw 5xx request rate as a percentage, so the thresholds (`0.3`, `0.5`) and `${value}%` message are incorrect for `nginx.requests` and will massively under-report real error spikes.</violation>
</file>
<file name="docs/alerts/receiving-notifications/agent-parent-notifications.md">
<violation number="1" location="docs/alerts/receiving-notifications/agent-parent-notifications.md:65">
P2: Remove the leading space in `DEFAULT_RECIPIENT_PD` so the documented PagerDuty recipient matches the intended identifier and notifications are delivered.</violation>
</file>
<file name="docs/alerts/controlling-alerts-noise/silencing-cloud.md">
<violation number="1" location="docs/alerts/controlling-alerts-noise/silencing-cloud.md:28">
P2: The sample silencing rule won’t parse because `*disk*` is treated as a YAML alias; quote the pattern so the example configuration is valid.</violation>
</file>
<file name="docs/alerts/advanced-techniques/custom-actions.md">
<violation number="1" location="docs/alerts/advanced-techniques/custom-actions.md:31">
P2: The PagerDuty integration example omits the mandatory `payload` object, so the documented curl command fails against the Events API v2. Include the required payload fields (summary, severity, source, etc.).</violation>
</file>
<file name="docs/alerts/receiving-notifications/controlling-recipients.md">
<violation number="1" location="docs/alerts/receiving-notifications/controlling-recipients.md:9">
P2: `severity` is indented under the `integration` scalar, producing invalid YAML and misleading configuration guidance. Align `severity` with `integration` so the example parses correctly.</violation>
<violation number="2" location="docs/alerts/receiving-notifications/controlling-recipients.md:31">
P2: `role` is incorrectly indented beneath the `integration` scalar, so the YAML example is syntactically invalid. Align `role` with `integration` so the configuration example is usable.</violation>
</file>
<file name="docs/alerts/advanced-techniques/hysteresis.md">
<violation number="1" location="docs/alerts/advanced-techniques/hysteresis.md:18">
P2: The example alert never maintains hysteresis—the `$status` checks make the condition false right after it fires, so it immediately clears instead of waiting for the stated 70%/85% lower thresholds.</violation>
</file>
<file name="docs/alerts/built-in-alerts.md">
<violation number="1" location="docs/alerts/built-in-alerts.md:20">
P2: `ram_available`’s context and thresholds do not match the stock alert: it watches `mem.available` and only warns below ~15% (no CRIT).</violation>
<violation number="2" location="docs/alerts/built-in-alerts.md:21">
P2: `disk_space_usage` row reverses the stock alert logic (it triggers when usage exceeds ~80–90% and CRIT also requires <5 GiB free).</violation>
</file>
<file name="docs/alerts/controlling-alerts-noise/disabling-alerts.md">
<violation number="1" location="docs/alerts/controlling-alerts-noise/disabling-alerts.md:33">
P2: The configuration snippet that supposedly disables a stock alert never sets `enabled: no`, so following it does nothing—stock alarms stay enabled unless that flag is overridden.</violation>
</file>
<file name="docs/alerts/advanced-techniques.md">
<violation number="1" location="docs/alerts/advanced-techniques.md:45">
P2: The provided hysteresis example never implements the stated 70%/85% exit thresholds, so the “advanced” pattern clears immediately instead of providing hysteresis.</violation>
<violation number="2" location="docs/alerts/advanced-techniques.md:66">
P2: The “More Complex Hysteresis” example omits the lower exit thresholds entirely, so it cannot stay in WARNING until 60% or CRITICAL until 80% as described.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
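The `every` finding for `advanced-techniques/performance.md` reduces to simple arithmetic. Here is a minimal Python sketch, assuming each evaluation costs roughly the same amount of CPU regardless of the interval; the helper name is hypothetical.

```python
def evaluations_per_hour(every_seconds: int) -> float:
    """Number of times an alert with `every: <N>s` is evaluated in one hour."""
    return 3600 / every_seconds


# Shorter intervals mean more evaluations, and therefore more CPU time.
for every_seconds in (10, 60, 300):
    print(f"every {every_seconds}s -> {evaluations_per_hour(every_seconds):.0f} evaluations/hour")
```

Under that assumption, moving from `every: 1m` to `every: 10s` multiplies the evaluation count by six, which is the direction the corrected table should describe.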
| ## 7.5.1 Is It Evaluation or Notification? | ||
| ```bash | ||
| curl -s "http://localhost:19999/api/v1/alarms" | jq '.alerts.your_alert_name' |
P2: The jq example targets .alerts.your_alert_name, but the Alarms API response nests data under .alarms, so the documented command will always return null.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At docs/alerts/troubleshooting-alerts/notifications-not-sent.md, line 8:
<comment>The jq example targets `.alerts.your_alert_name`, but the Alarms API response nests data under `.alarms`, so the documented command will always return `null`.</comment>
<file context>
@@ -0,0 +1,31 @@
+## 7.5.1 Is It Evaluation or Notification?
+
+```bash
+curl -s "http://localhost:19999/api/v1/alarms" | jq '.alerts.your_alert_name'
+```
+
</file context>
- curl -s "http://localhost:19999/api/v1/alarms" | jq '.alerts.your_alert_name'
+ curl -s "http://localhost:19999/api/v1/alarms" | jq '.alarms.your_alert_name'
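To see why the `.alerts` path silently returns `null`, the lookup can be mimicked in Python against a hand-written response. The JSON below is a hypothetical, heavily trimmed stand-in for an `/api/v1/alarms` body; only the top-level `alarms` key is taken from the review comment, and the remaining fields are illustrative.

```python
import json

# Hypothetical, trimmed response body; a real response carries many more fields.
sample_response = json.loads("""
{
  "hostname": "example-host",
  "alarms": {
    "your_alert_name": {"status": "WARNING", "value": 83.2}
  }
}
""")

# Mirrors jq '.alerts.your_alert_name': the key does not exist, so the result is None.
wrong = sample_response.get("alerts", {}).get("your_alert_name")

# Mirrors jq '.alarms.your_alert_name': the alert object is found.
right = sample_response.get("alarms", {}).get("your_alert_name")

print(wrong)
print(right["status"])
```

Like jq, the lookup on a missing key does not raise an error; it just yields an empty result, which is easy to misread as "the alert does not exist".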
docs/alerts/controlling-alerts-noise/controlling-alerts-noise.md
…chapter 1 pattern)
…ers now have subdirectories)
…tices, architecture)
…tching chapters 4-10 pattern)
2 issues found across 32 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="docs/alerts/architecture/configuration-layers.md">
<violation number="1" location="docs/alerts/architecture/configuration-layers.md:9">
P2: The new paragraph reverses the documented precedence: stock alerts load first, not last, so the text misleads users about which layer overrides which.</violation>
</file>
<file name="docs/alerts/best-practices/sli-slo-alerts.md">
<violation number="1" location="docs/alerts/best-practices/sli-slo-alerts.md:17">
P2: This sentence contradicts the preceding guidance by triggering the alert only after the SLO is already violated. Clarify that the alert threshold should be stricter than the SLO so responders get notified before exhausting the error budget.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
1 issue found across 16 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="docs/alerts/built-in-alerts/adjusting-stock-alerts.md">
<violation number="1" location="docs/alerts/built-in-alerts/adjusting-stock-alerts.md:37">
P2: The provided disable example never actually disables the alert because it omits `enabled: no` (or another disabling directive), so following it leaves the stock alert active.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
2 issues found across 1 file (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="docs/alerts/advanced-techniques/hysteresis.md">
<violation number="1" location="docs/alerts/advanced-techniques/hysteresis.md:18">
P2: The WARN hysteresis clause uses `> 70`, so the alarm clears at exactly 70%, contradicting the described “clear when below 70%” behavior. Use `>= 70` to keep WARNING active until the value drops below 70%.</violation>
<violation number="2" location="docs/alerts/advanced-techniques/hysteresis.md:19">
P2: The CRITICAL hysteresis clause uses `> 85`, so it clears at exactly 85% even though the text says it should clear only when below 85%. Change the holding condition to `>= 85` to match the documented behavior.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
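The boundary behavior behind these two findings can be checked with a small simulation. This is a minimal Python sketch that models only the WARNING clause (enter above 80, hold down to 70) and ignores any interaction with CRITICAL; the function names and the sample series are hypothetical, and this is not how Netdata evaluates `$status` internally.

```python
def warn_holds(value: float, currently_warning: bool, inclusive: bool) -> bool:
    """Enter WARNING above 80; once in WARNING, hold while above (or at) 70."""
    if not currently_warning:
        return value > 80
    return value >= 70 if inclusive else value > 70


def run(series, inclusive):
    """Feed a series of values through the clause and record the WARNING state."""
    warning = False
    states = []
    for value in series:
        warning = warn_holds(value, warning, inclusive)
        states.append(warning)
    return states


series = [75, 85, 78, 70, 69, 75]
print(run(series, inclusive=False))  # [False, True, True, False, False, False]
print(run(series, inclusive=True))   # [False, True, True, True, False, False]
```

With `> 70` the state clears on the sample that is exactly 70, while `>= 70` holds it until the value drops to 69. In both runs the final 75 does not re-enter WARNING, which is the hysteresis gap doing its job.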
| lookup: average -5m of user,system | ||
| every: 1m | ||
| warn: (($this > 80) && ($status != WARNING)) || (($this > 70) && ($status == WARNING)) | ||
| crit: (($this > 95) && ($status != CRITICAL)) || (($this > 85) && ($status == CRITICAL)) |
P2: The CRITICAL hysteresis clause uses > 85, so it clears at exactly 85% even though the text says it should clear only when below 85%. Change the holding condition to >= 85 to match the documented behavior.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At docs/alerts/advanced-techniques/hysteresis.md, line 19:
<comment>The CRITICAL hysteresis clause uses `> 85`, so it clears at exactly 85% even though the text says it should clear only when below 85%. Change the holding condition to `>= 85` to match the documented behavior.</comment>
<file context>
@@ -15,8 +15,8 @@ template: cpu_hysteresis
- warn: ($this > 80) && ($status != WARNING)
- crit: ($this > 95) && ($status != CRITICAL)
+ warn: (($this > 80) && ($status != WARNING)) || (($this > 70) && ($status == WARNING))
+ crit: (($this > 95) && ($status != CRITICAL)) || (($this > 85) && ($status == CRITICAL))
</file context>
```suggestion
crit: (($this > 95) && ($status != CRITICAL)) || (($this >= 85) && ($status == CRITICAL))
```
| on: system.cpu | ||
| lookup: average -5m of user,system | ||
| every: 1m | ||
| warn: (($this > 80) && ($status != WARNING)) || (($this > 70) && ($status == WARNING)) |
P2: The WARN hysteresis clause uses > 70, so the alarm clears at exactly 70%, contradicting the described “clear when below 70%” behavior. Use >= 70 to keep WARNING active until the value drops below 70%.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At docs/alerts/advanced-techniques/hysteresis.md, line 18:
<comment>The WARN hysteresis clause uses `> 70`, so the alarm clears at exactly 70%, contradicting the described “clear when below 70%” behavior. Use `>= 70` to keep WARNING active until the value drops below 70%.</comment>
<file context>
@@ -15,8 +15,8 @@ template: cpu_hysteresis
every: 1m
- warn: ($this > 80) && ($status != WARNING)
- crit: ($this > 95) && ($status != CRITICAL)
+ warn: (($this > 80) && ($status != WARNING)) || (($this > 70) && ($status == WARNING))
+ crit: (($this > 95) && ($status != CRITICAL)) || (($this > 85) && ($status == CRITICAL))
</file context>
```suggestion
warn: (($this > 80) && ($status != WARNING)) || (($this >= 70) && ($status == WARNING))
```
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="docs/alerts/understanding-alerts/what-is-a-netdata-alert.md">
<violation number="1" location="docs/alerts/understanding-alerts/what-is-a-netdata-alert.md:103">
P2: Saying "use template for all cases" contradicts the preceding explanation that alarms are for single chart instances while templates target all matching contexts. This misguides readers into thinking alarms should never be used.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
c0ac189 to d9fe4c7
Too many commits here, so I opened a new, fresh PR.
This commit contains fixes from a thorough review of all 74 pages across 13 chapters of the alerting documentation rewrite (PR netdata#21333).

## Critical Fixes
- **Configuration precedence**: Fixed incorrect claim that stock configs load first. User config files load FIRST; stock files with same name are EXCLUDED entirely. (managing-stock-vs-custom-alerts.md, configuration-layers.md)
- **Operator precedence**: Fixed wrong precedence table. AND/OR have SAME precedence level (not different levels as documented). (expressions-operators-functions.md)
- **Fabricated variable**: Removed $collected_total_raw which does NOT exist in codebase. (variables-and-special-symbols.md)
- **Fabricated CLI command**: Removed non-existent `netdatacli health configuration`. (configuration-layers.md)
- **Fabricated YAML syntax**: Cloud silencing rules and room-based targeting use web UI forms, not YAML configuration. (silencing-rules.md, room-based.md)
- **Fabricated alert examples**: All examples in alert-examples/ chapter used non-existent charts, dimensions, and templates. Replaced with real stock alerts.
- **Fabricated API endpoints**: Cloud has no public API for events. (cloud-events.md)
- **Wrong status constants**: Added missing RAISED status, fixed values ($WARNING=3, $CRITICAL=4). (Multiple files)

## Other Fixes
- Fixed API endpoint /api/v1/alerts -> /api/v1/alarms
- Fixed reload-health command behavior (returns 0 regardless of errors)
- Added missing cross-references between sections
- Fixed section numbering (13.6 -> 13.5)
- Added RAISED state to state machine documentation
- Clarified notification delay vs status change timing

## Review Summary
- All 13 chapters reviewed (74 files total)
- 38% scored 45-50 (excellent), 49% scored 40-44 (good)
- Documentation is now technically accurate and merge-ready
- Fix disk.chart syntax: change disk.space./ to disk_space./ (chart IDs use underscore, contexts use dot)
- Fix alarm vs template: use template: for context-based rules, alarm: for chart-specific rules
- Add index pages for built-in-alerts, best-practices, and architecture

Ref: netdata#21333
Hey @ralphm, @ktsaou, @sashwathn, @shyamvalsan
Summary by cubic
Rebuilt the Alerts docs into a complete, structured guide. Covers concepts, creation (Cloud and file-based), operations, advanced techniques, APIs, Cloud features, best practices, and architecture.
New Features
Refactors
Written for commit d9fe4c7. Summary will update on new commits.