 ____   _  _____
|  _ \ / \|_   _|
| |_) / _ \ | |
|  __/ ___ \| |
|_| /_/   \_\_|
Prometheus Alert Testing tool
You may also be interested in PromCLI
go get github.com/kevinjqiu/pat
You must have Go 1.9+ and dep installed.
Check out this repo to $GOPATH/src/github.com/kevinjqiu/pat
and then:
cd $GOPATH/src/github.com/kevinjqiu/pat && make build
pat [options] <test_yaml_file_glob>
e.g.,
pat test/*.yaml
Test files are written in YAML. For a complete schema definition (in JSON Schema format), see here.
- name - The name of the test case
- rules - The rule definitions that are under test
- fixtures - The fixture setup for the tests
- assertions - The test assertions
The rules section defines how the rules-under-test should be loaded.
Currently, two rules loading strategies are supported:
- fromFile - load the rules from a .rules yaml file. If the path specified is not an absolute path, the rule file path will be relative to the test file.
- fromLiteral - embed the rules under test right inside the test file.
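The relative-path rule for fromFile can be illustrated with a minimal sketch. This is not pat's own code (pat is written in Go); the function name `resolve_rules_path` and the sample paths are hypothetical, and the sketch only mirrors the behaviour described above.

```python
import posixpath  # POSIX-style paths keep the example platform-independent


def resolve_rules_path(test_file: str, from_file: str) -> str:
    """Return the rule file path that a fromFile value would refer to.

    Hypothetical helper: absolute paths are used as-is, relative paths
    are resolved against the directory containing the test file.
    """
    if posixpath.isabs(from_file):
        return from_file
    test_dir = posixpath.dirname(posixpath.abspath(test_file))
    return posixpath.normpath(posixpath.join(test_dir, from_file))


print(resolve_rules_path("/repo/tests/http.yaml", "http-rules.yaml"))
```

Under these assumptions, a test file at /repo/tests/http.yaml that names http-rules.yaml loads /repo/tests/http-rules.yaml, regardless of the directory pat is run from.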
rules:
  fromFile: http-rules.yaml
or
rules:
  fromLiteral: |-
    groups:
    - name: prometheus.rules
      rules:
      - alert: HTTPRequestRateLow
        expr: http_requests{group="canary", job="app-server"} < 100
        for: 1m
        labels:
          severity: critical
The fixtures section defines a list of metrics fixtures that the tests will be using.
Each item in the list has the following attributes:
- duration - How long these metrics will be set to the specified value. The duration must be accepted by Go's time.ParseDuration(), e.g., 5m (5 minutes), 1h (1 hour), etc.
- metrics - The metrics and their values
fixtures:
  - duration: 5m
    metrics:
      - http_requests{job="app-server", instance="0", group="blue"} 75
      - http_requests{job="app-server", instance="1", group="blue"} 120
This creates these two metrics, each holding its value for 5 minutes.
You can also specify multiple values for a metric:
fixtures:
  - duration: 5m
    metrics:
      - http_requests{job="app-server", instance="0", group="blue"} 75 100 200
In this case, the metric http_requests{job="app-server", instance="0", group="blue"} will be set to 75 for the first 5 minutes, 100 for the next 5 minutes and 200 for the 5 minutes after that. You can use this form to easily set up long-running time series.
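To make the multi-value form concrete, here is a rough sketch of how a list of values and a duration map onto points in time. It is an illustration only, not pat's implementation: `parse_duration` and `value_at` are hypothetical names, the parser accepts only a small subset of what Go's time.ParseDuration() does (whole numbers with s, m and h units), and the choice that a boundary instant such as 5m already belongs to the next value is an assumption inferred from the longer example later in this document.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
_TOKEN = re.compile(r"(\d+)([smh])")


def parse_duration(text: str) -> int:
    """Parse durations such as '5m' or '1h30m' into seconds (subset only)."""
    pos = 0
    total = 0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:
            raise ValueError(f"unsupported duration: {text!r}")
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"unsupported duration: {text!r}")
    return total


def value_at(values, duration: str, at: str):
    """Return the fixture value in effect at instant `at`, or None past the end.

    Assumption: each value covers the half-open interval
    [i * duration, (i + 1) * duration).
    """
    step = parse_duration(duration)
    if step <= 0:
        raise ValueError("duration must be positive")
    index = parse_duration(at) // step
    return values[index] if index < len(values) else None


for instant in ("0m", "4m", "5m", "10m", "15m"):
    print(instant, value_at([75, 100, 200], "5m", instant))
```

With the values 75 100 200 and a 5m duration, this sketch yields 75 at 0m and 4m, 100 at 5m, 200 at 10m, and no value at 15m.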
The assertions section contains a list of expectations for when the alert rules are evaluated at a certain time.
- at - The instant at which the rules are evaluated
- expected - The list of expected alert properties
assertions:
  - at: 0m
    expected:
      - alertname: HTTPRequestRateLow
        alertstate: pending
        job: app-server
        severity: critical
  - at: 5m
    expected:
      - alertname: HTTPRequestRateLow
        alertstate: firing
        job: app-server
        severity: critical
  - at: 10m
    expected: []
In this example, we're asserting that when the alert rules are evaluated at 0m with the given fixtures, we should get the HTTPRequestRateLow alert in the pending state; when evaluated at 5m, the alert should be in the firing state; and when evaluated at 10m, we shouldn't get any alert.
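One way to think about an assertion is as a comparison between the expected list and the alerts actually produced at that instant. The sketch below is an assumption about how such a comparison could work, not a description of pat's internals: `alerts_match` is a hypothetical helper that treats each alert as a set of label/value pairs and ignores the order of the list.

```python
def alerts_match(expected, actual) -> bool:
    """Compare two lists of alerts (dicts of label -> value), ignoring order.

    Hypothetical helper: every label must match exactly, and an empty
    expected list only matches when no alerts were produced.
    """
    def canonical(alerts):
        return sorted(tuple(sorted(alert.items())) for alert in alerts)

    return canonical(expected) == canonical(actual)


expected_at_5m = [
    {
        "alertname": "HTTPRequestRateLow",
        "alertstate": "firing",
        "job": "app-server",
        "severity": "critical",
    }
]
actual_at_5m = [
    {
        "severity": "critical",
        "job": "app-server",
        "alertstate": "firing",
        "alertname": "HTTPRequestRateLow",
    }
]
print(alerts_match(expected_at_5m, actual_at_5m))
print(alerts_match([], actual_at_5m))
```

In this model, `expected: []` passes only when the evaluation yields no alerts at all.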
Suppose you want to test the following rule file:
groups:
- name: prometheus.rules
  rules:
  - alert: HTTPRequestRateLow
    expr: http_requests{group="canary", job="app-server"} < 100
    for: 1m
    labels:
      severity: critical
Write a YAML file with your test cases:
name: Test HTTP Requests too low alert
rules:
  fromFile: rules.yaml
fixtures:
  - duration: 5m
    metrics:
      - http_requests{job="app-server", instance="0", group="canary", severity="overwrite-me"} 75 85 95 105 105 95 85
      - http_requests{job="app-server", instance="1", group="canary", severity="overwrite-me"} 80 90 100 110 120 130 140
assertions:
  - at: 0m
    expected:
      - alertname: HTTPRequestRateLow
        alertstate: pending
        group: canary
        instance: "0"
        job: app-server
        severity: critical
      - alertname: HTTPRequestRateLow
        alertstate: pending
        group: canary
        instance: "1"
        job: app-server
        severity: critical
    comment: |-
      At 0m, the alerts meet the threshold but have not met the duration requirement. Expect the alerts to be pending.
  - at: 5m
    expected:
      - alertname: HTTPRequestRateLow
        alertstate: firing
        group: canary
        instance: "0"
        job: app-server
        severity: critical
      - alertname: HTTPRequestRateLow
        alertstate: firing
        group: canary
        instance: "1"
        job: app-server
        severity: critical
    comment: |-
      At 5m, the alerts should be firing because the duration requirement is met.
  - at: 10m
    expected:
      - alertname: HTTPRequestRateLow
        alertstate: firing
        group: canary
        instance: "0"
        job: app-server
        severity: critical
    comment: |-
      At 10m, the alert should be firing only for instance 0 because instance 1 is >= 100.
  - at: 15m
    expected: []
    comment: |-
      At 15m, both instances are back to normal, therefore we expect no alert.
Run the test:
$ ./pat examples/test.yaml
=== RUN Test_HTTP_Requests_too_low_alert_at_0m
--- PASS: Test_HTTP_Requests_too_low_alert_at_0m (0.00s)
=== RUN Test_HTTP_Requests_too_low_alert_at_5m
--- PASS: Test_HTTP_Requests_too_low_alert_at_5m (0.00s)
PASS
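To see why the assertions in the example above expect what they do, it can help to tabulate which instances satisfy the rule expression (http_requests < 100) at each assertion time. The sketch below does only that. It does not model the pending/firing transition driven by `for: 1m`, and it assumes that the value at index i applies from i * 5m onward, which is an interpretation of the fixture format rather than something taken from pat's source. All names in it are hypothetical.

```python
# Fixture values copied from the example test file, keyed by instance label.
FIXTURES = {
    "0": [75, 85, 95, 105, 105, 95, 85],
    "1": [80, 90, 100, 110, 120, 130, 140],
}
STEP_MINUTES = 5   # the fixture's duration
THRESHOLD = 100    # from the rule expression: http_requests < 100


def instances_below_threshold(at_minutes: int):
    """Return the instances whose value is below the threshold at the given minute."""
    index = at_minutes // STEP_MINUTES
    return sorted(
        instance
        for instance, values in FIXTURES.items()
        if index < len(values) and values[index] < THRESHOLD
    )


for minute in (0, 5, 10, 15):
    print(f"{minute}m:", instances_below_threshold(minute))
```

Both instances are below the threshold at 0m and 5m, only instance 0 at 10m (instance 1 is exactly 100), and neither at 15m, which lines up with the expected alerts in the four assertions.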