Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

yuri-tceretian
Copy link
Contributor

@yuri-tceretian yuri-tceretian commented Sep 20, 2023

What is this feature?
This PR adds support for a new type of command introduced in #70998. The hysteresis command requires information about the results of the previous evaluation. This PR updates the state manager and scheduler to facilitate this information for the evaluation engine:

  • state manager is updated to keep a fingerprint of the result instance for each State. It differs from CacheID or Labelshash because it is calculated from the labels of the raw result.
  • scheduler wraps the state manager and the rule into an intermediate structure AlertingResultsFromRuleState and provides it to the rule evaluator.
  • The evaluator analyzes the expression tree (parsed rule's query) and if it contains HysteresisCommand, it populates it with the data taken from the AlertingResultsFromRuleState.

The structure AlertingResultsFromRuleState implements the logic that determines what metric\dimension\state is considered as "loaded":

  • state Pending or Alerting
  • state reason is empty. This is to avoid execution of exceptional states such as Error\NoData as Alerting.

Why do we need this feature?
To support hysteresis expression in alerting. This will help users reduce flapping on the alert rules when the metric value frequently crosses threshold boundaries.

Who is this feature for?
Alerting

Which issue(s) does this PR fix?:

Related #6202

Special notes for your reviewer:

  • PR can be reviewed by commit. In some commits I provide the explanations for chosen approach.
  • The majority of the changes are tests. The core backend changes are in just two commits da01d9a and ef96be8
  • the feature is behind the flag recoveryThreshold. if it is disabled the expr engine does not create HysteresisCommand struct and code in this PR does not do anything.
UI screen recording
hysteresis_UI.mp4

When viewing the rule

Screenshot 2023-11-06 at 09 45 23

Please check that:

  • It works as expected from a user's perspective.
  • If this is a pre-GA feature, it is behind a feature toggle.
  • The docs are updated, and if this is a notable improvement, it's added to our What's New doc.

@yuri-tceretian yuri-tceretian self-assigned this Sep 20, 2023
@grafana-delivery-bot grafana-delivery-bot bot added this to the 10.2.x milestone Sep 20, 2023
@grafana-pr-automation grafana-pr-automation bot added area/frontend type/docs Flags the technical writing team for documentation support; auto adds to org-wide docs project area/backend/db/migration area/backend labels Sep 20, 2023
@yuri-tceretian yuri-tceretian force-pushed the yuri-tceretian/expr-hysteresis-alerting branch from d64d02a to 5e1b397 Compare September 20, 2023 19:52
@yuri-tceretian yuri-tceretian force-pushed the yuri-tceretian/expr-hysteresis-alerting branch from 72aa808 to 24d23fd Compare October 10, 2023 14:56
@yuri-tceretian yuri-tceretian marked this pull request as ready for review October 11, 2023 09:30
@yuri-tceretian yuri-tceretian requested review from a team as code owners October 11, 2023 09:30
@yuri-tceretian yuri-tceretian requested a review from a team October 11, 2023 09:30
@yuri-tceretian yuri-tceretian requested a review from a team as a code owner October 11, 2023 09:30
@soniaAguilarPeiron
Copy link
Contributor

/deploy-to-hg

@grafana grafana deleted a comment from ephemeral-instances-bot bot Dec 7, 2023
@grafana grafana deleted a comment from ephemeral-instances-bot bot Dec 7, 2023
@soniaAguilarPeiron
Copy link
Contributor

/deploy-to-hg

@grafana grafana deleted a comment from ephemeral-instances-bot bot Dec 14, 2023
@grafana grafana deleted a comment from ephemeral-instances-bot bot Dec 14, 2023
@soniaAguilarPeiron
Copy link
Contributor

My last commit is a fix for the style for hysteresis fields in dashboard panel expressions.

Before the change: Screenshot 2023-12-15 at 09 59 07 Screenshot 2023-12-15 at 09 58 56
After the change: Screenshot 2023-12-15 at 09 50 52 Screenshot 2023-12-15 at 09 50 44

@soniaAguilarPeiron
Copy link
Contributor

After a conversation with @yuri-tceretian we agreed on hiding hsyteresis in panel expressions.

@grafana grafana deleted a comment from ephemeral-instances-bot bot Jan 3, 2024
@grafana grafana deleted a comment from ephemeral-instances-bot bot Jan 4, 2024
@JacobsonMT
Copy link
Contributor

/deploy-to-hg

@ephemeral-instances-bot
Copy link

  • Preparing your instance. A comment containing your instance's url will be added to this PR when the instance is ready.
  • Your instance will be ready in ~10 minutes.
  • Check the GitHub actions tab to follow the workflow progress
  • Slack channel: #proj-ephemeral-hg-instances
  • Building instance with yuri-tceretian/expr-hysteresis-alerting oss branch and main enterprise branch. How to choose a branch

@ephemeral-instances-bot
Copy link

Copy link
Contributor

@JacobsonMT JacobsonMT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job! Confirmed working as expected, there are parts I'm not 100% sure on the implementation but considering this is behind a feature flag it's not necessary to dwell on those details at this stage. 🚀

@yuri-tceretian yuri-tceretian merged commit f6a4674 into main Jan 4, 2024
@yuri-tceretian yuri-tceretian deleted the yuri-tceretian/expr-hysteresis-alerting branch January 4, 2024 16:47
zserge pushed a commit that referenced this pull request Jan 22, 2024
Backend: 

* Update the Grafana Alerting engine to provide feedback to HysteresisCommand. The feedback information is stored in state.Manager as a fingerprint of each state. The fingerprint is persisted to the database. Only fingerprints that belong to Pending and Alerting states are considered as "loaded" and provided back to the command.
   - add ResultFingerprint to state.State. It's different from other fingerprints we store in the state because it is calculated from the result labels.
  -  add rule_fingerprint column to alert_instance
   - update alerting evaluator to accept AlertingResultsReader via context, and update scheduler to provide it.
   - add AlertingResultsFromRuleState that implements the new interface in eval package
   - update getExprRequest to patch the hysteresis command.

* Only one "Recovery Threshold" query is allowed to be used in the alert rule and it must be the Condition.


Frontend:

* Add hysteresis option to Threshold in UI. It's called "Recovery Threshold"
* Add test for getUnloadEvaluatorTypeFromCondition
* Hide hysteresis in panel expressions

* Refactor isInvalid and add test for it
* Remove unnecesary React.memo
* Add tests for updateEvaluatorConditions

---------

Co-authored-by: Sonia Aguilar <[email protected]>
@summerwollin summerwollin modified the milestones: 10.3.x, 10.3.0 Jan 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
add to changelog add to what's new area/backend/db/migration area/backend area/frontend blocked Issue awaiting a resolution or consencus before moving on no-backport Skip backport of PR type/docs Flags the technical writing team for documentation support; auto adds to org-wide docs project
Projects
None yet
Development

Successfully merging this pull request may close these issues.