mirror of
https://github.com/grafana/grafana.git
synced 2025-02-25 18:55:37 -06:00
Alerting docs: Update and restructure Introduction/Alert rule evaluation (#87331)

* Migrate and update `Alert rule evaluation`
* Minor change
* Update internal links to new URLs
parent 1b69d647be
commit 7dd05998fd
@@ -139,8 +139,8 @@ Here are some tips on how to create an effective alert management set up for you
[alertmanager]: "/docs/grafana/ -> /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/notifications/alertmanager"
[alertmanager]: "/docs/grafana-cloud/ -> /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/notifications/alertmanager"

[alert-rule-evaluation]: "/docs/grafana/ -> /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/alert-rules/rule-evaluation"
[alert-rule-evaluation]: "/docs/grafana-cloud/ -> /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/alert-rules/rule-evaluation"
[alert-rule-evaluation]: "/docs/grafana/ -> /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/alert-rule-evaluation"
[alert-rule-evaluation]: "/docs/grafana-cloud/ -> /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/alert-rule-evaluation"

[external-alertmanagers]: "/docs/grafana/ -> /docs/grafana/<GRAFANA_VERSION>/alerting/set-up/configure-alertmanager"
[external-alertmanagers]: "/docs/grafana-cloud/ -> /docs/grafana-cloud/alerting-and-irm/alerting/set-up/configure-alertmanager"
@@ -0,0 +1,97 @@
---
aliases:
  - ../fundamentals/alert-rules/rule-evaluation/ # /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/alert-rules/rule-evaluation/
canonical: https://grafana.com/docs/grafana/latest/alerting/fundamentals/alert-rule-evaluation/
description: Use alert rule evaluation to determine how frequently an alert rule should be evaluated and how quickly it should change its state
keywords:
  - grafana
  - alerting
  - evaluation
labels:
  products:
    - cloud
    - enterprise
    - oss
title: Alert rule evaluation
weight: 108
---

# Alert rule evaluation

The criteria determining when an alert rule fires are based on two settings:

- [Evaluation group](#evaluation-group): how frequently the alert rule is evaluated.
- [Pending period](#pending-period): how long the condition must be met to start firing.

{{< figure src="/media/docs/alerting/alert-rule-evaluation.png" max-width="750px" caption="Set alert rule evaluation" >}}

## Evaluation group

Every alert rule is assigned to an evaluation group. You can assign the alert rule to an existing evaluation group or create a new one.

Each evaluation group contains an **evaluation interval** that determines how frequently the alert rule is checked. For instance, the evaluation may occur every `10s`, `30s`, `1m`, `10m`, etc.
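
To make the interval values concrete, the following sketch converts an interval such as `30s` into seconds and lists the times of the first few evaluations. It is a hypothetical illustration, not Grafana's duration parser: `parse_interval` accepts only a number followed by `s`, `m`, or `h`, and `evaluation_times` assumes the first evaluation happens one interval after the start, as in the evaluation example on this page.

```python
import re

# Hypothetical mapping from a unit suffix to its length in seconds.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_interval(text: str) -> int:
    """Convert a single-unit interval such as "30s" or "10m" into seconds.

    Illustration only: this is not Grafana's duration parser and it rejects
    anything other than digits followed by s, m, or h.
    """
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported interval: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def evaluation_times(interval: str, count: int) -> list[int]:
    """Offsets, in seconds, of the first `count` evaluations.

    Assumes the first evaluation happens one interval after the start.
    """
    step = parse_interval(interval)
    return [step * n for n in range(1, count + 1)]


for interval in ("10s", "30s", "1m", "10m"):
    print(interval, evaluation_times(interval, 3))
```

With a `30s` interval, for example, evaluations fall at 30, 60, and 90 seconds, which matches the 00:30, 01:00, and 01:30 rows of the evaluation example.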

Alert rules in different groups can be evaluated simultaneously.

**Grafana-managed** alert rules within the same group are evaluated simultaneously. However, **data-source managed** alert rules within the same group are evaluated one after the other. This sequential order is necessary to ensure that recording rules are evaluated before alert rules.
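
The difference between simultaneous and one-after-the-other evaluation can be modeled with Python's `asyncio`. This is a conceptual sketch only, not Grafana's or a data source's scheduler: the rule names are hypothetical, and `evaluate` merely records when each rule starts and finishes, yielding once in between so that other evaluations can run.

```python
import asyncio


async def evaluate(rule: str, log: list[str]) -> None:
    # Placeholder for running a rule's query: record start and end, and
    # yield control once so that other scheduled evaluations can run.
    log.append(f"start {rule}")
    await asyncio.sleep(0)
    log.append(f"end {rule}")


async def evaluate_one_after_the_other(rules: list[str], log: list[str]) -> None:
    # Each rule finishes before the next one starts, so a recording rule
    # listed first completes before the alert rule that follows it.
    for rule in rules:
        await evaluate(rule, log)


async def evaluate_simultaneously(rules: list[str], log: list[str]) -> None:
    # All rules are scheduled together, so their evaluations can overlap.
    await asyncio.gather(*(evaluate(rule, log) for rule in rules))


rules = ["recording-rule", "alert-rule"]

sequential_log: list[str] = []
asyncio.run(evaluate_one_after_the_other(rules, sequential_log))
print(sequential_log)

concurrent_log: list[str] = []
asyncio.run(evaluate_simultaneously(rules, concurrent_log))
print(concurrent_log)
```

In the sequential log, `end recording-rule` always appears before `start alert-rule`; in the concurrent log, the alert rule can start before the recording rule has finished.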

## Pending period

You can set a pending period to prevent unnecessary alerts from temporary issues.

The pending period specifies how long the condition must be met continuously before the alert rule fires. If the condition stops being met during the pending period, the alert instance returns to the `Normal` state.

You can also set the pending period to zero to skip it and have the alert instance fire as soon as the condition is met.
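
The pending period behavior for a single alert instance can be sketched as a small state machine. This is a simplified model under stated assumptions, not Grafana's state manager: the pending counter is the time elapsed since the first breaching evaluation, and other states such as `NoData` or `Error` are ignored. `AlertInstance` and `evaluate_instance` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlertInstance:
    state: str = "Normal"
    pending_since: Optional[int] = None  # time (s) of the first breaching evaluation


def evaluate_instance(
    instance: AlertInstance, now: int, condition_met: bool, pending_period: int
) -> AlertInstance:
    """Return the instance after one evaluation at time `now` (seconds)."""
    if not condition_met:
        # The condition is not met: return to Normal and reset the counter.
        return AlertInstance()
    since = instance.pending_since if instance.pending_since is not None else now
    # Fire once the condition has been met for at least the pending period.
    if now - since >= pending_period:
        return AlertInstance("Firing", since)
    return AlertInstance("Pending", since)


# A pending period of zero fires on the first breaching evaluation.
print(evaluate_instance(AlertInstance(), now=60, condition_met=True, pending_period=0).state)  # Firing

# With a 90s pending period, the instance stays Pending until 90s have elapsed.
instance = AlertInstance()
for now in (60, 90, 120, 150):
    instance = evaluate_instance(instance, now, condition_met=True, pending_period=90)
    print(now, instance.state)
```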

## Evaluation example

Keep in mind:

- One alert rule can generate multiple alert instances: one for each time series produced by the alert rule's query.
- Alert instances from the same alert rule may be in different states. For instance, only one observed machine might start firing.
- Only firing and resolved alert instances are routed for notifications.
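
A minimal sketch of these points: each time series returned by the query becomes its own alert instance with its own state, and only firing or just-resolved instances are selected for routing. To keep the focus on instances, it assumes a pending period of zero and a simple `value > threshold` condition; the series labels, values, threshold, and the `evaluate_series` helper are hypothetical.

```python
# Hypothetical threshold for the alert rule's condition.
THRESHOLD = 80.0


def evaluate_series(
    values: dict[str, float], previous: dict[str, str], threshold: float
) -> tuple[dict[str, str], list[tuple[str, str]]]:
    """Evaluate each time series as its own alert instance (pending period of zero).

    Returns the new state of every instance and the instances to route for
    notifications: those that are firing and those that just resolved.
    """
    states: dict[str, str] = {}
    routed: list[tuple[str, str]] = []
    for series, value in values.items():
        state = "Firing" if value > threshold else "Normal"
        states[series] = state
        if state == "Firing":
            routed.append((series, "Firing"))
        elif previous.get(series) == "Firing":
            # Firing -> Normal means the alert instance is resolved.
            routed.append((series, "Resolved"))
    return states, routed


# Hypothetical query result: one value per time series, keyed by its labels.
previous = {"instance=server-1": "Firing", "instance=server-2": "Normal", "instance=server-3": "Normal"}
values = {"instance=server-1": 40.0, "instance=server-2": 95.0, "instance=server-3": 10.0}

states, routed = evaluate_series(values, previous, THRESHOLD)
print(states)
print(routed)
```

Here `server-3` stays `Normal` and is not routed, while `server-1` is routed as resolved and `server-2` as firing.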

{{< figure src="/media/docs/alerting/alert-rule-evaluation-overview-statediagram.png" max-width="750px" >}}

<!--
Remove ///
stateDiagram-v2
direction LR
Normal --///> Pending
note right of Normal
Route "Resolved" alert instances
for notifications
end note
Pending --///> Firing
Firing --///> Normal: Resolved
note right of Firing
Route "Firing" alert instances
for notifications
end note
-->

Consider an alert rule with an **evaluation interval** of 30 seconds and a **pending period** of 90 seconds. The evaluation occurs as follows:

| Time                      | Condition | Alert instance state | Pending counter |
| ------------------------- | --------- | -------------------- | --------------- |
| 00:30 (first evaluation)  | Not met   | Normal               | -               |
| 01:00 (second evaluation) | Breached  | Pending              | 0s              |
| 01:30 (third evaluation)  | Breached  | Pending              | 30s             |
| 02:00 (fourth evaluation) | Breached  | Pending              | 60s             |
| 02:30 (fifth evaluation)  | Breached  | Firing<sup>\*</sup>  | 90s             |

An alert instance is resolved when it transitions from the `Firing` to the `Normal` state. For instance, continuing the previous example:

| Time                       | Condition | Alert instance state          | Pending counter |
| -------------------------- | --------- | ----------------------------- | --------------- |
| 03:00 (sixth evaluation)   | Not met   | Normal <sup>Resolved \*</sup> | -               |
| 03:30 (seventh evaluation) | Not met   | Normal                        | -               |

<sup>\*</sup> Firing and resolved alert instances are routed for notifications.
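
The rows of the evaluation example can be reproduced with a short replay. This is a simplified model, not Grafana's state manager: it assumes the first evaluation happens at 00:30, that the pending counter is the time elapsed since the first breaching evaluation, and that the counter resets (shown as `-`) whenever the condition is not met. The `replay` and `mmss` helpers are hypothetical.

```python
from typing import Optional

# Values from the example: 30-second evaluation interval, 90-second pending period.
INTERVAL = 30
PENDING_PERIOD = 90


def mmss(seconds: int) -> str:
    """Format seconds as mm:ss, like the Time column."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


def replay(conditions: list[bool], interval: int, pending_period: int) -> list[tuple[str, str, str]]:
    """Return (time, state, pending counter) for each evaluation of one alert instance."""
    rows = []
    state = "Normal"
    pending_since: Optional[int] = None
    for n, met in enumerate(conditions, start=1):
        now = n * interval  # first evaluation one interval after the start
        previous = state
        if not met:
            state, pending_since = "Normal", None
        else:
            if pending_since is None:
                pending_since = now
            state = "Firing" if now - pending_since >= pending_period else "Pending"
        resolved = previous == "Firing" and state == "Normal"
        label = "Normal (Resolved)" if resolved else state
        counter = "-" if pending_since is None else f"{now - pending_since}s"
        rows.append((mmss(now), label, counter))
    return rows


for row in replay([False, True, True, True, True, False, False], INTERVAL, PENDING_PERIOD):
    print(*row)
# 00:30 Normal -
# 01:00 Pending 0s
# 01:30 Pending 30s
# 02:00 Pending 60s
# 02:30 Firing 90s
# 03:00 Normal (Resolved) -
# 03:30 Normal -
```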

To learn more about the state changes of alert rules and alert instances, refer to [State and health of alert rules][alerts-state-health].

{{% docs/reference %}}

[alerts-state-health]: "/docs/grafana/ -> /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/alert-rule-evaluation/state-and-health"
[alerts-state-health]: "/docs/grafana-cloud/ -> /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/alert-rule-evaluation/state-and-health"

{{% /docs/reference %}}
@@ -1,8 +1,9 @@
---
aliases:
  - ../../fundamentals/alert-rules/state-and-health/ # /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/alert-rules/state-and-health/
  - ../../fundamentals/state-and-health/ # /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/state-and-health/
  - ../../unified-alerting/alerting-rules/state-and-health/ # /docs/grafana/<GRAFANA_VERSION>/alerting/unified-alerting/alerting-rules/state-and-health
canonical: https://grafana.com/docs/grafana/latest/alerting/fundamentals/alert-rules/state-and-health/
canonical: https://grafana.com/docs/grafana/latest/alerting/fundamentals/alert-rule-evaluation/state-and-health/
description: Learn about the state and health of alert rules to understand several key status indicators about your alerts
keywords:
  - grafana
@@ -95,8 +95,8 @@ When choosing which alert rule type to use, consider the following comparison be
[create-recording-rules]: "/docs/grafana/ -> /docs/grafana/<GRAFANA_VERSION>/alerting/alerting-rules/create-mimir-loki-managed-recording-rule"
[create-recording-rules]: "/docs/grafana-cloud/ -> /docs/grafana-cloud/alerting-and-irm/alerting/alerting-rules/create-mimir-loki-managed-recording-rule"

[alert-rule-evaluation]: "/docs/grafana/ -> /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/alert-rules/rule-evaluation"
[alert-rule-evaluation]: "/docs/grafana-cloud/ -> /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/alert-rules/rule-evaluation"
[alert-rule-evaluation]: "/docs/grafana/ -> /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/alert-rule-evaluation"
[alert-rule-evaluation]: "/docs/grafana-cloud/ -> /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/alert-rule-evaluation"

[expression-queries]: "/docs/grafana/ -> /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/alert-rules/queries-conditions#expression-queries"
[expression-queries]: "/docs/grafana-cloud/ -> /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/alert-rules/queries-conditions#expression-queries"
@@ -1,72 +0,0 @@
---
canonical: https://grafana.com/docs/grafana/latest/alerting/fundamentals/alert-rules/rule-evaluation/
description: Use alert rule evaluation to determine how frequently an alert rule should be evaluated and how quickly it should change its state
keywords:
  - grafana
  - alerting
  - evaluation
labels:
  products:
    - cloud
    - enterprise
    - oss
title: Alert rule evaluation
weight: 108
---

# Alert rule evaluation

Use alert rule evaluation to determine how frequently an alert rule should be evaluated and how quickly it should change its state.

To do this, you need to make sure that your alert rule is in the right evaluation group and set a pending period time that works best for your use case.

## Evaluation group

Every alert rule is part of an evaluation group. Each evaluation group contains an evaluation interval that determines how frequently the alert rule is checked.

**Data-source managed** alert rules within the same group are evaluated one after the other, while alert rules in different groups can be evaluated simultaneously. This feature is especially useful when you want to ensure that recording rules are evaluated before any alert rules.

**Grafana-managed** alert rules are evaluated at the same time, regardless of alert rule group. The default evaluation interval is set at 10 seconds, which means that Grafana-managed alert rules are evaluated every 10 seconds to the closest 10-second window on the clock, for example, 10:00:00, 10:00:10, 10:00:20, and so on. You can also configure your own evaluation interval, if required.
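
The clock alignment described above can be sketched by rounding a start time up to the next multiple of the interval. This is an illustration under assumptions, not Grafana's scheduler: it counts multiples of the interval from the Unix epoch in UTC, and `aligned_evaluation_times` is a hypothetical helper.

```python
import math
from datetime import datetime, timedelta, timezone


def aligned_evaluation_times(start: datetime, interval_seconds: int, count: int) -> list[datetime]:
    """Return `count` evaluation times aligned to multiples of the interval.

    The first time is `start` rounded up to the next multiple of
    `interval_seconds`, counted from the Unix epoch.
    """
    first = math.ceil(start.timestamp() / interval_seconds) * interval_seconds
    first_time = datetime.fromtimestamp(first, tz=timezone.utc)
    return [first_time + timedelta(seconds=interval_seconds * n) for n in range(count)]


start = datetime(2024, 5, 1, 10, 0, 3, tzinfo=timezone.utc)
for t in aligned_evaluation_times(start, 10, 3):
    print(t.strftime("%H:%M:%S"))  # prints 10:00:10, then 10:00:20, then 10:00:30
```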

**Note:**

Evaluation groups and alert grouping in notification policies are two separate things. Grouping in notification policies allows multiple alerts that share the same labels to be sent in the same message.

## Pending period

By setting a pending period, you can avoid unnecessary alerts for temporary problems.

The pending period is the period during which an alert rule can be in breach of the condition before it fires.

**Example**

Imagine you have an alert rule with an evaluation interval of 30 seconds and a pending period of 90 seconds.

Evaluation occurs as follows:

[00:30] First evaluation - condition not met.

[01:00] Second evaluation - condition breached.
Pending counter starts. **Alert starts pending.**

[01:30] Third evaluation - condition breached. Pending counter = 30s. **Pending state.**

[02:00] Fourth evaluation - condition breached. Pending counter = 60s. **Pending state.**

[02:30] Fifth evaluation - condition breached. Pending counter = 90s. **Alert starts firing.**

If the alert rule has a condition that needs to be in breach for a certain amount of time before it takes action, then its state changes as follows:

- When the condition is first breached, the rule goes into a "pending" state.

- The rule stays in the "pending" state until the condition has been breached for the required amount of time (the pending period).

- After the required time has passed, the rule goes into a "firing" state.

- If the condition is no longer breached during the pending period, the rule goes back to its normal state.

**Note:**

If you want to skip the pending state, you can simply set the pending period to 0. This effectively skips the pending period and your alert rule starts firing as soon as the condition is breached.

When an alert rule fires, alert instances are produced, which are then sent to the Alertmanager.