mirror of
https://github.com/grafana/grafana.git
synced 2025-02-25 18:55:37 -06:00
Add `grafana_state_reason` section in State of alerts (#91562)
* Add `grafana_state_reason` section in State of alerts
* Minor edit for clarification
* Mention `Paused/RuleDeleted/Updated` states
parent 8a97143120
commit 98a74d844e
@@ -256,7 +256,7 @@ You can configure the alert instance state when its evaluation returns no data:
 | No Data configuration | Description |
 | --------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | No Data | The default option. Sets alert instance state to `No data`. <br/> The alert rule also creates a new alert instance `DatasourceNoData` with the name and UID of the alert rule, and UID of the datasource that returned no data as labels. |
-| Alerting | Sets alert instance state to `Alerting`. It waits until the [pending period](ref:pending-period) has finished. |
+| Alerting | Sets alert instance state to `Alerting`. It transitions from `Pending` to `Alerting` after the [pending period](ref:pending-period) has finished. |
 | Normal | Sets alert instance state to `Normal`. |
 | Keep Last State | Maintains the alert instance in its last state. Useful for mitigating temporary issues, refer to [Keep last state](ref:keep-last-state). |

@@ -265,7 +265,7 @@ You can also configure the alert instance state when its evaluation returns an e
 | Error configuration | Description |
 | ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | Error | The default option. Sets alert instance state to `Error`. <br/> The alert rule also creates a new alert instance `DatasourceError` with the name and UID of the alert rule, and UID of the datasource that returned no data as labels. |
-| Alerting | Sets alert instance state to `Alerting`. It waits until the [pending period](ref:pending-period) has finished. |
+| Alerting | Sets alert instance state to `Alerting`. It transitions from `Pending` to `Alerting` after the [pending period](ref:pending-period) has finished. |
 | Normal | Sets alert instance state to `Normal`. |
 | Keep Last State | Maintains the alert instance in its last state. Useful for mitigating temporary issues, refer to [Keep last state](ref:keep-last-state). |

@@ -44,13 +44,13 @@ There are three key components that help you understand how your alerts behave d

 An alert instance can be in either of the following states:

-| State | Description |
-| ------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| **Normal** | The state of an alert when the condition (threshold) is not met. |
-| **Pending** | The state of an alert that has breached the threshold but for less than the [pending period](ref:pending-period). |
-| **Alerting** | The state of an alert that has breached the threshold for longer than the [pending period](ref:pending-period). |
-| **NoData** | The state of an alert whose query returns no data or all values are null. You can [change the default behavior](/docs/grafana/latest/alerting/alerting-rules/create-grafana-managed-rule/#configure-no-data-and-error-handling). |
-| **Error** | The state of an alert when an error or timeout occurred evaluating the alert rule. You can [change the default behavior](/docs/grafana/latest/alerting/alerting-rules/create-grafana-managed-rule/#configure-no-data-and-error-handling). |
+| State | Description |
+| ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **Normal** | The state of an alert when the condition (threshold) is not met. |
+| **Pending** | The state of an alert that has breached the threshold but for less than the [pending period](ref:pending-period). |
+| **Alerting** | The state of an alert that has breached the threshold for longer than the [pending period](ref:pending-period). |
+| **NoData** | The state of an alert whose query returns no data or all values are null. You can [change the default behavior of the no data state](#modify-the-no-data-and-error-state). |
+| **Error** | The state of an alert when an error or timeout occurred evaluating the alert rule. You can [change the default behavior of the error state](#modify-the-no-data-and-error-state). |

 {{< figure src="/media/docs/alerting/alert-instance-states-v3.png" caption="Alert instance state diagram" alt="A diagram of the distinct alert instance states and transitions." max-width="750px" >}}

@@ -64,18 +64,37 @@ Alert instances will be routed for [notifications](ref:notifications) when they

 An alert instance is considered stale if its dimension or series has disappeared from the query results entirely for two evaluation intervals.

-Stale alert instances that are in the **Alerting**, **NoData**, or **Error** states transition to the **Normal** state as **Resolved**, and include the `grafana_state_reason` annotation with the value **MissingSeries**. They are routed for notifications like other resolved alert instances.
+Stale alert instances that are in the **Alerting**, **NoData**, or **Error** states transition to the **Normal** state as **Resolved**. Once transitioned, these resolved alert instances are routed for notifications like other resolved alerts.

-### Keep last state
+### Modify the no data and error state

-The "Keep Last State" option helps mitigate temporary data source issues, preventing alerts from unintentionally firing, resolving, and re-firing.
-
-In [Configure no data and error handling,](ref:no-data-and-error-handling) you can decide to keep the last state of the alert instance when a `NoData` and/or `Error` state is encountered. Just like normal evaluation, the alert instance transitions from `Pending` to `Alerting` after the pending period has elapsed.
+In [Configure no data and error handling](ref:no-data-and-error-handling), you can change the default behaviour when the evaluation returns no data or an error. You can set the alert instance state to `Alerting`, `Normal`, or keep the last state.

 {{< figure src="/media/docs/alerting/alert-rule-configure-no-data-and-error.png" alt="A screenshot of the `Configure no data and error handling` option in Grafana Alerting." max-width="500px" >}}

+#### Keep last state
+
+The "Keep Last State" option helps mitigate temporary data source issues, preventing alerts from unintentionally firing, resolving, and re-firing.
+
 However, in situations where strict monitoring is critical, relying solely on the "Keep Last State" option may not be appropriate. Instead, consider using an alternative or implementing additional alert rules to ensure that issues with prolonged data source disruptions are detected.

+### `grafana_state_reason` annotation
+
+Occasionally, an alert instance may be in a state that isn't immediately clear to everyone. For example:
+
+- Stale alert instances in the `Alerting` state transition to the `Normal` state when the series disappear.
+- If "no data" handling is configured to transition to a state other than `NoData`.
+- If "error" handling is configured to transition to a state other than `Error`.
+- If the alert rule is deleted, paused, or updated in some cases, the alert instance also transitions to the `Normal` state.
+
+In these situations, the evaluation state may differ from the alert state, and it might be necessary to understand the reason for being in that state when receiving the notification.
+
+The `grafana_state_reason` annotation is included in these situations, providing the reason in the notifications that explain why the alert instance transitioned to its current state. For example:
+
+- Stale alert instances in the `Normal` state include the `grafana_state_reason` annotation with the value **MissingSeries**.
+- If "no data" or "error" handling transitions to the `Normal` state, the `grafana_state_reason` annotation is included with the value **NoData** or **Error**, respectively.
+- If the alert rule is deleted or paused, the `grafana_state_reason` is set to **Paused** or **RuleDeleted**. For some updates, it is set to **Updated**.
+
 ### Special alerts for `NoData` and `Error`

 When evaluation of an alert rule produces state `NoData` or `Error`, Grafana Alerting generates a new alert instance that have the following additional labels:
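
As a reading aid for the `grafana_state_reason` section added in this commit, the cases it lists can be sketched in Python. This is a minimal model of the documented behavior only, not Grafana's internal state manager: the `resolve` function, its `event` and `configured_state` parameters, and the tuple it returns are all hypothetical names introduced for illustration. The "Alerting" and "Keep Last State" options are left out because the text above does not state the annotation value for them.

```python
from typing import Optional, Tuple


def resolve(event: str, configured_state: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Return (alert instance state, grafana_state_reason value or None).

    Hypothetical helper: it covers only the cases the documentation names.
    """
    if event == "MissingSeries":
        # Stale instances transition to Normal and carry MissingSeries.
        return "Normal", "MissingSeries"
    if event in ("Paused", "RuleDeleted"):
        # Pausing or deleting the rule also moves the instance to Normal.
        return "Normal", event
    if event in ("NoData", "Error"):
        if configured_state in (None, event):
            # Default handling: the alert state matches the evaluation state,
            # so no reason annotation is needed.
            return event, None
        if configured_state == "Normal":
            # The evaluation result is kept as the reason.
            return "Normal", event
        raise NotImplementedError(
            f"reason value for {configured_state!r} is not given in the text"
        )
    raise ValueError(f"unknown event: {event!r}")
```

For instance, `resolve("Error", "Normal")` yields the `Normal` state with the reason `Error`, which is the value a notification would need to explain why a failing evaluation did not fire.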