mirror of
https://github.com/grafana/grafana.git
synced 2025-02-25 18:55:37 -06:00
Alerting docs: State and health of alerts
minor updates (#97880)
* Alerting docs: Update `State and health of alerts` * correct internal ref name
This commit is contained in:
parent
a9cd0f19f4
commit
0cce224adb
@ -29,11 +29,11 @@ refs:
|
||||
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/alert-rule-evaluation/state-and-health/#alert-instance-state
|
||||
- pattern: /docs/grafana-cloud/
|
||||
destination: /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/alert-rule-evaluation/state-and-health/#alert-instance-state
|
||||
modify-the-no-data-and-error-state:
|
||||
modify-the-no-data-or-error-state:
|
||||
- pattern: /docs/grafana/
|
||||
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/alert-rule-evaluation/state-and-health/#modify-the-no-data-and-error-state
|
||||
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/alert-rule-evaluation/state-and-health/#modify-the-no-data-or-error-state
|
||||
- pattern: /docs/grafana-cloud/
|
||||
destination: /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/alert-rule-evaluation/state-and-health/#modify-the-no-data-and-error-state
|
||||
destination: /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/alert-rule-evaluation/state-and-health/#modify-the-no-data-or-error-state
|
||||
pending-period:
|
||||
- pattern: /docs/grafana/
|
||||
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/alert-rule-evaluation/#pending-period
|
||||
@ -241,7 +241,7 @@ To do this, you need to make sure that your alert rule is in the right evaluatio
|
||||
|
||||
{{< docs/shared lookup="alerts/table-configure-no-data-and-error.md" source="grafana" version="<GRAFANA_VERSION>" >}}
|
||||
|
||||
For more details, refer to [alert instance states](ref:alert-instance-state) and [modify the no data and error state](ref:modify-the-no-data-and-error-state).
|
||||
For more details, refer to [alert instance states](ref:alert-instance-state) and [modify the no data or error state](ref:modify-the-no-data-or-error-state).
|
||||
|
||||
## Configure notifications
|
||||
|
||||
|
@ -45,36 +45,57 @@ There are three key components that help you understand how your alerts behave d
|
||||
An alert instance can be in either of the following states:
|
||||
|
||||
| State | Description |
|
||||
| ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| ------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| **Normal** | The state of an alert when the condition (threshold) is not met. |
|
||||
| **Pending** | The state of an alert that has breached the threshold but for less than the [pending period](ref:pending-period). |
|
||||
| **Alerting** | The state of an alert that has breached the threshold for longer than the [pending period](ref:pending-period). |
|
||||
| **NoData** | The state of an alert whose query returns no data or all values are null. You can [change the default behavior of the no data state](#modify-the-no-data-and-error-state). |
|
||||
| **Error** | The state of an alert when an error or timeout occurred evaluating the alert rule. You can [change the default behavior of the error state](#modify-the-no-data-and-error-state). |
|
||||
| **No Data<sup>\*</sup>** | The state of an alert whose query returns no data or all values are null. <br/> An alert in this state generates a new [DatasourceNoData alert](#no-data-and-error-alerts). You can [modify the default behavior of the no data state](#modify-the-no-data-or-error-state). |
|
||||
| **Error<sup>\*</sup>** | The state of an alert when an error or timeout occurred evaluating the alert rule. <br/> An alert in this state generates a new [DatasourceError alert](#no-data-and-error-alerts). You can [modify the default behavior of the error state](#modify-the-no-data-or-error-state). |
|
||||
|
||||
{{< admonition type="note" >}}
|
||||
|
||||
`No Data` and `Error` states are supported only for Grafana-managed alert rules.
|
||||
|
||||
{{< /admonition >}}
|
||||
|
||||
{{< figure src="/media/docs/alerting/alert-instance-states-v3.png" caption="Alert instance state diagram" alt="A diagram of the distinct alert instance states and transitions." max-width="750px" >}}
|
||||
|
||||
### Notifications
|
||||
### Notification routing
|
||||
|
||||
Alert instances will be routed for [notifications](ref:notifications) when they are in the `Alerting` state or have been `Resolved`, transitioning from `Alerting` to `Normal` state.
|
||||
|
||||
{{< figure src="/media/docs/alerting/alert-rule-evaluation-overview-statediagram-v2.png" alt="A diagram of the alert instance states and when to route their notifications." max-width="750px" >}}
|
||||
|
||||
### `No Data` and `Error` alerts
|
||||
|
||||
When evaluation of an alert rule produces state `No Data` or `Error`, Grafana Alerting generates a new alert instance that have the following additional labels:
|
||||
|
||||
- `alertname`: Either `DatasourceNoData` or `DatasourceError` depending on the state.
|
||||
- `datasource_uid`: The UID of the data source that caused the state.
|
||||
|
||||
You can manage these alerts like regular ones by using their labels to apply actions such as adding a silence, routing via notification policies, and more.
|
||||
|
||||
### Lifecycle of stale alert instances
|
||||
|
||||
An alert instance is considered stale if its dimension or series has disappeared from the query results entirely for two evaluation intervals.
|
||||
|
||||
Stale alert instances that are in the **Alerting**, **No Data**, or **Error** states transition to the **Normal** state as **Resolved**. Once transitioned, these resolved alert instances are routed for notifications like other resolved alerts.
|
||||
|
||||
### Modify the no data and error state
|
||||
## Modify the `No Data` or `Error` state
|
||||
|
||||
In [Configure no data and error handling](ref:no-data-and-error-handling), you can change the default behaviour when the evaluation returns no data or an error. You can set the alert instance state to `Alerting`, `Normal`, or keep the last state.
|
||||
|
||||
Note that `No Data` and `Error` states are supported only for Grafana-managed alert rules.
|
||||
|
||||
{{< figure src="/media/docs/alerting/alert-rule-configure-no-data-and-error.png" alt="A screenshot of the `Configure no data and error handling` option in Grafana Alerting." max-width="500px" >}}
|
||||
|
||||
{{< docs/shared lookup="alerts/table-configure-no-data-and-error.md" source="grafana" version="<GRAFANA_VERSION>" >}}
|
||||
|
||||
To reduce the number of **No Data** or **Error** state alerts received, try the following.
|
||||
Note that when you configure the **No Data** or **Error** behavior to `Alerting` or `Normal`, Grafana attempts to keep a stable set of fields under notification `Values`. If your query returns no data or an error, Grafana re-uses the latest known set of fields in `Values`, but will use `-1` in place of the measured value.
|
||||
|
||||
### Reduce `No Data` or `Error` alerts
|
||||
|
||||
To minimize the number of **No Data** or **Error** state alerts received, try the following.
|
||||
|
||||
1. Use the **Keep last state** option. For more information, refer to the section below. This option allows the alert to retain its last known state when there is no data available, rather than switching to a **No Data** state.
|
||||
1. For **No Data** alerts, you can optimize your alert rule by expanding the time range of the query. However, if the time range is too big, it affects the performance of the query and can lead to errors due to timeout.
|
||||
@ -83,15 +104,13 @@ To reduce the number of **No Data** or **Error** state alerts received, try the
|
||||
|
||||
1. Change the default [evaluation time out](https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/#evaluation_timeout). The default is set at 30 seconds. To increase the default evaluation timeout, open a support ticket from the [Cloud Portal](https://grafana.com/docs/grafana-cloud/account-management/support/#grafana-cloud-support-options). Note that this should be a last resort, because it may affect the performance of all alert rules and cause missed evaluations if the timeout is too long.
|
||||
|
||||
Note that when you configure the **No data** or **Error** behavior to `Alerting` or `Normal`, Grafana attempts to keep a stable set of fields under notification `Values`. If your query returns no data or an error, Grafana re-uses the latest known set of fields in `Values`, but will use `-1` in place of the measured value.
|
||||
|
||||
#### Keep last state
|
||||
### Keep last state
|
||||
|
||||
The "Keep Last State" option helps mitigate temporary data source issues, preventing alerts from unintentionally firing, resolving, and re-firing.
|
||||
|
||||
However, in situations where strict monitoring is critical, relying solely on the "Keep Last State" option may not be appropriate. Instead, consider using an alternative or implementing additional alert rules to ensure that issues with prolonged data source disruptions are detected.
|
||||
|
||||
### `grafana_state_reason` annotation
|
||||
## `grafana_state_reason` for troubleshooting
|
||||
|
||||
Occasionally, an alert instance may be in a state that isn't immediately clear to everyone. For example:
|
||||
|
||||
@ -102,21 +121,12 @@ Occasionally, an alert instance may be in a state that isn't immediately clear t
|
||||
|
||||
In these situations, the evaluation state may differ from the alert state, and it might be necessary to understand the reason for being in that state when receiving the notification.
|
||||
|
||||
The `grafana_state_reason` annotation is included in these situations, providing the reason in the notifications that explain why the alert instance transitioned to its current state. For example:
|
||||
The `grafana_state_reason` annotation is included in these situations, providing the reason that explains why the alert instance transitioned to its current state. For example:
|
||||
|
||||
- Stale alert instances in the `Normal` state include the `grafana_state_reason` annotation with the value **MissingSeries**.
|
||||
- If "no data" or "error" handling transitions to the `Normal` state, the `grafana_state_reason` annotation is included with the value **No Data** or **Error**, respectively.
|
||||
- If the alert rule is deleted or paused, the `grafana_state_reason` is set to **Paused** or **RuleDeleted**. For some updates, it is set to **Updated**.
|
||||
|
||||
### Special alerts for `NoData` and `Error`
|
||||
|
||||
When evaluation of an alert rule produces state `NoData` or `Error`, Grafana Alerting generates a new alert instance that have the following additional labels:
|
||||
|
||||
- `alertname`: Either `DatasourceNoData` or `DatasourceError` depending on the state.
|
||||
- `datasource_uid`: The UID of the data source that caused the state.
|
||||
|
||||
You can manage these alerts like regular ones by using their labels to apply actions such as adding a silence, routing via notification policies, and more.
|
||||
|
||||
## Alert rule state
|
||||
|
||||
The alert rule state is determined by the “worst case” state of the alert instances produced. For example, if one alert instance is `Alerting`, the alert rule state is firing.
|
||||
|
@ -6,9 +6,9 @@ title: 'Table configure no data and error'
|
||||
---
|
||||
|
||||
| Configure | Set alert state | Description |
|
||||
| ------------ | --------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| NoData | No Data | The default option for **No Data** events.<br/>Sets alert instance state to `No data`. <br/> The alert rule also creates a new alert instance `DatasourceNoData` with the name and UID of the alert rule, and UID of the datasource that returned no data as labels. |
|
||||
| ---------------- | --------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| No Data | No Data | The default option for **No Data** events.<br/>Sets alert instance state to `No Data`. <br/> The alert rule also creates a new alert instance `DatasourceNoData` with the name and UID of the alert rule, and UID of the datasource that returned no data as labels. |
|
||||
| Error | Error | The default option for **Error** events.<br/>Sets alert instance state to `Error`. <br/> The alert rule also creates a new alert instance `DatasourceError` with the name and UID of the alert rule, and UID of the datasource that returned no data as labels. |
|
||||
| NoData/Error | Alerting | Sets the alert instance state to `Pending` and then transitions to `Alerting` once the pending period ends. If you sent the pending period to 0, the alert instance state is immediately set to `Alerting`. |
|
||||
| NoData/Error | Normal | Sets alert instance state to `Normal`. |
|
||||
| NoData/Error | Keep Last State | Maintains the alert instance in its last state. Useful for mitigating temporary issues. |
|
||||
| No Data or Error | Alerting | Sets the alert instance state to `Pending` and then transitions to `Alerting` once the pending period ends. If you sent the pending period to 0, the alert instance state is immediately set to `Alerting`. |
|
||||
| No Data or Error | Normal | Sets alert instance state to `Normal`. |
|
||||
| No Data or Error | Keep Last State | Maintains the alert instance in its last state. Useful for mitigating temporary issues. |
|
||||
|
Loading…
Reference in New Issue
Block a user