Alerting Docs: Update the introduction to Templates (#93935)

* Intro/Templates: update Intro and Template annotations sections

* Template labels section + adjustments

* Template notifications

* Use diagram for `meta_image`
Pepe Cano 2024-09-30 11:31:18 +02:00 committed by GitHub
parent 66b881ae2f
commit 405887eebf
2 changed files with 169 additions and 145 deletions


@ -36,6 +36,64 @@ Each template is evaluated whenever the alert rule is evaluated, and is evaluate
Extra whitespace in label templates can break matches with notification policies. Extra whitespace in label templates can break matches with notification policies.
{{% /admonition %}} {{% /admonition %}}
## Variables
In Grafana templating, the `$` and `.` symbols are used to reference variables and their properties. You can reference variables directly in your alert rule definitions using the `$` symbol followed by the variable name. Similarly, you can access properties of variables using the dot (`.`) notation within alert rule definitions.
The following variables are available to you when templating labels and annotations:
### The labels variable
The `$labels` variable contains all labels from the query. For example, suppose you have a query that returns CPU usage for all of your servers, and you have an alert rule that fires when any of your servers have exceeded 80% CPU usage for the last 5 minutes. You want to add a summary annotation to the alert that tells you which server is experiencing high CPU usage. With the `$labels` variable you can write a template that prints a human-readable sentence such as:
```
CPU usage for {{ index $labels "instance" }} has exceeded 80% for the last 5 minutes
```
> If you are using a classic condition then `$labels` will not contain any labels from the query. Classic conditions discard these labels in order to enforce uni-dimensional behavior (at most one alert per alert rule). If you want to use labels from the query in your template then use the example [here](#print-all-labels-from-a-classic-condition).
### The value variable
The `$value` variable is a string containing the labels and values of all instant queries; threshold, reduce and math expressions, and classic conditions in the alert rule. It does not contain the results of range queries, as these can return anywhere from 10s to 10,000s of rows or metrics. If it did, for especially large queries a single alert could use 10s of MBs of memory and Grafana would run out of memory very quickly.
To print the `$value` variable in the summary you would write something like this:
```
CPU usage for {{ index $labels "instance" }} has exceeded 80% for the last 5 minutes: {{ $value }}
```
The rendered summary would look something like this:
```
CPU usage for instance1 has exceeded 80% for the last 5 minutes: [ var='A' labels={instance=instance1} value=81.234 ]
```
Here `var='A'` refers to the instant query with Ref ID A, `labels={instance=instance1}` refers to the labels, and `value=81.234` refers to the average CPU usage over the last 5 minutes.
If you want to print just part of the string instead of the full string, use the `$values` variable. It contains the same information as `$value`, but in a structured table, and is much easier to use than writing a regular expression to match just the text you want.
### The values variable
The `$values` variable is a table containing the labels and floating point values of all instant queries and expressions, indexed by their Ref IDs.
To print the value of the instant query with Ref ID A:
```
CPU usage for {{ index $labels "instance" }} has exceeded 80% for the last 5 minutes: {{ index $values "A" }}
```
For example, given an alert with the labels `instance=instance1` and an instant query with the value `81.2345`, this would print:
```
CPU usage for instance1 has exceeded 80% for the last 5 minutes: 81.2345
```
If the query in Ref ID A is a range query rather than an instant query then add a reduce expression with Ref ID B and replace `(index $values "A")` with `(index $values "B")`:
```
CPU usage for {{ index $labels "instance" }} has exceeded 80% for the last 5 minutes: {{ index $values "B" }}
```
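To make the variable lookups above concrete, here is a minimal Go sketch that renders the same annotation template with the standard `text/template` package. The `value` type, the `renderAnnotation` helper, and the preamble that binds `$labels` and `$values` are assumptions made for illustration; they are not Grafana's actual implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"text/template"
)

// value is a hypothetical stand-in for one entry of the $values table.
type value struct {
	Labels map[string]string
	Value  float64
}

// String makes {{ index $values "A" }} print only the number.
// This mirrors the output shown in the docs; it is an assumption
// about how the real type behaves.
func (v value) String() string {
	return strconv.FormatFloat(v.Value, 'f', -1, 64)
}

// renderAnnotation is a hypothetical helper. It binds $labels and
// $values with a preamble, then executes the template text.
func renderAnnotation(tmplText string, labels map[string]string, values map[string]value) (string, error) {
	const preamble = `{{ $labels := .Labels }}{{ $values := .Values }}`
	t, err := template.New("annotation").Parse(preamble + tmplText)
	if err != nil {
		return "", err
	}
	data := struct {
		Labels map[string]string
		Values map[string]value
	}{labels, values}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	labels := map[string]string{"instance": "instance1"}
	values := map[string]value{"A": {Labels: labels, Value: 81.2345}}
	out, err := renderAnnotation(
		`CPU usage for {{ index $labels "instance" }} has exceeded 80% for the last 5 minutes: {{ index $values "A" }}`,
		labels, values)
	if err != nil {
		panic(err)
	}
	// Prints: CPU usage for instance1 has exceeded 80% for the last 5 minutes: 81.2345
	fmt.Println(out)
}
```

Swapping `"A"` for `"B"` in both the template and the `values` map models the reduce-expression case described above.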
## Examples ## Examples
The following examples attempt to show the most common use-cases we have seen for templates. You can use these examples verbatim, or adapt them as necessary for your use case. For more information on how to write text/template, refer to [the beginner's guide to alert notification templates in Grafana](https://grafana.com/blog/2023/04/05/grafana-alerting-a-beginners-guide-to-templating-alert-notifications/). The following examples attempt to show the most common use-cases we have seen for templates. You can use these examples verbatim, or adapt them as necessary for your use case. For more information on how to write text/template, refer to [the beginner's guide to alert notification templates in Grafana](https://grafana.com/blog/2023/04/05/grafana-alerting-a-beginners-guide-to-templating-alert-notifications/).
@ -216,62 +274,6 @@ B2: 84.5678
B3: 95.6789 B3: 95.6789
``` ```
## Variables
The following variables are available to you when templating labels and annotations:
### The labels variable
The `$labels` variable contains all labels from the query. For example, suppose you have a query that returns CPU usage for all of your servers, and you have an alert rule that fires when any of your servers have exceeded 80% CPU usage for the last 5 minutes. You want to add a summary annotation to the alert that tells you which server is experiencing high CPU usage. With the `$labels` variable you can write a template that prints a human-readable sentence such as:
```
CPU usage for {{ index $labels "instance" }} has exceeded 80% for the last 5 minutes
```
> If you are using a classic condition then `$labels` will not contain any labels from the query. Classic conditions discard these labels in order to enforce uni-dimensional behavior (at most one alert per alert rule). If you want to use labels from the query in your template then use the example [here](#print-all-labels-from-a-classic-condition).
### The value variable
The `$value` variable is a string containing the labels and values of all instant queries; threshold, reduce and math expressions, and classic conditions in the alert rule. It does not contain the results of range queries, as these can return anywhere from 10s to 10,000s of rows or metrics. If it did, for especially large queries a single alert could use 10s of MBs of memory and Grafana would run out of memory very quickly.
To print the `$value` variable in the summary you would write something like this:
```
CPU usage for {{ index $labels "instance" }} has exceeded 80% for the last 5 minutes: {{ $value }}
```
The rendered summary would look something like this:
```
CPU usage for instance1 has exceeded 80% for the last 5 minutes: [ var='A' labels={instance=instance1} value=81.234 ]
```
Here `var='A'` refers to the instant query with Ref ID A, `labels={instance=instance1}` refers to the labels, and `value=81.234` refers to the average CPU usage over the last 5 minutes.
If you want to print just part of the string instead of the full string, use the `$values` variable. It contains the same information as `$value`, but in a structured table, and is much easier to use than writing a regular expression to match just the text you want.
### The values variable
The `$values` variable is a table containing the labels and floating point values of all instant queries and expressions, indexed by their Ref IDs.
To print the value of the instant query with Ref ID A:
```
CPU usage for {{ index $labels "instance" }} has exceeded 80% for the last 5 minutes: {{ index $values "A" }}
```
For example, given an alert with the labels `instance=instance1` and an instant query with the value `81.2345`, this would print:
```
CPU usage for instance1 has exceeded 80% for the last 5 minutes: 81.2345
```
If the query in Ref ID A is a range query rather than an instant query then add a reduce expression with Ref ID B and replace `(index $values "A")` with `(index $values "B")`:
```
CPU usage for {{ index $labels "instance" }} has exceeded 80% for the last 5 minutes: {{ index $values "B" }}
```
## Functions ## Functions
The following functions are available to you when templating labels and annotations: The following functions are available to you when templating labels and annotations:


@ -17,148 +17,170 @@ labels:
- enterprise - enterprise
- oss - oss
title: Templates title: Templates
meta_image: /media/docs/alerting/how-notification-templates-works.png
weight: 115 weight: 115
refs: refs:
variables-label-annotation: labels:
- pattern: /docs/grafana/
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/alert-rules/annotation-label/#labels
- pattern: /docs/grafana-cloud/
destination: /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/alert-rules/annotation-label/#labels
annotations:
- pattern: /docs/grafana/
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/alert-rules/annotation-label/#annotations
- pattern: /docs/grafana-cloud/
destination: /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/alert-rules/annotation-label/#annotations
templating-labels-annotations:
- pattern: /docs/grafana/ - pattern: /docs/grafana/
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/alerting-rules/templating-labels-annotations/ destination: /docs/grafana/<GRAFANA_VERSION>/alerting/alerting-rules/templating-labels-annotations/
- pattern: /docs/grafana-cloud/ - pattern: /docs/grafana-cloud/
destination: /docs/grafana-cloud/alerting-and-irm/alerting/alerting-rules/templating-labels-annotations/ destination: /docs/grafana-cloud/alerting-and-irm/alerting/alerting-rules/templating-labels-annotations/
notification-message-reference:
- pattern: /docs/grafana/
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/configure-notifications/template-notifications/reference/
- pattern: /docs/grafana-cloud/
destination: /docs/grafana-cloud/alerting-and-irm/alerting/configure-notifications/template-notifications/reference/
notification-messages:
- pattern: /docs/grafana/
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/configure-notifications/template-notifications/
- pattern: /docs/grafana-cloud/
destination: /docs/grafana-cloud/alerting-and-irm/alerting/configure-notifications/template-notifications/
create-notification-templates:
- pattern: /docs/grafana/
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/configure-notifications/template-notifications/create-notification-templates/
- pattern: /docs/grafana-cloud/
destination: /docs/grafana-cloud/alerting-and-irm/alerting/configure-notifications/template-notifications/create-notification-templates/
--- ---
# Templates # Templates
Use templating to customize, format, and reuse alert notification messages. Create more flexible and informative alert notification messages by incorporating dynamic content, such as metric values, labels, and other contextual information. Use templating to customize, format, and reuse alert notification messages. Create more flexible and informative alert notification messages by incorporating dynamic content, such as metric values, labels, and other contextual information.
In Grafana, there are two ways to template your alert notification messages: In Grafana, you have various options to template your alert notification messages:
1. Labels and annotations 1. [Alert rule annotations](#template-annotations)
- Template labels and annotations in alert rules. - Annotations add extra information, like `summary` and `description`, to alert instances for notification messages.
- Labels and annotations contain information about an alert. - Template annotations to display query values that are meaningful to the alert, for example, the server name or the threshold query value.
- Labels are used to differentiate an alert from all other alerts, while annotations are used to add additional information to an existing alert.
2. Notification templates 1. [Alert rule labels](#template-labels)
- Template notifications in contact points. - Labels are used to differentiate an alert instance from all other alert instances.
- Add notification templates to contact points for reuse and consistent messaging in your notifications. - Template labels to add an additional label based on a query value, or when the labels from the query are incomplete or not descriptive enough.
- Use notification templates to change the title, message, and format of the message in your notifications.
This diagram illustrates the entire process of templating, from the creation of labels and annotations in alert rules or notification templates in contact points, to what they look like when exported and applied in your alert notification messages. 1. [Notification templates](#template-notifications)
- Notification templates are used by contact points for consistent messaging in notification titles and descriptions.
- Template notifications when you want to customize the appearance and information of your notifications.
- Avoid using notification templates to add extra information to alert instances—use annotations instead.
{{< figure src="/media/docs/alerting/grafana-templating-diagram-2.jpg" max-width="1200px" caption="How Templating works" >}} This diagram illustrates the entire templating process, from querying labels and templating the alert summary and notification to the final alert notification message.
{{< figure src="/media/docs/alerting/how-notification-templates-works.png" max-width="1200px" caption="How templating works" >}}
In this diagram: In this diagram:
- **Monitored Application**: A web server, database, or any other service generating metrics. For example, it could be an NGINX server providing metrics about request rates, response times, and so on. 1. The alert rule query returns `12345`, along with the values of the `instance` and `job` labels.
- **Prometheus**: Prometheus collects metrics from the monitored application. For example, it might scrape metrics from the NGINX server, including labels like instance (the server hostname) and job (the service name). 1. This query result breaches the alert rule condition, firing the alert instance.
- **Grafana**: Grafana queries Prometheus to retrieve metrics data. For example, you might create an alert rule to monitor NGINX request rates over time, and template labels or annotations based on the instance label. 1. The alert instance generates an annotation summary, defined by the template used in the alert rule summary. In this case, it displays the value of the `instance` label: `server1`.
- **Alertmanager**: Part of the Prometheus ecosystem, Alertmanager handles alert notifications. For example, if the request rate exceeds a certain threshold on a particular NGINX server, Alertmanager can send an alert notification to, for example, Slack or email, including the server name and the exceeded threshold (the instance label will be interpolated, and the actual server name will appear in the alert notification). 1. The Alertmanager receives the firing alert instance, including the final annotation summary, and determines the contact point that will process the alert.
- **Alert notification**: When an alert rule condition is met, Alertmanager sends a notification to various channels such as Slack, Grafana OnCall, etc. These notifications can include information from the labels associated with the alerting rule. For example, if an alert triggers due to high CPU usage on a specific server, the notification message can include details like server name (instance label), disk usage percentage, and the threshold that was exceeded. 1. The Alertmanager uses the contact point's notification template to format the message, then sends the notification to the configured destination(s)—an email address.
## Labels and annotations ## Template annotations
Labels and annotations contain information about an alert. Labels are used to differentiate an alert from all other alerts, while annotations are used to add additional information to an existing alert. [Annotations](ref:annotations) can be defined in the alert rule to add extra information to alert instances.
### Template labels When creating an alert rule, Grafana suggests several optional annotations, such as `description`, `summary`, `runbook_url`, `dashboardUId` and `panelId`, which help identify and respond to alerts. You can also create custom annotations.
Label templates are applied in the alert rule itself (i.e. in the Configure labels and notifications section of an alert). Annotations are key-value pairs, and their values can contain a combination of text and template code that is evaluated when the alert fires.
{{<admonition type="note">}} Annotations can contain plain text, but you should template annotations if you need to display query values that are relevant to the alert, for example:
Think about templating labels when you need to improve or change how alerts are uniquely identified. This is especially helpful if the labels you get from your query aren't detailed enough. Keep in mind that it's better to keep long sentences for summaries and descriptions. Also, avoid using the query's value in labels because it may result in the creation of many alerts when you actually only need one.
{{</admonition>}}
Templating can be applied by using variables and functions. These variables can represent dynamic values retrieved from your data queries. - Show the query value that triggers the alert.
- Include labels returned by the query that identify the alert.
- Format the annotation message depending on a query value.
{{<admonition type="note">}} Here's an example of templating an annotation, which explains where and why the alert was triggered. In this case, the alert triggers when CPU usage exceeds a threshold, and the `summary` annotation provides the relevant details.
In Grafana templating, the $ and . symbols are used to reference variables and their properties. You can reference variables directly in your alert rule definitions using the $ symbol followed by the variable name. Similarly, you can access properties of variables using the dot (.) notation within alert rule definitions.
{{</admonition>}}
Here are some commonly used built-in [variables](ref:variables-label-annotation) to interact with the name and value of labels in Grafana alerting: ```
CPU usage for {{ index $labels "instance" }} has exceeded 80% ({{ index $values "A" }}) for the last 5 minutes.
```
- The `$labels` variable, which contains all labels from the query. The outcome of this template would be:
For example, let's say you have an alert rule that triggers when the CPU usage exceeds a certain threshold. You want to create annotations that provide additional context when this alert is triggered, such as including the specific server that experienced the high CPU usage. ```
CPU usage for Instance 1 has exceeded 80% (81.2345) for the last 5 minutes.
```
The host {{ index $labels "instance" }} has exceeded 80% CPU usage for the last 5 minutes Implement annotations that provide meaningful information to respond to your alerts. Annotations are displayed in the Grafana alert detail view and are included by default in notifications.
The outcome of this template would print: For more details on how to template annotations, refer to [Template annotations and labels](ref:templating-labels-annotations).
The host instance 1 has exceeded 80% CPU usage for the last 5 minutes ## Template labels
- The `$value` variable, which is a string containing the labels and values of all instant queries; threshold, reduce and math expressions, and classic conditions in the alert rule. [Labels](ref:labels) are used to differentiate one alert instance from all other alert instances, as the set of labels uniquely identifies an alert instance. Notification policies and silences use labels to handle alert instances.
In the context of the previous example, $value variable would write something like this: Template labels when you need to improve or change how alerts are uniquely identified. This is helpful if the labels you get from your query aren't detailed enough.
CPU usage for {{ index $labels "instance" }} has exceeded 80% for the last 5 minutes: {{ $value }} Here's an example of templating a `severity` label based on the query value:
The outcome of this template would print: ```
{{ if (gt $values.A.Value 90.0) -}}
critical
{{ else if (gt $values.A.Value 80.0) -}}
high
{{ else if (gt $values.A.Value 60.0) -}}
medium
{{ else -}}
low
{{- end }}
```
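Because extra whitespace in label templates can break matches with notification policies, it helps to check what a label template actually renders. The following Go sketch runs the `severity` template above through the standard `text/template` package and compares it with a variant that trims whitespace on every action. The `value` type, the `renderLabel` helper, and the `$values` preamble are assumptions for illustration, and whether Grafana trims the rendered label itself is not covered here.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// value is a hypothetical stand-in for one entry of the $values table.
type value struct {
	Value float64
}

// original is the severity template exactly as written in the docs.
const original = `{{ if (gt $values.A.Value 90.0) -}}
critical
{{ else if (gt $values.A.Value 80.0) -}}
high
{{ else if (gt $values.A.Value 60.0) -}}
medium
{{ else -}}
low
{{- end }}`

// trimmed adds "{{-" to every action so no newline survives.
const trimmed = `{{- if gt $values.A.Value 90.0 -}}critical
{{- else if gt $values.A.Value 80.0 -}}high
{{- else if gt $values.A.Value 60.0 -}}medium
{{- else -}}low
{{- end -}}`

// renderLabel is a hypothetical helper that binds $values and
// renders the template for a single query value with Ref ID A.
func renderLabel(tmplText string, a float64) (string, error) {
	t, err := template.New("label").Parse(`{{ $values := .Values }}` + tmplText)
	if err != nil {
		return "", err
	}
	data := struct{ Values map[string]value }{
		Values: map[string]value{"A": {Value: a}},
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	for _, v := range []float64{95, 85, 50} {
		o, err := renderLabel(original, v)
		if err != nil {
			panic(err)
		}
		tr, err := renderLabel(trimmed, v)
		if err != nil {
			panic(err)
		}
		// %q makes trailing newlines visible, e.g. "critical\n" vs "critical".
		fmt.Printf("value=%v original=%q trimmed=%q\n", v, o, tr)
	}
}
```

In plain `text/template`, the original form leaves a trailing newline after `critical`, `high`, and `medium`, because only the `else` branch is followed by a left-trimming `{{-`. The trimmed variant renders the bare word in every branch.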
CPU usage for instance1 has exceeded 80% for the last 5 minutes: [ var='A' labels={instance=instance1} value=81.234 ] Avoid using query values in labels, as this may result in the creation of numerous alerts when only one is needed. Use annotations to report the query value instead.
- The `$values` variable is a table containing the labels and floating point values of all instant queries and expressions, indexed by their Ref IDs (i.e. the ID that identifies the query or expression. By default the Ref ID of the query is “A”). For more details on how to template labels, refer to [Template annotations and labels](ref:templating-labels-annotations).
Given an alert with the labels instance=server1 and an instant query with the value 81.2345, would write like this: ## Template notifications
CPU usage for {{ index $labels "instance" }} has exceeded 80% for the last 5 minutes: {{ index $values "A" }} [Notification templates](ref:notification-messages) allow you to customize the content of your notifications, such as the subject of an email or the body of a Slack message.
And it would print: Notification templates differ from templating annotations and labels in the following ways:
CPU usage for instance1 has exceeded 80% for the last 5 minutes: 81.2345 - Notification templates are assigned to the **Contact point**, rather than the alert rule.
- If not specified, the contact point uses a default template that includes relevant alert information.
- You can create reusable notification templates and reference them in other templates.
- The same template can be shared across multiple contact points, making it easier to maintain and ensuring consistency.
- While both annotation/label templates and notification templates use the same templating language, the available variables and functions differ. For more details, refer to the [notification template reference](ref:notification-message-reference) and [annotation/label template reference](ref:templating-labels-annotations).
- Notification templates should not be used to add additional information to individual alerts—use annotations for that purpose.
{{% admonition type="caution" %}} Here is an example of a notification template that summarizes all firing and resolved alerts in a notification group:
Extra whitespace in label templates can break matches with notification policies.
{{% /admonition %}}
### Template annotations ```
Both labels and annotations have the same structure: a set of named values; however their intended uses are different. The purpose of annotations is to add additional information to existing alerts.
There are a number of suggested annotations in Grafana such as `description`, `summary`, `runbook_url`, `dashboardUId` and `panelId`. Like labels, annotations must have a name, and their value can contain a combination of text and template code that is evaluated when an alert is fired.
Here is an example of templating an annotation in the context of an alert rule. The text/template is added into the Add annotations section.
CPU usage for {{ index $labels "instance" }} has exceeded 80% for the last 5 minutes
The outcome of this template would print
CPU usage for Instance 1 has exceeded 80% for the last 5 minutes
### Template notifications
Notification templates represent the alternative approach to templating designed for reusing templates. Notifications are messages to inform users about events or conditions triggered by alerts. You can create reusable notification templates to customize the content and format of alert notifications. Variables, labels, or other context-specific details can be added to the templates to dynamically insert information like metric values.
Here is an example of a notification template:
```go
{{ define "alerts.message" -}} {{ define "alerts.message" -}}
{{ if .Alerts.Firing -}} {{ if .Alerts.Firing -}}
{{ len .Alerts.Firing }} firing alert(s) {{ len .Alerts.Firing }} firing alert(s)
{{ template "alerts.summarize" .Alerts.Firing }} {{ template "alerts.summarize" .Alerts.Firing }}
{{- end }} {{- end }}
{{- if .Alerts.Resolved -}} {{- if .Alerts.Resolved -}}
{{ len .Alerts.Resolved }} resolved alert(s) {{ len .Alerts.Resolved }} resolved alert(s)
{{ template "alerts.summarize" .Alerts.Resolved }} {{ template "alerts.summarize" .Alerts.Resolved }}
{{- end }} {{- end }}
{{- end }} {{- end }}
{{ define "alerts.summarize" -}} {{ define "alerts.summarize" -}}
{{ range . -}} {{ range . -}}
- {{ index .Annotations "summary" }} - {{ index .Annotations "summary" }}
{{ end }} {{ end }}
{{ end }} {{ end }}
``` ```
This is the message you would receive in your contact point: The notification message to the contact point would look like this:
1 firing alert(s) ```
- The database server db1 has exceeded 75% of available disk space. Disk space used is 76%, please resize the disk size within the next 24 hours 1 firing alert(s)
- The database server db1 has exceeded 75% of available disk space. Disk space used is 76%, please resize the disk size within the next 24 hours.
1 resolved alert(s) 1 resolved alert(s)
- The web server web1 has been responding to 5% of HTTP requests with 5xx errors for the last 5 minutes - The web server web1 has been responding to 5% of HTTP requests with 5xx errors for the last 5 minutes.
```
Once the template is created, you need to make reference to it in your **Contact point** (in the Optional `[contact point]` settings). For instructions on creating and using notification templates, refer to [Create notification templates](ref:create-notification-templates).
{{<admonition type="note">}}
It's not recommended to include individual alert information within notification templates. Instead, it's more effective to incorporate such details within the rule using labels and annotations.
{{</admonition>}}