Alerting: Improve the docs on templating labels and annotations (#76593)

The previous version of this page I wrote with the expectation that
readers would first learn the templating language and then write
their templates. This doesn't seem to have worked out as well as I
had expected, and so I've rewritten the documentation to explain
the language using relevant and useful examples instead.
George Robinson 2023-10-23 09:43:13 +01:00 committed by GitHub
parent 4b6b3b7018
commit e743aa54b8

@ -18,153 +18,255 @@ weight: 117
# Templating labels and annotations
In Grafana you template labels and annotations just like you would in Prometheus. If you have used Prometheus before then you should be familiar with the `$labels` and `$value` variables which contain the labels and value of the alert. You can use the same variables in Grafana, even if the alert does not use a Prometheus datasource. If you haven't used Prometheus before then don't worry as each of these variables, and how to template them, will be explained as you follow the rest of this page.
You can use templates to include data from queries and expressions in labels and annotations. For example, you might want to set the severity label for an alert based on the value of the query, or use the instance label from the query in a summary annotation so you know which server is experiencing high CPU usage.
## Go's templating language
All templates should be written in [text/template](https://pkg.go.dev/text/template). Regardless of whether you are templating a label or an annotation, you should write each template inline inside the label or annotation that you are templating. This means you cannot share templates between labels and annotations, and instead you will need to copy templates wherever you want to use them.
Templates for labels and annotations are written in Go's templating language, [text/template](https://pkg.go.dev/text/template).
Each template is evaluated whenever the alert rule is evaluated, and is evaluated for every alert separately. For example, if your alert rule has a templated summary annotation, and the alert rule has 10 firing alerts, then the template will be executed 10 times, once for each alert. You should try to avoid doing expensive computations in your templates as much as possible.
### Opening and closing tags
## Examples
In text/template, templates start with `{{` and end with `}}` irrespective of whether the template prints a variable or executes control structures such as if statements. This is different from other templating languages such as Jinja where printing a variable uses `{{` and `}}` and control structures use `{%` and `%}`.
Rather than write a complete tutorial on text/template, the following examples attempt to show the most common use-cases we have seen for templates. You can use these examples verbatim, or adapt them as necessary for your use case. For more information on how to write text/template refer to the [text/template](https://pkg.go.dev/text/template) documentation.
### Print
### Print all labels, comma separated
To print the value of something use `{{` and `}}`. You can print the result of a function or the value of a variable. For example, to print the `$labels` variable you would write the following:
To print all labels, comma separated, print the `$labels` variable:
```
{{ $labels }}
```
### Iterate over labels
To iterate over each label in `$labels` you can use a `range`. Here `$k` refers to the name and `$v` refers to the value of the current label. For example, if your query returned a label `instance=test` then `$k` would be `instance` and `$v` would be `test`.
For example, given an alert with the labels `alertname=High CPU usage`, `grafana_folder=CPU alerts` and `instance=server1`, this would print:
```
{{ range $k, $v := $labels }}
alertname=High CPU usage, grafana_folder=CPU alerts, instance=server1
```
> If you are using classic conditions then `$labels` will not contain any labels from the query. Refer to [the $labels variable](#the-labels-variable) for more information.
### Print all labels, one per line
To print all labels, one per line, use a `range` to iterate over each key/value pair and print them individually. Here `$k` refers to the name and `$v` refers to the value of the current label:
```
{{ range $k, $v := $labels -}}
{{ $k }}={{ $v }}
{{ end }}
```
## The labels, value and values variables
For example, given an alert with the labels `alertname=High CPU usage`, `grafana_folder=CPU alerts` and `instance=server1`, this would print:
```
alertname=High CPU usage
grafana_folder=CPU alerts
instance=server1
```
> If you are using classic conditions then `$labels` will not contain any labels from the query. Refer to [the $labels variable](#the-labels-variable) for more information.
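The whitespace trimming in the template above can be checked outside Grafana with a small Go program. This is a minimal sketch, not Grafana's implementation: the `alertData` type, the `renderLabels` helper and the `prelude` that binds `$labels` to `.Labels` are all assumptions made so the template body can be run with plain text/template.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// alertData is a hypothetical stand-in for the data a template receives.
type alertData struct {
	Labels map[string]string
}

// prelude binds $labels to .Labels so the body can use the same variable
// name as the documentation. This binding is an assumption of the sketch.
const prelude = `{{- $labels := .Labels -}}`

// renderLabels executes body against the given labels and returns the output.
func renderLabels(body string, labels map[string]string) (string, error) {
	tmpl, err := template.New("labels").Parse(prelude + body)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, alertData{Labels: labels}); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// The "-}}" after range trims the newline that follows it, so each
	// iteration prints exactly one "name=value" line.
	body := "{{ range $k, $v := $labels -}}\n{{ $k }}={{ $v }}\n{{ end }}"
	out, err := renderLabels(body, map[string]string{
		"instance":       "server1",
		"alertname":      "High CPU usage",
		"grafana_folder": "CPU alerts",
	})
	if err != nil {
		panic(err)
	}
	// text/template visits string map keys in sorted order.
	fmt.Print(out)
}
```

Because text/template iterates over maps with string keys in sorted key order, the labels are printed alphabetically by name regardless of the order in which they were added to the map.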
### Print an individual label
To print an individual label use the `index` function with the `$labels` variable:
```
The host {{ index $labels "instance" }} has exceeded 80% CPU usage for the last 5 minutes
```
For example, given an alert with the labels `instance=server1`, this would print:
```
The host server1 has exceeded 80% CPU usage for the last 5 minutes
```
> If you are using classic conditions then `$labels` will not contain any labels from the query. Refer to [the $labels variable](#the-labels-variable) for more information.
### Print the value of a query
To print the value of an instant query you can print its Ref ID using the `index` function and the `$values` variable:
```
{{ index $values "A" }}
```
For example, given an instant query that returns the value 81.2345, this will print:
```
81.2345
```
To print the value of a range query you must first reduce it from a time series to an instant vector with a reduce expression. You can then print the result of the reduce expression by using its Ref ID instead. For example, if the reduce expression takes the average of A and has the Ref ID B you would write:
```
{{ index $values "B" }}
```
### Print the humanized value of a query
To print the humanized value of an instant query use the `humanize` function:
```
{{ humanize (index $values "A").Value }}
```
For example, given an instant query that returns the value 81.2345, this will print:
```
81.23
```
To print the humanized value of a range query you must first reduce it from a time series to an instant vector with a reduce expression. You can then print the result of the reduce expression by using its Ref ID instead. For example, if the reduce expression takes the average of A and has the Ref ID B you would write:
```
{{ humanize (index $values "B").Value }}
```
### Print the value of a query as a percentage
To print the value of an instant query as a percentage use the `humanizePercentage` function:
```
{{ humanizePercentage (index $values "A").Value }}
```
This function expects the value to be a decimal number between 0 and 1. If the value is instead a decimal number between 0 and 100 you can divide it by 100, either in your query or using a math expression. If the query is a range query you must first reduce it from a time series to an instant vector with a reduce expression.
### Set a severity from the value of a query
To set a severity label from the value of a query use an if statement and the greater than comparison function. Make sure to use decimals (`80.0`, `50.0`, `0.0`, etc) when doing comparisons against `$values` as text/template does not support type coercion. You can find a list of all the supported comparison functions [here](https://pkg.go.dev/text/template#hdr-Functions).
```
{{ if (gt $values.A.Value 80.0) -}}
high
{{ else if (gt $values.A.Value 50.0) -}}
medium
{{ else -}}
low
{{- end }}
```
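The advice about decimals can be reproduced with plain text/template. The following is a minimal sketch under stated assumptions: the `queryValue` and `alertData` types, the `render` helper and the `prelude` that binds `$values` are hypothetical stand-ins, and the template is written on one line to keep whitespace out of the result.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// queryValue and alertData are hypothetical stand-ins for the data a
// template receives; only the float64 Value field matters here.
type queryValue struct {
	Value float64
}

type alertData struct {
	Values map[string]queryValue
}

// prelude binds $values to .Values. This binding is an assumption.
const prelude = `{{- $values := .Values -}}`

// floatBody compares against decimal constants, intBody against an integer.
const floatBody = `{{ if gt $values.A.Value 80.0 }}high{{ else if gt $values.A.Value 50.0 }}medium{{ else }}low{{ end }}`
const intBody = `{{ if gt $values.A.Value 80 }}high{{ else }}low{{ end }}`

// render executes body with a single value stored under Ref ID A.
func render(body string, value float64) (string, error) {
	tmpl, err := template.New("severity").Parse(prelude + body)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	data := alertData{Values: map[string]queryValue{"A": {Value: value}}}
	if err := tmpl.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	for _, v := range []float64{81.2345, 65.0, 12.5} {
		out, err := render(floatBody, v)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%v => %s\n", v, out)
	}
	// Comparing a float64 with the integer constant 80 fails when the
	// template is executed, not when it is parsed.
	if _, err := render(intBody, 81.2345); err != nil {
		fmt.Println("integer comparison failed:", err)
	}
}
```

The integer variant parses without complaint and only fails once it is executed, which is why a template can appear valid and still break when the alert rule is evaluated.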
### Print all labels from a classic condition
You cannot use `$labels` to print labels from the query if you are using classic conditions, and must use `$values` instead. The reason for this is classic conditions discard these labels to enforce uni-dimensional behavior (at most one alert per alert rule). If classic conditions didn't discard these labels, then queries that returned many time series would cause alerts to flap between firing and resolved constantly as the labels would change every time the alert rule was evaluated.
Instead, the `$values` variable contains the reduced values of all time series for all conditions that are firing. For example, if you have an alert rule with a query A that returns two time series, and a classic condition B with two conditions, then `$values` would contain `B0`, `B1`, `B2` and `B3`. If the classic condition B had just one condition, then `$values` would contain just `B0` and `B1`.
To print all labels of all firing time series use the following template (make sure to replace `B` in the regular expression with the Ref ID of the classic condition if it's different):
```
{{ range $k, $v := $values -}}
{{ if (match "B[0-9]+" $k) -}}
{{ $k }}: {{ $v.Labels }}{{ end }}
{{ end }}
```
For example, a classic condition for two time series exceeding a single condition would print:
```
B0: instance=server1
B1: instance=server2
```
If the classic condition has two or more conditions, and a time series exceeds multiple conditions at the same time, then its labels will be duplicated for each condition that is exceeded:
```
B0: instance=server1
B1: instance=server2
B2: instance=server1
B3: instance=server2
```
If you need to print unique labels you should consider changing your alert rules from uni-dimensional to multi-dimensional instead. You can do this by replacing your classic condition with reduce and math expressions.
### Print all values from a classic condition
To print all values from a classic condition take the previous example and replace `$v.Labels` with `$v.Value`:
```
{{ range $k, $v := $values -}}
{{ if (match "B[0-9]+" $k) -}}
{{ $k }}: {{ $v.Value }}{{ end }}
{{ end }}
```
For example, a classic condition for two time series exceeding a single condition would print:
```
B0: 81.2345
B1: 84.5678
```
If the classic condition has two or more conditions, and a time series exceeds multiple conditions at the same time, then `$values` will contain the values of all conditions:
```
B0: 81.2345
B1: 92.3456
B2: 84.5678
B3: 95.6789
```
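The filtering done by the classic condition templates above can be tried out with plain text/template. This is a sketch, not Grafana's code: `match` is not built into text/template, so it is registered here as a stand-in backed by `regexp.MatchString`, and the `queryValue` and `alertData` types, the `firingValues` helper and the `prelude` are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"text/template"
)

// queryValue and alertData are hypothetical stand-ins for template data.
type queryValue struct {
	Value float64
}

type alertData struct {
	Values map[string]queryValue
}

// prelude binds $values to .Values. This binding is an assumption.
const prelude = `{{- $values := .Values -}}`

// body is the template from the documentation, unchanged.
const body = `{{ range $k, $v := $values -}}
{{ if (match "B[0-9]+" $k) -}}
{{ $k }}: {{ $v.Value }}{{ end }}
{{ end }}`

// firingValues renders body, printing only entries whose key matches B<n>.
func firingValues(values map[string]queryValue) (string, error) {
	// match is a stand-in for the function of the same name; it takes the
	// pattern first and the text second, like regexp.MatchString.
	funcs := template.FuncMap{"match": regexp.MatchString}
	tmpl, err := template.New("classic").Funcs(funcs).Parse(prelude + body)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, alertData{Values: values}); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	out, err := firingValues(map[string]queryValue{
		"B0": {Value: 81.2345},
		"B1": {Value: 84.5678},
		"C0": {Value: 12.5},
	})
	if err != nil {
		panic(err)
	}
	// C0 does not match the pattern, so only the newline after the inner
	// end is emitted for it.
	fmt.Print(out)
}
```

In this sketch a key that does not match still contributes an empty line, because the newline before the outer `{{ end }}` is not trimmed. Also note that the pattern is unanchored, so it matches any key that contains `B` followed by digits.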
## Variables
The following variables are available to you when templating labels and annotations:
### The labels variable
The `$labels` variable contains the labels from the query. For example, a query that checks if an instance is down might return an instance label with the name of the instance that is down. Suppose you have an alert rule that fires when one of your instances has been down for more than 5 minutes, and you want to add a summary to the alert that tells you which instance is down. With the `$labels` variable, you can create a summary that prints the instance label:
The `$labels` variable contains all labels from the query. For example, suppose you have a query that returns CPU usage for all of your servers, and you have an alert rule that fires when any of your servers have exceeded 80% CPU usage for the last 5 minutes. You want to add a summary annotation to the alert that tells you which server is experiencing high CPU usage. With the `$labels` variable you can write a template that prints a human-readable sentence such as:
```
Instance {{ $labels.instance }} has been down for more than 5 minutes
CPU usage for {{ index $labels "instance" }} has exceeded 80% for the last 5 minutes
```
### Labels with dots
If the label you want to print contains a dot (full stop or period) in its name using the same dot in the template will not work:
```
Instance {{ $labels.instance.name }} has been down for more than 5 minutes
```
This is because the template is attempting to use a non-existing field called `name` in `$labels.instance`. You should instead use the `index` function, which prints the label `instance.name` in the `$labels` variable:
```
Instance {{ index $labels "instance.name" }} has been down for more than 5 minutes
```
> If you are using a classic condition then `$labels` will not contain any labels from the query. Classic conditions discard these labels in order to enforce uni-dimensional behavior (at most one alert per alert rule). If you want to use labels from the query in your template then use the example [here](#print-all-labels-from-a-classic-condition).
### The value variable
The `$value` variable works differently from Prometheus. In Prometheus `$value` is a floating point number containing the value of the expression, but in Grafana it is a string containing the labels and values of all Threshold, Reduce and Math expressions, and Classic Conditions for this alert rule. It does not contain the results of queries, as these can return anywhere from 10s to 10,000s of rows or metrics.
The `$value` variable is a string containing the labels and values of all instant queries, threshold, reduce and math expressions, and classic conditions in the alert rule. It does not contain the results of range queries, as these can return anywhere from 10s to 10,000s of rows or metrics. If it did, for especially large queries a single alert could use 10s of MBs of memory and Grafana would run out of memory very quickly.
If you were to use the `$value` variable in the summary of an alert:
To print the `$value` variable in the summary you would write something like this:
```
{{ $labels.service }} has over 5% of responses with 5xx errors: {{ $value }}
CPU usage for {{ index $labels "instance" }} has exceeded 80% for the last 5 minutes: {{ $value }}
```
The summary might look something like the following:
And would look something like this:
```
api has over 5% of responses with 5xx errors: [ var='B' labels={service=api} value=6.789 ]
CPU usage for instance1 has exceeded 80% for the last 5 minutes: [ var='A' labels={instance=instance1} value=81.234 ]
```
Here `var='B'` refers to the expression with the RefID B. In Grafana, each query and expression in an alert rule is identified by a RefID. Similarly `labels={service=api}` refers to the labels, and `value=6.789` refers to the value.
Here `var='A'` refers to the instant query with Ref ID A, `labels={instance=instance1}` refers to the labels, and `value=81.234` refers to the average CPU usage over the last 5 minutes.
You might have observed that there is no RefID A. That is because in most alert rules the RefID A refers to a query, and since queries can return many rows or time series they are not included in `$value`.
If you want to print just some of the string instead of the full string then use the `$values` variable. It contains the same information as `$value`, but in a structured table, and is much easier to use than writing a regular expression to match just the text you want.
### The values variable
If the `$value` variable contains more information than you need, you can instead print the labels and value of individual expressions using `$values`. Unlike `$value`, the `$values` variable is a table of objects containing the labels and floating point values of each expression, indexed by their RefID.
The `$values` variable is a table containing the labels and floating point values of all instant queries and expressions, indexed by their Ref IDs.
If you were to print the value of the expression with RefID `B` in the summary of the alert:
To print the value of the instant query with Ref ID A:
```
{{ $labels.service }} has over 5% of responses with 5xx errors: {{ $values.B }}%
CPU usage for {{ index $labels "instance" }} has exceeded 80% for the last 5 minutes: {{ index $values "A" }}
```
The summary will contain just the value:
For example, given an alert with the labels `instance=server1` and an instant query with the value `81.2345`, this would print:
```
api has over 5% of responses with 5xx errors: 6.78912%
CPU usage for server1 has exceeded 80% for the last 5 minutes: 81.2345
```
However, while `{{ $values.B }}` prints the number 6.78912, it is actually a string as you are printing the object that contains both the labels and value for RefID B, not the floating point value of B. To use the floating point value of RefID B you must use the `Value` field from `$values.B`.
If you were to print the humanized floating point value in the summary of an alert:
If the query in Ref ID A is a range query rather than an instant query then add a reduce expression with Ref ID B and replace `index $values "A"` with `index $values "B"`:
```
{{ $labels.service }} has over 5% of responses with 5xx errors: {{ humanize $values.B.Value }}%
```
The summary will contain the humanized value:
```
api has over 5% of responses with 5xx errors: 6.789%
```
You can also compare the floating point value using the `eq`, `ne`, `lt`, `le`, `gt` and `ge` comparison operators:
```
{{ if gt $values.B.Value 50.0 -}}
Critical 5xx error rate
{{ else -}}
Elevated 5xx error rate
{{ end }}
```
When using comparison operators with `$values` make sure to compare it to a floating point number such as `50.0` and not an integer such as `50`. Go templates do not support implicit type coercion, and comparing a floating point number to an integer will break your template.
### No data, execution errors and timeouts
If the query in your alert rule returns no data, or fails because of a datasource error or timeout, then any Threshold, Reduce or Math expressions that use that query will also return no data or an error. When this happens these expressions will be absent from `$values`. It is good practice to check that a RefID is present before using it, as otherwise your template will break should your query return no data or an error. You can do this using an if statement:
```
{{ if $values.B }}{{ $labels.service }} has over 5% of responses with 5xx errors: {{ humanizePercentage $values.B.Value }}{{ end }}
```
## Classic Conditions
If the rule uses Classic Conditions instead of Threshold, Reduce and Math expressions, then the `$values` variable is indexed by both the Ref ID and position of the condition in the Classic Condition. For example, if you have a Classic Condition with RefID B containing two conditions, then `$values` will contain two conditions `B0` and `B1`.
```
The first condition is {{ $values.B0 }}, and the second condition is {{ $values.B1 }}
```
With classic conditions, labels from the query are not available in the `$labels` variable, because only a single alert instance is generated. Instead, you can retrieve the labels from the `$values` variable.
```
{{ range $k, $v := $values }}
The value is {{ $v }} and the labels are {{ $v.Labels }}
{{ end }}
CPU usage for {{ index $labels "instance" }} has exceeded 80% for the last 5 minutes: {{ index $values "B" }}
```
## Functions
The following functions are also available when expanding labels and annotations:
The following functions are available to you when templating labels and annotations:
### args
The `args` function translates a list of objects to a map with keys arg0, arg1 etc. This is intended to allow multiple arguments to be passed to templates.
#### Example
The `args` function translates a list of objects to a map with keys arg0, arg1 etc. This is intended to allow multiple arguments to be passed to templates:
```
{{define "x"}}{{.arg0}} {{.arg1}}{{end}}{{template "x" (args 1 "2")}}
@ -176,9 +278,7 @@ The `args` function translates a list of objects to a map with keys arg0, arg1 e
### externalURL
The `externalURL` function returns the external URL of the Grafana server as configured in the ini file(s).
#### Example
The `externalURL` function returns the external URL of the Grafana server as configured in the ini file(s):
```
{{ externalURL }}
@ -190,9 +290,7 @@ https://example.com/grafana
### graphLink
The `graphLink` function returns the path to the graphical view in [Explore][explore] for the given expression and data source.
#### Example
The `graphLink` function returns the path to the graphical view in [Explore][explore] for the given expression and data source:
```
{{ graphLink "{\"expr\": \"up\", \"datasource\": \"gdev-prometheus\"}" }}
@ -204,9 +302,7 @@ The `graphLink` function returns the path to the graphical view in [Explore][exp
### humanize
The `humanize` function humanizes decimal numbers.
#### Example
The `humanize` function humanizes decimal numbers:
```
{{ humanize 1000.0 }}
@ -218,9 +314,7 @@ The `humanize` function humanizes decimal numbers.
### humanize1024
The `humanize1024` function works similarly to `humanize` but uses 1024 as the base rather than 1000.
#### Example
The `humanize1024` function works similarly to `humanize` but uses 1024 as the base rather than 1000:
```
{{ humanize1024 1024.0 }}
@ -232,9 +326,7 @@ The `humanize1024` works similar to `humanize` but but uses 1024 as the base rat
### humanizeDuration
The `humanizeDuration` function humanizes a duration in seconds.
#### Example
The `humanizeDuration` function humanizes a duration in seconds:
```
{{ humanizeDuration 60.0 }}
@ -246,9 +338,7 @@ The `humanizeDuration` function humanizes a duration in seconds.
### humanizePercentage
The `humanizePercentage` function humanizes a ratio value to a percentage.
#### Example
The `humanizePercentage` function humanizes a ratio value to a percentage:
```
{{ humanizePercentage 0.2 }}
@ -260,9 +350,7 @@ The `humanizePercentage` function humanizes a ratio value to a percentage.
### humanizeTimestamp
The `humanizeTimestamp` function humanizes a Unix timestamp.
#### Example
The `humanizeTimestamp` function humanizes a Unix timestamp:
```
{{ humanizeTimestamp 1577836800.0 }}
@ -274,9 +362,7 @@ The `humanizeTimestamp` function humanizes a Unix timestamp.
### match
The `match` function matches the text against a regular expression pattern.
#### Example
The `match` function matches the text against a regular expression pattern:
```
{{ match "a.*" "abc" }}
@ -288,9 +374,7 @@ true
### pathPrefix
The `pathPrefix` function returns the path of the Grafana server as configured in the ini file(s).
#### Example
The `pathPrefix` function returns the path of the Grafana server as configured in the ini file(s):
```
{{ pathPrefix }}
@ -302,9 +386,7 @@ The `pathPrefix` function returns the path of the Grafana server as configured i
### tableLink
The `tableLink` function returns the path to the tabular view in [Explore][explore] for the given expression and data source.
#### Example
The `tableLink` function returns the path to the tabular view in [Explore][explore] for the given expression and data source:
```
{{ tableLink "{\"expr\": \"up\", \"datasource\": \"gdev-prometheus\"}" }}
@ -316,9 +398,7 @@ The `tableLink` function returns the path to the tabular view in [Explore][explo
### title
The `title` function capitalizes the first character of each word.
#### Example
The `title` function capitalizes the first character of each word:
```
{{ title "hello, world!" }}
@ -330,9 +410,7 @@ Hello, World!
### toLower
The `toLower` function returns all text in lowercase.
#### Example
The `toLower` function returns all text in lowercase:
```
{{ toLower "Hello, world!" }}
@ -344,9 +422,7 @@ hello, world!
### toUpper
The `toUpper` function returns all text in uppercase.
#### Example
The `toUpper` function returns all text in uppercase:
```
{{ toUpper "Hello, world!" }}
@ -358,9 +434,7 @@ HELLO, WORLD!
### reReplaceAll
The `reReplaceAll` function replaces text matching the regular expression.
#### Example
The `reReplaceAll` function replaces text matching the regular expression:
```
{{ reReplaceAll "localhost:(.*)" "example.com:$1" "localhost:8080" }}