grafana-mixin: Fix expression for GrafanaRequestsFailing alert (#63382)

Fix expression for GrafanaRequestsFailing alert

The intent of the alert is to get the ratio of 5xx responses to responses
of all status codes [^1]. With the original expression, the left-hand side
can have more than one series with the same labels except for the status
code. This results in a PromQL error because the division does
many-to-one matching. Summing the left-hand side over status_code first
should preserve the intent of the alert and resolve the issue.
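
As a rough illustration of the reasoning above, the following Python sketch mimics what the fixed expression computes. It is not how Prometheus evaluates the rule: the sample rates, the handler names, and the helper `sum_without_status_code` are all hypothetical, and only the `handler` and `status_code` labels are modeled.

```python
import re
from collections import defaultdict

# Hypothetical 5m request rates keyed by (handler, status_code); values are made up.
rates = {
    ("/api/dashboards", "200"): 8.0,
    ("/api/dashboards", "500"): 1.5,
    ("/api/dashboards", "503"): 0.5,
    ("/api/search", "200"): 4.0,
}


def sum_without_status_code(series, code_pattern=None):
    """Mimic `sum without (status_code) (...)`: drop status_code and add up the rest.

    code_pattern optionally plays the role of a `status_code=~"..."` matcher.
    """
    totals = defaultdict(float)
    for (handler, code), value in series.items():
        if code_pattern is None or re.fullmatch(code_pattern, code):
            totals[handler] += value
    return dict(totals)


# Left-hand side after the fix: one series per handler, even though the raw
# data has two 5xx series (500 and 503) for /api/dashboards.
errors = sum_without_status_code(rates, code_pattern="5..")
# Right-hand side: all status codes summed per handler.
total = sum_without_status_code(rates)

# Both sides now carry the same labels, so the division pairs them one-to-one.
failing_percent = {h: 100 * errors[h] / total[h] for h in errors if h in total}
print(failing_percent)
```

In this toy data `/api/search` has no 5xx series, so it produces no result, which matches how a binary operator drops series without a partner on the other side.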

[^1]: https://github.com/grafana/grafana/pull/43116
lpugoy 2023-05-04 22:35:36 +10:00 committed by GitHub
parent 3c42dea10b
commit c792af3ad0

@@ -11,8 +11,8 @@
 {
   alert: 'GrafanaRequestsFailing',
   expr: |||
-    100 * namespace_job_handler_statuscode:grafana_http_request_duration_seconds_count:rate5m{handler!~"/api/datasources/proxy/:id.*|/api/ds/query|/api/tsdb/query", status_code=~"5.."}
-    / ignoring (status_code) group_left
+    100 * sum without (status_code) (namespace_job_handler_statuscode:grafana_http_request_duration_seconds_count:rate5m{handler!~"/api/datasources/proxy/:id.*|/api/ds/query|/api/tsdb/query", status_code=~"5.."})
+    /
     sum without (status_code) (namespace_job_handler_statuscode:grafana_http_request_duration_seconds_count:rate5m{handler!~"/api/datasources/proxy/:id.*|/api/ds/query|/api/tsdb/query"})
     > %(grafanaRequestsFailingThresholdPercent)s
   ||| % $._config,