Commit Graph

35 Commits

Author SHA1 Message Date
Steve Simpson
4d1a2c3370
Alerting: Move rule_groups_rules metric from State to Scheduler. (#63144)
The `rule_groups_rules` metric is currently defined and computed by `State`.
It makes more sense for this metric to be computed off of the configured rule
set, not based on the rule evaluation state. There could be an edge condition
where a rule does not have a state yet, and so is uncounted.

Additionally, we would like this metric (and others), to have a `rule_group`
label, and this is much easier to achieve if the metric is produced from the
`Scheduler` package.
2023-02-09 17:05:19 +01:00
Yuri Tseretyan
9d57b1c72e
Alerting: Do not persist noop transition from Normal state. (#61201)
* add feature flag `alertingNoNormalState`
* update instance database to support exclusion of state in list operation
* do not save normal state and delete transitions to normal
* update get methods to filter out normal state
2023-01-13 18:29:29 -05:00
Denis Limarev
90badc8729
Performance: Add preallocation for some slices (#59593) 2023-01-11 18:03:37 +01:00
Yuri Tseretyan
3621cf5a12
Alerting: Update handling of stale state (#58276)
* delete all stale states in one lock
* do not use touched states to detect stale rely only on LastEvaluationTime maintained correctly
* fix tests to use correct eval time
* delete unused method
2022-11-07 11:03:53 -05:00
Alexander Weaver
de46c1b002
Alerting: Improve logs in state manager and historian (#57374)
* Touch up log statements, fix casing, add and normalize contexts

* Dedicated logger for dashboard resolver

* Avoid injecting logger to historian

* More minor log touch-ups

* Dedicated logger for state manager

* Use rule context in annotation creator

* Rename base logger and avoid redundant contextual loggers
2022-10-21 16:16:51 -05:00
Alexander Weaver
3ddb28bad9
Find-and-replace 'err' logs to 'error' to match log search conventions (#57309) 2022-10-19 17:36:54 -04:00
George Robinson
52965de369
Alerting: Add doc comments to state struct and normalize fields (#56647) 2022-10-11 09:30:33 +01:00
George Robinson
802d67eeca
Alerting: Support values in notification templates (#56457)
We have received a lot of feedback regarding the ValueString in alert notifications. Perhaps one of the most frequent complaints about ValueString is that it is difficult to read because it contains a lot of information, and the information is shown as a JSON-like string. Users have often asked how it can be templated and the answer is that it can't.

Until now users have been able to add custom annotations to their alert rules which contains values via the $values variable added in previous versions of Grafana. However, these custom annotations must be added for each of the user's alert rule, instead of once in a template that all of their alerts can be notified via.

This commit adds then the much requested feature to support values in notification templates. Users can then create a single template that prints the annotations, labels and values of their alerts in a format of their choice!
2022-10-10 13:40:21 +01:00
Yuriy Tseretyan
e2f1201382
Alerting: Fix migration to not add label "alertname" (#56509)
* do not add label alertname because it is overridden in state manager anyway
* update state manager to not consider labels with same value as dupe
2022-10-07 15:06:53 -04:00
Yuriy Tseretyan
7b6437402a
Alerting: Refactor state manager's cache (#56197)
* remove ResetAllStates because it's not used
* refactor cache to accept logs, metrics and url as method args
* update manager Warm method to set the entire state at once
* remove unused reset method
* introduce ruleStates
* change getOrCreate to belong to ruleStates
* update Get to not return error
2022-10-06 15:30:12 -04:00
Yuriy Tseretyan
03e746d9df
Alerting: Delete state from the database on reset (#53919)
* make ResetStatesByRuleUID return states
* delete rule states when reset
* rule eval routine to clean up the state only when rule is deleted
2022-08-25 14:12:22 -04:00
Yuriy Tseretyan
e5e8747ee9
Alerting: Update state manager to accept reserved labels (#52189)
* add tests for cache getOrCreate
* update ProcessEvalResults to accept extra lables
* extract to getRuleExtraLabels
* move populating of constant rule labels to extra labels
2022-07-14 15:59:59 -04:00
George Robinson
43358c7248
Alerting: Keep private annotations across evaluations (#49080) 2022-05-18 11:21:18 +02:00
idafurjes
56c3875bb9
Chore: Remove context.TODO (#43458)
* Remove context.TODO() from services

* Fix live test
2021-12-28 10:26:18 +01:00
Santiago
562cd9e44e
Alerting template functions (#39261)
* Alerting: (wip) add template funcs

* Alerting: (wip) numeric template functions

* Alerting: (wip) template functions

* Test for the "args" function

* Alerting: (wip) Documentation for template functions

* Alerting: template functions - refactor

* code review changes

* disable linter error

* Use Prometheus implementation of TemplateExpander

* Update docs/sources/alerting/unified-alerting/alerting-rules/create-grafana-managed-rule.md

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

* change templateCaptureValue to support using template functions

* Update pkg/services/ngalert/state/template.go

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* Test and documentation added for reReplaceAll template function

* complete missing functions, documentation and tests

* Use the alert instance's evaluation time for expanding the template

* strvalue graphlink and tablelink functions

* delete duplicate test

* make strvalue return an empty string

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
2021-10-04 15:04:37 -03:00
Santiago
c3cf95f383
Revert "Alerting: add template funcs (#38404)" (#39258)
This reverts commit d6fb0181fb.
2021-09-15 19:47:22 -03:00
Santiago
d6fb0181fb
Alerting: add template funcs (#38404)
* Alerting: (wip) add template funcs

* Alerting: (wip) numeric template functions

* Alerting: (wip) template functions

* Test for the "args" function

* Alerting: (wip) Documentation for template functions

* Alerting: template functions - refactor

* code review changes

* disable linter error

* Use Prometheus implementation of TemplateExpander

* Update docs/sources/alerting/unified-alerting/alerting-rules/create-grafana-managed-rule.md

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>
2021-09-15 18:48:29 -03:00
gotjosh
a2f4344bf2
Alerting: Refactor & fix unified alerting metrics structure (#39151)
* Alerting: Refactor & fix unified alerting metrics structure

Fixes and refactors the metrics structure we have for the ngalert service. Now, each component has its own metric struct that includes the JUST the metrics it uses. Additionally, I have fixed the configuration metrics and added new metrics to determine if we have discovered and started all the necessary configurations of an instance.

This allows us to alert on `grafana_alerting_discovered_configurations - grafana_alerting_active_configurations != 0` to know whether an alertmanager instance did not start successfully.
2021-09-14 12:55:01 +01:00
George Robinson
5caf6cb369
Change templateCaptureValue to support using template functions (#38766)
* Change templateCaptureValue to support using template functions

This commit changes templateCaptureValue to use float64 for the value
instead of *float64. This change means that annotations and labels can
use the float64 value with functions such as printf and avoid having to
check for nil. It also means that absent values are now printed as 0.

* Use math.NaN() instead of 0 for absent value
2021-09-08 10:46:15 +01:00
David Parrott
b5f464412d
Alerting: automatically remove stale alerting states (#36767)
* initial attempt at automatic removal of stale states

* test case, need espected states

* finish unit test

* PR feedback

* still multiply by time.second

* pr feedback
2021-07-26 18:12:04 +02:00
George Robinson
2f4c893cf3
Expand the value string in annotations and labels of alerts (#37051)
This commit makes it possible to use the value string in
annotations and labels for alerts with "{{ $value }}"
2021-07-22 15:20:44 +01:00
George Robinson
456dac1303
Expand the value of math and reduce expressions in annotations and labels (#36611)
* Expand the value of math and reduce expressions in annotations and labels

This commit makes it possible to use the values of reduce and math
expressions in annotations and labels via their RefIDs. It uses the
Stringer interface to ensure that "{{ $values.A }}" still prints the
value in decimal format while also making the labels for each RefID
available with "{{ $values.A.Labels }}" and the float64 value with
"{{ $values.A.Value }}"
2021-07-15 13:10:56 +01:00
David Parrott
310d3ebe3d
change template expansion missing value handling (#36679) 2021-07-13 06:57:18 -07:00
Ganesh Vernekar
dcd4bf1615
Alerting: Fill the empty GeneratorURL (#35740)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-06-16 15:34:12 +05:30
Ganesh Vernekar
8417088969
Alerting: Expand {{$labels.xyz}} template in labels and annotations (#35159)
* Alerting: Expand `{{$labels.xyz}}` template in labels and annotations

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix annotation not updating for same alert

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-06-03 19:24:36 +02:00
David Parrott
20d356947c
set state correctly and test (#34680) 2021-05-26 11:37:42 -07:00
David Parrott
25485100b0
Alerting: Trim results when at processing instead of on ticker (#34248)
* Trim results when at processing instead of on ticker

* User RWMutex correctly

* remove comment
2021-05-18 10:56:14 -07:00
Owen Diehl
1367f7171e
Alerting/ruler metrics (#34144)
* adds active configurations metric

* rule evaluation metrics

* ruler metrics

* pr feedback
2021-05-14 16:13:44 -04:00
Kyle Brandt
fae093bbe2
Alerting: Fix state cache getOrCreate panic (#33777) 2021-05-06 14:35:52 +02:00
David Parrott
39099bf3c0
Alerting nested state cache (#33666)
* nest cache by orgID, ruleUID, stateID

* update accessors to use new cache structure

* test and linter fixup

* fix panic

Co-authored-by: Kyle Brandt <kyle@grafana.com>

* add comment to identify what's going on with nested maps in cache

Co-authored-by: Kyle Brandt <kyle@grafana.com>
2021-05-04 09:57:50 -07:00
Kyle Brandt
48358efc13
Alerting: remove State cache entries on Ruler Delete (#33638)
for https://github.com/grafana/alerting-squad/issues/133
2021-05-03 14:01:33 -04:00
Owen Diehl
070627d11e
better handle metrics for state transitions (#33648) 2021-05-03 11:57:24 -04:00
Owen Diehl
5e48b54549
Alerting/metrics (#33547)
* moves alerting metrics to their own pkg

* adds grafana_alerting_alerts (by state) metric

* alerts_received_{total,invalid}

* embed alertmanager alerting struct in ng metrics & remove duplicated notification metrics (already embed alertmanager notifier metrics)

* use silence metrics from alertmanager lib

* fix - manager has metrics

* updates ngalert tests

* comment lint
Signed-off-by: Owen Diehl <ow.diehl@gmail.com>

* cleaner prom registry code

* removes ngalert global metrics

* new registry use in all tests

* ngalert metrics impl service, hack testinfra code to prevent duplicate metric registrations

* nilmetrics unexported
2021-04-30 12:28:06 -04:00
Kyle Brandt
914443c816
Alerting: Fix state cache id duplication (#33480) 2021-04-28 11:42:19 -04:00
David Parrott
788bc2a793
Alerting: refactor state tracker (#33292)
* set processing time

* merge labels and set on response

* use state cache for adding alerts to rules

* minor cleanup

* add support for NoData and Error results

* rename test

* bring in changes from other PRs tha have been merged

* pr feedback

* add integration test

* close state tracker cleanup on context.Done

* fixup test

* rename state tracker

* set EvaluationDuration on Result

* default labels set as constants

* separate cache and state from manager

* use RWMutex in cache
2021-04-23 21:32:25 +02:00