grafana

mirror of https://github.com/grafana/grafana.git synced 2025-02-25 18:55:37 -06:00

Author	SHA1	Message	Date
Yuri Tseretyan	f6a46744a6	Alerting: Support hysteresis command expression (#75189 ) Backend: * Update the Grafana Alerting engine to provide feedback to HysteresisCommand. The feedback information is stored in state.Manager as a fingerprint of each state. The fingerprint is persisted to the database. Only fingerprints that belong to Pending and Alerting states are considered as "loaded" and provided back to the command. - add ResultFingerprint to state.State. It's different from other fingerprints we store in the state because it is calculated from the result labels. - add rule_fingerprint column to alert_instance - update alerting evaluator to accept AlertingResultsReader via context, and update scheduler to provide it. - add AlertingResultsFromRuleState that implements the new interface in eval package - update getExprRequest to patch the hysteresis command. * Only one "Recovery Threshold" query is allowed to be used in the alert rule and it must be the Condition. Frontend: * Add hysteresis option to Threshold in UI. It's called "Recovery Threshold" * Add test for getUnloadEvaluatorTypeFromCondition * Hide hysteresis in panel expressions * Refactor isInvalid and add test for it * Remove unnecessary React.memo * Add tests for updateEvaluatorConditions --------- Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>	2024-01-04 11:47:13 -05:00
gotjosh	e877174501	Alerting: Expose metrics for Alertmanager Alerts - `grafana_alerting_alertmanager_alerts` (#75802 ) * Alerting: Expose metrics for Alertmanager Alerts In Grafana, the alert evaluation and alert delivery are combined. We've always used a metric named `grafana_alerting_alerts` to get a sense of which alerts are currently firing (these come from the evaluation side) and opted to not map the alertmanager alerts metric directly. I think it's important that we make a distinction between alerts that happen at evaluation vs alerts that are received for delivery by the internal Alertmanager as we have options to skip the delivery of these alerts to the internal alertmanager altogether.	2023-10-02 16:36:23 +01:00
gotjosh	59694fb2be	Alerting: Don't use a separate collection system for metrics (#75296 ) * Alerting: Don't use a separate collection system for metrics The state package had a metric collection system that ran every 15s updating the values of the metrics - there is a common pattern for this in the Prometheus ecosystem called "collectors". I have removed the behaviour of using a time-based interval to "set" the metrics in favour of a set of functions as the "value" that get called at scrape time.	2023-09-25 10:27:30 +01:00
SatVeer Singh	1bfa3a0f1e	Chore: Replace go-multierror with errors package (#66432 ) * code refactor and type assertions added to tests * no-lint rule added for specific line	2023-06-19 12:29:45 +03:00
Yuri Tseretyan	baffe83da6	Alerting: Improve performance of cache.getOrCreate (#63909 ) * move expansion of labels and annotations outside of mutex lock * propagate struct but not pointer	2023-06-15 09:37:47 -04:00
Matthew Jacobson	b9dc04139a	Alerting: Respect "For" Duration for NoData alerts (#65574 ) * Alerting: Respect "For" Duration for NoData alerts This change modifies `resultNoData` to be more in line with the logic of the other state handlers. The main effects of this are: 1) NoData states with NoDataState config set to Alerting will respect "For" duration. 2) Prevents zero value in StartsAt and EndsAt for alerts that have only ever been in normal state. This includes state transitions from NoDataState=OK and ExecErrState=OK. 3) Better state transition logging.	2023-03-31 19:05:15 +03:00
George Robinson	0c8876c3a2	Alerting: Return errors when expanding templates (#63662 ) This commit changes the state package so that errors encountered while expanding templates for custom labels and annotations are returned from the function. This is not used at present, but will be used in the future as we look at how to offer better feedback to users who don't have access to logs, for example our customers who use Hosted Grafana.	2023-03-08 12:25:02 +00:00
George Robinson	ed71012ced	Alerting: Fix Classic Conditions $values variable (#64243 ) This commit fixes a bug in the $values variable in notification templates when using Classic Conditions. Since Classic Conditions are not multi-dimensional, the values of each series that exceeded the condition should be available as a RefID and offset. For example, B0, B1, etc. However, this bug meant that instead just a single condition would be printed as B, not B0.	2023-03-06 12:08:00 -05:00
George Robinson	0a01391ebe	Alerting: Small readability improvements to template.go (#63422 ) * Alerting: Small readability improvements to template.go * Fix lint	2023-02-20 09:24:11 +00:00
George Robinson	9e86916d48	Alerting: Move templating to template package (#63347 ) This commit moves templating from the state package to a sub-package called template. This sub-package will be the logical package for future ease-of-use improvements to templating custom annotations and labels.	2023-02-16 17:16:36 +01:00
Steve Simpson	4d1a2c3370	Alerting: Move `rule_groups_rules` metric from State to Scheduler. (#63144 ) The `rule_groups_rules` metric is currently defined and computed by `State`. It makes more sense for this metric to be computed off of the configured rule set, not based on the rule evaluation state. There could be an edge condition where a rule does not have a state yet, and so is uncounted. Additionally, we would like this metric (and others), to have a `rule_group` label, and this is much easier to achieve if the metric is produced from the `Scheduler` package.	2023-02-09 17:05:19 +01:00
Yuri Tseretyan	9d57b1c72e	Alerting: Do not persist noop transition from Normal state. (#61201 ) * add feature flag `alertingNoNormalState` * update instance database to support exclusion of state in list operation * do not save normal state and delete transitions to normal * update get methods to filter out normal state	2023-01-13 18:29:29 -05:00
Denis Limarev	90badc8729	Performance: Add preallocation for some slices (#59593 )	2023-01-11 18:03:37 +01:00
Yuri Tseretyan	3621cf5a12	Alerting: Update handling of stale state (#58276 ) * delete all stale states in one lock * do not use touched states to detect stale rely only on LastEvaluationTime maintained correctly * fix tests to use correct eval time * delete unused method	2022-11-07 11:03:53 -05:00
Alexander Weaver	de46c1b002	Alerting: Improve logs in state manager and historian (#57374 ) * Touch up log statements, fix casing, add and normalize contexts * Dedicated logger for dashboard resolver * Avoid injecting logger to historian * More minor log touch-ups * Dedicated logger for state manager * Use rule context in annotation creator * Rename base logger and avoid redundant contextual loggers	2022-10-21 16:16:51 -05:00
Alexander Weaver	3ddb28bad9	Find-and-replace 'err' logs to 'error' to match log search conventions (#57309 )	2022-10-19 17:36:54 -04:00
George Robinson	52965de369	Alerting: Add doc comments to state struct and normalize fields (#56647 )	2022-10-11 09:30:33 +01:00
George Robinson	802d67eeca	Alerting: Support values in notification templates (#56457 ) We have received a lot of feedback regarding the ValueString in alert notifications. Perhaps one of the most frequent complaints about ValueString is that it is difficult to read because it contains a lot of information, and the information is shown as a JSON-like string. Users have often asked how it can be templated and the answer is that it can't. Until now users have been able to add custom annotations to their alert rules which contain values via the $values variable added in previous versions of Grafana. However, these custom annotations must be added for each of the user's alert rules, instead of once in a template that all of their alerts can be notified via. This commit then adds the much requested feature to support values in notification templates. Users can then create a single template that prints the annotations, labels and values of their alerts in a format of their choice!	2022-10-10 13:40:21 +01:00
Yuriy Tseretyan	e2f1201382	Alerting: Fix migration to not add label "alertname" (#56509 ) * do not add label alertname because it is overridden in state manager anyway * update state manager to not consider labels with same value as dupe	2022-10-07 15:06:53 -04:00
Yuriy Tseretyan	7b6437402a	Alerting: Refactor state manager's cache (#56197 ) * remove ResetAllStates because it's not used * refactor cache to accept logs, metrics and url as method args * update manager Warm method to set the entire state at once * remove unused reset method * introduce ruleStates * change getOrCreate to belong to ruleStates * update Get to not return error	2022-10-06 15:30:12 -04:00
Yuriy Tseretyan	03e746d9df	Alerting: Delete state from the database on reset (#53919 ) * make ResetStatesByRuleUID return states * delete rule states when reset * rule eval routine to clean up the state only when rule is deleted	2022-08-25 14:12:22 -04:00
Yuriy Tseretyan	e5e8747ee9	Alerting: Update state manager to accept reserved labels (#52189 ) * add tests for cache getOrCreate * update ProcessEvalResults to accept extra labels * extract to getRuleExtraLabels * move populating of constant rule labels to extra labels	2022-07-14 15:59:59 -04:00
George Robinson	43358c7248	Alerting: Keep private annotations across evaluations (#49080 )	2022-05-18 11:21:18 +02:00
idafurjes	56c3875bb9	Chore: Remove context.TODO (#43458 ) * Remove context.TODO() from services * Fix live test	2021-12-28 10:26:18 +01:00
Santiago	562cd9e44e	Alerting template functions (#39261 ) * Alerting: (wip) add template funcs * Alerting: (wip) numeric template functions * Alerting: (wip) template functions * Test for the "args" function * Alerting: (wip) Documentation for template functions * Alerting: template functions - refactor * code review changes * disable linter error * Use Prometheus implementation of TemplateExpander * Update docs/sources/alerting/unified-alerting/alerting-rules/create-grafana-managed-rule.md Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com> * change templateCaptureValue to support using template functions * Update pkg/services/ngalert/state/template.go Co-authored-by: gotjosh <josue.abreu@gmail.com> * Test and documentation added for reReplaceAll template function * complete missing functions, documentation and tests * Use the alert instance's evaluation time for expanding the template * strvalue graphlink and tablelink functions * delete duplicate test * make strvalue return an empty string Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com> Co-authored-by: gotjosh <josue.abreu@gmail.com>	2021-10-04 15:04:37 -03:00
Santiago	c3cf95f383	Revert "Alerting: add template funcs (#38404 )" (#39258 ) This reverts commit `d6fb0181fb`.	2021-09-15 19:47:22 -03:00
Santiago	d6fb0181fb	Alerting: add template funcs (#38404 ) * Alerting: (wip) add template funcs * Alerting: (wip) numeric template functions * Alerting: (wip) template functions * Test for the "args" function * Alerting: (wip) Documentation for template functions * Alerting: template functions - refactor * code review changes * disable linter error * Use Prometheus implementation of TemplateExpander * Update docs/sources/alerting/unified-alerting/alerting-rules/create-grafana-managed-rule.md Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com> Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>	2021-09-15 18:48:29 -03:00
gotjosh	a2f4344bf2	Alerting: Refactor & fix unified alerting metrics structure (#39151 ) * Alerting: Refactor & fix unified alerting metrics structure Fixes and refactors the metrics structure we have for the ngalert service. Now, each component has its own metric struct that includes JUST the metrics it uses. Additionally, I have fixed the configuration metrics and added new metrics to determine if we have discovered and started all the necessary configurations of an instance. This allows us to alert on `grafana_alerting_discovered_configurations - grafana_alerting_active_configurations != 0` to know whether an alertmanager instance did not start successfully.	2021-09-14 12:55:01 +01:00
George Robinson	5caf6cb369	Change templateCaptureValue to support using template functions (#38766 ) * Change templateCaptureValue to support using template functions This commit changes templateCaptureValue to use float64 for the value instead of *float64. This change means that annotations and labels can use the float64 value with functions such as printf and avoid having to check for nil. It also means that absent values are now printed as 0. Use math.NaN() instead of 0 for absent value	2021-09-08 10:46:15 +01:00
David Parrott	b5f464412d	Alerting: automatically remove stale alerting states (#36767 ) * initial attempt at automatic removal of stale states * test case, need expected states * finish unit test * PR feedback * still multiply by time.second * pr feedback	2021-07-26 18:12:04 +02:00
George Robinson	2f4c893cf3	Expand the value string in annotations and labels of alerts (#37051 ) This commit makes it possible to use the value string in annotations and labels for alerts with "{{ $value }}"	2021-07-22 15:20:44 +01:00
George Robinson	456dac1303	Expand the value of math and reduce expressions in annotations and labels (#36611 ) * Expand the value of math and reduce expressions in annotations and labels This commit makes it possible to use the values of reduce and math expressions in annotations and labels via their RefIDs. It uses the Stringer interface to ensure that "{{ $values.A }}" still prints the value in decimal format while also making the labels for each RefID available with "{{ $values.A.Labels }}" and the float64 value with "{{ $values.A.Value }}"	2021-07-15 13:10:56 +01:00
David Parrott	310d3ebe3d	change template expansion missing value handling (#36679 )	2021-07-13 06:57:18 -07:00
Ganesh Vernekar	dcd4bf1615	Alerting: Fill the empty GeneratorURL (#35740 ) Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-06-16 15:34:12 +05:30
Ganesh Vernekar	8417088969	Alerting: Expand `{{$labels.xyz}}` template in labels and annotations (#35159 ) * Alerting: Expand `{{$labels.xyz}}` template in labels and annotations Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix annotation not updating for same alert Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-06-03 19:24:36 +02:00
David Parrott	20d356947c	set state correctly and test (#34680 )	2021-05-26 11:37:42 -07:00
David Parrott	25485100b0	Alerting: Trim results when at processing instead of on ticker (#34248 ) * Trim results when at processing instead of on ticker * Use RWMutex correctly * remove comment	2021-05-18 10:56:14 -07:00
Owen Diehl	1367f7171e	Alerting/ruler metrics (#34144 ) * adds active configurations metric * rule evaluation metrics * ruler metrics * pr feedback	2021-05-14 16:13:44 -04:00
Kyle Brandt	fae093bbe2	Alerting: Fix state cache getOrCreate panic (#33777 )	2021-05-06 14:35:52 +02:00
David Parrott	39099bf3c0	Alerting nested state cache (#33666 ) * nest cache by orgID, ruleUID, stateID * update accessors to use new cache structure * test and linter fixup * fix panic Co-authored-by: Kyle Brandt <kyle@grafana.com> * add comment to identify what's going on with nested maps in cache Co-authored-by: Kyle Brandt <kyle@grafana.com>	2021-05-04 09:57:50 -07:00
Kyle Brandt	48358efc13	Alerting: remove State cache entries on Ruler Delete (#33638 ) for https://github.com/grafana/alerting-squad/issues/133	2021-05-03 14:01:33 -04:00
Owen Diehl	070627d11e	better handle metrics for state transitions (#33648 )	2021-05-03 11:57:24 -04:00
Owen Diehl	5e48b54549	Alerting/metrics (#33547 ) * moves alerting metrics to their own pkg * adds grafana_alerting_alerts (by state) metric * alerts_received_{total,invalid} * embed alertmanager alerting struct in ng metrics & remove duplicated notification metrics (already embed alertmanager notifier metrics) * use silence metrics from alertmanager lib * fix - manager has metrics * updates ngalert tests * comment lint Signed-off-by: Owen Diehl <ow.diehl@gmail.com> * cleaner prom registry code * removes ngalert global metrics * new registry use in all tests * ngalert metrics impl service, hack testinfra code to prevent duplicate metric registrations * nilmetrics unexported	2021-04-30 12:28:06 -04:00
Kyle Brandt	914443c816	Alerting: Fix state cache id duplication (#33480 )	2021-04-28 11:42:19 -04:00
David Parrott	788bc2a793	Alerting: refactor state tracker (#33292 ) * set processing time * merge labels and set on response * use state cache for adding alerts to rules * minor cleanup * add support for NoData and Error results * rename test * bring in changes from other PRs that have been merged * pr feedback * add integration test * close state tracker cleanup on context.Done * fixup test * rename state tracker * set EvaluationDuration on Result * default labels set as constants * separate cache and state from manager * use RWMutex in cache	2021-04-23 21:32:25 +02:00

45 Commits