Commit Graph

41 Commits

Author SHA1 Message Date
Alexander Weaver
2bfdda5b68
Alerting: Break dependency between state and image packages (#58381)
* Refactor state and manager to not depend directly on image interface

* Move generic errors to models package

* Move NotAvailableImageService to state as its only references are in state tests

* Move NoopImageService to state package

* Move mock to state package

* Fix linter error

* Fix comment styling

* Fix a couple added references introduced by rebase

* Empty commit to kick build
2022-11-09 15:06:49 -06:00
George Robinson
1290951b65
Alerting: Small improvements to staleResultsHandler (#58007) 2022-11-09 11:08:32 +00:00
Yuri Tseretyan
623de12e35
Alerting: Create AlertInstanceKey in one place (#58278)
* use method GetAlertInstanceKey
* do not add key if error
2022-11-07 09:35:29 -05:00
Alexander Weaver
cc8c1380e2
Alerting: Persist annotations from multidimensional rules in batches (#56575)
* Reduce piecemeal state fields

* Read data directly off state instead of rule

* Unify state and context into single struct

* Expose contextual information to layer above setNextState

* Work in terms of ContextualState and call historian in batches

* Call annotations service in batches

* Export format state and reason and remove workaround in unrelated test package

* Add new method to annotation service for batch inserting

* Fix loop variable aliasing bug caught by linter, didn't change behavior

* Incl timerange on annotation tests

* Insert one at a time if tags are present

* Point to rule from ContextualState rather than copy fields

* Build annotations and copy data prior to starting goroutine

* Rename to StateTransition

* Use new bulk-insert utility

* Remove rule from StateTransition and pass in directly to historian

* Simplify annotations logic since we have only one rule

* Fix logs and context, nilcheck, simplify method name

* Regenerate mock
2022-11-04 10:39:26 -05:00
George Robinson
215ffee437
Alerting: Fix screenshot is not taken for stale series (#57982) 2022-11-02 22:14:22 +00:00
George Robinson
52965de369
Alerting: Add doc comments to state struct and normalize fields (#56647) 2022-10-11 09:30:33 +01:00
George Robinson
802d67eeca
Alerting: Support values in notification templates (#56457)
We have received a lot of feedback regarding the ValueString in alert notifications. Perhaps one of the most frequent complaints about ValueString is that it is difficult to read because it contains a lot of information, and the information is shown as a JSON-like string. Users have often asked how it can be templated and the answer is that it can't.

Until now users have been able to add custom annotations to their alert rules which contain values via the $values variable added in previous versions of Grafana. However, these custom annotations must be added to each of the user's alert rules, instead of once in a template that all of their alerts can be notified via.

This commit adds the much-requested support for values in notification templates. Users can now create a single template that prints the annotations, labels and values of their alerts in a format of their choice!
2022-10-10 13:40:21 +01:00
George Robinson
5561f935e6
Alerting: Fix send resolved notifications (#54793)
This commit fixes a bug where we did not send resolved alerts to Alertmanager for resolved alert instances. This meant that resolved notifications did not have the annotations from the resolved state, and as a result also did not have the resolved screenshot.
2022-09-15 17:25:05 +01:00
Yuriy Tseretyan
9f90a7b54d
Alerting: State manager to use InstanceStore (#53852)
* move saving the state to state manager when scheduler stops
* move saving state to ProcessEvalResults

* add GetRuleKey to State
* add LogContext to AlertRuleKey
2022-08-18 09:40:33 -04:00
George Robinson
34d45977ca
Alerting: Fix bug where state did not change between Alerting and Error (#52204)
This commit fixes a bug where the state did not change from Alerting to Error if the evaluation result returned an error, or from Error to Alerting if evaluations stopped returning errors.
2022-07-14 10:53:39 +01:00
Joe Blubaugh
9e8efaa459
Alerting: Add stored screenshot utilities to the channels package. (#49470)
Adds three functions:
`withStoredImages` iterates over a list of models.Alerts, extracting a stored image's data from storage, if available, and executing a user-provided function.
`withStoredImage` does this for an image attached to a specific alert.
`openImage` finds and opens an image file on disk.

Moves `store.Image` to `models.Image`
Simplifies `channels.ImageStore` interface and updates notifiers that use it to use the simpler methods.
Updates all pkg/alert/notifier/channels to use withStoredImage routines.
2022-05-26 13:29:56 +08:00
Joe Blubaugh
1cc034d960
Alerting: Add a "Reason" to Alert Instances to show underlying cause of state. (#49259)
This change adds a field to state.State and models.AlertInstance
that indicates the "Reason" that an instance has its current state. This
helps us account for cases where the state is "Normal" but the
underlying evaluation returned "NoData" or "Error", for example.

Fixes #42606

Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
2022-05-23 16:49:49 +08:00
Joe Blubaugh
687e79538b
Alerting: Add a general screenshot service and alerting-specific image service. (#49293)
This commit adds a pkg/services/screenshot package for taking and uploading screenshots of Grafana dashboards. It supports taking screenshots of both dashboards and individual panels within a dashboard, using the rendering service.

The screenshot package has the following services, most of which can be composed:

BrowserScreenshotService (Takes screenshots with headless Chrome)
CachableScreenshotService (Caches screenshots taken with another service such as BrowserScreenshotService)
NoopScreenshotService (A no-op screenshot service for tests)
SingleFlightScreenshotService (Prevents duplicate screenshots when taking screenshots of the same dashboard or panel in parallel)
ScreenshotUnavailableService (A screenshot service that returns ErrScreenshotsUnavailable)
UploadingScreenshotService (A screenshot service that uploads taken screenshots)

The screenshot package does not support wire dependency injection yet. ngalert constructs its own version of the service. See https://github.com/grafana/grafana/issues/49296

This PR also adds an ImageScreenshotService to ngAlert. This is used to take screenshots with a screenshotservice and then store their location reference for use by alert instances and notifiers.
2022-05-22 22:33:49 +08:00
Yuriy Tseretyan
4b417c8f3e
use NaN if condition value is nil (#48370) 2022-04-27 15:59:13 -03:00
Yuriy Tseretyan
884c885289
Alerting: Support OK option for Error state (#47670)
* support OK state for Error
2022-04-13 14:45:29 -04:00
gotjosh
cb6124c921
Alerting: Accurately set value for prom-compatible APIs (#47216)
* Alerting: Accurately set value for prom-compatible APIs

Sets the value fields for the prometheus compatible API based on a combination of condition `refID` and the values extracted from the different frames.

* Fix an extra test

* Ensure a consistent ordering

* Address review comments

* address review comments
2022-04-05 19:36:42 +01:00
George Robinson
79769132c0
Alerting: Alert rule should wait For duration when execution error state is Alerting (#47052)
2022-03-31 09:57:58 +01:00
gotjosh
84e5f336fe
Alerting: Classic conditions can now display multiple values (#46971)
* Alerting: Extract classic condition values by RefID

* uncapitalise function

* update documentation

* Update pkg/services/ngalert/eval/extract_md.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Update pkg/services/ngalert/state/state.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Update pkg/services/ngalert/state/state.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Update pkg/services/ngalert/eval/extract_md.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Update docs/sources/alerting/unified-alerting/alerting-rules/alert-annotation-label.md

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

* Update pkg/services/ngalert/eval/extract_md.go

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

* Run prettier

Co-authored-by: George Robinson <george.robinson@grafana.com>
Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>
2022-03-29 20:33:03 +01:00
gotjosh
a338c78ca8
Alerting: Remove internal labels from prometheus compatible API responses (#46548)
* Alerting: Remove internal labels from prometheus compatible API responses

* Appease the linter

* Fix integration tests

* Fix API documentation & linter

* move removal of internal labels to the models
2022-03-16 16:04:19 +00:00
George Robinson
789cfc31e3
Alerting: Fix use of > instead of >= when checking the For duration (#46011) 2022-03-01 17:06:42 +00:00
Yuriy Tseretyan
984c95de63
Do not store EvaluationString in Evaluation. (#44606)
* do not store evaluation string in Evaluation.
* reduce number of buckets to store for a single state
2022-02-02 19:18:20 +01:00
George Robinson
5e2280ceee
Add metrics to ngalert scheduler (#44602)
This pull request adds metrics to the ngalert scheduler so we can see how long it takes to evaluate a tick.
2022-01-31 16:56:43 +00:00
George Robinson
c932dc959c
Alerting: Add Ref ID to DatasourceNoData and DatasourceError alerts (#42630) 2021-12-03 09:55:16 +00:00
gotjosh
dd5a2e5128
Alerting: Clear alerting rule evaluation errors after intermittent failures (#42386)
* Alerting: Clear alerting rule evaluation errors after intermittent failures

When an alert transitioned as `alerting -> error -> (alerting|nodata)`, the error set by the `error` state was never cleared, so the API and UI kept showing the health as an error.
2021-11-26 17:58:19 +00:00
George Robinson
1b26d4d88e
Alerting: Create DatasourceError alert if evaluation returns error (#41869)
* Alerting: Create DatasourceError alert if evaluation returns error

* Alerting: Add docs for DatasourceError alert

* Alerting: Fix DatasourceError alert does not have dashboard_uid label

* Alerting: Add break when datasource_uid found

* Alerting: Update TestProcessEvalResults
2021-11-25 11:46:47 +01:00
Yuriy Tseretyan
610643a668
Alerting: Special alert instance if rule is in state NoData (#40540)
* do not suppress NoData state
* extract conversion of state to postable alert + tests
* create a special alert instance if nodata 
* use NoData when converting from Keep Last State instead of Alerting
* add silence during migration if NoData is mapped to KeepLastState.
2021-11-04 16:42:34 -04:00
George Robinson
27609dc2c5
Fix alerts with evaluation interval more than 30 seconds resolving in Alertmanager (#39513) 2021-09-22 14:55:46 +01:00
gotjosh
dd502f22eb
Alerting: Fix alert flapping in the internal alertmanager (#38648)
* Alerting: Fix alert flapping in the alertmanager

Fixes a bug that caused alerts evaluated at short intervals (under 1 minute) to flap in the Alertmanager,
mostly due to a combination of `EndsAt` and the resend delay.

The Alertmanager uses `EndsAt` as a heuristic to know whenever it should resolve a firing alert, in the case that it hasn't heard
back from the alert generation system.

Because Grafana sent the alert with an `EndsAt` equal to the `For` of the alert itself,
and we had a hard-coded 1 minute re-send delay (only applicable to firing alerts), a firing alert would resolve in the Alertmanager before we re-notified that it was still firing.

This commit increases the `EndsAt` by 3x the resend delay or the alert interval (whichever is higher). The resendDelay has been decreased to 30 seconds.
2021-09-02 16:22:59 +01:00
Kyle Brandt
aa904a5a04
NGAlert: Send resolve signal to alertmanager on alerting -> Normal (#37363) 2021-07-29 20:29:17 +02:00
George Robinson
456dac1303
Expand the value of math and reduce expressions in annotations and labels (#36611)
* Expand the value of math and reduce expressions in annotations and labels

This commit makes it possible to use the values of reduce and math
expressions in annotations and labels via their RefIDs. It uses the
Stringer interface to ensure that "{{ $values.A }}" still prints the
value in decimal format while also making the labels for each RefID
available with "{{ $values.A.Labels }}" and the float64 value with
"{{ $values.A.Value }}"
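
The Stringer behavior described above can be reproduced with Go's standard `text/template` package. This is a sketch under assumptions: the `Value` type and the `expand` helper are hypothetical stand-ins, and binding `$values` by prepending a variable definition to the template is an assumption of this example rather than a description of the commit's code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"text/template"
)

// Value is a hypothetical stand-in for the per-RefID value exposed to templates.
type Value struct {
	Labels map[string]string
	Value  float64
}

// String implements fmt.Stringer, so printing the whole struct in a template
// yields the number in plain decimal notation.
func (v Value) String() string {
	return strconv.FormatFloat(v.Value, 'f', -1, 64)
}

// expand renders tmpl with $values bound to the given map. Prepending the
// variable definition is an assumption made for this sketch.
func expand(tmpl string, values map[string]Value) (string, error) {
	t, err := template.New("annotation").Parse("{{ $values := .Values }}" + tmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, map[string]any{"Values": values}); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	values := map[string]Value{
		"A": {Labels: map[string]string{"instance": "foo"}, Value: 1.5},
	}
	for _, tmpl := range []string{
		"{{ $values.A }}",        // uses String()
		"{{ $values.A.Value }}",  // the float64 field
		"{{ $values.A.Labels }}", // the label map
	} {
		out, err := expand(tmpl, values)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```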
2021-07-15 13:10:56 +01:00
David Parrott
19f18bcecc
Alerting: annotation on state change (#36535)
* WIP

* Add annotation on alert state change

* move annotation creation to manager

* praise the linter!

* add debug msg when creating annotation
2021-07-13 09:50:10 -07:00
David Parrott
4732f832f7
Alerting: recalculate EndsAt (#35830)
* setEndsAt

* one more test case

* add should clause to tests
2021-06-17 10:01:46 -07:00
David Parrott
20d356947c
set state correctly and test (#34680) 2021-05-26 11:37:42 -07:00
David Parrott
7a83d1f9ff
Alerting resend delay for sending to notifiers (#34312)
* adds resend delay to avoid saturating notifier

* correct method signatures

* pr feedback
2021-05-19 22:15:09 +02:00
David Parrott
25485100b0
Alerting: Trim results when at processing instead of on ticker (#34248)
* Trim results when at processing instead of on ticker

* Use RWMutex correctly

* remove comment
2021-05-18 10:56:14 -07:00
David Parrott
bbb7bbf891
Alerting: Remove back end logic for supporting KeepLastState (#34242)
* Removed back end logic for supporting KeepLastState

* Map keep_state correctly in migrations
2021-05-18 10:55:43 -07:00
Kyle Brandt
63b2dd06a5
Alerting: Set "value" with evalmatches in G Managed (#34075)
When, and currently only when, using a classic condition, evaluation information is added (which is like the EvalMatches from dashboard alerting).

This is returned via the API and can be included in notifications by reading the `__value__` label attached to `.Alerts` in the template. It is a string.
2021-05-18 09:12:39 -04:00
David Parrott
b1a8c67689
Alerting return evaluation errors to /rules (#33663)
* Set and return errors produced by evaluation results

* test fixup
2021-05-04 13:08:12 -04:00
Kyle Brandt
7823842c5d
Alerting: Load annotations from rule into State cache (#33542)
for https://github.com/grafana/alerting-squad/issues/127
2021-04-30 20:23:12 +02:00
Sofia Papagiannaki
1e380e869e
[Alerting]: some fixes (#33538)
* Fix failure when adding state annotations

* Fix get org rules API

Do not fail response if user has no access to view a namespace.
Do not include the namespace in the response instead.

* lint
2021-04-29 19:15:15 +03:00
David Parrott
788bc2a793
Alerting: refactor state tracker (#33292)
* set processing time

* merge labels and set on response

* use state cache for adding alerts to rules

* minor cleanup

* add support for NoData and Error results

* rename test

* bring in changes from other PRs that have been merged

* pr feedback

* add integration test

* close state tracker cleanup on context.Done

* fixup test

* rename state tracker

* set EvaluationDuration on Result

* default labels set as constants

* separate cache and state from manager

* use RWMutex in cache
2021-04-23 21:32:25 +02:00