Commit Graph

59 Commits

Author SHA1 Message Date
William Wernert
10dc6c6d75
Alerting: Add "Keep Last State" backend functionality (#83940)
* Implement keep last state for state transitions

* Respect For duration when keeping state

* Only keep transition from recording an annotation

* Add keep last state option for nodata/error in UI
2024-03-12 10:00:43 -04:00
Yuri Tseretyan
1eebd2a4de
Alerting: Support for simplified notification settings in rule API (#81011)
* Add notification settings to storage\domain and API models. Settings are a slice to workaround XORM mapping
* Support validation of notification settings when rules are updated

* Implement route generator for Alertmanager configuration. That fetches all notification settings.
* Update multi-tenant Alertmanager to run the generator before applying the configuration.

* Add notification settings labels to state calculation
* update the Multi-tenant Alertmanager to provide validation for notification settings

* update GET API so only admins can see auto-gen
2024-02-15 09:45:10 -05:00
Yuri Tseretyan
47546a4c72
Alerting: Update API to use folders' full paths (#81214)
* update GetUserVisibleNamespaces to use FolderSeriver
* update GetNamespaceByUID to use FolderService.GetFolders
* update GetAlertRulesForScheduling to use FolderService.GetFolders 

* Update API and GetAlertRulesForScheduling to use the folder's full path
* get full path of folder in RouteTestGrafanaRuleConfig

* fix escaping of titles for MySQL
2024-02-06 17:12:13 -05:00
Sofia Papagiannaki
d1dab5828d
Alerting: Update rule API to address folders by UID (#74600)
* Change ruler API to expect the folder UID as namespace

* Update example requests

* Fix tests

* Update swagger

* Modify FIle field in /api/prometheus/grafana/api/v1/rules

* Fix ruler export

* Modify folder in responses to be formatted as <parent UID>/<title>

* Add alerting test with nested folders

* Apply suggestion from code review

* Alerting: use folder UID instead of title in rule API (#77166)

Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>

* Drop a few more latent uses of namespace_id

* move getNamespaceKey to models package

* switch GetAlertRulesForScheduling to use folder table

* update GetAlertRulesForScheduling to return folder titles in format `parent_uid/title`.

* fi tests

* add tests for GetAlertRulesForScheduling when parent uid

* fix integration tests after merge

* fix test after merge

* change format of the namespace to JSON array

this is needed for forward compatibility, when we migrate to full paths

* update EF code to decode nested folder

---------

Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
Co-authored-by: Virginia Cepeda <virginia.cepeda@grafana.com>
Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>
Co-authored-by: Alex Weaver <weaver.alex.d@gmail.com>
Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
2024-01-17 11:07:39 +02:00
William Wernert
48b5ac779b
Alerting/Annotations: Add annotation backend for Loki alert state history (#78156)
* Move scope type vars to testutil package

* Expose parts of state historian for use in annotation backend

* Implement Loki ASH Annotation store

This store will only implement the `Get` method of a RepositoryImpl since alert state history
writes to Loki elsewhere.

* Use interface for Loki HTTP Client

* Add tests for Loki ASH Annotation store

* Add missing test

* Fix lint

* Organize tests

* Add filter tests

* Improve tests

* Move filter logic into outer function

* Fix lint

* Add comment

* Fix tests

* Fix lint

* Rename historian store + refactor

* Cleanup historian store

* Fix tests

* Minor cleanup

* Use new `ShouldRecordAnnotation` filter

* Fix logic and add tests for this check

* Fix typos, remove unused variables, `< 1` -> `== 0`

* More closely mimic RBAC filter from xorm to ensure correct logic

* Move off weaveworks client

* Address PR comments
2024-01-10 18:42:35 -05:00
Yuri Tseretyan
f6a46744a6
Alerting: Support hysteresis command expression (#75189)
Backend: 

* Update the Grafana Alerting engine to provide feedback to HysteresisCommand. The feedback information is stored in state.Manager as a fingerprint of each state. The fingerprint is persisted to the database. Only fingerprints that belong to Pending and Alerting states are considered as "loaded" and provided back to the command.
   - add ResultFingerprint to state.State. It's different from other fingerprints we store in the state because it is calculated from the result labels.
  -  add rule_fingerprint column to alert_instance
   - update alerting evaluator to accept AlertingResultsReader via context, and update scheduler to provide it.
   - add AlertingResultsFromRuleState that implements the new interface in eval package
   - update getExprRequest to patch the hysteresis command.

* Only one "Recovery Threshold" query is allowed to be used in the alert rule and it must be the Condition.


Frontend:

* Add hysteresis option to Threshold in UI. It's called "Recovery Threshold"
* Add test for getUnloadEvaluatorTypeFromCondition
* Hide hysteresis in panel expressions

* Refactor isInvalid and add test for it
* Remove unnecesary React.memo
* Add tests for updateEvaluatorConditions

---------

Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>
2024-01-04 11:47:13 -05:00
Alexander Weaver
6ee52ac80c
Alerting: Allow more time before Alertmanager expire-resolves alerts (#77094)
* Sync endsAt factor with prometheus

* Fix state tests
2023-10-25 10:03:46 -05:00
Alexander Weaver
acee3efcf9
Alerting: Use common StateReason values for NoData/Error mapped states (#76781)
Fix hardcoded state reasons
2023-10-18 17:26:41 -05:00
Kyle Brandt
1df4d332c9
SSE: Use errutil to show better error messages in prod (#71658)
- include public message
- propagate data source query errors so they are shown as well to which fixes #70026
2023-07-21 06:38:29 -04:00
George Robinson
815e98ed95
Alerting: Add debug logs for EndsAt timestamp (#70336)
This commit adds debug logs for previous_ends_at and next_ends_at
to state.go to help us debug issues where alerts are resolved in
Alertmanager due to expiration. This change is in response to a
support escalation where this information was needed but unavailable.
2023-06-20 12:13:38 +03:00
Matthew Jacobson
ba3994d338
Alerting: Repurpose rule testing endpoint to return potential alerts (#69755)
* Alerting: Repurpose rule testing endpoint to return potential alerts

This feature replaces the existing no-longer in-use grafana ruler testing API endpoint /api/v1/rule/test/grafana. The new endpoint returns a list of potential alerts created by the given alert rule, including built-in + interpolated labels and annotations.

The key priority of this endpoint is that it is intended to be as true as possible to what would be generated by the ruler except that the resulting alerts are not filtered to only Resolved / Firing and ready to be sent.

This means that the endpoint will, among other things:

- Attach static annotations and labels from the rule configuration to the alert instances.
- Attach dynamic annotations from the datasource to the alert instances.
- Attach built-in labels and annotations created by the Grafana Ruler (such as alertname and grafana_folder) to the alert instances.
- Interpolate templated annotations / labels and accept allowed template functions.
2023-06-08 18:59:54 -04:00
Matthew Jacobson
b9dc04139a
Alerting: Respect "For" Duration for NoData alerts (#65574)
* Alerting: Respect "For" Duration for NoData alerts

This change modifies `resultNoData` to be more inline with the logic of the other state handlers.

The main effects of this are:

1) NoData states with NoDataState config set to Alerting will respect "For" duration.
2) Prevents zero value in StartsAt and EndsAt for alerts that have only even been in normal state. This includes state transitions from NoDataState=OK and ExecErrState=OK.
3) Better state transition logging.
2023-03-31 19:05:15 +03:00
Yuri Tseretyan
9d57b1c72e
Alerting: Do not persist noop transition from Normal state. (#61201)
* add feature flag `alertingNoNormalState`
* update instance database to support exclusion of state in list operation
* do not save normal state and delete transitions to normal
* update get methods to filter out normal state
2023-01-13 18:29:29 -05:00
Alexander Weaver
b289b8ac6e
Alerting: Set error annotation on EvaluationError regardless of underlying error type (#61506)
Set error annotation regardless of underlying error type
2023-01-13 13:58:02 -06:00
George Robinson
76601f3ae7
Alerting: Better define how we set states (#59977)
This commit better defines how we set states in resultNormal,
resultAlerting, resultError and resultNoData. It changes the existing
code to call methods such as SetAlerting, SetPending, SetNormal,
SetError and NoData instead of assigning values to each individual field
whenever the state is changed. This should make it easier to understand
what fields should be set for which states and avoid cases where states are
missing, or have additional unexpected fields.
2022-12-08 20:12:13 +00:00
George Robinson
6359dab040
Alerting: Change resultError in preparation for supporting ForError duration (#59894) 2022-12-07 10:45:56 +00:00
George Robinson
3c249e1b99
Fix incorrect start time for DatasourceError alerts (#59903) 2022-12-06 18:44:06 +00:00
Yuri Tseretyan
a85adeed96
Alerting: Update state history service to filter states transitions (#58863)
* rename the method to better reflect its behavior
* make historian filter transition on itself
* call historian with all changes
2022-12-06 12:33:15 -05:00
Alexander Weaver
2bfdda5b68
Alerting: Break dependency between state and image packages (#58381)
* Refactor state and manager to not depend directly on image interface

* Move generic errors to models package

* Move NotAvailableImageService to state as its only references are in state tests

* Move NoopImageService to state package

* Move mock to state package

* Fix linter error

* Fix comment styling

* Fix a couple added references introduced by rebase

* Empty commit to kick build
2022-11-09 15:06:49 -06:00
George Robinson
1290951b65
Alerting: Small improvements to staleResultsHandler (#58007) 2022-11-09 11:08:32 +00:00
Yuri Tseretyan
623de12e35
Alerting: Create AlertInstanceKey in one place (#58278)
* use method GetAlertInstanceKey
* do not add key if error
2022-11-07 09:35:29 -05:00
Alexander Weaver
cc8c1380e2
Alerting: Persist annotations from multidimensional rules in batches (#56575)
* Reduce piecemeal state fields

* Read data directly off state instead of rule

* Unify state and context into single struct

* Expose contextual information to layer above setNextState

* Work in terms of ContextualState and call historian in batches

* Call annotations service in batches

* Export format state and reason and remove workaround in unrelated test package

* Add new method to annotation service for batch inserting

* Fix loop variable aliasing bug caught by linter, didn't change behavior

* Incl timerange on annotation tests

* Insert one at a time if tags are present

* Point to rule from ContextualState rather than copy fields

* Build annotations and copy data prior to starting goroutine

* Rename to StateTransition

* Use new bulk-insert utility

* Remove rule from StateTransition and pass in directly to historian

* Simplify annotations logic since we have only one rule

* Fix logs and context, nilcheck, simplify method name

* Regenerate mock
2022-11-04 10:39:26 -05:00
George Robinson
215ffee437
Alerting: Fix screenshot is not taken for stale series (#57982) 2022-11-02 22:14:22 +00:00
George Robinson
52965de369
Alerting: Add doc comments to state struct and normalize fields (#56647) 2022-10-11 09:30:33 +01:00
George Robinson
802d67eeca
Alerting: Support values in notification templates (#56457)
We have received a lot of feedback regarding the ValueString in alert notifications. Perhaps one of the most frequent complaints about ValueString is that it is difficult to read because it contains a lot of information, and the information is shown as a JSON-like string. Users have often asked how it can be templated and the answer is that it can't.

Until now users have been able to add custom annotations to their alert rules which contains values via the $values variable added in previous versions of Grafana. However, these custom annotations must be added for each of the user's alert rule, instead of once in a template that all of their alerts can be notified via.

This commit adds then the much requested feature to support values in notification templates. Users can then create a single template that prints the annotations, labels and values of their alerts in a format of their choice!
2022-10-10 13:40:21 +01:00
George Robinson
5561f935e6
Alerting: Fix send resolved notifications (#54793)
This commit fixes a bug where we did not send resolved alerts to Alertmanager for resolved alert instances. This meant that resolved notifications did not have the annotations from the resolved state, and a result did not also have the resolved screenshot.
2022-09-15 17:25:05 +01:00
Yuriy Tseretyan
9f90a7b54d
Alerting: State manager to use InstanceStore (#53852)
* move saving the state to state manager when scheduler stops
* move saving state to ProcessEvalResults

* add GetRuleKey to State
* add LogContext to AlertRuleKey
2022-08-18 09:40:33 -04:00
George Robinson
34d45977ca
Alerting: Fix bug where state did not change between Alerting and Error (#52204)
This commit fixes a bug where the state did not change from Alerting to Error if the evaluation result returned an error, or from Error to Alerting if evaluations stopped returning errors.
2022-07-14 10:53:39 +01:00
Joe Blubaugh
9e8efaa459
Alerting: Add stored screenshot utilities to the channels package. (#49470)
Adds three functions:
`withStoredImages` iterates over a list of models.Alerts, extracting a stored image's data from storage, if available, and executing a user-provided function.
`withStoredImage` does this for an image attached to a specific alert.
`openImage` finds and opens an image file on disk.

Moves `store.Image` to `models.Image`
Simplifies `channels.ImageStore` interface and updates notifiers that use it to use the simpler methods.
Updates all pkg/alert/notifier/channels to use withStoredImage routines.
2022-05-26 13:29:56 +08:00
Joe Blubaugh
1cc034d960
Alerting: Add a "Reason" to Alert Instances to show underlying cause of state. (#49259)
This change adds a field to state.State and models.AlertInstance
that indicate the "Reason" that an instance has its current state. This
helps us account for cases where the state is "Normal" but the
underlying evaluation returned "NoData" or "Error", for example.

Fixes #42606

Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
2022-05-23 16:49:49 +08:00
Joe Blubaugh
687e79538b
Alerting: Add a general screenshot service and alerting-specific image service. (#49293)
This commit adds a pkg/services/screenshot package for taking and uploading screenshots of Grafana dashboards. It supports taking screenshots of both dashboards and individual panels within a dashboard, using the rendering service.

The screenshot package has the following services, most of which can be composed:

BrowserScreenshotService (Takes screenshots with headless Chrome)
CachableScreenshotService (Caches screenshots taken with another service such as BrowserScreenshotService)
NoopScreenshotService (A no-op screenshot service for tests)
SingleFlightScreenshotService (Prevents duplicate screenshots when taking screenshots of the same dashboard or panel in parallel)
ScreenshotUnavailableService (A screenshot service that returns ErrScreenshotsUnavailable)
UploadingScreenshotService (A screenshot service that uploads taken screenshots)

The screenshot package does not support wire dependency injection yet. ngalert constructs its own version of the service. See https://github.com/grafana/grafana/issues/49296

This PR also adds an ImageScreenshotService to ngAlert. This is used to take screenshots with a screenshotservice and then store their location reference for use by alert instances and notifiers.
2022-05-22 22:33:49 +08:00
Yuriy Tseretyan
4b417c8f3e
use NaN if condition value is nil (#48370) 2022-04-27 15:59:13 -03:00
Yuriy Tseretyan
884c885289
Alerting: Support OK option for Error state (#47670)
* support OK state for Error
2022-04-13 14:45:29 -04:00
gotjosh
cb6124c921
Alerting: Accurately set value for prom-compatible APIs (#47216)
* Alerting: Accurately set value for prom-compatible APIs

Sets the value fields for the prometheus compatible API based on a combination of condition `refID` and the values extracted from the different frames.

* Fix an extra test

* Ensure a consitent ordering

* Address review comments

* address review comments
2022-04-05 19:36:42 +01:00
George Robinson
79769132c0
Alerting: Alert rule should wait For duration when execution error state is Alerting (#47052)
Alerting: Alert rule should wait For duration when execution error state is Alerting
2022-03-31 09:57:58 +01:00
gotjosh
84e5f336fe
Alerting: Classic conditions can now display multiple values (#46971)
* Alerting: Extract classic condition values by RefID

* uncapitalise function

* update documentation

* Update pkg/services/ngalert/eval/extract_md.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Update pkg/services/ngalert/state/state.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Update pkg/services/ngalert/state/state.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Update pkg/services/ngalert/eval/extract_md.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Update docs/sources/alerting/unified-alerting/alerting-rules/alert-annotation-label.md

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

* Update pkg/services/ngalert/eval/extract_md.go

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

* Run prettier

Co-authored-by: George Robinson <george.robinson@grafana.com>
Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>
2022-03-29 20:33:03 +01:00
gotjosh
a338c78ca8
Alerting: Remove internal labels from prometheus compatible API responses (#46548)
* Alerting: Remove internal labels from prometheus compatible API responses

* Appease the linter

* Fix integration tests

* Fix API documentation & linter

* move removal of internal labels to the models
2022-03-16 16:04:19 +00:00
George Robinson
789cfc31e3
Alerting: Fix use of > instead of >= when checking the For duration (#46011) 2022-03-01 17:06:42 +00:00
Yuriy Tseretyan
984c95de63
Do not store EvaluationString in Evaluation. (#44606)
* do not store evaluation string in Evaluation.
* reduce number of buckets to store for a single state
2022-02-02 19:18:20 +01:00
George Robinson
5e2280ceee
Add metrics to ngalert scheduler (#44602)
This pull request adds metrics to the ngalert scheduler so we can see how long it takes to evaluate a tick.
2022-01-31 16:56:43 +00:00
George Robinson
c932dc959c
Alerting: Add Ref ID to DatasourceNoData and DatasourceError alerts (#42630) 2021-12-03 09:55:16 +00:00
gotjosh
dd5a2e5128
Alerting: Clear alerting rule evaluation errors after intermittent failures (#42386)
* Alerting: Clear alerting rule evaluation errors after intermittent failures

When an alert transitioned in a way that `alerting -> error -> (alerting|nodata)`, the error provided by the `error` state would never be cleared thus the API and UI would show the health as an error.
2021-11-26 17:58:19 +00:00
George Robinson
1b26d4d88e
Alerting: Create DatasourceError alert if evaluation returns error (#41869)
* Alerting: Create DatasourceError alert if evaluation returns error

* Alerting: Add docs for DatasourceError alert

* Alerting: Fix DatasourceError alert does not have dashboard_uid label

* Alerting: Add break when datasource_uid found

* Alerting: Update TestProcessEvalResults
2021-11-25 11:46:47 +01:00
Yuriy Tseretyan
610643a668
Alerting: Special alert instance if rule is in state NoData (#40540)
* do not suppress NoData state
* extract conversion of state to postable alert + tests
* create a special alert instance if nodata 
* use NoData when converting from Keep Last State instead of Alerting
* add silence during migration if NoData is mapped to KeepLastState.
2021-11-04 16:42:34 -04:00
George Robinson
27609dc2c5
Fix alerts with evaluation interval more than 30 seconds resolving in Alertmanager (#39513) 2021-09-22 14:55:46 +01:00
gotjosh
dd502f22eb
Alerting: Fix alert flapping in the internal alertmanager (#38648)
* Alerting: Fix alert flapping in the alertmanager

fixes a bug that caused Alerts that are evaluated at low intervals (sub 1 minute), to flap in the Alertmanager.
Mostly due to a combination of `EndsAt` and resend delay.

The Alertmanager uses `EndsAt` as a heuristic to know whenever it should resolve a firing alert, in the case that it hasn't heard
back from the alert generation system.

Because grafana sent the alert with an `EndsAt` which is equal to the `For` of the alert itself,
and we had a hard-coded 1 minute re-send delay (only applicable to firing alerts) this meant that a firing alert would resolve in the Alertmanager before we re-notify that it still firing.

This commit, increases the `EndsAt` by 3x the the resend delay or alert interval (depending on which one is higher). The resendDelay has been decreased to 30 seconds.
2021-09-02 16:22:59 +01:00
Kyle Brandt
aa904a5a04
NGAlert: Send resolve signal to alertmanager on alerting -> Normal (#37363) 2021-07-29 20:29:17 +02:00
George Robinson
456dac1303
Expand the value of math and reduce expressions in annotations and labels (#36611)
* Expand the value of math and reduce expressions in annotations and labels

This commit makes it possible to use the values of reduce and math
expressions in annotations and labels via their RefIDs. It uses the
Stringer interface to ensure that "{{ $values.A }}" still prints the
value in decimal format while also making the labels for each RefID
available with "{{ $values.A.Labels }}" and the float64 value with
"{{ $values.A.Value }}"
2021-07-15 13:10:56 +01:00
David Parrott
19f18bcecc
Alerting: annotation on state change (#36535)
* WIP

* Add annotation on alert state change

* move annotation creation to manager

* praise the linter!

* add debug msg when creating annotation
2021-07-13 09:50:10 -07:00
David Parrott
4732f832f7
Alerting: recalculate EndsAt (#35830)
* setEndsAt

* one more test case

* add should clause to tests
2021-06-17 10:01:46 -07:00