Commit Graph

1038 Commits

Author SHA1 Message Date
George Robinson
c637a5543e Alerting: Rename caps to captures as cap is a reserved word (#63432) 2023-02-20 10:08:36 +00:00
George Robinson
aacf9da969 Alerting: Change Data to use Labels instead of map[string]string (#63431)
This commit changes the Data struct in template.go to use Labels
instead of map[string]string. It changes how labels are printed
when using {{ .Labels }} from map[foo:bar bar:baz] to
foo=bar, bar=baz.
2023-02-20 10:08:23 +00:00
George Robinson
0a01391ebe Alerting: Small readability improvements to template.go (#63422)
* Alerting: Small readability improvements to template.go

* Fix lint
2023-02-20 09:24:11 +00:00
George Robinson
0659134793 Alerting: Better printing of labels (#63348)
This commit changes how labels are printed in templates for custom
annotations and labels from map[foo:bar bar:baz] to foo=bar, bar=baz.
Labels are comma separated, and sorted in increasing order.
2023-02-16 12:04:15 -05:00
George Robinson
9e86916d48 Alerting: Move templating to template package (#63347)
This commit moves templating from the state package to a sub-package
called template. This sub-package will be the logical package for
future ease-of-use improvements to templating custom annotations
and labels.
2023-02-16 17:16:36 +01:00
Alexander Weaver
958fb2c50a Alerting: Unify structs in Loki client and make them more consistent with Prometheus (#63055)
* Use existing row struct instead of [2]string, add deserialization helper

* Replace Stream struct with stream struct which is exactly the same

* Drop unused status field

* Don't export queryRes and queryData

* Tests for custom marshalling

* Rename row fields to T and V for consistency with prometheus samples

* Rename row to sample
2023-02-11 05:17:44 -06:00
George Robinson
1f984409a2 Alerting: Fix a bug taking screenshots with Dashboard UID (#63220)
This commit fixes a bug where Grafana would fail to take a screenshot if
the same Dashboard UID was present across two or more different orgs.
2023-02-09 15:23:01 -05:00
Steve Simpson
4d1a2c3370 Alerting: Move rule_groups_rules metric from State to Scheduler. (#63144)
The `rule_groups_rules` metric is currently defined and computed by `State`.
It makes more sense for this metric to be computed off of the configured rule
set, not based on the rule evaluation state. There could be an edge condition
where a rule does not have a state yet, and so is uncounted.

Additionally, we would like this metric (and others), to have a `rule_group`
label, and this is much easier to achieve if the metric is produced from the
`Scheduler` package.
2023-02-09 17:05:19 +01:00
suntala
49b3027049 Chore: Remove Result field from datasources (#63048)
* Remove Result field from AddDataSourceCommand
* Remove DatasourcesPermissionFilterQuery Result
* Remove GetDataSourceQuery Result
* Remove GetDataSourcesByTypeQuery Result
* Remove GetDataSourcesQuery Result
* Remove GetDefaultDataSourceQuery Result
* Remove UpdateDataSourceCommand Result
2023-02-09 15:49:44 +01:00
Alexander Weaver
f80bf11782 Alerting: Make time range query parameters not required when querying Loki (#62985)
* Make from and to not required

* Move default range calculation up to loki.go
2023-02-07 14:26:43 -06:00
Santiago
955c7b13ea Alerting: Change error log to warning and apply correct format when updating historic config (#62973)
* change error to warning, apply correct format

* Update pkg/services/ngalert/store/alertmanager.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* different wording and data

---------

Co-authored-by: George Robinson <george.robinson@grafana.com>
2023-02-07 11:04:05 -03:00
Joey Tawadrous
121260e0dd Parca: Use data query schema (#62840)
* Parca data query schema

* Remove groupBy
2023-02-07 09:56:21 +00:00
Alexander Weaver
0efb84617e Alerting: Create benchmarking test for state.ProcessEvalResults (#62041)
* Create benchmark for ProcessEvalResults

* Simplify the test
2023-02-03 15:38:08 -06:00
Yuri Tseretyan
f066e8cdcd Alerting: Update to alerting 20230203015918-0e4e2675d7aa (after refactoring) (#62823)
* add alerting prefix to some packages from alerting that have similar names in prometheus alertmanager
2023-02-03 11:36:49 -05:00
idafurjes
982939111b Rename Id to ID for annotation models (#62886)
* Rename Id to ID for annotation models

* Add xorm tags

* Rename Id to ID for API key models

* Add xorm tags
2023-02-03 17:23:09 +01:00
Alexander Weaver
9eeea8f5ea Alerting: Add label query parameters to state history endpoint (#62831)
* Allow equality-only matching of arbitrary labels via query params

* Pre-initialize map
2023-02-02 16:52:08 -06:00
Jean-Philippe Quéméner
1694a35e0c Alerting: implement loki query for alert state history (#61992)
* Alerting: implement loki query for alert state history

* extract selector building

* add unit tests for selector creation

* backup

* give selectors their own type

* build dataframe

* add some tests

* small changes after manual testing

* use struct client

* golint

* more golint

* Make RuleUID optional for Loki implementation

* Drop initial assumption that we only have one series

* Pare down to three columns, fix timestamp overflows, improve failure cases in loki responses

* Embed structred log lines in the dataframe as objects rather than json strings

* Include state history label filter

* Remove dead code

---------

Co-authored-by: Alex Weaver <weaver.alex.d@gmail.com>
2023-02-02 16:31:51 -06:00
Matthew Jacobson
f9ec16e74f Alerting: Fix template validation in provisioning api (#62530)
* Alerting: Fix template validation in provisioning api

Fix issue where provisioning API accepts a malformed template having extra
text outside of definition block and template name matching definition name.
2023-02-02 15:26:39 -05:00
Alexander Weaver
647f73ddc5 Alerting: Add static label to all state history entries (#62817)
* Add static label to all state history entries

* Separate label and value visually
2023-02-02 13:25:26 -06:00
Santiago
ba731f7865 Alerting: Mark AM configuration as applied (#61330)
* Mark AM configuration as applied

* add missing checks, make linter happy

* fix deadlock, mark as valid on save and on load

* mark configurations only if needed

* check error after applyConfig()

* code review comments

* code review changes

* more code review changes

* clean HistoricConfigFromAlertConfig function
2023-02-02 14:45:17 -03:00
Alexander Weaver
6ad1cfef38 Alerting: Add endpoint for querying state history (#62166)
* Define endpoint and generate

* Wire up and register endpoint

* Cleanup, define authorization

* Forgot the leading slash

* Wire up query and SignedInUser

* Wire up timerange query params

* Add todo for label queries

* Drop comment

* Update path to rules subtree
2023-02-02 11:34:00 -06:00
Alexander Weaver
9fa28c11c5 Alerting: Usability adjustments to Loki representation of state history values (#62643)
* Extract label merge, add test file

* Extract error/NoData to first class fields, remove a layer from values

* Include dashUID and panelID as line-level fields

* Drop unnecessary object receiver

* Add tests for stream building

* Drop NoData field from log lines
2023-02-02 10:54:10 -06:00
idafurjes
23c27cffb3 Chore: Rename Id to ID in alerting models (#62777)
* Chore: Rename Id to ID in alerting models

* Add xorm tags for datasource

* Add xorm tag for uid
2023-02-02 17:22:43 +01:00
Sonia Aguilar
753c84f825 Alerting: Pass yaml as a query param in export request (#62751)
* Set YAML as default value for exporting alert rules

* use YAML format for rule list export

Co-authored-by: Sonia Aguilar <33540275+soniaAguilarPeiron@users.noreply.github.com>

* lint

* Add new format query param to swagger+docs

* Fix broken test

---------

Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
Co-authored-by: Matt Jacobson <matthew.jacobson@grafana.com>
2023-02-02 16:10:02 +00:00
Steve Simpson
c44e9f6b71 Alerting: Add metrics around notification delivery. (#62778)
This change exposes more metrics from the embedded Alertmanager, which are
valuable for troubleshooting Alertmanager operation particularly in HA setups.

```
grafana_alerting_notifications_total
grafana_alerting_notifications_failed_total
grafana_alerting_notification_requests_total
grafana_alerting_notification_requests_failed_total
grafana_alerting_notification_latency_seconds
grafana_alerting_nflog_gc_duration_seconds
grafana_alerting_nflog_snapshot_duration_seconds
grafana_alerting_nflog_snapshot_size_bytes
grafana_alerting_nflog_queries_total
grafana_alerting_nflog_query_errors_total
grafana_alerting_nflog_query_duration_seconds
grafana_alerting_nflog_gossip_messages_propagated_total
grafana_alerting_dispatcher_aggregation_groups
grafana_alerting_dispatcher_alert_processing_duration_seconds
```

Note that `alertmanager_dispatcher_aggregation_group_limit_reached_total` is
explicitly not exposed, as the group limit metrics are not enabled.
2023-02-02 14:44:20 +01:00
Alexander Weaver
fcecf4d3cb Alerting: Refactor away a layer of indirection around the goroutine in Loki state history (#62644)
Inline recordStreamsAsync in loki backend
2023-02-01 11:57:29 -06:00
Gilles De Mey
26866953c1 Alerting: hide "silence" button for external AM setups (#62133) 2023-02-01 15:51:05 +01:00
Sofia Papagiannaki
f143b0a5b2 Chore: Move folder store interface, implementation and test under pkg/services/folder (#62586)
* Chore: Move folder store into folder service package

* Split folder and dashboard store implementations
2023-02-01 15:43:21 +02:00
Alex Moreno
53945afedf Alerting: Allow alert rule pausing from API (#62326)
* Add is_paused attr to the POST alert rule group endpoint

* Add is_paused to alerting API POST alert rule group

* Fixed tests

* Add is_paused to alerting gettable endpoints

* Fix integration tests

* Alerting: allow to pause existing rules (#62401)

* Display Pause Rule switch in Editing Rule form

* add isPaused property to form interface and dto

* map isPaused prop with is_paused value from DTO

Also update test snapshots

* Append '(Paused)' text on alert list state column when appropriate

* Change Switch styles according to discussion with UX

Also adding a tooltip with info what this means

* Adjust styles

* Fix alignment and isPaused type definition

Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>

* Fix test

* Fix test

* Fix RuleList test

---------

Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>

* wip

* Fix tests and add comments to clarify AlertRuleWithOptionals

* Fix one more test

* Fix tests

* Fix typo in comment

* Fix alert rule(s) cannot be paused via API

* Add integration tests for alerting api pausing flow

* Remove duplicated integration test

---------

Co-authored-by: Virginia Cepeda <virginia.cepeda@grafana.com>
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
Co-authored-by: George Robinson <george.robinson@grafana.com>
2023-02-01 13:15:03 +01:00
Alexander Weaver
03f3fbec0d Alerting: Fix handling of special floating-point cases when writing observed values to annotations (#61074)
* Fix json serialization of state values

* Simplify two of the tests

* Fix linter complaint

* Don't return error if we fail to look up dashboard, just log it and move on

* Address linter complaint
2023-01-31 15:27:51 -06:00
gotjosh
55e7cf1aed Alerting: Introduce Metric Aggregation starting with Silences (#62512)
* Alerting: Introduce Metric Aggregation starting with Silences
---------

Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
2023-01-31 19:54:38 +00:00
gotjosh
178f290f0c Update dskit to the latest main (#62616)
* Update dskit to the latest main

* Break free from a cortex depedency
2023-01-31 19:05:49 +00:00
ismail simsek
91221bc436 Expressions: Fixes the issue showing expressions editor (#62510)
* Use suggested value for uid

* update the snapshot

* use __expr__

* replace all -100 with __expr__

* update snapshot

* more changes

* revert redundant change

* Use expr.DatasourceUID where it's possible

* generate files
2023-01-31 18:50:10 +01:00
Alexander Weaver
e7ace4ed62 Alerting: Allow separate read and write path URLs for Loki state history (#62268)
Extract config parsing and add tests
2023-01-30 16:30:05 -06:00
Alexander Weaver
b4682fe3cb Alerting: Configurable externalLabels for Loki state history (#62404)
* Add config option for external labels

* Remove redundant nilcheck
2023-01-30 14:24:45 -06:00
Alex Moreno
7a465f42a6 Alerting: Allow pausing alerts from provisioning (#62263)
* Allow pausing alerts from provisioning

* Update swagger

* Add IsPaused to provision export endpoints

* Add pause field in sample.yml

* Add exception for reset state in first loop iteration of scheduler if rule is paused

* Update provision definition and swagger docs

* Fix provisioning export tests

* Suggestion: Simplify if condition

* Add more context to a comment
2023-01-30 16:29:05 +01:00
Ieva
ee3d742c7d RBAC: inherit folder permissions when resolving managed permissions (#62244)
* add nested folder scope inheritance to managed permission services

* add a more specific erorr

* remove circular dependencies

* use errutil for returning erorr

* fix tests

* fix tests

* define a new error in ac package
2023-01-30 14:19:42 +00:00
idafurjes
3bda112c5f Chore: Move search model from models package to search service (#62215)
* Chore: Move search model from models package to search service

* Remove unused imports

* Cleanup after merge
2023-01-30 15:17:53 +01:00
Serge Zaitsev
d6d4097567 Chore: Fix goimports grouping in alerting (#62424)
* fix goimports

* fix goimports order
2023-01-30 09:55:35 +01:00
Yuri Tseretyan
0c4671e31f Alerting: Update historian to ignore transitions from Normal Paused and Updated (#62267) 2023-01-27 16:26:22 -05:00
gotjosh
3c616da83f Alerting: Refactor metrics/ngalert.go into seperate files (#62362)
* Alerting: Refactor metrics/ngalert.go into seperate files
2023-01-27 18:49:49 +00:00
Matthew Jacobson
c006df375a Alerting: Create endpoints for exporting in provisioning file format (#58623)
This adds provisioning endpoints for downloading alert rules and alert rule groups in a 
format that is compatible with file provisioning. Each endpoint supports both json and 
yaml response types via Accept header as well as a query parameter 
download=true/false that will set Content-Disposition to recommend initiating a download 
or inline display.

This also makes some package changes to keep structs with potential to drift closer 
together. Eventually, other alerting file structs should also move into this new file 
package, but the rest require some refactoring that is out of scope for this PR.
2023-01-27 11:39:16 -05:00
George Robinson
b6db1ed524 Revert "Alerting: Add is_paused attr to the POST alert rule group endpoint" (#62310)
Revert "Alerting: Add is_paused attr to the POST alert rule group endpoint (#62253)"

This reverts commit 3ccafe3a5a.
2023-01-27 13:41:36 +01:00
Alex Moreno
3ccafe3a5a Alerting: Add is_paused attr to the POST alert rule group endpoint (#62253)
Add is_paused attr to the POST alert rule group endpoint
2023-01-27 10:50:06 +01:00
Yuri Tseretyan
05bf241952 Alerting: Update state manager to return StateTransitions when Delete or Reset (#62264)
* update Delete and Reset methods to return state transitions

this will be used by notifier code to decide whether alert needs to be sent or not.

* update scheduler to provide reason to delete states and use transitions

* update FromAlertsStateToStoppedAlert to accept StateTransition and filter by old state

* fixup

* fix tests
2023-01-27 09:46:21 +01:00
idafurjes
6c5a573772 Chore: Move ReqContext to contexthandler service (#62102)
* Chore: Move ReqContext to contexthandler service

* Rename package to contextmodel

* Generate ngalert files

* Remove unused imports
2023-01-27 08:50:36 +01:00
Alex Moreno
531b439cf1 Alerting: Add alert pausing feature (#60734)
* Add field in alert_rule model, add state to alert_instance model, and state to eval

* Remove paused state from eval package

* Skip paused alert rules in scheduler

* Add migration to add is_paused field to alert_rule table

* Convert to postable alerts only if not normal, pernding, or paused

* Handle paused eval results in state manager

* Add Paused state to eval package

* Add paused alerts logic in scheduler

* Skip alert on scheduler

* Remove paused status from eval package

* Apply suggestions from code review

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Remove state

* Rethink schedule and manager for paused alerts

* Change return to continue

* Remove unused var

* Rethink alert pausing

* Paused alerts storing annotations

* Only add one state transition

* Revert boolean method renaming refactor

* Revert take image refactor

* Make registry errors public

* Revert method extraction for getting a folder title

* Revert variable renaming refactor

* Undo unnecessary changes

* Revert changes in test

* Remove IsPause check in PatchPartiLAlertRule function

* Use SetNormal to set state

* Fix text by returning to old behaviour on alert rule deletion

* Add test in schedule_unit_test.go to test ticks with paused alerts

* Add coment to clarify usage of context.Background()

* Add comment to clarify resetStateByRuleUID method usage

* Move rule get to a more limited scope

* Update pkg/services/ngalert/schedule/schedule.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* rum gofmt on pkg/services/ngalert/schedule/schedule.go

* Remove defer cancel for context

* Update pkg/services/ngalert/models/instance_test.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* Update pkg/services/ngalert/models/testing.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* Update pkg/services/ngalert/schedule/schedule_unit_test.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* Update pkg/services/ngalert/schedule/schedule_unit_test.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* Update pkg/services/ngalert/models/instance_test.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* skip scheduler rule state clean up on paused alert rule

* Update pkg/services/ngalert/schedule/schedule.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* Fix mock in test

* Add (hopefully) final suggestions

* Use error channel from recordAnnotationsSync to cancel context

* Run make gen-cue

* Place pause alert check in channel update after version check

* Reduce branching un update channel select

* Add if for error and move code inside if in state manager ResetStateByRuleUID

* Add reason to logs

* Update pkg/services/ngalert/schedule/schedule.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Do not delete alert rule routine, just exit on eval if is paused

* Reduce branching and create-close a channel to avoid deadlocks

* Separate state deletion and state reset (includes history saving)

* Add current pause state in rule route in scheduler

* Split clearState and bring errCh closer to RecordStatesAsync call

* Change rule to ruleMeta in RecordStatesAsync

* copy state to be able to modify it

* Add timeout to context creation

* Shorten the timeout

* Use resetState is rule is paused and deleteState if rule is not paused

* Remove Empty state reason

* Save every rule change in historian

* Add tests for DeleteStateByRuleUID and ResetStateByRuleUID

* Remove useless line

* Remove outdated comment

Co-authored-by: George Robinson <george.robinson@grafana.com>
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>
2023-01-26 18:29:10 +01:00
Kristin Laemmert
e8b8a9e276 chore: move dashboard_acl models into dashboard service (#62151) 2023-01-26 08:46:30 -05:00
gotjosh
0bfe150928 Alerting: Fix Test Receivers when settings are non-strings (#62156)
* Alerting: Fix Test Receivers when settings are non-strings

As part of the Alerting extraction, we want to make sure we don't have circular depedencies. As such, I had to move `PostableGrafanaReceiver` to a new struct in `grafana/alerting` called `GrafanaReceiver`.

`PostableGrafanaReceiver` has an attribute called `Settings` that uses a Grafana-propietary struct called `RawMessage`, this struct shadows `json.RawMessage`.

When I created `GrafanaReceiver`, I turned settings into a `map[string]string` thinking all settings would end up as strings. This was a mistake, and this test proves that it doesn't work, and breaks the API.
2023-01-26 12:54:03 +00:00
George Robinson
a7eab8e46e Alerting: Support context.Context in Loki interface (#61979)
This commit adds support for canceleable contexts in the Loki
interface.
2023-01-26 09:31:20 +00:00