Commit Graph

189 Commits

Author SHA1 Message Date
Jean-Philippe Quéméner
2b8c6d66e1
feat(alerting): add query optimizations for prometheus (#76015) 2023-10-17 11:41:25 +02:00
Yuri Tseretyan
2497db4bd6
Alerting: Add UID of rules to response that were affected by update group request (#75985)
* update storage's method InstertRules to return ids of added rules as slice to keep the same order as rules in the argument
* schematize response of update rule group endpoint, add created, updated, deleted fields that contain UID of affected rules.
* update integration tests to use the new fields
2023-10-07 01:11:24 +03:00
Jo
41bcb5e07f
Identity: Port folder library to identity.Requester (#76105)
Port folders to identity.Requester
2023-10-06 15:02:34 +02:00
Yuri Tseretyan
5be52dfe21
Alerting: Fix store's GetNamespaceByUID (#75976) 2023-10-04 13:13:31 -04:00
Jean-Philippe Quéméner
f3b6d01306
feat(alerting): enable loki query optimization by default (#74739) 2023-09-13 13:52:40 +02:00
Serge Zaitsev
58f6648505
Chore: capitalise messages for alerting (#74335) 2023-09-04 18:46:34 +02:00
Ryan McKinley
025b2f3011
Chore: use any rather than interface{} (#74066) 2023-08-30 18:46:47 +03:00
Jean-Philippe Quéméner
2266e09f94
Alerting: optimize rules with multiple loki range queries (#73103) 2023-08-09 19:00:51 +02:00
Jean-Philippe Quéméner
2c6cf66741
Alerting: Optimize external Loki queries (#73014) 2023-08-08 15:13:41 +02:00
Serge Zaitsev
7767ab6f43
Chore: Add folder data migration, fix unique index (#72602)
* add folder data migration, fix unique index

* fix unique index

* pass a fake store in tests

* pass store into other providers in tests

* and now with alerting!
2023-08-01 09:36:37 +02:00
Arati R
20ffbbc41e
NestedFolders: Add library panels counting and deletion to folder registry (#69149)
* Expose library element service's folder service
* Register library panels, add count implementation
* Expand folder counts test
* Update registry deletion method interface
* Allow getting library elements from any folder
* Add test for library panel deletion
* Add test for library panel counting
2023-07-25 13:05:53 +02:00
Santiago
ff9eff49bd
Alerting: Bump grafana/alerting and refactor the ImageStore/Provider to provide image URL/bytes (#70182)
* implement alerting.images.Provider interface in our ImageStore

* add URLExists() method to fakeConfigStore

* make linter happy

* update integration tests
2023-06-21 20:53:30 -03:00
Jean-Philippe Quéméner
934ba1aaa1
Alerting: Rewrite range to instant queries if possible (#69976) 2023-06-16 19:55:49 +02:00
Matthew Jacobson
0c688190f7
Alerting: Fix unique violation when updating rule group with title chains/cycles (#67868)
* Alerting: Fix unique violation when updating rule group with title chains/cycles

The uniqueness constraint for titles within an org+folder is enforced on every update within a transaction instead of on commit (deferred constraint). This means that there could be a set of updates that will throw a unique constraint violation in an intermediate step even though the final state is valid. For example, a chain of updates RuleA -> RuleB -> RuleC could fail if not executed in the correct order, or a swap of titles RuleA <-> RuleB cannot be executed in any order without violating the constraint.

The exact solution to this is complex and requires determining directed paths and cycles in the update graph, adding in temporary updates to break cycles, and then executing the updates in reverse topological order (see first commit in PR if curious).

This is not implemented here.

Instead, we choose a simpler solution that works in all cases but might perform more updates than necessary. This simpler solution makes a determination of whether an intermediate collision could occur and if so, adds a temporary title on all updated rules to break any cycles and remove the need for specific ordering.

In addition, we make sure diffs are executed in the following order: DELETES, UPDATES, INSERTS.
2023-06-08 18:51:50 -04:00
Arati R
6cb1a5e368
Nested folders: Add alert rule counts and deletion to folder registry (#67259)
* Let alert rule service implement registry service
* Add count method to RuleStore interface
* Add implementation for deletion of alert rules
* Rename uid to folderUID in registry methods
* Check forceDeleteRule value for registry deletion
* Register alerting store with folder service
* Move folder test functions to separate package
* Add testing for alert rule counting, deletion
* Remove redundant count method
* Fix deleteChildrenInFolder signature
* Update pkg/services/ngalert/store/alert_rule.go
Co-authored-by: Sofia Papagiannaki <1632407+papagian@users.noreply.github.com>
* Add tests for nested folder deletion
* Refactor TestIntegrationNestedFolderService
* Add rules store as parameter for alertng provider

---------

Co-authored-by: Sofia Papagiannaki <1632407+papagian@users.noreply.github.com>
2023-06-02 16:38:02 +02:00
Ieva
d8b66d5c4b
RBAC: remove some IsDisabled checks (#69272)
* remove some access contorl IsDisabled() checks

* cleaning up tests

* update tests

* linting
2023-05-31 09:58:57 +01:00
Alexander Weaver
0ed5d3bdf2
Revert "Alerting: Refactor the ImageStore/Provider to provide image URL/bytes" (#69265)
Revert "Alerting: Refactor the ImageStore/Provider to provide image URL/bytes (#67693)"

This reverts commit 72a187b0be.
2023-05-30 11:33:33 -05:00
Santiago
72a187b0be
Alerting: Refactor the ImageStore/Provider to provide image URL/bytes (#67693)
* (WIP) Refactor the ImageStore interface to work with our latest alerting repository

* update alerting package

* refactor, new URLExists method in ImageProvider

* tests for the new methods

* fix linter warnings

* use alertingImages as an alias for grafana/alerting/images

* logs about image uris and not found images

* nerf image not found logs

* extract duplicated code to getImageFromURI() method

* refactor getImageFromURI()

* add index on url

* add comment about migration log

* sync generated files
2023-05-30 11:25:55 -03:00
Matthew Jacobson
97ae6ae6ef
Alerting: Fix flaky TestIntegrationUpdateAlertRules (#69106)
Prevents duplicate alert rule ids and 0 value for BaseInterval and IntervalSeconds
2023-05-25 16:00:06 -04:00
Yuri Tseretyan
b57ef1f2c7
Alerting: Fix TestIntegration_GetAlertRulesForScheduling to make sure rules are created in different org (#69088)
make sure rules are created in different org
2023-05-25 13:51:38 -04:00
Santiago
b0881daf23
Alerting: Use URLs in image annotations (#66804)
* use tokens or urls in image annotations

* improve tests, fix some comments

* fix empty tokens

* code review changes, check for url before checking for token (support old token formats)
2023-04-26 13:06:18 -03:00
Arati R
cab3ba519a
NestedFolders: Add folder service registry with dashboard service implementation (#65033)
* Delete folders, dashboards with registry service
Co-authored-by: Serge Zaitsev <hello@zserge.com>
* Update signature of ProvideDashboardServiceImpl
* Regenerate mockery file
* Add test for DeleteInFolder
* Add test for DeleteDashboardsInFolder
* Delete child dashboard associations via registry
* Add validation of folder uid and org id

---------

Co-authored-by: Serge Zaitsev <hello@zserge.com>
2023-04-14 11:17:23 +02:00
gotjosh
2bbf0c9de4
Alerting: Allow Rules to Schedule to be filtered by Rule Group (#59990)
* Alerting: Allow Rules to Schedule to be filtered by Rule Group
2023-04-13 12:55:42 +01:00
Matthew Jacobson
63187fae0c
Alerting: Remove and revert flag alertingBigTransactions (#65976)
* Alerting: Remove and revert flag alertingBigTransactions

This is a partial revert of #56575 and a removal of the `alertingBigTransactions` flag.

Real-word use has seen no clear performance incentive to maintain this flag. Lowered db connection count
came at the cost of significant increase in CPU usage and query latency.

* Fix lint backend

* Removed last bits of alertingBigTransactions

---------

Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>
2023-04-06 18:06:25 +02:00
Matthew Jacobson
85f738cdf9
Alerting: Add endpoint to revert to a previous alertmanager configuration (#65751)
* Alerting: Add endpoint to revert to a previous alertmanager configuration

This endpoint is meant to be used in conjunction with /api/alertmanager/grafana/config/history to
revert to a previously applied alertmanager configuration. This is done by ID instead of raw config
string in order to avoid secure field complications.
2023-04-05 14:10:03 -04:00
Santiago
aba91d3053
Alerting: Fetch all applied alerting configurations (#65728)
* WIP

* skip invalid historic configurations instead of erroring

* add warning log when bad historic config is found

* remove unused custom marshaller for GettableHistoricUserConfig

* add id to historic user config, move limit check to store, fix typo

* swagger spec
2023-03-31 17:43:04 -03:00
Serge Zaitsev
0beb768427
Chore: Remove result fields from ngalert (#65410)
* remove result fields from ngalert

* remove duplicate imports
2023-03-28 10:34:35 +02:00
Alexander Weaver
4a1c18abf6
Alerting: Fix intermittency when seeding database in rule store tests (#64322)
Force unique IDs when seeding database
2023-03-07 09:40:55 -05:00
bla2ej
56c8661929
Alerting: Get alert rules on faults (#61248) (#63051)
* Alerting: get alert rules on faults (#61248)

Two functions used to fetch alert rules from DB are updated:
- GetAlertRulesForScheduling
- ListAlertRules
Rows are scanned one by one so good ones are returned.
Common Error is logged with indication how many
rules failed on deserialization.

Resolved: #61248

* updates from review comments
2023-02-21 08:54:20 -06:00
Santiago
955c7b13ea
Alerting: Change error log to warning and apply correct format when updating historic config (#62973)
* change error to warning, apply correct format

* Update pkg/services/ngalert/store/alertmanager.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* different wording and data

---------

Co-authored-by: George Robinson <george.robinson@grafana.com>
2023-02-07 11:04:05 -03:00
Santiago
ba731f7865
Alerting: Mark AM configuration as applied (#61330)
* Mark AM configuration as applied

* add missing checks, make linter happy

* fix deadlock, mark as valid on save and on load

* mark configurations only if needed

* check error after applyConfig()

* code review comments

* code review changes

* more code review changes

* clean HistoricConfigFromAlertConfig function
2023-02-02 14:45:17 -03:00
Alex Moreno
53945afedf
Alerting: Allow alert rule pausing from API (#62326)
* Add is_paused attr to the POST alert rule group endpoint

* Add is_paused to alerting API POST alert rule group

* Fixed tests

* Add is_paused to alerting gettable endpoints

* Fix integration tests

* Alerting: allow to pause existing rules (#62401)

* Display Pause Rule switch in Editing Rule form

* add isPaused property to form interface and dto

* map isPaused prop with is_paused value from DTO

Also update test snapshots

* Append '(Paused)' text on alert list state column when appropriate

* Change Switch styles according to discussion with UX

Also adding a tooltip with info what this means

* Adjust styles

* Fix alignment and isPaused type definition

Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>

* Fix test

* Fix test

* Fix RuleList test

---------

Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>

* wip

* Fix tests and add comments to clarify AlertRuleWithOptionals

* Fix one more test

* Fix tests

* Fix typo in comment

* Fix alert rule(s) cannot be paused via API

* Add integration tests for alerting api pausing flow

* Remove duplicated integration test

---------

Co-authored-by: Virginia Cepeda <virginia.cepeda@grafana.com>
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
Co-authored-by: George Robinson <george.robinson@grafana.com>
2023-02-01 13:15:03 +01:00
idafurjes
3bda112c5f
Chore: Move search model from models package to search service (#62215)
* Chore: Move search model from models package to search service

* Remove unused imports

* Cleanup after merge
2023-01-30 15:17:53 +01:00
Serge Zaitsev
d6d4097567
Chore: Fix goimports grouping in alerting (#62424)
* fix goimports

* fix goimports order
2023-01-30 09:55:35 +01:00
Alex Moreno
531b439cf1
Alerting: Add alert pausing feature (#60734)
* Add field in alert_rule model, add state to alert_instance model, and state to eval

* Remove paused state from eval package

* Skip paused alert rules in scheduler

* Add migration to add is_paused field to alert_rule table

* Convert to postable alerts only if not normal, pernding, or paused

* Handle paused eval results in state manager

* Add Paused state to eval package

* Add paused alerts logic in scheduler

* Skip alert on scheduler

* Remove paused status from eval package

* Apply suggestions from code review

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Remove state

* Rethink schedule and manager for paused alerts

* Change return to continue

* Remove unused var

* Rethink alert pausing

* Paused alerts storing annotations

* Only add one state transition

* Revert boolean method renaming refactor

* Revert take image refactor

* Make registry errors public

* Revert method extraction for getting a folder title

* Revert variable renaming refactor

* Undo unnecessary changes

* Revert changes in test

* Remove IsPause check in PatchPartiLAlertRule function

* Use SetNormal to set state

* Fix text by returning to old behaviour on alert rule deletion

* Add test in schedule_unit_test.go to test ticks with paused alerts

* Add coment to clarify usage of context.Background()

* Add comment to clarify resetStateByRuleUID method usage

* Move rule get to a more limited scope

* Update pkg/services/ngalert/schedule/schedule.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* rum gofmt on pkg/services/ngalert/schedule/schedule.go

* Remove defer cancel for context

* Update pkg/services/ngalert/models/instance_test.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* Update pkg/services/ngalert/models/testing.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* Update pkg/services/ngalert/schedule/schedule_unit_test.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* Update pkg/services/ngalert/schedule/schedule_unit_test.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* Update pkg/services/ngalert/models/instance_test.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* skip scheduler rule state clean up on paused alert rule

* Update pkg/services/ngalert/schedule/schedule.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* Fix mock in test

* Add (hopefully) final suggestions

* Use error channel from recordAnnotationsSync to cancel context

* Run make gen-cue

* Place pause alert check in channel update after version check

* Reduce branching un update channel select

* Add if for error and move code inside if in state manager ResetStateByRuleUID

* Add reason to logs

* Update pkg/services/ngalert/schedule/schedule.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Do not delete alert rule routine, just exit on eval if is paused

* Reduce branching and create-close a channel to avoid deadlocks

* Separate state deletion and state reset (includes history saving)

* Add current pause state in rule route in scheduler

* Split clearState and bring errCh closer to RecordStatesAsync call

* Change rule to ruleMeta in RecordStatesAsync

* copy state to be able to modify it

* Add timeout to context creation

* Shorten the timeout

* Use resetState is rule is paused and deleteState if rule is not paused

* Remove Empty state reason

* Save every rule change in historian

* Add tests for DeleteStateByRuleUID and ResetStateByRuleUID

* Remove useless line

* Remove outdated comment

Co-authored-by: George Robinson <george.robinson@grafana.com>
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>
2023-01-26 18:29:10 +01:00
Kristin Laemmert
e8b8a9e276
chore: move dashboard_acl models into dashboard service (#62151) 2023-01-26 08:46:30 -05:00
idafurjes
421976e919
Chore: Remove folders from models pkg (#61853) 2023-01-25 09:14:32 +01:00
Santiago
e5920c211e
Chore: Fix random indices for slices in test files (#61884)
* Fix random indices for slices in test files

* Empty commit
2023-01-24 15:07:37 -03:00
Matthew Jacobson
23e05373a7
Alerting: Fix flaky TestIntegrationUpdateAlertRules (#61641)
Prevents random OrgID=0 in test alert generation causing invalid alert rule.
2023-01-17 19:09:46 +00:00
Alexander Weaver
4f1bdc0607
Alerting: Skip flaky test in TestIntegrationUpdateAlertRules (#61627)
* Skip flaky test

* Add comment
2023-01-17 10:39:16 -06:00
Denis Limarev
e6dee8a723
Perfomance: Preallocate slices (#61580) 2023-01-17 11:50:17 +00:00
Yuri Tseretyan
9d57b1c72e
Alerting: Do not persist noop transition from Normal state. (#61201)
* add feature flag `alertingNoNormalState`
* update instance database to support exclusion of state in list operation
* do not save normal state and delete transitions to normal
* update get methods to filter out normal state
2023-01-13 18:29:29 -05:00
Alexander Weaver
0e7640475f
Alerting: Store alertmanager configuration history in a separate table in the database (#60492)
* Update config store to split between active and history tables

* Migrations to fix up indexes

* Implement migration from old format to new

* Move add migrations call

* Delete duplicated rows

* Explicitly map fields

* Quote the column name because it's a reserved word

* Lift migrations to top

* Use XORM for nearly everything, avoid any non trivial raw SQL

* Touch up indexes and zero out IDs on move

* Drop TODO that's already completed

* Fix assignment of IDs
2023-01-04 10:43:26 -06:00
Alexander Weaver
91bd1cdb41
Revert "Alerting: Store alertmanager configuration history in a separate table in the database" (#60470)
Revert "Alerting: Store alertmanager configuration history in a separate table in the database (#60197)"

This reverts commit ec80f38c34.
2022-12-16 10:07:44 -05:00
Alexander Weaver
ec80f38c34
Alerting: Store alertmanager configuration history in a separate table in the database (#60197)
* Update config store to split between active and history tables

* Migrations to fix up indexes

* Implement migration from old format to new

* Move add migrations call

* Delete duplicated rows

* Explicitly map fields

* Quote the column name because it's a reserved word

* Lift migrations to top
2022-12-15 17:35:00 -06:00
Sofia Papagiannaki
11d8bcbea9
Guardian: Introduce additional constructors (#59577)
* Guardian: Use dashboard UID instead of ID

* Apply suggestions from code review

Introduce several guardian constructors and each time use
the most appropriate one.
2022-12-15 16:34:17 +02:00
Alexander Weaver
595e623c28
Alerting: Additional tests for the config store (#60130)
Additional tests for the config store
2022-12-12 11:11:18 -06:00
Joe Blubaugh
1a8d0e2736
Alerting: Speed up unit and integration tests. (#60067)
This change marks tests in the `sender` package that use an external
process as integration tests instead of unit tests. This speeds up the
package's unit tests by about 20 seconds.

This change also reduces the number of alert instances in the `store`
package's bulk write integration test from 20_000 to 10_000. This is
still enough to exercise the bulk-write code but speeds up the package
tests from about 250s to 130s.

Put together, integration tests go to about 160s while also speeding up
unit tests by 20s.
2022-12-12 14:21:06 +08:00
Sofia Papagiannaki
02b6b09121
Nested Folders: Set user in the API level (#59148) 2022-11-23 11:13:47 +02:00
Sofia Papagiannaki
9855e74b92
Chore: Refactor quota service (#58643)
Chore: Refactor quota service (#57586)

* Chore: refactore quota service

* Apply suggestions from code review
2022-11-14 21:08:10 +02:00