* Alerting: Apply query optimization to eval endpoints
Previously, query optimization was applied to alert queries when scheduled but
not when ran through `api/v1/eval` or `/api/v1/rule/test/grafana`. This could
lead to discrepancies between preview and scheduled alert results.
* remove use of SignedInUserCopies
* add extra safety to not cross assign permissions
unwind circular dependency
dashboardacl->dashboardaccess
fix missing import
* correctly set teams for permissions
* fix missing inits
* nit: check err
* exit early for api keys
* update storage's method InstertRules to return ids of added rules as slice to keep the same order as rules in the argument
* schematize response of update rule group endpoint, add created, updated, deleted fields that contain UID of affected rules.
* update integration tests to use the new fields
* add folder data migration, fix unique index
* fix unique index
* pass a fake store in tests
* pass store into other providers in tests
* and now with alerting!
* Expose library element service's folder service
* Register library panels, add count implementation
* Expand folder counts test
* Update registry deletion method interface
* Allow getting library elements from any folder
* Add test for library panel deletion
* Add test for library panel counting
* Alerting: Fix unique violation when updating rule group with title chains/cycles
The uniqueness constraint for titles within an org+folder is enforced on every update within a transaction instead of on commit (deferred constraint). This means that there could be a set of updates that will throw a unique constraint violation in an intermediate step even though the final state is valid. For example, a chain of updates RuleA -> RuleB -> RuleC could fail if not executed in the correct order, or a swap of titles RuleA <-> RuleB cannot be executed in any order without violating the constraint.
The exact solution to this is complex and requires determining directed paths and cycles in the update graph, adding in temporary updates to break cycles, and then executing the updates in reverse topological order (see first commit in PR if curious).
This is not implemented here.
Instead, we choose a simpler solution that works in all cases but might perform more updates than necessary. This simpler solution makes a determination of whether an intermediate collision could occur and if so, adds a temporary title on all updated rules to break any cycles and remove the need for specific ordering.
In addition, we make sure diffs are executed in the following order: DELETES, UPDATES, INSERTS.
* Let alert rule service implement registry service
* Add count method to RuleStore interface
* Add implementation for deletion of alert rules
* Rename uid to folderUID in registry methods
* Check forceDeleteRule value for registry deletion
* Register alerting store with folder service
* Move folder test functions to separate package
* Add testing for alert rule counting, deletion
* Remove redundant count method
* Fix deleteChildrenInFolder signature
* Update pkg/services/ngalert/store/alert_rule.go
Co-authored-by: Sofia Papagiannaki <1632407+papagian@users.noreply.github.com>
* Add tests for nested folder deletion
* Refactor TestIntegrationNestedFolderService
* Add rules store as parameter for alertng provider
---------
Co-authored-by: Sofia Papagiannaki <1632407+papagian@users.noreply.github.com>
* (WIP) Refactor the ImageStore interface to work with our latest alerting repository
* update alerting package
* refactor, new URLExists method in ImageProvider
* tests for the new methods
* fix linter warnings
* use alertingImages as an alias for grafana/alerting/images
* logs about image uris and not found images
* nerf image not found logs
* extract duplicated code to getImageFromURI() method
* refactor getImageFromURI()
* add index on url
* add comment about migration log
* sync generated files
* use tokens or urls in image annotations
* improve tests, fix some comments
* fix empty tokens
* code review changes, check for url before checking for token (support old token formats)
* Delete folders, dashboards with registry service
Co-authored-by: Serge Zaitsev <hello@zserge.com>
* Update signature of ProvideDashboardServiceImpl
* Regenerate mockery file
* Add test for DeleteInFolder
* Add test for DeleteDashboardsInFolder
* Delete child dashboard associations via registry
* Add validation of folder uid and org id
---------
Co-authored-by: Serge Zaitsev <hello@zserge.com>
* Alerting: Remove and revert flag alertingBigTransactions
This is a partial revert of #56575 and a removal of the `alertingBigTransactions` flag.
Real-word use has seen no clear performance incentive to maintain this flag. Lowered db connection count
came at the cost of significant increase in CPU usage and query latency.
* Fix lint backend
* Removed last bits of alertingBigTransactions
---------
Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>
* Alerting: Add endpoint to revert to a previous alertmanager configuration
This endpoint is meant to be used in conjunction with /api/alertmanager/grafana/config/history to
revert to a previously applied alertmanager configuration. This is done by ID instead of raw config
string in order to avoid secure field complications.
* WIP
* skip invalid historic configurations instead of erroring
* add warning log when bad historic config is found
* remove unused custom marshaller for GettableHistoricUserConfig
* add id to historic user config, move limit check to store, fix typo
* swagger spec
* Alerting: get alert rules on faults (#61248)
Two functions used to fetch alert rules from DB are updated:
- GetAlertRulesForScheduling
- ListAlertRules
Rows are scanned one by one so good ones are returned.
Common Error is logged with indication how many
rules failed on deserialization.
Resolved: #61248
* updates from review comments
* change error to warning, apply correct format
* Update pkg/services/ngalert/store/alertmanager.go
Co-authored-by: George Robinson <george.robinson@grafana.com>
* different wording and data
---------
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Mark AM configuration as applied
* add missing checks, make linter happy
* fix deadlock, mark as valid on save and on load
* mark configurations only if needed
* check error after applyConfig()
* code review comments
* code review changes
* more code review changes
* clean HistoricConfigFromAlertConfig function
* Add is_paused attr to the POST alert rule group endpoint
* Add is_paused to alerting API POST alert rule group
* Fixed tests
* Add is_paused to alerting gettable endpoints
* Fix integration tests
* Alerting: allow to pause existing rules (#62401)
* Display Pause Rule switch in Editing Rule form
* add isPaused property to form interface and dto
* map isPaused prop with is_paused value from DTO
Also update test snapshots
* Append '(Paused)' text on alert list state column when appropriate
* Change Switch styles according to discussion with UX
Also adding a tooltip with info what this means
* Adjust styles
* Fix alignment and isPaused type definition
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
* Fix test
* Fix test
* Fix RuleList test
---------
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
* wip
* Fix tests and add comments to clarify AlertRuleWithOptionals
* Fix one more test
* Fix tests
* Fix typo in comment
* Fix alert rule(s) cannot be paused via API
* Add integration tests for alerting api pausing flow
* Remove duplicated integration test
---------
Co-authored-by: Virginia Cepeda <virginia.cepeda@grafana.com>
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Add field in alert_rule model, add state to alert_instance model, and state to eval
* Remove paused state from eval package
* Skip paused alert rules in scheduler
* Add migration to add is_paused field to alert_rule table
* Convert to postable alerts only if not normal, pernding, or paused
* Handle paused eval results in state manager
* Add Paused state to eval package
* Add paused alerts logic in scheduler
* Skip alert on scheduler
* Remove paused status from eval package
* Apply suggestions from code review
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Remove state
* Rethink schedule and manager for paused alerts
* Change return to continue
* Remove unused var
* Rethink alert pausing
* Paused alerts storing annotations
* Only add one state transition
* Revert boolean method renaming refactor
* Revert take image refactor
* Make registry errors public
* Revert method extraction for getting a folder title
* Revert variable renaming refactor
* Undo unnecessary changes
* Revert changes in test
* Remove IsPause check in PatchPartiLAlertRule function
* Use SetNormal to set state
* Fix text by returning to old behaviour on alert rule deletion
* Add test in schedule_unit_test.go to test ticks with paused alerts
* Add coment to clarify usage of context.Background()
* Add comment to clarify resetStateByRuleUID method usage
* Move rule get to a more limited scope
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: George Robinson <george.robinson@grafana.com>
* rum gofmt on pkg/services/ngalert/schedule/schedule.go
* Remove defer cancel for context
* Update pkg/services/ngalert/models/instance_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Update pkg/services/ngalert/models/testing.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Update pkg/services/ngalert/schedule/schedule_unit_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Update pkg/services/ngalert/schedule/schedule_unit_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Update pkg/services/ngalert/models/instance_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* skip scheduler rule state clean up on paused alert rule
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Fix mock in test
* Add (hopefully) final suggestions
* Use error channel from recordAnnotationsSync to cancel context
* Run make gen-cue
* Place pause alert check in channel update after version check
* Reduce branching un update channel select
* Add if for error and move code inside if in state manager ResetStateByRuleUID
* Add reason to logs
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Do not delete alert rule routine, just exit on eval if is paused
* Reduce branching and create-close a channel to avoid deadlocks
* Separate state deletion and state reset (includes history saving)
* Add current pause state in rule route in scheduler
* Split clearState and bring errCh closer to RecordStatesAsync call
* Change rule to ruleMeta in RecordStatesAsync
* copy state to be able to modify it
* Add timeout to context creation
* Shorten the timeout
* Use resetState is rule is paused and deleteState if rule is not paused
* Remove Empty state reason
* Save every rule change in historian
* Add tests for DeleteStateByRuleUID and ResetStateByRuleUID
* Remove useless line
* Remove outdated comment
Co-authored-by: George Robinson <george.robinson@grafana.com>
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>
* add feature flag `alertingNoNormalState`
* update instance database to support exclusion of state in list operation
* do not save normal state and delete transitions to normal
* update get methods to filter out normal state
* Update config store to split between active and history tables
* Migrations to fix up indexes
* Implement migration from old format to new
* Move add migrations call
* Delete duplicated rows
* Explicitly map fields
* Quote the column name because it's a reserved word
* Lift migrations to top
* Use XORM for nearly everything, avoid any non trivial raw SQL
* Touch up indexes and zero out IDs on move
* Drop TODO that's already completed
* Fix assignment of IDs