Commit Graph

66 Commits

Author SHA1 Message Date
Alex Moreno
174c61b949
Alerting: Set Dashboard and Panel IDs on rule group replacement (#60374)
* Set Dashboard and Panel IDs on rule group replacement

* fix comments and abbreviate test variable name

* Update pkg/services/ngalert/provisioning/alert_rules.go

Co-authored-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com>

Co-authored-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com>
2022-12-16 11:47:25 +01:00
George Robinson
76601f3ae7
Alerting: Better define how we set states (#59977)
This commit better defines how we set states in resultNormal,
resultAlerting, resultError and resultNoData. It changes the existing
code to call methods such as SetAlerting, SetPending, SetNormal,
SetError and NoData instead of assigning values to each individual field
whenever the state is changed. This should make it easier to understand
what fields should be set for which states and avoid cases where states are
missing, or have additional unexpected fields.
2022-12-08 20:12:13 +00:00
Sofia Papagiannaki
9855e74b92
Chore: Refactor quota service (#58643)
Chore: Refactor quota service (#57586)

* Chore: refactore quota service

* Apply suggestions from code review
2022-11-14 21:08:10 +02:00
George Robinson
c5ae1bcfe0
Alerting: Fix logging pointer address of DashboardUID and PanelID variables (#58539) 2022-11-10 09:58:38 +00:00
Alexander Weaver
2bfdda5b68
Alerting: Break dependency between state and image packages (#58381)
* Refactor state and manager to not depend directly on image interface

* Move generic errors to models package

* Move NotAvailableImageService to state as its only references are in state tests

* Move NoopImageService to state package

* Move mock to state package

* Fix linter error

* Fix comment styling

* Fix a couple added references introduced by rebase

* Empty commit to kick build
2022-11-09 15:06:49 -06:00
Kristin Laemmert
ef7145e4aa
feat(nested folders): Add CountAlertRulesInFolder to ngalert store (#58269)
* chore: refactor CountDashboardsInFolder to use the more efficient Count() sql function

* feat(nested folders): Add CountAlertRulesInFolder to ngalert store

This commit adds CountAlertRulesInFolder and a new model for the CountAlertRulesQuery. It returns a count of alert rules associated with a given orgID and parent folder UID. (the namespace referenced inside alert rules is the parent folder).

I'm not sure where this belongs in the ngalert service, so that will come in a future commit.
2022-11-08 11:51:00 +01:00
Sofia Papagiannaki
96cdf77995
Revert "Chore: Refactor quota service (#57586)" (#58394)
This reverts commit 326ea86a57.
2022-11-08 11:52:07 +02:00
Sofia Papagiannaki
326ea86a57
Chore: Refactor quota service (#57586)
* Chore: refactore quota service

* Apply suggestions from code review
2022-11-08 10:25:34 +02:00
Neel
db1fd10ff1
Alerting: Append org ID to alert notification URLs (#57123) 2022-11-07 16:03:25 +00:00
Yuriy Tseretyan
0a4121cef8
Alerting: Contextual log provider for rule key (#57476)
* create contextual log context provider
* use contextual provider in scheduler
* init logger in the package
* use context for log context
* use context in state manager
2022-10-26 19:16:02 -04:00
George Robinson
802d67eeca
Alerting: Support values in notification templates (#56457)
We have received a lot of feedback regarding the ValueString in alert notifications. Perhaps one of the most frequent complaints about ValueString is that it is difficult to read because it contains a lot of information, and the information is shown as a JSON-like string. Users have often asked how it can be templated and the answer is that it can't.

Until now users have been able to add custom annotations to their alert rules which contains values via the $values variable added in previous versions of Grafana. However, these custom annotations must be added for each of the user's alert rule, instead of once in a template that all of their alerts can be notified via.

This commit adds then the much requested feature to support values in notification templates. Users can then create a single template that prints the annotations, labels and values of their alerts in a format of their choice!
2022-10-10 13:40:21 +01:00
Alexander Weaver
d66ed6fe35
Alerting: Move stray model structs in store package to model package (#55968)
* Move stray command structs to model package like the rest

* Fix broken references
2022-09-29 15:47:56 -05:00
Alexander Weaver
d17ab82b98
Alerting: Break up store.RuleStore interface, delete dead code (#55776)
* Refactor state manager to not depend on rule store interface

* Refactor grafana and proxied ruler APIs to not depend on store.RuleStore

* Refactor folder subscription logic to not use store.RuleStore

* Delete dead code

* Delete store.RuleStore
2022-09-27 08:56:30 -05:00
Yuriy Tseretyan
2d38664fe6
Alerting: Improve validation of query and expressions on rule submit (#53258)
* Improve error messages of server-side expression 
* move validation of alert queries and a condition to eval package
2022-09-21 15:14:11 -04:00
Yuriy Tseretyan
199996cbf9
Alerting: Resolve stale state + add state reason to notifications (#49352)
* adds a new reserved annotation `grafana_state_reason`
* explicitly resolve stale states
2022-09-21 13:24:47 -04:00
Timur Olzhabayev
b5b41988cf
Docs: Deprecating packages_api and removing it from our pipelines (#54473) 2022-09-01 18:15:44 +02:00
Yuriy Tseretyan
76ea0b15ae
Alerting: Scheduler to fetch folders along with rules (#52842)
* Update GetAlertRulesForScheduling to query for folders (if needed)
* Update scheduler's alertRulesRegistry to cache folder titles along with rules
* Update rule eval loop to take folder title from the
* Extract interface RuleStore 
* Pre-fetch the rule keys with the version to detect changes, and query the full table only if there are changes.
2022-08-31 11:08:19 -04:00
Yuriy Tseretyan
9f90a7b54d
Alerting: State manager to use InstanceStore (#53852)
* move saving the state to state manager when scheduler stops
* move saving state to ProcessEvalResults

* add GetRuleKey to State
* add LogContext to AlertRuleKey
2022-08-18 09:40:33 -04:00
Alexander Weaver
f093c249ac
Alerting: Fix incorrect embedded DTO being returned when handling rule groups (#53701)
* Fix DTO embedding when getting/putting alert rule groups

* Drop usage of word 'Domain'

* Rename var as well
2022-08-12 16:36:50 -05:00
Jean-Philippe Quéméner
54217a2037
Alerting: set dashboard and panel id using annotations in provisioning api (#53221) 2022-08-03 16:05:32 +02:00
Yuriy Tseretyan
5fb778814c
Alerting: Update rules version when folder title is updated (#53013)
* remove support for bus from scheduler
* rename event to FolderTitleUpdated and fire only if title has changed
* add method to increase version of all rules that belong to a folder
* update ngalert service to subscribe to folder title change event call data store and update scheduler
* add tests
2022-08-01 19:28:38 -04:00
Yuriy Tseretyan
a081764fd8
Alerting: Scheduler to use AlertRule (#52354)
* update GetAlertRulesForSchedulingQuery to have result AlertRule
* update fetcher utils and registry to support AlertRule
* alertRuleInfo to use alert rule instead of version
* update updateCh hanlder of ruleRoutine to just clean up the state. The updated rule will be provided at the next evaluation
* update evalCh handler of ruleRoutine to use rule from the message and clear state as well as update extra labels

* remove unused function in ruleRoutine
* remove unused model SchedulableAlertRule

* store rule version in ruleRoutine instead of rule
* do not call the sender if nothing to send
2022-07-26 09:40:06 -04:00
Yuriy Tseretyan
6e1e4a4215
Alerting: Update DbStore to use disabled orgs from the config (#52156)
* update DbStore to use UnifiedAlerting settings
* remove disabled orgs from scheduler and use config in db store instead
* remove test
2022-07-15 14:13:30 -04:00
Alexander Weaver
2d7389c34d
Alerting: Provisioning API respects global rule quota (#52180)
* Inject interface for quota service and create mock

* Check quota and return 403 if limit exceeded

* Implement tests for quota being exceeded
2022-07-13 17:36:17 -05:00
Yuriy Tseretyan
554ebd647b
Alerting: Refactor Evaluator (#51673)
* AlertRule to return condition
* update ConditionEval to not return an error because it's always nil
* make getExprRequest private
* refactor executeCondition to just converter and move execution to the ConditionEval as this makes code more readable.
* log error if results have errors
* change signature of evaluate function to not return an error
2022-07-12 16:51:32 -04:00
George Robinson
6844ac9879
Alerting: Change __alertScreenshotToken__ to __alertImageToken__ (#50771) 2022-07-04 06:05:36 -04:00
Yuriy Tseretyan
8b3b667a47
Alerting: Fix rule API to accept 0 duration of field For (#50992)
* make 'for' pointer to distinguish between missing field and 0
* set 'for' to -1 if the value is missing but not allow negative in the request + path -1 with the value from original rule
* update store validation to not allow negative 'for'
* update usages to use pointer
2022-06-30 11:46:26 -04:00
Yuriy Tseretyan
4d02f73e5f
Alerting: Persist rule position in the group (#50051)
Migrations:
* add a new column alert_group_idx to alert_rule table
* add a new column alert_group_idx to alert_rule_version table
* re-index existing rules during migration

API:
* set group index on update. Use the natural order of items in  the array as group index
* sort rules in the group on GET
* update the version of all rules of all affected groups. This will make optimistic lock work in the case of multiple concurrent request touching the same groups.

UI:
* update UI to keep the order of alerts in a group
2022-06-22 10:52:46 -04:00
Matthew Jacobson
5dee2ed24c
Alerting: Add first Grafana reserved label grafana_folder (#50262)
* Alerting: Add first Grafana reserved label g_label

g_label holds the title of the folder container the alert. The intention of this label
is to use it as part of the new default notification policy groupBy.

* Add nil check on updateRule labels map

* Disable gocyclo lint on schedule.ruleRoutine

will remove later in a separate refactoring PR to reduce complexity.

* Address doc suggestions

* Update g_folder for rules in folder when folder title changes

* Remove global bus in FolderService

* Modify tests to fit new common g_folder label

* Add changelog entry

* Fix merge conflicts

* Switch GrafanaReservedLabelPrefix from `g_` to `grafana_`
2022-06-17 13:10:49 -04:00
Yuriy Tseretyan
c314ce48c7
Alerting: Support for optimistic locking for alert rules (#50274)
* add support for optimistic locking for alert_rule table
* return 409 in the case of opitimistic lock
2022-06-13 12:15:28 -04:00
Jean-Philippe Quéméner
862f51216b
Alerting: improve provisioning docs (#50347)
* Alerting: improve provisioning docs

* add new provisioning page

* add api docs

* fix formatting and add better descriptions

* fix typo
2022-06-10 16:25:15 +02:00
Jean-Philippe Quéméner
cf684ed38f
Alerting: bump rule version when updating rule group interval (#50295)
* Alerting: move group update to alert rule service

* rename validateAlertRuleInterval to validateRuleGroupInterval

* init baseinterval correctly

* add seconds suffix

* extract validation function for reusability

* add context to err message
2022-06-09 09:28:32 +02:00
Yuriy Tseretyan
a89d4a5be7
Alerting: Scheduler to drop ticks if a rule's evaluation is too slow (#48885)
* drop ticks if evaluation of a rule is too slow.
* add metric schedule_rule_evaluations_missed_total
2022-06-08 12:50:44 -04:00
Yuriy Tseretyan
49d93fb67e
Alerting: Update alert rule diff to not see difference between nil and empty map (#50192) 2022-06-03 21:27:29 +02:00
Yuriy Tseretyan
ad25e2a20c
Alerting: Update RBAC for alert rules to consider access to rule as access to group it belongs (#49033)
* update authz to exclude entire group if user does not have access to rule
* change rule update authz to not return changes because if user does not have access to any rule in group, they do not have access to the rule
* a new query that returns alerts in group by UID of alert that belongs to that group
* collect all affected groups during calculate changes
* update authorize to check access to groups
* update tests for calculateChanges to assert new fields
* add authorization tests
2022-06-01 10:23:54 -04:00
Joe Blubaugh
1d724810de
Alerting: State Manager takes screenshots. (#49338)
The State Manager will now take screenshots when an alert instance
switches to an Alerting or Resolved state.

Signed-off-by: Joe Blubaugh joe.blubaugh@grafana.com
2022-05-23 10:53:41 +08:00
George Robinson
43358c7248
Alerting: Keep private annotations across evaluations (#49080) 2022-05-18 11:21:18 +02:00
Yuriy Tseretyan
952cb4fc0b
Alerting: introduce AlertRuleGroupKey and use it in API handlers (#48945)
* create AlertGroupKey structure
* update PrometheusSrv.
  - extract creation of RuleGroup to a separate method. Use group key for grouping
* update RuleSrv 
 - update calculateChanges to use groupKey
 - authorize to use groupkey
2022-05-16 15:45:45 -04:00
Yuriy Tseretyan
369fcc5e9a
Alerting: scheduler to use short version of model for alert rule (#48916)
* scheduler to use a short version of alert rule model
2022-05-12 09:55:05 -04:00
George Robinson
c5547123bc
Remove redundant queries in GetAlertRules and GetOrgAlertRules and replace with ListAlertRules (#48108) 2022-04-25 11:42:42 +01:00
George Robinson
d66fc6ed1a
Alerting: Add GetRuleGroups to RuleStore (#48036)
This commit adds a new method GetRuleGroups to RuleStore which returns the set of rule groups across all organizations.
2022-04-21 17:59:22 +01:00
Yuriy Tseretyan
4ee48c2e77
Alerting: Update GetRuleGroupAlertRules to accept optional rule group (#46889)
* rename GetRuleGroupAlertRules to GetAlertRules
* make rule group optional in GetAlertRulesQuery
* simplify FakeStore. the current structure did not support optional rule group
2022-03-23 17:36:25 +00:00
gotjosh
a338c78ca8
Alerting: Remove internal labels from prometheus compatible API responses (#46548)
* Alerting: Remove internal labels from prometheus compatible API responses

* Appease the linter

* Fix integration tests

* Fix API documentation & linter

* move removal of internal labels to the models
2022-03-16 16:04:19 +00:00
Yuriy Tseretyan
4502e40ed8
Alerting: Revert Revert "Alerting: Calculate diff for two AlertRules" (#46034)
* Revert "Revert "Alerting: Calculate diff for two AlertRules (#45877)" (#46023)"

This reverts commit 82aa5acba6.

* remove flakiness
2022-03-01 11:10:29 -05:00
Jean-Philippe Quéméner
82aa5acba6
Revert "Alerting: Calculate diff for two AlertRules (#45877)" (#46023)
This reverts commit 4e19d7df63.
2022-03-01 13:40:47 +01:00
Yuriy Tseretyan
4e19d7df63
Alerting: Calculate diff for two AlertRules (#45877)
* add custom diff reporter DiffReporter that reports only paths that have a difference
* create Diff method for AlertRule that returns DiffReport, which is an alias for []Diff

Tests:
* create copy method for AlertRule in testing
* create GenerateAlertQuery method in testing
2022-02-28 17:13:53 +01:00
Yuriy Tseretyan
f75bea481d
Alerting: validate rules and calculate changes in API controller (#45072)
* Update API controller
   - add validation of rules API model
   - add function to calculate changes between the submitted alerts and existing alerts
   - update RoutePostNameRulesConfig to validate input models, calculate changes and apply in a transaction

* Update DBStore
   - delete unused storage method. All the logic is moved upstream.
   - upsert to not modify fields of new by values from the existing alert
   - if rule has UID do not try to pull it from db. (it is done upstream)

* Add rule generator
2022-02-23 11:30:04 -05:00
Alexander Weaver
935059a376
Alerting: Create basic storage layer for provisioning (#44679)
* Simplistic store API for provenance lookups on arbitrary types

* Add a few notes in comments

* Improved type safety for provisioned objects

* Clean-up TODOs for future PRs

* Clean up provisioning model

* Clean up tests

* Restrict allowable types in interface

* Fix linter error

* Move AlertRule domain methods to same file as AlertRule definition

* Update pkg/services/ngalert/models/provisioning.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Complete interface rename

* Pass context through store API

* More idiomatic method names

* Better error description

* Improve code-docs

* Use ORM language instead of raw sql

* Add support for records in different orgs

* ResourceTypeID -> ResourceType since it's not an ID

Co-authored-by: George Robinson <george.robinson@grafana.com>
2022-02-04 13:23:19 -06:00
George Robinson
1b26d4d88e
Alerting: Create DatasourceError alert if evaluation returns error (#41869)
* Alerting: Create DatasourceError alert if evaluation returns error

* Alerting: Add docs for DatasourceError alert

* Alerting: Fix DatasourceError alert does not have dashboard_uid label

* Alerting: Add break when datasource_uid found

* Alerting: Update TestProcessEvalResults
2021-11-25 11:46:47 +01:00
Yuriy Tseretyan
5836def6c2
Alerting: declare constants for __dashboardUid__ and __panelId__ literals (#39976) 2021-10-07 17:30:06 -04:00