Commit Graph

98 Commits

Author SHA1 Message Date
Yuri Tseretyan
abb49d96b5
Alerting: update state manager to return StateTransition instead of State (#58867)
* improve test for stale states
* update state manager return StateTransition
* update scheduler to accept state transitions
2022-12-06 13:07:39 -05:00
Alexander Weaver
9977c7ea43
Alerting: Simplify scheduler configuration and remove dependency on Grafana-wide settings (#59735)
* Make scheduler not depend directly on grafana-wide settings

* Re-add missing interval
2022-12-02 16:02:07 -06:00
Yuri Tseretyan
bad4f28d0d
Alerting: update test TestAlertingTicker to not rely on clock (#58544)
* extract method processTick
* make processTick return scheduled rules
* move state manager tests to state manager
* update test
* move all tests into one file
* remove unused fields
2022-11-09 15:08:57 -05:00
Yuri Tseretyan
978f1119d7
Alerting: Run state manager as regular sub-service (#58246) 2022-11-04 17:06:47 -04:00
Ryan McKinley
e6a9fa1cf9
ServiceAccounts: enable service accounts after IsRealUser change (#58263)
* suppor service accounts

* add: IsServiceAccount to scheduleUser in scheduler

Co-authored-by: eleijonmarck <eric.leijonmarck@gmail.com>
2022-11-04 15:53:35 -04:00
Eric Leijonmarck
72d0c6b428
Auth: add IsServiceAccount to IsRealUser (#58015)
* add: IsServiceAccount to SignedInUser and IsRealUser

* fix: linting error

* refactor: add function IsServiceAccountUser()

By adding the function IsServiceAccountUser() we use it to identify for
ServiceAccounts in the HasUniqueID() since caching is built up on having
a uniqueID, see comment: https://github.com/grafana/grafana/pull/58015#discussion_r1011361880
2022-11-04 12:39:54 +00:00
Yuriy Tseretyan
e3a4bde622
Alerting: Condition evaluator with cached pipeline (#57479)
* create rule evaluator
* load header from the context
* init one factory
* update scheduler
2022-11-02 10:13:39 -04:00
Yuriy Tseretyan
0a4121cef8
Alerting: Contextual log provider for rule key (#57476)
* create contextual log context provider
* use contextual provider in scheduler
* init logger in the package
* use context for log context
* use context in state manager
2022-10-26 19:16:02 -04:00
Yuriy Tseretyan
f3c219a980
Alerting: update format of logs in scheduler (#57302)
* Change the severity level of the log messages
2022-10-20 13:43:48 -04:00
Alexander Weaver
3ddb28bad9
Find-and-replace 'err' logs to 'error' to match log search conventions (#57309) 2022-10-19 17:36:54 -04:00
Alexander Weaver
4eb8e4ff66
Alerting: Add traceability headers for alert queries (#57127)
* Define EvaluationContext

* Refactor ConditionEval to use new context struct

* Refactor QueriesAndExpressionsEval to use EvaluationContext

* Remove dead field from AlertExecCtx

* Refactor Validate to use EvaluationContext

* Get rid of privately used AlertExecCtx

* Move EvaluationContext to new file and add helper

* Add builder pattern and bind rule info to context

* Extract header logic and add rule UID header

* Fix missing call
2022-10-19 14:19:43 -05:00
Yuriy Tseretyan
ad2a1dd680
Alerting: Start ticker only when scheduler starts (#56339) 2022-10-05 09:35:02 -04:00
Alexander Weaver
a00879ae21
Alerting: Refactor store to not export its own interface for InstanceStore, delete dead dependency injection (#55772)
* Add consumer-side store interface to state manager

* Remove dead dependency

* Delete dead dependency in API struct

* Delete store-layer InstanceStore interface

* Move fake for state's InstanceStore interface to state package
2022-09-26 13:55:05 -05:00
Alexander Weaver
bd6a5c900f
Alerting: Extract ticker into shared package (#55703)
* Move ticker files to dedicated package with no changes

* Fix package naming and resolve naming conflicts

* Fix up all existing references to moved objects

* Remove all alerting-specific references from shared util

* Rename TickerMetrics to simply Metrics

* Rename base ticker type to T and rename NewTicker to simply New
2022-09-26 12:35:33 -05:00
Yuriy Tseretyan
0629d3922a
stop flushing state when Grafana stops (#55504) 2022-09-21 10:10:17 -04:00
Yuriy Tseretyan
896eeb65a9
Alerting: Fix alerting evaluation to use proper permissions (#55127)
* access control to log user name if it does not have permissions
* update ngalert Evaluator to accept user instead of creating a pseudo one
* update alerting eval (rule\query testing) API to provide the real user to the Evaluator
* update scheduler to create a pseudo user with proper permissions
2022-09-14 09:30:58 -04:00
Timur Olzhabayev
b5b41988cf
Docs: Deprecating packages_api and removing it from our pipelines (#54473) 2022-09-01 18:15:44 +02:00
Yuriy Tseretyan
76ea0b15ae
Alerting: Scheduler to fetch folders along with rules (#52842)
* Update GetAlertRulesForScheduling to query for folders (if needed)
* Update scheduler's alertRulesRegistry to cache folder titles along with rules
* Update rule eval loop to take folder title from the
* Extract interface RuleStore 
* Pre-fetch the rule keys with the version to detect changes, and query the full table only if there are changes.
2022-08-31 11:08:19 -04:00
Yuriy Tseretyan
03e746d9df
Alerting: Delete state from the database on reset (#53919)
* make ResetStatesByRuleUID return states
* delete rule states when reset
* rule eval routine to clean up the state only when rule is deleted
2022-08-25 14:12:22 -04:00
Yuriy Tseretyan
41bd36eb97
Alerting: Update rules delete endpoint to handle rules in group (#53790)
* update RouteDeleteAlertRules rules to update as a group
* remove expecter from scheduler mock to support variadic function
* create function to check for provisioning status + tests

Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
2022-08-24 15:33:33 -04:00
Yuriy Tseretyan
9f90a7b54d
Alerting: State manager to use InstanceStore (#53852)
* move saving the state to state manager when scheduler stops
* move saving state to ProcessEvalResults

* add GetRuleKey to State
* add LogContext to AlertRuleKey
2022-08-18 09:40:33 -04:00
idafurjes
a14621fff6
Chore: Add user service method SetUsingOrg and GetSignedInUserWithCacheCtx (#53343)
* Chore: Add user service method SetUsingOrg

* Chore: Add user service method GetSignedInUserWithCacheCtx

* Use method GetSignedInUserWithCacheCtx from user service

* Fix lint after rebase

* Fix lint

* Fix lint error

* roll back some changes

* Roll back changes in api and middleware

* Add xorm tags to SignedInUser ID fields
2022-08-11 13:28:55 +02:00
idafurjes
6afad51761
Move SignedInUser to user service and RoleType and Roles to org (#53445)
* Move SignedInUser to user service and RoleType and Roles to org

* Use go naming convention for roles

* Fix some imports and leftovers

* Fix ldap debug test

* Fix lint

* Fix lint 2

* Fix lint 3

* Fix type and not needed conversion

* Clean up messages in api tests

* Clean up api tests 2
2022-08-10 11:56:48 +02:00
Yuriy Tseretyan
5fb778814c
Alerting: Update rules version when folder title is updated (#53013)
* remove support for bus from scheduler
* rename event to FolderTitleUpdated and fire only if title has changed
* add method to increase version of all rules that belong to a folder
* update ngalert service to subscribe to folder title change event call data store and update scheduler
* add tests
2022-08-01 19:28:38 -04:00
Yuriy Tseretyan
a081764fd8
Alerting: Scheduler to use AlertRule (#52354)
* update GetAlertRulesForSchedulingQuery to have result AlertRule
* update fetcher utils and registry to support AlertRule
* alertRuleInfo to use alert rule instead of version
* update updateCh hanlder of ruleRoutine to just clean up the state. The updated rule will be provided at the next evaluation
* update evalCh handler of ruleRoutine to use rule from the message and clear state as well as update extra labels

* remove unused function in ruleRoutine
* remove unused model SchedulableAlertRule

* store rule version in ruleRoutine instead of rule
* do not call the sender if nothing to send
2022-07-26 09:40:06 -04:00
George Robinson
e7feff6d99
Alerting: Move debug log line to where alert rules are updated (#52318) 2022-07-18 11:27:06 -04:00
Yuriy Tseretyan
6e1e4a4215
Alerting: Update DbStore to use disabled orgs from the config (#52156)
* update DbStore to use UnifiedAlerting settings
* remove disabled orgs from scheduler and use config in db store instead
* remove test
2022-07-15 14:13:30 -04:00
Yuriy Tseretyan
a7509ba18b
Alerting: rule evaluation loop's update channel to provide version (#52170)
* handler for update message in rule evaluation routine ignores the message if its version greater or equal.
* replace messages to update the channel if it is not empty
2022-07-15 12:32:52 -04:00
Yuriy Tseretyan
e5e8747ee9
Alerting: Update state manager to accept reserved labels (#52189)
* add tests for cache getOrCreate
* update ProcessEvalResults to accept extra lables
* extract to getRuleExtraLabels
* move populating of constant rule labels to extra labels
2022-07-14 15:59:59 -04:00
Yuriy Tseretyan
429ed4b4ee
remove unused orgStore from scheduler (#52157) 2022-07-13 10:34:35 -04:00
Yuriy Tseretyan
0d4c503d3d
update Evaluator interface to accept context (#52151) 2022-07-13 10:21:11 -04:00
Yuriy Tseretyan
554ebd647b
Alerting: Refactor Evaluator (#51673)
* AlertRule to return condition
* update ConditionEval to not return an error because it's always nil
* make getExprRequest private
* refactor executeCondition to just converter and move execution to the ConditionEval as this makes code more readable.
* log error if results have errors
* change signature of evaluate function to not return an error
2022-07-12 16:51:32 -04:00
Yuriy Tseretyan
a6b1090879
Alerting: refactor scheduler and separate notification logic (#48144)
* Introduce AlertsRouter in the sender package, and move all fields and methods related to notifications out of the scheduler to this router.
* Introduce a new interface AlertsSender in the schedule package and replace calls of anonymous function `notify` inside the ruleRoutine to calling methods of that interface.
* Rename interface Scheduler in api package to ExternalAlertmanagerProvider, and replace scheduler with AlertRouter as struct that implements the interface.
2022-07-12 15:13:04 -04:00
Matthew Jacobson
28dd413c1d
Alerting: Add config disabled_labels to disable reserved labels (#51832)
* Alerting: Add config disabled_labels to disable reserved labels

[unified_alerting.reserved_labels]
disabled_labels

* Replace IsGrafanaFolderDisabled with more generic IsReservedLabelDisabled

* Simplify SchedulerCfg by including UnifiedAlertingSettings
2022-07-11 12:41:40 -04:00
Yuriy Tseretyan
94e709fdcb
Alerting: Simplify eval.Evaluator interface (#51463)
* remove ExpressionService from argument list of Evaluator's methods
2022-06-27 17:40:44 -04:00
Yuriy Tseretyan
157c12211d
Alerting: State manager to use tick time to determine stale states (#50991)
* use correct stale timestamp
* calculate stale using tick time instead of time.now

* remove unused dependency on sql store
2022-06-22 00:16:53 +02:00
Matthew Jacobson
5dee2ed24c
Alerting: Add first Grafana reserved label grafana_folder (#50262)
* Alerting: Add first Grafana reserved label g_label

g_label holds the title of the folder container the alert. The intention of this label
is to use it as part of the new default notification policy groupBy.

* Add nil check on updateRule labels map

* Disable gocyclo lint on schedule.ruleRoutine

will remove later in a separate refactoring PR to reduce complexity.

* Address doc suggestions

* Update g_folder for rules in folder when folder title changes

* Remove global bus in FolderService

* Modify tests to fit new common g_folder label

* Add changelog entry

* Fix merge conflicts

* Switch GrafanaReservedLabelPrefix from `g_` to `grafana_`
2022-06-17 13:10:49 -04:00
gotjosh
c59938b235
Alerting: Schedule Alert rules metric tracking (#50415)
* Alerting: Schedule Alert rules metric tracking

Change the record of metrics from one place to two as an attempt to have a semi-accurate record.
2022-06-08 18:37:33 +01:00
Yuriy Tseretyan
a89d4a5be7
Alerting: Scheduler to drop ticks if a rule's evaluation is too slow (#48885)
* drop ticks if evaluation of a rule is too slow.
* add metric schedule_rule_evaluations_missed_total
2022-06-08 12:50:44 -04:00
gotjosh
0cde283505
Alerting: Logs should not be capitalized and the errors key should be "err" (#50333)
* Alerting: decapitalize log lines and use "err" as the key for errors

Found using (logger|log).(Warn|Debug|Info|Error)\([A-Z] and (logger|log).(Warn|Debug|Info|Error)\(.+"error"
2022-06-07 19:54:23 +02:00
George Robinson
c83f84348c
Alerting: Fix database unavailable removes rules from scheduler (#49874) 2022-06-07 16:20:06 +01:00
Yuriy Tseretyan
c8d891785d
Alerting: Ticker to support stopping (#48142)
* add stop for ticker
* stop ticker when scheduler stops
* stop ticker when legacy engine stops
2022-06-01 17:48:10 +02:00
Joe Blubaugh
1cc034d960
Alerting: Add a "Reason" to Alert Instances to show underlying cause of state. (#49259)
This change adds a field to state.State and models.AlertInstance
that indicate the "Reason" that an instance has its current state. This
helps us account for cases where the state is "Normal" but the
underlying evaluation returned "NoData" or "Error", for example.

Fixes #42606

Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
2022-05-23 16:49:49 +08:00
Yuriy Tseretyan
99156b40bd
Alerting: Move alertRuleRegistry to its own file (#48890)
* move alertRuleRegistry to its own file
* move tests to separate file
2022-05-11 10:04:50 -04:00
Yuriy Tseretyan
dc33e09b24
simplify getting a slice of keys (#48889) 2022-05-11 09:44:31 -04:00
Yuriy Tseretyan
75ba4e98c6
Alerting: Remove unused features from ticker + metric + tests (#47828)
* remove not used code:
  - remove offset in ticket because it is not used
  - remove unused ticker and scheduler methods

* use duration for interval
* add metrics grafana_alerting_ticker_last_consumed_tick_timestamp_seconds, grafana_alerting_ticker_next_tick_timestamp_seconds, grafana_alerting_ticker_interval_seconds
2022-04-22 15:09:47 -04:00
Yuriy Tseretyan
e94d0c1b96
Alerting: update rule test endpoints to respect data source permissions (#47169)
* make eval.Evaluator an interface
* inject Evaluator to TestingApiSrv
* move conditionEval to RouteTestGrafanaRuleConfig because it is the only place where it is used
* update rule test api to check data source permissions
2022-04-02 02:00:23 +02:00
Yuriy Tseretyan
e20d157a9b
Alerting: rules delete API to check data source authorization (#46906)
* merge RuleSrv rule delete methods
* remove unused store methods
* implement delete by uid for fake store
* add scheduler mock
* implement tests for RouteDeleteAlertRules
2022-03-25 12:39:24 -04:00
George Robinson
f87bfdf2ff
Update comment for scheduler_behind_seconds metric (#45918) 2022-02-25 15:43:08 +01:00
George Robinson
6cccbb5a09
Fix incorrect metric values for scheduler_behind_seconds (#45830) 2022-02-25 11:40:30 +00:00