Commit Graph

118 Commits

Author SHA1 Message Date
Yuriy Tseretyan
76ea0b15ae Alerting: Scheduler to fetch folders along with rules (#52842)
* Update GetAlertRulesForScheduling to query for folders (if needed)
* Update scheduler's alertRulesRegistry to cache folder titles along with rules
* Update rule eval loop to take folder title from the
* Extract interface RuleStore 
* Pre-fetch the rule keys with the version to detect changes, and query the full table only if there are changes.
2022-08-31 11:08:19 -04:00
Marcus Efraimsson
87afd9cadc Plugins: Remove various custom headers logic (#54146)
Removes various custom headers logic sprinkled around in the backend. 
It should automatically be applied to outgoing HTTP requests via the 
CustomHeadersMiddleware.
This also removes decryption of SecureJSONData to populate custom 
headers in ngalert which seemed to have caused a ton of CPU usage.
2022-08-26 11:56:10 +02:00
Yuriy Tseretyan
03e746d9df Alerting: Delete state from the database on reset (#53919)
* make ResetStatesByRuleUID return states
* delete rule states when reset
* rule eval routine to clean up the state only when rule is deleted
2022-08-25 14:12:22 -04:00
Yuriy Tseretyan
41bd36eb97 Alerting: Update rules delete endpoint to handle rules in group (#53790)
* update RouteDeleteAlertRules rules to update as a group
* remove expecter from scheduler mock to support variadic function
* create function to check for provisioning status + tests

Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
2022-08-24 15:33:33 -04:00
Yuriy Tseretyan
9f90a7b54d Alerting: State manager to use InstanceStore (#53852)
* move saving the state to state manager when scheduler stops
* move saving state to ProcessEvalResults

* add GetRuleKey to State
* add LogContext to AlertRuleKey
2022-08-18 09:40:33 -04:00
idafurjes
a14621fff6 Chore: Add user service method SetUsingOrg and GetSignedInUserWithCacheCtx (#53343)
* Chore: Add user service method SetUsingOrg

* Chore: Add user service method GetSignedInUserWithCacheCtx

* Use method GetSignedInUserWithCacheCtx from user service

* Fix lint after rebase

* Fix lint

* Fix lint error

* roll back some changes

* Roll back changes in api and middleware

* Add xorm tags to SignedInUser ID fields
2022-08-11 13:28:55 +02:00
idafurjes
6afad51761 Move SignedInUser to user service and RoleType and Roles to org (#53445)
* Move SignedInUser to user service and RoleType and Roles to org

* Use go naming convention for roles

* Fix some imports and leftovers

* Fix ldap debug test

* Fix lint

* Fix lint 2

* Fix lint 3

* Fix type and not needed conversion

* Clean up messages in api tests

* Clean up api tests 2
2022-08-10 11:56:48 +02:00
Yuriy Tseretyan
5fb778814c Alerting: Update rules version when folder title is updated (#53013)
* remove support for bus from scheduler
* rename event to FolderTitleUpdated and fire only if title has changed
* add method to increase version of all rules that belong to a folder
* update ngalert service to subscribe to folder title change event call data store and update scheduler
* add tests
2022-08-01 19:28:38 -04:00
Yuriy Tseretyan
a081764fd8 Alerting: Scheduler to use AlertRule (#52354)
* update GetAlertRulesForSchedulingQuery to have result AlertRule
* update fetcher utils and registry to support AlertRule
* alertRuleInfo to use alert rule instead of version
* update updateCh hanlder of ruleRoutine to just clean up the state. The updated rule will be provided at the next evaluation
* update evalCh handler of ruleRoutine to use rule from the message and clear state as well as update extra labels

* remove unused function in ruleRoutine
* remove unused model SchedulableAlertRule

* store rule version in ruleRoutine instead of rule
* do not call the sender if nothing to send
2022-07-26 09:40:06 -04:00
Yuriy Tseretyan
054fe54b03 Alerting: Split Scheduler and AlertRouter tests (#52416)
* move fake FakeExternalAlertmanager to sender package
* move tests from scheduler to router
* update alerts router to have all fields private
* update scheduler tests to use sender mock
2022-07-19 09:32:54 -04:00
George Robinson
e7feff6d99 Alerting: Move debug log line to where alert rules are updated (#52318) 2022-07-18 11:27:06 -04:00
Yuriy Tseretyan
6e1e4a4215 Alerting: Update DbStore to use disabled orgs from the config (#52156)
* update DbStore to use UnifiedAlerting settings
* remove disabled orgs from scheduler and use config in db store instead
* remove test
2022-07-15 14:13:30 -04:00
Yuriy Tseretyan
a7509ba18b Alerting: rule evaluation loop's update channel to provide version (#52170)
* handler for update message in rule evaluation routine ignores the message if its version greater or equal.
* replace messages to update the channel if it is not empty
2022-07-15 12:32:52 -04:00
Yuriy Tseretyan
e5e8747ee9 Alerting: Update state manager to accept reserved labels (#52189)
* add tests for cache getOrCreate
* update ProcessEvalResults to accept extra lables
* extract to getRuleExtraLabels
* move populating of constant rule labels to extra labels
2022-07-14 15:59:59 -04:00
Yuriy Tseretyan
429ed4b4ee remove unused orgStore from scheduler (#52157) 2022-07-13 10:34:35 -04:00
Yuriy Tseretyan
0d4c503d3d update Evaluator interface to accept context (#52151) 2022-07-13 10:21:11 -04:00
Yuriy Tseretyan
554ebd647b Alerting: Refactor Evaluator (#51673)
* AlertRule to return condition
* update ConditionEval to not return an error because it's always nil
* make getExprRequest private
* refactor executeCondition to just converter and move execution to the ConditionEval as this makes code more readable.
* log error if results have errors
* change signature of evaluate function to not return an error
2022-07-12 16:51:32 -04:00
Yuriy Tseretyan
a6b1090879 Alerting: refactor scheduler and separate notification logic (#48144)
* Introduce AlertsRouter in the sender package, and move all fields and methods related to notifications out of the scheduler to this router.
* Introduce a new interface AlertsSender in the schedule package and replace calls of anonymous function `notify` inside the ruleRoutine to calling methods of that interface.
* Rename interface Scheduler in api package to ExternalAlertmanagerProvider, and replace scheduler with AlertRouter as struct that implements the interface.
2022-07-12 15:13:04 -04:00
Matthew Jacobson
28dd413c1d Alerting: Add config disabled_labels to disable reserved labels (#51832)
* Alerting: Add config disabled_labels to disable reserved labels

[unified_alerting.reserved_labels]
disabled_labels

* Replace IsGrafanaFolderDisabled with more generic IsReservedLabelDisabled

* Simplify SchedulerCfg by including UnifiedAlertingSettings
2022-07-11 12:41:40 -04:00
George Robinson
6844ac9879 Alerting: Change __alertScreenshotToken__ to __alertImageToken__ (#50771) 2022-07-04 06:05:36 -04:00
Yuriy Tseretyan
94e709fdcb Alerting: Simplify eval.Evaluator interface (#51463)
* remove ExpressionService from argument list of Evaluator's methods
2022-06-27 17:40:44 -04:00
Yuriy Tseretyan
4b42cd3c1d Alerting: State manager to use clock (#51219)
* manager to use clock, to be able to mock real time
2022-06-22 12:18:42 -04:00
Yuriy Tseretyan
157c12211d Alerting: State manager to use tick time to determine stale states (#50991)
* use correct stale timestamp
* calculate stale using tick time instead of time.now

* remove unused dependency on sql store
2022-06-22 00:16:53 +02:00
Matthew Jacobson
5dee2ed24c Alerting: Add first Grafana reserved label grafana_folder (#50262)
* Alerting: Add first Grafana reserved label g_label

g_label holds the title of the folder container the alert. The intention of this label
is to use it as part of the new default notification policy groupBy.

* Add nil check on updateRule labels map

* Disable gocyclo lint on schedule.ruleRoutine

will remove later in a separate refactoring PR to reduce complexity.

* Address doc suggestions

* Update g_folder for rules in folder when folder title changes

* Remove global bus in FolderService

* Modify tests to fit new common g_folder label

* Add changelog entry

* Fix merge conflicts

* Switch GrafanaReservedLabelPrefix from `g_` to `grafana_`
2022-06-17 13:10:49 -04:00
Ben Kochie
68691d7775 Convert some metrics to Histograms (#50420)
Because Summary metrics can not be aggreated, convert them to histograms
so that users with HA deployments can use these metrics.
* Convert metrics registration to promauto.
* Improve help text style.

Signed-off-by: SuperQ <superq@gmail.com>
2022-06-15 13:19:43 +02:00
gotjosh
c59938b235 Alerting: Schedule Alert rules metric tracking (#50415)
* Alerting: Schedule Alert rules metric tracking

Change the record of metrics from one place to two as an attempt to have a semi-accurate record.
2022-06-08 18:37:33 +01:00
Yuriy Tseretyan
a89d4a5be7 Alerting: Scheduler to drop ticks if a rule's evaluation is too slow (#48885)
* drop ticks if evaluation of a rule is too slow.
* add metric schedule_rule_evaluations_missed_total
2022-06-08 12:50:44 -04:00
gotjosh
0cde283505 Alerting: Logs should not be capitalized and the errors key should be "err" (#50333)
* Alerting: decapitalize log lines and use "err" as the key for errors

Found using (logger|log).(Warn|Debug|Info|Error)\([A-Z] and (logger|log).(Warn|Debug|Info|Error)\(.+"error"
2022-06-07 19:54:23 +02:00
George Robinson
c83f84348c Alerting: Fix database unavailable removes rules from scheduler (#49874) 2022-06-07 16:20:06 +01:00
Yuriy Tseretyan
c8d891785d Alerting: Ticker to support stopping (#48142)
* add stop for ticker
* stop ticker when scheduler stops
* stop ticker when legacy engine stops
2022-06-01 17:48:10 +02:00
Joe Blubaugh
9e8efaa459 Alerting: Add stored screenshot utilities to the channels package. (#49470)
Adds three functions:
`withStoredImages` iterates over a list of models.Alerts, extracting a stored image's data from storage, if available, and executing a user-provided function.
`withStoredImage` does this for an image attached to a specific alert.
`openImage` finds and opens an image file on disk.

Moves `store.Image` to `models.Image`
Simplifies `channels.ImageStore` interface and updates notifiers that use it to use the simpler methods.
Updates all pkg/alert/notifier/channels to use withStoredImage routines.
2022-05-26 13:29:56 +08:00
Joe Blubaugh
1cc034d960 Alerting: Add a "Reason" to Alert Instances to show underlying cause of state. (#49259)
This change adds a field to state.State and models.AlertInstance
that indicate the "Reason" that an instance has its current state. This
helps us account for cases where the state is "Normal" but the
underlying evaluation returned "NoData" or "Error", for example.

Fixes #42606

Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
2022-05-23 16:49:49 +08:00
Joe Blubaugh
1d724810de Alerting: State Manager takes screenshots. (#49338)
The State Manager will now take screenshots when an alert instance
switches to an Alerting or Resolved state.

Signed-off-by: Joe Blubaugh joe.blubaugh@grafana.com
2022-05-23 10:53:41 +08:00
Joe Blubaugh
687e79538b Alerting: Add a general screenshot service and alerting-specific image service. (#49293)
This commit adds a pkg/services/screenshot package for taking and uploading screenshots of Grafana dashboards. It supports taking screenshots of both dashboards and individual panels within a dashboard, using the rendering service.

The screenshot package has the following services, most of which can be composed:

BrowserScreenshotService (Takes screenshots with headless Chrome)
CachableScreenshotService (Caches screenshots taken with another service such as BrowserScreenshotService)
NoopScreenshotService (A no-op screenshot service for tests)
SingleFlightScreenshotService (Prevents duplicate screenshots when taking screenshots of the same dashboard or panel in parallel)
ScreenshotUnavailableService (A screenshot service that returns ErrScreenshotsUnavailable)
UploadingScreenshotService (A screenshot service that uploads taken screenshots)

The screenshot package does not support wire dependency injection yet. ngalert constructs its own version of the service. See https://github.com/grafana/grafana/issues/49296

This PR also adds an ImageScreenshotService to ngAlert. This is used to take screenshots with a screenshotservice and then store their location reference for use by alert instances and notifiers.
2022-05-22 22:33:49 +08:00
Kristin Laemmert
1df340ff28 backend/services: Move GetDashboard from sqlstore to dashboard service (#48971)
* rename folder to match package name
* backend/sqlstore: move GetDashboard into DashboardService

This is a stepping-stone commit which copies the GetDashboard function - which lets us remove the sqlstore from the interfaces in dashboards - without changing any other callers.
* checkpoint: moving GetDashboard calls into dashboard service
* finish refactoring api tests for dashboardService.GetDashboard
2022-05-17 14:52:22 -04:00
Yuriy Tseretyan
369fcc5e9a Alerting: scheduler to use short version of model for alert rule (#48916)
* scheduler to use a short version of alert rule model
2022-05-12 09:55:05 -04:00
Yuriy Tseretyan
99156b40bd Alerting: Move alertRuleRegistry to its own file (#48890)
* move alertRuleRegistry to its own file
* move tests to separate file
2022-05-11 10:04:50 -04:00
Yuriy Tseretyan
dc33e09b24 simplify getting a slice of keys (#48889) 2022-05-11 09:44:31 -04:00
Yuriy Tseretyan
75ba4e98c6 Alerting: Remove unused features from ticker + metric + tests (#47828)
* remove not used code:
  - remove offset in ticket because it is not used
  - remove unused ticker and scheduler methods

* use duration for interval
* add metrics grafana_alerting_ticker_last_consumed_tick_timestamp_seconds, grafana_alerting_ticker_next_tick_timestamp_seconds, grafana_alerting_ticker_interval_seconds
2022-04-22 15:09:47 -04:00
Alexander Weaver
8310789ef1 Indicate whether routes are provisioned when GETting Alertmanager configuration (#47857)
* Test composition simplification from last PR

* Policies use proper API model everywhere

* Expose policy provenance in API, miss some dep injection

* Complete injection

* fix args

* Tests for provenance value

* Extract test helpers so tests are very readable

* Single source adapter struct that was copied in 3 places

* Drop redundant test

* Resolve merge conflicts on changelog
2022-04-22 11:57:56 -05:00
Joe Blubaugh
3d91047e6e Alerting: Notification URL points to alert view page instead of alert edit page (#47752)
Before this change, notifications generated by the Grafana Alertmanager
pointed to '/alerting/:ruleID/edit'. This change instead points them to
the view path '/alerting/grafana/:ruleID/view'. The view page has a
better UX, including timeseries display. It's also where many alert
state improvements will land in the next few versions of Grafana.

Fixes #45301

Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
2022-04-20 21:43:55 +08:00
Alexander Weaver
758364e78b Alerting: Refactor GET/POST alerting config routes to be more extensible (#47229)
* Refactor GET am config to be extensible

* Extract post config route

* Fix tests

* Remove temporary duplication

* Fix broken test due to layer shift

* Fix duplicated error message

* Properly return 400 on config rejection

* Revert weird half method extraction

* Move things to notifier package and avoid redundant interface

* Simplify documentation

* Split encryption service and depend on minimal abstractions

* Properly initialize things all the way up to the composition root

* Encryption -> Crypto

* Address misc feedback

* Missing docstring

* Few more simple polish improvements

* Unify on MultiOrgAlertmanager. Discover bug in existing test

* Fix rebase conflicts

* Misc feedback, renames, docs

* Access crypto hanging off MultiOrgAlertmanager rather than having a separate API to initialize
2022-04-14 13:06:21 -05:00
Yuriy Tseretyan
e94d0c1b96 Alerting: update rule test endpoints to respect data source permissions (#47169)
* make eval.Evaluator an interface
* inject Evaluator to TestingApiSrv
* move conditionEval to RouteTestGrafanaRuleConfig because it is the only place where it is used
* update rule test api to check data source permissions
2022-04-02 02:00:23 +02:00
Yuriy Tseretyan
e20d157a9b Alerting: rules delete API to check data source authorization (#46906)
* merge RuleSrv rule delete methods
* remove unused store methods
* implement delete by uid for fake store
* add scheduler mock
* implement tests for RouteDeleteAlertRules
2022-03-25 12:39:24 -04:00
Yuriy Tseretyan
6610adf090 Alerting: remove UpdateRuleGroup from fake rule store (#46941)
* remove UpdateRuleGroup from fake rule store because It is not part of interface anymore
2022-03-24 19:29:19 -04:00
Yuriy Tseretyan
60d4cd80bf Alerting: update DeleteAlertRuleByUID to accept many UID (#46890) 2022-03-23 16:09:53 -04:00
Yuriy Tseretyan
4ee48c2e77 Alerting: Update GetRuleGroupAlertRules to accept optional rule group (#46889)
* rename GetRuleGroupAlertRules to GetAlertRules
* make rule group optional in GetAlertRulesQuery
* simplify FakeStore. the current structure did not support optional rule group
2022-03-23 17:36:25 +00:00
George Robinson
f87bfdf2ff Update comment for scheduler_behind_seconds metric (#45918) 2022-02-25 15:43:08 +01:00
George Robinson
6cccbb5a09 Fix incorrect metric values for scheduler_behind_seconds (#45830) 2022-02-25 11:40:30 +00:00
George Robinson
2ca79ca0c7 Rename evalCtx to avoid confusion with context.Context (#45144) 2022-02-25 11:09:20 +01:00