* access control to log user name if it does not have permissions
* update ngalert Evaluator to accept user instead of creating a pseudo one
* update alerting eval (rule\query testing) API to provide the real user to the Evaluator
* update scheduler to create a pseudo user with proper permissions
* WIP
* Set public_suffix to a pre Ruby 2.6 version
* we don't need to install python
* Stretch->Buster
* Bump versions in lib.star
* Manually update linter
Sort of messy, but the .mod-file need to contain all dependencies that
use 1.16+ features, otherwise they're assumed to be compiled with
-lang=go1.16 and cannot access generics et al.
Bingo doesn't seem to understand that, but it's possible to manually
update things to get Bingo happy.
* undo reformatting
* Various lint improvements
* More from the linter
* goimports -w ./pkg/
* Disable gocritic
* Add/modify linter exceptions
* lint + flatten nested list
Go 1.19 doesn't support nested lists, and there wasn't an obvious workaround.
https://go.dev/doc/comment#lists
Prior to this change, all alert instance writes and deletes happened
individually, in their own database transaction. This change batches up
writes or deletes for a given rule's evaluation loop into a single
transaction before applying it.
Before:
```
goos: darwin
goarch: arm64
pkg: github.com/grafana/grafana/pkg/services/ngalert/store
BenchmarkAlertInstanceOperations-8 398 2991381 ns/op 1133537 B/op 27703 allocs/op
--- BENCH: BenchmarkAlertInstanceOperations-8
util.go:127: alert definition: {orgID: 1, UID: FovKXiRVzm} with title: "an alert definition FTvFXmRVkz" interval: 60 created
util.go:127: alert definition: {orgID: 1, UID: foDFXmRVkm} with title: "an alert definition fovFXmRVkz" interval: 60 created
util.go:127: alert definition: {orgID: 1, UID: VQvFuigVkm} with title: "an alert definition VwDKXmR4kz" interval: 60 created
PASS
ok github.com/grafana/grafana/pkg/services/ngalert/store 1.619s
```
After:
```
goos: darwin
goarch: arm64
pkg: github.com/grafana/grafana/pkg/services/ngalert/store
BenchmarkAlertInstanceOperations-8 1440 816484 ns/op 352297 B/op 6529 allocs/op
--- BENCH: BenchmarkAlertInstanceOperations-8
util.go:127: alert definition: {orgID: 1, UID: 302r_igVzm} with title: "an alert definition q0h9lmR4zz" interval: 60 created
util.go:127: alert definition: {orgID: 1, UID: 71hrlmR4km} with title: "an alert definition nJ29_mR4zz" interval: 60 created
util.go:127: alert definition: {orgID: 1, UID: Cahr_mR4zm} with title: "an alert definition ja2rlmg4zz" interval: 60 created
PASS
ok github.com/grafana/grafana/pkg/services/ngalert/store 1.383s
```
So we cut time by about 75% and memory allocations by about 60% when
storing and deleting 100 instances.
This change also updates some of our tests so that they run successfully against postgreSQL - we were using random Int64s, but postgres integers, which our tables use, max out at 2^31-1
* Update GetAlertRulesForScheduling to query for folders (if needed)
* Update scheduler's alertRulesRegistry to cache folder titles along with rules
* Update rule eval loop to take folder title from the
* Extract interface RuleStore
* Pre-fetch the rule keys with the version to detect changes, and query the full table only if there are changes.
Removes various custom headers logic sprinkled around in the backend.
It should automatically be applied to outgoing HTTP requests via the
CustomHeadersMiddleware.
This also removes decryption of SecureJSONData to populate custom
headers in ngalert which seemed to have caused a ton of CPU usage.
* update RouteDeleteAlertRules rules to update as a group
* remove expecter from scheduler mock to support variadic function
* create function to check for provisioning status + tests
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
* move saving the state to state manager when scheduler stops
* move saving state to ProcessEvalResults
* add GetRuleKey to State
* add LogContext to AlertRuleKey
* Chore: Add user service method SetUsingOrg
* Chore: Add user service method GetSignedInUserWithCacheCtx
* Use method GetSignedInUserWithCacheCtx from user service
* Fix lint after rebase
* Fix lint
* Fix lint error
* roll back some changes
* Roll back changes in api and middleware
* Add xorm tags to SignedInUser ID fields
* Move SignedInUser to user service and RoleType and Roles to org
* Use go naming convention for roles
* Fix some imports and leftovers
* Fix ldap debug test
* Fix lint
* Fix lint 2
* Fix lint 3
* Fix type and not needed conversion
* Clean up messages in api tests
* Clean up api tests 2
* remove support for bus from scheduler
* rename event to FolderTitleUpdated and fire only if title has changed
* add method to increase version of all rules that belong to a folder
* update ngalert service to subscribe to folder title change event call data store and update scheduler
* add tests
* update GetAlertRulesForSchedulingQuery to have result AlertRule
* update fetcher utils and registry to support AlertRule
* alertRuleInfo to use alert rule instead of version
* update updateCh hanlder of ruleRoutine to just clean up the state. The updated rule will be provided at the next evaluation
* update evalCh handler of ruleRoutine to use rule from the message and clear state as well as update extra labels
* remove unused function in ruleRoutine
* remove unused model SchedulableAlertRule
* store rule version in ruleRoutine instead of rule
* do not call the sender if nothing to send
* move fake FakeExternalAlertmanager to sender package
* move tests from scheduler to router
* update alerts router to have all fields private
* update scheduler tests to use sender mock
* handler for update message in rule evaluation routine ignores the message if its version greater or equal.
* replace messages to update the channel if it is not empty
* add tests for cache getOrCreate
* update ProcessEvalResults to accept extra lables
* extract to getRuleExtraLabels
* move populating of constant rule labels to extra labels
* AlertRule to return condition
* update ConditionEval to not return an error because it's always nil
* make getExprRequest private
* refactor executeCondition to just converter and move execution to the ConditionEval as this makes code more readable.
* log error if results have errors
* change signature of evaluate function to not return an error
* Introduce AlertsRouter in the sender package, and move all fields and methods related to notifications out of the scheduler to this router.
* Introduce a new interface AlertsSender in the schedule package and replace calls of anonymous function `notify` inside the ruleRoutine to calling methods of that interface.
* Rename interface Scheduler in api package to ExternalAlertmanagerProvider, and replace scheduler with AlertRouter as struct that implements the interface.
* Alerting: Add config disabled_labels to disable reserved labels
[unified_alerting.reserved_labels]
disabled_labels
* Replace IsGrafanaFolderDisabled with more generic IsReservedLabelDisabled
* Simplify SchedulerCfg by including UnifiedAlertingSettings
* Alerting: Add first Grafana reserved label g_label
g_label holds the title of the folder container the alert. The intention of this label
is to use it as part of the new default notification policy groupBy.
* Add nil check on updateRule labels map
* Disable gocyclo lint on schedule.ruleRoutine
will remove later in a separate refactoring PR to reduce complexity.
* Address doc suggestions
* Update g_folder for rules in folder when folder title changes
* Remove global bus in FolderService
* Modify tests to fit new common g_folder label
* Add changelog entry
* Fix merge conflicts
* Switch GrafanaReservedLabelPrefix from `g_` to `grafana_`
Because Summary metrics can not be aggreated, convert them to histograms
so that users with HA deployments can use these metrics.
* Convert metrics registration to promauto.
* Improve help text style.
Signed-off-by: SuperQ <superq@gmail.com>
* Alerting: decapitalize log lines and use "err" as the key for errors
Found using (logger|log).(Warn|Debug|Info|Error)\([A-Z] and (logger|log).(Warn|Debug|Info|Error)\(.+"error"
Adds three functions:
`withStoredImages` iterates over a list of models.Alerts, extracting a stored image's data from storage, if available, and executing a user-provided function.
`withStoredImage` does this for an image attached to a specific alert.
`openImage` finds and opens an image file on disk.
Moves `store.Image` to `models.Image`
Simplifies `channels.ImageStore` interface and updates notifiers that use it to use the simpler methods.
Updates all pkg/alert/notifier/channels to use withStoredImage routines.
This change adds a field to state.State and models.AlertInstance
that indicate the "Reason" that an instance has its current state. This
helps us account for cases where the state is "Normal" but the
underlying evaluation returned "NoData" or "Error", for example.
Fixes#42606
Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
The State Manager will now take screenshots when an alert instance
switches to an Alerting or Resolved state.
Signed-off-by: Joe Blubaugh joe.blubaugh@grafana.com
This commit adds a pkg/services/screenshot package for taking and uploading screenshots of Grafana dashboards. It supports taking screenshots of both dashboards and individual panels within a dashboard, using the rendering service.
The screenshot package has the following services, most of which can be composed:
BrowserScreenshotService (Takes screenshots with headless Chrome)
CachableScreenshotService (Caches screenshots taken with another service such as BrowserScreenshotService)
NoopScreenshotService (A no-op screenshot service for tests)
SingleFlightScreenshotService (Prevents duplicate screenshots when taking screenshots of the same dashboard or panel in parallel)
ScreenshotUnavailableService (A screenshot service that returns ErrScreenshotsUnavailable)
UploadingScreenshotService (A screenshot service that uploads taken screenshots)
The screenshot package does not support wire dependency injection yet. ngalert constructs its own version of the service. See https://github.com/grafana/grafana/issues/49296
This PR also adds an ImageScreenshotService to ngAlert. This is used to take screenshots with a screenshotservice and then store their location reference for use by alert instances and notifiers.
* rename folder to match package name
* backend/sqlstore: move GetDashboard into DashboardService
This is a stepping-stone commit which copies the GetDashboard function - which lets us remove the sqlstore from the interfaces in dashboards - without changing any other callers.
* checkpoint: moving GetDashboard calls into dashboard service
* finish refactoring api tests for dashboardService.GetDashboard
* remove not used code:
- remove offset in ticket because it is not used
- remove unused ticker and scheduler methods
* use duration for interval
* add metrics grafana_alerting_ticker_last_consumed_tick_timestamp_seconds, grafana_alerting_ticker_next_tick_timestamp_seconds, grafana_alerting_ticker_interval_seconds
* Test composition simplification from last PR
* Policies use proper API model everywhere
* Expose policy provenance in API, miss some dep injection
* Complete injection
* fix args
* Tests for provenance value
* Extract test helpers so tests are very readable
* Single source adapter struct that was copied in 3 places
* Drop redundant test
* Resolve merge conflicts on changelog
Before this change, notifications generated by the Grafana Alertmanager
pointed to '/alerting/:ruleID/edit'. This change instead points them to
the view path '/alerting/grafana/:ruleID/view'. The view page has a
better UX, including timeseries display. It's also where many alert
state improvements will land in the next few versions of Grafana.
Fixes#45301
Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
* Refactor GET am config to be extensible
* Extract post config route
* Fix tests
* Remove temporary duplication
* Fix broken test due to layer shift
* Fix duplicated error message
* Properly return 400 on config rejection
* Revert weird half method extraction
* Move things to notifier package and avoid redundant interface
* Simplify documentation
* Split encryption service and depend on minimal abstractions
* Properly initialize things all the way up to the composition root
* Encryption -> Crypto
* Address misc feedback
* Missing docstring
* Few more simple polish improvements
* Unify on MultiOrgAlertmanager. Discover bug in existing test
* Fix rebase conflicts
* Misc feedback, renames, docs
* Access crypto hanging off MultiOrgAlertmanager rather than having a separate API to initialize
* make eval.Evaluator an interface
* inject Evaluator to TestingApiSrv
* move conditionEval to RouteTestGrafanaRuleConfig because it is the only place where it is used
* update rule test api to check data source permissions