* Move ticker files to dedicated package with no changes
* Fix package naming and resolve naming conflicts
* Fix up all existing references to moved objects
* Remove all alerting-specific references from shared util
* Rename TickerMetrics to simply Metrics
* Rename base ticker type to T and rename NewTicker to simply New
* access control to log user name if it does not have permissions
* update ngalert Evaluator to accept user instead of creating a pseudo one
* update alerting eval (rule\query testing) API to provide the real user to the Evaluator
* update scheduler to create a pseudo user with proper permissions
* Update GetAlertRulesForScheduling to query for folders (if needed)
* Update scheduler's alertRulesRegistry to cache folder titles along with rules
* Update rule eval loop to take folder title from the
* Extract interface RuleStore
* Pre-fetch the rule keys with the version to detect changes, and query the full table only if there are changes.
* update RouteDeleteAlertRules rules to update as a group
* remove expecter from scheduler mock to support variadic function
* create function to check for provisioning status + tests
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
* move saving the state to state manager when scheduler stops
* move saving state to ProcessEvalResults
* add GetRuleKey to State
* add LogContext to AlertRuleKey
* Chore: Add user service method SetUsingOrg
* Chore: Add user service method GetSignedInUserWithCacheCtx
* Use method GetSignedInUserWithCacheCtx from user service
* Fix lint after rebase
* Fix lint
* Fix lint error
* roll back some changes
* Roll back changes in api and middleware
* Add xorm tags to SignedInUser ID fields
* Move SignedInUser to user service and RoleType and Roles to org
* Use go naming convention for roles
* Fix some imports and leftovers
* Fix ldap debug test
* Fix lint
* Fix lint 2
* Fix lint 3
* Fix type and not needed conversion
* Clean up messages in api tests
* Clean up api tests 2
* remove support for bus from scheduler
* rename event to FolderTitleUpdated and fire only if title has changed
* add method to increase version of all rules that belong to a folder
* update ngalert service to subscribe to folder title change event call data store and update scheduler
* add tests
* update GetAlertRulesForSchedulingQuery to have result AlertRule
* update fetcher utils and registry to support AlertRule
* alertRuleInfo to use alert rule instead of version
* update updateCh hanlder of ruleRoutine to just clean up the state. The updated rule will be provided at the next evaluation
* update evalCh handler of ruleRoutine to use rule from the message and clear state as well as update extra labels
* remove unused function in ruleRoutine
* remove unused model SchedulableAlertRule
* store rule version in ruleRoutine instead of rule
* do not call the sender if nothing to send
* handler for update message in rule evaluation routine ignores the message if its version greater or equal.
* replace messages to update the channel if it is not empty
* add tests for cache getOrCreate
* update ProcessEvalResults to accept extra lables
* extract to getRuleExtraLabels
* move populating of constant rule labels to extra labels
* AlertRule to return condition
* update ConditionEval to not return an error because it's always nil
* make getExprRequest private
* refactor executeCondition to just converter and move execution to the ConditionEval as this makes code more readable.
* log error if results have errors
* change signature of evaluate function to not return an error
* Introduce AlertsRouter in the sender package, and move all fields and methods related to notifications out of the scheduler to this router.
* Introduce a new interface AlertsSender in the schedule package and replace calls of anonymous function `notify` inside the ruleRoutine to calling methods of that interface.
* Rename interface Scheduler in api package to ExternalAlertmanagerProvider, and replace scheduler with AlertRouter as struct that implements the interface.
* Alerting: Add config disabled_labels to disable reserved labels
[unified_alerting.reserved_labels]
disabled_labels
* Replace IsGrafanaFolderDisabled with more generic IsReservedLabelDisabled
* Simplify SchedulerCfg by including UnifiedAlertingSettings
* Alerting: Add first Grafana reserved label g_label
g_label holds the title of the folder container the alert. The intention of this label
is to use it as part of the new default notification policy groupBy.
* Add nil check on updateRule labels map
* Disable gocyclo lint on schedule.ruleRoutine
will remove later in a separate refactoring PR to reduce complexity.
* Address doc suggestions
* Update g_folder for rules in folder when folder title changes
* Remove global bus in FolderService
* Modify tests to fit new common g_folder label
* Add changelog entry
* Fix merge conflicts
* Switch GrafanaReservedLabelPrefix from `g_` to `grafana_`
* Alerting: decapitalize log lines and use "err" as the key for errors
Found using (logger|log).(Warn|Debug|Info|Error)\([A-Z] and (logger|log).(Warn|Debug|Info|Error)\(.+"error"
This change adds a field to state.State and models.AlertInstance
that indicate the "Reason" that an instance has its current state. This
helps us account for cases where the state is "Normal" but the
underlying evaluation returned "NoData" or "Error", for example.
Fixes#42606
Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
* remove not used code:
- remove offset in ticket because it is not used
- remove unused ticker and scheduler methods
* use duration for interval
* add metrics grafana_alerting_ticker_last_consumed_tick_timestamp_seconds, grafana_alerting_ticker_next_tick_timestamp_seconds, grafana_alerting_ticker_interval_seconds
* make eval.Evaluator an interface
* inject Evaluator to TestingApiSrv
* move conditionEval to RouteTestGrafanaRuleConfig because it is the only place where it is used
* update rule test api to check data source permissions
* Fix evaluation of alert rules for datasources with custom headers
* Fix unit tests
* Fix integration tests
* Evaluator fields should be package private
* (WIP) send alerts to external, internal, or both alertmanagers
* Modify admin configuration endpoint, update swagger docs
* Integration test for admin config updated
* Code review changes
* Fix alertmanagers choice not changing bug, add unit test
* Add AlertmanagersChoice as enum in swagger, code review changes
* Fix API and tests errors
* Change enum from int to string, use 'SendAlertsTo' instead of 'AlertmanagerChoice' where necessary
* Fix tests to reflect last changes
* Keep senders running when alerts are handled just internally
* Check if any external AM has been discovered before sending alerts, update tests
* remove duplicate data from logs
* update comment
* represent alertmanagers choice as an int instead of a string
* default alertmanagers choice to all alertmanagers, test cases
* update definitions and generate spec
* Update API to call the scheduler to remove\update an alert rule. When a rule is updated by a user, the scheduler will remove the currently firing alert instances and clean up the state cache.
* Update evaluation loop in the scheduler to support one more channel that is used to communicate updates to it.
* Improved rule deletion from the internal registry.
* Move alert rule version from the internal registry (structure alertRuleInfo) closer rule evaluation loop (to evaluation task structure), which will make the registry values immutable.
* Extract notification code to a separate function to reuse in update flow.
* change registry.delete to return deleted struct
* use pointer to alertRuleInfo instead copying.
* do not access evaluation channel when routine is stopped
* remove stopCh and use context cancellation
* do not return ctx.Err when channel is cancelled because it cancels all other routines
* make alertRuleInfo fields and functions package private
Refactor usage of legacy data contracts. Moves legacy data contracts
to pkg/tsdb/legacydata package.
Refactor pkg/expr to be a proper service/dependency that can be provided
to wire to remove some unneeded dependencies to SSE in ngalert and other places.
Refactor pkg/expr to not use the legacydata,RequestHandler and use
backend.QueryDataHandler instead.
* move evaluation function out of loop
* extract updateRule function
* isolate alertRule change. update returns new rule and the evaluation accepts the rule as argument
* extract retry loop into a function
* add function wide log context.
* refactor metrics + add tests + replace timeNow with schedule.clock