* update to alerting 20230418161049-5f374e58cb32
* rename renamed structs in https://github.com/grafana/alerting/pull/73
* update ValidateContactPoint to use BuildReceiverConfiguration
* update logger factory according to changes
* rewrite integration builder
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Alerting: Allow hooking into request handler functions.
Adds a facility to AlertNG for hooking into API handlers, allowing the
replacement of request handlers for specific paths. One of goals of this
approach was to allow hooking as late as possible in the request, e.g.
after all middleware has been applied, to simplfiy usage.
* Update pkg/services/ngalert/api/hooks.go
Co-authored-by: gotjosh <josue.abreu@gmail.com>
* Update pkg/services/ngalert/api/hooks.go
Co-authored-by: gotjosh <josue.abreu@gmail.com>
* Update pkg/services/ngalert/ngalert.go
Co-authored-by: gotjosh <josue.abreu@gmail.com>
* Fixes to review comments
* Fix passing logger in
---------
Co-authored-by: gotjosh <josue.abreu@gmail.com>
Alerting: Add totalsFiltered to RuleResponse to facilitate hidden by filters count
Currently, when both a limit_alerts and a matcher/state filter is applied, there is not enough information to determine how many alert instances were hidden by the filters. Only enough to determine the total hidden by the limit and filter combined.
This change adds a separate totalsFiltered field alongside the AlertRule totals that will contain the count of instances after filters but before limits.
This commit adds support for limits and filters to the Prometheus Rules
API.
Limits:
It adds a number of limits to the Grafana flavour of the Prometheus Rules
API:
- `limit` limits the maximum number of Rule Groups returned
- `limit_rules` limits the maximum number of rules per Rule Group
- `limit_alerts` limits the maximum number of alerts per rule
It sorts Rule Groups and rules within Rule Groups such that data in the
response is stable across requests. It also returns summaries (totals)
for all Rule Groups, individual Rule Groups and rules.
Filters:
Alerts can be filtered by state with the `state` query string. An example
of an HTTP request asking for just firing alerts might be
`/api/prometheus/grafana/api/v1/rules?state=alerting`.
A request can filter by two or more states by adding additional `state`
query strings to the URL. For example `?state=alerting&state=normal`.
Like the alert list panel, the `firing`, `pending` and `normal` state are
first compared against the state of each alert rule. All other states are
ignored. If the alert rule matches then its alert instances are filtered
against states once more.
Alerts can also be filtered by labels using the `matcher` query string.
Like `state`, multiple matchers can be provided by adding additional
`matcher` query strings to the URL.
The match expression should be parsed using existing regular expression
and sent to the API as URL-encoded JSON in the format:
{
"name": "test",
"value": "value1",
"isRegex": false,
"isEqual": true
}
The `isRegex` and `isEqual` options work as follows:
| IsEqual | IsRegex | Operator |
| ------- | -------- | -------- |
| true | false | = |
| true | true | =~ |
| false | true | !~ |
| false | false | != |
* replace receiver errors with one from alerting
* add the converter to alerting models
* update buildReceiverIntegration to accept GrafanaReceiver
---------
Co-authored-by: George Robinson <george.robinson@grafana.com>
* define initial service and add to wire
* update caching service interface
* add skipQueryCache header handler and update metrics query function to use it
* add caching service as a dependency to query service
* working caching impl
* propagate cache status to frontend in response
* beginning of improvements suggested by Lean - separate caching logic from query logic.
* more changes to simplify query function
* Decided to revert renaming of function
* Remove error status from cache request
* add extra documentation
* Move query caching duration metric to query package
* add a little bit of documentation
* wip: convert resource caching
* Change return type of query service QueryData to a QueryDataResponse with Headers
* update codeowners
* change X-Cache value to const
* use resource caching in endpoint handlers
* write resource headers to response even if it's not a cache hit
* fix panic caused by lack of nil check
* update unit test
* remove NONE header - shouldn't show up in OSS
* Convert everything to use the plugin middleware
* revert a few more things
* clean up unused vars
* start reverting resource caching, start to implement in plugin middleware
* revert more, fix typo
* Update caching interfaces - resource caching now has a separate cache method
* continue wiring up new resource caching conventions - still in progress
* add more safety to implementation
* remove some unused objects
* remove some code that I left in by accident
* add some comments, fix codeowners, fix duplicate registration
* fix source of panic in resource middleware
* Update client decorator test to provide an empty response object
* create tests for caching middleware
* fix unit test
* Update pkg/services/caching/service.go
Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>
* improve error message in error log
* quick docs update
* Remove use of mockery. Update return signature to return an explicit hit/miss bool
* create unit test for empty request context
* rename caching metrics to make it clear they pertain to caching
* Update pkg/services/pluginsintegration/clientmiddleware/caching_middleware.go
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
* Add clarifying comments to cache skip middleware func
* Add comment pointing to the resource cache update call
* fix unit tests (missing dependency)
* try to fix mystery syntax error
* fix a panic
* Caching: Introduce feature toggle to caching service refactor (#66323)
* introduce new feature toggle
* hide calls to new service behind a feature flag
* remove licensing flag from toggle (misunderstood what it was for)
* fix unit tests
* rerun toggle gen
---------
Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
* Alerting: Tiny refactor on the eval and schedule packages
two very small things:
- We had a constructor on something called a `Context` which is not a `context.Context` so let's just name that constructor `NewContext`
- The user that we use to run query evaluations is the same (with some variation) abstract it to a function so that it can be re-used when necessary.
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
---------
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
* Alerting: Add endpoint to revert to a previous alertmanager configuration
This endpoint is meant to be used in conjunction with /api/alertmanager/grafana/config/history to
revert to a previously applied alertmanager configuration. This is done by ID instead of raw config
string in order to avoid secure field complications.
This commit adds a number of limits to the Grafana flavor of the
Prometheus Rules API:
1. `limit` limits the maximum number of Rule Groups returned
2. `limit_rules` limits the maximum number of rules per Rule Group
3. `limit_alerts` limits the maximum number of alerts per rule
It sorts Rule Groups and rules within Rule Groups such that data in the
response is stable across requests. It also returns summaries (totals) for
all Rule Groups, individual Rule Groups and rules.
* WIP
* skip invalid historic configurations instead of erroring
* add warning log when bad historic config is found
* remove unused custom marshaller for GettableHistoricUserConfig
* add id to historic user config, move limit check to store, fix typo
* swagger spec
* move export rules to definitions package
* move provisioning contact point methods to provisioning package
* move AlertRuleGroupWithFolderTitle to ngalert models and adapter functions to api's compat
* move rule_types files back to where they were before.
* copy AlertQuery from ngmodels to the definition package
* replaces usages of ngmodels.AlertQuery in API models
* create a converter between models of AlertQuery
---------
Co-authored-by: Alex Moreno <alexander.moreno@grafana.com>
* Rename RecordStatesAsync to Record
* Rename QueryStates to Query
* Implement fanout writes
* Implement primary queries
* Simplify error joining
* Add test for query path
* Add tests for writes and error propagation
* Allow fanout backend to be configured
* Touch up log messages and config validation
* Consistent documentation for all backend structs
* Parse and normalize backend names more consistently against an enum
* Touch-ups to documentation
* Improve clarity around multi-record blocking
* Keep primary and secondaries more distinct
* Rename fanout backend to multiple backend
* Simplify config keys for multi backend mode
* stop using the scheduler's Update and Delete methods all communication must be via the database
* update scheduler's registry to calculate diff before re-setting the cache
* update fetcher to return the diff generated by registry
* update processTick to update rule eval routine if the rule was updated and it is not going to be evaluated at this tick.
* remove references to the scheduler from api package
* remove unused methods in the scheduler
* Remove Result field from AddDataSourceCommand
* Remove DatasourcesPermissionFilterQuery Result
* Remove GetDataSourceQuery Result
* Remove GetDataSourcesByTypeQuery Result
* Remove GetDataSourcesQuery Result
* Remove GetDefaultDataSourceQuery Result
* Remove UpdateDataSourceCommand Result
* Alerting: Fix template validation in provisioning api
Fix issue where provisioning API accepts a malformed template having extra
text outside of definition block and template name matching definition name.
* Mark AM configuration as applied
* add missing checks, make linter happy
* fix deadlock, mark as valid on save and on load
* mark configurations only if needed
* check error after applyConfig()
* code review comments
* code review changes
* more code review changes
* clean HistoricConfigFromAlertConfig function
* Define endpoint and generate
* Wire up and register endpoint
* Cleanup, define authorization
* Forgot the leading slash
* Wire up query and SignedInUser
* Wire up timerange query params
* Add todo for label queries
* Drop comment
* Update path to rules subtree
* Set YAML as default value for exporting alert rules
* use YAML format for rule list export
Co-authored-by: Sonia Aguilar <33540275+soniaAguilarPeiron@users.noreply.github.com>
* lint
* Add new format query param to swagger+docs
* Fix broken test
---------
Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
Co-authored-by: Matt Jacobson <matthew.jacobson@grafana.com>
* Add is_paused attr to the POST alert rule group endpoint
* Add is_paused to alerting API POST alert rule group
* Fixed tests
* Add is_paused to alerting gettable endpoints
* Fix integration tests
* Alerting: allow to pause existing rules (#62401)
* Display Pause Rule switch in Editing Rule form
* add isPaused property to form interface and dto
* map isPaused prop with is_paused value from DTO
Also update test snapshots
* Append '(Paused)' text on alert list state column when appropriate
* Change Switch styles according to discussion with UX
Also adding a tooltip with info what this means
* Adjust styles
* Fix alignment and isPaused type definition
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
* Fix test
* Fix test
* Fix RuleList test
---------
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
* wip
* Fix tests and add comments to clarify AlertRuleWithOptionals
* Fix one more test
* Fix tests
* Fix typo in comment
* Fix alert rule(s) cannot be paused via API
* Add integration tests for alerting api pausing flow
* Remove duplicated integration test
---------
Co-authored-by: Virginia Cepeda <virginia.cepeda@grafana.com>
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Use suggested value for uid
* update the snapshot
* use __expr__
* replace all -100 with __expr__
* update snapshot
* more changes
* revert redundant change
* Use expr.DatasourceUID where it's possible
* generate files
* Allow pausing alerts from provisioning
* Update swagger
* Add IsPaused to provision export endpoints
* Add pause field in sample.yml
* Add exception for reset state in first loop iteration of scheduler if rule is paused
* Update provision definition and swagger docs
* Fix provisioning export tests
* Suggestion: Simplify if condition
* Add more context to a comment
This adds provisioning endpoints for downloading alert rules and alert rule groups in a
format that is compatible with file provisioning. Each endpoint supports both json and
yaml response types via Accept header as well as a query parameter
download=true/false that will set Content-Disposition to recommend initiating a download
or inline display.
This also makes some package changes to keep structs with potential to drift closer
together. Eventually, other alerting file structs should also move into this new file
package, but the rest require some refactoring that is out of scope for this PR.
* Add field in alert_rule model, add state to alert_instance model, and state to eval
* Remove paused state from eval package
* Skip paused alert rules in scheduler
* Add migration to add is_paused field to alert_rule table
* Convert to postable alerts only if not normal, pernding, or paused
* Handle paused eval results in state manager
* Add Paused state to eval package
* Add paused alerts logic in scheduler
* Skip alert on scheduler
* Remove paused status from eval package
* Apply suggestions from code review
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Remove state
* Rethink schedule and manager for paused alerts
* Change return to continue
* Remove unused var
* Rethink alert pausing
* Paused alerts storing annotations
* Only add one state transition
* Revert boolean method renaming refactor
* Revert take image refactor
* Make registry errors public
* Revert method extraction for getting a folder title
* Revert variable renaming refactor
* Undo unnecessary changes
* Revert changes in test
* Remove IsPause check in PatchPartiLAlertRule function
* Use SetNormal to set state
* Fix text by returning to old behaviour on alert rule deletion
* Add test in schedule_unit_test.go to test ticks with paused alerts
* Add coment to clarify usage of context.Background()
* Add comment to clarify resetStateByRuleUID method usage
* Move rule get to a more limited scope
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: George Robinson <george.robinson@grafana.com>
* rum gofmt on pkg/services/ngalert/schedule/schedule.go
* Remove defer cancel for context
* Update pkg/services/ngalert/models/instance_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Update pkg/services/ngalert/models/testing.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Update pkg/services/ngalert/schedule/schedule_unit_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Update pkg/services/ngalert/schedule/schedule_unit_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Update pkg/services/ngalert/models/instance_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* skip scheduler rule state clean up on paused alert rule
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Fix mock in test
* Add (hopefully) final suggestions
* Use error channel from recordAnnotationsSync to cancel context
* Run make gen-cue
* Place pause alert check in channel update after version check
* Reduce branching un update channel select
* Add if for error and move code inside if in state manager ResetStateByRuleUID
* Add reason to logs
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Do not delete alert rule routine, just exit on eval if is paused
* Reduce branching and create-close a channel to avoid deadlocks
* Separate state deletion and state reset (includes history saving)
* Add current pause state in rule route in scheduler
* Split clearState and bring errCh closer to RecordStatesAsync call
* Change rule to ruleMeta in RecordStatesAsync
* copy state to be able to modify it
* Add timeout to context creation
* Shorten the timeout
* Use resetState is rule is paused and deleteState if rule is not paused
* Remove Empty state reason
* Save every rule change in historian
* Add tests for DeleteStateByRuleUID and ResetStateByRuleUID
* Remove useless line
* Remove outdated comment
Co-authored-by: George Robinson <george.robinson@grafana.com>
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>
This commit renames "Message templates" to "Notification templates"
in the user interface as it suggests that these templates cannot
be used to template anything other than the message. However, message
templates are much more general and can be used to template other fields
too such as the subject of an email, or the title of a Slack message.