* Alerting: Repurpose rule testing endpoint to return potential alerts
This feature replaces the existing no-longer in-use grafana ruler testing API endpoint /api/v1/rule/test/grafana. The new endpoint returns a list of potential alerts created by the given alert rule, including built-in + interpolated labels and annotations.
The key priority of this endpoint is that it is intended to be as true as possible to what would be generated by the ruler except that the resulting alerts are not filtered to only Resolved / Firing and ready to be sent.
This means that the endpoint will, among other things:
- Attach static annotations and labels from the rule configuration to the alert instances.
- Attach dynamic annotations from the datasource to the alert instances.
- Attach built-in labels and annotations created by the Grafana Ruler (such as alertname and grafana_folder) to the alert instances.
- Interpolate templated annotations / labels and accept allowed template functions.
* Alerting: Fix unique violation when updating rule group with title chains/cycles
The uniqueness constraint for titles within an org+folder is enforced on every update within a transaction instead of on commit (deferred constraint). This means that there could be a set of updates that will throw a unique constraint violation in an intermediate step even though the final state is valid. For example, a chain of updates RuleA -> RuleB -> RuleC could fail if not executed in the correct order, or a swap of titles RuleA <-> RuleB cannot be executed in any order without violating the constraint.
The exact solution to this is complex and requires determining directed paths and cycles in the update graph, adding in temporary updates to break cycles, and then executing the updates in reverse topological order (see first commit in PR if curious).
This is not implemented here.
Instead, we choose a simpler solution that works in all cases but might perform more updates than necessary. This simpler solution makes a determination of whether an intermediate collision could occur and if so, adds a temporary title on all updated rules to break any cycles and remove the need for specific ordering.
In addition, we make sure diffs are executed in the following order: DELETES, UPDATES, INSERTS.
* Alerting: Fix provisioned templates being ignored by alertmanager
Template provisioning sets the template in cfg.TemplateFiles while a recent change
made it so that alertmanager reads cfg.AlertmanagerConfig.Templates instead.
This change fixes the issue on both ends, by having provisioning set boths fields and
reverts the change on the alertmanager side so that it uses cfg.TemplateFiles.
* Let alert rule service implement registry service
* Add count method to RuleStore interface
* Add implementation for deletion of alert rules
* Rename uid to folderUID in registry methods
* Check forceDeleteRule value for registry deletion
* Register alerting store with folder service
* Move folder test functions to separate package
* Add testing for alert rule counting, deletion
* Remove redundant count method
* Fix deleteChildrenInFolder signature
* Update pkg/services/ngalert/store/alert_rule.go
Co-authored-by: Sofia Papagiannaki <1632407+papagian@users.noreply.github.com>
* Add tests for nested folder deletion
* Refactor TestIntegrationNestedFolderService
* Add rules store as parameter for alertng provider
---------
Co-authored-by: Sofia Papagiannaki <1632407+papagian@users.noreply.github.com>
* (WIP) Refactor the ImageStore interface to work with our latest alerting repository
* update alerting package
* refactor, new URLExists method in ImageProvider
* tests for the new methods
* fix linter warnings
* use alertingImages as an alias for grafana/alerting/images
* logs about image uris and not found images
* nerf image not found logs
* extract duplicated code to getImageFromURI() method
* refactor getImageFromURI()
* add index on url
* add comment about migration log
* sync generated files
* remove unused HasAdmin and HasEdit permission methods
* remove legacy AC from HasAccess method
* remove unused function
* update alerting tests to work with RBAC
* use tokens or urls in image annotations
* improve tests, fix some comments
* fix empty tokens
* code review changes, check for url before checking for token (support old token formats)
* update to alerting 20230418161049-5f374e58cb32
* rename renamed structs in https://github.com/grafana/alerting/pull/73
* update ValidateContactPoint to use BuildReceiverConfiguration
* update logger factory according to changes
* rewrite integration builder
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Alerting: Allow hooking into request handler functions.
Adds a facility to AlertNG for hooking into API handlers, allowing the
replacement of request handlers for specific paths. One of goals of this
approach was to allow hooking as late as possible in the request, e.g.
after all middleware has been applied, to simplfiy usage.
* Update pkg/services/ngalert/api/hooks.go
Co-authored-by: gotjosh <josue.abreu@gmail.com>
* Update pkg/services/ngalert/api/hooks.go
Co-authored-by: gotjosh <josue.abreu@gmail.com>
* Update pkg/services/ngalert/ngalert.go
Co-authored-by: gotjosh <josue.abreu@gmail.com>
* Fixes to review comments
* Fix passing logger in
---------
Co-authored-by: gotjosh <josue.abreu@gmail.com>
Alerting: Add totalsFiltered to RuleResponse to facilitate hidden by filters count
Currently, when both a limit_alerts and a matcher/state filter is applied, there is not enough information to determine how many alert instances were hidden by the filters. Only enough to determine the total hidden by the limit and filter combined.
This change adds a separate totalsFiltered field alongside the AlertRule totals that will contain the count of instances after filters but before limits.
This commit fixes a bug where DatasourceUID and RefID annotations are
missing for DatasourceNoData alerts in Grafana 9.5. This bug affects
datasource plugins that have moved to using the data plane contract.
This commit adds support for limits and filters to the Prometheus Rules
API.
Limits:
It adds a number of limits to the Grafana flavour of the Prometheus Rules
API:
- `limit` limits the maximum number of Rule Groups returned
- `limit_rules` limits the maximum number of rules per Rule Group
- `limit_alerts` limits the maximum number of alerts per rule
It sorts Rule Groups and rules within Rule Groups such that data in the
response is stable across requests. It also returns summaries (totals)
for all Rule Groups, individual Rule Groups and rules.
Filters:
Alerts can be filtered by state with the `state` query string. An example
of an HTTP request asking for just firing alerts might be
`/api/prometheus/grafana/api/v1/rules?state=alerting`.
A request can filter by two or more states by adding additional `state`
query strings to the URL. For example `?state=alerting&state=normal`.
Like the alert list panel, the `firing`, `pending` and `normal` state are
first compared against the state of each alert rule. All other states are
ignored. If the alert rule matches then its alert instances are filtered
against states once more.
Alerts can also be filtered by labels using the `matcher` query string.
Like `state`, multiple matchers can be provided by adding additional
`matcher` query strings to the URL.
The match expression should be parsed using existing regular expression
and sent to the API as URL-encoded JSON in the format:
{
"name": "test",
"value": "value1",
"isRegex": false,
"isEqual": true
}
The `isRegex` and `isEqual` options work as follows:
| IsEqual | IsRegex | Operator |
| ------- | -------- | -------- |
| true | false | = |
| true | true | =~ |
| false | true | !~ |
| false | false | != |
* Delete folders, dashboards with registry service
Co-authored-by: Serge Zaitsev <hello@zserge.com>
* Update signature of ProvideDashboardServiceImpl
* Regenerate mockery file
* Add test for DeleteInFolder
* Add test for DeleteDashboardsInFolder
* Delete child dashboard associations via registry
* Add validation of folder uid and org id
---------
Co-authored-by: Serge Zaitsev <hello@zserge.com>
* replace receiver errors with one from alerting
* add the converter to alerting models
* update buildReceiverIntegration to accept GrafanaReceiver
---------
Co-authored-by: George Robinson <george.robinson@grafana.com>
* define initial service and add to wire
* update caching service interface
* add skipQueryCache header handler and update metrics query function to use it
* add caching service as a dependency to query service
* working caching impl
* propagate cache status to frontend in response
* beginning of improvements suggested by Lean - separate caching logic from query logic.
* more changes to simplify query function
* Decided to revert renaming of function
* Remove error status from cache request
* add extra documentation
* Move query caching duration metric to query package
* add a little bit of documentation
* wip: convert resource caching
* Change return type of query service QueryData to a QueryDataResponse with Headers
* update codeowners
* change X-Cache value to const
* use resource caching in endpoint handlers
* write resource headers to response even if it's not a cache hit
* fix panic caused by lack of nil check
* update unit test
* remove NONE header - shouldn't show up in OSS
* Convert everything to use the plugin middleware
* revert a few more things
* clean up unused vars
* start reverting resource caching, start to implement in plugin middleware
* revert more, fix typo
* Update caching interfaces - resource caching now has a separate cache method
* continue wiring up new resource caching conventions - still in progress
* add more safety to implementation
* remove some unused objects
* remove some code that I left in by accident
* add some comments, fix codeowners, fix duplicate registration
* fix source of panic in resource middleware
* Update client decorator test to provide an empty response object
* create tests for caching middleware
* fix unit test
* Update pkg/services/caching/service.go
Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>
* improve error message in error log
* quick docs update
* Remove use of mockery. Update return signature to return an explicit hit/miss bool
* create unit test for empty request context
* rename caching metrics to make it clear they pertain to caching
* Update pkg/services/pluginsintegration/clientmiddleware/caching_middleware.go
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
* Add clarifying comments to cache skip middleware func
* Add comment pointing to the resource cache update call
* fix unit tests (missing dependency)
* try to fix mystery syntax error
* fix a panic
* Caching: Introduce feature toggle to caching service refactor (#66323)
* introduce new feature toggle
* hide calls to new service behind a feature flag
* remove licensing flag from toggle (misunderstood what it was for)
* fix unit tests
* rerun toggle gen
---------
Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
Takes a specific code path for data that identifies itself as dataplane instead of "guessing" what the data is.
The data must identify itself by being in the dataplane by having both the following frame metadata properties:
- TypeVersion property that is greater than 0.0
- 'Type' property
The flag is disableSSEDataplane and disables this functionality and uses the old code for all queries regardless.
See https://github.com/grafana/grafana-plugin-sdk-go/blob/main/data/contract_docs/contract.md for dataplane details.
* Alerting: Remove and revert flag alertingBigTransactions
This is a partial revert of #56575 and a removal of the `alertingBigTransactions` flag.
Real-word use has seen no clear performance incentive to maintain this flag. Lowered db connection count
came at the cost of significant increase in CPU usage and query latency.
* Fix lint backend
* Removed last bits of alertingBigTransactions
---------
Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>
* Alerting: Tiny refactor on the eval and schedule packages
two very small things:
- We had a constructor on something called a `Context` which is not a `context.Context` so let's just name that constructor `NewContext`
- The user that we use to run query evaluations is the same (with some variation) abstract it to a function so that it can be re-used when necessary.
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
---------
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>