Commit Graph

1025 Commits

Author SHA1 Message Date
Jean-Philippe Quéméner
5717d8954f
Alerting: Return empty list on export if no rules exist (#69023) 2023-05-25 14:12:18 +02:00
Ieva
4980b64274
RBAC: Remove legacy ac from authorization middleware (#68898)
remove legacy AC fallback from RBAC middleware, and some unused auth logic
2023-05-24 09:49:42 +01:00
Yuri Tseretyan
ab5a3820d5
Alerting: Fix status code of successful response POST /api/alertmanager/grafana/api/v2/silences in swagger specs (#67951)
* update status code to reflect reality

* update docs
2023-05-15 11:23:30 -04:00
Emil Tullstedt
23a9963507
Chore: Upgrade Prometheus to 2.43.0 (#67853)
- github.com/prometheus/prometheus => 2.43.0 (aka 0.43.0)
- github.com/prometheus/client_golang => 1.15.1
2023-05-10 14:09:49 +02:00
Virginia Cepeda
e1ff434917
Alerting: Change text on cloud AM email addresses for contact points (#68143) 2023-05-10 10:44:17 +02:00
Matthew Jacobson
5422609fb1
Alerting: Fix broken integration test (#68140)
From https://github.com/grafana/grafana/pull/68122
2023-05-09 22:27:40 +03:00
Jean-Philippe Quéméner
8bb62a8316
Alerting: Add option for memberlist label (#67982) 2023-05-09 10:32:23 +02:00
Matthew Jacobson
91471ac7ae
Alerting: Template Testing API (#67450) 2023-04-28 15:56:59 +01:00
Yuri Tseretyan
9eb10bee1f
Alerting: Scheduler use rule fingerprint instead of version (#66531)
* implement calculation of fingerprint for ruleWithFolder
* update scheduler to use fingerprint instead of rule's version
2023-04-28 10:42:16 -04:00
Uwe Sommerlatt
dfc99cdd19
Alerting: Fix misleading status code in provisioning API (#67331)
Fixes #66249
2023-04-27 09:25:34 +01:00
Santiago
b0881daf23
Alerting: Use URLs in image annotations (#66804)
* use tokens or urls in image annotations

* improve tests, fix some comments

* fix empty tokens

* code review changes, check for url before checking for token (support old token formats)
2023-04-26 13:06:18 -03:00
Yuri Tseretyan
a8b4a4bb45
Alerting: Update alerting module to 20230418161049-5f374e58cb32 + refactoring (#66622)
* update to alerting 20230418161049-5f374e58cb32
* rename renamed structs in https://github.com/grafana/alerting/pull/73
* update ValidateContactPoint to use BuildReceiverConfiguration
* update logger factory according to changes
* rewrite integration builder
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
2023-04-25 13:39:46 -04:00
Alexander Weaver
117636e8ca
Alerting: Fix panic when reparenting receivers to groups following an attempted rename via Provisioning (#67167) 2023-04-24 21:23:23 -04:00
Steve Simpson
9effb9a708
Alerting: Allow hooking into request handler functions. (#67000)
* Alerting: Allow hooking into request handler functions.

Adds a facility to AlertNG for hooking into API handlers, allowing the
replacement of request handlers for specific paths. One of goals of this
approach was to allow hooking as late as possible in the request, e.g.
after all middleware has been applied, to simplfiy usage.

* Update pkg/services/ngalert/api/hooks.go

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* Update pkg/services/ngalert/api/hooks.go

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* Update pkg/services/ngalert/ngalert.go

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* Fixes to review comments

* Fix passing logger in

---------

Co-authored-by: gotjosh <josue.abreu@gmail.com>
2023-04-24 18:18:44 +02:00
Jean-Philippe Quéméner
bbce69f295
Alerting: Use configured headers for external alertmanager (#63819) 2023-04-21 16:16:27 +02:00
Matthew Jacobson
eddd4f4508
Alerting: Add totalsFiltered to RuleResponse for hidden by filters count (#66883)
Alerting: Add totalsFiltered to RuleResponse to facilitate hidden by filters count

Currently, when both a limit_alerts and a matcher/state filter is applied, there is not enough information to determine how many alert instances were hidden by the filters. Only enough to determine the total hidden by the limit and filter combined.

This change adds a separate totalsFiltered field alongside the AlertRule totals that will contain the count of instances after filters but before limits.
2023-04-21 09:35:12 +01:00
George Robinson
35342a3c76
Alerting: Fix DatasourceUID and RefID missing for DatasourceNoData alerts (#66733)
This commit fixes a bug where DatasourceUID and RefID annotations are
missing for DatasourceNoData alerts in Grafana 9.5. This bug affects
datasource plugins that have moved to using the data plane contract.
2023-04-20 14:38:20 +01:00
George Robinson
883dcc81c0
Alerting: Add tests for Evaluate (#66739) 2023-04-20 11:24:40 +01:00
Alexander Weaver
3634079b8f
Alerting: Attach hash of instance labels to state history log lines (#65968)
* Add instanceID which is hash of labels

* Rename field to fingerprint

* Move to prometheus style signature

* Appease linter
2023-04-19 14:22:19 -05:00
Jean-Philippe Quéméner
bc11a484ed
Alerting: Add support for running HA using Redis (#65267)
Co-authored-by: Steve Simpson <steve.simpson@grafana.com>
2023-04-19 17:05:26 +02:00
Alexander Weaver
a384194e15
Alerting: Use default page size of 5000 when querying Loki for state history (#66315)
Always specify limit of 5000
2023-04-18 14:31:29 -05:00
Alexander Weaver
cf7157f683
Alerting: Capture refID of rule's condition expression in Loki state history entries (#66419)
* Capture condition from rule

* Add test
2023-04-18 14:21:28 -05:00
Alex Moreno
f64a89727e
Alerting: Allow provenance disable in alerting provisioning API (#63650)
* Allow provenance None in alert rule update and rule group replace

* Allow provenance None in contact point update

* Allow updating policies to none by sending x-disable-provenance header

* Allow mute timings to disable provenance with x-disable-provenance header

* Allow disabling provenance by using x-disable-provenance header

* Add provenance helper to lower the cyclomatic complexity

* Do not downgrade provenance except un ReplaceRuleGroup

* Add function explanation and change error handling

* Add docs for x-disable-provenance changes (#66300)

* Add docs for x-disable-provenance changes

* Apply suggestions from code review

Co-authored-by: brendamuir <100768211+brendamuir@users.noreply.github.com>

* Update _index.md

---------

Co-authored-by: brendamuir <100768211+brendamuir@users.noreply.github.com>

* Update docs/sources/alerting/set-up/provision-alerting-resources/_index.md

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Add error message check in tests

* Change docs

---------

Co-authored-by: brendamuir <100768211+brendamuir@users.noreply.github.com>
Co-authored-by: George Robinson <george.robinson@grafana.com>
2023-04-18 15:10:36 +02:00
Kyle Brandt
840fb32ad8
SSE: (Instrumentation) Add Tracing (#66700)
spans are prefixed `SSE.`
2023-04-18 08:04:51 -04:00
Kyle Brandt
2f13c851e4
SSE: (Chore/Instrumentation) Add ds_queries_total metric and move met… (#66695)
* SSE: (Chore/Instrumentation) Add ds_queries_total metric and move metrics to service
2023-04-17 16:12:44 -07:00
George Robinson
19ebb079ba
Alerting: Add limits and filters to Prometheus Rules API (#66627)
This commit adds support for limits and filters to the Prometheus Rules
API.

Limits:

It adds a number of limits to the Grafana flavour of the Prometheus Rules
API:

- `limit` limits the maximum number of Rule Groups returned
- `limit_rules` limits the maximum number of rules per Rule Group
- `limit_alerts` limits the maximum number of alerts per rule

It sorts Rule Groups and rules within Rule Groups such that data in the
response is stable across requests. It also returns summaries (totals)
for all Rule Groups, individual Rule Groups and rules.

Filters:

Alerts can be filtered by state with the `state` query string. An example
of an HTTP request asking for just firing alerts might be
`/api/prometheus/grafana/api/v1/rules?state=alerting`.

A request can filter by two or more states by adding additional `state`
query strings to the URL. For example `?state=alerting&state=normal`.

Like the alert list panel, the `firing`, `pending` and `normal` state are
first compared against the state of each alert rule. All other states are
ignored. If the alert rule matches then its alert instances are filtered
against states once more.

Alerts can also be filtered by labels using the `matcher` query string.
Like `state`, multiple matchers can be provided by adding additional
`matcher` query strings to the URL.

The match expression should be parsed using existing regular expression
and sent to the API as URL-encoded JSON in the format:

{
    "name": "test",
    "value": "value1",
    "isRegex": false,
    "isEqual": true
}

The `isRegex` and `isEqual` options work as follows:

| IsEqual | IsRegex  | Operator |
| ------- | -------- | -------- |
| true    | false    |    =     |
| true    | true     |    =~    |
| false   | true     |    !~    |
| false   | false    |    !=    |
2023-04-17 17:45:06 +01:00
Arati R
cab3ba519a
NestedFolders: Add folder service registry with dashboard service implementation (#65033)
* Delete folders, dashboards with registry service
Co-authored-by: Serge Zaitsev <hello@zserge.com>
* Update signature of ProvideDashboardServiceImpl
* Regenerate mockery file
* Add test for DeleteInFolder
* Add test for DeleteDashboardsInFolder
* Delete child dashboard associations via registry
* Add validation of folder uid and org id

---------

Co-authored-by: Serge Zaitsev <hello@zserge.com>
2023-04-14 11:17:23 +02:00
Yuri Tseretyan
afd52d0866
Alerting: use alerting GrafanaReceiver and BuildReceiverConfiguration in Grafana (#65224)
* replace receiver errors with one from alerting
* add the converter to alerting models
* update buildReceiverIntegration to accept GrafanaReceiver
---------

Co-authored-by: George Robinson <george.robinson@grafana.com>
2023-04-13 12:25:32 -04:00
gotjosh
2bbf0c9de4
Alerting: Allow Rules to Schedule to be filtered by Rule Group (#59990)
* Alerting: Allow Rules to Schedule to be filtered by Rule Group
2023-04-13 12:55:42 +01:00
Michael Mandrus
5626461b3c
Caching: Refactor enterprise query caching middleware to a wire service (#65616)
* define initial service and add to wire

* update caching service interface

* add skipQueryCache header handler and update metrics query function to use it

* add caching service as a dependency to query service

* working caching impl

* propagate cache status to frontend in response

* beginning of improvements suggested by Lean - separate caching logic from query logic.

* more changes to simplify query function

* Decided to revert renaming of function

* Remove error status from cache request

* add extra documentation

* Move query caching duration metric to query package

* add a little bit of documentation

* wip: convert resource caching

* Change return type of query service QueryData to a QueryDataResponse with Headers

* update codeowners

* change X-Cache value to const

* use resource caching in endpoint handlers

* write resource headers to response even if it's not a cache hit

* fix panic caused by lack of nil check

* update unit test

* remove NONE header - shouldn't show up in OSS

* Convert everything to use the plugin middleware

* revert a few more things

* clean up unused vars

* start reverting resource caching, start to implement in plugin middleware

* revert more, fix typo

* Update caching interfaces - resource caching now has a separate cache method

* continue wiring up new resource caching conventions - still in progress

* add more safety to implementation

* remove some unused objects

* remove some code that I left in by accident

* add some comments, fix codeowners, fix duplicate registration

* fix source of panic in resource middleware

* Update client decorator test to provide an empty response object

* create tests for caching middleware

* fix unit test

* Update pkg/services/caching/service.go

Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>

* improve error message in error log

* quick docs update

* Remove use of mockery. Update return signature to return an explicit hit/miss bool

* create unit test for empty request context

* rename caching metrics to make it clear they pertain to caching

* Update pkg/services/pluginsintegration/clientmiddleware/caching_middleware.go

Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>

* Add clarifying comments to cache skip middleware func

* Add comment pointing to the resource cache update call

* fix unit tests (missing dependency)

* try to fix mystery syntax error

* fix a panic

* Caching: Introduce feature toggle to caching service refactor (#66323)

* introduce new feature toggle

* hide calls to new service behind a feature flag

* remove licensing flag from toggle (misunderstood what it was for)

* fix unit tests

* rerun toggle gen

---------

Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
2023-04-12 12:30:33 -04:00
Kyle Brandt
e78be44e1a
SSE: Dataplane Compliance (#65927)
Takes a specific code path for data that identifies itself as dataplane instead of "guessing" what the data is.

The data must identify itself by being in the dataplane by having both the following frame metadata properties:

- TypeVersion property that is greater than 0.0
- 'Type' property

The flag is disableSSEDataplane and disables this functionality and uses the old code for all queries regardless.

See https://github.com/grafana/grafana-plugin-sdk-go/blob/main/data/contract_docs/contract.md for dataplane details.
2023-04-12 12:24:34 -04:00
Matthew Jacobson
63187fae0c
Alerting: Remove and revert flag alertingBigTransactions (#65976)
* Alerting: Remove and revert flag alertingBigTransactions

This is a partial revert of #56575 and a removal of the `alertingBigTransactions` flag.

Real-word use has seen no clear performance incentive to maintain this flag. Lowered db connection count
came at the cost of significant increase in CPU usage and query latency.

* Fix lint backend

* Removed last bits of alertingBigTransactions

---------

Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>
2023-04-06 18:06:25 +02:00
gotjosh
1c3ce0735f
Alerting: Tiny refactor on the eval and schedule packages (#66130)
* Alerting: Tiny refactor on the eval and schedule packages

two very small things:

- We had a constructor on something called a `Context` which is not a `context.Context` so let's just name that constructor `NewContext`
- The user that we use to run query evaluations is the same (with some variation) abstract it to a function so that it can be re-used when necessary.

* Update pkg/services/ngalert/schedule/schedule.go

Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>

* Update pkg/services/ngalert/schedule/schedule.go

Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>

---------

Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
2023-04-06 16:02:28 +01:00
Matthew Jacobson
85f738cdf9
Alerting: Add endpoint to revert to a previous alertmanager configuration (#65751)
* Alerting: Add endpoint to revert to a previous alertmanager configuration

This endpoint is meant to be used in conjunction with /api/alertmanager/grafana/config/history to
revert to a previously applied alertmanager configuration. This is done by ID instead of raw config
string in order to avoid secure field complications.
2023-04-05 14:10:03 -04:00
Alexander Weaver
fb520edd72
Alerting: Use a completely isolated context for state history writes (#64989)
* Add fresh context with timeout and same log properties, re-derive logger

* Unify timeout constants

* Move ctx after shortcut that got added through rebasing

* Unify timeouts

* Port opentracing's SpanFromContext and ContextFromSpan to the grafana tracing package

* Support both opentracing and otel variants

* Better document why we're creating a new ctx

* Add new func to FakeSpan which was added after rebase

* Support grafana-specific traceID key in both tracer implementations
2023-04-04 16:41:46 -05:00
George Robinson
bd29071a0d
Revert "Alerting: Add limits to the Prometheus Rules API" (#65842) 2023-04-03 15:20:37 +00:00
George Robinson
d96b0a71d3
Alerting: Add limits to the Prometheus Rules API (#65169)
This commit adds a number of limits to the Grafana flavor of the
Prometheus Rules API:

1. `limit` limits the maximum number of Rule Groups returned
2. `limit_rules` limits the maximum number of rules per Rule Group
3. `limit_alerts` limits the maximum number of alerts per rule

It sorts Rule Groups and rules within Rule Groups such that data in the
response is stable across requests. It also returns summaries (totals) for
all Rule Groups, individual Rule Groups and rules.
2023-04-03 10:17:02 +01:00
Santiago
aba91d3053
Alerting: Fetch all applied alerting configurations (#65728)
* WIP

* skip invalid historic configurations instead of erroring

* add warning log when bad historic config is found

* remove unused custom marshaller for GettableHistoricUserConfig

* add id to historic user config, move limit check to store, fix typo

* swagger spec
2023-03-31 17:43:04 -03:00
Alexander Weaver
da4832724e
Alerting: Delete stub for SQL alert state history backend (#65667)
Delete stub for SQL backend
2023-03-31 11:15:56 -05:00
Matthew Jacobson
b9dc04139a
Alerting: Respect "For" Duration for NoData alerts (#65574)
* Alerting: Respect "For" Duration for NoData alerts

This change modifies `resultNoData` to be more inline with the logic of the other state handlers.

The main effects of this are:

1) NoData states with NoDataState config set to Alerting will respect "For" duration.
2) Prevents zero value in StartsAt and EndsAt for alerts that have only even been in normal state. This includes state transitions from NoDataState=OK and ExecErrState=OK.
3) Better state transition logging.
2023-03-31 19:05:15 +03:00
Steve Simpson
04336d53a9
Alerting: Update prometheus version (#65688) 2023-03-31 16:34:35 +02:00
Yuri Tseretyan
622c23716a
Alerting: Use logger with context in the state cache (#65663) 2023-03-31 10:11:30 -04:00
Alexander Weaver
b2abb63286
Alerting: Introduce proper feature toggles for common state history backend combinations (#65497)
* define 3 feature toggles for rollout phases

* Pass feature toggles along

* Implement first feature toggle

* Try a different strategy with fall-throughs to specific configurations

* Apply toggle overrides once outside of backend composition

* Emit log messages when we coerce backends

* Run code generator for feature toggle files

* Improve wording in flag descs

* Re-run generator

* Use code-generated constants instead of plain strings

* Use converted enum values rather than strings for pre-parsing
2023-03-30 13:53:21 -05:00
Alexander Weaver
5e87ea745d
Alerting: Fix and re-enable filters instance labels in log line test (#65618)
Fix and reenable test
2023-03-30 09:02:18 -05:00
Dimitris Sotirakis
e758b017d0
Alerting: Disable filters instance labels in log line test (#65610)
* Disable filters instance labels in log line test

* Add drone reference
2023-03-30 16:04:29 +03:00
Yuri Tseretyan
9eaffdf5a8
Alerting: Remove dependency on alerting package in definitions (#65390)
* move export rules to definitions package
* move provisioning contact point methods to provisioning package
* move AlertRuleGroupWithFolderTitle to ngalert models and adapter functions to api's compat
* move rule_types files back to where they were before.
2023-03-29 13:34:59 -04:00
Alexander Weaver
a416100abc
Alerting: No longer index state history log streams by instance labels (#65474)
* Remove private labels

* No longer index by instance labels

* Labels are now invariant, only build them once

* Remove bucketing since everything is in a single stream

* Refactor statesToStreams to only return a single unified log stream

* Don't query on labels that no longer exist

* Move selector logic to loki layer, genericize client to work in terms of straight logQL

* Add support for line-level label filters in query

* Combine existing selector tests for better parallelism

* Tests for logQL construction

* Underscore instead of dot for unwrapping labels in logql
2023-03-29 11:52:11 -05:00
Santiago
7b92849508
Alerting: Add CustomDetails field in PagerDuty contact point (#64860)
* Alerting: Add CustomDetails for PagerDuty

* fix default value for 'severity' from 'error' to 'critical'

* minimal docs for notifiers, specifying config for PagerDuty

* replace notifier -> integration

* replace notifier -> integration
2023-03-29 10:35:01 -03:00
Alexander Weaver
de1637afe5
Alerting: Add alert instance labels to Loki log lines in addition to stream labels (#65403)
Add instance labels to log line
2023-03-28 08:57:51 -05:00
Alexander Weaver
dd04757fc9
Alerting: Add "backend" label to state history writes metrics (#65395)
* Add backend label to state history writes metrics

* Update test expectations
2023-03-28 08:49:51 -05:00