Commit Graph

500 Commits

Author SHA1 Message Date
Alexander Weaver
2d7389c34d Alerting: Provisioning API respects global rule quota (#52180)
* Inject interface for quota service and create mock

* Check quota and return 403 if limit exceeded

* Implement tests for quota being exceeded
2022-07-13 17:36:17 -05:00
Yuriy Tseretyan
0d4c503d3d update Evaluator interface to accept context (#52151) 2022-07-13 10:21:11 -04:00
Sofia Papagiannaki
21632817c5 Alerting: Fix invalid swagger specification (#51907)
* Alerting: Fix invalid swagger specification

* Add make targets for validating the generated swagger spec
2022-07-13 12:34:54 +03:00
Yuriy Tseretyan
554ebd647b Alerting: Refactor Evaluator (#51673)
* AlertRule to return condition
* update ConditionEval to not return an error because it's always nil
* make getExprRequest private
* refactor executeCondition to just converter and move execution to the ConditionEval as this makes code more readable.
* log error if results have errors
* change signature of evaluate function to not return an error
2022-07-12 16:51:32 -04:00
Yuriy Tseretyan
a6b1090879 Alerting: refactor scheduler and separate notification logic (#48144)
* Introduce AlertsRouter in the sender package, and move all fields and methods related to notifications out of the scheduler to this router.
* Introduce a new interface AlertsSender in the schedule package and replace calls of anonymous function `notify` inside the ruleRoutine to calling methods of that interface.
* Rename interface Scheduler in api package to ExternalAlertmanagerProvider, and replace scheduler with AlertRouter as struct that implements the interface.
2022-07-12 15:13:04 -04:00
Alexander Weaver
0e066dd5f8 Alerting: Allow filtering of contact points by name (#51933)
* Define query param and regenerate

* Add query struct for contact points

* Filter contact points by name in query

* Document that name filter is optional
2022-07-11 17:11:46 -05:00
Alexander Weaver
fce283d73e Alerting: Add method to reset notification policy tree back to the default (#51934)
* Define route and run codegen

* Wire up HTTP layer

* Update API layer and test fakes

* Implement reset of policy tree

* Implement service layer test and authorization bindings

* API layer testing

* Be more specific when injecting settings
2022-07-08 16:23:18 -05:00
Alexander Weaver
d77731646c Fix consistency errors and regenerate (#51935) 2022-07-08 10:33:43 -05:00
Jean-Philippe Quéméner
4a76436be2 Altering: validate that the mute time intervals exist when updating routing tree (#51573)
* validate that the mute time intervals exist when updating routing tree

* run lint

* add tests
2022-07-05 13:09:17 -04:00
Alexander Weaver
b9c7eb1380 Alerting: Add method to provisioning API for obtaining a group and its rules (#51398)
* Generate shell for new route

* Propagate path parameters

* Implement route logic

* Add a couple simple tests

* Use NotFound error for not found, avoid returning pointer

* Regenerate
2022-07-05 11:53:50 -05:00
Jean-Philippe Quéméner
e64cde8727 Alerting: validate that the receiver exist when updating routing tree (#51561)
* Alerting: validate that the receiver exist when updating routing tree

* rename interface

* add missing file

* change constructor

* fix e2e tests

* only import package once

* add unit test for bug

* wording

* close response body

* Update pkg/services/ngalert/api/tooling/definitions/alertmanager_validation.go

* refactor to remove database roundtrip
2022-07-05 18:09:57 +02:00
Yuriy Tseretyan
8b3b667a47 Alerting: Fix rule API to accept 0 duration of field For (#50992)
* make 'for' pointer to distinguish between missing field and 0
* set 'for' to -1 if the value is missing but not allow negative in the request + path -1 with the value from original rule
* update store validation to not allow negative 'for'
* update usages to use pointer
2022-06-30 11:46:26 -04:00
Kristin Laemmert
9de00c8eb2 chore/backend: move dashboard errors to dashboard service (#51593)
* chore/backend: move dashboard errors to dashboard service

Dashboard-related models are slowly moving out of the models package and into dashboard services. This commit moves dashboard-related errors; the rest will come in later commits.

There are no logical code changes, this is only a structural (package) move.

* lint lint lint
2022-06-30 09:31:54 -04:00
Sofia Papagiannaki
a5924315f8 API: Fix failure to generate swagger specification due to missing binary (#51551)
* Fix swagger generation

Add installing binary as dependency to the target

* Some more fixes
2022-06-30 09:58:07 +03:00
Yuriy Tseretyan
94e709fdcb Alerting: Simplify eval.Evaluator interface (#51463)
* remove ExpressionService from argument list of Evaluator's methods
2022-06-27 17:40:44 -04:00
Kristin Laemmert
945f015770 backend/datasources: move datasources models into the datasources service package (#51267)
* backend/datasources: move datasources models into the datasources service pkg
2022-06-27 12:23:15 -04:00
Yuriy Tseretyan
78c012df65 move eval_conditions to API models package (#51447) 2022-06-27 11:52:41 -04:00
Sofia Papagiannaki
1399ab50b3 API: Universal swagger generation (#51033) 2022-06-27 10:54:31 +03:00
Alexander Weaver
0d9389e1f4 Alerting: Code-gen parsing of URL parameters and fix related bugs (#50731)
* Extend template and generate

* Generate and fix up alertmanager endpoints

* Prometheus routes

* fix up Testing endpoints

* touch up ruler API

* Update provisioning and fix 500

* Drop dead code

* Remove more dead code

* Resolve merge conflicts
2022-06-23 15:13:39 -05:00
Yuriy Tseretyan
4d02f73e5f Alerting: Persist rule position in the group (#50051)
Migrations:
* add a new column alert_group_idx to alert_rule table
* add a new column alert_group_idx to alert_rule_version table
* re-index existing rules during migration

API:
* set group index on update. Use the natural order of items in  the array as group index
* sort rules in the group on GET
* update the version of all rules of all affected groups. This will make optimistic lock work in the case of multiple concurrent request touching the same groups.

UI:
* update UI to keep the order of alerts in a group
2022-06-22 10:52:46 -04:00
Gilles De Mey
81a5436c1e Alerting: Adds Mimir to Alertmanager data source implementation (#50943) 2022-06-20 12:56:38 +02:00
Yuriy Tseretyan
81089b956a Alerting: Update authorization rules for RouteGetNamespaceRulesConfig (#50965)
* use authorizeAccessToRuleGroup
* use toGettableRuleGroupConfig in get by namespace
* add comments for controller methods
2022-06-17 13:55:31 -04:00
Yuriy Tseretyan
c1550d1f07 Alerting: Rule api to fail update if provisioned rules are affected (#50835)
* add function that checks whether changes mention provisioned rules
* update API that updates group of rules to fail if check does not pass
2022-06-15 16:01:14 -04:00
Alexander Weaver
d61d439b11 Handle bsd vs gnu sed (#50641) 2022-06-14 15:35:23 -05:00
Karl Persson
44ffbfd6aa RBAC: Refactor GetUserPermissions to use []accesscontrol.Permission (#50683)
* Return slice of permissions instead of slice of pointers for permissions
2022-06-14 10:17:48 +02:00
Alexander Weaver
17e76b06ff Alerting: Fix rendering issues in OpenAPI docs (#50630)
* Clean up status codes

* Missing consumes tag

* Regenerate

* Fix incorrect documented responses and missing UI elements

* Fix response docs

* Fix wrong response copy paste

* Regenerate

* Temporarily revert
2022-06-13 12:51:07 -05:00
Yuriy Tseretyan
c314ce48c7 Alerting: Support for optimistic locking for alert rules (#50274)
* add support for optimistic locking for alert_rule table
* return 409 in the case of opitimistic lock
2022-06-13 12:15:28 -04:00
Jean-Philippe Quéméner
1ed7280363 Alerting: add right provenance when creating mute timings (#50707) 2022-06-13 18:05:41 +02:00
Jean-Philippe Quéméner
862f51216b Alerting: improve provisioning docs (#50347)
* Alerting: improve provisioning docs

* add new provisioning page

* add api docs

* fix formatting and add better descriptions

* fix typo
2022-06-10 16:25:15 +02:00
Alexander Weaver
7dd78fee2c Alerting: Fix provisioning validation status codes and panics (#50464)
* Updates to all except alert rules

* Return 400 when rules fail to validate, add testinfra

* More sane package aliases

* More package alias renames

* One more bug in contact point validation

* remove unused function

Co-authored-by: Jean-Philippe Quémémer <jeanphilippe.quemener@grafana.com>
Co-authored-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com>
2022-06-09 10:38:46 +02:00
Jean-Philippe Quéméner
cf684ed38f Alerting: bump rule version when updating rule group interval (#50295)
* Alerting: move group update to alert rule service

* rename validateAlertRuleInterval to validateRuleGroupInterval

* init baseinterval correctly

* add seconds suffix

* extract validation function for reusability

* add context to err message
2022-06-09 09:28:32 +02:00
Yuriy Tseretyan
54fa04263b Alerting: Add RBAC actions and role for provisioning API routes (#50459)
* add alert provisioning actions and role

* linter
2022-06-09 09:18:57 +02:00
Alexander Weaver
28a47b56d2 Bump provisioning to admin-only in lieu of dedicated RBAC permissions (#50366) 2022-06-07 17:26:48 -05:00
gotjosh
0cde283505 Alerting: Logs should not be capitalized and the errors key should be "err" (#50333)
* Alerting: decapitalize log lines and use "err" as the key for errors

Found using (logger|log).(Warn|Debug|Info|Error)\([A-Z] and (logger|log).(Warn|Debug|Info|Error)\(.+"error"
2022-06-07 19:54:23 +02:00
Jean-Philippe Quéméner
4b8a4449ed Alerting: remove feature toggle for provisioning API (#50167)
* Alerting: remove feature toggle for provisioning API

* remove missed code parts

* remove unused import

* remove empty line

* mark routes as stable
2022-06-05 07:45:36 +02:00
Jean-Philippe Quéméner
4cc8c6f745 Alerting: Add provenance guard to config api (#50147)
* Alerting: add provenance guard to config api

* add tests

* only guard if config valid

* adapt error message

* simplify logic

* rename arguments

* make logic more straight forward

* rename opt to options

* remove useless maps
2022-06-04 14:55:46 +02:00
Jean-Philippe Quéméner
d2f3631a47 Alerting: add mute timings provenance to config api (#50149) 2022-06-03 19:32:31 +02:00
Alexander Weaver
67290aa49f Alerting: Add version segment to all provisioning routes (#49121)
Co-authored-by: Jean-Philippe Quémémer <jeanphilippe.quemener@grafana.com>
2022-06-03 16:45:08 +02:00
Gilles De Mey
e6ceee501f Alerting: Use correct permission scope for external AM updates (#50159)
Co-authored-by: konrad147 <konrad.lalik@grafana.com>
2022-06-03 15:12:34 +02:00
Jean-Philippe Quéméner
468ed68d64 Alerting: allow custom UID for contact points through API (#50089)
* Alerting: allow custom UID for contact points through API

* fix auth
2022-06-03 10:33:47 +02:00
Jean-Philippe Quéméner
81d360529b Alerting: Provisioning API - Alert rules (#47930) 2022-06-02 14:48:53 +02:00
gotjosh
1a50b0dbb7 Alerting: Remove double quotes from matchers (#50038)
* Alerting: Remove double quotes from matchers

With #38629 a new Alertmanager configuration object was introduced with `object_matchers`, it was meant to circumvent around the fact that Prometheus label names don't support a set of characters that Grafana needs to support for alerts, silences, matchers, etc. (with a common example being elasticsearch's `.`).
This new object does not include the label of sanitzation or validation that its Prometheus equivalent supports in `matchers` and therefore are semantically not equivalent.

This triggered the problem that when the migration is run, we use `matchers` as the object to populate in configuration for routing policies, but when the UI does its first save this object is transformed to `object_matchers`.

Matchers that were previously running just fine would immediately stop working as soon as the configuration is saved.

This problem surfaced with the introduction of #49952 where we stopped stripping double quotes from matchers (not just regex but _all_ of them).

* Add comment explaining rationale and future removal

Co-authored-by: Alex Weaver <weaver.alex.d@gmail.com>
2022-06-01 16:05:24 -05:00
Yuriy Tseretyan
ad25e2a20c Alerting: Update RBAC for alert rules to consider access to rule as access to group it belongs (#49033)
* update authz to exclude entire group if user does not have access to rule
* change rule update authz to not return changes because if user does not have access to any rule in group, they do not have access to the rule
* a new query that returns alerts in group by UID of alert that belongs to that group
* collect all affected groups during calculate changes
* update authorize to check access to groups
* update tests for calculateChanges to assert new fields
* add authorization tests
2022-06-01 10:23:54 -04:00
Alexander Weaver
909ebcf979 Alerting: Endpoints for provisioning mute timings (#49635)
* Add validator for mute timing and make it provisionable

* Add tests to ensure prometheus validators are running and errors are propagated

* Internal API for manipulating mute timings

* Define and generate API layer

* Wire up generated code

* Implement API handlers

* Tests for golang layer

* Fix reference bug

* Fix linter and auth tests

* Resolve semantic errors and regenerate

* Remove pointless comment

* Extract out provisioning path param keys, simplify

* Expected number of paths
2022-05-26 14:24:34 -05:00
Sofia Papagiannaki
7cf321d7bd Alerting: Fix swagger specification (#49273)
* Alerting: fix specification

* Update merged swagger specification
2022-05-26 14:43:59 +03:00
Alexander Weaver
ac8951f689 Alerting: Add support for documenting which alerting APIs are stable (#49018)
* Support for documenting stable vs unstable alerting routes

* empty commit, restart drone

* Touch-up references in root makefile and drop trailing escape newline

* Rebase and regenerate

* Extend README with docs for this change
2022-05-23 14:08:27 -05:00
Yuriy Tseretyan
3dfafbadef Alerting: Fix access to alerts for viewer with editor permissions when RBAC is disabled (#49270)
* Add folder edit permission for users with Viewer role
* relax permissions required to create an alert when RBAC is disabled
2022-05-23 09:58:20 -04:00
Joe Blubaugh
1cc034d960 Alerting: Add a "Reason" to Alert Instances to show underlying cause of state. (#49259)
This change adds a field to state.State and models.AlertInstance
that indicate the "Reason" that an instance has its current state. This
helps us account for cases where the state is "Normal" but the
underlying evaluation returned "NoData" or "Error", for example.

Fixes #42606

Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
2022-05-23 16:49:49 +08:00
Joe Blubaugh
12c25759da Alerting: Attach screenshot data to Slack notifications. (#49374)
This change extracts screenshot data from alert messages via a private annotation `__alertScreenshotToken__` and attaches a URL to a Slack message or uploads the data to an image upload endpoint if needed.

This change also implements a few foundational functions for use in other notifiers.
2022-05-23 14:24:20 +08:00
Yuriy Tseretyan
258b3ab18b Alerting: Fix RBAC actions for notification policies (#49185)
* squash actions "alert.notifications:update", "alert.notifications:create", "alert.notifications:delete" to "alert.notifications:write"
* add migration
* update UI to use the write action
* update docs
* changelog
2022-05-20 10:55:07 -04:00