Commit Graph

86 Commits

Author SHA1 Message Date
Ganesh Vernekar
e19c690426
Alerting: Fix potential panic in Alertmanager when starting up (#36562)
* Alerting: Fix potential panic in Alertmanager when starting up

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix reviews

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-07-12 11:15:16 +05:30
Sofia Papagiannaki
fc90d47863
Alerting API: Restrict access to Alertmanager configuration (#36507)
* Alerting API: Restrict access to Alertmanager configuration to viewers
2021-07-07 16:29:18 +03:00
Sofia Papagiannaki
8a3edf280e
Alerting: Fix prometheus API to check folder permissions (#36301) 2021-07-05 10:49:14 +03:00
Sofia Papagiannaki
abe35c8c01
Alerting: Add error recovery during rule evaluations (#35450)
* Alerting: Eval recovery after query failure

* Apply suggestions from code review
2021-06-15 19:30:21 +03:00
gotjosh
f7ed35336d
Alerting: Implement /status for the notification system (#33227)
* Alerting: Implement /status for the notification system

Implements the necessary plumbing to have a /status endpoint on the
notification system.

* Add API examples

* Update API specs

* Update prometheus/common dependency

Co-authored-by: Sofia Papagiannaki <sofia@grafana.com>
2021-06-15 19:14:02 +03:00
Sofia Papagiannaki
fba90b8f9b
Alerting: Recact html responses (#35277) 2021-06-04 20:57:24 +03:00
Sofia Papagiannaki
15c55b0115
Alerting: Fix notification channel migration and handle case when Alertmanager default configuration is absent (#35086)
* Fix dashboard alert and nootifier migration for MySQL

* Fix POSTing Alertmanager configuration if no current configuration exists

in case the default configuration has not be stored yet
or has failed to get stored

* Change CreatedAt field type
2021-06-04 15:52:41 +03:00
Sofia Papagiannaki
355be158b7
[Alerting]: fix/cleanup API examples (#34588) 2021-05-31 11:18:29 +03:00
Owen Diehl
9aca032d10
Alerting/consistent api errors (#34858)
* consolidates alertmanager api errors

* util & testing consistent errors

* consistent errors for rest of ngalert apis

* updates expected errors in testware

* bump ci

* linting

* unrelated: dashboard.go lint
2021-05-28 11:55:03 -04:00
Domas
347273cdea
Alerting: check upstream response content type in lotex proxy (#34760) 2021-05-27 14:12:29 +03:00
Owen Diehl
0e0ed43153
Alerting/testing promql extraction (#34665)
* promql compat for marshaling

* extracts upstream instant queries into data frame for alerting

* eval string parity
2021-05-25 11:54:50 -04:00
Owen Diehl
1d2febfa85
[Alerting] Route validations (#34393)
* more routing validation

* go mod

* recursive route validations
2021-05-19 10:36:28 -04:00
Owen Diehl
d6c4c2fcd5
[Alerting] Ensure upstream validations are run (#34333)
* use embedded validations via noop yaml unmarshaler

* lint

* fixes integration tests now that groupings are handled
2021-05-19 06:22:44 -04:00
David Parrott
bbb7bbf891
Alerting: Remove back end logic for supporting KeepLastState (#34242)
* Removed back end logic for supporting KeepLastState

* Map keep_state correctly in migrations
2021-05-18 10:55:43 -07:00
Sofia Papagiannaki
11243dec14
[Alerting]: Assign UUID to grafana receivers (#34241)
* [Alerting]: Assign UUID to grafana receivers

* Apply suggestions from code review

* Add test for updating invalid receiver

Co-authored-by: Domas <domasx2@gmail.com>
2021-05-18 17:31:00 +03:00
Kyle Brandt
63b2dd06a5
Alerting: Set "value" with evalmatches in G Managed (#34075)
When, and currently only when using a classic condition, evaluation information is added (which is like the EvalMatches from dashboard alerting).

This is returned via the API and can be included in notifications by reading the `__value__` label attached `.Alerts` in the template. It is a string.
2021-05-18 09:12:39 -04:00
Ganesh Vernekar
89c2b5e863
NGAlert: Remove unwanted fields from notification channel config (#34036)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-05-18 10:04:47 +02:00
gotjosh
eb74994b8b
Alerting: Modify configuration apply and save semantics - v2 (#34143)
* Save default configuration to the database and copy over secure settings
2021-05-14 19:49:54 +01:00
Owen Diehl
3b06f52bab
Alerting/allow empty receiver (#33962)
* simplifies yaml unmarshaling: PostableApiReceiver

* allow empty receiver type

* allows name only receivers (blackhole)

* better receiver type parsing

* linting
2021-05-12 07:58:16 -04:00
Sofia Papagiannaki
f4750fb3c8
[Alerting]: Alertmanager API apply permissions (#33843)
* [Alerting]: Alertmanager API apply permissions

* Apply suggestions from code review
2021-05-11 11:31:38 +03:00
Owen Diehl
e18ca8f6f2
enforce receivers align with backend type when posting AM config (#33877) 2021-05-10 16:58:41 -04:00
Sofia Papagiannaki
1c58fd380f
[Alerting]: store encrypted receiver secure settings (#33832)
* [Alerting]: Store secure settings encrypted

* Move encryption to the API handler
2021-05-10 15:30:42 +03:00
David Parrott
e58aca2d20
Alerting: remove instances from db and cache on rule update (#33722)
* remove instances from db and cache on rule update

* fix panic

* rename
2021-05-06 18:39:34 +02:00
Owen Diehl
a5ae8cf377
Unredact/secret (#33723)
* no longer redacts GETing proxied AM configs

* removes unused testfile

* testware fix

* consistently roundtrips yaml<>json and doesnt redact secrets

* lint
2021-05-05 16:21:53 -04:00
David Parrott
b1a8c67689
Alerting return evaluation errors to /rules (#33663)
* Set and return errors produced by evaluation results

* test fixup
2021-05-04 13:08:12 -04:00
David Parrott
39099bf3c0
Alerting nested state cache (#33666)
* nest cache by orgID, ruleUID, stateID

* update accessors to use new cache structure

* test and linter fixup

* fix panic

Co-authored-by: Kyle Brandt <kyle@grafana.com>

* add comment to identify what's going on with nested maps in cache

Co-authored-by: Kyle Brandt <kyle@grafana.com>
2021-05-04 09:57:50 -07:00
Sofia Papagiannaki
540f110220
[Alerting]: Extend quota service to optionally set limits on alerts (#33283)
* Quota: Extend service to set limit on alerts

* Add test for applying quota to alert rules

* Apply suggestions from code review

Co-authored-by: Diana Payton <52059945+oddlittlebird@users.noreply.github.com>

* Get used alert quota only if naglert is enabled

* Set alert limit to zero if nglalert is not enabled
Co-authored-by: Diana Payton <52059945+oddlittlebird@users.noreply.github.com>
2021-05-04 19:16:28 +03:00
Kyle Brandt
48358efc13
Alerting: remove State cache entries on Ruler Delete (#33638)
for https://github.com/grafana/alerting-squad/issues/133
2021-05-03 14:01:33 -04:00
Kyle Brandt
c1034f3118
Alerting: Create instanceStore (#33587)
for https://github.com/grafana/alerting-squad/issues/129
2021-05-03 07:19:15 -04:00
Kyle Brandt
b8f01fe034
Alerting: backend "ng" code cleanup (#33578) 2021-04-30 13:21:57 -04:00
Owen Diehl
5e48b54549
Alerting/metrics (#33547)
* moves alerting metrics to their own pkg

* adds grafana_alerting_alerts (by state) metric

* alerts_received_{total,invalid}

* embed alertmanager alerting struct in ng metrics & remove duplicated notification metrics (already embed alertmanager notifier metrics)

* use silence metrics from alertmanager lib

* fix - manager has metrics

* updates ngalert tests

* comment lint
Signed-off-by: Owen Diehl <ow.diehl@gmail.com>

* cleaner prom registry code

* removes ngalert global metrics

* new registry use in all tests

* ngalert metrics impl service, hack testinfra code to prevent duplicate metric registrations

* nilmetrics unexported
2021-04-30 12:28:06 -04:00
Sofia Papagiannaki
1e380e869e
[Alerting]: some fixes (#33538)
* Fix fialure when adding state annotations

* Fix get org rules API

Do not fail response if user has no access to view a namespace.
Do not include the namespace in the response instead.

* lint
2021-04-29 19:15:15 +03:00
Owen Diehl
ec37b4cb87
[Alerting] Automatic request instrumentation (#33444)
* alerting: automatic request instrumentation

* always expose alerting prom metrics

* globally register alerting metrics
2021-04-28 16:59:15 -04:00
Sofia Papagiannaki
7ccb022c03
Alerting: validate condition before updating rulegroup (#33367)
* Alerting: validate condition before updating rulegroup

* Apply suggestions from code review
2021-04-28 11:31:51 +03:00
Kyle Brandt
b590e95682
AlertingAPI: Change list response query prop (#33419)
* Alerting: change to full []AlertQuery as json in a string and not just model.
2021-04-27 22:15:00 +02:00
Ganesh Vernekar
467ab124dd
NGAlert: Fix GET for Alertmanager config (#33379)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-04-27 20:48:19 +05:30
Kyle Brandt
adcba36d39
AlertingAPI: update swagger json files match datasourceUid change (#33332)
* update swagger json files match datasourceUid change
underlying change made in https://github.com/grafana/grafana/pull/33282
* Document DatasourceUID field in AlertQuery model
* Run spec generation from inside a docker container
* Generate latest spec

Co-authored-by: Sofia Papagiannaki <sofia@grafana.com>
2021-04-27 16:50:30 +02:00
Owen Diehl
86c8eed386
Instrument/ruler api (#33290)
* ruler api histogram instrumentation

* register ruler metrics
2021-04-27 08:25:32 -04:00
David Parrott
788bc2a793
Alerting: refactor state tracker (#33292)
* set processing time

* merge labels and set on response

* use state cache for adding alerts to rules

* minor cleanup

* add support for NoData and Error results

* rename test

* bring in changes from other PRs tha have been merged

* pr feedback

* add integration test

* close state tracker cleanup on context.Done

* fixup test

* rename state tracker

* set EvaluationDuration on Result

* default labels set as constants

* separate cache and state from manager

* use RWMutex in cache
2021-04-23 21:32:25 +02:00
David Parrott
ca79206498
Alerting: Handle NoData and Error evaluation results (#33194)
* set processing time

* merge labels and set on response

* use state cache for adding alerts to rules

* minor cleanup

* add support for NoData and Error results

* rename test

* bring in changes from other PRs tha have been merged

* pr feedback

* add integration test

* close state tracker cleanup on context.Done

* fixup test

* not those annotations
2021-04-23 20:47:52 +02:00
Kyle Brandt
5e818146de
Alerting/Expr: New SSE Request/QueryType, alerting move data source UID (#33282) 2021-04-23 16:52:32 +02:00
Sofia Papagiannaki
b2288f7ef9
[Alerting]: Add alerting endpoint for Query Evaluation (#33174)
* [Alerting]: Add alerting endpoint for Query Evaluation

* Fix passing down now parameter

* Add validations and test

* Fix eval queries and expressions test

* Add eval tests
2021-04-21 22:44:50 +03:00
David Parrott
4be1d84f23
Alerting: Enhancements to /rules (#33085)
* set processing time

* merge labels and set on response

* use state cache for adding alerts to rules

* minor cleanup

* pr feedback

* Do not initialize mutex unnecessarily

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>

* linter

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
2021-04-21 09:30:03 -07:00
Sofia Papagiannaki
87a70af7eb
[Alerting]: Fix updating rule group and add tests (#33074)
* [Alerting]: Fix updating rule group and add test

* Fix updating rule labels
* Set default values for rule no data and error states
if they are missing
* Add test for updating rule

* Test updating annotations

* Apply suggestions from code review

Co-authored-by: gotjosh <josue@grafana.com>

* add test for posting an unknown rule UID

* Fix alert rule validation and add tests

* Remove org id from PostableGrafanaRule

This field was not used; each rule gets the organisation of the user making
the rerquest

* Update pkg/tests/api/alerting/api_alertmanager_test.go

Co-authored-by: gotjosh <josue@grafana.com>
2021-04-21 17:22:58 +03:00
gotjosh
23c7e7ab60
Alerting: Various fixes for the alerts endpoint (#33182)
A set of fixes for the GET alert and groups endpoints.

- First, is the fact that the default values where not being for the query params. I've introduced a new method in the Grafana context that allow us to do this.
- Second, is the fact that alerts were never being transitioned to active. To my surprise this is actually done by the inhibitor in the pipeline - if an alert is not muted, or inhibited then it's active.
- Third, I have added an integration test to cover for regressions.

Signed-off-by: Josue Abreu <josue@grafana.com>
2021-04-21 06:34:42 -04:00
Owen Diehl
e065e19583
Fix/ngalert generation (#33172)
* fixes pkg names & alerting openapi generation

* cleans up api generation, uses docker & removes python
2021-04-20 13:12:32 -04:00
Oscar Kilhed
bc2d90f140
Fix lint issue in cortex ruler test (#33158) 2021-04-20 14:03:58 +02:00
Owen Diehl
e37a780e14
Inhouse alerting api (#33129)
* init

* autogens AM route

* POST dashboards/db spec

* POST alert-notifications spec

* fix description

* re inits vendor, updates grafana to master

* go mod updates

* alerting routes

* renames to receivers

* prometheus endpoints

* align config endpoint with cortex, include templates

* Change grafana receiver type

* Update receivers.go

* rename struct to stop swagger thrashing

* add rules API

* index html

* standalone swagger ui html page

* Update README.md

* Expose GrafanaManagedAlert properties

* Some fixes

- /api/v1/rules/{Namespace} should return a map
- update ExtendedUpsertAlertDefinitionCommand properties

* am alerts routes

* rename prom swagger section for clarity, remove example endpoints

* Add missing json and yaml tags

* folder perms

* make folders POST again

* fix grafana receiver type

* rename fodler->namespace for perms

* make ruler json again

* PR fixes

* silences

* fix Ok -> Ack

* Add id to POST /api/v1/silences (#9)

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Add POST /api/v1/alerts (#10)

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* fix silences

* Add testing endpoints

* removes grpc replace directives

* [wip] starts validation

* pkg cleanup

* go mod tidy

* ignores vendor dir

* Change response type for Cortex/Loki alerts

* receiver unmarshaling tests

* ability to split routes between AM & Grafana

* api marshaling & validation

* begins work on routing lib

* [hack] ignores embedded field in generation

* path specific datasource for alerting

* align endpoint names with cloud

* single route per Alerting config

* removes unused routing pkg

* regens spec

* adds datasource param to ruler/prom route paths

* Modifications for supporting migration

* Apply suggestions from code review

* hack for cleaning circular refs in swagger definition

* generates files

* minor fixes for prom endpoints

* decorate prom apis with required: true where applicable

* Revert "generates files"

This reverts commit ef7e975584.

* removes server autogen

* Update imported structs from ngalert

* Fix listing rules response

* Update github.com/prometheus/common dependency

* Update get silence response

* Update get silences response

* adds ruler validation & backend switching

* Fix GET /alertmanager/{DatasourceId}/config/api/v1/alerts response

* Distinct gettable and postable grafana receivers

* Remove permissions routes

* Latest JSON specs

* Fix testing routes

* inline yaml annotation on apirulenode

* yaml test & yamlv3 + comments

* Fix yaml annotations for embedded type

* Rename DatasourceId path parameter

* Implement Backend.String()

* backend zero value is a real backend

* exports DiscoveryBase

* Fix GO initialisms

* Silences: Use PostableSilence as the base struct for creating silences

* Use type alias instead of struct embedding

* More fixes to alertmanager silencing routes

* post and spec JSONs

* Split rule config to postable/gettable

* Fix empty POST /silences payload

Recreating the generated JSON specs fixes the issue
without further modifications

* better yaml unmarshaling for nested yaml docs in cortex-am configs

* regens spec

* re-adds config.receivers

* omitempty to align with prometheus API behavior

* Prefix routes with /api

* Update Alertmanager models

* Make adjustments to follow the Alertmanager API

* ruler: add for and annotations to grafana alert (#45)

* Modify testing API routes

* Fix grafana rule for field type

* Move PostableUserConfig validation to this library

* Fix PostableUserConfig YAML encoding/decoding

* Use common fields for grafana and lotex rules

* Add namespace id in GettableGrafanaRule

* Apply suggestions from code review

* fixup

* more changes

* Apply suggestions from code review

* aligns structure pre merge

* fix new imports & tests

* updates tooling readme

* goimports

* lint

* more linting!!

* revive lint

Co-authored-by: Sofia Papagiannaki <papagian@gmail.com>
Co-authored-by: Domas <domasx2@gmail.com>
Co-authored-by: Sofia Papagiannaki <papagian@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Co-authored-by: gotjosh <josue@grafana.com>
Co-authored-by: David Parrott <stomp.box.yo@gmail.com>
Co-authored-by: Kyle Brandt <kyle@grafana.com>
2021-04-19 14:26:04 -04:00
Kyle Brandt
2c862678ab
AlertingNG/SSE: Datasource UID and UID/ID only (drop name) (#33039)
SSE still will support ID until dashboard/frontend always requests UID
update *.http examples

Co-authored-by: gotjosh <josue@grafana.com>
2021-04-16 15:29:19 +02:00
gotjosh
362c4d4276
Alerting: Integration test rule creation (#33047)
* Alerting: Integration test rule creation

* Appease the linter

* Cleanup

* Make anonymous user role a parameter
2021-04-16 08:00:07 -04:00