Commit Graph

349 Commits

Author SHA1 Message Date
George Robinson
a9399ab3cd
Alerting: Add context.Context to RuleStore (#45004)
Alerting: Add context.Context to RuleStore
2022-02-08 08:52:03 +00:00
Alexander Weaver
935059a376
Alerting: Create basic storage layer for provisioning (#44679)
* Simplistic store API for provenance lookups on arbitrary types

* Add a few notes in comments

* Improved type safety for provisioned objects

* Clean-up TODOs for future PRs

* Clean up provisioning model

* Clean up tests

* Restrict allowable types in interface

* Fix linter error

* Move AlertRule domain methods to same file as AlertRule definition

* Update pkg/services/ngalert/models/provisioning.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Complete interface rename

* Pass context through store API

* More idiomatic method names

* Better error description

* Improve code-docs

* Use ORM language instead of raw sql

* Add support for records in different orgs

* ResourceTypeID -> ResourceType since it's not an ID

Co-authored-by: George Robinson <george.robinson@grafana.com>
2022-02-04 13:23:19 -06:00
Yuriy Tseretyan
ddfe2dce74
Alerting: Split grafana and lotex routes (#44742)
* split Lotex and Grafana routes
* update template to use authorize function for every route
2022-02-04 12:42:04 -05:00
idafurjes
7a23700e1a
Remove unused GetDashboard method (#44890)
* Remove unused GetDashboard method

* Uncomment test

* Fix dashboard service integration test

* Remove comment
2022-02-04 17:21:06 +01:00
George Robinson
9df43abbb5
Fix evaluation of alert rules for datasources with custom headers (#44862)
* Fix evaluation of alert rules for datasources with custom headers

* Fix unit tests

* Fix integration tests

* Evaluator fields should be package private
2022-02-04 14:56:37 +01:00
Yuriy Tseretyan
984c95de63
Do not store EvaluationString in Evaluation. (#44606)
* do not store evaluation string in Evaluation.
* reduce number of buckets to store for a single state
2022-02-02 19:18:20 +01:00
George Robinson
924deda589
Fix Discord Webhook URL for invalid template (#44763)
This commit fixes an issue where an invalid template for Discord would change the Webhook URL to "" and cause "unsupported protocol scheme" errors.
2022-02-02 14:28:41 +01:00
Santiago
04d93751b8
Alerting: send alerts to external, internal, or both alertmanagers (#40341)
* (WIP) send alerts to external, internal, or both alertmanagers

* Modify admin configuration endpoint, update swagger docs

* Integration test for admin config updated

* Code review changes

* Fix alertmanagers choice not changing bug, add unit test

* Add AlertmanagersChoice as enum in swagger, code review changes

* Fix API and tests errors

* Change enum from int to string, use 'SendAlertsTo' instead of 'AlertmanagerChoice' where necessary

* Fix tests to reflect last changes

* Keep senders running when alerts are handled just internally

* Check if any external AM has been discovered before sending alerts, update tests

* remove duplicate data from logs

* update comment

* represent alertmanagers choice as an int instead of a string

* default alertmanagers choice to all alertmanagers, test cases

* update definitions and generate spec
2022-02-01 20:36:55 -03:00
George Robinson
5e2280ceee
Add metrics to ngalert scheduler (#44602)
This pull request adds metrics to the ngalert scheduler so we can see how long it takes to evaluate a tick.
2022-01-31 16:56:43 +00:00
idafurjes
12420260ef
Remove bus from org invite api (#44530)
* Remove bus from org invite api

* Fix lint

* Remove comment
2022-01-31 17:24:52 +01:00
Serge Zaitsev
84a5910e56
Chore: Remove bus from ngalert (#44465)
* pass notification service down to the notifiers

* add ns to all notifiers

* remove bus from ngalert notifiers

* use smaller interfaces for notificationservice

* attempt to fix the tests

* remove unused struct field

* simplify notification service mock

* trying to resolve issues in the tests

* make linter happy

* make linter even happier

* linter, you are annoying
2022-01-26 16:42:40 +01:00
Jean-Philippe Quéméner
8ee3f59cd4
Alerting: recognize Cortex datasources correctly in the frontend (#44316)
* Alerting: always use msg field for user facing errors

* fix: revert front-end Cortex detection

Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
2022-01-21 15:44:11 +01:00
ying-jeanne
7422789ec7
Remove Macaron ParamsInt64 function from code base (#43810)
* draft commit

* change all calls

* Compilation errors
2022-01-15 00:55:57 +08:00
Yuriy Tseretyan
ed5c664e4a
Alerting: Stop firing of alert when it is updated (#39975)
* Update API to call the scheduler to remove\update an alert rule. When a rule is updated by a user, the scheduler will remove the currently firing alert instances and clean up the state cache. 
* Update evaluation loop in the scheduler to support one more channel that is used to communicate updates to it.
* Improved rule deletion from the internal registry. 
* Move alert rule version from the internal registry (structure alertRuleInfo) closer rule evaluation loop (to evaluation task structure), which will make the registry values immutable.
* Extract notification code to a separate function to reuse in update flow.
2022-01-11 11:39:34 -05:00
Yuriy Tseretyan
ea478dec22
Alerting: Remove bridge between log15 and go-kit logger (#43769)
* remove bridge between log15 and go-kit logger.

* fix tests
2022-01-07 09:40:09 +01:00
ying-jeanne
a8eef45a44
Logger migration from log15 to gokit/log (#41636)
* migrate log15 to gokit/log

* fix console log

* update some unittest

* fix all unittest

* fix the build

* Update pkg/infra/log/log.go

Co-authored-by: Yuriy Tseretyan <tceretian@gmail.com>

* general type vector

* correct the level key

Co-authored-by: Yuriy Tseretyan <tceretian@gmail.com>
2022-01-06 22:28:05 +08:00
Alexander Weaver
fd583a0e3b
Alerting: Allow customization of Google chat message (#43568)
* Allow customizable googlechat message via optional setting

* Add optional message field in googlechat contact point configurator

* Fix strange error message on send if template fails to fully evaluate

* Elevate template evaluation failure logs to Warn level

* Extract default.title template embed from all channels to shared constant
2022-01-05 09:47:08 -06:00
idafurjes
8e6d6af744
Rename DispatchCtx to Dispatch (#43563) 2021-12-28 17:36:22 +01:00
idafurjes
7936c4c522
Rename AddHandlerCtx to AddHandler (#43557) 2021-12-28 16:08:07 +01:00
idafurjes
56c3875bb9
Chore: Remove context.TODO (#43458)
* Remove context.TODO() from services

* Fix live test
2021-12-28 10:26:18 +01:00
Alexander Weaver
56b3dc5445
Alerting: Allow configuration of non-ready alertmanagers (#43063)
* Create API test for overwriting invalid alertmanager config

* Avoid requiring alertmanager readiness for config changes

* AlertmanagerSrv depends on functionality rather than concrete types

* Add test for non-ready alertmanagers

* Additional cleanup and polish

* Back out previous integration test changes

* Refactor of tests incorrectly caused a test to become redundant

* Use pre-existing fake secret service

* Drop unused interface

* Test against concrete MultiOrgAlertmanager re-using fake infra from other tests

* Fix linter error

* Empty commit to rerun checks
2021-12-27 17:01:17 -06:00
Alexander Weaver
9abdaf251f
Alerting: Fix global state sensitivity in notifier channel tests (#43508) 2021-12-27 11:58:17 -06:00
idafurjes
b8852ef6a3
Chore: Remove context.TODO() (#43409)
* Remove context.TODO() from services

* Fix live test

* Remove context.TODO
2021-12-22 11:02:42 +01:00
Jean-Philippe Quéméner
ffc72aa255
Alerting: fix gosec warning that is not valid (#43425) 2021-12-21 19:47:47 +01:00
idafurjes
ff3cf94b56
Chore: Remove context.TODO() from services (#42555)
* Remove context.TODO() from services

* Fix live test
2021-12-20 17:05:33 +01:00
Yuriy Tseretyan
1a762083d7
Alerting: make alert rule routine evaluation control be thread-safe (#41220)
* change registry.delete to return deleted struct
* use pointer to alertRuleInfo instead copying.
* do not access evaluation channel when routine is stopped
* remove stopCh and use context cancellation
* do not return ctx.Err when channel is cancelled because it cancels all other routines
* make alertRuleInfo fields and functions package private
2021-12-16 14:52:47 -05:00
Ryan McKinley
2754e4fdf0
Expressions: use datasource model from the query (#41376)
* refactor datasource loading

* refactor datasource loading

* pass uid

* use dscache in alerting to get DS

* remove expr/translate pacakge

* remove dup injection entry

* fix DS type on metrics endpoint, remove SQL DS lookup inside SSE

* update test and adapter

* comment fix

* Make eval run as admin when getting datasource info

Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>

* fmt and comment

* remove unncessary/redundant code

Co-authored-by: Kyle Brandt <kyle@grafana.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
2021-12-16 13:51:46 -03:00
Jean-Philippe Quéméner
b605340668
Alerting: log errors happening in the API on server side (#43192)
* Alerting: log errors happening in the API on server side

* adapt tests to reflect changed payload
2021-12-16 13:33:10 +01:00
Gilles De Mey
bb3b5c10e7
Alerting: fix WeCom channel notifier test assertion (#43173) 2021-12-15 19:45:12 +01:00
Gilles De Mey
cbbbb505b4
Alerting: use HTML-safe characters for the default template (#43148) 2021-12-15 17:57:08 +01:00
smallpath
aec14cba42
Alerting: Support WeCom as a contact point type (#40975)
* add wecom notifier

* fix backend lint

* fix alerting channel test

* update wecom doc

* update notifiers

* update wecom notifier test

* Apply suggestions from code review

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* unify wecom alerting

* fix backend lint

* fix front lint

* fix wecom test

* update docs

* Update pkg/services/ngalert/notifier/channels/wecom.go

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* Update docs/sources/alerting/old-alerting/notifications.md

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* Update docs/sources/alerting/old-alerting/notifications.md

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* Update docs/sources/alerting/old-alerting/notifications.md

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* remove old wecom notifier

* remove old notifier doc

* fix backend test

* Update docs/sources/alerting/unified-alerting/contact-points.md

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* fix doc style

Co-authored-by: gotjosh <josue.abreu@gmail.com>
2021-12-15 16:42:03 +00:00
Yuriy Tseretyan
1db9b1e6a9
Improve bridge for Alertmanager logger (#42958)
* Implement go-kit/log.Logger for internal logger.
2021-12-13 09:41:53 -05:00
Sofia Papagiannaki
c6483cd8ed
Alerting: Refactor API handlers to use web.Bind (#42600)
* Alerting: Refactor API handlers to use web.Bind

* lint
2021-12-13 09:22:57 +01:00
gotjosh
bdab1d1f1f
Fix flaky tests in several notifiers (#42668)
* Fix flaky tests in several notifiers

- Non-mocked time in sensu go tests
- Close server in Slack tests
- Use a mutex for writing responses in the fake slack server

* Remove mutex at the fake slack server
2021-12-03 12:34:31 +00:00
George Robinson
c932dc959c
Alerting: Add Ref ID to DatasourceNoData and DatasourceError alerts (#42630) 2021-12-03 09:55:16 +00:00
gotjosh
5b64c4f684
Alerting: Fix panic while proxying 4xx responses of requests to cortex/loki (#42570)
Fixes a panic that would ocurr as we proxy 4xx responses. When this happens and the content type of the response is JSON we try to check if the response has a "message" key. Then, we assume that the key will contain a value of string but we don't take into account that this value can potentially be `null`.

This adds a type assertion check to to this assumption so that we can keep the original JSON body as the response if we're unable to extract an `message`.
2021-12-01 13:53:29 +00:00
gotjosh
357e9ed1ea
Alerting: Fix Annotation Creation when the alerting state changes (#42479)
* Fix Annotation creation
- Remove validation of panelID, now annotations are created irrespective on whether they're attached to a panel or not.
- Alwasy attach the annotation to an AlertID

* Fix annotation creation

* fix tests
2021-12-01 11:04:54 +00:00
Sofia Papagiannaki
9c7b52fd36
Alerting: Fix API specification (#42282)
* Alerting: Fix API specification
2021-11-30 20:55:54 +01:00
Santiago
a21d1e50f1
avoid template execution errors on missing values (#41617) 2021-11-29 15:26:51 -03:00
gotjosh
dd5a2e5128
Alerting: Clear alerting rule evaluation errors after intermittent failures (#42386)
* Alerting: Clear alerting rule evaluation errors after intermittent failures

When an alert transitioned in a way that `alerting -> error -> (alerting|nodata)`, the error provided by the `error` state would never be cleared thus the API and UI would show the health as an error.
2021-11-26 17:58:19 +00:00
George Robinson
1b26d4d88e
Alerting: Create DatasourceError alert if evaluation returns error (#41869)
* Alerting: Create DatasourceError alert if evaluation returns error

* Alerting: Add docs for DatasourceError alert

* Alerting: Fix DatasourceError alert does not have dashboard_uid label

* Alerting: Add break when datasource_uid found

* Alerting: Update TestProcessEvalResults
2021-11-25 11:46:47 +01:00
George Robinson
1e5b0e64ac
Alerting: Add comments to ScheduleService interface (#42228) 2021-11-25 10:12:04 +00:00
Armand Grillet
6523486122
Alerting: Make Unified Alerting enabled by default for those who do not use legacy alerting (#42200)
* update AlertingEnabled and UnifiedAlertingSettings.Enabled to be pointers
* add a pseudo migration to fix the AlertingEnabled and UnifiedAlertingSettings.Enabled if the latter is not defined
* update the default configuration file to make default value for both 'enabled' flags be undefined

Misc
* update Migrator to expose DB engine. This is needed for a ualert migration to access the database while the list of migrations is created.
* add more verbose failure when migrations do not match

Co-authored-by: gotjosh <josue@grafana.com>
Co-authored-by: Yuriy Tseretyan <yuriy.tseretyan@grafana.com>
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
2021-11-24 14:56:07 -05:00
Jean-Philippe Quéméner
cec2d965ec
Alerting: validate mute timings in the alertmanager configuration (#42125)
* Alerting: check for uniqueness of mutetime names

* add some testing

* add name validation

* add root route validation

* add tests for validation

* add check for root route mute_time_intervals

* add duplicate test

* remove useless yaml test

* refactor table test
2021-11-23 16:25:20 +01:00
George Robinson
9122e7f647
Alerting: Check for nil model.Settings and models.SecureSettings (#37738) 2021-11-22 11:56:18 +00:00
Peter Holmberg
97978a7c02
Alerting: Add value to notifier template (#41951)
* add value to email template

* add value to default template

* update test string

* test: fix ngalert test suite

* test: run CI

Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
2021-11-22 08:45:44 +01:00
Jean-Philippe Quéméner
b9cdad3814
Alerting: support mute timings configuration through the api for the embedded alertmanager (#41533)
* Alerting: accept mute_timing_intervals through the api for the embedded alertmanager

* add workaround for mutetimeinterval

* add mute timings to routes

* revert changes

* Update pkg/services/ngalert/api/api_alertmanager.go

* Update pkg/services/ngalert/api/api_alertmanager.go

* Update pkg/services/ngalert/api/api_alertmanager.go

* update prometheus/alertmanager dependency

* add some var docs
2021-11-19 16:50:55 +01:00
George Robinson
ac59b7615d
Alerting: README.md for ngalert 2021-11-18 10:26:42 +00:00
George Robinson
5f5298ad25
Alerting: Use require.ElementsMatch in TestEvaluateExecutionResultsNoData 2021-11-16 18:58:48 +01:00
George Robinson
4d288cc6c7
Alerting: Fix NoData tests (#41759) 2021-11-16 16:41:32 +00:00