Commit Graph

364 Commits

Author SHA1 Message Date
Peter Holmberg
4ef58e595c
Alerting: Add validation to slack contact point (#45618)
* add requiredifempty

* rename field, fix logic

* update mockdata

* remove logs

* update test

* fix json expected payload in e2e tests

* fix test

* fix test again

Co-authored-by: Jean-Philippe Quémémer <jeanphilippe.quemener@grafana.com>
2022-02-25 15:10:21 +01:00
George Robinson
6cccbb5a09
Fix incorrect metric values for scheduler_behind_seconds (#45830) 2022-02-25 11:40:30 +00:00
George Robinson
2ca79ca0c7
Rename evalCtx to avoid confusion with context.Context (#45144) 2022-02-25 11:09:20 +01:00
George Robinson
feae959c9d
Alerting: Create annotation if Firing alert is removed (#45703)
This commit changes staleResultsHandler to create an annotation if the current state is Alerting and the result is being removed from the state cache as it has not been updated since 2x the evaluation interval.
2022-02-24 16:25:28 +00:00
George Robinson
8d57318941
Alerting: Use expanded labels in dashboard annotations (#45726) 2022-02-24 10:58:54 +00:00
Nathan Rodman
f9701d78b1
Alerting: add field for custom slack endpoint (#45751)
* add field for custom slack endpoint

* add test for using custom endpoint

* Update pkg/services/ngalert/notifier/channels/slack.go

Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>

* specify description for endpoint

* remove brittle string constants

Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
2022-02-23 14:33:23 -08:00
Yuriy Tseretyan
f75bea481d
Alerting: validate rules and calculate changes in API controller (#45072)
* Update API controller
   - add validation of rules API model
   - add function to calculate changes between the submitted alerts and existing alerts
   - update RoutePostNameRulesConfig to validate input models, calculate changes and apply in a transaction

* Update DBStore
   - delete unused storage method. All the logic is moved upstream.
   - upsert to not modify fields of new by values from the existing alert
   - if rule has UID do not try to pull it from db. (it is done upstream)

* Add rule generator
2022-02-23 11:30:04 -05:00
Yuriy Tseretyan
e44ea3d589
Alerting: Rename setting AlertForDuration to DefaultRuleEvaluationInterval (#45569)
* fix AlertForDuration to DefaultRuleEvaluationInterval

Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
2022-02-18 10:05:06 -05:00
Selene
d5b98772ed
Dashboards: Refactor service to make it injectable by wire (#44588)
* Add providers to folder and dashboard services

* Refactor folder and dashboard services

* Move store implementation to its own file due wire cannot allow us to cast to SQLStore

* Add store in some places and more missing dependencies

* Bad merge fix

* Remove old functions from tests and few fixes

* Fix provisioning

* Remove store from http server and some test fixes

* Test fixes

* Fix dashboard and folder tests

* Fix library tests

* Fix provisioning tests

* Fix plugins manager tests

* Fix alert and org users tests

* Refactor service package and more test fixes

* Fix dashboard_test tets

* Fix api tests

* Some lint fixes

* Fix lint

* More lint :/

* Move dashboard integration tests to dashboards service and fix dependencies

* Lint + tests

* More integration tests fixes

* Lint

* Lint again

* Fix tests again and again anda again

* Update searchstore_test

* Fix goimports

* More go imports

* More imports fixes

* Fix lint

* Move UnprovisionDashboard function into dashboard service and remove bus

* Use search service instead of bus

* Fix test

* Fix go imports

* Use nil in tests
2022-02-16 14:15:44 +01:00
Yuriy Tseretyan
02f8e99ca1
Alerting: move fake stores to store package (#45428)
* make fake storage public
* move fake storages to store package
2022-02-15 17:24:39 -05:00
Yuriy Tseretyan
095ea44e97
Alerting: Move BaseInterval and MinInterval to UnifiedAlerting config (#45107)
* use base interval if legacy value is less than the base interval
2022-02-11 16:13:49 -05:00
George Robinson
4e3a72fc2a
Add context.Context to AlertingStore (#45069) 2022-02-09 09:22:09 +00:00
Yuriy Tseretyan
ea236c276e
add missing option to swagger spec (#45070) 2022-02-08 10:09:37 -05:00
George Robinson
67a3e1d6fd
Add context.Context to InstanceStore (#45049) 2022-02-08 13:49:04 +00:00
Sofia Papagiannaki
35fe58de37
API: Extract OpenAPI specification from source code using go-swagger (#40528)
* API: Using go-swagger for extracting OpenAPI specification from source code

* Merge Grafana Alerting spec

* Include enterprise endpoints (if enabled)

* Serve SwaggerUI under feature flag

* Fix building dev docker images

* Configure swaggerUI

* Add missing json tags

Co-authored-by: Ying WANG <ying.wang@grafana.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
2022-02-08 13:38:43 +01:00
George Robinson
a9399ab3cd
Alerting: Add context.Context to RuleStore (#45004)
Alerting: Add context.Context to RuleStore
2022-02-08 08:52:03 +00:00
Alexander Weaver
935059a376
Alerting: Create basic storage layer for provisioning (#44679)
* Simplistic store API for provenance lookups on arbitrary types

* Add a few notes in comments

* Improved type safety for provisioned objects

* Clean-up TODOs for future PRs

* Clean up provisioning model

* Clean up tests

* Restrict allowable types in interface

* Fix linter error

* Move AlertRule domain methods to same file as AlertRule definition

* Update pkg/services/ngalert/models/provisioning.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Complete interface rename

* Pass context through store API

* More idiomatic method names

* Better error description

* Improve code-docs

* Use ORM language instead of raw sql

* Add support for records in different orgs

* ResourceTypeID -> ResourceType since it's not an ID

Co-authored-by: George Robinson <george.robinson@grafana.com>
2022-02-04 13:23:19 -06:00
Yuriy Tseretyan
ddfe2dce74
Alerting: Split grafana and lotex routes (#44742)
* split Lotex and Grafana routes
* update template to use authorize function for every route
2022-02-04 12:42:04 -05:00
idafurjes
7a23700e1a
Remove unused GetDashboard method (#44890)
* Remove unused GetDashboard method

* Uncomment test

* Fix dashboard service integration test

* Remove comment
2022-02-04 17:21:06 +01:00
George Robinson
9df43abbb5
Fix evaluation of alert rules for datasources with custom headers (#44862)
* Fix evaluation of alert rules for datasources with custom headers

* Fix unit tests

* Fix integration tests

* Evaluator fields should be package private
2022-02-04 14:56:37 +01:00
Yuriy Tseretyan
984c95de63
Do not store EvaluationString in Evaluation. (#44606)
* do not store evaluation string in Evaluation.
* reduce number of buckets to store for a single state
2022-02-02 19:18:20 +01:00
George Robinson
924deda589
Fix Discord Webhook URL for invalid template (#44763)
This commit fixes an issue where an invalid template for Discord would change the Webhook URL to "" and cause "unsupported protocol scheme" errors.
2022-02-02 14:28:41 +01:00
Santiago
04d93751b8
Alerting: send alerts to external, internal, or both alertmanagers (#40341)
* (WIP) send alerts to external, internal, or both alertmanagers

* Modify admin configuration endpoint, update swagger docs

* Integration test for admin config updated

* Code review changes

* Fix alertmanagers choice not changing bug, add unit test

* Add AlertmanagersChoice as enum in swagger, code review changes

* Fix API and tests errors

* Change enum from int to string, use 'SendAlertsTo' instead of 'AlertmanagerChoice' where necessary

* Fix tests to reflect last changes

* Keep senders running when alerts are handled just internally

* Check if any external AM has been discovered before sending alerts, update tests

* remove duplicate data from logs

* update comment

* represent alertmanagers choice as an int instead of a string

* default alertmanagers choice to all alertmanagers, test cases

* update definitions and generate spec
2022-02-01 20:36:55 -03:00
George Robinson
5e2280ceee
Add metrics to ngalert scheduler (#44602)
This pull request adds metrics to the ngalert scheduler so we can see how long it takes to evaluate a tick.
2022-01-31 16:56:43 +00:00
idafurjes
12420260ef
Remove bus from org invite api (#44530)
* Remove bus from org invite api

* Fix lint

* Remove comment
2022-01-31 17:24:52 +01:00
Serge Zaitsev
84a5910e56
Chore: Remove bus from ngalert (#44465)
* pass notification service down to the notifiers

* add ns to all notifiers

* remove bus from ngalert notifiers

* use smaller interfaces for notificationservice

* attempt to fix the tests

* remove unused struct field

* simplify notification service mock

* trying to resolve issues in the tests

* make linter happy

* make linter even happier

* linter, you are annoying
2022-01-26 16:42:40 +01:00
Jean-Philippe Quéméner
8ee3f59cd4
Alerting: recognize Cortex datasources correctly in the frontend (#44316)
* Alerting: always use msg field for user facing errors

* fix: revert front-end Cortex detection

Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
2022-01-21 15:44:11 +01:00
ying-jeanne
7422789ec7
Remove Macaron ParamsInt64 function from code base (#43810)
* draft commit

* change all calls

* Compilation errors
2022-01-15 00:55:57 +08:00
Yuriy Tseretyan
ed5c664e4a
Alerting: Stop firing of alert when it is updated (#39975)
* Update API to call the scheduler to remove\update an alert rule. When a rule is updated by a user, the scheduler will remove the currently firing alert instances and clean up the state cache. 
* Update evaluation loop in the scheduler to support one more channel that is used to communicate updates to it.
* Improved rule deletion from the internal registry. 
* Move alert rule version from the internal registry (structure alertRuleInfo) closer rule evaluation loop (to evaluation task structure), which will make the registry values immutable.
* Extract notification code to a separate function to reuse in update flow.
2022-01-11 11:39:34 -05:00
Yuriy Tseretyan
ea478dec22
Alerting: Remove bridge between log15 and go-kit logger (#43769)
* remove bridge between log15 and go-kit logger.

* fix tests
2022-01-07 09:40:09 +01:00
ying-jeanne
a8eef45a44
Logger migration from log15 to gokit/log (#41636)
* migrate log15 to gokit/log

* fix console log

* update some unittest

* fix all unittest

* fix the build

* Update pkg/infra/log/log.go

Co-authored-by: Yuriy Tseretyan <tceretian@gmail.com>

* general type vector

* correct the level key

Co-authored-by: Yuriy Tseretyan <tceretian@gmail.com>
2022-01-06 22:28:05 +08:00
Alexander Weaver
fd583a0e3b
Alerting: Allow customization of Google chat message (#43568)
* Allow customizable googlechat message via optional setting

* Add optional message field in googlechat contact point configurator

* Fix strange error message on send if template fails to fully evaluate

* Elevate template evaluation failure logs to Warn level

* Extract default.title template embed from all channels to shared constant
2022-01-05 09:47:08 -06:00
idafurjes
8e6d6af744
Rename DispatchCtx to Dispatch (#43563) 2021-12-28 17:36:22 +01:00
idafurjes
7936c4c522
Rename AddHandlerCtx to AddHandler (#43557) 2021-12-28 16:08:07 +01:00
idafurjes
56c3875bb9
Chore: Remove context.TODO (#43458)
* Remove context.TODO() from services

* Fix live test
2021-12-28 10:26:18 +01:00
Alexander Weaver
56b3dc5445
Alerting: Allow configuration of non-ready alertmanagers (#43063)
* Create API test for overwriting invalid alertmanager config

* Avoid requiring alertmanager readiness for config changes

* AlertmanagerSrv depends on functionality rather than concrete types

* Add test for non-ready alertmanagers

* Additional cleanup and polish

* Back out previous integration test changes

* Refactor of tests incorrectly caused a test to become redundant

* Use pre-existing fake secret service

* Drop unused interface

* Test against concrete MultiOrgAlertmanager re-using fake infra from other tests

* Fix linter error

* Empty commit to rerun checks
2021-12-27 17:01:17 -06:00
Alexander Weaver
9abdaf251f
Alerting: Fix global state sensitivity in notifier channel tests (#43508) 2021-12-27 11:58:17 -06:00
idafurjes
b8852ef6a3
Chore: Remove context.TODO() (#43409)
* Remove context.TODO() from services

* Fix live test

* Remove context.TODO
2021-12-22 11:02:42 +01:00
Jean-Philippe Quéméner
ffc72aa255
Alerting: fix gosec warning that is not valid (#43425) 2021-12-21 19:47:47 +01:00
idafurjes
ff3cf94b56
Chore: Remove context.TODO() from services (#42555)
* Remove context.TODO() from services

* Fix live test
2021-12-20 17:05:33 +01:00
Yuriy Tseretyan
1a762083d7
Alerting: make alert rule routine evaluation control be thread-safe (#41220)
* change registry.delete to return deleted struct
* use pointer to alertRuleInfo instead copying.
* do not access evaluation channel when routine is stopped
* remove stopCh and use context cancellation
* do not return ctx.Err when channel is cancelled because it cancels all other routines
* make alertRuleInfo fields and functions package private
2021-12-16 14:52:47 -05:00
Ryan McKinley
2754e4fdf0
Expressions: use datasource model from the query (#41376)
* refactor datasource loading

* refactor datasource loading

* pass uid

* use dscache in alerting to get DS

* remove expr/translate pacakge

* remove dup injection entry

* fix DS type on metrics endpoint, remove SQL DS lookup inside SSE

* update test and adapter

* comment fix

* Make eval run as admin when getting datasource info

Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>

* fmt and comment

* remove unncessary/redundant code

Co-authored-by: Kyle Brandt <kyle@grafana.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
2021-12-16 13:51:46 -03:00
Jean-Philippe Quéméner
b605340668
Alerting: log errors happening in the API on server side (#43192)
* Alerting: log errors happening in the API on server side

* adapt tests to reflect changed payload
2021-12-16 13:33:10 +01:00
Gilles De Mey
bb3b5c10e7
Alerting: fix WeCom channel notifier test assertion (#43173) 2021-12-15 19:45:12 +01:00
Gilles De Mey
cbbbb505b4
Alerting: use HTML-safe characters for the default template (#43148) 2021-12-15 17:57:08 +01:00
smallpath
aec14cba42
Alerting: Support WeCom as a contact point type (#40975)
* add wecom notifier

* fix backend lint

* fix alerting channel test

* update wecom doc

* update notifiers

* update wecom notifier test

* Apply suggestions from code review

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* unify wecom alerting

* fix backend lint

* fix front lint

* fix wecom test

* update docs

* Update pkg/services/ngalert/notifier/channels/wecom.go

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* Update docs/sources/alerting/old-alerting/notifications.md

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* Update docs/sources/alerting/old-alerting/notifications.md

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* Update docs/sources/alerting/old-alerting/notifications.md

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* remove old wecom notifier

* remove old notifier doc

* fix backend test

* Update docs/sources/alerting/unified-alerting/contact-points.md

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* fix doc style

Co-authored-by: gotjosh <josue.abreu@gmail.com>
2021-12-15 16:42:03 +00:00
Yuriy Tseretyan
1db9b1e6a9
Improve bridge for Alertmanager logger (#42958)
* Implement go-kit/log.Logger for internal logger.
2021-12-13 09:41:53 -05:00
Sofia Papagiannaki
c6483cd8ed
Alerting: Refactor API handlers to use web.Bind (#42600)
* Alerting: Refactor API handlers to use web.Bind

* lint
2021-12-13 09:22:57 +01:00
gotjosh
bdab1d1f1f
Fix flaky tests in several notifiers (#42668)
* Fix flaky tests in several notifiers

- Non-mocked time in sensu go tests
- Close server in Slack tests
- Use a mutex for writing responses in the fake slack server

* Remove mutex at the fake slack server
2021-12-03 12:34:31 +00:00
George Robinson
c932dc959c
Alerting: Add Ref ID to DatasourceNoData and DatasourceError alerts (#42630) 2021-12-03 09:55:16 +00:00