Commit Graph

145 Commits

Author SHA1 Message Date
Joan López de la Franca Beltran
722c414fef
Encryption: Refactor securejsondata.SecureJsonData to stop relying on global functions (#38865)
* Encryption: Add support to encrypt/decrypt sjd

* Add datasources.Service as a proxy to datasources db operations

* Encrypt ds.SecureJsonData before calling SQLStore

* Move ds cache code into ds service

* Fix tlsmanager tests

* Fix pluginproxy tests

* Remove some securejsondata.GetEncryptedJsonData usages

* Add pluginsettings.Service as a proxy for plugin settings db operations

* Add AlertNotificationService as a proxy for alert notification db operations

* Remove some securejsondata.GetEncryptedJsonData usages

* Remove more securejsondata.GetEncryptedJsonData usages

* Fix lint errors

* Minor fixes

* Remove encryption global functions usages from ngalert

* Fix lint errors

* Minor fixes

* Minor fixes

* Remove securejsondata.DecryptedValue usage

* Refactor the refactor

* Remove securejsondata.DecryptedValue usage

* Move securejsondata to migrations package

* Move securejsondata to migrations package

* Minor fix

* Fix integration test

* Fix integration tests

* Undo undesired changes

* Fix tests

* Add context.Context into encryption methods

* Fix tests

* Fix tests

* Fix tests

* Trigger CI

* Fix test

* Add names to params of encryption service interface

* Remove bus from CacheServiceImpl

* Add logging

* Add keys to logger

Co-authored-by: Emil Tullstedt <emil.tullstedt@grafana.com>

* Add missing key to logger

Co-authored-by: Emil Tullstedt <emil.tullstedt@grafana.com>

* Undo changes in markdown files

* Fix formatting

* Add context to secrets service

* Rename decryptSecureJsonData to decryptSecureJsonDataFn

* Name args in GetDecryptedValueFn

* Add template back to NewAlertmanagerNotifier

* Copy GetDecryptedValueFn to ngalert

* Add logging to pluginsettings

* Fix pluginsettings test

Co-authored-by: Tania B <yalyna.ts@gmail.com>
Co-authored-by: Emil Tullstedt <emil.tullstedt@grafana.com>
2021-10-07 17:33:50 +03:00
Santiago
562cd9e44e
Alerting template functions (#39261)
* Alerting: (wip) add template funcs

* Alerting: (wip) numeric template functions

* Alerting: (wip) template functions

* Test for the "args" function

* Alerting: (wip) Documentation for template functions

* Alerting: template functions - refactor

* code review changes

* disable linter error

* Use Prometheus implementation of TemplateExpander

* Update docs/sources/alerting/unified-alerting/alerting-rules/create-grafana-managed-rule.md

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

* change templateCaptureValue to support using template functions

* Update pkg/services/ngalert/state/template.go

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* Test and documentation added for reReplaceAll template function

* complete missing functions, documentation and tests

* Use the alert instance's evaluation time for expanding the template

* strvalue graphlink and tablelink functions

* delete duplicate test

* make strvalue return an empty string

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
2021-10-04 15:04:37 -03:00
Yuriy Tseretyan
2b4e51f478
Alerting: Parse App URL only once (#39855) 2021-09-30 12:51:20 -04:00
Sofia Papagiannaki
012d4f0905
Alerting: Remove ngalert feature toggle and introduce two new settings for enabling Grafana 8 alerts and disabling them for specific organisations (#38746)
* Remove `ngalert` feature toggle

* Update frontend

Remove all references of ngalert feature toggle

* Update docs

* Disable unified alerting for specific orgs

* Add backend tests

* Apply suggestions from code review

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

* Disabled unified alerting by default

* Ensure backward compatibility with old ngalert feature toggle

* Apply suggestions from code review

Co-authored-by: gotjosh <josue@grafana.com>
2021-09-29 16:16:40 +02:00
Sofia Papagiannaki
f6f3a54742
Alerting: tune rule evaluation via configuration (#35623)
* Alerting: Configure max evaluation retries

* Alerting: Enforce minimum rule evaluation interval

* Alerting: Disable rule evaluation from configuration

* Update docs

* Alerting: Configure rule evaluation timeout

* Move options on unified_alerting config section

* Apply suggestions from code review

Co-authored-by: gotjosh <josue@grafana.com>
2021-09-28 13:00:16 +03:00
gotjosh
2ad82b9354
Alerting: Move the unified alerting settings to its own struct (#39350) 2021-09-20 10:12:21 +03:00
gotjosh
7db97097c9
Alerting: Support Unified Alerting with Grafana HA (#37920)
* Alerting: Support Unified Alerting in Grafana's HA mode.
2021-09-16 15:33:51 +01:00
gotjosh
a2f4344bf2
Alerting: Refactor & fix unified alerting metrics structure (#39151)
* Alerting: Refactor & fix unified alerting metrics structure

Fixes and refactors the metrics structure we have for the ngalert service. Now, each component has its own metric struct that includes the JUST the metrics it uses. Additionally, I have fixed the configuration metrics and added new metrics to determine if we have discovered and started all the necessary configurations of an instance.

This allows us to alert on `grafana_alerting_discovered_configurations - grafana_alerting_active_configurations != 0` to know whether an alertmanager instance did not start successfully.
2021-09-14 12:55:01 +01:00
gotjosh
39a3bb8a1c
Alerting: Persist notification log and silences to the database (#39005)
* Alerting: Persist notification log and silences to the database

This removes the dependency of having persistent disk to run grafana alerting. Instead of regularly flushing the notification log and silences to disk we now flush the binary content of those files to the database encoded as a base64 string.
2021-09-09 17:25:22 +01:00
Arve Knudsen
78596a6756
Migrate to Wire for dependency injection (#32289)
Fixes #30144

Co-authored-by: dsotirakis <sotirakis.dim@gmail.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
Co-authored-by: Ida Furjesova <ida.furjesova@grafana.com>
Co-authored-by: Jack Westbrook <jack.westbrook@gmail.com>
Co-authored-by: Will Browne <wbrowne@users.noreply.github.com>
Co-authored-by: Leon Sorokin <leeoniya@gmail.com>
Co-authored-by: Andrej Ocenas <mr.ocenas@gmail.com>
Co-authored-by: spinillos <selenepinillos@gmail.com>
Co-authored-by: Karl Persson <kalle.persson@grafana.com>
Co-authored-by: Leonard Gram <leo@xlson.com>
2021-08-25 15:11:22 +02:00
David Parrott
7fbeefc090
Alerting: create wrapper for Alertmanager to enable org level isolation (#37320)
Introduces org-level isolation for the Alertmanager and its components.

Silences, Alerts and Contact points are not separated by org and are not shared between them.

Co-authored with @davidmparrott and @papagian
2021-08-24 11:28:09 +01:00
gotjosh
f3f3fcc727
Alerting: Introduces /api/v1/ngalert/alertmanagers to expose discovered and dropped Alertmanager(s) (#37632)
* Alerting: Expose discovered and dropped Alertmanagers

Exposes the API for discovered and dropped Alertmanagers.

* make admin config poll interval configurable

* update after rebase

* wordsmith

* More wordsmithing

* change name of the config

* settings package too
2021-08-13 13:14:36 +01:00
gotjosh
f83cd401e5
Alerting: Send alerts to external Alertmanager(s) (#37298)
* Alerting: Send alerts to external Alertmanager(s)

Within this PR we're adding support for registering or unregistering
sending to a set of external alertmanagers. A few of the things that are
going are:

- Introduce a new table to hold "admin" (either org or global)
  configuration we can change at runtime.
- A new periodic check that polls for this configuration and adjusts the
  "senders" accordingly.
- Introduces a new concept of "senders" that are responsible for
  shipping the alerts to the external Alertmanager(s). In a nutshell,
this is the Prometheus notifier (the one in charge of sending the alert)
mapped to a multi-tenant map.

There are a few code movements here and there but those are minor, I
tried to keep things intact as much as possible so that we could have an
easier diff.
2021-08-06 13:06:56 +01:00
gotjosh
442a6677fc
Alerting: Refactor Run of the scheduler (#37157)
* Alerting: Refactor `Run` of the scheduler

A bit of a refactor to make the diff easier to read for supporting
external Alertmanagers.

We'll introduce another routine that checks the database for
configuration and spawns other routines accordingly.

* Block the wait.

* Fix test
2021-07-27 11:52:59 +01:00
gotjosh
a86ad1190c
Alerting: Refactor state manager as a dependency (#36513)
* Alerting: Refactor state manager as a dependency

Within the scheduler, the state manager was being passed around a
certain number of functions. I've introduced it as a dependency to keep
the "service" interfaces as clean and homogeneous as possible.

This is relevant, because I'm going to introduce live reload of these
components as part of my next PR and it is better if dependencies are
self-contained.

* remove unused functions

* Fix a few more tests

* Make sure the `stateManager` is declared before the schedule
2021-07-07 17:18:31 +01:00
Ganesh Vernekar
dcd4bf1615
Alerting: Fill the empty GeneratorURL (#35740)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-06-16 15:34:12 +05:30
Sofia Papagiannaki
abe35c8c01
Alerting: Add error recovery during rule evaluations (#35450)
* Alerting: Eval recovery after query failure

* Apply suggestions from code review
2021-06-15 19:30:21 +03:00
Sofia Papagiannaki
23939eab10
[Alerting]: namespace fixes (#34470)
* [Alerting]: forbid viewers for updating rules if viewers can edit

check for CanSave instead of CanEdit

* Clear ngalert tables when deleting the folder

* Apply suggestions from code review

* Log failure to check save permission

Co-authored-by: gotjosh <josue@grafana.com>
2021-05-20 15:49:33 +03:00
Kyle Brandt
331991ca10
UAlerting: Increase default max datapoints (#34223)
Change const value from 100 to 43200 (12 hours at 1sec interval)
2021-05-17 18:46:52 +02:00
Owen Diehl
1367f7171e
Alerting/ruler metrics (#34144)
* adds active configurations metric

* rule evaluation metrics

* ruler metrics

* pr feedback
2021-05-14 16:13:44 -04:00
Owen Diehl
baca873a84
extracts alertmanager from DI, including migrations (#34071)
* extracts alertmanager from DI, including migrations

* includes alertmanager Run method in ngalert

* removes 3s test shutdown timeout

* lint
2021-05-13 14:01:38 -04:00
Kyle Brandt
3da8db7f3f
Alerting: Run table migrations regardless of feature flag and move out of service (#33996) 2021-05-12 14:39:48 -04:00
Sofia Papagiannaki
540f110220
[Alerting]: Extend quota service to optionally set limits on alerts (#33283)
* Quota: Extend service to set limit on alerts

* Add test for applying quota to alert rules

* Apply suggestions from code review

Co-authored-by: Diana Payton <52059945+oddlittlebird@users.noreply.github.com>

* Get used alert quota only if naglert is enabled

* Set alert limit to zero if nglalert is not enabled
Co-authored-by: Diana Payton <52059945+oddlittlebird@users.noreply.github.com>
2021-05-04 19:16:28 +03:00
Kyle Brandt
c1034f3118
Alerting: Create instanceStore (#33587)
for https://github.com/grafana/alerting-squad/issues/129
2021-05-03 07:19:15 -04:00
Owen Diehl
5e48b54549
Alerting/metrics (#33547)
* moves alerting metrics to their own pkg

* adds grafana_alerting_alerts (by state) metric

* alerts_received_{total,invalid}

* embed alertmanager alerting struct in ng metrics & remove duplicated notification metrics (already embed alertmanager notifier metrics)

* use silence metrics from alertmanager lib

* fix - manager has metrics

* updates ngalert tests

* comment lint
Signed-off-by: Owen Diehl <ow.diehl@gmail.com>

* cleaner prom registry code

* removes ngalert global metrics

* new registry use in all tests

* ngalert metrics impl service, hack testinfra code to prevent duplicate metric registrations

* nilmetrics unexported
2021-04-30 12:28:06 -04:00
David Parrott
788bc2a793
Alerting: refactor state tracker (#33292)
* set processing time

* merge labels and set on response

* use state cache for adding alerts to rules

* minor cleanup

* add support for NoData and Error results

* rename test

* bring in changes from other PRs tha have been merged

* pr feedback

* add integration test

* close state tracker cleanup on context.Done

* fixup test

* rename state tracker

* set EvaluationDuration on Result

* default labels set as constants

* separate cache and state from manager

* use RWMutex in cache
2021-04-23 21:32:25 +02:00
David Parrott
c0d83fc01e
Alerting: Return cached alerts for prometheus/api/v1/alerts (#32654)
* Return cached alerts for prometheus/api/v1/alerts

* Return not implemented for /prometheus/grafana/api/v1/rules

* Set StartsAt for already alerting states

* Fix tests
2021-04-05 15:05:39 -07:00
Sofia Papagiannaki
daabf64aa1
[Alerting]: Update scheduler to evaluate rules created by the unified API (#32589)
* Update scheduler

* Fix tests

* Fixes after code review feedback

* lint - add uncommitted modifications

Co-authored-by: kyle <kyle@grafana.com>
2021-04-03 20:13:29 +03:00
David Parrott
2a8446e435
Alerting: Persist alerts on evaluation and shutdown. Warm cache from DB on startup (#32576)
* Initial commit for state tracking

* basic state transition logic and tests

* constructor. test and interface fixup

* use new sig for sch.definitionRoutine()

* test fixup

* make the linter happy

* more minor linting cleanup

* Alerting: Send alerts from state tracker to notifier

* Add evaluation time and test

Add evaluation time and test

* Add cleanup routine and logging

* Pull in compact.go and reconcile differences

* Save alert transitions and save all state on shutdown

* pr feedback

* WIP

* WIP

* Persist alerts on evaluation and shutdown. Warm cache on startup

* Filter non-firing alerts before sending to notifier

Co-authored-by: Josue Abreu <josue@grafana.com>
2021-04-02 08:11:33 -07:00
Sofia Papagiannaki
8793f5c7f8
[Alerting]: Delete obsolete database table and code (#32595)
* Delete obsolete migration

* Remove redundant code
2021-04-01 19:41:57 +03:00
Sofia Papagiannaki
ee06970d72
[Alerting]: Grafana managed ruler API implementation (#32537)
* [Alerting]: Grafana managed ruler API impl

* Apply suggestions from code review

* fix lint

* Add validation for ruleGroup name length

* Fix MySQL migration

Co-authored-by: kyle <kyle@grafana.com>
2021-04-01 11:11:45 +03:00
Sofia Papagiannaki
a5e95823b2
[Alerting]: Alertmanager API implementation (#32174)
* Add validation for grafana recipient

* Alertmanager API implementation (WIP)

* Fix encoding/decoding receiver settings from/to YAML

* Save templates together with the configuration

* update POST to apply latest config

* Alertmanager service enabled by the ngalert toggle

* Silence API integration with Alertmanager

* Apply suggestions from code review

Co-authored-by: gotjosh <josue@grafana.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
2021-03-31 23:00:56 +03:00
David Parrott
b1cb74c0c9
Alerting: Send alerts from state tracker to notifier, logging, and cleanup task (#32333)
* Initial commit for state tracking

* basic state transition logic and tests

* constructor. test and interface fixup

* use new sig for sch.definitionRoutine()

* test fixup

* make the linter happy

* more minor linting cleanup

* Alerting: Send alerts from state tracker to notifier

* Add evaluation time and test

Add evaluation time and test

* Add cleanup routine and logging

* Pull in compact.go and reconcile differences

* pr feedback

* pr feedback

Pull in compact.go and reconcile differences

Co-authored-by: Josue Abreu <josue@grafana.com>
2021-03-30 09:37:56 -07:00
David Parrott
d33a77a67f
Alerting: add state tracker to alerting evaluation (#32298)
* Initial commit for state tracking

* basic state transition logic and tests

* constructor. test and interface fixup

* use new sig for sch.definitionRoutine()

* test fixup

* make the linter happy

* more minor linting cleanup
2021-03-24 15:34:18 -07:00
gotjosh
9b52ffc6a9
Alerting: Fetch configuration from the database and run a notification service (#32175)
* Alerting: Fetch configuration from the database and run a notification
instance

Co-Authored-By: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
2021-03-24 14:20:44 +00:00
Owen Diehl
93d0f7163f
[Alerting] Forking LoTex ruler (#32138)
* updates alerting api to master

* skeleton for lotex ruler

* withPath helper & legacyRulerPrefix const

* forked ruler

* wires up proxy

* safeMacaronWrapper

* working proxy

* jsonExtractor

* lint
2021-03-19 10:32:13 -04:00
gotjosh
cc74b1fe46
Alerting: Add database table for persisting alerting configuration (#32042)
* Alerting: Add database table for persisting alerting configuration

* Fix the linter

* Address review comments

* Don't split templates and configuration

It is already bundled together as part of a of the API so might as well
marshall it directly.
2021-03-18 18:12:28 +00:00
Sofia Papagiannaki
4ce0a49eac
AlertingNG: Split into several packages (#31719)
* AlertingNG: Split into several packages

* Move AlertQuery to models
2021-03-08 22:19:21 +02:00
Arve Knudsen
b79e61656a
Introduce TSDB service (#31520)
* Introduce TSDB service

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>

Co-authored-by: Erik Sundell <erik.sundell87@gmail.com>
Co-authored-by: Will Browne <will.browne@grafana.com>
Co-authored-by: Torkel Ödegaard <torkel@grafana.org>
Co-authored-by: Will Browne <wbrowne@users.noreply.github.com>
Co-authored-by: Zoltán Bedi <zoltan.bedi@gmail.com>
2021-03-08 07:02:49 +01:00
Sofia Papagiannaki
bd2390c49f
AlertingNG: code refactoring (#30787)
* AlertingNG: refactoring

* Fix tests
2021-03-03 17:52:19 +02:00
Sofia Papagiannaki
9ada4b6052
Expressions: Add option to disable feature (#30541)
* Expressions: Add option to disable feature

* Apply suggestions from code review

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>
2021-01-22 19:27:33 +02:00
Sofia Papagiannaki
8c31e25926
AlertingNG: Save alert instances (#30223)
* AlertingNG: Save alert instances

Co-authored-by: Kyle Brandt <kyle@grafana.com>

* Rename alert instance fields/columns

* Include definition title in listing alert instances

* Delete instances when deleting defintion

Co-authored-by: Kyle Brandt <kyle@grafana.com>
2021-01-18 20:57:17 +02:00
Sofia Papagiannaki
5560be73bf
Alerting NG: update API to expect UIDs instead of IDs (#29896)
* Change API to expect UIDs instead of ID

* Remove unnecessary transactions

When only one query is executed

* Modify API responses

* Cleanup tests

* Use globally orgID and UID for identifying alert definitions
2021-01-07 17:45:42 +02:00
Sofia Papagiannaki
3cac10e598
AlertingNG: Create a scheduler to evaluate alert definitions (#29305)
* Always use cache: stop passing skipCache among ngalert functions

* Add updated column

* Scheduler initial draft

* Add retry on failure

* Allow settting/updating alert definition interval

Set default interval if no interval is provided during alert definition creation.
Keep existing alert definition interval if no interval is provided during alert definition update.

* Parameterise alerting.Ticker to run on custom interval

* Allow updating alert definition interval without having to provide the queries and expressions

* Add schedule tests

* Use xorm tags for having initialisms with consistent case in Go

* Add ability to pause/unpause the scheduler

* Add alert definition versioning

* Optimise scheduler to fetch alert definition only when it's necessary

* Change MySQL data column to mediumtext

* Delete alert definition versions

* Increase default scheduler interval to 10 seconds

* Fix setting OrgID on updates

* Add validation for alert definition name length

* Recreate tables
2020-12-17 16:00:09 +02:00
Sofia Papagiannaki
43f580c299
AlertingNG: manage and evaluate alert definitions via the API (#28377)
* Alerting NG: prototype v2 (WIP)

* Separate eval package

* Modify eval alert definition endpoint

* Disable migration if ngalert is not enabled

* Remove premature test

* Fix lint issues

* Delete obsolete struct

* Apply suggestions from code review

* Update pkg/services/ngalert/ngalert.go

Co-authored-by: Kyle Brandt <kyle@grafana.com>

* Add API endpoint for listing alert definitions

* Introduce index for alert_definition table

* make ds object for expression to avoid panic

* wrap error

* Update pkg/services/ngalert/eval/eval.go

* Swith to backend.DataQuery

* Export TransformWrapper callback

* Fix lint issues

* Update pkg/services/ngalert/ngalert.go

Co-authored-by: Kyle Brandt <kyle@grafana.com>

* Validate alert definitions before storing them

* Introduce AlertQuery

* Add test

* Add QueryType in AlertQuery

* Accept only float64 (seconds) durations

* Apply suggestions from code review

* Get rid of bus

* Do not export symbols

* Fix failing test

* Fix failure due to service initialization order

Introduce MediumHigh service priority and assign it to backendplugin
service

* Fix test

* Apply suggestions from code review

* Fix renamed reference

Co-authored-by: Kyle Brandt <kyle@grafana.com>
2020-11-12 15:11:30 +02:00