Commit Graph

34 Commits

Author SHA1 Message Date
Yuri Tseretyan
1eebd2a4de
Alerting: Support for simplified notification settings in rule API (#81011)
* Add notification settings to storage\domain and API models. Settings are a slice to workaround XORM mapping
* Support validation of notification settings when rules are updated

* Implement route generator for Alertmanager configuration. That fetches all notification settings.
* Update multi-tenant Alertmanager to run the generator before applying the configuration.

* Add notification settings labels to state calculation
* update the Multi-tenant Alertmanager to provide validation for notification settings

* update GET API so only admins can see auto-gen
2024-02-15 09:45:10 -05:00
William Wernert
2203bc2a3d
Alerting: Refactor provisioning tests/fakes (#81205)
* Fix up test Alertmanager config JSON

* Move fake AM config and provisioning stores to fakes package
2024-01-24 17:15:55 -05:00
Santiago
73776f37eb
Alerting: Send state to the remote Alertmanager (#78538)
* Alerting: Introduce a Mimir client as part of the Remote Alertmanager

Mimir client that understands the new APIs developed for mimir. Very much a WIP still.

* more wip

* appease the linter

* more linting

* add more code

* get state from kvstore, encode, send

* send state to the remote Alertmanager, extract fullstate logic into its own function

* pass kvstore to remote.NewAlertmanager()

* refactor

* add fake kvstore to tests

* tests

* use FileStore to get state

* always log 'completed state upload'

* refactor compareRemoteConfig

* base64-encode the state in the file store

* export silences and nflog filenames, refactor

* log 'completed state/config upload...' regardless of outcome

* add values to the state store in tests

* address code review comments

* log error from filestore

---------

Co-authored-by: gotjosh <josue.abreu@gmail.com>
2023-11-29 12:49:39 +01:00
Santiago
a6b9b27673
Alerting: Remove OrgID() from the Alertmanager interface (#77398) 2023-10-31 10:58:47 +01:00
Santiago
f9fc2e4568
Alerting: Remove ConfigHash() from the Alertmanager interface (#77134) 2023-10-25 17:11:53 +02:00
Matthew Jacobson
82f3127e23
Alerting: Move legacy alert migration from sqlstore migration to service (#72702) 2023-10-12 13:43:10 +01:00
Alexander Weaver
f6649d7a97
Revert "Alerting: Remove vendored models in migration service" (#76387)
Revert "Alerting: Remove vendored models in migration service (#74503)"

This reverts commit 6a8649d544.
2023-10-11 14:21:21 -05:00
Matthew Jacobson
6a8649d544
Alerting: Remove vendored models in migration service (#74503)
This PR replaces the vendored models in the migration with their equivalent ngalert models. It also replaces the raw SQL selects and inserts with service calls.

It also fills in some gaps in the testing suite around:

    - Migration of alert rules: verifying that the actual data model (queries, conditions) are correct 9a7cfa9
    - Secure settings migration: verifying that secure fields remain encrypted for all available notifiers and certain fields migrate from plain text to encrypted secure settings correctly e7d3993

Replacing the checks for custom dashboard ACLs will be replaced in a separate targeted PR as it will be complex enough alone.
2023-10-11 17:22:09 +01:00
Santiago
93b9f9b537
Alerting: Use interfaces for the Alertmanager (#73900) 2023-09-06 07:59:29 -03:00
Matthew Jacobson
85f738cdf9
Alerting: Add endpoint to revert to a previous alertmanager configuration (#65751)
* Alerting: Add endpoint to revert to a previous alertmanager configuration

This endpoint is meant to be used in conjunction with /api/alertmanager/grafana/config/history to
revert to a previously applied alertmanager configuration. This is done by ID instead of raw config
string in order to avoid secure field complications.
2023-04-05 14:10:03 -04:00
Santiago
aba91d3053
Alerting: Fetch all applied alerting configurations (#65728)
* WIP

* skip invalid historic configurations instead of erroring

* add warning log when bad historic config is found

* remove unused custom marshaller for GettableHistoricUserConfig

* add id to historic user config, move limit check to store, fix typo

* swagger spec
2023-03-31 17:43:04 -03:00
Santiago
ba731f7865
Alerting: Mark AM configuration as applied (#61330)
* Mark AM configuration as applied

* add missing checks, make linter happy

* fix deadlock, mark as valid on save and on load

* mark configurations only if needed

* check error after applyConfig()

* code review comments

* code review changes

* more code review changes

* clean HistoricConfigFromAlertConfig function
2023-02-02 14:45:17 -03:00
Serge Zaitsev
d6d4097567
Chore: Fix goimports grouping in alerting (#62424)
* fix goimports

* fix goimports order
2023-01-30 09:55:35 +01:00
gotjosh
e7cd6eb13c
Alerting: Use alerting.GrafanaAlertmanager instead of initialising Alertmanager components directly (#61230)
* Alerting: Use `alerting.GrafanaAlertmanager` instead of initialising Alertmanager components directly
2023-01-13 12:54:38 -04:00
Alexander Weaver
8310789ef1
Indicate whether routes are provisioned when GETting Alertmanager configuration (#47857)
* Test composition simplification from last PR

* Policies use proper API model everywhere

* Expose policy provenance in API, miss some dep injection

* Complete injection

* fix args

* Tests for provenance value

* Extract test helpers so tests are very readable

* Single source adapter struct that was copied in 3 places

* Drop redundant test

* Resolve merge conflicts on changelog
2022-04-22 11:57:56 -05:00
Alexander Weaver
758364e78b
Alerting: Refactor GET/POST alerting config routes to be more extensible (#47229)
* Refactor GET am config to be extensible

* Extract post config route

* Fix tests

* Remove temporary duplication

* Fix broken test due to layer shift

* Fix duplicated error message

* Properly return 400 on config rejection

* Revert weird half method extraction

* Move things to notifier package and avoid redundant interface

* Simplify documentation

* Split encryption service and depend on minimal abstractions

* Properly initialize things all the way up to the composition root

* Encryption -> Crypto

* Address misc feedback

* Missing docstring

* Few more simple polish improvements

* Unify on MultiOrgAlertmanager. Discover bug in existing test

* Fix rebase conflicts

* Misc feedback, renames, docs

* Access crypto hanging off MultiOrgAlertmanager rather than having a separate API to initialize
2022-04-14 13:06:21 -05:00
Eng Zer Jun
b56848f006
test: use T.TempDir to create temporary test directory (#44947)
The directory created by `T.TempDir` is automatically removed when the
test and all its subtests complete.

Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2022-03-22 15:43:29 +01:00
Serge Zaitsev
84a5910e56
Chore: Remove bus from ngalert (#44465)
* pass notification service down to the notifiers

* add ns to all notifiers

* remove bus from ngalert notifiers

* use smaller interfaces for notificationservice

* attempt to fix the tests

* remove unused struct field

* simplify notification service mock

* trying to resolve issues in the tests

* make linter happy

* make linter even happier

* linter, you are annoying
2022-01-26 16:42:40 +01:00
Alexander Weaver
56b3dc5445
Alerting: Allow configuration of non-ready alertmanagers (#43063)
* Create API test for overwriting invalid alertmanager config

* Avoid requiring alertmanager readiness for config changes

* AlertmanagerSrv depends on functionality rather than concrete types

* Add test for non-ready alertmanagers

* Additional cleanup and polish

* Back out previous integration test changes

* Refactor of tests incorrectly caused a test to become redundant

* Use pre-existing fake secret service

* Drop unused interface

* Test against concrete MultiOrgAlertmanager re-using fake infra from other tests

* Fix linter error

* Empty commit to rerun checks
2021-12-27 17:01:17 -06:00
Tania B
5652bde447
Encryption: Use secrets service (#40251)
* Use secrets service in pluginproxy

* Use secrets service in pluginxontext

* Use secrets service in pluginsettings

* Use secrets service in provisioning

* Use secrets service in authinfoservice

* Use secrets service in api

* Use secrets service in sqlstore

* Use secrets service in dashboardshapshots

* Use secrets service in tsdb

* Use secrets service in datasources

* Use secrets service in alerting

* Use secrets service in ngalert

* Break cyclic dependancy

* Refactor service

* Break cyclic dependancy

* Add FakeSecretsStore

* Setup Secrets Service in sqlstore

* Fix

* Continue secrets service refactoring

* Fix cyclic dependancy in sqlstore tests

* Fix secrets service references

* Fix linter errors

* Add fake secrets service for tests

* Refactor SetupTestSecretsService

* Update setting up secret service in tests

* Fix missing secrets service in multiorg_alertmanager_test

* Use fake db in tests and sort imports

* Use fake db in datasources tests

* Fix more tests

* Fix linter issues

* Attempt to fix plugin proxy tests

* Pass secrets service to getPluginProxiedRequest in pluginproxy tests

* Fix pluginproxy tests

* Revert using secrets service in alerting and provisioning

* Update decryptFn in alerting migration

* Rename defaultProvider to currentProvider

* Use fake secrets service in alert channels tests

* Refactor secrets service test helper

* Update setting up secrets service in tests

* Revert alerting changes in api

* Add comments

* Remove secrets service from background services

* Convert global encryption functions into vars

* Revert "Convert global encryption functions into vars"

This reverts commit 498eb19859.

* Add feature toggle for envelope encryption

* Rename toggle

Co-authored-by: Emil Tullstedt <emil.tullstedt@grafana.com>
Co-authored-by: Joan López de la Franca Beltran <joanjan14@gmail.com>
2021-11-04 18:47:21 +02:00
Jean-Philippe Quéméner
153c356993
Alerting: delete orphaned records from kvstore (#40337) 2021-10-14 12:04:00 +02:00
gotjosh
48d73cb148
Alerting: Fixes a bug when trying to sync broken alertmanager config (#40338)
* Alerting: Fixes a bug when trying to sync broken alertmanager config

Broken alertmanager configuration has the potential to be introduced as part of a migration e.g. due to incompatible data between what grafana accepts and what the Alertmanager expects. When this happens, we expect an eventually consistent behaviour where we'll keep trying to apply the configuration until it works.

As part of change in https://github.com/grafana/grafana/pull/39237 we introduced a regression that modified this behaviour and instead tried to create a new Alertmanager for that organization everytime, which eventually ended up in a panic due to a duplicate metrics being registered.

This PR fixes that and introduces a test to catch further regressions.

* Remove disable orgs
2021-10-12 18:10:08 +01:00
Jean-Philippe Quéméner
e1dfec49f9
Alerting: cleanup alert resources on org removal (#39938) 2021-10-12 12:05:02 +02:00
Joan López de la Franca Beltran
722c414fef
Encryption: Refactor securejsondata.SecureJsonData to stop relying on global functions (#38865)
* Encryption: Add support to encrypt/decrypt sjd

* Add datasources.Service as a proxy to datasources db operations

* Encrypt ds.SecureJsonData before calling SQLStore

* Move ds cache code into ds service

* Fix tlsmanager tests

* Fix pluginproxy tests

* Remove some securejsondata.GetEncryptedJsonData usages

* Add pluginsettings.Service as a proxy for plugin settings db operations

* Add AlertNotificationService as a proxy for alert notification db operations

* Remove some securejsondata.GetEncryptedJsonData usages

* Remove more securejsondata.GetEncryptedJsonData usages

* Fix lint errors

* Minor fixes

* Remove encryption global functions usages from ngalert

* Fix lint errors

* Minor fixes

* Minor fixes

* Remove securejsondata.DecryptedValue usage

* Refactor the refactor

* Remove securejsondata.DecryptedValue usage

* Move securejsondata to migrations package

* Move securejsondata to migrations package

* Minor fix

* Fix integration test

* Fix integration tests

* Undo undesired changes

* Fix tests

* Add context.Context into encryption methods

* Fix tests

* Fix tests

* Fix tests

* Trigger CI

* Fix test

* Add names to params of encryption service interface

* Remove bus from CacheServiceImpl

* Add logging

* Add keys to logger

Co-authored-by: Emil Tullstedt <emil.tullstedt@grafana.com>

* Add missing key to logger

Co-authored-by: Emil Tullstedt <emil.tullstedt@grafana.com>

* Undo changes in markdown files

* Fix formatting

* Add context to secrets service

* Rename decryptSecureJsonData to decryptSecureJsonDataFn

* Name args in GetDecryptedValueFn

* Add template back to NewAlertmanagerNotifier

* Copy GetDecryptedValueFn to ngalert

* Add logging to pluginsettings

* Fix pluginsettings test

Co-authored-by: Tania B <yalyna.ts@gmail.com>
Co-authored-by: Emil Tullstedt <emil.tullstedt@grafana.com>
2021-10-07 17:33:50 +03:00
Sofia Papagiannaki
012d4f0905
Alerting: Remove ngalert feature toggle and introduce two new settings for enabling Grafana 8 alerts and disabling them for specific organisations (#38746)
* Remove `ngalert` feature toggle

* Update frontend

Remove all references of ngalert feature toggle

* Update docs

* Disable unified alerting for specific orgs

* Add backend tests

* Apply suggestions from code review

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

* Disabled unified alerting by default

* Ensure backward compatibility with old ngalert feature toggle

* Apply suggestions from code review

Co-authored-by: gotjosh <josue@grafana.com>
2021-09-29 16:16:40 +02:00
Sofia Papagiannaki
f6f3a54742
Alerting: tune rule evaluation via configuration (#35623)
* Alerting: Configure max evaluation retries

* Alerting: Enforce minimum rule evaluation interval

* Alerting: Disable rule evaluation from configuration

* Update docs

* Alerting: Configure rule evaluation timeout

* Move options on unified_alerting config section

* Apply suggestions from code review

Co-authored-by: gotjosh <josue@grafana.com>
2021-09-28 13:00:16 +03:00
Yuriy Tseretyan
05eb30e323
Alerting: Move alertmanager default config to UnifiedAlertingSettings (#39597) 2021-09-23 13:52:20 -04:00
gotjosh
2ad82b9354
Alerting: Move the unified alerting settings to its own struct (#39350) 2021-09-20 10:12:21 +03:00
gotjosh
7db97097c9
Alerting: Support Unified Alerting with Grafana HA (#37920)
* Alerting: Support Unified Alerting in Grafana's HA mode.
2021-09-16 15:33:51 +01:00
Santiago
0d2e68537c
Alerting: Cleanup template, silence and notification files created du… (#39007)
* Alerting: Cleanup template, silence and notification files created during tests

* Create tempdir for testing, delete afterwards and check for errors

* Refactoring error checks

* Update docs/sources/enterprise/access-control/fine-grained-access-control-references.md

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

* Update docs/sources/administration/configuration.md

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

* Update docs/sources/enterprise/access-control/fine-grained-access-control-references.md

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>
2021-09-15 18:48:52 -03:00
gotjosh
a2f4344bf2
Alerting: Refactor & fix unified alerting metrics structure (#39151)
* Alerting: Refactor & fix unified alerting metrics structure

Fixes and refactors the metrics structure we have for the ngalert service. Now, each component has its own metric struct that includes the JUST the metrics it uses. Additionally, I have fixed the configuration metrics and added new metrics to determine if we have discovered and started all the necessary configurations of an instance.

This allows us to alert on `grafana_alerting_discovered_configurations - grafana_alerting_active_configurations != 0` to know whether an alertmanager instance did not start successfully.
2021-09-14 12:55:01 +01:00
Marcus Efraimsson
2cc0788187
Chore: Disable backend test for now since it adds 10 minutes extra in CI (#39150)
Ref #38586
2021-09-13 19:37:26 +02:00
gotjosh
39a3bb8a1c
Alerting: Persist notification log and silences to the database (#39005)
* Alerting: Persist notification log and silences to the database

This removes the dependency of having persistent disk to run grafana alerting. Instead of regularly flushing the notification log and silences to disk we now flush the binary content of those files to the database encoded as a base64 string.
2021-09-09 17:25:22 +01:00
David Parrott
7fbeefc090
Alerting: create wrapper for Alertmanager to enable org level isolation (#37320)
Introduces org-level isolation for the Alertmanager and its components.

Silences, Alerts and Contact points are not separated by org and are not shared between them.

Co-authored with @davidmparrott and @papagian
2021-08-24 11:28:09 +01:00