* Add single receiver method
* Add receiver permissions
* Add single/multi GET endpoints for receivers
* Remove stable tag from time intervals
See end of PR description here: https://github.com/grafana/grafana/pull/81672
* Create locking config store that mimics existing provisioning store
* Rename existing receivers(_test).go
* Introduce shared receiver group service
* Fix test
* Move query model to models package
* ReceiverGroup -> Receiver
* Remove locking config store
* Move convert methods to compat.go
* Cleanup
This commit prevents saving configurations containing inhibition
rules in Grafana Alertmanager. It does not reject inhibition
rules when using external Alertmanagers, such as Mimir. This meant
the validation had to be put in the MultiOrgAlertmanager instead of
in the validation of PostableUserConfig. We can remove this when
inhibition rules are supported in Grafana Managed Alerts.
AM config applied via API would use the PostableUserConfig as the AM raw
config and also the hash used to decide when the AM config has changed.
However, when applied via the periodic sync the PostableApiAlertingConfig would
be used instead.
This leads to two issues:
- Inconsistent hash comparisons when modifying the AM causing redundant applies.
- GetStatus assumed the raw config was PostableUserConfig causing the endpoint
to return correctly after a new config is applied via API and then nothing once
the periodic sync runs.
Note: Technically, the upstream GrafanaAlertamanger GetStatus shouldn't be
returning PostableUserConfig or PostableApiAlertingConfig, but instead
GettableStatus. However, this issue required changes elsewhere and is out of
scope.
* Alerting: Add metric to check for default AM configurations
* Use a gauge for the config hash
* don't go out of bounds when converting uint64 to float64
* expose metric for config hash
* update metrics after applying config
* Alerting: Add metrics to the remote Alertmanager struct
* rephrase http_requests_failed description
* make linter happy
* remove unnecessary metrics
* extract timed client to separate package
* use histogram collector from dskit
* remove weaveworks dependency
* capture metrics for all requests to the remote Alertmanager (both clients)
* use the timed client in the MimirAuthRoundTripper
* HTTPRequestsDuration -> HTTPRequestDuration, clean up mimir client factory function
* refactor
* less git diff
* gauge for last readiness check in seconds
* initialize LastReadinesCheck to 0, tweak metric names and descriptions
* add counters for sync attempts/errors
* last config sync and last state sync timestamps (gauges)
* change latency metric name
* metric for remote Alertmanager mode
* code review comments
* move label constants to metrics package
* (WIP) Alerting: Use the forked Alertmanager for remote secondary mode
* fall back to using internal AM in case of error
* remove TODOs, clean up .ini file, add orgId as part of remote AM config struct
* log warnings and errors, fall back to remoteSecondary, fall back to internal AM only
* extract logic to decide remote Alertmanager mode to a separate function, switch on mode
* tests
* make linter happy
* remove func to decide remote Alertmanager mode
* refactor factory function and options
* add default case to switch statement
* remove ineffectual assignment
* Alerting: Introduce a Mimir client as part of the Remote Alertmanager
Mimir client that understands the new APIs developed for mimir. Very much a WIP still.
* more wip
* appease the linter
* more linting
* add more code
* get state from kvstore, encode, send
* send state to the remote Alertmanager, extract fullstate logic into its own function
* pass kvstore to remote.NewAlertmanager()
* refactor
* add fake kvstore to tests
* tests
* use FileStore to get state
* always log 'completed state upload'
* refactor compareRemoteConfig
* base64-encode the state in the file store
* export silences and nflog filenames, refactor
* log 'completed state/config upload...' regardless of outcome
* add values to the state store in tests
* address code review comments
* log error from filestore
---------
Co-authored-by: gotjosh <josue.abreu@gmail.com>
* Alerting: Add GetFullState method to FileStore
* make tests compile, create stateStore in NewAlertmanager
* return errors instead of logging, accept an arbitrary number of strings
* make NewAlertmanager() accept a stateStore
* Alerting: Add an empty Forked Alertmanager
* Alerting: Add methods for silences to the forked Alertmanager
* check for errors in tests
* make linter happy
* make linter happy
* Alerting: Add methods for silences to the forked Alertmanager
* Alerting: Move `ExternalAlertmanager` to its own package
We'll avoid import cycles when using components from other packages. In addition to that, I've created an `Options` approach for the multiorg alertmanger to allow us to override how per tenant alertmanagers are created.
* switch things around
* address review comments
* fix references and warnings
* Alerting: post alerts to the remote Alertmanager and fetch them
* fix broken tests
* Alerting: Add Mimir Backend image to devenv (blocks)
* add alerting as code owner for mimir_backend block
* Alerting: Use Mimir image to run integration tests for the remote Alertmanager
* skip integration test when running all tests
* skipping integration test when no Alertmanager URL is provided
* fix bad host for mimir_backend
* remove basic auth testing until we have an nginx image in our CI
* add integration tests for alerts
* fix tests
* change SendCtx -> Send, add context.Context to Send, fix CI
* add reover() for functions from the Prometheus Alertmanager HTTP client that could panic
* add TODO to implement PutAlerts in a way that mimicks what Prometheus does
* fix log format
* Alerting: Use Mimir image to run integration tests for the remote Alertmanager
* skip integration test when running all tests
* skipping integration test when no Alertmanager URL is provided
* fix bad host for mimir_backend
* remove basic auth testing until we have an nginx image in our CI
This PR replaces the vendored models in the migration with their equivalent ngalert models. It also replaces the raw SQL selects and inserts with service calls.
It also fills in some gaps in the testing suite around:
- Migration of alert rules: verifying that the actual data model (queries, conditions) are correct 9a7cfa9
- Secure settings migration: verifying that secure fields remain encrypted for all available notifiers and certain fields migrate from plain text to encrypted secure settings correctly e7d3993
Replacing the checks for custom dashboard ACLs will be replaced in a separate targeted PR as it will be complex enough alone.
* Alerting: Manage remote Alertmanager silences
* fix typo
* check errors when encoding json in fake external AM
* take path from configured URL, check for nil responses
* make discord url secure
* support migrating unsecure settings to secure settings
* Update public/app/features/alerting/unified/utils/receiver-form.ts
Co-authored-by: William Wernert <william.wernert@grafana.com>
---------
Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
Co-authored-by: William Wernert <william.wernert@grafana.com>
* Alerting: Fix contact point testing with secure settings
Fixes double encryption of secure settings during contact point testing and removes code duplication
that helped cause the drift between alertmanager and test endpoint. Also adds integration tests to cover
the regression.
Note: provisioningStore is created to remove cycle and the unnecessary dependency.
* Alerting: Make ApplyAlertmanagerConfiguration only decrypt/encrypt new/changed secure settings
Previously, ApplyAlertmanagerConfiguration would decrypt and re-encrypt all secure settings. However, this caused re-encrypted secure settings to be included in the raw configuration when applied to the embedded alertmanager, resulting in changes to the hash. Consequently, even if no actual modifications were made, saving any alertmanager configuration triggered an apply/restart and created a new historical entry in the database.
To address the issue, this modifies ApplyAlertmanagerConfiguration, which is called by POST `api/alertmanager/grafana/config/api/v1/alerts`, to decrypt and re-encrypt only new and updated secure settings. Unchanged secure settings are loaded directly from the database without alteration.
We determine whether secure settings have changed based on the following (already in-use) assumption: Only new or updated secure settings are provided via the POST `api/alertmanager/grafana/config/api/v1/alerts` request, while existing unchanged settings are omitted.
* Ensure saving a grafana-managed contact point will only send new/changed secure settings
Previously, when saving a grafana-managed contact point, empty string values were transmitted for all unset secure settings. This led to potential backend issues, as it assumed that only newly added or updated secure settings would be provided.
To address this, we now exclude empty ('', null, undefined) secure settings, unless there was a pre-existing entry in secureFields for that specific setting. In essence, this means we only transmit an empty secure setting if a previously configured value was cleared.
* Fix linting
* refactor omitEmptyUnlessExisting
* fixup
---------
Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
* Alerting: Repurpose rule testing endpoint to return potential alerts
This feature replaces the existing no-longer in-use grafana ruler testing API endpoint /api/v1/rule/test/grafana. The new endpoint returns a list of potential alerts created by the given alert rule, including built-in + interpolated labels and annotations.
The key priority of this endpoint is that it is intended to be as true as possible to what would be generated by the ruler except that the resulting alerts are not filtered to only Resolved / Firing and ready to be sent.
This means that the endpoint will, among other things:
- Attach static annotations and labels from the rule configuration to the alert instances.
- Attach dynamic annotations from the datasource to the alert instances.
- Attach built-in labels and annotations created by the Grafana Ruler (such as alertname and grafana_folder) to the alert instances.
- Interpolate templated annotations / labels and accept allowed template functions.
* Alerting: Fix provisioned templates being ignored by alertmanager
Template provisioning sets the template in cfg.TemplateFiles while a recent change
made it so that alertmanager reads cfg.AlertmanagerConfig.Templates instead.
This change fixes the issue on both ends, by having provisioning set boths fields and
reverts the change on the alertmanager side so that it uses cfg.TemplateFiles.