Commit Graph

35 Commits

Author SHA1 Message Date
Yuri Tseretyan
1eebd2a4de
Alerting: Support for simplified notification settings in rule API (#81011)
* Add notification settings to storage\domain and API models. Settings are a slice to workaround XORM mapping
* Support validation of notification settings when rules are updated

* Implement route generator for Alertmanager configuration. That fetches all notification settings.
* Update multi-tenant Alertmanager to run the generator before applying the configuration.

* Add notification settings labels to state calculation
* update the Multi-tenant Alertmanager to provide validation for notification settings

* update GET API so only admins can see auto-gen
2024-02-15 09:45:10 -05:00
Diego Augusto Molina
9c29e1a783
Alerting: Fix data races and improve testing (#81994)
* Alerting: fix race condition in (*ngalert/sender.ExternalAlertmanager).Run

* Chore: Fix data races when accessing members of *ngalert/state.FakeInstanceStore

* Chore: Fix data races in tests in ngalert/schedule and enable some parallel tests

* Chore: fix linters

* Chore: add TODO comment to remove loopvar once we move to Go 1.22
2024-02-14 12:45:39 -03:00
Ryan McKinley
0c6e409350
Chore: Update arrow and prometheus dependencies (#82215)
* update arrow and prometheus

* keep codeowner

* use compare

* use grafana-plugin-sdk-go v0.210.0

---------

Co-authored-by: ismail simsek <ismailsimsek09@gmail.com>
2024-02-13 01:50:25 +01:00
William Wernert
2203bc2a3d
Alerting: Refactor provisioning tests/fakes (#81205)
* Fix up test Alertmanager config JSON

* Move fake AM config and provisioning stores to fakes package
2024-01-24 17:15:55 -05:00
Santiago
1f6575e65e
Alerting: Test MOA in remote secondary mode (#79828) 2024-01-05 11:05:27 +01:00
Santiago
01add144b8
Alerting: Send alerts to the remote Alertmanager (#77034)
* Alerting: Rename remote.ExternalAlertmanager to remote.Alertmanager

* Alerting: Send alerts to the remote Alertmanager

* add ticker to readiness check, add tests

* use options when creating a new sender.ExternaAlertmanager

* unexport defaultMaxQueueCapacity

* delete unused defaultConfig field

* add debug log line when sending alerts to the remote alertmanager

* move and refactor readiness check

* update tests to not include defaultConfig
2023-10-25 11:52:48 +02:00
Santiago
61cb26711e
Alerting: Fetch alerts from a remote Alertmanager (#75844)
* Alerting: post alerts to the remote Alertmanager and fetch them

* fix broken tests

* Alerting: Add Mimir Backend image to devenv (blocks)

* add alerting as code owner for mimir_backend block

* Alerting: Use Mimir image to run integration tests for the remote Alertmanager

* skip integration test when running all tests

* skipping integration test when no Alertmanager URL is provided

* fix bad host for mimir_backend

* remove basic auth testing until we have an nginx image in our CI

* add integration tests for alerts

* fix tests

* change SendCtx -> Send, add context.Context to Send, fix CI

* add reover() for functions from the Prometheus Alertmanager HTTP client that could panic

* add TODO to implement PutAlerts in a way that mimicks what Prometheus does

* fix log format
2023-10-19 11:27:37 +02:00
Matthew Jacobson
82f3127e23
Alerting: Move legacy alert migration from sqlstore migration to service (#72702) 2023-10-12 13:43:10 +01:00
Alexander Weaver
f6649d7a97
Revert "Alerting: Remove vendored models in migration service" (#76387)
Revert "Alerting: Remove vendored models in migration service (#74503)"

This reverts commit 6a8649d544.
2023-10-11 14:21:21 -05:00
Matthew Jacobson
6a8649d544
Alerting: Remove vendored models in migration service (#74503)
This PR replaces the vendored models in the migration with their equivalent ngalert models. It also replaces the raw SQL selects and inserts with service calls.

It also fills in some gaps in the testing suite around:

    - Migration of alert rules: verifying that the actual data model (queries, conditions) are correct 9a7cfa9
    - Secure settings migration: verifying that secure fields remain encrypted for all available notifiers and certain fields migrate from plain text to encrypted secure settings correctly e7d3993

Replacing the checks for custom dashboard ACLs will be replaced in a separate targeted PR as it will be complex enough alone.
2023-10-11 17:22:09 +01:00
Ryan McKinley
025b2f3011
Chore: use any rather than interface{} (#74066) 2023-08-30 18:46:47 +03:00
guangwu
bbe4b0d3de
chore: remove refs to deprecated io/ioutil (#70300) 2023-06-22 12:19:23 +02:00
Emil Tullstedt
23a9963507
Chore: Upgrade Prometheus to 2.43.0 (#67853)
- github.com/prometheus/prometheus => 2.43.0 (aka 0.43.0)
- github.com/prometheus/client_golang => 1.15.1
2023-05-10 14:09:49 +02:00
Jean-Philippe Quéméner
bbce69f295
Alerting: Use configured headers for external alertmanager (#63819) 2023-04-21 16:16:27 +02:00
Steve Simpson
04336d53a9
Alerting: Update prometheus version (#65688) 2023-03-31 16:34:35 +02:00
Yuri Tseretyan
98e1aeaebd
Alerting: Fix client to external Alertmanager to correctly build URL for Mimir Alertmanager (#63676) 2023-02-23 13:55:26 -05:00
suntala
49b3027049
Chore: Remove Result field from datasources (#63048)
* Remove Result field from AddDataSourceCommand
* Remove DatasourcesPermissionFilterQuery Result
* Remove GetDataSourceQuery Result
* Remove GetDataSourcesByTypeQuery Result
* Remove GetDataSourcesQuery Result
* Remove GetDefaultDataSourceQuery Result
* Remove UpdateDataSourceCommand Result
2023-02-09 15:49:44 +01:00
Santiago
ba731f7865
Alerting: Mark AM configuration as applied (#61330)
* Mark AM configuration as applied

* add missing checks, make linter happy

* fix deadlock, mark as valid on save and on load

* mark configurations only if needed

* check error after applyConfig()

* code review comments

* code review changes

* more code review changes

* clean HistoricConfigFromAlertConfig function
2023-02-02 14:45:17 -03:00
idafurjes
23c27cffb3
Chore: Rename Id to ID in alerting models (#62777)
* Chore: Rename Id to ID in alerting models

* Add xorm tags for datasource

* Add xorm tag for uid
2023-02-02 17:22:43 +01:00
Serge Zaitsev
d6d4097567
Chore: Fix goimports grouping in alerting (#62424)
* fix goimports

* fix goimports order
2023-01-30 09:55:35 +01:00
Joe Blubaugh
1a8d0e2736
Alerting: Speed up unit and integration tests. (#60067)
This change marks tests in the `sender` package that use an external
process as integration tests instead of unit tests. This speeds up the
package's unit tests by about 20 seconds.

This change also reduces the number of alert instances in the `store`
package's bulk write integration test from 20_000 to 10_000. This is
still enough to exercise the bulk-write code but speeds up the package
tests from about 250s to 130s.

Put together, integration tests go to about 160s while also speeding up
unit tests by 20s.
2022-12-12 14:21:06 +08:00
Alex Moreno
45facbba11
Alerting: Remove url based external alertmanagers config (#57918)
* Remove URL-based alertmanagers from endpoint config

* WIP

* Add migration and alertmanagers from admin_configuration

* Empty comment removed

* set BasicAuth true when user is present in url

* Remove Alertmanagers from GET /admin_config payload

* Remove URL-based alertmanager configuration from UI

* Fix new uid generation in external alertmanagers migration

* Fix tests for URL-based external alertmanagers

* Fix API tests

* Add more tests, move migration code to separate file, and remove possible am duplicate urls

* Fix edge cases in migration

* Fix imports

* Remove useless fields and fix created_at/updated_at retrieval

Co-authored-by: George Robinson <george.robinson@grafana.com>
Co-authored-by: Konrad Lalik <konrad.lalik@grafana.com>
2022-11-10 16:34:13 +01:00
Alexander Weaver
5ee4744d62
Alerting: Improve operational logs in sender package (#57134)
* Audit logs in sender package

* Fix casing and touch up a few key names

* Avoid logging entire alert struct

* Log configuration ID being applied

* Revert change to errorf rather than log

* Tune levels further and remove some redundancies

* Adjust logger naming and standardize log context

* Adjust logger naming in router

* Move log and get rid of dead error handling code
2022-10-20 14:19:04 -05:00
Alexander Weaver
3ddb28bad9
Find-and-replace 'err' logs to 'error' to match log search conventions (#57309) 2022-10-19 17:36:54 -04:00
Matthew Jacobson
940d18ad57
Alerting: Sanitize invalid label/annotation names for external alertmanagers (#54537)
* Alerting: Sanitize invalid label/annotation names for external alertmanagers

Grafana's built-in Alertmanager supports both Unicode label keys and values; however, if using an external
Prometheus Alertmanager label keys must be compatible with their data model.
This means label keys must only contain ASCII letters, numbers, as well as underscores and match the regex
`[a-zA-Z_][a-zA-Z0-9_]*`.

Any invalid characters will now be removed or replaced by the Grafana alerting engine before being sent to
the external Alertmanager according to the following rules:

- `Whitespace` will be removed.
- `ASCII characters` will be replaced with `_`.
- `All other characters` will be replaced with their lower-case hex representation.

* Prefix hex replacements with `0x`

* Refactor for clarity

* Apply suggestions from code review

Co-authored-by: brendamuir <100768211+brendamuir@users.noreply.github.com>

Co-authored-by: brendamuir <100768211+brendamuir@users.noreply.github.com>
2022-09-07 11:39:39 -04:00
Santiago
4fad827acd
Alerting: log external alertmanager URLs #54127 2022-08-24 13:52:39 -04:00
Jo
062d255124
Handle ioutil deprecations (#53526)
* replace ioutil.ReadFile -> os.ReadFile

* replace ioutil.ReadAll -> io.ReadAll

* replace ioutil.TempFile -> os.CreateTemp

* replace ioutil.NopCloser -> io.NopCloser

* replace ioutil.WriteFile -> os.WriteFile

* replace ioutil.TempDir -> os.MkdirTemp

* replace ioutil.Discard -> io.Discard
2022-08-10 15:37:51 +02:00
Konrad Lalik
54f2c056f5
Alerting: Configure alert manager data source as an external AM (#52081)
Co-authored-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: brendamuir <100768211+brendamuir@users.noreply.github.com>
2022-08-01 10:20:43 +02:00
Jean-Philippe Quéméner
50ae42130b
Alerting: take datasources as external alertmanagers into consideration (#52534) 2022-07-20 16:50:49 +02:00
Yuriy Tseretyan
79d92aa03e
Alerting: Rename sender.Sender to sender.ExternalAlertmanagers (#52463) 2022-07-19 14:04:48 -04:00
Yuriy Tseretyan
054fe54b03
Alerting: Split Scheduler and AlertRouter tests (#52416)
* move fake FakeExternalAlertmanager to sender package
* move tests from scheduler to router
* update alerts router to have all fields private
* update scheduler tests to use sender mock
2022-07-19 09:32:54 -04:00
Yuriy Tseretyan
a6b1090879
Alerting: refactor scheduler and separate notification logic (#48144)
* Introduce AlertsRouter in the sender package, and move all fields and methods related to notifications out of the scheduler to this router.
* Introduce a new interface AlertsSender in the schedule package and replace calls of anonymous function `notify` inside the ruleRoutine to calling methods of that interface.
* Rename interface Scheduler in api package to ExternalAlertmanagerProvider, and replace scheduler with AlertRouter as struct that implements the interface.
2022-07-12 15:13:04 -04:00
Yuriy Tseretyan
ea478dec22
Alerting: Remove bridge between log15 and go-kit logger (#43769)
* remove bridge between log15 and go-kit logger.

* fix tests
2022-01-07 09:40:09 +01:00
gotjosh
a2f4344bf2
Alerting: Refactor & fix unified alerting metrics structure (#39151)
* Alerting: Refactor & fix unified alerting metrics structure

Fixes and refactors the metrics structure we have for the ngalert service. Now, each component has its own metric struct that includes the JUST the metrics it uses. Additionally, I have fixed the configuration metrics and added new metrics to determine if we have discovered and started all the necessary configurations of an instance.

This allows us to alert on `grafana_alerting_discovered_configurations - grafana_alerting_active_configurations != 0` to know whether an alertmanager instance did not start successfully.
2021-09-14 12:55:01 +01:00
gotjosh
f83cd401e5
Alerting: Send alerts to external Alertmanager(s) (#37298)
* Alerting: Send alerts to external Alertmanager(s)

Within this PR we're adding support for registering or unregistering
sending to a set of external alertmanagers. A few of the things that are
going are:

- Introduce a new table to hold "admin" (either org or global)
  configuration we can change at runtime.
- A new periodic check that polls for this configuration and adjusts the
  "senders" accordingly.
- Introduces a new concept of "senders" that are responsible for
  shipping the alerts to the external Alertmanager(s). In a nutshell,
this is the Prometheus notifier (the one in charge of sending the alert)
mapped to a multi-tenant map.

There are a few code movements here and there but those are minor, I
tried to keep things intact as much as possible so that we could have an
easier diff.
2021-08-06 13:06:56 +01:00