* Alerting: post alerts to the remote Alertmanager and fetch them
* fix broken tests
* Alerting: Add Mimir Backend image to devenv (blocks)
* add alerting as code owner for mimir_backend block
* Alerting: Use Mimir image to run integration tests for the remote Alertmanager
* skip integration test when running all tests
* skipping integration test when no Alertmanager URL is provided
* fix bad host for mimir_backend
* remove basic auth testing until we have an nginx image in our CI
* add integration tests for alerts
* fix tests
* change SendCtx -> Send, add context.Context to Send, fix CI
* add reover() for functions from the Prometheus Alertmanager HTTP client that could panic
* add TODO to implement PutAlerts in a way that mimicks what Prometheus does
* fix log format
* Alerting: Use Mimir image to run integration tests for the remote Alertmanager
* skip integration test when running all tests
* skipping integration test when no Alertmanager URL is provided
* fix bad host for mimir_backend
* remove basic auth testing until we have an nginx image in our CI
* Fix migration of custom dashboard permissions
Dashboard alert permissions were determined by both its dashboard and
folder scoped permissions, while UA alert rules only have folder
scoped permissions.
This means, when migrating an alert, we'll need to decide if the parent folder
is a correct location for the newly created alert rule so that users, teams,
and org roles have the same access to it as they did in legacy.
To do this, we translate both the folder and dashboard resource
permissions to two sets of SetResourcePermissionCommands. Each of these
encapsulates a mapping of all:
OrgRoles -> Viewer/Editor/Admin
Teams -> Viewer/Editor/Admin
Users -> Viewer/Editor/Admin
When the dashboard permissions (including those inherited from the parent
folder) differ from the parent folder permissions alone, we need to create a
new folder to represent the access-level of the legacy dashboard.
Compromises:
When determining the SetResourcePermissionCommands we only take into account
managed and basic roles. Fixed and custom roles introduce significant complexity
and synchronicity hurdles. Instead, we log a warning they had the potential to
override the newly created folder permissions.
Also, we don't attempt to reconcile datasource permissions that were
not necessary in legacy alerting. Users without access to the necessary
datasources to edit an alert rule will need to obtain said access separate from
the migration.
This PR replaces the vendored models in the migration with their equivalent ngalert models. It also replaces the raw SQL selects and inserts with service calls.
It also fills in some gaps in the testing suite around:
- Migration of alert rules: verifying that the actual data model (queries, conditions) are correct 9a7cfa9
- Secure settings migration: verifying that secure fields remain encrypted for all available notifiers and certain fields migrate from plain text to encrypted secure settings correctly e7d3993
Replacing the checks for custom dashboard ACLs will be replaced in a separate targeted PR as it will be complex enough alone.
* update storage's method InstertRules to return ids of added rules as slice to keep the same order as rules in the argument
* schematize response of update rule group endpoint, add created, updated, deleted fields that contain UID of affected rules.
* update integration tests to use the new fields
* extend RuleStore interface to get namespace by UID
* add new export API endpoints
* implement request handlers
* update authorization and wire handlers to paths
* add folder error matchers to errorToResponse
* add tests for export methods
* Alerting: Expose metrics for Alertmanager Alerts
In Grafana, the alert evaluation and alert delivery are combined. We're always used a metric named `grafana_alerting_alerts` to get a sense of what are the alerts that are currently firing (these come from the evaluation side) and opted to not map the alertmanager alerts metric directly.
I think it's important that we make a disction between alerts that happen at evaluation vs alerts that are received for delivery by the internal Alertmanager as we have options to skip the delivery of these alerts to the internal alertmanager altogether.
* Migrate old alerting templates to use $labels
* Fix imports
* Add test coverage and separate rewriting to Go templates
* Fix lint
* Check for additional closing braces
* Add logging of invalid message templates
* Fix tests
* Small fixes
* Update comments
* Panic on empty token
* Use logtest.Fake
* Fix lint
* Allow for spaces in variable names by not tokenizing spaces
* Add template function to deduplicate Labels in a Value map
* Fix behavior of mapLookupString
* Reference deduplicated labels in migrated message template
* Fix behavior of deduplicateLabelsFunc
* Don't create variable for parent logger
* Add more tests for deduplicateLabelsFunc
* Remove unused function
* Apply suggestions from code review
Co-authored by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
* Give label val merge function better name
* Extract template migration and escape literal tokens
* Consolidate + simplify template migration
---------
Co-authored-by: William Wernert <william.wernert@grafana.com>
* Alerting: Manage remote Alertmanager silences
* fix typo
* check errors when encoding json in fake external AM
* take path from configured URL, check for nil responses
* Alerting: Don't use a separate collection system for metrics
The state package had a metric collection system that ran every 15s updating the values of the metrics - there is a common pattern for this in the Prometheus ecosystem called "collectors".
I have removed the behaviour of using a time-based interval to "set" the metrics in favour of a set of functions as the "value" that get called at scrape time.
* Add support for `keep_firing_for` in ruler proxy
* Don't delete `keep_firing_for` when editing a rule with the field set
Co-Authored-By: Sonia Aguilar <33540275+soniaAguilarPeiron@users.noreply.github.com>
---------
Co-authored-by: Sonia Aguilar <33540275+soniaAguilarPeiron@users.noreply.github.com>
Changes SSE to not always fail all queries when one fails. Now only the query itself, and nodes that depend on it will error.
---------
Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
* Make identity.Requester available at Context
* Clean pkg/services/guardian/guardian.go
* Clean guardian provider and guardian AC
* Clean pkg/api/team.go
* Clean ctxhandler, datasources, plugin and live
* Clean dashboards and guardian
* Implement NewUserDisplayDTOFromRequester
* Change status code numbers for http constants
* Upgrade signature of ngalert services
* log parsing errors instead of throwing error