Alerting: Remove start page of upgrade preview
Alerting upgrade page will now always show the summary table even before
upgrading any alerts or notification channels. There a few reasons for this:
- The information on the start page is redundant as it's now contained in the
documentation.
- Previously, if some unexpected issue prevented performing a full upgrade, a
user would have limited to no means to using the preview tool to help fix the
problem. This is because you could not see the summary table until the full
upgrade was performed at least once. Now, you can upgrade individual alerts and
notification channels from the beginning.
* Add notification settings to storage\domain and API models. Settings are a slice to workaround XORM mapping
* Support validation of notification settings when rules are updated
* Implement route generator for Alertmanager configuration. That fetches all notification settings.
* Update multi-tenant Alertmanager to run the generator before applying the configuration.
* Add notification settings labels to state calculation
* update the Multi-tenant Alertmanager to provide validation for notification settings
* update GET API so only admins can see auto-gen
* Add config for limit of rules per rule group
* Warn when editing big groups through normal API
* Warn on prov api writes for groups
* Wire up comp root, tests
* Also add warning to state manager warm
* Drop unnecessary conversion
* streamline initialization of test databases, support on-disk sqlite test db
* clean up test databases
* introduce testsuite helper
* use testsuite everywhere we use a test db
* update documentation
* improve error handling
* disable entity integration test until we can figure out locking error
* update GetUserVisibleNamespaces to use FolderSeriver
* update GetNamespaceByUID to use FolderService.GetFolders
* update GetAlertRulesForScheduling to use FolderService.GetFolders
* Update API and GetAlertRulesForScheduling to use the folder's full path
* get full path of folder in RouteTestGrafanaRuleConfig
* fix escaping of titles for MySQL
* Add single receiver method
* Add receiver permissions
* Add single/multi GET endpoints for receivers
* Remove stable tag from time intervals
See end of PR description here: https://github.com/grafana/grafana/pull/81672
* declare new API and models GettableTimeIntervals, PostableTimeIntervals
* add new actions alert.notifications.time-intervals:read and alert.notifications.time-intervals:write.
* update existing alerting roles with the read action. Add to all alerting roles.
* add integration tests
* Create locking config store that mimics existing provisioning store
* Rename existing receivers(_test).go
* Introduce shared receiver group service
* Fix test
* Move query model to models package
* ReceiverGroup -> Receiver
* Remove locking config store
* Move convert methods to compat.go
* Cleanup
These don't get marshalled and unmarshalled in the same way as they are represented in Go
This PR changes the OpenAPI spec to reflect what the API accepts and sends back
* ngalert openapi: Use same `basePath` as rest of Grafana
Currently, there are two issues that prevent easily merging `ngalert` and grafana openapi specs:
- The basePath is different. `grafana` has `/api` and `ngalert` has `/api/v1`. I changed `ngalert` to use `/api`
- The `ngalert` endpoints have their basePath in the each operation path. The basePath should actually be omitted
---------
Co-authored-by: Yuriy Tseretyan <yuriy.tseretyan@grafana.com>
* Change ruler API to expect the folder UID as namespace
* Update example requests
* Fix tests
* Update swagger
* Modify FIle field in /api/prometheus/grafana/api/v1/rules
* Fix ruler export
* Modify folder in responses to be formatted as <parent UID>/<title>
* Add alerting test with nested folders
* Apply suggestion from code review
* Alerting: use folder UID instead of title in rule API (#77166)
Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>
* Drop a few more latent uses of namespace_id
* move getNamespaceKey to models package
* switch GetAlertRulesForScheduling to use folder table
* update GetAlertRulesForScheduling to return folder titles in format `parent_uid/title`.
* fi tests
* add tests for GetAlertRulesForScheduling when parent uid
* fix integration tests after merge
* fix test after merge
* change format of the namespace to JSON array
this is needed for forward compatibility, when we migrate to full paths
* update EF code to decode nested folder
---------
Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
Co-authored-by: Virginia Cepeda <virginia.cepeda@grafana.com>
Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>
Co-authored-by: Alex Weaver <weaver.alex.d@gmail.com>
Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
* add get mute timing by name to MuteTimingService
* update get mute timing request handler to use the service method
* replace validation, uniqueness and used errors with errutils
* update mute timing methods return errutil responses
* use the term "time interval" in errors bevause mute timings are deprecated in Alertmanager and will be replaced by time intervals in the future.
* update create and update methods to return struct instead of pointer
* Remove FolderID from service tests
* Add models
* Add folderID pack to publicdashboard tests
* Remove folderID from dashboard tests
* Remove folderID from folders
* Remove folderID from ngalert tests
* Remove nolint comment
* Add back some tests after rebase
* Alerting: Create feature flag for alert query optimization
Adds a feature flag alertingQueryOptimization for an already existing
functionality: alert query optimization. This feature flag will now be disabled
by default.
* Alerting: Fix NoData & Error alerts not resolving when rule is reset
On rule reset, when creating the PostableAlerts StateToPostableAlert did not
attach the correct NoData/Error alertname and rulename labels to expire/resolve
the active alerts when the previous cached state was NoData/Error.
This PR has two steps that together create a functional dry-run capability for the migration.
By enabling the feature flag alertingPreviewUpgrade when on legacy alerting it will:
a. Allow all Grafana Alerting background services except for the scheduler to start (multiorg alertmanager, state manager, routes, …).
b. Allow the UI to show Grafana Alerting pages alongside legacy ones (with appropriate in-app warnings that UA is not actually running).
c. Show a new “Alerting Upgrade” page and register associated /api/v1/upgrade endpoints that will allow the user to upgrade their organization live without restart and present a summary of the upgrade in a table.
* Drop from API response
* Drop from swagger docs
* Drop from integration tests
* regenerate public swagger docs
* Drop from frontend
* Drop asserts for namespaceID field
Swagger(ngalert): Add `X-Disable-Provenance` to missing operations
I added all functions that call the `determineProvenance` function
Schema changes are from:
`make` in `pkg/services/ngalert/api/tooling`
`make swagger-clean && make openapi3-gen` in root
* Export Notification Policy correctly (#78020)
The JSON version of an exported Notification Policy now
inline correctly the policy in the same way the Yaml version
does.
Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
* ngalert `make`: Support GNU install on Darwin
Currently, the Makefile assumes that Darwin is using the Mac version of `sed`
I have the GNU version, so it failed. With this PR, it checks which version is installed
I also called `make` and there are some changes that came out of it
* swagger-gen
* Alerting: Apply query optimization to eval endpoints
Previously, query optimization was applied to alert queries when scheduled but
not when ran through `api/v1/eval` or `/api/v1/rule/test/grafana`. This could
lead to discrepancies between preview and scheduled alert results.
When running in dev mode, error messages would contain an additional "error" property alongside "message". Since this causes confusion, that has been removed and now error messages are the same both modes (using "message").
* Alerting: post alerts to the remote Alertmanager and fetch them
* fix broken tests
* Alerting: Add Mimir Backend image to devenv (blocks)
* add alerting as code owner for mimir_backend block
* Alerting: Use Mimir image to run integration tests for the remote Alertmanager
* skip integration test when running all tests
* skipping integration test when no Alertmanager URL is provided
* fix bad host for mimir_backend
* remove basic auth testing until we have an nginx image in our CI
* add integration tests for alerts
* fix tests
* change SendCtx -> Send, add context.Context to Send, fix CI
* add reover() for functions from the Prometheus Alertmanager HTTP client that could panic
* add TODO to implement PutAlerts in a way that mimicks what Prometheus does
* fix log format
This PR replaces the vendored models in the migration with their equivalent ngalert models. It also replaces the raw SQL selects and inserts with service calls.
It also fills in some gaps in the testing suite around:
- Migration of alert rules: verifying that the actual data model (queries, conditions) are correct 9a7cfa9
- Secure settings migration: verifying that secure fields remain encrypted for all available notifiers and certain fields migrate from plain text to encrypted secure settings correctly e7d3993
Replacing the checks for custom dashboard ACLs will be replaced in a separate targeted PR as it will be complex enough alone.
* update storage's method InstertRules to return ids of added rules as slice to keep the same order as rules in the argument
* schematize response of update rule group endpoint, add created, updated, deleted fields that contain UID of affected rules.
* update integration tests to use the new fields
* extend RuleStore interface to get namespace by UID
* add new export API endpoints
* implement request handlers
* update authorization and wire handlers to paths
* add folder error matchers to errorToResponse
* add tests for export methods
* Alerting: Manage remote Alertmanager silences
* fix typo
* check errors when encoding json in fake external AM
* take path from configured URL, check for nil responses
* Add support for `keep_firing_for` in ruler proxy
* Don't delete `keep_firing_for` when editing a rule with the field set
Co-Authored-By: Sonia Aguilar <33540275+soniaAguilarPeiron@users.noreply.github.com>
---------
Co-authored-by: Sonia Aguilar <33540275+soniaAguilarPeiron@users.noreply.github.com>
* Make identity.Requester available at Context
* Clean pkg/services/guardian/guardian.go
* Clean guardian provider and guardian AC
* Clean pkg/api/team.go
* Clean ctxhandler, datasources, plugin and live
* Clean dashboards and guardian
* Implement NewUserDisplayDTOFromRequester
* Change status code numbers for http constants
* Upgrade signature of ngalert services
* log parsing errors instead of throwing error
* Make identity.Requester available at Context
* Clean pkg/services/guardian/guardian.go
* Clean guardian provider and guardian AC
* Clean pkg/api/team.go
* Clean ctxhandler, datasources, plugin and live
* Question: what to do with the UserDisplayDTO?
* Clean dashboards and guardian
* Remove identity.Requester from ReqContext
* Implement NewUserDisplayDTOFromRequester
* Fix tests
* Change status code numbers for http constants
* Upgrade signature of ngalert services
* log parsing errors instead of throwing error
* Fix tests and add logs
* linting
* add metrics and tracing to state manager
* propagate tracer to state manager
* add scheduler metrics
* fix backtesting
* add test for state metrics
* remove StateUpdateCount
* update docs
* metrics can be null
* add tracer to new tests
* introduce a new action "alert.provisioning.secrets:read" and role "fixed:alerting.provisioning.secrets:reader"
* update alerting API authorization layer to let the user read provisioning with the new action
* let new action use decrypt flag
* add action and role to docs
* Alerting: Fix contact point testing with secure settings
Fixes double encryption of secure settings during contact point testing and removes code duplication
that helped cause the drift between alertmanager and test endpoint. Also adds integration tests to cover
the regression.
Note: provisioningStore is created to remove cycle and the unnecessary dependency.
* Alerting: Make ApplyAlertmanagerConfiguration only decrypt/encrypt new/changed secure settings
Previously, ApplyAlertmanagerConfiguration would decrypt and re-encrypt all secure settings. However, this caused re-encrypted secure settings to be included in the raw configuration when applied to the embedded alertmanager, resulting in changes to the hash. Consequently, even if no actual modifications were made, saving any alertmanager configuration triggered an apply/restart and created a new historical entry in the database.
To address the issue, this modifies ApplyAlertmanagerConfiguration, which is called by POST `api/alertmanager/grafana/config/api/v1/alerts`, to decrypt and re-encrypt only new and updated secure settings. Unchanged secure settings are loaded directly from the database without alteration.
We determine whether secure settings have changed based on the following (already in-use) assumption: Only new or updated secure settings are provided via the POST `api/alertmanager/grafana/config/api/v1/alerts` request, while existing unchanged settings are omitted.
* Ensure saving a grafana-managed contact point will only send new/changed secure settings
Previously, when saving a grafana-managed contact point, empty string values were transmitted for all unset secure settings. This led to potential backend issues, as it assumed that only newly added or updated secure settings would be provided.
To address this, we now exclude empty ('', null, undefined) secure settings, unless there was a pre-existing entry in secureFields for that specific setting. In essence, this means we only transmit an empty secure setting if a previously configured value was cleared.
* Fix linting
* refactor omitEmptyUnlessExisting
* fixup
---------
Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
* Add limit query parameter
* Drop copy paste comment
* Extend history query limit to 30 days and 250 entries
* Fix history log entries ordering
* Update no history message, add empty history test
---------
Co-authored-by: Konrad Lalik <konrad.lalik@grafana.com>
This commit adds support for concurrent queries when saving alert
instances to the database. This is an experimental feature in
response to some customers experiencing delays between rule evaluation
and sending alerts to Alertmanager, resulting in flapping. It is
disabled by default.
* replace condition validation with just structural validation
* validate conditions of only new and updated rules
* add integration tests for rule update\delete API
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Alerting: Repurpose rule testing endpoint to return potential alerts
This feature replaces the existing no-longer in-use grafana ruler testing API endpoint /api/v1/rule/test/grafana. The new endpoint returns a list of potential alerts created by the given alert rule, including built-in + interpolated labels and annotations.
The key priority of this endpoint is that it is intended to be as true as possible to what would be generated by the ruler except that the resulting alerts are not filtered to only Resolved / Firing and ready to be sent.
This means that the endpoint will, among other things:
- Attach static annotations and labels from the rule configuration to the alert instances.
- Attach dynamic annotations from the datasource to the alert instances.
- Attach built-in labels and annotations created by the Grafana Ruler (such as alertname and grafana_folder) to the alert instances.
- Interpolate templated annotations / labels and accept allowed template functions.