Some refactoring that will simplify next changes for dry-run PRs. This should be no-op as far as the created ngalert resources and database state, though it does change some logs.
The key change here is to modify migrateOrg to return pairs of legacy struct + ngalert struct instead of actually persisting the alerts and alertmanager config. This will allow us to capture error information during dry-run migration.
It also moves most persistence-related operations such as title deduplication and folder creation to the right before we persist. This will simplify eventual partial migrations (individual alerts, dashboards, channels, ...).
Additionally it changes channel code to deal with PostableGrafanaReceiver instead of PostableApiReceiver (integration instead of contact point).
* In migration, create one label per channel
This PR changes how routing is done by the legacy alerting migration.
Previously, we created a single label on each alert rule that contained an array of contact point names. Ex: __contact__="slack legacy testing","slack legacy testing2"
This label was then routed against a series of regex-matching policies with continue=true. Ex: __contacts__ =~ .*"slack legacy testing".*
In the case of many contact points, this array could quickly become difficult to manage and difficult to grok at-a-glance.
This PR replaces the single __contact__ label with multiple __legacy_c_{contactname}__ labels and simple equality-matching policies. These channel-specific policies are nested in a single route under the top-level route which matches against __legacy_use_channels__ = true for ease of organization.
This should improve the experience for users wanting to keep the default migrated routing strategy but who also want to modify which contact points an alert sends to.
* Alerting: In migration, fallback to '1s' for malformed min interval
During legacy migration, when we encounter an alert datasource query
with a min interval (interval field in the query model) that is not
parseable, instead of failing the migration we fallback to a min interval
of 1s and continue.
The reason for this is a bug in legacy alerting (existing for a few major
versions) which allows arbitrary dashboard variables to be used as the
min interval, even though those variables do not work and will cause
the legacy alert to fail with `interval calculation failed: time: invalid
duration`.
This PR replaces the vendored models in the migration with their equivalent ngalert models. It also replaces the raw SQL selects and inserts with service calls.
It also fills in some gaps in the testing suite around:
- Migration of alert rules: verifying that the actual data model (queries, conditions) are correct 9a7cfa9
- Secure settings migration: verifying that secure fields remain encrypted for all available notifiers and certain fields migrate from plain text to encrypted secure settings correctly e7d3993
Replacing the checks for custom dashboard ACLs will be replaced in a separate targeted PR as it will be complex enough alone.