This PR has two steps that together create a functional dry-run capability for the migration.
By enabling the feature flag alertingPreviewUpgrade when on legacy alerting it will:
a. Allow all Grafana Alerting background services except for the scheduler to start (multiorg alertmanager, state manager, routes, …).
b. Allow the UI to show Grafana Alerting pages alongside legacy ones (with appropriate in-app warnings that UA is not actually running).
c. Show a new “Alerting Upgrade” page and register associated /api/v1/upgrade endpoints that will allow the user to upgrade their organization live without restart and present a summary of the upgrade in a table.
* extract get and save operations to a alertmanagerConfigStore. this removes duplicated code in service (currently only mute timings) and improves testing
* replace generic errors with errutils one with better messages.
* update provisioning services to use new store
---------
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
* Docs: Add table data in PDF
* fix lint issues
* Switch to public preview
* Apply suggestions from code review
Co-authored-by: Isabel <76437239+imatwawana@users.noreply.github.com>
---------
Co-authored-by: Isabel <76437239+imatwawana@users.noreply.github.com>
There were a few errors that prevented these endpoints (which are the most up-to-date ones) from being present in the openapi spec:
- The `enterprise` tag excluded the endpoints from being generated
- `okRespoonse` typo
- Invalid templating on the parameters
- Missing parameter structs
Some refactoring that will simplify next changes for dry-run PRs. This should be no-op as far as the created ngalert resources and database state, though it does change some logs.
The key change here is to modify migrateOrg to return pairs of legacy struct + ngalert struct instead of actually persisting the alerts and alertmanager config. This will allow us to capture error information during dry-run migration.
It also moves most persistence-related operations such as title deduplication and folder creation to the right before we persist. This will simplify eventual partial migrations (individual alerts, dashboards, channels, ...).
Additionally it changes channel code to deal with PostableGrafanaReceiver instead of PostableApiReceiver (integration instead of contact point).
* Separate overlapping legacy and UA alerting routes
api/alert-notifiers, alerting/list, and alerting/notifications existed in both
legacy and UA.
Rename legacy route paths and nav ids to be independent of UA ones.
Backend:
* Update the Grafana Alerting engine to provide feedback to HysteresisCommand. The feedback information is stored in state.Manager as a fingerprint of each state. The fingerprint is persisted to the database. Only fingerprints that belong to Pending and Alerting states are considered as "loaded" and provided back to the command.
- add ResultFingerprint to state.State. It's different from other fingerprints we store in the state because it is calculated from the result labels.
- add rule_fingerprint column to alert_instance
- update alerting evaluator to accept AlertingResultsReader via context, and update scheduler to provide it.
- add AlertingResultsFromRuleState that implements the new interface in eval package
- update getExprRequest to patch the hysteresis command.
* Only one "Recovery Threshold" query is allowed to be used in the alert rule and it must be the Condition.
Frontend:
* Add hysteresis option to Threshold in UI. It's called "Recovery Threshold"
* Add test for getUnloadEvaluatorTypeFromCondition
* Hide hysteresis in panel expressions
* Refactor isInvalid and add test for it
* Remove unnecesary React.memo
* Add tests for updateEvaluatorConditions
---------
Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>
* merge with system settings before storing them in the db
* add base for validating sso settings
* add unit tests for sso settings validation
* call Reload() from sso service upsert()
* remove actual validation because it was moved in a separate pr
* use constant to fix lint error
* check if provider is configurable in service Upsert() method
* add unit tests for update provider settings api method
* fix lint error
* Canvas: Add Zoom
* Scale selecto components based on zoom state
* Fix pan by reverting to 3.1.0 for zoom-pan
* Update to latest library that fixes pan regression
* Add mini map to canvas pan zoom
* Fix selecto and anchors on hover
* Update naming to be more clear
* Switch back to contentComponent
* Apply transformScale to drag and resize
* Update connection source and target scaling
* Add option to display mini map
* Update yarn lock
* Revert "Update yarn lock"
This reverts commit 3d1dd65d57.
* Set yarn lock to main
* Revert "Set yarn lock to main"
This reverts commit 64bc50557e.
* Update to Yarn 4
* Add react-zoom-pan-pinch
* Update react-zoom-pan checksum
* Revert changes to json files
* Remove last line of api merged
* Remove last lines of all impacted jsons
* Update home json
* Update coordinate calc function to include scale
* Fix types in coordinate calc function
* Fix util calculation for transform
* Fix arrow anchor shift behavior
* Fix scale offset when adding elements during zoom
* Fix drag of selected group during zoom
* Add feature flag for canvas pan zoom
* Revert "Add feature flag for canvas pan zoom"
This reverts commit b026e31d8d.
* Regenerate feature flag after merge
* Apply feature flag to enable pan zoom wrappers
* Add mini map toggle behind feature flag
* Simplify minimap behavior
* Update feature flag registry
* Set minimap to false by default
* fix gen-cue
* Set toggles gen to main
Add blank line to toggle gen csv
* Add canvas pan zoom to csv
* Remove old comment
* Change ref parameter to be more descriptive
* Rename visibleFun to be more descriptive
* Consolidate transformScale transformRef in util
* Remove non-null assertion on connection parentRect
* Consolidate parentRect null coalescing into object
* Remove minimap and change toggle
* Add controls inline help for pan and zoom
* Clean up mouse events
* Pull scale out of ref and isolate transform
* Remove transform ref from scene div
* Fix context menu visible behavior
* Fix connections and update util functions
* Move transform component instance to util
* fix backend test
* minor updates
* Clean up connections / fix minor bug where offset of arrow wasn't being calculated correctly
* missed connection code cleanup
* cleanup scene code a bit more
* actually fix backend test
* move eslint disable line closer to actual issue
---------
Co-authored-by: nmarrs <nathanielmarrs@gmail.com>
* Unified Storage: Add resource from/to entity tests
* fixup
* Remove GRN
* Update tests
* truncate timestamps to account for RFC3339, set Group and GroupVersion in k8s object
* Update tests
---------
Co-authored-by: Dan Cech <dcech@grafana.com>
* (WIP) Alerting: Use the forked Alertmanager for remote secondary mode
* fall back to using internal AM in case of error
* remove TODOs, clean up .ini file, add orgId as part of remote AM config struct
* log warnings and errors, fall back to remoteSecondary, fall back to internal AM only
* extract logic to decide remote Alertmanager mode to a separate function, switch on mode
* tests
* make linter happy
* remove func to decide remote Alertmanager mode
* refactor factory function and options
* add default case to switch statement
* remove ineffectual assignment
* In migration, create one label per channel
This PR changes how routing is done by the legacy alerting migration.
Previously, we created a single label on each alert rule that contained an array of contact point names. Ex: __contact__="slack legacy testing","slack legacy testing2"
This label was then routed against a series of regex-matching policies with continue=true. Ex: __contacts__ =~ .*"slack legacy testing".*
In the case of many contact points, this array could quickly become difficult to manage and difficult to grok at-a-glance.
This PR replaces the single __contact__ label with multiple __legacy_c_{contactname}__ labels and simple equality-matching policies. These channel-specific policies are nested in a single route under the top-level route which matches against __legacy_use_channels__ = true for ease of organization.
This should improve the experience for users wanting to keep the default migrated routing strategy but who also want to modify which contact points an alert sends to.