grafana

mirror of https://github.com/grafana/grafana.git synced 2025-02-25 18:55:37 -06:00

Author	SHA1	Message	Date
Matthew Jacobson	71e70c424f	Alerting: During legacy migration reduce the number of created silences (#78505 ) * Alerting: During legacy migration reduce the number of created silences During legacy migration every migrated rule was given a label rule_uid=<uid>. This was used to silence DatasourceError/DatasourceNoData alerts for migrated rules that had either ExecutionErrorState/NoDataState set to keep_state, respectively. This could potentially create a large amount of silences and a high cardinality label. Both of these scenarios have poor outcomes for CPU load and latency in unified alerting. Instead, this change creates one label per ExecutionErrorState/NoDataState when they are set to keep_state as well as two silence rules, if rules with said labels were created during migration. These silence rules are: - __legacy_silence_error_keep_state__ = true - __legacy_silence_nodata_keep_state__ = true This will drastically reduce the number of created silence rules in most cases as well as not create the potentially high cardinality label `rule_uid`.	2024-01-24 15:56:19 -05:00
Marcus Efraimsson	6768c6c059	Chore: Remove public vars in setting package (#81018 ) Removes the public variable setting.SecretKey plus some other ones. Introduces some new functions for creating setting.Cfg.	2024-01-23 12:36:22 +01:00
idafurjes	cb419e799b	Remove folderid service test (#80433 ) * Remove FolderID from service tests * Add models * Add folderID pack to publicdashboard tests * Remove folderID from dashboard tests * Remove folderID from folders * Remove folderID from ngalert tests * Remove nolint comment * Add back some tests after rebase	2024-01-12 16:43:39 +01:00
Matthew Jacobson	aa03b8f8a7	Alerting: Guided legacy alerting upgrade dry-run (#80071 ) This PR has two steps that together create a functional dry-run capability for the migration. By enabling the feature flag alertingPreviewUpgrade when on legacy alerting it will: a. Allow all Grafana Alerting background services except for the scheduler to start (multiorg alertmanager, state manager, routes, …). b. Allow the UI to show Grafana Alerting pages alongside legacy ones (with appropriate in-app warnings that UA is not actually running). c. Show a new “Alerting Upgrade” page and register associated /api/v1/upgrade endpoints that will allow the user to upgrade their organization live without restart and present a summary of the upgrade in a table.	2024-01-05 18:19:12 -05:00
Matthew Jacobson	3537c5440f	Alerting: Refactor migration to return pairs of legacy and upgraded structs (#79719 ) Some refactoring that will simplify next changes for dry-run PRs. This should be no-op as far as the created ngalert resources and database state, though it does change some logs. The key change here is to modify migrateOrg to return pairs of legacy struct + ngalert struct instead of actually persisting the alerts and alertmanager config. This will allow us to capture error information during dry-run migration. It also moves most persistence-related operations such as title deduplication and folder creation to the right before we persist. This will simplify eventual partial migrations (individual alerts, dashboards, channels, ...). Additionally it changes channel code to deal with PostableGrafanaReceiver instead of PostableApiReceiver (integration instead of contact point).	2024-01-05 05:37:13 -05:00
Matthew Jacobson	0424d44b39	Alerting: In migration, create one label per channel (#76527 ) * In migration, create one label per channel This PR changes how routing is done by the legacy alerting migration. Previously, we created a single label on each alert rule that contained an array of contact point names. Ex: __contacts__="slack legacy testing","slack legacy testing2" This label was then routed against a series of regex-matching policies with continue=true. Ex: __contacts__ =~ .*"slack legacy testing".* In the case of many contact points, this array could quickly become difficult to manage and difficult to grok at-a-glance. This PR replaces the single __contacts__ label with multiple __legacy_c_{contactname}__ labels and simple equality-matching policies. These channel-specific policies are nested in a single route under the top-level route which matches against __legacy_use_channels__ = true for ease of organization. This should improve the experience for users wanting to keep the default migrated routing strategy but who also want to modify which contact points an alert sends to.	2023-12-19 13:25:13 -05:00
Alexander Zobnin	959ebf82da	Folders: Show dashboards and folders with directly assigned permissions in "Shared" folder (#78465 ) * Folders: Show folders user has access to at the root level * Refactor * Refactor * Hide parent folders user has no access to * Skip expensive computation if possible * Fix tests * Fix potential nil access * Fix duplicated folders * Fix linter error * Fix querying folders if no managed permissions set * Update benchmark * Add special shared with me folder and fetch available non-root folders on demand * Fix parents query * Improve db query for folders * Reset benchmark changes * Fix permissions for shared with me folder * Simplify dedup * Add option to include shared folder permission to user's permissions * Fix nil UID * Remove duplicated folders from shared list * Folders: Fix fetching empty folder * Nested folders: Show dashboards with directly assigned permissions * Fix slow dashboards fetch * Refactor * Fix cycle dependencies * Move shared folder to models * Fix shared folder links * Refactor * Use feature flag for permissions * Use feature flag * Review comments * Expose shared folder UID through frontend settings * Add frontend type for sharedWithMeFolderUID option * Refactor: apply review suggestions * Fix parent uid for shared folder * Fix listing shared dashboards for users with access to all folders * Prevent creating folder with "shared" UID * Add tests for shared folders * Add test for shared dashboards * Fix linter * Add metrics for shared with me folder * Add metrics for shared with me dashboards * Fix tests * Tests: add metrics as a dependency * Fix access control metadata for shared with me folder * Use constant for shared with me * Optimize parent folders access check, fetch all folders in one query. * Use labels for metrics	2023-12-05 16:13:31 +01:00
Sofia Papagiannaki	6d4625ad52	Alerting: Fix deleting rules in a folder with matching UID in another organization (#78258 ) * Remove usage of obsolete function for deleting alert rules under folder * Apply suggestion from code review * Update tests	2023-12-04 11:34:38 +02:00
Matthew Jacobson	5a80962de9	Alerting: Add clean_upgrade config and deprecate force_migration (#78324 ) * Alerting: Add clean_upgrade config and deprecate force_migration Upgrading to UA and rolling back will no longer delete any data by default. Instead, each set of tables will remain unchanged when switching between legacy and UA. As such, the force_migration config has been deprecated and no extra configuration is required to roll back to legacy anymore. If clean_upgrade is set to true when upgrading from legacy alerting to Unified Alerting, grafana will first delete all existing Unified Alerting resources, thus re-upgrading all organizations from scratch. If false or unset, organizations that have previously upgraded will not lose their existing Unified Alerting data when switching between legacy and Unified Alerting. Similar to force_migration, it should be kept false when not needed as it may cause unintended data-loss if left enabled. --------- Co-authored-by: Christopher Moyer <35463610+chri2547@users.noreply.github.com>	2023-11-30 11:01:11 -05:00
Matthew Jacobson	cdad712547	Alerting: Keep track of individual org migration status (#78369 ) * Alerting: Keep track of individual org migration status Save migration status per migrated org. Change the meaning (and key/value) of the org_id=0 entry to store the current (previous) config value used by alerting. This is so we can know when to upgrade/downgrade by comparing with the new config value in UnifiedAlerting.IsEnabled.	2023-11-30 10:25:59 -05:00
Matthew Jacobson	2b51f0e263	Alerting: In migration improve deduplication of title and group (#78351 ) * Alerting: In migration improve deduplication of title and group This change improves alert titles generated in the legacy migration that occur when we need to deduplicate titles. Now when duplicate titles are detected we will first attempt to append a sequential index, falling back to a random uid if none are unique within 10 attempts. This should cause shorter and more easily readable deduplicated titles in most cases. In addition, groups are no longer deduplicated. Instead we set them to a combination of truncated dashboard name and humanized alert frequency. This way, alerts from the same dashboard share a group if they have the same evaluation interval. In the event that truncation causes overlap, it won't be a big issue as all alerts will still be in a group with the correct evaluation interval.	2023-11-29 10:05:00 -05:00
Matthew Jacobson	4b439b7f52	Alerting: In migration, fallback to '1s' for malformed min interval (#78614 ) * Alerting: In migration, fallback to '1s' for malformed min interval During legacy migration, when we encounter an alert datasource query with a min interval (interval field in the query model) that is not parseable, instead of failing the migration we fall back to a min interval of 1s and continue. The reason for this is a bug in legacy alerting (existing for a few major versions) which allows arbitrary dashboard variables to be used as the min interval, even though those variables do not work and will cause the legacy alert to fail with `interval calculation failed: time: invalid duration`.	2023-11-24 11:27:44 -05:00
Jean-Philippe Quéméner	11d4f604f5	fix(alerting): proper handling for queries with multiple conditions in migration (#78591 ) fix(alerting): proper handling for queries with multiple conditions	2023-11-23 18:05:44 +01:00
Jo	0de66a8099	Authz: Remove use of SignedInUser copy for permission evaluation (#78448 ) * remove use of SignedInUserCopies * add extra safety to not cross assign permissions unwind circular dependency dashboardacl->dashboardaccess fix missing import * correctly set teams for permissions * fix missing inits * nit: check err * exit early for api keys	2023-11-22 14:20:22 +01:00
Kat Yang	2f2ce3edbb	Chore: Deprecate ID from Folder (#78281 ) * Chore: Deprecate ID from Folder * chore: add more linter comments * chore: add missing lint comment	2023-11-20 15:44:51 -05:00
Jean-Philippe Quéméner	2d2e058563	refactor: use constant for prometheus datasource type (#78287 )	2023-11-17 01:07:35 +01:00
Kat Yang	3a2e96b0db	Chore: Deprecate FolderID from Dashboard (#77823 ) * Chore: Deprecate FolderID from Dashboard * chore: add two missing nolint comments	2023-11-15 10:28:50 -05:00
Ryan McKinley	f69fd3726b	FeatureToggles: Add context and an explicit global check (#78081 )	2023-11-14 12:50:27 -08:00
Ryan McKinley	dec9a07738	Settings: Actually deprecate access to feature flags (#78073 )	2023-11-13 11:39:01 -08:00
Ryan McKinley	3509a5abb9	FeatureFlags: Cleanup usage of cfg.IsFeatureToggleEnabled (#78014 )	2023-11-13 07:55:15 -08:00
William Wernert	e562250f72	Alerting: Handle edge cases without panicking during template migration (#76890 ) * Handle empty variable, remove panics * Use fmt.Errorf only where appropriate	2023-11-02 13:24:54 -04:00
Matthew Jacobson	c2efcdde09	Alerting: Fix flaky SQLITE_BUSY when migrating with provisioned dashboards (#76658 ) * Alerting: Move migration from background service run to ngalert init sqlite database write contention between the migration's single transaction and dashboard provisioning's frequent commits was causing the migration to fail with SQLITE_BUSY/SQLITE_BUSY_SNAPSHOT on all retries. This is not a new issue for sqlite+grafana, but the discrepancy between the length of the transactions was causing it to be very consistent. In addition, since a failed migration has implications on the assumed correctness of the alertmanager and alert rule definition state, we cause a server shutdown on error. This can make e2e tests as well as some high-load provisioned sqlite installations flaky on startup. The correct fix for this is better transaction management across various services and is out of scope for this change as we're primarily interested in mitigating the current bout of server failures in e2e tests when using sqlite.	2023-10-19 10:03:00 -04:00
Torkel Ödegaard	0d55dad075	DashboardScene: Fixes full page reload of fullscreen view of a repeated panel (#76326 ) * Progress on view panel for repeats * Good enough * Update	2023-10-13 16:03:38 +02:00
Matthew Jacobson	a6d928e50e	Alerting: Prevent cleanup of non-empty folders on migration revert (#76439 ) Prevent cleanup of non-empty folders on revert	2023-10-12 18:40:51 -04:00
Matthew Jacobson	5f48619c9a	Alerting: Handle custom dashboard permissions in migration service (#74504 ) * Fix migration of custom dashboard permissions Dashboard alert permissions were determined by both its dashboard and folder scoped permissions, while UA alert rules only have folder scoped permissions. This means, when migrating an alert, we'll need to decide if the parent folder is a correct location for the newly created alert rule so that users, teams, and org roles have the same access to it as they did in legacy. To do this, we translate both the folder and dashboard resource permissions to two sets of SetResourcePermissionCommands. Each of these encapsulates a mapping of all: OrgRoles -> Viewer/Editor/Admin; Teams -> Viewer/Editor/Admin; Users -> Viewer/Editor/Admin. When the dashboard permissions (including those inherited from the parent folder) differ from the parent folder permissions alone, we need to create a new folder to represent the access-level of the legacy dashboard. Compromises: When determining the SetResourcePermissionCommands we only take into account managed and basic roles. Fixed and custom roles introduce significant complexity and synchronicity hurdles. Instead, we log a warning that they have the potential to override the newly created folder permissions. Also, we don't attempt to reconcile datasource permissions that were not necessary in legacy alerting. Users without access to the necessary datasources to edit an alert rule will need to obtain said access separately from the migration.	2023-10-12 18:12:40 -04:00
Matthew Jacobson	82f3127e23	Alerting: Move legacy alert migration from sqlstore migration to service (#72702 )	2023-10-12 13:43:10 +01:00
Alexander Weaver	f6649d7a97	Revert "Alerting: Remove vendored models in migration service" (#76387 ) Revert "Alerting: Remove vendored models in migration service (#74503)" This reverts commit `6a8649d544`.	2023-10-11 14:21:21 -05:00
Matthew Jacobson	6a8649d544	Alerting: Remove vendored models in migration service (#74503 ) This PR replaces the vendored models in the migration with their equivalent ngalert models. It also replaces the raw SQL selects and inserts with service calls. It also fills in some gaps in the testing suite around: - Migration of alert rules: verifying that the actual data model (queries, conditions) is correct 9a7cfa9 - Secure settings migration: verifying that secure fields remain encrypted for all available notifiers and certain fields migrate from plain text to encrypted secure settings correctly e7d3993 The checks for custom dashboard ACLs will be replaced in a separate targeted PR, as that change is complex enough on its own.	2023-10-11 17:22:09 +01:00

28 Commits
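
Note on 2b51f0e263 (title deduplication): the message describes trying a sequential index first and falling back to a random uid if nothing is unique within 10 attempts. A minimal Go sketch of that idea follows. It is not the code from the commit: the function name dedupTitle, the " #N" suffix format, the starting index of 2, and the injected newUID generator are all hypothetical choices made for illustration.

```go
package main

import "fmt"

// maxIndexAttempts mirrors the "within 10 attempts" limit from the commit message.
const maxIndexAttempts = 10

// dedupTitle returns a title that is not yet in used and records it there.
// If the plain title is already taken, it tries "title #2" .. "title #11",
// then falls back to appending the value returned by newUID.
// Assumption: newUID returns a value unique enough that the fallback never collides.
func dedupTitle(used map[string]struct{}, title string, newUID func() string) string {
	candidate := title
	if _, taken := used[candidate]; taken {
		candidate = ""
		for i := 2; i < 2+maxIndexAttempts; i++ {
			c := fmt.Sprintf("%s #%d", title, i)
			if _, exists := used[c]; !exists {
				candidate = c
				break
			}
		}
		if candidate == "" {
			// No indexed variant was free: use the uid fallback.
			candidate = fmt.Sprintf("%s %s", title, newUID())
		}
	}
	used[candidate] = struct{}{}
	return candidate
}
```

Calling dedupTitle repeatedly with the same title and the same used map yields the plain title first, then the indexed variants, and only then the uid-suffixed form. Passing the uid generator as a parameter keeps the sketch deterministic under test.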