* Folders: Optionally include fullpath in service responses
* Alerting: Export folder fullpath instead of title
* Escape separator in folder title
* Add support for provisioning alert rules into subfolders
* Use FolderService for creating folders during provisioning
* Export WithFullpath() folder service function
---------
Co-authored-by: Tania B <yalyna.ts@gmail.com>
Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
* Basic eval flow
* Wiring-up
* fix
* Extend todo
* Start with tests
* Include some relevant tests, skip ones that seem to have timing-based race conditions
* Some tests, touch up linter and todo
* Solve TODO
* Add tracing
* Tests to make sure an eval went through
* Wire up feature toggles
* Update pkg/services/ngalert/schedule/recording_rule.go
Co-authored-by: Steve Simpson <steve.simpson@grafana.com>
* Update pkg/services/ngalert/schedule/recording_rule_test.go
Co-authored-by: Steve Simpson <steve.simpson@grafana.com>
* Update pkg/services/ngalert/schedule/recording_rule_test.go
Co-authored-by: Steve Simpson <steve.simpson@grafana.com>
* Update pkg/services/ngalert/schedule/recording_rule_test.go
Co-authored-by: Steve Simpson <steve.simpson@grafana.com>
---------
Co-authored-by: Steve Simpson <steve.simpson@grafana.com>
* Alerting: Wire up alertmanagerRemoteOnly feature toggle.
Though the mode isn't feature complete yet, it will be useful to have the
feature toggle wired up in order to start testing.
* Apply suggestions from code review
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Formatting
---------
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Alerting: Implement SaveAndApplyDefaultConfig in the remote Alertmanager struct
* send the hash of the encrypted configuration
* tests, default config hash in AM struct
* add missing default config to test
* restore build directory
* go work file...
* fix broken test
* remove unnecessary conversion to []byte
* go work again...
* make things work again with latest main branch changes
* update error messages in tests for decrypting config
* allow users with regular actions access provisioning API paths
* update methods that read rules
skip the new authorization logic if the user CanReadAllRules, to avoid a performance impact on file provisioning
update all methods to accept an identity.Requester, which carries all permissions and is required by access control.
* create deltas for a single rule
* update modify methods
skip the new authorization logic if the user CanWriteAllRules, to avoid a performance impact on file provisioning
update all methods to accept an identity.Requester, which carries all permissions and is required by access control.
* implement RuleAccessControlService in provisioning
* update file provisioning user to have all permissions to bypass authz
* update provisioning API to return errutil errors correctly
---------
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
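As a rough sketch of the read-path short-circuit described above (hypothetical types and signatures, not the actual Grafana provisioning code):

```go
package provisioning

import "context"

// Minimal, hypothetical stand-ins for the real Grafana types.
type Requester interface{ GetOrgID() int64 }

type AlertRule struct{ UID string }

type ruleAccess interface {
	CanReadAllRules(ctx context.Context, user Requester) (bool, error)
	AuthorizeRuleRead(ctx context.Context, user Requester, rule *AlertRule) error
}

func filterReadable(ctx context.Context, ac ruleAccess, user Requester, rules []*AlertRule) ([]*AlertRule, error) {
	// File-provisioning users are granted all permissions, so this cheap check
	// short-circuits per-rule authorization and avoids the performance hit.
	if ok, err := ac.CanReadAllRules(ctx, user); err != nil {
		return nil, err
	} else if ok {
		return rules, nil
	}
	out := make([]*AlertRule, 0, len(rules))
	for _, r := range rules {
		if err := ac.AuthorizeRuleRead(ctx, user, r); err == nil {
			out = append(out, r)
		}
	}
	return out, nil
}
```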
* (WIP) Alerting: Decrypt secrets before sending configuration to the remote Alertmanager
* refactor, fix tests
* test decrypting secrets
* tidy up
* test SendConfiguration, quote keys, refactor tests
* make linter happy
* decrypt configuration before comparing
* copy configuration struct before decrypting
* reduce diff in TestCompareAndSendConfiguration
* clean up remote/alertmanager.go
* make linter happy
* avoid serializing into JSON to copy struct
* codeowners
Removes legacy alerting, so long and thanks for all the fish! 🐟
---------
Co-authored-by: Matthew Jacobson <matthew.jacobson@grafana.com>
Co-authored-by: Sonia Aguilar <soniaAguilarPeiron@users.noreply.github.com>
Co-authored-by: Armand Grillet <armandgrillet@users.noreply.github.com>
Co-authored-by: William Wernert <rwwiv@users.noreply.github.com>
Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
* Add notification settings to storage/domain and API models. Settings are a slice to work around XORM mapping
* Support validation of notification settings when rules are updated
* Implement a route generator for the Alertmanager configuration that fetches all notification settings.
* Update multi-tenant Alertmanager to run the generator before applying the configuration.
* Add notification settings labels to state calculation
* update the Multi-tenant Alertmanager to provide validation for notification settings
* update GET API so only admins can see auto-gen
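A loose, hypothetical sketch of what the route generator does with per-rule notification settings; the types and the label name are illustrative only, not Grafana's actual models:

```go
package notifier

// Hypothetical, simplified shapes; the real models live in ngalert and the
// Alertmanager configuration package.
type NotificationSettings struct {
	Receiver string
	GroupBy  []string
}

type Route struct {
	Receiver string
	GroupBy  []string
	Matchers map[string]string
	Routes   []*Route
}

// addAutogeneratedRoutes sketches the generator: each distinct notification
// setting becomes a child route matching on a label that carries the setting's
// fingerprint. The same label is attached to alert instances during state
// calculation, which is what ties a rule's alerts to its chosen receiver.
func addAutogeneratedRoutes(root *Route, settings map[string]NotificationSettings) {
	for fingerprint, s := range settings {
		root.Routes = append(root.Routes, &Route{
			Receiver: s.Receiver,
			GroupBy:  s.GroupBy,
			// "__grafana_autogenerated__" is an illustrative label name only.
			Matchers: map[string]string{"__grafana_autogenerated__": fingerprint},
		})
	}
}
```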
* Add config for limit of rules per rule group
* Warn when editing big groups through normal API
* Warn on prov api writes for groups
* Wire up comp root, tests
* Also add warning to state manager warm
* Drop unnecessary conversion
This pull request updates our fork of Alertmanager to commit 65bdab0, which is based on commit 5658f8c in Prometheus Alertmanager.
It applies the changes from grafana/alerting#155, which remove the overrides for validation of alerts, labels, and silences that we had put in place to allow alerts and silences to work for non-Prometheus datasources. However, as this is now supported in Alertmanager with the UTF-8 work, we can use the new upstream functions and remove these overrides.
The compat package is a package in Alertmanager that takes care of backwards compatibility when parsing matchers, validating alerts, labels and silences. It has three modes: classic mode, UTF-8 strict mode, fallback mode. These modes are controlled via compat.InitFromFlags. Grafana initializes the compat package without any feature flags, which is the equivalent of fallback mode. Classic and UTF-8 strict mode are used in Mimir.
While Grafana Managed Alerts have no need for fallback mode, Grafana can still be used as an interface to manage the configurations of Mimir Alertmanagers and view configurations of the Prometheus Alertmanager, and those installations might not have migrated or may still be running older versions. Such installations behave as if in classic mode, and Grafana must be able to parse their configurations in order to interact with them for some period of time. As such, Grafana uses fallback mode until we are ready to drop support for outdated installations of Mimir and the Prometheus Alertmanager.
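Fallback mode boils down to "try the UTF-8 parser first, then fall back to classic matcher parsing". A minimal sketch of that behaviour, with placeholder parse functions standing in for the upstream parsers:

```go
package compat

import "log/slog"

// Matcher and parseFunc are placeholders; the real parsers live in the
// upstream Alertmanager module.
type Matcher struct{ Name, Value string }

type parseFunc func(input string) ([]*Matcher, error)

// parseFallback illustrates fallback mode: prefer the UTF-8 parser, but accept
// anything the classic parser still understands, logging the fallback so
// operators can find configurations that need migrating.
func parseFallback(input string, utf8, classic parseFunc, log *slog.Logger) ([]*Matcher, error) {
	m, err := utf8(input)
	if err == nil {
		return m, nil
	}
	log.Warn("input rejected by UTF-8 matcher parser, trying classic parser", "input", input, "err", err)
	return classic(input)
}
```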
* Create locking config store that mimics existing provisioning store
* Rename existing receivers(_test).go
* Introduce shared receiver group service
* Fix test
* Move query model to models package
* ReceiverGroup -> Receiver
* Remove locking config store
* Move convert methods to compat.go
* Cleanup
* Simple, per-base-interval jitter
* Add log just for test purposes
* Add strategy approach, allow choosing between group or rule
* Add flag to jitter rules
* Add second toggle for jittering within a group
* Wire up toggles to strategy
* Slightly improve comment ordering
* Add tests for offset generation
* Rename JitterStrategyFrom
* Improve debug log message
* Use grafana SDK labels rather than prometheus labels
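A minimal sketch of how such a per-base-interval offset can be derived (assuming a hash of the rule or group key; names and signature are illustrative):

```go
package schedule

import (
	"hash/fnv"
	"time"
)

// jitterOffset sketches one way the per-base-interval offset can be computed:
// hash a stable key (the rule UID for per-rule jitter, or the org and group
// name for per-group jitter) and map it onto one of the base-interval ticks
// that fit inside the rule's evaluation interval.
func jitterOffset(key string, ruleInterval, baseInterval time.Duration) int64 {
	slots := int64(ruleInterval / baseInterval)
	if slots <= 1 {
		return 0
	}
	h := fnv.New64a()
	_, _ = h.Write([]byte(key))
	return int64(h.Sum64() % uint64(slots))
}
```

Because the offset comes from a stable key rather than randomness, restarts don't reshuffle the schedule.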
* Alerting: Add metrics to the remote Alertmanager struct
* rephrase http_requests_failed description
* make linter happy
* remove unnecessary metrics
* extract timed client to separate package
* use histogram collector from dskit
* remove weaveworks dependency
* capture metrics for all requests to the remote Alertmanager (both clients)
* use the timed client in the MimirAuthRoundTripper
* HTTPRequestsDuration -> HTTPRequestDuration, clean up mimir client factory function
* refactor
* less git diff
* gauge for last readiness check in seconds
* initialize LastReadinessCheck to 0, tweak metric names and descriptions
* add counters for sync attempts/errors
* last config sync and last state sync timestamps (gauges)
* change latency metric name
* metric for remote Alertmanager mode
* code review comments
* move label constants to metrics package
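A sketch of the kind of metrics struct these commits describe, using prometheus/client_golang; the metric names here are approximations, not Grafana's exact names:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// RemoteAlertmanagerMetrics approximates the metrics added above.
type RemoteAlertmanagerMetrics struct {
	HTTPRequestDuration   *prometheus.HistogramVec
	LastReadinessCheck    prometheus.Gauge
	ConfigSyncsTotal      prometheus.Counter
	ConfigSyncErrorsTotal prometheus.Counter
	LastConfigSync        prometheus.Gauge
}

func NewRemoteAlertmanagerMetrics(r prometheus.Registerer) *RemoteAlertmanagerMetrics {
	m := &RemoteAlertmanagerMetrics{
		HTTPRequestDuration: prometheus.NewHistogramVec(prometheus.HistogramOpts{
			Name: "alertmanager_remote_http_request_duration_seconds",
			Help: "Duration of requests to the remote Alertmanager.",
		}, []string{"path", "method", "status_code"}),
		LastReadinessCheck: prometheus.NewGauge(prometheus.GaugeOpts{
			Name: "alertmanager_remote_last_readiness_check_timestamp_seconds",
			Help: "Timestamp of the last successful readiness check.",
		}),
		ConfigSyncsTotal: prometheus.NewCounter(prometheus.CounterOpts{
			Name: "alertmanager_remote_config_syncs_total",
			Help: "Total number of configuration sync attempts.",
		}),
		ConfigSyncErrorsTotal: prometheus.NewCounter(prometheus.CounterOpts{
			Name: "alertmanager_remote_config_sync_errors_total",
			Help: "Total number of failed configuration syncs.",
		}),
		LastConfigSync: prometheus.NewGauge(prometheus.GaugeOpts{
			Name: "alertmanager_remote_last_config_sync_timestamp_seconds",
			Help: "Timestamp of the last successful configuration sync.",
		}),
	}
	r.MustRegister(m.HTTPRequestDuration, m.LastReadinessCheck, m.ConfigSyncsTotal, m.ConfigSyncErrorsTotal, m.LastConfigSync)
	return m
}
```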
This PR has two steps that together create a functional dry-run capability for the migration.
By enabling the feature flag alertingPreviewUpgrade while on legacy alerting, it will:
a. Allow all Grafana Alerting background services except for the scheduler to start (multiorg alertmanager, state manager, routes, …).
b. Allow the UI to show Grafana Alerting pages alongside legacy ones (with appropriate in-app warnings that UA is not actually running).
c. Show a new “Alerting Upgrade” page and register the associated /api/v1/upgrade endpoints, which allow the user to upgrade their organization live, without a restart, and present a summary of the upgrade in a table.
* (WIP) Alerting: Use the forked Alertmanager for remote secondary mode
* fall back to using internal AM in case of error
* remove TODOs, clean up .ini file, add orgId as part of remote AM config struct
* log warnings and errors, fall back to remoteSecondary, fall back to internal AM only
* extract logic to decide remote Alertmanager mode to a separate function, switch on mode
* tests
* make linter happy
* remove func to decide remote Alertmanager mode
* refactor factory function and options
* add default case to switch statement
* remove ineffectual assignment
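A hedged sketch of the mode switch and fallback described above; the constructor shape is hypothetical:

```go
package remote

import (
	"context"
	"fmt"
	"log/slog"
)

type Mode int

const (
	ModeInternal Mode = iota
	ModeRemoteSecondary
)

type Alertmanager interface{ Ready() bool }

// newAlertmanager sketches the fallback: if building the forked
// (remote secondary) Alertmanager fails, log the error and fall back to the
// internal Alertmanager instead of leaving the org without one.
func newAlertmanager(ctx context.Context, orgID int64, mode Mode, internal, remote func(context.Context, int64) (Alertmanager, error), log *slog.Logger) (Alertmanager, error) {
	switch mode {
	case ModeRemoteSecondary:
		am, err := remote(ctx, orgID)
		if err == nil {
			return am, nil
		}
		log.Error("failed to create remote secondary Alertmanager, falling back to the internal one", "org", orgID, "err", err)
		return internal(ctx, orgID)
	case ModeInternal:
		return internal(ctx, orgID)
	default:
		return nil, fmt.Errorf("unknown Alertmanager mode %d", mode)
	}
}
```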
* Add Loki historian store stub
* Add composite store
* Use composite store if Loki historian enabled
* Split store interface into read/write
* Make composite + historian stores read only
* Use variadic constructor for composite
* Modify Loki store enable logic
* Use dskit.concurrency.ForEachJob for parallelism
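A sketch of a read-only composite store fanning queries out with dskit's concurrency.ForEachJob; the store and entry types are simplified stand-ins:

```go
package historian

import (
	"context"

	"github.com/grafana/dskit/concurrency"
)

// HistoryEntry and reader are simplified stand-ins for the real state history types.
type HistoryEntry struct{}

type reader interface {
	Query(ctx context.Context, ruleUID string) ([]HistoryEntry, error)
}

// compositeStore is read only: it fans a query out to every backing store
// (e.g. the annotation store and the Loki historian) and merges the results.
type compositeStore struct {
	readers []reader
}

// newCompositeStore uses a variadic constructor, as in the commits above.
func newCompositeStore(readers ...reader) *compositeStore {
	return &compositeStore{readers: readers}
}

func (c *compositeStore) Query(ctx context.Context, ruleUID string) ([]HistoryEntry, error) {
	results := make([][]HistoryEntry, len(c.readers))
	// Query each backing store in parallel; ForEachJob runs the jobs with
	// bounded concurrency and returns the first error encountered.
	err := concurrency.ForEachJob(ctx, len(c.readers), len(c.readers), func(ctx context.Context, i int) error {
		res, err := c.readers[i].Query(ctx, ruleUID)
		if err != nil {
			return err
		}
		results[i] = res
		return nil
	})
	if err != nil {
		return nil, err
	}
	var merged []HistoryEntry
	for _, r := range results {
		merged = append(merged, r...)
	}
	return merged, nil
}
```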
* Alerting: Only warm alert state cache if execute_alerts=true.
If the Grafana instance is not executing alerts, then Warm()-ing the state
manager is wasteful and could lead to misleading rule status queries, as the
status returned will always be based on the state loaded from the database at
startup, and not the most recent evaluation state.
* Move Warm() down to shared conditional.
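A minimal sketch of the conditional, assuming a simplified state-manager interface:

```go
package ngalert

import "context"

// stateManager is a stand-in for the ngalert state manager.
type stateManager interface {
	Warm(ctx context.Context)
}

// warmIfExecuting sketches the change above: only pay the cost of loading
// previous alert states from the database when this instance actually
// evaluates rules (execute_alerts = true); otherwise the warmed cache would
// only reflect the state at startup and could mislead status queries.
func warmIfExecuting(ctx context.Context, executeAlerts bool, sm stateManager) {
	if !executeAlerts {
		return
	}
	sm.Warm(ctx)
}
```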
* Alerting: Add GetFullState method to FileStore
* make tests compile, create stateStore in NewAlertmanager
* return errors instead of logging, accept an arbitrary number of strings
* make NewAlertmanager() accept a stateStore
* Alerting: Move `ExternalAlertmanager` to its own package
We'll avoid import cycles when using components from other packages. In addition to that, I've created an `Options` approach for the multiorg alertmanager to allow us to override how per-tenant alertmanagers are created.
* switch things around
* address review comments
* fix references and warnings
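The `Options` approach mentioned above is the usual functional-options pattern; a small sketch with hypothetical names:

```go
package notifier

import "context"

// Alertmanager and MultiOrgAlertmanager are simplified stand-ins.
type Alertmanager interface{ ApplyConfig(ctx context.Context) error }

type factoryFn func(ctx context.Context, orgID int64) (Alertmanager, error)

type MultiOrgAlertmanager struct {
	factory factoryFn
}

// Option lets callers override how per-tenant Alertmanagers are created
// without changing the constructor signature.
type Option func(*MultiOrgAlertmanager)

// WithAlertmanagerOverride is a hypothetical option that swaps the factory,
// e.g. to wrap the internal Alertmanager with the remote-secondary fork.
func WithAlertmanagerOverride(f factoryFn) Option {
	return func(moa *MultiOrgAlertmanager) { moa.factory = f }
}

func NewMultiOrgAlertmanager(defaultFactory factoryFn, opts ...Option) *MultiOrgAlertmanager {
	moa := &MultiOrgAlertmanager{factory: defaultFactory}
	for _, opt := range opts {
		opt(moa)
	}
	return moa
}
```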
* Alerting: Move migration from background service run to ngalert init
SQLite database write contention between the migration's single transaction and dashboard provisioning's frequent commits was causing the migration to fail with SQLITE_BUSY/SQLITE_BUSY_SNAPSHOT on all retries.
This is not a new issue for SQLite + Grafana, but the discrepancy between the lengths of the transactions made the failure very consistent. In addition, since a failed migration has implications for the assumed correctness of the Alertmanager and alert rule definition state, we shut down the server on error. This can make e2e tests, as well as some high-load provisioned SQLite installations, flaky on startup.
The correct fix for this is better transaction management across various services and is out of scope for this change, as we're primarily interested in mitigating the current bout of server failures in e2e tests when using SQLite.
This PR replaces the vendored models in the migration with their equivalent ngalert models. It also replaces the raw SQL selects and inserts with service calls.
It also fills in some gaps in the testing suite around:
- Migration of alert rules: verifying that the actual data model (queries, conditions) is correct 9a7cfa9
- Secure settings migration: verifying that secure fields remain encrypted for all available notifiers and certain fields migrate from plain text to encrypted secure settings correctly e7d3993
The checks for custom dashboard ACLs will be replaced in a separate, targeted PR, as that change is complex enough on its own.
* Alerting: Don't use a separate collection system for metrics
The state package had a metric collection system that ran every 15s to update the values of the metrics; the Prometheus ecosystem has a common pattern for this, called "collectors".
I have removed the time-based interval used to "set" the metrics in favour of value functions that are called at scrape time.
* add metrics and tracing to state manager
* propagate tracer to state manager
* add scheduler metrics
* fix backtesting
* add test for state metrics
* remove StateUpdateCount
* update docs
* metrics can be null
* add tracer to new tests
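The scrape-time approach matches client_golang's usual collector pattern; a minimal sketch using prometheus.NewGaugeFunc (metric name illustrative):

```go
package metrics

import (
	"sync/atomic"

	"github.com/prometheus/client_golang/prometheus"
)

// trackedStates simulates state owned by the state manager (e.g. the number of
// tracked alert states); the gauge reads it lazily at scrape time instead of a
// background loop copying it into the metric every 15 seconds.
var trackedStates atomic.Int64

func registerStateMetrics(r prometheus.Registerer) {
	r.MustRegister(prometheus.NewGaugeFunc(prometheus.GaugeOpts{
		Namespace: "grafana",
		Subsystem: "alerting",
		Name:      "state_manager_tracked_states",
		Help:      "Number of alert states currently tracked (evaluated at scrape time).",
	}, func() float64 {
		return float64(trackedStates.Load())
	}))
}
```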
* introduce a new action "alert.provisioning.secrets:read" and role "fixed:alerting.provisioning.secrets:reader"
* update alerting API authorization layer to let the user read provisioning with the new action
* let new action use decrypt flag
* add action and role to docs
This commit adds support for concurrent queries when saving alert
instances to the database. This is an experimental feature in
response to some customers experiencing delays between rule evaluation
and sending alerts to Alertmanager, resulting in flapping. It is
disabled by default.
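A hedged sketch of the idea, batching instance writes and running the batches with bounded concurrency via errgroup; types and parameters are illustrative, not Grafana's actual configuration:

```go
package store

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// AlertInstance and instanceWriter are simplified stand-ins for the real
// ngalert types.
type AlertInstance struct{ RuleUID, LabelsHash string }

type instanceWriter interface {
	SaveBatch(ctx context.Context, batch []AlertInstance) error
}

// saveAlertInstances writes instances in fixed-size batches. When
// maxConcurrency > 1 the batches are written with concurrent queries, which
// shortens the gap between rule evaluation and notification delivery; with
// maxConcurrency <= 1 it behaves like the existing sequential path.
func saveAlertInstances(ctx context.Context, w instanceWriter, instances []AlertInstance, batchSize, maxConcurrency int) error {
	if batchSize <= 0 {
		batchSize = 100
	}
	g, ctx := errgroup.WithContext(ctx)
	if maxConcurrency > 1 {
		g.SetLimit(maxConcurrency)
	} else {
		g.SetLimit(1)
	}
	for start := 0; start < len(instances); start += batchSize {
		end := start + batchSize
		if end > len(instances) {
			end = len(instances)
		}
		batch := instances[start:end]
		g.Go(func() error { return w.SaveBatch(ctx, batch) })
	}
	return g.Wait()
}
```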