grafana

mirror of https://github.com/grafana/grafana.git synced 2024-11-29 12:14:08 -06:00

Author	SHA1	Message	Date
Santiago	a77ba40ed4	Alerting: Use the forked Alertmanager for remote secondary mode (#79646 ) * (WIP) Alerting: Use the forked Alertmanager for remote secondary mode * fall back to using internal AM in case of error * remove TODOs, clean up .ini file, add orgId as part of remote AM config struct * log warnings and errors, fall back to remoteSecondary, fall back to internal AM only * extract logic to decide remote Alertmanager mode to a separate function, switch on mode * tests * make linter happy * remove func to decide remote Alertmanager mode * refactor factory function and options * add default case to switch statement * remove ineffectual assignment	2023-12-21 15:26:31 +01:00
Matthew Jacobson	82f3127e23	Alerting: Move legacy alert migration from sqlstore migration to service (#72702 )	2023-10-12 13:43:10 +01:00
Alexander Weaver	f6649d7a97	Revert "Alerting: Remove vendored models in migration service" (#76387 ) Revert "Alerting: Remove vendored models in migration service (#74503)" This reverts commit `6a8649d544`.	2023-10-11 14:21:21 -05:00
Matthew Jacobson	6a8649d544	Alerting: Remove vendored models in migration service (#74503 ) This PR replaces the vendored models in the migration with their equivalent ngalert models. It also replaces the raw SQL selects and inserts with service calls. It also fills in some gaps in the testing suite around: - Migration of alert rules: verifying that the actual data model (queries, conditions) are correct 9a7cfa9 - Secure settings migration: verifying that secure fields remain encrypted for all available notifiers and certain fields migrate from plain text to encrypted secure settings correctly e7d3993 Replacing the checks for custom dashboard ACLs will be replaced in a separate targeted PR as it will be complex enough alone.	2023-10-11 17:22:09 +01:00
Santiago	93b9f9b537	Alerting: Use interfaces for the Alertmanager (#73900 )	2023-09-06 07:59:29 -03:00
Matthew Jacobson	91471ac7ae	Alerting: Template Testing API (#67450 )	2023-04-28 15:56:59 +01:00
Serge Zaitsev	d6d4097567	Chore: Fix goimports grouping in alerting (#62424 ) * fix goimports * fix goimports order	2023-01-30 09:55:35 +01:00
gotjosh	0be920e61c	Alerting: Remove unused code after importing from grafana/alerting (#61869 ) * Alerting: Remove unused code after importing from grafana/alerting	2023-01-23 10:30:10 +00:00
gotjosh	e7cd6eb13c	Alerting: Use `alerting.GrafanaAlertmanager` instead of initialising Alertmanager components directly (#61230 ) * Alerting: Use `alerting.GrafanaAlertmanager` instead of initialising Alertmanager components directly	2023-01-13 12:54:38 -04:00
Joe Blubaugh	1a8d0e2736	Alerting: Speed up unit and integration tests. (#60067 ) This change marks tests in the `sender` package that use an external process as integration tests instead of unit tests. This speeds up the package's unit tests by about 20 seconds. This change also reduces the number of alert instances in the `store` package's bulk write integration test from 20_000 to 10_000. This is still enough to exercise the bulk-write code but speeds up the package tests from about 250s to 130s. Put together, integration tests go to about 160s while also speeding up unit tests by 20s.	2022-12-12 14:21:06 +08:00
Sasha Melentyev	c02003af3c	Refactor time durations (#58484 ) This change uses `time.Second` in place of `1000 * time.Millisecond` and `time.Minute` in place of `60*time.Second`.	2022-11-22 15:09:15 +08:00
Kristin Laemmert	05709ce411	chore: remove sqlstore & mockstore dependencies from (most) packages (#57087 ) * chore: add alias for InitTestDB and Session Adds an alias for the sqlstore InitTestDB and Session, and updates tests using these to reduce dependencies on the sqlstore.Store. * next pass of removing sqlstore imports * last little bit * remove mockstore where possible	2022-10-19 09:02:15 -04:00
Santiago	09f8e026a1	Alerting: Expose info about notification delivery errors in a new /receivers endpoint (#55429 ) * (WIP) switch to fork AM, first implementation of the API, generate spec * get receivers avoiding race conditions * use latest version of our forked AM, tests * make linter happy, delete TODO comment * update number of expected paths to += 2 * delete unused endpoint code, code review comments, tests * Update pkg/services/ngalert/notifier/alertmanager.go Co-authored-by: Matthew Jacobson <matthew.jacobson@grafana.com> * remove call to fmt.Println * clear naming for fields * shorter variable names in GetReceivers Co-authored-by: Matthew Jacobson <matthew.jacobson@grafana.com>	2022-10-03 10:58:41 -03:00
Yuriy Tseretyan	6e1e4a4215	Alerting: Update DbStore to use disabled orgs from the config (#52156 ) * update DbStore to use UnifiedAlerting settings * remove disabled orgs from scheduler and use config in db store instead * remove test	2022-07-15 14:13:30 -04:00
Kristin Laemmert	debbb8d59d	sqlstore: finish removing Find and SearchDashboards (#49347 ) * chore: replace artisnal FakeDashboardService with generated mock Maintaining a handcrafted FakeDashboardService is not sustainable now that we are in the process of moving the dashboard-related functions out of sqlstore. * sqlstore: finish removing Find and SearchDashboards Find and SearchDashboards were previously copied into the dashboard service. This commit completes that work, removing Find and SearchDashboards from the sqlstore and updating callers to use the dashboard service. * dashboards: remove SearchDashboards from Store interface SearchDashboards is a wrapper around FindDashboard that transforms the results, so it's been moved out of the Store entirely and the functionality moved into the Dashboard Service's search implementation. The database tests depended heavily on the transformation, so I added testSearchDashboards, a copy of search dashboards, instead of (heavily) refactoring all the tests.	2022-05-24 09:24:55 -04:00
Joe Blubaugh	631dd718a2	47470: Add additional delay to silences in test. (#47482 ) This test of silence cleanup was flaky because of its use of real wall time. In CI environments with slow execution, delays could cause the test to fail. This change mitigates the problem by increasing the end time of silences in the test. After Prometheus merges this PR: https://github.com/prometheus/alertmanager/pull/2867 we can make the test fully deterministic by using a fake clock. Fixes #47470 Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>	2022-04-08 14:52:08 +08:00
Alexander Weaver	c3ad36ba72	Temporarily skip intermittent test (#47471 )	2022-04-07 12:52:00 -05:00
Joe Blubaugh	c5b39dd3cd	Unified Alerting, Issue 41156: Clean up expired silences. (#46740 ) Expired silences older than the retention period were not being cleaned up. The root problem was that notifier.Alertmanager overrides the Prometheus alert manager's silence maintenance function and was not calling Silences.GC() in the overridden function.	2022-03-23 09:49:02 +01:00
Eng Zer Jun	b56848f006	test: use `T.TempDir` to create temporary test directory (#44947 ) The directory created by `T.TempDir` is automatically removed when the test and all its subtests complete. Reference: https://pkg.go.dev/testing#T.TempDir Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	2022-03-22 15:43:29 +01:00
Serge Zaitsev	84a5910e56	Chore: Remove bus from ngalert (#44465 ) * pass notification service down to the notifiers * add ns to all notifiers * remove bus from ngalert notifiers * use smaller interfaces for notificationservice * attempt to fix the tests * remove unused struct field * simplify notification service mock * trying to resolve issues in the tests * make linter happy * make linter even happier * linter, you are annoying	2022-01-26 16:42:40 +01:00
Yuriy Tseretyan	ea478dec22	Alerting: Remove bridge between log15 and go-kit logger (#43769 ) * remove bridge between log15 and go-kit logger. * fix tests	2022-01-07 09:40:09 +01:00
idafurjes	56c3875bb9	Chore: Remove context.TODO (#43458 ) * Remove context.TODO() from services * Fix live test	2021-12-28 10:26:18 +01:00
Alexander Weaver	56b3dc5445	Alerting: Allow configuration of non-ready alertmanagers (#43063 ) * Create API test for overwriting invalid alertmanager config * Avoid requiring alertmanager readiness for config changes * AlertmanagerSrv depends on functionality rather than concrete types * Add test for non-ready alertmanagers * Additional cleanup and polish * Back out previous integration test changes * Refactor of tests incorrectly caused a test to become redundant * Use pre-existing fake secret service * Drop unused interface * Test against concrete MultiOrgAlertmanager re-using fake infra from other tests * Fix linter error * Empty commit to rerun checks	2021-12-27 17:01:17 -06:00
Tania B	5652bde447	Encryption: Use secrets service (#40251 ) * Use secrets service in pluginproxy * Use secrets service in pluginxontext * Use secrets service in pluginsettings * Use secrets service in provisioning * Use secrets service in authinfoservice * Use secrets service in api * Use secrets service in sqlstore * Use secrets service in dashboardshapshots * Use secrets service in tsdb * Use secrets service in datasources * Use secrets service in alerting * Use secrets service in ngalert * Break cyclic dependancy * Refactor service * Break cyclic dependancy * Add FakeSecretsStore * Setup Secrets Service in sqlstore * Fix * Continue secrets service refactoring * Fix cyclic dependancy in sqlstore tests * Fix secrets service references * Fix linter errors * Add fake secrets service for tests * Refactor SetupTestSecretsService * Update setting up secret service in tests * Fix missing secrets service in multiorg_alertmanager_test * Use fake db in tests and sort imports * Use fake db in datasources tests * Fix more tests * Fix linter issues * Attempt to fix plugin proxy tests * Pass secrets service to getPluginProxiedRequest in pluginproxy tests * Fix pluginproxy tests * Revert using secrets service in alerting and provisioning * Update decryptFn in alerting migration * Rename defaultProvider to currentProvider * Use fake secrets service in alert channels tests * Refactor secrets service test helper * Update setting up secrets service in tests * Revert alerting changes in api * Add comments * Remove secrets service from background services * Convert global encryption functions into vars * Revert "Convert global encryption functions into vars" This reverts commit `498eb19859`. * Add feature toggle for envelope encryption * Rename toggle Co-authored-by: Emil Tullstedt <emil.tullstedt@grafana.com> Co-authored-by: Joan López de la Franca Beltran <joanjan14@gmail.com>	2021-11-04 18:47:21 +02:00
Joan López de la Franca Beltran	722c414fef	Encryption: Refactor securejsondata.SecureJsonData to stop relying on global functions (#38865 ) * Encryption: Add support to encrypt/decrypt sjd * Add datasources.Service as a proxy to datasources db operations * Encrypt ds.SecureJsonData before calling SQLStore * Move ds cache code into ds service * Fix tlsmanager tests * Fix pluginproxy tests * Remove some securejsondata.GetEncryptedJsonData usages * Add pluginsettings.Service as a proxy for plugin settings db operations * Add AlertNotificationService as a proxy for alert notification db operations * Remove some securejsondata.GetEncryptedJsonData usages * Remove more securejsondata.GetEncryptedJsonData usages * Fix lint errors * Minor fixes * Remove encryption global functions usages from ngalert * Fix lint errors * Minor fixes * Minor fixes * Remove securejsondata.DecryptedValue usage * Refactor the refactor * Remove securejsondata.DecryptedValue usage * Move securejsondata to migrations package * Move securejsondata to migrations package * Minor fix * Fix integration test * Fix integration tests * Undo undesired changes * Fix tests * Add context.Context into encryption methods * Fix tests * Fix tests * Fix tests * Trigger CI * Fix test * Add names to params of encryption service interface * Remove bus from CacheServiceImpl * Add logging * Add keys to logger Co-authored-by: Emil Tullstedt <emil.tullstedt@grafana.com> * Add missing key to logger Co-authored-by: Emil Tullstedt <emil.tullstedt@grafana.com> * Undo changes in markdown files * Fix formatting * Add context to secrets service * Rename decryptSecureJsonData to decryptSecureJsonDataFn * Name args in GetDecryptedValueFn * Add template back to NewAlertmanagerNotifier * Copy GetDecryptedValueFn to ngalert * Add logging to pluginsettings * Fix pluginsettings test Co-authored-by: Tania B <yalyna.ts@gmail.com> Co-authored-by: Emil Tullstedt <emil.tullstedt@grafana.com>	2021-10-07 17:33:50 +03:00
gotjosh	6572017ec7	Alerting: Allow more characters in label names so notifications are sent (#38629 ) Remove validation for labels to be accepted in the Alertmanager, This helps with datasources that produce non-compatible labels. Adds an "object_matchers" to alert manager routers so we can support labels names with extended characters beyond prometheus/openmetrics. It only does this for the internal Grafana managed Alert Manager. This requires a change to alert manager, so for now we use grafana/alertmanager which is a slight fork, with the intention of going back to upstream. The frontend handles the migration of "matchers" -> "object_matchers" when the route is edited and saved. Once this is done, downgrades will not work old versions will not recognize the "object_matchers". Co-authored-by: Kyle Brandt <kyle@grafana.com> Co-authored-by: Nathan Rodman <nathanrodman@gmail.com>	2021-10-04 15:06:40 +02:00
Sofia Papagiannaki	f6f3a54742	Alerting: tune rule evaluation via configuration (#35623 ) * Alerting: Configure max evaluation retries * Alerting: Enforce minimum rule evaluation interval * Alerting: Disable rule evaluation from configuration * Update docs * Alerting: Configure rule evaluation timeout * Move options on unified_alerting config section * Apply suggestions from code review Co-authored-by: gotjosh <josue@grafana.com>	2021-09-28 13:00:16 +03:00
Yuriy Tseretyan	1910d85ae0	Alerting: Optimization of fetching data in multiorg alertmanager (#39237 ) * Add method GetAllLatestAlertmanagerConfiguration to DBStore * add method ApplyConfig to AlertManager * update multiorg alert manager to load all alertmanager configs at once	2021-09-21 11:01:23 -04:00
gotjosh	7db97097c9	Alerting: Support Unified Alerting with Grafana HA (#37920 ) * Alerting: Support Unified Alerting in Grafana's HA mode.	2021-09-16 15:33:51 +01:00
gotjosh	a2f4344bf2	Alerting: Refactor & fix unified alerting metrics structure (#39151 ) * Alerting: Refactor & fix unified alerting metrics structure Fixes and refactors the metrics structure we have for the ngalert service. Now, each component has its own metric struct that includes the JUST the metrics it uses. Additionally, I have fixed the configuration metrics and added new metrics to determine if we have discovered and started all the necessary configurations of an instance. This allows us to alert on `grafana_alerting_discovered_configurations - grafana_alerting_active_configurations != 0` to know whether an alertmanager instance did not start successfully.	2021-09-14 12:55:01 +01:00
gotjosh	39a3bb8a1c	Alerting: Persist notification log and silences to the database (#39005 ) * Alerting: Persist notification log and silences to the database This removes the dependency of having persistent disk to run grafana alerting. Instead of regularly flushing the notification log and silences to disk we now flush the binary content of those files to the database encoded as a base64 string.	2021-09-09 17:25:22 +01:00
David Parrott	7fbeefc090	Alerting: create wrapper for Alertmanager to enable org level isolation (#37320 ) Introduces org-level isolation for the Alertmanager and its components. Silences, Alerts and Contact points are now separated by org and are not shared between them. Co-authored with @davidmparrott and @papagian	2021-08-24 11:28:09 +01:00
Sofia Papagiannaki	04d5dcb7c8	Alerting: modify DB table, accessors and migration to restrict org access (#37414 ) * Alerting: modify table and accessors to limit org access appropriately * Update migration to create multiple Alertmanager configs * Apply suggestions from code review Co-authored-by: gotjosh <josue@grafana.com> * replace mg.ClearMigrationEntry() mg.ClearMigrationEntry() would create a new session. This commit introduces a new migration for clearing an entry from migration log for replacing mg.ClearMigrationEntry() so that all dashboard alert migration operations will run inside the same transaction. It adds also `SkipMigrationLog()` in Migrator interface for skipping adding an entry in the migration_log. Co-authored-by: gotjosh <josue@grafana.com>	2021-08-12 16:04:09 +03:00
Ganesh Vernekar	94d2520a84	Alerting: Allow space in label and annotation names (#36549 ) Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-07-08 18:26:09 +05:30
Sofia Papagiannaki	23939eab10	[Alerting]: namespace fixes (#34470 ) * [Alerting]: forbid viewers for updating rules if viewers can edit check for CanSave instead of CanEdit * Clear ngalert tables when deleting the folder * Apply suggestions from code review * Log failure to check save permission Co-authored-by: gotjosh <josue@grafana.com>	2021-05-20 15:49:33 +03:00
gotjosh	6384f86fb9	Alerting: Allow the notifier to log (#34232 ) * Alerting: Allow the notifier to log The notifier upstream code uses go-kit as its logging library. The grafana specific logger is not compatible with this API. In this PR, I have created a wrapper that implements io.Writer to make them compatible.	2021-05-17 18:06:47 +01:00
Owen Diehl	1367f7171e	Alerting/ruler metrics (#34144 ) * adds active configurations metric * rule evaluation metrics * ruler metrics * pr feedback	2021-05-14 16:13:44 -04:00
Owen Diehl	baca873a84	extracts alertmanager from DI, including migrations (#34071 ) * extracts alertmanager from DI, including migrations * includes alertmanager Run method in ngalert * removes 3s test shutdown timeout * lint	2021-05-13 14:01:38 -04:00
Ganesh Vernekar	5f44ccff0c	NGAlert: Fix unit test to write files in temporary directory (#34032 ) Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-05-13 16:08:12 +05:30
Owen Diehl	5e48b54549	Alerting/metrics (#33547 ) * moves alerting metrics to their own pkg * adds grafana_alerting_alerts (by state) metric * alerts_received_{total,invalid} * embed alertmanager alerting struct in ng metrics & remove duplicated notification metrics (already embed alertmanager notifier metrics) * use silence metrics from alertmanager lib * fix - manager has metrics * updates ngalert tests * comment lint Signed-off-by: Owen Diehl <ow.diehl@gmail.com> * cleaner prom registry code * removes ngalert global metrics * new registry use in all tests * ngalert metrics impl service, hack testinfra code to prevent duplicate metric registrations * nilmetrics unexported	2021-04-30 12:28:06 -04:00
Ganesh Vernekar	be1affe0a4	NGAlert: Fix flaky test (#33415 ) Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-04-27 17:03:22 +05:30
Ganesh Vernekar	659ea20c3c	NGAlert: Run the maintenance cycle for the silences (#33301 ) Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-04-23 16:19:03 +02:00
Ganesh Vernekar	0a03d5c29e	AlertingNG: Correctly set StartsAt, EndsAt, UpdatedAt after alert reception (#33109 ) Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-04-22 20:42:18 +05:30
gotjosh	528ca9134b	Alerting: Use a default configuration and periodically poll for new ones (#32851 ) * Alerting: Use a default configuration and periodically poll for new ones Use a default configuration to make sure we always start the grafana instance. Then, regularly poll for new ones. I've also made sure that failures to apply configuration do not stop the Grafana server but instead keep polling until it is a success.	2021-04-13 13:02:44 +01:00
gotjosh	9b52ffc6a9	Alerting: Fetch configuration from the database and run a notification service (#32175 ) * Alerting: Fetch configuration from the database and run a notification instance Co-Authored-By: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>	2021-03-24 14:20:44 +00:00

45 Commits