grafana

mirror of https://github.com/grafana/grafana.git synced 2025-02-25 18:55:37 -06:00

Author	SHA1	Message	Date
Santiago	a77ba40ed4	Alerting: Use the forked Alertmanager for remote secondary mode (#79646 ) * (WIP) Alerting: Use the forked Alertmanager for remote secondary mode * fall back to using internal AM in case of error * remove TODOs, clean up .ini file, add orgId as part of remote AM config struct * log warnings and errors, fall back to remoteSecondary, fall back to internal AM only * extract logic to decide remote Alertmanager mode to a separate function, switch on mode * tests * make linter happy * remove func to decide remote Alertmanager mode * refactor factory function and options * add default case to switch statement * remove ineffectual assignment	2023-12-21 15:26:31 +01:00
Santiago	c46da8ea9b	Alerting: Update alerting package and imports from cluster and clusterpb (#79786 ) * Alerting: Update alerting package * update to latest commit * alias for imports	2023-12-21 12:34:48 +01:00
Santiago	f7248efff5	Alerting: Fix panic when creating a new Alertmanager returns an error (#79641 ) Alerting: Fix panic after error creating new Alertmanager	2023-12-18 15:33:07 +01:00
Santiago	73776f37eb	Alerting: Send state to the remote Alertmanager (#78538 ) * Alerting: Introduce a Mimir client as part of the Remote Alertmanager Mimir client that understands the new APIs developed for mimir. Very much a WIP still. * more wip * appease the linter * more linting * add more code * get state from kvstore, encode, send * send state to the remote Alertmanager, extract fullstate logic into its own function * pass kvstore to remote.NewAlertmanager() * refactor * add fake kvstore to tests * tests * use FileStore to get state * always log 'completed state upload' * refactor compareRemoteConfig * base64-encode the state in the file store * export silences and nflog filenames, refactor * log 'completed state/config upload...' regardless of outcome * add values to the state store in tests * address code review comments * log error from filestore --------- Co-authored-by: gotjosh <josue.abreu@gmail.com>	2023-11-29 12:49:39 +01:00
Santiago	197f0d2859	Alerting: Add methods for silences to the forked Alertmanager (#77805 ) * Alerting: Add an empty Forked Alertmanager * Alerting: Add methods for silences to the forked Alertmanager * check for errors in tests * make linter happy * make linter happy * Alerting: Add methods for silences to the forked Alertmanager	2023-11-08 12:03:40 +01:00
Santiago	a6b9b27673	Alerting: Remove OrgID() from the Alertmanager interface (#77398 )	2023-10-31 10:58:47 +01:00
Santiago	f9fc2e4568	Alerting: Remove ConfigHash() from the Alertmanager interface (#77134 )	2023-10-25 17:11:53 +02:00
Santiago	322a9c0b15	Alerting: Replace FileStore() for CleanUp() in the Alertmanager interface (#77126 ) Alerting: Replace FileStore() for CleanUp() in the Alertmanager interface	2023-10-25 13:58:28 +02:00
gotjosh	866acbd5ac	Alerting: Move `ExternalAlertmanager` to its own package (#76854 ) * Alerting: Move `ExternalAlertmanager` to its own package We'll avoid import cycles when using components from other packages. In addition to that, I've created an `Options` approach for the multiorg alertmanager to allow us to override how per tenant alertmanagers are created. * switch things around * address review comments * fix references and warnings	2023-10-20 14:08:13 +02:00
Santiago	a60ec150f9	Alerting: Fetch receivers from remote Alertmanager (#76841 ) * Alerting: fetch receivers from remote Alertmanager * make linter happy * change require.Eventually() timeout and tick	2023-10-20 11:34:17 +02:00
Santiago	61cb26711e	Alerting: Fetch alerts from a remote Alertmanager (#75844 ) * Alerting: post alerts to the remote Alertmanager and fetch them * fix broken tests * Alerting: Add Mimir Backend image to devenv (blocks) * add alerting as code owner for mimir_backend block * Alerting: Use Mimir image to run integration tests for the remote Alertmanager * skip integration test when running all tests * skipping integration test when no Alertmanager URL is provided * fix bad host for mimir_backend * remove basic auth testing until we have an nginx image in our CI * add integration tests for alerts * fix tests * change SendCtx -> Send, add context.Context to Send, fix CI * add recover() for functions from the Prometheus Alertmanager HTTP client that could panic * add TODO to implement PutAlerts in a way that mimics what Prometheus does * fix log format	2023-10-19 11:27:37 +02:00
Santiago	73be9449d1	Alerting: Manage remote Alertmanager silences (#75452 ) * Alerting: Manage remote Alertmanager silences * fix typo * check errors when encoding json in fake external AM * take path from configured URL, check for nil responses	2023-10-02 07:36:11 -03:00
Santiago	93b9f9b537	Alerting: Use interfaces for the Alertmanager (#73900 )	2023-09-06 07:59:29 -03:00
Serge Zaitsev	58f6648505	Chore: capitalise messages for alerting (#74335 )	2023-09-04 18:46:34 +02:00
Alexander Weaver	dfba94e052	Alerting: Limit redis pool size to 5 and make configurable (#74057 ) * Limit redis pool size to 5 and expose it in config ini * Coerce negative pool sizes to the default	2023-08-29 14:59:12 -05:00
Matthew Jacobson	d31d175109	Alerting: Fix contact point testing with secure settings (#72235 ) * Alerting: Fix contact point testing with secure settings Fixes double encryption of secure settings during contact point testing and removes code duplication that helped cause the drift between alertmanager and test endpoint. Also adds integration tests to cover the regression. Note: provisioningStore is created to remove cycle and the unnecessary dependency.	2023-07-25 10:04:27 -04:00
Sladyn	a06a5a7393	Alerting: Improve log messages (#67688 ) * Rename base logger and capitalize messages * Remove cflogger from config.go	2023-05-25 18:55:01 +03:00
Jean-Philippe Quéméner	8bb62a8316	Alerting: Add option for memberlist label (#67982 )	2023-05-09 10:32:23 +02:00
Yuri Tseretyan	a8b4a4bb45	Alerting: Update alerting module to 20230418161049-5f374e58cb32 + refactoring (#66622 ) * update to alerting 20230418161049-5f374e58cb32 * rename renamed structs in https://github.com/grafana/alerting/pull/73 * update ValidateContactPoint to use BuildReceiverConfiguration * update logger factory according to changes * rewrite integration builder Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>	2023-04-25 13:39:46 -04:00
Jean-Philippe Quéméner	bc11a484ed	Alerting: Add support for running HA using Redis (#65267 ) Co-authored-by: Steve Simpson <steve.simpson@grafana.com>	2023-04-19 17:05:26 +02:00
Yuri Tseretyan	f066e8cdcd	Alerting: Update to alerting 20230203015918-0e4e2675d7aa (after refactoring) (#62823 ) * add alerting prefix to some packages from alerting that have similar names in prometheus alertmanager	2023-02-03 11:36:49 -05:00
Santiago	ba731f7865	Alerting: Mark AM configuration as applied (#61330 ) * Mark AM configuration as applied * add missing checks, make linter happy * fix deadlock, mark as valid on save and on load * mark configurations only if needed * check error after applyConfig() * code review comments * code review changes * more code review changes * clean HistoricConfigFromAlertConfig function	2023-02-02 14:45:17 -03:00
Serge Zaitsev	d6d4097567	Chore: Fix goimports grouping in alerting (#62424 ) * fix goimports * fix goimports order	2023-01-30 09:55:35 +01:00
Santiago	b5fa9e3501	Chore: Fix "manger" typo (#61649 ) fix mangers -> managers	2023-01-17 23:13:27 +00:00
gotjosh	e7cd6eb13c	Alerting: Use `alerting.GrafanaAlertmanager` instead of initialising Alertmanager components directly (#61230 ) * Alerting: Use `alerting.GrafanaAlertmanager` instead of initialising Alertmanager components directly	2023-01-13 12:54:38 -04:00
gotjosh	ddb85ad6ad	Use the `ClusterPeer` interface from grafana/alerting (#61409 ) * Use the Cluster interface from grafana/alerting	2023-01-12 14:47:22 -04:00
Yuri Tseretyan	f0cabe14d5	Alerting: import Grafana alerting package and update usages (#60490 ) * update remaining notifiers to use alerting package	2022-12-19 10:53:58 -05:00
Alexander Weaver	3ddb28bad9	Find-and-replace 'err' logs to 'error' to match log search conventions (#57309 )	2022-10-19 17:36:54 -04:00
Santiago	09f8e026a1	Alerting: Expose info about notification delivery errors in a new /receivers endpoint (#55429 ) * (WIP) switch to fork AM, first implementation of the API, generate spec * get receivers avoiding race conditions * use latest version of our forked AM, tests * make linter happy, delete TODO comment * update number of expected paths to += 2 * delete unused endpoint code, code review comments, tests * Update pkg/services/ngalert/notifier/alertmanager.go Co-authored-by: Matthew Jacobson <matthew.jacobson@grafana.com> * remove call to fmt.Println * clear naming for fields * shorter variable names in GetReceivers Co-authored-by: Matthew Jacobson <matthew.jacobson@grafana.com>	2022-10-03 10:58:41 -03:00
Jo	ca72cd570e	Remove ioutil.ReadDir from usage (#53550 ) * add depguard rule for ioutil * replace ioutil.ReadDir with os.ReadDir * use legacy option in depguard supported in golangci-lint v1.40 * replace ioutil.ReadDir with os.ReadDir * return error for file info	2022-08-11 07:21:12 -04:00
Joe Blubaugh	12c25759da	Alerting: Attach screenshot data to Slack notifications. (#49374 ) This change extracts screenshot data from alert messages via a private annotation `__alertScreenshotToken__` and attaches a URL to a Slack message or uploads the data to an image upload endpoint if needed. This change also implements a few foundational functions for use in other notifiers.	2022-05-23 14:24:20 +08:00
Alexander Weaver	8310789ef1	Indicate whether routes are provisioned when GETting Alertmanager configuration (#47857 ) * Test composition simplification from last PR * Policies use proper API model everywhere * Expose policy provenance in API, miss some dep injection * Complete injection * fix args * Tests for provenance value * Extract test helpers so tests are very readable * Single source adapter struct that was copied in 3 places * Drop redundant test * Resolve merge conflicts on changelog	2022-04-22 11:57:56 -05:00
Alexander Weaver	758364e78b	Alerting: Refactor GET/POST alerting config routes to be more extensible (#47229 ) * Refactor GET am config to be extensible * Extract post config route * Fix tests * Remove temporary duplication * Fix broken test due to layer shift * Fix duplicated error message * Properly return 400 on config rejection * Revert weird half method extraction * Move things to notifier package and avoid redundant interface * Simplify documentation * Split encryption service and depend on minimal abstractions * Properly initialize things all the way up to the composition root * Encryption -> Crypto * Address misc feedback * Missing docstring * Few more simple polish improvements * Unify on MultiOrgAlertmanager. Discover bug in existing test * Fix rebase conflicts * Misc feedback, renames, docs * Access crypto hanging off MultiOrgAlertmanager rather than having a separate API to initialize	2022-04-14 13:06:21 -05:00
George Robinson	4e3a72fc2a	Add context.Context to AlertingStore (#45069 )	2022-02-09 09:22:09 +00:00
Serge Zaitsev	84a5910e56	Chore: Remove bus from ngalert (#44465 ) * pass notification service down to the notifiers * add ns to all notifiers * remove bus from ngalert notifiers * use smaller interfaces for notificationservice * attempt to fix the tests * remove unused struct field * simplify notification service mock * trying to resolve issues in the tests * make linter happy * make linter even happier * linter, you are annoying	2022-01-26 16:42:40 +01:00
Yuriy Tseretyan	ea478dec22	Alerting: Remove bridge between log15 and go-kit logger (#43769 ) * remove bridge between log15 and go-kit logger. * fix tests	2022-01-07 09:40:09 +01:00
idafurjes	56c3875bb9	Chore: Remove context.TODO (#43458 ) * Remove context.TODO() from services * Fix live test	2021-12-28 10:26:18 +01:00
Alexander Weaver	56b3dc5445	Alerting: Allow configuration of non-ready alertmanagers (#43063 ) * Create API test for overwriting invalid alertmanager config * Avoid requiring alertmanager readiness for config changes * AlertmanagerSrv depends on functionality rather than concrete types * Add test for non-ready alertmanagers * Additional cleanup and polish * Back out previous integration test changes * Refactor of tests incorrectly caused a test to become redundant * Use pre-existing fake secret service * Drop unused interface * Test against concrete MultiOrgAlertmanager re-using fake infra from other tests * Fix linter error * Empty commit to rerun checks	2021-12-27 17:01:17 -06:00
Jean-Philippe Quéméner	b9cdad3814	Alerting: support mute timings configuration through the api for the embedded alertmanager (#41533 ) * Alerting: accept mute_timing_intervals through the api for the embedded alertmanager * add workaround for mutetimeinterval * add mute timings to routes * revert changes * Update pkg/services/ngalert/api/api_alertmanager.go * Update pkg/services/ngalert/api/api_alertmanager.go * Update pkg/services/ngalert/api/api_alertmanager.go * update prometheus/alertmanager dependency * add some var docs	2021-11-19 16:50:55 +01:00
Jean-Philippe Quéméner	153c356993	Alerting: delete orphaned records from kvstore (#40337 )	2021-10-14 12:04:00 +02:00
gotjosh	48d73cb148	Alerting: Fixes a bug when trying to sync broken alertmanager config (#40338 ) * Alerting: Fixes a bug when trying to sync broken alertmanager config Broken alertmanager configuration has the potential to be introduced as part of a migration e.g. due to incompatible data between what grafana accepts and what the Alertmanager expects. When this happens, we expect an eventually consistent behaviour where we'll keep trying to apply the configuration until it works. As part of change in https://github.com/grafana/grafana/pull/39237 we introduced a regression that modified this behaviour and instead tried to create a new Alertmanager for that organization every time, which eventually ended up in a panic due to duplicate metrics being registered. This PR fixes that and introduces a test to catch further regressions. * Remove disable orgs	2021-10-12 18:10:08 +01:00
Jean-Philippe Quéméner	e1dfec49f9	Alerting: cleanup alert resources on org removal (#39938 )	2021-10-12 12:05:02 +02:00
Joan López de la Franca Beltran	722c414fef	Encryption: Refactor securejsondata.SecureJsonData to stop relying on global functions (#38865 ) * Encryption: Add support to encrypt/decrypt sjd * Add datasources.Service as a proxy to datasources db operations * Encrypt ds.SecureJsonData before calling SQLStore * Move ds cache code into ds service * Fix tlsmanager tests * Fix pluginproxy tests * Remove some securejsondata.GetEncryptedJsonData usages * Add pluginsettings.Service as a proxy for plugin settings db operations * Add AlertNotificationService as a proxy for alert notification db operations * Remove some securejsondata.GetEncryptedJsonData usages * Remove more securejsondata.GetEncryptedJsonData usages * Fix lint errors * Minor fixes * Remove encryption global functions usages from ngalert * Fix lint errors * Minor fixes * Minor fixes * Remove securejsondata.DecryptedValue usage * Refactor the refactor * Remove securejsondata.DecryptedValue usage * Move securejsondata to migrations package * Move securejsondata to migrations package * Minor fix * Fix integration test * Fix integration tests * Undo undesired changes * Fix tests * Add context.Context into encryption methods * Fix tests * Fix tests * Fix tests * Trigger CI * Fix test * Add names to params of encryption service interface * Remove bus from CacheServiceImpl * Add logging * Add keys to logger Co-authored-by: Emil Tullstedt <emil.tullstedt@grafana.com> * Add missing key to logger Co-authored-by: Emil Tullstedt <emil.tullstedt@grafana.com> * Undo changes in markdown files * Fix formatting * Add context to secrets service * Rename decryptSecureJsonData to decryptSecureJsonDataFn * Name args in GetDecryptedValueFn * Add template back to NewAlertmanagerNotifier * Copy GetDecryptedValueFn to ngalert * Add logging to pluginsettings * Fix pluginsettings test Co-authored-by: Tania B <yalyna.ts@gmail.com> Co-authored-by: Emil Tullstedt <emil.tullstedt@grafana.com>	2021-10-07 17:33:50 +03:00
Sofia Papagiannaki	012d4f0905	Alerting: Remove `ngalert` feature toggle and introduce two new settings for enabling Grafana 8 alerts and disabling them for specific organisations (#38746 ) * Remove `ngalert` feature toggle * Update frontend Remove all references of ngalert feature toggle * Update docs * Disable unified alerting for specific orgs * Add backend tests * Apply suggestions from code review Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com> * Disabled unified alerting by default * Ensure backward compatibility with old ngalert feature toggle * Apply suggestions from code review Co-authored-by: gotjosh <josue@grafana.com>	2021-09-29 16:16:40 +02:00
Yuriy Tseretyan	1910d85ae0	Alerting: Optimization of fetching data in multiorg alertmanager (#39237 ) * Add method GetAllLatestAlertmanagerConfiguration to DBStore * add method ApplyConfig to AlertManager * update multiorg alert manager to load all alertmanager configs at once	2021-09-21 11:01:23 -04:00
gotjosh	2ad82b9354	Alerting: Move the unified alerting settings to its own struct (#39350 )	2021-09-20 10:12:21 +03:00
gotjosh	7db97097c9	Alerting: Support Unified Alerting with Grafana HA (#37920 ) * Alerting: Support Unified Alerting in Grafana's HA mode.	2021-09-16 15:33:51 +01:00
gotjosh	a2f4344bf2	Alerting: Refactor & fix unified alerting metrics structure (#39151 ) * Alerting: Refactor & fix unified alerting metrics structure Fixes and refactors the metrics structure we have for the ngalert service. Now, each component has its own metric struct that includes JUST the metrics it uses. Additionally, I have fixed the configuration metrics and added new metrics to determine if we have discovered and started all the necessary configurations of an instance. This allows us to alert on `grafana_alerting_discovered_configurations - grafana_alerting_active_configurations != 0` to know whether an alertmanager instance did not start successfully.	2021-09-14 12:55:01 +01:00
gotjosh	39a3bb8a1c	Alerting: Persist notification log and silences to the database (#39005 ) * Alerting: Persist notification log and silences to the database This removes the dependency of having persistent disk to run grafana alerting. Instead of regularly flushing the notification log and silences to disk we now flush the binary content of those files to the database encoded as a base64 string.	2021-09-09 17:25:22 +01:00
David Parrott	7fbeefc090	Alerting: create wrapper for Alertmanager to enable org level isolation (#37320 ) Introduces org-level isolation for the Alertmanager and its components. Silences, Alerts and Contact points are now separated by org and are not shared between them. Co-authored with @davidmparrott and @papagian	2021-08-24 11:28:09 +01:00

50 Commits