* Alerting: In migration improve deduplication of title and group
This change improves the alert titles generated by the legacy migration
when duplicate titles need to be deduplicated. When a duplicate title is
detected, we now first attempt to append a sequential index, falling
back to a random UID if none is unique within 10 attempts.
This should produce shorter, more readable deduplicated titles in most
cases.
In addition, groups are no longer deduplicated. Instead we set them
to a combination of truncated dashboard name and humanized alert
frequency. This way, alerts from the same dashboard share a group
if they have the same evaluation interval. In the event that truncation
causes overlap, it won't be a big issue as all alerts will still be in a
group with the correct evaluation interval.
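For illustration, a minimal sketch of that deduplication strategy in Go (hypothetical names, not the actual migration code):
```go
package migration

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// randomUID stands in for Grafana's short UID generator in this sketch.
func randomUID() string {
	b := make([]byte, 4)
	_, _ = rand.Read(b)
	return hex.EncodeToString(b)
}

// deduplicateTitle appends a sequential index to a duplicate title and
// falls back to a random UID if none is unique within 10 attempts.
func deduplicateTitle(title string, taken map[string]struct{}) string {
	if _, exists := taken[title]; !exists {
		return title
	}
	for i := 1; i <= 10; i++ {
		candidate := fmt.Sprintf("%s #%d", title, i)
		if _, exists := taken[candidate]; !exists {
			return candidate
		}
	}
	return fmt.Sprintf("%s #%s", title, randomUID())
}
```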
* Alerting: Introduce a Mimir client as part of the Remote Alertmanager
A Mimir client that understands the new APIs developed for Mimir. Very much still a WIP.
* more wip
* appease the linter
* more linting
* add more code
* get state from kvstore, encode, send
* send state to the remote Alertmanager, extract fullstate logic into its own function
* pass kvstore to remote.NewAlertmanager()
* refactor
* add fake kvstore to tests
* tests
* use FileStore to get state
* always log 'completed state upload'
* refactor compareRemoteConfig
* base64-encode the state in the file store
* export silences and nflog filenames, refactor
* log 'completed state/config upload...' regardless of outcome
* add values to the state store in tests
* address code review comments
* log error from filestore
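Roughly, the state upload flow described in the commits above looks like this sketch (types and method names are assumptions):
```go
// uploadState reads the full state through the FileStore (which
// base64-encodes what it loads from the kvstore) and sends it to the
// remote Alertmanager. Names here are approximate.
func (am *Alertmanager) uploadState(ctx context.Context) error {
	// Logged regardless of outcome, per the commits above.
	defer am.log.Debug("completed state upload")

	state, err := am.stateStore.GetFullState(ctx, SilencesFilename, NotificationLogFilename)
	if err != nil {
		return fmt.Errorf("getting full state: %w", err)
	}
	if err := am.mimirClient.CreateGrafanaAlertmanagerState(ctx, state); err != nil {
		return fmt.Errorf("sending state: %w", err)
	}
	return nil
}
```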
---------
Co-authored-by: gotjosh <josue.abreu@gmail.com>
* Alerting: Apply query optimization to eval endpoints
Previously, query optimization was applied to alert queries when scheduled but
not when run through `api/v1/eval` or `/api/v1/rule/test/grafana`. This could
lead to discrepancies between preview and scheduled alert results.
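A hedged sketch of the fix (function and package names are hypothetical): the eval endpoints now run the same optimization pass the scheduler applies.
```go
// Hypothetical sketch: run the same query optimization the scheduler
// uses before evaluating a preview, so results match scheduled runs.
func (srv *TestingApiSrv) evalCondition(ctx context.Context, cond models.Condition) (eval.Results, error) {
	// Previously only the scheduler ran this pass, so previews could
	// differ from scheduled evaluations.
	if _, err := optimizer.OptimizeAlertQueries(cond.Data); err != nil {
		return nil, err
	}
	return srv.evaluator.ConditionEval(ctx, cond)
}
```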
* Alerting: Add GetFullState method to FileStore
* make tests compile, create stateStore in NewAlertmanager
* return errors instead of logging, accept an arbitrary number of strings
* make NewAlertmanager() accept a stateStore
* Alerting: In migration, fallback to '1s' for malformed min interval
During legacy migration, when we encounter an alert datasource query
with a min interval (interval field in the query model) that is not
parseable, instead of failing the migration we fall back to a min interval
of 1s and continue.
The reason for this is a bug in legacy alerting (present for a few major
versions) that allows arbitrary dashboard variables to be used as the
min interval, even though those variables do not work and will cause
the legacy alert to fail with `interval calculation failed: time: invalid
duration`.
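As a sketch, with time.ParseDuration standing in for Grafana's interval parser:
```go
import "time"

// parseMinInterval falls back to 1s instead of failing the migration when
// the query model's interval is malformed (e.g. a dashboard variable).
func parseMinInterval(raw string) time.Duration {
	d, err := time.ParseDuration(raw)
	if err != nil {
		// e.g. raw == "$min_interval": not a valid duration in legacy either.
		return time.Second
	}
	return d
}
```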
* Remote Alertmanager(refactor): Only parse the URL once
Exactly what it says on the tin.
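That is, something along these lines (names assumed):
```go
type Mimir struct {
	endpoint *url.URL
}

// NewMimir parses the URL once at construction time; requests reuse the
// stored *url.URL instead of re-parsing the raw string on every call.
func NewMimir(rawURL string) (*Mimir, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, fmt.Errorf("invalid Mimir URL %q: %w", rawURL, err)
	}
	return &Mimir{endpoint: u}, nil
}
```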
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* use the existing tests
Signed-off-by: gotjosh <josue.abreu@gmail.com>
---------
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Alerting: Introduce a Mimir client as part of the Remote Alertmanager
This is our first attempt at making Grafana use Mimir as a backend - it uses a new set of APIs that we've developed on the Mimir side to upload the Grafana configuration and Alertmanager state so that it can then be ported over.
Codewise, we've introduced a couple of things:
- A client that isolates all the communication with Mimir in its own package
- Changes to the remote Alertmanager to upload the configuration and state when it starts
- A few refactors that align better with the design approach we're thinking of
- Integration tests against these newly developed APIs using a custom image
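The client's surface looks roughly like this (method names are assumptions, not the exact exported API):
```go
// MimirClient isolates all communication with Mimir in its own package.
type MimirClient interface {
	// Upload the Grafana Alertmanager configuration for Mimir to run.
	CreateGrafanaAlertmanagerConfig(ctx context.Context, cfg string) error
	// Upload the base64-encoded silences and notification-log state.
	CreateGrafanaAlertmanagerState(ctx context.Context, state string) error
}
```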
---------
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* remove use of SignedInUserCopies
* add extra safety to not cross-assign permissions
unwind circular dependency
dashboardacl->dashboardaccess
fix missing import
* correctly set teams for permissions
* fix missing inits
* nit: check err
* exit early for api keys
* Alerting: Add an empty Forked Alertmanager
* Alerting: Add methods for silences to the forked Alertmanager
* check for errors in tests
* make linter happy
* Alerting: Add methods for alerts to the forked Alertmanager
* Alerting: Add methods for receivers to the forked Alertmanager
* Alerting: Add TestTemplate method to the forked Alertmanager
* make linter happy
* separate into both forked AMs
* fix tests
* Alerting: Add lifecycle methods to the forked Alertmanager
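Conceptually, the forked Alertmanager is a thin router in front of the internal and remote implementations; a sketch with approximate names:
```go
// ForkedAlertmanager implements the Alertmanager interface by delegating
// each method to either the internal or the remote Alertmanager.
type ForkedAlertmanager struct {
	internal Alertmanager
	remote   Alertmanager
}

// In remote-secondary mode, silences are managed by the internal
// Alertmanager, which remains the primary.
func (fam *ForkedAlertmanager) CreateSilence(ctx context.Context, silence *PostableSilence) (string, error) {
	return fam.internal.CreateSilence(ctx, silence)
}
```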
* rename testErr -> expErr
* When running in dev mode, error messages would contain an additional "error" property alongside "message". Since this caused confusion, it has been removed and error messages are now the same in both modes (using "message").
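Illustratively, the response payload now carries a single field (the struct is a sketch, not the exact type):
```go
// errorResponse sketches the JSON shape: the dev-mode-only "error"
// property is gone; "message" is used in both modes.
type errorResponse struct {
	Message string `json:"message"`
}
```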
* Alerting: Rename remote.ExternalAlertmanager to remote.Alertmanager
* Alerting: Send alerts to the remote Alertmanager
* add ticker to readiness check, add tests (see the sketch below)
* use options when creating a new sender.ExternalAlertmanager
* unexport defaultMaxQueueCapacity
* delete unused defaultConfig field
* add debug log line when sending alerts to the remote alertmanager
* move and refactor readiness check
* update tests to not include defaultConfig
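The readiness check from these commits amounts to polling on a ticker until the remote Alertmanager responds; a sketch with assumed names:
```go
// checkReadiness polls the remote Alertmanager on a ticker and marks the
// struct ready on the first successful response. Names are approximate.
func (am *Alertmanager) checkReadiness(ctx context.Context) error {
	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			if err := am.mimirClient.IsReady(ctx); err == nil {
				am.ready = true
				return nil
			}
		}
	}
}
```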
* Alerting: Move `ExternalAlertmanager` to its own package
This lets us avoid import cycles when using components from other packages. In addition to that, I've created an `Options` approach for the multi-org Alertmanager to allow us to override how per-tenant Alertmanagers are created.
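The `Options` approach is the usual functional-options pattern; a sketch (names are assumptions):
```go
type Option func(*MultiOrgAlertmanager)

// WithAlertmanagerOverride lets callers replace how per-tenant
// Alertmanagers are created, e.g. to return a remote implementation.
func WithAlertmanagerOverride(factory func(orgID int64) (Alertmanager, error)) Option {
	return func(moa *MultiOrgAlertmanager) { moa.factory = factory }
}

func NewMultiOrgAlertmanager(opts ...Option) *MultiOrgAlertmanager {
	// newInternalAlertmanager is the assumed default per-tenant factory.
	moa := &MultiOrgAlertmanager{factory: newInternalAlertmanager}
	for _, opt := range opts {
		opt(moa)
	}
	return moa
}
```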
* switch things around
* address review comments
* fix references and warnings
* Alerting: Move migration from background service run to ngalert init
SQLite database write contention between the migration's single transaction and
dashboard provisioning's frequent commits was causing the migration to
fail with SQLITE_BUSY/SQLITE_BUSY_SNAPSHOT on all retries.
This is not a new issue for SQLite + Grafana, but the discrepancy between the
lengths of the transactions made the failure very consistent. In addition,
since a failed migration has implications for the assumed correctness of the
Alertmanager and alert rule definition state, we shut the server down on
error. This can make e2e tests, as well as some high-load provisioned
SQLite installations, flaky on startup.
The correct fix for this is better transaction management across various
services and is out of scope for this change, as we're primarily interested in
mitigating the current bout of server failures in e2e tests when using SQLite.
* Alerting: post alerts to the remote Alertmanager and fetch them
* fix broken tests
* Alerting: Add Mimir Backend image to devenv (blocks)
* add alerting as code owner for mimir_backend block
* Alerting: Use Mimir image to run integration tests for the remote Alertmanager
* skip integration test when running all tests
* skipping integration test when no Alertmanager URL is provided
* fix bad host for mimir_backend
* remove basic auth testing until we have an nginx image in our CI
* add integration tests for alerts
* fix tests
* change SendCtx -> Send, add context.Context to Send, fix CI
* add recover() for functions from the Prometheus Alertmanager HTTP client that could panic (see the sketch below)
* add TODO to implement PutAlerts in a way that mimics what Prometheus does
* fix log format
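The recover() guard mentioned above converts panics from the upstream client into errors; a sketch:
```go
// withRecover wraps calls into the Prometheus Alertmanager HTTP client,
// turning a panic into a returned error instead of crashing the caller.
func withRecover(fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	return fn()
}
```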