grafana

mirror of https://github.com/grafana/grafana.git synced 2024-11-25 18:30:41 -06:00

Author	SHA1	Message	Date
Santiago	36a0499128	Alerting: Implement CreateSilence in the forked Alertmanager (remote primary mode) (#85716 )	2024-04-29 18:47:25 +02:00
Santiago	1af2e69625	Alerting: Implement DeleteSilence in the forked AM (remote primary) (#85721 )	2024-04-29 17:23:41 +02:00
Steve Simpson	fbaa847a3c	Alerting: Pass logger into NewRemoteLokiBackend. (#87029 ) Tiny refactor to allow a logger to be passed into NewRemoteLokiBackend.	2024-04-29 12:10:23 +02:00
Yuri Tseretyan	dff7cb9afb	Alerting: Move alertmanager api silence code to separate files (#86947 ) * Move alertmanager api silence code to separate files unchanged * Replace with silence model instead interface --------- Co-authored-by: Matt Jacobson <matthew.jacobson@grafana.com>	2024-04-25 15:20:37 -04:00
Matthew Jacobson	3397e8bf09	Alerting: Improve error when receiver or time interval used by rule is deleted (#86865 ) * Alerting: Improve error when receiver used by rule is deleted * Remove RuleUID from public error and data * Improve fallback error in am config post * Refactor to expand to time intervals * Fix message on unchecked errors to be same as before	2024-04-25 13:36:00 -04:00
Santiago	a6be12c037	Alerting: Implement SaveAndApplyConfig in the forked Alertmanager (remote primary) (#84659 ) * Alerting: Implement SaveAndApplyConfiguration in the forked Alertmanager struct * call SaveAndApplyConfig on the remote first, log errors for the internal * add comments explaining why we ignore errors in the internal AM * restore go.work.sum	2024-04-23 15:45:35 +02:00
Steve Simpson	a6ad2380bf	Alerting: Refactor api_prometheus.go request handlers. (#86639 ) This splits the request handlers into two functions, one which is the actual handler and one which is independent from the Grafana `ReqContext` object. This is to make it easier to reuse the implementation in other code. Part of the refactoring changes the functions which get query parameters from the request to operate on a `url.Values` instead of the request object. The change also makes the code consistently use `req.Form` instead of a combination of `req.URL.Query()` and `req.Form`, though I have left `api_ruler` as-is to avoid this PR growing too large.	2024-04-23 14:50:26 +02:00
Santiago	c77ab53819	Alerting: implement SaveAndApplyConfig in the remote Alertmanager struct (#84642 ) * implement SaveAndApplyConfig in the remote Alertmanager struct * remove ID from CreateGrafanaAlertmanagerConfig call * decrypt, test that we decrypt, refactor * fix duplicated declaration in test * rephrase comment, remove unnecessary conversion to slice of bytes * fix test	2024-04-23 14:37:10 +02:00
Santiago	8b7c2a459b	Alerting: Implement SaveAndApplyDefaultConfig in the forked Alertmanager (remote primary mode) (#85668 ) * Alerting: Implement SaveAndApplyDefaultConfig in the forked Alertmanager (remote primary) * log the error for the internal AM instead of returning it	2024-04-23 14:36:40 +02:00
Yuri Tseretyan	9735a8a080	Alerting: Distinguish conflict violation errors (#86634 ) * update generator to set ID = 0 and do not set 0 if unique is needed * return proper message when the constraint violation	2024-04-22 12:28:46 -04:00
Julian Siebert	14f018e3fc	Docs: Use correct description for "og_priority" (#80889 )	2024-04-22 13:53:18 +00:00
Steve Simpson	54290f2ac4	Alerting: Fix TestRouteGetRuleStatuses as much as possible. (#86666 ) This test has been skipped for a long time, so it doesn't work anymore. I've fixed the test so it works again, but left some tests disabled which were apparently flaky. If we see the other test cases flaking, we'll have to disable it again. Fixes: - Use fake access control for most test cases, and real one for FGAC test cases. - Check that "file" in API responses the full folder path, not folder title.	2024-04-22 12:36:50 +02:00
Steve Simpson	f07f48616a	Alerting: Fix panic when limit_alerts=0. (#86640 ) Oversight in the TopK function meant if k=0, then we'd panic when checking element zero in the heap, because no items are ever allowed into the heap.	2024-04-22 10:14:19 +02:00
Steve Simpson	6ea97e41fb	Alerting: Consistently return Prometheus-style responses from rules APIs. (#86600 ) * Alerting: Consistently return Prometheus-style responses from rules APIs. This commit is part refactor and part fix. The /rules API occasionally returns error responses which are inconsistent with other error responses. This fixes that, and adds a function to map from Prometheus error type and HTTP code. * Fix integration tests * Linter happiness * Make linter more happy * Fix up one more place returning non-Prometheus responses	2024-04-19 21:03:20 +02:00
Santiago	529f55cfe8	Alerting: Remove isDefault field from receivers (Alertmanager configuration) (#86605 ) Alerting: Remove isDefault field from receivers in the Alertmanager configuration	2024-04-19 15:44:20 +02:00
Santiago	309a7e7684	Alerting: Implement SaveAndApplyDefaultConfig in the remote Alertmanager struct (#85005 ) * Alerting: Implement SaveAndApplyDefaultConfig in the remote Alertmanager struct * send the hash of the encrypted configuration * tests, default config hash in AM struct * add missing default config to test * restore build directory * go work file... * fix broken test * remove unnecessary conversion to []byte * go work again... * make things work again with latest main branch changes * update error messages in tests for decrypting config	2024-04-19 15:11:07 +02:00
Santiago	a2ce8fefed	Alerting: Use a struct when sending a Grafana AM configuration to the remote Alertmanager (#86451 ) * Alerting: Use a struct when sending a Grafana AM configuration to the remote Alertmanager * remove '-distroless' from mimir image name	2024-04-19 13:04:18 +02:00
Steve Simpson	5f7612834e	Alerting: Refactoring in api_prometheus.go to allow code reuse. (#86575 ) Preparing these functions to be used by some other part of the codebase, which does not have a `contextmodel.ReqContext`, only the normal request structure (`url.Values`, etc). This is slightly messy because of how Grafana allows url parameters to be in the URL or in the request body, so we need to make sure to invoke the form parsing logic in `ReqContext`.	2024-04-19 12:52:01 +02:00
Steve Simpson	73873f5a8a	Alerting: Optimize rule status gathering APIs when a limit is applied. (#86568 ) * Alerting: Optimize rule status gathering APIs when a limit is applied. The frontend very commonly calls the `/rules` API with `limit_alerts=16`. When there are a very large number of alert instances present, this API is quite slow to respond, and profiling suggests that a big part of the problem is sorting the alerts by importance, in order to select the first 16. This changes the application of the limit to use a more efficient heap-based top-k algorithm. This maintains a slice of only the highest ranked items whilst iterating the full set of alert instances, which substantially reduces the number of comparisons needed. This is particularly effective, as the `AlertsByImportance` comparison is quite complex. I've included a benchmark to compare the new TopK function to the existing Sort/limit strategy. It shows that for small limits, the new approach is much faster, especially at high numbers of alerts, e.g. 100K alerts / limit 16: 1.91s vs 0.02s (-99%) For situations where there is no effective limit, sorting is marginally faster, therefore in the API implementation, if there is either a) no limit or b) no effective limit, then we just sort the alerts as before. There is also a space overhead using a heap which would matter for large limits. * Remove commented test cases * Make linter happy	2024-04-19 11:51:22 +02:00
Matthew Jacobson	a20197229e	Alerting: Prevent simplified routing zero duration GroupInterval and RepeatInterval (#86561 ) Prevent zero duration GroupInterval and RepeatInterval	2024-04-18 21:08:38 -04:00
Matthew Jacobson	71445002b7	Alerting: Fix simplified routing group by override (#86552 ) * Alerting: Fix simplified routing custom group by override Custom group by overrides for simplified routing were missing required fields GroupBy and GroupByAll normally set during upstream Route validation. This fix ensures those missing fields are applied to the generated routes. * Inline GroupBy and GroupByAll initialization instead of normalize after	2024-04-18 21:08:14 -04:00
Matthew Jacobson	533bed6d94	Alerting: Fix simplified routes '...' groupBy creating invalid routes (#86006 ) * Alerting: Fix simplified routes '...' groupBy creating invalid routes There were a few ways to go about this fix: 1. Modifying our copy of upstream validation to allow this 2. Modify our notification settings validation to prevent this 3. Normalize group by on save 4. Normalized group by on generate Option 4. was chosen as the others have a mix of the following cons: - Generated routes risk being incompatible with upstream/remote AM - Awkward FE UX when using '...' - Rule definition changing after save and potential pitfalls with TF With option 4. generated routes stay compatible with external/remote AMs, FE doesn't need to change as we allow mixed '...' and custom label groupBys, and settings we save to db are the same ones requested. In addition, it has the slight benefit of allowing us to hide the internal implementation details of `alertname, grafana_folder` from the user in the future, since we don't need to send them with every FE or TF request. * Safer use of DefaultNotificationSettingsGroupBy * Fix missed API tests	2024-04-16 12:14:39 -04:00
Alexander Weaver	5b1498f98f	Alerting: Return a 400 and errutil error when trying to delete a contact point that is referenced by a policy (#85481 ) Return a 400 and errutil error when trying to delete a contact point that is referenced by a policy	2024-04-15 09:25:28 -05:00
Yuri Tseretyan	12605bfed2	Alerting: Update fixed roles to include silences permissions (#85826 ) * update fixed roles to include silences * add silence actions to managed permissions * update documentation	2024-04-12 12:37:34 -04:00
Steve Simpson	ad7f804255	Alerting: Fix evaluation metrics to not count retries (#85873 ) * Change evaluation metrics to only count once per eval, and add new metrics. * Cosmetic: Move eval total Inc() to original place.	2024-04-12 16:20:46 +02:00
Matthew Jacobson	f79dd7c7f9	Alerting: Persist silence state immediately on Create/Delete (#84705 ) * Alerting: Persist silence state immediately on Create/Delete Persists the silence state to the kvstore immediately instead of waiting for the next maintenance run. This is used after Create/Delete to prevent silences from being lost when a new Alertmanager is started before the state has persisted. This can happen, for example, in a rolling deployment scenario. * Fix test that requires real data * Don't error if silence state persist fails, maintenance will correct	2024-04-09 13:39:34 -04:00
Santiago	2e7cc68394	Alerting: Remove CleanUp method from the Alertmanager (#85650 ) Alerting: Remove Cleanup method from the Alertmanager	2024-04-09 12:13:27 +02:00
Yuri Tseretyan	509691b416	Alerting: Introduce authorization logic for operations on silences (#85418 ) * extract genericService from RuleService just to reuse it later * implement silence service --------- Co-authored-by: William Wernert <william.wernert@grafana.com> Co-authored-by: Matthew Jacobson <matthew.jacobson@grafana.com>	2024-04-08 18:02:28 -04:00
Santiago	6a75a8f354	Alerting: Update grafana/alerting and use Upsert for creating silences (#85676 ) * Alerting: Update grafana/alerting and use Upsert for creating silences * go.work.sum * change error message in tests for silences (save -> upsert)	2024-04-08 11:46:14 +02:00
Alexander Weaver	03114e7602	Alerting: Return better error for invalid time range on alert queries (#85611 ) * Return better error for invalid time range * drop comment	2024-04-05 09:20:21 -05:00
Alexander Weaver	734d0111cb	Alerting: Export pure function to convert query results to alert results (#85393 ) Exported pure function to convert query results to alert results	2024-04-05 08:57:31 -05:00
Santiago	c7573bb0f7	Alerting: Make retention period configurable for the notification log (#85605 ) * Alerting: Make retention period configurable for the notification log * update sample.ini * fix outdated comment (on disk -> kvstore) * skip checking cyclomatic complexity for ReadUnifiedAlertingSettings	2024-04-05 12:25:43 +02:00
Alexander Weaver	623ee3a2be	Alerting: Only append `/alertmanager` when sending alerts to mimir targets if not already present (#85543 ) Don't append alertmanager if not present	2024-04-04 11:58:41 -05:00
Dave Henderson	5687243d0b	Feature Flags: use FeatureToggles interface where possible (#85131 ) * Feature Flags: use FeatureToggles interface where possible Signed-off-by: Dave Henderson <dave.henderson@grafana.com> * Replace TestFeatureToggles with existing WithFeatures Signed-off-by: Dave Henderson <dave.henderson@grafana.com> --------- Signed-off-by: Dave Henderson <dave.henderson@grafana.com>	2024-04-04 12:22:31 -04:00
Serge Zaitsev	faa1244518	Chore: Replace sqlstore with db interface (#85366 ) * replace sqlstore with db interface in a few packages * remove from stats * remove sqlstore in admin test * remove sqlstore from api plugin tests * fix another createUser * remove sqlstore in publicdashboards * remove sqlstore from orgs * clean up orguser test * more clean up in sso * clean up service accounts * further cleanup * more cleanup in accesscontrol * last cleanup in accesscontrol * clean up teams * more removals * split cfg from db in testenv * few remaining fixes * fix test with bus * pass cfg for testing inside db as an option * set query retries when no opts provided * revert golden test data * rebase and rollback	2024-04-04 15:04:47 +02:00
Jean-Philippe Quéméner	7cfd470c91	fix(alerting): only expose metrics if executing alerts (#85512 )	2024-04-03 17:18:02 +02:00
Benoit Tigeot	6f38ac6615	Alerting: Reduce set of fields that could trigger alert state change (#83496 ) We want to avoid too much change of alert state based on change on alert's fields. For that we ignore some fields from the diff.	2024-03-26 12:35:30 -04:00
Julien Duchesne	2188516a21	Alerting: Fix receiver inheritance when provisioning a notification policy (#82007 ) Terraform Issue: grafana/terraform-provider-grafana#1007 Nested routes should be allowed to inherit the contact point from the root (or direct parent) route but this fails in the provisioning API (it works in the UI)	2024-03-26 12:31:59 -04:00
ismail simsek	6137c4e0a6	Chore: Bump golangci-lint v1.57.1 (#84998 ) * bump golangci-lint v1.57.1 * update setting * remove goconst * fix linting issues * prettier * fix G601 * go mod tidy go work sync	2024-03-25 15:28:24 +01:00
Matthew Jacobson	0c3c5c5607	Alerting: Stop persisting silences and nflog to disk (#84706 ) With this change, we no longer need to persist silence/nflog states to disk in addition to the kvstore	2024-03-23 00:37:33 +02:00
Yuri Tseretyan	48de8657c9	Alerting: Editor role can access all provisioning API (#85022 )	2024-03-23 00:14:15 +02:00
Yuri Tseretyan	b9abb8cabb	Alerting: Update provisioning API to support regular permissions (#77007 ) * allow users with regular actions access provisioning API paths * update methods that read rules skip new authorization logic if user CanReadAllRules to avoid performance impact on file-provisioning update all methods to accept identity.Requester that contains all permissions and is required by access control. * create deltas for single rule * update modify methods skip new authorization logic if user CanWriteAllRules to avoid performance impact on file-provisioning update all methods to accept identity.Requester that contains all permissions and is required by access control. * implement RuleAccessControlService in provisioning * update file provisioning user to have all permissions to bypass authz * update provisioning API to return errutil errors correctly --------- Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>	2024-03-22 15:37:10 -04:00
Yuri Tseretyan	e138ae3eb9	Alerting: Improve openAPI specification and docs for export endpoints (#85008 )	2024-03-22 18:25:27 +02:00
Jean-Philippe Quéméner	f2c7023fe6	fix(alerting): use uid and not rand() in tests for title (#85001 )	2024-03-22 16:26:09 +02:00
Santiago	a2facbecd4	Alerting: Implement ApplyConfig for remote primary mode (forked AM) (#84811 ) * Alerting: Implement ApplyConfig for remote primary mode (forked AM) * add TODO for saving the config hash in other config-related methods * fix bad method receiver name (m -> am) * tests * add mutex * remove sync loop	2024-03-22 15:17:41 +01:00
Pepe Cano	2d6586952d	Alerting: Add placeholder to the Email Contact Point Message (#84064 )	2024-03-21 13:03:12 -04:00
Matthew Jacobson	fbd057b258	Alerting: Stop returning autogen routes for non-admin on api/v2/status (#84864 ) * Alerting: Stop returning autogen routes for non-admin on api/v2/status * Improve api/v2/status integration tests for user roles	2024-03-20 22:04:35 +02:00
William Wernert	6d16cf2699	Alerting: Marshal incoming json.RawMessage in diff (#84692 ) This will ensure the encoding is correct when comparing to the existing rule.	2024-03-20 13:10:39 -04:00
Yuri Tseretyan	04c9f459ec	Alerting: do not check for folder in file provisioning (#84822 ) provide nil folder service in file provisioning	2024-03-20 10:39:03 -04:00
Yuri Tseretyan	e593d36ed8	Alerting: Update rule access control to explicitly check for permissions "alert.rules:read" and "folders:read" (#78289 ) * require "folders:read" and "alert.rules:read" in all rules API requests (write and read). * add check for permissions "folders:read" and "alert.rules:read" to AuthorizeAccessToRuleGroup and HasAccessToRuleGroup * check only access to datasource in rule testing API --------- Co-authored-by: William Wernert <william.wernert@grafana.com>	2024-03-19 22:20:30 -04:00
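
Note on commits 73873f5a8a and f07f48616a: the heap-based top-k selection they describe can be sketched roughly as below. This is a minimal illustration only, not the code from those commits. The names `minHeap` and `topK` are hypothetical, plain integers stand in for alert instances, and `<` stands in for the `AlertsByImportance` comparison. The early return for `k <= 0` illustrates the kind of guard that avoids inspecting element zero of an empty heap.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// minHeap keeps the smallest retained value at index 0 so it can be
// compared against, and replaced by, each new candidate cheaply.
type minHeap []int

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(int)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topK returns the k largest values of items in descending order.
// Only k items are held at any time, so most candidates cost a single
// comparison against the heap root.
func topK(items []int, k int) []int {
	if k <= 0 {
		// Nothing is ever admitted to the heap, so h[0] must not be read.
		return nil
	}
	h := make(minHeap, 0, k)
	for _, v := range items {
		if len(h) < k {
			heap.Push(&h, v)
			continue
		}
		if v > h[0] {
			// Replace the current minimum and restore heap order.
			h[0] = v
			heap.Fix(&h, 0)
		}
	}
	out := []int(h)
	sort.Sort(sort.Reverse(sort.IntSlice(out)))
	return out
}

func main() {
	fmt.Println(topK([]int{5, 1, 9, 3, 7, 8}, 3)) // [9 8 7]
	fmt.Println(len(topK([]int{5, 1, 9}, 0)))     // 0
}
```

As the commit message itself notes, a full sort is marginally faster when there is no effective limit, so a sketch like this only pays off when k is small relative to the number of items.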

1395 Commits