grafana

mirror of https://github.com/grafana/grafana.git synced 2024-11-25 18:30:41 -06:00

Author	SHA1	Message	Date
Steve Simpson	8421919cb5	Alerting: Feature toggle to disallow sending alerts externally (#87982 ) * Define feature toggle * Implement feature toggle	2024-05-23 14:29:19 +02:00
Gaurav Agrawal	fdaa091a4d	Alerting: Support custom API URL for PagerDuty integration (#88007 ) * fix assert in LINE * fix pagerduty asserts --------- Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>	2024-05-22 15:31:55 -04:00
Alexander Weaver	89b54d06e9	Alerting: Schedule a shim implementation for recording rules (#87939 ) * Add shim rule implementation for recording rules * Give ruleFactory access to the original rule definition * Schedule shim implementation if the rule is a recording rule * Fix or suppress linter * Fix nolint	2024-05-21 16:42:58 -05:00
Alexander Weaver	49c8deb1ea	Alerting: Add recording rules to ruler API and validation (#87779 ) * Read path, main API * Define record field for incoming requests * Refactor several alerting-specific validators into two paths * Refactor validateCondition to actually contain all the condition validation logic * Move condition validation inside rule path * Validators for recording rules * Wire feature flag through to validators * Test for accepting a valid recording rule * Tests for negative case, no UID * Test for ignoring alerting fields * Build conditions based on recording rules as well * Regenerate swagger docs * Fix CRUD test to cover the right thing * Re-generate swagger docs with backdated v0.30.2 version * Regenerate base spec * Regenerate ngalert specs * Regenerate top level specs * Comment and rename * Return struct instead of modifying ref	2024-05-21 14:39:28 -05:00
William Wernert	cb0bcb6fe4	Alerting: Fix/update alerting API spec (#88130 )	2024-05-21 10:06:44 -04:00
Santiago	60e7a4e746	Alerting/Chore: Remove unused parameters (#88045 ) Alerting/Chore: Remove unused parameters from redisPeer.receiveLoop() and ReceiverService.shouldDecrypt()	2024-05-20 16:37:39 +02:00
Yuri Tseretyan	8c2a382788	Alerting: Fix typo in JSON response for rule export. (#88028 )	2024-05-20 09:39:39 -04:00
Yuri Tseretyan	05d6813a09	Alerting: Fix scheduler to sort rules before evaluation (#88006 ) sort rules scheduled for evaluation to make sure that the order is stable between evaluations. This is especially important in HA mode.	2024-05-17 11:38:19 -04:00
Santiago	e41434c332	Alerting: Promote configuration in the remote Alertmanager (#87388 )	2024-05-16 12:06:03 +02:00
Yuri Tseretyan	f410c7fca1	Alerting: use logger with same context within rule scheduling loop (#87934 )	2024-05-15 15:38:00 -04:00
Alexander Weaver	1badcf4b63	Alerting: Allow NoData and ExecErrState to be fully blank on recording rules (#87868 ) * Allow empty NoData and ExecErrState on recording rules * remove TODO about this	2024-05-15 09:35:54 -05:00
Alexander Weaver	b8a284fb81	Alerting: Fix xorm serialization of Record field struct, add tests for storing and reading (#87857 ) Fix sub struct ser and deser, add tests	2024-05-14 14:50:06 -05:00
Steve Simpson	67fa96f88d	Alerting: Pass logger into NewAnnotationBackend. (#87812 ) * Alerting: Pass logger into NewAnnotationBackend. Make it possible to pass loggers into more places for code reuse. * Mistake in passing logger	2024-05-14 15:51:27 +02:00
William Wernert	563fcb8bf4	Alerting: Encode query model map to string in rule export to avoid html escape sequences (#87663 ) * Encode query model map to string to avoid html escape sequences * Remove insignificant whitespace in test request	2024-05-14 09:29:50 -04:00
Fayzal Ghantiwala	7a2fbad0c8	Alerting: Add options to configure TLS for HA using Redis (#87567 ) * Add Alerting HA Redis Client TLS configs * Add test to ping miniredis with mTLS * Update .ini files and docs * Add tests for unified alerting ha redis TLS settings * Fix malformed go.sum * Add modowner * Fix lint error * Update docs and use dstls config	2024-05-14 14:21:42 +01:00
Alexander Weaver	e39658097f	Alerting: Wire recording rules feature toggle into limits struct (#87778 ) Wire toggle into limits	2024-05-14 07:44:14 -05:00
Ieva	167151b211	Chore: Remove use of deprecated method in AC code (#87541 ) * switch from using cfg to using featuremgmt for checking a feature toggle in AC code * merge test fixes	2024-05-10 11:56:52 +01:00
Alexander Weaver	a6a9ab4008	Alerting: Do not store series values from past evaluations in state manager for no reason (#87525 ) Do not store previous execution results on states	2024-05-09 15:51:55 -05:00
Yuri Tseretyan	356a29592b	Alerting: Add two sets of provisioning actions for rules and notifications (#87149 )	2024-05-09 13:19:07 -04:00
Alexander Weaver	36ef611cf4	Alerting: Add database migration for recording rule fields (#87012 ) * Create recording rule fields in model * Add migration * Write to database, support in version table * extend fingerprint * Force fields to be empty on validate * Another storage spot, tests for fingerprint * Explicitly set defaults in provisioning API * Tests for main API validation * Add diff tests even though fields are unpopulated for now * Use struct tag approach instead of FromDB/ToDB hooks as it better handles nulls when deserializing * test for deser * Backout RecordTo for now since it's not decided in the doc * back out of migration too * Drop datasourceref for now * address linter complaints * Try a single outer struct with all fields embedded	2024-05-09 12:12:44 -05:00
Alexander Weaver	6c47968f6c	Alerting: Do not retry rule evaluations with "input data must be a wide series but got type long" style errors (#87343 ) add typed error for series must be wide, do not retry	2024-05-07 11:31:07 -05:00
Matthew Jacobson	babfa2beac	Alerting: Hook up GMA silence APIs to new authentication handler (#86625 ) This PR connects the new RBAC authentication service to existing alertmanager API silence endpoints.	2024-05-03 15:32:30 -04:00
Santiago	b76a9e4d31	Alerting: Implement GetStatus in the remote Alertmanager struct (#84887 ) * Alerting: Implement GetStatus in the remote Alertmanager struct * update tests * fix tests, extract AlertmanagerConfig from PostableConfig * get the remote AM config instead of the Grafana one from the remote AM * pass grafana AM config in test * return error in GetStatus instead of logging it (internal AM)	2024-05-03 13:59:02 +02:00
Fayzal Ghantiwala	df25e9197e	Alerting: Get grafana-managed alert rule by UID (#86845 ) * Add auth checks and test * Check user is authorized to view rule and add tests * Change naming * Update Swagger params * Update auth test and swagger gen * Update swagger gen * Change response to GettableExtendedRuleNode * openapi3-gen * Update tests with refactors models pkg	2024-05-02 15:24:59 +01:00
Serge Zaitsev	ad5613d7d4	Chore: Remove cfg from folder service (#87212 ) remove cfg from folder service	2024-05-02 13:18:54 +02:00
William Wernert	93519f70ca	Alerting: Also fix HCL field name for MuteTimeIntervals (#87079 ) * Correct HCL field name for MuteTimeIntervals * Update test	2024-04-30 16:14:01 +01:00
Yuri Tseretyan	052082a927	Alerting: Refactor Alert Rule Generators (#86813 )	2024-04-29 21:52:15 -04:00
William Wernert	70ff229bed	Alerting: Use expected field name for receiver in HCL export (#87065 ) * Use expected field name for receiver in hcl Terraform provider expects `contact_point` instead of `receiver` in notification settings on a rule.	2024-04-29 18:13:29 +01:00
Santiago	36a0499128	Alerting: Implement CreateSilence in the forked Alertmanager (remote primary mode) (#85716 )	2024-04-29 18:47:25 +02:00
Santiago	1af2e69625	Alerting: Implement DeleteSilence in the forked AM (remote primary) (#85721 )	2024-04-29 17:23:41 +02:00
Steve Simpson	fbaa847a3c	Alerting: Pass logger into NewRemoteLokiBackend. (#87029 ) Tiny refactor to allow a logger to be passed into NewRemoteLokiBackend.	2024-04-29 12:10:23 +02:00
Yuri Tseretyan	dff7cb9afb	Alerting: Move alertmanager api silence code to separate files (#86947 ) * Move alertmanager api silence code to separate files unchanged * Replace with silence model instead interface --------- Co-authored-by: Matt Jacobson <matthew.jacobson@grafana.com>	2024-04-25 15:20:37 -04:00
Matthew Jacobson	3397e8bf09	Alerting: Improve error when receiver or time interval used by rule is deleted (#86865 ) * Alerting: Improve error when receiver used by rule is deleted * Remove RuleUID from public error and data * Improve fallback error in am config post * Refactor to expand to time intervals * Fix message on unchecked errors to be same as before	2024-04-25 13:36:00 -04:00
Santiago	a6be12c037	Alerting: Implement SaveAndApplyConfig in the forked Alertmanager (remote primary) (#84659 ) * Alerting: Implement SaveAndApplyConfiguration in the forked Alertmanager struct * call SaveAndApplyConfig on the remote first, log errors for the internal * add comments explaining why we ignore errors in the internal AM * restore go.work.sum	2024-04-23 15:45:35 +02:00
Steve Simpson	a6ad2380bf	Alerting: Refactor api_prometheus.go request handlers. (#86639 ) This splits the request handlers into two functions, one which is the actual handler and one which is independent from the Grafana `ReqContext` object. This is to make it easier to reuse the implementation in other code. Part of the refactoring changes the functions which get query parameters from the request to operate on a `url.Values` instead of the request object. The change also makes the code consistently use `req.Form` instead of a combination of `req.URL.Query()` and `req.Form`, though I have left `api_ruler` as-is to avoid this PR growing too large.	2024-04-23 14:50:26 +02:00
Santiago	c77ab53819	Alerting: implement SaveAndApplyConfig in the remote Alertmanager struct (#84642 ) * implement SaveAndApplyConfig in the remote Alertmanager struct * remove ID from CreateGrafanaAlertmanagerConfig call * decrypt, test that we decrypt, refactor * fix duplicated declaration in test * rephrase comment, remove unnecessary conversion to slice of bytes * fix test	2024-04-23 14:37:10 +02:00
Santiago	8b7c2a459b	Alerting: Implement SaveAndApplyDefaultConfig in the forked Alertmanager (remote primary mode) (#85668 ) * Alerting: Implement SaveAndApplyDefaultConfig in the forked Alertmanager (remote primary) * log the error for the internal AM instead of returning it	2024-04-23 14:36:40 +02:00
Yuri Tseretyan	9735a8a080	Alerting: Distinguish conflict violation errors (#86634 ) * update generator to set ID = 0 and do not set 0 if unique is needed * return a proper message when a constraint violation occurs	2024-04-22 12:28:46 -04:00
Julian Siebert	14f018e3fc	Docs: Use correct description for "og_priority" (#80889 )	2024-04-22 13:53:18 +00:00
Steve Simpson	54290f2ac4	Alerting: Fix TestRouteGetRuleStatuses as much as possible. (#86666 ) This test has been skipped for a long time, so it doesn't work anymore. I've fixed the test so it works again, but left some tests disabled which were apparently flaky. If we see the other test cases flaking, we'll have to disable it again. Fixes: - Use fake access control for most test cases, and real one for FGAC test cases. - Check that "file" in API responses is the full folder path, not folder title.	2024-04-22 12:36:50 +02:00
Steve Simpson	f07f48616a	Alerting: Fix panic when limit_alerts=0. (#86640 ) Oversight in the TopK function meant if k=0, then we'd panic when checking element zero in the heap, because no items are ever allowed into the heap.	2024-04-22 10:14:19 +02:00
Steve Simpson	6ea97e41fb	Alerting: Consistently return Prometheus-style responses from rules APIs. (#86600 ) * Alerting: Consistently return Prometheus-style responses from rules APIs. This commit is part refactor and part fix. The /rules API occasionally returns error responses which are inconsistent with other error responses. This fixes that, and adds a function to map from Prometheus error type and HTTP code. * Fix integration tests * Linter happiness * Make linter more happy * Fix up one more place returning non-Prometheus responses	2024-04-19 21:03:20 +02:00
Santiago	529f55cfe8	Alerting: Remove isDefault field from receivers (Alertmanager configuration) (#86605 ) Alerting: Remove isDefault field from receivers in the Alertmanager configuration	2024-04-19 15:44:20 +02:00
Santiago	309a7e7684	Alerting: Implement SaveAndApplyDefaultConfig in the remote Alertmanager struct (#85005 ) * Alerting: Implement SaveAndApplyDefaultConfig in the remote Alertmanager struct * send the hash of the encrypted configuration * tests, default config hash in AM struct * add missing default config to test * restore build directory * go work file... * fix broken test * remove unnecessary conversion to []byte * go work again... * make things work again with latest main branch changes * update error messages in tests for decrypting config	2024-04-19 15:11:07 +02:00
Santiago	a2ce8fefed	Alerting: Use a struct when sending a Grafana AM configuration to the remote Alertmanager (#86451 ) * Alerting: Use a struct when sending a Grafana AM configuration to the remote Alertmanager * remove '-distroless' from mimir image name	2024-04-19 13:04:18 +02:00
Steve Simpson	5f7612834e	Alerting: Refactoring in api_prometheus.go to allow code reuse. (#86575 ) Preparing these functions to be used by some other part of the codebase, which does not have a `contextmodel.ReqContext`, only the normal request structure (`url.Values`, etc). This is slightly messy because of how Grafana allows url parameters to be in the URL or in the request body, so we need to make sure to invoke the form parsing logic in `ReqContext`.	2024-04-19 12:52:01 +02:00
Steve Simpson	73873f5a8a	Alerting: Optimize rule status gathering APIs when a limit is applied. (#86568 ) * Alerting: Optimize rule status gathering APIs when a limit is applied. The frontend very commonly calls the `/rules` API with `limit_alerts=16`. When there are a very large number of alert instances present, this API is quite slow to respond, and profiling suggests that a big part of the problem is sorting the alerts by importance, in order to select the first 16. This changes the application of the limit to use a more efficient heap-based top-k algorithm. This maintains a slice of only the highest ranked items whilst iterating the full set of alert instances, which substantially reduces the number of comparisons needed. This is particularly effective, as the `AlertsByImportance` comparison is quite complex. I've included a benchmark to compare the new TopK function to the existing Sort/limit strategy. It shows that for small limits, the new approach is much faster, especially at high numbers of alerts, e.g. 100K alerts / limit 16: 1.91s vs 0.02s (-99%) For situations where there is no effective limit, sorting is marginally faster, therefore in the API implementation, if there is either a) no limit or b) no effective limit, then we just sort the alerts as before. There is also a space overhead using a heap which would matter for large limits. * Remove commented test cases * Make linter happy	2024-04-19 11:51:22 +02:00
Matthew Jacobson	a20197229e	Alerting: Prevent simplified routing zero duration GroupInterval and RepeatInterval (#86561 ) Prevent zero duration GroupInterval and RepeatInterval	2024-04-18 21:08:38 -04:00
Matthew Jacobson	71445002b7	Alerting: Fix simplified routing group by override (#86552 ) * Alerting: Fix simplified routing custom group by override Custom group by overrides for simplified routing were missing required fields GroupBy and GroupByAll normally set during upstream Route validation. This fix ensures those missing fields are applied to the generated routes. * Inline GroupBy and GroupByAll initialization instead of normalize after	2024-04-18 21:08:14 -04:00
Matthew Jacobson	533bed6d94	Alerting: Fix simplified routes '...' groupBy creating invalid routes (#86006 ) * Alerting: Fix simplified routes '...' groupBy creating invalid routes There were a few ways to go about this fix: 1. Modifying our copy of upstream validation to allow this 2. Modify our notification settings validation to prevent this 3. Normalize group by on save 4. Normalized group by on generate Option 4. was chosen as the others have a mix of the following cons: - Generated routes risk being incompatible with upstream/remote AM - Awkward FE UX when using '...' - Rule definition changing after save and potential pitfalls with TF With option 4. generated routes stay compatible with external/remote AMs, FE doesn't need to change as we allow mixed '...' and custom label groupBys, and settings we save to db are the same ones requested. In addition, it has the slight benefit of allowing us to hide the internal implementation details of `alertname, grafana_folder` from the user in the future, since we don't need to send them with every FE or TF request. * Safer use of DefaultNotificationSettingsGroupBy * Fix missed API tests	2024-04-16 12:14:39 -04:00


1423 Commits
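
Commit 73873f5a8a above describes replacing a full sort with a heap-based top-k selection when `limit_alerts` is small, and commit f07f48616a fixes a panic when the limit is 0. The following is a minimal sketch of that general technique, not the code from those commits: the names `TopK` and `minHeap` and the `more` comparator are hypothetical, and plain integers stand in for alert instances.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// minHeap holds the best k items seen so far, with the least important
// one at the root so it can be replaced cheaply. (Hypothetical type.)
type minHeap[T any] struct {
	items []T
	less  func(a, b T) bool // reports whether a ranks below b
}

func (h *minHeap[T]) Len() int           { return len(h.items) }
func (h *minHeap[T]) Less(i, j int) bool { return h.less(h.items[i], h.items[j]) }
func (h *minHeap[T]) Swap(i, j int)      { h.items[i], h.items[j] = h.items[j], h.items[i] }
func (h *minHeap[T]) Push(x any)         { h.items = append(h.items, x.(T)) }
func (h *minHeap[T]) Pop() any {
	n := len(h.items)
	x := h.items[n-1]
	h.items = h.items[:n-1]
	return x
}

// TopK returns the k most important items, most important first.
// more(a, b) must report whether a is more important than b.
func TopK[T any](items []T, k int, more func(a, b T) bool) []T {
	// Guard for k <= 0: nothing is ever admitted to the heap, so the
	// root must not be inspected.
	if k <= 0 {
		return nil
	}
	h := &minHeap[T]{less: func(a, b T) bool { return more(b, a) }}
	for _, it := range items {
		if h.Len() < k {
			heap.Push(h, it)
			continue
		}
		// Replace the current weakest item only if the new one beats it.
		if more(it, h.items[0]) {
			h.items[0] = it
			heap.Fix(h, 0)
		}
	}
	// Only the k survivors are sorted, not the full input.
	out := h.items
	sort.Slice(out, func(i, j int) bool { return more(out[i], out[j]) })
	return out
}

func main() {
	scores := []int{5, 1, 9, 3, 7, 8}
	desc := func(a, b int) bool { return a > b }
	fmt.Println(TopK(scores, 3, desc))      // prints [9 8 7]
	fmt.Println(len(TopK(scores, 0, desc))) // prints 0
}
```

Each input item costs at most one comparison against the root unless it displaces it, which is why this helps most when the comparator is expensive and k is small. As the commit message notes, when there is no effective limit a plain sort is the simpler and marginally faster choice.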