* Add notification settings to storage/domain and API models. Settings are a slice to work around XORM mapping
* Support validation of notification settings when rules are updated
* Implement a route generator for the Alertmanager configuration that fetches all notification settings (a sketch follows this list)
* Update multi-tenant Alertmanager to run the generator before applying the configuration.
* Add notification settings labels to state calculation
* Update the multi-tenant Alertmanager to provide validation for notification settings
* Update the GET API so only admins can see the auto-generated configuration
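The bullets above describe the route generator only at a high level. A minimal Go sketch of the idea, with every type, field, and label name assumed for illustration rather than taken from Grafana's actual models:

```go
package main

import "fmt"

// NotificationSettings is a hypothetical stand-in for the domain model
// mentioned above; field names are assumptions, not Grafana's actual API.
type NotificationSettings struct {
	Receiver string
	GroupBy  []string
}

// Route mirrors the shape of an Alertmanager route for illustration only.
type Route struct {
	Receiver string
	Matchers map[string]string
	GroupBy  []string
}

// generateRoutes sketches the generator: one auto-generated route per
// distinct notification setting, matched via a settings label so the
// route catches exactly the alerts produced by rules that use it.
func generateRoutes(settings []NotificationSettings) []Route {
	routes := make([]Route, 0, len(settings))
	for _, s := range settings {
		routes = append(routes, Route{
			Receiver: s.Receiver,
			// The matcher label name here is an assumption.
			Matchers: map[string]string{"__grafana_receiver__": s.Receiver},
			GroupBy:  s.GroupBy,
		})
	}
	return routes
}

func main() {
	rs := generateRoutes([]NotificationSettings{{Receiver: "oncall", GroupBy: []string{"alertname"}}})
	fmt.Printf("%+v\n", rs)
}
```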
* Simple, per-base-interval jitter
* Add log just for test purposes
* Add strategy approach, allowing a choice between group and rule (see the sketch after this list)
* Add flag to jitter rules
* Add second toggle for jittering within a group
* Wire up toggles to strategy
* Slightly improve comment ordering
* Add tests for offset generation
* Rename JitterStrategyFrom
* Improve debug log message
* Use Grafana SDK labels rather than Prometheus labels
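A minimal sketch of the per-base-interval jitter described above, assuming a hash-based offset; function and parameter names are illustrative, not the actual implementation:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"time"
)

// jitterOffsetInTicks derives a stable offset (in base-interval ticks)
// from a hash of either the rule UID or its group key, so evaluations
// spread across the interval instead of all firing on the same tick.
func jitterOffsetInTicks(key string, interval, baseInterval time.Duration) int64 {
	ticks := int64(interval / baseInterval) // ticks per evaluation interval
	if ticks <= 1 {
		return 0
	}
	h := fnv.New64a()
	h.Write([]byte(key))
	return int64(h.Sum64() % uint64(ticks))
}

func main() {
	// Per-rule strategy: hash the rule UID.
	fmt.Println(jitterOffsetInTicks("rule-uid-1", time.Minute, 10*time.Second))
	// Per-group strategy: hash an org/folder/group key instead.
	fmt.Println(jitterOffsetInTicks("org-1/folder/my-group", time.Minute, 10*time.Second))
}
```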
Backend:
* Update the Grafana Alerting engine to provide feedback to HysteresisCommand. The feedback information is stored in state.Manager as a fingerprint of each state. The fingerprint is persisted to the database. Only fingerprints that belong to Pending and Alerting states are considered "loaded" and provided back to the command (see the sketch after this list).
- add ResultFingerprint to state.State. It's different from the other fingerprints we store in the state because it is calculated from the result labels.
- add rule_fingerprint column to alert_instance
- update alerting evaluator to accept AlertingResultsReader via context, and update scheduler to provide it.
- add AlertingResultsFromRuleState that implements the new interface in eval package
- update getExprRequest to patch the hysteresis command.
* Only one "Recovery Threshold" query is allowed to be used in the alert rule and it must be the Condition.
Frontend:
* Add hysteresis option to Threshold in UI. It's called "Recovery Threshold"
* Add test for getUnloadEvaluatorTypeFromCondition
* Hide hysteresis in panel expressions
* Refactor isInvalid and add test for it
* Remove unnecessary React.memo
* Add tests for updateEvaluatorConditions
---------
Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>
* Alerting: Apply query optimization to eval endpoints
Previously, query optimization was applied to alert queries when scheduled but
not when run through `api/v1/eval` or `/api/v1/rule/test/grafana`. This could
lead to discrepancies between preview and scheduled alert results.
* extend RuleStore interface to get namespace by UID
* add new export API endpoints
* implement request handlers
* update authorization and wire handlers to paths
* add folder error matchers to errorToResponse
* add tests for export methods
* calculate cacheID instead of literals
* use mocked clocks
* advance clocks with the eval results (see the sketch after this list)
* use clearer timestamp aliases
* make expected state labels be more clear to read
Co-authored-by: Matthew Jacobson <matthew.jacobson@grafana.com>
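A minimal sketch of the mocked-clock pattern, assuming a mock clock such as the one in github.com/benbjohnson/clock; the test shape and interval are illustrative:

```go
package main

import (
	"fmt"
	"time"

	"github.com/benbjohnson/clock"
)

func main() {
	// Inject a mock clock so state timestamps are deterministic.
	mock := clock.NewMock()
	mock.Set(time.Date(2023, 1, 1, 0, 0, 0, 0, time.UTC))

	interval := 10 * time.Second
	for i := 0; i < 3; i++ {
		evaluatedAt := mock.Now() // timestamp attached to the synthetic eval result
		fmt.Println("eval at:", evaluatedAt)
		mock.Add(interval) // advance the clock in lockstep with the eval results
	}
}
```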
* Use suggested value for uid
* update the snapshot
* use __expr__
* replace all -100 with __expr__
* update snapshot
* more changes
* revert redundant change
* Use expr.DatasourceUID where possible (sketched below)
* generate files
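For context, `-100` was the magic datasource ID historically used to mark expression queries, and `__expr__` is the UID that replaces it. A tiny sketch of the substitution, with a local constant mirroring expr.DatasourceUID:

```go
package main

import "fmt"

// Mirrors Grafana's expr.DatasourceUID for this self-contained example.
const expressionDatasourceUID = "__expr__"

func isExpressionQuery(datasourceUID string) bool {
	// "-100" is accepted only for backward compatibility with old payloads.
	return datasourceUID == expressionDatasourceUID || datasourceUID == "-100"
}

func main() {
	fmt.Println(isExpressionQuery("__expr__")) // true
	fmt.Println(isExpressionQuery("P1234"))    // false
}
```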
* Add is_paused field to the alert_rule model, state to the alert_instance model, and state to eval
* Remove paused state from eval package
* Skip paused alert rules in scheduler (see the sketch after this commit's notes)
* Add migration to add is_paused field to alert_rule table
* Convert to postable alerts only if not normal, pending, or paused
* Handle paused eval results in state manager
* Add Paused state to eval package
* Add paused alerts logic in scheduler
* Skip alert on scheduler
* Remove paused status from eval package
* Apply suggestions from code review
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Remove state
* Rethink schedule and manager for paused alerts
* Change return to continue
* Remove unused var
* Rethink alert pausing
* Store annotations for paused alerts
* Only add one state transition
* Revert boolean method renaming refactor
* Revert take image refactor
* Make registry errors public
* Revert method extraction for getting a folder title
* Revert variable renaming refactor
* Undo unnecessary changes
* Revert changes in test
* Remove IsPaused check in PatchPartialAlertRule function
* Use SetNormal to set state
* Fix test by returning to old behaviour on alert rule deletion
* Add test in schedule_unit_test.go to test ticks with paused alerts
* Add comment to clarify usage of context.Background()
* Add comment to clarify resetStateByRuleUID method usage
* Move rule get to a more limited scope
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Run gofmt on pkg/services/ngalert/schedule/schedule.go
* Remove defer cancel for context
* Update pkg/services/ngalert/models/instance_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Update pkg/services/ngalert/models/testing.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Update pkg/services/ngalert/schedule/schedule_unit_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Update pkg/services/ngalert/schedule/schedule_unit_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Update pkg/services/ngalert/models/instance_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Skip scheduler rule state clean-up on paused alert rules
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Fix mock in test
* Add (hopefully) final suggestions
* Use error channel from recordAnnotationsSync to cancel context
* Run make gen-cue
* Place the paused-alert check in the channel update after the version check
* Reduce branching in update channel select
* Add an error check and move code inside the if in state manager ResetStateByRuleUID
* Add reason to logs
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Do not delete the alert rule routine; just skip evaluation if the rule is paused
* Reduce branching; create and close a channel to avoid deadlocks
* Separate state deletion and state reset (includes history saving)
* Add current pause state to rule routine in scheduler
* Split clearState and bring errCh closer to RecordStatesAsync call
* Change rule to ruleMeta in RecordStatesAsync
* copy state to be able to modify it
* Add timeout to context creation
* Shorten the timeout
* Use resetState if the rule is paused and deleteState if it is not
* Remove Empty state reason
* Save every rule change in historian
* Add tests for DeleteStateByRuleUID and ResetStateByRuleUID
* Remove useless line
* Remove outdated comment
Co-authored-by: George Robinson <george.robinson@grafana.com>
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>
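A condensed sketch of the scheduler behaviour these commits converge on, with all types and helpers hypothetical: paused rules are skipped on each tick (continue, not return), and their state is reset (preserving history) rather than deleted.

```go
package main

import "fmt"

// AlertRule is a minimal stand-in; the real model gains an IsPaused
// field via the migration mentioned above.
type AlertRule struct {
	UID      string
	IsPaused bool
}

// evaluateTick sketches the per-tick loop: paused rules keep their
// routine alive but skip evaluation, and their state is reset so the
// transition is recorded in history instead of being silently dropped.
func evaluateTick(rules []AlertRule, resetState, eval func(uid string)) {
	for _, r := range rules {
		if r.IsPaused {
			resetState(r.UID) // reset, not delete: history is saved
			continue
		}
		eval(r.UID)
	}
}

func main() {
	rules := []AlertRule{{UID: "a"}, {UID: "b", IsPaused: true}}
	evaluateTick(rules,
		func(uid string) { fmt.Println("reset state:", uid) },
		func(uid string) { fmt.Println("evaluate:", uid) },
	)
}
```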
* Nested Folders: Support getting of nested folder in folder service when feature flag is set
* Fix lint
* Fix some tests
* Fix ngalert test
* ngalert fix
* Fix API tests
* Fix some tests and lint
* Fix lint 2
* Fix library elements and panels
* Add access control to get folder
* Cleanup and minor test change
* extract method processTick
* make processTick return scheduled rules
* move state manager tests to state manager
* update test
* move all tests into one file
* remove unused fields
Prior to this change, all alert instance writes and deletes happened
individually, in their own database transaction. This change batches up
writes or deletes for a given rule's evaluation loop into a single
transaction before applying it (see the sketch after the benchmarks).
These new transactions are off by default, guarded by the feature toggle "alertingBigTransactions".
Before:
```
goos: darwin
goarch: arm64
pkg: github.com/grafana/grafana/pkg/services/ngalert/store
BenchmarkAlertInstanceOperations-8 398 2991381 ns/op 1133537 B/op 27703 allocs/op
--- BENCH: BenchmarkAlertInstanceOperations-8
util.go:127: alert definition: {orgID: 1, UID: FovKXiRVzm} with title: "an alert definition FTvFXmRVkz" interval: 60 created
util.go:127: alert definition: {orgID: 1, UID: foDFXmRVkm} with title: "an alert definition fovFXmRVkz" interval: 60 created
util.go:127: alert definition: {orgID: 1, UID: VQvFuigVkm} with title: "an alert definition VwDKXmR4kz" interval: 60 created
PASS
ok github.com/grafana/grafana/pkg/services/ngalert/store 1.619s
```
After:
```
goos: darwin
goarch: arm64
pkg: github.com/grafana/grafana/pkg/services/ngalert/store
BenchmarkAlertInstanceOperations-8 1440 816484 ns/op 352297 B/op 6529 allocs/op
--- BENCH: BenchmarkAlertInstanceOperations-8
util.go:127: alert definition: {orgID: 1, UID: 302r_igVzm} with title: "an alert definition q0h9lmR4zz" interval: 60 created
util.go:127: alert definition: {orgID: 1, UID: 71hrlmR4km} with title: "an alert definition nJ29_mR4zz" interval: 60 created
util.go:127: alert definition: {orgID: 1, UID: Cahr_mR4zm} with title: "an alert definition ja2rlmg4zz" interval: 60 created
PASS
ok github.com/grafana/grafana/pkg/services/ngalert/store 1.383s
```
So we cut time by about 73%, allocations by about 76%, and allocated bytes by
about 69% when storing and deleting 100 instances.
This change also updates some of our tests so that they run successfully against PostgreSQL: we were using random int64s, but Postgres integers, which our tables use, max out at 2^31-1.
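A minimal sketch of the batching idea, assuming a placeholder transaction wrapper rather than the store's real session helper:

```go
package main

import (
	"context"
	"fmt"
)

type instance struct{ labelsHash string }

// inTransaction is a placeholder: the real store would open a DB
// transaction here and commit or roll back around fn.
func inTransaction(ctx context.Context, fn func(ctx context.Context) error) error {
	return fn(ctx)
}

// saveAlertInstances applies all writes and deletes from one rule
// evaluation in a single transaction, instead of one transaction per
// instance as before.
func saveAlertInstances(ctx context.Context, upserts, deletes []instance) error {
	return inTransaction(ctx, func(ctx context.Context) error {
		for _, i := range upserts {
			fmt.Println("upsert instance", i.labelsHash)
		}
		for _, i := range deletes {
			fmt.Println("delete instance", i.labelsHash)
		}
		return nil // one commit for the whole evaluation loop
	})
}

func main() {
	_ = saveAlertInstances(context.Background(),
		[]instance{{"h1"}, {"h2"}}, []instance{{"h3"}})
}
```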
* update RouteDeleteAlertRules so rules are deleted as a group
* remove expecter from scheduler mock to support variadic function
* create function to check for provisioning status + tests
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
* remove support for bus from scheduler
* rename event to FolderTitleUpdated and fire only if title has changed
* add method to increase version of all rules that belong to a folder
* update the ngalert service to subscribe to the folder title change event, call the data store, and update the scheduler (see the sketch after this list)
* add tests
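A sketch of the event wiring described above; the event fields and handler signature are assumptions for illustration:

```go
package main

import "fmt"

// FolderTitleUpdated mirrors the event named in these commits.
type FolderTitleUpdated struct {
	FolderUID string
	OrgID     int64
	Title     string
}

// onFolderTitleUpdated bumps the version of every rule in the folder so
// the scheduler picks up the new title label on its next sync.
func onFolderTitleUpdated(evt FolderTitleUpdated, increaseVersions func(orgID int64, folderUID string) error) error {
	if err := increaseVersions(evt.OrgID, evt.FolderUID); err != nil {
		return fmt.Errorf("failed to update rules in folder %s: %w", evt.FolderUID, err)
	}
	return nil
}

func main() {
	_ = onFolderTitleUpdated(FolderTitleUpdated{FolderUID: "f1", OrgID: 1, Title: "New title"},
		func(orgID int64, folderUID string) error {
			fmt.Printf("bump version for rules in org %d folder %s\n", orgID, folderUID)
			return nil
		})
}
```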
* move FakeExternalAlertmanager to the sender package
* move tests from scheduler to router
* update alerts router to have all fields private
* update scheduler tests to use sender mock
* add tests for cache getOrCreate
* update ProcessEvalResults to accept extra labels (see the sketch after this list)
* extract to getRuleExtraLabels
* move populating of constant rule labels to extra labels
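A minimal sketch of the extra-labels refactor: constant, rule-derived labels are computed once per evaluation and passed into ProcessEvalResults instead of being rebuilt per result. The label names shown are illustrative stand-ins:

```go
package main

import "fmt"

// getRuleExtraLabels sketches the extracted helper: labels that are
// constant for a rule across one evaluation are built in one place.
func getRuleExtraLabels(ruleUID, folderTitle string) map[string]string {
	return map[string]string{
		"__alert_rule_uid__": ruleUID,     // assumed label name
		"grafana_folder":     folderTitle, // assumed label name
	}
}

func main() {
	fmt.Println(getRuleExtraLabels("abc123", "Team A"))
}
```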
Migrations:
* add a new column alert_group_idx to alert_rule table
* add a new column alert_group_idx to alert_rule_version table
* re-index existing rules during migration
API:
* set group index on update. Use the natural order of items in the array as the group index (see the sketch after the UI notes)
* sort rules in the group on GET
* update the version of all rules of all affected groups. This makes the optimistic lock work when multiple concurrent requests touch the same groups.
UI:
* update UI to keep the order of alerts in a group
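A small sketch of the ordering rule above: the submitted array order becomes alert_group_idx on POST, and GET sorts by it. Types are stand-ins:

```go
package main

import (
	"fmt"
	"sort"
)

// rule is a stand-in carrying the new alert_group_idx column.
type rule struct {
	Title    string
	GroupIdx int
}

// setGroupIndexes: the natural order of the submitted array becomes the index.
func setGroupIndexes(rules []rule) {
	for i := range rules {
		rules[i].GroupIdx = i + 1
	}
}

// sortByGroupIndex restores that order when reading the group back.
func sortByGroupIndex(rules []rule) {
	sort.Slice(rules, func(i, j int) bool { return rules[i].GroupIdx < rules[j].GroupIdx })
}

func main() {
	submitted := []rule{{Title: "first"}, {Title: "second"}}
	setGroupIndexes(submitted)
	fromDB := []rule{{Title: "second", GroupIdx: 2}, {Title: "first", GroupIdx: 1}}
	sortByGroupIndex(fromDB)
	fmt.Printf("%+v\n%+v\n", submitted, fromDB)
}
```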
* create AlertGroupKey structure (sketched after this list)
* update PrometheusSrv.
- extract creation of RuleGroup to a separate method. Use group key for grouping
* update RuleSrv
- update calculateChanges to use groupKey
- update authorization to use groupKey
* add custom diff reporter DiffReporter that reports only paths that have a difference
* create Diff method for AlertRule that returns DiffReport, which is an alias for []Diff
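A sketch of the group key structure, assuming the usual org/namespace/group triple; field names are assumptions:

```go
package main

import "fmt"

// AlertRuleGroupKey uniquely identifies a rule group and is comparable,
// so it can be used directly as a map key for grouping and authorization.
type AlertRuleGroupKey struct {
	OrgID        int64
	NamespaceUID string
	RuleGroup    string
}

func main() {
	groups := map[AlertRuleGroupKey][]string{}
	key := AlertRuleGroupKey{OrgID: 1, NamespaceUID: "folder-uid", RuleGroup: "cpu"}
	groups[key] = append(groups[key], "rule-1")
	fmt.Println(groups)
}
```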
Tests:
* create copy method for AlertRule in testing
* create GenerateAlertQuery method in testing
* Update API controller
- add validation of rules API model
- add function to calculate changes between the submitted alerts and existing alerts (see the sketch below)
- update RoutePostNameRulesConfig to validate input models, calculate changes and apply in a transaction
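A minimal sketch of the change calculation, assuming rules are keyed by UID; types and names are illustrative:

```go
package main

import "fmt"

type ruleDelta struct {
	New    []string // titles of rules to insert
	Update []string
	Delete []string
}

// calculateChanges compares submitted rules against existing ones by UID
// and classifies each as new, updated, or deleted, so the whole group can
// be applied in one transaction.
func calculateChanges(existing, submitted map[string]string) ruleDelta {
	var d ruleDelta
	for uid, title := range submitted {
		if _, ok := existing[uid]; ok {
			d.Update = append(d.Update, title)
		} else {
			d.New = append(d.New, title)
		}
	}
	for uid, title := range existing {
		if _, ok := submitted[uid]; !ok {
			d.Delete = append(d.Delete, title)
		}
	}
	return d
}

func main() {
	existing := map[string]string{"u1": "cpu high", "u2": "mem high"}
	submitted := map[string]string{"u1": "cpu very high", "u3": "disk full"}
	fmt.Printf("%+v\n", calculateChanges(existing, submitted))
}
```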
* Update DBStore
- delete unused storage method. All the logic is moved upstream.
- update upsert so fields of the new rule are not overwritten by values from the existing alert
- if the rule has a UID, do not try to pull it from the DB (this is done upstream)
* Add rule generator