grafana

mirror of https://github.com/grafana/grafana.git synced 2025-02-25 18:55:37 -06:00

Author	SHA1	Message	Date
Alexander Weaver	7a171fd14a	Regenerate openapidocs at 1.21.8 to match ci (#84037 ) * Regenerate openapidocs at 1.21.8 to match ci * Adjust trigger to work on the actual outputted files * Also put go.mod and go.sum in the triggers * manually fix * Make an arbitrary change rather than touching the trigger to force a run * Drop all triggers - run all the time * Print diff - taken from @papagian's PR * Manual fixes to swagger doc --------- Co-authored-by: Ryan McKinley <ryantxu@gmail.com>	2024-03-06 16:08:45 -06:00
Alexander Weaver	d5fda06147	Alerting: Decouple rule routine from scheduler (#84018 ) * create rule factory for more complicated dep injection into rules * Rules get direct access to metrics, logs, traces utilities, use factory in tests * Use clock internal to rule * Use sender, statemanager, evalfactory directly * evalApplied and stopApplied * use schedulableAlertRules behind interface * loaded metrics reader * 3 relevant config options * Drop unused scheduler parameter * Rename ruleRoutine to run * Update READMED * Handle long parameter lists * remove dead branch	2024-03-06 13:44:53 -06:00
Alexander Weaver	1bb38e8f95	Alerting: Move ruleRoutine to be a method on ruleInfo (#83866 ) * Move ruleRoutine to ruleInfo file * Move tests as well * swap ruleInfo and scheduler parameters on ruleRoutine * Fix linter complaint, receiver name	2024-03-04 17:15:55 -06:00
Alexander Weaver	f2a9d0a89d	Alerting: Refactor ruleRoutine to take an entire ruleInfo instance (#83858 ) * Make stop a real method * ruleRoutine takes a ruleInfo reference directly rather than pieces of it * Fix whitespace	2024-03-04 15:15:01 -06:00
Yuri Tseretyan	1eebd2a4de	Alerting: Support for simplified notification settings in rule API (#81011 ) * Add notification settings to storage\domain and API models. Settings are a slice to workaround XORM mapping * Support validation of notification settings when rules are updated * Implement route generator for Alertmanager configuration. That fetches all notification settings. * Update multi-tenant Alertmanager to run the generator before applying the configuration. * Add notification settings labels to state calculation * update the Multi-tenant Alertmanager to provide validation for notification settings * update GET API so only admins can see auto-gen	2024-02-15 09:45:10 -05:00
Alexander Weaver	d4ae10ecc6	Alerting: Small refactor, move unrelated functions out of fetcher (#82459 ) Move unrelated functions out of fetcher	2024-02-14 20:01:32 +02:00
Alexander Weaver	843c477899	Alerting: Add exported API to scheduler to access currently loaded rules (#82031 ) * Add exported API to fetch rule definitions from scheduler * Add comment	2024-02-07 09:31:22 -06:00
Yuri Tseretyan	131c72d655	Alerting: Fix scheduler to group folders by the unique key (orgID and UID) (#81303 )	2024-01-30 17:14:11 -05:00
Alexander Weaver	00a260effa	Alerting: Add setting to distribute rule group evaluations over time (#80766 ) * Simple, per-base-interval jitter * Add log just for test purposes * Add strategy approach, allow choosing between group or rule * Add flag to jitter rules * Add second toggle for jittering within a group * Wire up toggles to strategy * Slightly improve comment ordering * Add tests for offset generation * Rename JitterStrategyFrom * Improve debug log message * Use grafana SDK labels rather than prometheus labels	2024-01-18 12:48:11 -06:00
Alexander Weaver	3c796ecc8f	Alerting: Add metric counting rule groups per org (#80669 ) * Refactor, fix bad map hint * Count groups per org	2024-01-16 16:35:56 -06:00
Alexander Weaver	542741f748	Alerting: Log scheduler maxAttempts, guard against invalid retry counts, log retry errors (#80234 ) * Log maxAttempts, add guard, log retry errors * fix whitespace * Initialize evaluator in TestProcessTicks	2024-01-09 13:19:37 -06:00
Yuri Tseretyan	f6a46744a6	Alerting: Support hysteresis command expression (#75189 ) Backend: * Update the Grafana Alerting engine to provide feedback to HysteresisCommand. The feedback information is stored in state.Manager as a fingerprint of each state. The fingerprint is persisted to the database. Only fingerprints that belong to Pending and Alerting states are considered as "loaded" and provided back to the command. - add ResultFingerprint to state.State. It's different from other fingerprints we store in the state because it is calculated from the result labels. - add rule_fingerprint column to alert_instance - update alerting evaluator to accept AlertingResultsReader via context, and update scheduler to provide it. - add AlertingResultsFromRuleState that implements the new interface in eval package - update getExprRequest to patch the hysteresis command. * Only one "Recovery Threshold" query is allowed to be used in the alert rule and it must be the Condition. Frontend: * Add hysteresis option to Threshold in UI. It's called "Recovery Threshold" * Add test for getUnloadEvaluatorTypeFromCondition * Hide hysteresis in panel expressions * Refactor isInvalid and add test for it * Remove unnecesary React.memo * Add tests for updateEvaluatorConditions --------- Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>	2024-01-04 11:47:13 -05:00
gotjosh	c631261681	Alerting: Attempt to retry retryable errors (#79161 ) * Alerting: Attempt to retry retryable errors Retrying has been broken for a good while now (at least since version 9.4) - this change attempts to re-introduce them in their simplest and safest form possible. I first introduced #79095 to make sure we don't disrupt or put additional load on our customer's data sources with this change in a patch release. Paired with this change, retries can now work as expected. There's two small differences between how retries work now and how they used to work in legacy alerting. Retries only occur for valid alert definitions - if we suspect that that error comes from a malformed alert definition we skip retrying. We have added a constant backoff of 1s in between retries. --------- Signed-off-by: gotjosh <josue.abreu@gmail.com>	2023-12-06 20:45:08 +00:00
gotjosh	07915703fe	Revert "Alerting: Attempt to retry retryable errors" (#79158 ) Revert "Alerting: Attempt to retry retryable errors (#79037)" This reverts commit `3e51cf0949`.	2023-12-06 19:12:01 +00:00
gotjosh	3e51cf0949	Alerting: Attempt to retry retryable errors (#79037 ) * Alerting: Attempt to retry retryable errors Currently in a draft state, but this was the minimal diff I could put together to exemplify how could achieve this. Signed-off-by: gotjosh <josue.abreu@gmail.com> --------- Signed-off-by: gotjosh <josue.abreu@gmail.com>	2023-12-06 16:35:22 +00:00
Santiago	61cb26711e	Alerting: Fetch alerts from a remote Alertmanager (#75844 ) * Alerting: post alerts to the remote Alertmanager and fetch them * fix broken tests * Alerting: Add Mimir Backend image to devenv (blocks) * add alerting as code owner for mimir_backend block * Alerting: Use Mimir image to run integration tests for the remote Alertmanager * skip integration test when running all tests * skipping integration test when no Alertmanager URL is provided * fix bad host for mimir_backend * remove basic auth testing until we have an nginx image in our CI * add integration tests for alerts * fix tests * change SendCtx -> Send, add context.Context to Send, fix CI * add reover() for functions from the Prometheus Alertmanager HTTP client that could panic * add TODO to implement PutAlerts in a way that mimicks what Prometheus does * fix log format	2023-10-19 11:27:37 +02:00
Marcus Efraimsson	e4c1a7a141	Tracing: Standardize on otel tracing (#75528 )	2023-10-03 14:54:20 +02:00
Steve Simpson	894f420014	Alerting: Pass loggers into SchedulerCfg and ManagerCfg. (#75158 )	2023-09-20 15:07:02 +02:00
Yuri Tseretyan	938e26b59f	Alerting: Add new metrics and tracings to state manager and scheduler (#71398 ) * add metrics and tracing to state manager * propagate tracer to state manager * add scheduler metrics * fix backtesting * add test for state metrics * remove StateUpdateCount * update docs * metrics can be null * add tracer to new tests	2023-08-16 09:04:18 +02:00
Yuri Tseretyan	c7598cc6fb	Alerting: Add ability to control scheduler tick interval via config (#71980 ) * add ability to control scheduler interval via config * add feature flag `configurableSchedulerTick`	2023-07-26 12:44:12 -04:00
Kyle Brandt	f6a28cadbc	Alerting: (Chore/Instrumentation) Add traceID to logs with contextual logger (#71289 ) Alerting: (Chore) Add traceID to logs with contextual logger	2023-07-11 10:59:52 +02:00
SatVeer Singh	1bfa3a0f1e	Chore: Replace go-multierror with errors package (#66432 ) * code refactor and type assertions added to tests * no-lint rule added for specific line	2023-06-19 12:29:45 +03:00
Matthew Jacobson	ba3994d338	Alerting: Repurpose rule testing endpoint to return potential alerts (#69755 ) * Alerting: Repurpose rule testing endpoint to return potential alerts This feature replaces the existing no-longer in-use grafana ruler testing API endpoint /api/v1/rule/test/grafana. The new endpoint returns a list of potential alerts created by the given alert rule, including built-in + interpolated labels and annotations. The key priority of this endpoint is that it is intended to be as true as possible to what would be generated by the ruler except that the resulting alerts are not filtered to only Resolved / Firing and ready to be sent. This means that the endpoint will, among other things: - Attach static annotations and labels from the rule configuration to the alert instances. - Attach dynamic annotations from the datasource to the alert instances. - Attach built-in labels and annotations created by the Grafana Ruler (such as alertname and grafana_folder) to the alert instances. - Interpolate templated annotations / labels and accept allowed template functions.	2023-06-08 18:59:54 -04:00
Yuri Tseretyan	9eb10bee1f	Alerting: Scheduler use rule fingerprint instead of version (#66531 ) * implement calculation of fingerprint for ruleWithFolder * update scheduler to use fingerprint instead of rule's version	2023-04-28 10:42:16 -04:00
gotjosh	1c3ce0735f	Alerting: Tiny refactor on the eval and schedule packages (#66130 ) * Alerting: Tiny refactor on the eval and schedule packages two very small things: - We had a constructor on something called a `Context` which is not a `context.Context` so let's just name that constructor `NewContext` - The user that we use to run query evaluations is the same (with some variation) abstract it to a function so that it can be re-used when necessary. * Update pkg/services/ngalert/schedule/schedule.go Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com> * Update pkg/services/ngalert/schedule/schedule.go Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com> --------- Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>	2023-04-06 16:02:28 +01:00
Alexander Weaver	9bcf8819d3	Alerting: Handful of small adjustments to log levels and parameters (#64572 ) Calculate duration earlier in scheduler	2023-03-17 12:15:49 +00:00
Yuri Tseretyan	85a954cd81	Alerting: Update scheduler to get updates only from database (#64635 ) * stop using the scheduler's Update and Delete methods all communication must be via the database * update scheduler's registry to calculate diff before re-setting the cache * update fetcher to return the diff generated by registry * update processTick to update rule eval routine if the rule was updated and it is not going to be evaluated at this tick. * remove references to the scheduler from api package * remove unused methods in the scheduler	2023-03-14 18:02:51 -04:00
Alex Moreno	f60dc4441f	Alerting: Add status label to GroupRules metric (#63454 ) * Add status label to GroupRules metric * Add state (active and paused) label to GrouRules * Add active/paused metrics tests	2023-02-23 12:38:27 +01:00
Steve Simpson	4d1a2c3370	Alerting: Move `rule_groups_rules` metric from State to Scheduler. (#63144 ) The `rule_groups_rules` metric is currently defined and computed by `State`. It makes more sense for this metric to be computed off of the configured rule set, not based on the rule evaluation state. There could be an edge condition where a rule does not have a state yet, and so is uncounted. Additionally, we would like this metric (and others), to have a `rule_group` label, and this is much easier to achieve if the metric is produced from the `Scheduler` package.	2023-02-09 17:05:19 +01:00
Yuri Tseretyan	f066e8cdcd	Alerting: Update to alerting 20230203015918-0e4e2675d7aa (after refactoring) (#62823 ) * add alerting prefix to some packages from alerting that have similar names in prometheus alertmanager	2023-02-03 11:36:49 -05:00
Alex Moreno	7a465f42a6	Alerting: Allow pausing alerts from provisioning (#62263 ) * Allow pausing alerts from provisioning * Update swagger * Add IsPaused to provision export endpoints * Add pause field in sample.yml * Add exception for reset state in first loop iteration of scheduler if rule is paused * Update provision definition and swagger docs * Fix provisioning export tests * Suggestion: Simplify if condition * Add more context to a comment	2023-01-30 16:29:05 +01:00
Serge Zaitsev	d6d4097567	Chore: Fix goimports grouping in alerting (#62424 ) * fix goimports * fix goimports order	2023-01-30 09:55:35 +01:00
Yuri Tseretyan	05bf241952	Alerting: Update state manager to return StateTransitions when Delete or Reset (#62264 ) * update Delete and Reset methods to return state transitions this will be used by notifier code to decide whether alert needs to be sent or not. * update scheduler to provide reason to delete states and use transitions * update FromAlertsStateToStoppedAlert to accept StateTransition and filter by old state * fixup * fix tests	2023-01-27 09:46:21 +01:00
Alex Moreno	531b439cf1	Alerting: Add alert pausing feature (#60734 ) * Add field in alert_rule model, add state to alert_instance model, and state to eval * Remove paused state from eval package * Skip paused alert rules in scheduler * Add migration to add is_paused field to alert_rule table * Convert to postable alerts only if not normal, pernding, or paused * Handle paused eval results in state manager * Add Paused state to eval package * Add paused alerts logic in scheduler * Skip alert on scheduler * Remove paused status from eval package * Apply suggestions from code review Co-authored-by: George Robinson <george.robinson@grafana.com> * Remove state * Rethink schedule and manager for paused alerts * Change return to continue * Remove unused var * Rethink alert pausing * Paused alerts storing annotations * Only add one state transition * Revert boolean method renaming refactor * Revert take image refactor * Make registry errors public * Revert method extraction for getting a folder title * Revert variable renaming refactor * Undo unnecessary changes * Revert changes in test * Remove IsPause check in PatchPartiLAlertRule function * Use SetNormal to set state * Fix text by returning to old behaviour on alert rule deletion * Add test in schedule_unit_test.go to test ticks with paused alerts * Add coment to clarify usage of context.Background() * Add comment to clarify resetStateByRuleUID method usage * Move rule get to a more limited scope * Update pkg/services/ngalert/schedule/schedule.go Co-authored-by: George Robinson <george.robinson@grafana.com> * rum gofmt on pkg/services/ngalert/schedule/schedule.go * Remove defer cancel for context * Update pkg/services/ngalert/models/instance_test.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * Update pkg/services/ngalert/models/testing.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * Update pkg/services/ngalert/schedule/schedule_unit_test.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * Update pkg/services/ngalert/schedule/schedule_unit_test.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * Update pkg/services/ngalert/models/instance_test.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * skip scheduler rule state clean up on paused alert rule * Update pkg/services/ngalert/schedule/schedule.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * Fix mock in test * Add (hopefully) final suggestions * Use error channel from recordAnnotationsSync to cancel context * Run make gen-cue * Place pause alert check in channel update after version check * Reduce branching un update channel select * Add if for error and move code inside if in state manager ResetStateByRuleUID * Add reason to logs * Update pkg/services/ngalert/schedule/schedule.go Co-authored-by: George Robinson <george.robinson@grafana.com> * Do not delete alert rule routine, just exit on eval if is paused * Reduce branching and create-close a channel to avoid deadlocks * Separate state deletion and state reset (includes history saving) * Add current pause state in rule route in scheduler * Split clearState and bring errCh closer to RecordStatesAsync call * Change rule to ruleMeta in RecordStatesAsync * copy state to be able to modify it * Add timeout to context creation * Shorten the timeout * Use resetState is rule is paused and deleteState if rule is not paused * Remove Empty state reason * Save every rule change in historian * Add tests for DeleteStateByRuleUID and ResetStateByRuleUID * Remove useless line * Remove outdated comment Co-authored-by: George Robinson <george.robinson@grafana.com> Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>	2023-01-26 18:29:10 +01:00
George Robinson	2a291afbae	Alerting: Use consts from alerting package (#61241 )	2023-01-10 19:59:13 +00:00
Yuri Tseretyan	da18c89e91	Alerting: Scheduler to call DeleteAlertRule once when it stops deleted rules (#61189 ) scheduler to call DeleteAlertRule once when it stops deleted rules	2023-01-09 14:39:32 -05:00
Yuri Tseretyan	48f1db63ff	Alerting: Add support for tracing to alerting scheduler (#61057 )	2023-01-06 21:21:43 -05:00
Yuri Tseretyan	abb49d96b5	Alerting: update state manager to return StateTransition instead of State (#58867 ) * improve test for stale states * update state manager return StateTransition * update scheduler to accept state transitions	2022-12-06 13:07:39 -05:00
Alexander Weaver	9977c7ea43	Alerting: Simplify scheduler configuration and remove dependency on Grafana-wide settings (#59735 ) * Make scheduler not depend directly on grafana-wide settings * Re-add missing interval	2022-12-02 16:02:07 -06:00
Yuri Tseretyan	bad4f28d0d	Alerting: update test TestAlertingTicker to not rely on clock (#58544 ) * extract method processTick * make processTick return scheduled rules * move state manager tests to state manager * update test * move all tests into one file * remove unused fields	2022-11-09 15:08:57 -05:00
Yuri Tseretyan	978f1119d7	Alerting: Run state manager as regular sub-service (#58246 )	2022-11-04 17:06:47 -04:00
Ryan McKinley	e6a9fa1cf9	ServiceAccounts: enable service accounts after IsRealUser change (#58263 ) * suppor service accounts * add: IsServiceAccount to scheduleUser in scheduler Co-authored-by: eleijonmarck <eric.leijonmarck@gmail.com>	2022-11-04 15:53:35 -04:00
Eric Leijonmarck	72d0c6b428	Auth: add IsServiceAccount to IsRealUser (#58015 ) * add: IsServiceAccount to SignedInUser and IsRealUser * fix: linting error * refactor: add function IsServiceAccountUser() By adding the function IsServiceAccountUser() we use it to identify for ServiceAccounts in the HasUniqueID() since caching is built up on having a uniqueID, see comment: https://github.com/grafana/grafana/pull/58015#discussion_r1011361880	2022-11-04 12:39:54 +00:00
Yuriy Tseretyan	e3a4bde622	Alerting: Condition evaluator with cached pipeline (#57479 ) * create rule evaluator * load header from the context * init one factory * update scheduler	2022-11-02 10:13:39 -04:00
Yuriy Tseretyan	0a4121cef8	Alerting: Contextual log provider for rule key (#57476 ) * create contextual log context provider * use contextual provider in scheduler * init logger in the package * use context for log context * use context in state manager	2022-10-26 19:16:02 -04:00
Yuriy Tseretyan	f3c219a980	Alerting: update format of logs in scheduler (#57302 ) * Change the severity level of the log messages	2022-10-20 13:43:48 -04:00
Alexander Weaver	3ddb28bad9	Find-and-replace 'err' logs to 'error' to match log search conventions (#57309 )	2022-10-19 17:36:54 -04:00
Alexander Weaver	4eb8e4ff66	Alerting: Add traceability headers for alert queries (#57127 ) * Define EvaluationContext * Refactor ConditionEval to use new context struct * Refactor QueriesAndExpressionsEval to use EvaluationContext * Remove dead field from AlertExecCtx * Refactor Validate to use EvaluationContext * Get rid of privately used AlertExecCtx * Move EvaluationContext to new file and add helper * Add builder pattern and bind rule info to context * Extract header logic and add rule UID header * Fix missing call	2022-10-19 14:19:43 -05:00
Yuriy Tseretyan	ad2a1dd680	Alerting: Start ticker only when scheduler starts (#56339 )	2022-10-05 09:35:02 -04:00
Alexander Weaver	a00879ae21	Alerting: Refactor store to not export its own interface for InstanceStore, delete dead dependency injection (#55772 ) * Add consumer-side store interface to state manager * Remove dead dependency * Delete dead dependency in API struct * Delete store-layer InstanceStore interface * Move fake for state's InstanceStore interface to state package	2022-09-26 13:55:05 -05:00

1 2 3

135 Commits