grafana

mirror of https://github.com/grafana/grafana.git synced 2024-11-25 18:30:41 -06:00

Author	SHA1	Message	Date
Yuri Tseretyan	05d6813a09	Alerting: Fix scheduler to sort rules before evaluation (#88006 ) sort rules scheduled for evaluation to make sure that the order is stable between evaluations. This is especially important in HA mode.	2024-05-17 11:38:19 -04:00
Yuri Tseretyan	f410c7fca1	Alerting: use logger with same context within rule scheduling loop (#87934 )	2024-05-15 15:38:00 -04:00
Alexander Weaver	a6a9ab4008	Alerting: Do not store series values from past evaluations in state manager for no reason (#87525 ) Do not store previous execution results on states	2024-05-09 15:51:55 -05:00
Alexander Weaver	36ef611cf4	Alerting: Add database migration for recording rule fields (#87012 ) * Create recording rule fields in model * Add migration * Write to database, support in version table * extend fingerprint * Force fields to be empty on validate * Another storage spot, tests for fingerprint * Explicitly set defaults in provisioning API * Tests for main API validation * Add diff tests even though fields are unpopulated for now * Use struct tag approach instead of FromDB/ToDB hooks as it better handles nulls when deserializing * test for deser * Backout RecordTo for now since it's not decided in the doc * back out of migration too * Drop datasourceref for now * address linter complaints * Try a single outer struct with all fields embedded	2024-05-09 12:12:44 -05:00
Yuri Tseretyan	052082a927	Alerting: Refactor Alert Rule Generators (#86813 )	2024-04-29 21:52:15 -04:00
Steve Simpson	ad7f804255	Alerting: Fix evaluation metrics to not count retries (#85873 ) * Change evaluation metrics to only count once per eval, and add new metrics. * Cosmetic: Move eval total Inc() to original place.	2024-04-12 16:20:46 +02:00
Dave Henderson	5687243d0b	Feature Flags: use FeatureToggles interface where possible (#85131 ) * Feature Flags: use FeatureToggles interface where possible Signed-off-by: Dave Henderson <dave.henderson@grafana.com> * Replace TestFeatureToggles with existing WithFeatures Signed-off-by: Dave Henderson <dave.henderson@grafana.com> --------- Signed-off-by: Dave Henderson <dave.henderson@grafana.com>	2024-04-04 12:22:31 -04:00
Benoit Tigeot	6f38ac6615	Alerting: Reduce set of fields that could trigger alert state change (#83496 ) We want to avoid too much change of alert state based on change on alert's fields. For that we ignore some fields from the diff.	2024-03-26 12:35:30 -04:00
ismail simsek	6137c4e0a6	Chore: Bump golangci-lint v1.57.1 (#84998 ) * bump golangci-lint v1.57.1 * update setting * remove goconst * fix linting issues * prettier * fix G601 * go mod tidy go work sync	2024-03-25 15:28:24 +01:00
Alexander Weaver	6c5e94095d	Alerting: Scheduler and registry handle rules by an interface (#84044 ) * export Evaluation * Export Evaluation * Export RuleVersionAndPauseStatus * export Eval, create interface * Export update and add to interface * Export Stop and Run and add to interface * Registry and scheduler use rule by interface and not concrete type * Update factory to use interface, update tests to work over public API rather than writing to channels directly * Rename map in registry * Rename getOrCreateInfo to not reference a specific implementation * Genericize alertRuleInfoRegistry into ruleRegistry * Rename alertRuleInfo to alertRule * Comments on interface * Update pkg/services/ngalert/schedule/schedule.go Co-authored-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com> --------- Co-authored-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com>	2024-03-11 22:57:38 +02:00
Alexander Weaver	201f5d3ac9	Alerting: Extract large closures in ruleRoutine (#84035 ) * extract notify * extract resetState * move evaluate metrics inside evaluate * split out evaluate	2024-03-06 16:39:23 -06:00
Alexander Weaver	7a171fd14a	Regenerate openapidocs at 1.21.8 to match ci (#84037 ) * Regenerate openapidocs at 1.21.8 to match ci * Adjust trigger to work on the actual outputted files * Also put go.mod and go.sum in the triggers * manually fix * Make an arbitrary change rather than touching the trigger to force a run * Drop all triggers - run all the time * Print diff - taken from @papagian's PR * Manual fixes to swagger doc --------- Co-authored-by: Ryan McKinley <ryantxu@gmail.com>	2024-03-06 16:08:45 -06:00
Alexander Weaver	d5fda06147	Alerting: Decouple rule routine from scheduler (#84018 ) * create rule factory for more complicated dep injection into rules * Rules get direct access to metrics, logs, traces utilities, use factory in tests * Use clock internal to rule * Use sender, statemanager, evalfactory directly * evalApplied and stopApplied * use schedulableAlertRules behind interface * loaded metrics reader * 3 relevant config options * Drop unused scheduler parameter * Rename ruleRoutine to run * Update README * Handle long parameter lists * remove dead branch	2024-03-06 13:44:53 -06:00
Alexander Weaver	1bb38e8f95	Alerting: Move ruleRoutine to be a method on ruleInfo (#83866 ) * Move ruleRoutine to ruleInfo file * Move tests as well * swap ruleInfo and scheduler parameters on ruleRoutine * Fix linter complaint, receiver name	2024-03-04 17:15:55 -06:00
Alexander Weaver	f2a9d0a89d	Alerting: Refactor ruleRoutine to take an entire ruleInfo instance (#83858 ) * Make stop a real method * ruleRoutine takes a ruleInfo reference directly rather than pieces of it * Fix whitespace	2024-03-04 15:15:01 -06:00
Alexander Weaver	fa51724bc6	Alerting: Move alertRuleInfo and tests to new files (#83854 ) Move ruleinfo and tests to new files	2024-03-04 11:24:49 -06:00
William Wernert	fabaff9a24	Alerting: Create metric for rules using simple notifications (#82904 ) --------- Co-authored-by: Matthew Jacobson <matthew.jacobson@grafana.com>	2024-02-16 19:01:49 +02:00
Yuri Tseretyan	1eebd2a4de	Alerting: Support for simplified notification settings in rule API (#81011 ) * Add notification settings to storage\domain and API models. Settings are a slice to workaround XORM mapping * Support validation of notification settings when rules are updated * Implement route generator for Alertmanager configuration. That fetches all notification settings. * Update multi-tenant Alertmanager to run the generator before applying the configuration. * Add notification settings labels to state calculation * update the Multi-tenant Alertmanager to provide validation for notification settings * update GET API so only admins can see auto-gen	2024-02-15 09:45:10 -05:00
Alexander Weaver	d4ae10ecc6	Alerting: Small refactor, move unrelated functions out of fetcher (#82459 ) Move unrelated functions out of fetcher	2024-02-14 20:01:32 +02:00
Diego Augusto Molina	ff08c0a790	Chore: improve test readability in ngalert/schedule (#82453 ) Chore: improve test readability	2024-02-14 14:53:32 -03:00
Diego Augusto Molina	9c29e1a783	Alerting: Fix data races and improve testing (#81994 ) * Alerting: fix race condition in (ngalert/sender.ExternalAlertmanager).Run Chore: Fix data races when accessing members of ngalert/state.FakeInstanceStore Chore: Fix data races in tests in ngalert/schedule and enable some parallel tests * Chore: fix linters * Chore: add TODO comment to remove loopvar once we move to Go 1.22	2024-02-14 12:45:39 -03:00
Alexander Weaver	5bbe9c6e61	Alerting: Enable group-level rule evaluation jittering by default, remove feature toggle (#82212 ) * remove jitter feature flag * Add an out so users can manually disable jitter * Pass in cfg * Add TODO to remove knob in future	2024-02-09 15:53:58 -06:00
Alexander Weaver	843c477899	Alerting: Add exported API to scheduler to access currently loaded rules (#82031 ) * Add exported API to fetch rule definitions from scheduler * Add comment	2024-02-07 09:31:22 -06:00
Ashley Harrison	39057552dc	QueryField: Handle autocomplete better (#81484 ) * extract out function + add unit tests * add feature toggle and default it to on	2024-01-31 10:01:20 +00:00
Yuri Tseretyan	131c72d655	Alerting: Fix scheduler to group folders by the unique key (orgID and UID) (#81303 )	2024-01-30 17:14:11 -05:00
Alexander Weaver	18b9c8fd5f	Alerting: Nilcheck JitterStrategyFrom so it can be used in contexts without feature toggles (#80841 ) Nilcheck so tests can have a nil feature toggles	2024-01-18 15:43:41 -06:00
Alexander Weaver	00a260effa	Alerting: Add setting to distribute rule group evaluations over time (#80766 ) * Simple, per-base-interval jitter * Add log just for test purposes * Add strategy approach, allow choosing between group or rule * Add flag to jitter rules * Add second toggle for jittering within a group * Wire up toggles to strategy * Slightly improve comment ordering * Add tests for offset generation * Rename JitterStrategyFrom * Improve debug log message * Use grafana SDK labels rather than prometheus labels	2024-01-18 12:48:11 -06:00
Jean-Philippe Quéméner	82638d059f	feat(alerting): add state persister interface (#80384 )	2024-01-17 13:33:13 +01:00
Alexander Weaver	3c796ecc8f	Alerting: Add metric counting rule groups per org (#80669 ) * Refactor, fix bad map hint * Count groups per org	2024-01-16 16:35:56 -06:00
Alexander Weaver	542741f748	Alerting: Log scheduler maxAttempts, guard against invalid retry counts, log retry errors (#80234 ) * Log maxAttempts, add guard, log retry errors * fix whitespace * Initialize evaluator in TestProcessTicks	2024-01-09 13:19:37 -06:00
Yuri Tseretyan	f6a46744a6	Alerting: Support hysteresis command expression (#75189 ) Backend: * Update the Grafana Alerting engine to provide feedback to HysteresisCommand. The feedback information is stored in state.Manager as a fingerprint of each state. The fingerprint is persisted to the database. Only fingerprints that belong to Pending and Alerting states are considered as "loaded" and provided back to the command. - add ResultFingerprint to state.State. It's different from other fingerprints we store in the state because it is calculated from the result labels. - add rule_fingerprint column to alert_instance - update alerting evaluator to accept AlertingResultsReader via context, and update scheduler to provide it. - add AlertingResultsFromRuleState that implements the new interface in eval package - update getExprRequest to patch the hysteresis command. * Only one "Recovery Threshold" query is allowed to be used in the alert rule and it must be the Condition. Frontend: * Add hysteresis option to Threshold in UI. It's called "Recovery Threshold" * Add test for getUnloadEvaluatorTypeFromCondition * Hide hysteresis in panel expressions * Refactor isInvalid and add test for it * Remove unnecessary React.memo * Add tests for updateEvaluatorConditions --------- Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>	2024-01-04 11:47:13 -05:00
gotjosh	c631261681	Alerting: Attempt to retry retryable errors (#79161 ) * Alerting: Attempt to retry retryable errors Retrying has been broken for a good while now (at least since version 9.4) - this change attempts to re-introduce them in their simplest and safest form possible. I first introduced #79095 to make sure we don't disrupt or put additional load on our customer's data sources with this change in a patch release. Paired with this change, retries can now work as expected. There's two small differences between how retries work now and how they used to work in legacy alerting. Retries only occur for valid alert definitions - if we suspect that that error comes from a malformed alert definition we skip retrying. We have added a constant backoff of 1s in between retries. --------- Signed-off-by: gotjosh <josue.abreu@gmail.com>	2023-12-06 20:45:08 +00:00
gotjosh	07915703fe	Revert "Alerting: Attempt to retry retryable errors" (#79158 ) Revert "Alerting: Attempt to retry retryable errors (#79037)" This reverts commit `3e51cf0949`.	2023-12-06 19:12:01 +00:00
gotjosh	3e51cf0949	Alerting: Attempt to retry retryable errors (#79037 ) * Alerting: Attempt to retry retryable errors Currently in a draft state, but this was the minimal diff I could put together to exemplify how could achieve this. Signed-off-by: gotjosh <josue.abreu@gmail.com> --------- Signed-off-by: gotjosh <josue.abreu@gmail.com>	2023-12-06 16:35:22 +00:00
Santiago	61cb26711e	Alerting: Fetch alerts from a remote Alertmanager (#75844 ) * Alerting: post alerts to the remote Alertmanager and fetch them * fix broken tests * Alerting: Add Mimir Backend image to devenv (blocks) * add alerting as code owner for mimir_backend block * Alerting: Use Mimir image to run integration tests for the remote Alertmanager * skip integration test when running all tests * skipping integration test when no Alertmanager URL is provided * fix bad host for mimir_backend * remove basic auth testing until we have an nginx image in our CI * add integration tests for alerts * fix tests * change SendCtx -> Send, add context.Context to Send, fix CI * add recover() for functions from the Prometheus Alertmanager HTTP client that could panic * add TODO to implement PutAlerts in a way that mimics what Prometheus does * fix log format	2023-10-19 11:27:37 +02:00
Marcus Efraimsson	e4c1a7a141	Tracing: Standardize on otel tracing (#75528 )	2023-10-03 14:54:20 +02:00
Steve Simpson	894f420014	Alerting: Pass loggers into SchedulerCfg and ManagerCfg. (#75158 )	2023-09-20 15:07:02 +02:00
Will Browne	e855efb13d	Plugins: Move store and plugin dto to pluginsintegration (#74655 ) move store and plugin dto	2023-09-11 13:59:24 +02:00
Ryan McKinley	025b2f3011	Chore: use any rather than interface{} (#74066 )	2023-08-30 18:46:47 +03:00
Yuri Tseretyan	938e26b59f	Alerting: Add new metrics and tracings to state manager and scheduler (#71398 ) * add metrics and tracing to state manager * propagate tracer to state manager * add scheduler metrics * fix backtesting * add test for state metrics * remove StateUpdateCount * update docs * metrics can be null * add tracer to new tests	2023-08-16 09:04:18 +02:00
Yuri Tseretyan	c7598cc6fb	Alerting: Add ability to control scheduler tick interval via config (#71980 ) * add ability to control scheduler interval via config * add feature flag `configurableSchedulerTick`	2023-07-26 12:44:12 -04:00
Will Browne	a8577c21ba	Plugins: Migrate PluginStore mock to pre-existing fakes package (#71664 ) * migrate to existing fakes package * fix imports	2023-07-17 10:21:44 +00:00
Kyle Brandt	f6a28cadbc	Alerting: (Chore/Instrumentation) Add traceID to logs with contextual logger (#71289 ) Alerting: (Chore) Add traceID to logs with contextual logger	2023-07-11 10:59:52 +02:00
Yuri Tseretyan	ada325de2a	Alerting: Use unsafe.Slice for hashing a string during rule fingerprint calculation (#71000 )	2023-06-30 14:58:23 -04:00
George Robinson	7edbe72483	Alerting: Support concurrent queries for saving alert instances (#70525 ) This commit adds support for concurrent queries when saving alert instances to the database. This is an experimental feature in response to some customers experiencing delays between rule evaluation and sending alerts to Alertmanager, resulting in flapping. It is disabled by default.	2023-06-23 11:36:07 +01:00
SatVeer Singh	1bfa3a0f1e	Chore: Replace go-multierror with errors package (#66432 ) * code refactor and type assertions added to tests * no-lint rule added for specific line	2023-06-19 12:29:45 +03:00
Matthew Jacobson	ba3994d338	Alerting: Repurpose rule testing endpoint to return potential alerts (#69755 ) * Alerting: Repurpose rule testing endpoint to return potential alerts This feature replaces the existing no-longer in-use grafana ruler testing API endpoint /api/v1/rule/test/grafana. The new endpoint returns a list of potential alerts created by the given alert rule, including built-in + interpolated labels and annotations. The key priority of this endpoint is that it is intended to be as true as possible to what would be generated by the ruler except that the resulting alerts are not filtered to only Resolved / Firing and ready to be sent. This means that the endpoint will, among other things: - Attach static annotations and labels from the rule configuration to the alert instances. - Attach dynamic annotations from the datasource to the alert instances. - Attach built-in labels and annotations created by the Grafana Ruler (such as alertname and grafana_folder) to the alert instances. - Interpolate templated annotations / labels and accept allowed template functions.	2023-06-08 18:59:54 -04:00
Yuri Tseretyan	9eb10bee1f	Alerting: Scheduler use rule fingerprint instead of version (#66531 ) * implement calculation of fingerprint for ruleWithFolder * update scheduler to use fingerprint instead of rule's version	2023-04-28 10:42:16 -04:00
Santiago	b0881daf23	Alerting: Use URLs in image annotations (#66804 ) * use tokens or urls in image annotations * improve tests, fix some comments * fix empty tokens * code review changes, check for url before checking for token (support old token formats)	2023-04-26 13:06:18 -03:00
Kyle Brandt	840fb32ad8	SSE: (Instrumentation) Add Tracing (#66700 ) spans are prefixed `SSE.`	2023-04-18 08:04:51 -04:00


221 Commits