grafana

mirror of https://github.com/grafana/grafana.git synced 2025-02-15 10:03:33 -06:00

Author	SHA1	Message	Date
Yuri Tseretyan	4374966987	Alerting: Replace hardcoded <no value> to [no value] in label expansion (#60129 ) * replace hardcoded <no value> to [no value] in label expansion	2022-12-12 10:12:30 -05:00
George Robinson	76601f3ae7	Alerting: Better define how we set states (#59977 ) This commit better defines how we set states in resultNormal, resultAlerting, resultError and resultNoData. It changes the existing code to call methods such as SetAlerting, SetPending, SetNormal, SetError and NoData instead of assigning values to each individual field whenever the state is changed. This should make it easier to understand what fields should be set for which states and avoid cases where states are missing, or have additional unexpected fields.	2022-12-08 20:12:13 +00:00
George Robinson	6359dab040	Alerting: Change resultError in preparation for supporting ForError duration (#59894 )	2022-12-07 10:45:56 +00:00
George Robinson	3c249e1b99	Fix incorrect start time for DatasourceError alerts (#59903 )	2022-12-06 18:44:06 +00:00
Yuri Tseretyan	abb49d96b5	Alerting: update state manager to return StateTransition instead of State (#58867 ) * improve test for stale states * update state manager return StateTransition * update scheduler to accept state transitions	2022-12-06 13:07:39 -05:00
Yuri Tseretyan	a85adeed96	Alerting: Update state history service to filter states transitions (#58863 ) * rename the method to better reflect its behavior * make historian filter transition on itself * call historian with all changes	2022-12-06 12:33:15 -05:00
Sasha Melentyev	c02003af3c	Refactor time durations (#58484 ) This change uses `time.Second` in place of `1000 * time.Millisecond` and `time.Minute` in place of `60*time.Second`.	2022-11-22 15:09:15 +08:00
Yuri Tseretyan	28d39d35fd	Alerting: Update state manager to save state transitions in one batch (#58358 ) * change stale results handler to not update database but return transitions * save all transitions in one call	2022-11-14 10:57:51 -05:00
George Robinson	c5ae1bcfe0	Alerting: Fix logging pointer address of DashboardUID and PanelID variables (#58539 )	2022-11-10 09:58:38 +00:00
Alexander Weaver	2bfdda5b68	Alerting: Break dependency between state and image packages (#58381 ) * Refactor state and manager to not depend directly on image interface * Move generic errors to models package * Move NotAvailableImageService to state as its only references are in state tests * Move NoopImageService to state package * Move mock to state package * Fix linter error * Fix comment styling * Fix a couple added references introduced by rebase * Empty commit to kick build	2022-11-09 15:06:49 -06:00
Yuri Tseretyan	bad4f28d0d	Alerting: update test TestAlertingTicker to not rely on clock (#58544 ) * extract method processTick * make processTick return scheduled rules * move state manager tests to state manager * update test * move all tests into one file * remove unused fields	2022-11-09 15:08:57 -05:00
George Robinson	1290951b65	Alerting: Small improvements to staleResultsHandler (#58007 )	2022-11-09 11:08:32 +00:00
Yuri Tseretyan	3621cf5a12	Alerting: Update handling of stale state (#58276 ) * delete all stale states in one lock * do not use touched states to detect stale states; rely only on a correctly maintained LastEvaluationTime * fix tests to use correct eval time * delete unused method	2022-11-07 11:03:53 -05:00
Yuri Tseretyan	623de12e35	Alerting: Create AlertInstanceKey in one place (#58278 ) * use method GetAlertInstanceKey * do not add key if error	2022-11-07 09:35:29 -05:00
Yuri Tseretyan	f9c88e72ae	Alerting: Update saveAlertStates in state manager to not return results (#58279 )	2022-11-07 09:09:19 -05:00
Yuri Tseretyan	978f1119d7	Alerting: Run state manager as regular sub-service (#58246 )	2022-11-04 17:06:47 -04:00
Yuri Tseretyan	dce8879145	Alerting: Update state manager to accept rule store as Warm method argument (#58244 )	2022-11-04 14:23:08 -04:00
Alexander Weaver	cc8c1380e2	Alerting: Persist annotations from multidimensional rules in batches (#56575 ) * Reduce piecemeal state fields * Read data directly off state instead of rule * Unify state and context into single struct * Expose contextual information to layer above setNextState * Work in terms of ContextualState and call historian in batches * Call annotations service in batches * Export format state and reason and remove workaround in unrelated test package * Add new method to annotation service for batch inserting * Fix loop variable aliasing bug caught by linter, didn't change behavior * Incl timerange on annotation tests * Insert one at a time if tags are present * Point to rule from ContextualState rather than copy fields * Build annotations and copy data prior to starting goroutine * Rename to StateTransition * Use new bulk-insert utility * Remove rule from StateTransition and pass in directly to historian * Simplify annotations logic since we have only one rule * Fix logs and context, nilcheck, simplify method name * Regenerate mock	2022-11-04 10:39:26 -05:00
Alex Moreno	ba15d675e7	Alerting: Add values to annotations (#57738 ) * Add values to annotations * Fix imports * Use State attrs instead of Result attrs * Remove unnecessary variable	2022-11-03 10:35:34 +01:00
George Robinson	215ffee437	Alerting: Fix screenshot is not taken for stale series (#57982 )	2022-11-02 22:14:22 +00:00
Yuriy Tseretyan	3294918e9f	Alerting: Update state manager to support nil stores and metrics (#57791 )	2022-10-28 13:10:28 -04:00
Yuriy Tseretyan	0a4121cef8	Alerting: Contextual log provider for rule key (#57476 ) * create contextual log context provider * use contextual provider in scheduler * init logger in the package * use context for log context * use context in state manager	2022-10-26 19:16:02 -04:00
Alexander Weaver	de46c1b002	Alerting: Improve logs in state manager and historian (#57374 ) * Touch up log statements, fix casing, add and normalize contexts * Dedicated logger for dashboard resolver * Avoid injecting logger to historian * More minor log touch-ups * Dedicated logger for state manager * Use rule context in annotation creator * Rename base logger and avoid redundant contextual loggers	2022-10-21 16:16:51 -05:00
Alexander Weaver	3ddb28bad9	Find-and-replace 'err' logs to 'error' to match log search conventions (#57309 )	2022-10-19 17:36:54 -04:00
Alexander Weaver	129a28919b	Alerting: Cache result of dashboard ID lookups (#56587 ) * Create caching dashboard resolver * A couple tests for dashboard resolving * Log warning on not found * Additional polish + review nits * Move to singleflight instead of a plain mutex * Store errors instead of -1 in cache and use reflection when reading * Address linter error * One more linter error	2022-10-14 15:48:02 -05:00
George Robinson	52965de369	Alerting: Add doc comments to state struct and normalize fields (#56647 )	2022-10-11 09:30:33 +01:00
George Robinson	802d67eeca	Alerting: Support values in notification templates (#56457 ) We have received a lot of feedback regarding the ValueString in alert notifications. Perhaps one of the most frequent complaints about ValueString is that it is difficult to read because it contains a lot of information, and the information is shown as a JSON-like string. Users have often asked how it can be templated, and the answer is that it can't. Until now, users have been able to add custom annotations containing values to their alert rules via the $values variable added in previous versions of Grafana. However, these custom annotations must be added to each of the user's alert rules, instead of once in a template shared by the notifications for all of their alerts. This commit adds the much-requested feature of supporting values in notification templates. Users can now create a single template that prints the annotations, labels and values of their alerts in a format of their choice!	2022-10-10 13:40:21 +01:00
Yuriy Tseretyan	e2f1201382	Alerting: Fix migration to not add label "alertname" (#56509 ) * do not add label alertname because it is overridden in state manager anyway * update state manager to not consider labels with same value as dupe	2022-10-07 15:06:53 -04:00
Yuriy Tseretyan	7b6437402a	Alerting: Refactor state manager's cache (#56197 ) * remove ResetAllStates because it's not used * refactor cache to accept logs, metrics and url as method args * update manager Warm method to set the entire state at once * remove unused reset method * introduce ruleStates * change getOrCreate to belong to ruleStates * update Get to not return error	2022-10-06 15:30:12 -04:00
Joe Blubaugh	b476ae62fb	Alerting: Write and Delete multiple alert instances. (#55350 ) Prior to this change, all alert instance writes and deletes happened individually, in their own database transaction. This change batches up writes or deletes for a given rule's evaluation loop into a single transaction before applying it. These new transactions are off by default, guarded by the feature toggle "alertingBigTransactions" Before: ``` goos: darwin goarch: arm64 pkg: github.com/grafana/grafana/pkg/services/ngalert/store BenchmarkAlertInstanceOperations-8 398 2991381 ns/op 1133537 B/op 27703 allocs/op --- BENCH: BenchmarkAlertInstanceOperations-8 util.go:127: alert definition: {orgID: 1, UID: FovKXiRVzm} with title: "an alert definition FTvFXmRVkz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: foDFXmRVkm} with title: "an alert definition fovFXmRVkz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: VQvFuigVkm} with title: "an alert definition VwDKXmR4kz" interval: 60 created PASS ok github.com/grafana/grafana/pkg/services/ngalert/store 1.619s ``` After: ``` goos: darwin goarch: arm64 pkg: github.com/grafana/grafana/pkg/services/ngalert/store BenchmarkAlertInstanceOperations-8 1440 816484 ns/op 352297 B/op 6529 allocs/op --- BENCH: BenchmarkAlertInstanceOperations-8 util.go:127: alert definition: {orgID: 1, UID: 302r_igVzm} with title: "an alert definition q0h9lmR4zz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: 71hrlmR4km} with title: "an alert definition nJ29_mR4zz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: Cahr_mR4zm} with title: "an alert definition ja2rlmg4zz" interval: 60 created PASS ok github.com/grafana/grafana/pkg/services/ngalert/store 1.383s ``` So we cut time by about 75% and memory allocations by about 60% when storing and deleting 100 instances.	2022-10-06 14:22:58 +08:00
Alexander Weaver	8df830557a	Alerting: Move annotation functionality behind a history persistence interface (#56133 ) * Move annotation functionality behind a history persistence interface * Rename to RecordState * Fix lint error in import aliasing * One more import linter error	2022-10-05 15:32:20 -05:00
Alexander Weaver	81b631d1e9	Use separate fake for rule reader (#55835 )	2022-09-27 10:33:32 -05:00
Alexander Weaver	d17ab82b98	Alerting: Break up store.RuleStore interface, delete dead code (#55776 ) * Refactor state manager to not depend on rule store interface * Refactor grafana and proxied ruler APIs to not depend on store.RuleStore * Refactor folder subscription logic to not use store.RuleStore * Delete dead code * Delete store.RuleStore	2022-09-27 08:56:30 -05:00
Alexander Weaver	a00879ae21	Alerting: Refactor store to not export its own interface for InstanceStore, delete dead dependency injection (#55772 ) * Add consumer-side store interface to state manager * Remove dead dependency * Delete dead dependency in API struct * Delete store-layer InstanceStore interface * Move fake for state's InstanceStore interface to state package	2022-09-26 13:55:05 -05:00
Yuriy Tseretyan	879241a48f	Alerting: Fix state manager tests (#55593 )	2022-09-21 13:57:18 -05:00
Yuriy Tseretyan	199996cbf9	Alerting: Resolve stale state + add state reason to notifications (#49352 ) * adds a new reserved annotation `grafana_state_reason` * explicitly resolve stale states	2022-09-21 13:24:47 -04:00
Yuriy Tseretyan	0629d3922a	stop flushing state when Grafana stops (#55504 )	2022-09-21 10:10:17 -04:00
Sofia Papagiannaki	754eea20b3	Chore: SQL store split for annotations (#55089 ) * Chore: SQL store split for annotations * Apply suggestion from code review	2022-09-19 10:54:37 +03:00
George Robinson	5561f935e6	Alerting: Fix send resolved notifications (#54793 ) This commit fixes a bug where we did not send resolved alerts to Alertmanager for resolved alert instances. This meant that resolved notifications did not have the annotations from the resolved state, and as a result did not have the resolved screenshot either.	2022-09-15 17:25:05 +01:00
Joe Blubaugh	22c937340e	Revert "Alerting: Write and Delete multiple alert instances. (#54072 )" (#54885 ) This reverts commit `5e4fd94413`.	2022-09-09 17:44:06 +02:00
Joe Blubaugh	5e4fd94413	Alerting: Write and Delete multiple alert instances. (#54072 ) Prior to this change, all alert instance writes and deletes happened individually, in their own database transaction. This change batches up writes or deletes for a given rule's evaluation loop into a single transaction before applying it. Before: ``` goos: darwin goarch: arm64 pkg: github.com/grafana/grafana/pkg/services/ngalert/store BenchmarkAlertInstanceOperations-8 398 2991381 ns/op 1133537 B/op 27703 allocs/op --- BENCH: BenchmarkAlertInstanceOperations-8 util.go:127: alert definition: {orgID: 1, UID: FovKXiRVzm} with title: "an alert definition FTvFXmRVkz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: foDFXmRVkm} with title: "an alert definition fovFXmRVkz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: VQvFuigVkm} with title: "an alert definition VwDKXmR4kz" interval: 60 created PASS ok github.com/grafana/grafana/pkg/services/ngalert/store 1.619s ``` After: ``` goos: darwin goarch: arm64 pkg: github.com/grafana/grafana/pkg/services/ngalert/store BenchmarkAlertInstanceOperations-8 1440 816484 ns/op 352297 B/op 6529 allocs/op --- BENCH: BenchmarkAlertInstanceOperations-8 util.go:127: alert definition: {orgID: 1, UID: 302r_igVzm} with title: "an alert definition q0h9lmR4zz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: 71hrlmR4km} with title: "an alert definition nJ29_mR4zz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: Cahr_mR4zm} with title: "an alert definition ja2rlmg4zz" interval: 60 created PASS ok github.com/grafana/grafana/pkg/services/ngalert/store 1.383s ``` So we cut time by about 75% and memory allocations by about 60% when storing and deleting 100 instances. This change also updates some of our tests so that they run successfully against PostgreSQL: we were using random int64 values, but PostgreSQL integer columns, which our tables use, max out at 2^31-1.	2022-09-02 11:17:20 +08:00
Yuriy Tseretyan	03e746d9df	Alerting: Delete state from the database on reset (#53919 ) * make ResetStatesByRuleUID return states * delete rule states when reset * rule eval routine to clean up the state only when rule is deleted	2022-08-25 14:12:22 -04:00
Yuriy Tseretyan	9f90a7b54d	Alerting: State manager to use InstanceStore (#53852 ) * move saving the state to state manager when scheduler stops * move saving state to ProcessEvalResults * add GetRuleKey to State * add LogContext to AlertRuleKey	2022-08-18 09:40:33 -04:00
Yuriy Tseretyan	e5e8747ee9	Alerting: Update state manager to accept reserved labels (#52189 ) * add tests for cache getOrCreate * update ProcessEvalResults to accept extra labels * extract to getRuleExtraLabels * move populating of constant rule labels to extra labels	2022-07-14 15:59:59 -04:00
George Robinson	34d45977ca	Alerting: Fix bug where state did not change between Alerting and Error (#52204 ) This commit fixes a bug where the state did not change from Alerting to Error if the evaluation result returned an error, or from Error to Alerting if evaluations stopped returning errors.	2022-07-14 10:53:39 +01:00
Yuriy Tseretyan	a6b1090879	Alerting: refactor scheduler and separate notification logic (#48144 ) * Introduce AlertsRouter in the sender package, and move all fields and methods related to notifications out of the scheduler to this router. * Introduce a new interface AlertsSender in the schedule package and replace calls of anonymous function `notify` inside the ruleRoutine to calling methods of that interface. * Rename interface Scheduler in api package to ExternalAlertmanagerProvider, and replace scheduler with AlertRouter as struct that implements the interface.	2022-07-12 15:13:04 -04:00
Yuriy Tseretyan	4b42cd3c1d	Alerting: State manager to use clock (#51219 ) * manager to use clock, to be able to mock real time	2022-06-22 12:18:42 -04:00
Yuriy Tseretyan	157c12211d	Alerting: State manager to use tick time to determine stale states (#50991 ) * use correct stale timestamp * calculate stale using tick time instead of time.now * remove unused dependency on sql store	2022-06-22 00:16:53 +02:00
gotjosh	0cde283505	Alerting: Logs should not be capitalized and the errors key should be "err" (#50333 ) * Alerting: decapitalize log lines and use "err" as the key for errors Found using (logger\|log).(Warn\|Debug\|Info\|Error)\([A-Z] and (logger\|log).(Warn\|Debug\|Info\|Error)\(.+"error"	2022-06-07 19:54:23 +02:00
Joe Blubaugh	56f40bd413	Alerting: Add Go error message to warning log for screenshots. (#49870 ) Makes debugging problems with alert screenshotting easier.	2022-05-31 20:56:22 +08:00
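Commit 4374966987 (#60129) deals with the `<no value>` placeholder. Go's `text/template` prints `<no value>` when a template refers to a map key that does not exist, and that commit replaces the hardcoded string with `[no value]` during label expansion. The following is a minimal standard-library sketch of the idea. `expandLabel` is a hypothetical helper, not Grafana's actual function. Doing the swap as a post-processing `strings.ReplaceAll` step is an assumption about one possible way to perform the replacement.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// expandLabel is a hypothetical helper: it renders a label template against
// the given data, then swaps text/template's "<no value>" placeholder for
// "[no value]".
func expandLabel(text string, data map[string]string) (string, error) {
	tmpl, err := template.New("label").Parse(text)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	// With the default missingkey option, a missing map key prints "<no value>".
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return strings.ReplaceAll(sb.String(), "<no value>", "[no value]"), nil
}

func main() {
	out, err := expandLabel("instance={{ .instance }} job={{ .job }}",
		map[string]string{"instance": "host-1"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // instance=host-1 job=[no value]
}
```

Because this sketch uses a plain substring swap, a label whose rendered text legitimately contains `<no value>` would also be rewritten. For `map[string]string` data, parsing with `Option("missingkey=zero")` is an alternative that renders a missing key as an empty string instead.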


132 Commits