grafana

mirror of https://github.com/grafana/grafana.git synced 2024-11-25 18:30:41 -06:00

Author	SHA1	Message	Date
Yuriy Tseretyan	0a4121cef8	Alerting: Contextual log provider for rule key (#57476 ) * create contextual log context provider * use contextual provider in scheduler * init logger in the package * use context for log context * use context in state manager	2022-10-26 19:16:02 -04:00
Alexander Weaver	de46c1b002	Alerting: Improve logs in state manager and historian (#57374 ) * Touch up log statements, fix casing, add and normalize contexts * Dedicated logger for dashboard resolver * Avoid injecting logger to historian * More minor log touch-ups * Dedicated logger for state manager * Use rule context in annotation creator * Rename base logger and avoid redundant contextual loggers	2022-10-21 16:16:51 -05:00
Alexander Weaver	3ddb28bad9	Find-and-replace 'err' logs to 'error' to match log search conventions (#57309 )	2022-10-19 17:36:54 -04:00
Alexander Weaver	129a28919b	Alerting: Cache result of dashboard ID lookups (#56587 ) * Create caching dashboard resolver * A couple tests for dashboard resolving * Log warning on not found * Additional polish + review nits * Move to singleflight instead of a plain mutex * Store errors instead of -1 in cache and use reflection when reading * Address linter error * One more linter error	2022-10-14 15:48:02 -05:00
George Robinson	52965de369	Alerting: Add doc comments to state struct and normalize fields (#56647 )	2022-10-11 09:30:33 +01:00
George Robinson	802d67eeca	Alerting: Support values in notification templates (#56457 ) We have received a lot of feedback regarding the ValueString in alert notifications. Perhaps one of the most frequent complaints about ValueString is that it is difficult to read because it contains a lot of information, and the information is shown as a JSON-like string. Users have often asked how it can be templated and the answer is that it can't. Until now users have been able to add custom annotations to their alert rules that contain values via the $values variable added in previous versions of Grafana. However, these custom annotations must be added to each of the user's alert rules, instead of once in a template that all of their alerts can be notified via. This commit adds the much-requested feature to support values in notification templates. Users can then create a single template that prints the annotations, labels and values of their alerts in a format of their choice!	2022-10-10 13:40:21 +01:00
Yuriy Tseretyan	e2f1201382	Alerting: Fix migration to not add label "alertname" (#56509 ) * do not add label alertname because it is overridden in state manager anyway * update state manager to not consider labels with same value as dupe	2022-10-07 15:06:53 -04:00
Yuriy Tseretyan	7b6437402a	Alerting: Refactor state manager's cache (#56197 ) * remove ResetAllStates because it's not used * refactor cache to accept logs, metrics and url as method args * update manager Warm method to set the entire state at once * remove unused reset method * introduce ruleStates * change getOrCreate to belong to ruleStates * update Get to not return error	2022-10-06 15:30:12 -04:00
Joe Blubaugh	b476ae62fb	Alerting: Write and Delete multiple alert instances. (#55350 ) Prior to this change, all alert instance writes and deletes happened individually, in their own database transaction. This change batches up writes or deletes for a given rule's evaluation loop into a single transaction before applying it. These new transactions are off by default, guarded by the feature toggle "alertingBigTransactions" Before: ``` goos: darwin goarch: arm64 pkg: github.com/grafana/grafana/pkg/services/ngalert/store BenchmarkAlertInstanceOperations-8 398 2991381 ns/op 1133537 B/op 27703 allocs/op --- BENCH: BenchmarkAlertInstanceOperations-8 util.go:127: alert definition: {orgID: 1, UID: FovKXiRVzm} with title: "an alert definition FTvFXmRVkz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: foDFXmRVkm} with title: "an alert definition fovFXmRVkz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: VQvFuigVkm} with title: "an alert definition VwDKXmR4kz" interval: 60 created PASS ok github.com/grafana/grafana/pkg/services/ngalert/store 1.619s ``` After: ``` goos: darwin goarch: arm64 pkg: github.com/grafana/grafana/pkg/services/ngalert/store BenchmarkAlertInstanceOperations-8 1440 816484 ns/op 352297 B/op 6529 allocs/op --- BENCH: BenchmarkAlertInstanceOperations-8 util.go:127: alert definition: {orgID: 1, UID: 302r_igVzm} with title: "an alert definition q0h9lmR4zz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: 71hrlmR4km} with title: "an alert definition nJ29_mR4zz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: Cahr_mR4zm} with title: "an alert definition ja2rlmg4zz" interval: 60 created PASS ok github.com/grafana/grafana/pkg/services/ngalert/store 1.383s ``` So we cut time by about 75% and memory allocations by about 60% when storing and deleting 100 instances.	2022-10-06 14:22:58 +08:00
Alexander Weaver	8df830557a	Alerting: Move annotation functionality behind a history persistence interface (#56133 ) * Move annotation functionality behind a history persistence interface * Rename to RecordState * Fix lint error in import aliasing * One more import linter error	2022-10-05 15:32:20 -05:00
Alexander Weaver	81b631d1e9	Use separate fake for rule reader (#55835 )	2022-09-27 10:33:32 -05:00
Alexander Weaver	d17ab82b98	Alerting: Break up store.RuleStore interface, delete dead code (#55776 ) * Refactor state manager to not depend on rule store interface * Refactor grafana and proxied ruler APIs to not depend on store.RuleStore * Refactor folder subscription logic to not use store.RuleStore * Delete dead code * Delete store.RuleStore	2022-09-27 08:56:30 -05:00
Alexander Weaver	a00879ae21	Alerting: Refactor store to not export its own interface for InstanceStore, delete dead dependency injection (#55772 ) * Add consumer-side store interface to state manager * Remove dead dependency * Delete dead dependency in API struct * Delete store-layer InstanceStore interface * Move fake for state's InstanceStore interface to state package	2022-09-26 13:55:05 -05:00
Yuriy Tseretyan	879241a48f	Alerting: Fix state manager tests (#55593 )	2022-09-21 13:57:18 -05:00
Yuriy Tseretyan	199996cbf9	Alerting: Resolve stale state + add state reason to notifications (#49352 ) * adds a new reserved annotation `grafana_state_reason` * explicitly resolve stale states	2022-09-21 13:24:47 -04:00
Yuriy Tseretyan	0629d3922a	stop flushing state when Grafana stops (#55504 )	2022-09-21 10:10:17 -04:00
Sofia Papagiannaki	754eea20b3	Chore: SQL store split for annotations (#55089 ) * Chore: SQL store split for annotations * Apply suggestion from code review	2022-09-19 10:54:37 +03:00
George Robinson	5561f935e6	Alerting: Fix send resolved notifications (#54793 ) This commit fixes a bug where we did not send resolved alerts to Alertmanager for resolved alert instances. This meant that resolved notifications did not have the annotations from the resolved state, and as a result also did not have the resolved screenshot.	2022-09-15 17:25:05 +01:00
Joe Blubaugh	22c937340e	Revert "Alerting: Write and Delete multiple alert instances. (#54072 )" (#54885 ) This reverts commit `5e4fd94413`.	2022-09-09 17:44:06 +02:00
Joe Blubaugh	5e4fd94413	Alerting: Write and Delete multiple alert instances. (#54072 ) Prior to this change, all alert instance writes and deletes happened individually, in their own database transaction. This change batches up writes or deletes for a given rule's evaluation loop into a single transaction before applying it. Before: ``` goos: darwin goarch: arm64 pkg: github.com/grafana/grafana/pkg/services/ngalert/store BenchmarkAlertInstanceOperations-8 398 2991381 ns/op 1133537 B/op 27703 allocs/op --- BENCH: BenchmarkAlertInstanceOperations-8 util.go:127: alert definition: {orgID: 1, UID: FovKXiRVzm} with title: "an alert definition FTvFXmRVkz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: foDFXmRVkm} with title: "an alert definition fovFXmRVkz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: VQvFuigVkm} with title: "an alert definition VwDKXmR4kz" interval: 60 created PASS ok github.com/grafana/grafana/pkg/services/ngalert/store 1.619s ``` After: ``` goos: darwin goarch: arm64 pkg: github.com/grafana/grafana/pkg/services/ngalert/store BenchmarkAlertInstanceOperations-8 1440 816484 ns/op 352297 B/op 6529 allocs/op --- BENCH: BenchmarkAlertInstanceOperations-8 util.go:127: alert definition: {orgID: 1, UID: 302r_igVzm} with title: "an alert definition q0h9lmR4zz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: 71hrlmR4km} with title: "an alert definition nJ29_mR4zz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: Cahr_mR4zm} with title: "an alert definition ja2rlmg4zz" interval: 60 created PASS ok github.com/grafana/grafana/pkg/services/ngalert/store 1.383s ``` So we cut time by about 75% and memory allocations by about 60% when storing and deleting 100 instances. This change also updates some of our tests so that they run successfully against postgreSQL - we were using random Int64s, but postgres integers, which our tables use, max out at 2^31-1	2022-09-02 11:17:20 +08:00
Yuriy Tseretyan	03e746d9df	Alerting: Delete state from the database on reset (#53919 ) * make ResetStatesByRuleUID return states * delete rule states when reset * rule eval routine to clean up the state only when rule is deleted	2022-08-25 14:12:22 -04:00
Yuriy Tseretyan	9f90a7b54d	Alerting: State manager to use InstanceStore (#53852 ) * move saving the state to state manager when scheduler stops * move saving state to ProcessEvalResults * add GetRuleKey to State * add LogContext to AlertRuleKey	2022-08-18 09:40:33 -04:00
Yuriy Tseretyan	e5e8747ee9	Alerting: Update state manager to accept reserved labels (#52189 ) * add tests for cache getOrCreate * update ProcessEvalResults to accept extra labels * extract to getRuleExtraLabels * move populating of constant rule labels to extra labels	2022-07-14 15:59:59 -04:00
George Robinson	34d45977ca	Alerting: Fix bug where state did not change between Alerting and Error (#52204 ) This commit fixes a bug where the state did not change from Alerting to Error if the evaluation result returned an error, or from Error to Alerting if evaluations stopped returning errors.	2022-07-14 10:53:39 +01:00
Yuriy Tseretyan	a6b1090879	Alerting: refactor scheduler and separate notification logic (#48144 ) * Introduce AlertsRouter in the sender package, and move all fields and methods related to notifications out of the scheduler to this router. * Introduce a new interface AlertsSender in the schedule package and replace calls of anonymous function `notify` inside the ruleRoutine to calling methods of that interface. * Rename interface Scheduler in api package to ExternalAlertmanagerProvider, and replace scheduler with AlertRouter as struct that implements the interface.	2022-07-12 15:13:04 -04:00
Yuriy Tseretyan	4b42cd3c1d	Alerting: State manager to use clock (#51219 ) * manager to use clock, to be able to mock real time	2022-06-22 12:18:42 -04:00
Yuriy Tseretyan	157c12211d	Alerting: State manager to use tick time to determine stale states (#50991 ) * use correct stale timestamp * calculate stale using tick time instead of time.now * remove unused dependency on sql store	2022-06-22 00:16:53 +02:00
gotjosh	0cde283505	Alerting: Logs should not be capitalized and the errors key should be "err" (#50333 ) * Alerting: decapitalize log lines and use "err" as the key for errors Found using (logger\|log).(Warn\|Debug\|Info\|Error)\([A-Z] and (logger\|log).(Warn\|Debug\|Info\|Error)\(.+"error"	2022-06-07 19:54:23 +02:00
Joe Blubaugh	56f40bd413	Alerting: Add Go error message to warning log for screenshots. (#49870 ) Makes debugging problems with alert screenshotting easier.	2022-05-31 20:56:22 +08:00
Joe Blubaugh	9e8efaa459	Alerting: Add stored screenshot utilities to the channels package. (#49470 ) Adds three functions: `withStoredImages` iterates over a list of models.Alerts, extracting a stored image's data from storage, if available, and executing a user-provided function. `withStoredImage` does this for an image attached to a specific alert. `openImage` finds and opens an image file on disk. Moves `store.Image` to `models.Image` Simplifies `channels.ImageStore` interface and updates notifiers that use it to use the simpler methods. Updates all pkg/alert/notifier/channels to use withStoredImage routines.	2022-05-26 13:29:56 +08:00
Joe Blubaugh	1cc034d960	Alerting: Add a "Reason" to Alert Instances to show underlying cause of state. (#49259 ) This change adds a field to state.State and models.AlertInstance that indicate the "Reason" that an instance has its current state. This helps us account for cases where the state is "Normal" but the underlying evaluation returned "NoData" or "Error", for example. Fixes #42606 Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>	2022-05-23 16:49:49 +08:00
Joe Blubaugh	1d724810de	Alerting: State Manager takes screenshots. (#49338 ) The State Manager will now take screenshots when an alert instance switches to an Alerting or Resolved state. Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>	2022-05-23 10:53:41 +08:00
Joe Blubaugh	687e79538b	Alerting: Add a general screenshot service and alerting-specific image service. (#49293 ) This commit adds a pkg/services/screenshot package for taking and uploading screenshots of Grafana dashboards. It supports taking screenshots of both dashboards and individual panels within a dashboard, using the rendering service. The screenshot package has the following services, most of which can be composed: BrowserScreenshotService (Takes screenshots with headless Chrome) CachableScreenshotService (Caches screenshots taken with another service such as BrowserScreenshotService) NoopScreenshotService (A no-op screenshot service for tests) SingleFlightScreenshotService (Prevents duplicate screenshots when taking screenshots of the same dashboard or panel in parallel) ScreenshotUnavailableService (A screenshot service that returns ErrScreenshotsUnavailable) UploadingScreenshotService (A screenshot service that uploads taken screenshots) The screenshot package does not support wire dependency injection yet. ngalert constructs its own version of the service. See https://github.com/grafana/grafana/issues/49296 This PR also adds an ImageScreenshotService to ngAlert. This is used to take screenshots with a screenshotservice and then store their location reference for use by alert instances and notifiers.	2022-05-22 22:33:49 +08:00
George Robinson	43358c7248	Alerting: Keep private annotations across evaluations (#49080 )	2022-05-18 11:21:18 +02:00
Kristin Laemmert	1df340ff28	backend/services: Move GetDashboard from sqlstore to dashboard service (#48971 ) * rename folder to match package name * backend/sqlstore: move GetDashboard into DashboardService This is a stepping-stone commit which copies the GetDashboard function - which lets us remove the sqlstore from the interfaces in dashboards - without changing any other callers. * checkpoint: moving GetDashboard calls into dashboard service * finish refactoring api tests for dashboardService.GetDashboard	2022-05-17 14:52:22 -04:00
Yuriy Tseretyan	4b417c8f3e	use NaN if condition value is nil (#48370 )	2022-04-27 15:59:13 -03:00
George Robinson	c5547123bc	Remove redundant queries in GetAlertRules and GetOrgAlertRules and replace with ListAlertRules (#48108 )	2022-04-25 11:42:42 +01:00
Yuriy Tseretyan	884c885289	Alerting: Support OK option for Error state (#47670 ) * support OK state for Error	2022-04-13 14:45:29 -04:00
gotjosh	cb6124c921	Alerting: Accurately set value for prom-compatible APIs (#47216 ) * Alerting: Accurately set value for prom-compatible APIs Sets the value fields for the prometheus compatible API based on a combination of condition `refID` and the values extracted from the different frames. * Fix an extra test * Ensure a consistent ordering * Address review comments * address review comments	2022-04-05 19:36:42 +01:00
George Robinson	79769132c0	Alerting: Alert rule should wait For duration when execution error state is Alerting (#47052 ) Alerting: Alert rule should wait For duration when execution error state is Alerting	2022-03-31 09:57:58 +01:00
gotjosh	84e5f336fe	Alerting: Classic conditions can now display multiple values (#46971 ) * Alerting: Extract classic condition values by RefID * uncapitalise function * update documentation * Update pkg/services/ngalert/eval/extract_md.go Co-authored-by: George Robinson <george.robinson@grafana.com> * Update pkg/services/ngalert/state/state.go Co-authored-by: George Robinson <george.robinson@grafana.com> * Update pkg/services/ngalert/state/state.go Co-authored-by: George Robinson <george.robinson@grafana.com> * Update pkg/services/ngalert/eval/extract_md.go Co-authored-by: George Robinson <george.robinson@grafana.com> * Update docs/sources/alerting/unified-alerting/alerting-rules/alert-annotation-label.md Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com> * Update pkg/services/ngalert/eval/extract_md.go Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com> * Run prettier Co-authored-by: George Robinson <george.robinson@grafana.com> Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>	2022-03-29 20:33:03 +01:00
gotjosh	a338c78ca8	Alerting: Remove internal labels from prometheus compatible API responses (#46548 ) * Alerting: Remove internal labels from prometheus compatible API responses * Appease the linter * Fix integration tests * Fix API documentation & linter * move removal of internal labels to the models	2022-03-16 16:04:19 +00:00
gotjosh	8d4a0a0396	Alerting: Include annotations in prometheus Alert response. (#45970 ) * Alerting: Include annotations in prometheus Alert response. * add tests * re-order dependencies	2022-03-09 18:20:29 +00:00
George Robinson	789cfc31e3	Alerting: Fix use of > instead of >= when checking the For duration (#46011 )	2022-03-01 17:06:42 +00:00
George Robinson	feae959c9d	Alerting: Create annotation if Firing alert is removed (#45703 ) This commit changes staleResultsHandler to create an annotation if the current state is Alerting and the result is being removed from the state cache as it has not been updated since 2x the evaluation interval.	2022-02-24 16:25:28 +00:00
George Robinson	8d57318941	Alerting: Use expanded labels in dashboard annotations (#45726 )	2022-02-24 10:58:54 +00:00
Yuriy Tseretyan	02f8e99ca1	Alerting: move fake stores to store package (#45428 ) * make fake storage public * move fake storages to store package	2022-02-15 17:24:39 -05:00
George Robinson	67a3e1d6fd	Add context.Context to InstanceStore (#45049 )	2022-02-08 13:49:04 +00:00
George Robinson	a9399ab3cd	Alerting: Add context.Context to RuleStore (#45004 ) Alerting: Add context.Context to RuleStore	2022-02-08 08:52:03 +00:00
idafurjes	7a23700e1a	Remove unused GetDashboard method (#44890 ) * Remove unused GetDashboard method * Uncomment test * Fix dashboard service integration test * Remove comment	2022-02-04 17:21:06 +01:00
Yuriy Tseretyan	984c95de63	Do not store EvaluationString in Evaluation. (#44606 ) * do not store evaluation string in Evaluation. * reduce number of buckets to store for a single state	2022-02-02 19:18:20 +01:00
George Robinson	5e2280ceee	Add metrics to ngalert scheduler (#44602 ) This pull request adds metrics to the ngalert scheduler so we can see how long it takes to evaluate a tick.	2022-01-31 16:56:43 +00:00
idafurjes	56c3875bb9	Chore: Remove context.TODO (#43458 ) * Remove context.TODO() from services * Fix live test	2021-12-28 10:26:18 +01:00
George Robinson	c932dc959c	Alerting: Add Ref ID to DatasourceNoData and DatasourceError alerts (#42630 )	2021-12-03 09:55:16 +00:00
gotjosh	357e9ed1ea	Alerting: Fix Annotation Creation when the alerting state changes (#42479 ) * Fix Annotation creation - Remove validation of panelID, now annotations are created irrespective of whether they're attached to a panel or not. - Always attach the annotation to an AlertID * Fix annotation creation * fix tests	2021-12-01 11:04:54 +00:00
Santiago	a21d1e50f1	avoid template execution errors on missing values (#41617 )	2021-11-29 15:26:51 -03:00
gotjosh	dd5a2e5128	Alerting: Clear alerting rule evaluation errors after intermittent failures (#42386 ) * Alerting: Clear alerting rule evaluation errors after intermittent failures When an alert transitioned in a way that `alerting -> error -> (alerting\|nodata)`, the error provided by the `error` state would never be cleared thus the API and UI would show the health as an error.	2021-11-26 17:58:19 +00:00
George Robinson	1b26d4d88e	Alerting: Create DatasourceError alert if evaluation returns error (#41869 ) * Alerting: Create DatasourceError alert if evaluation returns error * Alerting: Add docs for DatasourceError alert * Alerting: Fix DatasourceError alert does not have dashboard_uid label * Alerting: Add break when datasource_uid found * Alerting: Update TestProcessEvalResults	2021-11-25 11:46:47 +01:00
Santiago	a45e4ff73f	graphLink and tableLink template functions (#41369 ) * graphLink and tableLink functions, docs updated * Code review changes * extract query struct outside of graphLink and tableLink functions * Fix docs * Update docs/sources/alerting/unified-alerting/alerting-rules/alert-annotation-label.md Co-authored-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com> * Update docs/sources/alerting/unified-alerting/alerting-rules/alert-annotation-label.md Co-authored-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com> * Update docs/sources/alerting/unified-alerting/alerting-rules/alert-annotation-label.md Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com> * Update docs/sources/alerting/unified-alerting/alerting-rules/alert-annotation-label.md Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com> * Fix linting errors Co-authored-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com> Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>	2021-11-10 10:36:03 -03:00
Yuriy Tseretyan	610643a668	Alerting: Special alert instance if rule is in state NoData (#40540 ) * do not suppress NoData state * extract conversion of state to postable alert + tests * create a special alert instance if nodata * use NoData when converting from Keep Last State instead of Alerting * add silence during migration if NoData is mapped to KeepLastState.	2021-11-04 16:42:34 -04:00
Yuriy Tseretyan	1b5b747885	Alerting: Additional Tests for State Manager (#41291 ) * rename fakeInstanceStore to FakeInstanceStore * update test for state manager to initialize instance store with FakeInstanceStore	2021-11-04 15:15:56 -04:00
Yuriy Tseretyan	5836def6c2	Alerting: declare constants for __dashboardUid__ and __panelId__ literals (#39976 )	2021-10-07 17:30:06 -04:00
idafurjes	2759b16ef5	Chore: Add context for dashboards (#39844 ) * Add context for dashboards * Remove GetDashboardCtx * Remove ctx.TODO	2021-10-05 13:26:24 +02:00
Santiago	562cd9e44e	Alerting template functions (#39261 ) * Alerting: (wip) add template funcs * Alerting: (wip) numeric template functions * Alerting: (wip) template functions * Test for the "args" function * Alerting: (wip) Documentation for template functions * Alerting: template functions - refactor * code review changes * disable linter error * Use Prometheus implementation of TemplateExpander * Update docs/sources/alerting/unified-alerting/alerting-rules/create-grafana-managed-rule.md Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com> * change templateCaptureValue to support using template functions * Update pkg/services/ngalert/state/template.go Co-authored-by: gotjosh <josue.abreu@gmail.com> * Test and documentation added for reReplaceAll template function * complete missing functions, documentation and tests * Use the alert instance's evaluation time for expanding the template * strvalue graphlink and tablelink functions * delete duplicate test * make strvalue return an empty string Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com> Co-authored-by: gotjosh <josue.abreu@gmail.com>	2021-10-04 15:04:37 -03:00
Sofia Papagiannaki	012d4f0905	Alerting: Remove `ngalert` feature toggle and introduce two new settings for enabling Grafana 8 alerts and disabling them for specific organisations (#38746 ) * Remove `ngalert` feature toggle * Update frontend Remove all references of ngalert feature toggle * Update docs * Disable unified alerting for specific orgs * Add backend tests * Apply suggestions from code review Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com> * Disabled unified alerting by default * Ensure backward compatibility with old ngalert feature toggle * Apply suggestions from code review Co-authored-by: gotjosh <josue@grafana.com>	2021-09-29 16:16:40 +02:00
George Robinson	27609dc2c5	Fix alerts with evaluation interval more than 30 seconds resolving in Alertmanager (#39513 )	2021-09-22 14:55:46 +01:00
gotjosh	fcbcfd232b	Alerting: Move spammy log line to debug in the state manager (#39410 )	2021-09-20 16:05:55 +01:00
Santiago	c3cf95f383	Revert "Alerting: add template funcs (#38404 )" (#39258 ) This reverts commit `d6fb0181fb`.	2021-09-15 19:47:22 -03:00
Santiago	d6fb0181fb	Alerting: add template funcs (#38404 ) * Alerting: (wip) add template funcs * Alerting: (wip) numeric template functions * Alerting: (wip) template functions * Test for the "args" function * Alerting: (wip) Documentation for template functions * Alerting: template functions - refactor * code review changes * disable linter error * Use Prometheus implementation of TemplateExpander * Update docs/sources/alerting/unified-alerting/alerting-rules/create-grafana-managed-rule.md Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com> Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>	2021-09-15 18:48:29 -03:00
Marcus Efraimsson	fa9857499b	Chore: GetDashboardQuery should be dispatched using DispatchCtx (#36877 ) * Chore: GetDashboardQuery should be dispatched using DispatchCtx * Fix after merge * Changes after review * Various fixes * Use GetDashboardCtx function instead of GetDashboard	2021-09-14 16:08:04 +02:00
gotjosh	a2f4344bf2	Alerting: Refactor & fix unified alerting metrics structure (#39151 ) * Alerting: Refactor & fix unified alerting metrics structure Fixes and refactors the metrics structure we have for the ngalert service. Now, each component has its own metric struct that includes the JUST the metrics it uses. Additionally, I have fixed the configuration metrics and added new metrics to determine if we have discovered and started all the necessary configurations of an instance. This allows us to alert on `grafana_alerting_discovered_configurations - grafana_alerting_active_configurations != 0` to know whether an alertmanager instance did not start successfully.	2021-09-14 12:55:01 +01:00
George Robinson	5caf6cb369	Change templateCaptureValue to support using template functions (#38766 ) * Change templateCaptureValue to support using template functions This commit changes templateCaptureValue to use float64 for the value instead of *float64. This change means that annotations and labels can use the float64 value with functions such as printf and avoid having to check for nil. It also means that absent values are now printed as 0. Use math.NaN() instead of 0 for absent value	2021-09-08 10:46:15 +01:00
gotjosh	dd502f22eb	Alerting: Fix alert flapping in the internal alertmanager (#38648 ) * Alerting: Fix alert flapping in the alertmanager Fixes a bug that caused alerts evaluated at low intervals (sub 1 minute) to flap in the Alertmanager, mostly due to a combination of `EndsAt` and the resend delay. The Alertmanager uses `EndsAt` as a heuristic to know when it should resolve a firing alert, in the case that it hasn't heard back from the alert generation system. Because Grafana sent the alert with an `EndsAt` equal to the `For` of the alert itself, and we had a hard-coded 1 minute re-send delay (only applicable to firing alerts), a firing alert would resolve in the Alertmanager before we re-notified that it was still firing. This commit increases the `EndsAt` by 3x the resend delay or alert interval (depending on which one is higher). The resendDelay has been decreased to 30 seconds.	2021-09-02 16:22:59 +01:00
Arve Knudsen	78596a6756	Migrate to Wire for dependency injection (#32289 ) Fixes #30144 Co-authored-by: dsotirakis <sotirakis.dim@gmail.com> Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com> Co-authored-by: Ida Furjesova <ida.furjesova@grafana.com> Co-authored-by: Jack Westbrook <jack.westbrook@gmail.com> Co-authored-by: Will Browne <wbrowne@users.noreply.github.com> Co-authored-by: Leon Sorokin <leeoniya@gmail.com> Co-authored-by: Andrej Ocenas <mr.ocenas@gmail.com> Co-authored-by: spinillos <selenepinillos@gmail.com> Co-authored-by: Karl Persson <kalle.persson@grafana.com> Co-authored-by: Leonard Gram <leo@xlson.com>	2021-08-25 15:11:22 +02:00
Kyle Brandt	aef67994a1	Annotations: Fix alerting annotation coloring (#37412 ) Co-authored-by: Ryan McKinley <ryantxu@gmail.com>	2021-08-12 09:37:54 -07:00
Kyle Brandt	aa904a5a04	NGAlert: Send resolve signal to alertmanager on alerting -> Normal (#37363 )	2021-07-29 20:29:17 +02:00
David Parrott	b5f464412d	Alerting: automatically remove stale alerting states (#36767 ) * initial attempt at automatic removal of stale states * test case, need expected states * finish unit test * PR feedback * still multiply by time.second * pr feedback	2021-07-26 18:12:04 +02:00
George Robinson	2f4c893cf3	Expand the value string in annotations and labels of alerts (#37051 ) This commit makes it possible to use the value string in annotations and labels for alerts with "{{ $value }}"	2021-07-22 15:20:44 +01:00
George Robinson	456dac1303	Expand the value of math and reduce expressions in annotations and labels (#36611 ) * Expand the value of math and reduce expressions in annotations and labels This commit makes it possible to use the values of reduce and math expressions in annotations and labels via their RefIDs. It uses the Stringer interface to ensure that "{{ $values.A }}" still prints the value in decimal format while also making the labels for each RefID available with "{{ $values.A.Labels }}" and the float64 value with "{{ $values.A.Value }}"	2021-07-15 13:10:56 +01:00
David Parrott	19f18bcecc	Alerting: annotation on state change (#36535 ) * WIP * Add annotation on alert state change * move annotation creation to manager * praise the linter! * add debug msg when creating annotation	2021-07-13 09:50:10 -07:00
David Parrott	310d3ebe3d	change template expansion missing value handling (#36679 )	2021-07-13 06:57:18 -07:00
gotjosh	a86ad1190c	Alerting: Refactor state manager as a dependency (#36513 ) * Alerting: Refactor state manager as a dependency Within the scheduler, the state manager was being passed around a certain number of functions. I've introduced it as a dependency to keep the "service" interfaces as clean and homogeneous as possible. This is relevant, because I'm going to introduce live reload of these components as part of my next PR and it is better if dependencies are self-contained. * remove unused functions * Fix a few more tests * Make sure the `stateManager` is declared before the schedule	2021-07-07 17:18:31 +01:00
David Parrott	4732f832f7	Alerting: recalculate EndsAt (#35830 ) * setEndsAt * one more test case * add should clause to tests	2021-06-17 10:01:46 -07:00
Ganesh Vernekar	dcd4bf1615	Alerting: Fill the empty GeneratorURL (#35740 ) Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-06-16 15:34:12 +05:30
Ganesh Vernekar	8417088969	Alerting: Expand `{{$labels.xyz}}` template in labels and annotations (#35159 ) * Alerting: Expand `{{$labels.xyz}}` template in labels and annotations Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix annotation not updating for same alert Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-06-03 19:24:36 +02:00
David Parrott	20d356947c	set state correctly and test (#34680 )	2021-05-26 11:37:42 -07:00
David Parrott	7a83d1f9ff	Alerting resend delay for sending to notifiers (#34312 ) * adds resend delay to avoid saturating notifier * correct method signatures * pr feedback	2021-05-19 22:15:09 +02:00
David Parrott	25485100b0	Alerting: Trim results when at processing instead of on ticker (#34248 ) * Trim results when at processing instead of on ticker * Use RWMutex correctly * remove comment	2021-05-18 10:56:14 -07:00
David Parrott	bbb7bbf891	Alerting: Remove back end logic for supporting KeepLastState (#34242 ) * Removed back end logic for supporting KeepLastState * Map keep_state correctly in migrations	2021-05-18 10:55:43 -07:00
Kyle Brandt	63b2dd06a5	Alerting: Set "value" with evalmatches in G Managed (#34075 ) When, and currently only when using a classic condition, evaluation information is added (which is like the EvalMatches from dashboard alerting). This is returned via the API and can be included in notifications by reading the `__value__` label attached `.Alerts` in the template. It is a string.	2021-05-18 09:12:39 -04:00
Owen Diehl	1367f7171e	Alerting/ruler metrics (#34144 ) * adds active configurations metric * rule evaluation metrics * ruler metrics * pr feedback	2021-05-14 16:13:44 -04:00
Kyle Brandt	babb17afd6	Alerting/Chore: Move tests from tests package (#34059 ) Instead put in package folder but with package name suffixed with _test This enables code coverage within the pkg while still allow the tests to operate from external to package perspective (only exported things).	2021-05-13 10:05:33 -04:00
Kyle Brandt	fae093bbe2	Alerting: Fix state cache getOrCreate panic (#33777 )	2021-05-06 14:35:52 +02:00
David Parrott	b1a8c67689	Alerting return evaluation errors to /rules (#33663 ) * Set and return errors produced by evaluation results * test fixup	2021-05-04 13:08:12 -04:00
David Parrott	39099bf3c0	Alerting nested state cache (#33666 ) * nest cache by orgID, ruleUID, stateID * update accessors to use new cache structure * test and linter fixup * fix panic Co-authored-by: Kyle Brandt <kyle@grafana.com> * add comment to identify what's going on with nested maps in cache Co-authored-by: Kyle Brandt <kyle@grafana.com>	2021-05-04 09:57:50 -07:00
Kyle Brandt	48358efc13	Alerting: remove State cache entries on Ruler Delete (#33638 ) for https://github.com/grafana/alerting-squad/issues/133	2021-05-03 14:01:33 -04:00
Owen Diehl	070627d11e	better handle metrics for state transitions (#33648 )	2021-05-03 11:57:24 -04:00
Kyle Brandt	7823842c5d	Alerting: Load annotations from rule into State cache (#33542 ) for https://github.com/grafana/alerting-squad/issues/127	2021-04-30 20:23:12 +02:00
Owen Diehl	5e48b54549	Alerting/metrics (#33547 ) * moves alerting metrics to their own pkg * adds grafana_alerting_alerts (by state) metric * alerts_received_{total,invalid} * embed alertmanager alerting struct in ng metrics & remove duplicated notification metrics (already embed alertmanager notifier metrics) * use silence metrics from alertmanager lib * fix - manager has metrics * updates ngalert tests * comment lint Signed-off-by: Owen Diehl <ow.diehl@gmail.com> * cleaner prom registry code * removes ngalert global metrics * new registry use in all tests * ngalert metrics impl service, hack testinfra code to prevent duplicate metric registrations * nilmetrics unexported	2021-04-30 12:28:06 -04:00
Sofia Papagiannaki	1e380e869e	[Alerting]: some fixes (#33538 ) * Fix failure when adding state annotations * Fix get org rules API Do not fail response if user has no access to view a namespace. Do not include the namespace in the response instead. * lint	2021-04-29 19:15:15 +03:00
Kyle Brandt	914443c816	Alerting: Fix state cache id duplication (#33480 )	2021-04-28 11:42:19 -04:00
David Parrott	788bc2a793	Alerting: refactor state tracker (#33292 ) * set processing time * merge labels and set on response * use state cache for adding alerts to rules * minor cleanup * add support for NoData and Error results * rename test * bring in changes from other PRs that have been merged * pr feedback * add integration test * close state tracker cleanup on context.Done * fixup test * rename state tracker * set EvaluationDuration on Result * default labels set as constants * separate cache and state from manager * use RWMutex in cache	2021-04-23 21:32:25 +02:00
David Parrott	ca79206498	Alerting: Handle NoData and Error evaluation results (#33194 ) * set processing time * merge labels and set on response * use state cache for adding alerts to rules * minor cleanup * add support for NoData and Error results * rename test * bring in changes from other PRs that have been merged * pr feedback * add integration test * close state tracker cleanup on context.Done * fixup test * not those annotations	2021-04-23 20:47:52 +02:00
gotjosh	de0802cf3b	Alerting: Fixes the integration test currently failing at master (#33233 ) * Alerting: Fixes the integration test currently failing at master * Skip the state tracker test for now	2021-04-21 14:57:17 -04:00
David Parrott	4be1d84f23	Alerting: Enhancements to /rules (#33085 ) * set processing time * merge labels and set on response * use state cache for adding alerts to rules * minor cleanup * pr feedback * Do not initialize mutex unnecessarily Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> * linter Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>	2021-04-21 09:30:03 -07:00
David Parrott	555da77527	Dparrott/labels on alert rule (#33057 ) * move state tracker tests to /tests * set default labels on alerts * handle empty labels in result.Instance * create annotation on transition to alerting state	2021-04-16 15:11:40 +02:00
David Parrott	567a6a09bd	Alerting: Return RuleResponse for api/prometheus/grafana/api/v1/rules (#32919 ) * Return RuleResponse for api/prometheus/grafana/api/v1/rules * change TODO to note Co-authored-by: gotjosh <josue@grafana.com> * pr feedback * test fixup Co-authored-by: gotjosh <josue@grafana.com>	2021-04-13 17:38:09 -04:00
David Parrott	c0d83fc01e	Alerting: Return cached alerts for prometheus/api/v1/alerts (#32654 ) * Return cached alerts for prometheus/api/v1/alerts * Return not implemented for /prometheus/grafana/api/v1/rules * Set StartsAt for already alerting states * Fix tests	2021-04-05 15:05:39 -07:00
David Parrott	2a8446e435	Alerting: Persist alerts on evaluation and shutdown. Warm cache from DB on startup (#32576 ) * Initial commit for state tracking * basic state transition logic and tests * constructor. test and interface fixup * use new sig for sch.definitionRoutine() * test fixup * make the linter happy * more minor linting cleanup * Alerting: Send alerts from state tracker to notifier * Add evaluation time and test Add evaluation time and test * Add cleanup routine and logging * Pull in compact.go and reconcile differences * Save alert transitions and save all state on shutdown * pr feedback * WIP * WIP * Persist alerts on evaluation and shutdown. Warm cache on startup * Filter non-firing alerts before sending to notifier Co-authored-by: Josue Abreu <josue@grafana.com>	2021-04-02 08:11:33 -07:00
David Parrott	b1cb74c0c9	Alerting: Send alerts from state tracker to notifier, logging, and cleanup task (#32333 ) * Initial commit for state tracking * basic state transition logic and tests * constructor. test and interface fixup * use new sig for sch.definitionRoutine() * test fixup * make the linter happy * more minor linting cleanup * Alerting: Send alerts from state tracker to notifier * Add evaluation time and test Add evaluation time and test * Add cleanup routine and logging * Pull in compact.go and reconcile differences * pr feedback * pr feedback Pull in compact.go and reconcile differences Co-authored-by: Josue Abreu <josue@grafana.com>	2021-03-30 09:37:56 -07:00
David Parrott	d33a77a67f	Alerting: add state tracker to alerting evaluation (#32298 ) * Initial commit for state tracking * basic state transition logic and tests * constructor. test and interface fixup * use new sig for sch.definitionRoutine() * test fixup * make the linter happy * more minor linting cleanup	2021-03-24 15:34:18 -07:00

261 Commits