grafana

mirror of https://github.com/grafana/grafana.git synced 2025-02-25 18:55:37 -06:00

Author	SHA1	Message	Date
William Wernert	10dc6c6d75	Alerting: Add "Keep Last State" backend functionality (#83940 ) * Implement keep last state for state transitions * Respect For duration when keeping state * Only keep transition from recording an annotation * Add keep last state option for nodata/error in UI	2024-03-12 10:00:43 -04:00
Yuri Tseretyan	1eebd2a4de	Alerting: Support for simplified notification settings in rule API (#81011 ) * Add notification settings to storage\domain and API models. Settings are a slice to work around XORM mapping * Support validation of notification settings when rules are updated * Implement route generator for Alertmanager configuration. That fetches all notification settings. * Update multi-tenant Alertmanager to run the generator before applying the configuration. * Add notification settings labels to state calculation * update the Multi-tenant Alertmanager to provide validation for notification settings * update GET API so only admins can see auto-gen	2024-02-15 09:45:10 -05:00
Yuri Tseretyan	47546a4c72	Alerting: Update API to use folders' full paths (#81214 ) * update GetUserVisibleNamespaces to use FolderService * update GetNamespaceByUID to use FolderService.GetFolders * update GetAlertRulesForScheduling to use FolderService.GetFolders * Update API and GetAlertRulesForScheduling to use the folder's full path * get full path of folder in RouteTestGrafanaRuleConfig * fix escaping of titles for MySQL	2024-02-06 17:12:13 -05:00
Sofia Papagiannaki	d1dab5828d	Alerting: Update rule API to address folders by UID (#74600 ) * Change ruler API to expect the folder UID as namespace * Update example requests * Fix tests * Update swagger * Modify File field in /api/prometheus/grafana/api/v1/rules * Fix ruler export * Modify folder in responses to be formatted as <parent UID>/<title> * Add alerting test with nested folders * Apply suggestion from code review * Alerting: use folder UID instead of title in rule API (#77166) Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com> * Drop a few more latent uses of namespace_id * move getNamespaceKey to models package * switch GetAlertRulesForScheduling to use folder table * update GetAlertRulesForScheduling to return folder titles in format `parent_uid/title`. * fix tests * add tests for GetAlertRulesForScheduling when parent uid * fix integration tests after merge * fix test after merge * change format of the namespace to JSON array this is needed for forward compatibility, when we migrate to full paths * update EF code to decode nested folder --------- Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com> Co-authored-by: Virginia Cepeda <virginia.cepeda@grafana.com> Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com> Co-authored-by: Alex Weaver <weaver.alex.d@gmail.com> Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>	2024-01-17 11:07:39 +02:00
William Wernert	48b5ac779b	Alerting/Annotations: Add annotation backend for Loki alert state history (#78156 ) * Move scope type vars to testutil package * Expose parts of state historian for use in annotation backend * Implement Loki ASH Annotation store This store will only implement the `Get` method of a RepositoryImpl since alert state history writes to Loki elsewhere. * Use interface for Loki HTTP Client * Add tests for Loki ASH Annotation store * Add missing test * Fix lint * Organize tests * Add filter tests * Improve tests * Move filter logic into outer function * Fix lint * Add comment * Fix tests * Fix lint * Rename historian store + refactor * Cleanup historian store * Fix tests * Minor cleanup * Use new `ShouldRecordAnnotation` filter * Fix logic and add tests for this check * Fix typos, remove unused variables, `< 1` -> `== 0` * More closely mimic RBAC filter from xorm to ensure correct logic * Move off weaveworks client * Address PR comments	2024-01-10 18:42:35 -05:00
Yuri Tseretyan	f6a46744a6	Alerting: Support hysteresis command expression (#75189 ) Backend: * Update the Grafana Alerting engine to provide feedback to HysteresisCommand. The feedback information is stored in state.Manager as a fingerprint of each state. The fingerprint is persisted to the database. Only fingerprints that belong to Pending and Alerting states are considered as "loaded" and provided back to the command. - add ResultFingerprint to state.State. It's different from other fingerprints we store in the state because it is calculated from the result labels. - add rule_fingerprint column to alert_instance - update alerting evaluator to accept AlertingResultsReader via context, and update scheduler to provide it. - add AlertingResultsFromRuleState that implements the new interface in eval package - update getExprRequest to patch the hysteresis command. * Only one "Recovery Threshold" query is allowed to be used in the alert rule and it must be the Condition. Frontend: * Add hysteresis option to Threshold in UI. It's called "Recovery Threshold" * Add test for getUnloadEvaluatorTypeFromCondition * Hide hysteresis in panel expressions * Refactor isInvalid and add test for it * Remove unnecessary React.memo * Add tests for updateEvaluatorConditions --------- Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>	2024-01-04 11:47:13 -05:00
Alexander Weaver	6ee52ac80c	Alerting: Allow more time before Alertmanager expire-resolves alerts (#77094 ) * Sync endsAt factor with prometheus * Fix state tests	2023-10-25 10:03:46 -05:00
Alexander Weaver	acee3efcf9	Alerting: Use common StateReason values for NoData/Error mapped states (#76781 ) Fix hardcoded state reasons	2023-10-18 17:26:41 -05:00
Kyle Brandt	1df4d332c9	SSE: Use errutil to show better error messages in prod (#71658 ) - include public message - propagate data source query errors so they are shown as well, which fixes #70026	2023-07-21 06:38:29 -04:00
George Robinson	815e98ed95	Alerting: Add debug logs for EndsAt timestamp (#70336 ) This commit adds debug logs for previous_ends_at and next_ends_at to state.go to help us debug issues where alerts are resolved in Alertmanager due to expiration. This change is in response to a support escalation where this information was needed but unavailable.	2023-06-20 12:13:38 +03:00
Matthew Jacobson	ba3994d338	Alerting: Repurpose rule testing endpoint to return potential alerts (#69755 ) * Alerting: Repurpose rule testing endpoint to return potential alerts This feature replaces the existing no-longer in-use grafana ruler testing API endpoint /api/v1/rule/test/grafana. The new endpoint returns a list of potential alerts created by the given alert rule, including built-in + interpolated labels and annotations. The key priority of this endpoint is that it is intended to be as true as possible to what would be generated by the ruler except that the resulting alerts are not filtered to only Resolved / Firing and ready to be sent. This means that the endpoint will, among other things: - Attach static annotations and labels from the rule configuration to the alert instances. - Attach dynamic annotations from the datasource to the alert instances. - Attach built-in labels and annotations created by the Grafana Ruler (such as alertname and grafana_folder) to the alert instances. - Interpolate templated annotations / labels and accept allowed template functions.	2023-06-08 18:59:54 -04:00
Matthew Jacobson	b9dc04139a	Alerting: Respect "For" Duration for NoData alerts (#65574 ) * Alerting: Respect "For" Duration for NoData alerts This change modifies `resultNoData` to be more in line with the logic of the other state handlers. The main effects of this are: 1) NoData states with NoDataState config set to Alerting will respect "For" duration. 2) Prevents zero value in StartsAt and EndsAt for alerts that have only ever been in normal state. This includes state transitions from NoDataState=OK and ExecErrState=OK. 3) Better state transition logging.	2023-03-31 19:05:15 +03:00
Yuri Tseretyan	9d57b1c72e	Alerting: Do not persist noop transition from Normal state. (#61201 ) * add feature flag `alertingNoNormalState` * update instance database to support exclusion of state in list operation * do not save normal state and delete transitions to normal * update get methods to filter out normal state	2023-01-13 18:29:29 -05:00
Alexander Weaver	b289b8ac6e	Alerting: Set error annotation on EvaluationError regardless of underlying error type (#61506 ) Set error annotation regardless of underlying error type	2023-01-13 13:58:02 -06:00
George Robinson	76601f3ae7	Alerting: Better define how we set states (#59977 ) This commit better defines how we set states in resultNormal, resultAlerting, resultError and resultNoData. It changes the existing code to call methods such as SetAlerting, SetPending, SetNormal, SetError and NoData instead of assigning values to each individual field whenever the state is changed. This should make it easier to understand what fields should be set for which states and avoid cases where states are missing, or have additional unexpected fields.	2022-12-08 20:12:13 +00:00
George Robinson	6359dab040	Alerting: Change resultError in preparation for supporting ForError duration (#59894 )	2022-12-07 10:45:56 +00:00
George Robinson	3c249e1b99	Fix incorrect start time for DatasourceError alerts (#59903 )	2022-12-06 18:44:06 +00:00
Yuri Tseretyan	a85adeed96	Alerting: Update state history service to filter states transitions (#58863 ) * rename the method to better reflect its behavior * make historian filter transition on itself * call historian with all changes	2022-12-06 12:33:15 -05:00
Alexander Weaver	2bfdda5b68	Alerting: Break dependency between state and image packages (#58381 ) * Refactor state and manager to not depend directly on image interface * Move generic errors to models package * Move NotAvailableImageService to state as its only references are in state tests * Move NoopImageService to state package * Move mock to state package * Fix linter error * Fix comment styling * Fix a couple added references introduced by rebase * Empty commit to kick build	2022-11-09 15:06:49 -06:00
George Robinson	1290951b65	Alerting: Small improvements to staleResultsHandler (#58007 )	2022-11-09 11:08:32 +00:00
Yuri Tseretyan	623de12e35	Alerting: Create AlertInstanceKey in one place (#58278 ) * use method GetAlertInstanceKey * do not add key if error	2022-11-07 09:35:29 -05:00
Alexander Weaver	cc8c1380e2	Alerting: Persist annotations from multidimensional rules in batches (#56575 ) * Reduce piecemeal state fields * Read data directly off state instead of rule * Unify state and context into single struct * Expose contextual information to layer above setNextState * Work in terms of ContextualState and call historian in batches * Call annotations service in batches * Export format state and reason and remove workaround in unrelated test package * Add new method to annotation service for batch inserting * Fix loop variable aliasing bug caught by linter, didn't change behavior * Incl timerange on annotation tests * Insert one at a time if tags are present * Point to rule from ContextualState rather than copy fields * Build annotations and copy data prior to starting goroutine * Rename to StateTransition * Use new bulk-insert utility * Remove rule from StateTransition and pass in directly to historian * Simplify annotations logic since we have only one rule * Fix logs and context, nilcheck, simplify method name * Regenerate mock	2022-11-04 10:39:26 -05:00
George Robinson	215ffee437	Alerting: Fix screenshot is not taken for stale series (#57982 )	2022-11-02 22:14:22 +00:00
George Robinson	52965de369	Alerting: Add doc comments to state struct and normalize fields (#56647 )	2022-10-11 09:30:33 +01:00
George Robinson	802d67eeca	Alerting: Support values in notification templates (#56457 ) We have received a lot of feedback regarding the ValueString in alert notifications. Perhaps one of the most frequent complaints about ValueString is that it is difficult to read because it contains a lot of information, and the information is shown as a JSON-like string. Users have often asked how it can be templated and the answer is that it can't. Until now users have been able to add custom annotations to their alert rules which contain values via the $values variable added in previous versions of Grafana. However, these custom annotations must be added for each of the user's alert rules, instead of once in a template that all of their alerts can be notified via. This commit then adds the much-requested feature to support values in notification templates. Users can then create a single template that prints the annotations, labels and values of their alerts in a format of their choice!	2022-10-10 13:40:21 +01:00
George Robinson	5561f935e6	Alerting: Fix send resolved notifications (#54793 ) This commit fixes a bug where we did not send resolved alerts to Alertmanager for resolved alert instances. This meant that resolved notifications did not have the annotations from the resolved state, and as a result also did not have the resolved screenshot.	2022-09-15 17:25:05 +01:00
Yuriy Tseretyan	9f90a7b54d	Alerting: State manager to use InstanceStore (#53852 ) * move saving the state to state manager when scheduler stops * move saving state to ProcessEvalResults * add GetRuleKey to State * add LogContext to AlertRuleKey	2022-08-18 09:40:33 -04:00
George Robinson	34d45977ca	Alerting: Fix bug where state did not change between Alerting and Error (#52204 ) This commit fixes a bug where the state did not change from Alerting to Error if the evaluation result returned an error, or from Error to Alerting if evaluations stopped returning errors.	2022-07-14 10:53:39 +01:00
Joe Blubaugh	9e8efaa459	Alerting: Add stored screenshot utilities to the channels package. (#49470 ) Adds three functions: `withStoredImages` iterates over a list of models.Alerts, extracting a stored image's data from storage, if available, and executing a user-provided function. `withStoredImage` does this for an image attached to a specific alert. `openImage` finds and opens an image file on disk. Moves `store.Image` to `models.Image` Simplifies `channels.ImageStore` interface and updates notifiers that use it to use the simpler methods. Updates all pkg/alert/notifier/channels to use withStoredImage routines.	2022-05-26 13:29:56 +08:00
Joe Blubaugh	1cc034d960	Alerting: Add a "Reason" to Alert Instances to show underlying cause of state. (#49259 ) This change adds a field to state.State and models.AlertInstance that indicates the "Reason" that an instance has its current state. This helps us account for cases where the state is "Normal" but the underlying evaluation returned "NoData" or "Error", for example. Fixes #42606 Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>	2022-05-23 16:49:49 +08:00
Joe Blubaugh	687e79538b	Alerting: Add a general screenshot service and alerting-specific image service. (#49293 ) This commit adds a pkg/services/screenshot package for taking and uploading screenshots of Grafana dashboards. It supports taking screenshots of both dashboards and individual panels within a dashboard, using the rendering service. The screenshot package has the following services, most of which can be composed: BrowserScreenshotService (Takes screenshots with headless Chrome) CachableScreenshotService (Caches screenshots taken with another service such as BrowserScreenshotService) NoopScreenshotService (A no-op screenshot service for tests) SingleFlightScreenshotService (Prevents duplicate screenshots when taking screenshots of the same dashboard or panel in parallel) ScreenshotUnavailableService (A screenshot service that returns ErrScreenshotsUnavailable) UploadingScreenshotService (A screenshot service that uploads taken screenshots) The screenshot package does not support wire dependency injection yet. ngalert constructs its own version of the service. See https://github.com/grafana/grafana/issues/49296 This PR also adds an ImageScreenshotService to ngAlert. This is used to take screenshots with a screenshotservice and then store their location reference for use by alert instances and notifiers.	2022-05-22 22:33:49 +08:00
Yuriy Tseretyan	4b417c8f3e	use NaN if condition value is nil (#48370 )	2022-04-27 15:59:13 -03:00
Yuriy Tseretyan	884c885289	Alerting: Support OK option for Error state (#47670 ) * support OK state for Error	2022-04-13 14:45:29 -04:00
gotjosh	cb6124c921	Alerting: Accurately set value for prom-compatible APIs (#47216 ) * Alerting: Accurately set value for prom-compatible APIs Sets the value fields for the prometheus compatible API based on a combination of condition `refID` and the values extracted from the different frames. * Fix an extra test * Ensure a consistent ordering * Address review comments * address review comments	2022-04-05 19:36:42 +01:00
George Robinson	79769132c0	Alerting: Alert rule should wait For duration when execution error state is Alerting (#47052 ) Alerting: Alert rule should wait For duration when execution error state is Alerting	2022-03-31 09:57:58 +01:00
gotjosh	84e5f336fe	Alerting: Classic conditions can now display multiple values (#46971 ) * Alerting: Extract classic condition values by RefID * uncapitalise function * update documentation * Update pkg/services/ngalert/eval/extract_md.go Co-authored-by: George Robinson <george.robinson@grafana.com> * Update pkg/services/ngalert/state/state.go Co-authored-by: George Robinson <george.robinson@grafana.com> * Update pkg/services/ngalert/state/state.go Co-authored-by: George Robinson <george.robinson@grafana.com> * Update pkg/services/ngalert/eval/extract_md.go Co-authored-by: George Robinson <george.robinson@grafana.com> * Update docs/sources/alerting/unified-alerting/alerting-rules/alert-annotation-label.md Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com> * Update pkg/services/ngalert/eval/extract_md.go Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com> * Run prettier Co-authored-by: George Robinson <george.robinson@grafana.com> Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>	2022-03-29 20:33:03 +01:00
gotjosh	a338c78ca8	Alerting: Remove internal labels from prometheus compatible API responses (#46548 ) * Alerting: Remove internal labels from prometheus compatible API responses * Appease the linter * Fix integration tests * Fix API documentation & linter * move removal of internal labels to the models	2022-03-16 16:04:19 +00:00
George Robinson	789cfc31e3	Alerting: Fix use of > instead of >= when checking the For duration (#46011 )	2022-03-01 17:06:42 +00:00
Yuriy Tseretyan	984c95de63	Do not store EvaluationString in Evaluation. (#44606 ) * do not store evaluation string in Evaluation. * reduce number of buckets to store for a single state	2022-02-02 19:18:20 +01:00
George Robinson	5e2280ceee	Add metrics to ngalert scheduler (#44602 ) This pull request adds metrics to the ngalert scheduler so we can see how long it takes to evaluate a tick.	2022-01-31 16:56:43 +00:00
George Robinson	c932dc959c	Alerting: Add Ref ID to DatasourceNoData and DatasourceError alerts (#42630 )	2021-12-03 09:55:16 +00:00
gotjosh	dd5a2e5128	Alerting: Clear alerting rule evaluation errors after intermittent failures (#42386 ) * Alerting: Clear alerting rule evaluation errors after intermittent failures When an alert transitioned in a way that `alerting -> error -> (alerting|nodata)`, the error provided by the `error` state would never be cleared thus the API and UI would show the health as an error.	2021-11-26 17:58:19 +00:00
George Robinson	1b26d4d88e	Alerting: Create DatasourceError alert if evaluation returns error (#41869 ) * Alerting: Create DatasourceError alert if evaluation returns error * Alerting: Add docs for DatasourceError alert * Alerting: Fix DatasourceError alert does not have dashboard_uid label * Alerting: Add break when datasource_uid found * Alerting: Update TestProcessEvalResults	2021-11-25 11:46:47 +01:00
Yuriy Tseretyan	610643a668	Alerting: Special alert instance if rule is in state NoData (#40540 ) * do not suppress NoData state * extract conversion of state to postable alert + tests * create a special alert instance if nodata * use NoData when converting from Keep Last State instead of Alerting * add silence during migration if NoData is mapped to KeepLastState.	2021-11-04 16:42:34 -04:00
George Robinson	27609dc2c5	Fix alerts with evaluation interval more than 30 seconds resolving in Alertmanager (#39513 )	2021-09-22 14:55:46 +01:00
gotjosh	dd502f22eb	Alerting: Fix alert flapping in the internal alertmanager (#38648 ) * Alerting: Fix alert flapping in the alertmanager fixes a bug that caused Alerts that are evaluated at low intervals (sub 1 minute), to flap in the Alertmanager. Mostly due to a combination of `EndsAt` and resend delay. The Alertmanager uses `EndsAt` as a heuristic to know whenever it should resolve a firing alert, in the case that it hasn't heard back from the alert generation system. Because grafana sent the alert with an `EndsAt` which is equal to the `For` of the alert itself, and we had a hard-coded 1 minute re-send delay (only applicable to firing alerts) this meant that a firing alert would resolve in the Alertmanager before we re-notify that it is still firing. This commit increases the `EndsAt` by 3x the resend delay or alert interval (depending on which one is higher). The resendDelay has been decreased to 30 seconds.	2021-09-02 16:22:59 +01:00
Kyle Brandt	aa904a5a04	NGAlert: Send resolve signal to alertmanager on alerting -> Normal (#37363 )	2021-07-29 20:29:17 +02:00
George Robinson	456dac1303	Expand the value of math and reduce expressions in annotations and labels (#36611 ) * Expand the value of math and reduce expressions in annotations and labels This commit makes it possible to use the values of reduce and math expressions in annotations and labels via their RefIDs. It uses the Stringer interface to ensure that "{{ $values.A }}" still prints the value in decimal format while also making the labels for each RefID available with "{{ $values.A.Labels }}" and the float64 value with "{{ $values.A.Value }}"	2021-07-15 13:10:56 +01:00
David Parrott	19f18bcecc	Alerting: annotation on state change (#36535 ) * WIP * Add annotation on alert state change * move annotation creation to manager * praise the linter! * add debug msg when creating annotation	2021-07-13 09:50:10 -07:00
David Parrott	4732f832f7	Alerting: recalculate EndsAt (#35830 ) * setEndsAt * one more test case * add should clause to tests	2021-06-17 10:01:46 -07:00

59 Commits