grafana

mirror of https://github.com/grafana/grafana.git synced 2024-11-25 18:30:41 -06:00

Author	SHA1	Message	Date
Yuri Tseretyan	1eebd2a4de	Alerting: Support for simplified notification settings in rule API (#81011 ) * Add notification settings to storage\domain and API models. Settings are a slice to workaround XORM mapping * Support validation of notification settings when rules are updated * Implement route generator for Alertmanager configuration. That fetches all notification settings. * Update multi-tenant Alertmanager to run the generator before applying the configuration. * Add notification settings labels to state calculation * update the Multi-tenant Alertmanager to provide validation for notification settings * update GET API so only admins can see auto-gen	2024-02-15 09:45:10 -05:00
Diego Augusto Molina	9c29e1a783	Alerting: Fix data races and improve testing (#81994 ) * Alerting: fix race condition in (ngalert/sender.ExternalAlertmanager).Run Chore: Fix data races when accessing members of ngalert/state.FakeInstanceStore Chore: Fix data races in tests in ngalert/schedule and enable some parallel tests * Chore: fix linters * Chore: add TODO comment to remove loopvar once we move to Go 1.22	2024-02-14 12:45:39 -03:00
Alexander Weaver	99fa064576	Alerting: Emit warning when creating or updating unusually large groups (#82279 ) * Add config for limit of rules per rule group * Warn when editing big groups through normal API * Warn on prov api writes for groups * Wire up comp root, tests * Also add warning to state manager warm * Drop unnecessary conversion	2024-02-13 08:29:03 -06:00
Ryan McKinley	0c6e409350	Chore: Update arrow and prometheus dependencies (#82215 ) * update arrow and prometheus * keep codeowner * use compare * use grafana-plugin-sdk-go v0.210.0 --------- Co-authored-by: ismail simsek <ismailsimsek09@gmail.com>	2024-02-13 01:50:25 +01:00
Dan Cech	790e1feb93	Chore: Update test database initialization (#81673 ) * streamline initialization of test databases, support on-disk sqlite test db * clean up test databases * introduce testsuite helper * use testsuite everywhere we use a test db * update documentation * improve error handling * disable entity integration test until we can figure out locking error	2024-02-09 09:35:39 -05:00
Yuri Tseretyan	47546a4c72	Alerting: Update API to use folders' full paths (#81214 ) * update GetUserVisibleNamespaces to use FolderSeriver * update GetNamespaceByUID to use FolderService.GetFolders * update GetAlertRulesForScheduling to use FolderService.GetFolders * Update API and GetAlertRulesForScheduling to use the folder's full path * get full path of folder in RouteTestGrafanaRuleConfig * fix escaping of titles for MySQL	2024-02-06 17:12:13 -05:00
Jean-Philippe Quéméner	aa25776f81	Alerting: Add a feature flag to periodically save states (#80987 )	2024-01-23 17:03:30 +01:00
Jean-Philippe Quéméner	eb7e1216a1	feat(alerting): add async state persister (#80763 )	2024-01-22 13:07:11 +01:00
Jean-Philippe Quéméner	82638d059f	feat(alerting): add state persister interface (#80384 )	2024-01-17 13:33:13 +01:00
Sofia Papagiannaki	d1dab5828d	Alerting: Update rule API to address folders by UID (#74600 ) * Change ruler API to expect the folder UID as namespace * Update example requests * Fix tests * Update swagger * Modify FIle field in /api/prometheus/grafana/api/v1/rules * Fix ruler export * Modify folder in responses to be formatted as <parent UID>/<title> * Add alerting test with nested folders * Apply suggestion from code review * Alerting: use folder UID instead of title in rule API (#77166) Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com> * Drop a few more latent uses of namespace_id * move getNamespaceKey to models package * switch GetAlertRulesForScheduling to use folder table * update GetAlertRulesForScheduling to return folder titles in format `parent_uid/title`. * fi tests * add tests for GetAlertRulesForScheduling when parent uid * fix integration tests after merge * fix test after merge * change format of the namespace to JSON array this is needed for forward compatibility, when we migrate to full paths * update EF code to decode nested folder --------- Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com> Co-authored-by: Virginia Cepeda <virginia.cepeda@grafana.com> Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com> Co-authored-by: Alex Weaver <weaver.alex.d@gmail.com> Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>	2024-01-17 11:07:39 +02:00
William Wernert	48b5ac779b	Alerting/Annotations: Add annotation backend for Loki alert state history (#78156 ) * Move scope type vars to testutil package * Expose parts of state historian for use in annotation backend * Implement Loki ASH Annotation store This store will only implement the `Get` method of a RepositoryImpl since alert state history writes to Loki elsewhere. * Use interface for Loki HTTP Client * Add tests for Loki ASH Annotation store * Add missing test * Fix lint * Organize tests * Add filter tests * Improve tests * Move filter logic into outer function * Fix lint * Add comment * Fix tests * Fix lint * Rename historian store + refactor * Cleanup historian store * Fix tests * Minor cleanup * Use new `ShouldRecordAnnotation` filter * Fix logic and add tests for this check * Fix typos, remove unused variables, `< 1` -> `== 0` * More closely mimic RBAC filter from xorm to ensure correct logic * Move off weaveworks client * Address PR comments	2024-01-10 18:42:35 -05:00
Santiago	9e78faa7ba	Alerting: Add metrics to the remote Alertmanager struct (#79835 ) * Alerting: Add metrics to the remote Alertmanager struct * rephrase http_requests_failed description * make linter happy * remove unnecessary metrics * extract timed client to separate package * use histogram collector from dskit * remove weaveworks dependency * capture metrics for all requests to the remote Alertmanager (both clients) * use the timed client in the MimirAuthRoundTripper * HTTPRequestsDuration -> HTTPRequestDuration, clean up mimir client factory function * refactor * less git diff * gauge for last readiness check in seconds * initialize LastReadinesCheck to 0, tweak metric names and descriptions * add counters for sync attempts/errors * last config sync and last state sync timestamps (gauges) * change latency metric name * metric for remote Alertmanager mode * code review comments * move label constants to metrics package	2024-01-10 11:18:24 +01:00
Matthew Jacobson	1d4419fbe4	Alerting: Fix NoData & Error alerts not resolving when rule is reset (#80184 ) * Alerting: Fix NoData & Error alerts not resolving when rule is reset On rule reset, when creating the PostableAlerts StateToPostableAlert did not attach the correct NoData/Error alertname and rulename labels to expire/resolve the active alerts when the previous cached state was NoData/Error.	2024-01-09 14:47:19 -05:00
Alexander Weaver	a8fb01a502	Swap weaveworks/common utilities for equivalents in grafana/dskit (#80051 ) * Replace histogram collector and grpc injectors * Extract request timing utility * Also vendor test file * Suppress erroneous linter warn	2024-01-05 10:08:38 -06:00
Alexander Weaver	90d4704cd7	Alerting: Fix URL timestamp conversion in historian API in annotation mode (#80026 ) Fix timestamp conversion when calling annotation store	2024-01-04 12:40:21 -06:00
Yuri Tseretyan	f6a46744a6	Alerting: Support hysteresis command expression (#75189 ) Backend: * Update the Grafana Alerting engine to provide feedback to HysteresisCommand. The feedback information is stored in state.Manager as a fingerprint of each state. The fingerprint is persisted to the database. Only fingerprints that belong to Pending and Alerting states are considered as "loaded" and provided back to the command. - add ResultFingerprint to state.State. It's different from other fingerprints we store in the state because it is calculated from the result labels. - add rule_fingerprint column to alert_instance - update alerting evaluator to accept AlertingResultsReader via context, and update scheduler to provide it. - add AlertingResultsFromRuleState that implements the new interface in eval package - update getExprRequest to patch the hysteresis command. * Only one "Recovery Threshold" query is allowed to be used in the alert rule and it must be the Condition. Frontend: * Add hysteresis option to Threshold in UI. It's called "Recovery Threshold" * Add test for getUnloadEvaluatorTypeFromCondition * Hide hysteresis in panel expressions * Refactor isInvalid and add test for it * Remove unnecesary React.memo * Add tests for updateEvaluatorConditions --------- Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>	2024-01-04 11:47:13 -05:00
Alexander Weaver	65ecde6eed	Alerting: Don't record annotations for mapped NoData transitions, when NoData is mapped to OK (#77164 ) * Exclude mapped nodata transitions when nodata mapped to OK * Fix processEvalResults test * Don't check NoDataState when filtering transition * Add comment to explain purpose of separate function --------- Co-authored-by: William Wernert <william.wernert@grafana.com>	2023-12-18 16:59:32 -05:00
William Wernert	9171bf92bb	Alerting: Add rule ID and title to alert state history Loki entry (#79481 ) * Add rule ID and title to Loki entry * Combine related tests	2023-12-14 13:06:23 -05:00
William Wernert	f7bf818527	Alerting: Make alert state history Loki http client public (#78291 ) * Make state history Loki client public * Make historian metrics subsystem configurable	2023-11-27 09:20:50 -05:00
Ryan McKinley	5d5f8dfc52	Chore: Upgrade Go to 1.21.3 (#77304 )	2023-11-01 09:17:38 -07:00
Alexander Weaver	6ee52ac80c	Alerting: Allow more time before Alertmanager expire-resolves alerts (#77094 ) * Sync endsAt factor with prometheus * Fix state tests	2023-10-25 10:03:46 -05:00
Alexander Weaver	acee3efcf9	Alerting: Use common StateReason values for NoData/Error mapped states (#76781 ) Fix hardcoded state reasons	2023-10-18 17:26:41 -05:00
Marcus Efraimsson	e4c1a7a141	Tracing: Standardize on otel tracing (#75528 )	2023-10-03 14:54:20 +02:00
gotjosh	e877174501	Alerting: Expose metrics for Alertmanager Alerts - `grafana_alerting_alertmanager_alerts` (#75802 ) * Alerting: Expose metrics for Alertmanager Alerts In Grafana, the alert evaluation and alert delivery are combined. We've always used a metric named `grafana_alerting_alerts` to get a sense of what are the alerts that are currently firing (these come from the evaluation side) and opted to not map the alertmanager alerts metric directly. I think it's important that we make a distinction between alerts that happen at evaluation vs alerts that are received for delivery by the internal Alertmanager as we have options to skip the delivery of these alerts to the internal alertmanager altogether.	2023-10-02 16:36:23 +01:00
George Robinson	ed7d29f2b9	Alerting: Migrate old alerting templates to Go templates (#62911 ) * Migrate old alerting templates to use $labels * Fix imports * Add test coverage and separate rewriting to Go templates * Fix lint * Check for additional closing braces * Add logging of invalid message templates * Fix tests * Small fixes * Update comments * Panic on empty token * Use logtest.Fake * Fix lint * Allow for spaces in variable names by not tokenizing spaces * Add template function to deduplicate Labels in a Value map * Fix behavior of mapLookupString * Reference deduplicated labels in migrated message template * Fix behavior of deduplicateLabelsFunc * Don't create variable for parent logger * Add more tests for deduplicateLabelsFunc * Remove unused function * Apply suggestions from code review Co-authored by: Yuri Tseretyan <yuriy.tseretyan@grafana.com> * Give label val merge function better name * Extract template migration and escape literal tokens * Consolidate + simplify template migration --------- Co-authored-by: William Wernert <william.wernert@grafana.com>	2023-10-02 11:25:33 -04:00
gotjosh	59694fb2be	Alerting: Don't use a separate collection system for metrics (#75296 ) * Alerting: Don't use a separate collection system for metrics The state package had a metric collection system that ran every 15s updating the values of the metrics - there is a common pattern for this in the Prometheus ecosystem called "collectors". I have removed the behaviour of using a time-based interval to "set" the metrics in favour of a set of functions as the "value" that get called at scrape time.	2023-09-25 10:27:30 +01:00
Steve Simpson	894f420014	Alerting: Pass loggers into SchedulerCfg and ManagerCfg. (#75158 )	2023-09-20 15:07:02 +02:00
Ryan McKinley	025b2f3011	Chore: use any rather than interface{} (#74066 )	2023-08-30 18:46:47 +03:00
Yuri Tseretyan	938e26b59f	Alerting: Add new metrics and tracings to state manager and scheduler (#71398 ) * add metrics and tracing to state manager * propagate tracer to state manager * add scheduler metrics * fix backtesting * add test for state metrics * remove StateUpdateCount * update docs * metrics can be null * add tracer to new tests	2023-08-16 09:04:18 +02:00
Yuri Tseretyan	0717ec11d6	Alerting: Update state manager to change all current states in the case when Error\NoData is executed as Ok\Normal (#68142 )	2023-08-15 10:27:15 -04:00
Yuri Tseretyan	69c8200fc9	Alerting: Add more tests for state manager ProcessEvalResults (#73019 ) Co-authored-by: Matthew Jacobson <matthew.jacobson@grafana.com>	2023-08-09 12:21:12 -04:00
Yuri Tseretyan	0053b07885	Alerting: Refactor of state manager tests (#72849 ) * calculate cacheID instead of literals * use mocked clocks * advance clocks with the eval results * use clearer timestamp aliases * make expected state labels be more clear to read Co-authored-by: Matthew Jacobson <matthew.jacobson@grafana.com>	2023-08-04 13:39:49 -04:00
Yuri Tseretyan	78fc3bcdf4	Alerting: Fix state manager to not keep datasource_uid and ref_id labels in state after Error (#72216 )	2023-07-26 11:41:46 -04:00
Alexander Weaver	8c8b3ecb5b	Alerting: Add dashboardUID and panelID query parameters for loki state history (#72119 ) * read query parameters * Generate loki query from params	2023-07-24 23:46:46 -05:00
Kyle Brandt	1df4d332c9	SSE: Use errutil to show better error messages in prod (#71658 ) - include public message - propagate data source query errors so they are shown as well to which fixes #70026	2023-07-21 06:38:29 -04:00
Alexander Weaver	ff48a145cc	Alerting: Add exported getters for PanelKey fields (#72064 ) Add getters	2023-07-20 15:47:20 -05:00
Alexander Weaver	d6db9a5b3c	Alerting: Add exported constructor for panelKey (#71872 ) Exported constructor for panelKey	2023-07-18 13:37:43 -05:00
Alexander Weaver	18b910e654	Alerting: Refactor annotation historian to isolate dashboard service dependency (#71689 ) * Refactor annotation historian to isolate dashboard service dependency * Export PanelKey * Don't export parsePanelKey * Remove commented out code	2023-07-18 08:18:55 -05:00
Yuri Tseretyan	64aa5465ac	Alerting: do not expand template for labels\annotations if value is not a template (#71492 )	2023-07-12 14:53:40 -04:00
Alexander Weaver	f94fb765b5	Alerting: Add limit query parameter to Loki-based ASH api, drop default limit from 5000 to 1000, extend visible time range for new ASH UI (#70769 ) * Add limit query parameter * Drop copy paste comment * Extend history query limit to 30 days and 250 entries * Fix history log entries ordering * Update no history message, add empty history test --------- Co-authored-by: Konrad Lalik <konrad.lalik@grafana.com>	2023-06-28 13:32:28 -05:00
George Robinson	594c851d4b	Alerting: Add duration to saving alert states done (#70844 )	2023-06-28 15:19:21 +01:00
William Wernert	4aa477f48f	Alerting: Move rule UID from Loki stream labels into log lines (#70637 ) Move rule uid into log line to reduce cardinality	2023-06-26 09:57:45 -04:00
George Robinson	7edbe72483	Alerting: Support concurrent queries for saving alert instances (#70525 ) This commit adds support for concurrent queries when saving alert instances to the database. This is an experimental feature in response to some customers experiencing delays between rule evaluation and sending alerts to Alertmanager, resulting in flapping. It is disabled by default.	2023-06-23 11:36:07 +01:00
Santiago	d3bb9fbbaf	Alerting: Use only token for images in notifications (#70196 ) * Alerting: Use only tokens for images in notifications * update tests * make linter and modfile validator happy	2023-06-21 20:53:45 -03:00
Alexander Weaver	ce6f73bd32	Alerting: Add two missing tests which cover missing URLs for Loki state history (#70460 ) Add two missing tests which cover individual missing URLs	2023-06-21 12:58:37 -05:00
George Robinson	8a13ee3cd4	Alerting: Add debug logs when saving instances is finished (#70447 )	2023-06-21 14:19:04 +02:00
George Robinson	815e98ed95	Alerting: Add debug logs for EndsAt timestamp (#70336 ) This commit adds debug logs for previous_ends_at and next_ends_at to state.go to help us debug issues where alerts are resolved in Alertmanager due to expiration. This change is in response to a support escalation where this information was needed but unavailable.	2023-06-20 12:13:38 +03:00
SatVeer Singh	1bfa3a0f1e	Chore: Replace go-multierror with errors package (#66432 ) * code refactor and type assertions added to tests * no-lint rule added for specific line	2023-06-19 12:29:45 +03:00
Yuri Tseretyan	baffe83da6	Alerting: Improve performance of cache.getOrCreate (#63909 ) * move expansion of labels and annotations outside of mutex lock * propagate struct but not pointer	2023-06-15 09:37:47 -04:00
Santiago	ff3e028a85	Alerting: Add image URI annotation only when there's an image (#69825 ) * Alerting: Add image URI annotation only when there's an image * fix function name (changed on main branch)	2023-06-09 10:59:24 -03:00

256 Commits