grafana

mirror of https://github.com/grafana/grafana.git synced 2025-02-12 00:25:46 -06:00

Author	SHA1	Message	Date
Marcus Efraimsson	e4c1a7a141	Tracing: Standardize on otel tracing (#75528 )	2023-10-03 14:54:20 +02:00
gotjosh	e877174501	Alerting: Expose metrics for Alertmanager Alerts - `grafana_alerting_alertmanager_alerts` (#75802 ) * Alerting: Expose metrics for Alertmanager Alerts In Grafana, the alert evaluation and alert delivery are combined. We've always used a metric named `grafana_alerting_alerts` to get a sense of which alerts are currently firing (these come from the evaluation side) and opted not to map the Alertmanager alerts metric directly. I think it's important that we make a distinction between alerts that happen at evaluation and alerts that are received for delivery by the internal Alertmanager, as we have options to skip the delivery of these alerts to the internal Alertmanager altogether.	2023-10-02 16:36:23 +01:00
George Robinson	ed7d29f2b9	Alerting: Migrate old alerting templates to Go templates (#62911 ) * Migrate old alerting templates to use $labels * Fix imports * Add test coverage and separate rewriting to Go templates * Fix lint * Check for additional closing braces * Add logging of invalid message templates * Fix tests * Small fixes * Update comments * Panic on empty token * Use logtest.Fake * Fix lint * Allow for spaces in variable names by not tokenizing spaces * Add template function to deduplicate Labels in a Value map * Fix behavior of mapLookupString * Reference deduplicated labels in migrated message template * Fix behavior of deduplicateLabelsFunc * Don't create variable for parent logger * Add more tests for deduplicateLabelsFunc * Remove unused function * Apply suggestions from code review Co-authored by: Yuri Tseretyan <yuriy.tseretyan@grafana.com> * Give label val merge function better name * Extract template migration and escape literal tokens * Consolidate + simplify template migration --------- Co-authored-by: William Wernert <william.wernert@grafana.com>	2023-10-02 11:25:33 -04:00
gotjosh	59694fb2be	Alerting: Don't use a separate collection system for metrics (#75296 ) * Alerting: Don't use a separate collection system for metrics The state package had a metric collection system that ran every 15s updating the values of the metrics - there is a common pattern for this in the Prometheus ecosystem called "collectors". I have removed the behaviour of using a time-based interval to "set" the metrics in favour of a set of functions as the "value" that get called at scrape time.	2023-09-25 10:27:30 +01:00
Steve Simpson	894f420014	Alerting: Pass loggers into SchedulerCfg and ManagerCfg. (#75158 )	2023-09-20 15:07:02 +02:00
Ryan McKinley	025b2f3011	Chore: use any rather than interface{} (#74066 )	2023-08-30 18:46:47 +03:00
Yuri Tseretyan	938e26b59f	Alerting: Add new metrics and tracings to state manager and scheduler (#71398 ) * add metrics and tracing to state manager * propagate tracer to state manager * add scheduler metrics * fix backtesting * add test for state metrics * remove StateUpdateCount * update docs * metrics can be null * add tracer to new tests	2023-08-16 09:04:18 +02:00
Yuri Tseretyan	0717ec11d6	Alerting: Update state manager to change all current states in the case when Error/NoData is executed as OK/Normal (#68142 )	2023-08-15 10:27:15 -04:00
Yuri Tseretyan	69c8200fc9	Alerting: Add more tests for state manager ProcessEvalResults (#73019 ) Co-authored-by: Matthew Jacobson <matthew.jacobson@grafana.com>	2023-08-09 12:21:12 -04:00
Yuri Tseretyan	0053b07885	Alerting: Refactor of state manager tests (#72849 ) * calculate cacheID instead of literals * use mocked clocks * advance clocks with the eval results * use clearer timestamp aliases * make expected state labels be more clear to read Co-authored-by: Matthew Jacobson <matthew.jacobson@grafana.com>	2023-08-04 13:39:49 -04:00
Yuri Tseretyan	78fc3bcdf4	Alerting: Fix state manager to not keep datasource_uid and ref_id labels in state after Error (#72216 )	2023-07-26 11:41:46 -04:00
Alexander Weaver	8c8b3ecb5b	Alerting: Add dashboardUID and panelID query parameters for loki state history (#72119 ) * read query parameters * Generate loki query from params	2023-07-24 23:46:46 -05:00
Kyle Brandt	1df4d332c9	SSE: Use errutil to show better error messages in prod (#71658 ) - include public message - propagate data source query errors so they are shown as well, which fixes #70026	2023-07-21 06:38:29 -04:00
Alexander Weaver	ff48a145cc	Alerting: Add exported getters for PanelKey fields (#72064 ) Add getters	2023-07-20 15:47:20 -05:00
Alexander Weaver	d6db9a5b3c	Alerting: Add exported constructor for panelKey (#71872 ) Exported constructor for panelKey	2023-07-18 13:37:43 -05:00
Alexander Weaver	18b910e654	Alerting: Refactor annotation historian to isolate dashboard service dependency (#71689 ) * Refactor annotation historian to isolate dashboard service dependency * Export PanelKey * Don't export parsePanelKey * Remove commented out code	2023-07-18 08:18:55 -05:00
Yuri Tseretyan	64aa5465ac	Alerting: do not expand template for labels/annotations if value is not a template (#71492 )	2023-07-12 14:53:40 -04:00
Alexander Weaver	f94fb765b5	Alerting: Add limit query parameter to Loki-based ASH api, drop default limit from 5000 to 1000, extend visible time range for new ASH UI (#70769 ) * Add limit query parameter * Drop copy paste comment * Extend history query limit to 30 days and 250 entries * Fix history log entries ordering * Update no history message, add empty history test --------- Co-authored-by: Konrad Lalik <konrad.lalik@grafana.com>	2023-06-28 13:32:28 -05:00
George Robinson	594c851d4b	Alerting: Add duration to saving alert states done (#70844 )	2023-06-28 15:19:21 +01:00
William Wernert	4aa477f48f	Alerting: Move rule UID from Loki stream labels into log lines (#70637 ) Move rule uid into log line to reduce cardinality	2023-06-26 09:57:45 -04:00
George Robinson	7edbe72483	Alerting: Support concurrent queries for saving alert instances (#70525 ) This commit adds support for concurrent queries when saving alert instances to the database. This is an experimental feature in response to some customers experiencing delays between rule evaluation and sending alerts to Alertmanager, resulting in flapping. It is disabled by default.	2023-06-23 11:36:07 +01:00
Santiago	d3bb9fbbaf	Alerting: Use only token for images in notifications (#70196 ) * Alerting: Use only tokens for images in notifications * update tests * make linter and modfile validator happy	2023-06-21 20:53:45 -03:00
Alexander Weaver	ce6f73bd32	Alerting: Add two missing tests which cover missing URLs for Loki state history (#70460 ) Add two missing tests which cover individual missing URLs	2023-06-21 12:58:37 -05:00
George Robinson	8a13ee3cd4	Alerting: Add debug logs when saving instances is finished (#70447 )	2023-06-21 14:19:04 +02:00
George Robinson	815e98ed95	Alerting: Add debug logs for EndsAt timestamp (#70336 ) This commit adds debug logs for previous_ends_at and next_ends_at to state.go to help us debug issues where alerts are resolved in Alertmanager due to expiration. This change is in response to a support escalation where this information was needed but unavailable.	2023-06-20 12:13:38 +03:00
SatVeer Singh	1bfa3a0f1e	Chore: Replace go-multierror with errors package (#66432 ) * code refactor and type assertions added to tests * no-lint rule added for specific line	2023-06-19 12:29:45 +03:00
Yuri Tseretyan	baffe83da6	Alerting: Improve performance of cache.getOrCreate (#63909 ) * move expansion of labels and annotations outside of mutex lock * propagate struct but not pointer	2023-06-15 09:37:47 -04:00
Santiago	ff3e028a85	Alerting: Add image URI annotation only when there's an image (#69825 ) * Alerting: Add image URI annotation only when there's an image * fix function name (changed on main branch)	2023-06-09 10:59:24 -03:00
Matthew Jacobson	ba3994d338	Alerting: Repurpose rule testing endpoint to return potential alerts (#69755 ) * Alerting: Repurpose rule testing endpoint to return potential alerts This feature replaces the existing no-longer in-use grafana ruler testing API endpoint /api/v1/rule/test/grafana. The new endpoint returns a list of potential alerts created by the given alert rule, including built-in + interpolated labels and annotations. The key priority of this endpoint is that it is intended to be as true as possible to what would be generated by the ruler except that the resulting alerts are not filtered to only Resolved / Firing and ready to be sent. This means that the endpoint will, among other things: - Attach static annotations and labels from the rule configuration to the alert instances. - Attach dynamic annotations from the datasource to the alert instances. - Attach built-in labels and annotations created by the Grafana Ruler (such as alertname and grafana_folder) to the alert instances. - Interpolate templated annotations / labels and accept allowed template functions.	2023-06-08 18:59:54 -04:00
Santiago	b0881daf23	Alerting: Use URLs in image annotations (#66804 ) * use tokens or urls in image annotations * improve tests, fix some comments * fix empty tokens * code review changes, check for url before checking for token (support old token formats)	2023-04-26 13:06:18 -03:00
Alexander Weaver	3634079b8f	Alerting: Attach hash of instance labels to state history log lines (#65968 ) * Add instanceID which is hash of labels * Rename field to fingerprint * Move to prometheus style signature * Appease linter	2023-04-19 14:22:19 -05:00
Alexander Weaver	a384194e15	Alerting: Use default page size of 5000 when querying Loki for state history (#66315 ) Always specify limit of 5000	2023-04-18 14:31:29 -05:00
Alexander Weaver	cf7157f683	Alerting: Capture refID of rule's condition expression in Loki state history entries (#66419 ) * Capture condition from rule * Add test	2023-04-18 14:21:28 -05:00
Matthew Jacobson	63187fae0c	Alerting: Remove and revert flag alertingBigTransactions (#65976 ) * Alerting: Remove and revert flag alertingBigTransactions This is a partial revert of #56575 and a removal of the `alertingBigTransactions` flag. Real-world use has seen no clear performance incentive to maintain this flag. Lowered db connection count came at the cost of significant increase in CPU usage and query latency. * Fix lint backend * Removed last bits of alertingBigTransactions --------- Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>	2023-04-06 18:06:25 +02:00
Alexander Weaver	fb520edd72	Alerting: Use a completely isolated context for state history writes (#64989 ) * Add fresh context with timeout and same log properties, re-derive logger * Unify timeout constants * Move ctx after shortcut that got added through rebasing * Unify timeouts * Port opentracing's SpanFromContext and ContextFromSpan to the grafana tracing package * Support both opentracing and otel variants * Better document why we're creating a new ctx * Add new func to FakeSpan which was added after rebase * Support grafana-specific traceID key in both tracer implementations	2023-04-04 16:41:46 -05:00
Alexander Weaver	da4832724e	Alerting: Delete stub for SQL alert state history backend (#65667 ) Delete stub for SQL backend	2023-03-31 11:15:56 -05:00
Matthew Jacobson	b9dc04139a	Alerting: Respect "For" Duration for NoData alerts (#65574 ) * Alerting: Respect "For" Duration for NoData alerts This change modifies `resultNoData` to be more in line with the logic of the other state handlers. The main effects of this are: 1) NoData states with NoDataState config set to Alerting will respect "For" duration. 2) Prevents zero value in StartsAt and EndsAt for alerts that have only ever been in normal state. This includes state transitions from NoDataState=OK and ExecErrState=OK. 3) Better state transition logging.	2023-03-31 19:05:15 +03:00
Steve Simpson	04336d53a9	Alerting: Update prometheus version (#65688 )	2023-03-31 16:34:35 +02:00
Yuri Tseretyan	622c23716a	Alerting: Use logger with context in the state cache (#65663 )	2023-03-31 10:11:30 -04:00
Alexander Weaver	5e87ea745d	Alerting: Fix and re-enable `filters instance labels in log line` test (#65618 ) Fix and reenable test	2023-03-30 09:02:18 -05:00
Dimitris Sotirakis	e758b017d0	Alerting: Disable `filters instance labels in log line` test (#65610 ) * Disable filters instance labels in log line test * Add drone reference	2023-03-30 16:04:29 +03:00
Alexander Weaver	a416100abc	Alerting: No longer index state history log streams by instance labels (#65474 ) * Remove private labels * No longer index by instance labels * Labels are now invariant, only build them once * Remove bucketing since everything is in a single stream * Refactor statesToStreams to only return a single unified log stream * Don't query on labels that no longer exist * Move selector logic to loki layer, genericize client to work in terms of straight logQL * Add support for line-level label filters in query * Combine existing selector tests for better parallelism * Tests for logQL construction * Underscore instead of dot for unwrapping labels in logql	2023-03-29 11:52:11 -05:00
Alexander Weaver	de1637afe5	Alerting: Add alert instance labels to Loki log lines in addition to stream labels (#65403 ) Add instance labels to log line	2023-03-28 08:57:51 -05:00
Alexander Weaver	dd04757fc9	Alerting: Add "backend" label to state history writes metrics (#65395 ) * Add backend label to state history writes metrics * Update test expectations	2023-03-28 08:49:51 -05:00
Serge Zaitsev	0beb768427	Chore: Remove result fields from ngalert (#65410 ) * remove result fields from ngalert * remove duplicate imports	2023-03-28 10:34:35 +02:00
Alexander Weaver	07368dec74	Alerting: Fix attachment of external labels to Loki state history log streams (#65140 ) Fix attachment of external labels, add tests	2023-03-21 18:00:59 -05:00
Alexander Weaver	bf54f2672e	Alerting: Switch to snappy-compressed-protobuf for outgoing push requests to Loki (#65077 ) * Encode with snappy, always * JSON encoder type * Headers * Copy labels formatter from promtail * Implement snappy-proto encoding * Create encoder interface, test both encoders, choose snappy-proto by default * Make encoder configurable at the LokiCfg level * Export both encoders * Touch up comment and tests * Drop unnecessary conversions after move to plain strings to appease linter	2023-03-21 13:38:42 -05:00
Alexander Weaver	cc7e5ce62e	Alerting: Fix ambiguous handling of equals in labels when bucketing Loki state history streams (#65013 ) * Use JSON instead of data.Labels string format as label repr * Drop debug log line	2023-03-21 12:33:27 -05:00
Alexander Weaver	e39d7f44c9	Alerting: Elide requests to Loki if nothing should be recorded (#65011 ) Exit early if no log streams or annotations	2023-03-21 09:30:56 -05:00
Alexander Weaver	40c5713cbd	Vendor errors.Join from Go standard library to avoid version incompatibilities (#64985 ) Vendor errors.Join from std lib	2023-03-17 14:07:58 -05:00

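Commit #75296 replaces a loop that set metric values every 15 seconds with values computed when Prometheus scrapes. The program below is a standard-library-only sketch of that collector pattern, not Grafana's code: `registry`, `stateCache`, and the metric names are hypothetical. In the Prometheus Go client the equivalent building block is `prometheus.NewGaugeFunc`, which takes `GaugeOpts` and a `func() float64` that is called on each collection.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// gaugeFunc pairs a metric name with a function that computes its value on demand.
type gaugeFunc struct {
	name  string
	value func() float64
}

// registry (hypothetical) holds gauge functions and evaluates them only when scraped.
type registry struct {
	mu     sync.Mutex
	gauges []gaugeFunc
}

func (r *registry) register(name string, f func() float64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.gauges = append(r.gauges, gaugeFunc{name: name, value: f})
}

// scrape calls every value function at request time, so the exposed numbers
// reflect current state rather than the last tick of a background loop.
func (r *registry) scrape() string {
	r.mu.Lock()
	defer r.mu.Unlock()
	lines := make([]string, 0, len(r.gauges))
	for _, g := range r.gauges {
		lines = append(lines, fmt.Sprintf("%s %g", g.name, g.value()))
	}
	sort.Strings(lines)
	return strings.Join(lines, "\n")
}

// stateCache is a hypothetical stand-in for the alert state cache.
type stateCache struct {
	mu     sync.RWMutex
	states map[string]string // alert instance -> state
}

// countState counts instances in the given state under a read lock.
func (c *stateCache) countState(want string) float64 {
	c.mu.RLock()
	defer c.mu.RUnlock()
	n := 0
	for _, s := range c.states {
		if s == want {
			n++
		}
	}
	return float64(n)
}

func main() {
	cache := &stateCache{states: map[string]string{"a": "Alerting", "b": "Normal"}}
	reg := &registry{}
	reg.register(`alerts{state="alerting"}`, func() float64 { return cache.countState("Alerting") })
	reg.register(`alerts{state="normal"}`, func() float64 { return cache.countState("Normal") })
	fmt.Println(reg.scrape())

	// A change to the cache shows up at the next scrape with no timer involved.
	cache.mu.Lock()
	cache.states["b"] = "Alerting"
	cache.mu.Unlock()
	fmt.Println(reg.scrape())
}
```

The trade-off is that values are never stale, but every scrape pays the cost of computing them, so value functions should be cheap and safe to call concurrently with state updates.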

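Commit #65968 attaches a fingerprint of the instance labels to each state history log line, using a "prometheus style signature". The sketch below is modeled on the label-signature scheme in Prometheus's `common/model` package (64-bit FNV-1a over label names in sorted order, with each name and value followed by a 0xff separator byte). It is an illustration only; the function name `fingerprint` is hypothetical and Grafana's actual implementation may differ.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// separator is a byte that never occurs in valid UTF-8, so it cleanly
// delimits label names and values inside the hashed byte stream.
const separator = 0xff

// fingerprint hashes a label set with 64-bit FNV-1a. Names are sorted first,
// so the result does not depend on map iteration order.
func fingerprint(labels map[string]string) uint64 {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)

	h := fnv.New64a()
	for _, name := range names {
		h.Write([]byte(name))
		h.Write([]byte{separator})
		h.Write([]byte(labels[name]))
		h.Write([]byte{separator})
	}
	return h.Sum64()
}

func main() {
	a := map[string]string{"alertname": "HighCPU", "instance": "host-1"}
	c := map[string]string{"alertname": "HighCPU", "instance": "host-2"}

	fmt.Printf("%016x\n", fingerprint(a))
	// A differing label value produces a different fingerprint.
	fmt.Println(fingerprint(a) == fingerprint(c))
}
```

Because the separator marks every name/value boundary, label sets such as `{ab="c"}` and `{a="bc"}` hash differently even though their concatenated text is the same.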
234 Commits
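Commit #65474 collapses state history into a single Loki stream and moves instance-label matching into line-level filters, using `_` rather than `.` to reference nested fields. A minimal sketch of building such a LogQL query follows, assuming each log line is JSON with the instance labels under a `labels` object; Loki's `json` parser flattens nested keys with `_`, so `labels.instance` becomes the extracted label `labels_instance`. The stream label names, the `labels` field, and `buildQuery` are hypothetical. Label keys are assumed to already be valid LogQL identifiers, and values are escaped with `strconv.Quote` on the assumption that LogQL accepts Go-style string escapes.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// sortedKeys returns the map's keys in lexical order so the query text is deterministic.
func sortedKeys(m map[string]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// buildQuery (hypothetical) selects one stream by its stream labels, then filters
// individual log lines by instance labels extracted with the json parser.
func buildQuery(stream, instanceLabels map[string]string) string {
	var b strings.Builder

	// Stream selector, e.g. {from="state-history",orgID="1"}.
	b.WriteString("{")
	for i, k := range sortedKeys(stream) {
		if i > 0 {
			b.WriteString(",")
		}
		b.WriteString(k + "=" + strconv.Quote(stream[k]))
	}
	b.WriteString("}")

	if len(instanceLabels) == 0 {
		return b.String()
	}

	// Nested field labels.<key> in the log line is matched as labels_<key>.
	b.WriteString(" | json")
	for _, k := range sortedKeys(instanceLabels) {
		b.WriteString(" | labels_" + k + "=" + strconv.Quote(instanceLabels[k]))
	}
	return b.String()
}

func main() {
	// Hypothetical stream and instance labels.
	q := buildQuery(
		map[string]string{"from": "state-history", "orgID": "1"},
		map[string]string{"instance": "host-1", "severity": "critical"},
	)
	fmt.Println(q)
	// {from="state-history",orgID="1"} | json | labels_instance="host-1" | labels_severity="critical"
}
```

Keeping the high-cardinality instance labels out of the stream selector is what lets everything live in one stream; the cost is that Loki must parse each line in the selected stream to apply the filters.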