grafana

mirror of https://github.com/grafana/grafana.git synced 2024-11-26 02:40:26 -06:00

Author	SHA1	Message	Date
William Wernert	4aa477f48f	Alerting: Move rule UID from Loki stream labels into log lines (#70637 ) Move rule uid into log line to reduce cardinality	2023-06-26 09:57:45 -04:00
George Robinson	7edbe72483	Alerting: Support concurrent queries for saving alert instances (#70525 ) This commit adds support for concurrent queries when saving alert instances to the database. This is an experimental feature in response to some customers experiencing delays between rule evaluation and sending alerts to Alertmanager, resulting in flapping. It is disabled by default.	2023-06-23 11:36:07 +01:00
Santiago	d3bb9fbbaf	Alerting: Use only token for images in notifications (#70196 ) * Alerting: Use only tokens for images in notifications * update tests * make linter and modfile validator happy	2023-06-21 20:53:45 -03:00
Alexander Weaver	ce6f73bd32	Alerting: Add two missing tests which cover missing URLs for Loki state history (#70460 ) Add two missing tests which cover individual missing URLs	2023-06-21 12:58:37 -05:00
George Robinson	8a13ee3cd4	Alerting: Add debug logs when saving instances is finished (#70447 )	2023-06-21 14:19:04 +02:00
George Robinson	815e98ed95	Alerting: Add debug logs for EndsAt timestamp (#70336 ) This commit adds debug logs for previous_ends_at and next_ends_at to state.go to help us debug issues where alerts are resolved in Alertmanager due to expiration. This change is in response to a support escalation where this information was needed but unavailable.	2023-06-20 12:13:38 +03:00
SatVeer Singh	1bfa3a0f1e	Chore: Replace go-multierror with errors package (#66432 ) * code refactor and type assertions added to tests * no-lint rule added for specific line	2023-06-19 12:29:45 +03:00
Yuri Tseretyan	baffe83da6	Alerting: Improve performance of cache.getOrCreate (#63909 ) * move expansion of labels and annotations outside of mutex lock * propagate struct but not pointer	2023-06-15 09:37:47 -04:00
Santiago	ff3e028a85	Alerting: Add image URI annotation only when there's an image (#69825 ) * Alerting: Add image URI annotation only when there's an image * fix function name (changed on main branch)	2023-06-09 10:59:24 -03:00
Matthew Jacobson	ba3994d338	Alerting: Repurpose rule testing endpoint to return potential alerts (#69755 ) * Alerting: Repurpose rule testing endpoint to return potential alerts This feature replaces the existing no-longer in-use grafana ruler testing API endpoint /api/v1/rule/test/grafana. The new endpoint returns a list of potential alerts created by the given alert rule, including built-in + interpolated labels and annotations. The key priority of this endpoint is that it is intended to be as true as possible to what would be generated by the ruler except that the resulting alerts are not filtered to only Resolved / Firing and ready to be sent. This means that the endpoint will, among other things: - Attach static annotations and labels from the rule configuration to the alert instances. - Attach dynamic annotations from the datasource to the alert instances. - Attach built-in labels and annotations created by the Grafana Ruler (such as alertname and grafana_folder) to the alert instances. - Interpolate templated annotations / labels and accept allowed template functions.	2023-06-08 18:59:54 -04:00
Santiago	b0881daf23	Alerting: Use URLs in image annotations (#66804 ) * use tokens or urls in image annotations * improve tests, fix some comments * fix empty tokens * code review changes, check for url before checking for token (support old token formats)	2023-04-26 13:06:18 -03:00
Alexander Weaver	3634079b8f	Alerting: Attach hash of instance labels to state history log lines (#65968 ) * Add instanceID which is hash of labels * Rename field to fingerprint * Move to prometheus style signature * Appease linter	2023-04-19 14:22:19 -05:00
Alexander Weaver	a384194e15	Alerting: Use default page size of 5000 when querying Loki for state history (#66315 ) Always specify limit of 5000	2023-04-18 14:31:29 -05:00
Alexander Weaver	cf7157f683	Alerting: Capture refID of rule's condition expression in Loki state history entries (#66419 ) * Capture condition from rule * Add test	2023-04-18 14:21:28 -05:00
Matthew Jacobson	63187fae0c	Alerting: Remove and revert flag alertingBigTransactions (#65976 ) * Alerting: Remove and revert flag alertingBigTransactions This is a partial revert of #56575 and a removal of the `alertingBigTransactions` flag. Real-world use has seen no clear performance incentive to maintain this flag. Lowered db connection count came at the cost of a significant increase in CPU usage and query latency. * Fix lint backend * Removed last bits of alertingBigTransactions --------- Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>	2023-04-06 18:06:25 +02:00
Alexander Weaver	fb520edd72	Alerting: Use a completely isolated context for state history writes (#64989 ) * Add fresh context with timeout and same log properties, re-derive logger * Unify timeout constants * Move ctx after shortcut that got added through rebasing * Unify timeouts * Port opentracing's SpanFromContext and ContextFromSpan to the grafana tracing package * Support both opentracing and otel variants * Better document why we're creating a new ctx * Add new func to FakeSpan which was added after rebase * Support grafana-specific traceID key in both tracer implementations	2023-04-04 16:41:46 -05:00
Alexander Weaver	da4832724e	Alerting: Delete stub for SQL alert state history backend (#65667 ) Delete stub for SQL backend	2023-03-31 11:15:56 -05:00
Matthew Jacobson	b9dc04139a	Alerting: Respect "For" Duration for NoData alerts (#65574 ) * Alerting: Respect "For" Duration for NoData alerts This change modifies `resultNoData` to be more in line with the logic of the other state handlers. The main effects of this are: 1) NoData states with NoDataState config set to Alerting will respect "For" duration. 2) Prevents zero value in StartsAt and EndsAt for alerts that have only ever been in normal state. This includes state transitions from NoDataState=OK and ExecErrState=OK. 3) Better state transition logging.	2023-03-31 19:05:15 +03:00
Steve Simpson	04336d53a9	Alerting: Update prometheus version (#65688 )	2023-03-31 16:34:35 +02:00
Yuri Tseretyan	622c23716a	Alerting: Use logger with context in the state cache (#65663 )	2023-03-31 10:11:30 -04:00
Alexander Weaver	5e87ea745d	Alerting: Fix and re-enable `filters instance labels in log line` test (#65618 ) Fix and reenable test	2023-03-30 09:02:18 -05:00
Dimitris Sotirakis	e758b017d0	Alerting: Disable `filters instance labels in log line` test (#65610 ) * Disable filters instance labels in log line test * Add drone reference	2023-03-30 16:04:29 +03:00
Alexander Weaver	a416100abc	Alerting: No longer index state history log streams by instance labels (#65474 ) * Remove private labels * No longer index by instance labels * Labels are now invariant, only build them once * Remove bucketing since everything is in a single stream * Refactor statesToStreams to only return a single unified log stream * Don't query on labels that no longer exist * Move selector logic to loki layer, genericize client to work in terms of straight logQL * Add support for line-level label filters in query * Combine existing selector tests for better parallelism * Tests for logQL construction * Underscore instead of dot for unwrapping labels in logql	2023-03-29 11:52:11 -05:00
Alexander Weaver	de1637afe5	Alerting: Add alert instance labels to Loki log lines in addition to stream labels (#65403 ) Add instance labels to log line	2023-03-28 08:57:51 -05:00
Alexander Weaver	dd04757fc9	Alerting: Add "backend" label to state history writes metrics (#65395 ) * Add backend label to state history writes metrics * Update test expectations	2023-03-28 08:49:51 -05:00
Serge Zaitsev	0beb768427	Chore: Remove result fields from ngalert (#65410 ) * remove result fields from ngalert * remove duplicate imports	2023-03-28 10:34:35 +02:00
Alexander Weaver	07368dec74	Alerting: Fix attachment of external labels to Loki state history log streams (#65140 ) Fix attachment of external labels, add tests	2023-03-21 18:00:59 -05:00
Alexander Weaver	bf54f2672e	Alerting: Switch to snappy-compressed-protobuf for outgoing push requests to Loki (#65077 ) * Encode with snappy, always * JSON encoder type * Headers * Copy labels formatter from promtail * Implement snappy-proto encoding * Create encoder interface, test both encoders, choose snappy-proto by default * Make encoder configurable at the LokiCfg level * Export both encoders * Touch up comment and tests * Drop unnecessary conversions after move to plain strings to appease linter	2023-03-21 13:38:42 -05:00
Alexander Weaver	cc7e5ce62e	Alerting: Fix ambiguous handling of equals in labels when bucketing Loki state history streams (#65013 ) * Use JSON instead of data.Labels string format as label repr * Drop debug log line	2023-03-21 12:33:27 -05:00
Alexander Weaver	e39d7f44c9	Alerting: Elide requests to Loki if nothing should be recorded (#65011 ) Exit early if no log streams or annotations	2023-03-21 09:30:56 -05:00
Alexander Weaver	40c5713cbd	Vendor errors.Join from Go standard library to avoid version incompatibilities (#64985 ) Vendor errors.Join from std lib	2023-03-17 14:07:58 -05:00
Alexander Weaver	a31672fa40	Alerting: Create new state history "fanout" backend that dispatches to multiple other backends at once (#64774 ) * Rename RecordStatesAsync to Record * Rename QueryStates to Query * Implement fanout writes * Implement primary queries * Simplify error joining * Add test for query path * Add tests for writes and error propagation * Allow fanout backend to be configured * Touch up log messages and config validation * Consistent documentation for all backend structs * Parse and normalize backend names more consistently against an enum * Touch-ups to documentation * Improve clarity around multi-record blocking * Keep primary and secondaries more distinct * Rename fanout backend to multiple backend * Simplify config keys for multi backend mode	2023-03-17 12:41:18 -05:00
Andrej Ocenas	6647217208	Phlare: Use enum config to send deduplicated func and filenames (#64435 )	2023-03-13 11:06:04 +01:00
Jean-Philippe Quéméner	fb5ed0b0b3	Alerting: fix flaky cache test (#64499 )	2023-03-09 06:08:05 -05:00
George Robinson	0c8876c3a2	Alerting: Return errors when expanding templates (#63662 ) This commit changes the state package so that errors encountered while expanding templates for custom labels and annotations are returned from the function. This is not used at present, but will be used in the future as we look at how to offer better feedback to users who don't have access to logs, for example our customers who use Hosted Grafana.	2023-03-08 12:25:02 +00:00
George Robinson	ed71012ced	Alerting: Fix Classic Conditions $values variable (#64243 ) This commit fixes a bug in the $values variable in notification templates when using Classic Conditions. Since Classic Conditions are not multi-dimensional, the values of each series that exceeded the condition should be available as a RefID and offset. For example, B0, B1, etc. However, this bug meant that instead just a single condition would be printed as B, not B0.	2023-03-06 12:08:00 -05:00
Alexander Weaver	19d01dff91	Alerting: Expose Prometheus metrics for persisting state history (#63157 ) * Create historian metrics and dependency inject * Record counter for total number of state transitions logged * Track write failures * Track current number of active write goroutines * Record histogram of how long it takes to write history data * Don't copy the registerer * Adjust naming of write failures metric * Introduce WritesTotal to complement WritesFailedTotal * Measure TransitionsFailedTotal to complement TransitionsTotal * Rename all to state_history * Remove redundant Total suffix * Increment totals all the time, not just on success * Drop ActiveWriteGoroutines * Drop PersistDuration in favor of WriteDuration * Drop unused gauge * Make writes and writesFailed per org * Add metric indicating backend and a spot for future metadata * Drop _batch_ from names and update help * Add metric for bytes written * Better pairing of total + failure metric updates * Few tweaks to wording and naming * Record info metric during composition * Create fakeRequester and simple happy path test using it * Blocking test for the full historian and test for happy path metrics * Add tests for failure case metrics * Smoke test for full annotation persistence * Create test for metrics on annotation persistence, both happy and failing paths * Address linter complaints * More linter complaints * Remove unnecessary whitespace * Consistency improvements to help texts * Update tests to match new descs	2023-03-06 10:40:37 -06:00
Serge Zaitsev	0bdb105df2	Chore: Remove xorcare/pointer dependency (#63900 ) * Chore: remove pointer dependency * fix type casts * deprecate xorcare/pointer library in linter * rookie mistake	2023-03-06 05:23:15 -05:00
Alexander Weaver	e77621649d	Alerting: Instrument outgoing state history requests using weaveworks/common (#63600 ) * Loki backend and client depend on a requester * Instrument all requests to loki using weaveworks TimedClient * Construct collector in metrics package	2023-02-23 17:52:02 -06:00
George Robinson	9f2fb3fa27	Alerting: Add filter and remove funcs for custom labels and annotations (#63437 ) This commit adds filterLabels, filterLabelsRe, removeLabels, and removeLabelsRe functions to templates for custom labels and annotations. It allows for use cases such as removing all private labels.	2023-02-20 14:40:26 +00:00
George Robinson	c637a5543e	Alerting: Rename caps to captures as cap is a reserved word (#63432 )	2023-02-20 10:08:36 +00:00
George Robinson	aacf9da969	Alerting: Change Data to use Labels instead of map[string]string (#63431 ) This commit changes the Data struct in template.go to use Labels instead of map[string]string. It changes how labels are printed when using {{ .Labels }} from map[foo:bar bar:baz] to foo=bar, bar=baz.	2023-02-20 10:08:23 +00:00
George Robinson	0a01391ebe	Alerting: Small readability improvements to template.go (#63422 ) * Alerting: Small readability improvements to template.go * Fix lint	2023-02-20 09:24:11 +00:00
George Robinson	0659134793	Alerting: Better printing of labels (#63348 ) This commit changes how labels are printed in templates for custom annotations and labels from map[foo:bar bar:baz] to foo=bar, bar=baz. Labels are comma separated, and sorted in increasing order.	2023-02-16 12:04:15 -05:00
George Robinson	9e86916d48	Alerting: Move templating to template package (#63347 ) This commit moves templating from the state package to a sub-package called template. This sub-package will be the logical package for future ease-of-use improvements to templating custom annotations and labels.	2023-02-16 17:16:36 +01:00
Alexander Weaver	958fb2c50a	Alerting: Unify structs in Loki client and make them more consistent with Prometheus (#63055 ) * Use existing row struct instead of [2]string, add deserialization helper * Replace Stream struct with stream struct which is exactly the same * Drop unused status field * Don't export queryRes and queryData * Tests for custom marshalling * Rename row fields to T and V for consistency with prometheus samples * Rename row to sample	2023-02-11 05:17:44 -06:00
Steve Simpson	4d1a2c3370	Alerting: Move `rule_groups_rules` metric from State to Scheduler. (#63144 ) The `rule_groups_rules` metric is currently defined and computed by `State`. It makes more sense for this metric to be computed off of the configured rule set, not based on the rule evaluation state. There could be an edge condition where a rule does not have a state yet, and so is uncounted. Additionally, we would like this metric (and others), to have a `rule_group` label, and this is much easier to achieve if the metric is produced from the `Scheduler` package.	2023-02-09 17:05:19 +01:00
Alexander Weaver	f80bf11782	Alerting: Make time range query parameters not required when querying Loki (#62985 ) * Make from and to not required * Move default range calculation up to loki.go	2023-02-07 14:26:43 -06:00
Joey Tawadrous	121260e0dd	Parca: Use data query schema (#62840 ) * Parca data query schema * Remove groupBy	2023-02-07 09:56:21 +00:00
Alexander Weaver	0efb84617e	Alerting: Create benchmarking test for state.ProcessEvalResults (#62041 ) * Create benchmark for ProcessEvalResults * Simplify the test	2023-02-03 15:38:08 -06:00


215 Commits