Commit Graph

43 Commits

Author SHA1 Message Date
Alexander Weaver
de1637afe5
Alerting: Add alert instance labels to Loki log lines in addition to stream labels (#65403)
Add instance labels to log line
2023-03-28 08:57:51 -05:00
Alexander Weaver
dd04757fc9
Alerting: Add "backend" label to state history writes metrics (#65395)
* Add backend label to state history writes metrics

* Update test expectations
2023-03-28 08:49:51 -05:00
Serge Zaitsev
0beb768427
Chore: Remove result fields from ngalert (#65410)
* remove result fields from ngalert

* remove duplicate imports
2023-03-28 10:34:35 +02:00
Alexander Weaver
07368dec74
Alerting: Fix attachment of external labels to Loki state history log streams (#65140)
Fix attachment of external labels, add tests
2023-03-21 18:00:59 -05:00
Alexander Weaver
bf54f2672e
Alerting: Switch to snappy-compressed-protobuf for outgoing push requests to Loki (#65077)
* Encode with snappy, always

* JSON encoder type

* Headers

* Copy labels formatter from promtail

* Implement snappy-proto encoding

* Create encoder interface, test both encoders, choose snappy-proto by default

* Make encoder configurable at the LokiCfg level

* Export both encoders

* Touch up comment and tests

* Drop unnecessary conversions after move to plain strings to appease linter
2023-03-21 13:38:42 -05:00
Alexander Weaver
cc7e5ce62e
Alerting: Fix ambiguous handling of equals in labels when bucketing Loki state history streams (#65013)
* Use JSON instead of data.Labels string format as label repr

* Drop debug log line
2023-03-21 12:33:27 -05:00
Alexander Weaver
e39d7f44c9
Alerting: Elide requests to Loki if nothing should be recorded (#65011)
Exit early if no log streams or annotations
2023-03-21 09:30:56 -05:00
Alexander Weaver
40c5713cbd
Vendor errors.Join from Go standard library to avoid version incompatibilities (#64985)
Vendor errors.Join from std lib
2023-03-17 14:07:58 -05:00
Alexander Weaver
a31672fa40
Alerting: Create new state history "fanout" backend that dispatches to multiple other backends at once (#64774)
* Rename RecordStatesAsync to Record

* Rename QueryStates to Query

* Implement fanout writes

* Implement primary queries

* Simplify error joining

* Add test for query path

* Add tests for writes and error propagation

* Allow fanout backend to be configured

* Touch up log messages and config validation

* Consistent documentation for all backend structs

* Parse and normalize backend names more consistently against an enum

* Touch-ups to documentation

* Improve clarity around multi-record blocking

* Keep primary and secondaries more distinct

* Rename fanout backend to multiple backend

* Simplify config keys for multi backend mode
2023-03-17 12:41:18 -05:00
Alexander Weaver
19d01dff91
Alerting: Expose Prometheus metrics for persisting state history (#63157)
* Create historian metrics and dependency inject

* Record counter for total number of state transitions logged

* Track write failures

* Track current number of active write goroutines

* Record histogram of how long it takes to write history data

* Don't copy the registerer

* Adjust naming of write failures metric

* Introduce WritesTotal to complement WritesFailedTotal

* Measure TransitionsFailedTotal to complement TransitionsTotal

* Rename all to state_history

* Remove redundant Total suffix

* Increment totals all the time, not just on success

* Drop ActiveWriteGoroutines

* Drop PersistDuration in favor of WriteDuration

* Drop unused gauge

* Make writes and writesFailed per org

* Add metric indicating backend and a spot for future metadata

* Drop _batch_ from names and update help

* Add metric for bytes written

* Better pairing of total + failure metric updates

* Few tweaks to wording and naming

* Record info metric during composition

* Create fakeRequester and simple happy path test using it

* Blocking test for the full historian and test for happy path metrics

* Add tests for failure case metrics

* Smoke test for full annotation persistence

* Create test for metrics on annotation persistence, both happy and failing paths

* Address linter complaints

* More linter complaints

* Remove unnecessary whitespace

* Consistency improvements to help texts

* Update tests to match new descs
2023-03-06 10:40:37 -06:00
Alexander Weaver
e77621649d
Alerting: Instrument outgoing state history requests using weaveworks/common (#63600)
* Loki backend and client depend on a requester

* Instrument all requests to loki using weaveworks TimedClient

* Construct collector in metrics package
2023-02-23 17:52:02 -06:00
Alexander Weaver
958fb2c50a
Alerting: Unify structs in Loki client and make them more consistent with Prometheus (#63055)
* Use existing row struct instead of [2]string, add deserialization helper

* Replace Stream struct with stream struct which is exactly the same

* Drop unused status field

* Don't export queryRes and queryData

* Tests for custom marshalling

* Rename row fields to T and V for consistency with prometheus samples

* Rename row to sample
2023-02-11 05:17:44 -06:00
Alexander Weaver
f80bf11782
Alerting: Make time range query parameters not required when querying Loki (#62985)
* Make from and to not required

* Move default range calculation up to loki.go
2023-02-07 14:26:43 -06:00
idafurjes
982939111b
Rename Id to ID for annotation models (#62886)
* Rename Id to ID for annotation models

* Add xorm tags

* Rename Id to ID for API key models

* Add xorm tags
2023-02-03 17:23:09 +01:00
Jean-Philippe Quéméner
1694a35e0c
Alerting: implement loki query for alert state history (#61992)
* Alerting: implement loki query for alert state history

* extract selector building

* add unit tests for selector creation

* backup

* give selectors their own type

* build dataframe

* add some tests

* small changes after manual testing

* use struct client

* golint

* more golint

* Make RuleUID optional for Loki implementation

* Drop initial assumption that we only have one series

* Pare down to three columns, fix timestamp overflows, improve failure cases in loki responses

* Embed structred log lines in the dataframe as objects rather than json strings

* Include state history label filter

* Remove dead code

---------

Co-authored-by: Alex Weaver <weaver.alex.d@gmail.com>
2023-02-02 16:31:51 -06:00
Alexander Weaver
647f73ddc5
Alerting: Add static label to all state history entries (#62817)
* Add static label to all state history entries

* Separate label and value visually
2023-02-02 13:25:26 -06:00
Alexander Weaver
6ad1cfef38
Alerting: Add endpoint for querying state history (#62166)
* Define endpoint and generate

* Wire up and register endpoint

* Cleanup, define authorization

* Forgot the leading slash

* Wire up query and SignedInUser

* Wire up timerange query params

* Add todo for label queries

* Drop comment

* Update path to rules subtree
2023-02-02 11:34:00 -06:00
Alexander Weaver
9fa28c11c5
Alerting: Usability adjustments to Loki representation of state history values (#62643)
* Extract label merge, add test file

* Extract error/NoData to first class fields, remove a layer from values

* Include dashUID and panelID as line-level fields

* Drop unnecessary object receiver

* Add tests for stream building

* Drop NoData field from log lines
2023-02-02 10:54:10 -06:00
Alexander Weaver
fcecf4d3cb
Alerting: Refactor away a layer of indirection around the goroutine in Loki state history (#62644)
Inline recordStreamsAsync in loki backend
2023-02-01 11:57:29 -06:00
Alexander Weaver
03f3fbec0d
Alerting: Fix handling of special floating-point cases when writing observed values to annotations (#61074)
* Fix json serialization of state values

* Simplify two of the tests

* Fix linter complaint

* Don't return error if we fail to look up dashboard, just log it and move on

* Address linter complaint
2023-01-31 15:27:51 -06:00
Alexander Weaver
e7ace4ed62
Alerting: Allow separate read and write path URLs for Loki state history (#62268)
Extract config parsing and add tests
2023-01-30 16:30:05 -06:00
Alexander Weaver
b4682fe3cb
Alerting: Configurable externalLabels for Loki state history (#62404)
* Add config option for external labels

* Remove redundant nilcheck
2023-01-30 14:24:45 -06:00
Serge Zaitsev
d6d4097567
Chore: Fix goimports grouping in alerting (#62424)
* fix goimports

* fix goimports order
2023-01-30 09:55:35 +01:00
Yuri Tseretyan
0c4671e31f
Alerting: Update historian to ignore transitions from Normal Paused and Updated (#62267) 2023-01-27 16:26:22 -05:00
George Robinson
a7eab8e46e
Alerting: Support context.Context in Loki interface (#61979)
This commit adds support for canceleable contexts in the Loki
interface.
2023-01-26 09:31:20 +00:00
Alexander Weaver
046a9bb7c1
Alerting: Copy rule definitions into state history (#62032)
* Copy rules instead of accepting pointer

* Deep-copy the rule, for even more guarantees

* Create struct just for needed fields

* Move RuleMeta to historian/model package, iron out package dependencies

* Move tests for dash ID parsing to model package along with code
2023-01-25 11:29:57 -06:00
idafurjes
b54b80f473
Chore: Remove Result from dashboard models (#61997)
* Chore: Remove Result from dashboard models

* Fix lint tests

* Fix dashboard service tests

* Fix API tests

* Remove commented out code

* Chore: Merge main - cleanup
2023-01-25 10:36:26 +01:00
George Robinson
239d94205a
Alerting: Return chan <-error for #61811 (#61858) 2023-01-24 15:41:38 +00:00
Alexander Weaver
7ccc845187
Alerting: Push state history entries to Loki (#61724)
* Implement push endpoint

* Drop duplicated struct

* Genericize auth/tenant headers and improve logging in error case

* Flesh out the data model

* Drop dead code

* Drop log line entirely

* Drop unused arg

* Rename a few type manipulation functions

* Extract label keys as constants

* Improve logs when loki responds with error

* Inline lokiRepresentation function
2023-01-23 16:31:03 -06:00
Alexander Weaver
c10713ea76
Alerting: Create query interface for state history along with annotation-based implementation (#61646) 2023-01-19 10:45:31 +01:00
Jean-Philippe Quéméner
44b11d3228
Alerting: support basic auth for the state history loki client (#61696) 2023-01-18 20:24:40 +01:00
Alexander Weaver
1ac89ea040
Alerting: Add client configuration for remote Loki historian backend and test connection (#61114)
* Create loki client type and ping method

* Expose TestConnection on client

* Configure and ping Loki URL

* Close response body reader if present

* Add 30 second timeout

* Remove duplicate close
2023-01-17 13:58:52 -06:00
idafurjes
7c2522c477
Chore: Move dashboard models to dashboard pkg (#61458)
* Copy dashboard models to dashboard pkg

* Use some models from current pkg instead of models

* Adjust api pkg

* Adjust pkg services

* Fix lint
2023-01-16 16:33:55 +01:00
Alexander Weaver
eb960d9725
Alerting: Add un-documented toggle for changing state history backend, add shells for remote loki and sql (#61072)
* Add toggle for state history backend and shells

* Extract some shared logic and add tests
2023-01-06 12:06:01 -06:00
Alexander Weaver
8c3a5f6da0
Alerting: Allow state history to be disabled through configuration (#61006)
* Add configuration option for if state history should be enabled

* Inject no-op when history is disabled
2023-01-05 12:21:07 -06:00
Alexander Weaver
b88b8bc291
Alerting: Fix missing dashboard/panelID links in annotations (#60926)
Assign thru ref
2023-01-03 14:12:27 -06:00
Yuri Tseretyan
a85adeed96
Alerting: Update state history service to filter states transitions (#58863)
* rename the method to better reflect its behavior
* make historian filter transition on itself
* call historian with all changes
2022-12-06 12:33:15 -05:00
Alexander Weaver
cc8c1380e2
Alerting: Persist annotations from multidimensional rules in batches (#56575)
* Reduce piecemeal state fields

* Read data directly off state instead of rule

* Unify state and context into single struct

* Expose contextual information to layer above setNextState

* Work in terms of ContextualState and call historian in batches

* Call annotations service in batches

* Export format state and reason and remove workaround in unrelated test package

* Add new method to annotation service for batch inserting

* Fix loop variable aliasing bug caught by linter, didn't change behavior

* Incl timerange on annotation tests

* Insert one at a time if tags are present

* Point to rule from ContextualState rather than copy fields

* Build annotations and copy data prior to starting goroutine

* Rename to StateTransition

* Use new bulk-insert utility

* Remove rule from StateTransition and pass in directly to historian

* Simplify annotations logic since we have only one rule

* Fix logs and context, nilcheck, simplify method name

* Regenerate mock
2022-11-04 10:39:26 -05:00
Alex Moreno
ba15d675e7
Alerting: Add values to annotations (#57738)
* Add values to annotations

* Fix imports

* Use State attrs instead of Result attrs

* Remove unnecessary variable
2022-11-03 10:35:34 +01:00
Alexander Weaver
de46c1b002
Alerting: Improve logs in state manager and historian (#57374)
* Touch up log statements, fix casing, add and normalize contexts

* Dedicated logger for dashboard resolver

* Avoid injecting logger to historian

* More minor log touch-ups

* Dedicated logger for state manager

* Use rule context in annotation creator

* Rename base logger and avoid redundant contextual loggers
2022-10-21 16:16:51 -05:00
Alexander Weaver
3ddb28bad9
Find-and-replace 'err' logs to 'error' to match log search conventions (#57309) 2022-10-19 17:36:54 -04:00
Alexander Weaver
129a28919b
Alerting: Cache result of dashboard ID lookups (#56587)
* Create caching dashboard resolver

* A couple tests for dashboard resolving

* Log warning on not found

* Additional polish + review nits

* Move to singleflight instead of a plain mutex

* Store errors instead of -1 in cache and use reflection when reading

* Address linter error

* One more linter error
2022-10-14 15:48:02 -05:00
Alexander Weaver
8df830557a
Alerting: Move annotation functionality behind a history persistence interface (#56133)
* Move annotation functionality behind a history persistence interface

* Rename to RecordState

* Fix lint error in import aliasing

* One more import linter error
2022-10-05 15:32:20 -05:00