Commit Graph

177 Commits

Author SHA1 Message Date
Alexander Weaver
e77621649d
Alerting: Instrument outgoing state history requests using weaveworks/common (#63600)
* Loki backend and client depend on a requester

* Instrument all requests to loki using weaveworks TimedClient

* Construct collector in metrics package
2023-02-23 17:52:02 -06:00
George Robinson
9f2fb3fa27
Alerting: Add filter and remove funcs for custom labels and annotations (#63437)
This commit adds filterLabels, filterLabelsRe, removeLabels, and
removeLabelsRe functions to templates for custom labels and annotations.
It allows for use cases such as removing all private labels.
2023-02-20 14:40:26 +00:00
George Robinson
c637a5543e
Alerting: Rename caps to captures as cap is a reserved word (#63432) 2023-02-20 10:08:36 +00:00
George Robinson
aacf9da969
Alerting: Change Data to use Labels instead of map[string]string (#63431)
This commit changes the Data struct in template.go to use Labels
instead of map[string]string. It changes how labels are printed
when using {{ .Labels }} from map[foo:bar bar:baz] to
foo=bar, bar=baz.
2023-02-20 10:08:23 +00:00
George Robinson
0a01391ebe
Alerting: Small readability improvements to template.go (#63422)
* Alerting: Small readability improvements to template.go

* Fix lint
2023-02-20 09:24:11 +00:00
George Robinson
0659134793
Alerting: Better printing of labels (#63348)
This commit changes how labels are printed in templates for custom
annotations and labels from map[foo:bar bar:baz] to foo=bar, bar=baz.
Labels are comma separated, and sorted in increasing order.
2023-02-16 12:04:15 -05:00
George Robinson
9e86916d48
Alerting: Move templating to template package (#63347)
This commit moves templating from the state package to a sub-package
called template. This sub-package will be the logical package for
future ease-of-use improvements to templating custom annotations
and labels.
2023-02-16 17:16:36 +01:00
Alexander Weaver
958fb2c50a
Alerting: Unify structs in Loki client and make them more consistent with Prometheus (#63055)
* Use existing row struct instead of [2]string, add deserialization helper

* Replace Stream struct with stream struct which is exactly the same

* Drop unused status field

* Don't export queryRes and queryData

* Tests for custom marshalling

* Rename row fields to T and V for consistency with prometheus samples

* Rename row to sample
2023-02-11 05:17:44 -06:00
Steve Simpson
4d1a2c3370
Alerting: Move rule_groups_rules metric from State to Scheduler. (#63144)
The `rule_groups_rules` metric is currently defined and computed by `State`.
It makes more sense for this metric to be computed off of the configured rule
set, not based on the rule evaluation state. There could be an edge condition
where a rule does not have a state yet, and so is uncounted.

Additionally, we would like this metric (and others), to have a `rule_group`
label, and this is much easier to achieve if the metric is produced from the
`Scheduler` package.
2023-02-09 17:05:19 +01:00
Alexander Weaver
f80bf11782
Alerting: Make time range query parameters not required when querying Loki (#62985)
* Make from and to not required

* Move default range calculation up to loki.go
2023-02-07 14:26:43 -06:00
Joey Tawadrous
121260e0dd
Parca: Use data query schema (#62840)
* Parca data query schema

* Remove groupBy
2023-02-07 09:56:21 +00:00
Alexander Weaver
0efb84617e
Alerting: Create benchmarking test for state.ProcessEvalResults (#62041)
* Create benchmark for ProcessEvalResults

* Simplify the test
2023-02-03 15:38:08 -06:00
idafurjes
982939111b
Rename Id to ID for annotation models (#62886)
* Rename Id to ID for annotation models

* Add xorm tags

* Rename Id to ID for API key models

* Add xorm tags
2023-02-03 17:23:09 +01:00
Jean-Philippe Quéméner
1694a35e0c
Alerting: implement loki query for alert state history (#61992)
* Alerting: implement loki query for alert state history

* extract selector building

* add unit tests for selector creation

* backup

* give selectors their own type

* build dataframe

* add some tests

* small changes after manual testing

* use struct client

* golint

* more golint

* Make RuleUID optional for Loki implementation

* Drop initial assumption that we only have one series

* Pare down to three columns, fix timestamp overflows, improve failure cases in loki responses

* Embed structred log lines in the dataframe as objects rather than json strings

* Include state history label filter

* Remove dead code

---------

Co-authored-by: Alex Weaver <weaver.alex.d@gmail.com>
2023-02-02 16:31:51 -06:00
Alexander Weaver
647f73ddc5
Alerting: Add static label to all state history entries (#62817)
* Add static label to all state history entries

* Separate label and value visually
2023-02-02 13:25:26 -06:00
Alexander Weaver
6ad1cfef38
Alerting: Add endpoint for querying state history (#62166)
* Define endpoint and generate

* Wire up and register endpoint

* Cleanup, define authorization

* Forgot the leading slash

* Wire up query and SignedInUser

* Wire up timerange query params

* Add todo for label queries

* Drop comment

* Update path to rules subtree
2023-02-02 11:34:00 -06:00
Alexander Weaver
9fa28c11c5
Alerting: Usability adjustments to Loki representation of state history values (#62643)
* Extract label merge, add test file

* Extract error/NoData to first class fields, remove a layer from values

* Include dashUID and panelID as line-level fields

* Drop unnecessary object receiver

* Add tests for stream building

* Drop NoData field from log lines
2023-02-02 10:54:10 -06:00
Alexander Weaver
fcecf4d3cb
Alerting: Refactor away a layer of indirection around the goroutine in Loki state history (#62644)
Inline recordStreamsAsync in loki backend
2023-02-01 11:57:29 -06:00
Alexander Weaver
03f3fbec0d
Alerting: Fix handling of special floating-point cases when writing observed values to annotations (#61074)
* Fix json serialization of state values

* Simplify two of the tests

* Fix linter complaint

* Don't return error if we fail to look up dashboard, just log it and move on

* Address linter complaint
2023-01-31 15:27:51 -06:00
Alexander Weaver
e7ace4ed62
Alerting: Allow separate read and write path URLs for Loki state history (#62268)
Extract config parsing and add tests
2023-01-30 16:30:05 -06:00
Alexander Weaver
b4682fe3cb
Alerting: Configurable externalLabels for Loki state history (#62404)
* Add config option for external labels

* Remove redundant nilcheck
2023-01-30 14:24:45 -06:00
Serge Zaitsev
d6d4097567
Chore: Fix goimports grouping in alerting (#62424)
* fix goimports

* fix goimports order
2023-01-30 09:55:35 +01:00
Yuri Tseretyan
0c4671e31f
Alerting: Update historian to ignore transitions from Normal Paused and Updated (#62267) 2023-01-27 16:26:22 -05:00
Yuri Tseretyan
05bf241952
Alerting: Update state manager to return StateTransitions when Delete or Reset (#62264)
* update Delete and Reset methods to return state transitions

this will be used by notifier code to decide whether alert needs to be sent or not.

* update scheduler to provide reason to delete states and use transitions

* update FromAlertsStateToStoppedAlert to accept StateTransition and filter by old state

* fixup

* fix tests
2023-01-27 09:46:21 +01:00
Alex Moreno
531b439cf1
Alerting: Add alert pausing feature (#60734)
* Add field in alert_rule model, add state to alert_instance model, and state to eval

* Remove paused state from eval package

* Skip paused alert rules in scheduler

* Add migration to add is_paused field to alert_rule table

* Convert to postable alerts only if not normal, pernding, or paused

* Handle paused eval results in state manager

* Add Paused state to eval package

* Add paused alerts logic in scheduler

* Skip alert on scheduler

* Remove paused status from eval package

* Apply suggestions from code review

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Remove state

* Rethink schedule and manager for paused alerts

* Change return to continue

* Remove unused var

* Rethink alert pausing

* Paused alerts storing annotations

* Only add one state transition

* Revert boolean method renaming refactor

* Revert take image refactor

* Make registry errors public

* Revert method extraction for getting a folder title

* Revert variable renaming refactor

* Undo unnecessary changes

* Revert changes in test

* Remove IsPause check in PatchPartiLAlertRule function

* Use SetNormal to set state

* Fix text by returning to old behaviour on alert rule deletion

* Add test in schedule_unit_test.go to test ticks with paused alerts

* Add coment to clarify usage of context.Background()

* Add comment to clarify resetStateByRuleUID method usage

* Move rule get to a more limited scope

* Update pkg/services/ngalert/schedule/schedule.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* rum gofmt on pkg/services/ngalert/schedule/schedule.go

* Remove defer cancel for context

* Update pkg/services/ngalert/models/instance_test.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* Update pkg/services/ngalert/models/testing.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* Update pkg/services/ngalert/schedule/schedule_unit_test.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* Update pkg/services/ngalert/schedule/schedule_unit_test.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* Update pkg/services/ngalert/models/instance_test.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* skip scheduler rule state clean up on paused alert rule

* Update pkg/services/ngalert/schedule/schedule.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* Fix mock in test

* Add (hopefully) final suggestions

* Use error channel from recordAnnotationsSync to cancel context

* Run make gen-cue

* Place pause alert check in channel update after version check

* Reduce branching un update channel select

* Add if for error and move code inside if in state manager ResetStateByRuleUID

* Add reason to logs

* Update pkg/services/ngalert/schedule/schedule.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Do not delete alert rule routine, just exit on eval if is paused

* Reduce branching and create-close a channel to avoid deadlocks

* Separate state deletion and state reset (includes history saving)

* Add current pause state in rule route in scheduler

* Split clearState and bring errCh closer to RecordStatesAsync call

* Change rule to ruleMeta in RecordStatesAsync

* copy state to be able to modify it

* Add timeout to context creation

* Shorten the timeout

* Use resetState is rule is paused and deleteState if rule is not paused

* Remove Empty state reason

* Save every rule change in historian

* Add tests for DeleteStateByRuleUID and ResetStateByRuleUID

* Remove useless line

* Remove outdated comment

Co-authored-by: George Robinson <george.robinson@grafana.com>
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>
2023-01-26 18:29:10 +01:00
George Robinson
a7eab8e46e
Alerting: Support context.Context in Loki interface (#61979)
This commit adds support for canceleable contexts in the Loki
interface.
2023-01-26 09:31:20 +00:00
Alexander Weaver
046a9bb7c1
Alerting: Copy rule definitions into state history (#62032)
* Copy rules instead of accepting pointer

* Deep-copy the rule, for even more guarantees

* Create struct just for needed fields

* Move RuleMeta to historian/model package, iron out package dependencies

* Move tests for dash ID parsing to model package along with code
2023-01-25 11:29:57 -06:00
idafurjes
b54b80f473
Chore: Remove Result from dashboard models (#61997)
* Chore: Remove Result from dashboard models

* Fix lint tests

* Fix dashboard service tests

* Fix API tests

* Remove commented out code

* Chore: Merge main - cleanup
2023-01-25 10:36:26 +01:00
George Robinson
239d94205a
Alerting: Return chan <-error for #61811 (#61858) 2023-01-24 15:41:38 +00:00
Alexander Weaver
7ccc845187
Alerting: Push state history entries to Loki (#61724)
* Implement push endpoint

* Drop duplicated struct

* Genericize auth/tenant headers and improve logging in error case

* Flesh out the data model

* Drop dead code

* Drop log line entirely

* Drop unused arg

* Rename a few type manipulation functions

* Extract label keys as constants

* Improve logs when loki responds with error

* Inline lokiRepresentation function
2023-01-23 16:31:03 -06:00
Alexander Weaver
c10713ea76
Alerting: Create query interface for state history along with annotation-based implementation (#61646) 2023-01-19 10:45:31 +01:00
Jean-Philippe Quéméner
44b11d3228
Alerting: support basic auth for the state history loki client (#61696) 2023-01-18 20:24:40 +01:00
Alexander Weaver
1ac89ea040
Alerting: Add client configuration for remote Loki historian backend and test connection (#61114)
* Create loki client type and ping method

* Expose TestConnection on client

* Configure and ping Loki URL

* Close response body reader if present

* Add 30 second timeout

* Remove duplicate close
2023-01-17 13:58:52 -06:00
Denis Limarev
e6dee8a723
Perfomance: Preallocate slices (#61580) 2023-01-17 11:50:17 +00:00
idafurjes
7c2522c477
Chore: Move dashboard models to dashboard pkg (#61458)
* Copy dashboard models to dashboard pkg

* Use some models from current pkg instead of models

* Adjust api pkg

* Adjust pkg services

* Fix lint
2023-01-16 16:33:55 +01:00
Yuri Tseretyan
9d57b1c72e
Alerting: Do not persist noop transition from Normal state. (#61201)
* add feature flag `alertingNoNormalState`
* update instance database to support exclusion of state in list operation
* do not save normal state and delete transitions to normal
* update get methods to filter out normal state
2023-01-13 18:29:29 -05:00
Alexander Weaver
b289b8ac6e
Alerting: Set error annotation on EvaluationError regardless of underlying error type (#61506)
Set error annotation regardless of underlying error type
2023-01-13 13:58:02 -06:00
Denis Limarev
90badc8729
Performance: Add preallocation for some slices (#59593) 2023-01-11 18:03:37 +01:00
Yuri Tseretyan
86b5fbbf60
Alerting: Introduce state manager config structure (#61249) 2023-01-10 16:26:15 -05:00
Alexander Weaver
eb960d9725
Alerting: Add un-documented toggle for changing state history backend, add shells for remote loki and sql (#61072)
* Add toggle for state history backend and shells

* Extract some shared logic and add tests
2023-01-06 12:06:01 -06:00
Alexander Weaver
8c3a5f6da0
Alerting: Allow state history to be disabled through configuration (#61006)
* Add configuration option for if state history should be enabled

* Inject no-op when history is disabled
2023-01-05 12:21:07 -06:00
Yuri Tseretyan
4d989860fb
Alerting: Fix conversion of alert state from db state during manager warmup (#60933) 2023-01-04 09:40:04 -05:00
Alexander Weaver
b88b8bc291
Alerting: Fix missing dashboard/panelID links in annotations (#60926)
Assign thru ref
2023-01-03 14:12:27 -06:00
Santiago
05c9af5110
Extract custom template functions (#60695)
extract custom template functions and export the FuncMap
2022-12-22 17:31:40 -03:00
Kristina
5a7f38053b
Remove explore compact URLs (#59686)
* Remove explore compact URLs

* Remove two explore link builders that create compact URLs

* Fix merge conflict
2022-12-14 12:57:53 -06:00
Yuri Tseretyan
4374966987
Alerting: Replace hardcoded <no value> to [no value] in label expansion (#60129)
* replace hardcoded <no value> to [no value] in label expansion
2022-12-12 10:12:30 -05:00
George Robinson
76601f3ae7
Alerting: Better define how we set states (#59977)
This commit better defines how we set states in resultNormal,
resultAlerting, resultError and resultNoData. It changes the existing
code to call methods such as SetAlerting, SetPending, SetNormal,
SetError and NoData instead of assigning values to each individual field
whenever the state is changed. This should make it easier to understand
what fields should be set for which states and avoid cases where states are
missing, or have additional unexpected fields.
2022-12-08 20:12:13 +00:00
George Robinson
6359dab040
Alerting: Change resultError in preparation for supporting ForError duration (#59894) 2022-12-07 10:45:56 +00:00
George Robinson
3c249e1b99
Fix incorrect start time for DatasourceError alerts (#59903) 2022-12-06 18:44:06 +00:00
Yuri Tseretyan
abb49d96b5
Alerting: update state manager to return StateTransition instead of State (#58867)
* improve test for stale states
* update state manager return StateTransition
* update scheduler to accept state transitions
2022-12-06 13:07:39 -05:00