* Alerting: implement loki query for alert state history
* extract selector building
* add unit tests for selector creation
* backup
* give selectors their own type
* build dataframe
* add some tests
* small changes after manual testing
* use struct client
* golint
* more golint
* Make RuleUID optional for Loki implementation
* Drop initial assumption that we only have one series
* Pare down to three columns, fix timestamp overflows, improve failure cases in loki responses
* Embed structred log lines in the dataframe as objects rather than json strings
* Include state history label filter
* Remove dead code
---------
Co-authored-by: Alex Weaver <weaver.alex.d@gmail.com>
* Alerting: Fix template validation in provisioning api
Fix issue where provisioning API accepts a malformed template having extra
text outside of definition block and template name matching definition name.
* Mark AM configuration as applied
* add missing checks, make linter happy
* fix deadlock, mark as valid on save and on load
* mark configurations only if needed
* check error after applyConfig()
* code review comments
* code review changes
* more code review changes
* clean HistoricConfigFromAlertConfig function
* Define endpoint and generate
* Wire up and register endpoint
* Cleanup, define authorization
* Forgot the leading slash
* Wire up query and SignedInUser
* Wire up timerange query params
* Add todo for label queries
* Drop comment
* Update path to rules subtree
* Extract label merge, add test file
* Extract error/NoData to first class fields, remove a layer from values
* Include dashUID and panelID as line-level fields
* Drop unnecessary object receiver
* Add tests for stream building
* Drop NoData field from log lines
* Set YAML as default value for exporting alert rules
* use YAML format for rule list export
Co-authored-by: Sonia Aguilar <33540275+soniaAguilarPeiron@users.noreply.github.com>
* lint
* Add new format query param to swagger+docs
* Fix broken test
---------
Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
Co-authored-by: Matt Jacobson <matthew.jacobson@grafana.com>
This change exposes more metrics from the embedded Alertmanager, which are
valuable for troubleshooting Alertmanager operation particularly in HA setups.
```
grafana_alerting_notifications_total
grafana_alerting_notifications_failed_total
grafana_alerting_notification_requests_total
grafana_alerting_notification_requests_failed_total
grafana_alerting_notification_latency_seconds
grafana_alerting_nflog_gc_duration_seconds
grafana_alerting_nflog_snapshot_duration_seconds
grafana_alerting_nflog_snapshot_size_bytes
grafana_alerting_nflog_queries_total
grafana_alerting_nflog_query_errors_total
grafana_alerting_nflog_query_duration_seconds
grafana_alerting_nflog_gossip_messages_propagated_total
grafana_alerting_dispatcher_aggregation_groups
grafana_alerting_dispatcher_alert_processing_duration_seconds
```
Note that `alertmanager_dispatcher_aggregation_group_limit_reached_total` is
explicitly not exposed, as the group limit metrics are not enabled.
* Add is_paused attr to the POST alert rule group endpoint
* Add is_paused to alerting API POST alert rule group
* Fixed tests
* Add is_paused to alerting gettable endpoints
* Fix integration tests
* Alerting: allow to pause existing rules (#62401)
* Display Pause Rule switch in Editing Rule form
* add isPaused property to form interface and dto
* map isPaused prop with is_paused value from DTO
Also update test snapshots
* Append '(Paused)' text on alert list state column when appropriate
* Change Switch styles according to discussion with UX
Also adding a tooltip with info what this means
* Adjust styles
* Fix alignment and isPaused type definition
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
* Fix test
* Fix test
* Fix RuleList test
---------
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
* wip
* Fix tests and add comments to clarify AlertRuleWithOptionals
* Fix one more test
* Fix tests
* Fix typo in comment
* Fix alert rule(s) cannot be paused via API
* Add integration tests for alerting api pausing flow
* Remove duplicated integration test
---------
Co-authored-by: Virginia Cepeda <virginia.cepeda@grafana.com>
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Fix json serialization of state values
* Simplify two of the tests
* Fix linter complaint
* Don't return error if we fail to look up dashboard, just log it and move on
* Address linter complaint
* Use suggested value for uid
* update the snapshot
* use __expr__
* replace all -100 with __expr__
* update snapshot
* more changes
* revert redundant change
* Use expr.DatasourceUID where it's possible
* generate files
* Allow pausing alerts from provisioning
* Update swagger
* Add IsPaused to provision export endpoints
* Add pause field in sample.yml
* Add exception for reset state in first loop iteration of scheduler if rule is paused
* Update provision definition and swagger docs
* Fix provisioning export tests
* Suggestion: Simplify if condition
* Add more context to a comment
* add nested folder scope inheritance to managed permission services
* add a more specific erorr
* remove circular dependencies
* use errutil for returning erorr
* fix tests
* fix tests
* define a new error in ac package
This adds provisioning endpoints for downloading alert rules and alert rule groups in a
format that is compatible with file provisioning. Each endpoint supports both json and
yaml response types via Accept header as well as a query parameter
download=true/false that will set Content-Disposition to recommend initiating a download
or inline display.
This also makes some package changes to keep structs with potential to drift closer
together. Eventually, other alerting file structs should also move into this new file
package, but the rest require some refactoring that is out of scope for this PR.
* update Delete and Reset methods to return state transitions
this will be used by notifier code to decide whether alert needs to be sent or not.
* update scheduler to provide reason to delete states and use transitions
* update FromAlertsStateToStoppedAlert to accept StateTransition and filter by old state
* fixup
* fix tests
* Add field in alert_rule model, add state to alert_instance model, and state to eval
* Remove paused state from eval package
* Skip paused alert rules in scheduler
* Add migration to add is_paused field to alert_rule table
* Convert to postable alerts only if not normal, pernding, or paused
* Handle paused eval results in state manager
* Add Paused state to eval package
* Add paused alerts logic in scheduler
* Skip alert on scheduler
* Remove paused status from eval package
* Apply suggestions from code review
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Remove state
* Rethink schedule and manager for paused alerts
* Change return to continue
* Remove unused var
* Rethink alert pausing
* Paused alerts storing annotations
* Only add one state transition
* Revert boolean method renaming refactor
* Revert take image refactor
* Make registry errors public
* Revert method extraction for getting a folder title
* Revert variable renaming refactor
* Undo unnecessary changes
* Revert changes in test
* Remove IsPause check in PatchPartiLAlertRule function
* Use SetNormal to set state
* Fix text by returning to old behaviour on alert rule deletion
* Add test in schedule_unit_test.go to test ticks with paused alerts
* Add coment to clarify usage of context.Background()
* Add comment to clarify resetStateByRuleUID method usage
* Move rule get to a more limited scope
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: George Robinson <george.robinson@grafana.com>
* rum gofmt on pkg/services/ngalert/schedule/schedule.go
* Remove defer cancel for context
* Update pkg/services/ngalert/models/instance_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Update pkg/services/ngalert/models/testing.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Update pkg/services/ngalert/schedule/schedule_unit_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Update pkg/services/ngalert/schedule/schedule_unit_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Update pkg/services/ngalert/models/instance_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* skip scheduler rule state clean up on paused alert rule
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Fix mock in test
* Add (hopefully) final suggestions
* Use error channel from recordAnnotationsSync to cancel context
* Run make gen-cue
* Place pause alert check in channel update after version check
* Reduce branching un update channel select
* Add if for error and move code inside if in state manager ResetStateByRuleUID
* Add reason to logs
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Do not delete alert rule routine, just exit on eval if is paused
* Reduce branching and create-close a channel to avoid deadlocks
* Separate state deletion and state reset (includes history saving)
* Add current pause state in rule route in scheduler
* Split clearState and bring errCh closer to RecordStatesAsync call
* Change rule to ruleMeta in RecordStatesAsync
* copy state to be able to modify it
* Add timeout to context creation
* Shorten the timeout
* Use resetState is rule is paused and deleteState if rule is not paused
* Remove Empty state reason
* Save every rule change in historian
* Add tests for DeleteStateByRuleUID and ResetStateByRuleUID
* Remove useless line
* Remove outdated comment
Co-authored-by: George Robinson <george.robinson@grafana.com>
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>
* Alerting: Fix Test Receivers when settings are non-strings
As part of the Alerting extraction, we want to make sure we don't have circular depedencies. As such, I had to move `PostableGrafanaReceiver` to a new struct in `grafana/alerting` called `GrafanaReceiver`.
`PostableGrafanaReceiver` has an attribute called `Settings` that uses a Grafana-propietary struct called `RawMessage`, this struct shadows `json.RawMessage`.
When I created `GrafanaReceiver`, I turned settings into a `map[string]string` thinking all settings would end up as strings. This was a mistake, and this test proves that it doesn't work, and breaks the API.
* Access Control: Add folder service dependency to the dashboard/folder resolvers
* Expose the function fetching parents to folder interface
* Add generic prepend utility
* Modify dashboard resolvers to return inherited scopes
* Copy rules instead of accepting pointer
* Deep-copy the rule, for even more guarantees
* Create struct just for needed fields
* Move RuleMeta to historian/model package, iron out package dependencies
* Move tests for dash ID parsing to model package along with code
* Implement push endpoint
* Drop duplicated struct
* Genericize auth/tenant headers and improve logging in error case
* Flesh out the data model
* Drop dead code
* Drop log line entirely
* Drop unused arg
* Rename a few type manipulation functions
* Extract label keys as constants
* Improve logs when loki responds with error
* Inline lokiRepresentation function