This commit adds support for limits and filters to the Prometheus Rules
API.
Limits:
It adds a number of limits to the Grafana flavour of the Prometheus Rules
API:
- `limit` limits the maximum number of Rule Groups returned
- `limit_rules` limits the maximum number of rules per Rule Group
- `limit_alerts` limits the maximum number of alerts per rule
It sorts Rule Groups and rules within Rule Groups such that data in the
response is stable across requests. It also returns summaries (totals)
for all Rule Groups, individual Rule Groups and rules.
Filters:
Alerts can be filtered by state with the `state` query string. An example
of an HTTP request asking for just firing alerts might be
`/api/prometheus/grafana/api/v1/rules?state=alerting`.
A request can filter by two or more states by adding additional `state`
query strings to the URL. For example `?state=alerting&state=normal`.
Like the alert list panel, the `firing`, `pending` and `normal` state are
first compared against the state of each alert rule. All other states are
ignored. If the alert rule matches then its alert instances are filtered
against states once more.
Alerts can also be filtered by labels using the `matcher` query string.
Like `state`, multiple matchers can be provided by adding additional
`matcher` query strings to the URL.
The match expression should be parsed using existing regular expression
and sent to the API as URL-encoded JSON in the format:
{
"name": "test",
"value": "value1",
"isRegex": false,
"isEqual": true
}
The `isRegex` and `isEqual` options work as follows:
| IsEqual | IsRegex | Operator |
| ------- | -------- | -------- |
| true | false | = |
| true | true | =~ |
| false | true | !~ |
| false | false | != |
* Delete folders, dashboards with registry service
Co-authored-by: Serge Zaitsev <hello@zserge.com>
* Update signature of ProvideDashboardServiceImpl
* Regenerate mockery file
* Add test for DeleteInFolder
* Add test for DeleteDashboardsInFolder
* Delete child dashboard associations via registry
* Add validation of folder uid and org id
---------
Co-authored-by: Serge Zaitsev <hello@zserge.com>
* replace receiver errors with one from alerting
* add the converter to alerting models
* update buildReceiverIntegration to accept GrafanaReceiver
---------
Co-authored-by: George Robinson <george.robinson@grafana.com>
* define initial service and add to wire
* update caching service interface
* add skipQueryCache header handler and update metrics query function to use it
* add caching service as a dependency to query service
* working caching impl
* propagate cache status to frontend in response
* beginning of improvements suggested by Lean - separate caching logic from query logic.
* more changes to simplify query function
* Decided to revert renaming of function
* Remove error status from cache request
* add extra documentation
* Move query caching duration metric to query package
* add a little bit of documentation
* wip: convert resource caching
* Change return type of query service QueryData to a QueryDataResponse with Headers
* update codeowners
* change X-Cache value to const
* use resource caching in endpoint handlers
* write resource headers to response even if it's not a cache hit
* fix panic caused by lack of nil check
* update unit test
* remove NONE header - shouldn't show up in OSS
* Convert everything to use the plugin middleware
* revert a few more things
* clean up unused vars
* start reverting resource caching, start to implement in plugin middleware
* revert more, fix typo
* Update caching interfaces - resource caching now has a separate cache method
* continue wiring up new resource caching conventions - still in progress
* add more safety to implementation
* remove some unused objects
* remove some code that I left in by accident
* add some comments, fix codeowners, fix duplicate registration
* fix source of panic in resource middleware
* Update client decorator test to provide an empty response object
* create tests for caching middleware
* fix unit test
* Update pkg/services/caching/service.go
Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>
* improve error message in error log
* quick docs update
* Remove use of mockery. Update return signature to return an explicit hit/miss bool
* create unit test for empty request context
* rename caching metrics to make it clear they pertain to caching
* Update pkg/services/pluginsintegration/clientmiddleware/caching_middleware.go
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
* Add clarifying comments to cache skip middleware func
* Add comment pointing to the resource cache update call
* fix unit tests (missing dependency)
* try to fix mystery syntax error
* fix a panic
* Caching: Introduce feature toggle to caching service refactor (#66323)
* introduce new feature toggle
* hide calls to new service behind a feature flag
* remove licensing flag from toggle (misunderstood what it was for)
* fix unit tests
* rerun toggle gen
---------
Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
Takes a specific code path for data that identifies itself as dataplane instead of "guessing" what the data is.
The data must identify itself by being in the dataplane by having both the following frame metadata properties:
- TypeVersion property that is greater than 0.0
- 'Type' property
The flag is disableSSEDataplane and disables this functionality and uses the old code for all queries regardless.
See https://github.com/grafana/grafana-plugin-sdk-go/blob/main/data/contract_docs/contract.md for dataplane details.
* Alerting: Remove and revert flag alertingBigTransactions
This is a partial revert of #56575 and a removal of the `alertingBigTransactions` flag.
Real-word use has seen no clear performance incentive to maintain this flag. Lowered db connection count
came at the cost of significant increase in CPU usage and query latency.
* Fix lint backend
* Removed last bits of alertingBigTransactions
---------
Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>
* Alerting: Tiny refactor on the eval and schedule packages
two very small things:
- We had a constructor on something called a `Context` which is not a `context.Context` so let's just name that constructor `NewContext`
- The user that we use to run query evaluations is the same (with some variation) abstract it to a function so that it can be re-used when necessary.
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
---------
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
* Alerting: Add endpoint to revert to a previous alertmanager configuration
This endpoint is meant to be used in conjunction with /api/alertmanager/grafana/config/history to
revert to a previously applied alertmanager configuration. This is done by ID instead of raw config
string in order to avoid secure field complications.
* Add fresh context with timeout and same log properties, re-derive logger
* Unify timeout constants
* Move ctx after shortcut that got added through rebasing
* Unify timeouts
* Port opentracing's SpanFromContext and ContextFromSpan to the grafana tracing package
* Support both opentracing and otel variants
* Better document why we're creating a new ctx
* Add new func to FakeSpan which was added after rebase
* Support grafana-specific traceID key in both tracer implementations
This commit adds a number of limits to the Grafana flavor of the
Prometheus Rules API:
1. `limit` limits the maximum number of Rule Groups returned
2. `limit_rules` limits the maximum number of rules per Rule Group
3. `limit_alerts` limits the maximum number of alerts per rule
It sorts Rule Groups and rules within Rule Groups such that data in the
response is stable across requests. It also returns summaries (totals) for
all Rule Groups, individual Rule Groups and rules.
* WIP
* skip invalid historic configurations instead of erroring
* add warning log when bad historic config is found
* remove unused custom marshaller for GettableHistoricUserConfig
* add id to historic user config, move limit check to store, fix typo
* swagger spec
* Alerting: Respect "For" Duration for NoData alerts
This change modifies `resultNoData` to be more inline with the logic of the other state handlers.
The main effects of this are:
1) NoData states with NoDataState config set to Alerting will respect "For" duration.
2) Prevents zero value in StartsAt and EndsAt for alerts that have only even been in normal state. This includes state transitions from NoDataState=OK and ExecErrState=OK.
3) Better state transition logging.
* define 3 feature toggles for rollout phases
* Pass feature toggles along
* Implement first feature toggle
* Try a different strategy with fall-throughs to specific configurations
* Apply toggle overrides once outside of backend composition
* Emit log messages when we coerce backends
* Run code generator for feature toggle files
* Improve wording in flag descs
* Re-run generator
* Use code-generated constants instead of plain strings
* Use converted enum values rather than strings for pre-parsing
* move export rules to definitions package
* move provisioning contact point methods to provisioning package
* move AlertRuleGroupWithFolderTitle to ngalert models and adapter functions to api's compat
* move rule_types files back to where they were before.
* Remove private labels
* No longer index by instance labels
* Labels are now invariant, only build them once
* Remove bucketing since everything is in a single stream
* Refactor statesToStreams to only return a single unified log stream
* Don't query on labels that no longer exist
* Move selector logic to loki layer, genericize client to work in terms of straight logQL
* Add support for line-level label filters in query
* Combine existing selector tests for better parallelism
* Tests for logQL construction
* Underscore instead of dot for unwrapping labels in logql
* Alerting: Add CustomDetails for PagerDuty
* fix default value for 'severity' from 'error' to 'critical'
* minimal docs for notifiers, specifying config for PagerDuty
* replace notifier -> integration
* replace notifier -> integration
* copy AlertQuery from ngmodels to the definition package
* replaces usages of ngmodels.AlertQuery in API models
* create a converter between models of AlertQuery
---------
Co-authored-by: Alex Moreno <alexander.moreno@grafana.com>
* Encode with snappy, always
* JSON encoder type
* Headers
* Copy labels formatter from promtail
* Implement snappy-proto encoding
* Create encoder interface, test both encoders, choose snappy-proto by default
* Make encoder configurable at the LokiCfg level
* Export both encoders
* Touch up comment and tests
* Drop unnecessary conversions after move to plain strings to appease linter
* Rename RecordStatesAsync to Record
* Rename QueryStates to Query
* Implement fanout writes
* Implement primary queries
* Simplify error joining
* Add test for query path
* Add tests for writes and error propagation
* Allow fanout backend to be configured
* Touch up log messages and config validation
* Consistent documentation for all backend structs
* Parse and normalize backend names more consistently against an enum
* Touch-ups to documentation
* Improve clarity around multi-record blocking
* Keep primary and secondaries more distinct
* Rename fanout backend to multiple backend
* Simplify config keys for multi backend mode
* stop using the scheduler's Update and Delete methods all communication must be via the database
* update scheduler's registry to calculate diff before re-setting the cache
* update fetcher to return the diff generated by registry
* update processTick to update rule eval routine if the rule was updated and it is not going to be evaluated at this tick.
* remove references to the scheduler from api package
* remove unused methods in the scheduler
This commit changes the state package so that errors encountered while
expanding templates for custom labels and annotations are returned
from the function. This is not used at present, but will be used in the
future as we look at how to offer better feedback to users who don't
have access to logs, for example our customers who use Hosted Grafana.
This commit fixes a bug in the $values variable in notification
templates when using Classic Conditions. Since Classic Conditions
are not multi-dimensional, the values of each series that exceeded
the condition should be available as a RefID and offset. For example,
B0, B1, etc. However, this bug meant that instead just a single
condition would be printed as B, not B0.
* Create historian metrics and dependency inject
* Record counter for total number of state transitions logged
* Track write failures
* Track current number of active write goroutines
* Record histogram of how long it takes to write history data
* Don't copy the registerer
* Adjust naming of write failures metric
* Introduce WritesTotal to complement WritesFailedTotal
* Measure TransitionsFailedTotal to complement TransitionsTotal
* Rename all to state_history
* Remove redundant Total suffix
* Increment totals all the time, not just on success
* Drop ActiveWriteGoroutines
* Drop PersistDuration in favor of WriteDuration
* Drop unused gauge
* Make writes and writesFailed per org
* Add metric indicating backend and a spot for future metadata
* Drop _batch_ from names and update help
* Add metric for bytes written
* Better pairing of total + failure metric updates
* Few tweaks to wording and naming
* Record info metric during composition
* Create fakeRequester and simple happy path test using it
* Blocking test for the full historian and test for happy path metrics
* Add tests for failure case metrics
* Smoke test for full annotation persistence
* Create test for metrics on annotation persistence, both happy and failing paths
* Address linter complaints
* More linter complaints
* Remove unnecessary whitespace
* Consistency improvements to help texts
* Update tests to match new descs
* Alerting: Add metrics for active receiver and integrations
Introduces metrics that allows us to track the number of configured receivers and integration in the Alertmanager for all orgs.
As a bonus, I realised that the alert reception metrics where not being exported nor collected. This does that too.
* revert to using folder store from the resolvers
* fixing tests after revert
* api test fixes
---------
Co-authored-by: Kristin Laemmert <mildwonkey@users.noreply.github.com>
* chore(services): replace dependencies on dashboard store with dashboard service
This continues the backend service/store split by replacing dashboard store dependencies with service dependencies. the folder service remains the single exception for now; otherwise we'd have a dependency cycle between the folder and dashboard services. I have some ideas for that, but I'll take care of all the easy parts first.
While doing this, I identified and removed a number of unused arguments from the following functions:
NewFolderNameScopeResolver
NewFolderIDScopeResolver
NewFolderUIDScopeResolver
NewDashboardIDScopeResolver
NewDashboardUIDScopeResolver
resolveDashboardScope
I have a small enterprise PR to support this commit.
* lingering fmt
* Loki backend and client depend on a requester
* Instrument all requests to loki using weaveworks TimedClient
* Construct collector in metrics package
This commit fixes an incorrect comment in the Result struct in eval.go
that I had written some time ago. The comment now documents the
actual behaviour and content of this field.
* Alerting: get alert rules on faults (#61248)
Two functions used to fetch alert rules from DB are updated:
- GetAlertRulesForScheduling
- ListAlertRules
Rows are scanned one by one so good ones are returned.
Common Error is logged with indication how many
rules failed on deserialization.
Resolved: #61248
* updates from review comments
This commit adds filterLabels, filterLabelsRe, removeLabels, and
removeLabelsRe functions to templates for custom labels and annotations.
It allows for use cases such as removing all private labels.
This commit changes the Data struct in template.go to use Labels
instead of map[string]string. It changes how labels are printed
when using {{ .Labels }} from map[foo:bar bar:baz] to
foo=bar, bar=baz.
This commit changes how labels are printed in templates for custom
annotations and labels from map[foo:bar bar:baz] to foo=bar, bar=baz.
Labels are comma separated, and sorted in increasing order.
This commit moves templating from the state package to a sub-package
called template. This sub-package will be the logical package for
future ease-of-use improvements to templating custom annotations
and labels.
* Use existing row struct instead of [2]string, add deserialization helper
* Replace Stream struct with stream struct which is exactly the same
* Drop unused status field
* Don't export queryRes and queryData
* Tests for custom marshalling
* Rename row fields to T and V for consistency with prometheus samples
* Rename row to sample
The `rule_groups_rules` metric is currently defined and computed by `State`.
It makes more sense for this metric to be computed off of the configured rule
set, not based on the rule evaluation state. There could be an edge condition
where a rule does not have a state yet, and so is uncounted.
Additionally, we would like this metric (and others), to have a `rule_group`
label, and this is much easier to achieve if the metric is produced from the
`Scheduler` package.
* Remove Result field from AddDataSourceCommand
* Remove DatasourcesPermissionFilterQuery Result
* Remove GetDataSourceQuery Result
* Remove GetDataSourcesByTypeQuery Result
* Remove GetDataSourcesQuery Result
* Remove GetDefaultDataSourceQuery Result
* Remove UpdateDataSourceCommand Result
* change error to warning, apply correct format
* Update pkg/services/ngalert/store/alertmanager.go
Co-authored-by: George Robinson <george.robinson@grafana.com>
* different wording and data
---------
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Alerting: implement loki query for alert state history
* extract selector building
* add unit tests for selector creation
* backup
* give selectors their own type
* build dataframe
* add some tests
* small changes after manual testing
* use struct client
* golint
* more golint
* Make RuleUID optional for Loki implementation
* Drop initial assumption that we only have one series
* Pare down to three columns, fix timestamp overflows, improve failure cases in loki responses
* Embed structred log lines in the dataframe as objects rather than json strings
* Include state history label filter
* Remove dead code
---------
Co-authored-by: Alex Weaver <weaver.alex.d@gmail.com>
* Alerting: Fix template validation in provisioning api
Fix issue where provisioning API accepts a malformed template having extra
text outside of definition block and template name matching definition name.
* Mark AM configuration as applied
* add missing checks, make linter happy
* fix deadlock, mark as valid on save and on load
* mark configurations only if needed
* check error after applyConfig()
* code review comments
* code review changes
* more code review changes
* clean HistoricConfigFromAlertConfig function
* Define endpoint and generate
* Wire up and register endpoint
* Cleanup, define authorization
* Forgot the leading slash
* Wire up query and SignedInUser
* Wire up timerange query params
* Add todo for label queries
* Drop comment
* Update path to rules subtree
* Extract label merge, add test file
* Extract error/NoData to first class fields, remove a layer from values
* Include dashUID and panelID as line-level fields
* Drop unnecessary object receiver
* Add tests for stream building
* Drop NoData field from log lines
* Set YAML as default value for exporting alert rules
* use YAML format for rule list export
Co-authored-by: Sonia Aguilar <33540275+soniaAguilarPeiron@users.noreply.github.com>
* lint
* Add new format query param to swagger+docs
* Fix broken test
---------
Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
Co-authored-by: Matt Jacobson <matthew.jacobson@grafana.com>
This change exposes more metrics from the embedded Alertmanager, which are
valuable for troubleshooting Alertmanager operation particularly in HA setups.
```
grafana_alerting_notifications_total
grafana_alerting_notifications_failed_total
grafana_alerting_notification_requests_total
grafana_alerting_notification_requests_failed_total
grafana_alerting_notification_latency_seconds
grafana_alerting_nflog_gc_duration_seconds
grafana_alerting_nflog_snapshot_duration_seconds
grafana_alerting_nflog_snapshot_size_bytes
grafana_alerting_nflog_queries_total
grafana_alerting_nflog_query_errors_total
grafana_alerting_nflog_query_duration_seconds
grafana_alerting_nflog_gossip_messages_propagated_total
grafana_alerting_dispatcher_aggregation_groups
grafana_alerting_dispatcher_alert_processing_duration_seconds
```
Note that `alertmanager_dispatcher_aggregation_group_limit_reached_total` is
explicitly not exposed, as the group limit metrics are not enabled.
* Add is_paused attr to the POST alert rule group endpoint
* Add is_paused to alerting API POST alert rule group
* Fixed tests
* Add is_paused to alerting gettable endpoints
* Fix integration tests
* Alerting: allow to pause existing rules (#62401)
* Display Pause Rule switch in Editing Rule form
* add isPaused property to form interface and dto
* map isPaused prop with is_paused value from DTO
Also update test snapshots
* Append '(Paused)' text on alert list state column when appropriate
* Change Switch styles according to discussion with UX
Also adding a tooltip with info what this means
* Adjust styles
* Fix alignment and isPaused type definition
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
* Fix test
* Fix test
* Fix RuleList test
---------
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
* wip
* Fix tests and add comments to clarify AlertRuleWithOptionals
* Fix one more test
* Fix tests
* Fix typo in comment
* Fix alert rule(s) cannot be paused via API
* Add integration tests for alerting api pausing flow
* Remove duplicated integration test
---------
Co-authored-by: Virginia Cepeda <virginia.cepeda@grafana.com>
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Fix json serialization of state values
* Simplify two of the tests
* Fix linter complaint
* Don't return error if we fail to look up dashboard, just log it and move on
* Address linter complaint
* Use suggested value for uid
* update the snapshot
* use __expr__
* replace all -100 with __expr__
* update snapshot
* more changes
* revert redundant change
* Use expr.DatasourceUID where it's possible
* generate files
* Allow pausing alerts from provisioning
* Update swagger
* Add IsPaused to provision export endpoints
* Add pause field in sample.yml
* Add exception for reset state in first loop iteration of scheduler if rule is paused
* Update provision definition and swagger docs
* Fix provisioning export tests
* Suggestion: Simplify if condition
* Add more context to a comment
* add nested folder scope inheritance to managed permission services
* add a more specific erorr
* remove circular dependencies
* use errutil for returning erorr
* fix tests
* fix tests
* define a new error in ac package
This adds provisioning endpoints for downloading alert rules and alert rule groups in a
format that is compatible with file provisioning. Each endpoint supports both json and
yaml response types via Accept header as well as a query parameter
download=true/false that will set Content-Disposition to recommend initiating a download
or inline display.
This also makes some package changes to keep structs with potential to drift closer
together. Eventually, other alerting file structs should also move into this new file
package, but the rest require some refactoring that is out of scope for this PR.
* update Delete and Reset methods to return state transitions
this will be used by notifier code to decide whether alert needs to be sent or not.
* update scheduler to provide reason to delete states and use transitions
* update FromAlertsStateToStoppedAlert to accept StateTransition and filter by old state
* fixup
* fix tests
* Add field in alert_rule model, add state to alert_instance model, and state to eval
* Remove paused state from eval package
* Skip paused alert rules in scheduler
* Add migration to add is_paused field to alert_rule table
* Convert to postable alerts only if not normal, pernding, or paused
* Handle paused eval results in state manager
* Add Paused state to eval package
* Add paused alerts logic in scheduler
* Skip alert on scheduler
* Remove paused status from eval package
* Apply suggestions from code review
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Remove state
* Rethink schedule and manager for paused alerts
* Change return to continue
* Remove unused var
* Rethink alert pausing
* Paused alerts storing annotations
* Only add one state transition
* Revert boolean method renaming refactor
* Revert take image refactor
* Make registry errors public
* Revert method extraction for getting a folder title
* Revert variable renaming refactor
* Undo unnecessary changes
* Revert changes in test
* Remove IsPause check in PatchPartiLAlertRule function
* Use SetNormal to set state
* Fix text by returning to old behaviour on alert rule deletion
* Add test in schedule_unit_test.go to test ticks with paused alerts
* Add coment to clarify usage of context.Background()
* Add comment to clarify resetStateByRuleUID method usage
* Move rule get to a more limited scope
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: George Robinson <george.robinson@grafana.com>
* rum gofmt on pkg/services/ngalert/schedule/schedule.go
* Remove defer cancel for context
* Update pkg/services/ngalert/models/instance_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Update pkg/services/ngalert/models/testing.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Update pkg/services/ngalert/schedule/schedule_unit_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Update pkg/services/ngalert/schedule/schedule_unit_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Update pkg/services/ngalert/models/instance_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* skip scheduler rule state clean up on paused alert rule
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Fix mock in test
* Add (hopefully) final suggestions
* Use error channel from recordAnnotationsSync to cancel context
* Run make gen-cue
* Place pause alert check in channel update after version check
* Reduce branching un update channel select
* Add if for error and move code inside if in state manager ResetStateByRuleUID
* Add reason to logs
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Do not delete alert rule routine, just exit on eval if is paused
* Reduce branching and create-close a channel to avoid deadlocks
* Separate state deletion and state reset (includes history saving)
* Add current pause state in rule route in scheduler
* Split clearState and bring errCh closer to RecordStatesAsync call
* Change rule to ruleMeta in RecordStatesAsync
* copy state to be able to modify it
* Add timeout to context creation
* Shorten the timeout
* Use resetState is rule is paused and deleteState if rule is not paused
* Remove Empty state reason
* Save every rule change in historian
* Add tests for DeleteStateByRuleUID and ResetStateByRuleUID
* Remove useless line
* Remove outdated comment
Co-authored-by: George Robinson <george.robinson@grafana.com>
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>
* Alerting: Fix Test Receivers when settings are non-strings
As part of the Alerting extraction, we want to make sure we don't have circular depedencies. As such, I had to move `PostableGrafanaReceiver` to a new struct in `grafana/alerting` called `GrafanaReceiver`.
`PostableGrafanaReceiver` has an attribute called `Settings` that uses a Grafana-propietary struct called `RawMessage`, this struct shadows `json.RawMessage`.
When I created `GrafanaReceiver`, I turned settings into a `map[string]string` thinking all settings would end up as strings. This was a mistake, and this test proves that it doesn't work, and breaks the API.
* Access Control: Add folder service dependency to the dashboard/folder resolvers
* Expose the function fetching parents to folder interface
* Add generic prepend utility
* Modify dashboard resolvers to return inherited scopes
* Copy rules instead of accepting pointer
* Deep-copy the rule, for even more guarantees
* Create struct just for needed fields
* Move RuleMeta to historian/model package, iron out package dependencies
* Move tests for dash ID parsing to model package along with code
* Implement push endpoint
* Drop duplicated struct
* Genericize auth/tenant headers and improve logging in error case
* Flesh out the data model
* Drop dead code
* Drop log line entirely
* Drop unused arg
* Rename a few type manipulation functions
* Extract label keys as constants
* Improve logs when loki responds with error
* Inline lokiRepresentation function
This commit renames "Message templates" to "Notification templates"
in the user interface as it suggests that these templates cannot
be used to template anything other than the message. However, message
templates are much more general and can be used to template other fields
too such as the subject of an email, or the title of a Slack message.
* Create loki client type and ping method
* Expose TestConnection on client
* Configure and ping Loki URL
* Close response body reader if present
* Add 30 second timeout
* Remove duplicate close
* add feature flag `alertingNoNormalState`
* update instance database to support exclusion of state in list operation
* do not save normal state and delete transitions to normal
* update get methods to filter out normal state
This commit adds a customizable timeout for screenshots called
capture_timeout. The default value is 10 seconds, and the maximum
value is 30 seconds. This timeout should be less than the minimum
Interval of all Evaluation Groups to avoid back pressure on alert
rule evaluation.
* Update config store to split between active and history tables
* Migrations to fix up indexes
* Implement migration from old format to new
* Move add migrations call
* Delete duplicated rows
* Explicitly map fields
* Quote the column name because it's a reserved word
* Lift migrations to top
* Use XORM for nearly everything, avoid any non trivial raw SQL
* Touch up indexes and zero out IDs on move
* Drop TODO that's already completed
* Fix assignment of IDs
Automatically forward core plugin request HTTP headers in outgoing HTTP requests.
Core datasource plugin authors don't have to specifically handle forwarding of HTTP
headers, e.g. do not have to "hardcode" the header-names in the datasource plugin,
if not having custom needs.
Fixes#57065
* refactor email to not use simplejson
* add tests
* split integration test and unit test + more unit-tests
* Remove outdated comment
Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>
* introduce alias for json.RawMessage with name RawMessage. This is needed to keep raw JSON and implement a marshaler for YAML, which does not seem to be used but there are tests that fail.
* replace usage of simplejson with RawMessage in NotificationChannelConfig
* remove usage of simplejson in tests
* change migration code to convert simplejson to raw message
* Set Dashboard and Panel IDs on rule group replacement
* fix comments and abbreviate test variable name
* Update pkg/services/ngalert/provisioning/alert_rules.go
Co-authored-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com>
Co-authored-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com>
* Update config store to split between active and history tables
* Migrations to fix up indexes
* Implement migration from old format to new
* Move add migrations call
* Delete duplicated rows
* Explicitly map fields
* Quote the column name because it's a reserved word
* Lift migrations to top
* introduce Logger interface local to channles + implementaton that wraps the Grafana logger
* make NewFactoryConfig accept LoggerFactory
* add logger field to FactoryConfig
* update usages of log.Logger to internal interface
* Guardian: Use dashboard UID instead of ID
* Apply suggestions from code review
Introduce several guardian constructors and each time use
the most appropriate one.
* Implement backtesting engine that can process regular rule specification (with queries to datasource) as well as special kind of rules that have data frame instead of query.
* declare a new API endpoint and model
* add feature toggle `alertingBacktesting`
* Move truncation code to util to mirror upstream
* Resolve merge conflicts
* Align logging of alert key
* Update tests and fix field passing bug
* Remove superfluous newline in test now that we trim whitespace
* Uptake minor log changes from upstream
This change marks tests in the `sender` package that use an external
process as integration tests instead of unit tests. This speeds up the
package's unit tests by about 20 seconds.
This change also reduces the number of alert instances in the `store`
package's bulk write integration test from 20_000 to 10_000. This is
still enough to exercise the bulk-write code but speeds up the package
tests from about 250s to 130s.
Put together, integration tests go to about 160s while also speeding up
unit tests by 20s.
This commit better defines how we set states in resultNormal,
resultAlerting, resultError and resultNoData. It changes the existing
code to call methods such as SetAlerting, SetPending, SetNormal,
SetError and NoData instead of assigning values to each individual field
whenever the state is changed. This should make it easier to understand
what fields should be set for which states and avoid cases where states are
missing, or have additional unexpected fields.
Before this change, the alerting provisioning system incorrectly used
the QuotaTarget to check if alerting's request quota had been reached.
The quota service requires the QuotaTargetSrv, which is what's
registered with the service at startup time. This is leading to errors
in the provisioning system.