* Alerting: Rename remote.ExternalAlertmanager to remote.Alertmanager
* Alerting: Send alerts to the remote Alertmanager
* add ticker to readiness check, add tests
* use options when creating a new sender.ExternalAlertmanager
* unexport defaultMaxQueueCapacity
* delete unused defaultConfig field
* add debug log line when sending alerts to the remote alertmanager
* move and refactor readiness check
* update tests to not include defaultConfig
* Alerting: Move `ExternalAlertmanager` to its own package
We'll avoid import cycles when using components from other packages. In addition to that, I've created an `Options` approach for the multi-org alertmanager to allow us to override how per-tenant alertmanagers are created.
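A minimal sketch of that `Options` approach, using hypothetical names (`Alertmanager`, `AlertmanagerFactory`, `MultiOrgAlertmanager`) rather than Grafana's actual types:

```go
package notifier

// All names in this sketch are hypothetical; it only illustrates the
// functional-options pattern used to override how per-org Alertmanagers
// are created.

// Alertmanager is a minimal stand-in for the per-org Alertmanager interface.
type Alertmanager interface {
	Ready() bool
}

// AlertmanagerFactory creates the Alertmanager for a given org.
type AlertmanagerFactory func(orgID int64) (Alertmanager, error)

// MultiOrgAlertmanager owns one Alertmanager per org.
type MultiOrgAlertmanager struct {
	factory AlertmanagerFactory
}

// Option mutates the MultiOrgAlertmanager during construction.
type Option func(*MultiOrgAlertmanager)

// WithAlertmanagerFactory overrides how per-org Alertmanagers are created,
// e.g. to substitute a remote Alertmanager for the embedded one.
func WithAlertmanagerFactory(f AlertmanagerFactory) Option {
	return func(moa *MultiOrgAlertmanager) { moa.factory = f }
}

// NewMultiOrgAlertmanager applies any overrides on top of the default factory.
func NewMultiOrgAlertmanager(defaultFactory AlertmanagerFactory, opts ...Option) *MultiOrgAlertmanager {
	moa := &MultiOrgAlertmanager{factory: defaultFactory}
	for _, opt := range opts {
		opt(moa)
	}
	return moa
}
```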
* switch things around
* address review comments
* fix references and warnings
* Alerting: Move migration from background service run to ngalert init
SQLite database write contention between the migration's single transaction and
dashboard provisioning's frequent commits was causing the migration to
fail with SQLITE_BUSY/SQLITE_BUSY_SNAPSHOT on all retries.
This is not a new issue for sqlite+grafana, but the discrepancy between the
length of the transactions was causing the failure to be very consistent. In addition,
since a failed migration has implications on the assumed correctness of the
alertmanager and alert rule definition state, we cause a server shutdown on
error. This can make e2e tests as well as some high-load provisioned
sqlite installations flaky on startup.
The correct fix for this is better transaction management across various
services and is out of scope for this change as we're primarily interested in
mitigating the current bout of server failures in e2e tests when using sqlite.
* Alerting: post alerts to the remote Alertmanager and fetch them
* fix broken tests
* Alerting: Add Mimir Backend image to devenv (blocks)
* add alerting as code owner for mimir_backend block
* Alerting: Use Mimir image to run integration tests for the remote Alertmanager
* skip integration test when running all tests
* skipping integration test when no Alertmanager URL is provided
* fix bad host for mimir_backend
* remove basic auth testing until we have an nginx image in our CI
* add integration tests for alerts
* fix tests
* change SendCtx -> Send, add context.Context to Send, fix CI
* add recover() for functions from the Prometheus Alertmanager HTTP client that could panic
* add TODO to implement PutAlerts in a way that mimics what Prometheus does
* fix log format
* Alerting: Use Mimir image to run integration tests for the remote Alertmanager
* skip integration test when running all tests
* skipping integration test when no Alertmanager URL is provided
* fix bad host for mimir_backend
* remove basic auth testing until we have an nginx image in our CI
* Fix migration of custom dashboard permissions
Dashboard alert permissions were determined by both its dashboard and
folder scoped permissions, while UA alert rules only have folder
scoped permissions.
This means, when migrating an alert, we'll need to decide if the parent folder
is a correct location for the newly created alert rule so that users, teams,
and org roles have the same access to it as they did in legacy.
To do this, we translate both the folder and dashboard resource
permissions to two sets of SetResourcePermissionCommands. Each of these
encapsulates a mapping of all:
OrgRoles -> Viewer/Editor/Admin
Teams -> Viewer/Editor/Admin
Users -> Viewer/Editor/Admin
When the dashboard permissions (including those inherited from the parent
folder) differ from the parent folder permissions alone, we need to create a
new folder to represent the access-level of the legacy dashboard.
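A simplified sketch of that comparison; the `permissionSet` type and `needsNewFolder` helper are illustrative, not the migration's actual types:

```go
package migration

// permissionSet maps a grantee (an org role, team, or user, identified here
// by a plain string key) to the highest permission it holds,
// e.g. "user:7" -> "Editor".
type permissionSet map[string]string

// needsNewFolder reports whether the dashboard's effective permissions
// (including those inherited from the parent folder) differ from the parent
// folder's permissions alone. When they differ, the migration creates a new
// folder that reproduces the dashboard's access level.
func needsNewFolder(folderPerms, dashboardPerms permissionSet) bool {
	if len(folderPerms) != len(dashboardPerms) {
		return true
	}
	for grantee, perm := range dashboardPerms {
		if folderPerms[grantee] != perm {
			return true
		}
	}
	return false
}
```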
Compromises:
When determining the SetResourcePermissionCommands we only take into account
managed and basic roles. Fixed and custom roles introduce significant complexity
and synchronicity hurdles. Instead, we log a warning that they had the potential to
override the newly created folder permissions.
Also, we don't attempt to reconcile datasource permissions that were
not necessary in legacy alerting. Users without access to the necessary
datasources to edit an alert rule will need to obtain said access separate from
the migration.
This PR replaces the vendored models in the migration with their equivalent ngalert models. It also replaces the raw SQL selects and inserts with service calls.
It also fills in some gaps in the testing suite around:
- Migration of alert rules: verifying that the actual data model (queries, conditions) is correct (9a7cfa9)
- Secure settings migration: verifying that secure fields remain encrypted for all available notifiers and that certain fields migrate from plain text to encrypted secure settings correctly (e7d3993)
The checks for custom dashboard ACLs will be replaced in a separate, targeted PR, as that change is complex enough on its own.
* update the storage method InsertRules to return the IDs of added rules as a slice, keeping the same order as the rules in the argument
* schematize the response of the update rule group endpoint; add created, updated, and deleted fields that contain the UIDs of affected rules.
* update integration tests to use the new fields
* extend RuleStore interface to get namespace by UID
* add new export API endpoints
* implement request handlers
* update authorization and wire handlers to paths
* add folder error matchers to errorToResponse
* add tests for export methods
* Alerting: Expose metrics for Alertmanager Alerts
In Grafana, alert evaluation and alert delivery are combined. We've always used a metric named `grafana_alerting_alerts` to get a sense of which alerts are currently firing (these come from the evaluation side) and opted not to map the Alertmanager alerts metric directly.
I think it's important that we make a distinction between alerts that happen at evaluation vs. alerts that are received for delivery by the internal Alertmanager, as we have options to skip the delivery of these alerts to the internal Alertmanager altogether.
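To illustrate the distinction, a rough sketch of registering the two signals separately with the Prometheus client; the second metric's name and labels are assumptions, not necessarily what Grafana exposes:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	// Evaluation side: alerts currently firing, by state.
	evalAlerts = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Namespace: "grafana",
		Subsystem: "alerting",
		Name:      "alerts",
		Help:      "How many alerts by state (evaluation side).",
	}, []string{"state"})

	// Delivery side: alerts received by the internal Alertmanager.
	// Name and labels are hypothetical.
	alertmanagerAlertsReceived = prometheus.NewCounterVec(prometheus.CounterOpts{
		Namespace: "grafana",
		Subsystem: "alerting",
		Name:      "alertmanager_alerts_received_total",
		Help:      "Alerts received for delivery by the internal Alertmanager.",
	}, []string{"org", "status"})
)

func init() {
	prometheus.MustRegister(evalAlerts, alertmanagerAlertsReceived)
}
```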
* Migrate old alerting templates to use $labels
* Fix imports
* Add test coverage and separate rewriting to Go templates
* Fix lint
* Check for additional closing braces
* Add logging of invalid message templates
* Fix tests
* Small fixes
* Update comments
* Panic on empty token
* Use logtest.Fake
* Fix lint
* Allow for spaces in variable names by not tokenizing spaces
* Add template function to deduplicate Labels in a Value map
* Fix behavior of mapLookupString
* Reference deduplicated labels in migrated message template
* Fix behavior of deduplicateLabelsFunc
* Don't create variable for parent logger
* Add more tests for deduplicateLabelsFunc
* Remove unused function
* Apply suggestions from code review
Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
* Give label val merge function better name
* Extract template migration and escape literal tokens
* Consolidate + simplify template migration
---------
Co-authored-by: William Wernert <william.wernert@grafana.com>
* Alerting: Manage remote Alertmanager silences
* fix typo
* check errors when encoding json in fake external AM
* take path from configured URL, check for nil responses
* Alerting: Don't use a separate collection system for metrics
The state package had a metric collection system that ran every 15s to update the values of the metrics; there is a common pattern for this in the Prometheus ecosystem called "collectors".
I have removed the behaviour of using a time-based interval to "set" the metrics in favour of a set of functions that provide the "value" and are called at scrape time.
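A minimal sketch of the collector-style approach, using `prometheus.NewGaugeFunc` so the value is computed at scrape time; the cache type and metric name are illustrative:

```go
package state

import (
	"sync"

	"github.com/prometheus/client_golang/prometheus"
)

// cache is a stand-in for the state manager's alert instance cache.
type cache struct {
	mtx       sync.Mutex
	instances map[string]struct{}
}

func (c *cache) count() int {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	return len(c.instances)
}

// registerMetrics registers a gauge whose value is computed at scrape time,
// instead of being "set" by a background ticker every 15s.
func registerMetrics(r prometheus.Registerer, c *cache) {
	r.MustRegister(prometheus.NewGaugeFunc(prometheus.GaugeOpts{
		Namespace: "grafana",
		Subsystem: "alerting",
		Name:      "state_cache_instances", // hypothetical name
		Help:      "Number of alert instances currently held in the state cache.",
	}, func() float64 {
		return float64(c.count())
	}))
}
```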
* Add support for `keep_firing_for` in ruler proxy
* Don't delete `keep_firing_for` when editing a rule with the field set
Co-Authored-By: Sonia Aguilar <33540275+soniaAguilarPeiron@users.noreply.github.com>
---------
Co-authored-by: Sonia Aguilar <33540275+soniaAguilarPeiron@users.noreply.github.com>
Changes SSE so that it no longer fails all queries when one fails. Now only the failing query, and the nodes that depend on it, will error.
---------
Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
* Make identity.Requester available at Context
* Clean pkg/services/guardian/guardian.go
* Clean guardian provider and guardian AC
* Clean pkg/api/team.go
* Clean ctxhandler, datasources, plugin and live
* Clean dashboards and guardian
* Implement NewUserDisplayDTOFromRequester
* Change status code numbers for http constants
* Upgrade signature of ngalert services
* log parsing errors instead of throwing error
* Make identity.Requester available at Context
* Clean pkg/services/guardian/guardian.go
* Clean guardian provider and guardian AC
* Clean pkg/api/team.go
* Clean ctxhandler, datasources, plugin and live
* Question: what to do with the UserDisplayDTO?
* Clean dashboards and guardian
* Remove identity.Requester from ReqContext
* Implement NewUserDisplayDTOFromRequester
* Fix tests
* Change status code numbers for http constants
* Upgrade signature of ngalert services
* log parsing errors instead of throwing error
* Fix tests and add logs
* linting
* add metrics and tracing to state manager
* propagate tracer to state manager
* add scheduler metrics
* fix backtesting
* add test for state metrics
* remove StateUpdateCount
* update docs
* metrics can be null
* add tracer to new tests
* make discord url secure
* support migrating unsecure settings to secure settings
* Update public/app/features/alerting/unified/utils/receiver-form.ts
Co-authored-by: William Wernert <william.wernert@grafana.com>
---------
Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
Co-authored-by: William Wernert <william.wernert@grafana.com>
* introduce a new action "alert.provisioning.secrets:read" and role "fixed:alerting.provisioning.secrets:reader"
* update alerting API authorization layer to let the user read provisioning with the new action
* let new action use decrypt flag
* add action and role to docs
* calculate cacheID instead of literals
* use mocked clocks
* advance clocks with the eval results
* use clearer timestamp aliases
* make expected state labels be more clear to read
Co-authored-by: Matthew Jacobson <matthew.jacobson@grafana.com>
* add folder data migration, fix unique index
* fix unique index
* pass a fake store in tests
* pass store into other providers in tests
* and now with alerting!
* Alerting: Fix contact point testing with secure settings
Fixes double encryption of secure settings during contact point testing and removes code duplication
that helped cause the drift between alertmanager and test endpoint. Also adds integration tests to cover
the regression.
Note: provisioningStore is created to remove cycle and the unnecessary dependency.
* Expose library element service's folder service
* Register library panels, add count implementation
* Expand folder counts test
* Update registry deletion method interface
* Allow getting library elements from any folder
* Add test for library panel deletion
* Add test for library panel counting
* introduce a function checkIfSeriesNeedToBeFixed to scan all value fields in the response and provide a function that updates Series so they can be uniquely identifiable. Only Graphite and TestData are checked.
* update `convertDataFramesToResults` to run this function and provide it to WideToMany
* update WideToMany to run the fix function if it is not nil
This commit updates eval.go to improve the performance of matching
captures in the general case. In some cases we have reduced the
runtime of the function from tens of minutes to a few hundred milliseconds.
In the case where no capture matches the exact labels, we revert to
the current subset/superset match, but with a reduced search space
due to grouping captures.
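A heavily simplified sketch of the idea: look up captures by an exact label fingerprint first and only fall back to the subset match on a miss. Types, the fingerprint function, and `matchCapture` are illustrative, not Grafana's implementation:

```go
package eval

import (
	"sort"
	"strings"
)

// Labels is a simplified label set.
type Labels map[string]string

// capture pairs a label set with its captured value.
type capture struct {
	Labels Labels
	Value  float64
}

// fingerprint builds a canonical key for an exact label set. A real
// implementation would use a stable hash; concatenation keeps the sketch short.
func fingerprint(l Labels) string {
	keys := make([]string, 0, len(l))
	for k := range l {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		b.WriteString(k)
		b.WriteByte('=')
		b.WriteString(l[k])
		b.WriteByte(';')
	}
	return b.String()
}

// matchCapture first tries an exact lookup by label fingerprint and only
// falls back to the slower subset match when no exact match exists.
func matchCapture(target Labels, byFingerprint map[string]float64, captures []capture) (float64, bool) {
	if v, ok := byFingerprint[fingerprint(target)]; ok {
		return v, true
	}
	for _, c := range captures {
		subset := true
		for k, v := range c.Labels {
			if target[k] != v {
				subset = false
				break
			}
		}
		if subset {
			return c.Value, true
		}
	}
	return 0, false
}
```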
This commit changes extractEvalString to sort NumberCaptureValues
in ascending order of Var before building the output string. This
means that users will see EvaluationString in a consistent order,
but also make it possible to assert its output in tests.
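A minimal sketch of that sorting behaviour, assuming a simplified capture type (the real type carries more fields):

```go
package eval

import (
	"fmt"
	"sort"
	"strings"
)

// NumberValueCapture is a simplified stand-in for the captured values that
// make up EvaluationString; field names are illustrative.
type NumberValueCapture struct {
	Var   string
	Value float64
}

// extractEvalString sorts captures by Var before formatting, so the resulting
// string is deterministic and can be asserted in tests.
func extractEvalString(captures []NumberValueCapture) string {
	sort.Slice(captures, func(i, j int) bool {
		return captures[i].Var < captures[j].Var
	})
	parts := make([]string, 0, len(captures))
	for _, c := range captures {
		parts = append(parts, fmt.Sprintf("[ var='%s' value=%v ]", c.Var, c.Value))
	}
	return strings.Join(parts, ", ")
}
```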
* introduce a new node-type ML and implement a command outlier that uses ML plugin as a source of data.
* add feature flag mlExpressions that guards the feature
* Alerting: Make ApplyAlertmanagerConfiguration only decrypt/encrypt new/changed secure settings
Previously, ApplyAlertmanagerConfiguration would decrypt and re-encrypt all secure settings. However, this caused re-encrypted secure settings to be included in the raw configuration when applied to the embedded alertmanager, resulting in changes to the hash. Consequently, even if no actual modifications were made, saving any alertmanager configuration triggered an apply/restart and created a new historical entry in the database.
To address the issue, this modifies ApplyAlertmanagerConfiguration, which is called by POST `api/alertmanager/grafana/config/api/v1/alerts`, to decrypt and re-encrypt only new and updated secure settings. Unchanged secure settings are loaded directly from the database without alteration.
We determine whether secure settings have changed based on the following (already in-use) assumption: Only new or updated secure settings are provided via the POST `api/alertmanager/grafana/config/api/v1/alerts` request, while existing unchanged settings are omitted.
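A hedged sketch of the merge logic described above, operating on plain maps rather than the real receiver structures:

```go
package notifier

// mergeSecureSettings keeps the already-encrypted values for settings that
// were not included in the request and encrypts only the newly provided
// plaintext values. Illustrative sketch; the real code operates on
// receiver/integration structs rather than plain maps.
func mergeSecureSettings(existingEncrypted, incomingPlaintext map[string]string, encrypt func(string) (string, error)) (map[string]string, error) {
	merged := make(map[string]string, len(existingEncrypted)+len(incomingPlaintext))
	// Carry over unchanged settings without touching them, so the stored
	// ciphertext (and therefore the configuration hash) stays stable.
	for key, ciphertext := range existingEncrypted {
		merged[key] = ciphertext
	}
	// Encrypt only the settings the caller actually sent.
	for key, plaintext := range incomingPlaintext {
		ciphertext, err := encrypt(plaintext)
		if err != nil {
			return nil, err
		}
		merged[key] = ciphertext
	}
	return merged, nil
}
```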
* Ensure saving a grafana-managed contact point will only send new/changed secure settings
Previously, when saving a grafana-managed contact point, empty string values were transmitted for all unset secure settings. This led to potential backend issues, as it assumed that only newly added or updated secure settings would be provided.
To address this, we now exclude empty ('', null, undefined) secure settings, unless there was a pre-existing entry in secureFields for that specific setting. In essence, this means we only transmit an empty secure setting if a previously configured value was cleared.
* Fix linting
* refactor omitEmptyUnlessExisting
* fixup
---------
Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
* add test for the bug
* update backtesting evaluators to accept a number of evaluations instead of `to`, so the number of evaluations is controlled in one place
* Add limit query parameter
* Drop copy paste comment
* Extend history query limit to 30 days and 250 entries
* Fix history log entries ordering
* Update no history message, add empty history test
---------
Co-authored-by: Konrad Lalik <konrad.lalik@grafana.com>
This commit adds support for concurrent queries when saving alert
instances to the database. This is an experimental feature in
response to some customers experiencing delays between rule evaluation
and sending alerts to Alertmanager, resulting in flapping. It is
disabled by default.
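A minimal sketch of how the concurrent saves could be bounded, assuming an errgroup-based worker pool; the function name and batch shapes are illustrative:

```go
package store

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// AlertInstance is a minimal stand-in for the stored alert instance model.
type AlertInstance struct {
	RuleUID string
	Labels  map[string]string
}

// saveAlertInstances writes batches of alert instances using up to
// maxConcurrency concurrent queries. With maxConcurrency = 1 this degrades to
// the default, sequential behaviour.
func saveAlertInstances(ctx context.Context, batches [][]AlertInstance, maxConcurrency int, save func(context.Context, []AlertInstance) error) error {
	if maxConcurrency < 1 {
		maxConcurrency = 1
	}
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(maxConcurrency)
	for _, batch := range batches {
		batch := batch // capture loop variable
		g.Go(func() error {
			return save(ctx, batch)
		})
	}
	return g.Wait()
}
```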
This commit adds debug logs for previous_ends_at and next_ends_at
to state.go to help us debug issues where alerts are resolved in
Alertmanager due to expiration. This change is in response to a
support escalation where this information was needed but unavailable.
* add NodeTypeFromDatasourceUID and DataSourceModelFromNodeType()
* deprecate expr.DataSourceModel
* replace usages of IsDataSource to NodeTypeFromDatasourceUID
* replace usages of DataSourceModel to DataSourceModelFromNodeType()
* replace condition validation with just structural validation
* validate conditions of only new and updated rules
* add integration tests for rule update\delete API
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Alerting: Repurpose rule testing endpoint to return potential alerts
This feature replaces the existing, no-longer-in-use Grafana ruler testing API endpoint /api/v1/rule/test/grafana. The new endpoint returns a list of potential alerts created by the given alert rule, including built-in and interpolated labels and annotations.
The key priority of this endpoint is that it is intended to be as true as possible to what would be generated by the ruler except that the resulting alerts are not filtered to only Resolved / Firing and ready to be sent.
This means that the endpoint will, among other things:
- Attach static annotations and labels from the rule configuration to the alert instances.
- Attach dynamic annotations from the datasource to the alert instances.
- Attach built-in labels and annotations created by the Grafana Ruler (such as alertname and grafana_folder) to the alert instances.
- Interpolate templated annotations / labels and accept allowed template functions.
* Alerting: Fix unique violation when updating rule group with title chains/cycles
The uniqueness constraint for titles within an org+folder is enforced on every update within a transaction instead of on commit (deferred constraint). This means that there could be a set of updates that will throw a unique constraint violation in an intermediate step even though the final state is valid. For example, a chain of updates RuleA -> RuleB -> RuleC could fail if not executed in the correct order, or a swap of titles RuleA <-> RuleB cannot be executed in any order without violating the constraint.
The exact solution to this is complex and requires determining directed paths and cycles in the update graph, adding in temporary updates to break cycles, and then executing the updates in reverse topological order (see first commit in PR if curious).
This is not implemented here.
Instead, we choose a simpler solution that works in all cases but might perform more updates than necessary. This simpler solution determines whether an intermediate collision could occur and, if so, adds a temporary title to all updated rules to break any cycles and remove the need for specific ordering.
In addition, we make sure diffs are executed in the following order: DELETES, UPDATES, INSERTS.
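A simplified sketch of the temporary-title approach; types and helper names are illustrative:

```go
package store

import "fmt"

// ruleUpdate is a simplified pair of current and desired rule titles.
type ruleUpdate struct {
	UID      string
	OldTitle string
	NewTitle string
}

// applyUpdates breaks potential title chains/cycles by first moving every
// updated rule to a unique temporary title, then applying the final titles.
// This trades extra UPDATE statements for not having to compute a safe
// ordering. The real code also orders the full diff as DELETEs, then UPDATEs,
// then INSERTs.
func applyUpdates(updates []ruleUpdate, setTitle func(uid, title string) error) error {
	oldTitles := make(map[string]bool, len(updates))
	for _, u := range updates {
		oldTitles[u.OldTitle] = true
	}
	collision := false
	for _, u := range updates {
		if u.NewTitle != u.OldTitle && oldTitles[u.NewTitle] {
			collision = true
			break
		}
	}
	if collision {
		for i, u := range updates {
			if err := setTitle(u.UID, fmt.Sprintf("%s-temp-%d", u.NewTitle, i)); err != nil {
				return err
			}
		}
	}
	for _, u := range updates {
		if err := setTitle(u.UID, u.NewTitle); err != nil {
			return err
		}
	}
	return nil
}
```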
* Alerting: Fix provisioned templates being ignored by alertmanager
Template provisioning sets the template in cfg.TemplateFiles while a recent change
made it so that alertmanager reads cfg.AlertmanagerConfig.Templates instead.
This change fixes the issue on both ends by having provisioning set both fields, and
reverts the change on the alertmanager side so that it uses cfg.TemplateFiles.
* Let alert rule service implement registry service
* Add count method to RuleStore interface
* Add implementation for deletion of alert rules
* Rename uid to folderUID in registry methods
* Check forceDeleteRule value for registry deletion
* Register alerting store with folder service
* Move folder test functions to separate package
* Add testing for alert rule counting, deletion
* Remove redundant count method
* Fix deleteChildrenInFolder signature
* Update pkg/services/ngalert/store/alert_rule.go
Co-authored-by: Sofia Papagiannaki <1632407+papagian@users.noreply.github.com>
* Add tests for nested folder deletion
* Refactor TestIntegrationNestedFolderService
* Add rules store as parameter for alertng provider
---------
Co-authored-by: Sofia Papagiannaki <1632407+papagian@users.noreply.github.com>
* (WIP) Refactor the ImageStore interface to work with our latest alerting repository
* update alerting package
* refactor, new URLExists method in ImageProvider
* tests for the new methods
* fix linter warnings
* use alertingImages as an alias for grafana/alerting/images
* logs about image uris and not found images
* nerf image not found logs
* extract duplicated code to getImageFromURI() method
* refactor getImageFromURI()
* add index on url
* add comment about migration log
* sync generated files
* remove unused HasAdmin and HasEdit permission methods
* remove legacy AC from HasAccess method
* remove unused function
* update alerting tests to work with RBAC
* use tokens or urls in image annotations
* improve tests, fix some comments
* fix empty tokens
* code review changes, check for url before checking for token (support old token formats)
* update to alerting 20230418161049-5f374e58cb32
* rename renamed structs in https://github.com/grafana/alerting/pull/73
* update ValidateContactPoint to use BuildReceiverConfiguration
* update logger factory according to changes
* rewrite integration builder
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Alerting: Allow hooking into request handler functions.
Adds a facility to AlertNG for hooking into API handlers, allowing the
replacement of request handlers for specific paths. One of the goals of this
approach was to allow hooking as late as possible in the request, e.g.
after all middleware has been applied, to simplify usage.
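A minimal sketch of the hook-registry idea using plain net/http handlers; Grafana's actual hooks operate on its own request-context handler type, so the names here are illustrative:

```go
package api

import "net/http"

// Hooks lets callers replace the request handler for specific paths after all
// middleware has been applied.
type Hooks struct {
	handlers map[string]http.HandlerFunc
}

func NewHooks() *Hooks {
	return &Hooks{handlers: map[string]http.HandlerFunc{}}
}

// Set registers a replacement handler for the given path.
func (h *Hooks) Set(path string, handler http.HandlerFunc) {
	h.handlers[path] = handler
}

// Wrap returns a handler that dispatches to a hooked handler when one is
// registered for the path, and to the original handler otherwise.
func (h *Hooks) Wrap(path string, original http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if hooked, ok := h.handlers[path]; ok {
			hooked(w, r)
			return
		}
		original(w, r)
	}
}
```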
* Update pkg/services/ngalert/api/hooks.go
Co-authored-by: gotjosh <josue.abreu@gmail.com>
* Update pkg/services/ngalert/api/hooks.go
Co-authored-by: gotjosh <josue.abreu@gmail.com>
* Update pkg/services/ngalert/ngalert.go
Co-authored-by: gotjosh <josue.abreu@gmail.com>
* Fixes to review comments
* Fix passing logger in
---------
Co-authored-by: gotjosh <josue.abreu@gmail.com>
Alerting: Add totalsFiltered to RuleResponse to facilitate hidden by filters count
Currently, when both a limit_alerts and a matcher/state filter are applied, there is not enough information to determine how many alert instances were hidden by the filters, only enough to determine the total hidden by the limit and filters combined.
This change adds a separate totalsFiltered field alongside the AlertRule totals that will contain the count of instances after filters but before limits.
This commit fixes a bug where DatasourceUID and RefID annotations are
missing for DatasourceNoData alerts in Grafana 9.5. This bug affects
datasource plugins that have moved to using the data plane contract.
This commit adds support for limits and filters to the Prometheus Rules
API.
Limits:
It adds a number of limits to the Grafana flavour of the Prometheus Rules
API:
- `limit` limits the maximum number of Rule Groups returned
- `limit_rules` limits the maximum number of rules per Rule Group
- `limit_alerts` limits the maximum number of alerts per rule
It sorts Rule Groups and rules within Rule Groups such that data in the
response is stable across requests. It also returns summaries (totals)
for all Rule Groups, individual Rule Groups and rules.
Filters:
Alerts can be filtered by state with the `state` query string. An example
of an HTTP request asking for just firing alerts might be
`/api/prometheus/grafana/api/v1/rules?state=alerting`.
A request can filter by two or more states by adding additional `state`
query strings to the URL. For example `?state=alerting&state=normal`.
Like the alert list panel, the `firing`, `pending` and `normal` states are
first compared against the state of each alert rule. All other states are
ignored. If the alert rule matches then its alert instances are filtered
against states once more.
Alerts can also be filtered by labels using the `matcher` query string.
Like `state`, multiple matchers can be provided by adding additional
`matcher` query strings to the URL.
The match expression should be parsed using an existing regular expression
and sent to the API as URL-encoded JSON in the following format:
{
"name": "test",
"value": "value1",
"isRegex": false,
"isEqual": true
}
The `isRegex` and `isEqual` options work as follows:
| IsEqual | IsRegex | Operator |
| ------- | -------- | -------- |
| true | false | = |
| true | true | =~ |
| false | true | !~ |
| false | false | != |
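For illustration, a small Go snippet showing how a client might assemble such a request, using the query parameters and matcher format described above (the specific values are examples):

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

func main() {
	// Matcher in the JSON format described above.
	matcher, _ := json.Marshal(map[string]interface{}{
		"name":    "team",
		"value":   "backend",
		"isRegex": false,
		"isEqual": true,
	})

	q := url.Values{}
	q.Set("limit", "10")        // max number of Rule Groups
	q.Set("limit_rules", "5")   // max rules per Rule Group
	q.Set("limit_alerts", "20") // max alerts per rule
	q.Add("state", "alerting")  // repeat "state" to filter by several states
	q.Add("state", "pending")
	q.Add("matcher", string(matcher)) // URL-encoded JSON matcher

	fmt.Println("/api/prometheus/grafana/api/v1/rules?" + q.Encode())
}
```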
* Delete folders, dashboards with registry service
Co-authored-by: Serge Zaitsev <hello@zserge.com>
* Update signature of ProvideDashboardServiceImpl
* Regenerate mockery file
* Add test for DeleteInFolder
* Add test for DeleteDashboardsInFolder
* Delete child dashboard associations via registry
* Add validation of folder uid and org id
---------
Co-authored-by: Serge Zaitsev <hello@zserge.com>
* replace receiver errors with one from alerting
* add the converter to alerting models
* update buildReceiverIntegration to accept GrafanaReceiver
---------
Co-authored-by: George Robinson <george.robinson@grafana.com>
* define initial service and add to wire
* update caching service interface
* add skipQueryCache header handler and update metrics query function to use it
* add caching service as a dependency to query service
* working caching impl
* propagate cache status to frontend in response
* beginning of improvements suggested by Lean - separate caching logic from query logic.
* more changes to simplify query function
* Decided to revert renaming of function
* Remove error status from cache request
* add extra documentation
* Move query caching duration metric to query package
* add a little bit of documentation
* wip: convert resource caching
* Change return type of query service QueryData to a QueryDataResponse with Headers
* update codeowners
* change X-Cache value to const
* use resource caching in endpoint handlers
* write resource headers to response even if it's not a cache hit
* fix panic caused by lack of nil check
* update unit test
* remove NONE header - shouldn't show up in OSS
* Convert everything to use the plugin middleware
* revert a few more things
* clean up unused vars
* start reverting resource caching, start to implement in plugin middleware
* revert more, fix typo
* Update caching interfaces - resource caching now has a separate cache method
* continue wiring up new resource caching conventions - still in progress
* add more safety to implementation
* remove some unused objects
* remove some code that I left in by accident
* add some comments, fix codeowners, fix duplicate registration
* fix source of panic in resource middleware
* Update client decorator test to provide an empty response object
* create tests for caching middleware
* fix unit test
* Update pkg/services/caching/service.go
Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>
* improve error message in error log
* quick docs update
* Remove use of mockery. Update return signature to return an explicit hit/miss bool
* create unit test for empty request context
* rename caching metrics to make it clear they pertain to caching
* Update pkg/services/pluginsintegration/clientmiddleware/caching_middleware.go
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
* Add clarifying comments to cache skip middleware func
* Add comment pointing to the resource cache update call
* fix unit tests (missing dependency)
* try to fix mystery syntax error
* fix a panic
* Caching: Introduce feature toggle to caching service refactor (#66323)
* introduce new feature toggle
* hide calls to new service behind a feature flag
* remove licensing flag from toggle (misunderstood what it was for)
* fix unit tests
* rerun toggle gen
---------
Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
Takes a specific code path for data that identifies itself as dataplane instead of "guessing" what the data is.
The data must identify itself as dataplane data by having both of the following frame metadata properties:
- a TypeVersion property that is greater than 0.0
- a 'Type' property
The `disableSSEDataplane` flag disables this functionality and uses the old code path for all queries regardless.
See https://github.com/grafana/grafana-plugin-sdk-go/blob/main/data/contract_docs/contract.md for dataplane details.
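A hedged sketch of the dataplane check; the local frameMeta type only mirrors the two properties named above, while the real fields live in the plugin SDK's data package:

```go
package expr

// frameMeta mirrors the two frame metadata properties described above; the
// actual types live in github.com/grafana/grafana-plugin-sdk-go/data.
type frameMeta struct {
	Type        string  // e.g. "timeseries-multi"
	TypeVersion [2]uint // e.g. {0, 1}
}

// isDataplane reports whether a frame identifies itself as dataplane data: it
// must declare a Type and a TypeVersion greater than 0.0. When this check (or
// the disableSSEDataplane flag) does not pass, SSE falls back to the old
// "guessing" code path. Illustrative sketch only.
func isDataplane(meta *frameMeta, disableSSEDataplane bool) bool {
	if disableSSEDataplane || meta == nil {
		return false
	}
	hasVersion := meta.TypeVersion[0] > 0 || meta.TypeVersion[1] > 0
	return meta.Type != "" && hasVersion
}
```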
* Alerting: Remove and revert flag alertingBigTransactions
This is a partial revert of #56575 and a removal of the `alertingBigTransactions` flag.
Real-world use has seen no clear performance incentive to maintain this flag. The lowered db connection count
came at the cost of a significant increase in CPU usage and query latency.
* Fix lint backend
* Removed last bits of alertingBigTransactions
---------
Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>
* Alerting: Tiny refactor on the eval and schedule packages
two very small things:
- We had a constructor on something called a `Context` which is not a `context.Context` so let's just name that constructor `NewContext`
- The user that we use to run query evaluations is the same (with some variation), so abstract it into a function that can be re-used when necessary.
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
---------
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
* Alerting: Add endpoint to revert to a previous alertmanager configuration
This endpoint is meant to be used in conjunction with /api/alertmanager/grafana/config/history to
revert to a previously applied alertmanager configuration. This is done by ID instead of raw config
string in order to avoid secure field complications.
* Add fresh context with timeout and same log properties, re-derive logger
* Unify timeout constants
* Move ctx after shortcut that got added through rebasing
* Unify timeouts
* Port opentracing's SpanFromContext and ContextFromSpan to the grafana tracing package
* Support both opentracing and otel variants
* Better document why we're creating a new ctx
* Add new func to FakeSpan which was added after rebase
* Support grafana-specific traceID key in both tracer implementations
This commit adds a number of limits to the Grafana flavor of the
Prometheus Rules API:
1. `limit` limits the maximum number of Rule Groups returned
2. `limit_rules` limits the maximum number of rules per Rule Group
3. `limit_alerts` limits the maximum number of alerts per rule
It sorts Rule Groups and rules within Rule Groups such that data in the
response is stable across requests. It also returns summaries (totals) for
all Rule Groups, individual Rule Groups and rules.
* WIP
* skip invalid historic configurations instead of erroring
* add warning log when bad historic config is found
* remove unused custom marshaller for GettableHistoricUserConfig
* add id to historic user config, move limit check to store, fix typo
* swagger spec
* Alerting: Respect "For" Duration for NoData alerts
This change modifies `resultNoData` to be more in line with the logic of the other state handlers.
The main effects of this are:
1) NoData states with NoDataState config set to Alerting will respect "For" duration.
2) Prevents zero value in StartsAt and EndsAt for alerts that have only ever been in the normal state. This includes state transitions from NoDataState=OK and ExecErrState=OK.
3) Better state transition logging.
* define 3 feature toggles for rollout phases
* Pass feature toggles along
* Implement first feature toggle
* Try a different strategy with fall-throughs to specific configurations
* Apply toggle overrides once outside of backend composition
* Emit log messages when we coerce backends
* Run code generator for feature toggle files
* Improve wording in flag descs
* Re-run generator
* Use code-generated constants instead of plain strings
* Use converted enum values rather than strings for pre-parsing
* move export rules to definitions package
* move provisioning contact point methods to provisioning package
* move AlertRuleGroupWithFolderTitle to ngalert models and adapter functions to api's compat
* move rule_types files back to where they were before.
* Remove private labels
* No longer index by instance labels
* Labels are now invariant, only build them once
* Remove bucketing since everything is in a single stream
* Refactor statesToStreams to only return a single unified log stream
* Don't query on labels that no longer exist
* Move selector logic to loki layer, genericize client to work in terms of straight logQL
* Add support for line-level label filters in query
* Combine existing selector tests for better parallelism
* Tests for logQL construction
* Underscore instead of dot for unwrapping labels in logql
* Alerting: Add CustomDetails for PagerDuty
* fix default value for 'severity' from 'error' to 'critical'
* minimal docs for notifiers, specifying config for PagerDuty
* replace notifier -> integration
* replace notifier -> integration
* copy AlertQuery from ngmodels to the definition package
* replaces usages of ngmodels.AlertQuery in API models
* create a converter between models of AlertQuery
---------
Co-authored-by: Alex Moreno <alexander.moreno@grafana.com>
* Encode with snappy, always
* JSON encoder type
* Headers
* Copy labels formatter from promtail
* Implement snappy-proto encoding
* Create encoder interface, test both encoders, choose snappy-proto by default (see the sketch after this list)
* Make encoder configurable at the LokiCfg level
* Export both encoders
* Touch up comment and tests
* Drop unnecessary conversions after move to plain strings to appease linter
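A minimal sketch of what such an encoder interface could look like; the stream shape, names, and content types are illustrative, and the real snappy-proto encoder marshals to Loki's protobuf push format rather than JSON:

```go
package historian

import (
	"encoding/json"

	"github.com/golang/snappy"
)

// stream is a minimal stand-in for a Loki push stream.
type stream struct {
	Labels map[string]string `json:"stream"`
	Values [][2]string       `json:"values"`
}

// encoder converts streams into the HTTP body sent to Loki and reports the
// matching Content-Type.
type encoder interface {
	encode(streams []stream) (body []byte, contentType string, err error)
}

// jsonEncoder produces the plain JSON push payload.
type jsonEncoder struct{}

func (jsonEncoder) encode(streams []stream) ([]byte, string, error) {
	b, err := json.Marshal(map[string][]stream{"streams": streams})
	return b, "application/json", err
}

// snappyEncoder compresses the payload with snappy. The real snappy-proto
// encoder marshals to Loki's protobuf push format before compressing; JSON is
// used here only to keep the sketch short, and the content type is illustrative.
type snappyEncoder struct{}

func (snappyEncoder) encode(streams []stream) ([]byte, string, error) {
	b, err := json.Marshal(map[string][]stream{"streams": streams})
	if err != nil {
		return nil, "", err
	}
	return snappy.Encode(nil, b), "application/octet-stream", nil
}

// newEncoder selects the encoder from configuration, defaulting to snappy.
func newEncoder(useJSON bool) encoder {
	if useJSON {
		return jsonEncoder{}
	}
	return snappyEncoder{}
}
```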
* Rename RecordStatesAsync to Record
* Rename QueryStates to Query
* Implement fanout writes (see the sketch after this list)
* Implement primary queries
* Simplify error joining
* Add test for query path
* Add tests for writes and error propagation
* Allow fanout backend to be configured
* Touch up log messages and config validation
* Consistent documentation for all backend structs
* Parse and normalize backend names more consistently against an enum
* Touch-ups to documentation
* Improve clarity around multi-record blocking
* Keep primary and secondaries more distinct
* Rename fanout backend to multiple backend
* Simplify config keys for multi backend mode
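A simplified sketch of the multiple-backend behaviour described above (fanout writes, primary-only queries); types and method signatures are illustrative:

```go
package historian

import (
	"context"
	"errors"
)

// Backend is a minimal stand-in for a state-history backend
// (e.g. annotations or Loki). Names in this sketch are illustrative.
type Backend interface {
	Record(ctx context.Context, states []StateTransition) error
	Query(ctx context.Context, q Query) (QueryResult, error)
}

type (
	StateTransition struct{}
	Query           struct{}
	QueryResult     struct{}
)

// MultipleBackend writes to every backend but serves queries only from the
// primary.
type MultipleBackend struct {
	primary     Backend
	secondaries []Backend
}

// Record fans the write out to the primary and all secondaries and joins any
// errors; a failing secondary does not stop the others from receiving data.
func (m *MultipleBackend) Record(ctx context.Context, states []StateTransition) error {
	errs := []error{m.primary.Record(ctx, states)}
	for _, b := range m.secondaries {
		errs = append(errs, b.Record(ctx, states))
	}
	return errors.Join(errs...)
}

// Query is always answered by the primary backend.
func (m *MultipleBackend) Query(ctx context.Context, q Query) (QueryResult, error) {
	return m.primary.Query(ctx, q)
}
```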