* Delete folders, dashboards with registry service
Co-authored-by: Serge Zaitsev <hello@zserge.com>
* Update signature of ProvideDashboardServiceImpl
* Regenerate mockery file
* Add test for DeleteInFolder
* Add test for DeleteDashboardsInFolder
* Delete child dashboard associations via registry
* Add validation of folder uid and org id
---------
Co-authored-by: Serge Zaitsev <hello@zserge.com>
* replace receiver errors with one from alerting
* add the converter to alerting models
* update buildReceiverIntegration to accept GrafanaReceiver
---------
Co-authored-by: George Robinson <george.robinson@grafana.com>
* define initial service and add to wire
* update caching service interface
* add skipQueryCache header handler and update metrics query function to use it
* add caching service as a dependency to query service
* working caching impl
* propagate cache status to frontend in response
* beginning of improvements suggested by Lean - separate caching logic from query logic.
* more changes to simplify query function
* Decided to revert renaming of function
* Remove error status from cache request
* add extra documentation
* Move query caching duration metric to query package
* add a little bit of documentation
* wip: convert resource caching
* Change return type of query service QueryData to a QueryDataResponse with Headers
* update codeowners
* change X-Cache value to const
* use resource caching in endpoint handlers
* write resource headers to response even if it's not a cache hit
* fix panic caused by lack of nil check
* update unit test
* remove NONE header - shouldn't show up in OSS
* Convert everything to use the plugin middleware
* revert a few more things
* clean up unused vars
* start reverting resource caching, start to implement in plugin middleware
* revert more, fix typo
* Update caching interfaces - resource caching now has a separate cache method
* continue wiring up new resource caching conventions - still in progress
* add more safety to implementation
* remove some unused objects
* remove some code that I left in by accident
* add some comments, fix codeowners, fix duplicate registration
* fix source of panic in resource middleware
* Update client decorator test to provide an empty response object
* create tests for caching middleware
* fix unit test
* Update pkg/services/caching/service.go
Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>
* improve error message in error log
* quick docs update
* Remove use of mockery. Update return signature to return an explicit hit/miss bool
* create unit test for empty request context
* rename caching metrics to make it clear they pertain to caching
* Update pkg/services/pluginsintegration/clientmiddleware/caching_middleware.go
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
* Add clarifying comments to cache skip middleware func
* Add comment pointing to the resource cache update call
* fix unit tests (missing dependency)
* try to fix mystery syntax error
* fix a panic
* Caching: Introduce feature toggle to caching service refactor (#66323)
* introduce new feature toggle
* hide calls to new service behind a feature flag
* remove licensing flag from toggle (misunderstood what it was for)
* fix unit tests
* rerun toggle gen
---------
Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
Takes a specific code path for data that identifies itself as dataplane instead of "guessing" what the data is.
The data must identify itself by being in the dataplane by having both the following frame metadata properties:
- TypeVersion property that is greater than 0.0
- 'Type' property
The flag is disableSSEDataplane and disables this functionality and uses the old code for all queries regardless.
See https://github.com/grafana/grafana-plugin-sdk-go/blob/main/data/contract_docs/contract.md for dataplane details.
* Alerting: Remove and revert flag alertingBigTransactions
This is a partial revert of #56575 and a removal of the `alertingBigTransactions` flag.
Real-word use has seen no clear performance incentive to maintain this flag. Lowered db connection count
came at the cost of significant increase in CPU usage and query latency.
* Fix lint backend
* Removed last bits of alertingBigTransactions
---------
Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>
* Alerting: Tiny refactor on the eval and schedule packages
two very small things:
- We had a constructor on something called a `Context` which is not a `context.Context` so let's just name that constructor `NewContext`
- The user that we use to run query evaluations is the same (with some variation) abstract it to a function so that it can be re-used when necessary.
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
* Update pkg/services/ngalert/schedule/schedule.go
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
---------
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
* Alerting: Add endpoint to revert to a previous alertmanager configuration
This endpoint is meant to be used in conjunction with /api/alertmanager/grafana/config/history to
revert to a previously applied alertmanager configuration. This is done by ID instead of raw config
string in order to avoid secure field complications.
* Add fresh context with timeout and same log properties, re-derive logger
* Unify timeout constants
* Move ctx after shortcut that got added through rebasing
* Unify timeouts
* Port opentracing's SpanFromContext and ContextFromSpan to the grafana tracing package
* Support both opentracing and otel variants
* Better document why we're creating a new ctx
* Add new func to FakeSpan which was added after rebase
* Support grafana-specific traceID key in both tracer implementations
This commit adds a number of limits to the Grafana flavor of the
Prometheus Rules API:
1. `limit` limits the maximum number of Rule Groups returned
2. `limit_rules` limits the maximum number of rules per Rule Group
3. `limit_alerts` limits the maximum number of alerts per rule
It sorts Rule Groups and rules within Rule Groups such that data in the
response is stable across requests. It also returns summaries (totals) for
all Rule Groups, individual Rule Groups and rules.
* WIP
* skip invalid historic configurations instead of erroring
* add warning log when bad historic config is found
* remove unused custom marshaller for GettableHistoricUserConfig
* add id to historic user config, move limit check to store, fix typo
* swagger spec
* Alerting: Respect "For" Duration for NoData alerts
This change modifies `resultNoData` to be more inline with the logic of the other state handlers.
The main effects of this are:
1) NoData states with NoDataState config set to Alerting will respect "For" duration.
2) Prevents zero value in StartsAt and EndsAt for alerts that have only even been in normal state. This includes state transitions from NoDataState=OK and ExecErrState=OK.
3) Better state transition logging.
* define 3 feature toggles for rollout phases
* Pass feature toggles along
* Implement first feature toggle
* Try a different strategy with fall-throughs to specific configurations
* Apply toggle overrides once outside of backend composition
* Emit log messages when we coerce backends
* Run code generator for feature toggle files
* Improve wording in flag descs
* Re-run generator
* Use code-generated constants instead of plain strings
* Use converted enum values rather than strings for pre-parsing
* move export rules to definitions package
* move provisioning contact point methods to provisioning package
* move AlertRuleGroupWithFolderTitle to ngalert models and adapter functions to api's compat
* move rule_types files back to where they were before.
* Remove private labels
* No longer index by instance labels
* Labels are now invariant, only build them once
* Remove bucketing since everything is in a single stream
* Refactor statesToStreams to only return a single unified log stream
* Don't query on labels that no longer exist
* Move selector logic to loki layer, genericize client to work in terms of straight logQL
* Add support for line-level label filters in query
* Combine existing selector tests for better parallelism
* Tests for logQL construction
* Underscore instead of dot for unwrapping labels in logql
* Alerting: Add CustomDetails for PagerDuty
* fix default value for 'severity' from 'error' to 'critical'
* minimal docs for notifiers, specifying config for PagerDuty
* replace notifier -> integration
* replace notifier -> integration
* copy AlertQuery from ngmodels to the definition package
* replaces usages of ngmodels.AlertQuery in API models
* create a converter between models of AlertQuery
---------
Co-authored-by: Alex Moreno <alexander.moreno@grafana.com>
* Encode with snappy, always
* JSON encoder type
* Headers
* Copy labels formatter from promtail
* Implement snappy-proto encoding
* Create encoder interface, test both encoders, choose snappy-proto by default
* Make encoder configurable at the LokiCfg level
* Export both encoders
* Touch up comment and tests
* Drop unnecessary conversions after move to plain strings to appease linter
* Rename RecordStatesAsync to Record
* Rename QueryStates to Query
* Implement fanout writes
* Implement primary queries
* Simplify error joining
* Add test for query path
* Add tests for writes and error propagation
* Allow fanout backend to be configured
* Touch up log messages and config validation
* Consistent documentation for all backend structs
* Parse and normalize backend names more consistently against an enum
* Touch-ups to documentation
* Improve clarity around multi-record blocking
* Keep primary and secondaries more distinct
* Rename fanout backend to multiple backend
* Simplify config keys for multi backend mode
* stop using the scheduler's Update and Delete methods all communication must be via the database
* update scheduler's registry to calculate diff before re-setting the cache
* update fetcher to return the diff generated by registry
* update processTick to update rule eval routine if the rule was updated and it is not going to be evaluated at this tick.
* remove references to the scheduler from api package
* remove unused methods in the scheduler
This commit changes the state package so that errors encountered while
expanding templates for custom labels and annotations are returned
from the function. This is not used at present, but will be used in the
future as we look at how to offer better feedback to users who don't
have access to logs, for example our customers who use Hosted Grafana.
This commit fixes a bug in the $values variable in notification
templates when using Classic Conditions. Since Classic Conditions
are not multi-dimensional, the values of each series that exceeded
the condition should be available as a RefID and offset. For example,
B0, B1, etc. However, this bug meant that instead just a single
condition would be printed as B, not B0.
* Create historian metrics and dependency inject
* Record counter for total number of state transitions logged
* Track write failures
* Track current number of active write goroutines
* Record histogram of how long it takes to write history data
* Don't copy the registerer
* Adjust naming of write failures metric
* Introduce WritesTotal to complement WritesFailedTotal
* Measure TransitionsFailedTotal to complement TransitionsTotal
* Rename all to state_history
* Remove redundant Total suffix
* Increment totals all the time, not just on success
* Drop ActiveWriteGoroutines
* Drop PersistDuration in favor of WriteDuration
* Drop unused gauge
* Make writes and writesFailed per org
* Add metric indicating backend and a spot for future metadata
* Drop _batch_ from names and update help
* Add metric for bytes written
* Better pairing of total + failure metric updates
* Few tweaks to wording and naming
* Record info metric during composition
* Create fakeRequester and simple happy path test using it
* Blocking test for the full historian and test for happy path metrics
* Add tests for failure case metrics
* Smoke test for full annotation persistence
* Create test for metrics on annotation persistence, both happy and failing paths
* Address linter complaints
* More linter complaints
* Remove unnecessary whitespace
* Consistency improvements to help texts
* Update tests to match new descs
* Alerting: Add metrics for active receiver and integrations
Introduces metrics that allows us to track the number of configured receivers and integration in the Alertmanager for all orgs.
As a bonus, I realised that the alert reception metrics where not being exported nor collected. This does that too.
* revert to using folder store from the resolvers
* fixing tests after revert
* api test fixes
---------
Co-authored-by: Kristin Laemmert <mildwonkey@users.noreply.github.com>