* do not suppress NoData state
* extract conversion of state to postable alert + tests
* create a special alert instance if nodata
* use NoData when converting from Keep Last State instead of Alerting
* add silence during migration if NoData is mapped to KeepLastState.
* Alerting: (wip) add template funcs
* Alerting: (wip) numeric template functions
* Alerting: (wip) template functions
* Test for the "args" function
* Alerting: (wip) Documentation for template functions
* Alerting: template functions - refactor
* code review changes
* disable linter error
* Use Prometheus implementation of TemplateExpander
* Update docs/sources/alerting/unified-alerting/alerting-rules/create-grafana-managed-rule.md
Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>
* change templateCaptureValue to support using template functions
* Update pkg/services/ngalert/state/template.go
Co-authored-by: gotjosh <josue.abreu@gmail.com>
* Test and documentation added for reReplaceAll template function
* complete missing functions, documentation and tests
* Use the alert instance's evaluation time for expanding the template
* strvalue graphlink and tablelink functions
* delete duplicate test
* make strvalue return an empty string
Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
* Chore: GetDashboardQuery should be dispatched using DispatchCtx
* Fix after merge
* Changes after review
* Various fixes
* Use GetDashboardCtx function instead of GetDashboard
* Alerting: Refactor & fix unified alerting metrics structure
Fixes and refactors the metrics structure we have for the ngalert service. Now, each component has its own metric struct that includes the JUST the metrics it uses. Additionally, I have fixed the configuration metrics and added new metrics to determine if we have discovered and started all the necessary configurations of an instance.
This allows us to alert on `grafana_alerting_discovered_configurations - grafana_alerting_active_configurations != 0` to know whether an alertmanager instance did not start successfully.
* Change templateCaptureValue to support using template functions
This commit changes templateCaptureValue to use float64 for the value
instead of *float64. This change means that annotations and labels can
use the float64 value with functions such as printf and avoid having to
check for nil. It also means that absent values are now printed as 0.
* Use math.NaN() instead of 0 for absent value
* Alerting: Fix alert flapping in the alertmanager
fixes a bug that caused Alerts that are evaluated at low intervals (sub 1 minute), to flap in the Alertmanager.
Mostly due to a combination of `EndsAt` and resend delay.
The Alertmanager uses `EndsAt` as a heuristic to know whenever it should resolve a firing alert, in the case that it hasn't heard
back from the alert generation system.
Because grafana sent the alert with an `EndsAt` which is equal to the `For` of the alert itself,
and we had a hard-coded 1 minute re-send delay (only applicable to firing alerts) this meant that a firing alert would resolve in the Alertmanager before we re-notify that it still firing.
This commit, increases the `EndsAt` by 3x the the resend delay or alert interval (depending on which one is higher). The resendDelay has been decreased to 30 seconds.
Fixes#30144
Co-authored-by: dsotirakis <sotirakis.dim@gmail.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
Co-authored-by: Ida Furjesova <ida.furjesova@grafana.com>
Co-authored-by: Jack Westbrook <jack.westbrook@gmail.com>
Co-authored-by: Will Browne <wbrowne@users.noreply.github.com>
Co-authored-by: Leon Sorokin <leeoniya@gmail.com>
Co-authored-by: Andrej Ocenas <mr.ocenas@gmail.com>
Co-authored-by: spinillos <selenepinillos@gmail.com>
Co-authored-by: Karl Persson <kalle.persson@grafana.com>
Co-authored-by: Leonard Gram <leo@xlson.com>
* initial attempt at automatic removal of stale states
* test case, need espected states
* finish unit test
* PR feedback
* still multiply by time.second
* pr feedback
* Expand the value of math and reduce expressions in annotations and labels
This commit makes it possible to use the values of reduce and math
expressions in annotations and labels via their RefIDs. It uses the
Stringer interface to ensure that "{{ $values.A }}" still prints the
value in decimal format while also making the labels for each RefID
available with "{{ $values.A.Labels }}" and the float64 value with
"{{ $values.A.Value }}"
* Alerting: Refactor state manager as a dependency
Within the scheduler, the state manager was being passed around a
certain number of functions. I've introduced it as a dependency to keep
the "service" interfaces as clean and homogeneous as possible.
This is relevant, because I'm going to introduce live reload of these
components as part of my next PR and it is better if dependencies are
self-contained.
* remove unused functions
* Fix a few more tests
* Make sure the `stateManager` is declared before the schedule
* Alerting: Expand `{{$labels.xyz}}` template in labels and annotations
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Fix annotation not updating for same alert
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
When, and currently only when using a classic condition, evaluation information is added (which is like the EvalMatches from dashboard alerting).
This is returned via the API and can be included in notifications by reading the `__value__` label attached `.Alerts` in the template. It is a string.
Instead put in package folder but with package name suffixed with _test
This enables code coverage within the pkg while still allow the tests to operate from external to package perspective (only exported things).
* nest cache by orgID, ruleUID, stateID
* update accessors to use new cache structure
* test and linter fixup
* fix panic
Co-authored-by: Kyle Brandt <kyle@grafana.com>
* add comment to identify what's going on with nested maps in cache
Co-authored-by: Kyle Brandt <kyle@grafana.com>
* Fix fialure when adding state annotations
* Fix get org rules API
Do not fail response if user has no access to view a namespace.
Do not include the namespace in the response instead.
* lint
* set processing time
* merge labels and set on response
* use state cache for adding alerts to rules
* minor cleanup
* add support for NoData and Error results
* rename test
* bring in changes from other PRs tha have been merged
* pr feedback
* add integration test
* close state tracker cleanup on context.Done
* fixup test
* rename state tracker
* set EvaluationDuration on Result
* default labels set as constants
* separate cache and state from manager
* use RWMutex in cache
* set processing time
* merge labels and set on response
* use state cache for adding alerts to rules
* minor cleanup
* add support for NoData and Error results
* rename test
* bring in changes from other PRs tha have been merged
* pr feedback
* add integration test
* close state tracker cleanup on context.Done
* fixup test
* not those annotations
* set processing time
* merge labels and set on response
* use state cache for adding alerts to rules
* minor cleanup
* pr feedback
* Do not initialize mutex unnecessarily
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
* linter
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
* move state tracker tests to /tests
* set default labels on alerts
* handle empty labels in result.Instance
* create annotation on transition to alerting state
* Return cached alerts for prometheus/api/v1/alerts
* Return not implemented for /prometheus/grafana/api/v1/rules
* Set StartsAt for already alerting states
* Fix tests