* Fix Annotation creation
- Remove validation of panelID, now annotations are created irrespective on whether they're attached to a panel or not.
- Alwasy attach the annotation to an AlertID
* Fix annotation creation
* fix tests
* Alerting: Clear alerting rule evaluation errors after intermittent failures
When an alert transitioned in a way that `alerting -> error -> (alerting|nodata)`, the error provided by the `error` state would never be cleared thus the API and UI would show the health as an error.
* Alerting: (wip) add template funcs
* Alerting: (wip) numeric template functions
* Alerting: (wip) template functions
* Test for the "args" function
* Alerting: (wip) Documentation for template functions
* Alerting: template functions - refactor
* code review changes
* disable linter error
* Use Prometheus implementation of TemplateExpander
* Update docs/sources/alerting/unified-alerting/alerting-rules/create-grafana-managed-rule.md
Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>
* change templateCaptureValue to support using template functions
* Update pkg/services/ngalert/state/template.go
Co-authored-by: gotjosh <josue.abreu@gmail.com>
* Test and documentation added for reReplaceAll template function
* complete missing functions, documentation and tests
* Use the alert instance's evaluation time for expanding the template
* strvalue graphlink and tablelink functions
* delete duplicate test
* make strvalue return an empty string
Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
* Alerting: Refactor & fix unified alerting metrics structure
Fixes and refactors the metrics structure we have for the ngalert service. Now, each component has its own metric struct that includes the JUST the metrics it uses. Additionally, I have fixed the configuration metrics and added new metrics to determine if we have discovered and started all the necessary configurations of an instance.
This allows us to alert on `grafana_alerting_discovered_configurations - grafana_alerting_active_configurations != 0` to know whether an alertmanager instance did not start successfully.
* Alerting: Fix alert flapping in the alertmanager
fixes a bug that caused Alerts that are evaluated at low intervals (sub 1 minute), to flap in the Alertmanager.
Mostly due to a combination of `EndsAt` and resend delay.
The Alertmanager uses `EndsAt` as a heuristic to know whenever it should resolve a firing alert, in the case that it hasn't heard
back from the alert generation system.
Because grafana sent the alert with an `EndsAt` which is equal to the `For` of the alert itself,
and we had a hard-coded 1 minute re-send delay (only applicable to firing alerts) this meant that a firing alert would resolve in the Alertmanager before we re-notify that it still firing.
This commit, increases the `EndsAt` by 3x the the resend delay or alert interval (depending on which one is higher). The resendDelay has been decreased to 30 seconds.
Fixes#30144
Co-authored-by: dsotirakis <sotirakis.dim@gmail.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
Co-authored-by: Ida Furjesova <ida.furjesova@grafana.com>
Co-authored-by: Jack Westbrook <jack.westbrook@gmail.com>
Co-authored-by: Will Browne <wbrowne@users.noreply.github.com>
Co-authored-by: Leon Sorokin <leeoniya@gmail.com>
Co-authored-by: Andrej Ocenas <mr.ocenas@gmail.com>
Co-authored-by: spinillos <selenepinillos@gmail.com>
Co-authored-by: Karl Persson <kalle.persson@grafana.com>
Co-authored-by: Leonard Gram <leo@xlson.com>
* initial attempt at automatic removal of stale states
* test case, need espected states
* finish unit test
* PR feedback
* still multiply by time.second
* pr feedback
* Expand the value of math and reduce expressions in annotations and labels
This commit makes it possible to use the values of reduce and math
expressions in annotations and labels via their RefIDs. It uses the
Stringer interface to ensure that "{{ $values.A }}" still prints the
value in decimal format while also making the labels for each RefID
available with "{{ $values.A.Labels }}" and the float64 value with
"{{ $values.A.Value }}"
* Alerting: Refactor state manager as a dependency
Within the scheduler, the state manager was being passed around a
certain number of functions. I've introduced it as a dependency to keep
the "service" interfaces as clean and homogeneous as possible.
This is relevant, because I'm going to introduce live reload of these
components as part of my next PR and it is better if dependencies are
self-contained.
* remove unused functions
* Fix a few more tests
* Make sure the `stateManager` is declared before the schedule
* Alerting: Expand `{{$labels.xyz}}` template in labels and annotations
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Fix annotation not updating for same alert
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Instead put in package folder but with package name suffixed with _test
This enables code coverage within the pkg while still allow the tests to operate from external to package perspective (only exported things).