grafana/pkg/services/ngalert/schedule
gotjosh c631261681
Alerting: Attempt to retry retryable errors (#79161)
* Alerting: Attempt to retry retryable errors

Retries have been broken for a good while now (at least since version 9.4); this change attempts to reintroduce them in the simplest and safest form possible.

I first introduced #79095 to make sure that shipping this change in a patch release does not disrupt or put additional load on our customers' data sources. Paired with that change, retries can now work as expected.

There are two small differences between how retries work now and how they worked in legacy alerting:

* Retries only occur for valid alert definitions. If we suspect that the error comes from a malformed alert definition, we skip retrying.
* We have added a constant backoff of 1s between retries.

---------

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2023-12-06 20:45:08 +00:00
alerts_sender_mock.go Alerting: Fetch alerts from a remote Alertmanager (#75844) 2023-10-19 11:27:37 +02:00
fetcher_test.go Alerting: Scheduler to use AlertRule (#52354) 2022-07-26 09:40:06 -04:00
fetcher.go Alerting: Update scheduler to get updates only from database (#64635) 2023-03-14 18:02:51 -04:00
registry_bench_test.go Alerting: Scheduler use rule fingerprint instead of version (#66531) 2023-04-28 10:42:16 -04:00
registry_test.go Alerting: Scheduler use rule fingerprint instead of version (#66531) 2023-04-28 10:42:16 -04:00
registry.go Alerting: Use unsafe.Slice for hashing a string during rule fingerprint calculation (#71000) 2023-06-30 14:58:23 -04:00
schedule_unit_test.go Alerting: Attempt to retry retryable errors (#79161) 2023-12-06 20:45:08 +00:00
schedule.go Alerting: Attempt to retry retryable errors (#79161) 2023-12-06 20:45:08 +00:00
testing.go Alerting: update test TestAlertingTicker to not rely on clock (#58544) 2022-11-09 15:08:57 -05:00