Alerting: Refactor & fix unified alerting metrics structure (#39151)

* Alerting: Refactor & fix unified alerting metrics structure

Fixes and refactors the metrics structure we have for the ngalert service. Now, each component has its own metrics struct that includes just the metrics it uses. Additionally, I have fixed the configuration metrics and added new metrics to determine whether we have discovered and started all the necessary configurations of an instance.

This allows us to alert on `grafana_alerting_discovered_configurations - grafana_alerting_active_configurations != 0` to know whether an alertmanager instance did not start successfully.
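
As a rough illustration of the per-component layout and the check behind that alert expression, here is a minimal standard-library sketch. The `Gauge` type is a hypothetical stand-in for a Prometheus gauge, and `MultiOrgAlertmanager`, its field names, and `UnstartedConfigurations` are assumptions made for this example; only the `State` type name appears in the diff.

```go
package main

import "fmt"

// Gauge is a hypothetical stand-in for a Prometheus gauge so that this
// sketch runs with the standard library only.
type Gauge struct{ value float64 }

func (g *Gauge) Set(v float64)  { g.value = v }
func (g *Gauge) Value() float64 { return g.value }

// State holds only the metrics the state cache uses. The type name comes
// from the diff (metrics.State); the field is an assumption.
type State struct {
	AlertState Gauge
}

// MultiOrgAlertmanager is a hypothetical struct for the configuration
// metrics; the name and fields are assumptions, not the commit's code.
type MultiOrgAlertmanager struct {
	DiscoveredConfigurations Gauge
	ActiveConfigurations     Gauge
}

// UnstartedConfigurations mirrors the alert expression
// discovered_configurations - active_configurations. A non-zero result
// means at least one discovered configuration is not active.
func UnstartedConfigurations(m *MultiOrgAlertmanager) float64 {
	return m.DiscoveredConfigurations.Value() - m.ActiveConfigurations.Value()
}

func main() {
	m := &MultiOrgAlertmanager{}
	m.DiscoveredConfigurations.Set(3)
	m.ActiveConfigurations.Set(2)

	if d := UnstartedConfigurations(m); d != 0 {
		fmt.Printf("%v configuration(s) discovered but not active\n", d)
	}
}
```

Keeping one struct per component means a constructor such as `newCache` only receives the metrics it actually records, which is the narrowing the diff makes when it swaps `*metrics.Metrics` for `*metrics.State`.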
gotjosh
2021-09-14 12:55:01 +01:00
committed by GitHub
parent 1edd415ddf
commit a2f4344bf2
21 changed files with 243 additions and 119 deletions


@@ -22,10 +22,10 @@ type cache struct {
 	states    map[int64]map[string]map[string]*State // orgID > alertRuleUID > stateID > state
 	mtxStates sync.RWMutex
 	log       log.Logger
-	metrics   *metrics.Metrics
+	metrics   *metrics.State
 }

-func newCache(logger log.Logger, metrics *metrics.Metrics) *cache {
+func newCache(logger log.Logger, metrics *metrics.State) *cache {
 	return &cache{
 		states: make(map[int64]map[string]map[string]*State),
 		log:    logger,