* Alerting: Decrypt secure settings when testing receivers in the remote Alertmanager
* go work sync
* make update-workspace
* point to latest main in grafana/alerting
* unit test
* import definitions only once
* Alerting: Fix duplicated silences in remote primary mode bug
* test that a new silence id returned by calling CreateSilence() on the internal Alertmanager is ignored
* Add TracedClient
* Handle errors and status codes
* Wire up tracing to normal ASH and loki annotation mapping
* Add tracing to remote alertmanager
* one more spot
* and not or
* More consistency with other grafana traces, lower cardinality name
* chore(perf): Pre-allocate where possible (enable prealloc linter)
Signed-off-by: Dave Henderson <dave.henderson@grafana.com>
* fix TestAlertManagers_buildRedactedAMs
Signed-off-by: Dave Henderson <dave.henderson@grafana.com>
* prealloc a slice that appeared after rebase
Signed-off-by: Dave Henderson <dave.henderson@grafana.com>
---------
Signed-off-by: Dave Henderson <dave.henderson@grafana.com>
* make the config sync happen on each call to ApplyConfig(), fix tests
* send autogen config
* add fake autogen function for tests
* update stale comments, tidy things up, make linter happy
* add auto-gen routes only if the feature toggle is enabled
* remove unnecessary fake autogen function
* throttle configuration syncs
* restore pkg/services/store/entity/sqlstash/sql_storage_server.go
* test sync loop in ApplyConfig, skip invalid autogen routes
* restore conf/defaults.ini
* restore conf/defaults.ini
* avoid skipping invalid auto-gen routes in SaveAndApplyConfig
* test that autogenFn is called and its errors are returned
* add debug message about the sync interval not having elapsed
* collapse two log lines into one
* Alerting: Update grafana/alerting
* make tests pass by implementing yaml unmarshallers and deleting fields with omitempty in their yaml tags
* go mod tidy
* fix tests by implementing not calling GettableApiAlertingConfig.UnmarshalYAML from GettableApiAlertingConfig.UnmarshalJSON
* cleanup, reduce diff
* fix more tests
* update grafana/alerting to latest commit, delete global section from configs in tests
* bring back YAML unmarshaller for GettableApiAlertingConfig
* update alerting package dependency to point to main
* skip test for sns notifier
* Prometheus: Update dependency to v0.52.0
* go work sync
* fix panics in tests
* go work sync
* prometheus v0.52.0
* handle errors
* Update pkg/services/ngalert/sender/sender_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Update pkg/services/ngalert/sender/sender_test.go
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
---------
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
Co-authored-by: ismail simsek <ismailsimsek09@gmail.com>
* Alerting: Pass metrics Registerer into NewExternalAlertmanagerSender.
I will work on a separate change to export the metrics from Grafana, this
is a little more complicated.
* Typo
* Alerting: Implement GetStatus in the remote Alertmanager struct
* update tests
* fix tests, extract AlertmanagerConfig from PostableConfig
* get the remote AM config instead of the Grafana one from the remote AM
* pass grafana AM config in test
* return error in GetStatus instead of logging it (internal AM)
* Alerting: Implement SaveAndApplyConfiguration in the forked Alertmanager struct
* call SaveAndApplyConfig on the remote first, log errors for the internal
* add comments explaining why we ignore errors in the internal AM
* restore go.work.sum
* implement SaveAndApplyConfig in the remote Alertmanager struct
* remove ID from CreateGrafanaAlertmanagerConfig call
* decrypt, test that we decrypt, refactor
* fix duplicated declaration in test
* rephrase comment, remove unnecessary conversion to slice of bytes
* fix test
* Alerting: Implement SaveAndApplyDefaultConfig in the forked Alertmanager (remote primary)
* log the error for the internal AM instead of returning it
* Alerting: Implement SaveAndApplyDefaultConfig in the remote Alertmanager struct
* send the hash of the encrypted configuration
* tests, default config hash in AM struct
* add missing default config to test
* restore build directory
* go work file...
* fix broken test
* remove unnecessary conversion to []byte
* go work again...
* make things work again with latest main branch changes
* update error messages in tests for decrypting config
* Alerting: Persist silence state immediately on Create/Delete
Persists the silence state to the kvstore immediately instead of waiting for the
next maintenance run. This is used after Create/Delete to prevent silences from
being lost when a new Alertmanager is started before the state has persisted.
This can happen, for example, in a rolling deployment scenario.
* Fix test that requires real data
* Don't error if silence state persist fails, maintenance will correct
* Alerting: Implement ApplyConfig for remote primary mode (forked AM)
* add TODO for saving the config hash in other config-related methods
* fix bad method receiver name (m -> am)
* tests
* add mutex
* remove sync loop
* (WIP) Alerting: Decrypt secrets before sending configuration to the remote Alertmanager
* refactor, fix tests
* test decrypting secrets
* tidy up
* test SendConfiguration, quote keys, refactor tests
* make linter happy
* decrypt configuration before comparing
* copy configuration struct before decrypting
* reduce diff in TestCompareAndSendConfiguration
* clean up remote/alertmanager.go
* make linter happy
* avoid serializing into JSON to copy struct
* codeowners
* Alerting: Add metrics to the remote Alertmanager struct
* rephrase http_requests_failed description
* make linter happy
* remove unnecessary metrics
* extract timed client to separate package
* use histogram collector from dskit
* remove weaveworks dependency
* capture metrics for all requests to the remote Alertmanager (both clients)
* use the timed client in the MimirAuthRoundTripper
* HTTPRequestsDuration -> HTTPRequestDuration, clean up mimir client factory function
* refactor
* less git diff
* gauge for last readiness check in seconds
* initialize LastReadinesCheck to 0, tweak metric names and descriptions
* add counters for sync attempts/errors
* last config sync and last state sync timestamps (gauges)
* change latency metric name
* metric for remote Alertmanager mode
* code review comments
* move label constants to metrics package
* (WIP) Alerting: Use the forked Alertmanager for remote secondary mode
* fall back to using internal AM in case of error
* remove TODOs, clean up .ini file, add orgId as part of remote AM config struct
* log warnings and errors, fall back to remoteSecondary, fall back to internal AM only
* extract logic to decide remote Alertmanager mode to a separate function, switch on mode
* tests
* make linter happy
* remove func to decide remote Alertmanager mode
* refactor factory function and options
* add default case to switch statement
* remove ineffectual assignment
* Alerting: Send configuration and state to the remote Alertmanager on shutdown
* Alerting: Add a sync interval for ApplyConfig in remote secondary mode
* add routine to sync states and configs
* pass a cancellable context to syncRoutine(), remove tests for ApplyConfig, cache last config in memory
* extract logic to update config and state in the remote Alertmanager
* get latest config from the database
* avoid using separate goroutine for updating state and config
* clean up PR
* refactor, comments, tests
* update tests
* remove canceled context from calls to StopAndWait()
* create context with timeout and send config and state to remote Alertmanager
* update tests
* address code review comments
* Alerting: Add a sync interval for ApplyConfig in remote secondary mode
* add routine to sync states and configs
* pass a cancellable context to syncRoutine(), remove tests for ApplyConfig, cache last config in memory
* extract logic to update config and state in the remote Alertmanager
* get latest config from the database
* avoid using separate goroutine for updating state and config
* clean up PR
* refactor, comments, tests
* update tests
* add config struct for remote secondary forked Alertmanager
* use errgroups for sync operations
* use waitgroup instead of errgroup
* remove helper method to sync AMs
* check for errors instead of bool syncErr
* Alerting: Refactor readiness check
Moves the readiness check to the mimir client and removes the need to assert that we have senders - it already has a queue and can hold notifications until we're ready to send them.
---------
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Alerting: Add a sync interval for ApplyConfig in remote secondary mode
* remove out of scope code
* remove parentheses after CleanUp for consistency in test comments
* Add comment to ApplyConfig