Alerting: fix preserving errors in the alert rule state during error-to-error transitions
An alert state transition from one error to another did not update state.Error correctly:
state.Error kept the initial error encountered. This led to a second issue, where the error
was lost after a Grafana restart, because the state of the alert rule did not change and
the error is not preserved in the database between restarts.
This could happen if the expression service returned an error or the alert routine panicked
during querying.
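A minimal sketch of the fix, using illustrative types rather than the real Grafana state structs:

    package alerting

    // Illustrative types only; the real Grafana state structs differ.
    type StateType int

    const (
        Normal StateType = iota
        Error
    )

    type AlertState struct {
        State StateType
        Error error // most recent evaluation error
    }

    // transition applies an evaluation outcome to the cached state.
    // Before the fix, Error was only refreshed when the state enum
    // changed, so an Error -> Error transition kept the first error,
    // and a restart then lost it entirely because the unchanged state
    // was never re-persisted.
    func (s *AlertState) transition(next StateType, evalErr error) {
        s.State = next
        if next == Error {
            s.Error = evalErr // always take the newest error
        } else {
            s.Error = nil
        }
    }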
* create a new table for migration resources
* remove raw result bytes from db
* more snapshot resource management stuff
* integrate new table with snapshots
* pass in result limit and offset as params
* combine create and update
* set up xorm store test
* add unit tests
* save some cpu
* remove unneeded arg
* regen swagger
* fix bug with result processing
* fix update/create logic so that UID isn't required for lookup
* change offset to page (see the pagination sketch after this list)
* regen swagger
* revert accidental changes to file
* curl command page should be 1-indexed
Co-authored-by: Will Browne <wbrowne@users.noreply.github.com>
Co-authored-by: Dan Cech <dcech@grafana.com>
Co-authored-by: Andres Martinez Gotor <andres.martinez@grafana.com>
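A hypothetical sketch of the offset-to-page change above; pageToOffset and its placement are assumptions, not the exact store API:

    package migration

    // Hypothetical helper: converts the 1-indexed page param from the
    // HTTP layer into a SQL OFFSET for the resource query.
    func pageToOffset(page, limit int) int {
        if page < 1 {
            page = 1 // pages are 1-indexed; clamp bad input
        }
        return (page - 1) * limit
    }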
* add regex support for api tests (see the sketch after this list)
* revert dumb thing
* add api tests
* add unit test for core async workflow
* add xorm store unit tests
* fix typo
* remove unnecessary assignment
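A sketch of what regex support in an API test helper could look like; requireBodyMatches is an illustrative name, not the actual utility from the PR:

    package apitests

    import (
        "regexp"
        "testing"
    )

    // requireBodyMatches asserts a response body against a regular
    // expression instead of requiring an exact string match.
    func requireBodyMatches(t *testing.T, pattern, body string) {
        t.Helper()
        re, err := regexp.Compile(pattern)
        if err != nil {
            t.Fatalf("bad pattern %q: %v", pattern, err)
        }
        if !re.MatchString(body) {
            t.Fatalf("body %q does not match %q", body, pattern)
        }
    }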
* expose ngalert API to public
* add delete action to time-intervals
* introduce time-interval model generated by app-platform-sdk from the CUE model; the fields of the model are chosen to be compatible with the current model (see the sketch after this list)
* implement api server
* add feature flag alertingApiServer
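An illustrative sketch of the time-interval model's shape; the real type is generated from the CUE definition, and these field names are assumptions that mirror the existing mute-timing model:

    package timeinterval

    // Illustrative shape only; the generated model may differ.
    type Interval struct {
        Times       []TimeRange `json:"times,omitempty"`
        Weekdays    []string    `json:"weekdays,omitempty"`
        DaysOfMonth []string    `json:"days_of_month,omitempty"`
        Months      []string    `json:"months,omitempty"`
        Years       []string    `json:"years,omitempty"`
        Location    string      `json:"location,omitempty"`
    }

    // TimeRange bounds one interval within a day, e.g. 09:00-17:00.
    type TimeRange struct {
        StartTime string `json:"start_time"`
        EndTime   string `json:"end_time"`
    }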
---- Test Infra
* update helper to support creating custom users with enterprise permissions
* add generator for Interval model
* Simple replace of State.Resolved with State.ResolvedAt
* Retain ResolvedAt time between Normal->Normal transition
* Introduce ResolvedRetention to keep sending recently resolved alerts
* Make ResolvedRetention configurable with resolved_alert_retention (see the sketch after this list)
* Tick-based LastSentAt for testing of ResendDelay and ResolvedRetention
* Do not reset ResolvedAt during Normal->Pending transition
Initially this was done to be in line with the Prom ruler. However, the Prom ruler
doesn't track Inactive->Pending/Alerting using the same alert instance, so it's
understandable that it chooses not to retain ResolvedAt. In our case, since we use
the same cached instance to represent the transition, it makes more sense to retain it.
This should help alleviate odd situations where temporarily entering Pending would
stop future resolved notifications that ResolvedRetention would otherwise have sent.
* Pointers for ResolvedAt & LastSentAt
To avoid awkward time.Time{}.Unix() defaults on persist
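A sketch of the retention check and the pointer fields, under assumed names (ResolvedState and shouldSendResolved are illustrative):

    package alerting

    import "time"

    // ResolvedAt and LastSentAt are pointers so an unset value
    // persists as NULL instead of time.Time{}'s Unix() value.
    type ResolvedState struct {
        ResolvedAt *time.Time
        LastSentAt *time.Time
    }

    // shouldSendResolved reports whether a resolved alert is still
    // within the retention window (resolved_alert_retention) and
    // should keep being included in notifications.
    func shouldSendResolved(s ResolvedState, now time.Time, retention time.Duration) bool {
        return s.ResolvedAt != nil && now.Sub(*s.ResolvedAt) <= retention
    }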
* Zanzana: Use Grafana migrations to run OpenFGA migration files and initialize the store.
* Add feature toggle
* Zanzana: return noop client if feature toggle is disabled
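A sketch of the noop fallback, with an assumed Client interface rather than the real Zanzana wiring:

    package zanzana

    // Callers always get a Client, but checks become no-ops when the
    // feature toggle is off. The Check signature is an assumption.
    type Client interface {
        Check(object, relation, user string) (bool, error)
    }

    type noopClient struct{}

    func (noopClient) Check(object, relation, user string) (bool, error) {
        return false, nil // toggle off: skip authorization checks
    }

    type openFGAClient struct{ /* connection to the OpenFGA store */ }

    func (openFGAClient) Check(object, relation, user string) (bool, error) {
        // A real implementation would query the OpenFGA store here.
        return false, nil
    }

    func NewClient(enabled bool) Client {
        if !enabled {
            return noopClient{}
        }
        return openFGAClient{}
    }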
* add new apis
* add payloads
* create snapshot status type
* add some impl
* finish implementing update
* start implementing build snapshot func
* add more fake build logic
* add cancel endpoint. do some cleanup
* implement GetSnapshot
* implement upload snapshot
* merge on-prem status with GMS result (see the sketch after this list)
* get it working
* update comment
* rename list endpoint
* add query limit and offset
* add helper method to snapshot
* little bit of cleanup
* work on swagger annotations
* manual merge
* generate swagger specs
* clean up curl commands
* fix bugs found during final testing
* fix linter issue
* fix unit test
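A rough sketch of the status merge mentioned above; the status values and the merge rule are assumptions for illustration only:

    package cloudmigration

    // While the snapshot is being built and uploaded, the local store
    // owns its status; once handed off, GMS owns the rest of the
    // lifecycle, so the GMS-reported status wins.
    type Status string

    const (
        StatusCreating   Status = "creating"
        StatusUploading  Status = "uploading"
        StatusProcessing Status = "processing"
        StatusFinished   Status = "finished"
    )

    func mergeStatus(local, remote Status) Status {
        if local == StatusCreating || local == StatusUploading {
            return local // GMS has not seen the snapshot yet
        }
        if remote != "" {
            return remote
        }
        return local
    }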
This adds a version of the SQLStore that includes a ReadReplica. The primary DB can still be accessed directly: from the caller's standpoint, there is no difference between the SQLStore and the ReplStore unless they explicitly call ReadReplica() and use it for their DB sessions.
Currently only the stats service's GetSystemStats and GetAdminStats use the ReadReplica(); if it is misconfigured, or if the databaseReadReplica feature flag is not enabled, they fall back to the usual (SQLStore) behavior, as sketched after the config below.
Testing requires a database and a read replica, with replication already configured. I have been testing this locally with a Docker MySQL setup (https://medium.com/@vbabak/docker-mysql-master-slave-replication-setup-2ff553fceef2) and the following config:
[feature_toggles]
databaseReadReplica = true
[database]
type = mysql
name = grafana
user = grafana
password = password
host = 127.0.0.1:3306
[database_replica]
type = mysql
name = grafana
user = grafana
password = password
host = 127.0.0.1:3307
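A sketch of the fallback behavior with simplified types (everything beyond the SQLStore, ReplStore, and ReadReplica names is an assumption):

    package sqlstore

    type SQLStore struct{ /* primary DB handle */ }

    // ReplStore embeds the primary SQLStore, so existing callers are
    // unaffected unless they opt in via ReadReplica().
    type ReplStore struct {
        *SQLStore
        repl        *SQLStore // nil if [database_replica] is absent
        replEnabled bool      // databaseReadReplica feature flag
    }

    // ReadReplica returns the replica for read-only sessions, or the
    // primary if the replica is misconfigured or the flag is off.
    func (s *ReplStore) ReadReplica() *SQLStore {
        if !s.replEnabled || s.repl == nil {
            return s.SQLStore
        }
        return s.repl
    }

With this shape, the stats service can run its queries against ReadReplica() while every other caller keeps using the primary.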
* keep config in a separate struct in LDAP
* implement reload function for LDAP (see the sketch after this list)
* remove param from sso service constructor
* update unit tests
* add feature flag
* remove nil params
* address feedback
* add unit test for disabled config
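A sketch of the reloadable config, under assumed names:

    package ldap

    import "sync"

    // Keeping the config in its own struct behind a mutex lets
    // Reload swap it in place without rebuilding the service.
    type Config struct {
        Enabled bool
        Servers []string
    }

    type Service struct {
        mu  sync.RWMutex
        cfg *Config
    }

    // Reload atomically replaces the active LDAP config.
    func (s *Service) Reload(cfg *Config) {
        s.mu.Lock()
        defer s.mu.Unlock()
        s.cfg = cfg
    }

    // config is what request paths use to read the current settings.
    func (s *Service) config() *Config {
        s.mu.RLock()
        defer s.mu.RUnlock()
        return s.cfg
    }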