Alerting: fix preserving errors in the alert rule state during error-to-error transitions
A state transition from one error to another did not update state.Error correctly:
the error stored in state.Error remained the first one encountered.
This led to a second issue: after a Grafana restart the error was lost entirely,
because the state of the alert rule did not change, while the Error itself is not
preserved in the database between restarts.
This could happen if the expression service returned an error or the alert routine panicked
during querying.
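A minimal sketch of the corrected transition, with hypothetical names (State, ApplyEvalError) rather than Grafana's actual types:

```go
// Hypothetical sketch of the fix: on an Error -> Error transition the
// stored error must be refreshed so the latest failure is what gets
// persisted and notified on.
package state

import "time"

type EvalState int

const (
	Normal EvalState = iota
	Error
)

type State struct {
	State       EvalState
	Error       error // last evaluation error, persisted with the state
	LastUpdated time.Time
}

// ApplyEvalError records a failed evaluation.
func (s *State) ApplyEvalError(err error, now time.Time) {
	// Before the fix, an Error -> Error transition skipped this update,
	// so s.Error kept the first error ever encountered.
	s.State = Error
	s.Error = err // always overwrite, even when already in Error state
	s.LastUpdated = now
}
```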
* create a new table for migration resources
* remove raw result bytes from db
* more snapshot resource management stuff
* integrate new table with snapshots
* pass in result limit and offset as params
* combine create and update
* set up xorm store test
* add unit tests
* save some CPU
* remove unneeded arg
* regen swagger
* fix bug with result processing
* fix update create logic so that uid isn't required for lookup
* change offset to page
* regen swagger
* revert accidental changes to file
* curl command page should be 1-indexed (see the pagination sketch after this commit list)
* add regex support for api tests
* revert dumb thing
* add api tests
* add unit test for core async workflow
* add xorm store unit tests
* fix typo
* remove unnecessary assignment
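Since the commits above switch the API from a raw offset to a 1-indexed page parameter, here is a hedged sketch of the translation (pageToOffset is a hypothetical helper, not the actual code):

```go
package cloudmigration

// pageToOffset translates the API's 1-indexed page parameter into the
// SQL offset used when querying snapshot resources.
func pageToOffset(page, limit int) int {
	if page < 1 {
		page = 1 // pages are 1-indexed, matching the curl examples
	}
	return (page - 1) * limit
}
```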
* expose ngalert API to public
* add delete action to time-intervals
* introduce time-interval model generated by app-platform-sdk from a CUE model; the fields of the model are chosen to be compatible with the current model
* implement api server
* add feature flag alertingApiServer
---- Test Infra
* update helper to support creating custom users with enterprise permissions
* add generator for Interval model
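For orientation, a hypothetical sketch of what such a time-interval model could look like; the field names are assumed from Alertmanager-style time intervals, not taken from the generated code:

```go
package timeinterval

// Interval is a hypothetical shape of the time-interval resource; the
// real model is generated by app-platform-sdk from a CUE definition.
type Interval struct {
	Name          string          `json:"name"`
	TimeIntervals []TimeRangeSpec `json:"time_intervals"`
}

type TimeRangeSpec struct {
	Times       []TimeRange `json:"times,omitempty"`         // e.g. 09:00-17:00
	Weekdays    []string    `json:"weekdays,omitempty"`      // e.g. "monday:friday"
	DaysOfMonth []string    `json:"days_of_month,omitempty"`
	Months      []string    `json:"months,omitempty"`
	Years       []string    `json:"years,omitempty"`
	Location    string      `json:"location,omitempty"`      // IANA time zone
}

type TimeRange struct {
	StartTime string `json:"start_time"`
	EndTime   string `json:"end_time"`
}
```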
* Simple replace of State.Resolved with State.ResolvedAt
* Retain ResolvedAt time between Normal->Normal transition
* Introduce ResolvedRetention to keep sending recently resolved alerts
* Make ResolvedRetention configurable with resolved_alert_retention
* Tick-based LastSentAt for testing of ResendDelay and ResolvedRetention
* Do not reset ResolvedAt during Normal->Pending transition
Initially this was done to be in line with the Prom ruler. However, the Prom ruler
doesn't track Inactive->Pending/Alerting with the same alert instance,
so it's understandable that it chooses not to retain ResolvedAt. In our
case, since we use the same cached instance to represent the transition, it
makes more sense to retain it.
This should help alleviate some odd situations where temporarily entering
Pending would cut off future resolved notifications that should have been
sent because of ResolvedRetention.
* Pointers for ResolvedAt & LastSentAt
To avoid awkward time.Time{}.Unix() defaults on persist
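A condensed, hypothetical sketch of the behavior described above (simplified names; the pointer fields are what avoid persisting the time.Time{} zero value):

```go
package state

import "time"

type EvalState int

const (
	Normal EvalState = iota
	Pending
	Alerting
)

type State struct {
	State      EvalState
	ResolvedAt *time.Time // nil until the alert first resolves
	LastSentAt *time.Time
}

func (s *State) transition(next EvalState, now time.Time) {
	switch {
	case s.State == Alerting && next == Normal:
		s.ResolvedAt = &now // freshly resolved
	case next == Normal || next == Pending:
		// Retain ResolvedAt across Normal->Normal and Normal->Pending so
		// a brief Pending excursion doesn't cut off resolved re-sends.
	default:
		s.ResolvedAt = nil
	}
	s.State = next
}

// shouldResend keeps notifying about a recently resolved alert for the
// resolved_alert_retention window.
func shouldResend(s *State, now time.Time, retention time.Duration) bool {
	return s.ResolvedAt != nil && now.Sub(*s.ResolvedAt) <= retention
}
```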
* Zanzana: Use Grafana migrations to run OpenFGA migration files and initialize store.
* Add feature toggle
* Zanzana: return noop client if feature toggle is disabled
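A minimal sketch of the toggle-guarded constructor, assuming a flag named "zanzana" and simplified interfaces:

```go
package zanzana

// Client is a pared-down stand-in for the real client interface.
type Client interface {
	Check(namespace, relation, user string) (bool, error)
}

// NoopClient satisfies Client but authorizes nothing and never errors,
// so callers need no nil checks when the feature is off.
type NoopClient struct{}

func (NoopClient) Check(_, _, _ string) (bool, error) { return false, nil }

type FeatureToggles interface{ IsEnabled(flag string) bool }

func ProvideClient(features FeatureToggles, openFGA Client) Client {
	if !features.IsEnabled("zanzana") { // assumed flag name
		return NoopClient{}
	}
	return openFGA
}
```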
* add new apis
* add payloads
* create snapshot status type
* add some impl
* finish implementing update
* start implementing build snapshot func
* add more fake build logic
* add cancel endpoint. do some cleanup
* implement GetSnapshot
* implement upload snapshot
* merge on-prem status with GMS result
* get it working
* update comment
* rename list endpoint
* add query limit and offset
* add helper method to snapshot
* little bit of cleanup
* work on swagger annotations
* manual merge
* generate swagger specs
* clean up curl commands
* fix bugs found during final testing
* fix linter issue
* fix unit test
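For illustration, a hypothetical version of the snapshot status type these commits mention; the actual states may differ:

```go
package cloudmigration

// SnapshotStatus is an assumed shape of the snapshot status type; the
// user-facing value merges the on-prem state with the GMS result.
type SnapshotStatus string

const (
	SnapshotStatusCreating      SnapshotStatus = "creating"
	SnapshotStatusPendingUpload SnapshotStatus = "pending_upload"
	SnapshotStatusUploading     SnapshotStatus = "uploading"
	SnapshotStatusProcessing    SnapshotStatus = "processing"
	SnapshotStatusFinished      SnapshotStatus = "finished"
	SnapshotStatusCanceled      SnapshotStatus = "canceled"
	SnapshotStatusError         SnapshotStatus = "error"
)
```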
This adds a version of the SQLStore that includes a ReadReplica. The primary DB can still be accessed directly: from the caller's standpoint there is no difference between the SQLStore and the ReplStore unless they explicitly call ReadReplica() and use it for their DB sessions.
Currently only the stats service's GetSystemStats and GetAdminStats use the ReadReplica(); if it is misconfigured, or if the databaseReadReplica feature flag is not enabled, they fall back to the usual (SQLStore) behavior.
Testing requires a database and a read replica with replication already configured. I have been testing this locally with a Docker MySQL setup (https://medium.com/@vbabak/docker-mysql-master-slave-replication-setup-2ff553fceef2) and the following config:
```ini
[feature_toggles]
databaseReadReplica = true

[database]
type = mysql
name = grafana
user = grafana
password = password
host = 127.0.0.1:3306

[database_replica]
type = mysql
name = grafana
user = grafana
password = password
host = 127.0.0.1:3307
```
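A sketch of the intended call pattern, using assumed minimal interfaces rather than Grafana's real sqlstore types:

```go
package stats

import "context"

// DB is a pared-down stand-in for a SQL store.
type DB interface {
	Query(ctx context.Context, rawSQL string, dest any) error
}

// ReplStore behaves like the primary store; ReadReplica returns the
// replica when configured, otherwise the primary as a fallback.
type ReplStore interface {
	DB
	ReadReplica() DB
}

type StatsService struct{ store ReplStore }

// GetSystemStats routes its read-only query to the replica; because of
// the fallback, the call site is identical either way.
func (s *StatsService) GetSystemStats(ctx context.Context) (int64, error) {
	var count int64
	err := s.store.ReadReplica().Query(ctx, `SELECT COUNT(*) FROM "user"`, &count)
	return count, err
}
```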
* keep config in a separate struct in LDAP
* implement reload function for LDAP
* remove param from sso service constructor
* update unit tests
* add feature flag
* remove nil params
* address feedback
* add unit test for disabled config
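A minimal sketch of the reload mechanics with simplified, assumed types: the config lives in its own struct guarded by a mutex so it can be swapped at runtime:

```go
package ldap

import "sync"

type ServerConfig struct {
	Host string
	Port int
}

// Config holds the LDAP settings separately from the service itself.
type Config struct {
	Servers []ServerConfig
}

type Service struct {
	mu  sync.RWMutex
	cfg *Config
}

// Reload atomically replaces the active configuration without a restart.
func (s *Service) Reload(cfg *Config) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.cfg = cfg
}

// config returns the current configuration for use by LDAP operations.
func (s *Service) config() *Config {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.cfg
}
```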
* Fix restoring dashboard to root folder
* use a root folder representation instead of nil
* replace the root folder with the general folder
---------
Co-authored-by: Ezequiel Victorero <ezequiel.victorero@grafana.com>
* Zanzana: Initial work to run Zanzana as embedded or standalone
* Add addr settings for when remote client is used.
* sync dependencies
* Lock mysql driver version
---------
Co-authored-by: Dan Cech <dcech@grafana.com>
* WIP
* Add barchart panel with scenes
* Fix timerange in barchart panel
* Refactor: component names
* Remove unused CSS styles and rename panel title
* Remove unnecessary HistoryEventsListObject class and update text in labels filter
* add padding top for filter
* Add translations
* update limit labels constant
* Update showing state reason
* Fix scene object
* Address review comments
* Update icons
* use endpoints instead of the autogenerated hook
* Address review comments
* Add tooltip for alert name
* use private polling interval
* fix autogenerated translations
* Address PR review comments
* Address review comments
* Update text in placeholder
* Rename variable and remove spaces in Trans children
* Fix several broadcaster data races and error handling
- Separate concerns between sender and receiver sides in channel usage
- broadcaster: Fix data race between Subscribe/Unsubscribe and start
- Fix Subscribe error to be io.EOF when broadcaster is terminated
- Fix Watch never unsubscribing
- General cleanup
- fix usage of context
- add a huge amount of documentation about channels
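A minimal, hypothetical broadcaster illustrating the fixed invariants: one goroutine owns the subscriber set (so Subscribe/Unsubscribe cannot race with start), channels are closed only by their owning side, and Subscribe reports io.EOF once the broadcaster has terminated:

```go
package broadcast

import (
	"context"
	"io"
)

type broadcaster[T any] struct {
	events chan T
	subs   chan chan T   // subscribe requests
	unsubs chan chan T   // unsubscribe requests
	done   chan struct{} // closed when run exits
}

func newBroadcaster[T any](ctx context.Context) *broadcaster[T] {
	b := &broadcaster[T]{
		events: make(chan T),
		subs:   make(chan chan T),
		unsubs: make(chan chan T),
		done:   make(chan struct{}),
	}
	go b.run(ctx)
	return b
}

// run owns the subscriber set; only this goroutine ever touches it,
// which is what removes the Subscribe/Unsubscribe data race.
func (b *broadcaster[T]) run(ctx context.Context) {
	defer close(b.done)
	subscribers := map[chan T]struct{}{}
	for {
		select {
		case ch := <-b.subs:
			subscribers[ch] = struct{}{}
		case ch := <-b.unsubs:
			delete(subscribers, ch)
			close(ch) // only the owner closes subscriber channels
		case ev := <-b.events:
			for ch := range subscribers {
				ch <- ev // buffered channels keep slow readers tolerable
			}
		case <-ctx.Done():
			return
		}
	}
}

// Subscribe returns io.EOF if the broadcaster has already terminated.
func (b *broadcaster[T]) Subscribe() (chan T, error) {
	ch := make(chan T, 16)
	select {
	case b.subs <- ch:
		return ch, nil
	case <-b.done:
		return nil, io.EOF
	}
}

// Unsubscribe is safe to call after termination; it simply no-ops.
func (b *broadcaster[T]) Unsubscribe(ch chan T) {
	select {
	case b.unsubs <- ch:
	case <-b.done:
	}
}

// Publish delivers ev to all current subscribers.
func (b *broadcaster[T]) Publish(ctx context.Context, ev T) {
	select {
	case b.events <- ev:
	case <-b.done:
	case <-ctx.Done():
	}
}
```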
* Add TracedClient
* Handle errors and status codes
* Wire up tracing to normal ASH and loki annotation mapping
* Add tracing to remote alertmanager
* one more spot
* and not or
* More consistency with other grafana traces, lower cardinality name
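A sketch of what such a traced client can look like, written directly against OpenTelemetry's Go API; names like TracedClient and its Do method are assumptions, not Grafana's actual wrapper:

```go
package remote

import (
	"context"
	"fmt"
	"net/http"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/codes"
)

// TracedClient is a hypothetical HTTP client wrapper that records a
// span, the status code, and any error for each remote Alertmanager call.
type TracedClient struct {
	inner *http.Client
}

func (c *TracedClient) Do(ctx context.Context, req *http.Request) (*http.Response, error) {
	// Low-cardinality span name: method only, no URL parameters.
	ctx, span := otel.Tracer("alertmanager.remote").Start(ctx, "remote.alertmanager."+req.Method)
	defer span.End()

	resp, err := c.inner.Do(req.WithContext(ctx))
	if err != nil {
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
		return nil, err
	}
	span.SetAttributes(attribute.Int("http.status_code", resp.StatusCode))
	if resp.StatusCode >= 400 {
		span.SetStatus(codes.Error, fmt.Sprintf("unexpected status %d", resp.StatusCode))
	}
	return resp, nil
}
```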
* chore(perf): Pre-allocate where possible (enable prealloc linter)
Signed-off-by: Dave Henderson <dave.henderson@grafana.com>
* fix TestAlertManagers_buildRedactedAMs
Signed-off-by: Dave Henderson <dave.henderson@grafana.com>
* prealloc a slice that appeared after rebase
Signed-off-by: Dave Henderson <dave.henderson@grafana.com>
---------
Signed-off-by: Dave Henderson <dave.henderson@grafana.com>
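The change itself is the standard pre-allocation pattern; an illustrative example (buildRedactedAMs echoes the test name above, redact is a hypothetical helper):

```go
package main

import "strings"

// redact is a stand-in for whatever scrubs credentials from the URL.
func redact(u string) string {
	if i := strings.Index(u, "@"); i >= 0 {
		return "***" + u[i:]
	}
	return u
}

// What the prealloc linter pushes for: when the final length is known,
// size the slice's capacity once instead of growing it append by append.
func buildRedactedAMs(urls []string) []string {
	redacted := make([]string, 0, len(urls)) // single allocation
	for _, u := range urls {
		redacted = append(redacted, redact(u))
	}
	return redacted
}
```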