Prior to this change, all alert instance writes and deletes happened
individually, in their own database transaction. This change batches up
writes or deletes for a given rule's evaluation loop into a single
transaction before applying it.
These new transactions are off by default, guarded by the feature toggle "alertingBigTransactions"
Before:
```
goos: darwin
goarch: arm64
pkg: github.com/grafana/grafana/pkg/services/ngalert/store
BenchmarkAlertInstanceOperations-8 398 2991381 ns/op 1133537 B/op 27703 allocs/op
--- BENCH: BenchmarkAlertInstanceOperations-8
util.go:127: alert definition: {orgID: 1, UID: FovKXiRVzm} with title: "an alert definition FTvFXmRVkz" interval: 60 created
util.go:127: alert definition: {orgID: 1, UID: foDFXmRVkm} with title: "an alert definition fovFXmRVkz" interval: 60 created
util.go:127: alert definition: {orgID: 1, UID: VQvFuigVkm} with title: "an alert definition VwDKXmR4kz" interval: 60 created
PASS
ok github.com/grafana/grafana/pkg/services/ngalert/store 1.619s
```
After:
```
goos: darwin
goarch: arm64
pkg: github.com/grafana/grafana/pkg/services/ngalert/store
BenchmarkAlertInstanceOperations-8 1440 816484 ns/op 352297 B/op 6529 allocs/op
--- BENCH: BenchmarkAlertInstanceOperations-8
util.go:127: alert definition: {orgID: 1, UID: 302r_igVzm} with title: "an alert definition q0h9lmR4zz" interval: 60 created
util.go:127: alert definition: {orgID: 1, UID: 71hrlmR4km} with title: "an alert definition nJ29_mR4zz" interval: 60 created
util.go:127: alert definition: {orgID: 1, UID: Cahr_mR4zm} with title: "an alert definition ja2rlmg4zz" interval: 60 created
PASS
ok github.com/grafana/grafana/pkg/services/ngalert/store 1.383s
```
So we cut time by about 75% and memory allocations by about 60% when
storing and deleting 100 instances.
* WIP
* Set public_suffix to a pre Ruby 2.6 version
* we don't need to install python
* Stretch->Buster
* Bump versions in lib.star
* Manually update linter
Sort of messy, but the .mod-file need to contain all dependencies that
use 1.16+ features, otherwise they're assumed to be compiled with
-lang=go1.16 and cannot access generics et al.
Bingo doesn't seem to understand that, but it's possible to manually
update things to get Bingo happy.
* undo reformatting
* Various lint improvements
* More from the linter
* goimports -w ./pkg/
* Disable gocritic
* Add/modify linter exceptions
* lint + flatten nested list
Go 1.19 doesn't support nested lists, and there wasn't an obvious workaround.
https://go.dev/doc/comment#lists
Prior to this change, all alert instance writes and deletes happened
individually, in their own database transaction. This change batches up
writes or deletes for a given rule's evaluation loop into a single
transaction before applying it.
Before:
```
goos: darwin
goarch: arm64
pkg: github.com/grafana/grafana/pkg/services/ngalert/store
BenchmarkAlertInstanceOperations-8 398 2991381 ns/op 1133537 B/op 27703 allocs/op
--- BENCH: BenchmarkAlertInstanceOperations-8
util.go:127: alert definition: {orgID: 1, UID: FovKXiRVzm} with title: "an alert definition FTvFXmRVkz" interval: 60 created
util.go:127: alert definition: {orgID: 1, UID: foDFXmRVkm} with title: "an alert definition fovFXmRVkz" interval: 60 created
util.go:127: alert definition: {orgID: 1, UID: VQvFuigVkm} with title: "an alert definition VwDKXmR4kz" interval: 60 created
PASS
ok github.com/grafana/grafana/pkg/services/ngalert/store 1.619s
```
After:
```
goos: darwin
goarch: arm64
pkg: github.com/grafana/grafana/pkg/services/ngalert/store
BenchmarkAlertInstanceOperations-8 1440 816484 ns/op 352297 B/op 6529 allocs/op
--- BENCH: BenchmarkAlertInstanceOperations-8
util.go:127: alert definition: {orgID: 1, UID: 302r_igVzm} with title: "an alert definition q0h9lmR4zz" interval: 60 created
util.go:127: alert definition: {orgID: 1, UID: 71hrlmR4km} with title: "an alert definition nJ29_mR4zz" interval: 60 created
util.go:127: alert definition: {orgID: 1, UID: Cahr_mR4zm} with title: "an alert definition ja2rlmg4zz" interval: 60 created
PASS
ok github.com/grafana/grafana/pkg/services/ngalert/store 1.383s
```
So we cut time by about 75% and memory allocations by about 60% when
storing and deleting 100 instances.
This change also updates some of our tests so that they run successfully against postgreSQL - we were using random Int64s, but postgres integers, which our tables use, max out at 2^31-1
* Add database migrations
* Use short uids as data key ids
* Add support for manual data key rotation
* Fix duplicated mutex unlocks
* Fix migration
* Manage current data keys per name
* Adjust key re-encryption and test
* Modify rename column migration for MySQL compatibility
* Refactor secrets manager and data keys cache
* Multiple o11y adjustments
* Fix stats query
* Apply suggestions from code review
Co-authored-by: Tania <yalyna.ts@gmail.com>
* Fix linter
* Docs: Rotate data encryption keys API endpoint
Co-authored-by: Tania <yalyna.ts@gmail.com>
* update AlertingEnabled and UnifiedAlertingSettings.Enabled to be pointers
* add a pseudo migration to fix the AlertingEnabled and UnifiedAlertingSettings.Enabled if the latter is not defined
* update the default configuration file to make default value for both 'enabled' flags be undefined
Misc
* update Migrator to expose DB engine. This is needed for a ualert migration to access the database while the list of migrations is created.
* add more verbose failure when migrations do not match
Co-authored-by: gotjosh <josue@grafana.com>
Co-authored-by: Yuriy Tseretyan <yuriy.tseretyan@grafana.com>
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
Fixes#30144
Co-authored-by: dsotirakis <sotirakis.dim@gmail.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
Co-authored-by: Ida Furjesova <ida.furjesova@grafana.com>
Co-authored-by: Jack Westbrook <jack.westbrook@gmail.com>
Co-authored-by: Will Browne <wbrowne@users.noreply.github.com>
Co-authored-by: Leon Sorokin <leeoniya@gmail.com>
Co-authored-by: Andrej Ocenas <mr.ocenas@gmail.com>
Co-authored-by: spinillos <selenepinillos@gmail.com>
Co-authored-by: Karl Persson <kalle.persson@grafana.com>
Co-authored-by: Leonard Gram <leo@xlson.com>
* Alerting: modify table and accessors to limit org access appropriately
* Update migration to create multiple Alertmanager configs
* Apply suggestions from code review
Co-authored-by: gotjosh <josue@grafana.com>
* replace mg.ClearMigrationEntry()
mg.ClearMigrationEntry() would create a new session.
This commit introduces a new migration for clearing an entry from migration log for replacing mg.ClearMigrationEntry() so that all dashboard alert migration operations will run inside the same transaction.
It adds also `SkipMigrationLog()` in Migrator interface for skipping adding an entry in the migration_log.
Co-authored-by: gotjosh <josue@grafana.com>
makes it so the feature flag can be turned on off, and the migration will be cleared and rerun. All existing NG alert rules, configuration settings, etc are removed when disabling the feature flag.
for https://github.com/grafana/alerting-squad/issues/142
Co-authored-by: Sofia Papagiannaki <sofia@grafana.com>
* Always use cache: stop passing skipCache among ngalert functions
* Add updated column
* Scheduler initial draft
* Add retry on failure
* Allow settting/updating alert definition interval
Set default interval if no interval is provided during alert definition creation.
Keep existing alert definition interval if no interval is provided during alert definition update.
* Parameterise alerting.Ticker to run on custom interval
* Allow updating alert definition interval without having to provide the queries and expressions
* Add schedule tests
* Use xorm tags for having initialisms with consistent case in Go
* Add ability to pause/unpause the scheduler
* Add alert definition versioning
* Optimise scheduler to fetch alert definition only when it's necessary
* Change MySQL data column to mediumtext
* Delete alert definition versions
* Increase default scheduler interval to 10 seconds
* Fix setting OrgID on updates
* Add validation for alert definition name length
* Recreate tables
* sqlstore: Run tests as integration tests
* Truncate database instead of re-creating it on each test
* Fix test description
See https://github.com/grafana/grafana/pull/12129
* Fix lint issues
* Fix postgres dialect after review suggestion
* Rename and document functions after review suggestion
* Add periods
* Fix auto-increment value for mysql dialect
Co-authored-by: Emil Tullstedt <emil.tullstedt@grafana.com>
* Implementation of optimistic lock pattern
Try to insert the remote cache key and handle integrity error
* Remove transaction
Integrity error inside a transaction results in deadlock
* Remove check for existing remote cache key
Is no longer needed since integrity constrain violations are handled
* Add check for integrity constrain violation
Do not update the row if the insert statement fails
for other than an integrity constrain violation
* Handle failing inserts because of deadlocks
If the insert statement fails because of a deadlock
try to update the row
* Add utility function for returning SQL error code
Useful for debugging
* Add logging for failing expired cache key deletion
Do not shallow it completely
* Revert "Add utility function for returning SQL error code"
This reverts commit 8e0b82c79633e7d8bc350823cbbab2ac7a58c0a5.
* Better log for failing deletion of expired cache key
* Add some comments
* Remove check for existing cache key
Attempt to insert the key without checking if it's already there
and handle the error situations
* Do not propagate deadlocks created during update
Most probably somebody else is trying to insert/update
the key at the same time so it is safe enough to ignore it
This is basically implementation of the https://github.com/grafana/grafana/issues/8900#issuecomment-435437167
points, except for the type conversion bit.
I tried to implement idea mentioned in cockroachdb ticket (see below).
And it is possible, but it complicates things as lot - not only we have to
have 4 SQL statements instead of one, but we would have to copy the column
structure as well - PK, FG, indexes and stuff, plus there will
be additional downtime with this approach.
So idea for this pull is to prepare our SQL as much as possible, so when
cockroachdb will add support for full type conversions, we could easilly add
support for it as well.
* Add `CASCADE` to `DROP INDEX` statement
* Make string conversions explicit
Thanks @Luit
Ref #8900
Ref cockroach/cockroach#9851
* refactor: tracing service refactoring
* refactor: sqlstore to instance service
* refactor: sqlstore & registory priority
* refactor: sqlstore refactor wip
* sqlstore: progress on getting tests to work again
* sqlstore: progress on refactoring and getting tests working
* sqlstore: connection string fix
* fix: not sure why this test is not working and required changing expires
* fix: updated grafana-cli