grafana

mirror of https://github.com/grafana/grafana.git synced 2025-02-15 10:03:33 -06:00

Author	SHA1	Message	Date
Neel	db1fd10ff1	Alerting: Append org ID to alert notification URLs (#57123 )	2022-11-07 16:03:25 +00:00
Yuriy Tseretyan	0a4121cef8	Alerting: Contextual log provider for rule key (#57476 ) * create contextual log context provider * use contextual provider in scheduler * init logger in the package * use context for log context * use context in state manager	2022-10-26 19:16:02 -04:00
Yuriy Tseretyan	2d20c8db7b	Chore: Expression engine to support relative time range (#57474 ) * make TimeRange interface and add relative range * make Execute methods support the current time * update resample to support relative time range * update DSNode to support relative time range * update query service to create queries with absolute time * make alerting evaluator create relative time ranges	2022-10-26 16:13:58 -04:00
George Robinson	802d67eeca	Alerting: Support values in notification templates (#56457 ) We have received a lot of feedback regarding the ValueString in alert notifications. Perhaps one of the most frequent complaints about ValueString is that it is difficult to read because it contains a lot of information, and the information is shown as a JSON-like string. Users have often asked how it can be templated and the answer is that it can't. Until now users have been able to add custom annotations to their alert rules which contain values via the $values variable added in previous versions of Grafana. However, these custom annotations must be added to each of the user's alert rules, instead of once in a template that all of their alerts can be notified via. This commit adds the much-requested feature of supporting values in notification templates. Users can then create a single template that prints the annotations, labels and values of their alerts in a format of their choice!	2022-10-10 13:40:21 +01:00
Joe Blubaugh	b476ae62fb	Alerting: Write and Delete multiple alert instances. (#55350 ) Prior to this change, all alert instance writes and deletes happened individually, in their own database transaction. This change batches up writes or deletes for a given rule's evaluation loop into a single transaction before applying it. These new transactions are off by default, guarded by the feature toggle "alertingBigTransactions" Before: ``` goos: darwin goarch: arm64 pkg: github.com/grafana/grafana/pkg/services/ngalert/store BenchmarkAlertInstanceOperations-8 398 2991381 ns/op 1133537 B/op 27703 allocs/op --- BENCH: BenchmarkAlertInstanceOperations-8 util.go:127: alert definition: {orgID: 1, UID: FovKXiRVzm} with title: "an alert definition FTvFXmRVkz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: foDFXmRVkm} with title: "an alert definition fovFXmRVkz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: VQvFuigVkm} with title: "an alert definition VwDKXmR4kz" interval: 60 created PASS ok github.com/grafana/grafana/pkg/services/ngalert/store 1.619s ``` After: ``` goos: darwin goarch: arm64 pkg: github.com/grafana/grafana/pkg/services/ngalert/store BenchmarkAlertInstanceOperations-8 1440 816484 ns/op 352297 B/op 6529 allocs/op --- BENCH: BenchmarkAlertInstanceOperations-8 util.go:127: alert definition: {orgID: 1, UID: 302r_igVzm} with title: "an alert definition q0h9lmR4zz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: 71hrlmR4km} with title: "an alert definition nJ29_mR4zz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: Cahr_mR4zm} with title: "an alert definition ja2rlmg4zz" interval: 60 created PASS ok github.com/grafana/grafana/pkg/services/ngalert/store 1.383s ``` So we cut time by about 75% and memory allocations by about 60% when storing and deleting 100 instances.	2022-10-06 14:22:58 +08:00
Alexander Weaver	d66ed6fe35	Alerting: Move stray model structs in store package to model package (#55968 ) * Move stray command structs to model package like the rest * Fix broken references	2022-09-29 15:47:56 -05:00
Alexander Weaver	d17ab82b98	Alerting: Break up store.RuleStore interface, delete dead code (#55776 ) * Refactor state manager to not depend on rule store interface * Refactor grafana and proxied ruler APIs to not depend on store.RuleStore * Refactor folder subscription logic to not use store.RuleStore * Delete dead code * Delete store.RuleStore	2022-09-27 08:56:30 -05:00
Alexander Weaver	f11495a4c3	Alerting: Remove dead functionality from alert instance store (#55774 ) * Update tests to use ListAlertInstances * Drop the actual methods rather than just updating tests	2022-09-26 14:38:53 -05:00
Yuriy Tseretyan	2d38664fe6	Alerting: Improve validation of query and expressions on rule submit (#53258 ) * Improve error messages of server-side expression * move validation of alert queries and a condition to eval package	2022-09-21 15:14:11 -04:00
Yuriy Tseretyan	199996cbf9	Alerting: Resolve stale state + add state reason to notifications (#49352 ) * adds a new reserved annotation `grafana_state_reason` * explicitly resolve stale states	2022-09-21 13:24:47 -04:00
Joe Blubaugh	22c937340e	Revert "Alerting: Write and Delete multiple alert instances. (#54072 )" (#54885 ) This reverts commit `5e4fd94413`.	2022-09-09 17:44:06 +02:00
Joe Blubaugh	5e4fd94413	Alerting: Write and Delete multiple alert instances. (#54072 ) Prior to this change, all alert instance writes and deletes happened individually, in their own database transaction. This change batches up writes or deletes for a given rule's evaluation loop into a single transaction before applying it. Before: ``` goos: darwin goarch: arm64 pkg: github.com/grafana/grafana/pkg/services/ngalert/store BenchmarkAlertInstanceOperations-8 398 2991381 ns/op 1133537 B/op 27703 allocs/op --- BENCH: BenchmarkAlertInstanceOperations-8 util.go:127: alert definition: {orgID: 1, UID: FovKXiRVzm} with title: "an alert definition FTvFXmRVkz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: foDFXmRVkm} with title: "an alert definition fovFXmRVkz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: VQvFuigVkm} with title: "an alert definition VwDKXmR4kz" interval: 60 created PASS ok github.com/grafana/grafana/pkg/services/ngalert/store 1.619s ``` After: ``` goos: darwin goarch: arm64 pkg: github.com/grafana/grafana/pkg/services/ngalert/store BenchmarkAlertInstanceOperations-8 1440 816484 ns/op 352297 B/op 6529 allocs/op --- BENCH: BenchmarkAlertInstanceOperations-8 util.go:127: alert definition: {orgID: 1, UID: 302r_igVzm} with title: "an alert definition q0h9lmR4zz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: 71hrlmR4km} with title: "an alert definition nJ29_mR4zz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: Cahr_mR4zm} with title: "an alert definition ja2rlmg4zz" interval: 60 created PASS ok github.com/grafana/grafana/pkg/services/ngalert/store 1.383s ``` So we cut time by about 75% and memory allocations by about 60% when storing and deleting 100 instances. This change also updates some of our tests so that they run successfully against postgreSQL - we were using random Int64s, but postgres integers, which our tables use, max out at 2^31-1	2022-09-02 11:17:20 +08:00
Timur Olzhabayev	b5b41988cf	Docs: Deprecating packages_api and removing it from our pipelines (#54473 )	2022-09-01 18:15:44 +02:00
Yuriy Tseretyan	76ea0b15ae	Alerting: Scheduler to fetch folders along with rules (#52842 ) * Update GetAlertRulesForScheduling to query for folders (if needed) * Update scheduler's alertRulesRegistry to cache folder titles along with rules * Update rule eval loop to take folder title from the * Extract interface RuleStore * Pre-fetch the rule keys with the version to detect changes, and query the full table only if there are changes.	2022-08-31 11:08:19 -04:00
Yuriy Tseretyan	41bd36eb97	Alerting: Update rules delete endpoint to handle rules in group (#53790 ) * update RouteDeleteAlertRules rules to update as a group * remove expecter from scheduler mock to support variadic function * create function to check for provisioning status + tests Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>	2022-08-24 15:33:33 -04:00
Yuriy Tseretyan	9f90a7b54d	Alerting: State manager to use InstanceStore (#53852 ) * move saving the state to state manager when scheduler stops * move saving state to ProcessEvalResults * add GetRuleKey to State * add LogContext to AlertRuleKey	2022-08-18 09:40:33 -04:00
Alexander Weaver	f093c249ac	Alerting: Fix incorrect embedded DTO being returned when handling rule groups (#53701 ) * Fix DTO embedding when getting/putting alert rule groups * Drop usage of word 'Domain' * Rename var as well	2022-08-12 16:36:50 -05:00
George Robinson	196b781c70	Alerting: Delete expired images from the database (#53236 ) This commit adds a DeleteExpiredService that deletes expired images from the database. It is run in the periodic collector service.	2022-08-09 15:28:36 +01:00
Jean-Philippe Quéméner	54217a2037	Alerting: set dashboard and panel id using annotations in provisioning api (#53221 )	2022-08-03 16:05:32 +02:00
Yuriy Tseretyan	5fb778814c	Alerting: Update rules version when folder title is updated (#53013 ) * remove support for bus from scheduler * rename event to FolderTitleUpdated and fire only if title has changed * add method to increase version of all rules that belong to a folder * update ngalert service to subscribe to folder title change event call data store and update scheduler * add tests	2022-08-01 19:28:38 -04:00
Yuriy Tseretyan	a081764fd8	Alerting: Scheduler to use AlertRule (#52354 ) * update GetAlertRulesForSchedulingQuery to have result AlertRule * update fetcher utils and registry to support AlertRule * alertRuleInfo to use alert rule instead of version * update updateCh handler of ruleRoutine to just clean up the state. The updated rule will be provided at the next evaluation * update evalCh handler of ruleRoutine to use rule from the message and clear state as well as update extra labels * remove unused function in ruleRoutine * remove unused model SchedulableAlertRule * store rule version in ruleRoutine instead of rule * do not call the sender if nothing to send	2022-07-26 09:40:06 -04:00
Yuriy Tseretyan	054fe54b03	Alerting: Split Scheduler and AlertRouter tests (#52416 ) * move fake FakeExternalAlertmanager to sender package * move tests from scheduler to router * update alerts router to have all fields private * update scheduler tests to use sender mock	2022-07-19 09:32:54 -04:00
Yuriy Tseretyan	6e1e4a4215	Alerting: Update DbStore to use disabled orgs from the config (#52156 ) * update DbStore to use UnifiedAlerting settings * remove disabled orgs from scheduler and use config in db store instead * remove test	2022-07-15 14:13:30 -04:00
Yuriy Tseretyan	e5e8747ee9	Alerting: Update state manager to accept reserved labels (#52189 ) * add tests for cache getOrCreate * update ProcessEvalResults to accept extra labels * extract to getRuleExtraLabels * move populating of constant rule labels to extra labels	2022-07-14 15:59:59 -04:00
Alexander Weaver	2d7389c34d	Alerting: Provisioning API respects global rule quota (#52180 ) * Inject interface for quota service and create mock * Check quota and return 403 if limit exceeded * Implement tests for quota being exceeded	2022-07-13 17:36:17 -05:00
Yuriy Tseretyan	554ebd647b	Alerting: Refactor Evaluator (#51673 ) * AlertRule to return condition * update ConditionEval to not return an error because it's always nil * make getExprRequest private * refactor executeCondition to just converter and move execution to the ConditionEval as this makes code more readable. * log error if results have errors * change signature of evaluate function to not return an error	2022-07-12 16:51:32 -04:00
George Robinson	6844ac9879	Alerting: Change __alertScreenshotToken__ to __alertImageToken__ (#50771 )	2022-07-04 06:05:36 -04:00
Jean-Philippe Quéméner	580c5b6ad2	Alerting: add YAML support for relative time range (#51694 )	2022-07-04 06:03:34 -04:00
Yuriy Tseretyan	8b3b667a47	Alerting: Fix rule API to accept 0 duration of field `For` (#50992 ) * make 'for' pointer to distinguish between missing field and 0 * set 'for' to -1 if the value is missing but do not allow negative in the request + patch -1 with the value from original rule * update store validation to not allow negative 'for' * update usages to use pointer	2022-06-30 11:46:26 -04:00
Yuriy Tseretyan	78c012df65	move eval_conditions to API models package (#51447 )	2022-06-27 11:52:41 -04:00
Yuriy Tseretyan	ee5bcf2b96	make test more stable (#51268 )	2022-06-22 12:53:16 -04:00
Yuriy Tseretyan	4d02f73e5f	Alerting: Persist rule position in the group (#50051 ) Migrations: * add a new column alert_group_idx to alert_rule table * add a new column alert_group_idx to alert_rule_version table * re-index existing rules during migration API: * set group index on update. Use the natural order of items in the array as group index * sort rules in the group on GET * update the version of all rules of all affected groups. This will make optimistic lock work in the case of multiple concurrent requests touching the same groups. UI: * update UI to keep the order of alerts in a group	2022-06-22 10:52:46 -04:00
Matthew Jacobson	5dee2ed24c	Alerting: Add first Grafana reserved label grafana_folder (#50262 ) * Alerting: Add first Grafana reserved label g_label g_label holds the title of the folder containing the alert. The intention of this label is to use it as part of the new default notification policy groupBy. * Add nil check on updateRule labels map * Disable gocyclo lint on schedule.ruleRoutine will remove later in a separate refactoring PR to reduce complexity. * Address doc suggestions * Update g_folder for rules in folder when folder title changes * Remove global bus in FolderService * Modify tests to fit new common g_folder label * Add changelog entry * Fix merge conflicts * Switch GrafanaReservedLabelPrefix from `g_` to `grafana_`	2022-06-17 13:10:49 -04:00
Yuriy Tseretyan	c314ce48c7	Alerting: Support for optimistic locking for alert rules (#50274 ) * add support for optimistic locking for alert_rule table * return 409 in the case of optimistic lock	2022-06-13 12:15:28 -04:00
Jean-Philippe Quéméner	862f51216b	Alerting: improve provisioning docs (#50347 ) * Alerting: improve provisioning docs * add new provisioning page * add api docs * fix formatting and add better descriptions * fix typo	2022-06-10 16:25:15 +02:00
Jean-Philippe Quéméner	cf684ed38f	Alerting: bump rule version when updating rule group interval (#50295 ) * Alerting: move group update to alert rule service * rename validateAlertRuleInterval to validateRuleGroupInterval * init baseinterval correctly * add seconds suffix * extract validation function for reusability * add context to err message	2022-06-09 09:28:32 +02:00
Yuriy Tseretyan	a89d4a5be7	Alerting: Scheduler to drop ticks if a rule's evaluation is too slow (#48885 ) * drop ticks if evaluation of a rule is too slow. * add metric schedule_rule_evaluations_missed_total	2022-06-08 12:50:44 -04:00
Yuriy Tseretyan	49d93fb67e	Alerting: Update alert rule diff to not see difference between nil and empty map (#50192 )	2022-06-03 21:27:29 +02:00
Yuriy Tseretyan	ad25e2a20c	Alerting: Update RBAC for alert rules to consider access to rule as access to group it belongs (#49033 ) * update authz to exclude entire group if user does not have access to rule * change rule update authz to not return changes because if user does not have access to any rule in group, they do not have access to the rule * a new query that returns alerts in group by UID of alert that belongs to that group * collect all affected groups during calculate changes * update authorize to check access to groups * update tests for calculateChanges to assert new fields * add authorization tests	2022-06-01 10:23:54 -04:00
Joe Blubaugh	9e8efaa459	Alerting: Add stored screenshot utilities to the channels package. (#49470 ) Adds three functions: `withStoredImages` iterates over a list of models.Alerts, extracting a stored image's data from storage, if available, and executing a user-provided function. `withStoredImage` does this for an image attached to a specific alert. `openImage` finds and opens an image file on disk. Moves `store.Image` to `models.Image` Simplifies `channels.ImageStore` interface and updates notifiers that use it to use the simpler methods. Updates all pkg/alert/notifier/channels to use withStoredImage routines.	2022-05-26 13:29:56 +08:00
Joe Blubaugh	1cc034d960	Alerting: Add a "Reason" to Alert Instances to show underlying cause of state. (#49259 ) This change adds a field to state.State and models.AlertInstance that indicates the "Reason" that an instance has its current state. This helps us account for cases where the state is "Normal" but the underlying evaluation returned "NoData" or "Error", for example. Fixes #42606 Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>	2022-05-23 16:49:49 +08:00
Joe Blubaugh	1d724810de	Alerting: State Manager takes screenshots. (#49338 ) The State Manager will now take screenshots when an alert instance switches to an Alerting or Resolved state. Signed-off-by: Joe Blubaugh joe.blubaugh@grafana.com	2022-05-23 10:53:41 +08:00
George Robinson	43358c7248	Alerting: Keep private annotations across evaluations (#49080 )	2022-05-18 11:21:18 +02:00
Yuriy Tseretyan	952cb4fc0b	Alerting: introduce AlertRuleGroupKey and use it in API handlers (#48945 ) * create AlertGroupKey structure * update PrometheusSrv. - extract creation of RuleGroup to a separate method. Use group key for grouping * update RuleSrv - update calculateChanges to use groupKey - authorize to use groupkey	2022-05-16 15:45:45 -04:00
Yuriy Tseretyan	369fcc5e9a	Alerting: scheduler to use short version of model for alert rule (#48916 ) * scheduler to use a short version of alert rule model	2022-05-12 09:55:05 -04:00
Alexander Weaver	078a578803	Drop ProvenanceOrgAdapter and build into store API instead (#48137 )	2022-04-26 10:30:57 -05:00
George Robinson	c5547123bc	Remove redundant queries in GetAlertRules and GetOrgAlertRules and replace with ListAlertRules (#48108 )	2022-04-25 11:42:42 +01:00
George Robinson	d66fc6ed1a	Alerting: Add GetRuleGroups to RuleStore (#48036 ) This commit adds a new method GetRuleGroups to RuleStore which returns the set of rule groups across all organizations.	2022-04-21 17:59:22 +01:00
Jean-Philippe Quéméner	388ecb4037	Alerting: Provisioning API - Contact points (#47197 )	2022-04-13 22:15:55 +02:00
Alexander Weaver	dde0b93cf1	Alerting: Provisioning API - Notification Policies (#46755 ) * Base-line API for provisioning notification policies * Wire API up, some simple tests * Return provenance status through API * Fix missing call * Transactions * Clarity in package dependencies * Unify receivers in definitions * Fix issue introduced by receiver change * Drop unused internal test implementation * FGAC hooks for provisioning routes * Polish, swap names * Asserting on number of exposed routes * Don't bubble up updated object * Integrate with new concurrency token feature in store * Back out duplicated changes * Remove redundant tests * Regenerate and create unit tests for API layer * Integration tests for auth * Address linter errors * Put route behind toggle * Use alternative store API and fix feature toggle in tests * Fixes, polish * Fix whitespace * Re-kick drone * Rename services to provisioning	2022-04-05 16:48:51 -05:00


105 Commits