grafana

mirror of https://github.com/grafana/grafana.git synced 2025-02-14 17:43:35 -06:00

Author	SHA1	Message	Date
Marcus Efraimsson	e4c1a7a141	Tracing: Standardize on otel tracing (#75528 )	2023-10-03 14:54:20 +02:00
Steve Simpson	894f420014	Alerting: Pass loggers into SchedulerCfg and ManagerCfg. (#75158 )	2023-09-20 15:07:02 +02:00
Will Browne	e855efb13d	Plugins: Move store and plugin dto to pluginsintegration (#74655 ) move store and plugin dto	2023-09-11 13:59:24 +02:00
Ryan McKinley	025b2f3011	Chore: use any rather than interface{} (#74066 )	2023-08-30 18:46:47 +03:00
Yuri Tseretyan	938e26b59f	Alerting: Add new metrics and tracings to state manager and scheduler (#71398 ) * add metrics and tracing to state manager * propagate tracer to state manager * add scheduler metrics * fix backtesting * add test for state metrics * remove StateUpdateCount * update docs * metrics can be null * add tracer to new tests	2023-08-16 09:04:18 +02:00
Yuri Tseretyan	c7598cc6fb	Alerting: Add ability to control scheduler tick interval via config (#71980 ) * add ability to control scheduler interval via config * add feature flag `configurableSchedulerTick`	2023-07-26 12:44:12 -04:00
Will Browne	a8577c21ba	Plugins: Migrate PluginStore mock to pre-existing fakes package (#71664 ) * migrate to existing fakes package * fix imports	2023-07-17 10:21:44 +00:00
Kyle Brandt	f6a28cadbc	Alerting: (Chore/Instrumentation) Add traceID to logs with contextual logger (#71289 ) Alerting: (Chore) Add traceID to logs with contextual logger	2023-07-11 10:59:52 +02:00
Yuri Tseretyan	ada325de2a	Alerting: Use unsafe.Slice for hashing a string during rule fingerprint calculation (#71000 )	2023-06-30 14:58:23 -04:00
George Robinson	7edbe72483	Alerting: Support concurrent queries for saving alert instances (#70525 ) This commit adds support for concurrent queries when saving alert instances to the database. This is an experimental feature in response to some customers experiencing delays between rule evaluation and sending alerts to Alertmanager, resulting in flapping. It is disabled by default.	2023-06-23 11:36:07 +01:00
SatVeer Singh	1bfa3a0f1e	Chore: Replace go-multierror with errors package (#66432 ) * code refactor and type assertions added to tests * no-lint rule added for specific line	2023-06-19 12:29:45 +03:00
Matthew Jacobson	ba3994d338	Alerting: Repurpose rule testing endpoint to return potential alerts (#69755 ) * Alerting: Repurpose rule testing endpoint to return potential alerts This feature replaces the existing no-longer in-use grafana ruler testing API endpoint /api/v1/rule/test/grafana. The new endpoint returns a list of potential alerts created by the given alert rule, including built-in + interpolated labels and annotations. The key priority of this endpoint is that it is intended to be as true as possible to what would be generated by the ruler except that the resulting alerts are not filtered to only Resolved / Firing and ready to be sent. This means that the endpoint will, among other things: - Attach static annotations and labels from the rule configuration to the alert instances. - Attach dynamic annotations from the datasource to the alert instances. - Attach built-in labels and annotations created by the Grafana Ruler (such as alertname and grafana_folder) to the alert instances. - Interpolate templated annotations / labels and accept allowed template functions.	2023-06-08 18:59:54 -04:00
Yuri Tseretyan	9eb10bee1f	Alerting: Scheduler use rule fingerprint instead of version (#66531 ) * implement calculation of fingerprint for ruleWithFolder * update scheduler to use fingerprint instead of rule's version	2023-04-28 10:42:16 -04:00
Santiago	b0881daf23	Alerting: Use URLs in image annotations (#66804 ) * use tokens or urls in image annotations * improve tests, fix some comments * fix empty tokens * code review changes, check for url before checking for token (support old token formats)	2023-04-26 13:06:18 -03:00
Kyle Brandt	840fb32ad8	SSE: (Instrumentation) Add Tracing (#66700 ) spans are prefixed `SSE.`	2023-04-18 08:04:51 -04:00
Kyle Brandt	2f13c851e4	SSE: (Chore/Instrumentation) Add ds_queries_total metric and move met… (#66695 ) * SSE: (Chore/Instrumentation) Add ds_queries_total metric and move metrics to service	2023-04-17 16:12:44 -07:00
Kyle Brandt	e78be44e1a	SSE: Dataplane Compliance (#65927 ) Takes a specific code path for data that identifies itself as dataplane instead of "guessing" what the data is. The data must identify itself by being in the dataplane by having both the following frame metadata properties: - TypeVersion property that is greater than 0.0 - 'Type' property The flag is disableSSEDataplane and disables this functionality and uses the old code for all queries regardless. See https://github.com/grafana/grafana-plugin-sdk-go/blob/main/data/contract_docs/contract.md for dataplane details.	2023-04-12 12:24:34 -04:00
gotjosh	1c3ce0735f	Alerting: Tiny refactor on the eval and schedule packages (#66130 ) * Alerting: Tiny refactor on the eval and schedule packages two very small things: - We had a constructor on something called a `Context` which is not a `context.Context` so let's just name that constructor `NewContext` - The user that we use to run query evaluations is the same (with some variation) abstract it to a function so that it can be re-used when necessary. * Update pkg/services/ngalert/schedule/schedule.go Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com> * Update pkg/services/ngalert/schedule/schedule.go Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com> --------- Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>	2023-04-06 16:02:28 +01:00
Alexander Weaver	9bcf8819d3	Alerting: Handful of small adjustments to log levels and parameters (#64572 ) Calculate duration earlier in scheduler	2023-03-17 12:15:49 +00:00
Yuri Tseretyan	85a954cd81	Alerting: Update scheduler to get updates only from database (#64635 ) * stop using the scheduler's Update and Delete methods all communication must be via the database * update scheduler's registry to calculate diff before re-setting the cache * update fetcher to return the diff generated by registry * update processTick to update rule eval routine if the rule was updated and it is not going to be evaluated at this tick. * remove references to the scheduler from api package * remove unused methods in the scheduler	2023-03-14 18:02:51 -04:00
Alex Moreno	f60dc4441f	Alerting: Add status label to GroupRules metric (#63454 ) * Add status label to GroupRules metric * Add state (active and paused) label to GroupRules * Add active/paused metrics tests	2023-02-23 12:38:27 +01:00
Steve Simpson	4d1a2c3370	Alerting: Move `rule_groups_rules` metric from State to Scheduler. (#63144 ) The `rule_groups_rules` metric is currently defined and computed by `State`. It makes more sense for this metric to be computed off of the configured rule set, not based on the rule evaluation state. There could be an edge condition where a rule does not have a state yet, and so is uncounted. Additionally, we would like this metric (and others), to have a `rule_group` label, and this is much easier to achieve if the metric is produced from the `Scheduler` package.	2023-02-09 17:05:19 +01:00
Yuri Tseretyan	f066e8cdcd	Alerting: Update to alerting 20230203015918-0e4e2675d7aa (after refactoring) (#62823 ) * add alerting prefix to some packages from alerting that have similar names in prometheus alertmanager	2023-02-03 11:36:49 -05:00
ismail simsek	91221bc436	Expressions: Fixes the issue showing expressions editor (#62510 ) * Use suggested value for uid * update the snapshot * use __expr__ * replace all -100 with __expr__ * update snapshot * more changes * revert redundant change * Use expr.DatasourceUID where it's possible * generate files	2023-01-31 18:50:10 +01:00
Alex Moreno	7a465f42a6	Alerting: Allow pausing alerts from provisioning (#62263 ) * Allow pausing alerts from provisioning * Update swagger * Add IsPaused to provision export endpoints * Add pause field in sample.yml * Add exception for reset state in first loop iteration of scheduler if rule is paused * Update provision definition and swagger docs * Fix provisioning export tests * Suggestion: Simplify if condition * Add more context to a comment	2023-01-30 16:29:05 +01:00
Serge Zaitsev	d6d4097567	Chore: Fix goimports grouping in alerting (#62424 ) * fix goimports * fix goimports order	2023-01-30 09:55:35 +01:00
Yuri Tseretyan	05bf241952	Alerting: Update state manager to return StateTransitions when Delete or Reset (#62264 ) * update Delete and Reset methods to return state transitions this will be used by notifier code to decide whether alert needs to be sent or not. * update scheduler to provide reason to delete states and use transitions * update FromAlertsStateToStoppedAlert to accept StateTransition and filter by old state * fixup * fix tests	2023-01-27 09:46:21 +01:00
Alex Moreno	531b439cf1	Alerting: Add alert pausing feature (#60734 ) * Add field in alert_rule model, add state to alert_instance model, and state to eval * Remove paused state from eval package * Skip paused alert rules in scheduler * Add migration to add is_paused field to alert_rule table * Convert to postable alerts only if not normal, pending, or paused * Handle paused eval results in state manager * Add Paused state to eval package * Add paused alerts logic in scheduler * Skip alert on scheduler * Remove paused status from eval package * Apply suggestions from code review Co-authored-by: George Robinson <george.robinson@grafana.com> * Remove state * Rethink schedule and manager for paused alerts * Change return to continue * Remove unused var * Rethink alert pausing * Paused alerts storing annotations * Only add one state transition * Revert boolean method renaming refactor * Revert take image refactor * Make registry errors public * Revert method extraction for getting a folder title * Revert variable renaming refactor * Undo unnecessary changes * Revert changes in test * Remove IsPause check in PatchPartiLAlertRule function * Use SetNormal to set state * Fix text by returning to old behaviour on alert rule deletion * Add test in schedule_unit_test.go to test ticks with paused alerts * Add comment to clarify usage of context.Background() * Add comment to clarify resetStateByRuleUID method usage * Move rule get to a more limited scope * Update pkg/services/ngalert/schedule/schedule.go Co-authored-by: George Robinson <george.robinson@grafana.com> * run gofmt on pkg/services/ngalert/schedule/schedule.go * Remove defer cancel for context * Update pkg/services/ngalert/models/instance_test.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * Update pkg/services/ngalert/models/testing.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * Update pkg/services/ngalert/schedule/schedule_unit_test.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * Update pkg/services/ngalert/schedule/schedule_unit_test.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * Update pkg/services/ngalert/models/instance_test.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * skip scheduler rule state clean up on paused alert rule * Update pkg/services/ngalert/schedule/schedule.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * Fix mock in test * Add (hopefully) final suggestions * Use error channel from recordAnnotationsSync to cancel context * Run make gen-cue * Place pause alert check in channel update after version check * Reduce branching in update channel select * Add if for error and move code inside if in state manager ResetStateByRuleUID * Add reason to logs * Update pkg/services/ngalert/schedule/schedule.go Co-authored-by: George Robinson <george.robinson@grafana.com> * Do not delete alert rule routine, just exit on eval if is paused * Reduce branching and create-close a channel to avoid deadlocks * Separate state deletion and state reset (includes history saving) * Add current pause state in rule route in scheduler * Split clearState and bring errCh closer to RecordStatesAsync call * Change rule to ruleMeta in RecordStatesAsync * copy state to be able to modify it * Add timeout to context creation * Shorten the timeout * Use resetState if rule is paused and deleteState if rule is not paused * Remove Empty state reason * Save every rule change in historian * Add tests for DeleteStateByRuleUID and ResetStateByRuleUID * Remove useless line * Remove outdated comment Co-authored-by: George Robinson <george.robinson@grafana.com> Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>	2023-01-26 18:29:10 +01:00
Santiago	b5fa9e3501	Chore: Fix "manger" typo (#61649 ) fix mangers -> managers	2023-01-17 23:13:27 +00:00
Yuri Tseretyan	86b5fbbf60	Alerting: Introduce state manager config structure (#61249 )	2023-01-10 16:26:15 -05:00
George Robinson	2a291afbae	Alerting: Use consts from alerting package (#61241 )	2023-01-10 19:59:13 +00:00
Yuri Tseretyan	da18c89e91	Alerting: Scheduler to call DeleteAlertRule once when it stops deleted rules (#61189 ) scheduler to call DeleteAlertRule once when it stops deleted rules	2023-01-09 14:39:32 -05:00
Yuri Tseretyan	48f1db63ff	Alerting: Add support for tracing to alerting scheduler (#61057 )	2023-01-06 21:21:43 -05:00
Yuri Tseretyan	c5ee4e4ae1	Alerting: Improve rule validation to check if rule uses backend datasources (#58986 ) * validate if rule uses backend datasources * add backend datasource to test * fix tests * another forgotten import * remove unused var	2022-12-08 10:44:02 +01:00
Yuri Tseretyan	abb49d96b5	Alerting: update state manager to return StateTransition instead of State (#58867 ) * improve test for stale states * update state manager return StateTransition * update scheduler to accept state transitions	2022-12-06 13:07:39 -05:00
Alexander Weaver	9977c7ea43	Alerting: Simplify scheduler configuration and remove dependency on Grafana-wide settings (#59735 ) * Make scheduler not depend directly on grafana-wide settings * Re-add missing interval	2022-12-02 16:02:07 -06:00
Alexander Weaver	2bfdda5b68	Alerting: Break dependency between state and image packages (#58381 ) * Refactor state and manager to not depend directly on image interface * Move generic errors to models package * Move NotAvailableImageService to state as its only references are in state tests * Move NoopImageService to state package * Move mock to state package * Fix linter error * Fix comment styling * Fix a couple added references introduced by rebase * Empty commit to kick build	2022-11-09 15:06:49 -06:00
Yuri Tseretyan	bad4f28d0d	Alerting: update test TestAlertingTicker to not rely on clock (#58544 ) * extract method processTick * make processTick return scheduled rules * move state manager tests to state manager * update test * move all tests into one file * remove unused fields	2022-11-09 15:08:57 -05:00
Yuri Tseretyan	3621cf5a12	Alerting: Update handling of stale state (#58276 ) * delete all stale states in one lock * do not use touched states to detect stale rely only on LastEvaluationTime maintained correctly * fix tests to use correct eval time * delete unused method	2022-11-07 11:03:53 -05:00
Neel	db1fd10ff1	Alerting: Append org ID to alert notification URLs (#57123 )	2022-11-07 16:03:25 +00:00
Yuri Tseretyan	978f1119d7	Alerting: Run state manager as regular sub-service (#58246 )	2022-11-04 17:06:47 -04:00
Ryan McKinley	e6a9fa1cf9	ServiceAccounts: enable service accounts after IsRealUser change (#58263 ) * support service accounts * add: IsServiceAccount to scheduleUser in scheduler Co-authored-by: eleijonmarck <eric.leijonmarck@gmail.com>	2022-11-04 15:53:35 -04:00
Yuri Tseretyan	dce8879145	Alerting: Update state manager to accept rule store as Warm method argument (#58244 )	2022-11-04 14:23:08 -04:00
Eric Leijonmarck	72d0c6b428	Auth: add IsServiceAccount to IsRealUser (#58015 ) * add: IsServiceAccount to SignedInUser and IsRealUser * fix: linting error * refactor: add function IsServiceAccountUser() By adding the function IsServiceAccountUser() we use it to identify for ServiceAccounts in the HasUniqueID() since caching is built up on having a uniqueID, see comment: https://github.com/grafana/grafana/pull/58015#discussion_r1011361880	2022-11-04 12:39:54 +00:00
Yuriy Tseretyan	e3a4bde622	Alerting: Condition evaluator with cached pipeline (#57479 ) * create rule evaluator * load header from the context * init one factory * update scheduler	2022-11-02 10:13:39 -04:00
Yuriy Tseretyan	0a4121cef8	Alerting: Contextual log provider for rule key (#57476 ) * create contextual log context provider * use contextual provider in scheduler * init logger in the package * use context for log context * use context in state manager	2022-10-26 19:16:02 -04:00
Alexander Weaver	de46c1b002	Alerting: Improve logs in state manager and historian (#57374 ) * Touch up log statements, fix casing, add and normalize contexts * Dedicated logger for dashboard resolver * Avoid injecting logger to historian * More minor log touch-ups * Dedicated logger for state manager * Use rule context in annotation creator * Rename base logger and avoid redundant contextual loggers	2022-10-21 16:16:51 -05:00
Yuriy Tseretyan	f3c219a980	Alerting: update format of logs in scheduler (#57302 ) * Change the severity level of the log messages	2022-10-20 13:43:48 -04:00
Alexander Weaver	3ddb28bad9	Find-and-replace 'err' logs to 'error' to match log search conventions (#57309 )	2022-10-19 17:36:54 -04:00
Yuriy Tseretyan	3e6bc28de5	Alerting: Change severity level of fetcher log messages (#57299 )	2022-10-19 16:00:47 -04:00


186 Commits