* OK notifications using previous evaluation data
* copy rule.EvalMatches to avoid changes to the underlying array
* test cases added/modified
* delete trailing newline
* fix double newline in go import
* add change to the changelog
* specify that this only affects legacy alerting (changelog)
* use current eval data instead of prev eval data
* create evalMatch just once
* code comments, renamings, getTemplateMatches() function
* changelog and docs updated
* add check for access to rule's data source in GET APIs
* use more general method GetAlertRules instead of GetNamespaceAlertRules.
* remove unused GetNamespaceAlertRules.
Tests:
* create a method to generate permissions for rules
* extract method to create RuleSrv
* add tests for RouteGetNamespaceRulesConfig
* move validation at the beginning of method
* remove usage of GetOrgRuleGroups because it is not necessary. All information is already available in memory.
* remove unused method
This test of silence cleanup was flaky because of its use of real wall
time. In CI environments with slow execution, delays could cause the
test to fail. This change mitigates the problem by increasing the end time of
silences in the test.
After Prometheus merges this PR: https://github.com/prometheus/alertmanager/pull/2867
we can make the test fully deterministic by using a fake clock.
Fixes#47470
Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
* Alerting: Introduce an internal changelog
Please note that this is not intented to replace Grafana's "add to changelog" label. It is _mostly_ for internal consumption of the Alerting team that owns this part of Grafana.
* Fix markdown formatting
* Fix changelog entry
* Base-line API for provisioning notification policies
* Wire API up, some simple tests
* Return provenance status through API
* Fix missing call
* Transactions
* Clarity in package dependencies
* Unify receivers in definitions
* Fix issue introduced by receiver change
* Drop unused internal test implementation
* FGAC hooks for provisioning routes
* Polish, swap names
* Asserting on number of exposed routes
* Don't bubble up updated object
* Integrate with new concurrency token feature in store
* Back out duplicated changes
* Remove redundant tests
* Regenerate and create unit tests for API layer
* Integration tests for auth
* Address linter errors
* Put route behind toggle
* Use alternative store API and fix feature toggle in tests
* Fixes, polish
* Fix whitespace
* Re-kick drone
* Rename services to provisioning
* Alerting: Accurately set value for prom-compatible APIs
Sets the value fields for the prometheus compatible API based on a combination of condition `refID` and the values extracted from the different frames.
* Fix an extra test
* Ensure a consitent ordering
* Address review comments
* address review comments
* Add basic UI for custom ruler URL
* Add build info fetching for alerting data sources
* Add keeping data sources build info in the store
* Use data source build info to construct data source urls
* Remove unused code
* Add custom ruler support in prometheus api calls
* Migrate actions
* Use thunk condition to prevent multiple data source buildinfo fetches
* Unify prom and ruler rules loading
* Upgrade RuleEditor tests
* Upgrade RuleList tests
* Upgrade PanelAlertTab tests
* Upgrade actions tests
* Build info refactoring
* Get rid of lotex ruler support action
* Add prom ruler availability checking when the buildinfo is not available
* Add rulerUrlBuilder tests
* Improve prometheus data source validation, small build info refactoring
* Change prefix based on Prometheus subtype
* Use the correct path
* Revert config routing
* Add deprecation notice for /api/prom prefix
* Add tests to the datasource subtype
* Remove custom ruler support
* Remove deprecation notice
* Prevent fetching ruler rules when ruler api is not available
* Add build info tests
* Unify naming of ruler methods
* Fix test
* Change buildinfo data source validation
* Use strings for subtype params and unveil mimir
* organise imports
* frontend changes and wordsmithing
* fix test suite
* add a nicer verbose message for prometheus datasources
* detect Mimir datasource
* fix test
* fix buildinfo test for Mimir
* shrink vectors
* add some code documentation
* DRY prepareRulesFilterQueryParams
* clarify that Prometheus does not support managing rules
* Improve buildinfo error handling
Co-authored-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
* make eval.Evaluator an interface
* inject Evaluator to TestingApiSrv
* move conditionEval to RouteTestGrafanaRuleConfig because it is the only place where it is used
* update rule test api to check data source permissions
* Use alert:create action for folder search with edit permissions. This matches the action that is used to query dashboards (the update will be addressed later)
* Update rule store to use FindDashboards instead of folder service to list folders the user has access to view alerts. Folder service does not support query type and additional filters.
* Do not check whether the user can save to folder if FGAC is enabled because it is checked on API level.
* require legacy Editor for post, put, delete endpoints
* require user to be signed in on group level because handler that checks that user has role Editor does not check it is signed in
* verify that the user has access to all data sources used by the rule that needs to be deleted from the group
* if a user is not authorized to access the rule, the rule is removed from the list to delete
* rename GetRuleGroupAlertRules to GetAlertRules
* make rule group optional in GetAlertRulesQuery
* simplify FakeStore. the current structure did not support optional rule group
update method getEvaluatorForAlertRule to accept permissions evaluator and exit on the first negative result, which is more effective than returning an evaluator that in fact is a bunch of slices.
Expired silences older than the retention period were not being cleaned up. The root problem was that notifier.Alertmanager overrides the Prometheus alert manager's silence maintenance function and was not calling Silences.GC() in the overriden function.
* Alerting: add collision safe update function for alertmanager configurations
* fix typo
* use bootstrap func for tests
* move hash calculation to store
* remove icons lol
* remove removed field
The directory created by `T.TempDir` is automatically removed when the
test and all its subtests complete.
Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* Alerting: Remove internal labels from prometheus compatible API responses
* Appease the linter
* Fix integration tests
* Fix API documentation & linter
* move removal of internal labels to the models
* move alerting actions to accesscontrol to avoid cycledeps
* define new actions and fixed roles for alerting
* add folder permission to alert reader role
* Remove InTransaction from RuleStore and make it its own interface
* Ensure that ctx-based is clear from name
* Resolve merge conflicts
* Refactor tests to work in terms of the introduced abstraction rather than concrete dbstore
* Move call to create permissions into folder service
* Inject cfg, feature toggles and permissions services into dashboard
service
* Move logic to set default permissions on create dashboard from api to
service
* Move call to set default permissions on import dashboard to dashboard
service
* Set permissions for provisioned dashboard and folders in service
* add actions, roles and route mapping for rule permission
* add instance\notification actions
* do not declare alerting roles if no feature flag is set (temporary)
* do not include update if no diff
* refactor calculate changes to include diff (and log)
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Add missing OK option to models
* add ok to legacy legacy UI does not support it but it is possible to do so via provisioning.
* use enums in migration so linter would catch missing cases
* Resolve merge conflicts
* Remove cruft from local exploration
* Move integration tests to intercept using new abstraction layer instead of channel
* Fix linter error after rebase
* add custom diff reporter DiffReporter that reports only paths that have a difference
* create Diff method for AlertRule that returns DiffReport, which is an alias for []Diff
Tests:
* create copy method for AlertRule in testing
* create GenerateAlertQuery method in testing
* Create DashAlertService service
* Remove no used dashboard service from plugin's manager that generates dependency cycle in Enterprise
* Remove bus for dashboard permissions
* Remove bus from dashboard extractor service
* Add missing argument
* Fix wire
* Fix lint
* More goimports
* Use datasource service instead sql calls
* Fix integration test
This commit changes staleResultsHandler to create an annotation if the current state is Alerting and the result is being removed from the state cache as it has not been updated since 2x the evaluation interval.
* add field for custom slack endpoint
* add test for using custom endpoint
* Update pkg/services/ngalert/notifier/channels/slack.go
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
* specify description for endpoint
* remove brittle string constants
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
* Update API controller
- add validation of rules API model
- add function to calculate changes between the submitted alerts and existing alerts
- update RoutePostNameRulesConfig to validate input models, calculate changes and apply in a transaction
* Update DBStore
- delete unused storage method. All the logic is moved upstream.
- upsert to not modify fields of new by values from the existing alert
- if rule has UID do not try to pull it from db. (it is done upstream)
* Add rule generator
* Add providers to folder and dashboard services
* Refactor folder and dashboard services
* Move store implementation to its own file due wire cannot allow us to cast to SQLStore
* Add store in some places and more missing dependencies
* Bad merge fix
* Remove old functions from tests and few fixes
* Fix provisioning
* Remove store from http server and some test fixes
* Test fixes
* Fix dashboard and folder tests
* Fix library tests
* Fix provisioning tests
* Fix plugins manager tests
* Fix alert and org users tests
* Refactor service package and more test fixes
* Fix dashboard_test tets
* Fix api tests
* Some lint fixes
* Fix lint
* More lint :/
* Move dashboard integration tests to dashboards service and fix dependencies
* Lint + tests
* More integration tests fixes
* Lint
* Lint again
* Fix tests again and again anda again
* Update searchstore_test
* Fix goimports
* More go imports
* More imports fixes
* Fix lint
* Move UnprovisionDashboard function into dashboard service and remove bus
* Use search service instead of bus
* Fix test
* Fix go imports
* Use nil in tests
* API: Using go-swagger for extracting OpenAPI specification from source code
* Merge Grafana Alerting spec
* Include enterprise endpoints (if enabled)
* Serve SwaggerUI under feature flag
* Fix building dev docker images
* Configure swaggerUI
* Add missing json tags
Co-authored-by: Ying WANG <ying.wang@grafana.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
* Simplistic store API for provenance lookups on arbitrary types
* Add a few notes in comments
* Improved type safety for provisioned objects
* Clean-up TODOs for future PRs
* Clean up provisioning model
* Clean up tests
* Restrict allowable types in interface
* Fix linter error
* Move AlertRule domain methods to same file as AlertRule definition
* Update pkg/services/ngalert/models/provisioning.go
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Complete interface rename
* Pass context through store API
* More idiomatic method names
* Better error description
* Improve code-docs
* Use ORM language instead of raw sql
* Add support for records in different orgs
* ResourceTypeID -> ResourceType since it's not an ID
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Fix evaluation of alert rules for datasources with custom headers
* Fix unit tests
* Fix integration tests
* Evaluator fields should be package private
* (WIP) send alerts to external, internal, or both alertmanagers
* Modify admin configuration endpoint, update swagger docs
* Integration test for admin config updated
* Code review changes
* Fix alertmanagers choice not changing bug, add unit test
* Add AlertmanagersChoice as enum in swagger, code review changes
* Fix API and tests errors
* Change enum from int to string, use 'SendAlertsTo' instead of 'AlertmanagerChoice' where necessary
* Fix tests to reflect last changes
* Keep senders running when alerts are handled just internally
* Check if any external AM has been discovered before sending alerts, update tests
* remove duplicate data from logs
* update comment
* represent alertmanagers choice as an int instead of a string
* default alertmanagers choice to all alertmanagers, test cases
* update definitions and generate spec
* pass notification service down to the notifiers
* add ns to all notifiers
* remove bus from ngalert notifiers
* use smaller interfaces for notificationservice
* attempt to fix the tests
* remove unused struct field
* simplify notification service mock
* trying to resolve issues in the tests
* make linter happy
* make linter even happier
* linter, you are annoying
* Update API to call the scheduler to remove\update an alert rule. When a rule is updated by a user, the scheduler will remove the currently firing alert instances and clean up the state cache.
* Update evaluation loop in the scheduler to support one more channel that is used to communicate updates to it.
* Improved rule deletion from the internal registry.
* Move alert rule version from the internal registry (structure alertRuleInfo) closer rule evaluation loop (to evaluation task structure), which will make the registry values immutable.
* Extract notification code to a separate function to reuse in update flow.
* Allow customizable googlechat message via optional setting
* Add optional message field in googlechat contact point configurator
* Fix strange error message on send if template fails to fully evaluate
* Elevate template evaluation failure logs to Warn level
* Extract default.title template embed from all channels to shared constant
* Create API test for overwriting invalid alertmanager config
* Avoid requiring alertmanager readiness for config changes
* AlertmanagerSrv depends on functionality rather than concrete types
* Add test for non-ready alertmanagers
* Additional cleanup and polish
* Back out previous integration test changes
* Refactor of tests incorrectly caused a test to become redundant
* Use pre-existing fake secret service
* Drop unused interface
* Test against concrete MultiOrgAlertmanager re-using fake infra from other tests
* Fix linter error
* Empty commit to rerun checks