This commit adds basic support for time_intervals, as mute_time_intervals
is deprecated in Alertmanager and scheduled to be removed before 1.0.
It does not add support for time_intervals in API or file provisioning,
nor does it support exporting time intervals. This will be added in
later commits to keep the changes as simple as possible.
* fix: sign in using auth_proxy with role a -> b -> a would end up with role b
* Update pkg/services/authn/clients/proxy.go
Co-authored-by: Karl Persson <kalle.persson92@gmail.com>
* Update pkg/services/authn/clients/proxy.go
Co-authored-by: Karl Persson <kalle.persson92@gmail.com>
* use split scopes instead of substr in search v1
* tests, of course
* yet, some test helpers dont use split scopes
* another test helper to fix
* add permission.identifier to group by
* check if attribute is uid
* fix tests
* use SplitScope()
* fix more tests
Previously receivers were only validated before saving the alertmanager
configuration. This is a suboptimal experience for those upgrading with preview
as the failed channel upgrade will return an API error instead of being
summarized in the table.
Adds a feature flag (alertingUpgradeDryrunOnStart) that will dry-run the legacy
alert upgrade on startup. It is enabled by default.
When on legacy alerting, this feature flag will log the results of the legacy
alerting upgrade on startup and draw attention to anything in the current legacy
alerting configuration that will cause issues when the upgrade is eventually
performed. It acts as a log warning for those where action is required before
upgrading to Grafana v11 where legacy alerting will be removed.
If the db already has an entry in the kvstore for the silences of an
alertmanager before the migration has taken place, then it's possible that the
active alertmanager will overwrite the silence file created by the migration
before it has a chance to load it into memory.
This should not happen normally but is possible in edge-cases.
This change opts to bypass the unnecessary step of writing the silences to disk
during the migration and instead write them directly to the kvstore. This avoids
the race condition entirely and is more correct as we treat the database as the
source of truth for AM state.
* add password service interface
* add password service implementation
* add tests for password service
* add password service wiring
* add feature toggle
* Rework from service interface to static function
* Replace previous password validations
* Add codeowners to password service
* add error logs
* update config files
---------
Co-authored-by: Karl Persson <kalle.persson@grafana.com>
Alerting: Remove start page of upgrade preview
Alerting upgrade page will now always show the summary table even before
upgrading any alerts or notification channels. There a few reasons for this:
- The information on the start page is redundant as it's now contained in the
documentation.
- Previously, if some unexpected issue prevented performing a full upgrade, a
user would have limited to no means to using the preview tool to help fix the
problem. This is because you could not see the summary table until the full
upgrade was performed at least once. Now, you can upgrade individual alerts and
notification channels from the beginning.
* Add notification settings to storage\domain and API models. Settings are a slice to workaround XORM mapping
* Support validation of notification settings when rules are updated
* Implement route generator for Alertmanager configuration. That fetches all notification settings.
* Update multi-tenant Alertmanager to run the generator before applying the configuration.
* Add notification settings labels to state calculation
* update the Multi-tenant Alertmanager to provide validation for notification settings
* update GET API so only admins can see auto-gen
* Bump scenes
* Make GroupByVariableModel a VariableWithOptions
* Serialise/deserialise group by variable
* WIP: Group by variable editor
* WIP tests
* Group by variable tests
* add feature toggle and gate variable creation behind it
* Fix types
* Do not resolve DS variable
* Do not show the message if no DS is selected
* Now groupby has options and current
* Update public/app/features/dashboard-scene/settings/variables/components/GroupByVariableForm.test.tsx
Co-authored-by: Ivan Ortega Alba <ivanortegaalba@gmail.com>
* don't allow creating groupby if toggle is off + update tests
* add unit tests
* remove groupByKeys
---------
Co-authored-by: Ashley Harrison <ashley.harrison@grafana.com>
Co-authored-by: Ivan Ortega <ivanortegaalba@gmail.com>
* Alerting: fix race condition in (*ngalert/sender.ExternalAlertmanager).Run
* Chore: Fix data races when accessing members of *ngalert/state.FakeInstanceStore
* Chore: Fix data races in tests in ngalert/schedule and enable some parallel tests
* Chore: fix linters
* Chore: add TODO comment to remove loopvar once we move to Go 1.22
* WIP add fuzzysearch to plugins catalog
* Add keywords to the plugins listing output
* add fuzzy search to plugin catalog, add keywords to plugins at frontend side
* refactor fuzzysearch function after review
* review changes
* change the version of uFuzzy library
* change reduce result object in getPluginDetailsForFuzzySearch
* fix yarn lock error
* fix helpers tests
* fix frontend searching test
* fix frontend linting issues
* fix tests
---------
Co-authored-by: Esteban Beltran <esteban@academo.me>
Co-authored-by: Giuseppe Guerra <giuseppe@guerra.in>
* Add config for limit of rules per rule group
* Warn when editing big groups through normal API
* Warn on prov api writes for groups
* Wire up comp root, tests
* Also add warning to state manager warm
* Drop unnecessary conversion
* Add user uid migration to run on every startup to protect against empty values in a upgrade downgrade scenario
* Add team uid migration to run on every startup to protect against empty values in a upgrade downgrade scenario
* Run team uid migration
* Feature Toggle Management: allow editing PublicPreview toggles
* lint
* fix a bunch of tests
* tests are passing
* add permissions unit tests back
* fix display
* close dialog after submit
* use reload method after submit
* make local development easier
* always show editing alert in the UI
* fix readme
---------
Co-authored-by: Michael Mandrus <michael.mandrus@grafana.com>
* merge JSON search logic
* document public methods
* improve test coverage
* use separate JWT setting struct
* correct use of cfg.JWTAuth
* add group tests
* fix DynMap typing
* add settings to default ini
* add groups option to devenv path
* fix test
* lint
* revert jwt-proxy change
* remove redundant check
* fix parallel test
* streamline initialization of test databases, support on-disk sqlite test db
* clean up test databases
* introduce testsuite helper
* use testsuite everywhere we use a test db
* update documentation
* improve error handling
* disable entity integration test until we can figure out locking error
* clean up intervalv2 functions
* use roundInterval from grafana-plugin-sdk-go
* use from grafana-plugin-sdk-go
* have intervalv2 in publicdashboards and remove tsdb/intervalv2
* legacydata cleanup
* remove unused variables
* Update pkg/tsdb/legacydata/interval/interval.go
Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>
---------
Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>
* don't prepend app sub url to paths
* simplify logo path
* fix(plugins): dynamically prepend appSubUrl for System module resolving to work
* fix(sandbox): support dynamic appSuburl prepend when loading plugin module.js
* fix tests
* update test name
* fix tests
* update fe + add some tests
* refactor(plugins): move wrangleurl to utils, rename to resolveModulePath, update usage
* chore: fix a typo
* test(plugins): add missing name to utils test
* reset test flag
---------
Co-authored-by: Jack Westbrook <jack.westbrook@gmail.com>
* add support for sortBy field selector
* use label selectors instead of field selectors
* set entity_labels on create & update
* make entity server integration tests work
* test fixes
* be more consistent with handling of empty body, meta or status
* workaround for database is locked errors during migration
* fix double import of sqlite3
* rename functions and tidy up
* refactor update
* disable integration tests until we can fix the database locking issue
* Stub group to subframe transformation
* Get proper field grouping
* Mostly working but fields not displaying 😭
* Fix display processing in nested tables
* Modularize and start merging groupBy and groupToSubrame
* Get this working
* Prettier
* Typing things
* More types
* Add option for showing subframe table headers
* Prettier
* Get tests going
* Update tests
* Fix naming and add icons
* Betterer fix
* Prettier
* Fix CSS object syntax
* Prettier
* Stub alert for calcs with grouping, start renaming
* Add logic to show warning message for calculations
* Add calc warning
* Renaming and feature flag
* Rename images
* Prettier
* Fix tests
* Update feature toggle
* Fix error showing extra blank row
* minor code cleanup
---------
Co-authored-by: nmarrs <nathanielmarrs@gmail.com>
* update GetUserVisibleNamespaces to use FolderSeriver
* update GetNamespaceByUID to use FolderService.GetFolders
* update GetAlertRulesForScheduling to use FolderService.GetFolders
* Update API and GetAlertRulesForScheduling to use the folder's full path
* get full path of folder in RouteTestGrafanaRuleConfig
* fix escaping of titles for MySQL
* set FullPath and FullPathUIDs if feature flag is off and query requests
Co-authored-by: Sofia Papagiannaki <1632407+papagian@users.noreply.github.com>
* remove toggle
* remove code not behind toggle
* remove old MegaMenu
* rename DockedMegaMenu -> MegaMenu and clean up go code
* fix backend test
* run yarn i18n:extract
* fix some unit tests
* fix remaining unit tests
* fix remaining e2e/unit tests
* remove advancedDataSourcePicker feature toggle from DataSourcePickerWithPrompt
* remove advancedDataSourcePicker toggle from DataSourcePicker and adjust tests that relied on old picker
* adjust failing tests in QueryVariableEditorForm
* move DataSourceDropdown to DataSourcePicker file
* covert style declaration syntax to object style in DataSourcePicker
* remove advancedDataSourcePicker feature flag from registry
* remove .only from test
* adjust QueryVariableEditor test to avoid console.error
This pull request updates our fork of Alertmanager to commit 65bdab0, which is based on commit 5658f8c in Prometheus Alertmanager.
It applies the changes from grafana/alerting#155 which removes the overrides for validation of alerts, labels and silences that we had put in place to allow alerts and silences to work for non-Prometheus datasources. However, as this is now supported in Alertmanager with the UTF-8 work, we can use the new upstream functions and remove these overrides.
The compat package is a package in Alertmanager that takes care of backwards compatibility when parsing matchers, validating alerts, labels and silences. It has three modes: classic mode, UTF-8 strict mode, fallback mode. These modes are controlled via compat.InitFromFlags. Grafana initializes the compat package without any feature flags, which is the equivalent of fallback mode. Classic and UTF-8 strict mode are used in Mimir.
While Grafana Managed Alerts have no need for fallback mode, Grafana can still be used as an interface to manage the configurations of Mimir Alertmanagers and view configurations of Prometheus Alertmanager, and those installations might not have migrated or being running on older versions. Such installations behave as if in classic mode, and Grafana must be able to parse their configurations to interact with them for some period of time. As such, Grafana uses fallback mode until we are ready to drop support for outdated installations of Mimir and the Prometheus Alertmanager.
* Chore: Fix data race within tests of SSO Setting implementation
* Chore: fix data race within tests to allow parallel testing
* Chore: rollback changes runtime code to test a different approach
* Chore: Fix data race in SSO Setting implementation Upsert method
* Chore: fix typo in comment
* Add single receiver method
* Add receiver permissions
* Add single/multi GET endpoints for receivers
* Remove stable tag from time intervals
See end of PR description here: https://github.com/grafana/grafana/pull/81672
* add handling for legacy and k8s apis to frontend
* use backend srv directly not redux
* add unit test to make sure the correct apis are being called
* require api server flag
* fix feature toggle name
* ensure both pages work correctly
* make consistent with legacy api
* implement webhook update
* fix unit test
* remove old apis and update
---------
Co-authored-by: Ryan McKinley <ryantxu@gmail.com>
* declare new API and models GettableTimeIntervals, PostableTimeIntervals
* add new actions alert.notifications.time-intervals:read and alert.notifications.time-intervals:write.
* update existing alerting roles with the read action. Add to all alerting roles.
* add integration tests
* Create locking config store that mimics existing provisioning store
* Rename existing receivers(_test).go
* Introduce shared receiver group service
* Fix test
* Move query model to models package
* ReceiverGroup -> Receiver
* Remove locking config store
* Move convert methods to compat.go
* Cleanup
This commit prevents saving configurations containing inhibition
rules in Grafana Alertmanager. It does not reject inhibition
rules when using external Alertmanagers, such as Mimir. This meant
the validation had to be put in the MultiOrgAlertmanager instead of
in the validation of PostableUserConfig. We can remove this when
inhibition rules are supported in Grafana Managed Alerts.
* create the feature flag
* bring the concurrency in to the play
* Update feature flag
* Use concurrency number from settings
* update influxdb dependency
* use ConcurrentQueryCount from plugin-sdk-go
* use helper method for concurrent query count
* log the error
* add value guard
* add unit tests
* handle concurrency error
AM config applied via API would use the PostableUserConfig as the AM raw
config and also the hash used to decide when the AM config has changed.
However, when applied via the periodic sync the PostableApiAlertingConfig would
be used instead.
This leads to two issues:
- Inconsistent hash comparisons when modifying the AM causing redundant applies.
- GetStatus assumed the raw config was PostableUserConfig causing the endpoint
to return correctly after a new config is applied via API and then nothing once
the periodic sync runs.
Note: Technically, the upstream GrafanaAlertamanger GetStatus shouldn't be
returning PostableUserConfig or PostableApiAlertingConfig, but instead
GettableStatus. However, this issue required changes elsewhere and is out of
scope.
* add feature toggle
* add a middleware that appens headers for IP range AC
* sort imports
* sign IP range header and only append it if the request is going to allow listed data sources
* sign a random generated string instead of IP, also change the name of the middleware to make it more generic
* remove the DS IP range AC options from the config file; remove unwanted change
* add test
* sanitize the URLs when comparing
* cleanup and fixes
* check if X-Real-Ip is present, and set the internal request header if it is not present
* use split string function from the util package
* Add folder store method for fetching all folder descendants
* Modify GetDescendantCounts() to fetch folder descendants at once
* Reduce DB calls when counting library panels under dashboard
* Reduce DB calls when counting dashboards under folder
* Reduce DB calls during folder delete
* Modify folder registry to count/delete entities under multiple folders
* Reduce DB calls when counting
* Reduce DB calls when deleting
* Add test for create method
Co-authored-by: Tania B <10127682+undef1nd@users.noreply.github.com>
* Change structure of entity package to break import cycle
* Update wire file
---------
Co-authored-by: Tania B <10127682+undef1nd@users.noreply.github.com>
* Data query: Allo logging panel plugin id when executing queries
* Update tracing header middleware
* Test fix
* Add panelPluginType to query analytics
* Cleanup
- Feature Toggle is `promQLScope`.
- Query property is:
"scope": {
"matchers": "{job=~\".*\"}"
}
Misc:
- Also updates drone GO version to address ARM bug https://github.com/golang/go/issues/58425
- Also updates Swagger defs that were causing builds to fail
---------
Co-authored-by: Kevin Minehart <kmineh0151@gmail.com>
* Reload after deletion of the current settings
* Add grafana_ssosettings_setting_reload_failure_total counter
* Returns successfully if data reload failed
* add annotation permissions to dashboard managed role and add migrations for annotation permissions
* fix a bug with conditional access level definitions
* add tests
* Update pkg/services/sqlstore/migrations/accesscontrol/dashboard_permissions.go
Co-authored-by: Gabriel MABILLE <gamab@users.noreply.github.com>
* apply feedback
* add batching, fix tests and a typo
* add one more test
* undo unneeded change
* undo unwanted change
* only check the default basic permissions for non-OSS instances
* account for all wildcards and simplify the check a bit
* error handling and extra conditionals to avoid test failures
* fix a bug with admin permissions not appearing for folders
* fix the OSS check
---------
Co-authored-by: Gabriel MABILLE <gamab@users.noreply.github.com>
* Remove folderID from service tests
* Remove folderID from ngalert migration tests
* Remove tests related to folderIDs
* Roll back change
Before removing FolderID from this test, we need to adjust the code
* Remove FolderID from publicdashboard pkg
* Add back annotations test
* RBAC: Search add user login filter
* Switch to a userService resolving instead
* Remove unused error
* Fallback to use the cache
* account for userID filter
* Account for the error
* snake case
* Add test cases
* Add api tests
* Fix return on error
* Re-order imports
* Folders: Expose function for getting all org folders with specific UIDs
* lint
* Fix test
* fixup
* Apply suggestion from code review
* Remove changes in alerting scheduler
* fixup
* fixup after merge with main
* Add batching
* Use strings.Builder
* Return all org folders if UIDs is empty
* Filter out not accessible folders by the user
* Remove comment
* Fix batching when count is zero
* Do not include dashboard permissions
* Add some tests
* fix test
* Use batch request for folders
* Use batch request to deduplicate folders
* Refactor
* Fix after merging main
* Refactor
---------
Co-authored-by: Sofia Papagiannaki <1632407+papagian@users.noreply.github.com>
* Add API test
* Add move tests
* Fix create folder
* Fix move
* Fix test
* Drop and re-create index so that allows a folder to contain a dashboard and a subfolder with same name
* Get folder by title defaults to root folder and optionally fetches folder by provided parent folder
* Apply suggestions from code review
* Folders: Expose function for getting all org folders with specific UIDs
* Return all org folders if UIDs is empty
* Filter out not accessible folders by the user
* Modify query to optionally returning a string that contains the UIDs of all parent folders separated by slash.
* Alerting: Add action, scope, role_id to permission table
The existing role_id, action, scope index has the wrong ordering to be most
effectively used in dashboard/folder permission requests.
On a large tests set, the slow database calls were on the order of ~30-40ms, so
when performed individually they don't have that large of a latency impact.
However, when done in bulk in the migration this adds up to some very slow
requests.
After the index is added these same database calls are reduced to ~4-5ms
* Change index to action, scope, role_id
* Make new index unique and drop [role_id, action, scope] index
* Alerting: During legacy migration reduce the number of created silences
During legacy migration every migrated rule was given a label rule_uid=<uid>.
This was used to silence DatasourceError/DatasourceNoData alerts for
migrated rules that had either ExecutionErrorState/NoDataState set to
keep_state, respectively.
This could potentially create a large amount of silences and a high cardinality
label. Both of these scenarios have poor outcomes for CPU load and latency in
unified alerting.
Instead, this change creates one label per ExecutionErrorState/NoDataState when
they are set to keep_state as well as two silence rules, if rules with said
labels were created during migration. These silence rules are:
- __legacy_silence_error_keep_state__ = true
- __legacy_silence_nodata_keep_state__ = true
This will drastically reduce the number of created silence rules in most cases
as well as not create the potentially high cardinality label `rule_uid`.
* introduce feature toggle
* create base service structure
* fix sample metric
* register metrics
* add to codeowners
* separate api dtos from service models
* remove leading newline
* Add AuthNSvc reload handling
* Working, need to add test
* Remove commented out code
* Add Reload implementation to connectors
* Align and add tests, refactor
* Add more tests, linting
* Add extra checks + tests to oauth client
* Clean up based on reviews
* Move config instantiation into newSocialBase
* Use specific error
These don't get marshalled and unmarshalled in the same way as they are represented in Go
This PR changes the OpenAPI spec to reflect what the API accepts and sends back
* Simple, per-base-interval jitter
* Add log just for test purposes
* Add strategy approach, allow choosing between group or rule
* Add flag to jitter rules
* Add second toggle for jittering within a group
* Wire up toggles to strategy
* Slightly improve comment ordering
* Add tests for offset generation
* Rename JitterStrategyFrom
* Improve debug log message
* Use grafana SDK labels rather than prometheus labels
* add deployment registry API cloud only
* update versions
* add feature flag endpoints
* use helpers
* merge main
* update AllowSelfServie and re-run code gen
* fix package name
* add allowselfserve flag to payload
* remove config
* update list api to return the full registry including states
* change enabled check
* fix compile error
* add feature toggle and split path in frontend
* changes
* with status
* add more status/state
* add back config thing
* add back config thing
* merge main
* merge main
* now on the /current api endpoint
* now on the /current api endpoint
* drop frontend changes
* change group name to featuretoggle (singular)
* use the same settings
* now with patch
* more common refs
* more common refs
* WIP actually do the webhook
* fix comment
* fewer imports
* registe standalone
* one less file
* fix singular name
---------
Co-authored-by: Michael Mandrus <michael.mandrus@grafana.com>
* ngalert openapi: Use same `basePath` as rest of Grafana
Currently, there are two issues that prevent easily merging `ngalert` and grafana openapi specs:
- The basePath is different. `grafana` has `/api` and `ngalert` has `/api/v1`. I changed `ngalert` to use `/api`
- The `ngalert` endpoints have their basePath in the each operation path. The basePath should actually be omitted
---------
Co-authored-by: Yuriy Tseretyan <yuriy.tseretyan@grafana.com>
* first touches
* Merge missing SSO settings to support Advanced Auth pages
* fix
* Update secrets correctly
* Add test for upsert with redactedsecret
* Verify decryption in the List tests
* AuthnSync: Rename files and structures
* AuthnSync: register rbac cloud role sync if feature toggle is enabled
* RBAC: Add new sync function to service interface
* RBAC: add common prefix and role names for cloud fixed roles
* AuthnSync+RBAC: implement rbac cloud role sync
Co-authored-by: Ieva <ieva.vasiljeva@grafana.com>
* Change ruler API to expect the folder UID as namespace
* Update example requests
* Fix tests
* Update swagger
* Modify FIle field in /api/prometheus/grafana/api/v1/rules
* Fix ruler export
* Modify folder in responses to be formatted as <parent UID>/<title>
* Add alerting test with nested folders
* Apply suggestion from code review
* Alerting: use folder UID instead of title in rule API (#77166)
Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>
* Drop a few more latent uses of namespace_id
* move getNamespaceKey to models package
* switch GetAlertRulesForScheduling to use folder table
* update GetAlertRulesForScheduling to return folder titles in format `parent_uid/title`.
* fi tests
* add tests for GetAlertRulesForScheduling when parent uid
* fix integration tests after merge
* fix test after merge
* change format of the namespace to JSON array
this is needed for forward compatibility, when we migrate to full paths
* update EF code to decode nested folder
---------
Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
Co-authored-by: Virginia Cepeda <virginia.cepeda@grafana.com>
Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>
Co-authored-by: Alex Weaver <weaver.alex.d@gmail.com>
Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
* Alerting: Add metric to check for default AM configurations
* Use a gauge for the config hash
* don't go out of bounds when converting uint64 to float64
* expose metric for config hash
* update metrics after applying config
* remove latest.json and replace with api call to grafana.com
* remove latest.json
* Revert "remove latest.json"
This reverts commit bcff43d898.
* Revert "remove latest.json and replace with api call to grafana.com"
This reverts commit 02b867d84e.
* add deprecation message to latest.json