Commit Graph

6959 Commits

Author SHA1 Message Date
Santiago
a6be12c037
Alerting: Implement SaveAndApplyConfig in the forked Alertmanager (remote primary) (#84659)
* Alerting: Implement SaveAndApplyConfiguration in the forked Alertmanager struct

* call SaveAndApplyConfig on the remote first, log errors for the internal

* add comments explaining why we ignore errors in the internal AM

* restore go.work.sum
2024-04-23 15:45:35 +02:00
Steve Simpson
a6ad2380bf
Alerting: Refactor api_prometheus.go request handlers. (#86639)
This splits the request handlers into two functions, one which is the actual
handler and one which is independent from the Grafana `ReqContext` object. This
is to make it easier to reuse the implementation in other code.

Part of the refactoring changes the functions which get query parameters from
the request to operate on a `url.Values` instead of the request object.

The change also makes the code consistently use `req.Form` instead of a
combination of `req.URL.Query()` and `req.Form`, though I have left
`api_ruler` as-is to avoid this PR growing too large.
2024-04-23 14:50:26 +02:00
Santiago
c77ab53819
Alerting: implement SaveAndApplyConfig in the remote Alertmanager struct (#84642)
* implement SaveAndApplyConfig in the remote Alertmanager struct

* remove ID from CreateGrafanaAlertmanagerConfig call

* decrypt, test that we decrypt, refactor

* fix duplicated declaration in test

* rephrase comment, remove unnecessary conversion to slice of bytes

* fix test
2024-04-23 14:37:10 +02:00
Santiago
8b7c2a459b
Alerting: Implement SaveAndApplyDefaultConfig in the forked Alertmanager (remote primary mode) (#85668)
* Alerting: Implement SaveAndApplyDefaultConfig in the forked Alertmanager (remote primary)

* log the error for the internal AM instead of returning it
2024-04-23 14:36:40 +02:00
Mihai Doarna
4d9e35ba57
SSO: add configurableProviders list to SSO service (#86622)
* add configurableProviders list to sso service

* address feedback
2024-04-23 10:00:43 +03:00
Alexander Weaver
c32953e52c
Alertign: Create feature toggle for recording rules (#86696)
create toggle for recording rules
2024-04-22 12:53:16 -05:00
Yuri Tseretyan
9735a8a080
Alerting: Distinguish conflict violation errors (#86634)
* update generator to set ID = 0 and do not set 0 if unique is needed
* return proper message when the constraint violation
2024-04-22 12:28:46 -04:00
Julian Siebert
14f018e3fc
Docs: Use correct description for "og_priority" (#80889) 2024-04-22 13:53:18 +00:00
Kristina
2247d6c415
Short Links: Add setting for changing expiration time (#86003)
* Add setting for changing shortlink expiration time

* Add docs, add better language

* put all the numbers in the duration 🤷

* 🙄

* update language to be correct and clear

* Add max limit and more documentation
2024-04-22 07:39:24 -05:00
Steve Simpson
54290f2ac4
Alerting: Fix TestRouteGetRuleStatuses as much as possible. (#86666)
This test has been skipped for a long time, so it doesn't work anymore. I've
fixed the test so it works again, but left some tests disabled which were
apparently flaky. If we see the other test cases flaking, we'll have to
disable it again.

Fixes:
- Use fake access control for most test cases, and real one for FGAC test cases.
- Check that "file" in API responses the full folder path, not folder title.
2024-04-22 12:36:50 +02:00
Steve Simpson
f07f48616a
Alerting: Fix panic when limit_alerts=0. (#86640)
Oversight in the TopK function meant if k=0, then we'd panic when checking
element zero in the heap, because no items are ever allowed into the heap.
2024-04-22 10:14:19 +02:00
Michael Mandrus
45a7f649fe
CMS: Create local implementation of cloud migration for dev use (#86637)
* add developer mode property to config

* create cms stub

* cleanup

* implement and wire up gcom stub

* fix errors

* don't document the flag
2024-04-20 23:51:58 -04:00
Steve Simpson
6ea97e41fb
Alerting: Consistently return Prometheus-style responses from rules APIs. (#86600)
* Alerting: Consistently return Prometheus-style responses from rules APIs.

This commit is part refactor and part fix. The /rules API occasionally returns
error responses which are inconsistent with other error responses. This fixes
that, and adds a function to map from Prometheus error type and HTTP code.

* Fix integration tests

* Linter happiness

* Make linter more happy

* Fix up one more place returning non-Prometheus responses
2024-04-19 21:03:20 +02:00
Robert Horvath
86a9533863
Chore: Replace backend platform codeownership (#86010)
* Replace backend platform codeownership

* fix go.mod with work sync

* fix go.mod

Signed-off-by: Dave Henderson <dave.henderson@grafana.com>

---------

Signed-off-by: Dave Henderson <dave.henderson@grafana.com>
Co-authored-by: Dave Henderson <dave.henderson@grafana.com>
2024-04-19 19:12:59 +02:00
Eric Leijonmarck
ddabef9895
RBAC: Add actionsets struct and write path (#86108)
* Add actionsets struct and failing test

* update from review

* review comments

* review comments update

* refactor: create interface

* actionset service

* fix tests

* move from wireoss to wire

* Apply suggestions from code review

remove unnecessary comments

Co-authored-by: Ieva <ieva.vasiljeva@grafana.com>

* nil for the actionsetservice

* Revert "nil for the actionsetservice"

This reverts commit e3d3cc8171.

---------

Co-authored-by: Ieva <ieva.vasiljeva@grafana.com>
2024-04-19 15:38:14 +01:00
Santiago
529f55cfe8
Alerting: Remove isDefault field from receivers (Alertmanager configuration) (#86605)
Alerting: Remove isDefault field from receivers in the Alertmanager configuration
2024-04-19 15:44:20 +02:00
Santiago
309a7e7684
Alerting: Implement SaveAndApplyDefaultConfig in the remote Alertmanager struct (#85005)
* Alerting: Implement SaveAndApplyDefaultConfig in the remote Alertmanager struct

* send the hash of the encrypted configuration

* tests, default config hash in AM struct

* add missing default config to test

* restore build directory

* go work file...

* fix broken test

* remove unnecessary conversion to []byte

* go work again...

* make things work again with latest main branch changes

* update error messages in tests for decrypting config
2024-04-19 15:11:07 +02:00
Santiago
a2ce8fefed
Alerting: Use a struct when sending a Grafana AM configuration to the remote Alertmanager (#86451)
* Alerting: Use a struct when sending a Grafana AM configuration to the remote Alertmanager

* remove '-distroless' from mimir image name
2024-04-19 13:04:18 +02:00
Steve Simpson
5f7612834e
Alerting: Refactoring in api_prometheus.go to allow code reuse. (#86575)
Preparing these functions to be used by some other part of the codebase,
which does not have a `contextmodel.ReqContext`, only the normal request
structure (`url.Values`, etc). This is slightly messy because of how
Grafana allows url parameters to be in the URL or in the request body,
so we need to make sure to invoke the form parsing logic in `ReqContext`.
2024-04-19 12:52:01 +02:00
Alex Khomenko
44e1bce55a
Feature toggles: Remove dashboardEmbed toggle (#86587) 2024-04-19 12:48:08 +02:00
Steve Simpson
73873f5a8a
Alerting: Optimize rule status gathering APIs when a limit is applied. (#86568)
* Alerting: Optimize rule status gathering APIs when a limit is applied.

The frontend very commonly calls the `/rules` API with `limit_alerts=16`. When
there are a very large number of alert instances present, this API is quite
slow to respond, and profiling suggests that a big part of the problem is
sorting the alerts by importance, in order to select the first 16.

This changes the application of the limit to use a more efficient heap-based
top-k algorithm. This maintains a slice of only the highest ranked items whilst
iterating the full set of alert instances, which substantially reduces the
number of comparisons needed. This is particularly effective, as the
`AlertsByImportance` comparison is quite complex.

I've included a benchmark to compare the new TopK function to the existing
Sort/limit strategy. It shows that for small limits, the new approach is
much faster, especially at high numbers of alerts, e.g.

100K alerts / limit 16: 1.91s vs 0.02s (-99%)

For situations where there is no effective limit, sorting is marginally faster,
therefore in the API implementation, if there is either a) no limit or b) no
effective limit, then we just sort the alerts as before. There is also a space
overhead using a heap which would matter for large limits.

* Remove commented test cases

* Make linter happy
2024-04-19 11:51:22 +02:00
Ryan McKinley
5a8384a245
QueryService: Add feature toggles to better support testing (#86493) 2024-04-19 12:26:21 +03:00
Will Browne
8a5c0cfdc0
Plugins: Pass cancellable context during API server creation (#86545) 2024-04-19 09:22:14 +03:00
Matthew Jacobson
a20197229e
Alerting: Prevent simplified routing zero duration GroupInterval and RepeatInterval (#86561)
Prevent zero duration GroupInterval and RepeatInterval
2024-04-18 21:08:38 -04:00
Matthew Jacobson
71445002b7
Alerting: Fix simplified routing group by override (#86552)
* Alerting: Fix simplified routing custom group by override

Custom group by overrides for simplified routing were missing required fields
GroupBy and GroupByAll normally set during upstream Route validation.

This fix ensures those missing fields are applied to the generated routes.

* Inline GroupBy and GroupByAll initialization instead of normalize after
2024-04-18 21:08:14 -04:00
ismail simsek
28a683cf28
InfluxDB: Remove influxdbSqlSupport feature toggle (#86518)
Remove influxdbSqlSupport feature toggle
2024-04-18 16:29:27 +02:00
Andres Martinez Gotor
eac02a61e1
Return plugin error when requesting settings (#86052) 2024-04-18 14:29:02 +02:00
Mihai Doarna
57848bbe23
Auth: encrypt/decrypt SAML secrets in SSO settings service (#85253)
encrypt/decrypt saml secrets in sso settings service
2024-04-18 15:16:59 +03:00
Andres Martinez Gotor
0f4a47b180
Plugin load errors: Add more well-known errors (#85960) 2024-04-18 13:04:22 +02:00
Michael Mandrus
df4c8c3cbc
CloudMigrations: Move business logic out of api layer (#86406)
* move run migration to the cloudmigrationimpl layer

* add migration run list logic down a layer

* remove useless comments

* pull cms calls into their own service
2024-04-17 15:43:09 -04:00
Aaron Godin
d409d8e860
IAM - Fix error messages for resource permissions endpoints (#85773)
* IAM: fix many error messages in access-related code to provide more information

* Remove debug statement

* Refactor resourcepermissions package to use errutil

* Replace a few more errors with errutil and wrap errors found in users and teams services

* Apply diff of openAPI spec
2024-04-17 08:53:28 -05:00
Karl Persson
1a6777cb93
User: use update function for password updates (#86419)
* Update password through Update function instead

* Remove duplicated to lower

* Refactor password code
2024-04-17 15:24:36 +02:00
Andre Pereira
93c24403b3
Explore: Move app to under explore > traces (#86436)
Move app to under explore > traces
2024-04-17 14:18:06 +01:00
Kristin Laemmert
03b795844c
SQLStore: Improve recursive CTE support detection (#86397)
sqlstore: improve recursive CTE support detection

Vitess returns a not supported error, not a parse error

Co-authored-by: Derek Perkins <derek@nozzle.io>
2024-04-17 08:37:47 -04:00
Eric Leijonmarck
9c1ef8b16e
Auth: Remove caseinsensitive check on update user (#86286)
* Removal: case insensitive check on update

* refactor and removal of test for duplicate user

* refactor to still shadow user variable
2024-04-16 17:47:17 +01:00
Matthew Jacobson
533bed6d94
Alerting: Fix simplified routes '...' groupBy creating invalid routes (#86006)
* Alerting: Fix simplified routes '...' groupBy creating invalid routes

There were a few ways to go about this fix:
1. Modifying our copy of upstream validation to allow this
2. Modify our notification settings validation to prevent this
3. Normalize group by on save
4. Normalized group by on generate

Option 4. was chosen as the others have a mix of the following cons:
- Generated routes risk being incompatible with upstream/remote AM
- Awkward FE UX when using '...'
- Rule definition changing after save and potential pitfalls with TF

With option 4. generated routes stay compatible with external/remote AMs, FE
doesn't need to change as we allow mixed '...' and custom label groupBys, and
settings we save to db are the same ones requested.

In addition, it has the slight benefit of allowing us to hide the internal
implementation details of `alertname, grafana_folder` from the user in the
future, since we don't need to send them with every FE or TF request.

* Safer use of DefaultNotificationSettingsGroupBy

* Fix missed API tests
2024-04-16 12:14:39 -04:00
Nick Richmond
d3fee607e2
Expressions: sort numeric metrics behind feature toggle (#85911)
* feat: sort numeric metrics behind feature toggle

* chore: upgrade `dataplane/sdata` to latest tag

* chore: `go work sync`
2024-04-16 10:52:47 -04:00
Ieva
036f826b87
AuthZ: Further protect admin endpoints (#86285)
* only users with Grafana Admin role can grant/revoke Grafana Admin role

* check permissions to user amdin endpoints globally

* allow checking global permissions for service accounts

* use a middleware for checking whether the caller is Grafana Admin
2024-04-16 15:48:12 +01:00
Karl Persson
0f06120b56
User: Clean up update functions (#86341)
* User: remove unused function

* User: Remove UpdatePermissions and support IsGrafanaAdmin flag in Update function instead

* User: Remove Disable function and use Update instead
2024-04-16 16:33:50 +02:00
owensmallwood
8c8885ef23
Storage Api: Adds traces (#85391)
- adds traces and improved logging to the unified storage server
- add a configurable logger to the gRPC server service
2024-04-16 08:30:51 -06:00
Karl Persson
8520892923
User: Fix GetByID (#86282)
* Auth: Remove unused lookup param

* Remove case sensitive lookup for GetByID
2024-04-16 15:24:34 +02:00
Ryan McKinley
ba510d1a7d
Playlists: Enable kubernetesPlaylists by default in OSS (#86259) 2024-04-16 09:19:35 +02:00
Brendan O'Handley
f85470d652
Prometheus: Use the frontend package in Prometheus and remove feature toggle (#86080)
* add history links for monaco completion provider folder

* add history links for monaco query field folder

* add history links for components folder

* add history links for configuration folder

* add history links for dashboard json folder

* add history links for gcopypaste folder

* add history link for variableMigration

* add history link for querybuilder/components/metrics-modal folder

* add history link for querybuilder/components/promqail folder

* add history links for querybuilder/components folder

* add history links for querybuilder/hooks folder

* add history links for querybuilder/shared folder

* add history links for querybuilder folder

* add history links for querycache folder

* add history links for src folder

* use frontend package and custom auth in module.ts

* remove files and fix import issues

* remove usePrometheusFrontendPackage

* remove extra files

* update betterer

* remove extra files after rebase

* fix betterer for rebase

* fix e2e flakiness
2024-04-15 16:45:23 -05:00
Alexander Weaver
5b1498f98f
Alerting: Return a 400 and errutil error when trying to delete a contact point that is referenced by a policy (#85481)
Return a 400 and errutil error when trying to delete a contact point that is referenced by a policy
2024-04-15 09:25:28 -05:00
linoman
51da96d94e
Auth: Add IsClientEnabled and IsEnabled for the authn.Service and authn.Client interfaces (#86034)
* Add `Service. IsClientEnabled` and `Client.IsEnabled` functions

* Implement `IsEnabled` function for authn clients

* Implement `IsClientEnabled` function for authn services
2024-04-15 10:54:50 +02:00
Giuseppe Guerra
eec9d3dbc4
Plugins: Removed feature toggle pluginsDynamicAngularDetectionPatterns (#85956)
* Plugins: Removed feature toggle pluginsDynamicAngularDetectionPatterns

* re-generate feature toggles
2024-04-15 10:37:28 +02:00
Yuri Tseretyan
12605bfed2
Alerting: Update fixed roles to include silences permissions (#85826)
* update fixed roles to include silences
* add silence actions to managed permissions
* update documentation
2024-04-12 12:37:34 -04:00
Ieva
56f4664875
RBAC: add a feature toggle for action sets (#86064)
* add a feature toggle for action sets

* update feature toggle name
2024-04-12 17:19:25 +01:00
Steve Simpson
ad7f804255
Alerting: Fix evaluation metrics to not count retries (#85873)
* Change evaluation metrics to only count once per eval, and add new metrics.

* Cosmetic: Move eval total Inc() to orginal place.
2024-04-12 16:20:46 +02:00
lean.dev
c8d185502b
Fixes CMS url for dev environment (#86056) 2024-04-12 14:19:38 +00:00