* introduce feature toggle
* create base service structure
* fix sample metric
* register metrics
* add to codeowners
* separate api dtos from service models
* remove leading newline
* add clearer comment for function def
* update test to reflect change in range for 1w step
* clarify docs
* add more clarity
* add explanation to query options min interval and link to min step
* Update docs/sources/panels-visualizations/query-transform-data/_index.md
Co-authored-by: Isabel <76437239+imatwawana@users.noreply.github.com>
---------
Co-authored-by: Isabel <76437239+imatwawana@users.noreply.github.com>
* Add AuthNSvc reload handling
* Working, need to add test
* Remove commented out code
* Add Reload implementation to connectors
* Align and add tests, refactor
* Add more tests, linting
* Add extra checks + tests to oauth client
* Clean up based on reviews
* Move config instantiation into newSocialBase
* Use specific error
These don't get marshalled and unmarshalled in the same way as they are represented in Go
This PR changes the OpenAPI spec to reflect what the API accepts and sends back
* Simple, per-base-interval jitter
* Add log just for test purposes
* Add strategy approach, allow choosing between group or rule
* Add flag to jitter rules
* Add second toggle for jittering within a group
* Wire up toggles to strategy
* Slightly improve comment ordering
* Add tests for offset generation
* Rename JitterStrategyFrom
* Improve debug log message
* Use grafana SDK labels rather than prometheus labels
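A minimal sketch of how a per-base-interval jitter offset with a group-or-rule strategy could be derived. The type and function names (`JitterStrategy`, `Rule`, `jitterOffsetTicks`) and the choice of an FNV hash are assumptions made for illustration, not the code from this change.

```go
package main

import "hash/fnv"

// JitterStrategy says which key decides a rule's offset.
// The names here are illustrative, not taken from the Grafana code base.
type JitterStrategy int

const (
	JitterNever   JitterStrategy = iota // every rule fires on the interval boundary
	JitterByGroup                       // all rules in a group share one offset
	JitterByRule                        // each rule gets its own offset
)

// Rule carries only the fields this sketch needs.
type Rule struct {
	UID             string
	Group           string
	IntervalSeconds int64
}

// jitterOffsetTicks returns how many base-interval ticks to delay a rule by.
// The result is always in [0, IntervalSeconds/baseIntervalSeconds).
func jitterOffsetTicks(r Rule, baseIntervalSeconds int64, s JitterStrategy) int64 {
	if s == JitterNever || baseIntervalSeconds <= 0 {
		return 0
	}
	ticks := r.IntervalSeconds / baseIntervalSeconds
	if ticks <= 1 {
		return 0 // nothing to spread across
	}
	key := r.Group
	if s == JitterByRule {
		key = r.UID
	}
	h := fnv.New64a()
	_, _ = h.Write([]byte(key)) // hash.Hash writes never return an error
	return int64(h.Sum64() % uint64(ticks))
}
```

Hashing a stable key keeps the offset deterministic across restarts, and grouping by `Group` keeps rules of one group evaluating together.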
Sets the status code on backend data responses in Prometheus to match the status code returned by Prometheus. If the failure occurs below the HTTP/application layer, Bad Gateway (502) is returned.
---------
Co-authored-by: ismail simsek <ismailsimsek09@gmail.com>
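The status mapping described above can be sketched roughly as follows. `statusForDataResponse` and `errExampleTransport` are hypothetical names for this illustration; the real change works on the plugin SDK's response types.

```go
package main

import (
	"errors"
	"net/http"
)

// errExampleTransport stands in for any failure below the HTTP layer
// (DNS, refused connection, TLS). It is a placeholder for this sketch.
var errExampleTransport = errors.New("dial tcp: connection refused")

// statusForDataResponse picks the status code to attach to a data response.
// upstreamStatus is the code Prometheus returned; transportErr is non-nil
// when no HTTP response was received at all.
func statusForDataResponse(upstreamStatus int, transportErr error) int {
	if transportErr != nil {
		return http.StatusBadGateway // 502: the request never got an HTTP answer
	}
	return upstreamStatus
}
```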
* add deployment registry API cloud only
* update versions
* add feature flag endpoints
* use helpers
* merge main
* update AllowSelfServe and re-run code gen
* fix package name
* add allowselfserve flag to payload
* remove config
* update list api to return the full registry including states
* change enabled check
* fix compile error
* add feature toggle and split path in frontend
* changes
* with status
* add more status/state
* add back config thing
* merge main
* now on the /current api endpoint
* drop frontend changes
* change group name to featuretoggle (singular)
* use the same settings
* now with patch
* more common refs
* WIP actually do the webhook
* fix comment
* fewer imports
* register standalone
* one less file
* fix singular name
---------
Co-authored-by: Michael Mandrus <michael.mandrus@grafana.com>
* ngalert openapi: Use same `basePath` as rest of Grafana
Currently, there are two issues that prevent easily merging `ngalert` and grafana openapi specs:
- The basePath is different. `grafana` has `/api` and `ngalert` has `/api/v1`. I changed `ngalert` to use `/api`
- The `ngalert` endpoints have their basePath in each operation path. The basePath should actually be omitted
---------
Co-authored-by: Yuriy Tseretyan <yuriy.tseretyan@grafana.com>
* first touches
* Merge missing SSO settings to support Advanced Auth pages
* fix
* Update secrets correctly
* Add test for upsert with redactedsecret
* Verify decryption in the List tests
* AuthnSync: Rename files and structures
* AuthnSync: register rbac cloud role sync if feature toggle is enabled
* RBAC: Add new sync function to service interface
* RBAC: add common prefix and role names for cloud fixed roles
* AuthnSync+RBAC: implement rbac cloud role sync
Co-authored-by: Ieva <ieva.vasiljeva@grafana.com>
* Change ruler API to expect the folder UID as namespace
* Update example requests
* Fix tests
* Update swagger
* Modify File field in /api/prometheus/grafana/api/v1/rules
* Fix ruler export
* Modify folder in responses to be formatted as <parent UID>/<title>
* Add alerting test with nested folders
* Apply suggestion from code review
* Alerting: use folder UID instead of title in rule API (#77166)
Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>
* Drop a few more latent uses of namespace_id
* move getNamespaceKey to models package
* switch GetAlertRulesForScheduling to use folder table
* update GetAlertRulesForScheduling to return folder titles in format `parent_uid/title`.
* fix tests
* add tests for GetAlertRulesForScheduling when parent uid
* fix integration tests after merge
* fix test after merge
* change format of the namespace to JSON array
this is needed for forward compatibility, when we migrate to full paths
* update EF code to decode nested folder
---------
Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
Co-authored-by: Virginia Cepeda <virginia.cepeda@grafana.com>
Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>
Co-authored-by: Alex Weaver <weaver.alex.d@gmail.com>
Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
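To illustrate the JSON-array namespace format mentioned above, here is a hedged sketch of an encoder and decoder. The function names are hypothetical; the point is only that an array of segments stays unambiguous where a `parent_uid/title` string would not, for instance when a title itself contains a slash.

```go
package main

import (
	"encoding/json"
	"errors"
)

// encodeNamespace serialises folder path segments (for example parent UID
// and title) as a JSON array string.
func encodeNamespace(segments ...string) (string, error) {
	if len(segments) == 0 {
		return "", errors.New("namespace needs at least one segment")
	}
	b, err := json.Marshal(segments)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// decodeNamespace is the inverse of encodeNamespace.
func decodeNamespace(s string) ([]string, error) {
	var segments []string
	if err := json.Unmarshal([]byte(s), &segments); err != nil {
		return nil, err
	}
	return segments, nil
}
```

Because the value is an array, it can later grow to a full folder path without changing the format.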
* Alerting: Add metric to check for default AM configurations
* Use a gauge for the config hash
* don't go out of bounds when converting uint64 to float64
* expose metric for config hash
* update metrics after applying config
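One possible reading of the uint64-to-float64 concern: a gauge stores a float64, which represents integers exactly only up to 2^53, so a 64-bit hash can lose its low bits. The sketch below masks the hash to 53 bits before converting. The helper name and the masking approach are assumptions for illustration and may differ from what this change does.

```go
package main

// exactIntBits is the number of integer bits a float64 can hold exactly.
const exactIntBits = 53

// hashAsGaugeValue reduces a 64-bit hash to a value that survives the
// conversion to float64 without rounding.
func hashAsGaugeValue(hash uint64) float64 {
	mask := uint64(1)<<exactIntBits - 1
	return float64(hash & mask)
}
```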
* remove latest.json and replace with api call to grafana.com
* remove latest.json
* Revert "remove latest.json"
This reverts commit bcff43d898.
* Revert "remove latest.json and replace with api call to grafana.com"
This reverts commit 02b867d84e.
* add deprecation message to latest.json
* first commit
* add: pagination to anondevices
* fmt
* swagger and tests
* swagger
* testing out test
* fixing tests
* made it possible to query for from and to time
* refactor: change to query for IP address instead
* fix: tests
* separate nestedFolderPickerOverride toggle to force enable it without nestedFolders
* let's call it newFolderPicker
* update unit tests and keyboard handling
* reduce spacing when no folder open chevron
---------
Co-authored-by: Josh Hunt <joshhunt@users.noreply.github.com>
* Split subquery when cleaning annotations
* update comment
* Raise batch size, now that we pay attention to it
* Iterate in batches
* Separate cancellable batch implementation to allow for multi-statement callbacks, add overload for single-statement use
* Use split-out utility in outer batching loop so it respects context cancellation
* guard against empty queries
* Use SQL parameters
* Use same approach for tags
* drop unused function
* Work around parameter limit on sqlite for large batches
* Bulk insert test data in DB
* Refactor test to customise test data creation
* Add test for catching SQLITE_MAX_VARIABLE_NUMBER limit
* Turn annotation cleanup test to integration tests
* lint
---------
Co-authored-by: Sofia Papagiannaki <1632407+papagian@users.noreply.github.com>
* add get mute timing by name to MuteTimingService
* update get mute timing request handler to use the service method
* replace validation, uniqueness and used errors with errutils
* update mute timing methods return errutil responses
* use the term "time interval" in errors because mute timings are deprecated in Alertmanager and will be replaced by time intervals in the future.
* update create and update methods to return struct instead of pointer
* Remove FolderID from service tests
* Add models
* Add folderID pack to publicdashboard tests
* Remove folderID from dashboard tests
* Remove folderID from folders
* Remove folderID from ngalert tests
* Remove nolint comment
* Add back some tests after rebase
* implement Reload() func for azuread provider
* add unit test for failure
* use mutex when updating the info field
* implement the Reload() func for the other providers
* use mutex when reading info
* retrieve info using GetOAuthInfo() in common file
* move Reload() to SocialBase
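A simplified sketch of the mutex-guarded reload pattern these commits describe. The struct fields and the `Reload` signature here are reduced for illustration and are assumptions; the real method takes SSO settings rather than a ready-made `OAuthInfo`.

```go
package main

import (
	"errors"
	"sync"
)

// OAuthInfo is a trimmed-down stand-in for the provider settings.
type OAuthInfo struct {
	ClientID string
	Enabled  bool
}

// SocialBase holds the shared state for all providers in this sketch.
type SocialBase struct {
	mu   sync.RWMutex
	info *OAuthInfo
}

// Reload swaps in new settings under the write lock.
func (s *SocialBase) Reload(info *OAuthInfo) error {
	if info == nil {
		return errors.New("reload: settings must not be nil")
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.info = info
	return nil
}

// GetOAuthInfo returns the current settings under the read lock.
func (s *SocialBase) GetOAuthInfo() *OAuthInfo {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.info
}
```

Keeping both methods on the shared base means each provider gets consistent locking without repeating it.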
* Alerting: Increase size of kvstore value type for MySQL to LONGTEXT
Alertmanager uses the kvstore to persist its notification log, and the current
column limit for MySQL (16.7 MB) puts the maximum entries at a level that is
potentially achievable for heavy alerting users (~40-80k entries).
In comparison, the current type for PSQL (TEXT) is effectively unlimited, and
I believe SQLite defaults to 2 GB, which is also plenty of leeway.
* Move scope type vars to testutil package
* Expose parts of state historian for use in annotation backend
* Implement Loki ASH Annotation store
This store will only implement the `Get` method of a RepositoryImpl since alert state history
writes to Loki elsewhere.
* Use interface for Loki HTTP Client
* Add tests for Loki ASH Annotation store
* Add missing test
* Fix lint
* Organize tests
* Add filter tests
* Improve tests
* Move filter logic into outer function
* Fix lint
* Add comment
* Fix tests
* Fix lint
* Rename historian store + refactor
* Cleanup historian store
* Fix tests
* Minor cleanup
* Use new `ShouldRecordAnnotation` filter
* Fix logic and add tests for this check
* Fix typos, remove unused variables, `< 1` -> `== 0`
* More closely mimic RBAC filter from xorm to ensure correct logic
* Move off weaveworks client
* Address PR comments
* Alerting: Create feature flag for alert query optimization
Adds a feature flag, alertingQueryOptimization, for existing functionality:
alert query optimization. This feature flag will now be disabled by default.
* reload SSO settings for HA setups
* remove check for grafana HA
* add unit tests
* fetch all sso settings with one sql query
* register background service
* Add enablePluginsTracingByDefault feature flag
* Enable tracing for all plugins if enablePluginsTracingByDefault is set
* fix docstrings for IsEnabled and IsEnabledGlobally
* fix tests
* do not use separate feature manager
* add test case
* Revert "fix tests"
This reverts commit 46a2420ed1.
* cleanup
* fix plugin tracing disabled if wrong plugin setting is present
* add test case for enabled on plugin with wrong plugin setting but with enablePluginsTracingByDefault feature flag
* Add RequiresRestart = true to enablePluginsTracingByDefault
* re-generate feature flags
* pr review feedback
* Alerting: Add metrics to the remote Alertmanager struct
* rephrase http_requests_failed description
* make linter happy
* remove unnecessary metrics
* extract timed client to separate package
* use histogram collector from dskit
* remove weaveworks dependency
* capture metrics for all requests to the remote Alertmanager (both clients)
* use the timed client in the MimirAuthRoundTripper
* HTTPRequestsDuration -> HTTPRequestDuration, clean up mimir client factory function
* refactor
* less git diff
* gauge for last readiness check in seconds
* initialize LastReadinessCheck to 0, tweak metric names and descriptions
* add counters for sync attempts/errors
* last config sync and last state sync timestamps (gauges)
* change latency metric name
* metric for remote Alertmanager mode
* code review comments
* move label constants to metrics package
* Alerting: Fix NoData & Error alerts not resolving when rule is reset
On rule reset, when creating the PostableAlerts StateToPostableAlert did not
attach the correct NoData/Error alertname and rulename labels to expire/resolve
the active alerts when the previous cached state was NoData/Error.
* Return data in camelCase from the OAuth fb strategy
* changes
* wip
* Add defaults for oauth fb strategy
* revert other changes
* basic includeDefaults query param implementation
* basic secret removal and etag implementation
* correct imports
* rebase
* move default settings filter to models
* only replace ClientSecret value if set
* first GetForProvider test & use FNV for ETag to avoid Blocklisted import error
* add tests
* add annotation for the openapi spec & generate spec
* remove TODO
* use IsSecret, improve tests, remove DefaultOAuthSettings
* add comment explaining generateFNVETag
* add error handling for generateFNVETag
* run go generate
* Update pkg/services/ssosettings/api/api.go
Co-authored-by: Mihai Doarna <mihai.doarna@grafana.com>
* move isSecret to service, create GetForProviderWithRedactedSecrets func
* add unit test for GetForProviderWithRedactedSecrets & remove duplicated code
* regen openapi/swagger
* revert dependency bumps
---------
Co-authored-by: Mihaly Gyongyosi <mgyongyosi@users.noreply.github.com>
Co-authored-by: Mihai Doarna <mihai.doarna@grafana.com>
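For readers unfamiliar with the FNV-based ETag mentioned above, a minimal sketch follows. The function name `generateFNVETag` comes from the commit list, but this body, including hashing the JSON encoding and formatting the result as hex, is an assumption rather than the merged implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

// generateFNVETag hashes the JSON encoding of v with 64-bit FNV-1 and
// returns the hash as a hex string. FNV is not cryptographic; it is only
// used here to detect changes.
func generateFNVETag(v any) (string, error) {
	data, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	h := fnv.New64()
	if _, err := h.Write(data); err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", h.Sum64()), nil
}
```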
This PR has two steps that together create a functional dry-run capability for the migration.
By enabling the feature flag alertingPreviewUpgrade when on legacy alerting it will:
a. Allow all Grafana Alerting background services except for the scheduler to start (multiorg alertmanager, state manager, routes, …).
b. Allow the UI to show Grafana Alerting pages alongside legacy ones (with appropriate in-app warnings that UA is not actually running).
c. Show a new “Alerting Upgrade” page and register associated /api/v1/upgrade endpoints that will allow the user to upgrade their organization live without restart and present a summary of the upgrade in a table.
* extract get and save operations to an alertmanagerConfigStore. This removes duplicated code in the service (currently only mute timings) and improves testing
* replace generic errors with errutil ones that have better messages.
* update provisioning services to use new store
---------
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
* Docs: Add table data in PDF
* fix lint issues
* Switch to public preview
* Apply suggestions from code review
Co-authored-by: Isabel <76437239+imatwawana@users.noreply.github.com>
---------
Co-authored-by: Isabel <76437239+imatwawana@users.noreply.github.com>
There were a few errors that prevented these endpoints (which are the most up-to-date ones) from being present in the openapi spec:
- The `enterprise` tag excluded the endpoints from being generated
- `okRespoonse` typo
- Invalid templating on the parameters
- Missing parameter structs
Some refactoring that will simplify next changes for dry-run PRs. This should be no-op as far as the created ngalert resources and database state, though it does change some logs.
The key change here is to modify migrateOrg to return pairs of legacy struct + ngalert struct instead of actually persisting the alerts and alertmanager config. This will allow us to capture error information during dry-run migration.
It also moves most persistence-related operations such as title deduplication and folder creation to right before we persist. This will simplify eventual partial migrations (individual alerts, dashboards, channels, ...).
Additionally it changes channel code to deal with PostableGrafanaReceiver instead of PostableApiReceiver (integration instead of contact point).
* Separate overlapping legacy and UA alerting routes
api/alert-notifiers, alerting/list, and alerting/notifications existed in both
legacy and UA.
Rename legacy route paths and nav ids to be independent of UA ones.
Backend:
* Update the Grafana Alerting engine to provide feedback to HysteresisCommand. The feedback information is stored in state.Manager as a fingerprint of each state. The fingerprint is persisted to the database. Only fingerprints that belong to Pending and Alerting states are considered as "loaded" and provided back to the command.
- add ResultFingerprint to state.State. It's different from other fingerprints we store in the state because it is calculated from the result labels.
- add rule_fingerprint column to alert_instance
- update alerting evaluator to accept AlertingResultsReader via context, and update scheduler to provide it.
- add AlertingResultsFromRuleState that implements the new interface in eval package
- update getExprRequest to patch the hysteresis command.
* Only one "Recovery Threshold" query is allowed to be used in the alert rule and it must be the Condition.
Frontend:
* Add hysteresis option to Threshold in UI. It's called "Recovery Threshold"
* Add test for getUnloadEvaluatorTypeFromCondition
* Hide hysteresis in panel expressions
* Refactor isInvalid and add test for it
* Remove unnecessary React.memo
* Add tests for updateEvaluatorConditions
---------
Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>
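The hysteresis behaviour described above can be sketched for the simple "greater than" case. `RecoveryThreshold` and its methods are hypothetical names. `loaded` stands for the set of result fingerprints that were Pending or Alerting in the previous evaluation.

```go
package main

// RecoveryThreshold models a "greater than" threshold with hysteresis.
type RecoveryThreshold struct {
	Alert   float64 // start firing when the value rises above this
	Recover float64 // stop firing only when the value drops below this
}

// Firing reports whether a series should be alerting, given its value and
// whether it was already firing (loaded) in the previous evaluation.
func (t RecoveryThreshold) Firing(value float64, loaded bool) bool {
	if loaded {
		return value >= t.Recover
	}
	return value > t.Alert
}

// Evaluate applies Firing to every series, keyed by result fingerprint.
func (t RecoveryThreshold) Evaluate(values map[uint64]float64, loaded map[uint64]bool) map[uint64]bool {
	out := make(map[uint64]bool, len(values))
	for fp, v := range values {
		out[fp] = t.Firing(v, loaded[fp])
	}
	return out
}
```

With `Alert: 80` and `Recover: 70`, a value of 75 keeps an already firing series firing but does not start a new alert.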
* chore: Bump google.golang.org/grpc from 1.59.0 to 1.60.1
* Bump google.golang.org/protobuf to v1.32.0
* Fix make protobuf failing with latest protoc and protoc-gen-go
* Re-generate protobuf files
* Bump grafana-plugin-sdk-go
* go mod tidy
* merge with system settings before storing them in the db
* add base for validating sso settings
* add unit tests for sso settings validation
* call Reload() from sso service upsert()
* remove actual validation because it was moved in a separate pr
* use constant to fix lint error
* check if provider is configurable in service Upsert() method
* add unit tests for update provider settings api method
* fix lint error
* Canvas: Add Zoom
* Scale selecto components based on zoom state
* Fix pan by reverting to 3.1.0 for zoom-pan
* Update to latest library that fixes pan regression
* Add mini map to canvas pan zoom
* Fix selecto and anchors on hover
* Update naming to be more clear
* Switch back to contentComponent
* Apply transformScale to drag and resize
* Update connection source and target scaling
* Add option to display mini map
* Update yarn lock
* Revert "Update yarn lock"
This reverts commit 3d1dd65d57.
* Set yarn lock to main
* Revert "Set yarn lock to main"
This reverts commit 64bc50557e.
* Update to Yarn 4
* Add react-zoom-pan-pinch
* Update react-zoom-pan checksum
* Revert changes to json files
* Remove last line of api merged
* Remove last lines of all impacted jsons
* Update home json
* Update coordinate calc function to include scale
* Fix types in coordinate calc function
* Fix util calculation for transform
* Fix arrow anchor shift behavior
* Fix scale offset when adding elements during zoom
* Fix drag of selected group during zoom
* Add feature flag for canvas pan zoom
* Revert "Add feature flag for canvas pan zoom"
This reverts commit b026e31d8d.
* Regenerate feature flag after merge
* Apply feature flag to enable pan zoom wrappers
* Add mini map toggle behind feature flag
* Simplify minimap behavior
* Update feature flag registry
* Set minimap to false by default
* fix gen-cue
* Set toggles gen to main
Add blank line to toggle gen csv
* Add canvas pan zoom to csv
* Remove old comment
* Change ref parameter to be more descriptive
* Rename visibleFun to be more descriptive
* Consolidate transformScale transformRef in util
* Remove non-null assertion on connection parentRect
* Consolidate parentRect null coalescing into object
* Remove minimap and change toggle
* Add controls inline help for pan and zoom
* Clean up mouse events
* Pull scale out of ref and isolate transform
* Remove transform ref from scene div
* Fix context menu visible behavior
* Fix connections and update util functions
* Move transform component instance to util
* fix backend test
* minor updates
* Clean up connections / fix minor bug where offset of arrow wasn't being calculated correctly
* missed connection code cleanup
* cleanup scene code a bit more
* actually fix backend test
* move eslint disable line closer to actual issue
---------
Co-authored-by: nmarrs <nathanielmarrs@gmail.com>
* Unified Storage: Add resource from/to entity tests
* fixup
* Remove GRN
* Update tests
* truncate timestamps to account for RFC3339, set Group and GroupVersion in k8s object
* Update tests
---------
Co-authored-by: Dan Cech <dcech@grafana.com>
* (WIP) Alerting: Use the forked Alertmanager for remote secondary mode
* fall back to using internal AM in case of error
* remove TODOs, clean up .ini file, add orgId as part of remote AM config struct
* log warnings and errors, fall back to remoteSecondary, fall back to internal AM only
* extract logic to decide remote Alertmanager mode to a separate function, switch on mode
* tests
* make linter happy
* remove func to decide remote Alertmanager mode
* refactor factory function and options
* add default case to switch statement
* remove ineffectual assignment
* Add definition of external service registration
* Add style and tables for permissions needed
* Add external service registration to local without counterpart
* Add feature toggle check
* Add feature flag check in the backend as well
* Add the disclaimer for permissions
---------
Co-authored-by: Gabriel MABILLE <gabriel.mabille@grafana.com>
* In migration, create one label per channel
This PR changes how routing is done by the legacy alerting migration.
Previously, we created a single label on each alert rule that contained an array of contact point names. Ex: __contacts__="slack legacy testing","slack legacy testing2"
This label was then routed against a series of regex-matching policies with continue=true. Ex: __contacts__ =~ .*"slack legacy testing".*
In the case of many contact points, this array could quickly become difficult to manage and difficult to grok at-a-glance.
This PR replaces the single __contacts__ label with multiple __legacy_c_{contactname}__ labels and simple equality-matching policies. These channel-specific policies are nested in a single route under the top-level route which matches against __legacy_use_channels__ = true for ease of organization.
This should improve the experience for users wanting to keep the default migrated routing strategy but who also want to modify which contact points an alert sends to.
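As a rough illustration of the one-label-per-channel scheme, the sketch below builds the routing labels for a migrated rule. The helper names are hypothetical, and using "true" as the value of each channel label is an assumption. The description only says the policies use equality matching.

```go
package main

// useChannelsLabel marks rules that use the migrated channel routing.
const useChannelsLabel = "__legacy_use_channels__"

// contactLabel returns the per-channel label name for a contact point.
// No sanitisation of the name is attempted in this sketch.
func contactLabel(contactName string) string {
	return "__legacy_c_" + contactName + "__"
}

// routingLabels returns the labels to attach to a rule so that simple
// equality matchers can route it to each of its contact points.
func routingLabels(contactNames []string) map[string]string {
	labels := map[string]string{useChannelsLabel: "true"}
	for _, name := range contactNames {
		labels[contactLabel(name)] = "true" // assumed value
	}
	return labels
}
```

Removing one contact point from a rule then means deleting a single label instead of editing an array inside a label value.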
* Exclude mapped nodata transitions when nodata mapped to OK
* Fix processEvalResults test
* Don't check NoDataState when filtering transition
* Add comment to explain purpose of separate function
---------
Co-authored-by: William Wernert <william.wernert@grafana.com>
* Nested Folders: Fix /api/folders pagination
We used to check access to the root folders after fetching them from the DB with pagination.
This fix splits the logic for fetching folders into:
- fetching subfolders
- fetching root folders
and refactors the query for the latter so that it filters by folders with permissions
* Add tests
* Update benchmarks
* Drop from API response
* Drop from swagger docs
* Drop from integration tests
* regenerate public swagger docs
* Drop from frontend
* Drop asserts for namespaceID field
* replace SSOSettings with SSOSettingsDTO
* fix database tests
* fix oauth strategy
* fix sso settings service tests
* add secrets encryption on update
* rename SSOSettingsDTO to SSOSettings
* remove extraKeys from strategy
* change back settings type from createOAuthConnector to OAuthInfo
* do not parse multi-value fields in oauth strategy
* Move moving average and cumulative sum to private preview
* update docs
* move formatString to private preview
* rebuild docs
* undo changes that don't belong to this commit
* undo cumulative/window featureflag
* fix case
* Configure SkipOrgRoleSync from OAuthInfo
* Remove skipOrgRoleSync from socialbase and connectors
* Add test to socialimpl.ProvideService
* Deprecate AuthSettings' fields
* clean up misleading init of frontendsettings.Auth
* Can add allowed custom headers to an email Message. WIP.
* adds slug as a custom email header to all outgoing emails
* Headers are static - declared as key/value pairs in config. All static headers get added to emails.
* updates comment
* adds tests for parsing smtp static headers
* updates test to assert static headers are included when building email
* updates test to use multiple static headers
* updates test names
* fixes linting issue with error
* ignore gocyclo for loading config
* updates email headers in tests to be formatted properly
* add static headers first
* updates tests to assert that regular headers like From can't be overwritten
* ensures only the header is in a valid format for smtp and not the value
* updates comment and error message wording
* adds to docs and ini sample files
* updates smtp.static_headers docs examples formatting
* removes lines commented with semi colons
* prettier:write
* renames var
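The static header handling in the list above can be sketched as follows: header names are validated, values are not, and static headers go in first so regular headers such as From win. `buildHeaders` and the name pattern are assumptions for this illustration, not the validation used in Grafana.

```go
package main

import (
	"fmt"
	"net/textproto"
	"regexp"
)

// headerNameRe is a deliberately simple pattern for header names; the real
// validation rule may be stricter.
var headerNameRe = regexp.MustCompile(`^[A-Za-z0-9-]+$`)

// buildHeaders merges static headers from config with the regular headers of
// a message. Static headers are added first, so a regular header with the
// same name replaces the static one.
func buildHeaders(static, regular map[string]string) (map[string]string, error) {
	out := make(map[string]string, len(static)+len(regular))
	for name, value := range static {
		if !headerNameRe.MatchString(name) {
			return nil, fmt.Errorf("invalid static header name %q", name)
		}
		out[textproto.CanonicalMIMEHeaderKey(name)] = value
	}
	for name, value := range regular {
		out[textproto.CanonicalMIMEHeaderKey(name)] = value
	}
	return out, nil
}
```

Canonicalising the keys makes `from` and `From` collide, which is what prevents a static header from shadowing a regular one.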
* Plugins: add option to disable TLS in the socks proxy
* fix allow_insecure docs
* upgrade github.com/grafana/grafana-plugin-sdk-go from v0.196.0 to v0.197.0
* fix conflicts
* Chore: Bump k8s dependencies to v0.29.0
* update the openapi fork
* use post process spec
---------
Co-authored-by: Ryan McKinley <ryantxu@gmail.com>
* Add Azure settings and update tests
* Filter by plugin ID
* Add forward settings config variable
* Update line
* Add tests
* Update so that data sources are fully defined in config
* Update SDK and test
* Fix lint
* Update docs/sources/setup-grafana/configure-grafana/_index.md
Co-authored-by: Andrew Hackmann <5140848+bossinc@users.noreply.github.com>
* Remove unnecessary if
---------
Co-authored-by: Andrew Hackmann <5140848+bossinc@users.noreply.github.com>
* Alerting: Send configuration and state to the remote Alertmanager on shutdown
* Alerting: Add a sync interval for ApplyConfig in remote secondary mode
* add routine to sync states and configs
* pass a cancellable context to syncRoutine(), remove tests for ApplyConfig, cache last config in memory
* extract logic to update config and state in the remote Alertmanager
* get latest config from the database
* avoid using separate goroutine for updating state and config
* clean up PR
* refactor, comments, tests
* update tests
* remove canceled context from calls to StopAndWait()
* create context with timeout and send config and state to remote Alertmanager
* update tests
* address code review comments
Swagger(ngalert): Add `X-Disable-Provenance` to missing operations
I added all functions that call the `determineProvenance` function
Schema changes are from:
`make` in `pkg/services/ngalert/api/tooling`
`make swagger-clean && make openapi3-gen` in root
* Alerting: Add a sync interval for ApplyConfig in remote secondary mode
* add routine to sync states and configs
* pass a cancellable context to syncRoutine(), remove tests for ApplyConfig, cache last config in memory
* extract logic to update config and state in the remote Alertmanager
* get latest config from the database
* avoid using separate goroutine for updating state and config
* clean up PR
* refactor, comments, tests
* update tests
* add config struct for remote secondary forked Alertmanager
* use errgroups for sync operations
* use waitgroup instead of errgroup
* remove helper method to sync AMs
* check for errors instead of bool syncErr
* Chore: Remove FolderID from Dashboard Import
* chore: regen specs
* Remove OrgID from DashboardImportRequest and DashboardImportResponse
* Remove OrgIDs from swagger and tests
---------
Co-authored-by: Ida Furjesova <ida.furjesova@grafana.com>
* bidirectional shared crosshair table WIP
* add shared crosshair to table panel
* lower around point threshold
* add feature toggle
* add index based verification
* add adaptive threshold
* switch to debounceTime
* lower debounce to 100
* raise debounce back to 200
* revert azure dashboard
* re-render only rows list on data hover event
* further break down table component
* refactor
* raise debounce time
* fix build
* Add Loki historian store stub
* Add composite store
* Use composite store if Loki historian enabled
* Split store interface into read/write
* Make composite + historian stores read only
* Use variadic constructor for composite
* Modify Loki store enable logic
* Use dskit.concurrency.ForEachJob for parallelism
* Alerting: Refactor readiness check
Moves the readiness check to the mimir client and removes the need to assert that we have senders - it already has a queue and can hold notifications until we're ready to send them.
---------
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* `accesscontrol` swagger: Add `global` field to `RoleDTO` type
The field is currently added in the MarshalJSON function so it isn't reflected in the spec
This PR sets the "static" version of the RoleDTO, that has the global field, as the swagger model
* Revert the marshalling logic
* Anonymous: Add device limiter
* break auth if limit reached
* fix typo
* refactored const to make it clearer with expiration
* anon device limit for config
---------
Co-authored-by: Eric Leijonmarck <eric.leijonmarck@gmail.com>
* add GetCommandsFromPipeline
* refactor method GetCommandType to func GetExpressionCommandType
* add function to create fingerprint frames
* add function to determine whether raw query represents a hysteresis command and a function to patch it with loaded metrics
* remove GRN and switch tenant to namespace
* clean up remaining references
* simplify and remove inconsistency in With* parameters
* parse listing keys so we can use db index
* bump the schema version
---------
Co-authored-by: Ryan McKinley <ryantxu@gmail.com>
* Refactor to prevent cyclic dependencies
* Move list authorization to the API layer
* Init connectors using the SSO settings service in case the ssoSettingsApi feature toggle is enabled
* wip, need to handle the cyclic dep
* Remove cyclic dependency
* Align tests + refactor
* Move back OAuthInfo to social
* Delete pkg/login/social/constants
* Move reloadable registration to the social providers
* Rename connectors.Error to connectors.SocialError
* Send sanitized selectors to the Pyroscope backend for LabelNames and LabelValues
* Clean LabelNames response to remove already used labels
* Improve performance after major changes
* Fix import order
* Further improve rendering performance
* Fix frontend tests
* Fix fake pyroscope client signature
* Bump pyroscope/api dependency to include start/end in LabelNames/LabelValues
* Fix issue with old queries running when using the run button
* Add generated file
* Make code more readable, add a few comments
* Format with prettier
* Fix error when assigning data
* Revert "Add generated file"
This reverts commit c4f33727b8.
* Remove leftover code
* Simplify query editor internal state objects
* Move label selector validation up, improve label filtering
* Simplify query editor state, switch to debounce to reduce rerenders
* Revert cosmetic change
* Chore: Remove FolderID from DTO Folder
* chore: add OrgID field to an instance of SaveDashboardCommand
* chore: add another OrgID to pair with the FolderUID
* chore: add OrgId to Folder struct and expectedParentOrgIDs to testCase struct, unsure if last part is necessary
* Fix folder test, add expected orgID
* chore: regen specs
---------
Co-authored-by: Ida Furjesova <ida.furjesova@grafana.com>
* Alerting: Attempt to retry retryable errors
Retrying has been broken for a good while now (at least since version 9.4). This change attempts to re-introduce retries in their simplest and safest possible form.
I first introduced #79095 to make sure we don't disrupt or put additional load on our customers' data sources with this change in a patch release. Paired with this change, retries can now work as expected.
There are two small differences between how retries work now and how they used to work in legacy alerting:
- Retries only occur for valid alert definitions. If we suspect that the error comes from a malformed alert definition, we skip retrying.
- We have added a constant backoff of 1s between retries.
---------
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* first round of entityapi updates
- quote column names and clean up insert/update queries
- replace grn with guid
- streamline table structure
fixes
streamline entity history
move EntitySummary into proto
remove EntitySummary
add guid to json
fix tests
change DB_Uuid to DB_NVarchar
fix folder test
convert interface to any
more cleanup
start entity store under grafana-apiserver dskit target
CRUD working, kind of
rough cut of wiring entity api to kube-apiserver
fake grafana user in context
add key to entity
list working
revert unnecessary changes
move entity storage files to their own package, clean up
use accessor to read/write grafana annotations
implement separate Create and Update functions
* go mod tidy
* switch from Kind to resource
* basic grpc storage server
* basic support for grpc entity store
* don't connect to database unless it's needed, pass user identity over grpc
* support getting user from k8s context, fix some mysql issues
* assign owner to snowflake dependency
* switch from ulid to uuid for guids
* cleanup, rename Search to List
* remove entityListResult
* EntityAPI: remove extra user abstraction (#79033)
* remove extra user abstraction
* add test stub (but
* move grpc context setup into client wrapper, fix lint issue
* remove unused constants
* remove custom json stuff
* basic list filtering, add todo
* change target to storage-server, allow entityStore flag in prod mode
* fix issue with Update
* EntityAPI: make test work, need to resolve expected differences (#79123)
* make test work, need to resolve expected differences
* remove the fields not supported by legacy
* sanitize out the bits legacy does not support
* sanitize out the bits legacy does not support
---------
Co-authored-by: Ryan McKinley <ryantxu@gmail.com>
* update feature toggle generated files
* remove unused http headers
* update feature flag strategy
* devmode
* update readme
* spelling
* readme
---------
Co-authored-by: Ryan McKinley <ryantxu@gmail.com>
* Alerting: Attempt to retry retryable errors
Currently in a draft state, but this was the minimal diff I could put together to exemplify how we could achieve this.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
---------
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Have the first iteration
* Prepare bench testing
* rename the test files
* Remove unnecessary test file
* Introduce influxqlStreamingParser feature flag
* Apply streaming parser feature flag
* Add new tests
* More tests
* return executedQueryString only in first frame
* add frame meta and config
* Update golden json files
* Support tags/labels
* more tests
* more tests
* Don't change original response_parser.go
* provide context
* create util package
* don't pass the row
* update converter with formatted frameName
* add executedQueryString info only to first frame
* update golden files
* rename
* update test file
* use pointer values
* update testdata
* update parsing
* update converter for null values
* prepare converter for table response
* clean up
* return timeField in fields
* handle no time column responses
* better nil field handling
* refactor the code
* add table tests
* fix config for table
* table response format
* fix value
* if there is no time column set name
* linting
* refactoring
* handle the status code
* add tracing
* Update pkg/tsdb/influxdb/influxql/converter/converter_test.go
Co-authored-by: İnanç Gümüş <m@inanc.io>
* fix import
* update test data
* sanity
* sanity
* linting
* simplicity
* return empty rsp
* rename to prevent confusion
* nullableJson field type for null values
* better handling null values
* remove duplicate test file
* fix healthcheck
* use util for pointer
* move bench test to root
* provide fake feature manager
* add more tests
* partial fix for null values in table response format
* handle partial null fields
* comments for easy testing
* move frameName allocation in readSeries
* one less append operation
* performance improvement by making string conversion once
pkg: github.com/grafana/grafana/pkg/tsdb/influxdb/influxql
             │ stream2.txt │             stream3.txt             │
             │   sec/op    │   sec/op     vs base                │
ParseJson-10   314.4m ± 1%   303.9m ± 1%  -3.34% (p=0.000 n=10)
             │ stream2.txt  │             stream3.txt              │
             │     B/op     │     B/op      vs base                │
ParseJson-10   425.2Mi ± 0%   382.7Mi ± 0%  -10.00% (p=0.000 n=10)
             │ stream2.txt │             stream3.txt             │
             │  allocs/op  │  allocs/op   vs base                │
ParseJson-10   7.224M ± 0%   6.689M ± 0%  -7.41% (p=0.000 n=10)
* add comment lines
---------
Co-authored-by: İnanç Gümüş <m@inanc.io>
* Unified Alerting: Set `max_attempts` to 1 by default
The retry logic for unified alerting has been broken as far back as v9.4.x. Rather than fixing it in one go and causing a headache for our users, with rules putting extra load on their datasources, I think a better approach is to simply set 1 as the default and then let our users change it.
I see two cons with this approach:
- Configuration for legacy to unified alerting cannot be ported over automatically, users will have to manually set `max_attempts` to 3 when migrating.
- Users expecting to get any sort of retrying (as with legacy alerting) will not have it out of the box and will have to manually edit the configuration.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
---------
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Folders: Show folders user has access to at the root level
* Refactor
* Refactor
* Hide parent folders user has no access to
* Skip expensive computation if possible
* Fix tests
* Fix potential nil access
* Fix duplicated folders
* Fix linter error
* Fix querying folders if no managed permissions set
* Update benchmark
* Add special shared with me folder and fetch available non-root folders on demand
* Fix parents query
* Improve db query for folders
* Reset benchmark changes
* Fix permissions for shared with me folder
* Simplify dedup
* Add option to include shared folder permission to user's permissions
* Fix nil UID
* Remove duplicated folders from shared list
* Folders: Fix fetching empty folder
* Nested folders: Show dashboards with directly assigned permissions
* Fix slow dashboards fetch
* Refactor
* Fix cycle dependencies
* Move shared folder to models
* Fix shared folder links
* Refactor
* Use feature flag for permissions
* Use feature flag
* Review comments
* Expose shared folder UID through frontend settings
* Add frontend type for sharedWithMeFolderUID option
* Refactor: apply review suggestions
* Fix parent uid for shared folder
* Fix listing shared dashboards for users with access to all folders
* Prevent creating folder with "shared" UID
* Add tests for shared folders
* Add test for shared dashboards
* Fix linter
* Add metrics for shared with me folder
* Add metrics for shared with me dashboards
* Fix tests
* Tests: add metrics as a dependency
* Fix access control metadata for shared with me folder
* Use constant for shared with me
* Optimize parent folders access check, fetch all folders in one query.
* Use labels for metrics
* Export Notification Policy correctly (#78020)
The JSON version of an exported Notification Policy now
correctly inlines the policy in the same way the YAML version
does.
Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
* ngalert `make`: Support GNU install on Darwin
Currently, the Makefile assumes that Darwin is using the Mac version of `sed`
I have the GNU version, so it failed. With this PR, it checks which version is installed
I also called `make` and there are some changes that came out of it
* swagger-gen
The definition for preferences is globally named `Spec` because that's the type that cue outputs
This adds a swagger annotation to rename the definition in the swagger schema to `Preferences`
This will be easier to use in generated clients
* Prepare the test files
* use json files everywhere
* update golden json files
* disable update
* update test file
* fix naming
* lint
* InfluxDB: Add metadata information to first frame only (#78664)
* executedString in first frame only
* lint
* fix tests
* update tests
* don't update
* linting
* update
* update again
* handle nil values
* append in the right array
* add comments
* remove redundant if condition
* update fixed annotation roles if FlagAnnotationPermissionUpdate is enabled
* add dashboard type scope back in the fixed roles to make the migration easier
* Return data in camelCase from the OAuth fb strategy
* changes
* wip
* Add defaults for oauth fb strategy
* revert other changes
* Add tests
* Add Defaults to cfg and use it in OAuthStrategy
* Return *OAuthInfo from OAuthStrategy
* lint
* Remove unnecessary Defaults
* Introduce const for fields, fix import order
* Align failing tests
* clean up
* Changes requested by @gamab
* Update pkg/services/ssosettings/strategies/oauth_strategy_test.go
Co-authored-by: Gabriel MABILLE <gamab@users.noreply.github.com>
* Load data on startup
* Rename + simplify
---------
Co-authored-by: Gabriel MABILLE <gamab@users.noreply.github.com>
* Alerting: Only warm alert state cache if execute_alerts=true.
If the Grafana instance is not executing alerts, then Warm()-ing the state
manager is wasteful and could lead to misleading rule status queries, as the
status returned will be always based on the state loaded from the database at
startup, and not the most recent evaluation state.
* Move Warm() down to shared conditional.
* Alerting: Add clean_upgrade config and deprecate force_migration
Upgrading to UA and rolling back will no longer delete any data by default.
Instead, each set of tables will remain unchanged when switching between
legacy and UA. As such, the force_migration config has been deprecated
and no extra configuration is required to roll back to legacy anymore.
If clean_upgrade is set to true when upgrading from legacy alerting to Unified
Alerting, grafana will first delete all existing Unified Alerting resources,
thus re-upgrading all organizations from scratch. If false or unset,
organizations that have previously upgraded will not lose their existing Unified
Alerting data when switching between legacy and Unified Alerting.
Similar to force_migration, it should be kept false when not needed as it may
cause unintended data-loss if left enabled.
---------
Co-authored-by: Christopher Moyer <35463610+chri2547@users.noreply.github.com>
* Alerting: Keep track of individual org migration status
Save migration status per migrated org.
Change the meaning (and key/value) of the org_id=0 entry
to store the current (previous) config value used by alerting.
This is so we can know when to upgrade/downgrade by
comparing with the new config value in
UnifiedAlerting.IsEnabled.
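As a rough sketch of the comparison described above: the value stored under org_id=0 is treated as the previous config value and compared with the current one. The names `migrationAction` and `decideAction` are hypothetical and are not taken from the change itself.

```go
package main

import "fmt"

// migrationAction is a hypothetical type naming what the migration should do.
type migrationAction string

const (
	actionNone      migrationAction = "none"
	actionUpgrade   migrationAction = "upgrade"
	actionDowngrade migrationAction = "downgrade"
)

// decideAction compares the previously stored config value (the org_id=0
// entry) with the current UnifiedAlerting.IsEnabled value.
func decideAction(previouslyEnabled, nowEnabled bool) migrationAction {
	switch {
	case !previouslyEnabled && nowEnabled:
		return actionUpgrade
	case previouslyEnabled && !nowEnabled:
		return actionDowngrade
	default:
		return actionNone
	}
}

func main() {
	fmt.Println(decideAction(false, true))
	fmt.Println(decideAction(true, false))
	fmt.Println(decideAction(true, true))
}
```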
* Chore: use errutil for pluginRepo errors
* Update pkg/util/errutil/status.go
* Use errutil helper functions
Co-Authored-By: Marcus Efraimsson <marcus.efraimsson@gmail.com>
* Forgot the log level
* Use entity
---------
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
* Alerting: Add a sync interval for ApplyConfig in remote secondary mode
* remove out of scope code
* remove parentheses after CleanUp for consistency in test comments
* Add comment to ApplyConfig
* Add error to surface for groups not valid
* Update pkg/login/social/azuread_oauth.go
---------
Co-authored-by: Gabriel MABILLE <gamab@users.noreply.github.com>
* fix timeout issues when gathering prometheus flavor stats
* workaround data race in sdk tracing middleware
* cap concurrency at 10
---------
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
* Add anonymous stats and user table
- anonymous users page
- add feature toggle `anonymousAccess`
- remove check for enterprise for `Device-Id` header in request
- add anonusers/device count to stats
* promise all, review comments
* make use of promise all settled
* refactoring: devices instead of users
* review comments, moved countdevices to httpserver
* fakeAnonService for tests and generate openapi spec
* do not commit openapi3 and api-merged
* add openapi
* Apply suggestions from code review
Co-authored-by: Alex Khomenko <Clarity-89@users.noreply.github.com>
* formatting
* precise anon devices to avoid confusion
---------
Co-authored-by: Alex Khomenko <Clarity-89@users.noreply.github.com>
Co-authored-by: jguer <me@jguer.space>
* refactor SSOSettings to use types
* test struct
* refactor SSOSettings struct to use types
* fix database tests
* fix populateSSOSettings() to accept an SSOSettings param
* fix all tests from the database layer
* handle errors for converting to/from SSOSettings
* add json tag on OAuthInfo fields
* use continue instead of if/else
* add the source field to SSOSettingsDTO conversion
* remove omitempty from json tags in OAuthInfo struct
* Alerting: In migration improve deduplication of title and group
This change improves alert titles generated in the legacy migration
that occur when we need to deduplicate titles. Now when duplicate
titles are detected we will first attempt to append a sequential index,
falling back to a random uid if none are unique within 10 attempts.
This should cause shorter and more easily readable deduplicated
titles in most cases.
In addition, groups are no longer deduplicated. Instead we set them
to a combination of truncated dashboard name and humanized alert
frequency. This way, alerts from the same dashboard share a group
if they have the same evaluation interval. In the event that truncation
causes overlap, it won't be a big issue as all alerts will still be in a
group with the correct evaluation interval.
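The title deduplication described above could look roughly like this. It is a sketch under assumptions: the ` #N` suffix format, the starting index of 2, and the helper names `dedupTitle` and `randomSuffix` are illustrative and not taken from the migration code.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// maxIndexAttempts is how many sequential indexes are tried before falling
// back to a random suffix (10 in the description above).
const maxIndexAttempts = 10

// dedupTitle returns title unchanged if it is unused. Otherwise it tries
// "title #2", "title #3", ... and finally appends a random suffix.
// The chosen title is recorded in used.
func dedupTitle(title string, used map[string]struct{}) string {
	if _, taken := used[title]; !taken {
		used[title] = struct{}{}
		return title
	}
	for i := 2; i < 2+maxIndexAttempts; i++ {
		candidate := fmt.Sprintf("%s #%d", title, i)
		if _, taken := used[candidate]; !taken {
			used[candidate] = struct{}{}
			return candidate
		}
	}
	candidate := fmt.Sprintf("%s %s", title, randomSuffix())
	used[candidate] = struct{}{}
	return candidate
}

// randomSuffix stands in for a random uid; it returns 8 hex characters.
func randomSuffix() string {
	b := make([]byte, 4)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

func main() {
	used := map[string]struct{}{}
	for i := 0; i < 3; i++ {
		fmt.Println(dedupTitle("CPU usage", used))
	}
}
```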
* Split signout_redirect_url into per provider settings
* Split signout_redirect_url into per provider settings
* Update docs/sources/setup-grafana/configure-security/configure-authentication/grafana/index.md
Co-authored-by: Christopher Moyer <35463610+chri2547@users.noreply.github.com>
* Split signout_redirect_url into per provider settings
* Split signout_redirect_url into per provider settings
* Split signout_redirect_url into per provider settings
* Split signout_redirect_url into per provider settings
* Split signout_redirect_url into per provider settings
* Split signout_redirect_url into per provider settings
* update docs
* update devenvs
* add missing struct tag
---------
Co-authored-by: Rao, B V Chalapathi <b_v_chalapathi.rao@nokia.com>
Co-authored-by: Christopher Moyer <35463610+chri2547@users.noreply.github.com>
Co-authored-by: jguer <me@jguer.space>
* Alerting: Introduce a Mimir client as part of the Remote Alertmanager
Mimir client that understands the new APIs developed for Mimir. Very much a WIP still.
* more wip
* appease the linter
* more linting
* add more code
* get state from kvstore, encode, send
* send state to the remote Alertmanager, extract fullstate logic into its own function
* pass kvstore to remote.NewAlertmanager()
* refactor
* add fake kvstore to tests
* tests
* use FileStore to get state
* always log 'completed state upload'
* refactor compareRemoteConfig
* base64-encode the state in the file store
* export silences and nflog filenames, refactor
* log 'completed state/config upload...' regardless of outcome
* add values to the state store in tests
* address code review comments
* log error from filestore
---------
Co-authored-by: gotjosh <josue.abreu@gmail.com>
* ExtSvcAuth: Assign roles locally
* Fix test
* HandlePluginStateChanged in the OrgID
* Remove Global from command
* Use AssignmentOrgID instead of OrgID
* Remove unnecessary test case
* AuthN: Check API Key is not trying to access another organization
* Revert local change
* Add test
* Discussed with Kalle we should set r.OrgID
* Syntax sugar
* Suggestion org-mismatch
* Alerting: Apply query optimization to eval endpoints
Previously, query optimization was applied to alert queries when scheduled but
not when run through `api/v1/eval` or `/api/v1/rule/test/grafana`. This could
lead to discrepancies between preview and scheduled alert results.
* Alerting: Add GetFullState method to FileStore
* make tests compile, create stateStore in NewAlertmanager
* return errors instead of logging, accept an arbitrary number of strings
* make NewAlertmanager() accept a stateStore
* Alerting: In migration, fall back to '1s' for malformed min interval
During legacy migration, when we encounter an alert datasource query
with a min interval (interval field in the query model) that is not
parseable, instead of failing the migration we fall back to a min interval
of 1s and continue.
The reason for this is a bug in legacy alerting (existing for a few major
versions) which allows arbitrary dashboard variables to be used as the
min interval, even though those variables do not work and will cause
the legacy alert to fail with `interval calculation failed: time: invalid
duration`.
* Check installer perm
* Failed eval better output
* Switch fetching json data in the repo
* Comment
* Account for feedback
* Mv single_organization config option
* Inline error check
* Starting to replace errors not to have to do the management in two places
* Continue error translation
* Cover ErrChecksumMismatch
* Refactor a bit
* Lint. Tab
* log instead of erroring out
* Nit.
* Revert change on kinds
* revert file again
* Fix tests
* Match core plugin error status code
* Skip permission check for Grafana Admin
* Use errutil templates
* Use errutil templating
* Inline
* Test templating
* revert error changes
* Remove isGrafanaAdmin skip
* Feature toggle check
* Small refactor on hasPluginRequestedPermissions
* Add test
* Imports
* Post install check
* change log messages so that they make sense
* Cover no scope case
* Inline
* Nit.
* Fix test
* regression analysis first draft
* Swap to better regression libraries
* fix name
* Interpolate x points instead of using source x points
* clean up ui and add feature toggle
* fix merge error
* change to loop for finding min max, rename resolution
* Add docs
* add docs and tests
* change name to regression analysis
* update docs
* Fix editor labels
* add regression images
* fix docs
* Remote Alertmanager(refactor): Only parse the URL once
Exactly what it says on the tin.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* use the existing tests
Signed-off-by: gotjosh <josue.abreu@gmail.com>
---------
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Alerting: Introduce a Mimir client as part of the Remote Alertmanager
This is our first attempt at making Grafana use Mimir as a backend. It uses a new set of APIs that we've developed on the Mimir side to upload the Grafana configuration and Alertmanager state so that it can then be ported over.
Codewise, we've introduced a couple of things:
- A client to isolate in its own package all the communication that happens with Mimir
- A few changes to the remote/alertmanager to include uploading the configuration and state when it starts
- A few refactors that align a bit better with the design approach that we have in mind
- An integration test against these newly developed APIs using a custom image
---------
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>