Fixes #30144
Co-authored-by: dsotirakis <sotirakis.dim@gmail.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
Co-authored-by: Ida Furjesova <ida.furjesova@grafana.com>
Co-authored-by: Jack Westbrook <jack.westbrook@gmail.com>
Co-authored-by: Will Browne <wbrowne@users.noreply.github.com>
Co-authored-by: Leon Sorokin <leeoniya@gmail.com>
Co-authored-by: Andrej Ocenas <mr.ocenas@gmail.com>
Co-authored-by: spinillos <selenepinillos@gmail.com>
Co-authored-by: Karl Persson <kalle.persson@grafana.com>
Co-authored-by: Leonard Gram <leo@xlson.com>
Introduces org-level isolation for the Alertmanager and its components.
Silences, Alerts and Contact points are now separated by org and are not shared between them.
Co-authored with @davidmparrott and @papagian
This commit adds contact point testing to ngalerts via a new API
endpoint. This endpoint accepts JSON containing a list of
receiver configurations which are validated and then tested
with a notification for a test alert. The endpoint returns JSON
for each receiver with a status and error message. It accepts
a configurable timeout via the Request-Timeout header (in seconds)
up to a maximum of 30 seconds.
* Alerting: Expose discovered and dropped Alertmanagers
Exposes the API for discovered and dropped Alertmanagers.
* make admin config poll interval configurable
* update after rebase
* wordsmith
* More wordsmithing
* change name of the config
* settings package too
* Alerting: modify table and accessors to limit org access appropriately
* Update migration to create multiple Alertmanager configs
* Apply suggestions from code review
Co-authored-by: gotjosh <josue@grafana.com>
* replace mg.ClearMigrationEntry()
mg.ClearMigrationEntry() would create a new session.
This commit introduces a new migration that clears an entry from the migration log, replacing mg.ClearMigrationEntry() so that all dashboard alert migration operations run inside the same transaction.
It also adds `SkipMigrationLog()` to the Migrator interface for skipping the addition of an entry to the migration_log.
Co-authored-by: gotjosh <josue@grafana.com>
* Alerting: Send alerts to external Alertmanager(s)
Within this PR we're adding support for registering or unregistering
sending to a set of external Alertmanagers. A few of the things that are
going on are:
- Introduce a new table to hold "admin" (either org or global)
configuration we can change at runtime.
- A new periodic check that polls for this configuration and adjusts the
"senders" accordingly.
- Introduces a new concept of "senders" that are responsible for
shipping the alerts to the external Alertmanager(s). In a nutshell,
this is the Prometheus notifier (the one in charge of sending the alert)
mapped to a multi-tenant map.
There are a few code movements here and there but those are minor, I
tried to keep things intact as much as possible so that we could have an
easier diff.
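As a rough illustration of the "senders" idea above, the sketch below keeps
one sender per org in a map and reconciles it against the polled
configuration. The `sender` and `senders` types and the `sync` method are
hypothetical stand-ins, not the types introduced by this PR.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// sender is a hypothetical stand-in for the per-org Prometheus notifier wrapper.
type sender struct {
	urls []string
}

// senders maps an org ID to the sender responsible for that org.
type senders struct {
	mu    sync.Mutex
	byOrg map[int64]*sender
}

// sync reconciles the running senders with the polled admin configuration
// (org ID -> external Alertmanager URLs). Orgs without URLs lose their
// sender; all other orgs get a sender with the latest URLs.
func (s *senders) sync(cfg map[int64][]string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for orgID := range s.byOrg {
		if len(cfg[orgID]) == 0 {
			delete(s.byOrg, orgID)
		}
	}
	for orgID, urls := range cfg {
		if len(urls) == 0 {
			continue
		}
		s.byOrg[orgID] = &sender{urls: urls}
	}
}

// orgs returns the org IDs that currently have a sender, in ascending order.
func (s *senders) orgs() []int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	ids := make([]int64, 0, len(s.byOrg))
	for id := range s.byOrg {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	s := &senders{byOrg: map[int64]*sender{}}
	s.sync(map[int64][]string{1: {"http://am-1:9093"}, 2: {"http://am-2:9093"}})
	s.sync(map[int64][]string{1: {"http://am-1:9093"}})
	fmt.Println(s.orgs())
}
```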
* Alerting: deactivate an Alertmanager configuration
Implement DELETE /api/alertmanager/grafana/config/api/v1/alerts
by storing the default configuration, which stops the existing configuration
from being used.
* Apply suggestions from code review
* Alerting: Implement /status for the notification system
Implements the necessary plumbing to have a /status endpoint on the
notification system.
* Add API examples
* Update API specs
* Update prometheus/common dependency
Co-authored-by: Sofia Papagiannaki <sofia@grafana.com>
* Fix dashboard alert and notifier migration for MySQL
* Fix POSTing Alertmanager configuration if no current configuration exists
in case the default configuration has not been stored yet
or has failed to get stored
* Change CreatedAt field type
Evaluation information (similar to the EvalMatches from dashboard alerting) is added when, and currently only when, a classic condition is used.
This is returned via the API and can be included in notifications by reading the `__value__` label attached to `.Alerts` in the template. It is a string.
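A small sketch of reading that label from a Go template. The `alert` and
`templateData` types and the label value are hypothetical; the real template
data passed to notification templates may be shaped differently.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// alert and templateData are hypothetical, trimmed-down template inputs.
type alert struct {
	Labels map[string]string
}

type templateData struct {
	Alerts []alert
}

// messageTemplate prints the __value__ label of every alert, one per line.
const messageTemplate = `{{ range .Alerts }}{{ index .Labels "__value__" }}
{{ end }}`

// render executes messageTemplate against d and returns the resulting text.
func render(d templateData) (string, error) {
	t, err := template.New("message").Parse(messageTemplate)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, d); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := render(templateData{Alerts: []alert{
		{Labels: map[string]string{"__value__": "example evaluation string"}},
	}})
	fmt.Print(out, err)
}
```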
* nest cache by orgID, ruleUID, stateID
* update accessors to use new cache structure
* test and linter fixup
* fix panic
Co-authored-by: Kyle Brandt <kyle@grafana.com>
* add comment to identify what's going on with nested maps in cache
Co-authored-by: Kyle Brandt <kyle@grafana.com>
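For readers unfamiliar with the nested cache layout, here is a minimal
sketch of a cache keyed by orgID, then ruleUID, then stateID, guarded by an
RWMutex. The `state` and `cache` types and their methods are illustrative
assumptions, not the actual ngalert types.

```go
package main

import (
	"fmt"
	"sync"
)

// state is a hypothetical, trimmed-down alert state.
type state struct {
	State string
}

// cache nests states as orgID -> ruleUID -> stateID.
type cache struct {
	mu     sync.RWMutex
	states map[int64]map[string]map[string]*state
}

func newCache() *cache {
	return &cache{states: map[int64]map[string]map[string]*state{}}
}

// set stores s, creating the intermediate maps on first use.
func (c *cache) set(orgID int64, ruleUID, stateID string, s *state) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.states[orgID] == nil {
		c.states[orgID] = map[string]map[string]*state{}
	}
	if c.states[orgID][ruleUID] == nil {
		c.states[orgID][ruleUID] = map[string]*state{}
	}
	c.states[orgID][ruleUID][stateID] = s
}

// get looks a state up under a read lock. Indexing a nil inner map is safe
// in Go and simply reports the key as missing.
func (c *cache) get(orgID int64, ruleUID, stateID string) (*state, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	s, ok := c.states[orgID][ruleUID][stateID]
	return s, ok
}

func main() {
	c := newCache()
	c.set(1, "rule-a", "state-1", &state{State: "Alerting"})
	s, ok := c.get(1, "rule-a", "state-1")
	fmt.Println(s.State, ok)
	_, ok = c.get(2, "rule-a", "state-1")
	fmt.Println(ok)
}
```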
* Quota: Extend service to set limit on alerts
* Add test for applying quota to alert rules
* Apply suggestions from code review
Co-authored-by: Diana Payton <52059945+oddlittlebird@users.noreply.github.com>
* Get used alert quota only if ngalert is enabled
* Set alert limit to zero if ngalert is not enabled
Co-authored-by: Diana Payton <52059945+oddlittlebird@users.noreply.github.com>
* Fix failure when adding state annotations
* Fix get org rules API
Do not fail response if user has no access to view a namespace.
Do not include the namespace in the response instead.
* lint
* update swagger json files to match datasourceUid change
underlying change made in https://github.com/grafana/grafana/pull/33282
* Document DatasourceUID field in AlertQuery model
* Run spec generation from inside a docker container
* Generate latest spec
Co-authored-by: Sofia Papagiannaki <sofia@grafana.com>
* set processing time
* merge labels and set on response
* use state cache for adding alerts to rules
* minor cleanup
* add support for NoData and Error results
* rename test
* bring in changes from other PRs that have been merged
* pr feedback
* add integration test
* close state tracker cleanup on context.Done
* fixup test
* rename state tracker
* set EvaluationDuration on Result
* default labels set as constants
* separate cache and state from manager
* use RWMutex in cache
* not those annotations
* [Alerting]: Add alerting endpoint for Query Evaluation
* Fix passing down now parameter
* Add validations and test
* Fix eval queries and expressions test
* Add eval tests
* Do not initialize mutex unnecessarily
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
* linter
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
* [Alerting]: Fix updating rule group and add test
* Fix updating rule labels
* Set default values for rule no data and error states
if they are missing
* Add test for updating rule
* Test updating annotations
* Apply suggestions from code review
Co-authored-by: gotjosh <josue@grafana.com>
* add test for posting an unknown rule UID
* Fix alert rule validation and add tests
* Remove org id from PostableGrafanaRule
This field was not used; each rule gets the organisation of the user making
the request
* Update pkg/tests/api/alerting/api_alertmanager_test.go
Co-authored-by: gotjosh <josue@grafana.com>
A set of fixes for the GET alert and groups endpoints.
- First, the default values were not being applied to the query params. I've introduced a new method in the Grafana context that allows us to do this.
- Second, is the fact that alerts were never being transitioned to active. To my surprise this is actually done by the inhibitor in the pipeline - if an alert is not muted, or inhibited then it's active.
- Third, I have added an integration test to cover for regressions.
Signed-off-by: Josue Abreu <josue@grafana.com>
* init
* autogens AM route
* POST dashboards/db spec
* POST alert-notifications spec
* fix description
* re inits vendor, updates grafana to master
* go mod updates
* alerting routes
* renames to receivers
* prometheus endpoints
* align config endpoint with cortex, include templates
* Change grafana receiver type
* Update receivers.go
* rename struct to stop swagger thrashing
* add rules API
* index html
* standalone swagger ui html page
* Update README.md
* Expose GrafanaManagedAlert properties
* Some fixes
- /api/v1/rules/{Namespace} should return a map
- update ExtendedUpsertAlertDefinitionCommand properties
* am alerts routes
* rename prom swagger section for clarity, remove example endpoints
* Add missing json and yaml tags
* folder perms
* make folders POST again
* fix grafana receiver type
* rename folder->namespace for perms
* make ruler json again
* PR fixes
* silences
* fix Ok -> Ack
* Add id to POST /api/v1/silences (#9)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
* Add POST /api/v1/alerts (#10)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
* fix silences
* Add testing endpoints
* removes grpc replace directives
* [wip] starts validation
* pkg cleanup
* go mod tidy
* ignores vendor dir
* Change response type for Cortex/Loki alerts
* receiver unmarshaling tests
* ability to split routes between AM & Grafana
* api marshaling & validation
* begins work on routing lib
* [hack] ignores embedded field in generation
* path specific datasource for alerting
* align endpoint names with cloud
* single route per Alerting config
* removes unused routing pkg
* regens spec
* adds datasource param to ruler/prom route paths
* Modifications for supporting migration
* Apply suggestions from code review
* hack for cleaning circular refs in swagger definition
* generates files
* minor fixes for prom endpoints
* decorate prom apis with required: true where applicable
* Revert "generates files"
This reverts commit ef7e975584.
* removes server autogen
* Update imported structs from ngalert
* Fix listing rules response
* Update github.com/prometheus/common dependency
* Update get silence response
* Update get silences response
* adds ruler validation & backend switching
* Fix GET /alertmanager/{DatasourceId}/config/api/v1/alerts response
* Distinct gettable and postable grafana receivers
* Remove permissions routes
* Latest JSON specs
* Fix testing routes
* inline yaml annotation on apirulenode
* yaml test & yamlv3 + comments
* Fix yaml annotations for embedded type
* Rename DatasourceId path parameter
* Implement Backend.String()
* backend zero value is a real backend
* exports DiscoveryBase
* Fix Go initialisms
* Silences: Use PostableSilence as the base struct for creating silences
* Use type alias instead of struct embedding
* More fixes to alertmanager silencing routes
* post and spec JSONs
* Split rule config to postable/gettable
* Fix empty POST /silences payload
Recreating the generated JSON specs fixes the issue
without further modifications
* better yaml unmarshaling for nested yaml docs in cortex-am configs
* regens spec
* re-adds config.receivers
* omitempty to align with prometheus API behavior
* Prefix routes with /api
* Update Alertmanager models
* Make adjustments to follow the Alertmanager API
* ruler: add for and annotations to grafana alert (#45)
* Modify testing API routes
* Fix grafana rule for field type
* Move PostableUserConfig validation to this library
* Fix PostableUserConfig YAML encoding/decoding
* Use common fields for grafana and lotex rules
* Add namespace id in GettableGrafanaRule
* Apply suggestions from code review
* fixup
* more changes
* Apply suggestions from code review
* aligns structure pre merge
* fix new imports & tests
* updates tooling readme
* goimports
* lint
* more linting!!
* revive lint
Co-authored-by: Sofia Papagiannaki <papagian@gmail.com>
Co-authored-by: Domas <domasx2@gmail.com>
Co-authored-by: Sofia Papagiannaki <papagian@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Co-authored-by: gotjosh <josue@grafana.com>
Co-authored-by: David Parrott <stomp.box.yo@gmail.com>
Co-authored-by: Kyle Brandt <kyle@grafana.com>
* set query in rules response
* Theme: tweaking dark theme colors (#33007)
* Library Panels: Add library panel tab to share modal (#32953)
* Explore: Scroll split panes in Explore independently (#32978)
* Change default prometheus to latest and prometheus v1 to prometheus1
* Update README
* Remove prometheus1 block as not used
* Explore: Separate scrolling in split view
* Update snapshot
* Allow skip migrations in tests via environment variable (#32958)
* Dashboard: Fix issue where Slack notifications won't link to users (#32861)
* DashboardPage: refactored styles from sass to emotion (#32955)
* DashboardPage: refactored styles from sass to emotion
* refactored dashboardPage component to be a lot easier to read and understand
* more refactoring...
* more cleaning...
* fixes frontend test
* fixes frontend test- I hope
* moves dashboard scss styles back to its standalone file
* GraphNG: use theme font family and size for axis labels (#33009)
* GraphNG: use theme font family and size for axis labels
* fix test
* AlertingNG: Slack notification channel (#32675)
* AlertingNG: Slack notification channel
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Add tests
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Fix review comments
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Fix review comments and small refactoring
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* GraphNG: stacking (#30749)
* First iteration
* Dev dash
* Re-use StackingMode type
* Fix ts and api issues
* Stacking work resurrected
* Fix overrides
* Correct values in tooltip and updated test dashboard
* Update dev dashboard
* Apply correct bands for stacking
* Merge fix
* Update snapshot
* Revert go.sum
* Handle null values correctly and make fillBelowTo and stacking mutually exclusive
* Snapshots update
* Graph->Time series stacking migration
* Review comments
* Indicate overrides in StandardEditorContext
* Change stacking UI editor, migrate stacking to object option
* Small refactor, fix for hiding series and dev dashboard
* VizLegend: sets a min and max value of the seriesCount control in Storybook (#33022)
* Alerting: Filter rules list (#32818)
* Chore: Reduces strict errors (#33012)
* Chore: reduces strict error in OptionPicker tests
* Chore: reduces strict errors in FormDropdownCtrl
* Chore: reduces has no initializer and is not definitely assigned in the constructor errors
* Chore: lowers strict count limit
* Tests: updates snapshots
* Chore: updates after PR comments
* Refactor: removes throw and changes signature for DashboardSrv.getCurrent
* [Alerting]: Several modifications in alert rules (#32983)
* [Alerting]: Use common properties for all rules
* Add Labels in rules
* Fix update ruleGroup API
Return 400 Bad Request response
when the request contains a UID that does not exist
* Check permissions and return namespace id
* Apply suggestions from code review
Co-authored-by: gotjosh <josue@grafana.com>
* WIP (#33025)
* Chore: Bump strict error count limit (#33035)
* set query in rules response
Co-authored-by: Torkel Ödegaard <torkel@grafana.org>
Co-authored-by: kay delaney <45561153+kaydelaney@users.noreply.github.com>
Co-authored-by: Ivana Huckova <30407135+ivanahuckova@users.noreply.github.com>
Co-authored-by: Dafydd <72009875+dafydd-t@users.noreply.github.com>
Co-authored-by: n-wbrown <n-wbrown@users.noreply.github.com>
Co-authored-by: Uchechukwu Obasi <obasiuche62@gmail.com>
Co-authored-by: Leon Sorokin <leeoniya@gmail.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Co-authored-by: Dominik Prokop <dominik.prokop@grafana.com>
Co-authored-by: Nathan Rodman <nathanrodman@gmail.com>
Co-authored-by: Hugo Häggmark <hugo.haggmark@grafana.com>
Co-authored-by: Sofia Papagiannaki <papagian@users.noreply.github.com>
Co-authored-by: gotjosh <josue@grafana.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
* [Alerting]: Fix empty rules evaluation statuses
`GetRuleGroupAlertRules()` requires a non-empty namespaceUID
* Include the namespace into the response
* [Alerting]: Use title instead of slug for retrieving the namespace
* Apply suggestions from code review
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
* Alerting: Use a default configuration and periodically poll for new ones
Use a default configuration to make sure we always start the grafana
instance. Then, regularly poll for new ones.
I've also made sure that failures to apply configuration do not stop the
Grafana server but instead keep polling until it is a success.
* add db columns
* Fix deserialisation issue of AlertRule For field (#32848)
* Update to latest alerting-api
Co-authored-by: Sofia Papagiannaki <papagian@users.noreply.github.com>
* Alerting: Cleanup and move legacy to a legacy file
A quick cleanup of the ngalert/api directory, optimising for an easy
removal of what will be considered legacy at some point. A quick
summary of what's done is:
- Add a `generated` prefix to files that are auto-generated by
our swagger definitions.
- Create a legacy file to place all the legacy API routes implementation
and helpers. Deleting files that were no longer needed after this
move.
- Rename the `lotex` file to `lotex_ruler`
- Adding a couple of comments here and there.
With this, I hope to organise our code in this directory a bit better
given there's a lot going on.
* Return cached alerts for prometheus/api/v1/alerts
* Return not implemented for /prometheus/grafana/api/v1/rules
* Set StartsAt for already alerting states
* Fix tests
* Add validation for grafana recipient
* Alertmanager API implementation (WIP)
* Fix encoding/decoding receiver settings from/to YAML
* Save templates together with the configuration
* update POST to apply latest config
* Alertmanager service enabled by the ngalert toggle
* Silence API integration with Alertmanager
* Apply suggestions from code review
Co-authored-by: gotjosh <josue@grafana.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
* Alerting: Introduce the silencing interface
The operations introduced are:
- Listing silences
- Retrieving a specific silence
- Deleting a silence
- Creating a silence
Signed-off-by: Josue Abreu <josue@grafana.com>
* Add a comment to listing silences
* Update to upstream alertmanager
* Remove copied code from the Alertmanager
* Alerting: Fetch configuration from the database and run a notification
instance
Co-Authored-By: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
- Takes the conditions property from the settings column of an alert from the alerts table and turns it into an ng alerting condition with the queries and classic condition.
- Has a temporary REST API endpoint that will take the dashboard conditions json, translate it to SSE queries + classic condition, and execute it (only enabled in dev mode).
- Changes expressions to catch query responses with a non-nil error property
- Adds two new states for an NG instance result (NoData, Error) and updates evaluation to match those states
- Changes the AsDataFrame (for frontend) from Bool to string to represent additional states
- Fix bug in condition model to accept first Operator as empty string.
- In ngalert, adds GetQueryDataRequest, which was part of execute and is still called from there. This allows me to get the Expression request from a condition so that the "pipeline" can be built.
- Update AsDataFrame for evalresult to be row based so it displays a little better for now
* AlertingNG: base API implementation
* Pass the interface instead of the base impl
* Ruler mock draft (WIP)
* Update alerting-api dependency
* Improve mock implementation