mirror of
https://github.com/grafana/grafana.git
synced 2025-01-21 22:13:38 -06:00
Docs: Break down alerting HA topics (#48143)
* Initial commit
* Added some refinement to the alerting HA topics.
* Update docs/sources/administration/set-up-for-high-availability.md

Co-authored-by: Christopher Moyer <35463610+chri2547@users.noreply.github.com>

* Updates from Chris's review. Also fixed a couple of broken relrefs
* Ran prettier

Co-authored-by: Christopher Moyer <35463610+chri2547@users.noreply.github.com>
parent
53e9bf47db
commit
7311c9757a
@ -20,17 +20,15 @@ First, you need to set up MySQL or Postgres on another server and configure Graf
|
||||
You can find the configuration for doing that in the [[database]]({{< relref "../administration/configuration.md#database" >}}) section in the Grafana config.
|
||||
Grafana will now persist all long term data in the database. How to configure the database for high availability is out of scope for this guide. We recommend finding an expert on the database you're using.
|
||||
|
||||
## Alerting
|
||||
## Alerting high availability
|
||||
|
||||
**Grafana 8 alerts**
|
||||
Grafana alerting provides a new [highly-available model]({{< relref "../alerting/unified-alerting/high-availability/_index.md" >}}). It also preserves the semantics of legacy dashboard alerting by executing all alerts on every server and by sending notifications only once per alert. Load distribution between servers is not supported at this time.
|
||||
|
||||
Grafana 8 Alerts provides a new highly-available model under the hood. It preserves the previous semantics by executing all alerts on every server and notifications are sent only once per alert. There is no support for load distribution between servers at this time.
|
||||
|
||||
For configuration, [follow the guide]({{< relref "../alerting/unified-alerting/high-availability.md" >}}).
|
||||
For instructions on setting up alerting high availability, see [enable alerting high availability]({{< relref "../alerting/unified-alerting/high-availability/enable-alerting-ha.md" >}}).
|
||||
|
||||
**Legacy dashboard alerts**
|
||||
|
||||
Legacy Grafana alerting supports a limited form of high availability. [Alert notifications]({{< relref "../alerting/old-alerting/notifications.md" >}}) are deduplicated when running multiple servers. This means all alerts are executed on every server but alert notifications are only sent once per alert. Grafana does not support load distribution between servers.
|
||||
Legacy Grafana alerting supports a limited form of high availability. In this model, [alert notifications]({{< relref "../alerting/old-alerting/notifications.md" >}}) are deduplicated when running multiple servers. This means all alerts are executed on every server, but alert notifications are only sent once per alert. Grafana does not support load distribution between servers.
|
||||
|
||||
## Grafana Live
|
||||
|
||||
|
@ -8,7 +8,7 @@ weight = 113
|
||||
|
||||
Grafana 8.0 has new and improved alerting that centralizes alerting information in a single, searchable view. It is enabled by default for all new OSS instances, and is an [opt-in]({{< relref "./opt-in.md" >}}) feature for older installations that still use legacy dashboard alerting. We encourage you to create issues in the Grafana GitHub repository for bugs found while testing Grafana alerting. See also, [What's New with Grafana alerting]({{< relref "./difference-old-new.md" >}}).
|
||||
|
||||
> Refer to [Fine-grained access control]({{< relref "../enterprise/access-control/_index.md" >}}) in Grafana Enterprise to learn more about controlling access to alerts using fine-grained permissions.
|
||||
> Refer to [Fine-grained access control]({{< relref "../../enterprise/access-control/_index.md" >}}) in Grafana Enterprise to learn more about controlling access to alerts using fine-grained permissions.
|
||||
|
||||
When Grafana alerting is enabled, you can:
|
||||
|
||||
|
@ -1,44 +0,0 @@
|
||||
+++
|
||||
title = " High availability"
|
||||
description = "High Availability"
|
||||
keywords = ["grafana", "alerting", "tutorials", "ha", "high availability"]
|
||||
weight = 450
|
||||
+++
|
||||
|
||||
# High availability
|
||||
|
||||
The Grafana alerting system has two main components: a `Scheduler` and an internal `Alertmanager`. The `Scheduler` is responsible for the evaluation of your [alert rules]({{< relref "./fundamentals/evaluate-grafana-alerts.md" >}}) while the internal Alertmanager takes care of the **routing** and **grouping**.
|
||||
|
||||
When it comes to running Grafana alerting in high availability the operational mode of the scheduler is unaffected such that all alerts continue be evaluated in each Grafana instance. Rather the operational change happens in the Alertmanager which **deduplicates** alert notifications across Grafana instances.
|
||||
|
||||
{{< figure src="/static/img/docs/alerting/unified/high-availability-ua.png" class="docs-image--no-shadow" max-width= "750px" caption="High availability" >}}
|
||||
|
||||
The coordination between Grafana instances happens via [a Gossip protocol](https://en.wikipedia.org/wiki/Gossip_protocol). Alerts are not gossiped between instances. It is expected that each scheduler delivers the same alerts to each Alertmanager.
|
||||
|
||||
The two types of messages that are gossiped between instances are:
|
||||
|
||||
- Notification logs: Who (which instance) notified what (which alert)
|
||||
- Silences: If an alert should fire or not
|
||||
|
||||
These two states are persisted in the database periodically and when Grafana is gracefully shutdown.
|
||||
|
||||
## Enable high availability
|
||||
|
||||
To enable high availability support you need to add at least 1 Grafana instance to the [`[ha_peer]` configuration option]({{<relref"../../administration/configuration.md#unified_alerting">}}) within the `[unified_alerting]` section:
|
||||
|
||||
1. In your custom configuration file ($WORKING_DIR/conf/custom.ini), go to the `[unified_alerting]` section.
|
||||
2. Set `[ha_peers]` to the number of hosts for each grafana instance in the cluster (using a format of host:port) e.g. `ha_peers=10.0.0.5:9094,10.0.0.6:9094,10.0.0.7:9094`
|
||||
3. Gossiping of notifications and silences uses both TCP and UDP port 9094. Each Grafana instance will need to be able to accept incoming connections on these ports.
|
||||
4. Set `[ha_listen_address]` to the instance IP address using a format of host:port (or the [Pod's](https://kubernetes.io/docs/concepts/workloads/pods/) IP in the case of using Kubernetes) by default it is set to listen to all interfaces (`0.0.0.0`).
|
||||
|
||||
## Kubernetes
|
||||
|
||||
If you are using Kubernetes, you can expose the pod IP [through an environment variable](https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/) via the container definition such as:
|
||||
|
||||
```yaml
env:
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
```
|
@ -0,0 +1,25 @@
|
||||
+++
|
||||
title = " About alerting high availability"
|
||||
description = "High availability"
|
||||
keywords = ["grafana", "alerting", "tutorials", "ha", "high availability"]
|
||||
weight = 450
|
||||
+++
|
||||
|
||||
# About alerting high availability
|
||||
|
||||
The Grafana alerting system has two main components: a `Scheduler` and an internal `Alertmanager`. The `Scheduler` evaluates your [alert rules]({{< relref "../fundamentals/evaluate-grafana-alerts.md" >}}), while the internal Alertmanager manages **routing** and **grouping**.
|
||||
|
||||
When running Grafana alerting in high availability, the operational mode of the scheduler remains unaffected, and each Grafana instance evaluates all alerts. The operational change happens in the Alertmanager when it deduplicates alert notifications across Grafana instances.
|
||||
|
||||
{{< figure src="/static/img/docs/alerting/unified/high-availability-ua.png" class="docs-image--no-shadow" max-width= "750px" caption="High availability" >}}
|
||||
|
||||
The coordination between Grafana instances happens via [a Gossip protocol](https://en.wikipedia.org/wiki/Gossip_protocol). Alerts are not gossiped between instances and each scheduler delivers the same volume of alerts to each Alertmanager.
|
||||
|
||||
The two types of messages gossiped between Grafana instances are:
|
||||
|
||||
- Notification logs: Who (which instance) notified what (which alert).
|
||||
- Silences: Whether notifications for an alert are suppressed.
|
||||
|
||||
The notification logs and silences are persisted in the database periodically and during a graceful Grafana shutdown.
|
||||
|
||||
For configuration instructions, refer to [enable alerting high availability]({{< relref "./enable-alerting-ha.md" >}}).
|
@ -0,0 +1,36 @@
|
||||
+++
|
||||
title = "Enable alerting high availability"
|
||||
description = "Enable alerting high availability"
|
||||
keywords = ["grafana", "alerting", "tutorials", "ha", "high availability"]
|
||||
weight = 450
|
||||
+++
|
||||
|
||||
# Enable alerting high availability
|
||||
|
||||
You can enable [alerting high availability]({{< relref "./_index.md" >}}) support by updating the Grafana configuration file. On Kubernetes, you can enable alerting high availability by updating the Kubernetes container definition.
|
||||
|
||||
## Update Grafana configuration file
|
||||
|
||||
### Before you begin
|
||||
|
||||
Since gossiping of notifications and silences uses both TCP and UDP port `9094`, ensure that each Grafana instance is able to accept incoming connections on these ports.
|
||||
|
||||
**To enable high availability support:**
|
||||
|
||||
1. In your custom configuration file ($WORKING_DIR/conf/custom.ini), go to the `[unified_alerting]` section.
|
||||
2. Set `ha_peers` to a comma-separated list of the Grafana instances in the cluster, each in `host:port` format, for example, `ha_peers=10.0.0.5:9094,10.0.0.6:9094,10.0.0.7:9094`.
|
||||
   You must add at least one (1) Grafana instance to the `ha_peers` option.
|
||||
3. Set `ha_listen_address` to the instance IP address in `host:port` format (or to the [Pod's](https://kubernetes.io/docs/concepts/workloads/pods/) IP if you are running on Kubernetes).
|
||||
By default, it is set to listen to all interfaces (`0.0.0.0`).
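
Taken together, the steps above might produce a section like the following. This is a minimal sketch: it reuses the example addresses from step 2 and assumes, purely for illustration, that this file belongs to the instance at `10.0.0.5`.

```ini
; $WORKING_DIR/conf/custom.ini on the instance assumed to be 10.0.0.5
[unified_alerting]
; Address this instance listens on for gossip traffic (TCP and UDP).
ha_listen_address = 10.0.0.5:9094
; Comma-separated host:port list of the Grafana instances in the cluster.
ha_peers = 10.0.0.5:9094,10.0.0.6:9094,10.0.0.7:9094
```

In this sketch, the other two instances would keep the same `ha_peers` value and set `ha_listen_address` to their own address.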
|
||||
|
||||
## Update Kubernetes container definition
|
||||
|
||||
If you are using Kubernetes, you can expose the pod IP [through an environment variable](https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/) via the container definition such as:
|
||||
|
||||
```yaml
env:
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
```
|
@ -13,7 +13,7 @@ Dashboard snapshots are static . Queries and expressions cannot be re-executed f
|
||||
Before you begin, ensure that you have configured a data source. See also:
|
||||
|
||||
- [Working with Grafana dashboard UI]({{< relref "./dashboard-ui/_index.md" >}})
|
||||
- [Dashboard folders]({{< relref "./dashboard_folders.md" >}})
|
||||
- [Dashboard folders]({{< relref "./dashboard-folders.md" >}})
|
||||
- [Create dashboard]({{< relref "./dashboard-create" >}})
|
||||
- [Manage dashboards]({{< relref "./dashboard-manage.md" >}})
|
||||
- [Annotations]({{< relref "./annotations.md" >}})
|
||||
@ -22,7 +22,7 @@ Before you begin, ensure that you have configured a data source. See also:
|
||||
- [Keyboard shortcuts]({{< relref "./shortcuts.md" >}})
|
||||
- [Reporting]({{< relref "./reporting.md" >}})
|
||||
- [Time range controls]({{< relref "./time-range-controls.md" >}})
|
||||
- [Dashboard version history]({{< relref "./dashboard_history.md" >}})
|
||||
- [Dashboard version history]({{< relref "./dashboard-history.md" >}})
|
||||
- [Dashboard export and import]({{< relref "./export-import.md" >}})
|
||||
- [Dashboard JSON model]({{< relref "./json-model.md" >}})
|
||||
- [Scripted dashboards]({{< relref "./scripted-dashboards.md" >}})
|
||||
|
@ -13,7 +13,7 @@ Grafana supports user authentication through Okta, which is useful when you want
|
||||
## Before you begin
|
||||
|
||||
- To configure SAML integration with Okta, create integration inside the Okta organization first. [Add integration in Okta](https://help.okta.com/en/prod/Content/Topics/Apps/apps-overview-add-apps.htm)
|
||||
- Ensure you have permission to administer SAML authentication. For more information about permissions, refer to [About users and permissions]({{< relref "../manage-users-and-permissions/about-users-and-permissions.md#">}}).
|
||||
- Ensure you have permission to administer SAML authentication. For more information about permissions, refer to [About users and permissions]({{< relref "../../administration/manage-users-and-permissions/about-users-and-permissions.md#">}}).
|
||||
|
||||
**To set up SAML with Okta:**
|
||||
|
||||
|
@ -214,7 +214,7 @@ This release includes a series of features that build on our new usage analytics
|
||||
|
||||
### SAML Role and Team Sync
|
||||
|
||||
SAML support in Grafana Enterprise is improved by adding Role and Team Sync. Read more about how to use these features in the [SAML team sync documentation]({{< relref "../enterprise/saml.md#configure-team-sync" >}}).
|
||||
SAML support in Grafana Enterprise is improved by adding Role and Team Sync. Read more about how to use these features in the [SAML team sync documentation]({{< relref "../enterprise/saml/configure-saml.md#configure-team-sync" >}}).
|
||||
|
||||
### Okta OAuth Team Sync
|
||||
|
||||
|
@ -202,7 +202,7 @@ For more information, refer to [Export logs of usage insights]({{< relref "../en
|
||||
|
||||
### New audit log events
|
||||
|
||||
New log out events are logged based on when a token expires or is revoked, as well as [SAML Single Logout]({{< relref "../enterprise/saml.md#single-logout" >}}). A `tokenId` field was added to all audit logs to help understand which session was logged out of.
|
||||
New log out events are logged based on when a token expires or is revoked, as well as [SAML Single Logout]({{< relref "../enterprise/saml/configure-saml.md#single-logout" >}}). A `tokenId` field was added to all audit logs to help understand which session was logged out of.
|
||||
|
||||
Also, a counter for audit log writing actions with status (success / failure) and logger (loki / file / console) labels was added.
|
||||
|
||||
|