grafana/docs/sources/best-practices/common-observability-strategies.md

+++
title = "Common observability strategies"
description = "Common observability strategies"
keywords = ["grafana", "intro", "guide", "concepts", "methods"]
aliases = ["/docs/grafana/latest/getting-started/strategies/"]
weight = 300
+++

# Common observability strategies

When you have a lot to monitor, like a server farm, you need a strategy to decide what is important enough to monitor. This page describes several common methods for choosing what to monitor.

A logical strategy allows you to make uniform dashboards and scale your observability platform more easily.

## Guidelines for usage

- The USE method tells you how happy your machines are, while the RED method tells you how happy your users are.
- USE reports on causes of issues.
- RED reports on user experience and is more likely to report symptoms of problems.
- A best practice for alerting is to alert on symptoms rather than causes, so base your alerts on RED dashboards.
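
As a minimal sketch of alerting on symptoms, the following Python function checks two user-facing measurements against thresholds. The function name, parameter names, and default thresholds are hypothetical and chosen only for illustration; they are not part of Grafana.

```python
def symptom_alerts(error_ratio, p99_latency_seconds,
                   max_error_ratio=0.01, max_p99_latency_seconds=0.5):
    """Return the user-facing symptoms that breach their thresholds.

    The thresholds are illustrative assumptions, not recommended values.
    """
    alerts = []
    if error_ratio > max_error_ratio:
        alerts.append("error ratio too high")
    if p99_latency_seconds > max_p99_latency_seconds:
        alerts.append("p99 latency too high")
    return alerts


if __name__ == "__main__":
    # 5% of requests fail, but latency is within the assumed limit.
    print(symptom_alerts(error_ratio=0.05, p99_latency_seconds=0.2))
```

Cause-level data, such as CPU utilization, would then be consulted on a USE dashboard after a symptom alert fires, rather than triggering the alert itself.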

## USE method

USE stands for:

- **Utilization -** Percent time the resource is busy, such as node CPU usage
- **Saturation -** Amount of extra work a resource cannot service yet, often measured as queue length or node load
- **Errors -** Count of error events

This method is best for hardware resources in infrastructure, such as CPU, memory, and network devices. For more information, refer to [The USE Method](http://www.brendangregg.com/usemethod.html).
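
The three USE measurements can be sketched in Python as follows. This is an illustration under stated assumptions: the `ResourceSample` type, its fields, and the `use_summary` function are hypothetical names, and saturation is approximated here as the average queue length across samples.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    """One hypothetical polling interval for a single resource, such as a CPU."""
    interval_seconds: float  # length of the polling interval
    busy_seconds: float      # time the resource spent doing work
    queue_length: int        # work items waiting at the end of the interval
    error_count: int         # error events observed during the interval


def use_summary(samples):
    """Summarize Utilization, Saturation, and Errors over a list of samples."""
    total_time = sum(s.interval_seconds for s in samples)
    if total_time <= 0:
        raise ValueError("samples must cover a positive amount of time")
    busy_time = sum(s.busy_seconds for s in samples)
    return {
        # Utilization: percent of time the resource was busy.
        "utilization_percent": 100.0 * busy_time / total_time,
        # Saturation: average amount of queued work (an assumed proxy).
        "saturation_avg_queue": sum(s.queue_length for s in samples) / len(samples),
        # Errors: count of error events.
        "errors": sum(s.error_count for s in samples),
    }


if __name__ == "__main__":
    samples = [
        ResourceSample(interval_seconds=10, busy_seconds=5, queue_length=2, error_count=0),
        ResourceSample(interval_seconds=10, busy_seconds=10, queue_length=4, error_count=3),
    ]
    print(use_summary(samples))
```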

## RED method

RED stands for:

- **Rate -** Requests per second
- **Errors -** Number of requests that are failing
- **Duration -** Amount of time these requests take, expressed as a distribution of latency measurements

This method is most applicable to services, especially in a microservices environment. For each of your services, instrument the code to expose these metrics for each component. RED dashboards are good for alerting and SLAs. A well-designed RED dashboard is a proxy for user experience.

For more information, refer to Tom Wilkie's blog post [The RED method: How to instrument your services](https://grafana.com/blog/2018/08/02/the-red-method-how-to-instrument-your-services).
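
To make the three RED measurements concrete, here is a minimal Python sketch that derives them from requests observed in a time window. The `Request` type, the `red_summary` function, and the choice of nearest-rank percentiles are assumptions for illustration; real services typically export these values through a metrics library instead of computing them by hand.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """A hypothetical record of one handled request."""
    duration_seconds: float
    failed: bool


def percentile(sorted_values, fraction):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(fraction * len(sorted_values)))
    return sorted_values[rank - 1]


def red_summary(requests, window_seconds):
    """Compute Rate, Errors, and Duration for requests seen in one window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    durations = sorted(r.duration_seconds for r in requests)
    failures = sum(1 for r in requests if r.failed)
    return {
        # Rate: requests per second.
        "rate_per_second": len(requests) / window_seconds,
        # Errors: failing requests per second.
        "errors_per_second": failures / window_seconds,
        # Duration: points on the latency distribution (None if no traffic).
        "duration_p50_seconds": percentile(durations, 0.50) if durations else None,
        "duration_p99_seconds": percentile(durations, 0.99) if durations else None,
    }


if __name__ == "__main__":
    window = [
        Request(0.1, False),
        Request(0.2, False),
        Request(0.3, True),
        Request(0.4, False),
    ]
    print(red_summary(window, window_seconds=2))
```

Reporting percentiles rather than a single average keeps slow outliers visible, which matters when the dashboard is meant to reflect user experience.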

## The Four Golden Signals

According to the [Google SRE book](https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/#xref_monitoring_golden-signals), if you can measure only four metrics of your user-facing system, focus on these four.

This method is similar to the RED method, but it includes saturation.

- **Latency -** Time taken to serve a request
- **Traffic -** How much demand is placed on your system
- **Errors -** Rate of requests that are failing
- **Saturation -** How "full" your system is
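
The four signals can be sketched in Python as follows. The function and parameter names are hypothetical, latency is summarized by the median for brevity, and saturation is assumed to be the fraction of some capacity (for example, worker slots) currently in use.

```python
import statistics


def golden_signals(durations_seconds, failed_count, window_seconds,
                   in_use, capacity):
    """Compute the four golden signals for one observation window.

    All inputs are illustrative assumptions about what a service records.
    """
    if window_seconds <= 0 or capacity <= 0:
        raise ValueError("window_seconds and capacity must be positive")
    return {
        # Latency: time taken to serve a request (median, None if no traffic).
        "latency_seconds": statistics.median(durations_seconds) if durations_seconds else None,
        # Traffic: demand on the system, in requests per second.
        "traffic_per_second": len(durations_seconds) / window_seconds,
        # Errors: failing requests per second.
        "errors_per_second": failed_count / window_seconds,
        # Saturation: how "full" the system is, as a fraction of capacity.
        "saturation": in_use / capacity,
    }


if __name__ == "__main__":
    print(golden_signals([0.1, 0.2, 0.3], failed_count=1, window_seconds=3,
                         in_use=8, capacity=10))
```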

For an example, refer to [The Four Golden Signals dashboard on Grafana Play](https://play.grafana.org/d/000000109/the-four-golden-signals?orgId=1).