+++
title = "Histograms and heatmaps"
description = "An introduction to histograms and heatmaps"
keywords = ["grafana", "heatmap", "panel", "documentation", "histogram"]
aliases = ["/docs/grafana/latest/getting-started/intro-histograms"]
weight = 700
+++
# Introduction to histograms and heatmaps
A histogram is a graphical representation of the distribution of numerical data. It groups values into buckets
(sometimes also called bins) and then counts how many values fall into each bucket.
Instead of graphing the actual values, histograms graph the buckets. Each bar represents a bucket,
and the bar height represents the frequency (such as count) of values that fell into that bucket's interval.
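
The bucketing described above can be sketched in a few lines of Python. This is only an illustration of the concept, not how Grafana computes histograms; the function name `histogram`, the sample values, and the bucket size of 20 are assumptions made for the example.

```python
from collections import Counter


def histogram(values, bucket_size):
    """Count how many values fall into each bucket of width bucket_size."""
    counts = Counter()
    for v in values:
        # Each bucket covers the interval [lower, lower + bucket_size).
        lower = (v // bucket_size) * bucket_size
        counts[lower] += 1
    # Sort by lower bound so the buckets read left to right.
    return dict(sorted(counts.items()))


# Hypothetical sample values, chosen only for illustration.
values = [243, 251, 262, 265, 268, 271, 274, 279, 284, 297]
buckets = histogram(values, 20)

# Draw one text "bar" per bucket; the bar length is the count.
for lower, count in buckets.items():
    print(f"{lower}-{lower + 20}: {'#' * count}")
```

Each printed row corresponds to one bar of the histogram: the label is the bucket interval and the number of `#` characters is the frequency.
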
## Histogram example
This _histogram_ shows the value distribution of a couple of time series. You can easily see that
most values land between 240 and 300, with a peak between 260 and 280.

![](/static/img/docs/v43/heatmap_histogram.png)

Here is an example showing the height distribution of people.
{{< figure src="/static/img/docs/histogram-panel/histogram-example-v8-0.png" max-width="625px" caption="Bar chart example" >}}

For more information about histogram visualization options, refer to [Histogram]({{< relref "../visualizations/histogram.md" >}}).

Histograms only look at _value distributions_ over a specific time range. The problem with histograms is that you cannot see any trends or changes in the distribution over time. This is where heatmaps become useful.
## Heatmaps
A _heatmap_ is like a histogram, but over time, where each time slice represents its own histogram. Instead of using bar height to represent frequency, it uses cells and colors each cell in proportion to the number of values in the bucket.

In this example, you can clearly see what values are more common and how they trend over time.

![](/static/img/docs/v43/heatmap_histogram_over_time.png)

For more information about heatmap visualization options, refer to [Heatmap]({{< relref "../visualizations/heatmap.md" >}}).
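
As a rough sketch of the idea that a heatmap is one histogram per time slice, the following Python groups samples by both time and value. It is a conceptual illustration under assumed inputs, not Grafana's implementation; the function name `heatmap`, the `(timestamp, value)` sample format, and the step sizes are all hypothetical.

```python
from collections import defaultdict


def heatmap(samples, time_step, bucket_size):
    """Map (time slice start, bucket lower bound) to a count of samples."""
    cells = defaultdict(int)
    for timestamp, value in samples:
        # Snap the timestamp to the start of its time slice (the X axis).
        t = (timestamp // time_step) * time_step
        # Snap the value to the lower bound of its bucket (the Y axis).
        b = (value // bucket_size) * bucket_size
        cells[(t, b)] += 1
    return dict(cells)


# Hypothetical (timestamp in seconds, value) samples.
samples = [(0, 243), (10, 262), (25, 268), (60, 281), (70, 285), (95, 262)]
cells = heatmap(samples, time_step=60, bucket_size=20)

for (t, b), count in sorted(cells.items()):
    print(f"t={t:>3} bucket={b}-{b + 20} count={count}")
```

Each key is one cell of the heatmap, and the count is what determines the cell's color intensity.
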
## Pre-bucketed data
A number of data sources support histograms over time, such as Elasticsearch (by using a Histogram bucket
aggregation) or Prometheus (with the [histogram](https://prometheus.io/docs/concepts/metric_types/#histogram) metric type
and the _Format as_ option set to Heatmap). Generally, any data source can be used as long as it either returns
series with names representing bucket bounds, or returns series sorted by the bounds
in ascending order.
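
To make the requirement concrete, here is a minimal Python sketch of pre-bucketed series whose names are bucket bounds, ordered by bound in ascending order. The series names and counts are hypothetical (the `+Inf` name mimics the upper bound used by Prometheus histograms), and the helper `sort_by_bound` is an assumption for illustration, not part of any Grafana or data source API.

```python
def sort_by_bound(series):
    """Return (name, points) pairs ordered by the numeric bound in each name."""
    # float() also parses "+Inf", so an open-ended top bucket sorts last.
    return sorted(series.items(), key=lambda item: float(item[0]))


# Hypothetical pre-bucketed result: series name = bucket bound,
# values = counts per time slice.
series = {
    "1": [4, 6],
    "+Inf": [5, 7],
    "0.1": [1, 2],
    "0.5": [3, 3],
}

ordered = [name for name, _ in sort_by_bound(series)]
print(ordered)
```

A result shaped like this satisfies both conditions at once: the names carry the bounds, and the series can be put in ascending order.
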
## Raw data vs aggregated
If you use the heatmap with regular time series data (not pre-bucketed), keep in mind that your data
is often already aggregated by your time series backend. Most time series queries do not return raw sample data.
Instead, they include a group by time interval or maxDataPoints limit coupled with an aggregation function (usually average).

How much aggregation happens depends on the time range of your query. The important point is that the histogram bucketing
that Grafana performs might be done on data that is already aggregated and averaged. To get more accurate heatmaps, it is better
to do the bucketing during metric collection, or to store the data in Elasticsearch or any other data source that
supports histogram bucketing on the raw data.
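
The following Python sketch illustrates why bucketing averaged data can be misleading. It is a simplified model under stated assumptions: the raw values are made up, `average_in_groups` stands in for a backend's group by time with an average aggregation, and neither helper corresponds to a real Grafana or data source function.

```python
def bucket_counts(values, bucket_size):
    """Count values per bucket, keyed by each bucket's lower bound."""
    counts = {}
    for v in values:
        lower = (v // bucket_size) * bucket_size
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


def average_in_groups(values, group_size):
    """Stand-in for a backend 'group by time' with an average aggregation."""
    groups = [values[i:i + group_size] for i in range(0, len(values), group_size)]
    return [sum(g) / len(g) for g in groups]


# Hypothetical raw samples that alternate between low and high values.
raw = [100, 300, 110, 290, 120, 280, 105, 295]
averaged = average_in_groups(raw, 2)

raw_hist = bucket_counts(raw, 50)
avg_hist = bucket_counts(averaged, 50)

print("raw:     ", raw_hist)
print("averaged:", avg_hist)
```

In this example the raw data has two distinct clusters, but every averaged point lands in a single middle bucket that contains no raw values at all, so a heatmap built from the averaged data would hide the real distribution.
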

If you remove or lower the group by time (or raise maxDataPoints) in your query to return more data points, your heatmap will be
more accurate, but this can also be very CPU and memory intensive for your browser, possibly causing hangs or crashes if the number of
data points becomes unreasonably large.