Some documentation fixes (#39951)

* Some documentation fixes

* Address review comments

* Fixed lint issues using prettifier.

Co-authored-by: achatterjee-grafana <aparajita.chatterjee@grafana.com>
This commit is contained in:
Michael D 2021-10-04 21:03:26 +02:00 committed by GitHub
parent 562cd9e44e
commit ccd7f8a5ba
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 34 additions and 33 deletions

View File

@ -27,11 +27,11 @@ Here is an example showing height distribution of people.
For more information about histogram visualization options, refer to [Histogram]({{< relref "../visualizations/histogram.md" >}}).
Histograms only look at _value distributions_ over a specific time range. The problem with histograms is you cannot see any trends or changes in the distribution over time. This is where heatmaps become useful.
Histograms only look at _value distributions_ over a specific time range. The problem with histograms is that you cannot see any trends or changes in the distribution over time. This is where heatmaps become useful.
## Heatmaps
A _heatmap_ is like a histogram, but over time where each time slice represents its own histogram. Instead of using bar height as a representation of frequency, it uses cells and colors the cell proportional to the number of values in the bucket.
A _heatmap_ is like a histogram, but over time, where each time slice represents its own histogram. Instead of using bar height as a representation of frequency, it uses cells, and colors the cell proportional to the number of values in the bucket.
In this example, you can clearly see what values are more common and how they trend over time.
@ -41,22 +41,23 @@ For more information about heatmap visualization options, refer to [Heatmap]({{<
## Pre-bucketed data
There are a number of data sources supporting histogram over time like Elasticsearch (by using a Histogram bucket
There are a number of data sources supporting histogram over time, like Elasticsearch (by using a Histogram bucket
aggregation) or Prometheus (with [histogram](https://prometheus.io/docs/concepts/metric_types/#histogram) metric type
and _Format as_ option set to Heatmap). But generally, any data source could be used if it meets the requirements:
returns series with names representing bucket bound or returns series sorted by the bound in ascending order.
and _Format as_ option set to Heatmap). But generally, any data source could be used as long as it meets the requirement
that it either returns series with names representing bucket bounds, or that it returns series sorted by the bounds
in ascending order.
## Raw data vs aggregated
If you use the heatmap with regular time series data (not pre-bucketed), then it's important to keep in mind that your data
is often already aggregated by your time series backend. Most time series queries do not return raw sample data
but include a group by time interval or maxDataPoints limit coupled with an aggregation function (usually average).
is often already aggregated by your time series backend. Most time series queries do not return raw sample data,
but instead include a group by time interval or maxDataPoints limit coupled with an aggregation function (usually average).
This all depends on the time range of your query of course. But the important point is to know that the histogram bucketing
that Grafana performs might be done on already aggregated and averaged data. To get more accurate heatmaps it is better
to do the bucketing during metric collection or store the data in Elasticsearch, or in the other data source which
that Grafana performs might be done on already aggregated and averaged data. To get more accurate heatmaps, it is better
to do the bucketing during metric collection, or to store the data in Elasticsearch or any other data source which
supports doing histogram bucketing on the raw data.
If you remove or lower the group by time (or raise maxDataPoints) in your query to return more data points your heatmap will be
more accurate but this can also be very CPU and memory taxing for your browser and could cause hangs and crashes if the number of
If you remove or lower the group by time (or raise maxDataPoints) in your query to return more data points, your heatmap will be
more accurate, but this can also be very CPU and memory taxing for your browser, possibly causing hangs or crashes if the number of
data points becomes unreasonably large.

View File

@ -22,7 +22,7 @@ To identify unique series within a set of time series, Grafana stores dimensions
## Labels
Each time series in Grafana optionally has labels. labels are set a of key/value pairs for identifying dimensions. Example labels could are `{location=us}` or `{country=us,state=ma,city=boston}`. Within a set of time series, the combination of its name and labels identifies each series. For example, `temperature {country=us,state=ma,city=boston}`.
Each time series in Grafana optionally has labels. Labels are set a of key/value pairs for identifying dimensions. Example labels could be `{location=us}` or `{country=us,state=ma,city=boston}`. Within a set of time series, the combination of its name and labels identifies each series. For example, `temperature {country=us,state=ma,city=boston}` could identify the series of temperature values for the city of Boston in the US.
Different sources of time series data have dimensions stored natively, or common storage patterns that allow the data to be extracted into dimensions.
@ -32,7 +32,7 @@ In table databases such SQL, these dimensions are generally the `GROUP BY` param
## Multiple dimensions in table format
In SQL or SQL-like databases that return table responses, additional dimensions usually columns in the query response table.
In SQL or SQL-like databases that return table responses, additional dimensions are usually represented as columns in the query response table.
### Single dimension
@ -44,7 +44,7 @@ SELECT BUCKET(StartTime, 1h), AVG(Temperature) AS Temp, Location FROM T
ORDER BY time asc
```
Might return a table with three columns that each respectively have data types time, number, and string.
This query would return a table with three columns with data types time, number, and string respectively:
| StartTime | Temp | Location |
| --------- | ---- | -------- |
@ -53,7 +53,7 @@ Might return a table with three columns that each respectively have data types t
| 10:00 | 26 | LGA |
| 10:00 | 22 | BOS |
The table format is _long_ formatted time series, also called _tall_. It has repeated time stamps, and repeated values in Location. In this case, we have two time series in the set that would be identified as `Temp {Location=LGA}` and `Temp {Location=BOS}`.
The table format is a _long_ formatted time series, also called _tall_. It has repeated time stamps, and repeated values in Location. In this case, we have two time series in the set that would be identified as `Temp {Location=LGA}` and `Temp {Location=BOS}`.
Individual time series from the set are extracted by using the time typed column `StartTime` as the time index of the time series, the numeric typed column `Temp` as the series name, and the name and values of the string typed `Location` column to build the labels, such as Location=LGA.
@ -80,6 +80,6 @@ In this case the labels that represent the dimensions will have two keys based o
### Multiple values
In the case SQL-like data sources, more than one numeric column can be selected, with or without additional string columns to be used as dimensions. For example, `AVG(Temperature) AS AvgTemp, MAX(Temperature) AS MaxTemp`. This, if combined with multiple dimensions can result in a lot of series. Selecting multiple values is currently only designed to be used with visualization.
In the case of SQL-like data sources, more than one numeric column can be selected, with or without additional string columns to be used as dimensions. For example, `AVG(Temperature) AS AvgTemp, MAX(Temperature) AS MaxTemp`. This, if combined with multiple dimensions, can result in a lot of series. Selecting multiple values is currently only designed to be used with visualization.
Additional technical information on tabular time series formats and how dimensions are extracted can be found in [the developer documentation on data frames as time series]({{< relref "../developers/plugins/data-frames.md#data-frames-as-time-series" >}}).

View File

@ -15,9 +15,9 @@ Imagine you wanted to know how the temperature outside changes throughout the da
| 10:00 | 26°C |
| 11:00 | 27°C |
Temperature data like this is one example of what we call a _time series_—a sequence of measurements, ordered in time. Every row in the table represents one individual measurement at a specific time.
Temperature data like this is one example of what we call a _time series_ a sequence of measurements, ordered in time. Every row in the table represents one individual measurement at a specific time.
Tables are useful when you want to identify individual measurements but make it difficult to see the big picture. A more common visualization for time series is the _graph_, which instead places each measurement along a time axis. Visual representations like the graph make it easier to discover patterns and features of the data that otherwise would be difficult to see.
Tables are useful when you want to identify individual measurements, but they make it difficult to see the big picture. A more common visualization for time series is the _graph_, which instead places each measurement along a time axis. Visual representations like the graph make it easier to discover patterns and features of the data that otherwise would be difficult to see.
{{< figure src="/static/img/docs/example_graph.png" class="docs-image--no-shadow" max-width="850px" >}}
@ -29,14 +29,14 @@ Temperature data like the one in the example, is far from the only example of a
While each of these examples are sequences of chronologically ordered measurements, they also share other attributes:
- New data is appended at the end, at regular intervals—for example, hourly at 09:00, 10:00, 11:00, and so on.
- Measurements are seldom updated after they were added—for example, yesterday's temperature doesn't change.
- New data is appended at the end, at regular intervals for example, hourly at 09:00, 10:00, 11:00, and so on.
- Measurements are seldom updated after they were added for example, yesterday's temperature doesn't change.
Time series are powerful. They help you understand the past by letting you analyze the state of the system at any point in time. Time series could tell you that the server crashed moments after the free disk space went down to zero.
Time series can also help you predict the future, by uncovering trends in your data. If the number of registered users has been increasing monthly by 4% for the past few months, you can predict how big your user base is going to be at the end of the year.
Some time series have patterns that repeat themselves over a known period. For example, the temperature is typically higher during the day, before it dips down at night. By identifying these periodic, or _seasonal_, time series, you can make confident predictions about the next period. If we know that the system load peaks every day around 18:00, we can add more machines right before.
Some time series have patterns that repeat themselves over a known period. For example, the temperature is typically higher during the day, before it dips down at night. By identifying these periodic, or _seasonal_, time series, you can make confident predictions about the next period. If you know that the system load peaks every day around 18:00, you can add more machines right before.
## Aggregating time series
@ -45,7 +45,7 @@ Depending on what you're measuring, the data can vary greatly. What if you wante
Combining a collection of measurements is called _aggregation_. There are several ways to aggregate time series data. Here are some common ones:
- **Average** returns the sum of all values divided by the total number of values.
- **Min** and **Max** return the smallest, and largest value in the collection.
- **Min** and **Max** return the smallest and largest value in the collection.
- **Sum** returns the sum of all values in the collection.
- **Count** returns the number of values in the collection.
@ -81,26 +81,26 @@ We could even take it a step further, by calculating the deltas of these deltas:
1572524345, +30, -1, +1, +0
```
If measurements are taken at regular intervals, most of these delta-of-deltas will be 0. Because of optimizations like these, TSDBs uses drastically less space than other databases.
If measurements are taken at regular intervals, most of these delta-of-deltas will be 0. Because of optimizations like these, TSDBs use drastically less space than other databases.
Another feature of a TSDB is the ability to filter measurements using _tags_. Each data point is labeled with a tag that adds context information, such as where the measurement was taken. Here's an example of the [InfluxDB data format](https://docs.influxdata.com/influxdb/v1.7/write_protocols/line_protocol_tutorial/#syntax) that demonstrates how each measurement is stored.
```
weather,location=us-midwest temperature=82 1465839830100400200
| -------------------- -------------- |
| | | |
| | | |
+-----------+--------+-+---------+-+---------+
|measurement|,tag_set| |field_set| |timestamp|
+-----------+--------+-+---------+-+---------+
```
Here are some of the TSDBs supported by Grafana:
- [Graphite](https://graphiteapp.org/)
- [InfluxDB](https://www.influxdata.com/products/influxdb-overview/)
- [Prometheus](https://prometheus.io/)
```
weather,location=us-midwest temperature=82 1465839830100400200
| -------------------- -------------- |
| | | |
| | | |
+-----------+--------+-+---------+-+---------+
|measurement|,tag_set| |field_set| |timestamp|
+-----------+--------+-+---------+-+---------+
```
### Collecting time series data
Now that we have a place to store our time series, how do we actually gather the measurements? To collect time series data, you'd typically install a _collector_ on the device, machine, or instance you want to monitor. Some collectors are made with a specific database in mind, and some support different output destinations.