# Instrumenting Grafana

This guide provides conventions and best practices for instrumenting Grafana using logs, metrics, and traces.

## Logs

Logs are files that record events, warnings and errors as they occur within a software environment. Most logs include contextual information, such as the time an event occurred and which user or endpoint was associated with it.

### Usage

Use the [pkg/infra/log](/pkg/infra/log/) package to create a named, structured logger. Example:

```go
import (
	"fmt"

	"github.com/grafana/grafana/pkg/infra/log"
)

// Create the named logger once, then attach extra data as key-value pairs.
logger := log.New("my-logger")
logger.Debug("Debug msg")
logger.Info("Info msg")
logger.Warning("Warning msg")
logger.Error("Error msg", "error", fmt.Errorf("BOOM"))
```

### Naming conventions

Name the logger using lowercase characters in snake_case or kebab-case style; for example, `log.New("my-logger")`.

Prefix the logger name with an area name when using different loggers across a feature or related packages; for example, `log.New("plugin.loader")` and `log.New("plugin.client")`.

Start the log message with a capital letter; for example, `logger.Info("Hello world")` instead of `logger.Info("hello world")`. The log message should be an identifier for the log entry. Avoid parameterization in favor of key-value pairs for additional data.

To be consistent with Go identifiers, prefer using camelCase style when naming log keys; for example, `remoteAddr`.

Use the key `error` when logging Go errors; for example, `logger.Error("Something failed", "error", fmt.Errorf("BOOM"))`.

### Validate and sanitize input coming from user input

If log messages or key/value pairs originate from user input, validate and sanitize them.

Be careful not to expose any sensitive information in log messages; for example, secrets and credentials. It's easy to do this by mistake if you include a struct as a value.

### Log levels

When should you use each log level?

- **Debug:** Informational messages of high frequency, less-important messages during normal operations, or both.
- **Info:** Informational messages of low frequency, important messages, or both.
- **Warning:** Use warning messages sparingly. If used, messages should be actionable.
- **Error:** Error messages indicating some operation failed (with an error) and the program didn't have a way to handle the error.

### Contextual logging

Use a contextual logger to include additional key/value pairs attached to `context.Context`. For example, a `traceID` lets you correlate logs with traces, with a common identifier, or both.

You must [Enable tracing in Grafana](#enable-tracing-in-grafana) to get a `traceID`.

For example:

```go
import (
	"context"
	"fmt"

	"github.com/grafana/grafana/pkg/infra/log"
)

var logger = log.New("my-logger")

func doSomething(ctx context.Context) {
	// Derive a logger that carries the key/value pairs attached to ctx.
	ctxLogger := logger.FromContext(ctx)
	ctxLogger.Debug("Debug msg")
	ctxLogger.Info("Info msg")
	ctxLogger.Warning("Warning msg")
	ctxLogger.Error("Error msg", "error", fmt.Errorf("BOOM"))
}
```

### Enable certain log levels for certain loggers

You can enable certain log levels during development to make logging easier. For example, you can enable `debug` for only certain loggers, which minimizes the generated log output and makes it easier to find things. Refer to [[log.filters]](https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/#filters) for information on how to set different levels for specific loggers.

You can also configure multiple loggers.
For example:

```ini
[log]
filters = rendering:debug \
          ; alerting.notifier:debug \
          oauth.generic_oauth:debug \
          ; oauth.okta:debug \
          ; tsdb.postgres:debug \
          ; tsdb.mssql:debug \
          ; provisioning.plugins:debug \
          ; provisioning:debug \
          ; provisioning.dashboard:debug \
          ; provisioning.datasources:debug \
          datasources:debug \
          data-proxy-log:debug
```

## Metrics

Metrics are quantifiable measurements that reflect the health and performance of applications or infrastructure. Consider using metrics to provide real-time insight into the state of resources. If you want to know how responsive your application is or identify anomalies that could be early signs of a performance issue, metrics are a key source of visibility.

### Metric types

See [Prometheus metric types](https://prometheus.io/docs/concepts/metric_types/) for a list and description of the different metric types you can use and when to use them.

There are many possible types of metrics that can be tracked. One popular method for defining metrics is the [RED method](https://grafana.com/blog/2018/08/02/the-red-method-how-to-instrument-your-services/).

### Naming conventions

Use the namespace `grafana` to prefix any defined metric names with `grafana_`. This prefix makes it clear for operators that any metric named `grafana_*` belongs to Grafana.

Use snake_case style when naming metrics; for example, `http_request_duration_seconds` instead of `httpRequestDurationSeconds`.

Use snake_case style when naming labels; for example, `status_code` instead of `statusCode`.

If a metric type is a counter, name it with a `_total` suffix; for example, `http_requests_total`.

If a metric type is a histogram and you're measuring duration, name it with a `_<unit>` suffix; for example, `http_request_duration_seconds`.

If a metric type is a gauge, name it to denote that it's a value that can increase and decrease; for example, `http_request_in_flight`.

### Label values and high cardinality

Be careful with what label values you accept or add.
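
As a minimal sketch of what being careful can look like, the following standard-library-only Go program maps request-derived values onto a small, fixed set before they would be used as label values. The helpers `boundedLabel` and `statusClass` and the `allowedMethods` allow-list are hypothetical names made up for this illustration; they aren't part of Grafana or of any Prometheus client library, and collapsing status codes into classes is only one possible design choice.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedMethods is a hypothetical allow-list used only in this sketch.
var allowedMethods = map[string]struct{}{
	"GET": {}, "POST": {}, "PUT": {}, "PATCH": {}, "DELETE": {},
}

// boundedLabel trims and upper-cases value, then returns it only if it is in
// the allow-list. Anything else becomes "other", so arbitrary input can't
// create new label values.
func boundedLabel(value string, allowed map[string]struct{}) string {
	v := strings.ToUpper(strings.TrimSpace(value))
	if _, ok := allowed[v]; ok {
		return v
	}
	return "other"
}

// statusClass collapses an HTTP status code into "1xx".."5xx", or "unknown"
// for codes outside the 100-599 range, which caps the label at six values.
func statusClass(code int) string {
	if code < 100 || code > 599 {
		return "unknown"
	}
	return fmt.Sprintf("%dxx", code/100)
}

func main() {
	fmt.Println(boundedLabel("get", allowedMethods))
	fmt.Println(boundedLabel("x-custom", allowedMethods))
	fmt.Println(statusClass(404))
}
```

With helpers like these, the number of distinct values for a label is known in advance (six for each helper here), regardless of what clients send.
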
Using or allowing too many label values could result in [high cardinality problems](https://grafana.com/blog/2022/02/15/what-are-cardinality-spikes-and-why-do-they-matter/). If label values originate from user input they should be validated. Use `metricutil.SanitizeLabelName(