Wavefront supports monitoring time series, histograms, and traces.
- Each time series consists of numeric data points for a metric, for example, CPU load or failed network connections. Time series can use one of the supported data formats. The type of data that you’re collecting determines the type of metric. Wavefront supports gauges, counters, delta counters, and more.
- Wavefront histograms let you compute, store, and use distributions of metrics rather than single metrics. Histograms are useful for high-velocity metrics about your applications and infrastructure, particularly metrics that are gathered across many distributed sources.
- Distributed tracing enables you to track the flow of work that is performed by an application as it processes a user request. We support the OpenTracing standard. You can either visualize and examine traces coming from a third-party system such as Jaeger or Zipkin, or instrument your application for tracing using one of our SDKs.
Summary of Metric Types
The following table gives an overview of metric types. We introduce each type in more detail below.
|Metric|Description|Example|
|---|---|---|
|Gauge|Shows current value for each point in time.|CPU load, network connections|
|Counter|Shows values as they increase (and decrease).|Number of failed connections, registered users|
|Delta counter|Useful for monitoring bursty traffic in a Function-as-a-Service (serverless) environment.|Shows how many times a FaaS function executed (or failed).|
|Histogram|Supports computing, storing, and using distributions of metrics that use the Wavefront histogram format.|Useful for very high frequency data. See the discussion of histograms.|
|Trace|A trace shows you how a request propagates from one microservice to the next in a distributed application. The basic building blocks of a trace are its spans.|You can think of a trace as a tree of related spans. The trace has a unique trace ID, which is shared by each member span in the tree. See "Sample Application" for an example.|
|Span|Spans are the fundamental units of trace data. Each span corresponds to a distinct invocation of an operation that executes as part of the request.|For example, in our BeachShirts sample application, we have the
A gauge shows the current value for each point in time. Think of a thermometer that shows the current temperature or a gauge that shows how much electricity your Tesla has left.
Many metrics that come into Wavefront are gauges. For example, Wavefront internal metrics include
Counters show information over time. Think of a person with a counter at the entrance to a concert. The counter shows the total number of people that have entered so far.
Counter metrics usually increase over time but might reset back to zero, for example, when a service or system restarts. Users can wrap rate() around a counter if they want to ignore temporary 0 values and see only the positive rate of change.
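To make the reset behavior concrete, here is a minimal Python sketch of a per-second rate that skips counter resets. It is an illustration of the idea only, not Wavefront's implementation of rate(); the function name `positive_rate` and the sample values are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, float]  # (epoch seconds, counter value)


def positive_rate(points: List[Point]) -> List[Point]:
    """Per-second rate of increase between consecutive points.

    Assumption: a drop in value means the counter was reset, so that
    interval is skipped instead of being reported as a negative rate.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if v1 < v0 or t1 <= t0:
            continue  # counter reset (or out-of-order timestamp): ignore
        rates.append((t1, (v1 - v0) / (t1 - t0)))
    return rates


# A counter that restarts from zero between t=60 and t=120.
samples = [(0, 100.0), (60, 160.0), (120, 0.0), (180, 30.0)]
print(positive_rate(samples))  # [(60, 1.0), (180, 0.5)]
```

The reset at t=120 produces no data point, so the resulting series contains only positive rates of change.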
Wavefront internal metrics that are counters include
Counter Example (Count Total)
In most cases, you can get the information you need from a counter as follows:
- A counter usually represents something like “how many requests have been processed” or “how many errors happened”. You get the metric like this:
- You use the rate() function to get the corresponding per-second rate so you know, for example, “how many requests have been processed per second?” or “how many errors are happening per second?”:
- There are often multiple time series that have the counter (for example, coming from different sources). Each time series reports the count of the requests received or errors. If you’re interested in the total count across your system, you can use sum() to sum it up into a single time series.
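The last step, combining several per-source series into one, can be sketched in Python as follows. This is a simplified model: it assumes every source reports at the same timestamps, whereas the real sum() function can interpolate missing points. The source names `app-1` and `app-2` and the helper `sum_series` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[int, float]  # (epoch seconds, value)


def sum_series(series_by_source: Dict[str, List[Point]]) -> List[Point]:
    """Add up the values of all series at each timestamp (no interpolation)."""
    totals = defaultdict(float)
    for points in series_by_source.values():
        for ts, value in points:
            totals[ts] += value
    return sorted(totals.items())


# Per-second request rates reported by two hypothetical sources.
per_source_rates = {
    "app-1": [(60, 1.0), (120, 2.0)],
    "app-2": [(60, 0.5), (120, 1.5)],
}
print(sum_series(per_source_rates))  # [(60, 1.5), (120, 3.5)]
```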
Counter Example (Count Total Over Time Period)
If you want to count the total number of occurrences over a certain time period, the syntax is slightly more complex. Because counters commonly reset to zero, you need a query that counts the total number of increments over the time period you’re looking at and ignores any counter resets.
Here, we want to get the number of errors for 1 day.
- We start by wrapping the counter with ratediff(), which, in contrast to rate(), returns the absolute difference between incrementing data points without dividing by the number of seconds between them.
- We use align() to group the data values of the time series into 1-minute buckets.
align(1m, sum, ratediff(ts(the.counter)))
- We use rawsum() to combine all time series into one series without interpolation.
rawsum(align(1m, sum, ratediff(ts(the.counter))))
- Finally, we get the result for 1 day by using the msum() function:
msum(1d, rawsum(align(1m, sum, ratediff(ts(the.counter)))))
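The four steps above can be mimicked in plain Python to see what each stage contributes. This is a sketch under simplifying assumptions, not the query engine's actual behavior: the Python functions below only borrow the names of the query functions, resets are detected as any drop in value, and the sample data is made up.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Point = Tuple[int, float]  # (epoch seconds, value)


def ratediff(points: List[Point]) -> List[Point]:
    """Positive differences between consecutive points; drops (resets) are skipped."""
    return [(t1, v1 - v0) for (_, v0), (t1, v1) in zip(points, points[1:]) if v1 > v0]


def align_sum(points: List[Point], bucket_s: int = 60) -> List[Point]:
    """Group points into fixed buckets and sum the values in each bucket."""
    buckets = defaultdict(float)
    for ts, value in points:
        buckets[ts - ts % bucket_s] += value
    return sorted(buckets.items())


def rawsum(series: Iterable[List[Point]]) -> List[Point]:
    """Sum all series per timestamp, without interpolating missing points."""
    totals = defaultdict(float)
    for points in series:
        for ts, value in points:
            totals[ts] += value
    return sorted(totals.items())


def msum(points: List[Point], window_s: int) -> List[Point]:
    """Moving sum over the trailing window ending at each point."""
    return [
        (ts, sum(v for t, v in points if ts - window_s < t <= ts))
        for ts, _ in points
    ]


# Two hypothetical sources; source_a resets between t=60 and t=90.
source_a = [(0, 5.0), (30, 8.0), (60, 9.0), (90, 2.0), (120, 4.0)]
source_b = [(0, 1.0), (60, 4.0), (120, 10.0)]

per_minute = [align_sum(ratediff(s)) for s in (source_a, source_b)]
daily = msum(rawsum(per_minute), window_s=86400)
print(daily[-1])  # (120, 15.0)
```

The final value, 15, equals the sum of all increments from both sources (6 from source_a, 9 from source_b), and the reset in source_a does not subtract from the total.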
Gauge into Counter
To turn a gauge into a counter, you can use query language functions such as integral(). For example, you could convert ~alert.checking_frequency.My_ID to see the trend in checking frequency instead of the raw data.
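As a rough illustration of the idea, the following Python sketch turns a series of gauge readings into a running total, which behaves like a counter as long as the readings are non-negative. It is a plain cumulative sum and is only an assumption about the general shape of the result; the integral() function may weight or interpolate points differently. The name `running_total` and the sample readings are hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple

Point = Tuple[int, float]  # (epoch seconds, value)


def running_total(points: List[Point]) -> List[Point]:
    """Replace each gauge reading with the sum of all readings so far."""
    times = [ts for ts, _ in points]
    totals = accumulate(value for _, value in points)
    return list(zip(times, totals))


gauge = [(0, 2.0), (60, 3.0), (120, 1.0)]
print(running_total(gauge))  # [(0, 2.0), (60, 5.0), (120, 6.0)]
```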
Delta counters are well suited for the kind of bursty traffic you typically get in a Function-as-a-Service environment. Many functions execute simultaneously and it’s not possible to monitor bursty traffic like that without losing metric points to collision.
For example, instead of one person with a counter standing at a concert entrance, imagine several people, each with a counter, standing at several entrances. No single person can capture the composite count, so you add up the counters. In the same way, the Wavefront service can aggregate delta counter information.
If a metric starts with a delta character, the Wavefront service considers that metric a delta metric. The Wavefront service aggregates delta metric points and stores the aggregated point.
The following illustration compares a counter and a delta counter.
- The counter mycounter sends 3 data points to the Wavefront service. Wavefront stores each value with its timestamp. When you run a query, such as integral(), the Wavefront service fetches the stored values, aggregates them, and returns the result.
- In the delta counter use case, a FaaS environment runs the function in multiple function invocation instances and sends the points to the Wavefront service. The Wavefront service aggregates the points and stores the result. When the user runs a query, the Wavefront service fetches the already aggregated value.
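The difference between the two cases can be modeled with a few lines of Python. This is a simplified sketch, not the service's implementation: it assumes that colliding points on a regular metric overwrite each other, that delta increments are aggregated into one-minute buckets, and that the metric name `∆lambda.invocations` (including the exact delta character) is a made-up example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Increment = Tuple[int, str, float]  # (epoch seconds, metric name, increment)

METRIC = "∆lambda.invocations"  # hypothetical delta metric name

# Four function invocations; three of them report in the same second.
deltas: List[Increment] = [
    (100, METRIC, 1.0),
    (100, METRIC, 1.0),
    (100, METRIC, 1.0),
    (130, METRIC, 1.0),
]


def store_without_aggregation(points: List[Increment]) -> Dict[Tuple[str, int], float]:
    """Model of a regular metric: one value per (metric, timestamp), later points win."""
    stored = {}
    for ts, metric, value in points:
        stored[(metric, ts)] = value
    return stored


def aggregate_deltas(points: List[Increment], bucket_s: int = 60) -> Dict[Tuple[str, int], float]:
    """Model of a delta counter: increments are added up per bucket before storage."""
    stored = defaultdict(float)
    for ts, metric, value in points:
        stored[(metric, ts - ts % bucket_s)] += value
    return dict(stored)


print(sum(store_without_aggregation(deltas).values()))  # 2.0, two invocations lost
print(sum(aggregate_deltas(deltas).values()))           # 4.0, all invocations counted
```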
Wavefront can receive and store metrics at 1 point per second per unique source. However, some scenarios generate metrics even more frequently. Suppose you are measuring the latency of web requests. If you have a lot of traffic at multiple servers, you may have multiple distinct measurements for a given metric, timestamp, and source. Using “normal” metrics, we can’t measure this.
To address high frequency data, Wavefront supports histograms – a mechanism to compute, store, and use distributions of metrics. A Wavefront histogram is a distribution of metrics collected and computed by the Wavefront proxy. Histograms are supported by Wavefront proxy 4.12 and later. Wavefront Histograms describes the histogram format, histogram ports, and some examples.
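To show why a distribution helps here, the following Python sketch groups many latency measurements that share a timestamp into one-minute bins and summarizes each bin. It is an illustration only: the proxy uses its own distribution representation, and the bin size, the summary fields, and the function name `minute_distributions` are assumptions.

```python
from collections import defaultdict
from statistics import fmean, median
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (epoch seconds, latency in ms)


def minute_distributions(samples: List[Sample]) -> Dict[int, Dict[str, float]]:
    """Summarize all samples that fall into the same one-minute bin."""
    bins = defaultdict(list)
    for ts, value in samples:
        bins[ts - ts % 60].append(value)
    return {
        start: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": fmean(values),
            "median": median(values),
        }
        for start, values in sorted(bins.items())
    }


# Three measurements share timestamp 0, which a regular metric could not keep apart.
latencies = [(0, 10.0), (0, 30.0), (0, 20.0), (59, 40.0), (60, 5.0)]
for start, summary in minute_distributions(latencies).items():
    print(start, summary)
```

Every measurement contributes to the summary of its bin, so nothing is lost when several values arrive for the same metric, timestamp, and source.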
Traces and Spans
Wavefront follows the OpenTracing standard for representing and manipulating trace data.
A trace represents an individual workflow in an application. A trace shows you how a particular request propagates through your application or among a set of services.
Spans are the individual segments of work in the trace. A Wavefront trace consists of one or more spans. Each span represents time spent by an operation in a service (often a microservice).
Because requests normally consist of other requests, a trace actually consists of a tree of spans.
Use the Metrics Browser to see which metrics are available in your environment and to hide and redisplay metrics.
To view, hide, and redisplay metrics
Hiding and Unhiding Metrics
You can manually hide metrics from the Metrics Browser and from the autocomplete dropdown associated with queries. Manually hiding a metric or metric namespace does not permanently delete it.
To hide one or more metrics:
To view hidden metrics:
Search this doc set for details on any of the metric types, or read this:
- Delta counters are used by the AWS Lambda Functions Integration and discussed in more detail in AWS Lambda Integration Details.
- Histograms are useful for capturing distributions of metrics in high-velocity environments. We support a set of query language functions just for histograms.
- Our Tracing UI lets you drill down from the service level to the individual spans and examine outliers to find bottlenecks.