Learn how to use Wavefront histograms.

Wavefront histograms let you compute, store, and use distributions of metrics rather than single metrics. Histograms are useful for high-velocity metrics about your applications and infrastructure – particularly those gathered across many distributed sources.

Note: The histogram feature requires a separate license and is not enabled on your cluster by default.

Getting Started

Watch this video for an introduction to histograms:

histograms

The following blog posts give some background information:

Why Use Histograms?

Wavefront can receive and store highly granular metrics at 1 point per second per unique source. However, some scenarios generate even higher frequency data. Suppose you are measuring the latency of web requests. If you have a lot of traffic at multiple servers, you may have multiple distinct measurements for a given metric, timestamp, and source. Using “normal” metrics, we can’t measure this because, rather than metric-timestamp-source mapping to a single value, the composite key maps to a multiset (multiple and possibly duplicate values).

One approach to dealing with high frequency data is to calculate an aggregate statistic, such as a percentile, at each source and send only that data. The problem with this approach is that performing an aggregate of a percentile (such as a 95th percentile from a variety of sources) does not yield an accurate and valid percentile with high velocity metrics. That might mean that even though you have an outlier in some of the source data, it becomes obscured by all the other data.

To address high frequency data, Wavefront supports histograms – a mechanism to compute, store, and use distributions of metrics. A Wavefront histogram is a distribution of metrics collected and computed by the Wavefront proxy. Histograms are supported by Wavefront proxy 4.12 and later. To indicate that metrics should be treated as histogram data, the user must send the metrics to a histogram proxy port instead of the normal metrics port (2878).

Once the proxy forwards the histogram data to the Wavefront service, Wavefront rewrites the names of histogram metrics, which you can query with a set of functions.

Wavefront Histogram Distributions

Wavefront creates distributions by aggregating metrics into bins. The following figure illustrates a distribution of 205 points that range in value from 0 to 120 at t = 1 minute, into bins of size 10.

histogram

The following table lists the distribution of one metric at successive minutes. The first row of the table contains the distribution illustrated in the figure. The following rows show how the distribution evolves over successive minutes.

Time (minute)Distribution (number of points)
1 [2, 1, 9, 20, 31, 40, 40, 29, 19, 10, 2, 2]
2 [2, 1, 9, 22, 31, 38, 41, 28, 17, 11, 3, 2]
3 [1, 2, 10, 21, 31, 39, 40, 29, 19, 10, 1, 2]
4 [2, 1, 9, 19, 29, 40, 41, 31, 20, 10, 1, 2]

The Wavefront histogram bin size is computed using a T-digest algorithm, which retains better accuracy at the distribution edges where outliers typically arise. In the algorithm, bin size is not uniform (unlike the histogram illustrated above). However, the bin size that the algorithm selects is irrelevant.

Wavefront histograms do not store each actual data point value that is fed to it. Instead, histograms store the quantiles calculated from histogram points, which are estimates within a certain margin of error.

Histogram Metric Aggregation Intervals

Wavefront supports aggregating metrics by the minute, hour, or day. Intervals start and end on the minute, hour, or day, depending on the granularity that you choose. For example, day-long intervals start at the beginning of each day, UTC time zone.

The aggregation intervals do not overlap. If you are aggregating by the minute, a value reported at 13:58:37 is assigned to the interval [13:58:00;13:59:00]. If no metrics are sent during an interval, no histogram points are recorded.

Sending Histogram Distributions

A distribution allows you to send multiple points with a single value.

To send histogram data as a distribution:

  • Send to the distribution port listed in the table in Histogram Proxy Ports.

  • Use the following format:

    {!M | !H | !D} [<timestamp>] #<points> <metricValue> [... #<points> <metricValue>]
     <metricName> source=<source> <pointTagKey1>=<value1> ... <pointTagKeyn>=<valuen>
    

    where

    • {!M | !H | !D} identifies the aggregation interval (minute, hour, or day) used when computing the distribution
    • points is the number of points.

    For example:

    !M 1493773500 #20 30 #10 5 request.latency source=appServer1 region=us-west
    

    is a distribution that sends 20 points of the metric request.latency with value 30 and 10 points with value 5, that have been aggregated into minute intervals.

Histogram Example

Suppose you want to send the following points to the Wavefront proxy:

10, 20, 20, 30, 40, 100, 100

If you want an hourly aggregation, you can send those points as a distribution to the histogram distribution listener port (by default 40000):

!H <timestamp> #1 10 #2 20 #1 30 #2 100 my.metric source=s1

Here, you specify:

  • the interval, in this case hours (!H)
  • timestamp (optional)
  • a set of sequences. Each sequence starts with #, followed by the number of points and the value of the points. In this example, we have 2 for 20 and 100 because we’re sending that point twice.
  • metric name
  • source
  • optional point tag keys and values

You can also send the histogram data to one of the histogram proxy ports in Wavefront data format. For this example, we use the hour port (40002). You have to send each point separately and include a timestamp, and all points have to arrive within the hour. For example, if you sent a point in the range 3:00-3:59 with !H, it shows at 3:00 with an hs() query.

my.metric 10 <t1> <source>
my.metric 20 <t2> <source>
my.metric 20 <t3> <source>
my.metric 30 <t4> <source>
my.metric 40 <t5> <source>
my.metric 100 <t6> <source>
my.metric 100 <t7> <source>

The proxy aggregates the points and sends only the histogram distribution to Wavefront. The Wavefront service knows only what each bin is and how many points are in each bin. Wavefront does not store the value of each single histogram point, it computes and stores the quantiles.

You can now apply other functions to the histogram, for example, you can try to find out what the 85th percentile of the histogram is. For this example, you could now write a query like this:

percentile (85, hs(my.metric))

Histogram Configuration

Histograms are supported by Wavefront proxy 4.12 and later. To use histograms, ensure that your data is in histogram format, and set the histogram proxy port to send to. Different ports accept different data formats, as shown in the table below. For information on how to configure proxies, see Advanced Proxy Configuration.

Histogram Proxy Ports

To indicate that you are sending histogram data, send the metrics to one of the histogram proxy ports. You can use 40000 for sending a distribution or select one of the other ports to sending an aggregation interval. For example:

Aggregation Interval or DistributionProxy PropertyDefault ValueData Ingestion Format
distribution histogramDistListenerPorts 40000 Distribution data format
minute histogramMinuteListenerPorts 40001 Wavefront data format
hour histogramHourListenerPorts 40002 Wavefront data format
day histogramDayListenerPorts 40003 Wavefront data format

You can send Wavefront data format histogram data only to a minute, hour, or day port.

  • If you send Wavefront data format histogram data to the distribution port, the points are rejected as invalid input format and logged.
  • If you send Wavefront data format histogram data to port 2878 (instead of a min, hour, or day port), the data is not ingested as histogram data but as regular Wavefront data format metrics.

You can send distribution data format histogram data only to the distribution port. If you send Wavefront distribution data format to min, hour, or day ports, the points are rejected as invalid input format and logged.

Histogram Configuration Properties

Wavefront supports additional histogram configuration properties, shown in the following table. Note the requirements on the state directory and the effect of the two persist properties listed at the bottom of the table.

PropertyDescriptionFormat
histogramStateDirectory Directory for persistent proxy state, must be writable. Before being flushed to Wavefront, histogram data is persisted on the filesystem where the Wavefront proxy resides. If the files are corrupted or the files in the directory can't be accessed, the proxy reports the problem in its log and fails back to using in-memory structures. In this mode, samples can be lost if the proxy terminates without draining its queues. Default: /var/spool/wavefront-proxy. A valid path on the local file system.
persistAccumulator Whether to persist accumulation state. We suggest keeping this setting enabled unless you are not using hour and day level aggregation and consider losing up to 1 minute worth of data during proxy restarts acceptable. Default: true. .
persistMessages Whether to persist received metrics to disk. Default: true. Boolean.
histogramAccumulatorResolveInterval Interval in milliseconds to write back accumulation changes from memory cache to disk. Only applicable when memory cache is enabled. Increasing this setting reduces storage IO pressure but might increase heap memory use. Default: 100. Positive integer.
histogramAccumulatorFlushInterval Interval in milliseconds to check for histograms that need to be sent to Wavefront acccording to their histogramMinuteFlushSecs etc settings. Default: 1000. Positive integer.
histogramAccumulatorFlushMaxBatchSize Max number of histograms to move to the outbound queue in one flush. Default: no limit. Positive integer.
histogramReceiveBufferFlushInterval Sets maximum time in milliseconds that incoming points can stay in the receive buffer when incoming traffic volume is very low. Default: 100. Positive integer.
histogramProcessingQueueScanInterval Interval in milliseconds between checks for new entries in the processing queue. Default: 20. Positive integer.
histogramMinuteListenerPorts TCP ports to listen on for histograms to be aggregated by minute. Default: 40001. Comma-separated list of ports. Can be a single port.
histogramMinuteAccumulators Number of accumulators per minute port. In high traffic environments we recommend that the total number of accumulators per proxy across all utilized ports does not exceed the number of available CPU cores. Default: 2. Positive integer.
histogramMinuteFlushSecs Time-to-live, in seconds, for a minute granularity accumulation on the proxy (before the intermediary is sent to Wavefront). Default: 70. Positive integer.
histogramMinuteAccumulatorSize Expected upper bound of concurrent accumulations: ~ #time series * #parallel reporting bins. Default: 100000. Positive integer.
histogramMinuteCompression A bound on the number of centroids per histogram. Default: 100. Positive integer in the interval. [20;1000].
histogramMinuteMemoryCache Enabling memory cache reduces I/O load with fewer time series and higher frequency data (more than 1 point per second per time series). Default: false. Boolean.
histogramMinuteAvgDigestBytes Average number of bytes in an encoded distribution/accumulation. Default: 32 + histogramMinuteCompression * 7 Positive integer.
histogramMinuteAvgKeyBytes Average number of bytes in a UTF-8 encoded histogram key. Concatenation of metric, source, and point tags. Default: 150. Positive integer.
histogramHourListenerPorts TCP ports to listen on for histograms to be aggregated by hour. Default: 40002. Comma-separated list of ports. Can be a single port.
histogramHourAccumulators Number of accumulators per hour port. In high traffic environments we recommend that the total number of accumulators per proxy across all utilized ports does not exceed the number of available CPU cores. Default: 2. Positive integer.
histogramHourFlushSecs Time-to-live, in seconds, for an hour granularity accumulation on the proxy (before the intermediary is sent to Wavefront). Default: 4200. Positive integer.
histogramHourAccumulatorSize Expected upper bound of concurrent accumulations: ~ #time series * #parallel reporting bins. Default: 100000. Positive integer.
histogramHourCompression A bound on the number of centroids per histogram. Default: 100. Positive integer in the interval [20;1000].
histogramHourMemoryCache Enabling memory cache reduces I/O load with fewer time series and higher frequency data (more than 1 point per second per time series). Default: false. Boolean.
histogramHourAvgDigestBytes Average number of bytes in an encoded distribution/accumulation. Default: 32 + histogramMinuteCompression * 7 Positive integer.
histogramHourAvgKeyBytes Average number of bytes in a UTF-8 encoded histogram key. Concatenation of metric, source, and point tags. Default: 150. Positive integer.
histogramDayListenerPorts TCP ports to listen on for histograms to be aggregated by day. Default: 40003. Comma-separated list of ports.
histogramDayAccumulators Number of accumulators per day port. In high traffic environments we recommend that the total number of accumulators per proxy across all utilized ports does not exceed the number of available CPU cores. Default: 2. Positive integer.
histogramDayFlushSecs Time-to-live, in seconds, for a day granularity accumulation on the proxy (before the intermediary is sent to Wavefront). Default: 18000 (5 hours). Positive integer.
histogramDayAccumulatorSize Expected upper bound of concurrent accumulations: ~ #time series * #parallel reporting bins. Default: 100000. Positive integer.
histogramDayCompression A bound on the number of centroids per histogram. Default: 100. Positive integer in the interval [20;1000].
histogramDayMemoryCache Enabling memory cache reduces I/O load with fewer time series and higher frequency data (more than 1 point per second per time series). Default: false. Boolean.
histogramDayAvgDigestBytes Average number of bytes in an encoded distribution/accumulation. Default: 32 + histogramDayCompression * 7 Positive integer.
histogramDayAvgKeyBytes Average number of bytes in a UTF-8 encoded histogram key. Concatenation of metric, source, and point tags. Default: 150. Positive integer.
histogramDistListenerPorts TCP ports to listen on for ingesting histogram distributions. Default: 40000. Comma-separated list of ports. Can be a single port.
histogramDistAccumulators Number of accumulators per distribution port. In high traffic environments we recommend that the total number of accumulators per proxy across all utilized ports does not exceed the number of available CPU cores. Default: number of available CPU cores. Positive integer.
histogramDistFlushSecs Number of seconds to keep a new distribution bin open for new samples, before the intermediary is sent to Wavefront. Default: 70. Positive integer.
histogramDistAccumulatorSize Expected upper bound of concurrent accumulations: ~ #time series * #parallel reporting bins. Default: 100000. Positive integer.
histogramDistCompression A bound on the number of centroids per histogram. Default: 100. Positive integer in the interval [20;1000].
histogramDistMemoryCache Enabling memory cache reduces I/O load with fewer time series and higher frequency data (Aggregating more than 1 distribution per second per time series). Default: false. Boolean.
histogramDistAvgDigestBytes Average number of bytes in an encoded distribution/accumulation. Default: 32 + histogramDistCompression * 7 Positive integer.
histogramDistAvgKeyBytes Average number of bytes in a UTF-8 encoded histogram key. Concatenation of metric, source, and point tags. Default: 150. Positive integer.

Querying Histogram Metrics

Wavefront follows specific naming conventions for histogram metrics and defines functions to query and summarize histogram metrics.

Histogram Metric Naming

You send metrics using the standard Wavefront data format:

<metricName> <metricValue> [<timestamp>] source=<source> <pointTagKey1>=<value1> ... <pointTagKeyn>=<valuen>

For example, request.latency 20 1484877771 source=<source>.

The Wavefront proxy adds the suffixes .m, .h, or .d to the metric name according to the aggregation interval. For example, if the metric request.latency is aggregated over an hour, the metric will be named: request.latency.h.

Histogram Functions

To query histogram metrics, use hs() queries.

You can apply the following functions to the returned data— percentile, max, min, median, merge, align, and count. For example:

  • percentile(<percentile>, hs(<histogramMetricName>.m))– Returns <histogramMetricName> for the <percentile> percentile aggregated over a minute.
  • max(hs(<histogramMetricName>.m)) – Returns the largest value in <histogramMetricName>
  • median(hs(<histogramMetricName>.m)) – Returns the median of <histogramMetricName>
  • merge(hs(<histogramMetricName>.m)) – Merges the centroids and counts of each series and returns the aggregated result <histogramMetricName>. Because this is an aggregation function, you can also group by point tags. ie: merge(hs(<histogramMetricName>.m),key) where key is a point tag name.
  • align(<timeWindow>,hs(<histogramMetricName>.m)) – Allows the user to merge histograms across time buckets. For example, use align(1h, hs(<histogramMetricName>.m)) to output hourly buckets on a minutely histogram.
  • count(hs(<histogramMetricName>.m)) –- Returns the number of values in a distribution.