Monitor and troubleshoot your Wavefront instance and examine version information.

If system performance seems to be deteriorating, you can examine your Wavefront instance and Wavefront proxy with the Wavefront System Usage dashboard, and look at internal metrics to investigate the problem.

This page discusses monitoring your Wavefront instance. It includes a section about examining versions of dashboards and alerts. See Monitoring Wavefront Proxies for details on investigating proxy issues.

Wavefront Internal Metrics Overview

Wavefront collects several categories of internal metrics. This section gives an overview; see Using Internal Metrics to Optimize Performance below for details.

  • ~alert* - set of metrics that allows you to examine the effect of alerts on your Wavefront instance.
  • ~collector - metrics processed at the collector gateway to the Wavefront instance. Includes spans.
  • ~metric - total unique sources and metrics. You can compute the rate of metric creation from each source.
  • ~proxy - metric rate received and sent from each Wavefront proxy, blocked and rejected metric rates, buffer metrics, and JVM stats of the proxy. Also includes counts of metrics affected by the proxy preprocessor.

    See Monitoring Wavefront Proxies.

  • ~wavefront - set of gauges that track metrics about your use of Wavefront.
  • ~http.api - namespace for looking at API request metrics.
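You can chart any of these internal metrics with a standard ts() query, just as you chart your own metrics. For example, the following query charts the points that the collector reports (one of the ~collector metrics listed above):

    ts(~collector.points.reported)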

If you have an AWS integration, metrics with the following prefix are available:

  • ~externalservices - metric rates, API requests, and events from AWS CloudWatch, AWS CloudTrail, and AWS Metrics+.

There’s also a metric that you can use to monitor ongoing events and make sure that their number does not exceed 1000.

Charts in the Wavefront Usage Integration Dashboard

The Wavefront Usage integration provides the Wavefront System Usage dashboard, which displays metrics that help you find reasons for system slowdown. You can examine many aspects of your Wavefront instance. We’ll look at the following sections here:

  • Overall Data Rate
  • Wavefront Stats
  • AWS Integration
  • Ingest Rate by Source

See Monitoring Wavefront Proxies for details on the following sections:

  • Proxy Health
  • Proxy Troubleshooting

Overall Data Rate

The Overall Data Rate section shows the overall point rate being processed by the Wavefront servers.


These charts use the following metrics; a sample query that recreates the overall ingestion rate follows the list:

  • Data Ingestion Rate
    • ~collector.points.reported – points coming from the proxy.
    • ~collector.direct-ingestion.points.reported – points coming through direct ingestion.
    • ~collector.delta_points.reported – delta counter points.
    • ~externalservices.<*>.points – per-second rate at which Wavefront ingests new points from cloud integrations such as AWS, GCP, and Azure.

    For example, use ~externalservices.ec2.points for the EC2 points.

    • ~externalservices.points.reported – shows how you are billed for external services.
  • Data Scan Rate - ~query.summaries_scanned, the per-second rate at which data points are queried out of Wavefront through dashboards, alerts, custom charts, or API calls.
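If you want to recreate the overall ingestion rate outside this dashboard, one approach is to add the ingestion series together in a single query. The following is a minimal sketch; depending on your environment you might also include the ~externalservices series in the sum:

    sum(ts(~collector.points.reported)) + sum(ts(~collector.direct-ingestion.points.reported)) + sum(ts(~collector.delta_points.reported))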

Wavefront Stats

Charts that track the number of Wavefront users during various time windows, number of dashboards and alerts, and information about the types of alerts.
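The gauges behind these charts live in the ~wavefront namespace described in the overview above. If you want to explore them in a chart of your own, a wildcard query is a reasonable starting point; the exact gauge names can vary between Wavefront releases, so treat this as a sketch:

    ts(~wavefront.*)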


AWS Integration

If you have an AWS integration and are ingesting AWS CloudWatch, CloudTrail, and AWS Metrics+ metrics into Wavefront, this section monitors the count of CloudWatch requests, API requests, the point rate, and events coming in from your integration.


The available metrics for the AWS integration are:

  • ~externalservices.cloudwatch.api-requests - number of CloudWatch API requests
  • ~externalservices.cloudwatch.points - number of CloudWatch metrics returned
  • ~externalservices.ec2.points - number of AWS Metrics+ metrics returned
  • ~externalservices.cloudtrail.events - number of CloudTrail events returned
  • ~externalservices.cloudwatch-cycle-timer - time in milliseconds CloudWatch requests take to complete
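For example, to keep an eye on how many CloudWatch API requests the integration issues, you can chart the request metric directly. This is a sketch; you may prefer to wrap it in rate() depending on how the counter is reported in your environment:

    ts(~externalservices.cloudwatch.api-requests)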

Ingest Rate by Source

This section gives insight into the shape of your data. It shows the total number of sources reporting. It also monitors the rate of metrics creation and breaks it down by source.


The metrics used in this section are:

  • ~metric.counter - Number of metrics being collected. Does not include internal metrics.
  • ~histogram.counter - Number of histograms being collected. Does not include internal histogram data.

If you’re interested in histogram ingestion by source, clone this dashboard and add a chart that uses the ~histogram.counter metric.
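To break these counts down by source, as the charts in this section do, you can group an aggregation by source. A minimal sketch for metrics and for histograms:

    sum(ts(~metric.counter), sources)
    sum(ts(~histogram.counter), sources)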

Using Internal Metrics to Optimize Performance

A small set of internal metrics can help you optimize performance and monitor your costs. This section highlights some things to look for - the exact steps depend on how you’re using Wavefront and on the characteristics of your environment.

Wavefront customer support engineers have found the following metrics especially useful.

Metrics in the ~alert namespace:

  • ~alert.query_time.<alert_id> - Tracks the average time, in ms, that a specified alert took to run in the past hour.
  • ~alert.query_points.<alert_id> - Tracks the average number of points that a specified alert scanned in the past hour.
  • ~alert.checking_frequency.<alert_id> - Tracks how often a specified alert performs a check. See Alert States for details.

Metrics in the ~collector namespace:

  • ~collector.points.reported, ~collector.histograms.reported, ~collector.tracing.spans.reported, ~collector.tracing.span_logs.reported, ~collector.tracing.span_logs.bytes_reported - Valid metric points, histogram points, trace data (spans), or span logs that the collector reports to Wavefront. These are billing metrics that you can look up on the Wavefront Usage dashboard.

    Note: We have a corresponding direct ingestion metric for each metric. For example, corresponding to ~collector.points.reported we have ~collector.direct-ingestion.points.reported.

  • ~collector.points.batches, ~collector.histograms.batches, ~collector.tracing.spans.batches, ~collector.tracing.span_logs.batches - Number of batches of points, histogram points, or spans received by the collector, either via the proxy or via the direct ingestion API. In the histogram context, a batch corresponds to an HTTP POST request.

    Note: We have a corresponding direct ingestion metric for each metric. For example, corresponding to ~collector.spans.batches we have ~collector.direct-ingestion.spans.batches.

  • ~collector.points.undecodable, ~collector.histograms.undecodable, ~collector.tracing.spans.undecodable, ~collector.tracing.span_logs.undecodable - Points, histogram points, spans, or span logs that the collector receives but cannot report to Wavefront because the input is not in the right format.

    Note: We have a corresponding direct ingestion metric for each metric. For example, corresponding to ~collector.points.undecodable we have ~collector.direct-ingestion.points.undecodable.

Metrics in the ~metric namespace:

  • ~metric.new_host_ids - Counter that increments when a new source= or host= is sent to Wavefront.
  • ~metric.new_metric_ids - Counter that increments when a new metric name is sent to Wavefront.
  • ~metric.new_string_ids - Counter that increments when a new point tag value is sent to Wavefront.

Metrics in the ~query namespace:

  • ~query.requests - Counter tracking the number of queries a user made.

Metrics in the ~http.api namespace:

  • ~http.api.v2.* - Monotonic counters, without tags, that align with the API endpoints and allow you to examine API request metrics. For example, ts(~http.api.v2.alert.{id}.GET.200.count) aligns with the GET /api/v2/alert/{id} API endpoint. Examine the ~http.api.v2 namespace to see the counters for specific API endpoints.
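For example, to check whether a specific alert runs slowly or scans an unusually large number of points, you can chart its per-alert metrics directly. The alert ID below is a placeholder; substitute the ID of one of your own alerts:

    ts(~alert.query_time.1234567890)
    ts(~alert.query_points.1234567890)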

If several slow queries are executed within the selected time window, the Slow Query page can become long. Section links at the top left allow you to jump to a section. The links display only after you have scrolled down the page.

Examining Versions of Dashboards and Alerts

Wavefront stores details about each version of each dashboard and each alert. That means you have an audit trail of changes. When someone saves changes to a dashboard or alert, we create a new version and track the changes, including details about the change and the user who made the change.

You can examine dashboard and alert versions from the UI or using the REST API.

To examine versions of a dashboard:

  1. Select Browse > All Dashboards.
  2. Click the three vertical dots to the left of the dashboard you’re interested in and select Versions.
  3. You can review the changes to the dashboard, revert to a previous version, or clone a previous version.


The process is the same for alerts.