A trace shows you how a request propagates from one microservice to the next in a distributed application. The basic building blocks of a trace are its spans, where each span corresponds to a distinct invocation of an operation that executes as part of the request.
Spans are the fundamental units of trace data. This page provides details about the Wavefront format of a span, as well as the RED metrics that Wavefront automatically derives from spans. These details are mainly useful for developers who need to perform advanced customization.
Wavefront Span Format
A well-formed Wavefront span consists of fields and span tags that capture span attributes. These attributes enable Wavefront to identify and describe the span, organize it into a trace, and display the trace according to the service and application that emitted it. Some attributes are required by the OpenTracing specification and others are required by Wavefront.
Most use cases do not require you to know exactly how Wavefront expects a span to be formatted:
- When you instrument your application with a Wavefront OpenTracing SDK or a framework SDK, your application emits spans that are automatically constructed by the Wavefront Tracer. (You supply some of the attributes when you instantiate the ApplicationTags object required by the SDK.)
- When you instrument your application with a Wavefront sender SDK, your application emits spans that are automatically constructed from raw data you pass as parameters.
- When you instrument your application with a 3rd party distributed tracing system, your application emits spans that are automatically transformed by the integration you set up.
It is possible to manually construct a well-formed span and send it either directly to the Wavefront service or to a TCP port that the Wavefront proxy is listening on for trace data. You might want to do this if you instrumented your application with a proprietary distributed tracing system.
<operationName> source=<source> <spanTags> <start_milliseconds> <duration_milliseconds>
Fields must be space separated and each line must be terminated with the newline character (\n or ASCII hex 0A).
getAllUsers source=localhost traceId=7b3bf470-9456-11e8-9eb6-529269fb1459 spanId=0313bafe-9457-11e8-9eb6-529269fb1459 parent=2f64e538-9457-11e8-9eb6-529269fb1459 application=Wavefront service=auth cluster=us-west-2 shard=secondary http.method=GET 1552949776000 343
|Field||Required||Description||Format|
|operationName||Yes||The string name that indicates the operation represented by the span.||Valid characters: a-z, A-Z, 0-9, hyphen ("-"), underscore ("_"), dot ("."). Length: less than 1024 characters.|
|source||Yes||The string name of a host or container on which the represented operation executed.||Valid characters: a-z, A-Z, 0-9, hyphen ("-"), underscore ("_"), dot ("."). Length: less than 1024 characters.|
|spanTags||Yes||See Span Tags, below.|
|start_milliseconds||Yes||Start time of the span, expressed as epoch time elapsed since 00:00:00 Coordinated Universal Time (UTC) on January 1, 1970.||Whole number of epoch milliseconds or other units (see below).|
|duration_milliseconds||Yes||Duration of the span.||Whole number of milliseconds or other units (see below). Must be greater than or equal to 0.|
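As a rough illustration of the format described above, the following Python sketch assembles one span line from its parts. The function names (check_name, format_span) are hypothetical helpers, not part of any Wavefront SDK, and the sketch assumes that tag values contain no spaces, so it does no quoting.

```python
import re

# Character set and length limit as listed for operationName and source.
_NAME_RE = re.compile(r"[A-Za-z0-9._-]+")
_MAX_NAME_LEN = 1024  # names must be shorter than this


def check_name(value, field):
    """Return value unchanged if it meets the documented constraints."""
    if not value or len(value) >= _MAX_NAME_LEN or not _NAME_RE.fullmatch(value):
        raise ValueError(f"invalid {field}: {value!r}")
    return value


def format_span(operation_name, source, span_tags, start_ms, duration_ms):
    """Build one newline-terminated span line.

    span_tags is a list of (key, value) pairs so the caller controls tag order.
    """
    if duration_ms < 0:
        raise ValueError("duration must be greater than or equal to 0")
    tags = " ".join(f"{key}={value}" for key, value in span_tags)
    return (
        f"{check_name(operation_name, 'operationName')} "
        f"source={check_name(source, 'source')} "
        f"{tags} {int(start_ms)} {int(duration_ms)}\n"
    )
```

Called with the values from the sample span above, format_span returns that same line followed by a newline character.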
Span Tags
Span tags are special tags associated with a span. Many of these span tags are required for a span to be valid. An application can be instrumented to include custom span tags as well. Custom tag names must not use the reserved span tag names listed in the following tables.
Note: The maximum allowed length for a combination of a span tag key and value is 254 characters (255 including the “=” separating key and value). If the value is longer, the span is rejected.
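A pre-flight check for this limit might look like the sketch below. The helper name oversized_tags is hypothetical, and the sketch assumes that "characters" can be counted with Python's len() on the key and value strings.

```python
MAX_TAG_LEN = 254  # key plus value, not counting the "=" separator


def oversized_tags(span_tags):
    """Return the (key, value) pairs whose combined length exceeds the limit."""
    return [
        (key, value)
        for key, value in span_tags
        if len(key) + len(value) > MAX_TAG_LEN
    ]
```

Checking before sending lets an application shorten or drop an offending custom tag instead of losing the whole span.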
The following table lists span tags that contain information about the span’s identity and relationships.
|Span Tag||Required||Description||Type|
|traceId||Yes||Unique identifier of the trace the span belongs to. All spans that belong to the same trace share a common trace ID.||UUID|
|spanId||Yes||Unique identifier of the span.||UUID|
|parent||No||Identifier of the span’s dependent parent, if it has one. This tag is populated as the result of an OpenTracing ChildOf relationship.||UUID|
|followsFrom||No||Identifier of the span’s non-dependent parent, if it has one. This tag is populated as the result of an OpenTracing FollowsFrom relationship.||UUID|
The following table lists span tags that describe the architecture of the instrumented application that emitted the span. Wavefront uses these tags to aggregate and filter trace data at different levels of granularity. These tags correspond to the application tags you set through a Wavefront observability SDK.
|Span Tag||Required||Description||Type|
|application||Yes||Name of the instrumented application that emitted the span.||String|
|service||Yes||Name of the instrumented microservice that emitted the span.||String|
|cluster||Yes||Name of a group of related hosts that serves as a cluster or region in which the instrumented application runs. Specify cluster=none to indicate a span that does not use this tag.||String|
|shard||Yes||Name of a subgroup of hosts within the cluster, for example, a mirror. Specify shard=none to indicate a span that does not use this tag.||String|
Wavefront does not allow the mandatory span tags to have multiple values. Make sure that your application does not send spans with multiple application/service tags.
For example, a span that carries service=backend together with a second, different service tag is invalid.
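One way to catch this before sending is sketched below. The helper name and the choice of which tags to check (every tag marked as required on this page) are assumptions, and the value notify in the usage example is made up for illustration.

```python
from collections import defaultdict

# Tags that this page marks as required.
MANDATORY_TAGS = ("traceId", "spanId", "application", "service", "cluster", "shard")


def multi_valued_mandatory_tags(span_tags):
    """Map each mandatory tag that has more than one distinct value to its values."""
    seen = defaultdict(set)
    for key, value in span_tags:
        if key in MANDATORY_TAGS:
            seen[key].add(value)
    return {key: sorted(values) for key, values in seen.items() if len(values) > 1}


# Usage: the second service tag makes this span invalid.
conflicts = multi_valued_mandatory_tags(
    [("application", "Wavefront"), ("service", "backend"), ("service", "notify")]
)
```

An empty result means that no mandatory tag carries more than one value.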
Note: Additional span tags may be present, depending on how you instrumented your application. For example, the framework SDKs automatically use span tags like http.method, and so on. You can find out about these tags in the README file for the SDK on GitHub.
Time-Value Precision in Spans
A span has two time-value fields for specifying the start time (start_milliseconds) and duration (duration_milliseconds). Express these values in milliseconds, because Wavefront uses milliseconds for span storage and visualization. For convenience, you can specify time values in other units. Wavefront converts the values to milliseconds.
Wavefront requires that you use the same precision for both time values. Wavefront identifies the precision of the start_milliseconds value, and interprets the duration_milliseconds value using the same unit. The following table shows how to indicate the start-time precision:
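|Precision for Start Time Values||Number Format||Sample Value||Stored As|
|Seconds||Fewer than 13 digits||1533529977||Multiplied by 1000|
|Milliseconds (thousandths of a second)||13 to 15 digits||1533529977627||As specified|
|Microseconds (millionths of a second)||16 to 18 digits||1533529977627000||Divided by 1000|
|Nanoseconds (billionths of a second)||19 or more digits||1533529977627000000||Divided by 1000000|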
Note: When specifying a span in Wavefront span format, make sure you adjust values as necessary so that the units match. For example, suppose you know a span started at 1533529977627 epoch milliseconds, and lasted for 3 seconds. In Wavefront span format, you could specify either of the following pairs of time values:
|1533529977 3||(both values in seconds)|
|1533529977627 3000||(both values in milliseconds)|
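The digit-count rule can be sketched in a few lines of Python. This is an illustration of the rule as described on this page, not Wavefront's actual implementation. The function names are hypothetical, and the sketch assumes that sub-millisecond remainders are simply truncated.

```python
def precision_of(start_value):
    """Infer the time unit from the number of digits in the start value."""
    digits = len(str(abs(int(start_value))))
    if digits < 13:
        return "seconds"
    if digits <= 15:
        return "milliseconds"
    if digits <= 18:
        return "microseconds"
    return "nanoseconds"


# (multiplier, divisor) that converts each unit to milliseconds.
_TO_MILLIS = {
    "seconds": (1000, 1),
    "milliseconds": (1, 1),
    "microseconds": (1, 1000),
    "nanoseconds": (1, 1_000_000),
}


def to_milliseconds(start_value, duration_value):
    """Convert both values to milliseconds using the unit of the start value."""
    multiplier, divisor = _TO_MILLIS[precision_of(start_value)]
    return (
        int(start_value) * multiplier // divisor,
        int(duration_value) * multiplier // divisor,
    )
```

Because the duration is interpreted in the unit inferred from the start value, passing a start time in seconds with a duration in milliseconds would overstate the duration by a factor of 1000.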
Indexed and Unindexed Span Tags
Wavefront uses indexes to optimize the performance of queries that filter on certain span tags. For example, Wavefront indexes the application tags (application, service, cluster, shard) so you can quickly query for spans that represent operations from a particular application, service, cluster, or shard. In addition to the application tags, Wavefront indexes certain built-in span tags that conform to the OpenTracing standard.
For performance reasons, Wavefront automatically indexes built-in span tags with low cardinality. (A tag with low cardinality has comparatively few unique values that can be assigned to it.) So, for example, a tag like spanId is not indexed.
Note: Wavefront does not automatically index any custom span tags that you might have added when you instrumented your application. If you plan to use a low-cardinality custom span tag in queries, contact Wavefront support to request indexing for that span tag.
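Before requesting indexing for a custom tag, it can help to estimate its cardinality from a sample of your own spans. The sketch below counts the distinct values seen for each tag key. The function name and the input shape (one list of key/value pairs per span) are assumptions made for illustration.

```python
from collections import defaultdict


def tag_cardinality(spans_tags):
    """Return the number of distinct values observed for each tag key."""
    values = defaultdict(set)
    for tags in spans_tags:
        for key, value in tags:
            values[key].add(value)
    return {key: len(seen) for key, seen in values.items()}
```

In a sample like this, a tag such as spanId yields one distinct value per span, while a tag such as service stays at a handful of values no matter how many spans are inspected.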
RED Metrics Derived From Spans
If you instrument your application with a tracing-system integration or with a Wavefront OpenTracing SDK, Wavefront derives RED metrics from the spans that are sent from the instrumented application. Wavefront automatically provides the corresponding span RED metrics and trace RED metrics for the spans with no additional configuration or instrumentation on your part.
RED metrics are key indicators of the health of your services, and you can use them to help you discover problem traces. RED metrics are measures of:
- Rate of requests – the number of requests being served per minute
- Errors – the number of failed requests per minute
- Duration – per-minute histogram distributions of the amount of time that each request takes
The derived RED metrics are operation-level, which means that they measure individual operations, and not whole traces. For example, an operation-level metric might measure the number of calls per minute to the dispatch operation in the delivery service, where each call to dispatch might correspond to one of many spans in a trace.
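To make the three measures concrete, the sketch below derives per-minute request counts, error counts, and duration samples for each operation from a list of spans. It is only an illustration of the idea. The span dictionary keys are hypothetical, and bucketing by span start time is an assumption, since this page does not say which timestamp Wavefront uses.

```python
from collections import defaultdict


def red_per_minute(spans):
    """Group spans by (operation, minute) and collect RED measures.

    Each span is a dict with the keys operation, start_ms, duration_ms,
    and an optional boolean error.
    """
    buckets = defaultdict(lambda: {"requests": 0, "errors": 0, "durations_ms": []})
    for span in spans:
        minute = span["start_ms"] // 60_000  # index of the one-minute bucket
        bucket = buckets[(span["operation"], minute)]
        bucket["requests"] += 1
        if span.get("error"):
            bucket["errors"] += 1
        bucket["durations_ms"].append(span["duration_ms"])
    return dict(buckets)
```

Each bucket's durations_ms list stands in for the per-minute histogram distribution. A real histogram would store a compressed summary rather than every sample.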
Operation-level and Trace-level RED Metrics
Wavefront uses ingested spans to derive RED metrics for two kinds of request:
Operation-level RED metrics measure individual operations, typically within a single service. For example, an operation-level metric might measure the number of calls per minute to the dispatch operation in the delivery service.
Wavefront uses operation-level metrics as the basis for the predefined charts shown below.
Trace-level RED metrics measure traces that start with a given root operation. For example, a trace-level metric might measure the number of traces that each start with a call to the orderShirts operation in the shopping service.
Wavefront derives trace-level metrics from each trace’s root span and end span. (If a trace has multiple root spans, the earliest is used.) You need to query for trace-level metrics to visualize them.
Note: For traces that consist entirely of synchronous member spans, trace-level RED metrics are equivalent to the corresponding operation-level RED metrics. For traces that have asynchronous member spans, trace-level RED metrics provide more accurate measures of trace duration, especially when a trace’s root span ends before a child span.
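The difference shows up in a small calculation. The sketch below measures a trace from the start of its earliest root span to the end of its last span, as described above. The function name and the span dictionary keys are hypothetical, and the sketch assumes that a span with no parent entry is a root.

```python
def trace_duration_ms(spans):
    """Return the time from the earliest root start to the latest span end.

    Each span is a dict with start_ms, duration_ms, and an optional parent.
    Raises ValueError if the trace has no root span.
    """
    roots = [span for span in spans if span.get("parent") is None]
    start = min(span["start_ms"] for span in roots)
    end = max(span["start_ms"] + span["duration_ms"] for span in spans)
    return end - start


# A root span that ends at 1050 while its asynchronous child runs until 1140.
trace = [
    {"spanId": "root", "start_ms": 1000, "duration_ms": 50},
    {"spanId": "child", "parent": "root", "start_ms": 1040, "duration_ms": 100},
]
```

Here the root span alone lasts 50 milliseconds, but trace_duration_ms(trace) reports 140 milliseconds because the child span outlives its parent.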
Wavefront automatically generates charts to display the auto-derived RED metrics for a particular service. To view these charts:
- Select Applications > Inventory in the Wavefront task bar. If necessary, scroll to find your application and its services.
- Click on the service you want to see metrics for.
- If you instrumented your application with a Wavefront SDK, look for the charts in the Overview section. (If you used a tracing-system integration, the charts are in the only section on the page.)
The predefined charts let you view:
- The per-minute Request Rate, per-minute Error Rate, and Duration (P95) for all requests that are processed by the service.
- The “top” operations in each category: the most frequently invoked operations, the operations with the most errors, and the slowest operations. You can click an operation in one of these charts to view just the traces that contain spans for that operation.
Note: A service page also displays RED metrics that are collected and sent by the framework SDKs. These SDKs report the RED metrics directly from the instrumented framework APIs, instead of deriving them from the reported spans. (Other metrics and histograms might be sent as well.)
RED Metric Counters and Histograms
The types of RED metrics that we show in the predefined charts are rates and 95th percentile distributions. These metrics are themselves based on underlying counters and histograms that Wavefront automatically derives from spans. You can use these underlying counters and histograms in RED metrics queries, for example, to create alerts on trace data.
Wavefront constructs the names of the underlying counters and histograms as shown in the tables below. The name components <application>, <service>, and <operationName> are string values that Wavefront obtains from the spans from which the metrics are derived. If necessary, Wavefront modifies these strings to comply with the Wavefront metric name format. Wavefront also associates each metric with the point tags application, service, and operationName, and assigns the corresponding span tag values to these point tags. The span tag values are used without modification.
|Operation-Level Metric Names||Metric Type||Description|
|tracing.derived.<application>.<service>.<operationName>.invocation.count||Counter||The number of times that the specified operation is invoked. Used in the Request Rate chart that is generated for a service.|
|tracing.derived.<application>.<service>.<operationName>.error.count||Counter||The number of invoked operations that have errors (i.e., spans with error=true). Used in the Error Rate chart that is generated for a service.|
|tracing.derived.<application>.<service>.<operationName>.duration.micros.m||Wavefront histogram||The duration of each invoked operation, in microseconds, aggregated in one-minute intervals. Used in the Duration chart that is generated for a service.|
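|Trace-Level Metric Names||Metric Type||Description|
|tracing.root.derived.<application>.<service>.<operationName>.invocation.count||Counter||The number of traces that start with the specified root operation.|
|tracing.root.derived.<application>.<service>.<operationName>.error.count||Counter||The number of traces that start with the root operation, and contain one or more spans with errors (i.e., spans with error=true).|
|tracing.root.derived.<application>.<service>.<operationName>.duration.millis.m||Wavefront histogram||The duration of each trace, in milliseconds, aggregated in one-minute intervals. Duration is measured from the start of the earliest root span to the end of the last span in a trace.|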
RED Metrics Queries for Charts and Alerts
You can perform queries over RED metric counters and histograms and visualize the results in your own charts, just as you would do for any other metrics in Wavefront. You can create alerts on trace data by using RED metrics queries in alert conditions.
Find the per-minute error rate for a specific operation executing on a specific cluster:
rate(ts(tracing.derived.beachshirts.shopping.orderShirts.error.count and cluster=us-east-1)) * 60
Find the per-minute error rate for traces that begin with a specific operation:
rate(ts(tracing.root.derived.beachshirts.shopping.orderShirts.error.count)) * 60
Use a histogram query to return durations at the 75th percentile for an operation in a service. (The predefined charts display only the 95th percentile.)
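To make the percentile idea concrete, the sketch below computes a nearest-rank percentile over a plain list of durations. It only illustrates what a 75th or 95th percentile means. Wavefront histograms store distributions rather than raw samples, so the values a histogram query returns may be computed differently. The function name and the sample data are hypothetical.

```python
import math


def nearest_rank_percentile(durations, percentile):
    """Return the smallest value with at least `percentile` percent of the data at or below it."""
    if not durations:
        raise ValueError("no durations")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be greater than 0 and at most 100")
    ordered = sorted(durations)
    rank = math.ceil(percentile * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Ten sample durations in milliseconds.
sample = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
p75 = nearest_rank_percentile(sample, 75)
p95 = nearest_rank_percentile(sample, 95)
```

With this sample, the 95th percentile is dominated by the single slowest request, while the 75th percentile sits closer to what a typical slow request looks like.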
Wavefront supports two alternatives for specifying the RED metric counters and histograms in a query:
- Use the metric name, for example:
ts(tracing.derived.beachshirts.delivery.dispatch.invocation.count)
- Use the point tags application, service, and operationName that Wavefront automatically associates with the metric, for example:
ts(tracing.derived.*.invocation.count, application="beachshirts" and service="delivery" and operationName="dispatch")
The point tag technique is useful when the metric name contains string values for <application>, <service>, or <operationName> that have been modified to comply with the Wavefront metric name format. The point tag values always correspond exactly to the span tag values.
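If you generate such queries from code, for example when templating alerts, the two forms can be built as in the sketch below. The helper names are hypothetical. The name-based form assumes that the three strings are already valid in a metric name, and the point tag form assumes that the values contain no double quotes.

```python
def _prefix(root):
    """Trace-level metrics use the tracing.root.derived prefix."""
    return "tracing.root.derived" if root else "tracing.derived"


def metric_name_query(application, service, operation,
                      suffix="invocation.count", root=False):
    """Build a ts() query that names the metric directly."""
    return f"ts({_prefix(root)}.{application}.{service}.{operation}.{suffix})"


def point_tag_query(application, service, operation,
                    suffix="invocation.count", root=False):
    """Build a ts() query that selects the metric by its point tags."""
    return (
        f"ts({_prefix(root)}.*.{suffix}, "
        f'application="{application}" and service="{service}" '
        f'and operationName="{operation}")'
    )
```

For example, point_tag_query("beachshirts", "delivery", "dispatch") produces the point tag query shown above.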
Trace Sampling and Auto-Derived RED Metrics
If you have instrumented your application with a Wavefront observability SDK, Wavefront derives the RED metrics from 100% of the generated spans, before any sampling is performed. This is true when the sampling is performed by the SDK or when the sampling is performed by a Wavefront proxy. Consequently, the RED metrics provide a highly accurate picture of your application’s behavior. However, if you click through a chart to inspect a particular trace, you might discover that the trace has not actually been ingested in Wavefront. You can consider configuring a less restrictive sampling strategy.
If you have instrumented your application using a 3rd party distributed tracing system, Wavefront derives the RED metrics after sampling has occurred. The Wavefront proxy receives only a subset of the generated spans, and the auto-derived RED metrics will reflect just that subset. See Trace Sampling and RED Metrics from an Integration.