Learn how alerts work, and how to create and examine them.

With Wavefront, you can create smart alerts that dynamically filter noise and capture true anomalies. You can even view an image of the chart in the alert notification. The end result is fewer false alarms and faster remediation when real issues occur.

How Alerts Work

An alert defines:

  • The condition under which metric values indicate a system problem.
  • One or more targets to notify when the condition evaluates to true or false for a specified period of time.

An alert fires when a metric reaches a value that indicates a problem.

You express alert conditions using Wavefront Query Language expressions.
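
The following Python sketch shows how a condition that must hold "for a specified period of time" might be evaluated. It makes simplifying assumptions that are not taken from the Wavefront alerting engine: the condition yields exactly one value per minute (0 or 1, as a condition expression does), there are no gaps in the data, every value in the window must be true for the alert to fire, and every value must be false for it to resolve. The function name `next_state` and its parameters are hypothetical.

```python
from typing import Optional, Sequence


def next_state(
    samples: Sequence[int],
    firing: bool,
    fires_minutes: int,
    resolves_minutes: Optional[int] = None,
) -> bool:
    """Return True if a hypothetical alert should be FIRING after this check.

    Simplified model (assumption): samples holds one condition value per
    minute (0 or 1), oldest first, with no gaps.
    """
    if fires_minutes < 1:
        raise ValueError("Alert fires minimum is 1")
    # An omitted "Alert resolves" setting falls back to "Alert fires".
    if resolves_minutes is None:
        resolves_minutes = fires_minutes
    if resolves_minutes < 1:
        raise ValueError("Alert resolves minimum is 1")

    if not firing:
        window = samples[-fires_minutes:]
        # Fire only when the window is full and every value in it is true.
        return len(window) == fires_minutes and all(v != 0 for v in window)

    window = samples[-resolves_minutes:]
    # Resolve only when the window is full and every value in it is false.
    resolved = len(window) == resolves_minutes and all(v == 0 for v in window)
    return not resolved
```

For example, with `fires_minutes=5`, the samples `[0, 1, 1, 1, 1, 1]` make a non-firing alert fire. A firing alert with samples `[1, 1, 0, 0, 0]` stays firing when the resolve window defaults to 5 minutes, but resolves when `resolves_minutes=3`.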

In this video, Clement explains how alerts work:

In this video, Jason explains alerts while showing them in the UI:

Creating an Alert

To create an alert:

  1. Do one of the following:
    • Alerts browser - Select Alerts and click the Create Alert button located at the top of the filter bar.
    • Chart - Hover over a query field and click the Create Alert link below the query field.
  2. Fill in the following required and recommended alert properties.
    • Name - Name of the alert. The name must contain 1-255 characters. Pick a name that makes it easy to identify the alert's purpose.
    • Condition - A conditional ts() expression that defines the threshold for the alert. The condition expression can include any valid Wavefront Query Language construct. The condition expression, together with the Alert fires setting, determines when the alert fires.
      • Alert fires - Length of time (in minutes) during which the Condition expression must be true before the alert fires. Minimum is 1. For example, if you enter 5, the alerting engine reviews the value of the condition during the last 5-minute window to determine whether the alert should fire.
      • Alert resolves - Length of time (in minutes) during which the Condition expression must be false before the alert switches to resolved. Minimum is 1. If you omit this setting, the Alert fires value is used. Pick a value that is greater than or equal to the Alert fires value to avoid potential chains of resolve-fire cycles.
      For details and examples, see Alert States and Lifecycle.
    • Display Expression - Recommended. A ts() expression that returns the data you want to inspect when the alert fires. The display expression can include any valid Wavefront Query Language construct, and typically captures the underlying time series being tested by the condition expression. The results of the display expression are:
      • Shown in the Events Display preview chart on the page for creating or editing the alert.
      • Shown in any chart image that is included in a notification triggered by the alert.
      • Shown in the interactive chart you can visit from a notification triggered by the alert.
      • Used as the basis for any statistics that you might include in a custom notification triggered by the alert.
      If you leave this field blank, the condition expression is used. Note, however, that the values returned by the condition expression are either 0 or 1, which might not provide the information you want to inspect when the alert changes state.
    • Severity - How important the alert is. In decreasing order of importance: SEVERE, WARN, SMOKE, and INFO.
    • Target List - Targets to notify when the alert changes state (for example, from CHECKING to FIRING) or when the alert is snoozed. You can specify up to ten different targets across the following types. Use commas to separate targets of the same type.
      • Email - Valid email addresses. Alert notifications are sent to these addresses in response to a default set of triggering events, and contain default HTML-formatted content.
      • PagerDuty Key - PagerDuty keys obtained by following the steps for the PagerDuty integration. Alert notifications that use these keys are sent in response to a default set of triggering events, and contain default content.
      • Alert Target - Names of custom alert targets that you have previously created to:
        • Configure webhook notifications for pager services and communication channels. Follow the steps for the VictorOps integration, Slack integration, or HipChat integration for notifications on these popular messaging platforms.
        • Configure email or PagerDuty notifications with nondefault content or triggers.
  3. Optionally fill in the following additional alert properties.
    • Events Display - Whether to display actual or hypothetical alert-generated event icons on the preview chart.
      • Actual Firings (existing alerts only) - Displays past alert-generated event icons on the chart, so you can see how often the alert actually fired within the chart time window.
      • Backtesting - Displays hypothetical alert-generated event icons on the chart, so you can see how often the alert would have fired within the chart time window, based on the Condition threshold and the Alert fires setting. See Backtesting below.
    • Additional Information - Any additional information, such as a link to a runbook.
    • Tags - Tags assigned to the alert. You can enter existing alert tags or create new ones. See Organizing with Tags.
  4. Optionally click the Advanced link to configure the following alert properties:
    • Checking Frequency - Number of minutes between checks of whether the Condition is true. Minimum and default is 1. When an alert is in the INVALID state, it is checked approximately every 15 minutes instead of at the specified checking frequency.
    • Resend Notifications - Whether to resend notifications for a firing alert. If enabled, you can specify the number of minutes to wait before resending the notification.
    • Metrics - Whether to include obsolete metrics. If enabled, the alert considers metrics that have not reported for 4 weeks or more. Customers whose queries aggregate data over longer timeframes sometimes want to include those older metrics.
  5. Click Save.
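
As a compact summary of the constraints listed in the steps above, the following Python sketch models an alert definition and validates it. `AlertSpec`, its field names, and its helper methods are hypothetical and are not the Wavefront API; they only restate the documented rules (name length, minimum durations, the Alert resolves fallback, severity values, the ten-target limit, the display expression fallback, and the roughly 15-minute check interval for INVALID alerts).

```python
from dataclasses import dataclass
from typing import List, Optional

# Severities in decreasing order of importance.
SEVERITIES = ("SEVERE", "WARN", "SMOKE", "INFO")
MAX_TARGETS = 10
INVALID_CHECK_MINUTES = 15  # INVALID alerts are checked "approximately every 15 minutes".


@dataclass
class AlertSpec:
    """Hypothetical model of the alert properties; not the Wavefront API."""

    name: str
    condition: str
    fires_minutes: int
    severity: str
    targets: List[str]
    display_expression: str = ""
    resolves_minutes: Optional[int] = None
    checking_frequency: int = 1

    def __post_init__(self) -> None:
        if not 1 <= len(self.name) <= 255:
            raise ValueError("name must contain 1-255 characters")
        if self.fires_minutes < 1:
            raise ValueError("Alert fires minimum is 1")
        if self.resolves_minutes is None:
            # An omitted "Alert resolves" uses the "Alert fires" value.
            self.resolves_minutes = self.fires_minutes
        elif self.resolves_minutes < 1:
            raise ValueError("Alert resolves minimum is 1")
        if self.severity not in SEVERITIES:
            raise ValueError(f"severity must be one of {SEVERITIES}")
        if len(self.targets) > MAX_TARGETS:
            raise ValueError("at most ten targets are allowed")
        if self.checking_frequency < 1:
            raise ValueError("checking frequency minimum is 1")

    def effective_display_expression(self) -> str:
        # A blank display expression falls back to the condition (values 0 or 1).
        return self.display_expression or self.condition

    def check_interval(self, state: str) -> int:
        # INVALID alerts ignore the configured frequency in this model.
        return INVALID_CHECK_MINUTES if state == "INVALID" else self.checking_frequency

    def warnings(self) -> List[str]:
        msgs: List[str] = []
        if self.resolves_minutes is not None and self.resolves_minutes < self.fires_minutes:
            msgs.append("Alert resolves < Alert fires may cause resolve-fire cycles")
        return msgs
```

For example, `AlertSpec(name="High CPU load", condition='ts("cpu.loadavg.1m") > 4', fires_minutes=5, severity="WARN", targets=["oncall@example.com"])` ends up with `resolves_minutes` equal to 5 and uses the condition as its display expression. The metric name `cpu.loadavg.1m` is illustrative only.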

Watch this video to see Jason create an alert:

Cloning an Alert

To make a copy of an existing alert and then change the copy slightly, clone the alert:

  1. Select Browse > Alerts to display the Alerts page.
  2. Click the three dots to the left of the alert and click Clone.

    Alert cloning

  3. Update the properties you want to change, and click Save.

Viewing Alerts and Alert History

To view alerts, click the Alerts button or select Browse > Alerts. A list of alerts displays. The following example shows the alert described in Tutorial: Getting Started when it fires:

Alert firing

To view alert details, click the chart icon in the State column. A chart displays with two queries:

  • <Alert name> - the alert condition.
  • Past Firings - an events() query that shows past firings of the alert.

For example, for the alert shown above, the chart displays:

Alert queries

The Firings column shows how many times an alert changed from non-firing to firing in the last day, week, and month.
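
The counting rule behind the Firings column can be sketched in Python as follows. The input format (a list of hypothetical `(timestamp, is_firing)` state-change records), the function name `count_firings`, and the approximation of a month as 30 days are all assumptions made for illustration; only non-firing to firing transitions are counted, so consecutive firing records count once.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Assumption: "month" is approximated as 30 days.
WINDOWS = {"day": timedelta(days=1), "week": timedelta(weeks=1), "month": timedelta(days=30)}


def count_firings(changes: List[Tuple[datetime, bool]], now: datetime) -> Dict[str, int]:
    """Count non-firing -> firing transitions in the last day, week, and month.

    changes: hypothetical (timestamp, is_firing) state-change records.
    """
    starts: List[datetime] = []
    previous = False  # Assume the alert was not firing before the first record.
    for ts, firing in sorted(changes):
        if firing and not previous:
            starts.append(ts)  # A new firing begins here.
        previous = firing
    return {
        label: sum(1 for ts in starts if now - span <= ts <= now)
        for label, span in WINDOWS.items()
    }
```

For example, an alert that started firing 20 days, 3 days, and 2 hours before `now` yields 1 firing for the day, 2 for the week, and 3 for the month.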

Alert history shows the changes that have been made to an alert over time. To access the alert history, click the three dots to the left of the alert on the Alerts page and click Versions. Alert history shows:

  • Which user made the changes.
  • The date and time the changes were made.
  • A description of the changes.

You can revert to or clone a past alert version.

Editing an Alert

You can change an alert at any time.

  1. Click the Alerts button or select Browse > Alerts to display the Alerts page.
  2. Click the name of the alert you want to change to display the Edit Alert page.
  3. Update the properties you want to change, and click Save.

Alert Events

As alerts fire, update, and resolve, events are created in Wavefront. You can optionally display those events as icons on a chart’s X-axis:

event icons

Backtesting

Wavefront can display actual alert firings, or use backtesting to display hypothetical alert-generated events. Backtesting enables you to fine-tune new or existing alert conditions before you save them.

Backtesting does not always match actual alert firings exactly. For example, if data arrives late, backtest events won't match the actual alert firings. And even if the data meets the alert condition for the full Alert fires duration, the alert itself might not fire, because the alert check, which runs at the alert check interval, happens too soon or too late. In both cases, backtesting shows the alert as firing even though the actual alert might not have fired.
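
The following Python sketch illustrates both mismatch cases with a deliberately simplified model; it is not how the Wavefront alerting engine is implemented. The assumptions are: one condition value per minute, the alert fires when every value in the window is true, backtesting evaluates every minute over complete data, and the live check runs only at multiples of a hypothetical `frequency` and ignores samples that have not arrived yet. The function names and the `arrival` list are hypothetical.

```python
from typing import Optional, Sequence


def first_backtest_firing(values: Sequence[int], window: int) -> Optional[int]:
    """First minute at which backtesting would show a firing.

    Backtesting sees complete data and, in this sketch, evaluates every minute.
    """
    for t in range(window - 1, len(values)):
        if all(values[t - window + 1 : t + 1]):
            return t
    return None


def first_live_firing(
    values: Sequence[int], arrival: Sequence[int], window: int, frequency: int
) -> Optional[int]:
    """First minute at which a simplified live check would fire.

    arrival[i]: minute at which the sample for minute i reaches the system.
    Checks run only at minutes 0, frequency, 2 * frequency, ...
    """
    for t in range(0, len(values), frequency):
        if t < window - 1:
            continue  # Not enough history yet.
        span = range(t - window + 1, t + 1)
        # A sample that has not arrived by check time t cannot count as true.
        if all(arrival[i] <= t and values[i] for i in span):
            return t
    return None
```

With `values = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0]` (condition true for minutes 3 through 7) and `window=5`, backtesting reports a firing at minute 7, and so does a per-minute live check with on-time data. A live check that runs every 5 minutes looks only at minutes 5 and 10 and never fires, and a per-minute check where the minute-7 sample arrives at minute 9 never fires either.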