Legacy monitoring systems are limited to simple, threshold-based alerts. With Wavefront, you can create smart alerts that dynamically filter noise and capture true anomalies. The end result is fewer false alarms and faster remediation when real issues occur.
How Alerts Work
An alert defines:
- The condition under which metric values indicate a system problem.
- One or more targets to notify when the condition evaluates to true or false for a specified period of time.
An alert fires when a metric reaches a value that indicates a problem.
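The idea that a condition must hold for a period of time before the alert changes state can be sketched in a few lines of Python. This is a simplified model, not Wavefront's alerting engine: it assumes one condition value per minute, and the function name and the `fires_min` and `resolves_min` parameters are hypothetical. The state names CHECKING and FIRING are borrowed from this page; using CHECKING for a resolved alert is an assumption.

```python
from typing import List


def alert_states(condition: List[bool], fires_min: int, resolves_min: int) -> List[str]:
    """Return the alert state after each one-minute sample of the condition.

    Simplified model: the alert starts firing once the condition has been true
    for `fires_min` consecutive minutes, and stops firing once the condition
    has been false for `resolves_min` consecutive minutes.
    """
    states = []
    firing = False
    true_run = 0   # consecutive minutes the condition has been true
    false_run = 0  # consecutive minutes the condition has been false
    for value in condition:
        if value:
            true_run += 1
            false_run = 0
        else:
            false_run += 1
            true_run = 0
        if not firing and true_run >= fires_min:
            firing = True
        elif firing and false_run >= resolves_min:
            firing = False
        states.append("FIRING" if firing else "CHECKING")
    return states


samples = [False, True, True, True, False, False, True, True]
print(alert_states(samples, fires_min=2, resolves_min=2))
```

In this model a single false minute does not end a firing, because the condition has to stay false for the whole resolve period.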
View this video for an overview: Monitoring Your Data With Alerts
Wavefront can display actual firings or hypothetical alert-generated events using backtesting. Backtesting enables you to fine-tune new or existing alert conditions before you save them.
Backtesting does not always match actual alert firings exactly. For example, if data arrives late, backtest events won’t match the actual firings. And even if data meets the alert condition for the full “condition is true for x mins” period, the alert itself might not fire because the alert check, whose timing is determined by the alert check interval, happens too soon or too late. In both cases, backtesting shows the alert as firing while the actual alert might not fire.
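The two mismatch cases can be illustrated with a small simulation. This is only a sketch under stated assumptions: the condition is sampled once per minute, a backtest sees all data and evaluates every minute, and a live check sees only the data that has arrived by the time it runs. All function names and parameters here are hypothetical and are not part of any Wavefront API.

```python
from typing import Dict, List, Tuple

# (minute the value describes, minute the value arrived, condition value)
Sample = Tuple[int, int, bool]


def window_all_true(values: Dict[int, bool], end: int, fires_min: int) -> bool:
    """True if every minute of the window ending at `end` is known and true."""
    return all(values.get(m, False) for m in range(end - fires_min + 1, end + 1))


def backtest_firings(samples: List[Sample], fires_min: int, horizon: int) -> List[int]:
    """Evaluate every minute against the complete data, as a backtest would."""
    values = {t: v for t, _, v in samples}
    return [t for t in range(horizon) if window_all_true(values, t, fires_min)]


def actual_firings(samples: List[Sample], fires_min: int, horizon: int,
                   check_interval: int, first_check: int = 0) -> List[int]:
    """Evaluate only at check times, using only the data that has arrived by then."""
    fired = []
    for now in range(first_check, horizon, check_interval):
        visible = {t: v for t, arrived, v in samples if arrived <= now}
        if window_all_true(visible, now, fires_min):
            fired.append(now)
    return fired


# The condition is true for minutes 3, 4 and 5 only.
on_time = [(t, t, 3 <= t <= 5) for t in range(10)]
# Same data, but the value for minute 5 arrives two minutes late.
late = [(t, 7 if t == 5 else t, 3 <= t <= 5) for t in range(10)]

print(backtest_firings(on_time, fires_min=3, horizon=10))
print(actual_firings(on_time, fires_min=3, horizon=10, check_interval=4))
print(actual_firings(late, fires_min=3, horizon=10, check_interval=1))
```

With a check every 4 minutes, no check lands on the one minute where the full window is true, and with late data the window is never complete at check time. In both runs the backtest reports a firing at minute 5 while the simulated live alert reports none.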
Creating an Alert
To create an alert:
- Do one of the following:
- Alerts browser - Select Alerts and click the Create Alert button located at the top of the filter bar.
- Chart - Hover over a query field and click the Create Alert link below the query field.
- Fill in the following required and recommended alert properties.
Property - Description
Name - Name of the alert. The name must contain 1-255 characters. Pick a name that makes it easy to identify the alert's purpose.
Condition - A conditional ts() expression that defines the threshold for the alert. The condition expression can include any valid Wavefront Query Language construct. The condition expression coupled with the Alert fires setting determines when the alert fires.
- Alert fires - Length of time (in minutes) during which the Condition expression must be true before the alert fires. Minimum is 1. For example, if you enter 5, the alerting engine reviews the value of the condition during the last 5-minute window to determine whether the alert should fire.
- Alert resolves - Length of time (in minutes) during which the Condition expression must be false before the alert switches to resolved. Minimum is 1. Omit this setting to use the Alert fires setting. Pick a value that is greater than or equal to the Alert fires value to avoid potential chains of resolve-fire cycles.
Display Expression - Recommended. A ts() expression that returns the data you want to inspect when the alert fires. The display expression can include any valid Wavefront Query Language construct, and typically captures the underlying time series being tested by the condition expression. The results of the display expression are:
- Shown in the Events Display preview chart on the page for creating or editing the alert.
- Shown in any chart image that is included in a notification triggered by the alert.
- Shown in the interactive chart you can visit from a notification triggered by the alert.
- Used as the basis for any statistics that you might include in a custom notification triggered by the alert.
Severity - How important the alert is. In decreasing importance: SEVERE, WARN, SMOKE, and INFO.
Target List - Targets to notify when the alert changes state, for example, from CHECKING to FIRING, or when the alert is snoozed. You can specify up to ten different targets across the following types. Use commas to separate targets of the same type.
- Email - Valid email addresses. Alert notifications are sent to these addresses in response to a default set of triggering events, and contain default HTML-formatted content.
- PagerDuty Key - PagerDuty keys obtained by following the steps for the PagerDuty integration. Alert notifications that use these keys are sent in response to a default set of triggering events, and contain default content.
- Alert Target - Names of custom alert targets that you have previously created to:
- Configure webhook notifications for pager services and communication channels. Follow the steps for the VictorOps integration, Slack integration, or HipChat integration for notifications on these popular messaging platforms.
- Configure email or PagerDuty notifications with nondefault content or triggers.
- Optionally fill in the following additional alert properties.
Property - Description
Events Display - Whether to display actual or hypothetical alert firing event icons on the preview chart.
- Actual Firings (existing alerts only) - Displays past alert-generated event icons on the chart. You will see how often the alert actually fired within the given chart time window.
- Backtesting - Displays hypothetical alert-generated event icons on the chart. You will see how often the alert would hypothetically fire within the given chart time window, based on the conditional threshold and the Alert fires field. See Backtesting above.
Additional Information - Any additional information, such as a link to a run book.
Tags - Tags assigned to the alert. You can enter existing alert tags or create new alert tags. See Organizing with Tags.
- Optionally click the Advanced link to configure the following alert properties:
Property - Description
Checking Frequency - Number of minutes between checking whether Condition is true. Minimum and default is 1. When an alert is in the INVALID state, it is checked approximately every 15 minutes, instead of the specified checking frequency.
Resend Notifications - Whether to resend notification of a firing alert. If enabled, you can specify the number of minutes to wait before resending the notification.
Metrics - Whether to include obsolete metrics. If enabled, the alert considers metrics that have not reported for 4 weeks or more. Customers who use queries that aggregate data over longer timeframes sometimes want to include those older metrics.
- Click Save.
Watch this video to see the process: Creating an Alert
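The constraints listed in the property tables above can be collected into a small pre-flight check. The sketch below is purely illustrative: `AlertDraft` and its field names are hypothetical and do not correspond to a Wavefront API, and the metric name in the example condition is a placeholder. Only the limits themselves (name length, minimum of 1 minute, up to ten targets, the four severities) come from this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SEVERITIES = ("SEVERE", "WARN", "SMOKE", "INFO")  # in decreasing importance
MAX_TARGETS = 10


@dataclass
class AlertDraft:
    """Hypothetical container for the alert properties described above."""
    name: str
    condition: str
    severity: str
    fires_minutes: int
    resolves_minutes: Optional[int] = None  # None means "use the Alert fires setting"
    display_expression: Optional[str] = None
    targets: List[str] = field(default_factory=list)
    checking_frequency: int = 1

    def effective_resolves_minutes(self) -> int:
        if self.resolves_minutes is None:
            return self.fires_minutes
        return self.resolves_minutes

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means none were found."""
        found = []
        if not 1 <= len(self.name) <= 255:
            found.append("Name must contain 1-255 characters")
        if self.severity not in SEVERITIES:
            found.append("Severity must be one of " + ", ".join(SEVERITIES))
        if self.fires_minutes < 1:
            found.append("Alert fires must be at least 1 minute")
        if self.resolves_minutes is not None and self.resolves_minutes < 1:
            found.append("Alert resolves must be at least 1 minute")
        if self.effective_resolves_minutes() < self.fires_minutes:
            found.append("Alert resolves is shorter than Alert fires; "
                         "resolve-fire cycles are possible")
        if len(self.targets) > MAX_TARGETS:
            found.append("At most ten targets can be specified")
        if self.checking_frequency < 1:
            found.append("Checking Frequency must be at least 1 minute")
        return found


good = AlertDraft(name="High load average", condition="ts(example.cpu.loadavg) > 4",
                  severity="WARN", fires_minutes=5, targets=["ops@example.com"])
bad = AlertDraft(name="", condition="ts(example.cpu.loadavg) > 4",
                 severity="CRITICAL", fires_minutes=5, resolves_minutes=2)
print(good.problems())
print(bad.problems())
```

The check on Alert resolves is reported as a problem here for simplicity, although the page presents it as a recommendation rather than a hard limit.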
Viewing Alerts and Alert History
To view alerts, click the Alerts button or select Browse > Alerts. A list of alerts displays. Here’s an example that shows the alert described in Tutorial: Getting Started while it is firing:
To view alert details, click the chart icon in the State column. A chart displays with two queries:
- <Alert name> - the alert condition.
- Past Firings - an events() query that shows past firings of the alert.
For example, for the alert shown above, the chart displays:
The Firings column shows how many times an alert changed from non-firing to firing in the last day, week, and month.
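As a rough illustration of what the Firings column counts, the following sketch tallies transitions from a non-firing state to FIRING within the last day, week, and month. The history format and function names are hypothetical, and treating a month as 30 days is an assumption made only for this example.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Chronologically ordered (timestamp, state) observations of one alert.
History = List[Tuple[datetime, str]]


def firing_times(history: History) -> List[datetime]:
    """Timestamps at which the state changed from non-firing to FIRING."""
    return [when
            for (_, before), (when, after) in zip(history, history[1:])
            if before != "FIRING" and after == "FIRING"]


def firings_summary(history: History, now: datetime) -> Dict[str, int]:
    """Count firings in the trailing day, week, and (assumed 30-day) month."""
    windows = {
        "day": timedelta(days=1),
        "week": timedelta(weeks=1),
        "month": timedelta(days=30),
    }
    times = firing_times(history)
    return {label: sum(1 for t in times if now - span <= t <= now)
            for label, span in windows.items()}


now = datetime(2024, 1, 31, 12, 0)
history = [
    (datetime(2024, 1, 1, 12, 0), "CHECKING"),
    (datetime(2024, 1, 10, 12, 0), "FIRING"),
    (datetime(2024, 1, 10, 13, 0), "CHECKING"),
    (datetime(2024, 1, 28, 12, 0), "FIRING"),
    (datetime(2024, 1, 28, 12, 30), "FIRING"),  # still firing, not a new firing
    (datetime(2024, 1, 29, 12, 0), "CHECKING"),
    (datetime(2024, 1, 31, 9, 0), "FIRING"),
]
print(firings_summary(history, now))
```

An alert that stays in FIRING across several observations counts once, because only the change into FIRING is counted.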
Alert history shows the changes that have been made to an alert over time. To access the alert history, click the three dots to the left of the alert on the Alerts page and click Versions. Alert history shows:
- Which user made the changes.
- The date and time the changes were made.
- A description of the changes.
You can revert to or clone a past alert version.
Editing an Alert
You can change an alert at any time.
- Click the Alerts button or select Browse > Alerts to display the Alerts page.
- Click the name of the alert you want to change to display the Edit Alert page.
- Update the properties you want to change, and click Save.
An alert reports state changes by sending notifications to one or more alert targets. Each notification contains information extracted from the alert about its state change.
The timing of an alert notification depends on the alert target:
- For simple targets (email addresses and PagerDuty keys added directly in the alert’s Target List), a notification is sent whenever the alert is firing, updated, resolved, snoozed or in a maintenance window.
- For custom alert targets, a notification is sent in response to each triggering event that is specified for the target.
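The difference between the two kinds of targets can be sketched as a lookup of which events trigger a notification. The event labels, the `Target` class, and `should_notify` are hypothetical names chosen for this example; they are not Wavefront identifiers.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Events that notify simple targets, per the list above (labels are assumptions).
SIMPLE_TARGET_EVENTS = frozenset(
    {"firing", "updated", "resolved", "snoozed", "in_maintenance"}
)


@dataclass(frozen=True)
class Target:
    address: str
    # None marks a simple target (email address or PagerDuty key in the Target List).
    # A custom alert target carries its own set of triggering events.
    triggers: Optional[FrozenSet[str]] = None


def should_notify(target: Target, event: str) -> bool:
    """Decide whether `event` results in a notification to `target`."""
    triggers = SIMPLE_TARGET_EVENTS if target.triggers is None else target.triggers
    return event in triggers


email = Target("ops@example.com")
webhook = Target("chat-webhook", triggers=frozenset({"firing", "resolved"}))

for event in ("firing", "snoozed"):
    print(event, should_notify(email, event), should_notify(webhook, event))
```

Here the custom target is configured for only two triggering events, so it stays silent when the alert is snoozed while the simple email target is still notified.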
Sample Alert Notification
If you have specified your email address as the alert target, you receive an email like the following whenever the alert fires:
Chart Images in Alert Notifications
When an alert starts firing or is updated, the resulting alert notification can include an image of a chart showing data at the time the alert was triggered. The sample email notification above includes the following chart image:
Chart images show the results of an alert’s display expression. If you have set the alert’s Display Expression field, the chart image provides a snapshot of the time series being tested by the alert.
A chart image is a static snapshot that captures the state of the data at the time the alert was triggered. Such a snapshot can be helpful for diagnosing a possible misfiring alert, because the chart image can show you the exact state of the data that caused the alert to fire. (In contrast, an interactive chart viewed through the notification shows the data at the time you bring up the chart, which might include data that was backfilled after a delay.)
For performance reasons, a chart image is included only if the alert’s conditional query takes a minute or less to return. The chart image can take a few seconds to create, so you might briefly see a placeholder image in the notification.
Chart images are automatically included in notifications for:
- Simple alert targets (email addresses and PagerDuty keys that are added directly in the alert’s target list).
- Custom alert targets for PagerDuty notifications.
- (Version 2018-26.x and later) Predefined templates for custom HTML email targets and for Slack targets.
You can optionally include chart images in notifications for custom alert targets for other messaging platforms.
Note If you created a custom alert target before 2018-26.x and you want to include chart images in notifications to that target, you must edit the alert target’s template. See Adding Chart Images to Older Custom Alert Targets for sample setup instructions for updating an email alert target.
(Version 2018-26.x and later) You can exclude chart images from notifications to custom HTML email or Slack targets by removing the corresponding variable from their templates. You cannot remove chart images from custom PagerDuty alert targets.
Interactive Charts Linked by Alert Notifications
An alert notification includes a URL that links to an interactive chart showing data at the time the alert was triggered. The sample email notification above displays the URL as a View Alert button that you can click to see the following interactive chart:
The interactive chart viewed through an alert notification shows the results of the alert’s display expression. If you have set the alert’s Display Expression field, the interactive chart shows the time series being tested by the alert. Depending on the state change that triggered the alert, the interactive chart can display additional queries for alert events and alert metrics:
- <Alert name> - The display expression if one was specified. Otherwise, the condition expression.
- Alert Condition - The condition expression, if the display expression is shown.
- Alert Firings - An events() query that shows events of type alert for the alert. These system events occur whenever the alert is opened. The query shows both the current firing (an ongoing event) and any past firings (ended events).
- Alert Details - An events() query that shows events of type alert-detail for the alert. These system events occur whenever the alert is updated (continues firing while an individual time series changes from recovered to failing, or from failing to recovered).
- Alert Data - A query for alert metrics. These metrics are shown when the alert is open or updated.
Interactive charts enable you to investigate your data by performing additional queries, changing the time window, and so on.
Note that interactive charts always show the current state of your data as of the time you bring up the chart, which could be somewhat later than the event that triggered the alert. Consequently, although the interactive chart is set to a custom date showing the time window in which the alert was triggered, it could be backfilled with data values that were reported during that time window, but were not ingested until later. The presence of delayed and then backfilled data could obscure the reason why the alert fired. If you suspect a misfiring alert, you can inspect a chart image included in the notification.
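The difference between a static chart image and an interactive chart can be pictured as the same query run at two different times. The sketch below assumes each data point records both the minute it describes and the minute it was ingested; the `Point` layout and the `visible_series` function are hypothetical and only illustrate the backfill effect.

```python
from typing import Dict, List, Tuple

# (minute the value describes, minute the value was ingested, value)
Point = Tuple[int, int, float]


def visible_series(points: List[Point], window: Tuple[int, int], as_of: int) -> Dict[int, float]:
    """Values inside `window` that had been ingested by minute `as_of`."""
    start, end = window
    return {t: v for t, ingested, v in points
            if start <= t <= end and ingested <= as_of}


points = [
    (0, 0, 10.0),
    (1, 1, 12.0),
    (2, 6, 95.0),  # describes minute 2 but was not ingested until minute 6
    (3, 3, 11.0),
]
window = (0, 3)

# Chart image: the data as it looked when the alert was triggered at minute 3.
snapshot = visible_series(points, window, as_of=3)
# Interactive chart: same time window, opened later at minute 10.
interactive = visible_series(points, window, as_of=10)

print(snapshot)
print(interactive)
```

The later view contains the backfilled value for minute 2 that the alert check never saw, which is why the chart image is the better record when investigating a suspected misfire.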