Legacy monitoring systems are limited to simple, threshold-based alerts. With Wavefront, you can create smart alerts that dynamically filter noise and capture true anomalies. The end result is fewer false alarms and faster remediation when real issues occur.
How Alerts Work
An alert defines:
- The conditions under which metric values indicate a system problem, and
- One or more targets to notify when the condition evaluates to true or false for a specified period of time.
An alert fires when a metric reaches a value that indicates a problem.
Wavefront can display actual firings or hypothetical alert-generated events using backtesting. Backtesting enables you to fine-tune new or existing alert conditions before you save them.
Backtesting does not always exactly match actual alert firings. For example, if data comes in late, backtest events won’t match the actual firings. And even if the data meets the alert condition for the full “condition is true for x mins” period, the alert itself might not fire because the alert check, whose timing is determined by the alert check interval, happens too soon or too late. In both cases, backtesting shows the alert as firing while the actual alert might not fire.
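The check-interval mismatch described above can be illustrated with a small simulation. This is a simplified model and not Wavefront's actual evaluation logic: it assumes one condition value per minute, treats the alert as firing only when every minute in the window is true, and the function names (`window_all_true`, `backtest_fires`, `checked_fires`) are hypothetical.

```python
from typing import List


def window_all_true(condition: List[bool], end: int, fires_minutes: int) -> bool:
    """Return True if the condition held for each of the `fires_minutes`
    minutes ending at index `end` (assumed, simplified firing rule)."""
    start = end - fires_minutes + 1
    if start < 0:
        return False  # not enough history yet
    return all(condition[start:end + 1])


def backtest_fires(condition: List[bool], fires_minutes: int) -> List[int]:
    """Hypothetical firings: evaluate the window at every minute."""
    return [t for t in range(len(condition))
            if window_all_true(condition, t, fires_minutes)]


def checked_fires(condition: List[bool], fires_minutes: int,
                  check_interval: int, first_check: int = 0) -> List[int]:
    """Firings seen by a checker that only runs every `check_interval` minutes."""
    return [t for t in range(first_check, len(condition), check_interval)
            if window_all_true(condition, t, fires_minutes)]


# The condition is true for minutes 3 through 7 (exactly 5 minutes).
condition = [False] * 3 + [True] * 5 + [False] * 4

backtest = backtest_fires(condition, fires_minutes=5)
actual = checked_fires(condition, fires_minutes=5, check_interval=3)

print(backtest)  # [7]
print(actual)    # []
```

In this model the backtest reports a firing at minute 7, but a check that runs at minutes 0, 3, 6, and 9 never sees a full five-minute window of true values, so no firing is recorded.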
Creating an Alert
To create an alert:
- Do one of the following:
  - Alerts browser - Select Alerts and click the Create Alert button located at the top of the filter bar.
  - Chart - Hover over a query field and click the Create Alert link below the query field.
- Fill in the alert properties.
  - Events Display - Whether to display actual or hypothetical alert firing event icons on the preview chart.
    - Actual Firings (existing alerts only) - Displays past alert-generated event icons on the chart. You will see how often the alert actually fired within the given chart time window.
    - Backtesting - Displays hypothetical alert-generated event icons on the chart. You will see how often an alert hypothetically would fire within the given chart time window based on the conditional threshold and the Alert fires field. See Backtesting above.
  - Name - Name of the alert. The name must contain 1-255 characters. Pick a name that makes it easy to identify the alert's purpose.
  - Condition - A conditional ts() expression that defines the threshold for the alert. You can use any valid Wavefront Query Language construct in the expression. The expression coupled with the Alert fires setting determines when the alert fires.
    - Alert fires - Length of time during which the Condition expression must be true before the alert fires. The minimum number of minutes is 1. For example, if you enter 5, the alerting engine reviews the value of the Condition during the last 5 minute window to determine if the alert should fire or not.
    - Alert resolves - Length of time during which the Condition expression must be false before the alert switches to resolved. The minimum number of minutes is 1. If you don't specify a time, defaults to the Alert fires setting.
    - Note: Setting Alert resolves to a value that is lower than Alert fires can result in multiple resolve-fire cycles under certain circumstances.
  - Display Expression - Optional. The query that is sent to targets when notified of alert state changes. Use this field to show a more helpful query, for example, the underlying time series. If not set, the query sent is the expression in the Condition field.
  - Severity - How important the alert is. In decreasing importance: SEVERE, WARN, SMOKE, and INFO.
  - Targets - Targets to notify when the alert changes state. For example, notifications are sent when alert state changes from FIRING to CHECKING, and when an alert is snoozed. A list of: ten different email addresses, pager services such as PagerDuty and VictorOps, communication channels such as Slack and HipChat, and webhooks separated by commas. See Using Alert Targets for details.
  - Additional Information - Any additional information, such as a link to a run book.
  - Tags - Tags assigned to the alert. You can enter existing alert tags or create new alert tags. See [Organizing with Tags](tags_overview.html).
  - Checking Frequency - Number of minutes between checking whether Condition is true. Minimum and default is 1. When an alert is in the INVALID state, it is checked approximately every 15 minutes, instead of the specified checking frequency.
  - Resend Notifications - Whether to resend notification of a firing alert. If enabled, you can specify the number of minutes to wait before resending the notification.
  - Metrics - Click the Obsolete Metrics check box to include metrics that did not report for 4 weeks or more. Customers who use queries that aggregate data in longer timeframes sometimes want to include those older metrics.
- Click Save.
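The constraints on the alert properties can be summarized in a short sketch. This is an illustration only, not a Wavefront API: the `AlertDraft` class, its field names, and the metric name `example.cpu.usage` are hypothetical, and the checks simply restate the limits given in the property descriptions (name length, one-minute minimums, severity values, and the Alert resolves default).

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Severity levels in decreasing importance, as listed in the property descriptions.
SEVERITIES = ("SEVERE", "WARN", "SMOKE", "INFO")


@dataclass
class AlertDraft:
    """Hypothetical container for alert properties (not a Wavefront type)."""
    name: str
    condition: str
    fires_minutes: int
    severity: str
    resolves_minutes: Optional[int] = None
    targets: List[str] = field(default_factory=list)
    checking_frequency_minutes: int = 1  # minimum and default is 1

    def __post_init__(self) -> None:
        # If no resolve time is given, it defaults to the Alert fires setting.
        if self.resolves_minutes is None:
            self.resolves_minutes = self.fires_minutes

    def problems(self) -> List[str]:
        """Return a list of human-readable issues; empty if none were found."""
        found = []
        if not 1 <= len(self.name) <= 255:
            found.append("Name must contain 1-255 characters")
        for label, minutes in (
            ("Alert fires", self.fires_minutes),
            ("Alert resolves", self.resolves_minutes),
            ("Checking Frequency", self.checking_frequency_minutes),
        ):
            if minutes < 1:
                found.append(f"{label} must be at least 1 minute")
        if self.severity not in SEVERITIES:
            found.append(f"Severity must be one of {', '.join(SEVERITIES)}")
        if self.resolves_minutes < self.fires_minutes:
            found.append("Alert resolves is lower than Alert fires; "
                         "this can cause multiple resolve-fire cycles")
        return found


draft = AlertDraft(
    name="High CPU",
    condition='ts("example.cpu.usage") > 80',  # hypothetical metric name
    fires_minutes=5,
    severity="WARN",
)
print(draft.resolves_minutes)  # 5
print(draft.problems())        # []

bad = AlertDraft(name="", condition=draft.condition, fires_minutes=5,
                 severity="CRITICAL", resolves_minutes=2)
for problem in bad.problems():
    print(problem)
```

The second draft reports three issues: an empty name, an unknown severity, and an Alert resolves value lower than Alert fires.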
Viewing Alerts and Alert History
To view alerts, click the Alerts button or select Browse > Alerts. A list of alerts displays. Here’s an example that shows the alert described in Tutorial: Getting Started when it fires:
To view alert details, click the icon in the State column. A chart displays with two queries:
- <Alert name> - the alert condition.
- Past Firings - an events() query that shows past firings of the alert.
For example, for the alert shown above, the chart displays:
The Firings column shows how many times an alert changed from non-firing to firing in the last day, week, and month.
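As a rough illustration of what the Firings column counts, the sketch below tallies transitions from a non-firing state to FIRING in a sequence of observed states. The function name `count_firings` and the sample sequence are hypothetical; only the state names FIRING and CHECKING come from this page.

```python
from typing import Sequence


def count_firings(states: Sequence[str]) -> int:
    """Count how many times the state changes from non-firing to FIRING."""
    return sum(
        1
        for previous, current in zip(states, states[1:])
        if previous != "FIRING" and current == "FIRING"
    )


# Hypothetical sequence of alert states observed over time.
observed = ["CHECKING", "FIRING", "FIRING", "CHECKING", "FIRING", "CHECKING"]
print(count_firings(observed))  # 2
```

Consecutive FIRING observations count once, because only the change into the firing state is counted.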
Alert history shows the changes that have been made to an alert over time. To access the alert history, select > Versions from the menu located to the right of an alert on the Alerts page. Alert history shows:
- Which user made the changes.
- The date and time the changes were made.
- A description of the changes.
You can revert to or clone a past alert version. Alert history was implemented in Q4 of 2015, so you won’t see history from before Q4 of 2015, even for alerts that were created earlier.
Alert Notifications and Alerts
When an alert changes state, a notification containing alert information and a link to a chart is sent to the alert targets.
- You can add simple targets (email and PagerDuty) directly in the alert’s Targets field.
- You can explicitly create an alert target and add that target to your alert.
For example, if you have configured an email address as the alert target, you receive an email like the following whenever the alert fires, adds or removes an affected source, resolves, or is updated:
When you click the link in the notification, you see the following queries:
- <Alert name> - the alert condition.
- Alert Firings - an events() query that shows events of type alert for the alert. These events mark when the alert opens and when it resolves.
- Alert Details - an events() query that shows events of type alert-detail for the alert. These events mark when sources start failing or recover.
- Alert Data - a query for alert metrics.