Queries for common alert scenarios

The Wavefront Customer Success team has found that customers use certain alerts frequently. For example, customers want to alert on point rate drops or on between specific times.

Note: For improved legibility, we’ve included line breaks in some of the query examples.

Alert on Point Rate Drop

Define an alert that compares the number of data points reported in the last time window to the number of data points reported in the window before that. If the last time window is X percent less, return 1 to alert.

For example, here’s the query for an alert that fires if the number of processes for app-5 drops by 5% in a 30 minute time window:

mcount(30m, (ts(~sample.process.num, source="app-5"))) <
mcount(30m,lag(30m,(ts(~sample.process.num, source="app-5")))) * 5/100

Alert on Missing Data

Define an alert that compares the number of data points reported in the last time window to the number of data points reported in the window before that. An alert like that is useful for example, if you want to be notified when a service stops sending points.

For example, here’s the query for an alert that fires if app-2 sent bytes more than 30 seconds ago, but stopped sending bytes in the last 30 seconds.

mcount(30s, (ts(~sample.network.bytes.sent, source=app-2))) = 0 and mcount(30s,lag(30s,(ts(~sample.network.bytes.sent, source=app-2)))) != 0

See Also: Alerting on Missing Data

Alert on Exceeding a Threshold

Define an alert that fires when data exceeds a threshold.

For example, here’s the query for an alert that fires if the number of sample processes for app-5 is greater than 100.

(ts(~sample.process.num, source="app-5")) > 100

You can use a query like this with a classic alert or with threshold alert. A threshold alert allows you to set multiple thresholds associated with different severities.

Alert Only Between Specific Times

Define an alert that fires only between specified times.

For example, here’s the query for an alert that fires only on Thursdays between 8 and 5 PST.

(ts(~sample.process.num, source="app-5")) > 100
and between(hour("US/Pacific"),8,5)
and between(weekday("US/Pacific"),4,4)

The following diagram shows the corresponding query in a chart.

alert times

Alert When There Are More than X Points in a Time Window

Define an alert when there are more than a specified number of points in a specified time window.

For example, here’s the query for an alert that fires if the number of sample processes for app-5 is more than 100 in a 30 minute time window.

mcount(30m, (ts(~sample.process.num, source="app-5"))) > 100

Alert on Wavefront Proxy

The data from agents such as collectd, Telegraf, etc. are sent to the Wavefront proxy and the proxy pushes the data to the Wavefront collector service. Make sure that the proxy checks in with Wavefront and that data is being pushed to the collector. You can set up the following alert to monitor the proxy:

mcount(5m,sum(rate(ts(~agent.check-in)), sources))=0 and mcount(1h, sum(rate(ts(~agent.check-in)), sources)) !=0

This query uses the ~agent.check-in metric to verify that the agents are reporting. By applying a second argument to the alert query, you capture any time series that reported at least 1 value in the last hour and that stopped reporting in the last 5 minutes.

Examine the Wavefront System Usage dashboard for your instance for proxy monitoring examples.