Custom Threshold

What this alert condition does

Lets you trigger an alert by comparing one signal to another or by evaluating multiple conditions.

When to use this alert condition

Use Custom Threshold if you want to create a detector that maps to one of the following patterns:

  • You need to be alerted if one signal meets a condition based on the value of another signal
  • You want to specify compound conditions using AND and OR operators, based on the value of one signal
  • You want to specify compound conditions using AND and OR operators, based on the values of multiple signals

Using compound conditions

When you are on the Alert Settings tab, you can click Add another condition to create a compound condition using AND and OR operators. You can add a total of 10 conditions.

When specifying compound conditions, AND conditions are applied before OR conditions. To ensure that the conditions are evaluated as required, you can select options from a condition’s Actions menu to arrange them into the appropriate order. (You can also remove a condition from a condition’s Actions menu.)

Note that for a compound condition to trigger an alert, all the values involved in the condition must be non-null.
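The evaluation order and the non-null requirement described above can be sketched in a few lines of Python. This is only an illustrative model, not the product's implementation: the function name `evaluate_compound` and its list-based inputs are hypothetical, and treating any null value as "do not trigger" is an assumption about how the non-null rule applies.

```python
from typing import Optional, Sequence


def evaluate_compound(values: Sequence[Optional[bool]],
                      operators: Sequence[str]) -> bool:
    """Evaluate cond0 op0 cond1 op1 cond2 ... with AND applied before OR.

    values[i] is True or False for condition i, or None when a signal value
    involved in that condition is null (assumed representation).
    """
    if len(operators) != len(values) - 1:
        raise ValueError("need exactly one operator between each pair of conditions")

    # Assumption: a null anywhere means the compound condition cannot trigger.
    if any(v is None for v in values):
        return False

    # Split the flat list into groups joined by AND; OR starts a new group.
    groups = [[values[0]]]
    for op, value in zip(operators, values[1:]):
        if op == "AND":
            groups[-1].append(value)
        elif op == "OR":
            groups.append([value])
        else:
            raise ValueError(f"unknown operator: {op}")

    # The compound condition holds if every condition in at least one group holds.
    return any(all(group) for group in groups)
```

For example, `evaluate_compound([True, True, False], ["OR", "AND"])` is read as "a OR (b AND c)" and returns `True`, whereas a strict left-to-right reading, "(a OR b) AND c", would give `False`.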

If you need to build more complex conditions than this alert condition supports, such as “a AND (b OR c) AND d”, or “a AND NOT b”, you can do so by using the SignalFx v2 API to create the detector.

Examples: Single condition, comparing signals

  • You want to be alerted when the number of cache-misses is higher than the number of cache-hits for 1 minute. In this case, you could use cache-misses as the signal to monitor, Above as the option for Alert when, cache-hits as the threshold, and a Duration of 1 minute as the option for Trigger sensitivity.
  • You have 3 signals, each of which measures maximum latency for a single AWS availability zone. You have had problems with one of the zones in the past, and you want to be alerted when that signal is outside the range of the other 2 signals. In this case, you would use the troublesome zone as the signal to monitor, Out of range as the option for Alert when, and the other two signals for the lower and upper thresholds.
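As a rough illustration of the two examples above, the following Python sketch compares one datapoint of the monitored signal with thresholds taken from other signals. The function name `meets_condition`, the lowercase option strings, and the sample values are hypothetical, and whether the range boundaries are inclusive is an assumption.

```python
from typing import Optional


def meets_condition(alert_when: str,
                    value: float,
                    threshold: Optional[float] = None,
                    lower: Optional[float] = None,
                    upper: Optional[float] = None) -> bool:
    """Return True if a single datapoint meets the chosen 'Alert when' option."""
    if alert_when in ("above", "below"):
        if threshold is None:
            raise ValueError("threshold is required for above/below")
        return value > threshold if alert_when == "above" else value < threshold

    if alert_when in ("out_of_range", "within_range"):
        if lower is None or upper is None:
            raise ValueError("lower and upper are required for range options")
        # Assumption: the range includes both boundary values.
        inside = lower <= value <= upper
        return inside if alert_when == "within_range" else not inside

    raise ValueError(f"unknown option: {alert_when}")


# Cache example: misses (monitored signal) compared with hits (threshold).
cache_alert = meets_condition("above", value=120, threshold=100)

# Latency example: the troublesome zone compared with the other two zones.
zone_alert = meets_condition("out_of_range", value=250, lower=180, upper=210)
```

In a real detector the threshold values come from the other signals at each point in time, and the Trigger sensitivity setting decides how long the condition must hold before an alert is triggered.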

Example: Compound conditions, monitoring a single signal

The following example shows how you might build compound conditions while monitoring a single signal.

You have 2 signals (A and B); A measures available memory in prod and B measures available memory in lab. You want to receive an alert:

  • if available memory in prod is lower than available memory in lab, or
  • if available memory in prod is below 50%

In this case, you would monitor a single signal (A, available memory in prod) and then set the following conditions:

  • alert when signal A is below B OR
  • signal A is below 50

Examples: Compound conditions, monitoring multiple signals

The following examples show how you might build compound conditions while monitoring multiple signals.

  • You have 3 signals (A, B, and C), each of which measures available memory in a particular environment (prod, lab, or dev respectively). You want to receive an alert:

    • if available memory in prod goes below 70%, or
    • if available memory in lab and in dev both go below 70%

    In this case, you would monitor multiple signals and then set the following conditions:

    • alert when signal A is below 70 OR
    • signal B is below 70 AND
    • signal C is below 70

    Tip

    Remember that, as noted above, AND conditions are always evaluated before OR conditions.

  • In your organization, one group is responsible for monitoring the health of a cluster while another monitors the health of individual nodes. You don’t want to trigger alerts for individual nodes when the cluster itself is unhealthy.

    Assuming A is a metric for cluster health and B is a metric for node health, you could create two detectors:

    • One detector monitors signal A and triggers alerts when A is unhealthy.
    • Another detector monitors multiple signals, and has the following conditions:
      • alert when A is healthy AND
      • B is unhealthy

Settings

| PARAMETER | VALUES | USAGE NOTES |
| --- | --- | --- |
| Alert when | Above, Below, Out of Range, Within Range | none |
| Threshold, Lower threshold, Upper threshold | Static value (see Static Threshold for acceptable values) or another signal. | Static value is designed to be used as an option when you use the Out of Range or Within Range settings. For example, you might want to be alerted when the signal is between the value of another signal and a static value of 80. Using a static value with Above or Below is the same as using the Static Threshold condition. |
| Trigger sensitivity | Immediately, Duration, Percent of duration | Immediately triggers an alert as soon as the threshold is met. Duration triggers when the signal meets and remains at threshold condition for a specified period, such as 10 minutes. If it is normal for a signal to rise and fall rapidly, using this option reduces flappiness. For an alert to trigger with this option, there can be no missing datapoints during the duration. Percent of duration triggers based on the number of datapoints that met the threshold during the specified duration. For more information on these options, see Specifying how quickly to trigger an alert. |
| Duration | Integer >= 1, followed by time indicator (s, m, h, d, w), e.g. 30s, 10m, 2h, 5d, 1w | The amount of time the signal must meet the threshold condition. Longer time periods result in lower sensitivity and potentially fewer alerts. |
| Percent of duration | Percentage: Integer between 1 and 100; Duration: Integer >= 1, followed by time indicator (s, m, h, d, w), e.g. 30s, 10m, 2h, 5d, 1w | The percentage of times the threshold was met during the specified duration. |
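The duration format listed above (an integer of at least 1 followed by s, m, h, d, or w) can be checked with a small parser. The Python sketch below is illustrative only: `parse_duration` is a hypothetical helper, and rejecting leading zeros and surrounding text is an assumption rather than documented behavior.

```python
import re

# Seconds per time indicator, as listed in the Settings table.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_duration(text: str) -> int:
    """Convert a duration such as '30s' or '10m' into a number of seconds."""
    # An integer >= 1 (no leading zero, by assumption) followed by one unit letter.
    match = re.fullmatch(r"([1-9]\d*)([smhdw])", text.strip())
    if match is None:
        raise ValueError(f"not a valid duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]
```

With this helper, `parse_duration("10m")` gives 600 seconds, while inputs such as `"0m"` or `"1.5h"` raise `ValueError`.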

Specifying how quickly to trigger an alert

As you might expect, choosing Immediately for Trigger Sensitivity means that an alert will be triggered as soon as the signal meets the threshold. This option is the most sensitive (may trigger the most alerts) of the three trigger sensitivity options.

Depending on the nature of your signal, triggering alerts immediately can lead to flappiness. In these cases, you can choose one of the other options, Duration or Percent of duration.

The Duration option triggers when the signal meets the threshold condition and continues to meet it for a specified period, such as 10 minutes. This option is therefore less sensitive (may trigger fewer alerts) than the Immediately option. With this option, an alert is not triggered if any datapoints are delayed or do not arrive at all during that period, even if every datapoint that is received meets the threshold. (For more information about delayed or missing datapoints, see Handling delayed or missing datapoints.)

If you want an option that could trigger even if some datapoints do not arrive on time, use Percent of duration (with a percentage below 100).

The Percent of duration option triggers alerts based on the number of datapoints that met the threshold during the window, compared to how many datapoints were expected to arrive. Because this option triggers an alert based on the percentage of datapoints that met the threshold, it can sometimes trigger an alert even if some datapoints didn’t arrive on time. Therefore, using this option with a percentage below 100 is more sensitive (may trigger more alerts) than the Duration option.

The following examples illustrate how alerts would be triggered in various situations.

Example 1

  • Option you specify for Trigger Sensitivity: Duration = 3 minutes

  • Resolution of the signal: 5 seconds

  • Number of datapoints expected in 3 minutes: 12 per minute × 3 minutes = 36

  • Number of anomalous datapoints (how many times the threshold must be met) to trigger alert: 36

    | Total datapoints expected | Total datapoints received | Anomalous datapoints required | Anomalous datapoints received | Alert is triggered? |
    | --- | --- | --- | --- | --- |
    | 36 | 36 | 36 | 36 | Yes |
    | 36 | 36 | 36 | 35 or fewer | No |
    | 36 | 35 | 36 | 35 or fewer | No |

Example 2

  • Option you specify for Trigger Sensitivity: Percent of Duration = 75% of 3 minutes

  • Resolution of the signal: 5 seconds

  • Number of datapoints expected in 3 minutes: 12 per minute × 3 minutes = 36

  • Number of anomalous datapoints (how many times the threshold must be met) to trigger alert: 75% of 36 = 27

    | Total datapoints expected | Total datapoints received | Anomalous datapoints required | Anomalous datapoints received | Alert is triggered? |
    | --- | --- | --- | --- | --- |
    | 36 | 36 | 27 | 27-36 | Yes |
    | 36 | 30 | 27 | 27-30 | Yes |
    | 36 | 30 | 27 | 26 or fewer | No |

    Note that in the last row above, the alert is not triggered even if 26 anomalous datapoints arrive. Although 26 out of the 30 received datapoints (about 87%) is greater than the 75% you specified, the required number of anomalous datapoints (27) did not arrive. The percentage you specify is a percentage of expected datapoints, not of received datapoints.
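The arithmetic behind Examples 1 and 2 can be reproduced with a short Python sketch. It is a simplified model under stated assumptions, not the product's algorithm: the function names are hypothetical, Duration is modeled as requiring 100% of expected datapoints, and rounding the required count up is an assumption (the examples only show a case where 75% of 36 is exactly 27).

```python
def expected_datapoints(duration_s: int, resolution_s: int) -> int:
    """Datapoints expected in the window, e.g. 180 s at 5 s resolution = 36."""
    return duration_s // resolution_s


def required_anomalous(expected: int, percent: int = 100) -> int:
    """Anomalous datapoints needed, as a percentage of EXPECTED datapoints."""
    # Integer ceiling division; rounding up is an assumption.
    return (expected * percent + 99) // 100


def alert_triggers(duration_s: int, resolution_s: int,
                   anomalous_received: int, percent: int = 100) -> bool:
    """Return True if enough anomalous datapoints arrived in the window.

    percent=100 models the Duration option; lower values model Percent of
    duration. The number of datapoints received does not enter the formula,
    only the number expected.
    """
    expected = expected_datapoints(duration_s, resolution_s)
    return anomalous_received >= required_anomalous(expected, percent)


# Example 1: Duration = 3 minutes at 5-second resolution.
duration_all_36 = alert_triggers(180, 5, anomalous_received=36)   # triggers
duration_only_35 = alert_triggers(180, 5, anomalous_received=35)  # does not

# Example 2: 75% of 3 minutes at 5-second resolution.
percent_27 = alert_triggers(180, 5, anomalous_received=27, percent=75)  # triggers
percent_26 = alert_triggers(180, 5, anomalous_received=26, percent=75)  # does not
```

Because the required count depends only on the expected datapoints, 26 anomalous datapoints out of 30 received still falls short of the 27 required.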