About Detectors and Alerts

SignalFx’s metadata-rich data model and flexible analytics language can be used not just to create charts and dashboards, but also to enable sophisticated alerting using detectors.

Detectors

SignalFx detectors evaluate metric time series against a specified condition, optionally requiring the condition to hold for a specified duration. The time series being evaluated can be the raw data being sent in or the output of analytics functions applied to that data.
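To illustrate what "a condition, and optionally a duration" means, here is a minimal Python sketch. It is not the SignalFx API; the function name fires_at and the sample values are hypothetical, and duration is counted in consecutive samples rather than in wall-clock time.

```python
from typing import Callable, List, Sequence


def fires_at(values: Sequence[float],
             condition: Callable[[float], bool],
             duration: int = 1) -> List[int]:
    """Return the sample indices at which a hypothetical detector would fire.

    The detector fires once, at the moment the condition has held for
    `duration` consecutive samples, and re-arms when the condition clears.
    """
    fired = []
    streak = 0  # number of consecutive samples meeting the condition
    for index, value in enumerate(values):
        streak = streak + 1 if condition(value) else 0
        if streak == duration:
            fired.append(index)
    return fired


# Hypothetical CPU utilization samples for one time series.
cpu = [70, 85, 90, 75, 82, 88, 91, 95]

# Condition only: fires each time the value crosses above 80.
print(fires_at(cpu, lambda v: v > 80))

# Condition plus duration: fires only after 3 consecutive samples above 80.
print(fires_at(cpu, lambda v: v > 80, duration=3))
```

Adding a duration suppresses the brief spike at the start of the sample data, which is the usual reason for specifying one.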

Alerts

When a condition has been met, detectors generate events of a specified level of severity. In the SignalFx application, events generated when detector conditions are met are referred to as alerts. Among other things, alerts can be used to trigger notifications in incident management platforms (e.g. PagerDuty) or messaging systems (e.g. Slack or email).

Using metadata in detectors

The metadata associated with metric time series can be used to make detector definitions simpler, more compact, and more resilient to change.

For example, if you have a group of 30 virtual machines that are used to provide a clustered service like Kafka, you will normally have included the dimension service:kafka with all of the metrics coming from those virtual machines.

In this case, if you want to track whether cpu.utilization remains below 80 for each of those virtual machines, you can create a single detector that queries for cpu.utilization metrics that include the service:kafka dimension and evaluates them against a threshold of 80. This detector triggers an individual alert for each virtual machine whose cpu.utilization exceeds the threshold, just as if you had created 30 separate detectors, but you only need to create and maintain one.

In addition, if the population changes (for example, because the cluster has grown to 40 virtual machines), you do not need to make any changes to your detector. Provided you have included the service:kafka dimension for the newly added virtual machines, the existing detector’s query will find them and include them in the threshold evaluation.
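The idea of one detector covering a whole population can be sketched in Python as follows. This is an illustration only, not how SignalFx stores or queries data: the dictionary layout, the host dimension, the host names, and the function breaching_hosts are all assumptions made for the example.

```python
from typing import Dict, List


def breaching_hosts(series: List[Dict],
                    metric: str,
                    dimension: str,
                    value: str,
                    threshold: float) -> List[str]:
    """Return hosts whose latest value for `metric` exceeds `threshold`.

    Only time series carrying the given dimension key/value are evaluated,
    so membership is decided by metadata rather than by a fixed host list.
    """
    return sorted(
        ts["dimensions"]["host"]
        for ts in series
        if ts["metric"] == metric
        and ts["dimensions"].get(dimension) == value
        and ts["latest"] > threshold
    )


# Hypothetical time series, each with its dimensions and latest value.
series = [
    {"metric": "cpu.utilization",
     "dimensions": {"host": "kafka-01", "service": "kafka"}, "latest": 91.0},
    {"metric": "cpu.utilization",
     "dimensions": {"host": "kafka-02", "service": "kafka"}, "latest": 42.5},
    {"metric": "cpu.utilization",
     "dimensions": {"host": "db-01", "service": "postgres"}, "latest": 97.0},
]

print(breaching_hosts(series, "cpu.utilization", "service", "kafka", 80))

# A new virtual machine joins the cluster. The "detector" is unchanged,
# but the new host is evaluated because it carries service:kafka.
series.append(
    {"metric": "cpu.utilization",
     "dimensions": {"host": "kafka-03", "service": "kafka"}, "latest": 88.0}
)
print(breaching_hosts(series, "cpu.utilization", "service", "kafka", 80))
```

Note that db-01 is never reported even though its value is high, because it does not carry the service:kafka dimension.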

Dynamic threshold conditions

Detector conditions can be static values, e.g. CPU utilization percentage should not exceed a value of 80. However, static values are often a cause of excessive or noisy alerting, because they are too simplistic. The value that is appropriate for one service or for a particular time of day may not be appropriate for another service or a different time of day. This is especially true if your applications or services make use of elastic infrastructure, like Docker containers or EC2 autoscaling.

To account for this fundamental problem, SignalFx lets you define conditions that are dynamically generated. Dynamic thresholds are the result of an ongoing computation on streaming data, rather than a simple constant.

For example, if your metric exhibits cyclical behavior, such that the best basis of comparison is the same metric but from a week ago, then you can define a threshold that is a 1-week timeshifted version of the same metric. Or, if the relevant basis of comparison is the behavior of a population like a clustered service, then you can define your threshold as a value that reflects that behavior - for example, the 90th percentile for the metric across the entire cluster over a moving 15-minute window.
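Both kinds of dynamic threshold described above can be sketched in plain Python. This is a rough illustration under stated assumptions, not SignalFx's implementation: the function names, the sample data, and the optional scale factor are hypothetical, the time shift is expressed in samples rather than in weeks, and the percentile uses the nearest-rank method over all samples pooled from the window.

```python
import math
from typing import Dict, List, Optional, Sequence


def timeshift_threshold(history: Sequence[float],
                        shift: int,
                        scale: float = 1.0) -> List[Optional[float]]:
    """Threshold at sample i is the value `shift` samples earlier.

    `scale` is a hypothetical tolerance factor; 1.0 uses the shifted value
    as is. Positions without enough history get None.
    """
    return [history[i - shift] * scale if i >= shift else None
            for i in range(len(history))]


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def population_threshold(window: Dict[str, Sequence[float]],
                         pct: float) -> float:
    """Percentile across every sample from every member in the window."""
    pooled = [v for samples in window.values() for v in samples]
    return percentile(pooled, pct)


# Cyclical metric: compare each sample with the one from 3 samples earlier,
# allowing 20% headroom (both numbers are assumptions for this example).
requests = [100, 120, 110, 105, 180, 115]
thresholds = timeshift_threshold(requests, shift=3, scale=1.2)
alerts = [i for i, (now, limit) in enumerate(zip(requests, thresholds))
          if limit is not None and now > limit]
print(alerts)

# Population basis: 90th percentile across a cluster's recent samples.
window = {
    "kafka-01": [40, 45, 50],
    "kafka-02": [42, 48, 55],
    "kafka-03": [60, 90, 95],
}
print(population_threshold(window, 90))
```

In both cases the threshold is recomputed as new data arrives, which is what distinguishes it from a constant such as 80.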

For more information

For information about creating and using detectors, see Detectors and Alerts.