
Aggregation

DESCRIPTION

Use the aggregation plugin to combine multiple metrics into aggregates, for example to merge the values reported for each CPU. This plugin provides aggregates, by default average and sum.
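Conceptually, the plugin groups incoming values and emits one aggregate per group. The Python sketch below imitates that idea for per-CPU readings. It groups them by state (the collectd type instance, such as idle or user), ignores the CPU number, and computes a sum and an average per group. The function name `aggregate` and the sample values are hypothetical, and this is an illustration of the concept, not collectd's implementation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-CPU samples: (plugin_instance, type_instance, value).
# For the cpu plugin, plugin_instance is the CPU number.
samples = [
    ("0", "idle", 90.0),
    ("1", "idle", 70.0),
    ("0", "user", 8.0),
    ("1", "user", 24.0),
]


def aggregate(samples):
    """Group values by type_instance, ignoring the CPU number, and return
    {type_instance: {"sum": ..., "average": ...}}."""
    groups = defaultdict(list)
    for _cpu, type_instance, value in samples:
        groups[type_instance].append(value)
    return {
        type_instance: {"sum": sum(values), "average": mean(values)}
        for type_instance, values in groups.items()
    }


print(aggregate(samples))
# → {'idle': {'sum': 160.0, 'average': 80.0}, 'user': {'sum': 32.0, 'average': 16.0}}
```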

REQUIREMENTS AND DEPENDENCIES

Software | Version
collectd | 1.3 or later
aggregation plugin | collectd 5.2 or later

CONFIGURATION

By default, the CPU plugin assigns each CPU a number and uses it as the plugin_instance. This gives a very detailed report of CPU usage, but per-CPU detail is often more than you need. Configure the aggregation plugin to combine the per-CPU values into aggregate CPU metrics.
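A minimal configuration sketch, adapted from the aggregation example in the collectd.conf(5) manual page. It assumes you want one sum and one average per CPU state per host. `GroupBy "Host"` and `GroupBy "TypeInstance"` keep hosts and CPU states (idle, user, and so on) separate, while the CPU number (the plugin instance) is collapsed.

```
LoadPlugin cpu
LoadPlugin aggregation

<Plugin aggregation>
  <Aggregation>
    # Select the values reported by the cpu plugin
    Plugin "cpu"
    Type "cpu"

    # One aggregate per host and per CPU state; the CPU number
    # (plugin instance) is not part of the grouping, so it is collapsed
    GroupBy "Host"
    GroupBy "TypeInstance"

    CalculateSum true
    CalculateAverage true
  </Aggregation>
</Plugin>
```

GroupBy also accepts Plugin and PluginInstance. The other aggregates (CalculateNum, CalculateMinimum, CalculateMaximum, CalculateStddev) are described in collectd.conf(5). The original per-CPU values are still collected with this fragment. collectd.conf(5) describes how to use a filter chain if you do not want them written.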

METRICS

Below is a list of all metrics.

Metric Name | Brief | Type
cpu.idle | CPU time spent not in any other state (in jiffies) | cumulative_counter
cpu.interrupt | CPU time spent servicing hardware interrupts (in jiffies) | cumulative_counter
cpu.nice | CPU time spent in userspace running ‘nice’-ed processes (in jiffies) | cumulative_counter
cpu.softirq | CPU time spent servicing software interrupts (in jiffies) | cumulative_counter
cpu.steal | CPU time spent waiting while the hypervisor ran other virtual machines (in jiffies) | cumulative_counter
cpu.system | CPU time spent running in the kernel (in jiffies) | cumulative_counter
cpu.user | CPU time spent running in userspace (in jiffies) | cumulative_counter
cpu.wait | CPU time spent idle while waiting for an I/O operation to complete (in jiffies) | cumulative_counter
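On Linux, collectd's cpu plugin reads these counters from the per-CPU lines of /proc/stat. The sketch below assumes the standard /proc/stat field order (user, nice, system, idle, iowait, irq, softirq, steal, followed by guest fields). It parses one such line into the metric names listed above. The function name `parse_cpu_line` and the sample line are made up for illustration.

```python
# Field order of a "cpuN" line in Linux /proc/stat, mapped to the metric
# names used on this page. Later fields (guest, guest_nice) are ignored.
PROC_STAT_FIELDS = [
    ("user", "cpu.user"),
    ("nice", "cpu.nice"),
    ("system", "cpu.system"),
    ("idle", "cpu.idle"),
    ("iowait", "cpu.wait"),
    ("irq", "cpu.interrupt"),
    ("softirq", "cpu.softirq"),
    ("steal", "cpu.steal"),
]


def parse_cpu_line(line):
    """Parse one per-CPU line of /proc/stat into (cpu_number, {metric: jiffies})."""
    parts = line.split()
    label, values = parts[0], [int(v) for v in parts[1:]]
    if not label.startswith("cpu") or label == "cpu":
        raise ValueError("expected a per-CPU line such as 'cpu0 ...'")
    cpu_number = label[len("cpu"):]
    metrics = {
        metric: values[i]
        for i, (_field, metric) in enumerate(PROC_STAT_FIELDS)
        if i < len(values)
    }
    return cpu_number, metrics


# Made-up sample line for illustration
cpu, metrics = parse_cpu_line("cpu0 4705 150 1120 16250 520 30 45 0 0 0")
print(cpu, metrics["cpu.idle"], metrics["cpu.wait"])  # → 0 16250 520
```

On a Linux host, you could feed it the lines of /proc/stat that start with "cpu" followed by a digit.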

cpu.idle

cumulative_counter

CPU time spent not in any other state.

To get a percentage, compare the change in this value over an interval against the change in the sum of all CPU states over the same interval.
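These metrics are cumulative counters, so a current percentage has to come from the change over an interval rather than from the raw totals. A minimal sketch, assuming two snapshots of all eight states taken at the start and end of the interval. The function name `cpu_state_percentages` and the numbers are hypothetical.

```python
def cpu_state_percentages(before, after):
    """Return the percentage of CPU time spent in each state between two
    snapshots of cumulative jiffy counters ({state: jiffies})."""
    deltas = {state: after[state] - before[state] for state in after}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("no CPU time elapsed between snapshots")
    return {state: 100.0 * delta / total for state, delta in deltas.items()}


before = {"idle": 1000, "user": 200, "system": 50, "nice": 0,
          "wait": 10, "interrupt": 0, "softirq": 5, "steal": 0}
after = {"idle": 1150, "user": 230, "system": 60, "nice": 0,
         "wait": 20, "interrupt": 0, "softirq": 5, "steal": 0}

pct = cpu_state_percentages(before, after)
print(pct["idle"], pct["user"])  # → 75.0 15.0
```

Dividing raw totals instead would give an average since boot, which hides recent changes.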

cpu.interrupt

cumulative_counter

CPU time spent servicing hardware interrupts.

A hardware interrupt is raised by a physical device, such as a network card or disk controller. When one occurs, the CPU stops whatever else it is doing and services the interrupt. This metric measures how many jiffies were spent handling these interrupts.
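On Linux, the jiffies in /proc/stat are units of the user-visible clock tick (USER_HZ). This is commonly 100 per second, and on Unix systems Python can read it with `os.sysconf("SC_CLK_TCK")`. The minimal sketch below converts a change in this counter into seconds of CPU time and into a fraction of one CPU. The function names and sample numbers are hypothetical.

```python
import os


def jiffies_to_seconds(delta_jiffies, clk_tck=None):
    """Convert a change in a jiffy counter to seconds of CPU time.

    clk_tck defaults to the system's clock ticks per second as reported
    by sysconf (Unix only)."""
    if clk_tck is None:
        clk_tck = os.sysconf("SC_CLK_TCK")
    return delta_jiffies / clk_tck


def busy_fraction(delta_jiffies, interval_seconds, clk_tck=None):
    """Fraction of one CPU spent in this state over the interval."""
    return jiffies_to_seconds(delta_jiffies, clk_tck) / interval_seconds


# 50 interrupt jiffies over a 10-second interval at 100 ticks per second
print(jiffies_to_seconds(50, clk_tck=100))  # → 0.5
print(busy_fraction(50, 10, clk_tck=100))   # → 0.05
```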

In order to get a percentage this value must be compared against the sum of all CPU states.

A sustained high value for this metric may be caused by:

  • Faulty hardware such as a broken peripheral.
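Several metrics on this page refer to a sustained high value. One simple way to operationalize that is to require the percentage to stay above a threshold for several consecutive intervals. The sketch below shows this approach. The function name, threshold, and window length are hypothetical and are not recommended settings.

```python
def is_sustained_high(percentages, threshold, min_intervals):
    """Return True if the most recent `min_intervals` values are all above
    `threshold`. `percentages` holds one value per interval, oldest first."""
    if min_intervals <= 0:
        raise ValueError("min_intervals must be positive")
    if len(percentages) < min_intervals:
        return False
    return all(p > threshold for p in percentages[-min_intervals:])


# Hypothetical cpu.interrupt percentages, one per interval
history = [1.0, 2.5, 12.0, 14.0, 13.5]
print(is_sustained_high(history, threshold=10.0, min_intervals=3))  # → True
print(is_sustained_high(history, threshold=10.0, min_intervals=4))  # → False
```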

cpu.nice

cumulative_counter

CPU time spent in userspace running ‘nice’-ed processes, that is, processes whose scheduling priority has been lowered with a positive nice value.

In order to get a percentage this value must be compared against the sum of all CPU states.

A sustained high value for this metric may be caused by:

  • The server not having enough CPU capacity for a process
  • A programming error which causes a process to use an unexpected amount of CPU

cpu.softirq

cumulative_counter

CPU time spent servicing software interrupts.

Unlike a hardware interrupt, a software interrupt (softirq) is raised by the kernel itself. Linux uses softirqs to defer work that a hardware interrupt handler does not finish immediately, such as processing received network packets, running timers, and completing block I/O. This metric measures how many jiffies the CPU spent handling these interrupts.

In order to get a percentage this value must be compared against the sum of all CPU states.

A sustained high value for this metric may be caused by:

  • A high rate of network traffic or other interrupt-driven work that the kernel processes in software interrupts.
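On Linux, /proc/softirqs breaks software interrupt activity down by type, for example NET_RX for network receive processing and TIMER for timers. This helps identify what is driving a high value. The sketch below sums each row across CPUs. The sample text is made up and abbreviated, and `softirq_totals` is a hypothetical name. The file reports cumulative counts of softirqs handled, not jiffies, so compare two readings to see current activity.

```python
SAMPLE = """\
                    CPU0       CPU1
          HI:          0          0
       TIMER:     123456     654321
      NET_TX:         10         20
      NET_RX:     500000     300000
"""


def softirq_totals(text):
    """Sum each softirq type across all CPUs from /proc/softirqs-style text."""
    totals = {}
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the CPU header row
        name, _, counts = line.partition(":")
        totals[name.strip()] = sum(int(c) for c in counts.split())
    return totals


totals = softirq_totals(SAMPLE)
busiest = max(totals, key=totals.get)
print(busiest, totals[busiest])  # → NET_RX 800000
```

On a live system, pass the contents of /proc/softirqs instead of SAMPLE.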

cpu.steal

cumulative_counter

CPU time spent waiting while the hypervisor runs other virtual machines.

This metric is only meaningful on virtual machines; on physical hosts it normally stays at zero. It records how much time this virtual machine's virtual CPUs were ready to run but had to wait because the hypervisor was running other virtual machines.

In order to get a percentage this value must be compared against the sum of all CPU states.

A sustained high value for this metric may be caused by:

  • Another VM on the same hypervisor using too many resources
  • An underpowered hypervisor

cpu.system

cumulative_counter

CPU time spent running in the kernel.

This value reflects how much time the CPU spends executing kernel code on behalf of processes, for example to handle system calls such as writing to the console.

In order to get a percentage this value must be compared against the sum of all CPU states.

A sustained high value for this metric may be caused by:

  • A process that needs to be re-written to use kernel resources more efficiently
  • A userspace driver that is broken

cpu.user

cumulative_counter

CPU time spent running in userspace.

In order to get a percentage this value must be compared against the sum of all CPU states.

A sustained high value for this metric may be caused by:

  • A process that requires more CPU than is available on the server
  • An application programming error that causes the CPU to be used unexpectedly

cpu.wait

cumulative_counter

CPU time spent idle while waiting for an I/O operation to complete.

In order to get a percentage this value must be compared against the sum of all CPU states.

A sustained high value for this metric may be caused by:

  • A slow hardware device that is taking too long to service requests
  • Too many requests being sent to an I/O device