
Docker

DESCRIPTION

This integration primarily consists of the Smart Agent monitor docker-container-stats. Below is an overview of that monitor.

Smart Agent Monitor

This monitor reads container stats from a Docker API server. It is meant as a metric-compatible replacement for our docker-collectd plugin, which scales poorly with large numbers of containers.

This monitor does not currently support CPU share/quota metrics.

For more information on block IO metrics, see the Linux cgroup block io controller doc.

If you are running the agent directly on a host (outside of a container itself) and you are using the default Docker UNIX socket URL, you will probably need to add the signalfx-agent user to the docker group in order to have permission to access the Docker API via the socket.

Requires Docker API version 1.22+.
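As an alternative to running the agent directly on the host, it is commonly run as a container with the Docker socket bind-mounted into it, which sidesteps the docker-group permission step. A minimal docker-compose sketch; the image tag and config path are illustrative assumptions, not taken from this page:

```yaml
# Illustrative only: run the Smart Agent in a container with read-only access
# to the host's Docker socket so this monitor can reach the Docker API.
version: "3"
services:
  signalfx-agent:
    image: quay.io/signalfx/signalfx-agent:latest   # image location/tag assumed
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./agent.yaml:/etc/signalfx/agent.yaml:ro    # agent config path assumed
```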

INSTALLATION

This integration is part of the SignalFx Smart Agent as the docker-container-stats monitor. You should first deploy the Smart Agent to the same host as the service you want to monitor, and then continue with the configuration instructions below.

CONFIGURATION

To activate this monitor in the Smart Agent, add the following to your agent config:

monitors:  # All monitor config goes under this key
 - type: docker-container-stats
   ...  # Additional config

For a list of monitor options that are common to all monitors, see Common Configuration.

Configuration options:

  • enableExtraBlockIOMetrics (bool, optional; default: false): Send all extra block I/O metrics.
  • enableExtraCPUMetrics (bool, optional; default: false): Send all extra CPU metrics.
  • enableExtraMemoryMetrics (bool, optional; default: false): Send all extra memory metrics.
  • enableExtraNetworkMetrics (bool, optional; default: false): Send all extra network metrics.
  • dockerURL (string, optional; default: unix:///var/run/docker.sock): The URL of the Docker server.
  • timeoutSeconds (integer, optional; default: 5): The maximum amount of time to wait for Docker API requests.
  • labelsToDimensions (map of strings, optional): A mapping of container label names to dimension names. The corresponding label values become the dimension values for the mapped names. For example, io.kubernetes.container.name: container_spec_name results in a dimension called container_spec_name whose value is the io.kubernetes.container.name container label.
  • envToDimensions (map of strings, optional): A mapping of container environment variable names to dimension names. The corresponding environment variable values become the dimension values on the emitted metrics. For example, APP_VERSION: version results in datapoints having a dimension called version whose value is the APP_VERSION environment variable configured for that particular container, if present.
  • excludedImages (list of strings, optional): A list of image filters to exclude. Supports literals, globs, and regexes.
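Putting several of these options together, a hedged example config (the label mapping comes from the option description above; the image filters and regex delimiter form are illustrative assumptions):

```yaml
monitors:
  - type: docker-container-stats
    dockerURL: unix:///var/run/docker.sock   # the default; shown for clarity
    timeoutSeconds: 10
    enableExtraMemoryMetrics: true
    labelsToDimensions:
      io.kubernetes.container.name: container_spec_name
    excludedImages:
      - "registry:5000/*"    # glob (image name is illustrative)
      - "/^internal-.*/"     # assumed slash-delimited regex form
```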

USAGE

Sample of built-in dashboard in SignalFx:


METRICS


These are the metrics available for this monitor. Metrics in the container/host category are emitted by default.

Group blkio

All of the following metrics are part of the blkio metric group. The non-default metrics below can be enabled by adding blkio to the extraGroups config option of the monitor:

  • blkio.io_merged_recursive.async (cumulative)
  • blkio.io_merged_recursive.read (cumulative)
  • blkio.io_merged_recursive.sync (cumulative)
  • blkio.io_merged_recursive.total (cumulative)
  • blkio.io_merged_recursive.write (cumulative)
  • blkio.io_queue_recursive.async (cumulative)
  • blkio.io_queue_recursive.read (cumulative)
  • blkio.io_queue_recursive.sync (cumulative)
  • blkio.io_queue_recursive.total (cumulative)
  • blkio.io_queue_recursive.write (cumulative)
  • blkio.io_service_bytes_recursive.async (cumulative)
    Volume, in bytes, of asynchronous block I/O
  • blkio.io_service_bytes_recursive.read (cumulative)
    Volume, in bytes, of reads from block devices
  • blkio.io_service_bytes_recursive.sync (cumulative)
    Volume, in bytes, of synchronous block I/O
  • blkio.io_service_bytes_recursive.total (cumulative)
    Total volume, in bytes, of all block I/O
  • blkio.io_service_bytes_recursive.write (cumulative)
    Volume, in bytes, of writes to block devices
  • blkio.io_service_time_recursive.async (cumulative)
  • blkio.io_service_time_recursive.read (cumulative)
  • blkio.io_service_time_recursive.sync (cumulative)
  • blkio.io_service_time_recursive.total (cumulative)
  • blkio.io_service_time_recursive.write (cumulative)
  • blkio.io_serviced_recursive.async (cumulative)
    Number of asynchronous block I/O requests
  • blkio.io_serviced_recursive.read (cumulative)
    Number of read requests from block devices
  • blkio.io_serviced_recursive.sync (cumulative)
    Number of synchronous block I/O requests
  • blkio.io_serviced_recursive.total (cumulative)
    Total number of block I/O requests
  • blkio.io_serviced_recursive.write (cumulative)
    Number of write requests to block devices
  • blkio.io_time_recursive.async (cumulative)
  • blkio.io_time_recursive.read (cumulative)
  • blkio.io_time_recursive.sync (cumulative)
  • blkio.io_time_recursive.total (cumulative)
  • blkio.io_time_recursive.write (cumulative)
  • blkio.io_wait_time_recursive.async (cumulative)
  • blkio.io_wait_time_recursive.read (cumulative)
  • blkio.io_wait_time_recursive.sync (cumulative)
  • blkio.io_wait_time_recursive.total (cumulative)
  • blkio.io_wait_time_recursive.write (cumulative)
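For example, the whole group can be turned on at once via extraGroups (a sketch of the agent config fragment described above):

```yaml
monitors:
  - type: docker-container-stats
    extraGroups:
      - blkio
```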

Group cpu

All of the following metrics are part of the cpu metric group. The non-default metrics below can be enabled by adding cpu to the extraGroups config option of the monitor:

  • cpu.percent (gauge)
    Percentage of host CPU resources used by the container
  • cpu.percpu.usage (cumulative)
    Jiffies of CPU time spent by the container, per CPU core
  • cpu.throttling_data.periods (cumulative)
    Number of periods
  • cpu.throttling_data.throttled_periods (cumulative)
    Number of periods throttled
  • cpu.throttling_data.throttled_time (cumulative)
    Throttling time in nanoseconds
  • cpu.usage.kernelmode (cumulative)
    Jiffies of CPU time spent in kernel mode by the container
  • cpu.usage.system (cumulative)
    Jiffies of CPU time used by the system
  • cpu.usage.total (cumulative)
    Jiffies of CPU time used by the container
  • cpu.usage.usermode (cumulative)
    Jiffies of CPU time spent in user mode by the container
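The cumulative CPU counters above are typically turned into a percentage by comparing deltas between two samples. A minimal illustrative sketch, not the agent's actual code; the formula mirrors the common Docker stats calculation:

```python
def cpu_percent(prev_total, cur_total, prev_system, cur_system, online_cpus):
    """Approximate cpu.percent from two samples of cumulative counters.

    prev_total/cur_total:   cpu.usage.total at the two sample times
    prev_system/cur_system: cpu.usage.system at the two sample times
    online_cpus:            number of CPU cores available to the container
    """
    cpu_delta = cur_total - prev_total
    system_delta = cur_system - prev_system
    if system_delta <= 0 or cpu_delta < 0:
        # No elapsed system time, or a counter reset: report zero usage.
        return 0.0
    return (cpu_delta / system_delta) * online_cpus * 100.0

# Container consumed a quarter of total system CPU time, across 2 cores:
print(cpu_percent(0, 1000, 0, 4000, 2))  # → 50.0
```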

Group memory

All of the following metrics are part of the memory metric group. The non-default metrics below can be enabled by adding memory to the extraGroups config option of the monitor:

  • memory.percent (gauge)
    Percent of memory (0-100) used by the container relative to its limit (excludes page cache usage)
  • memory.stats.active_anon (gauge)
    Amount of memory that has been identified as active by the kernel. Anonymous memory is memory that is not linked to disk pages.
  • memory.stats.active_file (gauge)
    Amount of active file cache memory. Cache memory = active_file + inactive_file + tmpfs
  • memory.stats.cache (gauge)
    The amount of memory used by the processes of this control group that can be associated with a block on a block device. Also accounts for memory used by tmpfs.
  • memory.stats.dirty (gauge)
    The amount of memory waiting to get written to disk
  • memory.stats.hierarchical_memory_limit (gauge)
    The memory limit in place by the hierarchy cgroup
  • memory.stats.hierarchical_memsw_limit (gauge)
    The memory+swap limit in place by the hierarchy cgroup
  • memory.stats.inactive_anon (gauge)
    Amount of memory that has been identified as inactive by the kernel. Anonymous memory is memory that is not linked to disk pages.
  • memory.stats.inactive_file (gauge)
    Amount of inactive file cache memory. Cache memory = active_file + inactive_file + tmpfs
  • memory.stats.mapped_file (gauge)
    Indicates the amount of memory mapped by the processes in the control group. It doesn’t give you information about how much memory is used; it rather tells you how it is used.
  • memory.stats.pgfault (cumulative)
    Number of times that a process of the cgroup triggered a page fault. Page faults occur when a process accesses part of its virtual memory space which is nonexistent or protected. See https://docs.docker.com/config/containers/runmetrics for more info.
  • memory.stats.pgmajfault (cumulative)
    Number of times that a process of the cgroup triggered a major page fault. Page faults occur when a process accesses part of its virtual memory space which is nonexistent or protected. See https://docs.docker.com/config/containers/runmetrics for more info.
  • memory.stats.pgpgin (cumulative)
    Number of charging events to the memory cgroup. Charging events happen each time a page is accounted as either a mapped anon page (RSS) or a cache page to the cgroup.
  • memory.stats.pgpgout (cumulative)
    Number of uncharging events to the memory cgroup. Uncharging events happen each time a page is unaccounted from the cgroup.
  • memory.stats.rss (gauge)
    The amount of memory that doesn’t correspond to anything on disk: stacks, heaps, and anonymous memory maps.
  • memory.stats.rss_huge (gauge)
    Amount of memory due to anonymous transparent hugepages.
  • memory.stats.shmem (gauge)
    Amount of Shared Memory used by the container, in bytes.
  • memory.stats.swap (gauge)
    Bytes of swap memory used by container
  • memory.stats.total_active_anon (gauge)
    Total amount of memory that has been identified as active by the kernel. Anonymous memory is memory that is not linked to disk pages.
  • memory.stats.total_active_file (gauge)
    Total amount of active file cache memory. Cache memory = active_file + inactive_file + tmpfs
  • memory.stats.total_cache (gauge)
    Total amount of memory used by the processes of this control group that can be associated with a block on a block device. Also accounts for memory used by tmpfs.
  • memory.stats.total_dirty (gauge)
    Total amount of memory waiting to get written to disk
  • memory.stats.total_inactive_anon (gauge)
    Total amount of memory that has been identified as inactive by the kernel. Anonymous memory is memory that is not linked to disk pages.
  • memory.stats.total_inactive_file (gauge)
    Total amount of inactive file cache memory. Cache memory = active_file + inactive_file + tmpfs
  • memory.stats.total_mapped_file (gauge)
    Total amount of memory mapped by the processes in the control group. It doesn’t give you information about how much memory is used; it rather tells you how it is used.
  • memory.stats.total_pgfault (cumulative)
    Total number of page faults
  • memory.stats.total_pgmajfault (cumulative)
    Total number of major page faults
  • memory.stats.total_pgpgin (cumulative)
    Total number of charging events
  • memory.stats.total_pgpgout (cumulative)
    Total number of uncharging events
  • memory.stats.total_rss (gauge)
    Total amount of memory that doesn’t correspond to anything on disk: stacks, heaps, and anonymous memory maps.
  • memory.stats.total_rss_huge (gauge)
    Total amount of memory due to anonymous transparent hugepages.
  • memory.stats.total_shmem (gauge)
    Total amount of shared memory used by the container, in bytes.
  • memory.stats.total_swap (gauge)
    Total amount of swap memory available to this container
  • memory.stats.total_unevictable (gauge)
    Total amount of memory that can not be reclaimed
  • memory.stats.total_writeback (gauge)
    Total amount of memory from file/anon cache that is queued for syncing to disk
  • memory.stats.unevictable (gauge)
    The amount of memory that cannot be reclaimed.
  • memory.stats.writeback (gauge)
    The amount of memory from file/anon cache that is queued for syncing to disk
  • memory.usage.limit (gauge)
    Memory usage limit of the container, in bytes
  • memory.usage.max (gauge)
    Maximum measured memory usage of the container, in bytes
  • memory.usage.total (gauge)
    Bytes of memory used by the container
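The memory.percent gauge above excludes page cache usage. A hedged sketch of the likely calculation from the raw stats; the inputs correspond to metrics listed here, not necessarily the agent's internal field names:

```python
def memory_percent(usage_total, cache, limit):
    """Approximate memory.percent: usage minus page cache, relative to the limit."""
    if limit <= 0:
        return 0.0
    used = max(usage_total - cache, 0)  # exclude page cache, clamp at zero
    return min(used / limit * 100.0, 100.0)

# A container using 600 MiB, 100 MiB of which is page cache, with a 1 GiB limit:
print(memory_percent(600 * 2**20, 100 * 2**20, 1024 * 2**20))  # → 48.828125
```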

Group network

All of the following metrics are part of the network metric group. The non-default metrics below can be enabled by adding network to the extraGroups config option of the monitor:

  • network.usage.rx_bytes (cumulative)
    Bytes received by the container via its network interface
  • network.usage.rx_dropped (cumulative)
    Number of inbound network packets dropped by the container
  • network.usage.rx_errors (cumulative)
    Errors receiving network packets
  • network.usage.rx_packets (cumulative)
    Network packets received by the container via its network interface
  • network.usage.tx_bytes (cumulative)
    Bytes sent by the container via its network interface
  • network.usage.tx_dropped (cumulative)
    Number of outbound network packets dropped by the container
  • network.usage.tx_errors (cumulative)
    Errors sending network packets
  • network.usage.tx_packets (cumulative)
    Network packets sent by the container via its network interface
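These network counters are cumulative, so dashboards typically chart them as per-second rates computed from sample deltas. An illustrative sketch of that conversion:

```python
def per_second_rate(prev_value, cur_value, interval_seconds):
    """Convert two samples of a cumulative counter (e.g. network.usage.rx_bytes)
    into a per-second rate, treating counter resets as zero."""
    if interval_seconds <= 0 or cur_value < prev_value:
        return 0.0
    return (cur_value - prev_value) / interval_seconds

# 500,000 bytes received over a 10-second sampling interval:
print(per_second_rate(1_000_000, 1_500_000, 10))  # → 50000.0
```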

Non-default metrics (version 4.7.0+)

The following information applies to agent versions 4.7.0 and later that have enableBuiltInFiltering: true set at the top level of the agent config.

To emit metrics that are not emitted by default, add them to the generic monitor-level extraMetrics config option. Metrics that are derived from specific configuration options and do not appear in the list of metrics above do not need to be added to extraMetrics.
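For example, a sketch of an extraMetrics fragment; the metric names are taken from the lists above:

```yaml
monitors:
  - type: docker-container-stats
    extraMetrics:
      - memory.stats.swap
      - cpu.throttling_data.throttled_time
```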

To see the list of metrics that will be emitted, run agent-status monitors after configuring this monitor in a running agent instance.

Legacy non-default metrics (version < 4.7.0)

The following information applies only to agent versions older than 4.7.0. If you have a newer agent and have set enableBuiltInFiltering: true at the top level of your agent config, see the section above. See upgrade instructions in Old-style whitelist filtering.

If your agent's top-level metricsToExclude config option references the whitelist.json, and you want to emit metrics that are not in that whitelist, add an item to the top-level metricsToInclude config option to override the whitelist (see Inclusion filtering). Alternatively, copy the whitelist.json, modify it, and reference the modified copy in metricsToExclude.
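A sketch of such an override; the item field names follow the shape commonly used by the agent's metricsToExclude filters and are an assumption here:

```yaml
# Legacy (pre-4.7.0) filtering: re-include one metric the whitelist excludes.
metricsToInclude:
  - monitorType: docker-container-stats   # field names assumed, not from this page
    metricNames:
      - memory.stats.swap
```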