Consul

DESCRIPTION

This integration primarily consists of the Smart Agent monitor collectd/consul. Below is an overview of that monitor.

Smart Agent Monitor

Monitors the Consul data store by using the Consul collectd Python plugin, which collects metrics from Consul instances by hitting these endpoints:

Supports Consul 0.7.0+.

INSTALLATION

This integration is part of the SignalFx Smart Agent as the collectd/consul monitor. You should first deploy the Smart Agent to the same host as the service you want to monitor, and then continue with the configuration instructions below.

CONFIGURATION

To activate this monitor in the Smart Agent, add the following to your agent config:

monitors:  # All monitor config goes under this key
 - type: collectd/consul
   ...  # Additional config

For a list of monitor options that are common to all monitors, see Common Configuration.

| Config option | Required | Type | Description |
| --- | --- | --- | --- |
| pythonBinary | no | string | Path to a python binary that should be used to execute the Python code. If not set, a built-in runtime will be used. Can include arguments to the binary as well. |
| host | yes | string | |
| port | yes | integer | |
| aclToken | no | string | Consul ACL token |
| useHTTPS | no | bool | Set to true to connect to Consul using HTTPS. You can configure the certificate for the server with the caCertificate config option. (default: false) |
| telemetryServer | no | bool | (default: false) |
| telemetryHost | no | string | IP address or DNS name to which Consul is configured to send telemetry UDP packets. Relevant only if telemetryServer is set to true. (default: 0.0.0.0) |
| telemetryPort | no | integer | Port to which Consul is configured to send telemetry UDP packets. Relevant only if telemetryServer is set to true. (default: 8125) |
| enhancedMetrics | no | bool | Set to true to enable collecting all metrics from Consul's runtime telemetry sent via UDP or from the /agent/metrics endpoint. (default: false) |
| caCertificate | no | string | If the Consul server has HTTPS enabled for the API, specifies the path to the CA's certificate. |
| clientCertificate | no | string | If client-side authentication is enabled, specifies the path to the certificate file. |
| clientKey | no | string | If client-side authentication is enabled, specifies the path to the key file. |
| signalFxAccessToken | no | string | |
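
As an illustration of how the options above fit together, the following is a minimal sketch of a monitor entry that connects to Consul over HTTPS with an ACL token and client-side certificates. The option names come from the table above; the hostname, port, token, and file paths are placeholders chosen for this example and are not values from the original documentation.

```yaml
monitors:
  - type: collectd/consul
    # Placeholder address of the Consul agent to monitor (assumption)
    host: consul.example.internal
    # Placeholder port; use the port your Consul HTTPS API listens on
    port: 8501
    # Placeholder ACL token; only needed if ACLs are enabled in Consul
    aclToken: "REPLACE_WITH_ACL_TOKEN"
    # Connect over HTTPS and verify the server against this CA certificate
    useHTTPS: true
    caCertificate: /path/to/ca.crt
    # Only needed if client-side authentication is enabled
    clientCertificate: /path/to/client.crt
    clientKey: /path/to/client.key
```

If Consul serves its API over plain HTTP, host and port alone should be enough, since they are the only required options.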

USAGE

Interpreting Built-in dashboards

  • CONSUL CLUSTER:

    • Total Services: Shows the total number of services registered with the Consul cluster.

      ../../_images/chart_cluster_total_services.png

    • Total Nodes: Shows the total number of nodes in the Consul cluster's catalog. Nodes include instances running consul agent in either client or server mode and external nodes registered with the Consul store.

      ../../_images/chart_cluster_total_nodes.png

    • Number of services by node: Descending list showing the number of services that are registered with a given node. The node name displayed is the Consul NodeName config value.

      ../../_images/chart_services_by_node.png

    • Number of Nodes by Service: Descending list showing the number of nodes that are providing a given service in the datacenter.

      ../../_images/chart_nodes_by_service.png

    • Service health check results: A list showing the results of service health checks that are registered with Consul. Checks can result in 3 states - passing, warning and critical.

      ../../_images/chart_service_health_check.png

    • Node health check results: Node checks are done on the individual host level. If a host fails a check, all services registered with it are marked as failed and Consul no longer returns the node in service discovery requests. The chart is a list showing the results of node health checks. Checks can result in 3 states - passing, warning and critical.

      ../../_images/chart_node_health_check.png

    • Total Peers: Number of consul Raft peers or consul agents in server mode in a given datacenter.

      ../../_images/chart_total_peers.png

    • Consul Server Map: Displays the followers and the leader in a given datacenter.

      ../../_images/chart_consul_server_map.png

    • Mean node network latency: Shows the average latency of a given node from other nodes in the Consul cluster. The dimension consul_node corresponds to the source node. The maximum and minimum values for this metric are also available.

      ../../_images/chart_mean_node_latency.png

    • Mean datacenter latency: Average latency between 2 datacenters. This metric has the additional dimension destination_dc. The latency is calculated between this destination datacenter and the agent's datacenter, which is given by the datacenter dimension. The maximum and minimum values for this metric are also available.

      ../../_images/chart_mean_dc_latency.png

  • CONSUL HEALTH:

    • Leadership Change Event: Event feed showing leader transition events. The event has the new and old leader node names as dimensions.

      ../../_images/chart_leader_change_event.png

    • Leadership Transitions: Tracks the number of leadership transitions. Frequent leadership changes may indicate that the servers are overloaded and aren't meeting the soft real-time requirements for Raft, or that there are networking problems between the servers.

      ../../_images/chart_leader_transition.png

    • Leader last contact with followers: This shows the time since the leader was last able to contact the follower nodes when checking its leader lease. It can be used as a measure for how stable the Raft timing is and how close the leader is to timing out its lease.

      ../../_images/chart_leader_last_contact.png

    • Leader latency to commit to disk: Time it takes for the leader to write log entries to disk.

      ../../_images/chart_leader_disk_commit.png

    • Raft commit time: Time it takes to commit a new entry to the Raft log on the leader.

      ../../_images/chart_raft_commit_time.png

    • Number of Raft Transactions: This is a general indicator of the write load on the Consul servers.

      ../../_images/chart_raft_transactions.png

    • Leader Time to Append Entries: This measures the time it takes the leader to replicate log entries to followers. This is a general indicator of the load pressure on the Consul servers, as well as the performance of the communication between the servers.

      ../../_images/chart_time_to_append_entries.png

    • Number of RPC queries: Total number of rpc queries per interval. This is a general measure of all read volume.

      ../../_images/chart_number_of_rpc_queries.png

    • Cluster Joins and Leaves: This chart tracks successful node joins and leaves in the Serf memberlist.

      ../../_images/chart_join_leave.png

    • Leader time to reconcile: Shows the time it takes for the leader to reconcile Serf membership and what is reflected in Consul's store.

      ../../_images/chart_leader_reconcile.png

    • Serf Events: Consul provides an event feature by which custom events can be propagated across your entire datacenter. This chart shows the number of events processed by Consul agents per interval. Using this chart, you can track whether triggered events were processed by a Consul node. Additionally, you can easily set up a chart to track events for a selected node in the CLIENT and SERVER dashboards.

      ../../_images/chart_serf_events.png

    • Serf Event Queue: Shows the average and maximum backlog of Serf events in the queues of Consul agents.

      ../../_images/chart_serf_event_queue.png

  • CONSUL CLIENT:

    • Number of allocated heap objects: Gives the number of heap objects allocated to the consul process. Indicates memory pressure on a Consul node.

      ../../_images/chart_heap_objects.png

    • Allocated Bytes: Number of allocated bytes to the Consul process.

      ../../_images/chart_allocated_bytes.png

    • Number of GO routines: The number of GO routines Consul is running. This is a general load pressure indicator for Consul agent.

      ../../_images/chart_go_routines.png

    • Network Latency: Shows the avg, max and min network latency between the node and other nodes in the datacenter.

      ../../_images/chart_network_latency.png

    • Time to service DNS queries: Consul provides both DNS and HTTP interfaces for service discovery. This shows the time it takes to service forward and reverse DNS lookups by the selected node.

      ../../_images/chart_dns_queries.png

  • CONSUL SERVER: All charts mentioned in the Client dashboard are also present in the Server dashboard. In addition to those, the Server dashboard includes:

    • Raft candidate state: This chart tracks if the selected Consul server starts an election. If this metric increments without a leadership change occurring it could indicate that a single server is overloaded or is experiencing network connectivity issues.

      ../../_images/chart_raft_candidate.png

All metrics reported by the Consul collectd plugin will contain the following dimensions by default:

  • datacenter: the datacenter to which the Consul agent belongs. The value for this dimension is read from the agent's configuration.
  • consul_node: the Consul node name as seen in the Consul agent's configuration.
  • consul_mode: whether the Consul agent is in client or server mode.

The metric consul.is_leader is reported by Consul servers and has the dimension consul_server_state, which can be either leader or follower.

Additional default metrics to track

  • consul.memberlist.msg.suspect - This metric counts the number of times an agent suspects another as failed when executing random probes as part of the gossip protocol. This can be an indicator of overloaded agents, network problems, or configuration errors where agents can not connect to each other on the required ports.
  • consul.serf.member.flap - This metric tracks when an agent is marked dead and then recovers within a short time period. This can be an indicator of overloaded agents, network problems, or configuration errors where agents can not connect to each other on the required ports.
  • consul.dns.stale_queries - This metric tracks when an agent serves a DNS query based on information from a server that is more than 5 seconds out of date.

A few other details:

  • plugin is always set to consul
  • To add additional metrics from the telemetry stream or the /agent/metrics endpoint, use the configuration options described in the CONFIGURATION section. If metrics are being included individually, make sure to give valid prefixes. For example, Consul emits the metrics that track the time taken to serve HTTP requests in the form consul.http.<verb>.<path>. To enable the metrics that track the time taken to service GET requests on the Key/Value endpoint, add consul.http.GET.v1.kv to the IncludeMetric configuration. To allow the metrics that track the time taken to service all GET requests, add consul.http.GET to the configuration. When enhanced metrics are enabled, you can block metrics in a similar manner.
  • The metrics from the /agent/metrics endpoint are aggregated over an interval of 10 seconds. Keep this in mind when changing the default collectd interval from 10 seconds.
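
The telemetry-related options from the CONFIGURATION section can be combined as in the sketch below. It assumes that telemetryServer makes the monitor listen for Consul's telemetry UDP packets on telemetryHost and telemetryPort, which the option descriptions imply but do not state outright. It also assumes the Smart Agent's common intervalSeconds option for the collection interval. The host and port values are placeholders.

```yaml
monitors:
  - type: collectd/consul
    # Placeholder address and port of the local Consul agent (assumption)
    host: 127.0.0.1
    port: 8500
    # Collect all metrics from the telemetry stream or /agent/metrics
    enhancedMetrics: true
    # Assumed to enable receiving Consul's telemetry UDP packets
    telemetryServer: true
    # These match the documented defaults and are shown only for clarity
    telemetryHost: 0.0.0.0
    telemetryPort: 8125
    # Assumed common monitor option; 10 seconds matches the aggregation
    # window of the /agent/metrics endpoint
    intervalSeconds: 10
```

Keeping the collection interval at 10 seconds lines each collection up with one aggregation window from /agent/metrics.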

METRICS

| Metric Name | Description | Type |
| --- | --- | --- |
| consul.dns.stale_queries | Number of times an agent serves a DNS query based on information from a server that is more than 5 seconds out of date | gauge |
| consul.memberlist.msg.suspect | This increments when an agent suspects another as failed when executing random probes as part of the gossip protocol | gauge |
| consul.serf.member.flap | This metric increments when an agent is marked dead and then recovers within a short time period | gauge |
| gauge.consul.catalog.nodes.total | The total number of nodes in the Consul datacenter | gauge |
| gauge.consul.catalog.nodes_by_service | Number of nodes providing a given service | gauge |
| gauge.consul.catalog.services.total | The total number of services registered with Consul in the given datacenter | gauge |
| gauge.consul.catalog.services_by_node | Number of services registered with a node | gauge |
| gauge.consul.consul.dns.domain_query.AGENT.avg | This tracks how long it takes to service forward DNS lookups on the given Consul agent | gauge |
| gauge.consul.consul.dns.domain_query.AGENT.max | This tracks maximum time it takes to service forward DNS lookups on the given Consul agent | gauge |
| gauge.consul.consul.dns.domain_query.AGENT.min | This tracks minimum time it takes to service forward DNS lookups on the given Consul agent | gauge |
| gauge.consul.consul.dns.ptr_query.AGENT.avg | This tracks average time it takes to service reverse DNS lookups on the given Consul agent | gauge |
| gauge.consul.consul.dns.ptr_query.AGENT.max | This tracks maximum time it takes to service reverse DNS lookups on the given Consul agent | gauge |
| gauge.consul.consul.dns.ptr_query.AGENT.min | This tracks minimum time it takes to service reverse DNS lookups on the given Consul agent | gauge |
| gauge.consul.consul.leader.reconcile.avg | Time it takes the leader to reconcile the differences between Serf membership and Consul's store | gauge |
| gauge.consul.consul.rpc.query | A general measure of all read volume | gauge |
| gauge.consul.health.nodes.critical | Number of nodes for which health checks are reporting Critical state | gauge |
| gauge.consul.health.nodes.passing | Number of nodes for which health checks are reporting Passing state | gauge |
| gauge.consul.health.nodes.warning | Number of nodes for which health checks are reporting Warning state | gauge |
| gauge.consul.health.services.critical | Number of services for which health checks are reporting Critical state | gauge |
| gauge.consul.health.services.passing | Number of services for which health checks are reporting Passing state | gauge |
| gauge.consul.health.services.warning | Number of services for which health checks are reporting Warning state | gauge |
| gauge.consul.is_leader | Maps Consul servers to leader or follower state | gauge |
| gauge.consul.network.dc.latency.avg | Average datacenter latency between 2 datacenters | gauge |
| gauge.consul.network.dc.latency.max | Maximum datacenter latency between 2 datacenters | gauge |
| gauge.consul.network.dc.latency.min | Minimum datacenter latency between 2 datacenters | gauge |
| gauge.consul.network.node.latency.avg | Average network latency between given node and other nodes in the datacenter | gauge |
| gauge.consul.network.node.latency.max | Maximum network latency between given node and other nodes in the datacenter | gauge |
| gauge.consul.network.node.latency.min | Minimum network latency between given node and other nodes in the datacenter | gauge |
| gauge.consul.peers | Number of consul Raft peers or consul agents in server mode in a given datacenter | gauge |
| gauge.consul.raft.apply | This metric is a general indicator of the write load on the Consul servers | gauge |
| gauge.consul.raft.commitTime.avg | This measures the mean time it takes to commit a new entry to the Raft log on the leader | gauge |
| gauge.consul.raft.commitTime.max | This measures the max time it takes to commit a new entry to the Raft log on the leader | gauge |
| gauge.consul.raft.commitTime.min | This measures the minimum time it takes to commit a new entry to the Raft log on the leader | gauge |
| gauge.consul.raft.leader.dispatchLog.avg | This measures the mean time it takes for the leader to write log entries to disk | gauge |
| gauge.consul.raft.leader.dispatchLog.max | This measures the maximum time it takes for the leader to write log entries to disk | gauge |
| gauge.consul.raft.leader.dispatchLog.min | This measures the minimum time it takes for the leader to write log entries to disk | gauge |
| gauge.consul.raft.leader.lastContact.avg | This measures the time since the leader was last able to contact the follower nodes when checking its leader lease | gauge |
| gauge.consul.raft.leader.lastContact.max | This measures the maximum time since the leader was last able to contact the follower nodes when checking its leader lease | gauge |
| gauge.consul.raft.leader.lastContact.min | This measures the minimum time since the leader was last able to contact the follower nodes when checking its leader lease | gauge |
| gauge.consul.raft.replication.appendEntries.rpc.AGENT.avg | This measures the time it takes to replicate log entries to followers | gauge |
| gauge.consul.raft.replication.appendEntries.rpc.AGENT.max | This measures the maximum time it takes to replicate log entries to followers | gauge |
| gauge.consul.raft.replication.appendEntries.rpc.AGENT.min | This measures the minimum time it takes to replicate log entries to followers | gauge |
| gauge.consul.raft.state.candidate | Tracks the number of times given node enters the candidate state, i.e., the number of times the Consul server starts a leader election | gauge |
| gauge.consul.raft.state.leader | This metric increments whenever a Consul server becomes a leader | gauge |
| gauge.consul.rpc.query | | gauge |
| gauge.consul.runtime.alloc_bytes | Number of bytes allocated to Consul process on the node | gauge |
| gauge.consul.runtime.heap_objects | Number of heap objects allocated to Consul, indicates memory pressure on a Consul agent | gauge |
| gauge.consul.runtime.num_goroutines | Number of GO routines run by Consul process on the node | gauge |
| gauge.consul.serf.events | Number of serf events processed by Consul | gauge |
| gauge.consul.serf.events.consul:new-leader | | gauge |
| gauge.consul.serf.member.join | This metric tracks successful node joins to the Serf memberlist | gauge |
| gauge.consul.serf.member.left | This metric tracks successful node leaves from the Serf memberlist | gauge |
| gauge.consul.serf.queue.Event.avg | Average number of serf events in queue yet to be processed by Consul agent | gauge |
| gauge.consul.serf.queue.Event.max | Maximum number of serf events in queue yet to be processed by Consul agent | gauge |
| gauge.consul.serf.queue.Event.min | Minimum number of serf events in queue yet to be processed by Consul agent | gauge |
| gauge.consul.serf.queue.Query.avg | Average number of serf queries in queue yet to be processed by Consul agent | gauge |
| gauge.consul.serf.queue.Query.max | Maximum number of serf queries in queue yet to be processed by Consul agent | gauge |
| gauge.consul.serf.queue.Query.min | Minimum number of serf queries in queue yet to be processed by Consul agent | gauge |

consul.dns.stale_queries

gauge

Number of times an agent serves a DNS query based on information from a server that is more than 5 seconds out of date. This metric has the dimensions datacenter, consul_node and consul_mode.

consul.memberlist.msg.suspect

gauge

This increments when an agent suspects another as failed when executing random probes as part of the gossip protocol. This can be an indicator of overloaded agents, network problems, or configuration errors where agents can not connect to each other on the required ports. This metric has the dimensions datacenter, consul_node and consul_mode.

consul.serf.member.flap

gauge

This metric increments when an agent is marked dead and then recovers within a short time period. This can be an indicator of overloaded agents, network problems, or configuration errors where agents can not connect to each other on the required ports. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.catalog.nodes.total

gauge

The total number of nodes in the Consul datacenter. This metric is common to the cluster and is therefore reported by the leader only. It is reported with the dimensions datacenter, consul_node and consul_mode; consul_mode indicates whether the reporting Consul agent is in server or client mode.

gauge.consul.catalog.nodes_by_service

gauge

Number of nodes providing a given service. This metric is reported by the leader only. The dimension consul_service indicates which service the metric corresponds to. Additionally, the metric has the datacenter and consul_mode dimensions.

gauge.consul.catalog.services.total

gauge

The total number of services registered with Consul in the given datacenter. This metric is common to the cluster and is therefore reported by the leader only. It is reported with the dimensions datacenter, consul_node and consul_mode; consul_mode indicates whether the reporting Consul agent is in server or client mode.

gauge.consul.catalog.services_by_node

gauge

Number of services registered with a node. This metric is reported by the leader only. The dimension consul_node indicates which node the metric corresponds to. Additionally, the metric has the datacenter and consul_mode dimensions.

gauge.consul.consul.dns.domain_query.AGENT.avg

gauge

This tracks how long it takes to service forward DNS lookups on the given Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.consul.dns.domain_query.AGENT.max

gauge

This tracks maximum time it takes to service forward DNS lookups on the given Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.consul.dns.domain_query.AGENT.min

gauge

This tracks minimum time it takes to service forward DNS lookups on the given Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.consul.dns.ptr_query.AGENT.avg

gauge

This tracks average time it takes to service reverse DNS lookups on the given Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.consul.dns.ptr_query.AGENT.max

gauge

This tracks maximum time it takes to service reverse DNS lookups on the given Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.consul.dns.ptr_query.AGENT.min

gauge

This tracks minimum time it takes to service reverse DNS lookups on the given Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.consul.leader.reconcile.avg

gauge

Time it takes the leader to reconcile the differences between Serf membership and Consul's store. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.consul.rpc.query

gauge

A general measure of all read volume. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.health.nodes.critical

gauge

Number of nodes for which health checks are reporting Critical state. This metric is reported by the leader only. It is reported with the dimensions datacenter, consul_node and consul_mode.

gauge.consul.health.nodes.passing

gauge

Number of nodes for which health checks are reporting Passing state. This metric is reported by the leader only. It is reported with the dimensions datacenter, consul_node and consul_mode.

gauge.consul.health.nodes.warning

gauge

Number of nodes for which health checks are reporting Warning state. This metric is reported by the leader only. It is reported with the dimensions datacenter, consul_node and consul_mode.

gauge.consul.health.services.critical

gauge

Number of services for which health checks are reporting Critical state. This metric is reported by the leader only. It is reported with the dimensions datacenter, consul_node and consul_mode.

gauge.consul.health.services.passing

gauge

Number of services for which health checks are reporting Passing state. This metric is reported by the leader only. It is reported with the dimensions datacenter, consul_node and consul_mode.

gauge.consul.health.services.warning

gauge

Number of services for which health checks are reporting Warning state. This metric is reported by the leader only. It is reported with the dimensions datacenter, consul_node and consul_mode.

gauge.consul.is_leader

gauge

Maps Consul servers to leader or follower state. A follower instance returns a value of 0 and the leader returns a value of 1. Used by a heat map in the dashboard, which makes it easy to tell the leader from the followers visually. This metric comes with the dimension consul_server_state, which can be either leader or follower. It also has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.network.dc.latency.avg

gauge

Average datacenter latency between 2 datacenters. This metric has the additional dimension destination_dc. The latency is calculated between this destination datacenter and the agent's datacenter, which is given by the datacenter dimension. Only the leader in the source datacenter calculates this metric. The metric also has the dimensions consul_mode and consul_node.

gauge.consul.network.dc.latency.max

gauge

Maximum datacenter latency between 2 datacenters. This metric has the additional dimension destination_dc. The latency is calculated between this destination datacenter and the agent's datacenter, which is given by the datacenter dimension. Only the leader in the source datacenter calculates this metric. The metric also has the dimensions consul_mode and consul_node.

gauge.consul.network.dc.latency.min

gauge

Minimum datacenter latency between 2 datacenters. This metric has the additional dimension destination_dc. The latency is calculated between this destination datacenter and the agent's datacenter, which is given by the datacenter dimension. Only the leader in the source datacenter calculates this metric. The metric also has the dimensions consul_mode and consul_node.

gauge.consul.network.node.latency.avg

gauge

Average network latency between given node and other nodes in the datacenter. The dimension consul_node corresponds to the source node. The metric also has the dimensions datacenter and consul_mode.

gauge.consul.network.node.latency.max

gauge

Maximum network latency between given node and other nodes in the datacenter. The dimension consul_node corresponds to the source node. The metric also has the dimensions datacenter and consul_mode.

gauge.consul.network.node.latency.min

gauge

Minimum network latency between given node and other nodes in the datacenter. The dimension consul_node corresponds to the source node. The metric also has the dimensions datacenter and consul_mode.

gauge.consul.peers

gauge

Number of consul Raft peers or consul agents in server mode in a given datacenter. This metric is reported by the leader only. It is reported with the dimensions datacenter, consul_node and consul_mode.

gauge.consul.raft.apply

gauge

This metric is a general indicator of the write load on the Consul servers. This metric has the global dimensions consul_node, consul_mode and datacenter.

gauge.consul.raft.commitTime.avg

gauge

This measures the mean time it takes to commit a new entry to the Raft log on the leader. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.raft.commitTime.max

gauge

This measures the max time it takes to commit a new entry to the Raft log on the leader. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.raft.commitTime.min

gauge

This measures the minimum time it takes to commit a new entry to the Raft log on the leader. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.raft.leader.dispatchLog.avg

gauge

This measures the mean time it takes for the leader to write log entries to disk. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.raft.leader.dispatchLog.max

gauge

This measures the maximum time it takes for the leader to write log entries to disk. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.raft.leader.dispatchLog.min

gauge

This measures the minimum time it takes for the leader to write log entries to disk. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.raft.leader.lastContact.avg

gauge

This measures the time since the leader was last able to contact the follower nodes when checking its leader lease. It can be used as a measure for how stable the Raft timing is and how close the leader is to timing out its lease. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.raft.leader.lastContact.max

gauge

This measures the maximum time since the leader was last able to contact the follower nodes when checking its leader lease. It can be used as a measure for how stable the Raft timing is and how close the leader is to timing out its lease. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.raft.leader.lastContact.min

gauge

This measures the minimum time since the leader was last able to contact the follower nodes when checking its leader lease. It can be used as a measure for how stable the Raft timing is and how close the leader is to timing out its lease. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.raft.replication.appendEntries.rpc.AGENT.avg

gauge

This measures the time it takes to replicate log entries to followers. This is a general indicator of the load pressure on the Consul servers, as well as the performance of the communication between the servers. This metric is sent by the leader for each follower. The follower's IP or hostname is added to the metric name. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.raft.replication.appendEntries.rpc.AGENT.max

gauge

This measures the maximum time it takes to replicate log entries to followers. This is a general indicator of the load pressure on the Consul servers, as well as the performance of the communication between the servers. This metric is sent by the leader for each follower. The follower's IP or hostname is added to the metric name. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.raft.replication.appendEntries.rpc.AGENT.min

gauge

This measures the minimum time it takes to replicate log entries to followers. This is a general indicator of the load pressure on the Consul servers, as well as the performance of the communication between the servers. This metric is sent by the leader for each follower. The follower's IP or hostname is added to the metric name. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.raft.state.candidate

gauge

Tracks the number of times given node enters the candidate state, i.e., the number of times the Consul server starts a leader election. If this increments without a leadership change occurring it could indicate that a single server is overloaded or is experiencing network connectivity issues. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.raft.state.leader

gauge

This metric increments whenever a Consul server becomes a leader. If there are frequent leadership changes, this may be an indication that the servers are overloaded and aren't meeting the soft real-time requirements for Raft, or that there are networking problems between the servers. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.rpc.query

gauge

gauge.consul.runtime.alloc_bytes

gauge

Number of bytes allocated to Consul process on the node. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.runtime.heap_objects

gauge

Number of heap objects allocated to Consul, indicates memory pressure on a Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.runtime.num_goroutines

gauge

Number of GO routines run by Consul process on the node. Gives the general load pressure indicator for Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.serf.events

gauge

Number of serf events processed by Consul. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.serf.events.consul:new-leader

gauge

gauge.consul.serf.member.join

gauge

This metric tracks successful node joins to the Serf memberlist. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.serf.member.left

gauge

This metric tracks successful node leaves from the Serf memberlist. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.serf.queue.Event.avg

gauge

Average number of serf events in queue yet to be processed by Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.serf.queue.Event.max

gauge

Maximum number of serf events in queue yet to be processed by Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.serf.queue.Event.min

gauge

Minimum number of serf events in queue yet to be processed by Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.serf.queue.Query.avg

gauge

Average number of serf queries in queue yet to be processed by Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.serf.queue.Query.max

gauge

Maximum number of serf queries in queue yet to be processed by Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.

gauge.consul.serf.queue.Query.min

gauge

Minimum number of serf queries in queue yet to be processed by Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.

These are the metrics available for this monitor. Metrics that are categorized as container/host (default) are in bold and italics in the list below.

  • consul.dns.stale_queries (gauge)
    Number of times an agent serves a DNS query based on information from a server that is more than 5 seconds out of date. This metric has the dimensions datacenter, consul_node and consul_mode.
  • consul.memberlist.msg.suspect (gauge)
    This increments when an agent suspects another as failed when executing random probes as part of the gossip protocol. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. This metric has the dimensions datacenter, consul_node and consul_mode.
  • consul.serf.member.flap (gauge)
    This metric increments when an agent is marked dead and then recovers within a short time period. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.catalog.nodes.total (gauge)
    The total number of nodes in the Consul datacenter. This metric is common to the cluster and, therefore, reported by the leader only. This metric has the dimensions datacenter, consul_node and consul_mode; consul_mode indicates whether the reporting Consul agent runs in server or client mode.
  • gauge.consul.catalog.nodes_by_service (gauge)
    Number of nodes providing a given service. This metric is reported by the leader only. The dimension consul_service indicates which service the metric corresponds to. The metric also has the datacenter and consul_mode dimensions.
  • gauge.consul.catalog.services.total (gauge)
    The total number of services registered with Consul in the given datacenter. This metric is common to the cluster and, therefore, reported by the leader only. This metric has the dimensions datacenter, consul_node and consul_mode; consul_mode indicates whether the reporting Consul agent runs in server or client mode.
  • gauge.consul.catalog.services_by_node (gauge)
    Number of services registered with a node. This metric is reported by the leader only. The dimension consul_node indicates which node the metric corresponds to. The metric also has the datacenter and consul_mode dimensions.
  • gauge.consul.consul.dns.domain_query.AGENT.avg (gauge)
    This tracks the average time it takes to service forward DNS lookups on the given Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.consul.dns.domain_query.AGENT.max (gauge)
    This tracks the maximum time it takes to service forward DNS lookups on the given Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.consul.dns.domain_query.AGENT.min (gauge)
    This tracks minimum time it takes to service forward DNS lookups on the given Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.consul.dns.ptr_query.AGENT.avg (gauge)
    This tracks average time it takes to service reverse DNS lookups on the given Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.consul.dns.ptr_query.AGENT.max (gauge)
    This tracks maximum time it takes to service reverse DNS lookups on the given Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.consul.dns.ptr_query.AGENT.min (gauge)
    This tracks minimum time it takes to service reverse DNS lookups on the given Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.consul.leader.reconcile.avg (gauge)
    Time it takes the leader to reconcile the differences between Serf membership and Consul's store. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.consul.rpc.query (gauge)
    A general measure of all read volume. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.health.nodes.critical (gauge)
    Number of nodes for which health checks are reporting Critical state. This metric is reported by leader only. This metric is reported with the dimension datacenter, consul_node name and consul_mode.
  • gauge.consul.health.nodes.passing (gauge)
    Number of nodes for which health checks are reporting Passing state. This metric is reported by the leader only. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.health.nodes.warning (gauge)
    Number of nodes for which health checks are reporting Warning state. This metric is reported by the leader only. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.health.services.critical (gauge)
    Number of services for which health checks are reporting Critical state. This metric is reported by leader only. This metric is reported with the dimension datacenter, consul_node name and consul_mode.
  • gauge.consul.health.services.passing (gauge)
    Number of services for which health checks are reporting Passing state. This metric is reported by the leader only. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.health.services.warning (gauge)
    Number of services for which health checks are reporting Warning state. This metric is reported by the leader only. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.is_leader (gauge)
    Indicates whether a Consul server is in the leader or follower state. A follower instance returns a value of 0 and a leader returns a value of 1. Used by a heat map in the dashboard, which makes it easy to tell the leader from the followers visually. This metric has the dimension consul_server_state, which can be either leader or follower, as well as the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.network.dc.latency.avg (gauge)
    Average latency between two datacenters. This metric has the additional dimension destination_dc. The latency is calculated between this destination datacenter and the agent's datacenter, given by the datacenter dimension. Only the leader in the source datacenter calculates this metric. The metric also has the dimensions consul_mode and consul_node.
  • gauge.consul.network.dc.latency.max (gauge)
    Maximum latency between two datacenters. This metric has the additional dimension destination_dc. The latency is calculated between this destination datacenter and the agent's datacenter, given by the datacenter dimension. Only the leader in the source datacenter calculates this metric. The metric also has the dimensions consul_mode and consul_node.
  • gauge.consul.network.dc.latency.min (gauge)
    Minimum latency between two datacenters. This metric has the additional dimension destination_dc. The latency is calculated between this destination datacenter and the agent's datacenter, given by the datacenter dimension. Only the leader in the source datacenter calculates this metric. The metric also has the dimensions consul_mode and consul_node.
  • gauge.consul.network.node.latency.avg (gauge)
    Average network latency between given node and other nodes in the datacenter. The dimension consul_node corresponds to the source node. The metric also has the dimensions datacenter and consul_mode.
  • gauge.consul.network.node.latency.max (gauge)
    Maximum network latency between the given node and other nodes in the datacenter. The dimension consul_node corresponds to the source node. The metric also has the dimensions datacenter and consul_mode.
  • gauge.consul.network.node.latency.min (gauge)
    Minimum network latency between given node and other nodes in the datacenter. The dimension consul_node corresponds to the source node. The metric also has the dimensions datacenter and consul_mode.
  • gauge.consul.peers (gauge)
    Number of Consul Raft peers, that is, Consul agents in server mode, in a given datacenter. This metric is reported by the leader only. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.raft.apply (gauge)
    This metric is a general indicator of the write load on the Consul servers. This metric has the global dimensions consul_node, consul_mode and datacenter.
  • gauge.consul.raft.commitTime.avg (gauge)
    This measures the mean time it takes to commit a new entry to the Raft log on the leader. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.raft.commitTime.max (gauge)
    This measures the max time it takes to commit a new entry to the Raft log on the leader. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.raft.commitTime.min (gauge)
    This measures the minimum time it takes to commit a new entry to the Raft log on the leader. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.raft.leader.dispatchLog.avg (gauge)
    This measures the mean time it takes for the leader to write log entries to disk. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.raft.leader.dispatchLog.max (gauge)
    This measures the maximum time it takes for the leader to write log entries to disk. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.raft.leader.dispatchLog.min (gauge)
    This measures the minimum time it takes for the leader to write log entries to disk. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.raft.leader.lastContact.avg (gauge)
    This measures the time since the leader was last able to contact the follower nodes when checking its leader lease. It can be used as a measure for how stable the Raft timing is and how close the leader is to timing out its lease. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.raft.leader.lastContact.max (gauge)
    This measures the maximum time since the leader was last able to contact the follower nodes when checking its leader lease. It can be used as a measure for how stable the Raft timing is and how close the leader is to timing out its lease. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.raft.leader.lastContact.min (gauge)
    This measures the minimum time since the leader was last able to contact the follower nodes when checking its leader lease. It can be used as a measure for how stable the Raft timing is and how close the leader is to timing out its lease. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.raft.replication.appendEntries.rpc.AGENT.avg (gauge)
    This measures the average time it takes to replicate log entries to followers. This is a general indicator of the load pressure on the Consul servers, as well as the performance of the communication between the servers. This metric is sent by the leader for each follower. The follower's IP address or hostname is appended to the metric name. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.raft.replication.appendEntries.rpc.AGENT.max (gauge)
    This measures the maximum time it takes to replicate log entries to followers. This is a general indicator of the load pressure on the Consul servers, as well as the performance of the communication between the servers. This metric is sent by the leader for each follower. The follower's IP address or hostname is appended to the metric name. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.raft.replication.appendEntries.rpc.AGENT.min (gauge)
    This measures the minimum time it takes to replicate log entries to followers. This is a general indicator of the load pressure on the Consul servers, as well as the performance of the communication between the servers. This metric is sent by the leader for each follower. The follower's IP address or hostname is appended to the metric name. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.raft.state.candidate (gauge)
    Tracks the number of times a given node enters the candidate state, i.e., the number of times the Consul server starts a leader election. If this increments without a leadership change occurring, it could indicate that a single server is overloaded or is experiencing network connectivity issues. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.raft.state.leader (gauge)
    This metric increments whenever a Consul server becomes a leader. Frequent leadership changes may be an indication that the servers are overloaded and aren't meeting the soft real-time requirements for Raft, or that there are networking problems between the servers. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.rpc.query (gauge)
  • gauge.consul.runtime.alloc_bytes (gauge)
    Number of bytes allocated to Consul process on the node. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.runtime.heap_objects (gauge)
    Number of heap objects allocated to Consul. This is an indicator of memory pressure on a Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.runtime.num_goroutines (gauge)
    Number of goroutines run by the Consul process on the node. This is a general indicator of load pressure on the Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.serf.events (gauge)
    Number of serf events processed by Consul. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.serf.events.consul:new-leader (gauge)
  • gauge.consul.serf.member.join (gauge)
    This metric tracks successful node joins to the Serf memberlist. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.serf.member.left (gauge)
    This metric tracks successful node leaves from the Serf memberlist. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.serf.queue.Event.avg (gauge)
    Average number of serf events in queue yet to be processed by Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.serf.queue.Event.max (gauge)
    Maximum number of serf events in queue yet to be processed by Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.serf.queue.Event.min (gauge)
    Minimum number of serf events in queue yet to be processed by Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.serf.queue.Query.avg (gauge)
    Average number of serf queries in queue yet to be processed by Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.serf.queue.Query.max (gauge)
    Maximum number of serf queries in queue yet to be processed by Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
  • gauge.consul.serf.queue.Query.min (gauge)
    Minimum number of serf queries in queue yet to be processed by Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.

Non-default metrics (version 4.7.0+)

The following information applies to agent versions 4.7.0+ that have enableBuiltInFiltering: true set at the top level of the agent config.

To emit metrics that are not default, you can add those metrics in the generic monitor-level extraMetrics config option. Metrics that are derived from specific configuration options that do not appear in the above list of metrics do not need to be added to extraMetrics.
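As a minimal sketch, the extraMetrics option described above might be used as follows. The host, port, and the two metric names are illustrative assumptions: the names are taken from the metric list on this page, and whether they are non-default depends on your agent version.

```yaml
enableBuiltInFiltering: true  # top-level setting, agent 4.7.0+

monitors:
  - type: collectd/consul
    host: 127.0.0.1  # assumed address of the Consul agent
    port: 8500       # assumed Consul HTTP API port
    extraMetrics:    # generic monitor-level option
      # Illustrative picks only; replace with the non-default
      # metrics you actually want to emit.
      - gauge.consul.serf.queue.Event.max
      - gauge.consul.serf.queue.Query.max
```

After restarting the agent with a config like this, agent-status monitors can be used to check which metrics are actually emitted.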

To see a list of metrics that will be emitted you can run agent-status monitors after configuring this monitor in a running agent instance.

Legacy non-default metrics (version < 4.7.0)

The following information only applies to agent versions older than 4.7.0. If you have a newer agent and have set enableBuiltInFiltering: true at the top level of your agent config, see the section above. See upgrade instructions in Old-style whitelist filtering.

If your agent's top-level metricsToExclude config option references whitelist.json, and you want to emit metrics that are not in that whitelist, add an item to the top-level metricsToInclude config option to override the whitelist (see Inclusion filtering). Alternatively, copy whitelist.json, modify it, and reference the copy in metricsToExclude.
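A sketch of such an override is shown below. It assumes that filter items accept the metricNames and monitorType fields, as in the agent's legacy filtering config; the metric name is an illustrative pick from the list on this page. Check the Inclusion filtering documentation for the exact fields your agent version supports.

```yaml
# Top-level agent config (agent versions older than 4.7.0).
# Assumes metricsToExclude already references whitelist.json.
metricsToInclude:
  - monitorType: collectd/consul  # limit the override to this monitor
    metricNames:
      # Illustrative pick; list the metrics you want to let through.
      - gauge.consul.serf.queue.Event.max
```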