
Zookeeper

Metadata associated with SignalFx’s integration with Zookeeper can be found here. The relevant code for the plugin can be found here.

DESCRIPTION

This is a collectd plugin for getting metrics and information from ZooKeeper servers, based on the ZooKeeper monitoring script check_zookeeper.py.

FEATURES

Built-in dashboards

  • Zookeeper Nodes: Overview of data from all Zookeeper nodes.

image1

  • Zookeeper Node: Focus on a single Zookeeper node.

image2

REQUIREMENTS AND DEPENDENCIES

This plugin requires:

Software                      Version
collectd                      4.9+
Python plugin for collectd    (included with SignalFx collectd agent)
Python                        2.6+
Zookeeper                     3.4.0+

Note:

  • ZooKeeper 3.4.0 or later is required because the plugin uses the mntr four-letter-word command.
  • If support for earlier versions is needed, the plugin would have to use the srvr command (available since 3.3.0) or the stat command (available before 3.3, but it fetches extra unneeded data).
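To illustrate what the mntr command returns, here is a minimal Python 3 sketch that sends mntr to a server and parses the tab-separated key/value response. It is a standalone illustration, not the plugin's code; the function names (parse_mntr, query_mntr, metric_name) are hypothetical, and the counter/gauge split is taken from the metrics table in this document.

```python
import socket

# Keys listed as cumulative counters in the metrics table; all others are gauges.
COUNTER_KEYS = {"zk_packets_received", "zk_packets_sent"}


def parse_mntr(text):
    """Parse tab-separated key/value lines of an mntr response into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if not sep:
            continue  # skip blank or malformed lines
        value = value.strip()
        for convert in (int, float):
            try:
                stats[key] = convert(value)
                break
            except ValueError:
                pass
        else:
            stats[key] = value  # non-numeric, e.g. zk_version
    return stats


def query_mntr(host="localhost", port=2181, timeout=5.0):
    """Send the mntr four-letter word and return the parsed response."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"mntr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return parse_mntr(b"".join(chunks).decode("utf-8", "replace"))


def metric_name(key):
    """Map an mntr key to the metric name used in this document."""
    prefix = "counter" if key in COUNTER_KEYS else "gauge"
    return prefix + "." + key
```

Calling query_mntr() with the default arguments assumes a ZooKeeper server listening on localhost:2181, matching the defaults in the configuration table.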

INSTALLATION

If you are using the new Smart Agent, see the docs for the collectd/zookeeper monitor for more information. The configuration documentation below may be helpful as well, but consult the Smart Agent repo’s docs for the exact schema.

  1. Download the collectd-zookeeper Python module.
  2. Download SignalFx’s sample configuration file to /etc/collectd/managed_config.
  3. Modify the configuration file as described in Configuration below.
  4. Restart collectd.

CONFIGURATION

Using the example configuration file 20-zookeeper.conf as a guide, provide values for the configuration options listed below that make sense for your environment and allow you to connect to the Zookeeper instance to be monitored.

Setting     Description                              Default
Hosts       Hostname where Zookeeper is running      "localhost"
Port        Port number for Zookeeper                2181
Instance    Specify a cluster name                   none (commented out)

USAGE

Sample of built-in dashboard in SignalFx:

image3

METRICS

Below is a list of all metrics.

Metric Name                            Brief                                                                         Type
counter.zk_packets_received            Count of the number of ZooKeeper packets received by a server                 cumulative counter
counter.zk_packets_sent                Count of the number of ZooKeeper packets sent from a server                   cumulative counter
gauge.zk_approximate_data_size         Size of data in bytes that a ZooKeeper server has in its data tree            gauge
gauge.zk_avg_latency                   Average time in milliseconds for requests to be processed                     gauge
gauge.zk_ephemerals_count              Number of ephemeral nodes that a ZooKeeper server has in its data tree        gauge
gauge.zk_max_file_descriptor_count     Maximum number of file descriptors that a ZooKeeper server can open           gauge
gauge.zk_max_latency                   Maximum time in milliseconds for a request to be processed                    gauge
gauge.zk_min_latency                   Minimum time in milliseconds for a request to be processed                    gauge
gauge.zk_num_alive_connections         Number of active clients connected to a ZooKeeper server                      gauge
gauge.zk_open_file_descriptor_count    Number of file descriptors that a ZooKeeper server has open                   gauge
gauge.zk_outstanding_requests          Number of currently executing requests                                        gauge
gauge.zk_watch_count                   Number of watches placed on z-nodes on a ZooKeeper server                     gauge
gauge.zk_znode_count                   Number of z-nodes that a ZooKeeper server has in its data tree                gauge

counter.zk_packets_received

cumulative counter

How many ZooKeeper packets have been received by a ZooKeeper server.

Use this metric to see how many packets are being received by a ZooKeeper server.

If the value of this metric on one server differs significantly from other servers:

  • There might be a connection imbalance. Check the gauge.zk_num_alive_connections metric for this server to see if it also differs significantly from other servers.
  • There might be an unbalanced number of requests being sent to this server. This can happen if clients are not sending the expected number of requests.
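Because this metric is a cumulative counter, comparing servers is easier after converting it to a rate. The following Python sketch shows one possible way to do that and to flag servers that deviate from the rest. The function names, the median-based comparison, and the 50% tolerance are illustrative assumptions, not part of the plugin or of SignalFx.

```python
from statistics import median


def rate_per_second(prev_value, curr_value, interval_seconds):
    """Per-second rate between two samples of a cumulative counter."""
    if curr_value < prev_value:
        return None  # counter went backwards, e.g. the server restarted
    return (curr_value - prev_value) / interval_seconds


def find_outliers(rates_by_server, tolerance=0.5):
    """Return servers whose rate differs from the median by more than
    `tolerance`, expressed as a fraction of the median."""
    known = {s: r for s, r in rates_by_server.items() if r is not None}
    if not known:
        return []
    mid = median(known.values())
    if mid == 0:
        # With an idle median, any non-zero rate stands out.
        return sorted(s for s, r in known.items() if r != 0)
    return sorted(s for s, r in known.items() if abs(r - mid) / mid > tolerance)
```

For example, with rates of 10, 11, and 40 packets per second on three servers, only the third server is flagged. The same comparison can then be repeated on gauge.zk_num_alive_connections to check for a connection imbalance.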

counter.zk_packets_sent

cumulative counter

How many ZooKeeper packets have been sent from a ZooKeeper server.

Use this metric to see how many packets are being sent from a ZooKeeper server.

If the value of this metric on one server differs significantly from other servers:

  • There might be a connection imbalance. Check the gauge.zk_num_alive_connections metric for this server to see if it also differs significantly from other servers.
  • There might be an unbalanced number of requests being sent to this server. Check the counter.zk_packets_received metric for this server to see if it also differs significantly from other servers.

gauge.zk_approximate_data_size

gauge

The size in bytes of the data tree for a ZooKeeper server.

Use this metric to keep track of the size of the data tree on a ZooKeeper server.

Any unexpected changes may be caused by:

  • A client writing or deleting data.
  • A client disconnecting and ephemeral nodes being deleted as a result. Check to see if the gauge.zk_num_alive_connections metric for this server is also changing unexpectedly.

gauge.zk_avg_latency

gauge

The average time, in milliseconds, that this ZooKeeper server takes to process a request. This is measured since the last restart of the ZooKeeper server.

If this metric is continuously rising then one of the following may be true:

  • There may be too many requests being sent to this server for its CPU capacity. Make sure the host has enough CPU capacity for the ZooKeeper server.
  • This server may be having an issue connecting to other ZooKeeper servers. Look at logs on the server to see if they contain exceptions.
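As a rough illustration of what "continuously rising" could mean in practice, the Python sketch below checks whether the most recent samples of a gauge are strictly increasing. The function name and the window of five samples are arbitrary assumptions; a production detector would normally tolerate noise.

```python
def is_continuously_rising(samples, min_points=5):
    """True if the last `min_points` samples are strictly increasing."""
    if len(samples) < min_points:
        return False  # not enough data to judge a trend
    recent = samples[-min_points:]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

The same check could be applied to gauge.zk_max_latency or gauge.zk_outstanding_requests, which this document describes with similar "rising" or "climbing" conditions.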

gauge.zk_ephemerals_count

gauge

The number of unique ephemeral z-nodes on a ZooKeeper server.

Use this metric to keep track of the number of ephemeral z-nodes on a ZooKeeper server.

Any unexpected changes may be caused by:

  • A client creating or deleting ephemeral nodes.
  • A client disconnecting and ephemeral nodes being deleted as a result. Check to see if the gauge.zk_num_alive_connections metric for this server is decreasing unexpectedly.

gauge.zk_max_file_descriptor_count

gauge

The maximum number of file descriptors a ZooKeeper server can open.

Compare this metric to gauge.zk_open_file_descriptor_count to keep track of file descriptor capacity in a ZooKeeper process.

gauge.zk_max_latency

gauge

The maximum time, in milliseconds, that this ZooKeeper server took to process a request. This is measured since the last restart of the ZooKeeper server.

If this metric is rising then one of the following may be true:

  • There may be too many requests being sent to this server for its CPU capacity. Make sure the host has enough CPU capacity for the ZooKeeper server.
  • This server may be unable to connect to other ZooKeeper servers. Look at logs on the server to see if they contain exceptions.

gauge.zk_min_latency

gauge

The minimum time, in milliseconds, that this ZooKeeper server took to process a request. This is measured since the last restart of the ZooKeeper server.

If this metric is rising then one of the following may be true:

  • There may be too many requests being sent to this server for its CPU capacity. Make sure the host has enough CPU capacity for the ZooKeeper server.
  • This server may be unable to connect to other ZooKeeper servers. Look at logs on the server to see if they contain exceptions.

gauge.zk_num_alive_connections

gauge

The number of active clients connected to a ZooKeeper server.

An active client is one that is sending in regular heartbeat messages.

Use this metric to keep track of the number of clients on a ZooKeeper server.
For example, this metric will show you if one server in a ZooKeeper cluster has more clients than the others.

Any unexpected changes may be caused by:

  • Networking issues such as dropped packets or large network latency.
  • Clients disconnecting or reconnecting.

gauge.zk_open_file_descriptor_count

gauge

The number of file descriptors a ZooKeeper server has open.

Use this metric to keep track of open file descriptors in a ZooKeeper process.

If this number is too high then ZooKeeper will stop accepting connections from clients.
Compare this metric to gauge.zk_max_file_descriptor_count to figure out how close you are to file descriptor capacity.

File descriptor counts grow with:

  • The number of clients connected
  • The number of data files on disk
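The comparison between open and maximum file descriptors can be expressed as a utilization ratio. The Python sketch below is one way to compute it; the function names and the 80% warning threshold are assumptions chosen for illustration, not values recommended by ZooKeeper or SignalFx.

```python
def fd_utilization(open_count, max_count):
    """Fraction of the file descriptor limit currently in use.

    open_count: value of gauge.zk_open_file_descriptor_count
    max_count:  value of gauge.zk_max_file_descriptor_count
    """
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    return open_count / max_count


def fd_status(open_count, max_count, warn_at=0.8):
    """Return "warning" once utilization reaches `warn_at`, else "ok"."""
    return "warning" if fd_utilization(open_count, max_count) >= warn_at else "ok"
```

For instance, 512 open descriptors against a limit of 4096 gives a utilization of 0.125, which is well below the assumed threshold.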

gauge.zk_outstanding_requests

gauge

The instantaneous number of requests on a ZooKeeper server that have started but have not finished yet.

If this metric is climbing:

  • There may be too many requests being sent to this server for its CPU capacity. Check CPU capacity on your ZooKeeper cluster and make sure that this server is not receiving a disproportionate number of requests relative to other servers.
  • This server may be unable to connect to other ZooKeeper servers. Look at logs on the server to see if they contain exceptions.

gauge.zk_watch_count

gauge

The number of watches on z-nodes on a ZooKeeper server.

Use this metric to keep track of the number of z-node watches on a ZooKeeper server.

Any unexpected changes may be caused by:

  • A client creating or deleting watches. Check that clients are not acting unexpectedly.
  • A client disconnecting. This will remove that client’s watches. Confirm this by checking the gauge.zk_num_alive_connections metric for this server.

gauge.zk_znode_count

gauge

The number of unique z-nodes that a ZooKeeper server has in its data tree.

Use this metric to keep track of the number of z-nodes on a ZooKeeper server.

Any unexpected changes may be caused by:

  • A client creating or deleting nodes. Check that clients are not acting unexpectedly.
  • A client disconnecting and ephemeral nodes being deleted. Check to see if the gauge.zk_ephemerals_count metric for this server has decreased unexpectedly.
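To tell the two causes above apart, the change in gauge.zk_znode_count can be compared with the change in gauge.zk_ephemerals_count over the same interval. The Python sketch below assumes that ephemeral nodes are included in the z-node count and that both samples are dicts keyed by the mntr names; the function name is hypothetical.

```python
def split_znode_change(prev, curr):
    """Split the change in z-node count into ephemeral and persistent parts.

    `prev` and `curr` are dicts holding zk_znode_count and
    zk_ephemerals_count from two consecutive samples of one server.
    """
    total = curr["zk_znode_count"] - prev["zk_znode_count"]
    ephemeral = curr["zk_ephemerals_count"] - prev["zk_ephemerals_count"]
    return {
        "total": total,
        "ephemeral": ephemeral,
        # Whatever is not explained by ephemeral nodes is attributed to
        # persistent nodes being created or deleted by clients.
        "persistent": total - ephemeral,
    }
```

If most of a drop is in the "ephemeral" part, a client disconnect is the more likely cause; a large "persistent" part points to clients explicitly creating or deleting nodes.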