Cassandra
Metadata associated with SignalFx’s collectd integration for Cassandra can be found here. The relevant code for the plugin can be found here.
DESCRIPTION
Monitor Cassandra using SignalFx’s configuration of the Java plugin for collectd.
Use this integration to monitor the following types of information from Cassandra nodes:
- read/write/range-slice requests
- read/write/range-slice errors (timeouts and unavailable)
- read/write/range-slice latency (median, 99th percentile, maximum)
- compaction activity
- hint activity
REQUIREMENTS AND DEPENDENCIES
Version information
Software | Version |
---|---|
collectd | 4.9+ |
Java plugin for collectd | (match with collectd version) |
Cassandra | 2.0.10+ |
INSTALLATION
If you are using the new Smart Agent, see the docs for the collectd/cassandra monitor for more information. The configuration documentation below may be helpful as well, but consult the Smart Agent repo’s docs for the exact schema.
System modifications
Open the JMX port on your Cassandra nodes. By default, Cassandra listens for JMX connections on port 7199 (versions before 0.8.0-beta1 used port 8080). More information can be found at the Cassandra Project site. There is also a page covering a few common issues.
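Before configuring collectd, it can help to confirm that the JMX port accepts TCP connections from the collectd host. The following Python sketch assumes the default port 7199; the helper name `jmx_port_reachable` is hypothetical. A successful TCP connection only shows that something is listening on the port, not that a JMX client will be able to complete a connection.

```python
import socket


def jmx_port_reachable(host: str = "localhost", port: int = 7199, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only checks that something is listening on the port; it does not
    speak the JMX/RMI protocol.
    """
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, running `jmx_port_reachable("localhost", 7199)` on the collectd host returns False if the port is closed or firewalled.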
Install Cassandra integration
- RHEL/CentOS and Amazon Linux users: Install the [Java plugin for collectd](https://docs.signalfx.com/en/latest/integrations/integrations-reference/integrations.java.html) if it is not already installed.
- Download SignalFx’s example Cassandra configuration file, 20-cassandra.conf, to /etc/collectd/managed_config.
- Modify 20-cassandra.conf to provide values that make sense for your environment, as described in Configuration, below.
- Restart collectd.
CONFIGURATION
Using the example configuration file 20-cassandra.conf as a guide, provide values for the configuration options listed below that make sense for your environment and allow you to connect to the Cassandra instance to be monitored.
Configuration Option | Description | Default |
---|---|---|
ServiceURL | URL of your JMX application. | service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi |
Host | The name of your host. Appears as dimension host in SignalFx. Note: leave the identifier [hostHasService=cassandra] in the host name. | testcassandraserver[hostHasService=cassandra] |
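To sanity-check these two options before restarting collectd, the following Python sketch (hypothetical helper names) extracts the host and port from a ServiceURL of the same shape as the default, and confirms that a Host value still carries the [hostHasService=cassandra] identifier. The regular expression only accepts the service:jmx:rmi:///jndi/rmi://HOST:PORT/jmxrmi form; other valid JMX URL forms, such as ones with IPv6 literals, are rejected.

```python
import re
from typing import Tuple

# Matches only the JMX-over-RMI form used by the default ServiceURL:
#   service:jmx:rmi:///jndi/rmi://<host>:<port>/jmxrmi
_SERVICE_URL = re.compile(
    r"^service:jmx:rmi:///jndi/rmi://(?P<host>[^:/]+):(?P<port>\d+)/jmxrmi$"
)

HOST_IDENTIFIER = "[hostHasService=cassandra]"


def parse_service_url(url: str) -> Tuple[str, int]:
    """Return (host, port) from a ServiceURL, or raise ValueError if it has another shape."""
    match = _SERVICE_URL.match(url)
    if match is None:
        raise ValueError(f"unexpected ServiceURL format: {url!r}")
    return match.group("host"), int(match.group("port"))


def check_host(host: str) -> str:
    """Return the Host value unchanged if it keeps the required identifier."""
    if HOST_IDENTIFIER not in host:
        raise ValueError(f"Host value must contain {HOST_IDENTIFIER}: {host!r}")
    return host
```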
METRICS
Below is a list of all metrics.
Metric Name | Brief | Type |
---|---|---|
counter.cassandra.ClientRequest.RangeSlice.Latency.Count | Count of range slice operations since server start | cumulative_counter |
counter.cassandra.ClientRequest.RangeSlice.Timeouts.Count | Count of range slice timeouts since server start | cumulative_counter |
counter.cassandra.ClientRequest.RangeSlice.Unavailables.Count | Count of range slice unavailables since server start | cumulative_counter |
counter.cassandra.ClientRequest.Read.Latency.Count | Count of read operations since server start | cumulative_counter |
counter.cassandra.ClientRequest.Read.Timeouts.Count | Count of read timeouts since server start | cumulative_counter |
counter.cassandra.ClientRequest.Read.Unavailables.Count | Count of read unavailables since server start | cumulative_counter |
counter.cassandra.ClientRequest.Write.Latency.Count | Count of write operations since server start | cumulative_counter |
counter.cassandra.ClientRequest.Write.Timeouts.Count | Count of write timeouts since server start | cumulative_counter |
counter.cassandra.ClientRequest.Write.Unavailables.Count | Count of write unavailables since server start | cumulative_counter |
counter.cassandra.Compaction.TotalCompactionsCompleted.Count | Number of compaction operations since node start | cumulative_counter |
gauge.cassandra.ClientRequest.RangeSlice.Latency.50thPercentile | 50th percentile (median) of Cassandra range slice latency | gauge |
gauge.cassandra.ClientRequest.RangeSlice.Latency.99thPercentile | 99th percentile of Cassandra range slice latency | gauge |
gauge.cassandra.ClientRequest.RangeSlice.Latency.Max | Maximum Cassandra range slice latency | gauge |
gauge.cassandra.ClientRequest.Read.Latency.50thPercentile | 50th percentile (median) of Cassandra read latency | gauge |
gauge.cassandra.ClientRequest.Read.Latency.99thPercentile | 99th percentile of Cassandra read latency | gauge |
gauge.cassandra.ClientRequest.Read.Latency.Max | Maximum Cassandra read latency | gauge |
gauge.cassandra.ClientRequest.Write.Latency.50thPercentile | 50th percentile (median) of Cassandra write latency | gauge |
gauge.cassandra.ClientRequest.Write.Latency.99thPercentile | 99th percentile of Cassandra write latency | gauge |
gauge.cassandra.ClientRequest.Write.Latency.Max | Maximum Cassandra write latency | gauge |
gauge.cassandra.Compaction.PendingTasks.Value | Number of compaction operations waiting to run | gauge |
gauge.cassandra.Storage.Load.Count | Storage used for Cassandra data in bytes | gauge |
gauge.cassandra.Storage.TotalHints.Count | Total hints since node start | gauge |
gauge.cassandra.Storage.TotalHintsInProgress.Count | Total pending hints | gauge |
counter.cassandra.ClientRequest.RangeSlice.Latency.Count
cumulative_counter
Count of range slice operations since server start
This metric indicates the range slice load of the server.
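Because this counter is cumulative since server start, load is usually read from the change between samples rather than from the raw value. A minimal Python sketch, assuming two (timestamp, value) samples with timestamps in seconds; the function name is hypothetical, and it treats a decrease as a server restart and skips that interval rather than guessing a rate.

```python
from typing import Optional


def counter_rate(prev_ts: float, prev_value: float, cur_ts: float, cur_value: float) -> Optional[float]:
    """Per-second rate of a cumulative counter between two samples.

    Returns None when the value went down, which for a "since server start"
    counter usually means Cassandra restarted and the counter reset.
    """
    elapsed = cur_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("current sample must be later than the previous one")
    delta = cur_value - prev_value
    if delta < 0:
        return None  # counter reset; no meaningful rate for this interval
    return delta / elapsed
```

The same calculation applies to the read, write, and timeout counters.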
counter.cassandra.ClientRequest.RangeSlice.Timeouts.Count
cumulative_counter
Count of range slice timeouts since server start
This typically indicates a server overload condition.
If this value is increasing across the cluster, then the cluster is too small for the application's range slice load.
If this value is increasing for a single server in a cluster, then one of the following conditions may be true:
- one or more clients are directing more load to this server than to the others
- the server is experiencing hardware or software issues and may require maintenance
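The distinction above, cluster-wide growth versus growth on a single server, can be sketched in Python as follows. It assumes you already have the increase in the timeout counter for each node over the same window; the function name and the `cluster_fraction` threshold (more than half of the nodes by default) are illustrative assumptions, not SignalFx settings. The same check applies to the read and write timeout counters.

```python
from typing import Dict, List, Tuple


def classify_timeouts(
    increase_by_node: Dict[str, float], cluster_fraction: float = 0.5
) -> Tuple[str, List[str]]:
    """Classify timeout growth over a window.

    Returns ("ok", []) when no node's timeout counter increased,
    ("cluster", nodes) when more than `cluster_fraction` of nodes are affected,
    and ("node", nodes) when only a minority of nodes are affected.
    """
    rising = sorted(node for node, inc in increase_by_node.items() if inc > 0)
    if not rising:
        return "ok", []
    if len(rising) / len(increase_by_node) > cluster_fraction:
        return "cluster", rising  # cluster may be too small for the load
    return "node", rising  # uneven client load or a node needing maintenance
```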
counter.cassandra.ClientRequest.Read.Latency.Count
cumulative_counter
Count of read operations since server start
This metric indicates the read load of the server.
counter.cassandra.ClientRequest.Read.Timeouts.Count
cumulative_counter
Count of read timeouts since server start
This typically indicates a server overload condition.
If this value is increasing across the cluster, then the cluster is too small for the application's read load.
If this value is increasing for a single server in a cluster, then one of the following conditions may be true:
- one or more clients are directing more load to this server than to the others
- the server is experiencing hardware or software issues and may require maintenance
counter.cassandra.ClientRequest.Write.Latency.Count
cumulative_counter
Count of write operations since server start
This metric indicates the write load of the server.
counter.cassandra.ClientRequest.Write.Timeouts.Count
cumulative_counter
Count of write timeouts since server start
This typically indicates a server overload condition.
If this value is increasing across the cluster, then the cluster is too small for the application's write load.
If this value is increasing for a single server in a cluster, then one of the following conditions may be true:
- one or more clients are directing more load to this server than to the others
- the server is experiencing hardware or software issues and may require maintenance
counter.cassandra.Compaction.TotalCompactionsCompleted.Count
cumulative_counter
Number of compaction operations since node start
If this value does not increase steadily over time, the node may be experiencing problems completing compaction operations.
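One way to turn "does not increase steadily" into a check is to look at the most recent samples of the counter and flag the node when none of them increased. A Python sketch, assuming samples are taken at a regular interval; the function name and the default window of 6 samples are arbitrary illustrative choices. A node that simply has nothing to compact will also be flagged, so treat a hit as a prompt to look at gauge.cassandra.Compaction.PendingTasks.Value rather than as a diagnosis.

```python
from typing import Sequence


def compaction_stalled(completed: Sequence[float], window: int = 6) -> bool:
    """Return True if the compaction counter never increased within the last `window` samples.

    `completed` holds successive values of the cumulative counter, oldest first.
    Returns False when there are not enough samples to judge.
    """
    if window < 2:
        raise ValueError("window must cover at least two samples")
    if len(completed) < window:
        return False
    recent = completed[-window:]
    # Stalled means no sample in the window is higher than the one before it.
    return not any(b > a for a, b in zip(recent, recent[1:]))
```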
gauge.cassandra.ClientRequest.RangeSlice.Latency.50thPercentile
gauge
50th percentile (median) of recent Cassandra range slice latency
This value should be similar across all nodes in the cluster. If some nodes have higher values than the rest of the cluster, they may have more connected clients or may be experiencing a heavier-than-usual compaction load.
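Comparing each node against the rest of the cluster can be sketched in Python by flagging nodes whose latency exceeds a multiple of the cluster median. The `factor` of 2.0 and the function name are illustrative assumptions; choose a threshold that matches your own latency profile. The same comparison works for the 99th-percentile, read, and write latency gauges.

```python
from statistics import median
from typing import Dict, List


def latency_outliers(latency_by_node: Dict[str, float], factor: float = 2.0) -> List[str]:
    """Return nodes whose latency is more than `factor` times the cluster median."""
    if not latency_by_node:
        return []
    threshold = factor * median(latency_by_node.values())
    return sorted(node for node, value in latency_by_node.items() if value > threshold)
```

Using the median rather than the mean keeps a single slow node from raising the threshold it is compared against.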
gauge.cassandra.ClientRequest.RangeSlice.Latency.99thPercentile
gauge
99th percentile of recent Cassandra range slice latency
This value should be similar across all nodes in the cluster. If some nodes have higher values than the rest of the cluster, they may have more connected clients or may be experiencing a heavier-than-usual compaction load.
gauge.cassandra.ClientRequest.RangeSlice.Latency.Max
gauge
Maximum recent Cassandra range slice latency
gauge.cassandra.ClientRequest.Read.Latency.50thPercentile
gauge
50th percentile (median) of recent Cassandra read latency
This value should be similar across all nodes in the cluster. If some nodes have higher values than the rest of the cluster, they may have more connected clients or may be experiencing a heavier-than-usual compaction load.
gauge.cassandra.ClientRequest.Read.Latency.99thPercentile
gauge
99th percentile of recent Cassandra read latency
This value should be similar across all nodes in the cluster. If some nodes have higher values than the rest of the cluster, they may have more connected clients or may be experiencing a heavier-than-usual compaction load.
gauge.cassandra.ClientRequest.Write.Latency.50thPercentile
gauge
50th percentile (median) of recent Cassandra write latency
This value should be similar across all nodes in the cluster. If some nodes have higher values than the rest of the cluster, they may have more connected clients or may be experiencing a heavier-than-usual compaction load.
gauge.cassandra.ClientRequest.Write.Latency.99thPercentile
gauge
99th percentile of recent Cassandra write latency
This value should be similar across all nodes in the cluster. If some nodes have higher values than the rest of the cluster, they may have more connected clients or may be experiencing a heavier-than-usual compaction load.
gauge.cassandra.Compaction.PendingTasks.Value
gauge
Number of compaction operations waiting to run
If this value is continually increasing, the node may be experiencing problems completing compaction operations.
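A minimal Python sketch of "continually increasing": the gauge never drops across the last few samples and ends higher than it started. The function name and the default window of 6 samples are hypothetical choices; samples are assumed to be taken at a regular interval, oldest first.

```python
from typing import Sequence


def continually_increasing(pending: Sequence[float], window: int = 6) -> bool:
    """Return True if the last `window` samples never decrease and end above where they started."""
    if window < 2:
        raise ValueError("window must cover at least two samples")
    if len(pending) < window:
        return False
    recent = pending[-window:]
    never_drops = all(b >= a for a, b in zip(recent, recent[1:]))
    return never_drops and recent[-1] > recent[0]
```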
gauge.cassandra.Storage.Load.Count
gauge
Storage used for Cassandra data in bytes
Use this metric to see how much storage a Cassandra node is using for data.
The value of this metric is influenced by:
- the total amount of data stored in the database
- compaction behavior
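Because this gauge is reported in bytes, it is often easier to read in binary units. A small Python formatter with a hypothetical helper name, intended only as a convenience for scripts that read the raw value:

```python
def format_bytes(num_bytes: float) -> str:
    """Format a byte count using binary (1024-based) units with one decimal place."""
    value = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if abs(value) < 1024:
            return f"{value:.1f} {unit}"
        value /= 1024
    # Anything at or above 1024 TiB is shown in PiB.
    return f"{value:.1f} PiB"
```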
gauge.cassandra.Storage.TotalHints.Count
gauge
Total hints since node start
Hints indicate that write operations could not be delivered to a node, usually because that node is down. If this value is increasing and all nodes are up, there may be a connectivity issue between nodes in the cluster.
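The reasoning above (rising hints are expected while a node is down, but suspicious when every node is up) can be sketched in Python. It assumes you already know the increase in the hint count over a window and the up/down state of each node, for example from `nodetool status`; the function name and return strings are hypothetical.

```python
from typing import Dict


def interpret_hints(hint_increase: float, node_up: Dict[str, bool]) -> str:
    """Interpret growth in the hint count given the up/down state of each node."""
    if hint_increase <= 0:
        return "ok"
    down = sorted(node for node, up in node_up.items() if not up)
    if down:
        # Hints are expected while these nodes cannot receive writes.
        return "nodes down: " + ", ".join(down)
    # Hints are growing even though every node reports as up.
    return "possible connectivity issue between nodes"
```

The same interpretation applies to the pending-hints gauge that follows.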
gauge.cassandra.Storage.TotalHintsInProgress.Count
gauge
Total pending hints
Hints indicate that write operations could not be delivered to a node, usually because that node is down. If this value is increasing and all nodes are up, there may be a connectivity issue between nodes in the cluster.