
Apache Spark

This plugin currently supports the Standalone and Mesos cluster modes.

Metadata associated with the spark plugin for collectd can be found here. The relevant code for the plugin can be found here.

DESCRIPTION

This is the SignalFx Apache Spark plugin. Note that only the Standalone and Mesos cluster modes are currently supported. Follow these instructions to install the Apache Spark plugin for collectd.

The spark-collectd plugin collects metrics from Spark clusters and instances by hitting the endpoints specified in Spark’s Monitoring and Instrumentation documentation, under REST API and Metrics (see the Spark documentation).
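As a rough illustration (not the plugin’s actual code), the URLs polled for the Metrics sink and for application-level data can be assembled from the MetricsURL, MasterPort, WorkerPorts, and Master options like this. The hosts and ports are example values, and the exact servlet paths can vary with your metrics.properties configuration:

```python
# Illustrative sketch, assuming a Standalone cluster with the default
# Metrics HTTP Servlet Sink paths; values below are examples only.

def metrics_endpoints(metrics_url, master_port, worker_ports):
    """Metrics sink URLs for the master and each worker webui port."""
    urls = ["{0}:{1}/metrics/master/json".format(metrics_url, master_port)]
    urls += ["{0}:{1}/metrics/json".format(metrics_url, p) for p in worker_ports]
    return urls

def applications_endpoint(master):
    """REST API URL for application-level metrics."""
    return master + "/api/v1/applications"

print(metrics_endpoints("http://localhost", 8080, [8081, 8082]))
print(applications_endpoint("http://localhost:8080"))
```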

FEATURES

Built-in dashboards

  • Spark Overview: Overview of Spark cluster.


  • Master Process: Overview of master process (data captured from Metrics sink).


  • Worker Process: Overview of worker process (data captured from Metrics sink).


  • Spark Application: Overview of data by Spark Application and user.


  • Running Jobs: Overview of running jobs by Spark Application and user.


  • Active Stages: Overview of active stages by Spark Application and user.


  • Driver: Overview of driver executor by Spark Application and user.


  • Executors: Overview of executors (excluding driver) by Spark Application and user.


  • Streaming Statistics: Overview of streaming applications by Spark Application and user.


REQUIREMENTS AND DEPENDENCIES

Version information

Software                      Version
collectd                      4.9 or later
python                        2.6 or later
spark                         2.2.0 or later
Python plugin for collectd    (included with SignalFx collectd agent)

INSTALLATION

If you are using the new Smart Agent, see the docs for the collectd/spark monitor for more information. The configuration documentation below may be helpful as well, but consult the Smart Agent repo’s docs for the exact schema.

  1. Download collectd-spark. Place the spark_plugin.py file in /usr/share/collectd/collectd-spark
  2. Copy the sample configuration file for this plugin to /etc/collectd/managed_config
  3. Modify the sample configuration file as described in Configuration, below
  4. Install the Python requirements with sudo pip install -r requirements.txt
  5. Restart collectd

CONFIGURATION

Using the example configuration file 10-spark.conf as a guide, provide values for the configuration options listed below that make sense for your environment and the metrics you want collected and reported.

configuration option definition example value
ModulePath Path on disk where collectd can find this module. “/usr/share/collectd/collectd-spark/”
MetricsURL URL for the master or worker node, if the Metrics source (and therefore, by default, the Metrics HTTP Servlet Sink) is enabled http://localhost
MasterPort Master (webui) port to query for metrics 8080
WorkerPorts Space-separated worker (webui) ports to query for metrics 8081 8082
Applications Boolean indicating whether to collect application level metrics “False”
Master URL for master application http://localhost:8080
Cluster Your Spark cluster mode; only Standalone and Mesos are supported “Standalone”
EnhancedMetrics Flag to specify whether to include additional metrics “False”
IncludeMetrics Metrics from enhanced metrics that can be included individually “metric_name_1,metric_name_2”
ExcludeMetrics Metrics from enhanced metrics that can be excluded individually “metric_name_1,metric_name_2”
Dimension Key-value pair for a user-defined dimension “dimension_key=dimension_value”
Dimensions Comma-separated key-value pairs for user-defined dimensions “dimension_key1=dimension_value1,dimension_key2=dimension_value2”
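For illustration only (this is a sketch, not the plugin’s implementation), the comma-separated dimensions value resolves to key-value pairs roughly like so:

```python
def parse_dimensions(value):
    """Split a "k1=v1,k2=v2" string into a dimensions dict."""
    dims = {}
    for pair in value.split(","):
        key, sep, val = pair.partition("=")
        if sep:  # keep only well-formed key=value pairs
            dims[key.strip()] = val.strip()
    return dims

print(parse_dimensions("dimension_key1=dimension_value1,dimension_key2=dimension_value2"))
```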

Example configuration:

LoadPlugin python
<Plugin python>
  ModulePath "/usr/share/collectd/collectd-spark"

  Import spark_plugin

  <Module spark_plugin>
  MetricsURL "http://127.0.0.1"
  MasterPort 8080
  WorkerPorts 8081 8082
  Applications "True"
  Master "http://127.0.0.1:8080"
  Cluster "Standalone"
  EnhancedMetrics "True"
  ExcludeMetrics "jvm.pools.Code-Cache.committed"
  </Module>
</Plugin>

The plugin can be configured to collect metrics from multiple instances in the following manner.

LoadPlugin python
<Plugin python>
  ModulePath "/usr/share/collectd/collectd-spark"

  Import spark_plugin

  <Module spark_plugin>
    MetricsURL "http://master"
    MasterPort 8080
    Applications "True"
    Master "http://master:8080"
    Cluster "Standalone"
    Dimension "name=MASTERTEST"
    IncludeMetrics "jvm.pools.Code-Cache.committed"
  </Module>

  <Module spark_plugin>
    MetricsURL "http://worker"
    WorkerPorts 8081 8082
    Applications "False"
    Master "http://master:8080"
    Dimension "name=WORKER1TEST"
    IncludeMetrics "jvm.pools.Code-Cache.committed"
  </Module>
</Plugin>

USAGE

Sample of built-in dashboard in SignalFx:


Metrics corresponding to the Metrics sink will contain the following dimension by default:

  • spark_process, set to either master or worker to differentiate master- and worker-specific metrics such as master.apps and worker.coresFree

Metrics at the application level (endpoint /api/v1/applications) will contain the following dimension by default:

  • cluster, set to value corresponding to key “Cluster” in configuration file

Additional details:

  • plugin is always set to apache_spark

METRICS

Note that metrics are only collected if MetricsURL is provided and/or Applications is set to “True” in 10-spark.conf (see the example configuration file for the associated required keys). See USAGE for details.

The following are default metrics captured and sent if Metrics Sink and Applications are enabled:

  • jvm.total.used
  • jvm.total.committed
  • jvm.heap.used
  • jvm.heap.committed
  • jvm.non-heap.used
  • jvm.non-heap.committed
  • jvm.MarkSweepCompact.count
  • jvm.MarkSweepCompact.time
  • worker.coresFree
  • worker.coresUsed
  • worker.executors
  • worker.memFree_MB
  • worker.memUsed_MB
  • master.aliveWorkers
  • master.apps
  • master.waitingApps
  • master.workers
  • spark.job.num_tasks
  • spark.job.num_active_tasks
  • spark.job.num_completed_tasks
  • spark.job.num_skipped_tasks
  • spark.job.num_failed_tasks
  • spark.job.num_active_stages
  • spark.job.num_completed_stages
  • spark.job.num_skipped_stages
  • spark.job.num_failed_stages
  • spark.stage.executor_run_time
  • spark.stage.input_bytes
  • spark.stage.input_records
  • spark.stage.output_bytes
  • spark.stage.output_records
  • spark.stage.memory_bytes_spilled
  • spark.stage.disk_bytes_spilled
  • spark.driver.memory_used
  • spark.driver.disk_used
  • spark.driver.total_input_bytes
  • spark.driver.total_shuffle_read
  • spark.driver.total_shuffle_write
  • spark.driver.max_memory
  • spark.executor.memory_used
  • spark.executor.disk_used
  • spark.executor.total_input_bytes
  • spark.executor.total_shuffle_read
  • spark.executor.total_shuffle_write
  • spark.executor.max_memory
  • spark.streaming.avg_input_rate
  • spark.streaming.num_total_completed_batches
  • spark.streaming.num_active_batches
  • spark.streaming.num_inactive_receivers
  • spark.streaming.num_received_records
  • spark.streaming.num_processed_records
  • spark.streaming.avg_processing_time
  • spark.streaming.avg_scheduling_delay
  • spark.streaming.avg_total_delay

The following are metrics that can be collected and sent if EnhancedMetrics is set to “True” in configurations (see CONFIGURATION for more information):

  • jvm.pools.Code-Cache.used
  • jvm.pools.Code-Cache.committed
  • jvm.pools.Compressed-Class-Space.used
  • jvm.pools.Compressed-Class-Space.committed
  • jvm.pools.Metaspace.used
  • jvm.pools.Metaspace.committed
  • jvm.pools.Eden-Space.used
  • jvm.pools.Eden-Space.committed
  • jvm.pools.Survivor-Space.used
  • jvm.pools.Survivor-Space.committed
  • jvm.pools.Tenured-Gen.used
  • jvm.pools.Tenured-Gen.committed
  • HiveExternalCatalog.fileCacheHits
  • HiveExternalCatalog.filesDiscovered
  • HiveExternalCatalog.hiveClientCalls
  • HiveExternalCatalog.parallelListingJobCount
  • HiveExternalCatalog.partitionsFetched
  • spark.stage.shuffle_read_bytes
  • spark.stage.shuffle_read_records
  • spark.stage.shuffle_write_bytes
  • spark.stage.shuffle_write_records
  • spark.driver.rdd_blocks
  • spark.driver.active_tasks
  • spark.driver.failed_tasks
  • spark.driver.completed_tasks
  • spark.driver.total_tasks
  • spark.driver.total_duration
  • spark.executor.rdd_blocks
  • spark.executor.active_tasks
  • spark.executor.failed_tasks
  • spark.executor.completed_tasks
  • spark.executor.total_tasks
  • spark.executor.total_duration
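Conceptually, the interplay of EnhancedMetrics, IncludeMetrics, and ExcludeMetrics resolves to a final metric set roughly as follows. This is a simplified sketch with abbreviated metric sets, not the plugin’s actual code:

```python
# Abbreviated stand-ins for the full default and enhanced lists above.
DEFAULT_METRICS = {"jvm.heap.used", "worker.coresFree", "master.apps"}
ENHANCED_METRICS = {"jvm.pools.Code-Cache.committed", "spark.driver.rdd_blocks"}

def metrics_to_report(enhanced=False, include=(), exclude=()):
    """Resolve the reported metric set from the configuration flags."""
    selected = set(DEFAULT_METRICS)
    if enhanced:
        selected |= ENHANCED_METRICS              # EnhancedMetrics "True"
    selected |= set(include) & ENHANCED_METRICS   # IncludeMetrics
    selected -= set(exclude)                      # ExcludeMetrics
    return selected

# Include a single enhanced metric without enabling all of them:
print(sorted(metrics_to_report(include=["spark.driver.rdd_blocks"])))
```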

Metric naming

Default metric names reported by the plugin follow the format <metric type>.spark.<name of metric> or <metric type>.<name of metric>.
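In other words, the collectd metric type is prefixed onto the metric name; a minimal sketch:

```python
def full_metric_name(metric_type, name):
    """Prefix the collectd metric type onto the plugin metric name."""
    return "{0}.{1}".format(metric_type, name)

print(full_metric_name("gauge", "jvm.heap.used"))        # gauge.jvm.heap.used
print(full_metric_name("counter", "spark.driver.disk_used"))
```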

Below is a list of all metrics.

Metric Name Brief Type
counter.HiveExternalCatalog.counter.HiveClientCalls Total number of client calls sent to Hive for query processing counter
counter.HiveExternalCatalog.fileCacheHits Total number of file level cache hits occurred counter
counter.HiveExternalCatalog.filesDiscovered Total number of files discovered counter
counter.HiveExternalCatalog.parallelListingJobCount Total number of Hive-specific jobs running in parallel counter
counter.HiveExternalCatalog.partitionsFetched Total number of partitions fetched counter
counter.spark.driver.completed_tasks Total number of completed tasks in driver mapped to a particular application counter
counter.spark.driver.disk_used Amount of disk used by driver mapped to a particular application counter
counter.spark.driver.failed_tasks Total number of failed tasks in driver mapped to a particular application counter
counter.spark.driver.memory_used Amount of memory used by driver mapped to a particular application counter
counter.spark.driver.total_duration Fraction of time spent by driver mapped to a particular application counter
counter.spark.driver.total_input_bytes Number of input bytes in driver mapped to a particular application counter
counter.spark.driver.total_shuffle_read Size read during a shuffle in driver mapped to a particular application counter
counter.spark.driver.total_shuffle_write Size written to during a shuffle in driver mapped to a particular application counter
counter.spark.driver.total_tasks Total number of tasks in driver mapped to a particular application counter
counter.spark.executor.completed_tasks Completed tasks across executors working for a particular application counter
counter.spark.executor.disk_used Amount of disk used across executors working for a particular application counter
counter.spark.executor.failed_tasks Failed tasks across executors working for a particular application counter
counter.spark.executor.memory_used Amount of memory used across executors working for a particular application counter
counter.spark.executor.total_duration Fraction of time spent across executors working for a particular application counter
counter.spark.executor.total_input_bytes Number of input bytes across executors working for a particular application counter
counter.spark.executor.total_shuffle_read Size read during a shuffle in a particular application’s executors counter
counter.spark.executor.total_shuffle_write Size written to during a shuffle in a particular application’s executors counter
counter.spark.executor.total_tasks Total tasks across executors working for a particular application counter
counter.spark.streaming.num_processed_records Number of processed records in a streaming application counter
counter.spark.streaming.num_received_records Number of received records in a streaming application counter
counter.spark.streaming.num_total_completed_batches Number of batches completed in a streaming application counter
gauge.jvm.MarkSweepCompact.count Garbage collection count gauge
gauge.jvm.MarkSweepCompact.time Garbage collection time gauge
gauge.jvm.heap.committed Amount of committed heap memory (in MB) gauge
gauge.jvm.heap.used Amount of used heap memory (in MB) gauge
gauge.jvm.non-heap.committed Amount of committed non-heap memory (in MB) gauge
gauge.jvm.non-heap.used Amount of used non-heap memory (in MB) gauge
gauge.jvm.pools.Code-Cache.committed Amount of memory committed for compilation and storage of native code gauge
gauge.jvm.pools.Code-Cache.used Amount of memory used to compile and store native code gauge
gauge.jvm.pools.Compressed-Class-Space.committed Amount of memory committed for compressing a class object gauge
gauge.jvm.pools.Compressed-Class-Space.used Amount of memory used to compress a class object gauge
gauge.jvm.pools.Eden-Space.committed Amount of memory committed for the initial allocation of objects gauge
gauge.jvm.pools.Eden-Space.used Amount of memory used for the initial allocation of objects gauge
gauge.jvm.pools.Metaspace.committed Amount of memory committed for storing classes and classloaders gauge
gauge.jvm.pools.Metaspace.used Amount of memory used to store classes and classloaders gauge
gauge.jvm.pools.Survivor-Space.committed Amount of memory committed specifically for objects that have survived GC of the Eden Space gauge
gauge.jvm.pools.Survivor-Space.used Amount of memory used for objects that have survived GC of the Eden Space gauge
gauge.jvm.pools.Tenured-Gen.committed Amount of memory committed to store objects that have lived in the survivor space for a given period of time gauge
gauge.jvm.pools.Tenured-Gen.used Amount of memory used for objects that have lived in the survivor space for a given period of time gauge
gauge.jvm.total.committed Amount of committed JVM memory (in MB) gauge
gauge.jvm.total.used Amount of used JVM memory (in MB) gauge
gauge.master.aliveWorkers Total functioning workers gauge
gauge.master.apps Total number of active applications in the spark cluster gauge
gauge.master.waitingApps Total number of waiting applications in the spark cluster gauge
gauge.master.workers Total number of workers in spark cluster gauge
gauge.spark.driver.active_tasks Total number of active tasks in driver mapped to a particular application gauge
gauge.spark.driver.max_memory Maximum memory used by driver mapped to a particular application gauge
gauge.spark.driver.rdd_blocks Number of RDD blocks in the driver mapped to a particular application gauge
gauge.spark.executor.active_tasks Total number of active tasks across all executors working for a particular application gauge
gauge.spark.executor.count Total number of executors performing for an active application in the spark cluster gauge
gauge.spark.executor.max_memory Max memory across all executors working for a particular application gauge
gauge.spark.executor.rdd_blocks Number of RDD blocks across all executors working for a particular application gauge
gauge.spark.job.num_active_stages Total number of active stages for an active application in the spark cluster gauge
gauge.spark.job.num_active_tasks Total number of active tasks for an active application in the spark cluster gauge
gauge.spark.job.num_completed_stages Total number of completed stages for an active application in the spark cluster gauge
gauge.spark.job.num_completed_tasks Total number of completed tasks for an active application in the spark cluster gauge
gauge.spark.job.num_failed_stages Total number of failed stages for an active application in the spark cluster gauge
gauge.spark.job.num_failed_tasks Total number of failed tasks for an active application in the spark cluster gauge
gauge.spark.job.num_skipped_stages Total number of skipped stages for an active application in the spark cluster gauge
gauge.spark.job.num_skipped_tasks Total number of skipped tasks for an active application in the spark cluster gauge
gauge.spark.job.num_tasks Total number of tasks for an active application in the spark cluster gauge
gauge.spark.num_active_stages Total number of active stages for an active application in the spark cluster gauge
gauge.spark.num_running_jobs Total number of running jobs for an active application in the spark cluster gauge
gauge.spark.stage.disk_bytes_spilled Actual size written to disk for an active application in the spark cluster gauge
gauge.spark.stage.executor_run_time Fraction of time spent by (and averaged across) executors for a particular application gauge
gauge.spark.stage.input_bytes Input size for a particular application gauge
gauge.spark.stage.input_records Input records received for a particular application gauge
gauge.spark.stage.memory_bytes_spilled Size spilled to disk from memory for an active application in the spark cluster gauge
gauge.spark.stage.output_bytes Output size for a particular application gauge
gauge.spark.stage.output_records Output records written to for a particular application gauge
gauge.spark.stage.shuffle_read_bytes Read size during shuffle phase for a particular application gauge
gauge.spark.stage.shuffle_read_records Number of records read during shuffle phase for a particular application gauge
gauge.spark.stage.shuffle_write_bytes Size written during shuffle phase for a particular application gauge
gauge.spark.stage.shuffle_write_records Number of records written to during shuffle phase for a particular application gauge
gauge.spark.streaming.avg_input_rate Average input rate of records across retained batches in a streaming application gauge
gauge.spark.streaming.avg_processing_time Average processing time in a streaming application gauge
gauge.spark.streaming.avg_scheduling_delay Average scheduling delay in a streaming application gauge
gauge.spark.streaming.avg_total_delay Average total delay in a streaming application gauge
gauge.spark.streaming.num_active_batches Number of active batches in a streaming application gauge
gauge.spark.streaming.num_inactive_receivers Number of inactive receivers in a streaming application gauge
gauge.worker.coresFree Total cores free for a particular worker process gauge
gauge.worker.coresUsed Total cores used by a particular worker process gauge
gauge.worker.executors Total number of executors for a particular worker process gauge
gauge.worker.memFree_MB Total memory free for a particular worker process gauge
gauge.worker.memUsed_MB Memory used by a particular worker process gauge

counter.HiveExternalCatalog.counter.HiveClientCalls

counter

The total number of client calls sent to Hive for query processing. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

counter.HiveExternalCatalog.fileCacheHits

counter

The total number of file level cache hits occurred. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

counter.HiveExternalCatalog.filesDiscovered

counter

The total number of files discovered. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

counter.HiveExternalCatalog.parallelListingJobCount

counter

The total number of Hive-specific jobs running in parallel. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

counter.HiveExternalCatalog.partitionsFetched

counter

The total number of partitions fetched in Hive. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

counter.spark.driver.completed_tasks

counter

Total number of completed tasks in driver mapped to a particular application. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

counter.spark.driver.disk_used

counter

Amount of disk used by driver mapped to a particular application (expressed in bytes). This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

counter.spark.driver.failed_tasks

counter

Total number of failed tasks in driver mapped to a particular application. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

counter.spark.driver.memory_used

counter

Amount of memory used by driver mapped to a particular application (expressed in bytes). This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

counter.spark.driver.total_duration

counter

Fraction of time spent by driver mapped to a particular application (expressed in ms/s). This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

counter.spark.driver.total_input_bytes

counter

The number of input bytes in driver mapped to a particular application (expressed in bytes). This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

counter.spark.driver.total_shuffle_read

counter

The size read during a shuffle in driver mapped to a particular application (expressed in bytes). This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

counter.spark.driver.total_shuffle_write

counter

The size written to during a shuffle in driver mapped to a particular application (expressed in bytes). This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

counter.spark.driver.total_tasks

counter

Total number of tasks in driver mapped to a particular application. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

counter.spark.executor.completed_tasks

counter

The number of completed tasks across executors working for a particular application. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

counter.spark.executor.disk_used

counter

Amount of disk used across executors working for a particular application (expressed in bytes). This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

counter.spark.executor.failed_tasks

counter

The number of failed tasks across executors working for a particular application. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

counter.spark.executor.memory_used

counter

Amount of memory used across executors working for a particular application (expressed in bytes). This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

counter.spark.executor.total_duration

counter

Fraction of time spent across executors working for a particular application (expressed in ms/s). This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

counter.spark.executor.total_input_bytes

counter

The number of input bytes across executors working for a particular application (expressed in bytes). This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

counter.spark.executor.total_shuffle_read

counter

The size read during a shuffle in a particular application’s executors (expressed in bytes) - aggregated across executors. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

counter.spark.executor.total_shuffle_write

counter

The size written to during a shuffle in a particular application’s executors (expressed in bytes) - aggregated across executors. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

counter.spark.executor.total_tasks

counter

The total number of tasks across executors working for a particular application. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

counter.spark.streaming.num_processed_records

counter

The number of processed records in a streaming application. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

counter.spark.streaming.num_received_records

counter

The number of received records in a streaming application. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

counter.spark.streaming.num_total_completed_batches

counter

The number of batches completed in a streaming application. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.jvm.MarkSweepCompact.count

gauge

The number of times garbage collection has occurred in the MarkSweep GC. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.jvm.MarkSweepCompact.time

gauge

The time taken by garbage collections in the MarkSweep GC. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.jvm.heap.committed

gauge

The total amount of heap memory (expressed in MB) committed. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.jvm.heap.used

gauge

The total amount of heap memory (expressed in MB) used. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.jvm.non-heap.committed

gauge

The total amount of non-heap memory (expressed in MB) committed. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.jvm.non-heap.used

gauge

The total amount of non-heap memory (expressed in MB) used. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.jvm.pools.Code-Cache.committed

gauge

The amount of memory (expressed in MB) committed for compilation and storage of native code. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.jvm.pools.Code-Cache.used

gauge

The amount of memory (expressed in MB) used to compile and store native code. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.jvm.pools.Compressed-Class-Space.committed

gauge

The amount of memory (expressed in MB) committed for compressing a class object. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.jvm.pools.Compressed-Class-Space.used

gauge

The amount of memory (expressed in MB) used to compress a class object. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.jvm.pools.Eden-Space.committed

gauge

The amount of memory (expressed in MB) committed for the initial allocation of objects. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.jvm.pools.Eden-Space.used

gauge

The amount of memory (expressed in MB) used for the initial allocation of objects. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.jvm.pools.Metaspace.committed

gauge

The amount of memory (expressed in MB) committed for storing classes and classloaders. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.jvm.pools.Metaspace.used

gauge

The amount of memory (expressed in MB) used to store classes and classloaders. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.jvm.pools.Survivor-Space.committed

gauge

Amount of memory (expressed in MB) committed specifically for objects that have survived garbage collection of the Eden Space. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.jvm.pools.Survivor-Space.used

gauge

Amount of memory (expressed in MB) used for objects that have survived garbage collection of the Eden Space. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.jvm.pools.Tenured-Gen.committed

gauge

Amount of memory (expressed in MB) committed to store objects that have lived in the survivor space for a given period of time. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.jvm.pools.Tenured-Gen.used

gauge

Amount of memory (expressed in MB) used for objects that have lived in the survivor space for a given period of time. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.jvm.total.committed

gauge

The total amount of JVM memory (expressed in MB) committed. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.jvm.total.used

gauge

The total amount of JVM memory (expressed in MB) used. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.master.aliveWorkers

gauge

The total number of functioning workers reporting to the master process. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.master.apps

gauge

The total number of applications still active/running in the spark cluster. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.master.waitingApps

gauge

The total number of applications in queue waiting to be executed in the spark cluster. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.master.workers

gauge

The total number of workers in spark cluster registered to the master process. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.spark.driver.active_tasks

gauge

Total number of active tasks in driver mapped to a particular application. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.driver.max_memory

gauge

Maximum memory used by driver mapped to a particular application (expressed in bytes). This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.driver.rdd_blocks

gauge

Number of RDD blocks in the driver mapped to a particular application. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.executor.active_tasks

gauge

Total number of active tasks across all executors working for a particular application. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.executor.count

gauge

Total number of executors performing for an active application in the spark cluster. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.executor.max_memory

gauge

Max memory across all executors working for a particular application (expressed in bytes). This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.executor.rdd_blocks

gauge

Number of RDD blocks across all executors working for a particular application. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.job.num_active_stages

gauge

The total number of active stages for an active application in the spark cluster - aggregated by jobs. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.job.num_active_tasks

gauge

The total number of active tasks for an active application in the spark cluster - aggregated by jobs. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.job.num_completed_stages

gauge

The total number of completed stages for an active application in the spark cluster - aggregated by jobs. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.job.num_completed_tasks

gauge

The total number of completed tasks for an active application in the spark cluster - aggregated by jobs. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.job.num_failed_stages

gauge

The total number of failed stages for an active application in the spark cluster - aggregated by jobs. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.job.num_failed_tasks

gauge

The total number of failed tasks for an active application in the spark cluster - aggregated by jobs. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.job.num_skipped_stages

gauge

The total number of skipped stages for an active application in the spark cluster - aggregated by jobs. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.job.num_skipped_tasks

gauge

The total number of skipped tasks for an active application in the spark cluster - aggregated by jobs. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.job.num_tasks

gauge

The total number of tasks for an active application in the spark cluster - aggregated by jobs. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.num_active_stages

gauge

The total number of active stages for an active application in the spark cluster. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.num_running_jobs

gauge

The total number of running jobs for an active application in the spark cluster. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.stage.disk_bytes_spilled

gauge

Actual size written to disk (expressed in bytes) - aggregated by active stages. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.stage.executor_run_time

gauge

Fraction of time spent by (and averaged across) executors for a particular application (expressed in ms/s) - aggregated by active stages. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.stage.input_bytes

gauge

Input size for a particular application (expressed as bytes) - aggregated by active stages. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.stage.input_records

gauge

Input records received for a particular application - aggregated by active stages. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.stage.memory_bytes_spilled

gauge

Size spilled to disk from memory (expressed in bytes) - aggregated by active stages. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.stage.output_bytes

gauge

Output size for a particular application (expressed as bytes) - aggregated by active stages. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.stage.output_records

gauge

Output records written to for a particular application - aggregated by active stages. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.stage.shuffle_read_bytes

gauge

Read size during shuffle phase for a particular application (expressed as bytes) - aggregated by active stages. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.stage.shuffle_read_records

gauge

Number of records read during shuffle phase for a particular application - aggregated by active stages. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.stage.shuffle_write_bytes

gauge

Size written during shuffle phase for a particular application (expressed as bytes) - aggregated by active stages. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.stage.shuffle_write_records

gauge

Number of records written to during shuffle phase for a particular application - aggregated by active stages. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.streaming.avg_input_rate

gauge

The average input rate of records across retained batches in a streaming application (expressed in ms/s). This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.streaming.avg_processing_time

gauge

The average processing time in a streaming application (expressed in ms). This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.streaming.avg_scheduling_delay

gauge

The average scheduling delay in a streaming application (expressed in ms). This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.streaming.avg_total_delay

gauge

The average total delay in a streaming application (expressed in ms). This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.streaming.num_active_batches

gauge

The number of active batches in a streaming application. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.spark.streaming.num_inactive_receivers

gauge

The number of receivers (e.g. Kafka, Flume, etc.) that have become inactive in a streaming application. This metric is reported with the dimension cluster to specify the cluster Spark is running on (e.g. Mesos)

gauge.worker.coresFree

gauge

The total number of cores free for a particular worker process. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.worker.coresUsed

gauge

The total number of cores used by a particular worker process. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.worker.executors

gauge

The total number of executors running jobs for a particular worker process. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.worker.memFree_MB

gauge

The total amount of memory (expressed in MB) available to a particular worker process. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.

gauge.worker.memUsed_MB

gauge

The amount of memory (expressed in MB) currently used by a particular worker process. This metric is reported with the dimension spark_process to indicate whether it corresponds to a master or worker process.
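The driver and executor metrics above are derived from per-executor records returned by Spark's Monitoring REST API (`/api/v1/applications/[app-id]/executors`). The sketch below illustrates that roll-up with a hypothetical sample payload rather than a live endpoint; the field names (`id`, `activeTasks`, `rddBlocks`, `maxMemory`) follow Spark's REST API, but the exact aggregation the plugin performs may differ.

```python
# Hypothetical sample of /api/v1/applications/<app-id>/executors;
# in practice the plugin fetches this JSON from the running application.
executors = [
    {"id": "driver", "activeTasks": 1, "rddBlocks": 4, "maxMemory": 434031206},
    {"id": "0", "activeTasks": 3, "rddBlocks": 10, "maxMemory": 434031206},
    {"id": "1", "activeTasks": 2, "rddBlocks": 6, "maxMemory": 434031206},
]

def executor_gauges(executors):
    # The driver appears in the same endpoint but is reported separately
    # (gauge.spark.driver.*), so the executor gauges exclude it here.
    workers = [e for e in executors if e["id"] != "driver"]
    return {
        "gauge.spark.executor.count": len(workers),
        "gauge.spark.executor.active_tasks": sum(e["activeTasks"] for e in workers),
        "gauge.spark.executor.rdd_blocks": sum(e["rddBlocks"] for e in workers),
        # Summed across executors in this sketch; "max memory across all
        # executors" could also be interpreted as a maximum.
        "gauge.spark.executor.max_memory": sum(e["maxMemory"] for e in workers),
    }

gauges = executor_gauges(executors)
```

Each value would then be dispatched to collectd as a gauge, with the `cluster` dimension attached.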