
Apache Spark

DESCRIPTION

This integration primarily consists of the Smart Agent monitor collectd/spark. Below is an overview of that monitor.

Smart Agent Monitor

Collects metrics about a Spark cluster using the collectd Spark Python plugin. The plugin collects metrics from a Spark cluster and its instances by hitting endpoints specified in Spark's Monitoring and Instrumentation documentation under REST API and Metrics.
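For illustration, Spark's REST API endpoint /api/v1/applications (served on the driver's web UI port, 4040 by default) returns JSON like the hypothetical sample below. This is a minimal sketch of picking out running applications from such a response, roughly the first step the plugin takes before fetching per-application metrics; the payload is a made-up sample, not live output:

```python
import json

# Hypothetical sample of the JSON returned by
# http://<driver-host>:4040/api/v1/applications
# (see Spark's Monitoring and Instrumentation docs).
sample = '''
[
  {"id": "app-20240101120000-0001",
   "name": "my-spark-job",
   "attempts": [{"completed": false}]}
]
'''

apps = json.loads(sample)

# Keep the IDs of applications whose latest attempt has not completed.
running = [a["id"] for a in apps
           if any(not att["completed"] for att in a["attempts"])]
print(running)
```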

Only the Standalone, Mesos, and Hadoop YARN cluster modes are currently supported, via HTTP endpoints.

You must specify distinct monitor configurations and discovery rules for the master and worker processes. For the master configuration, set isMaster to true.

When running Spark on Apache Hadoop / YARN, this integration can report only application metrics from the master node. Use the collectd/hadoop monitor to report on the health of the cluster.
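For example, a Standalone cluster might use two monitor configurations, one per process type. This is a sketch: the hostnames are placeholders, and 8080/8081 are assumed to be the default Standalone master and worker web UI ports.

```yaml
monitors:
  # Master process; note isMaster: true
  - type: collectd/spark
    host: spark-master.example.com   # placeholder hostname
    port: 8080                       # default Standalone master web UI port
    clusterType: Standalone
    isMaster: true
  # Worker processes get their own monitor configuration
  - type: collectd/spark
    host: spark-worker.example.com   # placeholder hostname
    port: 8081                       # default Standalone worker web UI port
    clusterType: Standalone
    isMaster: false
```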

Example config:

An example configuration for monitoring applications on YARN:

monitors:
  - type: collectd/spark
    host: 000.000.000.000
    port: 8088
    clusterType: Yarn
    isMaster: true
    collectApplicationMetrics: true

INSTALLATION

This integration is part of the SignalFx Smart Agent as the collectd/spark monitor. You should first deploy the Smart Agent to the same host as the service you want to monitor, and then continue with the configuration instructions below.

CONFIGURATION

To activate this monitor in the Smart Agent, add the following to your agent config:

monitors:  # All monitor config goes under this key
 - type: collectd/spark
   ...  # Additional config

For a list of monitor options that are common to all monitors, see Common Configuration.

Config option Required Type Description
pythonBinary no string Path to a python binary that should be used to execute the Python code. If not set, a built-in runtime will be used. Can include arguments to the binary as well.
host yes string
port yes integer
isMaster no bool Set to true when monitoring a master Spark node (default: false)
clusterType yes string One of Standalone, Mesos, or Yarn. Cluster metrics are not collected on Yarn; use the collectd/hadoop monitor to gain insight into your cluster's health.
collectApplicationMetrics no bool (default: false)
enhancedMetrics no bool (default: false)
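A sketch combining the optional settings above; the host and pythonBinary path are placeholders, and pythonBinary can simply be omitted to use the agent's built-in runtime:

```yaml
monitors:
  - type: collectd/spark
    host: 127.0.0.1
    port: 8080
    clusterType: Standalone
    isMaster: true
    collectApplicationMetrics: true  # also gather per-application metrics
    enhancedMetrics: true            # emit additional, non-default metrics
    pythonBinary: /usr/bin/python3   # placeholder path; may include arguments
```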

USAGE

Sample of the built-in dashboard in SignalFx:

[Dashboard images: Spark cluster (top and bottom)]

METRICS

Metric Name Description Type
counter.HiveExternalCatalog.fileCacheHits Total number of file level cache hits occurred counter
counter.HiveExternalCatalog.filesDiscovered Total number of files discovered counter
counter.HiveExternalCatalog.hiveClientCalls Total number of client calls sent to Hive for query processing counter
counter.HiveExternalCatalog.parallelListingJobCount Total number of Hive-specific jobs running in parallel counter
counter.HiveExternalCatalog.partitionsFetched Total number of partitions fetched counter
counter.spark.driver.completed_tasks Total number of completed tasks in driver mapped to a particular application counter
counter.spark.driver.disk_used Amount of disk used by driver mapped to a particular application counter
counter.spark.driver.failed_tasks Total number of failed tasks in driver mapped to a particular application counter
counter.spark.driver.memory_used Amount of memory used by driver mapped to a particular application counter
counter.spark.driver.total_duration Fraction of time spent by driver mapped to a particular application counter
counter.spark.driver.total_input_bytes Number of input bytes in driver mapped to a particular application counter
counter.spark.driver.total_shuffle_read Size read during a shuffle in driver mapped to a particular application counter
counter.spark.driver.total_shuffle_write Size written to during a shuffle in driver mapped to a particular application counter
counter.spark.driver.total_tasks Total number of tasks in driver mapped to a particular application counter
counter.spark.executor.completed_tasks Completed tasks across executors working for a particular application counter
counter.spark.executor.disk_used Amount of disk used across executors working for a particular application counter
counter.spark.executor.failed_tasks Failed tasks across executors working for a particular application counter
counter.spark.executor.memory_used Amount of memory used across executors working for a particular application counter
counter.spark.executor.total_duration Fraction of time spent across executors working for a particular application counter
counter.spark.executor.total_input_bytes Number of input bytes across executors working for a particular application counter
counter.spark.executor.total_shuffle_read Size read during a shuffle in a particular application's executors counter
counter.spark.executor.total_shuffle_write Size written to during a shuffle in a particular application's executors counter
counter.spark.executor.total_tasks Total tasks across executors working for a particular application counter
counter.spark.streaming.num_processed_records Number of processed records in a streaming application counter
counter.spark.streaming.num_received_records Number of received records in a streaming application counter
counter.spark.streaming.num_total_completed_batches Number of batches completed in a streaming application counter
gauge.jvm.MarkSweepCompact.count Garbage collection count gauge
gauge.jvm.MarkSweepCompact.time Garbage collection time gauge
gauge.jvm.heap.committed Amount of committed heap memory (in MB) gauge
gauge.jvm.heap.used Amount of used heap memory (in MB) gauge
gauge.jvm.non-heap.committed Amount of committed non-heap memory (in MB) gauge
gauge.jvm.non-heap.used Amount of used non-heap memory (in MB) gauge
gauge.jvm.pools.Code-Cache.committed Amount of memory committed for compilation and storage of native code gauge
gauge.jvm.pools.Code-Cache.used Amount of memory used to compile and store native code gauge
gauge.jvm.pools.Compressed-Class-Space.committed Amount of memory committed for compressing a class object gauge
gauge.jvm.pools.Compressed-Class-Space.used Amount of memory used to compress a class object gauge
gauge.jvm.pools.Eden-Space.committed Amount of memory committed for the initial allocation of objects gauge
gauge.jvm.pools.Eden-Space.used Amount of memory used for the initial allocation of objects gauge
gauge.jvm.pools.Metaspace.committed Amount of memory committed for storing classes and classloaders gauge
gauge.jvm.pools.Metaspace.used Amount of memory used to store classes and classloaders gauge
gauge.jvm.pools.Survivor-Space.committed Amount of memory committed specifically for objects that have survived GC of the Eden Space gauge
gauge.jvm.pools.Survivor-Space.used Amount of memory used for objects that have survived GC of the Eden Space gauge
gauge.jvm.pools.Tenured-Gen.committed Amount of memory committed to store objects that have lived in the survivor space for a given period of time gauge
gauge.jvm.pools.Tenured-Gen.used Amount of memory used for objects that have lived in the survivor space for a given period of time gauge
gauge.jvm.total.committed Amount of committed JVM memory (in MB) gauge
gauge.jvm.total.used Amount of used JVM memory (in MB) gauge
gauge.master.aliveWorkers Total functioning workers gauge
gauge.master.apps Total number of active applications in the spark cluster gauge
gauge.master.waitingApps Total number of waiting applications in the spark cluster gauge
gauge.master.workers Total number of workers in spark cluster gauge
gauge.spark.driver.active_tasks Total number of active tasks in driver mapped to a particular application gauge
gauge.spark.driver.max_memory Maximum memory used by driver mapped to a particular application gauge
gauge.spark.driver.rdd_blocks Number of RDD blocks in the driver mapped to a particular application gauge
gauge.spark.executor.active_tasks Total number of active tasks across all executors working for a particular application gauge
gauge.spark.executor.count Total number of executors performing for an active application in the spark cluster gauge
gauge.spark.executor.max_memory Max memory across all executors working for a particular application gauge
gauge.spark.executor.rdd_blocks Number of RDD blocks across all executors working for a particular application gauge
gauge.spark.job.num_active_stages Total number of active stages for an active application in the spark cluster gauge
gauge.spark.job.num_active_tasks Total number of active tasks for an active application in the spark cluster gauge
gauge.spark.job.num_completed_stages Total number of completed stages for an active application in the spark cluster gauge
gauge.spark.job.num_completed_tasks Total number of completed tasks for an active application in the spark cluster gauge
gauge.spark.job.num_failed_stages Total number of failed stages for an active application in the spark cluster gauge
gauge.spark.job.num_failed_tasks Total number of failed tasks for an active application in the spark cluster gauge
gauge.spark.job.num_skipped_stages Total number of skipped stages for an active application in the spark cluster gauge
gauge.spark.job.num_skipped_tasks Total number of skipped tasks for an active application in the spark cluster gauge
gauge.spark.job.num_tasks Total number of tasks for an active application in the spark cluster gauge
gauge.spark.num_active_stages Total number of active stages for an active application in the spark cluster gauge
gauge.spark.num_running_jobs Total number of running jobs for an active application in the spark cluster gauge
gauge.spark.stage.disk_bytes_spilled Actual size written to disk for an active application in the spark cluster gauge
gauge.spark.stage.executor_run_time Fraction of time spent by (and averaged across) executors for a particular application gauge
gauge.spark.stage.input_bytes Input size for a particular application gauge
gauge.spark.stage.input_records Input records received for a particular application gauge
gauge.spark.stage.memory_bytes_spilled Size spilled to disk from memory for an active application in the spark cluster gauge
gauge.spark.stage.output_bytes Output size for a particular application gauge
gauge.spark.stage.output_records Output records written to for a particular application gauge
gauge.spark.stage.shuffle_read_bytes Read size during shuffle phase for a particular application gauge
gauge.spark.stage.shuffle_read_records Number of records read during shuffle phase for a particular application gauge
gauge.spark.stage.shuffle_write_bytes Size written during shuffle phase for a particular application gauge
gauge.spark.stage.shuffle_write_records Number of records written to during shuffle phase for a particular application gauge
gauge.spark.streaming.avg_input_rate Average input rate of records across retained batches in a streaming application gauge
gauge.spark.streaming.avg_processing_time Average processing time in a streaming application gauge
gauge.spark.streaming.avg_scheduling_delay Average scheduling delay in a streaming application gauge
gauge.spark.streaming.avg_total_delay Average total delay in a streaming application gauge
gauge.spark.streaming.num_active_batches Number of active batches in a streaming application gauge
gauge.spark.streaming.num_inactive_receivers Number of inactive receivers in a streaming application gauge
gauge.worker.coresFree Total cores free for a particular worker process gauge
gauge.worker.coresUsed Total cores used by a particular worker process gauge
gauge.worker.executors Total number of executors for a particular worker process gauge
gauge.worker.memFree_MB Total memory free for a particular worker process gauge
gauge.worker.memUsed_MB Memory used by a particular worker process gauge


Metrics that are categorized as container/host are emitted by default.

Non-default metrics (version 4.7.0+)

The following information applies to agent version 4.7.0 or later with enableBuiltInFiltering: true set at the top level of the agent config.

To emit metrics that are not emitted by default, add them to the generic monitor-level extraMetrics config option. Metrics derived from specific configuration options that do not appear in the list above do not need to be added to extraMetrics.
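For instance, to emit a few of the non-default metrics listed above, you might add them under extraMetrics. This is a sketch; the glob pattern assumes extraMetrics accepts wildcards, as the agent's other metric filter options do:

```yaml
monitors:
  - type: collectd/spark
    host: 127.0.0.1
    port: 8080
    clusterType: Standalone
    isMaster: true
    extraMetrics:
      - gauge.jvm.heap.used          # a single named metric
      - "counter.spark.streaming.*"  # glob covering the streaming counters
```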

To see a list of metrics that will be emitted, run agent-status monitors after configuring this monitor in a running agent instance.

Legacy non-default metrics (version &lt; 4.7.0)

The following information applies only to agent versions older than 4.7.0. If you have a newer agent and have set enableBuiltInFiltering: true at the top level of your agent config, see the section above. See upgrade instructions in Old-style whitelist filtering.

If your agent's top-level metricsToExclude config option references whitelist.json and you want to emit metrics that are not in that whitelist, add an item to the top-level metricsToInclude config option to override the whitelist (see Inclusion filtering). Alternatively, copy whitelist.json, modify it, and reference the modified copy in metricsToExclude.
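The override described above might look like the following sketch, assuming the metricNames key used by the agent's filtering options; the metric chosen here is just an example of one outside the whitelist:

```yaml
metricsToInclude:
  - metricNames:
      - gauge.spark.stage.input_records  # example metric to stop filtering out
```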