Apache Spark

DESCRIPTION

This integration primarily consists of the Smart Agent monitor collectd/spark. Below is an overview of that monitor.

Smart Agent Monitor
Collects metrics about a Spark cluster using the collectd Spark Python plugin. That plugin collects metrics from Spark cluster and instances by hitting endpoints specified in Spark's Monitoring and Instrumentation documentation under REST API and Metrics.

We currently only support the Standalone, Mesos, and Hadoop Yarn cluster modes, via HTTP endpoints.
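As an illustration of what those endpoints expose, here is a minimal sketch that queries the application REST API directly, roughly as the plugin does. The base URL, application ID, and helper names are hypothetical; the JSON field names follow Spark's ExecutorSummary schema.

```python
import json
from urllib.request import urlopen


def executor_task_totals(executors):
    """Aggregate completed/failed tasks across an application's executors,
    mirroring the counter.spark.executor.* metrics listed below."""
    return {
        "completed_tasks": sum(e.get("completedTasks", 0) for e in executors),
        "failed_tasks": sum(e.get("failedTasks", 0) for e in executors),
    }


def fetch_executors(base_url, app_id):
    """Fetch executor summaries from Spark's REST API, e.g.
    http://<driver-host>:4040/api/v1/applications/<app-id>/executors
    (base_url and app_id here are placeholders)."""
    with urlopen(f"{base_url}/api/v1/applications/{app_id}/executors") as resp:
        return json.load(resp)


# Usage against a running application UI (not executed here):
#   executors = fetch_executors("http://localhost:4040", "app-20190101000000-0000")
#   print(executor_task_totals(executors))
```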
You must specify distinct monitor configurations and discovery rules for master and worker processes. For the master configuration, set isMaster to true.

When running Spark on Apache Hadoop / Yarn, this integration can only report application metrics from the master node. Please use the collectd/hadoop monitor to report on the health of the cluster.
Example config

An example configuration for monitoring applications on Yarn:

```yaml
monitors:
  - type: collectd/spark
    host: 000.000.000.000
    port: 8088
    clusterType: Yarn
    isMaster: true
    collectApplicationMetrics: true
```
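For a Standalone cluster, where master and worker processes need distinct monitor configurations, the setup might look like the following sketch. The hostnames are placeholders, and the ports shown are Spark's conventional defaults for the standalone master (8080) and worker (8081) web UIs; adjust them to your deployment.

```yaml
monitors:
  # Master process: isMaster must be true
  - type: collectd/spark
    host: spark-master.example.com   # placeholder hostname
    port: 8080                       # typical standalone master web UI port
    clusterType: Standalone
    isMaster: true
  # Worker process: a separate monitor configuration
  - type: collectd/spark
    host: spark-worker.example.com   # placeholder hostname
    port: 8081                       # typical standalone worker web UI port
    clusterType: Standalone
    isMaster: false
```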
INSTALLATION

This integration is part of the SignalFx Smart Agent as the collectd/spark monitor. You should first deploy the Smart Agent to the same host as the service you want to monitor, and then continue with the configuration instructions below.
CONFIGURATION

To activate this monitor in the Smart Agent, add the following to your agent config:

```yaml
monitors:  # All monitor config goes under this key
  - type: collectd/spark
    ...  # Additional config
```
For a list of monitor options that are common to all monitors, see Common Configuration.
Config option | Required | Type | Description |
---|---|---|---|
pythonBinary | no | string | Path to a python binary that should be used to execute the Python code. If not set, a built-in runtime will be used. Can include arguments to the binary as well. |
host | yes | string | |
port | yes | integer | |
isMaster | no | bool | Set to true when monitoring a master Spark node (default: false) |
clusterType | yes | string | Should be one of Standalone, Mesos, or Yarn. Cluster metrics will not be collected on Yarn; use the collectd/hadoop monitor to gain insight into your cluster's health. |
collectApplicationMetrics | no | bool | (default: false) |
enhancedMetrics | no | bool | (default: false) |
METRICS
Metric Name | Description | Type |
---|---|---|
counter.HiveExternalCatalog.fileCacheHits | Total number of file level cache hits occurred | counter |
counter.HiveExternalCatalog.filesDiscovered | Total number of files discovered | counter |
counter.HiveExternalCatalog.hiveClientCalls | Total number of client calls sent to Hive for query processing | counter |
counter.HiveExternalCatalog.parallelListingJobCount | Total number of Hive-specific jobs running in parallel | counter |
counter.HiveExternalCatalog.partitionsFetched | Total number of partitions fetched | counter |
counter.spark.driver.completed_tasks | Total number of completed tasks in driver mapped to a particular application | counter |
counter.spark.driver.disk_used | Amount of disk used by driver mapped to a particular application | counter |
counter.spark.driver.failed_tasks | Total number of failed tasks in driver mapped to a particular application | counter |
counter.spark.driver.memory_used | Amount of memory used by driver mapped to a particular application | counter |
counter.spark.driver.total_duration | Fraction of time spent by driver mapped to a particular application | counter |
counter.spark.driver.total_input_bytes | Number of input bytes in driver mapped to a particular application | counter |
counter.spark.driver.total_shuffle_read | Size read during a shuffle in driver mapped to a particular application | counter |
counter.spark.driver.total_shuffle_write | Size written to during a shuffle in driver mapped to a particular application | counter |
counter.spark.driver.total_tasks | Total number of tasks in driver mapped to a particular application | counter |
counter.spark.executor.completed_tasks | Completed tasks across executors working for a particular application | counter |
counter.spark.executor.disk_used | Amount of disk used across executors working for a particular application | counter |
counter.spark.executor.failed_tasks | Failed tasks across executors working for a particular application | counter |
counter.spark.executor.memory_used | Amount of memory used across executors working for a particular application | counter |
counter.spark.executor.total_duration | Fraction of time spent across executors working for a particular application | counter |
counter.spark.executor.total_input_bytes | Number of input bytes across executors working for a particular application | counter |
counter.spark.executor.total_shuffle_read | Size read during a shuffle in a particular application's executors | counter |
counter.spark.executor.total_shuffle_write | Size written to during a shuffle in a particular application's executors | counter |
counter.spark.executor.total_tasks | Total tasks across executors working for a particular application | counter |
counter.spark.streaming.num_processed_records | Number of processed records in a streaming application | counter |
counter.spark.streaming.num_received_records | Number of received records in a streaming application | counter |
counter.spark.streaming.num_total_completed_batches | Number of batches completed in a streaming application | counter |
gauge.jvm.MarkSweepCompact.count | Garbage collection count | gauge |
gauge.jvm.MarkSweepCompact.time | Garbage collection time | gauge |
gauge.jvm.heap.committed | Amount of committed heap memory (in MB) | gauge |
gauge.jvm.heap.used | Amount of used heap memory (in MB) | gauge |
gauge.jvm.non-heap.committed | Amount of committed non-heap memory (in MB) | gauge |
gauge.jvm.non-heap.used | Amount of used non-heap memory (in MB) | gauge |
gauge.jvm.pools.Code-Cache.committed | Amount of memory committed for compilation and storage of native code | gauge |
gauge.jvm.pools.Code-Cache.used | Amount of memory used to compile and store native code | gauge |
gauge.jvm.pools.Compressed-Class-Space.committed | Amount of memory committed for compressing a class object | gauge |
gauge.jvm.pools.Compressed-Class-Space.used | Amount of memory used to compress a class object | gauge |
gauge.jvm.pools.Eden-Space.committed | Amount of memory committed for the initial allocation of objects | gauge |
gauge.jvm.pools.Eden-Space.used | Amount of memory used for the initial allocation of objects | gauge |
gauge.jvm.pools.Metaspace.committed | Amount of memory committed for storing classes and classloaders | gauge |
gauge.jvm.pools.Metaspace.used | Amount of memory used to store classes and classloaders | gauge |
gauge.jvm.pools.Survivor-Space.committed | Amount of memory committed specifically for objects that have survived GC of the Eden Space | gauge |
gauge.jvm.pools.Survivor-Space.used | Amount of memory used for objects that have survived GC of the Eden Space | gauge |
gauge.jvm.pools.Tenured-Gen.committed | Amount of memory committed to store objects that have lived in the survivor space for a given period of time | gauge |
gauge.jvm.pools.Tenured-Gen.used | Amount of memory used for objects that have lived in the survivor space for a given period of time | gauge |
gauge.jvm.total.committed | Amount of committed JVM memory (in MB) | gauge |
gauge.jvm.total.used | Amount of used JVM memory (in MB) | gauge |
gauge.master.aliveWorkers | Total functioning workers | gauge |
gauge.master.apps | Total number of active applications in the spark cluster | gauge |
gauge.master.waitingApps | Total number of waiting applications in the spark cluster | gauge |
gauge.master.workers | Total number of workers in spark cluster | gauge |
gauge.spark.driver.active_tasks | Total number of active tasks in driver mapped to a particular application | gauge |
gauge.spark.driver.max_memory | Maximum memory used by driver mapped to a particular application | gauge |
gauge.spark.driver.rdd_blocks | Number of RDD blocks in the driver mapped to a particular application | gauge |
gauge.spark.executor.active_tasks | Total number of active tasks across all executors working for a particular application | gauge |
gauge.spark.executor.count | Total number of executors performing for an active application in the spark cluster | gauge |
gauge.spark.executor.max_memory | Max memory across all executors working for a particular application | gauge |
gauge.spark.executor.rdd_blocks | Number of RDD blocks across all executors working for a particular application | gauge |
gauge.spark.job.num_active_stages | Total number of active stages for an active application in the spark cluster | gauge |
gauge.spark.job.num_active_tasks | Total number of active tasks for an active application in the spark cluster | gauge |
gauge.spark.job.num_completed_stages | Total number of completed stages for an active application in the spark cluster | gauge |
gauge.spark.job.num_completed_tasks | Total number of completed tasks for an active application in the spark cluster | gauge |
gauge.spark.job.num_failed_stages | Total number of failed stages for an active application in the spark cluster | gauge |
gauge.spark.job.num_failed_tasks | Total number of failed tasks for an active application in the spark cluster | gauge |
gauge.spark.job.num_skipped_stages | Total number of skipped stages for an active application in the spark cluster | gauge |
gauge.spark.job.num_skipped_tasks | Total number of skipped tasks for an active application in the spark cluster | gauge |
gauge.spark.job.num_tasks | Total number of tasks for an active application in the spark cluster | gauge |
gauge.spark.num_active_stages | Total number of active stages for an active application in the spark cluster | gauge |
gauge.spark.num_running_jobs | Total number of running jobs for an active application in the spark cluster | gauge |
gauge.spark.stage.disk_bytes_spilled | Actual size written to disk for an active application in the spark cluster | gauge |
gauge.spark.stage.executor_run_time | Fraction of time spent by (and averaged across) executors for a particular application | gauge |
gauge.spark.stage.input_bytes | Input size for a particular application | gauge |
gauge.spark.stage.input_records | Input records received for a particular application | gauge |
gauge.spark.stage.memory_bytes_spilled | Size spilled to disk from memory for an active application in the spark cluster | gauge |
gauge.spark.stage.output_bytes | Output size for a particular application | gauge |
gauge.spark.stage.output_records | Output records written to for a particular application | gauge |
gauge.spark.stage.shuffle_read_bytes | Read size during shuffle phase for a particular application | gauge |
gauge.spark.stage.shuffle_read_records | Number of records read during shuffle phase for a particular application | gauge |
gauge.spark.stage.shuffle_write_bytes | Size written during shuffle phase for a particular application | gauge |
gauge.spark.stage.shuffle_write_records | Number of records written to during shuffle phase for a particular application | gauge |
gauge.spark.streaming.avg_input_rate | Average input rate of records across retained batches in a streaming application | gauge |
gauge.spark.streaming.avg_processing_time | Average processing time in a streaming application | gauge |
gauge.spark.streaming.avg_scheduling_delay | Average scheduling delay in a streaming application | gauge |
gauge.spark.streaming.avg_total_delay | Average total delay in a streaming application | gauge |
gauge.spark.streaming.num_active_batches | Number of active batches in a streaming application | gauge |
gauge.spark.streaming.num_inactive_receivers | Number of inactive receivers in a streaming application | gauge |
gauge.worker.coresFree | Total cores free for a particular worker process | gauge |
gauge.worker.coresUsed | Total cores used by a particular worker process | gauge |
gauge.worker.executors | Total number of executors for a particular worker process | gauge |
gauge.worker.memFree_MB | Total memory free for a particular worker process | gauge |
gauge.worker.memUsed_MB | Memory used by a particular worker process | gauge |
Non-default metrics (version 4.7.0+)

The following information applies to agent version 4.7.0+ with enableBuiltInFiltering: true set at the top level of the agent config.

To emit metrics that are not emitted by default, add them to the generic monitor-level extraMetrics config option. Metrics that are derived from specific configuration options and do not appear in the above list do not need to be added to extraMetrics.

To see a list of the metrics that will be emitted, run agent-status monitors after configuring this monitor in a running agent instance.
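As a sketch, enabling extra metrics might look like the following; the hostname is a placeholder, and which metrics are non-default for your agent version should be confirmed with agent-status monitors.

```yaml
monitors:
  - type: collectd/spark
    host: spark-master.example.com   # placeholder hostname
    port: 8080
    clusterType: Standalone
    isMaster: true
    extraMetrics:
      - gauge.spark.streaming.num_active_batches
      - counter.spark.streaming.num_processed_records
```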
Legacy non-default metrics (version < 4.7.0)

The following information only applies to agent versions older than 4.7.0. If you have a newer agent and have set enableBuiltInFiltering: true at the top level of your agent config, see the section above. See the upgrade instructions in Old-style inclusion list filtering.

If your agent's top-level metricsToExclude config option references whitelist.json, and you want to emit metrics that are not in that allow list, add an item to the top-level metricsToInclude config option to override the allow list (see Inclusion filtering). Alternatively, copy whitelist.json, modify it, and reference the modified copy in metricsToExclude.
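A sketch of the first approach, assuming the agent's metric-filter syntax with metricNames and monitorType fields; the metric names here are illustrative:

```yaml
metricsToInclude:
  - monitorType: collectd/spark
    metricNames:
      - gauge.spark.streaming.avg_total_delay
      - counter.spark.streaming.num_received_records
```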