Apache Spark

Description

The Splunk Distribution of OpenTelemetry Collector provides this integration as the Apache Spark monitor type (collectd/spark) for the Smart Agent receiver.

The integration monitors Apache Spark clusters, but does not support fetching metrics from Spark Structured Streaming.

Note

This monitor is not available on Windows because collectd plugins are supported only on Linux and Kubernetes.

For the following cluster modes, the integration only supports HTTP endpoints:

  • Standalone

  • Mesos

  • Hadoop YARN

You need to define separate monitor configurations and discovery rules for the master and worker processes. For the master configuration, set isMaster to true.
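As a sketch of what separate master and worker configurations might look like on a standalone cluster, the following assumes both processes run on the same host and expose their web UIs on Spark's default ports (8080 for the master, 8081 for the worker). The host and port values are assumptions; replace them with the HTTP endpoints of your own nodes.

```yaml
receivers:
  # Master process: isMaster must be true.
  smartagent/collectd_spark_master:
    type: collectd/spark
    host: localhost          # assumption: master runs on this host
    port: 8080               # assumption: default Spark master web UI port
    clusterType: Standalone
    isMaster: true
  # Worker process: isMaster defaults to false, so it is omitted here.
  smartagent/collectd_spark_worker:
    type: collectd/spark
    host: localhost          # assumption: worker runs on this host
    port: 8081               # assumption: default Spark worker web UI port
    clusterType: Standalone
```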

When you run Apache Spark on Hadoop YARN, this integration can only report application metrics from the master node.
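For a Hadoop YARN deployment, a master-only configuration along the following lines could be used. This is a minimal sketch: the hostname and port are placeholders for the HTTP endpoint of your Spark master, and it assumes that collectApplicationMetrics (default false) is the option that turns on the application metrics described above.

```yaml
receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    host: spark-master.example.com   # placeholder hostname
    port: 8080                       # placeholder: HTTP port of the Spark master
    clusterType: Yarn                # cluster-level metrics aren't collected for Yarn
    isMaster: true                   # application metrics are reported only from the master
    collectApplicationMetrics: true  # assumption: enables application metrics (default: false)
```

Because cluster metrics aren't collected for Yarn, use the collectd/hadoop monitor alongside this receiver to track cluster health.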

Benefits

After you configure the integration, you can access these features:

  • View metrics. You can create your own custom dashboards, and most monitors provide built-in dashboards as well. For information about dashboards, see View dashboards in Observability Cloud.

  • View a data-driven visualization of the physical servers, virtual machines, AWS instances, and other resources in your environment that are visible to Infrastructure Monitoring. For information about navigators, see Splunk Infrastructure Monitoring navigators.

  • Access the Metric Finder and search for metrics sent by the monitor. For information, see Use the Metric Finder.

Installation

Follow these steps to deploy this integration:

  1. Deploy the Splunk Distribution of OpenTelemetry Collector to your host or container platform.

  2. Configure the monitor, as described in the Configuration section.

  3. Restart the Splunk Distribution of OpenTelemetry Collector.

Configuration

To use this Smart Agent monitor with the Collector, include the Smart Agent receiver in your configuration file and add it to a service pipeline. The Smart Agent receiver is fully supported only on x86_64/amd64 platforms.

See the examples below for more details.

To activate this monitor in the Splunk Distribution of OpenTelemetry Collector, add one of the following to your agent configuration:

receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    ...  # Additional config

receivers:
  smartagent/collectd_spark_worker:
    type: collectd/spark
    ...  # Additional config

To complete the integration, include the monitor in a metrics pipeline. Add the monitor item to the service/pipelines/metrics/receivers section of your configuration file. For example:

service:
  pipelines:
    metrics:
      receivers: [smartagent/collectd_spark_master]

service:
  pipelines:
    metrics:
      receivers: [smartagent/collectd_spark_worker]
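If the master and worker run on the same host, both receivers can share one metrics pipeline, because a YAML file can hold only one service key. The sketch below assumes an exporter named signalfx is already defined in your configuration, as in the Collector's default agent configuration; adjust the exporter name to match your file.

```yaml
service:
  pipelines:
    metrics:
      # Both Spark receivers listed in the same metrics pipeline.
      receivers: [smartagent/collectd_spark_master, smartagent/collectd_spark_worker]
      exporters: [signalfx]  # assumption: exporter defined elsewhere in the file
```

In an existing configuration, append the receiver names to the receivers list of your current metrics pipeline rather than replacing the list, so that your other receivers keep sending data.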

Note: The receiver names collectd_spark_master and collectd_spark_worker are for identification purposes only and don't affect functionality. You can use either name in your configuration, but you need to define separate monitor configurations and discovery rules for the master and worker processes. For the master configuration, see the isMaster field in the Configuration settings section.
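Because the part of the receiver name after smartagent/ only identifies the receiver, you could, for example, monitor the masters of two standalone clusters from one Collector by giving each receiver its own name. The receiver names and hostnames below are hypothetical.

```yaml
receivers:
  # Hypothetical names: any unique suffix after "smartagent/" works.
  smartagent/spark_master_cluster_a:
    type: collectd/spark
    host: spark-a.example.com  # hypothetical host
    port: 8080                 # assumption: default Spark master web UI port
    clusterType: Standalone
    isMaster: true
  smartagent/spark_master_cluster_b:
    type: collectd/spark
    host: spark-b.example.com  # hypothetical host
    port: 8080                 # assumption: default Spark master web UI port
    clusterType: Standalone
    isMaster: true
```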

Configuration settings

The following table shows the configuration options for this monitor:

| Option | Required | Type | Description |
|---|---|---|---|
| pythonBinary | no | string | The path to a Python binary that executes the Python code. If you don't set this option, the system uses a built-in runtime. You can also include arguments to the binary. |
| host | yes | string | |
| port | yes | integer | |
| isMaster | no | bool | Set this option to true when you want to monitor a master Spark node. The default is false. |
| clusterType | yes | string | The type of cluster you're monitoring. The allowed values are Standalone, Mesos, or Yarn. The system doesn't collect cluster metrics for Yarn; use the collectd/hadoop monitor to gain insight into your cluster's health. |
| collectApplicationMetrics | no | bool | The default is false. |
| enhancedMetrics | no | bool | The default is false. |
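The following sketch sets every option in the table for a standalone master. The values are illustrative assumptions, not recommendations; pythonBinary is commented out because the built-in runtime is used when it's unset, and the path shown is hypothetical.

```yaml
receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    # pythonBinary: /usr/bin/python3  # hypothetical path; omit to use the built-in runtime
    host: localhost                   # required
    port: 8080                        # required; assumption: default master web UI port
    isMaster: true                    # default: false
    clusterType: Standalone           # required: Standalone, Mesos, or Yarn
    collectApplicationMetrics: true   # default: false
    enhancedMetrics: false            # default: false
```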

Metrics

These are the metrics available for this integration:

Get help

If you are not able to see your data in Splunk Observability Cloud, try these tips:

To learn about even more support options, see Splunk Customer Success.