Apache Spark

The Splunk Distribution of OpenTelemetry Collector uses the Smart Agent receiver with the Apache Spark monitor type to monitor Apache Spark clusters. It does not support fetching metrics from Spark Structured Streaming.

For the following cluster modes, the integration only supports HTTP endpoints:

  • Standalone

  • Mesos

  • Hadoop YARN

You need separate monitor configurations and discovery rules for the master and worker processes. In the master configuration, set isMaster to true. When you run Apache Spark on Hadoop YARN, this integration can report application metrics only from the master node.

This integration is only available on Kubernetes and Linux.

Benefits

After you configure the integration, you can access these features:

  • View metrics. You can create your own custom dashboards, and most monitors provide built-in dashboards as well. For information about dashboards, see View dashboards in Observability Cloud.

  • View a data-driven visualization of the physical servers, virtual machines, AWS instances, and other resources in your environment that are visible to Infrastructure Monitoring. For information about navigators, see Splunk Infrastructure Monitoring navigators.

  • Access the Metric Finder and search for metrics sent by the monitor. For information, see Use the Metric Finder.

Installation

Follow these steps to deploy this integration:

  1. Deploy the Splunk Distribution of OpenTelemetry Collector to your host or container platform.

  2. Configure the monitor, as described in the Configuration section.

  3. Restart the Splunk Distribution of OpenTelemetry Collector.

Configuration

To use this integration of a Smart Agent monitor with the Collector:

  1. Include the Smart Agent receiver in your configuration file.

  2. Add the monitor type to the Collector configuration, in both the receivers and pipelines sections.

Example

To activate this integration, add one of the following to your Collector configuration:

receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    ...  # Additional config

receivers:
  smartagent/collectd_spark_worker:
    type: collectd/spark
    ...  # Additional config

Next, add the monitor to the service > pipelines > metrics > receivers section of your configuration file:

service:
  pipelines:
    metrics:
      receivers: [smartagent/collectd_spark_master]

service:
  pipelines:
    metrics:
      receivers: [smartagent/collectd_spark_worker]

Note: The names collectd_spark_master and collectd_spark_worker are for identification purposes only and don't affect functionality. You can use either name in your configuration, but you need separate monitor configurations and discovery rules for the master and worker processes. For the master configuration, see the isMaster field in the Configuration settings section.
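A single Collector instance might monitor both a master and a worker on the same host. In that case, both receivers go under one receivers key, because YAML doesn't allow a key to repeat within the same mapping. Both names also go into the metrics pipeline. The following sketch makes three assumptions, so replace them with your own values:

  • The cluster is a standalone cluster.
  • Both processes are reachable at localhost.
  • The master and worker HTTP endpoints are on ports 8080 and 8081, the default Spark standalone master and worker web UI ports.

```yaml
receivers:
  # Master process: isMaster must be true.
  smartagent/collectd_spark_master:
    type: collectd/spark
    host: localhost           # assumed host
    port: 8080                # assumed: default standalone master web UI port
    clusterType: Standalone
    isMaster: true
  # Worker process: isMaster keeps its default value (false).
  smartagent/collectd_spark_worker:
    type: collectd/spark
    host: localhost           # assumed host
    port: 8081                # assumed: default standalone worker web UI port
    clusterType: Standalone

service:
  pipelines:
    metrics:
      # Only the Spark entries are shown here. Keep the receivers, processors,
      # and exporters your metrics pipeline already uses.
      receivers: [smartagent/collectd_spark_master, smartagent/collectd_spark_worker]
```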

Configuration settings

The following table shows the configuration options for this integration:

Option | Required | Type | Description
--- | --- | --- | ---
pythonBinary | no | string | The path to a Python binary that executes the Python code. If you don't set this option, the system uses a built-in runtime. You can also include arguments to the binary.
host | yes | string |
port | yes | integer |
isMaster | no | bool | Set this option to true when you want to monitor a master Spark node. The default is false.
clusterType | yes | string | The type of cluster you're monitoring. The allowed values are Standalone, Mesos, or Yarn. The system doesn't collect cluster metrics for Yarn; use the collectd/hadoop monitor to gain insight into your cluster's health.
collectApplicationMetrics | no | bool | The default is false.
enhancedMetrics | no | bool | The default is false.
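The following sketch shows how the options in the table map onto a receiver entry for a master node. It turns on both optional boolean flags, whose defaults are false. It makes these assumptions:

  • The cluster is a standalone cluster.
  • The host is localhost and the port is 8080.

Which additional metrics each flag produces is listed in the Metrics section, not here.

```yaml
receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    host: localhost                   # required; assumed value
    port: 8080                        # required; assumed: default standalone master web UI port
    clusterType: Standalone           # required; one of Standalone, Mesos, or Yarn
    isMaster: true                    # optional; default is false
    collectApplicationMetrics: true   # optional; default is false
    enhancedMetrics: true             # optional; default is false
    # pythonBinary is optional; when it is unset, the built-in runtime is used.
```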

Metrics

These are the metrics available for this integration:

Notes

  • Learn more about the available metric types in Observability Cloud.

  • Default metrics are those metrics included in host-based subscriptions in Observability Cloud, such as host, container, or bundled metrics. Custom metrics are not provided by default and might be subject to charges. See more about metric categories.

  • To send additional metrics, see how to configure extraMetrics using the Collector.
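As a rough sketch, extraMetrics is a list on the receiver entry, next to the monitor's other options. The host, port, and cluster values below are assumptions, and "<metric-name>" is a hypothetical placeholder, not a real metric name. Replace it with a non-default metric from this integration's metric list.

```yaml
receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    host: localhost              # assumed host
    port: 8080                   # assumed master HTTP port
    clusterType: Standalone
    isMaster: true
    # Metrics to send in addition to the default ones.
    extraMetrics:
      - "<metric-name>"          # hypothetical placeholder; replace it
```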

Troubleshooting

If you are not able to see your data in Splunk Observability Cloud, try these tips:

To learn about even more support options, see Splunk Customer Success.