Marathon

DESCRIPTION

This integration primarily consists of the Smart Agent monitor collectd/marathon. Below is an overview of that monitor.

Smart Agent Monitor

Monitors a Mesos Marathon instance using the collectd Marathon Python plugin.

See the integrations doc for more information on configuration.

Sample YAML configuration:

monitors:
  - type: collectd/marathon
    host: 127.0.0.1
    port: 8080
    scheme: http

Sample YAML configuration for DC/OS:

monitors:
  - type: collectd/marathon
    host: 127.0.0.1
    port: 8080
    scheme: https
    dcosAuthURL: https://leader.mesos/acs/api/v1/auth/login

REQUIREMENTS AND DEPENDENCIES

Version information

Software                    Version
collectd                    5.0 or later
Python                      2.6 or later
Marathon                    1.1.1 or later
Python plugin for collectd  (included with SignalFx collectd agent)

INSTALLATION

This integration is part of the SignalFx Smart Agent as the collectd/marathon monitor. You should first deploy the Smart Agent to the same host as the service you want to monitor, and then continue with the configuration instructions below.

CONFIGURATION

To activate this monitor in the Smart Agent, add the following to your agent config:

monitors:  # All monitor config goes under this key
 - type: collectd/marathon
   ...  # Additional config

For a list of monitor options that are common to all monitors, see Common Configuration.

Config option  Required  Type     Description
pythonBinary   no        string   Path to a Python binary that should be used to execute the Python code. If not set, a built-in runtime is used. Can include arguments to the binary as well.
host           yes       string
port           yes       integer
username       no        string   Username used to authenticate with Marathon.
password       no        string   Password used to authenticate with Marathon.
scheme         no        string   Set to either http or https. (default: http)
dcosAuthURL    no        string   The DC/OS authentication URL from which the plugin obtains authentication tokens. If DC/OS is operating in strict mode, set scheme to "https" and dcosAuthURL to "https://leader.mesos/acs/api/v1/auth/login" (the default DNS entry provided by DC/OS).
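
As an illustration of how the optional settings in the table combine, the following is a minimal sketch of a monitor entry for a Marathon instance that requires a username and password. The credential values and the Python path are placeholders chosen for this example, not defaults, and pythonBinary can be left out entirely to use the built-in runtime.

```yaml
monitors:
  - type: collectd/marathon
    host: 127.0.0.1
    port: 8080
    scheme: https
    # Placeholder credentials; replace with values valid for your Marathon instance.
    username: marathon-user
    password: change-me
    # Optional and hypothetical path; omit to use the agent's built-in runtime.
    pythonBinary: /usr/bin/python
```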

USAGE

All metrics reported by the Marathon collectd plugin will contain the following dimensions:

  • host will contain the hostname (as known by collectd) of the machine reporting the metrics.
  • plugin is always set to marathon.
  • plugin_instance is always marathon concatenated with . and the Mesos agent ID, for example marathon.<mesos agent id>.

[Figure: sample of the built-in Marathon overview dashboard in SignalFx]

METRICS

Metric Name                                       Type   Description
gauge.marathon.app.cpu.allocated                  gauge  Number of CPUs allocated to an application
gauge.marathon.app.cpu.allocated.per.instance     gauge  Configured number of CPUs allocated to each application instance
gauge.marathon.app.delayed                        gauge  Indicates whether the application is delayed
gauge.marathon.app.deployments.total              gauge  Number of application deployments
gauge.marathon.app.disk.allocated                 gauge  Storage allocated to a Marathon application
gauge.marathon.app.disk.allocated.per.instance    gauge  Configured storage allocated to each application instance
gauge.marathon.app.gpu.allocated                  gauge  Number of GPUs allocated to a Marathon application
gauge.marathon.app.gpu.allocated.per.instance     gauge  Configured number of GPUs allocated to each application instance
gauge.marathon.app.instances.total                gauge  Number of application instances
gauge.marathon.app.memory.allocated               gauge  Memory allocated to a Marathon application
gauge.marathon.app.memory.allocated.per.instance  gauge  Configured amount of memory allocated to each application instance
gauge.marathon.app.tasks.running                  gauge  Number of tasks running for an application
gauge.marathon.app.tasks.staged                   gauge  Number of tasks staged for an application
gauge.marathon.app.tasks.unhealthy                gauge  Number of unhealthy tasks for an application
gauge.marathon.task.healthchecks.failing.total    gauge  Number of failing health checks for a task
gauge.marathon.task.healthchecks.passing.total    gauge  Number of passing health checks for a task
gauge.marathon.task.staged.time.elapsed           gauge  Amount of time the task spent in staging
gauge.marathon.task.start.time.elapsed            gauge  Time elapsed since the task started

Metrics that are categorized as container/host (default) are in bold and italics in the list below.

These are the metrics available for this integration.

  • gauge.marathon.app.cpu.allocated (gauge)
    Number of CPUs allocated to an application
  • gauge.marathon.app.cpu.allocated.per.instance (gauge)
    Configured number of CPUs allocated to each application instance
  • gauge.marathon.app.delayed (gauge)
    Indicates if the application is delayed or not
  • gauge.marathon.app.deployments.total (gauge)
    Number of application deployments
  • gauge.marathon.app.disk.allocated (gauge)
    Storage allocated to a Marathon application
  • gauge.marathon.app.disk.allocated.per.instance (gauge)
    Configured storage allocated to each application instance
  • gauge.marathon.app.gpu.allocated (gauge)
    Number of GPUs allocated to a Marathon application
  • gauge.marathon.app.gpu.allocated.per.instance (gauge)
    Configured number of GPUs allocated to each application instance
  • gauge.marathon.app.instances.total (gauge)
    Number of application instances
  • gauge.marathon.app.memory.allocated (gauge)
    Memory allocated to a Marathon application
  • gauge.marathon.app.memory.allocated.per.instance (gauge)
    Configured amount of memory allocated to each application instance
  • gauge.marathon.app.tasks.running (gauge)
    Number of tasks running for an application
  • gauge.marathon.app.tasks.staged (gauge)
    Number of tasks staged for an application
  • gauge.marathon.app.tasks.unhealthy (gauge)
    Number of unhealthy tasks for an application
  • gauge.marathon.task.healthchecks.failing.total (gauge)
    The number of failing health checks for a task
  • gauge.marathon.task.healthchecks.passing.total (gauge)
    The number of passing health checks for a task
  • gauge.marathon.task.staged.time.elapsed (gauge)
    The amount of time the task spent in staging
  • gauge.marathon.task.start.time.elapsed (gauge)
    Time elapsed since the task started

Non-default metrics (version 4.7.0+)

The following information applies to agent version 4.7.0 or later with enableBuiltInFiltering: true set at the top level of the agent config.

To emit metrics that are not default, you can add those metrics in the generic monitor-level extraMetrics config option. Metrics that are derived from specific configuration options that do not appear in the above list of metrics do not need to be added to extraMetrics.
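
A minimal sketch of what this can look like is shown here. The metric listed under extraMetrics is picked purely for illustration; this section does not state which Marathon metrics are non-default, so check the output of agent-status monitors before relying on it.

```yaml
enableBuiltInFiltering: true  # top-level agent setting (agent 4.7.0+)

monitors:
  - type: collectd/marathon
    host: 127.0.0.1
    port: 8080
    # Illustrative choice only; list whichever non-default metrics you need.
    extraMetrics:
      - gauge.marathon.task.staged.time.elapsed
```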

To see a list of metrics that will be emitted you can run agent-status monitors after configuring this monitor in a running agent instance.

Legacy non-default metrics (version < 4.7.0)

The following information applies only to agent versions older than 4.7.0. If you have a newer agent and have set enableBuiltInFiltering: true at the top level of your agent config, see the section above. See upgrade instructions in Old-style whitelist filtering.

If you have a reference to the whitelist.json in your agent's top-level metricsToExclude config option, and you want to emit metrics that are not in that whitelist, then you need to add an item to the top-level metricsToInclude config option to override that whitelist (see Inclusion filtering). Alternatively, you can copy the whitelist.json, modify it, and reference the modified copy in metricsToExclude.
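
The override described above might look roughly like the following sketch. The field names metricNames and monitorType reflect the legacy filter syntax as commonly documented and should be verified against the Inclusion filtering documentation for your agent version; the metric name is only an example.

```yaml
# Top-level agent config (agent versions older than 4.7.0).
# Assumed filter syntax; confirm against the Inclusion filtering docs.
metricsToInclude:
  - metricNames:
      - gauge.marathon.task.staged.time.elapsed  # example metric to let through
    monitorType: collectd/marathon
```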