Marathon

DESCRIPTION

The collectd-marathon plugin collects metrics about Marathon applications and tasks.

Features

Built-in dashboards

  • Marathon: Overview of Marathon environment.

  • Marathon Application: Focus on Marathon Applications.

  • Marathon Resources: Focus on a Marathon Resource Allocation.

  • Marathon Task: Focus on a Marathon Task.

REQUIREMENTS AND DEPENDENCIES

Version information

Software                    Version
collectd                    5.0 or later
Python                      2.6 or later
Marathon                    1.1.1 or later
Python plugin for collectd  (included with SignalFx collectd agent)

INSTALLATION

If you are using the new Smart Agent, see the docs for the collectd/marathon monitor instead. The configuration documentation below may still be helpful, but consult the Smart Agent repo's docs for the exact schema.

  1. Download the collectd-marathon Python module onto a host that has access to the Marathon API.

  2. Run the following command to install the module's dependencies using pip, replacing the example path with the download location of the collectd-marathon module:

    sudo pip install -r /path/to/collectd-marathon/requirements.txt
    
  3. Download SignalFx's sample configuration file for this plugin to /etc/collectd/managed_config.

  4. Modify the configuration file to provide values that make sense for your environment, as described in Configuration below.

  5. Restart collectd.

CONFIGURATION

Using the sample configuration file 20-collectd-marathon.conf as a guide, provide values for the configuration options listed below that make sense for your environment.

  • ModulePath: Path on disk where collectd can find this module. Default: "/usr/share/collectd/collectd-marathon".
  • Import: Name of the Python module to import, without the .py extension. Default: marathon.
  • LogTraces: Logs traces from the plugin's execution. Default: true.
  • verbose: Turns on verbose log statements. Default: False.
  • host: A Python list of ["<scheme>", "<host>", "<port>", "username", "password", "<dcos_auth_url>"]. scheme is either "http" or "https". The username and password are required only for Basic Authentication with the Marathon API. dcos_auth_url is the DC/OS authentication URL from which the plugin obtains authentication tokens. When operating DC/OS in strict mode, set scheme to "https" and dcos_auth_url to "https://leader.mesos/acs/api/v1/auth/login" (leader.mesos is the default DNS entry provided by DC/OS). No default.

Note: Metrics from the /metrics endpoint are not available while operating in DC/OS strict mode.

An example configuration would look like the following:

<LoadPlugin "python">
  Globals true
</LoadPlugin>

<Plugin "python">
  ModulePath "/usr/share/collectd/collectd-marathon"
  Import "marathon"
  LogTraces true
  <Module "marathon">
    # Note that the last config option can also be set using the base URL of the
    # DC/OS UI; /acs/api/v1/auth/login is the authentication endpoint the plugin
    # uses to obtain a token for subsequent requests.
    host  ["https", "localhost", "8443", "username", "password", "https://leader.mesos/acs/api/v1/auth/login"]
    verbose False
  </Module>
</Plugin>
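
To make the element order of the host option concrete, the following minimal Python sketch names each position in the list and derives the base URL of the Marathon API. It is illustrative only and is not the plugin's code: the helper parse_host_option and the base_url key are hypothetical, and treating the trailing authentication elements as optional is an assumption based on the option description above.

```python
def parse_host_option(values):
    """Name each element of a `host` list (hypothetical helper, not plugin code)."""
    keys = ["scheme", "host", "port", "username", "password", "dcos_auth_url"]
    # Assumption: only scheme, host, and port are mandatory.
    if len(values) < 3:
        raise ValueError("host needs at least scheme, host, and port")
    if values[0] not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")
    # zip stops at the shorter sequence, so omitted trailing elements are skipped.
    fields = dict(zip(keys, values))
    # base_url is an illustrative convenience key, not a plugin option.
    fields["base_url"] = "{0}://{1}:{2}".format(*values[:3])
    return fields


example = parse_host_option([
    "https", "localhost", "8443", "username", "password",
    "https://leader.mesos/acs/api/v1/auth/login",
])
print(example["base_url"])
```

With the values from the example configuration, the sketch prints https://localhost:8443.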

USAGE

All metrics reported by the Marathon collectd plugin will contain the following dimensions:

  • host will contain the hostname (as known by collectd) of the machine reporting the metrics.
  • plugin is always set to marathon.
  • plugin_instance is always marathon concatenated with . and the Mesos agent ID, for example marathon.<mesos agent id>.
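
The three dimensions above can be sketched as a small Python function. This is a hedged illustration rather than the plugin's implementation: the function name marathon_dimensions and the sample hostname and agent ID are hypothetical.

```python
def marathon_dimensions(collectd_hostname, mesos_agent_id):
    """Build the dimensions described above (illustrative sketch only)."""
    return {
        "host": collectd_hostname,                        # hostname as known by collectd
        "plugin": "marathon",                             # always the literal "marathon"
        "plugin_instance": "marathon." + mesos_agent_id,  # "marathon" + "." + agent ID
    }


# Hypothetical hostname and Mesos agent ID, used only for demonstration.
dims = marathon_dimensions("node-1.example.com", "agent-S0")
print(dims["plugin_instance"])
```

For the hypothetical agent ID agent-S0, the sketch prints marathon.agent-S0.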

[Figure: Sample of the built-in Marathon overview dashboard in SignalFx]

METRICS

Metric Name                                       Description                                                         Type
gauge.marathon-api-metric                         Metrics reported by the Marathon Metrics API                        gauge
gauge.marathon.app.cpu.allocated                  Number of CPUs allocated to an application                          gauge
gauge.marathon.app.cpu.allocated.per.instance     Configured number of CPUs allocated to each application instance    gauge
gauge.marathon.app.delayed                        Indicates whether the application is delayed                        gauge
gauge.marathon.app.deployments.total              Number of application deployments                                   gauge
gauge.marathon.app.disk.allocated                 Storage allocated to a Marathon application                         gauge
gauge.marathon.app.disk.allocated.per.instance    Configured storage allocated to each application instance           gauge
gauge.marathon.app.gpu.allocated                  GPUs allocated to a Marathon application                            gauge
gauge.marathon.app.gpu.allocated.per.instance     Configured number of GPUs allocated to each application instance    gauge
gauge.marathon.app.instances.total                Number of application instances                                     gauge
gauge.marathon.app.memory.allocated               Memory allocated to a Marathon application                          gauge
gauge.marathon.app.memory.allocated.per.instance  Configured amount of memory allocated to each application instance  gauge
gauge.marathon.app.tasks.running                  Number of tasks running for an application                          gauge
gauge.marathon.app.tasks.staged                   Number of tasks staged for an application                           gauge
gauge.marathon.app.tasks.unhealthy                Number of unhealthy tasks for an application                        gauge
gauge.marathon.task.healthchecks.failing.total    The number of failing health checks for a task                      gauge
gauge.marathon.task.healthchecks.passing.total    The number of passing health checks for a task                      gauge
gauge.marathon.task.staged.time.elapsed           The amount of time the task spent in staging                        gauge
gauge.marathon.task.start.time.elapsed            Time elapsed since the task started                                 gauge

gauge.marathon-api-metric

gauge

Metrics reported by the Marathon Metrics API. The Marathon API offers a set of metrics that are reported by this plugin. API Endpoint: /metrics

Counters

This plugin reports all "counters" as gauge.<metric name>. Because the Marathon API does the counting, the plugin reads each count and reports it as a gauge value.

Gauges

This plugin reports all gauge metrics as gauge.<metric name>.

Meters

This plugin reports all metrics listed under "meters" as gauge.<metric name>.<unit>.per.<unit>.
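
The naming patterns for counters, gauges, and meters can be illustrated with a short Python sketch. It only mirrors the patterns stated above and is not taken from the plugin: the function reported_name, the metric name example.requests, and the units events and second are hypothetical, and how the plugin actually derives the two units is not covered here.

```python
def reported_name(metric_name, meter_units=None):
    """Return the reported name for an API metric (illustrative sketch only).

    Counters and gauges follow the pattern gauge.<metric name>; meters
    additionally carry a <unit>.per.<unit> suffix.
    """
    name = "gauge." + metric_name
    if meter_units is not None:
        # Hypothetical input: a (numerator unit, denominator unit) pair.
        numerator, denominator = meter_units
        name += ".{0}.per.{1}".format(numerator, denominator)
    return name


print(reported_name("example.requests"))
print(reported_name("example.requests", ("events", "second")))
```

The first call prints gauge.example.requests and the second prints gauge.example.requests.events.per.second.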

gauge.marathon.app.cpu.allocated

gauge

Number of CPUs allocated to an application. Represents the configured instance allocation multiplied by the number of instances.
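
As a worked example of this relationship, the sketch below multiplies a per-instance allocation by an instance count. The values are made up for illustration, and the function total_allocation is hypothetical rather than part of the plugin. The same arithmetic is described for the disk, GPU, and memory allocation metrics.

```python
def total_allocation(per_instance, instances):
    """Configured per-instance allocation multiplied by the number of instances."""
    return per_instance * instances


# Hypothetical application: 0.5 CPUs configured per instance, 4 instances.
cpu_per_instance = 0.5
instance_count = 4
print(total_allocation(cpu_per_instance, instance_count))  # prints 2.0
```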

gauge.marathon.app.cpu.allocated.per.instance

gauge

Configured number of CPUs allocated to each application instance

gauge.marathon.app.delayed

gauge

Indicates whether the application is delayed: 1 if it is delayed, 0 if it is not.

gauge.marathon.app.deployments.total

gauge

Number of deployments for an application

gauge.marathon.app.disk.allocated

gauge

Storage allocated to a Marathon application. Represents the configured instance allocation multiplied by the number of instances.

gauge.marathon.app.disk.allocated.per.instance

gauge

Configured storage allocated to each application instance

gauge.marathon.app.gpu.allocated

gauge

GPUs allocated to a Marathon application. Represents the configured instance allocation multiplied by the number of instances.

gauge.marathon.app.gpu.allocated.per.instance

gauge

Configured number of GPUs allocated to each application instance

gauge.marathon.app.instances.total

gauge

Number of application instances

gauge.marathon.app.memory.allocated

gauge

Memory allocated to a Marathon application. Represents the configured instance allocation multiplied by the number of instances.

gauge.marathon.app.memory.allocated.per.instance

gauge

Configured amount of memory allocated to each application instance

gauge.marathon.app.tasks.running

gauge

Number of tasks running for an application

gauge.marathon.app.tasks.staged

gauge

Number of tasks staged for an application

gauge.marathon.app.tasks.unhealthy

gauge

Number of unhealthy tasks for an application

gauge.marathon.task.healthchecks.failing.total

gauge

The number of failing health checks for a task

gauge.marathon.task.healthchecks.passing.total

gauge

The number of passing health checks for a task

gauge.marathon.task.staged.time.elapsed

gauge

The amount of time the task spent in staging

gauge.marathon.task.start.time.elapsed

gauge

Time elapsed since the task started