Service Mesh Instrumentation

Important

Before you start instrumenting your applications, review the information in Instrumentation Overview.

Introduction

Service meshes provide a consistent way to connect, manage, and secure microservices. Observability is one of the biggest reasons for adopting a service mesh, since it provides a powerful tool for monitoring and responding to deployment and usage changes in a microservices environment. Service meshes can easily be configured to send data to SignalFx. This document focuses on configuring Istio and App Mesh to send traces to SignalFx, enabling out-of-the-box service monitoring and troubleshooting.

Istio and App Mesh both use Envoy as the data plane. Out of the box, the Envoy proxies used in Istio and App Mesh can be configured to send traces using Envoy's built-in Zipkin instrumentation. However, each proxy traces ingress and egress separately: an outgoing request from one proxy will be connected to the corresponding incoming request at the next proxy, but subsequent requests made by the receiving service will not be connected to the same trace. To fully trace the application, instrumentation must be added in each microservice to propagate the trace context between incoming and outgoing requests. This can be done through manual instrumentation or, if the frameworks in use are supported, through automatic instrumentation.
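
To illustrate what context propagation looks like, here is a minimal sketch of a Python (Flask) microservice using the OpenTracing API. The route, the downstream payments service, and the operation name are hypothetical, and the tracer is assumed to have been registered globally (for example, by the SignalFx Python instrumentation shown later in this document):

import requests
from flask import Flask, request

import opentracing
from opentracing.propagation import Format

app = Flask(__name__)

@app.route("/checkout")
def checkout():
    tracer = opentracing.global_tracer()
    # Join the trace started by the ingress Envoy proxy by extracting the
    # incoming trace headers.
    parent_ctx = tracer.extract(Format.HTTP_HEADERS, dict(request.headers))
    with tracer.start_active_span("checkout", child_of=parent_ctx) as scope:
        headers = {}
        # Re-inject the context so the outgoing request (and the egress
        # Envoy span it produces) remains part of the same trace.
        tracer.inject(scope.span.context, Format.HTTP_HEADERS, headers)
        return requests.get("http://payments/charge", headers=headers).text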

The recommended deployment model for adding APM to your service mesh is to run the SignalFx Smart Agent on each of the host machines and deploy a Smart Gateway outside of the mesh. The application must be configured to send traces through the Smart Agent, which enables host correlation and host metrics for the services, and the Smart Agent must be configured to forward traces to the deployed Smart Gateway. In some cases, tracing the application alone, without Envoy tracing, will be sufficient.

If Envoy tracing is enabled, the Envoy spans should be routed through the host's Smart Agent instance when possible. If that is not possible, the proxies can be configured to send directly to the Smart Gateway; this still provides some information on the service dashboard for the proxies. By default, the Envoy spans will appear to be associated with a separate service. When routing or other proxy capabilities are not in use, such as when only one instance sits behind a service or all instances are replicas, the Envoy trace service name should be set to match the microservice's name. This declutters the service map by grouping the spans into one logical service. However, when routing policies are in use, giving each sidecar proxy a distinct name allows you to identify which routes were used in a particular trace. As such, the best configuration is to enable Envoy tracing on a per-microservice basis.

The sections below give more details on configuring your microservices and sidecar proxies on Istio and App Mesh.

Istio

The Istio service mesh is composed of a data plane and a control plane. The data plane consists of Envoy sidecars, which control traffic in and out of microservices, and Mixer, a general-purpose policy and telemetry hub. The control plane configures the proxies to route traffic, and configures Mixers to enforce policies and collect telemetry. To learn more about observability for Istio with SignalFx, see this SignalFx blog post.

Mixer Adapter

SignalFx provides an Istio Mixer adapter to bring traces in from Istio. The Mixer adapter must be configured to report to the SignalFx Smart Gateway.

To configure the SignalFx Mixer adapter without Helm, edit resources/handler.yaml and set the access token and ingest URL. For example, if the Smart Gateway is deployed in the default namespace with a service exposed, the file would look like this:

---
# Source: signalfx-istio-adapter/templates/handler.yaml
apiVersion: "config.istio.io/v1alpha2"
kind: handler
metadata:
  name: signalfx
  namespace: istio-system
  labels:
    app.kubernetes.io/name: signalfx-adapter
spec:
  adapter: signalfx
  connection:
    address: "signalfx-adapter:8080"
  params:
    # This can also be set with the env var SIGNALFX_ACCESS_TOKEN on the
    # adapter pod.
    access_token: "MY_ACCESS_TOKEN"
    ingest_url: "gateway.default.svc.cluster.local:8080"
    datapoint_interval: 10s
    enable_metrics: true
    enable_tracing: true
    metrics:
    - name: requestcount.instance.istio-system
      type: COUNTER
    - name: requestduration.instance.istio-system
      type: HISTOGRAM
    - name: requestsize.instance.istio-system
      type: COUNTER
    - name: responsesize.instance.istio-system
      type: COUNTER
    - name: tcpbytesent.instance.istio-system
      type: COUNTER
    - name: tcpbytereceived.instance.istio-system
      type: COUNTER
    tracing:
      buffer_size: 1000
      localEndpointIpTagKey: source.ip
      localEndpointNameTagKey: source.workload.name
      remoteEndpointIpTagKey: destination.ip
      remoteEndpointNameTagKey: destination.workload.name
      swapLocalRemoteEndpoints: true

Then the adapter can be applied with the following kubectl command:

$ kubectl apply -f resources/

If Helm is used to install the adapter, set the value for ingestUrl to the Gateway address:

$ helm install --name signalfx-adapter \
    --set fullnameOverride=signalfx-adapter \
    --namespace istio-system \
    --set-string accessToken=MY_ORG_ACCESS_TOKEN \
    --set-string ingestUrl=gateway.default.svc.cluster.local:8080 \
    ./helm/signalfx-adapter/

Services to ignore can be configured in the file rules-tracing.yaml by adding expressions to match:

---
# Source: signalfx-istio-adapter/templates/rules-tracing.yaml
...
spec:
  match: (context.protocol == "http" || context.protocol == "grpc") && request.host != "mixer" && source.workload != "service-to-ignore"

This will not add spans for the service service-to-ignore.

Smart Agent

While the Mixer adapter can be used alone to send request spans to SignalFx, it is best used in conjunction with the SignalFx Smart Agent. By sending application spans through the SignalFx Smart Agent, you will capture important infrastructure (host) correlation metrics for your service.

On GKE and other Kubernetes platforms, the recommended way to deploy the SignalFx Smart Agent is as a DaemonSet, so that one instance will run on every host. The deployment files for doing that can be found in the Smart Agent’s deployment documentation.
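
For orientation, a trimmed sketch of such a DaemonSet is shown below. It is illustrative only; use the manifests from the deployment documentation (which also include the required RBAC objects and access-token secret) for an actual installation. The image tag, secret name, and key shown here are assumptions:

# Illustrative only -- see the Smart Agent deployment documentation for the
# complete, supported manifests.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: signalfx-agent
spec:
  selector:
    matchLabels:
      app: signalfx-agent
  template:
    metadata:
      labels:
        app: signalfx-agent
    spec:
      serviceAccountName: signalfx-agent
      hostNetwork: true
      containers:
      - name: signalfx-agent
        image: quay.io/signalfx/signalfx-agent:AGENT_VERSION
        env:
        - name: SFX_ACCESS_TOKEN
          valueFrom:
            secretKeyRef:
              name: signalfx-agent
              key: access-token
        volumeMounts:
        # The host filesystem is mounted at /hostfs so the host monitors in
        # the ConfigMap below can read it.
        - name: hostfs
          mountPath: /hostfs
          readOnly: true
        - name: config
          mountPath: /etc/signalfx
      volumes:
      - name: hostfs
        hostPath:
          path: /
      - name: config
        configMap:
          name: signalfx-agent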

The following example ConfigMap is for a GKE cluster called my-gke-cluster with a trace forwarder listening on each host at port 9080. It sends metrics and traces to a gateway deployed as gateway.default.svc.cluster.local listening on port 8080.

---
# Source: signalfx-agent/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: signalfx-agent
  labels:
    app: signalfx-agent
data:
  agent.yaml: |
    signalFxAccessToken: ${SFX_ACCESS_TOKEN}
    ingestUrl: http://gateway.default.svc.cluster.local:8080

    intervalSeconds: 10

    logging:
      level: info

    writer:
      logTraceSpans: true

    globalDimensions:
      kubernetes_cluster: my-gke-cluster

    sendMachineID: false

    observers:
    - type: k8s-api

    monitors:
    - type: collectd/cpu
    - type: collectd/cpufreq
    - type: collectd/df
      hostFSPath: /hostfs
    - type: collectd/disk
    - type: collectd/interface
    - type: collectd/load
    - type: collectd/memory
    - type: collectd/protocols
    - type: collectd/signalfx-metadata
      procFSPath: /hostfs/proc
    - type: host-metadata
      etcPath: /hostfs/etc
      procFSPath: /hostfs/proc
    - type: collectd/uptime
    - type: collectd/vmem
    - type: trace-forwarder
      listenAddress: 0.0.0.0:9080

    - type: collectd/processes
      processes:
      - collectd
      - signalfx-agent

    - type: kubelet-stats
      kubeletAPI:
        authType: none
        url: http://localhost:10255

    # Collects k8s cluster-level metrics
    - type: kubernetes-cluster
      useNodeName: true

    - type: docker-container-stats
      dockerURL: unix:///var/run/docker.sock
      excludedImages:
       - '*pause-amd64*'
      labelsToDimensions:
        io.kubernetes.container.name: container_spec_name
        io.kubernetes.pod.name: kubernetes_pod_name
        io.kubernetes.pod.uid: kubernetes_pod_uid
        io.kubernetes.pod.namespace: kubernetes_namespace

    - type: internal-metrics
      discoveryRule: container_image =~ "gateway" && private_port == 9091

    collectd:
      readThreads: 5
      writeQueueLimitHigh: 500000
      writeQueueLimitLow: 400000
      timeout: 40
      logLevel: info

    metricsToExclude:
      # The StackDriver metadata-agent pod on GKE restarts every few minutes so
      # ignore its containers
      - dimensions:
          container_spec_name: metadata-agent
      - '#from': /lib/whitelist.json
        flatten: true

The application should be configured to send to the Smart Agent by using the host IP provided by the Downward API:

env:
- name: SIGNALFX_AGENT_HOST
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP

The application can read this environment variable when configuring a tracer endpoint. Assuming the same example Smart Agent configuration provided above, the endpoint URL will be of the form:

http://$(SIGNALFX_AGENT_HOST):9080/v1/trace

In your application’s instrumentation, you must read the environment variable and build the endpoint URL to pass to the tracer. For example, using the SignalFx Python autoinstrumentation, the tracer can be configured as follows:

import os

from signalfx_tracing.utils import create_tracer

create_tracer("",
              config={
                'logging': True,
                'jaeger_endpoint': 'http://' + os.getenv('SIGNALFX_AGENT_HOST') + ':9080/v1/trace'
              },
              service_name="service_name")

If you are using the SignalFx Java Agent, setting the SIGNALFX_AGENT_HOST environment variable to the host IP as shown above will automatically configure it to send to the Smart Agent.
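
For example, the container spec for a Java service might look like the following sketch. The service name, image, and agent jar path are assumptions; the SIGNALFX_AGENT_HOST variable is read automatically by the SignalFx Java Agent:

containers:
- name: my-java-service                        # hypothetical service
  image: my-registry/my-java-service:latest    # hypothetical image
  env:
  - name: SIGNALFX_AGENT_HOST
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  # The agent jar is attached with the standard -javaagent flag; the path
  # depends on how the jar is baked into the image.
  command: ["java", "-javaagent:/opt/signalfx-tracing.jar", "-jar", "/app/service.jar"]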

Envoy Configuration

Envoy traces can be used as an alternative to the SignalFx Mixer adapter for capturing traffic routing information. As mentioned above, the Mixer adapter can be configured to report this information as well; Envoy tracing simply provides a more direct level of control and produces separate ingress and egress spans for every operation. This section goes over configuration for a few of the common use cases.

By default, there is no easy way to make the sidecar proxies point to the host where each pod is running, since the default sidecar injector reads a single Zipkin collector address from the Istio ConfigMap. To send to the host's Smart Agent instead, use a custom sidecar injector configuration, either in a separate ConfigMap or by editing the existing istio-sidecar-injector ConfigMap, that sets the collector address from an environment variable populated by the Downward API:

# istio-custom-sidecar-injector.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-sidecar-injector
  namespace: istio-system
  labels:
    app: istio
    chart: istio-1.1.0
    heritage: Tiller
    release: istio
    istio: sidecar-injector
data:
  config: |-
    policy: enabled
    template: |-
      containers:
      - name: istio-proxy
        args:
        ...
        - --zipkinAddress
        - $(POD_HOST_IP):9080
        ...
        env:
        - name: POD_HOST_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP

        ...
---

Whether sending to the Smart Agent or the Smart Gateway, the Zipkin collector endpoint must be updated to point to /v1/trace rather than the default, /api/v1/spans. To set the trace endpoint, a custom Envoy bootstrap config is needed. This ConfigMap should be saved to istio-signalfx-bootstrap-config.yaml and applied in the same namespace as your application:

# istio-signalfx-bootstrap-config.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-signalfx-bootstrap-config
data:
  custom_bootstrap.json: |
    {
      "tracing": {
        "http": { "name": "envoy.zipkin", "config": {
            "collector_cluster": "zipkin",
            "collector_endpoint": "/v1/trace",
            "trace_id_128bit": "false"
          }
        }
      }
    }

Then apply it in your application's namespace:

$ kubectl apply -f istio-signalfx-bootstrap-config.yaml

Then in the pod spec for your deployment, set the annotation sidecar.istio.io/bootstrapOverride to the name of the custom Envoy ConfigMap:

# deployment yaml

apiVersion: extensions/v1beta1
kind: Deployment
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/bootstrapOverride: "istio-signalfx-bootstrap-config"
...

The Envoy trace service name is inherited from the --serviceCluster argument passed to the proxy's init command, which can also be controlled through a custom injector configuration. The default injector configuration names the service cluster from the pod's app label and namespace, in the form $(APP).$(POD_NAMESPACE). If this is not an ideal name, the app label can be edited directly in the pod spec. Alternatively, the injector configuration can be updated to use a custom naming scheme:

# istio-custom-sidecar-injector.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-sidecar-injector
  namespace: istio-system
  labels:
    ...
    istio: sidecar-injector
data:
  config: |-
    policy: enabled
    template: |-
      containers:
      - name: istio-proxy
        args:
        ...
        - --serviceCluster
        - $(CUSTOM_NAMING_SCHEME)
        ...
---

In some cases, it may be preferable to disable Envoy tracing altogether for a service. To do so, simply omit the --zipkinAddress argument to the Envoy sidecar. One way to do this when using automatic injection is to modify the injector bootstrap config to only add the Zipkin argument if a label is set. For example, to disable tracing on all pods that have the label trace: disabled, the sidecar injection configuration should look like this:

# istio-custom-sidecar-injector.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-sidecar-injector
  namespace: istio-system
  labels:
    ...
    istio: sidecar-injector
data:
  config: |-
    policy: enabled
    template: |-
      containers:
      - name: istio-proxy
        args:
        ...
        [[ if ne (index .ObjectMeta.Labels "trace") "disabled" -]]
        - --zipkinAddress
        - $(ZIPKIN_ADDRESS)
        [[ end -]]
        ...
---

When doing manual sidecar injection, just remove the two lines corresponding to zipkinAddress and its value.
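
In the generated sidecar spec, these are the args lines to delete (the collector address shown here is the Istio default and is only an assumption; it may differ in your mesh):

        - --zipkinAddress
        - zipkin.istio-system:9411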

Also be sure to remove any annotations pointing to custom Zipkin bootstrap configs.

Example

An example showing many of the configurations discussed here can be examined at the tracing examples repo.

AWS App Mesh

App Mesh is a service mesh with a managed control plane that uses Envoy as the data plane. The primary way to get traces from the mesh infrastructure is by configuring the Envoy sidecars to send traces from built-in Zipkin instrumentation to a SignalFx Smart Agent running on the host or in the same pod/task. To learn more about observability for App Mesh with SignalFx, see this SignalFx blog post.

Envoy and Mesh Configuration

The Envoy sidecar proxies can be configured for tracing by providing a tracing configuration stub, which will be appended to the standard Envoy configuration at initialization. The configuration file will look similar to the following:

# tracing.yaml
static_resources:
  clusters:
  - name: zipkin
    connect_timeout: 2s
    type: LOGICAL_DNS
    lb_policy: ROUND_ROBIN
    hosts:
    - socket_address:
        address: SIGNALFX_ENDPOINT_HOST
        port_value: SIGNALFX_ENDPOINT_PORT

tracing:
  http:
    name: envoy.zipkin
    config:
      collector_cluster: zipkin
      collector_endpoint: "/v1/trace"

SIGNALFX_ENDPOINT_HOST and SIGNALFX_ENDPOINT_PORT should be replaced with the address and port of a Smart Gateway deployed outside of the cluster or to the host IP where a Smart Agent is running.

If traces are being sent to a Smart Agent running on the task's host, we recommend building a companion container that provides the tracing configuration through a Docker volume and an init script. During task creation, the init script can determine the host's IP address by querying the EC2 metadata service with curl and substitute the correct values into the tracing configuration file. The Envoy container can then mount this volume and be directed to read the tracing configuration it contains.

An example Dockerfile for this approach:

FROM alpine

# needed to query the metadata service
RUN apk add --update \
    curl \
    && rm -rf /var/cache/apk/*

ADD tracing.yaml /envoy_config/tracing.yaml
ADD startup.sh /startup.sh

RUN chmod +x /startup.sh

VOLUME /envoy_config

CMD /startup.sh

startup.sh is a simple bash script that replaces the Zipkin endpoint host and port in tracing.yaml with the values provided through the environment variables SIGNALFX_ENDPOINT_HOST and SIGNALFX_ENDPOINT_PORT. If the host variable is not provided, the endpoint address is replaced with the IP address returned by the metadata service:

#!/bin/bash

export HOST_IP=$(curl http://169.254.169.254/latest/meta-data/local-ipv4)

sed -i 's/SIGNALFX_ENDPOINT_HOST/'"${SIGNALFX_ENDPOINT_HOST:-$HOST_IP}"'/g; s/SIGNALFX_ENDPOINT_PORT/'"${SIGNALFX_ENDPOINT_PORT:-9080}"'/g' /envoy_config/tracing.yaml

The Envoy container can read this configuration file by mounting the Docker volume and setting an environment variable with the path to the file. For example, the modified container definition might look like this:

{
  "name": "envoy",
  "image": $ENVOY_IMAGE,
  "user": "1337",
  "essential": true,
  "ulimits": [
    {
      "name": "nofile",
      "hardLimit": 15000,
      "softLimit": 15000
    }
  ],
  "portMappings": [
    {
      "containerPort": 9901,
      "hostPort": 9901,
      "protocol": "tcp"
    },
    {
      "containerPort": 15000,
      "hostPort": 15000,
      "protocol": "tcp"
    },
    {
      "containerPort": 15001,
      "hostPort": 15001,
      "protocol": "tcp"
    }
  ],
  "environment": [
    {
      "name": "APPMESH_VIRTUAL_NODE_NAME",
      "value": $VIRTUAL_NODE
    },
    {
      "name": "APPMESH_VIRTUAL_NODE_CLUSTER",
      "value": $ENVOY_SERVICE_NAME
    },
    {
      "name": "ENVOY_LOG_LEVEL",
      "value": $ENVOY_LOG_LEVEL
    },
    {
      "name": "APPMESH_XDS_ENDPOINT",
      "value": $APPMESH_XDS_ENDPOINT
    },
    {
      "name": "ENABLE_ENVOY_STATS_TAGS",
      "value": "1"
    },
    {
      "name": "ENVOY_TRACING_CFG_FILE",
      "value": "/envoy_config/tracing.yaml"
    }
  ],
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": {
      "awslogs-group": $ECS_SERVICE_LOG_GROUP,
      "awslogs-region": $AWS_REGION,
      "awslogs-stream-prefix": $AWS_LOG_STREAM_PREFIX_ENVOY
    }
  },
  "healthCheck": {
    "command": [
      "CMD-SHELL",
      "curl -s http://localhost:9901/server_info | grep state | grep -q LIVE"
    ],
    "interval": 5,
    "timeout": 2,
    "retries": 3
  },
  "volumesFrom": [
    {
      "sourceContainer" : "config-volume",
      "readOnly" : true
    }
  ]
}

Specifically, the volumesFrom container definition attribute and the ENVOY_TRACING_CFG_FILE environment variable must be set correctly for Envoy to read the configuration file.
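
For reference, a minimal container definition for the config-volume container referenced by volumesFrom might look like the following sketch, where $TRACING_CONFIG_IMAGE is assumed to point at an image built from the Dockerfile above. Marking it as non-essential lets it exit after running startup.sh without stopping the task:

{
  "name": "config-volume",
  "image": $TRACING_CONFIG_IMAGE,
  "essential": false,
  "environment": [
    {
      "name": "SIGNALFX_ENDPOINT_PORT",
      "value": "9080"
    }
  ]
}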

Since traffic is restricted by default, to allow the application to send traces to the Smart Agent, add the Smart Agent port to the egress ignored ports list. In ECS, this is done by adding a property called EgressIgnoredPorts to proxyConfiguration:

{
  ...
  "proxyConfiguration": {
    "type": "APPMESH",
    "containerName": "envoy",
    "properties": [
      ...
      {
        "name": "EgressIgnoredPorts",
        "value": "9080"
      },
      ...
    ]
  }
}

In EKS, set the environment variable APPMESH_EGRESS_IGNORED_PORTS on the proxyinit container in your pod spec:

spec:
  ...
  initContainers:
    - name: proxyinit
      image: 111345817488.dkr.ecr.us-west-2.amazonaws.com/aws-appmesh-proxy-route-manager:latest
      securityContext:
        capabilities:
          add:
            - NET_ADMIN
      env:
        ...
        - name: "APPMESH_EGRESS_IGNORED_PORTS"
          value: "9080"

The proxy will refuse to route traffic for these ports regardless of the host, so be sure this port is distinct from any port used by application logic.

When sending directly to the Smart Gateway, the Envoy proxies and the application can simply use the service discovery endpoint or IP of the Smart Gateway deployment. However, the Smart Gateway must be modeled in the mesh in order to allow traces to leave the mesh and reach it. This can be done by defining a virtual node and a virtual service for the Smart Gateway. Below is an example given as a CloudFormation template:

# gateway_mesh_endpoint.yaml
---
Resources:
  GatewayVirtualNode:
    Type: AWS::AppMesh::VirtualNode
    Properties:
      MeshName: APPMESH_NAME
      VirtualNodeName: gateway-vn
      Spec:
        Listeners:
          - PortMapping:
              Port: GATEWAY_PORT
              Protocol: http
        ServiceDiscovery:
          DNS:
            Hostname: GATEWAY_URL

  GatewayVirtualService:
    Type: AWS::AppMesh::VirtualService
    DependsOn:
      - GatewayVirtualNode
    Properties:
      MeshName: APPMESH_NAME
      VirtualServiceName: GATEWAY_URL
      Spec:
        Provider:
          VirtualNode:
            VirtualNodeName: gateway-vn

Replace APPMESH_NAME, GATEWAY_PORT, and GATEWAY_URL with appropriate values or parameter references.
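
With the placeholders filled in, the template can be deployed like any other CloudFormation stack; for example (the stack name here is arbitrary):

$ aws cloudformation deploy \
    --stack-name signalfx-gateway-mesh-endpoint \
    --template-file gateway_mesh_endpoint.yaml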

Smart Agent

The SignalFx Smart Agent must be configured according to the deployment type of the App Mesh application.

For ECS deployments, configure the Smart Agent with a Daemon placement strategy, as described in the deployment documentation.
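
For example, assuming a signalfx-agent task definition has already been registered, the service can be created with the DAEMON scheduling strategy roughly as follows (the cluster and service names are assumptions):

$ aws ecs create-service \
    --cluster my-cluster \
    --service-name signalfx-agent \
    --task-definition signalfx-agent \
    --scheduling-strategy DAEMON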

For Fargate task types, the Smart Agent must run as a sidecar, so Envoy and the application must be configured to send traces to localhost:9080. More information about deploying with Fargate can be found in the Smart Agent's Fargate deployment documentation.
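
For example, when using the config-volume image described above, the tracing configuration can be pointed at the sidecar agent by setting the endpoint variables on that container (a sketch; these entries belong in the config-volume container's environment block):

{
  "name": "SIGNALFX_ENDPOINT_HOST",
  "value": "localhost"
},
{
  "name": "SIGNALFX_ENDPOINT_PORT",
  "value": "9080"
}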

Example

An example application with sample configurations and deployment files is available in our tracing examples repo.