High-Scale Smart Gateway clusters

When operating as a cluster, the SignalFx Smart Gateway can scale horizontally to handle very large volumes of inbound traces and spans, enough to support the largest applications. To make such large clusters (typically more than 10 nodes) easier to deploy and operate, we recommend deploying or using a separate Etcd cluster instead of relying on the Smart Gateway’s embedded Etcd component. This not only allows for larger clusters, but also makes the Smart Gateway instances themselves easier to deploy and manage, and opens the door to autoscaling your Smart Gateway cluster if necessary. In this configuration, the Smart Gateway instances do not become members of the Etcd cluster; they simply connect to it as clients.

For more detailed recommendations on how many Smart Gateway instances to run in your cluster, refer to the Sizing recommendations.

The example below installs and configures a standalone 3-node Etcd cluster, along with a 3-node Smart Gateway cluster behind an NGINX load balancer. To scale that cluster up, add similarly configured Smart Gateway instances that connect to the same Etcd cluster, and register them in the NGINX configuration.

Deploying and running an Etcd cluster

Running Etcd separately from your Smart Gateway instances has several benefits. By running Etcd independently, you can leverage existing Etcd documentation, best practices and operators for a more transparent view of your Smart Gateway cluster orchestrator. As clients of this Etcd cluster, the Smart Gateway instances can start, stop, restart or auto-scale without affecting the membership or stability of the Etcd cluster and without any specific bootstrapping or orchestration requirements.

Etcd cluster sizing

When running standalone, Etcd is very stable, low-maintenance software with modest resource requirements. The Smart Gateway does not make a high volume of requests to Etcd, so Etcd can run on a small set of low-cost instances in your infrastructure. Refer to Etcd’s hardware recommendations guide for guidance on cluster size and instance types; a simple 3-node setup will serve most Smart Gateway cluster deployments.

Cluster bootstrapping

To bootstrap the Etcd cluster, start by creating a discovery address to help the Etcd nodes discover each other. If you intend to run a larger Etcd cluster, adjust the size parameter accordingly.

$ curl 'https://discovery.etcd.io/new?size=3'
https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de

Then start each Etcd instance with that discovery address using the following command on each instance. Replace ETCD_SERVER_ADDRESS with the local IP address of the node, and replace ETCD_NODE_NAME with a name that is unique to that node (for example sgw-etcd-1, sgw-etcd-2 and sgw-etcd-3), because Etcd discovery fails when members share a name:

$ etcd --name ETCD_NODE_NAME --initial-advertise-peer-urls http://ETCD_SERVER_ADDRESS:2380 \
  --listen-peer-urls http://ETCD_SERVER_ADDRESS:2380 \
  --listen-client-urls http://ETCD_SERVER_ADDRESS:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://ETCD_SERVER_ADDRESS:2379 \
  --discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
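
Once all nodes are started, it can be useful to confirm that the cluster actually formed. The following is a minimal sketch using the etcdctl v3 commands endpoint health and member list. The function name check_etcd_cluster and the ETCDCTL override variable are illustrative conveniences, not part of Etcd or the Smart Gateway.

```shell
# Sketch: verify that every Etcd node is healthy and listed as a cluster member.
# ETCDCTL is a hypothetical override for the binary (ETCDCTL=echo gives a dry run).
check_etcd_cluster() {
  # $1: comma-separated client URLs,
  #     for example http://10.0.1.11:2379,http://10.0.1.12:2379,http://10.0.1.13:2379
  ETCDCTL_API=3 "${ETCDCTL:-etcdctl}" --endpoints="$1" endpoint health &&
  ETCDCTL_API=3 "${ETCDCTL:-etcdctl}" --endpoints="$1" member list
}
```

Call it with the client URLs of all three nodes. A healthy cluster should report each endpoint as healthy and list three started members.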

Runtime reconfiguration

After the cluster starts up, add or remove instances using etcdctl or Etcd’s HTTP API, as described in the Etcd Runtime Reconfiguration Guide. Do not use discovery to add instances to, or remove instances from, a running cluster.
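
As a rough illustration of the etcdctl route, the sketch below wraps the v3 member add and member remove commands. The function names, the ETCD_ENDPOINTS variable and the ETCDCTL override are assumptions made for this example; the Etcd Runtime Reconfiguration Guide remains the authoritative procedure.

```shell
# Sketch: grow or shrink a running Etcd cluster with etcdctl (v3 API).
# ETCD_ENDPOINTS is a hypothetical variable holding the client URLs of the running cluster.
# ETCDCTL is a hypothetical override for the binary (ETCDCTL=echo gives a dry run).
etcd_add_member() {
  # $1: unique name for the new member, $2: IP address of the new node
  ETCDCTL_API=3 "${ETCDCTL:-etcdctl}" --endpoints="$ETCD_ENDPOINTS" \
    member add "$1" --peer-urls="http://$2:2380"
}

etcd_remove_member() {
  # $1: hexadecimal member ID, as printed by "etcdctl member list"
  ETCDCTL_API=3 "${ETCDCTL:-etcdctl}" --endpoints="$ETCD_ENDPOINTS" \
    member remove "$1"
}
```

After a successful member add, etcdctl prints the ETCD_NAME, ETCD_INITIAL_CLUSTER and ETCD_INITIAL_CLUSTER_STATE values to use when starting the new node, which joins with the cluster state set to existing rather than through discovery.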

Round-robin DNS

To make your Smart Gateway instances easier to configure, you can define a round-robin DNS name for your Etcd cluster instances. To do so, define one A record per Etcd server in your DNS zone file, all under the same DNS name. If you are deployed on AWS and rely on Route 53, refer to the Route 53 documentation on configuring Simple Routing with multiple values.
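
To make the "one A record per server" idea concrete, here is a small sketch that prints BIND-style A records for a list of Etcd node addresses. The function name, the record name sgw-etcd, the 300-second TTL and the IP addresses in the usage note are all hypothetical; adapt them to your own zone.

```shell
# Sketch: print one A record per Etcd node, all under the same name,
# in BIND zone-file syntax. The TTL of 300 seconds is an arbitrary example value.
print_etcd_a_records() {
  # $1: record name; remaining arguments: one IP address per Etcd node
  name="$1"
  shift
  for ip in "$@"; do
    printf '%s\t300\tIN\tA\t%s\n' "$name" "$ip"
  done
}
```

For example, print_etcd_a_records sgw-etcd 10.0.1.11 10.0.1.12 10.0.1.13 emits three records that share the name sgw-etcd, which resolvers then rotate through.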

NGINX load balancer deployment

The easiest way to run NGINX depends on your preferred application deployment method. For example, on an Amazon Linux system, you can install NGINX with yum install nginx. Alternatively, you can run NGINX as a Docker container using the publicly available nginx Docker image:

$ docker run -v /host/path/nginx.conf:/etc/nginx/nginx.conf:ro -d -p 80:80 nginx

In both cases, create your nginx.conf configuration file to set up NGINX for load balancing across the three provisioned Smart Gateway instances:

events {
}

http {
    upstream smart-gateway {
        server IP_OF_SGW_1:8080;
        server IP_OF_SGW_2:8080;
        server IP_OF_SGW_3:8080;
    }

    server {
        listen 80;

        location / {
            proxy_set_header Host $host;
            proxy_pass http://smart-gateway;
        }
    }
}

This configures NGINX to listen on port 80 and proxy incoming requests to one of the configured Smart Gateway backend instances on port 8080. For more information on configuring NGINX, see NGINX’s load balancing documentation.
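
When the cluster grows, the upstream list has to grow with it. The sketch below regenerates the same nginx.conf for an arbitrary list of Smart Gateway addresses; the function name render_nginx_conf and the IP addresses in the usage note are hypothetical.

```shell
# Sketch: print an nginx.conf that balances across the given Smart Gateway IPs.
render_nginx_conf() {
  # Arguments: one IP address per Smart Gateway instance
  cat <<'EOF'
events {
}

http {
    upstream smart-gateway {
EOF
  # One "server" line per Smart Gateway instance, all on port 8080
  for ip in "$@"; do
    printf '        server %s:8080;\n' "$ip"
  done
  cat <<'EOF'
    }

    server {
        listen 80;

        location / {
            proxy_set_header Host $host;
            proxy_pass http://smart-gateway;
        }
    }
}
EOF
}
```

For example, render_nginx_conf 10.0.2.11 10.0.2.12 10.0.2.13 10.0.2.14 > nginx.conf writes a four-backend configuration. You can then validate it with nginx -t and apply it with nginx -s reload.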

High-availability load balancing

Because a single NGINX load balancer instance is a single point of failure in your deployment, you can deploy multiple NGINX instances with the same configuration (as shown above) and point a round-robin DNS name at them, in the same way as the round-robin DNS name defined for your Etcd cluster.

Smart Gateway deployment and configuration

Deploying and configuring the Smart Gateway is quicker and easier with an external Etcd cluster. The ClusterName and ServerName are always required, and ClusterOperation must be set to client to indicate that the Smart Gateway will connect to (instead of join) the Etcd cluster. To configure the address(es) of the Etcd cluster, use the TargetClusterAddresses list. If you’ve set up a round-robin DNS name or a TCP load balancer for your Etcd cluster instances, you can provide that single address and port in the list of TargetClusterAddresses.

Here is a complete example of a Smart Gateway configuration that connects to an external Etcd cluster:

{
  "ServerName": "smart-gateway-1",
  "ClusterName": "prod",
  "TargetClusterAddresses": [
    "ETCD_CLUSTER_ADDRESS:2379"
  ],
  "LogDir": "/var/log/gateway",
  "ListenFrom": [
    {
      "Type": "signalfx",
      "ListenAddr": "0.0.0.0:8080"
    }
  ],
  "ForwardTo": [
    {
      "Type": "signalfx",
      "URL": "https://ingest.YOUR_SIGNALFX_REALM.signalfx.com/v2/datapoint",
      "EventURL": "https://ingest.YOUR_SIGNALFX_REALM.signalfx.com/v2/event",
      "TraceURL": "https://ingest.YOUR_SIGNALFX_REALM.signalfx.com/v1/trace",
      "DefaultAuthToken": "YOUR_SIGNALFX_API_TOKEN",
      "Name": "smart-gateway-forwarder",
      "TraceSample": {
        "BackupLocation": "/var/lib/gateway/data",
        "IngestAddress": "http://IP_OF_SGW_INSTANCE:8080",
        "ListenRebalanceAddress": "0.0.0.0:2382",
        "AdvertiseRebalanceAddress": "IP_OF_SGW_INSTANCE:2382"
      }
    }
  ]
}

Repeat this configuration for the rest of the Smart Gateway instances in your cluster (replacing IP_OF_SGW_INSTANCE as necessary for each instance).
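
If you keep the example above as a template file, the per-instance substitution can be scripted. The following is a minimal sketch using sed; the function name render_gateway_conf and the file names in the usage note are hypothetical, and giving each instance its own ServerName is an assumption of this example rather than something the configuration above states.

```shell
# Sketch: render a per-instance gateway.conf from a shared template.
# Assumes the template contains the IP_OF_SGW_INSTANCE placeholder and a "ServerName" entry.
# The name and IP address must not contain "/" or "&", which are special to sed.
render_gateway_conf() {
  # $1: template file, $2: ServerName for this instance, $3: IP address of this instance
  sed -e "s/IP_OF_SGW_INSTANCE/$3/g" \
      -e "s/\"ServerName\": \"[^\"]*\"/\"ServerName\": \"$2\"/" "$1"
}
```

For example, render_gateway_conf gateway.conf.tmpl smart-gateway-2 10.0.2.12 > gateway.conf produces the configuration for the second instance.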

The following configuration options related to embedded Etcd operation are not required and should not be specified:

  • ListenOnPeerAddress
  • AdvertisedPeerAddress
  • ListenOnClientAddress
  • AdvertisedClientAddress
  • ETCDMetricsAddress
  • ClusterDataDir
  • UnhealthyMemberTTL

Running the Smart Gateway instances

Because this deployment model uses an external Etcd cluster, the Smart Gateway instances don’t need any specific bootstrapping order or instructions. Just start all your Smart Gateway instances using the client cluster operation (or by setting the SFX_CLUSTER_OPERATION=client environment variable):

$ ./smart-gateway --configfile gateway.conf --cluster-op client

You can check that the Etcd cluster contains the members you expect by querying Etcd directly. For example, a three-node Gateway cluster should show three advertised instances under smrt/:

$ ETCDCTL_API=3 etcdctl --endpoints=http://ETCD_CLUSTER_ADDRESS:2379 get --prefix --keys-only smrt/
smrt/i-00be46af42f8911ed
smrt/i-04f8e00df56d7955d
smrt/i-0dbbb9fa64ee63530
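
The same query can be turned into a simple count, for instance in a deployment check. This is a minimal sketch: the function name count_gateways and the ETCDCTL override variable are hypothetical, and it assumes every advertised instance appears as one key under smrt/, as in the output above.

```shell
# Sketch: count the Smart Gateway instances advertised under smrt/ in Etcd.
# ETCDCTL is a hypothetical override for the binary, useful for testing.
count_gateways() {
  # $1: Etcd client URL(s), for example http://ETCD_CLUSTER_ADDRESS:2379
  # --keys-only prints a blank line after each key, so only lines starting with smrt/ are counted.
  ETCDCTL_API=3 "${ETCDCTL:-etcdctl}" --endpoints="$1" \
    get --prefix --keys-only smrt/ | grep -c '^smrt/'
}
```

Comparing the printed number with the number of instances you started gives a quick signal that every Smart Gateway has registered.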