
Smart Gateway cluster operations

Graceful instance shutdown

To avoid data loss, Smart Gateway instances should be gracefully terminated by sending the process a SIGTERM signal and waiting for the process to terminate. Upon receipt of this signal, the Smart Gateway implements a graceful shutdown procedure that ensures it correctly drains its incoming connections and buffers, and transfers to the rest of the cluster any in-flight trace spans that still need to be analyzed.
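As a minimal sketch of the signal-and-wait sequence described above, the following Python helper sends SIGTERM to a process and blocks until it exits. It assumes the Smart Gateway was launched from the same script as a `subprocess.Popen` child; the function name `graceful_stop` and the 120-second default timeout are illustrative choices, not values taken from the product.

```python
import signal
import subprocess


def graceful_stop(proc: subprocess.Popen, timeout_s: float = 120.0) -> int:
    """Send SIGTERM to proc and wait for it to exit; return its exit code.

    Raises subprocess.TimeoutExpired if the process is still running after
    timeout_s seconds. No SIGKILL fallback is attempted here, because killing
    the process would skip the drain-and-transfer procedure.
    """
    # timeout_s is an arbitrary illustrative default, not a documented value.
    proc.send_signal(signal.SIGTERM)
    return proc.wait(timeout=timeout_s)
```

If the timeout expires, it is probably safer to investigate why the instance is still draining than to force-kill it, since a forced kill may lose in-flight spans.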

Performing a rolling restart or upgrade

Important

Instances with different major versions cannot run in the same Smart Gateway cluster, so a rolling upgrade to a new major version is not possible. To move to a new major version, you need to completely stop the cluster and redeploy it with the new version.

You can restart and upgrade Smart Gateway clusters with no downtime by performing a rolling restart of the Smart Gateway instances. The safest approach to performing a rolling restart is to restart one instance at a time, waiting for the new process to join the cluster and reach steady state before moving on to the next instance. When restarting an instance within an existing established cluster, you should never use the seed cluster operation.

Follow the steps below to perform a rolling restart or upgrade of a Smart Gateway cluster, repeating them for each instance in turn:

  • Identify a node to restart or upgrade.
  • Gracefully shut down the identified node by sending a SIGTERM signal to the Smart Gateway process.
  • Wait for the process to gracefully terminate.
  • Start the new replacement process; ensure that you use either a join or client cluster operation based on your desired cluster topology.
  • Wait for the process to start (HTTP requests to /healthz on the listener port should return 200 OK).
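The steps above can be sketched in Python as follows. `wait_until_healthy` polls `/healthz` until it returns 200 OK, and `rolling_restart` applies the stop, start, and wait sequence to one node at a time. The function names, timeouts, and the `stop`, `start`, and `is_healthy` callables are hypothetical placeholders for whatever your deployment tooling provides. The base URL (host and listener port) is likewise an assumption you must supply.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable, Iterable


def wait_until_healthy(base_url: str, timeout_s: float = 300.0,
                       interval_s: float = 2.0) -> bool:
    """Poll base_url + "/healthz" until it answers 200 OK or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(base_url + "/healthz", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            pass  # not listening yet, or not ready; retry below
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def rolling_restart(nodes: Iterable[str],
                    stop: Callable[[str], None],
                    start: Callable[[str], None],
                    is_healthy: Callable[[str], bool]) -> None:
    """Restart nodes one at a time, halting at the first node that fails."""
    for node in nodes:
        stop(node)   # should block until the old process has exited after SIGTERM
        start(node)  # should use a join or client cluster operation, never seed
        if not is_healthy(node):
            raise RuntimeError(f"{node} did not become healthy; rollout halted")
```

Halting on the first unhealthy node keeps the rest of the cluster untouched, so at most one instance is out of service while you investigate.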

Performing an instance replacement

Small-scale, embedded Etcd clusters

When replacing an instance, or often when restarting an instance in a containerized environment, it is likely that the new instance will not have the same IP address as the instance it replaces. As cluster members’ IP addresses are embedded into the configuration’s TargetClusterAddresses field, it is important to understand the implications of replacing an instance.

Because the cluster members primarily learn about each other through Etcd, the TargetClusterAddresses field does not need to be an always up-to-date and accurate list of the other members’ IP addresses. This list is only used when a Smart Gateway instance starts up and connects to an established cluster; once connected, all other coordination and cluster membership information is exchanged through Etcd. Therefore, the absolute minimum requirement for the TargetClusterAddresses list is that at least one of its IP addresses belongs to a valid, active cluster instance that can be reached to establish this connection.
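Before starting an instance, you may want to confirm that this minimum requirement holds. The sketch below returns the first entry in a list that accepts a TCP connection. It assumes the entries are `host:port` strings, which is an assumption about the field's format rather than something stated here, and the name `first_reachable` is hypothetical. An open TCP port only suggests that something is listening; it does not prove that the peer is an active cluster member.

```python
import socket
from typing import Iterable, Optional


def first_reachable(addresses: Iterable[str],
                    timeout_s: float = 2.0) -> Optional[str]:
    """Return the first "host:port" entry that accepts a TCP connection, else None."""
    for address in addresses:
        # Assumed entry format: "host:port" (bracketed IPv6 hosts are unwrapped).
        host, _, port = address.rpartition(":")
        host = host.strip("[]")
        if not host:
            continue  # malformed entry without a host part
        try:
            with socket.create_connection((host, int(port)), timeout=timeout_s):
                return address
        except (OSError, ValueError):
            continue  # unreachable or malformed entry; try the next one
    return None
```

A `None` result means no listed address could be reached, in which case the new instance would have no way to find the established cluster.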

When replacing an instance, it is therefore not necessary to reconfigure and restart the rest of the cluster to make it aware of this new instance, as this discovery will happen through Etcd for all instances. The TargetClusterAddresses list might of course become stale in long-running instances. If the cluster has changed significantly since the last restart, we recommend that you refresh this part of the configuration with an up-to-date list before (re)starting a Smart Gateway process.
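Refreshing the list before a restart could be sketched as below. The sketch assumes the configuration is a JSON document with a top-level TargetClusterAddresses array; check that assumption against your actual configuration file. The function name `refresh_target_addresses` and the sample addresses are hypothetical.

```python
import json
from typing import List


def refresh_target_addresses(config_text: str, live_members: List[str]) -> str:
    """Return config_text with TargetClusterAddresses replaced by live_members."""
    if not live_members:
        # At least one active member is needed to join an established cluster.
        raise ValueError("at least one active cluster member is required")
    config = json.loads(config_text)  # assumes the configuration is JSON
    config["TargetClusterAddresses"] = list(live_members)
    return json.dumps(config, indent=2)
```

For example, calling `refresh_target_addresses(old_text, ["10.0.0.7:2379", "10.0.0.9:2379"])` returns the same configuration with only that field replaced, leaving every other setting as it was.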

High-scale, external Etcd clusters

For high-scale clusters that leverage a distinct, standalone Etcd cluster, the replacement of a Smart Gateway instance does not require any specific reconfiguration or orchestration. Deploy and configure a new Smart Gateway instance as described in High-Scale Smart Gateway clusters and start it when ready.