µAPM Concepts and Terminology

Important

The original µAPM product, released in 2019, is now called µAPM Previous Generation (µAPM PG). In the documentation, µAPM now refers to the product released on March 31, 2020.

If you are using µAPM Previous Generation (µAPM PG), see µAPM PG Traces, Spans, Metrics, and Metadata.

About traces and spans

A trace is a collection of spans that share the same trace ID, representing a unique transaction handled by your application and its constituent services.

Each span has a name, representing the operation captured by the span, and a service name, identifying the service in which the operation took place. Additionally, a span may reference another span as its parent, which defines the relationships between the operations that were performed to process the transaction captured in the trace.

Each span records details about the method, operation, or block of code that it captures, including:

  • the operation name
  • the start time of the operation with microsecond precision
  • how long the operation took to execute, also with microsecond precision
  • the logical name of the service on which the operation took place
  • the IP address of the service instance on which the operation took place
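As a rough illustration of the model described above, the following Python sketch represents a span with the listed properties and groups spans into traces by trace ID. The class and field names (Span, start_us, duration_us, and so on) are hypothetical and chosen for this example; they are not the µAPM wire format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """Hypothetical span record; field names are illustrative only."""
    trace_id: str
    span_id: str
    name: str          # operation name
    service: str       # logical name of the service
    start_us: int      # start time, in microseconds since the epoch
    duration_us: int   # duration, in microseconds
    ip: str            # IP address of the service instance
    parent_id: Optional[str] = None  # span ID of the parent, if any


def group_by_trace(spans: List[Span]) -> Dict[str, List[Span]]:
    """A trace is the set of spans that share the same trace ID."""
    traces: Dict[str, List[Span]] = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return dict(traces)


def children_of(trace: List[Span], parent_id: Optional[str]) -> List[Span]:
    """Return the spans whose parent is parent_id (None selects root spans)."""
    return [span for span in trace if span.parent_id == parent_id]
```

Calling children_of(trace, None) returns the root span or spans of a trace, and calling it again with a root's span_id walks one level down the parent-child relationships.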

Span tags

Beyond their intrinsic properties such as name, timestamp, and duration, trace spans carry additional, free-form metadata that provides information and context about the operations they represent. You can use this metadata to query and filter traces, or to get extra information about each operation when inspecting the spans of a trace during troubleshooting.

Span tags are key-value pairs attached to the span to carry additional context about the operation. Span tags are kept as a map of keys to values; tag keys must therefore be unique within a given span. Both keys and values are text strings and are mostly free-form, but tag keys typically follow predefined ontologies and conventions to ensure that the same information is captured by the same tag names across spans.

Standard tags such as component, http.method, http.url, and db.statement are included by default where applicable, and developers can add custom tags through code instrumentation.
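The sketch below models span tags as a plain string-to-string map, to show why tag keys are unique within a span and how tags can be used to filter spans. The helper names (set_tag, matches, filter_spans) and the customer.tier tag are hypothetical; this is not the API of any instrumentation library.

```python
from typing import Dict, Iterable, List


def set_tag(tags: Dict[str, str], key: str, value: object) -> None:
    """Store a tag; both keys and values are kept as text strings."""
    tags[str(key)] = str(value)


def matches(tags: Dict[str, str], wanted: Dict[str, str]) -> bool:
    """True when every wanted key is present with the wanted value."""
    return all(tags.get(key) == value for key, value in wanted.items())


def filter_spans(all_tags: Iterable[Dict[str, str]],
                 wanted: Dict[str, str]) -> List[Dict[str, str]]:
    """Keep only the tag maps that match the wanted key-value pairs."""
    return [tags for tags in all_tags if matches(tags, wanted)]


span_tags: Dict[str, str] = {}
set_tag(span_tags, "http.method", "GET")
set_tag(span_tags, "customer.tier", "gold")   # hypothetical custom tag
set_tag(span_tags, "http.method", "POST")     # same key: replaces "GET"
```

Because the tags are a map, the second http.method assignment replaces the first instead of adding a duplicate key.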

Span tag naming conventions

Tags work best when you can construct a simple, dependable, and clear metadata model with them. As such, you should strive to define clear tag names that are used throughout your applications to carry the same information. Although there is no common standard or convention around tag names, OpenTracing has laid out some proposals around common tag names and their semantic definitions.

Library and framework instrumentation, as well as automatic instrumentation, use those tag names and their meanings to carry additional information about the operations and requests being traced. Whenever one of those tag names corresponds to the type of information you are capturing in a tag, you should use it. Otherwise, you can define your own tag names; when you do, we recommend that you use them consistently.

The host tag

When following the recommended deployment model (see µAPM Getting Started), trace spans emitted by your applications are sent to the SignalFx Smart Agent, which is responsible for automatically adding a host tag to every span captured on that host. This allows SignalFx to identify on which piece of infrastructure each trace span was executed, render the corresponding key infrastructure metrics, and link to more complete dashboards for the host. The value of the host tag is typically the hostname of the underlying instance, or its unique resource identifier.

Additional tags

In addition to the host tag discussed above, the Smart Agent automatically adds a few span tags to every span captured on that host. Those additional tags allow SignalFx to identify on which piece of infrastructure each trace span was executed, render the corresponding key infrastructure metrics, and link to more complete dashboards for the underlying host or Kubernetes pod.

  • An AWSUniqueId or gcp_id tag is added when the host is an AWS or a GCP instance, respectively, to provide its globally unique cloud-provider resource identifier.
  • A container_id tag is added when the instrumented application is running in a Docker container.
  • A kubernetes_pod_uid tag is added when the instrumented application is containerized and deployed in a Kubernetes cluster.
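To make the enrichment step concrete, here is a minimal sketch of how the host tag and the additional tags listed above could be merged into a span's tags. The tag names come from this page; the function add_infrastructure_tags, its parameters, and the choice to overwrite existing values are assumptions made for illustration, not the Smart Agent's actual implementation.

```python
from typing import Dict, Optional


def add_infrastructure_tags(
    tags: Dict[str, str],
    host: str,
    aws_unique_id: Optional[str] = None,
    gcp_id: Optional[str] = None,
    container_id: Optional[str] = None,
    kubernetes_pod_uid: Optional[str] = None,
) -> Dict[str, str]:
    """Return a copy of tags with infrastructure tags added.

    Assumption: values supplied here overwrite any tag of the same name
    that is already on the span.
    """
    enriched = dict(tags)
    enriched["host"] = host  # always added
    optional = {
        "AWSUniqueId": aws_unique_id,              # AWS instances only
        "gcp_id": gcp_id,                          # GCP instances only
        "container_id": container_id,              # Docker containers only
        "kubernetes_pod_uid": kubernetes_pod_uid,  # Kubernetes pods only
    }
    for key, value in optional.items():
        if value is not None:
            enriched[key] = value
    return enriched
```

A span emitted from a containerized application on a plain host would, under these assumptions, gain only the host and container_id tags.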

Limits on span metadata & traces

There is no limit to the number of span tags or span annotations attached to each span. There is, however, a limit to the total cumulative size of this metadata: the total length of all tag keys, tag values, and span annotations may not exceed 64 kB.

Also, the maximum number of spans in any given trace is capped at 5000.
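The two limits above can be checked before sending data, as in the following sketch. It assumes that 64 kB means 65,536 bytes and that lengths are measured on UTF-8 encoded text; this page does not specify either detail, and the function names are hypothetical.

```python
from typing import Dict, List

MAX_METADATA_BYTES = 64 * 1024   # assumption: 64 kB read as 65,536 bytes
MAX_SPANS_PER_TRACE = 5000


def metadata_size(tags: Dict[str, str], annotations: List[str]) -> int:
    """Cumulative size of tag keys, tag values, and annotations.

    Assumption: size is counted in bytes of UTF-8 encoded text.
    """
    total = sum(len(key.encode("utf-8")) + len(value.encode("utf-8"))
                for key, value in tags.items())
    total += sum(len(text.encode("utf-8")) for text in annotations)
    return total


def within_span_limit(tags: Dict[str, str], annotations: List[str]) -> bool:
    """True when a span's metadata does not exceed the cumulative limit."""
    return metadata_size(tags, annotations) <= MAX_METADATA_BYTES


def within_trace_limit(span_count: int) -> bool:
    """True when a trace holds no more spans than the per-trace cap."""
    return span_count <= MAX_SPANS_PER_TRACE
```

Note that the number of tags is never checked, only their combined size, which mirrors the rule that tag count is unlimited while cumulative metadata size is not.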

APM identities and MetricSets

In addition to retaining 100% of traces in their raw form, with spans and tags, µAPM generates sets of metrics for real-time monitoring and alerting, as well as for high-cardinality troubleshooting. These sets can be generated for each APM Identity, which is defined below.

APM identity

An APM Identity is defined as a unique combination of an APM Object and indexed span tag value(s).

An APM Object can be one of the following, and always includes at least one service.

  • Service: e.g., Service-1
  • Endpoint (first span into a service): e.g., Service-1.Endpoint-1
  • Operation (intra-service span): e.g., Service-1.Operation-1
  • Edge (inter-service): e.g., Service-1.Endpoint-1->Service-2.Endpoint-2
  • Workflow (endpoint where a trace initiates): e.g., Service-1.InitEndpoint-1

Identity examples

  • Identity for Service 1
    • Service-1
  • Service 1 identity in Environment A and Environment B creates additional identities, resulting in up to four identities:
    • Service-1
    • Service‑1.Unknown, Service‑1.Environment‑A, Service‑1.Environment‑B
  • Service 1 identity in Environment A and Environment B with two release versions in each environment (for example, Release Version 1 and Release Version 2) can result in up to thirteen identities:
    • Service-1
    • Service‑1.Unknown, Service‑1.Environment‑A, Service‑1.Environment‑B
    • Service‑1.Environment‑A.Unknown, Service‑1.Environment‑A.ReleaseVersion‑1, Service‑1.Environment‑A.ReleaseVersion‑2
    • Service‑1.Environment‑B.Unknown, Service‑1.Environment‑B.ReleaseVersion‑1, Service‑1.Environment‑B.ReleaseVersion‑2
    • Service‑1.Unknown.Unknown, Service‑1.Unknown.ReleaseVersion‑1, Service‑1.Unknown.ReleaseVersion‑2
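The counts in these examples (four and thirteen) can be reproduced with a short sketch. It assumes that each indexed tag contributes its observed values plus an Unknown value, and that an identity exists for every prefix of the tag combination, which matches the lists above. The function enumerate_identities is hypothetical and gives the upper bound ("up to"), not necessarily the identities that µAPM materializes.

```python
from itertools import product
from typing import List, Sequence


def enumerate_identities(apm_object: str,
                         dimensions: Sequence[Sequence[str]]) -> List[str]:
    """List the identities for one APM Object.

    dimensions holds the observed values of each indexed span tag, in
    order (for example environments first, then release versions).
    """
    identities = [apm_object]  # the object on its own is always an identity
    for depth in range(1, len(dimensions) + 1):
        # Each tag can also be absent, which is reported as "Unknown".
        levels = [list(values) + ["Unknown"] for values in dimensions[:depth]]
        for combo in product(*levels):
            identities.append(".".join((apm_object,) + combo))
    return identities


environments = ["Environment-A", "Environment-B"]
versions = ["ReleaseVersion-1", "ReleaseVersion-2"]

by_env = enumerate_identities("Service-1", [environments])
by_env_and_version = enumerate_identities("Service-1", [environments, versions])
```

With two environments the count is 1 + 3 = 4; adding two release versions gives 1 + 3 + (3 × 3) = 13.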

MetricSets

You may be familiar with the SignalFx concepts of metrics and metric time series, which are used to populate charts and generate alerts. In the context of APM, we use different terminology for metrics: Monitoring MetricSets and Troubleshooting MetricSets.

  • Monitoring MetricSets are used for real-time monitoring & alerting. These are essentially counterparts of the original term “metric time series.”
  • Troubleshooting MetricSets are used in the µAPM UI for filtering service-graphs and breaking down SLIs to enable historical comparison for spans and workflows.

By default, each identity has Troubleshooting MetricSets, but may not have Monitoring MetricSets.

Troubleshooting MetricSets are stored for eight days by default, along with full-fidelity traces, and do not require an accompanying Monitoring MetricSet. Each Troubleshooting MetricSet has the following metrics available for troubleshooting in the µAPM UI:

  • Request-Rate
  • Error-Rate
  • Root-Cause Error-Rate
  • Latency: Min, Max, P50, P90, P99

Monitoring MetricSets are stored for thirteen months by default. Each Monitoring MetricSet has the following metrics available for monitoring and alerting:

  • Request-Rate
  • Error-Rate
  • Latency: Min, Max, P50, P90, P99

Out-of-the-Box Experience

By default, µAPM creates the following identities and associated MetricSets.

| APM Object | OOB Description | Identity Example | Troubleshooting MetricSet | Monitoring MetricSet |
| --- | --- | --- | --- | --- |
| Service | Identities for all services | Service‑1 | Yes | Yes |
| Endpoint | Identities for all endpoints | Service‑1.Endpoint‑1.HTTPMethod (or Service‑1.InitEndpoint‑1 if HTTPMethod is absent) | Yes | Yes |
| Workflow | Identities for all initiating endpoints | Service‑1.InitEndpoint‑1.HTTPMethod (or Service‑1.InitEndpoint‑1 if HTTPMethod is absent) | Yes | Yes |
| Edge | Identities for all edges between services | Service‑1.Endpoint‑1.HTTPMethod‑>Service‑2.Endpoint‑2.HTTPMethod | Yes | No |
| Operation | Identities for all spans within services | Service‑1.Operation‑1 | No | No |

The resulting total number of Troubleshooting MetricSets out of the box (OOB) is the sum of all identities, while the resulting total number of Monitoring MetricSets OOB is the sum of the Service, Endpoint, and Workflow identities.
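As a worked example of this arithmetic, the sketch below encodes the Yes/No columns of the out-of-the-box table and totals the MetricSets for a given number of identities per APM Object. It reads the table literally, so Operation identities contribute to neither total; the identity counts in the example are invented for illustration.

```python
from typing import Dict, Tuple

# (has Troubleshooting MetricSet, has Monitoring MetricSet), per the table.
OOB_METRICSETS: Dict[str, Tuple[bool, bool]] = {
    "Service": (True, True),
    "Endpoint": (True, True),
    "Workflow": (True, True),
    "Edge": (True, False),
    "Operation": (False, False),
}


def oob_metricset_totals(identity_counts: Dict[str, int]) -> Tuple[int, int]:
    """Return (troubleshooting_total, monitoring_total) for the given counts."""
    troubleshooting = sum(count for obj, count in identity_counts.items()
                          if OOB_METRICSETS[obj][0])
    monitoring = sum(count for obj, count in identity_counts.items()
                     if OOB_METRICSETS[obj][1])
    return troubleshooting, monitoring


# Hypothetical deployment: 3 services, 10 endpoints, 4 workflows, 5 edges.
example_counts = {"Service": 3, "Endpoint": 10, "Workflow": 4, "Edge": 5,
                  "Operation": 20}
totals = oob_metricset_totals(example_counts)
```

For these counts, the Troubleshooting total is 3 + 10 + 4 + 5 = 22 and the Monitoring total is 3 + 10 + 4 = 17.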

Note

The presence of multiple environments creates additional identities for each of the APM Objects above, as described in Identity examples. For two environments, there are identities for each environment in addition to the identities representing SLIs across both environments.