Infrastructure Navigator

Click Infrastructure on the navigation bar to view the Infrastructure Navigator, which provides a data-driven visualization of the physical servers, virtual machines, AWS instances, and other resources in your environment that are visible to SignalFx. The resources reflect services you have integrated with SignalFx (see Integrations Guide).

Note on AWS Lambda

The format and content of the Infrastructure Navigator for AWS Lambda differ from what is described below. For more information, see Using the Infrastructure Navigator for AWS Lambda.

../../_images/infra-sidebar.png

The Infrastructure Navigator shows all resources that emit a core set of infrastructure metrics; the metrics shown depend on the service you are viewing. This page gives you an immediate overview of the health of your system. It also lets you navigate through your infrastructure by drilling down for more detail and by viewing related resources or views of the same item in a different context. Lists of resources and active alerts, as well as relevant built-in dashboards, are available in information tabs below the data visualization.

In the Infrastructure Navigator, each square is colored according to the value of the metric chosen in the Color by selector in the control bar above. The colors of the squares range from green (low) to red (high) depending on the value of the metric.

../../_images/infra-legend.png

For information on customizing the content and format of the Infrastructure Navigator, including filtering, grouping, and more, see Customizing the display.

Drilling down

When you hover over a square in the Infrastructure Navigator, you can see information about that resource.

../../_images/hover1.png

Click on a square to zoom in and see details about that resource in the information tabs below the navigator. For example, if you click on the name of a host, the tabs will display various properties of the host, processes running on the host, and so on. The following illustration shows a built-in dashboard with charts that are relevant to the selected host.

../../_images/hover-single.png

Note

The color or statistics for an element may change as you drill down or click through your system. This is because the information may be refreshed between the time you begin navigating and the time a target element is displayed.

While you’re looking at a single resource in drilldown view, you can use the Go to feature to see related resources or views for that resource.

To exit the drilldown view, click the x in the top right corner of the square.

Using the information tabs

A set of tabs below the Infrastructure Navigator provides access to detailed information about the resources displayed. The tabs vary depending on the service you are viewing. For example, tabs may display a list of hosts, a list of current alerts, a dashboard with charts showing values of certain metrics, and so on.

If the host is running services that are monitored via collectd plugins (such as Elasticsearch, Cassandra, or Kafka), the relevant built-in dashboards are listed in tabs as well, as shown below.

../../_images/single-host.png

Customizing the display

The control bar above the Infrastructure Navigator lets you modify which resources are shown, how they are grouped, which metric you are focusing on, and more.

Filter

Use the Filter selector in the control bar to view a specific slice of your environment based on dimension(s) you specify. Filtering is particularly useful for viewing just the resources running a specific service, or in a particular availability zone.
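
Conceptually, a filter keeps only the resources whose dimensions match every value you specify. The following is a minimal sketch of that idea, not SignalFx code: the resource list, the `service` dimension values, and the `filter_resources` helper are all hypothetical and exist only for illustration.

```python
# Hypothetical inventory: each resource is a mapping of dimension name -> value.
RESOURCES = [
    {"host": "web-1", "service": "web", "aws_availability_zone": "us-east-1a"},
    {"host": "web-2", "service": "web", "aws_availability_zone": "us-east-1b"},
    {"host": "db-1", "service": "cassandra", "aws_availability_zone": "us-east-1a"},
]


def filter_resources(resources, **dimensions):
    """Keep the resources that match every requested dimension value."""
    return [
        resource
        for resource in resources
        if all(resource.get(name) == value for name, value in dimensions.items())
    ]


# One dimension: only the resources running a specific service.
web_hosts = filter_resources(RESOURCES, service="web")

# Two dimensions: a specific service in a particular availability zone.
web_in_1a = filter_resources(
    RESOURCES, service="web", aws_availability_zone="us-east-1a"
)
```

Each additional dimension narrows the slice further, which mirrors adding more than one dimension to the Filter selector.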

Group by

Use the Group By button in the control bar to partition resources by the selected dimension. As you hover over or select the different options in the list, the resources immediately rearrange themselves in the Infrastructure Navigator below. You can group hierarchically (up to two levels), although we do not recommend two-level grouping on small screens. For example, in the image below, the hosts have been grouped by the single dimension aws_availability_zone.

../../_images/host-group-by.png

When you have specified a Group By option, you can also click a group name to see a dashboard for that group in the System Metrics tab (see Using the information tabs). Other groups are greyed out to make it clear which group’s dashboard you are viewing.

../../_images/group-by.png
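
As a rough mental model, grouping partitions resources by the value of one dimension, or by a pair of values when you group on two levels. The sketch below illustrates that under stated assumptions: the sample resources, the `service` dimension, and the `group_resources` helper are hypothetical and are not part of SignalFx.

```python
from collections import defaultdict

# Hypothetical inventory: each resource is a mapping of dimension name -> value.
RESOURCES = [
    {"host": "web-1", "service": "web", "aws_availability_zone": "us-east-1a"},
    {"host": "web-2", "service": "web", "aws_availability_zone": "us-east-1b"},
    {"host": "db-1", "service": "cassandra", "aws_availability_zone": "us-east-1a"},
]


def group_resources(resources, *dimensions):
    """Partition host names by one or two dimensions (two levels at most)."""
    if not 1 <= len(dimensions) <= 2:
        raise ValueError("group by one or two dimensions")
    groups = defaultdict(list)
    for resource in resources:
        # The group key is the tuple of this resource's values for each level.
        key = tuple(resource.get(name, "unknown") for name in dimensions)
        groups[key].append(resource["host"])
    return dict(groups)


# Single-level grouping, as in the aws_availability_zone example.
by_zone = group_resources(RESOURCES, "aws_availability_zone")

# Two-level grouping: availability zone first, then service within each zone.
by_zone_and_service = group_resources(RESOURCES, "aws_availability_zone", "service")
```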

Color by

Use the Color By selector in the control bar to specify the metric you want to use to color the squares. Colors range from green (lowest 20% of values among all resources) to red (highest 20% of values among all resources). Note that for many metrics, red does not necessarily indicate a problem situation – it is primarily informational.
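
To make the 20% bands concrete, the following sketch ranks resources by metric value and assigns each one to a fifth of the population. It is an illustration only: the three intermediate color names, the tie handling, and the `color_by` helper are assumptions, since this page specifies only that the lowest 20% is green and the highest 20% is red.

```python
# Assumed five-step palette; only the green and red ends come from this page.
PALETTE = ["green", "yellow-green", "yellow", "orange", "red"]


def color_by(values):
    """Map each resource name to a color based on its rank among all resources."""
    ranked = sorted(values, key=values.get)  # lowest value first
    count = len(ranked)
    colors = {}
    for rank, name in enumerate(ranked):
        bucket = rank * len(PALETTE) // count  # 0 = lowest fifth, 4 = highest fifth
        colors[name] = PALETTE[bucket]
    return colors


# Ten hypothetical hosts with CPU utilization of 10, 20, ..., 100 percent.
cpu = {f"host-{i}": 10 * (i + 1) for i in range(10)}
colors = color_by(cpu)
```

Because the bands are relative to the population, the two busiest hosts in this sample are red regardless of their absolute values, which is why red is informational rather than a sign of a problem.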

You can also color by the “Most severe alert.” SignalFx determines the highest-severity alert that is currently active for each resource and colors the squares from green (no alerts) to red (one or more critical alerts). In this view, red does in fact indicate a problem situation that you will probably want to address. You can start by looking at the Alerts tab below the visualization. For information on working with a list of alerts, see Viewing active alerts on the Alerts page.

Note

White squares indicate that resources are not currently emitting values for the specified metric. Black squares indicate that SignalFx considers the resources “dead” because they have not emitted values for a specified period of time. You can specify settings related to these non-emitting resources by selecting Infrastructure Navigator Settings from the Actions menu. When the resources begin emitting values again, the squares will be colored appropriately.

../../_images/infra-nav-settings.png

Find outliers

Outlier detection uses the real-time analytics of the SignalFx platform. To enable it, select Find Outliers in the control bar. When it is enabled, resources that are determined to be outliers in their group, based on values of the Color by metric, are colored red.

Outliers can be determined using one of two strategies that are common in data analysis:

  • Deviation from population mean
Highlight resources with values significantly above the average value of other resources. This strategy tends to highlight only those resources with the most extreme values, and generally provides meaningful results only when you have a large number of resources (15 or more).
  • Deviation from population median
Highlight resources with values significantly above the median value of other resources. If there are relatively small differences in value among the majority of resources, this strategy tends to highlight any resource that is not part of this majority.

For example, if resources are grouped by the service that they are running, and colored by cpu.utilization, and outlier detection is enabled, then resources that are using significantly more CPU than their peers will be highlighted in red. You can then investigate those specific resources to determine why they are behaving differently.

Both outlier strategies highlight resources that are behaving differently from others, but they differ when the population has two groups of outliers. For example, suppose most resources are running at 20% CPU utilization, but 3 are running at 60% and 1 more is running at 80%. Deviation from mean will find only the greatest outlier (the resource running at 80%), while deviation from median will typically identify both groups. It is easy to switch from one strategy to the other, so you can always use the strategy that works best for your specific environment.
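
The difference between the two strategies can be reproduced with a small sketch. This is not the SignalFx implementation: this page does not define "significantly above", so the sketch assumes a cutoff of 2 standard deviations above the mean and 3 median absolute deviations above the median. The function names, thresholds, and sample data are all hypothetical.

```python
import statistics


def outliers_by_mean(values, k=2.0):
    """Flag values more than k standard deviations above the mean (assumed rule)."""
    numbers = list(values.values())
    mean = statistics.mean(numbers)
    spread = statistics.pstdev(numbers)
    return {name for name, value in values.items() if value > mean + k * spread}


def outliers_by_median(values, k=3.0):
    """Flag values more than k median absolute deviations above the median (assumed rule)."""
    numbers = list(values.values())
    median = statistics.median(numbers)
    mad = statistics.median([abs(value - median) for value in numbers])
    return {name for name, value in values.items() if value > median + k * mad}


# 16 hosts near 20% CPU, 3 hosts at 60%, and 1 host at 80%.
readings = [19, 20, 21, 20] * 4 + [60, 60, 60, 80]
cpu = {f"host-{i:02d}": value for i, value in enumerate(readings)}

mean_outliers = outliers_by_mean(cpu)      # only host-19 (80%)
median_outliers = outliers_by_median(cpu)  # host-16, host-17, host-18 (60%) and host-19
```

The extreme values pull the mean up to 29% and inflate the standard deviation, so the cutoff lands between 60% and 80%. The median stays at 20% and the median absolute deviation stays small, so both groups fall above the cutoff.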

This feature also provides a population selector that lets you restrict the comparison population to only those resources that have similar characteristics (as defined by the Group By dimension). For example, it may not make sense to compare a server against others that are running different software. It would be more relevant to determine outliers among servers providing the same service. To do so, group resources by the service that they are running and then choose that as your population basis. This ensures that resources are compared only with their peers to determine if they are behaving abnormally.
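
The effect of the population basis can be illustrated by running an outlier check once over all resources and once within each group. The sketch below uses an assumed rule (more than 3 median absolute deviations above the median) because this page does not specify how the cutoff is computed. The helper names and sample hosts are hypothetical.

```python
import statistics
from collections import defaultdict


def median_outliers(values, k=3.0):
    """Flag values more than k median absolute deviations above the median (assumed rule)."""
    numbers = list(values.values())
    median = statistics.median(numbers)
    mad = statistics.median([abs(value - median) for value in numbers])
    return {name for name, value in values.items() if value > median + k * mad}


def outliers_within_groups(resources, k=3.0):
    """Compare each resource only with peers that share its group."""
    populations = defaultdict(dict)
    for name, group, value in resources:
        populations[group][name] = value
    flagged = set()
    for members in populations.values():
        flagged |= median_outliers(members, k)
    return flagged


# Hypothetical hosts as (name, service, CPU %). Database hosts normally run hot.
HOSTS = [
    ("web-1", "web", 18), ("web-2", "web", 19), ("web-3", "web", 20),
    ("web-4", "web", 21), ("web-5", "web", 22), ("web-6", "web", 60),
    ("db-1", "db", 68), ("db-2", "db", 69), ("db-3", "db", 70),
    ("db-4", "db", 71), ("db-5", "db", 72),
]

# One population: the busy database hosts mask the unusual web host.
global_outliers = median_outliers({name: value for name, _, value in HOSTS})

# Population restricted to each service: web-6 stands out among its peers.
grouped_outliers = outliers_within_groups(HOSTS)
```

In this sample, comparing every host against every other host flags nothing, while comparing within each service flags only web-6.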