
MongoDB Atlas

DESCRIPTION

This integration primarily consists of the Smart Agent monitor mongodb-atlas. Below is an overview of that monitor.

Smart Agent Monitor

MongoDB Atlas provides MongoDB as an on-demand, fully managed service. Atlas exposes MongoDB cluster monitoring and logging data through its monitoring and logs REST API endpoints. These Atlas monitoring API resources are grouped into measurements for MongoDB processes, host disks, and MongoDB databases.

This monitor repeatedly scrapes MongoDB monitoring data from Atlas at the configured time interval. It scrapes the process and disk measurements into metric groups called mongodb and hardware; the original measurement names are included in the metric descriptions. For each measurement, a set of data points is fetched at the configured granularity and period, and the metric value is set to the latest non-empty data point in the set. The finest granularity supported by Atlas is 1 minute. The configured period needs to be wider than the interval at which Atlas provides values for a measurement; otherwise some of the fetched data point sets will contain only empty values. The default period of 20 minutes works across all measurements and keeps the response payload size reasonable. For example, with the default granularity of PT1M and period of PT20M, each scrape requests up to 20 one-minute data points per measurement and reports the latest non-empty one.

Below is an excerpt of the agent configuration YAML showing the minimal required fields. Note that disableHostDimensions is set to true so that the name of the host on which the agent/monitor runs is not used as the host metric dimension value; the names of the MongoDB cluster hosts from which the metrics emanate are used instead.

monitors:
- type: mongodb-atlas
  projectID:  <Project ID>
  publicKey:  <Public API Key>
  privateKey: <Private API Key>
  disableHostDimensions: true
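
If needed, the optional options documented in the CONFIGURATION section below can be added to the same block. A sketch showing them with their documented defaults:

monitors:
- type: mongodb-atlas
  projectID:  <Project ID>
  publicKey:  <Public API Key>
  privateKey: <Private API Key>
  disableHostDimensions: true
  timeout: 5s          # HTTP request timeout, a Go duration string
  enableCache: true    # measurements are always fetched asynchronously and cached
  granularity: PT1M    # ISO 8601 duration; 1 minute is the finest Atlas supports
  period: PT20M        # ISO 8601 duration; how far back to fetch measurements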

INSTALLATION

This integration is part of the SignalFx Smart Agent as the mongodb-atlas monitor. You should first deploy the Smart Agent to the same host as the service you want to monitor, and then continue with the configuration instructions below.

CONFIGURATION

To activate this monitor in the Smart Agent, add the following to your agent config:

monitors:  # All monitor config goes under this key
 - type: mongodb-atlas
   ...  # Additional config

For a list of monitor options that are common to all monitors, see Common Configuration.

Config option | Required | Type | Description
projectID | yes | string | ProjectID is the Atlas project ID.
publicKey | yes | string | PublicKey is the Atlas public API key.
privateKey | yes | string | PrivateKey is the Atlas private API key.
timeout | no | integer | Timeout for HTTP requests to get MongoDB process measurements from Atlas. This should be a duration string accepted by https://golang.org/pkg/time/#ParseDuration. (default: 5s)
enableCache | no | bool | EnableCache enables the use of locally cached Atlas metric measurements when true. The measurements to be fetched are in fact always fetched asynchronously and cached. (default: true)
granularity | no | string | Granularity is the duration in ISO 8601 notation that specifies the interval between measurement data points from Atlas over the configured period. The default is the shortest duration supported by Atlas, 1 minute. (default: PT1M)
period | no | string | Period is the duration in ISO 8601 notation that specifies how far back in the past to retrieve measurements from Atlas. (default: PT20M)

USAGE

Below are screen captures of dashboards created for this monitor by SignalFx, illustrating the metrics it emits.

For general reference on how to monitor Atlas MongoDB clusters, see Monitor Your Cluster.

Monitoring an Atlas MongoDB replica set

Writes to MongoDB require the use of the global lock. If lock utilization is high, operations can begin to slow down. This can be a symptom of database issues such as poorly configured or absent indexes, or a schema design that needs improvement. It can also indicate the failure of a disk. Monitor the number of readers and writers waiting for the lock with the metric global_lock.current_queue.total.

[Image: lock queue]

This lock has little utilization and few queued readers and writers.

When analyzing the performance of a MongoDB cluster, it's important to verify that the load is balanced across each instance. The replica sets dashboard included with this integration contains many list charts of MongoDB instances ordered by important metrics like requests per second (network.num_requests) and the number of connections to MongoDB (connections.current). This can help you compare load between instances. Load imbalance can arise in a sharded cluster if MongoDB is unable to balance chunks equally between the shards, for example if lock utilization is high.

[Image: top hosts by requests and connections]

All the listed instances show about the same requests per second and number of connections. Their load is balanced.

Monitoring an Atlas MongoDB process

On an individual process level, it's important to monitor system statistics like memory usage, page faults, and disk I/O utilization.

It is important to compare the amount of memory that MongoDB has allocated to the amount of system memory. This monitor reports resident memory usage in mem.resident and virtual memory usage in mem.virtual.

[Image: memory statistics from MongoDB]

This MongoDB process is not using a large amount of resident memory.

This monitor reports page faults in extra_info.page_faults. Page faults indicate that reads or writes are occurring to data files that are not currently in memory. This is different from an OS page fault. Sudden increases in MongoDB page faults can indicate that a large read operation is taking place. Steadily high numbers of page faults indicate that MongoDB is reading more often from disk than is optimal.

[Image: page fault statistics from MongoDB]

This MongoDB process has a low rate of page faults. This means that most of the data MongoDB needs to access is in memory, and doesn't need to be fetched from disk.

METRICS

Metric Name | Description | Type
asserts.msg | Atlas metric measurement ASSERT_MSG | counter
asserts.regular | Atlas metric measurement ASSERT_REGULAR | counter
asserts.user | Atlas metric measurement ASSERT_USER | counter
asserts.warning | Atlas metric measurement ASSERT_WARNING | counter
background_flush_avg | Atlas metric measurement BACKGROUND_FLUSH_AVG | counter
cache.bytes.read_into | Atlas metric measurement CACHE_BYTES_READ_INTO | counter
cache.bytes.written_from | Atlas metric measurement CACHE_BYTES_WRITTEN_FROM | counter
cache.dirty_bytes | Atlas metric measurement CACHE_DIRTY_BYTES | gauge
cache.used_bytes | Atlas metric measurement CACHE_USED_BYTES | gauge
connections.current | Atlas metric measurement CONNECTIONS | gauge
cursors.timed_out | Atlas metric measurement CURSORS_TOTAL_TIMED_OUT | counter
cursors.total_open | Atlas metric measurement CURSORS_TOTAL_OPEN | gauge
data_size | Atlas metric measurement DB_DATA_SIZE_TOTAL | gauge
disk.partition.iops.read | Atlas metric measurement DISK_PARTITION_IOPS_READ | counter
disk.partition.iops.total | Atlas metric measurement DISK_PARTITION_IOPS_TOTAL | counter
disk.partition.iops.write | Atlas metric measurement DISK_PARTITION_IOPS_WRITE | counter
disk.partition.latency.read | Atlas metric measurement DISK_PARTITION_LATENCY_READ | gauge
disk.partition.latency.write | Atlas metric measurement DISK_PARTITION_LATENCY_WRITE | gauge
disk.partition.space.free | Atlas metric measurement DISK_PARTITION_SPACE_FREE | gauge
disk.partition.space.percent_free | Atlas metric measurement DISK_PARTITION_SPACE_PERCENT_FREE | gauge
disk.partition.space.percent_used | Atlas metric measurement DISK_PARTITION_SPACE_PERCENT_USED | gauge
disk.partition.space.used | Atlas metric measurement DISK_PARTITION_SPACE_USED | gauge
disk.partition.utilization | Atlas metric measurement DISK_PARTITION_UTILIZATION | gauge
document.metrics.deleted | Atlas metric measurement DOCUMENT_METRICS_DELETED | counter
document.metrics.inserted | Atlas metric measurement DOCUMENT_METRICS_INSERTED | counter
document.metrics.returned | Atlas metric measurement DOCUMENT_METRICS_RETURNED | counter
document.metrics.updated | Atlas metric measurement DOCUMENT_METRICS_UPDATED | counter
extra_info.page_faults | Atlas metric measurement EXTRA_INFO_PAGE_FAULTS | counter
global_lock.current_queue.readers | Atlas metric measurement GLOBAL_LOCK_CURRENT_QUEUE_READERS | gauge
global_lock.current_queue.total | Atlas metric measurement GLOBAL_LOCK_CURRENT_QUEUE_TOTAL | gauge
global_lock.current_queue.writers | Atlas metric measurement GLOBAL_LOCK_CURRENT_QUEUE_WRITERS | gauge
index_size | Atlas metric measurement DB_INDEX_SIZE_TOTAL | gauge
mem.mapped | Atlas metric measurement MEMORY_MAPPED | gauge
mem.resident | Atlas metric measurement MEMORY_RESIDENT | gauge
mem.virtual | Atlas metric measurement MEMORY_VIRTUAL | gauge
network.bytes_in | Atlas metric measurement NETWORK_BYTES_IN | gauge
network.bytes_out | Atlas metric measurement NETWORK_BYTES_OUT | gauge
network.num_requests | Atlas metric measurement NETWORK_NUM_REQUESTS | counter
op.execution.time.commands | Atlas metric measurement OP_EXECUTION_TIME_COMMANDS | gauge
op.execution.time.reads | Atlas metric measurement OP_EXECUTION_TIME_READS | gauge
op.execution.time.writes | Atlas metric measurement OP_EXECUTION_TIME_WRITES | gauge
opcounter.command | Atlas metric measurement OPCOUNTER_CMD | counter
opcounter.delete | Atlas metric measurement OPCOUNTER_DELETE | counter
opcounter.getmore | Atlas metric measurement OPCOUNTER_GETMORE | counter
opcounter.insert | Atlas metric measurement OPCOUNTER_INSERT | counter
opcounter.query | Atlas metric measurement OPCOUNTER_QUERY | counter
opcounter.repl.command | Atlas metric measurement OPCOUNTER_REPL_CMD | counter
opcounter.repl.delete | Atlas metric measurement OPCOUNTER_REPL_DELETE | counter
opcounter.repl.insert | Atlas metric measurement OPCOUNTER_REPL_INSERT | counter
opcounter.repl.update | Atlas metric measurement OPCOUNTER_REPL_UPDATE | counter
opcounter.update | Atlas metric measurement OPCOUNTER_UPDATE | counter
operations_scan_and_order | Atlas metric measurement OPERATIONS_SCAN_AND_ORDER | counter
oplog.master.lag_time_diff | Atlas metric measurement OPLOG_MASTER_LAG_TIME_DIFF | gauge
oplog.master.time | Atlas metric measurement OPLOG_MASTER_TIME | gauge
oplog.rate | Atlas metric measurement OPLOG_RATE_GB_PER_HOUR | gauge
oplog.slave.lag_master_time | Atlas metric measurement OPLOG_SLAVE_LAG_MASTER_TIME | gauge
process.cpu.kernel | Atlas metric measurement PROCESS_CPU_KERNEL | gauge
process.cpu.user | Atlas metric measurement PROCESS_CPU_USER | gauge
process.normalized.cpu.children_kernel | Atlas metric measurement PROCESS_NORMALIZED_CPU_CHILDREN_KERNEL | gauge
process.normalized.cpu.children_user | Atlas metric measurement PROCESS_NORMALIZED_CPU_CHILDREN_USER | gauge
process.normalized.cpu.kernel | Atlas metric measurement PROCESS_NORMALIZED_CPU_KERNEL | gauge
process.normalized.cpu.user | Atlas metric measurement PROCESS_NORMALIZED_CPU_USER | gauge
query.executor.scanned | Atlas metric measurement QUERY_EXECUTOR_SCANNED | counter
query.executor.scanned_objects | Atlas metric measurement QUERY_EXECUTOR_SCANNED_OBJECTS | counter
query.targeting.scanned_objects_per_returned | Atlas metric measurement QUERY_TARGETING_SCANNED_OBJECTS_PER_RETURNED | gauge
query.targeting.scanned_per_returned | Atlas metric measurement QUERY_TARGETING_SCANNED_PER_RETURNED | gauge
storage_size | Atlas metric measurement DB_STORAGE_TOTAL | gauge
system.cpu.guest | Atlas metric measurement SYSTEM_CPU_GUEST | gauge
system.cpu.iowait | Atlas metric measurement SYSTEM_CPU_IOWAIT | gauge
system.cpu.irq | Atlas metric measurement SYSTEM_CPU_IRQ | gauge
system.cpu.kernel | Atlas metric measurement SYSTEM_CPU_KERNEL | gauge
system.cpu.nice | Atlas metric measurement SYSTEM_CPU_NICE | gauge
system.cpu.softirq | Atlas metric measurement SYSTEM_CPU_SOFTIRQ | gauge
system.cpu.steal | Atlas metric measurement SYSTEM_CPU_STEAL | gauge
system.cpu.user | Atlas metric measurement SYSTEM_CPU_USER | gauge
system.normalized.cpu.guest | Atlas metric measurement SYSTEM_NORMALIZED_CPU_GUEST | gauge
system.normalized.cpu.iowait | Atlas metric measurement SYSTEM_NORMALIZED_CPU_IOWAIT | gauge
system.normalized.cpu.irq | Atlas metric measurement SYSTEM_NORMALIZED_CPU_IRQ | gauge
system.normalized.cpu.kernel | Atlas metric measurement SYSTEM_NORMALIZED_CPU_KERNEL | gauge
system.normalized.cpu.nice | Atlas metric measurement SYSTEM_NORMALIZED_CPU_NICE | gauge
system.normalized.cpu.softirq | Atlas metric measurement SYSTEM_NORMALIZED_CPU_SOFTIRQ | gauge
system.normalized.cpu.steal | Atlas metric measurement SYSTEM_NORMALIZED_CPU_STEAL | gauge
system.normalized.cpu.user | Atlas metric measurement SYSTEM_NORMALIZED_CPU_USER | gauge
tickets.available.reads | Atlas metric measurement TICKETS_AVAILABLE_READS | gauge
tickets.available.write | Atlas metric measurement TICKETS_AVAILABLE_WRITE | gauge

The table above lists the metrics available for this integration; they are organized into the two groups below, with a full description for each metric.

Group hardware

All of the following metrics are part of the hardware metric group. All of the non-default metrics below can be turned on by adding hardware to the monitor config option extraGroups (see the example after this list):

  • disk.partition.iops.read (counter)
    This is Atlas metric measurement DISK_PARTITION_IOPS_READ. The read throughput of I/O operations per second for the disk partition used for MongoDB.
  • disk.partition.iops.total (counter)
    This is Atlas metric measurement DISK_PARTITION_IOPS_TOTAL. The total throughput of I/O operations per second for the disk partition used for MongoDB.
  • disk.partition.iops.write (counter)
    This is Atlas metric measurement DISK_PARTITION_IOPS_WRITE. The write throughput of I/O operations per second for the disk partition used for MongoDB.
  • disk.partition.latency.read (gauge)
    This is Atlas metric measurement DISK_PARTITION_LATENCY_READ. The read latency in milliseconds of the disk partition used by MongoDB.
  • disk.partition.latency.write (gauge)
    This is Atlas metric measurement DISK_PARTITION_LATENCY_WRITE. The write latency in milliseconds of the disk partition used by MongoDB.
  • disk.partition.space.free (gauge)
    This is Atlas metric measurement DISK_PARTITION_SPACE_FREE. The total bytes of free disk space on the disk partition used by MongoDB.
  • disk.partition.space.percent_free (gauge)
    This is Atlas metric measurement DISK_PARTITION_SPACE_PERCENT_FREE. The percent of free disk space on the partition used by MongoDB.
  • disk.partition.space.percent_used (gauge)
    This is Atlas metric measurement DISK_PARTITION_SPACE_PERCENT_USED. The percent of used disk space on the partition that runs MongoDB.
  • disk.partition.space.used (gauge)
    This is Atlas metric measurement DISK_PARTITION_SPACE_USED. The total bytes of used disk space on the partition that runs MongoDB.
  • disk.partition.utilization (gauge)
    This is Atlas metric measurement DISK_PARTITION_UTILIZATION. The percentage of time during which requests are being issued to and serviced by the partition. This includes requests from any process, not just MongoDB processes.
  • process.cpu.kernel (gauge)
    This is Atlas metric measurement PROCESS_CPU_KERNEL. The percentage of time the CPU spent servicing operating system calls for this MongoDB process. For servers with more than 1 CPU core, this value can exceed 100%.
  • process.cpu.user (gauge)
    This is Atlas metric measurement PROCESS_CPU_USER. The percentage of time the CPU spent servicing this MongoDB process. For servers with more than 1 CPU core, this value can exceed 100%.
  • process.normalized.cpu.children_kernel (gauge)
This is Atlas metric measurement PROCESS_NORMALIZED_CPU_CHILDREN_KERNEL. The percentage of time the CPU spent servicing operating system calls for this MongoDB process's children, scaled to a range of 0-100% by dividing by the number of CPU cores.
  • process.normalized.cpu.children_user (gauge)
This is Atlas metric measurement PROCESS_NORMALIZED_CPU_CHILDREN_USER. The percentage of time the CPU spent servicing this MongoDB process's children, scaled to a range of 0-100% by dividing by the number of CPU cores.
  • process.normalized.cpu.kernel (gauge)
    This is Atlas metric measurement PROCESS_NORMALIZED_CPU_KERNEL. The percentage of time the CPU spent servicing operating system calls for this MongoDB process, scaled to a range of 0-100% by dividing by the number of CPU cores.
  • process.normalized.cpu.user (gauge)
    This is Atlas metric measurement PROCESS_NORMALIZED_CPU_USER. The percentage of time the CPU spent servicing this MongoDB process, scaled to a range of 0-100% by dividing by the number of CPU cores.
  • system.cpu.guest (gauge)
    This is Atlas metric measurement SYSTEM_CPU_GUEST. The percentage of time the CPU spent servicing guest, which is included in user. For servers with more than 1 CPU core, this value can exceed 100%.
  • system.cpu.iowait (gauge)
    This is Atlas metric measurement SYSTEM_CPU_IOWAIT. The percentage of time the CPU spent waiting for IO operations to complete. For servers with more than 1 CPU core, this value can exceed 100%.
  • system.cpu.irq (gauge)
    This is Atlas metric measurement SYSTEM_CPU_IRQ. The percentage of time the CPU spent performing hardware interrupts. For servers with more than 1 CPU core, this value can exceed 100%.
  • system.cpu.kernel (gauge)
    This is Atlas metric measurement SYSTEM_CPU_KERNEL. The percentage of time the CPU spent servicing operating system calls from all processes. For servers with more than 1 CPU core, this value can exceed 100%.
  • system.cpu.nice (gauge)
    This is Atlas metric measurement SYSTEM_CPU_NICE. The percentage of time the CPU spent occupied by all processes with a positive nice value. For servers with more than 1 CPU core, this value can exceed 100%.
  • system.cpu.softirq (gauge)
    This is Atlas metric measurement SYSTEM_CPU_SOFTIRQ. The percentage of time the CPU spent performing software interrupts. For servers with more than 1 CPU core, this value can exceed 100%.
  • system.cpu.steal (gauge)
    This is Atlas metric measurement SYSTEM_CPU_STEAL. The percentage of time the CPU had something runnable, but the hypervisor chose to run something else. For servers with more than 1 CPU core, this value can exceed 100%.
  • system.cpu.user (gauge)
    This is Atlas metric measurement SYSTEM_CPU_USER. The percentage of time the CPU spent servicing all user applications (not just MongoDB processes). For servers with more than 1 CPU core, this value can exceed 100%.
  • system.normalized.cpu.guest (gauge)
    This is Atlas metric measurement SYSTEM_NORMALIZED_CPU_GUEST. The percentage of time the CPU spent servicing guest, which is included in user. It is scaled to a range of 0-100% by dividing by the number of CPU cores.
  • system.normalized.cpu.iowait (gauge)
    This is Atlas metric measurement SYSTEM_NORMALIZED_CPU_IOWAIT. The percentage of time the CPU spent waiting for IO operations to complete. It is scaled to a range of 0-100% by dividing by the number of CPU cores.
  • system.normalized.cpu.irq (gauge)
    This is Atlas metric measurement SYSTEM_NORMALIZED_CPU_IRQ. The percentage of time the CPU spent performing hardware interrupts. It is scaled to a range of 0-100% by dividing by the number of CPU cores.
  • system.normalized.cpu.kernel (gauge)
    This is Atlas metric measurement SYSTEM_NORMALIZED_CPU_KERNEL. The percentage of time the CPU spent servicing operating system calls from all processes. It is scaled to a range of 0-100% by dividing by the number of CPU cores.
  • system.normalized.cpu.nice (gauge)
    This is Atlas metric measurement SYSTEM_NORMALIZED_CPU_NICE. The percentage of time the CPU spent occupied by all processes with a positive nice value. It is scaled to a range of 0-100% by dividing by the number of CPU cores.
  • system.normalized.cpu.softirq (gauge)
    This is Atlas metric measurement SYSTEM_NORMALIZED_CPU_SOFTIRQ. The percentage of time the CPU spent performing software interrupts. It is scaled to a range of 0-100% by dividing by the number of CPU cores.
  • system.normalized.cpu.steal (gauge)
    This is Atlas metric measurement SYSTEM_NORMALIZED_CPU_STEAL. The percentage of time the CPU had something runnable, but the hypervisor chose to run something else. It is scaled to a range of 0-100% by dividing by the number of CPU cores.
  • system.normalized.cpu.user (gauge)
    This is Atlas metric measurement SYSTEM_NORMALIZED_CPU_USER. The percentage of time the CPU spent servicing all user applications (not just MongoDB processes). It is scaled to a range of 0-100% by dividing by the number of CPU cores.
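
For example, a minimal sketch that enables every metric in this group (the mongodb group below can be enabled the same way, or alongside it):

monitors:
- type: mongodb-atlas
  projectID:  <Project ID>
  publicKey:  <Public API Key>
  privateKey: <Private API Key>
  extraGroups:
    - hardware   # add mongodb here as well to enable both groups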

Group mongodb

All of the following metrics are part of the mongodb metric group. All of the non-default metrics below can be turned on by adding mongodb to the monitor config option extraGroups:

  • asserts.msg (counter)
    This is Atlas metric measurement ASSERT_MSG. The average rate of message asserts per second over the selected sample period. These are internal server errors that have a well defined text string. Stack traces are logged for these.
  • asserts.regular (counter)
    This is Atlas metric measurement ASSERT_REGULAR. The average rate of regular asserts raised per second over the selected sample period.
  • asserts.user (counter)
    This is Atlas metric measurement ASSERT_USER. The average rate of user asserts per second over the selected sample period. These are errors that can be generated by a user such as out of disk space or duplicate key.
  • asserts.warning (counter)
This is Atlas metric measurement ASSERT_WARNING. The average rate of warnings per second over the selected sample period.
  • background_flush_avg (counter)
    This is Atlas metric measurement BACKGROUND_FLUSH_AVG. Amount of data flushed in the background.
  • cache.bytes.read_into (counter)
This is Atlas metric measurement CACHE_BYTES_READ_INTO. The average rate of bytes per second read into WiredTiger's cache over the selected sample period.
  • cache.bytes.written_from (counter)
This is Atlas metric measurement CACHE_BYTES_WRITTEN_FROM. The average rate of bytes per second written from WiredTiger's cache over the selected sample period.
  • cache.dirty_bytes (gauge)
    This is Atlas metric measurement CACHE_DIRTY_BYTES. The number of tracked dirty bytes currently in the WiredTiger cache.
  • cache.used_bytes (gauge)
    This is Atlas metric measurement CACHE_USED_BYTES. The number of bytes currently in the WiredTiger cache.
  • connections.current (gauge)
    This is Atlas metric measurement CONNECTIONS. The number of currently active connections to this server. A stack is allocated per connection; thus very many connections can result in significant RAM usage.
  • cursors.timed_out (counter)
    This is Atlas metric measurement CURSORS_TOTAL_TIMED_OUT. The average rate of cursors that have timed out per second over the selected sample period.
  • cursors.total_open (gauge)
    This is Atlas metric measurement CURSORS_TOTAL_OPEN. The number of cursors that the server is maintaining for clients. Because MongoDB exhausts unused cursors, typically this value is small or zero. However, if there is a queue, stale tailable cursors, or a large number of operations this value may rise.
  • data_size (gauge)
    This is Atlas metric measurement DB_DATA_SIZE_TOTAL. Sum total size in bytes of the document data (including the padding factor) across all databases.
  • document.metrics.deleted (counter)
    This is Atlas metric measurement DOCUMENT_METRICS_DELETED. The average rate per second of documents deleted over the selected sample period.
  • document.metrics.inserted (counter)
    This is Atlas metric measurement DOCUMENT_METRICS_INSERTED. The average rate per second of documents inserted over the selected sample period.
  • document.metrics.returned (counter)
    This is Atlas metric measurement DOCUMENT_METRICS_RETURNED. The average rate per second of documents returned by queries over the selected sample period.
  • document.metrics.updated (counter)
    This is Atlas metric measurement DOCUMENT_METRICS_UPDATED. The average rate per second of documents updated over the selected sample period.
  • extra_info.page_faults (counter)
    This is Atlas metric measurement EXTRA_INFO_PAGE_FAULTS. The average rate of page faults on this process per second over the selected sample period. In non-Windows environments this is hard page faults only.
  • global_lock.current_queue.readers (gauge)
    This is Atlas metric measurement GLOBAL_LOCK_CURRENT_QUEUE_READERS. The number of operations queued waiting for a read lock.
  • global_lock.current_queue.total (gauge)
    This is Atlas metric measurement GLOBAL_LOCK_CURRENT_QUEUE_TOTAL. The number of operations queued waiting for any lock.
  • global_lock.current_queue.writers (gauge)
    This is Atlas metric measurement GLOBAL_LOCK_CURRENT_QUEUE_WRITERS. The number of operations queued waiting for a write lock.
  • index_size (gauge)
    This is Atlas metric measurement DB_INDEX_SIZE_TOTAL. Sum total size in bytes of the index data across all databases.
  • mem.mapped (gauge)
    This is Atlas metric measurement MEMORY_MAPPED. As MMAPv1 memory maps all the data files, this number is likely similar to your total database(s) size. WiredTiger does not use memory mapped files, so this should be 0.
  • mem.resident (gauge)
This is Atlas metric measurement MEMORY_RESIDENT. The number of megabytes resident. MMAPv1: It is typical over time, on a dedicated database server, for this number to approach the amount of physical RAM on the box. WiredTiger: In a standard deployment, resident is the amount of memory used by the WiredTiger cache plus the memory dedicated to other in-memory structures used by the mongod process. By default, mongod with WiredTiger reserves 50% of the total physical memory on the server for the cache, and at steady state WiredTiger tries to limit cache usage to 80% of that total. For example, if a server has 16GB of memory, WiredTiger will assume it can use 8GB for cache and at steady state should use about 6.5GB.
  • mem.virtual (gauge)
This is Atlas metric measurement MEMORY_VIRTUAL. The virtual megabytes for the mongod process. MMAPv1: Generally virtual should be a little larger than mapped (or 2x with --journal), but if virtual is many gigabytes larger, it indicates that excessive memory is being used by aspects other than the memory mapping of files, which would be suboptimal. The most common cause of high non-mapped memory usage is a very large number of connections to the database: each connection has a thread stack, and the memory for those stacks can add up to a considerable amount. WiredTiger: The same guidance applies; generally virtual should be a little larger than mapped, and if virtual is many gigabytes larger, the most likely cause is again a very large number of connections.
  • network.bytes_in (gauge)
    This is Atlas metric measurement NETWORK_BYTES_IN. The average rate of physical (after any wire compression) bytes sent to this database server per second over the selected sample period.
  • network.bytes_out (gauge)
    This is Atlas metric measurement NETWORK_BYTES_OUT. The average rate of physical (after any wire compression) bytes sent from this database server per second over the selected sample period.
  • network.num_requests (counter)
    This is Atlas metric measurement NETWORK_NUM_REQUESTS. The average rate of requests sent to this database server per second over the selected sample period.
  • op.execution.time.commands (gauge)
    This is Atlas metric measurement OP_EXECUTION_TIME_COMMANDS. The average execution time in milliseconds per command operation over the selected sample period.
  • op.execution.time.reads (gauge)
    This is Atlas metric measurement OP_EXECUTION_TIME_READS. The average execution time in milliseconds per read operation over the selected sample period.
  • op.execution.time.writes (gauge)
    This is Atlas metric measurement OP_EXECUTION_TIME_WRITES. The average execution time in milliseconds per write operation over the selected sample period.
  • opcounter.command (counter)
    This is Atlas metric measurement OPCOUNTER_CMD. The average rate of commands performed per second over the selected sample period.
  • opcounter.delete (counter)
    This is Atlas metric measurement OPCOUNTER_DELETE. The average rate of deletes performed per second over the selected sample period.
  • opcounter.getmore (counter)
This is Atlas metric measurement OPCOUNTER_GETMORE. The average rate of getMores performed per second on any cursor over the selected sample period. On a primary, this number can be high even if the query count is low, as the secondaries 'getMore' from the primary often as part of replication.
  • opcounter.insert (counter)
    This is Atlas metric measurement OPCOUNTER_INSERT. The average rate of inserts performed per second over the selected sample period.
  • opcounter.query (counter)
    This is Atlas metric measurement OPCOUNTER_QUERY. The average rate of queries performed per second over the selected sample period.
  • opcounter.repl.command (counter)
    This is Atlas metric measurement OPCOUNTER_REPL_CMD. The average rate of replicated commands applied per second over the selected sample period.
  • opcounter.repl.delete (counter)
    This is Atlas metric measurement OPCOUNTER_REPL_DELETE. The average rate of replicated deletes applied per second over the selected sample period.
  • opcounter.repl.insert (counter)
    This is Atlas metric measurement OPCOUNTER_REPL_INSERT. The average rate of replicated inserts applied per second over the selected sample period.
  • opcounter.repl.update (counter)
    This is Atlas metric measurement OPCOUNTER_REPL_UPDATE. The average rate of replicated updates applied per second over the selected sample period.
  • opcounter.update (counter)
    This is Atlas metric measurement OPCOUNTER_UPDATE. The average rate of updates performed per second over the selected sample period.
  • operations_scan_and_order (counter)
    This is Atlas metric measurement OPERATIONS_SCAN_AND_ORDER. The average rate per second over the selected sample period of queries that return sorted results that cannot perform the sort operation using an index.
  • oplog.master.lag_time_diff (gauge)
This is Atlas metric measurement OPLOG_MASTER_LAG_TIME_DIFF. The replication headroom, which is the difference between the primary's replication oplog window (i.e. latest minus oldest oplog entry time) and the secondary's replication lag. A secondary can go into RECOVERING if this value goes to zero.
  • oplog.master.time (gauge)
This is Atlas metric measurement OPLOG_MASTER_TIME. The replication oplog window. The approximate number of hours available in the primary's replication oplog. If a secondary is behind real-time by more than this amount, it cannot catch up and will require a full resync.
  • oplog.rate (gauge)
    This is Atlas metric measurement OPLOG_RATE_GB_PER_HOUR. The average rate of gigabytes of oplog the primary generates per hour.
  • oplog.slave.lag_master_time (gauge)
    This is Atlas metric measurement OPLOG_SLAVE_LAG_MASTER_TIME. The replication lag. The approximate number of seconds the secondary is behind the primary in write application. Only accurate if the lag is larger than 1-2 seconds, as the precision of this statistic is limited.
  • query.executor.scanned (counter)
    This is Atlas metric measurement QUERY_EXECUTOR_SCANNED. The average rate per second over the selected sample period of index items scanned during queries and query-plan evaluation. This rate is driven by the same value as totalKeysExamined in the output of explain().
  • query.executor.scanned_objects (counter)
    This is Atlas metric measurement QUERY_EXECUTOR_SCANNED_OBJECTS. The average rate per second over the selected sample period of documents scanned during queries and query-plan evaluation. This rate is driven by the same value as totalDocsExamined in the output of explain().
  • query.targeting.scanned_objects_per_returned (gauge)
    This is Atlas metric measurement QUERY_TARGETING_SCANNED_OBJECTS_PER_RETURNED. The ratio of the number of documents scanned to the number of documents returned by queries, since the previous data point for the selected sample period.
  • query.targeting.scanned_per_returned (gauge)
This is Atlas metric measurement QUERY_TARGETING_SCANNED_PER_RETURNED. The ratio of the number of index items scanned to the number of documents returned by queries, since the previous data point for the selected sample period. A value of 1.0 means all documents returned exactly match query criteria for the sample period. A value of 100 means on average for the sample period, a query scans 100 documents to find one that's returned.
  • storage_size (gauge)
    This is Atlas metric measurement DB_STORAGE_TOTAL. Sum total on-disk storage space allocated for document storage across all databases.
  • tickets.available.reads (gauge)
    This is Atlas metric measurement TICKETS_AVAILABLE_READS. The number of read tickets available to the WiredTiger storage engine. Read tickets represent the number of concurrent read operations allowed into the storage engine. When this value reaches zero new read requests may queue until a read ticket becomes available.
  • tickets.available.write (gauge)
    This is Atlas metric measurement TICKETS_AVAILABLE_WRITE. The number of write tickets available to the WiredTiger storage engine. Write tickets represent the number of concurrent write operations allowed into the storage engine. When this value reaches zero new write requests may queue until a write ticket becomes available.

Non-default metrics (version 4.7.0+)

The following information applies to agent version 4.7.0 or later with enableBuiltInFiltering: true set at the top level of the agent config.

To emit metrics that are not emitted by default, add them to the generic monitor-level extraMetrics config option. Metrics derived from specific configuration options that do not appear in the list above do not need to be added to extraMetrics.
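
As a sketch, using two arbitrarily chosen metrics from the list above:

monitors:
- type: mongodb-atlas
  projectID:  <Project ID>
  publicKey:  <Public API Key>
  privateKey: <Private API Key>
  extraMetrics:
    - cache.dirty_bytes
    - tickets.available.reads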

To see a list of metrics that will be emitted, run agent-status monitors after configuring this monitor in a running agent instance.

Legacy non-default metrics (version < 4.7.0)

The following information applies only to agent versions older than 4.7.0. If you have a newer agent and have set enableBuiltInFiltering: true at the top level of your agent config, see the section above. See upgrade instructions in Old-style inclusion list filtering.

If you have a reference to the whitelist.json in your agent's top-level metricsToExclude config option, and you want to emit metrics that are not in that allow list, then you need to add an item to the top-level metricsToInclude config option to override that allow list (see Inclusion filtering). Alternatively, you can copy the whitelist.json, modify it, and reference the modified copy in metricsToExclude.
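
A sketch of such an override, assuming the usual filter item shape (monitorType plus metricNames) for the top-level metricsToInclude option:

metricsToInclude:
  - monitorType: mongodb-atlas
    metricNames:          # metric names chosen from the list above for illustration
      - global_lock.current_queue.total
      - mem.resident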