Insight Metric Reference

The metrics in this article are organized following the community kube-prometheus project. They currently cover several levels, including Cluster, Node, Namespace, and Workload. This article lists commonly used metrics with their descriptions and units for easy reference.

Cluster

| Metric Name | Description | Unit |
| --- | --- | --- |
| cluster_cpu_utilization | Cluster CPU Utilization | |
| cluster_cpu_total | Total CPU in Cluster | Core |
| cluster_cpu_usage | CPU Used in Cluster | Core |
| cluster_cpu_requests_commitment | CPU Allocation Rate in Cluster | |
| cluster_memory_utilization | Cluster Memory Utilization | |
| cluster_memory_usage | Memory Usage in Cluster | Byte |
| cluster_memory_available | Available Memory in Cluster | Byte |
| cluster_memory_requests_commitment | Memory Allocation Rate in Cluster | |
| cluster_memory_total | Total Memory in Cluster | Byte |
| cluster_net_utilization | Network Data Transfer Rate in Cluster | Byte/s |
| cluster_net_bytes_transmitted | Network Data Transmitted in Cluster (Upstream) | Byte/s |
| cluster_net_bytes_received | Network Data Received in Cluster (Downstream) | Byte/s |
| cluster_disk_read_iops | Disk Read IOPS in Cluster | times/s |
| cluster_disk_write_iops | Disk Write IOPS in Cluster | times/s |
| cluster_disk_read_throughput | Disk Read Throughput in Cluster | Byte/s |
| cluster_disk_write_throughput | Disk Write Throughput in Cluster | Byte/s |
| cluster_disk_size_capacity | Total Disk Capacity in Cluster | Byte |
| cluster_disk_size_available | Available Disk Size in Cluster | Byte |
| cluster_disk_size_usage | Disk Usage in Cluster | Byte |
| cluster_disk_size_utilization | Disk Utilization in Cluster | |
| cluster_node_total | Total Nodes in Cluster | units |
| cluster_node_online | Online Nodes in Cluster | units |
| cluster_node_offline_count | Count of Offline Nodes in Cluster | units |
| cluster_pod_count | Total Pods in Cluster | units |
| cluster_pod_running_count | Count of Running Pods in Cluster | units |
| cluster_pod_abnormal_count | Count of Abnormal Pods in Cluster | units |
| cluster_deployment_count | Total Deployments in Cluster | units |
| cluster_deployment_normal_count | Count of Normal Deployments in Cluster | units |
| cluster_deployment_abnormal_count | Count of Abnormal Deployments in Cluster | units |
| cluster_statefulset_count | Count of StatefulSets in Cluster | units |
| cluster_statefulset_normal_count | Count of Normal StatefulSets in Cluster | units |
| cluster_statefulset_abnormal_count | Count of Abnormal StatefulSets in Cluster | units |
| cluster_daemonset_count | Count of DaemonSets in Cluster | units |
| cluster_daemonset_normal_count | Count of Normal DaemonSets in Cluster | units |
| cluster_daemonset_abnormal_count | Count of Abnormal DaemonSets in Cluster | units |
| cluster_job_count | Total Jobs in Cluster | units |
| cluster_job_normal_count | Count of Normal Jobs in Cluster | units |
| cluster_job_abnormal_count | Count of Abnormal Jobs in Cluster | units |

Tip

Utilization metrics are generally reported as a ratio in the range (0, 1], for example 0.21 rather than 21%.
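
To illustrate the tip, here is a minimal sketch that renders a utilization ratio as a percentage for display. The helper name format_utilization is hypothetical and not part of Insight; it only assumes the value arrives as a plain ratio such as 0.21.

```python
def format_utilization(ratio: float, digits: int = 1) -> str:
    """Render a utilization ratio (e.g. 0.21) as a percentage string (e.g. '21.0%').

    Hypothetical display helper; assumes the metric value is a ratio, not a percent.
    """
    if ratio < 0:
        raise ValueError("utilization ratio cannot be negative")
    return f"{ratio * 100:.{digits}f}%"


print(format_utilization(0.21))
```

The same conversion can usually be done on the query side by multiplying the utilization metric by 100.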

Node

| Metric Name | Description | Unit |
| --- | --- | --- |
| node_cpu_utilization | Node CPU Utilization | |
| node_cpu_total | Total CPU in Node | Core |
| node_cpu_usage | CPU Usage in Node | Core |
| node_cpu_requests_commitment | CPU Allocation Rate in Node | |
| node_memory_utilization | Node Memory Utilization | |
| node_memory_usage | Memory Usage in Node | Byte |
| node_memory_requests_commitment | Memory Allocation Rate in Node | |
| node_memory_available | Available Memory in Node | Byte |
| node_memory_total | Total Memory in Node | Byte |
| node_net_utilization | Network Data Transfer Rate in Node | Byte/s |
| node_net_bytes_transmitted | Network Data Transmitted in Node (Upstream) | Byte/s |
| node_net_bytes_received | Network Data Received in Node (Downstream) | Byte/s |
| node_disk_read_iops | Disk Read IOPS in Node | times/s |
| node_disk_write_iops | Disk Write IOPS in Node | times/s |
| node_disk_read_throughput | Disk Read Throughput in Node | Byte/s |
| node_disk_write_throughput | Disk Write Throughput in Node | Byte/s |
| node_disk_size_capacity | Total Disk Capacity in Node | Byte |
| node_disk_size_available | Available Disk Size in Node | Byte |
| node_disk_size_usage | Disk Usage in Node | Byte |
| node_disk_size_utilization | Disk Utilization in Node | |

Workload

The currently supported workload types are Deployment, StatefulSet, DaemonSet, Job, and CronJob.

| Metric Name | Description | Unit |
| --- | --- | --- |
| workload_cpu_usage | Workload CPU Usage | Core |
| workload_cpu_limits | Workload CPU Limit | Core |
| workload_cpu_requests | Workload CPU Requests | Core |
| workload_cpu_utilization | Workload CPU Utilization | |
| workload_memory_usage | Workload Memory Usage | Byte |
| workload_memory_limits | Workload Memory Limit | Byte |
| workload_memory_requests | Workload Memory Requests | Byte |
| workload_memory_utilization | Workload Memory Utilization | |
| workload_memory_usage_cached | Workload Memory Usage (including cache) | Byte |
| workload_net_bytes_transmitted | Workload Network Data Transmitted Rate | Byte/s |
| workload_net_bytes_received | Workload Network Data Received Rate | Byte/s |
| workload_disk_read_throughput | Workload Disk Read Throughput | Byte/s |
| workload_disk_write_throughput | Workload Disk Write Throughput | Byte/s |
  1. These metrics report the total for the whole workload.
  2. To query a specific workload, filter by label, for example workload_cpu_usage{workload_type="deployment", workload="prometheus"}.
  3. workload_pod_utilization is calculated as workload_pod_usage / workload_pod_request.
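
The label filter and the usage / request rule in the notes above can be sketched as follows. Both helper names (build_selector, utilization) are hypothetical and exist only for illustration; the sketch assumes label values are plain strings and that a workload with no requests has no meaningful utilization.

```python
from typing import Optional


def build_selector(metric: str, **labels: str) -> str:
    """Build a PromQL selector such as metric{key="value"} (hypothetical helper)."""
    if not labels:
        return metric
    matchers = ", ".join(
        # Escape backslashes and double quotes so the label value stays a valid string.
        '{}="{}"'.format(key, value.replace("\\", "\\\\").replace('"', '\\"'))
        for key, value in labels.items()
    )
    return f"{metric}{{{matchers}}}"


def utilization(usage: float, requests: float) -> Optional[float]:
    """Apply the usage / request rule; return None when requests are not set."""
    if requests <= 0:
        return None
    return usage / requests


print(build_selector("workload_cpu_usage", workload_type="deployment", workload="prometheus"))
print(utilization(0.5, 2.0))
```

Returning None instead of dividing by zero is a design choice of this sketch, not documented Insight behavior.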

Pod

| Metric Name | Description | Unit |
| --- | --- | --- |
| pod_cpu_usage | Pod CPU Usage | Core |
| pod_cpu_limits | Pod CPU Limit | Core |
| pod_cpu_requests | Pod CPU Requests | Core |
| pod_cpu_utilization | Pod CPU Utilization | |
| pod_memory_usage | Pod Memory Usage | Byte |
| pod_memory_limits | Pod Memory Limit | Byte |
| pod_memory_requests | Pod Memory Requests | Byte |
| pod_memory_utilization | Pod Memory Utilization | |
| pod_memory_usage_cached | Pod Memory Usage (including cache) | Byte |
| pod_net_bytes_transmitted | Pod Network Data Transmitted Rate | Byte/s |
| pod_net_bytes_received | Pod Network Data Received Rate | Byte/s |
| pod_disk_read_throughput | Pod Disk Read Throughput | Byte/s |
| pod_disk_write_throughput | Pod Disk Write Throughput | Byte/s |

You can obtain the CPU usage of all Pods belonging to the Deployment named prometheus by using pod_cpu_usage{workload_type="deployment", workload="prometheus"}.
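
As a rough sketch of how such a query might be sent programmatically, the snippet below builds a request URL for the standard Prometheus HTTP API instant-query endpoint (/api/v1/query). It assumes the metrics are reachable through a Prometheus-compatible endpoint; the base URL is a placeholder and the function name instant_query_url is hypothetical.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL for a PromQL expression (hypothetical helper)."""
    # urlencode percent-encodes braces, quotes, and spaces in the expression.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Placeholder address; replace with the endpoint used in your environment.
BASE_URL = "http://prometheus.example:9090"
QUERY = 'pod_cpu_usage{workload_type="deployment", workload="prometheus"}'

print(instant_query_url(BASE_URL, QUERY))
```

The resulting URL can be fetched with any HTTP client; authentication, if required, is outside the scope of this sketch.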

Span Metrics

| Metric Name | Description | Unit |
| --- | --- | --- |
| calls_total | Total Service Requests | |
| duration_milliseconds_bucket | Service Latency Histogram | |
| duration_milliseconds_sum | Total Service Latency | ms |
| duration_milliseconds_count | Number of Latency Records | |
| otelcol_processor_groupbytrace_spans_released | Number of Collected Spans | |
| otelcol_processor_groupbytrace_traces_released | Number of Collected Traces | |
| traces_service_graph_request_total | Total Service Requests (Topology Feature) | |
| traces_service_graph_request_server_seconds_sum | Total Latency (Topology Feature) | ms |
| traces_service_graph_request_server_seconds_bucket | Service Latency Histogram (Topology Feature) | |
| traces_service_graph_request_server_seconds_count | Total Service Requests (Topology Feature) | |
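
The _sum, _count, and _bucket series of a latency histogram are typically combined rather than read individually. The sketch below shows the usual arithmetic for an average latency (sum divided by count) and two PromQL expressions built on the span metrics listed above. The 5m window, the 0.95 quantile, and the helper name mean_latency_ms are assumptions chosen for illustration.

```python
from typing import Optional

# Average latency over the last 5 minutes, using standard PromQL rate() arithmetic.
AVG_LATENCY_QUERY = (
    "sum(rate(duration_milliseconds_sum[5m])) / sum(rate(duration_milliseconds_count[5m]))"
)

# Estimated 95th percentile latency from the histogram buckets.
P95_LATENCY_QUERY = (
    "histogram_quantile(0.95, sum by (le) (rate(duration_milliseconds_bucket[5m])))"
)


def mean_latency_ms(duration_sum_ms: float, duration_count: float) -> Optional[float]:
    """Average latency in ms from a histogram's sum and count; None if nothing was recorded."""
    if duration_count <= 0:
        return None
    return duration_sum_ms / duration_count


print(mean_latency_ms(1500.0, 30))
```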
