Skip to content

Insight Reference Metric¶

The metrics in this article are organized based on the community's kube-prometheus framework. Currently, it covers metrics from multiple levels, including Cluster, Node, Namespace, and Workload. This article lists some commonly used metrics, their descriptions, and units for easy reference.

Cluster¶

Metric Name	Description	Unit
cluster_cpu_utilization	Cluster CPU Utilization
cluster_cpu_total	Total CPU in Cluster	Core
cluster_cpu_usage	CPU Used in Cluster	Core
cluster_cpu_requests_commitment	CPU Allocation Rate in Cluster
cluster_memory_utilization	Cluster Memory Utilization
cluster_memory_usage	Memory Usage in Cluster	Byte
cluster_memory_available	Available Memory in Cluster	Byte
cluster_memory_requests_commitment	Memory Allocation Rate in Cluster
cluster_memory_total	Total Memory in Cluster	Byte
cluster_net_utilization	Network Data Transfer Rate in Cluster	Byte/s
cluster_net_bytes_transmitted	Network Data Transmitted in Cluster (Upstream)	Byte/s
cluster_net_bytes_received	Network Data Received in Cluster (Downstream)	Byte/s
cluster_disk_read_iops	Disk Read IOPS in Cluster	times/s
cluster_disk_write_iops	Disk Write IOPS in Cluster	times/s
cluster_disk_read_throughput	Disk Read Throughput in Cluster	Byte/s
cluster_disk_write_throughput	Disk Write Throughput in Cluster	Byte/s
cluster_disk_size_capacity	Total Disk Capacity in Cluster	Byte
cluster_disk_size_available	Available Disk Size in Cluster	Byte
cluster_disk_size_usage	Disk Usage in Cluster	Byte
cluster_disk_size_utilization	Disk Utilization in Cluster
cluster_node_total	Total Nodes in Cluster	units
cluster_node_online	Online Nodes in Cluster	units
cluster_node_offline_count	Count of Offline Nodes in Cluster	units
cluster_pod_count	Total Pods in Cluster	units
cluster_pod_running_count	Count of Running Pods in Cluster	units
cluster_pod_abnormal_count	Count of Abnormal Pods in Cluster	units
cluster_deployment_count	Total Deployments in Cluster	units
cluster_deployment_normal_count	Count of Normal Deployments in Cluster	units
cluster_deployment_abnormal_count	Count of Abnormal Deployments in Cluster	units
cluster_statefulset_count	Count of StatefulSets in Cluster	units
cluster_statefulset_normal_count	Count of Normal StatefulSets in Cluster	units
cluster_statefulset_abnormal_count	Count of Abnormal StatefulSets in Cluster	units
cluster_daemonset_count	Count of DaemonSets in Cluster	units
cluster_daemonset_normal_count	Count of Normal DaemonSets in Cluster	units
cluster_daemonset_abnormal_count	Count of Abnormal DaemonSets in Cluster	units
cluster_job_count	Total Jobs in Cluster	units
cluster_job_normal_count	Count of Normal Jobs in Cluster	units
cluster_job_abnormal_count	Count of Abnormal Jobs in Cluster	units

Tip

Utilization is generally a number in the range (0,1] (e.g., 0.21, not 21%)

Node¶

Metric Name	Description	Unit
node_cpu_utilization	Node CPU Utilization
node_cpu_total	Total CPU in Node	Core
node_cpu_usage	CPU Usage in Node	Core
node_cpu_requests_commitment	CPU Allocation Rate in Node
node_memory_utilization	Node Memory Utilization
node_memory_usage	Memory Usage in Node	Byte
node_memory_requests_commitment	Memory Allocation Rate in Node
node_memory_available	Available Memory in Node	Byte
node_memory_total	Total Memory in Node	Byte
node_net_utilization	Network Data Transfer Rate in Node	Byte/s
node_net_bytes_transmitted	Network Data Transmitted in Node (Upstream)	Byte/s
node_net_bytes_received	Network Data Received in Node (Downstream)	Byte/s
node_disk_read_iops	Disk Read IOPS in Node	times/s
node_disk_write_iops	Disk Write IOPS in Node	times/s
node_disk_read_throughput	Disk Read Throughput in Node	Byte/s
node_disk_write_throughput	Disk Write Throughput in Node	Byte/s
node_disk_size_capacity	Total Disk Capacity in Node	Byte
node_disk_size_available	Available Disk Size in Node	Byte
node_disk_size_usage	Disk Usage in Node	Byte
node_disk_size_utilization	Disk Utilization in Node

Workload¶

The currently supported workload types include: Deployment, StatefulSet, DaemonSet, Job, and CronJob.

Metric Name	Description	Unit
workload_cpu_usage	Workload CPU Usage	Core
workload_cpu_limits	Workload CPU Limit	Core
workload_cpu_requests	Workload CPU Requests	Core
workload_cpu_utilization	Workload CPU Utilization
workload_memory_usage	Workload Memory Usage	Byte
workload_memory_limits	Workload Memory Limit	Byte
workload_memory_requests	Workload Memory Requests	Byte
workload_memory_utilization	Workload Memory Utilization
workload_memory_usage_cached	Workload Memory Usage (including cache)	Byte
workload_net_bytes_transmitted	Workload Network Data Transmitted Rate	Byte/s
workload_net_bytes_received	Workload Network Data Received Rate	Byte/s
workload_disk_read_throughput	Workload Disk Read Throughput	Byte/s
workload_disk_write_throughput	Workload Disk Write Throughput	Byte/s

Total workload is calculated here.
Metrics can be obtained using workload_cpu_usage{workload_type="deployment", workload="prometheus"}.
Calculation rule for workload_pod_utilization: workload_pod_usage / workload_pod_request.

Pod¶

Metric Name	Description	Unit
pod_cpu_usage	Pod CPU Usage	Core
pod_cpu_limits	Pod CPU Limit	Core
pod_cpu_requests	Pod CPU Requests	Core
pod_cpu_utilization	Pod CPU Utilization
pod_memory_usage	Pod Memory Usage	Byte
pod_memory_limits	Pod Memory Limit	Byte
pod_memory_requests	Pod Memory Requests	Byte
pod_memory_utilization	Pod Memory Utilization
pod_memory_usage_cached	Pod Memory Usage (including cache)	Byte
pod_net_bytes_transmitted	Pod Network Data Transmitted Rate	Byte/s
pod_net_bytes_received	Pod Network Data Received Rate	Byte/s
pod_disk_read_throughput	Pod Disk Read Throughput	Byte/s
pod_disk_write_throughput	Pod Disk Write Throughput	Byte/s

You can obtain the CPU usage of all Pods belonging to the Deployment named prometheus by using pod_cpu_usage{workload_type="deployment", workload="prometheus"}.

Span Metrics¶

Metric Name	Description	Unit
calls_total	Total Service Requests
duration_milliseconds_bucket	Service Latency Histogram
duration_milliseconds_sum	Total Service Latency	ms
duration_milliseconds_count	Number of Latency Records
otelcol_processor_groupbytrace_spans_released	Number of Collected Spans
otelcol_processor_groupbytrace_traces_released	Number of Collected Traces
traces_service_graph_request_total	Total Service Requests (Topology Feature)
traces_service_graph_request_server_seconds_sum	Total Latency (Topology Feature)	ms
traces_service_graph_request_server_seconds_bucket	Service Latency Histogram (Topology Feature)
traces_service_graph_request_server_seconds_count	Total Service Requests (Topology Feature)

Comments