Insight Metrics Reference
The metrics in this article are organized according to the community kube-prometheus framework and currently cover several levels, including Cluster, Node, Namespace, and Workload. The tables below list commonly used metrics together with their descriptions and units for quick reference.
Cluster
| Metric Name | Description | Unit | 
|---|---|---|
| cluster_cpu_utilization | Cluster CPU Utilization | |
| cluster_cpu_total | Total CPU in Cluster | Core | 
| cluster_cpu_usage | CPU Used in Cluster | Core | 
| cluster_cpu_requests_commitment | CPU Allocation Ratio in Cluster | |
| cluster_memory_utilization | Cluster Memory Utilization | |
| cluster_memory_usage | Memory Usage in Cluster | Byte | 
| cluster_memory_available | Available Memory in Cluster | Byte | 
| cluster_memory_requests_commitment | Memory Allocation Ratio in Cluster | |
| cluster_memory_total | Total Memory in Cluster | Byte | 
| cluster_net_utilization | Network Data Transfer Rate in Cluster | Byte/s | 
| cluster_net_bytes_transmitted | Network Data Transmitted in Cluster (Upstream) | Byte/s | 
| cluster_net_bytes_received | Network Data Received in Cluster (Downstream) | Byte/s | 
| cluster_disk_read_iops | Disk Read IOPS in Cluster | times/s | 
| cluster_disk_write_iops | Disk Write IOPS in Cluster | times/s | 
| cluster_disk_read_throughput | Disk Read Throughput in Cluster | Byte/s | 
| cluster_disk_write_throughput | Disk Write Throughput in Cluster | Byte/s | 
| cluster_disk_size_capacity | Total Disk Capacity in Cluster | Byte | 
| cluster_disk_size_available | Available Disk Size in Cluster | Byte | 
| cluster_disk_size_usage | Disk Usage in Cluster | Byte | 
| cluster_disk_size_utilization | Disk Utilization in Cluster | |
| cluster_node_total | Total Nodes in Cluster | units | 
| cluster_node_online | Online Nodes in Cluster | units | 
| cluster_node_offline_count | Count of Offline Nodes in Cluster | units | 
| cluster_pod_count | Total Pods in Cluster | units | 
| cluster_pod_running_count | Count of Running Pods in Cluster | units | 
| cluster_pod_abnormal_count | Count of Abnormal Pods in Cluster | units | 
| cluster_deployment_count | Total Deployments in Cluster | units | 
| cluster_deployment_normal_count | Count of Normal Deployments in Cluster | units | 
| cluster_deployment_abnormal_count | Count of Abnormal Deployments in Cluster | units | 
| cluster_statefulset_count | Count of StatefulSets in Cluster | units | 
| cluster_statefulset_normal_count | Count of Normal StatefulSets in Cluster | units | 
| cluster_statefulset_abnormal_count | Count of Abnormal StatefulSets in Cluster | units | 
| cluster_daemonset_count | Count of DaemonSets in Cluster | units | 
| cluster_daemonset_normal_count | Count of Normal DaemonSets in Cluster | units | 
| cluster_daemonset_abnormal_count | Count of Abnormal DaemonSets in Cluster | units | 
| cluster_job_count | Total Jobs in Cluster | units | 
| cluster_job_normal_count | Count of Normal Jobs in Cluster | units | 
| cluster_job_abnormal_count | Count of Abnormal Jobs in Cluster | units | 
Tip
Utilization values are generally expressed as a ratio between 0 and 1 (for example, 0.21 rather than 21%).
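The snippet below is a minimal Python sketch for turning such a ratio into a percentage for display. It assumes, for illustration only, that a utilization metric corresponds to usage divided by capacity (for example `cluster_cpu_usage / cluster_cpu_total`); this page does not state that formula, and the helper names are hypothetical.

```python
def utilization_ratio(usage: float, total: float) -> float:
    """Return usage / total as a 0-1 ratio (assumed definition of utilization)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return usage / total


def ratio_to_percent(ratio: float, digits: int = 1) -> str:
    """Format a ratio such as 0.21 as a percentage string such as '21.0%'."""
    return f"{ratio * 100:.{digits}f}%"


if __name__ == "__main__":
    # Hypothetical values: 4 cores used out of 16 cores in the cluster.
    ratio = utilization_ratio(4, 16)
    print(ratio)                    # 0.25
    print(ratio_to_percent(ratio))  # 25.0%
```

Keeping the stored value as a ratio and converting only at display time avoids mixing 0.21-style and 21-style values in the same query or alert threshold.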
Node
| Metric Name | Description | Unit | 
|---|---|---|
| node_cpu_utilization | Node CPU Utilization | |
| node_cpu_total | Total CPU in Node | Core | 
| node_cpu_usage | CPU Usage in Node | Core | 
| node_cpu_requests_commitment | CPU Allocation Ratio in Node | |
| node_memory_utilization | Node Memory Utilization | |
| node_memory_usage | Memory Usage in Node | Byte | 
| node_memory_requests_commitment | Memory Allocation Ratio in Node | |
| node_memory_available | Available Memory in Node | Byte | 
| node_memory_total | Total Memory in Node | Byte | 
| node_net_utilization | Network Data Transfer Rate in Node | Byte/s | 
| node_net_bytes_transmitted | Network Data Transmitted in Node (Upstream) | Byte/s | 
| node_net_bytes_received | Network Data Received in Node (Downstream) | Byte/s | 
| node_disk_read_iops | Disk Read IOPS in Node | times/s | 
| node_disk_write_iops | Disk Write IOPS in Node | times/s | 
| node_disk_read_throughput | Disk Read Throughput in Node | Byte/s | 
| node_disk_write_throughput | Disk Write Throughput in Node | Byte/s | 
| node_disk_size_capacity | Total Disk Capacity in Node | Byte | 
| node_disk_size_available | Available Disk Size in Node | Byte | 
| node_disk_size_usage | Disk Usage in Node | Byte | 
| node_disk_size_utilization | Disk Utilization in Node | |
Workload
Supported workload types currently include Deployment, StatefulSet, DaemonSet, Job, and CronJob.
| Metric Name | Description | Unit | 
|---|---|---|
| workload_cpu_usage | Workload CPU Usage | Core | 
| workload_cpu_limits | Workload CPU Limit | Core | 
| workload_cpu_requests | Workload CPU Requests | Core | 
| workload_cpu_utilization | Workload CPU Utilization | |
| workload_memory_usage | Workload Memory Usage | Byte | 
| workload_memory_limits | Workload Memory Limit | Byte | 
| workload_memory_requests | Workload Memory Requests | Byte | 
| workload_memory_utilization | Workload Memory Utilization | |
| workload_memory_usage_cached | Workload Memory Usage (including cache) | Byte | 
| workload_net_bytes_transmitted | Workload Network Data Transmitted Rate | Byte/s | 
| workload_net_bytes_received | Workload Network Data Received Rate | Byte/s | 
| workload_disk_read_throughput | Workload Disk Read Throughput | Byte/s | 
| workload_disk_write_throughput | Workload Disk Write Throughput | Byte/s | 
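- The values above are totals for the whole workload, not per-Pod values.
- Metrics can be filtered with label selectors. For example, `workload_cpu_usage{workload_type="deployment", workload="prometheus"}` returns the CPU usage of the Deployment named prometheus.
- Calculation rule for `workload_pod_utilization`: `workload_pod_usage / workload_pod_request`.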
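A minimal Python sketch of the notes above: it builds the PromQL selector shown and applies the usage / requests rule. Applying that rule to CPU (as for `workload_cpu_utilization`) is an assumption, and the helper names and numbers are hypothetical.

```python
def promql_selector(metric: str, labels: dict[str, str]) -> str:
    """Build an instant-vector selector such as metric{label="value", ...}."""

    def quote(value: str) -> str:
        # Escape backslashes and double quotes so the value is a valid PromQL string.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    matchers = ", ".join(f"{name}={quote(value)}" for name, value in labels.items())
    return f"{metric}{{{matchers}}}"


def workload_utilization(usage: float, requests: float) -> float:
    """Utilization = usage / requests, following the calculation rule above."""
    if requests <= 0:
        raise ValueError("requests must be positive to compute utilization")
    return usage / requests


if __name__ == "__main__":
    print(promql_selector("workload_cpu_usage",
                          {"workload_type": "deployment", "workload": "prometheus"}))
    # Hypothetical numbers: 0.5 cores used against 2 cores requested.
    print(workload_utilization(0.5, 2.0))  # 0.25
```

Because the ratio is taken against requests rather than limits, a workload that bursts above its requests can report a utilization greater than 1.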
Pod
| Metric Name | Description | Unit | 
|---|---|---|
| pod_cpu_usage | Pod CPU Usage | Core | 
| pod_cpu_limits | Pod CPU Limit | Core | 
| pod_cpu_requests | Pod CPU Requests | Core | 
| pod_cpu_utilization | Pod CPU Utilization | |
| pod_memory_usage | Pod Memory Usage | Byte | 
| pod_memory_limits | Pod Memory Limit | Byte | 
| pod_memory_requests | Pod Memory Requests | Byte | 
| pod_memory_utilization | Pod Memory Utilization | |
| pod_memory_usage_cached | Pod Memory Usage (including cache) | Byte | 
| pod_net_bytes_transmitted | Pod Network Data Transmitted Rate | Byte/s | 
| pod_net_bytes_received | Pod Network Data Received Rate | Byte/s | 
| pod_disk_read_throughput | Pod Disk Read Throughput | Byte/s | 
| pod_disk_write_throughput | Pod Disk Write Throughput | Byte/s | 
For example, `pod_cpu_usage{workload_type="deployment", workload="prometheus"}` returns the CPU usage of each Pod belonging to the Deployment named prometheus.
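To run the query above programmatically, one option is the standard Prometheus HTTP API (`GET /api/v1/query`). The sketch below assumes Insight's metrics are reachable through a Prometheus-compatible endpoint; the base URL and the `pod` label name are hypothetical placeholders, and the demo parses a hard-coded sample response instead of calling a live server.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Return the URL of a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body: dict, label: str = "pod") -> dict[str, float]:
    """Map each series' `label` value to its sample value in an instant-vector response."""
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error')}")
    result = {}
    for series in body["data"]["result"]:
        # Each sample is [unix_timestamp, "value as a string"].
        _, value = series["value"]
        result[series["metric"].get(label, "")] = float(value)
    return result


def query_pod_cpu(base_url: str) -> dict[str, float]:
    """Fetch per-Pod CPU usage (cores) for the Deployment named prometheus."""
    promql = 'pod_cpu_usage{workload_type="deployment", workload="prometheus"}'
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return parse_vector(json.load(resp))


if __name__ == "__main__":
    # Sample in the /api/v1/query response shape; the values are made up.
    sample = {
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"pod": "prometheus-0"}, "value": [1700000000, "0.25"]},
                {"metric": {"pod": "prometheus-1"}, "value": [1700000000, "0.5"]},
            ],
        },
    }
    per_pod = parse_vector(sample)
    print(per_pod)               # {'prometheus-0': 0.25, 'prometheus-1': 0.5}
    print(sum(per_pod.values()))  # 0.75
```

Against a real server, `query_pod_cpu("http://prometheus.example:9090")` (hypothetical address) performs the request. Since the workload metrics are totals, the summed per-Pod value would be expected to roughly track `workload_cpu_usage` for the same Deployment.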
Span Metrics
| Metric Name | Description | Unit | 
|---|---|---|
| calls_total | Total Service Requests | |
| duration_milliseconds_bucket | Service Latency Histogram | |
| duration_milliseconds_sum | Total Service Latency | ms | 
| duration_milliseconds_count | Number of Latency Records | |
| otelcol_processor_groupbytrace_spans_released | Number of Collected Spans | |
| otelcol_processor_groupbytrace_traces_released | Number of Collected Traces | |
| traces_service_graph_request_total | Total Service Requests (Topology Feature) | |
| traces_service_graph_request_server_seconds_sum | Total Service Latency (Topology Feature) | s | 
| traces_service_graph_request_server_seconds_bucket | Service Latency Histogram (Topology Feature) | |
| traces_service_graph_request_server_seconds_count | Number of Latency Records (Topology Feature) | |
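The latency metrics above follow the usual Prometheus histogram layout (`_bucket`, `_sum`, `_count`). Mean latency is therefore `_sum / _count`, and quantiles are usually computed in PromQL with an expression such as `histogram_quantile(0.9, sum by (le) (rate(duration_milliseconds_bucket[5m])))`. The Python sketch below shows the arithmetic on already-aggregated values: a mean, and a simplified version of the linear interpolation that `histogram_quantile()` performs over cumulative buckets. The bucket bounds and counts are made up for illustration.

```python
import math


def average_latency_ms(duration_sum_ms: float, duration_count: float) -> float:
    """Mean latency = duration_milliseconds_sum / duration_milliseconds_count."""
    return duration_sum_ms / duration_count if duration_count else math.nan


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile (0 < q <= 1) from cumulative (le, count) buckets.

    Simplified sketch of PromQL's histogram_quantile() interpolation;
    the last bucket must have le = +Inf and hold the total count.
    """
    buckets = sorted(buckets)
    if not 0 < q <= 1 or not buckets or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                # Quantile falls in the +Inf bucket: return the highest finite bound.
                return prev_le
            # Interpolate linearly inside the bucket (prev_le, le].
            in_bucket = count - prev_count
            return prev_le + (le - prev_le) * (rank - prev_count) / in_bucket
        prev_le, prev_count = le, count
    return math.nan


if __name__ == "__main__":
    print(average_latency_ms(1200.0, 10))  # 120.0
    # Hypothetical cumulative buckets in milliseconds: (upper bound, count <= bound).
    sample = [(100.0, 50.0), (250.0, 80.0), (500.0, 95.0), (math.inf, 100.0)]
    print(histogram_quantile(0.5, sample))  # 100.0
    print(histogram_quantile(0.9, sample))
```

Because bucket counts are cumulative counters, real queries apply `rate()` before estimating a quantile. For the `traces_service_graph_request_server_seconds_*` series the same arithmetic applies, with results in seconds.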