Insight Reference Metric¶
The metrics in this article are organized based on the community's kube-prometheus framework. Currently, it covers metrics from multiple levels, including Cluster, Node, Namespace, and Workload. This article lists some commonly used metrics, their descriptions, and units for easy reference.
Cluster¶
Metric Name | Description | Unit |
---|---|---|
cluster_cpu_utilization | Cluster CPU Utilization | |
cluster_cpu_total | Total CPU in Cluster | Core |
cluster_cpu_usage | CPU Used in Cluster | Core |
cluster_cpu_requests_commitment | CPU Allocation Rate in Cluster | |
cluster_memory_utilization | Cluster Memory Utilization | |
cluster_memory_usage | Memory Usage in Cluster | Byte |
cluster_memory_available | Available Memory in Cluster | Byte |
cluster_memory_requests_commitment | Memory Allocation Rate in Cluster | |
cluster_memory_total | Total Memory in Cluster | Byte |
cluster_net_utilization | Network Data Transfer Rate in Cluster | Byte/s |
cluster_net_bytes_transmitted | Network Data Transmitted in Cluster (Upstream) | Byte/s |
cluster_net_bytes_received | Network Data Received in Cluster (Downstream) | Byte/s |
cluster_disk_read_iops | Disk Read IOPS in Cluster | times/s |
cluster_disk_write_iops | Disk Write IOPS in Cluster | times/s |
cluster_disk_read_throughput | Disk Read Throughput in Cluster | Byte/s |
cluster_disk_write_throughput | Disk Write Throughput in Cluster | Byte/s |
cluster_disk_size_capacity | Total Disk Capacity in Cluster | Byte |
cluster_disk_size_available | Available Disk Size in Cluster | Byte |
cluster_disk_size_usage | Disk Usage in Cluster | Byte |
cluster_disk_size_utilization | Disk Utilization in Cluster | |
cluster_node_total | Total Nodes in Cluster | units |
cluster_node_online | Online Nodes in Cluster | units |
cluster_node_offline_count | Count of Offline Nodes in Cluster | units |
cluster_pod_count | Total Pods in Cluster | units |
cluster_pod_running_count | Count of Running Pods in Cluster | units |
cluster_pod_abnormal_count | Count of Abnormal Pods in Cluster | units |
cluster_deployment_count | Total Deployments in Cluster | units |
cluster_deployment_normal_count | Count of Normal Deployments in Cluster | units |
cluster_deployment_abnormal_count | Count of Abnormal Deployments in Cluster | units |
cluster_statefulset_count | Count of StatefulSets in Cluster | units |
cluster_statefulset_normal_count | Count of Normal StatefulSets in Cluster | units |
cluster_statefulset_abnormal_count | Count of Abnormal StatefulSets in Cluster | units |
cluster_daemonset_count | Count of DaemonSets in Cluster | units |
cluster_daemonset_normal_count | Count of Normal DaemonSets in Cluster | units |
cluster_daemonset_abnormal_count | Count of Abnormal DaemonSets in Cluster | units |
cluster_job_count | Total Jobs in Cluster | units |
cluster_job_normal_count | Count of Normal Jobs in Cluster | units |
cluster_job_abnormal_count | Count of Abnormal Jobs in Cluster | units |
Tip
Utilization is generally a number in the range (0,1] (e.g., 0.21, not 21%)
Node¶
Metric Name | Description | Unit |
---|---|---|
node_cpu_utilization | Node CPU Utilization | |
node_cpu_total | Total CPU in Node | Core |
node_cpu_usage | CPU Usage in Node | Core |
node_cpu_requests_commitment | CPU Allocation Rate in Node | |
node_memory_utilization | Node Memory Utilization | |
node_memory_usage | Memory Usage in Node | Byte |
node_memory_requests_commitment | Memory Allocation Rate in Node | |
node_memory_available | Available Memory in Node | Byte |
node_memory_total | Total Memory in Node | Byte |
node_net_utilization | Network Data Transfer Rate in Node | Byte/s |
node_net_bytes_transmitted | Network Data Transmitted in Node (Upstream) | Byte/s |
node_net_bytes_received | Network Data Received in Node (Downstream) | Byte/s |
node_disk_read_iops | Disk Read IOPS in Node | times/s |
node_disk_write_iops | Disk Write IOPS in Node | times/s |
node_disk_read_throughput | Disk Read Throughput in Node | Byte/s |
node_disk_write_throughput | Disk Write Throughput in Node | Byte/s |
node_disk_size_capacity | Total Disk Capacity in Node | Byte |
node_disk_size_available | Available Disk Size in Node | Byte |
node_disk_size_usage | Disk Usage in Node | Byte |
node_disk_size_utilization | Disk Utilization in Node |
Workload¶
The currently supported workload types include: Deployment, StatefulSet, DaemonSet, Job, and CronJob.
Metric Name | Description | Unit |
---|---|---|
workload_cpu_usage | Workload CPU Usage | Core |
workload_cpu_limits | Workload CPU Limit | Core |
workload_cpu_requests | Workload CPU Requests | Core |
workload_cpu_utilization | Workload CPU Utilization | |
workload_memory_usage | Workload Memory Usage | Byte |
workload_memory_limits | Workload Memory Limit | Byte |
workload_memory_requests | Workload Memory Requests | Byte |
workload_memory_utilization | Workload Memory Utilization | |
workload_memory_usage_cached | Workload Memory Usage (including cache) | Byte |
workload_net_bytes_transmitted | Workload Network Data Transmitted Rate | Byte/s |
workload_net_bytes_received | Workload Network Data Received Rate | Byte/s |
workload_disk_read_throughput | Workload Disk Read Throughput | Byte/s |
workload_disk_write_throughput | Workload Disk Write Throughput | Byte/s |
- Total workload is calculated here.
- Metrics can be obtained using
workload_cpu_usage{workload_type="deployment", workload="prometheus"}
. - Calculation rule for
workload_pod_utilization
:workload_pod_usage / workload_pod_request
.
Pod¶
Metric Name | Description | Unit |
---|---|---|
pod_cpu_usage | Pod CPU Usage | Core |
pod_cpu_limits | Pod CPU Limit | Core |
pod_cpu_requests | Pod CPU Requests | Core |
pod_cpu_utilization | Pod CPU Utilization | |
pod_memory_usage | Pod Memory Usage | Byte |
pod_memory_limits | Pod Memory Limit | Byte |
pod_memory_requests | Pod Memory Requests | Byte |
pod_memory_utilization | Pod Memory Utilization | |
pod_memory_usage_cached | Pod Memory Usage (including cache) | Byte |
pod_net_bytes_transmitted | Pod Network Data Transmitted Rate | Byte/s |
pod_net_bytes_received | Pod Network Data Received Rate | Byte/s |
pod_disk_read_throughput | Pod Disk Read Throughput | Byte/s |
pod_disk_write_throughput | Pod Disk Write Throughput | Byte/s |
You can obtain the CPU usage of all Pods belonging to the Deployment named prometheus by using pod_cpu_usage{workload_type="deployment", workload="prometheus"}
.
Span Metrics¶
Metric Name | Description | Unit |
---|---|---|
calls_total | Total Service Requests | |
duration_milliseconds_bucket | Service Latency Histogram | |
duration_milliseconds_sum | Total Service Latency | ms |
duration_milliseconds_count | Number of Latency Records | |
otelcol_processor_groupbytrace_spans_released | Number of Collected Spans | |
otelcol_processor_groupbytrace_traces_released | Number of Collected Traces | |
traces_service_graph_request_total | Total Service Requests (Topology Feature) | |
traces_service_graph_request_server_seconds_sum | Total Latency (Topology Feature) | ms |
traces_service_graph_request_server_seconds_bucket | Service Latency Histogram (Topology Feature) | |
traces_service_graph_request_server_seconds_count | Total Service Requests (Topology Feature) |