VictoriaMetrics Resource Planning
When VictoriaMetrics receives metrics through Prometheus RemoteWrite, the resource consumption of its components depends on the number of containers in the cluster and the volume of metrics they produce. If resources are not sized accordingly, vminsert and vmstorage can exceed their CPU and memory limits. To keep VictoriaMetrics components stable across clusters of different sizes, adjust their resources to the actual scale of the cluster.
Test Results
After continuously monitoring the metrics sent from Prometheus RemoteWrite to VictoriaMetrics, we observed that the resource usage of VictoriaMetrics components is generally positively correlated with the ingestion rate, which in turn is positively correlated with the number of Pods. The two quantities are measured as follows:
- Ingestion rate: `sum(rate(vm_rows_inserted_total{job="vminsert-insight-victoria-metrics-k8s-stack"}[1m]))`
- Pod count: `sum(kube_pod_info)`
The ingestion rate query shows how many data points per second are inserted into vminsert; it does not account for replication. To measure the ingestion rate including the replication factor, use:

```promql
sum(rate(vm_vminsert_metrics_read_total{job="vmstorage-insight-victoria-metrics-k8s-stack"}[1m]))
```

This query shows how many data points per second vmstorage reads from vminsert.
Long-term testing across multiple clusters showed only a minor difference between these two rates. For simplicity, this document therefore uses the number of data points inserted into vminsert per second as the ingestion rate.
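To check how close the two rates are in your own cluster, you can run both queries against the Prometheus-compatible query API of vmselect. The sketch below is a minimal illustration using only the Python standard library. It assumes vmselect is reachable at the hypothetical address `vmselect-insight-victoria-metrics-k8s-stack:8481` with tenant `accountID` 0, and that the VictoriaMetrics components' own metrics are stored in the same cluster; adjust both to your installation.

```python
import json
import urllib.parse
import urllib.request

# Assumption: hypothetical vmselect address and tenant (accountID 0).
# Adjust the service name, namespace, port and tenant to match your installation.
VMSELECT_QUERY_URL = (
    "http://vmselect-insight-victoria-metrics-k8s-stack:8481/select/0/prometheus/api/v1/query"
)

# The two ingestion-rate queries from this section.
INSERTED_RATE = 'sum(rate(vm_rows_inserted_total{job="vminsert-insight-victoria-metrics-k8s-stack"}[1m]))'
READ_BY_STORAGE_RATE = 'sum(rate(vm_vminsert_metrics_read_total{job="vmstorage-insight-victoria-metrics-k8s-stack"}[1m]))'


def instant_query(url: str, expr: str) -> float:
    """Run a PromQL instant query and return the value of the first sample (0.0 if empty)."""
    full_url = url + "?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(full_url, timeout=10) as resp:
        body = json.load(resp)
    result = body["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0


def relative_difference(reference: float, other: float) -> float:
    """Relative difference of `other` against `reference` (0.05 means 5%)."""
    if reference == 0:
        return float("inf")
    return abs(other - reference) / reference


def compare_ingestion_rates(url: str = VMSELECT_QUERY_URL) -> float:
    """Query both rates and return how far the vmstorage-side rate deviates from the vminsert one."""
    inserted = instant_query(url, INSERTED_RATE)
    read_by_storage = instant_query(url, READ_BY_STORAGE_RATE)
    return relative_difference(inserted, read_by_storage)
```

Calling `compare_ingestion_rates()` from a Pod inside the cluster returns the relative gap between the two rates. With a replication factor greater than 1, the vmstorage-side rate would be expected to be roughly that many times higher, so interpret the result accordingly.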
Calculation Methods
Resource Usage Formulas
| Component | Resource | Formula | Notes |
|---|---|---|---|
| vminsert | CPU (cores) | `(ingestion_rate / 100k + 0.07) * 2` | |
| vminsert | Ingress bandwidth (KB/s) | `60 * ingestion_rate / 1k + 120` | |
| vminsert | Outgoing bandwidth (KB/s) | `20 * ingestion_rate / 1k + 40` | Approximately ⅓ of ingress bandwidth |
| vmstorage | CPU (cores) | `(2 * ingestion_rate / 100k - 0.02) * 2` | |
| vmstorage | Memory (MB) | `100 * ingestion_rate / 1k * 2` | |
| vmstorage | Ingress bandwidth (KB/s) | `30 * ingestion_rate / 1k` | |
Parameter explanations:
- `ingestion_rate` is the number of data points inserted per second; `1k` stands for 1,000 and `100k` for 100,000.
- The `* 2` multiplier in the CPU formulas reserves 50% idle CPU to reduce performance bottlenecks during peak load.
- The `* 2` multiplier in the memory formula reserves 50% idle memory to lower the risk of OOM (Out of Memory) kills during usage spikes.
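The formulas above translate directly into code. The following is a minimal sketch that applies them to a given ingestion rate; the function and key names are illustrative, not part of any VictoriaMetrics tooling.

```python
def estimate_resources(ingestion_rate: float) -> dict:
    """Apply the resource formulas; ingestion_rate is in data points per second."""
    k = ingestion_rate / 1_000      # rate expressed in units of 1k points/s
    hk = ingestion_rate / 100_000   # rate expressed in units of 100k points/s
    return {
        # The "* 2" factors keep 50% of the CPU/memory allocation idle as headroom.
        "vminsert_cpu_cores": (hk + 0.07) * 2,
        "vminsert_ingress_kbps": 60 * k + 120,
        "vminsert_egress_kbps": 20 * k + 40,
        "vmstorage_cpu_cores": (2 * hk - 0.02) * 2,
        "vmstorage_memory_mb": 100 * k * 2,
        "vmstorage_ingress_kbps": 30 * k,
    }


if __name__ == "__main__":
    for name, value in estimate_resources(25_000).items():
        print(f"{name}: {value:.2f}")
```

For 25k points/s the sketch yields 0.64 cores for vminsert, 0.96 cores and 5,000 MB for vmstorage, and 750 KB/s of vmstorage ingress. Since 5,000 MB / 1024 ≈ 4.9 GB, these values match the 300-Pod row of the reference table.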
Cluster Sizing
Based on current observations, the approximate relationship between cluster size and ingestion rate is (about 83 data points per second per Pod, consistent with the reference table below):
ingestion_rate ≈ (Pod count * 1000) / 12
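The reference table below pairs each Pod count with an ingestion rate of roughly 83 data points per second per Pod (for example, 100 Pods with about 8k points/s and 2,000 Pods with about 167k points/s). A minimal sketch of that conversion, assuming the per-Pod rate observed in these tests carries over to your cluster:

```python
# Assumption: ~83 points/s per stable Pod, derived from the reference table.
# Workloads exposing more or fewer metrics per Pod will differ.
POINTS_PER_POD_PER_SECOND = 1000 / 12


def ingestion_rate_from_pods(pod_count: int) -> float:
    """Rough ingestion rate (data points/s) for a cluster with pod_count stable Pods."""
    return pod_count * POINTS_PER_POD_PER_SECOND


if __name__ == "__main__":
    for pods in (100, 200, 300, 400, 500, 800, 1000, 2000):
        print(f"{pods:>5} Pods -> ~{ingestion_rate_from_pods(pods) / 1000:.1f}k points/s")
```

The printed estimates agree with the table's `ingestion_rate` column to within about 1k points/s. Measuring the real rate with the ingestion-rate query is more reliable than this estimate.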
Resource Planning Reference
The following table shows the approximate resource requirements for VictoriaMetrics components at different cluster sizes under Prometheus RemoteWrite scenarios:
| Cluster Size (Pod Count) | ingestion_rate (points/s) | vminsert CPU (cores) | vminsert Memory | vminsert Ingress Bandwidth | vminsert Outgoing Bandwidth | vmstorage CPU (cores) | vmstorage Memory | vmstorage Ingress Bandwidth |
|---|---|---|---|---|---|---|---|---|
| 100 | 8k | 0.3 | 160 MB | 600 KB/s | 200 KB/s | 0.28 | 1.6 GB | 240 KB/s |
| 200 | 17k | 0.48 | 340 MB | 1.1 MB/s | 380 KB/s | 0.64 | 3.3 GB | 510 KB/s |
| 300 | 25k | 0.64 | 500 MB | 1.6 MB/s | 540 KB/s | 0.96 | 4.9 GB | 750 KB/s |
| 400 | 34k | 0.82 | 500 MB | 2.1 MB/s | 720 KB/s | 1.32 | 6.7 GB | 1,020 KB/s |
| 500 | 42k | 0.98 | 500 MB | 2.6 MB/s | 880 KB/s | 1.64 | 8.2 GB | 1.3 MB/s |
| 800 | 67k | 1.48 | 500 MB | 4.1 MB/s | 1.4 MB/s | 2.64 | 13.1 GB | 2 MB/s |
| 1000 | 84k | 1.82 | 500 MB | 5.1 MB/s | 1.7 MB/s | 3.32 | 16.4 GB | 2.5 MB/s |
| 2000 | 167k | 3.48 | 500 MB | 10 MB/s | 3.3 MB/s | 6.64 | 32.6 GB | 4.9 MB/s |
Notes:
- The Pod count refers to the number of stably running Pods in clusters where `insight-agent` is installed. Frequent Pod restarts cause short-term metric spikes, which may require temporarily raising resources.
- The table shows the resource usage at which the components run stably at the given scale. For more precise sizing, monitor actual usage in your production environment before fine-tuning resource settings.
- Prometheus RemoteWrite ingestion mainly involves `vminsert` and `vmstorage`. There is currently no resource planning guideline for `vmselect`; size it based on query load.
Recommendations for Scaling Components
- Scaling out `vminsert` replicas increases ingestion capacity, since incoming data is distributed across more `vminsert` nodes.
- Scaling out `vmstorage` replicas, or giving each replica more CPU and memory, increases the number of active time series the cluster can handle. Adding replicas is preferred, as it improves query performance and cluster stability, especially under high churn rates.
- Increasing `vmselect` CPU and memory improves performance for complex queries, particularly those that process large numbers of time series or raw samples. Adding `vmselect` replicas improves query concurrency by distributing requests across more nodes.
- In the standard installation, VictoriaMetrics uses the default concurrency settings:
  - `vm_concurrent_insert_capacity` defaults to twice the value of `vm_available_cpu_cores{job="vminsert-insight-victoria-metrics-k8s-stack"}`.
  - `vm_concurrent_select_capacity` defaults to twice the value of `vm_available_cpu_cores{job="vmselect-insight-victoria-metrics-k8s-stack"}`.
- If CPU resources are insufficient, `vm_concurrent_insert_current` or `vm_concurrent_select_current` may reach its capacity. In that case, you can raise the limits by adding the following `extraArgs` to the VMCluster CR:
```yaml
# Fragment of the VMCluster CR
spec:
  vminsert:
    extraArgs:
      maxConcurrentInserts: "32"
  vmselect:
    extraArgs:
      search.maxConcurrentRequests: "16"
```
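Deciding when to raise these limits amounts to comparing each `*_current` metric with its `*_capacity` counterpart. The sketch below illustrates that comparison on values you have already read from those metrics. The 90% threshold and the helper names are hypothetical choices for illustration, not VictoriaMetrics settings.

```python
def default_capacity(available_cpu_cores: int) -> int:
    """Default concurrency capacity as described above: twice the available CPU cores."""
    return 2 * available_cpu_cores


def is_saturated(current: float, capacity: float, threshold: float = 0.9) -> bool:
    """True when current concurrency has reached `threshold` of the capacity.

    The 0.9 default is a hypothetical alerting level, not a VictoriaMetrics setting.
    """
    return capacity > 0 and current >= threshold * capacity


def check_components(samples: dict) -> list:
    """Return the components whose concurrency is close to its limit.

    `samples` maps a component name to a (current, capacity) pair, e.g. readings of
    vm_concurrent_insert_current / vm_concurrent_insert_capacity for vminsert.
    """
    return [name for name, (current, capacity) in samples.items() if is_saturated(current, capacity)]


if __name__ == "__main__":
    # Hypothetical readings for components with 4 available CPU cores (default capacity 8).
    readings = {
        "vminsert": (8, default_capacity(4)),
        "vmselect": (3, default_capacity(4)),
    }
    print(check_components(readings))  # ['vminsert']
```

A component that shows up here repeatedly is a candidate either for the `extraArgs` change above or for more CPU.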