VictoriaMetrics Resource Planning
When VictoriaMetrics receives data through Prometheus RemoteWrite, the resource consumption of its components depends on the number of containers in the cluster and the volume of metrics. This can lead to `vminsert` and `vmstorage` exceeding their allocated CPU and memory limits. To keep the VictoriaMetrics components stable across clusters of different sizes, adjust their resources according to the actual scale of the cluster.
Test Results
After continuously monitoring metrics written from Prometheus RemoteWrite to VictoriaMetrics, we observed that the resource usage of the VictoriaMetrics components is generally positively correlated with the ingestion rate, which in turn is positively correlated with the number of Pods.
- Ingestion rate (`ingestion_rate`): `sum(rate(vm_rows_inserted_total{job="vminsert-insight-victoria-metrics-k8s-stack"}[1m]))`
- Pod count: `sum(kube_pod_info)`
The ingestion rate query shows how many data points per second are inserted into `vminsert`; replication is not accounted for in this metric. If you need the ingestion rate including the replication factor, use:
sum(rate(vm_vminsert_metrics_read_total{job="vmstorage-insight-victoria-metrics-k8s-stack"}[1m]))
This query shows how many data points `vmstorage` reads from `vminsert`.
Long-term testing across multiple clusters shows that the difference between these two ingestion rates is minor. For simplicity, this document therefore uses the data points per second inserted into `vminsert` as the primary ingestion rate.
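To read these values programmatically, the queries above can be sent to a Prometheus-compatible query API. The following is a minimal sketch, not a definitive client: it only builds the instant-query URL and parses the JSON response, and leaves the HTTP request to the caller. The base URL is a placeholder and the helper names `build_query_url` and `parse_single_value` are hypothetical; the `/api/v1/query` path and the response layout follow the Prometheus HTTP API.

```python
import json
from urllib.parse import urlencode

# Queries copied from this section; adjust the job label to your installation.
INGESTION_RATE_QUERY = (
    "sum(rate(vm_rows_inserted_total"
    '{job="vminsert-insight-victoria-metrics-k8s-stack"}[1m]))'
)
POD_COUNT_QUERY = "sum(kube_pod_info)"


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for a Prometheus-compatible HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_single_value(body: str) -> float:
    """Extract the value from an instant-query response with exactly one series."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    result = payload["data"]["result"]
    if len(result) != 1:
        raise ValueError(f"expected exactly one series, got {len(result)}")
    # Each vector sample has the form [timestamp, "value-as-string"].
    return float(result[0]["value"][1])
```

Pass the generated URL to an HTTP client of your choice and feed the response body to `parse_single_value`.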
Calculation Methods
Resource Usage Formulas
Component | Resource | Formula | Notes |
---|---|---|---|
vminsert | CPU (cores) | (ingestion_rate / 100k + 0.07) * 2 | |
vminsert | Ingress bandwidth (KB/s) | 60 * ingestion_rate / 1k + 120 | |
vminsert | Outgoing bandwidth (KB/s) | 20 * ingestion_rate / 1k + 40 | Approximately ⅓ of ingress bandwidth |
vmstorage | CPU (cores) | (2 * ingestion_rate / 100k - 0.02) * 2 | |
vmstorage | Memory (MB) | 100 * ingestion_rate / 1k * 2 | |
vmstorage | Ingress bandwidth (KB/s) | 30 * ingestion_rate / 1k | |
Parameter explanations:
- The `* 2` multiplier in the CPU formula reserves 50% idle CPU to reduce performance bottlenecks during peak load.
- The `* 2` multiplier in the memory formula reserves 50% idle memory to lower the risk of OOM (Out of Memory) crashes during usage spikes.
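The formulas above can be sketched as a small helper that takes an ingestion rate in data points per second and returns the estimated resources. This is an illustrative transcription of the table only; the names `ResourceEstimate` and `estimate_resources` are hypothetical and not part of VictoriaMetrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceEstimate:
    vminsert_cpu_cores: float
    vminsert_ingress_kb_per_s: float
    vminsert_outgoing_kb_per_s: float
    vmstorage_cpu_cores: float
    vmstorage_memory_mb: float
    vmstorage_ingress_kb_per_s: float


def estimate_resources(ingestion_rate: float) -> ResourceEstimate:
    """Apply the resource usage formulas to an ingestion rate (data points/s)."""
    per_1k = ingestion_rate / 1_000
    per_100k = ingestion_rate / 100_000
    return ResourceEstimate(
        # The trailing "* 2" keeps 50% of the CPU idle as headroom.
        vminsert_cpu_cores=(per_100k + 0.07) * 2,
        vminsert_ingress_kb_per_s=60 * per_1k + 120,
        vminsert_outgoing_kb_per_s=20 * per_1k + 40,
        vmstorage_cpu_cores=(2 * per_100k - 0.02) * 2,
        # The trailing "* 2" keeps 50% of the memory idle as headroom.
        vmstorage_memory_mb=100 * per_1k * 2,
        vmstorage_ingress_kb_per_s=30 * per_1k,
    )
```

For example, `estimate_resources(8_000)` yields about 0.3 CPU cores for `vminsert` and 1600 MB of memory for `vmstorage`.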
Cluster Sizing
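Based on current observations, the approximate relationship between cluster size and ingestion rate is:
ingestion_rate = (Pod count * 1000) / 12
For example, 100 Pods correspond to roughly 8k data points per second, as in the first row of the resource planning reference table.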
Resource Planning Reference
The following table shows the approximate resource requirements of the VictoriaMetrics components at different cluster sizes under Prometheus RemoteWrite scenarios:
Cluster Size (Pod Count) | ingestion_rate | vminsert CPU (core) | vminsert Memory | vminsert Ingress Bandwidth | vminsert Outgoing Bandwidth | vmstorage CPU (core) | vmstorage Memory | vmstorage Ingress Bandwidth |
---|---|---|---|---|---|---|---|---|
100 | 8k | 0.3 | 160 MB | 600 KB/s | 200 KB/s | 0.28 | 1.6 GB | 240 KB/s |
200 | 17k | 0.48 | 340 MB | 1.1 MB/s | 380 KB/s | 0.64 | 3.3 GB | 510 KB/s |
300 | 25k | 0.64 | 500 MB | 1.6 MB/s | 540 KB/s | 0.96 | 4.9 GB | 750 KB/s |
400 | 34k | 0.82 | 500 MB | 2.1 MB/s | 720 KB/s | 1.32 | 6.7 GB | 1,020 KB/s |
500 | 42k | 0.98 | 500 MB | 2.6 MB/s | 880 KB/s | 1.64 | 8.2 GB | 1.3 MB/s |
800 | 67k | 1.48 | 500 MB | 4.1 MB/s | 1.4 MB/s | 2.64 | 13.1 GB | 2 MB/s |
1000 | 84k | 1.82 | 500 MB | 5.1 MB/s | 1.7 MB/s | 3.32 | 16.4 GB | 2.5 MB/s |
2000 | 167k | 3.48 | 500 MB | 10 MB/s | 3.3 MB/s | 6.64 | 32.6 GB | 4.9 MB/s |
Notes:
- The Pod count refers to the number of stably running Pods in clusters where `insight-agent` is installed. Frequent Pod restarts can cause short-term metric spikes that require temporary resource adjustments.
- The table shows the resource usage at which components run stably at the given scale. For more precise requirements, monitor actual usage in the production environment before fine-tuning resource settings.
- Prometheus RemoteWrite operations primarily involve `vminsert` and `vmstorage`. There is currently no resource planning guideline for `vmselect`; size it based on query load.
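When applying a row of the table to a deployment, the values usually have to be written as Kubernetes resource quantities. The sketch below shows one way to do the conversion, rounding up so the result never falls below the planned value. It rests on assumptions that are not stated in the table: the MB values are treated as mebibytes (`Mi`), and the function names are hypothetical. Where the strings go, for example a component's `resources` section in the CR, depends on your installation.

```python
import math


def cpu_quantity(cores: float) -> str:
    """Format CPU cores as a Kubernetes millicore quantity, rounded up."""
    # round() removes floating-point noise before ceil() rounds up.
    return f"{math.ceil(round(cores * 1000, 3))}m"


def memory_quantity(megabytes: float) -> str:
    """Format memory as a Kubernetes quantity in Mi, rounded up.

    Assumption: the MB values in the table are treated as mebibytes.
    """
    return f"{math.ceil(round(megabytes, 3))}Mi"


def to_resources(cpu_cores: float, memory_mb: float) -> dict:
    """Build a cpu/memory mapping for one component."""
    return {"cpu": cpu_quantity(cpu_cores), "memory": memory_quantity(memory_mb)}


# vmstorage at 100 Pods (first row of the table): 0.28 cores, about 1600 MB.
print(to_resources(0.28, 1600))
```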
Recommendations for Scaling Components
- Scaling out `vminsert` replicas increases ingestion capacity, as incoming data can be distributed across more `vminsert` nodes.
- Scaling out `vmstorage` replicas or adding more CPU and memory can increase the number of active time series it can handle. Replica scaling is preferred, as it improves query performance and cluster stability, especially under high churn rates.
- Increasing `vmselect` CPU and memory improves performance for complex queries, particularly those processing large numbers of time series or raw samples. Adding `vmselect` replicas improves query concurrency by distributing requests across more nodes.
- Under the standard installation mode, VictoriaMetrics uses its default configuration:
    - `vm_concurrent_insert_capacity` defaults to twice the value of `vm_available_cpu_cores{job="vminsert-insight-victoria-metrics-k8s-stack"}`.
    - `vm_concurrent_select_capacity` defaults to twice the value of `vm_available_cpu_cores{job="vmselect-insight-victoria-metrics-k8s-stack"}`.
    - If CPU resources are insufficient, `vm_concurrent_insert_current` or `vm_concurrent_select_current` may hit their limits. In that case, you can increase the limits in the CR by adding the following:
# In the vmcluster CR, under spec.vminsert.extraArgs:
maxConcurrentInserts: "32"
# In the vmcluster CR, under spec.vmselect.extraArgs:
search.maxConcurrentRequests: "16"
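Whether a limit needs raising can be judged by comparing the `*_current` metric with the matching `*_capacity` metric. The following is a minimal sketch of that comparison, assuming the values have already been read from the metrics listed above. The function names and the 0.9 threshold are hypothetical choices for illustration, and `default_capacity` only restates the "twice the CPU cores" default described in this section for whole core counts.

```python
def default_capacity(available_cpu_cores: int) -> int:
    """Default concurrency capacity as described above: twice the CPU cores."""
    return 2 * available_cpu_cores


def is_saturated(current: float, capacity: float, threshold: float = 0.9) -> bool:
    """Return True when current concurrency reaches threshold * capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return current / capacity >= threshold


# Example: a vminsert with 4 available cores and 8 concurrent inserts in flight.
capacity = default_capacity(4)
print(capacity, is_saturated(current=8, capacity=capacity))
```

If the check stays true for long periods, adding CPU or replicas addresses the cause, while raising the limit alone only allows more requests to queue on the same cores.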