
Prometheus Resource Planning

In practice, the CPU, memory, and other resource usage of Prometheus can exceed its configured resources, depending on the number of containers in the cluster and whether Istio is enabled.

To keep Prometheus running normally in clusters of different sizes, adjust its resources according to the actual size of the cluster.

Reference resource planning

When the service mesh is not enabled, test statistics show the following relationship between the system job metrics and the pod count: Series count = 800 * pod count

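When the service mesh is enabled, the pods additionally generate Istio-related metrics on the order of: Series count = 768 * pod count. These come on top of the 800 series per pod above, which is consistent with the roughly 1,568 series per pod listed in the mesh-enabled table below.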
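The two relationships above can be sketched as a small estimator. This is an illustrative sketch only: the function and constant names are hypothetical, and it assumes the Istio series are added on top of the 800 baseline series per pod, which matches the mesh-enabled table only approximately.

```python
# Per-pod series counts taken from the text above.
BASE_SERIES_PER_POD = 800    # system job metrics, service mesh not enabled
ISTIO_SERIES_PER_POD = 768   # extra Istio-related metrics when the mesh is enabled


def estimate_series(pod_count: int, mesh_enabled: bool = False) -> int:
    """Estimate the number of time series for a cluster of stable pods."""
    if pod_count < 0:
        raise ValueError("pod_count must be non-negative")
    per_pod = BASE_SERIES_PER_POD
    if mesh_enabled:
        # Assumption: Istio series are additive to the baseline series.
        per_pod += ISTIO_SERIES_PER_POD
    return pod_count * per_pod


if __name__ == "__main__":
    for pods in (100, 500, 1000):
        print(pods, estimate_series(pods), estimate_series(pods, mesh_enabled=True))
```

The estimate only covers pods that are running stably; a wave of restarts creates new series and temporarily pushes the real count higher.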

When the service mesh is not enabled

The following resource planning is recommended for Prometheus when the service mesh is not enabled:

| Cluster size (pod count) | Series count (service mesh not enabled) | CPU request (cores) | CPU limit (cores) | Memory request | Memory limit |
| --- | --- | --- | --- | --- | --- |
| 100 | 80,000 | 0.5 | 1 | 2 GB | 4 GB |
| 200 | 160,000 | 1 | 1.5 | 3 GB | 6 GB |
| 300 | 240,000 | 1 | 2 | 3 GB | 6 GB |
| 400 | 320,000 | 1 | 2 | 4 GB | 8 GB |
| 500 | 400,000 | 1.5 | 3 | 5 GB | 10 GB |
| 800 | 640,000 | 2 | 4 | 8 GB | 16 GB |
| 1000 | 800,000 | 2.5 | 5 | 9 GB | 18 GB |
| 2000 | 1,600,000 | 3.5 | 7 | 20 GB | 40 GB |
| 3000 | 2,400,000 | 4 | 8 | 33 GB | 66 GB |

When the service mesh feature is enabled

The following resource planning is recommended for Prometheus when the service mesh is enabled:

| Cluster size (pod count) | Series count (service mesh enabled) | CPU request (cores) | CPU limit (cores) | Memory request | Memory limit |
| --- | --- | --- | --- | --- | --- |
| 100 | 150,000 | 1 | 2 | 3 GB | 6 GB |
| 200 | 310,000 | 2 | 3 | 5 GB | 10 GB |
| 300 | 460,000 | 2 | 4 | 6 GB | 12 GB |
| 400 | 620,000 | 2 | 4 | 8 GB | 16 GB |
| 500 | 780,000 | 3 | 6 | 10 GB | 20 GB |
| 800 | 1,250,000 | 4 | 8 | 15 GB | 30 GB |
| 1000 | 1,560,000 | 5 | 10 | 18 GB | 36 GB |
| 2000 | 3,120,000 | 7 | 14 | 40 GB | 80 GB |
| 3000 | 4,680,000 | 8 | 16 | 65 GB | 130 GB |
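As a rough aid for reading the two tables, the sketch below encodes their rows and returns the tier for a given pod count. The names (`Tier`, `recommend`) are hypothetical, and rounding up to the next larger tier when the pod count falls between rows is an assumption of this sketch, not something the tables state.

```python
from bisect import bisect_left
from typing import NamedTuple


class Tier(NamedTuple):
    pods: int
    cpu_request: float   # cores
    cpu_limit: float     # cores
    mem_request_gb: int
    mem_limit_gb: int


# Rows copied from the table for clusters without the service mesh.
MESH_OFF = [
    Tier(100, 0.5, 1, 2, 4), Tier(200, 1, 1.5, 3, 6), Tier(300, 1, 2, 3, 6),
    Tier(400, 1, 2, 4, 8), Tier(500, 1.5, 3, 5, 10), Tier(800, 2, 4, 8, 16),
    Tier(1000, 2.5, 5, 9, 18), Tier(2000, 3.5, 7, 20, 40), Tier(3000, 4, 8, 33, 66),
]

# Rows copied from the table for clusters with the service mesh enabled.
MESH_ON = [
    Tier(100, 1, 2, 3, 6), Tier(200, 2, 3, 5, 10), Tier(300, 2, 4, 6, 12),
    Tier(400, 2, 4, 8, 16), Tier(500, 3, 6, 10, 20), Tier(800, 4, 8, 15, 30),
    Tier(1000, 5, 10, 18, 36), Tier(2000, 7, 14, 40, 80), Tier(3000, 8, 16, 65, 130),
]


def recommend(pod_count: int, mesh_enabled: bool = False) -> Tier:
    """Return the smallest table tier whose pod count covers pod_count."""
    tiers = MESH_ON if mesh_enabled else MESH_OFF
    # Assumption: between two rows, round up to the larger tier.
    idx = bisect_left([t.pods for t in tiers], pod_count)
    if idx == len(tiers):
        raise ValueError("pod count exceeds the largest tier in the table")
    return tiers[idx]


if __name__ == "__main__":
    print(recommend(250))
    print(recommend(1000, mesh_enabled=True))
```

Clusters larger than 3000 pods are outside the tables, so the sketch raises an error instead of extrapolating.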

Note

  1. Pod count in the tables refers to the number of pods running stably in the cluster. If a large number of pods restart, the series count rises sharply for a short period, and resources need to be adjusted accordingly.
  2. Prometheus keeps two hours of data in memory by default, and enabling the Remote Write function in the cluster consumes additional memory, so a resource overcommit ratio (limit to request) of 2 is recommended.
  3. The values in the tables are recommendations for general situations. If the environment has precise resource requirements, check the resource usage of the corresponding Prometheus after the cluster has been running for a period of time, and configure it accordingly.
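One way to check actual usage, as the last note suggests, is to read the number of in-memory series from Prometheus itself and compare it with the planning figures. The sketch below uses the Prometheus HTTP query API and the `prometheus_tsdb_head_series` metric. The base URL is a placeholder and the helper names are hypothetical; an environment that requires authentication would need more than this.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def parse_instant_values(payload: dict) -> list:
    """Extract the sample values from an instant-query (vector) response."""
    if payload.get("status") != "success":
        raise RuntimeError("query failed: %r" % payload.get("error"))
    # Each sample's value is a [timestamp, "string value"] pair.
    return [float(sample["value"][1]) for sample in payload["data"]["result"]]


def head_series(base_url: str) -> float:
    """Return the current number of series in the Prometheus head block."""
    url = build_query_url(base_url, "prometheus_tsdb_head_series")
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    return sum(parse_instant_values(payload))


if __name__ == "__main__":
    # Placeholder address; replace with the real Prometheus endpoint.
    print(head_series("http://prometheus.example:9090"))
```

If the measured series count stays well above the figure for the chosen table row, moving to the next row is a reasonable starting point before fine-tuning.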
