Prometheus Resource Planning
In practice, the CPU and memory usage of Prometheus depends on the number of containers in the cluster and on whether Istio is enabled. This usage can exceed the resources configured for Prometheus.
To keep Prometheus running normally in clusters of different sizes, adjust its resources to match the actual size of the cluster.
Reference resource planning
Without the service mesh, test measurements show that the number of system job series grows linearly with the pod count: Series count = 800 × pod count.
With the service mesh enabled, each pod also produces Istio-related series: Istio series count = 768 × pod count. These come on top of the 800 system series per pod.
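The two per-pod figures above can be turned into a quick estimate. The following is a minimal Python sketch. It assumes that the Istio series are added on top of the 800 system series per pod, which is what the mesh-enabled table implies. The function name `estimate_series` is hypothetical.

```python
# Per-pod series counts quoted in the text above. These are measured values,
# not constants defined by Prometheus itself.
SERIES_PER_POD_BASE = 800    # system job series, service mesh disabled
SERIES_PER_POD_ISTIO = 768   # extra Istio series per pod, service mesh enabled


def estimate_series(pod_count: int, mesh_enabled: bool) -> int:
    """Estimate the number of active time series for a given pod count."""
    if pod_count < 0:
        raise ValueError("pod_count must be non-negative")
    per_pod = SERIES_PER_POD_BASE
    if mesh_enabled:
        # Assumption: Istio series are additive to the base system series.
        per_pod += SERIES_PER_POD_ISTIO
    return per_pod * pod_count
```

For 1,000 pods this gives 800,000 series without the mesh and 1,568,000 with it. The mesh-enabled table lists a slightly rounded 1,560,000.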
When the service mesh is not enabled
The following Prometheus resources are recommended when the service mesh is not enabled:
Cluster size (pods) | Series count (mesh disabled) | CPU (cores) | Memory |
---|---|---|---|
100 | 80,000 | Request: 0.5 Limit: 1 | Request: 2GB Limit: 4GB |
200 | 160,000 | Request: 1 Limit: 1.5 | Request: 3GB Limit: 6GB |
300 | 240,000 | Request: 1 Limit: 2 | Request: 3GB Limit: 6GB |
400 | 320,000 | Request: 1 Limit: 2 | Request: 4GB Limit: 8GB |
500 | 400,000 | Request: 1.5 Limit: 3 | Request: 5GB Limit: 10GB |
800 | 640,000 | Request: 2 Limit: 4 | Request: 8GB Limit: 16GB |
1000 | 800,000 | Request: 2.5 Limit: 5 | Request: 9GB Limit: 18GB |
2000 | 1,600,000 | Request: 3.5 Limit: 7 | Request: 20GB Limit: 40GB |
3000 | 2,400,000 | Request: 4 Limit: 8 | Request: 33GB Limit: 66GB |
When the service mesh feature is enabled
The following Prometheus resources are recommended when the service mesh is enabled:
Cluster size (pods) | Series count (mesh enabled) | CPU (cores) | Memory |
---|---|---|---|
100 | 150,000 | Request: 1 Limit: 2 | Request: 3GB Limit: 6GB |
200 | 310,000 | Request: 2 Limit: 3 | Request: 5GB Limit: 10GB |
300 | 460,000 | Request: 2 Limit: 4 | Request: 6GB Limit: 12GB |
400 | 620,000 | Request: 2 Limit: 4 | Request: 8GB Limit: 16GB |
500 | 780,000 | Request: 3 Limit: 6 | Request: 10GB Limit: 20GB |
800 | 1,250,000 | Request: 4 Limit: 8 | Request: 15GB Limit: 30GB |
1000 | 1,560,000 | Request: 5 Limit: 10 | Request: 18GB Limit: 36GB |
2000 | 3,120,000 | Request: 7 Limit: 14 | Request: 40GB Limit: 80GB |
3000 | 4,680,000 | Request: 8 Limit: 16 | Request: 65GB Limit: 130GB |
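The tables can be applied mechanically by rounding the pod count up to the next listed cluster size. The sketch below encodes both tables in Python. The names `Tier`, `NO_MESH`, `WITH_MESH` and `pick_tier` are hypothetical. Rounding up, rather than interpolating between rows, is an assumption chosen to err on the side of more resources.

```python
from typing import List, NamedTuple, Optional


class Tier(NamedTuple):
    max_pods: int          # cluster size (stable pod count) for this row
    cpu_request: float     # cores
    cpu_limit: float       # cores
    mem_request_gb: int
    mem_limit_gb: int


# Rows copied from the mesh-disabled table above.
NO_MESH: List[Tier] = [
    Tier(100, 0.5, 1, 2, 4),
    Tier(200, 1, 1.5, 3, 6),
    Tier(300, 1, 2, 3, 6),
    Tier(400, 1, 2, 4, 8),
    Tier(500, 1.5, 3, 5, 10),
    Tier(800, 2, 4, 8, 16),
    Tier(1000, 2.5, 5, 9, 18),
    Tier(2000, 3.5, 7, 20, 40),
    Tier(3000, 4, 8, 33, 66),
]

# Rows copied from the mesh-enabled table above.
WITH_MESH: List[Tier] = [
    Tier(100, 1, 2, 3, 6),
    Tier(200, 2, 3, 5, 10),
    Tier(300, 2, 4, 6, 12),
    Tier(400, 2, 4, 8, 16),
    Tier(500, 3, 6, 10, 20),
    Tier(800, 4, 8, 15, 30),
    Tier(1000, 5, 10, 18, 36),
    Tier(2000, 7, 14, 40, 80),
    Tier(3000, 8, 16, 65, 130),
]


def pick_tier(pod_count: int, mesh_enabled: bool) -> Optional[Tier]:
    """Return the smallest row whose cluster size covers pod_count.

    Returns None when the cluster is larger than the largest listed size.
    """
    table = WITH_MESH if mesh_enabled else NO_MESH
    for tier in table:  # rows are sorted by ascending cluster size
        if pod_count <= tier.max_pods:
            return tier
    return None
```

For example, `pick_tier(250, mesh_enabled=False)` returns the 300-pod row: CPU request 1 and limit 2, memory request 3 GB and limit 6 GB. Clusters larger than 3,000 pods fall outside the tables. In that case the function returns `None`, and the sizing has to come from measurement instead.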
Note
- The pod count in the tables refers to the number of pods running stably in the cluster. If a large number of pods restart, the series count rises sharply for a short time, and resources must be adjusted accordingly.
- By default, Prometheus keeps roughly the most recent two hours of data in memory. Enabling the Remote Write function in the cluster also consumes additional memory. It is therefore recommended to set the limit-to-request ratio to 2 (limit = 2 × request), as most rows in the tables above do.
- The values in the tables are recommendations for typical scenarios. If your environment has precise resource requirements, let the cluster run for a while. Then check the actual resource usage of the corresponding Prometheus instance and configure it accordingly.
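The measurement step, combined with the 2× limit-to-request ratio, can be sketched against the Prometheus HTTP API (`GET /api/v1/query`) using only the Python standard library. The metrics `prometheus_tsdb_head_series`, `process_resident_memory_bytes` and `process_cpu_seconds_total` are exposed by Prometheus itself.

Several details are assumptions to adapt to your deployment:
- the server address `http://localhost:9090`
- the `job="prometheus"` label
- the one-day window
- the hypothetical sizing rule "request = observed peak, rounded up"
- mapping GB to the Kubernetes `G` suffix

```python
import json
import math
import urllib.parse
import urllib.request


def extract_scalar(response: dict) -> float:
    """Return the value of the first sample in an instant-query vector result."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    result = response["data"]["result"]
    if not result:
        raise LookupError("query returned no samples")
    # Each sample's value is a [unix_timestamp, "string_value"] pair.
    return float(result[0]["value"][1])


def query_instant(base_url: str, promql: str, timeout: float = 10.0) -> float:
    """Run an instant query through the Prometheus HTTP API (GET /api/v1/query)."""
    query = urllib.parse.urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}/api/v1/query?{query}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_scalar(json.load(resp))


def suggest_resources(peak_cpu_cores: float, peak_memory_bytes: float,
                      ratio: float = 2.0) -> dict:
    """Hypothetical sizing rule: request = observed peak rounded up,
    limit = ratio * request (the note above recommends a ratio of 2)."""
    cpu_request = math.ceil(peak_cpu_cores * 2) / 2   # round up to 0.5 core
    mem_request = math.ceil(peak_memory_bytes / 1e9)  # round up to whole GB
    return {
        "requests": {"cpu": f"{cpu_request:g}", "memory": f"{mem_request}G"},
        "limits": {"cpu": f"{cpu_request * ratio:g}",
                   "memory": f"{mem_request * ratio:g}G"},
    }


def report(base_url: str = "http://localhost:9090",
           job: str = 'job="prometheus"') -> dict:
    """Query one day of peak usage and return the series count plus a
    suggested resources block. Address and job label are assumptions."""
    sel = "{" + job + "}"
    series = query_instant(
        base_url, f"max_over_time(prometheus_tsdb_head_series{sel}[1d])")
    mem = query_instant(
        base_url, f"max_over_time(process_resident_memory_bytes{sel}[1d])")
    # Subquery: highest 5-minute CPU rate observed over the last day.
    cpu = query_instant(
        base_url, f"max_over_time(rate(process_cpu_seconds_total{sel}[5m])[1d:5m])")
    return {"peak_head_series": series, "resources": suggest_resources(cpu, mem)}
```

Against a reachable server, `report()` returns the peak series count together with a suggested `resources` block. For example, an observed peak of 0.8 cores and 7.3 GB yields requests of 1 core and `8G`, with limits of 2 cores and `16G`.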