Monitoring metrics

This page explains how to enable Prometheus monitoring metrics in Calico so that the metrics of its components can be collected.
Enable component metrics
When deploying through kubespray, you can control whether metrics are enabled with the calico_felix_prometheusmetricsenabled parameter (false by default), or enable them manually in the following ways:
- Enable calico_felix_prometheusmetricsenabled (Felix metrics); see the sketch after this list for example patch commands.

- Enable calico_kube_controller_metrics (kube-controllers metrics):

  ```bash
  calicoctl patch kubecontrollersconfiguration default --patch '{"spec":{"prometheusMetricsPort": 9095}}'
  ```

  or apply the equivalent merge patch with kubectl, following the same pattern shown below.
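The Felix-side patch commands are sketched below, following the upstream Calico pattern; this is a minimal sketch that assumes a FelixConfiguration resource named default and that calicoctl (or kubectl access to the Calico resources) is available.

```bash
# Enable Felix Prometheus metrics (served on port 9091 by default)
calicoctl patch felixconfiguration default --patch '{"spec":{"prometheusMetricsEnabled": true}}'

# Equivalent merge patch using kubectl instead of calicoctl
kubectl patch felixconfiguration default --type merge --patch '{"spec":{"prometheusMetricsEnabled": true}}'
```

The same --type merge pattern can be used with kubectl for the kubecontrollersconfiguration patch above. When deploying through kubespray, setting calico_felix_prometheusmetricsenabled: true in the inventory group vars typically has the same effect for Felix.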
Create the metrics Services for the respective components
calico-node-metrics Service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: calico-node-metrics
  namespace: kube-system
  labels:
    app: calico-node
    role: metrics
spec:
  clusterIP: None
  selector:
    k8s-app: calico-node
  ports:
    - port: 9091
      name: metrics
      targetPort: 9091
```
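To confirm that Felix is actually serving metrics before wiring up the Service, a minimal check (assuming shell access to a node running calico-node, the default port 9091, and the default bind address of all interfaces) might be:

```bash
# Run on a node where calico-node is running; Felix listens on port 9091 by default
curl -s http://localhost:9091/metrics | grep -m 5 '^felix_'
```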
calico-kube-controllers-metrics Service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: calico-kube-controllers-metrics
  namespace: kube-system
  labels:
    app: calico-kube-controllers
    role: metrics
spec:
  clusterIP: None
  selector:
    k8s-app: calico-kube-controllers
  ports:
    - port: 9095
      name: metrics
      targetPort: 9095
```
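Both Services can be applied with kubectl; for example, assuming the manifests above are saved as calico-node-metrics.yaml and calico-kube-controllers-metrics.yaml (file names chosen here for illustration):

```bash
kubectl apply -f calico-node-metrics.yaml
kubectl apply -f calico-kube-controllers-metrics.yaml

# These are headless Services (clusterIP: None); confirm the pod endpoints were populated
kubectl get endpoints -n kube-system calico-node-metrics calico-kube-controllers-metrics
```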
Create ServiceMonitor objects
calico-node ServiceMonitor:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: calico-node
  namespace: kube-system
  labels:
    app: calico-node
    operator.insight.io/managed-by: insight
spec:
  endpoints:
    - interval: 30s
      port: metrics
  selector:
    matchLabels:
      app: calico-node
      role: metrics
```
calico-kube-controllers-metrics ServiceMonitor:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: calico-kube-controller
  namespace: kube-system
  labels:
    app: calico-kube-controller
    operator.insight.io/managed-by: insight
spec:
  endpoints:
    - interval: 30s
      port: metrics
  selector:
    matchLabels:
      app: calico-kube-controllers
      role: metrics
```
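The Prometheus instance managed by the operator only scrapes these targets if it is configured to select ServiceMonitors carrying the operator.insight.io/managed-by: insight label and to watch the kube-system namespace; whether that holds depends on your Insight/Prometheus Operator setup. Assuming the monitoring.coreos.com CRDs are installed, a quick existence check is:

```bash
kubectl get servicemonitors -n kube-system -l operator.insight.io/managed-by=insight
```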
List of important metrics
No. | Metric | Description | Importance | Collection recommendation | Related issues |
---|---|---|---|---|---|
1 | felix_ipset_errors | Number of failed ipset-restore executions | ** | Consider collecting | A single increase does not necessarily indicate a problem |
2 | felix_iptables_restore_calls | Number of iptables-restore executions | ***** | Collect | NA |
3 | felix_iptables_restore_errors | Number of failed iptables-restore executions | ***** | Collect | A failed restore can break Pod connectivity; failures are often caused by contention for the xtables lock, so check whether the number of iptables rules on the host is too large |
4 | felix_iptables_save_calls | Number of iptables-save executions | **** | Consider collecting | NA |
5 | felix_iptables_save_errors | Number of failed iptables-save executions | ***** | Collect | |
6 | felix_log_errors | Number of errors reported in the logs | ***** | Recommended to collect | |
7 | ipam_allocations_per_node | Number of IP allocations on each node | **** | Recommended to collect | |
8 | ipam_blocks_per_node | Number of IPAM blocks allocated on each node | ***** | Collect | |
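As an example of how these metrics might be consumed, below is a hedged PrometheusRule sketch that alerts on iptables-restore failures (metric No. 3). The object name, labels, window, and threshold are illustrative assumptions, not part of this page.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: calico-felix-rules                      # hypothetical name
  namespace: kube-system
  labels:
    operator.insight.io/managed-by: insight     # assumed to match your Prometheus rule selector
spec:
  groups:
    - name: calico-felix
      rules:
        - alert: FelixIptablesRestoreErrors
          # felix_iptables_restore_errors is a counter; alert if it increased over the last 15 minutes
          expr: increase(felix_iptables_restore_errors[15m]) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "iptables-restore failures on {{ $labels.instance }}"
            description: "Failed iptables-restore calls can break Pod connectivity; check xtables lock contention and the number of iptables rules on the node."
```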