Introduction to failover
When a cluster fails, the pod replicas in the cluster will be automatically migrated to other available clusters to ensure service stability.
Prerequisites
Failover takes effect only when the scheduling policy of the multicloud workload is set to aggregated mode or dynamic weight mode.
Enabling failover
- Enter the Multicloud Management module and click System Settings -> Advanced Settings. Failover, which reschedules replicas across multiple clusters, is disabled by default; enable it if needed.
- Click to enable failover, set the following cluster-level parameters, and save.
| Parameter | Description | Field Name | Default Value |
| --- | --- | --- | --- |
| ClusterMonitorPeriod | Interval for checking cluster status | Check Interval | 60s |
| ClusterMonitorGracePeriod | If the cluster health status is not obtained within this configured time during runtime, the cluster will be marked as unhealthy | The runtime marks the duration of an unhealthy check | 40s |
| ClusterStartupGracePeriod | If the cluster health status is not obtained within this configured time at startup, the cluster will be marked as unhealthy | Mark health check duration at startup | 600s |
| FailoverEvictionTimeout | After a cluster is marked as unhealthy, it will be tainted and enter eviction state if this duration is exceeded (cluster will be tainted with eviction) | Eviction tolerance time | 30s |
| ClusterTaintEvictionRetryFrequency | Maximum waiting duration after entering the graceful eviction queue, after which immediate deletion will occur | Graceful eviction timeout duration | 5s |
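To make the interaction of these timers concrete, here is a minimal sketch that estimates when a cluster that stops reporting its health would be marked unhealthy and then tainted for eviction, using the default values from the table. It is not the product's implementation: the names `FailoverTimers` and `failover_timeline` are hypothetical, and the model is a simplifying assumption that ignores the polling granularity of ClusterMonitorPeriod and any controller processing delay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverTimers:
    """Default values from the parameter table, in seconds (hypothetical container)."""
    cluster_monitor_period: int = 60
    cluster_monitor_grace_period: int = 40
    cluster_startup_grace_period: int = 600
    failover_eviction_timeout: int = 30


def failover_timeline(timers: FailoverTimers, at_startup: bool = False) -> dict:
    """Return the seconds after which a silent cluster is marked unhealthy
    and then tainted for eviction.

    Assumed model: the cluster is marked unhealthy once the applicable
    grace period elapses without a health status, and is tainted once
    failover_eviction_timeout more seconds pass without recovery.
    """
    grace = (
        timers.cluster_startup_grace_period
        if at_startup
        else timers.cluster_monitor_grace_period
    )
    marked_unhealthy = grace
    tainted = marked_unhealthy + timers.failover_eviction_timeout
    return {"marked_unhealthy_s": marked_unhealthy, "tainted_s": tainted}


if __name__ == "__main__":
    print(failover_timeline(FailoverTimers()))
    print(failover_timeline(FailoverTimers(), at_startup=True))
```

Under this model and the defaults, a running cluster would be tainted roughly 70 seconds after its last health status (40s + 30s), and a cluster that never reports at startup after roughly 630 seconds. Actual timing in a live environment may differ.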
Verifying failover
- Create a multicloud deployment, choose to deploy it on multiple clusters, and select Aggregated or DynamicWeight mode as the scheduling policy.
- If a cluster becomes unhealthy and does not recover within the configured time, it is tainted and enters the eviction state (this document taints a cluster manually).
- The deployment's Pods are then migrated to the remaining clusters according to their available resources. Eventually no Pods remain in the unhealthy (tainted) cluster.
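The migration step can be illustrated with a small sketch of how evicted replicas might be split across the remaining clusters in proportion to their available capacity, which is roughly the idea behind a dynamic weight policy. This is an assumption for illustration only: the function `redistribute`, the cluster names, and the capacity figures are hypothetical, and the actual scheduler may weigh clusters differently (an aggregated policy, for instance, would favor packing replicas into as few clusters as possible).

```python
def redistribute(replicas: int, available: dict[str, int]) -> dict[str, int]:
    """Split `replicas` across clusters in proportion to available capacity.

    Uses largest-remainder rounding so the parts always sum to `replicas`.
    `available` maps a cluster name to how many more replicas it can host.
    """
    total = sum(available.values())
    if total <= 0:
        raise ValueError("no capacity available in the remaining clusters")

    result: dict[str, int] = {}
    remainders: dict[str, int] = {}
    for name, capacity in available.items():
        # Integer arithmetic avoids floating-point rounding surprises.
        result[name], remainders[name] = divmod(replicas * capacity, total)

    # Hand leftover replicas to the clusters with the largest remainders.
    leftover = replicas - sum(result.values())
    by_remainder = sorted(remainders, key=lambda n: remainders[n], reverse=True)
    for name in by_remainder[:leftover]:
        result[name] += 1
    return result


if __name__ == "__main__":
    # Five replicas evicted from a tainted cluster; two healthy clusters remain.
    print(redistribute(5, {"cluster-b": 6, "cluster-c": 4}))
```

With the hypothetical capacities above, three replicas go to `cluster-b` and two to `cluster-c`, leaving none in the tainted cluster.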