Replace the first master node of the worker cluster
This page uses a highly available worker cluster with three master nodes as an example to show how to replace or reintroduce the first master node when it fails or malfunctions. The cluster has the following master nodes:
- node1 (172.30.41.161)
- node2 (172.30.41.162)
- node3 (172.30.41.163)
Assuming node1 is down, the following steps explain how to reintroduce the recovered node1 into the worker cluster.
Preparations
Before performing the replacement, obtain basic information about the cluster resources; you will need it when modifying the related configurations.
Note
Run the following commands for obtaining cluster resource information in the management cluster.
- Get the cluster name: find the clusters.kubean.io resource corresponding to the cluster.
- Get the host list ConfigMap of the cluster.
- Get the configuration parameters ConfigMap of the cluster.
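The three lookups above can be scripted roughly as follows. This is a minimal sketch that assumes the Cluster resource is named cluster-mini-1 (the name used in the examples on this page) and that its spec references the two ConfigMaps through fields named hostsConfRef and varsConfRef; those field names are an assumption, so confirm them with kubectl explain clusters.kubean.io.spec before relying on them.

```shell
# Sketch only: run against the management cluster.
# Assumption: the kubean Cluster spec exposes hostsConfRef and varsConfRef.

# List kubean Cluster resources to find the one that backs the worker cluster
list_kubean_clusters() {
  kubectl get clusters.kubean.io
}

# Print the name/namespace of the host list ConfigMap referenced by cluster $1
get_hosts_conf_ref() {
  kubectl get clusters.kubean.io "$1" -o jsonpath='{.spec.hostsConfRef}{"\n"}'
}

# Print the name/namespace of the configuration parameters ConfigMap referenced by cluster $1
get_vars_conf_ref() {
  kubectl get clusters.kubean.io "$1" -o jsonpath='{.spec.varsConfRef}{"\n"}'
}
```

For example, get_hosts_conf_ref cluster-mini-1 would, in this example setup, presumably point at the mini-1-hosts-conf ConfigMap in kubean-system that is edited in the first step.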
Steps
- Adjust the order of control plane nodes
Reset node1 to the state it was in before the cluster was installed (or use a new node), and make sure node1 keeps its network connectivity.
In the host list, move node1 to the end of the kube_control_plane, kube_node, and etcd groups (node1/node2/node3 -> node2/node3/node1):
function change_control_plane_order() {
  cat << EOF | kubectl apply -f -
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: mini-1-hosts-conf
  namespace: kubean-system
data:
  hosts.yml: |
    all:
      hosts:
        node1:
          ip: "172.30.41.161"
          access_ip: "172.30.41.161"
          ansible_host: "172.30.41.161"
          ansible_connection: ssh
          ansible_user: root
          ansible_password: dangerous
        node2:
          ip: "172.30.41.162"
          access_ip: "172.30.41.162"
          ansible_host: "172.30.41.162"
          ansible_connection: ssh
          ansible_user: root
          ansible_password: dangerous
        node3:
          ip: "172.30.41.163"
          access_ip: "172.30.41.163"
          ansible_host: "172.30.41.163"
          ansible_connection: ssh
          ansible_user: root
          ansible_password: dangerous
      children:
        kube_control_plane:
          hosts:
            node2:
            node3:
            node1:
        kube_node:
          hosts:
            node2:
            node3:
            node1:
        etcd:
          hosts:
            node2:
            node3:
            node1:
        k8s_cluster:
          children:
            kube_control_plane:
            kube_node:
        calico_rr:
          hosts: {}
EOF
}

change_control_plane_order
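To confirm the new order after applying the ConfigMap, you can read hosts.yml back and list the members of each group. The sketch below assumes the ConfigMap name and namespace from the example above (mini-1-hosts-conf in kubean-system); group_hosts is a hypothetical helper that only understands the indentation-based layout shown in that example, not arbitrary YAML.

```shell
# Print hosts.yml from the host list ConfigMap (management cluster).
show_hosts_yml() {
  kubectl -n kubean-system get cm mini-1-hosts-conf -o jsonpath='{.data.hosts\.yml}'
}

# Hypothetical helper: read hosts.yml on stdin and print, in order, the hosts
# listed under "<group>: hosts:". It relies on indentation only, so it is a
# quick check for the layout above, not a general YAML parser.
group_hosts() {
  awk -v group="$1" '
    function indent(line) { match(line, /^ */); return RLENGTH }
    $1 == group ":" { gi = indent($0); state = 1; next }
    state == 1 && indent($0) <= gi { state = 0 }
    state == 1 && $1 == "hosts:" { hi = indent($0); state = 2; next }
    state == 2 && indent($0) <= hi { state = 0; next }
    state == 2 { sub(/:.*/, "", $1); print $1 }
  '
}
```

For example, show_hosts_yml | group_hosts kube_control_plane should print node2, node3 and node1 on separate lines once the reordered ConfigMap has been applied.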
- Remove the first master node in an abnormal state
After reordering the nodes in the host list, remove the abnormal node1 from the K8s control plane.
Note
If node1 is offline or malfunctioning, extraArgs must include -e reset_nodes=false -e allow_ungraceful_removal=true; omit these two parameters when node1 is online.
# An accelerator (mirror) address can be used for the spray-job image here
SPRAY_IMG_ADDR="ghcr.m.daocloud.io/kubean-io/spray-job"
SPRAY_RLS_2_22_TAG="2.22-336b323"
KUBE_VERSION="v1.24.14"
CLUSTER_NAME="cluster-mini-1"
REMOVE_NODE_NAME="node1"

cat << EOF | kubectl apply -f -
---
apiVersion: kubean.io/v1alpha1
kind: ClusterOperation
metadata:
  name: cluster-mini-1-remove-node-ops
spec:
  cluster: ${CLUSTER_NAME}
  image: ${SPRAY_IMG_ADDR}:${SPRAY_RLS_2_22_TAG}
  actionType: playbook
  action: remove-node.yml
  extraArgs: -e node=${REMOVE_NODE_NAME} -e reset_nodes=false -e allow_ungraceful_removal=true -e kube_version=${KUBE_VERSION}
  postHook:
    - actionType: playbook
      action: cluster-info.yml
EOF
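Because the two ungraceful-removal parameters are only needed when node1 cannot be reached, you may prefer to assemble extraArgs from the node's state rather than edit the YAML by hand. The function below is a hypothetical helper, not part of kubean; it only reproduces the parameter string used in the ClusterOperation above.

```shell
# Hypothetical helper: build the extraArgs string for remove-node.yml.
#   $1 node name, $2 Kubernetes version, $3 "online" or "offline"
build_remove_node_args() {
  local node="$1" version="$2" state="$3" args
  args="-e node=${node}"
  if [ "$state" = "offline" ]; then
    # Only needed when the node is offline or malfunctioning
    args="${args} -e reset_nodes=false -e allow_ungraceful_removal=true"
  fi
  printf '%s\n' "${args} -e kube_version=${version}"
}
```

For example, REMOVE_ARGS=$(build_remove_node_args node1 v1.24.14 offline) yields the same string as the extraArgs line above, which could then be substituted into the heredoc as extraArgs: ${REMOVE_ARGS}.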
- Manually modify the cluster configuration: edit and update cluster-info
# Edit cluster-info
kubectl -n kube-public edit cm cluster-info

# 1. If the ca.crt certificate has been updated, update the certificate-authority-data field accordingly.
#    Get the base64 encoding of the CA certificate:
cat /etc/kubernetes/ssl/ca.crt | base64 | tr -d '\n'

# 2. Change the IP address in the server field to the IP of the new first master.
#    This page uses the IP address of node2, 172.30.41.162.
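Before and after editing cluster-info, it can help to check which API server address and CA the ConfigMap currently advertises. The sketch below assumes the kubeconfig is stored under the kubeconfig key of cluster-info (the standard kubeadm layout); replace_server_host and ca_matches are hypothetical helpers for previewing the edit, and the actual change is still made with kubectl edit as described above.

```shell
# Print the server and CA lines currently stored in cluster-info
# (run against the worker cluster).
show_cluster_info() {
  kubectl -n kube-public get cm cluster-info -o jsonpath='{.data.kubeconfig}' |
    grep -E 'server:|certificate-authority-data:'
}

# Hypothetical helper: rewrite the host in a "server: https://<host>:<port>" line
# read on stdin, keeping the scheme and port.
replace_server_host() {
  sed -E "s#^([[:space:]]*server: https://)[^:/]+#\1$1#"
}

# Hypothetical helper: succeed if the kubeconfig on stdin already carries the
# base64 encoding of the CA file given as $1.
ca_matches() {
  local want
  want=$(base64 < "$1" | tr -d '\n')
  grep -qF "certificate-authority-data: ${want}"
}
```

For example, piping the kubeconfig from cluster-info into replace_server_host 172.30.41.162 previews the new server line, and piping it into ca_matches /etc/kubernetes/ssl/ca.crt tells you whether certificate-authority-data needs updating at all.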
- Manually modify the cluster configuration: edit and update kubeadm-config
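This step is not spelled out in detail here. Presumably the field to update is controlPlaneEndpoint in the ClusterConfiguration stored in the kubeadm-config ConfigMap, which likely still points at node1; confirm the field in your own kubeadm-config before editing. The sketch below reads that value and previews a rewrite to node2's IP; the helper names are hypothetical, and the actual change would still be made with kubectl -n kube-system edit cm kubeadm-config.

```shell
# Print the ClusterConfiguration stored in kubeadm-config (worker cluster).
show_cluster_configuration() {
  kubectl -n kube-system get cm kubeadm-config -o jsonpath='{.data.ClusterConfiguration}'
}

# Hypothetical helper: print the controlPlaneEndpoint value from a
# ClusterConfiguration read on stdin.
get_control_plane_endpoint() {
  awk '$1 == "controlPlaneEndpoint:" { print $2 }'
}

# Hypothetical helper: replace the host part of controlPlaneEndpoint with $1,
# keeping the port.
replace_endpoint_host() {
  sed -E "s#^([[:space:]]*controlPlaneEndpoint: )[^:]+#\1$1#"
}
```

For example, show_cluster_configuration | replace_endpoint_host 172.30.41.162 | get_control_plane_endpoint shows what the endpoint would become without changing anything in the cluster.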
- Scale up the master node and update the cluster
Note
- Use --limit to restrict the update to the etcd and kube_control_plane node groups.
- In an offline environment, add enable-repo.yml to spec.preHook and set the correct repo_list for the relevant OS in its extraArgs.
cat << EOF | kubectl apply -f -
---
apiVersion: kubean.io/v1alpha1
kind: ClusterOperation
metadata:
  name: cluster-mini-1-update-cluster-ops
spec:
  cluster: ${CLUSTER_NAME}
  image: ${SPRAY_IMG_ADDR}:${SPRAY_RLS_2_22_TAG}
  actionType: playbook
  action: cluster.yml
  extraArgs: --limit=etcd,kube_control_plane -e kube_version=${KUBE_VERSION}
  preHook:
    - actionType: playbook
      action: enable-repo.yml
      # This hook is only needed in an offline environment. Set the correct
      # repo_list (used to install operating system packages); the values
      # below are for reference only.
      extraArgs: |
        -e "{repo_list: ['http://172.30.41.0:9000/kubean/centos/\$releasever/os/\$basearch','http://172.30.41.0:9000/kubean/centos-iso/\$releasever/os/\$basearch']}"
  postHook:
    - actionType: playbook
      action: cluster-info.yml
EOF
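Both ClusterOperations run asynchronously after kubectl apply returns, so you may want to wait for each one to finish before moving on. The loop below is a sketch that assumes the operation's phase is reported at .status.status with the values Succeeded and Failed; check the output of kubectl get clusteroperations.kubean.io <name> -o yaml on your kubean version before relying on it.

```shell
# Poll a kubean ClusterOperation until it reports Succeeded or Failed.
# Assumption: the phase is exposed at .status.status.
#   $1 operation name, $2 timeout in seconds (default 3600), $3 poll interval (default 10)
# Returns 0 on success, 1 on failure, 2 on timeout.
wait_cluster_operation() {
  local name="$1" timeout="${2:-3600}" interval="${3:-10}" waited=0 phase
  while [ "$waited" -lt "$timeout" ]; do
    phase=$(kubectl get clusteroperations.kubean.io "$name" -o jsonpath='{.status.status}' 2>/dev/null || true)
    case "$phase" in
      Succeeded) echo "$name succeeded"; return 0 ;;
      Failed) echo "$name failed" >&2; return 1 ;;
    esac
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out waiting for $name" >&2
  return 2
}
```

For example, wait_cluster_operation cluster-mini-1-update-cluster-ops would block until the cluster update finishes; the same call works for cluster-mini-1-remove-node-ops.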
The first master node has now been replaced.
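As a final check, you can confirm from the worker cluster that node1, node2, and node3 are registered and Ready. nodes_not_ready and check_nodes_ready are hypothetical helpers that parse the default kubectl get nodes --no-headers columns (NAME, STATUS, ...), so this is a quick sanity check rather than a full health verification.

```shell
# Hypothetical helper: read "kubectl get nodes --no-headers" output on stdin and
# print the names of nodes whose STATUS is not exactly "Ready".
nodes_not_ready() {
  awk '$2 != "Ready" { print $1 }'
}

# Succeed only if every node is Ready; otherwise list the others and return 1.
check_nodes_ready() {
  local bad
  bad=$(kubectl get nodes --no-headers | nodes_not_ready)
  if [ -n "$bad" ]; then
    # $bad is left unquoted on purpose so each node name gets its own line
    printf 'not ready: %s\n' $bad
    return 1
  fi
  echo "all nodes Ready"
}
```

Note that a node reported as Ready,SchedulingDisabled is counted as not ready by this check.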