Migrate Cluster Operating System without Downtime¶
Background¶
A customer has deployed a DCE 5.0 worker cluster on CentOS 7.9 in a production environment. Since CentOS 7.9 is reaching end of maintenance, the customer wants to migrate the nodes' operating system to Ubuntu 22.04. Because this is a production environment, the migration must be completed without any downtime.
Risks¶
- Normally, the best practice for migrating Kubernetes nodes is add-before-remove. However, the customer requires that node IPs do not change, so the sequence must be adjusted to remove-before-add. This reduces the cluster's fault tolerance and stability during the migration, so it should be handled with caution.
- In a remove-before-add migration, the workloads on the node being removed are rescheduled to other nodes. Ensure that the remaining nodes have sufficient resource and pod capacity, or the migration will fail.
- Pods per node (`kubelet_max_pods`) is limited to 110 by default. If the migration might exceed this limit, it is recommended to raise the limit on all nodes in advance (see the sketch after this list).
- During node removal, if some services are pinned to the node being removed, or traffic is directed to pods on that node, your business may be interrupted for about 2-3 seconds.
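A minimal sketch of raising the limit through the kubelet configuration file, assuming kubelet reads its config from `/var/lib/kubelet/config.yaml` (the common default; restart kubelet on each node after editing):

```yaml
# /var/lib/kubelet/config.yaml -- raise the per-node pod limit before migrating
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 200 # default is 110; pick a value large enough for the rescheduled pods
```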
Solution¶
- The migration needs to cover both master and worker nodes.
- Migration sequence: migrate worker nodes -> migrate master nodes other than master1 -> migrate master1.
- Before migrating any node, it is recommended to back up your cluster data for safety.
Note
This document assumes a remove-before-add sequence. You can adjust the operation sequence as needed based on your actual scenarios.
Prepare offline resources (skip if online)¶
1. Use the installer command to import the iso and ospackage files for Ubuntu 22.04.
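    For example, a sketch assuming the import scripts shipped with the kubean offline artifacts and a MinIO file server; the script names, credentials, and package file names are placeholders to adapt:

    ```bash
    # Import the Ubuntu 22.04 ISO into the MinIO file server
    MINIO_USER=<user> MINIO_PASS=<password> ./import_iso.sh http://MINIO_ADDR:9000 ubuntu-22.04-live-server-amd64.iso

    # Import the matching os-pkgs (ospackage) bundle
    MINIO_USER=<user> MINIO_PASS=<password> ./import_ospkgs.sh http://MINIO_ADDR:9000 os-pkgs-ubuntu2204.tar.gz
    ```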
2. Configure the offline source for Ubuntu 22.04 (`--limit` restricts the playbook to the specified nodes):

    ```yaml
    apiVersion: kubean.io/v1alpha1
    kind: ClusterOperation
    metadata:
      name: cluster-ops-test
    spec:
      cluster: sample
      image: ghcr.io/kubean-io/spray-job:latest
      actionType: playbook
      action: scale.yml
      preHook:
        - actionType: playbook
          action: ping.yml
        - actionType: playbook
          action: enable-repo.yml # (1)!
          extraArgs: |
            --limit=ubuntu-worker1
            -e "{repo_list: ['deb [trusted=yes] http://MINIO_ADDR:9000/kubean/ubuntu jammy main', 'deb [trusted=yes] http://MINIO_ADDR:9000/kubean/ubuntu-iso jammy main restricted']}"
        - actionType: playbook
          action: disable-firewalld.yml
    ```

    - Before running the task, run the enable-repo playbook to configure the specified repository URL on each node.
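This ClusterOperation manifest, and the ones in the following sections, are applied like any other Kubernetes resource; the file name here is arbitrary:

```bash
kubectl apply -f cluster-ops-test.yaml
```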
Migrate worker nodes¶
1. Enter the details page of the worker cluster and, in Clusters -> Advanced Settings, disable Cluster Deletion Protection.

2. In the Nodes list, select a worker node and click Remove.

3. After successful removal, connect to the Global Service Cluster via terminal or command line and get the `hosts-conf` argument under the `clusters.kubean.io` resource for the worker cluster. In this example, the worker cluster is named `centos`.
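    For example (the same command is used in the master migration section below):

    ```bash
    kubectl get clusters.kubean.io centos -o=jsonpath="{.spec.hostsConfRef}{'\n'}"
    ```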
4. In the Nodes list, click Add Node to re-add the previously removed node.
5. After the new node is successfully added, repeat the above steps for the other worker nodes until all migrations are complete.
Migrate master nodes¶
The process of migrating the master nodes consists of two steps:

- Migrate the master nodes other than master1
- Migrate master1
Identify master1¶
Check the `hosts-conf` content of the `clusters.kubean.io` resource for the cluster, specifically `all.children.kube_control_plane.hosts`. The node listed first in the `kube_control_plane` section is the master1 node.

```yaml
children:
  kube_control_plane:
    hosts:
      centos-master1: null # (1)!
      centos-master2: null
      centos-master3: null
```

- Primary master node
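In practice, you can read this from the hosts-conf ConfigMap referenced by the cluster resource (the ConfigMap name `centos-hosts-conf` comes from the hostsConfRef output shown in the next section):

```bash
kubectl -n kubean-system get configmap centos-hosts-conf -o yaml
```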
Migrate nodes other than master1¶
1. Connect to the Global Service Cluster via terminal or command line and get the `hosts-conf` and `vars-conf` arguments under the `clusters.kubean.io` resource for the worker cluster. In this example, the worker cluster is named `centos`.

    ```bash
    # Get the hosts-conf argument
    $ kubectl get clusters.kubean.io centos -o=jsonpath="{.spec.hostsConfRef}{'\n'}"
    {"name":"centos-hosts-conf","namespace":"kubean-system"}

    # Get the vars-conf argument
    $ kubectl get clusters.kubean.io centos -o=jsonpath="{.spec.varsConfRef}{'\n'}"
    {"name":"centos-vars-conf","namespace":"kubean-system"}
    ```
2. Create a ConfigMap containing a playbook that cleans up the kube-apiserver process left on the removed node. The YAML file is as follows:

    ```yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: pb-clean-kube-apiserver
      namespace: kubean-system
    data:
      clean-kube-apiserver.yml: |
        - name: Clean kube-apiserver
          hosts: "{{ node | default('kube_node') }}"
          gather_facts: no
          tasks:
            - name: Kill kube-apiserver process
              shell: ps -eopid=,comm= | awk '$2=="kube-apiserver" {print $1}' | xargs -r kill -9
    ```
3. After applying the above YAML file, create a ClusterOperation for the master node removal. The YAML file is as follows:

    ```yaml
    apiVersion: kubean.io/v1alpha1
    kind: ClusterOperation
    metadata:
      name: cluster-remove-master2
    spec:
      cluster: centos # (1)!
      image: ghcr.io/kubean-io/spray-job:v0.12.2 # (2)!
      actionType: playbook
      action: remove-node.yml
      extraArgs: -e node=centos-master2 # (3)!
      postHook:
        - actionType: playbook
          actionSource: configmap
          actionSourceRef:
            name: pb-clean-kube-apiserver
            namespace: kubean-system
          action: clean-kube-apiserver.yml
        - actionType: playbook
          action: cluster-info.yml
    ```
    - Worker cluster name
    - Specify the image for running the Kubean task. The image address should match the one used in the previous deployment job.
    - Specify any master node other than master1.
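    To watch the removal task, you can check the ClusterOperation's status (a sketch assuming the kubean CRD exposes the run state under `status.status`):

    ```bash
    kubectl get clusteroperations.kubean.io cluster-remove-master2 -o=jsonpath="{.status.status}{'\n'}"
    ```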
4. After applying the above YAML file and waiting for the node to be successfully removed, create a ClusterOperation resource for the master node addition. The YAML file is as follows:

    ```yaml
    apiVersion: kubean.io/v1alpha1
    kind: ClusterOperation
    metadata:
      name: cluster-scale-master2
    spec:
      cluster: centos # (1)!
      image: ghcr.io/kubean-io/spray-job:v0.12.2 # (2)!
      actionType: playbook
      action: cluster.yml
      extraArgs: --limit=etcd,kube_control_plane -e ignore_assert_errors=yes --skip-tags=multus
      preHook:
        - actionType: playbook
          action: disable-firewalld.yml
      postHook:
        - actionType: playbook
          action: upgrade-cluster.yml
          extraArgs: --limit=etcd,kube_control_plane -e ignore_assert_errors=yes
        - actionType: playbook
          action: cluster-info.yml
    ```
    - Worker cluster name
    - Specify the image for running the Kubean task. The image address should match the one used in the previous deployment job.
5. After applying the above file and waiting for the node to be successfully added back, repeat steps 3 and 4 to complete the migration of the third master node.
Migrate master1¶
Refer to Migrating the Master Node of Worker Cluster.
1. Update the kubeconfig of the worker cluster in the management cluster, and log in to the second master node via terminal.
2. Update the ConfigMap `centos-hosts-conf`, adjusting the order of the master nodes in `kube_control_plane`, `kube_node`, and `etcd` (for example: node1/node2/node3 -> node2/node3/node1), as sketched below.
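    A sketch of the adjusted order, reusing the node names from the hosts-conf example above (edit with `kubectl -n kubean-system edit cm centos-hosts-conf`):

    ```yaml
    children:
      kube_control_plane:
        hosts:
          centos-master2: null # moved to the first position, so it is now treated as master1
          centos-master3: null
          centos-master1: null
      kube_node:
        hosts:
          centos-master2: null
          centos-master3: null
          centos-master1: null
      etcd:
        hosts:
          centos-master2: null
          centos-master3: null
          centos-master1: null
    ```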
3. Create a ClusterOperation for the master1 node removal. The YAML file is as follows:

    ```yaml
    apiVersion: kubean.io/v1alpha1
    kind: ClusterOperation
    metadata:
      name: cluster-test-remove-master1
    spec:
      cluster: centos # (1)!
      image: ghcr.io/kubean-io/spray-job:v0.12.2 # (2)!
      actionType: playbook
      action: remove-node.yml
      extraArgs: -e node=centos-master1 # (3)!
      postHook:
        - actionType: playbook
          actionSource: configmap
          actionSourceRef:
            name: pb-clean-kube-apiserver
            namespace: kubean-system
          action: clean-kube-apiserver.yml
        - actionType: playbook
          action: cluster-info.yml
    ```
    - Worker cluster name
    - Specify the image for running the Kubean task. The image address should match the one used in the previous deployment job.
    - Specify the master1 node.
4. After applying the above YAML file and successful removal, update the ConfigMaps `cluster-info` and `kubeadm-config`.

    ```bash
    # Edit cluster-info
    kubectl -n kube-public edit cm cluster-info

    # 1. If the ca.crt certificate was updated, update the certificate-authority-data field.
    #    View the base64 encoding of the ca certificate:
    cat /etc/kubernetes/ssl/ca.crt | base64 | tr -d '\n'

    # 2. Change the IP address in the server field to the second master's IP.
    ```
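    The `kubeadm-config` ConfigMap can be edited the same way; an assumption here is that any field still referencing the removed master1's address (such as the control plane endpoint) should be pointed at the second master:

    ```bash
    # Edit kubeadm-config (it lives in the kube-system namespace)
    kubectl -n kube-system edit cm kubeadm-config
    ```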
5. Create a ClusterOperation for the master1 node addition. The YAML file is as follows:

    ```yaml
    apiVersion: kubean.io/v1alpha1
    kind: ClusterOperation
    metadata:
      name: cluster-test-scale-master1
    spec:
      cluster: centos # (1)!
      image: ghcr.io/kubean-io/spray-job:v0.12.2 # (2)!
      actionType: playbook
      action: cluster.yml
      extraArgs: --limit=etcd,kube_control_plane -e ignore_assert_errors=yes --skip-tags=multus
      preHook:
        - actionType: playbook
          action: disable-firewalld.yml
      postHook:
        - actionType: playbook
          action: upgrade-cluster.yml
          extraArgs: --limit=etcd,kube_control_plane -e ignore_assert_errors=yes
        - actionType: playbook
          action: cluster-info.yml
    ```
    - Worker cluster name
    - Specify the image for running the Kubean task. The image address should match the one used in the previous deployment job.
6. Once the addition task is complete, the migration of the master1 node is finished.
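To verify the result, check that every node now reports Ubuntu 22.04 in the OS-IMAGE column:

```bash
kubectl get nodes -o wide
```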