This article introduces the prerequisites for configuring a GPU when creating a virtual machine.
The key to configuring GPUs for virtual machines is configuring the GPU Operator so that different software components are deployed on worker nodes depending on the GPU workload each node is configured to run. Take the following three nodes as examples:
The controller-node-1 node is configured to run containers.
The work-node-1 node is configured to run virtual machines with direct pass-through GPUs.
The work-node-2 node is configured to run virtual machines with vGPUs.
A worker node can run GPU-accelerated containers, virtual machines with direct pass-through GPUs, or virtual machines with vGPUs, but not any combination of them.
Cluster administrators or developers need to understand the cluster layout in advance and label each node correctly to indicate the type of GPU workload it will run.
Worker nodes that run virtual machines with direct pass-through GPUs or vGPUs are assumed to be bare-metal machines. If the worker nodes are themselves virtual machines, the GPU direct pass-through feature must be enabled on the virtualization platform; please consult your virtual machine platform provider.
NVIDIA MIG vGPU is not supported.
The GPU Operator will not automatically install GPU drivers in virtual machines.
To enable the GPU direct pass-through feature, IOMMU must be enabled on the cluster nodes. Please refer to How to Enable IOMMU. If your cluster is running on virtual machines, consult your virtual machine platform provider.
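For reference, here is a minimal sketch of enabling IOMMU on a bare-metal node with an Intel CPU by adding kernel boot parameters. The exact file paths and commands depend on your distribution and bootloader, so treat this as an assumption and follow How to Enable IOMMU for your environment.

```shell
# Sketch only: enable IOMMU via kernel boot parameters (assumes a GRUB-based distribution).
# Use amd_iommu=on instead of intel_iommu=on on AMD CPUs.
sudo sed -i 's/^GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt /' /etc/default/grub
sudo update-grub      # on RHEL-family systems: sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot

# After the reboot, verify that IOMMU groups exist
ls /sys/kernel/iommu_groups/
```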
Note: Building a vGPU Manager image is only required when using NVIDIA vGPUs. If you plan to use only GPU direct pass-through, skip this section.
The following are the steps to build the vGPU Manager image and push it to the container registry:
Download the vGPU software from the NVIDIA Licensing Portal.
Log in to the NVIDIA Licensing Portal and go to the Software Downloads page.
The NVIDIA vGPU software is located in the Driver downloads tab on the Software Downloads page.
Select VGPU + Linux in the filter criteria and click Download to get the software package for Linux KVM. Unzip the downloaded file (NVIDIA-Linux-x86_64-<version>-vgpu-kvm.run).
Clone the container-images/driver repository in the terminal.
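As a reference, a hedged sketch of the clone, build, and push steps is shown below. The registry address, driver version, OS tag, and CUDA version are placeholders, and the directory layout and build arguments may differ between driver releases, so check the README in the cloned repository before building.

```shell
# Sketch only: build and push the vGPU Manager image (all values below are placeholders)
git clone https://gitlab.com/nvidia/container-images/driver
cd driver/vgpu-manager/ubuntu20.04            # pick the directory matching your node OS

# Copy in the .run file extracted from the NVIDIA vGPU software package
cp ~/Downloads/NVIDIA-Linux-x86_64-<version>-vgpu-kvm.run .

export PRIVATE_REGISTRY=registry.example.com/nvidia   # your container registry
export VERSION=<version>                              # vGPU Manager driver version
export OS_TAG=ubuntu20.04
export CUDA_VERSION=<cuda-version>

docker build \
  --build-arg DRIVER_VERSION=${VERSION} \
  --build-arg CUDA_VERSION=${CUDA_VERSION} \
  -t ${PRIVATE_REGISTRY}/vgpu-manager:${VERSION}-${OS_TAG} .

docker push ${PRIVATE_REGISTRY}/vgpu-manager:${VERSION}-${OS_TAG}
```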
Go to Container Management, select your worker cluster, and click Nodes. On the right of the list, click ┇ and select Edit Labels to add labels to the nodes. Each node can have only one of these labels.
You can assign the following values to the labels: container, vm-passthrough, and vm-vgpu.
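For example, assuming you have kubectl access to the cluster, the same labels can also be applied from the command line using the nvidia.com/gpu.workload.config key referenced later in this article:

```shell
# Label each example node with the type of GPU workload it will run
kubectl label node controller-node-1 nvidia.com/gpu.workload.config=container
kubectl label node work-node-1 nvidia.com/gpu.workload.config=vm-passthrough
kubectl label node work-node-2 nvidia.com/gpu.workload.config=vm-vgpu
```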
Go to Container Management, select your worker cluster, click Helm Apps -> Helm Charts, then choose and install gpu-operator. You need to modify some fields in the YAML.
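As a rough sketch, the fields typically reviewed for virtual machine workloads are shown below. The field names are assumptions based on common gpu-operator chart versions and may differ in yours, so verify them against the chart you install.

```yaml
# Sketch only: gpu-operator values to review (assumed field names; verify against your chart version)
sandboxWorkloads:
  enabled: true                              # manage VM (pass-through/vGPU) workloads per node label
vgpuManager:
  enabled: true
  repository: registry.example.com/nvidia    # registry holding the vGPU Manager image built earlier
  image: vgpu-manager
  version: <version>
```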
Add vGPU and GPU direct pass-through to the Virtnest Kubevirt CR. The following example shows the key YAML after adding vGPU and GPU direct pass-through:
```yaml
spec:
  configuration:
    developerConfiguration:
      featureGates:
        - GPU
        - DisableMDEVConfiguration
    # Fill in the information below
    permittedHostDevices:
      mediatedDevices:                              # vGPU
        - mdevNameSelector: GRID P4-1Q
          resourceName: nvidia.com/GRID_P4-1Q
      pciHostDevices:                               # GPU direct pass-through
        - externalResourceProvider: true
          pciVendorSelector: 10DE:1BB3
          resourceName: nvidia.com/GP104GL_TESLA_P4
```
In the kubevirt CR YAML, permittedHostDevices is used to expose host devices to virtual machines. vGPUs should be added under mediatedDevices with the following structure:
```yaml
mediatedDevices:
  - mdevNameSelector: GRID P4-1Q            # device name
    resourceName: nvidia.com/GRID_P4-1Q     # vGPU information registered by the GPU Operator on the node
```
GPU direct pass-through should be added in pciHostDevices under permittedHostDevices with the following structure:
```yaml
pciHostDevices:
  - externalResourceProvider: true              # keep the default; do not change
    pciVendorSelector: 10DE:1BB3                # vendor ID of the PCI device
    resourceName: nvidia.com/GP104GL_TESLA_P4   # GPU information registered by the GPU Operator on the node
```
Example of obtaining vGPU information (applicable to vGPU only): view the node information of a node labeled nvidia.com/gpu.workload.config=vm-vgpu (e.g., work-node-2); the entry nvidia.com/GRID_P4-1Q: 8 under Capacity indicates that 8 vGPUs are available:
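One way to check this, assuming kubectl access, is to describe the node and look at its Capacity; the output below is illustrative, using the values from this example:

```shell
kubectl describe node work-node-2
# Capacity:
#   ...
#   nvidia.com/GRID_P4-1Q: 8
```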
So the mdevNameSelector should be "GRID P4-1Q" and the resourceName should be "GRID_P4-1Q".
Obtain GPU direct pass-through information: on a node labeled nvidia.com/gpu.workload.config=vm-passthrough (e.g., work-node-1), view the node information; the entry nvidia.com/GP104GL_TESLA_P4: 2 under Capacity indicates that 2 pass-through GPUs are available:
So the resourceName should be "GP104GL_TESLA_P4". How do you obtain the pciVendorSelector? SSH into the target node work-node-1 and run the command "lspci -nnk -d 10de:" to get the NVIDIA GPU PCI information; see the example output below.
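For reference, the two checks above might look like the following; the output is illustrative, based on the values used in this example:

```shell
# Pass-through GPU resource registered on the node
kubectl describe node work-node-1
# Capacity:
#   ...
#   nvidia.com/GP104GL_TESLA_P4: 2

# On work-node-1, list NVIDIA PCI devices to find the vendor:device ID for pciVendorSelector
lspci -nnk -d 10de:
# 3b:00.0 3D controller [0302]: NVIDIA Corporation GP104GL [Tesla P4] [10de:1bb3]
```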
Note on editing the kubevirt CR: if there are multiple GPUs of the same model, only one entry needs to be written in the CR; there is no need to list every individual GPU.
```yaml
# kubectl -n virtnest-system edit kubevirt kubevirt
spec:
  configuration:
    developerConfiguration:
      featureGates:
        - GPU
        - DisableMDEVConfiguration
    # Fill in the information below
    permittedHostDevices:
      mediatedDevices:                              # vGPU
        - mdevNameSelector: GRID P4-1Q
          resourceName: nvidia.com/GRID_P4-1Q
      # GPU direct pass-through; in the example above, the Tesla P4 has two GPUs, but only one entry needs to be registered here
      pciHostDevices:
        - externalResourceProvider: true
          pciVendorSelector: 10DE:1BB3
          resourceName: nvidia.com/GP104GL_TESLA_P4
```
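Once the devices are registered and permitted, a virtual machine can request them through the standard KubeVirt devices fields. The following is a minimal sketch, not taken from this article: the VM name, memory size, and disk image are assumptions, and a pass-through GPU would be requested the same way using deviceName: nvidia.com/GP104GL_TESLA_P4.

```yaml
# Sketch only: a VirtualMachine requesting the vGPU resource registered above (names are assumptions)
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-with-vgpu
spec:
  running: true
  template:
    spec:
      domain:
        devices:
          gpus:
            - name: gpu1
              deviceName: nvidia.com/GRID_P4-1Q   # resource name registered by the GPU Operator
          disks:
            - name: rootdisk
              disk:
                bus: virtio
        resources:
          requests:
            memory: 4Gi
      volumes:
        - name: rootdisk
          containerDisk:
            image: quay.io/containerdisks/fedora:latest   # example disk image; adjust to your environment
```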