HAMi - Heterogeneous Computing Virtualization Middleware¶
HAMi is already part of the Cloud Native Computing Foundation landscape. It has applied to join the CNCF Sandbox and has passed the voting stage, exceeding the 66% approval threshold. See the related announcement.
The heterogeneous computing virtualization middleware HAMi meets all the capabilities you need to manage heterogeneous computing clusters, including:
- Device Reuse: Each task can occupy only a portion of the GPU, and multiple tasks can share a single GPU.
- Restrictable Memory Allocation: You can now allocate GPUs using memory values (e.g., 3000M) or memory ratios (e.g., 50%), ensuring that the memory used by tasks does not exceed the allocated value.
- Specify Device Models: A task can choose to use, or not use, specific device models by setting annotations.
- Device UUID Specification: A task can choose to use, or not use, specific devices by setting annotations such as "nvidia.com/use-gpuuuid" or "nvidia.com/nouse-gpuuuid".
- Non-Intrusive: The vGPU scheduler is compatible with the official NVIDIA plugin's GPU allocation method, so after installation you do not need to modify existing task files to use vGPU features. You can also customize resource names.
- Scheduling Policies: The vGPU scheduler supports several scheduling policies at both the node and GPU card level. They can be set as cluster-wide defaults through scheduler parameters, or per workload via Pod annotations such as "hami.io/node-scheduler-policy" or "hami.io/gpu-scheduler-policy". Both dimensions support binpack and spread strategies; a sketch of these annotations follows this list.
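A minimal sketch of how these annotations might look on a Pod; the GPU UUID is a placeholder, and the policy values use the binpack and spread options described above:
apiVersion: v1
kind: Pod
metadata:
  name: annotated-gpu-pod
  annotations:
    # pin this task to a specific GPU by UUID (placeholder value)
    nvidia.com/use-gpuuuid: "GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    # node-level and GPU-level scheduling policies
    hami.io/node-scheduler-policy: "binpack"
    hami.io/gpu-scheduler-policy: "spread"
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:11.6.2-base-ubuntu20.04
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 vGPU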
Use Cases¶
- Scenarios where computing devices need to be reused in a cloud-native environment.
- Custom requests for heterogeneous computing, such as requesting virtual GPUs with a specific memory size, each using a specific proportion of the physical GPU's compute capacity.
- In a cluster composed of multiple heterogeneous computing nodes, tasks need to be allocated to suitable nodes based on their GPU requirements.
- Situations where memory and computing unit utilization is low, such as running 10 tf-serving instances on a single GPU.
- Scenarios requiring many small GPUs, such as providing a single GPU for multiple students in an educational setting or cloud platforms offering small GPU instances.
Product Design¶
HAMi includes several components: a unified mutating webhook, a unified scheduler, and device plugins and control components for the various heterogeneous computing devices. The overall architecture is shown in the diagram above.
Product Features¶
HAMi can achieve hard isolation of memory resources.
A simple demonstration of hard isolation: after submitting a task defined as follows:
resources:
  limits:
    nvidia.com/gpu: 1 # requesting 1 vGPU
    nvidia.com/gpumem: 3000 # each vGPU gets 3000 MB of device memory
only 3 GB of device memory will be visible inside the container.
- Allows requesting computing devices by specifying memory.
- Hard isolation of computing resources.
- Allows requesting computing devices by specifying utilization ratios.
- Zero modification required for existing programs.
Installation Requirements¶
- NVIDIA drivers >= 440
- nvidia-docker version > 2.0
- docker/containerd/cri-o configured with nvidia as the default runtime
- Kubernetes version >= 1.16
- glibc >= 2.17 & glibc < 2.3.0
- kernel version >= 3.10
- helm > 3.0
Quick Start¶
Choose Your Cluster Scheduler¶
GPU Node Preparation¶
The following steps need to be performed on all GPU nodes. This README assumes that the GPU nodes already have NVIDIA drivers installed and that docker or containerd is installed; you then need to configure nvidia-container-runtime as the default low-level runtime, as shown below.
Example Installation Steps¶
# Add the package repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/libnvidia-container.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
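As a quick sanity check on an apt-based node (as in the example above), confirm the runtime components are present:
nvidia-container-cli --version    # provided by libnvidia-container-tools
which nvidia-container-runtime    # the binary referenced in the runtime configs below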
Configure Docker¶
You need to set the nvidia runtime as your default docker runtime on the nodes. We will edit the docker daemon configuration file, usually located at /etc/docker/daemon.json:
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
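Assuming docker is managed by systemd on the node, restart the daemon so the new default runtime takes effect:
sudo systemctl daemon-reload
sudo systemctl restart docker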
Configure containerd¶
You need to set the nvidia runtime as your default containerd runtime on the nodes. We will edit the containerd daemon configuration file, usually located at /etc/containerd/config.toml:
version = 2
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"
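Likewise, assuming containerd is managed by systemd, restart it so the configuration is picked up:
sudo systemctl restart containerd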
Finally, you need to label all GPU nodes with gpu=on; otherwise, the nodes will not be scheduled.
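For example, using kubectl (replace {nodeid} with each GPU node's name):
kubectl label nodes {nodeid} gpu=on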
Install, Update, and Uninstall¶
First, add the HAMi repo using helm:
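For example, assuming the project's public chart repository (adjust the URL if you use a mirror):
# hami-charts is the repository name used by the install commands below
helm repo add hami-charts https://project-hami.github.io/HAMi/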
Then, use the following command to get the cluster server version:
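For example:
kubectl version   # the Server Version field is the value you need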
During installation, specify the scheduler image version based on your cluster server version (the result from the previous command). For example, if the cluster server version is 1.16.8, you can use the following command to install:
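A sketch of the install command, assuming the scheduler image tag is set through the chart value scheduler.kubeScheduler.imageTag (check the chart's values for the exact key):
# replace v1.16.8 with your cluster server version
helm install hami hami-charts/hami --set scheduler.kubeScheduler.imageTag=v1.16.8 -n kube-system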
You can modify the configuration here to customize the installation.
If you see the vgpu-device-plugin and vgpu-scheduler pods in the Running state when running the kubectl get pods command, the installation is successful.
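One way to check, assuming the components were installed into kube-system as above:
kubectl get pods -n kube-system | grep vgpu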
Update
You just need to update the helm repo and reinstall the Chart; the latest images will be downloaded automatically:
helm uninstall hami -n kube-system
helm repo update
helm install hami hami-charts/hami -n kube-system
Note: If you perform a hot update without cleaning up tasks, running tasks may encounter segmentation faults or other errors.
Uninstall
Note: Uninstalling the components will not cause running tasks to fail.
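The release can be removed with the same release name and namespace used during installation:
helm uninstall hami -n kube-system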
Submit Tasks¶
NVIDIA vGPUs can now be requested through the resource type nvidia.com/gpu:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: ubuntu-container
      image: ubuntu:18.04
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          nvidia.com/gpu: 2 # requesting 2 vGPUs
          nvidia.com/gpumem: 3000 # each vGPU requests 3000 MB of device memory (optional, integer type)
          nvidia.com/gpucores: 30 # each vGPU gets 30% of the actual GPU's compute (optional, integer type)
If your task cannot run on any node (for example, if the task's nvidia.com/gpu request is greater than the actual number of GPUs on any GPU node in the cluster), the task will be stuck in a pending state.
Now you can execute the nvidia-smi command in the container and compare the difference between the vGPU's memory size and the actual GPU's memory size.
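For example, using the gpu-pod defined above (nvidia-smi is made available inside the container by the NVIDIA runtime):
kubectl exec -it gpu-pod -- nvidia-smi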
Note:
- If you use the privileged field, this task will not be scheduled because it can see all GPUs and may affect other tasks.
- Do not set the nodeName field; for similar requirements, use nodeSelector.
More Examples¶
See more examples.
Monitoring¶
Monitoring is automatically enabled by default after the scheduler is successfully deployed. You can access monitoring data via:
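A sketch of the endpoint (the exact address is an assumption based on the defaults; substitute the IP of the node running the scheduler):
http://{scheduler node ip}:{monitorPort}/metrics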
The monitorPort can be configured in the chart's Values and defaults to 31992.
Refer to the Grafana dashboard example.
Note: vGPU status for a node is only reported after a vGPU on that node has been used.
Refer to the complete performance testing documentation.
Known Issues¶
- Currently supports only computing tasks, not video encoding/decoding processing.
- Currently only MIG's "none" and "mixed" modes are supported; "single" mode is not supported yet.
- Tasks with the "nodeName" field may encounter scheduling issues; for similar requirements, please use "nodeSelector".
- We modified the device-plugin component's environment variable from NodeName to NODE_NAME. If you are using image version v2.3.9, the device-plugin may fail to start. There are currently two suggested fixes:
  - Manually execute kubectl edit daemonset to change the device-plugin environment variable from NodeName to NODE_NAME.
  - Upgrade to the latest version using helm; the latest version of the device-plugin image is v2.3.10. Execute helm upgrade hami hami/hami -n kube-system to automatically fix it.
Development Plans¶
The currently supported heterogeneous computing devices and their corresponding reuse features are shown in the table below:
Product | Manufacturer | Memory Isolation | Computing Isolation | Multi-GPU Support |
---|---|---|---|---|
GPU | NVIDIA | ✅ | ✅ | ✅ |
MLU | Cambricon | ✅ | ❌ | ❌ |
DCU | Hygon | ✅ | ✅ | ❌ |
Ascend | Huawei | In Development | In Development | ❌ |
GPU | Iluvatar | In Development | In Development | ❌ |
DPU | Taihu | In Development | In Development | ❌ |
- Support video encoding/decoding processing.
- Support Multi-Instance GPUs (MIG).
- Support more flexible scheduling policies:
  - binpack
  - spread
  - NUMA affinity
- Integration with NVIDIA GPU Operator.
- Support for richer observability capabilities.
- Support DRA.
- Support Intel GPU devices.
- Support AMD GPU devices.
Contribution¶
If you want to become a contributor to HAMi, refer to the contributor guidelines, which describe the contribution process in detail.