Enabling MIG Function

This section describes how to enable the NVIDIA MIG (Multi-Instance GPU) feature. NVIDIA currently provides two strategies for exposing MIG devices on Kubernetes nodes:

  • Single mode : Nodes expose a single type of MIG device on all their GPUs.
  • Mixed mode : Nodes expose a mixture of MIG device types on all their GPUs.

Tip

After disabling MIG mode, restart the physical node so that whole-card mode works properly.

For more details, refer to the NVIDIA GPU Card Usage Modes.

Prerequisites

  • Check that the target node meets the system requirements for GPU driver installation. See the GPU Support Matrix.
  • Ensure that the cluster nodes have GPUs of the corresponding models (NVIDIA H100, A100, and A30 Tensor Core GPUs). For more information, see the GPU Support Matrix.
  • All GPUs on the nodes must belong to the same product line (e.g., A100-SXM-40GB).

Enable GPU MIG Single mode

  1. Enable MIG Single mode through the Operator. Configure the parameters in the installation interface, with MIG strategy set to single.

  2. After the installation is complete, label the corresponding node (the node where the GPU card is installed) with the partitioning specification. If you skip this step, the GPUs are not partitioned.

    Tip

    In Single mode, every GPU on the node is partitioned into the same MIG device type. It is recommended to use the default strategy, but you can also customize the partitioning strategy.

    UI Configuration:

    1. Search for the default-mig-parted-config ConfigMap, open its details, and find the partitioning specifications that correspond to the GPU card model.

    2. Find the corresponding node, select Modify Labels, and add nvidia.com/mig.config="all-1g.10gb". To use a different specification, set the label value to the name of that specification instead.

    CLI Configuration:

    kubectl label nodes {node} nvidia.com/mig.config="all-1g.10gb" --overwrite
    
  3. Check the configuration result:

    kubectl get node {node} -o yaml | grep nvidia.com/mig.config
    

After the setup is complete, you can deploy an application and confirm that it can use the GPU MIG resources.
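
As a minimal sketch of such an application: under the single strategy, the NVIDIA device plugin advertises each MIG device under the regular nvidia.com/gpu resource name, so a workload requests it the same way as a whole GPU. The Pod name and container image below are assumptions chosen for illustration and are not part of the original instructions.

```yaml
# Hypothetical test Pod for MIG Single mode.
apiVersion: v1
kind: Pod
metadata:
  name: mig-single-example          # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # assumed image; use one available in your registry
      command: ["nvidia-smi", "-L"]                # lists the GPU/MIG devices visible to the container
      resources:
        limits:
          nvidia.com/gpu: 1         # in single mode, one unit corresponds to one MIG device
```

If partitioning succeeded, the container log should list a single MIG device rather than the full card.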

Enable GPU MIG Mixed mode

  1. Enable MIG Mixed mode through the Operator. Configure the parameters in the installation interface:

    • Set DevicePlugin to enable
    • Set MIG strategy to mixed
    • Turn on the enabled parameter under Mig Manager
    • Set MigManager Config to the default MIG partitioning strategy, default-mig-parted-config, or to a custom partitioning strategy configuration file.
  2. After the installation is complete, label the corresponding node (the node where the GPU card is installed) with the partitioning specification. If you skip this step, the GPUs are not partitioned.

    Tip

    It is recommended to use the default strategy, but you can also customize the partitioning strategy.

    UI Configuration:

    1. Search for the default-mig-parted-config ConfigMap, open its details, and find the partitioning specifications that correspond to the GPU card model.

    2. Find the corresponding node, select Modify Labels, and add nvidia.com/mig.config="all-1g.10gb". To use a different specification, set the label value to the name of that specification instead.

    CLI Configuration:

    kubectl label nodes {node} nvidia.com/mig.config="all-1g.10gb" --overwrite
    
  3. Check the configuration result:

    kubectl get node {node} -o yaml | grep nvidia.com/mig.config
    

After the setup is complete, you can deploy an application and confirm that it can use the GPU MIG resources.
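
As a minimal sketch of such an application: under the mixed strategy, the NVIDIA device plugin advertises each MIG device type as its own resource, named nvidia.com/mig-<profile>, so a workload must request the specific profile it needs. The example assumes the node was labeled with all-1g.10gb as in the steps above; the Pod name and container image are assumptions chosen for illustration.

```yaml
# Hypothetical test Pod for MIG Mixed mode.
apiVersion: v1
kind: Pod
metadata:
  name: mig-mixed-example           # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # assumed image; use one available in your registry
      command: ["nvidia-smi", "-L"]                # lists the GPU/MIG devices visible to the container
      resources:
        limits:
          nvidia.com/mig-1g.10gb: 1   # request one MIG device of the 1g.10gb profile
```

The resource name has to match a profile that actually exists on the node; a request for a profile that was not partitioned leaves the Pod pending.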

Custom Partitioning Strategy

You can customize the partitioning strategy configuration file, with a maximum of 7 instances per card. The custom ConfigMap must be created before the GPU Operator is installed, and its name must be specified during installation.

  1. Create a ConfigMap that contains the custom partitioning strategy, in the same namespace as the GPU Operator. Its name cannot be the same as the default, default-mig-parted-config. The configuration data can follow the YAML below.

    The following YAML is an example of a custom configuration named custom-mig-parted-config. The key in the ConfigMap data is config.yaml, with the content shown below; you can customize it and add other partitioning strategies.

    config.yaml
      ## Custom split GI instance configuration
      version: v1
      mig-configs:
        all-disabled:
          - devices: all
            mig-enabled: false
    
        # A100-40GB, A800-40GB
        all-1g.5gb:
          - devices: all
            mig-enabled: true
            mig-devices:
              "1g.5gb": 7
    
        all-1g.5gb.me:
          - devices: all
            mig-enabled: true
            mig-devices:
              "1g.5gb+me": 1
    
        all-2g.10gb:
          - devices: all
            mig-enabled: true
            mig-devices:
              "2g.10gb": 3
    
        all-3g.20gb:
          - devices: all
            mig-enabled: true
            mig-devices:
              "3g.20gb": 2
    
        all-4g.20gb:
          - devices: all
            mig-enabled: true
            mig-devices:
              "4g.20gb": 1
    
        all-7g.40gb:
          - devices: all
            mig-enabled: true
            mig-devices:
              "7g.40gb": 1
    
        # H100-80GB, H800-80GB, A100-80GB, A800-80GB, A100-40GB, A800-40GB
        all-1g.10gb:
          # H100-80GB, H800-80GB, A100-80GB, A800-80GB
          - device-filter: ["0x233010DE", "0x233110DE", "0x232210DE", "0x20B210DE", "0x20B510DE", "0x20F310DE", "0x20F510DE"]
            devices: all
            mig-enabled: true
            mig-devices:
              "1g.10gb": 7
    
          # A100-40GB, A800-40GB
          - device-filter: ["0x20B010DE", "0x20B110DE", "0x20F110DE", "0x20F610DE"]
            devices: all
            mig-enabled: true
            mig-devices:
              "1g.10gb": 4
    
        # H100-80GB, H800-80GB, A100-80GB, A800-80GB
        all-1g.10gb.me:
          - devices: all
            mig-enabled: true
            mig-devices:
              "1g.10gb+me": 1
    
        # H100-80GB, H800-80GB, A100-80GB, A800-80GB
        all-1g.20gb:
          - devices: all
            mig-enabled: true
            mig-devices:
              "1g.20gb": 4
    
        all-2g.20gb:
          - devices: all
            mig-enabled: true
            mig-devices:
              "2g.20gb": 3
    
        all-3g.40gb:
          - devices: all
            mig-enabled: true
            mig-devices:
              "3g.40gb": 2
    
        all-4g.40gb:
          - devices: all
            mig-enabled: true
            mig-devices:
              "4g.40gb": 1
    
        all-7g.80gb:
          - devices: all
            mig-enabled: true
            mig-devices:
              "7g.80gb": 1
    
        # A30-24GB
        all-1g.6gb:
          - devices: all
            mig-enabled: true
            mig-devices:
              "1g.6gb": 4
    
        all-1g.6gb.me:
          - devices: all
            mig-enabled: true
            mig-devices:
              "1g.6gb+me": 1
    
        all-2g.12gb:
          - devices: all
            mig-enabled: true
            mig-devices:
              "2g.12gb": 2
    
        all-2g.12gb.me:
          - devices: all
            mig-enabled: true
            mig-devices:
              "2g.12gb+me": 1
    
        all-4g.24gb:
          - devices: all
            mig-enabled: true
            mig-devices:
              "4g.24gb": 1
    
        # H100 NVL, H800 NVL
        all-1g.12gb:
          - devices: all
            mig-enabled: true
            mig-devices:
              "1g.12gb": 7
    
        all-1g.12gb.me:
          - devices: all
            mig-enabled: true
            mig-devices:
              "1g.12gb+me": 1
    
        all-2g.24gb:
          - devices: all
            mig-enabled: true
            mig-devices:
              "2g.24gb": 3
    
        all-3g.47gb:
          - devices: all
            mig-enabled: true
            mig-devices:
              "3g.47gb": 2
    
        all-4g.47gb:
          - devices: all
            mig-enabled: true
            mig-devices:
              "4g.47gb": 1
    
        all-7g.94gb:
          - devices: all
            mig-enabled: true
            mig-devices:
              "7g.94gb": 1
    
        # H100-96GB, PG506-96GB
        all-3g.48gb:
          - devices: all
            mig-enabled: true
            mig-devices:
              "3g.48gb": 2
    
        all-4g.48gb:
          - devices: all
            mig-enabled: true
            mig-devices:
              "4g.48gb": 1
    
        all-7g.96gb:
          - devices: all
            mig-enabled: true
            mig-devices:
              "7g.96gb": 1
    
        # H100-96GB, H100 NVL, H800 NVL, H100-80GB, H800-80GB, A800-40GB, A800-80GB, A100-40GB, A100-80GB, A30-24GB, PG506-96GB
        all-balanced:
          # H100 NVL, H800 NVL
          - device-filter: ["0x232110DE", "0x233A10DE"]
            devices: all
            mig-enabled: true
            mig-devices:
              "1g.12gb": 1
              "2g.24gb": 1
              "3g.47gb": 1
    
          # H100-80GB, H800-80GB, A100-80GB, A800-80GB
          - device-filter: ["0x233010DE", "0x233110DE", "0x232210DE", "0x20B210DE", "0x20B510DE", "0x20F310DE", "0x20F510DE"]
            devices: all
            mig-enabled: true
            mig-devices:
              "1g.10gb": 2
              "2g.20gb": 1
              "3g.40gb": 1
    
          # A100-40GB, A800-40GB
          - device-filter: ["0x20B010DE", "0x20B110DE", "0x20F110DE", "0x20F610DE"]
            devices: all
            mig-enabled: true
            mig-devices:
              "1g.5gb": 2
              "2g.10gb": 1
              "3g.20gb": 1
    
          # A30-24GB
          - device-filter: "0x20B710DE"
            devices: all
            mig-enabled: true
            mig-devices:
              "1g.6gb": 2
              "2g.12gb": 1
    
          # H100-96GB, PG506-96GB
          - device-filter: ["0x233D10DE", "0x20B610DE"]
            devices: all
            mig-enabled: true
            mig-devices:
              "1g.12gb": 2
              "2g.24gb": 1
              "3g.48gb": 1
    
        # Custom entry: the instances are partitioned according to the specifications set here
        custom-config:
          - devices: all
            mig-enabled: true
            mig-devices:
              "1g.10gb": 4
              "1g.20gb": 2
    

    To partition at the compute instance (CI) level, set custom-config in the above YAML as follows. The CI instances are then created according to the specification.

    custom-config:
      - devices: all
        mig-enabled: true
        mig-devices:
          "1c.3g.40gb": 6
    
  2. Specify this ConfigMap during the installation of the GPU Operator.
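
To illustrate how the two steps above fit together, here is a minimal sketch of the ConfigMap manifest that wraps config.yaml. It is trimmed to two entries taken from the example above; the gpu-operator namespace is an assumption and should be replaced with the namespace where the GPU Operator is actually deployed.

```yaml
# Minimal sketch of a custom MIG partitioning ConfigMap (trimmed for brevity).
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-mig-parted-config    # must differ from default-mig-parted-config
  namespace: gpu-operator           # assumption: same namespace as the GPU Operator
data:
  config.yaml: |
    version: v1
    mig-configs:
      all-disabled:
        - devices: all
          mig-enabled: false
      custom-config:
        - devices: all
          mig-enabled: true
          mig-devices:
            "1g.10gb": 4
            "1g.20gb": 2
```

Once the Operator is installed with this ConfigMap, a node would be labeled with nvidia.com/mig.config="custom-config" in the same way as for the default specifications. For Helm-based installs of the upstream GPU Operator chart, the ConfigMap name is usually passed through the migManager.config.name value; whether the installation interface described here maps to that value is an assumption.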
