
Initialize Computing Cluster

When you install DCE 5.0 Enterprise, the Intelligent Engine module can be installed at the same time by default. Contact the delivery support team to obtain the Enterprise installation package.

Install Intelligent Engine

Ensure that the Intelligent Engine components have been installed in the global management cluster. You can verify this by checking if the Intelligent Engine module is available through the DCE 5.0 UI.

Info

There is an entry for Intelligent Engine in the primary navigation bar.

If the entry is missing, install the module with the commands below. It must be installed in the global management cluster, kpanda-global-cluster:

# "baize" is the development codename for the Intelligent Engine component
helm repo add baize https://release.daocloud.io/chartrepo/baize
helm repo update
export VERSION=v0.1.1
helm upgrade --install baize baize/baize \
    --create-namespace \
    -n baize-system \
    --set global.imageRegistry=release.daocloud.io \
    --version=${VERSION}
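
As a quick check after the install command above, the following sketch prints the release status and the pods in the target namespace. The release name and namespace are taken from that command; the function name `verify_baize` is only illustrative, and the commands assume your kubeconfig points at the kpanda-global-cluster.

```shell
# Release name and namespace match the helm install command above.
BAIZE_RELEASE="baize"
BAIZE_NAMESPACE="baize-system"

# Illustrative helper: prints the Helm release status, then the pods
# in the namespace. Returns non-zero if the release is not found.
verify_baize() {
    helm status "$BAIZE_RELEASE" -n "$BAIZE_NAMESPACE" || return 1
    kubectl get pods -n "$BAIZE_NAMESPACE"
}
```

Call `verify_baize` once the upgrade command has finished; pods that stay outside the Running state usually deserve a closer look with `kubectl describe`.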

If you are installing into an existing DCE environment, you can instead add the Helm repository in Container Management and install the chart through the graphical interface.

Initialize Worker Cluster

In each worker cluster with computing resources, the corresponding basic computing components need to be deployed. The main components include:

  • gpu-operator: Initializes the GPU resources in the cluster. The installation method may vary depending on the type of GPU resources. For details, refer to GPU Management.
  • insight-agent: Observability component that collects infrastructure information in the cluster, including logs, metrics, and events.
  • baize-agent: Core component of the Intelligent Engine module, responsible for scheduling, monitoring, and computing components such as PyTorch and TensorFlow.
  • nfs: Storage service used for dataset preheating

Danger

All of the above components must be installed; otherwise, the Intelligent Engine module may not work properly.
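
One way to spot a missing component is to compare the component names listed above against the Helm releases in the worker cluster. The sketch below assumes each component was installed with Helm and that its release name contains the component name; neither is confirmed by this page. The nfs service is left out because it may run outside the cluster. The helper name `missing_components` is hypothetical.

```shell
# Component names come from the list above. Matching them against
# Helm release names is an assumption, not documented behavior.
REQUIRED_COMPONENTS="gpu-operator insight-agent baize-agent"

# Reads a release listing on stdin and prints every required
# component name that does not appear anywhere in it.
missing_components() {
    listing=$(cat)
    for name in $REQUIRED_COMPONENTS; do
        printf '%s\n' "$listing" | grep -q -- "$name" || printf '%s\n' "$name"
    done
}
```

With a kubeconfig that points at the worker cluster, run `helm list -A | missing_components`. Empty output means every listed name was found.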

After completing the above steps, you can run training jobs and develop models in the Intelligent Engine module. For detailed usage, refer to the following:

Introduction to Preheating Components

Dataset preheating in the data management feature of the Intelligent Engine module relies on a storage service. An NFS service is recommended:

  • Deploy NFS Server
    • If NFS already exists, you can skip this step
    • If it does not exist, you can refer to the best practices for Deploying NFS Service
  • Deploy nfs-driver-csi
  • Deploy StorageClass
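
The last step above can be sketched as follows, assuming that nfs-driver-csi refers to the upstream Kubernetes NFS CSI driver, whose provisioner name is `nfs.csi.k8s.io`. The StorageClass name `nfs-csi`, the reclaim policy, and the helper name `render_nfs_storageclass` are illustrative choices rather than values required by Intelligent Engine.

```shell
# Illustrative helper: prints a StorageClass manifest for an NFS export.
#   $1: address of the NFS server
#   $2: exported path on that server
# Assumes the upstream NFS CSI driver (provisioner nfs.csi.k8s.io).
render_nfs_storageclass() {
    server="$1"
    share="$2"
    cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi
provisioner: nfs.csi.k8s.io
parameters:
  server: ${server}
  share: ${share}
reclaimPolicy: Delete
volumeBindingMode: Immediate
EOF
}
```

To apply it, pipe the output to kubectl with your own server address and export path as the two arguments: `render_nfs_storageclass <server> <share> | kubectl apply -f -`.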

Conclusion

After completing the above steps, you can use all the features of Intelligent Engine in the worker cluster. Enjoy using it!
