Offline Installation and Usage of Metax GPU Components
This section explains how to install the Metax components `metax-gpu-extensions`, `metax-operator`, and `metax-exporter` offline, and how to use Metax GPU cards.
Prerequisites
- The DCE 5.0 container management platform has been deployed and is running properly.
- The container management module has either integrated an existing Kubernetes cluster or created a new one, and the cluster's UI is accessible.
- The GPU cards in the current cluster are not virtualized and are not occupied by other applications.
Component Overview
The container management system provides three Helm chart packages: `metax-gpu-extensions`, `metax-operator`, and `metax-exporter`. Install the components that fit your use case.
- `metax-gpu-extensions`: includes the `gpu-device` and `gpu-label` components. This package supports full-GPU scenarios only.
- `metax-operator`: includes the `gpu-device`, `gpu-label`, `driver-manager`, `container-runtime`, and `operator-controller` components. This package supports both full-GPU and vGPU scenarios.
- `metax-exporter`: includes a `ServiceAccount`, `ConfigMap`, `Service`, `DaemonSet`, and `ServiceMonitor`. It is used mainly for monitoring Metax GPU cards.
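The selection rules above can be summarized as a small decision helper. This is a minimal sketch: the function name `select_metax_charts` is hypothetical, and preferring `metax-gpu-extensions` over `metax-operator` for full-GPU-only clusters is an assumption (it is simply the package with fewer components); `metax-operator` also covers that case.

```python
# Hypothetical helper encoding the component descriptions in this section.


def select_metax_charts(use_vgpu: bool, enable_monitoring: bool) -> list:
    """Return the Metax Helm chart names to install for a scenario."""
    charts = []
    if use_vgpu:
        # Only metax-operator supports vGPU scenarios.
        charts.append("metax-operator")
    else:
        # Assumption: for full-GPU-only use, prefer metax-gpu-extensions
        # (gpu-device + gpu-label); metax-operator would also cover this case.
        charts.append("metax-gpu-extensions")
    if enable_monitoring:
        # metax-exporter provides the DaemonSet and ServiceMonitor for monitoring.
        charts.append("metax-exporter")
    return charts


if __name__ == "__main__":
    print(select_metax_charts(use_vgpu=False, enable_monitoring=True))
    print(select_metax_charts(use_vgpu=True, enable_monitoring=False))
```

For example, a full-GPU cluster that also needs monitoring yields `metax-gpu-extensions` plus `metax-exporter`.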
Procedure
- In the left-hand navigation bar, go to Container Management → Cluster Management, and click the name of the target cluster.
- In the left-hand navigation bar, click Helm Apps → Helm Templates, then search for `metax`.
- The three components described above appear in the results. Select and install the ones you need.
Installation Notes
- Issues with `metax-gpu-extensions` and `metax-operator`:
  These add-ons have a design flaw: each one defines its own standalone `registry` field. The `charts-syncer` tool only supports two-level fields (such as `image.registry`) and cannot handle this structure. Therefore, during deployment you must manually update the `registry` field in both add-ons to match the value of the `image.registry` field.
  Before modification:
  The `charts-syncer` tool fails on this nested field structure, so refactor it into a simpler form.
  After modification:
- Dashboard panel issue with `metax-exporter` v0.5.0:
  In version v0.5.0, metric names are prefixed with `mx_`. To restore the original metric names, add a `replace` relabeling configuration to the `ServiceMonitor`.
  - After installing `metax-exporter` v0.5.0, go to the target cluster, search for `servicemonitor` under Custom Resources, and click the item named `servicemonitors.monitoring.coreos.com`.
  - Edit the YAML of the `metax-exporter` ServiceMonitor and add the following content at the appropriate location:
    Content to add:
    Result after adding (make sure the format and indentation are correct):
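For the note on `metax-gpu-extensions` and `metax-operator`, the manual change amounts to copying the `image.registry` value into the standalone `registry` field. The sketch below shows that step on an already-parsed values mapping. It assumes, hypothetically, that `registry` sits at the top level of the chart values; the real key layout of the Metax charts may differ, and the registry hostnames are placeholders.

```python
import copy


def align_registry(values: dict) -> dict:
    """Return a copy of chart values whose standalone `registry` equals `image.registry`.

    Assumption: `registry` is a top-level key; the real Metax chart layout may differ.
    """
    image_registry = values.get("image", {}).get("registry")
    if image_registry is None:
        raise KeyError("image.registry is not set in the chart values")
    updated = copy.deepcopy(values)  # leave the caller's mapping untouched
    updated["registry"] = image_registry
    return updated


if __name__ == "__main__":
    # Placeholder values for illustration only.
    values = {
        "registry": "upstream-registry.example.com",
        "image": {"registry": "offline-registry.example.com"},
    }
    print(align_registry(values)["registry"])
```

Returning a copy rather than editing in place makes it easy to diff the original and updated values before deploying.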
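For the `metax-exporter` v0.5.0 note, one way to express a `replace` configuration is a Prometheus Operator `metricRelabelings` rule on the ServiceMonitor endpoint that rewrites the `__name__` label. The sketch below assumes the original metric names are the v0.5.0 names without the `mx_` prefix; the rule values and the metric name `mx_gpu_usage` are hypothetical. It prints the rule as JSON and emulates how Prometheus applies a `replace` action (fully anchored regex, `$1` replacement), so you can check the rewrite before editing the YAML.

```python
import json
import re

# Assumed rule: strip the `mx_` prefix that metax-exporter v0.5.0 adds.
RULE = {
    "sourceLabels": ["__name__"],
    "regex": "mx_(.*)",
    "targetLabel": "__name__",
    "replacement": "$1",
    "action": "replace",
}


def apply_replace(metric_name: str, rule: dict = RULE) -> str:
    """Apply a single-source-label `replace` rule to a metric name."""
    # Prometheus anchors relabeling regexes at both ends, so use fullmatch.
    match = re.fullmatch(rule["regex"], metric_name)
    if match is None:
        return metric_name  # no match: the name is left unchanged
    # Translate Prometheus-style "$1" group references into Python's "\1".
    template = re.sub(r"\$(\d+)", r"\\\1", rule["replacement"])
    return match.expand(template)


if __name__ == "__main__":
    for name in ["mx_gpu_usage", "up"]:
        print(name, "->", apply_replace(name))
    # Entry intended for a spec.endpoints[] item of the ServiceMonitor (assumed location).
    print(json.dumps({"metricRelabelings": [RULE]}, indent=2))
```

Running it prints `mx_gpu_usage -> gpu_usage` and leaves `up` unchanged. In YAML, the same rule would be a list item under `metricRelabelings` with the keys shown in `RULE`.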