Deploying LLM Studio (WS Mode)

This document explains how to deploy the private WS (Workspace) mode of LLM Studio (Hydra), which supports inference without GPUs.

Global Service Cluster

In the Global Service Cluster, you need to install:

Hydra: manually install or install via the installer
Service Mesh (depends on Istio to create gateways for routing)
MySQL
Redis

Tip

The Global Service Cluster installs a common MySQL and Redis by default.
If Hydra is installed via the installer, it will use the common MySQL and Redis by default.
If Hydra is installed manually, you need to specify MySQL and Redis in the installation parameters.

The installed result is as shown below:

Worker Cluster

The worker cluster needs Hydra Agent and MetalLB installed.

Install Hydra Agent

Create a worker cluster to deploy Hydra Agent (if resources are limited, you can also directly use the Global Service Cluster)

Deploy Hydra Agent in the worker cluster

Pay attention to the following parameters:

global:
  config:
    cluster_name: 'jxj31-mspider'
    agent_base_url: 'http://cn-sh-a1' # Gateway address accessible by the worker cluster
    agent_server:
      address: example.com:443 # DCE address of the Global Service Cluster
      plaintext: false
      insecure: true # For test environments, you can set this to true to bypass certificates

Deploy MetalLB in the worker cluster (it provides the LoadBalancer address used to route access to the model) and allocate a LoadBalancer address.
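The address allocation above could look like the following sketch. It assumes MetalLB v0.13 or later (CRD-based configuration) installed in the metallb-system namespace and running in L2 mode. The pool name `hydra-pool` and the address range are hypothetical; pick a range that is free on the worker cluster's network.

```shell
#!/usr/bin/env bash
# Render a MetalLB IPAddressPool plus L2Advertisement for a given address range.
# Assumptions (not from this guide): MetalLB >= 0.13 in metallb-system, L2 mode,
# and the hypothetical pool name "hydra-pool".
render_metallb_pool() {
  local range="$1"
  cat <<EOF
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: hydra-pool
  namespace: metallb-system
spec:
  addresses:
  - ${range}
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: hydra-pool
  namespace: metallb-system
spec:
  ipAddressPools:
  - hydra-pool
EOF
}

# Usage (replace the range with free addresses on your network):
#   render_metallb_pool 10.64.24.210-10.64.24.220 | kubectl apply -f -
```

Rendering the manifest in a function keeps the address range as the only input, so the same snippet can be reused for other worker clusters.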

Create Service Mesh (Istio + Gateway API)

Create a dedicated mesh in the worker cluster (use default parameters when creating the mesh)

Note: Managed meshes currently do not support the Gateway API.

Initialize Gateway API CRD

root@controller-node-1:~# kubectl kustomize "github.com/kubernetes-sigs/gateway-api/config/crd?ref=v1.2.1" | kubectl apply -f -;

customresourcedefinition.apiextensions.k8s.io/gatewayclasses.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/gateways.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/grpcroutes.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/httproutes.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/referencegrants.gateway.networking.k8s.io created
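Before creating the Gateway, it can help to confirm that the CRDs listed above are actually registered. The sketch below waits for each one to report the Established condition; the function name and the 60-second timeout are illustrative choices, not part of the original procedure.

```shell
#!/usr/bin/env bash
# Wait until the Gateway API CRDs installed above report the Established condition.
wait_gateway_api_crds() {
  local crd
  for crd in gatewayclasses gateways grpcroutes httproutes referencegrants; do
    kubectl wait --for=condition=Established \
      "crd/${crd}.gateway.networking.k8s.io" --timeout=60s || return 1
  done
}

# Usage (requires kubectl access to the worker cluster):
#   wait_gateway_api_crds && kubectl get gatewayclass
```

If the mesh is healthy, `kubectl get gatewayclass` would be expected to list the `istio` class that gateway.yaml refers to.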

Create the gateway and routing rules

root@controller-node-1:~# cat gateway.yaml

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gateway
  namespace: default 
spec:
  gatewayClassName: istio
  listeners:
  - allowedRoutes:
      namespaces:
        from: All
    name: default
    port: 80
    protocol: HTTP
  - allowedRoutes:
      namespaces:
        from: All
    hostname: 'cn-sh-a1'
    name: https
    port: 443
    protocol: HTTPS
    tls:
      certificateRefs:
      - group: ""
        kind: Secret
        name: cn-sh-a1
      mode: Terminate

root@controller-node-1:~# kubectl apply -f gateway.yaml

gateway.gateway.networking.k8s.io/gateway created

root@controller-node-1:~# kubectl get po

NAME                             READY   STATUS    RESTARTS   AGE
gateway-istio-5c497d4b6d-9xmqp   1/1     Running   0          14s

root@controller-node-1:~# kubectl get svc

NAME            TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)                                      AGE
gateway-istio   LoadBalancer   10.233.1.140   10.64.24.211   15021:32565/TCP,80:32053/TCP,443:32137/TCP   45s

root@controller-node-1:~# cat httproute.yaml

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  labels:
    app.kubernetes.io/managed-by: Helm
  name: hydra-agent-knoway
  namespace: hydra-system
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: gateway
    namespace: default 
  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: knoway-gateway
      port: 8080
      weight: 1
    filters:
    - responseHeaderModifier:
        add:
        - name: Access-Control-Allow-Headers
          value: '*'
        - name: Access-Control-Allow-Methods
          value: '*'
      type: ResponseHeaderModifier
    matches:
    - path:
        type: PathPrefix
        value: /v1

root@controller-node-1:~# kubectl apply -f httproute.yaml

httproute.gateway.networking.k8s.io/hydra-agent-knoway created
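Once the route is created, a quick check is to confirm that the Gateway accepted it and that a request reaches the backend. This is a minimal sketch that reuses the resource names from the manifests above; the `/v1/models` path assumes the backend exposes an OpenAI-compatible API, and the endpoint may also require an API key that is not shown here.

```shell
#!/usr/bin/env bash
# Check that the HTTPRoute was accepted, then probe the gateway over HTTP.
route_accepted() {
  # Prints "True" once the parent Gateway has accepted the route.
  kubectl get httproute hydra-agent-knoway -n hydra-system \
    -o jsonpath='{.status.parents[0].conditions[?(@.type=="Accepted")].status}'
}

gateway_ip() {
  # EXTERNAL-IP assigned to the gateway-istio Service by the load balancer.
  kubectl get svc gateway-istio -n default \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

probe_models() {
  # Assumption: the backend serves an OpenAI-compatible /v1/models endpoint.
  # The Host header avoids depending on DNS or /etc/hosts for this check.
  curl -sS -H "Host: cn-sh-a1" "http://$(gateway_ip)/v1/models"
}

# Usage:
#   route_accepted && echo && probe_models
```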

Configure domain name resolution so that the domain name maps to the gateway's LoadBalancer address (the EXTERNAL-IP of the gateway-istio Service). Append the mapping to /etc/hosts on both the worker cluster and the local computer running the browser:
```
echo "10.64.xx.xx cn-sh-a1" | sudo tee -a /etc/hosts
```
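Running the append command more than once leaves duplicate lines in /etc/hosts. The following sketch adds the mapping only when the hostname is not already present; the function name and the optional file argument are illustrative.

```shell
#!/usr/bin/env bash
# Append "IP HOSTNAME" to a hosts file only if the hostname is not mapped yet.
add_hosts_entry() {
  local ip="$1" host="$2" file="${3:-/etc/hosts}"
  # Look for the hostname as an exact field on any non-comment line.
  if awk -v h="$host" '
      !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
      END { exit !found }' "$file"; then
    echo "${host} already present in ${file}"
  else
    echo "${ip} ${host}" >> "$file"
  fi
}

# Usage (run as root when writing to /etc/hosts; replace the IP with the
# gateway's LoadBalancer address):
#   add_hosts_entry 10.64.xx.xx cn-sh-a1
```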

Set the local computer to trust the certificate:
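How the certificate is trusted depends on how it was issued and on the local operating system. As one possible sketch for a test environment, the snippet below generates a self-signed certificate for the gateway hostname with OpenSSL 1.1.1 or later. The commented usage assumes the `cn-sh-a1` Secret referenced by the Gateway lives in the default namespace and that the local computer is a Debian or Ubuntu system; neither detail is stated in this guide, and browsers that keep their own trust store need a separate import.

```shell
#!/usr/bin/env bash
# Generate a self-signed certificate and key for a hostname (test use only).
make_selfsigned_cert() {
  local host="$1" dir="$2"
  openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -subj "/CN=${host}" -addext "subjectAltName=DNS:${host}" \
    -keyout "${dir}/${host}.key" -out "${dir}/${host}.crt"
}

# Usage (assumptions: self-signed certificate, Debian/Ubuntu workstation):
#   make_selfsigned_cert cn-sh-a1 .
#   kubectl create secret tls cn-sh-a1 -n default --cert=cn-sh-a1.crt --key=cn-sh-a1.key
#   sudo cp cn-sh-a1.crt /usr/local/share/ca-certificates/ && sudo update-ca-certificates
```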

Worker Cluster Initialization

Insert vendor data into the database.
Newer versions of Hydra create vendors automatically, so this step is unnecessary for them.

During implementation (only batch upload is supported), use the mcamel-common-mysql-cluster-mysql workload in the mcamel-system namespace.

Example:

{"enUS": "Alibaba", "zhCN": "通义千问"}

Register Model in Hydra Ops Platform

Configure model deployment parameters (adjust according to actual needs)

Install the NFS driver

The NFS driver is required for downloading model artifacts.

wget http://example.com/nfs-install.tar # replace with your download URL
tar xvf nfs-install.tar
./install.sh

Worker Cluster Model Warm-up

Model warm-up refers to pre-downloading the model image.

When Hydra and AI Lab are both installed, two Dataset CRDs exist. Use the one in the dataset.baizeai.io group.

root@controller-node-1:~# cat dataset-qwen3-06b.yaml

apiVersion: dataset.baizeai.io/v1alpha1 # This should belong to the AI Lab group
kind: Dataset
metadata:
  name: qwen3-0.6b
  namespace: public # must be the public namespace
  labels:
    hydra.io/model-id: qwen3-0.6b # must match the model name
spec:
  dataSyncRound: 1
  share: true
  source:
    type: HTTP
    uri: http://example.com:81/model/qwen3-06b/ # replace with your address
    options:
      repoType: MODEL
  mountOptions:
    uid: 1000
    gid: 1000
    mode: "0774"
    path: /
  volumeClaimTemplate:
    spec:
      storageClassName: nfs-csi

root@jxj:~/hydra-deploy# kubectl create ns public

namespace/public created

root@jxj:~/hydra-deploy# kubectl apply -f dataset-qwen3-06b.yaml

dataset.dataset.baizeai.io/qwen3-0.6b created
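Because two Dataset CRDs can coexist, it is safer to query the warm-up Dataset by its fully qualified resource name. The sketch below assumes the CRD's plural name is `datasets`; the function name is illustrative.

```shell
#!/usr/bin/env bash
# Inspect the warm-up Dataset via its fully qualified resource name so that
# kubectl does not resolve "dataset" to the other CRD.
# Assumption: the plural resource name is "datasets".
show_dataset() {
  local name="$1" ns="${2:-public}"
  kubectl get datasets.dataset.baizeai.io "$name" -n "$ns" -o wide
  # The Dataset declares a volumeClaimTemplate, so list PVCs to check binding.
  kubectl get pvc -n "$ns"
}

# Usage:
#   show_dataset qwen3-0.6b
```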

Try the Model

Try the model in DCE 5.0.

Model Deployment

When running without GPUs, model deployment fails because the deployment check detects no GPU.

To work around this, deploy fake-gpu-operator (pulling the external images requires a proxy):
1. Label the node
```
kubectl label node <node-name> run.ai/simulated-gpu-node-pool=default
```
2. Install fake-gpu-operator
  
  For offline installation, first download the offline package and deploy:
  
  For online installation, run:
```
helm upgrade -i gpu-operator oci://ghcr.io/run-ai/fake-gpu-operator/fake-gpu-operator --namespace gpu-operator --create-namespace --version=0.0.63
```
  After deployment, wait a few minutes, check that the node has GPU labels, then refresh the status to confirm that GPU detection passes.
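The waiting step can be scripted. This sketch polls the node until it advertises GPU resources, assuming fake-gpu-operator exposes them under the `nvidia.com/gpu` resource name; the function names, the 10-second interval, and the 30 attempts are illustrative.

```shell
#!/usr/bin/env bash
# Print the number of nvidia.com/gpu resources a node reports as allocatable.
node_gpu_count() {
  local node="$1"
  kubectl get node "$node" -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
}

# Poll until the node reports at least one GPU, or give up after 30 attempts.
wait_for_fake_gpu() {
  local node="$1" count attempt
  for attempt in $(seq 1 30); do
    count="$(node_gpu_count "$node")"
    if [ "${count:-0}" -gt 0 ] 2>/dev/null; then
      echo "node ${node} advertises ${count} GPU(s)"
      return 0
    fi
    sleep 10
  done
  echo "no GPU resource on ${node} yet" >&2
  return 1
}

# Usage:
#   wait_for_fake_gpu <node-name>
```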

After creating a model deployment task, modify the deployment parameters. Add the following vLLM arguments and use the CPU image shown on the last line:

--dtype=half
--device=cpu
--max-model-len=8192
release.daocloud.io/hydra/vllm-openai:0.8.5.dev940-cpu

With CPU inference, the Qwen3 0.6b model uses about 7 GB of memory. The amount of CPU allocated determines the token generation speed.

Tip

In test environments, model traffic can be routed directly through knoway-gateway.
Create a NodePort Service for the gateway.
When accessing DCE, only the HTTP port can be used.
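The NodePort approach from the tip could be sketched as follows. It assumes the knoway-gateway Service lives in hydra-system, that its pods listen on port 8080, and that the gateway speaks an OpenAI-compatible chat completions API. The Service name `knoway-gateway-nodeport` and the `API_KEY` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Expose knoway-gateway through an additional NodePort Service (test use only).
# Assumption: the gateway pods listen on 8080; adjust --target-port if not.
expose_knoway_nodeport() {
  kubectl expose service knoway-gateway -n hydra-system \
    --name=knoway-gateway-nodeport --type=NodePort \
    --port=8080 --target-port=8080
}

# Print the node port that Kubernetes assigned to the new Service.
knoway_nodeport() {
  kubectl get svc knoway-gateway-nodeport -n hydra-system \
    -o jsonpath='{.spec.ports[0].nodePort}'
}

# Send one chat request over plain HTTP to a node IP.
# Assumption: OpenAI-compatible API; API_KEY is a hypothetical variable.
chat_once() {
  local node_ip="$1" model="$2"
  curl -sS "http://${node_ip}:$(knoway_nodeport)/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${API_KEY:-}" \
    -d "{\"model\": \"${model}\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]}"
}

# Usage:
#   expose_knoway_nodeport
#   chat_once <node-ip> qwen3-0.6b
```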