DaoCloud is Among the First to Pass CNCF Kubernetes AI Conformance¶
As AI/ML workloads drive exponential growth in demand for compute and hardware acceleration, CNCF has launched the Kubernetes AI Conformance certification standard. It builds on the baseline Kubernetes Conformance certification and defines AI-specific features, APIs, and configuration requirements, providing a uniform benchmark for cross-environment portability and efficient execution of AI workloads.
Note
Any vendor’s customized Kubernetes platform or distribution must first obtain the Kubernetes Conformance certification before it is eligible to apply for AI Conformance.
As a leading open-source company in China, DaoCloud keeps close pace with cloud-native AI. As soon as the community released the Kubernetes AI Conformance standard, DaoCloud launched AI Conformance testing for its widely deployed, Kubernetes v1.33-based DCE 5.0 platform. The platform passed in October 2025 (link to certificate PR), becoming the first enterprise-grade AI/ML platform in China to earn the certification for this Kubernetes version.
DCE 5.0 is a high-performance, scalable, cloud-native AI operating system. It delivers a consistent, stable experience across any infrastructure or environment, and supports heterogeneous clouds, edge clouds, and multi-cloud orchestration. The platform integrates service mesh and microservice technologies for full-link traffic tracing, and provides intelligent monitoring and dynamic visualization dashboards that make the health of clusters, nodes, applications, and services clearly observable. It also natively supports DevOps and GitOps for standardized, automated application delivery, and ships with curated databases and middleware to make operations more efficient.
DCE 5.0’s modular architecture ensures each capability is decoupled and upgradeable, while also integrating with a rich AI ecosystem to provide end-to-end solutions. Validated in production by nearly a thousand enterprise customers, it forms a robust digital foundation that helps enterprises unlock AI productivity and move toward an intelligent, AI-driven digital future.
About AI Conformance Requirements¶
The AI Conformance specification defines two categories of requirements, MUST and SHOULD, covering the capabilities critical to AI workloads:
- MUST: core capabilities such as accelerator resource allocation, AI inference ingress, gang scheduling, autoscaling, performance telemetry, and secure accelerator access, ensuring the platform can reliably support foundational AI training and inference.
- SHOULD: advanced capabilities such as GPU sharing, high-performance storage, topology-aware scheduling, and confidential computing, which further optimize and refine AI platforms.
MUST¶
| Category | Item | Functional Requirement | Test Requirement |
|---|---|---|---|
| Accelerator | Accelerator resource exposure & allocation | Must support the Dynamic Resource Allocation (DRA) API to allow more granular resource requests than simple counting | Validate all resource.k8s.io/v1 DRA resources enabled |
| Network | Advanced AI inference ingress | Must support the Kubernetes Gateway API for advanced traffic management of model inference services | Validate all gateway.networking.k8s.io/v1 Gateway API resources enabled |
| Scheduling & Orchestration | Gang scheduling | Must allow installation and successful operation of at least one gang scheduling implementation | Prove that at least one gang scheduler works end-to-end |
| | Effective autoscaling for AI workloads | Cluster autoscaler must scale node groups by accelerator type | Create a node pool and (A*N)+1 Pods requesting accelerators; verify scaling |
| | | HorizontalPodAutoscaler must correctly autoscale Pods that use accelerators | Configure a custom metrics pipeline, Deployment, and HPA; apply load and verify |
| Observability & Telemetry | Accelerator performance metrics | Must support at least one accelerator metrics solution with fine-grained KPIs | Scrape a Prometheus-compatible endpoint and verify accelerator metrics |
| | AI job & inference service metrics | Must provide metrics in a standard format | Deploy an app, send traffic, verify metrics collected |
| Security | Secure accelerator access | Must guarantee in-container access isolation & control | Deploy a Pod and verify unauthorized access is denied |
| AI Framework & Operators | Robust CRDs & controllers | Must support at least one complex AI Operator installed and working | Deploy the Operator, verify Pods/Webhooks/CRDs run normally |
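To make a few of the MUST items above concrete, some illustrative manifests follow; every name and image in them is hypothetical. First, the DRA requirement: instead of counting devices with an extended resource, a workload files a ResourceClaim against a DeviceClass. A minimal sketch, assuming the accelerator's DRA driver publishes a DeviceClass called gpu.example.com:

```yaml
# Request one accelerator through the DRA API (resource.k8s.io/v1).
# "gpu.example.com" is a hypothetical DeviceClass; substitute the one
# your accelerator's DRA driver actually installs.
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: single-gpu
spec:
  devices:
    requests:
    - name: gpu
      exactly:
        deviceClassName: gpu.example.com
---
# Consume the claim from a Pod; the scheduler allocates a concrete
# device rather than decrementing an opaque counter.
apiVersion: v1
kind: Pod
metadata:
  name: training-pod
spec:
  containers:
  - name: trainer
    image: registry.example.com/trainer:latest  # placeholder image
    resources:
      claims:
      - name: gpu            # refers to the resourceClaims entry below
  resourceClaims:
  - name: gpu
    resourceClaimName: single-gpu
```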
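The inference ingress requirement maps to the standard Gateway API resources. A sketch, assuming a hypothetical GatewayClass example-lb and an inference Service model-server listening on port 8000:

```yaml
# Route inference traffic through the Gateway API
# (gateway.networking.k8s.io/v1) rather than a bare Ingress.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: inference-gateway
spec:
  gatewayClassName: example-lb       # hypothetical GatewayClass
  listeners:
  - name: http
    protocol: HTTP
    port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: model-route
spec:
  parentRefs:
  - name: inference-gateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1/completions
    backendRefs:
    - name: model-server             # hypothetical inference Service
      port: 8000
```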
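The gang scheduling item only requires that at least one implementation work end-to-end; Volcano is one common choice. A sketch in which four worker Pods are scheduled together or not at all:

```yaml
# A Volcano PodGroup: scheduling is all-or-nothing for minMember Pods,
# which avoids deadlocked, half-started distributed training jobs.
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: dist-training
spec:
  minMember: 4                       # gang size
---
apiVersion: v1
kind: Pod
metadata:
  name: worker-0
  annotations:
    scheduling.k8s.io/group-name: dist-training  # join the gang
spec:
  schedulerName: volcano             # hand the Pod to the gang scheduler
  containers:
  - name: worker
    image: registry.example.com/trainer:latest   # placeholder image
```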
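Finally, the HPA requirement, sketched against a per-Pod custom metric. It assumes a metrics adapter already serves a hypothetical inference_requests_per_second metric through the custom metrics API:

```yaml
# Scale an accelerator-backed Deployment on a custom metric rather
# than CPU, since GPU-bound Pods rarely saturate CPU.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: model-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: model-server               # hypothetical inference Deployment
  minReplicas: 1
  maxReplicas: 8
  metrics:
  - type: Pods
    pods:
      metric:
        name: inference_requests_per_second  # hypothetical custom metric
      target:
        type: AverageValue
        averageValue: "10"           # target requests/s per Pod
```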
SHOULD¶
| Category | Item | Functional Requirement |
|---|---|---|
| Accelerator | Driver & runtime management | Verifiable mechanisms to ensure a compatible driver/runtime is installed; expose the driver version via DRA |
| | GPU sharing | If supported, must have a clear mechanism to improve utilization for partial-GPU workloads |
| | Virtual accelerators | If vGPUs are supported, must expose/manage them via DRA |
| | Hardware topology awareness | Node topology (accelerator & NIC layout) should be discoverable & exposed via DRA |
| Storage | High-performance storage | High-IOPS/high-throughput block/file storage exposed via a StorageClass |
| | | Provide at least one RWX high-performance CSI StorageClass |
| | Image pull optimization | Support fast pulling of large images (caching / streaming) |
| | Data cache | Allow caching of frequently accessed data near compute nodes |
| Network | High-performance Pod-to-Pod networking | Use DRA to attach Pods to multiple NICs for high-performance networking |
| | Advanced AI inference ingress | Support Gateway API inference extensions (model hosting, LLM routing) |
| | NetworkPolicy enforcement | Must have a provider installed & active; enforce user NetworkPolicies |
| Scheduling & Orchestration | Enhanced batch job management | Support the JobSet API for tightly coupled Jobs |
| | | Support the Kueue API (queues, fairness, gang scheduling) |
| | Effective autoscaling | Support heterogeneous node groups and affinity/anti-affinity/taints |
| | Accelerator topology-aware scheduling | If accelerator interconnects are discoverable, the scheduler should use them |
| Security | Secure workload authentication | Mechanism for secure service access without long-lived static credentials |
| | Confidential AI | Support confidential containers in a TEE |
| | Model/software supply chain security | Admission control integrated with Sigstore/Cosign & policy engines |
| | Untrusted code sandboxing | Strong isolation to protect processes/memory/network |
| Maintenance & Repair | Faulty device detection | Mechanisms to detect faulty devices (and optionally auto-heal) |
| | Advance maintenance notification | Provide early scheduled maintenance alerts |
| | Gang maintenance for highly interconnected nodes | Support gang maintenance to minimize disruption |
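Among the SHOULD items, the queueing requirement is commonly met with Kueue. A minimal sketch, assuming a LocalQueue named team-a-queue already exists in the Job's namespace (queue name and image are hypothetical):

```yaml
# Submit a batch Job through Kueue: the Job is created suspended and
# is resumed only when the queue grants it quota, giving fair sharing
# and all-or-nothing admission of the whole Job.
apiVersion: batch/v1
kind: Job
metadata:
  name: finetune-job
  labels:
    kueue.x-k8s.io/queue-name: team-a-queue  # hypothetical LocalQueue
spec:
  suspend: true          # Kueue flips this to false on admission
  parallelism: 4
  completions: 4
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: registry.example.com/trainer:latest  # placeholder image
```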
Note
These items and categories may evolve as the industry and the standard develop; they are listed here for reference only. See the cncf/ai-conformance README for the latest version.
Certified platforms can officially display the CNCF-recognized AI Conformance mark, signaling that the distribution is AI-friendly.
DaoCloud Leads China and Ranks Among the World’s First¶
For Kubernetes v1.33, only a handful of platforms have passed so far. The DaoCloud DCE platform was certified alongside Red Hat OpenShift, SUSE RKE2, and other globally recognized platforms, demonstrating that Chinese vendors are at the frontier of cloud-native AI:
- DaoCloud DCE
- NeoNephos Foundation Gardener
- Giant Swarm Platform
- Red Hat OpenShift Container Platform
- SUSE RKE2
- Sidero Labs Talos Linux
DCE 5.0’s certification shows it can deliver a standards-compliant, portable, and highly reliable AI runtime, handling large-scale model training, high-performance inference, and MLOps pipelines with efficiency and elasticity. Going forward, DaoCloud will continue to invest in cloud-native AI and provide an ever-stronger foundation for enterprise AI transformation.