If you wanted to effortlessly move your AI inferencing and modeling workloads across the clouds, what would you need from Kubernetes?
The Cloud Native Computing Foundation (CNCF) wants to know.
CNCF is creating a program for certifying Kubernetes distributions that can run select types of AI workloads. But first it needs a set of requirements and recommendations, and it is looking for your help.
The idea is to replicate what CNCF has done with the conformance guide for Kubernetes. Thus far, well over 100 K8s distributions have made that list.
A workload running on a Kubernetes-conformant distribution, whether it is on a public or private cloud, can be moved into another conformant environment with no changes.
“We want to do the same thing for AI workloads,” said CNCF CTO Chris Aniszczyk during KubeCon + CloudNativeCon China in June. This will require a set of capabilities, APIs and configurations that a Kubernetes cluster must offer (on top of the regular conformance).
The idea is to provide a “baseline compatibility” across different environments, the globe-trotting Aniszczyk further explained at KubeCon + CloudNativeCon Japan.
When the “CNCF started, the whole idea was to build infrastructure that would run on every cloud,” be it public or private, he said.
The work of defining the AI requirements is being carried out in SIG Architecture, within a working group newly formed for the task.
The goal of this group is “to define a standardized set of capabilities, APIs, and configurations that a Kubernetes cluster must offer to reliably and efficiently run AI/ML [machine learning] workloads,” the working group’s GitHub page explains.
This work will also set the stage for a broader “Cloud Native AI Conformance” definition, one covering other aspects of cloud native computing, such as telemetry, storage and security.
Google, Red Hat and other commercial firms are lending resources to the project.

Commoditize Kubernetes
In early virtual discussions, the overall goal that emerged was to make AI/ML workload platforms as commoditized as possible. “The hope is to minimize the amount of DIY and framework-specific patches needed to run AI/ML workloads,” a working group contributor wrote.
The group identified three types of workloads well-suited for Kubernetes:
- Large-scale training and fine-tuning: Key platform requirements include access to high-performance accelerators, high-throughput and topology-aware networking, gang scheduling and scalable access to data.
- High-performance inference: Key platform requirements include access to accelerators, advanced traffic management and standardized metrics for monitoring latency and throughput.
- MLOps pipelines: Key platform requirements include a robust batch job system, a queuing system for managing resource contention, secure access to other services like object storage and model registries, and reliable CRD/operator support.
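As a rough illustration of the first workload type, today most clusters expose accelerators as extended resources that a training job requests by count. The manifest below is a minimal sketch; the image name and the `nvidia.com/gpu` resource (published by a vendor device plugin) are illustrative, not part of the draft requirements.

```yaml
# Hypothetical fine-tuning Job requesting two GPUs via an extended resource.
apiVersion: batch/v1
kind: Job
metadata:
  name: finetune-llm
spec:
  completions: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: trainer
          image: example.com/trainer:latest   # placeholder image
          resources:
            limits:
              nvidia.com/gpu: 2   # resource name exposed by the GPU device plugin
```

Count-based requests like this are exactly the coarse-grained model that newer mechanisms aim to improve on.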
The draft document also lists a set of recommended practices (“should”) and flat-out requirements (“must”), many of which are based on recent Kubernetes enhancements for the AI crowd.
For instance, a Kubernetes AI-compliant system must support Dynamic Resource Allocation (DRA), which will be fully available in the upcoming Kubernetes 1.34 release later this month. DRA provides more flexible and fine-grained resource controls, such as the ability to specify GPUs.
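With DRA, a workload claims devices through a `ResourceClaim` that references a `DeviceClass` published by a driver, rather than a bare GPU count. A minimal sketch, assuming a hypothetical `example-gpu` class and placeholder image (the `resource.k8s.io` API version varies by Kubernetes release):

```yaml
# Sketch of a DRA ResourceClaim plus a Pod that consumes it.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: single-gpu
spec:
  devices:
    requests:
      - name: gpu
        deviceClassName: example-gpu   # hypothetical class published by a DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: model-server
spec:
  containers:
    - name: server
      image: example.com/model-server:latest   # placeholder image
      resources:
        claims:
          - name: gpu-claim            # consume the claim declared below
  resourceClaims:
    - name: gpu-claim
      resourceClaimName: single-gpu
```

The claim is a first-class API object, which is what makes the finer-grained controls (device selection, sharing, driver-specific parameters) possible.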
It also must support the Kubernetes Gateway API Inference extension, which specifies traffic routing patterns for LLMs.
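The inference extension adds CRDs such as `InferencePool`, which groups model-serving pods behind a Gateway so traffic can be routed with inference-aware logic. A hedged sketch follows; the field shapes track the v1alpha2 CRD and may change, and the labels and endpoint-picker name are hypothetical:

```yaml
# Sketch of an InferencePool from the Gateway API Inference Extension.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: llm-pool
spec:
  selector:
    app: vllm-serving          # hypothetical label on the model-serving pods
  targetPortNumber: 8000       # port the model servers listen on
  extensionRef:
    name: llm-endpoint-picker  # hypothetical endpoint-picker service
```

An `HTTPRoute` can then point its `backendRefs` at the pool instead of a plain Service.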
The cluster autoscaler must be able to scale node groups up and down when specific accelerator types are requested.
And so on…
The Certification Program
A separate, as-yet-unnamed group will be in charge of accreditation.
The certification program will have a public website listing all the Kubernetes distributions that have passed the conformance tests; distributions will be retested annually. Each will submit a completed YAML-based conformance checklist.
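The AI checklist schema has not been published yet, but the existing Kubernetes conformance program gives a sense of the shape: submissions to the cncf/k8s-conformance repository include a `PRODUCT.yaml` describing the distribution. A plausible entry, with placeholder values, might look like:

```yaml
# Hypothetical submission modeled on the existing k8s-conformance PRODUCT.yaml;
# the actual AI conformance checklist fields are still being defined.
vendor: Example Cloud
name: Example Kubernetes Engine
version: v1.34
website_url: https://example.com/kubernetes   # placeholder
documentation_url: https://example.com/docs   # placeholder
type: hosted
description: Managed Kubernetes distribution targeting AI/ML conformance
```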
CNCF plans to unveil the finished conformance guide at this year’s KubeCon+CloudNativeCon North America 2025 in Atlanta, Nov. 10-13.
Conformance could eliminate the chaos of custom configurations, ensuring a seamless migration of AI models across environments.

The post CNCF Seeks Requirements for K8s-Portable AI/ML Workloads appeared first on The New Stack.