PerfectScale by DoiT

ON-DEMAND WEBINAR

Manage & Scale GenAI on Kubernetes

If you're working with LLMs or production AI workloads and want to leverage Kubernetes effectively, this session is for you. Join us for a deep dive into managing and scaling Generative AI on Kubernetes. 

What we'll cover: 

  • How to run AI models for inference on Kubernetes for production: from packaging your model to scaling and performance monitoring
  • Kubernetes, GPUs, and quota management
  • How Kubernetes itself is evolving to better support LLM workloads (DRA, Gateway Extension, LeaderWorkerSet, Kueue)
  • How Kubernetes works together with the ecosystem to manage training and inference workloads (vLLM, Kubeflow, KServe, Llama Stack, llm-d)
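
As a small taste of the GPU and quota topics above, here is a minimal sketch of how a Kubernetes namespace can cap GPU usage with a ResourceQuota while a pod requests a GPU. All names, namespaces, and images are illustrative placeholders, not examples from the webinar or book:

```yaml
# Illustrative only: names, namespace, and image are assumptions.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: ai-team              # hypothetical team namespace
spec:
  hard:
    requests.nvidia.com/gpu: "4"  # cap the total GPUs the namespace can request
---
apiVersion: v1
kind: Pod
metadata:
  name: llm-inference             # hypothetical inference pod
  namespace: ai-team
spec:
  containers:
    - name: model-server
      image: example.com/llm-server:latest  # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1       # extended resources like GPUs are set via limits
```

The webinar covers how newer mechanisms such as Dynamic Resource Allocation (DRA) and Kueue extend this basic model for LLM-scale scheduling.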
This webinar is a practical companion to the book "Generative AI on Kubernetes," authored by our hosts Roland Huß and Daniele Zonca, which offers hands-on strategies for running and optimizing your infrastructure to support these large-scale workloads.

Who this webinar is for:

  • DevOps engineers and platform teams looking to support AI/LLM workloads
  • ML/AI engineers deploying models in production environments
  • Kubernetes administrators and architects interested in AI scalability
  • Anyone curious about running or scaling LLMs using modern Kubernetes tools

Meet our Experts

Roland Huß

Distinguished Engineer, Red Hat

Roland Huß is a Distinguished Engineer at Red Hat with over 25 years of programming experience. He currently works as the Llama Stack architect within Red Hat OpenShift AI (RHOAI), where he focuses on integrating the Llama Stack to advance AI-driven development workflows. He is also a co-author of Kubernetes Patterns (O’Reilly), sharing his extensive expertise in cloud-native architecture, AI integration, and serverless innovation.

Daniele Zonca

Senior Principal Software Engineer, Red Hat

Daniele Zonca is a Senior Principal Software Engineer at Red Hat and the model serving architect for the Red Hat OpenShift AI product. He is one of the founders of the TrustyAI project and contributes to many open source projects such as KServe, vLLM, and Kubeflow. Before that, he led the Big Data development team at one of the major European banks, designing and implementing analytical engines.

Anton Weiss

Chief Cluster Whisperer, PerfectScale

Software delivery optimization expert and Kubernetes fanboy. With previous experience as a CD Unit Leader, Head of DevOps, CTO, and CEO, he has worn many hats as a consultant, instructor, and public speaker.

He is passionate about leveraging his expertise to support the needs of DevOps, Platform Engineering, and Kubernetes communities.