PerfectScale by DoiT

ON-DEMAND WEBINAR

Manage & Scale GenAI on Kubernetes

If you're working with LLMs or production AI workloads and want to leverage Kubernetes effectively, this session is for you. Join us for a deep dive into managing and scaling Generative AI on Kubernetes. 

What we'll cover: 

  • How to run AI models for inference on Kubernetes for production: from packaging your model to scaling and performance monitoring
  • Kubernetes, GPUs, and quota management
  • How Kubernetes itself is evolving to better support LLM workloads (DRA, Gateway Extension, LeaderWorkerSet, Kueue)
  • How Kubernetes works together with the ecosystem to manage training and inference workloads (vLLM, Kubeflow, KServe, Llama Stack, llm-d)
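
As a small taste of the GPU and quota topics above, here is a minimal sketch of how a Kubernetes namespace can cap GPU usage with a ResourceQuota while a pod requests a GPU. All names, namespaces, and images are illustrative placeholders, not examples from the webinar or book:

```yaml
# Illustrative only: names, namespace, and image are assumptions.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: ai-team              # hypothetical team namespace
spec:
  hard:
    requests.nvidia.com/gpu: "4"  # cap the total GPUs the namespace can request
---
apiVersion: v1
kind: Pod
metadata:
  name: llm-inference             # hypothetical inference pod
  namespace: ai-team
spec:
  containers:
    - name: model-server
      image: example.com/llm-server:latest  # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1       # extended resources like GPUs are set via limits
```

The webinar covers how newer mechanisms such as Dynamic Resource Allocation (DRA) and Kueue extend this basic model for LLM-scale scheduling.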
This webinar is a practical companion to the book "Generative AI on Kubernetes," authored by our hosts Roland Huß and Daniele Zonca, which offers hands-on strategies for running and optimizing your infrastructure to support these large-scale workloads.

Who this webinar is for:

  • DevOps engineers and platform teams looking to support AI/LLM workloads
  • ML/AI engineers deploying models in production environments
  • Kubernetes administrators and architects interested in AI scalability
  • Anyone curious about running or scaling LLMs using modern Kubernetes tools

Meet our Experts

Roland Huß

Distinguished Engineer, Red Hat

Roland Huß is a Distinguished Engineer at Red Hat with over 25 years of programming experience. He currently works as the Llama Stack architect within Red Hat OpenShift AI (RHOAI), where he focuses on integrating the Llama Stack to advance AI-driven development workflows. He is also a co-author of Kubernetes Patterns (O’Reilly), sharing his extensive expertise in cloud-native architecture, AI integration, and serverless innovation.

Daniele Zonca

Senior Principal Software Engineer, Red Hat

Daniele Zonca is a Senior Principal Software Engineer at Red Hat and the model serving architect for the Red Hat OpenShift AI product. He is one of the founders of the TrustyAI project and contributes to many open source projects such as KServe, vLLM, and Kubeflow. Before that, he led the Big Data development team at one of the major European banks, designing and implementing analytical engines.

Anton Weiss

Chief Cluster Whisperer, PerfectScale

Software delivery optimization expert and Kubernetes fanboy. With previous experience as a CD Unit Leader, Head of DevOps, CTO, and CEO, he has worn many hats as a consultant, instructor, and public speaker.

He is passionate about leveraging his expertise to support the needs of DevOps, Platform Engineering, and Kubernetes communities.