PerfectScale Logo

Join us for a live webinar

Beyond Load Balancers: Scaling Out GenAI with Message Queues on Kubernetes

Wednesday, September 25th, 2024, 12pm ET.

Load balancers are a staple of scalable, high-throughput, high-availability architectures. They work great for scaling web services. When requests take longer, though, things get complicated: requests pile up on some backends, bursts of traffic send latency through the roof, and by the time autoscaling kicks in, it may be too late, too expensive, or both.

Asynchronous architectures and message queues, combined with event-driven autoscaling, can help a lot here.
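
The key idea behind event-driven autoscaling is that the number of workers follows the backlog of queued work rather than CPU usage. As a rough sketch (not the exact webinar setup; the llm-worker deployment, prompts queue, genai namespace, and thresholds are placeholder assumptions), a KEDA ScaledObject with a RabbitMQ trigger can be created with the Kubernetes Python client like this:

    from kubernetes import client, config

    # Hypothetical names: an "llm-worker" Deployment consuming a "prompts" queue.
    # KEDA's RabbitMQ scaler adds or removes workers as the queue grows or drains,
    # and can scale to zero when the queue is empty.
    scaled_object = {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": "llm-worker-scaler", "namespace": "genai"},
        "spec": {
            "scaleTargetRef": {"name": "llm-worker"},
            "minReplicaCount": 0,
            "maxReplicaCount": 20,
            "triggers": [{
                "type": "rabbitmq",
                "metadata": {
                    "queueName": "prompts",
                    "mode": "QueueLength",
                    "value": "10",                  # target messages per replica
                    "hostFromEnv": "RABBITMQ_URL",  # AMQP connection string
                },
            }],
        },
    }

    config.load_kube_config()
    client.CustomObjectsApi().create_namespaced_custom_object(
        group="keda.sh", version="v1alpha1", namespace="genai",
        plural="scaledobjects", body=scaled_object,
    )

In practice the same resource would be templated in a Helm chart; the point is simply that scaling decisions track queue depth, not CPU.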

Register Now

We're going to see how to implement that pattern on Kubernetes, leveraging:

  • A popular LLM to generate thousands of completions;
  • RabbitMQ and PostgreSQL to store requests and responses;
  • Bento to implement API servers, producers, and consumers without writing code (a hand-rolled equivalent is sketched after this list for reference);
  • Prometheus, Grafana, and KEDA for observability, dashboards, and autoscaling;
  • Helm and Helmfile to automate deployment as much as possible.
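
Bento replaces the hand-written glue, but to make the pattern concrete, here is roughly what one of those consumers does, sketched in Python with pika and psycopg2 (the prompts queue, responses table, environment variables, and generate_completion function are placeholders, not the webinar's actual code):

    import json
    import os

    import pika      # RabbitMQ client
    import psycopg2  # PostgreSQL client

    def generate_completion(prompt: str) -> str:
        """Placeholder for the LLM call, e.g. an HTTP request to a model server."""
        raise NotImplementedError

    def on_message(ch, method, properties, body):
        request = json.loads(body)
        completion = generate_completion(request["prompt"])
        # Store the response, then acknowledge; an unacked message is redelivered
        # if this worker dies mid-completion.
        with db, db.cursor() as cur:
            cur.execute(
                "INSERT INTO responses (request_id, completion) VALUES (%s, %s)",
                (request["id"], completion),
            )
        ch.basic_ack(delivery_tag=method.delivery_tag)

    db = psycopg2.connect(os.environ["POSTGRES_DSN"])
    channel = pika.BlockingConnection(
        pika.URLParameters(os.environ["RABBITMQ_URL"])
    ).channel()
    channel.basic_qos(prefetch_count=1)  # one long-running completion at a time
    channel.basic_consume(queue="prompts", on_message_callback=on_message)
    channel.start_consuming()

The API server side is symmetrical: it publishes the request to the queue and returns immediately (or polls PostgreSQL for the result), which is what keeps slow completions from tying up frontend capacity.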

Who should join:

  • DevOps, Platform, and SRE professionals looking for ways to improve their autoscaling practices. 
  • Data engineers who want a better understanding of running their workloads on Kubernetes.