PerfectScale Logo

Join us for a live webinar

Beyond Load Balancers: Scaling Out GenAI with Message Queues on Kubernetes

Wednesday, September 25th, 2024, 12pm ET.

Load balancers are a staple of scalable, high-throughput, high-availability architectures. They work great for scaling web services. When requests take longer, though, things get complicated: requests pile up on some backends, bursts of traffic send latency through the roof, and by the time autoscaling kicks in, it may be too late, too expensive, or both.

Asynchronous architectures and message queues, combined with event-driven autoscaling, can help a lot here.
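
The key idea behind event-driven autoscaling is that the number of workers follows the backlog of queued work rather than CPU usage. As a rough sketch (not the exact webinar setup; the llm-worker deployment, prompts queue, genai namespace, and thresholds are placeholder assumptions), a KEDA ScaledObject with a RabbitMQ trigger can be created with the Kubernetes Python client like this:

    from kubernetes import client, config

    # Hypothetical names: an "llm-worker" Deployment consuming a "prompts" queue.
    # KEDA's RabbitMQ scaler adds or removes workers as the queue grows or drains,
    # and can scale to zero when the queue is empty.
    scaled_object = {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": "llm-worker-scaler", "namespace": "genai"},
        "spec": {
            "scaleTargetRef": {"name": "llm-worker"},
            "minReplicaCount": 0,
            "maxReplicaCount": 20,
            "triggers": [{
                "type": "rabbitmq",
                "metadata": {
                    "queueName": "prompts",
                    "mode": "QueueLength",
                    "value": "10",                  # target messages per replica
                    "hostFromEnv": "RABBITMQ_URL",  # AMQP connection string
                },
            }],
        },
    }

    config.load_kube_config()
    client.CustomObjectsApi().create_namespaced_custom_object(
        group="keda.sh", version="v1alpha1", namespace="genai",
        plural="scaledobjects", body=scaled_object,
    )

In practice the same resource would be templated in a Helm chart; the point is simply that scaling decisions track queue depth, not CPU.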

Register Now

We're going to see how to implement that pattern on Kubernetes, leveraging:

  • A popular LLM to generate thousands of completions;
  • RabbitMQ and PostgreSQL to store requests and responses;
  • Bento to implement API servers, producers, and consumers without writing code (a hand-rolled equivalent is sketched after this list for reference);
  • Prometheus, Grafana, and KEDA for observability, dashboards, and autoscaling;
  • Helm and Helmfile to automate deployment as much as possible.
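
Bento replaces the hand-written glue, but to make the pattern concrete, here is roughly what one of those consumers does, sketched in Python with pika and psycopg2 (the prompts queue, responses table, environment variables, and generate_completion function are placeholders, not the webinar's actual code):

    import json
    import os

    import pika      # RabbitMQ client
    import psycopg2  # PostgreSQL client

    def generate_completion(prompt: str) -> str:
        """Placeholder for the LLM call, e.g. an HTTP request to a model server."""
        raise NotImplementedError

    def on_message(ch, method, properties, body):
        request = json.loads(body)
        completion = generate_completion(request["prompt"])
        # Store the response, then acknowledge; an unacked message is redelivered
        # if this worker dies mid-completion.
        with db, db.cursor() as cur:
            cur.execute(
                "INSERT INTO responses (request_id, completion) VALUES (%s, %s)",
                (request["id"], completion),
            )
        ch.basic_ack(delivery_tag=method.delivery_tag)

    db = psycopg2.connect(os.environ["POSTGRES_DSN"])
    channel = pika.BlockingConnection(
        pika.URLParameters(os.environ["RABBITMQ_URL"])
    ).channel()
    channel.basic_qos(prefetch_count=1)  # one long-running completion at a time
    channel.basic_consume(queue="prompts", on_message_callback=on_message)
    channel.start_consuming()

The API server side is symmetrical: it publishes the request to the queue and returns immediately (or polls PostgreSQL for the result), which is what keeps slow completions from tying up frontend capacity.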

Who should join:

  • DevOps, Platform, and SRE professionals looking for ways to improve their autoscaling practices. 
  • Data engineers who want a better understanding of running their workloads on Kubernetes.