Examples

This section provides practical examples of using LeaderWorkerSet (LWS) in various scenarios, both with and without a specific inference runtime:

Infrastructure Examples

  • Horizontal Pod Autoscaler (HPA) - Autoscale LWS replicas based on resource metrics

Inference Runtime Examples

  • vLLM - Deploy distributed inference with vLLM on GPUs/TPUs
  • TensorRT-LLM - High-performance inference with TensorRT-LLM
  • SGLang - Fast inference with SGLang's structured generation runtime
  • llama.cpp - CPU-based inference with llama.cpp

Each example includes detailed configuration files, deployment instructions, and best practices for production use.
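As a shared reference for the examples below, a minimal LeaderWorkerSet manifest follows the shape sketched here. The name, group size, and image tags are illustrative placeholders, not taken from any specific example:

```yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: example-lws        # placeholder name
spec:
  replicas: 2              # number of leader/worker groups
  leaderWorkerTemplate:
    size: 4                # pods per group, including the leader
    leaderTemplate:        # optional; workerTemplate is used for all pods if omitted
      spec:
        containers:
        - name: leader
          image: example/leader:latest   # placeholder image
    workerTemplate:
      spec:
        containers:
        - name: worker
          image: example/worker:latest   # placeholder image
```

Each example then customizes the leader and worker pod templates for its runtime.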


vLLM

An example of using vLLM with LWS
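A common pattern for distributed vLLM on LWS — sketched here, not the exact manifest from the example — is to start a Ray head plus the vLLM server on the leader and have each worker join the Ray cluster. The model name, image tag, parallelism degree, and GPU counts below are illustrative assumptions:

```yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: vllm
spec:
  replicas: 1
  leaderWorkerTemplate:
    size: 2   # leader + 1 worker
    leaderTemplate:
      spec:
        containers:
        - name: vllm-leader
          image: vllm/vllm-openai:latest   # illustrative tag
          command: ["sh", "-c"]
          args:
          - ray start --head --port=6379 &&
            vllm serve meta-llama/Llama-2-7b-hf --tensor-parallel-size 2
          resources:
            limits:
              nvidia.com/gpu: "1"
    workerTemplate:
      spec:
        containers:
        - name: vllm-worker
          image: vllm/vllm-openai:latest
          command: ["sh", "-c"]
          # LWS injects the leader's address into each pod via LWS_LEADER_ADDRESS
          args:
          - ray start --address=$(LWS_LEADER_ADDRESS):6379 --block
          resources:
            limits:
              nvidia.com/gpu: "1"
```

The published example includes the full manifest plus a Service and deployment instructions.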

TensorRT-LLM

An example of using TensorRT-LLM with LWS

llama.cpp

An example of using llama.cpp with LWS

SGLang

An example of using SGLang with LWS

Horizontal Pod Autoscaler (HPA)

An example of using Horizontal Pod Autoscaler with LeaderWorkerSet
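Because LeaderWorkerSet exposes the scale subresource, a standard autoscaling/v2 HPA can target it directly; scaling changes the number of leader/worker groups. The target name and thresholds below are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: lws-hpa
spec:
  scaleTargetRef:
    apiVersion: leaderworkerset.x-k8s.io/v1
    kind: LeaderWorkerSet
    name: example-lws      # illustrative target name
  minReplicas: 1
  maxReplicas: 5           # each replica is a whole leader/worker group
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
```

Note that scaling operates on entire groups, so each step up or down adds or removes `size` pods at once.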

Last modified September 22, 2025: add hpa docs (c1e9ac6)