Introducing llamaBench — LLM Inference Benchmark Runner
What is llamaBench? llamaBench is an LLM inference benchmark runner designed for any OpenAI-compatible server. It works with any backend — ROCm (Lemonade SDK), CUDA (Ollama, vLLM), Vulkan, or CPU ...