NVIDIA TensorRT-LLM
Compiler and runtime library that accelerates LLM inference on NVIDIA GPUs through quantization, kernel fusion, and batching.
Use Cases: 1
Companies: 1
Industries: 1
AI Use Cases with NVIDIA TensorRT-LLM
How Baseten Uses NVIDIA Blackwell to Achieve 5x AI Inference Throughput
Baseten · Software Engineering
5x throughput improvement for high-traffic endpoints
Often Used With
NVIDIA Dynamo
Inference optimization framework for distributed LLM serving on NVIDIA GPUs, enabling high-throughput multi-node deployments.