What AI tools did Baseten use?

Baseten used NVIDIA Dynamo and NVIDIA TensorRT-LLM in this implementation, which falls within the Technology sector.

What business function does this AI use case address?

This use case focuses on Software Engineering, demonstrating how AI can drive impact in that area.

How Baseten Uses NVIDIA Blackwell to Achieve 5x AI Inference Throughput

Baseten, the AI inference platform pooling GPUs from 10+ cloud providers for some of the world’s fastest-growing AI companies, adopted NVIDIA Blackwell GPUs on Google Cloud alongside NVIDIA Dynamo and TensorRT-LLM. The result: 5x higher throughput for high-traffic endpoints, up to 225% better price-performance serving DeepSeek-R1 and Llama 4, and up to 38% lower latency for large language model serving.
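
As a rough aid to reading these figures, the sketch below converts the reported percentages into multiplicative factors. It assumes the common reading that "N% better" means a factor of (1 + N/100) and "N% lower" means a factor of (1 - N/100); the case study does not state how the percentages were computed. The function names and the baseline latency are hypothetical.

```python
def better_to_multiplier(percent_better: float) -> float:
    """Read 'N% better' as a factor of (1 + N/100) over the baseline."""
    return 1.0 + percent_better / 100.0


def lower_to_multiplier(percent_lower: float) -> float:
    """Read 'N% lower' as a factor of (1 - N/100) of the baseline."""
    return 1.0 - percent_lower / 100.0


# Reported figures from the case study.
throughput_factor = 5.0                        # "5x higher throughput"
price_perf_factor = better_to_multiplier(225)  # "up to 225% better"
latency_factor = lower_to_multiplier(38)       # "up to 38% lower"

# Hypothetical baseline, for illustration only (not from the case study).
baseline_latency_ms = 1000.0

print(f"throughput: {throughput_factor:.0f}x baseline")
print(f"price-performance: up to {price_perf_factor:.2f}x baseline")
print(f"latency: as low as {baseline_latency_ms * latency_factor:.0f} ms "
      f"from a hypothetical {baseline_latency_ms:.0f} ms")
```

Under that reading, "up to 225% better price-performance" means up to 3.25 times as much work per dollar, not 2.25 times.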

Outcomes

5x: Throughput improvement for high-traffic endpoints

Up to 225%: Price-performance improvement for reasoning models

Up to 38%: Reduction in LLM serving latency

<5 minutes: GPU provisioning speed

Tools & Technologies

NVIDIA Dynamo

Inference optimization framework for distributed LLM serving on NVIDIA GPUs, enabling high-throughput multi-node deployments.

NVIDIA TensorRT-LLM

Compiler and runtime library that accelerates LLM inference on NVIDIA GPUs through quantization, kernel fusion, and batching.

AI Categories

ML Platform

Developer Tools

Challenge

Baseten needed to serve frontier reasoning models like DeepSeek-R1 and Llama 4 in production without making unacceptable tradeoffs between latency and cost. Its previous GPU infrastructure couldn’t handle the massive context windows and extended inference compute that reasoning models require at competitive price-performance.

Solution

Baseten became the first company to adopt NVIDIA Blackwell GPUs on Google Cloud, pairing them with NVIDIA Dynamo for multi-node inference orchestration and TensorRT-LLM for hardware-optimized model serving. The combination enabled a 5x throughput improvement, up to 225% better price-performance on reasoning models, and up to 38% lower latency.

Full Story

Baseten operates a global AI inference platform that aggregates GPU capacity from more than 10 cloud providers across dozens of regions into a unified pool. The company’s customers are AI-native companies running production workloads on state-of-the-art large language models—and their demands are non-negotiable: low latency, high throughput, and cost efficiency, all at scale. Baseten’s orchestration layer abstracts away the complexity of managing geographically distributed GPU infrastructure, turning a fragmented set of cloud instances into a single fungible compute pool.
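
The idea of a "single fungible compute pool" can be illustrated with a minimal sketch. This is not Baseten's orchestration layer, whose design the case study does not describe. It is a toy model, assuming a caller asks only for a GPU count and the pool picks the location. The class names, provider labels, and the best-fit placement rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Capacity:
    """Free GPUs at one (provider, region) location. Labels are hypothetical."""
    provider: str
    region: str
    free_gpus: int


class UnifiedGpuPool:
    """Toy pool: callers request GPUs without naming a provider or region."""

    def __init__(self, capacities: List[Capacity]) -> None:
        self._capacities = list(capacities)

    def total_free(self) -> int:
        """Total free GPUs across every location in the pool."""
        return sum(c.free_gpus for c in self._capacities)

    def allocate(self, gpus_needed: int) -> Tuple[str, str]:
        """Place the request at the smallest location that fits it (best fit)."""
        if gpus_needed <= 0:
            raise ValueError("gpus_needed must be positive")
        candidates = [c for c in self._capacities if c.free_gpus >= gpus_needed]
        if not candidates:
            raise RuntimeError("no single location can satisfy the request")
        best = min(candidates, key=lambda c: c.free_gpus)
        best.free_gpus -= gpus_needed
        return best.provider, best.region


pool = UnifiedGpuPool([
    Capacity("cloud-a", "us-east", 4),
    Capacity("cloud-b", "eu-west", 16),
])
placement = pool.allocate(8)  # the caller never names a provider
print(placement, pool.total_free())
```

Best fit is only one possible rule, chosen here because it keeps larger blocks of capacity free for larger requests. A production scheduler would also need to weigh latency, cost, and data locality.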

Source

NVIDIA

May 2026
