How Trillion Labs Cuts LLM Training Time 7x with NVIDIA NeMo Curator

Trillion Labs, a Korean AI startup building sovereign LLMs for the Korean language, deployed NVIDIA NeMo Curator to accelerate data curation across 2 trillion tokens. GPU-accelerated processing on 8x H100s cut processing time from 24 hours to 3.4 hours, a 7x improvement, and reduced compute costs by up to 10x compared to CPU pipelines, while delivering a 5% accuracy boost for Korean language models.

Impact

7x

Data processing speedup

up to 10x

Compute cost reduction vs CPU

5%

Korean language accuracy improvement

Challenge

Trillion Labs’ CPU-based data curation pipeline for Korean LLM training took 24 hours per run on datasets exceeding 2 trillion tokens, creating iteration bottlenecks that slowed model development and made rapid experimentation on high-quality Korean language data practically impossible.

Solution

Trillion Labs deployed NVIDIA NeMo Curator on 8x H100 GPUs with Dask for parallel processing, GPU-accelerating deduplication, quality filtering, and data shuffling across 100 billion curated Korean tokens. Processing time fell from 24 hours to 3.4 hours, and compute costs dropped by up to 10x.

What Leaders Say

Deduplication is one of the most time-consuming processes when handling very large datasets. The time saved from NeMo Curator GPU acceleration was the most significant benefit.

Jason Park, Co-Founder, Trillion Labs

Full Story

Trillion Labs is a Korean AI startup dedicated to building sovereign large language models for the Korean language. Its mission is to close the gap between English-dominant foundation models and the needs of Korean public sector organizations and enterprises, which require LLMs that understand Korean linguistic nuance, government terminology, and cultural context. Building high-quality Korean LLMs at scale requires curation pipelines capable of processing datasets exceeding 2 trillion tokens — volumes that expose every inefficiency in traditional CPU-based workflows.

The core problem was data pipeline throughput. On CPU infrastructure, deduplication and shuffling at this scale took 24 hours per run. Iteration cycles became prohibitively slow: every experiment in model architecture or data composition meant waiting nearly a full day just to finish preprocessing. This bottleneck made it impossible to move quickly on model development, compounding the disadvantage relative to resource-rich competitors working on high-resource language models.

Trillion Labs deployed NVIDIA NeMo Curator, a GPU-accelerated data curation library, on a cluster of 8x H100 GPUs, using Dask for parallel and distributed processing. GPU acceleration was applied to the most compute-intensive steps (exact and fuzzy deduplication, quality filtering, and data shuffling) across a curated dataset of 100 billion high-quality Korean tokens. NVIDIA NeMo provided the broader framework for downstream model training and evaluation.
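To make the curation steps named above concrete, here is a minimal, CPU-only Python sketch of the same sequence on a toy corpus: exact deduplication by hashing normalized text, fuzzy deduplication with MinHash signatures over character 5-grams, a heuristic quality filter, and a seeded shuffle. It is not the NeMo Curator API and uses neither GPUs nor Dask. Every function name, threshold, and the symbol-ratio quality heuristic are hypothetical assumptions chosen for illustration.

```python
import hashlib
import random
import re

# Minimal CPU sketch of a curation pipeline; NOT the NeMo Curator API.
# All names and thresholds below are hypothetical, chosen for illustration.

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())

def exact_dedup(docs: list[str]) -> list[str]:
    """Keep the first document for each distinct hash of normalized text."""
    seen: set[str] = set()
    kept = []
    for doc in docs:
        digest = hashlib.md5(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept

def shingles(text: str, n: int = 5) -> set[str]:
    """Character n-grams; these work for Korean without a word tokenizer."""
    t = normalize(text)
    if len(t) < n:
        return {t}
    return {t[i:i + n] for i in range(len(t) - n + 1)}

def minhash_signature(shingle_set: set[str], num_perm: int = 64) -> list[int]:
    """For each seeded hash function, keep the minimum hash over all shingles."""
    return [
        min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{s}".encode("utf-8"), digest_size=8).digest(),
                "big",
            )
            for s in shingle_set
        )
        for seed in range(num_perm)
    ]

def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def fuzzy_dedup(docs: list[str], threshold: float = 0.8) -> list[str]:
    """Drop a document whose estimated similarity to any kept one is >= threshold."""
    kept, kept_sigs = [], []
    for doc in docs:
        sig = minhash_signature(shingles(doc))
        if all(estimated_jaccard(sig, k) < threshold for k in kept_sigs):
            kept.append(doc)
            kept_sigs.append(sig)
    return kept

def passes_quality(doc: str, min_chars: int = 20, max_symbol_ratio: float = 0.3) -> bool:
    """Hypothetical heuristic: long enough and not dominated by symbols."""
    stripped = doc.strip()
    if len(stripped) < min_chars:
        return False
    symbols = sum(1 for ch in stripped if not (ch.isalnum() or ch.isspace()))
    return symbols / len(stripped) <= max_symbol_ratio

def curate(docs: list[str], seed: int = 0) -> list[str]:
    docs = exact_dedup(docs)
    docs = fuzzy_dedup(docs)
    docs = [d for d in docs if passes_quality(d)]
    random.Random(seed).shuffle(docs)  # seeded shuffle for reproducible runs
    return docs

corpus = [
    "한국어 대규모 언어 모델을 위한 고품질 데이터 정제 예시 문장입니다.",
    "한국어 대규모 언어 모델을 위한 고품질 데이터 정제 예시 문장입니다.",  # exact duplicate
    "한국어  대규모 언어 모델을 위한 고품질 데이터 정제 예시 문장입니다!",  # near duplicate
    "!!!! ### $$$ %%%% ^^^^ &&&& ****",  # low-quality, symbol-heavy line
    "Deduplication removes repeated documents before training.",
]
print(curate(corpus))
```

The pairwise check in `fuzzy_dedup` is quadratic in the number of documents, which is why deduplication dominates at trillion-token scale. Large-scale fuzzy deduplication usually buckets MinHash signatures with locality-sensitive hashing (LSH) so that only candidate pairs are compared.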

The processing speedup was immediate and dramatic. Curation pipelines that previously ran for 24 hours completed in 3.4 hours on the 8x H100 configuration, a 7x improvement in throughput. Energy and compute costs dropped by up to 10x compared to CPU-based approaches. The curated 100-billion-token Korean dataset produced a measurable 5% accuracy improvement on Korean language tasks, validating the quality of GPU-accelerated curation against the brute-force CPU alternative. Co-founder Jason Park summarized the bottleneck that NeMo Curator removed: “Deduplication is one of the most time-consuming processes when handling very large datasets. The time saved from NeMo Curator GPU acceleration was the most significant benefit.”
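The headline speedup can be checked with simple arithmetic. The sketch below uses only the figures reported in this case study (24 hours on CPU, 3.4 hours on 8x H100) and shows that "7x" is the rounded 24 / 3.4 ratio. The GPU-hour total is a derived quantity for illustration, not a figure reported by Trillion Labs.

```python
# Figures reported in this case study.
CPU_HOURS = 24.0      # CPU pipeline wall-clock time per run
GPU_WALL_HOURS = 3.4  # NeMo Curator wall-clock time per run on 8x H100
NUM_GPUS = 8

def speedup(before_hours: float, after_hours: float) -> float:
    """Wall-clock speedup: how many times faster the new run finishes."""
    return before_hours / after_hours

ratio = speedup(CPU_HOURS, GPU_WALL_HOURS)
gpu_hours = NUM_GPUS * GPU_WALL_HOURS  # derived, not reported: total GPU time per run

print(f"speedup: {ratio:.2f}x")               # speedup: 7.06x
print(f"GPU-hours per run: {gpu_hours:.1f}")  # GPU-hours per run: 27.2
```

The up-to-10x cost figure depends on instance pricing and CPU cluster size, which the case study does not give, so it cannot be reproduced from these numbers alone.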

For Trillion Labs, the operational gain extends beyond raw speed. Faster iteration cycles mean faster experiments, faster model releases, and a faster path to competitive Korean LLMs that Korean public sector institutions can adopt without routing sensitive data through foreign infrastructure. The company is continuing to scale its sovereign AI work, with NeMo Curator’s GPU pipeline as the foundation for processing the next generation of Korean training datasets.

Similar Cases

Allspice
20% → 97%
ingredient matching accuracy

Allspice, a food technology startup building a kitchen operating system for consumers and recipe publishers, deployed Pinecone’s vector database to solve the inherent messiness of ingredient data that traditional text search could not handle. The implementation raised ingredient matching accuracy from roughly 20% to 97%, enabling the launch of recipe importing as a core product feature and expanding into a platform-wide semantic layer for search, recommendations, and conversational AI.

Technology: text-embedding-3-large, Pinecone

Sommo
500–800
additional leads generated monthly

Sommo built an AI-powered SRS generator in Make in a single day, generating 500–800 additional leads per month and achieving a 5x increase in active website users.

Technology: Make, OpenAI

Confluent
15,000+
hours saved monthly

Confluent, a data streaming platform company with 2,000+ employees and 4,000+ customers, deployed Glean to solve the knowledge fragmentation that came with rapid growth from 250 to 2,000+ employees across 20+ systems. Glean indexed the company's full tool stack — Slack, Salesforce, Confluence, and more — enabling instant knowledge retrieval across all teams. The result: 15,000+ hours saved monthly, a 13% increase in support team satisfaction, and over 70% employee adoption.

Technology: Glean

Headstart
90–97%
code written by Claude

Headstart, an AI-native software studio, uses Claude 3.5 Sonnet to write 90–97% of client code, compressing enterprise software project timelines from months to weeks and delivering 10–100x development speed.

Technology: Claude 3.5 Sonnet

Motive
2,000+
AI agents deployed

Motive, an AI platform for physical operations serving nearly 100,000 customers, deployed Glean across its workforce to democratize enterprise AI through unified search and agentic workflows. The company has deployed over 2,000 AI agents, cut account planning time by 75%, and reports thousands of hours saved per week across teams.

Technology: Glean, Glean Agent Builder

Nextdoor
2–3x
engineering productivity improvement

Nextdoor, the neighborhood social network, deployed Glean as a unified Work AI layer embedded directly into the tools employees already use. Rather than mandating adoption, the team built a self-reinforcing learning loop of Slack channels, live office hours, and quick-win storytelling that turned early experimentation into company-wide AI habits — with engineering productivity gains of 2–3x and RevOps workflows shrinking from hours to minutes.

Technology: Glean

ASAPP
91%
first-call resolution rate

ASAPP is an AI-native customer service platform that orchestrates large language models to automate contact center interactions for enterprise clients. By deploying Anthropic’s Claude through Amazon Bedrock, ASAPP eliminated its homegrown PII redaction layer and reduced call escalations by up to 40%, while helping clients achieve a 91% first-call resolution rate. The platform now automates more than 90% of contact center interactions, with human agents freed to handle three times the volume of complex cases.

Technology: Amazon Bedrock, Claude

Draftwise
30%
improvement in search result quality

Draftwise, an AI-powered contract drafting and negotiation platform, built its Smart Draft product on Cohere’s Command, Embed, and Rerank models to enable semantic search and AI-generated contract language grounded in clients’ own document libraries. The system delivered a 30% improvement in search result quality and tripled API usage within a single quarter.

Technology: Cohere Command, Cohere Rerank