How Trillion Labs Cuts LLM Data Curation Time 7x with NVIDIA NeMo Curator
Trillion Labs, a Korean AI startup building sovereign LLMs for the Korean language, deployed NVIDIA NeMo Curator to accelerate data curation across 2 trillion tokens. GPU-accelerated processing on 8x H100s cut processing time from 24 hours to 3.4 hours, a 7x improvement, and reduced compute costs by up to 10x compared to CPU pipelines, while delivering a 5% accuracy boost for Korean language models.
Impact
7x data processing speedup
Up to 10x compute cost reduction vs. CPU
5% Korean language accuracy improvement
Challenge
Trillion Labs’ CPU-based data curation pipeline for Korean LLM training took 24 hours per run on datasets exceeding 2 trillion tokens, creating iteration bottlenecks that slowed model development and made rapid experimentation on high-quality Korean language data practically impossible.
Solution
Trillion Labs deployed NVIDIA NeMo Curator on 8x H100 GPUs with Dask for parallel processing, GPU-accelerating deduplication, quality filtering, and data shuffling across 100 billion curated Korean tokens. This cut processing time from 24 hours to 3.4 hours and reduced compute costs by up to 10x.
Tools & Technologies
NVIDIA NeMo Curator, NVIDIA NeMo, NVIDIA H100 GPUs, Dask
What Leaders Say
“Deduplication is one of the most time-consuming processes when handling very large datasets. The time saved from NeMo Curator GPU acceleration was the most significant benefit.”
Jason Park, Co-founder, Trillion Labs
Full Story
Trillion Labs is a Korean AI startup dedicated to building sovereign large language models for the Korean language. Its mission is to close the gap between English-dominant foundation models and the needs of Korean public sector organizations and enterprises, which require LLMs that understand Korean linguistic nuance, government terminology, and cultural context. Building high-quality Korean LLMs at scale requires curation pipelines capable of processing datasets exceeding 2 trillion tokens — volumes that expose every inefficiency in traditional CPU-based workflows.
The core problem was data pipeline throughput. Deduplication and shuffling operations on datasets at this scale took 24 hours per run on CPU infrastructure. Iteration cycles became prohibitively slow: every experiment in model architecture or data composition meant waiting nearly a full day just for preprocessing to finish. This bottleneck made it impossible to move quickly on model development, creating a compounding disadvantage relative to resource-rich competitors working on high-resource language models.
Trillion Labs deployed NVIDIA NeMo Curator, a GPU-accelerated data curation library, running on a cluster of 8x H100 GPUs with Dask for parallel and distributed processing. NeMo Curator’s GPU acceleration was applied to the most compute-intensive steps (exact and fuzzy deduplication, quality filtering, and data shuffling) across a curated dataset of 100 billion high-quality Korean tokens. NVIDIA NeMo provided the broader framework for model training and evaluation downstream.
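To make these curation steps concrete, here is a minimal CPU-only sketch in plain Python. It is not NeMo Curator's API and not Trillion Labs' pipeline: the function names, the 3-word shingle size, the 0.6 similarity threshold, and the 4-word minimum length are all illustrative assumptions. The fuzzy step compares every pair of documents directly, which only works on tiny inputs; large-scale systems typically approximate the same Jaccard similarity with MinHash and locality-sensitive hashing.

```python
import hashlib
import random


def normalize(text):
    # Lowercase and collapse whitespace so trivially different copies match.
    return " ".join(text.lower().split())


def exact_dedup(docs):
    # Keep the first document seen for each hash of its normalized content.
    seen, kept = set(), []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept


def shingles(text, n=3):
    # Set of overlapping n-word windows used to compare two documents.
    words = normalize(text).split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}


def jaccard(a, b):
    # Overlap of two shingle sets, from 0.0 (disjoint) to 1.0 (identical).
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def fuzzy_dedup(docs, threshold=0.6):
    # Toy quadratic version: drop a document that is too similar to a kept one.
    kept, kept_shingles = [], []
    for doc in docs:
        s = shingles(doc)
        if all(jaccard(s, t) < threshold for t in kept_shingles):
            kept.append(doc)
            kept_shingles.append(s)
    return kept


def quality_filter(docs, min_words=4):
    # Hypothetical heuristic: discard documents shorter than min_words.
    return [d for d in docs if len(d.split()) >= min_words]


def curate(docs, seed=0):
    # Exact dedup, fuzzy dedup, quality filter, then a reproducible shuffle.
    docs = exact_dedup(docs)
    docs = fuzzy_dedup(docs)
    docs = quality_filter(docs)
    random.Random(seed).shuffle(docs)
    return docs


if __name__ == "__main__":
    sample = [
        "the quick brown fox jumps over the lazy dog",
        "The quick  brown fox jumps over the lazy dog",
        "the quick brown fox jumps over the lazy cat today",
        "too short",
        "an entirely different sentence about data curation pipelines",
    ]
    for doc in curate(sample):
        print(doc)
```

On the sample input, the second document is removed as an exact duplicate after normalization, the third as a near duplicate of the first, and "too short" by the length filter. The cost of steps like these grows quickly with dataset size, which is why they are the ones worth moving to GPUs.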
The processing speedup was immediate and dramatic. Curation pipelines that previously ran for 24 hours completed in 3.4 hours on the 8x H100 configuration, a roughly 7x improvement in throughput. Energy and compute costs dropped by up to 10x compared to CPU-based approaches. The curated 100-billion-token Korean dataset produced a measurable 5% accuracy improvement on Korean language tasks, validating the quality of GPU-accelerated curation against the brute-force CPU alternative. Co-founder Jason Park summarized the bottleneck that NeMo Curator removed: “Deduplication is one of the most time-consuming processes when handling very large datasets. The time saved from NeMo Curator GPU acceleration was the most significant benefit.”
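The 7x figure follows directly from the two reported run times. A short worked check is sketched below; only the 24-hour and 3.4-hour values come from the case study, and the variable names are illustrative.

```python
def speedup(baseline_hours, accelerated_hours):
    # Speedup factor: how many times faster the accelerated run is.
    return baseline_hours / accelerated_hours


cpu_hours = 24.0  # reported CPU pipeline run time
gpu_hours = 3.4   # reported run time on the 8x H100 configuration

factor = speedup(cpu_hours, gpu_hours)      # roughly 7.06x
hours_saved = cpu_hours - gpu_hours         # wall-clock hours saved per run
fraction_saved = 1 - gpu_hours / cpu_hours  # share of the original run time removed

print(f"Speedup: {factor:.2f}x")
print(f"Hours saved per run: {hours_saved:.1f}")
print(f"Run time reduced by: {fraction_saved:.0%}")
```

The same ratio also gives the iteration rate: where the CPU pipeline allowed one curation run per day, the GPU pipeline fits about seven into the same 24 hours.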
For Trillion Labs, the operational gain extends beyond raw speed. Faster iteration cycles mean faster experiments, faster model releases, and a faster path to competitive Korean LLMs that Korean public sector institutions can adopt without routing sensitive data through foreign infrastructure. The company is continuing to scale its sovereign AI work, with NeMo Curator’s GPU pipeline as the foundation for processing the next generation of Korean training datasets.