What AI tools did Trillion Labs use?

Trillion Labs used NVIDIA NeMo and NVIDIA NeMo Curator in this implementation, which falls within the Technology sector.

What business function does this AI use case address?

This use case addresses Software Engineering, specifically the data curation pipeline that prepares training data for Korean LLMs.


How Trillion Labs Cuts LLM Data Curation Time 7x with NVIDIA NeMo Curator

Trillion Labs, a Korean AI startup building sovereign LLMs for the Korean language, deployed NVIDIA NeMo Curator to accelerate data curation across 2 trillion tokens. GPU-accelerated processing on 8x H100s cut processing time from 24 hours to 3.4 hours — a 7x improvement — and reduced compute costs up to 10x compared to CPU pipelines, while delivering a 5% accuracy boost for Korean language models.

Outcomes

7x data processing speedup

Up to 10x compute cost reduction vs. CPU

5% Korean language accuracy improvement
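The 7x figure follows directly from the two wall-clock times quoted in the summary. The short check below is only a sanity calculation on those published numbers (24 hours and 3.4 hours); the variable names are illustrative, and no cost or accuracy data is derived here because the source gives no underlying prices or benchmark scores.

```python
# Wall-clock times quoted in the case study.
cpu_hours = 24.0   # CPU-based curation run
gpu_hours = 3.4    # NeMo Curator run on 8x H100

# Speedup is the ratio of old time to new time.
speedup = cpu_hours / gpu_hours

# Share of wall-clock time eliminated per run.
time_saved_pct = (1 - gpu_hours / cpu_hours) * 100

print(f"speedup: {speedup:.2f}x")                      # speedup: 7.06x
print(f"wall-clock time saved: {time_saved_pct:.1f}%")  # wall-clock time saved: 85.8%
```

In practical terms, a run that used to occupy a full day now fits roughly seven times into the same window, which is what makes repeated experimentation on the dataset feasible.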

Tools & Technologies

NVIDIA NeMo

Open-source framework for training, fine-tuning, and deploying large language models at scale.

NVIDIA NeMo Curator

GPU-accelerated data curation library for deduplication, filtering, and preprocessing LLM training datasets.

AI Categories

ML Platform

Challenge

Trillion Labs’ CPU-based data curation pipeline for Korean LLM training took 24 hours per run on datasets exceeding 2 trillion tokens, creating iteration bottlenecks that slowed model development and made rapid experimentation on high-quality Korean language data practically impossible.

Solution

Trillion Labs deployed NVIDIA NeMo Curator on 8x H100 GPUs with Dask for parallel processing, GPU-accelerating deduplication, quality filtering, and data shuffling across 100 billion curated Korean tokens. This cut processing time from 24 hours to 3.4 hours and reduced compute costs by up to 10x.
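To make the three curation stages concrete, here is a minimal single-process sketch of deduplication, quality filtering, and shuffling in plain Python. It is not the NeMo Curator API and not Trillion Labs' pipeline: the function names (`exact_dedup`, `hangul_ratio`, `quality_filter`, `curate`), the thresholds, and the sample documents are all assumptions chosen for illustration. The production system described above runs these stages on GPUs with Dask and likely uses more sophisticated methods than exact-match hashing and a character-ratio heuristic.

```python
import hashlib
import random


def exact_dedup(docs):
    """Keep the first copy of each document, comparing whitespace-normalized, lowercased text."""
    seen = set()
    unique = []
    for doc in docs:
        normalized = " ".join(doc.split()).lower()
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


def hangul_ratio(text):
    """Fraction of non-whitespace characters that are Hangul syllables (U+AC00..U+D7A3)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hangul = sum(1 for c in chars if "\uac00" <= c <= "\ud7a3")
    return hangul / len(chars)


def quality_filter(docs, min_chars=10, min_hangul=0.5):
    """Illustrative heuristic: drop very short documents and documents that are mostly non-Korean."""
    return [d for d in docs if len(d) >= min_chars and hangul_ratio(d) >= min_hangul]


def curate(docs, seed=0):
    """Deduplicate, filter, then shuffle with a fixed seed so runs are reproducible."""
    kept = quality_filter(exact_dedup(docs))
    random.Random(seed).shuffle(kept)
    return kept


# Hypothetical sample corpus.
docs = [
    "안녕하세요. 오늘 날씨가 정말 좋습니다.",
    "안녕하세요.  오늘 날씨가 정말 좋습니다.",   # duplicate after whitespace normalization
    "Buy now!!! Click here for a free prize",  # no Hangul, filtered out
    "짧음",                                     # too short, filtered out
    "한국어 데이터 정제는 모델 품질에 중요합니다.",
]

curated = curate(docs)
print(len(curated), "of", len(docs), "documents kept")
```

Each stage here is a pass over an in-memory list; at the scale the case study describes, the same logical steps have to be partitioned across workers, which is the role Dask and the GPUs play in the deployed system.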

Full Story

Trillion Labs is a Korean AI startup dedicated to building sovereign large language models for the Korean language. Its mission is to close the gap between English-dominant foundation models and the needs of Korean public sector organizations and enterprises, which require LLMs that understand Korean linguistic nuance, government terminology, and cultural context. Building high-quality Korean LLMs at scale requires curation pipelines capable of processing datasets exceeding 2 trillion tokens — volumes that expose every inefficiency in traditional CPU-based workflows.


Source

NVIDIA

May 2026

