How GitHub Uses Elasticsearch to Bring Semantic Search to 395 Million Code Repositories
GitHub, the world’s largest code host serving 180 million developers, deployed Elasticsearch on Elastic Cloud to add semantic search across more than 395 million repositories and billions of documents. The system handles natural-language queries from both human developers and AI agents, reducing zero-hit search results and improving click-through rates. A team of five to six engineers runs the entire search platform at that scale, with BBQ vector compression shrinking stored vectors 32x.
Impact
395 million+
Code repositories searchable with semantic search
32x
Vector compression ratio with BBQ
5-6
Engineers running the search platform
Challenge
GitHub’s keyword-based search failed to handle the natural-language queries developers increasingly use, and broke entirely for AI agents and assistants that interact with GitHub data as first-class clients, leaving users with zero-hit results.
Solution
GitHub deployed Elasticsearch on Elastic Cloud to power semantic search across billions of documents. Vector embeddings and BBQ compression let it handle natural-language queries from humans and AI systems at scale, and Kibana helps the engineering team iterate quickly.
What Leaders Say
“The fact that we can run a search platform used by hundreds of millions of users with a team of about five or six engineers is mind-blowing.”
“With Elastic and semantic search, our users can take full advantage of the largest code resource in the world to develop the future together.”
Full Story
GitHub is where the world builds software. More than 180 million developers at 4 million organizations — including 90% of the Fortune 100 — rely on it to create, store, and share code. That means GitHub manages more than 395 million repositories and billions of documents covering source code, patch notes, discussions, and wikis.
Search is the primary way users navigate this ecosystem, but developer search behaviour was changing. Keyword search worked when users knew exactly what they were looking for — a function name, a repo identifier. It struggled with the natural-language questions developers increasingly ask, and it failed entirely when AI agents and assistants began querying GitHub data as first-class clients.
GitHub became an early adopter of Elastic for semantic search. Using Elasticsearch on Elastic Cloud, the team generates embeddings for content in the issues system and stores them in Elasticsearch. When users search, their queries are embedded and compared against the stored vectors, so results are ranked by semantic similarity rather than keyword matching. GitHub adopted BBQ (Better Binary Quantization) as soon as it became production-ready. BBQ compresses high-dimensional vectors 32x, which sharply reduces memory footprint and query latency at scale, while oversampling and rescoring maintain retrieval quality.
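The interplay of compression, oversampling, and rescoring described above can be illustrated with a small sketch. This is not Elastic's BBQ algorithm and not GitHub's code: it assumes plain sign-bit quantization (one bit per dimension), 32-bit float embeddings, and random vectors in place of real embeddings. The function names, corpus, and parameters are all hypothetical.

```python
import math
import random

FLOAT_BITS = 32  # assumption: full-precision embeddings use 32-bit floats


def sign_bits(vec):
    """Quantize a vector to one bit per dimension (bit set if component > 0)."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Number of differing bits between two quantized codes."""
    return bin(a ^ b).count("1")


def cosine(a, b):
    """Full-precision cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compression_ratio(dims):
    """Bits per vector before quantization divided by bits after (1 bit/dim)."""
    return (dims * FLOAT_BITS) / (dims * 1)


def search(query, vectors, codes, k=3, oversample=3.0):
    """Two-stage search: coarse ranking on codes, then exact rescoring."""
    q_code = sign_bits(query)
    # Oversampling: fetch more candidates than k from the compressed index.
    n_candidates = min(len(vectors), math.ceil(k * oversample))
    coarse = sorted(range(len(codes)),
                    key=lambda i: hamming(q_code, codes[i]))[:n_candidates]
    # Rescoring: re-rank only those candidates with full-precision vectors.
    rescored = sorted(coarse, key=lambda i: cosine(query, vectors[i]),
                      reverse=True)
    return rescored[:k]


# Toy corpus standing in for stored embeddings (hypothetical data).
random.seed(0)
DIMS = 64
corpus = [[random.gauss(0, 1) for _ in range(DIMS)] for _ in range(200)]
codes = [sign_bits(v) for v in corpus]

# A query that is a slightly perturbed copy of document 7.
query = [x + random.gauss(0, 0.1) for x in corpus[7]]
top = search(query, corpus, codes, k=3, oversample=3.0)
print("top hits:", top, "compression:", compression_ratio(DIMS))
```

The coarse pass touches only the compact codes, which is where the memory and latency savings come from. The rescoring pass restores ranking quality by using full-precision vectors for a handful of candidates. Going from 32 bits to 1 bit per dimension gives the 32x figure. A production scheme may also store extra per-vector correction data, which this sketch leaves out.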
The impact was measurable: zero-hit search results dropped significantly and click-through rates improved, confirming users were finding relevant results faster. The Kibana Dev Tools Console made it easier for GitHub’s software engineers — most of whom are not search specialists — to explore, test, and fine-tune queries before hitting production. The team built a repeatable onboarding pipeline so internal teams can adopt semantic search themselves with a single click.
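As an illustration of the kind of request an engineer might try in the Dev Tools Console, the sketch below builds an index mapping and a kNN query body as Python dictionaries and prints them as JSON. It assumes a recent Elasticsearch version in which the `bbq_hnsw` index option and the `rescore_vector` search parameter are available. The field names, dimension count, and parameter values are hypothetical and are not taken from GitHub's setup.

```python
import json

# Hypothetical mapping: a text field plus a BBQ-quantized dense vector field.
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": 384,            # assumed embedding size
                "index": True,
                "similarity": "cosine",
                "index_options": {"type": "bbq_hnsw"},
            },
        }
    }
}


def knn_query(query_vector, k=10, oversample=3.0):
    """Build a kNN search body that oversamples and rescores quantized hits."""
    return {
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": 10 * k,   # assumed candidate pool size
            "rescore_vector": {"oversample": oversample},
        }
    }


body = knn_query([0.0] * 384, k=5)
print(json.dumps(mapping, indent=2))
print(json.dumps(body)[:120], "...")
```

The printed JSON can be pasted into a console request and adjusted there, for example by varying `oversample` to see how recall and latency trade off. Check the Elasticsearch documentation for your version before relying on these parameter names.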
The most striking result is operational efficiency: a team of five to six engineers runs a search platform used by hundreds of millions of users. GitHub is now working with Elastic on user behaviour insights (UBI) to quantify how search adjustments improve outcomes, pushing the platform toward continuous, data-driven improvement.