Technology · Product Development

How GitHub Uses Elasticsearch to Bring Semantic Search to 395 Million Code Repositories

GitHub, the world’s largest code host serving more than 180 million developers, deployed Elasticsearch on Elastic Cloud to add semantic search across more than 395 million repositories and billions of documents. The system handles natural-language queries from both human developers and AI agents, sharply reducing zero-hit search results and improving click-through rates. A team of five to six engineers runs the entire search platform at that scale, while BBQ compresses vectors 32x to cut memory footprint and infrastructure costs.

Impact

395 million+

Code repositories searchable with semantic search

32x

Vector compression ratio with BBQ

5-6

Engineers running the search platform

Challenge

GitHub’s keyword-based search struggled with the natural-language queries developers increasingly use and broke down entirely for AI agents and assistants that query GitHub data as first-class clients, leaving users with zero-hit results.

Solution

GitHub deployed Elasticsearch on Elastic Cloud to power semantic search across billions of documents. Vector embeddings and BBQ compression let the platform handle natural-language queries from humans and AI systems at scale, and Kibana lets the engineering team iterate quickly.

What Leaders Say

The fact that we can run a search platform used by hundreds of millions of users with a team of about five or six engineers is mind-blowing.

David Tippett, Senior Search Engineer, GitHub

With Elastic and semantic search, our users can take full advantage of the largest code resource in the world to develop the future together.

David Tippett, Senior Search Engineer, GitHub

Full Story

GitHub is where the world builds software. More than 180 million developers at 4 million organizations — including 90% of the Fortune 100 — rely on it to create, store, and share code. That means GitHub manages more than 395 million repositories and billions of documents covering source code, patch notes, discussions, and wikis.

Search is the primary way users navigate this ecosystem, but developer search behaviour was changing. Keyword search worked when users knew exactly what they were looking for — a function name, a repo identifier. It struggled with the natural-language questions developers increasingly ask, and it failed entirely when AI agents and assistants began querying GitHub data as first-class clients.

GitHub became an early adopter of Elastic for semantic search. Using Elasticsearch on Elastic Cloud, the team generates embeddings for content in the issues system and stores them in Elasticsearch. At query time, each search query is likewise converted into an embedding and compared against the stored vectors, so results are ranked by semantic similarity rather than keyword matching. GitHub adopted BBQ (Better Binary Quantization) as soon as it became production-ready. BBQ compresses high-dimensional vectors 32x, which sharply reduces memory footprint and query latency at scale, while oversampling and rescoring preserve retrieval quality.
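To make the two-stage idea concrete, here is a minimal, self-contained Python sketch of binary quantization with oversampling and rescoring. It is a simplified stand-in, not Elastic's BBQ implementation: it keeps only the sign of each dimension as one bit, whereas BBQ is more sophisticated (for example, it also stores small per-vector correction values). The function names, the `oversample` factor, and the random 64-dimensional vectors are illustrative assumptions, not GitHub's configuration. The `memory_ratio` helper shows where the 32x figure comes from: a float32 value takes 32 bits per dimension, a binary code takes one.

```python
import math
import random


def quantize(vec):
    """Keep one bit per dimension: bit i is set when vec[i] > 0 (sign quantization)."""
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Count differing bits between two packed binary codes."""
    return bin(a ^ b).count("1")


def cosine(u, v):
    """Full-precision cosine similarity, used only for rescoring."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def search(query, vectors, codes, k, oversample=3.0):
    """Two-stage kNN: rank by Hamming distance on bits, then rescore the survivors."""
    # Stage 1: fetch k * oversample candidates using only the compact codes.
    num_candidates = min(len(codes), math.ceil(k * oversample))
    q_code = quantize(query)
    candidates = sorted(range(len(codes)), key=lambda i: hamming(q_code, codes[i]))
    candidates = candidates[:num_candidates]
    # Stage 2: re-rank the candidates with the original float vectors.
    candidates.sort(key=lambda i: cosine(query, vectors[i]), reverse=True)
    return candidates[:k]


def memory_ratio(dims):
    """float32 bytes per vector divided by 1-bit-per-dimension bytes per vector."""
    return (dims * 4) / math.ceil(dims / 8)


# Toy corpus: 1,000 random 64-dimensional "embeddings" (hypothetical data).
rng = random.Random(0)
dims = 64
vectors = [[rng.gauss(0, 1) for _ in range(dims)] for _ in range(1000)]
codes = [quantize(v) for v in vectors]

# A query that is a slightly perturbed copy of document 42.
query = [x + rng.gauss(0, 0.1) for x in vectors[42]]

print(search(query, vectors, codes, k=5))
print(memory_ratio(1024))  # → 32.0
```

In this sketch, rescoring reads the original float vectors, so the compact codes save work and memory in the candidate-generation stage rather than replacing the raw vectors entirely. Raising `oversample` trades a little extra rescoring work for better recall when the 1-bit ranking misorders near neighbours.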

The impact was measurable: zero-hit search results dropped significantly and click-through rates improved, confirming users were finding relevant results faster. The Kibana Dev Tools Console made it easier for GitHub’s software engineers — most of whom are not search specialists — to explore, test, and fine-tune queries before hitting production. The team built a repeatable onboarding pipeline so internal teams can adopt semantic search themselves with a single click.
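The two metrics cited above can be pinned down precisely. The following sketch assumes a hypothetical log with one record per search (the `SearchEvent` fields are invented for illustration, not GitHub's schema) and defines click-through rate as the share of searches with at least one click; some teams instead divide clicks by result impressions. The two small logs are toy inputs, not GitHub data.

```python
from dataclasses import dataclass


@dataclass
class SearchEvent:
    """One logged search (hypothetical schema)."""
    query: str
    num_results: int
    clicked: bool  # did the user click at least one result?


def zero_hit_rate(events):
    """Fraction of searches that returned no results at all."""
    if not events:
        return 0.0
    return sum(1 for e in events if e.num_results == 0) / len(events)


def click_through_rate(events):
    """Fraction of searches that led to at least one click."""
    if not events:
        return 0.0
    return sum(1 for e in events if e.clicked) / len(events)


# Toy logs for the same four queries under two search configurations.
baseline = [
    SearchEvent("how to parse yaml in go", 0, False),
    SearchEvent("json.Unmarshal", 12, True),
    SearchEvent("retry http request with backoff", 0, False),
    SearchEvent("actions cache", 30, False),
]
candidate = [
    SearchEvent("how to parse yaml in go", 8, True),
    SearchEvent("json.Unmarshal", 12, True),
    SearchEvent("retry http request with backoff", 5, False),
    SearchEvent("actions cache", 30, True),
]

for name, log in [("baseline", baseline), ("candidate", candidate)]:
    print(name, zero_hit_rate(log), click_through_rate(log))
```

Comparing the two rates across configurations gives a simple before/after signal; a real analysis would also need enough traffic to separate genuine changes from noise.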

The most striking result is operational efficiency: a team of five to six engineers runs a search platform used by hundreds of millions of users. GitHub is now working with Elastic on User Behavior Insights (UBI) to quantify how search adjustments improve outcomes, pushing the platform toward continuous, data-driven improvement.

Similar Cases

Cypris
Weeks → 15 minutes
research report generation time

Cypris is an AI-powered R&D intelligence platform that enables teams to analyze over 500 million technical and market data points—patents, scientific literature, funding data, and news—in seconds. The company built its core RAG architecture on Elasticsearch for vector search and semantic retrieval, replacing a problematic prior search provider. The platform now generates detailed research reports in 15 minutes rather than weeks, supports 30% quarterly enterprise customer growth, and manages more than 10 terabytes of indexed data without scalability constraints.

Technology · Elastic Cloud · Elasticsearch
Lusha
300%
increase in outbound leads

Lusha is a B2B sales intelligence platform with 1.5 million users and a database of over 200 million business contacts. By deploying Elasticsearch as both a full-text search engine and a vector database for AI-powered lead recommendations, Lusha helps customers generate 300% more leads, achieve conversion rates up to 10x higher, and realize return on investment of up to 1,000%.

Technology · Elasticsearch
Apna
20%
increase in employers paying for premium access

Apna, India’s largest jobs and professional networking platform with 50 million registered users and 600,000 employers, built its candidate search and AI job matching infrastructure on Elasticsearch running on Elastic Cloud on Google Cloud. Semantic search capabilities allow employers to find candidates by intent—not just keywords—while AI algorithms analyze candidate profiles to surface the most relevant matches. The result: a 20% increase in employers paying for premium access, 20% higher platform team productivity, and a 50% improvement in employee productivity.

Technology · Elasticsearch
WP Engine
~5 milliseconds
search query response time

WP Engine, the leading WordPress hosting platform serving more than 1.5 million users across 200,000 websites in 150+ countries, deployed Elastic’s Search AI Platform alongside Google Cloud Vertex AI and Gemini to build Smart Search AI and enable retrieval-augmented generation (RAG) capabilities for its customers. The integration allows WP Engine to deliver natural language search, context-aware product recommendations, and AI-powered chatbots to website owners without requiring them to stitch together multiple vendors. Response times dropped to as low as five milliseconds, and the platform handled traffic spikes from hundreds of thousands to tens of millions of queries per minute with zero downtime.

Technology · Elasticsearch · Google Vertex AI
Contextual AI
90%+
RAG accuracy achieved in production

Contextual AI is an enterprise AI platform company that specializes in production-ready retrieval-augmented generation (RAG) systems for complex knowledge tasks. The company built its context engineering platform on Elasticsearch, using hybrid search that combines BM25 and vector search to power accurate, scalable AI agents for enterprise customers. With this foundation, Contextual AI’s agents achieve over 90% accuracy on demanding production tasks, well above the 65–75% range typical of traditional RAG approaches.

Technology · Elasticsearch
Docusign
Under 1 minute
document retrieval time

Docusign, the Intelligent Agreement Management (IAM) platform serving 1.6 million customers and over 1 billion users across 180 countries, built its AI-powered Navigator repository on Elasticsearch to index and search billions of agreements in real time. The deployment enables customers to find specific documents in under a minute—tasks that previously took hours—while handling millions of new agreements added to the platform each day.

Technology · Elasticsearch · Microsoft Azure
Flockx
10x
search response time improvement

Flockx is a social discovery startup that uses AI agents to help people find events, local communities, and like-minded individuals in their area. The company built its core platform on Elasticsearch, using semantic search, RAG, and Elastic Observability to power personalized recommendations and real-time operations. Search response times dropped from hundreds of milliseconds to tens of milliseconds, a 10x improvement, while infrastructure deployment time shrank from months to days.

Technology · Google Cloud · Elastic AI Assistant
Fiber AI
$1M+ ARR
annual recurring revenue within six months of launch

Fiber AI is a Y Combinator-backed startup that automates outbound sales prospecting, drawing on a database of 850 million LinkedIn profiles, 40 million companies, and 13 million job postings. The company built its search infrastructure on Elasticsearch, which now searches across a billion rows in under one second. Within six months of launch, Fiber AI reached $1M in annual recurring revenue while operating with a team of eight people.

Technology · Elasticsearch
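The Contextual AI summary mentions hybrid search that combines BM25 and vector search, but does not say how the two result lists are merged. One widely used method, which Elasticsearch also supports, is reciprocal rank fusion (RRF): each document scores the sum of 1 / (k + rank) over every list it appears in. The Python sketch below assumes that method and uses k = 60, the constant common in the RRF literature; the document IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document IDs with reciprocal rank fusion.

    Each document earns 1 / (k + rank) from every list it appears in
    (ranks start at 1); documents are returned by descending total score.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical top-3 results from a BM25 query and a vector (kNN) query.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
# → ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Because RRF uses only ranks, it avoids having to normalize BM25 scores and vector similarities onto a common scale.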