How GitHub uses Elasticsearch to bring semantic search to 395 million code repositories

GitHub, the world's largest code host with 180 million developers, deployed Elasticsearch on Elastic Cloud to add semantic search across more than 395 million repositories and billions of documents. The system handles natural-language queries from human developers and AI agents, reducing zero-hit results and improving click-through rates. A team of five or six engineers runs the entire search platform at that scale.

Impact

395 million+

Code repositories with semantic search

32x

Vector compression ratio with BBQ

5-6

Engineers running the search platform

Challenge

GitHub's keyword search could not handle the natural-language queries that developers increasingly ask, and it failed entirely with the AI agents that interact with GitHub data as first-class clients.

Solution

Elasticsearch on Elastic Cloud for semantic search across billions of documents, using vector embeddings and BBQ compression to handle natural-language queries from humans and AI systems at scale, with Kibana for fast iteration.

What leaders say

The fact that we can run a search platform used by hundreds of millions of users with a team of five or six engineers is incredible.

David Tippett, Senior Search Engineer, GitHub

With Elastic and semantic search, our users can make the most of the world's largest code resource to build the future together.

David Tippett, Senior Search Engineer, GitHub

Full story

GitHub is where the world builds software. More than 180 million developers at 4 million organizations — including 90% of the Fortune 100 — rely on it to create, store, and share code. That means GitHub manages more than 395 million repositories and billions of documents covering source code, patch notes, discussions, and wikis.

Search is the primary way users navigate this ecosystem, but developer search behaviour was changing. Keyword search worked when users knew exactly what they were looking for — a function name, a repo identifier. It struggled with the natural-language questions developers increasingly ask, and it failed entirely when AI agents and assistants began querying GitHub data as first-class clients.

GitHub became an early adopter of Elastic for semantic search. Using Elasticsearch on Elastic Cloud, the team generates embeddings for content in the issues system and stores them in Elasticsearch. When users search, their queries are embedded and compared against the stored vectors, and results are returned based on semantic similarity rather than keyword matching. GitHub adopted BBQ (Better Binary Quantization) as soon as it became production-ready. By compressing high-dimensional vectors 32x, BBQ dramatically reduces memory footprint and query latency at scale, while oversampling and rescoring maintain retrieval quality.
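
The general idea behind binary quantization with oversampling and rescoring can be illustrated with a small, self-contained Python sketch. This is not Elastic's BBQ implementation or GitHub's code: it uses plain sign-based 1-bit quantization, and every function name and parameter below (`binarize`, `search`, `oversample`, and so on) is hypothetical. It shows where a 32x figure can come from (a 32-bit float replaced by 1 bit per dimension, ignoring any per-vector metadata) and how a cheap first pass over compressed codes can be followed by a full-precision rescoring pass.

```python
import math
import random


def binarize(vec):
    """Pack the sign of each dimension into an int: 1 bit per dimension."""
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def cosine(a, b):
    """Full-precision cosine similarity between two float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compression_ratio(bits_per_float=32, bits_per_code=1):
    """Storage ratio per dimension, ignoring any per-vector metadata."""
    return bits_per_float / bits_per_code


def search(query, vectors, codes, k=3, oversample=4):
    """Two-stage search: coarse pass on 1-bit codes, then exact rescoring."""
    q_code = binarize(query)
    # Stage 1: rank all documents by Hamming distance on the compressed
    # codes and keep k * oversample candidates instead of just k.
    n_candidates = min(len(codes), k * oversample)
    candidates = sorted(
        range(len(codes)), key=lambda i: hamming(q_code, codes[i])
    )[:n_candidates]
    # Stage 2: rescore only those candidates with full-precision vectors.
    rescored = sorted(
        candidates, key=lambda i: cosine(query, vectors[i]), reverse=True
    )
    return rescored[:k]


if __name__ == "__main__":
    random.seed(42)
    # Hypothetical corpus: 1,000 random 128-dimensional "embeddings".
    docs = [[random.gauss(0, 1) for _ in range(128)] for _ in range(1000)]
    doc_codes = [binarize(d) for d in docs]
    print("compression ratio:", compression_ratio())
    print("top hits:", search(docs[10], docs, doc_codes))
```

Oversampling trades a little extra work in the second stage for a better chance that the true nearest neighbours survive the lossy first stage. A production system would typically combine this with an approximate index rather than the linear scan used here.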

The impact was measurable: zero-hit search results dropped significantly and click-through rates improved, confirming users were finding relevant results faster. The Kibana Dev Tools Console made it easier for GitHub’s software engineers — most of whom are not search specialists — to explore, test, and fine-tune queries before hitting production. The team built a repeatable onboarding pipeline so internal teams can adopt semantic search themselves with a single click.

The most striking result is operational efficiency: a team of five to six engineers runs a search platform used by hundreds of millions of users. GitHub is now working with Elastic on user behaviour insights (UBI) to quantify how search adjustments improve outcomes, pushing the platform toward continuous, data-driven improvement.
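
The two search-quality signals mentioned in this story, zero-hit rate and click-through rate, can be computed from a search log along the following lines. This is a minimal sketch under stated assumptions: the `SearchEvent` record and its fields are hypothetical, not GitHub's or Elastic's schema, and click-through rate is defined here as the share of searches followed by at least one click, which is only one of several common definitions.

```python
from dataclasses import dataclass


@dataclass
class SearchEvent:
    """Hypothetical log record for a single search request."""
    query: str
    result_count: int
    clicked: bool


def zero_hit_rate(events):
    """Fraction of searches that returned no results."""
    if not events:
        return 0.0
    return sum(1 for e in events if e.result_count == 0) / len(events)


def click_through_rate(events):
    """Fraction of searches followed by at least one click."""
    if not events:
        return 0.0
    return sum(1 for e in events if e.clicked) / len(events)


if __name__ == "__main__":
    # Made-up events, for illustration only.
    log = [
        SearchEvent("how do I rotate a deploy key", 12, True),
        SearchEvent("flaky test on windows runner", 0, False),
        SearchEvent("parse_config", 3, False),
        SearchEvent("login fails after upgrade", 8, True),
    ]
    print("zero-hit rate:", zero_hit_rate(log))
    print("click-through rate:", click_through_rate(log))
```

Comparing these two numbers before and after a relevance change is one simple way to check whether the change helped users find results.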

Similar cases

Cypris
Weeks → 15 minutes
research report generation time

Cypris is an AI-powered R&D intelligence platform that enables teams to analyze over 500 million technical and market data points—patents, scientific literature, funding data, and news—in seconds. The company built its core RAG architecture on Elasticsearch for vector search and semantic retrieval, replacing a problematic prior search provider. The platform now generates detailed research reports in 15 minutes rather than weeks, supports 30% quarterly enterprise customer growth, and manages more than 10 terabytes of indexed data without scalability constraints.

Technology: Elastic Cloud, Elasticsearch

Lusha
300%
increase in outbound leads

Lusha is a B2B sales intelligence platform with 1.5 million users and a database of over 200 million business contacts. By deploying Elasticsearch as both a full-text search engine and a vector database for AI-powered lead recommendations, Lusha helps customers generate 300% more leads, achieve conversion rates up to 10x higher, and realize return on investment of up to 1,000%.

Technology: Elasticsearch

Apna
20%
increase in employers paying for premium access

Apna, India’s largest jobs and professional networking platform with 50 million registered users and 600,000 employers, built its candidate search and AI job matching infrastructure on Elasticsearch running on Elastic Cloud on Google Cloud. Semantic search capabilities allow employers to find candidates by intent—not just keywords—while AI algorithms analyze candidate profiles to surface the most relevant matches. The result: a 20% increase in employers paying for premium access, 20% higher platform team productivity, and a 50% improvement in employee productivity.

Technology: Elasticsearch

WP Engine
~5 milliseconds
search query response time

WP Engine, the leading WordPress hosting platform serving more than 1.5 million users across 200,000 websites in 150+ countries, deployed Elastic’s Search AI Platform alongside Google Cloud Vertex AI and Gemini to build Smart Search AI and enable retrieval-augmented generation (RAG) capabilities for its customers. The integration allows WP Engine to deliver natural language search, context-aware product recommendations, and AI-powered chatbots to website owners without requiring them to stitch together multiple vendors. Response times dropped to as low as five milliseconds, and the platform handled traffic spikes from hundreds of thousands to tens of millions of queries per minute with zero downtime.

Technology: Elasticsearch, Google Vertex AI

Contextual AI
90%+
RAG accuracy achieved in production

Contextual AI is an enterprise AI platform company that specializes in production-ready Retrieval Augmented Generation systems for complex knowledge tasks. The company built its context engineering platform on Elasticsearch, using hybrid search combining BM25 and vector search to power accurate, scalable AI agents for enterprise customers. With this foundation, Contextual AI’s agents achieve over 90% accuracy on demanding production tasks—well above the 65–75% range typical of traditional RAG approaches.

Technology: Elasticsearch

Docusign
Under 1 minute
document retrieval time

Docusign, the Intelligent Agreement Management (IAM) platform serving 1.6 million customers and over 1 billion users across 180 countries, built its AI-powered Navigator repository on Elasticsearch to index and search billions of agreements in real time. The deployment enables customers to find specific documents in under a minute—tasks that previously took hours—while handling millions of new agreements added to the platform each day.

Technology: Elasticsearch, Microsoft Azure

Flockx
10x
search response time improvement

Flockx is a social discovery startup that uses AI agents to help people find events, local communities, and like-minded individuals in their area. The company built its core platform on Elasticsearch, using semantic search, RAG, and Elastic Observability to power personalized recommendations and real-time operations. Search response times dropped from hundreds of milliseconds to tens of milliseconds, a 10x improvement, while infrastructure deployment time shrank from months to days.

Technology: Google Cloud, Elastic AI Assistant

Fiber AI
$1M+ ARR
annual recurring revenue at launch

Fiber AI is a Y Combinator-backed startup that automates outbound sales prospecting, drawing on a database of 850 million LinkedIn profiles, 40 million companies, and 13 million job postings. The company built its search infrastructure on Elasticsearch, which now searches across a billion rows in under one second. Within six months of launch, Fiber AI reached $1M in annual recurring revenue while operating with a team of eight people.

Technology: Elasticsearch