Technology · Software Engineering

How Contextual AI Uses Elasticsearch to Achieve 90%+ RAG Accuracy at Scale

Contextual AI is an enterprise AI platform company that specializes in production-ready Retrieval-Augmented Generation (RAG) systems for complex knowledge tasks. The company built its context engineering platform on Elasticsearch, using hybrid search that combines BM25 keyword scoring with vector similarity search to power accurate, scalable AI agents for enterprise customers. With this foundation, Contextual AI’s agents achieve over 90% accuracy on demanding production tasks, well above the 65–75% range typical of traditional RAG approaches.

Impact

90%+

RAG accuracy achieved in production

22 million chunks

Largest single data repository indexed

60,000+

Documents in largest repository

Challenge

Enterprise AI deployments built on fragmented open-source RAG components typically plateau at 65–75% accuracy—inadequate for production use cases in compliance, knowledge management, or customer support where errors carry real business cost. Managing separate vector and keyword search systems added engineering overhead and made it difficult to maintain consistency between research and production environments.

Solution

Contextual AI built its context engineering platform on Elasticsearch, using Elastic’s native hybrid search capability to run BM25 keyword and vector similarity queries through a single API. This unified approach enabled the team to handle repositories of up to 22 million chunks, align research and production environments, and support multi-cloud and on-premises deployments for enterprise customers.
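Later in this story, Contextual AI names Elasticsearch’s multi search API as the way it sends both query types in one call. The sketch below shows what such a request body could look like. It only builds the newline-delimited JSON payload for `POST /_msearch` and does not contact a cluster. The index name and the field names `content` and `embedding` are hypothetical, and the sketch assumes an Elasticsearch 8.x cluster whose search body accepts the top-level `knn` option.

```python
import json


def build_hybrid_msearch(index: str, text: str, query_vector: list[float],
                         text_field: str = "content",
                         vector_field: str = "embedding",
                         size: int = 10) -> str:
    """Return an NDJSON body for POST /_msearch that pairs a BM25 keyword
    search with a kNN vector search against the same (hypothetical) index."""
    # Search 1: full-text `match` query, scored with BM25 by default.
    bm25_body = {"size": size, "query": {"match": {text_field: text}}}
    # Search 2: approximate kNN over a dense_vector field.
    knn_body = {
        "size": size,
        "knn": {
            "field": vector_field,
            "query_vector": query_vector,
            "k": size,
            "num_candidates": size * 10,  # candidates examined per shard
        },
    }
    # _msearch alternates header and body lines; each header names the index.
    lines = [{"index": index}, bm25_body, {"index": index}, knn_body]
    # The body must be newline-delimited JSON ending with a final newline.
    return "".join(json.dumps(obj) + "\n" for obj in lines)


if __name__ == "__main__":
    payload = build_hybrid_msearch(
        "contracts-chunks",          # hypothetical index name
        "termination notice period",
        [0.12, -0.03, 0.88],         # toy 3-dimensional query embedding
    )
    print(payload)
```

Elasticsearch returns one entry per header/body pair in the response’s `responses` array, so the BM25 and kNN hit lists arrive together and can then be merged on the client.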


What Leaders Say

Elastic’s comprehensive support for BM25, combined with its vector search capabilities within the same database, means we can conduct both types of searches simultaneously without the complexity of managing separate services.

Junaid Saiyed, Head of Engineering, Product & Design, Contextual AI

A significant advantage for us is that our platform team also uses Elasticsearch as their deployment solution. This ensures alignment between the research and platform environments.

Gurnoor Singh Khurana, Member of Technical Staff, Contextual AI

Ultimately, the versatility of Elasticsearch is a significant asset. It provides us with sales flexibility and the agility to rapidly accommodate novel deployment requirements from our customers.

Junaid Saiyed, Head of Engineering, Product & Design, Contextual AI

Full Story

Contextual AI was founded to solve one of enterprise AI’s most stubborn problems: getting large language models to reliably answer questions using proprietary company data. The company’s context engineering platform sits between raw enterprise data and LLMs, providing the retrieval, reranking, and grounded generation needed to make AI agents accurate enough for high-value production use cases. Customers deploy these agents for tasks like enterprise knowledge search, compliance and risk analysis, and customer support—workflows where a 70% accuracy rate is simply not acceptable.
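To make the size of the gap concrete, the accuracy figures reported in this story can be restated as error rates, taking error rate as one minus accuracy and assuming both figures come from comparable tasks:

```latex
$$e = 1 - a$$
$$e_{\text{conv}} \in [\,1 - 0.75,\; 1 - 0.65\,] = [\,0.25,\; 0.35\,], \qquad e_{\text{ctx}} \le 1 - 0.90 = 0.10$$
$$\frac{e_{\text{conv}}}{e_{\text{ctx}}} \;\ge\; \frac{e_{\text{conv}}}{0.10} \;\in\; [\,2.5,\; 3.5\,]$$
```

Across 1,000 questions, a conventional pipeline in the 65–75% band would get roughly 250 to 350 wrong, versus at most about 100 at 90%+ accuracy.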

Before Contextual AI adopted Elasticsearch as the core of its data infrastructure, the typical RAG deployment stitched together open-source tools to handle keyword and semantic search separately. This fragmentation created engineering complexity, made scaling difficult, and left accuracy stuck in the 65–75% range even after fine-tuning. For Contextual AI, building a platform that enterprise customers could trust in production required a different foundation: one that could handle massive, multimodal document collections while supporting both search paradigms in a single system.

Contextual AI built its platform on Elasticsearch, taking advantage of two key capabilities. First, Elastic’s support for BM25 keyword search and vector similarity search within the same database eliminated the need to manage separate services. The multi search API lets the platform run hybrid queries in a single call, streamlining engineering workflows and reducing latency. Second, Elastic’s vector database handles the company’s most demanding data repositories: its largest single index contains approximately 22 million chunks sourced from more than 60,000 documents, much of it unstructured and multimodal, including PDFs, HTML files, and documents containing images, tables, schematics, and charts. The research team also uses Elasticsearch to evaluate embedding models under real-world conditions before promoting them to production.
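The story does not say how Contextual AI merges the keyword and vector result lists into one ranking. One widely used option is reciprocal rank fusion (RRF), which scores each document by summing 1 / (k + rank) across the lists it appears in. Because it uses only ranks, BM25 scores and vector similarities never have to be put on a common scale. The sketch below is a client-side version under that assumption, using the conventional constant k = 60; recent Elasticsearch releases also offer RRF natively through the `rrf` retriever. The chunk IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists: list[list[str]],
                           k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    Each document's fused score is the sum of 1 / (k + rank) over every list
    it appears in, with ranks starting at 1. Higher scores come first.
    """
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, descending; break ties by ID so output is stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    bm25_hits = ["c7", "c2", "c9"]  # hypothetical chunk IDs in BM25 order
    knn_hits = ["c2", "c4", "c7"]   # hypothetical chunk IDs in kNN order
    fused = reciprocal_rank_fusion([bm25_hits, knn_hits])
    print([doc_id for doc_id, _ in fused])  # ['c2', 'c7', 'c4', 'c9']
```

Here `c2` comes out on top because it ranks near the top of both lists, even though BM25 alone ranked `c7` first.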

The result is a markedly higher accuracy floor than conventional approaches deliver. Contextual AI’s agents consistently achieve 90%+ accuracy on complex knowledge tasks, compared with the 65–75% typical of conventional RAG. One of the most significant advantages comes from alignment between research and production: because both the research and platform teams use Elasticsearch, techniques proven in testing carry over directly to deployed systems with minimal integration risk. This consistency accelerates iteration and gives enterprise customers confidence that what they saw in evaluation is what they get in production.
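The story says the research team evaluates embedding models in Elasticsearch before promoting them, but not which metrics it uses. A simple retrieval metric for that kind of comparison is hit rate at k: the fraction of labeled queries for which at least one known-relevant chunk appears in the top k results. The sketch below assumes you already have the ranked chunk IDs each candidate model retrieved (for example, from a kNN search per model) and a hand-labeled set of relevant chunks per query; the models, queries, and IDs are all hypothetical.

```python
def hit_rate_at_k(retrieved: dict[str, list[str]],
                  relevant: dict[str, set[str]],
                  k: int = 5) -> float:
    """Fraction of labeled queries with at least one relevant chunk ID
    among the top-k retrieved chunk IDs."""
    if not relevant:
        raise ValueError("need at least one labeled query")
    hits = 0
    for query_id, gold_ids in relevant.items():
        top_k = retrieved.get(query_id, [])[:k]  # a missing query counts as a miss
        if any(chunk_id in gold_ids for chunk_id in top_k):
            hits += 1
    return hits / len(relevant)


if __name__ == "__main__":
    # Hypothetical labels and rankings for two candidate embedding models.
    gold = {"q1": {"c2"}, "q2": {"c9"}}
    model_a = {"q1": ["c2", "c5"], "q2": ["c1", "c3"]}
    model_b = {"q1": ["c5", "c2"], "q2": ["c9", "c1"]}
    for name, ranking in [("model_a", model_a), ("model_b", model_b)]:
        print(name, hit_rate_at_k(ranking, gold, k=1),
              hit_rate_at_k(ranking, gold, k=2))
```

In this toy data both models tie at k = 1, and `model_b` only pulls ahead at k = 2, which is why it helps to report the metric at more than one cutoff.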

Contextual AI runs primarily on Google Cloud and can extend to AWS and Azure regions, or to customer-controlled on-premises and VPC environments for clients with strict data compliance requirements. This flexibility, underpinned by Elasticsearch’s multi-cloud support and self-hosting capabilities, has become a commercial differentiator as more enterprises impose data sovereignty requirements on AI vendors. The company sees its Elastic-based architecture as foundational to scaling into increasingly complex enterprise deployments.

Similar Cases

Lusha
300%
increase in outbound leads

Lusha is a B2B sales intelligence platform with 1.5 million users and a database of over 200 million business contacts. By deploying Elasticsearch as both a full-text search engine and a vector database for AI-powered lead recommendations, Lusha helps customers generate 300% more leads, achieve conversion rates up to 10x higher, and realize return on investment of up to 1,000%.

Technology · Elasticsearch

Apna
20%
increase in employers paying for premium access

Apna, India’s largest jobs and professional networking platform with 50 million registered users and 600,000 employers, built its candidate search and AI job matching infrastructure on Elasticsearch running on Elastic Cloud on Google Cloud. Semantic search capabilities allow employers to find candidates by intent—not just keywords—while AI algorithms analyze candidate profiles to surface the most relevant matches. The result: a 20% increase in employers paying for premium access, 20% higher platform team productivity, and a 50% improvement in employee productivity.

Technology · Elasticsearch

WP Engine
~5 milliseconds
search query response time

WP Engine, the leading WordPress hosting platform serving more than 1.5 million users across 200,000 websites in 150+ countries, deployed Elastic’s Search AI Platform alongside Google Cloud Vertex AI and Gemini to build Smart Search AI and enable retrieval-augmented generation (RAG) capabilities for its customers. The integration allows WP Engine to deliver natural language search, context-aware product recommendations, and AI-powered chatbots to website owners without requiring them to stitch together multiple vendors. Response times dropped to as low as five milliseconds, and the platform handled traffic spikes from hundreds of thousands to tens of millions of queries per minute with zero downtime.

Technology · Elasticsearch · Google Vertex AI

Pfizer
93%
database reduction

Pfizer achieved a 93% database reduction and 20% cost avoidance by migrating its global SAP environment to S/4HANA on IBM Power10 infrastructure.

Pharmaceuticals · Technology · IBM Consulting · IBM Power Virtual Server

Jamf
Under 45 minutes
performance review skill build time

Jamf deployed Claude Enterprise across 16 departments, then built interactive workflow skills using Claude Cowork that transformed manual spreadsheet-based processes into guided, conversational experiences. Performance reviews that previously required months of effort are now built in under 45 minutes, and non-engineering teams independently create custom data dashboards.

Technology · Claude Enterprise · Claude Cowork

Confluent
15,000+
hours saved monthly

Confluent, a data streaming platform company with 2,000+ employees and 4,000+ customers, deployed Glean to solve the knowledge fragmentation that came with rapid growth from 250 to 2,000+ employees across 20+ systems. Glean indexed the company's full tool stack — Slack, Salesforce, Confluence, and more — enabling instant knowledge retrieval across all teams. The result: 15,000+ hours saved monthly, a 13% increase in support team satisfaction, and over 70% employee adoption.

Technology · Glean

Classmethod
up to 90%
reduction in development time

Classmethod, a leading Japanese cloud integrator, deployed Claude Code across its engineering teams to address chronic developer shortages. The tool automated code generation, review, and testing workflows, reducing development time by up to 90% on specific tasks and cutting code review time by 80%.

Technology · Claude Code

Aquant
98%+
retrieval accuracy

Aquant is an agentic AI platform purpose-built for professionals servicing complex industrial and medical equipment at large manufacturing companies. When the company’s homegrown vector search infrastructure—built on PostgreSQL extensions—began to slow under real-time production demands, Aquant migrated to Pinecone as the retrieval backbone for its AI platform. The switch delivered sub-100ms semantic search, pushed retrieval accuracy above 98%, and helped Aquant’s customers cut average service resolution time by 49%.

Technology · Pinecone