How Delphi Scales to 100M+ Vectors at 100ms Latency with Pinecone

Delphi is an AI platform that enables coaches, creators, and experts to deploy interactive “Digital Minds”—always-on conversational agents trained on their unique content. Scaling from proof of concept to a commercial platform with thousands of customers required a vector database that could support millions of isolated namespaces, billions of vectors, and sub-second retrieval under variable load. Delphi selected Pinecone, achieving P95 query latency of 100ms and keeping retrieval under 30% of total response time—freeing the engineering team to build product rather than manage infrastructure.

Impact

>100M vectors stored
100ms P95 query latency
<30% of total response time spent on retrieval

Challenge

Delphi’s open-source vector databases couldn’t provide the millions of isolated namespaces, predictable sub-second latency, or seamless scaling needed to serve thousands of simultaneous Digital Mind conversations without mounting engineering overhead.

Solution

Delphi deployed Pinecone as its fully managed vector database, assigning each Digital Mind its own namespace for data isolation and SOC 2 compliance, achieving 100ms P95 latency across 100M+ vectors without any infrastructure management.

Tools & Technologies

Pinecone

Full Story

Delphi is building a new category of AI product: personalized knowledge agents that let coaches, experts, and creators scale their expertise to unlimited conversations. Each “Digital Mind” is a distinct agent trained on a creator’s books, podcasts, videos, and social posts, capable of having meaningful real-time conversations with end users. The product’s value depends entirely on retrieval quality and speed—every millisecond of latency risks disrupting live conversations.

As Delphi moved from early prototype to commercial platform, three infrastructure problems surfaced with open-source vector databases. First, HNSW-based indexes grew unboundedly as content scaled, making predictable retrieval impossible. Second, approximate nearest neighbor searches degraded under concurrent load—threatening the 1-second end-to-end latency target required for live phone and video interactions. Third, hard caps on partition counts blocked scaling beyond initial capacity without complex re-architecture. Each new creator added operational complexity rather than simply adding data.

Delphi selected Pinecone to replace its open-source vector infrastructure. Each Digital Mind’s content lives in its own Pinecone namespace, providing natural data isolation and simplifying compliance with enterprise privacy requirements including SOC 2. Pinecone’s fully managed, cloud-native architecture eliminated the operational burden entirely: no index tuning, no sharding logic, no capacity planning. As new creators onboard and usage spikes around live events, the database scales automatically.
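As a rough illustration of the namespace pattern described above, the sketch below uses the Pinecone Python client to keep each tenant's content in its own namespace within a single shared index. The index name ("digital-minds"), embedding dimension, creator identifier, and the embed() helper are hypothetical placeholders, not details of Delphi's actual implementation.

```python
# Illustrative sketch only: index name, dimension, namespace IDs, and embed()
# are hypothetical placeholders, not Delphi's actual implementation.
from pinecone import Pinecone, ServerlessSpec


def embed(text: str) -> list[float]:
    # Stand-in for any embedding model call; returns a dummy 1536-dim vector.
    return [0.0] * 1536


pc = Pinecone(api_key="YOUR_API_KEY")

# One shared serverless index; each Digital Mind lives in its own namespace.
if "digital-minds" not in pc.list_indexes().names():
    pc.create_index(
        name="digital-minds",
        dimension=1536,          # must match the embedding model's output size
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
index = pc.Index("digital-minds")

# Upsert one creator's content chunks into that creator's namespace.
index.upsert(
    vectors=[
        {
            "id": "podcast-042-chunk-7",
            "values": embed("transcript chunk text ..."),
            "metadata": {"source": "podcast", "title": "Episode 42"},
        }
    ],
    namespace="creator-abc123",  # hypothetical per-Digital-Mind namespace
)

# At query time, retrieval is scoped to the same namespace, so tenants never mix.
results = index.query(
    vector=embed("What do you recommend for a morning routine?"),
    top_k=5,
    namespace="creator-abc123",
    include_metadata=True,
)
```

Because isolation lives at the namespace level, onboarding a new creator in this scheme amounts to writing to a new namespace; no new index, shard, or cluster has to be provisioned.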

The performance numbers are concrete: Delphi now stores over 100 million vectors across thousands of customers, with P95 query latency at 100ms. Retrieval accounts for less than 30% of total response time—leaving the remaining budget for LLM generation and delivery. The engineering team, which is small and growing, focuses on product features rather than database maintenance.
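To make the budget arithmetic concrete: against the roughly 1-second end-to-end target mentioned above, a sub-30% retrieval share means vector search must return in about 300ms or less, leaving roughly 700ms for LLM generation and delivery. The timing harness below is a hypothetical sketch of how that split could be instrumented; the function names and thresholds are illustrative, not Delphi's actual code.

```python
import time

# Hypothetical thresholds based on the figures quoted in this case study.
TOTAL_BUDGET_MS = 1000         # ~1-second end-to-end target for live conversations
RETRIEVAL_SHARE_LIMIT = 0.30   # retrieval should stay under 30% of response time


def timed(fn, *args, **kwargs):
    # Run fn and return (result, elapsed milliseconds).
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - start) * 1000.0


def answer(question, retrieve, generate):
    # retrieve and generate are placeholders for the vector query and LLM call.
    context, retrieval_ms = timed(retrieve, question)
    reply, generation_ms = timed(generate, question, context)
    total_ms = retrieval_ms + generation_ms

    # Flag turns where retrieval eats more than its share of the latency budget.
    if total_ms > TOTAL_BUDGET_MS or retrieval_ms > RETRIEVAL_SHARE_LIMIT * total_ms:
        print(f"budget warning: retrieval {retrieval_ms:.0f}ms of {total_ms:.0f}ms total")
    return reply
```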

Delphi’s architecture is a blueprint for AI-native companies building multi-tenant agent platforms. The combination of namespace isolation, managed scaling, and enterprise security compliance makes Pinecone the infrastructure layer that allows Delphi to onboard creators at any scale without re-architecting for each growth milestone.

Similar Cases

Notion
Millions of Notion AI users reached

Notion, the connected workspace platform used by millions worldwide, integrated Cohere Rerank into its search pipeline to power Notion AI’s search accuracy across multilingual enterprise workspaces. Every search and Notion AI interaction now routes through Cohere Rerank, delivering dramatically improved relevance while cutting the cost and complexity of embedding-based retrieval for smaller workspaces.

Technology: Cohere Rerank
Fujitsu
World-class JGLUE benchmark performance

Fujitsu, the global IT and digital transformation company with 124,000 employees, partnered with Cohere to develop Takane — a state-of-the-art Japanese large language model built on the Cohere Command series. Designed for private deployment in regulated sectors such as finance, healthcare, and government, Takane delivers world-class performance on the JGLUE benchmark and is now integrated into Fujitsu’s AI service offerings and data intelligence platform.

Technology: Cohere Command
Palo Alto Networks
351,000 employee productivity hours saved

Palo Alto Networks, the global cybersecurity leader with nearly 15,000 employees, deployed Moveworks as an AI Assistant named Sheldon to deliver autonomous support across Slack, email, and ServiceNow. The platform resolves 4,000 IT and HR issues per month while saving 351,000 employee hours, enabling the company to scale its hybrid FLEXWORK model without adding headcount.

Technology: Moveworks
Pure Storage
30+ minutes saved per search

Pure Storage, a Santa Clara-based enterprise data storage company, deployed Glean to unify knowledge access across Jira, GitHub, and internal wikis for teams spanning engineering, legal, and customer support. The AI-powered search platform cuts information-retrieval time by more than 30 minutes per search and enables employees to build custom GenAI applications in as little as 5 minutes, while boosting overall employee satisfaction scores by 39 points.

Technology: Glean
CoreWeave
2–5 day mean time to resolution (down from 4–8 days)

CoreWeave, a global AI cloud provider serving top AI labs and enterprises, deployed Cohere’s North agentic AI platform to overhaul its Slack-based customer support workflow in 90 days. North automated ticket triage, context gathering, and routing recommendations, cutting mean resolution time from 4–8 days to 2–5 days while sustaining customer satisfaction scores between 4.9 and 5.0.

Technology: Cohere North
Salesforce
20% productivity increase

Salesforce, the world’s leading CRM company, deployed Writer across more than 3,000 employees spanning marketing, communications, product, and customer success. Using Writer’s AI Studio no-code builder and Knowledge Graph RAG, teams create and launch custom agents in minutes without engineering support. Users report a 20% productivity gain—equivalent to reclaiming one full workday per week—with 78% saying the platform positively affects their daily work.

Technology: Writer
Fifth Dimension
50x document processing capacity increase

Fifth Dimension, a UK-based AI analytics company serving the real estate industry, migrated to Google Cloud to overcome critical infrastructure bottlenecks. By adopting Vertex AI, Cloud Run, and serverless architecture, the company achieved 50x processing scalability, 6x revenue growth, and a 30% reduction in infrastructure costs — all within a rapid growth trajectory from founding in 2023 to global scale by 2025.

Technology: Vertex AI, Pub/Sub
Adobe
30% faster case resolutions

Adobe deployed the ServiceNow AI Platform across IT, HR, security, and workplace operations to streamline employee experiences for over 30,000 staff. Generative AI tools like Now Assist help more than 8,000 IT and HR team members resolve cases faster, reduce outage recovery time, and automate email triage. The result is a measurably faster, more connected workforce that frees employees to focus on high-value creative work.

Technology: Now Assist, ServiceNow AI Experience