What AI tools did Delphi use?

Delphi used Pinecone in this implementation within the Technology sector.

What business function does this AI use case address?

This use case addresses Software Engineering.


How Delphi Scales to 100M+ Vectors at 100ms Latency with Pinecone

Delphi · Source: Pinecone · January 2025

Delphi is an AI platform that enables coaches, creators, and experts to deploy interactive “Digital Minds”: always-on conversational agents trained on their unique content. Scaling from proof of concept to a commercial platform with thousands of customers required a vector database that could support millions of isolated namespaces, billions of vectors, and sub-second retrieval under variable load. Delphi selected Pinecone and now reports a P95 query latency of 100ms, with retrieval taking under 30% of total response time. That leaves the engineering team free to build product rather than manage infrastructure.

Impact

>100M vectors stored

100ms P95 query latency

<30% share of response time spent on retrieval

Challenge

The open-source vector databases Delphi started with could not provide the millions of isolated namespaces, predictable sub-second latency, and seamless scaling needed to serve thousands of simultaneous Digital Mind conversations without significant engineering overhead.

Solution

Delphi deployed Pinecone as its fully managed vector database and assigned each Digital Mind its own namespace, which provides data isolation and supports SOC 2 compliance. The result is a 100ms P95 query latency across 100M+ vectors, with no infrastructure for the team to manage.

Tools & Technologies

Pinecone (Search & Vector Database)


Full Story

Delphi is building a new category of AI product: personalized knowledge agents that let coaches, experts, and creators scale their expertise to unlimited conversations. Each “Digital Mind” is a distinct agent trained on a creator’s books, podcasts, videos, and social posts, capable of having meaningful real-time conversations with end users. The product’s value depends entirely on retrieval quality and speed—every millisecond of latency risks disrupting live conversations.

As Delphi moved from early prototype to commercial platform, three infrastructure problems surfaced with open-source vector databases. First, HNSW-based indexes grew unboundedly as content scaled, making predictable retrieval impossible. Second, approximate nearest neighbor searches degraded under concurrent load—threatening the 1-second end-to-end latency target required for live phone and video interactions. Third, hard caps on partition counts blocked scaling beyond initial capacity without complex re-architecture. Each new creator added operational complexity rather than simply adding data.

Delphi selected Pinecone to replace its open-source vector infrastructure. Each Digital Mind’s content lives in its own Pinecone namespace, providing natural data isolation and simplifying compliance with enterprise privacy requirements including SOC 2. Pinecone’s fully managed, cloud-native architecture eliminated the operational burden entirely: no index tuning, no sharding logic, no capacity planning. As new creators onboard and usage spikes around live events, the database scales automatically.
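The one-namespace-per-Digital-Mind pattern described above can be illustrated with a small sketch. This is not Delphi's code and does not call Pinecone: `InMemoryIndex` is a toy stand-in, and the `mind-<id>` naming scheme, the chunk IDs, and the two-dimensional vectors are all hypothetical. The point it shows is that every write and every query names a namespace, so one creator's content cannot surface in another creator's results.

```python
from dataclasses import dataclass, field
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class InMemoryIndex:
    """Toy stand-in for a vector index (assumption: NOT the Pinecone client)."""
    # namespace -> {vector id -> vector values}
    namespaces: dict = field(default_factory=dict)

    def upsert(self, vectors, namespace):
        # Insert or overwrite vectors inside a single namespace.
        store = self.namespaces.setdefault(namespace, {})
        for vec_id, values in vectors:
            store[vec_id] = values

    def query(self, vector, top_k, namespace):
        # Only the requested namespace is searched, so tenants never mix.
        store = self.namespaces.get(namespace, {})
        scored = [(vec_id, cosine(vector, values)) for vec_id, values in store.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


def namespace_for(mind_id):
    """Hypothetical naming scheme: one namespace per Digital Mind."""
    return f"mind-{mind_id}"


index = InMemoryIndex()
index.upsert(
    [("chunk-1", [1.0, 0.0]), ("chunk-2", [0.0, 1.0])],
    namespace=namespace_for("coach-a"),
)
index.upsert([("chunk-1", [0.7, 0.7])], namespace=namespace_for("author-b"))

# A query for coach-a can only ever see coach-a's two chunks.
hits = index.query([1.0, 0.1], top_k=1, namespace=namespace_for("coach-a"))
print(hits[0][0])
```

Pinecone's Python client likewise accepts a `namespace` argument on its upsert and query calls, so the same shape should carry over, though index setup, authentication, and embedding generation are left out here.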

The performance numbers are concrete: Delphi now stores over 100 million vectors across thousands of customers, with P95 query latency at 100ms. Retrieval accounts for less than 30% of total response time—leaving the remaining budget for LLM generation and delivery. The engineering team, which is small and growing, focuses on product features rather than database maintenance.
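To make the latency figures concrete, here is a rough sketch of how a P95 value and a retrieval share might be checked against a budget. The latency samples are made up for illustration, the nearest-rank percentile is one common definition rather than necessarily the one Delphi or Pinecone uses, and treating the 1-second end-to-end target as the total response time is an assumption.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def retrieval_share(retrieval_ms, total_ms):
    """Fraction of the total response time spent on retrieval."""
    return retrieval_ms / total_ms


# Made-up retrieval latencies in milliseconds (illustrative only).
retrieval_samples_ms = [42, 55, 61, 48, 70, 66, 59, 95, 52, 100]
p95 = percentile_nearest_rank(retrieval_samples_ms, 95)

END_TO_END_TARGET_MS = 1000  # assumption: the 1-second target stands in for total response time
MAX_RETRIEVAL_SHARE = 0.30   # the "under 30%" figure from the story

share = retrieval_share(p95, END_TO_END_TARGET_MS)
within_budget = share < MAX_RETRIEVAL_SHARE
print(f"P95={p95}ms share={share:.0%} within_budget={within_budget}")
```

Under these assumptions a 100ms P95 uses about a tenth of a 1-second budget, which is consistent with the reported share of under 30%.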

Delphi’s architecture is a blueprint for AI-native companies building multi-tenant agent platforms. The combination of namespace isolation, managed scaling, and enterprise security compliance makes Pinecone the infrastructure layer that allows Delphi to onboard creators at any scale without re-architecting for each growth milestone.
