How ZoomInfo Uses Pinecone to Deliver Real-Time Contact Recommendations at Scale

ZoomInfo, a B2B go-to-market intelligence platform with hundreds of millions of professional contact records, needed a vector database to power real-time personalized contact recommendations for sales and marketing teams. The company deployed Pinecone’s serverless vector database with Dedicated Read Nodes to run semantic search over more than 390 million contact embeddings with sub-second latency. The result was an increase in user engagement of more than 50%, a 2x improvement in recommendation relevancy, and 50x more peak request capacity.

Impact

>50%

Increase in user engagement

2x

Improvement in relevancy and recall

50x

Increase in peak customer requests served

390 million+

Contact vectors in production system

~60ms

P50 query latency

3 weeks

Time to working proof of concept

Challenge

ZoomInfo needed to deliver real-time personalized contact recommendations over 390 million embeddings with sub-second latency, without adding the operational burden of managing distributed vector infrastructure.

Solution

ZoomInfo deployed Pinecone’s serverless vector database with Dedicated Read Nodes to run semantic search over 390 million contact embeddings, enabling instant recommendations with predictable low-latency performance as traffic scaled.

What Leaders Say

Pinecone’s slab architecture and Dedicated Read Nodes gave us the speed, consistency, and isolation we needed to run real-time recommendations at scale. Instead of managing infrastructure, we spend our time improving our recommendation model and the product itself. That has reduced the time our customers spend researching, filtering, and evaluating contacts—from hours to minutes—by giving them the right people to reach out to with a single click.

Carlos Nunez, Vice President of Engineering and Applied AI at ZoomInfo

Pinecone enabled us to build, scale, and optimize a real-time contact recommendation system that processes thousands of large-embedding-model vector search queries per second, which has driven a 2x improvement in relevancy and 50% boost to user engagement.

Tamiro Scholer, Senior Data Scientist at ZoomInfo

Full Story

ZoomInfo provides sales and marketing teams worldwide with access to hundreds of millions of professional contact records, enriched with firmographic data and AI-powered search capabilities. For its customers, the ability to quickly identify the right person inside a target account—and act on that information—is directly tied to pipeline and revenue. Even small improvements in how contacts are surfaced translate into significant time savings and better outcomes for go-to-market teams.

ZoomInfo’s existing contact discovery experience required users to manually search, filter, and navigate large volumes of contact data. Identifying the most relevant buyers inside a target account could take hours of manual work. The platform lacked a way to surface ranked, personalized recommendations the moment a user viewed a company profile—a gap the Applied AI team set out to close.

ZoomInfo’s Applied AI team built a real-time contact recommendation system using a ~400M-parameter text embedding model and Pinecone as the vector database. Within three weeks, the team had loaded millions of embeddings and validated sub-second latency without manual index configuration. Pinecone’s serverless slab architecture stores vectors in large contiguous units, so that writes and reads can proceed in parallel without blocking each other. For production-scale traffic, the team deployed Pinecone Dedicated Read Nodes (isolated read replicas with warm memory and local SSD), which delivered predictable low-latency performance under sustained high-QPS workloads. The system scaled to over 390 million vectors across more than 100,000 namespaces without re-architecting the underlying infrastructure.
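To make the namespace idea concrete, here is a minimal, purely illustrative sketch of namespace-partitioned similarity search in plain Python. It is not Pinecone's API and not ZoomInfo's implementation: the `NamespacedIndex` class, the three-dimensional toy vectors, the contact IDs, and the choice of one namespace per account are all assumptions made for the example (the source does not say what the 100,000+ namespaces map to).

```python
import math
from collections import defaultdict


class NamespacedIndex:
    """Toy in-memory stand-in for a namespace-partitioned vector index.

    Hypothetical class for illustration only; a production vector database
    uses approximate search and persistent storage rather than a full scan.
    """

    def __init__(self):
        # namespace -> {vector id -> unit-normalized vector}
        self._namespaces = defaultdict(dict)

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("cannot index or query a zero vector")
        return [x / norm for x in vec]

    def upsert(self, namespace, vector_id, vec):
        # Store the normalized vector so a dot product equals cosine similarity.
        self._namespaces[namespace][vector_id] = self._normalize(vec)

    def query(self, namespace, vec, top_k=3):
        # Only the requested namespace is scanned; other namespaces are never read.
        q = self._normalize(vec)
        candidates = self._namespaces.get(namespace, {})
        scored = [
            (vector_id, sum(a * b for a, b in zip(q, stored)))
            for vector_id, stored in candidates.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


# Assumed layout: one namespace per target account (hypothetical names).
index = NamespacedIndex()
index.upsert("account-acme", "contact-1", [0.9, 0.1, 0.0])
index.upsert("account-acme", "contact-2", [0.1, 0.9, 0.0])
index.upsert("account-globex", "contact-3", [0.9, 0.1, 0.0])

results = index.query("account-acme", [1.0, 0.0, 0.0], top_k=1)
print(results[0][0])
```

Under this assumed layout, a query touches only the vectors in one namespace, which is one plausible reason a namespace-heavy design keeps per-query work small even when the total corpus is in the hundreds of millions.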

The production system achieved a P50 query latency of ~60ms at ~40 QPS, keeping end-to-end recommendation service latency under one second for standard loads and under five seconds for peak traffic above 100 RPS. Relevancy improved 2x, user engagement rose by more than 50%, and the system served 50x more peak customer requests than the prior implementation. What had required hours of manual research was reduced to a single click: the moment a user views a company profile, the platform surfaces the most relevant contacts instantly. As Carlos Nunez, VP of Engineering and Applied AI, put it, the team went from managing infrastructure to improving the recommendation model itself.
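For readers less familiar with percentile metrics, the sketch below shows how a P50 figure can be derived from raw latency samples, and why it can sit far below the slowest requests. The sample values are made up for illustration and are not ZoomInfo measurements; `latency_summary` is a hypothetical helper name.

```python
import statistics


def latency_summary(samples_ms):
    """Return the P50 and P95 (in milliseconds) of a list of latency samples."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points P1..P99; index 49 is P50.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94]}


# Made-up samples: most requests are fast, a few tail requests are slow.
samples = [52, 55, 58, 60, 60, 61, 63, 70, 120, 400]
summary = latency_summary(samples)
print(summary)
```

Because P50 is the median, a handful of slow tail requests barely moves it, which is why a median query latency and a much looser end-to-end bound at peak traffic can both describe the same system.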

ZoomInfo plans to expand Pinecone-powered recommendations across additional products, customer segments, and internal applications. The deployment demonstrates a broader shift in B2B software: platforms that can surface the right information instantly, rather than after manual search, are redefining what go-to-market intelligence means. With Pinecone’s combination of serverless scale, slab-based storage, and Dedicated Read Nodes, ZoomInfo has a clear path to extend real-time semantic search to new surfaces without operational overhead.

Similar Cases

1up
10x faster response generation speed for RFPs and compliance questionnaires

1up, a sales knowledge automation platform, integrated Pinecone's vector database to power a RAG-based system that delivers real-time, highly accurate answers to complex sales queries. The solution replaced a slow, home-grown embedding system and achieved 10x faster response generation for RFPs and compliance questionnaires. Sales reps can now handle high volumes of queries with confidence, reducing reliance on colleagues and accelerating the go-to-market process.

Technology, Pinecone, AWS

CustomGPT.ai
>400M vectors stored

CustomGPT.ai built a RAG-as-a-Service platform on Pinecone storing over 400M vectors, achieving sub-20ms query latency and the #1 ranking in an independent RAG accuracy benchmark.

Technology, Pinecone

Delphi
>100M vectors stored

Delphi is an AI platform that enables coaches, creators, and experts to deploy interactive “Digital Minds”—always-on conversational agents trained on their unique content. Scaling from proof of concept to a commercial platform with thousands of customers required a vector database that could support millions of isolated namespaces, billions of vectors, and sub-second retrieval under variable load. Delphi selected Pinecone, achieving P95 query latency of 100ms and keeping retrieval under 30% of total response time—freeing the engineering team to build product rather than manage infrastructure.

Technology, Pinecone

Terminal X
0.68 to 0.91 F1 retrieval accuracy improvement

Terminal X is a vertical AI platform for institutional investors that acts as a 24/7 research agent, processing millions of financial documents for hedge funds, asset managers, and private equity firms. By rebuilding its retrieval architecture on Pinecone’s vector database, Terminal X improved F1 retrieval accuracy from 0.68 to 0.91, cut average latency by over 35%, and doubled deployment velocity. Users now save approximately three hours per day, and investment memo preparation dropped from two days to half a day.

Financial Services, Technology, Pinecone

Assembled
~95% ticket handling time reduction

Assembled is a workforce management and customer support optimization platform serving enterprises like Stripe, Etsy, and DoorDash. To power Assembled Assist, the company built a hybrid RAG pipeline combining Pinecone vector search with Algolia keyword retrieval and LLMs from OpenAI and Anthropic. Support tasks that previously took 40 minutes now complete in 2 minutes—a 95% reduction in handling time.

Technology, Pinecone, OpenAI LLMs

Gong
10x infrastructure cost reduction

Gong is a revenue intelligence platform that analyzes billions of customer interactions to help sales teams improve performance. To power Smart Trackers—its patented AI system for detecting and classifying concepts in sales conversations—Gong adopted Pinecone as its core vector database, storing billions of sentence-level embeddings across real conversations. Migrating to Pinecone Serverless delivered a 10x reduction in infrastructure costs while sustaining peak search performance across a massive corpus.

Technology, Pinecone

Pfizer
93% database reduction

Pfizer achieved a 93% database reduction and 20% cost avoidance by migrating their global SAP environment to S/4HANA on IBM Power10 infrastructure.

Pharmaceuticals, Technology, IBM Power10, IBM PowerVM

Classmethod
Up to 90% reduction in development time

Classmethod, a leading Japanese cloud integrator, deployed Claude Code across its engineering teams to address chronic developer shortages. The tool automated code generation, review, and testing workflows, reducing development time by up to 90% on specific tasks and cutting code review time by 80%.

Technology, Claude Code