TechnologyCustomer Service

How Assembled Cuts Support Response Time 95% with Pinecone RAG

Assembled is a workforce management and customer support optimization platform serving enterprises like Stripe, Etsy, and DoorDash. To power Assembled Assist, the company built a hybrid RAG pipeline combining Pinecone vector search with Algolia keyword retrieval and LLMs from OpenAI and Anthropic. Support tasks that previously took 40 minutes now complete in 2 minutes—a 95% reduction in handling time.

Impact

~95%

Ticket handling time reduction

2 minutes

Post-AI task completion time

Challenge

Support agents lacked fast access to accurate answers, requiring up to 40 minutes per ticket to search knowledge bases and draft responses manually—a process that couldn’t scale as client support volumes grew.

Solution

Assembled built Assembled Assist, a RAG pipeline powered by Pinecone for semantic vector retrieval and Algolia for keyword search, fused via Reciprocal Rank Fusion and completed by OpenAI and Anthropic LLMs to generate ticket responses in seconds.

Tools & Technologies

What Leaders Say

Pinecone was a no-brainer for us. We needed to move quickly, and Pinecone was the leader in the vector database space. Its cost-effectiveness and ease of integration have been significant advantages, allowing us to focus on delivering value rather than managing infrastructure. We can test and adjust on the fly, which is crucial for maintaining high search quality and continuously enhancing our support solutions.

John Wang, Co-founder and CTO, Assembled
Get the full context.

Sign up to read complete case studies, access detailed metrics, and unlock all use cases.

Full Story

Assembled helps enterprise support teams at companies like Stripe, Etsy, and DoorDash run more efficiently, providing tools for workforce management, performance tracking, and ticket resolution. As AI began reshaping customer service expectations, Assembled saw an opportunity to close a persistent gap: support agents were spending too much time searching for accurate answers rather than delivering them.

Before Assembled Assist, agents navigated knowledge bases, past tickets, and product documentation manually before drafting responses. This slow, inconsistent process stretched routine tasks to 40 minutes per ticket. As support volumes grew, so did the inefficiency—agents were drowning in lookup work rather than focusing on the judgment calls that actually required human expertise.

Assembled built Assembled Assist as an AI automation engine that analyzes incoming tickets and generates high-quality contextual responses. The retrieval layer combines Pinecone’s vector database for semantic search with Algolia for keyword matching, fused via Reciprocal Rank Fusion to surface the most relevant results regardless of how a question is phrased. OpenAI and Anthropic LLMs then generate the response. The team chose a RAG architecture over fine-tuning, prioritizing prompt flexibility and rapid iteration as foundation models continue to improve.

The impact was stark: tasks that took 40 minutes now complete in 2 minutes, a roughly 95% reduction in handling time. Agents can process more tickets with greater consistency, and Assembled’s engineering team focuses on prompt quality and data curation rather than model infrastructure.

Assembled’s architecture reflects where enterprise AI is heading—composing specialized components rather than training from scratch. As foundation models improve, their bet on flexible, prompt-driven RAG over fine-tuned models positions Assembled to absorb those gains automatically, compounding the value delivered to every support team on the platform.

Similar Cases

H
Hostinger
Minutes vs. days
website creation time

Hostinger partnered with Anthropic to build Hostinger Horizons, an AI-powered platform that converts natural language prompts into complete, functional websites and applications. The solution eliminates the steep learning curve of traditional web builders, enabling non-technical users to create professional online presences in minutes instead of days.

TechnologyCClaude
1
1up
10x faster
response generation speed for rfps and compliance questionnaires

1up, a sales knowledge automation platform, integrated Pinecone's vector database to power a RAG-based system that delivers real-time, highly accurate answers to complex sales queries. The solution replaced a slow, home-grown embedding system and achieved 10x faster response generation for RFPs and compliance questionnaires. Sales reps can now handle high volumes of queries with confidence, reducing reliance on colleagues and accelerating the go-to-market process.

TechnologyAAWSPPinecone
FD
Fifth Dimension
50x
document processing capacity increase

Fifth Dimension, a UK-based AI analytics company serving the real estate industry, migrated to Google Cloud to overcome critical infrastructure bottlenecks. By adopting Vertex AI, Cloud Run, and serverless architecture, the company achieved 50x processing scalability, 6x revenue growth, and a 30% reduction in infrastructure costs — all within a rapid growth trajectory from founding in 2023 to global scale by 2025.

TechnologyGCGoogle Cloud Pub/SubGCGoogle Cloud Run
C
CustomGPT.ai
>400M
vectors stored

CustomGPT.ai built a RAG-as-a-Service platform on Pinecone storing over 400M vectors, achieving sub-20ms query latency and the #1 ranking in an independent RAG accuracy benchmark.

TechnologyPPinecone
D
Delphi
>100M
vectors stored

Delphi is an AI platform that enables coaches, creators, and experts to deploy interactive “Digital Minds”—always-on conversational agents trained on their unique content. Scaling from proof of concept to a commercial platform with thousands of customers required a vector database that could support millions of isolated namespaces, billions of vectors, and sub-second retrieval under variable load. Delphi selected Pinecone, achieving P95 query latency of 100ms and keeping retrieval under 30% of total response time—freeing the engineering team to build product rather than manage infrastructure.

TechnologyPPinecone
L
Lindy
10x
customer growth

Lindy's AI agent platform is built on Claude, enabling 10x customer growth, 72% reduction in time-to-qualified-lead, and handling 70%+ of routine support tickets.

TechnologyCClaude
TX
Terminal X
0.68 to 0.91
f1 retrieval accuracy improvement

Terminal X is a vertical AI platform for institutional investors that acts as a 24/7 research agent, processing millions of financial documents for hedge funds, asset managers, and private equity firms. By rebuilding its retrieval architecture on Pinecone’s vector database, Terminal X improved F1 retrieval accuracy from 0.68 to 0.91, cut average latency by over 35%, and doubled deployment velocity. Users now save approximately three hours per day, and investment memo preparation dropped from two days to half a day.

Financial ServicesTechnologyPPinecone
Z
ZoomInfo
>50%
increase in user engagement

ZoomInfo, a B2B go-to-market intelligence platform with hundreds of millions of professional contact records, needed a vector database to power real-time personalized contact recommendations for sales and marketing teams. The company deployed Pinecone’s serverless vector database with Dedicated Read Nodes to run semantic search over 390 million contact embeddings with sub-second latency. The result was a 50% increase in user engagement, a 2x improvement in recommendation relevancy, and 50x more peak request capacity.

TechnologyPPinecone