How Assembled Cuts Support Response Time by 95% with Pinecone RAG

Assembled is a workforce management and customer support optimization platform serving enterprises like Stripe, Etsy, and DoorDash. To power Assembled Assist, the company built a hybrid RAG pipeline combining Pinecone vector search with Algolia keyword retrieval and LLMs from OpenAI and Anthropic. Support tasks that previously took 40 minutes now complete in 2 minutes—a 95% reduction in handling time.

Impact

~95%

Ticket handling time reduction

2 minutes

Post-AI task completion time

Challenge

Support agents lacked fast access to accurate answers, requiring up to 40 minutes per ticket to search knowledge bases and draft responses manually—a process that couldn’t scale as client support volumes grew.

Solution

Assembled built Assembled Assist, a RAG pipeline powered by Pinecone for semantic vector retrieval and Algolia for keyword search, fused via Reciprocal Rank Fusion, with OpenAI and Anthropic LLMs generating ticket responses in seconds.

Tools & Technologies

Pinecone · Algolia · OpenAI · Anthropic

What Leaders Say

Pinecone was a no-brainer for us. We needed to move quickly, and Pinecone was the leader in the vector database space. Its cost-effectiveness and ease of integration have been significant advantages, allowing us to focus on delivering value rather than managing infrastructure. We can test and adjust on the fly, which is crucial for maintaining high search quality and continuously enhancing our support solutions.

John Wang, Co-founder and CTO, Assembled

Full Story

Assembled helps enterprise support teams at companies like Stripe, Etsy, and DoorDash run more efficiently, providing tools for workforce management, performance tracking, and ticket resolution. As AI began reshaping customer service expectations, Assembled saw an opportunity to close a persistent gap: support agents were spending too much time searching for accurate answers rather than delivering them.

Before Assembled Assist, agents navigated knowledge bases, past tickets, and product documentation manually before drafting responses. This slow, inconsistent process stretched routine tasks to 40 minutes per ticket. As support volumes grew, so did the inefficiency—agents were drowning in lookup work rather than focusing on the judgment calls that actually required human expertise.

Assembled built Assembled Assist as an AI automation engine that analyzes incoming tickets and generates high-quality contextual responses. The retrieval layer combines Pinecone’s vector database for semantic search with Algolia for keyword matching, fused via Reciprocal Rank Fusion to surface the most relevant results regardless of how a question is phrased. OpenAI and Anthropic LLMs then generate the response. The team chose a RAG architecture over fine-tuning, prioritizing prompt flexibility and rapid iteration as foundation models continue to improve.
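The fusion step described above can be sketched in a few lines. This is a minimal illustration of Reciprocal Rank Fusion, the technique the story names for merging Pinecone (semantic) and Algolia (keyword) result lists; the document IDs and the `k=60` constant are illustrative assumptions, not Assembled's actual values.

```python
def rrf_fuse(ranked_lists, k=60):
    """Merge several ranked result lists via Reciprocal Rank Fusion.

    Each document's fused score is the sum over lists of 1 / (k + rank),
    with rank 1-based. Documents ranked highly by multiple retrievers
    rise to the top, regardless of how each retriever scores internally.
    """
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical top results from each retriever:
semantic = ["kb-12", "kb-7", "kb-3"]   # vector search (e.g., Pinecone)
keyword = ["kb-7", "kb-9", "kb-12"]    # keyword search (e.g., Algolia)
fused = rrf_fuse([semantic, keyword])
# → ["kb-7", "kb-12", "kb-9", "kb-3"]
```

Because RRF operates only on ranks, not raw scores, it sidesteps the problem that vector similarity and keyword relevance scores live on incomparable scales.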

The impact was stark: tasks that took 40 minutes now complete in 2 minutes, a roughly 95% reduction in handling time. Agents can process more tickets with greater consistency, and Assembled’s engineering team focuses on prompt quality and data curation rather than model infrastructure.

Assembled’s architecture reflects where enterprise AI is heading—composing specialized components rather than training from scratch. As foundation models improve, their bet on flexible, prompt-driven RAG over fine-tuned models positions Assembled to absorb those gains automatically, compounding the value delivered to every support team on the platform.

Similar Cases

TaskUs
20%
average handle time reduction

TaskUs is a leading outsourced digital services company providing next-generation customer experience (CX) for innovative global brands. To move beyond flat-file embedding storage and scaling limitations, TaskUs built TaskGPT—a proprietary GenAI platform—with Pinecone as the core vector database for semantic search, RAG-based knowledge retrieval, and client-specific recommendations. The result: a 20% reduction in average handle time and a 5% increase in customer satisfaction across client deployments.

Business Process Outsourcing · Amazon Bedrock · Pinecone
Delphi
>100M
vectors stored

Delphi is an AI platform that enables coaches, creators, and experts to deploy interactive “Digital Minds”—always-on conversational agents trained on their unique content. Scaling from proof of concept to a commercial platform with thousands of customers required a vector database that could support millions of isolated namespaces, billions of vectors, and sub-second retrieval under variable load. Delphi selected Pinecone, achieving P95 query latency of 100ms and keeping retrieval under 30% of total response time—freeing the engineering team to build product rather than manage infrastructure.

Technology · Pinecone
Gong
10x
infrastructure cost reduction

Gong is a revenue intelligence platform that analyzes billions of customer interactions to help sales teams improve performance. To power Smart Trackers—its patented AI system for detecting and classifying concepts in sales conversations—Gong adopted Pinecone as its core vector database, storing billions of sentence-level embeddings across real conversations. Migrating to Pinecone Serverless delivered a 10x reduction in infrastructure costs while sustaining peak search performance across a massive corpus.

Sales Technology · Pinecone
InpharmD
80%
data storage cost savings

InpharmD's AI assistant, Sherlock, leverages Pinecone's vector database to deliver fast, accurate drug information to healthcare professionals. By embedding 30 million medical documents into a RAG pipeline, InpharmD achieved 70% better query accuracy, 95x faster first response times, and 80% cost savings on data storage.

Healthcare · Sherlock · AWS
1up
10x faster
response generation speed for RFPs and compliance questionnaires

1up, a sales knowledge automation platform, integrated Pinecone's vector database to power a RAG-based system that delivers real-time, highly accurate answers to complex sales queries. The solution replaced a slow, home-grown embedding system and achieved 10x faster response generation for RFPs and compliance questionnaires. Sales reps can now handle high volumes of queries with confidence, reducing reliance on colleagues and accelerating the go-to-market process.

Sales Technology · AWS · RAG (Retrieval-Augmented Generation)
Chipper Cash
95%+
selfie verification accuracy

Chipper Cash, a fintech serving over five million customers across Africa, deployed a Pinecone-powered facial similarity search system to detect and block fraudulent duplicate sign-ups in real time. The solution slashed identity verification latency from up to 20 minutes down to under 2 seconds, and reduced fraudulent sign-ups by 10x across all markets.

Financial Services · Google Cloud · Snowflake