How Notion Powers Precise AI Search for Millions with Cohere Rerank

Notion, the connected workspace platform used by millions worldwide, integrated Cohere Rerank into its search pipeline to power Notion AI’s search accuracy across multilingual enterprise workspaces. Every search and Notion AI interaction now routes through Cohere Rerank, delivering dramatically improved relevance while cutting the cost and complexity of embedding-based retrieval for smaller workspaces.

Impact

Notion AI users reached: millions

Retrieval precision improvement: from 100,000 candidate documents down to the 200 most relevant

Cost reduction: embedding and vector storage eliminated for most workspaces

Challenge

Notion needed a search reranker that could improve answer precision across large, multilingual enterprise workspaces without the cost and infrastructure burden of embedding-based retrieval, while supporting diverse global user bases in EMEA and APAC.

Solution

Cohere Rerank was integrated into Notion’s search pipeline via Amazon SageMaker, placed before the generative model to boost result relevance — enabling Notion to skip costly embeddings for smaller workspaces while combining cross-source search results from Slack, GitHub, and other connected tools.

Tools & Technologies

Cohere Rerank, deployed via Amazon SageMaker

What Leaders Say

Cohere is a key part of what makes Notion AI work. Cohere Rerank gives us both the speed and quality we need, and it’s consistently improving. It’s been essential for getting our AI Connectors out the door quickly.

Simon Last, Cofounder and CTO, Notion

Cohere Rerank is one of the only high quality and fast multilingual rerankers on the market and that’s why we use it. Every single search and Notion AI interaction goes through Cohere Rerank.

Abhishek Modi, Software Engineer, Notion

One big part of the search pipeline is precision. With Cohere Rerank, we no longer have to worry about it and can focus on other parts like recall.

Abhishek Modi, Software Engineer, Notion

Full Story

Notion operates at a scale that makes search infrastructure a core product concern rather than a technical detail. The platform hosts collaborative wikis, project management systems, and connected knowledge bases for companies ranging from small startups to global enterprises — and as those organizations grow, so does the volume of documents, databases, and integrations they manage in Notion. When Notion AI launched, delivering accurate, fast answers to user queries across this heterogeneous body of content became a critical product requirement. The technical challenge was finding a reranking solution that could match the precision users expected from AI-native features.

Before integrating Cohere Rerank, Notion’s search pipeline faced a structural tension: embedding models were expensive to run at scale and required vector storage infrastructure, but simpler retrieval methods lacked the precision needed to surface the most relevant documents from large workspaces. The challenge was compounded by Notion’s global user base — more than half of whom work with multilingual content in EMEA and APAC markets — requiring a solution that could operate accurately across languages without separate models or additional engineering overhead.

Notion integrated Cohere Rerank directly into the search pipeline, placing it before the generative model processes any user query. The implementation used Amazon SageMaker for auto-scaling, allowing Notion to handle variable traffic loads without over-provisioning. A particularly important design choice: for workspaces with fewer than 1,000 documents — the majority of Notion’s customer base — Cohere Rerank allows Notion to bypass traditional embedding and vector search entirely, reducing both cost and system complexity while maintaining high answer quality. The same Rerank model also handles cross-source result combination, merging outputs from Slack, GitHub, and other connected tools into coherent, ranked answers.
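The routing logic described above can be sketched in Python. This is a minimal illustration, not Notion's implementation: `EMBEDDING_BYPASS_THRESHOLD`, the `Doc` type, and the `keyword_rerank` stub (standing in for a call to an actual Rerank model endpoint) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Cutoff below which a workspace skips embedding/vector retrieval entirely;
# the case study cites roughly 1,000 documents. Illustrative constant.
EMBEDDING_BYPASS_THRESHOLD = 1_000

@dataclass
class Doc:
    id: str
    text: str

def keyword_rerank(query: str, docs: List[Doc]) -> List[Tuple[Doc, float]]:
    """Toy stand-in for a Rerank model call: scores each candidate by
    query-term overlap. A real pipeline would score with the model."""
    terms = set(query.lower().split())
    return [(d, float(sum(t in d.text.lower() for t in terms))) for d in docs]

def search(
    query: str,
    workspace_docs: List[Doc],
    rerank: Callable[[str, List[Doc]], List[Tuple[Doc, float]]],
    vector_search: Optional[Callable[[str, List[Doc]], List[Doc]]] = None,
    top_n: int = 5,
) -> List[Doc]:
    if vector_search is None or len(workspace_docs) < EMBEDDING_BYPASS_THRESHOLD:
        # Small workspace: rerank every document directly, no vector index.
        candidates = workspace_docs
    else:
        # Large workspace: narrow the candidate pool with vector retrieval first.
        candidates = vector_search(query, workspace_docs)
    scored = rerank(query, candidates)
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_n]]

docs = [
    Doc("a", "meeting notes about the roadmap"),
    Doc("b", "notes on the Rerank integration rollout"),
    Doc("c", "office lunch menu"),
]
results = search("rerank integration", docs, keyword_rerank, top_n=2)
```

Because the small-workspace path hands every document straight to the reranker, no embeddings or vector index need to exist for those workspaces at all, which is where the cost and complexity savings come from.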

The precision improvement over embedding models has been substantial. As software engineer Abhishek Modi explained, Cohere Rerank is dramatically more accurate than embedding approaches at scoping large document sets down to the most relevant subset — moving from a retrieval pool of 100,000 documents to 200 with much higher fidelity. Co-founder and CTO Simon Last described Rerank as essential to getting Notion AI Connectors out the door quickly, citing both speed and ongoing improvement as differentiating factors. The result: millions of Notion users have engaged with Notion AI features, with rapid month-on-month growth in usage and a meaningful contribution to company revenue.

Notion’s implementation illustrates how a consumer-grade AI experience depends on enterprise-grade retrieval infrastructure underneath. The decision to partner closely with Cohere — rather than relying on open-source alternatives that would require self-maintenance — reflects a strategic bet that rapid model improvement and collaborative support would be a durable competitive advantage as Notion AI scales.

Similar Cases

Delphi
>100M vectors stored

Delphi is an AI platform that enables coaches, creators, and experts to deploy interactive “Digital Minds”—always-on conversational agents trained on their unique content. Scaling from proof of concept to a commercial platform with thousands of customers required a vector database that could support millions of isolated namespaces, billions of vectors, and sub-second retrieval under variable load. Delphi selected Pinecone, achieving P95 query latency of 100ms and keeping retrieval under 30% of total response time—freeing the engineering team to build product rather than manage infrastructure.

Technology: Pinecone
Fujitsu
World-class JGLUE benchmark performance

Fujitsu, the global IT and digital transformation company with 124,000 employees, partnered with Cohere to develop Takane — a state-of-the-art Japanese large language model built on the Cohere Command series. Designed for private deployment in regulated sectors such as finance, healthcare, and government, Takane delivers world-class performance on the JGLUE benchmark and is now integrated into Fujitsu’s AI service offerings and data intelligence platform.

Technology: Cohere Command
Palo Alto Networks
351,000 employee productivity hours saved

Palo Alto Networks, the global cybersecurity leader with nearly 15,000 employees, deployed Moveworks as an AI Assistant named Sheldon to deliver autonomous support across Slack, email, and ServiceNow. The platform resolves 4,000 IT and HR issues per month while saving 351,000 employee hours, enabling the company to scale its hybrid FLEXWORK model without adding headcount.

Technology: Moveworks
Pure Storage
30+ minutes saved per search

Pure Storage, a Santa Clara-based enterprise data storage company, deployed Glean to unify knowledge access across Jira, GitHub, and internal wikis for teams spanning engineering, legal, and customer support. The AI-powered search platform cuts information-retrieval time by more than 30 minutes per search and enables employees to build custom GenAI applications in as little as 5 minutes, while boosting overall employee satisfaction scores by 39 points.

Technology: Glean
CoreWeave
Mean time to resolution: 2–5 days (down from 4–8 days)

CoreWeave, a global AI cloud provider serving top AI labs and enterprises, deployed Cohere’s North agentic AI platform to overhaul its Slack-based customer support workflow in 90 days. North automated ticket triage, context gathering, and routing recommendations, cutting mean resolution time from 4–8 days to 2–5 days while sustaining customer satisfaction scores between 4.9 and 5.0.

Technology: Cohere North
Salesforce
20% productivity increase

Salesforce, the world’s leading CRM company, deployed Writer across more than 3,000 employees spanning marketing, communications, product, and customer success. Using Writer’s AI Studio no-code builder and Knowledge Graph RAG, teams create and launch custom agents in minutes without engineering support. Users report a 20% productivity gain—equivalent to reclaiming one full workday per week—with 78% saying the platform positively affects their daily work.

Technology: Writer
Fifth Dimension
50x document processing capacity increase

Fifth Dimension, a UK-based AI analytics company serving the real estate industry, migrated to Google Cloud to overcome critical infrastructure bottlenecks. By adopting Vertex AI, Cloud Run, and serverless architecture, the company achieved 50x processing scalability, 6x revenue growth, and a 30% reduction in infrastructure costs — all within a rapid growth trajectory from founding in 2023 to global scale by 2025.

Technology: Vertex AI, Pub/Sub
Adobe
30% faster case resolutions

Adobe deployed the ServiceNow AI Platform across IT, HR, security, and workplace operations to streamline employee experiences for over 30,000 staff. Generative AI tools like Now Assist help more than 8,000 IT and HR team members resolve cases faster, reduce outage recovery time, and automate email triage. The result is a measurably faster, more connected workforce that frees employees to focus on high-value creative work.

Technology: Now Assist, ServiceNow AI Experience