How monday Service Uses LangSmith and LangGraph to Build Reliable AI Service Agents
monday Service implemented an eval-driven development framework using LangSmith and LangGraph to build and monitor customer-facing AI service agents, achieving 8.7x faster evaluation cycles for IT, HR, and Legal support workflows.
Impact
- 8.7x faster evaluation cycles
- 4.1x faster from parallelization alone
Challenge
Building reliable customer-facing AI agents where minor prompt deviations cascade into incorrect outcomes, with no efficient way to test and validate agent behavior before production.
Solution
monday Service implemented an eval-driven development framework using LangSmith for evaluation and tracing, and LangGraph for agent orchestration, combining offline regression testing with online trajectory monitoring.
Full Story
monday Service, the enterprise service management arm of monday.com, set out to build production-grade AI agents capable of handling complex, multi-turn customer conversations across IT, HR, and Legal departments. The fundamental challenge: in agentic systems, even minor prompt or tool-call deviations can cascade into significantly incorrect outcomes, making traditional development approaches insufficient.
The team created an eval-driven development (EDD) framework resting on two pillars. Offline evaluations serve as a safety net, running hundreds of test scenarios against sanitized IT tickets before any code reaches production. Online evaluations act as a real-time monitor, scoring entire multi-turn conversation trajectories with LLM-as-judge metrics and tracking business signals such as automated resolution and containment rates. LangSmith provided the evaluation platform and tracing infrastructure, while LangGraph powered the ReAct-based agent architecture.
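The trajectory-scoring idea can be sketched in a few lines. Everything below is illustrative, not monday Service's actual code: `Turn`, `TrajectoryScore`, and `judgeTrajectory` are hypothetical names, and the "judge" is a deterministic heuristic standing in for an LLM-as-judge call so the sketch runs without model access.

```typescript
// One turn in a multi-turn conversation trajectory: the unit both pillars score.
interface Turn {
  role: "user" | "agent" | "tool";
  content: string;
}

// Signals a trajectory judge might emit (illustrative, not monday's metrics).
interface TrajectoryScore {
  resolved: boolean;   // did the agent reach a resolution step?
  toolErrors: number;  // tool turns that reported an error
}

// Stand-in for an LLM-as-judge call; a real judge would prompt a model
// with the full trajectory and parse a structured verdict.
function judgeTrajectory(turns: Turn[]): TrajectoryScore {
  const toolErrors = turns.filter(
    (t) => t.role === "tool" && t.content.includes("error")
  ).length;
  const resolved = turns.some(
    (t) => t.role === "agent" && t.content.toLowerCase().includes("resolved")
  );
  return { resolved, toolErrors };
}

// Offline pillar: replay the judge over a fixed set of sanitized test tickets,
// so a prompt or tool-call regression is caught before code reaches production.
const regressionSet: Turn[][] = [
  [
    { role: "user", content: "My VPN keeps disconnecting." },
    { role: "tool", content: "kb.search ok: 3 articles found" },
    { role: "agent", content: "Resolved: updated your VPN client config." },
  ],
];

const scores = regressionSet.map(judgeTrajectory);
console.log(scores); // [ { resolved: true, toolErrors: 0 } ]
```

The same scoring function can serve both pillars: replayed over a frozen regression set offline, and applied to live traces online.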
The results demonstrated the power of the approach: evaluation speed improved by 8.7x, from 162 seconds to just 18 seconds per evaluation cycle, through parallelization and concurrent LLM scoring. The team can now evaluate hundreds of examples in minutes rather than hours, enabling rapid iteration on agent behavior. The Evaluations as Code (EaC) pattern they pioneered treats AI judges as versioned TypeScript objects in source control, integrated directly into CI/CD pipelines for continuous quality assurance.
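The parallelization behind the speedup can be sketched with a simple worker pool that scores many examples concurrently instead of one at a time. This is a generic concurrency pattern, not monday Service's code: `scoreExample` is a hypothetical stand-in for a single slow LLM-judge call.

```typescript
// Stand-in for one LLM-judge call; the delay simulates model latency.
async function scoreExample(example: string): Promise<number> {
  await new Promise((resolve) => setTimeout(resolve, 50));
  return example.length % 2 === 0 ? 1 : 0; // placeholder score
}

// Score all examples with a bounded worker pool: each worker repeatedly
// pulls the next unscored example, so up to `concurrency` judge calls
// are in flight at once.
async function scoreAll(
  examples: string[],
  concurrency: number
): Promise<number[]> {
  const results: number[] = new Array(examples.length);
  let next = 0;
  const workers = Array.from({ length: concurrency }, async () => {
    while (next < examples.length) {
      const i = next++; // safe: JS is single-threaded between awaits
      results[i] = await scoreExample(examples[i]);
    }
  });
  await Promise.all(workers);
  return results;
}

const examples = ["ticket-1", "ticket-02", "ticket-3", "ticket-04"];
scoreAll(examples, 4).then((scores) => console.log(scores)); // [ 1, 0, 1, 0 ]
```

With judge calls dominated by network and model latency rather than CPU, wall-clock time for a batch shrinks roughly in proportion to the concurrency limit, which is the effect the team measured as a 4.1x parallelization gain.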