How Ensono Uses Snowflake ML to Predict IT Failures and Cut MTTR by Up to 70%
Ensono, a managed services provider supporting infrastructure that processes over 60 billion retail transactions and gives 24 million constituents access to government platforms, built two AI-powered systems on Snowflake to shift IT operations from reactive to predictive. The Envision Predictive Engine (EPE) and the DiagnoseNow application reduced mean time to resolution (MTTR) by 54–70%, cut major incidents by 22%, and improved SLA performance by 38% across Ensono’s enterprise client base.
Impact
54–70%: reduction in mean time to resolution (MTTR)
22%: reduction in major incidents
38%: improvement in SLA performance
Under 2 minutes: time to generate an AI incident analysis
75M+ events and 9M+ alerts: data analyzed by EPE
Challenge
Ensono’s MSP engineers managed IT environments for large enterprise clients that generated millions of alerts, with no reliable way to predict which alerts would escalate into major incidents. Manual root cause analysis slowed incident resolution, and labeling data for ML models took significant human effort that was hard to scale.
Solution
Ensono built the Envision Predictive Engine and DiagnoseNow on Snowflake’s AI Data Cloud, using Snowflake ML for model training and deployment, Cortex AI for GPT-powered data labeling and automated root cause analysis, and Streamlit in Snowflake for the engineer-facing incident resolution interface integrated with ServiceNow.
What Leaders Say
“When EPE proposes major incidents, the MTTR is 54% lower.”
“We’ve reached new heights in terms of customer satisfaction. Eighty percent of our clients recommend us to other customers. That’s a tangible measure of the quality of the delivery we provide to every one of our clients.”
“We recognized early on that Snowflake had a unique value to our business. Partly because it holds so much of our data, but also because of its extensive capabilities for building, hosting and running inference against machine learning models.”
“We wanted to deploy models as quickly as possible. And with Snowflake ML, we don’t have to worry about creating or finding another model hosting platform because we can use the Model Registry to manage and deploy models for inference with our existing pipelines.”
Full Story
Ensono operates as a managed services provider for large enterprise clients whose IT environments span hundreds of servers, thousands of SaaS accounts, and terabytes of operational data. The company supports critical infrastructure at scale—processing over 60 billion retail transactions and giving 24 million constituents access to government platforms. At that scale, a single misclassified ticket or delayed incident response doesn’t just affect one client: it ripples across dozens of complex environments where downtime has direct financial and operational consequences.
The traditional MSP model of monitor-alert-respond was structurally inadequate. Engineers received floods of alerts with no reliable mechanism to identify which were true precursors to major incidents and which were noise. Data labeling for model training was a manual, dashboard-intensive process that was difficult to scale. And when incidents did occur, root cause analysis required time-consuming manual investigation before the right fix could be applied. Ensono’s Chief AI Officer Jim Piazza set a specific goal: shift the operating model to prevent-predict-optimize.
Ensono built two systems on Snowflake’s AI Data Cloud. The first, the Envision Predictive Engine (EPE), is an ML model that ingests data from millions of events and alerts across client environments, estimates each support ticket’s probability of becoming a service-impacting event, and surfaces high-priority tickets to frontline engineers as ServiceNow popup notifications. GPT models accessed through Snowflake Cortex AI automated the historically manual data labeling process, saving engineering hours at scale.
The second system, DiagnoseNow, is built with Streamlit in Snowflake and Cortex AI. It automates root cause analysis by assembling case-specific details, event timelines, error summaries, and recommended actions, all in under two minutes from the moment a request is initiated.
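This case study does not describe EPE’s actual features, model type, or alerting threshold. As a minimal sketch of the general pattern it describes (score each ticket’s probability of escalating, keep those above a threshold, rank by risk), the Python below uses a hand-weighted logistic-regression scorer. The `Ticket` fields, the `WEIGHTS` and `BIAS` values, and the 0.5 default threshold are all hypothetical placeholders. A production model would learn its weights from labeled history. Ensono reports using the Snowflake ML Model Registry to deploy its models for inference, and this sketch does not cover that step or the delivery of flagged tickets to ServiceNow.

```python
import math
from dataclasses import dataclass


@dataclass
class Ticket:
    """Hypothetical ticket features; EPE's real inputs are not disclosed."""
    ticket_id: str
    alert_count: int          # alerts correlated with the ticket
    distinct_hosts: int       # hosts emitting those alerts
    prior_incidents_30d: int  # major incidents on the same service in the last 30 days


# Hypothetical hand-set weights; a real model would learn these from labeled history.
WEIGHTS = {"alert_count": 0.08, "distinct_hosts": 0.35, "prior_incidents_30d": 0.9}
BIAS = -4.0


def major_incident_probability(ticket: Ticket) -> float:
    """Logistic score: sigmoid(BIAS + sum of weight * feature)."""
    z = BIAS + sum(w * getattr(ticket, name) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def tickets_to_surface(tickets, threshold=0.5):
    """Return (ticket_id, probability) pairs at or above threshold, highest risk first."""
    scored = [(t.ticket_id, major_incident_probability(t)) for t in tickets]
    flagged = [pair for pair in scored if pair[1] >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sample = [
        Ticket("INC001", alert_count=3, distinct_hosts=1, prior_incidents_30d=0),
        Ticket("INC002", alert_count=40, distinct_hosts=6, prior_incidents_30d=2),
        Ticket("INC003", alert_count=12, distinct_hosts=4, prior_incidents_30d=1),
    ]
    # Only tickets whose estimated probability clears the threshold are surfaced.
    for ticket_id, p in tickets_to_surface(sample):
        print(ticket_id, round(p, 3))
```

Lowering `threshold` surfaces more tickets and risks fewer missed escalations, but it adds to the alert noise engineers already face. The source does not say where Ensono sets this trade-off.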
The results across both systems are concrete. When EPE flags a major incident, MTTR is 54% lower than baseline. In DiagnoseNow pilot testing, MTTR reductions reached as high as 70% in some cases. Combined, the two systems have helped Ensono spot over 1,700 issues and reduce major incidents by 22%. SLA performance improved by 38%, and the company’s NPS-equivalent metric shows that 80% of clients actively recommend Ensono to other organizations, a tangible measure of the operational improvement clients experience.
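The 54% figure compares the mean time to resolution of EPE-flagged incidents against a baseline. The case study does not publish the underlying durations or define the baseline precisely, so the sketch below uses made-up timestamps to show the arithmetic. The timestamps were chosen only so the result lands on 54% and are not Ensono data. MTTR is the average of (resolved - opened) across incidents, and the relative reduction is 100 × (baseline - flagged) / baseline. The helper names `mttr` and `percent_reduction` are hypothetical.

```python
from datetime import datetime, timedelta


def mttr(incidents):
    """Mean time to resolution over a list of (opened, resolved) datetime pairs."""
    if not incidents:
        raise ValueError("need at least one resolved incident")
    total = sum((resolved - opened for opened, resolved in incidents), timedelta())
    return total / len(incidents)


def percent_reduction(baseline: timedelta, improved: timedelta) -> float:
    """Relative reduction in percent: 100 * (baseline - improved) / baseline."""
    return 100.0 * ((baseline - improved) / baseline)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)  # arbitrary, made-up start time
    # Made-up durations: baseline incidents take 8h and 12h; EPE-flagged ones 4h and 5h12m.
    baseline = mttr([(t0, t0 + timedelta(hours=8)), (t0, t0 + timedelta(hours=12))])
    flagged = mttr([(t0, t0 + timedelta(hours=4)), (t0, t0 + timedelta(hours=5, minutes=12))])
    print(baseline, flagged, f"{percent_reduction(baseline, flagged):.1f}%")
```

Because the reduction is relative, the same 54% can correspond to very different absolute savings in hours. The case study reports only the relative figure.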
Ensono is continuing to expand the AI layer. The team is adopting Snowpark Container Services to accelerate DiagnoseNow response times, and Snowflake-managed Model Context Protocol (MCP) servers are planned for the next phase of the decision engine. For Piazza, the direction is clear: “Having more specialized models working together, with different systems, greatly improves the quality of outcomes. It’s like having a team of experts on demand.”