EBOOK
Resilient Machine Learning
with MLOps
Observe, Diagnose, and Mitigate Issues with Production Models
Resilient Machine Learning with MLOps
The Real World Demands Agility
Today’s economy is under pressure from inflation, rising
interest rates, and disruptions in the global supply chain.
As a result, many organizations are seeking new ways to
overcome challenges: to be agile and rapidly respond to
constant change. The same challenges are supposed to
be addressed by production-grade AI systems, which are
built on data influenced by all of these real-world changes
in market and consumer behavior. But AI systems don’t
always reflect that situation, since training data deviates
from real-time data more and more. For this reason, AI
models deployed to solve existing business problems
are extremely vulnerable.

We do not know what the future holds. But we can take
the right actions to prevent failure and ensure that AI
systems perform to predictably high standards, meet
business needs, unlock additional resources for financial
sustainability, and reflect the real patterns observed in
the outside world.

To prevent deployment delays and deliver resilient,
accountable, and trusted AI systems, many organizations
invest in MLOps to monitor and manage models while
ensuring appropriate governance. According to BARC¹,
organizations that adopt such tools are better able to
plan their projects, as they are 4.2 times more likely to
be able to deploy their models quickly and 3.5 times less
likely to be confronted with projects with overwhelming
complexity. However, untangling the complexities of a
machine learning lifecycle is just one piece of the puzzle.

Organizations need full visibility and automation to rapidly
correct course in their businesses and to reflect daily
changes. This task becomes increasingly hard without a
robust MLOps framework or a solution in place. In light
of this, there are many questions that data science teams
are asked to answer.

• Deployed models were accurate yesterday,
but what about today?
• How long will it take to replace a model?
• Is there a way to get a better model into
production faster?
• Is there a way to prove the value of AI to
business stakeholders?

These questions and many others are now at the top of
the agenda of every data science team. That is even more
true for organizations that are actively expanding their
production deployments.
¹ Source: BARC, “Driving Innovation with AI: Getting Ahead with DataOps and MLOps”
The majority of AI-enabled organizations are still
struggling to stay atop the ever-expanding repository
of production models. This poses a critical challenge
as these models continuously influence key business
decisions, such as loan provisioning in financial
services, inventory forecasting in retail, or
staffing optimization in healthcare.

A myriad of issues can interfere with the performance
and delivery of production models, resulting in poor
or incomplete predictions and ill-informed decision-making.
This is due to a lack of holistic visibility into model
operations. It’s not enough to simply expose an error; it’s
essential that teams can instantly pinpoint the context of
the error, thereby enabling quicker resolution.

Fortunately, DataRobot MLOps allows data science teams
to address these and many other challenges associated
with the ML lifecycle, delivering granular model-level
insights, observability of production models, and a higher
level of confidence for decisions informed by models.

This ebook highlights several features of DataRobot
MLOps that enable higher trust and observability for your
production models.

IDC² predicts that by 2024, 60% of enterprises would
have operationalized their ML workflows by using MLOps.

Observability Is Key
Many data scientists lack visibility into the deployment
behavior and performance of models that are in
production. Tracking the integrity of a machine learning
model in production is critical.

Model observability features within the DataRobot AI
platform help ensure that you know when something goes
wrong and understand why it went wrong. By tracking
service, drift, prediction data, training data, and custom
metrics, you can keep your models and predictions
relevant in a fast-changing world.

REPORT
IDC MarketScape: Worldwide Machine Learning Operations
Platforms 2022
Vendor assessment, capabilities review,
vendor strategy insights.
² Source: “IDC FutureScape: Worldwide Artificial Intelligence and Automation 2022 Predictions,” by Ritu Jyoti, David Schubmehl, Hayley Sutherland,
Peter Haas, Sriram Subramanian, Maureen Fleming, Shari Lava, Dr. Chris Marshall, Jennifer Hamel, Neil Ward-Dutton, Jack Vernon, Mary Wardley,
Warren Shiau, Jennifer Cooke, Pietro Delai, Harish Dunakhe, Tim Grieser, Peter Rutten, Shane Rau, October 2021, IDC#US48298421
EBOOK
DataRobot MLOps Agents:
Visibility For All Your Production Models
Deploy, manage, monitor, and govern machine learning
models from a single place, regardless of how they
were created or where they are located.
DataRobot Model Observability shows what went
wrong with models and why it went wrong.
Model Observability Is More Than Just Monitoring
Model observability provides an end-to-end picture
of the internal states of a system, such as its inputs,
outputs, and environment, including data drift, prediction
performance, service health, and more relevant metrics.

This means you have the ability to not only monitor the
problem but also analyze and pinpoint its source. Model
observability compounds performance stats and metrics
across the entire model lifecycle to provide context to
problems that can threaten the integrity of your models.
Holistic control over ML models is key to sustaining a
high-yield AI environment.

DataRobot MLOps provides world-class governance and
scalability for model deployment. Models across the
organization, regardless of where they were built, can
be supervised and managed under one single platform.
In addition to DataRobot models, open source models
deployed outside of DataRobot MLOps can also be
managed and monitored by the DataRobot platform.

It is not enough to just monitor performance and log
errors. To get a complete understanding of the internal
state of your AI/ML system, you also need visibility into
prediction requests and the ability to slice and dice
prediction data over time. Not knowing the context of a
performance issue delays the resolution, as the user will
have to diagnose via trial and error, which is problematic
for business-critical models.

This is a key difference between model monitoring and
model observability: model monitoring exposes what the
problem is; model observability helps you understand why the
problem occurred. Both must go hand in hand.

With model observability capabilities, DataRobot MLOps
users gain full visibility and the ability to track information
regarding service, drift, prediction and training data, as
well as custom metrics that are relevant to your business.
DataRobot customers now have enhanced visibility into
hundreds of models across the organization.
Model Observability with Custom Metrics
To quantify how well your models are doing, DataRobot
provides you with a comprehensive set of data science
metrics, from the standards (Log Loss, RMSE) to the
more specific (SMAPE, Tweedie Deviance). But many of
the things you need to measure for your business are
hyper-specific to your unique problems and opportunities:
specific business KPIs or data science secrets. With
DataRobot Custom Metrics, you can monitor details
specific to your business.
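To make the distinction concrete, the sketch below puts two of the standard metrics named above (RMSE and SMAPE) next to a business-specific one. It is an illustration only, not DataRobot's Custom Metrics API: the function `forecast_miss_cost`, its cost parameters, and the sample demand figures are all hypothetical, chosen to resemble an inventory-forecasting use case.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, a standard regression metric."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        # Treat a pair of zeros as a perfect prediction instead of dividing by zero.
        total += 0.0 if denom == 0 else abs(a - p) / denom
    return 100 * total / len(actual)


def forecast_miss_cost(actual, predicted, cost_short=5.0, cost_over=1.0):
    """Hypothetical business KPI: money lost to under- and over-forecasting.

    cost_short and cost_over are assumed per-unit costs, not real figures.
    """
    cost = 0.0
    for a, p in zip(actual, predicted):
        if p < a:
            cost += (a - p) * cost_short  # demand we could not serve
        else:
            cost += (p - a) * cost_over   # stock we did not need
    return cost


# Assumed sample data: units of demand per day and the model's forecasts.
actual = [100, 120, 80, 90]
predicted = [110, 100, 80, 95]

print("RMSE:", round(rmse(actual, predicted), 3))
print("SMAPE (%):", round(smape(actual, predicted), 3))
print("Forecast miss cost:", forecast_miss_cost(actual, predicted))
```

The asymmetric costs are the point of the example: two models with the same RMSE can differ widely on a KPI that penalizes shortages more than surplus.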
Challenger Insights for Multiclass and External Models
After DataRobot has delivered an optimal model,
Production Lifecycle Management capabilities of the
platform help ensure that the currently deployed model
will always be the best one even as the world changes
around it. MLOps delivers automated strategies to keep
production models at peak performance, regardless of
external conditions.

For example, DataRobot Data Drift and Accuracy
Monitoring detects when reality differs from the situation
when the training dataset was created and the model
trained. Meanwhile, DataRobot can continuously train
challenger models based on more up-to-date data.

Once a challenger is detected to outperform the current
champion model, the DataRobot AI platform notifies you
about changing to this new candidate model.

Business processes probably require you to verify this
suggestion. Is this automatically created model actually
better than the current champion, and reliably so? To
facilitate this decision, the DataRobot platform provides
Challenger Insights, a deep but intuitive analysis of how
well the challenger performs and how it stacks up against
the champion. This also shows how the models compare
on standard performance metrics and informative
visualizations like Dual Lift.
Manage changing market conditions. With the
DataRobot AI platform, you can see predicted values and
accuracy for various metrics for the champion, as well as
any challenger models.
On top of that, DataRobot MLOps offers Challenger
Insights for External Models, which allows you to
leverage DataRobot MLOps to monitor already existing
and deployed models, while DataRobot will construct
challengers for those models in the background. If a
DataRobot challenger model manages to beat the
external model, the Challenger Insights capability allows
you to carefully compare your own models against the
candidate produced by the DataRobot AI platform.
Compare performance of multiple models
Clearly know when your challenger beats your
champion. DataRobot Challenger Insights includes
a rich set of performance metrics, from standards
such as Log Loss and RMSE to the more specialized
metrics DataRobot uses for specific problems. Here the
DataRobot view shows that the challenger beats the
champion on some metrics, but not all.
Challenger and champion comparison insights
DataRobot offers more in-depth analysis in Challenger
Insights, including Dual Lift, ROC and Prediction Differences.
In the above illustrated case, DataRobot shows that the
challenger automatically retrained via AutoML handily beats
the champion on key metrics.
Champion and challenger ROC comparison
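A champion and challenger comparison of the kind described above can be sketched in a few lines. This is a simplified illustration, not how Challenger Insights is implemented: the `compare` helper, the sample labels, and both sets of predicted probabilities are hypothetical, and the two metrics are computed by hand so that the example has no dependencies. The data is chosen so that the challenger wins on one metric and loses on the other.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Binary log loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def rmse(y_true, y_prob):
    """Root mean squared error between labels and predicted probabilities."""
    squared = [(y - p) ** 2 for y, p in zip(y_true, y_prob)]
    return math.sqrt(sum(squared) / len(squared))


def compare(y_true, champion, challenger, metrics):
    """Score both models on every metric; lower is better for all metrics used here."""
    report = {}
    for name, metric in metrics.items():
        champ_score = metric(y_true, champion)
        chall_score = metric(y_true, challenger)
        if chall_score < champ_score:
            winner = "challenger"
        elif champ_score < chall_score:
            winner = "champion"
        else:
            winner = "tie"
        report[name] = (champ_score, chall_score, winner)
    return report


# Hypothetical data: the champion is very confident, and badly wrong once.
y_true = [1, 1, 1, 0]
champion = [0.99, 0.99, 0.99, 0.999]
challenger = [0.45, 0.45, 0.45, 0.55]

report = compare(y_true, champion, challenger, {"Log Loss": log_loss, "RMSE": rmse})
for name, (champ_score, chall_score, winner) in report.items():
    print(f"{name}: champion={champ_score:.4f} challenger={chall_score:.4f} -> {winner}")
```

Because Log Loss punishes a single confident mistake far more than RMSE does, the two metrics disagree here, which is why a replacement decision usually rests on several metrics rather than one.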
Visualize Data Drift Over Time
to Maintain Model Integrity
Data drift is a key performance metric that data scientists
should track in order to maintain the high-quality results
they expect from a model. Data drift occurs when input
data changes over time and becomes significantly
different from the data that was used during the training
and validation stages of model development. When this
type of drift occurs, your model is at risk of degradation,
meaning you can no longer trust its predictions.
In addition to being alerted when data drift has occurred,
you need to understand how the drift score has changed
in order to get a deeper understanding of the cause and
impact of this drift.
Data drift can occur for a variety of reasons, including
seasonality, change in prediction values, or even different
volumes of predictions. The corrective action you take will
depend on the cause and context of the drift. Therefore,
you need to fully understand why and how drift occurred,
which is the ultimate goal of observability.
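One common way to turn "input data has become different from training data" into a number is the Population Stability Index (PSI). The sketch below is a minimal illustration of that technique for a single numeric feature. The text does not say which drift statistic DataRobot computes, so PSI is an assumption here, as are the function name `psi`, the bin count, and the sample data.

```python
import math


def psi(baseline, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bins are equal-width over the baseline's range; values outside that
    range are counted in the first or last bin.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant baseline

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    base_p = proportions(baseline)
    cur_p = proportions(current)
    return sum((c - b) * math.log(c / b) for b, c in zip(base_p, cur_p))


# Assumed data: a training sample and a scoring sample shifted upward by 30.
training = list(range(100))
scoring = [v + 30 for v in training]

print("PSI, no drift:", psi(training, training))
print("PSI, shifted: ", round(psi(training, scoring), 3))
```

A PSI of zero means the two binned distributions are identical. Larger values mean a larger shift, and the threshold that counts as actionable is a choice to make per feature and per use case.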
REPORT
The Business Value of MLOps
Discover the most impactful benefits
of MLOps tools and processes for
different types of organizations.
Drift over time
DataRobot MLOps Offers User-Friendly Visuals to Track Data Drift Over Time
The example above shows drift (y-axis) over time of
prediction (x-axis), allowing you to easily track trends. The
gray dotted line is the acceptable threshold for drift. You
can easily scan which predictions surpass this threshold
and at what time. Additionally, the gray bars at the bottom
of the chart showcase the volume of predictions so that
you can understand how many predictions were impacted
by drift. Users can slice and dice drift information by
choosing different features to investigate drift.

With the interactive ability to compound this information,
you can understand why drift is happening and quickly
take appropriate action before it impacts the business.
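The logic behind such a chart (a drift score per time bucket, a threshold line, and prediction volume per bucket) can be sketched as follows. The `DriftPoint` record, the `flag_drift` helper, the threshold of 0.15, and all of the values are hypothetical; the drift scores are assumed to have been computed already by whatever statistic is in use.

```python
from dataclasses import dataclass


@dataclass
class DriftPoint:
    period: str          # label of the time bucket, here an ISO date
    drift_score: float   # drift statistic computed for that bucket
    n_predictions: int   # prediction volume in that bucket


def flag_drift(points, threshold):
    """Return the buckets above the threshold and the share of predictions they cover."""
    flagged = [p for p in points if p.drift_score > threshold]
    total = sum(p.n_predictions for p in points)
    impacted = sum(p.n_predictions for p in flagged)
    share = impacted / total if total else 0.0
    return flagged, share


# Assumed sample data for four daily buckets.
points = [
    DriftPoint("2023-01-01", 0.05, 1000),
    DriftPoint("2023-01-02", 0.08, 1200),
    DriftPoint("2023-01-03", 0.22, 300),
    DriftPoint("2023-01-04", 0.31, 2500),
]

flagged, share = flag_drift(points, threshold=0.15)
for p in flagged:
    print(f"{p.period}: drift {p.drift_score} over threshold, {p.n_predictions} predictions")
print(f"Share of predictions made during drift: {share:.0%}")
```

Reading volume together with the score matters: a bucket with a high score but few predictions affects the business less than a moderately drifted bucket that carries most of the traffic.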
Drill Down into Drift for Rapid Model Diagnostics
A key challenge in investigating data drift is the lack of
details available to the user. Traditionally, drift is tracked
for top features by comparing scoring data to training data.
Drift can also be viewed over time to identify general drift
trends. To dive deeper into the patterns and causes of drift,
MLOps users need to be able to compare drift between
two scoring data segments (in addition to the traditional
comparison between scoring and training data), for any or
all features, and across any specified time period.

More and more organizations see the need for ad hoc
drift deep dive, especially as world economic conditions
continue to impact models at an alarming rate. Patterns
in data are changing faster than data science teams can
keep up, costing them time and visibility into deployments.
Drift drill down solves this data science challenge so that
organizations can maintain AI-driven business results.

DataRobot MLOps users can compare drift of selected
features between two scoring segments of a model (or
scoring and training segments), for any time period, and
view contextual information such as prediction value over
time to further support their investigation.
As highlighted in the DataRobot interface above, the Data
Drift tab is enhanced with a Drill Down section for users
to visualize drift details. Users can configure their own
display settings to select a model, date range of interest,
and time granularity. This is important as data drift can
look different at different time granularities; drift can
happen at any time and at any rate.
For example, if a model has been in production for a
year with little drift, but has only begun drifting at an
increasing rate in the last week, the overall drift view may
not represent this imminent problem. Zooming into that
last week will help the user understand how quickly data is
drifting and whether or not it’s a cause for concern.
Interactive drift drill down
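The effect of time granularity described above is easy to reproduce with a toy series. In this sketch the daily drift scores are invented for illustration (51 quiet weeks followed by one week of rising drift), and `bucket_means` is a hypothetical helper, not part of any DataRobot interface.

```python
from statistics import mean


def bucket_means(daily_scores, bucket_size):
    """Average consecutive daily drift scores into buckets of bucket_size days."""
    return [
        mean(daily_scores[i:i + bucket_size])
        for i in range(0, len(daily_scores), bucket_size)
    ]


# Hypothetical series: 357 quiet days, then drift climbing over the final week.
daily = [0.02] * 357 + [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35]

yearly_view = mean(daily)
weekly_view = bucket_means(daily, 7)

print("Whole-year average drift:", round(yearly_view, 4))
print("Last four weekly averages:", [round(w, 4) for w in weekly_view[-4:]])
print("Final week, day by day:", daily[-7:])
```

The whole-year average stays close to the quiet baseline, the weekly view exposes a jump in the final bucket, and the daily view shows that the drift is still accelerating.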
“You might think that overall, the model’s features drifted
relatively little in production, but in reality, the model’s
drift statistics might be fluctuating quite a bit up and
down. Or there might be a concerning trend beginning to
develop over the past week that you want to keep an eye
on. That insight requires looking at specific time slices.
Granular time splits show you the true picture,”
Brian Bell
Senior Director, Product Management, DataRobot MLOps Strategy Lead

Without the ability to zoom into granular time slices,
differences in drift patterns may get lost in the overall
analysis. DataRobot drift drill down capability allows
data scientists to run quick sanity checks, investigate
accelerating or decelerating patterns in drift, and control
the level of granularity of the visuals.

The DataRobot AI platform offers fast and intuitive drift
drill down, as we focus on analyzing your data across
different dimensions in real time to answer data science
questions. You can change the parameters of analysis and
get multiple insights quickly from the user interface.
Embrace Large Scale
with Confidence
While model observability and deeper understanding of
your model insights are crucial for longevity of AI projects,
it’s important to remember that scale is where the value of
machine learning models is realized. For example, if you
have a model that predicts warehouse capacity for one
store, what about global capacity? What if we can add
more segments and conditions to these parameters? Does
your system handle billions of predictions and ensure
that your models are trustworthy and data is secured?
Organizations that see more value from AI want to apply it
to more use cases, which creates a scalability challenge.
Are you making millions of predictions daily or hourly?
Do you need to ensure that you have a top-performing
model in production without sharing sensitive data? You
can aggregate prediction statistics much faster while
controlling the governance and security of your sensitive
data. There is no need to submit entire prediction requests
to the DataRobot AI Cloud platform to get drift and
accuracy monitoring data.
DataRobot Large Scale Monitoring allows you to
access aggregated prediction statistics. This feature
will compute some DataRobot monitoring calculations
outside of DataRobot and send the summary metadata
to DataRobot MLOps. It will let you independently control
the scale and handle billions of rows per day. These
capabilities, combined with the incredible granularity
and observability enabled by MLOps, allow organizations
to observe, diagnose, and mitigate any issues with
production models, no matter the scale.
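The general idea of computing monitoring statistics locally and sending only a summary can be sketched with mergeable histograms. This is a conceptual illustration under stated assumptions, not the DataRobot Large Scale Monitoring implementation or its API: `BIN_EDGES`, `summarize`, and `merge` are hypothetical names, and the sketch stops short of transmitting anything.

```python
from collections import Counter

# Hypothetical bin edges for one numeric feature; values outside the range
# are counted in the first or last bin.
BIN_EDGES = [0, 25, 50, 75, 100]


def summarize(batch):
    """Reduce a batch of raw feature values to per-bin counts.

    Only the counts are returned, so raw rows never leave the local system.
    """
    counts = Counter()
    for value in batch:
        # Number of interior edges at or below the value gives its bin index.
        bin_index = sum(value >= edge for edge in BIN_EDGES[1:-1])
        counts[bin_index] += 1
    return counts


def merge(summaries):
    """Combine per-batch summaries; the result equals summarizing all rows at once."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total


# Assumed sample data: two batches scored on two different workers.
batch_a = [10, 25, 60]
batch_b = [99, 150, -5]

payload = dict(merge([summarize(batch_a), summarize(batch_b)]))
print("Summary that would be sent instead of raw rows:", payload)
```

Because bin counts add up across batches, each worker can summarize its own traffic independently, and the size of what is shared depends on the number of bins rather than the number of rows.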
Going Forward
By implementing DataRobot MLOps, organizations can
quickly and efficiently stand up a center of excellence
(CoE) for machine learning operations and production AI.
MLOps allows organizations to:
• Put ML into production quickly through
repeatable, scalable processes.
• Observe, diagnose, and mitigate issues with
production models on an enterprise scale.
• Optimize outcomes, reduce costs, and maximize
the value of their production models.
With DataRobot MLOps, organizations can realize
significant ROI by spending less time and resources
on infrastructure, model deployment, and model
management, while ensuring resilience and observability
for their AI projects throughout their whole lifecycle.
To learn more about DataRobot’s MLOps capabilities visit
datarobot.com/platform/mlops/.
To schedule a demo, visit
datarobot.com/lp/request-demo/.
DataRobot is the leader in Value-Driven AI – a unique and collaborative approach to AI that
combines our open AI platform, deep AI expertise and broad use-case implementation to
improve how customers run, grow and optimize their business. The DataRobot AI Platform
is the only complete AI lifecycle platform that interoperates with your existing investments
in data, applications and business processes, and can be deployed on prem or in any
cloud environment. DataRobot and our partners have a decade of world-class AI expertise
collaborating with AI teams (data scientists, business and IT), removing common blockers
and developing best practices to successfully navigate projects that result in faster time
to value, increased revenue and reduced costs. DataRobot customers include 40% of the
Fortune 50, 8 of top 10 US banks, 7 of the top 10 pharmaceutical companies, 7 of the top
10 telcos, 5 of top 10 global manufacturers.
Learn more at datarobot.com.
© 2023 DataRobot, Inc. All rights reserved. DataRobot and the DataRobot logo are trademarks of DataRobot, Inc.
All other marks are trademarks or registered trademarks of their respective holders.