Shobhit Srivastava
Concept drift in Machine Learning
“The pessimist complains about the wind; the optimist expects it to
change; the realist adjusts the sails.” - William Arthur Ward
Everything changes with time, and data is no exception. Changes in data
degrade a machine learning model’s test performance over time, and
ultimately the wrong predictions coming out of the model can hurt its
business value.
The relationship between the input attributes and the output label does not
stay static; it changes with time, which hurts model performance because the
model cannot capture the new underlying pattern present in the new data.
This effect is termed concept drift in machine learning.
In this article, I will provide a brief overview of this concept, which comes up
quite frequently in machine learning and which every practitioner should be
aware of.
Here’s a brief outline of the points I will cover in this article:
What is concept drift?
How is the concept related to the data science life cycle?
Why do we need to monitor this effect?
How to address the issue?
Conclusion.
What is concept drift?
Concept drift is an effect that degrades a machine learning model’s
performance over time. The degradation happens because the underlying
pattern in the new data on which the model is tested differs from the pattern
in the data on which it was trained. Such a change can arise, for example,
from shifts in customers’ product-buying patterns or from weather
parameters changing over time.
We all would be quite familiar with this basic function concept:
Y= f(X)
Here we have a function f that captures the pattern, or relationship, between
the independent variable X and the dependent variable Y.
But when this pattern fades, the model gives out wrong outputs and becomes
effectively garbage.
This is where concept drift comes into effect.
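To make the idea concrete, here is a minimal, hypothetical sketch (my own illustration, not from any library): a one-parameter model is fit while the relationship is roughly y = 2x, and then the relationship drifts to y = -2x. The model’s error on the new data explodes even though nothing about the model itself changed.

```python
import random

random.seed(0)

# Old regime: y = 2x + noise. New regime: y = -2x + noise (the
# relationship between X and Y has drifted).
def make_data(slope, n=200):
    xs = [random.uniform(-1, 1) for _ in range(n)]
    return [(x, slope * x + random.gauss(0, 0.1)) for x in xs]

old_data = make_data(slope=2.0)
new_data = make_data(slope=-2.0)

# "Train" a one-parameter model: the least-squares slope through
# the origin, a stand-in for any real training routine.
def fit_slope(data):
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / sxx

def mse(slope, data):
    return sum((y - slope * x) ** 2 for x, y in data) / len(data)

model = fit_slope(old_data)
print(f"error on old data: {mse(model, old_data):.3f}")  # small
print(f"error on new data: {mse(model, new_data):.3f}")  # large
```

The model still represents f perfectly well for the old pattern; it is the pattern itself that moved.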
How is the concept related to the data science life cycle?
We all are aware that the data science project is executed in various phases,
right? Starting with:
Problem identification and its business context.
Data set collection.
Data exploration and feature engineering.
Data visualizations.
Model training and development.
Model testing and deployment.
Model retraining and update process.
I assume we are all familiar with the first six phases. Concept drift comes
into play in the last phase, i.e., model retraining and updating. This is the
stage where the model is deployed at the customer’s end and is tested
frequently. To catch the model’s deviation, its predictions are monitored and
checked to verify that they are still correct, so that business productivity is
maintained.
Why do we need to monitor this effect?
We need to monitor this effect because it can cause huge problems for the
business the model serves. Wrong predictions can cost a company its
reputation as well as its loyal customers, since the model could be providing
recommendations that no longer match the users’ new buying patterns.
Take the Corona pandemic as an example: people experienced a major shift
in their buying patterns, restricting their purchases to essentials only, a
change the model is unaware of. The model therefore keeps recommending
products that don’t match the customers’ choices, and the business can lose a
major chunk of revenue.
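As a hedged sketch of what such monitoring could look like (the DriftMonitor class, its window size, and its threshold are all my own invention, not a standard API), one can track accuracy over a sliding window of recent predictions and raise a flag when it falls too low:

```python
from collections import deque

class DriftMonitor:
    """Track accuracy over the last `window` predictions and flag
    suspected drift when accuracy drops below `threshold`."""

    def __init__(self, window=100, threshold=0.8):
        self.results = deque(maxlen=window)
        self.threshold = threshold

    def record(self, predicted, actual):
        self.results.append(predicted == actual)

    def accuracy(self):
        return sum(self.results) / len(self.results)

    def drift_suspected(self):
        # Only alert once the window is full, to avoid noisy early alarms.
        return (len(self.results) == self.results.maxlen
                and self.accuracy() < self.threshold)

monitor = DriftMonitor(window=100, threshold=0.8)

# The model is right 95% of the time at first...
for i in range(100):
    monitor.record(1, 1 if i % 20 else 0)
print(monitor.drift_suspected())  # False: accuracy is 0.95

# ...then only 50% of the time after the buying pattern shifts.
for i in range(100):
    monitor.record(1, i % 2)
print(monitor.drift_suspected())  # True: accuracy fell to 0.50
```

In production this check would run against a live stream of predictions and ground-truth labels, and the alert would trigger one of the remedies discussed next.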
How to address the issue?
There are many methods to deal with this issue.
1. Do nothing (maintain a single static model).
No, I am not joking! We can simply assume that the underlying pattern in the
data doesn’t change over time, which in many cases is true. That lets us focus
on building one best model for future predictions and move on to other
projects.
2. Periodically re-fit the model.
This can be more effective than the first option. We retrain our outdated
model on the new incoming data, so that it captures the new underlying
pattern in the data set.
This saves the model from becoming ‘garbage’ and keeps it delivering
business value.
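A minimal sketch of this idea, assuming a simple slope-fitting routine stands in for real training: keep only the most recent observations and re-fit at a fixed interval, so the model always reflects the latest relationship in the data.

```python
from collections import deque

RETRAIN_EVERY = 50   # re-fit after this many new observations
WINDOW = 200         # train only on the most recent observations

recent = deque(maxlen=WINDOW)

def fit_slope(data):
    # Least-squares slope through the origin: a stand-in for any
    # real training routine.
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / sxx if sxx else 0.0

# A stream of observations following the current pattern y = 3x.
stream = [(x / 100, 3 * x / 100) for x in range(1, 301)]

model = None
for i, (x, y) in enumerate(stream):
    recent.append((x, y))
    if (i + 1) % RETRAIN_EVERY == 0:
        model = fit_slope(recent)  # periodic re-fit on fresh data

print(round(model, 2))  # 3.0: the model tracks the current slope
```

The window size and retraining interval are tuning knobs: a short window adapts quickly but is noisy, a long one is stable but slow to react to drift.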
3. Periodically update the model.
Instead of re-fitting the outdated model on the new data set, we can train and
deploy a brand-new model from time to time, whenever our testing shows
that the previous model is giving wrong predictions.
This method can be more effective, since a new model can yield more
accurate predictions, but training and deploying it take significant time.
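One possible sketch, with an illustrative error threshold of my own choosing: evaluate the deployed model on recent data, and train and swap in a replacement only when the error has grown too large.

```python
def evaluate(model, data):
    # Mean squared error of a slope-style model y = m * x.
    return sum((y - model * x) ** 2 for x, y in data) / len(data)

def train(data):
    # Least-squares slope: a stand-in for full model training.
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / sxx

current_model = 2.0  # deployed before the drift (y = 2x)

# Recent data follows the drifted pattern y = -2x.
recent = [(x / 10, -2.0 * x / 10) for x in range(1, 51)]

ERROR_THRESHOLD = 0.5
if evaluate(current_model, recent) > ERROR_THRESHOLD:
    # The old model has degraded: train and deploy a replacement.
    current_model = train(recent)

print(round(current_model, 6))  # -2.0: the replacement fits the new data
```

Unlike periodic re-fitting, nothing happens while the model still performs well, so retraining cost is only paid when testing actually detects a problem.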
4. Ensemble a new model with the old one.
In this method, we ensemble one or more new models trained on the new
data set with the outdated model. The new models work alongside the old
one, correcting its wrong predictions.
This approach is a bit more complex, but more effective than the ones
mentioned above.
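A minimal sketch of such an ensemble, assuming two simple models represented as plain functions and a weight I picked for illustration: average the old and new models’ predictions, weighting the new model more heavily so it corrects the old one.

```python
def make_ensemble(old_model, new_model, new_weight=0.7):
    # Weighted average of the two models' predictions; the new model
    # dominates, but the old one still contributes.
    def predict(x):
        return (1 - new_weight) * old_model(x) + new_weight * new_model(x)
    return predict

def old_model(x):
    return 2.0 * x   # fit before the drift

def new_model(x):
    return -2.0 * x  # fit on post-drift data

ensemble = make_ensemble(old_model, new_model, new_weight=0.7)
print(round(ensemble(1.0), 2))  # -0.8: pulled toward the new model
```

In practice the weights could themselves be tuned on recent data, so the ensemble gradually shifts trust from the old model to the new one as the drift is confirmed.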
[Edit] If you want to dive deeper into the topic, please go through this article
at neptune.ai.
Conclusion
All right, that’s it for today. I hope we have learned some new concepts. This
effect is often ignored by junior data scientists, who think their work is over
once a project is complete, but that’s not the case: their responsibilities don’t
end there. To maintain our business value, we must monitor and track what
value we are adding to the customer’s experience and whether the
recommender is still providing good recommendations.
For more articles like this, visit here.
If this article has benefited you in any way, please support me here:
https://www.buymeacoffee.com/shobhitsri
Please feel free to comment below if you are unclear on any point; I will
reply as soon as possible. You can connect with me on LinkedIn.
Thank you for reading. Have a good day.