Machine Learning Life cycle
What is Machine Learning?
• In the real world, we are surrounded by humans who
can learn from their experiences, and we have
computers or machines which work on our
instructions.
• But can a machine also learn from experiences or
past data like a human does?
• So here comes the role of Machine Learning.
Introduction to Machine Learning
• A subset of artificial intelligence known as machine
learning focuses primarily on the creation of
algorithms that enable a computer to
independently learn from data and previous
experiences.
• Arthur Samuel first used the term "machine
learning" in 1959.
• Machine learning algorithms create a mathematical
model that, without being explicitly programmed,
aids in making predictions or decisions with the
assistance of sample historical data, or training data.
• For the purpose of developing predictive models,
machine learning brings together statistics and
computer science.
• Machine learning either constructs or uses
algorithms that learn from historical data.
How does Machine Learning work
Features of Machine Learning:
• Machine learning uses data to detect various
patterns in a given dataset.
• It can learn from past data and improve
automatically.
• It is a data-driven technology.
• Machine learning is similar to data mining in that
it also deals with huge amounts of data.
• Need for Machine Learning
• Rapid growth in the production of data
• Solving complex problems, which are difficult for a
human
• Decision making in various sectors, including
finance
• Finding hidden patterns and extracting useful
information from data.
Machine learning Life cycle
• The machine learning life cycle is a cyclic process for building an
efficient machine learning project.
• The main purpose of the life cycle is to find a solution to
the problem or project.
• Machine learning life cycle involves seven major steps,
which are given below:
• Gathering Data
• Data preparation
• Data Wrangling
• Data Analysis
• Train the model
• Test the model
• Deployment
1. Gathering Data:
• Data Gathering is the first step of the machine
learning life cycle.
• The goal of this step is to identify and obtain all
the data related to the problem.
• This step includes the below tasks:
• Identify various data sources
• Collect data
• Integrate the data obtained from different
sources
• By performing the above tasks, we get a coherent set
of data, also called a dataset.
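The three tasks above (identify sources, collect, integrate) can be sketched in a few lines. This is only an illustration: the two sources, their field names, and the shared customer_id key are hypothetical, not taken from any real system.

```python
# Hypothetical records collected from two different sources.
crm_records = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 51},
]
web_records = [
    {"customer_id": 1, "visits": 12},
    {"customer_id": 3, "visits": 4},
]

def integrate(*sources):
    """Merge records from several sources into one dataset keyed by customer_id."""
    merged = {}
    for source in sources:
        for record in source:
            # Create a row the first time an id is seen, then add its fields.
            row = merged.setdefault(record["customer_id"], {})
            row.update(record)
    # Return rows in a stable order so the dataset is reproducible.
    return [merged[key] for key in sorted(merged)]

dataset = integrate(crm_records, web_records)
```

Rows that appear in only one source keep only the fields that source provides, which is one way missing values enter a dataset.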
2. Data preparation
• This step can be further divided into two processes:
• Data exploration:
It is used to understand the nature of data that we
have to work with. We need to understand the
characteristics, format, and quality of data.
A better understanding of data leads to an effective
outcome. In this step, we look for correlations, general
trends, and outliers.
• Data pre-processing:
The next step is to preprocess the data for
analysis.
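As a rough illustration of data exploration, the sketch below measures a correlation and flags outliers using only the Python standard library. The sample numbers, the function names, and the z-score threshold of 2 are assumptions chosen for the example.

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = mean([(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)])
    return covariance / (pstdev(xs) * pstdev(ys))

def find_outliers(values, z_threshold=2.0):
    """Return values lying more than z_threshold standard deviations from the mean."""
    centre, spread = mean(values), pstdev(values)
    return [v for v in values if abs(v - centre) / spread > z_threshold]

# Hypothetical sample: scores rise steadily with hours studied.
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [2, 4, 6, 8, 10]
correlation = pearson(hours_studied, exam_scores)  # close to 1.0: strong positive trend

# Hypothetical sample with one value far from the rest.
daily_orders = [10, 12, 11, 13, 12, 95]
outliers = find_outliers(daily_orders)  # flags the value 95
```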
3. Data Wrangling
• Data wrangling is the process of cleaning and
converting raw data into a useable format.
• It is the process of cleaning the data, selecting the
variables to use, and transforming the data into a proper
format to make it more suitable for analysis in the
next step.
• In real-world applications, collected data may have
various issues, including:
• Missing Values
• Duplicate data
• Invalid data
• Noise
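A minimal sketch of how the first three issues might be handled, assuming a hypothetical table of names and ages. Dropping bad rows is only one possible policy; filling in missing values is another.

```python
# Hypothetical raw rows showing the issues listed above.
raw_rows = [
    {"name": "Asha", "age": 29},
    {"name": "Ben", "age": None},   # missing value
    {"name": "Asha", "age": 29},    # duplicate of the first row
    {"name": "Chen", "age": -5},    # invalid value
    {"name": "Dia", "age": 41},
]

def wrangle(rows, min_age=0, max_age=120):
    """Drop rows with missing or invalid ages, then remove exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        age = row["age"]
        if age is None:
            continue  # missing value
        if not min_age <= age <= max_age:
            continue  # invalid value
        key = (row["name"], age)
        if key in seen:
            continue  # duplicate data
        seen.add(key)
        cleaned.append(row)
    return cleaned

clean_rows = wrangle(raw_rows)
```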
4. Data Analysis
• Now the cleaned and prepared data is passed on to
the analysis step. This step involves:
• Selecting analytical techniques
• Building models
• Reviewing the results
5. Train Model
• We use datasets to train the model using various
machine learning algorithms.
• Training a model is required so that it can
understand the various patterns, rules, and features.
6. Test Model
Testing the model determines its percentage
accuracy against the requirements of the
project or problem.
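Steps 5 and 6 can be sketched together: fit a model on one part of the data, then report percentage accuracy on a held-out part. The data is hypothetical, and the "model" is a deliberately tiny stand-in for a real learning algorithm: a single threshold halfway between the two class means.

```python
# Hypothetical (hours_studied, passed) pairs; 1 means passed, 0 means failed.
data = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1),
        (2.5, 0), (8.5, 1), (4.5, 1), (6, 1)]
train_set, test_set = data[:6], data[6:]

def train(rows):
    """Learn a threshold halfway between the mean of each class."""
    zeros = [x for x, y in rows if y == 0]
    ones = [x for x, y in rows if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def predict(threshold, x):
    return 1 if x > threshold else 0

def accuracy_percent(threshold, rows):
    """Percentage of rows whose predicted label matches the true label."""
    correct = sum(predict(threshold, x) == y for x, y in rows)
    return 100 * correct / len(rows)

threshold = train(train_set)                       # learned from training data only
accuracy = accuracy_percent(threshold, test_set)   # measured on unseen data
```

Here one of the four test points falls on the wrong side of the learned threshold, so the accuracy is below 100%, which is the kind of gap testing is meant to reveal.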
7. Deployment
The last step of the machine learning life cycle is
deployment, where we deploy the model in a
real-world system.
Applications of Machine learning
1. E-commerce & Recommendations:
ML powers recommendation systems on platforms
like Amazon, Netflix, and YouTube, suggesting
products, movies, or videos based on user
preferences and viewing history.
This personalization enhances user experience and
drives sales by connecting users with relevant
content.
2. Social Media:
Social media platforms like Facebook and
Instagram utilize ML for content filtering, friend
suggestions, and targeted advertising.
ML algorithms analyze user interactions to
personalize news feeds and identify potential
fraudulent activities.
3. Virtual Assistants:
Virtual assistants like Siri, Alexa, and Google
Assistant rely on ML for natural language
processing, enabling them to understand and
respond to voice commands.
ML allows these assistants to learn user preferences
and provide increasingly personalized and helpful
responses.
4. Transportation:
Self-driving cars utilize ML to process data from
cameras, sensors, and GPS to navigate roads,
identify objects, and make driving decisions.
ML also plays a role in traffic management
systems, predicting traffic patterns and optimizing
routes to improve traffic flow.
5. Finance:
ML algorithms are used for fraud detection in
credit card transactions and other financial
activities.
They analyze transaction patterns to identify
suspicious activity and prevent fraudulent
transactions.
ML also plays a role in stock market analysis and
prediction.
6. Healthcare:
ML is used in medical imaging analysis to detect
diseases like cancer in X-rays and other medical
images.
It can also assist in drug discovery, personalized
medicine, and patient monitoring.
7. Image and Speech Recognition:
ML powers image recognition software used for
facial recognition, object detection, and image
classification.
Speech recognition technology, used in voice
assistants and dictation software, also relies heavily
on ML.
8. Email Filtering and Spam Detection:
ML algorithms analyze email content to identify
and filter spam and phishing emails, protecting
users from harmful content.
9. Customer Service:
Chatbots powered by ML provide automated
customer support, resolving simple queries and
directing complex issues to human agents.
ML can also analyze customer feedback to improve
customer service strategies.
10. Fraud Detection:
ML algorithms are used to detect fraudulent
activities in various sectors like finance, insurance,
and e-commerce.
These algorithms analyze data patterns to identify
anomalies and potential fraudulent behavior.
Types of Machine Learning
• Machine learning is a subset of AI, which enables the
machine to automatically learn from data, improve
performance from past experiences, and make
predictions.
• Machine learning contains a set of algorithms that
work on a huge amount of data.
• Data is fed to these algorithms to train them, and on
the basis of training, they build the model & perform
a specific task.
• Based on the methods and way of learning, machine
learning is mainly divided into four types, which are:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning
1. Supervised Machine Learning
• Supervised machine learning is based on
supervision.
• It means in the supervised learning technique, we
train the machines using the "labelled" dataset, and
based on the training, the machine predicts the
output.
• Here, the labelled data specifies that the
inputs are already mapped to their outputs.
• The main goal of the supervised learning
technique is to map the input variable(x) with the
output variable(y).
• Some real-world applications of supervised learning
are Risk Assessment, Fraud Detection, Spam
filtering, etc.
• Categories of Supervised Machine Learning
• Supervised machine learning can be classified into
two types of problems, which are given below:
• Classification
• Regression
a) Classification
• Classification algorithms are used to solve
classification problems in which the output variable
is categorical, such as "Yes" or "No", Male or
Female, Red or Blue, etc.
• The classification algorithms predict the categories
present in the dataset. Some real-world examples of
classification are Spam Detection, Email
filtering, etc.
• Some popular classification algorithms are given below:
• Random Forest Algorithm
• Decision Tree Algorithm
• Logistic Regression Algorithm
• Support Vector Machine Algorithm
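To make classification concrete, here is a sketch of the simplest possible decision tree: a one-level tree (a "decision stump") that picks the single feature and threshold giving the fewest training errors. The spam features and data are hypothetical, and real decision tree algorithms use more refined split criteria than a raw error count.

```python
# Hypothetical emails described by (number_of_links, number_of_exclamation_marks).
rows = [(0, 1), (1, 0), (5, 2), (7, 1), (1, 3), (6, 0)]
labels = ["ham", "ham", "spam", "spam", "ham", "spam"]

def fit_stump(rows, labels):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            predictions = ["spam" if row[feature] > threshold else "ham" for row in rows]
            errors = sum(p != y for p, y in zip(predictions, labels))
            if best is None or errors < best[0]:
                best = (errors, feature, threshold)
    _, feature, threshold = best
    return feature, threshold

def predict(stump, row):
    """Predict a categorical label for one new row."""
    feature, threshold = stump
    return "spam" if row[feature] > threshold else "ham"

stump = fit_stump(rows, labels)
new_label = predict(stump, (4, 0))  # an unseen email with four links
```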
b) Regression
• Regression algorithms are used to solve regression
problems, in which the output variable is continuous
and depends on the input variables. The relationship
may be linear, as in linear regression, but need not be.
• These are used to predict continuous output
variables, such as market trends, weather prediction,
etc.
• Some popular Regression algorithms are given below:
• Simple Linear Regression Algorithm
• Multivariate Regression Algorithm
• Decision Tree Algorithm
• Lasso Regression
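Simple linear regression, the first algorithm in the list, can be sketched with the ordinary least-squares formulas for slope and intercept. The data points are made up and lie exactly on a line so that the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data generated from y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # continuous output for an unseen input
```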
Advantages and Disadvantages of
Supervised Learning
Advantages:
• Since supervised learning works with a
labelled dataset, we can have an exact idea
about the classes of objects.
• These algorithms are helpful in predicting the
output on the basis of prior experience.
Disadvantages:
• These algorithms can struggle with very complex
tasks.
• They may predict the wrong output if the test data
is different from the training data.
• Training the algorithm can require a lot of
computational time.
Applications of Supervised Learning
• Image Segmentation
• Medical Diagnosis
• Fraud Detection
• Spam detection
• Speech Recognition
2. Unsupervised Machine Learning
• Unsupervised learning is different from the Supervised
learning technique; as its name suggests, there is no need for
supervision.
• It means, in unsupervised machine learning, the machine is
trained using the unlabeled dataset, and the machine predicts
the output without any supervision.
• In unsupervised learning, the models are trained with the data
that is neither classified nor labelled, and the model acts on
that data without any supervision.
• The main aim of the unsupervised learning algorithm is to
group or categorize the unsorted dataset according to the
similarities, patterns, and differences.
• Machines are instructed to find the hidden patterns from the
input dataset.
Categories of Unsupervised Machine Learning
Unsupervised Learning can be further classified into
two types, which are given below:
o Clustering
o Association
1) Clustering
• The clustering technique is used when we want to
find the inherent groups from the data.
• It is a way to group the objects into a cluster such
that the objects with the most similarities remain in
one group and have fewer or no similarities with the
objects of other groups.
• An example of the clustering algorithm is grouping
the customers by their purchasing behaviour.
• Some of the popular clustering algorithms are given
below:
• K-Means Clustering algorithm
• Mean-shift algorithm
• DBSCAN Algorithm
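• Principal Component Analysis (strictly a dimensionality-
reduction technique rather than a clustering algorithm)
• Independent Component Analysis (likewise a decomposition
technique, not a clustering algorithm)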
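The K-Means idea from the list above can be sketched as two alternating steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The customer data and the starting centroids are hypothetical; real implementations usually pick starting centroids randomly or with a smarter scheme.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, iterations=10):
    """Plain K-Means: alternate assignment and centroid update steps."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has converged
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers described by (monthly_purchases, average_basket_size).
customers = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
labels, centroids = kmeans(customers, centroids=[(1, 1), (8, 8)])
```

No labels were supplied; the two groups emerge only from how similar the purchasing behaviour is.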
2) Association
• Association rule learning is an unsupervised
learning technique, which finds interesting relations
among variables within a large dataset.
• The main aim of this learning algorithm is to find
the dependency of one data item on another data
item and map those variables accordingly so that it
can generate maximum profit.
• This algorithm is mainly applied in Market Basket
analysis, Web usage mining, continuous
production, etc.
• Some popular algorithms of Association rule
learning are Apriori Algorithm, Eclat, FP-growth
algorithm.
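The sketch below is not Apriori, Eclat, or FP-growth themselves. It only computes support and confidence, the two measures such algorithms rely on when judging how strongly one item depends on another. The shopping baskets are hypothetical.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(itemset <= basket for basket in transactions)

def support(itemset):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)

bread_milk_support = support({"bread", "milk"})           # 3 of 5 baskets
bread_to_milk_confidence = confidence({"bread"}, {"milk"})  # 3 of the 4 bread baskets
```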
• Advantages and Disadvantages of Unsupervised
Learning Algorithm
• Advantages:
• These algorithms can be used for more complicated tasks
than the supervised ones because they
work on the unlabeled dataset.
• Unsupervised algorithms are preferable for various
tasks, as getting an unlabeled dataset is easier
than getting a labelled one.
Disadvantages:
• The output of an unsupervised algorithm can be
less accurate as the dataset is not labelled, and
the algorithms are not trained with the exact output in
advance.
• Working with unsupervised learning is more
difficult as it works with an unlabelled dataset that
does not map to an output.
• Applications of Unsupervised Learning
• Network Analysis: Unsupervised learning is used
in document network analysis of text data, for
example to identify plagiarism and copyright
issues in scholarly articles.
• Recommendation Systems: Recommendation
systems widely use unsupervised learning
techniques to build recommendation features
for web applications and e-commerce
websites.
• Anomaly Detection: Anomaly detection is a
popular application of unsupervised learning, which
can identify unusual data points within the dataset.
It is used to discover fraudulent transactions.
• Singular Value Decomposition: Singular Value
Decomposition or SVD is used to extract particular
information from the database. For example,
extracting information of each user located at a
particular location.
3. Semi-Supervised Learning
• Semi-Supervised learning is a type of Machine
Learning algorithm that lies between Supervised
and Unsupervised machine learning.
• It represents the intermediate ground between
Supervised (With Labelled training data) and
Unsupervised learning (with no labelled training
data) algorithms and uses the combination of
labelled and unlabeled datasets during the training
period.
• To overcome the drawbacks of supervised learning
and unsupervised learning algorithms, the concept
of Semi-supervised learning is introduced.
• The main aim of semi-supervised learning is to
effectively use all the available data, rather than
only labelled data like in supervised learning.
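One common way to use labelled and unlabelled data together is self-training (pseudo-labelling), which the text above does not name and which is assumed here only for illustration. The model labels the unlabelled point it is most sure about, adds it to the labelled set, and repeats. The one-dimensional data and the nearest-centroid model are hypothetical.

```python
def class_centres(labelled):
    """Mean value of each class in the current labelled set."""
    groups = {}
    for value, label in labelled:
        groups.setdefault(label, []).append(value)
    return {label: sum(values) / len(values) for label, values in groups.items()}

def self_train(labelled, unlabelled):
    """Repeatedly pseudo-label the unlabelled point closest to a class centre."""
    labelled = list(labelled)
    remaining = list(unlabelled)
    while remaining:
        centres = class_centres(labelled)
        candidates = []
        for value in remaining:
            # Nearest class centre gives the predicted label for this point.
            label = min(centres, key=lambda l: abs(value - centres[l]))
            candidates.append((abs(value - centres[label]), value, label))
        # The smallest distance is treated as the most confident prediction.
        _, value, label = min(candidates, key=lambda c: c[0])
        labelled.append((value, label))
        remaining.remove(value)
    return labelled

# Two labelled points and five unlabelled ones (hypothetical measurements).
labelled_data = [(1.0, "low"), (9.0, "high")]
unlabelled_data = [2.0, 3.0, 8.0, 7.0, 4.0]

result = dict(self_train(labelled_data, unlabelled_data))
```

An early wrong pseudo-label would be reinforced in later rounds, which is one reason the results of such iterations may not be stable.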
Advantages and disadvantages of Semi-supervised
Learning
Advantages:
• The algorithm is simple and easy to understand.
• It is highly efficient.
• It is used to solve drawbacks of Supervised and
Unsupervised Learning algorithms.
Disadvantages:
• Iteration results may not be stable.
• We cannot apply these algorithms to network-level
data.
• Accuracy is low.
4. Reinforcement Learning
• Reinforcement learning works on a feedback-based
process, in which an AI agent (a software
component) automatically explores its surroundings by
trial and error (hit and trial), taking actions, learning from
experience, and improving its performance.
• The reinforcement learning process is similar to a
human being; for example, a child learns various
things by experiences in his day-to-day life.
• An example of reinforcement learning is playing a
game, where the game is the environment, the moves of
the agent at each step define states, and the goal of the
agent is to get a high score. The agent receives feedback
in terms of punishments and rewards.
• Due to its way of working, reinforcement learning is
employed in different fields such as game theory,
operations research, information theory, and
multi-agent systems.
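The feedback loop described above can be sketched with tabular Q-learning, one widely used reinforcement learning method (the text does not name a specific algorithm). The "game" is a hypothetical line of five positions with the goal at the right end. As a simplification, instead of random trial and error the agent tries every action in every state in turn, which keeps the example deterministic.

```python
N_STATES = 5          # positions 0..4 on a line; position 4 is the goal
ACTIONS = (-1, +1)    # move left or move right
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (assumed values)

def step(state, action):
    """Environment: return (next_state, reward, done) for one move."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True    # reward for reaching the goal
    return next_state, -0.1, False      # small punishment for every other move

def train(sweeps=200):
    """Learn action values from the feedback the environment returns."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(sweeps):
        for state in range(N_STATES - 1):
            for action in ACTIONS:
                next_state, reward, done = step(state, action)
                best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
                # Q-learning update: move the estimate toward reward + discounted future value.
                q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
    return q

q_table = train()
# The learned policy picks, in each state, the action with the highest value.
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)]
```

After training, the policy moves right from every position, because that path collects the goal reward with the fewest step penalties.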
• Categories of Reinforcement Learning
• Reinforcement learning is categorized mainly into
two types of methods/algorithms:
• Positive Reinforcement Learning: Positive
reinforcement learning specifies increasing the
tendency that the required behaviour would occur
again by adding something. It enhances the strength
of the behaviour of the agent and positively impacts
it.
• Negative Reinforcement Learning: Negative
reinforcement learning works in the opposite way to
positive RL. It also increases the tendency that the
specific behaviour would occur again, but does so by
removing or avoiding a negative condition rather than
by adding something.
• Real-world Use cases of Reinforcement Learning
• Video Games
• Resource Management
• Robotics
• Text Mining
Advantages and Disadvantages of Reinforcement
Learning
Advantages
• It helps in solving complex real-world problems which
are difficult to be solved by general techniques.
• The learning model of RL is similar to the learning of
human beings; hence it can reach highly accurate results.
• Helps in achieving long term results.
Disadvantages
• RL algorithms are not preferred for simple problems.
• RL algorithms require huge data and computations.
• Too much reinforcement learning can lead to an
overload of states which can weaken the results.
What is Concept Learning?
• Concept learning in machine learning refers to the process
of teaching a machine to identify and recognize patterns
from specific examples or data points.
• In simple terms, concept learning involves learning a
general rule from a set of observed instances.
• For example, if you show a machine many pictures of cats,
it will learn to recognize the concept of a “cat” and apply
that knowledge to identify new cat pictures.
• Concept learning helps machines generalize from data.
Instead of memorizing each example, it creates a broader
understanding that can be applied to unseen situations.
• This ability to generalize is what makes machine learning
models so powerful.
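A classic way to learn a general rule from observed instances is the Find-S algorithm, which is not named in the text above and is used here only as an illustration. It starts from the first positive example and generalises any attribute that differs in later positive examples to "?", meaning "any value". The animal attributes are hypothetical.

```python
def find_s(examples):
    """Find-S: the most specific hypothesis consistent with the positive examples."""
    hypothesis = None
    for attributes, is_positive in examples:
        if not is_positive:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # start with the first positive example
        else:
            # Keep attributes that agree; generalise the rest to "?".
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis

def matches(hypothesis, attributes):
    """True if an instance satisfies every constraint in the hypothesis."""
    return all(h == "?" or h == a for h, a in zip(hypothesis, attributes))

# Hypothetical instances: (face, covering, size) and whether the animal is a cat.
examples = [
    (("whiskers", "fur", "small"), True),
    (("whiskers", "fur", "large"), True),
    (("no whiskers", "feathers", "small"), False),
]

cat_concept = find_s(examples)
is_cat = matches(cat_concept, ("whiskers", "fur", "medium"))  # an unseen instance
```

The learned rule ignores size, so it applies to a medium-sized animal it has never seen, which is generalisation rather than memorisation.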
Types of Concept Learning Tasks
• There are two main types of concept learning tasks in
machine learning: supervised concept learning and
unsupervised concept learning.
• Importance of Concept Learning In Machine Learning
1. Image Classification
2. Natural Language Processing (NLP)
3. Recommendation Systems
4. Fraud Detection