A
Project Report
On
MACHINE LEARNING
TAKEN AT
“INTERNSHALA”
Submitted in partial fulfilment for the award of degree of
Bachelor of Technology
in
Electronics and Communication Engineering
2021-22
Guided by: Submitted by:
Department of Electronics and Communication Engineering
Acknowledgement
I am grateful to INTERNSHALA for providing me with a quality
education and helping me understand Machine Learning
online. I would also like to thank my institute, Jaipur Engineering
College and Research Center, for giving permission and the necessary
administrative support to take up the online course, and for guiding me
throughout the course as well.
My deepest thanks go to our trainer, Mr. Sarvesh Agarwal, for his guidance,
good content, and quality education, and for making such a good course for
understanding Machine Learning. He included quizzes
and questions after several topics throughout the course so that I could
check my own learning.
Preface
The purpose of this document is to provide a conceptual
introduction to statistical or machine learning (ML) techniques for
those that would not normally be exposed to such approaches during
their typical required statistical training. Machine learning can be
described as a form of statistical analysis, often even utilizing
well-known and familiar techniques, that has a somewhat different focus than
traditional analytical practice in applied disciplines. The key notion is
that flexible, automatic approaches are used to detect patterns within
the data, with a primary focus on making predictions on future data.
Table of contents
• Introduction
• Supervised learning
• Bayesian decision theory
• Parametric methods
• Neural network or Artificial neural network
• Back-propagation
• Deep neural network or Deep learning
• Linear regression
• Logistic regression
• K-Nearest neighbors
• Random forest
• Ensemble learning
• Gradient boosted Decision trees
• Overfitting
• Underfitting
• Regularization
• L1 and L2 regularization
• Performance metrics for regression
Introduction
Machine learning is a subfield of artificial intelligence (AI). The goal of
machine learning generally is to understand the structure of data and fit that
data into models that can be understood and utilized by people.
Although machine learning is a field within computer science, it
differs from traditional computational approaches. In traditional
computing, algorithms are sets of explicitly programmed instructions
used by computers to perform calculations or solve problems.
Machine learning algorithms instead allow for computers to train on
data inputs and use statistical analysis in order to output values that
fall within a specific range. Because of this, machine learning
facilitates computers in building models from sample data in order to
automate decision-making processes based on data inputs.
Machine Learning Methods
In machine learning, tasks are generally classified into broad categories. These
categories are based on how learning is received or how feedback on the
learning is given to the system developed.
Two of the most widely adopted machine learning methods are supervised
learning, which trains algorithms on example input and output data that
is labeled by humans, and unsupervised learning, which provides the
algorithm with no labeled data so that it must find structure within its
input data. Let’s explore these methods in more detail.
Supervised Learning
In supervised learning, the computer is provided with example inputs
that are labeled with their desired outputs. The purpose of this method
is for the algorithm to be able to “learn” by comparing its actual
output with the “taught” outputs to find errors, and modify the model
accordingly. Supervised learning therefore uses patterns to predict
label values on additional unlabeled data.
A common use case of supervised learning is to use historical data to
predict statistically likely future events.
Unsupervised Learning
In unsupervised learning, data is unlabeled, so the learning algorithm
is left to find commonalities among its input data. As unlabeled data
are more abundant than labeled data, machine learning methods that
facilitate unsupervised learning are particularly valuable.
The goal of unsupervised learning may be as straightforward as
discovering hidden patterns within a dataset, but it may also have a
goal of feature learning, which allows the computational machine to
automatically discover the representations that are needed to classify
raw data.
Unsupervised learning is commonly used for transactional data. You
may have a large dataset of customers and their purchases, but as a
human you will likely not be able to make sense of what similar
attributes can be drawn from customer profiles and their types of
purchases. With this data fed into an unsupervised learning algorithm,
it may be determined that women of a certain age range who buy
unscented soaps are likely to be pregnant, and therefore a marketing
campaign related to pregnancy and baby products can be targeted to
this audience in order to increase their number of purchases.
Without being told a “correct” answer, unsupervised learning
methods can look at complex data that is more expansive and
seemingly unrelated in order to organize it in potentially meaningful
ways. Unsupervised learning is often used for anomaly detection,
for example to flag fraudulent credit card purchases, and for recommender
systems that suggest what products to buy next.
Introduction to Bayesian
Decision Theory
Whether we are building machine learning models or making
decisions in everyday life, we generally try to choose the path with the least
amount of risk. As humans, we are hardwired to take actions that
help our survival; however, machine learning models are not initially
built with that understanding. These algorithms need to be trained and
optimized to choose the best option with the least amount of risk.
Additionally, it is important to know that some risky decisions can
lead to severe consequences if they are not correct.
Bayes’ Theorem
One of the most well-known equations in statistics and
probability is Bayes’ Theorem:
P(ωᵢ | x) = p(x | ωᵢ) P(ωᵢ) / p(x)
The basic intuition is that the probability of some class or event ωᵢ
occurring, given some feature (i.e., attribute) x, is calculated from the
likelihood of the feature’s value and any prior information about the
class or event of interest. This can seem like a lot to digest, so
consider an example. Cancer detection is a two-class
problem. The first class, ω1, represents the event that a tumor is
present, and ω2 represents the event that a tumor is not present.
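The two-class tumor example can be sketched numerically as follows. This is only an illustration: the prior and likelihood values are made-up numbers, not clinical figures, and the function name `posterior` is hypothetical.

```python
def posterior(prior_w1, likelihood_w1, likelihood_w2):
    """Return P(w1 | x) for a two-class problem using Bayes' Theorem.

    prior_w1      -- P(w1), prior probability that a tumor is present
    likelihood_w1 -- p(x | w1), probability of the feature value if a tumor is present
    likelihood_w2 -- p(x | w2), probability of the feature value if no tumor is present
    """
    prior_w2 = 1.0 - prior_w1
    # Evidence p(x): total probability of observing x under both classes.
    evidence = likelihood_w1 * prior_w1 + likelihood_w2 * prior_w2
    return likelihood_w1 * prior_w1 / evidence


# Assumed numbers: 1% of patients have a tumor; the observed feature value
# appears in 90% of tumor cases and in 5% of non-tumor cases.
p_tumor = posterior(prior_w1=0.01, likelihood_w1=0.90, likelihood_w2=0.05)
print(f"P(tumor | x) = {p_tumor:.3f}")
```

With these assumed numbers the posterior is roughly 0.15: the feature is far more likely when a tumor is present, but the small prior keeps the final probability low.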
Parametric vs Nonparametric
Methods in Machine Learning
Parametric Methods
In parametric methods, we typically make an assumption with regard
to the form of the function f. For example, you could make an
assumption that the unknown function f is linear. In other words, we
assume that the function is of the form
f(X) = β₀ + β₁X₁ + … + βₚXₚ
where f(X) is the unknown function to be estimated, β₀, …, βₚ are the
coefficients to be learned, p is the number of independent variables
and X₁, …, Xₚ are the corresponding inputs.
Now that we have made an assumption about the form of the function
to be estimated and selected a model that aligns with this assumption,
we need a learning process that will help us train the
model and estimate the coefficients.
To summarize, parametric methods in Machine Learning usually take
a model-based approach where we make an assumption with respect
to the form of the function to be estimated and then select a suitable
model based on this assumption in order to estimate the set of
parameters.
The biggest disadvantage of parametric methods is that the
assumptions we make may not always be true.
For instance, you may assume that the form of the function is linear
when it is not. These methods therefore involve less flexible
algorithms and are usually used for less complex problems.
Non-Parametric Methods
On the other hand, non-parametric methods refer to a set of
algorithms that do not make any underlying assumptions with respect
to the form of the function to be estimated. And since no assumption
is being made, such methods are capable of estimating the unknown
function f that could be of any form.
Non-parametric methods can be more accurate because they are free to
fit the data points closely. However, this comes at the expense of
requiring a very large number of observations in order to estimate
the unknown function f accurately. Additionally, these methods tend
to be less efficient when it comes to training the models.
Furthermore, because they can follow the data so closely,
non-parametric methods are prone to overfitting.
Artificial Intelligence Neural
Network
The artificial neural network is one of the advances in artificial
intelligence. It is inspired by the structure of the human brain and helps
computers and machines behave more like humans. This section describes the
structure of artificial neural networks and how they
work.
What is a Neural Network?
A neural network is a system, implemented in software or hardware, that works
in a way similar to the neurons of the human brain. Neural
networks are the basis of technologies such as deep learning, which is
part of machine learning and, more broadly, of Artificial Intelligence (AI).
Artificial neural networks (ANN) are a key tool of machine
learning. These are systems inspired by the way neurons
function in the brain, and they are intended to replicate the way we humans
learn. Neural networks (NN) consist of input and output
layers, as well as one or more hidden layers containing units that
transform the input into something the output layer can use.
They are tools for finding patterns that are too numerous and complex
for programmers to extract by hand and teach the machine to
recognize.
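The layer structure described above (input, hidden, output) can be sketched as a forward pass through a very small network. The layer sizes, weights, and the choice of a sigmoid activation are assumptions made for illustration only; `dense_layer` is a hypothetical helper name.

```python
import math


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases):
    """Compute one fully connected layer: sigmoid(weights . inputs + bias) per unit."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(sigmoid(z))
    return outputs


# Hypothetical network: 2 inputs -> 2 hidden units -> 1 output unit.
hidden_weights = [[0.5, -0.4], [0.3, 0.8]]
hidden_biases = [0.0, -0.1]
output_weights = [[1.2, -0.7]]
output_biases = [0.05]

x = [1.0, 2.0]
hidden = dense_layer(x, hidden_weights, hidden_biases)   # hidden layer transforms the input
y = dense_layer(hidden, output_weights, output_biases)   # output layer uses the hidden values
print(hidden, y)
```

In a trained network the weights would be learned from data rather than written by hand.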
Backpropagation
In machine learning, backpropagation (backprop,[1] BP) is a widely
used algorithm for training feedforward neural networks.
Generalizations of backpropagation exist for other artificial neural
networks (ANNs), and for functions generally. These classes of
algorithms are all referred to generically as "backpropagation".[2]
In fitting a neural network, backpropagation computes the gradient of
the loss function with respect to the weights of the network for a
single input–output example, and does so efficiently, unlike a naive
direct computation of the gradient with respect to each weight
individually. This efficiency makes it feasible to use gradient
methods for training multilayer networks, updating weights to
minimize loss; gradient descent, or variants such as stochastic
gradient descent, are commonly used.
The backpropagation algorithm
works by computing the gradient of the loss function with respect to
each weight by the chain rule, computing the gradient one layer at a
time, iterating backward from the last layer to avoid redundant
calculations of intermediate terms in the chain rule; this is an example
of dynamic programming.[3]
The term backpropagation strictly refers
only to the algorithm for computing the gradient, not how the
gradient is used; however, the term is often used loosely to refer to
the entire learning algorithm, including how the gradient is used, such
as by stochastic gradient descent.[4] Backpropagation generalizes the
gradient computation in the delta rule, which is the single-layer
version of backpropagation, and is in turn generalized by automatic
differentiation, where backpropagation is a special case of reverse
accumulation (or "reverse mode").[5]
The term backpropagation and
its general use in neural networks was announced in Rumelhart,
Hinton & Williams (1986a), then elaborated and popularized in
Rumelhart, Hinton & Williams (1986b), but the technique was
independently rediscovered many times, and had many predecessors
dating to the 1960s.[6] A modern overview is given in
the deep learning textbook by Goodfellow, Bengio & Courville
(2016).[7]
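A minimal sketch of the chain-rule computation described above, on a deliberately tiny network (one input, one sigmoid hidden unit, one linear output, squared-error loss). The network shape, the weights, and the learning rate are assumptions chosen for illustration. The backward pass is checked against a naive finite-difference estimate computed one weight at a time.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2):
    """Forward pass of a tiny network: x -> sigmoid(w1 * x) -> w2 * h."""
    h = sigmoid(w1 * x)      # hidden activation
    y_hat = w2 * h           # linear output
    return h, y_hat


def loss(x, y, w1, w2):
    _, y_hat = forward(x, w1, w2)
    return 0.5 * (y_hat - y) ** 2


def backprop(x, y, w1, w2):
    """Gradient of the loss w.r.t. (w1, w2) for one input-output example."""
    h, y_hat = forward(x, w1, w2)
    d_yhat = y_hat - y                 # dL/dy_hat, computed at the last layer
    grad_w2 = d_yhat * h               # dL/dw2
    d_h = d_yhat * w2                  # reuse d_yhat when moving one layer back
    grad_w1 = d_h * h * (1.0 - h) * x  # sigmoid'(z) = h * (1 - h)
    return grad_w1, grad_w2


x, y, w1, w2 = 1.5, 0.8, 0.4, -0.6
g1, g2 = backprop(x, y, w1, w2)

# Naive check: perturb each weight separately and measure the change in loss.
eps = 1e-6
num_g1 = (loss(x, y, w1 + eps, w2) - loss(x, y, w1 - eps, w2)) / (2 * eps)
num_g2 = (loss(x, y, w1, w2 + eps) - loss(x, y, w1, w2 - eps)) / (2 * eps)
print(g1, num_g1)
print(g2, num_g2)

# One gradient-descent step with an assumed learning rate.
lr = 0.1
w1_new, w2_new = w1 - lr * g1, w2 - lr * g2
print(loss(x, y, w1, w2), loss(x, y, w1_new, w2_new))
```

Note that `backprop` only computes the gradient; the weight update at the end is the separate gradient-descent step that uses it.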
Deep Neural Networks
A deep neural network is a neural network with a certain level of
complexity: several hidden layers lie
between the input and output layers.
Such networks are highly proficient at modeling and processing non-linear
relationships.
Deep Belief Networks
A deep belief network is a class of Deep Neural Network that
comprises multiple layers of belief networks. Steps to train a
DBN:
With the help of the Contrastive Divergence algorithm, a layer
of features is learned from the visible units.
Next, the previously learned features are treated as visible units,
from which the next layer of features is learned.
Lastly, when the learning of the final hidden layer is
accomplished, the whole DBN is trained.
Recurrent Neural Networks
A recurrent neural network permits parallel as well as sequential
computation, and it loosely resembles the human brain (a large feedback
network of connected neurons). Because such networks can
retain important information about the inputs they
have already received, their predictions can be more precise.
Linear Regression
Linear Regression is a machine learning algorithm based on
supervised learning. It performs a regression task. Regression models
a target prediction value based on independent variables. It is mostly
used for finding out the relationship between variables and
forecasting. Regression models differ in the kind of
relationship they assume between the dependent and independent variables
and in the number of independent variables being used.
Linear regression performs the task to predict a dependent variable
value (y) based on a given independent variable (x). So, this
regression technique finds out a linear relationship between x (input)
and y(output). Hence, the name is Linear Regression.
In the figure above, X (input) is the work experience and Y (output)
is the salary of a person. The regression line is the best fit line for our
model.
Hypothesis
While training the model we are given:
x: input training data (univariate – one input variable (parameter))
y: labels to data (supervised learning)
The hypothesis is the line y = θ1 + θ2·x, where:
θ1: intercept
θ2: coefficient of x
When training the model, it fits the best line to predict the value of y
for a given value of x. The model gets the best regression fit line by
finding the best θ1 and θ2 values.
Once we find the best θ1 and θ2
values, we get the best fit line. So, when we finally use our
model for prediction, it will predict the value of y for the input value
of x.
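One common way to find the best θ1 (intercept) and θ2 (coefficient of x) is the closed-form least-squares solution, sketched below. The report does not say which fitting method the course used, so this is an assumption; the experience and salary figures are made up, and `fit_line` is a hypothetical name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = theta1 + theta2 * x for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta2 = sxy / sxx
    theta1 = mean_y - theta2 * mean_x   # intercept
    return theta1, theta2


# Made-up data: years of work experience vs. salary (in thousands).
experience = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]

theta1, theta2 = fit_line(experience, salary)
predicted = theta1 + theta2 * 6   # prediction for 6 years of experience
print(theta1, theta2, predicted)
```

Because the made-up data lie exactly on a line, the fit recovers an intercept of 25 and a slope of 5, and predicts 55 for six years of experience.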
Logistic Regression in Machine
Learning
Logistic regression is one of the most popular Machine Learning
algorithms, which comes under the Supervised
Learning technique. It is used for predicting the categorical dependent
variable using a given set of independent variables.
Logistic regression predicts the output of a categorical
dependent variable. Therefore, the outcome must be a
categorical or discrete value. It can be either Yes or No, 0 or 1,
true or False, etc. but instead of giving the exact value as 0 and
1, it gives the probabilistic values which lie between 0 and 1.
Logistic Regression is very similar to Linear Regression
except in how the two are used. Linear Regression is used for
solving regression problems, whereas Logistic Regression is
used for solving classification problems.
In Logistic Regression, instead of fitting a regression line, we fit
an "S"-shaped logistic function, whose output is bounded by the two
extreme values 0 and 1.
The curve from the logistic function indicates the likelihood of
something such as whether the cells are cancerous or not, a
mouse is obese or not based on its weight, etc.
Logistic Regression is a significant machine learning algorithm
because it has the ability to provide probabilities and classify
new data using continuous and discrete datasets.
Logistic Regression can be used to classify the observations
using different types of data and can easily determine the most
effective variables used for the classification. The below image
is showing the logistic function:
Logistic Function (Sigmoid Function):
The sigmoid function is a mathematical function used to map
the predicted values to probabilities.
It maps any real value into another value within a range of
0 and 1.
The output of logistic regression must lie between 0 and 1
and cannot go beyond these limits, so it forms an
"S"-shaped curve. This S-shaped curve is called the sigmoid function or
the logistic function.
In logistic regression, we use the concept of a threshold value
to convert the probability into a class label of either 0 or 1: values above
the threshold are assigned to class 1, and values below the threshold
are assigned to class 0.
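The sigmoid function, σ(z) = 1 / (1 + e^(−z)), and the threshold rule can be sketched as follows. The default threshold of 0.5 is a conventional choice assumed here, and `classify` is a hypothetical helper name.

```python
import math


def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(z, threshold=0.5):
    """Turn the sigmoid probability into a class label using a threshold."""
    return 1 if sigmoid(z) >= threshold else 0


for z in (-4.0, 0.0, 4.0):
    print(z, round(sigmoid(z), 3), classify(z))
```

Large negative inputs give probabilities near 0, large positive inputs give probabilities near 1, and an input of 0 gives exactly 0.5.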
Assumptions for Logistic Regression:
The dependent variable must be categorical in nature.
The independent variables should not exhibit multicollinearity.
Type of Logistic Regression:
On the basis of the categories, Logistic Regression can be classified
into three types:
Binomial: In binomial Logistic regression, there can be only
two possible types of the dependent variables, such as 0 or 1,
Pass or Fail, etc.
Multinomial: In multinomial Logistic regression, there can be
3 or more possible unordered types of the dependent variable,
such as "cat", "dog", or "sheep".
Ordinal: In ordinal Logistic regression, there can be 3 or more
possible ordered types of the dependent variable, such as "Low",
"Medium", or "High".
Steps in Logistic Regression:
To implement the Logistic Regression using Python, we will use
the same steps as we have done in previous topics of Regression.
Below are the steps:
Data Pre-processing step
Fitting Logistic Regression to the Training set
Predicting the test result
Test accuracy of the result (Creation of Confusion matrix)
Visualizing the test set result
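The first four steps above can be sketched without any external library. This is a simplified illustration, not the course's implementation: it assumes a single feature, fits the model by plain batch gradient descent, and uses made-up age and purchase data. The learning rate, epoch count, and all function names are hypothetical. The visualization step is omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def standardize(values, mean, std):
    """Pre-processing: scale a feature to zero mean and unit variance."""
    return [(v - mean) / std for v in values]


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(y=1|x) = sigmoid(b0 + b1*x) by batch gradient descent on log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        b0 -= lr * sum(errors) / n
        b1 -= lr * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1


def predict(xs, b0, b1, threshold=0.5):
    return [1 if sigmoid(b0 + b1 * x) >= threshold else 0 for x in xs]


def confusion_matrix(actual, predicted):
    """Return [[TN, FP], [FN, TP]] for binary labels 0/1."""
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


# Made-up data: age of a user and whether they purchased (1) or not (0).
train_age = [22, 25, 28, 30, 35, 40, 45, 50, 55, 60]
train_buy = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
test_age = [23, 31, 47, 58]
test_buy = [0, 0, 1, 1]

# Step 1: pre-processing (the scaler is fitted on the training set only).
mean = sum(train_age) / len(train_age)
std = math.sqrt(sum((a - mean) ** 2 for a in train_age) / len(train_age))
x_train = standardize(train_age, mean, std)
x_test = standardize(test_age, mean, std)

# Step 2: fit the model. Step 3: predict. Step 4: confusion matrix and accuracy.
b0, b1 = fit_logistic(x_train, train_buy)
test_pred = predict(x_test, b0, b1)
cm = confusion_matrix(test_buy, test_pred)
accuracy = (cm[0][0] + cm[1][1]) / len(test_buy)
print(cm, accuracy)
```

In practice a library implementation would normally replace the hand-written fitting loop; the loop is shown only to make each step visible.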
Machine Learning Basics with
the K-Nearest Neighbors
Algorithm
Breaking it down
A supervised machine learning algorithm (as opposed to an
unsupervised machine learning algorithm) is one that relies on
labeled input data to learn a function that produces an appropriate
output when given new unlabeled data.
Imagine a computer is a child, we are its supervisor (e.g., parent,
guardian, or teacher), and we want the child (computer) to learn what
a pig looks like. We will show the child several different pictures,
some of which are pigs and the rest could be pictures of anything
(cats, dogs, etc.).
When we see a pig, we shout “pig!” When it’s not a pig, we shout
“no, not pig!” After doing this several times with the child, we show
them a picture and ask “pig?” and they will correctly (most of the
time) say “pig!” or “no, not pig!” depending on what the picture is.
That is supervised machine learning.
Supervised machine learning algorithms are used to solve
classification or regression problems.
A classification problem has a discrete value as its output.
For example, “likes pineapple on pizza” and “does not like pineapple
on pizza” are discrete. There is no middle ground. The analogy above
of teaching a child to identify a pig is another example of a
classification problem.
Image showing randomly generated data.
This image shows a basic example of what classification data might
look like. We have a predictor (or set of predictors) and a label. In the
image, we might be trying to predict whether someone likes
pineapple (1) on their pizza or not (0) based on their age (the
predictor).
It is standard practice to represent the output (label) of a classification
algorithm as an integer number such as 1, -1, or 0. In this instance,
these numbers are purely representational.
Mathematical operations should not be performed on them because
doing so would be meaningless. Think for a moment. What is “likes
pineapple” + “does not like pineapple”? Exactly. We cannot add
them, so we should not add their numeric representations.
A regression problem has a real number (a number with a decimal
point) as its output. For example, we could use the data in the table
below to estimate someone’s weight given their height.
K-Nearest Neighbors
The KNN algorithm assumes that similar things exist in close
proximity. In other words, similar things are near to each other.
“Birds of a feather flock together.”
Image showing how similar data points typically exist close to each other
Notice in the image above that most of the time, similar data points
are close to each other. The KNN algorithm hinges on this
assumption being true enough for the algorithm to be useful. KNN
captures the idea of similarity (sometimes called distance, proximity,
or closeness) with some mathematics we might have learned in our
childhood— calculating the distance between points on a graph.
Note: An understanding of how we calculate the distance between
points on a graph is necessary before moving on. If you are
unfamiliar with or need a refresher on how this
calculation is done, thoroughly read “Distance Between 2 Points” in
its entirety, and come right back.
There are other ways of calculating distance, and one way might be
preferable depending on the problem we are solving.
However, the straight-line distance (also called the Euclidean
distance) is a popular and familiar choice.
The KNN Algorithm
1. Load the data
2. Initialize K to your chosen number of neighbors
3. For each example in the data:
3.1 Calculate the distance between the query example and the
current example from the data.
3.2 Add the distance and the index of the example to an ordered
collection
4. Sort the ordered collection of distances and indices from smallest
to largest (in ascending order) by the distances
5. Pick the first K entries from the sorted collection
6. Get the labels of the selected K entries
7. If regression, return the mean of the K labels
8. If classification, return the mode of the K labels
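The steps of the KNN algorithm can be sketched directly in Python using the Euclidean distance. The data values are made up for illustration, and the function name `knn_predict` and its `mode` parameter are hypothetical.

```python
import math
from collections import Counter


def knn_predict(data, query, k, mode="classification"):
    """Predict the label of `query` from its k nearest neighbors.

    data  -- list of (features, label) pairs; features is a list of numbers
    query -- list of numbers with the same length as each features list
    """
    distances = []
    for index, (features, _) in enumerate(data):
        # Euclidean (straight-line) distance between the query and this example.
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(features, query)))
        distances.append((dist, index))

    distances.sort()                       # ascending by distance
    k_nearest = distances[:k]              # first K entries
    labels = [data[i][1] for _, i in k_nearest]

    if mode == "regression":
        return sum(labels) / k             # mean of the K labels
    return Counter(labels).most_common(1)[0][0]   # mode of the K labels


# Made-up classification data: age -> likes pineapple on pizza (1) or not (0).
pizza_data = [([22], 1), ([25], 1), ([27], 1), ([33], 0), ([41], 0), ([47], 0)]
likes = knn_predict(pizza_data, [30], k=3)

# Made-up regression data: height in cm -> weight in kg.
height_data = [([150], 50.0), ([160], 58.0), ([170], 66.0), ([180], 75.0)]
weight = knn_predict(height_data, [165], k=2, mode="regression")
print(likes, weight)
```

For the query age of 30, the three nearest examples are ages 27, 33, and 25, so the majority label is 1. For the height of 165 cm, the two nearest weights (58 and 66) average to 62.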
Random Forest Algorithm
Random Forest is a popular machine learning algorithm that belongs
to the supervised learning technique. It can be used for both
Classification and Regression problems in ML. It is based on the
concept of ensemble learning, which is a process of combining
multiple classifiers to solve a complex problem and to improve the
performance of the model.
As the name suggests, "Random Forest is a classifier that contains a
number of decision trees on various subsets of the given dataset and
takes the average to improve the predictive accuracy of that dataset."
Instead of relying on one decision tree, the random forest takes the
prediction from each tree and predicts the final output based on the
majority vote of those predictions.
A greater number of trees in the forest generally leads to more stable
accuracy and reduces the risk of overfitting.
The below diagram explains the working of the Random Forest
algorithm:
Assumptions for Random Forest
Since the random forest combines multiple trees to predict the class
of the dataset, it is possible that some decision trees may predict the
correct output, while others may not. But together, all the trees
predict the correct output. Therefore, below are two assumptions for a
better Random forest classifier:
There should be some actual values in the feature variable of the
dataset so that the classifier can predict accurate results rather
than a guessed result.
The predictions from each tree must have very low correlations.
Why use Random Forest?
Below are some points that explain why we should use the Random
Forest algorithm:
It takes less training time than many other algorithms.
It predicts output with high accuracy and runs efficiently even on large
datasets.
It can also maintain accuracy when a large proportion of data is
missing.
How does Random Forest algorithm work?
Random Forest works in two phases: the first is to create the random forest
by combining N decision trees, and the second is to make predictions with
each tree created in the first phase.
The Working process can be explained in the below steps and
diagram:
Step-1: Select random K data points from the training set.
Step-2: Build the decision trees associated with the selected data
points (Subsets).
Step-3: Choose the number N for decision trees that you want to
build.
Step-4: Repeat Step 1 & 2.
Step-5: For new data points, find the predictions of each decision
tree, and assign the new data points to the category that wins the
majority votes.
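The steps above can be sketched as a heavily simplified random-forest-style ensemble. To keep the sketch short, each "tree" is only a one-split tree (a stump) built on one randomly chosen feature, whereas a real random forest grows deeper trees and considers random feature subsets at every split. The data, the seed, and all function names are assumptions made for illustration.

```python
import random
from collections import Counter


def train_stump(rows, feature):
    """Build a one-split tree on `feature`, choosing the split with fewest errors."""
    values = sorted({feats[feature] for feats, _ in rows})
    majority = Counter(label for _, label in rows).most_common(1)[0][0]
    best = (len(rows) + 1, float("inf"), majority, majority)  # fallback: no split
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [label for feats, label in rows if feats[feature] <= threshold]
        right = [label for feats, label in rows if feats[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return (feature,) + best[1:]


def predict_stump(stump, feats):
    feature, threshold, left_label, right_label = stump
    return left_label if feats[feature] <= threshold else right_label


def train_forest(rows, n_trees, sample_size, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):                                      # Steps 3-4: repeat N times
        sample = [rng.choice(rows) for _ in range(sample_size)]   # Step 1: random K points
        feature = rng.randrange(n_features)                       # random feature for this tree
        forest.append(train_stump(sample, feature))               # Step 2: build a tree
    return forest


def predict_forest(forest, feats):
    votes = Counter(predict_stump(s, feats) for s in forest)      # Step 5: majority vote
    return votes.most_common(1)[0][0]


# Made-up, well-separated data: two features, class 0 or class 1.
rows = [([1.0, 1.2], 0), ([1.5, 0.8], 0), ([2.0, 1.0], 0), ([1.2, 1.6], 0),
        ([6.0, 5.5], 1), ([6.5, 6.2], 1), ([7.0, 5.8], 1), ([5.8, 6.6], 1)]
forest = train_forest(rows, n_trees=25, sample_size=len(rows))
print(predict_forest(forest, [1.3, 1.1]), predict_forest(forest, [6.4, 6.0]))
```

Individual stumps may be trained on unrepresentative samples, but the majority vote over all of them is what determines the final prediction.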
The working of the algorithm can be better understood by the below
example:
Example: Suppose there is a dataset that contains multiple fruit
images. So, this dataset is given to the Random forest classifier. The
dataset is divided into subsets and given to each decision tree. During
the training phase, each decision tree produces a prediction result, and
when a new data point occurs, then based on the majority of results,
the Random Forest classifier predicts the final decision. Consider the
below image:
Applications of Random Forest
There are mainly four sectors where Random Forest is mostly used:
1. Banking: Banking sector mostly uses this algorithm for the
identification of loan risk.
2. Medicine: With the help of this algorithm, disease trends
and risks of the disease can be identified.
3. Land Use: We can identify the areas of similar land use by
this algorithm.
4. Marketing: Marketing trends can be identified using this
algorithm.
Advantages of Random Forest
Random Forest is capable of performing both Classification and
Regression tasks.
It is capable of handling large datasets with high dimensionality.
It enhances the accuracy of the model and reduces the
overfitting issue.
Disadvantages of Random Forest
Although random forest can be used for both classification and
regression tasks, it is less suitable for regression tasks.
Python Implementation of Random Forest Algorithm
Now we will implement the Random Forest algorithm using
Python. For this, we will use the same dataset "user_data.csv", which
we have used in previous classification models. By using the same
dataset, we can compare the Random Forest classifier with other
classification models such as the Decision Tree classifier, KNN, SVM,
Logistic Regression, etc.
Implementation Steps are given below:
Data Pre-processing step
Fitting the Random forest algorithm to the Training set
Predicting the test result
Test accuracy of the result (Creation of Confusion matrix)
Visualizing the test set result.
Ensemble Methods in Machine Learning:
What are They and Why Use Them?
Ensemble methods, what are they? Ensemble methods are machine
learning techniques that combine several base models in order to
produce one optimal predictive model. To better understand this
definition, let’s take a step back to the ultimate goal of machine learning
and model building. This will make more sense as I dive into
specific examples and why ensemble methods are used.
I will largely utilize Decision Trees to outline the definition and
practicality of Ensemble Methods (however it is important to note
that Ensemble Methods do not only pertain to Decision Trees).
A Decision Tree determines the predictive value based on a series of
questions and conditions. For instance, this simple Decision Tree
determines whether an individual should play outside or not. The
tree takes several weather factors into account and, given each factor,
either makes a decision or asks another question. In this example,
every time it is overcast, we will play outside. However, if it is
raining, we must ask whether it is windy. If windy, we will not play.
But given no wind, tie those shoelaces tight because we’re going
outside to play.
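The weather tree described above amounts to a few nested questions, which can be written as a small function. Only the branches stated in the text are encoded; the function name, the argument names, and the string values for the outlook are assumptions.

```python
def should_play(outlook, windy):
    """Decision tree from the text: overcast -> play; rain -> depends on wind."""
    if outlook == "overcast":
        return True               # every time it is overcast, we play
    if outlook == "rain":
        return not windy          # raining and windy -> no play; no wind -> play
    # The text does not describe other branches, so none is invented here.
    raise ValueError(f"no rule given for outlook={outlook!r}")


print(should_play("overcast", windy=True))   # overcast always means play
print(should_play("rain", windy=True))       # rain with wind means no play
print(should_play("rain", windy=False))      # rain without wind means play
```

Each `if` corresponds to a node of the tree, and each `return` corresponds to a leaf.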
Decision Trees can also solve quantitative problems with the
same format. In the Tree to the left, we want to know whether or not
to invest in a commercial real estate property. Is it an office building?
A warehouse? An apartment building? Good economic conditions?
Poor economic conditions? How much will an investment return?
These questions are answered and solved using this decision tree.
When making Decision Trees, there are several factors we must take
into consideration: On what features do we base our decisions?
What is the threshold for classifying each question into a yes or no
answer? In the first Decision Tree, what if we wanted to ask
ourselves whether we had friends to play with? If we have friends, we
will play every time. If not, we might continue to ask ourselves
questions about the weather. By adding an additional question, we
hope to better define the Yes and No classes.
This is where Ensemble Methods come in handy! Rather than just
relying on one Decision Tree and hoping we made the right decision
at each split, Ensemble Methods allow us to take a sample of
Decision Trees into account, calculate which features to use or
questions to ask at each split, and make a final predictor based on the
aggregated results of the sampled Decision Trees.
An Introduction to Gradient
Boosting Decision Trees
Gradient boosting works by building simpler (weak) prediction
models sequentially, where each model tries to predict the error left
over by the previous model. Because of this, the algorithm tends to
overfit rather quickly.
But what is a weak learning model? A model that does slightly better
than random predictions.
To understand clearly the underlying principles and working
of gradient boosted trees (GBT), it is important to first learn the basic
concepts of decision trees and ensemble learning.
This tutorial will take you through the concepts behind gradient
boosting and also through two practical implementations of the
algorithm:
1. Gradient Boosting from scratch
2. Using the scikit-learn in-built function.
Decision trees
A decision tree is a machine learning model that builds upon
iteratively asking questions to partition data and reach a solution. It is
the most intuitive way to zero in on a classification or label for an
object. Visually too, it resembles an upside-down tree with
protruding branches, hence the name.
For example, suppose you went hiking and saw an animal that you couldn’t
immediately recognize from its features. You could later come
home and ask yourself a set of questions about its features, which
could help you decide exactly which species of animal you saw. A
decision tree for this problem would look something like this.
A decision tree is a flowchart-like tree structure where each node
denotes a feature of the dataset, each branch denotes a
decision, and each leaf node denotes the outcome.
The topmost node in a decision tree is known as the root node. It
learns to partition on the basis of the feature value. It partitions the
tree in a recursive manner, also called recursive partitioning. This
flowchart-like structure helps in decision making.
Its visualization, as shown above, is like a flowchart diagram which
easily mimics human-level thinking. That is why decision trees
are easy to understand and interpret.
WHY BOOSTING?
Boosting works on the principle of correcting the mistakes of the
previous learner through the next learner.
In boosting, weak learners are used, which perform only slightly better
than random chance.
Boosting focuses on sequentially adding up these weak learners and
filtering out the observations that a learner gets correct at every step.
Basically, the stress is on developing new weak learners to handle the
remaining difficult observations at each step.
One of the very first boosting algorithms developed was AdaBoost.
Gradient boosting improved upon some of the features of AdaBoost
to create a stronger and more efficient algorithm.
Let’s look at a brief overview of AdaBoost.
AdaBoost
AdaBoost used decision stumps as weak learners. Decision stumps are
decision trees with only a single split. It also attached weights to
observations, adding more weight to difficult-to-classify instances
and less weight to easy-to-classify instances.
The aim was to put stress on the difficult-to-classify instances for
every new weak learner. Further, the final result was the average of the
weighted outputs from all individual learners. The weights associated
with outputs were proportional to their accuracy.
The gradient boosting algorithm is slightly different from AdaBoost.
Instead of using the weighted average of individual outputs as the
final outputs, it uses a loss function to minimize loss and converge
upon a final output value. The loss function optimization is done
using gradient descent, and hence the name gradient boosting.
Further, gradient boosting uses short, less-complex decision trees
instead of decision stumps. To understand this in more detail, let’s
see how exactly a new weak learner in gradient boosting algorithm
learns from the mistakes of previous weak learners.
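One way to picture this is the sketch below, written under a few stated assumptions: the loss is squared error (so the negative gradient is simply the residual), each weak learner is a single-split regression tree for brevity (practical implementations usually use slightly deeper trees), and the data, the number of rounds and the learning_rate value are hypothetical.

```python
# Hypothetical 1-D regression data.
X = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]

def fit_regression_stump(X, targets):
    """Single-split regression tree: pick the threshold with the lowest squared error."""
    best = None
    xs = sorted(set(X))
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [t for x, t in zip(X, targets) if x < threshold]
        right = [t for x, t in zip(X, targets) if x >= threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def gradient_boost(X, y, rounds=20, learning_rate=0.5):
    base = sum(y) / len(y)                   # initial prediction: the mean of y
    preds = [base] * len(y)
    learners, history = [], [mse(y, preds)]
    for _ in range(rounds):
        # For squared loss, the negative gradient is the residual y - prediction.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_regression_stump(X, residuals)   # new learner fits the mistakes
        learners.append(stump)
        preds = [pi + learning_rate * stump(xi) for xi, pi in zip(X, preds)]
        history.append(mse(y, preds))
    predict = lambda x: base + learning_rate * sum(s(x) for s in learners)
    return predict, history

predict, history = gradient_boost(X, y)
print("training MSE before boosting:", history[0])
print("training MSE after boosting:", history[-1])
```

Each round fits a new learner to what the current ensemble still gets wrong, so the training error recorded in history does not increase from one round to the next.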
Overfitting in Machine Learning
In the real world, a dataset will rarely be clean and perfect.
Each dataset contains impurities such as noisy data, outliers,
missing data, or imbalanced data. Due to these impurities, different
problems occur that affect the accuracy and the performance of the
model. One such problem is overfitting. A model overfits when it
fits the training data, including its noise, so closely that it fails
to generalize to new data.
Example to Understand Overfitting
We can understand overfitting with a general example. Suppose there
are three students, X, Y, and Z, and all three are preparing for an
exam. X has studied only three sections of the book and left all other
sections. Y has a good memory and has memorized the whole book.
The third student, Z, has studied and practiced all the questions.
So, in the exam, X will only be able to solve the questions if the exam
has questions related to the three sections X studied.
Student Y will only be able to solve questions if they appear exactly
the same as given in the book. Student Z will be able to solve all the
exam questions in a proper way.
The same happens with machine learning. If the algorithm learns
from only a small part of the data, it is unable to capture the
required patterns and hence is underfitted.
Suppose the model memorizes the training dataset, like student Y.
It performs very well on the seen dataset but badly on
unseen data or unknown instances. In such cases, the model is said to
be overfitting.
And if the model performs well with the training dataset and also
with the test/unseen dataset, similar to student Z, it is said to be a
good fit.
How to detect Overfitting?
Overfitting can only be detected once you test the model on data it
has not seen. To detect the issue, we can perform a train/test split.
In a train/test split, we divide our dataset randomly into training
and test datasets. We train the model with the training dataset,
which is about 80% of the total dataset. After training the
model, we test it with the test dataset, which is the remaining 20%
of the total dataset. If the model scores much better on the training
data than on the test data, it is likely overfitting.
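A rough sketch of this check, using only the Python standard library, is given below. The noisy data set, the train_test_split helper and the memorizer model are all hypothetical and exist only to illustrate the 80/20 procedure.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into (train, test) lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

# Hypothetical noisy data: the true label is 1 when x >= 50,
# but each label is flipped with probability 0.2.
rng = random.Random(1)
data = []
for x in range(100):
    label = 1 if x >= 50 else 0
    if rng.random() < 0.2:
        label = 1 - label
    data.append((x, label))

train, test = train_test_split(data)

def memorizer(x):
    """Overfitting model: copy the label of the closest training point (like student Y)."""
    nearest_x, nearest_label = min(train, key=lambda row: abs(row[0] - x))
    return nearest_label

def accuracy(model, rows):
    return sum(model(x) == label for x, label in rows) / len(rows)

train_acc = accuracy(memorizer, train)
test_acc = accuracy(memorizer, test)
print("train accuracy:", train_acc)
print("test accuracy:", test_acc)
```

The memorizer is perfect on the training set by construction, because every training point is its own nearest neighbour. A noticeably lower test accuracy is the warning sign of overfitting.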
Underfitting
Underfitting occurs when our machine learning model is not able to
capture the underlying trend of the data. To avoid overfitting,
the feeding of training data can be stopped
at an early stage, due to which the model may not learn enough from
the training data. As a result, it may fail to find the best fit of the
dominant trend in the data.
In the case of underfitting, the model is not able to learn enough from
the training data, and hence its accuracy drops and its predictions
become unreliable.
An underfitted model has high bias and low variance.
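The high-bias behaviour can be seen in a small sketch. The data (y = 2x + 1), the even/odd split and the two models are hypothetical. A model that always predicts the mean is too simple for the trend, so its error is high on both the training and the test data, while a straight-line fit captures the trend.

```python
# Hypothetical data following a clear linear trend.
data = [(x, 2 * x + 1) for x in range(20)]
train = [row for row in data if row[0] % 2 == 0]   # even x values
test = [row for row in data if row[0] % 2 == 1]    # odd x values

def mse(model, rows):
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)

def fit_constant(rows):
    """Too-simple model: always predict the mean of the training targets."""
    mean_y = sum(y for _, y in rows) / len(rows)
    return lambda x: mean_y

def fit_line(rows):
    """Least-squares straight line y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x, _ in rows))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

underfit = fit_constant(train)
good_fit = fit_line(train)
print("constant model:", mse(underfit, train), mse(underfit, test))
print("line model:", mse(good_fit, train), mse(good_fit, test))
```

Unlike overfitting, the gap between training and test error is small here. The problem is that both errors are large.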
Introduction to Regularization in Machine Learning
Regularization is the process of adding information in order to solve
an ill-posed problem or to prevent overfitting. It applies to objective
functions in ill-posed optimization problems. Often, a regression model
overfits the data it is trained on. During regularization, we try to
reduce the complexity of the regression function without actually
reducing the degree of the underlying polynomial function.
Regularization can be seen as a technique to improve the
generalizability of a learned model. In this topic, we are going to
learn about regularization in machine learning.
Some more about regularization in machine learning:
• Regularization is also used for classification. Learning a
classifier is usually an underdetermined problem because it tries
to infer a function of any x given only examples.
• A regularization term is added to a loss function.
• Regularization can serve multiple purposes, including learning
simpler models, inducing models to be sparse, and introducing group
structure into the learning problem.
• The goal of this learning problem is to find a function that
fits or predicts the outcome and that minimizes the expected error
over all possible inputs and labels.
1. Lasso Regularization (L1 regularization)
L1 Regularization or Lasso Regularization adds a penalty to the error
function. The penalty is the sum of the absolute values of the
weights.
p is the tuning parameter that decides how much we want to penalize
the model.
This lasso regularization is also referred to as L1
regularization.
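Written as a formula, the lasso-penalized error described above can be sketched as follows. The symbols E (the unregularized error function) and w_i (the model weights) are assumed names introduced here; p is the tuning parameter named in the text.

```latex
E_{\mathrm{L1}} = E + p \sum_{i} \lvert w_i \rvert
```

Setting p = 0 gives back the original error, and a larger p penalizes large weights more strongly.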
2. Ridge Regularization (L2 regularization)
L2 Regularization or Ridge Regularization also adds a penalty to the
error function.
However, the penalty here is the sum of the squared values
of the weights.
p is the tuning parameter that decides how much we want to
penalize the model.
This ridge regularization is also referred to as L2
regularization.
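Written as a formula, the ridge-penalized error described above can be sketched as follows. The symbols E (the unregularized error function) and w_i (the model weights) are assumed names introduced here; p is the tuning parameter named in the text.

```latex
E_{\mathrm{L2}} = E + p \sum_{i} w_i^{2}
```

Because the weights are squared, large weights are penalized much more heavily than small ones.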
The difference between these two techniques is that lasso
shrinks the coefficients of the less important features to zero,
thus removing some features altogether. So, this works well for
feature selection in case we have a huge number of features.
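This difference can be illustrated with a small sketch. It relies on a simplifying assumption that is not in the text: the features are orthonormal, so the squared error for each weight w reduces to (w - w_ols)^2, where w_ols is the unregularized least-squares weight, and each weight can be handled on its own. Under that assumption the minimizers have simple closed forms. The three starting weights and p = 1.0 are hypothetical.

```python
def ridge_shrink(w_ols, p):
    """Minimizer of (w - w_ols)**2 + p * w**2: scaled down, but never exactly zero."""
    return w_ols / (1 + p)

def lasso_shrink(w_ols, p):
    """Minimizer of (w - w_ols)**2 + p * abs(w): soft-thresholding at p / 2."""
    magnitude = abs(w_ols) - p / 2
    if magnitude <= 0:
        return 0.0                      # small weights are removed entirely
    return magnitude if w_ols > 0 else -magnitude

# Hypothetical unregularized weights for three features.
weights = [3.0, 0.4, -0.1]
p = 1.0

ridge_weights = [ridge_shrink(w, p) for w in weights]
lasso_weights = [lasso_shrink(w, p) for w in weights]
print("ridge:", ridge_weights)
print("lasso:", lasso_weights)
```

With these numbers, ridge keeps all three weights non-zero, while lasso sets the two small weights to exactly zero, which is the feature selection effect described above.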
Future scope of machine learning
We have heard a lot about the scope of Machine Learning, its
applications, job and salary prospects, etc. But do you know what
Machine Learning is? Why do we need Machine Learning? Where do
we use it?
To answer these questions, this section uses an application of
Machine Learning in the investment sector or the stock market to
understand the need for and the future scope of Machine Learning.
The scope of Machine Learning (ML) is vast, and in the near future it
will deepen its reach into various fields such as medicine, finance,
social media, facial and voice recognition, online fraud detection, and
biometrics. Gartner predicts that 30% of government and large
enterprise contracts will require AI-fueled solutions by 2025.
Let’s understand the scope of machine learning in the future in
various fields:
• Medical
• Cybersecurity
• Digital voice assistants
• Education
• Job opportunities
• Search engines
If the current state of ML is exciting, the near future of machine
learning opens up significantly more, and more complex, opportunities
for technologists. Let us look at these one by one.
"Machine learning is the process of automatically getting insights
from data that can drive business value."
Lavanya Tekumalla
This process involves three steps:
• Gathering and preparing large volumes of data that the machine will
use to teach itself.
• Feeding the data into ML models and training them to make the right
decisions through supervision and correction.
• Deploying the model to make analytical predictions, or feeding it
new kinds of data to expand its capabilities.
CONCLUSION
This report has introduced you to Machine Learning. Now, you
know that Machine Learning is a technique of training machines to
perform the activities a human brain can do, albeit a bit faster and
better than an average human being. Today we have seen that
machines can beat human champions in games such as Chess and
Go, which are considered very complex. You have seen that
machines can be trained to perform human activities in several areas
and can aid humans in living better lives.
Machine Learning can be supervised or unsupervised. If you have a
smaller amount of clearly labelled data for training, opt for
Supervised Learning. Unsupervised Learning would generally give
better performance and results for large data sets. If you have a huge
data set easily available, go for deep learning techniques. You have
also learned about Reinforcement Learning and Deep Reinforcement
Learning. You now know what Neural Networks are, along with their
applications and limitations.