Basic Concepts in Machine Learning
• Machine Learning is growing rapidly in the IT world and gaining strength in
different business sectors.
• Although Machine Learning is still a developing field, it is popular among all
technologies.
• It is a field of study that makes computers capable of automatically learning and
improving from experience.
• Hence, Machine Learning focuses on strengthening computer programs with the
help of data collected from various observations.
• In this article, "Concepts in Machine Learning", we will discuss a few basic
concepts used in Machine Learning, such as what Machine Learning is, the
technologies and algorithms used in Machine Learning, applications and examples
of Machine Learning, and much more. So, let's start with a quick introduction to
machine learning.
What is Machine Learning?
• Machine Learning is defined as a technology that is used to train machines to
perform various actions such as predictions, recommendations, estimations, etc.,
based on historical data or past experience.
• Machine Learning enables computers to behave like human beings by training
them with the help of past experience and data.
• There are three key aspects of Machine Learning, which are as follows:
• Task: A task is defined as the main problem in which we are interested. This
task/problem can be related to the predictions and recommendations and
estimations, etc.
• Experience: It is defined as learning from historical or past data, which is
then used to estimate and resolve future tasks.
• Performance: It is defined as the capacity of any machine to resolve any machine
learning task or problem and provide the best outcome for the same. However,
performance is dependent on the type of machine learning problems.
Techniques in Machine Learning
• Machine Learning techniques are divided mainly into the following 4 categories:
• 1. Supervised Learning
• Supervised learning is applicable when a machine has sample data, i.e., input as
well as output data with correct labels. These labels are used to check the
correctness of the model's predictions. The supervised learning technique helps us
predict future events with the help of past experience and labeled examples.
Initially, it analyses the known training dataset, and later it builds an inferred
function that makes predictions about output values. It also detects errors during
the learning process and corrects them through algorithms.
• Example: Let's assume we have a set of images tagged as "dog". A machine
learning algorithm is trained with these dog images so it can easily distinguish
whether an image is a dog or not.
• 2. Unsupervised Learning
• In unsupervised learning, a machine is trained with input samples only, while
the outputs are not known. The training information is neither classified nor
labeled; hence, a machine may not always provide correct output compared to
supervised learning.
• Although unsupervised learning is less common in practical business settings,
it helps in exploring the data and can draw inferences from datasets to
describe hidden structures in unlabeled data.
• Example: Let's assume a machine is trained with a set of documents of
different categories (Type A, B, and C), and we have to organize them into
appropriate groups. Because the machine is provided only with the input samples
and no outputs, it can organize these documents into type A, type B, and type C
groups, but there is no guarantee that the grouping is correct.
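The document-grouping idea can be sketched with k-means clustering, a standard unsupervised algorithm (the 1-D "document" features below are hypothetical):

```python
# A minimal sketch of unsupervised learning: k-means clustering on unlabeled
# 1-D points. The algorithm discovers the groups without any output labels,
# but nothing guarantees the groups match the "true" categories.

def kmeans(points, k, iters=10):
    centroids = points[:k]                       # naive initialisation
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters

data = [1.0, 1.1, 0.9, 10.0, 10.2, 9.8]          # two hidden groups
groups = kmeans(data, 2)
```

Note that the algorithm only sees the raw values; the two recovered groups carry no names until a human inspects them.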
• 3. Reinforcement Learning
• Reinforcement Learning is a feedback-based machine learning
technique.
• In this type of learning, agents (computer programs) need to explore the
environment, perform actions, and, on the basis of their actions, receive
rewards as feedback.
• For each good action, they get a positive reward, and for each bad
action, they get a negative reward.
• The goal of a Reinforcement learning agent is to maximize the
positive rewards. Since there is no labeled data, the agent is bound
to learn by its experience only.
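The reward-driven loop described above can be sketched with tabular Q-learning on a toy environment (a 5-state corridor; the rewards and hyperparameters here are illustrative choices, not from the text):

```python
import random

# A minimal sketch of reinforcement learning: tabular Q-learning on a 5-state
# corridor. The agent starts at state 0; reaching state 4 gives a positive
# reward (+1), and every other step gives a small negative reward (-0.01).

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    random.seed(0)
    q = {(s, a): 0.0 for s in range(5) for a in (-1, +1)}
    for _ in range(episodes):
        s = 0
        while s != 4:
            if random.random() < eps:            # explore
                a = random.choice((-1, +1))
            else:                                # exploit the current estimate
                a = max((-1, +1), key=lambda a: q[(s, a)])
            s2 = min(max(s + a, 0), 4)
            r = 1.0 if s2 == 4 else -0.01
            best_next = max(q[(s2, -1)], q[(s2, +1)])
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q

q = train()
# The learned greedy policy should move right (+1) in every non-terminal state.
policy = [max((-1, +1), key=lambda a: q[(s, a)]) for s in range(4)]
```

There is no labeled dataset anywhere in this loop: the Q-table is shaped purely by the rewards the agent experiences.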
• 4. Semi-supervised Learning
• Semi-supervised learning is an intermediate technique between supervised and
unsupervised learning. It operates on datasets that contain a few labeled
examples alongside mostly unlabeled data. It reduces the cost of building the
machine learning model, since labels are costly, while still benefiting from
the few labels that are available. Further, it also increases the accuracy and
performance of the machine learning model.
• Semi-supervised learning helps data scientists overcome the drawbacks of
supervised and unsupervised learning. Speech analysis, web content
classification, protein sequence classification, and text document classifiers
are some important applications of semi-supervised learning.
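Self-training, one common semi-supervised scheme, can be sketched as follows (toy 2-D data; only two points carry costly labels, and the rest are pseudo-labeled by the model itself):

```python
# A minimal sketch of semi-supervised self-training with a nearest-centroid
# classifier: start from a few labeled points, repeatedly pseudo-label the
# unlabeled point the current model is most confident about, and refit.

def centroid(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

labelled = [((0.0, 0.0), "A"), ((4.0, 4.0), "B")]          # few costly labels
unlabelled = [(0.2, 0.1), (3.9, 4.2), (0.1, 0.3), (4.1, 3.8)]

while unlabelled:
    cents = {y: centroid([x for x, y2 in labelled if y2 == y])
             for y in {y for _, y in labelled}}
    # pick the unlabeled point closest to any class centroid (most confident)
    x = min(unlabelled, key=lambda x: min(dist(x, c) for c in cents.values()))
    y = min(cents, key=lambda y: dist(x, cents[y]))
    labelled.append((x, y))
    unlabelled.remove(x)

labels = dict(labelled)
```

Only two labels were paid for, yet all six points end up labeled, which is exactly the cost saving the text describes.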
Applications of Machine Learning
• Machine Learning is widely used in almost every sector, including healthcare,
marketing, finance, infrastructure, automation, etc. Here are some important
real-world examples of machine learning:
• Healthcare and Medical Diagnosis:
• Machine Learning is used in healthcare industries, for example to build
self-learning neural networks. These networks help specialists provide quality
treatment by analyzing external data on a patient's condition, X-rays, CT scans,
and various tests and screenings. Beyond treatment, machine learning is also
helpful for cases like automatic billing, clinical decision support, and the
development of clinical care guidelines.
• Marketing:
• Machine learning helps marketers create hypotheses and test, evaluate, and
analyze datasets. It helps us quickly make predictions based on the concept of
big data. It is also helpful in stock trading, as most trading is done through
bots based on calculations from machine learning algorithms. Various deep
learning neural networks help build trading models, such as Convolutional
Neural Networks, Recurrent Neural Networks, and Long Short-Term Memory
networks.
• Self-driving cars:
• This is one of the most exciting applications of machine learning in
today's world. It plays a vital role in developing self-driving cars. Various
automobile companies like Tesla, Tata, etc., are continuously working on the
development of self-driving cars. This is made possible by supervised machine
learning, in which a machine is trained to detect people and objects while
driving.
• Speech Recognition:
• Speech Recognition is one of the most popular applications of machine
learning. Nowadays, almost every mobile application comes with a voice
search facility. This "Search by Voice" facility is also a part of speech
recognition. In this method, voice instructions are converted into text, which
is known as "speech to text" or "computer speech recognition".
• Google Assistant, Siri, Alexa, Cortana, etc., are some famous applications of
speech recognition.
• Traffic Prediction:
• Machine Learning also helps us find the shortest route to our destination
using Google Maps. It also helps us predict traffic conditions, whether clear
or congested, through the real-time locations reported by the Google Maps app
and sensors.
• Image Recognition:
• Image recognition is also an important application of machine learning, used
for identifying objects, persons, places, etc. Face detection and auto friend
tagging suggestions are among the most famous applications of image
recognition, used by Facebook, Instagram, etc. Whenever we upload photos with
our Facebook friends, it automatically suggests their names through image
recognition technology.
• Product Recommendations:
• Machine Learning is widely used in business for the marketing of various
products. Almost all big and small companies like Amazon, Alibaba, Walmart,
Netflix, etc., use machine learning techniques to recommend products to their
users. Whenever we search for a product on their websites, we soon start seeing
advertisements for similar products. This is made possible by machine learning
algorithms that learn users' interests and, based on past data, suggest
products to the user.
• Automatic Translation:
• Automatic language translation is also one of the most significant
applications of machine learning. It is based on sequence-to-sequence
algorithms that translate text from one language into other desired languages.
Google's GNMT (Google Neural Machine Translation) provides this feature using
neural machine learning. Further, you can also translate selected text in
images, as well as complete documents, through Google Lens.
• Virtual Assistant:
• A virtual personal assistant is also one of the most popular applications of
machine learning. First, it records our voice and sends it to a cloud-based
server, which then decodes it with the help of machine learning algorithms. All
big companies like Amazon, Google, etc., use these features for playing music,
calling someone, opening an app, searching data on the internet, etc.
• Email Spam and Malware Filtering:
• Machine Learning also helps us filter the various emails received in our
mailbox according to their category, such as important, normal, and spam. This
is made possible by ML algorithms such as Multi-Layer Perceptron, Decision
Tree, and Naïve Bayes classifiers.
Bias and Variance in Machine Learning
• Machine learning is a branch of Artificial Intelligence, which allows machines to
perform data analysis and make predictions.
• However, if the machine learning model is not accurate, it can make prediction
errors, and these prediction errors are usually known as bias and variance.
• In machine learning, these errors will always be present, as there is always a
slight difference between the model's predictions and the actual values.
• The main aim of ML/data science analysts is to reduce these errors in order to get
more accurate results.
• In this topic, we are going to discuss bias and variance, the bias-variance
trade-off, and underfitting and overfitting. But before starting, let's first
understand what errors in machine learning are.
Errors in Machine Learning
• In machine learning, an error is a measure of how accurately an algorithm can
make predictions for a previously unseen dataset. On the basis of these errors,
we select the machine learning model that performs best on the particular
dataset. There are mainly two types of errors in machine learning, which are:
• Reducible errors: These errors can be reduced to improve the model accuracy.
Such errors can further be classified into bias and variance.
• Irreducible errors: These errors will always be present in the model,
regardless of which algorithm has been used. They are caused by unknown
variables whose influence on the output cannot be reduced.
What is Bias?
• In general, a machine learning model analyses the data, finds patterns in it,
and makes predictions.
• While training, the model learns these patterns in the dataset and applies them
to test data for prediction.
• While making predictions, a difference occurs between the values predicted by
the model and the actual/expected values, and this difference is known as bias
error or error due to bias.
• Bias can be defined as the inability of machine learning algorithms such as
Linear Regression to capture the true relationship between the data points.
• Every algorithm begins with some amount of bias, because bias arises from
assumptions in the model that make the target function simpler to learn. A
model has either:
• Low Bias: A low bias model will make fewer assumptions about the
form of the target function.
• High Bias: A model with high bias makes more assumptions and becomes unable
to capture the important features of our dataset. A high-bias model also
cannot perform well on new data.
• Generally, a linear algorithm has high bias, as this is what makes it learn
fast. The simpler the algorithm, the higher the bias it is likely to
introduce, whereas a nonlinear algorithm often has low bias.
• Some examples of machine learning algorithms with low bias are Decision
Trees, k-Nearest Neighbours, and Support Vector Machines. At the same time,
algorithms with high bias are Linear Regression, Linear Discriminant
Analysis, and Logistic Regression.
Ways to reduce High Bias:
• High bias mainly occurs due to an overly simple model. Below are some ways to
reduce high bias:
• Increase the input features as the model is underfitted.
• Decrease the regularization term.
• Use more complex models, such as including some polynomial features.
What is a Variance Error?
• Variance specifies how much the prediction would change if a different
training dataset were used.
• In simple words, variance tells how much a random variable differs from its
expected value.
• Ideally, a model should not vary too much from one training dataset to another,
which means the algorithm should be good at understanding the hidden mapping
between the input and output variables.
• Variance errors are either of low variance or high variance.
• Low variance means there is a small variation in the prediction of the target
function with changes in the training data set. At the same time, High
variance shows a large variation in the prediction of the target function with
changes in the training dataset.
• A model that shows high variance learns a lot and performs well on the
training dataset, but does not generalize well to unseen data. As a result,
such a model gives good results on the training dataset but shows high error
rates on the test dataset.
• Since, with high variance, the model learns too much from the dataset, it
leads to overfitting. A model with high variance has the following problems:
• A high variance model leads to overfitting.
• It increases model complexity.
• Usually, nonlinear algorithms, which have a lot of flexibility in fitting the
model, have high variance.
• Some examples of machine learning algorithms with low variance are Linear
Regression, Logistic Regression, and Linear Discriminant Analysis. At the same
time, algorithms with high variance are Decision Trees, Support Vector
Machines, and k-Nearest Neighbours.
• Ways to Reduce High Variance:
• Reduce the number of input features or parameters, as the model is overfitted.
• Do not use an overly complex model.
• Increase the training data.
• Increase the regularization term.
Different Combinations of Bias-Variance
• There are four possible combinations of bias and variances, which are
represented by the below diagram:
1. Low-Bias, Low-Variance: The combination of low bias and low variance shows
an ideal machine learning model. However, it is rarely achievable in practice.
2. Low-Bias, High-Variance: With low bias and high variance, model predictions
are inconsistent but accurate on average. This case occurs when the model
learns with a large number of parameters, and it leads to overfitting.
3. High-Bias, Low-Variance: With high bias and low variance, predictions are
consistent but inaccurate on average. This case occurs when a model does not
learn well from the training dataset or uses few parameters. It leads to the
underfitting problem in the model.
4. High-Bias, High-Variance: With high bias and high variance, predictions are
inconsistent and also inaccurate on average.
How to identify High Variance or High Bias?
• High variance can be identified if the model has a low error on the training
data but a high error on the test data.
• High bias can be identified if the model has a high error on the training
data, and the test error is almost the same as the training error.
Feature Selection & Feature Extraction
1.Feature Selection
2.Feature Extraction
• Feature Selection: This technique is used for feature
selection/dimensionality reduction on given datasets. It is applied either to
improve estimators' accuracy scores or to boost their performance on very
high-dimensional datasets.
• Feature Extraction: This technique is used to extract features, in a format
supported by machine learning algorithms, from datasets consisting of formats
such as text and images.
• The main difference: Feature Extraction transforms arbitrary data, such as
text or images, into numerical features that are understood by machine learning
algorithms. Feature Selection, on the other hand, is a machine learning
technique applied to these (numerical) features.
• Feature Selection
• Feature selection is a process of selecting a subset of relevant features from the original
set of features. The goal is to reduce the dimensionality of the feature space, simplify the
model, and improve its generalization performance. Feature selection methods can be
categorized into three types:
• Filter Methods
• Wrapper methods
• Embedded methods.
• Filter methods rank features based on their statistical properties and select the top-ranked
features. Wrapper methods use the model performance as a criterion to evaluate the
feature subset and search for the optimal feature subset. Embedded methods incorporate
feature selection as a part of the model training process.
• Filter Methods
• Filter methods are the simplest and most computationally
efficient methods for feature selection. In this approach, features
are selected based on their statistical properties, such as their
correlation with the target variable or their variance. These
methods are easy to implement and are suitable for datasets
with a large number of features. However, they may not always
produce the best results as they do not take into account the
interactions between features.
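A filter method can be sketched in a few lines: score each feature by its absolute correlation with the target, independently of any model, and keep the top-ranked ones (toy data; feature 0 is informative, feature 1 is noise):

```python
from statistics import mean, pstdev

# A minimal sketch of a filter method: rank features by the absolute Pearson
# correlation with the target, without training any model. In this toy data,
# feature 0 tracks the target and feature 1 is noise.

def corr(a, b):
    ma, mb = mean(a), mean(b)
    cov = mean((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (pstdev(a) * pstdev(b))

X = [[1, 5], [2, 3], [3, 6], [4, 2], [5, 7]]   # rows = samples, cols = features
y = [1.1, 2.0, 3.2, 3.9, 5.1]                  # target follows feature 0

scores = [abs(corr([row[j] for row in X], y)) for j in range(len(X[0]))]
best_feature = max(range(len(scores)), key=lambda j: scores[j])
```

Because the score ignores feature interactions, this is cheap but can miss features that are only useful in combination, as noted above.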
• Wrapper Methods
• Wrapper methods are more sophisticated than filter methods
and involve training a machine learning model to evaluate the
performance of different subsets of features. In this approach, a
search algorithm is used to select a subset of features that
results in the best model performance. Wrapper methods are
more accurate than filter methods as they take into account the
interactions between features. However, they are
computationally expensive, especially when dealing with large
datasets or complex models.
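A wrapper method can be sketched by exhaustively scoring every feature subset with the model itself, here a 1-nearest-neighbour classifier evaluated by leave-one-out accuracy (toy data; feature 0 separates the classes, feature 1 is noise):

```python
from itertools import combinations

# A minimal sketch of a wrapper method: evaluate each candidate feature subset
# by the leave-one-out accuracy of a 1-nearest-neighbour classifier and keep
# the best subset. Feature 0 separates the two classes; feature 1 is noise.

X = [[0.0, 9.0], [0.2, 1.0], [0.1, 5.0], [4.0, 8.0], [4.2, 2.0], [4.1, 6.0]]
y = [0, 0, 0, 1, 1, 1]

def loo_accuracy(features):
    hits = 0
    for i in range(len(X)):
        def sqdist(j):
            return sum((X[i][f] - X[j][f]) ** 2 for f in features)
        nearest = min((j for j in range(len(X)) if j != i), key=sqdist)
        hits += (y[nearest] == y[i])
    return hits / len(X)

subsets = [s for r in (1, 2) for s in combinations(range(2), r)]
best = max(subsets, key=loo_accuracy)   # ties go to the smaller subset listed first
```

Note the cost: the model is retrained and re-evaluated for every subset, which is exactly why wrapper methods become expensive on large feature spaces.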
• Embedded Methods
• Embedded methods are a hybrid of filter and wrapper methods.
In this approach, feature selection is integrated into the model
training process, and features are selected based on their
importance in the model. Embedded methods are more efficient
than wrapper methods as they do not require a separate feature
selection step. They are also more accurate than filter methods
as they take into account the interactions between features.
However, they may not be suitable for all models as not all
models have built-in feature selection capabilities.
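An embedded method can be sketched with an L1-regularised ("lasso"-style) linear regression trained by subgradient descent: the penalty drives the weight of the uninformative feature toward zero, so selection happens during training itself (the data and hyperparameters below are illustrative):

```python
# A minimal sketch of an embedded method: L1-regularised linear regression via
# subgradient descent. The L1 penalty shrinks the weight of the noise feature
# to (near) zero during training, performing feature selection implicitly.

X = [[1.0, 0.3], [2.0, -0.1], [3.0, 0.2], [4.0, -0.3]]   # feature 1 is noise
y = [2.0, 4.1, 5.9, 8.0]                                 # y ≈ 2 * feature 0

w = [0.0, 0.0]
lr, lam = 0.01, 0.5
for _ in range(2000):
    for j in range(len(w)):
        # gradient of the mean squared error with respect to w[j]
        grad = sum(2 * (sum(wk * xk for wk, xk in zip(w, xi)) - ti) * xi[j]
                   for xi, ti in zip(X, y)) / len(X)
        # subgradient of the L1 penalty
        grad += lam * (1 if w[j] > 0 else -1 if w[j] < 0 else 0)
        w[j] -= lr * grad
```

After training, `w[0]` is close to the true coefficient while `w[1]` hovers near zero, so no separate selection step was needed.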
Feature Extraction
• Feature extraction is a process of transforming the original features into a new set of features that
are more informative and compact. The goal is to capture the essential information from the
original features and represent it in a lower-dimensional feature space. Feature extraction methods
can be categorized into linear methods and nonlinear methods.
• Linear methods use linear transformations such as Principal Component Analysis (PCA) and
Linear Discriminant Analysis (LDA) to extract features. PCA finds the principal components that
explain the maximum variance in the data, while LDA finds the projection that maximizes the class
separability.
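PCA can be sketched directly from its definition (NumPy; synthetic 2-D data lying almost on a line): centre the data, form the covariance matrix, and project onto the eigenvector with the largest eigenvalue:

```python
import numpy as np

# A minimal sketch of PCA as linear feature extraction: 2-D points that lie
# almost on the line y = 2x are projected down to the single direction of
# maximum variance, which captures nearly all the information.

rng = np.random.default_rng(0)
t = rng.normal(0, 1, 50)
X = np.column_stack([t, 2 * t + rng.normal(0, 0.05, 50)])

Xc = X - X.mean(axis=0)                  # centre the data
cov = Xc.T @ Xc / len(Xc)                # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: eigenvalues in ascending order
pc1 = eigvecs[:, -1]                     # first principal component
Z = Xc @ pc1                             # the 1-D extracted feature
explained = eigvals[-1] / eigvals.sum()  # fraction of variance kept
```

Here a single extracted feature retains almost all of the variance, which is the dimensionality reduction PCA is used for.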
• Nonlinear methods use nonlinear transformations such as Kernel PCA and Autoencoder to extract
features. Kernel PCA uses kernel functions to map the data into a higher-dimensional space and
finds the principal components in that space. Autoencoder is a neural network architecture that
learns to compress the data into a lower-dimensional representation and reconstruct it back to the
original space.
• Here is an example of feature extraction in the Mel-Frequency Cepstral Coefficients (MFCC)
method. MFCC is a nonlinear method that extracts features from audio signals for speech
recognition tasks. It first applies a filter bank to the audio signals to extract the spectral features,
then applies the Discrete Cosine Transform (DCT) to the log-magnitude spectrum to extract the
cepstral features.
Why is feature selection/extraction required?
• Feature selection/extraction is an important step in many machine-
learning tasks, including classification, regression, and clustering. It
involves identifying and selecting the most relevant features (also
known as predictors or input variables) from a dataset while
discarding the irrelevant or redundant ones. This process is often
used to improve the accuracy, efficiency, and interpretability of
a machine-learning model.
• Here are some of the main reasons why feature selection/extraction
is required in machine learning:
1.Improved Model Performance: The inclusion of irrelevant or
redundant features can negatively impact the performance of a
machine learning model. Feature selection/extraction can help to
identify the most important and informative features, which can lead
to better model performance, higher accuracy, and lower error rates.
2.Reduced Overfitting: Including too many features in a model can
cause overfitting, where the model becomes too complex and starts to fit the
noise in the data instead of the underlying patterns. Feature
selection/extraction can help to reduce overfitting by focusing on the most
relevant features and avoiding the inclusion of noise.
3.Faster Model Training and Inference: Feature selection/extraction can
help to reduce the dimensionality of a dataset, which can make model training
and inference faster and more efficient. This is especially important in large-
scale or real-time applications, where speed and performance are critical.
4.Improved Interpretability: Feature selection/extraction can help to
simplify the model and make it more interpretable, by focusing on the most
important features and discarding the less important ones. This can help to
explain how the model works and why it makes certain predictions, which
can be useful in many applications, such as healthcare, finance, and law.
Decision Tree Classification
• The Decision Tree Algorithm is a supervised learning technique that can be
used for both classification and regression problems, but mostly it is
preferred for solving classification problems. It is a tree-structured
classifier, where internal nodes represent the features of a dataset, branches
represent the decision rules, and each leaf node represents the outcome.
• In a decision tree, there are two kinds of nodes, which are the Decision Node
and the Leaf Node. Decision nodes are used to make decisions and have multiple
branches, whereas leaf nodes are the outputs of those decisions and do not
contain any further branches.
• The decisions or the test are performed on the basis of features of the
given dataset.
• It is a graphical representation for getting all the possible solutions to
a problem/decision based on given conditions.
• It is called a decision tree because, similar to a tree, it starts with the
root node, which expands on further branches and constructs a tree-
like structure.
• In order to build a tree, we use the CART algorithm, which stands for
Classification and Regression Tree algorithm.
• A decision tree simply asks a question and, based on the answer (Yes/No),
further splits the tree into subtrees.
• The diagram below explains the general structure of a decision tree:
Why use Decision Trees?
• There are various algorithms in Machine learning, so choosing the best
algorithm for the given dataset and problem is the main point to
remember while creating a machine learning model. Below are the two
reasons for using the Decision tree:
• Decision Trees usually mimic the way humans think while making a decision, so
they are easy to understand.
• The logic behind the decision tree can be easily understood because it
shows a tree-like structure.
Decision Tree Terminologies
• Root Node: Root node is from where the decision tree starts. It represents
the entire dataset, which further gets divided into two or more homogeneous
sets.
• Leaf Node: Leaf nodes are the final output node, and the tree cannot be
segregated further after getting a leaf node.
• Splitting: Splitting is the process of dividing the decision node/root node
into sub-nodes according to the given conditions.
• Branch/Sub Tree: A tree formed by splitting the tree.
• Pruning: Pruning is the process of removing the unwanted branches from
the tree.
• Parent/Child node: A node that is divided into sub-nodes is called a parent
node, and the sub-nodes are called its child nodes.
How does the Decision Tree
algorithm Work?
• In a decision tree, to predict the class of a given dataset, the algorithm
starts from the root node of the tree. The algorithm compares the value of the
root attribute with the corresponding attribute of the record (real dataset)
and, based on the comparison, follows the branch and jumps to the next node.
• For the next node, the algorithm again compares the attribute value with the
other sub-nodes and moves further. It continues this process until it reaches
a leaf node of the tree. The complete process can be better understood using
the below algorithm:
• Step-1: Begin the tree with the root node, say S, which contains the complete
dataset.
• Step-2: Find the best attribute in the dataset using an Attribute Selection
Measure (ASM).
• Step-3: Divide S into subsets that contain the possible values of the best
attribute.
• Step-4: Generate the decision tree node, which contains the best attribute.
• Step-5: Recursively make new decision trees using the subsets of the dataset
created in Step-3. Continue this process until a stage is reached where you
cannot further classify the nodes; the final nodes are leaf nodes.
• Example: Suppose there is a candidate who has a job offer and wants
to decide whether he should accept the offer or Not. So, to solve this
problem, the decision tree starts with the root node (Salary attribute by
ASM). The root node splits further into the next decision node
(distance from the office) and one leaf node based on the
corresponding labels. The next decision node further gets split into one
decision node (Cab facility) and one leaf node. Finally, the decision
node splits into two leaf nodes (Accepted offers and Declined offer).
Consider the below diagram:
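The job-offer example can be sketched as a walk down a small tree of decision nodes, where each node tests one attribute and each leaf holds an outcome (the attribute names and structure below are hypothetical, chosen to match the example):

```python
# A minimal sketch of prediction in the job-offer decision tree: internal
# nodes test one attribute and branch on the answer; leaves hold the outcome.
# The attributes (salary_ok, near_office, cab_facility) are hypothetical.

tree = {"attr": "salary_ok",
        "yes": {"attr": "near_office",
                "yes": {"attr": "cab_facility",
                        "yes": "Accepted offer", "no": "Declined offer"},
                "no": "Declined offer"},
        "no": "Declined offer"}

def predict(node, example):
    while isinstance(node, dict):                 # walk until we hit a leaf
        node = node["yes"] if example[node["attr"]] else node["no"]
    return node

offer = {"salary_ok": True, "near_office": True, "cab_facility": True}
decision = predict(tree, offer)
```

Each comparison-and-jump in the loop corresponds to one step of the root-to-leaf traversal described in the algorithm above.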
Attribute Selection Measures
• While implementing a decision tree, the main issue is how to select the best
attribute for the root node and for the sub-nodes. To solve such problems,
there is a technique called the Attribute Selection Measure, or ASM. With this
measurement, we can easily select the best attribute for the nodes of the
tree. There are two popular techniques for ASM, which are:
• Information Gain
• Gini Index
Information Gain:
• Information gain is the measurement of changes in entropy after the
segmentation of a dataset based on an attribute.
• It calculates how much information a feature provides us about a class.
• According to the value of information gain, we split the node and
build the decision tree.
• A decision tree algorithm always tries to maximize the value of
information gain, and a node/attribute having the highest information
gain is split first. It can be calculated using the below formula:
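In code, entropy and information gain can be computed directly from their definitions (pure Python; the "Weather"/"Play" toy dataset below is hypothetical):

```python
import math

# A minimal sketch of the information-gain computation used by ASM: the
# entropy of the class labels before a split, minus the weighted entropy of
# each subset after splitting on an attribute.

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(rows, attr, target):
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after

rows = [{"Weather": "Sunny", "Play": "No"},
        {"Weather": "Sunny", "Play": "No"},
        {"Weather": "Rain",  "Play": "Yes"},
        {"Weather": "Rain",  "Play": "Yes"}]

gain = information_gain(rows, "Weather", "Play")   # a perfect split
```

The tree-building step simply computes this score for every candidate attribute and splits on the one with the highest gain.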
• Advantages of the Decision Tree
• It is simple to understand, as it follows the same process that a human
follows while making any decision in real life.
• It can be very useful for solving decision-related problems.
• It helps to think about all the possible outcomes for a problem.
• There is less requirement of data cleaning compared to other
algorithms.
• Disadvantages of the Decision Tree
• The decision tree contains lots of layers, which makes it complex.
• It may have an overfitting issue, which can be resolved using the
Random Forest algorithm.
• With more class labels, the computational complexity of the decision tree may
increase.