Course Code: CSA3002
MACHINE LEARNING ALGORITHMS
Course Type: LPC – 2-2-3
Course Outcomes
At the end of the course, students should be able to:
1. Understand training and testing of datasets using machine learning techniques.
2. Apply optimization and parameter tuning techniques for machine learning algorithms.
3. Apply machine learning models to solve various problems using machine learning algorithms.
4. Apply machine learning algorithms to create models.
Course Objectives
• The objective of the course is to familiarize learners with the
concepts of machine learning algorithms and to develop practical skills
through experiential learning techniques.
Supervised Learning - Regression Analysis
• Regression analysis consists of a set of machine learning methods that
allow us to predict a continuous outcome variable (y) based on the value
of one or more independent variables (x).
• It predicts continuous/real values such as temperature, age, salary,
price, etc.
Terminologies
• Dependent Variable: The main factor in regression analysis that we want to
predict or understand is called the dependent variable. It is also called the
target variable.
• Independent Variable: The factors used to predict the values of the
dependent variable are called independent variables, also known as predictors.
• Outliers: An outlier is an observation with either a very low or a very
high value compared to the other observed values. An outlier can distort the
results, so it should be handled or removed.
• Underfitting and Overfitting: If our algorithm works well on the training
dataset but not on the test dataset, the problem is called overfitting.
If our algorithm does not perform well even on the training dataset, the
problem is called underfitting.
Types of Regression
• Simple Linear Regression
• One dependent variable (interval or ratio)
• One independent variable (interval or ratio or dichotomous)
• Multiple Linear Regression
• One dependent variable (interval or ratio)
• Two or more independent variables (interval or ratio or dichotomous)
• Logistic Regression
• One dependent variable (binary)
• One or more independent variables (interval or ratio or dichotomous)
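As a brief illustration (not from the slides), the three types above can be fitted with scikit-learn; the data below is randomly generated and purely hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Simple linear regression: one continuous predictor, one continuous target
x1 = rng.uniform(0, 10, size=(50, 1))
y_simple = 2.0 + 1.5 * x1[:, 0] + rng.normal(0, 1, 50)
simple = LinearRegression().fit(x1, y_simple)

# Multiple linear regression: two predictors, one continuous target
X2 = rng.uniform(0, 10, size=(50, 2))
y_multiple = 1.0 + 0.5 * X2[:, 0] - 2.0 * X2[:, 1] + rng.normal(0, 1, 50)
multiple = LinearRegression().fit(X2, y_multiple)

# Logistic regression: the same predictors, but a binary (0/1) target
y_binary = (X2[:, 0] - X2[:, 1] + rng.normal(0, 1, 50) > 0).astype(int)
logistic = LogisticRegression().fit(X2, y_binary)

print(simple.coef_, multiple.coef_, logistic.coef_)
```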
Linear Regression
• Linear regression is a statistical regression method which is used for
predictive analysis.
• It is one of the simplest regression algorithms and models the
relationship between continuous variables.
• It is used for solving regression problems in machine learning.
• Linear regression shows the linear relationship between the
independent variable (X-axis) and the dependent variable (Y-axis),
hence the name linear regression.
• If there is only one input variable (x), it is called simple linear
regression; if there is more than one input variable, it is called
multiple linear regression.
• The mathematical equation for linear regression:
Y = a + bX
• Here, Y is the dependent variable (target variable), X is the
independent variable (predictor variable), and a and b are the linear
coefficients (a is the intercept and b is the slope).
Find the linear regression equation for the following two sets of data.
Also find the value of Y for x = 12.
For the fitted line Y = 1.5 + 0.95X, the value of Y for x = 12 is:
Y = 1.5 + (0.95 × 12) = 12.9
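As a quick illustration (not part of the original slides), the sketch below shows how such a line can be fitted by least squares with numpy; the x and y values here are assumptions chosen so that the fit reproduces the line Y = 1.5 + 0.95X used above.

```python
import numpy as np

# Assumed example data (hypothetical); chosen so the fit gives Y = 1.5 + 0.95X
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([3.0, 7.0, 5.0, 10.0])

# Least-squares estimates of slope (b) and intercept (a) in Y = a + bX
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

print(f"Fitted line: Y = {a:.2f} + {b:.2f}X")      # Y = 1.50 + 0.95X
print(f"Prediction at x = 12: {a + b * 12:.1f}")   # 12.9
```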
Problem
a) Find the regression line y=a+bx
b) Use the regression line as a model to estimate the sales of the
company in 2012.
The following data are math aptitude test scores and statistics scores
for five students.
Maths:      95  85  80  70  60
Statistics: 85  95  70  65  70
1. What linear regression equation best predicts statistics
performance, based on math aptitude scores?
2. If a student made a 75 on the math aptitude test, what grade
would we expect her to make in statistics?
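A short numpy sketch (not from the slides) showing how both questions can be answered with the least-squares formulas; the scores are taken from the table above.

```python
import numpy as np

# Math aptitude (x) and statistics (y) scores for the five students
x = np.array([95.0, 85.0, 80.0, 70.0, 60.0])
y = np.array([85.0, 95.0, 70.0, 65.0, 70.0])

# Least-squares slope (b) and intercept (a) for y = a + b*x
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
print(f"Regression line: y = {a:.2f} + {b:.3f}x")    # roughly y = 26.78 + 0.644x

# Expected statistics grade for a math score of 75
print(f"Prediction for x = 75: {a + b * 75:.1f}")    # roughly 75.1
```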
Logistic Regression
• Logistic regression is one of the most popular machine learning
algorithms, and it comes under the supervised learning technique. It is
used for predicting a categorical dependent variable using a given set
of independent variables.
• Logistic regression predicts the output of a categorical dependent
variable. Therefore, the outcome must be a categorical or discrete
value: Yes or No, 0 or 1, True or False, etc. However, instead of
giving the exact values 0 and 1, it gives probabilistic values that
lie between 0 and 1.
• Logistic regression is similar to linear regression except in how the
two are used: linear regression is used for solving regression
problems, whereas logistic regression is used for solving
classification problems.
• In logistic regression, instead of fitting a regression line, we fit
an "S"-shaped logistic function, whose output is bounded by the two
limiting values 0 and 1.
• Logistic regression is a significant machine learning algorithm
because it can provide probabilities and classify new data using both
continuous and discrete datasets.
Logistic Function (Sigmoid Function):
• The sigmoid function is a mathematical function used to map the
predicted values to probabilities.
• It maps any real value into another value within a range of 0 and 1.
• The output of logistic regression must lie between 0 and 1 and cannot
go beyond this limit, so it forms an "S"-shaped curve. This S-shaped
curve is called the sigmoid function or the logistic function.
• In logistic regression, we use the concept of a threshold value, which
separates the predictions of 0 and 1: values above the threshold are
mapped to 1, and values below the threshold are mapped to 0.
Applying the linear function z = a + bX inside the sigmoid function
σ(z) = 1 / (1 + e^(-z)) produces the S-shaped curve.
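A minimal numpy sketch (not from the slides) of this idea; the coefficients a and b below are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear coefficients (a = intercept, b = slope)
a, b = -4.0, 0.8

x = np.linspace(-5, 15, 9)
z = a + b * x                        # linear function, as in Y = a + bX
p = sigmoid(z)                       # squashed into probabilities in (0, 1)

for xi, pi in zip(x, p):
    label = 1 if pi >= 0.5 else 0    # threshold at 0.5
    print(f"x = {xi:5.1f}  ->  p = {pi:.3f}  ->  class {label}")
```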
Example: Email Classification
• Imagine you're working on an email system, and you want to
automatically classify emails as either "spam" or "not spam" (ham).
Logistic regression can help you build a model that assigns a
probability of an email being spam or not.
• Logistic Regression Concept:
• Logistic regression uses the logistic function (also called the sigmoid
function) to map any input into a value between 0 and 1. This value
represents the probability of an instance belonging to the positive
class (in our case, spam).
• Applying Logistic Regression to the Example:
• Here's how you might apply logistic regression to classify emails:
• Data Preparation: Collect a dataset of emails, where each email has
features like the number of words, the presence of certain keywords,
etc. Each email is labeled as either "spam" (1) or "not spam" (0).
• Feature Scaling: Normalize or scale the features so that they're on a
similar scale. This can improve the convergence of the algorithm.
• Model Training: Use logistic regression to find the parameters
(coefficients) of your model that maximize the likelihood of the
observed data. This involves finding the best-fitting sigmoid curve.
• Prediction: Once the model is trained, you can input the features of a
new email and calculate z, the linear combination of those features.
Then plug z into the logistic function to get the probability of the
email being spam.
• Thresholding: Choose a threshold (commonly 0.5) above which you
classify the email as spam, and below which you classify it as not spam,
as illustrated in the sketch after this list.
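A minimal scikit-learn sketch of these steps, assuming two made-up features (word count and number of suspicious keywords) and tiny made-up training data; it is an illustration of the workflow, not a definitive implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data preparation: each row = [word_count, suspicious_keyword_count]
# Labels: 1 = spam, 0 = not spam (ham). All values are made up.
X_train = np.array([
    [120,  8], [300, 15], [ 90,  6], [250, 12],   # spam-like emails
    [400,  0], [150,  1], [ 80,  0], [500,  2],   # ham-like emails
])
y_train = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# Feature scaling + model training in one pipeline: the scaler normalizes the
# features, and logistic regression fits coefficients that maximize the
# likelihood of the observed labels.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Prediction: probability that a new email is spam, then thresholding at 0.5
new_email = np.array([[200, 10]])
p_spam = model.predict_proba(new_email)[0, 1]
label = "spam" if p_spam >= 0.5 else "not spam"
print(f"P(spam) = {p_spam:.3f} -> classified as {label}")
```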
• Conclusion:
• Logistic regression is a powerful tool for binary classification tasks like
spam detection. It is easy to interpret because it provides the probability
that an instance belongs to a certain class. The concept of mapping a
linear combination of features to a probability using the sigmoid function
is at the core of logistic regression. In real-world applications, logistic
regression is used in a wide range of areas, from medical diagnosis to
sentiment analysis in natural language processing.