Supervised Learning Algorithms

DATA SCIENCE PROCESS

Supervised machine learning

 Supervised machine learning is a fundamental approach to machine learning and
artificial intelligence.
 It involves training a model using labeled data, where each input comes with a
corresponding correct output.
 The process is like a teacher guiding a student—hence the term "supervised" learning.
 The goal of supervised learning is to make accurate predictions when given new, unseen
data.
 For example, if a model is trained to recognize handwritten digits, it will use what it
learned to correctly identify new digits it hasn't seen before.

 A fundamental concept in supervised machine learning is learning a class from examples.


 This involves providing the model with examples where the correct label is known, such
as learning to classify images of cats and dogs by being shown labeled examples of both.
 The model then learns the distinguishing features of each class and applies this knowledge
to classify new images.
How Does Supervised Machine Learning Work?

 A supervised learning algorithm works with input features and corresponding output
labels.
 The process works through:
 Training Data: The model is provided with a training dataset that includes input data
(features) and corresponding output data (labels or target variables).
 Learning Process: The algorithm processes the training data, learning the
relationships between the input features and the output labels.

 A supervised machine learning model is trained on a dataset to learn a mapping function
between input and output.
 The learned function is then used to make predictions on new data.
 Training phase
 involves feeding the algorithm labeled data, where each data point is paired with its
correct output.
 The algorithm learns to identify patterns and relationships between the input and
output data.
 Testing phase
 involves feeding the algorithm new, unseen data and evaluating its ability to predict
the correct output based on the learned patterns.
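The training and testing phases above can be sketched in code. The text names no particular library, so scikit-learn and its built-in iris dataset are used here purely as an illustrative assumption:

```python
# Minimal sketch of the training and testing phases (scikit-learn and the
# iris dataset are illustrative choices, not prescribed by the text).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # labeled data: features X, labels y

# Split the labeled data so the model can later be tested on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Training phase: the algorithm learns patterns from labeled examples.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Testing phase: evaluate predictions on data the model has never seen.
accuracy = model.score(X_test, y_test)
```

On this small dataset the held-out accuracy is typically high, but the point of the sketch is the two-phase structure: fit on labeled training data, then evaluate on unseen test data.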

Supervised learning problems are generally categorized into two main types:
 Classification
 Regression

 Figure A: A dataset of a shopping store, useful for predicting whether a customer
will purchase a particular product under consideration based on their gender, age,
and salary.
Input: Gender, Age, Salary
Output: Purchased i.e. 0 or 1; 1 means yes the customer will purchase and 0 means that the
customer won't purchase it.

 Figure B: A meteorological dataset that serves the purpose of predicting wind speed
based on different parameters.
Input: Dew Point, Temperature, Pressure, Relative Humidity, Wind Direction
Output: Wind Speed

The most widely used supervised learning algorithms are:

1. Linear Regression
Linear regression is used to predict a continuous value by finding the best-fit straight line
between the input (independent variable) and output (dependent variable).
 Minimizes the difference between actual values and predicted values using a method called
"least squares" to best fit the data.
 Example: predicting a person’s weight based on their height, or predicting house prices based on size.
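The height-to-weight example can be sketched as follows. The heights and weights below are made-up numbers chosen so the relationship is exactly linear, and scikit-learn is an assumed library choice:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up, perfectly linear data for illustration: weight = 0.8 * height - 70.
heights = np.array([[150], [160], [170], [180], [190]])  # cm
weights = np.array([50.0, 58.0, 66.0, 74.0, 82.0])       # kg

# LinearRegression fits the best-fit line by ordinary least squares.
model = LinearRegression()
model.fit(heights, weights)

# Predict the weight for a new, unseen height.
prediction = model.predict([[175]])[0]
```

Because the toy data lie exactly on a line, the fitted slope recovers 0.8 and the prediction for 175 cm is 70 kg; with real, noisy data the line is only the best least-squares approximation.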

2. Logistic Regression
Logistic regression predicts probabilities and assigns data points to binary classes (e.g., spam or
not spam).
 It uses a logistic function (S-shaped curve) to model the relationship between input features
and class probabilities.
 Used for classification tasks (binary or multi-class).
 Outputs probabilities to classify data into categories.
 Example: predicting whether a customer will buy a product online (yes/no) or diagnosing if
a person has a disease (sick/not sick).
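A minimal sketch of the customer-purchase example, assuming scikit-learn and made-up numbers (minutes spent on a product page as the single feature):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary data (made up): 0 = did not buy, 1 = bought.
# Feature: minutes spent browsing the product page.
X = np.array([[1], [2], [3], [10], [11], [12]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

# The S-shaped logistic function maps inputs to class probabilities.
proba_buy = clf.predict_proba([[11]])[0, 1]  # P(buy) for a long visit
label = clf.predict([[2]])[0]                # predicted class for a short visit
```

Long visits land on the high side of the sigmoid (probability above 0.5), short visits on the low side, which is how the probability output becomes a binary decision.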

3. Decision Trees
A decision tree splits data into branches based on feature values, creating a tree-like
structure.
 Each decision node represents a feature; leaf nodes provide the final prediction.
 The process continues until a final prediction is made at the leaf nodes.
 Works for both classification and regression tasks.
For more decision tree algorithms, you can explore:
 Iterative Dichotomiser 3 (ID3)
 C4.5 / C5.0
 Classification and Regression Trees (CART)
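A short sketch of a decision tree classifier, using scikit-learn and its iris dataset as assumed, illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each internal node tests one feature; leaves hold the final class.
# max_depth limits how many successive splits the tree may make.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

depth = tree.get_depth()        # actual depth of the fitted tree
train_acc = tree.score(X, y)    # accuracy on the training data
```

Capping the depth keeps the tree small and readable; deeper trees fit the training data more closely but risk overfitting, which motivates the ensemble methods discussed later.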

4. Support Vector Machines (SVM)


SVMs find the best boundary (called a hyperplane) that separates data points into different
classes.
 Uses support vectors (the critical data points closest to the boundary) to define the hyperplane.
 Can handle linear and non-linear problems using kernel functions.
 Focuses on maximizing the margin between classes, making it robust for high-dimensional
data and complex patterns.
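The effect of kernel functions can be sketched on a toy non-linear dataset. The two-moons data and scikit-learn are illustrative assumptions, not something the text prescribes:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-circles: not separable by a straight line.
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# A linear kernel draws a flat hyperplane; the RBF kernel implicitly maps
# the data to a higher-dimensional space where a separating plane exists.
linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

linear_acc = linear_svm.score(X, y)
rbf_acc = rbf_svm.score(X, y)
```

On this data the RBF kernel fits the curved boundary that the linear kernel cannot, which is the point of the kernel trick.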

5. k-Nearest Neighbors (k-NN)


KNN is a simple algorithm that predicts the output for a new data point based on the similarity
(distance) to its nearest neighbors in the training dataset, used for both classification and
regression tasks.
 Calculates the distance between the new point and existing data points in the training dataset
using a distance metric (e.g., Euclidean, Manhattan, Minkowski).
 Identifies the k nearest neighbors to the new data point based on the calculated distances.
o For classification, the algorithm assigns the class label that is most common among its k
nearest neighbors.
o For regression, the algorithm predicts the value as the average of the values of its
k nearest neighbors.
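Both the voting and averaging behaviors can be shown on one tiny made-up dataset; scikit-learn's neighbor classes are an assumed implementation choice:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Made-up 1-D data: two well-separated clusters.
X = np.array([[0], [1], [2], [10], [11], [12]])
y_class = np.array([0, 0, 0, 1, 1, 1])              # class labels
y_value = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])  # continuous targets

# Classification: majority vote among the k=3 nearest (Euclidean) neighbors.
clf = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
clf.fit(X, y_class)
label = clf.predict([[1.5]])[0]   # neighbors 1, 2, 0 -> all class 0

# Regression: average the values of the k=3 nearest neighbors.
reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X, y_value)
value = reg.predict([[11]])[0]    # neighbors 10, 11, 12 -> mean 11.0
```

Note that k-NN does no training in the usual sense; it simply stores the dataset and computes distances at prediction time.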

6. Naive Bayes
Based on Bayes' theorem; assumes all features are independent of each other (hence "naive").
 Calculates probabilities for each class and assigns the most likely class to a data point.
 The assumption of feature independence rarely holds exactly in real-world data.
 Works well for high-dimensional data.
 Commonly used in text classification tasks like spam filtering.
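The spam-filtering use case can be sketched with a tiny made-up corpus; scikit-learn's multinomial Naive Bayes and bag-of-words vectorizer are assumed, illustrative choices:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up miniature corpus; a real filter trains on thousands of messages.
texts = ["win money now", "free prize win", "meeting at noon",
         "lunch tomorrow?", "claim your free money", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham", "spam", "ham"]

# CountVectorizer turns text into word-count features; MultinomialNB then
# treats each word count as an independent feature (the "naive" assumption).
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

prediction = model.predict(["free money prize"])[0]
```

Words like "free" and "prize" appear only in the spam examples, so the per-class word probabilities push the new message toward the spam class.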

7. Random Forest
Random forest is an ensemble method that combines multiple decision trees.
 Uses random sampling and feature selection for diversity among trees.
 Final prediction is based on majority voting (classification) or averaging (regression).
 Advantages : reduces overfitting compared to individual decision trees.
 Handles large datasets with higher dimensionality.
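A minimal random forest sketch, using scikit-learn and its breast-cancer dataset as assumed stand-ins (the dataset is higher-dimensional, matching the point about dimensionality):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)  # 30 features per sample
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees, each fit on a bootstrap sample and considering a random
# subset of features at each split; classification is by majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

accuracy = forest.score(X_test, y_test)
```

Because each tree sees a different random slice of the data and features, their errors are partly uncorrelated, and voting across them reduces the overfitting a single deep tree would show.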

8. Gradient Boosting (e.g., XGBoost, LightGBM, CatBoost)

These algorithms build models sequentially: each new model corrects the errors made by
previous ones, combining weak learners (like decision trees) into a strong predictive
model. Effective for both regression and classification tasks.
 XGBoost (Extreme Gradient Boosting): An advanced version of gradient boosting that includes
regularization to prevent overfitting. Faster than traditional gradient boosting and well suited
to large datasets.
 LightGBM (Light Gradient Boosting Machine) : Uses a histogram-based approach for faster
computation and supports categorical features natively.
 CatBoost: Designed specifically for categorical data, with built-in encoding techniques. Uses
symmetric trees for faster training and better generalization.
For more ensemble learning and gradient boosting approaches, explore:
 AdaBoost
 Stacking - ensemble learning
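The sequential error-correcting idea can be sketched without any of the third-party libraries named above, using scikit-learn's own gradient boosting implementation (an assumption made here to keep the example dependency-free; XGBoost, LightGBM, and CatBoost expose very similar fit/predict interfaces):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 100 shallow trees is fit to the residual errors of the
# ensemble built so far; learning_rate scales each tree's contribution.
gbm = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0)
gbm.fit(X_train, y_train)

accuracy = gbm.score(X_test, y_test)
```

The shallow trees are individually weak learners; it is the sequential correction of residuals that makes the combined model strong.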

9. Neural Networks (Including Multilayer Perceptrons)

Neural networks, including multilayer perceptrons (MLPs), are part of supervised
machine learning: they require labeled data to learn the relationship
between inputs and desired outputs. The network minimizes its prediction
error using the backpropagation algorithm to adjust weights during training.
 Multilayer Perceptron (MLP): Neural network with multiple layers of nodes.
 Used for both classification and regression (examples: image classification, spam detection,
and predicting numerical values like stock prices or house prices).
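The image-classification example can be sketched with scikit-learn's MLP on its small built-in digits dataset (both assumed choices for illustration):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# 8x8 grayscale images of handwritten digits, flattened to 64 features.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer of 64 units; weights are adjusted by backpropagation
# during fit() to minimize the classification error.
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
mlp.fit(X_train, y_train)

accuracy = mlp.score(X_test, y_test)
```

Even this small network classifies unseen digits well, echoing the handwritten-digit example from the start of the section; larger images and deeper networks follow the same labeled-data, backpropagation pattern.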

Practical Examples of Supervised Learning

A few practical examples of supervised machine learning across various industries:
 Fraud Detection in Banking : Utilizes supervised learning algorithms on historical
transaction data, training models with labeled datasets of legitimate and fraudulent
transactions to accurately predict fraud patterns.
 Parkinson's Disease Prediction: Applies supervised learning to patient measurements to
predict whether a person has Parkinson's disease, a progressive disorder that affects the
nervous system and the parts of the body controlled by the nerves.
 Customer Churn Prediction : Uses supervised learning techniques to analyze historical
customer data, identifying features associated with churn rates to predict customer retention
effectively.
 Cancer Cell Classification: Implements supervised learning to classify cancer cells based on
their features, identifying them as 'malignant' or 'benign'.
 Stock Price Prediction: Applies supervised learning to predict a signal indicating whether
buying a particular stock is likely to be worthwhile.
