
Feature Selection

Machine learning
Prepared by: Abdelrahman Hassan
Agenda
1. Introduction
2. What is feature selection?
3. Feature selection models
4. How to choose a feature selection model?
Introduction
• The input variables that we give to our machine learning models are called features. Each column in our dataset, other than the target we want to predict, constitutes a feature. To train a good model, we need to make sure that we use only the informative features. If we have too many features, the model can capture unimportant patterns and learn from noise. The method of choosing the important features of our data is called Feature Selection.



Cont.
• To train a model, we often collect large quantities of data to help the machine learn better. Usually, part of the collected data is noise, and some columns of our dataset may not contribute significantly to the performance of our model. Further, having many features slows down training and makes the resulting model slower to run. The model may also learn spurious patterns from this irrelevant data and become less accurate on new data.



Cont.
• Consider a table that contains information on old cars. The model decides which cars should be crushed for spare parts.



Cont.
• In the above table, we can see that the car's model, its year of manufacture, and the miles it has traveled are relevant to deciding whether the car is old enough to be crushed. However, the name of the previous owner has no bearing on that decision. Worse, it can mislead the algorithm into finding spurious patterns between names and the other features. Hence, we can drop that column.
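As a minimal sketch of dropping that column, assume the table is held as a list of Python dictionaries. The column names `model`, `year`, `miles`, `owner_name`, `crush` and the two rows are hypothetical, not the deck's data.

```python
# Hypothetical rows standing in for the old-cars table; names and values are invented.
cars = [
    {"model": "Sedan A", "year": 1998, "miles": 210_000, "owner_name": "Alice", "crush": True},
    {"model": "Coupe B", "year": 2015, "miles": 40_000, "owner_name": "Bob", "crush": False},
]

# Columns judged irrelevant to the crush decision.
IRRELEVANT = {"owner_name"}


def drop_columns(rows, columns):
    """Return new rows without the given columns (the input rows are not modified)."""
    return [{key: value for key, value in row.items() if key not in columns} for row in rows]


selected = drop_columns(cars, IRRELEVANT)
print(sorted(selected[0]))  # ['crush', 'miles', 'model', 'year']
```

With pandas, the same step would be `df.drop(columns=["owner_name"])`.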





What is feature selection?
• Feature selection is the process of reducing the number of input variables to your model by keeping only the relevant ones and discarding those that mostly add noise.



Feature selection models
• Feature selection models are of two types:

1. Supervised Models: Supervised feature selection uses the output label (the target variable) to choose features. These methods use the target to identify the variables that improve the model's performance.

2. Unsupervised Models: Unsupervised feature selection does not need the output label, so we use it for unlabeled data.
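An unsupervised method only needs the feature columns. A simple example, chosen here for illustration rather than taken from the slides, is a variance threshold: a feature that never changes cannot help tell samples apart, so it can be dropped without looking at any label. The sketch assumes the data is a dictionary mapping hypothetical feature names to lists of numbers.

```python
from statistics import pvariance


def variance_threshold(columns, threshold=0.0):
    """Unsupervised filter: keep features whose population variance exceeds `threshold`.

    Only the feature values are inspected; no target labels are used.
    """
    return [name for name, values in columns.items() if pvariance(values) > threshold]


# Hypothetical unlabeled data: one list of values per feature.
features = {
    "x_varied": [1.0, 4.0, 2.0, 8.0],
    "x_constant": [3.0, 3.0, 3.0, 3.0],  # never changes, so it carries no information
}
print(variance_threshold(features))  # ['x_varied']
```

scikit-learn's `VarianceThreshold` (in `sklearn.feature_selection`) implements the same idea.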





Cont.
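• Filter Method: In this method, each feature is scored by a statistical measure of its relationship with the output, independently of any particular model, and weakly related features are dropped. For example, we can compute the correlation between each feature and the output labels: features whose correlation is close to zero are dropped, while a strong positive and a strong negative correlation both indicate a useful feature. E.g.: Information Gain, Fisher’s Score, etc.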
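A minimal sketch of a correlation-based filter, assuming numerical features and a numerical target stored as plain Python lists. The cut-off `min_abs_r=0.5` and the feature names are arbitrary choices for illustration. Ranking by the absolute value of Pearson's r keeps strongly negative features as well as strongly positive ones.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Root of the summed squared deviations; the ratio below is Pearson's r.
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def correlation_filter(columns, target, min_abs_r=0.5):
    """Filter method: score each feature against the target, keep those with |r| >= min_abs_r."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    kept = [name for name, r in scores.items() if abs(r) >= min_abs_r]
    return kept, scores


# Hypothetical data: one strongly positive, one strongly negative, one unrelated feature.
target = [1.0, 2.0, 3.0, 4.0]
columns = {
    "pos": [2.0, 4.0, 6.0, 8.0],
    "neg": [8.0, 6.0, 4.0, 2.0],
    "noise": [1.0, -1.0, -1.0, 1.0],
}
kept, scores = correlation_filter(columns, target)
print(kept)  # ['pos', 'neg']
```

No model is trained at any point, which is what makes this a filter. Another per-feature score could replace `pearson` without changing the structure.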



Cont.
• Wrapper Method: We choose a subset of features and train a model on it. Based on the model's performance, we add or remove features and train the model again. Evaluating every possible combination of features (exhaustive search) grows as 2^n for n features, so the subsets are usually built with a greedy approach instead. E.g.: Forward Selection, Backward Elimination, etc.
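The greedy forward-selection loop can be sketched as follows. `score(subset)` is a hypothetical callback standing in for "train a model on these features and measure its accuracy" (for example with cross-validation); the toy scorer and its gain values are invented so the example runs on its own. Starting from no features, each round adds the single feature that raises the score most and stops when no addition helps, so at most n(n+1)/2 subsets are evaluated instead of all 2^n.

```python
def forward_selection(features, score, max_features=None):
    """Greedy forward selection, a wrapper method.

    score(subset) must train and evaluate a model on the given feature list and
    return a number where higher is better (e.g. validation accuracy).
    """
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Try each remaining feature on top of the current subset.
        candidate, candidate_score = max(
            ((f, score(selected + [f])) for f in remaining),
            key=lambda pair: pair[1],
        )
        if candidate_score <= best_score:
            break  # no single addition improves the score, so stop
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


# Hypothetical stand-in for "train a model and measure accuracy":
# each feature adds a fixed, invented amount to a baseline score.
GAIN = {"model": 0.10, "year": 0.30, "miles": 0.25, "owner_name": -0.05}


def toy_score(subset):
    return 0.5 + sum(GAIN[f] for f in subset)


chosen, best = forward_selection(["model", "year", "miles", "owner_name"], toy_score)
print(chosen)  # ['year', 'miles', 'model']
```

Backward elimination is the mirror image: start from all features and repeatedly remove the one whose removal hurts the score least. Every call to `score` retrains the model, which is why wrapper methods are usually the most expensive of the three.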



Cont.
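• Intrinsic Method: Also called embedded methods, these perform feature selection as part of model training itself, for example through L1 (Lasso) regularization or the splits chosen by a decision tree. They combine qualities of both the Filter and Wrapper methods: like wrappers they take the model into account, and like filters they avoid retraining the model on many candidate subsets.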
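As one concrete intrinsic method, chosen here for illustration, the sketch below fits L1-regularized linear regression (Lasso) by cyclic coordinate descent on the objective (1/(2n))·||y − Xw||² + α·||w||₁, the same form scikit-learn documents for its `Lasso` estimator. The soft-thresholding step sets the weight of an uninformative feature to exactly zero, so the features with non-zero weights are the selected subset. It assumes centered data (no intercept), a small dense dataset, and a fixed iteration count rather than a convergence check; the data are hypothetical.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; anything inside [-alpha, alpha] becomes 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimize (1/(2n))*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    X is a list of rows; X's columns and y are assumed to be centered (no intercept).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iters):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(d) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


# Hypothetical centered data: y depends only on feature 0; feature 1 is irrelevant.
X = [[1.0, 1.0], [-1.0, 1.0], [2.0, -1.0], [-2.0, -1.0]]
y = [2.0, -2.0, 4.0, -4.0]
w = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, weight in enumerate(w) if weight != 0.0]
print(selected)  # [0]
```

Selection falls out of a single training run. Raising α zeroes out more weights, so fewer features survive; lowering it keeps more.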





How to choose a feature selection model?
• How do we know which feature selection model will work for our problem? A practical starting point is to look at the types of the input and output variables.

Variables are of two main types:

• Numerical Variables: integers and floating-point numbers.
• Categorical Variables: labels, strings, Boolean variables, etc.
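A rough way to sort columns into these two types in plain Python is sketched below; the column names are hypothetical. Booleans need an explicit check, because in Python `bool` is a subclass of `int` and would otherwise be counted as numerical.

```python
def variable_type(values):
    """Return 'numerical' if every value is an int or float (excluding bool), else 'categorical'."""
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "numerical"
    return "categorical"


# Hypothetical columns from the old-cars example.
columns = {
    "year": [1998, 2015],
    "miles": [210_000.0, 40_000.0],
    "model": ["Sedan A", "Coupe B"],
    "crush": [True, False],
}
types = {name: variable_type(values) for name, values in columns.items()}
print(types)
# {'year': 'numerical', 'miles': 'numerical', 'model': 'categorical', 'crush': 'categorical'}
```

Integer codes that stand for categories (postal codes, product IDs) would still be labeled numerical, so the result is a first guess to review rather than a final answer.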



Cont.
• Based on whether we have numerical or categorical variables as inputs and outputs, we can choose our feature selection model as follows:
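As a hedged illustration of such a choice, the lookup below encodes one widely cited rule of thumb for filter statistics, keyed by input and output type; it may differ in detail from the chart in the original slides. Kendall's rank correlation assumes the categorical variable is ordinal, and mutual information can be applied to any pair of types.

```python
# One widely cited rule of thumb (an assumption, not necessarily the author's chart),
# keyed by (input type, output type).
SUGGESTED_STATISTICS = {
    ("numerical", "numerical"): ["Pearson's correlation", "Spearman's rank correlation"],
    ("numerical", "categorical"): ["ANOVA F-test", "Kendall's rank correlation"],
    ("categorical", "numerical"): ["ANOVA F-test", "Kendall's rank correlation"],
    ("categorical", "categorical"): ["Chi-squared test", "Mutual information"],
}


def suggest_filter_statistics(input_type, output_type):
    """Return candidate filter statistics for an (input type, output type) pair."""
    try:
        return SUGGESTED_STATISTICS[(input_type, output_type)]
    except KeyError:
        raise ValueError(f"unknown type pair: ({input_type!r}, {output_type!r})") from None


# Numerical features predicting a categorical label (e.g. crush / don't crush).
print(suggest_filter_statistics("numerical", "categorical"))
# ['ANOVA F-test', "Kendall's rank correlation"]
```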



Thank you
