
DATA SCIENCE & BIG DATA ANALYTICS

FEATURE ENGINEERING
(Feature Selection & Extraction)

Dr. S. N. Ahsan
Feature Selection
 Feature selection is the process of reducing the number of
input variables used when developing a predictive model.
 It means selecting the subset of the most relevant predictive
features for use in building a machine learning model.
 Eliminating features helps a model perform better by
weeding out redundant features and features that provide
little insight.
 It saves computing power because there are fewer features
to train on, makes results more interpretable, reduces the
chance of overfitting by detecting collinear features, and,
when the methods are used intelligently, improves model
accuracy.
Feature Extraction
 Feature extraction aims to reduce the number of
features in a dataset by creating new features from the
existing ones (and then discarding the original features).
 This new, reduced set of features should summarize
most of the information contained in the original set. In
this way, a summarized version of the original features is
created from a combination of the original set.
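A minimal sketch of this idea using Principal Component Analysis from scikit-learn; the synthetic data, the redundant column, and the choice of 3 components are assumptions for illustration only:

```python
# Feature extraction with PCA: 6 original features are replaced by
# 3 new features that are linear combinations of the originals.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=100)  # a redundant feature

pca = PCA(n_components=3)
X_new = pca.fit_transform(X)  # the original 6 columns are discarded

print(X_new.shape)                            # (100, 3)
print(pca.explained_variance_ratio_.sum())    # information retained
```

Note that the extracted components are combinations of all original features, so interpretability is traded for compactness.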
Supervised and Unsupervised
Feature Extraction
High-Level Taxonomy for Feature Engineering
Extended Taxonomy of Supervised
Feature Selection Methods
Feature Selection Categories
Supervised & Unsupervised Feature
Selection
 Supervised feature selection techniques use the target
variable, e.g. methods that remove irrelevant variables.
 Unsupervised feature selection techniques ignore the
target variable, e.g. methods that remove redundant
variables using correlation.
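A sketch of the unsupervised case mentioned above: dropping one column of each highly correlated pair, with no target variable involved. The column names, data, and the 0.9 threshold are arbitrary choices for illustration:

```python
# Unsupervised feature selection: remove redundant columns whose
# absolute pairwise correlation exceeds a threshold.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({"a": rng.normal(size=200),
                   "b": rng.normal(size=200)})
df["c"] = df["a"] * 2 + rng.normal(scale=0.01, size=200)  # redundant with "a"

corr = df.corr().abs()
# Keep only the upper triangle so each pair is inspected once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
reduced = df.drop(columns=to_drop)

print(to_drop)  # the redundant column "c"
```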
General Frameworks of Supervised (a)
and Unsupervised (b) Feature Selection
Feature Selection Methods
Filter Method
 In the Filter method, features are selected based on
statistical measures. It is independent of the learning
algorithm and requires less computational time.
Information gain, the chi-square test, Fisher score,
correlation coefficient, and variance threshold are some
of the statistical measures used to assess the
importance of features.
 This method is best used for preliminary screening: it
can detect constant, duplicated, and correlated features.
It usually does not give the best performance in terms of
reducing features. That said, it should be the first step of
feature reduction, as it can deal with multicollinearity
among the features, depending on the method used.
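A minimal filter-method sketch using scikit-learn: features are scored with a statistical test (the ANOVA F-score here) with no learning algorithm in the loop; the Iris dataset and k=2 are assumptions for illustration:

```python
# Filter method: rank features by a univariate statistical score
# and keep the k best, independently of any model.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)
selector = SelectKBest(score_func=f_classif, k=2)
X_reduced = selector.fit_transform(X, y)

print(X_reduced.shape)        # (150, 2): 4 features reduced to 2
print(selector.get_support()) # boolean mask of the kept features
```

Swapping `f_classif` for `chi2` or `mutual_info_classif` changes the statistical measure without changing the workflow.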
Wrapper Method
 The Wrapper methodology considers the selection of
feature sets as a search problem, where different
combinations are prepared, evaluated, and compared to
other combinations. A predictive model is used to
evaluate a combination of features and assign model
performance scores.
 The performance of the Wrapper method depends on
the classifier. The best subset of features is selected
based on the results of the classifier.
 Wrapper methods are computationally more expensive
than filter methods, due to the repeated learning steps
and cross-validation. However, they are more accurate
than filter methods. Examples include recursive feature
elimination, sequential feature selection algorithms, and
genetic algorithms.
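A sketch of one wrapper method named above, recursive feature elimination: a model is fitted repeatedly and the weakest feature is pruned each round until the requested number remain. The synthetic dataset, the logistic-regression estimator, and the target of 5 features are assumptions for illustration:

```python
# Wrapper method: RFE treats feature selection as a search guided
# by a predictive model's coefficients.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)

estimator = LogisticRegression(max_iter=2000)
rfe = RFE(estimator, n_features_to_select=5)
rfe.fit(X, y)

print(rfe.support_.sum())  # 5 features kept
print(rfe.ranking_)        # rank 1 = selected; higher = pruned earlier
```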
Embedded Method
 In the Embedded method, ensemble learning and hybrid
learning methods are used for feature selection. Because
the decision is collective, its performance is often better
than that of the other two approaches; random forest is
one such example. It is computationally less intensive
than wrapper methods. However, this method has a
drawback: the selected features are specific to the
learning model used.
 In embedded techniques, the feature selection algorithm
is integrated as part of the learning algorithm. The most
typical embedded technique is the decision tree
algorithm, which selects a feature in each recursive step
of the tree-growing process and divides the sample set
into smaller subsets.
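A sketch of the embedded idea with the random forest example from above: training the model itself produces feature importances, which are then used to keep the strongest features. The synthetic data and the median threshold are assumptions for illustration:

```python
# Embedded method: feature importances fall out of model training,
# and SelectFromModel keeps the features above a threshold.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=300, n_features=8,
                           n_informative=3, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
selector = SelectFromModel(forest, threshold="median")
X_reduced = selector.fit_transform(X, y)

print(X_reduced.shape[1])  # features at or above the median importance
```

Note the drawback mentioned above: a different estimator would typically yield different importances and hence a different selected subset.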
Hybrid Method
 Building a hybrid feature selection method depends on
what you choose to combine. The main task is to select
the methods you are going to use, then follow their
processes.
 The idea is to use a ranking (filter) method to generate a
feature ranking list in a first step, then apply a wrapper
method to the top k features from this list. In this way,
the filter-based rankers reduce the feature space of the
dataset and improve the time complexity of the wrapper
methods.
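The two-step recipe above can be sketched as follows; the synthetic data, k=10 for the filter step, and RFE with logistic regression as the wrapper are assumptions for illustration:

```python
# Hybrid method: a cheap filter ranking shrinks the feature space
# before the expensive wrapper search runs on the survivors.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

# Step 1 (filter): keep the top k = 10 features by F-score.
filt = SelectKBest(f_classif, k=10)
X_filt = filt.fit_transform(X, y)

# Step 2 (wrapper): run RFE only on the reduced feature space.
rfe = RFE(LogisticRegression(max_iter=2000), n_features_to_select=5)
X_final = rfe.fit_transform(X_filt, y)

print(X_final.shape)  # (200, 5): 20 -> 10 -> 5 features
```

The wrapper's repeated model fits now search over 10 features instead of 20, which is the time-complexity benefit the text describes.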
Extended Taxonomy of Unsupervised
Feature Selection Methods
