DATA SCIENCE & BIG DATA
ANALYTICS
FEATURE ENGINEERING
(Feature Selection & Extraction)
Dr. S. N. Ahsan
Feature Selection
Feature selection is the process of reducing the number of
input variables when developing a predictive model; in other
words, it is the process of selecting a subset of the most
relevant predictive features for use in building a machine
learning model.
Feature elimination helps a model perform better by weeding
out redundant features and features that provide little insight.
It is also economical in computing power, because there are
fewer features to train on. Results are more interpretable,
and, when the methods are used intelligently, feature selection
reduces the chance of overfitting by detecting collinear
features and can improve model accuracy.
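As a quick illustration of the collinearity point above, the sketch below (a minimal example, assuming a pandas DataFrame of numeric features; the 0.9 cutoff and the drop_collinear helper are illustrative choices, not a fixed recipe) removes one feature from every highly correlated pair:

```python
import numpy as np
import pandas as pd

def drop_collinear(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one feature from every pair whose |correlation| exceeds threshold."""
    corr = df.corr().abs()
    # Inspect only the upper triangle so each pair is checked once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# Example: x2 is a scaled copy of x1, so it gets dropped
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
df = pd.DataFrame({"x1": x1, "x2": 2 * x1 + 0.1, "x3": rng.normal(size=200)})
print(drop_collinear(df).columns.tolist())  # ['x1', 'x3']
```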
Feature Extraction
Feature Extraction aims to reduce the number of
features in a dataset by creating new features from the
existing ones (and then discarding the original features).
This new, reduced set of features should be able to
summarize most of the information contained in the
original set of features. In this way, a summarized
version of the original features is created from a
combination of the original set.
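As an illustrative sketch (not from the slides), principal component analysis below extracts two new features from the four original iris measurements; each new feature is a linear combination of all the originals:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)           # 4 original features
X_scaled = StandardScaler().fit_transform(X)

# Extract 2 new features; the 4 originals are then discarded
pca = PCA(n_components=2)
X_new = pca.fit_transform(X_scaled)

print(X_new.shape)                          # (150, 2)
print(pca.explained_variance_ratio_.sum())  # ~0.96: most information kept
```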
Supervised and Unsupervised
Feature Extraction
High-Level Taxonomy for Feature Engineering
Extended Taxonomy of Supervised
Feature Selection Methods
Feature Selection Categories
Supervised & Unsupervised Feature
Selection
Supervised feature selection techniques use the target
variable; for example, methods that remove irrelevant
variables.
Unsupervised feature selection techniques ignore the
target variable; for example, methods that remove redundant
variables using correlation.
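A minimal sketch of the supervised case (illustrative, using scikit-learn's mutual information scorer; the correlation-based sketch earlier is an unsupervised counterpart, since it never looks at the target):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)   # 30 features

# Supervised: score each feature against the target y and keep the top 10
selector = SelectKBest(mutual_info_classif, k=10)
X_reduced = selector.fit_transform(X, y)
print(X_reduced.shape)                       # (569, 10)
```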
General Frameworks of Supervised (a)
and Unsupervised (b) Feature Selection
Feature Selection Methods
Filter Method
In the Filter method, features are selected based on
statistical measures. It is independent of the learning
algorithm and requires less computational time.
Information gain, the chi-square test, Fisher score,
correlation coefficients, and variance thresholds are some
of the statistical measures used to assess the
importance of features.
This method is well suited to preliminary screening: it
can detect constant, duplicated, and correlated features.
It usually does not give the best performance in terms of
reducing features. That being said, it should be the first
step of feature reduction, since it can deal with
multicollinearity among the features, depending on the
method used.
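A minimal sketch of two of the filter measures named above (the 0.01 variance cutoff and k = 5 are illustrative choices):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2

X, y = load_breast_cancer(return_X_y=True)

# Variance threshold: drop (near-)constant features; no target needed
X_var = VarianceThreshold(threshold=0.01).fit_transform(X)

# Chi-square test: keep the 5 features most dependent on the target
# (chi2 requires non-negative feature values, which holds for this dataset)
X_chi = SelectKBest(chi2, k=5).fit_transform(X_var, y)
print(X.shape, X_var.shape, X_chi.shape)
```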
Wrapper Method
The Wrapper methodology considers the selection of
feature sets as a search problem, where different
combinations are prepared, evaluated, and compared to
other combinations. A predictive model is used to
evaluate a combination of features and assign model
performance scores.
The performance of the Wrapper method depends on
the classifier. The best subset of features is selected
based on the results of the classifier.
Wrapper methods are computationally more expensive
than filter methods, due to the repeated learning steps
and cross-validation. However, they are usually more
accurate than filter methods. Examples include recursive
feature elimination, sequential feature selection
algorithms, and genetic algorithms.
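A minimal sketch of one wrapper method named above, recursive feature elimination; the logistic regression classifier and the choice of 8 features are illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)        # scale for the linear model

# The wrapper refits the classifier repeatedly, dropping the weakest feature each round
rfe = RFE(LogisticRegression(max_iter=5000), n_features_to_select=8)
rfe.fit(X, y)
print(rfe.support_.sum(), "features kept")   # 8
```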
Embedded Method
The Embedded method covers ensemble learning and
hybrid learning methods for feature selection. Since the
selection is based on a collective decision, its performance
is often better than that of the other two approaches.
Random forest is one such example. It is computationally
less intensive than wrapper methods. However, this method
has the drawback of being specific to a learning model.
In embedded techniques, the feature selection algorithm
is integrated as part of the learning algorithm. The most
typical embedded technique is the decision tree
algorithm: decision trees select a feature in each
recursive step of the tree growth process and divide the
sample set into smaller subsets.
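A minimal sketch of embedded selection using the random forest example mentioned above; the feature importances are a by-product of training itself, and SelectFromModel's default cutoff (the mean importance) is used here:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_breast_cancer(return_X_y=True)

# Fitting the forest produces feature importances as a side effect of learning
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Keep only the features whose importance exceeds the mean importance
selector = SelectFromModel(forest, prefit=True)
X_selected = selector.transform(X)
print(X_selected.shape)
```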
Hybrid Method
How a hybrid feature selection method is built depends
on what you choose to combine. The main priority is to
select the methods you are going to use, then follow
their processes.
A common idea is to use filter-based ranking methods to
generate a feature ranking list in a first step, then feed
the top k features from this list into a wrapper method.
In this way, we shrink the feature space of the dataset
with the filter-based rankers, which improves the time
complexity of the wrapper method, as sketched below.
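A minimal sketch of this filter-then-wrapper hybrid (k = 15, the final 5 features, and the logistic regression are all illustrative choices):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)              # 30 features
X = StandardScaler().fit_transform(X)                   # scale for the linear model

# Step 1 (filter): rank by mutual information, keep the top 15
X_top = SelectKBest(mutual_info_classif, k=15).fit_transform(X, y)

# Step 2 (wrapper): run RFE only on the reduced feature space
rfe = RFE(LogisticRegression(max_iter=5000), n_features_to_select=5)
X_final = rfe.fit_transform(X_top, y)
print(X.shape[1], "->", X_top.shape[1], "->", X_final.shape[1])  # 30 -> 15 -> 5
```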
Extended Taxonomy of Unsupervised
Feature Selection Methods