PR - Unit 4
Fisher Discriminant Analysis (FDA), also known as Linear Discriminant Analysis (LDA), is a
technique used in statistics and machine learning to classify or separate data points into
different groups. It's commonly applied when you have two or more classes (or categories) and
want to find a way to distinguish between them.
1. Class Separation: The main goal of FDA is to find a line or direction that best separates
the classes in the data. For instance, if you have two types of flowers, FDA will try to
find the line that best divides them based on their characteristics, like petal length and
width.
2. Maximizing Distance Between Classes: FDA works by finding a direction that
maximizes the distance between the means (centers) of each class while minimizing the
variation (spread) within each class. This way, the classes are as separate as possible in
this new direction.
3. Projection to a New Axis: Once the best direction is found, all data points are projected
onto this axis. This makes the data more distinguishable along a single dimension, even
if it initially had many dimensions.
4. Classification: After projecting the data, we can set a threshold along the new axis to
decide the boundary between classes. If a new data point falls on one side of the
threshold, it belongs to one class; if it falls on the other side, it belongs to the other
class.
Steps in FDA
1. Calculate Class Means: For each class, calculate the average (mean) value for each feature.
2. Compute Scatter Matrices:
o Within-class scatter: Measures the spread of points within each class.
o Between-class scatter: Measures the distance between the means of each class.
3. Find the Optimal Direction: The next step is to find a direction that maximizes the
ratio of between-class scatter to within-class scatter. This is the "discriminant" direction
that best separates the classes.
4. Project Data: Project the original data points onto this direction to get a single value
for each point. This value represents where it falls along the new axis.
5. Classify New Points: Use the projection to assign new points to one of the classes by checking where they fall relative to the threshold on the new axis (a minimal code sketch of these steps follows below).
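For two classes, these steps can be written compactly: the optimal direction is w ∝ Sw⁻¹(m1 − m2), where Sw is the within-class scatter and m1, m2 are the class means. Below is a minimal NumPy sketch of this computation; the function name fisher_direction and the midpoint threshold are illustrative choices, not a fixed standard.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Two-class Fisher discriminant (illustrative sketch).
    Returns the projection direction w and a midpoint threshold."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)   # 1. class means
    S1 = (X1 - m1).T @ (X1 - m1)                # 2. within-class scatter
    S2 = (X2 - m2).T @ (X2 - m2)
    Sw = S1 + S2
    w = np.linalg.solve(Sw, m1 - m2)            # 3. w ∝ Sw^-1 (m1 - m2); assumes Sw is invertible
    threshold = w @ (m1 + m2) / 2               # 4./5. midpoint between the projected class means
    return w, threshold

# To classify a new point x: assign class 1 if w @ x > threshold, else class 2.
```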
Imagine you have two types of flowers, and you measured the petal length and petal width of
each. The two types of flowers form two clusters in this feature space. FDA will try to find a
line that best divides these two clusters by considering the centers and spreads of the clusters.
Once it finds this line, you can project new flowers onto it and determine which type they are
based on where they fall on the line.
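As a quick illustration, with made-up petal measurements and reusing the fisher_direction sketch above:

```python
import numpy as np

# Hypothetical petal measurements (length, width) for two flower types
type_a = np.array([[4.7, 1.4], [4.5, 1.5], [4.9, 1.5], [4.6, 1.3]])
type_b = np.array([[1.4, 0.2], [1.3, 0.2], [1.5, 0.4], [1.4, 0.3]])

w, threshold = fisher_direction(type_a, type_b)   # from the sketch above
new_flower = np.array([4.8, 1.4])
print("type A" if w @ new_flower > threshold else "type B")   # -> type A
```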
Advantages:
• Simple and interpretable, as it finds a single direction that best separates classes.
• Effective for linearly separable data (where a line or plane can split the classes).
Limitations:
• Assumes that the data in each class is roughly normally distributed with the same spread (covariance), which may not always be true.
• May not work well if classes are not linearly separable.
In summary, Fisher Discriminant Analysis helps separate classes by finding a single direction
where the differences between them are most pronounced, making classification easier and
more accurate.
Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a technique for reducing the number of variables in a dataset while keeping the most important information. Common reasons to use it:
1. Data Simplification:
o Imagine trying to understand a dataset with hundreds of variables. PCA helps
condense these variables into a smaller number of "principal components" that
capture the most important patterns.
2. Remove Redundancy:
o Many datasets have correlated variables (e.g., height and weight are related).
PCA identifies and removes this redundancy.
3. Better Visualization:
o High-dimensional data is hard to visualize. PCA reduces the dataset to two or
three dimensions so you can plot and see patterns.
4. Speed Up Computation:
o Fewer variables mean faster computations, especially in machine learning
models.
Think of PCA as finding a new way to look at the data. Instead of using the original variables,
it creates principal components, which are new variables that capture most of the variation in
the data.
How PCA Works
1. Standardize the Data: Data is standardized so that all variables have the same scale (mean = 0, standard deviation = 1). This is important because variables measured on different scales (e.g., height in meters vs. weight in kilograms) can dominate the results.
2. Compute the Covariance Matrix: The covariance matrix measures how variables relate to each other. If two variables change together (e.g., height and weight increase together), they will have a high covariance.
3. Compute Eigenvalues and Eigenvectors: Principal components are ranked by their eigenvalues. The first principal component captures the most variation, the second captures the next most, and so on.
4. Select Components: You choose how many components to keep, typically enough to explain 90-95% of the total variation in the data.
5. Project the Data: The original data is projected onto the selected principal components, creating a new dataset with fewer dimensions (a code sketch of these steps follows after this list).
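A minimal NumPy sketch of these five steps; the function name pca and the choice to keep a fixed number of components are illustrative assumptions.

```python
import numpy as np

def pca(X, n_components=2):
    """Minimal PCA via the covariance matrix (illustrative sketch)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)            # 1. standardize
    C = np.cov(Xs, rowvar=False)                         # 2. covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)                 # 3. eigenvalues / eigenvectors
    order = np.argsort(eigvals)[::-1]                    #    rank components by eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :n_components]                        # 4. keep the top components
    explained = eigvals[:n_components].sum() / eigvals.sum()
    return Xs @ W, explained                             # 5. project the data
```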
Key Terms
• Principal Components:
o These are new variables created by PCA. Each principal component is a linear
combination of the original variables.
• Variance:
o A measure of how spread out the data is. PCA aims to capture the directions
where the data varies the most.
• Dimensionality Reduction:
o The process of reducing the number of variables (dimensions) in a dataset while keeping as much of the important variation as possible.
Example
Imagine you're analyzing the preferences of customers in a bakery. You've collected data on how much they like cookies, cakes, and pastries. Here's what PCA would do:
1. Standardize: Make sure that "likes cookies," "likes cakes," and "likes pastries" are
measured on the same scale.
2. Covariance Matrix: Calculate how much liking cookies is related to liking cakes and
pastries.
3. Eigenvalues and Eigenvectors: Identify the directions (principal components) that
capture the most customer preference patterns.
4. Select Components: If the first two components capture 95% of the variation, you’ll
use only these two.
5. Transform: Combine "likes cookies," "likes cakes," and "likes pastries" into just two
new variables (the first two principal components).
Now, instead of analyzing three variables, you work with two, making the analysis simpler
without losing much information.
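In practice these steps are usually a few library calls. The snippet below uses scikit-learn; the 1-5 ratings are invented purely for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical 1-5 ratings: columns = likes_cookies, likes_cakes, likes_pastries
ratings = np.array([[5, 4, 5], [2, 1, 2], [4, 5, 4], [1, 2, 1], [5, 5, 4]], dtype=float)

scaled = StandardScaler().fit_transform(ratings)   # step 1: standardize
pca = PCA(n_components=2).fit(scaled)              # steps 2-4: covariance, eigenvectors, selection
scores = pca.transform(scaled)                     # step 5: project onto 2 new variables

print(scores.shape)                         # (5, 2): two components per customer
print(pca.explained_variance_ratio_.sum())  # fraction of the variation they retain
```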
Advantages of PCA
1. Simplifies Analysis:
o Reduces the number of variables while keeping the core patterns.
2. Improves Performance:
o Speeds up machine learning algorithms by reducing dimensions.
3. Removes Noise:
o Filters out less important information (lower-variance components).
4. Visualization:
o Helps visualize high-dimensional data in 2D or 3D plots.
Limitations of PCA
1. Loss of Interpretability:
o Principal components are combinations of original variables, so they may not
have a clear meaning.
When to Use PCA
• When you have too many variables and they're highly correlated.
• When you want to visualize high-dimensional data.
• When you want to reduce the complexity of your machine learning model.
In summary, PCA is a powerful tool for reducing the number of variables in a dataset while
keeping the most important information. It simplifies analysis, speeds up computation, and
makes data easier to visualize and interpret.
In pattern recognition, factor analysis and linear discriminant analysis (LDA) are techniques
used to identify and classify patterns in data. Gradient descent is an optimization method often
used to "teach" these models by finding the best parameters that minimize some form of error.
1. Factor Analysis
Factor analysis is a technique used to reduce the number of variables (features) in data by
finding a few "factors" that can explain the relationships among the observed variables. This
is useful in pattern recognition because it simplifies complex data, making it easier to identify
meaningful patterns.
For example, imagine we have a dataset with numerous observed features like age, height,
weight, income, and education level. These features could be combined into fewer "factors"
(e.g., lifestyle or socioeconomic status) that capture the main patterns without needing all the
original data.
2. Linear Discriminant Analysis (LDA)
LDA is a technique used for classifying data into different categories (or classes) by finding
a line (or a decision boundary) that best separates those classes. It’s commonly used in pattern
recognition to tell which category a new data point should belong to.
For example, suppose you have a dataset with two classes, say, "Apples" and "Oranges." Each
fruit has features like weight and color intensity. LDA finds a decision boundary (or line) that
separates the "Apples" from the "Oranges" by maximizing the distance between the two
classes.
3. Gradient Descent
Gradient descent is an optimization technique that can be used in both factor analysis and LDA to "tune" the model by adjusting its parameters (weights) so as to minimize an error measure, for example a reconstruction error in factor analysis or the classification error in LDA. The basic loop is:
1. Initialize Parameters: Start with some random values for the parameters (like the
factors or weights in the model).
2. Calculate the Error: Check how far off the current model predictions are from the
actual class labels.
3. Update Parameters: Adjust the parameters a little in the direction that reduces the
error, based on the gradient.
4. Repeat: Keep doing this until the error stops decreasing or becomes very small, meaning the model has learned the best parameters to recognize patterns in the data (a minimal sketch of this loop follows below).
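A minimal sketch of this loop, assuming a linear decision function f(x) = w·x + b trained with a squared-error loss on labels coded as +1/−1 (one simple choice among many):

```python
import numpy as np

def train_linear_boundary(X, y, lr=0.01, epochs=500):
    """Gradient descent on a squared-error loss for a linear decision
    function f(x) = w.x + b, with labels y in {+1, -1} (illustrative)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0                 # 1. initialize parameters
    for _ in range(epochs):
        error = X @ w + b - y               # 2. how far predictions are from the labels
        grad_w = (2 / n) * (X.T @ error)    # 3. gradient of the mean squared error
        grad_b = (2 / n) * error.sum()
        w -= lr * grad_w                    #    step a little against the gradient
        b -= lr * grad_b
    return w, b                             # 4. repeat until the error stops shrinking

# Classify a new point x as +1 if x @ w + b > 0, else -1.
```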
Example
Imagine you’re a farmer with a data set of fruits with features like size and color intensity.
You want to classify these fruits as either apples or oranges.
1. Applying LDA: You’ll use LDA to find the best boundary (line) that separates apples
from oranges.
2. Using Gradient Descent in LDA:
o You start by randomly setting the boundary line and calculate how many fruits
are misclassified.
o If 20% of apples are misclassified as oranges and 15% of oranges as apples,
that’s the "error."
o Gradient descent will adjust the boundary line slightly to reduce this error, step
by step.
o After enough iterations, the boundary line will have adjusted to minimize
misclassification.
Without gradient descent, it would be challenging to find the ideal parameters that best separate
your classes. Gradient descent systematically reduces the error, leading to a model that
effectively recognizes patterns by tuning itself based on the data.
So, in pattern recognition using factor analysis or LDA, gradient descent is crucial because it
enables the model to "learn" the parameters that best explain and classify the data.
The Perceptron is a basic unit in neural networks and an algorithm for classification in pattern
recognition. It’s a type of linear classifier that’s particularly useful for understanding how
models like linear discriminant analysis (LDA) work, as it shares a similar goal: separating
data points into different classes.
In the context of pattern recognition, factor analysis, and linear discriminant functions, a
Perceptron can be used to classify data into categories by finding a boundary (usually a line)
that separates those categories based on their features.
1. What is a Perceptron?
A Perceptron is a simple algorithm that takes multiple input values (features) and outputs a
single value (a decision about classifying the data into one of two classes). It's based on a
weighted sum of inputs and a threshold:
1. Weighted Sum of Inputs: Each input (feature) is multiplied by a weight that shows its
importance, and all the weighted inputs are added up.
2. Activation/Threshold: If this weighted sum exceeds a certain threshold, the Perceptron
"fires" (outputs 1), and if not, it does not fire (outputs 0). This is a very simple way of
deciding if an input belongs to one class or another.
2. Perceptron in Pattern Recognition
In pattern recognition, you might have data with two classes that you want to separate, such as
images of cats and dogs. Each image has features like size, color, and shape. The Perceptron
will try to learn weights for each feature to create a decision boundary that separates cats from
dogs.
3. Training a Perceptron
The Perceptron learns how to classify data by adjusting its weights through a process called
training, using gradient descent or Perceptron learning rule. During training:
1. The Perceptron receives an input and makes an initial guess about the class.
2. If it’s wrong, it adjusts the weights a little bit to improve its next prediction.
3. This process is repeated until the Perceptron can separate the classes correctly (a minimal training sketch follows below).
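A minimal sketch of the Perceptron learning rule; the function names perceptron_fit and perceptron_predict are illustrative.

```python
import numpy as np

def perceptron_fit(X, y, lr=0.1, epochs=100):
    """Perceptron learning rule (illustrative sketch); labels y in {0, 1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            guess = 1 if xi @ w + b > 0 else 0   # 1. guess with the current weights
            error = yi - guess                   # 2. if wrong, error is +1 or -1
            w += lr * error * xi                 #    nudge weights toward the right answer
            b += lr * error
    return w, b                                  # 3. repeat until the classes are separated

def perceptron_predict(X, w, b):
    return (X @ w + b > 0).astype(int)
```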
Example
Suppose you want to classify emails as either "spam" or "not spam" based on features like:
• whether the email contains the word "free"
• the length of the email
• the presence of other specific keywords
The Perceptron will learn weights for each of these features. For example:
• "free" might have a high weight because it’s often in spam emails.
• Email length might have a smaller weight because both spam and non-spam emails
can be short or long.
• Specific keywords might have weights based on their relevance.
During training, the Perceptron will adjust these weights to minimize misclassifications. After
training, if you input a new email, it will use these learned weights to classify the email as spam
or not spam.
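As a toy illustration, reusing the perceptron_fit sketch above with hypothetical, hand-made feature values:

```python
import numpy as np

# Hypothetical features: contains "free" (0/1), email length (scaled), count of other keywords
X = np.array([[1, 0.2, 3],
              [0, 0.8, 0],
              [1, 0.1, 2],
              [0, 0.5, 1]], dtype=float)
y = np.array([1, 0, 1, 0])                   # 1 = spam, 0 = not spam

w, b = perceptron_fit(X, y)                  # learn the feature weights
new_email = np.array([[1, 0.3, 4]])          # contains "free", short, many keywords
print(perceptron_predict(new_email, w, b))   # -> [1], classified as spam
```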
In linear discriminant functions like LDA, the goal is similar: to find a decision boundary
that separates classes. The Perceptron, as a linear classifier, also finds a boundary, but it does
so with an iterative learning process that adjusts weights based on errors. The Perceptron only
works well when the classes are linearly separable (can be separated by a straight line or
plane).
Summary
• The Perceptron is a simple linear classifier that decides between two classes based on
a weighted sum of inputs.
• It’s trained to minimize classification errors by adjusting weights, much like how
gradient descent tunes parameters in other models.
• In pattern recognition, the Perceptron can classify objects (like spam vs. not spam) by
finding a linear boundary based on input features.
Support Vector Machines (SVMs) are typically used for classification tasks and work well with
numeric data. However, when dealing with non-numeric (categorical) data or when a non-
metric approach is needed, modifications are required. Let's break down the concepts in simple
language, focusing on how SVMs can be adapted for non-metric data (data that isn't just
numbers or doesn't follow traditional distance-based metrics).
Non-metric data includes things like categorical variables (e.g., color, type of product, or even
text data). This data doesn't have a natural ordering or numeric relationship, so measuring
"distances" between points isn't straightforward.
1. Kernel Trick
The kernel trick is a core part of SVMs that allows it to work in non-linear spaces, meaning
you don’t always need numeric inputs in the traditional sense. For non-metric data, the kernel
function is modified to measure the similarity between different categories rather than
measuring "distance."
Example:
Imagine you're classifying animals into "mammals" and "non-mammals." You might have categorical features like "has fur," "lays eggs," "number of legs," etc. Here, instead of numeric distances, you can use a kernel function that compares these categories directly. One such kernel is the Hamming kernel, which compares two sets of categorical features position by position, counting how many entries match or mismatch.
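One simple way to realize such a kernel (an illustrative choice; textbook definitions vary) is the fraction of matching positions between two categorical feature vectors:

```python
import numpy as np

def overlap_kernel(A, B):
    """Hamming-style similarity: fraction of categorical positions that match."""
    K = np.zeros((len(A), len(B)))
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            K[i, j] = sum(x == y for x, y in zip(a, b)) / len(a)
    return K

animals = [("has fur", "no eggs", "4 legs"),
           ("no fur", "lays eggs", "2 legs"),
           ("has fur", "no eggs", "2 legs")]
K = overlap_kernel(animals, animals)
print(K)   # 3x3 similarity matrix, e.g. K[0, 2] = 2/3 (two attributes match)

# K can be handed to an SVM that accepts precomputed kernels,
# e.g. sklearn.svm.SVC(kernel="precomputed").fit(K, labels)
```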
2. Encoding Categorical Data
One way to handle non-metric or categorical data is by converting it into a numeric format. Some common methods for encoding are:
• One-Hot Encoding: Each category gets its own binary (0/1) column, so no artificial ordering is introduced (e.g., blue becomes [0, 1, 0] across the color columns).
• Label Encoding: Each category is assigned a unique number (e.g., red = 1, blue = 2, green = 3). This is simpler but assumes some implicit ordering or relationship between categories, which might not always be appropriate.
Once the data is transformed into numeric form, the traditional SVM approach can be applied.
Example: You're classifying clothing items based on "color" and "type" (shirt, pants, jacket). After one-hot encoding, each clothing item is represented numerically (e.g., [0,1,0] for blue and [1,0,0] for a shirt), and SVM can be used as usual (a short encoding sketch follows below).
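A short sketch of both encodings using pandas, with a made-up clothing table:

```python
import pandas as pd

# Made-up clothing data with two categorical features
items = pd.DataFrame({"color": ["blue", "red", "green"],
                      "type":  ["shirt", "pants", "shirt"]})

# One-hot encoding: every category becomes its own 0/1 column
one_hot = pd.get_dummies(items, dtype=int)      # e.g. color_blue, type_shirt, ...

# Label encoding: every category becomes an integer (implies an ordering!)
label_encoded = items.apply(lambda col: col.astype("category").cat.codes)

print(one_hot.values)        # numeric matrix a standard SVM can consume
print(label_encoded.values)
```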
3. String Kernels
For text data or any non-numeric sequences (like DNA sequences), you can use specialized
kernels like the String Kernel. This kernel measures the similarity between two strings or
sequences based on their common subsequences.
Example:
Suppose you are comparing two short DNA sequences:
• ACGT
• ACTT
The String Kernel would look for shared subsequences like "AC" and "T" to calculate how
similar they are, without needing to turn the data into numbers. This similarity measure can
then be used in the SVM algorithm.
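Full subsequence kernels are somewhat involved; the sketch below uses a simplified "spectrum" variant that counts shared k-length substrings, which conveys the same idea of scoring string similarity without converting the data to numbers.

```python
from collections import Counter

def spectrum_kernel(s, t, k=2):
    """Simplified string kernel: number of shared k-length substrings,
    counted with multiplicity (a 'spectrum' kernel)."""
    kmers_s = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    kmers_t = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(kmers_s[m] * kmers_t[m] for m in kmers_s)

print(spectrum_kernel("ACGT", "ACTT"))   # the 2-mer "AC" is shared -> 1
print(spectrum_kernel("ACGT", "ACGT"))   # all three 2-mers shared  -> 3
```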
4. Graph Kernels
If your data has a structured, non-numeric form (e.g., graphs representing chemical compounds
or social networks), graph kernels can be used. These kernels compare graphs based on their
structure and can work with SVM to classify them.
Example:
Imagine you're classifying different chemical molecules. Each molecule can be represented as
a graph, where atoms are nodes, and bonds are edges. A graph kernel would compare the
structures of these molecules, rather than comparing numeric values, and feed that similarity
information to the SVM.
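Real graph kernels (random-walk, Weisfeiler-Lehman, and others) are more sophisticated, but a toy version that compares node-degree histograms shows the general idea of turning graph structure into a similarity score:

```python
import numpy as np

def degree_histogram_kernel(adj_a, adj_b, max_degree=5):
    """Toy graph kernel: dot product of the two graphs' node-degree histograms."""
    def hist(adj):
        degrees = np.asarray(adj).sum(axis=1).astype(int)
        return np.bincount(degrees, minlength=max_degree + 1)[:max_degree + 1]
    return float(hist(adj_a) @ hist(adj_b))

# Two tiny molecules given as adjacency matrices (made-up structures)
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path     = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(degree_histogram_kernel(triangle, path))   # similarity score between the two graphs
```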
Example: Classifying Animals
Problem: Classify animals based on features like "type of movement" (walks, flies, swims), "skin type" (fur, feathers, scales), and "habitat" (land, water, air).
After one-hot encoding these categorical features, each animal is represented as a vector of binary values like [1, 0, 0, 1, 0, 0, 1, 0, 0], which corresponds to "walks", "fur", and "land."
After encoding, the dataset becomes numeric. The SVM can now look for the best hyperplane
that separates animals based on the encoded features.
If the data isn’t linearly separable (i.e., can’t be classified with a straight hyperplane), the kernel
trick can be applied. A specialized kernel (like Hamming or String kernel) can help measure
similarity in a way that makes classification more accurate.
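An end-to-end sketch of this example with scikit-learn, using a tiny made-up animal table:

```python
import pandas as pd
from sklearn.svm import SVC

# Tiny made-up animal table with categorical features
animals = pd.DataFrame({
    "movement": ["walks", "flies", "swims", "walks"],
    "skin":     ["fur", "feathers", "scales", "fur"],
    "habitat":  ["land", "air", "water", "land"],
})
labels = ["mammal", "bird", "fish", "mammal"]

X = pd.get_dummies(animals, dtype=int)      # one-hot encode the categories
clf = SVC(kernel="linear").fit(X, labels)   # find the separating hyperplane
print(clf.predict(X[:1]))                   # expected: ['mammal'] on this toy data
```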
Conclusion
In SVMs, handling non-metric or non-numeric data requires adapting how the data is processed or how the similarity between data points is calculated. Key approaches include encoding categorical data numerically (one-hot or label encoding) and using specialized kernels (Hamming-style, string, or graph kernels) that measure similarity directly on the original data.
The idea is that even though SVM traditionally works with numeric data, there are ways to
extend it to handle more complex, non-metric types of data by transforming the problem into
something that SVM can still handle effectively.
A Decision Tree is a simple and intuitive model used for classification and regression tasks.
It makes decisions by splitting data into smaller subsets based on the value of the input features.
Decision trees are often visualized as trees where each node represents a decision, and each
branch represents an outcome of that decision.
The CART (Classification and Regression Trees) algorithm is one of the most widely used
methods for constructing decision trees. As its name suggests, it can handle both classification
(when predicting categories like "spam" or "not spam") and regression (when predicting
continuous values like temperature or price).
1. Root Node: This is the first node of the tree where the decision-making process starts.
It represents the entire dataset.
2. Internal Nodes: These are the decision points in the tree, where a specific feature
(input) is selected to split the data into two or more groups based on a condition.
3. Branches: These represent the possible outcomes of a decision. Each branch
corresponds to a specific subset of the data.
4. Leaf Nodes: These are the terminal nodes (endpoints) of the tree where no further
decisions are made. In a classification tree, each leaf node represents a predicted class;
in a regression tree, it represents a predicted value.
The CART algorithm splits the data into smaller and smaller subsets, step by step, in a greedy
fashion, which means it tries to make the best split at each step, aiming to improve the accuracy
of the tree. The splits are based on certain criteria:
• For classification (CART for classification): CART uses a measure called Gini Impurity to choose the best splits. The Gini Impurity measures how "pure" a node is, i.e., how much the data in the node belong to one class. A pure node contains data from only one class (a small sketch of this split criterion follows after these bullets).
• For regression (CART for regression): CART minimizes the Mean Squared Error
(MSE) at each step to make better predictions. This helps the model predict continuous
values like house prices or temperatures.
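A minimal sketch of the Gini computation and the search for the best split threshold on a single numeric feature (real CART implementations also handle categorical splits and search over all features):

```python
import numpy as np

def gini(labels):
    """Gini Impurity of a node: 1 - sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Threshold on one numeric feature x that minimizes the weighted
    Gini Impurity of the two child nodes."""
    x, y = np.asarray(x), np.asarray(y)
    best_t, best_score = None, float("inf")
    for t in np.unique(x):
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Example: ages and whether the person bought a computer (made-up data)
print(best_split([22, 25, 47, 52, 46], ["yes", "yes", "no", "no", "no"]))
# best threshold 25 gives two pure children (weighted impurity 0.0)
```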
Let's say you are trying to build a decision tree to classify whether a person will buy a computer based on the following features:
1. Age (Young, Middle-aged, Senior)
2. Student (Yes or No)
3. Income (High or Low)
The CART algorithm starts by analyzing the data and selecting the feature that best separates
the people who buy computers from those who don’t.
For example, let's say that Age is the best feature to split on. The root node might ask: "Is the person young?"
Now, each branch becomes a new node, and the algorithm looks for another feature to split on.
• For the Yes (Young) branch, the best feature might be whether they are a student.
• For the No branch (Middle-aged and Senior), the algorithm may choose Income as the
best feature to split on.
This process continues until the algorithm can’t make any more useful splits, and the tree
reaches its leaf nodes.
• One leaf node might predict: If the person is young and a student, they will buy a
computer.
• Another leaf node might predict: If the person is middle-aged and has high income,
they will buy a computer.
At each split, the CART algorithm tries to minimize the Gini Impurity for classification tasks.
It keeps choosing the best feature and threshold for splitting the data.
Now, let’s use a regression example where we want to predict house prices based on:
1. Number of Bedrooms
2. Square Footage
3. Distance from City Center
The CART algorithm examines the data and decides that Square Footage is the most important feature for predicting house prices. The root node might ask: "Is the Square Footage greater than 2000?"
For the branch where Square Footage > 2000, the algorithm might decide that Number of Bedrooms is the next best feature to split on. The tree might then ask: "Is the Number of Bedrooms greater than 3?"
The tree continues to split the data until it reaches leaf nodes, where each leaf represents a
predicted house price.
• One leaf node might predict that if Square Footage > 2000 and Number of Bedrooms
> 3, the price is $500,000.
• Another leaf node might predict that if Square Footage ≤ 2000 and Distance from
City Center > 10 miles, the price is $250,000.
At each step, the CART algorithm tries to minimize the Mean Squared Error (MSE) for
regression tasks, ensuring that the splits lead to more accurate predictions.
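The regression version swaps Gini Impurity for MSE; a sketch of the corresponding split search (with made-up square-footage and price numbers):

```python
import numpy as np

def mse(values):
    """Mean squared error when the node predicts its own mean."""
    return float(np.mean((values - values.mean()) ** 2)) if len(values) else 0.0

def best_regression_split(x, y):
    """Threshold on a numeric feature that minimizes the weighted MSE
    of the two child nodes (regression analogue of the Gini search)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    best_t, best_score = None, float("inf")
    for t in np.unique(x):
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        score = (len(left) * mse(left) + len(right) * mse(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Example: square footage vs. price (made-up numbers)
sqft  = [1200, 1500, 1800, 2200, 2600]
price = [200_000, 230_000, 260_000, 480_000, 520_000]
print(best_regression_split(sqft, price))   # splits at 1800 sq ft on this toy data
```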
• Stopping Criteria: The tree-building process will stop when the data is perfectly
separated, or when a minimum number of data points are left in each node (to avoid
overfitting). Sometimes, growing a tree until every data point is classified perfectly can
lead to overly complex trees, which don’t generalize well to new data.
• Pruning: To prevent overfitting (where the tree is too complex and captures noise in
the training data), we can "prune" the tree. This means removing branches that add
complexity but don’t improve accuracy much. This leads to a simpler, more
generalizable model.
1. Easy to Interpret: Decision trees are very intuitive and easy to understand, even by
non-experts. You can visualize the entire decision-making process as a flowchart.
2. Handles Both Types of Data: CART can handle both classification and regression
tasks.
3. No Need for Feature Scaling: Decision trees don’t require features to be scaled or
normalized, as the splits are based on the actual values of the features.
4. Non-linear Relationships: Decision trees can capture non-linear relationships between
features and target variables, as they repeatedly split the data into smaller chunks.
1. Overfitting: Decision trees can easily overfit the training data, especially if they grow
too deep. This can make them sensitive to small changes in the data.
2. Instability: Small changes in the data can lead to very different trees being generated.
3. Less Accurate Compared to Other Models: On their own, decision trees may not be
as accurate as more complex models like random forests or boosting techniques.
Example Summary
• Task: Predict if someone will buy a house based on income and credit score.
• CART Decision Tree:
1. Root Node: Is income > $50,000?
2. Split: Yes (high-income) or No (low-income).
3. Further Split: For high-income people, check credit score > 700.
4. Predictions: Leaf nodes might say: Yes, they will buy a house (if income >
$50,000 and credit score > 700) or No, they won’t buy a house (if income ≤
$50,000).
The tree follows a series of decisions based on the features to arrive at a prediction.
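A quick way to reproduce this kind of tree with scikit-learn, on hypothetical income and credit-score data invented for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical data: income (in $1000s), credit score, and whether the person buys
X = np.array([[60, 720], [80, 650], [40, 700], [90, 750], [30, 600], [70, 710]])
y = np.array([1, 0, 0, 1, 0, 1])    # 1 = buys a house, 0 = does not

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["income", "credit_score"]))  # shows the learned splits
```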
Conclusion