Maths Roadmap For Machine Learning

Module Topic

Scalars What are scalars

Vectors What are Vectors


Row Vector and Column Vector
Distance from Origin
Euclidean Distance between 2 vectors
Scalar Vector Addition/Subtraction (Shifting)
Scalar Vector Multiplication/Division (Scaling)
Vector Vector Addition/Subtraction

Dot Product of 2 vectors


Angle between 2 vectors

Unit Vectors
Projection of a Vector
Basis Vectors

Equation of a Line in n-D

Vector Norms[L]

Linear Independence

Vector Spaces

Matrix What are Matrices?


Types of Matrices
Orthogonal Matrices
Symmetric Matrices
Diagonal Matrices
Matrix Equality
Scalar Operations on Matrices
Matrix Addition and Subtraction
Matrix Multiplication
Transpose of a Matrix
Determinant
Minor and Cofactor
Adjoint of a Matrix
Inverse of a Matrix
Rank of a Matrix

Column Space and Null Space [L]

Change of Basis [L]

Solving a System of linear equations

Linear Transformations
3d Linear Transformations
Matrix Multiplication as Composition

Linear Transformation of Non-square Matrices

Dot Product
Cross Product [L]

Tensors What are Tensors


Importance of Tensors in Deep Learning
Tensor Operations
Data Representation using Tensors

Eigen Values and Vectors Eigen Vectors and Eigen Values


Eigen Faces [L]
Principal Component Analysis [L]

Matrix Factorization LU Decomposition[L]


QR Decomposition[L]
Eigen Decomposition [L]
Singular Value Decomposition[L]
Non-Negative Matrix Factorization[L]

Advanced Topics Moore-Penrose Pseudoinverse[L]


Quadratic Forms[L]
Positive Definite Matrices[L]
Hadamard Product[L]

Tools and Libraries Numpy


Scipy[L]
Usage in Machine Learning
A scalar is a single numeric quantity, fundamental in machine learning for computations, and in deep learning for things like learning rates and loss values.
In machine learning, vectors can represent data points, while in deep learning, they can represent features, weights, and biases.
These representations matter because they affect computations like matrix multiplication, critical in areas like neural network operations.
The distance from the origin is used in machine learning for operations like normalization, while in deep learning, it can help understand the magnitude of weights or feature vectors.
Euclidean distance is used in machine learning for nearest neighbor search, and also in deep learning in loss functions like Mean Squared Error.
These operations can shift vectors, useful in machine learning for data normalization and centering. In deep learning, they are employed for operations like bias correction.
Scalar and vector multiplication/division can be used for data scaling in machine learning. In deep learning, it's used to control the learning rate in optimization algorithms.
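A minimal Numpy sketch of shifting and scaling (the array values are invented for illustration):

import numpy as np

x = np.array([2.0, 4.0, 6.0])

shifted = x - x.mean()  # shifting: center the data around zero
scaled = x / 10.0       # scaling: e.g. shrink values to a smaller range

print(shifted)  # [-2.  0.  2.]
print(scaled)   # [0.2 0.4 0.6]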
These are fundamental operations for combining or comparing vectors, used across machine learning and deep learning for computations on data and weights.
The dot product is used in machine learning to compute similarity measures and perform computations in more advanced algorithms. In deep learning, it's crucial in operations like calculating the weighted sums in a neural network layer.
The angle between two vectors is useful when comparing vectors in applications like recommender systems, and also in deep learning when examining the relationships between high-dimensional vectors.
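A short sketch of both ideas with Numpy (the vectors are chosen arbitrarily):

import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

dot = np.dot(u, v)  # weighted-sum style computation: 1*4 + 2*5 + 3*6 = 32.0

# angle via cos(theta) = (u . v) / (|u| |v|)
cos_theta = dot / (np.linalg.norm(u) * np.linalg.norm(v))
angle = np.arccos(np.clip(cos_theta, -1.0, 1.0))  # in radians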
They're equally significant in deep learning, particularly when it comes to generating directionally consistent weight updates.
The projection of a vector can be used for dimensionality reduction in machine learning and can be useful in deep learning for visualizing high-dimensional data or features.
Basis vectors underpin transformations useful in algorithms like PCA and SVD. In deep learning, understanding basis vectors can be useful for interpreting the internal representations that a network learns.
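As a quick illustration, projecting one vector onto another with Numpy (example vectors are arbitrary):

import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 0.0])  # direction to project onto

# projection of a onto b: (a . b / b . b) * b
proj = (np.dot(a, b) / np.dot(b, b)) * b
print(proj)  # [3. 0.]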
The equation of a line in n-D is used in machine learning for tasks like linear regression, and is also crucial in deep learning where hyperplanes (an n-D extension of a line) are used to separate classes in high-dimensional space.
Vector norms are used in regularization terms, which can control the complexity of the model, and in normalization techniques such as batch and layer normalization.
Linear independence helps identify redundant features or parameters. PCA assumes that the principal components are linearly independent.
In deep learning, each layer of a neural network can be seen as transforming one vector space
(the layer's input) into another vector space (the layer's output).
In machine learning, matrices are often used to represent sets of features, model parameters, or transformations of data.
Different types of matrices serve specific purposes, such as the identity matrix in linear algebra operations, or sparse matrices for handling large, high-dimensional data sets efficiently.
Orthogonal matrices appear in dimensionality reduction techniques. In deep learning, orthogonal matrices are often used to initialize weights in a way that prevents vanishing or exploding gradients.
Symmetric matrices are important because of their desirable properties, like always having real eigenvalues. Covariance matrices in statistics are an example of symmetric matrices.
Diagonal matrices are used in machine learning for quadratic forms, while in deep learning, the diagonal matrix structure is used in constructing learning rate schedules for stochastic optimization.
Matrix equality is fundamental to many machine learning and deep learning algorithms, for example, when checking convergence of algorithms.
Scalar operations are used to adjust all elements of a matrix by a fixed value. This is used in machine learning and deep learning for data scaling, weight updates, and more.
These operations are used to combine or compare datasets or model parameters, among other things.
This operation is central to many algorithms in both machine learning and deep learning, like linear regression or forward propagation in neural networks.
Transposing a matrix is important for operations like computing the dot product between two vectors, or performing certain types of matrix multiplication.
The determinant appears in machine learning when working with multivariate normal distributions. In deep learning, the determinant is often used in advanced topics like volume-preserving transformations in flow-based models.
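A compact Numpy sketch of these three operations (the matrices are chosen arbitrarily):

import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

C = A @ B             # matrix multiplication, e.g. a linear layer's forward pass
At = A.T              # transpose
d = np.linalg.det(A)  # determinant: -2.0 for this A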
While minors and cofactors aren't directly used in many machine learning algorithms, they're fundamental to the underlying linear algebra.
The adjoint is used to compute the inverse of a matrix, which is crucial in solving systems of linear equations, often found in machine learning algorithms.
The inverse of a matrix is used in algorithms that solve linear systems directly (like linear regression), and in deep learning, it's used to investigate the properties of weight matrices. A related tool is Moore-Penrose inversion, which can be used to calculate weights in certain network architectures.
The rank of a matrix is important for understanding the solvability of a system of equations, which can arise in algorithms like linear regression.
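A small sketch with Numpy, solving a toy 2x2 system (the numbers are invented):

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

A_inv = np.linalg.inv(A)      # explicit inverse (A is full rank here)
r = np.linalg.matrix_rank(A)  # rank: 2
x = np.linalg.solve(A, b)     # preferred over A_inv @ b for numerical stability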
A change of basis re-expresses data or parameters between different coordinate systems. This is often used in dimensionality reduction techniques like PCA, or when visualizing high-dimensional feature spaces.
Many machine learning problems, such as linear regression, boil down to solving a system of linear equations. In deep learning, backpropagation can be seen as a process of solving a system of equations to find the best parameters.
Linear transformations map vectors while preserving relationships between points. This is a fundamental operation in many machine learning and deep learning algorithms, from simple regression to complex neural networks.
These transformations preserve points, lines, and planes. They're often used in machine learning for visualization and geometric interpretations of data.
This is used extensively in deep learning, where each layer of a neural network can be seen as a matrix transformation of the input.
Non-square matrices are common in machine learning because the number of features doesn't usually match the number of data points. Their transformations can be used for dimensionality reduction or feature construction.
The dot product is used in machine learning to compute similarity measures and in deep learning, for instance, to calculate the weighted sum of inputs in a neural network layer.
The cross product yields a vector orthogonal to the original vectors. In machine learning, it's used less often due to its restriction to three dimensions, but it might appear in specific applications that involve 3D data.
In machine learning and deep learning, tensors are used to represent and manipulate data of various dimensionalities, such as 1D for time series, 2D for images, or 3D for videos.
Operations such as tensor addition, multiplication, and reshaping are common in deep learning algorithms for manipulating data and weights.
For instance, an image can be represented as a 3D tensor with dimensions for height, width, and color channels.
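For example, a hypothetical 32x32 RGB image as a Numpy tensor:

import numpy as np

image = np.zeros((32, 32, 3), dtype=np.uint8)  # (height, width, channels)
image[:, :, 0] = 255                           # fill the red channel

batch = image[np.newaxis, ...]  # add a batch axis -> shape (1, 32, 32, 3)
print(batch.shape)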

Eigenvalues and eigenvectors are used in machine learning for PCA, understanding linear transformations, and more. In deep learning, they're used to understand the behavior of optimization algorithms.
This is a specific application of eigenvectors used for facial recognition. The 'eigenfaces' represent the directions in which the images of faces show the most variation.
PCA is used in machine learning to reduce dimensionality, visualize high-dimensional data, and more. While not used as often in deep learning, it's sometimes used for visualizing learned embeddings or activations.
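A minimal sketch of the PCA idea with Numpy, assuming toy synthetic data: the leading eigenvector of the covariance matrix is the first principal component.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ np.array([[3.0, 0.0], [0.0, 1.0]])

cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh, since covariance is symmetric
pc1 = eigvecs[:, np.argmax(eigvals)]    # direction of greatest variance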

LU decomposition can be used to solve the linear systems that appear in models like linear regression. While not often used directly in deep learning, it's a fundamental linear algebra operation.
QR decomposition is used for numerical stability in certain algorithms. In deep learning, it's often used in some optimization methods.
Eigen decomposition powers methods that analyze the structure of data, like PCA. In deep learning, eigen decomposition can be used to analyze the weights of a model.
SVD is a method used in machine learning for dimensionality reduction, latent semantic analysis, and more. In deep learning, SVD can be used for model compression or initialization.
In deep learning, NMF is less common, but might be used in some specific data preprocessing or analysis tasks.
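A short Numpy sketch of SVD used for a low-rank approximation (a random matrix stands in for real data):

import numpy as np

A = np.random.default_rng(1).normal(size=(6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # keep the top-2 singular values: a rank-2 approximation of A
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]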

The pseudoinverse solves linear systems that lack a unique solution. This is useful in machine learning algorithms such as linear regression. In deep learning, it can be used in calculating the weights of certain network architectures.
Quadratic forms appear in optimization and Gaussian processes. In deep learning, they are often found in the formulation of loss functions and regularization terms.
In deep learning, positive definite matrices appear in the analysis of optimization methods, ensuring certain desirable properties like convergence.
The Hadamard (element-wise) product is used in machine learning in various ways, for instance, in computing certain types of features. In deep learning, it's used in operations such as gating in recurrent neural networks (RNNs).

Numpy is a fundamental library for numerical computation in Python and is used extensively in both machine learning and deep learning for operations on arrays and matrices.
Scipy is used in machine learning for tasks like hierarchical clustering. In deep learning, Scipy might be used for tasks like image processing or signal processing.
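A tiny taste of both libraries (the data is invented):

import numpy as np
from scipy import stats

data = np.array([2.1, 2.5, 2.2, 2.8, 3.0])

print(data.mean(), data.std())  # basic array statistics with Numpy
print(stats.describe(data))     # a richer summary via Scipy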
Important

Very important

[L] Later
Module

What is Probability

Random Variable

Contingency Tables in Probability

Bayes Theorem
Topic

Basic Terms like Random Experiment, Trial, Outcome, Sample Space, Event
Types of Events
Empirical Probability Vs Theoretical Probability

What is a Random Variable


Probability Distribution of a Random Variable
Mean of a Random Variable
Variance of a Random Variable

Venn Diagrams
Joint Probability
Marginal Probability
Conditional Probability

Independent Events
Mutually Exclusive Events
Bayes Theorem
Module Topic

Descriptive Statistics What is Stats/Types of Stats


Population Vs Sample

Types of Data

Measures of Central Tendency


- Mean
- Median
- Mode
- Weighted Mean [L]
- Trimmed Mean [L]

Measure of Dispersion
- Range
- Variance
- Standard Deviation
- Coefficient of Variation

Quantiles and Percentiles


5 number summary and BoxPlot

Skewness
Kurtosis [L]

Plotting Graphs
- Univariate Analysis
- Bivariate Analysis
- Multivariate Analysis

Correlation Covariance
Covariance Matrix
Pearson Correlation Coefficient
Spearman Correlation Coefficient [L]
Correlation and Causation

Probability Distributions Random Variables


What are Probability Distributions
Why are Probability Distributions important
Probability Distribution Functions and its types

Probability Mass Function (PMF)


CDF of PMF

Probability Density Function(PDF)


CDF of PDF
Density Estimation [L]
Parametric Density Estimation [L]
Non-Parametric Density Estimation [L]
Kernel Density Estimation(KDE) [L]

How to use PDF/PMF and CDF in Analysis

2D Density Plots

Types of Probability Distributions Normal Distribution


- Properties of Normal Distribution
- CDF of Normal Distribution
- Standard Normal Variate

Uniform Distribution

Bernoulli Distribution

Binomial Distribution

Multinomial Distribution

Log Normal Distribution

Pareto Distribution [L]

Chi-square Distribution

Student's T Distribution

Poisson Distribution [L]


Beta Distribution [L]

Gamma Distribution [L]

Transformations

Confidence Intervals
Point Estimates
Confidence Intervals
Confidence Interval(Sigma Known)
Confidence Interval(Sigma Unknown)
Interpreting Confidence Interval
Margin of Error and factors affecting it

Central Limit Theorem Sampling Distribution


What is CLT
Standard Error

Hypothesis Tests What is Hypothesis Testing?


Null and Alternate Hypothesis
Steps involved in a Hypothesis Test
Performing Z-test
Rejection Region Approach
Type 1 Vs Type 2 Errors
One Sided vs 2 sided tests
Statistical Power
P-value
How to interpret P-values

Types of Hypothesis Tests Z-test

T-test
- Single Sample T-test
- Independent 2 sample t-test
- Paired 2 sample t-test

Chi-square Test
Chi-square Goodness of Fit Test
Chi-square Test of Independence

ANOVA
One Way Anova
Two Way Anova
F-test

Levene Test [L]

Shapiro Wilk Test [L]

K-S Test [L]

Fisher's Test [L]

Miscellaneous Topics Chebyshev's Inequality [L]


QQ Plot
Sampling
Resampling Techniques
Bootstrapping [L]
Standardization
Normalization
Statistical Moments [L]
Bayesian Statistics
A/B Testing
Law of Large Numbers
Usage in Machine Learning

A model is trained on a sample (the training set), which is assumed to be representative of the population. This concept is used to perform inferential statistics and to estimate the model's performance on unseen data.

Understanding the type of data you're working with helps in selecting the appropriate preprocessing
techniques, feature engineering methods, and machine learning models.

These measures summarize the central value in a dataset and are used in various areas of machine learning including exploratory data analysis, outlier detection, and data imputation.

Measures of dispersion help in understanding the consistency in the data and are also used in exploratory data analysis, outlier detection, feature normalization, etc.
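A quick Numpy/Scipy sketch of both groups of measures (a small invented sample with one outlier):

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 2.0, 3.0, 100.0])

print(np.mean(x), np.median(x))
print(stats.trim_mean(x, 0.2))  # trimmed mean: drops 20% from each tail
print(np.var(x), np.std(x))     # dispersion: variance and standard deviation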

These help in understanding the distribution of data and are used in descriptive statistics, outlier detection,
and setting up thresholds for decision-making.
The five-number summary helps spot outliers. Boxplots graphically depict the minimum, first quartile, median, third quartile, and maximum of a dataset.

Skewness and kurtosis are particularly useful in exploratory data analysis, informing data transformations needed to meet the assumptions of some machine learning algorithms.

These plots reveal the distributions of individual variables (univariate), relationships between two variables (bivariate), or complex interactions among multiple variables (multivariate).

Covariance and covariance matrices are used in many machine learning algorithms, such as Principal Component Analysis (PCA) for dimensionality reduction, or Gaussian Mixture Models for clustering.

With the Pearson correlation coefficient, highly correlated input features can be identified and reduced, to improve the performance and interpretability of the model.
Spearman's correlation doesn't require the assumptions of Pearson's correlation (linearity, normality). It can be used in the same contexts as Pearson's correlation coefficient.
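Both coefficients are one Scipy call each (the arrays are invented for illustration):

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 4.1, 4.8])

r, p = stats.pearsonr(x, y)      # linear correlation
rho, p2 = stats.spearmanr(x, y)  # rank-based, no linearity assumption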
In machine learning, it's important to remember that correlation doesn't imply causation, and algorithms based purely on correlation might fail to generalize well.

Probability distributions are foundational to many machine learning algorithms. They help us understand the data's inherent randomness and variability, and guide the choice and behavior of algorithms.

These concepts are critical in understanding and manipulating discrete random variables, often used in
algorithms like Naive Bayes, Hidden Markov Models, etc.

These are used for continuous random variables. For instance, in the Gaussian Mixture Model, each cluster
is modeled as a Gaussian distribution with its PDF.

Kernel density estimation, a non-parametric way to estimate the PDF of a random variable, is particularly useful when no suitable parametric form of the data is known.
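A minimal KDE sketch with Scipy, assuming a synthetic normal sample:

import numpy as np
from scipy import stats

sample = np.random.default_rng(0).normal(size=500)

kde = stats.gaussian_kde(sample)  # non-parametric density estimate
grid = np.linspace(-4, 4, 9)
density = kde(grid)               # estimated PDF evaluated on the grid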

These functions help reveal patterns and trends in the data. In machine learning, these analyses can inform the choice of model, preprocessing steps, and potential feature engineering.

2D density plots can inform modeling steps. For instance, they could help identify clusters for a clustering algorithm in unsupervised learning.

The normal distribution underpins models like linear regression, and any algorithm that uses these as a base, such as neural networks. Also, many statistical methods require the assumption of normally distributed errors.

This distribution is used in random forest algorithms for feature splits, and also in initialization of weights in neural networks. It is also used in methods like random search where you need to randomly sample parameters.

Used in algorithms that model binary outcomes, such as the Bernoulli Naive Bayes classifier and logistic
regression.

Used in modelling the number of successes in a fixed number of Bernoulli trials, often applied in
classification problems.

Used in text classification, topic modelling, deep learning, and word embeddings.

Useful in various contexts, such as when dealing with variables that are the multiplicative product of other
variables, or when working with data that exhibit skewness.

Often used in the realm of anomaly detection or for studying phenomena in the social, quality-control, and economic sciences.

Chi-square tests use this distribution extensively to test relationships between categorical variables. The chi-square statistic is also used in the context of feature selection.

Plays a crucial role in formulating the confidence interval when the sample size is small and/or when the
population standard deviation is unknown.

Used for modeling the number of times an event might occur within a set time or space. It's often used in
queuing theory and for time-series prediction models.
This is a versatile distribution often used in Bayesian methods, and is also the conjugate prior for the
Bernoulli, binomial, negative binomial and geometric distributions.

The Gamma distribution is used in a variety of fields, including queuing models, climatology, and financial
services. It's the conjugate prior of the Poisson, exponential, and normal distributions.

Transformations can improve the performance of the algorithm, or help visualize the data. Common examples are the logarithmic, square root, and z-score standardization transformations.

Confidence intervals give a range of plausible values for a parameter at a given level of confidence. They are used to understand the reliability of point estimates and are often used to report the results of models.
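A t-based 95% interval for a mean, sketched with Scipy (the sample values are invented):

import numpy as np
from scipy import stats

x = np.array([5.1, 4.9, 5.3, 5.0, 5.2, 4.8])

mean = x.mean()
sem = stats.sem(x)  # standard error of the mean
ci = stats.t.interval(0.95, df=len(x) - 1, loc=mean, scale=sem)
print(ci)  # 95% confidence interval (sigma unknown, hence the t distribution)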

The sampling distribution of a statistic, such as the mean, approaches a normal distribution regardless of the population distribution. This is the foundation for many machine learning methods and is often used in hypothesis testing and in creating confidence intervals.

This is used to understand the variability in a point estimate. In machine learning, it's often used in
constructing confidence intervals for model parameters and in hypothesis testing.

Hypothesis testing is used in comparing models and in checking assumptions related to specific models. For instance, a t-test might be used to determine if the means of two sets of results (like two algorithms) are significantly different.
The null hypothesis is a statement that has been put forward, either because it is believed to be true or because it is to be used as a basis for argument, but has not been proved.

These steps help decide whether an observed effect in our sample is real or happened due to chance. These concepts are used in feature selection, model validation, and comparisons between models.

This is the ability of a hypothesis test to detect an effect, if the effect actually exists. In machine learning,
power analysis can be used to estimate the minimum number of observations required to detect an effect.
A small p-value indicates strong evidence to reject the null hypothesis. In machine learning, p-values are often used in feature selection where the null hypothesis is that the feature has no effect on the target variable.

The Z-test is used in machine learning when the data is normally distributed and the population variance is known. It's often used in A/B testing to decide whether two groups' mean outcomes are different.
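Scipy has no dedicated one-sample z-test, but it is a short computation (the sample values and the known sigma are invented):

import numpy as np
from scipy import stats

x = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4])
mu0, sigma = 10.0, 0.3  # hypothesized mean, known population std. dev.

z = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))
p = 2 * (1 - stats.norm.cdf(abs(z)))  # two-sided p-value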

T-tests are used when the data is normally distributed but the population variance is unknown.
compares the mean of a single sample to a known population mean.
compares the means of two independent samples.
In machine learning, t-tests are often used in experiments designed to compare the performance of two different algorithms on the same problem.
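All three variants are single Scipy calls (the accuracy numbers are made up):

import numpy as np
from scipy import stats

a = np.array([0.82, 0.85, 0.84, 0.86, 0.83])  # e.g. scores of algorithm A
b = np.array([0.80, 0.81, 0.83, 0.79, 0.82])  # scores of algorithm B

t1, p1 = stats.ttest_1samp(a, popmean=0.8)  # single-sample t-test
t2, p2 = stats.ttest_ind(a, b)              # independent two-sample t-test
t3, p3 = stats.ttest_rel(a, b)              # paired two-sample t-test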

The Chi-square test is used when dealing with categorical variables. It helps to establish if there's a
statistically significant relationship between categorical variables.
determines if a sample data matches a population.
checks the relationship between two categorical variables.
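A test of independence on a hypothetical 2x2 contingency table:

import numpy as np
from scipy import stats

# rows: feature value; columns: class label (the counts are invented)
table = np.array([[30, 10],
                  [20, 40]])

chi2, p, dof, expected = stats.chi2_contingency(table)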

The null hypothesis states that all population means are equal while the alternative hypothesis states that at
least one is different.
It’s used to test for differences among at least three groups, as they relate to one factor or variable.
It’s used to compare the mean differences between groups that have been split on two independent
variables.

This test assesses the equality of variances for a variable calculated for two or more groups. It's often used
in feature selection where the null hypothesis is that the variances are equal.

This test is used to check the normality of a distribution. Many machine learning algorithms assume normal
distribution, making this test quite useful.
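One Scipy call, here on synthetic residuals:

import numpy as np
from scipy import stats

residuals = np.random.default_rng(0).normal(size=200)

stat, p = stats.shapiro(residuals)
# a small p-value would argue against normality of the residuals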

The K-S test is a non-parametric test that compares a sample with a reference probability distribution, or two samples with each other. It's used in goodness-of-fit tests.

Fisher's test is used to determine if there are nonrandom associations between two categorical variables.

Chebyshev's inequality can be applied for outlier detection. It is also used in the analysis and proof of convergence of some machine learning algorithms.
It's often used to check the assumption of normality in data. Normality of residuals is an assumption in
certain statistical and machine learning models, so this can help in diagnostic analysis of these models.
Sampling matters for large datasets, where it may be computationally infeasible to use the entire population. Techniques such as train-test split, k-fold cross-validation, and stratified sampling all involve sampling principles.
Cross Validation
Bootstrapping is used in ensemble methods like Bagging and Random Forests to generate diverse models by creating different datasets.
It's used to bring data to a common scale without distorting differences in the ranges of values or losing
information. Many machine learning algorithms perform better with standardized input features.
Normalization brings every value in the dataset to a common scale, without distorting differences in the ranges of values or losing information. It's also known as Min-Max scaling.
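Both rescalings take one line of Numpy each (the array values are invented):

import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0])

standardized = (x - x.mean()) / x.std()           # z-score standardization
normalized = (x - x.min()) / (x.max() - x.min())  # min-max scaling to [0, 1]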
Statistical moments describe the shape of distributions. In particular, skewness and kurtosis can be used in feature engineering to create new features or to select features.
Hyperparameter Tuning
Yellow Important
Red Extremely Important
[L] Later
Module

Differentiation

Optimization Theory
Topic

What is Differentiation
Differentiation of a Constant
Power Rule
Sum Rule
Product Rule
Quotient Rule
Chain Rule
Partial Differentiation
Higher Order Derivatives
Matrix Differentiation

Function
Multivariate Functions
Parameters of a Function
Parametric Vs Non-Parametric Models
Maxima & Minima
Loss Functions
How to select a good Loss Function
Calculating Parameters of a Loss Function
Convex & Concave Loss Functions
Gradient Descent
Gradient Descent with Multiple Parameters
Hessians
Problems faced in Optimization
Constrained Optimization Problem
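To make the gradient descent topics above concrete, a bare-bones sketch in Python (the function, learning rate, and iteration count are arbitrary choices):

# minimize f(w) = (w - 3)^2, whose gradient is f'(w) = 2 * (w - 3)
w = 0.0
lr = 0.1  # learning rate

for _ in range(100):
    grad = 2 * (w - 3)  # gradient at the current parameter value
    w -= lr * grad      # step opposite to the gradient

print(w)  # converges toward the minimum at w = 3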
