Module Topic
Scalars What are scalars
Vectors What are Vectors
Row Vector and Column Vector
Distance from Origin
Euclidean Distance between 2 vectors
Scalar Vector Addition/Subtraction (Shifting)
Scalar Vector Multiplication/Division (Scaling)
Vector Vector Addition/Subtraction
Dot Product of 2 vectors
Angle between 2 vectors
Unit Vectors
Projection of a Vector
Basis Vectors
Equation of a Line in n-D
Vector Norms[L]
Linear Independence
Vector Spaces
Matrix What are Matrices?
Types of Matrices
Orthogonal Matrices
Symmetric Matrices
Diagonal Matrices
Matrix Equality
Scalar Operations on Matrices
Matrix Addition and Subtraction
Matrix Multiplication
Transpose of a Matrix
Determinant
Minor and Cofactor
Adjoint of a Matrix
Inverse of a Matrix
Rank of a Matrix
Column Space and Null Space [L]
Change of Basis [L]
Solving a System of linear equations
Linear Transformations
3D Linear Transformations
Matrix Multiplication as Composition
Linear Transformation of Non-square Matrices
Dot Product
Cross Product [L]
Tensors What are Tensors
Importance of Tensors in Deep Learning
Tensor Operations
Data Representation using Tensors
Eigen Values and Vectors Eigen Vectors and Eigen Values
Eigen Faces [L]
Principal Component Analysis [L]
Matrix Factorization LU Decomposition[L]
QR Decomposition[L]
Eigen Decomposition [L]
Singular Value Decomposition[L]
Non-Negative Matrix Factorization[L]
Advanced Topics Moore-Penrose Pseudoinverse[L]
Quadratic Forms[L]
Positive Definite Matrices[L]
Hadamard Product[L]
Tools and Libraries Numpy
Scipy[L]
Usage in Machine Learning
A scalar is a single numeric quantity, fundamental in machine learning for computations and in deep learning for things like learning rates and loss values.
In machine learning, vectors can represent data points, while in deep learning, they can represent features, weights, and biases.
These representations matter because they affect computations like matrix multiplication, critical in machine learning areas like neural network operations.
Distance from the origin is used in machine learning for normalization, while in deep learning, it can help understand the magnitude of weights or feature vectors.
Euclidean distance is used in machine learning for tasks like nearest neighbor search, and is also used in deep learning loss functions like Mean Squared Error.
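A minimal NumPy sketch of both ideas, using made-up vectors:

    import numpy as np

    a = np.array([3.0, 4.0])               # hypothetical data point
    b = np.array([0.0, 1.0])               # a second point

    dist_origin = np.linalg.norm(a)        # distance from origin (magnitude): 5.0
    dist_pair = np.linalg.norm(a - b)      # Euclidean distance between a and b
    mse = np.mean((a - b) ** 2)            # the squared-distance idea behind MSE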
These operations can shift vectors, useful in machine learning for data normalization and centering. In deep learning, they are employed for operations like bias correction.
Scalar vector multiplication/division can be used for data scaling in machine learning. In deep learning, it's used to control the learning rate in optimization algorithms.
These are fundamental operations for combining or comparing vectors, applied across machine learning and deep learning in computations on data and weights.
The dot product is used in machine learning to compute similarity measures and perform computations in more advanced algorithms. In deep learning, it's crucial in operations like calculating the weighted sums in a neural network layer.
The angle between vectors is useful when comparing vectors in applications like recommender systems, and also in deep learning when examining the relationships between high-dimensional vectors.
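As a sketch, both the weighted sum and the angle follow directly from the dot product; the vectors here are illustrative:

    import numpy as np

    u = np.array([1.0, 2.0, 3.0])
    v = np.array([4.0, 5.0, 6.0])

    weighted_sum = np.dot(u, v)            # neuron-style weighted sum: 32.0

    # angle from the identity u.v = |u||v|cos(theta)
    cos_theta = weighted_sum / (np.linalg.norm(u) * np.linalg.norm(v))
    angle = np.arccos(np.clip(cos_theta, -1.0, 1.0))   # in radians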
Unit vectors are important in machine learning for normalization. They're equally significant in deep learning, particularly when it comes to generating directionally consistent weight updates.
The projection of a vector can be used for dimensionality reduction in machine learning and can be useful in deep learning for visualizing high-dimensional data or features.
Basis vectors underpin transformations useful in algorithms like PCA and SVD. In deep learning, understanding basis vectors can be useful for interpreting the internal representations that a network learns.
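A small sketch of projecting one illustrative vector onto another:

    import numpy as np

    a = np.array([2.0, 3.0])
    b = np.array([1.0, 0.0])

    # projection of a onto b: (a.b / b.b) * b
    proj = (np.dot(a, b) / np.dot(b, b)) * b   # -> [2., 0.]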
Lines are used in machine learning tasks like linear regression, and are also crucial in deep learning, where hyperplanes (an n-D extension of a line) are used to separate classes in high-dimensional space.
Norms appear in regularization techniques, which can control the complexity of the model, and in normalization techniques such as batch and layer normalization.
Linear independence matters when identifying redundant features or parameters. PCA assumes that the principal components are linearly independent.
In deep learning, each layer of a neural network can be seen as transforming one vector space
(the layer's input) into another vector space (the layer's output).
In machine learning and deep learning, matrices are often used to represent sets of features, model parameters, or transformations of data.
Different types of matrices serve different purposes, such as the identity matrix in linear algebra operations, or sparse matrices for handling large, high-dimensional data sets efficiently.
Orthogonal matrices appear in dimensionality reduction techniques. In deep learning, orthogonal matrices are often used to initialize weights in a way that prevents vanishing or exploding gradients.
Symmetric matrices are valued because of their desirable properties, like always having real eigenvalues. Covariance matrices in statistics are an example of symmetric matrices.
Diagonal matrices simplify computations such as quadratic forms, while in deep learning, the diagonal matrix structure is used in constructing learning rate schedules for stochastic optimization.
Matrix equality is fundamental to many machine learning and deep learning algorithms, for example, when checking convergence of algorithms.
Scalar operations are used to adjust all elements of a matrix by a fixed value. This is used in machine learning and deep learning for data scaling, weight updates, and more.
These operations are used to combine or compare datasets or model parameters, among other things.
This operation is central to many algorithms in both machine learning and deep learning, like linear regression or forward propagation in neural networks.
Transposing a matrix is important for operations like computing the dot product between two vectors, or performing certain types of matrix multiplication.
The determinant appears, for instance, in the density of multivariate normal distributions. In deep learning, the determinant is often used in advanced topics like volume-preserving transformations in flow-based models.
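A minimal sketch of these three operations on small illustrative matrices:

    import numpy as np

    X = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    W = np.array([[0.5, -1.0],
                  [2.0,  0.0]])

    product = X @ W               # matrix multiplication (a forward-pass-style step)
    transposed = X.T              # transpose
    det = np.linalg.det(X)        # determinant: -2.0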
While not directly used in many machine learning algorithms, they're fundamental to the underlying linear algebra.
The adjoint is used in computing the inverse of a matrix, which is crucial in solving systems of linear equations, often found in machine learning algorithms. It relates to Moore-Penrose inversion, which can be used to calculate weights in certain network architectures.
The inverse is used in solving systems of linear equations (for example, in linear regression), and in deep learning, it's used to investigate the properties of weight matrices.
The rank of a matrix is important for understanding the solvability of a system of equations, which can arise in algorithms like linear regression.
Change of basis is used to express data or parameters between different coordinate systems. This is often used in dimensionality reduction techniques like PCA, or when visualizing high-dimensional feature spaces.
Many machine learning problems boil down to solving a system of linear equations. In deep learning, backpropagation can be seen as a process of solving a system of equations to find the best parameters.
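A sketch of solving a small, made-up system Ax = b:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    b = np.array([3.0, 5.0])

    x = np.linalg.solve(A, b)     # solves Ax = b for square, invertible A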
Linear transformations preserve the relationships between points. This is a fundamental operation in many machine learning and deep learning algorithms, from simple regression to complex neural networks.
These transformations preserve points, lines, and planes. They're often used in machine learning for visualization and geometric interpretations of data.
Matrix multiplication composes linear transformations. This is used extensively in deep learning, where each layer of a neural network can be seen as a matrix transformation of the input.
Non-square matrices are common in practice, since the number of features doesn't usually match the number of data points. Their transformations can be used for dimensionality reduction or feature construction.
The dot product is used in machine learning to compute similarity measures and in deep learning, for instance, to calculate the weighted sum of inputs in a neural network layer.
The cross product produces a vector perpendicular to the original vectors. In machine learning, it's used less often due to its restriction to three dimensions, but it might appear in specific applications that involve 3D data.
In machine learning and deep learning, tensors are used to represent and manipulate data of various dimensionalities, such as 1D for time series, 2D for images, or 3D for videos.
Operations such as tensor addition, multiplication, and reshaping are common in deep learning algorithms for manipulating data and weights.
For instance, an image can be represented as a 3D tensor with dimensions for height, width, and color channels.
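For example, a hypothetical RGB image as a 3D tensor, with two common tensor operations:

    import numpy as np

    image = np.random.rand(32, 32, 3)            # height x width x color channels

    brighter = np.clip(image + 0.1, 0.0, 1.0)    # element-wise addition
    flat = image.reshape(-1)                     # reshaping, e.g. for a dense layer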
Eigenvalues and eigenvectors are used for understanding linear transformations, and more. In deep learning, they're used to understand the behavior of optimization algorithms.
This is a specific application of eigenvectors used for facial recognition. The 'eigenfaces' represent the directions in which the images of faces show the most variation.
PCA is used to reduce dimensionality, visualize high-dimensional data, and more. While not used often in deep learning, it's sometimes used for visualizing learned embeddings or activations.
LU decomposition is used to solve systems of linear equations in models like linear regression. While not often used directly in deep learning, it's a fundamental linear algebra operation.
QR decomposition is used for numerical stability in certain algorithms. In deep learning, it's often used in some optimization methods.
Eigen decomposition is used in methods that analyze the structure of data, like PCA. In deep learning, eigen decomposition can be used to analyze the weights of a model.
SVD is a method used in machine learning for dimensionality reduction, latent semantic analysis, and more. In deep learning, SVD can be used for model compression or initialization.
In deep learning, NMF is less common, but might be used in some specific data preprocessing or analysis tasks.
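A sketch of SVD-based low-rank approximation on a made-up data matrix (the rank k is chosen arbitrarily here):

    import numpy as np

    M = np.random.rand(100, 20)                  # hypothetical data matrix
    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    k = 5                                        # keep the top-5 singular values
    M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-k approximation of M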
The pseudoinverse provides a solution even when a system of equations has no unique solution. This is useful in machine learning algorithms such as linear regression. In deep learning, it can be used in calculating the weights of certain network architectures.
Quadratic forms arise in areas such as Gaussian processes. In deep learning, they are often found in the formulation of loss functions and regularization terms.
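A minimal sketch of least-squares weights via the pseudoinverse, on made-up data:

    import numpy as np

    X = np.random.rand(50, 3)          # hypothetical feature matrix
    y = np.random.rand(50)             # hypothetical targets

    w = np.linalg.pinv(X) @ y          # least-squares solution to Xw ~= y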
In deep learning, positive definite matrices appear in the analysis of optimization methods, ensuring certain desirable properties like convergence.
The Hadamard (element-wise) product is used in machine learning in various ways, for instance, in computing certain types of features. In deep learning, it's used in operations such as gating in recurrent neural networks (RNNs).
Numpy is a fundamental library for numerical computation in Python and is used extensively in both machine learning and deep learning for operations on arrays and matrices.
Scipy builds on Numpy and is used in machine learning for tasks like hierarchical clustering. In deep learning, Scipy might be used for tasks like image processing or signal processing.
Important
Very important
[L] Later
Module Topic
What is Probability Basic Terms like Random Experiment, Trial, Outcome, Sample Space, Event
Types of Events
Empirical Probability Vs Theoretical Probability
Random Variable What is a Random Variable
Probability Distribution of a Random Variable
Mean of a Random Variable
Variance of a Random Variable
Contingency Tables in Probability Venn Diagrams
Joint Probability
Marginal Probability
Conditional Probability
Independent Events
Mutually Exclusive Events
Bayes Theorem Bayes Theorem
Module Topic
Descriptive Statistics What is Stats/Types of Stats
Population Vs Sample
Types of Data
Measures of Central Tendency
- Mean
- Median
- Mode
- Weighted Mean [L]
- Trimmed Mean [L]
Measure of Dispersion
- Range
- Variance
- Standard Deviation
- Coefficient of Variation
Quantiles and Percentiles
5 number summary and BoxPlot
Skewness
Kurtosis [L]
Plotting Graphs
- Univariate Analysis
- Bivariate Analysis
- Multivariate Analysis
Correlation Covariance
Covariance Matrix
Pearson Correlation Coefficient
Spearman Correlation Coefficient [L]
Correlation and Causation
Probability Distributions Random Variables
What are Probability Distributions
Why are Probability Distributions important
Probability Distribution Functions and its types
Probability Mass Function (PMF)
CDF of PMF
Probability Density Function(PDF)
CDF of PDF
Density Estimation [L]
Parametric Density Estimation [L]
Non-Parametric Density Estimation [L]
Kernel Density Estimation(KDE) [L]
How to use PDF/PMF and CDF in Analysis
2D Density Plots
Types of Probability Distributions Normal Distribution
- Properties of Normal Distribution
- CDF of Normal Distribution
- Standard Normal Variate
Uniform Distribution
Bernoulli Distribution
Binomial Distribution
Multinomial Distribution
Log Normal Distribution
Pareto Distribution [L]
Chi-square Distribution
Student's T Distribution
Poisson Distribution [L]
Beta Distribution [L]
Gamma Distribution [L]
Transformations
Confidence Intervals Point Estimates
Confidence Intervals
Confidence Interval(Sigma Known)
Confidence Interval(Sigma Unknown)
Interpreting Confidence Interval
Margin of Error and factors affecting it
Central Limit Theorem Sampling Distribution
What is CLT
Standard Error
Hypothesis Tests What is Hypothesis Testing?
Null and Alternate Hypothesis
Steps involved in a Hypothesis Test
Performing Z-test
Rejection Region Approach
Type 1 Vs Type 2 Errors
One Sided vs 2 sided tests
Statistical Power
P-value
How to interpret P-values
Types of Hypothesis Tests Z-test
T-test
- Single Sample T-test
- Independent 2 sample t-test
- Paired 2 sample t-test
Chi-square Test
Chi-square Goodness of Fit Test
Chi-square Test of Independence
ANOVA
One Way Anova
Two Way Anova
F-test
Levene Test [L]
Shapiro Wilk Test [L]
K-S Test [L]
Fisher's Test [L]
Miscellaneous Topics Chebyshev's Inequality [L]
QQ Plot
Sampling
Resampling Techniques
Bootstrapping [L]
Standardization
Normalization
Statistical Moments [L]
Bayesian Statistics
A/B Testing
Law of Large Numbers
Usage in Machine Learning
In machine learning, models are trained on a sample (the training set), which is assumed to be representative of the population. This concept is used to perform inferential statistics and to estimate the model's performance on unseen data.
Understanding the type of data you're working with helps in selecting the appropriate preprocessing
techniques, feature engineering methods, and machine learning models.
Measures of central tendency summarize the typical value in a dataset and are used in various areas of machine learning including exploratory data analysis, outlier detection, and data imputation.
Measures of dispersion help in understanding the consistency in the data and are also used in exploratory data analysis, outlier detection, feature normalization, etc.
These help in understanding the distribution of data and are used in descriptive statistics, outlier detection,
and setting up thresholds for decision-making.
The five-number summary helps detect outliers. Boxplots graphically depict the minimum, first quartile, median, third quartile, and maximum of a dataset.
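A sketch computing the five-number summary and the usual IQR outlier fences on a made-up sample:

    import numpy as np

    data = np.random.randn(1000)                   # hypothetical sample

    mn, q1, med, q3, mx = np.percentile(data, [0, 25, 50, 75, 100])
    iqr = q3 - q1
    fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)      # points outside are flagged as outliers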
Skewness is particularly useful in exploratory data analysis, informing data transformations needed to meet the assumptions of some machine learning algorithms.
Plots can show distributions of individual variables (univariate), relationships between two variables (bivariate), or complex interactions among multiple variables (multivariate).
Covariance and covariance matrices are used in many machine learning algorithms, such as Principal Component Analysis (PCA) for dimensionality reduction, or Gaussian Mixture Models for clustering.
With the Pearson correlation coefficient, highly correlated input features can be identified and reduced, to improve the performance and interpretability of the model.
Spearman's coefficient is useful when the data doesn't meet the assumptions of Pearson's correlation (linearity, normality). It can be used in the same contexts as Pearson's correlation coefficient.
In machine learning, it's important to remember that correlation doesn't imply causation, and algorithms based purely on correlation might fail to generalize well.
Probability distributions underpin many machine learning algorithms. They help us understand the data's inherent randomness and variability, and guide the choice and behavior of algorithms.
These concepts are critical in understanding and manipulating discrete random variables, often used in
algorithms like Naive Bayes, Hidden Markov Models, etc.
These are used for continuous random variables. For instance, in the Gaussian Mixture Model, each cluster
is modeled as a Gaussian distribution with its PDF.
KDE, a non-parametric way to estimate the PDF of a random variable, is particularly useful when no suitable parametric form of the data is known.
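A minimal KDE sketch with SciPy on a made-up sample:

    import numpy as np
    from scipy.stats import gaussian_kde

    sample = np.random.randn(500)          # hypothetical 1D sample
    kde = gaussian_kde(sample)             # bandwidth selected automatically

    grid = np.linspace(-4, 4, 100)
    density = kde(grid)                    # estimated PDF evaluated on the grid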
These functions help reveal patterns and trends in the data. In machine learning, these analyses can inform the choice of model, preprocessing steps, and potential feature engineering.
2D density plots can inform modeling steps. For instance, they could help identify clusters for a clustering algorithm in unsupervised learning.
The normal distribution underlies algorithms like linear regression, and any algorithm that uses these as a base, such as neural networks. Also, many statistical methods require the assumption of normally distributed errors.
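A sketch of the normal PDF/CDF and the standard normal variate (z-score), with illustrative numbers:

    import numpy as np
    from scipy.stats import norm

    pdf = norm.pdf(1.2)                    # standard normal density at x = 1.2
    cdf = norm.cdf(1.2)                    # P(X <= 1.2), roughly 0.885

    value, mu, sigma = 75.0, 70.0, 5.0     # hypothetical observation and population
    z = (value - mu) / sigma               # standard normal variate: 1.0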
This distribution is used in random forest algorithms for feature splits, and also in initialization of weights in
neural networks. It is also used in methods like grid search where you need to randomly sample parameters.
Used in algorithms that model binary outcomes, such as the Bernoulli Naive Bayes classifier and logistic
regression.
Used in modelling the number of successes in a fixed number of Bernoulli trials, often applied in
classification problems.
Used in text classification, topic modelling, deep learning, and word embeddings.
Useful in various contexts, such as when dealing with variables that are the multiplicative product of other
variables, or when working with data that exhibit skewness.
Often used in anomaly detection or for studying phenomena in the social, quality-control, and economic sciences.
Chi-square tests use this distribution extensively to test relationships between categorical variables. The chi-
square statistic is also used in the context of feature selection.
Plays a crucial role in formulating the confidence interval when the sample size is small and/or when the
population standard deviation is unknown.
Used for modeling the number of times an event might occur within a set time or space. It's often used in
queuing theory and for time-series prediction models.
This is a versatile distribution often used in Bayesian methods, and is also the conjugate prior for the
Bernoulli, binomial, negative binomial and geometric distributions.
The Gamma distribution is used in a variety of fields, including queuing models, climatology, and financial
services. It's the conjugate prior of the Poisson, exponential, and normal distributions.
Transformations can improve the performance of the algorithm, or help visualize the data. Common examples are the logarithmic, square root, and z-score standardization transformations.
Confidence intervals give a range of plausible values for a parameter at a given level of confidence. They are used to understand the reliability of point estimates and are often used to report the results of models.
The CLT states that the sampling distribution of the mean approaches a normal distribution regardless of the population distribution. This is the foundation for many machine learning methods and is often used in hypothesis testing and in creating confidence intervals.
This is used to understand the variability in a point estimate. In machine learning, it's often used in
constructing confidence intervals for model parameters and in hypothesis testing.
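A sketch of the standard error and a 95% confidence interval (sigma unknown, so the t distribution) on a made-up sample:

    import numpy as np
    from scipy import stats

    sample = np.random.randn(40) + 5                   # hypothetical sample
    mean = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(len(sample))     # standard error of the mean

    t_crit = stats.t.ppf(0.975, df=len(sample) - 1)    # 95% two-sided critical value
    ci = (mean - t_crit * se, mean + t_crit * se)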
Hypothesis testing is used in comparing models and in checking assumptions related to specific models. For instance, a t-test might be used to determine if the means of two sets of results (like two algorithms) are significantly different.
The null hypothesis is a statement that has been put forward, either because it is believed to be true or because it is to be used as a basis for argument, but has not been proved.
These steps tell us whether an observed effect in our sample is real or happened due to chance. These concepts are used in feature selection, model validation, and comparisons between models.
This is the ability of a hypothesis test to detect an effect, if the effect actually exists. In machine learning,
power analysis can be used to estimate the minimum number of observations required to detect an effect.
A small p-value indicates strong evidence to reject the null hypothesis. In machine learning, p-values are often used in feature selection where the null hypothesis is that the feature has no effect on the target variable.
The Z-test is used in machine learning when the data is normally distributed and the population variance is known. It's often used in A/B testing to decide whether two groups' mean outcomes are different.
T-tests are used when the data is normally distributed but the population variance is unknown.
compares the mean of a single sample to a known population mean.
compares the means of two independent samples.
In machine learning, t-tests are often used in experiments designed to compare the performance of two different algorithms on the same problem.
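A sketch of such a comparison with SciPy, on made-up per-run accuracy scores:

    import numpy as np
    from scipy import stats

    scores_a = np.random.randn(30) * 0.02 + 0.80   # hypothetical accuracies, algorithm A
    scores_b = np.random.randn(30) * 0.02 + 0.78   # hypothetical accuracies, algorithm B

    t_stat, p_value = stats.ttest_ind(scores_a, scores_b)
    # a small p_value suggests the mean performances genuinely differ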
The Chi-square test is used when dealing with categorical variables. It helps to establish if there's a
statistically significant relationship between categorical variables.
determines if sample data matches a population.
checks the relationship between two categorical variables.
The null hypothesis states that all population means are equal while the alternative hypothesis states that at
least one is different.
It’s used to test for differences among at least three groups, as they relate to one factor or variable.
It’s used to compare the mean differences between groups that have been split on two independent
variables.
This test assesses the equality of variances for a variable calculated for two or more groups. It's often used
in feature selection where the null hypothesis is that the variances are equal.
This test is used to check the normality of a distribution. Many machine learning algorithms assume normal
distribution, making this test quite useful.
The K-S test is a non-parametric test that compares a sample with a reference probability distribution, or two samples with each other. It's used in goodness-of-fit tests.
Fisher's test is used to determine if there are nonrandom associations between two categorical variables.
Chebyshev's inequality can be applied for outlier detection. It is also used in the analysis and proof of convergence of some machine learning algorithms.
It's often used to check the assumption of normality in data. Normality of residuals is an assumption in
certain statistical and machine learning models, so this can help in diagnostic analysis of these models.
Sampling is essential when working with large datasets, where it may be computationally infeasible to use the entire population. Techniques such as train-test split, k-fold cross-validation, and stratified sampling all involve sampling principles.
Cross Validation
Bootstrapping is used in ensemble methods like Bagging and Random Forests to generate diverse models by creating different datasets.
It's used to bring data to a common scale without distorting differences in the ranges of values or losing
information. Many machine learning algorithms perform better with standardized input features.
Normalization brings the values of features in the dataset to a common scale, but without distorting differences in the ranges of values or losing information. It's also known as Min-Max scaling.
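A sketch contrasting standardization with Min-Max normalization on an illustrative feature:

    import numpy as np

    x = np.array([10.0, 20.0, 30.0, 40.0])

    standardized = (x - x.mean()) / x.std()            # z-score standardization
    normalized = (x - x.min()) / (x.max() - x.min())   # Min-Max scaling to [0, 1]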
Moments describe the shape of distributions. In particular, skewness and kurtosis can be used in feature engineering to create new features or to select features.
Hyperparameter Tuning
Yellow Important
Red Extremely Important
[L] Later
Module Topic
Differentiation What is Differentiation
Differentiation of a Constant
Power Rule
Sum Rule
Product Rule
Quotient Rule
Chain Rule
Partial Differentiation
Higher Order Derivatives
Matrix Differentiation
Optimization Theory Function
Multivariate Functions
Parameters of a Function
Parametric Vs Non Parametric Models
Maxima & Minima
Loss Functions
How to select a good Loss Function
Calculating Parameters of a Loss Function
Convex & Concave Loss Functions
Gradient Descent
Gradient Descent with Multiple Parameters
Hessians
Problems faced in Optimization
Constrained Optimization Problem