Maths Roadmap For Machine Learning-1

The document provides a comprehensive overview of mathematical concepts and their applications in machine learning and deep learning, including scalars, vectors, matrices, tensors, and advanced topics like eigenvalues and matrix factorization. It emphasizes the importance of these concepts for various operations, such as data representation, dimensionality reduction, and optimization techniques. Additionally, it covers probability and descriptive statistics, highlighting their relevance in data analysis and model training.


Module Topic Usage in Machine Learning

Scalars What are scalars A scalar is a single numeric quantity, fundamental in machine learning for computations and in deep
learning for quantities like learning rates and loss values. Important

Vectors What are Vectors These are arrays of numbers that can represent multiple forms of data. In machine learning, vectors can
represent data points, while in deep learning, they can represent features, weights, and biases. Very important
Row Vector and Column Vector These are different forms of representing vectors. In both machine learning and deep learning, these
representations matter because they affect computations like matrix multiplication, critical in areas like
neural network operations.
Distance from Origin This is the magnitude of the vector from the origin of the vector space. It's important in machine learning
for operations like normalization, while in deep learning, it can help understand the magnitude of weights
or feature vectors. [L] Later
Euclidean Distance between 2 vectors This metric calculates the straight-line distance between two points or vectors. It's a common way to
measure distance in many machine learning algorithms, including clustering and nearest neighbor
search, and also used in deep learning loss functions like Mean Squared Error.
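
As a quick illustration, here is a minimal NumPy sketch (with made-up vectors) of the Euclidean distance:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

# Euclidean distance: square root of the sum of squared differences
dist = np.sqrt(np.sum((a - b) ** 2))
# Equivalent shortcut: the L2 norm of the difference vector
assert np.isclose(dist, np.linalg.norm(a - b))
print(dist)  # 5.0
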
Scalar Vector Addition/Subtraction (Shifting) These operations can shift vectors, useful in machine learning for data normalization and centering. In
deep learning, they are employed for operations like bias correction.
Scalar Vector Multiplication/Division (Scaling) Scalar and vector multiplication/division can be used for data scaling in machine learning. In deep
learning, it's used when scaling gradients by the learning rate in optimization algorithms.
Vector Vector Addition/Subtraction These are fundamental operations used to combine or compare vectors, used across machine learning
and deep learning for computations on data and weights.

Dot Product of 2 vectors This operation results in a scalar and is used in machine learning to compute similarity measures and
perform computations in more advanced algorithms. In deep learning, it's crucial in operations like
calculating the weighted sums in a neural network layer.
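
For instance, a minimal sketch (weights, inputs, and bias are invented values) of the dot product as both a similarity measure and the weighted sum of a neuron:

import numpy as np

x = np.array([0.5, -1.0, 2.0])   # input features
w = np.array([0.1, 0.4, -0.2])   # weights of a single neuron
b = 0.3                          # bias

# Weighted sum (pre-activation) of a neuron: a dot product plus bias
z = np.dot(w, x) + b             # 0.05 - 0.4 - 0.4 + 0.3 = -0.45

# Cosine similarity between the two vectors, also built on the dot product
cos_sim = np.dot(w, x) / (np.linalg.norm(w) * np.linalg.norm(x))
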
Angle between 2 vectors This can indicate the difference in direction between two vectors, useful in machine learning when
comparing vectors in applications like recommender systems, and also in deep learning when examining
the relationships between high-dimensional vectors.

Unit Vectors Unit vectors are important in machine learning for normalization and simplifying computations. They're
equally significant in deep learning, particularly when it comes to generating directionally consistent
weight updates.
Projection of a Vector The projection of a vector can be used for dimensionality reduction in machine learning and can be
useful in deep learning for visualizing high-dimensional data or features.
Basis Vectors Basis vectors are used in machine learning for defining coordinate systems and working with
transformations useful in algorithms like PCA and SVD. In deep learning, understanding basis vectors can
be useful for interpreting the internal representations that a network learns.

Equation of a Line in n-D This generalizes the equation of a line to higher dimensions. It's used in machine learning for tasks like
linear regression, and also crucial in deep learning, where hyperplanes (the higher-dimensional analogue of a line) are
used to separate classes in high-dimensional space.

Vector Norms[L] Vector norms measure the length of a vector. In machine learning, they are fundamental in regularization
techniques. In deep learning, they're used in measuring the size of weights, which can control the
complexity of the model, and in normalization techniques such as batch and layer normalization.
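
A short sketch of the common L1 and L2 norms, and the kind of L2 penalty used in ridge-style regularization (the regularization strength is arbitrary):

import numpy as np

w = np.array([3.0, -4.0])

l1 = np.linalg.norm(w, ord=1)  # |3| + |-4| = 7.0
l2 = np.linalg.norm(w)         # sqrt(9 + 16) = 5.0

lam = 0.01                     # arbitrary regularization strength
penalty = lam * l2 ** 2        # L2 penalty term added to a loss
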

Linear Independence Linear independence is a fundamental concept in many machine learning algorithms.

For instance, in linear regression, if the predictor variables are not linearly independent (i.e., they are
collinear), it can lead to issues like inflated variance and unstable estimates of parameters.
PCA assumes that the principal components are linearly independent.

Vector Spaces The concept of a vector space is used throughout machine learning and deep learning.

In supervised learning, for example, the feature space (consisting of all possible feature vectors) and the
output space (consisting of all possible output vectors) are vector spaces.
In unsupervised learning, clustering algorithms often operate in a vector space, grouping together points
that are close in this space.
In deep learning, each layer of a neural network can be seen as transforming one vector space (the
layer's input) into another vector space (the layer's output).

Matrix What are Matrices? A matrix is a two-dimensional array of numbers. In machine learning and deep learning, matrices are
often used to represent sets of features, model parameters, or transformations of data.
Types of Matrices Different types of matrices (identity, zero, sparse, etc.) are used in various ways, such as the identity
matrix in linear algebra operations, or sparse matrices for handling large, high-dimensional data sets
efficiently.
Orthogonal Matrices Orthogonal matrices preserve the length and angle between vectors when they're multiplied. In machine
learning, they're often used in PCA and SVD, which are dimension reduction techniques. In deep learning,
orthogonal matrices are often used to initialize weights in a way that prevents vanishing or exploding
gradients.
Symmetric Matrices These are matrices that are equal to their transpose. They're used in various algorithms because of their
desirable properties, like always having real eigenvalues. Covariance matrices in statistics are an
example of symmetric matrices.
Diagonal Matrices Diagonal matrices are used for scaling operations. In machine learning, they often appear in quadratic
forms, while in deep learning, diagonal scaling appears in adaptive optimizers that keep a per-parameter
(diagonal) estimate of gradient statistics.
Matrix Equality Matrices are equal if they're of the same size and their corresponding elements are equal. This is
fundamental to many machine learning and deep learning algorithms, for example, when checking
convergence of algorithms.
Scalar Operations on Matrices Scalar operations are used to adjust all elements of a matrix by a fixed value. This is used in machine
learning and deep learning for data scaling, weight updates, and more.
Matrix Addition and Subtraction These operations are used to combine or compare datasets or model parameters, among other things.
Matrix Multiplication This operation is central to many algorithms in both machine learning and deep learning, like linear
regression or forward propagation in neural networks.
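
As an illustration, a toy forward pass of one dense layer, treating a batch of inputs as a matrix (shapes and values are invented for the example):

import numpy as np

X = np.random.randn(4, 3)  # batch of 4 samples, 3 features each
W = np.random.randn(3, 2)  # weights mapping 3 features to 2 outputs
b = np.zeros(2)            # biases

# Forward propagation of a linear layer is one matrix multiplication
Y = X @ W + b              # resulting shape: (4, 2)
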
Transpose of a Matrix Transposing a matrix is important for operations like computing the dot product between two vectors, or
performing certain types of matrix multiplication.
Determinant The determinant of a matrix in machine learning is often used in statistics for tasks like multivariate
normal distributions. In deep learning, the determinant is often used in advanced topics like volume-
preserving transformations in flow-based models.
Minor and Cofactor These concepts are used in computing the inverse of a matrix or its determinant. While not directly used
in many machine learning algorithms, they're fundamental to the underlying linear algebra.
Adjoint of a Matrix The adjoint of a matrix is the transpose of the cofactor matrix. It's used in calculating the inverse of a
matrix, which is crucial in solving systems of linear equations, often found in machine learning algorithms.
Inverse of a Matrix The inverse of a matrix is used to solve systems of linear equations, which appears in methods like linear
regression. In deep learning, the Moore-Penrose pseudo-inverse generalizes inversion to non-square or
singular matrices and can be used to calculate weights in certain network architectures.
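
A sketch of the closed-form least-squares solution, first via an explicit inverse and then via the Moore-Penrose pseudo-inverse (the data is synthetic):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # design matrix
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=100)

# Normal equations: w = (X^T X)^(-1) X^T y
w_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# The pseudo-inverse handles rank-deficient X more gracefully
w_pinv = np.linalg.pinv(X) @ y          # agrees with w_inv for full-rank X
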

Rank of a Matrix The rank of a matrix is the maximum number of linearly independent rows or columns in the matrix. It's
useful in machine learning for determining the solvability of linear systems (like in linear regression), and
in deep learning, it's used to investigate the properties of weight matrices.

Column Space and Null Space [L] The column space represents the set of all possible linear combinations of the columns of the matrix. The
null space represents the solutions to the homogeneous equation Ax=0. They are important for
understanding the solvability of a system of equations, which can arise in algorithms like linear
regression.

Change of Basis [L] The change of basis is used in machine learning and deep learning to transform data or model
parameters between different coordinate systems. This is often used in dimensionality reduction
techniques like PCA, or when visualizing high-dimensional feature spaces.

Solving a System of linear equations Many machine learning algorithms, including linear and logistic regression, essentially boil down to
solving a system of linear equations. In deep learning, training can loosely be viewed as solving a (nonlinear) system of
equations for the best parameters, though in practice this is done iteratively via backpropagation and gradient descent.
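
A minimal example of solving a small square system Ax = b directly (the numbers are arbitrary):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Solve Ax = b without forming an explicit inverse
x = np.linalg.solve(A, b)   # [0.8, 1.4]
assert np.allclose(A @ x, b)
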

Linear Transformations Linear transformations are used to map input data to a different space, preserving relationships between
points. This is a fundamental operation in many machine learning and deep learning algorithms, from
simple regression to complex neural networks.
3D Linear Transformations These transformations map lines to lines and planes to planes while keeping the origin fixed. They're often
used in machine learning for visualization and geometric interpretations of data.
Matrix Multiplication as Composition In both machine learning and deep learning, sequential transformations can be compactly represented
as a single matrix, created by multiplying the matrices representing the individual transformations. This is
used extensively in deep learning where each layer of a neural network can be seen as a matrix
transformation of the input.

Linear Transformation of Non-square Matrix Non-square matrices are common in machine learning and deep learning because the number of
features doesn't usually match the number of data points. Their transformations can be used for
dimensionality reduction or feature construction.

Dot Product Dot product is a way of multiplying vectors that results in a scalar. It's used in machine learning to
compute similarity measures and in deep learning, for instance, to calculate the weighted sum of inputs
in a neural network layer.
Cross Product [L] The cross product of two vectors results in a vector that's orthogonal to the plane containing the original
vectors. In machine learning, it's used less often due to its restriction to three dimensions, but it might
appear in specific applications that involve 3D data.

Tensors What are Tensors Tensors are a generalization of scalars, vectors, and matrices to higher dimensions. In machine learning
and deep learning, they are used to represent and manipulate data of various dimensionalities, such as
1D for time series, 3D for color images, or 4D for video.
Importance of Tensors in Deep Learning
Tensor Operations Operations such as tensor addition, multiplication, and reshaping are common in deep learning
algorithms for manipulating data and weights.
Data Representation using Tensors In machine learning and deep learning, tensors are used to represent multidimensional data. For
instance, an image can be represented as a 3D tensor with dimensions for height, width, and color
channels.
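
For instance, a sketch of how a small batch of RGB images might be laid out as a 4D tensor (all sizes are arbitrary):

import numpy as np

# 8 images of 32x32 pixels with 3 color channels (N, H, W, C layout)
images = np.zeros((8, 32, 32, 3), dtype=np.float32)

one_image = images[0]      # a single image is a 3D tensor
print(one_image.shape)     # (32, 32, 3): height, width, channels
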

Eigen Values and Vectors Eigen Vectors and Eigen Values These concepts are used in machine learning for dimensionality reduction (PCA), understanding linear
transformations, and more. In deep learning, they're used to understand the behavior of optimization
algorithms.
Eigen Faces [L] This is a specific application of eigenvectors used for facial recognition. The 'eigenfaces' represent the
directions in which the images of faces show the most variation.
Principal Component Analysis [L] PCA is a dimensionality reduction technique used in machine learning to remove noise, visualize high-
dimensional data, and more. While not used as often in deep learning, it's sometimes used for visualizing
learned embeddings or activations.
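
A minimal PCA sketch via eigendecomposition of the covariance matrix (the data is synthetic; in practice a library implementation such as scikit-learn's PCA would normally be used):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X = X - X.mean(axis=0)                  # center the data

cov = np.cov(X, rowvar=False)           # 5x5 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh is for symmetric matrices

# Sort components by descending eigenvalue and keep the top 2
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:2]]
X_reduced = X @ components              # project onto 2 principal axes
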

Matrix Factorization LU Decomposition[L] LU decomposition is a method of solving linear equations, which can arise in machine learning models
like linear regression. While not often used directly in deep learning, it's a fundamental linear algebra
operation.
QR Decomposition[L] QR decomposition can be used in machine learning for solving linear regression problems or for
numerical stability in certain algorithms. In deep learning, it's often used in some optimization methods.
Eigen Decomposition [L] This is used in machine learning to solve problems that involve understanding the underlying structure of
data, like PCA. In deep learning, eigen decomposition can be used to analyze the weights of a model.
Singular Value Decomposition[L] SVD is a method used in machine learning for dimensionality reduction, latent semantic analysis, and
more. In deep learning, SVD can be used for model compression or initialization.
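
A sketch of a rank-k approximation with SVD, the idea behind SVD-based compression (the matrix and rank are chosen arbitrarily):

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # keep only the 2 largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
# A_k is the best rank-2 approximation of A in the least-squares sense
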
Non-Negative Matrix Factorization[L] NMF is a matrix factorization technique often used in machine learning for dimensionality reduction and
feature extraction in datasets where the data and the features are non-negative. In deep learning, NMF is
less common, but might be used in some specific data preprocessing or analysis tasks.
Advanced Topics Moore-Penrose Pseudoinverse[L] The pseudoinverse provides a way to solve systems of linear equations that may not have a unique
solution. This is useful in machine learning algorithms such as linear regression. In deep learning, it can be
used in calculating the weights of certain network architectures.
Quadratic Forms[L] Quadratic forms appear in many machine learning algorithms such as support vector machines and
Gaussian processes. In deep learning, they are often found in the formulation of loss functions and
regularization terms.
Positive Definite Matrices[L] Positive definiteness is a property of matrices that guarantees the existence of a unique solution to
certain systems of equations, which is used in many machine learning algorithms. In deep learning,
positive definite matrices appear in the analysis of optimization methods, ensuring certain desirable
properties like convergence.
Hadamard Product[L] The Hadamard product is the element-wise multiplication of matrices. It is used in machine learning in
various ways, for instance, in computing certain types of features. In deep learning, it's used in operations
such as gating in recurrent neural networks (RNNs).
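
In NumPy the Hadamard product is simply the * operator on equal-shaped arrays, as in this sketch of a gate scaling a hidden state (all values invented):

import numpy as np

gate = np.array([0.0, 0.5, 1.0])    # e.g., the output of a sigmoid gate
hidden = np.array([2.0, 2.0, 2.0])  # hidden state values

# Element-wise (Hadamard) product: each entry is gated independently
gated = gate * hidden               # [0.0, 1.0, 2.0]
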

Tools and Libraries NumPy NumPy is a fundamental library for numerical computation in Python and is used extensively in both
machine learning and deep learning for operations on arrays and matrices.
SciPy [L] SciPy is a library for scientific computing in Python that builds on NumPy. It's used in machine learning for
tasks like optimization, statistical testing, and some specific models like hierarchical clustering. In deep
learning, SciPy might be used for tasks like image processing or signal processing.
Module Topic

What is Probability Basic Terms like Random Experiment, Trial, Outcome, Sample Space, Event
Types of Events
Empirical Probability vs Theoretical Probability

Random Variable What is a Random Variable


Probability Distribution of a Random Variable
Mean of a Random Variable
Variance of a Random Variable

Contingency Tables in Probability Venn Diagrams


Joint Probability
Marginal Probability
Conditional Probability

Bayes Theorem Independent Events


Mutually Exclusive Events
Bayes Theorem
Module Topic Usage in Machine Learning

Descriptive Statistics What is Statistics / Types of Statistics


Population Vs Sample In machine learning, a population might refer to the entire set of data relevant to a problem, while a sample would be a
subset of that data. Training a model typically happens on a sample of the total data (the training set), which is
assumed to be representative of the population. This concept is used to perform inferential statistics and to estimate the
model's performance on unseen data.

Types of Data Understanding the type of data you're working with helps in selecting the appropriate preprocessing techniques, feature
engineering methods, and machine learning models.

Measures of Central Tendency These measures provide the central value of a data distribution. They are used to understand the 'typical' value in a
dataset and are used in various areas of machine learning including exploratory data analysis, outlier detection, and
data imputation.

- Mean [Important]


- Median [Extremely Important]
- Mode [L] Later
- Weighted Mean [L]
- Trimmed Mean [L]

Measure of Dispersion These measures provide insights into the spread or variability of the data distribution. They help in understanding the
consistency in the data and are also used in exploratory data analysis, outlier detection, feature normalization, etc.

- Range
- Variance
- Standard Deviation
- Coefficient of Variation

Quantiles and Percentiles These help in understanding the distribution of data and are used in descriptive statistics, outlier detection, and setting
up thresholds for decision-making.

5 number summary and BoxPlot These are used in the exploratory data analysis phase to understand the data distribution and identify outliers. Boxplots
graphically depict the minimum, first quartile, median, third quartile, and maximum of a dataset.
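
A sketch of computing the five-number summary with NumPy, plus the IQR-based fence often drawn on boxplots (the sample data is made up):

import numpy as np

data = np.array([2, 4, 4, 5, 7, 9, 11, 12, 15, 20])

q1, median, q3 = np.percentile(data, [25, 50, 75])
summary = (data.min(), q1, median, q3, data.max())

iqr = q3 - q1                  # interquartile range
upper_fence = q3 + 1.5 * iqr   # points above this are flagged as outliers
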

Skewness and Kurtosis [L] These are used to understand the asymmetry and tailedness of the data distribution, respectively. They're
particularly useful in exploratory data analysis, informing data transformations needed to meet the assumptions of some
machine learning algorithms.

Plotting Graphs Graphical analysis is crucial in the exploratory phase of machine learning. It helps in understanding the distributions of
individual variables (univariate), relationships between two variables (bivariate), or complex interactions among
multiple variables (multivariate).

- Univariate Analysis
- Bivariate Analysis
- Multivariate Analysis

Correlation Covariance Covariance is a measure that indicates the extent to which two variables change in tandem. The covariance matrix, on
the other hand, gives the covariance between each pair of features in a dataset. These concepts are used in many
machine learning algorithms, such as Principal Component Analysis (PCA) for dimensionality reduction, or Gaussian
Mixture Models for clustering.

Covariance Matrix
Pearson Correlation Coefficient This statistic measures the linear relationship between two datasets. It's used in feature selection, where highly
correlated input features can be identified and reduced, to improve the performance and interpretability of the model.

Spearman Correlation Coefficient [L] This measures the monotonic relationship between two datasets. It's useful when the data doesn't meet the assumptions
of Pearson's correlation (linearity, normality). It can be used in the same contexts as Pearson's correlation coefficient.
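
A sketch comparing Pearson and Spearman on the same synthetic data, where the relationship is monotonic but nonlinear:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=100)
y = np.exp(x) + rng.normal(scale=0.1, size=100)  # monotonic, nonlinear

pearson_r, _ = stats.pearsonr(x, y)    # understates the association
spearman_r, _ = stats.spearmanr(x, y)  # rank-based, close to 1 here
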

Correlation and Causation Correlation measures association between variables, while causation indicates a cause-effect relationship. In machine
learning, it's important to remember that correlation doesn't imply causation, and algorithms based purely on
correlation might fail to generalize well.

Probability Distributions Random Variables Random variables and their distributions form the mathematical basis of probabilistic machine learning algorithms.
They help us understand the data's inherent randomness and variability, and guide the choice and behavior of
algorithms.

What are Probability Distributions


Why are Probability Distributions important
Probability Distribution Functions and their types

Probability Mass Function (PMF) These concepts are critical in understanding and manipulating discrete random variables, often used in algorithms like
Naive Bayes, Hidden Markov Models, etc.

CDF of PMF

Probability Density Function (PDF) These are used for continuous random variables. For instance, in the Gaussian Mixture Model, each cluster is modeled as
a Gaussian distribution with its PDF.

CDF of PDF
Density Estimation [L] Density estimation is the construction of an estimate of the probability distribution that generated a dataset. It's used in
unsupervised learning for tasks such as anomaly detection. Kernel Density Estimation (KDE), a non-parametric way to
estimate the PDF of a random variable, is particularly useful when no suitable parametric form of the data is known.

Parametric Density Estimation [L]


Non-Parametric Density Estimation [L]
Kernel Density Estimation(KDE) [L]
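
A minimal KDE sketch using SciPy's gaussian_kde (the sample is synthetic, drawn from a two-mode mixture):

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 200)])

kde = gaussian_kde(sample)     # non-parametric density estimate
grid = np.linspace(-6, 6, 200)
density = kde(grid)            # estimated PDF evaluated on the grid
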

How to use PDF/PMF and CDF in Analysis These concepts are used for data analysis and visualization, to understand and communicate the distribution and trends
in the data. In machine learning, these analyses can inform the choice of model, preprocessing steps, and potential
feature engineering.

2D Density Plots These plots are a useful tool in exploratory data analysis for visualizing the relationship and density between two
numerical variables. They can reveal patterns and associations in the data that can guide subsequent modeling steps.
For instance, they could help identify clusters for a clustering algorithm in unsupervised learning.

Types of Probability Distributions


Normal Distribution This distribution is fundamental to many machine learning algorithms, including linear regression, logistic regression,
and any algorithm that uses these as a base, such as neural networks. Also, many statistical methods require the
assumption of normally distributed errors.

- Properties of Normal Distribution


- CDF of Normal Distribution
- Standard Normal Variate

Uniform Distribution This distribution is used for random sampling in algorithms such as random forests (choosing candidate feature splits)
and in the initialization of weights in neural networks. It is also used in methods like random search, where
hyperparameters are sampled uniformly.

Bernoulli Distribution Used in algorithms that model binary outcomes, such as the Bernoulli Naive Bayes classifier and logistic regression.

Binomial Distribution Used in modelling the number of successes in a fixed number of Bernoulli trials, often applied in classification problems.

Multinomial Distribution Used in text classification, topic modelling, and word-count models, and in deep learning for word embeddings.

Log Normal Distribution Useful in various contexts, such as when dealing with variables that are the multiplicative product of other variables, or
when working with data that exhibit skewness.

Pareto Distribution [L] Often used in the realm of anomaly detection or for studying phenomena in the domain of social, quality control, and
economic sciences.

Chi-square Distribution Chi-square tests use this distribution extensively to test relationships between categorical variables. The chi-square
statistic is also used in the context of feature selection.

Student's T Distribution Plays a crucial role in formulating the confidence interval when the sample size is small and/or when the population
standard deviation is unknown.

Poisson Distribution [L] Used for modeling the number of times an event might occur within a set time or space. It's often used in queuing theory
and for time-series prediction models.

Beta Distribution [L] This is a versatile distribution often used in Bayesian methods, and is also the conjugate prior for the Bernoulli, binomial,
negative binomial and geometric distributions.

Gamma Distribution [L] The Gamma distribution is used in a variety of fields, including queuing models, climatology, and financial services. It's
the conjugate prior for the rate of the Poisson and exponential distributions and for the precision of the normal.

Transformations These are used to make the data conform to the assumptions of a machine learning algorithm, enhance the
performance of the algorithm, or help visualize the data. Common examples are the logarithmic, square root, and z-
score standardization transformations.

Confidence Intervals Point Estimates Point estimates are used to provide a single predicted value for a variable of interest. They are used in a wide range of
machine learning algorithms to make predictions. Confidence intervals, on the other hand, give us a range of possible
values within which we can expect the true population parameter to lie, with a given level of confidence. They are used to
understand the reliability of point estimates and are often used to report the results of models.

Confidence Intervals
Confidence Interval(Sigma Known)
Confidence Interval(Sigma Unknown)
Interpreting Confidence Interval
Margin of Error and factors affecting it
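
A sketch of a t-based 95% confidence interval for a mean when sigma is unknown (the sample is synthetic):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=10, scale=2, size=30)

mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean

# 95% CI from the t distribution (sigma unknown, small sample)
ci = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
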

Central Limit Theorem Sampling Distribution The concept of a sampling distribution is used to make inferences about a population from a sample. The Central Limit
Theorem (CLT) is a fundamental theorem in statistics that states that the distribution of sample means approximates a
normal distribution as the sample size gets larger, regardless of the shape of the population distribution. This is the
foundation for many machine learning methods and is often used in hypothesis testing and in creating confidence
intervals.

What is CLT
Standard Error This is used to understand the variability in a point estimate. In machine learning, it's often used in constructing
confidence intervals for model parameters and in hypothesis testing.
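
A small simulation illustrating the CLT and the standard error: means of samples from a skewed (exponential) population are approximately normal, with spread close to sigma / sqrt(n) (sample size and trial count arbitrary):

import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 10_000

# Draw many samples from a skewed population and record each sample mean
sample_means = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)

print(sample_means.std())  # close to the standard error 1 / sqrt(50) ≈ 0.141
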

Hypothesis Tests What is Hypothesis Testing? Hypothesis testing is used extensively in machine learning, especially in model selection, feature selection, and in
checking assumptions related to specific models. For instance, a t-test might be used to determine if the means of two
sets of results (like two algorithms) are significantly different.

Null and Alternate Hypothesis These are fundamental components of all hypothesis tests. The null hypothesis typically represents a theory that has
been put forward, either because it is believed to be true or because it is to be used as a basis for argument, but has not
been proved.

Steps involved in a Hypothesis Test


Performing Z-test These are all components of hypothesis testing, and they're used to make decisions about whether the observed effect
in our sample is real or happened due to chance. These concepts are used in feature selection, model validation, and
comparisons between models.

Rejection Region Approach


Type 1 Vs Type 2 Errors
One Sided vs 2 sided tests
Statistical Power This is the ability of a hypothesis test to detect an effect, if the effect actually exists. In machine learning, power analysis
can be used to estimate the minimum number of observations required to detect an effect.

P-value The p-value is used in hypothesis testing to help support or reject the null hypothesis. It is the probability of observing
results at least as extreme as those measured, assuming the null hypothesis is true. If the p-value is small (typically ≤ 0.05),
it indicates strong evidence to reject the null hypothesis. In machine learning, p-values are often used in feature selection
where the null hypothesis is that the feature has no effect on the target variable.

How to interpret P-values

Types of Hypothesis Tests


Z-test Z-tests compare a sample mean to a population mean (or two sample means to each other). The Z-test is used in
machine learning when the data is normally distributed and the population variance is known. It's often used in A/B
testing to decide whether two groups' mean outcomes are different.
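
A sketch of a one-sample z-test computed by hand with SciPy's normal CDF (sigma is assumed known; all numbers are invented):

import numpy as np
from scipy import stats

sample = np.array([52.1, 48.3, 55.0, 51.2, 49.8, 53.4, 50.5, 54.1])
mu0, sigma = 50.0, 2.0  # hypothesized mean, known population std

z = (sample.mean() - mu0) / (sigma / np.sqrt(len(sample)))
p_value = 2 * stats.norm.sf(abs(z))  # two-sided p-value
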

T-test T-tests are used when the data is normally distributed but the population variance is unknown.

- Single Sample T-test compares the mean of a single sample to a known population mean.

- Independent 2 sample t-test compares the means of two independent samples.

- Paired 2 sample t-test compares the means of the same group at two different times (say, before and after a treatment). In machine learning,
t-tests are often used in experiments designed to compare the performance of two different algorithms on the same
problem.
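
A sketch of the three t-test variants with SciPy (all data is synthetic):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(5.0, 1.0, size=30)
b = rng.normal(5.5, 1.0, size=30)

t1, p1 = stats.ttest_1samp(a, popmean=5.0)  # single sample vs known mean
t2, p2 = stats.ttest_ind(a, b)              # two independent samples
t3, p3 = stats.ttest_rel(a, b)              # paired (e.g., before/after)
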

Chi-square Test The Chi-square test is used when dealing with categorical variables. It helps to establish if there's a statistically
significant relationship between categorical variables.

Chi-square Goodness of Fit Test determines whether sample data matches a hypothesized population distribution.

Chi-square Test of Independence checks the relationship between two categorical variables.
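
A sketch of a chi-square test of independence on a 2x2 contingency table (the counts are invented):

import numpy as np
from scipy import stats

# Rows: group A / group B; columns: outcome yes / no
table = np.array([[30, 10],
                  [20, 25]])

chi2, p, dof, expected = stats.chi2_contingency(table)
# A small p suggests the two categorical variables are not independent
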

ANOVA ANOVA tests the hypothesis that the means of two or more populations are equal. ANOVAs assess the importance of one
or more factors by comparing the response variable means at the different factor levels. The null hypothesis states that
all population means are equal while the alternative hypothesis states that at least one is different.

One Way Anova It’s used to test for differences among at least three groups, as they relate to one factor or variable.

Two Way Anova It’s used to compare the mean differences between groups that have been split on two independent variables.

F-test

Levene Test [L] This test assesses the equality of variances for a variable calculated for two or more groups. It's often used to check
the equal-variance assumption of tests such as the t-test and ANOVA, where the null hypothesis is that the variances are equal.

Shapiro Wilk Test [L] This test is used to check the normality of a distribution. Many machine learning algorithms assume normal distribution,
making this test quite useful.

K-S Test [L] The K-S test is a non-parametric test that compares a sample with a reference probability distribution, or two samples
with each other. It's used in goodness-of-fit tests.

Fisher's Test [L] Fisher's test is used to determine if there are nonrandom associations between two categorical variables.

Miscellaneous Topics Chebyshev's Inequality [L] This mathematical theorem provides a universal boundary on the spread of data, irrespective of the shape of the
distribution. It's useful for understanding the range within which most data points lie and can be applied for outlier
detection. Chebyshev's inequality is also used in the analysis and proof of convergence of some machine learning
algorithms.

QQ Plot A QQ (Quantile-Quantile) plot is a graphical tool to help us assess if a dataset is distributed in a certain way. It's often
used to check the assumption of normality in data. Normality of residuals is an assumption in certain statistical and
machine learning models, so this can help in diagnostic analysis of these models.

Sampling Sampling is the technique of selecting a subset of individuals from a statistical population to estimate characteristics of
the population. It's widely used in machine learning, especially in the context of large datasets, where it may be
computationally infeasible to use the entire population. Techniques such as train-test split, k-fold cross-validation, and
stratified sampling all involve sampling principles.

Resampling Techniques Cross Validation

Bootstrapping [L] Bootstrapping is a powerful statistical method for estimating the sampling distribution of an estimator by resampling
with replacement from the original sample. It's used for hypothesis testing and to construct confidence intervals for
generalizing results from a sample to the population. In machine learning, it's used in ensemble methods like Bagging
and Random Forests to generate diverse models by creating different datasets.
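
A minimal bootstrap sketch: resampling with replacement to get a percentile confidence interval for the mean (data and resample count arbitrary):

import numpy as np

rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=100)

boot_means = np.array([
    rng.choice(sample, size=len(sample), replace=True).mean()
    for _ in range(5_000)
])

ci = np.percentile(boot_means, [2.5, 97.5])  # 95% percentile bootstrap CI
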

Standardization This is a scaling technique where the values are centered around the mean with a unit standard deviation. It's used to
bring data to a common scale without distorting differences in the ranges of values or losing information. Many machine
learning algorithms perform better with standardized input features.

Normalization Similar to standardization, normalization is a scaling technique, but it rescales the values of numeric columns to a
common fixed range, typically [0, 1], preserving the relative differences between values. It's also known as Min-Max
scaling.
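
A sketch of both scalings, applied column-wise with plain NumPy (the data is synthetic):

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(50, 3))

# Standardization: zero mean, unit standard deviation per column
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Min-Max normalization: rescale each column to the [0, 1] range
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
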

Statistical Moments [L] The statistical moments (mean, variance, skewness, and kurtosis) capture different aspects of the distribution shape.
They are used in machine learning to describe, understand, and compare variable distributions. In particular, skewness
and kurtosis can be used in feature engineering to create new features or to select features.

Bayesian Statistics Hyperparameter Tuning

A/B Testing
Law of Large Numbers
Module Topic

Differentiation What is Differentiation


Differentiation of a Constant
Power Rule
Sum Rule
Product Rule
Quotient Rule
Chain Rule
Partial Differentiation
Higher Order Derivatives
Matrix Differentiation

Optimization Theory Function


Multivariate Functions
Parameters of a Function
Parametric vs Non-Parametric Models
Maxima & Minima
Loss Functions
How to select a good Loss Function
Calculating Parameters of a Loss Function
Convex & Concave Loss Functions
Gradient Descent (a minimal sketch follows this list)
Gradient Descent with Multiple Parameters
Hessians
Problems faced in Optimization
Constrained Optimization Problem
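
As referenced in the list above, a minimal gradient-descent sketch minimizing a convex quadratic loss (learning rate and starting point are arbitrary):

# Convex loss f(w) = (w - 3)^2 with gradient f'(w) = 2 * (w - 3)
def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0    # arbitrary starting point
lr = 0.1   # learning rate

for _ in range(100):
    w -= lr * grad(w)  # step against the gradient

print(w)  # converges toward the minimum at w = 3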
