Mathematics for Machine Learning: Essential Equations (V4)
1. Basic Linear Algebra
• Scalar Multiplication:
  c · v = [c v_1, c v_2, ⋯, c v_n]^T
• Matrix-Vector Multiplication:
  (A · v)_i = ∑_{j=1}^{n} a_ij v_j,  i = 1, ⋯, m,  for A ∈ R^{m×n}, v ∈ R^n
• Norm of a Vector:
  ||v|| = √(v_1² + v_2² + ⋯ + v_n²)
• Dot Product:
  u · v = ∑_{i=1}^{n} u_i v_i
• Cross Product (3D Vectors):
  u × v = [u_2 v_3 − u_3 v_2,  u_3 v_1 − u_1 v_3,  u_1 v_2 − u_2 v_1]^T
• Outer Product:
  (u ⊗ v)_ij = u_i v_j,  an m × n matrix for u ∈ R^m, v ∈ R^n
• Matrix Addition:
  (A + B)_ij = a_ij + b_ij
• Matrix Multiplication:
  (A · B)_ij = ∑_{k=1}^{n} a_ik b_kj
• Transpose of a Matrix:
  (A^T)_ij = a_ji
• Inverse of a Matrix (for square A):
  A^{−1} · A = I, where I is the identity matrix
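All of the identities above map directly onto NumPy primitives. A minimal sketch to check them numerically (the array values are arbitrary illustrative data, not from the text):

import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
A = np.array([[1.0, 2.0], [3.0, 4.0]])

print(3.0 * v)                   # scalar multiplication: (c v_1, ..., c v_n)
print(np.dot(u, v))              # dot product: sum_i u_i v_i
print(np.linalg.norm(v))         # Euclidean norm: sqrt(sum_i v_i^2)
print(np.cross(u, v))            # cross product of 3-D vectors
print(np.outer(u, v))            # outer product: entry (i, j) is u_i v_j
print(A @ np.array([1.0, 1.0]))  # matrix-vector product
print(A.T)                       # transpose: (A^T)_ij = a_ji
print(np.linalg.inv(A) @ A)      # A^{-1} A = I, up to floating-point error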
2. Basic Probability and Statistics
• Conditional Probability:
  P(A|B) = P(A ∩ B) / P(B)
• Law of Total Probability:
  P(A) = ∑_i P(A ∩ B_i) = ∑_i P(A|B_i) P(B_i)
• Bayes’ Theorem:
  P(A|B) = P(B|A) P(A) / P(B)
• Expectation:
  E[X] = ∑_i x_i P(x_i)  (discrete)  or  E[X] = ∫ x p(x) dx  (continuous)
• Variance:
  Var(X) = E[(X − E[X])²]
• Standard Deviation:
  σ = √Var(X)
• Covariance:
  Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
• Correlation Coefficient:
  ρ_{X,Y} = Cov(X, Y) / (σ_X σ_Y)
• Probability Mass Function (PMF):
  P(X = x) = p(x),  with ∑_x p(x) = 1
• Probability Density Function (PDF):
  ∫_{−∞}^{∞} p(x) dx = 1  for continuous random variables
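The sample analogues of these moments are easy to compute directly; a minimal sketch with synthetic data (the seed, sample size, and distribution parameters are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=10_000)
y = 0.5 * x + rng.normal(0.0, 1.0, size=10_000)    # y correlated with x

mean_x = x.mean()                                  # sample estimate of E[X]
var_x = ((x - mean_x) ** 2).mean()                 # Var(X) = E[(X - E[X])^2]
cov_xy = ((x - mean_x) * (y - y.mean())).mean()    # Cov(X, Y)
rho = cov_xy / (x.std() * y.std())                 # correlation coefficient
print(mean_x, var_x, cov_xy, rho)                  # rho should be near 0.45 here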
3. Basic Calculus
• Derivative of a Function:
  d/dx [f(x)] = lim_{h→0} [f(x + h) − f(x)] / h
• Partial Derivatives:
  ∂f/∂x = lim_{h→0} [f(x + h, y) − f(x, y)] / h
• Gradient:
  ∇f(x) = [∂f/∂x_1, ∂f/∂x_2, ⋯, ∂f/∂x_n]^T
• Chain Rule:
  dy/dx = (dy/du) · (du/dx)
• Second Derivative (Hessian Matrix):
  H(f)_ij = ∂²f / (∂x_i ∂x_j),  the n × n matrix of all second-order partial derivatives, from ∂²f/∂x_1² to ∂²f/∂x_n²
• Taylor Series Expansion:
  f(x) ≈ f(a) + f′(a)(x − a) + [f′′(a)/2!](x − a)² + ⋯
• Gradient Descent Update Rule:
w ← w − η∇J(w)
• Optimization Objective:
  min_x f(x)
• Logarithmic Derivative:
  d/dx [ln x] = 1/x
• Exponential Derivative:
  d/dx [e^x] = e^x
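The limit definitions above are also how analytic gradients are sanity-checked in practice; a minimal finite-difference sketch (the objective f is an arbitrary example, not from the text):

import numpy as np

def f(x):
    # example objective: f(x) = x_0^2 + 3 x_1^2, analytic gradient (2 x_0, 6 x_1)
    return x[0] ** 2 + 3.0 * x[1] ** 2

def numerical_gradient(f, x, h=1e-6):
    # central-difference approximation of each partial derivative
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return grad

x = np.array([1.0, 2.0])
print(numerical_gradient(f, x))          # ~ [2, 12], matching the analytic gradient
x = x - 0.1 * numerical_gradient(f, x)   # one gradient-descent step with eta = 0.1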
4. Basic Optimization
• Gradient Descent:
  w_{t+1} = w_t − η ∇J(w_t)
• Learning Rate Decay:
  η_t = η_0 / (1 + λt)
• Stochastic Gradient Descent (SGD):
  w ← w − η ∇J(w; x_i, y_i)
• Momentum-based Optimization:
  v_t = β v_{t−1} + (1 − β) ∇J(w),  w ← w − η v_t
• Nesterov Accelerated Gradient (NAG):
  w_{t+1} = w_t − η ∇J(w_t + β(w_t − w_{t−1}))
• RMSProp (s_t is a running average of squared gradients):
  s_t = β s_{t−1} + (1 − β)(∇J(w))²,  w ← w − η ∇J(w) / √(s_t + ϵ)
• Adam Optimization:
  m_t = β_1 m_{t−1} + (1 − β_1) ∇J(w),  v_t = β_2 v_{t−1} + (1 − β_2)(∇J(w))²
  m̂_t = m_t / (1 − β_1^t),  v̂_t = v_t / (1 − β_2^t)
  w_{t+1} ← w_t − η m̂_t / (√v̂_t + ϵ)
• Regularized Optimization Objective:
  J(w) = Loss(w) + λ ||w||²
• Projected Gradient Descent:
  w_{t+1} = Π_C(w_t − η ∇J(w_t)),  where Π_C projects onto the feasible set C
• Newton’s Method:
  w_{t+1} = w_t − η H^{−1} ∇J(w_t),  where H is the Hessian matrix
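A minimal sketch of the Adam update applied to a toy quadratic objective (the objective, step count, and learning rate are arbitrary illustrative choices; β_1 = 0.9 and β_2 = 0.999 are the usual defaults):

import numpy as np

def grad_J(w):
    # gradient of the toy objective J(w) = ||w||^2 / 2
    return w

w = np.array([5.0, -3.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
beta1, beta2, eta, eps = 0.9, 0.999, 0.1, 1e-8

for t in range(1, 301):
    g = grad_J(w)
    m = beta1 * m + (1 - beta1) * g          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias corrections
    v_hat = v / (1 - beta2 ** t)
    w = w - eta * m_hat / (np.sqrt(v_hat) + eps)   # Adam update

print(w)   # should end up close to the minimizer [0, 0]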
5. Basic Regression Equations
• Linear Regression Hypothesis:
ŷ = X · w + b
• Mean Absolute Error (MAE):
  MAE = (1/m) ∑_{i=1}^{m} |y_i − ŷ_i|
• Mean Squared Error (MSE):
  MSE = (1/m) ∑_{i=1}^{m} (y_i − ŷ_i)²
• Ridge Regression Objective:
  J(w) = (1/m) ∑_{i=1}^{m} (ŷ_i − y_i)² + λ ∑_{j=1}^{n} w_j²
• Lasso Regression Objective:
  J(w) = (1/m) ∑_{i=1}^{m} (ŷ_i − y_i)² + λ ∑_{j=1}^{n} |w_j|
• Logistic Regression Hypothesis:
  ŷ = σ(X · w + b),  σ(z) = 1 / (1 + e^{−z})
• Binary Cross-Entropy Loss:
  J(w) = −(1/m) ∑_{i=1}^{m} [y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i)]
• Coefficient of Determination (R-squared):
  R² = 1 − [∑_{i=1}^{m} (y_i − ŷ_i)²] / [∑_{i=1}^{m} (y_i − ȳ)²]
• Adjusted R-squared:
  R̄² = 1 − (1 − R²)(n − 1) / (n − p − 1),  where n is the sample size and p the number of predictors
• Gradient of the MSE Loss:
  ∇J(w) = (1/m) X^T (Xw − y)  (for the convention J(w) = (1/2m) ||Xw − y||²)
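A minimal sketch tying the hypothesis, the MSE gradient, and R² together on synthetic data (the data, learning rate, and ridge strength are arbitrary; the gradient follows the J = (1/2m)||Xw − y||² convention noted above):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])              # hypothetical ground truth
y = X @ true_w + 0.1 * rng.normal(size=100)

w = np.zeros(3)
eta, lam, m = 0.1, 0.01, len(y)
for _ in range(500):
    # gradient of (1/2m)||Xw - y||^2 plus the ridge penalty gradient 2*lam*w
    grad = X.T @ (X @ w - y) / m + 2.0 * lam * w
    w = w - eta * grad

y_hat = X @ w
mse = np.mean((y - y_hat) ** 2)
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(w, mse, r2)   # w near true_w, R^2 near 1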
6. Basic Neural Network Concepts
• Perceptron Update Rule:
w ← w + η(y − ŷ)x
• Sigmoid Activation Function:
  σ(z) = 1 / (1 + e^{−z})
• ReLU Activation Function:
  f(x) = max(0, x)
• Softmax Function:
  Softmax(z_i) = e^{z_i} / ∑_{j=1}^{n} e^{z_j}
• Loss Function for Multi-Class Classification:
  J(w) = −(1/m) ∑_{i=1}^{m} ∑_{k=1}^{K} y_ik log(ŷ_ik)
• Forward Propagation (Single Layer):
  a = σ(w^T x + b)
• Backward Propagation (Gradient for Weights):
  ∂J/∂w = x(ŷ − y)
• Gradient Descent for Neural Networks:
  w ← w − η ∂J/∂w
• Dropout Regularization:
  h̃_i^(l) = r_i h_i^(l),  r_i ∼ Bernoulli(p)
• Batch Normalization:
  x̂_i = (x_i − μ_B) / √(σ_B² + ϵ),  y_i = γ x̂_i + β
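The activations above in NumPy, plus a single-unit forward pass; a minimal sketch (the input size and values are arbitrary):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(x):
    return np.maximum(0.0, x)

def softmax(z):
    e = np.exp(z - z.max())          # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(2)
x = rng.normal(size=4)
w = rng.normal(size=4)
b = 0.1
a = sigmoid(w @ x + b)               # forward propagation for one sigmoid unit
print(a, relu(x), softmax(x))        # the softmax output sums to 1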
7. Basic Clustering Concepts
• k-Means Objective Function:
  J = ∑_{k=1}^{K} ∑_{i ∈ C_k} ||x_i − μ_k||²
• Centroid Update Rule:
  μ_k = (1/|C_k|) ∑_{x ∈ C_k} x
• Distance Metric (Euclidean Distance):
  d(x, y) = √(∑_{i=1}^{n} (x_i − y_i)²)
• Silhouette Score:
  s(i) = [b(i) − a(i)] / max(a(i), b(i))
• DBSCAN Core Point Condition:
  |N_ϵ(x)| ≥ MinPts,  where N_ϵ(x) = {y : d(x, y) ≤ ϵ}
• Hierarchical Clustering Dendrogram Objective:
  At each step, merge the pair of clusters (A, B) that minimizes the linkage criterion L(A, B)
• Gaussian Mixture Model (GMM):
  p(x) = ∑_{k=1}^{K} π_k N(x | μ_k, Σ_k)
• Expectation-Maximization (E-step):
  γ_ik = π_k N(x_i | μ_k, Σ_k) / ∑_{j=1}^{K} π_j N(x_i | μ_j, Σ_j)
• Expectation-Maximization (M-step):
  μ_k = [∑_{i=1}^{N} γ_ik x_i] / [∑_{i=1}^{N} γ_ik]
  Σ_k = [∑_{i=1}^{N} γ_ik (x_i − μ_k)(x_i − μ_k)^T] / [∑_{i=1}^{N} γ_ik]
• Elbow Method for Optimal k:
  Plot J(k) against k and choose the k at the "elbow", beyond which further increases in k yield only marginal decreases in J. A sketch of the k-means loop itself follows.
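A minimal k-means sketch implementing the assignment and centroid-update steps above (the two synthetic blobs, K, and the iteration count are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),    # blob around (0, 0)
               rng.normal(3.0, 0.5, (50, 2))])   # blob around (3, 3)
K = 2
mu = X[rng.choice(len(X), K, replace=False)]     # random initial centroids

for _ in range(20):
    # assignment step: each point joins its nearest centroid
    d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # update step: centroid = mean of its assigned points
    mu = np.array([X[labels == k].mean(axis=0) for k in range(K)])

J = sum(np.sum((X[labels == k] - mu[k]) ** 2) for k in range(K))
print(mu, J)   # centroids near the blob centers; J is the k-means objective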
8. Basic Dimensionality Reduction Concepts
• Principal Component Analysis (PCA) Objective:
  Maximize ||Xw||² subject to ||w|| = 1
• Covariance Matrix for PCA:
  C = (1/m) X^T X  (with X mean-centered)
• Eigen Decomposition for PCA:
Cw = λw
• t-SNE Objective:
  C = ∑_{i≠j} p_ij log(p_ij / q_ij)
• Singular Value Decomposition (SVD):
  X = U Σ V^T
• LDA Objective (Fisher’s Criterion):
  J(w) = (w^T S_b w) / (w^T S_w w)
• Reconstruction Error for PCA:
  Error = ||X − X̂||_F
• Kernel PCA Transformation:
  Perform PCA on the mapped data ϕ(x), computed implicitly via the kernel k(x, x′) = ϕ(x)^T ϕ(x′)
• Autoencoder Reconstruction:
  X ≈ g(f(X)),  where f is the encoder and g the decoder
• Explained Variance Ratio:
  Ratio_i = λ_i / ∑_j λ_j
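A minimal PCA sketch via the covariance-matrix eigendecomposition above (the data are arbitrary synthetic samples; np.linalg.eigh is used because C is symmetric):

import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
X = X - X.mean(axis=0)                   # center X, as C = (1/m) X^T X assumes

C = X.T @ X / len(X)                     # covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # solves C w = lambda w
order = eigvals.argsort()[::-1]          # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

ratio = eigvals / eigvals.sum()          # explained variance ratio
Z = X @ eigvecs[:, :2]                   # project onto the top-2 components
print(ratio, Z.shape)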
9. Basic Probability Distributions
• Bernoulli Distribution:
  P(X = x) = p^x (1 − p)^{1−x},  x ∈ {0, 1}
• Binomial Distribution:
  P(X = k) = C(n, k) p^k (1 − p)^{n−k},  k ∈ {0, 1, ⋯, n}
• Poisson Distribution:
  P(X = k) = λ^k e^{−λ} / k!,  k = 0, 1, 2, ⋯
• Uniform Distribution:
  f(x) = 1/(b − a) for a ≤ x ≤ b,  0 otherwise
• Normal Distribution:
  f(x) = [1 / √(2πσ²)] e^{−(x−μ)² / (2σ²)}
• Exponential Distribution:
  f(x) = λ e^{−λx} for x ≥ 0,  0 for x < 0
• Beta Distribution:
  f(x; α, β) = x^{α−1} (1 − x)^{β−1} / B(α, β),  x ∈ [0, 1]
• Gamma Distribution:
  f(x; α, β) = β^α x^{α−1} e^{−βx} / Γ(α),  x ≥ 0
• Multinomial Distribution:
  P(X_1 = x_1, ⋯, X_k = x_k) = [n! / (x_1! x_2! ⋯ x_k!)] p_1^{x_1} p_2^{x_2} ⋯ p_k^{x_k}
• Chi-Square Distribution:
  f(x; k) = x^{k/2 − 1} e^{−x/2} / [2^{k/2} Γ(k/2)],  x ≥ 0
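These densities and mass functions are straightforward to evaluate directly; a minimal sketch of a few of them (the argument values are arbitrary):

import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

print(normal_pdf(0.0))                              # peak of the standard normal, ~0.3989
print(binomial_pmf(3, 10, 0.5))                     # C(10, 3) / 2^10 ~ 0.1172
print(sum(poisson_pmf(k, 2.0) for k in range(50)))  # the PMF sums to ~1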
10. Basic Reinforcement Learning Concepts
• Bellman Equation for State-Value Function:
  V(s) = E[R_{t+1} + γ V(S_{t+1}) | S_t = s]
• Bellman Equation for Action-Value Function:
  Q(s, a) = E[R_{t+1} + γ Q(S_{t+1}, A_{t+1}) | S_t = s, A_t = a]
• Policy Improvement:
  π′(s) = argmax_a Q(s, a)
• Temporal Difference Update Rule:
  V(S_t) ← V(S_t) + α [R_{t+1} + γ V(S_{t+1}) − V(S_t)]
• Q-Learning Update Rule:
  Q(S_t, A_t) ← Q(S_t, A_t) + α [R_{t+1} + γ max_a Q(S_{t+1}, a) − Q(S_t, A_t)]
• SARSA Update Rule:
  Q(S_t, A_t) ← Q(S_t, A_t) + α [R_{t+1} + γ Q(S_{t+1}, A_{t+1}) − Q(S_t, A_t)]
• Reward Function:
  R(s, a) = E[R_{t+1} | S_t = s, A_t = a]
• Value Iteration Update Rule:
  V(s) ← max_a [R(s, a) + γ ∑_{s′} P(s′ | s, a) V(s′)]
• Actor-Critic Policy Update:
  θ ← θ + α ∇_θ log π_θ(a|s) δ,  where δ is the TD error
• Discounted Return:
  G_t = ∑_{k=0}^{∞} γ^k R_{t+k+1}
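A minimal tabular Q-learning sketch on a hypothetical 5-state chain (move left or right, reward 1 only on reaching the last state; the environment, episode counts, and hyperparameters are all invented for illustration):

import numpy as np

n_states, n_actions = 5, 2                # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1
rng = np.random.default_rng(5)

for _ in range(500):                      # episodes
    s = 0
    for _ in range(100):                  # cap on episode length
        if rng.random() < eps:            # epsilon-greedy exploration
            a = rng.integers(n_actions)
        else:                             # greedy, breaking ties at random
            a = rng.choice(np.flatnonzero(Q[s] == Q[s].max()))
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update rule from above
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if s == n_states - 1:
            break

print(Q)   # the greedy policy should prefer action 1 (right) in every state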