100+ Mathematics for Machine Learning - Comprehensive Edition

The document outlines essential mathematical concepts for machine learning, covering basic linear algebra, probability and statistics, calculus, optimization, regression, neural networks, clustering, dimensionality reduction, probability distributions, and reinforcement learning. Each section provides key equations and definitions necessary for understanding and applying these concepts in machine learning contexts. It serves as a comprehensive reference for learners and practitioners in the field.


Mathematics for Machine Learning: Essential Equations (V4)

1. Basic Linear Algebra


• Scalar Multiplication:
c \cdot v = (c v_1, c v_2, \ldots, c v_n)^T

• Matrix-Vector Multiplication:
(A \cdot v)_i = \sum_{j=1}^{n} a_{ij} v_j, \quad A \in \mathbb{R}^{m \times n}, \; v \in \mathbb{R}^n

• Norm of a Vector:
\|v\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}

• Dot Product:
u \cdot v = \sum_{i=1}^{n} u_i v_i

• Cross Product (3D Vectors):
u \times v = (u_2 v_3 - u_3 v_2, \; u_3 v_1 - u_1 v_3, \; u_1 v_2 - u_2 v_1)^T

• Outer Product:
(u \otimes v)_{ij} = u_i v_j, \quad u \in \mathbb{R}^m, \; v \in \mathbb{R}^n

• Matrix Addition:
(A + B)_{ij} = a_{ij} + b_{ij}

• Matrix Multiplication:
(A \cdot B)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}

• Transpose of a Matrix:
(A^T)_{ij} = a_{ji}

• Inverse of a Matrix (for square A):
A^{-1} \cdot A = I, where I is the identity matrix.
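The identities above can be checked numerically with NumPy; the vectors and matrix here are illustrative values, not from the source:

```python
import numpy as np

u = np.array([4.0, 5.0, 6.0])
v = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, 2.0], [3.0, 4.0]])

scaled = 2.0 * v                    # scalar multiplication: (2v1, 2v2, 2v3)
norm = np.sqrt(np.sum(v ** 2))      # vector norm, same as np.linalg.norm(v)
dot = np.sum(u * v)                 # dot product: sum of u_i * v_i
cross = np.cross(u, v)              # 3-D cross product
outer = np.outer(u, v)              # outer product: (u ⊗ v)_ij = u_i * v_j
A_inv = np.linalg.inv(A)            # inverse satisfies A_inv @ A = I
```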

2. Basic Probability and Statistics


• Conditional Probability:
P(A \mid B) = \frac{P(A \cap B)}{P(B)}

• Law of Total Probability:
P(A) = \sum_i P(A \cap B_i) = \sum_i P(A \mid B_i) P(B_i)

• Bayes’ Theorem:
P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}

• Expectation:
E[X] = \sum_i x_i P(x_i) \text{ (discrete)} \quad \text{or} \quad \int x \, p(x) \, dx \text{ (continuous)}

• Variance:
\mathrm{Var}(X) = E[(X - E[X])^2]

• Standard Deviation:
\sigma = \sqrt{\mathrm{Var}(X)}

• Covariance:
\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]

• Correlation Coefficient:
\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}

• Probability Mass Function (PMF):
P(X = x) = p(x), \quad \sum_x p(x) = 1

• Probability Density Function (PDF):
\int_{-\infty}^{\infty} p(x) \, dx = 1 for continuous random variables.
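Bayes' theorem and the law of total probability combine in a standard worked example; the numbers below (sensitivity, false-positive rate, prevalence) are hypothetical:

```python
# Diagnostic-test example with hypothetical numbers: 99% sensitivity,
# 5% false-positive rate, 1% prevalence.
p_a = 0.01                       # P(A): prior probability of disease
p_b_given_a = 0.99               # P(B|A): positive test given disease
p_b_given_not_a = 0.05           # P(B|~A): positive test given no disease

# Law of total probability: P(B) = P(B|A)P(A) + P(B|~A)P(~A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
```

Even with a very sensitive test, the posterior is only about 1/6, because the small prior dominates.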
3. Basic Calculus
• Derivative of a Function:
\frac{d}{dx} f(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}

• Partial Derivatives:
\frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h}

• Gradient:
\nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right)^T

• Chain Rule:
\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}

• Second Derivative (Hessian Matrix):
H(f)_{ij} = \frac{\partial^2 f}{\partial x_i \, \partial x_j}

• Taylor Series Expansion:
f(x) \approx f(a) + f'(a)(x - a) + \frac{f''(a)}{2!} (x - a)^2 + \cdots

• Gradient Descent Update Rule:
w \leftarrow w - \eta \nabla J(w)

• Optimization Objective:
\min_x f(x)

• Logarithmic Derivative:
\frac{d}{dx} \ln x = \frac{1}{x}

• Exponential Derivative:
\frac{d}{dx} e^x = e^x
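The limit definition of the derivative translates directly into a forward-difference approximation; the step size h below is an illustrative choice:

```python
import math

# Forward difference taken straight from the limit definition of the derivative.
def numerical_derivative(f, x, h=1e-6):
    return (f(x + h) - f(x)) / h

d_ln = numerical_derivative(math.log, 2.0)    # analytic value: 1/x = 0.5
d_exp = numerical_derivative(math.exp, 1.0)   # analytic value: e
```

Shrinking h tightens the approximation until floating-point cancellation takes over.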
4. Basic Optimization
• Gradient Descent:
w_{t+1} = w_t - \eta \nabla J(w_t)

• Learning Rate Decay:
\eta_t = \frac{\eta_0}{1 + \lambda t}

• Stochastic Gradient Descent (SGD):
w \leftarrow w - \eta \nabla J(w; x_i, y_i)

• Momentum-based Optimization:
v_t = \beta v_{t-1} + (1 - \beta) \nabla J(w), \quad w \leftarrow w - \eta v_t

• Nesterov Accelerated Gradient (NAG):
w_{t+1} = w_t - \eta \nabla J(w_t + \beta (w_t - w_{t-1}))

• RMSProp (s_t is a running average of squared gradients):
s_t = \beta s_{t-1} + (1 - \beta) (\nabla J(w))^2, \quad w \leftarrow w - \frac{\eta}{\sqrt{s_t + \epsilon}} \nabla J(w)

• Adam Optimization:
m_t = \beta_1 m_{t-1} + (1 - \beta_1) \nabla J(w), \quad v_t = \beta_2 v_{t-1} + (1 - \beta_2) (\nabla J(w))^2
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \quad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}
w_{t+1} \leftarrow w_t - \frac{\eta \, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}

• Regularized Optimization Objective:
J(w) = \mathrm{Loss}(w) + \lambda \|w\|^2

• Projected Gradient Descent:
w_{t+1} = \Pi_C (w_t - \eta \nabla J(w_t)), where \Pi_C projects onto the set C.

• Newton’s Method:
w_{t+1} = w_t - \eta H^{-1} \nabla J(w_t), where H is the Hessian matrix.
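A minimal sketch of plain gradient descent and the Adam moment updates above, assuming a toy 1-D objective J(w) = (w - 3)^2 and illustrative hyperparameters:

```python
import math

def grad(w):
    # gradient of the toy objective J(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

# Plain gradient descent: w_{t+1} = w_t - eta * grad(w_t)
w_gd, eta = 0.0, 0.1
for _ in range(200):
    w_gd -= eta * grad(w_gd)

# Adam: exponential moving averages of the gradient and its square,
# with bias correction, exactly as in the update equations above.
w_adam, m, v = 0.0, 0.0, 0.0
beta1, beta2, eps = 0.9, 0.999, 1e-8
for t in range(1, 2001):
    g = grad(w_adam)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w_adam -= eta * m_hat / (math.sqrt(v_hat) + eps)
```

Both runs approach the minimizer w = 3; Adam oscillates slightly around it at a constant learning rate.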
5. Basic Regression Equations
• Linear Regression Hypothesis:
\hat{y} = X \cdot w + b

• Mean Absolute Error (MAE):
\mathrm{MAE} = \frac{1}{m} \sum_{i=1}^{m} |y_i - \hat{y}_i|

• Mean Squared Error (MSE):
\mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2

• Ridge Regression Objective:
J(w) = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2 + \lambda \sum_{j=1}^{n} w_j^2

• Lasso Regression Objective:
J(w) = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2 + \lambda \sum_{j=1}^{n} |w_j|

• Logistic Regression Hypothesis:
\hat{y} = \sigma(X \cdot w + b), \quad \sigma(z) = \frac{1}{1 + e^{-z}}

• Binary Cross-Entropy Loss:
J(w) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]

• Coefficient of Determination (R-squared):
R^2 = 1 - \frac{\sum_{i=1}^{m} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{m} (y_i - \bar{y})^2}

• Adjusted R-squared:
\bar{R}^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}

• Gradient of the MSE Loss:
\nabla J(w) = \frac{1}{m} X^T (Xw - y)
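The MSE gradient above is all a small fitting loop needs; the synthetic noiseless data here is illustrative:

```python
import numpy as np

# Synthetic noiseless data: y = X @ true_w
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_w = np.array([2.0, -1.0])
y = X @ true_w

# Gradient descent using grad J(w) = (1/m) X^T (X w - y)
w = np.zeros(2)
eta, m = 0.1, len(y)
for _ in range(500):
    w -= eta * (1.0 / m) * X.T @ (X @ w - y)

mse = np.mean((X @ w - y) ** 2)   # mean squared error of the fit
```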
6. Basic Neural Network Concepts
• Perceptron Update Rule:
w \leftarrow w + \eta (y - \hat{y}) x

• Sigmoid Activation Function:
\sigma(z) = \frac{1}{1 + e^{-z}}

• ReLU Activation Function:
f(x) = \max(0, x)

• Softmax Function:
\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}

• Loss Function for Multi-Class Classification:
J(w) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} y_{ik} \log(\hat{y}_{ik})

• Forward Propagation (Single Layer):
a = \sigma(w^T x + b)

• Backward Propagation (Gradient for Weights):
\frac{\partial J}{\partial w} = x (\hat{y} - y)

• Gradient Descent for Neural Networks:
w \leftarrow w - \eta \frac{\partial J}{\partial w}

• Dropout Regularization:
\tilde{h}_i^{(l)} = r_i \, h_i^{(l)}, \quad r_i \sim \mathrm{Bernoulli}(p)

• Batch Normalization:
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \quad y_i = \gamma \hat{x}_i + \beta
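The sigmoid, softmax, and single-layer forward pass can be sketched in a few lines; the weights and inputs are illustrative, and the softmax uses the standard max-shift for numerical stability:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))     # shift for stability; result is unchanged
    return e / e.sum()

# Single-layer forward pass: a = sigma(w^T x + b), with made-up values
x = np.array([1.0, 2.0])
w = np.array([0.5, -0.5])
b = 0.1
a = sigmoid(w @ x + b)

p = softmax(np.array([1.0, 2.0, 3.0]))
```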
7. Basic Clustering Concepts
• k-Means Objective Function:
J = \sum_{k=1}^{K} \sum_{i \in C_k} \|x_i - \mu_k\|^2

• Centroid Update Rule:
\mu_k = \frac{1}{|C_k|} \sum_{x \in C_k} x

• Distance Metric (Euclidean Distance):
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}

• Silhouette Score:
s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}

• DBSCAN Core Point Condition:
|N_\epsilon(x)| \geq \mathrm{MinPts}, where N_\epsilon(x) = \{ y : d(x, y) \leq \epsilon \}

• Hierarchical Clustering Dendrogram Objective:
Minimize the linkage criterion L(A, B)

• Gaussian Mixture Model (GMM):
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)

• Expectation-Maximization (E-step):
\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}

• Expectation-Maximization (M-step):
\mu_k = \frac{\sum_{i=1}^{N} \gamma_{ik} x_i}{\sum_{i=1}^{N} \gamma_{ik}}, \quad \Sigma_k = \frac{\sum_{i=1}^{N} \gamma_{ik} (x_i - \mu_k)(x_i - \mu_k)^T}{\sum_{i=1}^{N} \gamma_{ik}}

• Elbow Method for Optimal k:
Choose k where J(k) has the largest drop.
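A compact k-means sketch alternating the assignment step and the centroid update rule above; the points and initial centroids are made up:

```python
import numpy as np

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
mu = np.array([[0.0, 0.1], [4.0, 4.0]])   # hypothetical initial centroids

for _ in range(10):
    # assignment step: nearest centroid under Euclidean distance
    d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # centroid update rule: mean of each cluster's assigned points
    mu = np.array([X[labels == k].mean(axis=0) for k in range(2)])

# k-means objective J: within-cluster sum of squared distances
J = sum(np.sum((X[labels == k] - mu[k]) ** 2) for k in range(2))
```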
8. Basic Dimensionality Reduction Concepts
• Principal Component Analysis (PCA) Objective:
Maximize \|Xw\|^2 subject to \|w\| = 1

• Covariance Matrix for PCA:
C = \frac{1}{m} X^T X

• Eigen Decomposition for PCA:
C w = \lambda w

• t-SNE Objective:
C = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}

• Singular Value Decomposition (SVD):
X = U \Sigma V^T

• LDA Objective (Fisher’s Criterion):
J(w) = \frac{w^T S_b w}{w^T S_w w}

• Reconstruction Error for PCA:
\mathrm{Error} = \|X - \hat{X}\|_F

• Kernel PCA Transformation:
\phi(x) \to principal components in feature space

• Autoencoder Reconstruction:
X \approx g(f(X))

• Explained Variance Ratio:
\mathrm{Ratio} = \frac{\lambda_i}{\sum_j \lambda_j}

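PCA can be computed through the SVD relation above; this sketch uses synthetic data with one inflated direction and the unbiased 1/(m-1) covariance normalization (an assumption, since the sheet writes 1/m):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
X[:, 0] *= 5.0                     # inflate variance along the first axis

Xc = X - X.mean(axis=0)            # center before decomposing
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

eigvals = S ** 2 / (len(Xc) - 1)   # eigenvalues of the sample covariance
ratio = eigvals / eigvals.sum()    # explained variance ratio
pc1 = Vt[0]                        # first principal direction
```

The first component should capture most of the variance and align with the inflated axis.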
9. Basic Probability Distributions
• Bernoulli Distribution:
P(X = x) = p^x (1 - p)^{1 - x}, \quad x \in \{0, 1\}

• Binomial Distribution:
P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k \in \{0, 1, \ldots, n\}

• Poisson Distribution:
P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k \geq 0

• Uniform Distribution:
f(x) = \frac{1}{b - a} for a \leq x \leq b, and f(x) = 0 otherwise

• Normal Distribution:
f(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \, e^{-\frac{(x - \mu)^2}{2 \sigma^2}}

• Exponential Distribution:
f(x) = \lambda e^{-\lambda x} for x \geq 0, and f(x) = 0 for x < 0

• Beta Distribution:
f(x; \alpha, \beta) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)}, \quad x \in [0, 1]

• Gamma Distribution:
f(x; \alpha, \beta) = \frac{\beta^\alpha x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)}, \quad x \geq 0

• Multinomial Distribution:
P(X_1 = x_1, \ldots, X_k = x_k) = \frac{n!}{x_1! \, x_2! \cdots x_k!} \, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}

• Chi-Square Distribution:
f(x; k) = \frac{x^{k/2 - 1} e^{-x/2}}{2^{k/2} \, \Gamma(k/2)}, \quad x \geq 0
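The discrete PMFs above can be implemented directly and sanity-checked: each must sum to 1 over its support, and the binomial mean must equal n·p (the parameter values are illustrative):

```python
import math

# Binomial PMF from the formula above
def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Poisson PMF from the formula above
def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

binom_total = sum(binom_pmf(k, 10, 0.3) for k in range(11))
binom_mean = sum(k * binom_pmf(k, 10, 0.3) for k in range(11))  # equals n * p

# Poisson support is infinite; a truncated sum gets arbitrarily close to 1.
poisson_total = sum(poisson_pmf(k, 2.0) for k in range(50))
```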
10. Basic Reinforcement Learning Concepts
• Bellman Equation for State-Value Function:
V(s) = E[R_t + \gamma V(S_{t+1}) \mid S_t = s]

• Bellman Equation for Action-Value Function:
Q(s, a) = E[R_t + \gamma Q(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a]

• Policy Improvement:
\pi'(s) = \arg\max_a Q(s, a)

• Temporal Difference Update Rule:
V(S_t) \leftarrow V(S_t) + \alpha [R_{t+1} + \gamma V(S_{t+1}) - V(S_t)]

• Q-Learning Update Rule:
Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha [R_{t+1} + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t)]

• SARSA Update Rule:
Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha [R_{t+1} + \gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)]

• Reward Function:
R(s, a) = E[R_t \mid S_t = s, A_t = a]

• Value Iteration Update Rule:
V(s) \leftarrow \max_a \left[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) V(s') \right]

• Actor-Critic Policy Update:
\theta \leftarrow \theta + \alpha \nabla_\theta \log \pi_\theta(a \mid s) \, \delta

• Discounted Return:
G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}
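The Q-learning update rule can be exercised on a tiny hand-built MDP; the two-state environment, rewards, and hyperparameters below are all hypothetical:

```python
import numpy as np

# State 0 is the start; action 1 moves to state 1 with reward 1, everything
# else stays in state 0 with reward 0; reaching state 1 resets the episode.
def step(s, a):
    if s == 0 and a == 1:
        return 1, 1.0              # (next state, reward)
    return 0, 0.0

rng = np.random.default_rng(0)
Q = np.zeros((2, 2))
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration

s = 0
for _ in range(2000):
    # epsilon-greedy action selection
    a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
    s2, r = step(s, a)
    # Q-learning update rule
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = 0 if s2 == 1 else s2       # reset after reaching state 1
```

Q(0, 1) should converge to the immediate reward 1, and Q(0, 0) to gamma times that value, so the greedy policy picks action 1.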
