Cheat Sheet: Algorithms for Supervised and Unsupervised Learning
Each algorithm below is summarised under the same headings: Description, Model, Objective, Training, Regularisation, Complexity, Non-linear, and Online learning.

k-nearest neighbour

Description: The label of a new point $\hat{x}$ is classified with the most frequent label $\hat{t}$ of the $k$ nearest training instances.

Model:
$\hat{t} = \arg\max_C \sum_{i : x_i \in N_k(\hat{x}, \mathbf{x})} \delta(t_i, C)$
... where $N_k(\hat{x}, \mathbf{x})$ is the set of $k$ points in $\mathbf{x}$ closest to $\hat{x}$ under the Euclidean distance $\sqrt{\sum_{i=1}^{D} (x_i - \hat{x}_i)^2}$, and $\delta(a, b) = 1$ if $a = b$, $0$ otherwise.

Objective: No optimisation needed.

Training: Use cross-validation to learn the appropriate $k$; otherwise no training, classification is based on the existing points.

Regularisation: $k$ acts to regularise the classifier: as $k \to N$ the boundary becomes smoother.

Complexity: $O(NM)$ space complexity, since all training instances and all their features need to be kept in memory.

Non-linear: Natively finds non-linear boundaries.

Online learning: To be added.
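A minimal Python sketch of the classification rule above (not part of the original sheet; it assumes NumPy is available and the function and variable names are illustrative):

import numpy as np

def knn_predict(X_train, t_train, x_new, k=3):
    """Label x_new with the most frequent label among its k nearest
    training instances, using Euclidean distance."""
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))  # D(x_new, x_n) for all n
    nearest = np.argsort(dists)[:k]                        # indices of N_k(x_new, X)
    labels, counts = np.unique(t_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                       # arg max_C of the vote counts

# Example: two small 2-D clusters
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
t = np.array([0, 0, 1, 1])
print(knn_predict(X, t, np.array([0.95, 1.0]), k=3))  # -> 1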
Naive Bayes

Description: Learn $p(C_k|x)$ by modelling $p(x|C_k)$ and $p(C_k)$, using Bayes' rule to infer the class conditional probability. Assumes each feature is independent of all others, ergo "naive".

Model:
$y(x) = \arg\max_k p(C_k|x) = \arg\max_k p(x|C_k)\, p(C_k) = \arg\max_k \prod_{i=1}^{D} p(x_i|C_k)\, p(C_k) = \arg\max_k \sum_{i=1}^{D} \log p(x_i|C_k) + \log p(C_k)$

Objective: No optimisation needed.

Training:
Multivariate likelihood: $p(x|C_k) = \sum_{i=1}^{D} \log p(x_i|C_k)$, with
$p_{\mathrm{MLE}}(x_i = v|C_k) = \dfrac{\sum_{j=1}^{N} \delta(t_j = C_k \wedge x_{ji} = v)}{\sum_{j=1}^{N} \delta(t_j = C_k)}$
Multinomial likelihood: $p(x|C_k) = \prod_{i=1}^{D} p(\mathrm{word}_i|C_k)^{x_i}$, with
$p_{\mathrm{MLE}}(\mathrm{word}_i = v|C_k) = \dfrac{\sum_{j=1}^{N} \delta(t_j = C_k)\, x_{ji}}{\sum_{j=1}^{N} \sum_{d=1}^{D} \delta(t_j = C_k)\, x_{di}}$
... where $x_{ji}$ is the count of word $i$ in example $j$, and $x_{di}$ is the count of feature $d$ in example $j$.
Gaussian likelihood: $p(x|C_k) = \prod_{i=1}^{D} \mathcal{N}(v; \mu_{ik}, \sigma_{ik})$

Regularisation: Use a Dirichlet prior on the parameters to obtain a MAP estimate.
Multivariate likelihood:
$p_{\mathrm{MAP}}(x_i = v|C_k) = \dfrac{(\alpha_i - 1) + \sum_{j=1}^{N} \delta(t_j = C_k \wedge x_{ji} = v)}{|x_i|(\alpha_i - 1) + \sum_{j=1}^{N} \delta(t_j = C_k)}$
Multinomial likelihood:
$p_{\mathrm{MAP}}(\mathrm{word}_i = v|C_k) = \dfrac{(\alpha_i - 1) + \sum_{j=1}^{N} \delta(t_j = C_k)\, x_{ji}}{\sum_{j=1}^{N} \sum_{d=1}^{D} \delta(t_j = C_k)\, x_{di} - D + \sum_{d=1}^{D} \alpha_d}$

Complexity: $O(NM)$, since each training instance must be visited and each of its features counted.

Non-linear: Can only learn linear boundaries for multivariate/multinomial attributes. With Gaussian attributes, quadratic boundaries can be learned with uni-modal distributions.

Online learning: To be added.
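A toy multinomial Naive Bayes sketch using the Dirichlet-smoothed MAP estimate above with $\alpha_i = 2$ (add-one smoothing). This is an added illustration, not part of the sheet; it assumes NumPy, and the names are illustrative:

import numpy as np

def train_nb(X, t, alpha=2.0):
    """X holds word counts x_ji; returns class labels, log priors and log likelihoods."""
    classes = np.unique(t)
    log_prior = np.log(np.array([(t == c).mean() for c in classes]))
    log_lik = []
    for c in classes:
        counts = X[t == c].sum(axis=0) + (alpha - 1.0)   # (alpha_i - 1) + sum_j delta(t_j = C_k) x_ji
        log_lik.append(np.log(counts / counts.sum()))    # p_MAP(word_i | C_k)
    return classes, log_prior, np.array(log_lik)

def predict_nb(x, classes, log_prior, log_lik):
    # arg max_k  sum_i x_i log p(word_i|C_k) + log p(C_k)
    return classes[np.argmax(log_lik @ x + log_prior)]

X = np.array([[3, 0, 1], [2, 1, 0], [0, 4, 2], [1, 3, 3]])  # word counts per document
t = np.array([0, 0, 1, 1])
classes, lp, ll = train_nb(X, t)
print(predict_nb(np.array([2, 0, 1]), classes, lp, ll))  # -> 0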
Log-linear

Description: Estimate $p(C_k|x)$ directly, by assuming a maximum entropy distribution and optimising an objective function over the conditional entropy distribution.

Model:
$y(x) = \arg\max_k p(C_k|x) = \arg\max_k \sum_m \lambda_m \phi_m(x, C_k)$
... where
$p(C_k|x) = \dfrac{1}{Z_\lambda(x)} e^{\sum_m \lambda_m \phi_m(x, C_k)}$, with $Z_\lambda(x) = \sum_k e^{\sum_m \lambda_m \phi_m(x, C_k)}$

Objective: Minimise the negative log-likelihood:
$L_{\mathrm{MLE}}(\lambda, \mathcal{D}) = -\sum_{(x,t)\in\mathcal{D}} \log p(t|x) = \sum_{(x,t)\in\mathcal{D}} \left( \log Z_\lambda(x) - \sum_m \lambda_m \phi_m(x, t) \right)$

Training: Gradient descent (or gradient ascent if maximising the objective):
$\lambda^{n+1} = \lambda^n - \eta \nabla L$
... where $\eta$ is the step parameter, and
$\nabla L_{\mathrm{MLE}}(\lambda, \mathcal{D}) = \sum_{(x,t)\in\mathcal{D}} E[\phi(x, \cdot)] - \sum_{(x,t)\in\mathcal{D}} \phi(x, t)$
For each class $C_k$: $E[\phi(x, \cdot)] = \sum_{(x,t)\in\mathcal{D}} \phi(x, C_k)\, p(C_k|x)$, and $\sum_{(x,t)\in\mathcal{D}} \phi(x, t)$ are the empirical counts.

Regularisation: Penalise large values for the parameters by introducing a prior distribution over them (typically a Gaussian):
$L_{\mathrm{MAP}}(\lambda, \mathcal{D}, \sigma) = \arg\min_\lambda \left( -\log p(\lambda) - \sum_{(x,t)\in\mathcal{D}} \log p(t|x) \right)$
$= \arg\min_\lambda \left( -\log e^{-\frac{(\lambda - 0)^2}{2\sigma^2}} - \sum_{(x,t)\in\mathcal{D}} \log p(t|x) \right)$
$= \arg\min_\lambda \left( \sum_m \frac{\lambda_m^2}{2\sigma^2} - \sum_{(x,t)\in\mathcal{D}} \log p(t|x) \right)$

Complexity: $O(INMK)$, since each training instance must be visited and each combination of class and features must be calculated for the appropriate feature mapping.

Non-linear: Reformulate the class conditional distribution in terms of a kernel $K(x, x')$, and use a non-linear kernel (for example $K(x, x') = (1 + w^T x)^2$). By the Representer Theorem:
$p(C_k|x) = \dfrac{1}{Z_\lambda(x)} e^{\lambda^T \phi(x, C_k)} = \dfrac{1}{Z_\lambda(x)} e^{\sum_{n=1}^{N} \sum_{i=1}^{K} \alpha_{nk}\, \phi(x_n, C_i)^T \phi(x, C_k)} = \dfrac{1}{Z_\lambda(x)} e^{\sum_{n=1}^{N} \sum_{i=1}^{K} \alpha_{nk}\, K((x_n, C_i), (x, C_k))} = \dfrac{1}{Z_\lambda(x)} e^{\sum_{n=1}^{N} \alpha_{nk}\, K(x_n, x)}$

Online learning: Online Gradient Descent: update the parameters using GD after seeing each training instance.
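An illustrative batch gradient-descent trainer for the log-linear model with a Gaussian prior. This is an addition to the sheet and assumes NumPy; the block feature map phi(x, k) is a hypothetical choice, not prescribed above:

import numpy as np

def phi(x, k, K):
    """Feature map phi(x, C_k): copy x into the block belonging to class k (illustrative)."""
    f = np.zeros(K * x.size)
    f[k * x.size:(k + 1) * x.size] = x
    return f

def train_loglinear(X, t, K, eta=0.1, sigma2=10.0, iters=200):
    lam = np.zeros(K * X.shape[1])
    for _ in range(iters):
        grad = lam / sigma2                                  # gradient of the Gaussian prior term
        for x, ti in zip(X, t):
            scores = np.array([lam @ phi(x, k, K) for k in range(K)])
            p = np.exp(scores - scores.max())
            p /= p.sum()                                     # p(C_k | x)
            expected = sum(p[k] * phi(x, k, K) for k in range(K))
            grad += expected - phi(x, ti, K)                 # E[phi(x, .)] - empirical counts
        lam -= eta * grad                                    # lambda^{n+1} = lambda^n - eta * grad L
    return lam

X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.9]])
t = np.array([0, 0, 1, 1])
lam = train_loglinear(X, t, K=2)
scores = [lam @ phi(np.array([0.1, 1.0]), k, 2) for k in range(2)]
print(int(np.argmax(scores)))  # -> 1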
Perceptron

Description: Directly estimate the linear function $y(x)$ by iteratively updating the weight vector whenever a training instance is incorrectly classified.

Model: Binary, linear classifier:
$y(x) = \mathrm{sign}(w^T x)$
... where $\mathrm{sign}(x) = +1$ if $x \geq 0$, $-1$ if $x < 0$.
Multiclass perceptron:
$y(x) = \arg\max_{C_k} w^T \phi(x, C_k)$

Objective: Minimise the error function, i.e. the number of incorrectly classified input vectors:
$\arg\min_w E_P(w) = \arg\min_w -\sum_{n \in \mathcal{M}} w^T x_n t_n$
... where $\mathcal{M}$ is the set of misclassified training vectors.

Training: Iterate over each training example $x_n$, and update the weight vector on misclassification:
$w^{i+1} = w^i - \eta \nabla E_P(w) = w^i + \eta x_n t_n$
... where typically $\eta = 1$.
For the multiclass perceptron:
$w^{i+1} = w^i + \phi(x, t) - \phi(x, y(x))$

Regularisation: The Voted Perceptron: run the perceptron $i$ times and store each iteration's weight vector. Then:
$y(x) = \mathrm{sign}\left( \sum_i c_i\, \mathrm{sign}(w_i^T x) \right)$
... where $c_i$ is the number of correctly classified training instances for $w_i$.

Complexity: $O(INML)$, since each combination of instance, class and features must be calculated (see log-linear).

Non-linear: Use a kernel $K(x, x')$, and one weight per training instance:
$y(x) = \mathrm{sign}\left( \sum_{n=1}^{N} w_n t_n K(x, x_n) \right)$
... and the update: $w_n^{i+1} = w_n^i + 1$

Online learning: The perceptron is an online algorithm by default.
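A minimal binary perceptron sketch following the update rule above (an added illustration assuming NumPy; labels are +1/-1 and a constant bias feature is appended by hand):

import numpy as np

def train_perceptron(X, t, eta=1.0, epochs=10):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_n, t_n in zip(X, t):
            if np.sign(w @ x_n) != t_n:       # misclassified (sign(0) counts as wrong)
                w = w + eta * x_n * t_n       # w^{i+1} = w^i + eta * x_n * t_n
    return w

X = np.array([[1.0, 2.0, 1.0],   # last column is a constant bias term
              [2.0, 3.0, 1.0],
              [-1.0, -1.5, 1.0],
              [-2.0, -1.0, 1.0]])
t = np.array([1, 1, -1, -1])
w = train_perceptron(X, t)
print(np.sign(X @ w))  # -> [ 1.  1. -1. -1.]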
Support vector machines

Description: A maximum margin classifier: finds the separating hyperplane with the maximum margin to its closest data points.

Model:
$y(x) = \sum_{n=1}^{N} \lambda_n t_n x^T x_n + w_0$

Objective:
Primal: $\arg\min_{w, w_0} \frac{1}{2}\|w\|^2$, s.t. $t_n(w^T x_n + w_0) \geq 1$
Dual: $\tilde{L}(\lambda) = \sum_{n=1}^{N} \lambda_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} \lambda_n \lambda_m t_n t_m x_n^T x_m$, s.t. $\lambda_n \geq 0$, $\sum_{n=1}^{N} \lambda_n t_n = 0$

Training: Quadratic Programming (QP), or SMO, Sequential Minimal Optimisation (chunking).

Regularisation: The soft margin SVM: penalise a hyperplane by the number and distance of misclassified points.
Primal: $\arg\min_{w, w_0} \frac{1}{2}\|w\|^2 + C \sum_{n=1}^{N} \xi_n$, s.t. $t_n(w^T x_n + w_0) \geq 1 - \xi_n$, $\xi_n \geq 0$
Dual: $\tilde{L}(\lambda) = \sum_{n=1}^{N} \lambda_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} \lambda_n \lambda_m t_n t_m x_n^T x_m$, s.t. $0 \leq \lambda_n \leq C$, $\sum_{n=1}^{N} \lambda_n t_n = 0$

Complexity: QP: $O(n^3)$; SMO: much more efficient than QP, since computation is based only on the support vectors.

Non-linear: Use a non-linear kernel $K(x, x')$:
$y(x) = \sum_{n=1}^{N} \lambda_n t_n K(x, x_n) + w_0$
$\tilde{L}(\lambda) = \sum_{n=1}^{N} \lambda_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} \lambda_n \lambda_m t_n t_m K(x_n, x_m)$

Online learning: Online SVM is described in "The Huller: A Simple and Efficient Online SVM", Bordes & Bottou (2005).
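An added soft-margin SVM sketch. The sheet trains via QP or SMO; for a short, self-contained example this instead takes per-sample sub-gradient steps on the equivalent hinge-loss form of the primal, (1/2)||w||^2 + C sum_n max(0, 1 - t_n(w.x_n + w_0)). Assumes NumPy; names are illustrative and labels are +1/-1:

import numpy as np

def train_svm(X, t, C=1.0, eta=0.01, epochs=500):
    w, w0 = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x_n, t_n in zip(X, t):
            margin = t_n * (w @ x_n + w0)
            if margin < 1:                       # point inside the margin or misclassified
                w = w - eta * (w - C * t_n * x_n)
                w0 = w0 + eta * C * t_n
            else:
                w = w - eta * w                  # only the regulariser contributes
    return w, w0

X = np.array([[2.0, 2.0], [2.5, 3.0], [-1.0, -1.0], [-2.0, -1.5]])
t = np.array([1, 1, -1, -1])
w, w0 = train_svm(X, t)
print(np.sign(X @ w + w0))  # -> [ 1.  1. -1. -1.]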
k-means

Description: A hard-margin, geometric clustering algorithm, where each data point is assigned to its closest centroid.

Model: Hard assignments $r_{nk} \in \{0, 1\}$ s.t. $\forall n\; \sum_k r_{nk} = 1$, i.e. each data point is assigned to exactly one cluster $k$.
Geometric distance: the Euclidean distance ($l_2$ norm):
$\|x_n - \mu_k\|_2 = \sqrt{\sum_{i=1}^{D} (x_{ni} - \mu_{ki})^2}$

Objective:
$\arg\min_{r, \mu} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \|x_n - \mu_k\|_2^2$
... i.e. minimise the distance from each cluster centre to each of its points.

Training:
Expectation: $r_{nk} = 1$ if $\|x_n - \mu_k\|^2$ is minimal for $k$; $0$ otherwise.
Maximisation: $\mu_k^{\mathrm{MLE}} = \dfrac{\sum_n r_{nk} x_n}{\sum_n r_{nk}}$
... where $\mu_k$ is the centroid of cluster $k$.

Regularisation: Only hard-margin assignment to clusters.

Complexity: To be added.

Non-linear: Not applicable.

Online learning: To be added.
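A compact k-means sketch alternating the E- and M-steps above (an added illustration, assuming NumPy; names are illustrative):

import numpy as np

def kmeans(X, K, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]        # initial centroids from the data
    for _ in range(iters):
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)  # ||x_n - mu_k||^2 for all n, k
        r = np.argmin(d, axis=1)                              # hard assignment r_nk
        new_mu = np.array([X[r == k].mean(axis=0) if np.any(r == k) else mu[k]
                           for k in range(K)])                # mu_k = mean of assigned points
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return r, mu

X = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
r, mu = kmeans(X, K=2)
print(r)   # two clusters, e.g. [0 0 1 1] or [1 1 0 0] depending on initialisation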
Mixture of Gaussians

Description: A probabilistic clustering algorithm, where clusters are modelled as latent Gaussians and each data point is assigned the probability of being drawn from a particular Gaussian.

Model: Assignments to clusters by specifying probabilities
$p(x^{(i)}, z^{(i)}) = p(x^{(i)}|z^{(i)})\, p(z^{(i)})$
... with $z^{(i)} \sim \mathrm{Multinomial}(\phi)$, and $\gamma_{nk} \equiv p(k|x_n)$ s.t. $\sum_{j=1}^{K} \gamma_{nj} = 1$. I.e. we want to maximise the probability of the observed data $x$.

Objective:
$L(x, \pi, \mu, \Sigma) = \log p(x|\pi, \mu, \Sigma) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k \mathcal{N}(x_n|\mu_k, \Sigma_k)$

Training:
Expectation: for each $n, k$ set
$\gamma_{nk} = p(z^{(i)} = k|x^{(i)}; \phi, \mu, \Sigma) \;(= p(k|x_n)) = \dfrac{p(x^{(i)}|z^{(i)} = k; \mu, \Sigma)\, p(z^{(i)} = k; \phi)}{\sum_{j=1}^{K} p(x^{(i)}|z^{(i)} = j; \mu, \Sigma)\, p(z^{(i)} = j; \phi)} = \dfrac{\pi_k \mathcal{N}(x_n|\mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(x_n|\mu_j, \Sigma_j)}$
Maximisation:
$\pi_k = \dfrac{1}{N} \sum_{n=1}^{N} \gamma_{nk}$, $\quad \mu_k = \dfrac{\sum_{n=1}^{N} \gamma_{nk} x_n}{\sum_{n=1}^{N} \gamma_{nk}}$, $\quad \Sigma_k = \dfrac{\sum_{n=1}^{N} \gamma_{nk} (x_n - \mu_k)(x_n - \mu_k)^T}{\sum_{n=1}^{N} \gamma_{nk}}$

Regularisation: The mixture of Gaussians assigns probabilities for each cluster to each data point, and as such is capable of capturing ambiguities in the data set.

Complexity: To be added.

Non-linear: Not applicable.

Online learning: Online estimation of Gaussian Mixture Models is described in "Highly Efficient Incremental Estimation of Gaussian Mixture Models for Online Data Stream Clustering", Song & Wang (2005).
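An added EM sketch for a mixture of Gaussians, simplified to spherical covariances for brevity (the full-covariance update above follows the same pattern with outer products). Assumes NumPy; names are illustrative:

import numpy as np

def em_gmm(X, K, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)                       # mixing coefficients pi_k
    mu = X[rng.choice(N, size=K, replace=False)]   # means mu_k, initialised from the data
    var = np.full(K, X.var())                      # one shared-per-component variance
    for _ in range(iters):
        # E-step: responsibilities gamma_nk = pi_k N(x_n|mu_k, var_k) / sum_j pi_j N(x_n|mu_j, var_j)
        logp = np.stack([
            -0.5 * ((X - mu[k]) ** 2).sum(1) / var[k] - 0.5 * D * np.log(2 * np.pi * var[k])
            for k in range(K)], axis=1) + np.log(pi)
        logp -= logp.max(axis=1, keepdims=True)
        gamma = np.exp(logp)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: re-estimate pi_k, mu_k, var_k from the responsibilities
        Nk = gamma.sum(axis=0)
        pi = Nk / N
        mu = (gamma.T @ X) / Nk[:, None]
        var = np.array([(gamma[:, k] * ((X - mu[k]) ** 2).sum(1)).sum() / (D * Nk[k])
                        for k in range(K)])
    return pi, mu, var, gamma

X = np.vstack([np.random.default_rng(1).normal(0, 0.3, (20, 2)),
               np.random.default_rng(2).normal(4, 0.3, (20, 2))])
pi, mu, var, gamma = em_gmm(X, K=2)
print(np.round(mu, 1))  # two component means, expected near (0, 0) and (4, 4) for a good initialisation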
Created by Emanuel Ferm, HT2011, for semi-procrastinational reasons while studying for a Machine Learning exam. Last updated May 5, 2011.