1 
Introduction to Machine Learning 
Kiran Lonikar
2 
What is learning? 
Tom Mitchell: a program is said to learn from experience E with respect to some task T and performance measure P if its performance at T, as measured by P, improves with experience E. 
In plain English: performing some task better with experience and training… 
Key Elements: 
• Remember or memorize the past experiences E 
• Generalize from the experiences E 
Observe how kids learn to read words: they make mistakes even when reading previously known words, then correct themselves. This happens especially with words containing silent letters and words ending in “-tion”. 
Warning: This is a highly mathematical subject!
3 
What is Machine Learning? 
How would you build a computer program which “learns” from experience? 
Generally a three-phase process: 
• Express experience E mathematically: build a set of features related to the experiences (feature extraction from raw data) 
• Memorize and generalize: build a mathematical model or a set of rules from the experiences (training) 
• Apply the mathematical model to the features of future tasks
4 
Machine Learning in Action… 
• Word Lens mobile app 
• OCR in web pages: 
http://newscarousel.herokuapp.com/scribble-js/Scribble.html
5 
Types of ML Systems 
• Supervised Learning 
• Classification 
• Logistic Regression, SVM, Naive Bayes (NB), Decision Trees, ANN, etc. 
• Regression 
• Recommender Systems* 
• User-user/item-item similarity, matrix factorization etc. 
• Unsupervised Learning 
• Clustering 
• K-means, Fuzzy K-Means, model-based clustering (e.g., LDA), etc. 
• Dimensionality reduction 
• Principal Component Analysis (PCA) 
• Anomaly Detection
6 
Classification 
Identify a speaker’s gender from the voice spectrum 
[Figure: voice samples plotted by amplitude vs. frequency, labeled by gender] 
• Training: build a model using data {(a1, f1, g1), (a2, f2, g2), …, (am, fm, gm)} 
• Logistic Regression (LR): p(g = F | a, f; θ) = hθ(x) = sigmoid(θ0 + θ1·a + θ2·f) 
• Decision boundary: if p < 0.5 predict g = M, else predict g = F
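To make this concrete, here is a minimal sketch of scoring one voice sample against such a model (Python; the parameter values in theta and the feature scales are made-up assumptions for illustration, not taken from the slides):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical trained parameters: [theta0 (bias), theta1 (amplitude), theta2 (frequency)]
theta = [-9.0, 0.02, 0.04]

def p_female(amplitude, frequency_hz):
    """p(g = F | a, f; theta) for one voice sample."""
    z = theta[0] + theta[1] * amplitude + theta[2] * frequency_hz
    return sigmoid(z)

p = p_female(60.0, 210.0)          # a sample with a 210 Hz fundamental
gender = "F" if p >= 0.5 else "M"  # decision boundary at p = 0.5
```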
7 
Logistic Regression 
• Let y = 1 when g = F and y = 0 when g = M, and define the feature vector x = [a, f]. 
• Define the hypothesis hθ(x) = sigmoid(θT x), where sigmoid(z) = 1/(1 + e^(-z)). It represents the probability p(y = 1 | x; θ). 
• Cost: J(θ) = -Σ(y·log(h) + (1 - y)·log(1 - h)) + λθTθ, summed over all training examples, for some regularization parameter λ. 
• Optimization (gradient descent): find the θ that minimizes J(θ) by repeatedly updating θ := θ - α∇J(θ). 
• Fit the model θ to cross-validation data, varying λ for the best fit. 
• Test the model θ against test data: if hθ(x) ≥ 0.5 predict gender = F, otherwise predict M.
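A minimal end-to-end training sketch of the above in plain NumPy (the synthetic data, learning rate alpha, and λ value are illustrative assumptions, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lam=0.1, alpha=1e-4, iters=20000):
    """Batch gradient descent on the regularized cross-entropy cost J(theta).

    X: (m, n) feature matrix with a leading column of 1s for the bias theta0.
    y: (m,) labels in {0, 1} (1 = F, 0 = M in the gender example).
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)                   # h_theta(x) for every example
        grad = X.T @ (h - y) + 2 * lam * theta   # gradient of J(theta)
        theta -= alpha * grad                    # theta := theta - alpha * grad(J)
    return theta

# Tiny synthetic training set: columns are [1, amplitude, frequency]
X = np.array([[1, 55, 120], [1, 60, 210], [1, 50, 110], [1, 65, 230]], float)
y = np.array([0, 1, 0, 1])                       # M, F, M, F
theta = train_logistic_regression(X, y)
predict_female = sigmoid(X @ theta) >= 0.5       # decision boundary at 0.5
```

In practice, scaling the amplitude and frequency features to comparable ranges speeds up convergence considerably.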
8 
Recommender Systems 
• User j specifies a rating for item i: y(i, j). These known ratings are the training data. 
• Guess the ratings for the other items: the blanks 
[Figure: an Items × Users matrix of ratings 1 to 5, with many blank cells to be predicted] 
• Collaborative Filtering: k latent features for each item: 
• Feature vector xi for item i: {xi1, xi2, …, xik} 
• Parameter vector θj for user j: {θj1, θj2, …, θjk} 
• User j’s estimated rating for item i: (θj)T xi
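As a sketch of that last bullet (all numbers here are hypothetical, with k = 3 latent features):

```python
import numpy as np

x_i = np.array([0.9, 0.1, 0.4])      # item i's learned feature vector (k = 3)
theta_j = np.array([4.5, 0.2, 1.0])  # user j's learned preference vector

rating = theta_j @ x_i               # (theta_j)^T x_i -> predicted rating ~= 4.5
```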
9 
Recommender Systems 
• Learn xi and θj: 
• Given the xi, minimize Σ((θj)T xi - y(i,j))² over all i where user j has rated item i, to find the optimum θj. 
• Given the θj, minimize Σ((θj)T xi - y(i,j))² over all j where user j has rated item i, to find the optimum xi. 
• Simultaneously: minimize Σ((θj)T xi - y(i,j))² over all (i, j) where user j has rated item i, to find the optimum θj and xi. 
• Equivalently, find factors X and Θ of the ratings matrix Y such that Y ≈ XΘT. 
• Other algorithms: user-user similarity, item-item similarity. 
• Useful even when the users are not humans, e.g., wiki documents as “users” and links as “items”.
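A minimal alternating-least-squares sketch of this factorization in plain NumPy (the ridge term lam, iteration count, and random initialization are my own choices; the slide only specifies the squared-error objective):

```python
import numpy as np

def als(Y, M, k=2, lam=0.1, iters=20):
    """Factor Y ~= X @ Theta.T by alternating minimization.

    Y: (items, users) ratings matrix; M: same-shape 0/1 mask, 1 where rated.
    Returns X (items, k) and Theta (users, k).
    """
    n_items, n_users = Y.shape
    rng = np.random.default_rng(0)
    X = rng.normal(size=(n_items, k))
    Theta = rng.normal(size=(n_users, k))
    I = np.eye(k)
    for _ in range(iters):
        # Given X, solve for each user's theta_j (regularized least squares)
        for j in range(n_users):
            rated = M[:, j] == 1
            A = X[rated]
            Theta[j] = np.linalg.solve(A.T @ A + lam * I, A.T @ Y[rated, j])
        # Given Theta, solve for each item's x_i
        for i in range(n_items):
            rated = M[i, :] == 1
            A = Theta[rated]
            X[i] = np.linalg.solve(A.T @ A + lam * I, A.T @ Y[i, rated])
    return X, Theta

# Predicted rating of item i by user j: X[i] @ Theta[j]
```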
10 
Clustering 
• Example: documents plotted by the counts of their two most frequent terms 
• Training set: {x1, x2, x3, …, xm}, each xi a feature vector 
• No labels (yi) specified 
[Figure: documents scattered on #Term 1 vs. #Term 2 axes]
11 
Clustering: Applications 
• Computer Science 
• Document Clustering 
• Google News: organizing similar news items from different sources 
• News categorization 
• Social network analysis 
• Feature reduction: speeding up ML pipelines 
• Cluster centroids as new features 
• Image compression (reducing the number of colors): pre-processing for faster, more memory-efficient computations (see the sketch after this list) 
• Deep Learning: alternating supervised and unsupervised learning 
• Recommender Systems 
• Physics: 
• Astronomy 
• Particle physics 
• Market segmentation 
• http://en.wikipedia.org/wiki/Cluster_analysis#Applications
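A sketch of the image-compression idea: cluster all pixel colors and replace each pixel with its cluster centroid. This uses scikit-learn's KMeans for brevity; the function name and the n_colors default are my own:

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_colors(image, n_colors=16):
    """Compress an (H, W, 3) RGB image down to n_colors distinct colors."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=n_colors, n_init=10).fit(pixels)
    # Replace every pixel with the color of its cluster centroid
    quantized = km.cluster_centers_[km.labels_]
    return quantized.reshape(h, w, 3).astype(np.uint8)
```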
12 
K-Means Clustering 
1. Randomly choose initial cluster centroids 
[Figure: the #Term 1 vs. #Term 2 document scatter, with the chosen centroids marked] 
2. Assign each training example to a cluster: pick the closest centroid 
3. Move the centroids: re-compute each centroid as the average of the training points assigned to it 
4. Repeat steps 2 and 3 until a maximum iteration count or convergence
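A minimal NumPy sketch of these four steps (Lloyd's algorithm; the seed and iteration cap are arbitrary choices):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Steps 1-4 above. X: (m, n) training points; returns centroids and labels."""
    rng = np.random.default_rng(seed)
    # 1. Randomly choose initial centroids from the training points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # 2. Assign each example to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Move centroids: average of the points assigned to each cluster
        moved = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                          else centroids[c] for c in range(k)])
        # 4. Stop early on convergence
        if np.allclose(moved, centroids):
            break
        centroids = moved
    return centroids, labels
```

As the editor's note for this slide points out, the averaging step only guarantees convergence for Euclidean distance measures.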
13 
Popular Machine Learning Tools 
• Apache Mahout: 
• Various recommender-system, clustering, and classification algorithms 
• Java-based, with some algorithms having Hadoop MapReduce implementations; recently started Spark implementations, with a new ML DSL 
• Stable, widely used in production, with community support 
• R: 
• Popular in the statistics world; has its own language 
• GNU license 
• Spark MLlib, MLbase (http://www.mlbase.org/): 
• Scala-based; runs on Spark (in-memory, distributed)
14 
Popular Machine Learning Tools 
• Weka: 
• Java based 
• GNU License 
• Vowpal Wabbit: http://hunch.net/~vw/, 
https://github.com/JohnLangford/vowpal_wabbit 
• Google Prediction API 
• http://en.wikipedia.org/wiki/Machine_learning#Software
15 
Machine Learning In Action 
• Mobile: 
• Speech Recognition: Google Now, Siri 
• Languages/NLP: Google Translate 
• Vision: face recognition in cameras and online photos, OCR 
• Misc: handwriting-driven MyScript Calculator and Stylus keyboard 
• Applications 
• OCR of printed documents and handwriting 
• Automatic tagging of photos based on similar faces 
• Biology and Medicine: 
• DNA analysis for likelihood of diseases, personalized drugs 
etc.
16 
Resources 
• Online Courses: 
• Coursera: Machine Learning (Andrew Ng) 
• Coursera: Neural Networks for Machine Learning (Geoffrey 
Hinton) 
• Udacity: Intro to Artificial Intelligence (Peter Norvig, Sebastian 
Thrun) 
• CMU: Introduction to Machine Learning (Alex Smola) 
• Berkeley: Scalable Machine Learning (Alex Smola) 
• Books: 
• Pattern Recognition and Machine Learning: Christopher Bishop 
• Machine Learning: Tom Mitchell 
• Mahout In Action 
• Artificial Intelligence: A Modern Approach (http://aima.cs.berkeley.edu/) 
• Machine Learning in Action
17 
Resources 
• Quora: 
• http://www.quora.com/How-do-you-explain-Machine-Learning-and-Data-Mining-to-non-Computer-Science-people 
• http://www.quora.com/Machine-Learning 
• Misc.: 
• http://fastml.com/ 
• http://alex.smola.org/ 
• https://funnel.hasgeek.com/fifthel2014/1132-realizing-large-scale-distributed-deep-learning-ne 
• http://spark-summit.org/2014/agenda 
• Tutorial on HMM, Speech Recognition: Rabiner 
• Tesseract OCR library


Editor's Notes

  • #5 Let's look at some real-life applications. Word Lens is a very popular mobile app which performs OCR, translation, and inline display of the translated text on the app screen. It uses a chain of ML classification algorithms: it detects areas of text in the image, performs OCR, then translation. Scribble-js performs classification of scribbled text using two pre-trained models: Logistic Regression and an Artificial Neural Network. Applications in particle physics: http://www.techrepublic.com/blog/european-technology/cern-where-the-big-bang-meets-big-data/ https://developers.google.com/events/io/sessions/333315382
  • #6 Recommender systems are a special kind of supervised learning; here the features are learnt from the user preferences. Clustering has applications in image compression too, apart from classical ML applications. Canopy clustering is another clustering algorithm, usually used to pick initial cluster centroids before running k-means clustering.
  • #8 https://github.com/klonikar/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/SparkLRMultiClass.scala
  • #12 Particle Physics: http://www.lpthe.jussieu.fr/~salam/repository/docs/kt-cgta-v2.pdf Higgs Boson: http://www.exploratorium.edu/origins/cern/ideas/higgs.html
  • #13 Step 3 of moving cluster centroids using the average minimizes distance for Euclidean distance measures. For non-Euclidean distance measures, the algorithm may not converge.