Overview of Machine Learning and Feature Engineering

Machine Learning 101 Tutorial
Strata + Hadoop World, NYC, Sep 2015
Alice Zheng, Dato

About us
• Chris DuBois: Intro to recommenders
• Alice Zheng: Overview of ML
• Piotr Teterwak: Intro to image search & deep learning
• Krishna Sridhar: Deploying ML as a predictive service
• Danny Bickson: TA
• Alon Palombo: TA

Why machine learning?
Model data.
Make predictions.
Build intelligent
applications.

Classification
Predict amongst a discrete set of classes

Spam filtering
Data → prediction: spam vs. not spam

Text classification
EDUCATION
FINANCE
TECHNOLOGY

Regression
Predict real/numeric values

Similarity
Find things like this

Similar products
Product I’m buying
Output: other products I might be interested in

Given image, find similar images
http://www.tiltomo.com/

Recommender systems
Learn what I want before I know it

Playlist recommendations
Recommendations form a coherent and diverse sequence

Friend recommendations
Users and “items” are of
the same type

Clustering
Grouping similar items

Clustering images
Goldberger et al.
Set of Images

Clustering web search results

Machine learning … how?
Data
Answers
I fell in love the instant I laid
my eyes on that puppy. His
big eyes and playful tail, his
soft furry paws, …
Many systems
Many tools
Many teams
Lots of methods/jargon

The machine learning pipeline
I fell in love the instant I laid
my eyes on that puppy. His
big eyes and playful tail, his
soft furry paws, …
Raw data
Features
Models
Predictions
Deploy in
production

Three things to know about ML
• Feature = numeric representation of raw data
• Model = mathematical “summary” of features
• Making something that works = choosing the right model
and features, given the data and the task

Feature = numeric representation of raw data

Representing natural text
It is a puppy and it is
extremely cute.
What’s important?
Phrases? Specific
words? Ordering?
Subject, object, verb?
Classify:
puppy or not?
Raw Text
{“it”:2,
“is”:2,
“a”:1,
“puppy”:1,
“and”:1,
“extremely”:1,
“cute”:1 }
Bag of Words
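
The bag-of-words counts on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not the tutorial's own code: the function name `bag_of_words` and the tokenization rule (lowercase, letters and apostrophes only) are assumptions made for illustration.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into words, and count each word."""
    # Assumed tokenization: runs of letters/apostrophes; punctuation is dropped.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

bow = bag_of_words("It is a puppy and it is extremely cute.")
print(dict(bow))
```

Word order is discarded: only the counts survive, which is exactly what the "bag" in the name refers to.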

Representing natural text
It is a puppy and it is
extremely cute.
Classify:
puppy or not?
Raw Text → Bag of Words
it 2
they 0
I 0
am 0
how 0
puppy 1
and 1
cat 0
aardvark 0
cute 1
extremely 1
… …
Sparse vector representation
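
One way to picture the sparse vector: fix a vocabulary, give every word a position, and store only the non-zero counts. The sketch below assumes a tiny hypothetical vocabulary (`VOCABULARY`) taken from the words listed on the slide; a real vocabulary would cover the whole corpus. The helper names are illustrative.

```python
from collections import Counter

# Hypothetical vocabulary, lowercased; a real one would hold every corpus word.
VOCABULARY = ["it", "they", "i", "am", "how", "puppy",
              "and", "cat", "aardvark", "cute", "extremely"]

def to_dense_vector(text, vocabulary):
    """Count each vocabulary word in the text, in vocabulary order."""
    counts = Counter(text.lower().replace(".", "").split())
    return [counts[word] for word in vocabulary]

def to_sparse(vector):
    """Keep only the non-zero entries as {position: count}."""
    return {index: value for index, value in enumerate(vector) if value != 0}

dense = to_dense_vector("It is a puppy and it is extremely cute.", VOCABULARY)
sparse = to_sparse(dense)
print(dense)
print(sparse)
```

With a vocabulary of hundreds of thousands of words, almost every entry is zero for any single document, so the sparse form is far smaller than the dense one.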

Representing images
Image source: “Recognizing and learning object categories,”
Li Fei-Fei, Rob Fergus, Antonio Torralba, ICCV 2005-2009.
Raw image:
millions of RGB triplets,
one for each pixel
Classify:
person or animal?
Raw Image → Bag of Visual Words

Representing images
Classify:
person or animal?
Raw Image → Deep learning features
3.29, -15, -5.24, 48.3, 1.36, 47.1, -1.92, 36.5, 2.83, 95.4, -19, -89, 5.09, 37.8
Dense vector representation

Feature space in machine learning
• Raw data → high-dimensional vectors
• Collection of data points → point cloud in feature space
• Feature engineering = creating features of the appropriate
granularity for the task

Crudely speaking, mathematicians fall into two
categories: the algebraists, who find it easiest to reduce
all problems to sets of numbers and variables, and the
geometers, who understand the world through shapes.
-- Masha Gessen, “Perfect Rigor”

Algebra vs. Geometry
Algebra: a^2 + b^2 = c^2
Geometry: right triangle with sides a, b, c
Pythagorean Theorem (Euclidean space)

Visualizing a sphere in 2D
x^2 + y^2 = 1
Pythagorean theorem: a^2 + b^2 = c^2
[Figure: unit circle on x and y axes, with a right triangle of sides a, b, c]

x^2 + y^2 + z^2 = 1
[Figure: unit sphere on x, y, z axes]

x^2 + y^2 + z^2 + t^2 = 1
[Figure: x, y, z axes]

Why are we looking at spheres?
Poincaré Conjecture:
Every physical object without holes
is “equivalent” to a sphere.

The power of higher dimensions
• A sphere in 4D can model the birth and death process of
physical objects
• High dimensional features can model many things

The challenge of high-dimensional geometry
• Feature space can have hundreds to millions of
dimensions
• In high dimensions, our geometric imagination is limited
- Algebra comes to our aid
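
As a small illustration of algebra standing in for geometric intuition, the sphere equations from the earlier slides generalize to any number of dimensions: a point lies on the unit sphere when the sum of its squared coordinates equals 1. The sketch below assumes nothing beyond that formula; the function names and the tolerance `tol` are illustrative choices.

```python
import math

def euclidean_norm(point):
    """Length of a vector: square root of the sum of squared coordinates."""
    return math.sqrt(sum(x * x for x in point))

def on_unit_sphere(point, tol=1e-9):
    """True if the point is (numerically) at distance 1 from the origin."""
    return abs(euclidean_norm(point) - 1.0) <= tol

circle_point = [0.6, 0.8]                    # 2D: x^2 + y^2 = 1
four_d_point = [0.5, 0.5, 0.5, 0.5]          # 4D: x^2 + y^2 + z^2 + t^2 = 1
high_dim_point = [1 / math.sqrt(1000)] * 1000  # 1000 dimensions

print(on_unit_sphere(circle_point))
print(on_unit_sphere(four_d_point))
print(on_unit_sphere(high_dim_point))
```

The same two functions work unchanged in 2, 4, or 1000 dimensions, even though only the first case can be drawn.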

Visualizing bag-of-words
[Figure: axes puppy and cute; the document below is plotted at (1, 1)]
I have a puppy and
it is extremely cute
it 1
they 0
I 1
am 0
how 0
puppy 1
and 1
cat 0
aardvark 0
zebra 0
cute 1
extremely 1
… …

Visualizing bag-of-words
[Figure: axes puppy, cute, extremely; three documents plotted as points]
I have a puppy and
I have an extremely cute cat
I have a cute puppy
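
The coordinates of each document in this three-word feature space can be computed directly. A minimal sketch, assuming the first document is the full sentence from the previous slide ("I have a puppy and it is extremely cute"); `AXES` and `to_point` are names introduced here for illustration.

```python
from collections import Counter

# The three words used as axes in the figure.
AXES = ["puppy", "cute", "extremely"]

def to_point(document, axes=AXES):
    """Map a document to its word counts along the chosen axes."""
    counts = Counter(document.lower().split())
    return tuple(counts[word] for word in axes)

documents = [
    "I have a puppy and it is extremely cute",
    "I have an extremely cute cat",
    "I have a cute puppy",
]
points = [to_point(doc) for doc in documents]
for doc, point in zip(documents, points):
    print(point, doc)
```

Each document becomes one point; documents that share words land near each other along the shared axes.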

Document point cloud
word 1
word 2

Model = mathematical “summary” of features

What is a summary?
• Data → point cloud in feature space
• Model = a geometric shape that best “fits” the point cloud

Clustering model
Feature 2
Feature 1
Group data points tightly
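
The slide does not name a specific clustering method. As one possible illustration, the sketch below uses k-means (Lloyd's algorithm), which groups points around cluster centers. The data points and starting centers are made up for the example, and `math.dist` requires Python 3.8 or later.

```python
import math

def kmeans(points, centers, iterations=10):
    """Plain k-means: alternate between assigning points and moving centers."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters

# Made-up 2-D data with two visible groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centers, clusters = kmeans(points, centers=[(0, 0), (10, 10)])
print(centers)
```

The result depends on the starting centers; practical implementations usually try several initializations.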

Classification model
Feature 2
Feature 1
Decide between two classes
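
A line that separates two classes in a two-feature space can be learned in many ways. The sketch below uses the classic perceptron rule purely as an example; the slide does not say which classifier is meant, and the training points are invented.

```python
def train_perceptron(points, labels, epochs=20, lr=1.0):
    """Learn weights w and bias b so that sign(w.x + b) matches the labels."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            predicted = 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1
            if predicted != y:
                # Nudge the line toward classifying this point correctly.
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y
    return w, b

def predict(point, w, b):
    """Return +1 or -1 depending on which side of the line the point falls."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else -1

# Invented, linearly separable data: class -1 near the origin, class +1 farther out.
points = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_perceptron(points, labels)
print(w, b)
```

The learned line `w[0]*x1 + w[1]*x2 + b = 0` is the decision boundary; the perceptron only converges when the classes can be separated by a line.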

Regression model
Target
Feature
Fit the target values
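
For a single feature, fitting the target values with a straight line has a closed-form least-squares solution. The sketch below assumes ordinary least squares, which the slide does not name explicitly, and uses made-up data that lies exactly on a line.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data generated from y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With noisy data the same function returns the line that minimizes the sum of squared vertical distances to the points.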

Visualizing Feature Engineering

When does bag-of-words fail?
[Figure: axes puppy, cat, have; four documents plotted as points]
I have a puppy
I have a cat
I have a kitten
I have a dog and I have a pen
Task: find a surface that separates
documents about dogs vs. cats
Problem: the word “have” adds fluff
instead of information

Improving on bag-of-words
• Idea: “normalize” word counts so that popular words
are discounted
• Term frequency (tf) = number of times a term
appears in a document
• Inverse document frequency of a word (idf) =
log(N / number of documents containing the word)
• N = total number of documents
• Tf-idf count = tf × idf
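
The definitions above can be checked on the four example documents used in these slides. This sketch assumes idf(word) = log(N / number of documents containing the word), which matches the values log 4 and log 1 = 0 quoted on the next slide; the function names are illustrative.

```python
import math
from collections import Counter

documents = [
    "I have a puppy",
    "I have a cat",
    "I have a kitten",
    "I have a dog and I have a pen",
]
tokenized = [doc.lower().split() for doc in documents]
N = len(tokenized)  # total number of documents

def idf(word):
    """log(N / number of documents containing the word).

    Assumes the word occurs in at least one document.
    """
    containing = sum(1 for tokens in tokenized if word in tokens)
    return math.log(N / containing)

def tfidf(word, tokens):
    """Term frequency in this document times inverse document frequency."""
    return Counter(tokens)[word] * idf(word)

print(idf("puppy"), idf("cat"), idf("have"))
print(tfidf("have", tokenized[3]))  # "have" appears twice, but its idf is 0
```

Because "have" occurs in every document, its idf is zero and it contributes nothing, no matter how often it appears in a single document.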

From BOW to tf-idf
[Figure: axes puppy, cat, have; four documents plotted as points]
I have a puppy
I have a cat
I have a kitten
I have a dog and I have a pen
idf(puppy) = log 4
idf(cat) = log 4
idf(have) = log 1 = 0

From BOW to tf-idf
[Figure: axes puppy, cat, have; ticks at 1 and log 4; decision surface]
tfidf(puppy) = log 4
tfidf(cat) = log 4
tfidf(have) = 0
Points: “I have a dog and I have a pen, I have a kitten”; “I have a cat”; “I have a puppy”
Tf-idf flattens uninformative dimensions in the BOW point cloud

Entry points of feature engineering
• Start from data and task
- What’s the best text representation for classification?
• Start from modeling method
- What kind of features does k-means assume?
- What does linear regression assume about the data?

Dato’s Machine Learning Platform

Dato’s machine learning platform
Raw data
Features Models
Predictions
Deploy in
production
GraphLab Create
Dato Distributed
Dato Predictive Services

Data structures for feature engineering
Features SFrames
User Com.
Title Body
User Disc.
SGraphs

Machine learning toolkits in GraphLab Create
• Classification/regression
• Clustering
• Recommenders
• Deep learning
• Similarity search
• Data matching
• Sentiment analysis
• Churn prediction
• Frequent pattern mining
• And on…

Dimensionality reduction
Feature 1
Feature 2
Flatten non-useful features
PCA: Find most non-flat
linear subspace

PCA: Principal Component Analysis
Center data at origin

Find a line such that the average squared distance
from every data point to the line is minimized.
This is the 1st Principal Component

Find a 2nd line
- at right angles to the 1st
- such that the average squared distance from every
data point to the line is minimized.
This is the 2nd Principal Component

Find a 3rd line
- at right angles to the previous lines
- such that the average squared distance from every
data point to the line is minimized.
…
There can only be as many principal components as
the dimensionality of the data.
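
The first principal component can be computed as the top eigenvector of the covariance matrix of the centered data; the line through the center in that direction is the one with the smallest average squared distance to the points. The slides do not say how the line is found, so the sketch below uses power iteration as one simple option. The data are made up and lie exactly on the line y = 2x.

```python
import math

def first_principal_component(points, iterations=100):
    """Return (mean, unit direction) of the best-fitting line through the data."""
    n = len(points)
    dim = len(points[0])
    # Center the data at the origin.
    means = [sum(p[j] for p in points) / n for j in range(dim)]
    centered = [[p[j] - means[j] for j in range(dim)] for p in points]
    # Covariance matrix of the centered data.
    cov = [[sum(row[a] * row[b] for row in centered) / n for b in range(dim)]
           for a in range(dim)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    # Assumes the all-ones start vector is not orthogonal to the answer.
    v = [1.0] * dim
    for _ in range(iterations):
        v = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return means, v

# Made-up points on the line y = 2x.
points = [(1, 2), (2, 4), (3, 6), (4, 8)]
means, direction = first_principal_component(points)
print(means, direction)
```

Later components can be found the same way after removing the part of the data that lies along the components already found.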

Coursera Machine Learning Specialization
• Learn machine learning in depth
• Build and deploy intelligent applications
• Year-long certification program
• Joint project between University of Washington + Dato
• Details:
https://www.coursera.org/specializations/machine-learning

Next up today
alicez@dato.com @RainyData, #StrataConf
11:30am - Intro to recommenders
Chris DuBois
1:30pm - Intro to image search & deep learning
Piotr Teterwak
3:30pm - Deploying ML as a predictive service
Krishna Sridhar
