Object Recognition with Deformable Models

Pedro F. Felzenszwalb
Department of Computer Science
University of Chicago

Joint work with: Dan Huttenlocher, Joshua Schwartz, David McAllester, Deva Ramanan.
Example Problems

•   Detecting rigid objects (the PASCAL challenge)

•   Detecting non-rigid objects

•   Medical image analysis (segmenting cells)
Deformable Models
•   Significant challenge:
    - Handling variation in appearance within object classes
    - Non-rigid objects, generic categories, etc.
•   Deformable models approach:
    - Consider each object as a deformed version of a template
    - Compact representation
    - Leads to interesting modeling and algorithmic problems
Overview
•   Part I: Pictorial Structures
    - Deformable part models
    - Highly efficient matching algorithms
•   Part II: Deformable Shapes
    - Triangulated polygons
    - Hierarchical models
•   Part III: The PASCAL Challenge
    - Recognizing 20 object categories in realistic scenes
    - Discriminatively trained, multiscale, deformable part models
Part I: Pictorial Structures

•   Introduced by Fischler and Elschlager in 1973

•   Part-based models:
    - Each part represents local visual properties
    - “Springs” capture spatial relationships
Matching a model to an image involves joint optimization of part locations: “stretch and fit”.
Local Evidence + Global Decision

•   Parts have a match quality at each image location

•   Local evidence is noisy
    - Parts are detected in the context of the whole model
[Figure: a part template, a test image, and the corresponding match-quality map]
Matching Problem

•   The model is represented by a graph G = (V, E)
    - V = {v1,...,vn} are the parts
    - (vi,vj) ∈ E indicates a connection between parts

•   mi(li) is a cost for placing part i at location li

•   dij(li,lj) is a deformation cost

•   The optimal configuration for the object is L = (l1,...,ln) minimizing

        E(L) = ∑i=1..n mi(li) + ∑(vi,vj)∈E dij(li,lj)
Matching Problem

        E(L) = ∑i=1..n mi(li) + ∑(vi,vj)∈E dij(li,lj)

•   Assume n parts and k possible locations for each part
    - There are k^n configurations L

•   If the graph is a tree we can use dynamic programming
    - O(nk^2) algorithm

•   If dij(li,lj) = g(li-lj) we can use min-convolutions
    - O(nk) algorithm
    - As fast as matching each part separately!
Dynamic Programming on Trees

        E(L) = ∑i=1..n mi(li) + ∑(vi,vj)∈E dij(li,lj)

[Figure: a two-part model with parts v1 and v2]

•   For each l1 find the best l2:
    - Best2(l1) = min over l2 of [m2(l2) + d12(l1,l2)]

•   “Delete” v2 and solve the problem with the smaller model

•   Keep removing leaves until a single part is left (see the sketch below)
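As a concrete illustration, here is a minimal Python sketch of the leaf-elimination dynamic program. The children map, the cost tables m[i][l], and the deformation function d(i, j, li, lj) are illustrative names and data structures, not the talk's implementation.

# Minimal sketch: dynamic programming over a tree-structured model.
# `children[i]` lists the children of part i, `m[i][l]` is the match cost of
# placing part i at location l, `d(i, j, li, lj)` is the deformation cost for
# edge (i, j), and `locations` is the common set of candidate locations.

def best_energy(children, m, d, locations, root=0):
    """Min over L of sum_i mi(li) + sum over edges (i, j) of dij(li, lj)."""
    def message(i):
        # Cost of the best placement of the subtree rooted at part i,
        # as a function of i's own location.
        child_msgs = [(c, message(c)) for c in children.get(i, [])]
        best_i = {}
        for li in locations:
            total = m[i][li]
            for c, msg in child_msgs:
                # "Delete" child c by folding in its best response to li.
                total += min(msg[lc] + d(i, c, li, lc) for lc in locations)
            best_i[li] = total
        return best_i
    return min(message(root).values())

With k candidate locations per part this is the O(nk^2) tree algorithm; the next slide shows how min-convolutions remove the inner factor of k for suitable deformation costs.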
Min-Convolution Speedup

        Best2(l1) = min over l2 of [m2(l2) + d12(l1,l2)]

•   Brute force: O(k^2), where k is the number of locations

•   Suppose d12(l1,l2) = g(l1-l2):
    - Best2(l1) = min over l2 of [m2(l2) + g(l1-l2)]

•   Min-convolution: O(k) if g is convex
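The sketch below shows the O(k) computation for one convex choice, g(x) = x^2, using the lower-envelope construction behind generalized distance transforms; the function and variable names are illustrative.

import math

def min_convolution_quadratic(m):
    """Best(l1) = min over l2 of [m[l2] + (l1 - l2)**2], for all l1, in O(k)."""
    k = len(m)
    best = [0.0] * k
    v = [0] * k                  # locations of parabolas in the lower envelope
    z = [0.0] * (k + 1)          # boundaries between adjacent parabolas
    r = 0                        # index of the rightmost parabola in the envelope
    z[0], z[1] = -math.inf, math.inf
    for q in range(1, k):
        # Intersection of the parabola rooted at q with the rightmost one.
        s = ((m[q] + q * q) - (m[v[r]] + v[r] * v[r])) / (2 * q - 2 * v[r])
        while s <= z[r]:
            r -= 1
            s = ((m[q] + q * q) - (m[v[r]] + v[r] * v[r])) / (2 * q - 2 * v[r])
        r += 1
        v[r] = q
        z[r], z[r + 1] = s, math.inf
    r = 0
    for q in range(k):
        while z[r + 1] < q:
            r += 1
        best[q] = (q - v[r]) ** 2 + m[v[r]]
    return best

Plugging this into the inner loop of the tree algorithm gives the O(nk) matching cost quoted above.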
Finding Motorbikes

Model with 6 parts:
    - 2 wheels
    - 2 headlights
    - front & back of seat
Human Pose Estimation
Human Tracking

Ramanan, Forsyth, Zisserman. Tracking People by Learning Their Appearance. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Jan 2007.
Part II: Deformable Shapes
•   Shape is a fundamental cue for recognizing objects

•   Many objects have no well-defined parts
    - We can capture their outlines using deformable models
Triangulated Polygons

•   Polygonal templates

•   Delaunay triangulation gives a natural decomposition of an object

•   Consider deforming each triangle “independently”

    The rabbit ear can be bent by changing the shape of a single triangle.
Structure of Triangulated Polygons

There are two graphs associated with a triangulated polygon.

If the polygon is simple (no holes):
    - The dual graph is a tree
    - The graphical structure of the triangulation is a 2-tree
Deformable Matching

•   Consider piecewise affine maps from the model to the image, taking triangles to triangles (see the sketch below)

•   Find the globally optimal deformation using dynamic programming over the 2-tree

[Figure: model template and a matching to MRI data]
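For a single triangle, the affine piece of such a map is determined by its three vertex correspondences; here is a small numpy sketch (illustrative, not the talk's code) that recovers it.

import numpy as np

def triangle_affine_map(src, dst):
    """Affine map (A, t) with A @ s + t = d for three vertex pairs.

    `src` and `dst` are 3x2 arrays holding the model and image triangle
    vertices. This only recovers the warp for a fixed correspondence; choosing
    where the image triangles go is what the dynamic program over the 2-tree
    solves.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    S = np.hstack([src, np.ones((3, 1))])   # rows are [sx, sy, 1]
    M = np.linalg.solve(S, dst)             # solves [s 1] @ M = d for M (3x2)
    A, t = M[:2].T, M[2]
    return A, t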
Hierarchical Shape Model
•   Shape-tree of curve from a to b:
    -   Select midpoint c, store relative location c | a,b.
    -   Left child is a shape-tree of sub-curve from a to c.
    -   Right child is a shape-tree of sub-curve from c to b.
[Figure: a curve from a to b passing through points f, e, g, c, h, d, i, and its shape-tree: root c | a,b, children e | a,c and d | c,b, and leaves f | a,e, g | e,c, h | c,d, i | d,b]
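A minimal sketch of this recursive construction, assuming the curve is given as a list of 2D sample points; the particular "c | a,b" encoding below (coordinates of c in a frame built from a and b, invariant to similarity transforms of the endpoints) is one reasonable choice, not necessarily the one used in the talk.

import numpy as np

def relative_location(a, b, c):
    """Coordinates of c in the frame defined by endpoints a and b."""
    u = b - a
    v = np.array([-u[1], u[0]])            # perpendicular to u
    return np.linalg.solve(np.stack([u, v], axis=1), c - a)

def build_shape_tree(curve, i, j):
    """Shape-tree of the sub-curve between sample indices i and j."""
    if j - i < 2:
        return None                        # no interior midpoint to store
    k = (i + j) // 2                       # midpoint sample
    return {
        "rel": relative_location(curve[i], curve[j], curve[k]),
        "left": build_shape_tree(curve, i, k),
        "right": build_shape_tree(curve, k, j),
    }

# Example: root = build_shape_tree(points, 0, len(points) - 1)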
Deformations

•   Independently perturb relative locations stored in a shape-tree
    -   Local and global properties are preserved
    -   Reconstructed curve is perceptually similar to original
Matching

[Figure: the model curve with its shape-tree, and a target curve with sample points p, q, r; subtrees v, u and their parent w are matched to intervals of the target curve]

Match(v, [p,q]) = w1
Match(u, [q,r]) = w2
Match(w, [p,r]) = w1 + w2 + dif((e | a,c), (q | p,r))

This is similar to parsing with the CKY algorithm (see the sketch below).
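A sketch of the corresponding interval dynamic program, memoized over (node, interval) pairs in CKY fashion; it assumes shape-tree nodes as produced by the construction sketch above, and dif and leaf_cost are placeholder cost functions, not the talk's exact ones.

def match_shape_tree(tree, target, dif, leaf_cost):
    """CKY-style matching of a shape-tree against a sampled target curve.

    `target` is a list of sample points; `dif(rel, p, q, r)` measures how far
    point r sits from the relative location `rel` stored at a node, given the
    matched endpoints p and q; `leaf_cost(p, q)` scores a leaf segment.
    """
    memo = {}

    def match(node, i, j):
        if node is None:
            return leaf_cost(target[i], target[j])
        key = (id(node), i, j)
        if key not in memo:
            best = float("inf")
            for k in range(i + 1, j):      # candidate image of the midpoint
                best = min(best,
                           match(node["left"], i, k)
                           + match(node["right"], k, j)
                           + dif(node["rel"], target[i], target[j], target[k]))
            memo[key] = best
        return memo[key]

    # Assume the endpoints of the model curve map to the endpoints of the target.
    return match(tree, 0, len(target) - 1)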
Recognizing Leaves

Nearest neighbor classification on a leaf dataset: 15 species, 75 examples per species (25 training, 50 test).

    Method            Accuracy (%)
    Shape-tree        96.28
    Inner distance    94.13
    Shape context     88.12
Part III: PASCAL Challenge
•   ~10,000 images, with ~25,000 target objects
    - Objects from 20 categories (person, car, bicycle, cow, table...)
    - Objects are annotated with labeled bounding boxes
Model Overview

[Figure: a detection, the root filter, the part filters, and the deformation models]

The model has a root filter plus deformable parts.
Histogram of Oriented Gradient (HOG) Features

•   The image is partitioned into 8x8 pixel blocks

•   In each block we compute a histogram of gradient orientations (see the sketch below)
    - Invariant to changes in lighting, small deformations, etc.

•   We compute features at different resolutions (a pyramid)
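A stripped-down numpy sketch of the per-block orientation histograms; block normalization and the exact binning used in the actual system are omitted, and the function name and parameters are illustrative.

import numpy as np

def block_orientation_histograms(gray, block=8, bins=9):
    """Per-block histograms of gradient orientations for a grayscale image."""
    gray = gray.astype(float)
    gy, gx = np.gradient(gray)                       # central differences
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # unsigned orientation
    idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)

    h, w = gray.shape
    bh, bw = h // block, w // block
    hist = np.zeros((bh, bw, bins))
    for y in range(bh * block):
        for x in range(bw * block):
            hist[y // block, x // block, idx[y, x]] += mag[y, x]
    return hist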
Filters

•   Filters are rectangular templates defining weights for features

•   The score is the dot product of the filter and a subwindow of the HOG pyramid (see the sketch below)

[Figure: a filter W and a subwindow H of the HOG pyramid; the score of W at this location is H ⋅ W]
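The sliding dot product can be sketched directly over one level of the feature pyramid (shapes and names are illustrative; a real implementation would use optimized convolution routines).

import numpy as np

def filter_score_map(features, filt):
    """Score of a filter at every location of one pyramid level.

    `features` has shape (H, W, d) and `filt` has shape (h, w, d); the score at
    (y, x) is the dot product of `filt` with the subwindow whose top-left
    corner is (y, x).
    """
    H, W, d = features.shape
    h, w, _ = filt.shape
    scores = np.zeros((H - h + 1, W - w + 1))
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            scores[y, x] = np.sum(features[y:y + h, x:x + w, :] * filt)
    return scores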
Object Hypothesis

[Figure: an object hypothesis shown in the image pyramid and the HOG feature pyramid]

The score is the sum of the filter scores plus the deformation scores.

The multiscale model captures features at two resolutions.
Training
•   Training data consists of images with labeled bounding boxes

•   Need to learn the model structure, filters and deformation costs




Connection With Linear Classifiers

•   The score of the model is a sum of filter scores plus deformation scores
    - A bounding box in the training data specifies that the score should be high for some placement in a range

•   The score can be written as a dot product w ⋅ Φ(x, z), where:
    - w is a model: the concatenation of the filters and deformation parameters
    - x is a detection window
    - z are the filter placements: Φ(x, z) is the concatenation of the features and part displacements
Latent SVMs

The detection score maximizes over the latent placements z:

        fw(x) = max over z of w ⋅ Φ(x, z)

This is linear in w if z is fixed. Training minimizes a regularized hinge loss (see the sketch below):

        (1/2)||w||^2 + C ∑i max(0, 1 - yi fw(xi))
        regularization        hinge loss
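A numpy sketch of the score and objective under these definitions; the list-of-placements representation of the latent space and the constant C are illustrative simplifications.

import numpy as np

def score(w, placements):
    """fw(x) = max over latent placements z of w . Phi(x, z).

    `placements` is a list of feature vectors Phi(x, z), one per candidate
    placement z of the filters in the detection window x.
    """
    return max(np.dot(w, phi) for phi in placements)

def lsvm_objective(w, examples, C=1.0):
    """(1/2)||w||^2 + C * sum_i max(0, 1 - yi * fw(xi)).

    `examples` is a list of (label, placements) pairs with labels in {+1, -1}.
    The objective is convex in w once the placements chosen for the positive
    examples are held fixed.
    """
    obj = 0.5 * np.dot(w, w)
    for y, placements in examples:
        obj += C * max(0.0, 1.0 - y * score(w, placements))
    return obj

Training alternates between picking the best placements z for the positive examples and solving the resulting convex problem in w.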
Learned Models

[Figure: visualizations of the learned models for bicycle, sofa, car, and bottle]
Example Results
More Results
Overall Results

•   9 systems competed in the 2007 challenge

•   Out of 20 classes we get:
    - First place in 10 classes
    - Second place in 6 classes
•   Some statistics:
    - It takes ~2 seconds to evaluate a model in one image
    - It takes ~3 hours to train a model
    - MUCH faster than most systems
Component Analysis

[Figure: precision-recall curves on the PASCAL 2006 person category: Root (0.18), Root+Latent (0.24), Parts+Latent (0.29), Root+Parts+Latent (0.34)]
Summary

•   Deformable models provide an elegant framework for object
    detection and recognition

    - Efficient algorithms for matching models to images
    - Applications: pose estimation, medical image analysis,
      object recognition, etc.

•   We can learn models from partially labeled data

    - Generalized standard ideas from machine learning
    - Leads to state-of-the-art results in PASCAL challenge
•   Future work: hierarchical models, grammars, 3D objects
