An Introduction to Face
Detection and Recognition
Ziyou Xiong
Dept. of Electrical and Computer Engineering,
Univ. of Illinois at Urbana-Champaign
Outline
• Face Detection
• What is face detection?
• Importance of face detection
• Current state of research
• Different approaches
• One example
• Face Recognition
• What is face recognition?
• Its applications
• Different approaches
• One example
• A Video Demo
What is Face Detection?
• Given an image, determine whether
it contains any human faces and,
if so, where each face is located.
Importance of Face Detection
• The first step for any automatic face recognition system
• First step in many Human Computer Interaction systems
• Expression Recognition
• Cognitive State/Emotional State Recognition
• First step in many surveillance systems
• Tracking: Face is a highly non-rigid object
• A step towards Automatic Target Recognition (ATR) or generic object
detection/recognition
• Video coding……
Face Detection: current state
• State-of-the-art:
• Front-view face detection can be done at >15 frames per second on 320x240
black-and-white images on a 700MHz PC with ~95% accuracy.
• Detection of faces is faster than detection of edges!
• Side-view face detection remains difficult.
Face Detection: challenges
• Out-of-Plane Rotation: frontal, 45 degree, profile, upside down
• Presence of beard, mustache, glasses etc
• Facial Expressions
• Occlusions by long hair, hand
• In-Plane Rotation
• Image conditions:
• Size
• Lighting condition
• Distortion
• Noise
• Compression
Different Approaches
• Knowledge-based methods:
• Encode what constitutes a typical face, e.g., the relationship between facial features
• Feature invariant approaches:
• Aim to find structural features of a face that exist even when pose, viewpoint, or lighting
conditions vary
• Template matching:
• Several standard patterns stored to describe the face as a whole or the facial features
separately
• Appearance-based methods:
• The models are learned from a set of training images that capture the representative
variability of faces.
Knowledge-Based Methods
• Top-down approach: Represent a face using a set of human-coded rules
Example:
• The center part of face has uniform intensity values
• The difference between the average intensity values of the center part and
the upper part is significant
• A face often appears with two eyes that are symmetric to each other, a nose
and a mouth
• Use these rules to guide the search process
Knowledge-Based Method: [Yang and Huang 94]
• Level 1 (lowest resolution):
• apply the rule “the center part of the face has 4 cells with a basically uniform
intensity” to search for candidates
• Level 2: local histogram equalization followed by edge detection
• Level 3: search for eye and mouth features for validation
Knowledge-based Methods: Summary
• Pros:
• Easy to come up with simple rules
• Based on the coded rules, facial features in an input image are extracted first, and face
candidates are identified
• Work well for face localization in uncluttered background
• Cons:
• Difficult to translate human knowledge into rules precisely: detailed rules fail to detect faces
and general rules may find many false positives
• Difficult to extend this approach to detect faces in different poses: implausible to enumerate
all the possible cases
Feature-Based Methods
• Bottom-up approach: Detect facial features (eyes, nose, mouth, etc)
first
• Facial features: edge, intensity, shape, texture, color, etc
• Aim to detect invariant features
• Group features into candidates and verify them
Color Spaces
Skin Detection
Skin Modeling in RGB Color Space
• Most commonly used color space for digital images
• Cameras generally use RGB to store data
• Normalized RGB (r+g+b=1)
• More effective than RGB
• Mitigates lighting effects
• Reduces the differences between skin-tone pixels due to ethnicity
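A minimal sketch of the normalization above, assuming 8-bit channel values. The function name `normalize_rgb` and the convention for pure black are illustrative assumptions, not part of the original slides. It shows that scaling all three channels by the same brightness factor leaves the normalized values unchanged.

```python
def normalize_rgb(r, g, b):
    """Return normalized (r, g, b) coordinates that sum to 1."""
    total = r + g + b
    if total == 0:
        # Pure black has no defined chromaticity; split evenly by convention.
        return (1 / 3, 1 / 3, 1 / 3)
    return (r / total, g / total, b / total)

# The same surface color under bright and dim light maps to the same point.
bright = normalize_rgb(200, 120, 80)
dim = normalize_rgb(100, 60, 40)
print(bright, dim)
```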
HSV and HSI
• Hue, Saturation, and Intensity (HSI)
• Hue, Saturation, and Value (HSV)
• Benefits: HSV is able to cope with
• High intensity white light
• Ambient lights
• Different surface orientations relative to the light source
• Good choice for skin detection methods
YCbCr and YUV
• Able to reduce the redundancy present in RGB color channels
• Able to represent color as independent components
• Can separate luminance and chrominance components
• These spaces are a favorable choice for skin detection
• The YCbCr space is one of the most popular choices for skin detection
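A rough sketch of skin classification in YCbCr, assuming 8-bit RGB input and the full-range BT.601 (JFIF) conversion. The chrominance box in `CB_RANGE` and `CR_RANGE` is an illustrative assumption of the kind quoted in the skin-detection literature, not a value from these slides. Thresholds usually need tuning on the target data.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (BT.601 / JFIF coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

# Assumed chrominance box for skin; luminance (Y) is deliberately ignored.
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)

def is_skin(r, g, b):
    """Classify a pixel as skin if its (Cb, Cr) falls inside the assumed box."""
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return CB_RANGE[0] <= cb <= CB_RANGE[1] and CR_RANGE[0] <= cr <= CR_RANGE[1]

print(is_skin(220, 170, 140), is_skin(0, 0, 255))
```

Because only Cb and Cr are tested, the decision is largely insensitive to brightness, which is the separation of luminance and chrominance described above.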
Skin-tone Classifiers
Methods for finding appropriate skin range
• Skin colors of people from different ethnic backgrounds cluster
in a small region of the color space
• By providing the mean and covariance values, the skin color model
can be fitted into a Gaussian model
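As a sketch of the Gaussian skin model, the code below estimates a mean and covariance from a handful of (Cb, Cr) samples and scores new pixels by squared Mahalanobis distance. The sample values, the function names, and the choice of a two-dimensional chrominance space are assumptions made for illustration.

```python
def fit_gaussian(samples):
    """Estimate the mean and 2x2 sample covariance of (cb, cr) pairs."""
    n = len(samples)
    mean = [sum(s[i] for s in samples) / n for i in range(2)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(2)] for i in range(2)]
    return mean, cov

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of x from the fitted Gaussian."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mean[0], x[1] - mean[1]]
    return sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))

# Hypothetical skin-tone chrominance samples.
skin_samples = [(105, 150), (110, 155), (100, 148), (108, 160), (103, 152)]
mean, cov = fit_gaussian(skin_samples)
print(mahalanobis_sq((105, 150), mean, cov), mahalanobis_sq((180, 90), mean, cov))
```

A pixel would then be labeled skin when its distance falls below a threshold chosen on validation data.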
Distribution of skin-color
Self Organizing Maps
• The self-organizing map (SOM) algorithm is based on unsupervised, competitive
learning
• The SOM can serve as a clustering tool for both high-dimensional and
low-dimensional data
• Can train to learn skin-color and non-skin-color pixel distributions
• SOM can generalize well
Bayesian
• Bayesian networks (BN) are probabilistic graphical models that
represent a set of variables and their probabilistic independencies
• Sebe et al. used a BN for skin modeling and classification
• Training data of 60,000 samples
• Detection rates of 95.82%
• Only 5% false positives
Histogram Classification
• Represent range of skin tones as a color histogram
• Quantize the histogram into color “bins”
• Use training data to calculate the probability that skin falls within
each bin
• If probability is above a certain threshold, the pixel is classified as skin
• This method has been used extensively by many researchers
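The histogram steps above can be sketched as follows. The bin width, the toy training pixels, and all function names are assumptions for illustration. A practical system would train on many thousands of labeled pixels.

```python
from collections import Counter

BIN = 32  # assumed bin width per channel (8 bins per 8-bit channel)

def bin_of(pixel):
    """Quantize an (r, g, b) pixel into a histogram bin index."""
    return tuple(c // BIN for c in pixel)

def train_histograms(skin_pixels, nonskin_pixels):
    """Count how many skin and non-skin training pixels land in each bin."""
    return Counter(map(bin_of, skin_pixels)), Counter(map(bin_of, nonskin_pixels))

def skin_probability(pixel, skin_hist, nonskin_hist):
    """Estimate P(skin | bin) from counts; 0.0 for bins never seen in training."""
    s = skin_hist[bin_of(pixel)]
    n = nonskin_hist[bin_of(pixel)]
    return s / (s + n) if s + n else 0.0

def is_skin(pixel, skin_hist, nonskin_hist, threshold=0.5):
    """Classify as skin when the estimated probability exceeds the threshold."""
    return skin_probability(pixel, skin_hist, nonskin_hist) > threshold

skin = [(220, 170, 140), (210, 160, 130), (225, 175, 150)]
nonskin = [(20, 30, 200), (30, 40, 210), (215, 165, 135)]
skin_hist, nonskin_hist = train_histograms(skin, nonskin)
print(skin_probability((218, 168, 138), skin_hist, nonskin_hist))
```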
Implementation
Hypothesis
• Current methodologies perform reasonably well under certain
conditions; however, they produce too many false positives
• By coupling and filtering differing techniques, false positives can be
reduced
Skin-tone Threshold
• Skin tone ranges were initially taken from journal articles
• These values were not adequate to support the unstructured dataset
• Ultimately, the values were found using experimentation
Test Cases
• Analysis in each color space
• HSI, HSV, RGB, YCbCr
• Average was taken as an additional metric
• We ran the tests in the following configurations in order to evaluate
the overall performance of each aspect
• Skin detection only
• Face detection only
• Skin AND face detection
• Skin OR face detection
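One way to picture the four test configurations is as boolean combinations of two per-image decisions. The sketch below uses made-up detector outputs (not results from the experiments) and hypothetical names to show how AND trades false positives for false negatives, while OR does the reverse.

```python
MODES = {
    "skin": lambda skin, face: skin,
    "face": lambda skin, face: face,
    "and": lambda skin, face: skin and face,
    "or": lambda skin, face: skin or face,
}

def confusion(samples, mode):
    """Count (TP, FP, TN, FN) over (skin_hit, face_hit, has_face) samples."""
    decide = MODES[mode]
    tp = fp = tn = fn = 0
    for skin_hit, face_hit, has_face in samples:
        predicted = decide(skin_hit, face_hit)
        if predicted and has_face:
            tp += 1
        elif predicted:
            fp += 1
        elif has_face:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

# Made-up outcomes: (skin detector fired, face detector fired, face present).
samples = [
    (True, True, True),
    (True, False, True),
    (True, False, False),   # skin-colored background
    (False, True, False),   # face detector false alarm
    (False, False, False),
]
for mode in MODES:
    print(mode, confusion(samples, mode))
```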
Average Comparison
[Bar chart, vertical axis 0 to 120. Groups: Face OR Skin, Face AND Skin, Face, Skin. Series: positive, false negative, false positive, negative.]
HSI Comparison
[Bar chart with the same axis, groups, and series.]
HSV Comparison
[Bar chart with the same axis, groups, and series.]
RGB Comparison
[Bar chart with the same axis, groups, and series.]
YCbCr Comparison
[Bar chart with the same axis, groups, and series.]
The Viola/Jones Face Detector
• A seminal approach to real-time object detection
• Training is slow, but detection is very fast
• Key ideas
• Integral images for fast feature evaluation
• Boosting for feature selection
• Attentional cascade for fast rejection of non-face windows
P. Viola and M. Jones. Rapid object detection using a boosted cascade of
simple features. CVPR 2001.
P. Viola and M. Jones. Robust real-time face detection. IJCV 57(2), 2004.
Image Features
“Rectangle filters”
Value = ∑ (pixels in white area) – ∑ (pixels in black area)
Example
[Figure: source image and the result of applying a rectangle filter]
Fast computation with integral images
• The integral image computes
a value at each pixel (x, y)
that is the sum of the pixel
values above and to the left
of (x, y), inclusive
• This can quickly be
computed in one pass
through the image
Computing the integral image
[Figure: pixel i(x, y), with ii(x, y−1) directly above it and s(x−1, y) directly to its left]
• Cumulative row sum: s(x, y) = s(x–1, y) + i(x, y)
• Integral image: ii(x, y) = ii(x, y−1) + s(x, y)
MATLAB: ii = cumsum(cumsum(double(i)), 2);
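For readers without MATLAB, here is a small Python sketch of the same one-pass computation using the two recurrences above. It assumes the image is a list of rows indexed as `img[y][x]`, and the function name is illustrative.

```python
def integral_image(img):
    """ii[y][x] = sum of img over all rows <= y and columns <= x."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0  # s(x, y): cumulative sum along the current row
        for x in range(w):
            row_sum += img[y][x]                                  # s(x, y) = s(x-1, y) + i(x, y)
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)   # ii(x, y) = ii(x, y-1) + s(x, y)
    return ii

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(ii)
```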
Example
[Figure: a rectangle feature evaluated on the integral image; the corner values are combined with weights −1, +1 / +2, −2 / −1, +1]
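A sketch of how rectangle features reduce to a few integral-image lookups. It assumes a zero-padded integral image (one extra row and column of zeros) so that no boundary checks are needed. Treating the left half as "white" in `two_rect_feature` is an arbitrary convention chosen for this example.

```python
def integral_image(img):
    """Zero-padded integral image: ii[y][x] is the sum of img rows < y, cols < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left pixel (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    """White (left) half minus black (right) half of a 2w-by-h window."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2), two_rect_feature(ii, 0, 0, 2, 3))
```

The cost of each feature is independent of the rectangle size, which is what makes evaluating many windows at many scales affordable.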
Feature selection
• For a 24x24 detection region, the number of possible rectangle
features is ~160,000!
• At test time, it is impractical to evaluate the entire feature set
• Can we create a good classifier using just a small subset of all possible
features?
• How to select such a subset?
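The feature count quoted above can be checked with a short enumeration. This sketch assumes five base shapes (two-rectangle horizontal and vertical, three-rectangle horizontal and vertical, and a four-rectangle 2x2 pattern), each scaled by integer factors and placed at every position inside the window. The exact total depends on which shapes are included.

```python
def count_features(window=24, shapes=((2, 1), (1, 2), (3, 1), (1, 3), (2, 2))):
    """Count all scaled and shifted rectangle features inside a square window."""
    total = 0
    for base_w, base_h in shapes:
        # Widths and heights are integer multiples of the base shape.
        for w in range(base_w, window + 1, base_w):
            for h in range(base_h, window + 1, base_h):
                total += (window - w + 1) * (window - h + 1)
    return total

print(count_features())
```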
Boosting
• Boosting is a classification scheme that works by combining weak
learners into a more accurate ensemble classifier
• A weak learner need only do better than chance
• Training consists of multiple boosting rounds
• During each boosting round, we select a weak learner that does well on
examples that were hard for the previous weak learners
• “Hardness” is captured by weights attached to training examples
Y. Freund and R. Schapire, A short introduction to boosting, Journal of
Japanese Society for Artificial Intelligence, 14(5):771-780, September, 1999.
Training procedure
• Initially, weight each training example equally
• In each boosting round:
• Find the weak learner that achieves the lowest weighted
training error
• Raise the weights of training examples misclassified by
current weak learner
• Compute final classifier as linear combination of all
weak learners (the weight of each learner increases
with its accuracy)
• Exact formulas for re-weighting and combining
weak learners depend on the particular boosting
scheme (e.g., AdaBoost)
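The training procedure above can be sketched with discrete AdaBoost and threshold "stumps" on one-dimensional toy data. The data set, the stump form, and all names are assumptions for illustration. In the Viola/Jones setting each stump would threshold a rectangle feature instead of a raw number.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak learner: +polarity at or above the threshold, -polarity below it."""
    return polarity if x >= threshold else -polarity

def best_stump(xs, ys, weights):
    """Pick the threshold/polarity pair with the lowest weighted training error."""
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, t, polarity) != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Return a list of (alpha, threshold, polarity) weak learners."""
    n = len(xs)
    weights = [1 / n] * n                      # start with equal weights
    ensemble = []
    for _ in range(rounds):
        err, t, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, polarity))
        # Raise the weights of misclassified examples, lower the others.
        weights = [w * math.exp(-alpha * y * stump_predict(x, t, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all weak learners."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])
```

No single stump can separate this toy data, but the weighted vote of three stumps can, which is the point of combining weak learners.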
Template Matching Methods
• Store a template
• Predefined: based on edges or regions
• Deformable: based on facial contours
(e.g., Snakes)
• Templates are hand-coded (not
learned)
• Use correlation to locate faces
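A minimal sketch of locating a template by correlation, assuming grayscale images stored as lists of rows. Normalized cross-correlation is used here as one common choice. The slides do not say which correlation measure the original methods used, and the tiny template is made up.

```python
import math

def ncc(patch, template):
    """Normalized cross-correlation of two equal-size 2-D lists, in [-1, 1]."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mt) ** 2 for b in t))
    return num / den if den else 0.0   # flat patches carry no pattern

def best_match(image, template):
    """Slide the template over the image; return (score, x, y) of the best fit."""
    th, tw = len(template), len(template[0])
    best = (-2.0, 0, 0)
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, x, y)
    return best

template = [[9, 1],
            [1, 9]]
image = [[0, 0, 0, 0],
         [0, 0, 9, 1],
         [0, 0, 1, 9],
         [0, 0, 0, 0]]
score, x, y = best_match(image, template)
print(score, x, y)
```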
Template-Based Methods: Summary
• Pros:
• Simple
• Cons:
• Templates need to be initialized near the face images
• Difficult to enumerate templates for different poses (similar to
knowledge-based methods)
Appearance-Based Methods: Classifiers
• Neural network
• Multilayer Perceptrons
• Principal Component Analysis (PCA), Factor Analysis
• Support vector machine (SVM)
• Mixture of PCA, Mixture of factor analyzers
• Distribution-based method
• Naïve Bayes classifier
• Hidden Markov model
• Sparse network of winnows (SNoW)
• Kullback relative information
• Inductive learning: C4.5
• AdaBoost
• …
Face and Non-Face Exemplars
• Positive examples:
• Get as much variation as possible
• Manually crop and normalize each face image into a standard size (e.g., 19×19)
• Creating virtual examples [Poggio 94]
• Negative examples: Fuzzy idea
• Any images that do not contain faces
• A large image subspace
• Bootstrapping [Sung and Poggio 94]
Exhaustive Search
• Across scales
• Across locations
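The exhaustive search can be sketched as a generator of candidate windows. The base window size, scale step, and stride fraction are assumed parameters chosen for illustration. Each yielded window would be passed to the face/non-face classifier.

```python
def sliding_windows(img_w, img_h, base=24, scale_step=1.25, stride_frac=0.1):
    """Yield (x, y, size) for square windows over all scales and locations."""
    size = base
    while size <= min(img_w, img_h):
        stride = max(1, int(size * stride_frac))   # larger windows move in larger steps
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                yield x, y, size
        # Always grow by at least one pixel so the loop terminates.
        size = max(size + 1, int(round(size * scale_step)))

windows = list(sliding_windows(48, 48, base=24, scale_step=2.0, stride_frac=0.5))
print(len(windows), windows[-1])
```

Even for small images the number of windows grows quickly, which is why fast rejection of non-face windows matters.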
What is Face Recognition?
• A set of two tasks:
• Face Identification: Given a face image that belongs to a person in a database,
tell whose image it is.
• Face Verification: Given a face image that might not belong to the database,
verify whether it is from the person it is claimed to be in the database.
Difference between Face Detection and
Recognition
• Detection – two-class classification
• Face vs. Non-face
• Recognition – multi-class classification
• One person vs. all the others
Applications of Face Recognition
• Access Control
• Face Databases
• Face ID
• HCI - Human Computer
Interaction
• Law Enforcement
Applications of Face Recognition
• Multimedia Management
• Security
• Smart Cards
• Surveillance
• Others
Different Approaches
• Features:
• Features from global appearance
• Principal Component Analysis (PCA)
• Independent Component Analysis (ICA)
• Features from local regions
• Local Feature Analysis (LFA)
• Gabor Wavelet
• Similarity Measure
• Euclidean Distance
• Neural Networks
• Elastic Graph Matching
• Template Matching
• …
Face Recognition using PCA (Eigenfaces) and LDA (Fisherfaces)
Slides adapted from Pradeep Buddharaju
Principal Component Analysis
• An N × N pixel image of a face,
represented as a vector, occupies a single
point in $N^2$-dimensional image space.
• Images of faces, being similar in overall
configuration, will not be randomly
distributed in this huge image space.
• Therefore, they can be described by a
low-dimensional subspace.
• Main idea of PCA for faces:
• To find vectors that best account for variation
of face images in entire image space.
• These vectors are called eigenvectors.
• Construct a face space and project the images
into this face space (eigenfaces).
Image Representation
• A training set of M images of size N × N is
represented by vectors of size $N^2$:
$x_1, x_2, x_3, \dots, x_M$
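Example
$$\begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 4 & 5 & 1 \end{bmatrix}_{3 \times 3} \;\rightarrow\; \begin{bmatrix} 1 & 2 & 3 & 3 & 1 & 2 & 4 & 5 & 1 \end{bmatrix}^T_{9 \times 1}$$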
Average Image and Difference Images
• The average image of the training set is defined by
$m = \frac{1}{M} \sum_{i=1}^{M} x_i$
• Each face differs from the average by the vector
$r_i = x_i - m$
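A small sketch of the representation steps so far: flattening each image row by row, averaging, and subtracting the average. The three 2×2 "images" are made-up values and the function names are illustrative.

```python
def flatten(image):
    """Stack the rows of a 2-D image into one vector (row-major order)."""
    return [pixel for row in image for pixel in row]

def mean_vector(vectors):
    """Average image m: element-wise mean of the training vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def difference_vectors(vectors):
    """Difference images r_i = x_i - m."""
    m = mean_vector(vectors)
    return [[x - mu for x, mu in zip(v, m)] for v in vectors]

images = [
    [[1, 2], [3, 4]],
    [[3, 2], [1, 0]],
    [[2, 5], [2, 2]],
]
xs = [flatten(img) for img in images]
m = mean_vector(xs)
rs = difference_vectors(xs)
print(m, rs)
```

By construction the difference vectors sum to zero, so at most M − 1 of them are linearly independent.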
Covariance Matrix
• The covariance matrix is constructed as
$C = AA^T$, where $A = [r_1, \dots, r_M]$
The size of this matrix is $N^2 \times N^2$
• Finding the eigenvectors of an $N^2 \times N^2$ matrix is intractable for typical image sizes. Hence, use the matrix $A^TA$ of
size $M \times M$ and find the eigenvectors of this small matrix.
Eigenvalues and Eigenvectors - Definition
• If v is a nonzero vector and λ is a number such that
Av = λv, then
v is said to be an eigenvector of A with eigenvalue λ.
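Example
$$\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = 3 \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$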
Eigenvectors of Covariance Matrix
• Consider the eigenvectors $v_i$ of $A^TA$ such that
$A^TA v_i = \lambda_i v_i$
• Premultiplying both sides by A, we have
$AA^T(Av_i) = \lambda_i (Av_i)$
Face Space
• The eigenvectors of the covariance matrix are
$u_i = Av_i$
• The $u_i$ resemble ghostly facial images, hence the name Eigenfaces
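The small-matrix trick can be checked numerically. This sketch uses a made-up 4×3 matrix A (three difference images of four pixels each) and plain power iteration as a stand-in eigen-solver, which is an assumption for illustration. A real system would call a library eigendecomposition routine.

```python
def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def top_eigenpair(s, iters=200):
    """Power iteration for the dominant eigenpair of a symmetric matrix.

    Assumes the start vector is not orthogonal to the dominant eigenvector.
    """
    v = [1.0] + [0.0] * (len(s) - 1)
    for _ in range(iters):
        w = matvec(s, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, matvec(s, v)))
    return lam, v

# Columns of A are difference images r_i (they sum to zero, as they must).
A = [[2.0, -1.0, -1.0],
     [1.0, 0.0, -1.0],
     [0.0, 1.0, -1.0],
     [-1.0, 1.0, 0.0]]
At = transpose(A)
small = [matvec(At, r) for r in At]        # A^T A, size M x M (3 x 3)
lam, v = top_eigenpair(small)
u = matvec(A, v)                           # candidate eigenface u = A v
check = matvec(A, matvec(At, u))           # (A A^T) u, without forming A A^T
print(lam, u, check)
```

The vector `check` equals `lam * u` up to rounding, so u is an eigenvector of $AA^T$ even though only the 3×3 matrix was decomposed.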
Projection into Face Space
• A face image can be projected into this face space by
$p_k = U^T(x_k - m)$, where $k = 1, \dots, M$ and the columns of U are the eigenfaces $u_i$
Recognition
• The test image x is projected into the face space to
obtain a vector p:
$p = U^T(x - m)$
• The distance of p to each face class is defined by
$\epsilon_k^2 = \|p - p_k\|^2$; $k = 1, \dots, M$
• A distance threshold $\theta_c$ is half the largest distance
between any two face images:
$\theta_c = \frac{1}{2} \max_{j,k} \|p_j - p_k\|$; $j, k = 1, \dots, M$
Recognition
• Find the distance $\epsilon$ between the original image x and its reconstruction
from the eigenface space, $x_f$:
$\epsilon^2 = \|x - x_f\|^2$, where $x_f = Up + m$ and $p = U^T(x - m)$
• Recognition process:
• IF $\epsilon \ge \theta_c$
then the input image is not a face image;
• IF $\epsilon < \theta_c$ AND $\epsilon_k \ge \theta_c$ for all k
then the input image contains an unknown face;
• IF $\epsilon < \theta_c$ AND $\epsilon_{k^*} = \min_k \{\epsilon_k\} < \theta_c$
then the input image contains the face of individual $k^*$
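The three-way decision rule can be written as a short function. This sketch assumes the distances have already been computed. The function name and the returned labels are illustrative.

```python
def classify(eps, eps_k, theta):
    """Apply the eigenface decision rule.

    eps   -- distance between the image and its face-space reconstruction
    eps_k -- distances from the projected image to each known face class
    theta -- distance threshold
    """
    if eps >= theta:
        return ("not a face", None)
    k_star = min(range(len(eps_k)), key=eps_k.__getitem__)
    if eps_k[k_star] >= theta:
        return ("unknown face", None)
    return ("known face", k_star)

print(classify(5.0, [1.0, 2.0], 3.0))
print(classify(1.0, [4.0, 3.5], 3.0))
print(classify(1.0, [4.0, 0.5, 2.0], 3.0))
```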
Limitations of Eigenfaces Approach
• Variations in lighting conditions
• Different lighting conditions for enrolment
and query.
• Bright light causing image saturation.
• Differences in pose – Head orientation
- 2D feature distances appear to distort.
• Expression
- Change in feature location and shape.
Linear Discriminant Analysis
• PCA does not use class information
• PCA projections are optimal for reconstruction from a
low-dimensional basis, but they may not be optimal from a
discrimination standpoint.
• LDA is an enhancement to PCA
• Constructs a discriminant subspace that minimizes the
scatter between images of the same class and maximizes the
scatter between images of different classes
Mean Images
• Let $X_1, X_2, \dots, X_c$ be the face classes in the database, and let each face
class $X_i$, $i = 1, 2, \dots, c$, have k facial images $x_j$, $j = 1, 2, \dots, k$.
• We compute the mean image $m_i$ of each class $X_i$ as:
$m_i = \frac{1}{k} \sum_{j=1}^{k} x_j$
• Now, the mean image m of all the classes in the database can be
calculated as:
$m = \frac{1}{c} \sum_{i=1}^{c} m_i$
Scatter Matrices
• We calculate the within-class scatter matrix as:
$S_W = \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - m_i)(x_k - m_i)^T$
• We calculate the between-class scatter matrix as:
$S_B = \sum_{i=1}^{c} N_i (m_i - m)(m_i - m)^T$
where $N_i$ is the number of images in class $X_i$
Multiple Discriminant Analysis
We find the projection directions as the matrix $\hat{W}$ that maximizes the criterion J(W):
$\hat{W} = \arg\max_W J(W) = \arg\max_W \frac{|W^T S_B W|}{|W^T S_W W|}$
This is a generalized eigenvalue problem where the
columns of W are given by the vectors $w_i$ that solve
$S_B w_i = \lambda_i S_W w_i$
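A toy illustration of the scatter matrices and the generalized eigenvalue problem, assuming two classes of made-up 2-D points in place of face images. For two classes the solution has the well-known closed form w ∝ $S_W^{-1}(m_1 - m_2)$, which the sketch uses instead of a general eigen-solver. All names are hypothetical.

```python
def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def outer(d):
    return [[d[0] * d[0], d[0] * d[1]], [d[1] * d[0], d[1] * d[1]]]

def add(a, b, scale=1.0):
    return [[a[i][j] + scale * b[i][j] for j in range(2)] for i in range(2)]

def scatter_matrices(classes):
    """Return (S_W, S_B, class_means) for a list of classes of 2-D points."""
    means = [mean(c) for c in classes]
    m = mean(means)                      # mean of the class means
    s_w = [[0.0, 0.0], [0.0, 0.0]]
    s_b = [[0.0, 0.0], [0.0, 0.0]]
    for points, mi in zip(classes, means):
        for x in points:
            s_w = add(s_w, outer([x[0] - mi[0], x[1] - mi[1]]))
        s_b = add(s_b, outer([mi[0] - m[0], mi[1] - m[1]]), scale=len(points))
    return s_w, s_b, means

def fisher_direction(s_w, m1, m2):
    """Two-class case: w is proportional to S_W^{-1} (m1 - m2)."""
    (a, b), (c, d) = s_w
    det = a * d - b * c
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    return [(d * dx - b * dy) / det, (-c * dx + a * dy) / det]

classes = [[(1, 1), (2, 3), (3, 2)], [(6, 5), (7, 7), (8, 6)]]
s_w, s_b, (m1, m2) = scatter_matrices(classes)
w = fisher_direction(s_w, m1, m2)
print(s_w, s_b, w)
```

Multiplying out shows that $S_B w$ and $S_W w$ are parallel, so w satisfies $S_B w = \lambda S_W w$, and projecting the points onto w keeps the two classes apart.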
Fisherface Projection
• We find the product of $S_W^{-1}$ and $S_B$ and then compute the eigenvectors of this
product ($S_W^{-1} S_B$) - AFTER REDUCING THE DIMENSION OF THE FEATURE SPACE.
• Use the same technique as the Eigenfaces approach to reduce the dimensionality of
the scatter matrices before computing eigenvectors.
• Form a matrix W that represents all eigenvectors of $S_W^{-1} S_B$ by placing each
eigenvector $w_i$ as a column in W.
• Each face image $x_j \in X_i$ can be projected into this face space by the operation
$p_j = W^T(x_j - m)$
Testing
• Same as Eigenfaces Approach