Face Detection
Face Detection Using Haar Cascades
Abstract
In recent years, a great deal of effort has been devoted to face
detection. The human face contains important features that can be used by
vision-based automated systems to identify and recognize
individuals. Face location, the first step of such vision-based automated
systems, finds the face area in the input image. Locating the face accurately
is still a challenging task. The Viola-Jones framework has been
widely used by researchers to detect the location of faces and other
objects in a given image. Face detection classifiers are shared by public
communities such as OpenCV, and an evaluation of these classifiers can
help researchers choose the best classifier for their particular needs. This
work focuses on face detection using Haar cascades.
1) Introduction
Face detection is a computer vision technology that involves
identifying and locating human faces in digital images or video frames. It
plays a crucial role in applications ranging from photography and
video surveillance to facial recognition and augmented reality. The
primary goal of face detection is to locate faces and extract their facial features
within an image or a video stream. Although recognizing an individual
by the face is an easy task for humans, it is a challenge for vision-based
automated systems. Face recognition has been an active research area involving several
disciplines such as image processing, neural networks, statistics, pattern
recognition, anthropometry and computer vision. Vision-based automated
systems can apply facial recognition and facial identification in numerous
commercial applications, such as biometric authentication, human-computer
interaction, surveillance, games and multimedia entertainment.
Unlike many other biometrics, face recognition is non-invasive and does not
need physical contact between the individual and the system, making it a very
acceptable biometric. Vision-based automated systems applied to face
recognition can be divided into 4 steps: face detection, image pre-processing,
feature extraction and matching [1]. Face detection is a hard task, since faces
form a similar class of objects whose features, such as eyes, mouth,
nose and chin, generally share the same geometrical configuration. The
captured image of the face may be pre-processed to overcome
illumination variations [2]. Feature extraction is the process in which a
geometrical or vectorial model is obtained by gathering important
characteristics present in the face. Feature extraction can be divided
into 3 approaches: holistic, feature-based and hybrid. Principal
component analysis [3] [4], Fisher discriminant analysis [5] [6] and
support vector machines [7] are examples of the holistic approach. The
feature-based approach relies on the geometrical relations between the facial
features; for example, [8] applied an active shape model to gather important
information present in some of the facial features. Statistical classifiers such
as the Euclidean distance [9], the Bayes classifier [10], the Mahalanobis distance [11] and
neural classifiers [12] can be used to compare the characteristic vector
with other classes (individuals) in the matching step. Face detection has become
considerably faster thanks to Haar-like features and the Viola-Jones object
detection framework. Implementations of this framework, such as OpenCV, provide
different face classifiers created by authors who used different datasets for
training, and the performance and reliability of these classifiers vary considerably.
2) RELATED WORK AND SOME BASICS
Face detection
Face detection is a technology that locates people's faces in pictures
or videos. It has become increasingly important for security purposes,
including legal requirements and global security. Various
algorithms are used for face detection, such as the Haar cascade and Local
Binary Pattern (LBP) algorithms. These algorithms use different
techniques to extract facial features and to classify image regions as faces
or non-faces. In the context of the COVID-19 pandemic, there has been a
focus on detecting masked faces, which presents additional challenges for
face detection algorithms. Studies have compared the performance of
different algorithms and found that the Haar cascade classifier
outperforms the LBP classifier [24] [25]. However, there is still a lack of
evidence on how well existing face detection algorithms perform
on masked faces [26] [27].
Haar-like features
In 1909 the Hungarian mathematician Alfred Haar introduced the
concept of Haar wavelets, a sequence of rescaled “square-shaped”
functions that together form a wavelet family or basis. Viola
and Jones adapted the idea of using Haar wavelets and developed the
so-called Haar-like features.
Haar-like features have been used in various research fields. They have
been applied in frameworks for automatic gun detection in CCTV images [29]
and in the detection of human faces for tracking purposes on unmanned
aerial vehicles (UAVs) [30]. They have also been used to detect wind
turbine blade cracks in images [31]. In advertising, Haar-like features have
been employed in augmented reality (AR) technology for marker detection and
car specification presentation [32]. Furthermore, Haar-like features have been
used in face detection systems for biometric research, face recognition
and identification [33]. These applications demonstrate the versatility and
effectiveness of Haar-like features in various domains.
A. Yale face database
The Yale face database [13] contains facial images of 15 individuals, with
11 pictures per person taken under different illumination conditions. The
subjects also show different facial expressions and configurations (e.g., with
glasses, sad, sleepy, surprised, wink). The size of each image is 320x243 pixels.
B. FEI face database
The FEI face database [14] is a Brazilian database containing 14 images
for each of 200 individuals, a total of 2800 images. The images are
in color, taken at different rotations, with neutral, smiling and non-smiling
expressions. We used 2 frontal images per individual, one smiling and one
non-smiling, for a total of 400 images. The original
size of each image is 640x480 pixels.
C. Viola-Jones face detectors
Motivated by the challenge of face detection, [15] proposed an object
detection framework using Haar-like features, which has been widely used
in other works not only for face detection but also for locating other objects.
Thanks to the Open Computer Vision Library (OpenCV) implementation [16], the
general object detector framework has become popular and has motivated the
community to generate their own object classifiers. These classifiers apply
Haar-like features over the image. Only those image
regions, called sub-windows, that pass through all the stages of the
detector are considered to contain the target object.
In MATLAB's Computer Vision Toolbox, the vision.CascadeObjectDetector System
object uses the Viola-Jones algorithm to detect people's faces, noses, eyes,
mouths, or upper bodies. A custom classifier can also be trained with
trainCascadeObjectDetector, for example from objects labeled in the Image
Labeler app, and used with this System object. For details on how the detector
works, see "Get Started with Cascade Object Detector" in the MATLAB documentation.
To detect faces, facial features or the upper body in an image:
1) Create the vision.CascadeObjectDetector object and set its properties.
2) Call the object with arguments, as if it were a function.
The typical cascade classifier is the very successful face detection method of
Viola and Jones [22] [23]. More generally, many detection tasks for objects
with a rigid structure, not only faces, can be addressed with this method. The
cascade classifier is tree-based (Viola and Jones describe it as a degenerate
decision tree), and Viola and Jones used Haar-like features in it for human face
detection. The default Haar-like features are shown in Figure 1; they can be
used at any scale in the boosted classifier and can be computed rapidly from
the integral image of the image being searched.
Fig. 3 shows the detection cascade schematic with N stages. The
detection cascade is designed to eliminate a large number of negative
examples with little processing.
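The effect of chaining stages can be estimated with simple arithmetic: a window counts as a detection only if it passes every stage, so the overall detection rate and false-positive rate are the products of the per-stage rates (each measured on the windows that reached that stage). The sketch below assumes hypothetical values of 10 stages, each keeping 99% of faces and passing 30% of non-faces; these numbers are illustrative and are not taken from the classifiers discussed here.

```matlab
% Overall rates of an N-stage cascade (hypothetical per-stage values).
numStages = 10;
stageDetectionRate = 0.99 * ones(1, numStages);     % fraction of faces each stage keeps
stageFalsePositiveRate = 0.30 * ones(1, numStages); % fraction of non-faces each stage lets through

% A window is reported only if it passes every stage, so the overall
% rates are the products of the per-stage (conditional) rates.
overallDetection = prod(stageDetectionRate);
overallFalsePositive = prod(stageFalsePositiveRate);

fprintf('Overall detection rate: %.4f\n', overallDetection);
fprintf('Overall false-positive rate: %.2e\n', overallFalsePositive);
```

Under these assumptions each stage is a weak filter on its own, yet ten of them together keep about 90% of faces while letting through only about 6 non-face windows per million.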
D. Landmarks
Landmark detection is important not only to generate a geometric face
model; it can also be used for face detection [17]. [18] compared
different algorithms for facial landmark localization and proposed a set of
tools that ease the integration of other face databases. [19] proposed a
technique for face segmentation using an Active Shape Model based on
border landmarks of the face. [20] used a facial geometrical model based
on the distance between the eyes to estimate the position of other landmarks
for face segmentation, shown in Fig. 4.
            Fig. 4 Geometrical model of the face (Liu, Z. et al., 2008)
The FGnet project has published the locations of 22 facial feature points for
each face in the AR face database [21]. We also manually marked the same 22
facial feature points on the Yale and FEI face database images used in this
work. Fig. 5 shows an image with the marked facial points. In total,
565 images were used, and each of the 22 landmarks was given a score
(see Table "Landmarks and Scores"). The scores were either 1
or 2, and the landmarks located on the contour of the face were given the
higher score. The application of the scores will be explained in the next
section.
                 Fig. 5 Example of landmarks marked manually
3) Algorithm and code
The Viola-Jones algorithm is named after the two computer vision researchers
who proposed it in 2001, Paul Viola and Michael Jones, in their
paper “Rapid Object Detection using a Boosted Cascade of Simple
Features”. Although it is an old framework, Viola-Jones is still quite
powerful, and it has proven especially successful in
real-time face detection. The algorithm is slow to train but can
detect faces in real time with impressive speed.
Given an image (the algorithm works on grayscale images), the algorithm
looks at many smaller subregions and tries to find a face by looking for
specific features in each subregion. It needs to check many different
positions and scales because an image can contain many faces of various
sizes. Viola and Jones used Haar-like features to detect faces in this
algorithm [28].
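To give a sense of how many positions and scales are involved, the sketch below counts the sub-windows a sliding 24×24 detector would visit in one image. The image size (320×240), the scale factor of 1.25 and the one-pixel step at every scale are assumptions chosen for illustration; practical detectors usually enlarge the step along with the window, which reduces the count.

```matlab
% Count the sub-windows examined by a sliding-window detector.
% Hypothetical settings: 320x240 image, 24x24 base window,
% scale factor 1.25, one-pixel step at every scale.
imgW = 320;
imgH = 240;
baseSize = 24;
scaleFactor = 1.25;
scale = 1;
windowsPerScale = zeros(1, 0);
while round(baseSize * scale) <= min(imgW, imgH)
    win = round(baseSize * scale);              % current window side length
    % every top-left position where the window still fits inside the image
    numPositions = (imgW - win + 1) * (imgH - win + 1);
    windowsPerScale(end + 1) = numPositions;
    scale = scale * scaleFactor;                % enlarge the window for the next pass
end
fprintf('Scales examined: %d\n', numel(windowsPerScale));
fprintf('Sub-windows at the smallest scale: %d\n', windowsPerScale(1));
fprintf('Total sub-windows: %d\n', sum(windowsPerScale));
```

With these settings the detector visits 11 window sizes and 64,449 positions at the smallest size alone, which is why each individual test must be cheap.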
The Viola-Jones algorithm has four main steps, which we discuss in
the sections that follow:
• Selecting Haar-like features
• Creating an integral image
• Running AdaBoost training
• Creating classifier cascades
Haar-like features are digital image features used in object recognition.
All human faces share some universal properties: for example, the eye
region is darker than its neighbouring pixels, and the nose region is
brighter than the eye region.
A simple way to find out which region is lighter or darker is to sum up
the pixel values of both regions and compare them. The sum of pixel
values in the darker region will be smaller than the sum of pixels in the
lighter region. If one side is lighter than the other, it may be the edge of an
eyebrow; sometimes the middle portion may be brighter than the
surrounding boxes, which can be interpreted as a nose. This can be
accomplished using Haar-like features, and with their help we can
interpret the different parts of a face.
There are 3 types of Haar-like features that Viola and Jones identified in
their research:
I. Edge features
II. Line features
III. Four-sided features
Edge features and line features are useful for detecting edges and lines,
respectively. The four-sided features are used for finding diagonal
features.
The value of a feature is a single number: the sum of pixel
values in the black area minus the sum of pixel values in the white area.
When the black and white areas cover the same number of pixels, the value is
zero for a plain surface in which all the pixels have the same value, and
such a surface therefore provides no useful information.
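A two-rectangle edge feature can be evaluated by summing pixels directly, as described above. The 6×6 patches and the placement of the black rectangle on the top half are hypothetical; the sketch shows that a plain surface gives zero, while a dark band above a bright band gives a value of large magnitude (its sign depends only on which rectangle is called black).

```matlab
% Two-rectangle (edge) Haar-like feature evaluated by direct summation.
% Hypothetical 6x6 grayscale patches (pixel values 0-255).
uniformPatch = 120 * ones(6, 6);                   % plain surface
eyePatch = [50 * ones(3, 6); 200 * ones(3, 6)];    % dark band (eyes) above a bright band (cheeks)

% Assumed feature layout: black rectangle = top three rows,
% white rectangle = bottom three rows (equal areas).
blackRows = 1:3;
whiteRows = 4:6;

% feature value = sum(black) - sum(white)
edgeValueUniform = sum(sum(uniformPatch(blackRows, :))) - sum(sum(uniformPatch(whiteRows, :)));
edgeValueEye = sum(sum(eyePatch(blackRows, :))) - sum(sum(eyePatch(whiteRows, :)));

fprintf('Plain surface: %d\n', edgeValueUniform);
fprintf('Eye-like patch: %d\n', edgeValueEye);
```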
Since faces have complex shapes with darker and brighter spots, a
Haar-like feature gives a large value when the areas in the black
and white rectangles are very different, and this value carries useful
information about the image. To be useful, a Haar-like feature therefore needs
to give a large value (in absolute terms) at the right location.
Some features are known to perform very well in detecting human faces.
For example, a line feature applied to the bridge of the nose, which is
brighter than the eye regions on either side, gives a strong response. Many
such features are combined to decide whether an image region contains a human face.
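The nose-bridge example can be sketched with a three-rectangle line feature: two dark outer rectangles (the eye regions) and a bright middle rectangle (the bridge). Because the outer rectangles together cover twice the area of the middle one, this sketch weights the middle sum by 2 so that a plain surface still gives zero; that weighting, the patch size and the pixel values are assumptions for illustration.

```matlab
% Three-rectangle (line) Haar-like feature for the bridge of the nose.
% Hypothetical 4x9 patches: dark eye regions left and right, bright bridge in the middle.
bridgePatch = [60 * ones(4, 3), 180 * ones(4, 3), 60 * ones(4, 3)];
flatPatch = 100 * ones(4, 9);

outerCols = [1:3, 7:9];   % black rectangles (eye regions)
middleCols = 4:6;         % white rectangle (nose bridge)

% Assumed weighting: the middle area is half the outer area, so its sum is
% doubled to keep the value at zero on a plain surface.
lineValueBridge = sum(sum(bridgePatch(:, outerCols))) - 2 * sum(sum(bridgePatch(:, middleCols)));
lineValueFlat = sum(sum(flatPatch(:, outerCols))) - 2 * sum(sum(flatPatch(:, middleCols)));

fprintf('Nose-bridge patch: %d\n', lineValueBridge);
fprintf('Plain surface: %d\n', lineValueFlat);
```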
Integral Images
To calculate the value of each feature, we need to perform computations on
all the pixels inside that feature. In practice, these calculations
can be very intensive, since large features cover a large number of pixels.
The integral image allows us to perform these calculations quickly, so we can
determine whether a feature or several features fit the criteria.
An integral image (also known as a summed-area table) is the name of
both a data structure and an algorithm used to obtain this data structure. It
is used as a quick and efficient way to calculate the sum of pixel values in
an image or rectangular part of an image.
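A minimal sketch of the integral image, using a small test matrix in place of a real grayscale image: each entry holds the sum of all pixels above and to the left of it (inclusive), and the sum over any rectangle then needs only four lookups, whatever its size. A zero row and column are padded on so the corner lookups need no special cases; the rectangle coordinates are arbitrary choices for the example.

```matlab
% Integral image (summed-area table) and constant-time rectangle sums.
img = magic(6);                        % small test matrix standing in for a grayscale image

% Padded integral image: ii(r + 1, c + 1) = sum(sum(img(1:r, 1:c))),
% with an extra zero row and column so row 0 / column 0 lookups are valid.
ii = zeros(size(img) + 1);
ii(2:end, 2:end) = cumsum(cumsum(img, 1), 2);

% Sum over rows r1..r2 and columns c1..c2 with four lookups.
r1 = 2; r2 = 4; c1 = 3; c2 = 5;
rectSum = ii(r2 + 1, c2 + 1) - ii(r1, c2 + 1) - ii(r2 + 1, c1) + ii(r1, c1);

% Direct summation for comparison.
directSum = sum(sum(img(r1:r2, c1:c2)));
fprintf('Four-lookup sum: %d, direct sum: %d\n', rectSum, directSum);
```

The integral image is built once per image; afterwards every Haar-like feature, at any position or scale, costs only a handful of lookups per rectangle.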
Next, we use a machine learning algorithm known as AdaBoost. But
why do we need it?
The number of features in the 24×24 detector window is
over 160,000, but only a few of these features are important for identifying
a face. So we use the AdaBoost algorithm to select the best features among
those 160,000.
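The size of the feature set can be checked by enumeration. The sketch below counts every position and integer scaling of the five basic rectangle shapes (two-rectangle horizontal and vertical, three-rectangle horizontal and vertical, and four-rectangle) inside a 24×24 window, assuming a one-pixel step and no rotated features; under these assumptions it gives 162,336 features.

```matlab
% Count every placement of the five basic Haar-like feature shapes
% inside a 24x24 detector window (assumptions: integer scaling,
% one-pixel step, no rotated features).
W = 24;
H = 24;
% Base shapes as [width height] in pixels: two-rectangle (horizontal, vertical),
% three-rectangle (horizontal, vertical), four-rectangle.
baseShapes = [2 1; 1 2; 3 1; 1 3; 2 2];
totalFeatures = 0;
for k = 1:size(baseShapes, 1)
    bw = baseShapes(k, 1);
    bh = baseShapes(k, 2);
    for w = bw:bw:W          % widths that are multiples of the base width
        for h = bh:bh:H      % heights that are multiples of the base height
            % number of top-left positions where a w-by-h feature fits
            totalFeatures = totalFeatures + (W - w + 1) * (H - h + 1);
        end
    end
end
fprintf('Features in a %dx%d window: %d\n', W, H, totalFeatures);
```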
In the Viola-Jones algorithm, each Haar-like feature represents a weak
learner. To decide the type and size of a feature that goes into the final
classifier, AdaBoost checks the performance of all classifiers that you
supply to it.
To calculate the performance of a classifier, you evaluate it on all
subregions of all the images used for training. Some subregions will
produce a strong response in the classifier. Those will be classified as
positives, meaning the classifier thinks it contains a human face.
Subregions that do not produce a strong response are, in the classifier's
opinion, free of human faces; they are classified as negatives.
The classifiers that performed well are given higher importance or weight.
The final result is a strong classifier, also called a boosted classifier, that
contains the best performing weak classifiers.
So when we train AdaBoost to identify important features,
we feed it training data, and it learns from this data to make predictions.
Ultimately, the algorithm sets a threshold that determines whether
something can be classified as a useful feature or not.
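The boosting loop can be sketched with decision stumps, where each weak learner thresholds a single feature value. The sketch below uses the common discrete AdaBoost formulation with labels +1 (face) and -1 (non-face); Viola and Jones use 0/1 labels and slightly different weight-update bookkeeping. The six one-dimensional samples are hypothetical and chosen so that no single threshold separates them, while three boosted stumps do.

```matlab
% Discrete AdaBoost with decision stumps (illustrative sketch).
% Hypothetical training data: one feature value per sample (in practice,
% one column per Haar-like feature) and labels +1 = face, -1 = non-face.
X = [1; 2; 3; 4; 5; 6];
y = [1; 1; -1; -1; 1; 1];
[n, m] = size(X);
T = 3;                               % number of boosting rounds
w = ones(n, 1) / n;                  % start with uniform sample weights
stumpFeat = zeros(1, T);
stumpThr = zeros(1, T);
stumpPol = zeros(1, T);
alphas = zeros(1, T);

for t = 1:T
    bestErr = Inf;
    for j = 1:m
        v = unique(X(:, j));
        % candidate thresholds: one below the minimum, midpoints between
        % consecutive sorted values, and one above the maximum
        thrList = [v(1) - 1; (v(1:end-1) + v(2:end)) / 2; v(end) + 1];
        for thr = thrList'
            for p = [1 -1]
                % weak learner: predict +1 when p*f < p*thr, otherwise -1
                pred = ones(n, 1);
                pred(p * X(:, j) >= p * thr) = -1;
                err = sum(w(pred ~= y));     % weighted training error
                if err < bestErr
                    bestErr = err;
                    bestPred = pred;
                    stumpFeat(t) = j;
                    stumpThr(t) = thr;
                    stumpPol(t) = p;
                end
            end
        end
    end
    % vote weight of this weak learner: larger when its error is smaller
    alphas(t) = 0.5 * log((1 - bestErr) / max(bestErr, eps));
    % increase the weights of misclassified samples, then renormalise
    w = w .* exp(-alphas(t) * y .* bestPred);
    w = w / sum(w);
end

% strong (boosted) classifier: sign of the weighted vote of all stumps
score = zeros(n, 1);
for t = 1:T
    h = ones(n, 1);
    h(stumpPol(t) * X(:, stumpFeat(t)) >= stumpPol(t) * stumpThr(t)) = -1;
    score = score + alphas(t) * h;
end
strongPred = sign(score);
disp([stumpThr; stumpPol; alphas]);
disp(strongPred');
```

After three rounds the weighted vote classifies all six toy samples correctly, although no single stump can; the misclassified samples gain weight in each round, which pushes the next stump to focus on them.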
Cascading Classifiers
Suppose AdaBoost finally selects, say, around 2,500 of the best features;
it is still time-consuming to calculate these features for every
region. We have a 24×24 window that we slide over the input image,
and we need to find out whether any of those regions contains a face. The job of
the cascade is to discard non-faces quickly and avoid wasting precious
time and computation, thus achieving the speed necessary for real-time
face detection.
We set up a cascaded system in which we divide the process of
identifying a face into multiple stages. The first stage is a
classifier made up of our best features; in other words, in the
first stage the subregion is tested only against the best features, such as the
feature that identifies the nose bridge or the one that identifies the eyes.
The later stages contain all the remaining features.
When an image subregion enters the cascade, it is evaluated by the first
stage. If that stage evaluates the subregion as positive, meaning that it
thinks it is a face, the output of the stage is "maybe".
When a subregion gets a "maybe", it is sent to the next stage of the cascade,
and the process continues in this way until the last stage is reached.
If all stages approve the subregion, it is finally classified as a human face
and is presented to the user as a detection.
How does this increase speed? If the first
stage gives a negative evaluation, the subregion is immediately
discarded as not containing a human face. If it passes the first stage but
fails the second stage, it is discarded as well. In general, a subregion can be
discarded at any stage of the cascade, so most non-face subregions are
typically rejected after only a few cheap stages.
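The early-rejection logic can be sketched as a loop that stops at the first failing stage. The stage scores and thresholds below are hypothetical; in a real detector a stage's score is computed (from its boosted features) only when the sub-window reaches that stage, so every skipped stage is work saved.

```matlab
% Early rejection in a classifier cascade (hypothetical numbers).
% Rows = sub-windows, columns = stages; in a real detector each score would
% be the boosted sum of that stage's features, computed only when needed.
stageScores = [0.9 0.8 0.7;    % face-like window
               0.2 0.9 0.9;    % rejected by stage 1
               0.7 0.1 0.9;    % rejected by stage 2
               0.1 0.1 0.1;    % rejected by stage 1
               0.8 0.9 0.3;    % rejected by stage 3
               0.6 0.7 0.8];   % face-like window
stageThresholds = [0.5 0.5 0.5];   % hypothetical per-stage thresholds
numWindows = size(stageScores, 1);
numStages = numel(stageThresholds);
isFace = false(numWindows, 1);
stagesEvaluated = 0;
for k = 1:numWindows
    passed = true;
    for s = 1:numStages
        stagesEvaluated = stagesEvaluated + 1;
        if stageScores(k, s) < stageThresholds(s)
            passed = false;      % "no": discard the sub-window immediately
            break;
        end
        % "maybe": continue to the next stage
    end
    isFace(k) = passed;
end
fprintf('Detections: %s\n', mat2str(find(isFace)'));
fprintf('Stages evaluated: %d of %d\n', stagesEvaluated, numWindows * numStages);
```

Here only 13 of the 18 possible stage evaluations are performed, and sub-windows 1 and 6 are reported as faces.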
Code
% Load the image
the_Image = imread('image_face.jpg');
% size returns rows (height), columns (width) and number of colour channels
[~, width, ~] = size(the_Image);
% Downscale large images to a width of 320 pixels to speed up detection
if width > 320
    the_Image = imresize(the_Image, [NaN 320]);
end
% Create a cascade detector object (default model: frontal faces)
faceDetector = vision.CascadeObjectDetector();
% Find the bounding boxes [x y width height] that enclose the faces in the image
face_Location = step(faceDetector, the_Image);
% Draw the returned bounding boxes around the detected faces (if any)
if ~isempty(face_Location)
    the_Image = insertShape(the_Image, 'Rectangle', face_Location);
end
figure;
imshow(the_Image);
title('Detected face');
4) Conclusion
In conclusion, using Haar cascades for face detection has both
advantages and limitations. Haar cascade classifiers are appreciated for
their speed and resource efficiency, making them suitable for real-time
applications with limited computational power. They perform well in
controlled environments with consistent lighting and well-posed faces,
and they are relatively robust against overfitting.
However, Haar cascade classifiers may fall short in accuracy when faced
with complex backgrounds, occlusions, or variations in facial expressions
and poses. Their training process can be complex, requiring a substantial
number of diverse image samples.
Additionally, Haar cascade classifiers are less effective for detecting
small faces or faces at a distance. Their performance tends to degrade as
the size of the face decreases.
In summary, the Haar cascade is a practical choice for certain applications,
especially those prioritizing speed and efficiency in controlled
environments. For more demanding applications that require higher
accuracy and robustness in handling diverse conditions, alternative
methods like deep learning approaches using Convolutional Neural
Networks (CNNs) may be more suitable. The choice depends on the
specific needs and constraints of the application.
REFERENCES
[1] J. Fagertun, 2005. Face Recognition. Master Thesis, Technical
University of Denmark (DTU).
[2] Y. Gang, L. Jiawei, L. Jiayu, M. Qingli and Y. Ming, “Illumination
Variation in Face Recognition: A Review”, IEEE Second International
Conference on Intelligent Networks and Intelligent Systems (ICINIS
2009), pp. 309-311.
[3] M.A. Turk, A.P. Pentland, "Face Recognition Using Eigenfaces",
Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR’91), 3-6 June 1991, Maui, Hawaii, USA, pp. 586-
591.
[4] H. Moon, P.J. Phillips, "Computational and Performance Aspects of
PCA-based Face Recognition Algorithms", Perception, Vol. 30, 2001,
pp. 303-321.
[5] K. Etemad, R. Chellappa, "Discriminant Analysis for Recognition of
Human Face Images", Journal of the Optical Society of America A, Vol.
14, No. 8, August 1997, pp. 1724-1733.
[6] J. Lu, K.N. Plataniotis, A.N. Venetsanopoulos, "Face Recognition
Using LDA-Based Algorithms", IEEE Transactions on Neural Networks, Vol.
14, No. 1, January 2003, pp. 195-200.
[7] B. Heisele, P. Ho, T. Poggio, "Face Recognition with Support Vector
Machines: Global versus Component-based Approach", Proceedings of
the Eighth IEEE International Conference on Computer Vision
(ICCV’01), Vol. 2, 09-12 July 2001, Vancouver, Canada, pp. 688-694.
[8] A. Lanitis, C.J. Taylor, T.F. Cootes, "Automatic Interpretation and
Coding of Face Images Using Flexible Models", IEEE Transactions on
Pattern Analysis and Machine Intelligence, 1997, pp. 743-756.
[9] E. Gomathi, K. Baskaran, "Recognition of Faces Using Improved
Principal Component Analysis", Second International Conference on
Machine Learning and Computing (ICMLC’10), pp.198-201.
[10] C. Liu and H. Wechsler, "Probabilistic Reasoning Models for Face
Recognition", in IEEE Computer Society Conference on Computer
Vision and Pattern Recognition (CVPR’98), pp.827-832.
[11] Y. Ji, T. Lin, and H. Zha, "Mahalanobis Distance Based Non-negative
Sparse Representation for Face Recognition", in Proceedings of the
Eighth International Conference on Machine Learning and
Applications (ICMLA’09), pp.41-46.
[12] V. Kabeer & N. K. Narayanan, "Face recognition using state space
parameter and Artificial Neural Network Classifier", Proceedings of
IEEE International Conference on Computational Intelligence and
Multimedia Applications (ICCIMA’07), Sivakasi, India Vol.3,
December, 2007, pp 250-254.
[13] P. N. Belhumeur, J. Hespanha, and D. Kriegman, "Eigenfaces vs.
fisherfaces: Recognition using class specific linear projection", IEEE
Transactions on Pattern Analysis and Machine Intelligence, Special
Issue on Face Recognition, 1997, pp. 711-720.
[14] C. Thomaz and G. Giraldi, "A new ranking method for principal
components analysis and its application to face image analysis", Journal
Image and Vision Computing, 2010, vol. 28, no. 6, pp. 902-913.
[15] P. Viola and M. Jones, "Robust real-time object detection,"
International Journal of Computer Vision, 2002 vol. 57, no. 2, pp. 137-154.
[16] Intel, Intel Open Source Computer Vision Library, v1. 1ore,
http://sourceforge.net/projects/opencvlibrary (October 2011).
[17] G. M. Beumer, Q. Tao, A. M. Bazen and R. N. J. Veldhuis, "A
Landmark Paper in Face Recognition". IEEE International Conference
on Automatic Face and Gesture Recognition (FGE’02), pp. 73-78.
[18] M. Koestinger, P. Wohlhart, P. M. Roth and H. Bischof, “Annotated
Facial Landmarks in the Wild: A Large-scale, Real-world Database for
Facial Landmark Localization”, IEEE International Workshop on
Benchmarking Facial Image Analysis Technologies (BeFIT’11).
[19] M. Jian-Wei and F. Yu-Hua, "Face segmentation algorithm based on
ASM", IEEE Conference on Intelligent Computing and Intelligent
Systems (ICIS’09), 2009, pp 495-499.
[20] Z. Liu, W. Li, X. Zhang and J. Yang, "Efficient Face Segmentation
Based on Face Attention Model and Seeded Region Merging", 9th
International Conference on Signal Processing (ICSP’08). pp. 1116-
1119.
[21] A.M. Martinez and R. Benavente, "The AR Face Database", CVC
Technical Report #24, June - 1998.
[22] P. Viola, M. J. Jones, "Rapid object detection using a boosted
cascade of simple features", Proceedings of the 2001
IEEE Computer Society Conference on Computer Vision
and Pattern Recognition (CVPR 2001), Vol. 1, pp.
I-511-I-518, 8-14 Dec. 2001, Kauai, USA.
[23] P. Viola, M. Jones, "Robust real-time face detection",
International Journal of Computer Vision, 2004, vol. 57, no. 2, pp. 137-154.
[24] H. L. Punith Kumar, "Face Mask Detection System",
International Journal For Science Technology And Engineering, 2023,
doi: 10.22214/ijraset.2023.53690.
[25] S. M. Iqbal, S. Mishra, "A Comparative Study of Face Detection
Algorithms for Masked Face Detection", arXiv.org, 2023,
doi: 10.48550/arXiv.2305.11077.
[26] "A Comparative Study of Face Detection Algorithms for
Masked Face Detection", 2023, doi: 10.48550/arxiv.2305.11077.
[27] X. Chen, "Detecting Multi-Pose Masked Face using Haar Cascade and
LBP Classifiers", 2023, doi: 10.1109/icaaic56838.2023.10141064.
[28] P. Viola and M. Jones, "Rapid object detection using a boosted
cascade of simple features", Proceedings of the 2001 IEEE Computer
Society Conference on Computer Vision and Pattern Recognition
(CVPR 2001), Vol. 1, December 2001, pp. I-I, IEEE.
[29] S. Ur Rahman, F. Alam, W. Ali, "Gun Detection in CCTV Images using
HAAR-Like Features", Proceedings of the Pakistan Academy of Sciences:
A. Physical and Computational Sciences, 2022,
doi: 10.53560/ppasa(59-4)749.
[30] "Object recognition and tracking using Haar-like Features Cascade
Classifiers: Application to a quad-rotor UAV", 2022,
doi: 10.1109/codit55151.2022.9803981.
[31] C. Seibi, Z. F. Ward, M. A. S. Masoum, M. Shekaramiz, "Locating and
Extracting Wind Turbine Blade Cracks Using Haar-like Features and
Clustering", 2022, doi: 10.1109/ietc54973.2022.9796823.
[32] A. S. Lehman, J. Sanjaya, "Sistem Pengenalan Spesifikasi Mobil pada
Showroom Berbasis Haar-Like Features" [Car Specification Recognition
System in a Showroom Based on Haar-Like Features], 2020,
doi: 10.28932/JUTISI.V6I3.2903.
[33] S. Pal, "Human Face Detection Technique using Haar-like Features",
International Journal of Computer Applications, 2020,
doi: 10.5120/IJCA2020920883.