Background Subtraction
A Texture-Based Method for Modeling the Background and Detecting Moving Objects

Marko Heikkilä and Matti Pietikäinen, Senior Member, IEEE

Abstract—This paper presents a novel and efficient texture-based method for modeling the background and detecting moving objects from a video sequence. Each pixel is modeled as a group of adaptive local binary pattern histograms that are calculated over a circular region around the pixel. The approach provides us with many advantages compared to the state-of-the-art. Experimental results clearly justify our model.

Index Terms—Motion, texture, background subtraction, local binary pattern.

. The authors are with the Machine Vision Group, Infotech Oulu, Department of Electrical and Information Engineering, University of Oulu, PO Box 4500, 90014, Finland. E-mail: {markot, mkp}@ee.oulu.fi.
Manuscript received 19 Jan. 2005; revised 22 June 2005; accepted 22 Aug. 2005; published online 14 Feb. 2006.
Recommended for acceptance by P. Fua.
For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number TPAMI-0041-0105.

1 INTRODUCTION

BACKGROUND subtraction is often one of the first tasks in machine vision applications, making it a critical part of the system. The output of background subtraction is an input to a higher-level process that can be, for example, the tracking of an identified object. The performance of background subtraction depends mainly on the background modeling technique used. Natural scenes in particular place many challenging demands on background modeling since they are usually dynamic in nature, with illumination changes, swaying vegetation, rippling water, flickering monitors, etc. A robust background modeling algorithm should also handle situations where new objects are introduced to or old ones removed from the background. Furthermore, the shadows of the moving and scene objects can cause problems. Even in a static scene, frame-to-frame changes can occur due to noise and camera jitter. Moreover, the background modeling algorithm should operate in real-time.

A large number of different methods for detecting moving objects have been proposed and many different features are utilized for modeling the background. Most of the methods use only the pixel color or intensity information to make the decision. To the authors’ knowledge, none of the earlier studies have utilized discriminative texture features in dealing with the problem. Only some simple statistics of neighborhoods may have been considered. This may be due to the high computational complexity and limited performance of texture methods.

In this paper, we propose an approach that uses discriminative texture features to capture background statistics. An early version of the method based on block-wise processing was presented in [1]. For our method, we chose the local binary pattern (LBP) texture operator [2], [3], which has recently shown excellent performance in many applications and has several properties that favor its usage in background modeling. Perhaps the most important properties of the LBP operator are its tolerance against illumination changes and its computational simplicity. In order to make the LBP even more suitable to real-world scenes, we propose a modification to the operator and use this modified LBP throughout this paper. Unlike most other approaches, the features in background modeling are computed over a larger area than a single pixel. This approach provides us with many advantages and improvements compared to the state-of-the-art. Our method tries to address all the issues mentioned earlier except the handling of shadows, which turned out to be an extremely difficult problem to solve with background modeling. In [4], a comprehensive survey of moving shadow detection approaches is presented.

2 RELATED WORK

A very popular technique is to model each pixel in a video frame with a Gaussian distribution. This is the underlying model for many background subtraction algorithms. A simple technique is to calculate an average image of the scene, to subtract each new video frame from it, and to threshold the result. The adaptive version of this algorithm updates the model parameters recursively by using a simple adaptive filter. This single Gaussian model was used in [5].

The single Gaussian model does not work well in the case of dynamic natural environments since they include repetitive motions like swaying vegetation, rippling water, flickering monitors, camera jitter, etc. This means that the scene background is not completely static. By using more than one Gaussian distribution per pixel, it is possible to handle such backgrounds. In [6], the mixture of Gaussians approach was used in a traffic monitoring application. The model for pixel intensity consisted of three Gaussian distributions corresponding to the road, vehicle, and shadow distributions. One of the most commonly used approaches for updating the Gaussian mixture model was presented in [7]. Instead of using the exact EM algorithm, an online K-means approximation was used. Many authors have proposed improvements and extensions to this algorithm. In [8], new update algorithms for learning mixture models were presented. They also proposed a method for detecting moving shadows using an existing mixture model. In [9], not only the parameters but also the number of components of the mixture is constantly adapted for each pixel. In [10], the mixture of Gaussians model was combined with concepts defined by region level and frame level considerations.

The Gaussian assumption for the pixel intensity distribution does not always hold. To deal with the limitations of parametric methods, a nonparametric approach to background modeling was proposed in [11]. The proposed method utilizes a general nonparametric kernel density estimation technique for building a statistical representation of the scene background. The probability density function for pixel intensity is estimated directly from the data without any assumptions about the underlying distributions. In [12], a quantization/clustering technique to construct a nonparametric background model was presented. The background is encoded on a pixel by pixel basis and samples at each pixel are clustered into the set of codewords.

Some of the proposed background models consider the time aspect of a video sequence. The decision also depends on the previous pixel values from the sequence. In [13], [14], an autoregressive process was used to model the pixel value distribution over time. In [15], a Hidden Markov Model (HMM) approach was adopted.

Some authors have modeled the background using edge features. In [16], the background model was constructed from the first video frame of the sequence by dividing it into equally sized blocks and calculating an edge histogram for each block. The histograms were constructed using pixel-specific edge directions as bin indices and incrementing the bins with the corresponding edge magnitudes. In [17], a fusion of edge and intensity information was used.
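As a rough illustration of the simple averaging technique described in Section 2 (average image, subtraction, thresholding, recursive update), the following sketch may help. It is not the model of [5], which also tracks a variance; here only a running mean is kept. The names `update_background` and `foreground_mask` and the values of `alpha` and `threshold` are hypothetical choices made for this example.

```python
from typing import List

Frame = List[List[float]]  # grayscale image stored as rows of intensities


def update_background(background: Frame, frame: Frame, alpha: float) -> Frame:
    """Recursive (adaptive) update: B <- (1 - alpha) * B + alpha * I.

    alpha is a hypothetical learning rate in [0, 1]; it is not taken from the paper.
    """
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background: Frame, frame: Frame, threshold: float) -> List[List[bool]]:
    """Mark a pixel as foreground when it differs from the background by more than threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


# Tiny 2x2 example: one pixel changes strongly, one changes only slightly.
background = [[10.0, 10.0], [10.0, 10.0]]
frame = [[10.0, 60.0], [12.0, 10.0]]

mask = foreground_mask(background, frame, threshold=25.0)
background = update_background(background, frame, alpha=0.1)
```

In this example only the pixel that jumps from 10 to 60 is flagged, and the background estimate moves a small step toward the new frame. A single mean per pixel cannot represent repetitive motion such as swaying vegetation, which is the limitation that motivates the mixture models discussed next.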
0162-8828/06/$20.00 © 2006 IEEE. Published by the IEEE Computer Society
658 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 28, NO. 4, APRIL 2006
Fig. 1. Calculating the binary pattern.

Motion-based approaches have also been proposed for background subtraction. The algorithm presented in [18] detects salient motion by integrating frame-to-frame optical flow over time. Salient motion is assumed to be motion that tends to move in a consistent direction over time. The saliency measure used is directly related to the distance over which a point has traveled with a consistent direction.

Region-based algorithms usually divide an image into blocks and calculate block-specific features. Change detection is achieved via block matching. In [19], the block correlation is measured using the NVD (Normalized Vector Distance) measure. In [16], an edge histogram calculated over the block area is used as a feature vector describing the block.

neighboring pixels are very close to the value of the center pixel. This is due to the thresholding scheme of the operator. Think of a case where $g_p$ and $g_c$ have the values 29 and 30, respectively. From (1), we see that, in this case, $s(x)$ outputs a value of 0. If the values were 30 and 30, $s(x)$ would return 1. In order to make the LBP more robust against these negligible changes in pixel values, we propose to modify the thresholding scheme of the operator by replacing the term $s(g_p - g_c)$ in (1) with the term $s(g_p - g_c + a)$. The bigger the value of $|a|$ is, the bigger the changes in pixel values that are allowed without affecting the thresholding results. In order to retain the discriminative power of the LBP operator, a relatively small value should be used. In our experiments, $a$ was given a value of 3. The results presented in [1] show that good results can also be achieved by using the original LBP. With the modified version, our background subtraction method consistently behaves more robustly and thus should be preferred over the original one.

4 A TEXTURE-BASED APPROACH

In this section, we introduce our approach to background subtraction. The algorithm can be divided into two phases, background modeling and foreground detection, described in Sections 4.1 and 4.2. In Section 4.3, some guidelines for how to select the parameter values are given.
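The modified thresholding proposed at the end of Section 3 can be illustrated with a small sketch. Equation (1) is not reproduced in this excerpt, so the code assumes the standard LBP form from [3], a sum of $s(g_p - g_c) 2^p$ over the neighbors with $s(x) = 1$ for $x \geq 0$ and 0 otherwise, which agrees with the 29/30 example in the text. For brevity it also assumes the eight neighbors of a 3 x 3 window instead of points sampled on a circle. The names `lbp_code` and `NEIGHBOUR_OFFSETS` are hypothetical.

```python
from typing import List, Sequence

# Assumption for brevity: the 8 neighbours of a 3x3 window, visited clockwise
# from the top-left corner. The paper samples P points on a circle of radius R.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def s(x: float) -> int:
    """Thresholding function: 1 when x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0


def lbp_code(image: Sequence[Sequence[float]], row: int, col: int, a: float = 0) -> int:
    """LBP code of the pixel at (row, col).

    a = 0 gives the original operator; a > 0 gives the modified thresholding
    s(g_p - g_c + a) described in the text.
    """
    g_c = image[row][col]
    code = 0
    for p, (d_row, d_col) in enumerate(NEIGHBOUR_OFFSETS):
        g_p = image[row + d_row][col + d_col]
        code |= s(g_p - g_c + a) << p
    return code


# Two nearly flat patches that differ only by one gray level of noise.
patch_a: List[List[int]] = [[29, 30, 29], [30, 30, 29], [29, 30, 30]]
patch_b: List[List[int]] = [[30, 29, 30], [29, 30, 30], [30, 29, 29]]

original_a = lbp_code(patch_a, 1, 1)       # original operator, a = 0
original_b = lbp_code(patch_b, 1, 1)
modified_a = lbp_code(patch_a, 1, 1, a=3)  # a = 3 as in the paper's experiments
modified_b = lbp_code(patch_b, 1, 1, a=3)
```

With the original operator the two patches receive different codes even though they differ only by noise, whereas with $a = 3$ every neighbor passes the threshold and both patches map to the same code (255).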
$$\vec{m}_k = \alpha_b \vec{h} + (1 - \alpha_b)\,\vec{m}_k, \quad \alpha_b \in [0, 1], \tag{3}$$
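Equation (3) has the form of an exponential moving average, and a minimal sketch of it follows. The symbols are not defined in this excerpt, so the sketch reads $\vec{m}_k$ as one of the pixel's model histograms, $\vec{h}$ as the histogram computed from the new frame, and the coefficient (called `alpha_b` here) as a learning rate; that reading, the function name `update_model_histogram`, and the example values are assumptions.

```python
from typing import List


def update_model_histogram(model: List[float], new_hist: List[float], alpha_b: float) -> List[float]:
    """Apply m_k <- alpha_b * h + (1 - alpha_b) * m_k bin by bin."""
    if not 0.0 <= alpha_b <= 1.0:
        raise ValueError("alpha_b must lie in [0, 1]")
    if len(model) != len(new_hist):
        raise ValueError("histograms must have the same number of bins")
    return [alpha_b * h + (1.0 - alpha_b) * m for m, h in zip(model, new_hist)]


# Hypothetical 4-bin normalized histograms.
model = [0.5, 0.5, 0.0, 0.0]
observed = [0.0, 0.5, 0.5, 0.0]

updated = update_model_histogram(model, observed, alpha_b=0.2)
```

Because the update is a convex combination, two histograms that each sum to one yield an updated histogram that also sums to one. A larger `alpha_b` makes the model follow new observations faster, and a smaller one makes it more stable.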
TABLE 1
The Parameter Values of the Method for the Results in Figs. 2 and 3
be used. Making the decision pixel-wise usually offers more accurate results for shape extraction. In Section 4, we extended our algorithm for pixel-wise processing and all the tests presented in this paper are carried out using this method.

Fig. 2 shows some results for our method using some of the test sequences from [1]. The first two frames on the upper left are from an indoor sequence where a person is walking in a laboratory room. Background subtraction methods that rely only on color information will most probably fail to detect the moving object correctly because of the similar color of the foreground and the background. The next two frames are from an indoor sequence where a person walks toward the camera. Many adaptive pixel-based methods output a huge number of false negatives on the inner areas of the moving object because the pixel values stay almost the same over time. The proposed method gives good results because it exploits information gathered over a larger area than a single pixel. The first two frames on the lower left are from an outdoor sequence which contains relatively small moving objects. The original sequence has been taken from the PETS database (ftp://pets.rdg.ac.uk). The proposed method successfully handles this situation and all the moving objects are detected correctly. The last two frames are from an outdoor sequence that contains heavily swaying trees and rippling water. This is a very difficult scene from the background modeling point of view. Since the method was also designed to handle multimodal backgrounds, it manages the situation relatively well. The values for the method parameters are given in Table 1. For the first three sequences, the values were kept untouched. For the last sequence, values for the parameters $K$ and $T_B$ were changed to adjust the method for increased multimodality of the background.

In [13], a test set for evaluating background subtraction methods was presented. It consists of seven video sequences, each addressing a specific canonical background subtraction problem. In the same paper, 10 different methods were compared using the test set. We tested our method against this test set and achieved the results shown in Fig. 3. When compared to the results in [13], the overall performance of our method seems to be better than that of the other methods. We did not change the parameter values of our method between the test sequences, although better results could be obtained by customizing the values for each sequence. See Table 1 for the parameter values used. Like most of the other methods, our method was not capable of handling the Light Switch problem. This is because we do not utilize any higher level processing that could be used to detect sudden changes in background.

Fig. 3. Detection results of our method for the test sequences presented in [13]. The image resolution is 160 × 120 pixels.

We also compared the performance of our method to the widely used method of Stauffer and Grimson [7] by using the five test sequences presented in [1]. The sequences include both indoor and outdoor scenes and five frames from each sequence are labeled as the ground truth. The results are shown in Fig. 4. The numbers of error classifications were obtained by summing the errors from the five processed frames corresponding to the ground truth frames. For all five sequences, our method gave fewer false negatives than the comparison method. In the case of false positives, our method was better in two cases. For the remaining three sequences, the difference is very small. It should be noticed that, for the proposed method, most of the false positives occur on the contour areas of the moving objects (see Fig. 2). This is because the features are extracted from the pixel neighborhood. According to the overall results, the proposed method outperforms the comparison method for the used test sequences.

Since our method has relatively many parameters, there naturally arises a question: How easy or difficult is it to obtain a good set of parameter values? To see how sensitive the proposed method is to small changes of its parameter values, we calculated the error classifications for different parameter settings. Because of the huge number of different combinations, only one parameter was varied at a time. The measurements were made for several video sequences, including indoor and outdoor scenes. The results for the first sequence of Fig. 2 are plotted in Fig. 5. It can be clearly seen that, for all parameters, a good value can be chosen across a wide range of values. The same observation was made for all the measured sequences. This property significantly eases the selection of parameter values. Furthermore, the experiments have shown that
Fig. 4. Comparison results of the method presented in this paper (TBMOD) and the method of Stauffer and Grimson [7] (GMM). (a) The test sequences. (b) The test
results. FN and FP stand for false negatives and false positives, respectively.
Fig. 5. Number of false negatives (FN) and false positives (FP) for different parameter values for the first sequence of Fig. 2. While one parameter was varied, other parameters were kept fixed at the values given in Table 1. The results are normalized between zero and one: $\vec{x} = (\vec{x} - \min(\vec{x})) / \max(\vec{x} - \min(\vec{x}))$.
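The error measures reported in Figs. 4 and 5 can be sketched as follows: false negatives and false positives are counted against a ground truth mask, and a series of counts is then rescaled to the range from zero to one as stated in the caption of Fig. 5. The function names `count_errors` and `normalize`, the mask layout, and the sample numbers are hypothetical and are not the paper's data.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # 1 = foreground, 0 = background


def count_errors(detected: Mask, truth: Mask) -> Tuple[int, int]:
    """Return (false_negatives, false_positives) for one frame."""
    false_negatives = 0
    false_positives = 0
    for d_row, t_row in zip(detected, truth):
        for d, t in zip(d_row, t_row):
            if t and not d:
                false_negatives += 1  # foreground pixel that was missed
            elif d and not t:
                false_positives += 1  # background pixel marked as foreground
    return false_negatives, false_positives


def normalize(values: Sequence[float]) -> List[float]:
    """Rescale to [0, 1]: (x - min(x)) / max(x - min(x))."""
    lowest = min(values)
    shifted = [v - lowest for v in values]
    peak = max(shifted)
    if peak == 0:
        # All values are equal; return zeros to avoid dividing by zero.
        return [0.0 for _ in values]
    return [v / peak for v in shifted]


# Hypothetical 2x3 masks: one missed foreground pixel, two false detections.
truth = [[1, 1, 0], [0, 0, 0]]
detected = [[1, 0, 0], [0, 1, 1]]
fn, fp = count_errors(detected, truth)

# Hypothetical error counts obtained for three settings of one parameter.
scaled = normalize([40, 10, 25])
```

Summing the per-frame counts over the labeled frames would give totals of the kind compared in Fig. 4. The normalization keeps only the shape of each curve, so parameters with very different error magnitudes can share one plot.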
a good set of parameters for a sequence usually performs well also for other sequences (see Table 1).

We also measured the speed of the proposed method. For the parameter values used in the tests, a frame rate of 15 fps was achieved. We used a standard PC with a 1.8 GHz processor and 512 MB of memory in our experiments. The image resolution was 160 × 120 pixels. This makes the method well-suited to systems that require real-time processing.

6 CONCLUSIONS

A novel approach to background subtraction was presented, in which the background is modeled using texture features. The features are extracted by using the modified local binary pattern (LBP) operator. Our approach provides us with several advantages compared to other methods. Due to the invariance of the LBP features with respect to monotonic gray-scale changes, our method can tolerate considerable illumination variations common in natural scenes. Unlike many other approaches, the proposed features are very fast to compute, which is an important property from the practical implementation point of view. The proposed method belongs to nonparametric methods, which means that no assumptions about the underlying distributions are needed.

Our method has been evaluated against several video sequences including both indoor and outdoor scenes. It has proven to be tolerant to illumination variations, the multimodality of the background, and the introduction/removal of background objects. Furthermore, the method is capable of real-time processing. Comparisons to other approaches presented in the literature have shown that our approach is very powerful when compared to the state-of-the-art.

Currently, the method requires a nonmoving camera, which restricts its usage in certain applications. We plan to extend the method to support also moving cameras. The preliminary results with a pan-tilt-zoom camera are promising. The proposed method also has relatively many parameters. This could be a weakness, but at the same time, it allows the user extensive control over method behavior. A proper set of parameters can be easily found for a given application scenario.

ACKNOWLEDGMENTS

This work was supported by the Academy of Finland. The authors also want to thank Professor Janne Heikkilä for his contribution.

REFERENCES

[1] M. Heikkilä, M. Pietikäinen, and J. Heikkilä, “A Texture-Based Method for Detecting Moving Objects,” Proc. British Machine Vision Conf., vol. 1, pp. 187-196, 2004.
[2] T. Ojala, M. Pietikäinen, and D. Harwood, “A Comparative Study of Texture Measures with Classification Based on Feature Distributions,” Pattern Recognition, vol. 29, no. 1, pp. 51-59, 1996.
[3] T. Ojala, M. Pietikäinen, and T. Mäenpää, “Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 971-987, July 2002.
[4] A. Prati, I. Mikic, M.M. Trivedi, and R. Cucchiara, “Detecting Moving Shadows: Algorithms and Evaluation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 7, pp. 918-923, July 2003.
[5] C.R. Wren, A. Azarbayejani, T. Darrell, and A.P. Pentland, “Pfinder: Real-Time Tracking of the Human Body,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 780-785, July 1997.
[6] N. Friedman and S. Russell, “Image Segmentation in Video Sequences: A Probabilistic Approach,” Proc. Conf. Uncertainty in Artificial Intelligence, pp. 175-181, 1997.
[7] C. Stauffer and W.E.L. Grimson, “Adaptive Background Mixture Models for Real-Time Tracking,” Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, vol. 2, pp. 246-252, 1999.
[8] P. KaewTraKulPong and R. Bowden, “An Improved Adaptive Background Mixture Model for Real-Time Tracking with Shadow Detection,” Proc. European Workshop Advanced Video Based Surveillance Systems, 2001.
[9] Z. Zivkovic, “Improved Adaptive Gaussian Mixture Model for Background Subtraction,” Proc. Int’l Conf. Pattern Recognition, vol. 2, pp. 28-31, 2004.
[10] Q. Zang and R. Klette, “Robust Background Subtraction and Maintenance,” Proc. Int’l Conf. Pattern Recognition, vol. 2, pp. 90-93, 2004.
[11] A. Elgammal, R. Duraiswami, D. Harwood, and L.S. Davis, “Background and Foreground Modeling Using Nonparametric Kernel Density Estimation for Visual Surveillance,” Proc. IEEE, vol. 90, no. 7, pp. 1151-1163, 2002.
[12] K. Kim, T.H. Chalidabhongse, D. Harwood, and L. Davis, “Background Modeling and Subtraction by Codebook Construction,” Proc. IEEE Int’l Conf. Image Processing, vol. 5, pp. 3061-3064, 2004.
[13] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers, “Wallflower: Principles and Practice of Background Maintenance,” Proc. IEEE Int’l Conf. Computer Vision, vol. 1, pp. 255-261, 1999.
[14] A. Monnet, A. Mittal, N. Paragios, and R. Visvanathan, “Background Modeling and Subtraction of Dynamic Scenes,” Proc. IEEE Int’l Conf. Computer Vision, vol. 2, pp. 1305-1312, 2003.
[15] J. Kato, T. Watanabe, S. Joga, J. Rittscher, and A. Blake, “An HMM-Based Segmentation Method for Traffic Monitoring Movies,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 9, pp. 1291-1296, Sept. 2002.
[16] M. Mason and Z. Duric, “Using Histograms to Detect and Track Objects in Color Video,” Proc. Applied Imagery Pattern Recognition Workshop, pp. 154-159, 2001.
[17] S. Jabri, Z. Duric, H. Wechsler, and A. Rosenfeld, “Detection and Location of People in Video Images Using Adaptive Fusion of Color and Edge Information,” Proc. Int’l Conf. Pattern Recognition, vol. 4, pp. 627-630, 2000.
[18] L. Wixson, “Detecting Salient Motion by Accumulating Directionally-Consistent Flow,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 774-780, Aug. 2000.
[19] T. Matsuyama, T. Ohya, and H. Habe, “Background Subtraction for Non-Stationary Scenes,” Proc. Asian Conf. Computer Vision, pp. 622-667, 2000.