Artificial Intelligence In Object Detection - Report
Preprint · January 2020
Ashish Kumar
National Taipei University of Technology
All content following this page was uploaded by Ashish Kumar on 07 July 2020.
FC Report
Artificial Intelligence in Object Detection
Ashish Kumar
Department of Electrical Engineering and Computer Science
National Taipei University of Technology
Taipei, Taiwan 10608
Email: t108998404@ntut.edu.tw
Abstract— This report on artificial intelligence in object detection presents the major developments in this research field. It also briefly explains my own research, which is based on face and motion detection. Different architectures based on the convolutional neural network (CNN), a class of deep neural network, are studied, and different methodologies for object detection are presented and compared. An approach to 3D modelling using fuzzy logic is also presented.

Index Terms— 3D modelling, Object Detection, AI, Fuzzy logic.

I. INTRODUCTION
In computer science, artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans. AI textbooks define the field as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals [1]. Colloquially, the term "artificial intelligence" is often used to describe machines (or computers) that mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving" [2]. The traditional problems of AI research include reasoning, knowledge representation, planning, learning, natural language processing, perception and the ability to move and manipulate objects [3]. General intelligence is among the field's long-term goals [4]. Approaches include statistical methods, computational intelligence, and traditional symbolic AI. Many tools are used in AI, including versions of search and mathematical optimization, artificial neural networks, and methods based on statistics, probability and economics.

A fuzzy control system is a control system based on fuzzy logic. Fuzzy logic is a mathematical system that analyzes analog input values in terms of logical variables that take on continuous values between 0 and 1. Classical or digital logic, in contrast, operates on discrete values of either 1 or 0 (true or false, respectively). Fuzzy logic is a form of many-valued logic in which the truth values of variables may be any real number between 0 and 1, both inclusive. It is employed to handle the concept of partial truth, where the truth value may range between completely true and completely false [5]. By contrast, in Boolean logic the truth values of variables may only be the integer values 0 or 1. Fuzzy logic is based on the observation that people make decisions based on imprecise and non-numerical information. Fuzzy models or sets are mathematical means of representing vagueness and imprecise information. These models can recognize, represent, manipulate, interpret and utilize data and information that are vague and lack certainty.

Object detection is a computer vision technique whose aim is to detect objects such as cars, buildings and human beings, to name a few. The objects can generally be identified from either pictures or video feeds. Object detection has been applied widely in video surveillance, self-driving cars, and object/people tracking. It is also widely used in computer vision tasks such as face detection, face recognition and video object co-segmentation.

My research project is based on face and motion detection using the Python programming language. In this report, however, I mainly focus on AI and object detection.

The remainder of this report is structured as follows. Section II describes AI, machine learning (ML) and deep learning (DL). Section III explains model architectures for object detection. Section IV presents results and discussion. Section V gives conclusions and future study directions.

II. AI VS ML VS DL
Fig. 1: Artificial Intelligence as a superset within which Machine Learning and Deep Learning belong [Google images]
AI is human intelligence demonstrated by machines to perform simple to complex tasks. ML, in turn, gives machines the ability to learn and understand without being explicitly programmed. The idea behind AI is to program machines to carry out tasks in more human-like, or smart, ways. ML is the key to teaching computers to think and understand as we do.

Methods for object detection generally fall into either machine learning-based approaches or deep learning-based approaches. Machine learning approaches first define features using one of the methods below, and then use a technique such as a support vector machine (SVM) to do the classification. Deep learning techniques, on the other hand, can do end-to-end object detection without explicitly defining features, and they are typically based on convolutional neural networks (CNN).

Machine Learning approaches:
Viola–Jones object detection framework based on Haar features
Scale-invariant feature transform (SIFT)
Histogram of oriented gradients (HOG) features

Deep Learning approaches:
Region Proposals (R-CNN, Fast R-CNN, Faster R-CNN)
Single Shot MultiBox Detector (SSD)
You Only Look Once (YOLO)
Single-Shot Refinement Neural Network for Object Detection (RefineDet)

III. MODEL ARCHITECTURES FOR OBJECT DETECTION
This section presents some well-known model architectures for object detection and their main principles.

1. R-CNN
To bypass the problem of selecting a huge number of regions, Ross Girshick et al. proposed a method that uses selective search to extract just 2000 regions from the image. These regions are called region proposals. Instead of trying to classify a huge number of regions, the method therefore works with just 2000 regions. The region proposals are generated with the selective search algorithm outlined below [6].

Selective Search:
1. Generate an initial sub-segmentation of the image, producing many candidate regions.
2. Use a greedy algorithm to recursively combine similar regions into larger ones.
3. Use the generated regions to produce the final candidate region proposals.

Fig. 2: RCNN: Regions with CNN features [6]

The 2000 candidate region proposals are warped into a square and fed into a convolutional neural network, which produces a 4096-dimensional feature vector as output. The CNN acts as a feature extractor. Its output dense layer contains the features extracted from the image, and these features are fed into an SVM that classifies the presence of the object within that candidate region proposal. In addition to predicting whether an object is present, the algorithm also predicts four offset values that increase the precision of the bounding box. For example, the algorithm may correctly predict that a region proposal contains a person while the proposal's boundary cuts the person's face in half. The offset values adjust the bounding box of the region proposal to correct this.

The major problems in R-CNN are as follows:
(1) Training still takes a huge amount of time, because 2000 region proposals must be classified per image.
(2) It cannot run in real time, because it takes around 47 seconds per test image.
(3) The selective search algorithm is fixed, so no learning happens at that stage. This can lead to the generation of bad candidate region proposals.

2. Fast R-CNN
Fig. 3: Architecture of Fast R-CNN [7]
Ross Girshick solved some of the drawbacks of R-CNN and built a faster object detection algorithm called Fast R-CNN. The approach is similar to R-CNN. However, instead of feeding the region proposals to the CNN, the whole input image is fed to the CNN to generate a convolutional feature map. The region proposals are identified on this feature map and warped into squares. A RoI pooling layer then reshapes them into a fixed size so that they can be fed into a fully connected layer [7]. From the RoI feature vector, a softmax layer predicts the class of the proposed region, and the network also predicts the offset values for the bounding box. Fast R-CNN is faster than R-CNN because the 2000 region proposals do not have to be fed to the convolutional neural network every time. Instead, the convolution operation is done only once per image, and a feature map is generated from it.
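The Fast R-CNN paragraph names two steps: pooling each region into a fixed-size grid, and adjusting boxes with offset values. The pure-Python sketch below illustrates both and is not the implementation from [7]. It makes several assumptions. The feature map is a single channel stored as nested lists. RoI coordinates are already expressed in feature-map cells, so the projection from image pixels by the network stride is omitted. Offsets use the centre/log-size parameterisation described in the R-CNN paper [6]. The names `roi_max_pool` and `apply_box_offsets` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def roi_max_pool(feature_map: Sequence[Sequence[float]],
                 roi: Tuple[int, int, int, int],
                 output_size: Tuple[int, int] = (2, 2)) -> List[List[float]]:
    """Max-pool one region of interest into a fixed out_h x out_w grid.

    Assumption: single-channel map; roi = (x1, y1, x2, y2) are inclusive
    feature-map cell indices (already divided by the network stride).
    """
    x1, y1, x2, y2 = roi
    region = [row[x1:x2 + 1] for row in feature_map[y1:y2 + 1]]
    h, w = len(region), len(region[0])
    out_h, out_w = output_size
    pooled = []
    for i in range(out_h):
        # Row bin i spans [floor(i*h/out_h), ceil((i+1)*h/out_h)), at least one row.
        r0 = math.floor(i * h / out_h)
        r1 = max(math.ceil((i + 1) * h / out_h), r0 + 1)
        row_out = []
        for j in range(out_w):
            c0 = math.floor(j * w / out_w)
            c1 = max(math.ceil((j + 1) * w / out_w), c0 + 1)
            # Keep the strongest activation inside this sub-window.
            row_out.append(max(v for r in region[r0:r1] for v in r[c0:c1]))
        pooled.append(row_out)
    return pooled


def apply_box_offsets(proposal: Tuple[float, float, float, float],
                      deltas: Tuple[float, float, float, float]
                      ) -> Tuple[float, float, float, float]:
    """Refine a proposal (cx, cy, w, h) with predicted offsets (dx, dy, dw, dh).

    Shifts are relative to the proposal's width/height; size changes are
    predicted in log space, so exp(0) leaves the size unchanged.
    """
    cx, cy, w, h = proposal
    dx, dy, dw, dh = deltas
    return (cx + w * dx, cy + h * dy, w * math.exp(dw), h * math.exp(dh))
```

For example, on a 4×4 map holding the values 0 to 15, the RoI `(0, 0, 3, 3)` pools to `[[5, 7], [13, 15]]`. A smaller RoI such as `(1, 0, 3, 2)` still yields a 2×2 output. That fixed output shape is what lets proposals of different sizes share one fully connected layer.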
Fig. 4: Comparison of object detection algorithms [online source]

The above graphs show that Fast R-CNN is significantly faster than R-CNN in both training and testing. At test time, however, including region proposals slows Fast R-CNN down significantly compared with not using region proposals. Region proposals therefore become the bottleneck of the Fast R-CNN algorithm and affect its performance.

3. Faster R-CNN
Fig. 5: Faster R-CNN Structure [8]
Both of the above algorithms (R-CNN and Fast R-CNN) use selective search to find the region proposals. Selective search is a slow and time-consuming process that affects the performance of the network. Shaoqing Ren et al. therefore proposed an object detection algorithm that eliminates the selective search algorithm and lets the network learn the region proposals [8]. As in Fast R-CNN, the image is provided as input to a convolutional network, which produces a convolutional feature map. Instead of running selective search to identify the region proposals, a separate network (the Region Proposal Network) predicts them. A RoI pooling layer then reshapes the predicted region proposals. The pooled features are used to classify the object within each proposed region and to predict the offset values for the bounding boxes.

Fig. 6: Comparison of test-time speed of object detection algorithms [online source]

The above graph shows that Faster R-CNN is much faster than its predecessors, so it can even be used for real-time object detection.

4. YOLO
All of the previous object detection algorithms use regions to localize the object within the image. The network does not look at the complete image. Instead, it looks at the parts of the image that have a high probability of containing the object. YOLO, or You Only Look Once, is an object detection algorithm that is very different from the region-based algorithms above. In YOLO, a single convolutional network predicts the bounding boxes and the class probabilities for these boxes [9].

Fig. 7: YOLO: You Only Look Once [9]

YOLO takes an image and splits it into an S×S grid, and each grid cell predicts m bounding boxes. For each bounding box, the network outputs a class probability and offset values for the box. The bounding boxes whose class probability is above a threshold value are selected and used to locate the object within the image. YOLO is much faster than the region-based algorithms above, running at 45 frames per second [9]. Its limitation is that it struggles with small objects within the image. For example, it might have difficulty detecting a flock of birds. This is due to the spatial constraints of the algorithm.
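The YOLO selection step (grid cells, m boxes per cell, and a class-probability threshold) can be sketched in pure Python as below. This is a simplified illustration under assumed conventions, not the output format of [9]. Each cell's prediction is assumed to be a list of m tuples `(x_off, y_off, w, h, class_probs)`. The values `x_off` and `y_off` are the box centre relative to the cell, and `w` and `h` are relative to the whole image. No non-maximum suppression is applied. The names `Detection` and `decode_yolo_grid` are hypothetical.

```python
from typing import List, NamedTuple, Sequence, Tuple

# One predicted box: (x_off, y_off, w, h, class_probs) -- assumed layout.
Box = Tuple[float, float, float, float, Sequence[float]]


class Detection(NamedTuple):
    cls: int      # index of the most likely class
    score: float  # class probability of that class
    x1: float     # box corners in pixels
    y1: float
    x2: float
    y2: float


def decode_yolo_grid(preds: Sequence[Sequence[Sequence[Box]]],
                     image_w: int, image_h: int,
                     threshold: float = 0.5) -> List[Detection]:
    """Turn an S x S grid of per-box predictions into thresholded pixel boxes."""
    s = len(preds)                      # grid is S x S
    cell_w, cell_h = image_w / s, image_h / s
    detections = []
    for row in range(s):
        for col in range(s):
            for x_off, y_off, w, h, class_probs in preds[row][col]:  # m boxes per cell
                cls = max(range(len(class_probs)), key=class_probs.__getitem__)
                score = class_probs[cls]
                if score <= threshold:
                    continue  # keep only boxes whose class probability is above the threshold
                # Centre = cell origin + predicted offset inside the cell.
                cx = (col + x_off) * cell_w
                cy = (row + y_off) * cell_h
                # Width/height are predicted relative to the whole image.
                bw, bh = w * image_w, h * image_h
                detections.append(Detection(cls, score,
                                            cx - bw / 2, cy - bh / 2,
                                            cx + bw / 2, cy + bh / 2))
    return detections
```

In a 2×2 grid over a 100×100 image, a box in the top-left cell with offsets `(0.5, 0.5)`, size `(0.25, 0.25)` and class probabilities `[0.9, 0.1]` decodes to class 0 at corners (12.5, 12.5) to (37.5, 37.5). Boxes whose best class probability stays at or below the threshold are dropped. The whole image is handled in one pass over the grid, which is why YOLO avoids the per-region work of the R-CNN family.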
Fig. 8: Real Time Object detection using Laptop Webcam

I have tried YOLO myself, and it works well for real-time object detection compared with the others. I used it to detect things around me, such as a cellphone, bottles, mugs and bowls, and myself as a person. I also think YOLO has a flaw. If a thing or a person blocks half of the frame, or overlaps another object, YOLO does not detect that object even though it is in the frame. The reason could be that the weights were not trained for such cases, or that the training data does not contain such images.

Although the developers built it for Linux and iOS-based devices, it can also be used on Microsoft Windows with Anaconda3, Cygwin, CUDA and NVIDIA cuDNN.

Prerna Dahiya and Kamal Kumar Ranga also published a paper in the International Journal of Science and Research (IJSR) on real-time object detection and 3D modelling using fuzzy logic [10]. Their work uses entropy-based selection of the optimum transformation of the input data, wavelet-based transformation, and fuzzy logic. These are used to visualize and quantify how difficult objects are to detect, together with a technique to detect and model the objects. The authors proposed a system, OD3DM, that can detect, extract and model the images in 3D. Experimental results on a collected image dataset show that their approach is more accurate and efficient than traditional methods. Their model accurately detects complex geometric structures and models them in 3D. Fuzzy logic and entropy-based selection of optimum input data were used to implement the work, and a common pattern detection technique provides efficient detection and modelling of complex geometric objects. The whole implementation was done with MATLAB fuzzy logic methods, which give better and more accurate results than the traditional approaches [10].

I am currently working on a computer vision application in Python that detects moving objects from a computer or laptop webcam. It stores and visualizes the times when an object entered and exited the video frame. The application can serve as an aid to CCTV and save memory and energy, because the camera does not have to record 24/7. It records only when an object is present in its coverage area.

IV. RESULTS & DISCUSSIONS
This report on artificial intelligence (AI) in object detection shows different approaches used for object detection in the modern world. As the above sections show, there are different techniques, with upgraded versions, that help with both object detection and real-time object detection.

Fig. 9: Real Time Object detection using YOLO and Laptop Webcam: (a) wrong detection of a towel as a person; (b) correct detection of a person as a person

V. CONCLUSIONS
The various object detection techniques have made real-time object detection easy in today's world, and further advances are still possible in this research field. For example, in YOLOv3, anchor box offset prediction, focal loss, and linear instead of logistic prediction did not work. Future work could extend YOLO by finding solutions to these problems. There is always room for improvement in research, because nothing is perfect.

REFERENCES
[1] D. Poole, A. Mackworth, and R. Goebel, Computational Intelligence: A Logical Approach. New York: Oxford University Press, 1998. ISBN 978-0-19-510270-3.
[2] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 3rd ed. Upper Saddle River, NJ: Prentice Hall, 2009. ISBN 978-0-13-604259-4.
[3] G. Luger and W. Stubblefield, Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 5th ed. Benjamin/Cummings, 2004. ISBN 978-0-8053-4780-7.
[4] R. Kurzweil, The Age of Spiritual Machines. Penguin Books, 1999. ISBN 978-0-670-88217-5.
[5] V. Novák, I. Perfilieva, and J. Močkoř, Mathematical Principles of Fuzzy Logic. Dordrecht: Kluwer Academic, 1999. ISBN 978-0-7923-8595-0.
[6] R. Girshick et al., "Rich feature hierarchies for accurate object detection and semantic segmentation," CVPR 2014.
[7] R. Girshick, "Fast R-CNN," ICCV 2015.
[8] S. Ren et al., "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," NIPS 2015.
[9] J. Redmon et al., "You Only Look Once: Unified, Real-Time Object Detection," CVPR 2016.
[10] P. Dahiya et al., "Real Time Object Detection and 3D Modeling Using Fuzzy Logic," IJSR, June 2014.