Review
Investigations of Object Detection in Images/Videos
Using Various Deep Learning Techniques and
Embedded Platforms—A Comprehensive Review
Chinthakindi Balaram Murthy 1,†, Mohammad Farukh Hashmi 1,†, Neeraj Dhanraj Bokde 2,†
and Zong Woo Geem 3,*
1 Department of Electronics and Communication Engineering, National Institute of Technology,
Warangal 506004, India; balu1602@student.nitw.ac.in (C.B.M.); mdfarukh@nitw.ac.in (M.F.H.)
2 Department of Engineering—Renewable Energy and Thermodynamics, Aarhus University,
8000 Aarhus, Denmark; neerajdhanraj@eng.au.dk
3 Department of Energy IT, Gachon University, Seongnam 13120, Korea
* Correspondence: geem@gachon.ac.kr; Tel.: +82-31-750-5586
† These authors contributed equally to this work.
Received: 15 April 2020; Accepted: 2 May 2020; Published: 8 May 2020
Abstract: In recent years there has been remarkable progress in one computer vision application
area: object detection. One of the most challenging and fundamental problems in object detection is
locating a specific object among the multiple objects present in a scene. Earlier, traditional detection
methods were used for detecting objects, until the introduction of convolutional neural networks;
from 2012 onward, deep learning-based techniques were used for feature extraction, leading to
remarkable breakthroughs in this area. This paper presents a detailed survey on recent advancements
and achievements in object detection using various deep learning techniques. Several topics are
covered, including Viola–Jones (VJ), histogram of oriented gradients (HOG), one-shot and two-shot
detectors, benchmark datasets, evaluation metrics, speed-up techniques, and current state-of-the-art
object detectors. Detailed discussions of some important applications of object detection, including
pedestrian detection, crowd detection, and real-time object detection on GPU-based embedded
systems, are presented. Finally, we conclude by identifying promising future directions.
Keywords: convolutional neural network (CNN); computer vision (CV); graphics processing units
(GPUs); object detection; deep learning techniques
1. Introduction
Recently, computer vision has been extensively researched in the area of object detection for
industrial automation, consumer electronics, medical imaging, military, and video surveillance. It is
predicted that the computer vision market will be worth $50 billion by the end of 2020.
For object recognition, the raw input data are represented as a matrix of pixels. The first
representation layer abstracts the pixels and encodes edges; the next layer composes and encodes
edge arrangements; the layer above that encodes higher-level parts such as eyes and noses; and the
final layer recognizes a face present in the image. Importantly, a deep learning process learns which
features belong at which level on its own, without manual supervision.
In object classification applications, manual feature extraction is eliminated by the convolutional
neural network (CNN): there is no need to hand-pick features that are useful for image classification.
CNNs extract features directly from images, and these features are not pre-defined; they are learned
while the network is trained on the collected images.
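To make this concrete, the following minimal sketch (in PyTorch; the layer widths and input size are illustrative assumptions, not taken from any surveyed paper) shows how a small stack of convolutions maps raw pixels to learned feature maps with no hand-engineered features:

```python
import torch
import torch.nn as nn

# A minimal CNN feature extractor: each convolution learns its own filters
# during training, so no manual feature engineering is required.
features = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # low-level: edges, colors
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # mid-level: edge arrangements
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # high-level: object parts
    nn.ReLU(),
)

x = torch.randn(1, 3, 224, 224)   # a dummy RGB image
print(features(x).shape)          # torch.Size([1, 64, 56, 56])
```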
Thanks to automatic feature extraction, deep learning models have become highly accurate in
computer vision. Deep CNN architectures are complex models, and they require large labeled image
datasets to reach high accuracy on related computer vision tasks such as object classification, detection,
tracking, and recognition.
With advances in technology and the availability of powerful graphics processing units (GPUs),
deep learning has been applied to large datasets, and researchers have demonstrated state-of-the-art
results in areas such as object classification, detection, and recognition. Both training and testing
with deep learning require powerful computational resources and large datasets. In computer vision,
image classification is the most widely researched area, and deep learning techniques have attained
astonishing results in worldwide competitions such as PASCAL VOC, ILSVRC, and MS-COCO [1].
Owing to these promising results in image classification [2], deep learning techniques are now also
deployed for object detection. For example, Nguyen et al. [3] implemented classification of sonar
images with various added noises using the GoogLeNet CNN and tested it on the TDI 2017 and
2018 datasets.
In generic object detection, the main aim is to determine whether or not an image contains any
instances of objects from specified categories (e.g., animals, vehicles, and pedestrians) and, if present,
to return the spatial location and extent of each object (via a bounding box) [4,5]. Object detection
has become the basis for solving more complex vision-related tasks, namely scene understanding,
image captioning, instance segmentation, semantic segmentation, object recognition, and tracking [6,7].
Applications of object detection cover areas such as the Internet of Things (IoT) and artificial
intelligence, including intelligent military surveillance systems, security, self-driving cars, robot vision,
human–computer interaction (HCI), and consumer electronics.
Recently, deep learning methods [8,9] have emerged as the most powerful techniques for
automatically learning features from raw data. Specifically, deep learning methods have achieved
great progress in object detection, a problem that has grabbed the attention of many researchers in this
decade. Video surveillance is one of the most challenging and fundamental areas in security systems,
as it depends heavily on object detection and tracking; it monitors the behavior of people in public to
detect any suspicious activity [10].
The road-map of object detection milestones is shown in Figure 1.
Figure 1. Milestones of object detection: VJ detector [11,12], HOG detector [13], DPM [14–16],
RCNN [17], etc. The major turning point came in 2012, when Krizhevsky et al. [1] applied a deep
CNN (DCNN) to image classification. (Source: [18].)
The stated goals of object detection are to achieve both high accuracy and high efficiency with
robust detection algorithms.
1. To achieve high accuracy, related challenges are:
• Intra-class variations: variations in real-world objects include color, size, shape, material,
and pose variations.
• Image conditions and unconstrained environments: factors such as lighting, weather conditions,
occlusion, object physical location, viewpoint, clutter, shadow, blur, and motion.
• Imaging noise: factors such as low-resolution images, compression noise, and filter distortions.
• Thousands of structured and unstructured real-world object categories must be distinguished
by the detector.
2. To achieve high efficiency, related challenges are:
• Low-end mobile devices have limited memory, limited speed, and low computational
capabilities.
• Thousands of open-world object classes should be distinguished.
• Large-scale image or video data.
• Inability to handle previously unseen objects.
The main contributions of this paper are summarized as follows:
• Differently from recently published review papers on object detection [18–23], this paper
comprehensively reviews modern deep learning-based object detectors, starting from regions
with convolutional neural networks (RCNN) and ending with CornerNet, along with their pros
and cons.
• It also covers some specific problems in computer vision (CV) application areas, such as pedestrian
detection, the military, crowd detection, intelligent transportation systems, medical imaging
analysis, face detection, object detection in sports videos, and other domains.
• It provides an outlook on the available deep learning frameworks, application program interface
(API) services, and specific datasets used for object detection applications.
• It also puts forth the idea of deploying deep learning models into various embedded platforms
for real-time object detection. In the case of a pre-trained model being adopted, replacing the
feature extractor with an efficient backbone network would improve the real-time performance of
the CNN.
• It describes how a GPU-based CNN object detection framework would improve real-time
detection performance on edge devices.
Finally, we give an overview of various deep learning methods deployed on embedded platforms
for real-time object detection, along with possible research directions.
The rest of this paper is organized as follows. Section 2 covers various deep learning architectures
used in object detection in detail. Frameworks and API Services, and available datasets and
performance metrics for object detection, have been discussed in Sections 3 and 4. Application
domains and deep learning approaches for object detection are explained briefly in Sections 5 and 6
respectively. Section 7 discusses various GPU-based embedded systems for real-time object detection
implemented using deep learning techniques. Research directions, a conclusion and future research
possibilities are presented in Sections 8 and 9.
2. Object Detection
Objects can be detected in multiple ways: using the Viola–Jones (VJ) object detector [11,12],
feature-based object detectors [24–26], HOG features with a support vector machine (SVM)
classifier [13], or deep learning-based object detection techniques. Figure 3 shows the various
approaches available in object detection.
In RCNN, a selective search algorithm [30] generates around 2000 region proposals per image,
and a CNN is used for feature extraction. An SVM classifier then predicts the presence of an object
within each region proposal and also recognizes the object class. RCNN improved mean average
precision (mAP) on the VOC-2007 dataset to 58.5%, from 33.7% (DPM-v5 [31]).
Despite the great improvement reported for the RCNN method, it has several drawbacks:
(1) training the network is time-consuming, as roughly 2000 object region proposals must be classified
per image; (2) it cannot run in real time, as each test image needs around 47 s; and (3) since selective
search is a fixed algorithm, no learning happens at the proposal stage, which can lead to the generation
of bad object region proposals. To overcome these drawbacks, SPPNet [32] was formulated in the
same year.
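A minimal sketch of this RCNN-style pipeline is given below, assuming region proposals have already been produced by an external selective search step; the proposal boxes and backbone here are placeholders. Classifying every proposal with a full forward pass is precisely what makes inference take tens of seconds per image:

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

# Backbone standing in for the per-region CNN feature extractor + classifier.
cnn = resnet18(num_classes=21)  # e.g., 20 VOC classes + background (assumed)
cnn.eval()

image = torch.randn(3, 480, 640)                       # dummy input image
proposals = [(50, 60, 250, 300), (100, 20, 400, 380)]  # placeholder (x1, y1, x2, y2) boxes

with torch.no_grad():
    for (x1, y1, x2, y2) in proposals:
        region = image[:, y1:y2, x1:x2].unsqueeze(0)
        # Warp each proposal to a fixed input size, as RCNN does.
        region = F.interpolate(region, size=(224, 224), mode="bilinear")
        scores = cnn(region)             # one full forward pass PER proposal
        print(scores.argmax(dim=1))      # predicted class for this region
```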
2.4.3. Retina-Net
SSD achieves better accuracy when applied over a dense sampling of object locations, aspect ratios,
and scales. However, the large set of candidate locations SSD generates densely covers the image,
and most of these candidates are easy negatives; this creates a class imbalance, and the few locations
that do contain objects risk going undetected. Lin et al. [44] implemented Retina-Net, shown in
Figure 12, to overcome this drawback of SSD, the class imbalance problem, and to counter the decrease
in prediction accuracy of YOLO and SSD. Retina-Net solves the class imbalance with a focal loss,
which puts more focus on misclassified examples during training. Besides maintaining very high-speed
detection, focal loss enables this one-stage design to achieve accuracy comparable to that of the
RCNN-series detectors (59.1% mAP on the MS-COCO dataset).
Figure 12. Retina-Net architecture. Class and box subnets are attached at each feature-pyramid level:
the class subnet maps W × H × 256 features to a W × H × KA output, and the box subnet maps
W × H × 256 features to a W × H × 4A output, where A is the number of anchors per location and
K the number of classes.
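The focal loss underlying Retina-Net down-weights well-classified examples with a modulating factor (1 − p_t)^γ, so the abundant easy negatives contribute little to training. A minimal sketch, with α and γ set to the commonly used defaults:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)              # prob of true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# Easy negatives (confident background) are strongly down-weighted:
logits = torch.tensor([-4.0, 3.5, 0.1])   # anchor scores
targets = torch.tensor([0.0, 1.0, 1.0])   # 0 = background, 1 = object
print(focal_loss(logits, targets))
```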
2.4.4. SqueezeDet
Wu et al. [45] implemented SqueezeDet, a lightweight, single-shot, extremely fast, fully
convolutional network for detecting objects in autonomous driving systems. To deploy a deep CNN
for real-time object detection, the model should address several important constraints: speed, accuracy,
model size, and power efficiency. These constraints are well addressed in the SqueezeDet model,
shown in Figure 13. It is a single-forward-pass object detector: first, stacked convolution filters extract
a high-dimensional, low-resolution feature map from the input image. Second, ConvDet, a
convolutional layer fed with this feature map, produces a large number of bounding boxes and
predicts each object's category. Finally, filtering these bounding boxes yields the final detections.
The backbone of SqueezeDet is SqueezeNet [46], and the model size is less than 8 MB, which is very
small compared to AlexNet [1], without losing any accuracy. The model has approximately two million
trainable parameters, yet achieves higher accuracy than VGG19 and ResNet-50, which have about
143 million and 25 million parameters, respectively. For an input image of size 1242 × 375, the model
achieved 57.2 FPS on the KITTI dataset [47] and consumed only 1.4 J of energy per image.
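The ConvDet idea can be sketched as a single convolution whose output channels encode, for each of K anchors at every feature-map cell, an objectness score, C class scores, and four box offsets. The channel layout and sizes below are illustrative assumptions, not the exact SqueezeDet implementation:

```python
import torch
import torch.nn as nn

K, C = 9, 3  # anchors per cell and number of classes (assumed values)

# One convolution predicts everything: confidence + classes + box deltas per anchor.
convdet = nn.Conv2d(512, K * (1 + C + 4), kernel_size=3, padding=1)

fmap = torch.randn(1, 512, 24, 78)           # low-res feature map from the backbone
out = convdet(fmap)                          # (1, K*(1+C+4), 24, 78)
out = out.view(1, K, 1 + C + 4, 24, 78)
conf = torch.sigmoid(out[:, :, 0])           # objectness per anchor per cell
cls_scores = out[:, :, 1:1 + C]              # class scores per anchor
box_deltas = out[:, :, 1 + C:]               # (dx, dy, dw, dh) offsets
print(conf.shape, cls_scores.shape, box_deltas.shape)
```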
2.4.5. CornerNet
Law et al. [48] implemented CornerNet, in which an object is detected by a pair of key points
using a CNN instead of an anchor box drawn around the detected object. The need for designing
anchor boxes, as usually done in one-stage detectors, is thus eliminated by detecting objects as paired
key points, i.e., the top-left and bottom-right corners. They introduced a new type of pooling layer,
referred to as corner pooling, which helps the network localize corners better. The CNN outputs
heatmaps for all top-left and bottom-right corners, along with an embedding vector for each detected
corner. On the MS-COCO dataset, CornerNet achieved 42.2% AP, which outperforms the existing
one-stage detectors. Figure 14 shows the CornerNet architecture. Its main drawback is that it sometimes
generates incorrectly paired key points for detected objects. To overcome this drawback, Duan et al. [49]
implemented CenterNet by introducing a third key point at the object center. CenterNet achieved
47% AP, though its inference speed is slower than CornerNet's. Table 1 summarizes different object
detection algorithms tested on a Pascal Titan X GPU on the MS-COCO and Pascal-VOC 2007 datasets.
Table 2 compares the performances of various deep learning-based object detectors on the MS-COCO
test-dev dataset.
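Corner pooling can be expressed as two directional maximum scans whose results are summed: for a top-left corner map, each location takes the maximum over all features to its right and all features below it. A minimal sketch, assuming an (N, C, H, W) feature-map layout:

```python
import torch

def top_left_corner_pool(x):
    """Top-left corner pooling: max over everything to the right and below.

    x: feature map of shape (N, C, H, W).
    """
    # Right-to-left running maximum along the width axis.
    horiz = torch.flip(torch.cummax(torch.flip(x, dims=[3]), dim=3).values, dims=[3])
    # Bottom-to-top running maximum along the height axis.
    vert = torch.flip(torch.cummax(torch.flip(x, dims=[2]), dim=2).values, dims=[2])
    return horiz + vert

fmap = torch.arange(16.0).reshape(1, 1, 4, 4)
print(top_left_corner_pool(fmap))
```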
Table 1. Summary of different object detectors (Pascal Titan X GPU): performances on MS-COCO and
Pascal-VOC 2007.

S.No  Architecture        mAP (MS-COCO)   mAP (Pascal-VOC 2007)   FPS
1     RCNN [17]           –               66.0%                   0.1
2     SPPNet [32]         –               63.1%                   1
3     Fast RCNN [33]      35.9%           70.0%                   0.5
4     Faster RCNN [2]     36.2%           73.2%                   6
5     Mask RCNN [38]      –               78.2%                   5
6     YOLO [40]           –               63.4%                   45
7     SSD [41]            31.2%           76.8%                   8
8     YOLOv2 [42]         21.6%           78.6%                   67
9     YOLOv3 [43]         33.0%           –                       35
10    SqueezeDet [45]     –               –                       57.2
11    SqueezeDet+ [45]    –               –                       32.1
12    CornerNet [48]      69.2%           –                       4
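The mAP figures in Tables 1 and 2 rest on the intersection-over-union (IoU) criterion: a predicted box counts as a true positive only if its IoU with a ground-truth box exceeds a threshold (0.5 for PASCAL VOC; averaged over 0.5 to 0.95 for MS-COCO). A minimal IoU computation:

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

print(iou((0, 0, 100, 100), (50, 50, 150, 150)))  # 0.1429 (2500 / 17500)
```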
Table 2. Comparison of deep learning-based object detection performances on the MS-COCO test-dev
dataset. (Note: the SqueezeDet* and SqueezeDet+* models are trained on the KITTI dataset. AP(E),
AP(M), and AP(H) refer to average precision for easy, medium, and hard cases.)
• CIFAR-100 dataset [78]: This dataset provides 100 object classes, each containing 600 images
(500 for training and 100 for testing per class). The 100 classes are clustered into 20 superclasses,
and each image comes with a “fine” label (the class it belongs to) and a “coarse” label
(its superclass).
• CUB-200-2011 [79]: This extension of the CUB-200 dataset [80] consists of 200 annotated
bird-species classes with 11,788 images in total. Each image is annotated with 15 part locations,
312 binary attributes, and one bounding box.
• Caltech-256 [81]: It contains 256 object classes with a total of 30,607 images (at least 80 images
per class), and is not suitable for object localization.
• ILSVRC [82]: Since 2010, the “ImageNet Large Scale Visual Recognition Challenge (ILSVRC)” has
been conducted every year for object detection and classification. The ILSVRC dataset contains
10 times more object classes than PASCAL VOC. It contains 200 object classes, whereas the
PASCAL VOC dataset contains only 20 object classes.
• PASCAL VOC [83]: The “Pattern Analysis, Statistical Modelling and Computational Learning
(PASCAL) Visual Object Classes (VOC)” challenge, run from 2008 to 2012, provides standardized
image datasets for object class recognition tasks, together with a common set of tools for accessing
them, enabling evaluation and comparison of the performances of various object detection
methods. Most object detection research is evaluated on the MS-COCO and PASCAL VOC
datasets.
Table 5 gives a brief summary of the main stages of PASCAL VOC development and its image
dataset statistics [4]; the challenge introduced new tasks every year.
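Several of the datasets above are exposed directly by the deep learning frameworks discussed in Section 3. As an illustrative sketch using torchvision (the download paths are placeholders), CIFAR-100 and a PASCAL VOC detection split can be loaded as follows:

```python
from torchvision import datasets

# CIFAR-100: 100 fine classes (torchvision exposes the fine labels).
cifar = datasets.CIFAR100(root="./data", train=True, download=True)
image, fine_label = cifar[0]

# PASCAL VOC 2007 detection split: targets are parsed from the XML annotations
# and carry the ground-truth bounding boxes.
voc = datasets.VOCDetection(root="./data", year="2007",
                            image_set="train", download=True)
img, target = voc[0]
print(list(target["annotation"].keys()))  # filename, size, object(s), ...
```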
Table 7. Comparison of object detection datasets: PASCAL VOC and ILSVRC.
Table 8 describes possible directions to overcome the major challenges and difficulties faced in
pedestrian detection. Table 9 lists papers that applied deep learning-based techniques to handle dense
and occluded pedestrian detection.
The widely used datasets for evaluating the pedestrian detection performance are
PASCAL-VOC [83], INRIA [13], KITTI [47], CalTech [81], ETH [75], and CityPersons [73].
Table 9 (fragment). Reported methods include feature fusion [99] and “bootstrap” hard-example
strategies [41,99,115,116].
• "Boosted decision trees" [11,12,135], as they are easy to compute, but for complex scenes they
provide only low detection accuracy.
• CNNs which are used to speed-up detection where the computation of features is shared [136,137].
Deep learning-based face detection methods build on detectors such as Faster RCNN and SSD.
Table 10 describes possible directions to overcome the major challenges and difficulties faced in
face detection.
Challenges                              Methods
To speed up face detection              Cascaded detection [138,139]; “predict the scale distribution
                                        of the faces in an image and then run the algorithm on some
                                        selected scales” [42,140,141]
To improve multi-pose face detection    “Face calibration” using progressive calibration through
                                        multiple detection stages [142]
The widely used datasets for evaluating face detection performance are WIDERFACE [71],
PASCAL-VOC [5], and Face Detection Dataset and Benchmark (FDDB) [72].
A “CAD system based on a deep CNN model to detect breast cancer” in MRI images was proposed
in [181].
Abramoff et al. [182] proposed the “CNN technique to detect diabetic retinopathy in fundus
images” using public datasets. A “3D group-equivariant CNN technique for lung nodule detection”
in CT images was proposed in [183]. Recently, deep learning-based techniques have been used for
diagnosing retinal diseases [184,185]. Li et al. [177] introduced a CNN-based attention mechanism for
glaucoma detection. A deep CNN model [186] was introduced for melanoma detection, and there
was also a “Deep CNN for the detection of chronic obstructive pulmonary disease (COPD) and acute
respiratory disease (ARD) prediction” in CT images of smokers [187].
The widely used datasets for evaluating medical image analysis performance are different for
different diseases. ILD [188] and LIDC-IDRI [189] datasets for the lung; ADNI [190], BRATS [191],
and OASIS-3 [175,192] datasets for the brain; DDSM [193], MIAS [194], CAMELYON 17 [195],
and INbreast [196] datasets for the breast; DRIVE [197], STARE [198], and MESSIDOR-2 [199] datasets
for the eye.
The major challenge faced in the medical imaging field is the imbalance of samples in available
datasets, so there is a need to develop large-scale medical imaging datasets. The best solution is to
apply multi-task learning on the deep neural network when the training data are scarce. The other
possible solution is applying data augmentation techniques to the images.
Pham et al. [212] extended the “3DOP proposal generation considering class-independent
proposals, then re-rank[ed] the proposals” using both monocular images and depth maps. Li et al. [213]
used “a cylindrical projection mapping and a fully-convolutional network (FCN) to predict 3D
bounding boxes around vehicles only.” Qi et al. [214] proposed Frustum Point-Net, which “generates
region proposals on the image plane with monocular images and use[s] the point cloud” to perform
classification and bounding box regression.
Wang et al. [147] implemented an approach based on an adversarial network. They used a
“Spatial Dropout Network and Spatial Transformer Network based on adversarial network[s]” to
generate features of occlusion and deformation [20]. Fine-grained object detection requires finding
minute differences among closely related object classes; Chuang et al. [254] “integrated CNN with
the part-based method” by introducing a co-occurrence layer. Table 11 compares how various deep
learning-based object detection methods overcome the challenges and difficulties that arise.
Currently, various acceleration techniques are adopted to speed up object detection. They are
mainly classified into three types: speeding up the detection pipeline, the detection engine, and the
numerical computation, as shown in Figure 15.
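As one concrete example of speeding up numerical computation, post-training dynamic quantization stores weights as 8-bit integers. The sketch below applies PyTorch's torch.quantization.quantize_dynamic to an assumed classifier head; layer and backend support varies, so this is illustrative rather than a recipe for a full detector:

```python
import torch
import torch.nn as nn

# Placeholder classifier head standing in for part of a detection network.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

# Store Linear weights as int8 and quantize activations on the fly.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # same interface; smaller and faster on CPU
```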
It is clear from this survey that deep learning-based CNNs are applicable for fine-grained object
localization, detection, and generic object detection. In object detection, CNNs automatically learn the
features and they form the backbone of modern object detection methods.
Table 11. Comparisons to overcome challenges using various deep learning-based object detection methods.
7.1. Raspberry Pi 4
The latest, cheapest, and most flexible tiny product in the popular Raspberry Pi range of computers
is Raspberry Pi 4 Model B. It offers great progress in processor speed, multimedia performance, memory,
and connectivity compared to existing Raspberry Pi 3 Model B+. The key features of the Raspberry Pi
4 [260] module include a high-performance Broadcom BCM2711, a quad-core Cortex-A72 (ARM v8)
64-bit SoC, a pair of micro-HDMI ports which are used to connect dual displays with 4k resolution,
H.265 and H.264 hardware video decoding (maximally supporting up to 4Kp60 ), 8GB RAM, dual-band
2.4/5.0 GHz IEEE 802.11ac wireless (wireless LAN), OpenGL ES, 3.0 graphics, Bluetooth 5.0, Bluetooth
Low Energy (BLE), standard 40-pin GPIO, Gigabit Ethernet (2 × USB 2.0 ports, 3.0 ports), a micro SD
card slot for loading OS and data storage, operating temperature 40–50◦ , and power over Ethernet
(PoE) enabled (requires separate PoE HAT).
Table 13. Performance comparison between Jetson modules and GTX 1080 for real-time object detection.

S.No  Architecture   TX1 (FPS)   TX2 (FPS)   Xavier AGX (FPS)   GPU-Based Workstation GTX 1080 (FPS)
1     YOLOv2         3           10          25–30              27
2     YOLOv3         —           4           15–18              15.8
3     Tiny YOLOv3    8–10        11          31                 31+
4     SSD            8           10–12       34–49              33
5     Faster RCNN    —           1           1.2                —
6     Mask RCNN      —           —           —                  3–4
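FPS figures such as those in Table 13 are typically obtained with a warm-up pass followed by a timed loop; a minimal sketch (the backbone and input resolution are placeholders) is shown below. On a GPU, torch.cuda.synchronize() calls around the timer would be needed for accurate numbers:

```python
import time
import torch
from torchvision.models import mobilenet_v2

model = mobilenet_v2().eval()          # placeholder detector backbone
x = torch.randn(1, 3, 300, 300)        # placeholder input resolution

with torch.no_grad():
    for _ in range(5):                 # warm-up: caches, allocator, clocks
        model(x)
    start = time.perf_counter()
    n = 50
    for _ in range(n):                 # timed inference loop
        model(x)
    fps = n / (time.perf_counter() - start)

print(f"{fps:.1f} FPS")                # compare against Table 13 entries
```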
8. Research Directions
Despite the great progress achieved in object detection, the technology still remains far from
human vision when addressing real-world challenges, such as detecting objects under constrained
conditions, working in an open world, and handling other modalities. Based on these challenges,
we see the following directions for future research:
1. More efficient detection frameworks: The success of object detection is largely due to the
development of superior detection frameworks, both two-stage and one-stage (RCNN,
Fast/Faster/Mask RCNN, YOLO, and SSD). Two-stage detectors exhibit high accuracy, whereas
single-stage detectors are simpler and faster. Object detectors depend heavily on their underlying
backbone models, most of which are optimized for image classification, possibly causing a
learning bias; it could therefore be helpful to develop new object detectors trained from scratch.
2. Compact and efficient CNN features: CNNs have grown in depth from a few layers (AlexNet) to
hundreds of layers (ResNet, ResNeXt, CenterNet, DenseNet). All these networks require a lot of
data and high-end GPUs for training because of their enormous numbers of parameters. Thus,
to reduce network redundancy further, researchers should take an interest in designing
lightweight and compact networks.
3. Weakly supervised detection: At present, all state-of-the-art detectors are fully supervised models
that use labeled data with either object segmentation masks or bounding boxes. In the absence
of labeled training data, however, fully supervised learning is not scalable, so it is essential to
design models that work when only partially labeled data are available.
4. Efficient backbone architecture for object detection: Object detectors commonly adopt the
weights of classification models pre-trained on large-scale datasets. However, adopting a
pre-trained model might not yield an optimal solution, due to the conflicts between the image
classification and object detection tasks. Currently, most object detectors are based on
classification backbones, and only a few use different backbone models (like SqueezeDet, based
on SqueezeNet). There is thus a need to develop detection-aware, lightweight backbone models
for real-time object detection.
5. Object detection in other modalities: Currently, most object detectors work only with 2D images,
but detection in other modalities (3D, LIDAR, etc.) would be highly relevant in application
areas such as self-driving cars [274], drones, and robots. However, 3D object detection raises
new challenges in using video, depth, and point clouds.
6. Network optimization: Selecting an optimal detection network means striking the right balance
between speed, memory, and accuracy for a specific application and its embedded hardware.
Compact models with few parameters are preferable even though they reduce detection accuracy;
this loss might be recovered by introducing hint learning, knowledge distillation, and better
pre-training schemes (a minimal distillation-loss sketch follows this list).
7. Scale adaptation [19]: Objects usually exist at different scales, which is most evident in face
detection and crowd detection. To increase robustness to spatial transformations, it is necessary
to train detectors in scale-invariant, multi-scale, or scale-adaptive ways.
(a) For scale-adaptive detectors, attention mechanisms, cascaded networks, and scale
distribution estimation help detect objects adaptively.
(b) For multi-scale detectors, both GANs (generative adversarial networks) and FPNs (feature
pyramid networks) can generate multi-scale feature maps.
(c) For scale-invariant detectors, reverse connections, hard negative mining, and backbone
models such as AlexNet (rotation invariance) and ResNet are all beneficial.
8. Cross-dataset training [275]: Cross-dataset training for object detection aims to detect the union
of all the classes across different existing datasets with a single model and without additional
labeling, which in turn saves the heavy burden of labeling new classes on all the existing datasets.
Using cross-dataset training, one only needs to label the new classes on the new dataset. This is
widely useful in industrial applications, which usually face increasing numbers of classes.
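Returning to point 6 above, the knowledge distillation mentioned there trains a compact student to match the temperature-softened outputs of a large teacher, in addition to the hard labels. A minimal sketch, with the temperature T and weighting α as assumed hyperparameters:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Weighted sum of a soft-target KL term and hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                      # T^2 keeps the gradient scale comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student = torch.randn(8, 20, requires_grad=True)  # compact model outputs
teacher = torch.randn(8, 20)                      # large model outputs
labels = torch.randint(0, 20, (8,))
print(distillation_loss(student, teacher, labels))
```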
Object detection is domain-independent, and the research in many fields is still far from complete.
Although object detection is a fundamental problem for which many methods have already been
developed, there is still huge scope for new mechanisms, and for offering object detection as a basic
service in real-time applications such as deep-sea bases, driverless cars, robots navigating on planets,
industrial plants, and drone cameras, where high precision is expected for certain tasks.
In particular, for detecting small objects there remains a large gap between human eyes and
machine vision. The future of object detection may lie in AutoML, i.e., designing detection models
with reduced human intervention. More accurate detection requires more data; to improve overall
accuracy further, training images with more diversity (scale, view angle) of each object are needed.
Finally, we point out that promising future directions in this research field are not limited to the
aforementioned aspects, and the research in this field is still far from complete.
Author Contributions: Conceptualization, C.B.M. and M.F.H.; methodology, C.B.M. and M.F.H.; software,
C.B.M., M.F.H., and N.D.B.; validation, C.B.M., M.F.H., and N.D.B.; formal analysis, C.B.M., M.F.H.,
and N.D.B.; investigation, C.B.M. and M.F.H.; resources, C.B.M. and M.F.H.; data curation, C.B.M. and
M.F.H.; writing—original draft preparation, C.B.M., M.F.H., and N.D.B.; writing—review and editing, C.B.M.,
M.F.H., N.D.B., and Z.W.G.; visualization, M.F.H. and N.D.B.; supervision, N.D.B., M.F.H., and Z.W.G.; project
administration, C.B.M., M.F.H., N.D.B., and Z.W.G.; funding acquisition, Z.W.G. and N.D.B. All authors have read
and agreed to the published version of the manuscript.
Funding: This research was supported by the Energy Cloud R&D Program through the National Research
Foundation of Korea (NRF) funded by the Ministry of Science, ICT [2019M3F2A1073164].
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
2D 2-dimensional
ALPR Automatic License Plate Recognition
AP Average Precision
API Application Program Interface
ARD Acute Respiratory Disease
ARM Advanced RISC Machine
AV Autonomous Vehicle
BB Bounding Box
BCNN Binarized deep Convolutional Neural Network
BLE Bluetooth Low Energy
BSP Board Support Package
CAD Computer-Aided Diagnosis
CAN Controller Area Network
CNN Convolutional Neural Network
COCO Common Objects in Context
COPD Chronic Obstructive Pulmonary Disease
CPU Central Processing Unit
CUDA Compute Unified Device Architecture
CV Computer Vision
DBN Deep Belief Network
DCNN Deep Convolutional Neural Network
DDR Double Data Rate
DNN Deep Neural Network
DPM Deformable Part-based Model
FC Fully-Connected
FCN Fully-Convolutional Network
FDDB Face Detection Dataset and Benchmark
FPGA Field Programmable Gate Array
FPN Feature Pyramid Networks
FPS Frames Per Second
References
1. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks.
In Advances in Neural Information Processing Systems; Curran Associates, Inc.: San Diego, CA, USA, 2012;
pp. 1097–1105.
2. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal
networks. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: San Diego, CA, USA,
2015; pp. 91–99.
3. Nguyen, H.T.; Lee, E.H.; Lee, S. Study on the Classification Performance of Underwater Sonar Image
Classification Based on Convolutional Neural Networks for Detecting a Submerged Human Body. Sensors
2020, 20, 94. [CrossRef]
4. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.;
Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
[CrossRef]
5. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc)
challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [CrossRef]
6. Fourie, J.; Mills, S.; Green, R. Harmony filter: A robust visual tracking system using the improved harmony
search algorithm. Image Vis. Comput. 2010, 28, 1702–1716. [CrossRef]
7. Cuevas, E.; Ortega-Sánchez, N.; Zaldivar, D.; Pérez-Cisneros, M. Circle detection by harmony search
optimization. J. Intell. Robot. Syst. 2012, 66, 359–376. [CrossRef]
8. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef]
9. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006,
313, 504–507. [CrossRef]
10. McIvor, A.M. Background subtraction techniques. Proc. Image Vis. Comput. 2000, 4, 3099–3104.
11. Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of
the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Kauai, HI, USA,
8–14 December 2001; Volume 1, p. I.
12. Viola, P.; Jones, M.J. Robust real-time face detection. Int. J. Comput. Vis. 2004, 57, 137–154. [CrossRef]
13. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE
Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–25 June
2005; Volume 1, pp. 886–893.
14. Felzenszwalb, P.; McAllester, D.; Ramanan, D. A discriminatively trained, multiscale, deformable part
model. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Anchorage, AK,
USA, 23–28 June 2008; pp. 1–8.
15. Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D. Cascade object detection with deformable part models.
In Proceedings of the Computer Society Conference on Computer Vision and Pattern Recognition,
San Francisco, CA, USA, 13–18 June 2010; pp. 2241–2248.
16. Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D.; Ramanan, D. Object detection with discriminatively trained
part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 1627–1645. [CrossRef]
17. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and
semantic segmentation. In Proceedings of the Conference on Computer Vision and Pattern Recognition,
Columbus, OH, USA, 24–27 June 2014; pp. 580–587.
18. Zou, Z.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. arXiv 2019, arXiv:1905.05055.
19. Liu, L.; Ouyang, W.; Wang, X.; Fieguth, P.; Chen, J.; Liu, X.; Pietikäinen, M. Deep learning for generic object
detection: A survey. Int. J. Comput. Vis. 2020, 128, 261–318. [CrossRef]
20. Pathak, A.R.; Pandey, M.; Rautaray, S. Application of deep learning for object detection. Procedia Comput.
Sci. 2018, 132, 1706–1717. [CrossRef]
21. Sultana, F.; Sufian, A.; Dutta, P. A review of object detection models based on convolutional neural network.
arXiv 2019, arXiv:1905.01614.
22. Zhao, Z.Q.; Zheng, P.; Xu, S.t.; Wu, X. Object detection with deep learning: A review. IEEE Trans. Neural
Netw. Learn. Syst. 2019, 30, 3212–3232. [CrossRef] [PubMed]
23. Mittal, U.; Srivastava, S.; Chawla, P. Review of different techniques for object detection using deep learning.
In Proceedings of the Third International Conference on Advanced Informatics for Computing Research,
Shimla, India, 15–16 June 2019; pp. 1–8.
24. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the International
Conference on Computer Vision, Kerkyra, Corfu, Greece, 20–25 September 1999; Volume 2, pp. 1150–1157.
25. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
[CrossRef]
26. Belongie, S.; Malik, J.; Puzicha, J. Shape matching and object recognition using shape contexts. IEEE Trans.
Pattern Anal. Mach. Intell. 2002, 24, 509–522. [CrossRef]
27. Girshick, R.B.; Felzenszwalb, P.F.; Mcallester, D.A. Object detection with grammar models. In Advances in
Neural Information Processing Systems; Curran Associates Inc.: San Francisco, CA, USA, 2011; pp. 442–450.
28. Girshick, R.B. From Rigid Templates to Grammars: Object Detection with Structured Models. Ph.D. Thesis,
The University of Chicago, Chicago, IL, USA, 2012.
29. Li, Y.F.; Kwok, J.T.; Tsang, I.W.; Zhou, Z.H. A convex method for locating regions of interest with
multi-instance learning. In Proceedings of the Joint European Conference on Machine Learning and
Knowledge Discovery in Databases, Antwerp, Belgium, 14–18 September 2009; Springer: Berlin, Germany,
2009; pp. 15–30.
30. Uijlings, J.R.; Van De Sande, K.E.; Gevers, T.; Smeulders, A.W. Selective search for object recognition. Int. J.
Comput. Vis. 2013, 104, 154–171. [CrossRef]
31. Girshick, R.B.; Felzenszwalb, P.F.; McAllester, D. Discriminatively Trained Deformable Part Models, Release
5. 2012. Available online: http://people.cs.uchicago.edu/~rbg/latent-release5/ (accessed on 7 May 2020).
32. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual
recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [CrossRef]
33. Girshick, R. Fast R-CNN. In Proceedings of the International Conference on Computer Vision, Santiago,
Chile, 13–16 December 2015; pp. 1440–1448.
34. Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Proceedings of the
European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin, Germany,
2014; pp. 818–833.
35. Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object detection via region-based fully convolutional networks.
In Advances in Neural Information Processing Systems; Curran Associates Inc.: San Francisco, CA, USA, 2016;
pp. 379–387.
36. Li, Z.; Peng, C.; Yu, G.; Zhang, X.; Deng, Y.; Sun, J. Light-head R-CNN: In defense of two-stage object
detector. arXiv 2017, arXiv:1711.07264.
37. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object
detection. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Honolulu, HI,
USA, 21–26 July 2017; pp. 2117–2125.
38. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International
Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
39. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016;
pp. 770–778.
40. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection.
In Proceedings of the Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA,
26 June–1 July 2016; pp. 779–788.
41. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox
detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands,
8–16 October 2016; Springer: Berlin, Germany, 2016; pp. 21–37.
42. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the Conference on Computer
Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
43. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
44. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the
International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
45. Wu, B.; Iandola, F.; Jin, P.H.; Keutzer, K. SqueezeDet: Unified, small, low power fully convolutional neural
networks for real-time object detection for autonomous driving. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 129–137.
46. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level
accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
47. Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale
dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3974–3983.
48. Law, H.; Deng, J. Cornernet: Detecting objects as paired keypoints. In Proceedings of the European
Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 734–750.
49. Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. Centernet: Object detection with keypoint triplets. arXiv
2019, arXiv:1904.08189.
50. Fu, C.Y.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A.C. Dssd: Deconvolutional single shot detector. arXiv 2017,
arXiv:1701.06659.
51. Mathematica. Available online: https://www.wolfram.com/mathematica/ (accessed on 31 December 2019).
52. Dlib. Available online: Dlib.net (accessed on 31 December 2019).
53. Theano. Available online: http://deeplearning.net/software/theano/ (accessed on 31 December 2019).
54. Caffe. Available online: http://caffe.berkeleyvision.org/ (accessed on 31 December 2019).
55. Deeplearning4j. Available online: https://deeplearning4j.org (accessed on 31 December 2019).
56. Chainer. Available online: https://chainer.org (accessed on 31 December 2019).
57. Keras. Available online: https://keras.io/ (accessed on 31 December 2019).
58. Mathworks—Deep Learning. Available online: https://in.mathworks.com/solutions/deep-learning.html
(accessed on 31 December 2019).
59. Apache. Available online: http://singa.apache.org (accessed on 31 December 2019).
60. TensorFlow. Available online: https://www.tensorflow.org/ (accessed on 31 December 2019).
61. Pytorch. Available online: https://pytorch.org (accessed on 31 December 2019).
62. BigDL. Available online: https://github.com/intel-analytics/BigDL (accessed on 31 December 2019).
63. Apache. Available online: http://www.apache.org (accessed on 31 December 2019).
64. MXnet. Available online: http://mxnet.io/ (accessed on 31 December 2019).
65. Microsoft Cognitive Service. Available online: https://www.microsoft.com/cognitive-services/en-us/
computer-vision-api (accessed on 31 December 2019).
66. Amazon Rekognition. Available online: https://aws.amazon.com/rekognition/ (accessed on 31 December 2019).
67. IBM Watson Vision Recognition service. Available online: http://www.ibm.com/watson/developercloud/
visual-recognition.html (accessed on 31 December 2019).
68. Google Cloud Vision API. Available online: https://cloud.google.com/vision/ (accessed on 31 December 2019).
69. Cloud Sight. Available online: https://cloudsight.readme.io/v1.0/docs (accessed on 31 December 2019).
70. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database.
In Proceedings of the Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June
2009; pp. 248–255.
71. Yang, S.; Luo, P.; Loy, C.C.; Tang, X. Wider face: A face detection benchmark. In Proceedings of the
IEEE conference on computer vision and pattern recognition, Las Vegas, NV, USA, 26 June–1 July 2016;
pp. 5525–5533.
72. Jain, V.; Learned-Miller, E. FDDB: A Benchmark for Face Detection in Unconstrained Settings; Technical Report;
UMass Amherst Libraries: Amherst, MA, USA, 2010.
73. Zhang, S.; Benenson, R.; Schiele, B. Citypersons: A diverse dataset for pedestrian detection. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017;
pp. 3213–3221.
74. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B.
The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 3213–3223.
75. Ess, A.; Leibe, B.; Van Gool, L. Depth and appearance for mobile scene analysis. In Proceedings of the 2007
IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil, 14–21 October 2007; pp. 1–8.
76. Torralba, A.; Fergus, R.; Freeman, W.T. 80 million tiny images: A large data set for nonparametric object and
scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 1958–1970. [CrossRef]
77. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco:
Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich,
Switzerland, 5–12 September 2014; Springer: Berlin, Germany, 2014; pp. 740–755.
78. Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images. Master's Thesis, University of Toronto,
Toronto, ON, Canada, 2009.
79. Wah, C.; Branson, S.; Welinder, P.; Perona, P.; Belongie, S.; Goering, C.; Berg, T.; Belhumeur, P. Caltech-UCSD
Birds-200-2011; California Institute of Technology: Pasadena, CA, USA, 2011.
80. Welinder, P.; Branson, S.; Mita, T.; Wah, C.; Schroff, F.; Belongie, S.; Perona, P. Caltech-UCSD birds 200;
California Institute of Technology: Pasadena, CA, USA, 2010.
81. Griffin, G.; Holub, A.; Perona, P. Caltech-256 Object Category Dataset; California Institute of Technology:
Pasadena, CA, USA, 2007.
82. ILSVRC Detection Challenge Results. Available online: http://www.image-net.org/challenges/LSVRC/
(accessed on 31 December 2019).
83. Everingham, M.; Eslami, S.A.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object
classes challenge: A retrospective. Int. J. Comput. Vis. 2015, 111, 98–136. [CrossRef]
84. Russell, B.C.; Torralba, A.; Murphy, K.P.; Freeman, W.T. LabelMe: A database and web-based tool for image
annotation. Int. J. Comput. Vis. 2008, 77, 157–173. [CrossRef]
85. Xiao, J.; Hays, J.; Ehinger, K.A.; Oliva, A.; Torralba, A. Sun database: Large-scale scene recognition from abbey
to zoo. In Proceedings of the Computer Society Conference on Computer Vision and Pattern Recognition,
San Francisco, CA, USA, 13–18 June 2010; pp. 3485–3492.
86. Open Images. Available online: https://www.kaggle.com/bigquery/open-images (accessed on
31 December 2019).
87. Kragh, M.F.; Christiansen, P.; Laursen, M.S.; Larsen, M.; Steen, K.A.; Green, O.; Karstoft, H.; Jørgensen, R.N.
FieldSAFE: Dataset for obstacle detection in agriculture. Sensors 2017, 17, 2579. [CrossRef] [PubMed]
88. Grady, N.W.; Underwood, M.; Roy, A.; Chang, W.L. Big data: Challenges, practices and technologies:
NIST big data public working group workshop at IEEE big data 2014. In Proceedings of the International
Conference on Big Data, Washington, DC, USA, 27–30 October 2014; pp. 11–15.
89. Dollár, P.; Tu, Z.; Perona, P.; Belongie, S. Integral Channel Features; BMVC Press: London, UK, 2009.
90. Maji, S.; Berg, A.C.; Malik, J. Classification using intersection kernel support vector machines is efficient.
In Proceedings of the Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA,
23–28 June 2008; pp. 1–8.
91. Zhu, Q.; Yeh, M.C.; Cheng, K.T.; Avidan, S. Fast human detection using a cascade of histograms of oriented
gradients. In Proceedings of the Computer Society Conference on Computer Vision and Pattern Recognition,
New York, NY, USA, 17–22 June 2006; Volume 2, pp. 1491–1498.
92. Mohan, A.; Papageorgiou, C.; Poggio, T. Example-based object detection in images by components.
IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 349–361. [CrossRef]
93. Wang, X.; Han, T.X.; Yan, S. An HOG-LBP human detector with partial occlusion handling. In Proceedings
of the International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; pp. 32–39.
94. Wu, B.; Nevatia, R. Detection of multiple, partially occluded humans in a single image by bayesian
combination of edgelet part detectors. In Proceedings of the International Conference on Computer Vision,
Beijing, China, 17–21 October 2005; Volume 1, pp. 90–97.
95. Andreopoulos, A.; Tsotsos, J.K. 50 years of object recognition: Directions forward. Comput. Vis. Image
Underst. 2013, 117, 827–891. [CrossRef]
96. Sadeghi, M.A.; Forsyth, D. 30Hz object detection with DPM v5. In Proceedings of the European Conference
on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin, Germany, 2014; pp. 65–79.
97. Hosang, J.; Omran, M.; Benenson, R.; Schiele, B. Taking a deeper look at pedestrians. In Proceedings of the
Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 4073–4082.
98. Yi, Z.; Yongliang, S.; Jun, Z. An improved tiny-yolov3 pedestrian detection algorithm. Optik 2019, 183, 17–23.
[CrossRef]
99. Zhang, L.; Lin, L.; Liang, X.; He, K. Is faster R-CNN doing well for pedestrian detection? In Proceedings of
the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer:
Berlin, Germany, 2016; pp. 443–457.
100. Song, T.; Sun, L.; Xie, D.; Sun, H.; Pu, S. Small-scale pedestrian detection based on somatic topology
localization and temporal feature aggregation. arXiv 2018, arXiv:1807.01438.
101. Cao, J.; Pang, Y.; Li, X. Learning multilayer channel features for pedestrian detection. IEEE Trans.
Image Process. 2017, 26, 3210–3220. [CrossRef] [PubMed]
102. Mao, J.; Xiao, T.; Jiang, Y.; Cao, Z. What can help pedestrian detection? In Proceedings of the Conference on
Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3127–3136.
103. Krishna, H.; Jawahar, C. Improving small object detection. In Proceedings of the 4th IAPR Asian Conference
on Pattern Recognition (ACPR), Nanjing, China, 26–29 November 2017; pp. 340–345.
104. Hu, Q.; Wang, P.; Shen, C.; van den Hengel, A.; Porikli, F. Pushing the limits of deep cnns for pedestrian
detection. IEEE Trans. Circuits Syst. Video Technol. 2017, 28, 1358–1368. [CrossRef]
105. Lee, Y.; Bui, T.D.; Shin, J. Pedestrian detection based on deep fusion network using feature correlation.
In Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and
Conference (APSIPA ASC), Honolulu, HI, USA, 12–15 November 2018; pp. 694–699.
106. Cai, Z.; Saberian, M.; Vasconcelos, N. Learning complexity-aware cascades for deep pedestrian detection.
In Proceedings of the IEEE International Conference on Computer Vision, Las Condes, Chile, 11–18 December
2015; pp. 3361–3369.
107. Bosquet, B.; Mucientes, M.; Brea, V.M. STDnet: Exploiting high resolution feature maps for small object
detection. Eng. Appl. Artif. Intell. 2020, 91, 103615. [CrossRef]
108. Tian, Y.; Luo, P.; Wang, X.; Tang, X. Deep learning strong parts for pedestrian detection. In Proceedings of
the International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1904–1912.
109. Ouyang, W.; Zhou, H.; Li, H.; Li, Q.; Yan, J.; Wang, X. Jointly learning deep features, deformable parts,
occlusion and classification for pedestrian detection. IEEE Trans. Pattern Anal. Mach. Intell. 2017,
40, 1874–1887. [CrossRef]
110. Zhang, S.; Yang, J.; Schiele, B. Occluded pedestrian detection through guided attention in CNNs.
In Proceedings of the Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA,
18–23 June 2018; pp. 6995–7003.
111. Gao, M.; Yu, R.; Li, A.; Morariu, V.I.; Davis, L.S. Dynamic zoom-in network for fast object detection in large
images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City,
UT, USA, 18–23 June 2018; pp. 6926–6935.
112. Lu, Y.; Javidi, T.; Lazebnik, S. Adaptive object detection using adjacency and zoom prediction. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July
2016; pp. 2351–2359.
113. Wang, X.; Xiao, T.; Jiang, Y.; Shao, S.; Sun, J.; Shen, C. Repulsion loss: Detecting pedestrians in a crowd.
In Proceedings of the Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA,
18–23 June 2018; pp. 7774–7783.
114. Tian, Y.; Luo, P.; Wang, X.; Tang, X. Pedestrian detection aided by deep learning semantic tasks.
In Proceedings of the Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June
2015; pp. 5079–5087.
115. Shrivastava, A.; Gupta, A.; Girshick, R. Training region-based object detectors with online hard example
mining. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Las Vegas, NV,
USA, 26 June–1 July 2016; pp. 761–769.
116. Tang, T.; Zhou, S.; Deng, Z.; Zou, H.; Lei, L. Vehicle detection in aerial images based on region convolutional
neural networks and hard negative example mining. Sensors 2017, 17, 336. [CrossRef]
117. Zhang, S.; Wen, L.; Bian, X.; Lei, Z.; Li, S.Z. Single-shot refinement neural network for object detection.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT,
USA, 18–23 June 2018; pp. 4203–4212.
118. Jin, J.; Fu, K.; Zhang, C. Traffic sign recognition with hinge loss trained convolutional neural networks.
IEEE Trans. Intell. Transp. Syst. 2014, 15, 1991–2000. [CrossRef]
119. Zhou, M.; Jing, M.; Liu, D.; Xia, Z.; Zou, Z.; Shi, Z. Multi-resolution networks for ship detection in infrared
remote sensing images. Infrared Phys. Technol. 2018, 92, 183–189. [CrossRef]
120. Xu, D.; Ouyang, W.; Ricci, E.; Wang, X.; Sebe, N. Learning cross-modal deep representations for robust
pedestrian detection. In Proceedings of the Conference on Computer Vision and Pattern Recognition,
Honolulu, HI, USA, 21–26 July 2017; pp. 5363–5371.
121. Zhang, S.; Wen, L.; Bian, X.; Lei, Z.; Li, S.Z. Occlusion-aware R-CNN: Detecting pedestrians in a crowd.
In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September
2018; pp. 637–653.
122. Zhou, C.; Yuan, J. Bi-box regression for pedestrian detection and occlusion estimation. In Proceedings of the
European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 135–151.
123. Hsu, W.Y. Automatic pedestrian detection in partially occluded single image. Integr. Comput.-Aided Eng.
2018, 25, 369–379. [CrossRef]
124. Ren, Y.; Zhu, C.; Xiao, S. Deformable faster r-cnn with aggregating multi-layer features for partially occluded
object detection in optical remote sensing images. Remote Sens. 2018, 10, 1470. [CrossRef]
125. Li, W.; Ni, H.; Wang, Y.; Fu, B.; Liu, P.; Wang, S. Detection of partially occluded pedestrians by an enhanced
cascade detector. IET Intell. Transp. Syst. 2014, 8, 621–630. [CrossRef]
126. Yang, G.; Huang, T.S. Human face detection in a complex background. Pattern Recognit. 1994, 27, 53–63.
[CrossRef]
127. Craw, I.; Tock, D.; Bennett, A. Finding face features. In Proceedings of the European Conference on
Computer Vision, Santa Margherita Ligure, Italy, 9–22 May 1992; Springer: Berlin, Germany, 1992; pp. 92–96.
128. Turk, M.; Pentland, A. Eigenfaces for recognition. J. Cogn. Neurosci. 1991, 3, 71–86. [CrossRef] [PubMed]
129. Vaillant, R.; Monrocq, C.; Le Cun, Y. Original approach for the localisation of objects in images. IEE Proc.
Vision. Image Signal Process. 1994, 141, 245–250. [CrossRef]
130. Pentland; Moghaddam; Starner. View-based and modular eigenspaces for face recognition. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 21–23 June 1994;
pp. 84–91.
131. Rowley, H.A.; Baluja, S.; Kanade, T. Human face detection in visual scenes. In Advances in Neural Information
Processing Systems; Curran Associates Inc.: San Francisco, CA, USA, 1996; pp. 875–881.
132. Rowley, H.A.; Baluja, S.; Kanade, T. Neural network-based face detection. IEEE Trans. Pattern Anal. Mach.
Intell. 1998, 20, 23–38. [CrossRef]
133. Osuna, E.; Freund, R.; Girosit, F. Training support vector machines: An application to face detection.
In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition,
San Juan, PR, USA, 17–19 June 1997; pp. 130–136.
134. Byun, H.; Lee, S.W. Applications of support vector machines for pattern recognition: A survey.
In Proceedings of the International Workshop on Support Vector Machine, Niagara Falls, ON, Canada,
10 August 2002; Springer: Berlin, Germany, 2002; pp. 213–236.
135. Xiao, R.; Zhu, L.; Zhang, H.J. Boosting chain learning for object detection. In Proceedings of the Ninth IEEE
International Conference on Computer Vision, Nice, France, 14–17 October 2003; pp. 709–715.
136. Zhang, Y.; Zhao, D.; Sun, J.; Zou, G.; Li, W. Adaptive convolutional neural network and its application in
face recognition. Neural Process. Lett. 2016, 43, 389–399. [CrossRef]
137. Wu, S.; Kan, M.; Shan, S.; Chen, X. Hierarchical Attention for Part-Aware Face Detection. Int. J. Comput. Vis.
2019, 127, 560–578. [CrossRef]
138. Li, H.; Lin, Z.; Shen, X.; Brandt, J.; Hua, G. A convolutional neural network cascade for face detection.
In Proceedings of the IEEE conference on computer vision and pattern recognition, Boston, MA, USA,
7–12 June 2015; pp. 5325–5334.
139. Zhang, K.; Zhang, Z.; Li, Z.; Qiao, Y. Joint face detection and alignment using multitask cascaded
convolutional networks. IEEE Signal Process. Lett. 2016, 23, 1499–1503. [CrossRef]
140. Hao, Z.; Liu, Y.; Qin, H.; Yan, J.; Li, X.; Hu, X. Scale-aware face detection. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6186–6195.
141. Najibi, M.; Samangouei, P.; Chellappa, R.; Davis, L.S. SSH: Single stage headless face detector. In
Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017;
pp. 4875–4884.
142. Shi, X.; Shan, S.; Kan, M.; Wu, S.; Chen, X. Real-time rotation-invariant face detection with progressive calibration networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2295–2303.
143. Chen, D.; Hua, G.; Wen, F.; Sun, J. Supervised transformer network for efficient face detection. In Proceedings
of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer:
Berlin, Germany, 2016; pp. 122–138.
144. Yang, S.; Luo, P.; Loy, C.C.; Tang, X. Faceness-Net: Face detection through deep facial part responses. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 1845–1859. [CrossRef]
145. Ghodrati, A.; Diba, A.; Pedersoli, M.; Tuytelaars, T.; Van Gool, L. DeepProposal: Hunting objects by cascading deep convolutional layers. In Proceedings of the IEEE International Conference on Computer Vision, Las Condes, Chile, 11–18 December 2015; pp. 2578–2586.
146. Wang, J.; Yuan, Y.; Yu, G. Face attention network: An effective face detector for the occluded faces. arXiv
2017, arXiv:1711.07246.
147. Wang, X.; Shrivastava, A.; Gupta, A. A-Fast-RCNN: Hard positive generation via adversary for object detection. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2606–2615.
148. Zhou, Y.; Liu, D.; Huang, T. Survey of face detection on low-quality images. In Proceedings of the 2018 13th
IEEE International Conference on Automatic Face & Gesture Recognition, Xi’an, China, 15–19 May 2018;
pp. 769–773.
149. Yang, S.; Xiong, Y.; Loy, C.C.; Tang, X. Face detection through scale-friendly deep convolutional networks.
arXiv 2017, arXiv:1706.02863.
150. Zhang, S.; Zhu, X.; Lei, Z.; Shi, H.; Wang, X.; Li, S.Z. S3FD: Single shot scale-invariant face detector. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 192–201.
151. Cai, Z.; Fan, Q.; Feris, R.S.; Vasconcelos, N. A unified multi-scale deep convolutional neural network for fast object detection. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Berlin, Germany, 2016; pp. 354–370.
152. Zhang, C.; Xu, X.; Tu, D. Face detection using improved Faster RCNN. arXiv 2018, arXiv:1802.02142.
153. Li, Y.; Chen, Y.; Wang, N.; Zhang, Z. Scale-aware trident networks for object detection. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, South Korea, 27 October–2 November 2019; pp. 6054–6063.
154. Li, Z.; Peng, C.; Yu, G.; Zhang, X.; Deng, Y.; Sun, J. DetNet: A backbone network for object detection. arXiv 2018, arXiv:1804.06215.
155. Liu, S.; Huang, D.; Wang, Y. Receptive field block net for accurate and fast object detection. In Proceedings of
the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 385–400.
156. Zhang, F.; Du, B.; Zhang, L.; Xu, M. Weakly supervised learning based on coupled convolutional neural
networks for aircraft detection. IEEE Trans. Geosci. Remote. Sens. 2016, 54, 5553–5563. [CrossRef]
157. Han, J.; Zhang, D.; Cheng, G.; Guo, L.; Ren, J. Object detection in optical remote sensing images based
on weakly supervised learning and high-level feature learning. IEEE Trans. Geosci. Remote. Sens. 2014,
53, 3325–3337. [CrossRef]
158. Li, Q.; Wang, Y.; Liu, Q.; Wang, W. Hough transform guided deep feature extraction for dense building
detection in remote sensing images. In Proceedings of the IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 1872–1876.
159. Mou, L.; Zhu, X.X. Vehicle instance segmentation from aerial image and video using a multitask learning
residual fully convolutional network. IEEE Trans. Geosci. Remote. Sens. 2018, 56, 6699–6711. [CrossRef]
160. Chen, X.; Xiang, S.; Liu, C.L.; Pan, C.H. Vehicle detection in satellite images by hybrid deep convolutional
neural networks. IEEE Geosci. Remote Sens. Lett. 2014, 11, 1797–1801. [CrossRef]
161. Ammour, N.; Alhichri, H.; Bazi, Y.; Benjdira, B.; Alajlan, N.; Zuair, M. Deep learning approach for car
detection in UAV imagery. Remote Sens. 2017, 9, 312. [CrossRef]
162. Ma, W.; Guo, Q.; Wu, Y.; Zhao, W.; Zhang, X.; Jiao, L. A novel multi-model decision fusion network for
object detection in remote sensing images. Remote Sens. 2019, 11, 737. [CrossRef]
163. Zhang, X.; Zhu, K.; Chen, G.; Tan, X.; Zhang, L.; Dai, F.; Liao, P.; Gong, Y. Geospatial object detection on high
resolution remote sensing imagery based on double multi-scale feature pyramid network. Remote Sens. 2019,
11, 755. [CrossRef]
164. Wang, J.; Ding, J.; Guo, H.; Cheng, W.; Pan, T.; Yang, W. Mask OBB: A Semantic Attention-Based Mask
Oriented Bounding Box Representation for Multi-Category Object Detection in Aerial Images. Remote Sens.
2019, 11, 2930. [CrossRef]
165. Cheng, G.; Zhou, P.; Han, J. Learning rotation-invariant convolutional neural networks for object detection
in VHR optical remote sensing images. IEEE Trans. Geosci. Remote. Sens. 2016, 54, 7405–7415. [CrossRef]
166. Li, Q.; Mou, L.; Xu, Q.; Zhang, Y.; Zhu, X.X. R3-net: A deep network for multi-oriented vehicle detection in
aerial images and videos. arXiv 2018, arXiv:1808.05560.
167. Pang, J.; Li, C.; Shi, J.; Xu, Z.; Feng, H. R2-CNN: Fast Tiny Object Detection in Large-Scale Remote Sensing
Images. IEEE Trans. Geosci. Remote. Sens. 2019, 57, 5512–5524. [CrossRef]
168. Qian, X.; Lin, S.; Cheng, G.; Yao, X.; Ren, H.; Wang, W. Object Detection in Remote Sensing Images Based on
Improved Bounding Box Regression and Multi-Level Features Fusion. Remote Sens. 2020, 12, 143. [CrossRef]
169. Cheng, G.; Han, J.; Zhou, P.; Guo, L. Multi-class geospatial object detection and geographic image
classification based on collection of part detectors. ISPRS J. Photogramm. Remote. Sens. 2014, 98, 119–132.
[CrossRef]
170. Liu, K.; Mattyus, G. Fast multiclass vehicle detection on aerial images. IEEE Geosci. Remote. Sens. Lett. 2015,
12, 1938–1942.
171. Razakarivony, S.; Jurie, F. Vehicle detection in aerial imagery: A small target detection benchmark. J. Vis.
Commun. Image Represent. 2016, 34, 187–203. [CrossRef]
172. Zhang, Y.; Yuan, Y.; Feng, Y.; Lu, X. Hierarchical and robust convolutional neural network for very
high-resolution remote sensing object detection. IEEE Trans. Geosci. Remote. Sens. 2019, 57, 5535–5548.
[CrossRef]
173. Islam, J.; Zhang, Y. Early Diagnosis of Alzheimer’s Disease: A Neuroimaging Study with Deep Learning Architectures. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 19–21 June 2018; pp. 1881–1883.
174. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
175. Marcus, D.S.; Fotenos, A.F.; Csernansky, J.G.; Morris, J.C.; Buckner, R.L. Open access series of imaging
studies: Longitudinal MRI data in nondemented and demented older adults. J. Cogn. Neurosci. 2010,
22, 2677–2684. [CrossRef] [PubMed]
176. Alaverdyan, Z.; Jung, J.; Bouet, R.; Lartizien, C. Regularized siamese neural network for unsupervised outlier detection on brain multiparametric magnetic resonance imaging: Application to epilepsy lesion screening. Med. Image Anal. 2020, 60, 101618. [CrossRef] [PubMed]
177. Laukamp, K.R.; Thiele, F.; Shakirin, G.; Zopfs, D.; Faymonville, A.; Timmer, M.; Maintz, D.; Perkuhn, M.;
Borggrefe, J. Fully automated detection and segmentation of meningiomas using deep learning on routine
multiparametric MRI. Eur. Radiol. 2019, 29, 124–132. [CrossRef] [PubMed]
178. Katzmann, A.; Muehlberg, A.; Suehling, M.; Noerenberg, D.; Holch, J.W.; Heinemann, V.; Gross, H.M.
Predicting Lesion Growth and Patient Survival in Colorectal Cancer Patients Using Deep Neural Networks.
In Proceedings of the Conference track: Medical Imaging with Deep Learning, Amsterdam, The Netherlands,
4–6 July 2018.
179. Bejnordi, B.E.; Veta, M.; Van Diest, P.J.; Van Ginneken, B.; Karssemeijer, N.; Litjens, G.; Van Der Laak, J.A.;
Hermsen, M.; Manson, Q.F.; Balkenhol, M.; et al. Diagnostic assessment of deep learning algorithms for
detection of lymph node metastases in women with breast cancer. JAMA 2017, 318, 2199–2210. [CrossRef]
[PubMed]
180. Zhang, J.; Cain, E.H.; Saha, A.; Zhu, Z.; Mazurowski, M.A. Breast mass detection in mammography and tomosynthesis via fully convolutional network-based heatmap regression. In Medical Imaging 2018: Computer-Aided Diagnosis; International Society for Optics and Photonics, SPIE: Bellingham, WA, USA, 2018; Volume 10575, p. 1057525.
181. Dalmış, M.U.; Vreemann, S.; Kooi, T.; Mann, R.M.; Karssemeijer, N.; Gubern-Mérida, A. Fully automated detection of breast cancer in screening MRI using convolutional neural networks. J. Med. Imaging 2018, 5, 014502. [CrossRef] [PubMed]
182. Abràmoff, M.D.; Lou, Y.; Erginay, A.; Clarida, W.; Amelon, R.; Folk, J.C.; Niemeijer, M. Improved automated
detection of diabetic retinopathy on a publicly available dataset through integration of deep learning.
Investig. Ophthalmol. Vis. Sci. 2016, 57, 5200–5206. [CrossRef]
183. Winkels, M.; Cohen, T.S. 3D G-CNNs for pulmonary nodule detection. arXiv 2018, arXiv:1804.04656.
184. Kermany, D.S.; Goldbaum, M.; Cai, W.; Valentim, C.C.; Liang, H.; Baxter, S.L.; McKeown, A.; Yang, G.;
Wu, X.; Yan, F.; et al. Identifying medical diagnoses and treatable diseases by image-based deep learning.
Cell 2018, 172, 1122–1131. [CrossRef]
185. U.S. Food and Drug Administration. FDA Permits Marketing of Artificial Intelligence-Based Device to Detect Certain Diabetes-Related Eye Problems; SciPol: Durham, NC, USA, 2018.
186. Gutman, D.; Codella, N.C.; Celebi, E.; Helba, B.; Marchetti, M.; Mishra, N.; Halpern, A. Skin lesion analysis
toward melanoma detection: A challenge at the international symposium on biomedical imaging (ISBI) 2016,
hosted by the international skin imaging collaboration (ISIC). arXiv 2016, arXiv:1605.01397.
187. González, G.; Ash, S.Y.; Vegas-Sánchez-Ferrero, G.; Onieva Onieva, J.; Rahaghi, F.N.; Ross, J.C.; Díaz, A.;
San José Estépar, R.; Washko, G.R. Disease staging and prognosis in smokers using deep learning in chest
computed tomography. Am. J. Respir. Crit. Care Med. 2018, 197, 193–203. [CrossRef]
188. Depeursinge, A.; Vargas, A.; Platon, A.; Geissbuhler, A.; Poletti, P.A.; Müller, H. Building a reference multimedia database for interstitial lung diseases. Comput. Med. Imaging Graph. 2012, 36, 227–238. [CrossRef] [PubMed]
189. Armato, S.G., III; McLennan, G.; Bidaut, L.; McNitt-Gray, M.F.; Meyer, C.R.; Reeves, A.P.; Zhao, B.; Aberle, D.R.; Henschke, C.I.; Hoffman, E.A.; et al. The lung image database consortium (LIDC) and image database resource initiative (IDRI): A completed reference database of lung nodules on CT scans. Med. Phys. 2011, 38, 915–931. [CrossRef] [PubMed]
190. Petersen, R.C.; Aisen, P.; Beckett, L.A.; Donohue, M.; Gamst, A.; Harvey, D.J.; Jack, C.; Jagust, W.; Shaw, L.;
Toga, A.; et al. Alzheimer’s disease neuroimaging initiative (ADNI): Clinical characterization. Neurology
2010, 74, 201–209. [CrossRef] [PubMed]
191. Menze, B.H.; Jakab, A.; Bauer, S.; Kalpathy-Cramer, J.; Farahani, K.; Kirby, J.; Burren, Y.; Porz, N.; Slotboom, J.; Wiest, R.; et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans. Med. Imaging 2014, 34, 1993–2024. [CrossRef] [PubMed]
192. Marcus, D.S.; Wang, T.H.; Parker, J.; Csernansky, J.G.; Morris, J.C.; Buckner, R.L. Open Access Series of
Imaging Studies (OASIS): Cross-sectional MRI data in young, middle aged, nondemented, and demented
older adults. J. Cogn. Neurosci. 2007, 19, 1498–1507. [CrossRef]
193. Bowyer, K.; Kopans, D.; Kegelmeyer, W.; Moore, R.; Sallam, M.; Chang, K.; Woods, K. The digital database
for screening mammography. In Proceedings of the Third International Workshop on Digital Mammography,
Chicago, IL, USA, 9–12 June 1996; Volume 58, p. 27.
194. Suckling, J.; Parker, J.; Dance, D.; Astley, S.; Hutt, I.; Boggis, C.; Ricketts, I.; Stamatakis, E.; Cerneaz, N.; Kok, S.; et al. Mammographic Image Analysis Society (MIAS) Database v1.21; University of Cambridge: Cambridge, UK, 2015.
195. Bandi, P.; Geessink, O.; Manson, Q.; Van Dijk, M.; Balkenhol, M.; Hermsen, M.; Bejnordi, B.E.; Lee, B.; Paeng, K.; Zhong, A.; et al. From detection of individual metastases to classification of lymph node status at the patient level: The CAMELYON17 challenge. IEEE Trans. Med. Imaging 2018, 38, 550–560. [CrossRef]
196. Moreira, I.C.; Amaral, I.; Domingues, I.; Cardoso, A.; Cardoso, M.J.; Cardoso, J.S. INbreast: Toward a full-field digital mammographic database. Acad. Radiol. 2012, 19, 236–248. [CrossRef]
197. Staal, J.; Abràmoff, M.D.; Niemeijer, M.; Viergever, M.A.; Van Ginneken, B. Ridge-based vessel segmentation in color images of the retina. IEEE Trans. Med. Imaging 2004, 23, 501–509. [CrossRef]
198. Hoover, A.; Kouznetsova, V.; Goldbaum, M. Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Trans. Med. Imaging 2000, 19, 203–210. [CrossRef]
199. Decencière, E.; Zhang, X.; Cazuguel, G.; Lay, B.; Cochener, B.; Trone, C.; Gain, P.; Ordonez, R.; Massin, P.;
Erginay, A.; et al. Feedback on a publicly distributed image database: The Messidor database. Image Anal.
Stereol. 2014, 33, 231–234. [CrossRef]
200. Hu, W.; Zhuo, Q.; Zhang, C.; Li, J. Fast branch convolutional neural network for traffic sign recognition.
IEEE Intell. Transp. Syst. Mag. 2017, 9, 114–126. [CrossRef]
201. Shao, F.; Wang, X.; Meng, F.; Rui, T.; Wang, D.; Tang, J. Real-time traffic sign detection and recognition
method based on simplified Gabor wavelets and CNNs. Sensors 2018, 18, 3192. [CrossRef] [PubMed]
202. Shao, F.; Wang, X.; Meng, F.; Zhu, J.; Wang, D.; Dai, J. Improved faster R-CNN traffic sign detection based on
a second region of interest and highly possible regions proposal network. Sensors 2019, 19, 2288. [CrossRef]
[PubMed]
203. Cao, J.; Song, C.; Peng, S.; Xiao, F.; Song, S. Improved traffic sign detection and recognition algorithm for
intelligent vehicles. Sensors 2019, 19, 4021. [CrossRef] [PubMed]
204. Zhang, J.; Huang, M.; Jin, X.; Li, X. A real-time Chinese traffic sign detection algorithm based on modified YOLOv2. Algorithms 2017, 10, 127. [CrossRef]
205. Luo, H.; Yang, Y.; Tong, B.; Wu, F.; Fan, B. Traffic sign recognition using a multi-task convolutional neural
network. IEEE Trans. Intell. Transp. Syst. 2017, 19, 1100–1111. [CrossRef]
206. Li, J.; Wang, Z. Real-time traffic sign recognition based on efficient CNNs in the wild. IEEE Trans. Intell.
Transp. Syst. 2018, 20, 975–984. [CrossRef]
207. Masood, S.Z.; Shu, G.; Dehghan, A.; Ortiz, E.G. License plate detection and recognition using deeply learned
convolutional neural networks. arXiv 2017, arXiv:1703.07330.
208. Laroca, R.; Zanlorensi, L.A.; Gonçalves, G.R.; Todt, E.; Schwartz, W.R.; Menotti, D. An efficient and
layout-independent automatic license plate recognition system based on the YOLO detector. arXiv 2019,
arXiv:1909.01754.
209. Hendry; Chen, R.-C. Automatic License Plate Recognition via sliding-window darknet-YOLO deep learning.
Image Vis. Comput. 2019, 87, 47–56.
210. Raza, M.A.; Qi, C.; Asif, M.R.; Khan, M.A. An Adaptive Approach for Multi-National Vehicle License Plate
Recognition Using Multi-Level Deep Features and Foreground Polarity Detection Model. Appl. Sci. 2020,
10, 2165. [CrossRef]
211. Gonçalves, G.R.; Diniz, M.A.; Laroca, R.; Menotti, D.; Schwartz, W.R. Real-time automatic license plate
recognition through deep multi-task networks. In Proceedings of the 31st SIBGRAPI Conference on Graphics,
Patterns and Images (SIBGRAPI), Paraná, Brazil, 29 October–1 November 2018; pp. 110–117.
212. Arnold, E.; Al-Jarrah, O.Y.; Dianati, M.; Fallah, S.; Oxtoby, D.; Mouzakitis, A. A survey on 3D object detection methods for autonomous driving applications. IEEE Trans. Intell. Transp. Syst. 2019, 20, 3782–3795. [CrossRef]
213. Pham, C.C.; Jeon, J.W. Robust object proposals re-ranking for object detection in autonomous driving using
convolutional neural networks. Signal Process. Image Commun. 2017, 53, 110–122. [CrossRef]
214. Li, B.; Zhang, T.; Xia, T. Vehicle detection from 3D lidar using fully convolutional network. arXiv 2016, arXiv:1608.07916.
215. Helbing, D.; Brockmann, D.; Chadefaux, T.; Donnay, K.; Blanke, U.; Woolley-Meza, O.; Moussaid, M.;
Johansson, A.; Krause, J.; Schutte, S.; et al. Saving human lives: What complexity science and information
systems can contribute. J. Stat. Phys. 2015, 158, 735–781. [CrossRef]
216. Saleh, S.A.M.; Suandi, S.A.; Ibrahim, H. Recent survey on crowd density estimation and counting for visual
surveillance. Eng. Appl. Artif. Intell. 2015, 41, 103–114. [CrossRef]
217. Jones, M.J.; Snow, D. Pedestrian detection using boosted features over many frames. In Proceedings of the
International Conference on Pattern Recognition, Tampa, FL, USA, 8–11 December 2008; pp. 1–4.
218. Viola, P.; Jones, M.J.; Snow, D. Detecting pedestrians using patterns of motion and appearance. Int. J.
Comput. Vis. 2005, 63, 153–161. [CrossRef]
219. Leibe, B.; Seemann, E.; Schiele, B. Pedestrian detection in crowded scenes. In Proceedings of the IEEE
Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–25 June
2005; Volume 1, pp. 878–885.
220. Lin, S.F.; Chen, J.Y.; Chao, H.X. Estimation of number of people in crowded scenes using perspective
transformation. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 2001, 31, 645–654.
221. Junior, J.C.S.J.; Musse, S.R.; Jung, C.R. Crowd analysis using computer vision techniques. IEEE Signal
Process. Mag. 2010, 27, 66–77.
222. Kok, V.J.; Lim, M.K.; Chan, C.S. Crowd behavior analysis: A review where physics meets biology.
Neurocomputing 2016, 177, 342–362. [CrossRef]
223. Sun, M.; Zhang, D.; Qian, L.; Shen, Y. Crowd Abnormal Behavior Detection Based on Label Distribution
Learning. In Proceedings of the International Conference on Intelligent Computation Technology and
Automation, Nanchang, China, 14–15 June 2015; pp. 345–348.
224. Zhao, L.; Li, S. Object Detection Algorithm Based on Improved YOLOv3. Electronics 2020, 9, 537. [CrossRef]
225. Reno, V.; Mosca, N.; Marani, R.; Nitti, M.; D’Orazio, T.; Stella, E. Convolutional neural networks based
ball detection in tennis games. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition Workshops, Salt Lake City, UT, USA, 19–21 June 2018; pp. 1758–1764.
226. Kang, K.; Ouyang, W.; Li, H.; Wang, X. Object detection from video tubelets with convolutional neural
networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas,
NV, USA, 26 June–1 July 2016; pp. 817–825.
227. Pobar, M.; Ivasic-Kos, M. Active Player Detection in Handball Scenes Based on Activity Measures. Sensors
2020, 20, 1475. [CrossRef] [PubMed]
228. Pobar, M.; Ivašić-Kos, M. Detection of the leading player in handball scenes using Mask R-CNN and STIPS. In Proceedings of the Eleventh International Conference on Machine Vision (ICMV 2018), International Society for Optics and Photonics, Munich, Germany, 1–3 November 2018; Volume 11041, p. 110411V.
229. Pobar, M.; Ivasic-Kos, M. Mask R-CNN and Optical flow based method for detection and marking of
handball actions. In Proceedings of the 11th International Congress on Image and Signal Processing,
BioMedical Engineering and Informatics (CISP-BMEI), Beijing, China, 13–15 October 2018; pp. 1–6.
230. Burić, M.; Pobar, M.; Ivašić-Kos, M. Object detection in sports videos. In Proceedings of the 2018 41st
International Convention on Information and Communication Technology, Electronics and Microelectronics
(MIPRO), Opatija, Croatia, 21–25 May 2018; pp. 1034–1039.
231. Acuna, D. Towards real-time detection and tracking of basketball players using deep neural networks.
In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach,
CA, USA, 4–9 December 2017.
232. Afif, M.; Ayachi, R.; Said, Y.; Atri, M. Deep Learning Based Application for Indoor Scene Recognition. Neural
Process. Lett. 2020, 1–11. [CrossRef]
233. Tapu, R.; Mocanu, B.; Zaharia, T. DEEP-SEE: Joint object detection, tracking and recognition with application
to visually impaired navigational assistance. Sensors 2017, 17, 2473. [CrossRef] [PubMed]
234. Yang, W.; Tan, R.T.; Feng, J.; Liu, J.; Guo, Z.; Yan, S. Deep joint rain detection and removal from a single
image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI,
USA, 21–26 July 2017; pp. 1357–1366.
235. Hu, X.; Zhu, L.; Fu, C.W.; Qin, J.; Heng, P.A. Direction-aware spatial context features for shadow detection.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT,
USA, 18–23 June 2018; pp. 7454–7462.
236. Yang, Z.; Li, Q.; Wenyin, L.; Lv, J. Shared multi-view data representation for multi-domain event detection.
IEEE Trans. Pattern Anal. Mach. Intell. 2019. [CrossRef]
237. Hashmi, M.F.; Gupta, V.; Vijay, D.; Rathwa, V. Computer Vision-Based Assistive Technology for Helping
Visually Impaired and Blind People Using Deep Learning Framework. In Handbook of Research on Emerging
Trends and Applications of Machine Learning; IGI Global: Hershey, PA, USA, 2020; pp. 577–598.
238. Buzzelli, M.; Albé, A.; Ciocca, G. A vision-based system for monitoring elderly people at home. Appl. Sci.
2020, 10, 374. [CrossRef]
239. Szegedy, C.; Toshev, A.; Erhan, D. Deep neural networks for object detection. In Advances in Neural
Information Processing Systems; Curran Associates Inc.: San Francisco, CA, USA, 2013; pp. 2553–2561.
240. Du Terrail, J.O.; Jurie, F. On the use of deep neural networks for the detection of small vehicles in
ortho-images. In Proceedings of the 2017 IEEE International Conference on Image Processing, Beijing,
China, 17–20 September 2017; pp. 4212–4216.
241. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich,
A. Going deeper with convolutions. In Proceedings of the Conference on Computer Vision and Pattern
Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
242. Erhan, D.; Szegedy, C.; Toshev, A.; Anguelov, D. Scalable object detection using deep neural networks.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA,
23–28 June 2014; pp. 2147–2154.
243. Ohn-Bar, E.; Trivedi, M.M. Multi-scale volumes for deep object detection and localization. Pattern Recognit.
2017, 61, 557–572. [CrossRef]
244. Huang, C.; He, Z.; Cao, G.; Cao, W. Task-driven progressive part localization for fine-grained object
recognition. IEEE Trans. Multimed. 2016, 18, 2372–2383. [CrossRef]
245. Liu, N.; Han, J. DHSNet: Deep hierarchical saliency network for salient object detection. In Proceedings
of the Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016;
pp. 678–686.
246. Li, X.; Zhao, L.; Wei, L.; Yang, M.H.; Wu, F.; Zhuang, Y.; Ling, H.; Wang, J. DeepSaliency: Multi-task deep
neural network model for salient object detection. IEEE Trans. Image Process. 2016, 25, 3919–3930. [CrossRef]
247. Wang, L.; Lu, H.; Ruan, X.; Yang, M.H. Deep networks for saliency detection via local estimation and global
search. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Boston, MA, USA,
7–12 June 2015; pp. 3183–3192.
248. Li, G.; Yu, Y. Deep contrast learning for salient object detection. In Proceedings of the Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 478–487.
249. Gao, M.L.; He, X.; Luo, D.; Yu, Y.M. Object tracking based on harmony search: Comparative study. J. Electron.
Imaging 2012, 21, 043001. [CrossRef]
250. Hao, Z. Improved Faster R-CNN for Detecting Small Objects and Occluded Objects in Electron Microscope
Imaging. Acta Microsc. 2020, 29.
251. Leung, H.K.; Chen, X.Z.; Yu, C.W.; Liang, H.Y.; Wu, J.Y.; Chen, Y.L. A Deep-Learning-Based Vehicle Detection
Approach for Insufficient and Nighttime Illumination Conditions. Appl. Sci. 2019, 9, 4769. [CrossRef]
252. Park, J.; Chen, J.; Cho, Y.K.; Kang, D.Y.; Son, B.J. CNN-based person detection using infrared images for
night-time intrusion warning systems. Sensors 2020, 20, 34. [CrossRef] [PubMed]
253. Kim, K.H.; Hong, S.; Roh, B.; Cheon, Y.; Park, M. PVANET: Deep but lightweight neural networks for
real-time object detection. arXiv 2016, arXiv:1608.08021.
254. Shih, Y.F.; Yeh, Y.M.; Lin, Y.Y.; Weng, M.F.; Lu, Y.C.; Chuang, Y.Y. Deep co-occurrence feature learning for
visual object recognition. In Proceedings of the Conference on Computer Vision and Pattern Recognition,
Honolulu, HI, USA, 21–26 July 2017; pp. 4123–4132.
255. Denton, E.L.; Chintala, S.; Szlam, A.; Fergus, R. Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montréal, QC, Canada, 7–12 December 2015; Volume 1, pp. 1486–1494.
256. Takác, M.; Bijral, A.S.; Richtárik, P.; Srebro, N. Mini-Batch Primal and Dual Methods for SVMs. In Proceedings
of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 1022–1030.
257. Goring, C.; Rodner, E.; Freytag, A.; Denzler, J. Nonparametric part transfer for fine-grained recognition.
In Proceedings of the Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA,
24–27 June 2014; pp. 2489–2496.
258. Lin, D.; Shen, X.; Lu, C.; Jia, J. Deep LAC: Deep localization, alignment and classification for fine-grained
recognition. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Boston, MA,
USA, 7–12 June 2015; pp. 1666–1674.
259. Zhang, N.; Donahue, J.; Girshick, R.; Darrell, T. Part-based R-CNNs for fine-grained category detection.
In Proceedings of the European Conference on Computer Vision, Zürich, Switzerland, 6–12 September 2014;
Springer: Berlin, Germany, 2014; pp. 834–849.
260. Raspberry Pi. Available online: https://www.raspberrypi.org/ (accessed on 31 December 2019).
261. Nakahara, H.; Yonekawa, H.; Sato, S. An object detector based on multiscale sliding window search using
a fully pipelined binarized CNN on an FPGA. In Proceedings of the International Conference on Field
Programmable Technology, Melbourne, Australia, 11–13 December 2017; pp. 168–175.
262. Soma, P.; Jatoth, R.K. Hardware Implementation Issues on Image Processing Algorithms. In Proceedings
of the International Conference on Computing Communication and Automation, Greater Noida, India,
14–15 December 2018; pp. 1–6.
263. JetsonTX2. Available online: https://elinux.org/JetsonTX2 (accessed on 31 December 2019).
264. Garland, M.; Le Grand, S.; Nickolls, J.; Anderson, J.; Hardwick, J.; Morton, S.; Phillips, E.; Zhang, Y.; Volkov, V.
Parallel computing experiences with CUDA. IEEE Micro 2008, 28, 13–27. [CrossRef]
265. Stone, J.E.; Gohara, D.; Shi, G. OpenCL: A parallel programming standard for heterogeneous computing
systems. Comput. Sci. Eng. 2010, 12, 66–73. [CrossRef]
266. NVIDIA Collective Communications Library (NCCL). Available online: https://developer.nvidia.com/nccl
(accessed on 31 December 2019).
267. Hwang, S.; Lee, Y. FPGA-based real-time lane detection for advanced driver assistance systems.
In Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, Jeju, South Korea, 25–28 October
2016; pp. 218–219.
268. Sajjanar, S.; Mankani, S.K.; Dongrekar, P.R.; Kumar, N.S.; Mohana; Aradhya, H.V.R. Implementation of real-time moving object detection and tracking on FPGA for video surveillance applications. In Proceedings of the IEEE Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER), Mangalore, India, 13–14 August 2016; pp. 289–295.
269. Tijtgat, N.; Van Ranst, W.; Goedeme, T.; Volckaert, B.; De Turck, F. Embedded real-time object detection for a
UAV warning system. In Proceedings of the IEEE International Conference on Computer Vision Workshops,
Venice, Italy, 22–29 October 2017; pp. 2110–2118.
270. Hossain, S.; Lee, D.J. Deep Learning-Based Real-Time Multiple-Object Detection and Tracking from Aerial Imagery via a Flying Robot with GPU-Based Embedded Devices. Sensors 2019, 19, 3371. [CrossRef]
271. Stepanenko, S.; Yakimov, P. Using high-performance deep learning platform to accelerate object detection.
In Proceedings of the International Conference on Information Technology and Nanotechnology, Samara,
Russia, 26–29 May 2019; pp. 1–7.
272. Körez, A.; Barışçı, N. Object Detection with Low Capacity GPU Systems Using Improved Faster R-CNN.
Appl. Sci. 2020, 10, 83. [CrossRef]
273. Çambay, V.Y.; Uçar, A.; Arserim, M.A. Object Detection on FPGAs and GPUs by Using Accelerated Deep
Learning. In Proceedings of the 2019 International Artificial Intelligence and Data Processing Symposium
(IDAP), Malatya, Turkey, 28–30 September 2019; pp. 1–5.
274. Moon, Y.Y.; Geem, Z.W.; Han, G.T. Vanishing point detection for self-driving car using harmony search
algorithm. Swarm Evol. Comput. 2018, 41, 111–119. [CrossRef]
275. Yao, Y.; Wang, Y.; Guo, Y.; Lin, J.; Qin, H.; Yan, J. Cross-dataset Training for Class Increasing Object Detection.
arXiv 2020, arXiv:2001.04621.
276. Bojarski, M.; Del Testa, D.; Dworakowski, D.; Firner, B.; Flepp, B.; Goyal, P.; Jackel, L.D.; Monfort, M.;
Muller, U.; Zhang, J.; et al. End to end learning for self-driving cars. arXiv 2016, arXiv:1604.07316.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).