Detection Tracking and Classification of
                                                    Nikhil Asogekar
                                                Department of Aerospace,
                                               Punjab Engineering College,
                                                    Chandigarh, India
                                                    Sudarshan Rathi
                                                Department of Aerospace,
                                               Indian Institute of Science,
                                               Bengaluru, Karnataka, India
Abstract: The number of UAVs has grown rapidly in the past few years. The use of drones has increased considerably in military and commercial settings, with UAVs of all sizes, shapes, and types being used for applications ranging from recreational flying to purpose-driven missions. This development has come with challenges and has been identified as a potential source of operational disruptions leading to various security complications, including threats to Critical Infrastructures (CI). The need for fully autonomous Anti-UAV Defense Systems (AUDS) has therefore never been more pressing. To attenuate and nullify the threat posed by UAVs, whether deliberate or otherwise, this paper presents the holistic design and operational prototype of a drone detection technology based on visual detection, using Digital Image Processing (DIP) and Machine Learning (ML) to detect, track and classify drones accurately. The proposed system uses a background-subtracted frame difference technique to detect moving objects, paired with a Pan-Tilt tracking system powered by a Raspberry Pi to track the moving object. The moving objects are identified by a Convolutional Neural Network (CNN), the YOLO v4-tiny algorithm. The novelty of the proposed system lies in its accuracy, its effectiveness with low-cost sensing equipment, and its better performance compared to other alternatives. Along with ease of operation, combining the system with other systems such as RADAR could be a real game-changer in detection technology. The proposed technology was validated experimentally in various tests in an uncontrolled outdoor environment (in the presence of clouds, birds, trees, rain, etc.), proving equally effective in all situations and yielding high-quality results.

Keywords: Anti-UAV Defense Systems, Digital Image Processing, Convolutional Neural Network, YOLO, PoT.

I. INTRODUCTION
The development of Unmanned Aerial Vehicles (UAVs), also known as drones, has been rapid in recent years. Both commercial and military applications have increased considerably. Companies such as Uber and Amazon are pushing to use drones to deliver packages and food in a commercial setup. Military applications, meanwhile, include warfare, identifying vulnerable areas prone to risks, and mitigating those risks.
Despite attracting wide attention in diverse civil and commercial applications, UAVs pose several threats to airspace safety that may endanger people and property. While such threats vary widely in the attackers' intentions and sophistication, ranging from unskilled piloting to deliberate attacks with unmanned aerial vehicles, they can all cause severe disruption.
The first drone attack on an Indian Air Force base took place at Jammu, India, on 27 June 2021, in which two drones dropped an IED packed with high explosives. The US carried out a large number of drone strikes in Pakistan, Yemen, and Somalia between 2010 and 2020. The reported data shows a minimum of 14,040 confirmed strikes, with 8858 to 16901 total people killed.
                     International Journal of Engineering Applied Sciences and Technology, 2022
                                    Vol. 7, Issue 3, ISSN No. 2455-2143, Pages 11-19
                              Published Online July 2022 in IJEAST (http://www.ijeast.com)
About 910 to 2200 civilians and 283 to 454 children were killed in these attacks [1].
To mitigate and neutralize the threat posed by the misuse of UAVs, whether through deliberate malicious activity or inadvertent error, this paper presents a complete design of a Long-Range, Fully Autonomous Drone Detection Platform. Considering the requirements of a system that automatically detects, tracks, and classifies UAVs in a critical environment, the paper presents a complete system consisting of both hardware and software components. The originality of the proposed approach lies in the efficient and effective integration of hardware and software components for the accurate localization of intruder objects.

The rest of the paper is organized as follows. The literature review is presented in section II. The proposed method is explained in section III. Experimental results are presented in section IV. Concluding remarks are given in section V.

II. LITERATURE REVIEW
With precise computer-controlled movements, variable speed, and maneuvering capabilities, it has become essential to predict the path of RPAS (Remotely Piloted Aircraft Systems) and APAS. The small size of a drone (actual and apparent in-camera) and its resemblance to other aerial objects such as aeroplanes and birds create challenges in automatic detection and accurate localization in real coordinates in space. Various methods have been proposed and implemented to solve this auto-detection and tracking problem, such as RF, GPS, Radar, Acoustic, and Vision-based methods. While these systems can identify moving objects, tracking and classification remain acute problems for them. Since it is crucial to recognize malicious and harmful drones, Vision-Based Systems are explored in this literature review.
Since the advent of fighter and commercial flights, detection has significantly improved safety and the assessment of enemy intent. With flight now available through remote controls, and powerful UAVs becoming an ever-increasing part of the modern world, especially in the past five years, actively detecting, classifying, and tracking them has become significant.
Various techniques have been employed in detecting Flying Aerial Vehicles, such as RADAR, Acoustic, Computer Vision, and RF-based detection. The need to deploy these techniques locally and at scale has never been greater than now. With the boom in the number of drones being produced and used, the need for robust, efficient, and cost-effective solutions for the detection and tracking of drones is at its peak.
With the ever-improving sophistication of camera hardware, the potential of Vision-Based Detection (VBD) systems is immense. Better algorithms paired with high-quality hardware have made Vision-Based Detection of objects one of the founding pillars of autonomous systems. From sense-and-avoid in self-driving vehicles to face detection, VBD has become, and will continue to be, an inseparable part of modern technologies.
Much research is going on to enhance the performance of RADAR, RF, visual, acoustic, and other methods [2]. Tiwari and Singhai [3] have reviewed other moving object detection and tracking methods. The significant challenges in detecting moving objects are dynamic backgrounds, noise in the video, illumination changes, etc. Different classical models have been proposed, such as Gaussian mixture background modelling [4] [5] [6], which performs well in some cases. Algorithms such as ViBe [7] are fast but prone to noise in the video. Different pixel-based [8], region-based [9] and texture-based [10] methods are used to model dynamic backgrounds.
Zhang et al. [11] have proposed a new algorithm using Canny edge detection to detect camouflaged moving objects. They used constant zoom to track an object detected in a frame using a PTZ camera. Huang et al. [12] have proposed an ANN (artificial neural network) model to detect moving objects in dynamic backgrounds. Cao et al. [13] proposed an algorithm for dynamic backgrounds and irregular object movements. Sheu et al. [14] have proposed moving object detection using a frame difference algorithm, and tracking by converting the pixel position into the required pan-tilt angle in 3-D space.
State-of-the-art methods such as Region-based CNN, Fast R-CNN [15], and Faster R-CNN [16] are two-stage object detectors and classifiers. YOLO [17] and SSD [18] are one-stage object detectors and classifiers.

III. PROPOSED ALGORITHM
A. Proposed method

Fig. 1. Proposed System Architecture

Figure 1 shows the proposed drone defence method using two camera systems. A wide-angle static camera is mounted on a static frame. A pan-tilt setup is created using two servo motors. The pan servo motor is fixed on the frame, whereas the tilt servo motor is attached to the rotating head of the pan servo motor. One camera with zoom capability is attached to the rotating shaft of the tilt servo motor. This camera is referred to as the dynamic camera throughout the paper. Both motors are controlled with a Raspberry Pi. All primary algorithms run on the main computer, which shows the output results. The static camera acts as a detection system, whereas
the dynamic camera tracks the detected object. The flowchart of the work is shown in figure 2.
The detection algorithm detects moving objects in the frames of the static camera using a modified background subtraction and frame difference method. It returns the position of each detected object as a tuple (x, y) denoting the centre of the object's bounding box. This data is then transferred to the Raspberry Pi over a LAN cable. The Raspberry Pi converts this pixel information into the corresponding pan-tilt activation and drives the pan-tilt servo motors so that the camera attached to them looks toward the object.

B. Detection
Detection of fast-moving objects in a complex background is a challenging task. A combination of background subtraction and frame difference is used to detect all moving objects present in the frame of the static camera.
In complex backgrounds containing objects such as trees and grass, different algorithms fail to detect the actual moving objects, as shown in figure 3. Here, the red circles denote the moving objects detected by the corresponding algorithm.
Figure 3a shows the original frame of the video. Taking the first video frame as the background frame, the background subtraction method is applied to detect the moving objects, as shown in figure 3b. Figure 3c shows the Gaussian mixture model applied to classify the pixels as foreground or background. Figure 3d shows the combination of simple background subtraction and the Gaussian mixture model. It reduces false alarms to a certain level, but not enough.

Fig. 4. Flow chart of proposed moving object detection method
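The detection flow described in this section (grey conversion, background subtraction against the first frame, differencing of two consecutive background-subtracted frames, thresholding, and morphological closing) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the use of absolute differences, the 3x3 structuring element, the clipped border handling, and the single-object bounding box are all assumptions made for illustration, and a real system would operate on image arrays rather than nested lists.

```python
# Illustrative sketch only. Frames are lists of rows; a colour pixel is an
# (R, G, B) tuple. All names here are hypothetical, not from the paper's code.

def to_gray(frame):
    """Grey conversion with the weights 0.299, 0.587, 0.114 used in the paper."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in frame]

def abs_diff(a, b):
    """Per-pixel absolute difference (assumed form of 'subtraction')."""
    return [[abs(pa - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def threshold(frame, t):
    """Binarize: 1 where the pixel value exceeds t, else 0."""
    return [[1 if p > t else 0 for p in row] for row in frame]

def _morph(mask, combine):
    """3x3 neighbourhood operator: max gives dilation, min gives erosion.
    Neighbourhoods are clipped at the image border (a simplification)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neigh = [mask[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = combine(neigh)
    return out

def closing(mask):
    """Morphological closing: a dilation followed by an erosion."""
    return _morph(_morph(mask, max), min)

def bbox_centre(mask):
    """Centre (x, y) of the box around all foreground pixels, or None.
    Treats the whole foreground as one object (a simplification)."""
    pts = [(x, y) for y, row in enumerate(mask) for x, p in enumerate(row) if p]
    if not pts:
        return None
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return ((min(xs) + max(xs)) // 2, (min(ys) + max(ys)) // 2)

def detect(background, prev_frame, curr_frame, t=50):
    """Return the bounding-box centre of the moving region, or None."""
    bg = to_gray(background)
    prev_sub = abs_diff(to_gray(prev_frame), bg)  # background-subtracted k-1
    curr_sub = abs_diff(to_gray(curr_frame), bg)  # background-subtracted k
    moving = abs_diff(curr_sub, prev_sub)         # difference of the two
    return bbox_centre(closing(threshold(moving, t)))
```

A pixel that differs from the background but is identical in both frames cancels in the second difference, which is the false-alarm suppression the method relies on. Note that the moving region produced by frame differencing covers both the previous and the current position of the object.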
All frame operations are carried out on grey frames to reduce the computation time. Figure 5b shows the colour image of the drone, which is converted into grey, as shown in figure 5c. The RGB frame of the static camera is first converted into a grey frame as per the following equation:

$$I_{\text{gray}} = 0.299\,R + 0.587\,G + 0.114\,B$$

The background frame is the static camera's first frame, as shown in figure 5a. Subtracting each k-th frame from this background frame gives the background-subtracted k-th frame. Two consecutive background-subtracted frames are then subtracted from each other to get the actual moving object, as shown in figure 5d. This step reduces the false alarms generated by slight movements of grass, tree leaves, clouds, etc., since these generate movement at almost the same location in all frames. After this, thresholding is used to convert the image into a binary image, as shown in figure 5e.
Frame difference leaves broken gaps in the detected moving object, as shown in figure 5e. The morphological closing operation, a dilation followed by an erosion, can be used to fill these gaps, as shown in figure 5f.
The following equation defines morphological closing of the binary image A by the structuring element B:

$$A \bullet B = (A \oplus B) \ominus B$$

A bounding circle is drawn around the detected object, along with its path, for better visualization.

C. Tracking
Object tracking is the process of estimating or predicting the positions of moving objects in a video sequence using tracking algorithms. The object tracking program running on the Raspberry Pi takes the initial set of object detections as bounding boxes and creates a unique ID for each box. The algorithm then tracks the moving objects as they move through subsequent frames of the video sequence. These tracking methods are readily applied to real-time video streams from a camera connected over USB or an IP-based protocol. Each frame of the video is fed into the tracking algorithm for subsequent detection and tracking, and as a result, high-performance tracking of the object of interest is obtained.
Object tracking is an important part of the overall system and is essential for localizing the trespassing drone. The tracking method developed here uses OpenCV tracking by detection, which tracks the object with the help of detection. A bounding box is drawn around the detected object, showing the user where the object is in the frame. A self-fabricated Pan-Tilt camera model operated by a Raspberry Pi, together with a laptop PC for testing the algorithm, is used for tracking the object of interest.
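As an illustration of tracking by detection with unique IDs, the sketch below assigns IDs to detected centres by nearest Euclidean distance between consecutive frames, gives new objects the next free ID, and erases IDs of objects that are no longer detected, following the rule the paper states for multiple objects. The class name `CentroidTracker`, the greedy closest-pair matching, and the absence of a maximum matching distance are assumptions made for this sketch, not details confirmed by the paper.

```python
import math

class CentroidTracker:
    """Hypothetical minimal tracker: one (x, y) centre per tracked ID."""

    def __init__(self):
        self.next_id = 0      # ID that the next new object will receive
        self.objects = {}     # id -> (x, y) centre from the previous frame

    def update(self, centres):
        """Match the centres of the current frame to the existing IDs."""
        # All (distance, id, centre index) candidates, closest pairs first.
        pairs = sorted(
            (math.hypot(pos[0] - c[0], pos[1] - c[1]), oid, j)
            for oid, pos in self.objects.items()
            for j, c in enumerate(centres)
        )
        updated = {}
        used_ids, used_centres = set(), set()
        for _, oid, j in pairs:
            if oid in used_ids or j in used_centres:
                continue                  # each ID and centre is used once
            updated[oid] = centres[j]     # closest pair keeps the same ID
            used_ids.add(oid)
            used_centres.add(j)
        for j, c in enumerate(centres):
            if j not in used_centres:     # unmatched detection: new object
                updated[self.next_id] = c
                self.next_id += 1
        # IDs with no matching detection are dropped (object left the frame).
        self.objects = updated
        return dict(updated)
```

Because no distance limit is applied, an object leaving the frame at the same moment another enters would inherit the old ID; a practical tracker would likely add such a limit.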
Datasets
Different drone datasets are available open-source. We have modified these datasets into the following categories:
1. Drone close to the camera: Images from the different available datasets are combined so that all drones are very close to the camera.
2. Drone far from the camera: The dataset from the drone vs bird detection challenge [20] is used, which includes drone videos with a large distance between camera and drone.
$$\text{Pan\_angle} = \frac{\text{Pan\_High} - \text{Pan\_Low}}{\text{frame\_width}} \times \text{Pixel\_x} + \text{Pan\_Low}$$
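The pan-angle equation is a linear map from the horizontal pixel coordinate to the servo's angular range, and can be checked with a short sketch. The function name, the 640-pixel frame width, and the 60 to 120 degree pan range below are hypothetical values chosen only to make the arithmetic easy to follow; the paper does not state them here.

```python
def pixel_to_pan_angle(pixel_x, frame_width, pan_low, pan_high):
    """Linear map of pixel_x in [0, frame_width] to [pan_low, pan_high]."""
    return (pan_high - pan_low) / frame_width * pixel_x + pan_low

# Hypothetical setup: 640-pixel-wide frame, pan range 60 to 120 degrees.
left = pixel_to_pan_angle(0, 640, 60, 120)      # left edge maps to pan_low
centre = pixel_to_pan_angle(320, 640, 60, 120)  # centre maps to mid-range
right = pixel_to_pan_angle(640, 640, 60, 120)   # right edge maps to pan_high
```

With these values the left edge, centre, and right edge of the frame map to 60, 90, and 120 degrees respectively. A tilt angle could presumably be obtained the same way from the vertical pixel coordinate and the frame height, but that is an assumption, since only the pan equation appears in this text.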
Quantitative Comparisons
The detection algorithm is quantitatively compared with different parameters on different videos of the dataset [20]. The indexes for evaluating the performance of these techniques are as follows, where TP, FP, and FN are the numbers of true positives, false positives, and false negatives:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

Video name [20]    Binarization threshold = 50    Binarization threshold = 20
                   Precision    Recall            Precision    Recall
GOPR5842_005       0.796        0.865             0.597        0.983
GOPR5844_002       0.80         0.987             0.585        1
GOPR5847_003       0.753        0.976             0.821        0.997
GOPR5848_004       0.760        0.924             0.364        1

Table 1. Quantitative comparisons with binarization threshold

The results show that detecting an object at a far distance requires a lower binarization threshold, which reduces false negatives and improves recall. However, this increases false positives and lowers precision in most cases (three of the four videos in Table 1).

Method                                 Precision    Recall
Method proposed by Chen et al. [19]    0.275        1
Proposed method                        0.796        0.865

Table 2. Quantitative comparisons with method proposed by Chen et al. [19]

Fig. 14. Result-Two cameras detection and tracking

Fig. 15. Result-Two cameras classification

Zooming towards the object is implemented to get the required pixels on target (60x40 pixels). The image from the dynamic camera is then fed to the YOLO algorithm for classification. The frame number and pixels on target (PoT) are denoted in figure 15. At frame number 408, we get 2484 PoT, so this frame is sent to YOLO for classification. If the object is a drone, it is marked on the frame in red.

C. Single (PTZ) camera system
A single-camera detection, tracking and classification system is built using a camera attached to the pan-tilt (PT) system. Initially, the system works in detection mode, which uses the frame of the dynamic camera. Once an object is detected, the system sends the object's location to the Raspberry Pi so that it can drive the PT servo motors to keep the object at the centre of the camera, as shown in figure 16.
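The pixels-on-target rule reported for the two-camera system (zoom until the target covers the required 60x40 pixels, then classify) can be written out as a small worked check. This sketch assumes that PoT is the pixel area of the target's bounding box and that the requirement is met when PoT is at least 60 x 40 = 2400; the names `REQUIRED_POT`, `pixels_on_target`, and `next_action` are hypothetical.

```python
REQUIRED_POT = 60 * 40  # 2400 pixels, from the required 60x40 pixels on target

def pixels_on_target(box_width, box_height):
    """Assumed definition: PoT is the bounding-box area in pixels."""
    return box_width * box_height

def next_action(pot, required=REQUIRED_POT):
    """Keep zooming until enough pixels are on target, then classify."""
    return "classify" if pot >= required else "zoom"

# Frame 408 in the paper reports 2484 PoT, which exceeds 2400.
action_frame_408 = next_action(2484)
```

Under these assumptions, the 2484 PoT reported for frame 408 clears the 2400-pixel requirement, which is consistent with that frame being sent to YOLO for classification.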
Fig. 16. Result-Single camera detection and tracking

Zooming is applied to get enough PoT, after which the frame of the dynamic camera is sent to the YOLO algorithm for classification. If the object is a drone, it is marked with a red circle, as shown in figure 17.

Fig. 17. Result-Single camera classification

D. Multiple objects detection, tracking and classification
When the static camera detects multiple moving objects, a unique ID is assigned to each object detected in the first frame. The Euclidean distance between objects in two consecutive frames is then calculated. The same ID is assigned to the closest pair of objects, whereas new objects get the next ID. If an object moves out of the frame, its ID is erased from the system. Figure 18 shows four objects detected by the static camera: two birds with IDs 0 and 4, and two drones with IDs 1 and 2. The dynamic camera tracks only the one ID with the highest PoT, on the assumption that it is nearest to the system. Here, the drone with ID 2 is being followed by the dynamic camera, and the other objects are ignored until the ID 2 object has been classified.

V. CONCLUSION
This paper presents a holistic design and operational prototype of a drone detection technology based on image processing, addressing the problem of rogue drones trespassing in classified and protected locations. The proposed system accurately detects any moving object in complex backgrounds and varied environments at distances up to, but not limited to, 300 m, tracks it in real time with near-zero latency, and classifies it as a potential threat (i.e., a drone) or otherwise. The integration of purpose-designed hardware with tailored (proprietary) software has resulted in a high-performance, low-cost alternative to other detection methods. The system aims toward fully automated surveillance and protection solutions for critical infrastructure, emphasizing precision, accuracy, and reduced response time. The combination of background subtraction and frame difference has improved performance considerably compared to other similar works, as outlined and validated in the experimental results section.
The proposed system offers a solid basis for future work in the same direction toward an industry-level robust system. A fusion of this technology with existing technology, viz. RADAR or RF detection, would result in a sound and robust system with long-range, high-accuracy capability that would be very difficult to evade.

VI. REFERENCES
[1] The Bureau of Investigative Journalism. https://www.thebureauinvestigates.com/ (accessed January 2022).
[2] Bilal T., and Shoufan A. (2019). Machine learning-based drone detection and classification: State-of-the-art in research. IEEE Access 7 (2019): 138669-138682.
[3] Tiwari M., and Singhai R. (2017). A review of detection and tracking of object from image and video sequences. Int. J. Comput. Intell. Res 13.5 (2017): 745-765.
[4] Lee D., Hull J., and Erol B. (2003). A Bayesian framework for Gaussian mixture background modeling. Proceedings 2003 International Conference on Image Processing (Cat. No. 03CH37429). Vol. 3. IEEE, 2003.
[5] Reynolds, Douglas A., Thomas F. Quatieri, and Robert B. Dunn (2000). "Speaker verification using adapted
Gaussian mixture models." Digital Signal Processing 10.1-3 (2000): 19-41.
[6] Zivkovic Z. (2004). Improved adaptive Gaussian mixture model for background subtraction. Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004. Vol. 2. IEEE, 2004.
[7] Olivier B., and Droogenbroeck M. (2009). ViBe: a powerful random technique to estimate the background in video sequences. 2009 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2009.
[8] Michael H. (2002). A framework for high-level feedback to adaptive, per-pixel, mixture-of-Gaussian background models. European Conference on Computer Vision. Springer, Berlin, Heidelberg, 2002.
[9] How-Lung E., and Junxian W. (2004). Novel region-based modeling for human detection within highly dynamic aquatic environment. Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004. Vol. 2. IEEE, 2004.
[10] Jing Z. (2003). Segmenting foreground objects from a dynamic textured background via a robust Kalman filter. Proceedings Ninth IEEE International Conference on Computer Vision. IEEE, 2003.
[11] Xindi Z., and Kusrini K. (2021). Autonomous long-range drone detection system for critical infrastructure safety. Multimedia Tools and Applications 80.15 (2021): 23723-23743.
[12] Shih-Chia H., and Ben-Hsiang D. (2013). Radial basis function based neural network for motion detection in dynamic scenes. IEEE Transactions on Cybernetics 44.1 (2013): 114-125.
[13] Xiaochun C., Yang L., and Guo X. (2015). Total variation regularized RPCA for irregularly moving object detection under dynamic background. IEEE Transactions on Cybernetics 46.4 (2015): 1014-1027.
[14] Sheu B., Chiu C., Lu W., Huang C., and Chen W. (2019). Development of UAV tracing and coordinate detection method using a dual-axis rotary platform for an anti-UAV system. Applied Sciences 9.13 (2019): 2583.
[15] Ross G. (2015). Fast R-CNN. IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1440-1448, doi: 10.1109/ICCV.2015.169.
[16] Ren S., He K., Girshick R., and Sun J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems 28 (2015).
[17] Redmon J., Divvala S., Girshick R., and Farhadi A. (2016). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[18] Liu W., Anguelov D., Erhan D., Szegedy C., Reed S., Fu C., and Berg A. (2016). SSD: Single shot multibox detector. European Conference on Computer Vision. Springer, Cham, 2016.
[19] Chen S., Xu T., Li D., Zhang J., and Jiang S. (2016). Moving object detection using scanning camera on a high-precision intelligent holder. Sensors 16.10 (2016): 1758.
[20] Coluccia, A.; Fascista, A.; Schumann, A.; Sommer, L.; Dimou, A.; Zarpalas, D.; Méndez, M.; de la Iglesia, D.; González, I.; Mercier, J.-P.; Gagné, G.; Mitra, A.; Rajashekar, S. Drone vs. Bird Detection: Deep Learning Algorithms and Results from a Grand Challenge. Sensors 2021, 21, 2824. https://doi.org/10.3390/s21082824