Object Detection Pipeline
Abhinav Dadhich, ABEJA, Inc.
Tokyo Machine Learning Kitchen, 2017
Outline
1. What is Object Detection? Components?
2. Models for detection
3. Train, Test & Evaluate
4. FAQs
Object Detection
http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf
Q: What object is in the image?
A: Cat
Q: Where is the object in the image?
A: bounding box or coordinates
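The two answers above (a class label and a bounding box) can be held in one record. This is a minimal sketch; the `Box` and `Detection` names and the corner-coordinate convention are assumptions for illustration, not taken from any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates, stored as two corners (assumed convention)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def to_center_format(self):
        """Return (center_x, center_y, width, height), another common encoding."""
        width = self.x_max - self.x_min
        height = self.y_max - self.y_min
        return (self.x_min + width / 2, self.y_min + height / 2, width, height)


@dataclass(frozen=True)
class Detection:
    label: str    # answers "what object is in the image?"
    score: float  # classifier confidence in [0, 1]
    box: Box      # answers "where is the object in the image?"


cat = Detection(label="cat", score=0.97, box=Box(40, 30, 240, 180))
print(cat.label, cat.box.to_center_format())
```

Detectors differ in which box encoding they predict, so a conversion helper like `to_center_format` is usually needed somewhere in the pipeline.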
Components of a Detection Pipeline
● Dataset of images and target labels
● Pre-processing of images and labels
● Model selection and modifications
● Training
● Testing and evaluation
● Deploying the final model
Model Architecture
Feature Extractor → Classifier → Object Classification + Bounding Box
Why a Deep Neural Network?
“Fast R-CNN” Ross Girshick, ICCV’15
Deep CNN Architecture
● VGG
● Inception
● ResNet
Faster Region-based ConvNets (Faster-RCNN)
Ren et al., NIPS 2015.
Fig: Huang et al. 2016, arXiv:1611.10012v1
● 2-step process
● Higher accuracy
● Slower per-sample prediction than similar models
● Large image size
Region-based Fully Convolutional Network (R-FCN)
Dai et al., NIPS 2016.
● 2-step process
● Faster per-sample prediction time than Faster-RCNN
Fig: Huang et al. 2016, arXiv:1611.10012v1
Single Shot Detector (SSD)
Liu et al., ECCV 2016
● 1-step process
● Faster per-sample prediction time
● Suited to small images and large objects
Fig: Huang et al. 2016, arXiv:1611.10012v1
Training
● Pre-trained models are available for major deep learning frameworks.
● Fine-tune an existing model, re-initializing the output layers.
Feature Layers (CNNs) → Classification Layers (FCs, re-initialized)
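The fine-tuning idea above can be sketched without any framework: keep the pre-trained feature weights and replace only the output layers with freshly initialized ones sized for the new task. Everything here is hypothetical: the layer names, the toy weights, the Gaussian initialization, and the choice of 21 classes (assumed to mean 20 object classes plus background). Real frameworks provide their own loading and initialization routines.

```python
import random


def init_layer(n_in, n_out, rng):
    """Small random weights for a fresh fully connected layer (illustrative init)."""
    return [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]


def prepare_for_finetuning(pretrained, output_layers, num_classes, seed=0):
    """Copy pre-trained weights, then re-initialize the named output layers.

    pretrained:    dict mapping layer name -> weight matrix (rows = output units)
    output_layers: names of the classification layers to reset (hypothetical names)
    num_classes:   number of classes in the new task
    """
    rng = random.Random(seed)
    # Copy every layer so the pre-trained weights are left untouched.
    model = {name: [row[:] for row in w] for name, w in pretrained.items()}
    for name in output_layers:
        n_in = len(pretrained[name][0])  # input width is fixed by the feature layers
        model[name] = init_layer(n_in, num_classes, rng)
    return model


# Toy "pre-trained" model: one feature layer and a 1000-class output layer.
pretrained = {
    "conv_features": [[0.5, -0.2], [0.1, 0.3]],
    "fc_cls": [[0.0, 0.0] for _ in range(1000)],
}
model = prepare_for_finetuning(pretrained, ["fc_cls"], num_classes=21)
print(len(model["fc_cls"]), model["conv_features"] == pretrained["conv_features"])
```

Only the output layer changes shape; the feature layers start from the pre-trained values and are then trained further on the new dataset.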
Evaluation Metrics & Tests
● Mean Average Precision (mAP):
○ Predictions are thresholded on their Intersection over Union (IoU) score against the ground truth.
○ Averaged over all classes.
○ Higher is better.
● Prediction time: pre-processing + prediction time per image
● Memory usage: the model’s GPU/CPU memory usage during prediction
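The mAP bullets above can be made concrete with a small sketch. It assumes boxes are `(x_min, y_min, x_max, y_max)` tuples, an IoU threshold of 0.5, and the plain (non-interpolated) definition of average precision. The Pascal VOC and COCO benchmarks use their own interpolation and matching rules, so numbers from this sketch are not directly comparable to published results.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(predictions, ground_truth, iou_threshold=0.5):
    """Non-interpolated AP for one class.

    predictions:  list of (score, box)
    ground_truth: list of boxes; each one may be matched at most once
    """
    if not ground_truth:
        return 0.0
    matched = set()
    true_positives = 0
    precision_sum = 0.0
    ranked = sorted(predictions, key=lambda p: -p[0])  # highest score first
    for rank, (_, box) in enumerate(ranked, start=1):
        # Find the best still-unmatched ground-truth box for this prediction.
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
            precision_sum += true_positives / rank  # precision at this recall step
    return precision_sum / len(ground_truth)


def mean_average_precision(per_class):
    """per_class: dict mapping class name -> (predictions, ground_truth)."""
    aps = [average_precision(p, g) for p, g in per_class.values()]
    return sum(aps) / len(aps)


per_class = {
    "cat": (
        [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 31))],
        [(0, 0, 10, 10), (20, 20, 30, 30)],
    ),
    "dog": ([(0.6, (5, 5, 15, 15))], [(5, 5, 15, 15)]),
}
print(round(mean_average_precision(per_class), 4))
```

In the toy data, the second "cat" prediction overlaps no ground-truth box and counts as a false positive, which lowers the precision recorded for the third prediction.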
Datasets
● [Try] Collect a new dataset suited to the task.
● [Try] Make the labels as accurate as possible.
● If that is not possible, use public datasets:
○ MSCOCO: 80 object categories, 300K images, 5 captions per image.
○ Pascal VOC: 20 object categories, ~20K images
○ ILSVRC: 200 object categories, ~470K images
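Target labels for detection are typically one class name plus one box per object. As a sketch, the snippet below reads an annotation written in the Pascal VOC XML layout (`object` / `name` / `bndbox` elements) using only the standard library. The sample annotation is made up for illustration, and `parse_voc_annotation` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in the Pascal VOC XML layout (illustrative values only).
SAMPLE_ANNOTATION = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>353</width><height>500</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc_annotation(xml_text):
    """Return (filename, [(label, (x_min, y_min, x_max, y_max)), ...])."""
    root = ET.fromstring(xml_text.strip())
    filename = root.findtext("filename")
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        coords = tuple(int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((obj.findtext("name"), coords))
    return filename, objects


filename, objects = parse_voc_annotation(SAMPLE_ANNOTATION)
print(filename, objects)
```

A newly collected dataset can reuse the same layout, which makes it easier to plug into existing training code written for the public datasets.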
Pre-processing
● Image resizing.
● Image pixel normalization.
● Data Augmentation:
○ Flip
○ Random rotations
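For detection, pre-processing has to transform the labels together with the image: a flipped or resized image needs flipped or rescaled boxes. The sketch below shows normalization, horizontal flip, and box rescaling on a tiny grayscale image stored as nested lists. The function names, the `mean`/`std` values, and the continuous corner-coordinate convention are assumptions for illustration. Random rotations are omitted because rotating boxes needs extra care.

```python
def normalize_pixels(image, mean=0.5, std=0.5):
    """Scale 0-255 pixels to [0, 1], then standardize (mean/std are placeholder values)."""
    return [[(p / 255.0 - mean) / std for p in row] for row in image]


def horizontal_flip(image, boxes):
    """Mirror the image left-right and mirror each (x_min, y_min, x_max, y_max) box."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    # The old right edge becomes the new left edge, measured from the other side.
    flipped_boxes = [(width - x_max, y_min, width - x_min, y_max)
                     for (x_min, y_min, x_max, y_max) in boxes]
    return flipped, flipped_boxes


def resize_boxes(boxes, old_size, new_size):
    """Rescale boxes when an image is resized; sizes are (width, height)."""
    sx = new_size[0] / old_size[0]
    sy = new_size[1] / old_size[1]
    return [(x_min * sx, y_min * sy, x_max * sx, y_max * sy)
            for (x_min, y_min, x_max, y_max) in boxes]


image = [[0, 64, 128, 255],
         [255, 128, 64, 0]]   # 4 pixels wide, 2 pixels high
boxes = [(0, 0, 1, 2)]        # object occupies the left-most column

flipped_image, flipped_boxes = horizontal_flip(image, boxes)
print(flipped_boxes)                                   # box moves to the right-most column
print(resize_boxes(flipped_boxes, (4, 2), (8, 4)))     # box after doubling the image size
print(normalize_pixels(image)[0][0], normalize_pixels(image)[0][3])
```

Applying the flip with some probability per training sample is a cheap way to enlarge the effective dataset without collecting new labels.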
Code and Pre-trained Models
● Faster-RCNN: https://github.com/rbgirshick/py-faster-rcnn (Caffe)
● R-FCN : https://github.com/Orpine/py-R-FCN (Caffe)
● SSD: https://github.com/weiliu89/caffe/tree/ssd (Caffe)
Thank You
