Lecture 5 - CNNs For Detection and Segmentation

Uploaded by

Syed Rafay Hashmi

High Level Computer Vision

Object Detection and Segmentation

May 22, 2024

Bernt Schiele

https://cms.sic.saarland/hlcvss24/

Max Planck Institute for Informatics & Saarland University,

Saarland Informatics Campus Saarbrücken

So far: Image Classification

Vector: 4096 → Fully-Connected: 4096 to 1000 → Class Scores
Cat: 0.9
Dog: 0.05
Car: 0.01
...
This image is CC0 public domain

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - 6 May 10, 2018

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

High Level Computer Vision | Bernt Schiele 2

Other Computer Vision Tasks

Semantic Segmentation: GRASS, CAT, TREE, SKY (no objects, just pixels)
Classification + Localization: CAT (single object)
Object Detection: DOG, DOG, CAT (multiple objects)
Instance Segmentation: DOG, DOG, CAT (multiple objects)
This image is CC0 public domain

Semantic Segmentation

Label each pixel in the image with a category label.
Don't differentiate instances, only care about pixels.
[Figure: two per-pixel labeled images. Left: Sky, Trees, Cat, Grass. Right: Sky, Trees, Cow, Grass. This image is CC0 public domain.]

Semantic Segmentation Idea: Sliding Window

Full image → Extract patch → Classify center pixel with CNN (e.g. Cow, Cow, Grass)
Problem: Very inefficient! Not reusing shared features between overlapping patches.

Farabet et al, “Learning Hierarchical Features for Scene Labeling,” TPAMI 2013
Pinheiro and Collobert, “Recurrent Convolutional Neural Networks for Scene Labeling”, ICML 2014

Semantic Segmentation Idea: Fully Convolutional

Design a network as a bunch of convolutional layers to make predictions for pixels all at once!

Input: 3 x H x W → Conv → Conv → Conv → Conv → Scores: C x H x W → argmax → Predictions: H x W
Convolutions: D x H x W
Problem: convolutions at original image resolution will be very expensive ...
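
The last step of the fully convolutional pipeline, turning C x H x W scores into H x W predictions, can be sketched as below. This is a minimal illustration, assuming random numbers as a stand-in for the network output; the sizes C, H, W are hypothetical.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
C, H, W = 3, 4, 5                      # number of classes, image height, image width
rng = np.random.default_rng(0)
scores = rng.normal(size=(C, H, W))    # stand-in for the network's C x H x W class scores

# Per-pixel prediction: index of the highest-scoring class at every location.
predictions = scores.argmax(axis=0)    # shape H x W, values in 0 .. C-1
print(predictions.shape)
```

Every pixel gets exactly one category label, which is why instances of the same class are not separated.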

Semantic Segmentation Idea: Fully Convolutional

Design network as a bunch of convolutional layers, with downsampling and upsampling inside the network!

Downsampling: Pooling, strided convolution
Upsampling: ???

Input: 3 x H x W
High-res: D1 x H/2 x W/2
Med-res: D2 x H/4 x W/4
Low-res: D3 x H/8 x W/8
Med-res: D2 x H/4 x W/4
High-res: D1 x H/2 x W/2
Predictions: H x W

Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015
Noh et al, “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015

In-Network Upsampling: “Unpooling”

Nearest Neighbor
Input: 2 x 2          Output: 4 x 4
1 2                   1 1 2 2
3 4                   1 1 2 2
                      3 3 4 4
                      3 3 4 4

“Bed of Nails”
Input: 2 x 2          Output: 4 x 4
1 2                   1 0 2 0
3 4                   0 0 0 0
                      3 0 4 0
                      0 0 0 0
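
The two fixed unpooling schemes can be reproduced in a few lines. This is a sketch using NumPy; the function names are hypothetical, and the 2 x 2 input is the one from the slide.

```python
import numpy as np

def nearest_neighbor_unpool(x, factor=2):
    # Copy every element into a factor x factor block.
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def bed_of_nails_unpool(x, factor=2):
    # Put every element in the top-left corner of its block, zeros elsewhere.
    out = np.zeros((x.shape[0] * factor, x.shape[1] * factor), dtype=x.dtype)
    out[::factor, ::factor] = x
    return out

x = np.array([[1, 2],
              [3, 4]])
print(nearest_neighbor_unpool(x))
print(bed_of_nails_unpool(x))
```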

In-Network Upsampling: “Max Unpooling”

Max Pooling: remember which element was max!
Input: 4 x 4          Output: 2 x 2
1 2 6 3               5 6
3 5 2 1               7 8
1 2 2 1
7 3 4 8

… Rest of the network …

Max Unpooling: use positions from pooling layer
Input: 2 x 2          Output: 4 x 4
1 2                   0 0 2 0
3 4                   0 1 0 0
                      0 0 0 0
                      3 0 0 4

Corresponding pairs of downsampling and upsampling layers
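
Max unpooling needs the pooling layer to hand over the positions of its maxima. A minimal NumPy sketch of such a pooling/unpooling pair follows, using the numbers from the slide; the function names and the flat-index bookkeeping are illustrative choices, not a specific library's API.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2; also returns the flat input index of each max."""
    h, w = x.shape
    # Rearrange to (h/2, w/2, 4): the 4 values of each 2x2 block side by side.
    blocks = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(h // 2, w // 2, 4)
    arg = blocks.argmax(axis=2)                        # position inside the block, 0..3
    rows = 2 * np.arange(h // 2)[:, None] + arg // 2   # row of the max in the input
    cols = 2 * np.arange(w // 2)[None, :] + arg % 2    # column of the max in the input
    return blocks.max(axis=2), rows * w + cols

def max_unpool_2x2(y, indices, out_shape):
    """Write each value of y back to the remembered position; zeros elsewhere."""
    out = np.zeros(out_shape, dtype=y.dtype)
    np.put(out, indices.ravel(), y.ravel())
    return out

x = np.array([[1, 2, 6, 3],
              [3, 5, 2, 1],
              [1, 2, 2, 1],
              [7, 3, 4, 8]])
pooled, idx = max_pool_2x2(x)                          # pooled is [[5, 6], [7, 8]]
later = np.array([[1, 2],
                  [3, 4]])                             # low-res input further down the network
unpooled = max_unpool_2x2(later, idx, x.shape)
print(unpooled)
```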

Learnable Upsampling: Transpose Convolution

Recall: Normal 3 x 3 convolution, stride 1 pad 1
Dot product between filter and input
Input: 4 x 4 → Output: 4 x 4

Learnable Upsampling: Transpose Convolution

Recall: Normal 3 x 3 convolution, stride 2 pad 1
Dot product between filter and input
Filter moves 2 pixels in the input for every one pixel in the output
Stride gives ratio between movement in input and output
Input: 4 x 4 → Output: 2 x 2
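
The output sizes quoted on the two "Recall" slides (4 x 4 for stride 1, 2 x 2 for stride 2) follow from the usual output-size rule for convolutions. The formula below is the standard one and is an addition here, not something stated on the slides; the function name is hypothetical.

```python
def conv_output_size(n, kernel, stride, pad):
    # Number of positions a filter of width `kernel` can take along one
    # dimension of a length-n input that is zero-padded by `pad` on both sides.
    return (n + 2 * pad - kernel) // stride + 1

print(conv_output_size(4, kernel=3, stride=1, pad=1))  # 3 x 3 conv, stride 1, pad 1
print(conv_output_size(4, kernel=3, stride=2, pad=1))  # 3 x 3 conv, stride 2, pad 1
```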

Learnable Upsampling: Transpose Convolution

3 x 3 transpose convolution, stride 2 pad 1
Input gives weight for filter
Sum where output overlaps
Filter moves 2 pixels in the output for every one pixel in the input
Stride gives ratio between movement in output and input
Input: 2 x 2 → Output: 4 x 4

Other names:
- Deconvolution (bad)
- Upconvolution
- Fractionally strided convolution
- Backward strided convolution

Learnable Upsampling: 1D Example

Input: (a, b)
Filter: (x, y, z)
Output: (ax, ay, az + bx, by, bz)

Output contains copies of the filter weighted by the input, summing where the copies overlap in the output
Need to crop one pixel from output to make output exactly 2x input
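
The 1D example can be checked numerically. The sketch below assumes stride 2 and no padding, and uses arbitrary numbers for a, b and x, y, z so that each product is easy to spot; the function name is hypothetical.

```python
import numpy as np

def transpose_conv1d(inp, filt, stride=2):
    k = len(filt)
    out = np.zeros((len(inp) - 1) * stride + k)
    for i, value in enumerate(inp):
        # Paste a copy of the filter scaled by the input value,
        # summing wherever neighbouring copies overlap.
        out[i * stride:i * stride + k] += value * filt
    return out

a, b = 2.0, 3.0
x, y, z = 1.0, 10.0, 100.0
out = transpose_conv1d(np.array([a, b]), np.array([x, y, z]))
print(out)       # (ax, ay, az + bx, by, bz)
print(out[:-1])  # cropped by one pixel: exactly 2x the input length
```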

Convolution as Matrix Multiplication (1D Example)

We can express convolution in terms of a matrix multiplication.
Example: 1D conv, kernel size=3, stride=1, padding=1
1D convolution kernel (x, y, z) expanded into a matrix, applied to a 1D input (e.g. image):

$$X = \begin{bmatrix} x & y & z & 0 & 0 & 0 \\ 0 & x & y & z & 0 & 0 \\ 0 & 0 & x & y & z & 0 \\ 0 & 0 & 0 & x & y & z \end{bmatrix}$$

Convolution transpose multiplies by the transpose of the same matrix.
When stride=1, convolution transpose is just a regular convolution (with different padding rules)

Convolution as Matrix Multiplication (1D Example)

We can express convolution in terms of a matrix multiplication.
Example: 1D conv, kernel size=3, stride=2, padding=1

$$X = \begin{bmatrix} x & y & z & 0 & 0 & 0 \\ 0 & 0 & x & y & z & 0 \end{bmatrix}$$

Convolution transpose multiplies by the transpose of the same matrix.
When stride>1, convolution transpose is no longer a normal convolution!
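
The matrix view can be verified directly. The sketch below builds the kernel matrix for stride 1 and stride 2 (padding 1, input length 4), checks that multiplying by it matches a sliding dot product, and then multiplies by the transpose. The helper `conv_matrix` and the example numbers are hypothetical.

```python
import numpy as np

def conv_matrix(kernel, n_in, stride=1, pad=1):
    """Rows are shifted copies of the kernel; the matrix acts on the zero-padded input."""
    k = len(kernel)
    n_padded = n_in + 2 * pad
    n_out = (n_padded - k) // stride + 1
    X = np.zeros((n_out, n_padded))
    for r in range(n_out):
        X[r, r * stride:r * stride + k] = kernel
    return X

kernel = np.array([1.0, 10.0, 100.0])            # (x, y, z)
signal = np.array([1.0, 0.0, 2.0, 1.0])          # 1D input of length 4
padded = np.pad(signal, 1)                       # padding=1 on both sides

X1 = conv_matrix(kernel, 4, stride=1)            # 4 x 6 matrix
X2 = conv_matrix(kernel, 4, stride=2)            # 2 x 6 matrix

# Multiplying by X1 is the same as sliding the kernel over the padded input.
same = np.allclose(X1 @ padded, np.correlate(padded, kernel, mode="valid"))

# Transpose convolution: multiply a length-2 input by the transpose of X2.
upsampled = X2.T @ np.array([2.0, 3.0])
print(same, upsampled)
```

With stride 2 the transpose spreads two shifted, overlapping copies of the kernel into the longer output, which is the same pattern as in the 1D upsampling example.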

Semantic Segmentation Idea: Fully Convolutional

Design network as a bunch of convolutional layers, with downsampling and upsampling inside the network!

Downsampling: Pooling, strided convolution
Upsampling: Unpooling or strided transpose convolution

Input: 3 x H x W
High-res: D1 x H/2 x W/2
Med-res: D2 x H/4 x W/4
Low-res: D3 x H/8 x W/8
Med-res: D2 x H/4 x W/4
High-res: D1 x H/2 x W/2
Predictions: H x W

Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015
Noh et al, “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015

What does a U-Net do?

Learns Segmentation
Input Image → Output Segmentation Map

U-Net Architecture

Ronneberger et al. (2015) U-net Architecture

“Contraction” Phase
- Increases field of view
- Lose Spatial Information

“Expansion” Phase
- Create High Resolution Mapping

Concatenate with high-resolution feature maps from the Contraction Phase

U-Net Summary

• Contraction Phase
‣ Reduces the spatial dimension, but increases the “what.”

• Expansion Phase
‣ Recovers object details and the dimensions, which is the “where.”

• Concatenating feature maps from the Contraction phase helps the Expansion phase with recovering the “where” information.
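
The concatenation step can be illustrated at the level of array shapes. This is only a sketch: it assumes 2 x 2 max pooling for contraction and nearest-neighbor upsampling as a stand-in for the learned up-convolution, with hypothetical channel counts and no convolutions in between.

```python
import numpy as np

def downsample(x):
    # 2x2 max pooling with stride 2 on a (channels, H, W) array.
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample(x):
    # Nearest-neighbor upsampling by 2 (stand-in for a learned up-convolution).
    return x.repeat(2, axis=1).repeat(2, axis=2)

rng = np.random.default_rng(0)
high_res = rng.normal(size=(8, 16, 16))   # hypothetical contraction-phase feature map
low_res = downsample(high_res)            # (8, 8, 8): less "where"
expanded = upsample(low_res)              # (8, 16, 16): resolution back, detail lost

# Skip connection: concatenate along the channel axis so later layers
# see both the upsampled features and the original high-resolution ones.
merged = np.concatenate([high_res, expanded], axis=0)
print(low_res.shape, expanded.shape, merged.shape)
```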

Classification + Localization

Treat localization as a regression problem!

Vector: 4096 (often pretrained on ImageNet; transfer learning)
→ Fully Connected: 4096 to 1000 → Class Scores (Cat: 0.9, Dog: 0.05, Car: 0.01, ...) → Softmax Loss, correct label: Cat
→ Fully Connected: 4096 to 4 → Box Coordinates (x, y, w, h) → L2 Loss, correct box: (x’, y’, w’, h’)
Multitask Loss = Softmax Loss + L2 Loss
This image is CC0 public domain
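
The multitask loss is the sum of a softmax loss on the class scores and an L2 loss on the box coordinates. A minimal sketch for a single image follows; the `box_weight` factor is an assumption added for illustration (the slide simply adds the two losses), and all function names are hypothetical.

```python
import numpy as np

def softmax_loss(scores, label):
    # Cross-entropy of the softmax over raw class scores (numerically stabilized).
    shifted = scores - scores.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def l2_loss(box, target_box):
    # Squared error between predicted and correct (x, y, w, h).
    return np.sum((box - target_box) ** 2)

def multitask_loss(scores, label, box, target_box, box_weight=1.0):
    # box_weight is a hypothetical knob for balancing the two terms.
    return softmax_loss(scores, label) + box_weight * l2_loss(box, target_box)

scores = np.zeros(4)                       # uniform scores over 4 hypothetical classes
box = np.array([0.0, 0.0, 1.0, 1.0])
target = np.array([0.0, 0.0, 1.0, 3.0])
print(multitask_loss(scores, 0, box, target))
```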

Object Detection as Regression?

CAT: (x, y, w, h) → 4 numbers
DOG: (x, y, w, h), DOG: (x, y, w, h), CAT: (x, y, w, h) → 12 numbers
DUCK: (x, y, w, h), DUCK: (x, y, w, h), … → Many numbers!
Each image needs a different number of outputs!

Object Detection as Classification: Sliding Window

Apply a CNN to many different crops of the image, CNN classifies each crop as object or background

Example crops:
- Dog? NO, Cat? NO, Background? YES
- Dog? YES, Cat? NO, Background? NO
- Dog? YES, Cat? NO, Background? NO
- Dog? NO, Cat? YES, Background? NO

Problem: Need to apply CNN to huge number of locations, scales, and aspect ratios, very computationally expensive!

Region Proposals / e.g. Selective Search

● Find “blobby” image regions that are likely to contain objects
● Relatively fast to run; e.g. Selective Search gives 2000 region proposals in a few seconds on CPU

Alexe et al, “Measuring the objectness of image windows”, TPAMI 2012
Uijlings et al, “Selective Search for Object Recognition”, IJCV 2013
Cheng et al, “BING: Binarized normed gradients for objectness estimation at 300fps”, CVPR 2014
Zitnick and Dollar, “Edge boxes: Locating object proposals from edges”, ECCV 2014

Region Proposal Step

Selective Search: Motivation

• Many approaches (at the time) use exhaustive search:

‣ visit every location in an image
‣ problem: computationally expensive, which forces two compromises:
- number of possible locations should be small
-> number of grid locations & aspect ratio(s) need to be small
- evaluation cost per location should be low
-> simple features / classifiers

• to go beyond this, we should aim for something more “sophisticated”

Selective Search: Main Design Criteria

Selective Search: How to Obtain High Recall?

Selective Search: Method
• compute similarity measure between all adjacent region pairs a and b (e.g.) as:
$$S(a, b) = \alpha\, S_{\text{size}}(a, b) + (1 - \alpha)\, S_{\text{color}}(a, b)$$
‣ with
$$S_{\text{size}}(a, b) = 1 - \frac{\text{size}(a) + \text{size}(b)}{\text{size}(\text{image})}$$
encourages small regions to merge early
‣ and
$$S_{\text{color}}(a, b) = \sum_{k=1}^{n} \min(a_k, b_k)$$
$a_k$, $b_k$ are color histograms, encouraging “similar (color)” regions to merge
‣ for slightly more elaborate similarities see their IJCV paper
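
The region similarity can be sketched as below. It assumes the size term is one minus the combined size fraction, the color term is a histogram intersection over normalized histograms, and the two are mixed with a weight alpha; the default alpha = 0.5 and the function names are illustrative assumptions, not values from the slides.

```python
import numpy as np

def s_size(size_a, size_b, size_image):
    # Close to 1 when both regions are small, so small regions merge early.
    return 1.0 - (size_a + size_b) / size_image

def s_color(hist_a, hist_b):
    # Histogram intersection; assumes both histograms are normalized to sum to 1.
    return np.minimum(hist_a, hist_b).sum()

def similarity(size_a, size_b, size_image, hist_a, hist_b, alpha=0.5):
    # alpha is a hypothetical mixing weight between the size and color terms.
    return (alpha * s_size(size_a, size_b, size_image)
            + (1.0 - alpha) * s_color(hist_a, hist_b))

hist_a = np.array([0.5, 0.5, 0.0])
hist_b = np.array([0.25, 0.25, 0.5])
print(similarity(10, 30, 100, hist_a, hist_b))
```

In a full grouping loop the most similar adjacent pair would be merged and the similarities to its neighbours recomputed, repeating until one region remains.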

Selective Search: High Recall Revisited

Selective Search: Evaluation of Object Hypotheses

R-CNN: Region Based CNN

Girshick et al, “Rich feature hierarchies for accurate object detection and semantic segmentation”, CVPR 2014.
Figure copyright Ross Girshick, 2015; source. Reproduced with permission.

R-CNN (Region Based CNN): Problems

• Ad hoc training objectives
‣ Fine-tune network with softmax classifier (log loss)
‣ Train post-hoc linear SVMs (hinge loss)
‣ Train post-hoc bounding-box regressions (least squares)
• Training is slow (84h), takes a lot of disk space
• Inference (detection) is slow
‣ 47s / image with VGG16 [Simonyan & Zisserman. ICLR15]
‣ Fixed by SPP-net [He et al. ECCV14]

Girshick et al, “Rich feature hierarchies for accurate object detection and semantic segmentation”, CVPR 2014.
Slide copyright Ross Girshick, 2015; source. Reproduced with permission.

Fast R-CNN

Girshick, “Fast R-CNN”, ICCV 2015.
Figure copyright Ross Girshick, 2015; source. Reproduced with permission.

R-CNN vs. SPP vs. Fast R-CNN

Problem: Runtime dominated by region proposals!

Girshick et al, “Rich feature hierarchies for accurate object detection and semantic segmentation”, CVPR 2014.
He et al, “Spatial pyramid pooling in deep convolutional networks for visual recognition”, ECCV 2014
Girshick, “Fast R-CNN”, ICCV 2015

Faster R-CNN: Make CNN do proposals!

Insert Region Proposal Network (RPN) to predict proposals from features

Jointly train with 4 losses:

1. RPN classify object / not object
2. RPN regress box coordinates
3. Final classification score (object classes)
4. Final box coordinates

Ren et al, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, NIPS 2015
Figure copyright 2015, Ross Girshick; reproduced with permission

Region Proposal Network (RPN)

• components
‣ 3x3 sliding window
‣ 256-dimensional vector for each location (convolutions)
‣ k anchor boxes, e.g. k=9:
- 3 scales x 3 aspect ratios
‣ for each box
- class score (here 2-class softmax) for (any) object present or not
- 4 coordinates for bounding box
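
How k = 9 anchors arise from 3 scales x 3 aspect ratios can be sketched as follows. The concrete scales (128, 256, 512 pixels) and ratios (1:2, 1:1, 2:1) are assumptions taken for illustration, and `make_anchors` is a hypothetical helper, not part of any particular library.

```python
import numpy as np

def make_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return one (width, height) pair per scale/ratio combination."""
    anchors = []
    for s in scales:
        for r in ratios:                  # r = height / width
            w = s / np.sqrt(r)            # keeps the area at s * s
            h = s * np.sqrt(r)
            anchors.append((w, h))
    return np.array(anchors)

anchors = make_anchors()
k = len(anchors)
# Per sliding-window location the RPN predicts, for each anchor,
# 2 class scores (object / not object) and 4 box coordinates.
outputs_per_location = k * (2 + 4)
print(k, outputs_per_location)
```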


Object Detection vs. Instance Segmentation

Object Detection: DOG, DOG, CAT (multiple objects)
Instance Segmentation: DOG, DOG, CAT (multiple objects)

Mask R-CNN = Faster R-CNN + Segmentation Output for each RoI

CNN → RoI Align → outputs for each RoI:
- Classification Scores: C
- Box coordinates (per class): 4 * C
- Mask: 256 x 14 x 14 → Conv → 256 x 14 x 14 → Conv → C x 14 x 14 (predict a mask for each of C classes)

He et al, “Mask R-CNN”, arXiv 2017

Mask R-CNN: Very Good Results!

He et al, “Mask R-CNN”, arXiv 2017
Figures copyright Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick, 2017. Reproduced with permission.

Mask R-CNN: Also does Pose

Detection without Proposals: YOLO (alternative: SSD)

Go from input image to tensor of scores with one big convolutional network!

Input image: 3 x H x W
Divide image into grid: 7 x 7
Imagine a set of base boxes centered at each grid cell. Here B = 3

Within each grid cell:
- Regress from each of the B base boxes to a final box with 5 numbers: (dx, dy, dh, dw, confidence)
- Predict scores for each of C classes (including background as a class)

Output: 7 x 7 x (5 * B + C)

Redmon et al, “You Only Look Once: Unified, Real-Time Object Detection”, CVPR 2016
Liu et al, “SSD: Single-Shot MultiBox Detector”, ECCV 2016
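
The shape of the output tensor, 7 x 7 x (5 * B + C), can be unpacked as in the sketch below. B = 3 comes from the slide; C = 20 is a hypothetical class count, and the channel ordering (box numbers first, class scores last) is an assumption made for illustration.

```python
import numpy as np

S, B, C = 7, 3, 20                        # grid size, base boxes per cell, classes (C is hypothetical)
output = np.zeros((S, S, 5 * B + C))      # stand-in for the network output

# Assumed layout: first 5 * B channels hold (dx, dy, dh, dw, confidence)
# for each base box, the remaining C channels hold the class scores.
boxes = output[..., :5 * B].reshape(S, S, B, 5)
class_scores = output[..., 5 * B:]
print(output.shape, boxes.shape, class_scores.shape)
```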

Detection without Proposals: YOLO

slide credit: https://youtu.be/YmMZkCstui0

YOLO Family

YOLO (CVPR 2016)
YOLO v2 / YOLO9000 (CVPR 2017): Anchor boxes, Batch normalization, …; Darknet-19; Joint training: both detection and classification
YOLO v3 (arXiv 2018): Darknet-53
YOLO v4 (arXiv 2020)
YOLO v5 (GitHub 2020)

Object Detection: Impact of Deep Learning

Reproduced with permission.
