Davide Scaramuzza
University of Zurich
Robotics and Perception Group
http://rpg.ifi.uzh.ch/
Copyright of Davide Scaramuzza - davide.scaramuzza@ieee.org - https://sites.google.com/site/scarabotix/
Scaramuzza, D., Fraundorfer, F., Visual Odometry: Part I - The First 30 Years and
Fundamentals, IEEE Robotics and Automation Magazine, Volume 18, issue 4, 2011.
Fraundorfer, F., Scaramuzza, D., Visual Odometry: Part II - Matching, Robustness, and
Applications, IEEE Robotics and Automation Magazine, Volume 19, issue 1, 2012.
VO is the process of incrementally estimating the pose of the vehicle by
examining the changes that motion induces on the images of its onboard
cameras
Input: image sequence (or video stream) from one or more cameras attached to a moving vehicle
Output: camera trajectory (3D structure is a plus)
Sufficient illumination in the environment
Dominance of static scene over moving objects
Enough texture to allow apparent motion to be extracted
Sufficient scene overlap between consecutive frames
Is any of these scenes good for VO? Why?
Contrary to wheel odometry, VO is not affected by wheel slip in
uneven terrain or other adverse conditions.
More accurate trajectory estimates compared
to wheel odometry
(relative position error 0.1% − 2%)
VO can be used as a complement to
wheel odometry
GPS
inertial measurement units (IMUs)
laser odometry
In GPS-denied environments,
such as underwater and aerial settings,
VO is of utmost importance
VO computes the camera path incrementally (pose after pose)
Pipeline: image sequence → feature detection → feature matching (tracking) → motion estimation (2D-2D, 3D-3D, or 3D-2D) → local optimization
[Figure: SIFT feature tracks; camera poses Ck-1, Ck, Ck+1 linked by the relative transformations Tk,k-1 and Tk+1,k]
SFM is more general than VO and tackles the problem of 3D reconstruction of
both the structure and camera poses from unordered image sets
The final structure and camera poses are typically refined with an offline
optimization (i.e., bundle adjustment), whose computation time grows with the
number of images
This video can be seen at
http://youtu.be/kxtQqYLRaSQ
Reconstruction from 3 million images from Flickr.com
Cluster of 250 computers, 24 hours of computation!
Paper: “Building Rome in a Day”, ICCV’09
VO is a particular case of SFM
VO focuses on estimating the 3D motion of the camera
sequentially (as a new frame arrives) and in real time.
Bundle adjustment can be used (but it’s optional) to refine the
local estimate of the trajectory
Terminology: sometimes SFM is used as a synonym of VO
Before loop closing After loop closing
Image courtesy of Clemente et al. RSS’07
VO aims only at the local consistency of the
trajectory
SLAM aims at the global consistency of the
trajectory and of the map
Image courtesy of Clemente et al. RSS’07
VO can be used as a building block of SLAM
Visual odometry
VO is SLAM before closing the loop!
The choice between VO and V-SLAM depends on
the tradeoff between performance and
consistency, and simplicity in implementation.
VO trades off consistency for real-time
performance, without the need to keep track of
all the previous history of the camera.
Visual SLAM
Brief history of VO
Problem formulation
Camera modeling and calibration
Motion estimation
Robust estimation
Error propagation
Camera-pose optimization (bundle adjustment)
Discussion
1996: The term VO was coined by Srinivasan to describe motion orientation
in honey bees.
1980: First known real-time stereo VO implementation on a robot, in Moravec's
PhD thesis (NASA/JPL), for Mars rovers using a sliding camera. Moravec also invented a
predecessor of the Harris detector, known as the Moravec detector
1980 to 2000: VO research was dominated by NASA/JPL in preparation for the
2004 Mars mission (see papers by Matthies, Olson, and others from JPL)
2004: VO was used on a robot on another planet: the Mars rovers Spirit and Opportunity
2004: VO was revived in academia
by Nister's «Visual Odometry» paper.
The term VO became popular.
[Figure: camera poses Ck-1, Ck, Ck+1 linked by the relative transformations Tk,k-1 and Tk+1,k; the trajectory is the sequence of poses C0, C1, ..., Cn-1, Cn]
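The incremental formulation can be illustrated with a minimal sketch. It assumes that each pose Ck and each relative transformation Tk,k-1 is a 4x4 homogeneous matrix and that poses are chained as Ck = Ck-1 Tk,k-1 (the convention of the Part I tutorial cited at the start). The function names `make_transform` and `concatenate` are hypothetical.

```python
import numpy as np

def make_transform(yaw, t):
    """Build a 4x4 homogeneous transform from a yaw angle (rad) and a translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = t
    return T

def concatenate(C0, relative_transforms):
    """Return the poses C_0..C_n obtained by chaining C_k = C_{k-1} @ T_k."""
    poses = [C0]
    for T in relative_transforms:
        poses.append(poses[-1] @ T)
    return poses

# Four identical steps: move 1 m along the current x axis, then turn 90 degrees left.
step = make_transform(np.pi / 2, [1.0, 0.0, 0.0])
trajectory = concatenate(np.eye(4), [step] * 4)
print(np.round(trajectory[2][:3, 3], 6))  # position after two steps
```

Because every pose is built from the previous one, any error in a single Tk,k-1 is carried into all later poses, which is why drift accumulates.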
[Figure: thin-lens model with scene point P at distance z from the lens, image point p' at distance z' behind it, and focal length f]
$\frac{1}{z} + \frac{1}{z'} = \frac{1}{f}$
[Figure: pinhole approximation with scene point P, image point p', and the points labeled F and C]
For a distant point ($z \gg z'$): $\frac{1}{z'} \approx \frac{1}{f} \;\Rightarrow\; z' \approx f$
[Figures: camera models with a single effective viewpoint; a scene point (P or X) projects to the image point p with coordinates (u, v) on the image plane]
Always possible after the camera has been calibrated!
For convenience, points are projected on the unit sphere. Why?
In the perspective case, is it better to use the perspective or the
spherical model?
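As a small illustration of projecting image points onto the unit sphere, the sketch below assumes a calibrated perspective (pinhole) camera with intrinsic matrix K. The values in K and the function name `pixel_to_bearing` are hypothetical. For an omnidirectional camera the back-projection function would differ, but the final normalization step is the same.

```python
import numpy as np

def pixel_to_bearing(u, v, K):
    """Back-project pixel (u, v) through the intrinsics K and normalize to unit length."""
    ray = np.linalg.solve(K, np.array([u, v, 1.0]))
    return ray / np.linalg.norm(ray)

# Hypothetical intrinsics for a 640 x 480 perspective camera.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

center = pixel_to_bearing(320.0, 240.0, K)  # principal point maps to the optical axis
corner = pixel_to_bearing(0.0, 0.0, K)      # image corner maps to an oblique ray
```

Unit-sphere vectors give perspective and omnidirectional cameras one common representation, so the same motion-estimation code can be used for both.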
Image courtesy of Micusik & Pajdla, ACCV’04
Makadia et al. «Correspondence-free structure from motion», IJCV’07
Global methods are less accurate than feature-based methods and
are computationally more expensive.
Feature-based methods require the ability to match (or track)
robustly features across frames but are faster and more accurate
than global methods. Therefore, most VO implementations are
feature based.
Image courtesy of Makadia et al., IJCV’07
Motion estimation: 2D-2D
Both 𝑓𝑘−1 and 𝑓𝑘 are specified in 2D
The minimal-case solution involves 5-point correspondences
The solution is found by determining the transformation that
minimizes the reprojection error of the triangulated points in each image
Motion estimation: 3D-3D
Both 𝑓𝑘−1 and 𝑓𝑘 are specified in 3D
To do this, it is necessary to triangulate 3D points (e.g. use a stereo
camera)
The minimal-case solution involves 3 non-collinear correspondences
The solution is found by determining the aligning transformation that
minimizes the 3D-3D distance
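One common closed-form way to compute the aligning transformation is the SVD-based method of Arun et al. The sketch below uses that method as an assumption, since the slides do not say which solver is intended. The function name `align_3d_3d` and the test motion are hypothetical, and the data are noise-free.

```python
import numpy as np

def align_3d_3d(X_prev, X_curr):
    """Least-squares rigid motion (R, t) such that X_curr ~ R @ X_prev + t.

    X_prev, X_curr: (n, 3) arrays of corresponding 3D points, n >= 3, non-collinear.
    """
    mu_prev, mu_curr = X_prev.mean(axis=0), X_curr.mean(axis=0)
    H = (X_prev - mu_prev).T @ (X_curr - mu_curr)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (determinant -1) in the SVD solution.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_curr - R @ mu_prev
    return R, t

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Three non-collinear points (the minimal case) and a hypothetical ground-truth motion.
X_prev = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.5], [0.0, 1.0, 3.0]])
R_true = rot_x(0.2) @ rot_z(0.3)
t_true = np.array([0.1, -0.2, 0.05])
X_curr = X_prev @ R_true.T + t_true

R_est, t_est = align_3d_3d(X_prev, X_curr)
```

With noisy triangulated points the same function returns the least-squares fit, and more than three correspondences would normally be used.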
Motion estimation: 3D-2D
𝑓𝑘−1 is specified in 3D and 𝑓𝑘 in 2D
This problem is known as camera resection or PnP (perspective from
n points)
The minimal-case solution involves 3 correspondences
The solution is found by determining the transformation that
minimizes the reprojection error
In the monocular case, the 3D structure needs to be triangulated
from two adjacent camera views (e.g., 𝐼𝑘−2 and 𝐼𝑘−1 ) and then
matched to 2D image features in a third view (e.g., 𝐼𝑘 ).
Motion estimation: 2D-2D
[Figure: epipolar plane defined by the two camera centers and the scene point]
Image coordinates on the unit sphere:
$p_1 = [x_1, y_1, z_1]^T, \quad p_2 = [x_2, y_2, z_2]^T$
With $p_1' = R p_1$:
$p_2^T (t \times p_1') = 0 \;\Rightarrow\; p_2^T (t \times (R p_1)) = 0 \;\Rightarrow\; p_2^T [t]_\times R \, p_1 = 0 \;\Rightarrow\; p_2^T E \, p_1 = 0$ (epipolar constraint)
$E = [t]_\times R$ (essential matrix)
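The epipolar constraint can be checked numerically. The sketch below assumes that a scene point with coordinates X1 in the first camera frame has coordinates X2 = R X1 + t in the second. The motion values and the helper name `skew` are hypothetical.

```python
import numpy as np

def skew(t):
    """Return the skew-symmetric matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical relative motion: small rotation about y, translation mostly along x.
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([1.0, 0.0, 0.2])
E = skew(t) @ R  # essential matrix E = [t]_x R

X1 = np.array([0.5, -0.3, 4.0])  # scene point in the first camera frame
X2 = R @ X1 + t                  # same point in the second camera frame
p1 = X1 / np.linalg.norm(X1)     # bearing vectors on the unit sphere
p2 = X2 / np.linalg.norm(X2)

residual = p2 @ E @ p1           # zero (up to rounding) for a correct match
```

For a wrong match the residual is generally non-zero, which is what makes the constraint usable as an outlier test.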
The epipolar constraint is linear in the entries of E. Stacking the constraints from several point correspondences gives a homogeneous linear system, which can be solved with SVD
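A hedged sketch of such a linear solution follows, in the style of the classic eight-point algorithm. It is an illustration, not the minimal 5-point solver mentioned earlier. The data are synthetic and noise-free, and the names `estimate_essential` and `skew` as well as the motion values are hypothetical.

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]_x."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def estimate_essential(p1, p2):
    """Estimate E (up to scale and sign) from n >= 8 bearing-vector pairs."""
    # Each row encodes p2^T E p1 = 0 for the row-major vectorization of E.
    A = np.stack([np.kron(b, a) for a, b in zip(p1, p2)])
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)  # right singular vector of the smallest singular value
    # Project onto the set of essential matrices: singular values (1, 1, 0).
    U, _, Vt_e = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt_e

rng = np.random.default_rng(0)
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([1.0, 0.0, 0.2])

# Synthetic scene points in front of the first camera, observed from both views.
X1 = rng.uniform([-2.0, -2.0, 4.0], [2.0, 2.0, 8.0], size=(20, 3))
X2 = X1 @ R.T + t
p1 = X1 / np.linalg.norm(X1, axis=1, keepdims=True)
p2 = X2 / np.linalg.norm(X2, axis=1, keepdims=True)

E_est = estimate_essential(p1, p2)
E_true = skew(t) @ R
```

E is recovered only up to scale and sign, which reflects the fact that the magnitude of the translation is unobservable from 2D-2D correspondences.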
Type of correspondences   Monocular   Stereo
2D-2D                     X           X
3D-3D                                 X
3D-2D                     X           X
Some of the previous motion estimation methods require
triangulation of 3D points
Triangulated 3D points are determined by intersecting
backprojected rays from 2D image correspondences of at least
two image frames
In reality, they never intersect due to
image noise,
camera model and calibration errors,
and feature matching uncertainty
The point at minimal distance from all backprojected rays can be
taken as an estimate of the 3D point position
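A minimal sketch of this estimate: the point that minimizes the sum of squared distances to a set of rays has a closed-form linear solution. The function name `triangulate_rays` and the camera placement are hypothetical, and all rays are assumed to be expressed in a common reference frame.

```python
import numpy as np

def triangulate_rays(origins, directions):
    """Return the 3D point with minimal sum of squared distances to all rays."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)  # projector onto the plane orthogonal to the ray
        A += P
        b += P @ o
    return np.linalg.solve(A, b)

# Two hypothetical camera centers with a 0.5 m baseline observing the same point.
X_true = np.array([0.2, -0.1, 5.0])
origins = [np.zeros(3), np.array([0.5, 0.0, 0.0])]
exact_dirs = [X_true - o for o in origins]
X_exact = triangulate_rays(origins, exact_dirs)

# Perturb one ray slightly: the rays no longer intersect, but an estimate still exists.
noisy_dirs = [exact_dirs[0], exact_dirs[1] + np.array([0.0, 0.01, 0.0])]
X_noisy = triangulate_rays(origins, noisy_dirs)
```

The linear system becomes ill-conditioned when the rays are nearly parallel, that is, when the baseline is small compared to the scene distance.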
When frames are taken at nearby positions compared to the scene
distance, 3D points will exhibit large uncertainty
Therefore, 3D-3D motion estimation methods will drift much
more quickly than 3D-2D and 2D-2D methods
The reason is that the uncertainty introduced by triangulation affects the motion
estimation: in the 3D-to-3D case the 3D position error is
minimized, while in the 3D-to-2D and 2D-to-2D cases it is the image
reprojection error
One way to avoid this consists of skipping frames until the average
uncertainty of the 3D points decreases below a certain threshold. The
selected frames are called keyframes
Keyframe selection is a very important step in VO and should always
be done before updating the motion
In the Stereo vision case, 3D-2D method exhibits less drift than
3D-3D method
Stereo vision has the advantage over monocular vision that both
motion and structure are computed in the absolute scale. It also
exhibits less drift.
When the distance to the scene is much larger than the stereo
baseline, stereo VO degenerates into monocular VO
Keyframes should be selected carefully to reduce drift
Regardless of the chosen motion computation method, local
bundle adjustment (over the last m frames) should always be
performed to compute a more accurate estimate of the trajectory.
After bundle adjustment, the effects of the motion
estimation method are largely alleviated (as long as the
initialization is close to the solution)
Matched points are usually contaminated by outliers, that is, wrong
data associations
Possible causes of outliers are
image noise,
occlusions,
blur,
changes in viewpoint and illumination
for which the mathematical model of the feature detector or descriptor does not
account
For the camera motion to be estimated accurately, outliers must be
removed
This is the task of Robust Estimation
[Figure: inlier feature matches shown on image efirecam-0-0000018959.jpg]
Error at the loop closure: 6.5 m
Error in orientation: 5 deg
Trajectory length: 400 m
Before removing the outliers
After removing the outliers
• Select a sample of 2 points at random
• Calculate the model parameters that fit the data in the sample
• Calculate the error function for each data point
• Select the data that support the current hypothesis
• Repeat sampling
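The steps above can be sketched for a 2-point sample. The sketch assumes that the model being fitted is a 2D line, which the sample size suggests but the text does not state. The function name `ransac_line`, the threshold, the iteration count, and the synthetic data are hypothetical.

```python
import numpy as np

def ransac_line(points, n_iters=100, threshold=0.05, seed=0):
    """Return a boolean inlier mask for the best line hypothesis found."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # 1. Select a sample of 2 points at random.
        i, j = rng.choice(len(points), size=2, replace=False)
        # 2. Calculate the model: a line through points[i] with a unit normal.
        direction = points[j] - points[i]
        norm = np.linalg.norm(direction)
        if norm == 0.0:
            continue
        normal = np.array([-direction[1], direction[0]]) / norm
        # 3. Calculate the error (point-to-line distance) for each data point.
        errors = np.abs((points - points[i]) @ normal)
        # 4. Select the data that support the current hypothesis.
        inliers = errors < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
        # 5. Repeat sampling.
    return best_inliers

# Synthetic data: 30 points on y = 2x + 1 followed by 10 outliers far from the line.
x = np.linspace(0.0, 1.0, 30)
line_points = np.column_stack([x, 2.0 * x + 1.0])
rng = np.random.default_rng(1)
outliers = np.column_stack([rng.uniform(0.0, 1.0, 10), rng.uniform(-3.0, -1.0, 10)])
points = np.vstack([line_points, outliers])

mask = ransac_line(points)
```

In practice the final model would be re-estimated from all inliers in `mask` rather than from the 2-point sample alone.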
[Figure: an all-inlier sample]
RANSAC has been established as the standard method for motion estimation
in the presence of outliers
1. Randomly select a minimal set of
point correspondences
2. Compute the motion and count the inliers
3. Repeat N times
The number of iterations needed
grows rapidly with the fraction of
outliers (and exponentially with the size of the minimal set)
~ 1000 iterations!
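$N = \frac{\log(1 - p)}{\log(1 - (1 - \varepsilon)^s)}$
where $p$ is the requested probability of success, $\varepsilon$ the fraction of outliers, and $s$ the number of points in a minimal sample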
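To get a feel for this formula, the sketch below evaluates it for several sample sizes. The values p = 0.99 and an outlier fraction of 0.5 are illustrative assumptions, and the function name `ransac_iterations` is hypothetical.

```python
import math

def ransac_iterations(p, outlier_ratio, s):
    """N = log(1 - p) / log(1 - (1 - eps)^s), rounded up.

    Assumes 0 < p < 1 and 0 < outlier_ratio < 1.
    """
    inlier_sample_prob = (1.0 - outlier_ratio) ** s  # chance that a sample is all inliers
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - inlier_sample_prob))

# Compare minimal-set sizes: 1-point, 2-point, 5-point and 8-point solvers.
for s in (1, 2, 5, 8):
    print(s, ransac_iterations(0.99, 0.5, s))
```

The required N rises steeply with the sample size s, which is the motivation for looking for parameterizations that need fewer points.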
To estimate the motion of a calibrated camera in 6 DoF,
we need 5 points
[Kruppa, 1913]
Why ?
In 6 DoF we would need 6 points …
… but the scale is unobservable …
… and therefore we only need 6 – 1 = 5 points
[“5-Point RANSAC”, Nister, 2003]
General rule:
Minimum number of points = NDoF - 1
The “5-Point RANSAC” typically needs ~1000 iterations
To reduce the number of iterations, we should use a smaller number of points ( < 5 )
Is this possible?
Yes, if we exploit motion constraints!
For planar motion, only 3 parameters need to be estimated
θ, φ, ρ => 3 DoF
and therefore only 2 points are needed
[“2-Point RANSAC”, Ortin, 2001]
Can we use an even smaller number of points?
Yes, if we exploit the vehicle non-holonomic constraints
Wheeled vehicles follow locally circular motion about the Instantaneous Center
of Rotation (ICR)
[Figures: example of the Ackermann steering principle; locally circular motion]
φ = θ/2 => only 2 parameters (θ, ρ) need to be estimated
and therefore only 1 point is needed
This is the smallest parameterization possible and results in
the most efficient algorithm for removing outliers
D. Scaramuzza. 1-Point-RANSAC Structure from Motion for Vehicle-Mounted Cameras by Exploiting Non-holonomic
Constraints. International Journal of Computer Vision, Volume 95, Issue 1, 2011
D. Scaramuzza. Performance Evaluation of 1-Point-RANSAC Visual Odometry. Journal of Field Robotics, Volume 28, Issue 5, 2011
Compute θ for
every point
correspondence
Only 1 iteration
The most efficient algorithm for
removing outliers, up to 800 Hz
1-Point RANSAC is ONLY used to find the inliers.
Motion is then estimated from them in 6DOF
[Figure: number of iterations N versus fraction of outliers in the data (%), for 5-point, 2-point, and 1-point RANSAC]
                       5-Point RANSAC   2-Point RANSAC   1-Point RANSAC (our proposed method)
                       [Nister’03]      [Ortin’01]       [Scaramuzza, IJCV’11, JFR’11]
Number of iterations   ~1000            ~100             1
15,000 images collected in Zurich along a path of over 25 km
Image resolution: 640 x 480
This video can be seen at
http://youtu.be/t7uKWZtUjCE
Is it really better to use minimal sets in RANSAC?
If one is concerned with certain speed requirements, YES
However, it might not be a good choice if the image
correspondences are very noisy: in this case, the motion
estimated from a minimal set will be inaccurate and will exhibit
fewer inliers when tested on all other points
Therefore, when the computational time is not a real concern and
one deals with very noisy features, using a non-minimal set
may be better than using a minimal set
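The uncertainty of the camera pose 𝐶𝑘 is a combination of the
uncertainty at 𝐶𝑘−1 (black solid ellipse) and the uncertainty of the
transformation 𝑇𝑘 (gray dashed ellipse)
𝐶𝑘 = 𝑓(𝐶𝑘−1 , 𝑇𝑘 )
The combined covariance ∑𝑘 follows from first-order error propagation, assuming the two uncertainties are independent:
$\Sigma_k = J_{C_{k-1}} \Sigma_{k-1} J_{C_{k-1}}^T + J_{T_k} \Sigma_{T_k} J_{T_k}^T$
where $J_{C_{k-1}}$ and $J_{T_k}$ are the Jacobians of 𝑓 with respect to 𝐶𝑘−1 and 𝑇𝑘
[Figure: poses Ck-1, Ck, Ck+1 linked by Tk and Tk+1, with their uncertainty ellipses]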
The camera-pose uncertainty is always increasing when concatenating
transformations. Thus, it is important to keep the uncertainties of the
individual transformations small
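The growth of the pose uncertainty can be illustrated with a simplified planar example. The sketch assumes 2D poses (x, y, heading), first-order (Jacobian-based) error propagation, and independent noise on each relative motion. The function name `propagate` and all numeric values are hypothetical.

```python
import numpy as np

def propagate(pose, cov, motion, motion_cov):
    """Compose a planar pose with a relative motion and propagate the covariance."""
    x, y, th = pose
    dx, dy, dth = motion
    c, s = np.cos(th), np.sin(th)
    new_pose = np.array([x + c * dx - s * dy, y + s * dx + c * dy, th + dth])
    # Jacobian of the composition with respect to the previous pose.
    J_pose = np.array([[1.0, 0.0, -s * dx - c * dy],
                       [0.0, 1.0, c * dx - s * dy],
                       [0.0, 0.0, 1.0]])
    # Jacobian of the composition with respect to the relative motion.
    J_motion = np.array([[c, -s, 0.0],
                         [s, c, 0.0],
                         [0.0, 0.0, 1.0]])
    new_cov = J_pose @ cov @ J_pose.T + J_motion @ motion_cov @ J_motion.T
    return new_pose, new_cov

pose, cov = np.zeros(3), np.zeros((3, 3))  # the first pose is known exactly
motion = np.array([1.0, 0.0, 0.05])        # 1 m forward with a slight left turn
motion_cov = np.diag([0.01, 0.01, 0.001])  # uncertainty of each relative motion

dets = []
for _ in range(10):
    pose, cov = propagate(pose, cov, motion, motion_cov)
    dets.append(np.linalg.det(cov))        # grows with every concatenation
```

The determinant of the covariance, which is proportional to the squared volume of the uncertainty ellipsoid, increases at every step in this example.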
[Figure: pose graph over the last 𝑚 poses 𝐶𝑛−𝑚, ..., 𝐶𝑛, with edges between consecutive poses and additional edges between non-adjacent poses (e.g., 𝑇3,1, 𝑇4,1, 𝑇𝑛−1,3)]
So far we assumed that the transformations are between consecutive
frames
Transformations can also be computed between non-adjacent frames.
These transformations $T_{e_{ij}}$ can be used as additional constraints to improve the camera
poses by minimizing
$\sum_{e_{ij}} \| C_i - T_{e_{ij}} C_j \|^2$
For efficiency, only the last 𝑚 keyframes are used
Levenberg-Marquardt can be used
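The effect of a constraint between non-adjacent frames can be shown with a deliberately simplified 1D analogue, in which poses are positions on a line and the problem becomes linear least squares. Real pose optimization works on 6-DoF poses, is nonlinear, and is typically solved iteratively. The function name `optimize_pose_graph_1d` and the measurement values are hypothetical.

```python
import numpy as np

def optimize_pose_graph_1d(n_poses, edges):
    """Least-squares positions x_0..x_{n-1} with x_0 fixed to 0.

    edges: list of (i, j, z), meaning 'x_j - x_i was measured as z'.
    """
    A = np.zeros((len(edges), n_poses - 1))
    b = np.zeros(len(edges))
    for row, (i, j, z) in enumerate(edges):
        if j > 0:
            A[row, j - 1] += 1.0
        if i > 0:
            A[row, i - 1] -= 1.0
        b[row] = z
    x_free, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.concatenate([[0.0], x_free])

# Consecutive-frame measurements that overestimate each 1 m step by 10%.
odometry = [(k, k + 1, 1.1) for k in range(4)]
# One extra constraint between the non-adjacent frames 0 and 4.
extra = [(0, 4, 4.0)]

x_odometry_only = optimize_pose_graph_1d(5, odometry)
x_with_constraint = optimize_pose_graph_1d(5, odometry + extra)
```

With the extra edge, the error of the last pose shrinks and the correction is spread over all intermediate poses instead of being applied only at the end.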
Similar to pose optimization, but it also optimizes the 3D points
In order not to get stuck in local minima, the initialization should
be close to the minimum
Levenberg-Marquardt can be used
Loop constraints are very valuable constraints for pose graph
optimization
These constraints form graph edges between nodes that are usually far
apart and between which large drift might have been accumulated.
Events like reobserving a landmark after not seeing it for a long time or
coming back to a previously-mapped area are called loop detections
Loop constraints can be found by evaluating visual similarity between
the current camera images and past camera images.
Visual similarity can be computed using global image descriptors or
local image descriptors (see lecture about Visual SLAM)
First observation
Image courtesy of Cummins & Newman, IJRR’08
Second observation after a loop
Windowed BA reduces the drift compared to 2-view VO because
it incorporates constraints between several frames
More precise than camera-pose optimization
The choice of the window size m is governed by computational
reasons
The computational complexity of BA is $O((qN + lm)^3)$, with 𝑁 being
the number of points, 𝑚 the number of poses, and 𝑞 and 𝑙 the
number of parameters for points and camera poses, respectively
Other sensors can be used such as
IMU (called inertial VO)
Compass
GPS
Laser
An IMU combined with a single camera allows the estimation of
the absolute scale. Why?
Make sure that you have many points (thousands) that cover
the image uniformly
VO has successfully been applied within various technological fields
Space exploration:
Planetary landers during the descent phase
Spirit and Opportunity Mars-exploration rovers
Since 2004, they have used VO in addition to dead reckoning for
about 6 km, especially in the presence of wheel slip
MAV navigation
European project SFLY
(the sFly video can be seen at http://youtu.be/_-p08o_oTO4)
Vision-based MAVs at the Robotics and Perception Group
(see http://rpg.ifi.uzh.ch/research_mav.html )
Underwater vehicles
Automotive industry
World-first mouse scanner
Currently distributed by LG: SmartScan LG LSM100
This video can be seen at
http://youtu.be/A4NGXFv27AE