3D Data Processing
Structure from Motion
Alberto Pretto
SfM: Problem Statement
SfM is the process of reconstructing the 3D
structure of a scene from its projections into a
series of images taken from different viewpoints.
Picture from Sameer Agarwal et al.
Structure from Motion
● Input: a set of images that frame the same scene
from different points of view. Each image should
have some field of view overlap with other
images
– Images can be totally unordered and taken with different cameras
– Camera calibration parameters may not be available
● Output: camera positions, camera calibrations
(optional) and sparse 3D scene structure (i.e., a
set of 3D points)
Incremental SfM Typical Pipeline
● Incremental SfM is a sequential processing pipeline
with an iterative reconstruction component
● In this discussion, we report some implementation
details from:
J. L. Schönberger and J.-M. Frahm, "Structure-from-Motion Revisited,"
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
Correspondence Search
● For each image, extract invariant features with associated descriptors (e.g., SIFT, SURF, ORB, ...)
● Match features between every image pair (see the sketch below)
– This is just a naive approach, since exhaustive matching has computational complexity O(N_I² · N_F²), i.e., quadratic in both the number of images N_I and the number of features per image N_F
– Many efficient approaches have been introduced to improve matching
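A minimal correspondence-search sketch in Python, assuming OpenCV is available and that two example files img0.jpg and img1.jpg exist; it extracts SIFT features and matches a single image pair with a ratio test (a full pipeline would repeat this for every pair, or for a smarter subset).

import cv2

img0 = cv2.imread("img0.jpg", cv2.IMREAD_GRAYSCALE)
img1 = cv2.imread("img1.jpg", cv2.IMREAD_GRAYSCALE)

# Extract invariant features with associated descriptors (SIFT here).
sift = cv2.SIFT_create()
kp0, des0 = sift.detectAndCompute(img0, None)
kp1, des1 = sift.detectAndCompute(img1, None)

# Brute-force matching with Lowe's ratio test to discard ambiguous matches.
matcher = cv2.BFMatcher(cv2.NORM_L2)
knn = matcher.knnMatch(des0, des1, k=2)
matches = [m for m, n in knn if m.distance < 0.8 * n.distance]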
Geometric Verification
● Matching is based solely on appearance: there is no guarantee that matched features refer to the same 3D point in the scene
● Use RANSAC + projective/epipolar geometry to verify the matches
Geometric Verification
● Try all models:
– Essential matrix E (use an initial guess of the calibration)
– Fundamental matrix F (uncalibrated settings)
– Homography matrix H (planar scene or pure rotation)
● For each model, use RANSAC to find the best estimate:
– Sample minimal sets of matches (e.g., 5 for E, 8 for F, or 4 for H) and keep the hypothesis supported by the maximum number of inliers
● Select the best model among E, F, and H based on certain criteria: its inliers define the inlier matches between the image pair (see the sketch below)
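A sketch of geometric verification for one image pair, assuming NumPy arrays pts0, pts1 of matched pixel coordinates (shape N×2) and, when available, an approximate intrinsic matrix K; the model-selection rule used here (most inliers wins) is a simplification of the criteria used by real pipelines such as COLMAP.

import cv2
import numpy as np

def verify_pair(pts0, pts1, K=None):
    """Estimate E/F/H with RANSAC and return the best-supported model."""
    candidates = {}
    if K is not None:
        E, mask = cv2.findEssentialMat(pts0, pts1, K, cv2.RANSAC, 0.999, 1.0)
        candidates["E"] = (E, mask)
    F, mask = cv2.findFundamentalMat(pts0, pts1, cv2.FM_RANSAC, 1.0, 0.999)
    candidates["F"] = (F, mask)
    H, mask = cv2.findHomography(pts0, pts1, cv2.RANSAC, 1.0)
    candidates["H"] = (H, mask)
    # Toy criterion: keep the model with the largest inlier count.
    name, (model, mask) = max(candidates.items(),
                              key=lambda kv: np.count_nonzero(kv[1][1]))
    return name, model, mask  # the mask defines the inlier matches of the pair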
Correspondences
[Figure: verified feature correspondences between example image pairs]
Observations
[Figure: feature observations marked in the example images]
Scene Graph
After geometric verification, we can build a scene graph with images as nodes and verified pairs of images as edges (a minimal data structure sketch follows below). It provides:
– Correspondences between a pair of images
– Number of correspondences per image
– Number of observations in an image
– The correspondences of an image observation with all other images
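A minimal scene-graph sketch; the class name and the (feature-index pair) match format are assumptions for illustration, not the structure used by any particular framework.

from collections import defaultdict

class SceneGraph:
    """Images as nodes, geometrically verified image pairs as edges."""
    def __init__(self):
        self.edges = {}                        # (i, j) -> list of inlier matches
        self.obs_per_image = defaultdict(set)  # image id -> set of observed feature ids

    def add_verified_pair(self, i, j, inlier_matches):
        # inlier_matches: list of (feature idx in image i, feature idx in image j)
        self.edges[(i, j)] = inlier_matches
        for fi, fj in inlier_matches:
            self.obs_per_image[i].add(fi)
            self.obs_per_image[j].add(fj)

    def num_correspondences(self, i):
        return sum(len(m) for (a, b), m in self.edges.items() if i in (a, b))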
Incremental Mapping
● Iterative, "trial and error" process
● Reconstruction may never recover from a bad initialization or a bad image registration
Initialization
● Find a good initial pair ("seed"):
– Sort images such that images with more correspondences are preferred, i.e., they appear at the front of the list
– Collect images that are connected to the first seed image and have not been registered before
– If not done before, perform geometric verification
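A seed-selection sketch built on the SceneGraph sketch above, assuming a set of already registered image ids; it simply prefers the image with the most verified correspondences.

def select_seed_image(graph, registered):
    """Return the unregistered image with the most verified correspondences."""
    candidates = [i for i in graph.obs_per_image if i not in registered]
    if not candidates:
        return None
    return max(candidates, key=graph.num_correspondences)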
Initialization
● Try to recover a rigid body transformation from the selected model (E, F, or H)
– In the uncalibrated case, this most likely leads to an ill-defined reconstruction
● If this fails, restart the initialization process
● If the motion estimated from H is a pure rotation, restart the initialization process
– No triangulation can be obtained from a pure rotation
Pose Recovery
When recovering the pose from the essential or fundamental matrix we get 4 possible solutions: use the cheirality constraint (triangulated points must lie in front of both cameras) to select one.
[Picture from Marc Pollefeys, Viktor Larsson]
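A short sketch of pose recovery with OpenCV, assuming the essential matrix E, the inlier pixel coordinates pts0, pts1, and the intrinsics K; cv2.recoverPose enumerates the four (R, t) decompositions of E and keeps the one with the most points passing the cheirality check.

import cv2

def pose_from_essential(E, pts0, pts1, K):
    # Returns R, t of the second camera w.r.t. the first (t is up to scale).
    n_good, R, t, mask = cv2.recoverPose(E, pts0, pts1, K)
    return R, t, mask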
Initial Triangulation
Using point correspondences and the initial R,t, triangulate points to obtain their 3D coordinates.
Triangulation
For a non-rectilinear stereo rig there is no single solution due to noise: the back-projected rays through x and x′ do not intersect exactly at a 3D point X.
[Some slides from 16-385 Computer Vision, www.cs.cmu.edu]
Triangulation
We have R,t and (an estimate of) the calibration parameters: given a 3D point X, we know how to project it into both views!
Triangulation
To use our point coordinates x, we need to normalize ("h-normalize") the homogeneous points: x = (u, v, 1)ᵀ.
The observed point x and the projection P X have the same direction but differ by a scale factor: λ x = P X, with P = K [R | t].
The cross product of two vectors with the same direction is zero:
x × (P X) = 0
Triangulation
Writing the cross product explicitly, with p1, p2, p3 the rows of P and x = (u, v, 1)ᵀ:
x × (P X) = ( v·(p3·X) − p2·X,  p1·X − u·(p3·X),  u·(p2·X) − v·(p1·X) )ᵀ = 0
Triangulation
The third line is a linear combination of the first and second lines, so each view contributes only two independent equations.
Triangulation
Use both points:
A X = 0,  with  A = [ u·p3 − p1 ;  v·p3 − p2 ;  u′·p3′ − p1′ ;  v′·p3′ − p2′ ]
where p′ are the rows of the second camera's projection matrix P′.
This is a homogeneous linear system! Use SVD and take the last (h-normalized) column of V (see the sketch below).
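A NumPy sketch of the two-view DLT triangulation described above, assuming 3×4 projection matrices P0 and P1 and pixel coordinates x0 = (u, v), x1 = (u′, v′).

import numpy as np

def triangulate_point(P0, P1, x0, x1):
    # Each view contributes the two independent rows of x × (P X) = 0.
    A = np.stack([
        x0[0] * P0[2] - P0[0],
        x0[1] * P0[2] - P0[1],
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]               # last column of V (= last row of V^T)
    return X[:3] / X[3]      # h-normalize to get the 3D coordinates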
PnP-based Image Registration
A newly registered image must observe existing scene points.
● Use Perspective-n-Point (PnP) on feature correspondences to points already triangulated in registered images
– Estimate both R′,t′ and (optionally) the intrinsic parameters
– Use a modified version of PnP that exploits RANSAC and a minimal pose solver
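A registration sketch using OpenCV's RANSAC-based PnP, assuming float arrays pts3d (N×3, already triangulated points) and pts2d (N×2, their observations in the new image) plus intrinsics K; the 4-pixel reprojection threshold is an arbitrary example value.

import cv2

def register_image(pts3d, pts2d, K):
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d, pts2d, K, None, reprojectionError=4.0)
    R, _ = cv2.Rodrigues(rvec)   # axis-angle -> rotation matrix
    return ok, R, tvec, inliers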
3D Points Registration
The newly registered image may also increase scene coverage by extending the set of points X through triangulation.
Next Best View Selection
Several criteria to select a not-yet-registered image, among others:
– It "sees" the maximum number of already registered points, or;
– Use a visibility score (e.g., sum #occupied cells × res over a number of resolutions, with res = 2, 4, 8, ...)
[J. L. Schönberger and J.-M. Frahm]
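A toy visibility-score sketch, assuming pts2d holds the pixel locations of already reconstructed points visible in a candidate image of size w × h; the grid resolutions and the weighting follow the idea above, not the exact scheme of the paper.

import numpy as np

def visibility_score(pts2d, w, h, resolutions=(2, 4, 8)):
    score = 0
    for res in resolutions:
        # Count the occupied cells of a res x res grid over the image.
        cells = {(min(int(x * res / w), res - 1), min(int(y * res / h), res - 1))
                 for x, y in pts2d}
        score += len(cells) * res   # finer resolutions get a larger weight
    return score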
Bundle Adjustment
● We may iterate the PnP and triangulation steps to register new images and new points but...
● ... they are very correlated procedures: uncertainties in the camera pose propagate to the triangulated points and vice versa
● Bundle Adjustment (BA): joint non-linear refinement of camera parameters and point parameters
Bundle Adjustment
[Figure: cameras with poses R1,t1, ..., R4,t4 observing the scene points; the difference between a measured image point and the reprojection of the corresponding 3D point is the reprojection error]
Bundle Adjustment
BA goal: find a suitable parameter set (transformations Ti = Ri,ti, calibration parameters θi, 3D point positions Pj, i = 0, ..., Nc − 1, j = 0, ..., Np − 1) that minimizes the sum of squares of reprojection errors (one for each observation of each camera):
E(T, θ, P) = Σi Σj ‖ π(Ti, θi, Pj) − xij ‖²
where π(·) projects point Pj into camera i and xij is the corresponding observation.
Solve with LM (Levenberg–Marquardt).
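A minimal BA sketch with SciPy, assuming each camera is packed as axis-angle + translation (6 values) followed by the 3D points (3 values each), and that cam_idx[k], pt_idx[k], obs[k] describe the k-th observation; a real implementation would also exploit the Jacobian sparsity (see the Schur complement trick below).

import cv2
import numpy as np
from scipy.optimize import least_squares

def reprojection_residuals(params, n_cams, n_pts, K, cam_idx, pt_idx, obs):
    cams = params[:6 * n_cams].reshape(n_cams, 6)   # [rvec | tvec] per camera
    pts = params[6 * n_cams:].reshape(n_pts, 3)
    residuals = []
    for c, p, uv in zip(cam_idx, pt_idx, obs):
        proj, _ = cv2.projectPoints(pts[p][None], cams[c, :3], cams[c, 3:], K, None)
        residuals.append(proj.ravel() - uv)
    return np.concatenate(residuals)

# res = least_squares(reprojection_residuals, x0, method="lm",
#                     args=(n_cams, n_pts, K, cam_idx, pt_idx, obs))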
BA Parametrization
● Camera poses (# = 6·(Nc − 1))
– Set the coordinate system of one of the two seed cameras as the reference world frame
– Use a suitable rotation representation (e.g., axis-angle; 6 DoF for each camera)
● Calibration parameters (if not given) (# = 3·Nc)
– Fix the principal point at the image center (principal point calibration is an ill-posed problem)
– Optimize for each camera the focal length in pixels (e.g., one shared focal length) and some distortion parameters (e.g., two radial distortion parameters)
● Scene points (# = 3·Np)
– Simply parametrized as 3D points
BA Complexity
LM update:
δa = −(JᵀJ + λI)⁻¹ Jᵀ e
with a = [a1, ..., am]ᵀ the problem parameter vector and J the Jacobian of the stacked reprojection errors e with respect to a, e.g. with dimension:
m = 6·(Nc − 1) + 3·Nc + 3·Np
Warning: the matrix inversion requires O(m³)
Schur Complement Trick
Idea to efficiently solve BA: exploit the sparsity of J and the fact that usually Np >> Nc.
[Figure: block sparsity pattern of J — rows grouped by the observations of cameras 1, 2, 3; columns grouped into camera parameters C1, C2, C3 and point parameters X1, ..., X4; each observation row involves only one camera block and one point block]
Schur Complement Trick
Rewrite the LM step as H δa = −Jᵀe and define H = JᵀJ + λI. H has the block structure:
H = [ C   E
      Eᵀ  P ]
C depends only on the cameras, P depends only on the points.
Schur Complement Trick
Partitioning the step update between cameras and structure (points):
[ C   E ] [ δa_c ]   [ b_c ]
[ Eᵀ  P ] [ δa_p ] = [ b_p ]
δa_c: camera parameters, δa_p: structure (3D points)
Schur Complement Trick
Multiply both sides by
[ I  −E P⁻¹ ]
[ 0     I   ]
to decouple the camera update:
(C − E P⁻¹ Eᵀ) δa_c = b_c − E P⁻¹ b_p
S = C − E P⁻¹ Eᵀ is the Schur complement of P.
Schur Complement Trick
● P is a block-diagonal matrix: computing its inverse by inverting each of its 3×3 blocks is cheap!
– Solve for δa_c, then back-substitute it to obtain δa_p = P⁻¹ (b_p − Eᵀ δa_c) (see the sketch below)
● Complexity: O(Nc³)
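A NumPy sketch of the reduced-camera-system solve, assuming the block partition H = [[C, E], [Eᵀ, P]] and right-hand side b = [b_c, b_p] derived above, with P given as a list of 3×3 diagonal blocks (one per point).

import numpy as np

def schur_solve(C, E, P_blocks, b_c, b_p):
    # Invert P block by block: each block is only 3x3, so this is cheap.
    n = 3 * len(P_blocks)
    P_inv = np.zeros((n, n))
    for k, B in enumerate(P_blocks):
        P_inv[3 * k:3 * k + 3, 3 * k:3 * k + 3] = np.linalg.inv(B)
    S = C - E @ P_inv @ E.T                              # Schur complement of P
    delta_c = np.linalg.solve(S, b_c - E @ P_inv @ b_p)  # camera update, O(Nc^3)
    delta_p = P_inv @ (b_p - E.T @ delta_c)              # back-substitution
    return delta_c, delta_p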
Robust Cost Function
● Standard least squares assumes Gaussian errors
– Large errors have a high influence on the solution!
● To account for potential outliers, it is better to choose a proper loss function ρ (e.g., Cauchy, Huber, Tukey, ...)
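A small robustification sketch, assuming a Huber-style reweighting of the residuals; note that scipy.optimize.least_squares also accepts loss="huber" or loss="cauchy" directly.

import numpy as np

def huber_weight(residuals, delta=1.0):
    # IRLS-style weights: quadratic (weight 1) for small errors,
    # down-weighted for large ones, limiting the influence of outliers.
    a = np.abs(residuals)
    return np.where(a <= delta, 1.0, delta / a)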
Sparse to Dense Reconstruction
Use Multi-View Stereo (MVS): see next
lectures...
Picture from Sameer Agarwal et al.
Open Source SfM Frameworks
● COLMAP
https://colmap.github.io/
● Bundler: Structure from Motion (SfM) for
Unordered Image Collections
https://www.cs.cornell.edu/~snavely/bundler/
● Multicore Bundle Adjustment
http://grail.cs.washington.edu/projects/mcba/