Dr. Daud Abdullah
Computer Vision: Week 4
Feb 2023
Agenda
2.1 Primitives and Transformations
2.2 Geometric Image Formation
2.3 Photometric Image Formation
2.4 Image Sensing Pipeline
2.1
Primitives and Transformations
Primitives and Transformations
- Geometric primitives are the basic building blocks used to describe 3D shapes
- In this unit, we introduce points, lines and planes
- Furthermore, the most basic transformations are discussed
- This unit covers the topics of the Szeliski book, chapter 2.1
- A more exhaustive introduction can be found in the book:
  Hartley and Zisserman: Multiple View Geometry in Computer Vision
2D Points
2D points can be written in inhomogeneous coordinates as
\[ \mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2 \]
or in homogeneous coordinates as
\[ \tilde{\mathbf{x}} = \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{w} \end{pmatrix} \in \mathbb{P}^2 \]
where \( \mathbb{P}^2 = \mathbb{R}^3 \setminus \{(0,0,0)\} \) is called projective space.
Remark: Homogeneous vectors that differ only by scale are considered equivalent and define an equivalence class. ⇒ Homogeneous vectors are defined only up to scale.
2D Points
An inhomogeneous vector x is converted to a homogeneous vector x̃ as follows
\[ \tilde{\mathbf{x}} = \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{w} \end{pmatrix} = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} \mathbf{x} \\ 1 \end{pmatrix} = \bar{\mathbf{x}} \]
with augmented vector \( \bar{\mathbf{x}} \). To convert in the opposite direction we divide by \( \tilde{w} \):
\[ \bar{\mathbf{x}} = \begin{pmatrix} \mathbf{x} \\ 1 \end{pmatrix} = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \frac{1}{\tilde{w}}\, \tilde{\mathbf{x}} = \frac{1}{\tilde{w}} \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{w} \end{pmatrix} = \begin{pmatrix} \tilde{x}/\tilde{w} \\ \tilde{y}/\tilde{w} \\ 1 \end{pmatrix} \]
Homogeneous points whose last element is w̃ = 0 are called ideal points or
points at infinity. These points can’t be represented with inhomogeneous coordinates!
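As a quick illustration (not part of the original slides), here is a minimal NumPy sketch of the two conversions above:

```python
import numpy as np

def to_homogeneous(x):
    """Append a 1 to an inhomogeneous point (augmented vector)."""
    return np.append(x, 1.0)

def to_inhomogeneous(x_tilde, eps=1e-12):
    """Divide by the last component; undefined for ideal points (w ~ 0)."""
    w = x_tilde[-1]
    if abs(w) < eps:
        raise ValueError("ideal point (point at infinity) has no inhomogeneous form")
    return x_tilde[:-1] / w

p = np.array([2.0, 3.0])
p_h = to_homogeneous(p)                              # [2., 3., 1.]
assert np.allclose(to_inhomogeneous(3.0 * p_h), p)   # scale does not matter
```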
2D Points
[Figure: relation between homogeneous, augmented and inhomogeneous coordinates]
2D Lines
2D lines can also be expressed using homogeneous coordinates \( \tilde{\mathbf{l}} = (a, b, c)^\top \):
\[ \{ \bar{\mathbf{x}} \mid \tilde{\mathbf{l}}^\top \bar{\mathbf{x}} = 0 \} \;\Leftrightarrow\; \{ x, y \mid a x + b y + c = 0 \} \]
We can normalize \( \tilde{\mathbf{l}} \) so that \( \tilde{\mathbf{l}} = (n_x, n_y, d)^\top = (\mathbf{n}, d)^\top \) with \( \|\mathbf{n}\|_2 = 1 \). In this case, \( \mathbf{n} \) is the normal vector perpendicular to the line and \( d \) is its distance to the origin.
An exception is the line at infinity \( \tilde{\mathbf{l}}_\infty = (0, 0, 1)^\top \), which passes through all ideal points.
Cross Product
Cross product expressed as the product of a skew-symmetric matrix and a vector:
\[ \mathbf{a} \times \mathbf{b} = [\mathbf{a}]_\times \mathbf{b} = \begin{pmatrix} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} a_2 b_3 - a_3 b_2 \\ a_3 b_1 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{pmatrix} \]
Remark: In this course, we use square brackets to distinguish matrices from vectors.
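In 2D, the cross product is a convenient tool for joining and intersecting lines (a standard fact from projective geometry, not spelled out on this slide): the line through two homogeneous points is x̃₁ × x̃₂, and the intersection of two lines is l̃₁ × l̃₂. A minimal NumPy sketch:

```python
import numpy as np

def skew(a):
    """Skew-symmetric matrix [a]_x such that skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

p1 = np.array([1.0, 1.0, 1.0])       # homogeneous 2D point (1, 1)
p2 = np.array([3.0, 2.0, 1.0])       # homogeneous 2D point (3, 2)
line = np.cross(p1, p2)              # line joining the two points
assert np.allclose(skew(p1) @ p2, line)

l1 = np.array([1.0, 0.0, -2.0])      # the line x = 2
l2 = np.array([0.0, 1.0, -3.0])      # the line y = 3
x = np.cross(l1, l2)                 # homogeneous intersection point
print(x[:2] / x[2])                  # -> [2. 3.]
```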
2D Conics
More complex algebraic objects can be represented using polynomial homogeneous
equations. For example, conic sections (arising as the intersection of a plane and a
3D cone) can be written using quadric equations:
\[ \{ \bar{\mathbf{x}} \mid \bar{\mathbf{x}}^\top \mathbf{Q}\, \bar{\mathbf{x}} = 0 \} \]
[Figure: conic sections: circle, ellipse, parabola, hyperbola]
Useful for multi-view geometry and camera calibration, see Hartley and Zisserman.
3D Points
3D points can be written in inhomogeneous coordinates as
\[ \mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} \in \mathbb{R}^3 \]
or in homogeneous coordinates as
\[ \tilde{\mathbf{x}} = \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{z} \\ \tilde{w} \end{pmatrix} \in \mathbb{P}^3 \]
with projective space \( \mathbb{P}^3 = \mathbb{R}^4 \setminus \{(0,0,0,0)\} \).
3D Quadrics
The 3D analog of 2D conics is a quadric surface:
\[ \{ \bar{\mathbf{x}} \mid \bar{\mathbf{x}}^\top \mathbf{Q}\, \bar{\mathbf{x}} = 0 \} \]
Quadrics are useful in the study of multi-view geometry and also serve as modeling primitives (spheres, ellipsoids, cylinders); see Hartley and Zisserman, Chapter 2 for details.
Superquadrics Revisited
Superquadrics (a generalization of quadrics) can be used for shape abstraction and compression.
Paschalidou, Ulusoy and Geiger: Superquadrics Revisited: Learning 3D Shape Parsing beyond Cuboids. CVPR, 2019.
2D Transformations
Translation: (2D Translation of the Input, 2 DoF)
" #
I t
x0 = x + t ⇔ x̄0 = > x̄
0 1
I Using homogeneous representations allows to chain/invert transformations
I Augmented vectors x̄ can always be replaced by general homogeneous ones x̃
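A minimal NumPy sketch (not from the slides) of chaining and inverting homogeneous 2D translations; the same 3 × 3 form covers rotations and the other transformations of Szeliski Ch. 2.1:

```python
import numpy as np

def translation(tx, ty):
    """3x3 homogeneous 2D translation matrix."""
    T = np.eye(3)
    T[:2, 2] = [tx, ty]
    return T

T1 = translation(1.0, 2.0)
T2 = translation(-3.0, 0.5)

x_bar = np.array([4.0, 5.0, 1.0])          # augmented 2D point (4, 5)
chained = T2 @ T1 @ x_bar                  # apply T1, then T2
undone = np.linalg.inv(T2 @ T1) @ chained  # inverting the chain recovers the point
assert np.allclose(undone, x_bar)
```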
Application: Panorama Stitching
Brown and Lowe: Recognising Panoramas. ICCV, 2003
2.2
Geometric Image Formation
Origins of the Pinhole Camera
[Figure: animal eye (a long time ago); pinhole perspective projection (Brunelleschi, 15th century); photographic camera (Nicéphore Niépce, 1816)]
Origins of the Pinhole Camera
Camera Obscura: 4th Century BC
Origins of the Pinhole Camera
https://www.abelardomorell.net/camera-obscura
Origins of the Pinhole Camera
[Figure: physical camera model vs. mathematical camera model; labels: image plane, focal point, light rays, camera coordinate system]
- In a physical pinhole camera, the image is projected upside down onto the image plane, which is located behind the focal point
- When modeling perspective projection, we assume the image plane to be in front of the focal point
- Both models are equivalent, up to an appropriate change of image coordinates
Projection Models
[Figure: orthographic projection (left) vs. perspective projection (right), each showing the image plane, light rays, focal point, and the camera and image coordinate systems]
[Example lenses: Opto Engineering telecentric lens, Canon 800mm telephoto lens, Nikon AF-S Nikkor 50mm, Sony DSC-RX100 V, Samsung Galaxy S20]
- These two are the most important projections; see Szeliski, Ch. 2.1.4 for others
Projection Models
[Figure: transition from perspective to weak perspective/orthographic projection with increasing focal length and distance from the camera]
Orthographic Projection
[Figure: orthographic projection; labels: image plane, image coordinate system, light ray, camera coordinate system, camera center]
Orthographic projection of a 3D point x_c ∈ R^3 to pixel coordinates x_s ∈ R^2:
- The x and y axes of the camera and image coordinate systems are shared
- Light rays are parallel to the z-axis of the camera coordinate system
- During projection, the z-coordinate is dropped; x and y remain the same
- Remark: the y coordinate is not shown here for clarity, but behaves similarly
Orthographic Projection
An orthographic projection simply drops the z component of the 3D point in camera
coordinates xc to obtain the corresponding 2D point on the image plane (= screen) xs .
" # 1 0 0 0
1 0 0
xs = xc ⇔ x̄s = 0 1 0 0 x̄c
0 1 0
0 0 0 1
Orthography is exact for telecentric lenses and an approximation for telephoto lenses.
After projection the distance of the 3D point from the image can’t be recovered.
Scaled Orthographic Projection
In practice, world coordinates (which may measure dimensions in meters) must be
scaled to fit onto an image sensor (measuring in pixels) ⇒ scaled orthography:
\[ \mathbf{x}_s = \begin{pmatrix} s & 0 & 0 \\ 0 & s & 0 \end{pmatrix} \mathbf{x}_c \;\Leftrightarrow\; \bar{\mathbf{x}}_s = \begin{pmatrix} s & 0 & 0 & 0 \\ 0 & s & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \bar{\mathbf{x}}_c \]
Remark: The unit for s is px/m or px/mm to convert metric 3D points into pixels.
Under orthography, structure and motion can be estimated simultaneously using
factorization methods (e.g., via singular value decomposition).
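A minimal NumPy sketch (my own illustration, not from the slides) of scaled orthographic projection; with s = 1 it reduces to plain orthography:

```python
import numpy as np

def scaled_orthographic(x_c, s=1.0):
    """Project a 3D camera-space point by dropping z and scaling (s in px per meter)."""
    P = np.array([[s, 0.0, 0.0],
                  [0.0, s, 0.0]])
    return P @ x_c

x_c = np.array([0.2, -0.1, 5.0])         # meters in camera coordinates
print(scaled_orthographic(x_c, s=1000))  # -> [ 200. -100.] pixels; z is discarded
```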
Perspective Projection
[Figure: perspective projection; labels: image plane, image coordinate system, light ray, camera coordinate system, camera center, principal axis, focal length]
Perspective projection of a 3D point x_c ∈ R^3 to pixel coordinates x_s ∈ R^2:
- The light ray passes through the camera center, the pixel x_s and the point x_c
- Convention: the principal axis (orthogonal to the image plane) aligns with the z-axis
- Remark: the y coordinate is not shown here for clarity, but behaves similarly
Perspective Projection
In perspective projection, 3D points in camera coordinates are mapped to the image
plane by dividing them by their z component and multiplying with the focal length:
\[ \begin{pmatrix} x_s \\ y_s \end{pmatrix} = \begin{pmatrix} f\, x_c / z_c \\ f\, y_c / z_c \end{pmatrix} \;\Leftrightarrow\; \tilde{\mathbf{x}}_s = \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \bar{\mathbf{x}}_c \]
Note that this projection is linear when using homogeneous coordinates. After the
projection it is not possible to recover the distance of the 3D point from the image.
Remark: The unit for f is px (=pixels) to convert metric 3D points into pixels.
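A minimal NumPy sketch (my own illustration) showing that the homogeneous formulation and the explicit division by z_c give the same pixel coordinates:

```python
import numpy as np

f = 500.0                                  # focal length in pixels
x_c = np.array([0.2, -0.1, 2.0])           # 3D point in camera coordinates (meters)

# Explicit division by the z component
x_s = f * x_c[:2] / x_c[2]                 # -> [ 50. -25.]

# Equivalent linear map in homogeneous coordinates
P = np.array([[f, 0.0, 0.0, 0.0],
              [0.0, f, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
x_tilde = P @ np.append(x_c, 1.0)
assert np.allclose(x_tilde[:2] / x_tilde[2], x_s)
```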
Perspective Projection
[Figure: perspective projection without (left) and with (right) principal point offset; labels: image plane, image and camera coordinate systems, light rays, focal point, principal point, principal axis]
- To ensure positive pixel coordinates, a principal point offset c is usually added
- This moves the image coordinate system to the corner of the image plane
Perspective Projection
The complete perspective projection model is given by:
\[ \begin{pmatrix} x_s \\ y_s \end{pmatrix} = \begin{pmatrix} f_x\, x_c / z_c + s\, y_c / z_c + c_x \\ f_y\, y_c / z_c + c_y \end{pmatrix} \;\Leftrightarrow\; \tilde{\mathbf{x}}_s = \begin{pmatrix} f_x & s & c_x & 0 \\ 0 & f_y & c_y & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \bar{\mathbf{x}}_c \]
- The left 3 × 3 submatrix of the projection matrix is called the calibration matrix K
- The parameters of K are called camera intrinsics (as opposed to the extrinsic camera pose)
- Here, f_x and f_y are independent, allowing for different pixel aspect ratios
- The skew s arises when the sensor is not mounted perpendicular to the optical axis
- In practice, we often set f_x = f_y and s = 0, but model c = (c_x, c_y)^⊤ (see the sketch below)
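A minimal NumPy sketch (my own illustration, with made-up intrinsic values) of projection with a calibration matrix K:

```python
import numpy as np

fx, fy, cx, cy, s = 500.0, 500.0, 320.0, 240.0, 0.0   # example intrinsics (pixels)
K = np.array([[fx, s, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

x_c = np.array([0.2, -0.1, 2.0])        # 3D point in camera coordinates
x_tilde = K @ x_c                       # homogeneous pixel coordinates
x_s = x_tilde[:2] / x_tilde[2]          # -> [370. 215.]
print(x_s)
```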
Chaining Transformations
[Figure: chaining transformations; labels: world coordinate system, camera coordinate system, image plane, image coordinate system]
Let K be the calibration matrix (intrinsics) and [R|t] the camera pose (extrinsics).
We chain both transformations to project a point in world coordinates to the image:
" #
h i h i R t h i
x̃s = K 0 x̄c = K 0 x̄ w = K R t x̄w = P x̄w
0> 1
Remark: The 3 × 4 projection matrix P can be pre-computed. 38
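A minimal NumPy sketch (my own illustration, with an arbitrary example pose) of pre-computing P = K [R | t] and projecting a world point:

```python
import numpy as np

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

# Example extrinsics: rotate 90 degrees about the z-axis and translate
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([0.1, 0.0, 2.0])

P = K @ np.hstack([R, t[:, None]])      # 3x4 projection matrix, pre-computed once

x_w = np.array([1.0, 0.5, 3.0, 1.0])    # world point in homogeneous coordinates
x_tilde = P @ x_w
print(x_tilde[:2] / x_tilde[2])         # pixel coordinates
```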
Lens Distortion
The assumption of linear projection (straight lines remain straight) is violated in
practice due to the properties of the camera lens which introduces distortions.
Both radial and tangential distortion effects can be modeled relatively easily:
Let \( x = x_c/z_c \), \( y = y_c/z_c \) and \( r^2 = x^2 + y^2 \). The distorted point is obtained as:
\[ \begin{pmatrix} x' \\ y' \end{pmatrix} = \underbrace{(1 + \kappa_1 r^2 + \kappa_2 r^4) \begin{pmatrix} x \\ y \end{pmatrix}}_{\text{Radial Distortion}} + \underbrace{\begin{pmatrix} 2 \kappa_3\, x y + \kappa_4 (r^2 + 2 x^2) \\ 2 \kappa_4\, x y + \kappa_3 (r^2 + 2 y^2) \end{pmatrix}}_{\text{Tangential Distortion}} \]
\[ \mathbf{x}_s = \begin{pmatrix} f_x\, x' + c_x \\ f_y\, y' + c_y \end{pmatrix} \]
Images can be undistorted such that the perspective projection model applies.
More complex distortion models must be used for wide-angle lenses (e.g., fisheye).
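A minimal NumPy sketch (my own illustration, with made-up distortion coefficients) applying the radial and tangential terms above before the intrinsic mapping:

```python
import numpy as np

def distort(x_c, kappa, fx, fy, cx, cy):
    """Project a 3D camera-space point with radial/tangential lens distortion."""
    k1, k2, k3, k4 = kappa
    x, y = x_c[0] / x_c[2], x_c[1] / x_c[2]
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    xd = radial * x + 2.0 * k3 * x * y + k4 * (r2 + 2.0 * x * x)
    yd = radial * y + 2.0 * k4 * x * y + k3 * (r2 + 2.0 * y * y)
    return np.array([fx * xd + cx, fy * yd + cy])

x_c = np.array([0.2, -0.1, 2.0])
print(distort(x_c, kappa=(-0.2, 0.05, 0.001, 0.001),
              fx=500.0, fy=500.0, cx=320.0, cy=240.0))
```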
Lens Distortion
2.3
Photometric Image Formation
Photometric Image Formation
[Figure: light source, surface with normal n̂, optics, and sensor plane]
- So far we have discussed how individual light rays travel through space
- We now discuss how an image is formed in terms of pixel intensities and colors
- Light is emitted by one or more light sources and reflected or refracted (once or multiple times) at the surfaces of objects (or media) in the scene
Rendering Equation
Let p ∈ R^3 denote a 3D surface point, v ∈ R^3 the viewing direction and s ∈ R^3 the incoming light direction. The rendering equation describes how much of the light L_in with wavelength λ arriving at p is reflected into the viewing direction v:
\[ L_{\text{out}}(\mathbf{p}, \mathbf{v}, \lambda) = L_{\text{emit}}(\mathbf{p}, \mathbf{v}, \lambda) + \int_{\Omega} \text{BRDF}(\mathbf{p}, \mathbf{s}, \mathbf{v}, \lambda) \cdot L_{\text{in}}(\mathbf{p}, \mathbf{s}, \lambda) \cdot (-\mathbf{n}^\top \mathbf{s})\, d\mathbf{s} \]
- Ω is the unit hemisphere around the normal n
- The bidirectional reflectance distribution function BRDF(p, s, v, λ) defines how light is reflected at an opaque surface
- L_emit > 0 only for light-emitting surfaces
Diffuse and Specular Reflection
[Figure: diffuse, specular, and mirror reflection]
- Typical BRDFs have a diffuse and a specular component
- The diffuse (= constant) component scatters light uniformly in all directions
- This leads to shading, i.e., smooth variation of intensity w.r.t. the surface normal
- The specular component depends strongly on the outgoing light direction
Diffuse and Specular Reflection
[Figure: diffuse and specular components and their combination]
- Typical BRDFs have a diffuse and a specular component
- The diffuse (= constant) component scatters light uniformly in all directions
- This leads to shading, i.e., smooth variation of intensity w.r.t. the surface normal
- The specular component depends strongly on the outgoing light direction (see the sketch below)
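To make this concrete, here is a minimal NumPy sketch (my own illustration; the slides do not prescribe a specific model) of a Lambertian diffuse term combined with a Phong-style specular term:

```python
import numpy as np

def shade(n, s, v, kd=0.8, ks=0.2, shininess=32.0):
    """Simple diffuse + specular shading for unit vectors n (surface normal),
    s (direction towards the light) and v (direction towards the viewer)."""
    diffuse = kd * max(0.0, float(n @ s))             # independent of the view direction
    r = 2.0 * (n @ s) * n - s                         # mirror reflection of s about n
    specular = ks * max(0.0, float(r @ v)) ** shininess
    return diffuse + specular

n = np.array([0.0, 0.0, 1.0])
s = np.array([0.0, 0.6, 0.8])        # unit vector towards the light
v = np.array([0.0, 0.0, 1.0])        # viewer along the normal
print(shade(n, s, v))
```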
BRDF Examples
- BRDFs can be very complex and spatially varying
Slide Credits: Svetlana Lazebnik
Fresnel Effect
- The amount of light reflected from a surface depends on the viewing angle
Slide Credits: Filament Documentation
Global Illumination
[Figure: rendering with direct lighting only vs. rendering with global illumination]
- Modeling one light bounce is insufficient for rendering complex scenes
- Light sources can be shadowed by occluders and rays can bounce multiple times
- Global illumination techniques also take indirect illumination into account
Why Camera Lenses?
- Large and very small pinholes result in image blur (averaging, diffraction)
- Small pinholes require very long shutter times (⇒ motion blur)
- http://www.pauldebevec.com/Pinhole/
Why Camera Lenses?
[Figure: image formation with a small pinhole vs. a large pinhole, each showing the pinhole and the image plane]
Optics
[Figure: pinhole camera model vs. camera with a lens; labels: pinhole, lens, image plane]
- Cameras use one or multiple lenses to accumulate light on the sensor plane
- Importantly, if a 3D point is in focus, all light rays arrive at the same 2D pixel
- For many applications it suffices to model lens cameras with a pinhole model
- However, to address focus, vignetting and aberration we need to model lenses
Thin Lens Model
[Figure: thin lens model with the image plane and the focal points of the lens]
\[ \frac{x_s}{x_c} = \frac{z_s - f}{f} \;\wedge\; \frac{x_s}{x_c} = \frac{z_s}{z_c} \;\Rightarrow\; \frac{z_s - f}{f} = \frac{z_s}{z_c} \;\Rightarrow\; \frac{z_s}{f} - 1 = \frac{z_s}{z_c} \;\Rightarrow\; \frac{1}{z_s} + \frac{1}{z_c} = \frac{1}{f} \]
- The thin lens model with a spherical lens is often used as an approximation
- Properties: axis-parallel rays pass through the focal point; rays through the center keep their direction
- From Snell's law we obtain f = R / (2(n − 1)) with lens radius R and index of refraction n
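A minimal Python sketch (my own illustration) of the thin lens equation above, solving for the in-focus image distance z_s given the object distance z_c:

```python
def image_distance(z_c, f):
    """Thin lens equation 1/z_s + 1/z_c = 1/f, solved for z_s."""
    return 1.0 / (1.0 / f - 1.0 / z_c)

f = 0.05                                   # 50 mm focal length (meters)
print(image_distance(z_c=2.0, f=f))        # ~0.0513 m behind the lens
print(image_distance(z_c=1e9, f=f))        # -> approximately f for a very distant object
```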
Depth of Field (DOF)
- The image is in focus if 1/z_s + 1/z_c = 1/f, where f is the focal length of the lens
- For z_c → ∞ we obtain z_s = f (a lens with focal length f ≈ a pinhole at distance f)
- If the image plane is out of focus, a 3D point projects to the circle of confusion c
Depth of Field (DOF)
- To control the size of the circle of confusion, we change the lens aperture
- An aperture is a hole or an opening through which light travels
- The aperture limits the amount of light that can reach the image plane
- Smaller apertures lead to sharper, but noisier images (fewer photons)
Depth of Field (DOF)
- The allowable depth variation that limits the circle of confusion c is called depth of field; it is a function of both the focus distance and the lens aperture
- Typical DSLR lenses have depth of field indicators
- The commonly displayed f-number is defined as N = f / d (often denoted as f/N, e.g. f/1.4)
- In other words, it is the lens focal length f divided by the aperture diameter d (see the sketch below)
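A minimal Python sketch (my own illustration) of the f-number relation:

```python
def f_number(focal_length, aperture_diameter):
    """N = f / d, e.g. a 50 mm lens with a 35.7 mm aperture is roughly f/1.4."""
    return focal_length / aperture_diameter

print(f_number(50.0, 35.7))    # ~1.4
print(f_number(50.0, 12.5))    # 4.0, i.e. f/4.0
```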
Depth of Field (DOF)
[Figure: photographs at apertures f/1.4, f/4.0 and f/22 with DOF of 0.8 cm, 2.2 cm and 12.4 cm respectively]
Depth of Field (DOF):
- Distance between the nearest and farthest objects that are acceptably sharp
- Decreasing the aperture diameter (increasing the f-number) increases the DOF
Chromatic Aberration
- The index of refraction of glass varies slightly as a function of wavelength
- Thus, simple lenses suffer from chromatic aberration, which is the tendency of light of different colors to focus at slightly different distances (blur, color shift)
- To reduce chromatic and other kinds of aberrations, most photographic lenses are compound lenses made of different glass elements (with different coatings)
Chromatic Aberration
- Top: high-quality lens. Bottom: low-quality lens (blur, rainbow edges)
Vignetting
[Figure: vignetting; label: image plane]
- Vignetting is the tendency for the brightness to fall off towards the image edge
- It is a composition of two effects: natural and mechanical vignetting
- Natural vignetting: foreshortening of the object surface and lens aperture
- Mechanical vignetting: the shaded part of the beam never reaches the image
- Vignetting can be calibrated (i.e., undone)
Vignetting
2.4
Image Sensing Pipeline
Image Sensing Pipeline
The image sensing pipeline can be divided into three stages:
- Physical light transport in the camera lens/body
- Photon measurement and conversion on the sensor chip
- Image signal processing (ISP) and image compression
Shutter
- A focal plane shutter is positioned just in front of the image sensor / film
- Most digital cameras use a combination of mechanical and electronic shutters
- The shutter speed (exposure time) controls how much light reaches the sensor
- It determines whether an image appears over-/underexposed, blurred or noisy
Sensor
- CCDs move charge from pixel to pixel and convert it to voltage at the output node
- CMOS sensors convert charge to voltage inside each pixel and are now the standard
- Larger chips (full frame = 35 mm) are more photo-sensitive ⇒ less noise
https://meroli.web.cern.ch/lecture_cmos_vs_ccd_pixel_sensor.html
Color Filter Arrays
[Figure: Bayer RGB pattern (G R / B G tiling) and the interpolated pixel values]
- To measure color, pixels are arranged in a color filter array, e.g., the Bayer RGB pattern
- Missing colors at each pixel are interpolated from their neighbors (demosaicing); see the sketch below
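A minimal NumPy sketch (my own illustration) of simple bilinear demosaicing of a Bayer mosaic; real ISPs use more sophisticated methods:

```python
import numpy as np
from scipy.ndimage import convolve

def demosaic_bilinear(raw):
    """Bilinear demosaicing of a GRBG-style Bayer mosaic (G R / B G), as on the slide."""
    h, w = raw.shape
    r_mask = np.zeros((h, w)); r_mask[0::2, 1::2] = 1      # R at even rows, odd cols
    b_mask = np.zeros((h, w)); b_mask[1::2, 0::2] = 1      # B at odd rows, even cols
    g_mask = 1 - r_mask - b_mask                           # G on the remaining checkerboard
    kernel = np.array([[0.25, 0.5, 0.25],
                       [0.5,  1.0, 0.5 ],
                       [0.25, 0.5, 0.25]])
    channels = []
    for mask in (r_mask, g_mask, b_mask):
        num = convolve(raw * mask, kernel, mode='mirror')
        den = convolve(mask, kernel, mode='mirror')
        channels.append(num / den)                         # normalized interpolation
    return np.stack(channels, axis=-1)

raw = np.random.rand(8, 8)            # fake single-channel sensor readout
print(demosaic_bilinear(raw).shape)   # -> (8, 8, 3)
```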
Color Filter Arrays
Slide Credits: Steve Seitz
Color Filter Arrays
- Each pixel integrates the light spectrum L according to its spectral sensitivity S:
\[ R = \int L(\lambda)\, S_R(\lambda)\, d\lambda \]
- The spectral response curves are provided by the camera manufacturer
Color Spaces
- Various color spaces have been developed and are used in practice
Gamma Compression
[Figure: gamma compression Y' = Y^(1/γ) before quantization and expansion Y = Y'^γ during decoding; quantization noise vs. visible noise]
- Humans are more sensitive to intensity differences in darker regions
- Therefore, it is beneficial to nonlinearly transform (gamma-compress) the intensities or colors prior to discretization and to undo this transformation during loading (see the sketch below)
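A minimal NumPy sketch (my own illustration) of gamma compression and expansion with γ = 2.2:

```python
import numpy as np

gamma = 2.2
y = np.array([0.0, 0.1, 0.25, 0.5, 1.0])   # linear luminance in [0, 1]
y_prime = y ** (1.0 / gamma)               # gamma compression before quantization
y_back = y_prime ** gamma                  # expansion when loading/displaying
assert np.allclose(y_back, y)

# Dark values get more of the coded range: 0.1 maps to ~0.35 after compression
print(np.round(y_prime, 3))
```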
Image Compression
- Typically, luminance is compressed with higher fidelity than chrominance
- Often, (8 × 8 pixel) patch-based discrete cosine or wavelet transforms are used
- The Discrete Cosine Transform (DCT) is an approximation to PCA on natural images
- The coefficients are quantized to integers that can be stored with Huffman codes
- More recently, deep network based compression algorithms have been developed (see the sketch below)
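A minimal NumPy sketch (my own illustration) of the 8 × 8 DCT used in JPEG-style compression: transform a patch, coarsely quantize the coefficients, and reconstruct:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (n x n)."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] *= 1.0 / np.sqrt(2.0)
    return C * np.sqrt(2.0 / n)

C = dct_matrix(8)
patch = np.random.rand(8, 8) * 255.0        # fake 8x8 image patch
coeffs = C @ patch @ C.T                    # 2D DCT of the patch
quantized = np.round(coeffs / 16.0) * 16.0  # crude uniform quantization (step 16)
recon = C.T @ quantized @ C                 # inverse 2D DCT
print(np.abs(recon - patch).max())          # reconstruction error caused by quantization
```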
QUESTIONS???
ACKNOWLEDGEMENT!
• Various contents in this presentation have been taken from different books, lecture notes, and the web. They belong solely to their owners and are used here only to clarify various educational concepts. No copyright infringement is intended.