Computer Vision
CS-E4850, 5 study credits

Juho Kannala
Aalto University
Plan for today
• Background
• What is computer vision?
• Why study computer vision?
• Overview of the course
• Lecture 1: Image formation
Credits: Material for slides borrowed from Victor Prisacariu, Andrew Zisserman, Esa Rahtu, James Hays, Derek Hoiem, Svetlana Lazebnik, Steve Seitz, David Forsyth, and others
Course personnel

• Lecturer:
Juho Kannala
juho.kannala@aalto.fi
• Main course assistant:
Xiaotian Li
firstname.lastname@aalto.fi
A few words about me
Juho Kannala
Assistant Professor of Computer Vision
• PhD, University of Oulu 2010
• Professor at Aalto since 2016
• Working with computer vision since 2000
• Recent projects and other info available on my homepage: https://users.aalto.fi/~kannalj1/
Motivation - what is computer vision?
Make computers understand images
• What kind of scene?
• Where are the cars?
• How far are the buildings?
• Where are the cars going?
• …
Many data modalities
• 2D or 3D still images
• Video frames
• X-ray
• Ultrasound
• Microscope
• …
What kind of information can be extracted?
• Semantic information
• Geometric information
What do we have here?
… seems pretty easy…
Wrong! Very hard big data problem…
• Hardware perspective:
• RGB stereo images at 30 frames per second -> a data stream of hundreds of MB/s.
• Non-trivial processing for each byte.
• Massive image collections.
• Mathematical perspective:
• Information is highly implicit or lost by perspective projection
• 2D -> 3D mapping is ill-posed and ill-conditioned -> need to use constraints
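The "hundreds of MB/s" figure can be checked with a quick calculation. The sketch below assumes an uncompressed stereo pair of 1920x1080 RGB cameras with 8 bits per channel; the resolution and bit depth are illustrative assumptions, not values from the slides.

```python
def stream_rate_mb_per_s(width, height, channels, cameras, fps, bytes_per_channel=1):
    """Raw (uncompressed) data rate in megabytes per second (1 MB = 10**6 bytes)."""
    bytes_per_frame = width * height * channels * bytes_per_channel
    return cameras * bytes_per_frame * fps / 1e6


# Assumed setup: two Full HD RGB cameras at 30 frames per second.
rate = stream_rate_mb_per_s(width=1920, height=1080, channels=3, cameras=2, fps=30)
print(f"{rate:.1f} MB/s")  # prints "373.2 MB/s"
```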
• Artificial intelligence perspective:
• Images have uneven information content
• Computational visual semantics is hard (what does visual stuff mean exactly?)
• If we have limited time, what is the important visual stuff right now?
Still a massive challenge if we want genuine autonomy.
Natural vision
• Humans see effortlessly, but… it is very hard work for our brains!
• There are billions of neurons in the human brain
• Years of evolution generated hardwired priors.
So why bother?
What are the advantages?
Why does computer vision matter?
• Engineering point of view - computer vision helps to solve many practical problems: business potential
• Scientific point of view - a human-like visual system is one of the grand challenges of Artificial Intelligence (AI)
• AI itself is a grand challenge of computing
Why does computer vision matter?
• Safety
• Health
• Security
• Fun
• Access
• …
Computer vision is already here
• You are surrounded by devices using computer vision
• Imagine what can be done with already installed cameras!
Motivation - Success stories
Recognizing “simple” patterns
Face recognition
Object detection and recognition
Reconstruction: 3D from photo collections
The Visual Turing Test for Scene Reconstruction. Shan, Adams, Curless, Furukawa, Seitz. 3DV 2013. YouTube video.
A recent commercial 3D reconstruction system (YouTube video)
Robotics
• NASA’s Mars Rover: see “Computer Vision on Mars”
• RoboCup: see www.robocup.org
• STAIR at Stanford: Saxena et al. 2008
Self-driving cars (Nvidia @ CES 2016)
Visual odometry and SLAM
Augmented Reality (AR) and Virtual Reality (VR)
Image generation
A Style-Based Generator Architecture for Generative Adversarial Networks. Karras, Laine, Aila. CVPR 2019.
Current state of affairs
• Many of the previous examples are less than 5 years old!
• Many new applications will appear in the next 5 years
• Strong open source culture
• Many recent state-of-the-art methods are freely available
• See papers from top conferences like CVPR, ECCV, ICCV, and NeurIPS
Rapidly growing area
[Figure: attendees and submissions to the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) by year; recoverable chart labels: 2019, 5160]
Rapidly growing area
Ref. Google Scholar top publications.
Rapidly growing area - substantial commercial interest
CVPR 2018 sponsors
Plenty of job opportunities
• Companies are looking for computer vision and deep learning experts.
• Big Internet players are investing heavily (Apple, Google, Facebook, Microsoft, Baidu, Tencent, …) as well as the car industry (Tesla, BMW, …)
• Strong imaging ecosystem also in Finland
Specifics of this course
Course textbooks
• Szeliski: Computer Vision
• Full copy freely available
• Hartley & Zisserman: Multiple View Geometry in Computer Vision
• Available as an e-book via library
• Forsyth & Ponce: Computer Vision
• Full copy freely available
What will you learn on this course?
• Course content (numbers refer to chapters in Szeliski’s book, 1st edition):
• Image formation and processing (2, 3)
• Feature detection and matching (4)
• Feature-based alignment and image stitching (6, 9)
• Optical flow and tracking (8)
• Basics of image classification and convolutional neural networks
• Object recognition and detection (14)
• Structure from motion, stereo and 3D reconstruction (7, 11, 12)
What will you NOT learn on this course?
• Software packages
• PyTorch, TensorFlow, Keras, Caffe, etc.
• We have simple exercises with Python/Matlab though
• In-depth deep learning
• Tweaking architectures, loss functions, etc.
• Note that there is a separate deep learning course (CS-E4890)
• All the bells and whistles in the state-of-the-art systems
• We concentrate on the basic concepts (get them right and the rest is easier for you)
Organization
• Lectures on Mondays at 8-10 (12 lectures)
• Exercises on Fridays at 12-14 (12 sessions)
• The solutions to the weekly homework assignments should be returned before the session
• The solutions are presented in the session
• Guidance available if needed
• Slack and guidance sessions on Thursdays (see MyCourses)
• Attendance is not rewarded; only returned homework and the exam count
Requirements
• Get more than 0 points from at least 8 exercise rounds (i.e. solve at least 1 task from 8 different weekly rounds)
• Pass the exam
Hints
• Doing homework takes time but is often a good way to learn in depth
• Try to do more than the minimum - homework points are taken into account in the grading (i.e. weighted exercise points are added to exam points)
• Note that the amount of work and bonus points varies a bit between weeks - exercises are published early so that you can do them in advance if needed
Questions at this point?
Lecture 1: Camera model
Relevant reading
• Chapters 2, 3, and 6 in [Hartley & Zisserman]
• Comprehensive presentation of the core content
• Chapter 2 in [Szeliski]
• Broader overview of image formation
This is (a picture of) a cat
Credits: Victor Prisacariu
Cat lives in a 3D world
The point X in world space projects to the point x in image space.
Credits: Victor Prisacariu
Going from X in 3D to x in 2D
The image would be blurry if the film were simply exposed to the cat, because each point on the film would receive light from many scene points.
Pinhole camera
All rays pass through a single point (the center of projection)
What happens in the projection?
• Projection from 3D to 2D -> information is lost
• What properties are preserved?
• Straight lines
• Incidence
• What properties are not preserved?
• Angles
• Lengths
Projective geometry - what is lost?
Length is not preserved
Angles are not preserved
Straight lines are still straight
Vanishing points and lines
• Parallel lines in the world intersect at a “vanishing point” in the image
Constructing the vanishing point of a line
All lines that are parallel to each other have the same vanishing point.
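The shared vanishing point can be checked numerically. The sketch below assumes an ideal pinhole camera at the origin looking along the third axis, with projection (f X1 / X3, f X2 / X3); the function names and the example lines are illustrative, not from the slides. Points far along any line with direction d project close to (f d1 / d3, f d2 / d3), regardless of where the line starts.

```python
def project(X, f=1.0):
    """Ideal pinhole projection of a 3D point (X1, X2, X3) with X3 > 0."""
    return (f * X[0] / X[2], f * X[1] / X[2])


def point_on_line(A, d, t):
    """Point A + t * d on the 3D line through A with direction d."""
    return tuple(a + t * di for a, di in zip(A, d))


def vanishing_point(d, f=1.0):
    """Image of the point at infinity in direction d (requires d3 != 0)."""
    return (f * d[0] / d[2], f * d[1] / d[2])


d = (1.0, 0.0, 2.0)                          # common direction of the lines
starts = [(0.0, -1.0, 4.0), (3.0, 2.0, 5.0)] # two different parallel lines
v = vanishing_point(d)

for A in starts:
    far = project(point_on_line(A, d, 1e6))  # a point very far along the line
    print(far, "approaches", v)
```

Lines with d3 = 0 are parallel to the image plane; their vanishing point lies at infinity in the image, which is why the sketch requires d3 != 0.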
Homogeneous coordinates
• The projection $x_1 = f X_1 / X_3$ is non-linear!
• Can be made linear using homogeneous coordinates
• Homogeneous coordinates allow for transforms to be concatenated easily
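Homogeneous coordinates
Conversion to homogeneous coordinates: $(x, y) \rightarrow (x, y, 1)$
Conversion from homogeneous coordinates: $(x, y, w) \rightarrow (x/w, y/w)$
Invariance to scaling: $(x, y, w) \sim (kx, ky, kw)$ for any $k \neq 0$
E.g. [1,2,3] is the same as [3,6,9] and both represent the same inhomogeneous point [1/3, 2/3] ≈ [0.33, 0.67].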
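A minimal sketch of the two conversions and of the scale invariance, using plain tuples. The function names are illustrative assumptions.

```python
def to_homogeneous(p):
    """Append 1 to an inhomogeneous point, e.g. (x, y) -> (x, y, 1)."""
    return tuple(p) + (1.0,)


def from_homogeneous(ph):
    """Divide by the last coordinate, e.g. (x, y, w) -> (x / w, y / w)."""
    *coords, w = ph
    if w == 0:
        raise ValueError("a point at infinity has no inhomogeneous representation")
    return tuple(c / w for c in coords)


a = from_homogeneous((1, 2, 3))
b = from_homogeneous((3, 6, 9))   # the same point, scaled by 3
print(a, b)                       # both are (1/3, 2/3)
print(from_homogeneous(to_homogeneous((4.0, 5.0))))  # round trip gives (4.0, 5.0)
```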
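Basic geometry in homogeneous coordinates
• Line equation: $ax + by + c = 0$, with the line represented as $\mathbf{l} = (a, b, c)^\top$
• A pixel p in homogeneous coordinates: $\mathbf{p} = (x, y, 1)^\top$, so p lies on the line when $\mathbf{l}^\top \mathbf{p} = 0$
• The line through two points is given by the cross product of the points: $\mathbf{l} = \mathbf{p}_1 \times \mathbf{p}_2$
• The intersection of two lines is given by the cross product of the lines: $\mathbf{p} = \mathbf{l}_1 \times \mathbf{l}_2$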
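The two cross-product rules can be tried out in a few lines. This is a sketch with illustrative function names; lines are stored as (a, b, c) for ax + by + c = 0 and points as (x, y, w).

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def line_through(p, q):
    """Homogeneous line through two homogeneous points."""
    return cross(p, q)


def intersection(l, m):
    """Homogeneous intersection point of two homogeneous lines."""
    return cross(l, m)


l1 = line_through((0, 0, 1), (1, 1, 1))   # (-1, 1, 0), i.e. y = x
l2 = line_through((0, 2, 1), (2, 0, 1))   # (2, 2, -4), i.e. x + y = 2
p = intersection(l1, l2)                  # (-4, -4, -4), i.e. the point (1, 1)
print(l1, l2, p)

# Parallel lines x = 0 and x = 1 meet at a point at infinity (last coordinate 0).
print(intersection((1, 0, 0), (1, 0, -1)))  # (0, 1, 0)
```

The last line illustrates why homogeneous coordinates are convenient here: parallel lines need no special case, they simply intersect at a point whose last coordinate is zero.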
3D Euclidean transformation
• Cat moves through 3D space
• The movement of the nose can be described using a Euclidean transformation
Building the 3D rotation matrix R
• R can be built from various representations (Euler angles, quaternion, angle-axis representation; the latter two are recommended)
• Euler angles represent the rotation using three parameters, one rotation angle about each coordinate axis
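A minimal sketch of building R from Euler angles. The composition order R = Rz(gamma) Ry(beta) Rx(alpha) is an assumption made for illustration; several conventions exist and the one used on the original slide is not recoverable here. Function names are also illustrative.

```python
import math


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rot_y(b):
    c, s = math.cos(b), math.sin(b)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]


def rot_z(g):
    c, s = math.cos(g), math.sin(g)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def euler_to_rotation(alpha, beta, gamma):
    """Assumed convention: rotate about x first, then y, then z."""
    return matmul(rot_z(gamma), matmul(rot_y(beta), rot_x(alpha)))


R = euler_to_rotation(0.3, -0.5, 1.2)
# A 90 degree rotation about z maps the x axis onto the y axis (up to rounding).
print(matvec(rot_z(math.pi / 2), [1.0, 0.0, 0.0]))
```

Because the elementary rotations do not commute, changing the order changes R, which is one reason the quaternion and angle-axis representations are often preferred.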
3D Euclidean transformation
• Concatenation of successive transforms is a mess!
Homogeneous coordinates save the day!
• Replace 3D points with homogeneous versions
• The Euclidean transform becomes a single 4x4 matrix
• Transformations can now be concatenated by matrix multiplication
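A small sketch of concatenation by matrix multiplication, assuming the usual 4x4 block form [[R, t], [0, 1]]. The helper names and the example transforms are illustrative.

```python
def make_transform(R, t):
    """Build the 4x4 homogeneous matrix [[R, t], [0, 1]] from a 3x3 R and a 3-vector t."""
    return [list(R[i]) + [t[i]] for i in range(3)] + [[0, 0, 0, 1]]


def matmul(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply(T, X):
    """Apply a 4x4 transform to a 3D point via its homogeneous version."""
    Xh = list(X) + [1]
    Yh = [sum(T[i][j] * Xh[j] for j in range(4)) for i in range(4)]
    return tuple(Yh[i] / Yh[3] for i in range(3))


R_z90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]   # 90 degree rotation about z
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

T1 = make_transform(R_z90, [1, 0, 0])        # rotate, then shift along x
T2 = make_transform(I3, [0, 2, 0])           # pure shift along y
T21 = matmul(T2, T1)                         # "first T1, then T2" as one matrix

X = (1, 0, 0)
print(apply(T2, apply(T1, X)))               # (1.0, 3.0, 0.0)
print(apply(T21, X))                         # (1.0, 3.0, 0.0)
print(apply(matmul(T1, T2), X))              # (-1.0, 1.0, 0.0): order matters
```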
More 3D-3D and 2D-2D transformations
Examples of 2D-2D transforms
Perspective transformation (3D-2D)
Perspective using homogeneous coordinates
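For reference, the pinhole projection $x_1 = f X_1 / X_3$, $x_2 = f X_2 / X_3$ can be written as a linear map between homogeneous coordinates. This is the standard form, given here as a sketch for a camera at the world origin aligned with the world axes; the scale factor $\lambda$ is notation introduced for this example.

```latex
\lambda
\begin{pmatrix} x_1 \\ x_2 \\ 1 \end{pmatrix}
=
\begin{pmatrix}
f & 0 & 0 & 0 \\
0 & f & 0 & 0 \\
0 & 0 & 1 & 0
\end{pmatrix}
\begin{pmatrix} X_1 \\ X_2 \\ X_3 \\ 1 \end{pmatrix}
=
\begin{pmatrix} f X_1 \\ f X_2 \\ X_3 \end{pmatrix},
\qquad \lambda = X_3 .
```

Dividing by the last coordinate recovers the non-linear projection, so the non-linearity is deferred to the final conversion from homogeneous coordinates.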
Wait! Our setup has several assumptions
• Camera at world origin
• Camera aligned with world coordinates
• Ideal pinhole camera
Removing the initial assumptions
• It is useful to split the overall projection matrix into three parts:
• A part that depends on the internals of the camera (intrinsic)
• A vanilla projection matrix
• A Euclidean transformation between the world and camera frames (extrinsic)
• Assume first that the world is aligned with camera coordinates -> the extrinsic camera matrix is the identity
More realistic setting - camera pose
• Assume the camera is translated and rotated with respect to the world
The camera pose
• The non-ideal camera pose can be taken into account by first rotating and translating points from the world frame to the camera frame
The intrinsic parameters
• Transformation to pixel units from metric units
• Describe the hardware properties of a real camera
• The image plane might be skewed
• The pixels might not be square
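A common parameterization of the intrinsic matrix is sketched below. The symbols are assumptions chosen for this example rather than the notation of the slides: $\alpha_x$ and $\alpha_y$ are the focal length in horizontal and vertical pixel units, $s$ is the skew, and $(x_0, y_0)$ is the principal point in pixels.

```latex
K =
\begin{pmatrix}
\alpha_x & s & x_0 \\
0 & \alpha_y & y_0 \\
0 & 0 & 1
\end{pmatrix}
```

Non-square pixels correspond to $\alpha_x \neq \alpha_y$, and a skewed image plane corresponds to $s \neq 0$; for many modern cameras $s \approx 0$ is a reasonable approximation.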
Summary of steps from scene to image
• Move the scene point $(\mathbf{X}_w^\top, 1)^\top$ into the camera coordinate system by a 4x4 (extrinsic) Euclidean transformation
• Project into the ideal camera via the vanilla perspective transformation
• Map the ideal image into the real image using the intrinsic matrix
Camera projection matrix P
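The three steps can be folded into one 3x4 matrix. The sketch below assumes the common factorization P = K [R | t], where R and t map world coordinates to camera coordinates; the example values of K, R and t are made up for illustration.

```python
def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def projection_matrix(K, R, t):
    """P = K [R | t]: extrinsic transform, vanilla projection and intrinsics in one matrix."""
    Rt = [list(R[i]) + [t[i]] for i in range(3)]   # 3x4 matrix [R | t]
    return matmul(K, Rt)


def project(P, Xw):
    """Project a 3D world point to pixel coordinates."""
    x = matvec(P, list(Xw) + [1.0])                # homogeneous image point
    return (x[0] / x[2], x[1] / x[2])              # divide by the last coordinate


# Assumed example camera: 800 pixel focal length, principal point (320, 240),
# no rotation, world origin 2 units in front of the camera.
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 2.0]

P = projection_matrix(K, R, t)
print(project(P, (0.5, -0.25, 2.0)))   # (420.0, 190.0)
```

In the example the point ends up at depth 4 in the camera frame, so its normalized image coordinates are (0.125, -0.0625) before K maps them to pixels.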
Beyond pinholes: Radial distortion
• Common in wide-angle lenses
• Creates non-linear terms in projection
• Usually handled by estimating the non-linear terms and then correcting the image
[Figure: original and corrected images]
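One widely used way to model the non-linear terms is a polynomial in the squared radius, applied to normalized image coordinates. The sketch below assumes that model with two coefficients k1 and k2 and inverts it by simple fixed-point iteration; the model choice, the coefficient values and the function names are assumptions for illustration, not the method of the slides.

```python
def distort(x, y, k1, k2=0.0):
    """Apply polynomial radial distortion to normalized image coordinates."""
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale


def undistort(xd, yd, k1, k2=0.0, iterations=20):
    """Invert the distortion by fixed-point iteration (suitable for mild distortion)."""
    x, y = xd, yd                       # initial guess: the distorted point itself
    for _ in range(iterations):
        r2 = x * x + y * y
        scale = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / scale, yd / scale
    return x, y


xd, yd = distort(0.3, -0.2, k1=-0.2, k2=0.05)
print(undistort(xd, yd, k1=-0.2, k2=0.05))   # close to (0.3, -0.2)
```

In practice the coefficients are estimated during camera calibration, and the correction is applied to every pixel to produce the undistorted image.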
Things to remember
• Pinhole camera model
• Homogeneous coordinates
• Camera projection matrix
The end