Computer Vision Course Overview


Computer Vision

CS-E4850, 5 study credits

Juho Kannala
Aalto University
Plan for today

• Background
• What is computer vision?
• Why study computer vision?

• Overview of the course

• Lecture 1: Image formation

Credits: Material for slides borrowed from Victor Prisacariu, Andrew Zisserman, Esa Rahtu, James Hays, Derek Hoiem, Svetlana Lazebnik, Steve Seitz, David Forsyth, and others
Course personnel

• Lecturer:
Juho Kannala
juho.kannala@aalto.fi

• Main course assistant:
Xiaotian Li
firstname.lastname@aalto.fi
A few words about me

Juho Kannala
Assistant Professor of Computer Vision
• PhD, University of Oulu 2010
• Professor at Aalto since 2016
• Working with computer vision since 2000
• Recent projects and other info available on my homepage: https://users.aalto.fi/~kannalj1/


Motivation - what is computer vision?
Make computers understand images

• What kind of scene?
• Where are the cars?
• How far are the buildings?
• Where are the cars going?
• …
Many data modalities

• 2D or 3D still images
• Video frames
• X-ray
• Ultrasound
• Microscope
• …
What kind of information can be extracted?

Semantic information
Geometric information

What do we have here?

… seems pretty easy…

Wrong! Very hard big data problem…

• Hardware perspective:
• RGB stereo images at 30 frames per second -> 100s of MB/s data stream.
• Non-trivial processing for each byte.
• Massive image collections.

• Mathematical perspective:
• Information is highly implicit or lost in the perspective projection
• The 2D -> 3D mapping is ill-posed and ill-conditioned -> need to use constraints
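
As a rough check of the data-rate claim, the sketch below computes the raw stream size of a stereo RGB camera. The resolutions and the 8 bits per channel are assumptions made for illustration; the slide only states the frame rate.

```python
def stream_rate_mb_per_s(width, height, channels=3, cameras=2, fps=30,
                         bytes_per_sample=1):
    """Raw, uncompressed data rate in megabytes (10^6 bytes) per second."""
    bytes_per_frame = width * height * channels * bytes_per_sample
    return bytes_per_frame * cameras * fps / 1e6


# Assumed resolutions; they are not given in the lecture.
rate_vga = stream_rate_mb_per_s(640, 480)
rate_hd = stream_rate_mb_per_s(1920, 1080)
print(f"VGA stereo: {rate_vga:.1f} MB/s, Full HD stereo: {rate_hd:.1f} MB/s")
```

Under these assumptions a Full HD stereo pair already produces several hundred MB/s before any processing is done.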
Wrong! Very hard big data problem…

• Artificial intelligence perspective
• Images have uneven information content
• Computational visual semantics is hard (what does visual stuff mean exactly?)
• If we have limited time, what is the important visual stuff right now?

Still a massive challenge - if we want genuine autonomy.


Natural vision

• Humans see effortlessly, but… it is very hard work for our brains!
• There are billions of neurons in the human brain
• Years of evolution generated hardwired priors.

So why bother?
What are the advantages?
Why does computer vision matter?

• Engineering point of view - Computer vision helps to solve many practical problems: business potential
• Scientific point of view - A human-like visual system is one of the grand challenges of Artificial Intelligence (AI)
• AI itself is a grand challenge of computing
Why does computer vision matter?

• Safety
• Health
• Security
• Fun
• Access
• …
Computer vision is already here

• You are surrounded by devices using computer vision
• Imagine what can be done with already installed cameras!
Motivation - Success stories
Recognizing “simple” patterns
Face recognition
Object detection and recognition
Reconstruction: 3D from photo collections

The Visual Turing Test for Scene Reconstruction, Shan, Adams, Curless, Furukawa, Seitz, in 3DV 2013. YouTube video.
A recent commercial 3D reconstruction system (YouTube video)
Robotics

NASA’s Mars Rover: see “Computer Vision on Mars”
Robocup: see www.robocup.org
STAIRS at Stanford: Saxena et al. 2008
Self-driving cars (Nvidia @ CES 2016)
Visual odometry and SLAM
Augmented Reality (AR) and Virtual Reality (VR)
Image generation

A style-based generator architecture for generative adversarial networks. Karras, Laine, Aila. CVPR 2019.
Current state of affairs

• Many of the previous examples are less than 5 years old!
• Many new applications will appear in the next 5 years
• Strong open source culture
• Many recent state-of-the-art methods are freely available
• See papers from top conferences like CVPR, ECCV, ICCV, and NeurIPS
Rapidly growing area

[Figure: attendees and submissions to the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); the plot is annotated with 5160 for 2019.]

Rapidly growing area

[Figure: Ref. Google Scholar top publications.]

Rapidly growing area - substantial commercial interest

[Figure: CVPR 2018 sponsors]
Plenty of job opportunities

• Companies are looking for computer vision and deep learning experts.
• Big Internet players are investing heavily (Apple, Google, Facebook, Microsoft, Baidu, Tencent, …), as is the car industry (Tesla, BMW, …)
• Strong imaging ecosystem also in Finland
Specifics of this course
Course textbooks

• Szeliski: Computer Vision
• Full copy freely available

• Hartley & Zisserman: Multiple View Geometry in Computer Vision
• Available as an e-book via the library

• Forsyth & Ponce: Computer Vision
• Full copy freely available
What will you learn on this course?

• Course content (numbers refer to chapters in Szeliski’s book, 1st edition):
• Image formation and processing (2, 3)
• Feature detection and matching (4)
• Feature-based alignment and image stitching (6, 9)
• Optical flow and tracking (8)
• Basics of image classification and convolutional neural networks
• Object recognition and detection (14)
• Structure from motion, stereo and 3D reconstruction (7, 11, 12)
What will you NOT learn on this course?

• Software packages
• PyTorch, TensorFlow, Keras, Caffe, etc.
• We have simple exercises with Python/Matlab though

• In-depth deep learning
• Tweaking architectures, loss functions, etc.
• Note that there is a separate deep learning course (CS-E4890)

• All the bells and whistles in the state-of-the-art systems
• We concentrate on the basic concepts (get them right and the rest is easier for you)
Organization

• Lectures on Mondays at 8-10 (12 lectures)
• Exercises on Fridays at 12-14 (12 sessions)
• The solutions to the weekly homework assignments should be returned before the session
• The solutions are presented in the session

• Guidance available if needed
• Slack and guidance sessions on Thursdays (see MyCourses)

• Attendance is not rewarded; only returned homework and the exam count


Requirements

• Get more than 0 points from at least 8 exercise rounds (i.e. solve at least 1 task from 8 different weekly rounds)
• Pass the exam
Hints

• Doing homework takes time but is often a good way to learn in depth
• Try to do more than the minimum - homework points are taken into account in the grading (i.e. weighted exercise points are added to exam points)
• Note that the amount of work and bonus points varies a bit between weeks - exercises are published early so that you can do them in advance if needed
Questions at this point?
Lecture 1: Camera model
Relevant reading

• Chapters 2, 3, and 6 in [Hartley & Zisserman]
• Comprehensive presentation of the core content

• Chapter 2 in [Szeliski]
• Broader overview of image formation
This is (a picture of) a cat

Credits: Victor Prisacariu

Cat lives in a 3D world

The point X in world space projects to the point x in image space.

Credits: Victor Prisacariu
Going from X in 3D to x in 2D

The image would be blurry if the film were simply exposed to the cat.


Pinhole camera

All rays pass through a single point (the center of projection)
What happens in the projection?

• Projection from 3D to 2D -> information is lost

• What properties are preserved?
• Straight lines
• Incidence

• What properties are not preserved?
• Angles
• Lengths
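
The following minimal sketch illustrates these properties numerically. It assumes an ideal pinhole camera at the origin looking along the third axis; the helper names and the point values are made up for illustration. Three equally spaced collinear 3D points project to collinear image points, but the image distances between them are no longer equal.

```python
import math


def project(X, f=1.0):
    """Pinhole projection of a 3D point (X1, X2, X3) with X3 > 0."""
    return (f * X[0] / X[2], f * X[1] / X[2])


def signed_area(a, b, c):
    """Twice the signed area of triangle abc; zero when the points are collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


# Three equally spaced points on one 3D line (illustrative values).
A, B, C = (0.0, 1.0, 2.0), (1.0, 1.0, 3.0), (2.0, 1.0, 4.0)
a, b, c = project(A), project(B), project(C)

area = signed_area(a, b, c)   # stays (numerically) zero: the line is preserved
len_ab = math.dist(a, b)      # image length of the nearer segment
len_bc = math.dist(b, c)      # image length of the farther segment
print(area, len_ab, len_bc)
```

The two 3D segments have the same length, yet the segment farther from the camera appears shorter in the image.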
Projective geometry - what is lost?
Length is not preserved
Angles are not preserved
Straight lines are still straight
Vanishing points and lines

• Parallel lines in the world intersect in the image at a “vanishing point”
Constructing the vanishing point of a line
Vanishing points and lines

All lines that are parallel to each other in the world share the same vanishing point.
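
A small numerical sketch of this statement, assuming an ideal pinhole camera with focal length f at the origin. The function names and the values are illustrative only. Points far along two parallel 3D lines project close to the same image point, which depends only on the shared direction d.

```python
def project(X, f=1.0):
    """Pinhole projection of a 3D point with X3 > 0."""
    return (f * X[0] / X[2], f * X[1] / X[2])


def point_on_line(A, d, t):
    """Point A + t*d on the 3D line through A with direction d."""
    return tuple(a + t * di for a, di in zip(A, d))


def vanishing_point(d, f=1.0):
    """Image of the point at infinity in direction d (requires d3 != 0)."""
    if d[2] == 0:
        raise ValueError("direction parallel to the image plane: "
                         "the vanishing point is at infinity")
    return (f * d[0] / d[2], f * d[1] / d[2])


d = (1.0, 0.0, 2.0)                              # shared direction
A, B = (-1.0, -1.0, 4.0), (2.0, -1.0, 4.0)       # two different starting points

vp = vanishing_point(d)
far_a = project(point_on_line(A, d, 1e6))
far_b = project(point_on_line(B, d, 1e6))
print(vp, far_a, far_b)
```

Lines whose direction is parallel to the image plane (d3 = 0) are the exception: their images stay parallel and do not meet at a finite point.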


Homogeneous coordinates

• The projection x1 = f X1 / X3 is non-linear!
• It can be made linear using homogeneous coordinates
• Homogeneous coordinates allow transforms to be concatenated easily
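
To make the non-linearity concrete, here is a minimal sketch. It assumes an ideal pinhole camera, and the function names are hypothetical. Scaling the 3D point leaves the image point unchanged, which a linear map could not do. The homogeneous version postpones the division, so the projection step itself is linear.

```python
def project_pinhole(X, f=1.0):
    """Non-linear form: x1 = f*X1/X3, x2 = f*X2/X3."""
    X1, X2, X3 = X
    if X3 <= 0:
        raise ValueError("point must be in front of the camera (X3 > 0)")
    return (f * X1 / X3, f * X2 / X3)


def project_homogeneous(X, f=1.0):
    """Linear step: returns the homogeneous image point (f*X1, f*X2, X3)."""
    X1, X2, X3 = X
    return (f * X1, f * X2, X3)


X = (1.0, 2.0, 4.0)
x = project_pinhole(X, f=2.0)
x_scaled = project_pinhole((2.0, 4.0, 8.0), f=2.0)   # 2*X gives the same image point

xh = project_homogeneous(X, f=2.0)
x_from_h = (xh[0] / xh[2], xh[1] / xh[2])            # divide only at the very end
print(x, x_scaled, x_from_h)
```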
Homogeneous coordinates

Conversion to homogeneous coordinates: (x, y) -> (x, y, 1)

Conversion from homogeneous coordinates: (x, y, w) -> (x/w, y/w)

Invariance to scaling

E.g. [1,2,3] is the same as [3,6,9], and both represent the same inhomogeneous point [1/3, 2/3] ≈ [0.33, 0.67].
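
The conversions can be sketched in a few lines. The function names are chosen here for illustration and do not come from the course material.

```python
def to_homogeneous(p):
    """Append a 1: (x, y) -> (x, y, 1)."""
    return tuple(p) + (1.0,)


def from_homogeneous(ph):
    """Divide by the last coordinate: (x, y, w) -> (x/w, y/w)."""
    *head, w = ph
    if w == 0:
        raise ValueError("point at infinity has no inhomogeneous representation")
    return tuple(c / w for c in head)


p1 = from_homogeneous((1.0, 2.0, 3.0))
p2 = from_homogeneous((3.0, 6.0, 9.0))   # a scaled copy of the same point
round_trip = from_homogeneous(to_homogeneous((4.0, 5.0)))
print(p1, p2, round_trip)
```

Both calls yield the point (1/3, 2/3), matching the example above.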
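Basic geometry in homogeneous coordinates

• Line equation: ax + by + c = 0, i.e. the line is l = (a, b, c)^T
• A pixel p in homogeneous coordinates: p = (x, y, 1)^T, so p lies on l when l^T p = 0
• The line through two points is given by the cross product of the points: l = p1 × p2
• The intersection of two lines is given by the cross product of the lines: p = l1 × l2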
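
A short sketch of both cross-product rules, using plain tuples. The points are arbitrary illustrative values.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


# Line through (0, 0) and (1, 1): expect -x + y = 0.
l1 = cross((0.0, 0.0, 1.0), (1.0, 1.0, 1.0))
# Line through (0, 2) and (2, 0): expect x + y - 2 = 0 up to scale.
l2 = cross((0.0, 2.0, 1.0), (2.0, 0.0, 1.0))

# Intersection of the two lines, still in homogeneous coordinates.
ph = cross(l1, l2)
p = (ph[0] / ph[2], ph[1] / ph[2])
print(l1, l2, p)
```

If the two lines are parallel, the third coordinate of the intersection is zero, which corresponds to a point at infinity, so the final division must be guarded in real code.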
3D Euclidean transformation

• Cat moves through 3D space
• The movement of the nose can be described using a Euclidean transform
Building the 3D rotation matrix R

• R can be built from various representations (Euler angles, quaternion, angle-axis representation; the latter two are recommended)
• Euler angles represent the rotation using three parameters, one for each axis
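
As a sketch of the Euler-angle construction, the code below builds one elementary rotation per axis and multiplies them. The composition order (x first, then y, then z) and the angle names are assumptions; the convention used on the original slide is not recoverable here, and other orders give different matrices.

```python
import numpy as np


def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_y(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def rot_z(g):
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def euler_to_rotation(alpha, beta, gamma):
    """Assumed convention: rotate about x, then y, then z (R = Rz @ Ry @ Rx)."""
    return rot_z(gamma) @ rot_y(beta) @ rot_x(alpha)


R = euler_to_rotation(0.1, 0.2, 0.3)
print(np.round(R @ R.T, 6))   # orthonormal: R R^T is the identity
print(np.linalg.det(R))       # proper rotation: determinant is 1
```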
3D Euclidean transformation

• Concatenation of successive transforms is a mess

Homogeneous coordinates save the day!

• Replace 3D points with their homogeneous versions
• The Euclidean transform then becomes a single 4x4 matrix multiplication
• Transformations can now be concatenated by matrix multiplication
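
The sketch below compares the two ways of chaining motions. It assumes the usual form X' = R X + t for a Euclidean transform, with R and t packed into a 4x4 matrix [R t; 0 1]. The helper name and the numbers are illustrative.

```python
import numpy as np


def euclidean_transform(R, t):
    """Pack a 3x3 rotation R and a translation t into one 4x4 matrix."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


# Two illustrative motions: each is a 90-degree rotation about z plus a shift.
Rz90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t1 = np.array([1.0, 0.0, 0.0])
t2 = np.array([0.0, 0.0, 3.0])
X = np.array([1.0, 0.0, 0.0])

# Without homogeneous coordinates: nested expressions.
X_nested = Rz90 @ (Rz90 @ X + t1) + t2

# With homogeneous coordinates: one matrix product, applied right to left.
T_total = euclidean_transform(Rz90, t2) @ euclidean_transform(Rz90, t1)
X_chained = (T_total @ np.append(X, 1.0))[:3]
print(X_nested, X_chained)
```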


More 3D-3D and 2D-2D transformations

Examples of 2D-2D transforms
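
Since the slide's figure did not survive, here is a hedged sketch of two standard 2D-2D transforms written as 3x3 matrices acting on homogeneous points. The specific matrices are illustrative choices, not the examples from the lecture.

```python
import numpy as np


def apply_2d(H, pts):
    """Apply a 3x3 transform H to an (N, 2) array of 2D points."""
    pts = np.asarray(pts, dtype=float)
    ph = np.hstack([pts, np.ones((len(pts), 1))])   # to homogeneous
    out = ph @ H.T                                  # each row becomes H @ p
    return out[:, :2] / out[:, 2:3]                 # back to inhomogeneous


square = [(0, 0), (1, 0), (1, 1), (0, 1)]

translation = np.array([[1.0, 0.0, 2.0],
                        [0.0, 1.0, 3.0],
                        [0.0, 0.0, 1.0]])
projective = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.5, 0.0, 1.0]])   # non-trivial last row

moved = apply_2d(translation, square)
warped = apply_2d(projective, square)
print(moved)
print(warped)
```

The translation keeps the square a square, while the projective transform maps it to a quadrilateral whose top and bottom edges are no longer parallel.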
Perspective transformation (3D-2D)
Perspective using homogeneous coordinates
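
In homogeneous coordinates the perspective projection can be written as a 3x4 matrix. The sketch below assumes the focal length f sits on the diagonal of that matrix. The lecture may instead place f in the intrinsic matrix, so treat the layout as an assumption.

```python
import numpy as np


def vanilla_projection(f=1.0):
    """3x4 perspective projection for a camera at the origin looking along Z."""
    return np.array([[f, 0.0, 0.0, 0.0],
                     [0.0, f, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0]])


def project(P, X):
    """Project a 3D point with a 3x4 matrix and divide out the scale."""
    x_h = P @ np.append(np.asarray(X, dtype=float), 1.0)
    return x_h[:2] / x_h[2]


x = project(vanilla_projection(f=2.0), [1.0, 2.0, 4.0])
print(x)
```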
Wait! Our setup has several assumptions

• Camera at world origin
• Camera aligned with world coordinates
• Ideal pinhole camera
Removing the initial assumptions

• It is useful to split the overall projection matrix into three parts:
• A part that depends on the internals of the camera (intrinsic)
• A vanilla projection matrix
• A Euclidean transformation between the world and camera frames (extrinsic)

• Assume first that the world is aligned with camera coordinates -> the extrinsic camera matrix is an identity
More realistic setting - camera pose

• Assume the camera is translated and rotated with respect to the world
The camera pose

• The non-ideal camera pose can be taken into account by first rotating and translating points from the world frame to the camera frame
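
A minimal sketch of the world-to-camera step, assuming the convention X_c = R X_w + t. The symbols X_c, X_w and t are introduced here for illustration. With this convention the camera center in world coordinates is C = -R^T t.

```python
import numpy as np


def world_to_camera(X_w, R, t):
    """Rotate and translate a world point into the camera frame."""
    return R @ np.asarray(X_w, dtype=float) + t


def camera_center(R, t):
    """Camera position in world coordinates: the point that maps to the origin."""
    return -R.T @ t


# Illustrative pose: camera placed at C and rotated 90 degrees about the y axis.
Ry90 = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]])
C = np.array([2.0, 0.0, -5.0])
t = -Ry90 @ C   # translation that puts C at the camera origin

print(camera_center(Ry90, t))
print(world_to_camera(C, Ry90, t))
```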
The intrinsic parameters

• Transformation to pixel units from metric units
• Describe the hardware properties of a real camera
• The image plane might be skewed
• The pixels might not be square
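
These effects are commonly collected into an upper-triangular 3x3 matrix. The sketch below uses one common parameterization, with focal lengths fx and fy in pixels, a skew term and a principal point (cx, cy). The parameter names and values are assumptions, not taken from the slides.

```python
import numpy as np


def intrinsic_matrix(fx, fy, cx, cy, skew=0.0):
    """fx != fy models non-square pixels; skew != 0 models a skewed image plane."""
    return np.array([[fx, skew, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])


K = intrinsic_matrix(fx=800.0, fy=820.0, cx=320.0, cy=240.0)

# A point on the ideal image plane, in homogeneous coordinates.
x_ideal = np.array([0.1, -0.2, 1.0])
u, v, w = K @ x_ideal
pixel = np.array([u / w, v / w])
print(pixel)
```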
Summary of steps from scene to image

• Move the scene point (Xw, 1)^T into the camera coordinate system with the 4x4 (extrinsic) Euclidean transformation
• Project into the ideal camera via the vanilla perspective transformation
• Map the ideal image into the real image using the intrinsic matrix
Camera projection matrix P
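
Putting the three steps together gives a single 3x4 matrix. The sketch below assumes the common factorization P = K [I | 0] [R t; 0 1] = K [R | t], where K, R and t denote the intrinsic matrix, rotation and translation. The symbol names and numeric values are illustrative.

```python
import numpy as np


def projection_matrix(K, R, t):
    """Compose intrinsics, vanilla projection and extrinsics into one 3x4 matrix."""
    T = np.eye(4)                                   # 4x4 extrinsic transform
    T[:3, :3] = R
    T[:3, 3] = t
    P0 = np.hstack([np.eye(3), np.zeros((3, 1))])   # vanilla projection [I | 0]
    return K @ P0 @ T


def project(P, X_w):
    """Project a world point to pixel coordinates."""
    x_h = P @ np.append(np.asarray(X_w, dtype=float), 1.0)
    return x_h[:2] / x_h[2]


K = np.array([[800.0, 0.0, 320.0], [0.0, 820.0, 240.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])

P = projection_matrix(K, R, t)
pixel = project(P, [1.0, -1.0, 5.0])
print(P.shape, pixel)
```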
Beyond pinholes: Radial distortion

• Common in wide-angle lenses
• Creates non-linear terms in the projection
• Usually handled by solving for the non-linear terms and then correcting the image

[Figure: original image and corrected image]
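
One widely used way to model the non-linear terms is a polynomial in the squared radius. The sketch below assumes that model, with hypothetical coefficients k1 and k2. It is not necessarily the model used in the lecture. Undistortion is done with a simple fixed-point iteration, which works for mild distortion.

```python
def distort(x, y, k1, k2=0.0):
    """Apply radial distortion to ideal image coordinates (x, y)."""
    r2 = x * x + y * y
    s = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * s, y * s


def undistort(xd, yd, k1, k2=0.0, iterations=20):
    """Invert distort() by fixed-point iteration, starting from the distorted point."""
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        s = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / s, yd / s
    return x, y


k1 = -0.2   # illustrative barrel distortion
xd, yd = distort(0.3, 0.4, k1)
x_rec, y_rec = undistort(xd, yd, k1)
print((xd, yd), (x_rec, y_rec))
```

In practice the coefficients are estimated during camera calibration, and the whole image is then resampled with the inverse mapping.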
Things to remember

• Pinhole camera model
• Homogeneous coordinates
• Camera projection matrix
The end
