CHAPTER 1: INTRODUCTION TO COMPUTER VISION
1.1 Overview of Computer Vision
Computer vision is a field of Artificial Intelligence that uses Machine Learning and Deep
Learning to allow computers to see, recognize, and analyze objects in photos and videos in much
the same way that people do. Computer vision is rapidly gaining popularity for automated AI
vision inspection, remote monitoring, and automation.
Definition:
Computer vision can be defined as a scientific field that extracts information from digital
images. The type of information gained from an image can range from identifying objects, to
spatial measurements for navigation, to augmented reality applications.
Computer vision systems use
(1) cameras to obtain visual data,
(2) machine learning models for processing the images, and
(3) conditional logic to automate application-specific use cases.
Applications
● Special effects Shape and motion capture are techniques used in movies like Avatar to
animate digital characters by recording the movements performed by a human actor. To do
that, we have to find the exact positions of markers on the actor's
face in 3D space, and then recreate them on the digital avatar.
● 3D urban modeling Pictures taken by a drone flying over a city can be used to render a 3D model
of the city. Computer vision is used to combine all the photos into a single 3D model.
● Scene recognition It is possible to recognize the location where a photo was taken. For
instance, a photo of a landmark can be compared to billions of photos on Google to find the best
matches.
● Face detection Face detection has been used for multiple years in cameras to take better
pictures and focus on the faces. Smile detection can allow a camera to take pictures
automatically when the subject is smiling.
● Face recognition is more difficult than face detection, but with the scale of today's data,
companies like Facebook are able to get very good performance. Finally, we can also use
computer vision for biometrics, such as iris pattern or fingerprint recognition.
● Optical Character Recognition One of the oldest successful applications of computer vision is
to recognize characters and numbers. This can be used to read zip codes, or license plates.
● Mobile visual search With computer vision, we can do a search on Google using an image as
the query.
● Self-driving cars Autonomous driving is one of the hottest applications of computer vision.
Companies like Tesla, Google or General Motors compete to be the first to build a fully
autonomous car.
● Automatic checkout Amazon Go is a new kind of store that has no checkout. With computer
vision, algorithms detect exactly which products you take and charge you as you walk out of
the store.
● Vision-based interaction Microsoft's Kinect captures movement in real time and allows players
to interact directly with a game through their movements.
● Augmented Reality AR is also a very hot field right now, and multiple companies are
competing to provide the best mobile AR platform. Apple released ARKit in June, and
impressive applications have already been built on it.
● Virtual Reality VR uses computer vision techniques similar to those used in AR. The algorithm needs to
know the position of the user and the positions of all the objects around them. As the user moves
around, everything needs to be updated in a realistic and smooth way.
1.2 Basics of Image Representation
After acquiring an image, it is important to devise ways to represent it. There are various
ways by which an image can be represented. Let's look at the most common ones.
Image as a matrix
The simplest way to represent the image is in the form of a matrix.
In the figure, a part of the image, i.e., the clock, has been represented as a matrix. A similar matrix will
represent the rest of the image too. It is common to use one byte to represent
every pixel of the image. This means that values between 0 and 255 represent the intensity of
each pixel in the image, where 0 is black and 255 is white. One such matrix is generated for every
color channel in the image. In practice, it is also common to normalize the values
between 0 and 1 (as done in the example in the figure above).
Image as a function
An image can also be represented as a function. A grayscale image can be thought of as a
function that takes in a pixel coordinate and gives the intensity at that pixel.
It can be written as a function
f: R² → R that outputs the intensity at any input point (x, y). The
value of the intensity can be between 0 and 255, or between 0 and 1 if the values are normalized.
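The snippet below is a minimal sketch of both views: an image loaded with OpenCV is simply a NumPy matrix, and indexing it at a (row, column) position acts like evaluating the intensity function f. The file name 'clock.jpg' is only a placeholder.
import cv2
import numpy as np

# Load a grayscale image as a matrix (file name is just a placeholder)
img = cv2.imread('clock.jpg', cv2.IMREAD_GRAYSCALE)

print(img.shape)        # (height, width) of the matrix
print(img.dtype)        # uint8: each pixel uses one byte (values 0-255)

# "Image as a function": f(x, y) returns the intensity at pixel (x, y)
def f(x, y):
    return img[y, x]    # NumPy indexes rows (y) first, then columns (x)

print(f(10, 20))        # intensity at column x = 10, row y = 20

# Normalizing the values to the range 0-1
img_norm = img.astype(np.float32) / 255.0
print(img_norm.min(), img_norm.max())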
Numericals:
Resolution Conversion Example:
1. Given an image with a resolution of 1024x768 and pixel depth of 24 bits, calculate the
size of the image in memory.
Solution: Image size (in bits) = Width × Height × Color depth = 1024 × 768 × 24 = 18,874,368
bits = 2.25 MB (to convert to MB, divide the number of bits by 1024 × 1024 × 8).
2. Calculate the total number of pixels in an RGB image with a resolution of 1920×1080.
How many bits are required to store this image assuming an 8-bit depth per channel?
Solution:
The number of pixels in the image is:
1920×1080=2,073,600 pixels.
Since it is an RGB image, each pixel has three color channels (Red, Green, Blue). The
total number of values is:
2,073,600×3=6,220,800 values.
With an 8-bit depth for each channel, the total number of bits required to store the image
is:
6,220,800×8=49,766,400 bits.
To convert to bytes:
49,766,400÷8=6,220,800 bytes.
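As a quick check, the sketch below recomputes both results in Python (plain arithmetic, no external libraries needed).
# Example 1: 1024x768 image with 24-bit pixel depth
bits_1 = 1024 * 768 * 24
print(bits_1)                        # 18,874,368 bits
print(bits_1 / (1024 * 1024 * 8))    # 2.25 MB

# Example 2: 1920x1080 RGB image, 8 bits per channel
pixels = 1920 * 1080                 # 2,073,600 pixels
values = pixels * 3                  # 6,220,800 values (R, G, B)
bits_2 = values * 8                  # 49,766,400 bits
print(pixels, values, bits_2, bits_2 // 8)   # 6,220,800 bytes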
1.3 Image Formation
In image formation, radiometry is concerned with the relation among the amounts of light energy
emitted from light sources, reflected from surfaces, and captured by sensors.
Simple model for Image Formation
● The scene is illuminated by a single light source.
● The scene reflects radiation towards the camera. The camera senses it via solid-state
cells (CCD cameras).
● There are two parts to the image formation process:
○ The geometry, which determines where in the image plane the projection of a
point in the scene will be located.
○ The physics of light, which determines the brightness of a point in the image
plane.
■ Simple model: f(x, y) = i(x, y) · r(x, y), where i is the illumination and r is the reflectance (see the sketch below).
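A minimal sketch of this intensity model with NumPy: a synthetic illumination field i(x, y) is multiplied element-wise by a reflectance map r(x, y) to produce the observed image f(x, y). The array sizes and patterns are made up purely for illustration.
import numpy as np

H, W = 64, 64
# Illumination i(x, y): a smooth gradient, brighter on one side (illustrative only)
i = np.tile(np.linspace(0.2, 1.0, W), (H, 1))
# Reflectance r(x, y): the albedo of the surface, here a dark square on a bright background
r = np.full((H, W), 0.9)
r[20:40, 20:40] = 0.3
# Observed image: f(x, y) = i(x, y) * r(x, y)
f = i * r
print(f.min(), f.max())   # intensities stay within [0, 1]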
Photometric image formation
● Images cannot exist without light.
● Light sources can be point or area light sources.
○ Point source (location only, e.g., a bulb)
○ Directional source (orientation only, e.g., the Sun)
○ Ambient source (neither location nor orientation)
○ Spot light (point source + spread angle)
○ Flap, barn-door (directional source + spatial extent)
● When light arrives at a surface, two factors affect how much of it matters for vision:
○ Strength: characterized by its irradiance (energy per time per area).
○ Distance: how much of the emitted energy actually reaches the object (assuming no
attenuation and no intermediate reflection).
● When the light hits a surface, three major reactions might occur:
○ Some light is absorbed. This depends on a factor called ρ (albedo). A low ρ of the
surface means more light will get absorbed.
○ Some light is reflected diffusely, which is independent of the viewing direction, e.g.,
cloth or brick (a small sketch of diffuse reflection follows this list).
○ Some light is reflected specularly, which depends on the viewing direction, e.g., a mirror.
○ Some light may be refracted (transmitted through the material).
○ absorption + reflection + refraction = total incident light
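As an illustration of diffuse reflection, the sketch below computes Lambertian shading, where the reflected intensity depends on the albedo ρ and on the angle between the surface normal and the light direction, but not on the viewing direction. The specific vectors and albedo value are assumptions chosen for the example.
import numpy as np

def lambertian_intensity(albedo, normal, light_dir, light_intensity=1.0):
    """Diffuse (Lambertian) reflection: I = albedo * max(0, n . l) * light_intensity."""
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    return albedo * max(0.0, float(np.dot(n, l))) * light_intensity

# Example values (assumptions for illustration only)
rho = 0.6                               # albedo of the surface
n = np.array([0.0, 0.0, 1.0])           # surface normal pointing up
l = np.array([0.0, 1.0, 1.0])           # direction towards the light
print(lambertian_intensity(rho, n, l))  # ~0.424: brighter when the light is overhead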
1.4 Camera Calibration:
https://alphapixeldev.com/opencv-tutorial-part-1-camera-calibration/
https://docs.opencv.org/4.x/d9/d0c/group__calib3d.html
https://learnopencv.com/camera-calibration-using-opencv/
What is Camera calibration?
Camera calibration is a crucial step in most computer vision tasks, as it allows you to determine
the intrinsic parameters of a camera, which are essential for accurately interpreting the geometry
of the scene being captured. Camera calibration involves estimating the camera’s internal
parameters, such as the focal length, optical center (principal point), and lens distortion
coefficients. These parameters describe how the camera projects 3D points in the world onto a
2D image plane. The goal of camera calibration is to correct distortions and improve the
accuracy of measurements and image processing tasks, such as 3D reconstruction, object
detection, and augmented reality applications.
● The process of estimating the parameters of a camera is called camera calibration.
● This means we have all the information (parameters or coefficients) about the camera
required to determine an accurate relationship between a 3D point in the real world and its
corresponding 2D projection (pixel) in the image captured by that calibrated camera.
Two kinds of parameters
1. Internal parameters of the camera/lens system. E.g. focal length, optical center, and
radial distortion coefficients of the lens.
2. External parameters : This refers to the orientation (rotation and translation) of the
camera with respect to some world coordinate system.
The major purpose of camera calibration is to remove the distortions in the image and thereby
establish a relation between image pixels and real world dimensions. In order to remove the
distortion we need to find the intrinsic parameters in the intrinsic matrix K and the distortion
parameters.
Intrinsic parameters depend only on camera characteristics while extrinsic parameters depend on
camera position.
Extrinsic Parameters:
Mapping 3D world coordinates to 2D image coordinates
Calibration maps a 3D point in the world with [X, Y, Z] coordinates to a 2D pixel with [x, y]
coordinates.
With a calibrated camera, we can transform world coordinates to pixel coordinates by going
through camera coordinates.
The essential matrix is the part of the fundamental matrix that is related only to the external
parameters.
Extrinsic calibration converts world coordinates to camera coordinates. The extrinsic
parameters are R (the rotation matrix) and T (the translation vector).
Intrinsic Parameters
Intrinsic calibration converts camera coordinates to pixel coordinates. It requires the camera's internal
values, such as the focal length and optical center. The intrinsic parameters form a matrix we call K.
Camera to Image Conversion
However, the matrix dimensions do not match directly. Because of this, the world point needs to be
extended from [X Y Z] to [X Y Z 1]. This added "1" is called a homogeneous coordinate.
P = K[R|T]
If we have the 2D coordinates, then using the calibration parameters we can map to 3D and vice
versa using the following equation:
s · [u, v, 1]^T = K [R | T] · [X, Y, Z, 1]^T
where (u, v) are the pixel coordinates, (X, Y, Z) are the world coordinates, and s is a scale factor.
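The following is a minimal sketch of this projection for a single point, assuming known K, R, and T; the numeric values are placeholders, not real calibration results.
import numpy as np

# Intrinsic matrix K (placeholder values for fx, fy, cx, cy)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
# Extrinsic parameters: identity rotation and zero translation (placeholders)
R = np.eye(3)
T = np.zeros((3, 1))

# World point in homogeneous coordinates [X, Y, Z, 1]
Xw = np.array([[1.0], [2.0], [4.0], [1.0]])

# Projection matrix P = K [R | T]
P = K @ np.hstack((R, T))

# Project: s * [u, v, 1]^T = P * [X, Y, Z, 1]^T, then divide by the scale s
uvw = P @ Xw
u, v = uvw[0, 0] / uvw[2, 0], uvw[1, 0] / uvw[2, 0]
print(u, v)   # (520.0, 640.0) with these placeholder values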
Image Distortion
A distortion can be radial or tangential. Calibration helps to undistort an image.
Radial distortion: Radial distortion occurs when light rays bend more at the edges of the lens
than at its optical center. It essentially makes straight lines appear slightly curved
in the image.
x-distorted = x(1 + k1*r² + k2*r⁴ + k3*r⁶)
y-distorted = y(1 + k1*r² + k2*r⁴ + k3*r⁶)
x, y — undistorted pixels that are in image coordinate system.
k1, k2, k3 — radial distortion coefficients of the lens.
Tangential distortion: This form of distortion occurs when the lens of the camera is
not perfectly aligned, i.e., not parallel to the image plane. This makes the image appear
stretched or tilted, so that objects appear farther away or closer than they
actually are.
x-distorted = x + [2 * p1 * x * y + p2 * (r² + 2 * x²)]
y-distorted = y + [p1 * (r² + 2 *y²) + 2 * p2 * x * y]
x, y — undistorted pixels that are in image coordinate system.
p1, p2 — tangential distortion coefficients of the lens.
This distortion can be captured by five numbers called distortion coefficients, whose values
reflect the amount of radial and tangential distortion in an image.
We need the intrinsic and extrinsic parameters of the camera to find the distortion coefficients
(k1, k2, k3, p1, p2). The intrinsic parameters are camera-specific (the same for the same
camera): the focal lengths (fx, fy) and the optical center (cx, cy).
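A minimal sketch of applying the distortion formulas above to a normalized image point; the coefficient values and the point are arbitrary assumptions for illustration, not results of a real calibration.
def distort_point(x, y, k1, k2, k3, p1, p2):
    """Apply the radial and tangential distortion model to a normalized point (x, y)."""
    r2 = x * x + y * y                             # r^2
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d

# Arbitrary example coefficients and point (assumptions for illustration)
print(distort_point(0.3, 0.2, k1=-0.2, k2=0.05, k3=0.0, p1=0.001, p2=0.002))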
The actual OpenCV Camera Calibration step
● For each image:
○ Load the image
○ Convert to grayscale
○ Find the chessboard corners
○ Refine the location of the corners
○ Add the corners to the dataset
● Perform calibrateCamera()
# Import required modules
import cv2
import numpy as np
import os
import glob
from google.colab.patches import cv2_imshow

# Define the dimensions of the checkerboard (inner corners per row and column)
CHECKERBOARD = (6, 9)

# Stop the iteration when the specified accuracy (epsilon) is reached
# or the specified number of iterations is completed
criteria = (cv2.TERM_CRITERIA_EPS +
            cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)

# Vector for 3D points
threedpoints = []

# Vector for 2D points
twodpoints = []

# 3D points in real-world coordinates
objectp3d = np.zeros((1, CHECKERBOARD[0] * CHECKERBOARD[1], 3), np.float32)
objectp3d[0, :, :2] = np.mgrid[0:CHECKERBOARD[0],
                               0:CHECKERBOARD[1]].T.reshape(-1, 2)
prev_img_shape = None

# Extract the path of each image stored in the given directory.
# Only the jpg files in /content are used here.
images = glob.glob('/content/*.jpg')

for filename in images:
    image = cv2.imread(filename)
    grayColor = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Find the chessboard corners.
    # If the desired number of corners is found in the image then ret = True
    ret, corners = cv2.findChessboardCorners(
        grayColor, CHECKERBOARD,
        cv2.CALIB_CB_ADAPTIVE_THRESH
        + cv2.CALIB_CB_FAST_CHECK
        + cv2.CALIB_CB_NORMALIZE_IMAGE)

    # If the desired number of corners can be detected,
    # refine the pixel coordinates and display
    # them on the images of the checkerboard
    if ret == True:
        threedpoints.append(objectp3d)

        # Refine pixel coordinates for the given 2D points
        corners2 = cv2.cornerSubPix(
            grayColor, corners, (11, 11), (-1, -1), criteria)
        twodpoints.append(corners2)

        # Draw and display the corners
        image = cv2.drawChessboardCorners(image,
                                          CHECKERBOARD,
                                          corners2, ret)
        cv2_imshow(image)
        cv2.waitKey(0)

cv2.destroyAllWindows()

h, w = image.shape[:2]

# Perform camera calibration by passing the
# 3D points found above (threedpoints)
# and the corresponding pixel coordinates of the
# detected corners (twodpoints)
ret, matrix, distortion, r_vecs, t_vecs = cv2.calibrateCamera(
    threedpoints, twodpoints, grayColor.shape[::-1], None, None)

# Displaying required output
print(" Camera matrix:")
print(matrix)
print("\n Distortion coefficient:")
print(distortion)
print("\n Rotation Vectors:")
print(r_vecs)
print("\n Translation Vectors:")
print(t_vecs)

# Refining the camera matrix using parameters obtained by calibration
# newcameramtx, roi = cv2.getOptimalNewCameraMatrix(matrix, distortion,
#                                                   (w, h), 1, (w, h))

# Method to undistort the image
# dst = cv2.undistort(image, matrix, distortion, None, newcameramtx)
# Alternatively, remap with maps from cv2.initUndistortRectifyMap():
# dst = cv2.remap(image, mapx, mapy, cv2.INTER_LINEAR)

# Displaying the undistorted image
# cv2_imshow(dst)
# cv2.waitKey(0)
Examples:
A camera has intrinsic parameters: focal length (fx = 1000, fy = 950), and the optical center
(cx = 320, cy = 240). If a 3D point is at (X = 2, Y = 1, Z = 5) in the world coordinate system,
calculate its corresponding 2D image coordinates.
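A possible solution sketch, assuming the point (2, 1, 5) is already expressed in camera coordinates (i.e., R = I and T = 0):
u = fx · X / Z + cx = 1000 · 2 / 5 + 320 = 400 + 320 = 720
v = fy · Y / Z + cy = 950 · 1 / 5 + 240 = 190 + 240 = 430
So the 2D image coordinates are approximately (720, 430).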
If a camera has a focal length of 50mm and an object is located 2000mm away from
the camera, what is the size of the object’s image on the sensor, assuming the actual
object is 500mm tall?
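A possible solution sketch, using the pinhole magnification relation (image size = focal length × object size / object distance):
image height = f · H / d = 50 mm · 500 mm / 2000 mm = 12.5 mm
So the object's image on the sensor is about 12.5 mm tall.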