CV Unit-1
Computer vision is a field of artificial intelligence (AI) that focuses on enabling computers to
understand and interpret visual information from digital images or videos. It seeks to replicate the
complex visual perception and processing capabilities of the human visual system.
The goal of computer vision is to develop algorithms and systems that can analyze and understand
visual data, allowing computers to perceive the world in a way similar to humans. By extracting
meaningful information from images or videos, computer vision enables a wide range of
applications, such as object recognition, image classification, image segmentation, tracking, 3D
reconstruction, and even autonomous navigation.
Computer vision algorithms rely on a combination of image processing techniques, statistical
learning, and deep learning methods. Image processing techniques involve manipulating and
enhancing images to extract features and reduce noise, while statistical learning algorithms enable
the training of models based on labeled datasets. Deep learning, especially convolutional neural
networks (CNNs), has played a significant role in recent advancements in computer vision,
providing powerful tools for feature extraction and pattern recognition.
The applications of computer vision are vast and continue to expand rapidly. It is used in various
fields such as healthcare, self-driving cars, surveillance and security systems, robotics, augmented
reality, virtual reality, industrial automation, agriculture, and many others. Computer vision has
the potential to revolutionize industries and improve our daily lives by enabling machines to
perceive and understand the visual world around us.
A computer vision system can be broken down into the following components. Keep in mind that computer vision systems can vary depending on the specific application and complexity. Here is a brief description of a basic computer vision system:
• Input: The system takes in an input image or video frame captured by a camera or sourced
from a dataset.
• Preprocessing: The input is preprocessed to enhance the quality and prepare it for further
analysis. Preprocessing steps may include resizing, cropping, color normalization, and
noise reduction.
• Feature Extraction: This stage involves extracting relevant features from the preprocessed
image or frame. Features can include edges, corners, textures, colors, or higher-level
representations like objects or shapes. Feature extraction techniques may involve filters,
gradients, or more advanced methods like deep learning-based feature extraction.
• Feature Representation: The extracted features are represented in a suitable format for
further analysis. This representation could be a vector, a histogram, a set of descriptors, or
any other format that encodes the information captured by the features.
• Object Detection and Recognition: Object detection and recognition algorithms are
employed to identify specific objects or patterns of interest within the image or video
frame. This can involve techniques like template matching, Haar cascades, or advanced
deep learning models such as convolutional neural networks (CNNs).
• Image Segmentation: Image segmentation aims to partition an image into meaningful
regions or segments based on properties like color, texture, or intensity. This step is useful
for tasks like object tracking, scene understanding, or image editing.
• High-Level Processing: Once objects or regions of interest are identified, further high-level
processing can be performed. This may involve tasks such as object tracking, pose
estimation, 3D reconstruction, behavior analysis, or semantic understanding.
• Output: The final stage involves providing the desired output based on the application
requirements. Outputs can include annotated images, object labels, measurements, or any
other information relevant to the specific task.
It is important to note that computer vision systems can be much more complex, incorporating additional stages or variations based on the application's requirements. The pipeline described above represents a basic framework for understanding the components of a computer vision system.
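As a rough illustration of how these stages map onto code, the following sketch wires together a minimal pipeline with OpenCV. The file name sample.jpg, the blur and Canny parameters, and the use of contour detection as a stand-in for object detection are illustrative assumptions rather than a prescribed design.

import cv2

# Input: read an image (the file name is an assumption for illustration)
image = cv2.imread("sample.jpg")

# Preprocessing: resize, convert to grayscale, and reduce noise
resized = cv2.resize(image, (640, 480))
gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)
denoised = cv2.GaussianBlur(gray, (5, 5), 0)

# Feature extraction: detect edges as simple low-level features
edges = cv2.Canny(denoised, 50, 150)

# Detection/segmentation: find connected contours in the edge map
# (OpenCV 4.x returns contours and hierarchy)
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Output: draw bounding boxes around the detected regions and save the result
for contour in contours:
    x, y, w, h = cv2.boundingRect(contour)
    cv2.rectangle(resized, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("annotated.jpg", resized)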
Image Formation
Image formation in a camera involves the process of capturing and creating a digital representation
of the visual scene in front of the camera. It consists of several steps, including the passage of light
through the camera lens, its interaction with the camera sensor, and the conversion of the captured light into a digital signal.
A simplified explanation of the image formation process in a digital camera is as follows:
• Light enters the camera: When you take a photo, light from the scene passes through the
camera's lens. The lens helps focus the light onto the camera's sensor.
• Lens and focusing: The camera lens plays a crucial role in image formation. It adjusts its
shape and position to focus the incoming light rays onto the camera sensor. This process
helps create a clear and sharp image.
• Aperture: The aperture is an adjustable opening in the lens that controls the amount of light
entering the camera. It determines the depth of field and affects the image's brightness and
sharpness. A smaller aperture (larger f-number) allows less light, while a larger aperture
(smaller f-number) allows more light.
• Sensor capture: The light, after passing through the lens, reaches the camera's image sensor.
The sensor is an array of millions of light-sensitive pixels that convert light energy into an
electrical signal. Each pixel measures the intensity of light falling on it.
• Photodetection and conversion: In a digital camera, the image sensor is typically a charge-
coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) sensor.
These sensors use photodiodes or photogates to convert photons (light particles) into
electrical charges. Each pixel's photodiode accumulates charge proportional to the light
intensity it receives.
• Analog-to-digital conversion: Once the sensor has captured the light as electrical charges,
an analog-to-digital converter (ADC) converts the analog charge signals into digital data.
This process assigns digital values (numbers) to represent the light intensity for each pixel.
The digital values form the basis of the image.
• Image processing: After the conversion, the digital image data may undergo additional
processing within the camera or in post-processing on a computer. This processing can
include color correction, noise reduction, sharpening, and other enhancements to improve
the image quality.
• Storage and display: Finally, the digital image is typically stored in a memory card or
internal memory within the camera. It can then be displayed on the camera's LCD screen
or transferred to a computer or other devices for further viewing, editing, or sharing.
Geometric Primitives
In computer vision, geometric primitives refer to basic geometric shapes or structures that are used
to represent and analyze visual information in images or 3D scenes. These primitives serve as
fundamental elements for tasks such as object detection, recognition, pose estimation, and
geometric transformations. Here are some commonly used geometric primitives in computer
vision:
• Points: Points in computer vision represent specific locations in an image or a 3D scene.
They are often used to represent the position of key features or landmarks, such as corners,
key points, or interest points.
• Lines and line segments: Lines and line segments are used to represent linear structures or
boundaries in images. They can be detected through techniques like edge detection and can
provide information about object boundaries, contours, or straight lines in the scene.
• Circles and ellipses: Circles and ellipses are used to represent circular or elliptical
structures in images. They are often utilized for tasks such as object detection, circle fitting,
shape analysis, and calibration.
• Rectangles and bounding boxes: Rectangles and bounding boxes are used to represent the
spatial extent or boundaries of objects in images. They are commonly employed for object
localization, object detection, and region-based analysis.
• Polygons: Polygons are used to represent complex, irregular shapes or regions in images
or 2D/3D scenes. They are often used for tasks such as object segmentation, region of
interest (ROI) extraction, and shape representation.
• Planes: Planes are used to represent flat surfaces or regions in 3D scenes. They can be used
for tasks such as plane detection, surface reconstruction, and scene understanding.
These geometric primitives provide a foundational framework for analyzing and manipulating
visual data in computer vision. By extracting and representing the geometric information from
images or scenes, computer vision algorithms can understand the spatial relationships, shapes, and
structures within the visual data, enabling a wide range of applications, including object
recognition, scene understanding, augmented reality, and robotics.
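In code these primitives are usually nothing more than small arrays or tuples. The sketch below shows one common, illustrative convention using NumPy; the (x, y) ordering and the corner-plus-size box format are assumptions that vary between libraries.

import numpy as np

point = np.array([120.0, 45.0])                            # (x, y) location of a keypoint
line_segment = np.array([[10, 20], [200, 180]])            # two endpoints of a segment
circle = {"center": (64, 64), "radius": 20}                # center and radius in pixels
bounding_box = (30, 40, 100, 80)                           # (x, y, width, height) of a region
polygon = np.array([[0, 0], [50, 10], [40, 60], [5, 45]])  # ordered vertices of a shape
plane = np.array([0.0, 0.0, 1.0, -2.5])                    # [a, b, c, d] with ax + by + cz + d = 0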
2D Transformations
In computer vision, 2D transformations are used to manipulate and alter the geometric properties
of images or objects in a 2D space. These transformations are applied to perform tasks such as
image alignment, registration, object tracking, and geometric correction. Here are some commonly
used 2D transformations in computer vision:
• Translation: Translation involves shifting an image or object in the x and y directions. It
moves the entire image/object without altering its shape or orientation. The translation is
useful for tasks such as aligning images, object tracking, and image stitching.
• Rotation: Rotation involves rotating an image or object by a certain angle around a
specified point, known as the rotation center. It can be clockwise or counterclockwise.
Rotation is used for tasks like image alignment, object recognition, and pose estimation.
• Scaling: Scaling involves resizing an image or object, either by enlarging or reducing its
size. It is performed uniformly in both the x and y directions. Scaling is useful for tasks
such as image resizing, object detection at different scales, and multi-resolution analysis.
• Shearing: Shearing is a transformation that slants an image or object along one axis while
keeping the other axis fixed. It stretches or compresses the image/object in a particular
direction. Shearing is utilized for tasks like correcting image distortions, texture mapping,
and skew correction.
• Affine Transformation: Affine transformations preserve parallel lines and ratios of
distances between points. They include combinations of translation, rotation, scaling, and
shearing. Affine transformations are used for tasks such as image registration, geometric
correction, and texture warping.
• Perspective Transformation: Perspective transformation is a more complex transformation
that allows for the simulation of a three-dimensional (3D) perspective effect on a 2D image
or object. It is used to correct distortions caused by perspective projection, rectify tilted or
skewed objects, and perform virtual camera transformations.
These 2D transformations enable the manipulation and adjustment of images or objects in
computer vision applications. By applying these transformations, we can align images, correct
distortions, estimate pose, track objects, and perform various geometric operations to enhance the
analysis and understanding of visual data.
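A sketch of several of these transformations using OpenCV follows; the input file name, the angle, the scale factor, and the target quadrilateral are illustrative assumptions.

import cv2
import numpy as np

image = cv2.imread("sample.jpg")        # illustrative input
h, w = image.shape[:2]

# Translation: shift 50 px right and 30 px down with a 2x3 affine matrix
T = np.float32([[1, 0, 50], [0, 1, 30]])
translated = cv2.warpAffine(image, T, (w, h))

# Rotation: 45 degrees counterclockwise about the image center, no scaling
R = cv2.getRotationMatrix2D((w / 2, h / 2), 45, 1.0)
rotated = cv2.warpAffine(image, R, (w, h))

# Scaling: resize uniformly to half the original size
scaled = cv2.resize(image, None, fx=0.5, fy=0.5)

# Perspective transformation: map the image corners onto a skewed quadrilateral
src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
dst = np.float32([[0, 0], [w, 0], [0.8 * w, h], [0.2 * w, h]])
P = cv2.getPerspectiveTransform(src, dst)
warped = cv2.warpPerspective(image, P, (w, h))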
3D Transformations
In computer vision, 3D transformations are used to manipulate and alter the geometric properties
of objects or scenes in a three-dimensional space. These transformations enable tasks such as 3D
object manipulation, camera pose estimation, 3D reconstruction, and augmented reality. Here are
some commonly used 3D transformations in computer vision:
• Translation: Translation in 3D involves moving an object or scene in the x, y, and z
directions. It displaces the entire object or scene without changing its shape or orientation.
Translation is used for tasks such as object positioning, scene alignment, and camera
motion compensation.
• Rotation: Rotation in 3D involves rotating an object or scene around a specified axis or
point. It can be performed in various ways, such as Euler angles, rotation matrices, or
quaternions. Rotation is crucial for tasks like object orientation estimation, camera pose
estimation, and viewpoint changes.
• Scaling: Scaling in 3D involves changing the size of an object or scene uniformly in all
three dimensions. It enlarges or reduces the size of the object or scene without altering its
shape. Scaling is used for tasks such as resizing 3D models, adjusting object dimensions,
and multi-scale analysis.
• Shearing: Shearing in 3D involves skewing the shape of an object or scene along one or
more axes. It distorts the object or scene in a specific direction. Shearing can be useful for
tasks such as perspective correction, texture mapping, and non-uniform scaling.
• Affine Transformation: Affine transformations in 3D combine translation, rotation,
scaling, and shearing. They preserve parallel lines, ratios of lengths, and angles. Affine
transformations are used for tasks such as 3D object registration, surface alignment, and
geometric correction.
• Perspective Transformation: Perspective transformation in 3D simulates the effects of
perspective projection, allowing the transformation of 3D objects or scenes to a 2D space.
It takes into account the position and orientation of the camera relative to the object or
scene. Perspective transformation is essential for tasks such as 3D-to-2D projection, virtual
camera rendering, and augmented reality.
These 3D transformations enable the manipulation, analysis, and rendering of three-dimensional
objects and scenes in computer vision applications. By applying these transformations, we can
align 3D models, estimate camera poses, reconstruct 3D scenes, and perform various geometric
operations for visualization, analysis, and interaction with three-dimensional visual data.
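These transformations are commonly implemented as 4x4 homogeneous matrices so that they compose by matrix multiplication. The sketch below is a minimal illustration with arbitrary angles and offsets.

import numpy as np

def translation(tx, ty, tz):
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

def rotation_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    R = np.eye(4)
    R[:2, :2] = [[c, -s], [s, c]]
    return R

def scaling(sx, sy, sz):
    return np.diag([sx, sy, sz, 1.0])

# Compose: scale by 2, rotate 30 degrees about the z-axis, then translate
M = translation(1.0, 0.0, 2.0) @ rotation_z(np.radians(30)) @ scaling(2.0, 2.0, 2.0)

point = np.array([1.0, 1.0, 1.0, 1.0])  # a 3D point in homogeneous coordinates
transformed = M @ point                 # apply the combined transformation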
3D to 2D Projection
In computer vision, the process of projecting a 3D scene onto a 2D image is known as 3D to 2D
projection. This projection simulates how a camera captures a scene in the real world and
transforms the 3D coordinates of objects into 2D image coordinates. The projection is governed
by the camera's intrinsic parameters (focal length, principal point) and the relative pose of the
camera with respect to the scene.
There are various mathematical models used for 3D to 2D projection, but one commonly used model is the pinhole camera model. According to this model, a pinhole camera consists of a tiny aperture (the pinhole) through which light passes to form an image on the camera's image plane. Under this model, a 3D point (X, Y, Z) expressed in camera coordinates projects to the image point (x, y) = (f·X/Z, f·Y/Z), where f is the focal length; the camera's intrinsic parameters then map these coordinates to pixel locations.
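A minimal sketch of pinhole projection with an intrinsic matrix K and an identity camera pose is given below; the focal length and principal point values are illustrative assumptions.

import numpy as np

# Illustrative intrinsics: focal length 800 px, principal point (320, 240)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Camera pose: identity rotation and zero translation (camera at the origin)
R = np.eye(3)
t = np.zeros((3, 1))

def project(points_3d):
    # Project Nx3 world points to Nx2 pixel coordinates with the pinhole model
    cam = R @ points_3d.T + t      # world -> camera coordinates
    uvw = K @ cam                  # apply the intrinsic matrix
    return (uvw[:2] / uvw[2]).T    # divide by depth to get pixel coordinates

points = np.array([[0.0, 0.0, 5.0],
                   [1.0, -0.5, 4.0]])
print(project(points))             # the first point lands on the principal point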
Sampling in Computer Vision:
Sampling in computer vision refers to the process of converting a continuous image into a discrete
representation by selecting a finite set of points (pixels) from the continuous image. Each pixel
represents the color or intensity value at a specific location in the image. The sampling rate or pixel
resolution determines the level of detail and fidelity in representing the original image. Higher
sampling rates capture more details, while lower sampling rates result in a loss of information.
Aliasing in Computer Vision:
Aliasing in computer vision occurs when the sampling rate is insufficient to accurately represent
high-frequency components or details in an image. It leads to the appearance of unwanted visual
artifacts, which can distort or misrepresent the original image. Aliasing in images is typically
observed as moiré patterns, jagged edges, or false patterns that are not present in the continuous
version of the image.
Aliasing can occur in computer vision applications in various scenarios, such as:
• Image downsampling: When reducing the resolution of an image (e.g., for storage or
display purposes), aliasing can occur if high-frequency information is not properly
represented or lost.
• Edge detection: Aliasing can affect the accuracy of edge detection algorithms, as high
frequency details may be misrepresented or missed, leading to incomplete or inaccurate
edge detection.
• Texture analysis: When analyzing textures in an image, aliasing can introduce false
patterns or distort the appearance of textures, making it challenging to extract meaningful
texture information.
To mitigate aliasing in computer vision, various techniques can be applied:
• Anti-aliasing filters: Pre-filtering the image with anti-aliasing filters can limit the high-
frequency content before sampling, reducing the likelihood of aliasing artifacts.
• Super-resolution techniques: Super-resolution methods aim to reconstruct a higher-
resolution image from a lower-resolution version, leveraging information from multiple
frames or image patches to recover details and mitigate aliasing.
• Post-processing and smoothing: Applying post-processing techniques, such as smoothing
or adaptive filtering, can help reduce aliasing artifacts after sampling or during image
processing tasks.
Understanding the potential for aliasing and employing appropriate sampling strategies and anti-
aliasing techniques are crucial in computer vision to ensure accurate representation, analysis, and
processing of images and visual data.
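The effect of pre-filtering before downsampling can be sketched as follows; the file name, blur size, and downsampling factor are illustrative assumptions.

import cv2

image = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)   # illustrative input

# Naive downsampling: keep every 4th pixel; high frequencies can alias
aliased = image[::4, ::4]

# Anti-aliased downsampling: Gaussian low-pass filter first, then subsample
blurred = cv2.GaussianBlur(image, (7, 7), 2.0)
smooth = blurred[::4, ::4]

# cv2.resize with INTER_AREA averages pixels and acts as an anti-aliasing resampler
resized = cv2.resize(image, None, fx=0.25, fy=0.25, interpolation=cv2.INTER_AREA)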
Point Operators
In image processing, point operators, also known as pixel-wise operations or point-wise
operations, are a class of operations that are applied to individual pixels of an image independently.
These operators typically transform the pixel values based on specific mathematical functions or
predefined lookup tables. Point operators are widely used for image enhancement, contrast
adjustment, grayscale conversion, and other pixel-level manipulations. Here are some commonly
used point operators:
• Contrast Stretching: Contrast stretching expands an image’s dynamic range of pixel values.
It aims to increase the contrast between dark and bright regions by mapping the pixel values
to a wider range. This can be achieved using linear or nonlinear stretching functions.
• Histogram Equalization: Histogram equalization redistributes the pixel values in an image
to achieve a more uniform histogram. It enhances the contrast by mapping the original
pixel values to a new range based on the cumulative distribution function (CDF) of the
image histogram.
• Gamma Correction: Gamma correction adjusts the image's tonal scale by applying a power-
law transformation to the pixel values. It is used to correct the nonlinear response of
imaging devices or to modify the perceived brightness and contrast of an image.
• Thresholding: Thresholding converts a grayscale image into a binary image by assigning a
specific threshold value. Pixels above the threshold are set to one, while pixels below the
threshold are set to zero. It is commonly used for image segmentation and object extraction.
• Inversion: Inversion, also known as negative transformation, reverses the intensity values
of an image. Dark regions become bright, and bright regions become dark. It is often used
for special effects or for highlighting certain features.
• Color Space Conversion: Color space conversion transforms the representation of an image
from one color space to another. For example, converting an RGB image to grayscale or
converting between different color models such as RGB, HSV, CMYK, etc.
These are just a few examples of point operators commonly used in image processing. Point
operators are simple and efficient operations that can be applied directly to individual pixels
without considering their spatial relationships. They are essential tools for manipulating pixel
values, enhancing image appearance, and preparing images for further analysis or processing tasks.
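A minimal NumPy sketch of several of these point operators, assuming 8-bit grayscale input, is shown below; the default parameter values are illustrative.

import numpy as np

def contrast_stretch(image, out_min=0, out_max=255):
    # Linearly map the image's intensity range onto [out_min, out_max]
    img = image.astype(np.float64)
    stretched = (img - img.min()) / (img.max() - img.min()) * (out_max - out_min) + out_min
    return stretched.astype(np.uint8)

def gamma_correct(image, gamma=2.2):
    # Power-law transform applied to intensities normalized to [0, 1]
    normalized = image.astype(np.float64) / 255.0
    return (255.0 * normalized ** (1.0 / gamma)).astype(np.uint8)

def invert(image):
    # Negative transformation: dark becomes bright and vice versa
    return 255 - image

def threshold(image, t=128):
    # Binary thresholding: 1 where the pixel is at or above t, 0 otherwise
    return (image >= t).astype(np.uint8)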
Pixel Transforms
Logarithmic Transform: The pixel values are logarithmically transformed to enhance the visualization of details in darker regions.
Mathematically, a pixel value p is transformed as:
p' = c * log(1 + p)
where c is a constant determining the scaling factor.
Thresholding:
Global Thresholding: A threshold value is applied to convert grayscale images into binary images.
Mathematically, a pixel value p is transformed as:
p' = 0 if p < threshold
p' = 1 if p >= threshold
Adaptive Thresholding: Different threshold values are computed for different regions of the image
based on local characteristics.
Mathematically, a pixel value p in a local neighborhood is transformed based on its relationship
with the local threshold.
These are just a few examples of pixel transforms in computer vision. The mathematical
formulations provided here serve as a general representation of how the pixel values can be
modified. The specific parameter values and adaptation methods may vary based on the application
and desired outcome.
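The logarithmic transform and the two thresholding variants above can be sketched with NumPy and OpenCV as follows; the file name, block size, and constants are illustrative assumptions.

import cv2
import numpy as np

image = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)   # illustrative input

# Logarithmic transform: c chosen so the output still spans 0..255
c = 255.0 / np.log(1.0 + image.max())
log_transformed = (c * np.log1p(image.astype(np.float64))).astype(np.uint8)

# Global thresholding at an illustrative threshold of 128
_, binary = cv2.threshold(image, 128, 255, cv2.THRESH_BINARY)

# Adaptive thresholding: each pixel's threshold comes from its 11x11 neighborhood
adaptive = cv2.adaptiveThreshold(image, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                 cv2.THRESH_BINARY, 11, 2)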
Color Transforms
Color transforms in computer vision involve converting an image from one color space to another.
These transformations allow for manipulating and extracting information from color images. Here
are some commonly used color transforms in computer vision along with their mathematical
formulations:
RGB to Grayscale Conversion:
Average method: The grayscale value is computed as the average of the red, green, and blue
components.
Mathematically, given an RGB pixel value (R, G, B), the grayscale value G' is computed as:
G' = (R + G + B) / 3
Luminosity method: The grayscale value is computed by weighting the red, green, and blue
components according to their perceived brightness.
Mathematically, given an RGB pixel value (R, G, B), the grayscale value G' is computed as:
G' = 0.21R + 0.72G + 0.07B
RGB to HSV Conversion:
HSV (Hue, Saturation, Value) color space is commonly used for color-based image processing
and segmentation.
The conversion involves transforming the RGB values to their corresponding HSV values using
specific formulas. These formulas vary depending on the implementation and may involve
normalization or rescaling operations.
RGB to LAB Conversion:
LAB color space is designed to be more perceptually uniform than RGB, making it useful for
color-based image analysis and computer vision tasks.
The conversion involves transforming the RGB values to their corresponding LAB values using
specific formulas. These formulas include non-linear transformations and normalization
operations.
Color Balance Adjustment:
Color balance adjustments involve modifying the color channels of an image to correct color cast
or achieve a desired color tone.
Mathematically, given an RGB pixel value (R, G, B), the adjusted RGB values are computed as:
R' = R + ΔR
G' = G + ΔG
B' = B + ΔB
where ΔR, ΔG, and ΔB are the color balance adjustments applied to each channel.
Color Quantization:
Color quantization reduces the number of distinct colors in an image, which can be useful for
compression or simplifying color-based processing.
Mathematically, color quantization involves grouping similar colors and replacing them with
representative color values. Various algorithms and techniques can be used for color quantization,
such as k-means clustering or median cut.
The specific mathematical formulations provided here serve as a general representation of the
transformations involved. Different color spaces and algorithms may have variations in their
mathematical formulations, depending on the specific implementation and application.
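The sketch below performs the grayscale, HSV, and LAB conversions with OpenCV and applies a simple per-channel color balance offset; the file name and the offsets are illustrative assumptions (note that OpenCV loads images in BGR channel order).

import cv2
import numpy as np

bgr = cv2.imread("sample.jpg")                    # illustrative input, BGR order

# Grayscale via the luminosity weighting of the R, G, B channels
b, g, r = cv2.split(bgr.astype(np.float64))
gray = (0.21 * r + 0.72 * g + 0.07 * b).astype(np.uint8)

# Built-in color space conversions
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)

# Color balance: add per-channel offsets (ΔB, ΔG, ΔR are illustrative values)
delta = np.array([5, -10, 15], dtype=np.int16)
balanced = np.clip(bgr.astype(np.int16) + delta, 0, 255).astype(np.uint8)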
HISTOGRAM EQUALIZATION
Histogram equalization is a technique used in image processing to enhance the contrast and
improve the overall appearance of an image. It redistributes the pixel intensity values to achieve a
more uniform histogram, thereby increasing the image’s dynamic range. Here's an example of
histogram equalization applied to a small 3x3 image (the step-by-step computation is sketched in the code at the end of this section):
Original 3x3 image (intensity values):
84 197 141
255 169 56
225 28 112
The equalized image has a more balanced distribution of intensity values, resulting in improved
contrast compared to the original image.
Histogram equalization is a widely used technique in image processing and computer vision to
enhance image appearance and improve subsequent analysis or feature extraction tasks.
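A minimal sketch of the equalization steps applied to the 3x3 example above, using the standard CDF-based mapping, is shown below (the exact output values depend on the rounding convention); OpenCV's cv2.equalizeHist performs the same operation as a single call.

import numpy as np

image = np.array([[ 84, 197, 141],
                  [255, 169,  56],
                  [225,  28, 112]], dtype=np.uint8)

# Step 1: histogram over the 256 possible intensity levels
hist = np.bincount(image.ravel(), minlength=256)

# Step 2: cumulative distribution function (CDF)
cdf = hist.cumsum()

# Step 3: map each level through the normalized CDF onto the full 0..255 range
cdf_min = cdf[cdf > 0].min()
lut = np.round((cdf - cdf_min) / (image.size - cdf_min) * 255)
lut = np.clip(lut, 0, 255).astype(np.uint8)

equalized = lut[image]
# With this rounding convention the result is approximately
# [[ 64 191 128]
#  [255 159  32]
#  [223   0  96]]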
LINEAR FILTERING
Linear filtering is a common technique used in computer vision for various tasks such as image
smoothing, edge detection, noise reduction, and feature enhancement. It involves convolving an
image with a filter kernel, which is a small matrix of coefficients. A short sketch of the convolution step is shown below.
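The example below applies a 3x3 averaging (box) kernel for smoothing and a Sobel kernel for horizontal gradients using cv2.filter2D; the file name is an illustrative assumption.

import cv2
import numpy as np

image = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)   # illustrative input

# Smoothing: 3x3 averaging (box) kernel, every coefficient is 1/9
box_kernel = np.ones((3, 3), dtype=np.float32) / 9.0
smoothed = cv2.filter2D(image, -1, box_kernel)

# Edge response: horizontal Sobel kernel emphasizes vertical intensity changes
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)
edges_x = cv2.filter2D(image.astype(np.float32), -1, sobel_x)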
Median Filter
The Median Filter is a widely used non-linear filtering technique in image processing for noise
reduction. It replaces each pixel value with the median value of its neighboring pixels, effectively
removing outliers and preserving edges and details.
Original Image:
70 80 75 55 90
65 75 80 75 60
45 70 50 85 95
75 55 90 80 70
80 75 65 70 85
Applying a 3x3 Median Filter:
70 70 75 55 75
65 70 75 70 70
70 70 75 75 75
75 75 75 75 75
75 75 70 75 75
In this example, a 3x3 Median Filter is applied to the original image. The filter moves over the
image, and at each location, the pixel values within the 3x3 neighborhood are sorted, and the
median value is selected as the new pixel value. The median is the middle value when the pixel
values are arranged in ascending order.
For example, let's consider the pixel at the center of the image with an original value of 50. The neighboring pixel values within the 3x3 kernel are [75, 80, 75, 70, 50, 85, 55, 90, 80]. After sorting them in ascending order, we have [50, 55, 70, 75, 75, 80, 80, 85, 90]. The median value is 75, so the new pixel value becomes 75.
The Median Filter effectively reduces noise, especially salt-and-pepper noise, while preserving the
overall structure and important image details. It is particularly useful when dealing with images
corrupted by random isolated pixel values that can significantly affect the image quality.
Note that the size of the filter kernel (e.g., 3x3, 5x5) can be adjusted based on the desired filtering
effect and the level of noise present in the image. Larger kernel sizes provide stronger noise
reduction but can also blur fine details.
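In practice the median filter is usually a single library call. The sketch below runs OpenCV's cv2.medianBlur on the 5x5 example above; note that border replication at the image edges means the computed values can differ from a hand-worked result, especially in the outermost rows and columns.

import cv2
import numpy as np

image = np.array([[70, 80, 75, 55, 90],
                  [65, 75, 80, 75, 60],
                  [45, 70, 50, 85, 95],
                  [75, 55, 90, 80, 70],
                  [80, 75, 65, 70, 85]], dtype=np.uint8)

# 3x3 median filter: each pixel becomes the median of its neighborhood
filtered = cv2.medianBlur(image, 3)
print(filtered)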
Bilateral Filter
The Bilateral Filter is a non-linear filtering technique used in image processing to reduce noise
while preserving edges. It combines both spatial and intensity information to compute the filtered
pixel values. The mathematical formulation of the Bilateral Filter can be described as follows:
Given an input image I(x, y), where (x, y) represents the spatial coordinates, and a pixel at location
(x, y) with intensity I(x, y), the filtered output pixel value O(x, y) is computed as:
O(x, y) = (1 / W(x, y)) * ∑∑ I(i, j) * Gs(||(x, y) - (i, j)||) * Gr(|I(x, y) - I(i, j)|)
In this formulation: W(x, y) is the normalization factor defined as the sum of the weighting
coefficients for each pixel in the neighborhood.
Gs(||(x, y) - (i, j)||) is the spatial Gaussian kernel, which determines the influence of spatial distance
between the center pixel (x, y) and its neighboring pixel (i, j). It is defined as a function of the
Euclidean distance ||(x, y) - (i, j)||.
Gr(|I(x, y) - I(i, j)|) is the range Gaussian kernel, which determines the influence of the intensity
difference between the center pixel (x, y) and its neighboring pixel (i, j). It is defined as a function
of the absolute difference in intensity values |I(x, y) - I(i, j)|.
The ∑∑ represents the summation over the spatial neighborhood defined around the pixel (x, y),
and the range of summation is determined by the chosen filter window size.
The Bilateral Filter operates by calculating the weighted average of the neighboring pixel
intensities, where the weights are determined by the spatial distance and intensity difference. Pixels
that are close in space and have similar intensity contribute more to the filtered output, while
distant pixels or pixels with large intensity differences have less influence.
The Bilateral Filter strikes a balance between noise reduction and edge preservation by selectively
smoothing pixels while preserving sharp intensity transitions. It is a versatile filter commonly used
in image denoising, texture-preserving smoothing, and other applications where noise reduction
and detail preservation are important.
Original Image:
50 60 70 80 90
60 70 80 90 100
70 80 90 100 110
80 90 100 110 120
90 100 110 120 130
Applying a Bilateral Filter:
66 68 72 79 87
68 71 76 83 91
72 76 82 89 97
79 83 89 97 105
87 91 97 105 112
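OpenCV provides a direct implementation, cv2.bilateralFilter. The diameter and the two sigma values below are illustrative assumptions; different choices produce different filtered values, and the output matrix shown above likewise depends on the parameters used.

import cv2
import numpy as np

image = np.array([[ 50,  60,  70,  80,  90],
                  [ 60,  70,  80,  90, 100],
                  [ 70,  80,  90, 100, 110],
                  [ 80,  90, 100, 110, 120],
                  [ 90, 100, 110, 120, 130]], dtype=np.uint8)

# d is the neighborhood diameter; sigmaColor controls the range kernel Gr and
# sigmaSpace controls the spatial kernel Gs
filtered = cv2.bilateralFilter(image, d=5, sigmaColor=25, sigmaSpace=25)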
FOURIER TRANSFORMS
Fourier Transforms are widely used in computer vision for tasks such as image analysis, filtering,
and feature extraction. They provide a way to represent an image as a sum of sinusoidal
components, revealing the frequency content and spatial characteristics of the image. The
mathematical formulation of the Fourier Transform in two dimensions is as follows:
Given an input image f(x, y), where (x, y) represents the spatial coordinates, the Fourier Transform
F(u, v) is computed as:
F(u, v) = ∬ f(x, y) * e^(-i2π(ux + vy)) dx dy
In this formulation: F(u, v) represents the complex-valued frequency spectrum of the image at the
frequency components (u, v).
u and v represent the spatial frequencies along the horizontal and vertical dimensions, respectively.
f(x, y) is the intensity value at position (x, y) in the spatial domain.
e^(-i2π(ux + vy)) is the complex exponential function that represents the sinusoidal component at
a frequency (u, v) in the image.
The inverse Fourier Transform, which converts the frequency domain representation back to the
spatial domain, is given by:
f(x, y) = ∬ F(u, v) * e^(i2π(ux + vy)) du dv
The Fourier Transform provides a representation of the image in the frequency domain, where the
magnitude and phase of the frequency components convey information about the image's content.
By analyzing the Fourier spectrum, one can identify the presence of specific frequencies, such as
edges, textures, or periodic patterns, and manipulate them for various applications.
The Fast Fourier Transform (FFT) algorithm is commonly employed to efficiently compute the
Fourier Transform. It reduces the computational complexity from O(N^2) to O(N log N), making
it feasible for practical implementation.
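A minimal NumPy sketch of computing and inspecting a 2D Fourier spectrum on a small synthetic image is shown below; the sinusoidal grating is chosen only for illustration.

import numpy as np

# Illustrative 64x64 test image: a vertical sinusoidal grating with an 8-pixel period
x = np.arange(64)
image = np.sin(2 * np.pi * x / 8)[np.newaxis, :] * np.ones((64, 1))

# 2D FFT; fftshift moves the zero-frequency term to the center of the spectrum
F = np.fft.fft2(image)
F_shifted = np.fft.fftshift(F)

# Log-magnitude spectrum, commonly used for visualization; the grating shows up
# as a pair of peaks along the horizontal frequency axis
magnitude = np.log1p(np.abs(F_shifted))

# The inverse transform recovers the image up to numerical round-off
reconstructed = np.real(np.fft.ifft2(F))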
In computer vision, Fourier Transforms are applied for tasks like frequency domain filtering, image restoration, image compression, and feature extraction. They enable the analysis and manipulation of images in the frequency domain, providing insights into the underlying spatial structure and enabling the development of various image processing techniques.
WIENER FILTERING
Wiener filtering is a technique used in image processing for noise reduction and image restoration. It is based on statistical estimation and aims to recover the original image by minimizing the mean square error between the estimated image and the true image.
Let's consider an example where we have an image corrupted by additive Gaussian noise, and we
want to apply Wiener filtering to restore the original image.
Original Image:
120 140 160 180
130 150 170 190
140 160 180 200
150 170 190 210
Corrupted Image (with additive Gaussian noise):
118 141 157 182
127 153 175 187
138 159 182 202
155 168 192 215
To apply Wiener filtering, we need to estimate the power spectral density (PSD) of the original
image and the noise. Let's assume we know the PSD of the noise as:
PSD of Noise:
100 0 0 0
0 100 0 0
0 0 100 0
0 0 0 100
The Wiener filter estimate in the frequency domain can be formulated as follows:
Estimated Image(u, v) = [ H*(u, v) / ( |H(u, v)|^2 + PSD of Noise / PSD of Image ) ] * Corrupted Image(u, v)
where H(u, v) represents the transfer function of the degradation process and H*(u, v) its complex conjugate; for purely additive noise (no blur), H(u, v) = 1.
Applying the Wiener filter to the corrupted image using the given PSD values, we get the estimated
image:
Estimated Image:
120 140 160 180
130 150 170 190
140 160 180 200
150 170 190 210
In this example, the Wiener filter suppresses the additive Gaussian noise in the corrupted image and recovers an estimate that closely matches the original image. The filter exploits knowledge of the noise and image PSDs to construct the filter that minimizes the mean square error.
It's important to note that in practice, the PSDs may not be known exactly and need to be estimated
from the available data. Wiener filtering is effective when the statistics of the noise and image are
known or can be estimated accurately. It is commonly used for image restoration tasks where the
noise characteristics are well understood.
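A minimal frequency-domain sketch of the formula above for the pure-denoising case (H(u, v) = 1), assuming the image and noise power spectral densities are known, is given below; in practice the PSDs usually have to be estimated from the data.

import numpy as np

def wiener_denoise(corrupted, psd_image, psd_noise):
    # Wiener filtering for additive noise only (degradation H = 1). With H = 1 the
    # filter reduces to the per-frequency gain PSD_image / (PSD_image + PSD_noise)
    # applied to the spectrum of the corrupted image.
    G = np.fft.fft2(corrupted)
    gain = psd_image / (psd_image + psd_noise)
    estimate = np.fft.ifft2(gain * G)
    return np.real(estimate)

# corrupted, psd_image, and psd_noise are 2D arrays of the same shape; the PSD
# arrays are indexed by spatial frequency (u, v).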