CV Unit-1
Computer vision is a field of artificial intelligence (AI) that focuses on enabling computers to
understand and interpret visual information from digital images or videos. It seeks to replicate the
complex visual perception and processing capabilities of the human visual system.
The goal of computer vision is to develop algorithms and systems that can analyze and understand
visual data, allowing computers to perceive the world in a way similar to humans. By extracting
meaningful information from images or videos, computer vision enables a wide range of
applications, such as object recognition, image classification, image segmentation, tracking, 3D
reconstruction, and even autonomous navigation.
Computer vision algorithms rely on a combination of image processing techniques, statistical
learning, and deep learning methods. Image processing techniques involve manipulating and
enhancing images to extract features and reduce noise, while statistical learning algorithms enable
the training of models based on labeled datasets. Deep learning, especially convolutional neural
networks (CNNs), has played a significant role in recent advancements in computer vision,
providing powerful tools for feature extraction and pattern recognition.
The applications of computer vision are vast and continue to expand rapidly. It is used in various
fields such as healthcare, self-driving cars, surveillance and security systems, robotics, augmented
reality, virtual reality, industrial automation, agriculture, and many others. Computer vision has
the potential to revolutionize industries and improve our daily lives by enabling machines to
perceive and understand the visual world around us.
A computer vision system can be broken down into the following components. Keep in mind that computer vision systems can vary depending on the specific application and complexity. Here is a brief description of a basic computer vision system:
• Input: The system takes in an input image or video frame captured by a camera or sourced
from a dataset.
• Preprocessing: The input is preprocessed to enhance the quality and prepare it for further
analysis. Preprocessing steps may include resizing, cropping, color normalization, and
noise reduction.
• Feature Extraction: This stage involves extracting relevant features from the preprocessed
image or frame. Features can include edges, corners, textures, colors, or higher-level
representations like objects or shapes. Feature extraction techniques may involve filters,
gradients, or more advanced methods like deep learning-based feature extraction.
• Feature Representation: The extracted features are represented in a suitable format for
further analysis. This representation could be a vector, a histogram, a set of descriptors, or
any other format that encodes the information captured by the features.
• Object Detection and Recognition: Object detection and recognition algorithms are
employed to identify specific objects or patterns of interest within the image or video
frame. This can involve techniques like template matching, Haar cascades, or advanced
deep learning models such as convolutional neural networks (CNNs).
• Image Segmentation: Image segmentation aims to partition an image into meaningful
regions or segments based on properties like color, texture, or intensity. This step is useful
for tasks like object tracking, scene understanding, or image editing.
• High-Level Processing: Once objects or regions of interest are identified, further high-level
processing can be performed. This may involve tasks such as object tracking, pose
estimation, 3D reconstruction, behavior analysis, or semantic understanding.
• Output: The final stage involves providing the desired output based on the application
requirements. Outputs can include annotated images, object labels, measurements, or any
other information relevant to the specific task.
It is important to note that computer vision systems can be much more complex, incorporating additional stages or variations based on the application's requirements. The pipeline described above represents a basic framework for understanding the components of a computer vision system.
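As a rough illustration of how these stages map onto code, the following sketch wires together a minimal pipeline with OpenCV. The file name sample.jpg, the blur and Canny parameters, and the use of contour detection as a stand-in for object detection are illustrative assumptions rather than a prescribed design.

import cv2

# Input: read an image (the file name is an assumption for illustration)
image = cv2.imread("sample.jpg")

# Preprocessing: resize, convert to grayscale, and reduce noise
resized = cv2.resize(image, (640, 480))
gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)
denoised = cv2.GaussianBlur(gray, (5, 5), 0)

# Feature extraction: detect edges as simple low-level features
edges = cv2.Canny(denoised, 50, 150)

# Detection/segmentation: find connected contours in the edge map
# (OpenCV 4.x returns contours and hierarchy)
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Output: draw bounding boxes around the detected regions and save the result
for contour in contours:
    x, y, w, h = cv2.boundingRect(contour)
    cv2.rectangle(resized, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("annotated.jpg", resized)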
Image Formation
Image formation in a camera involves the process of capturing and creating a digital representation
of the visual scene in front of the camera. It consists of several steps, including the passage of light
through the camera lens, its interaction with the camera sensor, and the conversion of the captured light into a digital signal.
A simplified explanation of the image formation process in a digital camera is as follows:
• Light enters the camera: When you take a photo, light from the scene passes through the
camera's lens. The lens helps focus the light onto the camera's sensor.
• Lens and focusing: The camera lens plays a crucial role in image formation. It adjusts its
shape and position to focus the incoming light rays onto the camera sensor. This process
helps create a clear and sharp image.
• Aperture: The aperture is an adjustable opening in the lens that controls the amount of light
entering the camera. It determines the depth of field and affects the image's brightness and
sharpness. A smaller aperture (larger f-number) allows less light, while a larger aperture
(smaller f-number) allows more light.
• Sensor capture: The light, after passing through the lens, reaches the camera's image sensor.
The sensor is an array of millions of light-sensitive pixels that convert light energy into an
electrical signal. Each pixel measures the intensity of light falling on it.
• Photodetection and conversion: In a digital camera, the image sensor is typically a charge-
coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) sensor.
These sensors use photodiodes or photogates to convert photons (light particles) into
electrical charges. Each pixel's photodiode accumulates charge proportional to the light
intensity it receives.
• Analog-to-digital conversion: Once the sensor has captured the light as electrical charges,
an analog-to-digital converter (ADC) converts the analog charge signals into digital data.
This process assigns digital values (numbers) to represent the light intensity for each pixel.
The digital values form the basis of the image.
• Image processing: After the conversion, the digital image data may undergo additional
processing within the camera or in post-processing on a computer. This processing can
include color correction, noise reduction, sharpening, and other enhancements to improve
the image quality.
• Storage and display: Finally, the digital image is typically stored in a memory card or
internal memory within the camera. It can then be displayed on the camera's LCD screen
or transferred to a computer or other devices for further viewing, editing, or sharing.
Geometric Primitives
In computer vision, geometric primitives refer to basic geometric shapes or structures that are used
to represent and analyze visual information in images or 3D scenes. These primitives serve as
fundamental elements for tasks such as object detection, recognition, pose estimation, and
geometric transformations. Here are some commonly used geometric primitives in computer
vision:
• Points: Points in computer vision represent specific locations in an image or a 3D scene.
They are often used to represent the position of key features or landmarks, such as corners,
key points, or interest points.
• Lines and line segments: Lines and line segments are used to represent linear structures or
boundaries in images. They can be detected through techniques like edge detection and can
provide information about object boundaries, contours, or straight lines in the scene.
• Circles and ellipses: Circles and ellipses are used to represent circular or elliptical
structures in images. They are often utilized for tasks such as object detection, circle fitting,
shape analysis, and calibration.
• Rectangles and bounding boxes: Rectangles and bounding boxes are used to represent the
spatial extent or boundaries of objects in images. They are commonly employed for object
localization, object detection, and region-based analysis.
• Polygons: Polygons are used to represent complex, irregular shapes or regions in images
or 2D/3D scenes. They are often used for tasks such as object segmentation, region of
interest (ROI) extraction, and shape representation.
• Planes: Planes are used to represent flat surfaces or regions in 3D scenes. They can be used
for tasks such as plane detection, surface reconstruction, and scene understanding.
These geometric primitives provide a foundational framework for analyzing and manipulating
visual data in computer vision. By extracting and representing the geometric information from
images or scenes, computer vision algorithms can understand the spatial relationships, shapes, and
structures within the visual data, enabling a wide range of applications, including object
recognition, scene understanding, augmented reality, and robotics.
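In code these primitives are usually nothing more than small arrays or tuples. The sketch below shows one common, illustrative convention using NumPy; the (x, y) ordering and the corner-plus-size box format are assumptions that vary between libraries.

import numpy as np

point = np.array([120.0, 45.0])                            # (x, y) location of a keypoint
line_segment = np.array([[10, 20], [200, 180]])            # two endpoints of a segment
circle = {"center": (64, 64), "radius": 20}                # center and radius in pixels
bounding_box = (30, 40, 100, 80)                           # (x, y, width, height) of a region
polygon = np.array([[0, 0], [50, 10], [40, 60], [5, 45]])  # ordered vertices of a shape
plane = np.array([0.0, 0.0, 1.0, -2.5])                    # [a, b, c, d] with ax + by + cz + d = 0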
2D Transformations
In computer vision, 2D transformations are used to manipulate and alter the geometric properties
of images or objects in a 2D space. These transformations are applied to perform tasks such as
image alignment, registration, object tracking, and geometric correction. Here are some commonly
used 2D transformations in computer vision:
• Translation: Translation involves shifting an image or object in the x and y directions. It
moves the entire image/object without altering its shape or orientation. The translation is
useful for tasks such as aligning images, object tracking, and image stitching.
• Rotation: Rotation involves rotating an image or object by a certain angle around a
specified point, known as the rotation center. It can be clockwise or counterclockwise.
Rotation is used for tasks like image alignment, object recognition, and pose estimation.
• Scaling: Scaling involves resizing an image or object, either by enlarging or reducing its
size. It is performed uniformly in both the x and y directions. Scaling is useful for tasks
such as image resizing, object detection at different scales, and multi-resolution analysis.
• Shearing: Shearing is a transformation that slants an image or object along one axis while
keeping the other axis fixed. It stretches or compresses the image/object in a particular
direction. Shearing is utilized for tasks like correcting image distortions, texture mapping,
and skew correction.
• Affine Transformation: Affine transformations preserve parallel lines and ratios of
distances between points. They include combinations of translation, rotation, scaling, and
shearing. Affine transformations are used for tasks such as image registration, geometric
correction, and texture warping.
• Perspective Transformation: Perspective transformation is a more complex transformation
that allows for the simulation of a three-dimensional (3D) perspective effect on a 2D image
or object. It is used to correct distortions caused by perspective projection, rectify tilted or
skewed objects, and perform virtual camera transformations.
These 2D transformations enable the manipulation and adjustment of images or objects in
computer vision applications. By applying these transformations, we can align images, correct
distortions, estimate pose, track objects, and perform various geometric operations to enhance the
analysis and understanding of visual data.
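A sketch of several of these transformations using OpenCV follows; the input file name, the angle, the scale factor, and the target quadrilateral are illustrative assumptions.

import cv2
import numpy as np

image = cv2.imread("sample.jpg")        # illustrative input
h, w = image.shape[:2]

# Translation: shift 50 px right and 30 px down with a 2x3 affine matrix
T = np.float32([[1, 0, 50], [0, 1, 30]])
translated = cv2.warpAffine(image, T, (w, h))

# Rotation: 45 degrees counterclockwise about the image center, no scaling
R = cv2.getRotationMatrix2D((w / 2, h / 2), 45, 1.0)
rotated = cv2.warpAffine(image, R, (w, h))

# Scaling: resize uniformly to half the original size
scaled = cv2.resize(image, None, fx=0.5, fy=0.5)

# Perspective transformation: map the image corners onto a skewed quadrilateral
src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
dst = np.float32([[0, 0], [w, 0], [0.8 * w, h], [0.2 * w, h]])
P = cv2.getPerspectiveTransform(src, dst)
warped = cv2.warpPerspective(image, P, (w, h))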
3D Transformations
In computer vision, 3D transformations are used to manipulate and alter the geometric properties
of objects or scenes in a three-dimensional space. These transformations enable tasks such as 3D
object manipulation, camera pose estimation, 3D reconstruction, and augmented reality. Here are
some commonly used 3D transformations in computer vision:
• Translation: Translation in 3D involves moving an object or scene in the x, y, and z
directions. It displaces the entire object or scene without changing its shape or orientation.
Translation is used for tasks such as object positioning, scene alignment, and camera
motion compensation.
• Rotation: Rotation in 3D involves rotating an object or scene around a specified axis or
point. It can be performed in various ways, such as Euler angles, rotation matrices, or
quaternions. Rotation is crucial for tasks like object orientation estimation, camera pose
estimation, and viewpoint changes.
• Scaling: Scaling in 3D involves changing the size of an object or scene uniformly in all
three dimensions. It enlarges or reduces the size of the object or scene without altering its
shape. Scaling is used for tasks such as resizing 3D models, adjusting object dimensions,
and multi-scale analysis.
• Shearing: Shearing in 3D involves skewing the shape of an object or scene along one or
more axes. It distorts the object or scene in a specific direction. Shearing can be useful for
tasks such as perspective correction, texture mapping, and non-uniform scaling.
• Affine Transformation: Affine transformations in 3D combine translation, rotation,
scaling, and shearing. They preserve parallel lines, ratios of lengths, and angles. Affine
transformations are used for tasks such as 3D object registration, surface alignment, and
geometric correction.
• Perspective Transformation: Perspective transformation in 3D simulates the effects of
perspective projection, allowing the transformation of 3D objects or scenes to a 2D space.
It takes into account the position and orientation of the camera relative to the object or
scene. Perspective transformation is essential for tasks such as 3D-to-2D projection, virtual
camera rendering, and augmented reality.
These 3D transformations enable the manipulation, analysis, and rendering of three-dimensional
objects and scenes in computer vision applications. By applying these transformations, we can
align 3D models, estimate camera poses, reconstruct 3D scenes, and perform various geometric
operations for visualization, analysis, and interaction with three-dimensional visual data.
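These transformations are commonly implemented as 4x4 homogeneous matrices so that they compose by matrix multiplication. The sketch below is a minimal illustration with arbitrary angles and offsets.

import numpy as np

def translation(tx, ty, tz):
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

def rotation_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    R = np.eye(4)
    R[:2, :2] = [[c, -s], [s, c]]
    return R

def scaling(sx, sy, sz):
    return np.diag([sx, sy, sz, 1.0])

# Compose: scale by 2, rotate 30 degrees about the z-axis, then translate
M = translation(1.0, 0.0, 2.0) @ rotation_z(np.radians(30)) @ scaling(2.0, 2.0, 2.0)

point = np.array([1.0, 1.0, 1.0, 1.0])  # a 3D point in homogeneous coordinates
transformed = M @ point                 # apply the combined transformation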
3D to 2D Projection
In computer vision, the process of projecting a 3D scene onto a 2D image is known as 3D to 2D
projection. This projection simulates how a camera captures a scene in the real world and
transforms the 3D coordinates of objects into 2D image coordinates. The projection is governed
by the camera's intrinsic parameters (focal length, principal point) and the relative pose of the
camera with respect to the scene.
There are various mathematical models used for 3D to 2D projection, but one commonly used model is the pinhole camera model. According to this model, a pinhole camera consists of a tiny aperture (the pinhole) through which light passes to form an image on the camera's image plane. Under this model, a 3D point (X, Y, Z) expressed in camera coordinates projects to the image point (x, y) = (f·X/Z, f·Y/Z), where f is the focal length; the camera's intrinsic parameters then map these coordinates to pixel locations.
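A minimal sketch of pinhole projection with an intrinsic matrix K and an identity camera pose is given below; the focal length and principal point values are illustrative assumptions.

import numpy as np

# Illustrative intrinsics: focal length 800 px, principal point (320, 240)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Camera pose: identity rotation and zero translation (camera at the origin)
R = np.eye(3)
t = np.zeros((3, 1))

def project(points_3d):
    # Project Nx3 world points to Nx2 pixel coordinates with the pinhole model
    cam = R @ points_3d.T + t      # world -> camera coordinates
    uvw = K @ cam                  # apply the intrinsic matrix
    return (uvw[:2] / uvw[2]).T    # divide by depth to get pixel coordinates

points = np.array([[0.0, 0.0, 5.0],
                   [1.0, -0.5, 4.0]])
print(project(points))             # the first point lands on the principal point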
Sampling in Computer Vision:
Sampling in computer vision refers to the process of converting a continuous image into a discrete
representation by selecting a finite set of points (pixels) from the continuous image. Each pixel
represents the color or intensity value at a specific location in the image. The sampling rate or pixel
resolution determines the level of detail and fidelity in representing the original image. Higher
sampling rates capture more details, while lower sampling rates result in a loss of information.
Aliasing in Computer Vision:
Aliasing in computer vision occurs when the sampling rate is insufficient to accurately represent
high-frequency components or details in an image. It leads to the appearance of unwanted visual
artifacts, which can distort or misrepresent the original image. Aliasing in images is typically
observed as moiré patterns, jagged edges, or false patterns that are not present in the continuous
version of the image.
Aliasing can occur in computer vision applications in various scenarios, such as:
• Image downsampling: When reducing the resolution of an image (e.g., for storage or
display purposes), aliasing can occur if high-frequency information is not properly
represented or lost.
• Edge detection: Aliasing can affect the accuracy of edge detection algorithms, as high
frequency details may be misrepresented or missed, leading to incomplete or inaccurate
edge detection.
• Texture analysis: When analyzing textures in an image, aliasing can introduce false
patterns or distort the appearance of textures, making it challenging to extract meaningful
texture information.
To mitigate aliasing in computer vision, various techniques can be applied:
• Anti-aliasing filters: Pre-filtering the image with anti-aliasing filters can limit the high-
frequency content before sampling, reducing the likelihood of aliasing artifacts.
• Super-resolution techniques: Super-resolution methods aim to reconstruct a higher-
resolution image from a lower-resolution version, leveraging information from multiple
frames or image patches to recover details and mitigate aliasing.
• Post-processing and smoothing: Applying post-processing techniques, such as smoothing
or adaptive filtering, can help reduce aliasing artifacts after sampling or during image
processing tasks.
Understanding the potential for aliasing and employing appropriate sampling strategies and anti-
aliasing techniques are crucial in computer vision to ensure accurate representation, analysis, and
processing of images and visual data.
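The effect of pre-filtering before downsampling can be sketched as follows; the file name, blur size, and downsampling factor are illustrative assumptions.

import cv2

image = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)   # illustrative input

# Naive downsampling: keep every 4th pixel; high frequencies can alias
aliased = image[::4, ::4]

# Anti-aliased downsampling: Gaussian low-pass filter first, then subsample
blurred = cv2.GaussianBlur(image, (7, 7), 2.0)
smooth = blurred[::4, ::4]

# cv2.resize with INTER_AREA averages pixels and acts as an anti-aliasing resampler
resized = cv2.resize(image, None, fx=0.25, fy=0.25, interpolation=cv2.INTER_AREA)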
Point Operators
In image processing, point operators, also known as pixel-wise operations or point-wise
operations, are a class of operations that are applied to individual pixels of an image independently.
These operators typically transform the pixel values based on specific mathematical functions or
predefined lookup tables. Point operators are widely used for image enhancement, contrast
adjustment, grayscale conversion, and other pixel-level manipulations. Here are some commonly
used point operators:
• Contrast Stretching: Contrast stretching expands an image’s dynamic range of pixel values.
It aims to increase the contrast between dark and bright regions by mapping the pixel values
to a wider range. This can be achieved using linear or nonlinear stretching functions.
• Histogram Equalization: Histogram equalization redistributes the pixel values in an image
to achieve a more uniform histogram. It enhances the contrast by mapping the original
pixel values to a new range based on the cumulative distribution function (CDF) of the
image histogram.
• Gamma Correction: Gamma correction adjusts the image's tonal scale by applying a power-
law transformation to the pixel values. It is used to correct the nonlinear response of
imaging devices or to modify the perceived brightness and contrast of an image.
• Thresholding: Thresholding converts a grayscale image into a binary image by assigning a
specific threshold value. Pixels above the threshold are set to one, while pixels below the
threshold are set to zero. It is commonly used for image segmentation and object extraction.
• Inversion: Inversion, also known as negative transformation, reverses the intensity values
of an image. Dark regions become bright, and bright regions become dark. It is often used
for special effects or for highlighting certain features.
• Color Space Conversion: Color space conversion transforms the representation of an image
from one color space to another. For example, converting an RGB image to grayscale or
converting between different color models such as RGB, HSV, CMYK, etc.
These are just a few examples of point operators commonly used in image processing. Point
operators are simple and efficient operations that can be applied directly to individual pixels
without considering their spatial relationships. They are essential tools for manipulating pixel
values, enhancing image appearance, and preparing images for further analysis or processing tasks.
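A minimal NumPy sketch of several of these point operators, assuming 8-bit grayscale input, is shown below; the default parameter values are illustrative.

import numpy as np

def contrast_stretch(image, out_min=0, out_max=255):
    # Linearly map the image's intensity range onto [out_min, out_max]
    img = image.astype(np.float64)
    stretched = (img - img.min()) / (img.max() - img.min()) * (out_max - out_min) + out_min
    return stretched.astype(np.uint8)

def gamma_correct(image, gamma=2.2):
    # Power-law transform applied to intensities normalized to [0, 1]
    normalized = image.astype(np.float64) / 255.0
    return (255.0 * normalized ** (1.0 / gamma)).astype(np.uint8)

def invert(image):
    # Negative transformation: dark becomes bright and vice versa
    return 255 - image

def threshold(image, t=128):
    # Binary thresholding: 1 where the pixel is at or above t, 0 otherwise
    return (image >= t).astype(np.uint8)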
Pixel Transforms
Logarithmic Transform: The pixel values are logarithmically transformed to enhance the visualization of details in darker regions.
Mathematically, a pixel value p is transformed as:
p' = c * log(1 + p)
where c is a constant determining the scaling factor.
Thresholding:
Global Thresholding: A threshold value is applied to convert grayscale images into binary images.
Mathematically, a pixel value p is transformed as:
p' = 0 if p < threshold
p' = 1 if p >= threshold
Adaptive Thresholding: Different threshold values are computed for different regions of the image
based on local characteristics.
Mathematically, a pixel value p in a local neighborhood is transformed based on its relationship
with the local threshold.
These are just a few examples of pixel transforms in computer vision. The mathematical
formulations provided here serve as a general representation of how the pixel values can be
modified. The specific parameter values and adaptation methods may vary based on the application
and desired outcome.
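The logarithmic transform and the two thresholding variants above can be sketched with NumPy and OpenCV as follows; the file name, block size, and constants are illustrative assumptions.

import cv2
import numpy as np

image = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)   # illustrative input

# Logarithmic transform: c chosen so the output still spans 0..255
c = 255.0 / np.log(1.0 + image.max())
log_transformed = (c * np.log1p(image.astype(np.float64))).astype(np.uint8)

# Global thresholding at an illustrative threshold of 128
_, binary = cv2.threshold(image, 128, 255, cv2.THRESH_BINARY)

# Adaptive thresholding: each pixel's threshold comes from its 11x11 neighborhood
adaptive = cv2.adaptiveThreshold(image, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                 cv2.THRESH_BINARY, 11, 2)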
Color Transforms
Color transforms in computer vision involve converting an image from one color space to another.
These transformations allow for manipulating and extracting information from color images. Here
are some commonly used color transforms in computer vision along with their mathematical
formulations:
RGB to Grayscale Conversion:
Average method: The grayscale value is computed as the average of the red, green, and blue
components.
Mathematically, given an RGB pixel value (R, G, B), the grayscale value G' is computed as:
G' = (R + G + B) / 3
Luminosity method: The grayscale value is computed by weighting the red, green, and blue
components according to their perceived brightness.
Mathematically, given an RGB pixel value (R, G, B), the grayscale value G' is computed as:
G' = 0.21R + 0.72G + 0.07B
RGB to HSV Conversion:
HSV (Hue, Saturation, Value) color space is commonly used for color-based image processing
and segmentation.
The conversion involves transforming the RGB values to their corresponding HSV values using
specific formulas. These formulas vary depending on the implementation and may involve
normalization or rescaling operations.
RGB to LAB Conversion:
LAB color space is designed to be more perceptually uniform than RGB, making it useful for
color-based image analysis and computer vision tasks.
The conversion involves transforming the RGB values to their corresponding LAB values using
specific formulas. These formulas include non-linear transformations and normalization
operations.
Color Balance Adjustment:
Color balance adjustments involve modifying the color channels of an image to correct color cast
or achieve a desired color tone.
Mathematically, given an RGB pixel value (R, G, B), the adjusted RGB values are computed as:
R' = R + ΔR
G' = G + ΔG
B' = B + ΔB
where ΔR, ΔG, and ΔB are the color balance adjustments applied to each channel.
Color Quantization:
Color quantization reduces the number of distinct colors in an image, which can be useful for
compression or simplifying color-based processing.
Mathematically, color quantization involves grouping similar colors and replacing them with
representative color values. Various algorithms and techniques can be used for color quantization,
such as k-means clustering or median cut.
The specific mathematical formulations provided here serve as a general representation of the
transformations involved. Different color spaces and algorithms may have variations in their
mathematical formulations, depending on the specific implementation and application.
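The sketch below performs the grayscale, HSV, and LAB conversions with OpenCV and applies a simple per-channel color balance offset; the file name and the offsets are illustrative assumptions (note that OpenCV loads images in BGR channel order).

import cv2
import numpy as np

bgr = cv2.imread("sample.jpg")                    # illustrative input, BGR order

# Grayscale via the luminosity weighting of the R, G, B channels
b, g, r = cv2.split(bgr.astype(np.float64))
gray = (0.21 * r + 0.72 * g + 0.07 * b).astype(np.uint8)

# Built-in color space conversions
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)

# Color balance: add per-channel offsets (ΔB, ΔG, ΔR are illustrative values)
delta = np.array([5, -10, 15], dtype=np.int16)
balanced = np.clip(bgr.astype(np.int16) + delta, 0, 255).astype(np.uint8)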
HISTOGRAM EQUALIZATION
Histogram equalization is a technique used in image processing to enhance the contrast and
improve the overall appearance of an image. It redistributes the pixel intensity values to achieve a
more uniform histogram, thereby increasing the image’s dynamic range. Here's an example of
histogram equalization applied to a small 3x3 image (the step-by-step computation is sketched in the code at the end of this section):
Original 3x3 image (intensity values):
84 197 141
255 169 56
225 28 112
The equalized image has a more balanced distribution of intensity values, resulting in improved
contrast compared to the original image.
Histogram equalization is a widely used technique in image processing and computer vision to
enhance image appearance and improve subsequent analysis or feature extraction tasks.
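A minimal sketch of the equalization steps applied to the 3x3 example above, using the standard CDF-based mapping, is shown below (the exact output values depend on the rounding convention); OpenCV's cv2.equalizeHist performs the same operation as a single call.

import numpy as np

image = np.array([[ 84, 197, 141],
                  [255, 169,  56],
                  [225,  28, 112]], dtype=np.uint8)

# Step 1: histogram over the 256 possible intensity levels
hist = np.bincount(image.ravel(), minlength=256)

# Step 2: cumulative distribution function (CDF)
cdf = hist.cumsum()

# Step 3: map each level through the normalized CDF onto the full 0..255 range
cdf_min = cdf[cdf > 0].min()
lut = np.round((cdf - cdf_min) / (image.size - cdf_min) * 255)
lut = np.clip(lut, 0, 255).astype(np.uint8)

equalized = lut[image]
# With this rounding convention the result is approximately
# [[ 64 191 128]
#  [255 159  32]
#  [223   0  96]]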
LINEAR FILTERING
Linear filtering is a common technique used in computer vision for various tasks such as image
smoothing, edge detection, noise reduction, and feature enhancement. It involves convolving an
image with a filter kernel, which is a small matrix of coefficients. A short sketch of the convolution step is shown below.
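The example below applies a 3x3 averaging (box) kernel for smoothing and a Sobel kernel for horizontal gradients using cv2.filter2D; the file name is an illustrative assumption.

import cv2
import numpy as np

image = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)   # illustrative input

# Smoothing: 3x3 averaging (box) kernel, every coefficient is 1/9
box_kernel = np.ones((3, 3), dtype=np.float32) / 9.0
smoothed = cv2.filter2D(image, -1, box_kernel)

# Edge response: horizontal Sobel kernel emphasizes vertical intensity changes
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)
edges_x = cv2.filter2D(image.astype(np.float32), -1, sobel_x)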
Median Filter
The Median Filter is a widely used non-linear filtering technique in image processing for noise
reduction. It replaces each pixel value with the median value of its neighboring pixels, effectively
removing outliers and preserving edges and details.
Original Image:
70 80 75 55 90
65 75 80 75 60
45 70 50 85 95
75 55 90 80 70
80 75 65 70 85
Applying a 3x3 Median Filter:
70 70 75 55 75
65 70 75 70 70
70 70 75 75 75
75 75 75 75 75
75 75 70 75 75
In this example, a 3x3 Median Filter is applied to the original image. The filter moves over the
image, and at each location, the pixel values within the 3x3 neighborhood are sorted, and the
median value is selected as the new pixel value. The median is the middle value when the pixel
values are arranged in ascending order.
For example, let's consider the pixel at the center of the image with an original value of 50. The neighboring pixel values within the 3x3 kernel are [75, 80, 75, 70, 50, 85, 55, 90, 80]. After sorting them in ascending order, we have [50, 55, 70, 75, 75, 80, 80, 85, 90]. The median value is 75, so the new pixel value becomes 75.
The Median Filter effectively reduces noise, especially salt-and-pepper noise, while preserving the
overall structure and important image details. It is particularly useful when dealing with images
corrupted by random isolated pixel values that can significantly affect the image quality.
Note that the size of the filter kernel (e.g., 3x3, 5x5) can be adjusted based on the desired filtering
effect and the level of noise present in the image. Larger kernel sizes provide stronger noise
reduction but can also blur fine details.
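In practice the median filter is usually a single library call. The sketch below runs OpenCV's cv2.medianBlur on the 5x5 example above; note that border replication at the image edges means the computed values can differ from a hand-worked result, especially in the outermost rows and columns.

import cv2
import numpy as np

image = np.array([[70, 80, 75, 55, 90],
                  [65, 75, 80, 75, 60],
                  [45, 70, 50, 85, 95],
                  [75, 55, 90, 80, 70],
                  [80, 75, 65, 70, 85]], dtype=np.uint8)

# 3x3 median filter: each pixel becomes the median of its neighborhood
filtered = cv2.medianBlur(image, 3)
print(filtered)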
Bilateral Filter
The Bilateral Filter is a non-linear filtering technique used in image processing to reduce noise
while preserving edges. It combines both spatial and intensity information to compute the filtered
pixel values. The mathematical formulation of the Bilateral Filter can be described as follows:
Given an input image I(x, y), where (x, y) represents the spatial coordinates, and a pixel at location
(x, y) with intensity I(x, y), the filtered output pixel value O(x, y) is computed as:
O(x, y) = (1 / W(x, y)) * ∑∑ I(i, j) * Gs(||(x, y) - (i, j)||) * Gr(|I(x, y) - I(i, j)|)
In this formulation: W(x, y) is the normalization factor defined as the sum of the weighting
coefficients for each pixel in the neighborhood.
Gs(||(x, y) - (i, j)||) is the spatial Gaussian kernel, which determines the influence of spatial distance
between the center pixel (x, y) and its neighboring pixel (i, j). It is defined as a function of the
Euclidean distance ||(x, y) - (i, j)||.
Gr(|I(x, y) - I(i, j)|) is the range Gaussian kernel, which determines the influence of the intensity
difference between the center pixel (x, y) and its neighboring pixel (i, j). It is defined as a function
of the absolute difference in intensity values |I(x, y) - I(i, j)|.
The ∑∑ represents the summation over the spatial neighborhood defined around the pixel (x, y),
and the range of summation is determined by the chosen filter window size.
The Bilateral Filter operates by calculating the weighted average of the neighboring pixel
intensities, where the weights are determined by the spatial distance and intensity difference. Pixels
that are close in space and have similar intensity contribute more to the filtered output, while
distant pixels or pixels with large intensity differences have less influence.
The Bilateral Filter strikes a balance between noise reduction and edge preservation by selectively
smoothing pixels while preserving sharp intensity transitions. It is a versatile filter commonly used
in image denoising, texture-preserving smoothing, and other applications where noise reduction
and detail preservation are important.
Original Image:
50 60 70 80 90
60 70 80 90 100
70 80 90 100 110
80 90 100 110 120
90 100 110 120 130
Applying a Bilateral Filter:
66 68 72 79 87
68 71 76 83 91
72 76 82 89 97
79 83 89 97 105
87 91 97 105 112
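OpenCV provides a direct implementation, cv2.bilateralFilter. The diameter and the two sigma values below are illustrative assumptions; different choices produce different filtered values, and the output matrix shown above likewise depends on the parameters used.

import cv2
import numpy as np

image = np.array([[ 50,  60,  70,  80,  90],
                  [ 60,  70,  80,  90, 100],
                  [ 70,  80,  90, 100, 110],
                  [ 80,  90, 100, 110, 120],
                  [ 90, 100, 110, 120, 130]], dtype=np.uint8)

# d is the neighborhood diameter; sigmaColor controls the range kernel Gr and
# sigmaSpace controls the spatial kernel Gs
filtered = cv2.bilateralFilter(image, d=5, sigmaColor=25, sigmaSpace=25)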
FOURIER TRANSFORMS
Fourier Transforms are widely used in computer vision for tasks such as image analysis, filtering,
and feature extraction. They provide a way to represent an image as a sum of sinusoidal
components, revealing the frequency content and spatial characteristics of the image. The
mathematical formulation of the Fourier Transform in two dimensions is as follows:
Given an input image f(x, y), where (x, y) represents the spatial coordinates, the Fourier Transform
F(u, v) is computed as:
F(u, v) = ∬ f(x, y) * e^(-i2π(ux + vy)) dx dy
In this formulation: F(u, v) represents the complex-valued frequency spectrum of the image at the
frequency components (u, v).
u and v represent the spatial frequencies along the horizontal and vertical dimensions, respectively.
f(x, y) is the intensity value at position (x, y) in the spatial domain.
e^(-i2π(ux + vy)) is the complex exponential function that represents the sinusoidal component at
a frequency (u, v) in the image.
The inverse Fourier Transform, which converts the frequency domain representation back to the
spatial domain, is given by:
f(x, y) = ∬ F(u, v) * e^(i2π(ux + vy)) du dv
The Fourier Transform provides a representation of the image in the frequency domain, where the
magnitude and phase of the frequency components convey information about the image's content.
By analyzing the Fourier spectrum, one can identify the presence of specific frequencies, such as
edges, textures, or periodic patterns, and manipulate them for various applications.
The Fast Fourier Transform (FFT) algorithm is commonly employed to efficiently compute the
Fourier Transform. It reduces the computational complexity from O(N^2) to O(N log N), making
it feasible for practical implementation.
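A minimal NumPy sketch of computing and inspecting a 2D Fourier spectrum on a small synthetic image is shown below; the sinusoidal grating is chosen only for illustration.

import numpy as np

# Illustrative 64x64 test image: a vertical sinusoidal grating with an 8-pixel period
x = np.arange(64)
image = np.sin(2 * np.pi * x / 8)[np.newaxis, :] * np.ones((64, 1))

# 2D FFT; fftshift moves the zero-frequency term to the center of the spectrum
F = np.fft.fft2(image)
F_shifted = np.fft.fftshift(F)

# Log-magnitude spectrum, commonly used for visualization; the grating shows up
# as a pair of peaks along the horizontal frequency axis
magnitude = np.log1p(np.abs(F_shifted))

# The inverse transform recovers the image up to numerical round-off
reconstructed = np.real(np.fft.ifft2(F))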
In computer vision, Fourier Transforms are applied for tasks like frequency domain filtering, image restoration, image compression, and feature extraction. They enable the analysis and manipulation of images in the frequency domain, providing insights into the underlying spatial structure and enabling the development of various image processing techniques.
WIENER FILTERING
Wiener filtering is a technique used in image processing for noise reduction and image restoration. It is based on statistical estimation and aims to recover the original image by minimizing the mean square error between the estimated image and the true image.
Let's consider an example where we have an image corrupted by additive Gaussian noise, and we
want to apply Wiener filtering to restore the original image.
Original Image:
120 140 160 180
130 150 170 190
140 160 180 200
150 170 190 210
Corrupted Image (with additive Gaussian noise):
118 141 157 182
127 153 175 187
138 159 182 202
155 168 192 215
To apply Wiener filtering, we need to estimate the power spectral density (PSD) of the original
image and the noise. Let's assume we know the PSD of the noise as:
PSD of Noise:
100 0 0 0
0 100 0 0
0 0 100 0
0 0 0 100
The Wiener filter estimate in the frequency domain can be formulated as follows:
Estimated Image(u, v) = [ H*(u, v) / ( |H(u, v)|^2 + PSD of Noise / PSD of Image ) ] * Corrupted Image(u, v)
where H(u, v) represents the transfer function of the degradation process and H*(u, v) its complex conjugate; for purely additive noise (no blur), H(u, v) = 1.
Applying the Wiener filter to the corrupted image using the given PSD values, we get the estimated
image:
Estimated Image:
120 140 160 180
130 150 170 190
140 160 180 200
150 170 190 210
In this example, the Wiener filter suppresses the additive Gaussian noise in the corrupted image and recovers an estimate that closely matches the original image. The filter exploits knowledge of the noise and image PSDs to construct the filter that minimizes the mean square error.
It's important to note that in practice, the PSDs may not be known exactly and need to be estimated
from the available data. Wiener filtering is effective when the statistics of the noise and image are
known or can be estimated accurately. It is commonly used for image restoration tasks where the
noise characteristics are well understood.
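A minimal frequency-domain sketch of the formula above for the pure-denoising case (H(u, v) = 1), assuming the image and noise power spectral densities are known, is given below; in practice the PSDs usually have to be estimated from the data.

import numpy as np

def wiener_denoise(corrupted, psd_image, psd_noise):
    # Wiener filtering for additive noise only (degradation H = 1). With H = 1 the
    # filter reduces to the per-frequency gain PSD_image / (PSD_image + PSD_noise)
    # applied to the spectrum of the corrupted image.
    G = np.fft.fft2(corrupted)
    gain = psd_image / (psd_image + psd_noise)
    estimate = np.fft.ifft2(gain * G)
    return np.real(estimate)

# corrupted, psd_image, and psd_noise are 2D arrays of the same shape; the PSD
# arrays are indexed by spatial frequency (u, v).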