1. Object Detection using Sliding Window and Region Proposal
Sliding Window Technique:
A fixed-size window slides across the image (horizontally and vertically).
At each location, the sub-region is passed to a classifier (e.g., SVM, CNN) to determine
whether it contains the object.
Drawbacks: high computational cost (many windows per image), slow inference, and poor handling of scale changes unless the scan is repeated over an image pyramid.
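The exhaustive scan above can be sketched in a few lines of NumPy (a minimal illustration; the window size, stride, and classifier call are placeholders):

```python
import numpy as np

def sliding_windows(image, win_size, stride):
    """Yield (x, y, window) for every window position fully inside the image."""
    h, w = image.shape[:2]
    wh, ww = win_size
    for y in range(0, h - wh + 1, stride):
        for x in range(0, w - ww + 1, stride):
            # In a real detector, each window would be passed to a
            # classifier (e.g. SVM or CNN) at this point.
            yield x, y, image[y:y + wh, x:x + ww]

# A 100x100 image scanned with 32x32 windows and stride 16
# produces a 5x5 grid of 25 candidate regions.
img = np.zeros((100, 100), dtype=np.uint8)
windows = list(sliding_windows(img, (32, 32), 16))
```

The rapid growth in window count (multiplied again by every scale and aspect ratio tried) is exactly the computational cost mentioned above.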
Region Proposal Methods:
Instead of exhaustive sliding, these methods propose regions likely to contain objects.
Selective Search: Groups similar regions based on color, texture, size.
Edge Boxes: Generates boxes based on edge information.
These proposals are then passed to CNNs for classification and bounding box regression.
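Because several proposals usually fire on the same object, detectors follow classification with non-maximum suppression (NMS). A minimal NumPy sketch (box format and threshold here are illustrative assumptions):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop
    remaining boxes that overlap it by more than `thresh`."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order):
        i = int(order[0])
        keep.append(i)
        rest = order[1:]
        rest = rest[[iou(boxes[i], boxes[int(j)]) <= thresh for j in rest]]
        order = rest
    return keep

# Two heavily overlapping detections of one object plus a distant one:
# NMS keeps the stronger of the pair and the distant box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
```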
Viola-Jones for Face Detection:
Uses Haar-like features computed with an integral image.
A cascade of classifiers quickly eliminates non-face regions.
Adaboost is used to select the best features.
Deep Learning Models for Face Detection:
MTCNN, RetinaFace, and others use CNNs to detect faces at different scales.
Provide higher accuracy and robustness to pose, illumination, and occlusion variations.
2. Comparison of YOLO, SSD, and Faster R-CNN; Harris and Shi-Tomasi Corner Detection
Feature    | YOLO          | SSD             | Faster R-CNN
-----------|---------------|-----------------|---------------------
Type       | Single-shot   | Single-shot     | Two-stage
Speed      | Very fast     | Fast            | Slower
Accuracy   | Moderate-high | Moderate-high   | High
Pipeline   | Unified CNN   | Multi-scale CNN | RPN + detection head
Harris Corner Detector:
Measures intensity change in all directions.
Uses the second-moment matrix.
Corner response: R = det(M) − k·(trace(M))²
Shi-Tomasi:
Improves Harris by using the minimum eigenvalue of the matrix M.
A point is a good corner if the smallest eigenvalue is above a threshold.
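The two scores can be compared directly on a single window of gradients. A small NumPy sketch (the gradient patches below are toy values):

```python
import numpy as np

def corner_scores(Ix, Iy, k=0.04):
    """Harris response R and the Shi-Tomasi score (min eigenvalue) for one window."""
    # Second-moment (structure) matrix M, summed over the window.
    M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    R = np.linalg.det(M) - k * np.trace(M) ** 2
    lam_min = np.linalg.eigvalsh(M).min()
    return R, lam_min

# Edge-like window: gradient only in x -> one large and one zero eigenvalue.
R_edge, lam_edge = corner_scores(np.ones((3, 3)), np.zeros((3, 3)))

# Corner-like window: strong, independent gradients in both directions.
R_corner, lam_corner = corner_scores(np.array([[1., 0.], [0., 1.]]),
                                     np.array([[0., 1.], [1., 0.]]))
```

Both detectors accept the corner-like window (R > 0, λ_min > 0) and reject the edge (R < 0, λ_min ≈ 0), matching the criteria above.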
3. Hough Transform and Morphological Operations
Hough Transform:
Used to detect lines and shapes (e.g., circles).
Transforms each point to parameter space (e.g., lines in polar form: ρ = x·cosθ + y·sinθ).
Peaks in accumulator space indicate lines.
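The voting scheme can be sketched in NumPy (a resolution of 1 pixel in ρ and 1° in θ is an arbitrary choice for illustration; `cv2.HoughLines` does this in practice):

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    """Vote every edge pixel into a (rho, theta) accumulator array."""
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))            # max possible |rho|
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int32)
    for y, x in zip(*np.nonzero(edges)):
        # rho = x cos(theta) + y sin(theta), one vote per theta bin
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, np.arange(n_theta)] += 1
    return acc, thetas, diag

# A vertical line x = 5: every point votes for (rho = 5, theta = 0),
# producing a 20-vote peak in the accumulator.
edges = np.zeros((20, 20), dtype=np.uint8)
edges[:, 5] = 1
acc, thetas, diag = hough_lines(edges)
```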
Morphological Operations:
Applied to binary images.
Based on structuring elements.
Erosion:
Shrinks white regions.
Removes small noise and separates objects.
Dilation:
Expands white regions.
Fills small holes.
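Both operations reduce to a logical AND/OR over the neighbourhood covered by the structuring element. A NumPy sketch for binary images (a 3×3 square element is assumed):

```python
import numpy as np

SE = np.ones((3, 3), dtype=bool)  # 3x3 square structuring element

def dilate(img, se=SE):
    """A pixel turns on if ANY neighbour under the structuring element is on."""
    h, w = img.shape
    k = se.shape[0] // 2
    p = np.pad(img.astype(bool), k)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(-k, k + 1):
        for dx in range(-k, k + 1):
            if se[dy + k, dx + k]:
                out |= p[k + dy:k + dy + h, k + dx:k + dx + w]
    return out

def erode(img, se=SE):
    """A pixel stays on only if ALL neighbours under the structuring element are on."""
    h, w = img.shape
    k = se.shape[0] // 2
    p = np.pad(img.astype(bool), k)   # pad with background
    out = np.ones((h, w), dtype=bool)
    for dy in range(-k, k + 1):
        for dx in range(-k, k + 1):
            if se[dy + k, dx + k]:
                out &= p[k + dy:k + dy + h, k + dx:k + dx + w]
    return out

# A 3x3 white square: erosion shrinks it to its centre pixel,
# dilation grows it to fill the whole 5x5 image.
square = np.zeros((5, 5), dtype=bool)
square[1:4, 1:4] = True
```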
Sobel Edge Detection:
Uses gradient filters in X and Y directions.
Highlights edges based on intensity change.
Canny Edge Detection:
Steps: Gaussian blur → Gradient → Non-maximum suppression → Hysteresis thresholding.
Produces clean and continuous edges.
4. Image Segmentation Techniques
Thresholding:
Converts grayscale to binary using a threshold value.
Global Thresholding:
Single threshold for the whole image.
Adaptive Thresholding:
Threshold computed locally for different regions.
Useful for uneven lighting.
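A sketch of both variants in NumPy (the block size and offset `c` are illustrative choices, and the sign convention here makes only locally bright pixels foreground; `cv2.threshold` and `cv2.adaptiveThreshold` are the practical tools):

```python
import numpy as np

def global_threshold(img, t):
    """One threshold for the whole image."""
    return (img > t).astype(np.uint8)

def adaptive_threshold(img, block=3, c=10):
    """Compare each pixel against the mean of its block x block neighbourhood
    plus an offset c, so only locally bright pixels become foreground."""
    k = block // 2
    p = np.pad(img.astype(float), k, mode='edge')
    h, w = img.shape
    local_mean = np.zeros((h, w))
    for dy in range(-k, k + 1):
        for dx in range(-k, k + 1):
            local_mean += p[k + dy:k + dy + h, k + dx:k + dx + w]
    local_mean /= block * block
    return (img > local_mean + c).astype(np.uint8)

# A dim image with one bright spot: the spot stands out against its
# local neighbourhood, so both methods isolate it here.
img = np.full((5, 5), 50.0)
img[2, 2] = 200.0
```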
Region-Based Segmentation:
Groups pixels with similar properties.
Region growing starts from a seed and includes neighboring pixels.
Opening:
Erosion followed by dilation.
Removes small objects.
Closing:
Dilation followed by erosion.
Fills small holes.
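Opening and closing are just the two orderings of erosion and dilation. A compact NumPy sketch with a 3×3 structuring element (zero padding at the borders is an implementation choice):

```python
import numpy as np

def _shifts(img):
    """Stack of all nine 3x3-shifted copies of a zero-padded binary image."""
    h, w = img.shape
    p = np.pad(img, 1)
    return np.stack([p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)])

def opening(img):
    """Erosion (all neighbours on) followed by dilation (any neighbour on)."""
    return _shifts(_shifts(img.astype(bool)).all(axis=0)).any(axis=0)

def closing(img):
    """Dilation followed by erosion."""
    return _shifts(_shifts(img.astype(bool)).any(axis=0)).all(axis=0)

# Opening removes an isolated noise pixel but keeps a 3x3 object;
# closing fills a one-pixel hole inside a solid block.
noisy = np.zeros((7, 7), dtype=bool)
noisy[2:5, 2:5] = True   # object
noisy[0, 6] = True       # speck of noise

holed = np.zeros((7, 7), dtype=bool)
holed[1:6, 1:6] = True
holed[3, 3] = False      # hole
```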
5. SIFT, SURF, Viola-Jones, and Panorama Matching
SIFT (Scale-Invariant Feature Transform):
Detects keypoints invariant to scale, rotation.
Steps: Scale-space extrema → Keypoint localization → Orientation → Descriptor.
SURF (Speeded-Up Robust Features):
Faster than SIFT using integral images and box filters.
Less accurate but computationally efficient.
Viola-Jones:
As explained earlier, uses Haar features, Adaboost, and cascade classifiers.
Keypoint Matching in Panorama Creation:
Detect keypoints using SIFT/SURF.
Match descriptors between images.
Estimate transformation (homography).
Warp and blend images to create panorama.
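Step 3 (homography estimation) can be sketched with the Direct Linear Transform. Real pipelines use `cv2.findHomography` with RANSAC to reject bad matches, but the core least-squares step looks like this (the point values below are made up):

```python
import numpy as np

def estimate_homography(src, dst):
    """DLT: find H (3x3, scale-fixed) such that dst ~ H @ src in homogeneous
    coordinates. Needs at least 4 point correspondences."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Four corners translated by (2, 3): the recovered homography
# should be a pure translation.
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(2, 3), (3, 3), (2, 4), (3, 4)]
H = estimate_homography(src, dst)
```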
6. CNN Architectures and Harris Corner Detection
YOLO:
Predicts bounding boxes and class probabilities directly from the image in one pass.
Very fast; used in real-time applications.
Faster R-CNN:
Region Proposal Network (RPN) suggests regions.
These are classified and refined by the detector head.
Harris Corner Detector (Revisited):
Computes gradient matrix for each pixel.
Uses eigenvalues of matrix to detect corners.
7. Fundamentals of Computer Vision
Computer Vision:
Field enabling machines to interpret and understand visual information.
Applications: object detection, facial recognition, autonomous vehicles, etc.
Pixels:
Smallest element of an image.
Each pixel stores color/intensity values.
Resolution:
Number of pixels in width × height.
Higher resolution = more detail.
Image Representation:
Grayscale: one value per pixel.
RGB: three values (Red, Green, Blue).
Stored as 2D or 3D arrays.
Image Formation:
Light from the scene passes through a lens and is projected onto a sensor.
Pinhole model, perspective projection, and lens distortions affect the image.
Brightness: Intensity level (e.g., dark vs. bright image).
Contrast: Difference between darkest and brightest regions.
Hue: Type of color (e.g., red, blue).
Saturation: Intensity or purity of color (gray = low saturation).
✅ 1. Filtering in Image Processing
➤ Definition:
Filtering is the process of modifying or enhancing an image by emphasizing or removing certain
features like noise, edges, or textures.
➤ Types of Filters:
A. Linear Filters:
Apply a linear transformation to pixel values.
Mean Filter (Averaging): Reduces noise by replacing each pixel with the average of its
neighbors.
Gaussian Filter: Applies a weighted average using a Gaussian kernel. Smoothens image while
preserving edges better than mean filtering.
B. Non-linear Filters:
Median Filter: Replaces pixel value with the median of its neighborhood. Very effective in
removing salt-and-pepper noise.
➤ Example:
An image with salt-and-pepper noise can be cleaned using a median filter, which removes outliers
(black or white dots) while preserving edges.
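A brute-force median filter in NumPy (kernel size 3 assumed; `scipy.ndimage.median_filter` or `cv2.medianBlur` would be used in practice):

```python
import numpy as np

def median_filter(img, k=3):
    """Replace each pixel with the median of its k x k neighbourhood."""
    r = k // 2
    p = np.pad(img, r, mode='edge')   # replicate edges so borders are defined
    h, w = img.shape
    out = np.empty_like(img)
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(p[y:y + k, x:x + k])
    return out

# A flat grey image with one "salt" pixel: the median wipes the outlier out
# because a single extreme value never reaches the middle of a sorted window.
noisy = np.full((5, 5), 100, dtype=np.uint8)
noisy[2, 2] = 255
clean = median_filter(noisy)
```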
✅ 2. Convolution in Image Processing
➤ Definition:
Convolution is a mathematical operation used to apply filters to images.
➤ How It Works:
A kernel (filter matrix) is slid across the image.
At each location, the sum of the element-wise product of the kernel and the overlapping
image region is computed.
This value replaces the central pixel.
➤ Mathematical Expression:
G(x, y) = Σ_{i=−k..k} Σ_{j=−k..k} I(x+i, y+j) · K(i, j)
Where:
I is the input image
K is the kernel
G is the output image
➤ Example:
Using a 3×3 sharpening kernel:
[  0  -1   0
  -1   5  -1
   0  -1   0 ]
applied via convolution enhances edges in an image.
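The formula above (which, as written, is cross-correlation; for symmetric kernels like this one it coincides with true convolution) can be implemented directly:

```python
import numpy as np

def filter2d(img, kernel):
    """G(x, y) = sum_{i,j} I(x+i, y+j) * K(i, j), with zero padding so the
    output has the same size as the input."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    p = np.pad(img.astype(float), ((ph, ph), (pw, pw)))
    h, w = img.shape
    out = np.empty((h, w))
    for y in range(h):
        for x in range(w):
            out[y, x] = np.sum(p[y:y + kh, x:x + kw] * kernel)
    return out

SHARPEN = np.array([[0, -1, 0],
                    [-1, 5, -1],
                    [0, -1, 0]], dtype=float)

# The kernel sums to 1, so flat regions are unchanged; a single bright
# pixel is boosted and ringed by negative responses.
flat = np.full((5, 5), 10.0)
spot = np.zeros((5, 5))
spot[2, 2] = 1.0
```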
✅ 3. Edge Detection
Edge detection helps identify object boundaries by detecting changes in intensity.
➤ A. Sobel Edge Detection
➤ How it works:
Applies two 3×3 kernels: one for horizontal (Gx) and one for vertical (Gy) gradients.
Combined gradient magnitude:
G = √(Gx² + Gy²)
➤ Kernels:
Gx = [ -1  0  1        Gy = [ -1 -2 -1
       -2  0  2                0  0  0
       -1  0  1 ]              1  2  1 ]
➤ Example:
Apply Sobel to detect roads in satellite images by emphasizing edges in horizontal and vertical
directions.
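The two kernels above applied directly in NumPy (a vertical step edge responds only through Gx):

```python
import numpy as np

GX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
GY = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)

def sobel_magnitude(img):
    """Correlate with Gx and Gy, then combine: G = sqrt(Gx^2 + Gy^2)."""
    h, w = img.shape
    p = np.pad(img.astype(float), 1)   # zero padding at the borders
    mag = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            win = p[y:y + 3, x:x + 3]
            mag[y, x] = np.hypot(np.sum(win * GX), np.sum(win * GY))
    return mag

# A vertical step edge between columns 2 and 3: the response peaks
# on the two columns flanking the step and is zero in flat regions.
step = np.zeros((5, 6))
step[:, 3:] = 1.0
mag = sobel_magnitude(step)
```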
➤ B. Canny Edge Detection
Canny is a multi-stage edge detection algorithm:
1. Noise Reduction: Gaussian blur
2. Gradient Calculation: Sobel-like operation
3. Non-Maximum Suppression: Thins edges
4. Double Thresholding: Strong and weak edges
5. Edge Tracking by Hysteresis: Connects weak edges to strong ones
➤ Example:
Used in medical imaging (e.g., MRI, X-rays) to detect boundaries of tissues or bones accurately.
✅ 4. Image Transformations
➤ A. Fourier Transform (FT)
➤ Definition:
Transforms an image from spatial domain to frequency domain. Useful to analyze frequency
content.
➤ How it works:
F(u, v) = Σ_x Σ_y f(x, y) · e^(−j2π(ux/M + vy/N))
➤ Use Cases:
Filtering (e.g., low-pass to remove high-frequency noise)
Image compression (JPEG uses DCT, a related concept)
Pattern recognition
➤ Example:
A fingerprint image with periodic noise can be denoised by applying Fourier Transform, masking
high-frequency components, and applying Inverse Fourier Transform.
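That denoising recipe sketched in NumPy (the circular mask radius is an arbitrary choice; a real pipeline would mask the specific noise peaks in the spectrum):

```python
import numpy as np

def lowpass_fft(img, radius):
    """Keep only frequencies within `radius` of the DC term, then invert."""
    F = np.fft.fftshift(np.fft.fft2(img))     # spectrum with DC at the centre
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

# A checkerboard is a pure high-frequency pattern riding on its mean (0.5):
# the low-pass filter removes the pattern and keeps only the mean.
checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)
smooth = lowpass_fft(checker, 1)
```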
➤ B. Hough Transform
➤ Definition:
Used to detect geometric shapes (lines, circles) in images.
➤ For Lines:
A line can be expressed as:
ρ = x·cosθ + y·sinθ
Each edge point votes in the accumulator space for possible lines passing through it.
➤ For Circles:
Circle equation: (x − a)² + (y − b)² = r². Each edge point votes in a 3-D accumulator over the parameters (a, b, r).
➤ Example:
Used in license plate detection or lane detection in autonomous driving by detecting straight lines
on the road.