COMPUTER VISION
TOPIC: SCALE SPACE AND SCALE SELECTION,
SIFT, SURF, HOG, LBP
WHAT IS SCALE SPACE?
IN COMPUTER VISION, SCALE-SPACE IS A MULTI-RESOLUTION
REPRESENTATION OF AN IMAGE CREATED BY APPLYING GAUSSIAN
FILTERS OF INCREASING STANDARD DEVIATIONS TO THE IMAGE,
PRODUCING A SERIES OF BLURRED VERSIONS. SCALE SPACE IS A
FAMILY OF GRADUALLY SMOOTHED IMAGES GENERATED FROM A
SINGLE INPUT IMAGE. IT PROVIDES A ROBUST FRAMEWORK FOR
ANALYZING IMAGE STRUCTURES AND FEATURES ACROSS DIFFERENT
LEVELS OF DETAIL, FROM FINE TO COARSE.
WHY DO WE NEED IT?
• Objects in images appear at different sizes.
• Fixed-scale filters might miss features (too small or too large).
• Scale space ensures that features are detected consistently across varying image sizes.
CONSTRUCTION OF SCALE SPACE
• Mathematical Foundation:
Scale space is generated by convolving the image with a Gaussian kernel:
L(x,y;σ) = G(x,y;σ) ∗ I(x,y)
where G(x,y;σ) is a Gaussian filter with scale σ.
• Process (a code sketch follows this slide):
1. Take an image I(x,y).
2. Convolve with Gaussian filters of different σ (small σ = fine details, large σ = coarse structures).
3. Stack the results → scale-space representation.
• Visualization:
Imagine a 3D plot:
• X, Y = image dimensions
• Z = scale (σ)
The image becomes a “tube” of versions at different blur levels.
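A minimal Python sketch of this construction, assuming OpenCV, an illustrative input file name, and an arbitrary list of σ values:

# Scale-space construction sketch: repeatedly blur one input image with
# Gaussians of increasing sigma and stack the results.
# "input.jpg" and the sigma list are illustrative assumptions.
import cv2

def build_scale_space(image, sigmas):
    """Return a list of progressively Gaussian-blurred versions of `image`."""
    levels = []
    for sigma in sigmas:
        # ksize=(0, 0) lets OpenCV derive the kernel size from sigma.
        levels.append(cv2.GaussianBlur(image, (0, 0), sigmaX=sigma, sigmaY=sigma))
    return levels

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)   # I(x, y)
sigmas = [1.0, 2.0, 4.0, 8.0]                          # fine -> coarse
scale_space = build_scale_space(img, sigmas)           # L(x, y; sigma) levels
print(len(scale_space), "levels, each of shape", scale_space[0].shape)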
MERITS
• Multi-scale feature detection – captures fine to coarse details.
• Robustness to scale changes – detects objects regardless of size or distance.
• Mathematically well-founded – Gaussian smoothing provides stable results.
• Foundation for key algorithms – used in SIFT, SURF, blob/edge detection.
• Noise reduction – higher scales smooth out image noise.
DEMERITS
• Computationally expensive – generating many scales requires high processing.
• Loss of details at higher scales – fine information can disappear.
• Choice of scale parameter (σ) – selecting the optimal σ is non-trivial.
• Storage overhead – maintaining multiple blurred versions can be memory heavy.
• Not rotation or illumination invariant by itself – needs additional methods.
APPLICATIONS
• Feature Detection: Basis for algorithms like SIFT, SURF (keypoint detection at multiple scales).
• Edge & Blob Detection: Edges and regions are more robustly detected at appropriate scales.
• Object Recognition: Recognizes objects irrespective of their distance/size in the image.
• Image Segmentation & Tracking: Helps in separating meaningful structures across scales.
WHAT IS SCALE SELECTION?
• Scale selection is the process of automatically identifying the optimal scale at which image features (edges, corners, blobs, textures) should be detected.
• Since objects and patterns in an image may appear at different sizes, a single scale of analysis may fail.
• Scale selection ensures that features are detected where they are most stable and distinctive, making feature detection scale-invariant.
• It forms a crucial step in modern computer vision techniques such as SIFT, SURF, and blob detection.
PRINCIPLE OF SCALE SELECTION
• Features appear differently depending on the level of smoothing (σ).
• At the correct scale, the feature produces a strong, stable response in the chosen detector.
• Principle: detect local extrema across both space and scale.
Example:
• Fine details → visible at small σ
• Coarse structures → visible at large σ
• Correct scale = maximum filter response.
SCALE SELECTION: CONSTRUCTION PROCESS
1. Start with the input image I(x,y).
2. Build a scale-space representation by convolving I(x,y) with Gaussian kernels of different standard deviations (σ).
3. Detect features (edges, corners, blobs, etc.) at multiple scales.
4. Select the scale where the feature response is maximum and most stable.
MATHEMATICAL FORMULATION:
• SCALE-SPACE REPRESENTATION:
L(x,y;σ) = G(x,y;σ) ∗ I(x,y)
• WHERE I(x,y) = ORIGINAL IMAGE
G(x,y;σ) = GAUSSIAN KERNEL WITH VARIANCE σ²
σ = SCALE PARAMETER
• AUTOMATIC SCALE SELECTION PRINCIPLE:
FIND LOCAL EXTREMA OF NORMALIZED DERIVATIVES IN SCALE-SPACE.
FOR BLOBS, THE NORMALIZED LAPLACIAN IS WIDELY USED:
σ²∇²L(x,y;σ)
FEATURE SCALE = VALUE OF σ WHERE THIS RESPONSE IS MAXIMIZED.
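A minimal Python sketch of this blob scale-selection principle, assuming SciPy and a synthetic Gaussian blob (the blob size and the σ grid are illustrative assumptions, not part of the slides):

# Automatic scale selection sketch: evaluate the scale-normalised Laplacian
# sigma^2 * Laplacian(L(x, y; sigma)) over a range of sigmas and pick the
# sigma where the magnitude of the response peaks.
import numpy as np
from scipy.ndimage import gaussian_laplace

# Synthetic test image: one Gaussian blob (std ~10 px) on a dark background.
h = w = 128
yy, xx = np.mgrid[0:h, 0:w]
img = np.exp(-((xx - 64) ** 2 + (yy - 64) ** 2) / (2 * 10.0 ** 2))

sigmas = np.linspace(2, 20, 37)
responses = []
for sigma in sigmas:
    # gaussian_laplace = Laplacian of the Gaussian-smoothed image (LoG);
    # multiplying by sigma^2 gives the scale-normalised response.
    log = sigma ** 2 * gaussian_laplace(img, sigma=sigma)
    responses.append(abs(log[64, 64]))     # response at the blob centre

best = sigmas[int(np.argmax(responses))]
# For this Gaussian blob the peak is expected near sigma ~ the blob's std (10).
print("selected scale sigma =", round(float(best), 2))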
MERITS
• Provides scale invariance (features detected regardless of object size).
• Ensures robust and stable feature detection.
• Captures both fine and coarse details effectively.
• Forms the foundation of many vision algorithms (SIFT, SURF).
• Reduces ambiguity in multi-scale feature detection.
DEMERITS
• Computationally expensive – scanning across multiple scales.
• Choice of detector matters – a poor filter may give weak results.
• Fine details lost at large scales (over-smoothing).
• Noise sensitivity at small scales.
• Requires storage of multiple scale levels, increasing memory usage.
APPLICATIONS
• Keypoint Detection (SIFT, SURF, ORB): Scale-invariant feature matching.
• Blob Detection: Identifying cells, spots, or objects in images.
• Object Recognition: Detecting objects at varying distances/sizes.
• Image Segmentation: Identifying structures at their natural scale.
• Tracking in Videos: Following objects consistently across scales.
FEATURE DETECTION ALGORITHMS
1. SIFT (SCALE INVARIANT FEATURE TRANSFORM) :
SIFT is a robust algorithm designed to identify and describe local features in images that
are invariant to scale and rotation, and partially invariant to affine transformations and
illumination changes. This means that SIFT can detect the same features in an image even if
the image is resized, rotated, or viewed under different lighting conditions. This property
makes SIFT extremely valuable for tasks that require matching points between different
views of the same scene or object.
KEY STEPS
1. SCALE-SPACE EXTREMA DETECTION
• IDENTIFY KEY POINTS INVARIANT TO SCALE.
• CONSTRUCT A SCALE-SPACE BY PROGRESSIVELY BLURRING THE IMAGE USING GAUSSIAN FILTERS
WITH INCREASING STANDARD DEVIATION.
• COMPUTE DIFFERENCE OF GAUSSIANS (DOG) BY SUBTRACTING ADJACENT GAUSSIAN-BLURRED
IMAGES.
• DETECT LOCAL EXTREMA IN DOG IMAGES AS POTENTIAL KEY POINTS.
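A simplified Python sketch of this step for a single octave (the σ schedule and the contrast threshold are illustrative assumptions; Lowe's full method uses several octaves, sub-pixel refinement and edge rejection):

# DoG extrema detection sketch: build blurred images, subtract adjacent
# levels, then keep points that are extrema among their 26 neighbours in
# the 3x3x3 (scale, y, x) neighbourhood. "input.jpg" is an assumed file.
import cv2
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0

k = 2 ** 0.5                                    # scale step between levels
sigmas = [1.6 * k ** i for i in range(6)]
gaussians = [cv2.GaussianBlur(img, (0, 0), s) for s in sigmas]

# Difference of Gaussians: subtract adjacent Gaussian-blurred images.
dog = np.stack([gaussians[i + 1] - gaussians[i] for i in range(len(sigmas) - 1)])

# Candidate keypoints: local maxima/minima across space AND scale,
# with magnitude above an assumed contrast threshold of 0.03.
is_max = (dog == maximum_filter(dog, size=3)) & (dog > 0.03)
is_min = (dog == minimum_filter(dog, size=3)) & (dog < -0.03)
scale_idx, ys, xs = np.nonzero(is_max | is_min)
print(len(xs), "candidate keypoints")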
2. KEYPOINT LOCALIZATION
• REFINE THE POSITIONS OF POTENTIAL KEY POINTS FOR HIGHER ACCURACY.
• FIT A QUADRATIC FUNCTION TO LOCAL SAMPLE POINTS TO DETERMINE EXACT LOCATION AND SCALE.
• DISCARD KEY POINTS WITH LOW CONTRAST OR THOSE POORLY LOCALIZED ALONG EDGES TO IMPROVE
ROBUSTNESS.
3. ORIENTATION ASSIGNMENT
• ASSIGN ONE OR MORE ORIENTATIONS TO EACH KEY POINT BASED ON LOCAL IMAGE GRADIENT DIRECTIONS.
• COMPUTE GRADIENT DIRECTIONS AND MAGNITUDES WITHIN A LOCAL NEIGHBORHOOD.
• CREATE AN ORIENTATION HISTOGRAM AND ASSIGN THE PEAK(S) AS THE KEY POINT ORIENTATION(S).
• ENSURES DESCRIPTORS ARE ROTATION-INVARIANT.
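A simplified Python sketch of orientation assignment for one assumed keypoint (the keypoint location and window size are illustrative; Lowe's method additionally applies a Gaussian weighting and keeps secondary peaks above 80% of the maximum):

# Orientation assignment sketch: 36-bin histogram of gradient directions
# in a local window, weighted by gradient magnitude; the peak bin gives
# the keypoint orientation. "input.jpg" and (y0, x0) are assumptions.
import cv2
import numpy as np

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Image gradients (Sobel) -> magnitude and direction at every pixel.
gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)
gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)
mag = np.hypot(gx, gy)
ang = np.rad2deg(np.arctan2(gy, gx)) % 360      # direction in 0..360 degrees

y0, x0, radius = 120, 200, 8                    # assumed keypoint and window
m = mag[y0 - radius:y0 + radius, x0 - radius:x0 + radius]
a = ang[y0 - radius:y0 + radius, x0 - radius:x0 + radius]

# 36 bins of 10 degrees each, weighted by gradient magnitude.
hist, _ = np.histogram(a, bins=36, range=(0, 360), weights=m)
dominant = int(np.argmax(hist)) * 10
print("dominant orientation:", dominant, "degrees")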
4. KEYPOINT DESCRIPTOR
• CREATE A DESCRIPTOR VECTOR FOR EACH KEY POINT REPRESENTING THE LOCAL IMAGE REGION.
• SAMPLE GRADIENT MAGNITUDES AND ORIENTATIONS WITHIN A LOCAL NEIGHBORHOOD
AROUND THE KEY POINT.
• CONSTRUCT A 128-DIMENSIONAL VECTOR FROM THESE SAMPLES.
• NORMALIZE THE DESCRIPTOR TO REDUCE EFFECTS OF ILLUMINATION CHANGES.
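The whole pipeline above is implemented in OpenCV; a minimal usage sketch (the file name is an assumption, and cv2.SIFT_create needs OpenCV 4.4 or newer, or opencv-contrib-python for older releases):

# End-to-end SIFT with OpenCV: detect keypoints and compute their
# 128-dimensional descriptors in one call.
import cv2

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

print(len(keypoints), "keypoints")
print(descriptors.shape)          # (num_keypoints, 128) -> 128-D descriptors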
MERITS
• SCALE AND ROTATION INVARIANCE: SIFT FEATURES ARE ROBUST TO CHANGES IN SCALE AND ROTATION, MAKING THEM SUITABLE FOR A WIDE RANGE OF APPLICATIONS.
• DISTINCTIVE DESCRIPTORS: THE 128-DIMENSIONAL DESCRIPTORS ARE HIGHLY DISTINCTIVE, ALLOWING FOR ACCURATE MATCHING OF FEATURES BETWEEN IMAGES.
• ROBUSTNESS TO NOISE AND ILLUMINATION CHANGES: SIFT FEATURES ARE RELATIVELY INSENSITIVE TO NOISE AND CHANGES IN ILLUMINATION, ENHANCING THEIR RELIABILITY.
DEMERITS
• COMPUTATIONAL COMPLEXITY: THE SIFT ALGORITHM IS COMPUTATIONALLY INTENSIVE, MAKING IT SLOWER COMPARED TO SOME OTHER FEATURE DETECTION METHODS.
• PATENT ISSUES: SIFT WAS PATENTED, WHICH LIMITED ITS USE IN COMMERCIAL APPLICATIONS UNTIL THE PATENT EXPIRED.
APPLICATIONS
• OBJECT RECOGNITION – IDENTIFY OBJECTS REGARDLESS OF SCALE,
ORIENTATION, OR VIEWPOINT.
• IMAGE STITCHING – MATCH POINTS TO ALIGN AND BLEND IMAGES INTO
PANORAMAS.
• 3D RECONSTRUCTION – TRIANGULATE MATCHING POINTS TO BUILD 3D
SCENES.
• ROBOT NAVIGATION – DETECT FEATURES FOR LOCALIZATION AND MAPPING.
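A minimal Python sketch of how such applications typically use SIFT: match descriptors between two images with a brute-force matcher and Lowe's ratio test (the file names and the 0.75 ratio are illustrative assumptions):

# SIFT matching sketch: detect/describe keypoints in two images, find the
# 2 nearest neighbours of each descriptor, and keep only clearly better
# matches, which can then feed homography estimation for stitching.
import cv2

img1 = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Brute-force matching with k=2 nearest neighbours, then the ratio test
# keeps only matches that are clearly better than the second-best match.
bf = cv2.BFMatcher(cv2.NORM_L2)
matches = bf.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(len(good), "good matches (usable for cv2.findHomography in stitching)")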