Image Classification & ML Challenges

The document discusses several key topics in machine learning and computer vision, including the six stages of machine learning model development, differences between structured and unstructured data classification, challenges of image classification, and common computer vision tasks such as classification, detection, localization, and prediction. Classification involves categorizing images into predefined classes and is used for tasks like object recognition, organization and retrieval of visual data, and automation. Detection identifies specific objects within images, localization provides more precise spatial information, and prediction estimates future attributes. These tasks are important for applications involving road signs and traffic management.

17/07/2023, 13:26 notebook_1_data_preparation_and_eda_and_data_augmentation.ipynb - Colaboratory

Machine Learning Life Cycle

The Six Stages of ML Model Development


Machine learning (ML) model development is a structured process that involves several stages
to ensure the creation of accurate and reliable models. These stages guide the progression from
acquiring and exploring data to deploying the model in production. Let's take a closer look at the
six stages of ML model development:

Classification Models (Structured VS Unstructured Data)


We are used to classifying tabular data, i.e., structured data. But classification models
for tabular data and image data differ in terms of data representation, feature engineering, model
architecture, and evaluation techniques. Here's a breakdown of the key differences:

https://colab.research.google.com/drive/18zMbkgy3VE-o43WQasB9iEXrAIves8St#scrollTo=a4gjMwn9JDcS&printMode=true 1/27

Image Classification Challenges


Image classification comes with several types of challenges:

Variability: Images can vary greatly in terms of lighting conditions, scale, orientation,
occlusion, and background clutter. These variations make it difficult to generalize and
classify images accurately.

For more read this article

Overfitting: Overfitting occurs when a model learns the training data too well and fails to
generalize to unseen images. This challenge requires techniques like data augmentation,
regularization, and model selection to mitigate.

For more read this article

Class imbalance: In real-world datasets, certain classes may have a significantly larger
number of samples compared to others. This class imbalance can lead to biased models
that perform poorly on underrepresented classes.

For more read these articles

article1

article2

Computational requirements: Image classification often requires substantial
computational resources, particularly when working with large datasets or complex deep
learning models. Access to powerful hardware or cloud computing may be necessary.

Interpretability: Deep learning models used for image classification, such as convolutional
neural networks (CNNs), can be challenging to interpret. Understanding why a model
makes specific predictions is an ongoing research area.
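To make the class-imbalance challenge concrete, one common mitigation is to weight each class inversely to its frequency when computing the loss. The sketch below is illustrative only: the label names and counts are made up, and `inverse_frequency_weights` is a hypothetical helper, not part of the notebook. It uses the same heuristic as scikit-learn's `class_weight="balanced"` option.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Hypothetical, imbalanced label list (not the notebook's data)
labels = ["stop"] * 80 + ["yield"] * 15 + ["school_zone"] * 5
weights = inverse_frequency_weights(labels)
print(weights)
```

Rare classes receive larger weights, so mistakes on them contribute more to a weighted loss.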

Common Tasks in Computer Vision


Computer Vision (CV) encompasses a wide range of tasks that involve analyzing and
understanding visual data. Several common tasks in CV include classification, detection,
localization, prediction, segmentation, generation, keypoint detection, and matching. Let's
explore each of these tasks in detail:

Classification: Classification in CV involves categorizing images or objects into predefined
classes or categories. The goal is to assign a label or class to an input image accurately.
This task is often performed using machine learning or deep learning models trained on
labeled datasets.

Classification finds applications in various domains, such as object recognition, image
categorization, and scene understanding.

Organization and retrieval: Image classification enables
efficient organization and retrieval of road signs or traffic-related visual data. When
road sign images are classified into different categories or classes, it becomes easier
to search and navigate through a large database of traffic signs. For example, in a
traffic management system, classifying road sign images into categories like "stop
signs," "speed limit signs," or "pedestrian crossing signs" allows traffic engineers and
authorities to quickly locate specific types of signs.

By organizing road sign images based on their categories, it becomes easier to
manage and index the data, facilitating quick retrieval of relevant signs for various
purposes. For instance, when planning road construction or maintenance, traffic
engineers can easily access all the relevant images of "roadwork ahead" signs.
Similarly, when analyzing traffic patterns or studying driver behavior, researchers can
retrieve images of specific types of signs, such as "yield" or "school zone" signs.

Object recognition: Image classification is fundamental to object recognition, which
involves identifying and distinguishing objects or patterns of interest within images.
By training models to recognize specific objects or classes, image classification
helps in tasks like identifying common objects, detecting anomalies, or identifying
specific features within images. Object recognition finds applications in various
domains, including self-driving cars, where the system needs to identify and classify
objects on the road, or in medical imaging, where the system needs to identify
specific structures or anomalies within medical scans.

Automation: Image classification plays a crucial role in automating tasks across
different industries.

In autonomous driving, image classification is used to identify and classify
objects such as pedestrians, vehicles, traffic signs, and road markings, enabling
the vehicle to make informed decisions and navigate safely.
In medical diagnosis, image classification helps in detecting diseases,
identifying abnormalities, and assisting doctors in making accurate diagnoses
from medical images such as X-rays, MRIs, or CT scans.
In quality control applications, image classification can be used to identify
defective products or detect anomalies in manufacturing processes.
In security surveillance, image classification can help in identifying suspicious
activities or objects in real-time video feeds.

Detection, Localization, and Prediction: In the context of road signs and traffic-related
scenarios, detection, localization, and prediction tasks play crucial roles in traffic
management, driver assistance systems, and overall road safety. Let's explore an example:

Detection: Object detection can be applied to detect and identify specific road signs
within an image or a video stream. For instance, a computer vision system equipped
with object detection algorithms can analyze a live video feed from a traffic camera
and identify various road signs such as stop signs, yield signs, or speed limit signs.
The system would draw bounding boxes around each detected sign, indicating their
presence and location within the scene.

Localization: Localization goes a step further by accurately determining the precise
spatial coordinates or boundaries of the road signs within an image or video. It
involves providing more detailed information about the position and shape of the
detected signs. In a traffic monitoring system, localization can help identify the exact
location and size of a speed limit sign, allowing authorities to assess its visibility and
placement effectiveness.

Prediction: Prediction tasks in the context of road signs and traffic-related scenarios
involve estimating future attributes or behaviors associated with the detected signs.
For example, an advanced driver assistance system (ADAS) can analyze a sequence
of video frames and predict the future position or behavior of a pedestrian crossing
sign. This prediction can assist the system in determining the appropriate action,
such as alerting the driver or adjusting the vehicle's speed, to ensure safety when
approaching the sign.

Combining detection, localization, and prediction tasks in road sign and traffic-related
applications enables advanced systems to accurately identify, locate, and anticipate the
behavior of various road signs.
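As a rough illustration of the prediction task, the simplest way to anticipate where a detected sign will appear in the next frame is to extrapolate its bounding-box center under a constant-velocity assumption. This is a minimal sketch, not the approach of any particular ADAS: `predict_next_center` and the sample coordinates are hypothetical.

```python
def predict_next_center(centers):
    """Extrapolate the next (x, y) center from the last two observed centers,
    assuming constant velocity between consecutive frames."""
    if len(centers) < 2:
        raise ValueError("need at least two observations")
    (x0, y0), (x1, y1) = centers[-2], centers[-1]
    return (2 * x1 - x0, 2 * y1 - y0)

# Hypothetical box centers of one detected sign over three frames
track = [(100.0, 50.0), (110.0, 52.0), (120.0, 54.0)]
next_center = predict_next_center(track)
print(next_center)
```

Production systems typically replace this with a filter or a learned motion model, but the input and output have the same shape: past detections in, an expected future position out.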

For more read these articles:

article1

article2

article3

Segmentation: Image segmentation involves dividing an image into meaningful and
coherent regions or segments based on similarity criteria. The goal is to assign a label or
category to each pixel or region in the image. Segmentation can be binary (dividing the
image into foreground and background) or multi-class (assigning different labels to distinct
objects or regions). Segmentation is used in various applications, including medical image
analysis, autonomous driving, and object tracking. There are several types of image
segmentation techniques, each with its own approach and characteristics:

Thresholding: Thresholding is a simple and widely used segmentation technique. It
involves selecting a threshold value and classifying each pixel in the image as
foreground or background based on its intensity or color. Pixels with intensity or color
values above the threshold are considered foreground, while those below the
threshold are considered background.

Thresholding for Intensity-based Segmentation: In road sign images,
thresholding can be used to separate the sign's foreground (the actual sign)
from the background (the surrounding environment). This is typically done by
selecting a threshold value based on the intensity of the pixels in the image.
Pixels with intensity values above the threshold are classified as the sign's
foreground, while those below the threshold are considered part of the
background.

For example, in a speed limit sign image, thresholding can be applied to
separate the white text and symbol on the sign from the rest of the image. By
setting an appropriate intensity threshold, the algorithm can effectively
distinguish between the foreground and the background, allowing subsequent
analysis and processing to focus on the sign itself.

Thresholding for Color-based Segmentation: In certain road sign and
traffic-related scenarios, color information is crucial for segmentation. For instance,
traffic lights have specific colors (e.g., red, green, yellow) that need to be
identified for effective segmentation. Thresholding can be used to separate the
different color regions of a traffic light, enabling the system to identify the state
of the light (e.g., red or green) accurately.

By setting appropriate color thresholds, pixels with color values falling within
the specified range are classified as the desired color region, while those
outside the range are considered part of the background. This allows for the
extraction of specific color regions within the image, facilitating subsequent
analysis and decision-making in applications such as traffic light detection and
control.
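Both thresholding variants described above can be sketched with plain NumPy on tiny synthetic arrays. The threshold of 128 and the RGB bounds for "red" are assumed values chosen for this toy example, not values from the notebook. In OpenCV the equivalent operations are `cv2.threshold` and `cv2.inRange`, and color ranges are often defined in HSV rather than RGB.

```python
import numpy as np

# Synthetic 4x4 grayscale image: a bright 2x2 "sign" on a dark background
gray = np.full((4, 4), 30, dtype=np.uint8)
gray[1:3, 1:3] = 220

# Intensity thresholding: pixels above the threshold become foreground
threshold = 128  # assumed value; in practice it is tuned or chosen automatically
foreground = gray > threshold

# Color thresholding: keep pixels whose RGB values fall inside a range
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[0, 0] = (200, 20, 20)   # reddish pixel
rgb[1, 1] = (20, 200, 20)   # greenish pixel
lower = np.array([150, 0, 0])    # assumed lower bound for "red"
upper = np.array([255, 80, 80])  # assumed upper bound for "red"
red_mask = np.all((rgb >= lower) & (rgb <= upper), axis=-1)

print(foreground.astype(int))
print(red_mask)
```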

Edge-based segmentation: Edge-based segmentation focuses on identifying and
localizing the boundaries or edges of objects within an image. In road sign and
traffic-related scenarios, edge detection can be employed to extract the edges of signs,
lane markings, or other important features. This involves detecting abrupt changes in
intensity or color, which typically indicate object boundaries.

Techniques such as the Canny edge detector or gradient-based methods can be used
to identify edges and separate objects based on the detected edges.

Canny Edge Detector: The Canny edge detector is a popular algorithm used for
edge detection. It smooths the image with a Gaussian filter, computes the gradient
magnitude and direction, thins the response with non-maximum suppression, and
keeps edges using two hysteresis thresholds. In the context of road sign
segmentation, the Canny edge detector can help separate the sign from the
surrounding background by detecting the edges of the sign.

Gradient-based Methods: Gradient-based methods, such as the Sobel operator
or the Laplacian of Gaussian (LoG), can also be used for edge detection. These
methods analyze the gradient or second derivative of the image to locate areas
of rapid intensity changes. By identifying these edges, road signs can be
effectively separated from the background.
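The gradient idea behind these detectors can be shown with a hand-written Sobel filter on a synthetic step edge. This is a sketch for illustration: `sobel_magnitude` is a hypothetical helper that skips border pixels, and the relative threshold of 0.5 is an assumption. With OpenCV, `cv2.Sobel` and `cv2.Canny` would be the usual choices.

```python
import numpy as np

def sobel_magnitude(gray):
    """Gradient magnitude from 3x3 Sobel kernels (valid region only, no padding)."""
    g = gray.astype(np.float64)
    # Horizontal derivative: right columns minus left columns, weighted 1-2-1 vertically
    gx = (g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:]) - (g[:-2, :-2] + 2 * g[1:-1, :-2] + g[2:, :-2])
    # Vertical derivative: bottom rows minus top rows, weighted 1-2-1 horizontally
    gy = (g[2:, :-2] + 2 * g[2:, 1:-1] + g[2:, 2:]) - (g[:-2, :-2] + 2 * g[:-2, 1:-1] + g[:-2, 2:])
    return np.hypot(gx, gy)

# Synthetic image with a vertical step edge between columns 2 and 3
image = np.zeros((5, 6), dtype=np.uint8)
image[:, 3:] = 255
magnitude = sobel_magnitude(image)
edges = magnitude > 0.5 * magnitude.max()   # assumed relative threshold
print(edges.astype(int))
```

The output is two pixels smaller in each dimension than the input because border pixels have no complete 3x3 neighborhood.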

Region-based segmentation: Region-based segmentation aims to group pixels into
meaningful regions based on their similarity in terms of intensity, color, texture, or
other features. It involves partitioning the image into homogeneous regions that have
similar characteristics.

Popular algorithms for region-based segmentation include the Watershed algorithm,
mean-shift clustering, and graph-based segmentation.

Grouping Pixels into Regions: Region-based segmentation aims to partition an
image into homogeneous regions that have similar characteristics, such as
intensity, color, texture, or other features. In the context of road sign and
traffic-related applications, region-based segmentation can be used to group pixels
belonging to the same sign or traffic element based on their similarities.

Watershed Algorithm: The Watershed algorithm is a popular region-based
segmentation technique that operates by treating the image as a topographic
map. It identifies regional minima and gradually fills basins, separating adjacent
regions. In the context of road signs, the Watershed algorithm can be used to
segment the sign from the background by identifying the regions corresponding
to the sign's shape and color.

Mean-Shift Clustering: Mean-shift clustering is another region-based
segmentation algorithm that iteratively assigns pixels to clusters based on their
similarity. It works by shifting each pixel's feature vector (for example, its color
and position) toward the mode (densest point) of its local neighborhood until
convergence; pixels that converge to the same mode form one region. In road sign
and traffic-related scenarios, mean-shift clustering can be applied to group pixels
with similar color or texture characteristics, effectively separating the sign or
element from the background.

Graph-based Segmentation: Graph-based segmentation approaches represent
the image as a graph, where nodes correspond to pixels, and edges represent
the similarity between pixels. By applying graph algorithms such as minimum
cuts or normalized cuts, the image can be segmented into meaningful regions.
In the context of road signs, graph-based segmentation can be utilized to
identify regions corresponding to the sign's shape or color, enabling effective
separation from the surrounding background.
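Of the region-based methods above, mean shift is the easiest to sketch in a few lines. The example below runs a flat-kernel mean shift on one-dimensional pixel intensities only; real image segmentation would use color and position features. `mean_shift_1d`, the bandwidth of 20, and the sample intensities are all assumptions made for illustration.

```python
import numpy as np

def mean_shift_1d(values, bandwidth, n_iter=50):
    """Flat-kernel mean shift: move every point to the mean of the original
    samples lying within `bandwidth` of it, and repeat until it stops moving."""
    samples = np.asarray(values, dtype=np.float64)
    points = samples.copy()
    for _ in range(n_iter):
        # Boolean matrix: which fixed samples are close to each current position
        near = np.abs(points[:, None] - samples[None, :]) <= bandwidth
        shifted = (near * samples).sum(axis=1) / near.sum(axis=1)
        if np.allclose(shifted, points):
            break
        points = shifted
    return points

# Hypothetical pixel intensities: dark background and a bright sign
intensities = [10, 12, 14, 200, 205, 210]
modes = mean_shift_1d(intensities, bandwidth=20)
labels = (modes > 100).astype(int)   # two modes, so a single cut separates them
print(modes, labels)
```

Pixels that end at the same mode belong to the same region, so no cluster count has to be chosen in advance; the bandwidth controls how coarse the regions are.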

Contour-based segmentation: Contour-based segmentation focuses on extracting
object boundaries or contours from an image. It involves detecting and connecting
the boundaries of objects, which can be represented as a series of connected curves
or as closed curves enclosing the objects.

Extracting Object Boundaries: In road sign and traffic-related scenarios,
contour-based segmentation can be used to extract the boundaries of road signs,
lane markings, or other important elements. The boundaries can be represented as
a series of connected curves or as closed curves enclosing the objects of interest.

Active Contours (Snakes) Model: The active contours model, also known as
snakes, is a popular algorithm used for contour-based segmentation. It involves
iteratively deforming an initial contour to fit the boundaries of objects in the
image. By minimizing an energy function that combines data and smoothness
terms, the contour adapts to the object's shape. In the context of road signs, the
active contours model can be used to accurately delineate the sign's
boundaries.

Chan-Vese Model: The Chan-Vese model is another contour-based
segmentation algorithm that aims to separate objects from the background by
minimizing an energy functional. It utilizes the region-based properties of
objects, such as intensity or color homogeneity, to determine the optimal
contours. In road sign and traffic-related applications, the Chan-Vese model can
be applied to segment the sign by finding the contours that separate it from the
surrounding background.
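Active contours and Chan-Vese are too involved for a short listing, but the basic notion of an object boundary can be shown directly: given a binary mask, the contour is the set of foreground pixels that touch the background. The sketch below is a simplification under that definition; `boundary_pixels` is a hypothetical helper. OpenCV's `cv2.findContours` returns ordered contour points for the same kind of input.

```python
import numpy as np

def boundary_pixels(mask):
    """Mark foreground pixels that touch the background through a 4-neighbour."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    # A pixel is interior when all four of its direct neighbours are foreground
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    return mask & ~interior

# Synthetic 5x5 mask containing a filled 3x3 square
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True
contour = boundary_pixels(mask)
print(contour.astype(int))
```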

Semantic segmentation:

Semantic segmentation assigns a semantic label to each pixel in an image, aiming to
categorize pixels into meaningful classes or objects. It goes beyond simple boundary
detection and aims to provide a detailed understanding of the image content. Deep
learning approaches, such as convolutional neural networks (CNNs) and their
variants (e.g., U-Net, Fully Convolutional Networks), have achieved remarkable
success in semantic segmentation tasks.

Example: In road sign and traffic-related scenarios, semantic segmentation can be
used to categorize each pixel into classes such as road signs, pedestrians, vehicles,
road markings, or other relevant objects or regions.

Instance segmentation: Instance segmentation involves not only segmenting objects
but also distinguishing between different instances of the same object class. It aims
to identify and differentiate individual object instances within an image. Instance
segmentation techniques often combine object detection and semantic
segmentation methods to achieve precise and accurate segmentation results.

Example: In road sign and traffic-related scenarios, this involves segmenting and
labeling each road sign, pedestrian, vehicle, or other relevant objects as separate
instances. For example, if there are multiple stop signs in an image, instance
segmentation techniques can distinguish between them and assign unique labels to
each instance.

Panoptic segmentation: Panoptic segmentation is a recent advancement that unifies
semantic and instance segmentation. It assigns every pixel in the image a class label
and, for countable objects, an instance identifier, covering both things (object
instances) and stuff (amorphous regions). Panoptic segmentation provides a
comprehensive understanding of the scene by combining instance-level and
semantic-level segmentation.

Example: It can be applied in urban planning. By segmenting an urban scene into both
objects (e.g., cars, buildings, trees) and amorphous regions (e.g., roads, sidewalks,
parks), planners can obtain a comprehensive understanding of the city layout, identify
potential areas for development, and analyze the distribution of different urban
elements.
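The difference between a semantic map and an instance map can be illustrated with a toy label grid. The sketch below derives instance numbers from a semantic map by connected-component labelling. This is only an assumption-laden illustration: it separates objects that do not touch, whereas real instance segmentation models predict instances directly. `label_instances` and the class numbering are hypothetical.

```python
from collections import deque
import numpy as np

def label_instances(semantic, class_id):
    """Give each 4-connected blob of `class_id` its own instance number (1, 2, ...)."""
    instances = np.zeros(semantic.shape, dtype=int)
    next_id = 0
    rows, cols = semantic.shape
    for r in range(rows):
        for c in range(cols):
            if semantic[r, c] != class_id or instances[r, c]:
                continue
            next_id += 1
            instances[r, c] = next_id
            queue = deque([(r, c)])
            while queue:                      # breadth-first flood fill
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and semantic[ny, nx] == class_id
                            and not instances[ny, nx]):
                        instances[ny, nx] = next_id
                        queue.append((ny, nx))
    return instances

# Hypothetical semantic map: 0 = road ("stuff"), 1 = stop sign ("thing")
semantic = np.array([[1, 1, 0, 0, 1],
                     [1, 1, 0, 0, 1],
                     [0, 0, 0, 0, 0]])
instances = label_instances(semantic, class_id=1)
print(instances)
```

The semantic map says only "stop sign" for both blobs; the instance map tells the two signs apart. A panoptic result carries both pieces of information for every pixel.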

For more read this article

Generation: Image generation refers to the task of creating new images based on learned
patterns and characteristics from existing data. Generative models, such as Generative
Adversarial Networks (GANs) or Variational Autoencoders (VAEs), are used to generate
new images that resemble the training data. Image generation has applications in image
synthesis, data augmentation, and creative content generation.

For more read these articles

article1

article2

article3

Keypoint Detection and Matching: Keypoint detection involves identifying specific points of
interest in an image, such as corners, edges, or distinctive regions. Keypoints serve as
landmarks or reference points for further analysis or tracking. Keypoint matching focuses
on finding correspondences between keypoints in different images, enabling tasks like
image alignment, object tracking, or 3D reconstruction. These tasks are commonly used in
applications like augmented reality, image stitching, and image registration.
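Keypoint matching usually comes down to comparing descriptor vectors. The sketch below pairs each descriptor in one image with its nearest neighbor in the other and keeps the pair only if it is clearly better than the runner-up (a ratio test). The two-dimensional descriptors, the 0.75 ratio, and `match_descriptors` are assumptions for illustration; real descriptors such as SIFT or ORB are much longer.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Return (i, j) pairs where desc_b[j] is the nearest neighbour of desc_a[i]
    and is clearly closer than the second-nearest one (ratio test).
    Requires at least two descriptors in desc_b."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

# Hypothetical 2-D descriptors for keypoints in two images
desc_a = np.array([[0.0, 0.0], [5.0, 5.0]])
desc_b = np.array([[5.1, 5.0], [0.1, 0.0], [9.0, 0.0]])
matches = match_descriptors(desc_a, desc_b)
print(matches)
```

The resulting index pairs are what alignment, stitching, or 3D reconstruction steps consume.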

For more read this article

Problem Statement

DriveViz, a deep research and AI company specializing in fleet management and transportation
intelligence, aims to expand its product portfolio by developing Advanced Driver Assistance
Systems (ADAS) and establish its brand in the connected car market. The company's primary
objective is to enhance driver safety and reduce the mortality rate caused by road accidents.

After extensive research, DriveViz has identified a key problem that needs to be addressed within
the ADAS program. The problem revolves around the accurate classification of road signs under
various weather conditions. DriveViz believes that solving this problem will play a crucial role in
achieving their goal of improving driver safety and reducing the occurrence of road accidents.

As a valued contributor to DriveViz's ADAS development initiative, you have been presented with
an opportunity to contribute your expertise and help tackle the challenge of developing a robust
road sign classification system for improved driver assistance systems.

Code Setup & Dataset

! pip install opencv-python albumentations torch torchvision gdown --quiet
! gdown --fuzzy "https://drive.google.com/file/d/1OpowBroHNYx1hx6rXOIfwL7k9VLcTNzf/
! unzip -q -o notebook1_roadsigndataset.zip

Downloading...
From: https://drive.google.com/uc?id=1OpowBroHNYx1hx6rXOIfwL7k9VLcTNzf
To: /content/notebook1_roadsigndataset.zip
100% 16.3M/16.3M [00:00<00:00, 88.9MB/s]

Exploratory Data Analysis (EDA)

Task 1 : Plotting the histogram of image widths and heights

import matplotlib.pyplot as plt
import cv2
import os

def plot_image_histograms(image_folder):

    """
    Plot the histogram of image widths and heights for the images in the specified folder.

    Parameters:
        image_folder (str): Path to the folder containing the images.

    Returns:
        None
    """

    widths = []
    heights = []

    # Iterate over the images in the folder
    for filename in os.listdir(image_folder):
        if filename.endswith((".jpg", ".png")):
            image_path = os.path.join(image_folder, filename)
            image = cv2.imread(image_path)

            # cv2.imread returns None for files it cannot read; skip them
            if image is None:
                continue

            # Extract the width and height of the image
            height, width = image.shape[:2]
            widths.append(width)
            heights.append(height)

    # Create subplots with 1 row and 3 columns
    fig, axs = plt.subplots(1, 3, figsize=(18, 5))

    # Plot the histogram of image widths
    axs[0].hist(widths, bins=30, color='blue', alpha=0.7)
    axs[0].set_xlabel("Width")
    axs[0].set_ylabel("Frequency")
    axs[0].set_title("Histogram of Image Widths")

    # Plot the histogram of image heights
    axs[1].hist(heights, bins=30, color='green', alpha=0.7)
    axs[1].set_xlabel("Height")
    axs[1].set_ylabel("Frequency")
    axs[1].set_title("Histogram of Image Heights")

    # Create a scatter plot of image width vs. height
    axs[2].scatter(widths, heights, color='red', alpha=0.5)
    axs[2].set_xlabel("Width")
    axs[2].set_ylabel("Height")
    axs[2].set_title("Scatter Plot of Image Width vs. Height")

    # Adjust spacing between subplots
    plt.tight_layout()

    # Show the plot
    plt.show()

# Specify the folder containing the images
image_folder = "/content/roadsigndataset/train"

# Call the function to plot the histograms
plot_image_histograms(image_folder)

Unexpected variations in image dimensions can indicate data corruption, incomplete data,
or errors during data collection or preprocessing.

Preprocessing techniques like resizing, cropping, or normalizing often require the images to
have consistent dimensions. By checking the height and width, we can determine the
necessary preprocessing steps to bring all images to a uniform size or aspect ratio,
enabling fair comparisons, accurate analysis, and efficient model training.

Knowing the image dimensions helps in optimizing memory allocation, batch sizes, and
other resource-related considerations to ensure efficient processing and prevent
memory overflow or performance issues.

In many computer vision tasks, images serve as input to deep learning models or other
machine learning algorithms. These models often have specific input shape requirements.
By checking the image dimensions, we can ensure that the images align with the expected
input dimensions of the models we intend to use.
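Once the width and height distribution is known, a typical next step is to bring every image to one input size without distorting it. The sketch below computes the resized dimensions and the padding needed to fit an image into a square while keeping its aspect ratio. `letterbox_size` is a hypothetical helper and the 224-pixel target is an assumed model input size, not one stated in the notebook.

```python
def letterbox_size(width, height, target=224):
    """Scale (width, height) to fit inside a target x target square while
    preserving aspect ratio; return the new size and the padding needed."""
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    return (new_w, new_h), (target - new_w, target - new_h)

# Hypothetical image sizes
wide = letterbox_size(640, 480)
square = letterbox_size(300, 300)
print(wide, square)
```

The returned size would be passed to a resize call such as `cv2.resize`, and the padding split between the two sides of the shorter dimension.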

Task 2 : Find the distribution of quality of images

import os
import cv2
import numpy as np
import matplotlib.pyplot as plt

To find the distribution of the quality of images, we can use various image quality metrics such
as Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index
(SSIM), or any other suitable quality metric.

Peak Signal-to-Noise Ratio (PSNR)

Peak Signal-to-Noise Ratio (PSNR) is a widely used metric in image and video processing to
measure the quality of a reconstructed or compressed image/video compared to the original,
reference image/video. It provides an objective measure of the amount of distortion or loss
introduced during the compression or reconstruction process.

PSNR is calculated based on the mean squared error (MSE) between the original and
reconstructed images/videos. The higher the PSNR value, the closer the reconstructed
image/video is to the original, indicating better quality. It is expressed in decibels (dB).
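In formula form, PSNR = 10 * log10(MAX^2 / MSE), which is the same as 20 * log10(MAX / sqrt(MSE)), where MAX is the largest possible pixel value (255 for 8-bit images). The sketch below applies that definition to a reference image and a distorted copy. The `psnr` helper and the synthetic images are assumptions made for illustration and are not part of the notebook.

```python
import numpy as np

def psnr(reference, test, max_value=255.0):
    """PSNR in dB between a reference image and a test image of the same shape."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# Synthetic example: a flat gray image and a copy with every pixel off by 5
reference = np.full((8, 8), 100, dtype=np.uint8)
distorted = reference + 5
print(psnr(reference, reference), psnr(reference, distorted))
```

Note that this definition needs a pair of images. A dataset of standalone photos has no natural reference, which matters when interpreting per-image scores.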

def calculate_image_quality_psnr(image_folder):
    """
    Calculate a Peak Signal-to-Noise Ratio (PSNR) style score for images in the specified folder.

    No reference images are available, so each image is compared with a flat
    image at its own mean intensity.

    Parameters:
        image_folder (str): Path to the folder containing the images.

    Returns:
        list: List of PSNR values for each image.
    """
    psnr_values = []

    # Iterate over the images in the folder
    for filename in os.listdir(image_folder):
        if filename.endswith(".jpg") or filename.endswith(".png"):
            image_path = os.path.join(image_folder, filename)
            image = cv2.imread(image_path)

            # cv2.imread returns None for files it cannot read; skip them
            if image is None:
                continue

            # Convert the image to grayscale for PSNR calculation
            gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

            # MSE against the image's own mean, i.e. the intensity variance
            mse = np.mean((gray_image - gray_image.mean()) ** 2)
            if mse == 0:
                psnr = float('inf')
            else:
                # Peak is the brightest pixel in this image, not a fixed 255
                max_pixel_value = np.max(gray_image)
                psnr = 20 * np.log10(max_pixel_value / np.sqrt(mse))
            psnr_values.append(psnr)

    return psnr_values

def plot_image_quality_distribution_PSNR(image_folder):
    """
    Plot the distribution of image quality based on Peak Signal-to-Noise Ratio (PSNR).

    Parameters:
        image_folder (str): Path to the folder containing the images.

    Returns:
        None
    """
    # Calculate PSNR values for the images
    psnr_values = calculate_image_quality_psnr(image_folder)

    # Plot the distribution of PSNR values
    plt.hist(psnr_values, bins=30, color='purple', alpha=0.7)
    plt.xlabel("Peak Signal-to-Noise Ratio (PSNR)")
    plt.ylabel("Frequency")
    plt.title("Distribution of Image Quality (PSNR)")
    plt.show()

# Specify the folder containing the images
image_folder = "/content/roadsigndataset/train"

# Call the function to plot the histograms
plot_image_quality_distribution_PSNR(image_folder)

PSNR is a widely used metric to evaluate image quality, and it measures the ratio of the
peak signal (maximum possible pixel value) to the mean squared error (MSE) between an
image and a reference. This dataset has no separate reference images, so the code above
compares each grayscale image with a flat image at its own mean intensity. The resulting
value, 20 * log10(max pixel / standard deviation), is best read as a rough contrast
statistic rather than a true distortion measure.

The values fall in the range of 8 to 23 dB. Here, lower values correspond to larger
intensity variation relative to the brightest pixel. With a genuine reference image,
lower PSNR values would instead correspond to higher levels of distortion or loss of
quality.

The high peak near 12 to 16 suggests that there is a significant concentration of images
with similar intensity statistics. This range may represent a common characteristic or
specific type of image in the dataset.

Limitations of Mean Squared Error (MSE) and Peak Signal-to-Noise Ratio (PSNR) as Image
Quality Metrics

Mean Squared Error (MSE) and Peak Signal-to-Noise Ratio (PSNR) are commonly used metrics
to measure the quality of images, but they have certain limitations that make them less suitable
for assessing perceptual image quality.

MSE and PSNR evaluate image quality based on pixel-level differences between the
original and reconstructed images. While these metrics can measure the level of distortion
or error in an image, they are highly sensitive to even small changes in pixel values. This
sensitivity does not always align with human perception of image quality. Human visual
perception is more focused on higher-level features, such as edges, textures, and overall
visual appearance, rather than pixel-level differences.

MSE and PSNR also treat all pixels equally, regardless of their visual importance. As a
result, these metrics may not accurately reflect the visual quality perceived by humans. For
example, a small amount of distortion in a smooth region of an image may be less noticeable
to the human eye than the same distortion in high-contrast or detailed regions.
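
One way to see this limitation is to build two distortions that look very different but have
the same MSE. The sketch below uses synthetic data only (the image, the offset of 10, and the
helper `mse` are assumptions for illustration): a uniform brightness shift and a random
per-pixel perturbation of the same magnitude receive identical scores.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

rng = np.random.default_rng(0)
# Hypothetical mid-range image, so that +/-10 never leaves the [0, 255] range
original = rng.integers(50, 200, size=(64, 64)).astype(np.float64)

# Distortion 1: every pixel is 10 levels brighter (barely visible as "damage")
brighter = original + 10

# Distortion 2: every pixel moves by 10 levels in a random direction (visible as noise)
signs = rng.choice([-1.0, 1.0], size=original.shape)
noisy = original + 10 * signs

mse_shift = mse(original, brighter)
mse_noise = mse(original, noisy)
print(mse_shift, mse_noise)
```

Both distortions change every pixel by exactly 10 levels, so MSE (and therefore PSNR) cannot
tell them apart, even though a viewer would.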

Structural Similarity Index (SSIM)


The Structural Similarity Index (SSIM) is a widely used image quality metric that measures the
similarity between two images. It aims to capture perceptual differences between images by
considering three components: luminance, contrast, and structural similarity.

Luminance: SSIM takes into account the similarity of pixel intensities in the images. It
considers the mean values of the pixels, which represent the overall brightness.

Contrast: SSIM evaluates the similarity of contrast between the images. It calculates the
standard deviation of pixel intensities, which measures the variability or sharpness of the
image.

Structural Similarity: SSIM focuses on the structural patterns and textures present in the
images. It computes the covariance of pixel intensities and their spatial arrangement,
capturing local image structures.

The SSIM index ranges between -1 and 1, where 1 indicates perfect similarity and -1 represents
complete dissimilarity. A higher SSIM value indicates a greater similarity between the images.

In addition to these components, SSIM is usually computed locally, over a small window that
slides across the image, and the local scores are then averaged. A Gaussian window can be used
for this: within each window it assigns higher weights to pixels near the center and lower
weights to pixels farther away. This emphasizes the impact of local structures on the overall
similarity measure.

Because the statistics are local, differences in fine details and edges are measured where
they occur instead of being diluted over the whole image. This helps SSIM capture the
structural similarity between images, even when they differ in terms of global luminance and
contrast.

To calculate SSIM with the Gaussian window, the mean, standard deviation, and covariance
inside each window are computed as Gaussian-weighted statistics of the pixel intensities. This
weighting scheme allows SSIM to prioritize the local structural information and produce
smoother, more reliable similarity measurements than a uniform window.
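
The three components can be combined into a single formula. The sketch below is a simplified,
single-window version that computes the statistics over the whole image at once; it is not the
Gaussian-windowed variant described above. The function name `global_ssim` is hypothetical, and
the constants `k1 = 0.01` and `k2 = 0.03` are the commonly used defaults. For real use,
scikit-image provides `skimage.metrics.structural_similarity`, which supports a
`gaussian_weights=True` option.

```python
import numpy as np

def global_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM: luminance, contrast, and structure over the full image."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64))  # hypothetical test image

same = global_ssim(image, image)            # an image compared with itself
inverted = global_ssim(image, 255 - image)  # an image compared with its negative
print(same, inverted)
```

Comparing an image with itself gives a score of 1, while comparing it with its negative gives a
negative score because the covariance term changes sign.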

For more details read this article:

Task 3: Find out the distribution of day vs night images

Daytime and nighttime scenes have distinct visual characteristics due to differences in
lighting conditions, shadows, and overall visibility.

Road and traffic conditions vary significantly between day and night. Factors such as traffic
density, pedestrian activity, and lighting conditions can impact driver behavior and the
effectiveness of safety systems.

Road safety is a primary concern in the domain of road and traffic. Day and night driving
pose unique challenges, and accurate classification and detection of objects and road
signs under both conditions are essential for driver safety.

Knowing the distribution of day vs. night images helps in assessing the performance of
computer vision models designed for road and traffic-related tasks. If the dataset is imbalanced,
efforts can be made to collect more samples from the underrepresented class (e.g., nighttime
images) to create a balanced dataset for training.

import cv2
import numpy as np
import os
import matplotlib.pyplot as plt

# Define the path of the folder containing the images
image_folder = "/content/roadsigndataset/train"

# Initialize an empty list to store V values
v_values = []

# Iterate over the images in the folder
for image_file in os.listdir(image_folder):
    if image_file.endswith(".jpg") or image_file.endswith(".png"):
        image_path = os.path.join(image_folder, image_file)
        # Load the image (cv2.imread returns None if the file cannot be read)
        image = cv2.imread(image_path)
        if image is None:
            continue

        # Convert the image to HSV color space
        hsv_image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

        # Calculate the V value of the top quarter of the image
        height, width, _ = hsv_image.shape
        top_quarter_v_values = hsv_image[:height//4, :, 2].flatten()

        # Append the V values to the list
        v_values.extend(top_quarter_v_values)

# Plot the histogram of V values
plt.hist(v_values, bins=30, color='blue', alpha=0.7)
plt.xlabel("V Value")
plt.ylabel("Frequency")
plt.title("Distribution of V Values for the Top Quarter of Images")
plt.show()
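
The histogram above pools the V values of all pixels. To obtain an actual day vs. night count,
each image needs its own label. The following is a minimal sketch under stated assumptions: the
helper names and the threshold of 100 are hypothetical and would need tuning on the dataset. It
relies on the fact that the HSV value channel of an 8-bit image is the per-pixel maximum of the
three color channels, so no color conversion is required.

```python
import numpy as np

def top_quarter_mean_v(image_bgr):
    """Mean HSV value (V) of the top quarter of a BGR uint8 image."""
    height = image_bgr.shape[0]
    top = image_bgr[: max(height // 4, 1)]
    # V is the maximum over the color channels of each pixel
    return float(top.max(axis=2).mean())

def classify_day_night(image_bgr, threshold=100.0):
    """Label an image 'day' or 'night'; the threshold is an assumed value."""
    return "day" if top_quarter_mean_v(image_bgr) >= threshold else "night"

# Synthetic examples: a frame with a bright "sky" on top and a uniformly dark frame
day_like = np.zeros((80, 80, 3), dtype=np.uint8)
day_like[:20] = (230, 180, 120)  # light blue in BGR order
night_like = np.full((80, 80, 3), 15, dtype=np.uint8)

labels = [classify_day_night(img) for img in (day_like, night_like)]
print(labels)
```

Counting the labels over the dataset would then give the day vs. night distribution directly.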


Task 4: Run YOLOv5 (an available object detection model) and find out the distribution of
objects in the images

This information helps in understanding the types of objects commonly present on the
road, such as cars, pedestrians, bicycles, traffic signs, and other relevant elements. It
allows us to identify patterns, trends, and potential challenges related to traffic flow,
congestion, or safety.

Knowing the frequency and positioning of traffic signs, traffic lights, or road markings can
aid in optimizing road layouts, identifying areas where additional signage is required, or
evaluating the effectiveness of existing traffic control measures.

Understanding the distribution of objects in images is crucial for training object detection
models effectively. By analyzing the frequency and diversity of objects, we can determine the
appropriate training strategies, data augmentation techniques, and model architectures to
handle different object classes and their variations. It helps in building accurate and robust
object detection systems specific to the road and traffic-related domain.

To use the YOLOv5 model in Google Colab, you can follow these step-by-step instructions:

Install the required packages by running the following code cell:

!pip install torch torchvision
!pip install numpy matplotlib
!pip install scikit-image


Download the YOLOv5 model by running the following code cell:

This will download the YOLOv5 model weights (yolov5m_Objects365.pt) into the current
directory of the Colab notebook.

!wget https://github.com/ultralytics/yolov5/releases/download/v6.0/yolov5m_Objects3
!wget https://github.com/ultralytics/yolov5/raw/master/data/Objects365.yaml

--2023-07-10 09:26:46-- https://github.com/ultralytics/yolov5/releases/downlo


Resolving github.com (github.com)... 192.30.255.113
Connecting to github.com (github.com)|192.30.255.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asse
--2023-07-10 09:26:46-- https://objects.githubusercontent.com/github-producti
Resolving objects.githubusercontent.com (objects.githubusercontent.com)... 185
Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|18
HTTP request sent, awaiting response... 200 OK
Length: 45096813 (43M) [application/octet-stream]
Saving to: ‘yolov5m_Objects365.pt’

yolov5m_Objects365. 100%[===================>] 43.01M 268MB/s in 0.2s

2023-07-10 09:26:47 (268 MB/s) - ‘yolov5m_Objects365.pt’ saved [45096813/45096

--2023-07-10 09:26:47-- https://github.com/ultralytics/yolov5/raw/master/data


Resolving github.com (github.com)... 192.30.255.113
Connecting to github.com (github.com)|192.30.255.113|:443... connected.

HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/ultralytics/yolov5/master/data/Obj
--2023-07-10 09:26:47-- https://raw.githubusercontent.com/ultralytics/yolov5/
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.10
HTTP request sent, awaiting response... 200 OK
Length: 9204 (9.0K) [text/plain]
Saving to: ‘Objects365.yaml’

Objects365.yaml 100%[===================>] 8.99K --.-KB/s in 0s

2023-07-10 09:26:47 (42.4 MB/s) - ‘Objects365.yaml’ saved [9204/9204]

Clone the YOLOv5 repository by running the following code cell:

!git clone https://github.com/ultralytics/yolov5.git

Cloning into 'yolov5'...


remote: Enumerating objects: 15814, done.
remote: Counting objects: 100% (46/46), done.
remote: Compressing objects: 100% (44/44), done.
remote: Total 15814 (delta 9), reused 23 (delta 2), pack-reused 15768
Receiving objects: 100% (15814/15814), 14.64 MiB | 15.92 MiB/s, done.
Resolving deltas: 100% (10821/10821), done.

Move the Objects365.yaml configuration file into the yolov5/data directory and the model
weights into the yolov5 directory. If the wget step did not fetch the configuration file, you
can download it manually from the YOLOv5 GitHub repository and upload it to the Colab
notebook. Then run the following code cell to move the files:

!mv '/content/Objects365.yaml' '/content/yolov5/data'
!mv '/content/yolov5m_Objects365.pt' '/content/yolov5'

!pip install ultralytics

Collecting ultralytics
Downloading ultralytics-8.0.131-py3-none-any.whl (626 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 626.9/626.9 kB 11.8 MB/s eta 0:00:
Installing collected packages: ultralytics
Successfully installed ultralytics-8.0.131

import torch
import os
import matplotlib.pyplot as plt
import ultralytics

os.chdir("/content/yolov5")
# Load the custom Objects365 weights from the local clone of the repository
model = torch.hub.load('.', 'custom', path="yolov5m_Objects365.pt", source='local')

requirements: Ultralytics requirement /content/yolov5/requirements.txt not fou


Collecting gitpython>=3.1.30
Downloading GitPython-3.1.31-py3-none-any.whl (184 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 184.3/184.3 kB 12.8 MB/s eta 0:00:
Collecting gitdb<5,>=4.0.1 (from gitpython>=3.1.30)
Downloading gitdb-4.0.10-py3-none-any.whl (62 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.7/62.7 kB 220.6 MB/s eta 0:00:
Collecting smmap<6,>=3.0.1 (from gitdb<5,>=4.0.1->gitpython>=3.1.30)
Downloading smmap-5.0.0-py3-none-any.whl (24 kB)
Installing collected packages: smmap, gitdb, gitpython
Successfully installed gitdb-4.0.10 gitpython-3.1.31 smmap-5.0.0

requirements: AutoUpdate success ✅


5.7s, installed 1 package: /content/yolov5
requirements: ⚠️
Restart runtime or rerun command for updates to take effect

YOLOv5 🚀 v7.0-193-g485da42 Python-3.10.12 torch-2.0.1+cu118 CPU

Fusing layers...
YOLOv5m summary: 290 layers, 22323858 parameters, 0 gradients
Adding AutoShape...

Note: YOLOv5 was not running in Colab, so the setup needs to be done separately.


import cv2

# Define the image folder path
image_folder = "/content/roadsigndataset/train"

# Initialize an empty dictionary to store the object distribution
object_distribution = {}

# Iterate over the images in the folder
for image_file in os.listdir(image_folder):
    if image_file.endswith(".jpg") or image_file.endswith(".png"):
        image_path = os.path.join(image_folder, image_file)
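        image = cv2.imread(image_path)
        if image is None:
            continue  # Skip files that cannot be read
        # OpenCV loads images as BGR; convert to RGB before passing them to the model
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)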

        # Perform object detection using YOLOv5
        results = model(image)

        # Extract the class labels for detected objects
        class_labels = results.pandas().xyxy[0]['name'].tolist()

        # Update the object distribution dictionary
        for label in class_labels:
            if label in object_distribution:
                object_distribution[label] += 1
            else:
                object_distribution[label] = 1

# Print the object distribution
for label, count in object_distribution.items():
    print(f"{label}: {count}")

Traffic Sign: 94
Stop Sign: 66
Clock: 15
Street Lights: 88
Car: 86
Person: 17
Traffic Light: 35
Van: 7
Vase: 2
Potted Plant: 3
Flower: 1
Train: 2
Microphone: 1
Speaker: 1
Hat: 3
Handbag/Satchel: 1
Candle: 1
Truck: 5
Flag: 4
SUV: 5
Bus: 3
Motorcycle: 1
Picture/Frame: 3
Pickup Truck: 2
Volleyball: 2
Crane: 1
Mirror: 2
Lamp: 2
Basketball: 1
Backpack: 2
Air Conditioner: 6
Tent: 1
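
The counts above are printed in the order in which labels were first seen. As a small sketch of
an alternative, `collections.Counter` can do the tallying and return the classes sorted by
frequency. The helper `count_labels` and the sample detections are hypothetical; in the notebook
each inner list would be the `class_labels` list extracted for one image.

```python
from collections import Counter

def count_labels(labels_per_image):
    """Tally detected class labels across a collection of images."""
    counts = Counter()
    for labels in labels_per_image:
        counts.update(labels)
    return counts

# Hypothetical detections for three images
labels_per_image = [
    ["Traffic Sign", "Car", "Car"],
    ["Stop Sign", "Street Lights"],
    ["Traffic Sign", "Person"],
]

counts = count_labels(labels_per_image)
for label, count in counts.most_common():
    print(f"{label}: {count}")
```

Sorting by frequency makes it easier to see which classes dominate and which are rare enough to
be likely false detections.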

Data augmentation
Data augmentation is a technique commonly used in machine learning, including computer
vision tasks like image classification. It involves applying various transformations and
modifications to the existing dataset to create new training samples with altered versions of the
original data. Data augmentation serves several purposes:

Domain Adaptation: Data augmentation can be used for domain adaptation, which involves
training a model on a source domain and applying it to a target domain. In computer vision,
domains could differ in lighting conditions, camera angles, or imaging equipment, leading
to variations that the model needs to handle. Data augmentation can simulate these
variations in the source domain to make the model more adaptable to the target domain.
By applying domain-specific transformations during augmentation, the model learns to
generalize better and performs well on the target domain by mimicking the variations
present in the real-world data.

Increase Variability and Generalization: Data augmentation helps increase the variability
and diversity of the training data. In many cases, the available dataset is limited, and
training a model solely on this data can lead to overfitting. Overfitting occurs when a model
becomes too specific to the training data and fails to generalize well to new, unseen
examples. By applying random transformations and modifications to the existing data,
such as rotation, scaling, flipping, cropping, or adding noise, data augmentation generates
new samples that exhibit variations of the original data. This expanded dataset allows the
model to learn robust and generalized patterns, enabling better performance on unseen
data during inference.

Mitigate Class Imbalance: Class imbalance occurs when certain classes or categories
have significantly fewer samples compared to others. This can lead to biased models that
perform poorly on underrepresented classes. Data augmentation can help address class
imbalance by artificially increasing the number of samples for minority classes. By
applying augmentation techniques specifically to the underrepresented classes, the
dataset can be rebalanced, allowing the model to learn better representations for all
classes and prevent bias towards the majority classes.

Enhance Model Robustness: Data augmentation introduces variations and perturbations
into the training data, making the model more robust to noise, distortions, or variations
present in real-world scenarios. By exposing the model to augmented samples with
different transformations, the model learns to recognize and generalize patterns in the
presence of such variations. This robustness enables the model to perform well on test
data that may have different lighting conditions, orientations, or other factors that were not
present in the original training data.

Improve Model Performance on Limited Data: Data augmentation is particularly valuable
when the available training data is limited. In scenarios where collecting a large, diverse
dataset is challenging or costly, data augmentation can artificially increase the size and
diversity of the training data. By creating augmented samples, the model receives more
exposure to different variations and becomes more capable of generalizing well, even with
a small original dataset.
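
For the class imbalance case in particular, the number of augmented samples to generate can be
worked out per class. The sketch below assumes the simplest policy, which is to bring every
class up to the size of the largest one; the function name and the class counts are
hypothetical.

```python
def augmentations_needed(class_counts):
    """Return how many augmented samples each class needs to match the largest class."""
    target = max(class_counts.values())
    return {label: target - count for label, count in class_counts.items()}

# Hypothetical number of training images per class
class_counts = {"stop": 120, "speed_limit": 300, "crosswalk": 45}

extra = augmentations_needed(class_counts)
print(extra)
```

Classes that need many times their original size in augmented samples are better served by
collecting more real images, since heavy augmentation of a few originals adds little new
information.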

Data augmentation refers to a set of techniques used to create new training data samples by
applying various transformations and modifications to the existing dataset. These
transformations alter the appearance or characteristics of the data while preserving the label or
class information.

Data augmentation techniques can be applied to different types of data, including images, text,
audio, and time series. In computer vision, image data augmentation is widely used and includes
operations such as:

Geometric transformations: These include rotation, scaling, translation, flipping (horizontal
or vertical), and cropping. These transformations simulate changes in the viewpoint or
orientation of objects within the image.

Color and contrast adjustments: Operations such as brightness adjustment, contrast
enhancement, hue and saturation changes, and color channel shifting can be applied to
modify the color properties of the images.

Noise addition: Different types of noise, such as Gaussian noise or random pixel value
perturbations, can be added to the images to mimic real-world variations or improve
robustness.

Occlusion and cutout: Artificial occlusions or cutout regions can be introduced into the
images to simulate partial object occlusion or missing information.

These are just a few examples of data augmentation techniques used in computer vision. The
specific choice and combination of augmentation techniques depend on the characteristics of
the dataset, the nature of the problem, and the desired variations needed to improve the model's
performance.
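
To make the four families of operations concrete, here is a minimal NumPy sketch with one
example of each. The helper functions and the random test image are hypothetical and are meant
to show the idea only; a dedicated augmentation library would normally be used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # hypothetical image

def horizontal_flip(img):
    """Geometric transformation: mirror the image left to right."""
    return img[:, ::-1]

def adjust_brightness(img, delta):
    """Color adjustment: shift all pixel values by delta, clipped to [0, 255]."""
    return np.clip(img.astype(np.int16) + delta, 0, 255).astype(np.uint8)

def add_gaussian_noise(img, sigma, rng):
    """Noise addition: add zero-mean Gaussian noise with standard deviation sigma."""
    noise = rng.normal(0.0, sigma, size=img.shape)
    return np.clip(img + noise, 0, 255).astype(np.uint8)

def cutout(img, top, left, size):
    """Occlusion: black out a square region of the image."""
    out = img.copy()
    out[top:top + size, left:left + size] = 0
    return out

flipped = horizontal_flip(image)
darker = adjust_brightness(image, -40)
noisy = add_gaussian_noise(image, sigma=10.0, rng=rng)
occluded = cutout(image, top=8, left=8, size=12)
```

Each function returns a new image with the same shape and label as the input, which is what
allows the augmented samples to be added to the training set unchanged.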

Albumentations
Albumentations is a Python library specifically designed for data augmentation in computer
vision tasks. Its primary goal is to facilitate the augmentation process for deep
learning and machine learning practitioners working on image classification, object detection,
segmentation, and other computer vision tasks. By leveraging the power of data augmentation,
Albumentations helps improve model performance, generalization, and robustness by exposing the
model to a broader range of variations and scenarios.

It provides a flexible and easy-to-use interface for applying a wide range of image
transformations and augmentations to enhance the diversity and variability of the training
dataset. These techniques include geometric transformations like rotation, scaling, and
flipping, as well as color manipulations such as brightness adjustment, contrast enhancement,
and hue shifts. In addition, Albumentations supports more advanced transformations such as
perspective transformations, elastic deformations, and noise injection.

One of the key features of Albumentations is its ability to handle complex augmentation
pipelines. Users can chain together multiple augmentation operations, specifying the desired
parameters and probabilities for each transformation. This allows for the creation of diverse
and customizable augmentation pipelines tailored to specific needs.

The following transformations were chosen to simulate low light, low saturation, a limited
range of rotation, and adverse weather conditions:

RandomBrightnessContrast: By randomly adjusting the brightness and contrast of the images,
we introduce variations in the lighting conditions. This helps the model become robust to
different lighting scenarios, including low light situations.

ColorJitter: This transformation randomly applies color variations to the images, such as
changes in hue, saturation, and brightness. It helps the model learn to recognize objects under
different color conditions, such as images captured with low saturation cameras.

GaussNoise: Adding Gaussian noise to the images simulates image noise commonly
encountered in real-world scenarios. It helps the model learn to distinguish and classify objects
even when there is noise present in the image.

Rotate: Rotation augmentation introduces variations by randomly rotating the images within a
specified range (up to 30 degrees in either direction in this case). This is useful to handle
situations where the road signs might be tilted or not perfectly aligned in the input images.

RandomRain: Simulating rain effects in the images adds realism and helps the model generalize
better to images captured during rainy weather conditions. It allows the model to learn the
characteristics of road signs under rainy conditions.

RandomShadow: The presence of shadows can affect the appearance of road signs. By applying
random shadow effects, the model learns to recognize road signs even when they are partially
covered by shadows.

RandomFog: Fog is another common environmental factor that can impact the visibility of road
signs. Introducing random fog effects helps the model adapt to such conditions and learn to
classify road signs accurately in foggy scenarios.

RandomGravel: Road surfaces with gravel or other textured elements can introduce variations in
the appearance of road signs. By applying random gravel effects, the model becomes more
robust to such variations and can recognize road signs in diverse environments.

RandomSnow: Simulating snow on the images helps the model handle road sign classification in
snowy conditions, where the presence of snow might affect the appearance of the signs.

import cv2
import albumentations as A
import os
import matplotlib.pyplot as plt

# Define the image folder path
image_folder = "/content/roadsigndataset/train"

# Define data augmentation transformations
augmentation_transforms = A.Compose([
    A.RandomBrightnessContrast(p=0.5),
    A.ColorJitter(p=0.5),
    A.GaussNoise(p=0.5),
    A.Rotate(limit=30, p=0.5),
    A.RandomRain(p=0.5),
    A.RandomShadow(p=0.5),
    A.RandomFog(p=0.5),
    #A.RandomGravel(p=0.5),
    A.RandomSnow(p=0.5),
])

# Create an empty list to store augmented images
augmented_images = []

# Apply data augmentation to each image in the dataset
# Iterate over the images in the folder
for image_file in os.listdir(image_folder):
    if image_file.endswith(".jpg") or image_file.endswith(".png"):
        image_path = os.path.join(image_folder, image_file)
        # Read the image (cv2.imread returns None if the file cannot be read)
        image = cv2.imread(image_path)
        if image is None:
            continue

        # Apply data augmentation
        augmented = augmentation_transforms(image=image)
        augmented_image = augmented["image"]

        # Add the augmented image to the list or save it to disk
        augmented_images.append(augmented_image)

        # Display the augmented image
        plt.imshow(cv2.cvtColor(augmented_image, cv2.COLOR_BGR2RGB))
        plt.axis('off')
        plt.show()
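
Every transform in the pipeline above has p=0.5, so each one is applied or skipped
independently for each image. The sketch below illustrates that behavior with toy stand-ins; it
is not the library's implementation, and `apply_pipeline` and the toy transforms are
hypothetical. It also works out, for the eight active transforms, how often an image passes
through unchanged and how many transforms are applied on average.

```python
import random

def apply_pipeline(value, transforms, rng):
    """Apply each (name, function, probability) step independently, in order."""
    applied = []
    for name, fn, p in transforms:
        if rng.random() < p:
            value = fn(value)
            applied.append(name)
    return value, applied

# Toy stand-ins for image transforms, operating on a number instead of an image
toy_transforms = [
    ("double", lambda v: v * 2, 0.5),
    ("add_one", lambda v: v + 1, 0.5),
    ("negate", lambda v: -v, 0.5),
]
result, applied = apply_pipeline(10, toy_transforms, random.Random(0))

# Arithmetic for the pipeline above: 8 active transforms, each with p = 0.5
num_transforms, p = 8, 0.5
prob_unchanged = (1 - p) ** num_transforms  # chance that no transform fires
expected_applied = num_transforms * p       # average number of transforms per image
print(applied, prob_unchanged, expected_applied)
```

With these settings fewer than 1% of images come out unmodified, and a typical augmented image
combines about four effects at once, which can be quite heavy for small road sign crops.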
