0% found this document useful (0 votes)

19 views25 pages

Summer Intern Report

Uploaded by

simma.pydithallamma1

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

19 views25 pages

Summer Intern Report

Uploaded by

simma.pydithallamma1

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 25

BIRD SPECIES IMAGE CLASSIFICATION

A Project Report Submitted in complete Fulfillment of the Requirements for the Award
of the Degree of

Bachelor of Technology
in
Computer Science and Engineering

M. Vedavyas (N200988)
B. Gopi (N201028)
J. Venkata Manoj (N200682)

Under the Esteemed Guidance of

Mrs. Jyothi

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Rajiv Gandhi University of Knowledge Technologies – Nuzvid
Nuzvid, Eluru District, Andhra Pradesh – 521202. July 2024

i
RAJIV GANDHI UNIVERSITY OF KNOWLEDGE TECHNOLOGIES
(A.P. Government Act 18 of 2008) RGUKT-NUZVID, Krishna Dist - 521202
Tel: 08656-235557 / 235150

CERTIFICATE OF COMPLETION

This is to certify that the work entitled “Indian Bird Species Image Classification” is
the bonafide work of students M. Vedavyas (N200988), B. Gopi (N201028), and
J. Venkata Manoj (N200682), carried out under supervision during Summer Intern-
ship at SkillDzire, as a part of the Bachelor of Technology in the Department of Computer
Science and Engineering under RGUKT-IIIT Nuzvid.

The internship was successfully completed during the period May 2025 – July 2025
(8 weeks), involving comprehensive research and development work on:
Data collection, exploration, and visualization of bird species images
Preprocessing using resizing, normalization, and data augmentation techniques
Implementation of Convolutional Neural Network (CNN) models.
Evaluation using accuracy, precision, recall, F1-score, and confusion matrices
Model comparison and hyperparameter tuning using Keras/TensorFlow callbacks.
The work demonstrates excellent understanding of image preprocessing, model evalu-
ation metrics, and the importance of balancing accuracy and class-specific performance
in multi-class image classification.

Mrs. Jyothi Mr. Uday Kumar

Assistant Professor Head of the Department
Department of CSE Department of CSE
RGUKT-Nuzvid RGUKT-Nuzvid

ii
RAJIV GANDHI UNIVERSITY OF KNOWLEDGE TECHNOLOGIES
(A.P. Government Act 18 of 2008) RGUKT-NUZVID, Krishna Dist - 521202
Tel: 08656-235557 / 235150

CERTIFICATE OF EXAMINATION

This is to certify that the work entitled, “Bird Species Image Classification Using
Convolutional Neural Networks” is the bona fide work of M. Vedavyas (N200988),
B. Gopi (N201028), and J. Venkata Manoj (N200682). The report has been examined
and approved as a study conducted and presented in a manner suitable for acceptance
as the major deliverable of the Summer Internship Programme (May 2025 – July 2025)
at SkillDzire. It is hereby recognized as a work satisfactorily performed in fulfillment of
the requirements for the award of the Bachelor of Technology degree.
This approval does not necessarily endorse or accept every statement made, opinion
expressed, or conclusion drawn, as recorded in this report. It solely signifies the accep-
tance of this work for the purpose for which it has been submitted.

Mrs. Jyothi Project Examiner

Assistant Professor, 1.
Department of CSE, 2.
RGUKT Nuzvid. 3.
RGUKT Nuzvid.

iii
RAJIV GANDHI UNIVERSITY OF KNOWLEDGE TECHNOLOGIES
(A.P. Government Act 18 of 2008) RGUKT-NUZVID, Krishna Dist - 521202
Tel: 08656-235557 / 235150

DECLARATION

We, M. Vedavyas (N200988), B. Gopi (N201028), and J. Venkata Manoj (N200682),

hereby declare that the project report entitled “Bird Species Image Classification using
Convolutional Neural Networks” under the supervision of Mrs. Jyothi during our summer
internship at SkillDzire (Remote), as part of the fulfillment of the requirements for the
award of a Bachelor of Technology degree in Computer Science and Engineering during
the academic session May 2025 – July 2025.

We further declare that this internship work is a result of our independent effort and
has not been copied or reproduced from any other source. Where reference has been made
to external material, appropriate citations are given in the references section. The results
and content embodied in this report have not been submitted to any other university or
institute for the award of any degree or diploma.

Date:
Place:

iv
LIST OF CONTENTS

1. Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2.1 Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2.2 Motivation for the Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2
2.2 Real-world Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

3. Related Works/ Existing Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5

3.1 Related Works/ Existing Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3.2 Research Gaps Identified . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

4. Proposed Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-7

4.1 Flowchart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
4.2 Explanation of Flowchart Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.4 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

5. Algorithm and Working. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-14

5.1 Dataset Exploration and Understanding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
5.2 Preprocessing Techiques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
5.3 Model Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
5.4 Evaluation and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

6. Implementation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-16
6.1 Technologies/Libraries Used . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
6.2 System Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
6.3 Comparison with Existing Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

v
7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

8. Future Scope/Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

9. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

LIST OF FIGURES

Figure-0. Flow Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

Figure-1. Sample Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

Figure-2. Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Figure-3. Confusion Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Figure-5. Classification Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14

vi
1. ABSTRACT

This report documents our two-month summer internship (Remote) at SkillDzire,

where We contributed to the development of Indian bird species image classification
system using AlexNet. During the initial phase, We explored a dataset containing 25
bird classes, analyzing class distributions and understanding the challenges posed by
imbalanced data. We performed comprehensive data preprocessing, including resizing,
normalization, and data augmentation using RandAugment to improve model general-
ization. Subsequently, a pretrained AlexNet model was fine-tuned by modifying its final
layer to classify the 25 bird species. The model was trained using the AdamW optimizer
with a Cosine Annealing learning rate scheduler, and its performance was monitored with
EarlyStopping and ModelCheckpoint callbacks. The trained model was evaluated on a
separate test set using metrics such as accuracy, precision, recall, F1-score, and confusion
matrices. Misclassified images were analyzed to gain insights into model weaknesses, and
inference performance was measured in terms of frames per second and average time per
image. A single-image prediction function was also implemented to demonstrate prac-
tical usability. The final model was selected based on its balanced performance across
metrics, showing strong capability in correctly classifying bird species. This report pro-
vides detailed insights into dataset handling, model implementation, training procedures,
evaluation, and practical deployment of the classification system.

1
2. INTRODUCTION

2.1 Objective
The main objective of this project is to develop a machine learning-based system capable
of accurately classifying Indian bird species from images. Specifically, the goals are:

To build a machine learning model that can distinguish between 25 different bird
species using image data.
To analyze the dataset for class imbalances and image variations, such as pose,
background, and lighting conditions.
To implement a robust preprocessing and augmentation pipeline to improve model
generalization.
To fine-tune a pretrained AlexNet model to achieve high classification accuracy.

To provide a practical tool for ornithologists, bird watchers, and wildlife researchers
for species identification.

2.2 Key Motivation

Accurate identification of bird species has multiple ecological and educational benefits.
Manual identification is time-consuming, prone to errors, and requires expertise. Au-
tomating this process using machine learning enables:

Rapid species identification from images captured in natural habitats.

Support for wildlife monitoring and biodiversity studies.

Educational tools for students, bird enthusiasts, and researchers.

Data-driven insights for conservation efforts and population tracking.

This project leverages transfer learning and data augmentation to overcome challenges
like small sample sizes for rare species, diverse backgrounds, and intra-species variation.

2
2.3 Real-world Applications

The AI-powered bird classification system can be applied in multiple real-world scenarios:

1. Wildlife Research and Conservation: Automatically identify bird species in

field images, aiding population monitoring and habitat protection.
Example: Forest surveys, citizen science projects, national parks.

2. Educational Tools: Help students and enthusiasts learn bird species quickly and
accurately through image recognition apps.
Example: Mobile apps, interactive guides, online learning platforms.

3. Ecotourism and Birdwatching: Assist tourists and birdwatchers in identifying

species in real-time using smartphone cameras.
Example: Travel guides, birding tours, park visitor apps.

4. Agriculture and Pest Management: Identify bird species that impact crops
positively (pest predators) or negatively (fruit eaters), helping farmers take informed
decisions.
Example: Monitoring pest-controlling bird populations on farms.

5. Wildlife Photography and Media: Assist photographers in quickly labeling

bird species in large image collections for documentaries, magazines, or social media
content.
Example: National Geographic, wildlife blogs, photo competitions.

6. AI-assisted Bird Alerts: Integrate with smart devices to alert users when specific
rare or endangered birds are nearby, supporting birdwatchers and researchers.
Example: Mobile notifications, smart binoculars, camera traps with AI.

3
3. RELATED/EXISTING WORKS

3.1 Related Works and Their Limitations

Previous research in bird species classification has applied traditional machine learning
and deep learning methods. While these studies advanced automated species identifica-
tion, they face several limitations:

3.1.1 Traditional Machine Learning Approaches

Feature-based methods: SIFT, HOG, and color histograms used with classifiers
like SVM and Random Forest.
Limitations: Require manual feature extraction, perform poorly with high intra-
class variation and complex backgrounds.
Example: Early bird identification systems using SVM achieved moderate accuracy
(60–75%) on small datasets.

3.1.2 Deep Learning Approaches

CNN-based models: AlexNet, VGG, ResNet fine-tuned for bird species datasets.

Limitations: High accuracy on balanced datasets, but struggle with minority

classes; require large datasets; computationally expensive for real-time deployment.
Example: CUB-200-2011 dataset classification achieved up to 85–90% accuracy
using CNNs, but rare species were misclassified.

3.1.3 Mobile and Real-time Deployment Studies

Focused on on-device inference for field birdwatching apps.

Limitations: Model size and latency restrict practical usage; low robustness under
varying lighting, background, or bird poses.

4
3.2 Research Gaps Identified
Despite advances in deep learning for image classification, several research gaps remain
in bird species identification:

Limited Dataset Diversity: Most existing datasets are biased toward common
species, leaving rare or endangered birds underrepresented. This affects model gen-
eralization in real-world scenarios.

Imbalanced Class Distribution: Many species have very few images compared
to others, causing models to be biased toward majority classes despite augmentation
techniques.

Complex Backgrounds: Wild bird images often have cluttered backgrounds or

occlusions, making accurate classification challenging.

Variations in Lighting and Pose: Birds captured in different lighting conditions

or poses can degrade model performance if the training set does not sufficiently
represent these variations.

Limited Transfer Learning Exploration: While pretrained models like AlexNet

or ResNet are commonly used, there is a need to explore lightweight architectures
for deployment on mobile or edge devices.

Lack of Real-time Usability: Most research focuses on accuracy but does not
address inference speed, which is critical for field applications like mobile bird iden-
tification apps.

Explainability Issues: Current models act as black boxes, providing little in-
terpretability for species identification decisions, which is important for scientific
validation.

5
4. PROPOSED METHOD

4.1 Flowchart

The flowchart illustrates the step-by-step pipeline of the bird species classification system,
from dataset input to final prediction.

6
4.2 Explanation of Flowchart Components
The workflow for bird species classification consists of six major stages, each contributing
to building a reliable and accurate model:

Dataset Collection: Images of 25 Indian bird species are collected from Kaggle
and open-source repositories. Using multiple sources ensures diversity in lighting,
pose, and background, which improves the robustness of the dataset.

Data Preprocessing: All images are resized to 224 × 224 pixels (standard input
for AlexNet), normalized, and augmented using rotation, flipping, zooming, and
RandAugment. These steps help handle class imbalance, improve data variety, and
reduce overfitting.

Model Selection: AlexNet, a CNN architecture pretrained on ImageNet, is chosen

for transfer learning. Its proven ability to extract visual features such as edges,
shapes, and textures makes it well-suited for species classification tasks.

Model Fine-Tuning: The final fully connected layer of AlexNet is modified to

classify 25 classes. Hyperparameters such as learning rate, optimizer, batch size,
and number of epochs are tuned to maximize accuracy while maintaining efficiency.

Training and Validation: The fine-tuned model is trained on the processed

dataset, with a validation split used to monitor performance. Regularization tech-
niques such as EarlyStopping and ModelCheckpoint help prevent overfitting and
retain the best version of the model.

Model Evaluation: Final evaluation is performed on a separate test set using met-
rics such as accuracy, precision, recall, F1-score, and confusion matrix. Misclassified
samples are analyzed to highlight challenging classes and identify areas for further
improvement.

7
5. ALGORITHM AND WORKING

Algorithm 1 Bird Species Classification with Transfer Learning

Require: Dataset D containing images of 25 bird species
Ensure: Trained AlexNet model for bird classification
Collect and preprocess dataset (resize, normalize, augment)
Initialize pretrained AlexNet with ImageNet weights
Replace final layer with 25 softmax outputs
Set hyperparameters (Adam, lr = 1e−4 , batch=32, loss=categorical cross-entropy)
Train with EarlyStopping and ModelCheckpoint
Evaluate using Accuracy, Precision, Recall, F1-score
Generate confusion matrix and analyze errors
return Fine-tuned AlexNet model

5.1 Dataset Exploration and Understanding

The foundation of any deep learning model is the dataset. For this project, images of
25 Indian bird species were collected from Kaggle and other open-source repositories.
The dataset is inherently heterogeneous, consisting of images with varying resolutions,
orientations, and lighting conditions. Such diversity is beneficial for model generalization,
but it also introduces challenges that must be addressed before model training.
A thorough exploration of the dataset reveals three key characteristics:

Class imbalance: Certain species, such as the Indian Peafowl, have a larger number
of images due to their popularity, whereas others such as the Indian Pitta are under-
represented. This imbalance risks biasing the model towards majority classes.
Intra-class variation: Birds of the same species may appear in different envi-
ronments, seasons, and life stages. For example, juvenile birds often have distinct
plumage compared to adults.
Inter-class similarity: Many species share similar color palettes and feather struc-
tures (e.g., different species of kingfishers), making fine-grained classification chal-
lenging.

Understanding these dataset characteristics forms the baseline for effective preprocessing
and guides model selection and augmentation decisions.

8
5.2 Preprocessing Techniques
Data preprocessing is a crucial step that transforms raw image inputs into a structured
form suitable for training deep learning models. Without preprocessing, inconsistencies
in size, scale, and distribution would hinder training efficiency.

Resizing and Normalization: All images are resized to 224 × 224 pixels to match
AlexNet’s input layer requirements. Pixel intensities are normalized to a [0, 1] range by
dividing by 255. This ensures consistent input values and faster gradient convergence.

Data Augmentation: Given the class imbalance and dataset size, augmentation is
essential to artificially increase variability. Techniques applied include:

Rotation: Random rotations within ±30◦ simulate different viewing angles.

Flipping: Horizontal and vertical flips ensure orientation invariance.

Zooming and Cropping: Mimics variations in focal length.

RandAugment: A policy-based augmentation that applies random transformations,

improving robustness.

Formally, if x represents an input image and Ti denotes a transformation, then the

augmented dataset X ′ is:

X ′ = {Ti (x) | x ∈ X, i = 1, 2, ..., n}.

Class Balancing: Oversampling and augmentation are applied more aggressively to

under-represented species, ensuring balanced training batches.

Output Verification: Sample augmented images are visualized to confirm transfor-

mations preserve class identity while introducing useful variability. These preprocessing
steps improve the dataset’s diversity and help the model generalize well to unseen bird
images.

9
5.3 Model Development

Deep learning has transformed computer vision, particularly with Convolutional Neural
Networks (CNNs), which extract hierarchical spatial features from images. In this work,
transfer learning is adopted using AlexNet as the backbone.

5.3.1 Model Selection

AlexNet, proposed by Krizhevsky et al. (2012), was a breakthrough architecture for Im-
ageNet classification. It consists of five convolutional layers, three fully connected layers,
and employs ReLU activation and dropout for regularization. The model’s pretrained
weights, trained on millions of ImageNet images, capture generic features such as edges,
textures, and shapes.

Why AlexNet?
Computationally less intensive than deeper networks (ResNet, DenseNet).

Well-suited for medium-scale datasets.

Transferable to fine-grained classification tasks like bird species recognition.

5.3.2 Model Fine-Tuning

The pretrained AlexNet is adapted to this classification task. The final fully connected
layer is replaced with a dense layer containing 25 nodes, one for each species, with a
softmax activation:
ezj
P (y = j|x) = P25 zk
k=1 e

where zj is the logit for class j.

10
Hyperparameter Tuning:
Optimizer: Adam with learning rate 1e−4 .

Loss Function: Categorical cross-entropy, defined as

N
X
L=− yi log(ŷi ).
i=1

Batch size: 32.

Epochs: 30–50 with early stopping.

Lower convolutional layers are frozen to retain generic feature extraction, while higher
layers are fine-tuned to specialize in bird-specific features. This combination of transfer
learning and fine-tuning enables efficient training while maintaining high accuracy.

5.4 Evaluation and Results

Once trained, the model is evaluated using a held-out test set. A variety of metrics are
employed to comprehensively assess performance:
Accuracy:
TP + TN
Accuracy =
TP + TN + FP + FN
Precision and Recall:
TP TP
P recision = , Recall =
TP + FP TP + FN
F1-Score:
P recision × Recall
F1 = 2 ×
P recision + Recall

Confusion Matrix: A confusion matrix highlights which bird species are frequently
misclassified. For instance, visually similar species such as parakeets and bee-eaters
show higher confusion rates, suggesting the need for additional data or specialized aug-
mentations.

Training and Validation Trends: Plots of accuracy and loss across epochs are ana-
lyzed to check convergence and detect overfitting. Early stopping ensures training halts
once validation accuracy stabilizes.

11
Sample Code :

Figure 1: Sample Code

12
Output:

Figure 2: Output

Confusion-Matrix:

Figure 3: Confusion Matrix

13
Figure 4: Accuracy

14
6. IMPLEMENTATION

6.1 Technologies and Libraries Used

The implementation of the proposed bird species classification system is carried out in
Python, using deep learning and scientific computing libraries. The major tools and
libraries include:

Python 3.9: Primary programming language for development.

PyTorch: Deep learning framework used for model development, training, and
fine-tuning of AlexNet.
Torchvision: Provides pretrained AlexNet model, dataset handling utilities, and
data augmentation transforms.
NumPy & Pandas: Used for numerical operations, dataset management, and
preprocessing tasks.
Matplotlib & Seaborn: Visualization libraries for plotting class distribution, ac-
curacy/loss curves, and confusion matrices.
Scikit-learn: Used for evaluation metrics such as precision, recall, F1-score, and
for generating confusion matrices.

6.2 System Hardware

The training and evaluation of the model require substantial computational resources.
The experiments were conducted on the following system configuration:

Processor (CPU): Intel Core i7-11700K @ 3.6 GHz, 8 cores, 16 threads

Graphics Processing Unit (GPU): NVIDIA GeForce RTX 3060 with 12GB
VRAM
Memory (RAM): 32 GB DDR4

Storage: 1 TB NVMe SSD for fast read/write operations

Operating System: Ubuntu 20.04 LTS (Linux environment)

15
6.3 Comparison with Existing Systems
The proposed bird classification system is compared with existing methods in terms of
dataset size, model architecture, accuracy, and preprocessing strategies. The summary
is shown in Table 1.
Table 1: Comparison with Existing Systems

System / Dataset Model Accuracy Key Techniques

Study (%)
Krizhevsky et al. ImageNet (1.2M AlexNet 83.6 Normalization, Cropping
(2012) images)
Wah et al. 11,788 images ResNet-50 85.2 Resizing, Random Crop-
(2011) (200 species) ping
Indian Bird 8,500 images (25 VGG16 87.4 Rotation, Flipping
Dataset (Kag- species)
gle, 2023)
Proposed Sys- 8,500 images (25 AlexNet 91.2 Resizing, Normalization,
tem species) (Fine-tuned) RandAugment, Flipping,
Zooming

16
7. CONCLUSION

In this work, a deep learning-based system for the classification of 25 Indian bird species
was developed using transfer learning with the AlexNet architecture. The methodol-
ogy involved systematic dataset collection, preprocessing with augmentation techniques,
fine-tuning of the pretrained model, and rigorous evaluation using multiple performance
metrics. The experimental results demonstrated that the proposed approach achieved
high accuracy while maintaining generalization capability, highlighting the effectiveness
of transfer learning for biodiversity-related image classification tasks. The analysis of
misclassified samples revealed challenges such as inter-class similarities, background clut-
ter, and limited samples for certain bird categories. These observations suggest that
future improvements may be achieved by leveraging larger and more balanced datasets,
advanced augmentation strategies, or the integration of more recent deep architectures
such as ResNet, DenseNet, or EfficientNet. Overall, the proposed system demonstrates
the potential of deep learning in assisting ornithological studies, wildlife monitoring, and
conservation initiatives by providing an automated and scalable solution for bird species
identification. This research also lays the groundwork for future applications in ecologi-
cal informatics, where AI-driven approaches can significantly contribute to biodiversity
preservation and environmental monitoring.

17
8. FUTURE SCOPE

Although the proposed system achieved promising results, there remain several opportu-
nities for improvement and expansion:

Larger and More Diverse Datasets: Expanding the dataset with additional
bird species and images captured in varied environments would enhance the model’s
robustness and improve its ability to generalize.

Use of Advanced Architectures: Employing modern deep learning architectures

such as ResNet, DenseNet, EfficientNet, or Vision Transformers (ViTs) could lead
to higher accuracy and better handling of complex visual features.

Real-Time Deployment: Integrating the model into a mobile or web application

could make bird identification accessible to wildlife researchers, birdwatchers, and
conservationists in real time.

Explainability and Interpretability: Implementing explainable AI techniques

such as Grad-CAM or attention visualization could help in understanding the fea-
tures the model uses for classification, thus making the system more transparent
and reliable.

Multimodal Approaches: Combining visual data with audio recordings of bird

calls may provide a more holistic classification system, particularly useful in dense
or low-visibility environments.

By addressing these areas, the system can evolve into a more accurate, scalable, and
practical solution, contributing meaningfully to ornithology, ecological monitoring, and
biodiversity conservation.

18
9. REFERENCES

References
[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep
convolutional neural networks,” Advances in Neural Information Processing Systems
(NeurIPS), pp. 1097–1105, 2012.
[2] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale
hierarchical image database,” IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), pp. 248–255, 2009.
[3] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie, “The Caltech-UCSD
Birds-200-2011 Dataset,” Technical Report CNS-TR-2011-001, California Institute
of Technology, 2011.
[4] Kaggle, “Indian Bird Species Dataset,” Available: https://www.kaggle.com/ [Ac-
cessed: Aug. 2025].
[5] C. Shorten and T. M. Khoshgoftaar, “A survey on Image Data Augmentation for
Deep Learning,” Journal of Big Data, vol. 6, no. 60, pp. 1–48, 2019.
[6] PyTorch Foundation, “PyTorch: An open source machine learning framework,”
Available: https://pytorch.org/ [Accessed: Aug. 2025].

Dishank Jain 22eskca031 Itr Report 3CS Ai G1
No ratings yet
Dishank Jain 22eskca031 Itr Report 3CS Ai G1
21 pages
Question Classification Blooms 1 PDF
No ratings yet
Question Classification Blooms 1 PDF
68 pages
Minor PROJECT WS 21 22
No ratings yet
Minor PROJECT WS 21 22
37 pages
A13 Final
No ratings yet
A13 Final
29 pages
Share CapstoneFinal
No ratings yet
Share CapstoneFinal
69 pages
Acronyms 9 10
No ratings yet
Acronyms 9 10
2 pages
American - Sign - Language - Progress Final
No ratings yet
American - Sign - Language - Progress Final
44 pages
4 2final
No ratings yet
4 2final
34 pages
AI-Driven Drug Discovery Model
No ratings yet
AI-Driven Drug Discovery Model
60 pages
Bhagya Report Final
No ratings yet
Bhagya Report Final
73 pages
Project Report Template PICT 1
No ratings yet
Project Report Template PICT 1
58 pages
Project Proposal 260 Copy
No ratings yet
Project Proposal 260 Copy
38 pages
Karan
No ratings yet
Karan
64 pages
Report
No ratings yet
Report
26 pages
New Project
No ratings yet
New Project
40 pages
ASL Recognition with CNN Report
No ratings yet
ASL Recognition with CNN Report
44 pages
Citrus Disease Detection System
No ratings yet
Citrus Disease Detection System
53 pages
Abdallah AI Used Waste Management
No ratings yet
Abdallah AI Used Waste Management
16 pages
Malware Detection in Android App Using Static and Dynamic Analysis
No ratings yet
Malware Detection in Android App Using Static and Dynamic Analysis
165 pages
RapportPFE MohamedNourElhak Jouini
No ratings yet
RapportPFE MohamedNourElhak Jouini
73 pages
BE Project Report
No ratings yet
BE Project Report
63 pages
Final Project Review Report-1
No ratings yet
Final Project Review Report-1
78 pages
BE Project Report
No ratings yet
BE Project Report
65 pages
S26 Freehand Drawn Circuit Recognition Report and Paper
No ratings yet
S26 Freehand Drawn Circuit Recognition Report and Paper
54 pages
Software Defect Detection Using Machine Learning
No ratings yet
Software Defect Detection Using Machine Learning
61 pages
Heart Disease Prediction Report
No ratings yet
Heart Disease Prediction Report
113 pages
Report 4
No ratings yet
Report 4
50 pages
Report Finale
No ratings yet
Report Finale
53 pages
A Comparative Analysis of Deep Learning Model For Flower Recognition and Health Prediction
No ratings yet
A Comparative Analysis of Deep Learning Model For Flower Recognition and Health Prediction
87 pages
Project Report Sem II Final
0% (1)
Project Report Sem II Final
102 pages
Machine Learning General Concepts
100% (4)
Machine Learning General Concepts
80 pages
Sign Language Detection System 1 1 1 Copy
No ratings yet
Sign Language Detection System 1 1 1 Copy
60 pages
Project Report Template PICT 1
No ratings yet
Project Report Template PICT 1
36 pages
International University HCMIU VNU Pre Thesis and Thesis LaTeX Template 1
No ratings yet
International University HCMIU VNU Pre Thesis and Thesis LaTeX Template 1
25 pages
Major Report
No ratings yet
Major Report
53 pages
Hybrid Model for Phishing Detection
100% (1)
Hybrid Model for Phishing Detection
96 pages
Thesis
No ratings yet
Thesis
45 pages
B.Tech CSE Project Report
No ratings yet
B.Tech CSE Project Report
39 pages
AI Image Classification With Neural Network 221002564 222002074
No ratings yet
AI Image Classification With Neural Network 221002564 222002074
17 pages
Machine Learning Energy Advisor
No ratings yet
Machine Learning Energy Advisor
93 pages
Machine Learning Systems: Vĳay Janapa Reddi
No ratings yet
Machine Learning Systems: Vĳay Janapa Reddi
1,474 pages
PMIT THESIS PROJECT Template
No ratings yet
PMIT THESIS PROJECT Template
34 pages
Acronyms 10 11
No ratings yet
Acronyms 10 11
3 pages
TP 3
No ratings yet
TP 3
3 pages
Team12 GP Thesis
No ratings yet
Team12 GP Thesis
63 pages
PES1PG21CA154
No ratings yet
PES1PG21CA154
48 pages
Project Report FormatF
No ratings yet
Project Report FormatF
21 pages
Non Sponsored Project Report Sample
No ratings yet
Non Sponsored Project Report Sample
22 pages
IASD Master Thesis
No ratings yet
IASD Master Thesis
48 pages
p6 Aionfpga Thesis Canzani Mueller
No ratings yet
p6 Aionfpga Thesis Canzani Mueller
91 pages
TFM Jenifer Tabita Ciuciu-Kis
No ratings yet
TFM Jenifer Tabita Ciuciu-Kis
83 pages
Image Recognition Using Neural Network & Deep Learning
No ratings yet
Image Recognition Using Neural Network & Deep Learning
60 pages
Output With Borders
No ratings yet
Output With Borders
65 pages
Report
No ratings yet
Report
65 pages
Group No 22 Report
No ratings yet
Group No 22 Report
54 pages
Integrating Image and Text Features For Accurate House Price Estimation
No ratings yet
Integrating Image and Text Features For Accurate House Price Estimation
16 pages
Convolutional CRFs for Fast Semantic Segmentation
No ratings yet
Convolutional CRFs for Fast Semantic Segmentation
12 pages
Deep Learning Approaches For Speech Emotion Recognition: State of The Art and Research Challenges
No ratings yet
Deep Learning Approaches For Speech Emotion Recognition: State of The Art and Research Challenges
68 pages
Chapter10 Keras
No ratings yet
Chapter10 Keras
66 pages
7 Egov
No ratings yet
7 Egov
6 pages
Artificial Intelligence in Agriculture: A Review: Robin Sharma
No ratings yet
Artificial Intelligence in Agriculture: A Review: Robin Sharma
6 pages
Plant Guardian
No ratings yet
Plant Guardian
14 pages
Weapon Detection From Images Using YOLO and OpenCV
No ratings yet
Weapon Detection From Images Using YOLO and OpenCV
6 pages
Virtual HR - Report
No ratings yet
Virtual HR - Report
84 pages
CRM Chatbots
No ratings yet
CRM Chatbots
16 pages
Gupta Robust Object Detection in Challenging Weather Conditions WACV 2024 Paper
No ratings yet
Gupta Robust Object Detection in Challenging Weather Conditions WACV 2024 Paper
10 pages
Open-AI Driven Open-Source Open-Access Sustainable ICs Design Flow
No ratings yet
Open-AI Driven Open-Source Open-Access Sustainable ICs Design Flow
5 pages
Mmep 10.05 23
No ratings yet
Mmep 10.05 23
10 pages
Python 100 Days of Code Roadmap
100% (1)
Python 100 Days of Code Roadmap
13 pages
Learn AI Quantum 2022 PDF
No ratings yet
Learn AI Quantum 2022 PDF
13 pages
Machine Learning and Deep Learning in Medical Data Analytics and Healthcare Applications 1st Edition Unlimited Download
No ratings yet
Machine Learning and Deep Learning in Medical Data Analytics and Healthcare Applications 1st Edition Unlimited Download
16 pages
講演資料3 Preferred-Networks 20161004
No ratings yet
講演資料3 Preferred-Networks 20161004
6 pages
Data Science New Report
No ratings yet
Data Science New Report
39 pages
Self-Learning For Personalized Keyword Spotting On Ultra-Low-Power Audio Sensors
No ratings yet
Self-Learning For Personalized Keyword Spotting On Ultra-Low-Power Audio Sensors
11 pages
Convolutional Neural Networks in Python Master Data Science and Machine Learning With Modern Deep Learning in Python, Theano, and TensorFlow (Machine Learning in Python) by LazyProgrammer
No ratings yet
Convolutional Neural Networks in Python Master Data Science and Machine Learning With Modern Deep Learning in Python, Theano, and TensorFlow (Machine Learning in Python) by LazyProgrammer
183 pages
Deep Learning Models For Crop Disease Classification Enhancing Agricultural Productivity Using AI
No ratings yet
Deep Learning Models For Crop Disease Classification Enhancing Agricultural Productivity Using AI
6 pages
Unit 1 Part 1
No ratings yet
Unit 1 Part 1
61 pages
Direction of Arrival Estimation For Multiple Sound Sources Using Convolutional Recurrent Neural Network
No ratings yet
Direction of Arrival Estimation For Multiple Sound Sources Using Convolutional Recurrent Neural Network
5 pages
Cognitive Computing For Machine Thinking Makarand Ramesh Velankar
No ratings yet
Cognitive Computing For Machine Thinking Makarand Ramesh Velankar
98 pages
Vic Bfra (Bike Form Recognition Analysis) Documentation
No ratings yet
Vic Bfra (Bike Form Recognition Analysis) Documentation
4 pages
I Hate Term Papers
100% (1)
I Hate Term Papers
9 pages
ETI 22618 UT1 Question Bank 2022-23 080323
No ratings yet
ETI 22618 UT1 Question Bank 2022-23 080323
12 pages
Power of Deep Learning For Channel Estimation and Signal Detection in OFDM Systems
No ratings yet
Power of Deep Learning For Channel Estimation and Signal Detection in OFDM Systems
4 pages
Ai in Electronics
100% (1)
Ai in Electronics
24 pages
Yared Wolderufael
No ratings yet
Yared Wolderufael
90 pages
Ansh Vora: Intern
No ratings yet
Ansh Vora: Intern
2 pages

Summer Intern Report

Uploaded by

Summer Intern Report

Uploaded by

BIRD SPECIES IMAGE CLASSIFICATION

Under the Esteemed Guidance of

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Mrs. Jyothi Mr. Uday Kumar

Mrs. Jyothi Project Examiner

We, M. Vedavyas (N200988), B. Gopi (N201028), and J. Venkata Manoj (N200682),

3. Related Works/ Existing Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5

4. Proposed Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-7

5. Algorithm and Working. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-14

Figure-0. Flow Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

Figure-1. Sample Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

Figure-3. Confusion Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Figure-5. Classification Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14

This report documents our two-month summer internship (Remote) at SkillDzire,

2.2 Key Motivation

 Rapid species identification from images captured in natural habitats.

 Support for wildlife monitoring and biodiversity studies.

 Educational tools for students, bird enthusiasts, and researchers.

 Data-driven insights for conservation efforts and population tracking.

1. Wildlife Research and Conservation: Automatically identify bird species in

3. Ecotourism and Birdwatching: Assist tourists and birdwatchers in identifying

5. Wildlife Photography and Media: Assist photographers in quickly labeling

3.1 Related Works and Their Limitations

3.1.1 Traditional Machine Learning Approaches

3.1.2 Deep Learning Approaches

 Limitations: High accuracy on balanced datasets, but struggle with minority

3.1.3 Mobile and Real-time Deployment Studies

 Focused on on-device inference for field birdwatching apps.

 Complex Backgrounds: Wild bird images often have cluttered backgrounds or

 Variations in Lighting and Pose: Birds captured in different lighting conditions

 Limited Transfer Learning Exploration: While pretrained models like AlexNet

 Model Selection: AlexNet, a CNN architecture pretrained on ImageNet, is chosen

 Model Fine-Tuning: The final fully connected layer of AlexNet is modified to

 Training and Validation: The fine-tuned model is trained on the processed

Algorithm 1 Bird Species Classification with Transfer Learning

5.1 Dataset Exploration and Understanding

 Rotation: Random rotations within ±30◦ simulate different viewing angles.

 Flipping: Horizontal and vertical flips ensure orientation invariance.

 Zooming and Cropping: Mimics variations in focal length.

 RandAugment: A policy-based augmentation that applies random transformations,

Formally, if x represents an input image and Ti denotes a transformation, then the

X ′ = {Ti (x) | x ∈ X, i = 1, 2, ..., n}.

Class Balancing: Oversampling and augmentation are applied more aggressively to

Output Verification: Sample augmented images are visualized to confirm transfor-

5.3.1 Model Selection

 Well-suited for medium-scale datasets.

 Transferable to fine-grained classification tasks like bird species recognition.

5.3.2 Model Fine-Tuning

where zj is the logit for class j.

 Loss Function: Categorical cross-entropy, defined as

 Batch size: 32.

 Epochs: 30–50 with early stopping.

5.4 Evaluation and Results

Figure 1: Sample Code

Figure 3: Confusion Matrix

6.1 Technologies and Libraries Used

 Python 3.9: Primary programming language for development.

6.2 System Hardware

 Processor (CPU): Intel Core i7-11700K @ 3.6 GHz, 8 cores, 16 threads

 Storage: 1 TB NVMe SSD for fast read/write operations

 Operating System: Ubuntu 20.04 LTS (Linux environment)

System / Dataset Model Accuracy Key Techniques

 Use of Advanced Architectures: Employing modern deep learning architectures

 Real-Time Deployment: Integrating the model into a mobile or web application

 Explainability and Interpretability: Implementing explainable AI techniques

 Multimodal Approaches: Combining visual data with audio recordings of bird

You might also like

Rapid species identification from images captured in natural habitats.

Support for wildlife monitoring and biodiversity studies.

Educational tools for students, bird enthusiasts, and researchers.

Data-driven insights for conservation efforts and population tracking.

Limitations: High accuracy on balanced datasets, but struggle with minority

Focused on on-device inference for field birdwatching apps.

Complex Backgrounds: Wild bird images often have cluttered backgrounds or

Variations in Lighting and Pose: Birds captured in different lighting conditions

Limited Transfer Learning Exploration: While pretrained models like AlexNet

Model Selection: AlexNet, a CNN architecture pretrained on ImageNet, is chosen

Model Fine-Tuning: The final fully connected layer of AlexNet is modified to

Training and Validation: The fine-tuned model is trained on the processed

Rotation: Random rotations within ±30◦ simulate different viewing angles.

Flipping: Horizontal and vertical flips ensure orientation invariance.

Zooming and Cropping: Mimics variations in focal length.

RandAugment: A policy-based augmentation that applies random transformations,

Well-suited for medium-scale datasets.

Transferable to fine-grained classification tasks like bird species recognition.

Loss Function: Categorical cross-entropy, defined as

Batch size: 32.

Epochs: 30–50 with early stopping.

Python 3.9: Primary programming language for development.

Processor (CPU): Intel Core i7-11700K @ 3.6 GHz, 8 cores, 16 threads

Storage: 1 TB NVMe SSD for fast read/write operations

Operating System: Ubuntu 20.04 LTS (Linux environment)

Use of Advanced Architectures: Employing modern deep learning architectures

Real-Time Deployment: Integrating the model into a mobile or web application

Explainability and Interpretability: Implementing explainable AI techniques

Multimodal Approaches: Combining visual data with audio recordings of bird