B.Tech Project Report
Classification of Endoscopy Images and Web App
Implementation
under the guidance of
(Mr. Sareddy Shiva Reddy)
Submitted By:
1. Zahanat Ali Khan (21011M2210)
2. Priyanshu Kumar (21011M2103)
3. Yashwanth (21011M2220)
Department of Computer Science and Engineering
Jawaharlal Nehru Technological University Hyderabad
University College of Engineering, Science, & Technology Hyderabad
Kukatpally, Hyderabad - 500 085, Telangana, India
DECLARATION BY THE CANDIDATES
We, Zahanat Ali Khan (21011M2210), Priyanshu Kumar (21011M2103), and Yashwanth (21011M2220), hereby declare that the major project entitled “Classification of Endoscopy Images and Web App Implementation”, carried out by us under the guidance of Mr. Sareddy Shiva Reddy, is submitted in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering. This is a record of bona fide work carried out by us, and the results embodied in this project have not been reproduced or copied from any source.
The results embodied in this project report have not been submitted to any other university or institute for the award of any other degree or diploma.
Zahanat Ali Khan (21011M2210)
Priyanshu Kumar (21011M2103)
Yashwanth (21011M2220)
CERTIFIED BY THE SUPERVISOR
This is to certify that the project report entitled “Classification of Endoscopy Images and Web App Implementation”, being submitted by Zahanat Ali Khan (21011M2210), Priyanshu Kumar (21011M2103), and Yashwanth (21011M2220) in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering, is a record of bona fide work carried out by them. The results embodied in this project have not been reproduced or copied from any source.
The results embodied in this project report have not been submitted to any other university or institute for the award of any other degree or diploma.
Mr. Sareddy Shiva Reddy
Assistant Professor,
Department of Computer Science and Engineering,
JNTUH UCESTH
CERTIFIED BY THE HEAD OF DEPARTMENT
This is to certify that the project report entitled “Classification of Endoscopy Images and Web App Implementation”, being submitted by Zahanat Ali Khan (21011M2210), Priyanshu Kumar (21011M2103), and Yashwanth (21011M2220) in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering, is a record of bona fide work carried out by them. The results embodied in this project have not been reproduced or copied from any source.
The results embodied in this project report have not been submitted to any other university or institute for the award of any other degree or diploma.
Dr. K P Supreethi
Professor & Head of Department,
Department of Computer Science and Engineering,
JNTUH UCESTH
ACKNOWLEDGEMENT
We wish to extend our sincere gratitude to our supervisor, Mr. Sareddy Shiva Reddy, Assistant Professor, Department of Computer Science & Engineering, for being a driving force all through the way. This project would not have been so smooth and so interesting without his encouragement.
We also wish to express our gratitude to Dr. K P Supreethi, Professor and Head, Department of Computer Science & Engineering, for providing the necessary computing facilities. We are indebted to the Department of Computer Science & Engineering, JNTUH, for providing us with all the required facilities to carry out our project work in a congenial environment. We also extend our gratitude to the CSE Department staff for providing the necessary support from time to time whenever requested.
Zahanat Ali Khan (21011M2210)
Priyanshu Kumar (21011M2103)
Yashwanth (21011M2220)
ABSTRACT
Endoscopy is a widely used procedure in the medical field to assess and examine the gastrointestinal (GI) tract, detect various types of diseases, and provide a better diagnosis than manual diagnosis, which poses many challenges. In this ever-advancing world, the widespread use of Artificial Intelligence has made everyday tasks considerably easier, and it would be even more beneficial if applied in domains dominated by manual labour and tight time constraints; doing so could greatly advance society and help a vast majority of the ~2.86 billion people who are diagnosed with some kind of gastrointestinal (GI) disease. Our use of a Convolutional Neural Network (CNN) stands out, as it is optimized and developed to help classify the large datasets obtained during the process of capsule endoscopy. In capsule endoscopy, a pill-shaped capsule produces large volumes of images that professionals otherwise have to assess manually. By adopting our method, we achieved a top-1 accuracy of 92% and an F1 score of 92%. Our study highlights the potential of classifying endoscopy images to improve the diagnosis of GI-related diseases, marking a step towards more accurate and efficient diagnosis.
KEYWORDS: Classification, CNN, Capsule Endoscopy
Contents
1. Theoretical
1.1. Introduction
1.2. Problem Statement
1.3. Strategies for Building AI Model
1.3.1. Introduction
1.3.2. Convolution Layer
1.3.3. Pooling Layer
1.3.4. Activation Functions
1.3.5. Feature Maps
1.3.6. Optimization Algorithms
2. Experimental
2.1. Introduction
2.2. Methodology
2.2.1. Sample preparation
2.2.2. Data Preprocessing
2.2.3. Model Building
2.2.3.1. Model Architecture
2.2.3.2. Model Training
2.2.3.3. Model Evaluation
2.2.3.4. Model Saving
2.2.4. Evaluation and Visualization
2.2.5. Results
2.2.6. Web Application
References....................................................................................................
1. Theoretical
This part of the report covers the theory and concepts applied throughout the development of the project.
1.1. Introduction
The gastrointestinal (GI) tract can be examined in several ways, one of the most prominent and accurate methods being capsule endoscopy (Fig. 1). In this non-invasive procedure, a capsule roughly the size of a medicine pill is swallowed through the mouth and captures videos and photos as it passes through the GI tract.
According to the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), more than 60 million people are affected by medical conditions related to the GI tract. Diseases associated with the GI tract were responsible for a crude rate of 37.2 deaths per 100,000 population in the Region of the Americas in 2019.
It remains quite challenging to accurately identify a handful of abnormalities among the several thousand photos and videos captured by the capsule, and this impediment has limited the expansion of the technology. Many medical imaging challenges, such as lung segmentation and brain tumor classification, have previously made extensive use of neural network (NN) based models. Developing a tool, or even presenting an idea, to overcome such a labour-intensive scenario could be beneficial for quick assessment and diagnostics, helping medical professionals save time and make quicker decisions, thereby impacting the livelihood of patients and contributing to the safety of society.
Image classification is a complex methodology, in terms of architecture and design, to implement effectively and accurately enough for plausible use in the medical field. The classification has to take into consideration many factors, such as the complexity and variability of the images and videos captured by the endoscopy capsule. A single session recorded by an endoscopy capsule can yield several thousand images and videos, which makes classifying them time-consuming even for highly trained medical staff. Bringing an automated solution to the process of image classification could therefore be, under the guidance of a medical professional, a great step towards diagnosing GI-related diseases.
1.2. Problem Statement
The aim of this project is to provide a solution for the classification of images produced/collected by the non-invasive procedure of capsule endoscopy, characterize them into predefined classes, and apply supervised learning to provide automated classification of images via an easily accessible user-interface website.
Objective: To develop a robust machine learning model (e.g., a Convolutional Neural Network) that can automatically classify capsule endoscopy images into these categories. The model should be able to:
1. Identify abnormalities: Detect and classify visible abnormalities in the GI tract, such as ulcers, polyps, bleeding, or signs of Crohn's disease.
2. Accuracy and Reliability: Achieve high classification accuracy, minimizing false positives and false negatives.
3. Scalability: Handle large datasets, as capsule endoscopy procedures generate a vast number of images.
4. Interpretability: Provide interpretable results to help clinicians understand the model's decision-making process, potentially offering insight into the severity and nature of detected conditions.
Challenges:
1. Data Variability: Capsule endoscopy images can vary significantly in terms of lighting, angle, and quality.
2. Class Imbalance: Some categories (such as 'Normal') may have far more examples than others (such as 'Tumor'), leading to class imbalance that could affect model performance.
3. Complexity of Visual Features: Gastrointestinal diseases may present with subtle visual features that require advanced image analysis techniques to detect.
4. Generalization: The model needs to generalize well across different patient demographics and types of abnormalities.
Expected Outcomes:
1. Trained Model: A trained machine learning model capable of accurately classifying capsule endoscopy images into the specified categories.
2. Evaluation Metrics: Quantitative performance metrics, such as accuracy, precision, recall, and F1 score, to assess model effectiveness.
3. Clinical Utility: A prototype that can be integrated into clinical settings, assisting gastroenterologists in diagnosing GI diseases more efficiently and accurately.
1.3. Strategies for Building AI Model
This subsection discusses the concepts utilized towards the development of the AI Model.
1.3.1.Introduction
There are several AI models that could be implemented to bring this project to life; for the entirety of this project we implement a very widely accepted and widely used model known as the Neural Network. Within this family, a prominent model known as the Convolutional Neural Network, or CNN, will be implemented.
A Convolutional Neural Network, or CNN, is a specific type of neural network that is composed of several layers, as seen in Fig. 2.
Fig. 2
1.3.2.Convolution Layer
This layer uses filters that perform convolution operations as they scan the image, which helps identify patterns and features within it. The convolution operation involves sliding a small matrix, called a filter or kernel, over the input image and computing a dot product between the filter and each overlapping region of the image. This produces a two-dimensional feature map that highlights the presence of certain patterns in the input image, as seen in Fig. 3.
Hyperparameters include the filter size and stride, which can be fine-tuned to optimize the model's performance.
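To make the sliding dot product concrete, the minimal NumPy sketch below (an illustration only, not part of the project code) computes a valid 2D convolution of a small made-up grayscale image with a 3x3 vertical-edge kernel.

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` and take the dot product at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out_h = (ih - kh) // stride + 1
    out_w = (iw - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            feature_map[i, j] = np.sum(patch * kernel)  # element-wise product, then sum
    return feature_map

# Hypothetical 5x5 grayscale image and a 3x3 vertical-edge kernel
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

print(convolve2d(image, kernel))  # 3x3 feature map highlighting the vertical edge
```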
1.3.3.Pooling Layer
This is a downsampling operation performed on the feature maps generated by the convolution layer. Taking into consideration only the crucial details, the pooling layer takes the feature map and shrinks it down. Max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively, as seen in Fig. 4.
Fig. 4
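As a small illustration (not taken from the project code), the NumPy sketch below applies 2x2 max pooling and average pooling with a stride of 2 to a toy 4x4 feature map.

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Downsample a 2D feature map by taking the max or mean of each window."""
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    reduce_fn = np.max if mode == "max" else np.mean
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + size, j * stride:j * stride + size]
            pooled[i, j] = reduce_fn(window)
    return pooled

fm = np.array([[1, 3, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]], dtype=float)

print(pool2d(fm, mode="max"))      # [[6. 8.] [3. 4.]]
print(pool2d(fm, mode="average"))  # [[3.75 5.25] [2.   2.  ]]
```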
1.3.4.Activation Functions
Activation functions are mathematical functions applied to the output of a neural network layer; they transform the input to better suit the next layer and add nonlinearity to the network. Several varieties of activation functions exist, a few widely used ones being the Rectified Linear Unit (ReLU) and SoftMax. Selecting the right activation function can significantly improve the performance of the model.
Rectified Linear Unit (ReLU): As the name suggests, ReLU is linear when the given input is positive and zero in all other cases. This helps keep gradients from saturating. Refer to Fig. 5.
Fig. 5
SoftMax: The softmax function converts a vector of K real numbers into a probability distribution over K possible outcomes. It is a generalization of the logistic function to multiple dimensions and is used in multinomial logistic regression. Refer to Fig. 6 for a graphical representation and the formula.
Fig. 6
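For reference, the short sketch below (illustrative, not project code) implements ReLU and a numerically stable softmax in NumPy, matching the definitions above.

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x): identity for positive inputs, zero otherwise."""
    return np.maximum(0.0, x)

def softmax(z):
    """Convert a vector of K real numbers into a probability distribution."""
    shifted = z - np.max(z)          # subtract the max for numerical stability
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

logits = np.array([2.0, -1.0, 0.5])
print(relu(logits))                  # [2.  0.  0.5]
print(softmax(logits))               # probabilities that sum to 1
```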
1.3.5.Feature Maps
These are 2D arrays of numbers that represent the distinctive features detected in a given input. Each element in the array represents the activation of a particular neuron in the layer. The deeper the input travels through the network, the more complex the features captured by the feature maps become. Feature maps contribute to the overall performance of the model and provide reduced dimensionality with improved interpretability.
1.3.6.Optimization Algorithms
Optimization algorithms are key elements in machine learning for developing learning models. Their objective is to minimize the cost and loss functions; the cost function measures how effectively the network performs its task. The most popular optimization approach in deep learning is gradient descent. It operates by calculating the cost function's gradient with respect to the network parameters and changing the parameters in the direction opposite to the gradient. The optimizer used in building our model is the Adam optimizer, short for Adaptive Moment Estimation, which helps minimize the loss function during the training of the neural network.
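To make the update rule concrete, the sketch below (a toy example, not the project's training loop) performs plain gradient-descent steps on a simple quadratic cost, and then shows how the Adam optimizer is typically selected in TensorFlow/Keras; the learning rates here are placeholder values.

```python
import numpy as np
import tensorflow as tf

# Plain gradient descent on cost(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)
theta, learning_rate = 0.0, 0.1
for step in range(50):
    grad = 2.0 * (theta - 3.0)
    theta -= learning_rate * grad    # move opposite to the gradient
print(round(theta, 4))               # approaches the minimum at theta = 3

# Adam (Adaptive Moment Estimation) is chosen when compiling a Keras model, e.g.:
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
```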
2. Experimental
This part of the report covers the experimental work and the implementation of the theoretical part of this project.
2.1. Introduction
As the implementation of the theory takes place, the most crucial part of the entire project was to collect data and build a dataset to experiment on. After extensive research and networking, we settled on using a dataset provided by the OSF repository named the “Kvasir-Capsule Dataset”. This dataset was made public to help researchers develop machine learning algorithms such as ours. In its entirety it contains 47,238 medically labeled frames of endoscopy images and a total of 117 videos, from which several GBs of additional frames can be extracted. We are grateful to the researchers who have put in their time and effort to make this dataset open to the public. The 14 classes of disease are labeled and shown in Fig. 7.
Fig.7
2.2. Methodology
This subsection covers the methodology and the work carried out throughout the development of the project.
2.2.1.Sample Preparation
To utilize the Kvasir-Capsule dataset, we had to segregate the entire dataset into three distinct parts: 1. training, 2. validation, and 3. testing data. This step is crucial as it lets us train, validate, and test the model.
We achieved this by utilizing the Python library “split-folders”. This library splits the data across multiple folders in a ratio provided by the user. For our project we used this library in a small script that splits the main dataset into three distinct folders in the ratio of 60:20:20 for training, validation, and testing respectively. We also extract frames from the videos that are part of the Kvasir-Capsule dataset and use them in addition to our testing set, as sketched below.
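A minimal sketch of the splitting script is shown below; the folder paths are placeholders, and the call assumes the split-folders package (imported as `splitfolders`), which exposes a `ratio` function for exactly this kind of 60:20:20 split.

```python
# pip install split-folders
import splitfolders

# Hypothetical paths: the labelled Kvasir-Capsule frames live in one folder per class.
INPUT_DIR = "kvasir_capsule/labelled_images"   # placeholder path
OUTPUT_DIR = "kvasir_capsule/split"            # will contain train/, val/, test/

# Split every class folder into 60% train, 20% validation, 20% test.
splitfolders.ratio(INPUT_DIR, output=OUTPUT_DIR, seed=42, ratio=(0.6, 0.2, 0.2))
```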
2.2.2.Data Preprocessing
The labeled and split images of the Kvasir-Capsule dataset are loaded into the program using TensorFlow, a Python library widely used to develop machine learning algorithms. The loaded images are resized to 128x128 pixels and converted to grayscale to minimize the workload and maximize efficiency.
The before and after of this preprocessing can be seen in Fig. 8.
Fig. 8
This preprocessing is applied to both the training set and the validation set; these sets are used to train our model to classify the images and later to validate the model, assess the accuracy of the algorithm, and fine-tune it accordingly.
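A sketch of this loading and preprocessing step is given below, assuming the folder layout produced by the earlier splitting script (placeholder paths); it uses TensorFlow's `image_dataset_from_directory` utility to load the images as 128x128 grayscale with integer labels.

```python
import tensorflow as tf

IMG_SIZE = (128, 128)   # target resolution
BATCH_SIZE = 32

# Hypothetical folder layout produced by the 60:20:20 split (one subfolder per class).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "kvasir_capsule/split/train",
    label_mode="int",            # integer labels for sparse categorical cross-entropy
    color_mode="grayscale",      # convert images to a single channel
    image_size=IMG_SIZE,         # resize to 128x128
    batch_size=BATCH_SIZE,
    shuffle=True,
    seed=42,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "kvasir_capsule/split/val",
    label_mode="int",
    color_mode="grayscale",
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    shuffle=False,
)
```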
2.2.3.Model Building
After the data is ready to be used, we have to build the model, then train and validate it. The model is built using the previously mentioned Python library, TensorFlow.
The following is the process of building our model and putting it through assessment.
2.2.3.1.Model Architecture
The machine learning concept we apply is the Convolutional Neural Network, or CNN. To define this CNN we use the TensorFlow library, utilizing the concepts of the convolution layer, pooling layer, feature mapping, fully connected layer, input layer, and activation functions. The applications of these concepts are well explained and structured within the documentation provided by TensorFlow.
The model architecture being defined is visually represented in Fig. 9.
Fig. 9
The CNN architecture contains the following hierarchical structure:
Input Layer: The raw input data (for this project, an image) is presented to the network after data preprocessing as a 128x128 grayscale image.
Convolutional Layers: These apply convolutional filters (32-512) to the input, performing edge detection, texture extraction, and capturing more sophisticated features in deeper layers. Each filter convolves over the input, producing a feature map that highlights specific features of the input image. This lets the model learn local patterns.
Activation Function: After every convolution, a nonlinear activation function such as ReLU is applied to introduce non-linearity and help the model capture complexity.
Pooling Layers: For this project, pooling such as max pooling is applied, as it reduces the height and width of the spatial dimensions of the feature maps; this reduces the computational complexity and enhances robustness to small input translations. For example, max pooling chooses the maximum value from each patch of the feature map.
Flattening Layer: Finally, after the convolutional and pooling layers have been repeated several times, the 2D feature maps are flattened into a 1D vector that serves as input to the fully connected layers. Here, the extracted features are prepared for classification or regression.
Fully Connected Layers: These are fully connected to the previous layer and combine what was learned by the convolutional layers. The last fully connected layer outputs the class probabilities in the case of classification.
Output Layer: The final output layer uses an activation such as the softmax function and has 14 output neurons representing the 14 classes of the Kvasir-Capsule endoscopy images. A sketch of such a layer stack is given below.
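The Keras sketch below follows the layer stack described above (filters growing from 32 to 512, ReLU activations, max pooling, a flattening layer, and a 14-way softmax output). The exact layer counts, filter sizes, and dense-layer width of the project's model are those shown in Fig. 9; the values here are an illustrative approximation, not the exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 14  # the 14 Kvasir-Capsule classes

model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),                    # 128x128 grayscale input
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(256, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(512, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                                      # 2D feature maps -> 1D vector
    layers.Dense(256, activation="relu"),                  # fully connected layer
    layers.Dense(NUM_CLASSES, activation="softmax"),       # class probabilities
])
model.summary()
```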
2.2.3.2.Model Training
The CNN is compiled in accordance with the above architecture, along with an Adam optimizer and the sparse categorical cross-entropy loss function, and is trained on the training data for 10 epochs initially. The summary of the CNN model prior to training can be seen in Fig. 10.
Fig. 10
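A sketch of the compile-and-train step is shown below, using the optimizer, loss, and epoch count described above; `model`, `train_ds`, and `val_ds` are the objects built in the earlier sketches.

```python
# Compile with the Adam optimizer and sparse categorical cross-entropy,
# then train for 10 epochs while monitoring the validation set.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=10,
)
```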
2.2.3.3.Model Evaluation
The model is evaluated after it is trained to generate the loss and accuracy of the model for the training set and the validation set, in order to assess the results and make any changes required to improve the results and the efficiency.
The evaluated loss and accuracy for the training set and validation set are shown in Fig. 11 and Fig. 12 respectively.
Fig. 11
Fig. 12
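The evaluation itself can be expressed as in the short sketch below (assuming the `model`, `train_ds`, and `val_ds` objects from the earlier sketches).

```python
# Report loss and accuracy on both splits after training.
train_loss, train_acc = model.evaluate(train_ds)
val_loss, val_acc = model.evaluate(val_ds)
print(f"Training   - loss: {train_loss:.4f}, accuracy: {train_acc:.4f}")
print(f"Validation - loss: {val_loss:.4f}, accuracy: {val_acc:.4f}")
```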
2.2.3.4.Model Saving
Once the trained model has met all the satisfactory requirements, it is saved in the Keras format so that it can be reused later, for example in a web application.
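Saving and reloading in the Keras format reduces to a pair of calls, sketched below with a placeholder filename.

```python
import tensorflow as tf

# Save the trained model in the native Keras format (placeholder filename).
model.save("capsule_endoscopy_cnn.keras")

# Later, e.g. inside the web application, the model can be restored with:
restored = tf.keras.models.load_model("capsule_endoscopy_cnn.keras")
```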
2.2.4.Evaluation and Visualization
Several metrics have been used to evaluate the model, and they have also been represented in a visual format as described below:
Accuracy Visualization: The accuracy obtained by the model for the training set and the validation set has been visualized in a graphical representation as shown in Fig. 13.
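A sketch of how such a plot can be produced from the Keras training history (the `history` object returned by `model.fit` in the earlier sketch) is shown below using Matplotlib.

```python
import matplotlib.pyplot as plt

# Plot training vs. validation accuracy per epoch from the Keras History object.
plt.plot(history.history["accuracy"], label="Training accuracy")
plt.plot(history.history["val_accuracy"], label="Validation accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.title("Model accuracy per epoch")
plt.legend()
plt.show()
```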
Confusion Matrix: This metric summarizes the performance of the model on a set of test data. The confusion matrix visualization is shown in Fig. 14.
Fig. 14
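A sketch of computing and plotting the confusion matrix with scikit-learn is given below; `test_ds` is assumed to be the test split loaded the same way as the training data in the earlier sketch.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Collect true labels and model predictions over the test dataset.
y_true, y_pred = [], []
for images, labels in test_ds:
    probs = model.predict(images, verbose=0)
    y_true.extend(labels.numpy())
    y_pred.extend(np.argmax(probs, axis=1))

cm = confusion_matrix(y_true, y_pred)
ConfusionMatrixDisplay(confusion_matrix=cm).plot(xticks_rotation=45)
plt.title("Confusion matrix on the test set")
plt.show()
```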
2.2.5.Results
The model is able to classify the Endoscopy images with an accuracy of 95%.
The model has more room for improvement, as the training data has shown a training accuracy of 98.9%. The classification matrix of the model is shown in Fig. 15.
Fig. 15
2.2.6.Web Application
The development of a web application provides a neat interface that lets medical experts with no programming background classify these images with ease.
We implemented the web app with the Streamlit library and integrated our AI model into it so that it can be used with no difficulty.
The website is divided into three parts: Home, About, and Disease Detection.
Home: This webpage welcomes the user to the website; its UI is shown in Fig. 16. From here the user can navigate to the About and Disease Detection pages using the “Dashboard” dropdown menu.
Fig. 16
About: This webpage explains to the user the intent behind the website and the goal we sought to achieve in building it, as shown in Fig. 17. It also explains the workings of the Disease Detection page, and the user can navigate to the other pages using the “Dashboard” dropdown menu.
Fig. 17
Disease Detection: This webpage is essentially the heart of the entire project and allows the user to access the model by providing an input image and receiving a prediction as output. The page also lets the user view the uploaded file by clicking the button labeled “Show Image”. The entire page is shown in Fig. 18, and a condensed sketch of its flow follows below.
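The Streamlit sketch below condenses the Disease Detection flow described above. The model filename and class-name list are placeholders, but the overall pattern (upload an image, optionally display it with “Show Image”, preprocess it to 128x128 grayscale like the training data, and report the predicted class) follows the description above.

```python
import numpy as np
import streamlit as st
import tensorflow as tf
from PIL import Image

# Placeholder model filename and class labels; the real app uses the saved Keras model
# and the 14 Kvasir-Capsule class names.
model = tf.keras.models.load_model("capsule_endoscopy_cnn.keras")
CLASS_NAMES = [f"class_{i}" for i in range(14)]  # placeholder labels

st.title("Disease Detection")
uploaded = st.file_uploader("Upload an endoscopy image", type=["jpg", "jpeg", "png"])

if uploaded is not None:
    image = Image.open(uploaded)
    if st.button("Show Image"):
        st.image(image, caption="Uploaded image")

    # Preprocess like the training data: grayscale, 128x128, add a batch dimension.
    array = np.array(image.convert("L").resize((128, 128)), dtype=np.float32)
    array = array.reshape(1, 128, 128, 1)

    probs = model.predict(array)
    st.write("Predicted class:", CLASS_NAMES[int(np.argmax(probs))])
```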
References
[1] Kvasir-Capsule Data by OSF Repository [Online]. Available: https://osf.io/dv2ag/
[2] P. Wang, S. M. Krishnan, C. Kugean, and M. P. Tjoa, “Classification of endoscopic images based on texture and neural network” [Online]. Available: https://ieeexplore.ieee.org/document/1019637
[3] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, et al., “TensorFlow: Large-scale machine learning on heterogeneous distributed systems” [Online]. Available: https://arxiv.org/abs/1603.04467
[4] Yaw Afriyie, Benjamin A. Weyori, and Alex A. Opoku, “Gastrointestinal tract disease recognition based on denoising capsule network” [Online]. Available: https://www.tandfonline.com/doi/full/10.1080/23311916.2022.2142072#abstract
[5] Y. Afriyie, B. A. Weyori, and A. A. Opoku, “Exploring optimised capsule network on complex images for medical diagnosis” [Online]. Available: https://ieeexplore.ieee.org/abstract/document/9682081
[6] Curtis P. Langlotz, Bibb Allen, Bradley J. Erickson, Jayashree Kalpathy-Cramer, et al., “A Roadmap for Foundational Research on Artificial Intelligence in Medical Imaging: From the 2018 NIH/RSNA/ACR/The Academy Workshop” [Online]. Available: https://pubs.rsna.org/doi/full/10.1148/radiol.2019190613#