0% found this document useful (0 votes)

13 views7 pages

NanoNet Real Time Polyp Segmentation in

The document presents NanoNet, a lightweight deep learning architecture designed for real-time segmentation of polyps in video capsule endoscopy and colonoscopy images. It addresses challenges in obtaining annotated datasets and offers a model with significantly fewer parameters, enabling efficient performance on low-end hardware. The proposed architecture demonstrates improved segmentation accuracy and speed compared to traditional methods, making it suitable for clinical applications.

Uploaded by

Sunita Borse

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

13 views7 pages

NanoNet Real Time Polyp Segmentation in

Uploaded by

Sunita Borse

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 7

1

NanoNet: Real-Time Polyp Segmentation in Video

Capsule Endoscopy and Colonoscopy
Debesh Jha∗† , Nikhil Kumar Tomar∗ , Sharib Ali§ , Michael A. Riegler∗ ,
Håvard D. Johansen† , Dag Johansen† , Thomas de Lange¶k∗∗†† , Pål Halvorsen∗‡
∗ SimulaMet, Norway † UiT
The Arctic University of Norway, Norway ‡ Oslo Metropolitan University, Norway
§ Institute of Biomedical Engineering, University of Oxford, Oxford, UK
¶ Department of Medical Research, Bærum Hospital, Norway k Augere Medical AS, Norway
∗∗ Medical Department, Sahlgrenska University Hospital-Mölndal Hospital, Sweden
†† Department of Molecular and Clinical Medicine, Sahlgrenska Academy, University of Gothenburg, Sweden
arXiv:2104.11138v1 [eess.IV] 22 Apr 2021

Abstract—Deep learning in gastrointestinal endoscopy can and Video Capsule Endoscopy (VCE) are non-invasive tests.
assist to improve clinical performance and be helpful to assess VCE is a technology for capturing the video inside the GI
lesions more accurately. To this extent, semantic segmentation tract. It has evolved as an important tool for detecting small
methods that can perform automated real-time delineation of a
region-of-interest, e.g., boundary identification of cancer or pre- bowel diseases [2].
cancerous lesions, can benefit both diagnosis and interventions. Deep Learning (DL) methods have made a significant break-
However, accurate and real-time segmentation of endoscopic im- through in several medical domain such as lung cancer detec-
ages is extremely challenging due to its high operator dependence tion [3], diabetic retinopathy progression [4], and obstructive
and high-definition image quality. To utilize automated methods hypertrophic cardiomyopathy detection [5]. It has provided
in clinical settings, it is crucial to design lightweight models
with low latency such that they can be integrated with low-end new opportunities to solve challenges such as bleeding, light
endoscope hardware devices. In this work, we propose NanoNet, over/underexposure, smoke, and reflections [6]. However, DL
a novel architecture for the segmentation of video capsule normally needs a large annotated dataset for the implemen-
endoscopy and colonoscopy images. Our proposed architecture tation of methods. It is difficult to obtain a labeled medical
allows real-time performance and has higher segmentation accu- dataset. First, it needs collaborations with the hospitals. For
racy compared to other more complex ones. We use video capsule
endoscopy and standard colonoscopy datasets with polyps, and a data collection, the doctors require approval from various
dataset consisting of endoscopy biopsies and surgical instruments, authorities and patient consent. They need to set protocols
to evaluate the effectiveness of our approach. Our experiments for the collection, and the collected data must be anonymized
demonstrate the increased performance of our architecture in and cleaned with the help of data engineers. Domain experts
terms of a trade-off between model complexity, speed, model must label raw data, and after labeling, the annotations must be
parameters, and metric performances. Moreover, the resulting
models´ size is relatively tiny, with only nearly 36,000 parameters done depending upon the need of the task. The whole process
compared to traditional deep learning approaches having millions requires an significant amount of expert time and is costly.
of parameters. Additionally, it is an operator-dependent process. The quality
Index Terms—Video capsule endoscopy, colonoscopy, deep of the data labeling and annotation depends on the expertise
learning, segmentation, tool segmentation of the clinicians. Therefore, it is challenging to curate a larger
dataset.
One way of solving the dataset issue is to create synthetic
I. I NTRODUCTION images using Genearative Adversarial Network (GAN) [7].
Gastrointestinal (GI) endoscopy is a widely used technique However, generated synthetic images may not always capture
to diagnose and treat anomalies in the upper (esophagus, all the properties and characteristics of real endoscopic im-
stomach, and duodenum) and the lower (large bowel and ages. Consequently, the model may only learn to predict the
anus) GI tract. Among the other GI tract organs, colorectal properties from the synthetic images and may not perform
cancer (CRC) has the highest cancer incidences and mortality well on a real endoscopic dataset. Another solution could
rate [1]. There are several CRC screening options. Theses are be domain adaptation from a similar endoscopic dataset.
usually divided into two categories, namely, invasive (visual However, we lack large publicly available labeled endoscopic
examination-based test) and non-invasive based tests (stool, datasets. Thus, a viable and compelling approach to solve the
blood, and radiological test). Colonoscopy, the gold standard semantic segmentation task is to reuse ImageNet pre-trained
for examining the large bowel (colon and rectum), is an encoders in the segmentation model [8]. The predicted masks
invasive examination used to detect, observe, and remove ab- from the algorithm can provide reliable information to the
normalities (such as polyps). It detects colorectal cancer with endoscopic model.
both high sensitivity and specificity. Sigmoidscopy is another A lightweight Convolutional Neural Network (CNN) model
invasive test. Computed Tomography(CT) Colonoscopy, Fecal can be essential for the development of real-time and efficient
Occult Blood Test (FOBT) Fecal Immunochemical Test (FIT), semantic segmentation methods. Usually, lightweight models
2

are computationally efficient and require less memory. A to minimize the research gap towards building a clinically
smaller number of parameters makes the network less redun- relevant model.
dant. Lightweight CNN models are mainly being deployed
in mobile applications [9]. A lightweight model can play a B. Lightweight model
crucial role from a system perspective with a limited resource There are few works in the literature that have proposed
constraint for real-time prediction in clinics. Consequently, lightweight models for image segmentation. Ni et al. [29]
we propose a novel architecture, NanoNet, optimized for presented a novel bilinear attention network-based approach
faster inference and high accuracy. An extremely lightweight with an adaptive receptive field for the segmentation of
model with very few trainable parameters, faster inference, and surgical instruments. Wang et al. [30] proposed a lightweight
higher performance would require less memory footprint to be encoder-decoder network (LEDNet), an encoder-decoder
incorporated with any devices. Therefore, we put forward this network that uses ResNet50 in the encoder block and
approach to address the challenges in endoscopy. attention pyramidal network in the decoder block. Beheshti
The main contributions of this work include the following: et al. [31] proposed SqueezeNet. The architecture of the
1) We proposed a novel architecture, named NanoNet, to SqueezeNet is inspired by UNet [32]. The proposed model
segment video capsule endoscopy and colonoscopy im- obtained a 12× reduction in model size and showed efficient
ages in real-time with high accuracy. The proposed archi- performance in multiplication accumulation (mac) and
tecture is very lightweight, and the model size is smaller, memory uses.
requiring less computational cost.
2) VCE datasets are difficult to obtain with pixel-wise From the above-related work, we identify a need for a real-
annotations. In this context, we have annotated 55 polyps time polyp segmentation method. A real-time polyp segmenta-
from the “polyp” class of the Kvasir-Capsule dataset with tion method can be achieved by building a lightweight network
the help of an expert gastroenterologist. We have made architecture by designing an efficient network with blocks that
this dataset public and provided the benchmark. require fewer parameters. A lower number of network param-
3) NanoNet achieves promising performance on the eters will reduce the network complexity, leading to real-time
KvasirCapsule-SEG, Kvasir-SEG [10], 2020 Medico au- or faster inference. In this respect, we propose NanoNet, which
tomatic polyp segmentation challenge [11], 2020 Endo- uses a lightweight pre-trained network MobileNetV2 [33],
Tect challenge [12], and Kvasir-Instrument [13] datasets. and simple convolutional blocks such as residual block and
All experiments conform with state-of-the-art (SOTA) in squeeze and excite block.
terms of parameter uses (size), speed, computation, and
performance metrics. III. N ETWORK ARCHITECTURE
4) The model can be integrated with mobile and embedded The architecture of NanoNet follows an encoder-decoder
devices because of fewer parameters used in the network. approach as shown in Figure 1. As depicted in Figure 1, the
network architecture uses a pre-trained model as an encoder,
II. R ELATED WORK followed by the three decoder blocks. Using pre-trained Ima-
A. Semantic segmentation of endoscopic images geNet [34] models for transfer learning has become the best
Semantic segmentation of endoscopic images has been a choice for many CNN architectures [8], [25]. It helps the
well-established topic in medical image segmentation. Earlier model converge much faster and achieves high performance
work mostly relied on the handcrafted descriptors for feature compared to the non-pre-trained model. The proposed archi-
learning [14], [15]. The handcrafted features such as color, tecture uses a MobileNetV2 [33] model pre-trained on the
shape, texture, and edges were extracted and fed to the Ma- ImageNet [34] dataset as the encoder. The decoder is built
chine Learning (ML) classifier, which separates lesions from using a modified version of the residual block, which was
the background. However, the traditional ML methods based initially introduced by He et al. [35]. The encoder is used to
on handcrafted features suffer from low performance [16]. The capture the required contextual information from the input,
recent works on polyp segmentation using both video capsule whereas the decoder is used to generate the final output by
endoscopy and colonoscopy mostly relied on Deep Neural using the contextual information extracted by the encoder.
Network (DNN) [17]–[23].
With the DNN methods, there is progress in the performance A. MobileNetV2
for segmenting endoscopic images (for example, polyps). The MobileNetV2 [33] is an architecture that is primarily
However, the network architectures are often complex and designed for mobile and embedded devices. The architecture
requires high-end GPUs for training, and is computationally performed well on a variety of different datasets while main-
expensive [22], [24], [25]. Additionally, real-time lesion seg- taining high accuracy, despite having fewer parameters. The
mentation has often been ignored. Although there is some architecture of MobileNetV2 is based on the architecture of
recent initiation for the real-time detection of endoscopic MobileNetV1, which uses depth-wise separable convolutions
images, they have mostly used private datasets [26]–[28] for as the main building block. A depth-wise separable convolu-
the experimentation. It is difficult to compare the new methods tion consists of depth-wise convolution followed by a point-
on these datasets and extend the benchmark. Therefore, there wise convolution. The MobileNetV2 introduces two main
is a need for a benchmark on publicly available datasets ideas: inverted residual block and linear bottleneck block [33].
3

standard convolution has a linear activation before an element-

wise addition is performed with the identity mapping.

B. Modified Residual Block

The original residual block uses two 3 × 3 standard con-
volutions, where the first convolution is followed by a batch-
normalization and a ReLU activation function. After that, the
second convolution is followed only by a batch-normalization.
An element-wise addition is performed between the output of
the batch-normalization and the identity mapping, followed by
another ReLU activation function. An identity mapping con-
sists of a 1×1 standard convolution and a batch-normalization
over the original input.
We have modified the residual block for our network. The
modified residual block starts with a 1×1 convolution followed
by a 3 × 3 convolution. In both of these convolutions, we
reduce the number of filters by 41 , which are then followed by
the batch normalization and the ReLU activation function. We
have a 3 × 3 convolution with batch normalization. Now, we
perform an element-wise addition with the identity mapping.
Finally, we apply a ReLU activation function followed by
the squeeze and excitation block. The squeeze and excitation
block improves the quality of feature maps by increasing their
sensitivity towards essential features.

C. The NanoNet architecture

Figure 1 shows the block diagram of the NanoNet archi-
tecture. The NanoNet architecture starts with a pre-trained
MobileNetV2 as an encoder followed by a decoder. There
is a modified residual block between the encoder and the
decoder, which acts like a bridge that connects the encoder
and the decoder. In the first step, we feed the image data into
the pre-trained encoder. The pre-trained encoder starts with a
standard convolution with 32 feature channels, followed by the
bottleneck layer with ReLU6 as the activation function. All the
Fig. 1: Overview of the proposed NanoNet architecture convolution operations use a standard 3 × 3 kernel size. The
entire encoder network progressively downsamples the feature
maps by using strided convolution and slowly increases the
number of feature channels alternatively.
The inverted residual block is based on the bottleneck The output from the pre-trained encoder passes through the
residual block as described in the [35], which consists of modified residual block, which is fed to the decoder. Every
three standard convolutions, which are 1 × 1, 3 × 3, and step in the decoder uses a bilinear upsampling to increase the
1 × 1. Every convolution layer is followed by a Rectified spatial dimension (height and width) of the input feature maps.
Linear Unit (ReLU) non-linearity. In the first 1 × 1 standard After that, it is concatenated with the appropriate feature maps
convolution, the number of feature channels are reduced, and from the pre-trained encoder using the skip connections. These
in the last 1 × 1 standard convolution, the number of feature skip connections pass information that may be lost sometimes
channels are expanded. After that, an element-wise addition between the layers and are used to improve the quality of
with the identity mapping is performed. The inverted residual the feature maps. These concatenated feature maps are passed
block also has three convolution layers: a 1 × 1 standard through the modified residual block, which further increases
convolution, a 3 × 3 depth-wise convolution, and a 1 × 1 the generalization capacity of the decoder. After the feature
standard convolution. Every convolution has a ReLU activation maps pass through all the three decoder block, the output of
function. Here, the exact opposite of the bottleneck residual the last decoder block is fed to a 1 × 1 convolution with a
block is performed. The first 1 × 1 standard convolution number of classes as the feature channels. This is followed by
expands the number of feature channels, and the last 1 × 1 the sigmoid activation if it is a binary segmentation task, else
standard convolution reduces the number of feature channels. we use the softmax activation function.
Due to this opposite functionality, it is referred to as an We have demonstrated three different NanoNet architec-
inverted residual block. The linear bottleneck block is the tures: NanoNet-A, NanoNet-B, and NanoNet-C. Each archi-
same as the inverted residual block, except the last 1 × 1 tecture consists of different feature channels in its decoder
4

TABLE I: Publicly available endoscopic datasets used in our experiments

Dataset No. of Images Imaging Type Availability
KvasirCapsule-SEG 55 Video capsule endoscopy https://www.dropbox.com/sh/hr46vieykbmvmkk/
AAAs V8ECG0wq51Fpw3rYU 5a?dl=0
Kvasir-SEG [10] 1000 Colonoscopy https://datasets.simula.no/kvasir-seg/
2020 Medico automatic polyp segmen- 160⋄ Colonoscopy https://multimediaeval.github.io/editions/2020/tasks/
tation challenge [11] medico/
Endotect Challenge Dataset [12] 200⋄ Colonoscopy https://endotect.com/
Kvasir-Instrument [36] 590 Colonoscopy https://datasets.simula.no/kvasir-instrument/
⋄ test images

Table I shows the detailed information about the open

imaging dataset used in our experiments. Each of the datasets
presented in Table I also has the corresponding ground truth.
The link for each of the datasets is provided in the table.
The standard setting for the “Medico automatic polyp seg-
mentation challenge” and “Endotect challenge” is that they
use the Kvasir-SEG for training. The challenge organizers
have provided unseen 160 images in the “Medico automatic
polyp segmentation challenge” and released 200 images in
the “Endotect challenge” to test the participant’s approaches.
Fig. 2: Polyps and corresponding masks from KvasirCapsule- For the Kvasir-instrument dataset, we experimented with the
SEG official split provided by the organizers. The detail explanation
of these datasets and the baseline results can be found in [10]–
[13].
block. NanoNet-A consists of 32, 64 and 128 feature channels.
In NanoNet-B, the number of feature channels is reduced to
B. Evaluation metrics
32, 64, and 96. In NanoNet-C, these feature channels are
further reduced to 16, 24, and 32. The reduction in the number For the evaluation of our model, we have chosen standard
of feature channels leads to less trainable parameters, which computer vision metrics such as Dice Coefficient (DSC), mean
simplifies the model complexity leading to a light-weight Intersection over Union (mIoU), Precision, Recall, Specificity,
network. Accuracy, and Frame-per-second (FPS). More explanation of
these metrics can be found in [10]–[13].
IV. E XPERIMENTAL SETUP
C. Implementation details
In this section, we will describe the dataset, evaluation met-
rics, implementation details, and data augmentation techniques We have implemented the NanoNet using Keras3 with
used. TensorFlow [40] as backend. The experiments were run on
the Experimental Infrastructure for Exploration of Exascale
Computing (eX3), NVIDIA DGX-2 machine. The code im-
A. Datasets
plementation of NanoNet can be found here4 . As the model
To address the polyp segmentation problem from video has very few low trainable parameters, we have set a batch size
capsule endoscopy images, we have selected the polyp class of 16. We have resized the dataset images to 256 × 256 pixels
from labelled images folder of the Kvasir-Capsule dataset [37] for better utilization of the GPU, and it also helps to reduce
and annotated it with the help of an expert gastroenterologist. the training time. The model is trained on 200 epochs with the
The Kvasir-Capsule is an open-access dataset that contains 13 Nadam optimizer [41] and dice coefficient as the loss function.
classes of labelled anomalies and findings. It only includes 55 The learning rate for the optimizer is set to 1e−4 . We prefer to
polyp frames out of 44,228 medically verified video capsule choose a low learning rate to update the parameters slowly and
frames present in the Kvasir-Capsule. We have annotated the carefully. The learning rate is reduced by a factor of 0.1 when
polyp class of Kvasir-Capsule and generated corresponding the validation loss does not decrease in 10 consecutive epochs.
ground truth masks. Examples of polyps and their correspond- It helps to improve model performance. Additionally, we have
ing masks from KvasirCapsule-SEG can be found in Figure 2. used an early stopping mechanism to prevent over-fitting.
Furthermore, we also provide bounding box information to be
used for video capsule endoscopy detection and localization
D. Data augmentation
tasks. The Kvasir-Capsule can be downloaded from here 1 and
KvasirCapsule-SEG can be downloaded from here 2 . We use data-augmentation on the training set to increase
diversity and to improve the generalization of our model. Data
1 https://osf.io/dv2ag/
2 https://www.dropbox.com/sh/hr46vieykbmvmkk/AAAs 3 https://keras.io/

V8ECG0wq51Fpw3rYU 5a?dl=0 4 https://github.com/DebeshJha/NanoNet

TABLE II: Performance evaluation of the proposed networks and recent SOTA methods on KvasirCapsule-SEG
Method Parameters DSC mIoU Recall Precision F2 Accuracy FPS
ResUNet (GRSL’18) [38] 8,227,393 0.9532 0.9137 0.9785 0.9325 0.9677 0.9386 17.96
ResUNet++ (ISM’19) [24] 4,070,385 0.9499 0.9087 0.9762 0.9296 0.9648 0.9334 15.39
NanoNet-A (Ours) 235,425 0.9493 0.9059 0.9693 0.9325 0.9609 0.9351 28.35
NanoNet-B (Ours) 132,049 0.9474 0.9028 0.9682 0.9308 0.9593 0.9324 27.39
NanoNet-C (Ours) 36,561 0.9465 0.9021 0.9754 0.9238 0.9629 0.9297 29.48

TABLE III: Performance evaluation of the proposed networks and recent SOTA methods on Kvasir-SEG [10]
Method Parameters DSC mIoU Recall Precision F2 Accuracy FPS
ResUNet (GRSL’18) [38] 8,227,393 0.7203 0.6106 0.7602 0.7624 0.7327 0.9251 17.72
ResUNet++ (ISM’19) [24] 4,070,385 0.7310 0.6363 0.7925 0.7932 0.7478 0.9223 19.79
NanoNet-A (Ours) 235,425 0.8227 0.7282 0.8588 0.8367 0.8354 0.9456 26.13
NanoNet-B (Ours) 132,049 0.7860 0.6799 0.8392 0.8004 0.8067 0.9365 29.73
NanoNet-C (Ours) 36,561 0.7494 0.6360 0.8081 0.7738 0.7719 0.9290 32.17

TABLE IV: Performance evaluation of the proposed networks and recent SOTA methods on the Medico 2020 dataset [11]
Method Parameters DSC mIoU Recall Precision F2 Accuracy FPS
ResUNet (GRSL’18) [38] 8,227,393 0.6846 0.5599 0.7235 0.7236 0.6961 0.9231 18.54
ResUNet++ (ISM’19) [24] 4,070,385 0.6925 0.5849 0.8249 0.6840 0.7434 0.8995 19.47
NanoNet-A (Ours) 235,425 0.7364 0.6319 0.8566 0.7310 0.7804 0.9166 28.07
NanoNet-B (Ours) 132,049 0.7378 0.6247 0.8283 0.7373 0.7685 0.9223 29.04
NanoNet-C (Ours) 36,651 0.7070 0.5866 0.8095 0.7089 0.7432 0.9148 32.66

TABLE V: Performance evaluation of the proposed networks and recent SOTA methods on the Endotect 2020 dataset [12]
Method Parameters DSC mIoU Recall Precision F2 Accuracy FPS
ResUNet (GRSL’18) [39] 8,227,393 0.6640 0.5408 0.7510 0.6841 0.6943 0.9075 26.55
ResUNet++ (ISM’19) [24] 4,070,385 0.6940 0.5838 0.8797 0.6591 0.7597 0.8841 18.58
NanoNet-A (Ours) 235,425 0.7508 0.6466 0.8238 0.7744 0.7773 0.9255 27.19
NanoNet-B (Ours) 132,049 0.7362 0.6238 0.8109 0.7532 0.7646 0.9252 29.91
NanoNet-C (Ours) 36,651 0.7001 0.5792 0.8000 0.7159 0.7380 0.9091 32.98

TABLE VI: Performance evaluation of the proposed networks and recent SOTA methods on Kvasir-Instrument [13]
Method Parameters DSC mIoU Recall Precision F2 Accuracy FPS
UNet (Baseline) [39] - 0.9158 0.8578 0.9487 0.8998 0.9320 0.9864 20.46
DoubleUNet (Baseline) [25] - 0.9038 0.8430 0.9275 0.8966 0.9147 0.9838 10.00
ResUNet++ (ISM’19) [24] 4,070,385 0.9140 0.8635 0.9103 0.9348 0.9140 0.9866 17.87
NanoNet-A (Ours) 235,425 0.9251 0.8768 0.9142 0.9540 0.9251 0.9887 28.00
NanoNet-B (Ours) 132,049 0.9284 0.8790 0.9205 0.9482 0.9284 0.9875 29.82
NanoNet-C (Ours) 36,561 0.9139 0.8600 0.9037 0.9452 0.9139 0.9863 32.18

augmentation techniques such as random cropping, random show that NanoNet can produce real-time segmentation (i.e.,
rotation, horizontal flipping, vertical flipping, grid distortion, produces at least close to 30 FPS for each dataset present in
and many more are used. We have used an offline data the Tables). This is one of the major contributions of the work.
augmentation technique. The validation and testing set is not The other strength of the work lies in the parameter use. From
augmented and is directly resized into 256 × 256. Table II, we can observe that the best performing NanoNet
(i.e., NanoNet-A) uses nearly 35 times less parameters as
V. R ESULT AND D ISCUSSION ResUNet [38]. Similarly, NanoNet-C uses 225 times less
parameters as compared to that of ResUNet and also produces
In this section, we provide the experimental results for
better DSC, mIoU and FPS with the Kvasir-SEG.
the segmentation task of the endoscopic image dataset. For
the evaluation, we have used performance metrics such as The qualitative results are displayed in Figure 3. The first,
DSC and mIoU, and FPS as the main evaluation metrics. We second, and third columns show the image, ground truth,
also calculate recall, precision, F2, and overall accuracy to and prediction masks, respectively. Similarly, the name of the
support a complete set of metrics. Table II, Table III, Table IV, dataset is provided on the left side. One example image for
Table V, and Table VI show the results of the NanoNet each dataset is shown. The qualitative results with diversified
model experiments using different parameters. The results are classes of medical datasets show that NanoNet can produce
compared with the recent SOTA computer vision methods. accurate segmentation results with different types of lesions
The quantitative results in these tables show that NanoNet (polyps) and therapeutic tools. The example images and the
consistently outperforms or performs nearly equal to its com- prediction also show that NanoNet produces good segmenta-
petitors in terms of performance. The quantitative results also tion masks for large, medium, and small polyps (see Figure 3).
6

performance with minuscule trainable parameters. The exper-

imental results on varied endoscopy datasets demonstrate the
strength of our model compared to state-of-the-art models
with respect to their speed and performance. The presented
model has the potential to enable easier roll out of deep
learning models in clinical systems due to fewer parameters,
competitive accuracy, and low-latency. In addition, the model
does not require any sort of initialization, post-processing, or
temporal regularization, considered as another strength of this
work. In the future, we will design an encoder lighter than the
currently used pre-trained MobilNetV2. Moreover, we aspire
to utilize the currently built segmentation module in the clinic
and study the efficacy of our designed model.

ACKNOWLEDGMENT
The research is partially funded by the PRIVATON project
(263248) and the Autocap project (282315) from the Research
Council of Norway (RCN). Our experiments were performed
on the Experimental Infrastructure for Exploration of Exascale
Computing (eX3) system, which is financially supported by
RCN under contract 270053.

R EFERENCES
[1] H. Sung et al., “Global cancer statistics 2020: Globocan estimates of
incidence and mortality worldwide for 36 cancers in 185 countries,” CA:
a cancer journal for clinicians, 2021.
[2] A. Kornbluth, P. Legnani, and B. S. Lewis, “Video capsule endoscopy
in inflammatory bowel disease: past, present, and future,” Inflammatory
Bowel Diseases, vol. 10, no. 3, pp. 278–285, 2004.
[3] D. Ardila et al., “End-to-end lung cancer screening with three-
dimensional deep learning on low-dose chest computed tomography,”
Nature medicine, vol. 25, no. 6, pp. 954–961, 2019.
[4] F. Arcadu et al., “Deep learning algorithm predicts diabetic retinopathy
progression in individual patients,” NPJ digital medicine, vol. 2, no. 1,
pp. 1–9, 2019.
[5] E. M. Green et al., “Machine learning detection of obstructive hy-
pertrophic cardiomyopathy using a wearable biosensor,” NPJ digital
medicine, vol. 2, no. 1, pp. 1–4, 2019.
Fig. 3: Qualitative results of NanoNet-A on five different [6] S. Bodenstedt et al., “Comparative evaluation of instrument segmenta-
datasets tion and tracking methods in minimally invasive surgery,” arXiv preprint
arXiv:1805.02475, 2018.
[7] I. J. Goodfellow et al., “Generative adversarial networks,” arXiv preprint
arXiv:1406.2661, 2014.
From the qualitative results, we can derive and conclude that [8] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille,
NanoNet produces good results with small-sized polyps but “Deeplab: Semantic image segmentation with deep convolutional nets,
atrous convolution, and fully connected crfs,” IEEE transactions on
produces over-segmentation for the large-sized lesions upon pattern analysis and machine intelligence, vol. 40, no. 4, pp. 834–848,
detail dissection. For future work, one could create a specific 2017.
dataset consisting of a set of small and large-sized polyps to [9] Y.-D. Kim et al., “Compression of deep convolutional neural net-
works for fast and low power mobile applications,” arXiv preprint
explore this further. arXiv:1511.06530, 2015.
From both evaluation metrics and qualitative results, the [10] D. Jha et al., “Kvasir-seg: A segmented polyp dataset,” in Proc. of
improvement is remarkable. Thus, the proposed NanoNet International Conference on Multimedia Modeling (MMM), 2020, pp.
architecture is simple, compact, and provides a robust solution 451–462.
[11] D. Jha, S. A. Hicks, K. Emanuelsen, H. Johansen, D. Johansen,
for real-time applications, as it produces satisfactory perfor- T. de Lange, M. A. Riegler, and P. Halvorsen, “Medico multimedia
mance despite having fewer parameters. task at mediaeval 2020: Automatic polyp segmentation,” in CEUR
Proceedings of MediaEval Workshop, 2020.
[12] S. A. Hicks et al., “The endotect 2020 challenge: Evaluation and compar-
VI. C ONCLUSION ison of classification, segmentation and inference time for endoscopy,”
In this paper, we proposed a novel lightweight architecture in Proceedings of ICPR 2020 Workshops and Challenges, 2020.
[13] D. Jha et al., “Kvasir-instrument: Diagnostic and therapeutic tool seg-
for real-time video capsule endoscopy and colonoscopy image mentation dataset in gastrointestinal endoscopy,” in Proc. of Multimedia
segmentation. The proposed NanoNet architecture utilizes a Modeling (MMM), 2021.
pre-trained MobileNetV2 model and a modified residual block. [14] S. A. Karkanis, D. K. Iakovidis, D. E. Maroulis, D. A. Karras, and
M. Tzivras, “Computer-aided tumor detection in endoscopic video using
The depthwise separable convolution is the main building color wavelet features,” IEEE transactions on information technology in
block of the network and allows the model to achieve high biomedicine, vol. 7, no. 3, pp. 141–152, 2003.
7

[15] S. Ameling, S. Wirth, D. Paulus, G. Lacey, and F. Vilarino, “Texture- [39] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks
based polyp detection in colonoscopy,” in Bildverarbeitung für die for biomedical image segmentation,” in Proc. of International Confer-
Medizin 2009, 2009, pp. 346–350. ence on Medical image computing and computer-assisted intervention
[16] J. Bernal, J. Sánchez, and F. Vilarino, “Towards automatic polyp (MICCAI), 2015, pp. 234–241.
detection with a polyp appearance model,” Pattern Recognition, vol. 45, [40] M. Abadi et al., “Tensorflow: A system for large-scale machine learn-
no. 9, pp. 3166–3182, 2012. ing,” in Proc. of USENIX Symposium on Operating Systems Design and
[17] X. Jia, X. Xing, Y. Yuan, L. Xing, and M. Q.-H. Meng, “Wireless capsule Implementation (OSDI), 2016, pp. 265–283.
endoscopy: A new tool for cancer screening in the colon with deep- [41] T. Dozat, “Incorporating nesterov momentum into adam,” in Proc. of
learning-based polyp recognition,” Proceedings of the IEEE, vol. 108, International Conference on Learning Representations, 2016.
no. 1, pp. 178–197, 2019.
[18] V. Prasath, “Polyp detection and segmentation from video capsule
endoscopy: A review,” Journal of Imaging, vol. 3, no. 1, p. 1, 2017.
[19] N. K. Tomar et al., “Fanet: A feedback attention network for improved
biomedical image segmentation,” arXiv preprint arXiv:2103.17235,
2021.
[20] Y. Guo, J. Bernal, and B. J Matuszewski, “Polyp segmentation with
fully convolutional deep neural networks—extended evaluation study,”
Journal of Imaging, vol. 6, no. 7, p. 69, 2020.
[21] S. Ali et al., “Deep learning for detection and segmentation of artefact
and disease instances in gastrointestinal endoscopy,” Medical Image
Analysis, p. 102002, 2021.
[22] D.-P. Fan et al., “Pranet: Parallel reverse attention network for polyp
segmentation,” in Proc. of International Conference on Medical Image
Computing and Computer-Assisted Intervention (MICCAI), 2020, pp.
263–273.
[23] D. Jha et al., “A Comprehensive Study on Colorectal Polyp Seg-
mentation with ResUNet++, Conditional Random Field and Test-Time
Augmentation,” IEEE Journal of Biomedical and Health Informatics.
[24] D. Jha, P. H. Smedsrud, M. A. Riegler, D. Johansen, T. De Lange,
P. Halvorsen, and H. D. Johansen, “ResUNet++: An Advanced Archi-
tecture for Medical Image Segmentation,” in Proc. of IEEE International
Symposium on Multimedia (ISM), 2019, pp. 225–2255.
[25] D. Jha, , M. A. Riegler, D. Johansen, P. Halvorsen, and H. D. Johansen,
“DoubleU-Net: A Deep Convolutional Neural Network for Medical Im-
age Segmentation,” in Proc. of International Conference on Multimedia
Modeling (MMM), 2020, pp. 451–462.
[26] J. Y. o. Lee, “Real-time detection of colon polyps during colonoscopy us-
ing deep learning: systematic validation with four independent datasets,”
Scientific reports, vol. 10, no. 1, pp. 1–9, 2020.
[27] M. Yamada et al., “Development of a real-time endoscopic image diag-
nosis support system using deep learning technology in colonoscopy,”
Scientific reports, vol. 9, no. 1, pp. 1–9, 2019.
[28] C. C. Poon et al., “Ai-doscopist: a real-time deep-learning-based algo-
rithm for localising polyps in colonoscopy videos with edge computing
devices,” NPJ Digital Medicine, vol. 3, no. 1, pp. 1–8, 2020.
[29] Z.-L. Ni et al., “Barnet: Bilinear attention network with adaptive
receptive field for surgical instrument segmentation,” arXiv preprint
arXiv:2001.07093, 2020.
[30] Y. Wang et al., “Lednet: A lightweight encoder-decoder network for real-
time semantic segmentation,” in Proc. of IEEE International Conference
on Image Processing (ICIP), 2019, pp. 1860–1864.
[31] N. Beheshti and L. Johnsson, “Squeeze u-net: A memory and energy
efficient image segmentation network,” in Proc. of IEEE/CVF Confer-
ence on Computer Vision and Pattern Recognition (CVPR) Workshops,
2020, pp. 364–365.
[32] E. Romera, J. M. Alvarez, L. M. Bergasa, and R. Arroyo, “Erfnet: Effi-
cient residual factorized convnet for real-time semantic segmentation,”
IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 1,
pp. 263–272, 2017.
[33] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen,
“Mobilenetv2: Inverted residuals and linear bottlenecks,” in Proc. of
IEEE conference on computer vision and pattern recognition, 2018, pp.
4510–4520.
[34] J. Deng et al., “Imagenet: A large-scale hierarchical image database,” in
Proc. of IEEE conference on computer vision and pattern recognition
(CVPR), 2009, pp. 248–255.
[35] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in Proc. of the IEEE conference on computer vision and
pattern recognition (CVPR), 2016, pp. 770–778.
[36] J. Bernal et al., “Wm-dova maps for accurate polyp highlighting in
colonoscopy: Validation vs. saliency maps from physicians,” Computer-
ized Medical Imaging and Graphics, vol. 43, pp. 99–111, 2015.
[37] P. H. Smedsrud et al., “Kvasir-capsule, a video capsule endoscopy
dataset,” Springer Nature Scientific Data, 2021.
[38] Z. Zhang, Q. Liu, and Y. Wang, “Road extraction by deep residual u-
net,” IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 5, pp.
749–753, 2018.

Study 2
No ratings yet
Study 2
14 pages
Review
No ratings yet
Review
21 pages
A Systematic Review On Application of Deep Learning in Digestive System Image Processing
No ratings yet
A Systematic Review On Application of Deep Learning in Digestive System Image Processing
16 pages
Jimaging 11 00084 v2
No ratings yet
Jimaging 11 00084 v2
20 pages
Phase1 Esophageal Detection
No ratings yet
Phase1 Esophageal Detection
22 pages
Resunet
No ratings yet
Resunet
7 pages
PHD Synopsis
No ratings yet
PHD Synopsis
7 pages
DE-ColonSegNet: An Improved Version of ColonSegNet With Dual Encoder For Accurate Real Time Polyp Segmentation in Colonos
No ratings yet
DE-ColonSegNet: An Improved Version of ColonSegNet With Dual Encoder For Accurate Real Time Polyp Segmentation in Colonos
5 pages
RAPUNet
No ratings yet
RAPUNet
9 pages
FA23-Polyps Classifcation in Colonoscopy Images Using Image Segmentation Method
No ratings yet
FA23-Polyps Classifcation in Colonoscopy Images Using Image Segmentation Method
39 pages
JMI-24284GRR Online
No ratings yet
JMI-24284GRR Online
20 pages
A Deep Neural Network Improves Endoscopic Detection of Early Gastric Cancer Without Blind Spots
No ratings yet
A Deep Neural Network Improves Endoscopic Detection of Early Gastric Cancer Without Blind Spots
10 pages
Capsule Endoscopy 2018
No ratings yet
Capsule Endoscopy 2018
5 pages
Ensemble U-Net Model For Efficient Polyp Segmentation
No ratings yet
Ensemble U-Net Model For Efficient Polyp Segmentation
3 pages
Synthetic Data-Driven Multi-Architecture Framework For Automated Polyp Segmentation Through Integrated Detection and Mask Generation
No ratings yet
Synthetic Data-Driven Multi-Architecture Framework For Automated Polyp Segmentation Through Integrated Detection and Mask Generation
12 pages
Polyp Segmentation with DBMIA-Net
No ratings yet
Polyp Segmentation with DBMIA-Net
11 pages
Life 13 00719 v2
No ratings yet
Life 13 00719 v2
18 pages
Su-stackedEnsemble-ElsevierCBM (2022)
No ratings yet
Su-stackedEnsemble-ElsevierCBM (2022)
8 pages
1 s2.0 S1746809424006372 Main
No ratings yet
1 s2.0 S1746809424006372 Main
12 pages
Bioengineering 10 00957 v2
No ratings yet
Bioengineering 10 00957 v2
16 pages
DBH Net
No ratings yet
DBH Net
16 pages
Improving Automatic Polyp Detection Using CNN by Exploiting Temporal Dependency in Colonoscopy Video
No ratings yet
Improving Automatic Polyp Detection Using CNN by Exploiting Temporal Dependency in Colonoscopy Video
14 pages
Survey On AI-Based Polyp Localization and Segmentation For Enhanced Colonoscopy Diagnosis
No ratings yet
Survey On AI-Based Polyp Localization and Segmentation For Enhanced Colonoscopy Diagnosis
5 pages
Gastric Cancer Paper 1
No ratings yet
Gastric Cancer Paper 1
9 pages
Fast Machine Learning Annotation in The Medical Do
No ratings yet
Fast Machine Learning Annotation in The Medical Do
24 pages
Paper 1
No ratings yet
Paper 1
12 pages
Diagnostics 13 01473
No ratings yet
Diagnostics 13 01473
11 pages
1807 10584fhwf
No ratings yet
1807 10584fhwf
6 pages
1 s2.0 S001048252400180X Mainext
No ratings yet
1 s2.0 S001048252400180X Mainext
11 pages
Using DUCK Net For Polyp Image Segmentation: Razvan Gabriel Dumitru, Darius Peteleaza & Catalin Craciun
No ratings yet
Using DUCK Net For Polyp Image Segmentation: Razvan Gabriel Dumitru, Darius Peteleaza & Catalin Craciun
12 pages
Ghosh 2021
No ratings yet
Ghosh 2021
14 pages
Bui MEGANet Multi-Scale Edge-Guided Attention Network For Weak Boundary Polyp Segmentation WACV 2024 Paper
No ratings yet
Bui MEGANet Multi-Scale Edge-Guided Attention Network For Weak Boundary Polyp Segmentation WACV 2024 Paper
10 pages
ICONDEEPCOM 156.docm
No ratings yet
ICONDEEPCOM 156.docm
18 pages
1 s2.0 S0010482524010151 Main
No ratings yet
1 s2.0 S0010482524010151 Main
16 pages
Ai in Gi
No ratings yet
Ai in Gi
49 pages
M Tech. Oral Examination: School of Computer Science & Engineering
No ratings yet
M Tech. Oral Examination: School of Computer Science & Engineering
20 pages
Paper 5
No ratings yet
Paper 5
15 pages
Wireless Capsule Endoscopy Image Classification An Explainable AI Approach
No ratings yet
Wireless Capsule Endoscopy Image Classification An Explainable AI Approach
19 pages
UMF Net (已发表)
No ratings yet
UMF Net (已发表)
9 pages
Transfer Learning Boosts Polyp Segmentation
No ratings yet
Transfer Learning Boosts Polyp Segmentation
3 pages
Classification of Upper Gastrointestinal Tract Diseases Using Endoscopic Images
No ratings yet
Classification of Upper Gastrointestinal Tract Diseases Using Endoscopic Images
10 pages
Plagiarism Checker X - Report: Originality Assessment
No ratings yet
Plagiarism Checker X - Report: Originality Assessment
12 pages
AI App Pitch Deck by Slidesgo
No ratings yet
AI App Pitch Deck by Slidesgo
39 pages
Applsci 13 10800
No ratings yet
Applsci 13 10800
12 pages
s6826 Le Lu Deep Neural Networks
No ratings yet
s6826 Le Lu Deep Neural Networks
60 pages
PolypSeg A Lightweight Context-Aware Network For Real-Time Polyp Segmentation
No ratings yet
PolypSeg A Lightweight Context-Aware Network For Real-Time Polyp Segmentation
12 pages
1 s2.0 S0010482524000155 Main
No ratings yet
1 s2.0 S0010482524000155 Main
13 pages
9.a Review Application Deep Learning Endoscopic
No ratings yet
9.a Review Application Deep Learning Endoscopic
19 pages
Colorectal Cancer Image Recognition Algorithm Based On Improved
No ratings yet
Colorectal Cancer Image Recognition Algorithm Based On Improved
11 pages
Attn Guided CNN 2021
No ratings yet
Attn Guided CNN 2021
12 pages
Resarch Paper01
No ratings yet
Resarch Paper01
11 pages
Endoscopic Image Classification Based On Explainable Deep Learning
No ratings yet
Endoscopic Image Classification Based On Explainable Deep Learning
14 pages
Preprints202110 0135 v1
No ratings yet
Preprints202110 0135 v1
13 pages
1 s2.0 S0031320324003054 Main
No ratings yet
1 s2.0 S0031320324003054 Main
11 pages
Scopus Publications
No ratings yet
Scopus Publications
8 pages
Polyp Segmentation Based On Implicit Edge Guided Cross Layer Fusion Networks
No ratings yet
Polyp Segmentation Based On Implicit Edge Guided Cross Layer Fusion Networks
12 pages
Diagnostics 13 01721
No ratings yet
Diagnostics 13 01721
16 pages
Gastric Cancer Paper 4)
No ratings yet
Gastric Cancer Paper 4)
8 pages
Paper 2
No ratings yet
Paper 2
13 pages
Hypertension Lecture
No ratings yet
Hypertension Lecture
10 pages
FM8-35 Transportation of The Sick and Wounded 1970
No ratings yet
FM8-35 Transportation of The Sick and Wounded 1970
211 pages
MLSP111 PDF 02 History Organizations of Medical Technology
No ratings yet
MLSP111 PDF 02 History Organizations of Medical Technology
4 pages
Expedited Hip Fracture Surgery in Patients On.7
No ratings yet
Expedited Hip Fracture Surgery in Patients On.7
6 pages
BS Nursing
No ratings yet
BS Nursing
5 pages
Medical Report for Healthcare Professionals
No ratings yet
Medical Report for Healthcare Professionals
4 pages
3.1.9. BADAC Form 4 (UBRA)
90% (10)
3.1.9. BADAC Form 4 (UBRA)
1 page
Man
No ratings yet
Man
6 pages
4th Generation HIV Testing
No ratings yet
4th Generation HIV Testing
31 pages
Drug List Availible in Uae Market
50% (8)
Drug List Availible in Uae Market
316 pages
Newdcr
No ratings yet
Newdcr
144 pages
Biomedical Engineering Ktu Mod 2
No ratings yet
Biomedical Engineering Ktu Mod 2
19 pages
Pharmacotherapy of Glaucoma: 10/26/2020 by Abera J. 1
No ratings yet
Pharmacotherapy of Glaucoma: 10/26/2020 by Abera J. 1
45 pages
Human Health and Disease What Is Health?
No ratings yet
Human Health and Disease What Is Health?
40 pages
Letter of Dissent Regarding Abortion Rights in Ohio
0% (1)
Letter of Dissent Regarding Abortion Rights in Ohio
1 page
A Client With Multiple Sclerosis
No ratings yet
A Client With Multiple Sclerosis
10 pages
Histologi - DIGESTIVE SYSTEM
No ratings yet
Histologi - DIGESTIVE SYSTEM
6 pages
Chapter 20-Cardiovascular System The Heart
No ratings yet
Chapter 20-Cardiovascular System The Heart
44 pages
Poultry Farm Biosecurity Guide
No ratings yet
Poultry Farm Biosecurity Guide
9 pages
Kajian Pola Penggunaan Antibiotik Profilaksis Dan Diagnosa Pascaoperasi Hubungannya Dengan Angka Kejadian Infeksi Daerah Operasi (Ido) Pada Pasien Bedah Digestif Di Rumah Sakit Swasta
No ratings yet
Kajian Pola Penggunaan Antibiotik Profilaksis Dan Diagnosa Pascaoperasi Hubungannya Dengan Angka Kejadian Infeksi Daerah Operasi (Ido) Pada Pasien Bedah Digestif Di Rumah Sakit Swasta
9 pages
Pharmacology (Green Pacop) Flashcards - Quizlet
No ratings yet
Pharmacology (Green Pacop) Flashcards - Quizlet
27 pages
Propaedeutics Philosophical Background of Ayurveda: Bhaktivedanta College
No ratings yet
Propaedeutics Philosophical Background of Ayurveda: Bhaktivedanta College
13 pages
07-Traditional Medical Astrology
100% (5)
07-Traditional Medical Astrology
6 pages
Case Files Anesthesiology 1st Edition Premium Ebook Download
100% (20)
Case Files Anesthesiology 1st Edition Premium Ebook Download
14 pages
Sutures
No ratings yet
Sutures
30 pages
Cite Website
No ratings yet
Cite Website
2 pages
Anatomy - General Anatomy of Skin
No ratings yet
Anatomy - General Anatomy of Skin
28 pages
Diabetes Insipidus: Nitha K 2 Year MSC Nursing
100% (1)
Diabetes Insipidus: Nitha K 2 Year MSC Nursing
47 pages
Gavinos Resume
No ratings yet
Gavinos Resume
4 pages
Assessment and Management of Lip Lacerations - UpToDate
No ratings yet
Assessment and Management of Lip Lacerations - UpToDate
39 pages

NanoNet Real Time Polyp Segmentation in

Uploaded by

NanoNet Real Time Polyp Segmentation in

Uploaded by

1

NanoNet: Real-Time Polyp Segmentation in Video

standard convolution has a linear activation before an element-

B. Modified Residual Block

C. The NanoNet architecture

TABLE I: Publicly available endoscopic datasets used in our experiments

Table I shows the detailed information about the open

V8ECG0wq51Fpw3rYU 5a?dl=0 4 https://github.com/DebeshJha/NanoNet

performance with minuscule trainable parameters. The exper-

You might also like