NanoNet: Real-Time Polyp Segmentation in Video Capsule Endoscopy and Colonoscopy
Abstract—Deep learning in gastrointestinal endoscopy can assist in improving clinical performance and help assess lesions more accurately. To this extent, semantic segmentation methods that can perform automated real-time delineation of a region-of-interest, e.g., boundary identification of cancer or pre-cancerous lesions, can benefit both diagnosis and interventions. However, accurate and real-time segmentation of endoscopic images is extremely challenging due to high operator dependence and high-definition image quality. To utilize automated methods in clinical settings, it is crucial to design lightweight models with low latency such that they can be integrated with low-end endoscope hardware devices. In this work, we propose NanoNet, a novel architecture for the segmentation of video capsule endoscopy and colonoscopy images. Our proposed architecture allows real-time performance and has higher segmentation accuracy compared to other, more complex architectures. We use video capsule endoscopy and standard colonoscopy datasets with polyps, and a dataset consisting of endoscopy biopsies and surgical instruments, to evaluate the effectiveness of our approach. Our experiments demonstrate the increased performance of our architecture in terms of the trade-off between model complexity, speed, model parameters, and metric performance. Moreover, the resulting model size is relatively tiny, with only nearly 36,000 parameters, compared to traditional deep learning approaches having millions of parameters.

Index Terms—Video capsule endoscopy, colonoscopy, deep learning, segmentation, tool segmentation

I. INTRODUCTION

Gastrointestinal (GI) endoscopy is a widely used technique to diagnose and treat anomalies in the upper (esophagus, stomach, and duodenum) and the lower (large bowel and anus) GI tract. Among cancers of the GI tract, colorectal cancer (CRC) has the highest incidence and mortality rate [1]. There are several CRC screening options. These are usually divided into two categories, namely invasive tests (visual examination-based) and non-invasive tests (stool, blood, and radiological tests). Colonoscopy, the gold standard for examining the large bowel (colon and rectum), is an invasive examination used to detect, observe, and remove abnormalities (such as polyps). It detects colorectal cancer with both high sensitivity and specificity. Sigmoidoscopy is another invasive test. Computed Tomography (CT) colonography, the Fecal Occult Blood Test (FOBT), the Fecal Immunochemical Test (FIT), and Video Capsule Endoscopy (VCE) are non-invasive tests. VCE is a technology for capturing video inside the GI tract, and it has evolved into an important tool for detecting small bowel diseases [2].

Deep Learning (DL) methods have made significant breakthroughs in several medical domains, such as lung cancer detection [3], diabetic retinopathy progression [4], and obstructive hypertrophic cardiomyopathy detection [5]. They have provided new opportunities to solve challenges such as bleeding, light over/underexposure, smoke, and reflections [6]. However, DL normally needs a large annotated dataset, and labeled medical datasets are difficult to obtain. First, they require collaboration with hospitals. For data collection, the doctors require approval from various authorities and patient consent. They need to set protocols for the collection, and the collected data must be anonymized and cleaned with the help of data engineers. Domain experts must label the raw data and, after labeling, prepare the annotations according to the needs of the task. The whole process requires a significant amount of expert time and is costly. Additionally, it is an operator-dependent process: the quality of the data labeling and annotation depends on the expertise of the clinicians. Therefore, it is challenging to curate a large dataset.

One way of solving the dataset issue is to create synthetic images using Generative Adversarial Networks (GANs) [7]. However, generated synthetic images may not always capture all the properties and characteristics of real endoscopic images. Consequently, the model may only learn to predict the properties of the synthetic images and may not perform well on a real endoscopic dataset. Another solution could be domain adaptation from a similar endoscopic dataset. However, we lack large, publicly available, labeled endoscopic datasets. Thus, a viable and compelling approach to the semantic segmentation task is to reuse ImageNet pre-trained encoders in the segmentation model [8]. The predicted masks from the algorithm can then provide reliable information to the endoscopist.

A lightweight Convolutional Neural Network (CNN) model can be essential for the development of real-time and efficient semantic segmentation methods.
Usually, lightweight models are computationally efficient and require less memory, and a smaller number of parameters makes the network less redundant. Lightweight CNN models are mainly deployed in mobile applications [9]. From a systems perspective, a lightweight model can play a crucial role under limited resource constraints for real-time prediction in clinics. Consequently, we propose a novel architecture, NanoNet, optimized for faster inference and high accuracy. An extremely lightweight model with very few trainable parameters, faster inference, and high performance requires a smaller memory footprint and can be incorporated into many devices. Therefore, we put forward this approach to address the challenges in endoscopy.

The main contributions of this work include the following:
1) We propose a novel architecture, named NanoNet, to segment video capsule endoscopy and colonoscopy images in real-time with high accuracy. The proposed architecture is very lightweight, and the model size is small, requiring less computational cost.
2) VCE datasets are difficult to obtain with pixel-wise annotations. In this context, we have annotated 55 polyps from the “polyp” class of the Kvasir-Capsule dataset with the help of an expert gastroenterologist. We have made this dataset public and provided a benchmark.
3) NanoNet achieves promising performance on the KvasirCapsule-SEG, Kvasir-SEG [10], 2020 Medico automatic polyp segmentation challenge [11], 2020 EndoTect challenge [12], and Kvasir-Instrument [13] datasets. All experiments are compared with the state-of-the-art (SOTA) in terms of parameter use (size), speed, computation, and performance metrics.
4) The model can be integrated with mobile and embedded devices because of the small number of parameters used in the network.

II. RELATED WORK

A. Semantic segmentation of endoscopic images

Semantic segmentation of endoscopic images is a well-established topic in medical image segmentation. Earlier work mostly relied on handcrafted descriptors for feature learning [14], [15]. Handcrafted features such as color, shape, texture, and edges were extracted and fed to a Machine Learning (ML) classifier, which separates lesions from the background. However, traditional ML methods based on handcrafted features suffer from low performance [16]. Recent works on polyp segmentation using both video capsule endoscopy and colonoscopy have mostly relied on Deep Neural Networks (DNNs) [17]–[23].

With DNN methods, there has been progress in the performance of segmenting endoscopic images (for example, polyps). However, the network architectures are often complex, require high-end GPUs for training, and are computationally expensive [22], [24], [25]. Additionally, real-time lesion segmentation has often been ignored. Although there are some recent initiatives for real-time detection in endoscopic images, they have mostly used private datasets [26]–[28] for experimentation. It is difficult to compare new methods on these datasets and extend the benchmark. Therefore, there is a need for a benchmark on publicly available datasets to minimize the research gap towards building a clinically relevant model.

B. Lightweight model

There are a few works in the literature that have proposed lightweight models for image segmentation. Ni et al. [29] presented a novel bilinear attention network-based approach with an adaptive receptive field for the segmentation of surgical instruments. Wang et al. [30] proposed LEDNet, a lightweight encoder-decoder network that uses ResNet50 in the encoder block and an attention pyramid network in the decoder block. Beheshti et al. [31] proposed Squeeze U-Net, whose architecture is inspired by UNet [32]. The proposed model obtained a 12× reduction in model size and showed efficient performance in terms of multiply-accumulate (MAC) operations and memory usage.

From the above related work, we identify a need for a real-time polyp segmentation method. Such a method can be achieved by building a lightweight network architecture from efficient blocks that require few parameters. A lower number of network parameters reduces the network complexity, leading to real-time or faster inference. In this respect, we propose NanoNet, which uses the lightweight pre-trained network MobileNetV2 [33] and simple convolutional blocks such as the residual block and the squeeze-and-excite block.
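To make these two building blocks concrete, the sketch below shows one common way to implement them with TensorFlow/Keras. This is an illustrative sketch rather than the published NanoNet code; the reduction ratio, kernel sizes, and the 1×1 shortcut projection are assumptions.

```python
from tensorflow.keras import layers


def squeeze_excite(x, ratio=8):
    """Squeeze-and-excite: global pooling, bottleneck MLP, per-channel gating."""
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)               # squeeze: (B, C)
    s = layers.Dense(channels // ratio, activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)  # excite: channel weights
    s = layers.Reshape((1, 1, channels))(s)
    return layers.Multiply()([x, s])                     # reweight feature maps


def residual_block(x, filters):
    """Two 3x3 conv-BN-ReLU stages with SE, plus a 1x1-projected shortcut."""
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = squeeze_excite(y)                                # channel attention
    y = layers.Add()([y, shortcut])                      # residual connection
    return layers.Activation("relu")(y)
```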
III. NETWORK ARCHITECTURE

The architecture of NanoNet follows an encoder-decoder approach, as shown in Figure 1. As depicted in Figure 1, the network uses a pre-trained model as the encoder, followed by three decoder blocks. Using pre-trained ImageNet [34] models for transfer learning has become the best choice for many CNN architectures [8], [25]. It helps the model converge much faster and achieve higher performance compared to a non-pre-trained model. The proposed architecture uses a MobileNetV2 [33] model pre-trained on the ImageNet [34] dataset as the encoder. The decoder is built using a modified version of the residual block, which was initially introduced by He et al. [35]. The encoder captures the required contextual information from the input, whereas the decoder generates the final output using the contextual information extracted by the encoder.

A. MobileNetV2

MobileNetV2 [33] is an architecture primarily designed for mobile and embedded devices. It performs well on a variety of datasets while maintaining high accuracy, despite having fewer parameters. The architecture of MobileNetV2 is based on that of MobileNetV1, which uses depth-wise separable convolutions as the main building block. A depth-wise separable convolution consists of a depth-wise convolution followed by a point-wise convolution.
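The parameter saving from this factorization is easy to verify. The sketch below compares the weight counts of a standard 3×3 convolution and its depth-wise separable counterpart; the channel counts are arbitrary examples, not MobileNetV2's actual configuration.

```python
from tensorflow.keras import layers

k, c_in, c_out = 3, 64, 128
standard_weights = k * k * c_in * c_out          # 3*3*64*128 = 73,728
separable_weights = k * k * c_in + c_in * c_out  # 576 + 8,192 = 8,768 (~8.4x fewer)

# Keras equivalents of the two factors:
depthwise = layers.DepthwiseConv2D(kernel_size=3, padding="same")  # per-channel 3x3
pointwise = layers.Conv2D(filters=c_out, kernel_size=1)            # 1x1 channel mixing
```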
MobileNetV2 introduces two main ideas: the inverted residual block and the linear bottleneck [33].
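Putting the pieces together, the following is a minimal sketch of a NanoNet-style encoder-decoder in TensorFlow/Keras, reusing the residual_block helper sketched above. The skip-connection layer names, filter widths, and bilinear upsampling scheme are illustrative assumptions, not the published configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model


def build_nanonet(input_shape=(256, 256, 3), decoder_filters=(32, 24, 16)):
    inputs = layers.Input(input_shape)
    encoder = tf.keras.applications.MobileNetV2(
        include_top=False, weights="imagenet", input_tensor=inputs)

    # Intermediate MobileNetV2 activations reused as skip connections
    # (layer names are assumptions; inspect encoder.summary() to pick others).
    skips = [encoder.get_layer(name).output for name in
             ("block_1_expand_relu",    # 128 x 128
              "block_3_expand_relu",    # 64 x 64
              "block_6_expand_relu")]   # 32 x 32
    x = encoder.get_layer("block_13_expand_relu").output  # 16 x 16 bottleneck

    # Three decoder blocks: upsample, fuse the skip, refine with residual + SE.
    for skip, filters in zip(reversed(skips), decoder_filters):
        x = layers.UpSampling2D(2, interpolation="bilinear")(x)
        x = layers.Concatenate()([x, skip])
        x = residual_block(x, filters)

    x = layers.UpSampling2D(2, interpolation="bilinear")(x)  # back to 256 x 256
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)   # binary lesion mask
    return Model(inputs, outputs)
```

Keeping the decoder widths this small keeps the decoder's parameter count in the tens of thousands; the total model size then depends mainly on how much of the encoder is retained.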
TABLE II: Performance evaluation of the proposed networks and recent SOTA methods on KvasirCapsule-SEG
Method Parameters DSC mIoU Recall Precision F2 Accuracy FPS
ResUNet (GRSL’18) [38] 8,227,393 0.9532 0.9137 0.9785 0.9325 0.9677 0.9386 17.96
ResUNet++ (ISM’19) [24] 4,070,385 0.9499 0.9087 0.9762 0.9296 0.9648 0.9334 15.39
NanoNet-A (Ours) 235,425 0.9493 0.9059 0.9693 0.9325 0.9609 0.9351 28.35
NanoNet-B (Ours) 132,049 0.9474 0.9028 0.9682 0.9308 0.9593 0.9324 27.39
NanoNet-C (Ours) 36,561 0.9465 0.9021 0.9754 0.9238 0.9629 0.9297 29.48
TABLE III: Performance evaluation of the proposed networks and recent SOTA methods on Kvasir-SEG [10]
Method Parameters DSC mIoU Recall Precision F2 Accuracy FPS
ResUNet (GRSL’18) [38] 8,227,393 0.7203 0.6106 0.7602 0.7624 0.7327 0.9251 17.72
ResUNet++ (ISM’19) [24] 4,070,385 0.7310 0.6363 0.7925 0.7932 0.7478 0.9223 19.79
NanoNet-A (Ours) 235,425 0.8227 0.7282 0.8588 0.8367 0.8354 0.9456 26.13
NanoNet-B (Ours) 132,049 0.7860 0.6799 0.8392 0.8004 0.8067 0.9365 29.73
NanoNet-C (Ours) 36,561 0.7494 0.6360 0.8081 0.7738 0.7719 0.9290 32.17
TABLE IV: Performance evaluation of the proposed networks and recent SOTA methods on the Medico 2020 dataset [11]
Method Parameters DSC mIoU Recall Precision F2 Accuracy FPS
ResUNet (GRSL’18) [38] 8,227,393 0.6846 0.5599 0.7235 0.7236 0.6961 0.9231 18.54
ResUNet++ (ISM’19) [24] 4,070,385 0.6925 0.5849 0.8249 0.6840 0.7434 0.8995 19.47
NanoNet-A (Ours) 235,425 0.7364 0.6319 0.8566 0.7310 0.7804 0.9166 28.07
NanoNet-B (Ours) 132,049 0.7378 0.6247 0.8283 0.7373 0.7685 0.9223 29.04
NanoNet-C (Ours) 36,561 0.7070 0.5866 0.8095 0.7089 0.7432 0.9148 32.66
TABLE V: Performance evaluation of the proposed networks and recent SOTA methods on the EndoTect 2020 dataset [12]
Method Parameters DSC mIoU Recall Precision F2 Accuracy FPS
ResUNet (GRSL’18) [38] 8,227,393 0.6640 0.5408 0.7510 0.6841 0.6943 0.9075 26.55
ResUNet++ (ISM’19) [24] 4,070,385 0.6940 0.5838 0.8797 0.6591 0.7597 0.8841 18.58
NanoNet-A (Ours) 235,425 0.7508 0.6466 0.8238 0.7744 0.7773 0.9255 27.19
NanoNet-B (Ours) 132,049 0.7362 0.6238 0.8109 0.7532 0.7646 0.9252 29.91
NanoNet-C (Ours) 36,561 0.7001 0.5792 0.8000 0.7159 0.7380 0.9091 32.98
TABLE VI: Performance evaluation of the proposed networks and recent SOTA methods on Kvasir-Instrument [13]
Method Parameters DSC mIoU Recall Precision F2 Accuracy FPS
UNet (Baseline) [39] - 0.9158 0.8578 0.9487 0.8998 0.9320 0.9864 20.46
DoubleUNet (Baseline) [25] - 0.9038 0.8430 0.9275 0.8966 0.9147 0.9838 10.00
ResUNet++ (ISM’19) [24] 4,070,385 0.9140 0.8635 0.9103 0.9348 0.9140 0.9866 17.87
NanoNet-A (Ours) 235,425 0.9251 0.8768 0.9142 0.9540 0.9251 0.9887 28.00
NanoNet-B (Ours) 132,049 0.9284 0.8790 0.9205 0.9482 0.9284 0.9875 29.82
NanoNet-C (Ours) 36,561 0.9139 0.8600 0.9037 0.9452 0.9139 0.9863 32.18
Augmentation techniques such as random cropping, random rotation, horizontal flipping, vertical flipping, grid distortion, and many more are used. We have used an offline data augmentation technique. The validation and testing sets are not augmented and are directly resized to 256 × 256.
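As an illustration, the sketch below builds one plausible offline pipeline with the Albumentations library, which provides the transforms named above and applies identical spatial transforms to an image and its mask. The file paths, crop size, and number of copies are placeholders, not the exact setup used here.

```python
import albumentations as A
import cv2

# Spatial transforms are applied jointly to image and mask.
augment = A.Compose([
    A.RandomCrop(height=224, width=224, p=0.5),  # assumes inputs >= 224 x 224
    A.Rotate(limit=45, p=0.5),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.GridDistortion(p=0.5),
])

image = cv2.imread("polyp.jpg")                            # placeholder paths
mask = cv2.imread("polyp_mask.jpg", cv2.IMREAD_GRAYSCALE)

# Offline augmentation: write augmented copies to disk before training.
for i in range(5):
    out = augment(image=image, mask=mask)
    cv2.imwrite(f"polyp_aug_{i}.jpg", out["image"])
    cv2.imwrite(f"polyp_mask_aug_{i}.jpg", out["mask"])
```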
V. RESULTS AND DISCUSSION

In this section, we provide the experimental results for the segmentation task on the endoscopic image datasets. For the evaluation, we use DSC, mIoU, and FPS as the main metrics, and we also calculate recall, precision, F2, and overall accuracy to provide a complete set of metrics. Table II, Table III, Table IV, Table V, and Table VI show the results of the NanoNet model experiments using different parameters. The results are compared with recent SOTA computer vision methods.

The quantitative results in these tables show that NanoNet consistently outperforms or performs nearly equal to its competitors. They also show that NanoNet can produce real-time segmentation (i.e., close to 30 FPS or more on each dataset in the tables). This is one of the major contributions of the work. The other strength of the work lies in the parameter use. From Table II, we can observe that the best-performing NanoNet (i.e., NanoNet-A) uses nearly 35 times fewer parameters than ResUNet [38]. Similarly, NanoNet-C uses 225 times fewer parameters than ResUNet and also produces better DSC, mIoU, and FPS on Kvasir-SEG.
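For reference, the sketch below gives straightforward NumPy definitions of the two headline metrics and a wall-clock FPS estimate; the 0.5 binarization threshold and single-image batching are assumptions, not necessarily the exact evaluation protocol used here.

```python
import time
import numpy as np


def dice_coefficient(y_true, y_pred, eps=1e-7):
    """DSC = 2|A ∩ B| / (|A| + |B|) on binarized masks."""
    a, b = y_true > 0.5, y_pred > 0.5
    inter = np.logical_and(a, b).sum()
    return (2.0 * inter + eps) / (a.sum() + b.sum() + eps)


def iou(y_true, y_pred, eps=1e-7):
    """IoU = |A ∩ B| / |A ∪ B| on binarized masks; mIoU averages over images."""
    a, b = y_true > 0.5, y_pred > 0.5
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return (inter + eps) / (union + eps)


def frames_per_second(model, images):
    """FPS = frames processed / total wall-clock inference time."""
    start = time.time()
    for img in images:
        model.predict(img[None, ...], verbose=0)  # one frame at a time
    return len(images) / (time.time() - start)
```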
The qualitative results are displayed in Figure 3. The first, second, and third columns show the image, the ground truth, and the prediction mask, respectively, and the name of the dataset is provided on the left side, with one example image shown per dataset. The qualitative results on these diverse medical datasets show that NanoNet can produce accurate segmentation results for different types of lesions (polyps) and therapeutic tools. The example images and predictions also show that NanoNet produces good segmentation masks for large, medium, and small polyps (see Figure 3).

Fig. 3: Qualitative results of NanoNet-A on five different datasets.

From the qualitative results, we can conclude that NanoNet produces good results with small-sized polyps but produces over-segmentation for large-sized lesions upon detailed inspection. For future work, one could create a specific dataset consisting of a set of small and large-sized polyps to explore this further.

From both the evaluation metrics and the qualitative results, the improvement is remarkable. Thus, the proposed NanoNet architecture is simple, compact, and provides a robust solution for real-time applications, as it produces satisfactory performance despite having fewer parameters.

VI. CONCLUSION

In this paper, we proposed a novel lightweight architecture for real-time video capsule endoscopy and colonoscopy image segmentation. The proposed NanoNet architecture utilizes a pre-trained MobileNetV2 model and a modified residual block. The depthwise separable convolution is the main building block of the network and allows the model to achieve high performance.

ACKNOWLEDGMENT

The research is partially funded by the PRIVATON project (263248) and the Autocap project (282315) from the Research Council of Norway (RCN). Our experiments were performed on the Experimental Infrastructure for Exploration of Exascale Computing (eX3) system, which is financially supported by RCN under contract 270053.

REFERENCES

[1] H. Sung et al., “Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,” CA: A Cancer Journal for Clinicians, 2021.
[2] A. Kornbluth, P. Legnani, and B. S. Lewis, “Video capsule endoscopy in inflammatory bowel disease: past, present, and future,” Inflammatory Bowel Diseases, vol. 10, no. 3, pp. 278–285, 2004.
[3] D. Ardila et al., “End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography,” Nature Medicine, vol. 25, no. 6, pp. 954–961, 2019.
[4] F. Arcadu et al., “Deep learning algorithm predicts diabetic retinopathy progression in individual patients,” NPJ Digital Medicine, vol. 2, no. 1, pp. 1–9, 2019.
[5] E. M. Green et al., “Machine learning detection of obstructive hypertrophic cardiomyopathy using a wearable biosensor,” NPJ Digital Medicine, vol. 2, no. 1, pp. 1–4, 2019.
[6] S. Bodenstedt et al., “Comparative evaluation of instrument segmentation and tracking methods in minimally invasive surgery,” arXiv preprint arXiv:1805.02475, 2018.
[7] I. J. Goodfellow et al., “Generative adversarial networks,” arXiv preprint arXiv:1406.2661, 2014.
[8] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2017.
[9] Y.-D. Kim et al., “Compression of deep convolutional neural networks for fast and low power mobile applications,” arXiv preprint arXiv:1511.06530, 2015.
[10] D. Jha et al., “Kvasir-SEG: A segmented polyp dataset,” in Proc. of International Conference on Multimedia Modeling (MMM), 2020, pp. 451–462.
[11] D. Jha, S. A. Hicks, K. Emanuelsen, H. Johansen, D. Johansen, T. de Lange, M. A. Riegler, and P. Halvorsen, “Medico multimedia task at MediaEval 2020: Automatic polyp segmentation,” in CEUR Proceedings of MediaEval Workshop, 2020.
[12] S. A. Hicks et al., “The EndoTect 2020 challenge: Evaluation and comparison of classification, segmentation and inference time for endoscopy,” in Proceedings of ICPR 2020 Workshops and Challenges, 2020.
[13] D. Jha et al., “Kvasir-Instrument: Diagnostic and therapeutic tool segmentation dataset in gastrointestinal endoscopy,” in Proc. of International Conference on Multimedia Modeling (MMM), 2021.
[14] S. A. Karkanis, D. K. Iakovidis, D. E. Maroulis, D. A. Karras, and M. Tzivras, “Computer-aided tumor detection in endoscopic video using color wavelet features,” IEEE Transactions on Information Technology in Biomedicine, vol. 7, no. 3, pp. 141–152, 2003.
[15] S. Ameling, S. Wirth, D. Paulus, G. Lacey, and F. Vilarino, “Texture-based polyp detection in colonoscopy,” in Bildverarbeitung für die Medizin 2009, 2009, pp. 346–350.
[16] J. Bernal, J. Sánchez, and F. Vilarino, “Towards automatic polyp detection with a polyp appearance model,” Pattern Recognition, vol. 45, no. 9, pp. 3166–3182, 2012.
[17] X. Jia, X. Xing, Y. Yuan, L. Xing, and M. Q.-H. Meng, “Wireless capsule endoscopy: A new tool for cancer screening in the colon with deep-learning-based polyp recognition,” Proceedings of the IEEE, vol. 108, no. 1, pp. 178–197, 2019.
[18] V. Prasath, “Polyp detection and segmentation from video capsule endoscopy: A review,” Journal of Imaging, vol. 3, no. 1, p. 1, 2017.
[19] N. K. Tomar et al., “FANet: A feedback attention network for improved biomedical image segmentation,” arXiv preprint arXiv:2103.17235, 2021.
[20] Y. Guo, J. Bernal, and B. J. Matuszewski, “Polyp segmentation with fully convolutional deep neural networks—extended evaluation study,” Journal of Imaging, vol. 6, no. 7, p. 69, 2020.
[21] S. Ali et al., “Deep learning for detection and segmentation of artefact and disease instances in gastrointestinal endoscopy,” Medical Image Analysis, p. 102002, 2021.
[22] D.-P. Fan et al., “PraNet: Parallel reverse attention network for polyp segmentation,” in Proc. of International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2020, pp. 263–273.
[23] D. Jha et al., “A comprehensive study on colorectal polyp segmentation with ResUNet++, conditional random field and test-time augmentation,” IEEE Journal of Biomedical and Health Informatics.
[24] D. Jha, P. H. Smedsrud, M. A. Riegler, D. Johansen, T. de Lange, P. Halvorsen, and H. D. Johansen, “ResUNet++: An advanced architecture for medical image segmentation,” in Proc. of IEEE International Symposium on Multimedia (ISM), 2019, pp. 225–2255.
[25] D. Jha, M. A. Riegler, D. Johansen, P. Halvorsen, and H. D. Johansen, “DoubleU-Net: A deep convolutional neural network for medical image segmentation,” in Proc. of IEEE International Symposium on Computer-Based Medical Systems (CBMS), 2020, pp. 558–564.
[26] J. Y. Lee et al., “Real-time detection of colon polyps during colonoscopy using deep learning: systematic validation with four independent datasets,” Scientific Reports, vol. 10, no. 1, pp. 1–9, 2020.
[27] M. Yamada et al., “Development of a real-time endoscopic image diagnosis support system using deep learning technology in colonoscopy,” Scientific Reports, vol. 9, no. 1, pp. 1–9, 2019.
[28] C. C. Poon et al., “AI-doscopist: a real-time deep-learning-based algorithm for localising polyps in colonoscopy videos with edge computing devices,” NPJ Digital Medicine, vol. 3, no. 1, pp. 1–8, 2020.
[29] Z.-L. Ni et al., “BARNet: Bilinear attention network with adaptive receptive field for surgical instrument segmentation,” arXiv preprint arXiv:2001.07093, 2020.
[30] Y. Wang et al., “LEDNet: A lightweight encoder-decoder network for real-time semantic segmentation,” in Proc. of IEEE International Conference on Image Processing (ICIP), 2019, pp. 1860–1864.
[31] N. Beheshti and L. Johnsson, “Squeeze U-Net: A memory and energy efficient image segmentation network,” in Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020, pp. 364–365.
[32] E. Romera, J. M. Alvarez, L. M. Bergasa, and R. Arroyo, “ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation,” IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 1, pp. 263–272, 2017.
[33] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted residuals and linear bottlenecks,” in Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 4510–4520.
[34] J. Deng et al., “ImageNet: A large-scale hierarchical image database,” in Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 248–255.
[35] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[36] J. Bernal et al., “WM-DOVA maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians,” Computerized Medical Imaging and Graphics, vol. 43, pp. 99–111, 2015.
[37] P. H. Smedsrud et al., “Kvasir-Capsule, a video capsule endoscopy dataset,” Springer Nature Scientific Data, 2021.
[38] Z. Zhang, Q. Liu, and Y. Wang, “Road extraction by deep residual U-Net,” IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 5, pp. 749–753, 2018.
[39] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. of International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015, pp. 234–241.
[40] M. Abadi et al., “TensorFlow: A system for large-scale machine learning,” in Proc. of USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2016, pp. 265–283.
[41] T. Dozat, “Incorporating Nesterov momentum into Adam,” in Proc. of International Conference on Learning Representations, 2016.