
www.nature.com/npjdigitalmed

ARTICLE  OPEN

AI-doscopist: a real-time deep-learning-based algorithm for localising polyps in colonoscopy videos with edge computing devices
Carmen C. Y. Poon 1 ✉, Yuqi Jiang1, Ruikai Zhang1, Winnie W. Y. Lo1, Maggie S. H. Cheung2, Ruoxi Yu1, Yali Zheng1,3, John C. T. Wong4,
Qing Liu5, Sunny H. Wong4, Tony W. C. Mak6 and James Y. W. Lau2 ✉

We have designed a deep-learning model, an "Artificial Intelligent Endoscopist (a.k.a. AI-doscopist)", to localise colonic neoplasia during colonoscopy. This study aims to evaluate the agreement between endoscopists and AI-doscopist for colorectal neoplasm localisation. AI-doscopist was pre-trained on 1.2 million non-medical images and fine-tuned on 291,090 colonoscopy and non-medical images. The colonoscopy images were obtained from six databases, in which the colonoscopy images were classified into 13 categories and the polyps' locations were marked image-by-image with the smallest bounding boxes. Seven categories of non-medical images, which were believed to share some common features with colorectal polyps, were downloaded from an online search engine. Written informed consent was obtained from 144 patients who underwent colonoscopy, and their full colonoscopy videos were prospectively recorded for evaluation. A total of 128 suspicious lesions were resected or biopsied for histological confirmation. When evaluated image-by-image on the 144 full colonoscopies, the specificity of AI-doscopist was 93.3%. AI-doscopist was able to localise 124 out of 128 polyps (polyp-based sensitivity = 96.9%). Furthermore, after reviewing the suspected regions highlighted by AI-doscopist in a 102-patient cohort, an endoscopist recognised with high confidence four missed polyps in three patients who were not diagnosed with any lesion during their original colonoscopies. In summary, AI-doscopist can localise 96.9% of the polyps resected by the endoscopists. If AI-doscopist were to be used in real time, it could potentially assist endoscopists in detecting one more patient with a polyp in every 20–33 colonoscopies.
npj Digital Medicine (2020)3:73; https://doi.org/10.1038/s41746-020-0281-z

INTRODUCTION
Colorectal cancer (CRC) is among the three commonest cancers worldwide, with an estimated 1.8 million new diagnoses and 881 thousand deaths in 2018 (ref. 1). Colonoscopy can effectively reduce CRC incidence and mortality, but is contingent on a high-quality examination. Polyps that are diminutive in size (<5 mm), sessile in type and flat in shape are more frequently missed during colonoscopy2. Human factors such as visual fatigue and inadvertent oversight were also found to contribute to missed lesions. For example, one study showed that polyp detection rates decline over time during an endoscopist's working day, by ~4.6% per hour3. An automated tool can assist endoscopists by highlighting a region of a possible polyp during colonoscopy, thus maximizing the quality of colonoscopy, as illustrated in Fig. 1.

Although computer-aided detection methods for polyp detection have been actively studied in the past, most of them were based on hand-crafted feature engineering4,5. These methods require strong domain knowledge and are less robust to background noise. The advantage of hand-crafted features is that their predictions are easier to explain. Some of these methods can even achieve near real-time performance (at 10 frames per second, fps)6. On the other hand, the recent explosion of data opens up new opportunities for applying deep-learning models to a range of computing tasks. Deep convolutional neural networks (CNNs) require large amounts of data for training; however, with sufficient training, deep features can be stored in the model and used to classify or detect different objects. These models can achieve promising results even if the same class of objects possesses very different features7. Therefore, deep-learning models have been shown to be useful in a variety of tasks in both non-medical7 and medical domains8, including the classification of diminutive colorectal polyps9,10.

Based on our previous work on using deep-learning models to detect and localise colorectal lesions in colonoscopy videos11, we aim to evaluate in this study the agreement between endoscopists and the AI-doscopist (Artificial Intelligent Endoscopist), a deep-learning-based computer-aided model we developed for colorectal lesion localisation.

RESULTS
Results from image-based analysis
We evaluated the proposed model on different platforms. When the input image resolution was fixed at 608 × 608, the model ran at around 28 frames per second (fps) on an Nvidia GTX 1080Ti and at 37 fps on an Nvidia GTX 2080Ti.

1 Division of Biomedical Engineering Research, Department of Surgery, The Chinese University of Hong Kong, Hong Kong SAR, People's Republic of China. 2 Division of Vascular and General Surgery, Department of Surgery, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong SAR, People's Republic of China. 3 College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, People's Republic of China. 4 Division of Gastroenterology and Hepatology, Department of Medicine and Therapeutics, Institute of Digestive Disease, The Chinese University of Hong Kong, Hong Kong SAR, People's Republic of China. 5 Department of Electrical and Electronic Engineering, Xi'an Jiaotong-Liverpool University, Suzhou, People's Republic of China. 6 Division of Colorectal Surgery, Department of Surgery, The Chinese University of Hong Kong, Hong Kong SAR, People's Republic of China. ✉email: cpoon@surgery.cuhk.edu.hk; laujyw@surgery.cuhk.edu.hk


Fig. 1 An illustration of the future use of AI-doscopist, a.k.a. "Artificial Intelligent Endoscopist", during colonoscopy. Colonoscopy can effectively reduce CRC incidence and mortality, but is contingent on a high-quality examination. Polyps that are diminutive in size (<5 mm), sessile in type and flat in shape are more frequently missed during colonoscopy. To maximize the quality of colonoscopy, an automated tool is designed to assist endoscopists by highlighting regions of a possible polyp during colonoscopy.

Fig. 2 The image-based performance of AI-doscopist on Dataset B under different training schemes. (a) The Receiver Operating Characteristic (ROC) curves and (b) the Precision–Recall curves. In Training Scheme a, AI-doscopist learnt only the spatial features from a random subset of 33,819 original colonoscopy images. In Training Scheme b, the training set was enlarged to a random subset of 119,703 original colonoscopy and non-medical images. In Training Scheme c, AI-doscopist learnt both the spatial and temporal features from a random subset of 119,703 original colonoscopy and non-medical images. In Training Scheme d, the spatial and temporal features were learnt from a larger, random subset of 191,493 colonoscopy and non-medical images. A total of 34,469 images were used for validation in each case.

Figure 2a, b presents the Receiver Operating Characteristic (ROC) curves and the Precision–Recall (PR) curves, respectively, for AI-doscopist on the testing dataset under the different training schemes. The model trained using Scheme d (threshold = 0.1) was selected based on its performance on the validation dataset and used for further analysis. The selected model achieved an image-based sensitivity of 72.6% and a specificity of 93.3% when evaluated on Dataset B. Its accuracy and precision were 92.0% and 14.7%, respectively. Table 1 shows the evaluation performance of AI-doscopist on the different testing datasets, using Training Scheme d and the selected threshold of 0.1.

Results from polyp-based analysis
Figure 3 shows the polyp-based evaluation of AI-doscopist under different training schemes. On average, AI-doscopist correctly localised a polyp for 15.0 out of 20.6 s. For video clips without a polyp, AI-doscopist falsely detected an object for 1.0 out of 20.6 s. AI-doscopist correctly localised 124 out of 128 polyps (polyp-based sensitivity = 96.9%) when n = 16% (i.e. a polyp was considered correctly localised if it was detected in at least 16% of the frames of a video clip). When the same criterion was used to evaluate 140 video clips randomly selected from 70 patients who had no lesions detected, AI-doscopist made 10 false detections out of 140 clips (polyp-based specificity = 92.9%). On average, 147.2 frames (5.9 s) were falsely detected in each of these 10 video clips.

Estimation of potential increase in polyp detection rate (PDR)
PDR is defined as the number of patients found with at least one polyp divided by the total number of patients who underwent colonoscopy. For Dataset C, the endoscopists found at least one polyp in 62 patients (total number of polyps = 130). No polyp was found in the remaining 40 patients, and their full colonoscopies were screened by AI-doscopist off-line after colonoscopy. The regions highlighted by AI-doscopist were then reviewed a second time by an endoscopist. The endoscopist confirmed with high confidence that four regions highlighted by AI-doscopist in three patients were possible polyps. Another four regions in another two patients were confirmed with low confidence as possible polyps. Therefore, if AI-doscopist were to be used in real time, the estimated increase in PDR is around 3–5%, as summarised in Table 2.

Table 1. Image-based evaluation results of AI-doscopist using training scheme d.

Dataset     | No. of polyp images | No. of non-lesion images | True-positives | False-negatives | True-negatives | False-positives | Image-based sensitivity | Image-based specificity
Dataset A   | 4313                | 13,261                   | 3106           | 1207            | 12,880         | 480             | 72.0%                   | 97.1%
Dataset B   | 65,958              | 3,603,892                | 47,877         | 18,082          | 3,363,076      | 277,407         | 72.6%                   | 93.3%
Dataset B.1 | 65,958              | N/A                      | 47,877         | 18,082          | N/A            | N/A             | 72.6%                   | N/A
Dataset B.2 | N/A                 | 69,157                   | N/A            | N/A             | 72,238         | 3514            | N/A                     | 95.7%

Fig. 3 The polyp-based performance of AI-doscopist on Datasets B.1 and B.2 under different training schemes. Although Training Schemes b, c, and d resulted in significantly different performances in the image-based analysis (as shown in Fig. 2), their performances are comparable in the polyp-based analysis.

Table 2. Estimated increase in polyp detection rate based on the evaluation on Dataset C.

                                            | 1st diagnosis by endoscopist during colonoscopy | 2nd review by an endoscopist after screening by AI-doscopist (high confidence) | 2nd review (high or low confidence)
No. of patients diagnosed with a polyp      | 62                                              | 65 (=62 + 3)                                                                   | 67 (=62 + 5)
No. of patients without any lesion detected | 40                                              | 37 (=40 − 3)                                                                   | 35 (=40 − 5)
No. of polyps detected                      | 130                                             | 134 (=130 + 4)                                                                 | 138 (=130 + 8)
Polyp detection rate                        | 60.8%                                           | 63.7%                                                                          | 65.7%
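To make the arithmetic behind Table 2 explicit, the short sketch below recomputes the PDR figures for the 102-patient cohort and the "one more patient in every 20–33 colonoscopies" estimate. All numbers come from Table 2; the function and variable names are our own illustration, not part of the original analysis code.

```python
def pdr(patients_with_polyp: int, total_patients: int) -> float:
    """Polyp detection rate: patients with at least one polyp / all patients."""
    return patients_with_polyp / total_patients

baseline    = pdr(62, 102)      # 0.608 -> 60.8% (endoscopist alone)
high_conf   = pdr(62 + 3, 102)  # 0.637 -> 63.7% (plus high-confidence reviews)
high_or_low = pdr(62 + 5, 102)  # 0.657 -> 65.7% (plus low-confidence reviews)

# An absolute PDR increase of 3-5% corresponds to one additional patient
# with a polyp in roughly every 1/0.05 = 20 to 1/0.03 ~ 33 colonoscopies.
```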

DISCUSSION
Using deep learning in endoscopy has been gaining interest in the research community12. Compared to previous studies in this area, our study contributes uniquely in the following aspects. We explicitly trained our model using data obtained from multiple databases collected from different regions of the world, including colonoscopy and non-medical databases collected by our own research group. Different training schemes were proposed and tested on the same dataset, which includes over 3.71 million images from the full colonoscopy videos of 144 patients, labelled using information obtained from 144 endoscopy reports and 70 pathology reports. No images or videos were preselected manually for testing; rather, the full colonoscopy videos were evaluated in both the image-based and polyp-based analyses. Moreover, the training and testing datasets in our study were obtained from completely different patients. Therefore, the evaluation of our model is very close to real practice, providing solid evidence for carrying out a prospective study of AI-doscopist in a real clinical setting. Since our method was trained on images from around the world, it is robust to different endoscopy settings, scopes, and instruments.

Although a number of studies have been conducted in this area, the evaluation methods, datasets, and metrics varied from study to study. As a result, comparison between different studies is not straightforward. Most studies trained and evaluated their methods on preselected still images and are not comparable to our study objectives and design. Two recent publications evaluated computer-aided diagnosis algorithms on full colonoscopies or colonoscopy video clips13,14. One presented an algorithm based on SegNet which, after being trained and tested on their own colonoscopy images and videos, achieved over 90% in both image-based sensitivity and specificity13. The same model achieved a sensitivity of 88% when tested on a public database (CVC-ClinicDB)13.

The other publication presented the evaluation of a system for detecting, rather than localising, polyps in colonoscopy, which achieved an image-based sensitivity and specificity of 90.0% and 63.3%, respectively14. It detected 94% (47 out of 50) of the polyps, but also produced false-positive detections in 60% of 85 short non-lesion video clips. These results suggest that one must watch for a tendency towards over-diagnosis in artificial intelligence systems.

Our proposed algorithm correctly localised 124 out of 128 polyps (polyp-based sensitivity = 96.9%) and missed four polyps (Fig. 3). It resulted in only 7.1% false detections in short video clips (10 out of 140), which is considerably lower than previous work14. Our evaluation method demonstrated that AI-doscopist can correctly localise most of the polyps; however, it cannot localise the same polyp in every frame. This is consistent with the general experience of endoscopists, who often need to orbit around a suspicious lesion before they can make a judgement. Furthermore, we have also included an estimation of the potential improvement in PDR if AI-doscopist were to be used back-to-back with conventional colonoscopy. Based on our evaluation on Dataset C, we postulate that there can be a 3–5% increase in PDR. That is, AI-doscopist can possibly help endoscopists to detect one more patient with a polyp in every 20–33 colonoscopies. This is given that endoscopists are confident enough to resect polyps missed by AI-doscopist, and it remains to be verified in a future study.

Although the precision of AI-doscopist seems relatively low (<0.3), one should take into account that, in the full colonoscopies, images without a polyp greatly outnumber those with a polyp (≈56:1). Correct predictions were made in 47,877 out of 65,958 (72.6%) polyp images, whereas only 277,407 false predictions were made in 3,776,900 regions without a lesion (7.3%). The image-based specificities for the evaluations on Datasets A, B, and B.2 were 97.1%, 93.3%, and 95.7%, respectively (Table 1). The polyp-based specificity for the evaluation on Dataset B.2 was 92.9% (=100% − 7.1%; Fig. 3). The image-based analysis suggested that the model was detecting one suspicious object every second (at 25 fps). Nevertheless, the polyp-based analysis suggested that, when one considers short video clips of 20 s, only 7.1% of these video clips had an object detected for more than 3.2 s (=20.6 s × 16%). Therefore, to confirm whether a polyp has been detected by AI-doscopist, the endoscopist can orbit around a suspicious region for at least 3 s (up to 15–20 s) during colonoscopy to reduce false positives. Furthermore, "false positives" in this study include (1) missed polyps; (2) hyperplastic or other polyps, which were detected but not resected; and (3) polyps or resected polyps localised during polypectomy or removal from the colon, for which we did not label the images owing to limited manpower. Therefore, it is expected that the true precision and specificity would be higher if AI-doscopist were to be run in real time during colonoscopy.

Moreover, we labelled our gold standard frame-by-frame by rewinding the videos from the start of the biopsy of a polyp to the first appearance of that polyp. Note that this is a very tough criterion compared to previous studies, which typically asked multiple endoscopists to confirm the existence of polyps in each endoscopic image. When labelling the gold standard in our study, some videos were played forward and backward multiple times before the labelling could be confirmed. It is suspected that, if each endoscopic image were independently reviewed by an endoscopist, some of the polyps might not be accurately located in the blurry frames of the video clips. To the best of our knowledge, most previous papers did not report whether the gold standard was labelled in frames recorded during motion or out of focus. This is suspected to be one of the major reasons for the differences in the reported performance metrics between our study and previous studies.

It is necessary to standardise the evaluation scheme for different computer-aided diagnosis systems in this area. Setting an evaluation guideline will help end-users to select the best system. In this study, we presented the definitions of TP, TN, FP, FN, polyp-based sensitivity, and image-based specificity in the Evaluation Metrics sections. Note that some studies in the engineering domain defined image-based specificity as TN/(TN + FP)15, while a number of recent studies defined it as TN/(total number of non-lesion images)13,14. The former definition results in a lower specificity if multiple regions are wrongly identified in the same frame, whereas the latter definition does not take into account multiple false detections in the same frame. We adopted the latter definition in this study since we found that it better reflects the experience of an endoscopist in practice.

In summary, we presented the image-based and polyp-based evaluation results of a real-time artificial intelligence algorithm for localising polyps in colonoscopy videos, using different medical and non-medical datasets for training. We tested AI-doscopist on the full colonoscopies of 144 patients. AI-doscopist correctly localised 124 out of 128 polyps (polyp-based sensitivity = 96.9%), missed four polyps, and achieved an image-based specificity of 93.3%. If AI-doscopist were to be used as a second observer during colonoscopy, it could potentially help endoscopists to detect one more patient with a polyp in every 20–33 colonoscopies. Benefits of the use of AI-doscopist in improving adenoma detection rate, compared with other related techniques such as Endocuff, need to be verified in future prospective studies.

METHODS
Algorithm description
AI-doscopist was constructed based on one of our earlier works11, which was built from ResNet50 (ref. 16), YOLOv2 (ref. 17), and a temporal tracking algorithm. The model was found to perform reasonably well, with a good trade-off between speed and accuracy. As shown in Fig. 4, AI-doscopist adopted ResNet50 as the feature extractor16. ResNet50 is constructed from 16 residual blocks, each consisting of three convolutional layers with different channel widths and strides. We modified the ResNet50 architecture by changing the channel width of the last convolutional layer and by adding two convolutional layers. Furthermore, we added a routing layer to retain the high-resolution feature maps for concatenation. YOLOv2 (ref. 17) is a one-stage object detection system targeted at real-time processing. It divides the input image into a certain number of grid cells and predicts the confidence and the location of an object in each cell using a single regression-based CNN structure. The dimension of the output layer of the combined structure was determined by the number of grids, the number of classes, and the number of predefined anchors. YOLOv2 was found to be useful for the current application since a polyp can appear at different spatial locations in an image. Prediction boxes that were unlikely to be polyps were removed, and overlapping prediction boxes were combined using the non-maximum suppression method. Temporal information was incorporated by taking the majority vote of the prediction results within a sliding window of six consecutive frames.
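To illustrate the post-processing just described, the sketch below implements a greedy non-maximum suppression and a majority vote over a sliding window of per-frame detections. This is a minimal illustration rather than the authors' released code: the IoU threshold of 0.5 and all function names are our assumptions, and the grid/anchor figures in the first comment are generic YOLOv2 conventions, not values reported in the paper.

```python
import numpy as np

# For a YOLOv2-style head, the output size per image is
# grids_x * grids_y * num_anchors * (5 + num_classes),
# where the 5 covers the box coordinates plus an objectness score.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box and
    drop any remaining box that overlaps it by more than iou_thresh."""
    order = list(np.argsort(scores)[::-1])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep

def temporal_vote(frame_flags):
    """Majority vote over per-frame detection flags within a sliding window;
    the paper uses a window of six consecutive frames."""
    return sum(frame_flags) > len(frame_flags) // 2
```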
The backbone network of AI-doscopist was first pre-trained with 1.2 million non-medical images collected from the public online database ImageNet18. Additional learning on the training dataset for 90 epochs used stochastic gradient descent with a learning rate of 0.001, a weight decay of 0.0005 and a momentum of 0.9. All learned weights were monitored on the validation dataset to avoid overfitting. The learned weights that gave the highest sensitivity, given a specificity over 0.9 when evaluated on the validation dataset, were selected as the final model for testing.
before the labelling can be confirmed. It is suspected that if each
endoscopic image were independently reviewed by an endosco-
pist, some of the polyps may not be accurately located in the Training and validation datasets
blurry frames of the video clips. To our best knowledge, most of The training and validation datasets to fine-tune AI-doscopist consisted of
the previous papers did not report whether the gold standard was colonoscopy and non-medical images. The images were obtained from
labelled in frames that are recorded during motion or out of focus. seven databases around the world, including four public online colono-
This is suspected to be one of the major reasons causing the scopy databases, two private databases formed by colonoscopy images/
videos from two local hospitals, and one non-medical database. Table 3
differences in the reported performance metrics between our
summarises the number of images in each of the 7 databases: (1) CVC-
study and previous studies. ColonDB19, (2) CVC-ClinicDB20, (3) ETIS-LaribDB21, (4) AsuMayoDB22, (5) CU-
It is necessary to standardise the evaluation scheme for ColonDB9, (6) ACP-ColonDB530, and (7) Selected Google Images. Details of
different computer-aided diagnosis systems in this area. Setting the first five databases have been described in our previous studies9,11,23.
an evaluation guideline will help end-user to select the best As most of the images in the previous five databases consisted of images
system. In this study, we presented the definition of TP, TN, FP, FN, with polyps, we constructed the sixth database from videos of


Fig. 4 An overview of the algorithm design of AI-doscopist. AI-doscopist was constructed based on ResNet50, YOLOv2, and a temporal tracker. The model was found to perform reasonably well, with a good trade-off between speed and accuracy, in earlier studies. The feature extractor was adopted from a modified version of ResNet50. A one-stage object detector, YOLOv2, was selected for localising objects in each image in real time. Predicted boxes that were unlikely to be polyps were removed, and overlapping predicted boxes were combined using the non-maximum suppression method. Temporal information was incorporated by taking the majority vote of the prediction results within a sliding window.

Table 3. Summary of the number of images used for training and validating AI-doscopist.

Name of database            | Training: polyp images | Training: non-lesion images | Validation: polyp images | Validation: non-lesion images
CVC-ColonDB                 | 297     | N/A     | 82     | N/A
CVC-ClinicDB                | 485     | N/A     | 127    | N/A
ETISDB                      | 150     | N/A     | 46     | N/A
AsuMayoDBTrain              | 3237    | 1842    | 619    | 55
CU-ColonDB                  | 634     | N/A     | 164    | N/A
ACP-ColonDB530              | 72,350  | 116,250 | 13,973 | 19,403
Selected Google Images      | N/A     | 2893    | N/A    | N/A
Total (before augmentation) | 77,153  | 120,985 | 15,011 | 19,458
Total (after augmentation)  | 160,618 | 130,472 | N/A    | N/A

As most of the images in the first five databases contained polyps, we constructed the sixth database, ACP-ColonDB530, from videos of colonoscopies collected at our Endoscopy Centre. To construct this database, written informed consent was obtained from patients before colonoscopy between June and October 2017. Excluding 19 patients with an abnormality found but no biopsy taken, 133 patients with corrupted or missing videos, and 14 patients whose lesions could not be labelled, 364 patients were included in this database. Data from 220 patients were used for training and validation (ACP-ColonDB530-Train), while data from 144 patients (68.0 ± 8.8 years old; 69 males) were used for testing (ACP-ColonDB530-Test). A total of 110 h of colonoscopy videos were recorded from the 364 patients by seven endoscopists.
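As a sanity check on the patient accounting, the excluded and included counts add up as follows; the total of 530 recruited patients is our inference from the "530" in the database name, not a figure stated explicitly in the text.

```python
# Figures from the text; the 530 total is our inference from ACP-ColonDB530.
excluded = 19 + 133 + 14   # no biopsy, corrupted/missing videos, unlabellable lesions
included = 220 + 144       # training/validation patients + testing patients
assert included == 364
print(included + excluded) # 530
```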

The objects found in the colonoscopy images were classified into 13 categories, namely "Adenomatous Polyp", "Hyperplastic Polyp", "Other Polyp", "Bleeding", "Lumen", "IC Valve", "Normal Colon Structure", "Instrument", "Stool", "Bubble", "Artefact", "Inside Colon Background", and "Outside Colon Background".
The total length of the colonoscopy videos collected for the training dataset was 57 h. We included as many images with a polyp as possible (72,350 images). To maintain a relatively balanced ratio between images with and without a polyp, we randomly selected 116,250 images without a polyp for training. Most of these images were selected by running the training dataset through an earlier version of AI-doscopist. "False positives" were manually checked and re-labelled to other categories. "False negatives" were confirmed, and other non-polyp labels that could possibly affect the localisation of the polyp were added to the same image. "True negatives" were randomly selected for inclusion in training. The selection ratio was around 3.7%, a trade-off between acceptable performance, labelling effort, and the time required for training.
In addition, 2893 non-medical images were obtained from Google for training AI-doscopist simultaneously with the colonoscopy images. These images were found to share common features with colorectal polyps, and we therefore hypothesised that training AI-doscopist with them could improve the polyp localisation performance. Specifically, the images were searched online using keywords that described a polyp. Objects included were "blood vessels", "fingers", "skin", "eggs", "nuts", "red meats", …, and "tomatoes". The images were broadly classified into seven categories, namely "Cell", "Food", "Body", "Nature", "Plant", "Pattern", and "Others".
As summarised in Table 3, the images were divided into training and validation subsets. The ratio of colonoscopy images used for training to validation was around 6:1. In particular, from ACP-ColonDB530, 182 patients (160 polyps) and another 38 patients (32 polyps) were used for training and validation, respectively. The training subset was further augmented by random rotation (0°, 90°, 180°, and 270°), flipping (horizontal and vertical), Gaussian smoothing (sigma ranging from 0.5 to 2), or different combinations of these operations, as sketched below. The number of images with a polyp was increased from 77,153 to 160,618, and the number without a polyp from 120,985 to 130,472.
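A minimal sketch of one such augmentation pass using PIL is given below. Note two assumptions on our part: in a real detection pipeline the bounding-box labels would have to be rotated and flipped together with the image (omitted here), and PIL's GaussianBlur radius is used as an approximation of the stated sigma.

```python
import random
from PIL import Image, ImageFilter

def augment(img: Image.Image) -> Image.Image:
    """One random combination of the augmentations described above:
    rotation by a multiple of 90 degrees, horizontal/vertical flips,
    and Gaussian smoothing with sigma drawn from [0.5, 2]."""
    img = img.rotate(random.choice([0, 90, 180, 270]))
    if random.random() < 0.5:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    if random.random() < 0.5:
        img = img.transpose(Image.FLIP_TOP_BOTTOM)
    if random.random() < 0.5:
        img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.5, 2.0)))
    return img
```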
Four training schemes were used: (a) only spatial features were learnt from a random subset of 33,819 original colonoscopy images; (b) only spatial features were learnt from a random subset of 119,703 original colonoscopy and non-medical images; (c) both spatial and temporal features were learnt from a random subset of 119,703 original colonoscopy and non-medical images; and (d) both spatial and temporal features were learnt from a random subset of 191,493 colonoscopy and non-medical images. A total of 34,469 images were used for validation in each case.
This study and the recording of the endoscopic videos were approved by the Clinical Trial Ethics Committee of The Chinese University of Hong Kong (CREC 2017.064). Written informed consent was obtained from 144 patients who underwent colonoscopy, and their full colonoscopy videos were prospectively recorded for evaluation.

Study protocol
After pre-training and fine-tuning AI-doscopist, we evaluated its performance on a public database (Dataset A), as well as on 144 full colonoscopies (Dataset B). Furthermore, a private database consisting of 102 full colonoscopy videos (Dataset C) was used to estimate the potential increase in PDR if AI-doscopist were to be used in real-time screening.
To compare the performance of AI-doscopist with existing algorithms, we first evaluated it on a public online database, AsuMayoDB (Dataset A). AsuMayoDB was originally used for the MICCAI endoscopic vision challenge in 2015, and a number of algorithms have reported their performance on this database; evaluating AI-doscopist on AsuMayoDB therefore allows a direct comparison with existing algorithms. Besides the 20 videos used for training and validating the algorithm, 18 short colonoscopy video clips from AsuMayoDB have been designated for algorithm testing22. Nine videos have one polyp each and the rest have no polyps. A total of 4313 polyp images and 13,261 non-lesion images were extracted from the 18 videos for evaluation in this study. Each frame in the videos has a corresponding reference image marked with a binary mask: black regions indicate non-lesion areas, while white regions represent the polyp area. The reference images were originally created by Arizona State University.
As aforementioned, data from the 144 patients of ACP-ColonDB530-Test were used for evaluation (Dataset B). Their colonoscopy videos were recorded in MP4 format at 25 fps. Resected tissues were sent for histological diagnosis and used as the gold standard. Among them, 128 polyps were found in 70 patients. According to the histological analysis, the 128 polyps comprised 110 adenomatous, 10 hyperplastic, and 8 mucosal polyps; adenomatous polyps thus contributed 85.9% of this test dataset. No polyp was found in the other 74 patients.
As shown in Fig. 5, three timepoints were marked for each full colonoscopy video collected: (1) the first appearance of the polyp; (2) the start of the biopsy/polypectomy procedure, confirmed by the first appearance of an endoscopic tool; and (3) the end of the biopsy/polypectomy procedure.
Two subsets were generated from Dataset B. Dataset B.1 contained 128 short video clips, each starting with the first appearance of a polyp and ending at the beginning of the polypectomy of that polyp. Dataset B.2 consisted of 140 short video clips extracted from the 70 patients without any detected polyp.


Fig. 5 An illustration of video clips and the types of images found in a full colonoscopy video. Three timepoints were marked for each full colonoscopy video: (1) the first appearance of the polyp; (2) the start of the biopsy/polypectomy procedure, confirmed by the first appearance of an endoscopic tool; and (3) the end of the biopsy/polypectomy procedure. No timepoints were recorded when no polyp was found in a colonoscopy video. The colonoscopy images were screened by an earlier version of AI-doscopist. All localised objects were classified into 13 categories, namely "Adenomatous Polyp", "Hyperplastic Polyp", "Other Polyp", "Bleeding", "Lumen", "IC Valve", "Normal Colon Structure", "Instrument", "Stool", "Bubble", "Artefact", "Inside Colon Background", and "Outside Colon Background".

The average duration of the video clips in Dataset B.2 was 20.6 s, which is equivalent to the average duration of the polyp video clips in Dataset B.1. Figure 5 illustrates the types of images found in Datasets B, B.1, and B.2 from the 144-patient cohort.
To estimate the potential increase in PDR with AI-doscopist compared to traditional colonoscopy, an endoscopist was invited to re-examine a subset of highlighted colonoscopy video clips (Dataset C). Dataset C consisted of a 102-patient cohort who underwent colonoscopy from June to July 2017. In this cohort, 62 patients had one or more polypectomies, while 40 had no biopsies taken during their procedures. The videos of the 40 patients who had no biopsies taken were screened by AI-doscopist for potentially missed polyps. The predictions of AI-doscopist were transformed into bounding boxes highlighting suspicious regions and overlaid on the original full colonoscopy. Video clips with highlighted regions were segmented and re-examined by an endoscopist. The protocol is similar to performing a back-to-back colonoscopy. The endoscopist was invited to comment on whether each region highlighted by AI-doscopist correctly identified a polyp, together with his level of confidence (high or low).

Gold standard labelling
Dataset A is an online database in which the gold standard of each image is provided by a binary mask. For Dataset B, the polyp areas were marked image-by-image with a bounding box in each polyp clip. To label each image in the dataset efficiently and accurately, each video clip was first screened using one of our previously developed polyp detection algorithms11. The gold standard was then confirmed by manually fine-tuning the bounding box in each image.

Evaluation metrics for image-based analysis
The prediction generated by AI-doscopist took the form of a six-element vector indicating the class (either a polyp or not), the confidence level, and the centre coordinates, width, and height of the detected object. Only the predicted bounding boxes for the three polyp classes were evaluated in this study. The image-based metrics used to measure the correctness of each predicted bounding box were as follows:
(1) True-positive (TP) counts the number of polyp areas that have at least one prediction box whose centre point falls within the area marked by the ground truth. If the centroids of multiple predicted boxes fall inside the same ground-truth bounding box, it is counted as only one TP.
(2) False-positive (FP) counts, in any image, the number of prediction boxes falling outside the ground-truth polyp area.
(3) True-negative (TN) counts the number of non-lesion images that have no prediction boxes.
(4) False-negative (FN) counts the number of polyp areas for which none of the centroids of the predicted boxes fall within the area marked by the ground truth.
In addition, the image-based sensitivity, specificity, precision, and accuracy were calculated using the following set of equations:

Image-based sensitivity = TP / (TP + FN)
Image-based specificity = TN / (total number of non-lesion images)
Precision = TP / (TP + FP)
Accuracy = (TP + TN) / (TP + TN + FP + FN)

The ROC and PR curves were plotted for the different training schemes of AI-doscopist. Both sets of curves were generated by varying the algorithm threshold from 0.01 to 1.0 in steps of 0.01. The confusion matrix of the predictions was calculated for the selected model.
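The per-frame bookkeeping above can be written compactly as follows. This is a minimal sketch assuming boxes are given as (x1, y1, x2, y2) tuples; the helper names are ours.

```python
def centre_in_box(box, gt):
    """True if the centre of a predicted box falls inside a ground-truth box."""
    cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    return gt[0] <= cx <= gt[2] and gt[1] <= cy <= gt[3]

def score_frame(pred_boxes, gt_boxes):
    """TP/FP/FN for one frame per the definitions above: a ground-truth polyp
    hit by one or more box centres counts as a single TP; every predicted box
    whose centre lies outside all ground-truth boxes counts as an FP."""
    tp = sum(any(centre_in_box(b, gt) for b in pred_boxes) for gt in gt_boxes)
    fn = len(gt_boxes) - tp
    fp = sum(not any(centre_in_box(b, gt) for gt in gt_boxes) for b in pred_boxes)
    return tp, fp, fn

# Aggregating over all frames (TN = non-lesion images with no prediction box):
#   sensitivity = TP / (TP + FN)
#   specificity = TN / (total number of non-lesion images)
#   precision   = TP / (TP + FP)
#   accuracy    = (TP + TN) / (TP + TN + FP + FN)
```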
Evaluation metrics for polyp-based analysis
Furthermore, we analysed the number of polyps that were missed by AI-doscopist. AI-doscopist was considered to have correctly localised a polyp if it made a prediction in at least n% of the frames of a short video clip, and the ROC curves for n ranging from 9 to 44% were plotted. The polyp-based sensitivity is calculated as the number of detected polyps over the total number of polyp clips. The polyp-based specificity is calculated as the number of falsely detected objects over the total number of non-polyp clips.
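The clip-level criterion can likewise be expressed in a few lines; a sketch assuming each clip is represented by a list of per-frame detection flags (the names are ours):

```python
def clip_detected(frame_flags, n_percent=16):
    """A clip counts as a detection if predictions appear in at least n% of
    its frames; the paper sweeps n from 9 to 44 and selects n = 16."""
    return 100.0 * sum(frame_flags) / len(frame_flags) >= n_percent

def polyp_based_sensitivity(polyp_clips, n_percent=16):
    """Fraction of polyp clips in which the polyp is correctly localised."""
    return sum(clip_detected(c, n_percent) for c in polyp_clips) / len(polyp_clips)
```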

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

DATA AVAILABILITY
Data are available on request due to privacy or other restrictions.

CODE AVAILABILITY
The codes are available upon request. Users are required to accept a license agreement before using the codes.

Received: 27 November 2019; Accepted: 28 April 2020

REFERENCES
1. Bray, F. et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 68, 394–424 (2018).
2. Zimmermann-Fraedrich, K. et al. Right-sided location not associated with missed colorectal adenomas in an individual-level reanalysis of tandem colonoscopy studies. Gastroenterology 157, 660 (2019).
3. Leufkens, A. M., van Oijen, M. G. H., Vleggaar, F. P. & Siersema, P. D. Factors influencing the miss rate of polyps in a back-to-back colonoscopy study. Endoscopy 44, 470–475 (2012).
4. Mamonov, A. V., Figueiredo, I. N., Figueiredo, P. N. & Tsai, Y. H. R. Automated polyp detection in colon capsule endoscopy. IEEE Trans. Med. Imaging 33, 1488–1502 (2014).
5. Bae, S. H. & Yoon, K. J. Polyp detection via imbalanced learning and discriminative feature learning. IEEE Trans. Med. Imaging 34, 2379–2393 (2015).
6. Wang, Y., Tavanapong, W., Wong, J., Oh, J. H. & de Groen, P. C. Polyp-Alert: near real-time feedback during colonoscopy. Comput. Methods Programs Biomed. 120, 164–179 (2015).
7. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
8. Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115 (2017).
9. Zhang, R. K. et al. Automatic detection and classification of colorectal polyps by transferring low-level CNN features from nonmedical domain. IEEE J. Biomed. Health Inf. 21, 41–47 (2017).
10. Chen, P. J. et al. Accurate classification of diminutive colorectal polyps using computer-aided analysis. Gastroenterology 154, 568–575 (2018).
11. Zhang, R. K., Zheng, Y. L., Poon, C. C. Y., Shen, D. G. & Lau, J. Y. W. Polyp detection during colonoscopy using a regression-based convolutional neural network with a tracker. Pattern Recogn. 83, 209–219 (2018).
12. Ahmad, O. F. et al. Artificial intelligence and computer-aided diagnosis in colonoscopy: current evidence and future directions. Lancet Gastroenterol. Hepatol. 4, 71–80 (2019).
13. Wang, P. et al. Development and validation of a deep-learning algorithm for the detection of polyps during colonoscopy. Nat. Biomed. Eng. 2, 741–748 (2018).
14. Misawa, M. et al. Artificial intelligence-assisted polyp detection for colonoscopy: initial experience. Gastroenterology 154, 2027 (2018).
15. Bernal, J. et al. Comparative validation of polyp detection methods in video colonoscopy: results from the MICCAI 2015 endoscopic vision challenge. IEEE Trans. Med. Imaging 36, 1231–1249 (2017).
16. He, K., Zhang, X., Ren, S. & Sun, J. In 2016 IEEE Conference on Computer Vision and Pattern Recognition 770–778 (Seattle, 2016).
17. Redmon, J. & Farhadi, A. In 30th IEEE Conference on Computer Vision and Pattern Recognition 6517–6525 (2017).
18. Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015).
19. Bernal, J., Sanchez, J. & Vilarino, F. Towards automatic polyp detection with a polyp appearance model. Pattern Recogn. 45, 3166–3182 (2012).
20. Bernal, J. et al. WM-DOVA maps for accurate polyp highlighting in colonoscopy: validation vs. saliency maps from physicians. Comput. Med. Imaging Graph. 43, 99–111 (2015).
21. Silva, J., Histace, A., Romain, O., Dray, X. & Granado, B. Toward embedded detection of polyps in WCE images for early diagnosis of colorectal cancer. Int. J. Comput. Assist. Radiol. Surg. 9, 283–293 (2014).
22. Tajbakhsh, N., Gurudu, S. R. & Liang, J. M. Automated polyp detection in colonoscopy videos using shape and context information. IEEE Trans. Med. Imaging 35, 630–644 (2016).
23. Zheng, Y. et al. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 4142–4145 (IEEE, 2018).

ACKNOWLEDGEMENTS
The work was supported in part by the Hong Kong General Research Fund and the Hong Kong Innovation and Technology Fund. We are grateful to Surgical Team Three of the Department of Surgery of The Chinese University of Hong Kong for their help in collecting the endoscopy videos at the Prince of Wales Hospital.

AUTHOR CONTRIBUTIONS
C.C.Y.P. contributed to the design of the study, data collection and analysis, paper drafting, and approving the final version of the manuscript. Y.J. and R.Z. contributed equally to the algorithm implementation, data analysis, and paper drafting. W.W.Y.L. contributed to the data analysis and paper drafting. M.S.H.C., R.Y., Y.Z., and Q.L. contributed to the data preparation and analysis. J.C.T.W. and S.H.W. contributed to the study design and data analysis. T.W.C.M. contributed to the study design and patient recruitment. J.Y.W.L. contributed to the study design, patient recruitment, and approving the final version of the manuscript.

COMPETING INTERESTS
The authors are inventors of patents related to the submitted work, and the corresponding author is a director of a spin-off company aiming to commercialise the product.

ADDITIONAL INFORMATION
Supplementary information is available for this paper at https://doi.org/10.1038/s41746-020-0281-z.

Correspondence and requests for materials should be addressed to C.C.Y.P. or J.Y.W.L.

Reprints and permission information is available at http://www.nature.com/reprints

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2020
