Self-supervised Visual Feature Learning for Polyp Segmentation in Colonoscopy Images Using Image Reconstruction as Pretext Task
Le Thi Thu Hong
Institute of Information Technology, AMST
Hanoi, Vietnam
lethithuhong1302@gmail.com

Nguyen Chi Thanh*
Institute of Information Technology, AMST
Hanoi, Vietnam
thanhnc80@gmail.com

Tran Quoc Long
University of Engineering and Technology, VNU
Hanoi, Vietnam
tqlong@gmail.com
Abstract— Automatic polyp detection and segmentation are desirable for colon screening because the polyp miss rate in clinical practice is relatively high. The deep learning-based approach for polyp segmentation has gained much attention in recent years due to its automatic feature extraction process, which segments polyp regions with unprecedented precision. However, training these networks requires a large amount of manually annotated data, which is limited by the available resources of endoscopists. We propose a self-supervised visual feature learning method for polyp segmentation to address this challenge. We adapt self-supervised visual feature learning with image reconstruction as a pretext task and polyp segmentation as a downstream task. UNet is used as the backbone architecture for both the pretext task and the downstream task. An unlabeled colonoscopy image dataset is used to train the pretext network. For polyp segmentation, we apply transfer learning from the pretext network. The polyp segmentation network is trained using a public benchmark dataset for polyp segmentation. Our experiments demonstrate that the proposed self-supervised learning method achieves better segmentation accuracy than a UNet trained from scratch. On the CVC-ColonDB polyp segmentation dataset, with only 300 annotated images, the proposed method improves the IoU metric from 76.87% to 81.99% and the Dice metric from 86.61% to 89.33% for polyp segmentation, compared to the baseline UNet.
Keywords: Polyp segmentation, medical image analysis,
transfer learning, deep learning, self-supervised learning
I. INTRODUCTION

Colonoscopy is considered the gold-standard investigation for colorectal cancer screening. However, the polyp miss rate in clinical practice is relatively high due to various factors [1]. This presents an opportunity to use AI models to automatically detect and segment polyps, supporting clinicians in reducing the number of missed polyps. Recently, deep learning methods have been widely used to solve medical image segmentation problems, including polyp segmentation, due to their capacity for learning image features for the segmentation task [2]. Deep learning methods usually rely on a large amount of training data with manual labels. However, polyp segmentation datasets may not always be available, because annotation usually requires the expert knowledge of endoscopists. Thus, there is growing interest in developing methods that learn features of colonoscopy images without requiring a large number of annotations, avoiding time-consuming and expensive data annotation. Research directions include transfer learning, semi-supervised learning, unsupervised learning, and self-supervised learning. Self-supervised visual feature learning allows visual features to be learned from large-scale unlabeled images.
* Corresponding author
Generally, computer vision pipelines that employ self-supervised feature learning involve two tasks: a pretext task and a real (downstream) task [3]. The pretext task is the self-supervised learning task for learning visual representations. The learned representations or model weights obtained from the pretext task are then used for the downstream task. The real (downstream) task can be any object recognition task, such as classification, detection, or segmentation, with insufficient annotated data samples. Self-supervised learning is a good method for exploiting unlabeled images to improve the performance of a deep model when only limited labeled data is available. This method not only helps to overcome the need for large amounts of annotated data but also helps to improve the robustness and uncertainty estimation of deep convolutional neural networks [4].
In this work, to address the challenge of limited labeled polyp data, we propose a novel method for training a polyp segmentation network, which formulates a self-supervised task for visual feature learning and decreases the cost of data annotation. Image reconstruction is proposed as the pretext task to improve the performance of the real polyp segmentation task. The visual features of colonoscopy images are learned by training a UNet on the image reconstruction pretext task. We use an unlabeled colonoscopy image dataset containing 8,500 images collected from Hospital 103 in Hanoi, Vietnam, to train the pretext network. Pixels and pixel channels (R, G, or B) in the input images are dropped at random, and the original image serves as the label. After self-supervised pretext task training is finished, the learned parameters serve as a pre-trained model and are transferred to the downstream task, polyp segmentation. The CVC-ColonDB [6] dataset, containing 300 labeled polyp segmentation images, was used for finetuning the polyp segmentation network. Our experiments show that the proposed method significantly improves the Dice metric for polyp segmentation compared to the baseline segmentation network. The main contributions of our work can be summarized as follows:
1) We propose a self-supervised feature learning method for training a polyp segmentation network using image reconstruction as a pretext task. In the pretext task, pixels or pixel channels (R, G, or B) in the input images are dropped at random, and the original image serves as the label.
2) The experimental results on a public polyp segmentation dataset show the efficacy of our method. In the experiments, we also study the effect of pretext task complexity and of the polyp segmentation network finetuning method on the performance of the polyp segmentation task.
The rest of the article is organized as follows: Section II reviews related research on deep learning for polyp segmentation and self-supervised learning for medical image analysis. Section III describes our proposed self-supervised visual feature learning method for polyp segmentation in colonoscopy images using image reconstruction as a pretext task in detail. Section IV outlines our experimental settings, results, and discussion. Finally, Section V summarizes and concludes this work.
II. RELATED WORK
A. Polyp segmentation using deep learning methods
A Computer-Aided Diagnosis (CADx) system for polyp segmentation in colonoscopy images can be an effective clinical tool that helps endoscopists achieve faster screening and higher accuracy [2]. However, precise polyp segmentation is still challenging due to variations of polyps in size, shape, texture, and color. As in other medical imaging applications, the deep learning-based approach for polyp segmentation has gained much attention in recent years due to its automatic feature extraction process, which segments polyp regions with unprecedented precision. In addition, public databases of polyp images have facilitated further research on the use of deep learning models for polyp segmentation. Several benchmark datasets are publicly available for training and evaluating deep models. The CVC-ClinicDB [5] dataset consists of 612 images with corresponding ground truth masks of the polyp regions. The Kvasir-SEG dataset [8] includes 1,000 polyp images with corresponding ground truth masks manually annotated by expert endoscopists. The ETIS-Larib [7] dataset contains 196 images covering 36 different types of polyps, with ground truth masks annotated by experts. The CVC-ColonDB [6] dataset consists of 300 polyp images and their corresponding pixel-level annotated polyp masks.

UNet [9], an encoder-decoder structure that uses skip connections to concatenate features from the encoding and decoding layers, is a popular architecture for medical image segmentation tasks, including polyp segmentation. Inspired by the success of UNet, several variants have been proposed for polyp segmentation and have yielded promising results, such as DoubleUNet [10], UNet++ [11], and ResUNet++ [12].
B. Self-supervised learning in the medical imaging domain
Self-supervised learning, which formulates a pretext task on unannotated data for feature learning, has gained increasing popularity in recent years. Various types of pretext tasks have been proposed depending on the data type. For general image and video analysis problems, patch relative position prediction [13, 14], local context prediction [15], colorization [16], and image reconstruction [17] have been used for self-supervised learning. In the medical imaging domain, patients often have follow-up scans, and all unlabeled images are stored on PACS (Picture Archiving and Communication System) servers, while only limited labeled data is available because annotation usually requires expert knowledge. Thus, self-supervised learning is a good way to mine unlabeled images and improve the accuracy of deep neural networks. Over the past years, self-supervised learning has also been explored for medical imaging, although to a lesser extent. Jamaludin et al. [18] proposed a self-supervised learning method for spinal MRIs: they used recognizing whether two spinal MR scans come from the same patient as the pretext task and prediction of the level of vertebral bodies as the real task. Tajbakhsh et al. [19] proposed a self-supervised learning method for lung lobe segmentation and nodule detection tasks by using rotation prediction as a pretext task. Ross et al. [20] proposed a method to exploit the potential of unlabeled endoscopic video data for surgical instrument segmentation: they defined re-colorization of decolorized surgical video frames as the pretext task and used the pre-trained features to initialize a surgical instrument segmentation network. In this article, differently from previous works, we propose a novel self-supervised learning method that uses image reconstruction as the pretext task and polyp segmentation as the real task.
III. PROPOSED METHOD

A. The architecture of the proposed model
Overall, the proposed method, which adapts self-supervised visual feature learning for training a polyp segmentation network, is depicted in Fig. 1. Image reconstruction is used as the pretext task. We use UNet [9] as the backbone architecture for both the pretext and downstream tasks. The UNet architecture was developed for biomedical image segmentation and is increasingly widely used in medical image analysis applications. UNet consists of two paths: an encoder and a decoder.
Fig. 1. Overview of the proposed self-supervised visual feature learning method for polyp segmentation.
The encoder is a typical CNN, generally consisting of a set of convolutional and pooling layers, and is used to capture the context in the image. The decoder is the symmetric expanding path, which enables precise localization using transposed convolutions. MobileNetV2 [24], which is designed to maximize accuracy while being mindful of the restricted resources of on-device or embedded applications, is used as the encoder. Although deeper backbones can give more accurate results, we choose MobileNetV2 as the feature-extraction backbone because it is a lightweight model suitable for on-device or embedded applications with acceptable accuracy. The output layers of the UNet differ between the pretext task and the downstream task: a convolutional layer with 3 filters for the pretext network and a convolutional layer with 1 filter for the downstream task.
The pipeline of the proposed method is as follows. First, we use an unlabeled dataset to train a UNet-based image reconstruction network to learn visual features of colonoscopy images. The transformed images are the input of the reconstruction network, and the original images serve as ground truth labels. The pretext task, image reconstruction, challenges the network to learn visual features from automatically generated pseudo-labels. Then, we transfer the learned features to the polyp segmentation network. Since both the encoder and decoder are trained simultaneously in the pretext task, results on the segmentation task can improve. A minimal sketch of this architecture is given below.
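For illustration, the following is one possible Keras realization of such a UNet with a MobileNetV2 encoder. The skip-connection layer names come from the public Keras MobileNetV2 implementation; the decoder widths, activations, and input size are illustrative placeholders, not necessarily the exact configuration used in our experiments.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_unet(num_output_filters, final_activation, input_shape=(288, 384, 3)):
    inputs = layers.Input(shape=input_shape)
    encoder = tf.keras.applications.MobileNetV2(
        input_tensor=inputs, include_top=False, weights=None)
    # Skip connections from intermediate MobileNetV2 activations
    # (strides 2, 4, 8, and 16 relative to the input).
    skip_names = ["block_1_expand_relu", "block_3_expand_relu",
                  "block_6_expand_relu", "block_13_expand_relu"]
    skips = [encoder.get_layer(n).output for n in skip_names]
    x = encoder.output  # stride-32 bottleneck feature map
    for skip, filters in zip(reversed(skips), [512, 256, 128, 64]):
        x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same")(x)
    # Output head: 3 filters for image reconstruction, 1 for segmentation.
    outputs = layers.Conv2D(num_output_filters, 1,
                            activation=final_activation)(x)
    return Model(inputs, outputs)

pretext_net = build_unet(3, "sigmoid")       # reconstruct the RGB image
segmentation_net = build_unet(1, "sigmoid")  # predict a binary polyp mask
```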
B. Self-supervised visual feature learning
We use colonoscopy image reconstruction as the pretext task for self-supervised learning. The pretext task makes the model learn semantic features of colonoscopy images by creating ground truth labels from known input transformations. We propose two transformation methods to generate self-supervised labels: random pixel drop and random channel drop. Examples of transformed images with pixel drop and channel drop are shown in Fig. 2. With the random pixel drop method, all channels of a pixel are randomly dropped. In the random channel drop method, one of a pixel's red, green, or blue channels is randomly dropped. The drop scale, i.e., the percentage of pixel values dropped, represents the pretext task complexity. We conduct experiments with different drop scales to explore the impact of pretext task complexity on the downstream task.
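For illustration, the two label-generating transforms can be sketched as follows (assuming float RGB images in [0, 1]; the exact sampling in our experiments may differ):

```python
import numpy as np

rng = np.random.default_rng()

def random_pixel_drop(image, drop_scale=0.5):
    """Zero out all channels of a random drop_scale fraction of pixels."""
    h, w, _ = image.shape
    mask = rng.random((h, w)) < drop_scale
    out = image.copy()
    out[mask] = 0.0
    return out

def random_channel_drop(image, drop_scale=0.5):
    """Zero out one random channel (R, G, or B) per selected pixel."""
    h, w, _ = image.shape
    mask = rng.random((h, w)) < drop_scale
    channel = rng.integers(0, 3, size=(h, w))
    out = image.copy()
    ys, xs = np.nonzero(mask)
    out[ys, xs, channel[ys, xs]] = 0.0
    return out

# The transformed image is the network input; the original is the label.
```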
The image reconstruction network is UNet-based; Fig. 3 shows the image reconstruction network used in this work. The input to the network is a transformed image with randomly dropped pixels, and the original image serves as the label. We use the SSIM loss for training the image reconstruction network. SSIM (Structural Similarity Index Measure) [21] is a perceptual image quality assessment between a distorted image and a reference image. The images are divided into multiple square windows, and SSIM is computed in each window as follows:
SSIM(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}   (1)

where x and y are two nonnegative image signals that have been aligned with each other (e.g., windows extracted from each image), \mu_x and \mu_y are the means of x and y, \sigma_x^2 and \sigma_y^2 are their variances, \sigma_{xy} is the covariance of x and y, and C_1, C_2 are constants added to avoid instability. The overall quality measure of the entire image is computed as:

MSSIM(X, Y) = \frac{1}{M} \sum_{j=1}^{M} SSIM(x_j, y_j)   (2)

where X and Y are the reference and the distorted images, respectively, and x_j, y_j are the image contents in the j-th window. The SSIM loss is used to reconstruct images in accordance with human perception and is calculated as:

\mathcal{L}_{SSIM}(X, Y) = 1 - MSSIM(X, Y)   (3)

where X and Y are the prediction and the ground truth, respectively.

Fig. 2. Example of random pixel drop and random channel drop transformations: (a) original image, (b) pixel drop transformed image, (c) channel drop transformed image.

Fig. 3. The proposed self-supervised learning model.
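TensorFlow ships a windowed SSIM implementation, so the loss in Eq. (3) can be sketched directly on top of it (assuming inputs scaled to [0, 1]):

```python
import tensorflow as tf

def ssim_loss(y_true, y_pred):
    # tf.image.ssim returns the mean windowed SSIM per image in the
    # batch (Eq. 2); max_val assumes inputs scaled to [0, 1].
    mssim = tf.image.ssim(y_true, y_pred, max_val=1.0)
    return 1.0 - tf.reduce_mean(mssim)  # Eq. (3), averaged over the batch

# e.g. pretext_net.compile(optimizer="adam", loss=ssim_loss)
```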
The unlabeled colonoscopy image dataset is used to train the image reconstruction network. The network learns general features that capture the salient characteristics of colonoscopy image data. The learned features are then transferred to the downstream network for polyp segmentation.
C. Polyp segmentation using knowledge transferred from the pretext task
After the network is self-trained on colonoscopy image reconstruction, it is transferred to the polyp segmentation task. By changing the last layer of the UNet to match the number of output classes, we repurpose the colonoscopy image reconstruction network for polyp segmentation. We investigate three different methods of transfer learning for the downstream task. The first method is to freeze the weights learned at all network layers except the top layer and finetune only the top layer. The second is to freeze the weights learned at the encoder and finetune only the decoder. The third is to finetune all the weights, including both the encoder and the decoder. Fig. 4 illustrates these methods; the gray area denotes the freezing of the weights learned from the pretext task.
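In Keras, these three regimes amount to transferring the pretext weights and toggling layer trainability. The sketch below builds on the hypothetical build_unet from Section III.A and assumes the pretext weights were saved in HDF5 format, so the mismatched output head can be skipped by name:

```python
import tensorflow as tf

seg_net = build_unet(1, "sigmoid")  # hypothetical builder from Section III.A

# Transfer pretext weights by layer name; the output head (3 vs. 1 filters)
# mismatches and is skipped, so it stays randomly initialized.
seg_net.load_weights("pretext_unet.h5", by_name=True, skip_mismatch=True)

# MobileNetV2 layers carry the same fixed names in any instance, so a fresh
# instance can be used to identify which layers belong to the encoder.
encoder_names = {l.name for l in tf.keras.applications.MobileNetV2(
    include_top=False, weights=None).layers}

def set_finetuning_mode(model, mode):
    """mode: 'top' (a), 'decoder' (b), or 'all' (c), as in Fig. 4."""
    for layer in model.layers:
        if mode == "top":          # train only the final 1x1 conv head
            layer.trainable = layer is model.layers[-1]
        elif mode == "decoder":    # freeze the pretrained encoder
            layer.trainable = layer.name not in encoder_names
        else:                      # 'all': finetune every layer
            layer.trainable = True
```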
To train the polyp segmentation network, we use a public polyp segmentation dataset consisting of colonoscopy images and their corresponding pixel-level polyp masks annotated by endoscopists. The asymmetric similarity loss function [22] is used for training the networks to address the unbalanced data problem. The asymmetric similarity loss function is defined as:
\mathcal{L}_{AsymCE} = \alpha \cdot \mathcal{L}_{CE} + \mathcal{L}_{Asym}   (4)

where \mathcal{L}_{CE} is the cross-entropy loss, \mathcal{L}_{Asym} = 1 - F_\beta is the asymmetric similarity loss based on the F_\beta score, and the hyperparameter \alpha controls the contribution of the cross-entropy term to the loss function. The F_\beta score is defined as:

F_\beta = (1 + \beta^2) \cdot \frac{precision \cdot recall}{\beta^2 \cdot precision + recall}   (5)

The F_\beta score with hyperparameter \beta generalizes the Dice similarity coefficient and the Jaccard (IoU) index. When \beta = 1, the F_\beta score is the Dice score; \beta = 2 gives the F2 score; and \beta = 0 reduces the score to precision.

Fig. 4. Network architectures for polyp segmentation and three different ways of transfer learning: (a) freezing all network layers except the top layer and finetuning only the top layer, (b) freezing the encoder and finetuning only the decoder, (c) finetuning the whole network.
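A minimal sketch of this loss in Keras, writing F_\beta in terms of soft TP/FP/FN counts (the \alpha and \beta values here are illustrative, not our tuned settings):

```python
import tensorflow as tf

def asym_ce_loss(alpha=0.5, beta=1.5, eps=1e-7):
    bce = tf.keras.losses.BinaryCrossentropy()
    def loss(y_true, y_pred):
        # Soft true-positive, false-positive, and false-negative counts.
        tp = tf.reduce_sum(y_true * y_pred)
        fp = tf.reduce_sum((1.0 - y_true) * y_pred)
        fn = tf.reduce_sum(y_true * (1.0 - y_pred))
        # Eq. (5) rewritten with counts:
        # F_beta = (1+b^2)*TP / ((1+b^2)*TP + b^2*FN + FP)
        f_beta = ((1.0 + beta**2) * tp + eps) / \
                 ((1.0 + beta**2) * tp + beta**2 * fn + fp + eps)
        return alpha * bce(y_true, y_pred) + (1.0 - f_beta)  # Eq. (4)
    return loss
```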
IV. EXPERIMENTS AND RESULTS
A. Dataset
To train the image reconstruction network, we constructed an unlabeled colonoscopy image dataset from several sources. First, unlabeled colonoscopy images were acquired from the PACS system of Hospital 103 in Hanoi, Vietnam. These images were extracted from colonoscopy videos of patients who may or may not have polyps in the colon. After collecting and standardizing the data, we obtained 8,500 colonoscopy images of size 384×288. We also added colonoscopy images from public datasets, including CVC-ClinicDB with 612 images, Kvasir-SEG with 1,000 images, ETIS-Larib with 196 images, and CVC-ColonDB with 300 images, without using their labels. In total, this gives an unlabeled dataset of 10,608 colonoscopy images, which we use for training and validating the image reconstruction network. For finetuning the polyp segmentation model, the CVC-ColonDB dataset is used. In our experiments, both datasets are split 80/10/10 for training, validation, and testing.
B. Implementation
The proposed models are implemented using Keras with the TensorFlow backend. All algorithms were trained on a PC with a GeForce GTX 1080 Ti GPU. Both the pretext network and the downstream network are updated via the Adam optimizer with a learning rate of 1e-4. The training data is divided into mini-batches, with the mini-batch size set to 4 during training. Data augmentation was performed online for polyp segmentation network training, including vertical flipping, horizontal flipping, random rotation, random scaling, random shearing, random Gaussian blurring, random brightness, and random cropping and padding. The model generated at the epoch with the minimum loss on the validation set is the final self-supervised learning model. On the downstream task, the model generated at the epoch with the maximum Dice score on the validation set is the final polyp segmentation network. The baseline UNet for polyp segmentation was trained with exactly the same settings but initialized with random weights. A sketch of one possible augmentation pipeline follows.
C. Pretext task results
We implement the UNet for image reconstruction with the MobileNetV2 backbone as described in Section III.A. The unlabeled colonoscopy image dataset with 10,608 images was used for training and testing the network. We use both label-generation methods discussed in Section III.B, random pixel drop and random channel drop, with equal probability. To understand the impact of increased pretext task complexity on the polyp segmentation task, we experiment with randomly dropping X% of the pixels in an image, where X is 20%, 30%, 40%, 50%, 60%, 70%, or 80%.

TABLE 1. ACCURACY OF IMAGE RECONSTRUCTION WITH DIFFERENT DROP SCALES.

Fig. 5. Examples of reconstruction network predictions.

Table 1 presents the accuracy of image reconstruction with different drop scales on the test set. This table shows that when
the drop scale equals 50%, the image reconstruction accuracy is highest, at 88.20%. Moreover, Fig. 5 visualizes some examples of reconstruction network predictions with a drop scale of 50%. This figure shows that the reconstructed images have quite good quality, although there are changes in brightness and color shifts.
D. Polyp segmentation results
For polyp segmentation, we apply transfer learning from the pretext network. The polyp segmentation network is finetuned using a labeled polyp segmentation dataset. First, we conducted an experiment that applied transfer learning to the UNet from pretext tasks with different drop scales, to evaluate the impact of pretext task complexity on the polyp segmentation task. We use the CVC-ColonDB dataset, which consists of 300 polyp images and their corresponding pixel-level annotated polyp masks, for finetuning the segmentation network. The dataset is split 80/10/10 for training, validation, and testing. To evaluate polyp segmentation performance, we use popular image segmentation metrics: the Dice score coefficient (Dice), Jaccard index (IoU), Recall (Re), and Precision (Prec) [22], computed as sketched below.
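On binary masks, these metrics reduce to simple pixel counts; a minimal sketch (assuming predictions thresholded at 0.5 before scoring):

```python
import numpy as np

def segmentation_metrics(pred, gt, eps=1e-7):
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # true-positive pixels
    fp = np.logical_and(pred, ~gt).sum()   # false-positive pixels
    fn = np.logical_and(~pred, gt).sum()   # false-negative pixels
    return {
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "iou": tp / (tp + fp + fn + eps),
        "recall": tp / (tp + fn + eps),
        "precision": tp / (tp + fp + eps),
    }
```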
Table 2 presents the performance metrics for polyp segmentation on the test set. The table shows that polyp segmentation performance is highest when the drop scale equals 50%. Next, we evaluated the performance of the polyp segmentation network trained using transfer learning from the pretext task with a drop scale of 50%. We trained a UNet from scratch and finetuned UNets with each transfer learning method for polyp segmentation. Table 3 compares the performance of polyp segmentation between the UNet trained from scratch and the self-supervised learning methods on the test set. As it shows, the self-supervised learning method outperforms the UNet trained from scratch by 5.12% in IoU and 2.72% in Dice. Even if we freeze the encoder and finetune only the decoder, we can achieve accuracy comparable to training a UNet from scratch. This indicates that self-supervised learning can learn good features at the encoder that are transferable to the segmentation task. In addition, Fig. 6 shows some examples of polyp segmentation predictions generated by the different transfer learning methods. This figure also shows that the self-supervised learning method with finetuning of all layers of the segmentation network generates the best results.
E. Comparison with Other Methods
We evaluate our proposed segmentation method on two independent datasets, ETIS-Larib and CVC-ClinicDB, and compare the results with recent works that use the same training and testing scenario: ColonDB for training and ETIS-Larib and ClinicDB for testing. Our results are presented in Table 4. The table shows that the Dice score of the proposed method outperforms previous methods on both test sets, with a Dice score of 65.63% on ETIS-Larib and 77.25% on CVC-ClinicDB.
V. CONCLUSION
In this article, we presented self-supervised visual feature learning for polyp segmentation in colonoscopy images. We adapted self-supervised visual feature learning with image reconstruction as a pretext task and polyp segmentation as a downstream task. A UNet with a MobileNetV2 backbone was used for both the pretext task and the downstream task.
TABLE 2. PERFORMANCE OF POLYP SEGMENTATION FROM PRETEXT TASKS WITH DIFFERENT DROP SCALES.
TABLE 3. PERFORMANCE COMPARISON OF POLYP SEGMENTATION BETWEEN THE UNET TRAINED FROM SCRATCH AND TRANSFER LEARNING FROM THE PRETEXT TASK.
Method                   | Dice (%) | IoU (%) | Re (%) | Prec (%)
Training from scratch    | 86.61    | 76.87   | 86.12  | 87.71
Finetuning the top layer | 79.15    | 70.63   | 82.19  | 86.43
Finetuning the decoder   | 86.87    | 77.45   | 79.17  | 92.70
Finetuning all network   | 89.33    | 81.99   | 86.05  | 94.65
TABLE 4. COMPARISON WITH OTHER METHODS.
A highlight of the proposed self-supervised learning method is that it allows us to train both the encoder and the decoder in tandem in the pretext task; thus, results on the polyp segmentation task can improve. We experimented with different pixel drop percentages and different transfer learning strategies for the downstream task. Our experimental results show that the proposed method achieves a polyp segmentation accuracy better than that of a UNet trained from scratch. Moreover, when comparing results under the same training and testing scenario (ColonDB for training, ETIS-Larib and ClinicDB for testing), the Dice score of the proposed method outperforms previous methods on both test sets.

In this work, we used the SSIM loss for training the image reconstruction network, and it reconstructed images with quite good quality. However, there are changes in brightness and color shifts in the reconstructed images, because SSIM is not sensitive to uniform biases. In future work, we will study loss functions for training image reconstruction networks to improve the performance of the proposed method. It would also be interesting to extend this work to image classification and polyp detection in colonoscopy images, since these are also important tasks in colonoscopy image analysis.
REFERENCES

[1] A. M. Leufkens, M. G. H. van Oijen, F. P. Vleggaar, and P. D. Siersema, "Factors influencing the miss rate of polyps in a back-to-back colonoscopy study," Endoscopy, vol. 44, no. 5, pp. 470-475, 2012.
[2] D. Vázquez, J. Bernal, F. J. Sánchez, G. Fernández-Esparrach, A. M. López, A. Romero, M. Drozdzal, and A. Courville, "A benchmark for endoluminal scene segmentation of colonoscopy images," Journal of Healthcare Engineering, vol. 2017, 2017.
[3] L. Jing and Y. Tian, "Self-supervised visual feature learning with deep neural networks: A survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
[4] L. Chen et al., "Self-supervised learning for medical image analysis using image context restoration," Medical Image Analysis, vol. 58, p. 101539, 2019.
[5] J. Bernal et al., "WM-DOVA maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians," Computerized Medical Imaging and Graphics, vol. 43, pp. 99-111, 2015.
[6] J. Bernal, J. Sánchez, and F. Vilariño, "Towards automatic polyp detection with a polyp appearance model," Pattern Recognition, vol. 45, no. 9, pp. 3166-3182, 2012.
[7] J. Silva et al., "Toward embedded detection of polyps in WCE images for early diagnosis of colorectal cancer," International Journal of Computer Assisted Radiology and Surgery, vol. 9, no. 2, pp. 283-293, 2014.
[8] D. Jha et al., "Kvasir-SEG: A segmented polyp dataset," in Proc. Int. Conf. Multimedia Modeling, 2020, pp. 451-462.
[9] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015.
[10] D. Jha et al., "DoubleU-Net: A deep convolutional neural network for medical image segmentation," in 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), IEEE, 2020.
[11] Z. Zhou et al., "UNet++: A nested U-Net architecture for medical image segmentation," in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Springer, 2018, pp. 3-11.
[12] D. Jha et al., "ResUNet++: An advanced architecture for medical image segmentation," in 2019 IEEE International Symposium on Multimedia (ISM), IEEE, 2019.
[13] C. Doersch, A. Gupta, and A. A. Efros, "Unsupervised visual representation learning by context prediction," in Proceedings of the IEEE International Conference on Computer Vision, 2015.
[14] M. Noroozi and P. Favaro, "Unsupervised learning of visual representations by solving jigsaw puzzles," in European Conference on Computer Vision, Springer, 2016.
[15] D. Pathak et al., "Context encoders: Feature learning by inpainting," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[16] R. Zhang, P. Isola, and A. A. Efros, "Colorful image colorization," in European Conference on Computer Vision, Springer, 2016.
[17] S. Karnam, Self-Supervised Learning for Segmentation using Image Reconstruction, Rochester Institute of Technology, 2020.
[18] A. Jamaludin, T. Kadir, and A. Zisserman, "Self-supervised learning for spinal MRIs," in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Springer, 2017, pp. 294-302.
[19] N. Tajbakhsh et al., "Surrogate supervision for medical image analysis: Effective deep learning from limited quantities of labeled data," in 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI), IEEE, 2019.
[20] T. Ross et al., "Exploiting the potential of unlabeled endoscopic video data with self-supervised learning," International Journal of Computer Assisted Radiology and Surgery, vol. 13, no. 6, pp. 925-933, 2018.
[21] Z. Wang et al., "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, 2004.
[22] L. T. Thu Hong, N. Chi Thanh, and T. Q. Long, "Polyp segmentation in colonoscopy images using ensembles of U-Nets with EfficientNet and asymmetric similarity loss function," in 2020 RIVF International Conference on Computing and Communication Technologies (RIVF), IEEE, 2020, pp. 1-6.
[23] T. Mahmud, B. Paul, and S. A. Fattah, "PolypSegNet: A modified encoder-decoder architecture for automated polyp segmentation from colonoscopy images," Computers in Biology and Medicine, vol. 128, p. 104119, 2021.
[24] M. Sandler et al., "MobileNetV2: Inverted residuals and linear bottlenecks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
Fig. 6. Examples of polyp segmentation predictions generated by different transfer learning methods.