Self-supervised Visual Feature Learning for Polyp Segmentation in Colonoscopy Images Using Image Reconstruction as Pretext Task
Le Thi Thu Hong
Institute of Information Technology, AMST
Hanoi, Vietnam
lethithuhong1302@gmail.com

Nguyen Chi Thanh*
Institute of Information Technology, AMST
Hanoi, Vietnam
thanhnc80@gmail.com

Tran Quoc Long
University of Engineering and Technology, VNU
Hanoi, Vietnam
tqlong@gmail.com
Abstract— Automatic polyp detection and segmentation are desirable for colon screening because the polyp miss rate in clinical practice is relatively high. The deep learning-based approach for polyp segmentation has gained much attention in recent years due to its automatic feature extraction process, which segments polyp regions with unprecedented precision. However, training these networks requires a large amount of manually annotated data, which is limited by the available resources of endoscopists. We propose a self-supervised visual feature learning method for polyp segmentation to address this challenge. We adapt self-supervised visual feature learning with image reconstruction as a pretext task and polyp segmentation as a downstream task. UNet is used as the backbone architecture for both the pretext task and the downstream task. An unlabeled colonoscopy image dataset is used to train the pretext network. For polyp segmentation, we apply transfer learning from the pretext network. The polyp segmentation network is trained using a public benchmark dataset for polyp segmentation. Our experiments demonstrate that the proposed self-supervised learning method achieves better segmentation accuracy than a UNet trained from scratch. On the CVC-ColonDB polyp segmentation dataset, with only 300 annotated images, the proposed method improves the IoU metric from 76.87% to 81.99% and the Dice metric from 86.61% to 89.33% for polyp segmentation, compared to the baseline UNet.
Keywords: Polyp segmentation, medical image analysis,
transfer learning, deep learning, self-supervised learning
I. INTRODUCTION

Colonoscopy is considered the gold-standard investigation for colorectal cancer screening. However, the polyp miss rate in clinical practice is relatively high due to various factors [1]. This presents an opportunity to use AI models to automatically detect and segment polyps, supporting clinicians in reducing the number of missed polyps. Recently, deep learning methods have been widely used to solve medical image segmentation problems, including polyp segmentation, due to their capacity for learning image features for the segmentation task [2]. Deep learning methods usually rely on a large amount of training data with manual labels. However, polyp segmentation datasets may not always be available, because annotation usually requires the expert knowledge of endoscopists. Thus, there is growing interest in developing methods that learn features of colonoscopy images without requiring a large number of annotations, avoiding time-consuming and expensive data annotation. Research directions include transfer learning, semi-supervised learning, unsupervised learning, and self-supervised learning. Self-supervised visual feature learning allows visual features to be learned from large-scale unlabeled images.
* Corresponding author
Generally, computer vision pipelines that employ self-supervised feature learning involve two tasks: a pretext task and a real (downstream) task [3]. The pretext task is the self-supervised learning task for learning visual representations. The learned representations or model weights obtained from the pretext task are then used for the downstream task. The real (downstream) task can be any object recognition task, such as classification, detection, or segmentation, with insufficient annotated data samples. Self-supervised learning is a good method for exploiting unlabeled images to improve the performance of a deep model when only limited labeled data is available. This method not only helps to overcome the need for large amounts of annotated data but also helps to improve the robustness and uncertainty estimation of deep convolutional neural networks [4].
In this work, to address the challenge of limited labeled polyp data, we propose a novel method for training a polyp segmentation network, which formulates a self-supervised task for visual feature learning and decreases the cost of data annotation. Image reconstruction is proposed as the pretext task to improve the performance of the real polyp segmentation task. The visual features of colonoscopy images are learned by training a UNet on the image reconstruction pretext task. We use an unlabeled colonoscopy image dataset containing 8,500 images collected from Hospital 103 in Hanoi, Vietnam, to train the pretext network. Pixels and pixel channels (R, G, or B) in the input images are dropped at random, and the original image serves as the label. After self-supervised pretext task training is finished, the learned parameters serve as a pre-trained model and are transferred to the downstream task, polyp segmentation. The CVC-ColonDB [6] dataset, containing 300 labeled polyp segmentation images, was used for finetuning the polyp segmentation network. Our experiments show that the proposed method significantly improves the Dice metric for polyp segmentation compared to the baseline segmentation network. The main contributions of our work can be summarized as follows:
1) We propose a self-supervised feature learning method for training a polyp segmentation network using image reconstruction as a pretext task. In the pretext task, pixels or pixel channels (R, G, or B) in the input images are dropped at random, and the original image serves as the label.
2) The experimental results on a public polyp segmentation dataset show the efficacy of our method. In the experiments, we also study the effect of pretext task complexity and of the polyp segmentation network finetuning method on the performance of the polyp segmentation task.
The rest of the article is organized as follows: Section II reviews related research on deep learning for polyp segmentation and self-supervised learning for medical image analysis. Section III describes our proposed self-supervised visual feature learning method for polyp segmentation in colonoscopy images using image reconstruction as a pretext task in detail. Section IV outlines our experimental settings, results, and discussion. Finally, Section V summarizes and concludes this work.
II. RELATED WORK
A. Polyp segmentation using deep learning methods
A Computer-Aided Diagnosis (CADx) system for polyp segmentation in colonoscopy images can be an effective clinical tool that helps endoscopists achieve faster screening and higher accuracy [2]. However, precise polyp segmentation is still challenging due to variations of polyps in size, shape, texture, and color. As in other medical imaging applications, the deep learning-based approach for polyp segmentation has gained much attention in recent years due to its automatic feature extraction process, which segments polyp regions with unprecedented precision. In addition, public databases of polyp images have facilitated further research on the use of deep learning models for polyp segmentation. Several benchmark datasets are publicly available for training and evaluating deep models. The CVC-ClinicDB [5] dataset consists of 612 images with corresponding ground truth masks of the polyp regions. The Kvasir-SEG dataset [8] includes 1,000 polyp images with corresponding ground truth masks manually annotated by expert endoscopists. The ETIS-Larib [7] dataset contains 196 images covering 36 different types of polyps, with ground truth masks annotated by experts. The CVC-ColonDB [6] dataset consists of 300 polyp images and their corresponding pixel-level annotated polyp masks.

UNet [9], an encoder-decoder structure that uses skip connections to concatenate features from the encoding and decoding layers, is a popular architecture for medical image segmentation tasks, including polyp segmentation. Inspired by the success of UNet, several variants have been proposed for polyp segmentation and have yielded promising results, such as DoubleUNet [10], UNet++ [11], and ResUNet++ [12].
B. Self-supervised learning in the medical imaging domain
Self-supervised learning, which formulates a pretext task on unannotated data for feature learning, has gained increasing popularity in recent years. Various types of pretext tasks have been proposed depending on the data type. For general image and video analysis problems, patch relative position prediction [13, 14], local context prediction [15], colorization [16], and image reconstruction [17] have been used for self-supervised learning. In the medical imaging domain, patients often have follow-up scans, and all unlabeled images are stored on PACS (Picture Archiving and Communication System) servers, while only limited labeled data is available because annotation usually requires expert knowledge. Thus, self-supervised learning is a good way to mine unlabeled images and improve the accuracy of deep neural networks. Over the past years, self-supervised learning has also been explored for medical imaging, although to a lesser extent. Jamaludin et al. [18] proposed a self-supervised learning method for spinal MRIs: they used recognizing whether two spinal MR scans come from the same patient as the pretext task and prediction of the level of vertebral bodies as the real task. Tajbakhsh et al. [19] proposed a self-supervised learning method for lung lobe segmentation and nodule detection tasks by using rotation prediction as a pretext task. Ross et al. [20] proposed a method to exploit the potential of unlabeled endoscopic video data for surgical instrument segmentation: they defined re-colorization of decolorized surgical video frames as the pretext task and used the pre-trained features to initialize a surgical instrument segmentation network. In this article, differently from previous works, we propose a novel self-supervised learning method that uses image reconstruction as the pretext task and polyp segmentation as the real task.
III. PROPOSED METHOD

A. The architecture of the proposed model
Overall, the proposed method, which adapts self-supervised visual feature learning for training a polyp segmentation network, is depicted in Fig. 1. Image reconstruction is used as the pretext task. We use UNet [9] as the backbone architecture for both the pretext and downstream tasks. The UNet architecture was developed for biomedical image segmentation and is increasingly widely used in medical image analysis applications. UNet consists of two paths: an encoder and a decoder.
Fig. 1. Overview of the proposed self-supervised visual feature learning method for polyp segmentation.
The encoder is a typical CNN, generally consisting of a set of convolutional and pooling layers, and is used to capture the context in the image. The decoder is the symmetric expanding path, which enables precise localization using transposed convolutions. MobileNetV2 [24], which is designed to maximize accuracy while being mindful of the restricted resources of on-device or embedded applications, is used as the encoder. Although deeper backbones can give more accurate results, we choose MobileNetV2 as the feature-extraction backbone because it is a lightweight model suitable for on-device or embedded applications with acceptable accuracy. The output layers of the UNet differ between the pretext task and the downstream task: a convolutional layer with 3 filters for the pretext network and a convolutional layer with 1 filter for the downstream task.
The pipeline of the proposed method is as follows. First, we use an unlabeled dataset to train a UNet-based image reconstruction network to learn visual features of colonoscopy images. The transformed images are the input of the reconstruction network, and the original images serve as ground truth labels. The pretext task, image reconstruction, challenges the network to learn visual features from automatically generated pseudo-labels. Then, we transfer the learned features to the polyp segmentation network. Since both the encoder and decoder are trained simultaneously in the pretext task, results on the segmentation task can improve. A minimal sketch of this architecture is given below.
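For illustration, the following is one possible Keras realization of such a UNet with a MobileNetV2 encoder. The skip-connection layer names come from the public Keras MobileNetV2 implementation; the decoder widths, activations, and input size are illustrative placeholders, not necessarily the exact configuration used in our experiments.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_unet(num_output_filters, final_activation, input_shape=(288, 384, 3)):
    inputs = layers.Input(shape=input_shape)
    encoder = tf.keras.applications.MobileNetV2(
        input_tensor=inputs, include_top=False, weights=None)
    # Skip connections from intermediate MobileNetV2 activations
    # (strides 2, 4, 8, and 16 relative to the input).
    skip_names = ["block_1_expand_relu", "block_3_expand_relu",
                  "block_6_expand_relu", "block_13_expand_relu"]
    skips = [encoder.get_layer(n).output for n in skip_names]
    x = encoder.output  # stride-32 bottleneck feature map
    for skip, filters in zip(reversed(skips), [512, 256, 128, 64]):
        x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same")(x)
    # Output head: 3 filters for image reconstruction, 1 for segmentation.
    outputs = layers.Conv2D(num_output_filters, 1,
                            activation=final_activation)(x)
    return Model(inputs, outputs)

pretext_net = build_unet(3, "sigmoid")       # reconstruct the RGB image
segmentation_net = build_unet(1, "sigmoid")  # predict a binary polyp mask
```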
B. Self-supervised visual feature learning
We use colonoscopy image reconstruction as the pretext task for self-supervised learning. The pretext task makes the model learn semantic features of colonoscopy images by creating ground truth labels from known input transformations. We propose two transformation methods to generate self-supervised labels: random pixel drop and random channel drop. Examples of transformed images with pixel drop and channel drop are shown in Fig. 2. With the random pixel drop method, all channels of a pixel are randomly dropped. In the random channel drop method, one of a pixel's red, green, or blue channels is randomly dropped. The drop scale, i.e., the percentage of pixel values dropped, represents the pretext task complexity. We conduct experiments with different drop scales to explore the impact of pretext task complexity on the downstream task.
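For illustration, the two label-generating transforms can be sketched as follows (assuming float RGB images in [0, 1]; the exact sampling in our experiments may differ):

```python
import numpy as np

rng = np.random.default_rng()

def random_pixel_drop(image, drop_scale=0.5):
    """Zero out all channels of a random drop_scale fraction of pixels."""
    h, w, _ = image.shape
    mask = rng.random((h, w)) < drop_scale
    out = image.copy()
    out[mask] = 0.0
    return out

def random_channel_drop(image, drop_scale=0.5):
    """Zero out one random channel (R, G, or B) per selected pixel."""
    h, w, _ = image.shape
    mask = rng.random((h, w)) < drop_scale
    channel = rng.integers(0, 3, size=(h, w))
    out = image.copy()
    ys, xs = np.nonzero(mask)
    out[ys, xs, channel[ys, xs]] = 0.0
    return out

# The transformed image is the network input; the original is the label.
```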
The image reconstruction network is UNet-based; Fig. 3 shows the image reconstruction network used in this work. The input to the network is a transformed image with randomly dropped pixels, and the original image serves as the label. We use the SSIM loss for training the image reconstruction network. SSIM (Structural Similarity Index Measure) [21] is a perceptual image quality assessment between a distorted image and a reference image. The images are divided into multiple square windows, and SSIM is computed in each window as follows:
SSIM(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}   (1)

where x and y are two nonnegative image signals that have been aligned with each other (e.g., windows extracted from each image), \mu_x and \mu_y are the means of x and y, \sigma_x^2 and \sigma_y^2 are their variances, \sigma_{xy} is the covariance of x and y, and C_1, C_2 are constants added to avoid instability. The overall quality measure of the entire image is computed as:

MSSIM(X, Y) = \frac{1}{M} \sum_{j=1}^{M} SSIM(x_j, y_j)   (2)

where X and Y are the reference and the distorted images, respectively, and x_j, y_j are the image contents in the j-th window. The SSIM loss is used to reconstruct images in accordance with human perception and is calculated as:

\mathcal{L}_{SSIM}(X, Y) = 1 - MSSIM(X, Y)   (3)

where X and Y are the prediction and the ground truth, respectively.

Fig. 2. Example of random pixel drop and random channel drop transformations: (a) original image, (b) pixel drop transformed image, (c) channel drop transformed image.

Fig. 3. The proposed self-supervised learning model.
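TensorFlow ships a windowed SSIM implementation, so the loss in Eq. (3) can be sketched directly on top of it (assuming inputs scaled to [0, 1]):

```python
import tensorflow as tf

def ssim_loss(y_true, y_pred):
    # tf.image.ssim returns the mean windowed SSIM per image in the
    # batch (Eq. 2); max_val assumes inputs scaled to [0, 1].
    mssim = tf.image.ssim(y_true, y_pred, max_val=1.0)
    return 1.0 - tf.reduce_mean(mssim)  # Eq. (3), averaged over the batch

# e.g. pretext_net.compile(optimizer="adam", loss=ssim_loss)
```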
The unlabeled colonoscopy image dataset is used to train the image reconstruction network. The network learns general features that capture the salient characteristics of colonoscopy image data. The learned features are then transferred to the downstream network for polyp segmentation.
C. Polyp segmentation using knowledge transferred from the pretext task
After the network is self-trained on colonoscopy image reconstruction, it is transferred to the polyp segmentation task. By changing the last layer of the UNet to match the number of output classes, we repurpose the colonoscopy image reconstruction network for polyp segmentation. We investigate three different methods of transfer learning for the downstream task. The first method is to freeze the weights learned at all network layers except the top layer and finetune only the top layer. The second is to freeze the weights learned at the encoder and finetune only the decoder. The third is to finetune all the weights, including both the encoder and the decoder. Fig. 4 illustrates these methods; the gray area denotes the freezing of the weights learned from the pretext task.
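In Keras, these three regimes amount to transferring the pretext weights and toggling layer trainability. The sketch below builds on the hypothetical build_unet from Section III.A and assumes the pretext weights were saved in HDF5 format, so the mismatched output head can be skipped by name:

```python
import tensorflow as tf

seg_net = build_unet(1, "sigmoid")  # hypothetical builder from Section III.A

# Transfer pretext weights by layer name; the output head (3 vs. 1 filters)
# mismatches and is skipped, so it stays randomly initialized.
seg_net.load_weights("pretext_unet.h5", by_name=True, skip_mismatch=True)

# MobileNetV2 layers carry the same fixed names in any instance, so a fresh
# instance can be used to identify which layers belong to the encoder.
encoder_names = {l.name for l in tf.keras.applications.MobileNetV2(
    include_top=False, weights=None).layers}

def set_finetuning_mode(model, mode):
    """mode: 'top' (a), 'decoder' (b), or 'all' (c), as in Fig. 4."""
    for layer in model.layers:
        if mode == "top":          # train only the final 1x1 conv head
            layer.trainable = layer is model.layers[-1]
        elif mode == "decoder":    # freeze the pretrained encoder
            layer.trainable = layer.name not in encoder_names
        else:                      # 'all': finetune every layer
            layer.trainable = True
```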
To train the polyp segmentation network, we use a public polyp segmentation dataset consisting of colonoscopy images and their corresponding pixel-level polyp masks annotated by endoscopists. The asymmetric similarity loss function [22] is used for training the networks to address the unbalanced data problem. The asymmetric similarity loss function is defined as:
\mathcal{L}_{AsymCE} = \alpha \cdot \mathcal{L}_{CE} + \mathcal{L}_{Asym}   (4)

where \mathcal{L}_{CE} is the cross-entropy loss, \mathcal{L}_{Asym} = 1 - F_\beta is the asymmetric similarity loss based on the F_\beta score, and the hyperparameter \alpha controls the contribution of the cross-entropy term to the loss function. The F_\beta score is defined as:

F_\beta = (1 + \beta^2) \cdot \frac{precision \cdot recall}{\beta^2 \cdot precision + recall}   (5)

The F_\beta score with hyperparameter \beta generalizes the Dice similarity coefficient and the Jaccard (IoU) index. When \beta = 1, the F_\beta score is the Dice score; \beta = 2 gives the F2 score; and \beta = 0 reduces the score to precision.

Fig. 4. Network architectures for polyp segmentation and three different ways of transfer learning: (a) freezing all network layers except the top layer and finetuning only the top layer, (b) freezing the encoder and finetuning only the decoder, (c) finetuning the whole network.
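A minimal sketch of this loss in Keras, writing F_\beta in terms of soft TP/FP/FN counts (the \alpha and \beta values here are illustrative, not our tuned settings):

```python
import tensorflow as tf

def asym_ce_loss(alpha=0.5, beta=1.5, eps=1e-7):
    bce = tf.keras.losses.BinaryCrossentropy()
    def loss(y_true, y_pred):
        # Soft true-positive, false-positive, and false-negative counts.
        tp = tf.reduce_sum(y_true * y_pred)
        fp = tf.reduce_sum((1.0 - y_true) * y_pred)
        fn = tf.reduce_sum(y_true * (1.0 - y_pred))
        # Eq. (5) rewritten with counts:
        # F_beta = (1+b^2)*TP / ((1+b^2)*TP + b^2*FN + FP)
        f_beta = ((1.0 + beta**2) * tp + eps) / \
                 ((1.0 + beta**2) * tp + beta**2 * fn + fp + eps)
        return alpha * bce(y_true, y_pred) + (1.0 - f_beta)  # Eq. (4)
    return loss
```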
IV. EXPERIMENTS AND RESULTS
A. Dataset
To train the image reconstruction network, we constructed an unlabeled colonoscopy image dataset from several sources. First, unlabeled colonoscopy images were acquired from the PACS system of Hospital 103 in Hanoi, Vietnam. These images were extracted from colonoscopy videos of patients who may or may not have polyps in the colon. After collecting and standardizing the data, we obtained 8,500 colonoscopy images of size 384×288. We also added colonoscopy images from public datasets, including CVC-ClinicDB with 612 images, Kvasir-SEG with 1,000 images, ETIS-Larib with 196 images, and CVC-ColonDB with 300 images, without using their labels. In total, this gives an unlabeled dataset of 10,608 colonoscopy images, which we use for training and validating the image reconstruction network. For finetuning the polyp segmentation model, the CVC-ColonDB dataset is used. In our experiments, both datasets are split 80/10/10 for training, validation, and testing.
B. Implementation
The proposed models are implemented using Keras with the TensorFlow backend. All algorithms were trained on a PC with a GeForce GTX 1080 Ti GPU. Both the pretext network and the downstream network are updated via the Adam optimizer with a learning rate of 1e-4. The training data is divided into mini-batches, with the mini-batch size set to 4 during training. Data augmentation was performed online for polyp segmentation network training, including vertical flipping, horizontal flipping, random rotation, random scaling, random shearing, random Gaussian blurring, random brightness, and random cropping and padding. The model generated at the epoch with the minimum loss on the validation set is the final self-supervised learning model. On the downstream task, the model generated at the epoch with the maximum Dice score on the validation set is the final polyp segmentation network. The baseline UNet for polyp segmentation was trained with exactly the same settings but initialized with random weights. A sketch of one possible augmentation pipeline follows.
C. Pretext task results
We implement the UNet for image reconstruction with the MobileNetV2 backbone as described in Section III.A. The unlabeled colonoscopy image dataset with 10,608 images was used for training and testing the network. We use both label-generation methods discussed in Section III.B, random pixel drop and random channel drop, with equal probability. To understand the impact of increased pretext task complexity on the polyp segmentation task, we experiment with randomly dropping X% of the pixels in an image, where X is 20%, 30%, 40%, 50%, 60%, 70%, or 80%.

TABLE 1. ACCURACY OF IMAGE RECONSTRUCTION WITH DIFFERENT DROP SCALES.

Fig. 5. Examples of reconstruction network predictions.

Table 1 presents the accuracy of image reconstruction with different drop scales on the test set. This table shows that when
the drop scale equals 50%, the image reconstruction accuracy is highest, at 88.20%. Moreover, Fig. 5 visualizes some examples of reconstruction network predictions with a drop scale of 50%. This figure shows that the reconstructed images have quite good quality, although there are changes in brightness and color shifts.
D. Polyp segmentation results
For polyp segmentation, we apply transfer learning from the pretext network. The polyp segmentation network is finetuned using a labeled polyp segmentation dataset. First, we conducted an experiment that applied transfer learning to the UNet from pretext tasks with different drop scales, to evaluate the impact of pretext task complexity on the polyp segmentation task. We use the CVC-ColonDB dataset, which consists of 300 polyp images and their corresponding pixel-level annotated polyp masks, for finetuning the segmentation network. The dataset is split 80/10/10 for training, validation, and testing. To evaluate polyp segmentation performance, we use popular image segmentation metrics: the Dice score coefficient (Dice), Jaccard index (IoU), Recall (Re), and Precision (Prec) [22], computed as sketched below.
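On binary masks, these metrics reduce to simple pixel counts; a minimal sketch (assuming predictions thresholded at 0.5 before scoring):

```python
import numpy as np

def segmentation_metrics(pred, gt, eps=1e-7):
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # true-positive pixels
    fp = np.logical_and(pred, ~gt).sum()   # false-positive pixels
    fn = np.logical_and(~pred, gt).sum()   # false-negative pixels
    return {
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "iou": tp / (tp + fp + fn + eps),
        "recall": tp / (tp + fn + eps),
        "precision": tp / (tp + fp + eps),
    }
```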
Table 2 presents the performance metrics for polyp segmentation on the test set. The table shows that polyp segmentation performance is highest when the drop scale equals 50%. Next, we evaluated the performance of the polyp segmentation network trained using transfer learning from the pretext task with a drop scale of 50%. We trained a UNet from scratch and finetuned UNets with each transfer learning method for polyp segmentation. Table 3 compares the performance of polyp segmentation between the UNet trained from scratch and the self-supervised learning methods on the test set. As it shows, the self-supervised learning method outperforms the UNet trained from scratch by 5.12% in IoU and 2.72% in Dice. Even if we freeze the encoder and finetune only the decoder, we can achieve accuracy comparable to training a UNet from scratch. This indicates that self-supervised learning can learn good features at the encoder that are transferable to the segmentation task. In addition, Fig. 6 shows some examples of polyp segmentation predictions generated by the different transfer learning methods. This figure also shows that the self-supervised learning method with finetuning of all layers of the segmentation network generates the best results.
E. Comparison with Other Methods
We evaluate our proposed segmentation method on two independent datasets, ETIS-Larib and CVC-ClinicDB, and compare the results with recent works that use the same training and testing scenario: ColonDB for training and ETIS-Larib and ClinicDB for testing. Our results are presented in Table 4. The table shows that the Dice score of the proposed method outperforms previous methods on both test sets, with a Dice score of 65.63% on ETIS-Larib and 77.25% on CVC-ClinicDB.
V. CONCLUSION
In this article, we presented self-supervised visual feature learning for polyp segmentation in colonoscopy images. We adapted self-supervised visual feature learning with image reconstruction as a pretext task and polyp segmentation as a downstream task. A UNet with a MobileNetV2 backbone was used for both the pretext task and the downstream task.
TABLE 2. PERFORMANCE OF POLYP SEGMENTATION FROM PRETEXT TASKS WITH DIFFERENT DROP SCALES.
TABLE 3. PERFORMANCE COMPARISON OF POLYP SEGMENTATION BETWEEN THE UNET TRAINED FROM SCRATCH AND TRANSFER LEARNING FROM THE PRETEXT TASK.
Method                   | Dice (%) | IoU (%) | Re (%) | Prec (%)
Training from scratch    | 86.61    | 76.87   | 86.12  | 87.71
Finetuning the top layer | 79.15    | 70.63   | 82.19  | 86.43
Finetuning the decoder   | 86.87    | 77.45   | 79.17  | 92.70
Finetuning all network   | 89.33    | 81.99   | 86.05  | 94.65
TABLE 4. COMPARISON WITH OTHER METHODS.
A highlight of the proposed self-supervised learning method is that it allows us to train both the encoder and the decoder in tandem in the pretext task; thus, results on the polyp segmentation task can improve. We experimented with different pixel drop percentages and different transfer learning strategies for the downstream task. Our experimental results show that the proposed method achieves a polyp segmentation accuracy better than that of a UNet trained from scratch. Moreover, when comparing results under the same training and testing scenario (ColonDB for training, ETIS-Larib and ClinicDB for testing), the Dice score of the proposed method outperforms previous methods on both test sets.

In this work, we used the SSIM loss for training the image reconstruction network, and it reconstructed images with quite good quality. However, there are changes in brightness and color shifts in the reconstructed images, because SSIM is not sensitive to uniform biases. In future work, we will study loss functions for training image reconstruction networks to improve the performance of the proposed method. It would also be interesting to extend this work to image classification and polyp detection in colonoscopy images, since these are also important tasks in colonoscopy image analysis.
REFERENCES

[1] A. M. Leufkens, M. G. H. van Oijen, F. P. Vleggaar, and P. D. Siersema, "Factors influencing the miss rate of polyps in a back-to-back colonoscopy study," Endoscopy, vol. 44, no. 5, pp. 470-475, 2012.
[2] D. Vázquez, J. Bernal, F. J. Sánchez, G. Fernández-Esparrach, A. M. López, A. Romero, M. Drozdzal, and A. Courville, "A benchmark for endoluminal scene segmentation of colonoscopy images," Journal of Healthcare Engineering, vol. 2017, 2017.
[3] L. Jing and Y. Tian, "Self-supervised visual feature learning with deep neural networks: A survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
[4] L. Chen et al., "Self-supervised learning for medical image analysis using image context restoration," Medical Image Analysis, vol. 58, p. 101539, 2019.
[5] J. Bernal et al., "WM-DOVA maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians," Computerized Medical Imaging and Graphics, vol. 43, pp. 99-111, 2015.
[6] J. Bernal, J. Sánchez, and F. Vilariño, "Towards automatic polyp detection with a polyp appearance model," Pattern Recognition, vol. 45, no. 9, pp. 3166-3182, 2012.
[7] J. Silva et al., "Toward embedded detection of polyps in WCE images for early diagnosis of colorectal cancer," International Journal of Computer Assisted Radiology and Surgery, vol. 9, no. 2, pp. 283-293, 2014.
[8] D. Jha et al., "Kvasir-SEG: A segmented polyp dataset," in Proc. Int. Conf. Multimedia Modeling, 2020, pp. 451-462.
[9] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015.
[10] D. Jha et al., "DoubleU-Net: A deep convolutional neural network for medical image segmentation," in 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), IEEE, 2020.
[11] Z. Zhou et al., "UNet++: A nested U-Net architecture for medical image segmentation," in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Springer, 2018, pp. 3-11.
[12] D. Jha et al., "ResUNet++: An advanced architecture for medical image segmentation," in 2019 IEEE International Symposium on Multimedia (ISM), IEEE, 2019.
[13] C. Doersch, A. Gupta, and A. A. Efros, "Unsupervised visual representation learning by context prediction," in Proceedings of the IEEE International Conference on Computer Vision, 2015.
[14] M. Noroozi and P. Favaro, "Unsupervised learning of visual representations by solving jigsaw puzzles," in European Conference on Computer Vision, Springer, 2016.
[15] D. Pathak et al., "Context encoders: Feature learning by inpainting," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[16] R. Zhang, P. Isola, and A. A. Efros, "Colorful image colorization," in European Conference on Computer Vision, Springer, 2016.
[17] S. Karnam, Self-Supervised Learning for Segmentation using Image Reconstruction, Rochester Institute of Technology, 2020.
[18] A. Jamaludin, T. Kadir, and A. Zisserman, "Self-supervised learning for spinal MRIs," in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Springer, 2017, pp. 294-302.
[19] N. Tajbakhsh et al., "Surrogate supervision for medical image analysis: Effective deep learning from limited quantities of labeled data," in 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI), IEEE, 2019.
[20] T. Ross et al., "Exploiting the potential of unlabeled endoscopic video data with self-supervised learning," International Journal of Computer Assisted Radiology and Surgery, vol. 13, no. 6, pp. 925-933, 2018.
[21] Z. Wang et al., "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, 2004.
[22] L. T. Thu Hong, N. Chi Thanh, and T. Q. Long, "Polyp segmentation in colonoscopy images using ensembles of U-Nets with EfficientNet and asymmetric similarity loss function," in 2020 RIVF International Conference on Computing and Communication Technologies (RIVF), IEEE, 2020, pp. 1-6.
[23] T. Mahmud, B. Paul, and S. A. Fattah, "PolypSegNet: A modified encoder-decoder architecture for automated polyp segmentation from colonoscopy images," Computers in Biology and Medicine, vol. 128, p. 104119, 2021.
[24] M. Sandler et al., "MobileNetV2: Inverted residuals and linear bottlenecks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
Fig. 6. Examples of polyp segmentation predictions generated by different transfer learning methods.