DR-Unet: Rethinking the ResUnet++ Architecture with Dual ResPath Skip Connection for Nuclei Segmentation

Nhat-Minh Le
Faculty of Automation Engineering, School of Electrical and Electronics Engineering
Hanoi University of Science and Technology
minh.ln181647@sis.hust.edu.vn

Dinh-Hung Le
Faculty of Automation Engineering, School of Electrical and Electronics Engineering
Hanoi University of Science and Technology
hung.ld181504@sis.hust.edu.vn

Van-Truong Pham*
Faculty of Automation Engineering, School of Electrical and Electronics Engineering
Hanoi University of Science and Technology
truong.phamvan@hust.edu.vn

Thi-Thao Tran
Faculty of Automation Engineering, School of Electrical and Electronics Engineering
Hanoi University of Science and Technology
thao.tranthi@hust.edu.vn

Abstract: Nuclei segmentation is a crucial stage in the analysis of cell microscopy images. By identifying nuclei, researchers can identify and characterize each cell in a sample. Several encoder-decoder models, such as U-Net, Multi ResUnet, DoubleUnet, and ResUnet++, have been applied to the Data Science Bowl 2018 dataset with excellent results. However, in ResUnet++ there is still a semantic gap between the encoder features and the decoder features they are directly connected to, and the extraction of information from regions of different sizes is still limited. To improve the performance of ResUnet++ on this segmentation task, we propose a new architecture that uses Double ResPath (DR), called Double ResPath Unet (DR-Unet). DR-Unet retains some of the components that made ResUnet++ successful, such as the residual block combined with a squeeze-and-excitation block. In addition, instead of combining the encoder and decoder features directly, we pass the encoder features through a ResPath, which can bridge the semantic gap. Moreover, we replace ASPP with Progressive Atrous Spatial Pyramid Pooling (PASPP) to capture contextual information more efficiently. Experimental results demonstrate that DR-Unet outperforms ResUnet++, DoubleUnet, and other models in the benchmark.

Index Terms: Deep Learning, Nuclei Segmentation, Image Segmentation, ResUnet++, Multi ResUnet

I. INTRODUCTION

Medical image segmentation is the task of segmenting objects of interest in medical images; more specifically, it assigns a label to each pixel of the image. Within this field, nuclei segmentation has received considerable attention. Identifying cell nuclei helps locate cells under different conditions, which can speed up the search for cures, improve throughput for research and insight, and reduce time-to-market for new drugs [1]. However, manual segmentation of cell nuclei is highly time-consuming and labor-intensive, and its accuracy depends heavily on the expertise of the annotator. Automatic nucleus segmentation is therefore an essential requirement.

Traditional image segmentation approaches such as the Otsu-based method [2], the watershed method [3], and active contours [4] have been applied to this problem. Most of these approaches, however, are inefficient, time-consuming, and computationally demanding. In addition, the nuclei in the 2018 Data Science Bowl dataset [5] vary greatly in shape, size, color, and border appearance, and traditional techniques cannot delineate nuclei whose borders are unclear.

Recently, with the rapid development of Convolutional Neural Networks (CNNs), CNN-based image segmentation methods have shown superior performance compared to traditional methods in many segmentation tasks [6]. Long et al. proposed the Fully Convolutional Network (FCN) [7], one of the first deep learning architectures trained end-to-end for pixel-wise prediction. FCN uses an encoder to extract features from the input image and a decoder to generate a segmentation mask from those features. U-Net [8] is another popular method. Since then, many encoder-decoder segmentation architectures have been released with good results. ResUnet++ [9], developed from ResUnet, takes advantage of residual units, squeeze-and-excitation units, Atrous Spatial Pyramid Pooling (ASPP), and attention units, showing great potential in medical image segmentation. Double Unet [10] stacks two U-Net [8] architectures in sequence, with two encoders and two decoders. Multi ResUnet [11], an enhanced version of the standard U-Net architecture, uses the Multi ResBlock to capture more spatial information and ResPath to reduce the semantic gap between corresponding encoder and decoder features.

The above models have proven effective in nuclei segmentation, especially on the 2018 Data Science Bowl dataset [5]. We aim to inherit and combine their strengths in a new architecture that achieves better results. With that motivation, this paper develops a novel architecture for medical image segmentation, inspired by ResUnet++ [9], that uses a Double ResPath scheme and is called DR-Unet. We tested the model on the 2018 Data Science Bowl dataset [5]. The results indicate that the proposed model is efficient and performs well compared to ResUnet++ and other models in the benchmark.

The paper is organized as follows. Section II reviews related work. Section III presents the proposed model. Section IV reports experimental results on the 2018 Data Science Bowl dataset. Finally, Section V summarizes the paper and discusses limitations and future work.

II. RELATED WORK

Deep learning-based algorithms have recently been widely used in medical imaging applications such as image super-resolution, classification, and especially segmentation. Along with the development of deep learning, image segmentation has achieved remarkable results, and many deep learning-based techniques have been used to segment cells and nuclei. However, several open problems remain.

In principle, the deeper the model, the higher the accuracy [12], [13]. However, a degradation problem appears once the model reaches a certain depth [14], [15]. He et al. proposed a residual learning framework [15] to overcome this problem and allow deeper models. Residual blocks have a simple structure but make it possible to train deeper models without the degradation problem.

The squeeze-and-excitation (SE) block [17] aims to improve the quality of convolutional neural network representations by recalibrating channel-wise features: it models the relationships between channels and selectively emphasizes the channels that carry more important features. For any convolutional layer, a corresponding SE block can be constructed to recalibrate its feature maps. This is achieved in two steps:

• Features are first passed through a transformation, usually several convolutional layers, which produces the feature maps. These feature maps then go through a squeeze function, which generates one descriptor per channel.

• The excitation function follows immediately. It takes the channel descriptors as input and computes weights that describe the dependencies between channels. These weights are then multiplied with the feature maps obtained earlier to emphasize the features that matter for the task.
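The two steps above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes the squeeze step is global average pooling and the excitation step is two fully connected layers with ReLU and sigmoid activations, as in the original SE design [17]. The function names and the weight arguments `w1`, `b1`, `w2`, `b2` are hypothetical.

```python
import math


def squeeze(feature_maps):
    """Global average pooling: reduce each channel (a 2-D list) to one scalar."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]


def dense(vector, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [
        sum(w * v for w, v in zip(row, vector)) + b
        for row, b in zip(weights, biases)
    ]


def excite(descriptor, w1, b1, w2, b2):
    """Bottleneck FC -> ReLU -> FC -> sigmoid, giving one gate in (0, 1) per channel."""
    hidden = [max(0.0, h) for h in dense(descriptor, w1, b1)]
    return [1.0 / (1.0 + math.exp(-z)) for z in dense(hidden, w2, b2)]


def se_block(feature_maps, w1, b1, w2, b2):
    """Rescale every channel of `feature_maps` by its learned gate."""
    gates = excite(squeeze(feature_maps), w1, b1, w2, b2)
    return [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_maps, gates)
    ]
```

With all weights set to zero, every gate equals sigmoid(0) = 0.5, so each channel is simply halved; trained weights would instead amplify informative channels and suppress the others.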

Medical image segmentation commonly faces two issues. First, as the model becomes deeper, the resolution of the features is reduced by the repeated application of pooling layers, which makes it difficult to extract spatial information. Second, the objects of interest appear at different scales. Chen et al. [18], [19] proposed Atrous Spatial Pyramid Pooling (ASPP) to deal with these problems. ASPP combines the outputs of atrous convolutions with different dilation rates, which increases the ability to capture global information while keeping the size of the feature map unchanged. In [20], Yan et al. proposed a Progressive ASPP (PASPP) block based on ASPP. PASPP still uses atrous convolution layers with different dilation rates, but instead of combining the output features all at once, it combines them gradually, over features with different receptive fields. Figure 1 depicts the architecture of the PASPP block.

Fig. 1. PASPP architecture with four atrous convolutional layers.
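To make the idea of atrous convolution concrete, the following sketch shows a one-dimensional version in plain Python. It is an illustration under simplifying assumptions (1-D signal, zero padding, parallel branches stacked per position) and covers only the plain ASPP-style combination; the progressive fusion order of PASPP is described in [20] and is not reproduced here. The function names are hypothetical.

```python
def atrous_conv1d(signal, kernel, rate):
    """Zero-padded 1-D atrous convolution; output has the same length as the input."""
    center = len(kernel) // 2
    output = []
    for i in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            j = i + (k - center) * rate  # kernel taps are spaced `rate` samples apart
            if 0 <= j < len(signal):
                total += weight * signal[j]
        output.append(total)
    return output


def aspp_1d(signal, kernel, rates):
    """ASPP-style fusion: run one branch per dilation rate and stack the results."""
    branches = [atrous_conv1d(signal, kernel, rate) for rate in rates]
    return [list(values) for values in zip(*branches)]
```

A larger rate lets the same three-tap kernel see samples that are farther apart, while the output length stays equal to the input length, which is the property the paragraph above relies on.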

In the U-Net architecture [8], Ronneberger et al. introduced a shortcut connection between the convolutional layers immediately before each max-pooling layer in the Encoder and the convolutional layers immediately after the corresponding deconvolution layer in the Decoder. This allows the Encoder to pass to the Decoder information that was lost during downsampling. However, Ibtehaz and Rahman [11] pointed out that a simple skip connection suffers from an information imbalance between the Encoder and Decoder features. The Encoder features are low-level compared to those in the Decoder, because the Decoder features are computed in deeper layers of the network. Directly combining them can therefore introduce discrepancies that adversely affect the segmentation results. To address this problem, Ibtehaz and Rahman [11] devised the "Res Path", which places several convolutional layers along the shortcut connection to reduce the information gap between the Encoder and the Decoder, as shown in Figure 2.

Fig. 2. ResPath, introduced in [11] by Ibtehaz and Rahman, helps reduce the semantic gap between the Encoder and the Decoder.
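The structure of such a path can be sketched as a chain of small residual steps. The sketch below is a structural illustration only: it assumes, following the ResPath design in [11], that each step adds the output of a 3×3 convolution to a 1×1 convolution shortcut and applies ReLU. The convolutions are passed in as plain callables, features are flat lists of numbers, and all names are hypothetical.

```python
def relu(values):
    """Element-wise ReLU on a flat list of numbers."""
    return [max(0.0, v) for v in values]


def res_path(features, num_blocks, conv3x3, conv1x1):
    """Apply `num_blocks` residual steps to encoder features before the skip merge.

    `conv3x3` and `conv1x1` are stand-ins for learned convolutions: any callables
    that map a list of numbers to a list of the same length.
    """
    x = list(features)
    for _ in range(num_blocks):
        main = conv3x3(x)       # main branch
        shortcut = conv1x1(x)   # residual shortcut
        x = relu([m + s for m, s in zip(main, shortcut)])
    return x
```

Each step pushes the encoder features through additional non-linear processing, so the features that reach the decoder are closer in abstraction level to the decoder's own features.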

III. DR-UNET ARCHITECTURE

In this work, we propose a new architecture for nuclei segmentation that uses Double ResPath (DR), called Double ResPath Unet (DR-Unet). The architecture of the model is presented in Fig. 3. To extract features in the Encoder, we first employ one Stem block, followed by three SE blocks interleaved with three Residual blocks. The first PASPP block then collects information about multi-scale objects efficiently. In the Decoder, we build three Residual blocks, each taking as input the output of the previous block combined with information from the Double Res Path. Finally, the output mask is generated by the second PASPP block, which has six atrous convolutional layers with higher dilation rates, followed by a 1×1 2D convolution layer and a sigmoid activation function.

Fig. 3. Proposed DR-Unet architecture.

A. Residual block

Each Residual block consists of two successive 3×3 convolutional blocks. In [16], He et al. demonstrated that using Batch Normalization and ReLU activation as pre-activation is surprisingly effective. For this reason, each convolutional block in this paper consists of a batch normalization layer, a ReLU activation layer, and a convolutional layer, in that order. A 1×1 convolutional layer is applied on the shortcut that connects the input and output of the encoder block. In the first convolutional layer of each encoder block, a strided convolution halves the spatial dimensions of the feature maps.

In the proposed architecture, the Squeeze and Excitation blocks are stacked together with the Residual blocks to improve generalization and the performance of the network [9].
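The effect of the strided convolution on feature-map size can be checked with the standard convolution output-size formula. The sketch below is a small worked example, not the authors' code: it assumes a 3×3 kernel with padding 1 and stride 2 in the first layer of each of the three encoder Residual blocks, and a 256×256 input, which is consistent with the 32×32 and 256×256 feature sizes quoted for the two PASPP blocks.

```python
def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_sizes(input_size, num_strided_blocks=3):
    """Feature-map size after each strided encoder block (assumed stride 2)."""
    sizes = [input_size]
    for _ in range(num_strided_blocks):
        sizes.append(conv_output_size(sizes[-1], stride=2))
    return sizes


print(encoder_sizes(256))  # [256, 128, 64, 32]
```

Under these assumptions, three halvings take a 256×256 input down to 32×32 at the bridge between Encoder and Decoder.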

B. Progressive Atrous Spatial Pyramid Pooling (PASPP)

Because of the potential and efficiency of PASPP, we use two PASPP blocks in place of the ASPP blocks of the ResUnet++ architecture. Since the input features of the first PASPP block are 32×32, we employ four atrous convolution layers with dilation rates of 1, 2, 4, and 8, respectively, as Yan et al. [20] did in their model. In the second PASPP block, the input size is 256×256, and we expect that more atrous convolutional layers with higher dilation rates will capture more context for the final prediction. For this reason, we adopt six atrous convolutions with dilation rates of 1, 2, 4, 8, 16, and 32, respectively.
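One way to see why higher dilation rates suit the larger feature map is to compute the spatial extent covered by each dilated kernel, using the standard relation k_eff = k + (k - 1)(d - 1). The sketch below assumes 3×3 kernels in every atrous layer, which the text does not state explicitly; the constant and function names are hypothetical.

```python
FIRST_PASPP_RATES = (1, 2, 4, 8)             # applied to 32x32 features
SECOND_PASPP_RATES = (1, 2, 4, 8, 16, 32)    # applied to 256x256 features


def effective_kernel_size(kernel_size, dilation_rate):
    """Extent covered by a dilated kernel along one dimension."""
    return kernel_size + (kernel_size - 1) * (dilation_rate - 1)


def paspp_extents(rates, kernel_size=3):
    """Effective kernel extent for every dilation rate in a block."""
    return [effective_kernel_size(kernel_size, rate) for rate in rates]


print(paspp_extents(FIRST_PASPP_RATES))   # [3, 5, 9, 17]
print(paspp_extents(SECOND_PASPP_RATES))  # [3, 5, 9, 17, 33, 65]
```

With rates up to 8, the widest kernel spans 17 pixels, about half of a 32×32 map. On a 256×256 map the same rates would cover only a small fraction of the image, whereas rates of 16 and 32 extend the span to 33 and 65 pixels.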

C. Double ResPath

Inspired by ResPath and the ResUnet++ model, we propose a new shortcut called "Double Res Path", illustrated in Figure 3. In the ResUnet++ architecture [9], an attention block combines low-level features in the Encoder with high-level features in the Decoder to identify which parts of the network need more attention. We believe that concatenating the output of ResPath with the decoder feature both before and after the upsampling step preserves context information most accurately. We also find that, when two ResPath blocks are used, the attention block still helps but only marginally, and removing it significantly reduces the computational cost while maintaining good results. Furthermore, since the semantic gap between Encoder and Decoder decreases at deeper levels of the network, we gradually reduce the number of convolutional blocks along the Double ResPath [11] to 3, 2, and 1, respectively. The number of filters in the Double ResPath layers matches the number of filters at its two ends: 64, 128, and 256, respectively.
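The per-level settings quoted above can be collected in a small configuration structure. This is only a bookkeeping sketch: the block counts (3, 2, 1) and filter counts (64, 128, 256) come from the text, while the `ResPathSpec` class, the `double_respath_plan` function, and the level numbering (1 = shallowest) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResPathSpec:
    level: int        # 1 = shallowest encoder level (assumed numbering)
    conv_blocks: int  # number of convolutional blocks along the path
    filters: int      # number of filters used by the layers on the path


def double_respath_plan(blocks=(3, 2, 1), filters=(64, 128, 256)):
    """Pair block counts with filter counts, one entry per skip level."""
    if len(blocks) != len(filters):
        raise ValueError("blocks and filters must have the same length")
    return [
        ResPathSpec(level=i + 1, conv_blocks=b, filters=f)
        for i, (b, f) in enumerate(zip(blocks, filters))
    ]


for spec in double_respath_plan():
    print(spec)
```

The plan makes the design trade-off visible: the shallowest level, where the semantic gap is largest, gets the most convolutional blocks but the fewest filters.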

IV. EXPERIMENT AND RESULTS

A. Datasets

In the Data Science Bowl 2018 challenge [5], scientists worldwide were challenged to detect cells in a series of microscopy images using machine learning techniques. The primary task is to find image segmentation algorithms that can be applied to a large number of test images without human intervention. Such methods could shorten the time needed to analyze images, allowing researchers to run and evaluate more experiments for research and clinical applications.

The Data Science Bowl 2018 dataset contains 670 training pairs and 65 testing pairs; each pair consists of an image and its corresponding masks [1]. The dataset includes five types of cell images, in different proportions: small fluorescent, purple tissue, pink and purple tissue, large fluorescent, and grayscale tissue [5].

B. Training

We implemented our model and used the Adam algorithm to optimize its trainable parameters, with a learning rate of $1 \times 10^{-4}$. Training runs over the dataset for 200 epochs with a batch size of 8. Early stopping and ReduceLROnPlateau are also used.
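The training schedule can be illustrated with a framework-independent sketch of the two mechanisms named above. The learning rate, epoch count, and batch size come from the text; the reduction factor, both patience values, and the minimum learning rate are not reported in the paper and are placeholder assumptions here, as are the names `TRAINING_CONFIG` and `PlateauController`.

```python
TRAINING_CONFIG = {
    "optimizer": "adam",
    "learning_rate": 1e-4,
    "epochs": 200,
    "batch_size": 8,
}


class PlateauController:
    """Combine reduce-on-plateau and early stopping on a monitored validation loss."""

    def __init__(self, lr=1e-4, factor=0.1, lr_patience=5,
                 stop_patience=10, min_lr=1e-7):
        # factor, patience values and min_lr are assumed, not taken from the paper.
        self.lr = lr
        self.factor = factor
        self.lr_patience = lr_patience
        self.stop_patience = stop_patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.wait = 0  # epochs since the last improvement

    def step(self, val_loss):
        """Update state after one epoch; return False when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.wait = 0
            return True
        self.wait += 1
        if self.wait % self.lr_patience == 0:
            self.lr = max(self.lr * self.factor, self.min_lr)
        return self.wait < self.stop_patience
```

In a training loop, `step` would be called once per epoch with the validation loss, the optimizer's learning rate would be set to `controller.lr`, and the loop would exit as soon as `step` returns `False` or 200 epochs have elapsed.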

C. Evaluation Metrics

To assess the segmentation performance of the network, we use the Dice Similarity Coefficient (DSC), defined as

$$\mathrm{DSC} = \frac{2\,TP}{2\,TP + FP + FN},$$

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively. In addition to DSC, we also use the Intersection over Union (IoU) index as an alternative evaluation measure, defined as

$$\mathrm{IoU} = \frac{TP}{TP + FP + FN}.$$
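Both metrics can be computed directly from the pixel-wise confusion counts, using the standard definitions DSC = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN). The sketch below works on binary masks stored as nested lists; the function names are hypothetical, and returning 1.0 when both masks are empty is a convention chosen here, not something specified in the paper.

```python
def confusion_counts(pred, truth):
    """Count TP, TN, FP, FN between two binary masks of equal shape."""
    tp = tn = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def dice_score(tp, fp, fn):
    """Dice Similarity Coefficient; 1.0 if both masks are empty (assumed convention)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 1.0


def iou_score(tp, fp, fn):
    """Intersection over Union; 1.0 if both masks are empty (assumed convention)."""
    denominator = tp + fp + fn
    return tp / denominator if denominator else 1.0


pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
tp, tn, fp, fn = confusion_counts(pred, truth)
print(dice_score(tp, fp, fn), iou_score(tp, fp, fn))
```

For a single mask pair the two metrics are tied by DSC = 2 IoU / (1 + IoU), so DSC is never lower than IoU; averages over a dataset need not satisfy this identity exactly.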

D. Results

Figure 4 presents representative segmentations produced by the proposed approach on the 2018 Data Science Bowl challenge dataset. As the figure shows, the predicted masks are in good agreement with the ground truth.

The learning curves obtained on the training and validation sets are shown in Fig. 5. The DSC, IoU, and accuracy are stable after 100 epochs. For quantitative assessment, Table I reports the DSC and IoU of the proposed model and of other state-of-the-art models: Double Unet, Multi ResUnet, and ResUnet++. Using PASPP (92.40 Dice score, 86.70 IoU score) instead of ASPP (91.23 Dice score, 85.20 IoU score) increases the average Dice score by 1.17 and the IoU score by 1.5. As shown in Table I, our approach obtains the highest scores for both DSC and IoU. In addition, the proposed model has fewer trainable parameters (2.6M) than ResUnet++ (4.1M), Multi ResUnet (7.3M), and DoubleUnet (29.3M).

TABLE I
RESULTS ON NUCLEI SEGMENTATION FROM THE 2018 DATA SCIENCE BOWL CHALLENGE

Model               Parameters    DSC      IoU
Double Unet         29,297,573    91.33    85.07
Multi ResUnet        7,275,844    90.92    84.40
Ours (use PASPP)     2,560,973    92.40    86.70
Ours (use ASPP)      3,584,497    91.23    85.20

Fig. 4. Some representative segmentation results of DR-Unet on nuclei images from the 2018 Data Science Bowl challenge dataset.
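The comparisons drawn from Table I can be reproduced with a few lines of arithmetic. The sketch below uses only the parameter counts and scores that appear in the table; the dictionary layout and the function names are hypothetical.

```python
# (parameters, DSC, IoU) for each row of Table I
TABLE_I = {
    "Double Unet": (29_297_573, 91.33, 85.07),
    "Multi ResUnet": (7_275_844, 90.92, 84.40),
    "Ours (use PASPP)": (2_560_973, 92.40, 86.70),
    "Ours (use ASPP)": (3_584_497, 91.23, 85.20),
}


def parameter_ratio(model, reference):
    """Parameter count of `model` as a fraction of `reference`."""
    return TABLE_I[model][0] / TABLE_I[reference][0]


def score_gain(model, reference):
    """Difference in (DSC, IoU) between two rows, rounded to two decimals."""
    _, dsc_m, iou_m = TABLE_I[model]
    _, dsc_r, iou_r = TABLE_I[reference]
    return round(dsc_m - dsc_r, 2), round(iou_m - iou_r, 2)


print(score_gain("Ours (use PASPP)", "Ours (use ASPP)"))  # (1.17, 1.5)
print(parameter_ratio("Ours (use PASPP)", "Double Unet"))
```

The PASPP variant gains 1.17 DSC and 1.5 IoU over the ASPP variant while using fewer parameters, and it needs less than a tenth of the parameters of Double Unet.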

V. CONCLUSION

In this paper, we have proposed the DR-Unet architecture for nuclei image segmentation. The model takes advantage of Residual blocks and SE blocks. Furthermore, we replaced ASPP with PASPP, which allows more efficient extraction of semantic context, and developed the Double Res Path to reduce the semantic gap between features when they are combined. In our experiments, DR-Unet outperformed several previous state-of-the-art models on the 2018 Data Science Bowl dataset, which demonstrates its potential for cell image segmentation. Nevertheless, we believe the DR-Unet architecture can be improved further. In the future, besides developing the DR-Unet model, we will also develop appropriate loss functions for image segmentation to achieve even better performance.

Fig. 5. Learning curve.

ACKNOWLEDGMENT

This research is funded by the Hanoi University of Science and Technology (HUST) under project number T2021-PC-005.

REFERENCES

[1] "2018 Data Science Bowl | Kaggle", Kaggle.com, 2021. [Online]. Available: https://www.kaggle.com/c/data-science-bowl-2018/overview. [Accessed: 11-Nov-2021].
[2] Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9, 62-66 (1979).
[3] Wählby, C., Sintorn, I.-M., Erlandsson, F., Borgefors, G., Bengtsson, E.: Combining intensity, edge and shape information for 2D and 3D segmentation of cell nuclei in tissue sections. Journal of Microscopy 215, 67-76 (2004).
[4] Hayakawa, T., Surya Prasath, V.B., Kawanaka, H., Aronow, B.J., Tsuruoka, S.: Computational Nuclei Segmentation Methods in Digital Pathology: A Survey. Archives of Computational Methods in Engineering 28, 1-13 (2021). mining and advanced computing (SAPIENCE) (pp. 198-203). IEEE.
[5] Caicedo, J.C., Goodman, A., Karhohs, K.W., Cimini, B.A., Ackerman, J., Haghighi, M., Heng, C., Becker, T., Doan, M., McQuin, C., et al.: Nucleus segmentation across imaging experiments: the 2018 data science bowl. Nature Methods 16(12), 1247–1253 (2019).
[6] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I. Sánchez, "A survey on deep learning in medical image analysis," Medical Image Analysis (MedIA), vol. 42, pp. 60–88, 2017.
[7] E. Shelhamer, J. Long, and T. Darrell, "Fully Convolutional Networks for Semantic Segmentation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pp. 640-651, 2017. Available: 10.1109/tpami.2016.2572683.
[8] Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention 2015, pp. 234-241. Springer.
[9] D. Jha, P. H. Smedsrud, M. A. Riegler, D. Johansen, T. De Lange, P. Halvorsen, and H. D. Johansen, "Resunet++: An advanced architecture for medical image segmentation," in Proceedings of IEEE International Symposium on Multimedia (ISM), 2019, pp. 225–2255.
[10] Jha, D., Riegler, M., Johansen, D., Halvorsen, P., Johansen, H.: Doubleu-net: A deep convolutional neural network for medical image segmentation. In: IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS) 2020, pp. 558-564.
[11] N. Ibtehaz and M. Rahman, "MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation", Neural Networks, vol. 121, pp. 74-87, 2020.
[12] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, 2014.
[13] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. arXiv:1409.4842, 2014.
[14] Z. Zhang, Q. Liu, and Y. Wang, "Road extraction by deep residual unet," IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 5, pp. 749–753, 2018.
[15] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[16] K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In ECCV, 2016.
[17] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7132–7141.
[18] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. arXiv:1606.00915, 2016.
[19] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam. Rethinking atrous convolution for semantic image segmentation. CoRR, abs/1706.05587, 2017.
[20] Q. Yan, B. Wang, D. Gong, C. Luo, W. Zhao, J. Shen, Q. Shi, S. Jin, L. Zhang, and Z. You, "Covid-19 chest ct image segmentation: a deep convolutional neural network solution," arXiv preprint arXiv:2004.10987, 2020.

Title	DR-Unet: Rethinking the ResUnet++ Architecture with Dual ResPath Skip Connection for Nuclei Segmentation
Authors	Nhat-Minh Le, Dinh-Hung Le, Van-Truong Pham, Thi-Thao Tran
Institution	Hanoi University of Science and Technology
Major	Computer Science
Type	Conference paper
Year of publication	2021
City	Hanoi
