DR-Unet: Rethinking the ResUnet++ Architecture with Dual ResPath Skip Connection for Nuclei Segmentation
Nhat-Minh Le
Faculty of Automation Engineering
School of Electrical and Electronics Engineering
Hanoi University of Science and Technology
minh.ln181647@sis.hust.edu.vn

Dinh-Hung Le
Faculty of Automation Engineering
School of Electrical and Electronics Engineering
Hanoi University of Science and Technology
hung.ld181504@sis.hust.edu.vn

Van-Truong Pham*
Faculty of Automation Engineering
School of Electrical and Electronics Engineering
Hanoi University of Science and Technology
truong.phamvan@hust.edu.vn

Thi-Thao Tran
Faculty of Automation Engineering
School of Electrical and Electronics Engineering
Hanoi University of Science and Technology
thao.tranthi@hust.edu.vn
Abstract—Nuclei segmentation is a crucial stage in the analysis of cell microscopy images. By identifying nuclei, researchers can identify and characterize each cell in a sample. Several encoder-decoder models, such as U-Net, Multi ResUnet, DoubleUnet, and ResUnet++, have been applied to the 2018 Data Science Bowl dataset and have given excellent results. However, in ResUnet++ there is still a semantic gap between encoder features and the decoder features they are directly connected to, and the extraction of information over regions of different sizes is still limited. To improve the performance of ResUnet++ in this segmentation task, we propose a new architecture that uses a Double ResPath (DR), called Double ResPath Unet (DR-Unet). DR-Unet retains some of the components that made ResUnet++ successful, such as residual blocks combined with squeeze-and-excitation blocks. In addition, we pass the encoder features through ResPaths, which bridge the semantic gap, instead of combining encoder and decoder features directly. Moreover, we replace Atrous Spatial Pyramid Pooling (ASPP) with Progressive Atrous Spatial Pyramid Pooling (PASPP) to capture contextual information more efficiently. Experimental results demonstrate that DR-Unet outperforms ResUnet++, DoubleUnet, and other models in the benchmark.
Index Terms—Deep Learning, Nuclei Segmentation, Image
Segmentation, ResUnet++, Multi ResUnet
I. INTRODUCTION
Medical image segmentation is the task of segmenting objects of interest in medical images; more specifically, it assigns a label to each pixel of a medical image. In this field, nuclei segmentation has received considerable attention. Identifying the cells' nuclei helps locate the cells under varying conditions, which can enable faster cures, improve research throughput and insight, and reduce time-to-market for new drugs [1]. However, manual segmentation of nuclei is highly time-consuming and labor-intensive, and its accuracy depends heavily on the expertise of the annotator. Automatic nucleus segmentation is therefore an essential requirement.
The Otsu-based method [2], the watershed method [3], and active contours [4] are a few of the traditional image segmentation approaches that have been applied to this problem. Most of these approaches, however, are inefficient, time-consuming, and computationally demanding. Another noteworthy factor is that the nuclei in the 2018 Data Science Bowl dataset [5] vary greatly in shape, size, color, and boundary, and traditional techniques cannot distinguish certain nuclei whose boundaries are unclear.
Recently, with the rapid development of Convolutional Neural Networks (CNNs), CNN-based segmentation methods have outperformed traditional methods in many segmentation tasks [6]. Long et al. proposed the Fully Convolutional Network (FCN) [7], one of the first deep learning architectures trained end-to-end for pixel-wise prediction. FCN uses an encoder to extract features from the input image and a decoder to generate a segmentation mask from those features. U-Net [8] is another popular method. Since then, many segmentation architectures based on encoder-decoder structures have been proposed and have achieved good results. ResUnet++ [9], developed from ResUnet, takes advantage of residual units, squeeze-and-excitation units, Atrous Spatial Pyramid Pooling (ASPP), and attention units, and shows great potential in medical image segmentation. DoubleUnet [10] stacks two U-Net [8] architectures in sequence, with two encoders and two decoders. Multi ResUnet [11], an enhanced version of the standard U-Net architecture, uses the MultiRes block to capture more spatial information and the ResPath to reduce the semantic gap between corresponding encoder and decoder features.
The above models have proven effective in nuclei segmentation, especially on the 2018 Data Science Bowl dataset [5]. We aim to inherit and combine the strengths of these models and develop a new architecture that achieves better results. With this motivation, this paper develops a novel architecture for medical image segmentation that uses a Double ResPath scheme, called DR-Unet, inspired by the ResUnet++ [9] architecture. We tested the model on the 2018 Data Science Bowl dataset [5]. The results indicate that the improved model is efficient and performs well compared to ResUnet++ and other models in the benchmark.
The paper is organized as follows. Section II reviews related work. The proposed model is presented in Section III. Experimental results on the 2018 Data Science Bowl dataset are given in Section IV. Finally, Section V summarizes the paper and discusses limitations and future work.
II. RELATED WORK
Deep learning-based algorithms have recently been widely used in medical imaging applications such as image super-resolution, classification, and especially segmentation. Along with the development of deep learning, image segmentation has achieved remarkable results, and many deep learning-based techniques have been used to segment cells and nuclei. However, some problems remain that call for new methods.
In principle, deeper models can achieve higher accuracy [12], [13]. However, once a model reaches a certain depth, a degradation problem appears [14], [15]. He et al. proposed a residual learning framework [15] to overcome this problem and increase the depth of the model. Residual blocks have a simple architecture but allow deeper models to be trained without the degradation problem.
The Squeeze-and-Excitation (SE) block [17] aims to improve the quality of convolutional neural networks by recalibrating the features of each channel, modeling the dependencies between channels, and selectively emphasizing channels that contain more important features. For any convolutional layer, a corresponding SE block can be constructed to recalibrate its feature maps. This is achieved in two steps:
• Features are passed through a transformation, usually several convolutional layers, that extracts feature maps. These feature maps then go through a squeeze function (global average pooling), which produces one descriptor value per channel.
• An excitation function is applied right after that. It takes the channel descriptors as input and computes weights describing the dependencies between the channels. These weights are then multiplied with the earlier feature maps to emphasize the features most relevant to the task.
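As a rough illustration of the two steps above, the following NumPy sketch applies a squeeze-and-excitation recalibration to a single (H, W, C) feature map. The reduction ratio r = 4, the random weights, and the function name `squeeze_excitation` are illustrative assumptions, not values taken from DR-Unet or ResUnet++.

```python
import numpy as np

def squeeze_excitation(features, w1, b1, w2, b2):
    """Recalibrate the channels of an (H, W, C) feature map.

    w1: (C, C // r) and w2: (C // r, C) are the two fully connected layers
    of the excitation step; r is the reduction ratio (an assumption here).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = features.mean(axis=(0, 1))                  # shape (C,)
    # Excitation: FC -> ReLU -> FC -> sigmoid yields one weight in (0, 1) per channel.
    hidden = np.maximum(z @ w1 + b1, 0.0)           # shape (C // r,)
    s = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))   # shape (C,)
    # Scale: every spatial position of channel c is multiplied by s[c].
    return features * s, s

rng = np.random.default_rng(0)
C, r = 16, 4
x = rng.standard_normal((32, 32, C))
w1, b1 = rng.standard_normal((C, C // r)) * 0.1, np.zeros(C // r)
w2, b2 = rng.standard_normal((C // r, C)) * 0.1, np.zeros(C)
y, weights = squeeze_excitation(x, w1, b1, w2, b2)
print(y.shape, weights.shape)  # (32, 32, 16) (16,)
```

Because each weight lies in (0, 1), the block rescales channels relative to one another rather than creating new features; in a trained network the fully connected weights are learned rather than drawn at random.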
In medical image segmentation, two issues usually arise. First, as the model becomes deeper, the resolution of the features is reduced by the repeated application of pooling layers, which makes it challenging to extract spatial information. Second, the objects of interest appear at different scales. Chen et al. [18], [19] proposed Atrous Spatial Pyramid Pooling (ASPP) to deal with these problems. ASPP combines the outputs of atrous convolutions with different dilation rates, which increases the ability to capture global information while keeping the size of the feature map unchanged. In [20], Yan et al. proposed Progressive ASPP (PASPP), based on ASPP. PASPP still uses atrous convolution layers with different dilation rates, but their outputs are not combined at once; instead, they are combined gradually across different receptive fields. Figure 1 depicts the architecture of the PASPP block.
Fig. 1. PASPP architecture with four atrous convolutional layers.
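The progressive fusion can be sketched in NumPy. This is a hedged reading of PASPP, not a reproduction of [20]: we assume each branch is a 3×3 atrous convolution followed by ReLU, that adjacent branches are fused pairwise by addition, and that the fused maps are concatenated and projected by a 1×1 convolution. The function names `atrous_conv3x3` and `paspp` and all weight shapes are hypothetical.

```python
import numpy as np

def atrous_conv3x3(x, w, rate):
    """'Same'-padded 3x3 atrous convolution.

    x: (H, W, Cin) feature map, w: (3, 3, Cin, Cout) kernel.
    The three taps along each axis are `rate` pixels apart.
    """
    h, wd = x.shape[:2]
    xp = np.pad(x, ((rate, rate), (rate, rate), (0, 0)))  # zero padding keeps H x W
    out = np.zeros((h, wd, w.shape[3]))
    for i in range(3):
        for j in range(3):
            out += xp[i * rate:i * rate + h, j * rate:j * rate + wd, :] @ w[i, j]
    return out

def paspp(x, kernels, rates, w_out):
    """Hypothetical progressive fusion: ReLU(atrous conv) per rate, pairwise sums of
    neighbouring branches, concatenation, then a 1x1 projection (w_out)."""
    if len(rates) % 2:
        raise ValueError("pairwise fusion needs an even number of branches")
    branches = [np.maximum(atrous_conv3x3(x, k, r), 0.0) for k, r in zip(kernels, rates)]
    fused = [branches[i] + branches[i + 1] for i in range(0, len(branches), 2)]
    merged = np.concatenate(fused, axis=-1)
    return merged @ w_out  # a 1x1 convolution is a matmul over the channel axis

rng = np.random.default_rng(0)
rates = [1, 2, 4, 8]                      # dilation rates of the first PASPP block
x = rng.standard_normal((32, 32, 8))
kernels = [rng.standard_normal((3, 3, 8, 4)) * 0.1 for _ in rates]
w_out = rng.standard_normal((4 * len(rates) // 2, 16)) * 0.1
y = paspp(x, kernels, rates, w_out)
print(y.shape)  # (32, 32, 16)
```

Every branch keeps the 32×32 resolution, which is the property ASPP-style blocks rely on; only the spacing of the kernel taps, and hence the region each branch sees, changes with the rate.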
Fig. 2. ResPath, introduced by Ibtehaz and Rahman [11]. It helps to reduce the semantic gap between the encoder and the decoder.
In the U-Net architecture [8], Ronneberger et al. proposed skip connections between the convolutional layers immediately before each max-pooling layer in the encoder and the convolutional layers immediately after the corresponding deconvolution layer in the decoder. This allows the encoder to pass to the decoder the spatial information that is lost through downsampling. However, Ibtehaz and Rahman [11] pointed out that a problem with such simple skip connections is the information imbalance between encoder and decoder features. The encoder features are considered low-level features compared to those in the decoder, because the decoder features are computed in deeper layers of the network. Directly combining them can therefore introduce discrepancies that adversely affect the segmentation results. To address this problem, Ibtehaz and Rahman [11] devised the “Res Path”, which places a chain of convolutional layers with residual connections along the skip connection to lessen the information gap between the encoder and the decoder, as shown in Figure 2.
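A minimal NumPy sketch of one Res Path, modelled on our understanding of the unit in [11]: each step adds a 3×3 convolution branch (with ReLU) to a 1×1 convolution shortcut and applies ReLU to the sum. Batch normalization is omitted and the kernels are random, so this only illustrates the data flow and shapes; `conv2d_same`, `res_path`, and their arguments are hypothetical names.

```python
import numpy as np

def conv2d_same(x, w):
    """Stride-1, zero-padded 'same' convolution. x: (H, W, Cin), w: (k, k, Cin, Cout), k odd."""
    k = w.shape[0]
    p = k // 2
    h, wd = x.shape[:2]
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros((h, wd, w.shape[3]))
    for i in range(k):
        for j in range(k):
            out += xp[i:i + h, j:j + wd, :] @ w[i, j]
    return out

def res_path(x, filters, length, rng):
    """Pass an encoder feature map through `length` residual units before the skip merge."""
    for _ in range(length):
        cin = x.shape[-1]
        w3 = rng.standard_normal((3, 3, cin, filters)) * 0.1  # random stand-ins for learned kernels
        w1 = rng.standard_normal((1, 1, cin, filters)) * 0.1
        main = np.maximum(conv2d_same(x, w3), 0.0)  # 3x3 conv + ReLU
        shortcut = conv2d_same(x, w1)               # 1x1 conv shortcut
        x = np.maximum(main + shortcut, 0.0)        # residual sum + ReLU (BN omitted)
    return x

rng = np.random.default_rng(0)
enc = rng.standard_normal((64, 64, 16))
out = res_path(enc, filters=32, length=4, rng=rng)
print(out.shape)  # (64, 64, 32)
```

The Res Path never changes the spatial size, so its output can be merged with the decoder feature exactly where the plain skip connection would have been; only the number of processing steps between encoder and decoder grows with `length`.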
III. DR-UNET ARCHITECTURE
Fig. 3. Proposed DR-Unet architecture.
In the current work, we propose a new architecture for nuclei segmentation that uses a Double ResPath (DR), called Double ResPath Unet (DR-Unet). The architecture of the model is presented in Fig. 3. To extract feature information in the encoder, we first employ a Stem block, followed by three SE blocks interleaved with three Residual blocks. The first PASPP block then efficiently collects information about objects at multiple scales. In the decoder, we build three Residual blocks, each taking as input the output of the previous block combined with information from the Double ResPath. Finally, we combine the second PASPP block, which has six atrous convolutional layers with higher dilation rates, a 1×1 2D convolution layer, and a sigmoid activation function to generate the output mask.
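The resolutions given in Section III-B (32×32 at the first PASPP block, 256×256 at the second) can be traced with a short Python sketch. It assumes a 256×256 input, a Stem block that preserves resolution, residual encoder blocks that halve the spatial size with a stride-2 convolution, and decoder blocks that double it; channel counts are not modelled. These assumptions are ours, and `trace_shapes` is a hypothetical helper.

```python
def trace_shapes(input_size=256, n_down=3):
    """Spatial size (H = W) after each stage of the assumed DR-Unet layout."""
    stages = [("stem", input_size)]
    size = input_size
    for k in range(1, n_down + 1):
        size //= 2                       # stride-2 convolution in each encoder residual block
        stages.append((f"encoder residual block {k}", size))
    stages.append(("PASPP bridge", size))
    for k in range(1, n_down + 1):
        size *= 2                        # x2 upsampling in each decoder block
        stages.append((f"decoder residual block {k}", size))
    stages.append(("output PASPP + 1x1 conv + sigmoid", size))
    return stages

for name, s in trace_shapes():
    print(f"{name:36s} {s}x{s}")
```

Under these assumptions the bridge sees 32×32 features and the output head sees 256×256 features, consistent with the sizes used to choose the dilation rates of the two PASPP blocks.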
A. Residual Block
Each Residual block consists of two successive 3×3 convolutional blocks. In [16], He et al. demonstrated that using batch normalization and ReLU activation as pre-activation is surprisingly effective. For this reason, each convolutional block in this paper consists of a batch normalization layer, a ReLU activation layer, and a convolutional layer, in that order. A 1×1 convolutional layer is applied on the shortcut that connects the input and output of the encoder block. A strided convolution at the first convolutional layer of the encoder block halves the spatial dimensions of the feature maps.
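The pre-activation ordering (BN, then ReLU, then convolution) and the strided first convolution can be sketched in NumPy. Under our assumptions, batch normalization is approximated by per-channel standardization over the spatial axes, the strided convolution is emulated by subsampling a stride-1 'same' convolution, the shortcut's 1×1 convolution uses the same stride so the two paths align, and the kernels are random. All function names are hypothetical.

```python
import numpy as np

def conv2d(x, w, stride=1):
    """'Same'-padded convolution; stride 2 is emulated by subsampling a stride-1 output."""
    k = w.shape[0]
    p = k // 2
    h, wd = x.shape[:2]
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros((h, wd, w.shape[3]))
    for i in range(k):
        for j in range(k):
            out += xp[i:i + h, j:j + wd, :] @ w[i, j]
    return out[::stride, ::stride, :]

def bn_relu(x, eps=1e-5):
    """Stand-in for batch norm (per-channel standardization over H, W) followed by ReLU."""
    mean = x.mean(axis=(0, 1))
    var = x.var(axis=(0, 1))
    return np.maximum((x - mean) / np.sqrt(var + eps), 0.0)

def residual_block(x, filters, rng, stride=2):
    cin = x.shape[-1]
    w_a = rng.standard_normal((3, 3, cin, filters)) * 0.1
    w_b = rng.standard_normal((3, 3, filters, filters)) * 0.1
    w_s = rng.standard_normal((1, 1, cin, filters)) * 0.1
    # Main path: (BN -> ReLU -> 3x3 conv) twice; only the first conv is strided.
    main = conv2d(bn_relu(x), w_a, stride=stride)
    main = conv2d(bn_relu(main), w_b)
    # Shortcut: 1x1 conv matches the channel count; the same stride matches H and W.
    shortcut = conv2d(x, w_s, stride=stride)
    return main + shortcut

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32, 8))
y = residual_block(x, 16, rng)
print(y.shape)  # (16, 16, 16)
```

Without the 1×1 projection the shortcut (8 channels, 32×32) could not be added to the main path (16 channels, 16×16); the projection is what makes the residual sum well defined when the block changes shape.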
In the proposed architecture, the Squeeze-and-Excitation blocks are stacked together with the Residual blocks to increase effective generalization and improve the performance of the network [9].
B. Progressive Atrous Spatial Pyramid Pooling (PASPP)
Because of the potential and efficiency of PASPP, we adopt two PASPP blocks in the ResUnet++-based architecture. Since the input features of the first PASPP block are 32×32, we employ four atrous convolution layers with dilation rates of 1, 2, 4, and 8, respectively, following Yan et al. [20]. Since the input of the second PASPP block is 256×256, we expect more atrous convolutional layers with higher dilation rates to yield a better-informed final result. For this reason, we adopt six atrous convolutions with dilation rates of 1, 2, 4, 8, 16, and 32, respectively.
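One way to see why the dilation rates grow with the input size is the span of a single 3×3 atrous kernel, k + (k − 1)(r − 1) = 2r + 1 pixels for k = 3. The Python sketch below evaluates it for both PASPP blocks; it describes one layer in isolation, not the receptive field of the whole network, and `effective_kernel` is a hypothetical helper name.

```python
def effective_kernel(k, rate):
    """Pixels spanned by a k x k kernel whose taps are `rate` pixels apart."""
    return k + (k - 1) * (rate - 1)

blocks = {"first PASPP": (32, [1, 2, 4, 8]),
          "second PASPP": (256, [1, 2, 4, 8, 16, 32])}
for name, (size, rates) in blocks.items():
    spans = [effective_kernel(3, r) for r in rates]
    # print the spans and their fraction of the feature-map width
    print(name, spans, [f"{s / size:.2f}" for s in spans])
# first PASPP  -> spans [3, 5, 9, 17]
# second PASPP -> spans [3, 5, 9, 17, 33, 65]
```

On the 32×32 map, rate 8 already spans 17 of 32 pixels, while rate 16 would span 33, wider than the map, so along each axis at least one outer tap would fall in the zero padding for every output position. On the 256×256 map, even rate 32 spans only about a quarter of the width, which is consistent with adding the larger rates there.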
C. Double ResPath
Inspired by ResPath and the ResUnet++ model, we propose a new shortcut called the “Double Res Path”, illustrated in Figure 3. In the ResUnet++ architecture [9], an attention block combines low-level features in the encoder with high-level features in the decoder to identify which parts of the network need more attention. We believe that concatenating the output of ResPath to the decoder features both before and after the upsampling step preserves context information most accurately. We also found that, with two ResPath blocks, the attention block adds only a small benefit, and removing it significantly reduces the computational cost while still giving good results. Furthermore, since the semantic gap between encoder and decoder features decreases at deeper levels of the network, we gradually reduce the number of convolutional blocks along the Double ResPath [11] to 3, 2, and 1, respectively. Correspondingly, to match the number of filters at the ends of the Double ResPath, we set the number of filters in its layers to 64, 128, and 256, respectively.
IV. EXPERIMENTS AND RESULTS
A. Dataset
In the 2018 Data Science Bowl [5], scientists worldwide were challenged to detect cell nuclei in a series of microscopy images using machine learning techniques. The primary task was to develop segmentation algorithms that can be applied to a large number of test images without human intervention. Such methods could shorten image analysis time, allowing researchers to run more experiments for research and clinical applications.
The 2018 Data Science Bowl dataset contains 670 training pairs and 65 testing pairs, where each pair consists of an image and its corresponding masks [1]. The dataset includes five types of cell images in different proportions: small fluorescent, purple tissue, pink and purple tissue, large fluorescent, and grayscale tissue [5].
B. Training
We implemented our model and used the Adam algorithm to optimize its trainable parameters with a learning rate of 1×10⁻⁴. The model was trained for 200 epochs with a batch size of 8. Early stopping and ReduceLROnPlateau learning-rate scheduling were also used.
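The interaction between early stopping and learning-rate reduction on a plateau can be illustrated with a simplified, framework-free Python re-implementation of their logic. The paper does not report the patience, reduction factor, or minimum learning rate, so `lr_patience=5`, `lr_factor=0.1`, `min_lr=1e-6`, and `stop_patience=10` are hypothetical values, and `simulate_schedule` is a hypothetical helper. Keras's actual `ReduceLROnPlateau` and `EarlyStopping` callbacks also support options such as `min_delta` (both) and `cooldown` (ReduceLROnPlateau) that are omitted here.

```python
def simulate_schedule(val_losses, lr=1e-4, max_epochs=200,
                      lr_patience=5, lr_factor=0.1, min_lr=1e-6, stop_patience=10):
    """Return (stop_epoch, lr_history) for a sequence of validation losses."""
    best = float("inf")
    since_best = 0     # epochs without improvement (drives early stopping)
    since_reduce = 0   # epochs without improvement since the last LR cut
    history = []
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, since_best, since_reduce = loss, 0, 0
        else:
            since_best += 1
            since_reduce += 1
        if since_reduce >= lr_patience:          # plateau: shrink the learning rate
            lr = max(lr * lr_factor, min_lr)
            since_reduce = 0
        history.append(lr)
        if since_best >= stop_patience:          # no progress for too long: stop
            return epoch, history
    return min(len(val_losses), max_epochs), history

losses = [1.0, 0.8, 0.6] + [0.6] * 12   # validation loss plateaus after epoch 3
stop, lrs = simulate_schedule(losses)
print(stop)  # 13
```

With these settings the learning rate is cut once, at epoch 8, before training halts at epoch 13, so the 200-epoch budget acts as an upper bound rather than the actual training length.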
C. Evaluation Metrics
To assess the network's segmentation performance, we use the Dice Similarity Coefficient (DSC), defined as
$$\mathrm{DSC} = \frac{2\,TP}{2\,TP + FP + FN}$$
where TP, TN, FP, and FN are, respectively, the numbers of true positives, true negatives, false positives, and false negatives. In addition to DSC, we use the Intersection over Union (IoU) as an alternative evaluation measure, defined as
$$\mathrm{IoU} = \frac{TP}{TP + FP + FN}$$
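Both formulas can be computed directly from a pair of binary masks. The following NumPy sketch counts TP, FP, and FN pixel-wise for a single image; how scores are aggregated over the test set is not stated in the paper, and treating two empty masks as a perfect match is our assumption. `dice_and_iou` is a hypothetical helper name.

```python
import numpy as np

def dice_and_iou(pred, target):
    """DSC and IoU of two binary masks, computed from TP, FP and FN."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = int(np.sum(pred & target))
    fp = int(np.sum(pred & ~target))
    fn = int(np.sum(~pred & target))
    if tp + fp + fn == 0:   # both masks empty: treated as perfect agreement (assumed convention)
        return 1.0, 1.0
    return 2 * tp / (2 * tp + fp + fn), tp / (tp + fp + fn)

target = np.zeros((4, 4), dtype=np.uint8)
target[1:3, 1:3] = 1                       # 4 ground-truth nucleus pixels
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:4] = 1                         # 6 predicted pixels: TP = 4, FP = 2, FN = 0
dsc, iou = dice_and_iou(pred, target)
print(dsc, iou)  # 0.8 and 2/3
```

For a single image the two scores are linked by IoU = DSC / (2 − DSC) (here 0.8 / 1.2 = 2/3). The identity does not carry over to scores averaged over many images, so averaged DSC and IoU values need not satisfy it exactly.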
D. Results
Figure 4 presents representative segmentation results of the proposed approach on the 2018 Data Science Bowl challenge dataset. As shown in this figure, the predicted masks agree well with the ground truths.
In addition, the learning curves obtained on the training and validation sets are shown in Fig. 5. The figure shows that the DSC, IoU, and accuracy are stable after 100 epochs. For quantitative assessment, Table I reports the DSC and IoU of the proposed model and other state-of-the-art methods, namely DoubleUnet, Multi ResUnet, and ResUnet++. Using PASPP instead of ASPP increases the average Dice score by 1.17 points (92.40 vs. 91.23) and the IoU score by 1.50 points (86.70 vs. 85.20). As shown in Table I, our approach obtains the highest DSC and IoU scores. In addition, the proposed model has fewer trainable parameters (2.6M) than ResUnet++ (4.1M), Multi ResUnet (7.3M), and DoubleUnet (29.3M).
TABLE I
RESULTS ON NUCLEI SEGMENTATION FROM THE 2018 DATA SCIENCE BOWL CHALLENGE
Method              Parameters    DSC (%)   IoU (%)
Double Unet         29,297,573    91.33     85.07
Multi ResUnet        7,275,844    90.92     84.40
Ours (with PASPP)    2,560,973    92.40     86.70
Ours (with ASPP)     3,584,497    91.23     85.20
Fig. 4. Some representative segmentation results of DR-Unet on nuclei images from the 2018 Data Science Bowl challenge dataset.
V. CONCLUSION
In this paper, we have proposed the DR-Unet architecture for nuclei image segmentation. The model takes advantage of Residual blocks and SE blocks. Furthermore, we replaced ASPP with PASPP, allowing more efficient semantic context extraction, and developed the Double ResPath to reduce the semantic gap between features when they are combined. Our model outperformed several previous state-of-the-art models on the 2018 Data Science Bowl dataset, which also demonstrates the potential of DR-Unet for cell image segmentation. Nevertheless, we believe the DR-Unet architecture can be improved further. In the future, besides developing the DR-Unet model, we will develop appropriate loss functions for image segmentation to achieve even better performance.
Fig. 5. Learning curves.
ACKNOWLEDGMENT
This research is funded by the Hanoi University of Science and Technology (HUST) under project number T2021-PC-005.
REFERENCES
[1] “2018 Data Science Bowl,” Kaggle.com, 2021. [Online]. Available: https://www.kaggle.com/c/data-science-bowl-2018/overview. [Accessed: Nov. 11, 2021].
[2] N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Trans. Syst., Man, Cybern., vol. 9, pp. 62–66, 1979.
[3] C. Wählby, I.-M. Sintorn, F. Erlandsson, G. Borgefors, and E. Bengtsson, “Combining intensity, edge and shape information for 2D and 3D segmentation of cell nuclei in tissue sections,” Journal of Microscopy, vol. 215, pp. 67–76, 2004.
[4] T. Hayakawa, V. B. Surya Prasath, H. Kawanaka, B. J. Aronow, and S. Tsuruoka, “Computational nuclei segmentation methods in digital pathology: A survey,” Archives of Computational Methods in Engineering, vol. 28, pp. 1–13, 2021.
[5] J. C. Caicedo, A. Goodman, K. W. Karhohs, B. A. Cimini, J. Ackerman, M. Haghighi, C. Heng, T. Becker, M. Doan, C. McQuin, et al., “Nucleus segmentation across imaging experiments: the 2018 Data Science Bowl,” Nature Methods, vol. 16, no. 12, pp. 1247–1253, 2019.
[6] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. van der Laak, B. van Ginneken, and C. I. Sánchez, “A survey on deep learning in medical image analysis,” Medical Image Analysis, vol. 42, pp. 60–88, 2017.
[7] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 4, pp. 640–651, 2017, doi: 10.1109/tpami.2016.2572683.
[8] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. Int. Conf. Medical Image Computing and Computer-Assisted Intervention (MICCAI), Springer, 2015, pp. 234–241.
[9] D. Jha, P. H. Smedsrud, M. A. Riegler, D. Johansen, T. de Lange, P. Halvorsen, and H. D. Johansen, “ResUNet++: An advanced architecture for medical image segmentation,” in Proc. IEEE Int. Symp. Multimedia (ISM), 2019, pp. 225–2255.
[10] D. Jha, M. Riegler, D. Johansen, P. Halvorsen, and H. Johansen, “DoubleU-Net: A deep convolutional neural network for medical image segmentation,” in Proc. IEEE 33rd Int. Symp. Computer-Based Medical Systems (CBMS), 2020, pp. 558–564.
[11] N. Ibtehaz and M. Rahman, “MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation,” Neural Networks, vol. 121, pp. 74–87, 2020.
[12] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv:1409.1556, 2014.
[13] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” arXiv:1409.4842, 2014.
[14] Z. Zhang, Q. Liu, and Y. Wang, “Road extraction by deep residual U-Net,” IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 5, pp. 749–753, 2018.
[15] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[16] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” in Proc. European Conf. Computer Vision (ECCV), 2016.
[17] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7132–7141.
[18] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” arXiv:1606.00915, 2016.
[19] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation,” CoRR, abs/1706.05587, 2017.
[20] Q. Yan, B. Wang, D. Gong, C. Luo, W. Zhao, J. Shen, Q. Shi, S. Jin, L. Zhang, and Z. You, “COVID-19 chest CT image segmentation: A deep convolutional neural network solution,” arXiv preprint arXiv:2004.10987, 2020.