MINISTRY OF EDUCATION AND TRAINING
HO CHI MINH CITY OPEN UNIVERSITY
MASTER OF SCIENCE IN COMPUTER SCIENCE
A computer vision-based method for breast cancer histopathological image classification
by deep learning approach
Student: Mai Bui Thuy Huynh
Thesis Advisor: Dr. Vinh Truong Hoang
Ho Chi Minh City, October 2019
Contents

Acknowledgment
Abstract
Notations
Abbreviations

1 Literature review of breast cancer histopathological image classification
  1.1 Introduction and general considerations
  1.2 Goals of the thesis
  1.3 Contribution of the thesis
  1.4 Structure of the thesis
  1.5 Methodology
2 Foundational theory
  2.1 Deep neural network
    2.1.1 Introduction to deep neural networks
    2.1.2 Techniques of neural network training
    2.1.3 Popular deep network models
  2.2 Generative Adversarial Networks (GAN)
    2.2.1 Introduction to GAN
    2.2.2 Popular GAN models
[...]
    3.2.2 BACH dataset
Acknowledgment
Abstract
The computer vision field has become more active in recent decades as scientists have found ways to apply mathematical and quantitative analysis. Various applications use computer vision techniques to improve their productivity, such as visual surveillance, robotics, autonomous vehicles, and especially medical image processing. The techniques became prominent when Geoffrey Hinton and Yann LeCun, both known as "Godfathers of deep learning", used neural networks and back-propagation for character and handwriting prediction, giving the best results compared to previous works.
In this thesis, we focus on detecting breast cancer with high accuracy in order to decrease the examination cost within an acceptable time. Therefore, we choose deep learning and evaluate our approach on three datasets: BreaKHis, BACH, and IDC. Due to some limitations of deep learning and of the dataset sizes, we propose a composition of popular techniques to boost classification efficiency: transfer learning, Generative Adversarial Networks (GAN), and neural networks. VGG16 and VGG19 are the base models used to extract a high-level feature space from patch-cropped images, termed multi deep features, before training by neural nets. So far, no works have leveraged the power of GAN to generate synthetic BreaKHis images; in this thesis, we use the Pix2Pix and StyleGAN models as generators. With the proposed approach, the cancer detection results achieve better performance than some existing works, with 98% accuracy for BreaKHis, 96% for BACH, and 86% for IDC.
Notations
Number of blocks of layers stacked together
The hypothesis space in traditional machine learning
Loss function for each hypothesis
Activation function in deep learning
Mapping function in deep learning
Input feature
Feature's weight
Output feature
Loss function in the GAN model
Discriminator model
Generator model
Noise input
Mean
Variance
Abbreviations

GLOBOCAN: Global Cancer Incidence, Mortality and Prevalence
CBE: Clinical Breast Exam
CLBP: Completed Local Binary Pattern
LPQ: Local Phase Quantization
GLCM: Gray Level Co-Occurrence Matrices
PFTAS: Free Threshold Adjacency Statistics
ORB: Oriented FAST and Rotated BRIEF
k-NN: k-Nearest Neighbor
SVM: Support Vector Machines
RF: Random Forest
QDA: Quadratic Discriminant Analysis
GPU: Graphic Processing Unit
CNN: Convolutional Neural Network
CONV: Convolutional layer
FC: Fully connected layer
MAE: Manifold Preserving Autoencoder
DT: Decision Tree
LR: Logistic Regression
GAN: Generative Adversarial Network
MRI: Magnetic Resonance Image
SIFT: Scale Invariant Feature Transform
SURF: Speeded Up Robust Features
SGD: Stochastic Gradient Descent
Chapter 1
Literature review of breast cancer
histopathological image classification
Cancer is a public health problem in the world today. Among cancers, breast cancer is the most common invasive cancer in women, with a significant impact on 2.1 million people yearly. In 2018, the World Health Organization (WHO) estimated 627,000 deaths from breast cancer, about 15% of cancer deaths. According to the 2018 report of Global Cancer Incidence, Mortality and Prevalence (GLOBOCAN) [1], covering new cases and deaths for 36 cancer types from 185 countries across 5 continents, shown in Table 1.1, new breast cancer cases account for 11.6% of all sites, and breast cancer is the second leading cause of cancer death.
[...] with low per capita income of about $3,200/year and $20/year for voluntary medical expenses, the breast cancer incidence was 23/100,000 and showed a rising trend [2].
Early cancer detection offers many chances for treatment and increases the survival rate for patients. WHO notes effective diagnostic methods such as X-ray and the Clinical Breast Exam (CBE), but these require professional physicians or experts. Besides, the diagnostic result is not always 100% accurate, owing to factors such as subjective experiments, expertise, and emotional state.
In recent years, trends in image processing and machine learning have shown that physicians can employ this technology to make diagnoses from medical images. Medical image processing methods have been applied widely to cancer diagnosis [3] and other diseases [4] with high accuracy in a short time. Image diagnosis by machine learning is a cost-efficient method in regions of Vietnam where no professional medical teams are available.
For the most part, research has demonstrated improvements in breast cancer classification accuracy [5, 6, 7, 8, 9], but it has not reached a significantly high rate. A main reason is the limited training datasets, as collecting and annotating sufficient data by pathological experts is time-consuming and expensive.
Nowadays there are open-access breast cancer databases for research, such as BreaKHis, the ICIAR 2018 BACH Challenge, the Kaggle breast histopathology images, and the Tumor Proliferation Assessment Challenge 2016. But almost all works experimented on BreaKHis [8], built in collaboration with the P&D Laboratory - Pathological Anatomy and Cytopathology, Parana, Brazil, which means those results may not reach the same accuracy on a new dataset.
Deep learning is a branch of machine learning that represents data characteristics by layers, from simple symbols such as points and lines to complex, abstract structures such as polygons. In 1986, Rina Dechter first introduced the term deep learning to the machine learning community. In the 1970s, the multilayer perceptron algorithm simulated the capacity of the human brain to recognize and discriminate objects, with many applications in computer vision. In the late 1980s, Yann LeCun then achieved good results in handwritten digit classification using back-propagation in deep learning. Nowadays, deep learning has been developing quickly and widely, with applications in many fields.
Indeed, although BreaKHis is a breast cancer benchmark database, it is not as large as ImageNet, which was built in collaboration with Stanford University, Princeton University, Google, and A9 Research. ImageNet includes 14,197,122 images in roughly 20,000 categories and is used extensively in deep learning.

Recent machine learning research has achieved high-accuracy breast cancer classification with various supervised and unsupervised learning algorithms. For the literature review from 2016 to May 2019, this thesis studies three main techniques: handcrafted and/or deep features, transfer learning, and generative adversarial networks.
Handcrafted features or deep features: Spanhol et al. [10] and Badejo et al. [11] compared handcrafted feature extractors such as Local Binary Patterns (LBP), the LBP variant Completed Local Binary Pattern (CLBP), Local Phase Quantization (LPQ), Gray Level Co-Occurrence Matrices (GLCM), Free Threshold Adjacency Statistics (PFTAS), and Oriented FAST and Rotated BRIEF (ORB), with classifiers such as 1-Nearest Neighbor (1-NN), Support Vector Machines (SVM), and Random Forest (RF). To improve accuracy to a range of 98.5%-100%, Spanhol combined boosting over 1-NN, QDA, RF, and SVM, but the best results hold only for the 40x and 400x magnifications; the authors concluded that the PFTAS feature is suitable for medical images. With the development of Graphic Processing Units (GPU) for big-data processing, Spanhol et al. [5] proposed a deep learning algorithm, a convolutional neural network (CNN) of the form CONV-MaxPool-CONV-AveragePool-FC-FC, with 32x32 and 64x64 window patch sizes [...] explored joint color-texture information (RGB and HSV color spaces), with and without stain normalization, and various contemporary classifiers used in Spanhol's work [...] popular in general computer vision, but this descriptor has high dimensionality. Two years later, the authors presented different encoding methods to obtain more discriminative features, such as an intra-embedding algorithm and a Fisher Vector descriptor of size 2 x 512 x N extracted from VGG19 and a GMM model with N Gaussian components. Qi et al. [15] proposed entropy-based and confidence-boosting strategies as a deep active learning method for classification with small training datasets, which reduces annotation costs by up to 66.67% while keeping accuracy between 88.29% and 91.61%. Mukkamala et al. [16] built a deep learning technique based on principal component analysis for each channel of the LAB color space, with SVM, reaching accuracy from 85.85% to 96.12%. Kumar et al. [17] built a CNN model of the form 3CONV[5x5]-3CONV[3x3]-ReLU-Pool-FC to extract deep features from medical images. Gupta et al. [18] found that histopathological stain normalization before handcrafted feature extraction makes cancer classification more efficient than using grayscale images. Feng et al. [19] exploited unsupervised learning capacity using an autoencoder network, the manifold preserving autoencoder (MAE), to learn encoded features from the input and then decode the hidden representations to the output; with this new algorithm, Feng et al. achieved accuracy from 82% to 99.16%. Reza et al. [20] experimented with sampling techniques such as under-sampling, over-sampling, ADASYN, and SMOTE with a CNN
network and found that unbalanced data affects accuracy; deploying the over-sampling method on the unbalanced BreaKHis dataset gives better performance. Angara et al. [21] and Guillén-Rondon et al. [22] proposed CNN networks of the form 3-[Conv-ReLU-Pool]-2FC-Softmax. Alom et al. [6] combined the strengths of Inception, ResNet, and Recurrent Convolutional Neural Networks, reaching 95% and 97% classification accuracy with and without augmentation across the 4 magnification factors. The two core ideas of Zhang et al. [23] are to use the skip connections of ResNet to solve the optimization issues when the network becomes deeper, and CBAM to refine the ResNet features; this method gained its highest accuracy of 92.6% at 200x and its lowest of 88.9% at 400x. Sudharshan et al. [5] compared various Multiple Instance Learning (MIL) approaches and concluded that non-parametric MIL, which extracts the MIL feature space using a Parzen window technique and a k-NN classifier, achieves higher accuracy than MILCNN and Single Instance Learning; at 40x magnification its accuracy is 92.1%. Roy et al. [24] proposed a patch-based classifier using a CNN network consisting of 6CONV-5POOL-3FC; the authors experimented with this model on 512x512 patches, which contained more information at an efficient size, and gained 92% accuracy on ICIAR 2018. Alirezazadeh et al. [25] learned feature spaces from two different domains, in this case benign and malignant, using LBP, LPQ, and PFTAS, and then formed a projection matrix; this method gave better performance than using each separate LBP, PFTAS, LPQ, or CNN feature with a classifier, as in Spanhol's work. Fondón et al. [26] extracted 3 feature types: nuclei-based features obtained by transforming to the CMYK color space with K-means clustering; region-based vectors of pink/violet, pink/white, and white/violet; and texture features consisting of a first-order statistics vector, LBP, and a sparse texture descriptor. Fondón used 9 classifiers to detect cancer tumors on the dataset of the Bioimaging 2015 Grand Challenge.
Transfer learning technique: Weiss et al. [27] evaluated different feature extractors based on VGG, ResNet, and Xception when training on a limited number of samples and achieved state-of-the-art results on the BACH dataset; this method downsized the BACH images to 1024x768 in order to train the classification model. Vo et al. [7] applied augmentation methods such as rotating, cutting, and transforming images to increase the training data volume before extracting deep features from an Inception-ResNet-v2 model in order to avoid over-fitting. They trained the model with multi-scale input images of 600x600, 450x450, and 300x300 to extract local and global features; a Gradient Boosting Trees model was then trained to detect breast cancer, with a fusion model voting for the higher-accuracy classifier. The accuracy reached 93.8%-96.9% at low computation cost. Murtaza et al. [28] used AlexNet as a feature extractor in a hierarchical classification model combining 6 algorithms (kNN, SVM, NB, DT, LDA, and LR), and finally feature
reduction to increase the overall accuracy from 92.45% to 95.48%. Li et al. [29] deployed the transfer-learning Xception network to avoid model over-fitting; Li applied the ResNet technique to transfer prior knowledge to later layers in order to achieve accurate and precise classification. Cascianelli et al. [30] proposed new dimension reductions (Principal Component Analysis, Gaussian Random Projection, Correlation-based Feature Selection) applied after the pre-trained VGG-F, VGG-S, and VGG-VeryDeep networks on a limited dataset such as BreaKHis, to overcome over-fitting issues, with accuracy from 84% to 94.7%. Brancati et al. [31] chose a fine-tuning ResNet strategy with 3 different configurations of 34, 50, and 101 layers, then voted for the classification with the highest class probability among these configurations; this work achieves 97.2% accuracy for benign and malignant tumors on the BACH dataset. Awan et al. [32] used ResNet-50 to extract descriptors from overlapping patch-based images and then applied PCA dimension reduction. Shallu et al. [33] showed that transfer learning from VGG16, VGG19, or ResNet50 is better than fully-from-scratch training, because these networks provide discriminative features, with VGG16 being the better feature generator. Gandomkar et al. [34] used ResNet-152 to extract features from five overlapping patches in a stain-normalized image; the technique is applied at each magnification rate (40x, 100x, 200x, 400x) to detect malignant/benign tumors and cancer subtypes, with 97.66%-98.52% and 94.60%-95.40% accuracy respectively.
Generative Adversarial Network (GAN) technique: Shin et al. [35] used the image-to-image conditional GAN model (pix2pix) to generate synthetic data and discriminate the T1 brain tumor class on the ADNI dataset; the authors then applied this model to another dataset, BRATS, to classify T1. This GAN yielded 10% higher accuracy compared to training on the real image dataset. Iqbal et al. [36] proposed a new Generative Adversarial Network for Medical Imaging (MI-GAN) to generate synthetic retinal vessel images from the STARE and DRIVE datasets; this method generated precisely segmented images better than existing techniques, and the authors state that the synthetic images preserve the content and structure of the original images. Senaras et al. [37] employed a conditional Generative Adversarial Network (cGAN) to generate synthetic histopathological breast cancer images; the G model used a modified version of U-Net, and the D model used a CNN-based PatchGAN classifier. The authors' experiments showed that the synthetic images are indistinguishable from real ones: six readers (three pathologists and image analysts) tried to differentiate 15 real from 15 synthetic images, and the probability that the average reader would correctly classify an image as synthetic or real more than 50% of the time was only 44.7%. Mahapatra et al. [38] proposed the P-GANs network to generate a high-resolution image at defined scaling factors from a low-resolution image; this research suggests a multi-stage
network with a correction mechanism based on a triple loss function. The output from the previous stage serves as the baseline to improve the next stage's output; this technique helps recover the degradation of image quality at each stage. The final super-resolution image achieved accuracy close to the original magnetic resonance image (MRI) in landmark and pathology detection. Cai et al. [39] studied a cross-modal volume-to-volume translation technique, transferring from pancreas classification to the breast lesion segmentation domain across two different medical image types. Frid-Adar et al. [40] followed the DC-GAN and AC-GAN networks to synthesize high-quality liver lesion ROIs and then used a CNN of the form CONV-SUBSAMPLING-CONV-SUBSAMPLING-CONV-SUBSAMPLING-FC-DROPOUT to classify liver lesions. Wu et al. [41] proposed a conditional infilling GAN (ciGAN) to generate fully contextual in-filled images of breast lesions; this work observed that a ResNet-50 classifier trained with the GAN-augmented dataset produced a higher AUROC than traditional augmentation with the same classifier.
Both handcrafted and deep features demonstrate good cancer detection capability. Various studies combine numerous color features and local texture descriptors to improve performance [42, 43]. Modak et al. [43] performed a comparative analysis of several multi-biometric fusions at different levels: feature level (mostly feature concatenation), score level, and rules/algorithms level. The authors showed statistically that the fusion approach has many advantages over a single mode, such as accuracy improvement, reduction of noisy data and spoof attacks, and more convenience. The authors of [42] exploited powerful transfer-learning techniques from popular models such as AlexNet, VGGNet-16, VGGNet-19, GoogLeNet, and ResNet to design a fusion schema at the feature level for satellite image classification. They report that fusing many ConvNet layers is better than features extracted from a single layer. Features extracted from a CNN are less affected by varying conditions such as the edge of view or color space; they are invariant features with better generalization. However, data augmentation methods might hurt accuracy if applied inadequately. To avoid the high computation cost of training from scratch, transfer learning can be employed in the medical field; it requires retraining or fine-tuning some layers so that these networks can detect cancer features. Furthermore, GAN is an effective data augmentation method in computer vision, but GAN training is still a difficult problem; these methods have been investigated intensively for common data and rarely for medical data. To overcome these limitations, we propose a composition of three techniques to boost breast cancer classification accuracy with limited training data.
1.2 Goals of the thesis

The objectives of this study are:
* This thesis will use Generative Adversarial Networks (GAN) to build synthetic breast cancer images. Goodfellow et al. [44] [45] proposed a new generative model trained by an adversarial process. GAN includes 2 separate models, a generative model G and a discriminative model D, trained concurrently: G learns the distribution of the training dataset while D tries to discriminate true images from fake images generated by G. D estimates the conditional probability p(y|x); G tries to optimize the conditional probability p(x|y) in order to fool D. We can understand that D and G play a two-player minimax game with the value function in equation 1.1:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \quad (1.1)$$

The discriminator loss $\theta(d)$ maximizes D(x) to reach 100% probability on true images while pushing D(G(z)) toward 0% on fake images; conversely, the generator loss $\theta(g)$ is minimized so that D(G(z)) reaches 100% on fake images.
Figure 1.1: GAN network. Noise z feeds the generator G; the generated image from G feeds the discriminator D. D drives D(G(z)) toward 0 while G drives D(G(z)) toward 1.
In Figure 1.1, noise z, drawn from a Gaussian or uniform distribution, is the input for training the G model; conceptually, z is the latent feature from which the image is generated. The output from G is used as input to train the D model to discriminate real from fake images. Mini-batch stochastic gradient descent (SGD) trains the GAN model to optimize $\theta(d)$ and $\theta(g)$; to speed up the training process, GAN can use the Adam algorithm as well.
* In previous years, much research [46] [47] addressed the efficiency of handcrafted features such as the Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF), and of deep networks and/or deep features, such as features extracted from VGG16 and VGG19 (developed by a research group at Oxford University) or ResNet (developed by a research group at Microsoft). The thesis applies basic deep network models to extract breast cancer features, instead of handcrafted features, which cannot capture the complex cancer characteristics in medical images.
Figure 1.2: Illustration of the BreaKHis database at different magnification factors: benign cells at 40x (a), 100x (b), 200x (c), 400x (d), and malignant cells at 40x (e), 100x (f), 200x (g), 400x (h).
* This thesis proposes a new algorithm to classify breast cancer images in three databases (BreaKHis, the Breast Cancer Classification Challenge 2018, and Kaggle) in order to improve classification performance.
WHO notes that many image types are used in cancer diagnosis: X-ray images find abnormal regions but cannot determine whether a region is cancerous; biopsy images can determine whether a region is cancerous but cannot identify the cancer subtype, shape, or other characteristics such as the distribution or balance of cells. From histopathological images, experts can classify the cancer region and its levels. This work proposes a method to detect cancer from histopathological images in three databases.
1. BreaKHis: BreaKHis is the benchmark database for studying the breast cancer classification problem. It contains 7,909 images from 82 patients at 4 magnifications (40x, 100x, 200x, 400x). The dataset is divided into 2 main groups, benign and malignant tumors, with 8 cancer subtypes, and its total size is 4 GB. It was built in collaboration with the P&D Laboratory - Pathological Anatomy and Cytopathology, Parana, Brazil.
Table 1.2: Image distribution per magnification, class, and subclass in BreaKHis.

2. Breast Cancer Classification Challenge 2018: BACH 2018 was built in collaboration with the Universidade do Porto, the Instituto de Engenharia de Sistemas e Computadores, Tecnologia e Ciência (INESC TEC), and the Instituto de Investigação e Inovação em Saúde (i3S), Portugal. The dataset consists of 400 images divided into four groups, with a total size of 13.2 GB. Each image is also classified into one of two main groups, non-carcinoma or carcinoma, by grouping the normal and benign classes into non-carcinoma and the in situ and invasive classes into carcinoma.
Figure 1.3: Illustration of the BACH database for tumor types: (a) Normal, (b) Benign, (c) In-situ, (d) Invasive cells.
3. Kaggle (IDC): [...] Breast Cancer (BCa) specimens scanned at 40x at the Hospital of the University of Pennsylvania and The Cancer Institute of New Jersey. From these, 277,524 patches of size 50x50 were extracted (198,738 Invasive Ductal Carcinoma (IDC) negative and 78,786 IDC positive).

Table 1.4: Image distribution per magnification and class in the BCa (Kaggle) database.
1.3 Contribution of the thesis
The study proposes a composition of three techniques (transfer learning, deep learning, and GAN) to boost breast cancer classification accuracy on a limited training dataset.
1.4 Structure of the thesis
The thesis consists of 5 main chapters: chapter 1 is to literature review of breast cancer
histopathological image, chapter 2 is to foundational theory about deep neuron and
generative adversarial network, chapter 3 is to propose the combination method of
three techniques, chapter 4 is to setup experiment Finally, the achievement, drawback
and future works is on chapter 5
1.5 Methodology

The whole slide image is divided into patch images. The patch images are then normalized to the [0,1] scale and resized to 256x256 pixels. The VGG16 & VGG19 base networks are used as feature extraction techniques to extract the discriminative characteristics of benign or malignant tumors. Our classification model is a CNN network of 7 layers.
Chapter 2

Foundational theory

2.1 Deep neural network

With the innovation of high-performance computing systems such as GPUs or grids of massive clusters, forward and backward propagation applied in neural networks proved that this technique improves the classification error rate over machine learning approaches such as SVM, Random Forest, Bayesian networks, etc. These networks compose many layers into a deep neural network architecture to learn features from low to high level via a stack of layers. Nowadays deep learning is a remarkable technique, widely considered for application in many fields such as computer vision, natural language processing, and video.
2.1.1 Introduction to deep neural networks
In the machine learning approach, we have to collect a dataset, analyze it, and understand what the data is and how it is distributed. Feature extraction and selection, such as feature ranking and dimension reduction, are applied to shape the dataset before building a model. We pose various questions to select the hypothesis space $\Phi(x)$ and the corresponding loss function $L(\Phi(x))$ that generalizes our data best. During the training process, minimizing the loss function is very important so that the predicted result reaches the target value. But a neural network is driven in a definitely different way: instead of choosing the best hypothesis, the method learns to find it. Figure 2.1 shows the differences between the learning approaches.
Figure 2.1: Summary of learning approaches by Goodfellow, Bengio, and Courville, comparing rule-based systems, machine learning, representation learning, and deep learning; green boxes are the components learned from data.

Conceptually, neural nets are inspired by how the human brain works. Figure 2.2 describes the physical brain structure, which has mainly three components: the dendritic tree, the axon hillock, and the axon. The dendritic tree collects input information from other neurons via their axons; an axon contacts the dendritic trees of other neurons
at synapses; the axon hillock receives the output from the dendritic tree and generates an outgoing spike of activity at the synapses into the post-synaptic neuron. To summarize, each neuron receives input from other neurons, and the flow of information on each input line is controlled by a synaptic weight; this connection weight can be adjusted efficiently at the receiving side during the cognition process. The main principle of a neuron is simulated in computer science as in Figure 2.3, defined as $y = \sigma(x_1 w_1 + x_2 w_2 + x_3 w_3 + b)$. Input signals from other neurons are transferred, and their weights $w$ can be adjusted accordingly; the final output information is summarized at the output node. The activation function $\sigma$ enables the neuron to do complicated computation: mathematically, the activation function turns the affine transformation from linear into non-linear. A neural network comprises thousands of these simple nodes, or neurons, computing for its tasks.
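As an illustration of the neuron computation above, here is a minimal sketch in Python with NumPy; the input, weight, and bias values are arbitrary examples, not values from the thesis:

```python
import numpy as np

def sigmoid(z):
    # Activation function: squashes the weighted sum into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Example inputs from three upstream neurons and their synaptic weights
x = np.array([0.5, -1.2, 0.3])   # inputs x1, x2, x3
w = np.array([0.4, 0.7, -0.2])   # weights w1, w2, w3
b = 0.1                          # bias

# y = sigma(x1*w1 + x2*w2 + x3*w3 + b)
y = sigmoid(np.dot(x, w) + b)
print(y)
```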
So deep learning is an algorithm with many layers processing together to resolve particular tasks such as classification, object detection, etc. Each layer consists of many neurons (nodes or units), as demonstrated by the network in Figure 2.3b. The deep learning technique generates its mapping functions by studying the relations among features; it is not a definitely fixed function as in traditional machine learning. The function f maps an input x to an intermediate output y, defined as $y = f(x; w)$, and the parameter values $w$ are then learned to get the best approximating function $f$. The model in Figure 2.4 is also called feed-forward and is extremely important in deep learning networks. A feed-forward deep network composes many different functions together in a chain structure to learn abstract features, defined as $u = g(h(f(x)))$.

Figure 2.3: Simulation of (a) a neuron and (b) a deep learning network in computer science.
Figure 2.4: Feed-forward neural network (input x, mapping function, output y).
Convolutional network is a common term in neural network architecture. A convolutional network consists of three typical stages: the first stage is a combination of convolutional layers performing an affine transform on the input layer; the next stage runs a nonlinear activation to detect complex objects; and the final stage is a pooling layer. These stages are described below (a minimal sketch in Keras follows this list).
* Convolutional layer: in its general form, convolution is an operation on two real-valued functions x and w that measures a weighted average, denoted as $s(t) = (x * w)(t)$. In neural network terminology, it is a matrix multiplication between the input of the processed image and a weight (kernel), producing an output feature map (Figure 2.5). Convolution improves learning through sparse interaction and parameter sharing. With sparse connections, one input unit can affect only some output units and vice versa, because the kernel is rather small compared to the input image, so some connections are zeroed out; in the case of full connection, the layer is called a dense or fully connected layer. It has been shown that when processing large images with millions of pixels, small meaningful characteristics such as edges and important points can be detected with a small number of parameters and efficient computation. Second, with the parameter-sharing idea, a single parameter can be used for many inputs. Composing both ideas, convolution can greatly improve object detection.
* Pooling layer: a pooling layer adjusts a unit's value by a statistical summary of its neighboring units, such as the maximum or average. Its purpose is to reduce small variances across the neighborhood, which makes the pooling operator a good candidate for defining features regardless of their particular position or of variable input size. A pooling layer is normally placed after a convolutional layer to produce invariance to translations and transformations such as rotation. On the other hand, the pooling layer can be used as a downsampling technique that reduces the representation size, and thus the computation, of the next layer. Currently TensorFlow supports many pooling types: max pooling 1D, 2D, 3D and average pooling 1D, 2D, 3D.
* Activation layer: several activation functions are popular in neural networks. The softmax function uses the exponential function to normalize an input vector into a probability distribution over K components, denoted as

$$\mathrm{softmax}(y_i) = \frac{e^{y_i}}{\sum_{j=1}^{K} e^{y_j}}$$

The softmax function is used in the final output to classify among multiple classes. Meanwhile, the sigmoid function scales a real number into the continuous range of values between 0 and 1. The rectified linear activation function (ReLU), denoted as

$$\mathrm{ReLU}(x) = \max(0, x)$$

zeroes out all negative numbers and keeps positive numbers. ReLU trains faster because of its simple math; furthermore, it does not suffer from the vanishing gradient problem like sigmoid or softmax, and it converges quickly.
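To make the three stages concrete, here is a minimal Keras sketch; the filter counts, input shape, and output size are illustrative assumptions, not the thesis architecture:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    # Stage 1: affine transform via convolution (32 filters of size 3x3)
    layers.Conv2D(32, (3, 3), padding="same", input_shape=(256, 256, 3)),
    # Stage 2: nonlinear detector stage (ReLU)
    layers.Activation("relu"),
    # Stage 3: pooling summarizes each 2x2 neighborhood by its maximum
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    # Softmax normalizes the outputs into a probability distribution
    layers.Dense(2, activation="softmax"),
])
model.summary()
```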
2.1.2 Techniques of neural network training
* Activation function selection in each layer has to be considered carefully because it impacts computational efficiency and how quickly training converges to a local/global minimum. In practice, ReLU is the advised choice for neural networks. Over the years, researchers have studied activation functions that improve on ReLU's limitations, such as

$$\mathrm{LeakyReLU}(x) = \max(0.01x, x) \quad (2.4)$$

which can return a small negative number for negative inputs, and the parametric ReLU, $\mathrm{PReLU}(x) = \max(\alpha x, x)$ with a learnable slope $\alpha$, which is a composition of ReLU and LeakyReLU.
* Batch normalization: batch norm is a kind of regularization technique that normalizes each output dimension by its expectation and variance, defined as:

$$\hat{x}^{(k)} = \frac{x^{(k)} - \mathbb{E}[x^{(k)}]}{\sqrt{\mathrm{Var}[x^{(k)}]}}$$

It is proven to speed up convergence. Ioffe and Szegedy suggested using batch norm after a dense or convolutional layer and before the nonlinearity function. Batch norm brings several benefits to neural nets: it allows a higher learning rate to quickly reach a local/global minimum, and it reduces the dependence on weight initialization from a standard distribution or on dropout. Recently, additional normalization techniques have appeared: layer, instance, and group norm.
* Dropout: dropout's principle is to turn some neurons off randomly by multiplying their output values by zero so that they take no part in forward propagation. The dropout technique is rather similar to the bagging approach in that it composes various different networks with shared parameters, instead of training many large architectures at high computational cost. The probability hyper-parameter p selects over a wide range of networks; the higher the probability, the less dropout there is. This technique makes the network less prone to over-fitting. According to the authors' experiments, applying 20% dropout to the input units and 50% to the hidden layers is the optimal selection.
* Data augmentation: deep learning always needs a huge training set to reduce over-fitting, but in practice collecting large volumes is a heavy task, as with medical images, so data augmentation is a considered option. Traditionally, horizontal or vertical flips and rotations, randomly cropped images, scaling, resizing, changing color spaces, or combinations of all of these are common techniques; recently, GAN has become a nominated candidate for data augmentation, learning the distribution of the input data and then generating fake output whose features approximate the input. In our work, we combine both types.
* Transfer learning: the volume and domain of the dataset decide how many layers are frozen or retrained. Most vision models are trained on the ImageNet dataset, so the extracted features are generic, and sometimes the model needs retraining for a particular problem. In principle, the bottom layers extract generic features, while layers toward the top of the model extract more specialized features. As a statistical summary: with a small dataset in a similar domain, we can freeze many bottom layers and train a few top layers; when the dataset becomes bigger or the domain differs, more layers have to be retrained.
* Optimization: to reach a global minimum and avoid local minima or saddle points when training a neural network, there are many optimization approaches, such as stochastic gradient descent (SGD), momentum, Nesterov momentum, AdaGrad, RMSProp, and Adam. These techniques are used widely in machine learning, but Adam is the one mostly used in training neural networks. Adam is a composition of momentum with AdaGrad or RMSProp. Momentum builds up velocity to accelerate SGD and step over local minima or saddle points. Selecting an efficient learning rate is not an easy task, as either a large or a small value can make training take a long time to achieve the best loss; AdaGrad therefore calculates an adaptive learning rate by summing the historical squared gradients in each dimension. Kingma and Ba advise the Adam hyper-parameters beta1 = 0.9, beta2 = 0.999, and a learning rate of about $10^{-3}$ to start training a model. The Adam optimizer is used in our experiments.
* Early stopping: when, after many training iterations, the validation accuracy gradually decreases or stops changing, the training process has to stop at that point; this is called early stopping. While training a model through many loops, we often find that the training loss keeps going down and the training accuracy goes up, but the validation accuracy suddenly drops at some iteration; we should stop the process there and store the model weights. The approach is readily available in both TensorFlow and Keras. (A combined sketch of several of these techniques follows this list.)
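A minimal Keras sketch combining several of the techniques above: transfer learning from a frozen VGG16 base, batch normalization before the nonlinearity, dropout, the Adam optimizer, and early stopping. The layer sizes, dropout rates, and callback settings are illustrative assumptions, not the thesis configuration:

```python
from tensorflow.keras import layers, models, optimizers, callbacks
from tensorflow.keras.applications import VGG16

# Transfer learning: reuse generic bottom-layer features learned on ImageNet
base = VGG16(weights="imagenet", include_top=False, input_shape=(256, 256, 3))
for layer in base.layers:
    layer.trainable = False  # freeze the generic feature extractor

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(512),
    layers.BatchNormalization(),   # batch norm before the nonlinearity
    layers.Activation("relu"),
    layers.Dropout(0.5),           # 50% dropout in the hidden layer
    layers.Dense(1, activation="sigmoid"),
])

# Adam with the hyper-parameters advised by Kingma and Ba
model.compile(
    optimizer=optimizers.Adam(learning_rate=1e-3, beta_1=0.9, beta_2=0.999),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Early stopping: halt training when validation accuracy stops improving
stopper = callbacks.EarlyStopping(monitor="val_accuracy", patience=5,
                                  restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, batch_size=32, callbacks=[stopper])
```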
In summary, the deep learning training process breaks into these basic steps:

* Pre-processing data: the first step that must be done in computer vision or machine learning. Data is normalized to zero mean and unit variance over the whole image or per channel; a normalized dataset helps the model train quickly with high accuracy.

* Selecting an architecture: we can design a simple small network, such as two layers with few neurons, or, depending on the specific problem, choose popular pre-trained models such as the R-CNN family for object detection or AlexNet, VGG, and ResNet for classification.

* Training the model: we train with default hyper-parameters such as learning rate, batch size, learning rate decay, or small regularization, then check whether the loss is decreasing and the validation accuracy is good. If the loss barely goes down or explodes, we adjust the learning rate up or down accordingly.

* Optimizing hyper-parameters: to get better hyper-parameters, we use grid search. After some tries, we adjust the ranges of the learning rate or regularization rate and train a model with each set of parameters; each set yields a validation accuracy, and we select the one giving the best accuracy. In high-level libraries such as Keras or PyTorch, this optimization process is ready for use.
[...] By the combination principle, some popular blocks are [(CONV-ReLU)xl - POOL/NORM] (l <= 5), [(FC-ReLU)xl] (l <= 2), [ResNet x l], [Inception x l], and [CONV-BATCHNORM-RELU]; alternatively, one can apply a fusion approach at the level of feature extraction, evaluation metrics, or algorithms.
2.1.3 Popular deep network models
Training a neural network needs a lot of data, but most medical datasets are too small for deep learning techniques; transfer learning is a considerable approach to resolve this matter. So far there are several popular networks, such as VGG16 [48], Inception [49], ResNet [50], Inception-ResNet [51], and DenseNet [52], which are rather efficient in medical classification in general and particularly in cancer detection.
* VGG network: VGG was the first deep architecture after the success of AlexNet. The VGG team stacked many convolutional and fully connected layers together and achieved better performance by utilizing the smallest receptive filter, the 3x3 convolutional filter. They proved that a deeper network increases classification accuracy on the large ImageNet dataset. Table 2.1 summarizes the architectures of the 16-layer and 19-layer VGG networks.

Table 2.1: VGG16 & VGG19 network architectures

Net   | Input       | Block 1                | Block 2                 | Block 3                 | Block 4                 | Block 5                 | Layers
VGG16 | Image input | 2 x conv3-64, maxpool  | 2 x conv3-128, maxpool  | 3 x conv3-256, maxpool  | 3 x conv3-512, maxpool  | 3 x conv3-512, maxpool  | FC-4096, FC-4096, FC-1000, soft-max
VGG19 | Image input | 2 x conv3-64, maxpool  | 2 x conv3-128, maxpool  | 4 x conv3-256, maxpool  | 4 x conv3-512, maxpool  | 4 x conv3-512, maxpool  | FC-4096, FC-4096, FC-1000, soft-max
The ReLU activation function is used throughout the VGG nets. The technique of 3x3 filters with stride 1 pixel is better than 7x7 or 5x5 filters with stride 2 pixels on two counts, discrimination capability and the number of weighted parameters; this is an important contribution of the work. The 3x3 convolutional filters learn local features, and after many stacked layers combining the localized low-level space, the nets synthesize higher feature spaces without losing characteristics. The incorporation of 1x1 convolution layers is another approach that increases the discrimination function while keeping the receptive fields of the layer. In recent years, the VGG16 and VGG19 nets have been used in transfer learning techniques because of their shared low-level feature extraction and medium-sized architecture. The two top fully connected layers of 4096 units provide good discriminative deep features that can be used, in combination with or independently of handcrafted features, in a classification network.
* Inception network: Inception nets concentrate on efficient deep neural nets. The authors used 1x1 convolutional operators to increase depth and reduce high-dimensional spaces. The Inception module's idea is to concatenate many optimal local structures with high correlation analyzed from the previous layer, as shown in Figure 2.6. The authors combined convolutional operators of various sizes, such as 1x1, 3x3, and 5x5; these act as a kind of multi-scale representation in a pyramid scheme. The reduction design of the Inception module allows increasing the number of nodes at each layer without affecting the computation of the next layer. In total, the Inception network has 22 layers with trained parameters.
* ResNet network: when a neural network becomes deeper, the accuracy begins to saturate and, beyond that, it faces the degradation problem. The ResNet authors proposed stacking additional identity mappings, as shown in Figure 2.7. They note that originally $H(x)$ is the predicted mapping function, which learns a mapping from input to output. Alternatively, we define another mapping $F(x) = H(x) - x$, so that $H(x) = F(x) + x$; the residual function is now easier to optimize with reference to the layer input. This formula is also a type of shortcut connection, borrowed from the long short-term memory (LSTM) network. The residual block brings a flow of memory from the input layer to the output layer (a minimal sketch follows).
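As a concrete illustration of the shortcut $H(x) = F(x) + x$, here is a minimal residual block sketch in Keras; the two-convolution form and the filter count are illustrative assumptions, not the exact ResNet block:

```python
from tensorflow.keras import layers

def residual_block(x, filters=64):
    # F(x): stacked convolutions that learn the residual mapping
    f = layers.Conv2D(filters, (3, 3), padding="same")(x)
    f = layers.BatchNormalization()(f)
    f = layers.Activation("relu")(f)
    f = layers.Conv2D(filters, (3, 3), padding="same")(f)
    f = layers.BatchNormalization()(f)
    # H(x) = F(x) + x: the identity shortcut carries the input through
    # (assumes x already has `filters` channels so the shapes match)
    out = layers.Add()([f, x])
    return layers.Activation("relu")(out)
```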
In our experiments, although Inception and ResNet achieved better results than VGG on ImageNet classification, on the BreaKHis medical images VGG transfer learning gives more discriminative features (a feature-extraction sketch follows).
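As an illustration of using the top 4096-unit fully connected layers as deep feature extractors, here is a minimal sketch with the Keras VGG16 implementation; the layer name "fc2" follows the Keras model, and the random placeholder image is an assumption of this sketch:

```python
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.models import Model

# Full VGG16 with its classifier head, trained on ImageNet
vgg = VGG16(weights="imagenet", include_top=True)
# Cut the network at the second 4096-unit fully connected layer
extractor = Model(inputs=vgg.input, outputs=vgg.get_layer("fc2").output)

image = np.random.rand(1, 224, 224, 3) * 255.0  # placeholder for a real image
features = extractor.predict(preprocess_input(image))
print(features.shape)  # (1, 4096): a deep feature vector
```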
2.2 Generative Adversarial Networks (GAN)
2.2.1 Introduction to GAN

Basically, GAN composes two networks (see Figure 2.8), codenamed the generator network G(x) and the discriminator network D(G(x)). G generates fake images
Figure 2.6: (a) naive version of the Inception module; (b) reduction version of the Inception module.
by studying the input data distribution, while D discriminates real images from the training dataset against fakes from the G model. In a GAN, G solves a harder task than D, which recognizes correlations or distributions between nearly similar objects and categorizes them into the correct feature space. From the initial stage, the input dataset consists of random noise z and real data x used to train the G network. Recently, GAN has attracted much focus from the research community, and GAN variants now generate increasingly realistic images of faces, animals, natural scenes, etc. The dominant techniques among GANs are the conditional GAN and style-transfer GAN.
As shown in Figure 2.8, both the generator and the discriminator are neural networks trained simultaneously. The discriminator loss function is optimized by back-propagation to adjust the discriminator's weights. Training the generator is rather more complex: it incorporates D's feedback on the output classification and is penalized if a fake image is classified as non-real. Throughout GAN training, either G or D is frozen while the other trains, each optimizing its loss in a two-player game, denoted by the algorithm below (a TensorFlow sketch of one training step follows it):
* Loop through the training iterations:
  - Loop through the batches:
    + Sample m noise samples {z^(1), ..., z^(m)} from the noise distribution p_g(z)
    + Sample m examples {x^(1), ..., x^(m)} from the data distribution p_data(x)
    + Update the discriminator by ascending its stochastic gradient:

$$\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[ \log D_{\theta_d}\big(x^{(i)}\big) + \log\Big(1 - D_{\theta_d}\big(G_{\theta_g}(z^{(i)})\big)\Big) \right]$$

  - Sample m noise samples {z^(1), ..., z^(m)} from the noise distribution p_g(z)
  - Update the generator by descending its stochastic gradient:

$$\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\Big(1 - D_{\theta_d}\big(G_{\theta_g}(z^{(i)})\big)\Big)$$
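A sketch of one alternating training step implementing the algorithm above in TensorFlow; the generator and discriminator models, the batch of real images, and the batch size are assumed to be supplied by the caller:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
d_opt = tf.keras.optimizers.Adam(1e-4)
g_opt = tf.keras.optimizers.Adam(1e-4)

def train_step(generator, discriminator, real_images, batch_size, z_dim=100):
    # Update D: ascend log D(x) + log(1 - D(G(z)))
    z = tf.random.normal([batch_size, z_dim])
    with tf.GradientTape() as tape:
        fake = generator(z, training=True)
        d_real = discriminator(real_images, training=True)
        d_fake = discriminator(fake, training=True)
        d_loss = bce(tf.ones_like(d_real), d_real) + \
                 bce(tf.zeros_like(d_fake), d_fake)
    grads = tape.gradient(d_loss, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(grads, discriminator.trainable_variables))

    # Update G while D is held fixed: push D(G(z)) toward 1
    # (the common non-saturating variant of minimizing log(1 - D(G(z))))
    z = tf.random.normal([batch_size, z_dim])
    with tf.GradientTape() as tape:
        d_out = discriminator(generator(z, training=True), training=True)
        g_loss = bce(tf.ones_like(d_out), d_out)
    grads = tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(grads, generator.trainable_variables))
    return d_loss, g_loss
```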
From the generator's side, the most important technique is upsampling: a process of learning from a sequence of data and generating approximate sequences by capturing the input's density function. Upsampling convolutions are available in the TensorFlow library. Beyond GAN's common principles, Radford et al. [53] suggested some guidelines for developing a stable GAN architecture (a generator sketch following these guidelines appears after the list):
* Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
* Use batch norm in both the generator and the discriminator.
* Remove fully connected hidden layers for deeper architectures.
* Use ReLU activation in the generator for all layers except the output, which uses Tanh.
* Use LeakyReLU activation in the discriminator for all layers.
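A minimal generator sketch following these guidelines, with fractional-strided (transposed) convolutions for upsampling, batch norm, ReLU in all layers, and Tanh at the output; the spatial sizes and filter counts are illustrative assumptions:

```python
from tensorflow.keras import layers, models

def build_generator(z_dim=100):
    return models.Sequential([
        layers.Dense(8 * 8 * 256, input_shape=(z_dim,)),
        layers.Reshape((8, 8, 256)),
        # Fractional-strided convolutions upsample 8x8 -> 16x16 -> 32x32 -> 64x64
        layers.Conv2DTranspose(128, (4, 4), strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.Conv2DTranspose(64, (4, 4), strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        # Output layer uses Tanh, per the guideline
        layers.Conv2DTranspose(3, (4, 4), strides=2, padding="same",
                               activation="tanh"),
    ])
```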
To understand how GAN networks are designed, we introduce some popular GAN models.

2.2.2 Popular GAN models
* Pix2Pix: Pix2Pix was published in 2016 and is used widely in many applications, including artistic ones such as converting edge maps to cat photos or translating sketches to Pokemon or portraits. The model's concept is to translate image to image. The generator is a U-Net combined with skip connections between layer i and layer n-i, as in ResNet, and the discriminator is a convolutional PatchGAN classifier. The loss function is denoted as:

- L1 distance:
$$\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\big[\lVert y - G(x,z) \rVert_1\big] \quad (2.9)$$

- Conditional GAN loss:
$$\mathcal{L}_{cGAN}(G,D) = \mathbb{E}_{x,y}[\log D(x,y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x,z)))]$$

- Final loss:
$$G^* = \arg\min_G \max_D \mathcal{L}_{cGAN}(G,D) + \lambda \mathcal{L}_{L1}(G)$$

The authors chose the L1 distance instead of L2 because D easily detects blurred images as fake. The PatchGAN discriminator's idea is to return the average over all patches' outputs, where each NxN patch of the image is classified as fake or real. Our chosen classification evaluation metric comes from PatchGAN's concept.
* CycleGAN: the pix2pix framework performs well when the training set consists of aligned image pairs, such as translating sketches to shoes, where the characteristics of y exist in input x. CycleGAN is proposed to translate images from domain A to domain B when no paired images are available. The authors assume there are hidden relations between the two domains, and instead of learning from pairs of images, they discover the relation from sets of images in both domains. The state of the art in CycleGAN is the definition of new loss functions, the cycle consistency loss and the adversarial loss. Define:

- Generator G: $G: X \rightarrow Y$
- Generator F: $F: Y \rightarrow X$
- To capture latent characteristics from domain A's image set and transform them to domain B's image collection, G and F should mathematically be inverses of each other: $F(G(x)) \approx x$ and $G(F(y)) \approx y$. This is the cycle consistency idea.
- The adversarial loss consists of $\mathcal{L}_{GAN}(G, D_Y, X, Y)$ for the mapping G and $\mathcal{L}_{GAN}(F, D_X, Y, X)$ for the mapping F.
- CycleGAN also uses the L1 distance, for the cycle consistency loss:
$$\mathcal{L}_{cyc}(G,F) = \mathbb{E}_x\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_y\big[\lVert G(F(y)) - y \rVert_1\big]$$
- Final loss:
$$\mathcal{L}(G,F,D_X,D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \mathcal{L}_{cyc}(G,F) \quad (2.15)$$

For the GAN architecture, the authors use networks from neural style transfer and super-resolution as the generator and a 70x70 PatchGAN as the discriminator.
* StyleGAN: StyleGAN borrows ideas from style transfer work and the skip connections in ResNet to build the generator model shown in Figure 2.9. Instead of training on the full image directly, the authors encode the input latent code through 8 dense layers in a mapping network into an intermediate feature w, and then specialize w into style vectors A. The authors use the adaptive instance normalization technique to scale input features by mean and variance to match those of the style y, defined as

$$\mathrm{AdaIN}(x, y) = \sigma(y)\,\frac{x - \mu(x)}{\sigma(x)} + \mu(y)$$

in which the normalized x is scaled by $\sigma(y)$ and shifted by $\mu(y)$. StyleGAN inherits all hyper-parameters from the Progressive GAN generator but replaces the nearest-neighbor layers with bilinear upsampling. Injecting noise B before the AdaIN operator adds a regularization factor to each layer. We choose StyleGAN to generate the fake benign and malignant images in our work (a sketch of AdaIN follows Figure 2.9).
Figure 2.9: StyleGAN generator network (the latent z in Z passes through the mapping network; the synthesis network g receives the styles and noise inputs).
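A sketch of the adaptive instance normalization operation described above; the channel-wise statistics over height and width and the epsilon for numerical stability are the only assumptions beyond the formula:

```python
import tensorflow as tf

def adain(x, y, eps=1e-5):
    """AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y).

    x: content feature maps, shape (batch, height, width, channels)
    y: style feature maps with the same shape convention
    """
    mu_x, var_x = tf.nn.moments(x, axes=[1, 2], keepdims=True)
    mu_y, var_y = tf.nn.moments(y, axes=[1, 2], keepdims=True)
    normalized = (x - mu_x) / tf.sqrt(var_x + eps)
    return tf.sqrt(var_y + eps) * normalized + mu_y
```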
* Image preprocessing: a whole slide image is divided into many patch images for both the BreaKHis and BACH datasets; the IDC dataset keeps its original image size. For each patch, the image pixels in each channel are normalized to the [0,1] range to decrease the color intensity scale. Each patch image is then resized to 256x256 pixels using bilinear interpolation. Each image in the training set contributes all the patches of the original image so that our network can learn the multi deep features and increase performance. (A preprocessing sketch follows this list.)
* Feature extraction using deep features: the discriminative features extracted from fine-tuned VGG16, and from the concatenation of fine-tuned VGG16 & VGG19 transfer learning, are classified by our novel approach. In this work, all layers before the 17th layer of VGG16 & VGG19 are frozen, and the remaining layers are retrained.
* GAN and data augmentation: we choose two GAN architectures to test data augmentation capability on histopathological cancer images.

  - StyleGAN: [...] to generate cancer images for each magnification factor. Figure 2.9 described how this generator network improves on the traditional generator architecture for the style-transfer image problem. Synthesized images from StyleGAN are rather similar to the original images at first glance.

  - Conditional GAN: we choose Pix2Pix [55] as the other data augmentation method. First, we consider the combination of U-Net with skip connections in the generator network the best fit for histopathological images. Second, we expect to copy features from both the input and conditional images to increase the discriminative characteristics of benign and malignant tumors.

We applied StyleGAN and Pix2Pix to the BreaKHis dataset, and only Pix2Pix to the BACH dataset.
* Convolutional neural network for classification: in recent years, Convolutional Neural Networks (CNN) have proven an efficient approach in computer vision and have significantly improved cancer classification. Both VGG16 and VGG19 are proven good candidates for the transfer-learning technique. To discriminate benign from malignant tumor features, the base networks have to be retrained on the datasets and then used as input for the CNN.
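A sketch of the patch preprocessing and freeze/retrain setup described above; the exact freeze index, the placeholder input, and the helper name are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras.applications import VGG16

def preprocess_patch(patch):
    # Normalize each channel to the [0, 1] range
    patch = tf.cast(patch, tf.float32) / 255.0
    # Resize to 256x256 pixels with bilinear interpolation
    return tf.image.resize(patch, (256, 256), method="bilinear")

# Fine-tuning: freeze all layers before the 17th, retrain the rest
base = VGG16(weights="imagenet", include_top=False, input_shape=(256, 256, 3))
for layer in base.layers[:17]:
    layer.trainable = False
for layer in base.layers[17:]:
    layer.trainable = True
```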
Figure 3.1: (a) Fine-tuning VGG16 with the CNN classifier; (b) fine-tuning VGG16 & VGG19 with concatenated features and the CNN classifier. The classifier head stacks fully connected layers (4096, 4096, 512, 512) with batch normalization, ReLU activations, and dropout (0.2), ending in a single-unit fully connected layer with a sigmoid output for the benign/malignant decision.
A combination of different feature extraction methods can increase the classification accuracy. This work uses the VGG16 network and then both VGG16 & VGG19