

MINISTRY OF EDUCATION AND TRAINING

HO CHI MINH CITY OPEN UNIVERSITY

BÙI HUỲNH THÚY MAI

A COMPUTER VISION-BASED METHOD FOR BREAST CANCER

HISTOPATHOLOGICAL IMAGE CLASSIFICATION

BY DEEP LEARNING APPROACH

MASTER'S THESIS IN COMPUTER SCIENCE

HO CHI MINH CITY, FEBRUARY 2020


MINISTRY OF EDUCATION AND TRAINING

HO CHI MINH CITY OPEN UNIVERSITY

BÙI HUỲNH THÚY MAI

A COMPUTER VISION-BASED METHOD FOR BREAST CANCER

HISTOPATHOLOGICAL IMAGE CLASSIFICATION

BY DEEP LEARNING APPROACH

Major: Computer Science

Major code: 60 48 01 01

MASTER'S THESIS IN COMPUTER SCIENCE

Scientific supervisor:

Dr. TRƯƠNG HOÀNG VINH


1 Literature review of breast cancer histopathological image classification

1.1 Introduction and general considerations

1.2 Goals of the thesis

1.3 Contribution of the thesis

1.4 Structure of the thesis

1.5 Methodology

2 Foundational theory

2.1 Deep neural network

2.1.1 Introduction to deep neural networks

2.1.2 Techniques for training neural networks

2.1.3 Popular deep network models

2.2 Generative Adversarial Networks (GAN)

2.2.1 Introduction to GAN

2.2.2 Techniques for training GAN

2.2.3 Popular GAN models

3 Experiment and Discussion

3.1 Methodology

3.2 Experimental setup

3.2.1 BreaKHis dataset

3.2.2 BACH dataset

3.2.3 IDC dataset

3.3 Experimental results on the three datasets

3.4 Comparing handcrafted features and deep features for classification


I sincerely thank my advisor, Dr. Vinh Truong Hoang of Ho Chi Minh City Open University, for guiding me to complete this thesis.


The computer vision field has become more active in recent decades as scientists applied mathematical and quantitative analysis. Various applications have been using computer vision techniques to improve their productivity, such as visual surveillance, robotics, autonomous vehicles, and especially medical image processing. When Geoffrey Hinton and Yann LeCun, both known as "Godfathers of deep learning", used neural networks and backpropagation for character and handwriting prediction and obtained the best results compared to previous works, the technique became prominent.

In this thesis, we focus on detecting breast cancer with high accuracy in order to decrease the examination cost within an acceptable time. We therefore choose deep learning for our research and evaluate our approach on three datasets: BreaKHis, BACH and IDC. Due to some limitations of deep learning and of the dataset sizes, we propose a composition of popular techniques to boost classification efficiency: transfer learning, the Generative Adversarial Network (GAN) and neural networks. VGG16 & VGG19 are the base models applied to extract the high-level feature space from cropped image patches, named multi deep features, before training the neural nets. So far there has been no work leveraging the power of GAN to generate synthetic BreaKHis images, and in this thesis we use the Pix2Pix and StyleGAN models as generators. With the proposed approach, the cancer detection results outperform some existing works, with 98% accuracy for BreaKHis, 96% for BACH and 86% for IDC.


l Number of blocks of layers stacked together

Φ(x) The hypothesis space in traditional machine learning

L(Φ(x)) Loss function for each hypothesis

σ Activation function in deep learning

f, g Mapping functions in deep learning


LBP Local Binary Pattern

WHO World Health Organization

GLOBOCAN Global Cancer Incidence, Mortality and Prevalence

CBE Clinical Breast Exam

CLBP Completed Local Binary Pattern

LPQ Local Phase Quantization

GLCM Gray Level Co-Occurrence Matrices

PFTAS Parameter-Free Threshold Adjacency Statistics

ORB Oriented FAST and Rotated BRIEF

k-NN k-Nearest Neighbor

SVM Support Vector Machines

RF Random Forest

QDA Quadratic discriminant analysis

GPU Graphic Processing Unit

CNN Convolutional Neural Network

CONV Convolutional layer

FC Fully connected layer

MAE Manifold Preserving Autoencoder


DT Decision Tree

LR Logistic Regression

GAN Generative Adversarial Network

MRI Magnetic Resonance Image

SIFT Scale Invariant Feature Transform

SURF Speeded Up Robust Features

SGD Stochastic Gradient Descent


Chapter 1

Literature review of breast cancer

histopathological image classification

1.1 Introduction and general considerations

Cancer is a worldwide public health problem today. Among cancers, breast cancer is the most common invasive cancer in women, affecting 2.1 million people yearly. In 2018, the World Health Organization (WHO) estimated 627,000 deaths caused by breast cancer, about 15% of all cancer deaths. According to the 2018 report of the Global Cancer Incidence, Mortality and Prevalence project (GLOBOCAN) [1], which counts new cases and deaths for 36 cancer types across 185 countries on 5 continents, breast cancer accounts for 11.6% of new cases (Table 1.1) and is the second leading cause of cancer death.

Notably, in 2012 GLOBOCAN estimated that in Vietnam, a South-Eastern Asian country with a low per capita income of about $3,200/year and about $20/year of voluntary medical expense, the breast cancer incidence was 23 per 100,000 and had a rising trend [2].

Early cancer detection gives many more chances to treat patients and increases their survival rate. WHO finds that there are effective diagnostic methods such as X-ray and the Clinical Breast Exam (CBE), but these require professional physicians or experts. Besides, the diagnostic result is not always 100% accurate because of factors such as subjective experience, expertise and emotional state.

In recent years, progress in image processing and machine learning has shown that physicians can employ this technology to make diagnoses from medical images. Medical image processing has been applied widely to cancer diagnosis [3] and other diseases [4] with high accuracy in a short time. Image diagnosis by machine learning is a cost-efficient method in Vietnam's urban regions where there are no professional medical teams.

Cancer site            No. of new cases (% of all sites)   No. of deaths (% of all sites)
Lung                   2,093,876 (11.6%)                   1,761,007 (18.4%)
Breast                 2,088,849 (11.6%)                     626,679 (6.6%)
Prostate               1,276,106 (7.1%)                      358,989 (3.8%)
Colon                  1,096,601 (6.1%)                      551,269 (5.8%)
Nonmelanoma of skin    1,042,056 (5.8%)                       65,155 (0.7%)
Stomach                1,033,701 (5.7%)                      782,685 (8.2%)
Liver                    841,080 (4.7%)                      781,631 (8.2%)
Rectum                   704,376 (3.9%)                      310,394 (3.2%)
Esophagus                572,034 (3.2%)                      508,585 (5.3%)
Cervix uteri             569,847 (3.2%)                      311,365 (3.3%)

Table 1.1: Top 10 of popular cancer sites sorted by new cases in descending order

For the most part, research has demonstrated improvement in breast cancer classification accuracy [5, 6, 7, 8, 9], but it has not reached a significantly high rate. A main reason is the limited training datasets: collecting and annotating sufficient data by pathological experts is time-consuming and expensive.

Nowadays there are open-access breast cancer databases for research, such as BreaKHis, the ICIAR 2018 BACH Challenge, the Kaggle breast histopathological image dataset and the Tumor Proliferation Assessment Challenge 2016. But almost all works experimented on BreaKHis [8], built in collaboration with the P&D Laboratory - Pathological Anatomy and Cytopathology, Parana, Brazil, which means those results may not reach the same accuracy on a new dataset.

Deep learning is a branch of machine learning, representing data characteristics by layers, from simple symbols of points and lines to complex, abstract structures of polygons. In 1986, Rina Dechter first introduced the term deep learning to the machine learning community. In the 1970s, the multilayer perceptron algorithm simulated the capacity of the human brain to recognize and discriminate objects, with many applications in computer vision. Later, Yann LeCun achieved good results in handwritten digit classification by backpropagation in deep learning in the late 1980s. Nowadays, deep learning has been developing quickly and widely, with applications in many fields.

Indeed, although BreaKHis is a breast cancer benchmark database, it is not as large as ImageNet, which was built in collaboration with Stanford University, Princeton University, Google and A9 Research. ImageNet includes 14,197,122 images in about 20,000 categories and is used extensively in deep learning.

Recent machine learning research has achieved high-accuracy breast cancer classification with various supervised and unsupervised learning algorithms. For the literature review, covering 2016 to May 2019, this thesis studies three main techniques: handcrafted and/or deep features, transfer learning, and generative adversarial networks.

Handcrafted or deep features: Spanhol et al. [10] and Badejo et al. [11] compared handcrafted feature extractors such as Local Binary Patterns (LBP), the LBP variant Completed Local Binary Pattern (CLBP), Local Phase Quantization (LPQ), Gray Level Co-Occurrence Matrices (GLCM), Parameter-Free Threshold Adjacency Statistics (PFTAS) and Oriented FAST and Rotated BRIEF (ORB), with classifiers such as 1-Nearest Neighbor (1-NN), Support Vector Machines (SVM) and Random Forest (RF). To improve accuracy to a range of 98.5%-100%, Spanhol combined boosting techniques over 1-NN, QDA, RF and SVM, but the best results held only for the 40× and 400× magnifications; the authors concluded that the PFTAS feature is suitable for medical images. With the development of Graphic Processing Units (GPU) for big data processing, Spanhol et al. [8] proposed a deep learning algorithm, a convolutional neural network (CNN) structured as CONV-MaxPool-CONV-AveragePool-FC-FC with 32×32 and 64×64 window patches in sequential and random order; its accuracy range is 76.8%-96.1%. Gupta et al. [12] explored joint color-texture information (RGB and HSV color spaces) with and without stain normalization, using the various contemporary classifiers of Spanhol's work. Song et al. [13], [14] suggested the Fisher Vector combined with a CNN, popular in general computer vision, but this descriptor has high dimensionality; over two years the authors presented different encoding methods to obtain more discriminative features, such as an intra-embedding algorithm and a 2×512×N Fisher Vector descriptor extracted from VGG19 and a GMM model with N Gaussian components. Qi et al. [15] used an entropy-based, confidence-boosting strategy as a deep active learning method for classification with small training datasets, reducing annotation costs by up to 66.67% with accuracy from 88.29% to 91.61%. Mukkamala et al. [16] built a deep learning technique based on principal component analysis for each channel of the LAB color space plus an SVM, with accuracy from 85.85% to 96.12%. Kumar et al. [17] built a CNN model as 3CONV[5×5]-3CONV[3×3]-ReLU-Pool-FC to extract deep features from medical images. Gupta et al. [18] found that histopathological stain normalization before handcrafted feature extraction makes cancer classification more efficient than using grayscale images. Feng et al. [19] exploited unsupervised learning capacity with an autoencoder network named the manifold preserving autoencoder (MAE), learning encoded features from the input and then decoding hidden representations to the output; with this new algorithm, Feng et al. achieved accuracy from 82% to 99.16%. Reza et al. [20] experimented with sampling techniques such as under-sampling, over-sampling, ADASYN and SMOTE with a CNN and found that unbalanced data affects accuracy; deploying the over-sampling method on the unbalanced BreaKHis dataset gives better performance. Angara et al. [21] and Guillén-Rondon et al. [22] proposed CNN networks such as 3-[Conv-ReLU-Pool]-2FC-Softmax. Alom et al. [6] combined the strengths of Inception, ResNet and the Recurrent Convolutional Neural Network, with 95% and 97% classification accuracy with and without augmentation over the 4 magnification factors. The two core ideas of Zhang et al. [23] are to use skip connections as in ResNet to solve optimization issues when the network becomes deeper, and CBAM to refine the ResNet features; this method gained its highest accuracy of 92.6% at 200× and its lowest of 88.9% at 400×. Sudharshan et al. [5] compared various Multiple Instance Learning (MIL) methods and concluded that non-parametric MIL, which extracts the MIL feature space using a Parzen window technique and a k-NN classifier, reaches higher accuracy than MILCNN and Single Instance Learning, with 92.1% accuracy at the 40× magnification rate. Roy et al. [24] proposed a patch-based classifier using a CNN consisting of 6CONV-5POOL-3FC; experimenting with 512×512 patches, which contain more information at an efficient size, the model gained 92.% accuracy on ICIAR 2018. Alirezazadeh et al. [25] learned feature spaces from two different domains (here, benign and malignant) using LBP, LPQ and PFTAS and then formed a projection matrix; this method gave better performance than using each of the LBP, PFTAS, LPQ or CNN features separately with the classifiers of Spanhol's work. Fondón et al. [26] extracted 3 feature types: nuclei-based features via a transform to the CMYK color space and K-means clustering; region-based vectors of pink/violet, pink/white and white/violet; and texture features consisting of a first-order statistics vector, LBP and a sparse texture descriptor. Fondón used 9 classifiers to detect cancer tumors on the Bioimaging 2015 Grand Challenge dataset.
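To make the handcrafted descriptors discussed above concrete, the following is a minimal sketch of an 8-neighbor LBP histogram in plain NumPy. It is an illustrative simplification under stated assumptions (single radius, raw 256-bin codes); the cited works use more elaborate variants (CLBP, uniform patterns, multiple radii), so this is not their exact descriptor.

```python
import numpy as np

def lbp_histogram(img):
    """Basic 8-neighbor Local Binary Pattern histogram.

    Each interior pixel is compared with its 8 neighbors; neighbors that
    are >= the center set one bit of an 8-bit code. The normalized 256-bin
    histogram of codes is the texture feature vector.
    """
    h, w = img.shape
    # offsets of the 8 neighbors, clockwise from the top-left
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offs):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neigh >= center).astype(np.uint8) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()   # normalized 256-bin feature vector
```

Such a vector would then be fed to a classic classifier (1-NN, SVM, RF) exactly as in the comparisons by Spanhol et al. and Badejo et al.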

Transfer learning techniques: Weiss et al. [27] evaluated different feature extractors using VGG, ResNet and Xception when training on a limited number of samples and achieved state-of-the-art results on the BACH dataset; this method downsized BACH images to 1024×768 in order to train the classification model. Vo et al. [7] applied augmentation methods (rotating, cutting and transforming images) to increase the training data volume before extracting deep features from an Inception-ResNet-v2 model in order to avoid over-fitting. They trained the model with multi-scale input images of 600×600, 450×450 and 300×300 to extract local and global features; a Gradient Boosting Trees model was then trained to detect breast cancer, and the fusion model votes for the higher-accuracy classifier. The accuracy reached 93.8%-96.9% at low computation cost. Murtaza et al. [28] used AlexNet as the feature extractor of a hierarchical classification model combining 6 algorithms (kNN, SVM, NB, DT, LDA and LR), with a final feature reduction that increased the overall accuracy from 92.45% to 95.48%. Li et al. [29] deployed a transfer-learned Xception network to avoid model over-fitting; Li applied the ResNet technique of transferring prior knowledge to latter layers in order to achieve accurate and precise classification. Cascianelli et al. [30] proposed dimension reduction (Principal Component Analysis, Gaussian Random Projection, Correlation-based Feature Selection) after applying pre-trained VGG-F, VGG-S and VGG-veryDeep networks, to overcome over-fitting on a limited dataset such as BreaKHis, with accuracy from 84% to 94.7%. Brancati et al. [31] chose a fine-tuning ResNet strategy with 3 different configurations of 34, 50 and 101 layers, then voted for the classification with the highest class probability among these configurations; this work gets 97.2% accuracy for benign versus malignant tumors on the BACH dataset. Awan et al. [32] used a 50-layer ResNet to extract descriptors from overlapping patch-based images and then applied PCA dimension reduction. Shallu et al. [33] showed that transfer learning from VGG16, VGG19 or ResNet50 is better than training fully from scratch, because these networks provide discriminative features, with VGG16 being the better feature generator. Gandomkar et al. [34] used a 152-layer ResNet to extract features from five overlapping patches in a stain-normalized image; this technique is applied at each magnification rate (40×, 100×, 200×, 400×) to detect malignant/benign and cancer subtypes, with accuracies of 97.66%-98.52% and 94.60%-95.40% respectively.

Generative Adversarial Network (GAN) techniques: Shin et al. [35] used the Image-to-Image conditional GAN model (pix2pix) to generate synthetic data and discriminate the T1 brain tumor class on the ADNI dataset; the authors then applied this model to another dataset, BRATS, to classify T1. This GAN yielded a 10% accuracy increase compared to training on the real image dataset. Iqbal et al. [36] proposed a new Generative Adversarial Network for Medical Imaging (MI-GAN) to generate synthetic retinal vessel images from the STARE and DRIVE datasets; this method generated precisely segmented images better than existing techniques, and the authors state that the synthetic images retain the content and structure of the original images. Senaras et al. [37] employed a conditional Generative Adversarial Network (cGAN) to generate synthetic histopathological breast cancer images; the G model used a modified version of U-Net and the D model used a CNN-based patch-GAN classifier. Their experiments showed that synthetic images are indistinguishable from real ones: six readers (three pathologists and three image analysts) tried to differentiate 15 real from 15 synthetic images, and the probability that the average reader would correctly classify an image as synthetic or real more than 50% of the time was only 44.7%. Mahapatra et al. [38] proposed the P-GANs network to generate a high-resolution image at a defined scaling factor from a low-resolution image. This research suggests a multi-stage network with a triplet-loss-based correction mechanism: the output of the previous stage becomes the baseline for improving the next stage's output, helping to recover the degradation of image quality at each stage. The final super-resolution image obtained accuracy close to the original magnetic resonance image (MRI) in landmark and pathology detection. Cai et al. [39] studied a cross-modal volume-to-volume translation technique from the pancreas classification domain to the breast lesion segmentation domain across two different medical image types. Frid-Adar et al. [40] followed the DC-GAN and AC-GAN networks to synthesize high-quality liver lesion ROIs and then used a CNN structured as CONV-SUBSAMPLING-CONV-SUBSAMPLING-CONV-SUBSAMPLING-FC-DROPOUT to classify liver lesions. Wu et al. [41] proposed a conditional infilling GAN (ciGAN) to generate fully contextual in-filled images of breast lesions; this work observed that a ResNet-50 classifier trained with the GAN-augmented dataset produced a higher AUROC than traditional augmentation with the same classifier.

Both handcrafted and deep features demonstrate good cancer detection capability. Various studies combine numerous color features and local texture descriptors to improve performance [42, 43]. Modak et al. [43] did a comparative analysis of several multi-biometric fusions at different levels: feature level (mostly feature concatenation), score level, and rule/algorithm level. The authors showed statistically that a fusion approach has many advantages over a single mode, such as accuracy improvement, reduction of noisy data and spoof attacks, and more convenience. The authors of [42] exploited the powerful transfer learning technique with popular models such as AlexNet, VGGNet-16, VGGNet-19, GoogleNet and ResNet to design a feature-level fusion scheme for satellite image classification; fusion over many ConvNet layers is reported to be better than features extracted from a single layer. Features extracted from a CNN are less affected by differing conditions such as angle of view or color space; they are invariant features with better generalization. However, data augmentation methods might hurt accuracy if applied inadequately. In order to avoid the high computation cost of training from scratch, transfer learning can be employed in the medical field; some layers need to be retrained or fine-tuned so that these networks can detect cancer features. Furthermore, GAN is an effective data augmentation method in computer vision, but GAN training is still a difficult problem. These methods have been investigated intensively for common data and rarely for medical data. To overcome this limitation, we propose a composition of three techniques to boost breast cancer classification accuracy with limited training data.

1.2 Goals of the thesis

The objectives of this study are:

• This thesis will use Generative Adversarial Networks (GAN) to build synthetic breast cancer images. Goodfellow et al. [44] [45] proposed a new generative model trained by an adversarial process. A GAN includes 2 separate models, a generative model G and a discriminative model D, trained concurrently: G learns the distribution of the training dataset while D tries to discriminate the true images from the fake images generated by G. D estimates the conditional probability p(y|x); G tries to optimize the conditional probability p(x|y) in order to fool D. In other words, D and G play a two-player minimax game with the value function in equation 1.1:

min_{θg} max_{θd} V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]    (1.1)

The discriminator loss over θ(d) maximizes the objective, pushing D(x) toward 100% probability on true images and D(G(z)) toward 0% on fake images; conversely, the generator loss over θ(g) minimizes the objective by pushing D(G(z)) toward 100% on fake images.

Figure 1.1: GAN network

In Fig. 1.1, noise z drawn from a Gaussian or uniform distribution is the input for training the G model. Conceptually, z is the latent feature of the generated image. The output of G is used as input to train the D model to discriminate real from fake images. Mini-batch stochastic gradient descent (SGD) trains the GAN model to optimize θ(d) and θ(g); to speed up the training process, GAN can use the Adam algorithm as well.
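As an illustrative sketch (not the thesis code), the value function of equation 1.1 can be estimated numerically from the discriminator's outputs on a batch of real and generated images; D ascends this quantity over θ(d) while G descends it over θ(g):

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of V(D, G) from equation 1.1:
    E[log D(x)] + E[log(1 - D(G(z)))], where d_real holds D's outputs
    on real images and d_fake holds D(G(z)) on generated images."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A confident discriminator (D(x) near 1, D(G(z)) near 0) pushes V up...
v_good_d = gan_value(np.array([0.90, 0.95]), np.array([0.05, 0.10]))
# ...while a generator that fools D (D(G(z)) near 1) pushes V down.
v_good_g = gan_value(np.array([0.90, 0.95]), np.array([0.80, 0.90]))
```

In practice each SGD (or Adam) step alternates between the two players: one gradient ascent step on V in θ(d), then one gradient descent step in θ(g).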

• In previous years, much research [46] [47] addressed the efficiency of handcrafted features such as the Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF), and of deep networks and/or deep features such as those extracted from VGG16 and VGG19 (developed by a research group at Oxford University) or ResNet (developed by a research group at Microsoft). This thesis applies basic deep network models to extract breast cancer features, instead of handcrafted features, which cannot capture the complex cancer characteristics in medical images.

• This thesis proposes a new algorithm to classify breast cancer images in three databases (BreaKHis, the Breast Cancer Classification Challenge 2018, and Kaggle) in order to improve classification performance.

WHO notes that many image types are used in cancer diagnosis: X-ray images find abnormal regions but cannot identify whether a region is cancerous; biopsy images can determine whether a region is cancerous but cannot identify the cancer subtype, shape or other characteristics such as the distribution or balance of cells. From histopathological images, experts can classify a cancer region and its level. This work proposes a method to detect cancer from histopathological images in three databases.

1. BreaKHis: BreaKHis is the benchmark database for studying the breast cancer classification problem. There are 7,909 images from 82 patients at 4 magnifications (40×, 100×, 200×, 400×). The dataset is divided into 2 main groups, benign and malignant tumors, with 8 cancer subtypes; its total size is 4 GB. It has been built in collaboration with the P&D Laboratory - Pathological Anatomy and Cytopathology, Parana, Brazil.


Figure 1.2: Illustration of the BreaKHis database at different magnification factors: benign cell at 40× (a), 100× (b), 200× (c), 400× (d); malignant cell at 40× (e), 100× (f), 200× (g), 400× (h)

Class      Subclass             Patients    40×     100×    200×    400×    Total
Benign     Adenosis                 4        114     113     111     106      444
           Fibroadenoma            10        253     260     264     237    1,014
           Tubular Adenoma          3        109     121     108     115      453
           Phyllodes Tumor          7        149     150     140     130      569
           Subtotal                24        625     644     623     588    2,480
Malignant  Ductal Carcinoma        38        864     903     896     788    3,451
           Lobular Carcinoma        5        156     170     163     137      626
           Mucinous Carcinoma       9        205     222     196     169      792
           Papillary Carcinoma      6        145     142     135     138      560
           Subtotal                58      1,370   1,437   1,390   1,232    5,429
Total                              82      1,995   2,081   2,013   1,820    7,909

Table 1.2: Image distribution per magnification, class and subclass in BreaKHis

2. Breast Cancer Classification Challenge 2018: BACH 2018 was built in collaboration with the Universidade do Porto, the Instituto de Engenharia de Sistemas e Computadores, Tecnologia e Ciência (INESC TEC) and the Instituto de Investigação e Inovação em Saúde (i3S), Portugal. The dataset consists of 400 images divided into four groups, with a total size of 13.2 GB. Each image is also assigned to one of two main groups, non-carcinoma and carcinoma, by grouping the normal and benign classes into non-carcinoma and the in situ and invasive classes into carcinoma.

Class               Normal   Benign   In situ   Invasive   Total
Number of images       100      100       100        100     400

Table 1.3: Image distribution per class in ICIAR BACH Challenge 2018

3. BCa - Kaggle: The original dataset consisted of 162 whole-mount slide images of Breast Cancer (BCa) specimens scanned at 40× at the Hospital of the University of Pennsylvania and The Cancer Institute of New Jersey. From these, 277,524 patches of size 50×50 were extracted (198,738 Invasive Ductal Carcinoma (IDC) negative and 78,786 IDC positive).

Figure 1.4: Illustration of the IDC Kaggle database: (a) normal cell, (b) IDC cell


Magnification Factor    Normal     IDC       Total
40×                     198,738    78,786    277,524

Table 1.4: Image distribution per magnification and class in the BCa (Kaggle) database

1.3 Contribution of the thesis

• Fine-tune the deep learning models VGG16 & VGG19, which create well-discriminated extracted features for histopathological images

• Develop the neural net for classification

• Utilize GAN to generate synthetic histopathological images, creating fake cancer images

• Define a fusion evaluation method over patch images

• Compose the modern techniques of transfer learning, deep learning and GAN to improve classification performance

1.4 Structure of the thesis

The thesis is organized into these chapters:

• Chapter 1: Introduces why the thesis topic was selected and the overall literature. We have researched three main techniques around deep learning: handcrafted/deep features, transfer learning and generative adversarial networks. Each technique has recently been applied in the medical domain, and especially in breast cancer detection.

• Chapter 2: The foundational theory has two sections, deep learning and generative adversarial networks. For deep neural net theory, we study how a simple neural network works and its core components, then introduce techniques to develop and train these models effectively. After that, we present the popular models widely used in transfer learning nowadays, such as VGG, Inception, ResNet, or MobileNet for embedded devices. For generative adversarial network theory, we present what GAN is and its limitations, and how to overcome the difficulties of training a GAN model; in other words, we survey the current popular GAN models.

• Chapter 3: The experiments and discussion are introduced here. We propose the composition method of the above techniques, then run experiments with our methods on the three datasets BreaKHis, BACH and IDC. The achieved results show that our model reaches better accuracy than some recent papers on all three datasets.

• Chapter 4: Conclusion and future work.

1.5 Methodology

• Research deep learning techniques for optimized model training

• Research popular pre-trained models and deep feature extraction

• Research fine-tuning models for different domains such as medical images

• Research GAN models and how they can be applied to medical images

• Research fusion models and datasets

– Selection of patch images or whole images in a specified dataset

– Normalize images to the [0, 1] scale and then resize to 256×256 pixels

– Feature extraction from the fine-tuned VGG16 & VGG19 nets

– Develop the CNN classification model

– Evaluation method
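The normalization, resizing and patch-cropping steps listed above can be sketched as follows. This is a simplified illustration under stated assumptions (nearest-neighbor resizing, non-overlapping patches, a hypothetical patch size of 64), not the exact pipeline of the thesis:

```python
import numpy as np

def preprocess(img, size=256):
    """Normalize a uint8 RGB image to the [0, 1] scale and resize it to
    size x size using nearest-neighbor sampling (a stand-in for proper
    interpolation)."""
    img = img.astype(np.float32) / 255.0          # [0, 1] scale
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size            # nearest source row
    cols = np.arange(size) * w // size            # nearest source column
    return img[rows][:, cols]

def extract_patches(img, patch=64, stride=64):
    """Crop non-overlapping square patches; each patch would then be fed
    to the fine-tuned VGG16/VGG19 feature extractor."""
    out = []
    for y in range(0, img.shape[0] - patch + 1, stride):
        for x in range(0, img.shape[1] - patch + 1, stride):
            out.append(img[y:y + patch, x:x + patch])
    return np.stack(out)
```

A 256×256 image with 64×64 patches and stride 64 yields a 4×4 grid of 16 patches, whose extracted features are later fused at evaluation time.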


Chapter 2

Foundational theory

2.1 Deep neural network

With the innovation of high-performance computing systems such as GPUs or grids of massive clusters, forward and backward propagation applied in neural networks proved that this technique improves on the classification error rates of machine learning approaches such as SVM, Random Forest, Bayesian networks, etc. These networks stack many layers into a deep neural network architecture to learn features from low to high level. Nowadays deep learning is a remarkable technique, widely considered for application in many fields such as computer vision, natural language processing and video analysis.

2.1.1 Introduction to deep neural networks

In the machine learning approach, we have to collect the dataset, analyze it, and understand what the data is and how it is distributed. We apply feature extraction and selection, such as feature ranking or dimension reduction, to get a well-shaped dataset before building the model. We pose various questions to select the hypothesis space Φ(x) and accordingly a loss function L(Φ(x)) that generalizes our data best. During the training process, minimizing the loss function is very important so that the predicted result approaches the target value. But a neural network is driven a definitely different way: instead of choosing the best hypothesis, the method learns to find one. Figure 2.1 shows the differences between the learning approaches. From the input data, a neural network studies what shapes (lines, points, geometry) look like from small data samples. Stacking layers together lets it learn connections between simple objects and generalize them into abstract things, before understanding the discriminating features between object classes. Nowadays, researchers have been developing large networks of many neurons so that the whole process is learned gradually until reaching human-level targets.

Figure 2.1: Summary of learning approaches by Goodfellow, Bengio and Courville. Green boxes are the things to be learned

Conceptually, neural nets are inspired by how the human brain works. Figure 2.2 describes the physical brain structure, with mainly three components: the dendritic tree, the axon hillock and the axon. The dendritic tree collects input information from other neurons via their axons; the axon contacts the dendritic trees of other neurons at synapses; the axon hillock receives output from the dendritic tree and generates an outgoing spike of activity through the synapses into post-synaptic neurons. To summarize, each neuron receives input from other neurons, and the flow of information on each input line is controlled by a synaptic weight. This connection weight is adjusted during the cognition process. The main principle of a neuron is then simulated in computer science as in figure 2.3, defined as y = σ(x1×w1 + x2×w2 + x3×w3 + b). Input signals from other neurons are transferred, and their weights W can be adjusted accordingly; the final information is summarized at the output node. The activation function σ lets the neuron perform complicated computation: mathematically, the activation function turns the affine transformation from linear into non-linear. A neural network comprises thousands of these simple nodes, or neurons, computing for its tasks.
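The single-neuron computation y = σ(x1×w1 + x2×w2 + x3×w3 + b) above can be written directly; the sigmoid is assumed here as the activation σ for illustration:

```python
import numpy as np

def neuron(x, w, b):
    """One artificial neuron: weighted sum of the inputs plus a bias b,
    passed through the sigmoid activation sigma(z) = 1 / (1 + e^{-z})."""
    z = np.dot(x, w) + b              # x1*w1 + x2*w2 + x3*w3 + b
    return 1.0 / (1.0 + np.exp(-z))   # sigma(z), squashed into (0, 1)

# Example inputs and (hypothetical) learned weights for a 3-input neuron.
y = neuron(np.array([0.5, -1.0, 2.0]), np.array([0.1, 0.4, 0.2]), b=0.0)
```

With a zero pre-activation, the sigmoid outputs exactly 0.5; larger weighted sums push the output toward 1, smaller ones toward 0.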

Figure 2.2: Basic components of a physical neuron

So deep learning is an algorithm that has many layers processing together to resolve particular tasks such as classification, object detection, etc. Each layer consists of many neurons (nodes or units), as demonstrated by the network in figure 2.3b. A deep learning technique generates the mapping function by studying the relations among features; it is not a definitely fixed function as in traditional machine learning. The function f maps an input x to the intermediate output y, defined as y = f(x; w), and then studies the parameter values w to get the best approximated function f. The model in figure 2.4 is also called feed-forward and is extremely


Figure 2.3: Simulation a neuron and neuron network in computer science

important in deep learning networks. A feed-forward deep network composes many different functions together in a chain structure to learn abstract features, defined as u = g(h(f(x))).

Convolutional network is a term in neuron network architecture. A convolutional network consists of three typical stages: the first stage is a combination of convolutional layers that apply an affine transform on the input layer; the next stage runs a nonlinear activation to detect complex objects; and the final stage is a pooling layer. These stages are described below.

• Convolutional layer: in general form, convolution is an operation on two real-valued functions x and w that measures a weighted average, denoted as s(t) = (x∗w)(t). In neuron network terminology, it is a matrix multiplication between the input (the processed image) and the weights (kernel) to get the output feature map in figure 2.5.


Figure 2.4: Feedforward neuron network

To improve the learning capabilities, convolution has two important ideas: sparse connection (sparse

Figure 2.5: a) Convolution operator; b) Stride (stride of 2) convolution operator

interaction) and parameter sharing. With sparse connection, each output unit interacts with only a small number of input units, and vice versa, because the kernel is rather small compared to the input image, so many connections are zeroed out. In the case of full connection, the layer is called a dense layer or fully connected layer. It was proven that, when processing a large image with a million pixels, small meaningful characteristics such as edges and important points can be detected with a greatly reduced number of parameters and efficient computation. Secondly, with the parameter-sharing idea, a single parameter can be used for many inputs. Composing both ideas, convolution can greatly improve object detection.
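The convolutional layer described above can be sketched as a minimal NumPy loop. Like most deep learning libraries, it actually computes cross-correlation (no kernel flip); the image and kernel below are toy values:

```python
import numpy as np

def conv2d(image, kernel):
    # slide the kernel over the image, taking a weighted sum at each position
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])     # a tiny edge-like filter
fmap = conv2d(image, kernel)         # 3x3 feature map, as in figure 2.5
```

Note how each output value depends on only four input pixels (sparse connection) and the same two non-zero weights are reused everywhere (parameter sharing).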

• Pooling layer: a kind of layer used to adjust a unit's value by a statistical summary of its neighboring units, such as the max or the average. The pooling purpose is to reduce small variances within a neighborhood. The pooling operator is a good candidate to define features regardless of their particular position or of variable input size. A pooling layer is normally designed after a dense layer to produce invariance to translations and transformations such as rotation. On the other hand, a pooling layer can


be used as a down-sampling technique that reduces the representation size, and hence the computation, in the next layer. Currently TensorFlow and PyTorch support many pooling types: max-pooling 1D, 2D (refer to figure 2.6), 3D and average-pooling 1D, 2D, 3D. The max-pooling operator takes the maximum value of the neighboring items defined by the kernel pool size; it sometimes performs better than average-pooling, which takes the average value of the neighboring items, because of its capability to detect local maxima points.

Figure 2.6: Stride (stride of 2) max pool (2x2) operator
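The 2×2, stride-2 max-pooling of figure 2.6 halves each spatial dimension; a NumPy sketch with a toy input:

```python
import numpy as np

def max_pool_2x2(x):
    # non-overlapping 2x2 windows with stride 2: reshape each axis into
    # (blocks, 2) and take the max inside every block
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 0],
              [7, 2, 9, 8],
              [1, 0, 3, 4]], dtype=float)
pooled = max_pool_2x2(x)   # 4x4 input down-sampled to 2x2
```

Shifting a local maximum by one pixel inside its window leaves the output unchanged, which is the translation invariance described above.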

• Activation layer: there are popular activation functions used widely in neuron nets, such as softmax, sigmoid, ReLU, etc., to work in non-linear feature spaces.

– Softmax, also known as the normalized exponential function, normalizes an input vector into a probability distribution over K components, denoted as softmax(x)_i = exp(x_i) / Σ_{j=1}^{K} exp(x_j).


Figure 2.7: Sigmoid function plotting

learned via a chain of back-propagation and then by the data itself. The sigmoid's output is always positive, so the gradients of the parameters all become either negative or positive together. It is therefore difficult to find the local/global optimal point because of the zig-zag learning path, and the network can give wrong predictions.

– Hyperbolic tangent (Tanh)

Figure 2.8: Tanh function plotting


– Rectified linear activation function (ReLU) [48], denoted as

σ(x) = max(0, x)   (2.4)

zeroes out all negative numbers and keeps positive numbers (refer to figure 2.9). ReLU's characteristics include less training time with gradient descent, because of its simple math and non-saturating non-linearity. It doesn't suffer the vanishing gradient problem like sigmoid or softmax, and it converges quickly. Krizhevsky et al. compared the performance of ReLU and Tanh in a four-layer convolutional neuron network: ReLU (solid line in figure 2.10) is six times faster than Tanh (dashed line in figure 2.10) after five iterations. ReLU partially fixes the saturated gradient, but if an input falls into the negative region its gradient is still killed, so to resolve this some researchers suggest initializing the bias value.

Figure 2.9: ReLU function plotting

– Leaky ReLU: over the years, researchers have continued to study activation functions by improving ReLU's limitations. Maas et al. [49] defined

LeakyReLU(x) = max(0.01x, x)   (2.5)

which gets a non-zero gradient over the full domain. The authors' experiments showed that Leaky ReLU converged slightly differently from ReLU, but it did not actually impact the neuron network's optimization compared to ReLU. Maas observed that the average activation probability of hidden units with Leaky ReLU is 6 times greater than with Tanh. Later, He et al. [50] did not use the constant 0.01 for activating the negative region, because this constant must be selected carefully and affects the model's accuracy much, so he


Figure 2.10: Comparison of ReLU and Tanh on training speed by Krizhevsky

suggested an alternative approach with an adaptive parametric value:

ParametricReLU(x) = max(αx, x)   (2.6)

– Maxout neuron [51] is a composition of ReLU and Leaky ReLU. Goodfellow et al. proposed a new activation function that calculates a unit's output by taking the max over several affine inputs. It is robust in learning both the negative and positive regions and is easy to train with dropout. From the authors' experiments, the maxout activation function achieves better performance than ReLU, as in figure 2.11:

Maxout(x) = max(W1·x + b1, W2·x + b2)   (2.7)
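The activation functions of equations 2.4-2.7 can be written directly in NumPy; the W1, b1, W2, b2 parameters of the maxout sketch are illustrative, not values from the thesis:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)           # eq. 2.4: kills the negative region

def leaky_relu(x, slope=0.01):
    return np.maximum(slope * x, x)     # eq. 2.5: small gradient kept below zero

def parametric_relu(x, alpha):
    return np.maximum(alpha * x, x)     # eq. 2.6: alpha is a learned parameter

def maxout(x, W1, b1, W2, b2):
    # eq. 2.7: elementwise max over two affine pieces
    return np.maximum(W1 @ x + b1, W2 @ x + b2)

x = np.array([-2.0, 0.0, 3.0])
r = relu(x)               # -> [0, 0, 3]
l = leaky_relu(x)         # -> [-0.02, 0, 3]
p = parametric_relu(x, 0.1)
```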

2.1.2 Present the techniques of neuron network training

• When designing a neuron network, the activation function selection in each hidden layer has to be considered carefully, because it impacts the computational cost and how quickly training converges to a local/global minimum. If using the sigmoid function in a neuron network, we should avoid the two issues described above: initializing the parameters to large values will nearly kill the gradient in back-propagation training, and the input data should be normalized to a zero-mean distribution to overcome the zig-zag learning problem. From section 2.1.1, ReLU, Leaky ReLU, Parametric ReLU, and Maxout are well-working activations and exhibit fast training speed compared to sigmoid or tanh, but one needs to consider choosing suitable parameters for the input data and other training techniques.


Figure 2.11: Comparison of large rectifier networks to maxout by Goodfellow

The sigmoid's computational cost for the exponential function is rather high, so ReLU is the suggested option for now. For the classification output layer, we use the sigmoid or softmax activation function, which transfers real numbers into a probability distribution in the range [0, 1]: sigmoid for binary classification and softmax for multi-class detection.

• Batch normalization: batch norm is a kind of regularization technique that normalizes each output dimension to its expectation and variance, defined as:

x̂^(k) = (x^(k) − E[x^(k)]) / √(Var[x^(k)])   (2.8)

It is proven to speed up convergence. Ioffe and Szegedy suggested using batch norm after a dense or convolutional layer and before the nonlinearity function. There are several benefits of using batch norm in neuron nets: it allows setting a higher learning rate to quickly reach a local/global minimum, and it reduces the dependence on weight initialization schemes and on dropout. Recently, there are additional normalization techniques such as layer, instance, and group norm.

– Layer Normalization [52]: batch norm was proven to speed up training by calculating the mean and variance of the input per batch of data in a feed-forward network. The mean and variance from the training phase are used again in the test phase. This definitely depends on the batch size, and it is difficult to apply in a recurrent neuron network (RNN), because variable-length input sequences can't be summarized statistically in the same way, so the approach can't work well for RNNs. Ba et al. suggested layer norm, which calculates the mean and variance statistics from the summed inputs in the


hidden layer. Layer norm calculates the mean/variance per sample and per layer, so it does not depend on the batch size and becomes suitable for variable-length inputs in RNNs.

For a hidden layer l with H hidden units, the normalization statistics are computed over the units of that layer:

μ^l = (1/H) Σ_{h=1}^{H} x_h^l,   σ^l = √((1/H) Σ_{h=1}^{H} (x_h^l − μ^l)²)   (2.13)

– Instance Normalization (contrast normalization) [53] is used to generate stylized images by capturing the structure of the content image in the generator model. Ulyanov et al. replaced batch norm by instance norm per channel and batch index.

– Group Normalization [54]: Wu et al. found a drawback of batch norm when the batch size becomes gradually smaller, which makes the statistical estimation incorrect (refer to figure 2.12). The authors proposed group normalization, which divides the channels into groups and then normalizes the features within each group together; with this approach, group norm is independent of the batch size:

μ_i = (1/m) Σ_{k∈S_i} x_k,   σ_i = √((1/m) Σ_{k∈S_i} (x_k − μ_i)²)   (2.15)

where m is the group size and S_i = {k | k_C = i_C} (C is the channel index). The authors' experiments show that the error rate changes much with batch size under batch norm and does not change under group norm. Group norm becomes layer norm if the group number is equal to 1.

Each normalization variant is specialized for its application: instance norm gains better results in style transfer, while layer norm is good for recurrent neuron networks, and batch norm has been applied successfully in many applications in computer vision and video analysis. Depending on the purpose, the normalization technique to select needs to be determined experimentally. For medical image analysis, we choose batch norm, as it generalizes the basic features from the transfer learning model.


Figure 2.12: ImageNet classification error vs batch sizes by Wu
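The per-feature statistics of equation 2.8 are easy to sketch in NumPy; the learnable scale and shift (γ, β) of the full batch-norm layer are omitted here for brevity:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # x: (batch_size, features); normalize each feature over the mini-batch
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

# a mini-batch whose features are far from zero-mean/unit-variance
x = np.random.RandomState(0).randn(64, 8) * 3.0 + 5.0
x_hat = batch_norm(x)   # each column now has ~zero mean and ~unit variance
```

At test time a real batch-norm layer reuses running estimates of the mean and variance collected during training, which is exactly the batch-size dependence that layer norm and group norm avoid.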

• Dropout: dropout's principle is to randomly turn some neurons off by multiplying their output values by zero, so that they take no action in forward propagation. The dropout technique is rather similar to the bagging approach, in that it composites various different networks with shared parameters instead of training many large architectures at high computational cost. The probability hyper-parameter p controls this wide range of networks: the higher the keep probability, the less dropout there is. This technique makes the network less over-fitting. According to the authors' experiments, applying 20% dropout on the input units and 50% on the hidden layers is an optimal selection.
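A minimal sketch of inverted dropout, the variant modern frameworks implement: kept units are scaled by 1/(1−p) during training so that no rescaling is needed at test time:

```python
import numpy as np

def dropout(x, p_drop, rng, train=True):
    # inverted dropout: zero out units with probability p_drop and scale the
    # survivors so the expected activation is unchanged
    if not train:
        return x
    mask = (rng.random(x.shape) >= p_drop) / (1.0 - p_drop)
    return x * mask

rng = np.random.default_rng(0)
h = np.ones((4, 10))
h_train = dropout(h, p_drop=0.5, rng=rng)             # ~half the units zeroed
h_test = dropout(h, p_drop=0.5, rng=rng, train=False)  # identity at test time
```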

• Data augmentation: deep learning always needs a huge training set to reduce over-fitting, but in practice getting a large volume is a heavy task, as with medical images, so data augmentation is a considered selection. Traditionally, horizontal or vertical flips and rotations, randomly cropped images, scaling, resizing, changing color spaces, or combinations of all of these are common techniques; recently, GAN has been a nominated candidate for data augmentation, as it learns the distribution of the input data and then generates fake outputs that approximate the input's features. In our work, we combine both types.
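The classical augmentations listed above (flips plus a random crop) can be sketched in NumPy; the 8-pixel crop margin below is an arbitrary illustrative choice:

```python
import numpy as np

def augment(img, rng):
    # random horizontal/vertical flips followed by a random crop
    if rng.random() < 0.5:
        img = img[:, ::-1]            # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1, :]            # vertical flip
    H, W = img.shape[:2]
    ch, cw = H - 8, W - 8             # crop away 8 pixels per axis in total
    top = rng.integers(0, H - ch + 1)
    left = rng.integers(0, W - cw + 1)
    return img[top:top + ch, left:left + cw]

rng = np.random.default_rng(42)
img = np.arange(64 * 64, dtype=float).reshape(64, 64)
out = augment(img, rng)               # a randomly flipped 56x56 crop
```

Each call produces a different view of the same image, so the effective training set grows without collecting new data.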

• Transfer learning: the volume and domain of the dataset decide how many layers are frozen or trained accordingly. Most vision models are trained on the ImageNet dataset, so the extracted features are generic, and sometimes the model needs to be re-trained for a particular problem. In principle, the bottom layers extract generic features, while going further toward the top of the model gives more specialized features. As a statistical summary: with a small dataset in a similar domain, we can freeze many bottom layers and train a few top


layers. When the dataset becomes bigger or the domain differs, more layers have to be re-trained.

• Optimization is to improve the learning process so that it reaches a global minimum quickly and avoids local minima or saddle points when training a neuron network; it can be written as

θ* = argmin_θ Σ_i L(f_θ(x_i), y_i) + λR(θ)   (2.16)

where θ is the optimized parameter vector, L is a cost function, and R is a regularization function. There are many optimization approaches, such as stochastic gradient descent (SGD), momentum, Nesterov momentum, AdaGrad, RMSProp, and Adam. These techniques are used widely in machine learning, but Adam is the one mostly used in training neuron networks. Adam is a technique compositing momentum with AdaGrad/RMSProp. Momentum builds up velocity to accelerate SGD and step over local minima or saddle points. Selecting an efficient learning rate is not an easy task either: both large and small values can make training take a long time to achieve the best loss. So AdaGrad calculates an adaptive learning rate for the gradient by summing the historical squared gradients in each dimension. Kingma and Ba advise the hyper-parameters in Adam as beta1 = 0.9, beta2 = 0.999, and a learning rate of 10^-2 or 5×10^-4 for starting to train a model. The Adam optimizer is used in our experiments.

In more detail, the optimization techniques are:

– Stochastic gradient descent with Momentum [55]: stochastic gradient descent (SGD) is an iterative method for optimizing the loss function g(θ) = L(f_θ(x), y). It adjusts the parameter vector θ by the gradient value in the direction that reduces the loss function from the previous step, as in equation 2.17:

θ_{t+1} = θ_t − α∇f(θ_t)   (2.17)

where α is the learning rate and ∇f(θ_t) is the gradient at θ_t. If the learning rate is large, the optimization process is hard to converge to the global minimum; if the learning rate is small, the optimization process is very slow. One gradient step is taken over all dimensions, so it can be effective for one dimension and bad for the others. When training a neuron network, a decay parameter is an option to take over this, and selecting the learning rate and decay rate needs to be tested by cross-validation. There are some drawbacks when using SGD optimization with multi-modal distributions, existing local minima, or saddle points: the learning process can't be optimized further and the model can then give wrong predictions. Qian proposed adding momentum, creating a velocity to push over saddle points:

v_{t+1} = μv_t − α∇f(θ_t)   (2.18)

θ_{t+1} = θ_t + v_{t+1}   (2.19)

where μ ∈ [0, 1].

– AdaGrad [56] takes over SGD's drawbacks. This work suggests an adaptive learning rate for each dimension, scaling the gradient by the sum of squared historical gradients. AdaGrad performs well in sparse dimension spaces.

– Adam [57] suggests an efficient stochastic optimization with little memory consumption. The Adam optimizer updates the estimated weights by the first moment (mean of gradients) and second moment (uncentered variance of gradients), as a combination of Momentum and AdaGrad/RMSProp.
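A NumPy sketch of a single Adam update following Kingma and Ba's advised β1 = 0.9, β2 = 0.999; the toy objective f(θ) = θ² is only for illustration:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # first moment: running mean of gradients (momentum-like)
    m = beta1 * m + (1 - beta1) * grad
    # second moment: running mean of squared gradients (AdaGrad/RMSProp-like)
    v = beta2 * v + (1 - beta2) * grad ** 2
    # bias correction compensates for the zero initialization of m and v
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# minimize f(theta) = theta^2 starting from theta = 5
theta, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
```

The per-dimension division by √v̂ gives each parameter its own effective learning rate, which is what makes Adam less sensitive to the global learning-rate choice than plain SGD.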

• Early Stopping: if, after many training iterations, the validation accuracy is gradually decreasing or no longer changes, the training process has to stop at that point; this is called early stopping. While training a model through many loops, we often find that the training loss keeps going down and the training accuracy keeps going up, but the validation accuracy suddenly goes down at some iteration; we should stop the process there and store the model weights. The approach is available out of the box in both TensorFlow and Keras.
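The early-stopping logic amounts to a small patience counter; Keras packages the same idea as `tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=...)`. A framework-free sketch with a made-up accuracy history:

```python
class EarlyStopper:
    """Stop training when validation accuracy has not improved for `patience` epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("-inf")
        self.best_epoch = None
        self.wait = 0

    def update(self, epoch, val_acc):
        # returns True when training should stop
        if val_acc > self.best:
            self.best, self.best_epoch, self.wait = val_acc, epoch, 0
            return False          # improvement: store the model weights here
        self.wait += 1
        return self.wait >= self.patience

stopper = EarlyStopper(patience=3)
history = [0.70, 0.75, 0.80, 0.79, 0.78, 0.77, 0.76]   # illustrative val accuracy
stopped_at = None
for epoch, acc in enumerate(history):
    if stopper.update(epoch, acc):
        stopped_at = epoch
        break
```

Here the best validation accuracy is reached at epoch 2, and training halts three non-improving epochs later.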

In summary, the deep learning training process is broken into these basic steps:

• Pre-processing data: this is the first step that must be done in computer vision or machine learning. Data is normalized to zero mean and unit variance over the whole image or per channel. A normalized dataset helps to train quickly and with high accuracy.
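The per-channel zero-mean/unit-variance normalization mentioned here is a one-liner in NumPy (the random images stand in for a real training set):

```python
import numpy as np

def standardize_per_channel(images, eps=1e-7):
    # images: (N, H, W, C); compute mean and std per channel over the dataset
    mean = images.mean(axis=(0, 1, 2), keepdims=True)
    std = images.std(axis=(0, 1, 2), keepdims=True)
    return (images - mean) / (std + eps)

imgs = np.random.RandomState(1).randint(0, 256, size=(10, 8, 8, 3)).astype(float)
norm = standardize_per_channel(imgs)   # each channel now ~N(0, 1)
```

The same mean/std computed on the training set should also be applied to the validation and test images.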

• Selecting an architecture: we can design a simple small network, such as two layers with few neurons, or, depending on the specific problem, we can choose popular pre-trained models such as the R-CNN family for object detection or AlexNet, VGG, and ResNet for classification.

• Training the model: we start training with default hyper-parameters for the learning rate, batch size, learning rate decay, small regularization, etc. Then we validate whether the loss function is going down and whether the validation accuracy is good. If the loss value barely goes down, or explodes, we adjust the learning rate up or down accordingly.

• Optimizing hyper-parameters: to get better hyper-parameters, we use a grid search over the parameters. After some tries, we adjust a range of learning rates or regularization rates, then train a model with each setting; each set of parameters yields a validation accuracy, and we select whichever gives the better accuracy. In high-level libraries such as Keras or PyTorch, this optimization process is ready for use.

• Finally, we increase our network to achieve more accuracy by the basic stacked-block combination principle, with some popular blocks such as [(CONV-ReLU)×l-POOL/NORM] (l ≤ 5), [(FC-ReLU)×l] (l ≤ 2), [ResNet×l], [Inception×l], [CONV-BATCHNORM-RELU], or we apply a fusion approach at the facets of feature extraction, evaluation metrics, and algorithms.

2.1.3 Present the popular deep network models

Training a neuron network needs a lot of data, but most medical datasets are too small to use deep learning techniques directly. Transfer learning is a considerable approach to resolve this matter. So far there are some popular networks, such as VGG16 [58], Inception [59], ResNet [60], Inception-ResNet [61], and DenseNet [62], which are rather efficient in medical classification in general and particularly in cancer detection.

• VGG network: VGG was the first very deep neuron architecture after the success of AlexNet. The VGG team stacked many convolutional and fully connected layers together and achieved better performance by utilizing the smallest filter size, 3×3 convolutional filters. They proved that a deeper network increased the classification accuracy on the large ImageNet dataset. Table 2.1 summarizes the architectures of the VGG 16-layer and VGG 19-layer networks.

VGG16: [conv3-64 ×2, maxpool] → [conv3-128 ×2, maxpool] → [conv3-256 ×3, maxpool] → [conv3-512 ×3, maxpool] → [conv3-512 ×3, maxpool] → FC-4096 → FC-4096 → FC-1000 → soft-max

VGG19: [conv3-64 ×2, maxpool] → [conv3-128 ×2, maxpool] → [conv3-256 ×4, maxpool] → [conv3-512 ×4, maxpool] → [conv3-512 ×4, maxpool] → FC-4096 → FC-4096 → FC-1000 → soft-max

Table 2.1: VGG16 & VGG19 nets architecture

The ReLU activation function is used throughout the VGG nets. The technique of 3×3 filters with a stride of 1 pixel is better than 7×7 or 5×5 filters with a stride of 2 pixels on two factors, discrimination capability and the number of weighted parameters; this is an important contribution of the work. The 3×3 convolutional filters can learn local features, and after many stacked layers combining the localized low-level spaces, the nets synthesize higher feature spaces without missing characteristics. The incorporation of 1×1 convolution layers is another approach to increase the discrimination function while still keeping the receptive fields of the layer. In recent years, the VGG16 & VGG19 nets have been used in transfer learning techniques because of their shared low-level feature extraction and medium-sized architecture. The two top fully connected layers of size 4096 give well-discriminated deep features that can be used in combination with, or independently from, handcrafted features in a classification network.

• Inception network: the Inception nets concentrated on an efficient deep neuron net. The authors used the 1×1 convolutional operator to raise the depth of the architecture and reduce high-dimensional spaces. The Inception module's idea is to concatenate many optimal local structures with high correlation analyzed from the previous layer, as shown in figure 2.13. The authors combined convolutional operators of various sizes, such as 1×1, 3×3, and 5×5, together; these are likely to be types of multi-scale representation in a pyramid scheme. The inception-with-reduction design allows increasing the number of nodes at each layer without affecting the computation of the next layer. In total, the Inception network has 22 layers with trained parameters.

• ResNet network: when a neuron network becomes deeper, the accuracy begins to saturate, and more than that, it faces the degradation problem. The authors of the ResNet work proposed stacking additional identity mappings, as shown in figure 2.14. They state that originally H(x) is the predicted mapping function, which learns a mapping from input to output. Alternatively, we define another mapping F(x) = H(x) − x, so that again H(x) = F(x) + x. Now F(x), the residual function, is easier to optimize with reference to the layer input. This formula is also a type of shortcut connection, borrowed from the long short-term memory (LSTM) network. The residual block brings a flow of memory from the input layer to the output layer.

• MobileNets [63]: in recent years, AI built into mobile devices has been extremely interesting to the research community. Despite the great computation and storage capability of smart-phones, they are not as strong as computers, so a light neuron network with acceptable accuracy is necessary. Howard et al. introduced MobileNet for mobile and embedded devices, which allows adjusting hyper-parameters to suit specific applications. The main core components come from techniques of the Inception model and the depth-wise separable convolution, comprising a 1×1 point-wise convolution and a per-channel depth-wise convolution. The standard convolution is replaced by point-wise and depth-wise convolutions, which are more productive than the standard one for many applications such as classification and object detection. The model has two suggested hyper-parameters to adjust: the model size (α ∈ (0, 1]), called the width multiplier, which changes the number of input/output channels at each layer; and the computation cost (ρ ∈ (0, 1]), called the resolution multiplier, which alters the image size and the hidden outputs' size. Adjusting these hyper-parameters trades off among accuracy, size, and latency; the correlation is positive. By experiments, MobileNet achieves 70.6% accuracy, greater than GoogleNet's 69.8% and less than VGG16's 71.5%, but its number of parameters is the least among them, so computer vision applications on smart-phones and embedded devices can definitely be developed in the near future.

Figure 2.13: a) Naive version of Inception net; b) Reduction version of Inception net

For our experiments, although Inception and ResNet achieved better results than VGG on ImageNet classification, for the BreaKHis medical images VGG transfer learning gives more discriminative features.

2.2 Generative Adversarial Networks (GAN)

2.2.1 Introduction to GAN

Since GAN's introduction by Goodfellow in 2014, GAN has attracted much attention from research communities and has been applied in various applications, such as text/image/audio/video-to-text/image/audio/video


Figure 2.14: Residual learning: a building block

Figure 2.15: MobileNet: depth-wise, batch norm and ReLU block

synthesis, style transfer, high-resolution image enhancement, and domain adaptation. Basically, GAN composites two networks (see figure 2.16), code-named the generator network G(x) and the discriminator network D(G(x)). G generates fake images by studying the input data distribution, while D discriminates between real images from the training dataset and fakes from the G model. In a GAN, G resolves a more difficult task than D: it must recognize the correlations or distribution between nearly similar objects and map them into the correct feature space. At the initial stage, the input consists of random noise z and real data x used to train the G network. Recently, GAN has become a major focus of the research community, and there are GAN variants that gradually generate realistic images of faces, animals, natural pictures, etc. The dominant techniques in GAN are conditional GAN and style-transfer GAN.


Figure 2.16: A high level overview of GAN

As represented in figure 2.16, both the generator and discriminator are neuron networks and are trained simultaneously. The discriminator's loss function is optimized by back-propagation to adjust the discriminator's weights. In contrast, training the generator is rather complex: it incorporates D's feedback on the output classification, and G is penalized if a fake image is classified as un-real. Following the GAN training principle thoroughly, either G or D is frozen while the other is trained, each with the purpose of optimizing its own loss, as a two-player game denoted by the algorithm below:

• Loop through a number of iterations:

– Loop through the discriminator training steps:

∗ Sample m noise samples {z^(1), ..., z^(m)} from the distribution p_g(z)

∗ Sample m training examples {x^(1), ..., x^(m)} from the distribution p_data(x)

∗ Update the discriminator by ascending its gradient:

∇_{θd} (1/m) Σ_{i=1}^{m} [log D_{θd}(x^(i)) + log(1 − D_{θd}(G_{θg}(z^(i))))]   (2.20)

– Sample m noise samples {z^(1), ..., z^(m)} from the distribution p_g(z)

– Update the generator by descending its gradient:

∇_{θg} (1/m) Σ_{i=1}^{m} log(1 − D_{θd}(G_{θg}(z^(i))))   (2.21)


evaluated by a classifier network to see whether the generated images can capture the discriminative features of benign and malignant tumors or not. If the data-space distribution is well learned from the real BreaKHis or BACH data, we can create a large enough dataset to avoid the bias drawbacks of training by deep learning and reduce experts' effort in the medical domain. More importantly, from there we can understand what the breast cancer structure is.

In this thesis, we concentrate on researching some quantitative metrics. Herein, IS and FID are rather popular in GAN evaluation.

• Inception score (IS): Salimans et al. [64] used an Inception model pre-trained on ImageNet to extract good features from the generated images, defined as

IS(x) = exp( E_x [ KL( p(y|x) ‖ p(y) ) ] )   (2.22)

This score calculates the average KL-divergence distance between the label distributions of real and fake images. The authors state that a meaningful generated image has a low-entropy p(y|x) together with high quality. But there are some limitations, analyzed by Barratt [65]: the IS score depends on the weights trained on the specific ImageNet (IS = 3.5%) or CIFAR (IS = 11.5%) datasets, so the IS score doesn't transfer well across natural images. A higher IS score is better than a lower one.

• Frechet Inception distance (FID): Heusel et al. [66] compared the distributions of the activation features at the POOL3 layer of the Inception model. FID is a variant of the IS score: IS measures whether the generated objects are meaningful, while FID is used for validating the image diversity. The FID score is defined as

d²((m_r, C_r), (m_g, C_g)) = ‖m_r − m_g‖² + Tr(C_r + C_g − 2(C_r C_g)^(1/2))   (2.23)

where m_r, m_g are the feature means of the real and fake images and C_r, C_g are their covariances. Therefore, FID has the same drawbacks as IS about changing weights and data samples. A lower FID score means the two image distributions are approximately similar in distance.

– Mode Score: Che et al. [67] suggested the MODE score to fix the IS limitations.
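Given two sets of extracted features, equation 2.23 can be computed in NumPy. The trace of (C_r C_g)^(1/2) is evaluated through the equivalent symmetric form C_r^(1/2) C_g C_r^(1/2), so that plain eigendecomposition suffices; the random features below stand in for Inception POOL3 activations:

```python
import numpy as np

def fid(feat_real, feat_gen):
    # eq. 2.23 on two feature matrices (rows = images, columns = features)
    mr, mg = feat_real.mean(axis=0), feat_gen.mean(axis=0)
    Cr = np.cov(feat_real, rowvar=False)
    Cg = np.cov(feat_gen, rowvar=False)
    # Tr((Cr Cg)^(1/2)) = Tr((Cr^(1/2) Cg Cr^(1/2))^(1/2)); the inner matrix
    # is symmetric PSD, so eigendecomposition gives its square root
    vals, vecs = np.linalg.eigh(Cr)
    Cr_half = (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.T
    inner_vals = np.linalg.eigvalsh(Cr_half @ Cg @ Cr_half)
    tr_sqrt = np.sqrt(np.clip(inner_vals, 0, None)).sum()
    return float(np.sum((mr - mg) ** 2) + np.trace(Cr + Cg) - 2 * tr_sqrt)

rng = np.random.default_rng(0)
a = rng.standard_normal((500, 8))          # "real" features
b = rng.standard_normal((500, 8)) + 0.5    # shifted "generated" features
score_same = fid(a, a)                     # identical sets: distance ~0
score_diff = fid(a, b)                     # shifted distribution: clearly larger
```

The eigenvalue clipping guards against tiny negative values from floating-point error in near-singular covariances.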

2.2.2 Present the techniques of GAN training

From the generator's side, the most important technique used is up-sampling. Up-sampling is a process of learning from a sequence of data and generating approximate sequences of data by capturing the input's
