DOCUMENT INFORMATION

Title: Brain Tumor Classification On MRI Images Via Deep Learning
Author: Pham Ho Toan
Advisor: Dr. Truong Hoang Vinh
University: Ho Chi Minh City Open University
Major: Computer Science
Type: Graduate Thesis
Year: 2021
City: Ho Chi Minh City
Pages: 69
File size: 2.07 MB



MINISTRY OF EDUCATION AND TRAINING

HO CHI MINH CITY OPEN UNIVERSITY

- ∞0∞ -

PHAM HO TOAN

BRAIN TUMOR CLASSIFICATION

ON MRI IMAGES VIA DEEP LEARNING

GRADUATE THESIS IN COMPUTER SCIENCE

HO CHI MINH CITY, 2021


MINISTRY OF EDUCATION AND TRAINING

HO CHI MINH CITY OPEN UNIVERSITY

- ∞0∞ -

PHAM HO TOAN

BRAIN TUMOR CLASSIFICATION

ON MRI IMAGES VIA DEEP LEARNING

Student ID: 1751010162

GRADUATE THESIS IN COMPUTER SCIENCE

Advisor: Dr. TRUONG HOANG VINH

HO CHI MINH CITY, 2021


HO CHI MINH CITY OPEN UNIVERSITY

FACULTY OF INFORMATION TECHNOLOGY

SOCIALIST REPUBLIC OF VIETNAM

Independence – Freedom – Happiness

CONFIRMATION

My name is: Phạm Hổ Toàn

I agree to provide the full text of this copyright-compliant graduate thesis to the Library of Ho Chi Minh City Open University. The Library of Ho Chi Minh City Open University will connect the full text of the thesis to the scientific information system of the Ho Chi Minh City Department of Science and Technology.

Signature

(Full name)


ADVISOR'S COMMENTS ON ALLOWING THE STUDENT TO DEFEND THE DISSERTATION

Advisor: Dr. Vinh Truong Hoang

Student: Toan Pham Ho        Class: TH01

Date of Birth: 01/01/1999        Birthplace: Cam Ranh city

Dissertation's Name: Brain Tumor Classification on MRI Images via Deep Learning

Comments of the advisor on allowing the student to defend the dissertation before the Council of the IT Faculty:

Ho Chi Minh city, day … month … year ……

Reviewer ………


I wonder if you could remember this, but all your pieces of advice have stuck with me ever since, so please take this acknowledgment as a token of my appreciation for always going above and beyond.

Moreover, I wish the teachers of the Faculty of Information Technology and the Principal all the best while working at Open University and beyond.


Brain tumours are considered among the most prominent and detrimental neurological disorders, so diagnosing the category of a brain tumour as soon as possible is tremendously crucial for patients. That diagnosis has relied excessively on human factors: determining the tumour type by reading MRI images. To address this issue, deep learning has been put into practice for classification. In this thesis, I recommend a number of proposed architectures that combine different basic versions of Convolutional Neural Networks in order to enhance classification performance. At the same time, the following techniques are used in the suggested models: data augmentation, the ReduceLROnPlateau class, the Adam optimizer, ModelCheckpoint, EarlyStopping, transfer learning, and the arrangement of Batch Normalization and Dropout. Finally, the experimental results prove that the proposed approaches outperform previously reported results from other papers when carrying out experiments on the same brain tumour database.


Contents

1 General Review of Brain Tumor MRI Classification
  1.1 Introduction
  1.2 Accomplishments of The Thesis
  1.3 Contributions of The Thesis
  1.4 Formation of The Thesis
  1.5 Researching Fields

2 Foundational Theory
  2.1 Basic Conceptions
    2.1.1 Grayscale Image
    2.1.2 RGB Image
  2.2 Machine Learning
    2.2.1 Definition
  2.3 Hand-Crafted Feature Extractions
    2.3.1 Local Binary Pattern
    2.3.2 Histogram of Oriented Gradients
  2.4 Classifiers
    2.4.1 Support Vector Machines
    2.4.2 K-Nearest Neighbors
    2.4.3 Random Forest
  2.5 Deep Learning
    2.5.1 Definition
  2.6 Convolutional Neural Networks
    2.6.1 Introduction
    2.6.2 Techniques of Neural Network Training
    2.6.3 Renowned CNN Models

3 Experiment and Results
  3.1 Methodology
  3.2 Experimental Outcomes

4 Comparison And Conclusion
  4.1 Comparison
  4.2 Conclusion


g_c is the center value among the nine given pixels in LBP

{g_i}_{i=0}^{P−1} are the values of the P surrounding pixels

T is the matrix transposition operator

I_x, I_y are the two separate derivatives corresponding to the two directions

|G| is the gradient magnitude

Θ is the gradient direction

W_image, W_block, W_cell are the widths of the image, block, and cell respectively

H_image, H_block, H_cell are the heights of the image, block, and cell respectively

n is the quantity of cells in a block

size_block is the dimension of the feature vector of a block (size_cell = 9 if using "unsigned HOG", or size_cell = 18 for "signed HOG")

K is the number of points taken into comparison with the unlabeled point in KNN


CNN Convolutional Neural Network
FCL Fully Connected Layer
Conv Convolutional Layer
DL Deep Learning
ML Machine Learning
GPU Graphics Processing Unit
ANN Artificial Neural Network
KNN K-Nearest Neighbors
LBP Local Binary Pattern
RF Random Forest
SVM Support Vector Machine
Pool Pooling Layer
ReLU Rectified Linear Units
RGB Red Green Blue
Adam Adaptive Moment Estimation
SGD Stochastic Gradient Descent
WHO World Health Organization
VGG Visual Geometry Group
CTS Computed Tomography Scan
PCA Principal Component Analysis
CBIR Content-Based Image Retrieval
RELM Regularized Extreme Learning Machine
CNNBCN Convolutional Neural Network Based on Complex Network
ResNet Residual Network
DenseNet Dense Convolutional Network
MRI Magnetic Resonance Imaging
DNN Deep Neural Network
CapsNets Capsule Networks
DWT Discrete Wavelet Transform
RPN Region Proposal Network
DSL Deep Structured Learning
SIANN Space Invariant Artificial Neural Network


List of Figures

1.1 The changes in the figure for Brain and Other Nervous System Cancer over the 16 years
1.2 Hand-Crafted Processing
1.3 Convolutional Neural Network Processing
2.1 (a) Grayscale image, (b) RGB image
2.2 Extracting process of the LBP_{8,1} value
2.3 Pre-processing image for the HOG algorithm
2.4 (a) An analyzed image in 8×8 cells, (b) Blocks of cells have been set
2.5 Kernels for calculating gradient
2.6 A pixel's value after the preprocessing step
2.7 Gradient Direction and Gradient Magnitude of a pixel play a vital role in determining the gradient vector
2.8 Putting gradient vector values into the histogram of gradients
2.9 The outcome 9-bin histogram
2.10 SVM illustration
2.11 One-versus-all (Basic)
2.12 One-versus-all (Advanced)
2.13 KNN illustration
2.14 The functional process of Random Forest
2.15 The performance of DL and older learning algorithms
2.16 A CNN operating sequence of brain tumor classification
2.17 Convolving a 5×5×1 image with a 3×3×1 kernel to get a 3×3×1 convolved feature
2.18 Different types of pooling convolved features
2.19 Flattening a pooled feature into a vector
2.20 ReLU activation
2.21 Before and after implementing dropout
2.22 AlexNet's architecture
2.23 VGG16's architecture
2.24 VGG19's architecture
2.25 InceptionV3's architecture
2.26 Inception-ResNetV2's architecture
2.27 ResNet50's architecture
2.28 DenseNet121's architecture
2.29 MobileNet's architecture
3.1 (a) Proposed AlexNet model, (b) Proposed VGG16 model, (c) Proposed VGG19 model
3.2 (a) Proposed ResNet model, (b) Proposed DenseNet model, (c) Proposed MobileNet model


List of Tables

1.1 Experimental Database In Detail
3.1 Accuracy from hand-crafted feature extractions
3.2 Set values of original (or papers') methods
3.3 Set values of proposed methods
3.4 Accuracy from both original and proposed methods
3.5 The best accuracy for each paper as opposed to mine


of brain tumors as soon as possible after symptoms appear, in order to carry out plausible treatments for the patient eventually, is tremendously imperative.

Fortunately, the World Health Organization (WHO) finds that there is a variety of effective diagnostic methods, such as Magnetic Resonance Imaging (MRI) and Computed Tomography Scan (CTS). However, MRI and CTS have their drawbacks: they rely excessively on human factors such as emotional state, expertise, subjective experience, et cetera, not to mention that a biopsy or surgery is carried out afterward to determine which type of brain tumor it is, which is also performed by doctors. At the end of the day, it all boils down to a brand-new technique that is far more accurate, more convenient, and able to deal with all the former disadvantages flawlessly. Automatically classifying brain tumor types from MR (Magnetic Resonance) images by computer vision was born to tackle these problems. Lately, computer-aided automatic diagnostic approaches have become ubiquitous in numerous forms. To be more specific, robust machine learning techniques have been strongly implemented in preventive treatments to assist medical doctors in fixing on an appropriate cure for patients on time and not relying on human factors excessively when diagnosing raw MRI brain tumor images.


features, Pipeline of BoW-based tissue classification, and Tumor region augmentation and partition. The ultimate outcomes were quite objective: over 90% of assortment for each type of brain tumor. A. Biller et al. [3] introduced Sodium-MR imaging data of sufferers with treatment-naïve glioma WHO grades I–IV, acquired by using a 7T MR system. For the acquisition of sodium-MR images, researchers utilized density-adapted 3D radial projection reconstruction pulse sequences. Proton-MR imaging data were acquired by using a 3T whole-body system. In the next three years, Muhammad Sajjad et al. [4] offered a novel convolutional neural network (CNN) based multi-grade brain tumor classification system and extensive data development for avoiding lack of data for multi-grade tumor distribution. Tumor segmentation using a deep learning strategy was also formed. The research proved that without their data augmentation, validation accuracy would decline slightly as opposed to the suggested one. J. Seetha and S. Selvakumar Raja [5] proposed automatic brain tumor detection using Convolutional Neural Networks (CNN), and the deeper architecture design was performed by using small kernels. The recommended models witnessed a better improvement in the validation accuracy compared to other algorithms such as SVM and DNN. There was no need for a separate presentation of feature extraction, as the feature values were taken from the CNN itself. Parnian Afshar et al. [6] newly proposed the CapsNets (Capsule Networks) model, which had the potential to preserve spatial relations due to its Routing by Agreement process. This suggestion aimed to classify three different groups of brain tumors: Meningioma, Pituitary, and Glioma. Javaria Amin et al. [7] proposed a fusion process to combine structural and texture information of four MRI sequences (T1C, T1, Flair, and T2) for the detection of brain tumors. Also, a Discrete Wavelet Transform (DWT) along with a Daubechies wavelet kernel was utilized for the fusion process, which provides a more informative tumor region than a single individual sequence of MRI. Generally, these researches' success rates range from 87% (the minimum) to 98.7% (the maximum) with diverse models and methods. Content-Based Image Retrieval (CBIR) techniques have been developed by Jun Cheng et al. [8] with a myriad of adjustments in five parameters: the radius (R) of the disk-shaped structuring element used to dilate the brain tumor area, the number (N) of pooling regions created by the intensity order-based division method, the size (W) of raw image patches used as local features, the number (K) of vocabulary size, and the reduced dimensionality (D) in the new space induced by the projection matrix (L) learned in CFML, like the number of rows of L. The eventual consequence that researchers could accomplish was approximately 94.7%. Arshia Rehman et al. [9] used CNN (Convolutional Neural Network) models like AlexNet, Inception, and VGG16 with different kinds of improving techniques


to classify three types of brain tumor images (Meningioma, Glioma, and Pituitary) in 2019. To be more specific, training parameters were adjusted and models were fine-tuned, which achieved an accuracy of 98.69% with VGG16, 98.04% with Inception, and 97.39% with AlexNet. In the same year, S. Deepak and P.M. Ameer [10] also implemented the Inception model, but only for the extraction stage of two proposed models. In the classifying stage, SVM (Support Vector Machine) and KNN (K-Nearest Neighbors) were chosen to diagnose MRI images and accomplished significant results, with 97.8% and 98.0%, respectively. Zar Nawab Khan Swati et al. [11] proposed efficient methods in 2019 for brain tumor classification using VGG19 pre-trained on the ImageNet database, combined with fine-tuning progressively from the first block to the 6th block. The highest accuracy for the transfer learning VGG19 model was 96.13%, obtained when fine-tuning all six first blocks. Abdu Gumaei et al. [12] presented a new approach (with the highest accuracy being 94.23% among the proposed methods) comprising three steps. In the preprocessing stage, brain tumor images outside of the range [0, 255] would be transformed into intensity images in the range of [0, 1] by using a min-max normalization rule. The hybrid PCA-NGIST, which combines the PCA method with the GIST descriptor after normalizing, would be chosen to extract features of brain tumor images, and RELM was used as the classifier in the final stage. Regarding training databases in poor conditions such as Noise Based, Fast Gradient Sign Method, and Virtual Adversarial Training, Jai Kotia et al. [13] came up with a solution, which was training CNN models on adversarial-attack-generated images. The validation accuracy for each said adversarial attack after training was enhanced remarkably, approximating the actual results compared to results that had not been through the proposed training. On the other hand, five simple proposed Convolutional Neural Network architectures for brain tumor classification were constructed by Nyoman Abiwinanda et al. [14] in order to assert that their results on simple models could be higher than numerous other complicated models. With only two 2D convolutions, two ReLU activations, and two Maxpooling layers, they obtained 98.51% for training and 84.19% for validation. In the year 2020, modified CNNBCNs (Convolutional Neural Networks Based on Complex Networks) were constructed and tested by Zhiguan Huang et al. [15] with three algorithms for randomly generating graphs, namely Erdos-Renyi (ER), Watts-Strogatz (WS), and Barabasi-Albert (BA), which were more effective than the original CNNBCNs in diagnosing brain tumor types via MRI images. The highest accuracy amongst the obtained results belonged to CNNBCN-ER, at exactly 95.49%. The VGG16 model was chosen as the base network in a proposed method named the Faster R-CNN architecture, presented by Yakub Bhanothu et al. [16] in 2020. The Faster R-CNN consists of three primary blocks, namely RPN, Region of Interest


(RoI), and Region-based CNN (R-CNN) for object classification. The final results for each brain tumor class (Glioma, Meningioma, and Pituitary) were 75.18%, 89.45%, and 68.18%, respectively. Likewise, a developed model that consists of two primary stages was introduced by Kazihise Ntikurako Guy-Fernand et al. [17] in the year 2020.

To be more specific, input images would first pass through the visual attention mechanism for training. The acquired knowledge would then be transferred to the proposed architecture as a feature selector that mainly uses staples of CNNs such as convolutional layers and Batch Normalization layers. The resulting accuracy was quite good, at roughly 96%. In the second month of the year 2020, a new model for brain tumor classification based largely on CNNs was presented by Milica M. Badža and Marko Č. Barjaktarović [18]. In other words, MR images would first be preprocessed (normalized, resized) and augmented (rotated, flipped vertically) to increase the training image database before passing through an architecture with two proposed extracting blocks that had different kinds of layers arranged. The highest obtained accuracy in this way was around 95%. Preethi Kurian and Vijay Jeyakumar [19] in early 2020 experimented with the CBIR task on seven distinct databases with fifteen diverse classes by implementing the LeNet and AlexNet architectures, and made a comparison of validation accuracy against the number of epochs. It indicated that the higher the figure for epochs is, the more accurate the models get.

First of all, the research thesis illustrates the conceptions of two well-known local image descriptors, Local Binary Pattern (LBP) [20] and Histogram of Oriented Gradients (HOG) [21], as feature extraction in the early stage (after preprocessing images), followed by three typical supervised machine learning algorithms, namely k-nearest neighbors (KNN) [22], Support Vector Machine (SVM), and Random Forest (RF), as classification. As a matter of fact, RF is also utilized in some situations where unsupervised machine learning was supposed to be applied. Last but not least, the Convolutional Neural Network (CNN) [23] is a class of deep neural networks that is the most ubiquitous method applied in computer vision applications. Therefore, we briefly assimilate various architectures and their other versions, such as AlexNet [24], VGG [25], Inception [26], MobileNet [27], ResNet [28], and DenseNet [29]. Eventually, I propose myriads of new approaches for every single CNN architecture that could accomplish a dramatic test accuracy improvement over the original ones.


1.2 Accomplishments of The Thesis

This project focuses on finding a more effective CNN model that could help doctors make precise decisions whenever diagnosing three ubiquitous brain tumor types (Glioma, Meningioma, and Pituitary), avoiding human factors as much as possible during patient treatment. The experiment is divided into two primary sections to eventually draw a comparison between original models and proposed architectures on the same brain tumor database published on Figshare in April 2017. In section one of the experimentation, I initiate by using LBP [20] and HOG [21] as feature extraction separately; then KNN [22], SVM [30], and RF [31] play a crucial role in the classifying stage. At the end of the day, I could have six distinct test accuracies on a given database provided by six models [32] (LBP-KNN, LBP-SVM, LBP-RF, HOG-KNN, HOG-SVM, and HOG-RF) via hand-crafted processing (1.2).

Figure 1.2: Hand-Crafted Processing
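The six-model grid above (a hand-crafted feature set paired with each classical classifier) can be sketched with scikit-learn. This is a minimal illustration on synthetic feature vectors standing in for the real LBP/HOG features, not the thesis's actual experiment:

```python
# Sketch of the six-model grid: one feature set paired with the KNN, SVM,
# and Random Forest classifiers (synthetic data stands in for the LBP/HOG
# feature vectors extracted from MRI images).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Stand-in for extracted feature vectors with 3 classes, like the tumor types.
X, y = make_classification(n_samples=300, n_features=59, n_informative=20,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = {name: clf.fit(X_train, y_train).score(X_test, y_test)
          for name, clf in classifiers.items()}
print(scores)
```

Running the same loop once per feature extractor (LBP, then HOG) yields the six accuracies compared in the experiments.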

On the other hand, I would conduct experiments on almost all versions of each renowned CNN model (AlexNet [24], VGG [25], Inception [26], MobileNet [27], ResNet [28], and DenseNet [29]) to perceive the original results at first. The proposed approach would then be carried out to achieve the highest possible accuracy. For now, there is only one version of AlexNet, so I would check it out. As for VGG, the popular VGG16 and VGG19 would both be chosen for experiments. Likewise, two architectures are chosen for each of Inception and MobileNet: InceptionV3 and InceptionResNetV2, and MobileNet and MobileNetV2, respectively. DenseNet has one more applied version than the two former architectures: DenseNet121, DenseNet169, and DenseNet201. A myriad of versions of ResNet might be implemented, namely ResNet50, ResNet50V2, ResNet101, ResNet101V2, ResNet152, and ResNet152V2, all through the CNN processing (1.3).

Regarding the brain tumor database (1.1), the experimental database [33] is the benchmark database for studying the brain tumor classification problem. It consists of 3064 T1-weighted contrast-enhanced images presented for three kinds of brain tumor categories (Glioma, Meningioma, and Pituitary). The total size of this database is approximately 82 MB, and it was published by Jun Cheng on Figshare in April 2017.
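A transfer-learning setup of the kind described above can be sketched in Keras: load a pre-trained backbone without its top layers and attach a new 3-class head. The head layout here is illustrative, not the thesis's exact proposed arrangement, and `weights=None` keeps the sketch offline-friendly (the actual experiments would use `weights="imagenet"`):

```python
# Illustrative transfer-learning skeleton: VGG16 backbone + new 3-class head.
# The thesis's exact top-layer arrangement differs; this only shows the wiring.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained convolutional blocks

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(3, activation="softmax"),  # Glioma / Meningioma / Pituitary
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
print(model.output_shape)
```

Swapping `VGG16` for `DenseNet121`, `ResNet50`, and so on reproduces the per-architecture comparison described above.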


Figure 1.3: Convolutional Neural Network Processing

Brain Tumor Category Quantity Width × Height

Total Number Of Images 3064 512 × 512

Table 1.1: Experimental Database In Detail

To sum up, I would have thirty-eight different results in total, including originals and propositions, let alone various results from other papers that experimented on the same database as mine, in order to have a discussion among the final results and hammer out the most effective methods.

• Make a comparison of eventual test accuracies between original methods, proposed architectures, and results achieved from other papers that carried out experiments on the same introduced database for brain tumor discrimination.

• Come up with a new arrangement in the CNN top layers (the fully connected layers) that provides better outcomes, with numerous adjustments of particular parameters in some CNN architectures.


1.4 Formation of The Thesis

The thesis is organized into four chapters:

• Chapter 1: Introduces the reason why the thesis is adopted and a general story about brain tumors in the last few years, followed by typical results with proposed methods from scientific papers all around the globe. For now, I have researched two main techniques of deep learning: hand-crafted/deep feature extractions and transfer learning in CNNs. Each technique is applied in the medical domain, in this case brain tumor classification. Contributions and other research fields are also shown in chapter one, along with the structure of the thesis.

• Chapter 2: Foundational conceptions about image classification, ML, and DL are presented first, in order to comprehend the ubiquitous algorithms that are advanced later, namely hand-crafted feature extractions (LBP, HOG), classifiers (SVM, KNN, and RF), and CNNs (AlexNet, VGG, Inception, MobileNet, ResNet, and DenseNet).

• Chapter 3: In this section, the author submits the set values for each technique and explains the reasons why they are established or resolved. The upshots of the original and proposed methods are described afterward in tables, with explanations attached below.

• Chapter 4: In the last chapter of this thesis, comparisons are drawn between the proposed accuracy and the initial (or papers') results to determine which one is the more powerful method, according to the eventual test accuracy, confusion matrix (to ensure that the test accuracy is genuinely well-performed), resources, and time consumption.


1.5 Researching Fields

• Research in the Python language

• Research in hand-crafted features in computer vision (LBP, HOG) and classifiers (SVM, KNN, and RF)

• Research in the machine learning platform TensorFlow and other necessary frameworks or libraries such as Keras, os, OpenCV, NumPy, Sklearn, Matplotlib, math, and seaborn

• Research in transfer learning in CNN models (AlexNet, VGG, Inception, MobileNet, ResNet, and DenseNet)

• Research in preprocessing input images (label, resize, normalize, split, data augmentation)

• Research in training CNN models (ModelCheckpoint, EarlyStopping, ReduceLROnPlateau, and Optimizer) by CPU and GPU via the CUDA Toolkit

• Research in evaluating CNN models after training (Confusion Matrix)

• Research in saving weights and models after training (JSON)

• Research in Dropout and BatchNormalization to avoid overfitting and enhance performance
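The three training callbacks named in the list above can be configured in Keras roughly as follows; the monitored metrics, factors, and patience values are illustrative choices, not the thesis's exact settings:

```python
# Illustrative configuration of the training callbacks listed above.
from tensorflow.keras.callbacks import (EarlyStopping, ModelCheckpoint,
                                        ReduceLROnPlateau)

callbacks = [
    # Save the best weights seen so far, judged by validation accuracy.
    ModelCheckpoint("best_model.keras", monitor="val_accuracy",
                    save_best_only=True),
    # Halve the learning rate when validation loss stops improving.
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3,
                      min_lr=1e-6),
    # Stop training after 10 stagnant epochs, restoring the best weights.
    EarlyStopping(monitor="val_loss", patience=10,
                  restore_best_weights=True),
]
# The list is then passed to model.fit(..., callbacks=callbacks).
print(len(callbacks))
```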


2.1.2 RGB Image

Standing for "Red Green Blue", RGB refers to three hues of light that can be mixed to generate different colors at the pixels of an image's matrix. An RGB image has the same structure as a grayscale image, but each pixel of the whole matrix has three values in the range [0, 255]. These three values represent the three colors (red, green, blue).

Figure 2.1: (a) Grayscale image, (b) RGB image
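The structural difference can be seen directly from the array shapes: a grayscale image is a 2-D matrix, while an RGB image carries three values per pixel (a small synthetic example, not an actual MRI slice):

```python
import numpy as np

# A 4x4 grayscale image: one intensity value in [0, 255] per pixel.
gray = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)
# The same spatial grid as RGB: three channel values (R, G, B) per pixel.
rgb = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

print(gray.shape)  # (4, 4)
print(rgb.shape)   # (4, 4, 3)
```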


2.2 Machine Learning

2.2.1 Definition

Being developed and seen as a subset of Artificial Intelligence (AI), Machine Learning (ML) is the self-study of computer algorithms based on input databases (training databases), which improve automatically through experience. After the training stage, models built by machine learning algorithms are able to make predictions or decisions on the testing database without being explicitly programmed to do so. Machine learning algorithms are implemented in a wide variety of applications, such as manufacturing, healthcare, media, consumer goods, email filtering, creative arts, energy, financial services, et cetera, and especially computer vision, where it is challenging or infeasible to come up with conventional algorithms to address the needed issues.

Technically, machine learning approaches are conventionally split into three broad groups, according to the techniques by which models are trained:

• Supervised learning: The computer with the chosen training algorithms is presented with a labeled database, which means the computer knows the desirable outputs for the example inputs after training. The goal for the models is to learn broadly general rules so that they can make the most plausible decisions possible at the end of the day.

• Unsupervised learning: On the other hand, no labels are provided to the prepared models, in contrast to supervised learning. The trained computer has no choice but to broadly perceive the overall structure of its given inputs on its own.

• Reinforcement learning: A computer program interacts with a dynamic environment in which it has to perform certain goals, for instance, driving a vehicle or playing a game against an opponent, such as a game of chess. As events occur, it gradually learns and strives to do its best to accomplish the given goals until the end.


2.3 Hand-Crafted Feature Extractions

2.3.1 Local Binary Pattern

2.3.1.1 Definition

Being seen as a particular case of the Texture Spectrum model proposed in 1990, and first described publicly in the year 1994, LBP [20] is a type of visual descriptor applied for classification in computer vision. It has been found to be a powerful feature for texture classification ever since. Technically, at each pixel of an image, the LBP algorithm uses the center pixel's value to make a comparison with the others around a radius of R, and then rewrites every single value of the P pixels around. If the center pixel's value is greater than the neighbor's value, the original surrounding pixel value is replaced by "0"; otherwise, "1" is written. The value of an LBP pixel is calculated by (2.1):

LBP_{P,R} = Σ_{i=0}^{P−1} s(g_i − g_c) · 2^i    (2.1)

Specifically, g_c is the center value among the nine given pixels, {g_i}_{i=0}^{P−1} are the values of the P surrounding pixels, and the threshold function s(x) is defined by (2.2):

s(x) = 1 if x ≥ 0, and s(x) = 0 otherwise    (2.2)

Advantages:

• Used to be one of the simplest and most effective feature-extracting algorithms

• LBP is almost immune to the influence of brightness

Disadvantages:

• Data augmentation (e.g., rotation) in LBP would lead to awful performance

• The complexity of the calculation increases exponentially with the number of neighboring pixels

• A myriad of image information is lost on account of the threshold stage


An illustration (2.2) of the LBP_{8,1} value extracting operation: take each surrounding pixel's value and subtract the center value one by one, threshold the results, multiply the thresholded values by the respective weights (2^0, 2^1, 2^2, 2^3, 2^4, 2^5, 2^6, 2^7), and sum the products to obtain the LBP value (199 in the illustrated example).

Figure 2.2: Extracting process of the LBP_{8,1} value
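The per-pixel operation of (2.1) can be sketched for a single 3×3 neighborhood (P = 8, R = 1); the sample patch below is made up for illustration and is not the one from Figure 2.2:

```python
import numpy as np

def lbp_code(patch):
    """LBP_{8,1} code of the center pixel of a 3x3 patch (equation 2.1)."""
    center = patch[1, 1]
    # The 8 neighbors read clockwise starting from the top-left corner.
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                 patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    # s(g_i - g_c): 1 where the neighbor is >= the center, else 0,
    # each weighted by 2^i and summed.
    return sum(int(g >= center) << i for i, g in enumerate(neighbors))

patch = np.array([[5, 9, 1],
                  [4, 6, 7],
                  [2, 3, 8]])
print(lbp_code(patch))  # neighbors 9, 7, 8 fire -> 2 + 8 + 16 = 26
```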

2.3.2 Histogram of Oriented Gradients

2.3.2.1 Definition

As mentioned earlier, LBP used to be one of the simplest feature descriptors; HOG is no exception. Basically, a feature descriptor is a representation of an algorithm that simplifies one image or a myriad of images by extracting the needed information and tossing out the extraneous parts.

Typically, an image of size width×height×3 (RGB channels) is converted by the feature descriptor, in this case HOG, to a vector of length n. Therefore, if an image with a size of 64×128×3 is put in, the output feature vector would be of length 3780. To obtain the eventual feature vector, five calculation steps must be carried out respectively:


• Step 1: Pre-processing.

Given is a pedestrian image under processing: detecting the needed object, cropping it out to a suitable width × height, and finally resizing for the convenience of calculation before conducting the next steps (2.3). For instance, the athlete below is cropped out of the original image into a rational size (100×200) first, and is then resized to 64×128. The only constraint here is that the patches being analyzed have to keep a fixed aspect ratio like 100×200, 128×256, or even 1000×2000, but not 101×604. The patch size for calculating the HOG feature descriptor can otherwise be selected freely.


• Step 2: Analyzing the input picture into blocks of cells.

In this step, a 64×128 image is analyzed into loads of 8×8 cells (2.4), and a histogram of gradients is computed for each one. The reason why the image must be divided into blocks is to be more robust to noise, or, it could be said, much less sensitive to noise. However, a step is needed before starting the calculation, which is the block-determining stage (2.4).

Figure 2.4: (a) An analyzed image in 8×8 cells, (b) Blocks of cells have been set

To calculate the characteristic vector for each cell, the image must be divided into blocks comparable to the number of cells. To determine how many blocks an image is to be analyzed into, the following equation (2.3) is useful:

n_block-image = ((W_image − W_block × W_cell) / W_cell + 1) × ((H_image − H_block × H_cell) / H_cell + 1)    (2.3)

• n_block-image is the number of blocks within the analyzed image

• W_image and H_image are the width and the height of the input image

• W_block and H_block are the width and the height (in cells) of a block that is set

• W_cell and H_cell are the width and the height of a cell that is set as well
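Equation (2.3) can be checked numerically. For the 64×128 example with 8×8 cells and 2×2-cell blocks (a common HOG configuration, assumed here), it also reproduces the feature length of 3780 quoted earlier:

```python
def n_blocks(w_image, h_image, w_block=2, h_block=2, w_cell=8, h_cell=8):
    """Number of cell-strided blocks in the image, per equation (2.3)."""
    nx = (w_image - w_block * w_cell) // w_cell + 1
    ny = (h_image - h_block * h_cell) // h_cell + 1
    return nx * ny

blocks = n_blocks(64, 128)          # 7 * 15 = 105 blocks
feature_len = blocks * 2 * 2 * 9    # 9-bin histogram per cell, 4 cells/block
print(blocks, feature_len)          # 105 3780
```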


• Step 3: Calculating the Gradient Images.

First of all, the horizontal and vertical gradients are calculated with the two kernels (2.4, 2.5) below, corresponding to the derivative operators in the two dimensions Ox and Oy:

D_x = [−1 0 1]    (2.4)

D_y = [−1 0 1]^T    (2.5)

Specifically, T is the matrix transposition operator. The picture below (2.5) makes this easier to perceive:

Figure 2.5: Kernels for calculating gradient

If the input image is denoted I, there are two separate derivatives corresponding to the two directions, according to the equations (2.6, 2.7):

I_x = I ∗ D_x    (2.6)

I_y = I ∗ D_y    (2.7)

At this point, the two intensity components of the gradient can be calculated, which are the Gradient Magnitude and the Gradient Direction (2.8, 2.9):

|G| = sqrt(I_x² + I_y²)    (2.8)

Θ = arctan(I_y / I_x)    (2.9)


For example, below (2.6) is a given pixel of an image calculated through the gradient-calculating stage:

Figure 2.6: A pixel's value after the preprocessing step

By implementing the former equations for gradient calculation, the final results are obtained (2.10, 2.11, 2.12, 2.13):
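The gradient computation of Step 3 can be sketched with NumPy using the [−1, 0, 1] kernels; a synthetic horizontal ramp is used as input here (a real implementation would run on the preprocessed 64×128 patch):

```python
import numpy as np

def gradients(img):
    """Per-pixel gradients with the [-1, 0, 1] kernels (equations 2.4-2.9)."""
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # horizontal derivative I_x
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # vertical derivative I_y
    magnitude = np.sqrt(gx ** 2 + gy ** 2)   # |G|, equation (2.8)
    direction = np.degrees(np.arctan2(gy, gx)) % 180  # unsigned angle (2.9)
    return gx, gy, magnitude, direction

# A ramp that brightens left to right: constant horizontal gradient.
ramp = np.tile(3.0 * np.arange(6), (4, 1))
gx, gy, mag, ang = gradients(ramp)
print(mag[1, 2], ang[1, 2])  # 6.0 0.0
```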


• Step 4: Calculating the Histogram of Gradients in 8×8 cells.

After confirming the number of blocks of an image, two tasks need to be done to calculate the feature vector values:

• Dividing the directional space into p bins (the number of characteristic vector dimensions of the cell)

• Discretizing the inclination angle at each pixel into the bins

Assume that the angle of inclination of an under-processing pixel at the (x, y) coordinate is α(x, y); unsigned HOG uses p = 9 (2.14) and signed HOG uses p = 18 (2.15). The dimension of a block's feature vector is then:

size_block = n ∗ size_cell    (2.16)

• n is the quantity of cells in a block

• size_block is the dimension of the feature vector of a block (size_cell = 9 if using "unsigned HOG", or size_cell = 18 otherwise for "signed HOG")

The histogram is essentially a vector of 9 bins (numbers) comparable to the angles 0, 20, 40, 60, 80, 100, 120, 140, 160. In pixels, the histogram of gradient intensity is built by voting. To be more specific, the weight of a pixel's vote depends on GD and GM (2.7), which are computed in Step 3. Take a look at this 8×8 patch and perceive how the gradients work to finally put the gradient vector values into the HOGs (2.8).


Figure 2.7: Gradient Direction and Gradient Magnitude of a pixel play a vital role in determining the gradient vector


The contributions of all pixels in an 8×8 cell are added up together to generate the 9-bin histogram (2.9).

Figure 2.9: The outcome 9-bin histogram
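The voting described above can be sketched as follows; each pixel's magnitude is split between the two nearest bin centers in proportion to the angle. The proportional-split rule is the common HOG formulation, assumed here since the thesis's exact equations (2.14, 2.15) are not reproduced:

```python
import numpy as np

def cell_histogram(magnitude, angle, nbins=9):
    """9-bin unsigned-HOG histogram of one cell, built by proportional voting."""
    bin_width = 180.0 / nbins  # 20 degrees per bin: centers 0, 20, ..., 160
    hist = np.zeros(nbins)
    for m, a in zip(np.ravel(magnitude), np.ravel(angle)):
        pos = (a % 180.0) / bin_width
        lo = int(np.floor(pos)) % nbins   # nearest lower bin
        hi = (lo + 1) % nbins             # nearest upper bin (wraps past 160)
        frac = pos - np.floor(pos)
        hist[lo] += m * (1.0 - frac)      # vote split by angular distance
        hist[hi] += m * frac
    return hist

# Two pixels: magnitude 4 at 10 deg (split 2/2 between bins 0 and 1),
# magnitude 2 at 20 deg (entirely in bin 1).
hist = cell_histogram(np.array([4.0, 2.0]), np.array([10.0, 20.0]))
print(hist[:3])  # [2. 4. 0.]
```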

• Step 5: 16×16 Block Normalization

In the previous stage, a histogram based on the gradients of the image was created. Unfortunately, the gradients of an image are sensitive to overall lighting. Therefore, "normalizing" the histograms so that they are not affected by lighting variations is a vital step, done as follows:
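The 16×16 block normalization can be sketched as concatenating the four 9-bin cell histograms of a 2×2-cell block into a 36-dimensional vector and L2-normalizing it. L2 normalization with a small epsilon is the common choice, assumed here since the thesis's exact normalization formula is not reproduced:

```python
import numpy as np

def normalize_block(cell_hists, eps=1e-6):
    """L2-normalize the concatenated cell histograms of one 16x16 block."""
    v = np.concatenate(cell_hists)  # 4 cells x 9 bins = 36 values
    return v / np.sqrt(np.sum(v ** 2) + eps ** 2)

# Four identical 9-bin cell histograms; after normalization the block
# vector has (near) unit L2 norm regardless of the overall lighting scale.
block = normalize_block([np.full(9, 8.0)] * 4)
scaled = normalize_block([np.full(9, 16.0)] * 4)  # doubled "lighting"
print(block.shape, round(float(np.linalg.norm(block)), 3))
```

Scaling every gradient by a constant leaves the normalized block vector unchanged, which is exactly the lighting invariance this step provides.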
