


VŨ DUY THANH

MACHINE LEARNING TOOLS FOR DIAGNOSIS OF ALZHEIMER’S DISEASE USING WHOLE GENOME SEQUENCING DATA AND MRI AND PET IMAGES

MASTER'S THESIS IN ELECTRONICS, ELECTRICAL ENERGY, AND AUTOMATION

SPECIALIZATION: DATA AND COMMUNICATION ENGINEERING

SCIENTIFIC SUPERVISORS

Prof. Dr. Oliver Y. Chén, Assoc. Prof. Dr. Nguyễn Linh Trung, Prof. Dr. Laurent Le Brusquet

LAUSANNE, 2023


Master Thesis

MACHINE LEARNING TOOLS FOR DIAGNOSIS OF ALZHEIMER'S DISEASE USING WHOLE GENOME SEQUENCING DATA AND MRI AND PET IMAGES

Master thesis of Paris-Saclay University and VNU University of Engineering and Technology

Specialization: M2 Data and Communication Engineering
Research unit: Centre hospitalier universitaire vaudois

Thesis presented at Lausanne, on 21 December 2023

Duy Thanh VU

Committee

Arnaud BOURNEL, Paris-Saclay University (Chairman)

Arthur TENENHAUS, CNRS, CentraleSupelec, Paris-Saclay University (Rapporteur)

Oliver Y. Chén, CHUV and Université de Lausanne (Supervisor)

NGUYEN Linh Trung, VNU University of Engineering and Technology (Co-supervisor)

Laurent LE BRUSQUET, CNRS, CentraleSupelec, Paris-Saclay University (Co-supervisor)

Pierre DUHAMEL, CNRS, CentraleSupelec, Paris-Saclay University (Examiner)

Thesis Supervision

Oliver Y. Chén, CHUV and Université de Lausanne (Supervisor)

NGUYEN Linh Trung, VNU University of Engineering and Technology (Co-supervisor)

Laurent LE BRUSQUET, CNRS, CentraleSupelec, Paris-Saclay University (Co-supervisor)


This internship thesis is my four-month full-time work at Oliver Chen's lab at the Centre Laboratoire d'Epalinges (CLE), CHUV/UNIL, under the supervision of Prof. Oliver Chen at the University Hospital of Lausanne (CHUV) and the University of Lausanne (UNIL), Prof. Nguyen Linh Trung at VNU University of Engineering and Technology, and Prof. Laurent Le Brusquet at Paris-Saclay University. During this internship, I had the chance to work in a new environment, meet new people, and encounter many things that were new to me. Fortunately, I worked in a great environment with very nice people. Everyone treated me well, so I could focus, feel comfortable accomplishing enjoyable and fulfilling tasks, and develop new things for my master's thesis. Despite the limitations of this thesis, I am happy and proud of what I have accomplished so far. I hope to continue pursuing academic research in the long run.

I would like to express my first great appreciation to my supervisors for their incredible mentorship and support. Professor Oliver Chen provided the funding for this internship thesis. I had the chance to work with him after my bachelor's degree. His kindness, encouragement, and guidance enabled me to discover my potential and improve my research skills significantly. With his help, I am now a much better version of myself than I was two years ago. Prof. Linh Trung has provided me with guidance and support from the first day I started doing research, and he was also my supervisor during my bachelor's degree. He inspired me to explore tensor methods for the study of Alzheimer's disease (AD) and carefully helped correct my errors. Prof. Laurent specializes in tensor factorization, joint analysis of heterogeneous data, and multiway data analysis. I hope that after this thesis, we can still work together. All my supervisors gave me a lot of freedom to complete my thesis and provided useful comments and edits. I could not have done it without them.

I am thankful to have met, worked with, and learned from wonderful colleagues and friends. I would like to thank Dr. Nguyen Viet Dung and Dr. Le Trung Thanh for guiding me in tensor methods and tensor decomposition since I first started my undergraduate research. I want to thank Mr. Pham Minh Tuan for our discussions on understanding AD, technical preprocessing, and brain positron emission tomography (PET). Special thanks to my labmates, Christelle and Julien, for helping me when I arrived in Switzerland. I have learned a lot from many discussions on genetics and biology with Christelle, and on statistics with Julien. They are all very nice people with many interesting stories that helped me learn about and understand more people and life in Switzerland. I also want to thank my friends, colleagues, and the people at AVITECH and VNU-UET in Vietnam who helped me in my master's program.


This research has been done under the research project QG.22.62, "Multi-dimensional data analysis and application to Alzheimer's disease diagnosis", of Vietnam National University, Hanoi. In addition, I would like to thank the Vingroup Innovation Foundation (VINIF) for their support of my master's program, under code VINIF.2021.ThS.15.

I am grateful to my mom and my two sisters for their love and support. Lastly, this thesis is a gift I specifically want to dedicate to my father in loving memory. I wish I could have completed this master's degree a year earlier so that he could have read it and felt proud of me.


I solemnly declare that my thesis, titled 'MACHINE LEARNING TOOLS FOR DIAGNOSIS OF ALZHEIMER'S DISEASE USING WHOLE GENOME SEQUENCING DATA AND MRI AND PET IMAGES', is my own research work conducted under the guidance of Prof. Oliver Y. Chén, Prof. NGUYEN Linh Trung, and Prof. Laurent LE BRUSQUET. The sources used in the thesis are explicitly mentioned in the reference section, with proper citations. The data and results presented in the thesis are entirely truthful, and there is no copying from the works of others. If any discrepancies are found, I take full responsibility and am subject to any disciplinary actions imposed by the university.

Lausanne, November 2023

Student

Vu Duy Thanh


The early, timely, and accurate diagnosis of Alzheimer's disease (AD), particularly its earlier sign, mild cognitive impairment (MCI), plays an important role in detecting, managing, and potentially treating the disease for both patients and clinicians. Recent studies have shown that neuroimaging and genetic data provide complementary information for the diagnosis and prognosis of AD. Using fusion, one can integrate these multimodal, multivariate, and potentially high-dimensional biomarkers to improve disease assessment. State-of-the-art fusion approaches typically involve linearly combining kernels or similarity matrices from various modalities. These strategies, however, often neglect the interactions among modalities and the fact that the relationship among multimodal data may not be linear. In addition, existing combination methods omit neighborhood relationships.

To address these issues, in this thesis, we present a machine-learning framework specifically designed for data fusion. The framework leverages the strengths of both tensor methods and deep learning to manage multivariate and multimodal data. More concretely, we design two new methods to exploit the inherent complementarity in such datasets by first learning and then fusing kernels within and among modalities. Technically, the first method, deep kernel learning (DKL), involves automatically combining multiple kernels through deep learning to uncover the intrinsic complex relationships among modalities; it is a generalized version of several multi-kernel learning methods. The second method, tensor kernel learning (TKL), involves multi-kernel learning using non-negative CANDECOMP/PARAFAC (CP) decomposition. They serve as augmented kernels to facilitate the learning of optimal kernels and to provide a better explanation. These two methods complement each other in practice.

To evaluate the efficiency of DKL and TKL, we use data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), encompassing 331 subjects: 121 cognitively normal individuals, 100 subjects with mild cognitive impairment (MCI), and 110 AD patients. Overall, both DKL and TKL demonstrate improvements in AD assessment over single-modality approaches. When employed together, DKL and TKL provide new insights into data fusion and multi-kernel learning. Finally, further simulation studies and data analysis show the promise of the proposed methods in learning intricate inter- and intra-modality relationships suitable for both supervised and unsupervised settings.


Contents

1 Introduction
1.1 Literature review
1.1.1 Feature fusion
1.1.1.1 Concatenation
1.1.1.2 Summation
1.1.2 Kernel fusion
1.1.3 Tensor decomposition
1.2 Contribution
1.3 Thesis outline

2 Background
2.1 Biomarkers for Alzheimer's Disease
2.1.1 Genetics
2.1.2 Magnetic Resonance Imaging (MRI)
2.1.3 Positron emission tomography (PET)
2.1.4 Cerebrospinal fluid (CSF) biomarkers
2.2 Matrix decomposition and Tensor decomposition
2.2.1 Matrix decomposition
2.2.1.1 Singular value decomposition (SVD)
2.2.1.2 Non-negative matrix factorization
2.2.2 Tensor decomposition
2.2.3 Tensor notation and tensor operators
2.2.4 Different types of tensor decomposition
2.3 Kernel methods in machine learning
2.4 Remarks


3 A Framework for Kernel Combination/Fusion for Alzheimer's Disease Assessment
3.1 Introduction
3.2 Limitations of current fusion approaches
3.2.1 Problem formulation
3.2.2 Related work
3.2.3 Limitations
3.3 Overview of the proposed framework for kernel learning and kernel combination
3.3.1 Feature representation
3.3.2 Kernel construction/representation
3.3.3 Kernel learning/combination
3.3.3.1 Deep kernel learning (DKL)
3.3.3.2 Tensor kernel learning (TKL)
3.3.4 Feature learning through kernel by manifold learning
3.3.5 Classification
3.4 Remarks

4 Experiments
4.1 Dataset and preprocessing
4.1.1 ADNI dataset
4.1.2 Data preprocessing
4.2 Single modality
4.3 Kernel combination using the DKL
4.3.1 Influence of different types of loss
4.3.2 Influence of number of layers and training iterations
4.4 Kernel combination using the TKL
4.4.1 Advantage of non-negative CP decomposition
4.4.2 Influence of rank on non-negative CP decomposition
4.5 Discussion

5 Conclusion and Future Works
5.1 Conclusion
5.2 Future work

A DKL experiments
A.1 CN vs AD
A.2 CN vs MCI
A.3 MCI vs AD

B.1 CN vs AD
B.2 CN vs MCI
B.3 MCI vs AD


List of Figures

1.1 Thesis structure

2.1 Structure of Chapter 2
2.2 Relationship between cell, chromosomes, genes, and DNA
2.3 Alleles and SNP encoding
2.4 MRI principle
2.5 Progression of Alzheimer's disease in grey matter atrophy
2.6 Z-score reduction of FDG-PET in a normal cognition subject who transitions to AD and another subject who transitions from MCI to AD
2.7 The top region shows decreased uptake values in FDG-PET images when comparing AD and high-risk populations to normal controls
2.8 The top region shows increased uptake values in PIB-PET images when comparing AD and high-risk populations to normal controls
2.9 Example of a three-order tensor
2.10 Illustration for different types of three-way tensor decomposition

3.1 A Data Fusion Framework
3.2 Proposed DKL model for kernel combination

4.1 Cerebrospinal fluid (CSF) measures in this study
4.2 T1-weighted MRI and FDG-PET images of AD patients and healthy control patients
4.3 Processed T1-weighted MRI and FDG-PET images of AD patients and healthy control patients
4.4 Comparing results between three groups based on a single modality in (a) Accuracy and (b) AUC
4.5 Illustration of the kernel of each modality in both supervised and unsupervised settings
4.6 Classification performance deteriorates when utilizing only 10 features to represent each subject, resulting in degraded results for MRI and PET compared to using the original features
4.7 Comparing the results of kernel combination using various loss functions in the proposed deep learning model for three binary group classifications
4.8 Classification results at a train-testing split and at iteration 100 for five deep learning networks trained on five types of loss
4.9 Classification results at a train-testing split and at iteration 240 for five deep learning networks trained on five types of loss
4.10 Classification results at a train-testing split and at iteration 320 for five deep learning networks trained on five types of loss
4.11 Classification results at a train-testing split and at iteration 520 for five deep learning networks trained on five types of loss
4.12 Influence of the number of convolutional layers in the DKL on classification accuracy of three binary groups
4.13 CANDECOMP/PARAFAC (CP) decomposition breaks down the kernels of multiple modalities
4.14 The impact of the number of tensor ranks in nonnegative CPD on the reconstruction rate
4.15 The impact of rank on classification accuracy

5.1 Kernel combination for groups

A.1 Cortes's loss
A.2 Cris's loss
A.3 FSM loss
A.4 He's loss
A.5 MSE loss
A.6 SRO loss
A.7 Iteration 20
A.8 Cortes's loss
A.9 Cris's loss
A.10 FSM loss
A.11 He's loss
A.12 MSE loss
A.13 SRO loss
A.14 Iteration 500
A.15 Cortes's loss
A.16 Cris's loss
A.17 FSM loss
A.18 He's loss
A.19 MSE loss
A.20 SRO loss
A.21 Iteration 20
A.22 Cortes's loss
A.23 Cris's loss
A.24 FSM loss
A.25 He's loss
A.26 MSE loss
A.27 SRO loss
A.28 Iteration 500
A.29 Cortes's loss
A.30 Cris's loss
A.31 FSM loss
A.32 He's loss
A.33 MSE loss
A.34 SRO loss
A.35 Iteration 20
A.36 Cortes's loss
A.37 Cris's loss
A.38 FSM loss
A.39 He's loss
A.40 MSE loss
A.41 SRO loss
A.42 Iteration 500

B.1 Iter20
B.2 Iter500
B.3 Iter20
B.4 Iter500
B.5 Iter20
B.6 Iter500


List of Tables

2.1 Associations between MRI and CSF, PET
4.1 Number of subjects used in the thesis


Chapter 1

Introduction

Dementia is a broad term used to describe a group of symptoms affecting cognitive functions such as memory, reasoning, and communication. It usually appears in people older than 65. According to the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) [1], dementia is characterized by several key criteria: (i) significant cognitive decline in one or more domains, as reported by the individual, a knowledgeable informant, or a clinician; (ii) a decline in neurocognitive performance, typically evidenced by test scores falling two or more standard deviations below appropriate norms on formal testing or equivalent evaluation; and (iii) cognitive deficits that are substantial enough to interfere with daily functioning.

There are several types of dementia. Among them, Alzheimer's disease (AD) is perhaps the most common form. Currently, there is no definitive treatment available for AD. As the global population continues to age and the number of clinical trial disappointments rises, the prevalence of AD is increasing. The estimated number of people living with AD worldwide was around 50 million in 2020, and it is projected to reach 152 million by 2050 [2]. As such, AD presents significant health, medical, and societal challenges.

The study of AD faces several challenges. First, the underlying cause of AD is not fully understood, despite growing knowledge about "tau" and "amyloid", the two main biomarkers for AD. Second, AD is complex: it affects multiple functional domains, and the disease progresses longitudinally [3]. Third, it affects individuals heterogeneously and can vary significantly among subjects due to differences in scanning methods employed at multiple sites and centers. To address these challenges, comprehensive research is underway to explore new biomarkers, such as plasma, EEG, audio, and video, for distinguishing Normal Control (NC) individuals (also known as Cognitive Normal (CN) or Healthy Control (HC)) from individuals with mild cognitive impairment (MCI) and AD. Some initiatives are focusing on the fusion of these modalities to identify effective markers for diagnostic purposes. The integration of modalities is a crucial step in enhancing the diagnosis of AD, and numerous studies have demonstrated the potential and effectiveness of fusion techniques.


1.1.1.1 Concatenation

Concatenation is a simple yet common technique for low-level feature combination. The resulting combined feature vector includes all properties of the combined features. The authors in [4] combined the scale-invariant feature transform (SIFT) features of magnetic resonance imaging (MRI) and positron emission tomography (PET) images with canonical correlation analysis (CCA) of PET and CCA of MRI. The authors in [5] showed that the concatenation of different anatomical MRI measures (cortical thickness, cortical area, cortical curvature, grey matter density, and subcortical volumes) contains complementary information that improves the final classification results. However, concatenation has some disadvantages. First, concatenation can lead to a significant increase in feature dimensionality, especially if some modalities have many features. This can result in increased computational complexity, higher memory usage, and the risk of overfitting. Second, if the modalities have features on vastly different scales, concatenation can lead to issues where certain features dominate the fusion process, potentially overshadowing the importance of other modalities. Third, while concatenation can capture some interactions among modalities, it may not capture complex nonlinear relationships or dependencies that require non-linear modeling. Fourth, in cases where one modality has many missing values, the concatenated feature vector may result in a sparse data representation, which can be problematic for some algorithms.

To address the first challenge, some studies employ feature selection techniques. For instance, [6] and [7] used canonical feature selection based on CCA. The work [8] applied multi-task feature selection, while [9] implemented longitudinal and multi-modal feature selection. For the second challenge, [5] adopted a weighted concatenation approach, with weights learned through cross-validation. To address the third challenge, [10] proposed an approach involving the concatenation of features from multiple levels using deep learning, which allowed for the capture of more complex relationships, resulting in non-linear fusion features. Regarding the last challenge, it can only be addressed in the training set, by designing the training with exhaustive training data or by data imputation (which can be challenging and less reliable, especially for high-dimensional data like MRI or PET images).
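To make the scale issue concrete, here is a minimal sketch (with made-up array shapes and random data, not the thesis's actual pipeline) of feature-level concatenation in which each modality is standardized first, so that no modality dominates by numeric range alone:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical per-modality feature matrices (subjects x features).
X_mri = rng.normal(size=(331, 90))                         # e.g., regional MRI volumes
X_pet = rng.normal(size=(331, 90))                         # e.g., regional FDG-PET uptake
X_snp = rng.integers(0, 3, size=(331, 500)).astype(float)  # additive SNP codes

# Standardize each modality separately, then concatenate.
blocks = [StandardScaler().fit_transform(X) for X in (X_mri, X_pet, X_snp)]
X_concat = np.concatenate(blocks, axis=1)

print(X_concat.shape)  # (331, 680): dimensionality grows with every modality
```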


1.1.1.2 Summation

Another popular method to aggregate information from multiple sources is feature summation. This process involves combining features without increasing the dimensionality of the final result. Several challenges arise. First, the features of the modalities must have the same size and be located in the same space. Second, summation can be sensitive to outliers or extreme values. Third, summation does not capture interactions among sources; it merely combines their values.

To address these challenges, various approaches have been developed. For instance, the work [11] employed a registration process to combine low-level grey matter (GM) images with fluorodeoxyglucose (FDG)-PET images, creating a novel fused modality referred to as "GM-PET". These fused images preserve both the structural contour and metabolic characteristics of the subject's brain tissue. Additionally, the author in [12] used an autoencoder to represent the latent features of each modality, functional MRI (fMRI) or single nucleotide polymorphisms (SNP), and then applied the summation method to combine encoding vectors from different modalities. This approach, in conjunction with joint training data, is designed to facilitate the capture of relationships among data sources and can handle missing data in the training set. However, due to variations in size, dimension, space, and scale among features of different modalities, it may not be accurate to represent them by simple weighted concatenation or to force them into the same feature space for summation. Moreover, due to heterogeneity among subjects and modalities, the combination of features at the individual level may differ. Altogether, each feature should reflect not only the characteristics of the individual subject but also those of other subjects in different modalities.

1.1.2 Kernel fusion

In machine learning, a kernel is a similarity function that computes the similarity or inner product between two data points. Because the kernel function can be linear or nonlinear, it allows algorithms to implicitly operate in a high-dimensional feature space, via outputs of the similarity function, without explicitly computing the transformation to the feature space. Kernels are used in a variety of methods such as kernel PCA, kernel LDA, and kernel CCA. There are two types of kernel combinations. The first is to find the parameters for constructing kernels, where each kernel is represented by one set of parameters. The second is to combine kernels from multiple sources/modalities, where each kernel represents a source/modality. The kernels can be combined in an unsupervised way with parameters, or in a supervised way by combining them with other learning algorithms such as support vector machines (SVM), Support Vector Regression (SVR), and Gaussian processes [13].
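As a minimal sketch of the second type of combination (illustrative data and fixed equal weights; in practice the weights would be tuned, e.g., by cross-validation), one can build one kernel matrix per modality and combine them linearly before feeding the result to a kernel method:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_mri = rng.normal(size=(100, 90))   # illustrative modality 1
X_pet = rng.normal(size=(100, 90))   # illustrative modality 2
y = rng.integers(0, 2, size=100)     # binary labels

# One similarity (kernel) matrix per modality.
K_mri = rbf_kernel(X_mri, gamma=1.0 / X_mri.shape[1])
K_pet = rbf_kernel(X_pet, gamma=1.0 / X_pet.shape[1])

# Classical fusion: a fixed convex combination of the modality kernels.
eta = (0.5, 0.5)
K = eta[0] * K_mri + eta[1] * K_pet

# The combined kernel then drives any kernel method, e.g., an SVM.
clf = SVC(kernel="precomputed").fit(K, y)
print(clf.score(K, y))
```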

1.1.3 Tensor decomposition

Tensors can be considered as multidimensional arrays. As a tensor can be seen as a multiway array, it provides a natural representation of multidimensional data. Tensor methods are useful in various domains, from neuroscience [14, 15, 16, 17, 18] and genetics [19, 20] to deep learning [21, 22, 23]. Tensor decomposition, which factorizes a tensor into a set of basis components (e.g., vectors, matrices, or simpler tensors), has become a popular tool for multivariate and high-dimensional data analysis. Tensor methods like tensor decomposition reveal the relation between the latent variables and the observations and give a good explanation [24], and they can represent unimodal, bimodal, and trimodal interactions between behaviors [21]. Tensor decompositions are useful in data fusion [25] to combine multiple tensors into a single tensor. One can apply tensor decomposition for a variety of reasons, such as to improve the performance of a model, to reduce the number of parameters in a model, or to make it easier to interpret the results of a model.

Figure 1.1: Thesis structure.

1.2 Contribution

By exploring the advances in tensor methods and deep learning models, the main contributions of this thesis are:

1. Developing a new data fusion framework to study multi-modal data.

2. Proposing two kernel learning methods: deep kernel learning (DKL) and tensor kernel learning (TKL).

3. Comparing the performance of DKL and TKL with state-of-the-art kernel methods.

4. Applying DKL and TKL to classify and assess Alzheimer's disease (AD), incorporating neuroimaging, genetic, and biological (cerebrospinal fluid, CSF) data.


1.3 Thesis outline

Figure 1.1 presents the main chapters and sections of this thesis. To begin, Chapter 2 explores biomarkers, machine learning, tensor decomposition, and deep learning. Chapter 3 presents the proposed framework and the two proposed kernel methods, DKL and TKL. Chapter 4 uses the ADNI dataset to evaluate the proposed framework and kernel methods and to compare them with state-of-the-art kernel methods. Finally, Chapter 5 concludes the thesis, with a discussion of limitations and potential future directions.


Chapter 2

Background

This chapter presents an overview of the biomarkers for AD and the technical background of tensor decomposition and deep learning. In the first part, we discuss different biomarkers for AD, including brain imaging, chemical measures, and genetic (SNP) data. In the second part, we introduce matrix decomposition, tensors, and tensor decomposition.

2.1 Biomarkers for Alzheimer’s Disease

Alzheimer's Disease (AD) is a neurodegenerative disease with a complex pathobiology and no curative treatment [26]. One important direction in AD research is to find biomarkers that help the diagnosis of AD and its early signs, using brain imaging, cerebrospinal fluid (CSF), and genetic data. In the following sections, we begin with a brief discussion of the fundamentals of brain imaging, CSF, and genetic data.

2.1.1 Genetics

Cells are the basic structural and functional units of all living organisms. A human body is composed of close to 50 to 100 trillion cells, and there are almost 200 different types of cells in an adult human body [27]. Cells are the smallest entities that can independently carry out the processes necessary for life, such as metabolism, growth, and reproduction. Cells are organized into tissues, and tissues into organs. In humans, the nucleus of a cell contains the chromosomes, which carry the genetic code that guides the development, functioning, and inheritance of traits (Figure 2.2a). Humans have 46 chromosomes organized as 23 pairs in most of their cells. Each chromosome consists of a single, long deoxyribonucleic acid (DNA) molecule wrapped around histone proteins, which help condense and protect the DNA (Figure 2.2b).

A gene is a segment of DNA that contains information for the synthesis of a specific protein or the regulation of a particular biological process.


Figure 2.1: Structure of Chapter 2.

Figure 2.2: Relationship between cells, chromosomes, genes, and DNA. (a) Cells are the fundamental building blocks of the human body; chromosomes are within the nucleus of human cells, containing the genetic information necessary for proper functioning and development. (b) Humans have a total of 46 chromosomes organized into 23 pairs, including 22 pairs of autosomes and 1 pair of sex chromosomes. (c) Each chromosome consists of a single, long DNA molecule wrapped around histone proteins. (d) DNA structure and nucleotide types.


Figure 2.3: Alleles and SNP encoding. (a) An allele is one of the various possible variants of a gene that exists at a specific location (locus) on a chromosome; a Single Nucleotide Polymorphism (SNP) is the smallest unit of genetic variation within an allele. (b) Each SNP is characterized by the frequency of the minor variant (the variant less common in the entire population).

Genes vary in size, and the human body comprises a vast number of them (Figure 2.2c). The structure of DNA is a double-stranded helix. DNA is composed of repeating units called nucleotides. Each nucleotide consists of a sugar-phosphate backbone and a nitrogenous base. There are four nitrogenous bases: adenine (A), thymine (T), cytosine (C), and guanine (G). Adenine (A) always pairs with thymine (T), and cytosine (C) pairs with guanine (G). One strand runs 5' to 3', and the other runs 3' to 5'. This orientation is crucial for DNA replication and the synthesis of new DNA strands (Figure 2.2d).

A gene variant, known as an allele, represents a specific version or form of a gene, distinguished by differences in the DNA sequence. Within a population, it is common for multiple alleles to exist for a single gene, and these genetic variations can lead to different traits and characteristics. Changes in alleles correspond to changes in traits, reflecting how gene variations influence an individual's characteristics. Alleles can exhibit distinctions in their DNA sequences at specific locations within a gene, often referred to as loci. These differences can encompass single nucleotide changes, known as single nucleotide polymorphisms (SNPs), as well as insertions, deletions, or rearrangements of genetic material. An SNP represents the smallest unit of genetic variation within an allele.


One allele of a gene may exhibit dominance (denoted by an uppercase letter, e.g., "A"), which means its effects are observable even when it coexists with a recessive allele (denoted by a lowercase letter, e.g., "a"). The recessive allele becomes evident in an individual's phenotype only when two copies of it are present. An individual's genotype is a combination of alleles inherited from each of the two chromosomes. In a homozygous individual, two identical alleles occupy a given locus, while in a heterozygous individual, two different alleles are present (Figure 2.3a). Additionally, minor variants, typically representing the least common allele within the entire population, are used in genetic analysis for SNP encoding corresponding to the number of minor alleles an individual carries (Figure 2.3b). Let $x_{ij}$ denote SNP $j$ of subject $i$, for $i = 1, \dots, N$ and $j = 1, \dots, p$, encoded as

$$x_{ij} = \begin{cases} 0 & \text{if SNP } j \text{ has no minor variant,} \\ 1 & \text{if SNP } j \text{ has one minor variant,} \\ 2 & \text{if SNP } j \text{ has two minor variants.} \end{cases}$$
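For illustration, the following is a small sketch of this additive encoding, using toy genotype strings with hypothetical allele labels:

```python
import numpy as np

# Toy genotypes for one SNP across five subjects; "A" is the major
# allele and "a" the minor allele in this made-up example.
genotypes = ["AA", "Aa", "aA", "aa", "AA"]

def encode_snp(genotype: str, minor: str = "a") -> int:
    """Additive encoding: the count of minor alleles (0, 1, or 2)."""
    return sum(allele == minor for allele in genotype)

x = np.array([encode_snp(g) for g in genotypes])
print(x)  # [0 1 1 2 0]
```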

There are two primary methods for assessing genetic variation: genotyping and sequencing. In genotyping, microarrays, also known as genomic arrays, are used to examine single nucleotide polymorphisms (SNPs) and other genetic variations within an individual's genome. These arrays are effective at establishing an individual's genetic profile at specific loci. The identification of SNPs is accomplished through the calculation of the B-allele frequency (BAF), which measures the presence of alleles B and A. This is achieved by transforming the fluorescence image into two normalized allele intensities for a given SNP, denoted as $X$ and $Y$, which are further used to derive a transformed ratio $\theta$ represented as

$$\theta = \frac{2}{\pi} \times \arctan\left(\frac{Y}{X}\right).$$

In this context, homozygous sites are identified as AA (with a BAF of 0) or BB (with a BAF of 1), while heterozygous sites are referred to as AB (with a BAF of 0.5).

2.1.2 Magnetic Resonance Imaging (MRI)

Magnetic Resonance Imaging (MRI) is a non-invasive imaging technique that exploits the phenomenon of nuclear magnetic resonance. The principle of MRI is summarized in Figure 2.4. During an MRI procedure, the patient is positioned within a superconducting magnet, and the hydrogen atoms in the body align with its field. These aligned atoms produce signals, which are then recorded as spatially encoded data points forming the so-called k-space. Spatial coding involves systematically varying the magnetic field gradients during data acquisition, allowing the differentiation of signals from different locations within the body. The resulting k-space data, a representation of spatial frequencies, are then passed through the inverse Fourier transform, which converts them into the final high-resolution images that reveal detailed cross-sectional views of internal structures. The duration of an MRI scan typically ranges from 15 to 90 minutes, depending on what is being scanned.


Figure 2.4: MRI principle (adopted from [28, 29, 30], with modifications). (a) MRI uses strong magnetic fields and radio waves to create detailed images of the inside of the human body. The fundamental principle behind MRI is the behavior of hydrogen nuclei (protons), which are abundant in the human body, particularly in water molecules. (b) When a patient is placed in the MRI scanner, the strong magnetic field aligns the hydrogen nuclei in the body along the magnetic field direction. Radio waves are then applied to temporarily disrupt this alignment. When the radio waves are turned off, the hydrogen nuclei return to their original alignment, releasing energy in the process. (c) Detectors in the MRI machine measure the released energy to construct the k-space, from which (d) detailed cross-sectional images of the body are obtained using the inverse Fourier transform. The resulting images are valuable for diagnosing a wide range of medical conditions and are especially useful for examining soft tissues.


Figure 2.5: Progression of Alzheimer’s disease in grey matter atrophy [31]

The authors in [32] showed that MRI is the most widely used imaging modality for evaluating AD; most researchers used T1-weighted images, while only a few used T2-weighted images. In AD, the ventricular surface of the brain and the hippocampus are commonly examined. In parallel, voxel-based morphometry (VBM) using structural brain MRI is also commonly used for assessing normal aging and AD. The progression of brain atrophy in healthy controls is shown in Figure 2.5. It shows that there is a significant negative correlation between grey matter and white matter volume as one ages, most prominently in the frontal cortex and brain stem [31].

2.1.3 Positron emission tomography (PET)

Positron emission tomography (PET) is an advanced medical imaging technique that provides insights into the functional activities of living tissues. PET relies on the principles of positron emission and annihilation. In PET imaging, radiotracers, compounds labeled with positron-emitting isotopes, are administered into the body. These isotopes undergo positron emission, generating detectable signals through annihilation with nearby electrons. PET scanners capture these signals, producing three-dimensional images that offer detailed insights into biological functions at the molecular level.

One of the most widely used radiotracers in PET imaging is [18F]-fluorodeoxyglucose ([18F]-FDG). FDG is a glucose analog that reflects glucose metabolism within tissues; high metabolic activity increases FDG uptake. Literature studies show that AD and mild cognitive impairment (MCI) patients typically have reduced glucose metabolism in certain areas of the brain in comparison with healthy individuals. Specifically, deficits in the cerebral metabolic rate for glucose (CMRglc), extending from the hippocampus to the parietal temporal and posterior cingulate cortices, are observed in normal elderly individuals. Patients with AD and diffuse Lewy body disease exhibit decreased metabolism, particularly in the occipital and parietal temporal regions, as reported by [33]. In comparison to normal controls (NC), both the prodromal AD (pAD) and amnestic MCI (aMCI) groups exhibit significantly lower CMRglc bilaterally in the posterior cingulate, precuneus, parietal temporal, and frontal cortex [34]. Furthermore, as a diagnostic tool, PET is superior to a baseline clinical evaluation and similar to an evaluation performed 4 years later [35]. FDG-PET studies report that, compared to NCs, AD patients show metabolic reductions involving the medial temporal lobe (MTL), parietal temporal cortex (PTC), and posterior cingulate cortices (PCC) (Figure 2.7), while the frontal cortex (FC) is also involved in more advanced AD stages [36].

Whereas [18F]-FDG provides insights into metabolic activity, AV-45, also known as florbetapir, and Pittsburgh Compound B (PiB) PET focus on specific molecular targets. AV-45 is a radiotracer designed to target beta-amyloid plaques, a hallmark of Alzheimer's disease. AV-45 PET imaging enables the visualization and quantification of beta-amyloid deposits in the brain.

PiB is another radiotracer employed in PET imaging to detect beta-amyloid deposits. PiB-PET imaging provides a unique perspective on the distribution and accumulation of beta-amyloid in the brain, aiding researchers and clinicians in studying AD progression. PiB-PET studies examined the change in PiB uptake between high-risk populations and healthy people. In Figure 2.8, both the FAD mutation carrier and ApoE4 carrier populations showed higher PiB uptake values than normal controls. In MCI, only a subset shows high PiB uptake values. In general, PiB is considered not sufficient for determining risk among at-risk subjects, and thus a combination with a complementary modality, such as FDG-PET or MRI, is necessary [36].

Figure 2.6: Z-score reduction of FDG-PET in a normal cognition subject who transitions to AD and another subject who transitions from MCI to AD (adopted from [33] with modifications).

Figure 2.7: The top region shows decreased uptake values in FDG-PET images when comparing AD and high-risk populations to normal controls [36].

Figure 2.8: The top region shows increased uptake values in PiB-PET images when comparing AD and high-risk populations to normal controls [36].

2.1.4 Cerebrospinal fluid (CSF) biomarkers

Cerebrospinal fluid (CSF) provides biochemical biomarkers, measured as levels of amyloid-β42, total tau, and phospho-tau: total tau (T-tau), hyperphosphorylated tau (P-tau), and the 42 amino acid isoform of amyloid β (Aβ42). CSF biomarkers offer prognostic insight and help identify individuals whose disease is more likely to progress. They are, therefore, included in therapeutic clinical trials [37].

In an autopsy cohort of CSF samples, CSF Aβ42 emerged as the most sensitive biomarker for AD [38]. Reduced concentrations of CSF amyloid-β1-42 (Aβ1-42) were associated with the presence of Aβ plaques [39]. Some studies suggest that CSF tau/Aβ42 ratios hold promise as preclinical biomarkers predictive of future dementia in cognitively normal older adults [40, 41, 42].

The studies [43] and [44] demonstrated that CSF and MRI biomarkers contribute independently to intergroup diagnostic discrimination. The combination of CSF and MRI provides better predictive capability than either data source alone. However, MRI is more useful for predicting clinically defined disease stages than CSF biomarkers.

Table 2.1: Associations between MRI and CSF, PET (adopted from [45] with modifications)

CN: CSF Aβ is correlated with MRI; a decrease in CSF Aβ1-42 is correlated with brain atrophy. FDG showed a significant decrease but little sMRI change in asymptomatic subjects.

MCI: Increases in tau and p-tau correlated with a decrease in hippocampal volumes in MCI that progressed to AD. FDG and MRI measures in the hippocampal formation best characterize MCI, and additional neocortical damage best characterizes AD.

AD: Increases in CSF t-tau and p-tau181 correlated with brain atrophy. Both MRI and FDG showed a hippocampal decrease due to AD; FDG was better than MRI in predicting conversion of MCI to AD.


2.2 Matrix decomposition and Tensor decomposition

Matrix decomposition is a fundamental tool in various machine-learning algorithms. It enables the breaking down of a large matrix into smaller components in the form of vectors and smaller matrices. Perhaps the most commonly used matrix decompositions are Principal Component Analysis (PCA) and Singular Value Decomposition (SVD). A matrix can be represented as samples (rows) by features (columns). Breaking down a large-scale matrix helps to find latent variables, reduce dimensionality, or compress the original data matrix.

Nowadays, the size of data increases exponentially due to the expansion of recording devices (i.e., sensors, imaging tools, and clinical tools) as well as diverse sources such as brain imaging and gene expression data. Equally growing is the spatio-temporal information in the data. In addition, the dimensionality of data extends to other domains, such as the frequency domain and sparse domains. Standard approaches using matrix decomposition on vectors or matrices may not be suitable, as the covariance information among the various modes may be lost. To work with multilinear structures, the analysis tools should reflect the multi-dimensional structure of the data. Tensor decompositions extend the concept of matrix decomposition to higher-dimensional arrays, known as tensors. They provide natural representations for multidimensional data by capturing multi-linear and multi-aspect structures in a much lower dimension.

Tensor methods and tensor decomposition are useful in various fields, including neuroscience [14, 15, 16, 17, 18], genetics [19, 20], and deep learning [21, 22, 23]. They have the potential to deal with high-dimensional and structured data. In this section, I first discuss matrix decomposition. I then discuss basic tensor notation, tensor operators, and different types of tensor decomposition. Finally, I present the pros and cons of each type of decomposition.

2.2.1 Matrix decomposition

2.2.1.1 Singular value decomposition (SVD)

The singular value decomposition (SVD) [46] is the workhorse behind many well-known algorithms in machine learning. This section describes its essential formulas and properties and its relation to tensor decomposition algorithms.

For a matrix $A \in \mathbb{R}^{m \times n}$, the SVD is given by

$$A = U \Sigma V^T,$$

where $U \in \mathbb{R}^{m \times m}$, $\Sigma \in \mathbb{R}^{m \times n}$ is diagonal with nonnegative entries (the singular values), and $V \in \mathbb{R}^{n \times n}$. The decomposition has the following properties:

1. $U$ and $V$ are orthogonal matrices: $U^T U = I$ and $V^T V = I$.

2. The columns of $U$ are the left singular vectors of $A$, and the columns of $V$ are the right singular vectors of $A$.

3. The singular values in $\Sigma$ are the square roots of the eigenvalues of $A^T A$ (or $A A^T$), and the columns of $U$ and $V$ are the corresponding eigenvectors.

An important property of the SVD is stated in the Eckart-Young theorem [47]: for a given matrix $A$ of rank $r$, the best rank-$k$ approximation of $A$ in terms of the Frobenius norm is given by the SVD truncated after the first $k$ singular values:

$$A_k = U_k \Sigma_k V_k^T, \tag{2.2}$$

where $U_k$ is an $m \times k$ matrix containing the first $k$ columns of $U$ from the SVD of $A$, $\Sigma_k$ is a $k \times k$ diagonal matrix containing the first $k$ singular values of $A$, and $V_k$ is an $n \times k$ matrix containing the first $k$ columns of $V$ from the SVD of $A$.

The Frobenius norm of the error in this approximation is minimized among all rank-$k$ approximations.
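A minimal numerical sketch of this truncation (random data, rank chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 6))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # target rank
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # truncated SVD, Eq. (2.2)

# By the Eckart-Young theorem, no rank-2 matrix is closer to A in
# Frobenius norm than A_k.
print(np.linalg.norm(A - A_k, "fro"))
```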

2.2.1.2 Non-negative matrix factorization

Nonnegative matrix factorization (NMF) seeks to approximate a matrix $A$ as the product of two nonnegative matrices, $W$ and $H$, as follows:

$$A \approx W H,$$

where $W$ is an $m \times k$ nonnegative matrix representing the basis vectors or features, $H$ is a $k \times n$ nonnegative matrix representing the coefficients or activations, and $k$ is the chosen rank, determining the dimensionality of the feature space. NMF is particularly useful for matrices with nonnegative entries, such as images, text, and biological data. The nonnegativity constraints often lead to parts-based representations, making NMF interpretable in various applications. For example, in image processing, NMF is applied to help identify patterns and features within images; in biological data analysis, NMF aids in the extraction of meaningful patterns from biological data.
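A short sketch using scikit-learn's NMF on a random nonnegative matrix (the rank and solver settings are illustrative):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
A = np.abs(rng.normal(size=(20, 10)))  # NMF requires nonnegative entries

model = NMF(n_components=4, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(A)   # 20 x 4 nonnegative basis activations
H = model.components_        # 4 x 10 nonnegative components

print(np.linalg.norm(A - W @ H, "fro"))  # reconstruction error
```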

2.2.2 Tensor decomposition

Tensor decomposition was first proposed in [48] and has since found diverse applications across various disciplines. Tensors and their operators are crucial in unraveling the complexities of multi-dimensional data structures. A tutorial on tensors [49] stands as one of the most cited papers in the field, providing essential insights into the fundamental principles of tensor analysis.


Figure 2.9: Example of a third-order tensor. (a) Example of a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$; the first dimension is the row and the second is the column. (b) Example of a tensor $\mathcal{X} \in \mathbb{R}^{4 \times 3 \times 2}$.

Tensor decomposition, which factorizes a tensor into a set of basis components (e.g., vectors, matrices, or simpler tensors), has become a popular tool for multivariate and high-dimensional data analysis. Tensor methods like tensor decomposition reveal the relation between the latent variables and the observations and give a good explanation [24], representing unimodal, bimodal, and trimodal interactions between behaviors [21]. Tensor decompositions are useful in data fusion [25] to combine multiple tensors into a single tensor. One can apply tensor decomposition for a variety of reasons, such as to improve the performance of a model, to reduce the number of parameters in a model, or to make it easier to interpret the results of a model. In [50], a broad spectrum of applications is discussed, spanning from social network analysis to brain data analysis, as well as from web mining to healthcare. In [51], the focus is specifically on low-rank decomposition, with a connection to machine learning. In [52], a comprehensive mathematical overview of methods, types of tensors, and variants is provided. In [53], tensor methods for computer vision and deep learning are explored. The latest survey [54] covers up-to-date topics in tensor supervised learning, tensor unsupervised learning, tensor deep learning, and several applications of these research areas.

In this section, I first introduce the basics of tensor notation and tensor operators, and then discuss different types of tensor decomposition.

2.2.3 Tensor notation and tensor operators

Tensors are characterized by their order, representing the number of indices needed to identify an element within the tensor. The order of a tensor is a fundamental characteristic that defines the number of indices needed to specify a particular element, and these indices correspond to different modes or dimensions of the tensor. For example, a tensor of order one, commonly known as a vector, requires a single index for identification: an element in a vector $v$ is denoted as $v_i$, where $i$ is the index. Similarly, a tensor of order two, represented as a matrix, involves two indices: an element in a matrix $M$ is expressed as $M_{ij}$, indicating row $i$ and column $j$ of the matrix. Extending this concept, a tensor of order three involves three indices, and its notation becomes more intricate: an element in a third-order tensor $\mathcal{T}$ can be expressed as $\mathcal{T}_{ijk}$, where $i$, $j$, and $k$ represent the indices along each mode. Figure 2.9 displays an example of an order-3 tensor, where the first, second, and third modes correspond to the number of rows, columns, and depths, respectively.

Matricization: Mode-k Unfolding

Matricization, also known as mode unfolding, is an important operation in tensor algebra that transforms a tensor into a matrix while preserving certain structural information. The mode-$k$ unfolding of a tensor $\mathcal{X}$, denoted as $\mathbf{X}_{(k)}$, reshapes the tensor along its $k$-th mode into a matrix.

For a tensor $\mathcal{X}$ of order $N$ with dimensions $I_1 \times I_2 \times \cdots \times I_N$, the mode-$k$ unfolding is the matrix

$$\mathbf{X}_{(k)} \in \mathbb{R}^{I_k \times (I_1 \cdots I_{k-1} I_{k+1} \cdots I_N)}, \qquad \mathbf{X}_{(k)}(i_k, j) = \mathcal{X}(i_1, \dots, i_N),$$

where the column index $j$ enumerates all combinations of the remaining indices. Here, $i_1, \dots, i_N$ are indices along the corresponding modes, and $I_k$ represents the size of the $k$-th mode.
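A compact sketch of mode-k unfolding in NumPy (this uses one common index-ordering convention; libraries differ in how they order the columns):

```python
import numpy as np

def unfold(X: np.ndarray, k: int) -> np.ndarray:
    """Mode-k unfolding: bring mode k to the front, flatten the rest."""
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

X = np.arange(24).reshape(4, 3, 2)  # a third-order tensor
print(unfold(X, 0).shape, unfold(X, 1).shape, unfold(X, 2).shape)
# (4, 6) (3, 8) (2, 12)
```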

Vec Operator for a Tensor

The vec operator, applied to a tensor $\mathcal{X}$, transforms the tensor into a vector by concatenating its columns. This operation is useful for converting a multi-dimensional array into a linear format, facilitating certain mathematical operations and analyses.

For a tensor $\mathcal{X}$ of order $N$ with dimensions $I_1 \times I_2 \times \cdots \times I_N$, the vectorization, denoted as $\mathrm{vec}(\mathcal{X})$, is defined by stacking the columns of the tensor:

$$\mathrm{vec}(\mathcal{X}) \in \mathbb{R}^{I_1 I_2 \cdots I_N}, \qquad \mathrm{vec}(\mathcal{X})_{\ell} = \mathcal{X}(i_1, \dots, i_N),$$

where the linear index $\ell \in \{1, 2, \dots, I_1 I_2 \cdots I_N\}$ enumerates the elements of the tensor in column-major order.

Tensor Product: Mode-k Product between Tensor and Matrix

The mode-$k$ product, denoted as $\times_k$, is a fundamental operation in tensor algebra. When applied between a tensor $\mathcal{X}$ of order $N$ and a matrix $M$ along its $k$-th mode, the mode-$k$ product results in a new tensor.

For a tensor $\mathcal{X}$ of size $I_1 \times I_2 \times \cdots \times I_N$ and a matrix $M$ of size $J \times I_k$, the mode-$k$ product is defined as

$$\mathcal{Y} = \mathcal{X} \times_k M. \tag{2.6}$$

The elements of the resulting tensor $\mathcal{Y} \in \mathbb{R}^{I_1 \times \cdots \times I_{k-1} \times J \times I_{k+1} \times \cdots \times I_N}$ are calculated as

$$\mathcal{Y}(i_1, \dots, i_{k-1}, j, i_{k+1}, \dots, i_N) = \sum_{i_k=1}^{I_k} \mathcal{X}(i_1, \dots, i_N) \, M(j, i_k).$$
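The mode-k product can be computed through the unfolding introduced above: multiply the mode-k unfolding by M, then fold the result back. A small sketch:

```python
import numpy as np

def mode_k_product(X: np.ndarray, M: np.ndarray, k: int) -> np.ndarray:
    """Compute Y = X x_k M by unfolding, multiplying, and folding back."""
    Xk = np.moveaxis(X, k, 0).reshape(X.shape[k], -1)  # I_k x (rest)
    Yk = M @ Xk                                        # J x (rest)
    new_shape = (M.shape[0],) + tuple(np.delete(X.shape, k))
    return np.moveaxis(Yk.reshape(new_shape), 0, k)

X = np.random.randn(4, 3, 2)
M = np.random.randn(5, 3)  # J x I_2, applied along mode k = 1
print(mode_k_product(X, M, 1).shape)  # (4, 5, 2)
```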

The Hadamard product, often denoted by $*$, is a binary operation performed element-wise between two matrices of the same size. Given two matrices $A$ and $B$ with dimensions $m \times n$, the Hadamard product $A * B$ is defined as

$$(A * B)_{ij} = A_{ij} \cdot B_{ij}. \tag{2.15}$$

In other words, each element of the resulting matrix is obtained by multiplying the corresponding elements of $A$ and $B$. The Hadamard product is thus a simple and intuitive operation that preserves the structure of the original matrices. It is commutative and associative, but it differs from matrix multiplication, which involves the dot product of rows and columns. The product denoted $\odot$ satisfies the following properties:

$$A \odot (C + D) = A \odot C + A \odot D,$$

$$(cA) \odot B = c(A \odot B) = A \odot (cB) \quad \text{(scalar multiplication)}, \tag{2.19}$$

$$I_m \odot A = A \odot I_n = A \quad \text{(identity matrix)}, \tag{2.20}$$

$$\mathrm{diag}(A_1, A_2, \dots, A_k) \odot B = \mathrm{diag}(A_1 \odot B, A_2 \odot B, \dots, A_k \odot B) \quad \text{(block diagonal matrices)}, \tag{2.21}$$

$$(A \odot B)^T (A \odot B) = A^T A * B^T B. \tag{2.22}$$

2.2.4 Different types of tensor decomposition

Tensor decompositions are mathematical techniques used to break down a high-dimensional tensor into a set of simpler components. Tensors, which are multi-dimensional arrays, often arise in various scientific, engineering, and machine-learning applications. Decomposing tensors helps reveal underlying patterns, reduce dimensionality, and extract meaningful information from complex data structures.


Several tensor decomposition methods exist, each serving unique purposes. Prominent examples are the Canonical Polyadic Decomposition (CP or PARAFAC), Tensor Train (TT) Decomposition, Tucker Decomposition, Block Term Decomposition (BTD), and Tensor Singular Value Decomposition (t-SVD). These tensor decompositions are illustrated in Figure 2.10.

Figure 2.10: Illustration of different types of three-way tensor decomposition [56].

CP (Canonical Polyadic) Decomposition

The CP decomposition, also known as the PARAFAC (Parallel Factors) decomposition, is a tensor factorization technique that expresses a given tensor as a sum of outer products of vectors. Given a tensor $\mathcal{X}$ of order $N$, the CP decomposition is represented as

$$\mathcal{X} \approx \sum_{r=1}^{R} \lambda_r \, a_r^{(1)} \circ a_r^{(2)} \circ \cdots \circ a_r^{(N)},$$

where $R$ is the rank, $\circ$ denotes the outer product, $\lambda_r$ are scaling weights, and the vectors $a_r^{(n)}$ form the columns of the factor matrices $A^{(n)} \in \mathbb{R}^{I_n \times R}$. Computational techniques, such as alternating least squares (ALS) and gradient-based optimization methods, can be employed to compute the factor matrices.
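As a sketch of how such a decomposition is computed in practice, the following uses the TensorLy library on a random nonnegative tensor (the nonnegative variant is the one relevant to TKL later in the thesis; the rank is chosen arbitrarily):

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac

# A small nonnegative third-order tensor, e.g., stacked similarity matrices.
X = tl.tensor(np.abs(np.random.randn(10, 10, 3)))

# Rank-4 nonnegative CP decomposition.
cp = non_negative_parafac(X, rank=4, n_iter_max=200)
X_hat = tl.cp_to_tensor(cp)  # rebuild the tensor from weights and factors

print(tl.norm(X - X_hat) / tl.norm(X))  # relative reconstruction error
```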

Tucker Decomposition

The Tucker decomposition, also known as the n-mode factorization, is a tensor factorization method that expresses a given tensor as a core tensor multiplied by factor matrices along each mode. Given a tensor $\mathcal{X}$ of order $N$, the Tucker decomposition is represented as

$$\mathcal{X} \approx \mathcal{G} \times_1 A_1 \times_2 A_2 \times_3 \cdots \times_N A_N, \tag{2.25}$$

where $\mathcal{G}$ is the core tensor and $A_1, A_2, \dots, A_N$ are factor matrices along each mode.

Applications of the Tucker decomposition include image processing, signal processing, neuroscience, and various other fields where multi-dimensional data analysis is essential. Computational techniques such as the higher-order singular value decomposition (HOSVD) are commonly used to compute the factor matrices and the core tensor.

Block Term Decomposition (BTD)

The Block Term Decomposition (BTD) is a tensor factorization method that extends the concepts of the CP (Canonical Polyadic) and Tucker decompositions. BTD is particularly suitable for tensors with a block-wise structure, where different blocks exhibit different modes of variability.

Given a tensor $\mathcal{X}$ of order $N$, the BTD represents the tensor as a sum of terms, each corresponding to a block, where each term is in Tucker format. The BTD is expressed as

$$\mathcal{X} \approx \sum_{k=1}^{K} \mathcal{G}^{(k)} \times_1 A_1^{(k)} \times_2 A_2^{(k)} \times_3 \cdots \times_N A_N^{(k)},$$

where $K$ is the number of terms, and $\mathcal{G}^{(k)}$ and $A_i^{(k)}$ represent the core tensor and the factor matrix associated with the $i$-th mode of the $k$-th term.

The BTD allows for a compact representation of tensors with a structured block-wise pattern, capturing the relationships among different blocks and their interactions across modes. This decomposition is particularly valuable in various applications, such as multi-modal data analysis, image processing, and network analysis.

Computational techniques for BTD involve optimization methods, including alternating least squares (ALS) and gradient-based algorithms, to estimate the factor matrices. BTD provides a powerful tool for extracting interpretable patterns from complex multi-dimensional data.

Tensor Train (TT) Decomposition

The Tensor Train (TT) decomposition, also known as the Matrix Product State (MPS) or hierarchical Tucker decomposition, is an advanced tensor factorization method designed for efficiently representing high-dimensional tensors. The TT decomposition is particularly effective for tensors with low numerical rank.

Given a tensor $\mathcal{X}$ of order $N$, the TT decomposition expresses the tensor as a sequence of small core tensors. Elementwise, it is represented as

$$\mathcal{X}(i_1, i_2, \dots, i_N) \approx G^{(1)}[i_1] \, G^{(2)}[i_2] \cdots G^{(N)}[i_N],$$

where each $G^{(k)}[i_k]$ is an $r_{k-1} \times r_k$ matrix slice of the $k$-th core tensor $\mathcal{G}^{(k)} \in \mathbb{R}^{r_{k-1} \times I_k \times r_k}$ (with $r_0 = r_N = 1$). The TT decomposition thus provides a hierarchical representation of the tensor through a series of local interactions.

The key advantage of the TT decomposition is its capability to represent high-dimensional tensors in a compact and structured format, reducing the number of parameters required to represent the tensor. This makes it especially useful for large-scale tensor computations encountered in fields such as quantum chemistry, physics, and machine learning. Computational techniques for the TT decomposition involve iterative optimization methods, including alternating least squares (ALS) and gradient-based algorithms. The TT decomposition is closely related to matrix product states and is a special case of tensor network decompositions.

2.3 Kernel methods in machine learning

In machine learning, a kernel function is a measure of similarity between pairs of data points. It allows algorithms to implicitly operate in a high-dimensional feature space without explicitly calculating the coordinates of the data points in that space. Using a kernel converts a linear method into a nonlinear method (linear in the implicit high-dimensional space). Kernels map data points from the input space to a higher-dimensional feature space, making it easier to find patterns and relationships in the data. For each modality/source $m$ with kernel function $k^m : \mathbb{R}^{p_m \times 1} \times \mathbb{R}^{p_m \times 1} \to \mathbb{R}$, we can construct a kernel matrix $K^m \in \mathbb{R}^{N \times N}$ as follows:

$$K^m(i, j) = k^m(x_i, x_j), \quad \text{where } i, j = 1, \dots, N. \tag{2.29}$$

Kernels are essential mathematical functions that play a pivotal role in various algorithms, particularly in the context of support vector machines (SVM) and kernelized methods. A kernel function, denoted as $K(x, y)$, computes the inner product of transformed feature vectors without explicitly computing the transformation itself. Popular examples include the Gaussian radial basis function (RBF) kernel

$$K(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)$$

and the polynomial kernel

$$K(x, y) = (x^T y + c)^d.$$

Kernels enable algorithms to operate in high-dimensional feature spaces efficiently, capturing intricate relationships in the data and enhancing the performance of machine learning models. The kernel matrix can be used as input for various methods, including kernel SVM, for classification tasks. Each type of feature is mapped into a higher-dimensional space via a kernel function. The following describes kernels as used in machine-learning algorithms.

Kernel Support Vector Machines (SVM) represent a powerful extension of traditional SVMs in machine learning. The fundamental idea behind kernel SVM is to implicitly map input data into a higher-dimensional space using a kernel function, allowing for the linear separation of complex patterns. The decision function of a kernel SVM is expressed as

$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{N} \alpha_i y_i K(x_i, x) + b\right),$$

where $\alpha_i$ are the learned dual coefficients, $y_i$ the training labels, and $b$ the bias.
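A minimal sketch of using a precomputed kernel with scikit-learn's SVC (illustrative data; note that the test kernel is computed between test and training points):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(80, 30)), rng.integers(0, 2, size=80)
X_test = rng.normal(size=(20, 30))

gamma = 1.0 / X_train.shape[1]
K_train = rbf_kernel(X_train, X_train, gamma=gamma)  # N x N Gram matrix
K_test = rbf_kernel(X_test, X_train, gamma=gamma)    # test rows vs train columns

clf = SVC(kernel="precomputed").fit(K_train, y_train)
print(clf.predict(K_test))  # evaluates f(x) over the support vectors
```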


Chapter 3

A Framework for Kernel Combination/Fusion for Alzheimer's Disease Assessment

3.1 Introduction

Recall that AD is a neurodegenerative disorder that significantly impacts memory, cognitive function, and the ability to carry out daily activities. Although AD is irreversible, with no effective treatments currently available, early detection and diagnosis play an important role in facilitating timely interventions, slowing the progression of AD, and enhancing the overall quality of life for patients.

Because of the complexity of AD, current clinical diagnoses rely on a combination of assessments including clinical scores, MRI, FDG-PET, and amyloid and tau PET scans. Computer-aided diagnosis (CAD) plays an important role in early detection and precise diagnosis. The number of research papers focusing on CAD using multiple modalities is on the rise. Despite these advances, few methodological approaches are available to effectively combine information from these diverse modalities. Thus, there is an urgent need to develop suitable multimodal analytical tools. A beginning can be made by developing better fusion techniques.

3.2 Limitations of current fusion approaches

In the fusion process, explicit features (both low-level and high-level) or implicit features (via kernel fusion) can be combined either at the feature or the decision level. In feature fusion, it may be challenging to force all modalities into the same feature space for effective combination. Numerous studies show that combining kernels in a data-dependent manner outperforms classical fusion such as feature-level and score-level methods [57, 58]. In this thesis, our primary focus lies on kernel fusion/combination. We first discuss some popular types of kernel learning and their current limitations. We then present the proposed solutions to address these limitations.

3.2.1 Problem formulation

Given a set of kernels $\{k^m(\cdot, \cdot)\}_{m=1}^{M}$, the objective is to find a combined kernel $k(x_i, x_j)$. In the classical linear multi-kernel setting, this combination is expressed as

$$k(x_i, x_j) = \sum_{m=1}^{M} \eta_m \, k^m(x_i, x_j),$$

where $\eta_m$ are combination weights.
