HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
MASTER THESIS
Representation learning for Knowledge Graph using
Deep Learning methods
TONG VAN VINH
Vinh.TV202705M@sis.hust.edu.vn
School of Information and Communication Technology
Supervisor: Assoc. Prof. Huynh Quyet Thang
Supervisor’s signature
Institution: School of Information and Communication Technology
January 12, 2022
Graduation Thesis Assignment
Name: Tong Van Vinh
Phone: +84354095052
Email: Vinh.TV202705M@sis.hust.edu.vn; vinhbachkhoait@gmail.com
Class: 20BKHDL-E
Affiliation: Hanoi University of Science and Technology
I, Tong Van Vinh, hereby warrant that the work and presentation in this thesis were performed by myself under the supervision of Assoc. Prof. Huynh Quyet Thang. All the results presented in this thesis are truthful and are not copied from any other works. All references in this thesis, including images, tables, figures, and quotes, are clearly and fully documented in the bibliography. I will take full responsibility for even one copy that violates school regulations.

Student
Signature and Name
ACKNOWLEDGMENTS

I would like to acknowledge and give my warmest thanks to my supervisor, Assoc. Prof. Huynh Quyet Thang, who inspired me a lot in my research career path. I also thank Mr. Huynh Thanh Trung, Dr. Nguyen Quoc Viet Hung, and Dr. Nguyen Thanh Tam for supporting me in giving birth to my brainchild and challenging myself by submitting it to top-tier conferences. I would also like to thank my committee members for letting my defense be an enjoyable moment and for their thoughtful comments and suggestions.

I would also like to give special thanks to my girlfriend Thu Hue and my family as a whole for their mental support during my thesis writing process. Nothing can touch my love for you. Moreover, in the absence of my friends, Tien Thanh, Trong Tuan, Hong Ngoc, Hieu Tran, Minh Tam, Quang Huy, Quang Thang, and Ngo The Huan, I could hardly have melted away all the tension from my work. Thanks for always accompanying me through ups and downs.

Finally, this work was funded by Vingroup and supported by the Vingroup Innovation Foundation (VINIF) under project code VINIF.2020.ThS.BK.07. I enormously appreciate all the financial support from Vingroup, which allowed me to stay focused on my research without worrying about my financial burden.
ABSTRACT

Knowledge graphs (KGs) have received significant attention in recent years. Gaining more profound insight into the structure of knowledge graphs allows us to tackle many challenging tasks, such as knowledge graph alignment, knowledge graph completion, and question answering. Recently, deep learning methods using the representation of knowledge graph entities (nodes) and relations (edges) in vector space have gained traction from the research community because of their flexibility and prospective performance. The best way to evaluate a representation learning method is to use that representation to solve real-world tasks. In terms of knowledge graphs, we can rank methods by their performance on tasks such as knowledge graph completion (KGC) or knowledge graph alignment (KGA). However, many research challenges still exist, such as enhancing the accuracy or simultaneously solving multiple tasks.

With such motivation, in the scope of our Master work, we address three groups of crucial challenges in knowledge graph representation, namely (i) challenges in enhancing KGC performance, (ii) challenges in enhancing KGA performance, and (iii) challenges in enhancing both KGC and KGA simultaneously. For the first class of challenges, we develop a model named NoGE, which takes advantage of not only the power of Graph Neural Networks (GNNs) but also the expressive power of the quaternion vector space and the co-occurrence statistics of elements in KGs to achieve state-of-the-art performance on the KGC task. Moving to the second challenge group, we propose EMGCN, a special GNN architecture designed to exploit different types of information to improve the final alignment results. Finally, we propose IKAMI, the first multitask-learning model, to solve the two tasks simultaneously. Our proposed techniques improve upon the state of the art for different tasks and thus cover an extensive range of applications.
Student
Signature and Name
TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION
1.1 Knowledge Graphs (KGs)
1.2 Knowledge graph completion and knowledge graph alignment
1.2.1 Knowledge graph completion
1.2.2 Knowledge graph alignment
1.2.3 The relation between completion and alignment
1.3 Research challenges
1.3.1 Handling knowledge graph completion challenges
1.3.2 Handling knowledge graph alignment challenges
1.3.3 Handling the challenges of solving the two tasks simultaneously
1.4 Thesis methodology
1.5 Contributions and Thesis Outline
1.6 Selected Publications

CHAPTER 2 BACKGROUND
2.1 Graph Convolutional Networks (GCNs)
2.2 Knowledge Graph Completion background
2.2.1 Incomplete knowledge graphs
2.2.2 Knowledge graph completion models
2.3 Knowledge Graph Alignment background
2.3.1 Previous approaches
2.3.2 Alignment constraints
2.3.3 Incomplete knowledge graph alignment

CHAPTER 3 ENHANCING KNOWLEDGE GRAPH COMPLETION PERFORMANCE
3.1 Introduction
3.2 Dual quaternion background
3.3 NoGE
3.4 Experimental Results
3.4.1 Experiment setup
3.4.2 Main results

CHAPTER 4 ENHANCING KNOWLEDGE GRAPH ALIGNMENT PERFORMANCE
4.1 Introduction
4.2 Overview of the Proposed Approach
4.2.1 Motivation
4.2.2 The entity alignment framework
4.3 Relation-aware Multi-order Embedding
4.3.1 GCN-based embedding model
4.3.2 Loss function
4.4 Alignment Instantiation
4.4.1 Single-order alignment matrices
4.4.2 Multi-order alignment matrix
4.4.3 Attribute Alignment
4.4.4 Putting It All Together
4.5 Empirical evaluation
4.5.1 Experimental setup
4.5.2 End-to-end comparison
4.5.3 Efficiency Test
4.5.4 Ablation Test
4.5.5 Hyperparameter sensitivity
4.5.6 Robustness to constraint violations

CHAPTER 5 MULTITASK LEARNING FOR KNOWLEDGE GRAPH COMPLETION AND KNOWLEDGE GRAPH ALIGNMENT
5.1 Introduction
5.2 Incomplete Knowledge Graph Alignment
5.2.1 Challenges
5.2.2 Outline of the Alignment Process
5.3 Feature channel models
5.3.1 Pre-processing
5.3.2 Transitivity-based channel
5.3.3 Proximity-based channel
5.4 The complete alignment process
5.4.1 Alignment instantiation
5.4.2 Missing triples recovery
5.4.3 Link-augmented training process
5.5 Evaluation
5.5.1 Experimental Setup
5.5.2 End-to-end comparison
5.5.3 Robustness to KG incompleteness
5.5.4 Saving of labelling effort
5.5.5 Qualitative evidence

CHAPTER 6 CONCLUSION
LIST OF FIGURES

1.1 An illustration of a knowledge graph
1.2 An example of knowledge graph completion
1.3 An example of knowledge graph entity alignment
1.4 Aligning incomplete KGs across domains
1.5 Encoder-Decoder architecture for GNN-based models
2.1 CNN and GCN comparison [37]
3.1 An illustration of our proposed NoGE
4.1 Overview of the EMGCN framework
4.2 Computation time
4.3 Different supervision percentages
4.4 #GCN-layers
4.5 Embedding dimension
4.6 Robustness to violations of entity consistency
4.7 Robustness to violations of relation consistency
5.1 Framework Overview
5.2 Running time (in log scale) on different datasets
5.3 Saving of labelling effort for entity alignment on the D-W-V1 test set
5.4 Robustness of graph alignment models against noise on the EN-DE-V2 test set
5.5 Attention visualisation (EN-FR-V1 dataset). The model pays less attention to noisy relations
5.6 KGC performance comparison between TransE and IKAMI during training
LIST OF TABLES

3.1 Statistics of the experimental datasets
3.2 Experimental results on the CoDEx test sets
3.3 Ablation results on the validation sets
4.1 Statistics of real-world datasets
4.2 End-to-end comparison
4.3 Ablation Test
4.4 Different weighting schemes of GCN layers
4.5 Effects of similarity matrix coefficients
5.1 Summary of notation used
5.2 Dataset statistics for KG alignment
5.3 End-to-end KG alignment performance (bold: winner, underline: first runner-up)
5.4 Ablation study
5.5 Knowledge Graph Completion performance
5.6 Correctly aligned relations in EN↔FR KGs
CHAPTER 1 INTRODUCTION

At the end of this chapter, we describe our contributions and the thesis outline, followed by a list of selected publications.
1.1 Knowledge Graphs (KGs)
Figure 1.1: An illustration of a knowledge graph
Knowledge graphs (KGs) are knowledge bases, but they use graph-structured data to encode information. They present facts about real-world entities in the form of triples ⟨head entity, relation, tail entity⟩ [1], [2]. For instance, Figure 1.1 illustrates a knowledge graph that contains many triples, such as (DA VINCI, painted, MONA LISA). In this example, DA VINCI, painted, and MONA LISA are a head entity, a relation, and a tail entity, respectively. Each triple in a knowledge graph can be considered a fact, and a knowledge graph is thus a set of valid triples. Valid triples represent true facts (Melbourne, city of, Australia), while invalid triples represent false facts (Melbourne, city of, Vietnam). Corrupted triples are facts that are not available in the current knowledge graph. A corrupted triple can be a valid triple or an invalid one [3].
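To make these definitions concrete, the following minimal Python sketch (our own illustration, not code from the thesis) stores a KG as a set of triples and generates corrupted triples by replacing the tail entity:

# A KG stored as a set of (head, relation, tail) triples, mirroring the
# definitions above. The entity and relation names are from Figure 1.1.
knowledge_graph = {
    ("DA VINCI", "painted", "MONA LISA"),
    ("Melbourne", "city of", "Australia"),
}

def is_known_valid(triple, kg):
    # A triple already present in the KG is a known valid fact.
    return triple in kg

def corrupt_tail(triple, entities):
    # Corrupted triples are absent from the current KG; note that a
    # corrupted triple may still be valid in the real world.
    h, r, t = triple
    return [(h, r, e) for e in entities if e != t]

entities = {"DA VINCI", "MONA LISA", "Melbourne", "Australia", "Vietnam"}
print(is_known_valid(("Melbourne", "city of", "Australia"), knowledge_graph))  # True
print(corrupt_tail(("Melbourne", "city of", "Australia"), entities))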
In recent years, knowledge graphs have been used in natural language processing, intelligent question-answering systems, intelligent recommendation systems, etc. With big data and deep learning, knowledge graphs have become one of the core driving forces for the development of artificial intelligence.
1.2 Knowledge graph completion and knowledge graph alignment
1.2.1 Knowledge graph completion
Figure 1.2: An example of knowledge graph completion
It is a fact that large knowledge graphs, even at the scale of billions of triples, are still incomplete, i.e., missing a lot of valid triples [3]. Therefore, many research works have focused on inferring missing triples in KGs, a task called knowledge graph completion (KGC). By completing knowledge graphs, we can enrich current knowledge bases and thus improve the performance of their applications. The KGC task is sometimes referred to as a link prediction task. Intuitively, given two out of three elements of a missing triple, the task is to predict the third element. For example, it can be seen from Figure 1.2 that given the current knowledge graph and two available components (head entity Ronald Colman and relation Job) of a missing triple, we are asked to predict the tail entity. This task corresponds to answering the question "What is Ronald Colman's job?". Recently, extensive studies have been done on learning low-dimensional representations of entities and relations for missing link prediction [4]. These methods have been demonstrated to be scalable and effective. Their general intuition is to model and infer the connectivity patterns in knowledge graphs according to the observed knowledge facts. For example, some relations are symmetric (e.g., marriage) while others are antisymmetric (e.g., affiliation); some relations are 1-to-1 relationships (e.g., is capital of) or Many-to-Many (e.g., is author of); and some relations may be composed from others (e.g., my mother's husband is my father). It is critical to find ways to model and infer these patterns [5]. A robust knowledge graph completion model should have enough expressive power to represent all these relation types.

Indeed, many existing architectures have been trying to model one or a few of the above relation patterns [3], [6]. Few models have been proved to be fully expressive, which means they can successfully model all these patterns [5], [7]. However, these models often face the same challenge of over-fitting because of their large number of trainable parameters.
In addition to conventional KG embedding models such as TransE [3], DistMult [8], ComplEx [9], and ConvKB [6], recent approaches have adapted graph neural networks (GNNs) for knowledge graph completion [10], [11], [12]. In general, vanilla GNNs are modified and utilized as an encoder module to update vector representations for entities and relations; these vector representations are then fed into a decoder module that adopts a score function (e.g., as employed in TransE, DistMult, and ConvE) to return the triple scores. The model is trained so that valid triples have higher scores than invalid ones.
1.2.2 Knowledge graph alignment
Figure 1.3: An example of knowledge graph entity alignment
Popular knowledge graphs (e.g., DBpedia, YAGO, and BabelNet) are often multilingual, in which each language domain has a separate version [13]. To encourage knowledge fusion between different domains, knowledge graph alignment (KGA), the task of identifying entities in cross-lingual KGs that refer to the same real-world object, has received significant interest from both industry and academia [14], [15]. The alignment result can be used for further data enrichment applications such as repairing inconsistencies, filling knowledge gaps, and building cross-lingual KBs [16]–[18].
Given two knowledge graphs (of different domains or languages), the KGA task is to find the correspondence of entities across the two knowledge graphs. For example, Figure 1.3 illustrates two knowledge graphs with their entities in different colors (blue and orange). We aim to infer all the alignment information (red dashed lines) from the current information (the KGs' structures, entity names, and attributes). The problem of entity alignment for cross-lingual KGs has been studied intensively with the emergence of graph embedding techniques [19], [20]. Given two monolingual KGs, these techniques first learn low-dimensional vectors representing the entities of each KG, and the corresponding entities are then discovered based on their vector similarities. The first-generation methods of this paradigm, including MTransE [21], JAPE [22], ITransE [23], and BootEA [24], learn the embeddings under the assumption that if two entities have a relation, the distance between their respective embeddings equals the embedding of their relation. Avoiding this strict assumption [25], the second generation of embedding techniques, such as GCN-Align [26], RDGCN [25], MUGNN [27], KG-matching [28], and NAEA [29], employ graph neural networks, which encode the structural relationship based on neighbourhood information [30].
1.2.3 The relation between completion and alignment
Figure 1.4: Aligning incomplete KGs across domains
Existing knowledge graph alignment techniques often assume that the input KGs are nearly identical (isomorphic), which is not true in practice [31]–[33]. There is usually a considerable gap between the levels of completeness of different monolingual KGs [2], especially between the English domain and other languages [34]. For example, in the DBP15K dataset, the ratio of relational triples in the Chinese, Japanese, or French KGs over the English KG is only around 80% [35]. Figure 1.4 gives an example of incomplete KGs, in which the neighborhoods of the two entities referring to the actor Ronald Colman in the English and French KGs are inconsistent (his occupation is missing in the English KG while his place of birth (lieuNaissance) is missing in the French KG). Such inconsistencies easily lead to different representations of the corresponding entities, especially for GNNs, where the noise is accumulated over neural layers [36].
On the other hand, to the best of our knowledge, no knowledge graph completion method utilizes external knowledge. We may have multiple knowledge graphs with different amounts of facts; thus, we can design a model that transfers knowledge from one to another. As a result, we can leverage this mechanism to better enrich our knowledge bases instead of just using one knowledge graph to complete itself. For example, in Figure 1.4, suppose we have already successfully aligned the same-color entities. Then we can easily find the missing relation between the entities Ronald Colman and Actor in the left-hand side graph based on the relation (metier) that connects their corresponding entities in the right-hand side of the figure. Similarly, we can also find the missing relation between the entities Ronald Colman and Surrey in the right-hand side graph based on the corresponding relation in the left-hand side graph.

So the two tasks, KGA and KGC, are related to each other. Intuitively, solving one task can help us better tackle the other. However, designing an architecture to solve these tasks simultaneously is not trivial. This thesis will introduce a multitask-learning method for simultaneously tackling the two tasks.
1.3 Research challenges
As mentioned in Section 1.2, KGC is a challenging task because of the variety of relation types. Moreover, how to balance expressive power against the risk of over-fitting is still an open question. On the other hand, KGA is also a complex problem due to its NP-hard nature and the fast expansion of networks in today's complex applications. This section briefly discusses some existing challenges of KGA and KGC.
1.3.1 Handling knowledge graph completion challenges
The expressive power of a representation learning model is of paramount importance in KGC because of KGs' various relation types. However, there is a trade-off between expressive power and over-fitting risk.
Figure 1.5: Encoder-Decoder architecture for GNN-based models
Recently, to overcome the problem of over-fitting and to better capture the neighborhood relationships between entities and relations, many models have made use of Graph Neural Networks (GNNs). In general, vanilla GNNs are modified and utilized as an encoder module to update vector representations for entities and relations; then, these vector representations are fed into a decoder module that adopts a score function (e.g., as employed in TransE, DistMult, and ConvE) to return the triple scores (as illustrated in Figure 1.5). Note that the expressive power of this type of architecture depends on the expressiveness of its decoder (or scorer). However, designing an encoder module that fits the input of the decoder module is not a trivial task and thus can be considered another challenge.
Another challenge when designing KGC models is how to model the co-occurrence between elements in KGs. Entities and relations forming facts frequently co-occur in news articles, texts, and documents, e.g., "Melbourne" frequently co-occurs with "Australia". Constructing a model capable of capturing this relationship is also considered one of the main challenges of KGC.
1.3.2 Handling knowledge graph alignment challenges
Most KGA challenges can be linked to general graph alignment (GA) challenges, such as handling scalability or improving the model's accuracy. However, there are some challenges specific to KGA, notably how to successfully take the multi-hop structural information around entities into consideration when solving the alignment task. Furthermore, existing models have not fully utilized the attribute information of entities (e.g., the age attribute of a person, a country's population) due to the high levels of inconsistency and linguistic differences. For example, GCN-Align considers only the attribute types and ignores their values. Another challenge is designing a model that can adapt to noise, e.g., when one KG possesses more entities than another. Attribute noise is also common, e.g., when an entity in the source KG has more attributes than its counterpart in the target KG, or when they are stored differently.
1.3.3 Handling the challenges of solving the two tasks simultaneously
Existing techniques often assume that the input KGs are nearly identical (isomorphic), which is not true in practice [31]–[33]. There is usually a considerable gap between the levels of completeness of different monolingual KGs [2], especially between the English domain and other languages [34]. For example, in the DBP15K dataset, the ratio of relational triples in the Chinese, Japanese, or French KGs over the English KG is only around 80% [35]. Figure 1.4 gives an example of incomplete KGs, in which the neighborhoods of the two entities referring to the actor Ronald Colman in the English and French KGs are inconsistent (his occupation is missing in the English KG while his place of birth is missing in the French KG). Such inconsistencies easily lead to different representations of the corresponding entities, especially for GNNs, where the noise is accumulated over neural layers [36]. On the other hand, how to automatically use other knowledge bases to complete a knowledge graph remains a hard question. Finally, solving the two problems simultaneously is appealing, but it is not a trivial task.
1.4 Thesis methodology
The theme of this thesis is to find deep learning methods that can produce expressive representations for knowledge graph entities and relations, so that they can outperform current state-of-the-art models in two well-known tasks, namely knowledge graph alignment and knowledge graph completion, by addressing the above challenges. To this end, the proposed methods should adapt to various application settings, save computation power and memory while guaranteeing that the inference step runs within a reasonable time, and produce results of high accuracy. We follow a top-down approach, where we focus on tackling the tasks for various types of datasets with different complexity levels. We attempt to overcome the mentioned challenges for each network type by first analyzing the framework's requirements, then designing an embedding-based model and its components to satisfy those needs. For each proposed framework of each network type, we validate its effectiveness through extensive experiments with both synthetic and real-world datasets against state-of-the-art baselines. We also demonstrate the scalability and robustness of the proposed models against different adversarial conditions (e.g., structural noise).
1.5 Contributions and Thesis Outline
In addressing the above research questions, this thesis makes the following contributions:

Enhancing knowledge graph completion performance. In Chapter 3, we solve the problem of knowledge graph completion on three new challenging KGs. Given an incomplete knowledge graph with many missing valid triples, we propose a knowledge graph completion framework that can produce high-quality results. In particular:
• We propose a new effective GNN-based KG representation learning model, named NoGE, to integrate co-occurrence among entities and relations in the encoder module for knowledge graph completion.

• We also propose a novel form of GNNs, named Dual Quaternion Graph Neural Network (DualQGNN), as the encoder module, which allows the representations of KG entities and relations to be expressive.

• We conduct extensive experiments to compare our NoGE with other strong GNN-based baselines and show that NoGE outperforms these baselines as well as other up-to-date KG embedding models, obtaining state-of-the-art results on three new and difficult benchmark datasets.
Enhancing knowledge graph alignment performance. In Chapter 4, we solve the problem of knowledge graph alignment on large-scale KGs. Given the two associated KGs, we propose an architecture that embeds the entities of the KGs into a low-dimensional vector space and then aligns corresponding entities across the KGs. The contributions of this solution are as follows:
• We propose a framework called EMGCN for unsupervised KG entity alignment with no prior knowledge. Since this framework is grounded in the late-fusion mechanism, rich KG information (e.g., relational triples, attribute types, attribute values) can be integrated regardless of the modality. This allows us to be the first in the literature to successfully use attribute values.

• We design a GCN-based model that exploits the rare characteristics of GCNs, including multi-order encoding and permutation immunity, to simultaneously integrate different relation-related consistency constraints. We also tailor the loss function to enforce joint and consistent learning of the embeddings of the two KGs to support their alignment and avoid having to reconcile their embedding spaces.

• We conduct experiments on real-world and synthetic KG datasets to evaluate our scheme. The results show that our framework outperforms other baselines and is also robust to various types of adversarial conditions.
Enhancing knowledge graph completion and knowledge graph alignment performance at the same time. Chapter 5 solves the two mentioned tasks simultaneously by proposing a multi-task learning model. We argue this is the first architecture to solve two research questions related to KGs simultaneously. In particular, the contributions of this innovation are as follows:
• We address the problem of aligning incomplete KGs using external knowledge bases and propose a framework called Incomplete Knowledge graphs Aligner via MultI-channel Feature Exchange (IKAMI). The model exploits multiple representations to capture the multi-channel nature of KGs (e.g., relational type, entity name, structural information). This is the first attempt to address entity alignment and knowledge completion at the same time, and we argue that this collaboration benefits both tasks, especially the alignment performance.

• We design a joint training schedule for the two embedding models so that the holistic objectives of the embeddings can support each other well. Then, the similarity matrix for each channel is calculated and fused by a weighted sum to return the final result.

• We conduct experiments on real-world and synthetic KG datasets to evaluate our scheme. The results show that our framework outperforms other baselines in the entity alignment task and the knowledge completion task by up to 15.2% and 3.5%, respectively.
The remainder of this thesis is organised as follows. Chapter 2 presents a survey of the literature related to the research challenges addressed in this thesis. Chapters 3, 4, and 5 address the research challenges above. Chapter 6 concludes the thesis and discusses future work.
1.6 Selected Publications
This thesis is based on the following research papers:
• Dai Quoc Nguyen*, Vinh Tong*, Dinh Phung, Dat Quoc Nguyen. "Two-view Graph Neural Networks for Knowledge Graph Completion". In the 15th ACM International Conference on Web Search and Data Mining, 2022. Accepted (WSDM - rank A*).

• Nguyen Thanh Tam, Huynh Thanh Trung, Hongzhi Yin, Tong Van Vinh, Darnbi Sakong, Bolong Zheng, Nguyen Quoc Viet Hung. "Entity Alignment for Knowledge Graphs with Multi-order Convolutional Networks". In IEEE Transactions on Knowledge and Data Engineering, 2021. Accepted (TKDE - rank Q1).

• Vinh Tong, Huynh Thanh Trung, Nguyen Thanh Tam, Nguyen Quoc Viet Hung, Huynh Quyet Thang. "IKAMI: Multi-channel Feature Exchange for Aligning Incomplete Knowledge Graphs from Different Domains". Submitted to the 48th International Conference on Very Large Data Bases, 2022. Under review (VLDB - rank A*).
CHAPTER 2 BACKGROUND
2.1 Graph Convolutional Networks (GCNs)
Figure 2.1: CNN and GCN comparison [37]
Convolutional Neural Networks (CNNs) have long been used as a great tool for capturing features of images (or grid-structured data in general). As illustrated in Figure 2.1, a CNN (left-hand side) operates over a grid structure of pixels where each pixel has exactly 8 neighboring pixels (or 3 or 5 if the pixel is at a corner or edge of the image). On the other hand, on the right-hand side, we can see that there is no fixed structure (number of neighbors) around each node in the graph. Thus, designing a graph convolution is not as straightforward as in a CNN. Indeed, a GCN can be considered a generalized version of a CNN, since a CNN's 2D structure is equivalent to a special graph.
To get the hidden representation of a centre node, one simple solution in a GCN is to take the average of the node's own features along with those of its neighbor nodes [37]. Suppose we have a homogeneous graph G = (V, A, X), where V is the set of nodes; A ∈ {0, 1}^{|V|×|V|} is the adjacency matrix, where A_{u,v} = 1 means there is an edge connecting node u to node v of the graph and A_{u,v} = 0 otherwise; and X ∈ R^{|V|×d} is the attribute matrix, where X_v is the initial attribute vector of node v. A GCN learns multi-layer representations for nodes in the graph:

$$H^{k+1} = \sigma\left(\tilde{L} H^k W^k\right)$$

where W^k is the k-th layer trainable parameter of the model and σ is an activation function, such as ReLU(.) or Sigmoid(.); H^k is the k-th layer representation of the graph nodes (H^0 = X); and L̃ is the normalized graph Laplacian matrix [37], which is computed as follows:

$$\tilde{L} = D^{-\frac{1}{2}} (A + I) D^{-\frac{1}{2}}$$

where I is the identity matrix and D is the degree matrix (a diagonal matrix where D_{v,v} equals the degree of node v). We then have the formula for updating the representation h_v of a node v:

$$h_v^{k+1} = \sigma\left(\sum_{u \in N(v) \cup \{v\}} \frac{1}{\sqrt{D_{u,u} D_{v,v}}}\, W^k h_u^k\right)$$

where N(v) denotes the set of neighbors of node v.
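The following NumPy sketch (our own illustration; the names and toy data are ours, and a real implementation would use sparse matrices and learned weights) implements one such GCN layer:

import numpy as np

def gcn_layer(A, H, W):
    # One GCN layer: H_{k+1} = ReLU(L_tilde @ H_k @ W_k),
    # with L_tilde = D^{-1/2} (A + I) D^{-1/2} as defined above.
    A_tilde = A + np.eye(A.shape[0])          # add self-loops
    deg = A_tilde.sum(axis=1)                 # node degrees (self-loop counted)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # D^{-1/2}
    L_tilde = D_inv_sqrt @ A_tilde @ D_inv_sqrt
    return np.maximum(0.0, L_tilde @ H @ W)   # ReLU activation

# Toy 3-node path graph: nodes 0-1 and 1-2 are connected.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.random.randn(3, 4)    # initial node attributes (H^0 = X)
W0 = np.random.randn(4, 2)   # first-layer weights
print(gcn_layer(A, X, W0).shape)  # (3, 2)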
2.2 Knowledge Graph Completion background
2.2.1 Incomplete knowledge graphs
A KG is often denoted as KG = (V, R, E), where V is the set of entities, R is the set of relations, and E is the set of triples. A triple ⟨h, r, t⟩ ∈ E is the atomic unit of a KG, which depicts a relation r between a head h (an entity) and a tail t (an attribute or another entity). We represent incomplete knowledge graphs by extending the KG notation as i-KG = (V, R, E, Ē), where Ē is the set of missing triples in the i-KG. For brevity's sake, we use i-KG and KG interchangeably in this thesis. Many KGs carry attribute information along with structural information. For example, it can be seen from Figure 1.3 that each entity has some extra information, such as Duke University having "type: University", "location: North Carolina", and "founded: 1838". To model this information, some works introduce additional elements to the definition of KGs, namely attribute triples (e.g., ⟨Duke University, Founded, 1838⟩). Formally, a KG with additional attribute and value information can be represented as i-KG = (V, R, E, Ē, A, V, E_A), where A, V, and E_A are the sets of attributes, values, and attribute triples, respectively.
2.2.2 Knowledge graph completion models
Given an incomplete knowledge graph KG = (V, R, E, Ē, A, V, E_A), where Ē is unrevealed, the knowledge graph completion (KGC) task aims to discover all the missing triples {⟨h, r, t⟩ ∈ Ē | ⟨h, r, t⟩ ∉ E}.
For each triple ⟨h, r, t⟩, an embedding model defines a score function f(h, r, t) measuring its plausibility. The goal is to choose f such that the score f(h, r, t) of a correct triple (h, r, t) is higher than the score f(h̄, r̄, t̄) of an incorrect triple (h̄, r̄, t̄). For example, TransE [3] defines the score function f_TransE(h, r, t) = −||h_h + h_r − h_t||, where h, r, and t are represented by the low-dimensional vectors h_h, h_r, and h_t, respectively. As (Melbourne, city of, Australia) is a correct triple while (Melbourne, city of, Vietnam) is an incorrect one, we would have: −||h_Melbourne + h_city of − h_Australia|| > −||h_Melbourne + h_city of − h_Vietnam||. Shallow embedding models are often distinguished from each other by their score functions; we explore some of them in the following.
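As a toy illustration of such a score function, the following Python sketch (our own, with hand-picked embeddings) shows TransE assigning the valid triple a higher score than the invalid one:

import numpy as np

def transe_score(h_vec, r_vec, t_vec):
    # TransE plausibility score: -||h + r - t|| (higher = more plausible).
    return -np.linalg.norm(h_vec + r_vec - t_vec)

emb = {  # toy 2-dimensional embeddings
    "Melbourne": np.array([1.0, 0.0]),
    "city of":   np.array([0.0, 1.0]),
    "Australia": np.array([1.0, 1.0]),
    "Vietnam":   np.array([-1.0, -1.0]),
}
valid = transe_score(emb["Melbourne"], emb["city of"], emb["Australia"])
invalid = transe_score(emb["Melbourne"], emb["city of"], emb["Vietnam"])
assert valid > invalid  # the correct triple scores higher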
a, Translation-based models
The first model in this category is TransE [3], which is inspired by models such as Word2vec and the Skip-gram model [38], where relationships between words often correspond to translations in the latent feature space. In particular, TransE returns a low-dimensional representation for each entity and relation in the KG and ensures that each relation type corresponds to a translation operation from the head entity vector to the tail entity vector, i.e., h_h + h_r = h_t. However, this limits TransE to modeling only 1-to-1 relationships such as "is capital of", where a head entity is linked to at most one tail entity given a relation type. Thus, it fails to model "Many-to-1", "1-to-Many", or "Many-to-Many" relationships.
On the other hand, TransH [39] handles these problems of TransE by associating each relation with a relation-specific hyperplane and uses a projection vector to project entity vectors onto that hyperplane. TransD [40] and TransR/CTransR [41] extend TransH by using two projection vectors and a matrix, respectively, to project entity vectors into a relation-specific space. STransE [42] and TranSparse [43] can be viewed as direct extensions of TransR, where head and tail entities are associated with their own projection matrices. Unlike STransE, TranSparse [43] uses adaptive sparse matrices, whose sparsity degrees are defined based on the number of entities linked by relations. ITransF [44] can be considered a generalization of STransE, which allows the sharing of statistical regularities between relation projection matrices and alleviates data sparsity issues.
b, Neural network-based models
The neural tensor network (NTN) [45] model uses a bilinear tensor operator to represent each relation, while ProjE can be viewed as a simplified version of NTN. ConvE [4] and ConvKB [6] are based on convolutional neural networks. ConvE uses a convolution layer directly over the 2D reshaping of head entity and relation embeddings, while ConvKB applies a convolution layer over the embedding triples. HypER [46] simplifies ConvE by using a hypernetwork to produce 1D convolutional filters for each relation, then extracts relation-specific features from head entity embeddings.
c, Complex vector-based models
Instead of embedding entities and relations in the real-valued vector space, ComplEx [9] extends embedding models to the complex vector space. ComplEx-N3 [47] extends ComplEx with a weighted nuclear 3-norm. Also in the complex vector space, RotatE [5] defines each relation as a rotation from the head entity to the tail entity. QuatE [48] represents entities by quaternion embeddings (i.e., hypercomplex-valued embeddings) and models relations as rotations in the quaternion space by employing the Hamilton and quaternion inner products.
d, Graph neural network-based models
Currently, there is an increasing trend of using graph neural networks (GNNs) as an efficient tool to obtain graph representations that capture not only the graph structure but also node attributes. GNNs were originally designed for general undirected graphs; however, several works have adapted those architectures to fit the multi-relational nature of KGs. Generally, they design their own graph neural network architectures as encoders to return entity and relation embeddings. These embeddings are then pushed forward to a decoder module, which can be any of those mentioned above, to return triple scores. Formally, a GNN architecture is a multi-layer neural network where each layer produces embeddings of graph components by:

$$h_p^{k+1} = f_e\left(\sum_{(q,r) \in N(p)} m_{qr}^k\right)$$
where N(p) = {(q, r) | (⟨p, r, q⟩ ∈ E) ∨ (⟨q, r, p⟩ ∈ E)} is the neighbor set of entity p; m_{qr}^k ∈ R^d denotes the message passed from neighbor entity q to entity p through relation r; and f_e is a linear transformation followed by an activation function. The main difference between GNN models lies in their messages m_{qr}^k. Regarding the GNN-based KG embedding approaches, R-GCN [10] modifies GCNs to introduce a specific message:

$$m_{qr}^k = \frac{1}{c_{p,r}} W_r^k h_q^k$$

where W_r^k is a relation-specific weight matrix and c_{p,r} is a normalization constant.
Recently, CompGCN [12] customizes GCNs to consider composition operations: its message combines the entity and relation embeddings through a composition operator φ (e.g., subtraction, multiplication, or circular-correlation), and the relation embeddings themselves are updated at each layer:

$$h_r^{(k+1)} = W^k h_r^k$$
CompGCN then applies ConvE [4] as the decoder module. This model is actually the first architecture that allows relations to have their own embeddings at each GNN layer.
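To make the message-passing formulation concrete, the following NumPy sketch (our own simplification; the actual R-GCN additionally uses basis decomposition and per-relation normalization constants) implements an R-GCN-style layer with relation-specific weight matrices:

import numpy as np

def rgcn_layer(triples, H, W_rel, W_self):
    # Each entity aggregates messages transformed by the weight matrix
    # W_rel[r] of the incoming relation, plus a self-loop term.
    H_new = H @ W_self                    # self-loop contribution
    counts = np.ones(H.shape[0])          # simple per-entity normalization
    for h, r, t in triples:               # message from head h to tail t
        H_new[t] += H[h] @ W_rel[r]
        counts[t] += 1
    return np.maximum(0.0, H_new / counts[:, None])  # ReLU

num_entities, num_relations, d = 4, 2, 8
H = np.random.randn(num_entities, d)
W_rel = np.random.randn(num_relations, d, d)
W_self = np.random.randn(d, d)
triples = [(0, 0, 1), (1, 1, 2), (3, 0, 2)]  # (head, relation, tail) ids
print(rgcn_layer(triples, H, W_rel, W_self).shape)  # (4, 8)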
Note that R-GCN and CompGCN do not consider co-occurrence among entities and relations in the encoder module. This limitation also exists in other GNN-based models such as SACN [11].
2.3 Knowledge Graph Alignment background
2.3.1 Previous approaches
In recent years, entity alignment models based on representation learning have rapidly received widespread attention from academia and industry. These methods use low-dimensional vector representations of the entities in KGs to calculate the similarity between entities across KGs and thereby find equivalent entity pairs. They can be divided into two main categories, namely semantic matching-based models and graph neural network-based models.
a, Semantic matching-based models
All models in this class try to encode semantic information about entities into their low-dimensional vector representations. Inspired by TransE, MTransE [21] uses TransE to learn the vector representation of each single knowledge graph and then learns a linear transformation to map them into the same vector space. The model then uses the cosine similarity metric to align entity pairs. IPTransE [23] restricts pre-aligned equivalent entities to have close vector representations and then uses PTransE [49] to iteratively learn and align different KGs in a unified vector space. BootEA [24] tries to solve a more challenging problem: it considers only a small number of supervised entity pairs and then continuously selects possible entity pairs for training through an iterative method.
Another representation method is to integrate a variety of knowledge to enrich entity semantics. JAPE [22] uses TransE to represent entities and uses Skip-gram to learn attribute representations. Based on the assumption that entities with similar attributes have a greater probability of being equivalent, it uses the similarity between attributes to enhance the semantics of entities. KDCoE [50] uses a GRU to encode the description information of entities and performs collaborative training with representation learning based on relational triples to improve the alignment performance.
b, Graph Neural Network-based models
Many models have successfully adapted GNNs to solve the KGA problem. GCN-Align [51] uses a Graph Convolutional Network (GCN) to learn the vector representation of entities and, at the same time, allows two GCNs encoding different KGs to encode both relational triples and attribute triples into the representation of entities. MuGNN [27] pays attention to the sparsity of the KG, uses AMIE+ to automatically infer and complete the missing triples in the KG, constructs a denser graph, and aligns entities with a small number of pre-aligned pairs. RDGCN [25] introduces the concept of dual graphs when constructing the relationship graph between entities and enhances the discrimination of different entity network structures through the restriction of dual graphs.
2.3.2 Alignment constraints
Existing entity alignment methods for cross-lingual KGs focus on three types of constraints, namely entity consistency, relation consistency, and attribute consistency.
Trang 27a, Entity consistency
Since corresponding entities reflect the same real-world entity (e.g., a person or concept), their names should be equivalent. In Figure 1.3 (showing part of YAGO), the terms 'universite de Duke' in the French KG and 'Duke University' in the English KG both represent a university in North Carolina. Recent works have used Google Translate to check whether the names of corresponding entities in different languages have the same meaning by translating them into English [25].
b, Relation consistency
This requirement (a.k.a. the homophily rule) states that if two nodes n_1 and n_2 are closely related in one network in a structural manner (e.g., being neighbours), then their corresponding nodes n_1′ and n_2′ also have a close relation in the counterpart KG [26]. In Figure 1.3, the two entities Bill Gates and Melinda Gates are connected in the English KG; under relation consistency, their two corresponding entity nodes in the French KG are also connected. Mathematically, if p and q have a relation triple ⟨p, r, q⟩ in the source KG, and (p, p′) and (q, q′) are anchor links, then p′ and q′ also have a relation triple ⟨p′, r, q′⟩ in the target KG. Note that relation triples in KGs are directional, and this direction should also be respected.
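This rule can be checked mechanically. The following Python sketch (our own illustration, with hypothetical entity names) counts the source triples whose counterpart triple, under a given set of anchor links, is missing in the target KG:

def relation_consistency_violations(triples_s, triples_t, anchors):
    # anchors: dict mapping a source entity to its aligned target entity.
    # A triple <p, r, q> is consistent if <p', r, q'> exists in the target
    # KG with the same relation and the same direction.
    violations = 0
    for p, r, q in triples_s:
        if p in anchors and q in anchors:
            if (anchors[p], r, anchors[q]) not in triples_t:
                violations += 1
    return violations

triples_s = {("Bill Gates", "spouse", "Melinda Gates")}
triples_t = {("Bill Gates (fr)", "spouse", "Melinda Gates (fr)")}
anchors = {"Bill Gates": "Bill Gates (fr)", "Melinda Gates": "Melinda Gates (fr)"}
print(relation_consistency_violations(triples_s, triples_t, anchors))  # 0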
c, Attribute consistency
This requirement states that corresponding entities should have equivalent attributes and equivalent attribute values [52]. For example, the entity Bill Gates has an attribute triple ⟨Bill Gates, DOB, 28-10-1955⟩ in the English KG, and its counterpart is ⟨Bill Gates, Né à, 28-10-1955⟩ in the French KG. Formally, if (p, p′) is an anchor link and (p, a, v) ∈ E_s^A is an attribute triple, then there exists (p′, a′, v′) ∈ E_t^A such that a and a′ are equivalent and v and v′ are equivalent.
However, many conventional entity alignment models struggle to address all three types of consistency requirements simultaneously, since attribute imbalance (a difference in the number of attributes) between two KGs and modality inconsistency are frequently observed in real-world datasets. While there are a few notable exceptions, they often ignore the value in an attribute triple [26].
2.3.3 Incomplete knowledge graph alignment
By generalising the problem setting in related works, i-KG alignment aims to find all of the corresponding entities of two given i-KGs. Without loss of generality, we select one i-KG as the source graph and the other as the target graph, and denote them as KG_s and KG_t respectively. Note that E_s ∪ Ē_s = E_t ∪ Ē_t, which represents the complete triple facts. Then, for each entity p in the source graph KG_s, we aim to recognise its counterpart p′ (if any) in the target knowledge graph KG_t. The corresponding entities (p, p′) are also often denoted as anchor links, and existing alignment frameworks often require supervision data in the form of a set of pre-aligned anchor links, denoted by L.
Since corresponding entities reflect the same real-world entity (e.g., a person or concept), existing alignment techniques often rely on consistencies, which state that the corresponding entities should maintain similar characteristics across different KGs [26]. Entity consistency states that entities referring to the same object should exist in both KGs and have equivalent names. Relation consistency (a.k.a. the homophily rule) declares that entities should maintain their relationship characteristics (existence, type, direction). While KG alignment and completion have been studied for decades [33], [53], there is little work on jointly solving these problems together. However, doing so is indeed beneficial: missing triples ⟨h, r, t⟩ ∈ Ē in one KG can be recovered by cross-checking another KG via the alignment, which, in turn, can be boosted by the recovered links. To the best of our knowledge, this work is a first attempt to solve the joint optimization of KG alignment and completion.
CHAPTER 3 ENHANCING KNOWLEDGE GRAPH COMPLETION PERFORMANCE
3.1 Introduction
In addition to conventional KG embedding models such as TransE [54], DistMult [8], ComplEx [9], ConvE [4], ConvKB [6], and TuckER [7], recent approaches have adapted graph neural networks (GNNs) for knowledge graph completion [10]–[12], [55]. In general, vanilla GNNs are modified and utilized as an encoder module to update vector representations for entities and relations; these vector representations are then fed into a decoder module that adopts a score function (e.g., as employed in TransE, DistMult, and ConvE) to return the triple scores. Those GNN-based models, however, are still outperformed by other conventional models on some benchmark datasets [56]. To boost model performance, our motivation comes from the fact that entities and relations forming facts often co-occur frequently in news articles, texts, and documents, e.g., "Melbourne" frequently co-occurs with "Australia".
We thus propose a new effective GNN-based KG embedding model, named NoGE, to integrate co-occurrence among entities and relations in the encoder module for knowledge graph completion (our first contribution). NoGE differs from other existing GNN-based KG embedding models in two important aspects: (i) given a knowledge graph, NoGE builds a single graph that contains entities and relations as individual nodes; (ii) NoGE counts the co-occurrences of entities and relations to compute the weights of edges among nodes, resulting in a new weighted adjacency matrix. Consequently, NoGE can leverage vanilla GNNs directly on the single graph of entity and relation nodes associated with the new weighted adjacency matrix. As our second contribution, NoGE also proposes a novel form of GNNs, named Dual Quaternion Graph Neural Networks (DualQGNN), as the encoder module. NoGE then employs a score function, e.g., QuatE [57], as the decoder module to return the triple scores. As our final contribution, we conduct extensive experiments to compare our NoGE with other strong GNN-based baselines and show that NoGE outperforms these baselines as well as other up-to-date KG embedding models, obtaining state-of-the-art results on three new and difficult benchmark datasets, CoDEx-S, CoDEx-M, and CoDEx-L [58], for knowledge graph completion.
3.2 Dual quaternion background
A background on quaternions can be found in recent works [57]. We briefly provide a background on dual quaternions [59]. A dual quaternion h ∈ ℍ_d is given in the form h = q + εp, where q and p are quaternions ∈ ℍ, and ε is the dual unit with ε² = 0.
Conjugate. The conjugate h* of a dual quaternion h is defined as: h* = q* + εp*.
Addition. The addition of two dual quaternions h_1 = q_1 + εp_1 and h_2 = q_2 + εp_2 is defined as: h_1 + h_2 = (q_1 + q_2) + ε(p_1 + p_2).
Dual quaternion multiplication. The dual quaternion multiplication ⊗_d of two dual quaternions h_1 and h_2 is defined as:

$$h_1 \otimes_d h_2 = (q_1 \otimes q_2) + \epsilon\,(q_1 \otimes p_2 + p_1 \otimes q_2)$$

where ⊗ denotes the Hamilton product between two quaternions.
Norm. The norm ||h|| of a dual quaternion h is a dual number, which is usually defined as:

$$\|h\| = \sqrt{h \otimes_d h^*} = \sqrt{\|q\|^2 + 2\epsilon\,(q \bullet p)} = \|q\| + \epsilon\,\frac{q \bullet p}{\|q\|}$$
Unit dual quaternion. A dual quaternion h is unit if h ⊗_d h* = 1, with ||q||² = 1 and q • p = 0.
Matrix-vector multiplication. The dual quaternion multiplication ⊗_d of a dual quaternion matrix W^DQ = W_q^Q + εW_p^Q and a dual quaternion vector h^DQ = q^Q + εp^Q follows the multiplication rule above:

$$W^{DQ} \otimes_d h^{DQ} = (W_q^Q \otimes q^Q) + \epsilon\left(W_q^Q \otimes p^Q + W_p^Q \otimes q^Q\right)$$

where the superscripts DQ and Q denote the dual quaternion space ℍ_d and the quaternion space ℍ, respectively.
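These operations are straightforward to implement. A minimal NumPy sketch (our own, representing a quaternion as a [w, x, y, z] array and a dual quaternion as a (q, p) pair meaning q + εp):

import numpy as np

def hamilton(q1, q2):
    # Hamilton product of two quaternions [w, x, y, z].
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def dq_mul(h1, h2):
    # Dual quaternion multiplication: the eps^2 = 0 rule removes the
    # p1 (x) p2 term, matching the definition above.
    q1, p1 = h1
    q2, p2 = h2
    return (hamilton(q1, q2), hamilton(q1, p2) + hamilton(p1, q2))

identity = np.array([1.0, 0.0, 0.0, 0.0])
p = np.array([0.0, 1.0, 2.0, 3.0])
real, dual = dq_mul((identity, p), (identity, p))
print(real, dual)  # real part stays the identity; dual part is doubled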
3.3 NoGE

Figure 3.1: An illustration of our proposed NoGE
To enhance the efficiency of the encoder module, our motivation comes from the fact that valid entities and relations co-occur frequently in KGs, e.g., "Melbourne" frequently co-occurs with "city of". Given a knowledge graph KG, NoGE builds a single graph G that contains entities and relations as nodes, following the Levi graph transformation [60], as illustrated in Figure 3.1. The total number of nodes in G is the sum of the numbers of entities and relations, i.e., |V| = |E| + |R|. NoGE then builds edges among nodes based on the co-occurrence of entities and relations within the triples of KG. Formally, NoGE computes the weights of edges between nodes p and q to create a new weighted adjacency matrix A.
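A minimal sketch of this construction (our own illustration; NoGE's exact weighting scheme may differ, this only demonstrates the co-occurrence counting idea):

from collections import Counter

def cooccurrence_weights(triples):
    # Entities AND relations become nodes; edge weights count how often
    # two elements co-occur within the same triple.
    weights = Counter()
    for h, r, t in triples:
        for a, b in [(h, r), (r, t), (h, t)]:
            weights[(a, b)] += 1
            weights[(b, a)] += 1
    return weights

triples = [("Melbourne", "city of", "Australia"),
           ("Sydney", "city of", "Australia")]
adj = cooccurrence_weights(triples)
print(adj[("city of", "Australia")])  # 2: co-occur in both triples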
Dual quaternions have several advantages in modeling rotations and translations, and in efficiently representing rigid transformations [64]. Therefore, we introduce Dual Quaternion Graph Neural Networks (DualQGNN) and then utilize our DualQGNN as the encoder module in NoGE as:
$$h_p^{k+1,DQ} = f\left(\sum_{q \in N_p \cup \{p\}} a_{p,q}\left(W^{k,DQ} \otimes_d h_q^{k,DQ}\right)\right)$$

where a_{p,q} is the (p, q) entry of D^{-1/2} Ã D^{-1/2}, wherein Ã = A + I and D is the diagonal node degree matrix of Ã.
NoGE obtains the dual quaternion vector representations of entities and relations from the last DualQGNN layer of the encoder module. For each obtained dual quaternion representation, NoGE concatenates its two quaternion coefficients to produce a final quaternion representation. These final quaternion representations of entities and relations are then fed to QuatE [57], employed as the decoder module, to compute the score of (h, r, t). In QuatE's form, this score is the Hamilton product of the head embedding with the normalized relation embedding, followed by a quaternion inner product with the tail embedding:

$$f(h, r, t) = \left(v_h \otimes v_r^{\triangleleft}\right) \bullet v_t$$
We then apply the Adam optimizer [65] to train our proposed NoGE by minimizing the binary cross-entropy loss function [4]:

$$\mathcal{L} = -\sum_{(h,r,t)} \Big( l_{(h,r,t)} \log p_{(h,r,t)} + \big(1 - l_{(h,r,t)}\big) \log\big(1 - p_{(h,r,t)}\big) \Big)$$

where p_{(h,r,t)} = sigmoid(f(h, r, t)) and l_{(h,r,t)} is the label, equal to 1 for a valid triple and 0 for an invalid one.
3.4 Experimental Results
3.4.1 Experiment setup
We evaluate our proposed NoGE on the knowledge graph completion task, i.e., link prediction [54], which aims to predict a missing entity given a relation and another entity, e.g., predicting a head entity h given (?, r, t) or predicting a tail entity t given (h, r, ?). The results are computed by ranking the scores produced by the score function f on the triples in the test set.
Table 3.1: Statistics of the experimental datasets

Dataset | #Entities | #Relations | #Train | #Valid | #Test
CoDEx-M | 17,050 | 51 | 185,584 | 10,310 | 10,311
CoDEx-L | 77,951 | 69 | 551,193 | 30,622 | 30,622
b, Evaluation protocol
Following [54], for each valid test triple (h, r, t), we replace either h or t by each of all other entities to create a set of corrupted triples. We also use the "Filtered" setting protocol [54]. We rank the valid test triple together with the corrupted triples in descending order of their scores and report the mean reciprocal rank (MRR) and Hits@10 (the proportion of valid triples ranked in the top 10 predictions). The final scores on the test set are reported for the model that obtains the highest MRR on the validation set.
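For reference, the following Python sketch (our own illustration; score_fn stands for any trained scoring model) computes the filtered rank of a test triple and the two reported metrics:

import numpy as np

def filtered_rank(score_fn, test_triple, all_entities, known_triples):
    # Corrupted triples that are themselves valid (i.e., appear in the
    # train/valid/test sets) are filtered out before ranking.
    h, r, t = test_triple
    true_score = score_fn(h, r, t)
    rank = 1
    for e in all_entities:
        if e != t and (h, r, e) not in known_triples:
            if score_fn(h, r, e) > true_score:
                rank += 1
    return rank

def mrr_and_hits10(ranks):
    ranks = np.asarray(ranks, dtype=float)
    return (1.0 / ranks).mean(), (ranks <= 10).mean()

print(mrr_and_hits10([1, 3, 12, 2, 7]))  # (MRR, Hits@10) for dummy ranks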
c, Training protocol
We set the same dimension value for both the embedding size and the hidden size of the DualQGNN hidden layers, wherein we vary the dimension value in {32, 64, 128}. We fix the batch size to 1024. We employ tanh as the nonlinear activation function f_e. We use the Adam optimizer [65] to train our NoGE model for up to 3,000 epochs on CoDEx-S and CoDEx-M, and 1,500 epochs on CoDEx-L. We use a grid search to choose the number of hidden layers ∈ {1, 2, 3} and the Adam initial learning rate ∈ {1e−4, 5e−4, 1e−3, 5e−3}. To select the best checkpoint, we evaluate the MRR after each training epoch on the validation set.
Baselines' training protocol. For the other baseline models, we apply the same evaluation protocol. The training protocol is the same w.r.t. the optimizer, the hidden layers, the initial learning rate values, and the number of training epochs. In addition, we use the model-specific configuration for each baseline as follows:
• QuatE [57]: We set the batch size to 1024 and vary the embedding dimension in {64, 128, 256, 512}.
• Regarding the GNN-based baselines – R-GCN [10], CompGCN [12], SACN [11], and our NoGE variants with QGNN and GCN – we also set the same dimension value for both the embedding size and the hidden size, wherein we vary the dimension value in {64, 128, 256, 512}.
• Our NoGE variant with QGNN: This is a variant of our proposed method that utilizes QGNN [55] as the encoder module.

• Our NoGE variant with GCN: This is a variant of our proposed method that utilizes GCN [61] as the encoder module.
• CompGCN: We consider a CompGCN variant that sets ConvE [4] as its decoder module, circular-correlation as its composition operator, the kernel size to 7, and the number of output channels to 200, producing the best results as reported in the original implementation.
• SACN: For its decoder Conv-TransE, we set the kernel size to 5 and the number of output channels to 200, as used in the original implementation.
3.4.2 Main results
In Table 3.2, we report the results obtained for NoGE and other strong baselines, including QuatE [57], R-GCN [10], SACN [11], and CompGCN [12], on the CoDEx datasets.

Table 3.2: Experimental results on the CoDEx test sets
… is outperformed by ConvE on CoDEx-S and CoDEx-M. CompGCN also does not perform better than ComplEx and TuckER on the CoDEx datasets. Similarly, QuatE, utilized as our NoGE's decoder module, also produces lower results than ComplEx, ConvE, and TuckER.
When compared with QuatE and the three other GNN-based baselines, our NoGE achieves substantial improvements on the CoDEx datasets. For example, NoGE gains absolute Hits@10 improvements of 2.9%, 2.7%, and 2.2% over CompGCN on CoDEx-S, CoDEx-M, and CoDEx-L, respectively. In general, our NoGE outperforms up-to-date embedding models and can be considered the best model on the CoDEx datasets. In particular, NoGE yields new state-of-the-art Hits@10 and MRR scores on the three datasets, except for the second-best MRR on CoDEx-S.
Ablation analysis. We compute and report ablation results for three variants of NoGE in Table 3.3. In general, the results degrade when using either QGNN or GCN as the encoder module, showing the advantage of our proposed DualQGNN. The scores also degrade when not using the new weighted adjacency matrix A. Besides, our NoGE variants with QGNN and GCN also substantially outperform the three other GNN-based baselines R-GCN, SACN, and CompGCN, thus clearly showing the effectiveness of integrating our matrix A into GNNs for KG completion.

Table 3.3: Ablation results on the validation sets
CHAPTER 4 ENHANCING KNOWLEDGE GRAPH ALIGNMENT PERFORMANCE
4.1 Introduction
The problem of entity alignment for cross-lingual KGs has been studied intensively with the emergence of graph embedding techniques [19], [20]. Given two monolingual KGs, these techniques first learn low-dimensional vectors representing the entities of each KG, and the corresponding entities are then discovered based on their vector similarities. The first-generation methods of this paradigm, including MTransE [21], JAPE [22], ITransE [23], and BootEA [24], learn the embeddings under the assumption that if two entities have a relation, the distance between their respective embeddings equals the embedding of their relation. Avoiding this strict assumption [25], the second generation of embedding techniques, such as GCN-Align [26], RDGCN [25], MUGNN [27], KG-matching [28], and NAEA [29], employ graph neural networks, which encode the structural relationship based on neighbourhood information [30].
However, we argue that the above approaches overload the embedding model with unrelated objectives. On the one hand, the entity embeddings must encode the syntactic information (e.g., neighborhood, topology, degree) of each KG. At the same time, they also need to reflect the semantic alignment of entities across KGs. Some techniques, such as JAPE [22], use pre-aligned entities to remedy this issue by increasing the influence of negative samples in the loss function. Furthermore, existing models have not fully utilized the attribute information of entities (e.g., the age attribute of a person, a country's population) due to the high levels of inconsistency and linguistic differences. For example, GCN-Align considers only the types of attributes and ignores their values [26].
This chapter meets the above requirements via a unified and adaptive entity alignment model for cross-lingual KGs. In essence, our idea is to fully leverage the richness of a KG by simultaneously comparing the relational and attributional information of the entities to be aligned. The fusion of these types of information helps them complete each other and mitigates the high levels of consistency violation of each kind. To efficiently extract the relational data, we propose to use the multi-layer characteristics of graph convolutional networks (GCNs) [30] to model the relational correlation at different orders without the need for supervision data (e.g., pre-aligned entities). In terms of attributional information, we adopt the advances of machine translation (e.g., Google Translate) to efficiently reconcile the information in different languages and avoid human involvement. More specifically, we summarise our contributions as follows:
• We propose a framework called Entity Alignment with Multi-order Graph Convolutional Networks (EMGCN) for KG entity alignment with no prior knowledge. Since this framework is grounded in the late-fusion mechanism, rich KG information (e.g., relational triples, attribute types, attribute values) can be integrated regardless of the modality.
• We design a GCN-based model that exploits the rare characteristics of GCNs, including multi-order encoding and permutation immunity, to simultaneously integrate different relation-related consistency constraints. We also tailor the loss function to enforce joint and consistent learning of the embeddings of the two KGs to support their alignment and avoid having to reconcile their embedding spaces.
• We conduct experiments on real-world and synthetic KG datasets to evaluate our scheme. The results show that our framework outperforms other baselines and is also robust to various types of adversarial conditions.
eval-4.2 Overview of the Proposed Approach
Figure 4.1: Overview of EMGCN framework
4.2.1 Motivation
A KG entity alignment framework should satisfy the following requirements:
1. Consistency: Entity consistency, relational consistency, and attribute consistency should be respected, since these constraints guide the model to find precise anchor links w.r.t. the specific characteristics of KGs (e.g., name equivalence, directional relations). False positives may adversely affect the performance of downstream tasks.

2. Adaptivity: While the consistency constraints form the backbone assumption of alignment techniques, these constraints are sometimes violated in real-world datasets, e.g., when one KG possesses more entities than another. Attribute noise is also common, e.g., when an entity in the source KG has more attributes than its counterpart in the target KG or when their attribute values are stored in different formats.
Several challenges need to be addressed to satisfy these requirements. Firstly, the source and target KGs often show some inherent differences in the form of consistency violations (noise) [66], [67]. The proposed model should be immune to node permutation and robust to structural and attribute noise. Secondly, linguistic challenges arise when entity and attribute names must be unified in the same language for direct comparison. The use of a translation engine such as Google Translate is only a temporary solution, however, since translations without context are not always accurate [25]. Thirdly, a high noise level often exists in attribute information, for example, when corresponding entities do not have equivalent attributes or their values have different formats.
4.2.2 The entity alignment framework
The structure of our framework is presented in Figure 4.1. First, we forward the relational network extracted from the input KGs to a designed multi-order GCN-based model to embed the KG entities in low-dimensional vector spaces. The relational correlation of two entities is captured based on the distance between their embeddings. We then retrieve the relation alignment using the learned embeddings and retrieve the attribute alignment using a strategy based on machine translation. Finally, the alignments from the different views are combined to give the overall result. Three important functionalities must be considered here:
a, Multi-order relational-aware embedding
To integrate the relational information of entities into the framework, we adopt a GCN-based model to learn relation-aware embeddings for the entities. The model consists of several layers, each encoding the network topology at multiple orders. To train the model, we optimize a loss function equivalent to minimizing the violations of the consistency constraints. Only the relational information of the KGs is used in this step, since attribute information often has a high level of noise and may degrade the quality of the embeddings. The details of this process are explained in Section 4.3.
b, Relational Alignment
In this step, we compute the alignment output using the relational information of the embeddings from all GCN layers. In more detail, we construct a single-order alignment matrix at each layer and then apply a weighted-sum combination to the matrices to obtain the final relational alignment output, where the weights represent the importance of the layers, as sketched below. Before calculating the single-order matrices, we perform a tuning pre-process to decay the impact of noise via an iterative process. The details of this process are described in Section 4.4.
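A minimal sketch of this fusion step (our own illustration; the per-layer weights are hyperparameters, cf. Table 4.4):

import numpy as np

def fuse_alignment_matrices(single_order_mats, layer_weights):
    # Weighted sum of the single-order alignment matrices, one per
    # GCN layer; the weights encode the importance of each order.
    S = np.zeros_like(single_order_mats[0])
    for w, S_k in zip(layer_weights, single_order_mats):
        S += w * S_k
    return S

S1, S2 = np.random.rand(3, 3), np.random.rand(3, 3)  # toy similarity matrices
S = fuse_alignment_matrices([S1, S2], layer_weights=[0.5, 0.5])
print(S.argmax(axis=1))  # best target candidate for each source entity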
c, Attribute Alignment
Candidate anchor links are further solidified based on the attribute information of the entities (i.e., the attributes and their values). A dictionary of corresponding attributes is built to compute the attribute-based similarity. A value-based similarity is then calculated using a Jaccard measure, as illustrated in the sketch below. These similarities are combined with the relation similarity to produce the final alignment matrix. The details are given in Section 4.4.
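A sketch of this attribute- and value-based similarity (our own illustration, with hypothetical attribute dictionaries; values are compared as token sets with a Jaccard measure):

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def attribute_similarity(attrs_s, attrs_t, attr_dict):
    # attr_dict maps (translated) source attribute names to target ones;
    # attrs_*: dict mapping an attribute name to a set of value tokens.
    mapped = {attr_dict.get(a, a): v for a, v in attrs_s.items()}
    shared = set(mapped) & set(attrs_t)
    if not shared:
        return 0.0
    return sum(jaccard(mapped[a], attrs_t[a]) for a in shared) / len(shared)

attrs_en = {"DOB": {"28-10-1955"}, "Gender": {"male"}}
attrs_fr = {"DOB": {"28-10-1955"}, "Gender": {"male"}}  # after translation
print(attribute_similarity(attrs_en, attrs_fr, attr_dict={}))  # 1.0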
4.3 Relation-aware Multi-order Embedding
In this section, we describe our GCN-based model, which learns the relation-aware representation for entities while guaranteeing the consistency constraints.

4.3.1 GCN-based embedding model
We employ a GCN to learn the representations of the entities. Our GCN-based model consists of k layers, and each hidden feature layer simultaneously encodes the topological and attributional information using a message-passing scheme in which the hidden features in the current layer are constructed from the hidden features in the previous layers [68]. Based on the general definition of a one-layer GNN