11/15/18 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu
Trang 1CS224W: Analysis of Networks
http://cs224w.stanford.edu
Trang 2¡ Main question today: Given a network with
labels on some nodes, how do we assign
labels to all other nodes in the network?
¡ Example: In a network, some nodes are
fraudsters and some nodes are fully trusted.
How do you find the other fraudsters and
trustworthy nodes?
Trang 3¡ Collective classification: the idea of assigning
labels to all nodes in a network together
¡ Intuition: correlations exist in networks
Trang 4¡ Individual behaviors are correlated in a network
Trang 5
(Easley and Kleinberg, 2010)
Trang 6¡ How to leverage this correlation observed in
networks to help predict user attributes or
interests?
Trang 7¡ Similar entities are typically close together or directly connected:
§ “Guilt-by-association”: If I am connected to a
node with label X, then I am likely to have label X
as well.
§ Example: Malicious/benign web pages:
malicious web pages link to one another to increase visibility, look credible, and rank
higher in search engines
Trang 8¡ Classification label of an object O in network
may depend on:
§ Features of O
§ Labels of the objects in O’s neighborhood
§ Features of objects in O’s neighborhood
Trang 9Given: a graph with labels on a few nodes
Find: class (red/green) for the rest of the nodes
Assuming: networks have homophily
Trang 10¡ Let $W$ be an $n \times n$ (weighted) adjacency matrix over $n$ nodes
¡ Let $Y \in \{-1, 0, 1\}^n$ be a vector of labels:
§ 1: positive node, known to be involved in a gene
function/biological process
§ -1: negative node
§ 0: unlabeled node
Trang 11¡ Intuition: simultaneous classification of
interlinked objects using correlations
¡ Several applications
Trang 12¡ Markov Assumption: the label $Y_i$ of one node $i$ depends on the labels of its neighbors $N_i$:
$P(Y_i \mid i) = P(Y_i \mid N_i)$
¡ Collective classification involves 3 steps:
Local Classifier, Relational Classifier, Collective Inference
Trang 13Local Classifier: used for initial label assignment
§ Predicts label based on node attributes/features
§ Classical classification learning
§ Does not employ network information
Relational Classifier: uses the correlations between neighboring nodes
• Learns a classifier that labels one node from the labels and/or attributes
of its neighbors
• Network information is used
Collective Inference: propagates the correlation
• Apply the relational classifier to each node iteratively
• Iterate until the inconsistency between neighboring labels is minimized
• Network structure substantially affects the final prediction
Trang 14¡ Exact inference is practical only when the
network satisfies certain conditions
Trang 15¡ How to predict the labels $Y_i$ for the nodes $i$ in
yellow?
¡ Each node $i$ has a feature vector $f_i$
¡ Labels for some nodes are given (+ for green, - for blue)
¡ Task: find $P(Y_i)$ given all features and the network
Trang 16¡ Basic idea: the class probability of $Y_i$ is a weighted
average of the class probabilities of its neighbors.
¡ For labeled nodes, initialize with ground-truth $Y$
labels
¡ For unlabeled nodes, initialize $Y$ uniformly
¡ Update all nodes in a random order until convergence
or until the maximum number of iterations is reached
Trang 17¡ Repeat for each node $i$ and label $c$:
$P(Y_i = c) = \frac{1}{|N_i|} \sum_{(i,j) \in E} W(i,j) \, P(Y_j = c)$
¡ $W(i,j)$ is the edge strength from $i$ to $j$
¡ $|N_i|$ is the number of neighbors of $i$
¡ Challenges: the model cannot use node attributes
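The update described above can be sketched in a few lines of Python. This is a minimal illustration for the binary case, not the lecture's reference implementation: the function name `relational_classifier`, the use of `NaN` to mark unlabeled nodes, and the stopping tolerance are all assumptions made for the example. It also normalizes by the total edge weight of a node, which equals the neighbor count for an unweighted graph.

```python
import numpy as np

def relational_classifier(W, y, max_iter=100, tol=1e-6, seed=0):
    """Estimate P(Y_i = 1) for every node.

    W: (n, n) non-negative adjacency matrix (assumed symmetric here).
    y: length-n array with 1.0 / 0.0 for labeled nodes and NaN for unlabeled ones.
    """
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    labeled = ~np.isnan(y)
    # Labeled nodes keep their ground truth; unlabeled nodes start uniform (0.5).
    P = np.where(labeled, y, 0.5)
    unlabeled = np.flatnonzero(~labeled)
    rng = np.random.default_rng(seed)
    for _ in range(max_iter):
        max_change = 0.0
        # Visit unlabeled nodes in a random order, updating in place.
        for i in rng.permutation(unlabeled):
            total = W[i].sum()
            if total == 0:  # isolated node: nothing to average
                continue
            new_p = W[i] @ P / total  # weighted average of neighbor probabilities
            max_change = max(max_change, abs(new_p - P[i]))
            P[i] = new_p
        if max_change < tol:
            break
    return P
```

For example, on the path 0-1-2 with node 0 labeled positive and node 2 labeled negative, the middle node ends up at 0.5, since it averages one positive and one negative neighbor.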
Trang 18Initialization: all labeled nodes are set to their labels
and all unlabeled nodes are initialized uniformly
[Figure: example graph; a labeled negative node has P(Y = 1) = 0 and each unlabeled node has P(Y = 1) = 0.5]
Trang 19¡ Update for the 1st iteration:
[Figure: example graph with node values P(Y = 1) = 0 and P(Y = 1) = 0.5]
Trang 20¡ Update for the 1st iteration:
[Figure: example graph with node value P(Y = 1) = 1]
Trang 21¡ Update for the 1st iteration:
Trang 22P(Y = 1) = 0.73
P(Y = 1) = 0.91 P(Y = 1) = 1.00
Trang 23P(Y = 1) = 0.85
P(Y = 1) = 0.95
P(Y = 1) = 1.00
Trang 24P(Y = 1) = 0.86
P(Y = 1) = 0.95 P(Y = 1) = 1.00
Trang 25P(Y = 1) = 0.86
P(Y = 1) = 0.95 P(Y = 1) = 1.00
Trang 26¡ All scores stabilize after 5 iterations:
§ Nodes 5, 8, 9 are + ($P(Y_i = 1) > 0.5$)
Trang 27¡ Relational classifiers
¡ Iterative classification
Trang 29¡ Relational classifiers do not use node
attributes. How can one leverage them?
¡ Main idea of iterative classification: classify
node $i$ based on its attributes as well as the labels
of its neighbor set $N_i$
¡ Create a flat vector $a_i$ for each node $i$
¡ Train a classifier to classify using $a_i$
¡ Neighbor information can be aggregated into $a_i$ using, e.g., proportion, mean, exists, etc.
Trang 30¡ Bootstrap phase
§ Convert each node $i$ to a flat vector $a_i$
§ Use local classifier $f(a_i)$ (e.g., SVM, kNN, …) to compute best value for $Y_i$
¡ Iteration phase: iterate until convergence
§ Repeat for each node $i$:
§ Update node vector $a_i$
§ Update label $Y_i$ to $f(a_i)$. This is a hard assignment.
§ Iterate until class labels stabilize or the maximum number of
iterations is reached
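The two phases above can be sketched as follows. This is an illustrative sketch under several assumptions, not the lecture's implementation: `NearestCentroid` is a tiny hypothetical stand-in for the local classifier (the slides mention SVM or kNN), labels are binary with -1 marking unknown nodes, and the flat vector $a_i$ is taken to be the node's attributes plus the proportion of neighbors currently labeled 1.

```python
import numpy as np

class NearestCentroid:
    """Hypothetical stand-in classifier: predicts the class with the closest mean."""
    def fit(self, A, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([A[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, A):
        dist = ((A[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[dist.argmin(axis=1)]

def flat_vector(i, X, neighbors, labels):
    """a_i: attributes of node i plus the proportion of neighbors labeled 1."""
    nbrs = neighbors[i]
    frac = np.mean([labels[j] for j in nbrs]) if nbrs else 0.5
    return np.append(X[i], frac)

def iterative_classification(X, neighbors, y, max_iter=20):
    """X: (n, d) attributes; neighbors: dict node -> list; y: 0/1, or -1 if unknown."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    known = y >= 0
    labels = y.copy()
    # Bootstrap phase: an attribute-only classifier assigns initial labels.
    local = NearestCentroid().fit(X[known], y[known])
    labels[~known] = local.predict(X[~known])
    # Train the classifier f on the flat vectors of the labeled nodes.
    train_idx = np.flatnonzero(known)
    A_train = np.array([flat_vector(i, X, neighbors, labels) for i in train_idx])
    f = NearestCentroid().fit(A_train, y[known])
    # Iteration phase: recompute a_i and hard-assign Y_i = f(a_i) until stable.
    for _ in range(max_iter):
        changed = False
        for i in np.flatnonzero(~known):
            a_i = flat_vector(i, X, neighbors, labels)
            new_label = f.predict(a_i[None, :])[0]
            if new_label != labels[i]:
                labels[i] = new_label
                changed = True
        if not changed:
            break
    return labels
```

In a small graph where an unlabeled node has attributes slightly closer to class 0 but both of its neighbors are labeled 1, the bootstrap step assigns 0 and the iteration phase flips it to 1, which is the behavior the slide describes.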
Trang 31¡ $w_1, w_2, w_3, \ldots$ represent the presence of words
¡ Baseline: train a classifier (e.g., k-NN) to
classify pages based on words
[Figure: example pages with the same words but different link structure. Ground truth: B. The word-based classifier gives the same label A to both, which is wrong.]
Can we use links to improve prediction?
Trang 32¡ $I_A = 1$ if at least one of the incoming pages is labelled A.
Similar definitions for $I_B$, $O_A$, and $O_B$
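As a small illustration of these indicator features, the sketch below computes them for one page. The function name `link_indicators` and the dictionary-based graph representation are assumptions made for the example; $O_A$ and $O_B$ are taken to be the analogous indicators over outgoing links.

```python
def link_indicators(node, in_links, out_links, labels, classes=("A", "B")):
    """Return {I_c, O_c}: 1 if at least one incoming/outgoing page has label c.

    in_links / out_links: dict page -> list of pages linking to / linked from it.
    labels: dict page -> current label (pages without a label are ignored).
    """
    feats = {}
    for c in classes:
        feats[f"I_{c}"] = int(any(labels.get(j) == c for j in in_links.get(node, [])))
        feats[f"O_{c}"] = int(any(labels.get(j) == c for j in out_links.get(node, [])))
    return feats
```

These four values would be appended to the word vector of the page to form its flat vector, and recomputed whenever neighbor labels change.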
Trang 35Use the trained word-vector classifier to bootstrap on the test set
Trang 41REV2: Fraudulent User Prediction in Rating Platforms, Kumar et al., ACM International Conference on Web Search and Data Mining (WSDM), 2018
Trang 42¡ Review sites are an attractive target for spam:
a +1 star increase in rating increases revenue by 5-9%!
Trang 43¡ Behavioral analysis
session history, etc.
misspell, many agreement words, …
¡ Easy to fake!
¡ Hard to fake: graph structure
reviews, stores
Trang 44¡ Input: bipartite rating
graph as a weighted
signed network:
§ Edges: rating scores
between -1 and +1
Trang 45¡ Basic idea: users, products, and ratings have
quality scores:
§ Users have fairness scores
§ Products have goodness scores
§ Ratings have reliability scores
Trang 46[Figure: rating graph with values for all nodes and edges; each product has a goodness score and each user has a fairness score]
Trang 47¡ Fixing goodness and reliability, fairness is
updated as:
$F(u) = \frac{\sum_{(u,p) \in \mathrm{Out}(u)} R(u,p)}{|\mathrm{Out}(u)|}$
Trang 48¡ Fixing fairness and reliability, goodness is
updated as:
$G(p) = \frac{\sum_{(u,p) \in \mathrm{In}(p)} R(u,p) \cdot \mathrm{score}(u,p)}{|\mathrm{In}(p)|}$
Trang 49¡ Fixing fairness and goodness, reliability is
updated as:
$R(u,p) = \frac{1}{\gamma_1 + \gamma_2} \left( \gamma_1 F(u) + \gamma_2 \left( 1 - \frac{|\mathrm{score}(u,p) - G(p)|}{2} \right) \right)$
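The three alternating updates can be sketched as below. The exact update forms are an assumption on my part (a simplified version of the REV2 updates): fairness is the mean reliability of a user's ratings, goodness is the reliability-weighted mean score of a product's ratings, and reliability mixes the rater's fairness with how closely the score agrees with the product's goodness. With both gammas set to 1, these forms give G(p) = 0.67, R(r) = 0.58, and F(u) = 0.92 for a product with five +1 ratings and one -1 rating, which matches the values in the slide example. The function name `rev2`, the update order, and the stopping rule are illustrative choices.

```python
def rev2(ratings, gamma1=1.0, gamma2=1.0, max_iter=100, tol=1e-6):
    """ratings: list of (user, product, score) with score in [-1, 1].

    Assumes at most one rating per (user, product) pair.
    Returns (F, G, R): fairness per user, goodness per product, reliability per rating.
    """
    users = {u for u, _, _ in ratings}
    products = {p for _, p, _ in ratings}
    # Start from the best possible scores.
    F = dict.fromkeys(users, 1.0)
    G = dict.fromkeys(products, 1.0)
    R = {(u, p): 1.0 for u, p, _ in ratings}
    for _ in range(max_iter):
        old = (dict(F), dict(G), dict(R))
        # Goodness: reliability-weighted mean of the scores a product received.
        for p in products:
            received = [(u, s) for u, q, s in ratings if q == p]
            G[p] = sum(R[(u, p)] * s for u, s in received) / len(received)
        # Reliability: rater fairness combined with agreement with the product's goodness.
        for u, p, s in ratings:
            agreement = 1.0 - abs(s - G[p]) / 2.0
            R[(u, p)] = (gamma1 * F[u] + gamma2 * agreement) / (gamma1 + gamma2)
        # Fairness: mean reliability of the ratings a user gave.
        for u in users:
            given = [R[(v, p)] for v, p, _ in ratings if v == u]
            F[u] = sum(given) / len(given)
        change = max(abs(new[k] - prev[k])
                     for new, prev in zip((F, G, R), old) for k in new)
        if change < tol:
            break
    return F, G, R
```

Each pass touches every rating a constant number of times in a real implementation that indexes ratings by user and product; the list scans here are kept only for brevity.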
Trang 51[Figure: example rating graph with F(u) = 1]
Trang 52[Figure: example rating graph with R(r) = 0.58 and G(p) = 0.67]
Both gamma values are set to 1
Trang 53[Figure: example rating graph with F(u) = 0.92 and F(u) = 0.58, R(r) = 0.92 and R(r) = 0.58]
Trang 56¡ Low fairness users = Fraudsters
Flipkart were real fraudsters
Trang 57¡ Multiple iterations, but linear scalability
Trang 59¡ Relational classifiers
¡ Iterative classification
¡ Loopy belief propagation
Trang 60¡ Belief propagation is used to estimate marginals (beliefs) or the most likely states of all variables (nodes)
¡ Neighboring nodes “talk” to each other, passing messages
Trang 61Task: Count the number of nodes in a graph*
Condition: Each node can only interact (pass
messages) with its neighbors
Example: straight line graph
(adapted from MacKay (2003) textbook)
* The graph cannot have loops. Explanation later.
Trang 62[Figure: nodes in a line pass messages forward: “1 before you”, “2 before you”, “3 before you”, “4 before you”, “5 before you”; each node adds “there's 1 of me”]
Task: Count the number of nodes in a graph
Condition: Each node can only interact (pass messages) with its
neighbors
Solution: Each node listens to the message from its neighbor, updates
it, and passes it forward
Trang 63[Figure: a node in the line receives “3 behind you” and “2 before you”]
Each node only sees incoming messages
Trang 64[Figure: a node receives “4 behind you” and “1 before you”, adds “there's 1 of me”, and only sees its incoming messages]
Trang 65[Figure: tree; a node receives “7 here” and “3 here”, adds “1 of me”, and reports “11 here (= 7+3+1)”]
Each node receives reports from all branches of the tree
Trang 66[Figure: tree; “3 here” and “3 here” give “7 here (= 3+3+1)”]
Trang 67[Figure: tree; “7 here” and “3 here” give “11 here (= 7+3+1)”]
Trang 69Each node receives reports from all branches of the tree
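The counting scheme from these slides can be sketched as a short recursive function. This is an illustration only: the names `branch_size` and `count_nodes` and the adjacency-dict representation are assumptions, and the recursion relies on the graph being a tree, since on a graph with loops it would never terminate.

```python
def branch_size(neighbors, src, dst):
    """Message from src to dst: the number of nodes on src's side of the edge.

    src counts itself ("1 of me") and adds the reports it hears from all of
    its other neighbors, ignoring the neighbor it is reporting to.
    """
    return 1 + sum(branch_size(neighbors, k, src) for k in neighbors[src] if k != dst)

def count_nodes(neighbors, node):
    """Total node count as seen from `node`: itself plus the reports from all branches."""
    return 1 + sum(branch_size(neighbors, k, node) for k in neighbors[node])
```

A node with one branch reporting 7 and another reporting 3 obtains 7 + 3 + 1 = 11, as in the figure; every node in the tree arrives at the same total.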
Trang 70What message will $i$ send to $j$?
- It depends on what $i$ hears
from its neighbors $k$
- Each neighbor $k$ passes a
message to $i$: its belief about the
state of $i$
Trang 71¡ Label-label potential matrix $\psi$: dependency
between a node and its neighbor.
$\psi(Y_i, Y_j)$ equals the probability of a node $i$ being in
state $Y_i$ given that it has a neighbor $j$ in state $Y_j$
¡ Prior belief $\phi$: $\phi_i(Y_i)$ is the probability of node $i$
being in state $Y_i$
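Trang 72
1. Initialize all messages to 1
2. Repeat for each node:
$m_{i \to j}(Y_j) = \alpha \sum_{Y_i \in \mathcal{L}} \psi(Y_i, Y_j) \, \phi_i(Y_i) \prod_{k \in N_i \setminus j} m_{k \to i}(Y_i)$
The factors are, in order: the label-label potential $\psi$, the prior $\phi_i$, and all messages from neighbors; $\alpha$ is a normalization constant and $\mathcal{L}$ is the set of all states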
Trang 73After convergence:
$b_i(Y_i) = \alpha \, \phi_i(Y_i) \prod_{j \in N_i} m_{j \to i}(Y_i)$
where $b_i(Y_i)$ is $i$’s belief of being in state $Y_i$, computed from the prior $\phi_i$ and all messages from neighbors
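The message and belief computations can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions rather than a definitive implementation: the function name `loopy_bp` is hypothetical, one potential matrix `psi[y_i, y_j]` is shared by all edges, all messages are updated synchronously from the previous round, and every message is normalized to sum to 1.

```python
import numpy as np

def loopy_bp(edges, psi, phi, max_iter=50, tol=1e-6):
    """edges: undirected (i, j) pairs over nodes 0..n-1.
    psi: (L, L) label-label potential, indexed psi[y_i, y_j].
    phi: (n, L) prior beliefs. Returns (n, L) normalized beliefs.
    """
    psi = np.asarray(psi, dtype=float)
    phi = np.asarray(phi, dtype=float)
    n, L = phi.shape
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # One message per directed edge, initialized to all ones.
    msgs = {(i, j): np.ones(L) for i in nbrs for j in nbrs[i]}
    for _ in range(max_iter):
        new = {}
        for (i, j) in msgs:
            # Prior of i times all messages into i, except the one coming from j.
            incoming = phi[i].copy()
            for k in nbrs[i]:
                if k != j:
                    incoming *= msgs[(k, i)]
            m = psi.T @ incoming  # sum over y_i of psi[y_i, y_j] * incoming[y_i]
            new[(i, j)] = m / m.sum()
        delta = max((np.abs(new[e] - msgs[e]).max() for e in msgs), default=0.0)
        msgs = new
        if delta < tol:
            break
    # Belief of a node: its prior times all incoming messages, normalized.
    beliefs = phi.copy()
    for (src, dst), m in msgs.items():
        beliefs[dst] *= m
    return beliefs / beliefs.sum(axis=1, keepdims=True)
```

On a tree the result is exact. For instance, with two connected nodes, a prior of (0.8, 0.2) on the first, a uniform prior on the second, and a potential that favors equal labels with weight 0.9, the second node's belief becomes (0.74, 0.26).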
Trang 74¡ What if our graph has cycles?
¡ Messages from different subgraphs are
no longer independent!
¡ But we can still run BP:
it's a local algorithm, so it doesn't “see” the cycles
Trang 75• Messages loop around and around:
2, 4, 8, 16, 32, ... More and more convinced that these variables are T!
• BP incorrectly treats this message as
separate evidence that the variable
is T
• Multiplies these two messages as if they were independent
• But they don’t actually come from
independent parts of the graph
• One influenced the other (via a cycle)
This is an extreme example. Often in practice, the cyclic influences are weak (as cycles are long or include at least one weak correlation).
Trang 76¡ Advantages:
§ General: can apply to any
form of potentials (higher order than pairwise)
¡ Challenges:
§ Convergence is not guaranteed,
especially if many closed loops
¡ Potential functions (parameters)
Trang 77Netprobe: A Fast and Scalable System for
Fraud Detection in Online Auction Networks
Pandit et al., World Wide Web conference 2007
Trang 78¡ Auction sites: attractive target for fraud
Complaint Center in U.S in 2006
Trang 79¡ Insufficient solution to look at individual
features: user attributes, geographic
locations, login times, session history, etc.
¡ Hard to fake : graph structure
¡ Main question : how do fraudsters interact
with other users and among each other?
complex relations?
Trang 80¡ Each user has a reputation score
Trang 81¡ Do they boost each other’s
reputation?
§ No, because if one is caught,
all will be caught
¡ They form near-bipartite
cores (2 roles)
§ Accomplice: trades with
honest, looks legit
§ Fraudster: trades with
accomplice, fraud with honest
Trang 82¡ How to find near-bipartite cores? How to find roles ( honest , accomplice , fraudster )?
§ Use belief propagation!
Trang 83Initialize all nodes as unbiased
Trang 84Initialize all nodes as unbiased
At each iteration, for each node, compute messages
to its neighbors
Trang 85Initialize all nodes as unbiased
At each iteration, for each node, compute messages
to its neighbors
Continue till convergence
Trang 86P(associate)
P(honest)
Trang 87¡ Three collective classification algorithms:
¡ Relational classifiers
§ Weighted average of neighborhood properties
§ Cannot take node attributes while labeling
¡ Iterative classification
§ Update each node’s label using own and neighbor’s labels
§ Can consider node attributes while labeling
¡ Loopy belief propagation
§ Message passing to update each node’s belief of itself
based on neighbors’ beliefs