CS224W: Analysis of Networks
http://cs224w.stanford.edu
¡ (1) Power-laws in Networks
¡ (2) Network Robustness
¡ (3) Preferential Attachment
11/8/18 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu
Which interesting graph
¡ Node degree distribution
§ What fraction of nodes has degree $k$ (as a function of $k$)?
§ Prediction from simple random graph models:
$p(k)$ = exponential function of $k$
Found in data: $P(k) \propto k^{-\alpha}$ (vs. the exponential expected based on $G_{np}$)
¡ Take a network, plot a histogram of $P(k)$ vs. $k$
[Figure: Flickr social network, n = 584,207, m = 3,555,115]
¡ Plot the same data on log-log scale:
§ If $y = x^{-\alpha}$ then $\log y = -\alpha \log x$
§ So on log-log axes a power-law looks like
a straight line of slope $-\alpha$!
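The straight-line claim can be checked numerically. The sketch below is only an illustration on synthetic, noise-free values: the exponent 2.5 and the helper name `loglog_slope` are assumptions, not taken from the Flickr data.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var

alpha = 2.5                       # illustrative exponent (assumed)
ks = range(1, 1001)
pk = [k ** -alpha for k in ks]    # exact power law, no sampling noise
slope = loglog_slope(ks, pk)
print(slope)                      # should match -alpha up to rounding error
```

On real degree histograms the tail is noisy, so a fit like this is far less reliable than on exact values.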
¡ First observed in Internet Autonomous Systems
[Faloutsos, Faloutsos and Faloutsos, 1999]
[Figure: Internet domain topology]
¡ The World Wide Web [Broder et al., 2000]
¡ Other Networks [Barabasi-Albert, 1999]
[Figure panels: power-grid, Web graph, actor collaborations]
¡ Above a certain $x$ value, the power law is
always higher than the exponential!
[Figure: power law $p(x) = c x^{-\alpha}$ vs. exponential $p(x) = c^{-x}$, plotted against $x$]
¡ Power-law vs Exponential
on log-log and semi-log (log-lin) scales
[Clauset-Shalizi-Newman 2007]
[Figure panels: log-log and semi-log plots of $p(x) = c x^{-\alpha}$ and $p(x) = c^{-x}$ against $x$]
¡ Power-law degree exponent is
¡ Definition:
Networks with a power-law tail in
their degree distribution are called
“scale-free networks”
§ Scale invariance: There is no characteristic scale
§ Scale invariance means laws do not change if scales of length,
energy, or other variables, are multiplied by a common factor
§ Scale-free function: $f(ax) = a^{\lambda} f(x)$
§ Power-law function: $f(ax) = a^{\lambda} x^{\lambda} = a^{\lambda} f(x)$
§ Log() or Exp() are not scale-free!
$f(ax) = \log(ax) = \log a + \log x = \log a + f(x)$
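A quick numeric check of the scale-free property, offered as a sketch: for a power law the ratio $f(ax)/f(x)$ is the same at every $x$, while for log and exp it depends on $x$. The exponent, the factor `a`, the sample points and the helper name `scaling_ratios` are arbitrary illustrative choices.

```python
import math

def scaling_ratios(f, a, xs):
    """Return f(a*x) / f(x) for each x; a scale-free f gives a constant ratio."""
    return [f(a * x) / f(x) for x in xs]

lam = -2.0                        # illustrative exponent (assumed)
a = 3.0                           # illustrative scaling factor (assumed)
xs = [2.0, 5.0, 10.0, 50.0]       # all above 1 so that log(x) is nonzero

power_ratios = scaling_ratios(lambda x: x ** lam, a, xs)
log_ratios = scaling_ratios(math.log, a, xs)
exp_ratios = scaling_ratios(math.exp, a, xs)

print(power_ratios)               # constant, equal to a ** lam
print(log_ratios)                 # varies with x
print(exp_ratios)                 # varies with x
```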
Many other quantities follow heavy-tailed distributions
[Clauset-Shalizi-Newman 2007]
[Photo: CMU grad-students at the G20 meeting in Pittsburgh in Sept 2009]
¡ Degrees are heavily skewed:
Distribution $P(X > x)$ is heavy tailed if:
$\lim_{x \to \infty} \frac{P(X > x)}{e^{-\lambda x}} = \infty$ for all $\lambda > 0$
¡ What is the normalizing constant?
¡ What’s the expected value of a power-law random variable?
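Both questions have a short closed-form answer under one common set of assumptions: a continuous power law $p(x) = Z x^{-\alpha}$ defined for $x \ge x_m$, with $\alpha > 1$. The symbols $Z$ (normalizing constant) and $x_m$ (smallest possible value) are introduced here for illustration. A worked sketch:

```latex
\begin{align*}
1 &= \int_{x_m}^{\infty} Z x^{-\alpha}\,dx
   = \frac{Z}{\alpha - 1}\, x_m^{-(\alpha - 1)}
   \quad\Rightarrow\quad Z = (\alpha - 1)\, x_m^{\alpha - 1}, \\
E[X] &= \int_{x_m}^{\infty} x\, Z x^{-\alpha}\,dx
   = \frac{Z}{\alpha - 2}\, x_m^{-(\alpha - 2)}
   = \frac{\alpha - 1}{\alpha - 2}\, x_m \qquad (\alpha > 2).
\end{align*}
```

The second integral converges only for $\alpha > 2$; for $\alpha \le 2$ the expected value diverges.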
¡ Power-laws have infinite moments!
§ If $\alpha \le 2$: $E[X] = \infty$
§ If $\alpha \le 3$: $\mathrm{Var}[X] = \infty$
§ Average is meaningless, as the variance is too high!
from a power-law with exponent α
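To see why the average is unreliable, one can draw samples and watch the running mean. This is a sketch under assumptions: a continuous power law with minimum value 1, sampled by inverting the CCDF. The function name `sample_power_law`, the seed and the exponents are illustrative.

```python
import random

def sample_power_law(alpha, x_min, n, rng):
    """Draw n samples from p(x) ~ x^(-alpha), x >= x_min, by inverse-transform sampling."""
    # The CCDF is P(X >= x) = (x / x_min)^(-(alpha - 1)); invert it at a uniform value in (0, 1].
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

rng = random.Random(0)            # fixed seed so the run is repeatable
for alpha in (1.8, 2.5, 3.5):
    xs = sample_power_law(alpha, 1.0, 100_000, rng)
    running = [sum(xs[:m]) / m for m in (100, 10_000, 100_000)]
    print(alpha, [round(v, 2) for v in running])
```

For exponents at or below 2 the running mean keeps drifting as more samples arrive. Between 2 and 3 it converges, but slowly, because the variance is infinite.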
Estimating $\alpha$ from data:
¡ (1) Fit a line on log-log axis using least squares:
§ Solve $\arg\min_{\alpha, b} \sum_i (\log y_i + \alpha \log x_i - b)^2$
§ BAD!
Estimating $\alpha$ from data:
¡ Plot Complementary CDF (CCDF) $P(X \ge x)$
Then the estimated $\alpha = 1 + \alpha'$,
where $-\alpha'$ is the slope of $P(X \ge x)$ on log-log axes, since $P(X \ge x) \propto x^{-(\alpha - 1)}$
Estimating $\alpha$ from data:
§ The log-likelihood of observed data $d_i$:
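A minimal sketch of the maximum-likelihood route, assuming the continuous power law $p(d) = \frac{\alpha - 1}{x_m} (d / x_m)^{-\alpha}$ for $d \ge x_m$. Its log-likelihood is $n \ln(\alpha - 1) - n \ln x_m - \alpha \sum_i \ln(d_i / x_m)$, and setting the derivative to zero gives $\hat{\alpha} = 1 + n / \sum_i \ln(d_i / x_m)$. Real degrees are integers, so this is an approximation; the names `mle_alpha` and `x_min` are illustrative.

```python
import math
import random

def mle_alpha(degrees, x_min):
    """Continuous-approximation MLE: alpha = 1 + n / sum(ln(d_i / x_min))."""
    tail = [d for d in degrees if d >= x_min]
    return 1.0 + len(tail) / sum(math.log(d / x_min) for d in tail)

# Synthetic check on samples with a known exponent (inverse-transform sampling).
rng = random.Random(0)
true_alpha, x_min = 2.5, 1.0
data = [x_min * (1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0)) for _ in range(50_000)]
alpha_hat = mle_alpha(data, x_min)
print(alpha_hat)
```

Unlike the least-squares fit, this estimator uses every observation directly and needs no binning.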
Random network (Erdos-Renyi random graph): Degree distribution is Binomial
Scale-free (power-law) network: Degree distribution is Power-law
¡ The interest in the robustness problem has
two origins:
§ Robustness of a system is an important
problem in many areas
§ Many real networks are not regular, but have
a scale-free topology
¡ How does network
connectivity change
as nodes get removed?
[Albert et al 00; Palmer et al 01]
¡ Nodes can be removed:
§ Random failure:
§ Remove nodes uniformly at random
§ Targeted attack:
§ Remove nodes in order of decreasing degree
¡ This is important for robustness of the internet
as well as epidemiology
¡ Networks with equal number of nodes and edges:
§ ER random graph
§ Scale-free network
¡ Study the properties of the network as an
increasing fraction of nodes are removed
§ Node selection:
§ Random (corresponds to random failures)
§ Nodes with largest degrees (corresponds to targeted attacks)
¡ Measures:
§ Fraction of nodes in the largest connected component
§ Average shortest path length between nodes in the
largest component
¡ Study the properties of the network as an
increasing fraction of nodes are removed
§ Node selection:
§ Random failures: select node to remove at random
§ Targeted attacks: remove nodes by decreasing degree
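The experiment can be sketched in plain Python. Everything here is an illustrative assumption rather than the setup behind the lecture's plots: the graph sizes, the 10% removal fraction, the degree-proportional growth used to build the scale-free graph, and ranking nodes by their initial degree for the targeted attack.

```python
import random
from collections import deque

def er_graph(n, m, rng):
    """G(n, m): n nodes, m distinct undirected edges chosen uniformly at random."""
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    return edges

def pa_graph(n, m, rng):
    """Scale-free graph: each new node links to m distinct earlier nodes, picked by degree."""
    edges = {(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)}  # seed clique
    stubs = [x for e in edges for x in e]    # node i appears deg(i) times
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.add((t, new))
            stubs += [t, new]
    return edges

def largest_component_fraction(n, edges, removed):
    """Fraction of the original n nodes in the largest component after removal."""
    adj = {u: [] for u in range(n) if u not in removed}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].append(v)
            adj[v].append(u)
    seen, best = set(), 0
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        queue, size = deque([s]), 0
        while queue:                         # breadth-first search over one component
            x = queue.popleft()
            size += 1
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        best = max(best, size)
    return best / n

def remove_nodes(n, edges, fraction, targeted, rng):
    """Pick nodes to delete: highest initial degree first, or uniformly at random."""
    k = int(fraction * n)
    if not targeted:
        return set(rng.sample(range(n), k))
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return set(sorted(range(n), key=lambda u: -deg[u])[:k])

rng = random.Random(0)
n, m = 2000, 2
sf = pa_graph(n, m, rng)
er = er_graph(n, len(sf), rng)               # same number of nodes and edges
results = {}
for name, g in (("ER", er), ("SF", sf)):
    for targeted in (False, True):
        removed = remove_nodes(n, g, 0.1, targeted, rng)
        results[name, targeted] = largest_component_fraction(n, g, removed)
        print(name, "targeted" if targeted else "random", round(results[name, targeted], 3))
```

The second measure on the slide, average shortest path length within the largest component, could be added with one more breadth-first search per node; it is left out to keep the sketch short.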
¡ Scale-free graphs are resilient to random failures, but
are sensitive to targeted attacks
¡ For random networks there is smaller
difference between the two
[Figure panels: random failure, targeted attack]
What proportion of random nodes must be removed
in order for the size ($S$) of the giant component to
drop to 0?
¡ Infinite scale-free networks with $\alpha < 3$ never
break down under random node failures
§ $\alpha$… degree exponent, $K$… maximum degree
[Figure: x-axis is the fraction of deleted nodes]
¡ Real networks are resilient to random failures
¡ $G_{np}$ has better resilience to targeted attacks
§ E.g., we need to remove all pages of degree >5 to disconnect
the Web. But this is a very small fraction of all web pages!
[Figure: $G_{np}$ network and scale-free network under random failures and targeted attack; x-axis: fraction of removed nodes]
¡ The first few % of
nodes removed:
§ $G_{np}$
§ SF: Scale-free
¡ Notice how targeted
attacks very quickly
disconnect the network
Model:
¡ New nodes are more likely to link to nodes that already have high degree
¡ Herbert Simon’s result:
§ Power-laws arise from “Rich get richer” (cumulative
advantage)
¡ Examples
§ Citations: new citations of a paper are
proportional to the number it already has
§ Herding: If a lot of people cite a paper, then it must be good, and
therefore I should cite it too
§ Sociology: Matthew effect, http://en.wikipedia.org/wiki/Matthew_effect
§ “For whoever has will be given more, and they will have an abundance. Whoever does not have, even what they have will be taken from them.”
§ Eminent scientists often get more credit than a comparatively unknown researcher, even if their work is similar
¡ Preferential attachment:
[de Solla Price ‘65, Albert-Barabasi ’99, Mitzenmacher ‘03]
§ Nodes arrive in order $1, 2, \ldots, n$
§ At step $j$, let $d_i$ be the degree of node $i < j$
§ A new node $j$ arrives and creates $m$ out-links
§ Prob. of $j$ linking to a previous node $i$ is
proportional to degree $d_i$ of node $i$:
$P(j \to i) = \frac{d_i}{\sum_k d_k}$
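The arrival process above can be sketched as follows. Two details are assumptions made for the sketch: the seed is a clique on $m + 1$ nodes, and the $m$ targets of a new node are forced to be distinct by rejection, which bends strict proportionality slightly. The function name `preferential_attachment` is illustrative.

```python
import random
from collections import Counter

def preferential_attachment(n, m, rng):
    """Return the degree list of a graph grown by degree-proportional attachment."""
    degree = [m] * (m + 1)                   # seed: clique on m + 1 nodes, each of degree m
    for j in range(m + 1, n):
        targets = set()
        while len(targets) < m:              # each draw picks i with prob d_i / sum_k d_k
            targets.add(rng.choices(range(j), weights=degree)[0])
        for i in targets:
            degree[i] += 1
        degree.append(m)                     # the new node j starts with its m links
    return degree

n, m = 2000, 2
degree = preferential_attachment(n, m, random.Random(0))
histogram = Counter(degree)
print("max degree:", max(degree))
print("fraction of nodes with the minimum degree:", histogram[m] / n)
```

Most nodes stay at the minimum degree while a few early nodes grow into hubs, which is the heavy tail the model is meant to produce.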
We analyze the following simple model:
¡ Nodes arrive in order $1, 2, 3, \ldots, n$
¡ When node $j$ is created it makes a
single out-link to an earlier node $i$ chosen:
§ 1) With prob. $p$, $j$ links to $i$ chosen uniformly at
random (from among all earlier nodes)
§ 2) With prob. $1 - p$, node $j$ chooses $i$ uniformly at random and links to a random node $l$ that $i$ points to
§ This is the same as saying: with prob. $1 - p$, node $j$ links to
node $l$ with prob. proportional to $d_l$ (the in-degree of $l$)
§ Our graph is directed: Every node has out-degree 1
[Mitzenmacher, ‘03]
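A small simulation of this model, as a sketch. The very first node has no earlier node to link to, so it is given a self-loop here; that is an assumption made so that every node has out-degree 1. The function name `grow` and the parameter values are illustrative.

```python
import random

def grow(n, p, rng):
    """Return in-degrees after n arrivals; node 0 starts with a self-loop (an assumption)."""
    target = [0]                  # target[j] is the single out-link of node j
    indeg = [1]                   # the self-loop gives node 0 in-degree 1
    for j in range(1, n):
        i = rng.randrange(j)      # earlier node chosen uniformly at random
        if rng.random() >= p:     # with prob 1 - p, follow i's out-link instead
            i = target[i]
        target.append(i)
        indeg.append(0)
        indeg[i] += 1
    return indeg

rng = random.Random(0)
n = 20_000
for p in (1.0, 0.5, 0.1):
    indeg = grow(n, p, rng)
    zero_fraction = sum(1 for d in indeg if d == 0) / n
    print(p, "max in-degree:", max(indeg), "fraction with in-degree 0:", round(zero_fraction, 3))
```

With $p = 1$ every link is uniform and no large hub forms; lowering $p$ shifts weight to the "follow a link" step and the maximum in-degree grows sharply.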
¡ Claim: The described model generates
networks where the fraction of nodes with
in-degree $k$ scales as:
$P(d_i = k) \propto k^{-(1 + \frac{1}{q})}$, where $q = 1 - p$
So we get power-law
degree distribution
with exponent:
$\alpha = 1 + \frac{1}{1 - p}$
¡ Preferential attachment gives
power-law in-degrees!
¡ Intuitively reasonable process
¡ Can tune model parameter $p$ to get the
observed exponent
§ On the web, P[node has in-degree $d$] ~ $d^{-2.1}$
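As a worked example of the tuning step, assume the model's exponent is $\alpha = 1 + \frac{1}{1 - p}$ (the standard result for this model). Matching the web's in-degree exponent of 2.1 then gives:

```latex
\[
2.1 = 1 + \frac{1}{1 - p}
\;\Longrightarrow\; 1 - p = \frac{1}{1.1} \approx 0.909
\;\Longrightarrow\; p \approx 0.09 .
\]
```

So under this assumption roughly one link in eleven would be placed uniformly at random and the rest preferentially.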
¡ Preferential attachment is not so good at
predicting network structure
§ Age-degree correlation
§ Node degree is proportional to its age
§ Solution: Node fitness (virtual degree)
§ Links among high degree nodes:
§ On the web nodes sometimes avoid linking to each other
¡ Further questions:
§ What is a reasonable model for how people
sample network nodes and link to them?
§ Short random walks
THE ORIGINS OF PREFERENTIAL ATTACHMENT (SECTION 5.8)
Given the key role preferential attachment plays in the evolution of
real networks, we must ask, where does it come from? The question can be
broken down to two narrower issues:
Why does $\Pi(k)$ depend on $k$?
Why is the dependence of $\Pi(k)$ linear in $k$?
In the past decade we witnessed the emergence of two philosophically
different answers to these questions. The first views preferential
attachment as the interplay between random events and some structural
property of a network. These mechanisms do not require global knowledge of
the network but rely on random events, hence we will call them local or
random mechanisms. The second assumes that each new node or link
balances conflicting needs, hence they are preceded by a cost-benefit
analysis. These models assume familiarity with the whole network and rely
on optimization principles, prompting us to call them global or optimized
mechanisms. In this section we discuss both approaches.
LOCAL MECHANISMS
Several network models do not have preferential attachment
explicitly built into them, as we did in the case of the Barabási-Albert model.
Rather, they generate preferential attachment. Next we discuss two such
models, helping us to derive $\Pi(k)$ and understand its origins.
Link Selection Model
The link selection model offers perhaps the simplest example of a local
mechanism capable of generating preferential attachment [16]. It is
defined as follows (Figure 5.13):
• Growth: At each time step we add a new node to the network.
• Link Selection: We select a link at random and connect the new node to
one of the two nodes at the two ends of the selected link.
[Figure 5.13 (b): In this case the new node connected to the node at the right end of the link.]
¡ Link selection model: perhaps the
simplest example of a local or random mechanism capable of generating
preferential attachment. The new node connects to one of the two
nodes at the two ends of a randomly selected link
¡ This simple mechanism generates preferential attachment
§ Why? Because a node is picked with prob.
proportional to the number of edges it has
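The link selection mechanism fits in a few lines. This sketch assumes the network starts from a single link between two nodes; the function name `link_selection` and the network size are illustrative.

```python
import random

def link_selection(n, rng):
    """Grow a network by attaching each new node to one end of a random link."""
    edges = [(0, 1)]                         # seed: a single link (assumed start)
    degree = [1, 1]
    for new in range(2, n):
        u = rng.choice(rng.choice(edges))    # random link, then one of its two ends
        edges.append((u, new))
        degree[u] += 1
        degree.append(1)
    return edges, degree

edges, degree = link_selection(5000, random.Random(0))
print("max degree:", max(degree), "mean degree:", sum(degree) / len(degree))
```

A node of degree $k$ sits at the end of $k$ links, so it is reached with probability $k / (2L)$ when $L$ links exist. No step in the code looks at degrees, yet the choice is degree-proportional.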
Copying model:
¡ (a) Random Connection: with prob. $p$ the new node links to
a random node $u$
¡ (b) Copying: With prob. $1 - p$ randomly choose an outgoing link
of node $u$ and connect the new node to the selected link's target
§ The new node “copies” one of the links of an earlier node
Analysis of the copying model:
¡ (a) The probability of selecting a node is $1/N$
¡ (b) is equivalent to selecting a node linked to a
randomly selected link. The probability of selecting a degree-$k$ node through the copying process of step (b)
is $k/(2L)$ for undirected networks
¡ Again, the likelihood that the new node will connect to
a degree-$k$ node follows preferential attachment
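Combining the two steps makes the linear dependence on $k$ explicit. Here $p$ is the random-connection probability of the copying model, $N$ the number of nodes and $L$ the number of links; writing the attachment probability as $\Pi(k)$ follows the Section 5.8 notation and is an assumption for this slide.

```latex
\[
\Pi(k) = \underbrace{\frac{p}{N}}_{\text{step (a)}}
       + \underbrace{(1 - p)\,\frac{k}{2L}}_{\text{step (b)}}
\]
```

The first term is the same for every node; the second grows linearly with $k$, which is the preferential attachment component.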
Examples:
¡ Social networks: Copy your friend’s friends.
¡ Citation Networks: Copy references from papers we
read
¡ Protein interaction networks: gene duplication
¡ Copying mechanism (directed network)
§ Select a node and an edge of this node
§ Attach to the endpoint of this edge
¡ Walking on a network (directed network)
§ The new node connects to a node, then to every first, second, … neighbor of this node
§ Select an edge and attach to both endpoints of this edge
¡ Node duplication
§ Duplicate a node with all its edges
§ Randomly prune edges of new node
Average path length in scale-free networks as a function of the degree exponent $\alpha$:
§ $\alpha = 2$: const. Size of the biggest hub is of order $O(n)$. Most nodes can be connected within two steps, thus the average path length will be independent of the network size $n$.
§ $2 < \alpha < 3$: $\frac{\log \log n}{\log(\alpha - 1)}$. The avg. path length increases slower than logarithmically with $n$. In $G_{np}$ all nodes have comparable degree, thus most paths will have comparable length. In a scale-free network the vast majority of the paths go through the few high-degree hubs, reducing the distances between nodes.
§ $\alpha = 3$: $\frac{\log n}{\log \log n}$. Some models produce $\alpha = 3$. This was first derived by Bollobas et al. for the network diameter in the context of a dynamical model, but it holds for the average path length as well.
§ $\alpha > 3$: $\log n$. The second moment of the distribution is finite, thus in many ways the network behaves as a random network. Hence the average path length follows the result that we derived for the random network model earlier.
[Figure: degree exponents of real networks (internet, actor collaboration, metabolic, citation), marking where the scale-free behavior is relevant and the regime full of anomalies]
¡ (1) Power-laws in Networks
¡ (2) Network Robustness
¡ (3) Preferential Attachment
¡ Protein interactome: A protein-protein
interaction network of a species:
§ Nodes: Species’ proteins
§ Edges: Physical protein-protein interactions (PPI)
¡ Tree of life: Evolutionary history of species:
§ Phylogenetic tree calculated based on similarity of gene sequence information between species:
§ Units: nucleotide substitutions per site
§ Evolutionary time of a species: total branch length from the root to the corresponding leaf in the tree
§ Ancestral species have gone
extinct or evolved into
present-day species
§ Older interactomes are lost
[Zitnik et al., bioRxiv 454033, 2018]
¡ Protein failure can occur through:
§ Removal of a protein (e.g., nonsense mutation)
§ Disruption of a PPI (e.g., environmental factors,
such as availability of resources)
¡ Resilience is a critical interactome property:
§ Breakdown of proteins affects the exchange of any
biological information between proteins in a cell
§ Protein failures can fragment the interactome and
lead to cell death and disease
[Zitnik et al., bioRxiv 454033, 2018]
¡ Questions for today:
§ How do interactomes change through evolution?
§ How does natural selection shape the interactomes?
§ How do changes in these networks impact species?
¡ Approach:
§ Define a network resilience measure
§ Use the measure to study resilience of interactomes
§ Find a network mechanism of resilience
[Zitnik et al., bioRxiv 454033, 2018]
[Zitnik et al., bioRxiv 454033, 2018]
¡ Fragmentation of the network upon node removal:
§ Entropy $H$ on a set of isolated clusters: $H = -\sum_i p_i \log p_i$
§ $p_i = |C_i| / N$ is the proportion of nodes that belong to cluster $C_i$
§ Probability of seeing a node from $C_i$ when sampling one node from the fragmented network
§ $H$ quantifies uncertainty in predicting the cluster identity of an individual node
taken at random from the network
§ Shannon’s diversity index: Normalize w.r.t. network size
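The fragmentation measure can be sketched as below, given the sizes of the clusters left after node removal. Dividing by $\log N$ is an assumed way of normalizing with respect to network size; it maps a single connected cluster to 0 and a fully shattered network to 1. The function names are illustrative.

```python
import math

def cluster_entropy(cluster_sizes):
    """Shannon entropy H = -sum_i p_i log p_i with p_i = |C_i| / N."""
    n = sum(cluster_sizes)
    # log(n / c) equals -log(p_i); written this way a single cluster gives exactly 0.0.
    return sum((c / n) * math.log(n / c) for c in cluster_sizes if c > 0)

def normalized_entropy(cluster_sizes):
    """H divided by log N (assumed normalization), so the value lies in [0, 1]."""
    n = sum(cluster_sizes)
    return cluster_entropy(cluster_sizes) / math.log(n) if n > 1 else 0.0

print(normalized_entropy([10]))         # one connected cluster
print(normalized_entropy([1] * 10))     # every node isolated
print(normalized_entropy([6, 3, 1]))    # partial fragmentation
```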