14 network robustness and preferential attachment

[Figure: Random network (Erdos-Renyi random graph), degree distribution is Binomial; scale-free (power-law) network, degree distribution is Power-law]

Jure Leskovec, Stanford

CS224W: Analysis of Networks

http://cs224w.stanford.edu


¡ (1) Power-laws in Networks

¡ (2) Network Robustness

¡ (3) Preferential Attachment

11/8/18 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu

Which interesting graph

¡ Node degree distribution

§ What fraction of nodes has degree k (as a function of k)?

§ Prediction from simple random graph models:

p(k) = exponential function of k

[Figure: degree distribution expected based on G_np vs. found in data]

Found in data: $P(k) \propto k^{-\alpha}$

¡ Take a network, plot a histogram of P(k) vs k

Flickr social network

n = 584,207, m = 3,555,115

¡ Plot the same data on log-log scale:

Flickr social network

n = 584,207, m = 3,555,115

If $y = x^{-\alpha}$ then $\log y = -\alpha \log x$

So on a log-log axis a power-law looks like a straight line of slope $-\alpha$!
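As a quick numerical check of this identity, the sketch below (our own illustration; α = 2.5 and c = 3 are arbitrary values) fits a least-squares line to log y against log x for points lying exactly on a power law and recovers slope −α.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var

# y = c * x**(-alpha); the constant c only shifts the intercept, not the slope.
alpha, c = 2.5, 3.0
xs = [1, 2, 5, 10, 50, 100]
ys = [c * x ** -alpha for x in xs]
print(loglog_slope(xs, ys))  # -2.5 up to floating-point rounding
```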


¡ First observed in Internet Autonomous Systems

[Faloutsos, Faloutsos and Faloutsos, 1999]

Internet domain topology


¡ The World Wide Web [Broder et al., 2000]


¡ Other Networks [Barabasi-Albert, 1999]

Power-grid Web graph

Actor collaborations


¡ Above a certain x value, the power law is always higher than the exponential!

[Figure: power law $p(x) = cx^{-\alpha}$ vs. exponential $p(x) = ce^{-\lambda x}$]
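A small sketch of where the crossover happens, comparing the two curves in log space. The prefactors (c_pow = 1, c_exp = 50) and λ = 1 are arbitrary illustration values, not from the slides. The search starts at x = α/λ because beyond that point the log-ratio of power law to exponential is strictly increasing, so once the power law is higher it stays higher.

```python
import math

def crossover(alpha, lam, c_pow=1.0, c_exp=1.0, x_max=10_000):
    """Smallest integer x >= alpha/lam with c_pow*x**-alpha > c_exp*exp(-lam*x).

    d/dx [log(power) - log(exponential)] = lam - alpha/x > 0 for x > alpha/lam,
    so from this x on the power law remains above the exponential.
    """
    start = max(1, math.ceil(alpha / lam))
    for x in range(start, x_max + 1):
        log_pow = math.log(c_pow) - alpha * math.log(x)
        log_exp = math.log(c_exp) - lam * x
        if log_pow > log_exp:
            return x
    return None  # no crossover found up to x_max

print(crossover(alpha=2.0, lam=1.0, c_exp=50.0))  # 9
```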


¡ Power-law vs Exponential

on log-log and semi-log (log-lin) scales

[Clauset-Shalizi-Newman 2007]

[Figure: $p(x) = cx^{-\alpha}$ vs. $p(x) = ce^{-\lambda x}$, plotted on semi-log and log-log axes]

¡ Power-law degree exponent is typically 2 < α < 3

¡ Definition:

Networks with a power-law tail in

their degree distribution are called

“scale-free networks”

§ Scale invariance: there is no characteristic scale

§ Scale invariance means laws do not change if scales of length, energy, or other variables are multiplied by a common factor

§ Scale-free function: $f(ax) = a^{\lambda} f(x)$

§ Power-law function: $f(ax) = a^{\lambda} x^{\lambda} = a^{\lambda} f(x)$

Log() or Exp() are not scale free!

$f(ax) = \log(ax) = \log a + \log x = \log a + f(x)$
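A minimal numerical check of the scale-free property, assuming λ = −2.5 and a handful of arbitrary test points: the power law satisfies $f(ax) = a^{\lambda} f(x)$ for every scale factor, while log does not (it picks up an additive log a instead of a multiplicative factor).

```python
import math

def is_scale_free(f, lam, xs, scales, tol=1e-9):
    """True if f(a*x) == a**lam * f(x) for every x in xs and a in scales."""
    return all(
        math.isclose(f(a * x), a ** lam * f(x), rel_tol=tol)
        for x in xs for a in scales
    )

def power_law(x, lam=-2.5):
    return x ** lam

lam = -2.5
xs = [0.5, 1.0, 3.0, 10.0]
scales = [2.0, 7.0, 100.0]
print(is_scale_free(power_law, lam, xs, scales))  # True
print(is_scale_free(math.log, lam, xs, scales))   # False
```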


Many other quantities follow heavy-tailed distributions

[Clauset-Shalizi-Newman 2007]


[Figure: CMU grad-students at the G20 meeting in Pittsburgh in Sept 2009]

¡ Degrees are heavily skewed:

Distribution $P(X > x)$ is heavy tailed if: $\lim_{x \to \infty} \frac{P(X > x)}{e^{-\lambda x}} = \infty$

¡ What is the normalizing constant?
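One way to answer, assuming a continuous power law $p(x) = C x^{-\alpha}$ defined for $x \ge x_m$ with $\alpha > 1$ (a lower cutoff $x_m > 0$ is needed because $x^{-\alpha}$ diverges at 0), is to require the density to integrate to 1:

```latex
1 = \int_{x_m}^{\infty} C x^{-\alpha}\,dx = \frac{C}{\alpha-1}\,x_m^{1-\alpha}
\quad\Longrightarrow\quad
C = (\alpha-1)\,x_m^{\alpha-1},
\qquad
p(x) = \frac{\alpha-1}{x_m}\left(\frac{x}{x_m}\right)^{-\alpha}
```

For $\alpha \le 1$ the integral diverges, so no normalizing constant exists.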


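¡ What’s the expected value of a power-law random variable X?

$E[X] = \int_{x_m}^{\infty} x\,p(x)\,dx = C \int_{x_m}^{\infty} x^{1-\alpha}\,dx = \frac{C}{\alpha-2}\,x_m^{2-\alpha}$ (for $\alpha > 2$)

With $C = (\alpha-1)\,x_m^{\alpha-1}$: $E[X] = \frac{\alpha-1}{\alpha-2}\,x_m$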
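A sketch that checks $E[X] = \frac{\alpha-1}{\alpha-2}x_m$ by simulation, assuming $x_m = 1$ and α = 3.5 (chosen so that the variance is also finite and the sample mean settles quickly). Samples are drawn by inverse-transform sampling from the CCDF $P(X \ge x) = (x/x_m)^{1-\alpha}$.

```python
import random

def sample_power_law(n, alpha, x_min=1.0, seed=None):
    """Draw n samples from p(x) = ((alpha-1)/x_min) * (x/x_min)**-alpha, x >= x_min.

    Inverse transform: P(X >= x) = (x/x_min)**(1-alpha), so
    x = x_min * (1-u)**(-1/(alpha-1)) with u ~ Uniform[0, 1).
    """
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def expected_value(alpha, x_min=1.0):
    """E[X] = (alpha-1)/(alpha-2) * x_min; infinite for alpha <= 2."""
    return float("inf") if alpha <= 2 else (alpha - 1) / (alpha - 2) * x_min

samples = sample_power_law(200_000, alpha=3.5, seed=0)
# The sample mean should be near expected_value(3.5) = 5/3.
print(sum(samples) / len(samples), expected_value(3.5))
```

Rerunning with α ≤ 2 makes the sample mean keep drifting upward as n grows, which is the infinite-moment behavior described next.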


¡ Power-laws have infinite moments!

§ If α ≤ 2: $E[X] = \infty$

§ If α ≤ 3: $Var[X] = \infty$

§ Average is meaningless, as the variance is too high!

[Figure: samples from a power-law with exponent α]

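Estimating α from data:

¡ (1) Fit a line on log-log axis using least squares:

§ Solve for the slope $-\alpha$ of the least-squares line through the points $(\log k, \log P(k))$

BAD! The few, noisy counts in the tail of the histogram distort the fit.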

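¡ (2) Plot the Complementary CDF (CCDF) $P(X \ge x)$

Then the estimated $\alpha = 1 + \alpha'$

where $-\alpha'$ is the slope of $\log P(X \ge x)$ against $\log x$ (for a power law, $P(X \ge x) \propto x^{-(\alpha-1)}$)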
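A minimal sketch of the CCDF method on synthetic data (our own illustration: α = 2.5, $x_m = 1$, 100,000 samples). The empirical CCDF at the i-th smallest sample is (n − i)/n, which is never zero, so its logarithm is always defined; the least-squares slope on log-log axes is −α′ and the estimate is α = 1 + α′.

```python
import math
import random

def ccdf_alpha(samples):
    """Estimate alpha = 1 + alpha', where -alpha' is the least-squares slope
    of log P(X >= x) against log x for the empirical CCDF."""
    xs = sorted(samples)
    n = len(xs)
    lx = [math.log(x) for x in xs]
    # Empirical CCDF at the i-th smallest sample: (n - i) / n, never zero.
    ly = [math.log((n - i) / n) for i in range(n)]
    mx, my = sum(lx) / n, sum(ly) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return 1.0 - cov / var  # slope is -alpha', so alpha = 1 - slope

# Synthetic data with alpha = 2.5, x_min = 1 (inverse-transform sampling).
rng = random.Random(1)
alpha = 2.5
data = [(1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(100_000)]
print(ccdf_alpha(data))  # should land close to 2.5
```

Unlike the histogram fit, every sample contributes a point here, so no binning choice is needed.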


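¡ (3) Use maximum likelihood:

§ The log-likelihood of observed data $d_i$, assuming $p(x) = \frac{\alpha-1}{x_m}\left(\frac{x}{x_m}\right)^{-\alpha}$ for $x \ge x_m$:

$\mathcal{L}(\alpha) = \sum_{i=1}^{n} \ln p(d_i) = n \ln(\alpha-1) - n \ln x_m - \alpha \sum_{i=1}^{n} \ln \frac{d_i}{x_m}$

§ Setting $d\mathcal{L}/d\alpha = 0$ gives $\hat{\alpha} = 1 + n \left[\sum_{i=1}^{n} \ln \frac{d_i}{x_m}\right]^{-1}$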
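A sketch of the closed-form maximum-likelihood estimator $\hat{\alpha} = 1 + n\left[\sum_i \ln(d_i/x_m)\right]^{-1}$ for a continuous power law, tested on synthetic samples with α = 2.5 (an arbitrary choice). Only observations with $d_i \ge x_m$ enter the estimate; choosing $x_m$ on real data is a separate problem not handled here.

```python
import math
import random

def mle_alpha(data, x_min):
    """alpha_hat = 1 + n / sum(ln(d_i / x_min)), using only d_i >= x_min."""
    tail = [d for d in data if d >= x_min]
    return 1.0 + len(tail) / sum(math.log(d / x_min) for d in tail)

rng = random.Random(2)
alpha_true = 2.5
data = [(1.0 - rng.random()) ** (-1.0 / (alpha_true - 1.0)) for _ in range(50_000)]
print(mle_alpha(data, x_min=1.0))  # should land close to 2.5
print(mle_alpha(data, x_min=2.0))  # the tail above 2 has the same exponent
```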


[Figure: Random network (Erdos-Renyi random graph), degree distribution is Binomial; scale-free (power-law) network, degree distribution is Power-law]

¡ The interest in the robustness problem has two origins:

§ Robustness of a system is an important problem in many areas

§ Many real networks are not regular, but have a scale-free topology

¡ How does network

connectivity change

as nodes get removed?

[Albert et al 00; Palmer et al 01]

¡ Nodes can be removed:

§ Random failure:

§ Remove nodes uniformly at random

§ Targeted attack:

§ Remove nodes in order of decreasing degree

¡ This is important for the robustness of the Internet as well as for epidemiology

¡ Networks with equal number of nodes and edges:

§ ER random graph

§ Scale-free network

¡ Study the properties of the network as an

increasing fraction of nodes are removed

§ Node selection:

§ Random (corresponds to random failures)

§ Nodes with largest degrees (corresponds to targeted attacks)

¡ Measures:

§ Fraction of nodes in the largest connected component

§ Average shortest path length between nodes in the

largest component
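A minimal sketch of the first measure under both node-selection rules, using a plain adjacency dict (an assumed representation). Two choices here are ours, not the slides': the largest component is divided by the original node count, and for targeted attacks degrees are computed once on the intact graph rather than recomputed after each removal.

```python
import random
from collections import deque

def largest_component_fraction(adj, removed):
    """Largest connected component among surviving nodes, as a fraction of
    the ORIGINAL node count (one possible normalization)."""
    seen = set(removed)  # removed nodes are never visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        size, queue = 0, deque([start])
        while queue:  # breadth-first search over surviving nodes
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best / len(adj)

def remove_nodes(adj, fraction, strategy, seed=None):
    """Pick int(fraction * n) nodes uniformly at random ('random') or by
    decreasing degree on the intact graph ('targeted')."""
    k = int(fraction * len(adj))
    nodes = list(adj)
    if strategy == "random":
        return set(random.Random(seed).sample(nodes, k))
    if strategy == "targeted":
        return set(sorted(nodes, key=lambda u: len(adj[u]), reverse=True)[:k])
    raise ValueError(f"unknown strategy: {strategy}")

# Star graph: hub 0 joined to leaves 1..10 (an extreme hub-dominated example).
star = {0: list(range(1, 11)), **{i: [0] for i in range(1, 11)}}
print(largest_component_fraction(star, remove_nodes(star, 0.1, "targeted")))  # 1/11, about 0.0909
print(largest_component_fraction(star, remove_nodes(star, 0.1, "random", seed=0)))
```

Removing a single node destroys the star only if that node is the hub, which a targeted attack always picks and a random failure picks with probability 1/11.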


¡ Graphs are resilient to random failures, but

are sensitive to targeted attacks:

¡ For random networks there is a smaller difference between the two

[Figure: random failure vs. targeted attack]

What proportion of random nodes must be removed

in order for the size (S) of the giant component to

drop to 0?

¡ Infinite scale-free networks with α < 3 never break down under random node failures

[Figure: S vs. fraction of deleted nodes; α: degree exponent, K: maximum degree]
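One way to make this quantitative, not derived in these notes, is the threshold obtained from the Molloy-Reed criterion: a giant component exists while $\kappa = \langle k^2 \rangle / \langle k \rangle > 2$, and random removal destroys it at $f_c = 1 - 1/(\kappa - 1)$. The sketch below evaluates $f_c$ for a 4-regular graph and for a discrete power law with α = 2.5 and two cutoffs K (illustration values); as K grows, $\langle k^2 \rangle$ diverges for α < 3 and $f_c$ approaches 1.

```python
def critical_fraction(k1, k2):
    """f_c = 1 - 1/(kappa - 1) with kappa = <k^2>/<k>.
    Returns 0.0 when kappa <= 2: there is no giant component to destroy."""
    kappa = k2 / k1
    return 0.0 if kappa <= 2 else 1.0 - 1.0 / (kappa - 1.0)

def moments_from_degrees(degrees):
    """<k> and <k^2> of an explicit degree sequence."""
    n = len(degrees)
    return sum(degrees) / n, sum(d * d for d in degrees) / n

def power_law_moments(alpha, k_min, k_max):
    """<k> and <k^2> for p(k) proportional to k**-alpha on k_min..k_max."""
    ks = range(k_min, k_max + 1)
    z = sum(k ** -alpha for k in ks)
    k1 = sum(k ** (1 - alpha) for k in ks) / z
    k2 = sum(k ** (2 - alpha) for k in ks) / z
    return k1, k2

print(critical_fraction(*moments_from_degrees([4] * 100)))  # 4-regular graph: 2/3
for K in (100, 10_000):
    print(K, critical_fraction(*power_law_moments(2.5, 1, K)))
```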


¡ Real networks are resilient to random failures

¡ G_np has better resilience to targeted attacks

§ E.g., we need to remove all pages of degree > 5 to disconnect the Web. But this is a very small fraction of all web pages!

[Figure: G_np network vs. scale-free network under random failures and targeted attack, as a function of the fraction of removed nodes]

[Figure: targeted attack]

¡ The first few % of

nodes removed:

§ G_np

§ SF: Scale-free

¡ Notice how targeted

attacks very quickly

disconnect the network


Model:


¡ New nodes are more likely to link to nodes that already have high degree

¡ Herbert Simon’s result:

§ Power-laws arise from “Rich get richer” (cumulative

advantage)

¡ Examples:

§ Citations: new citations to a paper are proportional to the number it already has

§ Herding: If a lot of people cite a paper, then it must be good, and

therefore I should cite it too

§ Sociology: Matthew effect, http://en.wikipedia.org/wiki/Matthew_effect

§ “For whoever has will be given more, and they will have an abundance. Whoever does not have, even what they have will be taken from them.”

§ Eminent scientists often get more credit than a comparatively unknown researcher, even if their work is similar


¡ Preferential attachment:

[de Solla Price ‘65, Albert-Barabasi ’99, Mitzenmacher ‘03]

§ Nodes arrive in order 1,2,…,n

§ At step j, let $d_i$ be the degree of node i < j

§ A new node j arrives and creates m out-links

§ Prob of j linking to a previous node i is proportional to degree $d_i$ of node i
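A minimal sketch of this growth process. Two details are assumptions, since the slide does not fix them: the process starts from a complete graph on m + 1 nodes, and the m targets of a new node must be distinct (duplicates are re-drawn, so targets are only approximately degree-proportional). Degree-proportional sampling uses a list in which every node appears once per incident edge.

```python
import random

def preferential_attachment(n, m, seed=None):
    """Grow an n-node graph; each new node j links to m distinct earlier
    nodes chosen with probability proportional to their degree d_i."""
    rng = random.Random(seed)
    edges = [(u, v) for u in range(m + 1) for v in range(u)]  # assumed seed: K_{m+1}
    # Node i appears d_i times in `ends`, so rng.choice(ends) picks i
    # with probability d_i / (2 * number of edges).
    ends = [x for e in edges for x in e]
    for j in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # re-draw duplicates
            targets.add(rng.choice(ends))
        for i in targets:
            edges.append((j, i))
            ends.extend((j, i))
    return edges

edges = preferential_attachment(1000, 3, seed=0)
print(len(edges))  # 6 seed edges + 996 * 3 = 2994
```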


We analyze the following simple model:

¡ Nodes arrive in order 1, 2, 3, …, n

¡ When node j is created it makes a single out-link to an earlier node i chosen as follows:

§ 1) With prob p, j links to i chosen uniformly at random (from among all earlier nodes)

§ 2) With prob 1 − p, node j chooses i uniformly at random and links to a random node l that i points to

§ This is the same as saying: with prob 1 − p, node j links to node l with prob proportional to $d_l$ (the in-degree of l)

§ Our graph is directed: Every node has out-degree 1

[Mitzenmacher, ‘03]
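A minimal simulation sketch of this model, with nodes numbered from 0. One detail is our assumption: the first node has no out-link to copy, so when step 2) selects it, the new node links to that node directly. Since every node has out-degree 1, "a random node l that i points to" is simply i's single target.

```python
import random
from collections import Counter

def copy_model(n, p, seed=None):
    """Return out, where out[j] is the single out-link target of node j.
    Node 0 has no earlier node, so out[0] is None."""
    rng = random.Random(seed)
    out = [None] * n
    for j in range(1, n):
        i = rng.randrange(j)  # earlier node chosen uniformly at random
        if rng.random() < p or out[i] is None:
            out[j] = i        # 1) link to i itself
        else:
            out[j] = out[i]   # 2) link to l = out[i], the node i points to
    return out

out = copy_model(100_000, p=0.5, seed=7)
in_degree = Counter(t for t in out if t is not None)
# For p = 0.5 the claim below predicts in-degree exponent 1 + 1/(1 - 0.5) = 3.
print(in_degree.most_common(3))
```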


¡ Claim: The described model generates networks where the fraction of nodes with in-degree k scales as:

$P(d_i = k) \propto k^{-\left(1 + \frac{1}{q}\right)}$, where $q = 1 - p$

So we get a power-law degree distribution with exponent:

$\alpha = 1 + \frac{1}{1-p}$

¡ Preferential attachment gives

power-law in-degrees!

¡ Intuitively reasonable process

¡ Can tune model parameter p to get the

observed exponent

§ On the web, $P[\text{node has in-degree } d] \sim d^{-2.1}$
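Tuning p amounts to inverting $\alpha = 1 + \frac{1}{1-p}$, which gives $p = 1 - \frac{1}{\alpha - 1}$. A small sketch (function names are ours) applied to the web exponent 2.1; note that p ∈ [0, 1) only reaches exponents α ≥ 2.

```python
def exponent_from_p(p):
    """Predicted in-degree exponent alpha = 1 + 1/(1 - p), for 0 <= p < 1."""
    return 1.0 + 1.0 / (1.0 - p)

def p_from_exponent(alpha):
    """Invert alpha = 1 + 1/(1 - p): p = 1 - 1/(alpha - 1), valid for alpha >= 2."""
    return 1.0 - 1.0 / (alpha - 1.0)

p_web = p_from_exponent(2.1)
print(round(p_web, 4))  # 0.0909, i.e. p = 1/11
```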


¡ Preferential attachment is not so good at

predicting network structure

§ Age-degree correlation

§ Node degree is proportional to its age

§ Solution: Node fitness (virtual degree)

§ Links among high degree nodes:

§ On the web nodes sometimes avoid linking to each other

¡ Further questions:

§ What is a reasonable model for how people

sample network nodes and link to them?

§ Short random walks


THE ORIGINS OF PREFERENTIAL ATTACHMENT

SECTION 5.8

Given the key role preferential attachment plays in the evolution of real networks, we must ask: where does it come from? The question can be broken down into two narrower issues, where Π(k) denotes the probability that a new node links to a node with degree k:

Why does Π(k) depend on k?

Why is the dependence of Π(k) linear in k?

In the past decade we witnessed the emergence of two philosophically different answers to these questions. The first views preferential attachment as the interplay between random events and some structural property of a network. These mechanisms do not require global knowledge of the network but rely on random events, hence we will call them local or random mechanisms. The second assumes that each new node or link balances conflicting needs, hence it is preceded by a cost-benefit analysis. These models assume familiarity with the whole network and rely on optimization principles, prompting us to call them global or optimized mechanisms. In this section we discuss both approaches.

LOCAL MECHANISMS

Several network models do not have preferential attachment explicitly built into them, unlike the Barabási-Albert model. Rather, they generate preferential attachment. Next we discuss two such models, which help us derive Π(k) and understand its origins.

Link Selection Model

The link selection model offers perhaps the simplest example of a local mechanism capable of generating preferential attachment [16]. It is defined as follows (Figure 5.13):

• Growth: At each time step we add a new node to the network.

• Link Selection: We select a link at random and connect the new node to one of the two nodes at the two ends of the selected link.

[Figure 5.13 (b): In this case the new node connects to the node at the right end of the link.]

¡ Link selection model: perhaps the simplest example of a local or random mechanism capable of generating preferential attachment. The new node connects to one of the two nodes at the two ends of a randomly selected link

¡ This simple mechanism generates preferential attachment

§ Why? Because a node is picked with probability proportional to the number of edges it has
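A minimal sketch of the link selection model, assuming the network starts from a single link between nodes 0 and 1 (the seed graph is our choice). Choosing a uniformly random link and then one of its two ends lands on node i with probability $d_i / 2L$, which is exactly the degree-proportional rule, even though degrees are never computed.

```python
import random
from collections import Counter

def link_selection(n, seed=None):
    """Grow an n-node network: each new node picks a uniformly random
    existing link and attaches to one of its two ends (prob 1/2 each)."""
    rng = random.Random(seed)
    edges = [(0, 1)]  # assumed seed graph: one link
    for new in range(2, n):
        u, v = rng.choice(edges)                 # random link
        target = u if rng.random() < 0.5 else v  # random end of that link
        edges.append((new, target))
    return edges

edges = link_selection(10_000, seed=42)
degree = Counter(x for e in edges for x in e)
print(len(edges), max(degree.values()))  # link count and size of the largest hub
```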


Copying model:

¡ (a) Random connection: with prob p the new node links to a randomly selected node u

¡ (b) Copying: with prob 1 − p randomly choose an outgoing link of node u and connect the new node to the selected link's target

§ The new node “copies” one of the links of an earlier node

Analysis of the copying model:

¡ (a) The probability of selecting a node is 1/N

¡ (b) is equivalent to selecting a node linked to a randomly selected link. The probability of selecting a degree-k node through the copying process of step (b) is k/2L for undirected networks, where L is the number of links

¡ Again, the likelihood that the new node will connect to a degree-k node, $\Pi(k) = \frac{p}{N} + \frac{1-p}{2L}\,k$, follows preferential attachment
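A small exact check of the k/2L step, using a hypothetical four-link undirected graph: picking a uniformly random link and then a uniformly random end of it reaches each node with probability deg(i)/2L. The second helper (our naming) combines steps (a) and (b) into Π(k).

```python
from collections import Counter
from fractions import Fraction

def endpoint_probabilities(edges):
    """Exact probability that a random link, followed by a random end of
    that link, lands on each node: deg(i) / 2L."""
    counts = Counter(x for e in edges for x in e)
    total = 2 * len(edges)
    return {node: Fraction(c, total) for node, c in counts.items()}

def copy_probability(p, n_nodes, degree, n_links):
    """Pi(k) = p/N + (1 - p) * k / (2L), combining steps (a) and (b)."""
    return p / n_nodes + (1 - p) * degree / (2 * n_links)

edges = [(0, 1), (0, 2), (0, 3), (1, 2)]  # node 0 has degree 3, L = 4
probs = endpoint_probabilities(edges)
print(probs[0])  # 3/8
```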

Examples:

¡ Social networks: Copy your friend’s friends.

¡ Citation Networks: Copy references from papers we

read

¡ Protein interaction networks: gene duplication


¡ Copying mechanism (directed network)

§ Select a node and an edge of this node

§ Attach to the endpoint of this edge

¡ Walking on a network (directed network)

§ The new node connects to a node, then to every first, second, … neighbor of this node

¡ Attaching to edges

§ Select an edge and attach to both endpoints of this edge

¡ Node duplication

§ Duplicate a node with all its edges

§ Randomly prune edges of new node
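A minimal sketch of node duplication on an undirected adjacency dict (node → set of neighbours). The text only says "randomly prune", so keeping each copied edge independently with probability keep_prob is our assumption, as is giving the copy the next unused integer id.

```python
import random

def duplicate_node(adj, node, keep_prob, seed=None):
    """Add a copy of `node` that inherits each of its edges independently
    with probability keep_prob; returns the new node's id.
    `adj` maps node -> set of neighbours and is updated in place."""
    rng = random.Random(seed)
    new = max(adj) + 1  # hypothetical id scheme: next unused integer
    adj[new] = set()
    for nbr in sorted(adj[node]):       # sorted for reproducible results
        if rng.random() < keep_prob:    # otherwise the copied edge is pruned
            adj[new].add(nbr)
            adj[nbr].add(new)
    return new

g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: set()}
twin = duplicate_node(g, 0, keep_prob=0.5, seed=3)
print(twin, g[twin])  # new id 4 and the subset of {1, 2} it kept
```

The duplicate's neighbours are always a subset of the original node's, so nodes with many links pass many of them on, another route to degree-dependent growth.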


Average path length ⟨ℓ⟩ in scale-free networks, by degree exponent α:

§ α = 2: $\langle \ell \rangle \sim \text{const}$. The size of the biggest hub is of order O(N). Most nodes can be connected within two steps, thus the average path length will be independent of the network size N.

§ 2 < α < 3: $\langle \ell \rangle \sim \frac{\ln \ln N}{\ln(\alpha - 1)}$. The average path length increases slower than logarithmically with N. In G_np all nodes have comparable degree, thus most paths will have comparable length. In a scale-free network the vast majority of the paths go through the few high-degree hubs, reducing the distances between nodes.

§ α = 3: $\langle \ell \rangle \sim \frac{\ln N}{\ln \ln N}$. Some models produce α = 3. This was first derived by Bollobas et al. for the network diameter in the context of a dynamical model, but it holds for the average path length as well.

§ α > 3: $\langle \ell \rangle \sim \ln N$. The second moment of the distribution is finite, thus in many ways the network behaves as a random network. Hence the average path length follows the result that we derived for the random network model earlier.
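Because each regime's formula holds only up to an unknown constant prefactor, comparing raw values across regimes is not meaningful. The sketch below (our own illustration) instead compares how much ⟨ℓ⟩ grows when N goes from $10^3$ to $10^9$; the ratio cancels the prefactor, including the $\ln(\alpha-1)$ term.

```python
import math

def path_length_order(alpha, n):
    """Leading-order <l>(n) for each degree-exponent regime, constants dropped."""
    if alpha == 2:
        return 1.0
    if 2 < alpha < 3:
        return math.log(math.log(n)) / math.log(alpha - 1)
    if alpha == 3:
        return math.log(n) / math.log(math.log(n))
    if alpha > 3:
        return math.log(n)
    raise ValueError("alpha < 2 is not covered by these regimes")

def growth(alpha, n_small=1e3, n_large=1e9):
    """Factor by which <l> grows from n_small to n_large nodes."""
    return path_length_order(alpha, n_large) / path_length_order(alpha, n_small)

for a in (2, 2.5, 3, 4):
    print(a, round(growth(a), 2))
```

A millionfold increase in N roughly triples ⟨ℓ⟩ in the random-network-like regime α > 3, but increases it by only about 1.6 times when 2 < α < 3.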


[Figure: degree-exponent regimes ("The scale-free behavior is relevant"; "Regime full of anomalies…"), with the internet, actor collaboration, metabolic, and citation networks marked]


¡ Protein interactome: A protein-protein

interaction network of a species:

§ Nodes: Species’ proteins

§ Edges: Physical protein-protein interactions (PPI)

¡ Tree of life: Evolutionary history of species:

§ Phylogenetic tree calculated based on similarity of gene sequence information between species:

§ Units: nucleotide substitutions per site

§ Evolutionary time of a species: total branch length from the root to the corresponding leaf in the tree


§ Ancestral species have gone

extinct or evolved into

present-day species

§ Older interactomes are lost


[Zitnik et al., bioRxiv 454033, 2018]


¡ Protein failure can occur through:

§ Removal of a protein (e.g., nonsense mutation)

§ Disruption of a PPI (e.g., environmental factors,

such as availability of resources)

¡ Resilience is a critical interactome property:

§ Breakdown of proteins affects the exchange of any biological information between proteins in a cell

§ Protein failures can fragment the interactome and

lead to cell death and disease


¡ Questions for today:

§ How do interactomes change through evolution?

§ How does natural selection shape the interactomes?

§ How do changes in these networks impact species?

¡ Approach:

§ Define a network resilience measure

§ Use the measure to study resilience of interactomes

§ Find a network mechanism of resilience


¡ Fragmentation of the network upon node removal:

§ Entropy H on a set of isolated clusters: $H = -\sum_i p_i \log p_i$

§ $p_i = |C_i|/N$ is the proportion of nodes that belong to cluster $C_i$

§ Probability of seeing a node from $C_i$ when sampling one node from the fragmented network

§ H quantifies uncertainty in predicting the cluster identity of an individual node taken at random from the network

§ Shannon’s diversity index: normalize H w.r.t. network size
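A minimal sketch of this fragmentation entropy, computed from a list of cluster sizes. The normalization by log N is our assumption for "normalize w.r.t. network size": it maps a single connected cluster to 0 and a fully shattered network (all nodes isolated) to 1.

```python
import math

def fragmentation_entropy(cluster_sizes):
    """H = -sum_i p_i log p_i with p_i = |C_i| / N, N = total node count.
    Written as p_i * log(1/p_i) so every term is non-negative."""
    n = sum(cluster_sizes)
    return sum((s / n) * math.log(n / s) for s in cluster_sizes if s > 0)

def normalized_entropy(cluster_sizes):
    """H / log N (assumed normalization): 0 for one cluster, 1 if all isolated."""
    n = sum(cluster_sizes)
    return fragmentation_entropy(cluster_sizes) / math.log(n) if n > 1 else 0.0

print(normalized_entropy([100]))      # 0.0: the network is still in one piece
print(normalized_entropy([1] * 100))  # about 1.0: every node is isolated
print(normalized_entropy([90, 5, 5])) # partial fragmentation lies in between
```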

Title	Network robustness and preferential attachment
Author	Jure Leskovec
School	Stanford University
Field	Analysis of Networks
Type	lecture notes
Year of publication	2018
City	Stanford
