
02 Measuring Networks and the Random Graph Model


DOCUMENT INFORMATION

Basic information

Title: Analysis of Networks
Author: Jure Leskovec
Institution: Stanford University
Subject: Analysis of Networks
Type: Lecture notes
Year of publication: 2018
City: Stanford
Number of pages: 60
File size: 33.15 MB


Contents


Page 1

CS224W: Analysis of Networks

http://cs224w.stanford.edu

Page 3

10/3/18 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu

Degree distribution: P(k)

Clustering coefficient: C

Connected components: s

Page 4

• Degree distribution P(k): the probability that a randomly chosen node has degree k
  - $N_k$ = number of nodes with degree $k$
  - Normalized histogram: $P(k) = N_k / N$, where $N$ is the total number of nodes
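As a minimal sketch of the definition above, the following computes the normalized degree histogram from an edge list. The function name and the toy graph are illustrative assumptions, not part of the lecture.

```python
from collections import Counter

def degree_distribution(edges, nodes):
    """Return {k: P(k)} for an undirected graph given as an edge list."""
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())  # N_k: number of nodes with degree k
    n = len(degree)                    # N: total number of nodes
    return {k: counts[k] / n for k in sorted(counts)}

# Hypothetical toy graph: the path A-B-C plus an isolated node D.
toy_nodes = ["A", "B", "C", "D"]
toy_edges = [("A", "B"), ("B", "C")]
toy_pk = degree_distribution(toy_edges, toy_nodes)
```

Isolated nodes are passed in explicitly through `nodes`, because an edge list alone cannot reveal nodes of degree 0.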

Page 5

• A path is a sequence of nodes in which each node is linked to the next one
  - A path can intersect itself and pass through the same edge multiple times

Page 6

• Distance (shortest path, geodesic) between a pair of nodes is defined as the number of edges along the shortest path connecting the nodes
  - If the two nodes are not connected, the distance is usually defined as infinite
• In directed graphs, paths need to follow the direction of the arrows
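The distance definition can be sketched with breadth-first search. This is an illustrative sketch: the adjacency-dict representation (each node maps to its out-neighbors) and the toy directed graph are assumptions.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop counts from source to every node reachable along edge directions."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:          # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance(adj, s, t):
    """Shortest-path distance from s to t; infinite if t is unreachable."""
    return bfs_distances(adj, s).get(t, float("inf"))

# Hypothetical directed graph: D -> A -> B -> C.
directed_adj = {"A": ["B"], "B": ["C"], "C": [], "D": ["A"]}
```

Because the edges are directed, `distance(directed_adj, "A", "C")` is finite while the reverse direction is not.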

Page 7

• Diameter: the maximum (shortest path) distance between any pair of nodes in a graph
• Average path length for a connected graph (component) or a strongly connected (component of a) directed graph:

$\bar{h} = \frac{1}{2 E_{\max}} \sum_{i,\, j \neq i} h_{ij}$

  - $h_{ij}$ is the distance from node $i$ to node $j$
  - $E_{\max}$ is the maximum number of edges (total number of node pairs): $E_{\max} = n(n-1)/2$
  - Many times we compute the average only over the connected pairs of nodes (that is, we ignore "infinite" length paths)
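A small sketch of both quantities, averaging only over connected pairs as described in the last bullet. The function names and the toy graph are assumptions. Here "diameter" means the largest finite distance.

```python
from collections import deque

def hops_from(adj, source):
    """BFS hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter_and_average_path(adj):
    """Longest and mean finite distance over ordered pairs (i, j), i != j."""
    finite = [h for s in adj for t, h in hops_from(adj, s).items() if t != s]
    if not finite:
        raise ValueError("graph has no connected pair of nodes")
    return max(finite), sum(finite) / len(finite)

# Hypothetical undirected toy graph: path A-B-C-D and a separate edge E-F.
toy_adj = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"],
    "E": ["F"], "F": ["E"],
}
toy_diameter, toy_avg = diameter_and_average_path(toy_adj)
```

Each unordered pair is counted twice (once per direction), which leaves the average unchanged for an undirected graph.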

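Page 8

• Clustering coefficient of node $i$ with degree $k_i$:
  - $C_i \in [0, 1]$
  - $C_i = \frac{2 e_i}{k_i (k_i - 1)}$, where $e_i$ is the number of edges between the neighbors of node $i$
• Average clustering coefficient: $C = \frac{1}{N} \sum_{i=1}^{N} C_i$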
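The clustering coefficient can be computed directly from the neighbor sets. This is a sketch under two assumptions: the graph is stored as a dict of neighbor sets, and nodes of degree below 2 are assigned a coefficient of 0 (a convention chosen here, not one stated in the slides).

```python
from itertools import combinations

def clustering_coefficient(adj, i):
    """C_i = 2 e_i / (k_i (k_i - 1)); 0.0 by convention when k_i < 2."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # e_i: number of edges among the neighbors of i
    e = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2 * e / (k * (k - 1))

def average_clustering(adj):
    """C = (1/N) * sum of C_i over all nodes."""
    return sum(clustering_coefficient(adj, i) for i in adj) / len(adj)

# Hypothetical toy graph: triangle A-B-C with a pendant node D attached to A.
toy_adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
```

In the toy graph, only one of the three neighbor pairs of A is connected, so its coefficient is 1/3, while B and C each sit in a closed triangle.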

Page 9

• Clustering coefficient: $C_i = \frac{2 e_i}{k_i (k_i - 1)}$, where $e_i$ is the number of edges between the neighbors of node $i$

Page 10

• Size of the largest connected component
  - Largest set of nodes in which any two nodes can be joined by a path
• Largest component = Giant component

How to find connected components:

• Start from a random node and perform Breadth First Search (BFS)
• Label the nodes that BFS visited
• If all nodes are visited, the network is connected
• Otherwise, find an unvisited node and repeat BFS
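The BFS procedure for finding connected components can be sketched as follows. The function name and toy graph are assumptions, and the sketch starts from nodes in dictionary order rather than from a random node, which does not change the components found.

```python
from collections import deque

def connected_components(adj):
    """Return the connected components of an undirected graph, largest first."""
    visited = set()
    components = []
    for start in adj:
        if start in visited:
            continue                    # already labeled by an earlier BFS
        visited.add(start)
        component = [start]
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in visited:
                    visited.add(v)
                    component.append(v)
                    queue.append(v)
        components.append(component)
    return sorted(components, key=len, reverse=True)

# Hypothetical toy graph with three components.
toy_adj = {
    "A": ["B", "C"], "B": ["A"], "C": ["A"],
    "D": ["E"], "E": ["D"],
    "F": [],
}
toy_components = connected_components(toy_adj)
```

The network is connected exactly when the returned list has a single component; the first entry is the largest component.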

Page 11

Degree distribution: P(k)

Clustering coefficient: C

Connected components: s

Page 13

MSN Messenger.

Page 15

Network: 180M people, 1.3B edges

Page 16

[Figure: Contact, Conversation]

Messaging as an undirected graph

• Edge (u,v) if users u and v exchanged at least 1 message
• N = 180 million people
• E = 1.3 billion edges

Page 17

Page 18

Note: We plotted the same data as on the previous slide; only the axes are now logarithmic.

Page 19

Average clustering coefficient of nodes with degree $k$:

$C_k = \frac{1}{N_k} \sum_{i:\, k_i = k} C_i$

Page 21

Number of links between pairs of nodes in the largest connected component

Avg path length 6.6

90% of the nodes can be reached in < 8 hops

Page 22

Are these values “expected”?

Are they “surprising”?

To answer this we need a null-model!

Page 23

Undirected network: N = 2,018 proteins as nodes, E = 2,930 binding interactions as links

Page 25

• Erdős-Rényi Random Graphs [Erdős-Rényi, '60]
• Two variants:
  - $G_{np}$: undirected graph on $n$ nodes where each edge $(u,v)$ appears i.i.d. with probability $p$
  - $G_{nm}$: undirected graph with $n$ nodes and $m$ edges picked uniformly at random

What kind of networks do such models produce?

Page 26

• n and p do not uniquely determine the graph!
• We can have many different realizations given the same n and p

[Figure: realizations with n = 10, p = 1/6]
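Both variants are easy to sample with the standard library. The sketch below is illustrative: the function names are assumptions, and nodes are labeled 0 to n-1 for convenience. Different seeds stand in for different realizations of the same n and p.

```python
import random
from itertools import combinations

def gnp_edges(n, p, rng):
    """G_np: include each of the n(n-1)/2 possible edges independently with prob. p."""
    return [pair for pair in combinations(range(n), 2) if rng.random() < p]

def gnm_edges(n, m, rng):
    """G_nm: pick m distinct edges uniformly at random."""
    return rng.sample(list(combinations(range(n), 2)), m)

# Two realizations with the slide's parameters n = 10, p = 1/6.
realization_a = gnp_edges(10, 1 / 6, random.Random(1))
realization_b = gnp_edges(10, 1 / 6, random.Random(2))
```

Enumerating all pairs costs O(n^2) time, which is fine for small n but not for graphs with millions of nodes.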

Page 27

Degree distribution: P(k)

Clustering coefficient: C

What are the values of these properties for $G_{np}$?

Page 28

• Fact: The degree distribution of $G_{np}$ is binomial.
• Let P(k) denote the fraction of nodes with degree k:

$P(k) = \binom{n-1}{k} p^k (1-p)^{n-1-k}$

  - $\binom{n-1}{k}$: select k nodes out of n-1
  - $p^k$: probability of having k edges
  - $(1-p)^{n-1-k}$: probability of missing the rest of the n-1-k edges
• Mean and variance of a binomial distribution: $\bar{k} = p(n-1)$, $\sigma^2 = p(1-p)(n-1)$
• By the law of large numbers, as the network size increases, the distribution becomes increasingly narrow: we are increasingly confident that the degree of a node is in the vicinity of $\bar{k}$.
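As a quick numerical check of the binomial claim, the sketch below evaluates the degree distribution of $G_{np}$ for the example n = 10, p = 1/6 and computes its mean and variance. The function name is an assumption.

```python
from math import comb

def gnp_degree_pmf(n, p):
    """P(k) = C(n-1, k) p^k (1-p)^(n-1-k) for k = 0 .. n-1."""
    return [comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k) for k in range(n)]

pmf = gnp_degree_pmf(10, 1 / 6)
mean = sum(k * q for k, q in enumerate(pmf))                     # p (n-1)
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))   # p (1-p) (n-1)
```

For these parameters the mean is p(n-1) = 1.5 and the variance is p(1-p)(n-1) = 1.25, up to floating-point error.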

Page 29

• Remember: $C_i = \frac{2 e_i}{k_i (k_i - 1)}$, where $e_i$ is the number of edges between i's neighbors
• Edges in $G_{np}$ appear i.i.d. with prob. p
• So the expected number of edges between the neighbors is $E[e_i] = p \, \frac{k_i (k_i - 1)}{2}$
  - Each pair is connected with prob. p
  - $\frac{k_i (k_i - 1)}{2}$ is the number of distinct pairs of neighbors of node i of degree $k_i$
• Then: $E[C_i] = \frac{p \cdot k_i (k_i - 1)}{k_i (k_i - 1)} = p = \frac{\bar{k}}{n-1} \approx \frac{\bar{k}}{n}$
• Clustering coefficient of a random graph is small. If we generate bigger and bigger graphs with fixed avg degree $\bar{k}$ (that is, we set $p = \bar{k} \cdot 1/n$), then C decreases with the graph size n.

Page 30

Degree distribution of $G_{np}$: $P(k) = \binom{n-1}{k} p^k (1-p)^{n-1-k}$

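Page 31

• Graph G(V, E) has expansion α if $\forall S \subseteq V$: # of edges leaving $S$ $\geq \alpha \cdot \min(|S|, |V \setminus S|)$
• Or equivalently: $\alpha = \min_{S \subseteq V} \frac{\#\,\text{edges leaving } S}{\min(|S|, |V \setminus S|)}$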
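For very small graphs, expansion can be computed by brute force straight from the definition. This sketch is illustrative only: it enumerates every subset of at most half the nodes, so it is exponential in the graph size. The function name and both toy graphs are assumptions.

```python
from itertools import combinations

def expansion(nodes, edges):
    """alpha = min over S of (# edges leaving S) / min(|S|, |V \\ S|)."""
    nodes = list(nodes)
    best = float("inf")
    # Sets with |S| <= |V|/2 suffice: the cut of S equals the cut of V \ S,
    # and for these sets min(|S|, |V \ S|) is simply |S|.
    for size in range(1, len(nodes) // 2 + 1):
        for subset in combinations(nodes, size):
            s = set(subset)
            cut = sum(1 for u, v in edges if (u in s) != (v in s))
            best = min(best, cut / size)
    return best

# Hypothetical examples: a 4-cycle, and two triangles joined by one bridge edge.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
barbell_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
cycle_alpha = expansion(range(4), cycle_edges)
barbell_alpha = expansion(range(6), barbell_edges)
```

The barbell graph has low expansion because cutting the single bridge edge separates three nodes from the rest.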

Page 32

• Expansion is a measure of robustness:
  - To disconnect $\ell$ nodes, we need to cut at least $\alpha \cdot \ell$ edges

Page 33

• Fact: In a graph on n nodes with expansion α, for all pairs of nodes there is a path of length O((log n)/α).
• Random graph $G_{np}$: for log n > np > c, diam($G_{np}$) = O(log n / log(np))
  - Random graphs have good expansion, so it takes a logarithmic number of steps for BFS to visit all nodes

Page 34

An Erdős-Rényi random graph can grow very large, but nodes will be just a few hops apart

Page 35

Degree distribution of $G_{np}$: $P(k) = \binom{n-1}{k} p^k (1-p)^{n-1-k}$

Page 36

• Graph structure of $G_{np}$ as p changes:
  - p = 1/(n-1): giant component appears (avg deg = 1)
  - p = c/(n-1): avg deg constant; lots of isolated nodes
  - p = log(n)/(n-1): fewer isolated nodes
• Emergence of a giant component: avg degree k = 2E/n, or p = k/(n-1)
  - k = 1-ε: all components are of size O(log n)
  - k = 1+ε: one component of size Ω(n), others have size O(log n)
  - Each node has at least one edge in expectation

Page 37

• $G_{np}$, n = 100,000, k = p(n-1) = 0.5 … 3

[Figure: fraction of nodes in the largest component as a function of p(n-1); the giant component appears at p(n-1) = 1]
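The emergence of the giant component can be reproduced in a small simulation. The sketch below is an assumption-laden stand-in for the slide's experiment: it uses n = 1,000 rather than 100,000 so that it runs quickly, a fixed seed, and a union-find structure to track components.

```python
import random
from collections import Counter

def largest_component_fraction(n, avg_degree, seed=0):
    """Sample G_np with p = k/(n-1) and return |largest component| / n."""
    rng = random.Random(seed)
    p = avg_degree / (n - 1)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:          # edge (u, v) is present
                parent[find(u)] = find(v)
    sizes = Counter(find(x) for x in range(n))
    return max(sizes.values()) / n

fractions = {k: largest_component_fraction(1000, k) for k in (0.5, 1.0, 2.0, 3.0)}
```

Below average degree 1 the largest component should hold only a small fraction of the nodes, while well above 1 it should hold most of them; at this small n the transition is smoother than in the limit of large n.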

Page 38

Paul Erdős

Page 39

MSN compared with $G_{np}$:

• Degree distribution: [plots not recoverable]
• Avg. path length: MSN 6.6; $G_{np}$ O(log n), h ≈ 8.2
• Avg. clustering coef.: MSN 0.11; $G_{np}$ k/n, C ≈ 8·10^-8
• Largest conn. comp.: MSN 99%; $G_{np}$ GCC exists when k > 1
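The $G_{np}$ clustering estimate can be checked with a few lines of arithmetic, using the MSN figures quoted earlier (N = 180 million, E = 1.3 billion). The variable names are assumptions, and the sketch uses the approximation C ≈ k/n.

```python
# MSN Messenger figures quoted in the slides.
n_nodes = 180e6   # N = 180 million people
n_edges = 1.3e9   # E = 1.3 billion edges

avg_degree = 2 * n_edges / n_nodes           # k = 2E/n
expected_clustering = avg_degree / n_nodes   # C ~ k/n for a G_np of the same size
observed_clustering = 0.11                   # measured on the MSN network
ratio = observed_clustering / expected_clustering
```

The average degree comes out at roughly 14, the expected clustering at roughly 8·10^-8, and the observed clustering exceeds the random-graph value by a factor of more than a million.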

Page 40

• Are real networks like random graphs?
  - Giant connected component: ☺
  - Average path length: ☺
  - Clustering coefficient: ☹
  - Degree distribution: ☹
• Problems with the random network model:
  - Degree distribution differs from that of real networks
  - Giant component in most real networks does NOT emerge through a phase transition
  - No local structure: clustering coefficient is too low
• Most important: Are real networks random?
  - The answer is simply: NO!

Page 41

• If $G_{np}$ is wrong, why did we spend time on it?
  - … then be compared to the real data
  - … particular property the result of some random process

So, while $G_{np}$ is WRONG, it will turn out to be extremely USEFUL!

Page 42

• Goal: Generate a random graph with a given degree sequence $k_1, k_2, \ldots, k_N$
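One common way to reach this goal is stub matching, often called the configuration model. The surviving slide text does not name the method, so the sketch below is an assumption about the approach, and the function name is hypothetical. Each node gets as many "stubs" as its degree, and stubs are paired uniformly at random.

```python
import random

def configuration_model(degrees, rng):
    """Pair up degree 'stubs' at random; may create self-loops and multi-edges."""
    if sum(degrees) % 2:
        raise ValueError("the degree sequence must have an even sum")
    # Node i contributes degrees[i] stubs (half-edges).
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    # Consecutive stubs in the shuffled list form the edges.
    return list(zip(stubs[0::2], stubs[1::2]))

target_degrees = [3, 2, 2, 1]
sampled_edges = configuration_model(target_degrees, random.Random(0))

# Recount degrees; a self-loop (u, u) contributes 2 to node u.
realized = [0] * len(target_degrees)
for u, v in sampled_edges:
    realized[u] += 1
    realized[v] += 1
```

The realized degrees match the target exactly, but only if self-loops and repeated edges are tolerated; removing them changes the degree sequence slightly.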

Page 43

Can we have high clustering while also having short paths?

Page 44

• What is the typical shortest path length between any two people?
  - Experiment on the global friendship network
  - Can't measure, need to probe explicitly
  - Picked 300 people in Omaha, Nebraska and Wichita, Kansas
  - Asked them to get a letter to a stock-broker in Boston by passing it through friends
• How many steps did it take?

Page 45

Milgram's small world experiment [Milgram, '67]

• 64 chains completed (i.e., 64 letters reached the target)
  - … average, thus "6 degrees of separation"
  - … had shorter paths to the stockbroker than random people: 5.4 vs 6.7
  - … closer paths: 4.4

Page 46

• Assume each human is connected to 100 other people. Then:
  - Step 1: reach 100 people
  - Step 2: reach 100*100 = 10,000 people
  - Step 3: reach 100*100*100 = 1,000,000 people
  - Step 4: reach 100*100*100*100 = 100M people
  - In 5 steps we can reach 10 billion people
• What's wrong here? We ignore clustering!
  - Not all edges point to new people
  - 92% of FB friendships happen through a friend-of-a-friend

Page 47

• MSN network has 7 orders of magnitude larger clustering than the corresponding $G_{np}$

Notation:

• h: average shortest path length
• C: average clustering coefficient
• "actual": real network
• "random": random graph with same avg degree

Networks compared (table columns: Network, h actual, h random, C random; the values are not recoverable):

• Actor collaborations (IMDB): N = 225,226 nodes, avg degree k = 61
• Electrical power grid: N = 4,941 nodes, k = 2.67
• Network of neurons: N = 282 nodes, k = 14

Page 48

• Consequence of expansion:
  - Short paths: O(log n)
  - This is the smallest diameter we can get if we have a constant degree.
• But networks have "local" structure:
  - Friend of a friend is my friend
  - … diameter is also high

[Figure: low diameter, low clustering coefficient vs. high clustering coefficient, high diameter]

Page 49

• Could a network with high clustering also be a small world ($\log n$ diameter)?
  - How can we at the same time have high clustering and small diameter?

[Figure: high clustering, high diameter vs. low clustering, low diameter]

Page 50

Small-World Model [Watts-Strogatz '98]

Two components to the model:

• (1) Start with a low-dimensional regular lattice
• Now introduce randomness ("shortcuts")
• (2) Rewire:
  - … shortcuts to join remote parts of the lattice
  - … the other end to a random node
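A minimal sketch of the two steps, under assumptions the fragmentary slide text does not confirm: the lattice is taken to be a ring in which each node links to its k nearest neighbors (k even), and each lattice edge is rewired independently with probability p by moving its far end to a uniformly chosen node. The function and parameter names are hypothetical.

```python
import random

def watts_strogatz(n, k, p, rng):
    """Ring lattice on n nodes (k nearest neighbors each), then random rewiring."""
    adj = {u: set() for u in range(n)}
    # Step 1: regular ring lattice; (u, u + s) for s = 1 .. k/2.
    lattice = [(u, (u + s) % n) for s in range(1, k // 2 + 1) for u in range(n)]
    for u, v in lattice:
        adj[u].add(v)
        adj[v].add(u)
    # Step 2: with probability p, move the far end of an edge to a random node.
    for u, v in lattice:
        if rng.random() < p:
            # Avoid self-loops and duplicate edges.
            candidates = [w for w in range(n) if w != u and w not in adj[u]]
            if candidates:
                w = rng.choice(candidates)
                adj[u].remove(v)
                adj[v].remove(u)
                adj[u].add(w)
                adj[w].add(u)
    return adj

regular = watts_strogatz(20, 4, 0.0, random.Random(0))      # pure lattice
small_world = watts_strogatz(20, 4, 0.3, random.Random(0))  # some shortcuts
```

Rewiring keeps the number of edges fixed at n·k/2, so p = 0 gives the regular lattice and p = 1 gives something close to a random graph with the same average degree.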

Page 51

[Figure: three networks labeled "High clustering, High diameter", "High clustering, Low diameter", and "Low clustering, Low diameter"; partly recoverable annotations: $h = \frac{\log N}{\log \alpha}$ and $\frac{3}{4}$]

Rewiring allows us to "interpolate" between a regular lattice and a random graph [Watts-Strogatz, '98]

Page 53

• Could a network with high clustering be at the same time a small world?
  - … and the small-world

Page 54

• What mechanisms do people use to navigate networks and find the target?

Page 55

The setting:

• s only knows the locations of its friends and the location of the target t
• s does not know the links of anyone but itself
• Geographic navigation: s "navigates" to a node geographically closest to t
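Geographic navigation in this setting can be sketched as a greedy, decentralized search: at each step the current node forwards to whichever of its own neighbors is closest to the target, using only local knowledge. The grid layout, the Manhattan distance, and the function names below are assumptions for illustration.

```python
def grid_distance(a, b):
    """Manhattan distance between two grid coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def greedy_route(adj, s, t, max_steps=1000):
    """Forward to the neighbor closest to t; stop if no neighbor is closer."""
    path = [s]
    current = s
    while current != t and len(path) <= max_steps:
        nxt = min(adj[current], key=lambda v: grid_distance(v, t))
        if grid_distance(nxt, t) >= grid_distance(current, t):
            return path, False          # stuck: no neighbor makes progress
        current = nxt
        path.append(current)
    return path, current == t

def grid_with_shortcut(side, shortcut):
    """side x side grid (nodes are coordinates) plus one long-range edge."""
    adj = {(x, y): set() for x in range(side) for y in range(side)}
    for (x, y) in list(adj):
        for w in ((x + 1, y), (x, y + 1)):
            if w in adj:
                adj[(x, y)].add(w)
                adj[w].add((x, y))
    a, b = shortcut
    adj[a].add(b)
    adj[b].add(a)
    return adj

toy_adj = grid_with_shortcut(3, ((0, 0), (2, 1)))
toy_path, reached = greedy_route(toy_adj, (0, 0), (2, 2))
```

In the toy grid the source takes its long-range link first because that neighbor is closest to the target, then finishes along a grid edge.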

Page 56

[Table residue: search time $O((\log n)^{\beta})$, listed for Kleinberg's model; the remaining entries are not recoverable]

Note: We know these graphs have diameter O(log n). So in Kleinberg's model search time is polynomial in log n, while in Watts-Strogatz it is exponential (in log n).

Page 57

• … are not random
  - They follow geography!

[Figure: Saul Steinberg, "View of the World from 9th Avenue"]

Page 58

• Model [Kleinberg, Nature '01]
  - d(u,v): grid distance between u and v
  - Probability of a long-range link from u to v: $P(u \to v) \sim d(u,v)^{-\alpha}$
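A sketch of how a long-range link could be drawn with probability proportional to $d(u,v)^{-\alpha}$. The Manhattan grid distance, the 5×5 grid, and the function names are assumptions for illustration.

```python
import random

def grid_distance(u, v):
    """Manhattan distance between two grid coordinates."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])

def link_probabilities(u, nodes, alpha):
    """Normalize d(u, v)^(-alpha) over all v != u into a probability distribution."""
    weights = {v: grid_distance(u, v) ** (-alpha) for v in nodes if v != u}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}

def sample_long_range_link(u, nodes, alpha, rng):
    """Draw the endpoint of u's long-range link."""
    probs = link_probabilities(u, nodes, alpha)
    targets = list(probs)
    return rng.choices(targets, weights=[probs[v] for v in targets], k=1)[0]

grid = [(x, y) for x in range(5) for y in range(5)]
uniform = link_probabilities((0, 0), grid, alpha=0)   # every target equally likely
local = link_probabilities((0, 0), grid, alpha=2)     # nearby targets favored
example_target = sample_long_range_link((0, 0), grid, 2, random.Random(0))
```

With α = 0 the link is uniformly random, while larger α concentrates the probability on nearby nodes: with α = 2, a node at distance 1 is 64 times more likely than one at distance 8.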

Page 59

• We know:
  - α = 0 (i.e., Watts-Strogatz): We need … steps
  - α = 1: We need $O(\log(n)^2)$ steps

… "funnels inward" through these different scales of resolution, as we see from the way the letter depicted in this figure reduces its distance to the target by approximately a factor of two with each step.

So now let's look at how the inverse-square exponent q = 2 interacts with these scales of resolution. We can work concretely with a single scale by taking a node v in the network, and a fixed distance d, and considering the group of nodes lying at distances between d and 2d from v, as shown in Figure 20.7.

Now, what is the probability that v forms a link to some node inside this group? Since area in the plane grows like the square of the radius, the total number of nodes in this group is proportional to $d^2$. On the other hand, the probability that v links to any one node in the group varies depending on exactly how far out it is, but each individual probability is proportional to $d^{-2}$. These two terms, the number of nodes in the group and the …

Page 60

Small α: too many long links. Big α: too many short links.

Date posted: 26/07/2023, 19:35