Document: PARALLEL COMPUTER ARCHITECTURES-8 doc


DOCUMENT INFORMATION

Basic information

Title: Parallel Computer Architectures
Type: Document
Pages: 43
Size: 185.52 KB


Page 1

PARALLEL COMPUTER ARCHITECTURES

Page 2

[Figure labels: P (CPU); shared memory]

Figure 8-1. (a) A multiprocessor with 16 CPUs sharing a common memory. (b) An image partitioned into 16 sections, each being analyzed by a different CPU.

Page 3

[Figure labels: M (private memory); P (CPU); message-passing interconnection network]

Figure 8-2. (a) A multicomputer with 16 CPUs, each with its own private memory. (b) The bit-map image of Fig. 8-1 split up among the 16 memories.

Page 4

[Figure labels: for each machine, a stack of layers (application, language run-time system, operating system, hardware), with the shared memory drawn at a different layer in each panel]

Figure 8-3. Various layers where shared memory can be implemented. (a) The hardware. (b) The operating system. (c) The language runtime system.

Page 5

Figure 8-4. Various topologies. The heavy dots represent switches. The CPUs and memories are not shown. (a) A star. (b) A complete interconnect. (c) A tree. (d) A ring. (e) A grid. (f) A double torus. (g) A cube. (h) A 4D hypercube.
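The cube and 4D hypercube in panels (g) and (h) follow one rule: two switches are linked when their binary labels differ in exactly one bit. The sketch below illustrates that rule in Python. The numbering of switches from 0 to 2^d - 1 and the function names are assumptions made for illustration; the figure itself does not label the switches.

```python
def hypercube_neighbors(node: int, dim: int) -> list[int]:
    """Return the switches directly linked to `node` in a dim-dimensional hypercube."""
    if not 0 <= node < (1 << dim):
        raise ValueError("node label out of range for this dimension")
    # Flipping each of the dim label bits in turn gives the dim neighbors.
    return [node ^ (1 << bit) for bit in range(dim)]


def hop_distance(a: int, b: int) -> int:
    """Minimum number of links between two switches: the count of differing label bits."""
    return bin(a ^ b).count("1")
```

Under this labeling, every switch in the 4D hypercube has four links, and the two farthest switches are four hops apart.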

Page 6

Figure 8-5. An interconnection network in the form of a four-switch square grid. Only two of the CPUs are shown.

Page 7

[Figure labels: panels (a) and (c); CPU 1; CPU 2; four-port switches A, B, C, D; input port; output port; entire packet]

Figure 8-6. Store-and-forward packet switching.

Page 9

Figure 8-8. Real programs achieve less than the perfect speed-up indicated by the dotted line.

Page 10

[Figure labels: sequential fraction f, 1 CPU active, time fT; potentially parallelizable fraction 1 - f, n CPUs active, time (1 - f)T/n]

Figure 8-9. (a) A program has a sequential part and a parallelizable part. (b) Effect of running part of the program in parallel.
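The two time labels in Figure 8-9 combine into the relation usually called Amdahl's law: with sequential fraction f and n CPUs, the run time is fT + (1 - f)T/n, so the speed-up is T divided by that, which simplifies to n / (1 + (n - 1)f). A small Python sketch of the arithmetic follows; the function names are illustrative, not from the source.

```python
def parallel_time(total_time: float, f: float, n: int) -> float:
    """Run time on n CPUs when a fraction f of the work is sequential."""
    if not 0.0 <= f <= 1.0 or n < 1:
        raise ValueError("need 0 <= f <= 1 and n >= 1")
    return f * total_time + (1.0 - f) * total_time / n


def speedup(f: float, n: int) -> float:
    """Speed-up T / (fT + (1 - f)T/n), simplified to n / (1 + (n - 1) f)."""
    if not 0.0 <= f <= 1.0 or n < 1:
        raise ValueError("need 0 <= f <= 1 and n >= 1")
    return n / (1.0 + (n - 1) * f)
```

Only f = 0 gives the perfect speed-up of n. For any f > 0 the speed-up stays below 1/f however many CPUs are added.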

Page 11

[Figure label: bus]

Figure 8-10. (a) A 4-CPU bus-based system. (b) A 16-CPU bus-based system. (c) A 4-CPU grid-based system. (d) A 16-CPU grid-based system.

Page 12

Figure 8-11. Computational paradigms. (a) Pipeline. (b) Phased computation. (c) Divide and conquer. (d) Replicated worker.

Page 13

Multiprocessor   Message passing    Message passing simulated with buffers in memory
Multicomputer    Shared variables   DSM, Linda, Orca, etc. on an SP/2 or a PC network
Multicomputer    Message passing    PVM or MPI on an SP/2 or a network of PCs

Page 14

Instruction streams   Data streams   Name   Examples

Page 15

[Figure labels: multiprocessors (shared memory): bus, switched, CC-NUMA, NC-NUMA; multicomputers (message passing): grid, hypercube]

Figure 8-14. A taxonomy of parallel computers.

Page 16

[Figure labels: input vectors; vector ALU]

Figure 8-15. A vector ALU.

Page 20

[Figure labels. Registers: 8 64-bit scalar registers; 64 64-bit holding registers for scalars; 8 64-bit vector registers, 64 elements per register. Functional units: scalar integer units; scalar/vector floating-point units; vector integer units. Operations shown: ADD, BOOLEAN, SHIFT, POP COUNT, MUL, RECIP.]

Figure 8-19. Registers and functional units of the Cray-1.

Page 21

[Figure, panel (a): CPUs 1 and 2 write 100 and 200 to word x; CPUs 3 and 4 each read x twice]

(b) W100, W200, R3 = 200, R3 = 200, R4 = 200, R4 = 200
(c) W100, R3 = 100, W200, R4 = 200, R3 = 200, R4 = 200
(d) W200, R4 = 200, W100, R3 = 100, R4 = 100, R3 = 100

Figure 8-20. (a) Two CPUs writing and two CPUs reading a common memory word. (b) - (d) Three possible ways the two writes and four reads might be interleaved in time.
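The three orderings in Figure 8-20 can be checked mechanically by replaying a global order of operations against a single memory word. The Python sketch below enumerates every global order that keeps each CPU's own operations in sequence and records what the readers see. It assumes that CPU 1 writes 100, CPU 2 writes 200, and x starts at 0; none of these is stated in the caption, and the names are illustrative.

```python
from itertools import permutations

# Per-CPU operations in program order. Which CPU writes which value is assumed.
PROGRAMS = {
    1: [("W", 100)],
    2: [("W", 200)],
    3: [("R", None), ("R", None)],
    4: [("R", None), ("R", None)],
}


def global_orders(programs):
    """All distinct global orders, each written as the sequence of issuing CPUs."""
    slots = [cpu for cpu, ops in programs.items() for _ in ops]
    return set(permutations(slots))


def replay_order(order, programs, initial=0):
    """Replay one global order against word x and return each reader's values."""
    x = initial  # assumed initial contents of x
    position = {cpu: 0 for cpu in programs}
    reads = {}
    for cpu in order:
        kind, value = programs[cpu][position[cpu]]
        position[cpu] += 1
        if kind == "W":
            x = value
        else:
            reads.setdefault(cpu, []).append(x)
    return {cpu: tuple(values) for cpu, values in reads.items()}


def possible_outcomes(programs):
    """Every combination of read results that some global order can produce."""
    return {
        tuple(sorted(replay_order(order, programs).items()))
        for order in global_orders(programs)
    }
```

Orderings (b) to (d) correspond to the orders (1, 2, 3, 3, 4, 4), (1, 3, 2, 4, 3, 4) and (2, 4, 1, 3, 4, 3). A result in which CPU 3 reads 100 then 200 while CPU 4 reads 200 then 100 never appears, because no single global order can place both writes after each other.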

Page 22

Figure 8-21. Weakly consistent memory uses synchronization operations to divide time into sequential epochs.

Page 23

Figure 8-22. Three bus-based multiprocessors. (a) Without caching. (b) With caching. (c) With caching and private memories.

Page 24

Figure 8-23. The write-through cache coherence protocol. The empty boxes indicate that no action is taken.

Page 25

[Figure: successive bus events for block A: CPU 1 reads block A; CPU 2 reads block A; CPU 2 writes block A; CPU 3 reads block A; CPU 2 writes block A; CPU 1 writes block A. State label shown: Modified]

Figure 8-24. The MESI cache coherence protocol.
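The event sequence in Figure 8-24 can be traced with a small state tracker. The Python sketch below follows the usual MESI rules for one block: a read miss yields Exclusive if no other cache holds the block and Shared otherwise, a write leaves the writer Modified and every other copy Invalid, and a Modified copy is written back before another CPU gets the block. The class and function names are illustrative, and details such as write-allocate on a write miss are assumptions rather than something this copy of the figure shows.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class MesiBlock:
    """Tracks the MESI state of one memory block in each CPU's cache."""

    def __init__(self, n_cpus: int) -> None:
        self.state = [State.INVALID] * n_cpus
        self.writebacks = 0  # how often a Modified copy was flushed to memory

    def _flush_if_modified(self, cpu: int) -> None:
        if self.state[cpu] is State.MODIFIED:
            self.writebacks += 1

    def read(self, cpu: int) -> None:
        if self.state[cpu] is not State.INVALID:
            return  # read hit: state unchanged
        holders = [i for i, s in enumerate(self.state) if s is not State.INVALID]
        for i in holders:
            self._flush_if_modified(i)
            self.state[i] = State.SHARED
        self.state[cpu] = State.SHARED if holders else State.EXCLUSIVE

    def write(self, cpu: int) -> None:
        for i in range(len(self.state)):
            if i != cpu:
                self._flush_if_modified(i)
                self.state[i] = State.INVALID
        self.state[cpu] = State.MODIFIED

    def snapshot(self) -> str:
        return "".join(s.value for s in self.state)


def replay_events(events, n_cpus=3):
    """Apply (operation, cpu) events, CPUs numbered from 1; return per-step states."""
    block = MesiBlock(n_cpus)
    history = []
    for op, cpu in events:
        getattr(block, op)(cpu - 1)
        history.append(block.snapshot())
    return history, block.writebacks


# The six events listed in the figure, in order.
FIGURE_EVENTS = [("read", 1), ("read", 2), ("write", 2),
                 ("read", 3), ("write", 2), ("write", 1)]
```

Calling `replay_events(FIGURE_EVENTS)` gives one state string per event, with one letter per CPU, so the effect of each read or write on all three caches can be compared against the panels.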

Page 26

[Figure labels: closed crosspoint switch; open crosspoint switch]

Figure 8-25. (a) An 8×8 crossbar switch. (b) An open crosspoint. (c) A closed crosspoint.

Page 28

[Figure labels: B, X, Y; message fields: Module, Address, Opcode, Value]

Figure 8-27. (a) A 2×2 switch. (b) A message format.

Page 30

Figure 8-29. A NUMA machine based on two levels of buses. The Cm* was the first multiprocessor to use this design.

Page 31

Figure 8-30. (a) A 256-node directory-based multiprocessor. (b) Division of a 32-bit memory address into fields. (c) The directory at node 36.
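Splitting a 32-bit address into fields, as in panel (b), is a matter of shifts and masks. The Python sketch below shows one way to do it. With 256 nodes the node field needs 8 bits; the 18-bit block field and 6-bit offset field are assumptions chosen so that the three widths add up to 32, since the field widths are not legible in this copy of the figure.

```python
NODE_BITS = 8     # 256 nodes need 8 bits
BLOCK_BITS = 18   # assumed width, not stated in the caption
OFFSET_BITS = 6   # assumed width, not stated in the caption


def split_address(addr: int) -> tuple[int, int, int]:
    """Split a 32-bit address into (node, block, offset) fields."""
    if not 0 <= addr < (1 << 32):
        raise ValueError("address must fit in 32 bits")
    offset = addr & ((1 << OFFSET_BITS) - 1)
    block = (addr >> OFFSET_BITS) & ((1 << BLOCK_BITS) - 1)
    node = addr >> (OFFSET_BITS + BLOCK_BITS)
    return node, block, offset


def join_address(node: int, block: int, offset: int) -> int:
    """Inverse of split_address, handy for building test addresses."""
    return (node << (BLOCK_BITS + OFFSET_BITS)) | (block << OFFSET_BITS) | offset
```

Under these assumed widths, `split_address(join_address(36, 4, 8))` returns `(36, 4, 8)`: the node field selects the home node whose directory is consulted, and the block field indexes into that directory.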

Page 32

Uncached, shared, modified

This is the directory for cluster 13. This bit tells whether cluster 0 has block 1 of the memory homed here in any of its caches.

Page 33

[Figure labels: snooping bus interface; quad board with 4 Pentium Pros and up to 4 GB of RAM]

Figure 8-32. The NUMA-Q multiprocessor.

Page 34

[Figure labels: local memory table at home node; node 4 cache directory; node 9 cache directory; node 22 cache directory]

Figure 8-33. SCI chains all the holders of a given cache line together in a doubly-linked list. In this example, a line is shown cached at three nodes.

Page 35

[Figure labels: local interconnect; disk and I/O; high-performance interconnection network]

Figure 8-34. A generic multicomputer.

Page 36

[Figure labels: Control + E registers; Commun. processor; Mem]

Figure 8-35. The Cray Research T3E.

Page 37

[Figure labels: PPro; 64 MB; I/O; NIC; 64-bit local bus; Kestrel board]

Figure 8-36. The Intel/Sandia Option Red system. (a) The Kestrel board. (b) The interconnection network.

Page 38

Figure 8-37. Scheduling a COW. (a) FIFO. (b) Without head-of-line blocking. (c) Tiling. The shaded areas indicate idle CPUs.

Page 39

[Figure labels: CPU; packet going east; packet going west; line card; Ethernet; switch; backplane]

Figure 8-38. (a) Three computers on an Ethernet. (b) An Ethernet switch.

Page 40

Figure 8-39. Sixteen CPUs connected by four ATM switches. Two virtual circuits are shown.

Page 41

[Figure labels: globally shared virtual memory consisting of 16 pages; memory; network]

Figure 8-40. A virtual address space consisting of 16 pages spread over four nodes of a multicomputer. (a) The initial situation. (b) After CPU 0 references page 10. (c) After CPU 1 references page 10, here assumed to be a read-only page.

Page 42

("abc", 2, 5)

("matrix-1", 1, 6, 3.14)

("family", "is sister", Carolyn, Elinor)

Figure 8-41. Three Linda tuples.
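To make the tuples of Figure 8-41 concrete, here is a toy tuple space in Python. Linda's `out` deposits a tuple, while `in` and `rd` retrieve one by matching a template. In this sketch a template field is either an actual value, matched by equality, or a Python type standing in for a formal parameter. The class name, the `take` method (used because `in` is a reserved word in Python), and the choice to return `None` instead of blocking when nothing matches are all assumptions of the sketch. Carolyn and Elinor are written as strings here, whereas the figure shows them unquoted.

```python
class TupleSpace:
    """A single-process, non-blocking toy model of a Linda tuple space."""

    def __init__(self) -> None:
        self._tuples = []

    def out(self, *fields) -> None:
        """Deposit a tuple into the space."""
        self._tuples.append(tuple(fields))

    @staticmethod
    def _matches(template, candidate) -> bool:
        if len(template) != len(candidate):
            return False
        for want, have in zip(template, candidate):
            if isinstance(want, type):
                if not isinstance(have, want):  # formal: any value of that type
                    return False
            elif want != have:                  # actual: must be equal
                return False
        return True

    def rd(self, *template):
        """Return a matching tuple without removing it, or None if none matches."""
        for candidate in self._tuples:
            if self._matches(template, candidate):
                return candidate
        return None

    def take(self, *template):
        """Like Linda's `in`: return a matching tuple and remove it from the space."""
        found = self.rd(*template)
        if found is not None:
            self._tuples.remove(found)
        return found


space = TupleSpace()
space.out("abc", 2, 5)
space.out("matrix-1", 1, 6, 3.14)
space.out("family", "is sister", "Carolyn", "Elinor")
```

A template such as `("abc", int, int)` matches the first tuple, while `("abc", 2)` matches nothing because the number of fields differs.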

Page 43

Object implementation stack;

    stack: array [integer 0..N-1] of integer;

    operation push(item: integer);    # function returning nothing
    begin

    end;

    operation pop( ): integer;        # function returning an integer
    begin

Figure 8-42. A simplified ORCA stack object, with internal data and two operations.
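The operation bodies of the ORCA object are cut off in this copy, so the following is not a transcription of them. It is a hedged Python sketch of what a shared stack object with two mutually exclusive operations might look like. The capacity limit, the choice to block `push` when the stack is full and `pop` when it is empty, and all names are assumptions made for illustration.

```python
import threading


class Stack:
    """A bounded stack whose operations run one at a time (assumed semantics)."""

    def __init__(self, capacity: int) -> None:
        self._items = []
        self._capacity = capacity
        self._cond = threading.Condition()

    def push(self, item: int) -> None:
        """Add an item, waiting while the stack is full."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._items) < self._capacity)
            self._items.append(item)
            self._cond.notify_all()

    def pop(self) -> int:
        """Remove and return the top item, waiting while the stack is empty."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._items) > 0)
            item = self._items.pop()
            self._cond.notify_all()
            return item
```

The condition variable serves two purposes: its lock keeps the two operations from interleaving, and `wait_for` suspends a caller until the stack is in a state where its operation can proceed.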

Date posted: 12/12/2013, 09:15
