
Parallel Programming with Intel Parallel Studio XE


PARALLEL PROGRAMMING

FOREWORD
INTRODUCTION

PART I: AN INTRODUCTION TO PARALLELISM
CHAPTER 1: Parallelism Today
CHAPTER 2: An Overview of Parallel Studio XE
CHAPTER 3: Parallel Studio XE for the Impatient

PART II: USING PARALLEL STUDIO XE
CHAPTER 4: Producing Optimized Code
CHAPTER 5: Writing Secure Code
CHAPTER 6: Where to Parallelize
CHAPTER 7: Implementing Parallelism
CHAPTER 8: Checking for Errors
CHAPTER 9: Tuning Parallel Applications
CHAPTER 10: Parallel Advisor–Driven Design
CHAPTER 11: Debugging Parallel Applications
CHAPTER 12: Event-Based Analysis with VTune Amplifier XE

PART III: CASE STUDIES
CHAPTER 13: The World’s First Sudoku “Thirty-Niner”
CHAPTER 14: Nine Tips to Parallel-Programming Heaven
CHAPTER 15: Parallel Track Fitting in the CERN Collider
CHAPTER 16: Parallelizing Legacy Code

INDEX

Stephen Blair-Chappell
Andrew Stokes

Copyright © 2012 by John Wiley & Sons, Inc., Indianapolis, Indiana

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Web site is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Web site may provide or recommendations it may make. Further, readers should be aware that Internet Web sites listed in this work may have changed or disappeared between when this work was written and when it is read.

For general information on our other products and services please contact our Customer Care Department within the United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2011945570

Trademarks: Wiley, the Wiley logo, Wrox, the Wrox logo, Programmer to Programmer, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. Intel is a registered trademark of Intel Corporation. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc., is not associated with any product or vendor mentioned in this book.


ABOUT THE AUTHORS

STEPHEN BLAIR-CHAPPELL has been working for Intel in the Software and Services Group (SSG) for the past 15 years. During his time with Intel, Stephen has worked on the compiler team as a developer and, more recently, as a technical consulting engineer helping users make the best use of the Intel software tools. Prior to working with Intel, Stephen was managing director of the UK office of CAD-UL, a German-based compiler and debugger company. During his time at CAD-UL, Stephen was primarily responsible for technical support in the UK. Projects he worked on during that time included the design and specification of a graphical linker; the development and teaching of protected mode programming courses to programmers; and support to many varied companies in the telecoms, automotive, and embedded industries.

Stephen first studied electronics as a technician at Matthew Boulton Technical College, and later studied Applied Software Engineering at Birmingham City University (BCU), where he also eventually taught. Outside work, Stephen is a regular contributor to the life of his local church, St Martin in the Bull Ring, Birmingham, where he plays the organ, preaches, and leads the occasional service.

ANDREW STOKES is a retired lecturer in software and electronics at Birmingham City University (BCU), UK. Prior to lecturing, Andrew was a software developer in the research and commercial fields. He first started software development in the 1980s at Cambridge University Engineering Laboratory, where he worked on software for scanning electron microscopes. These software developments continued in the commercial field, where he worked on graphical programs in support of a Finite Element Analysis package.

During his time at BCU, Andrew developed many software simulation tools, including programs for artificial neural network simulation, CPU simulation, processor design, code development tools, and a PROLOG expert system. Andrew continues these software interests during retirement, with a healthy interest in games programming, such as 3-D chess, where parallel programming is paramount. Away from computing, Andrew is a keen gardener and particularly likes the vibrant colors of the typical English garden.


ABOUT THE TECHNICAL EDITORS

KITTUR GANESH is a Senior Technical Consulting Engineer at Intel, providing consulting, support, and training for more than 7 years on various software products targeting Intel architecture. Previously, for more than 6 years at Intel, Kittur designed and developed software primarily used for fracturing design data of Intel chips. Prior to joining Intel more than 13 years ago, Kittur was involved in developing commercial software in the EDA industry for more than 10 years. Kittur has an M.S. (Computer Science), an M.S. (Industrial Engineering), and a B.S. (Mechanical Engineering).

PABLO HALPERN is a Senior Software Engineer at Intel Corporation, working in the parallel runtime libraries group. He is a member of the C++ Standards Committee and helped produce the recent C++11 revision of the standard. Pablo is the author of the well-received book, The C++ Standard Library from Scratch and a coauthor of the paper, Reducers and Other Cilk++ Hyperobjects, which was named best paper at ACM SPAA in 2009. He has more than three decades of experience in the software industry, with expertise in C++, language and compiler design, large-scale development and testing, and network management protocols. During this time, he has developed and taught both beginning and advanced courses on C++ programming. He currently lives in New Hampshire with his wife and two children.

Mary Beth Wakefield

FREELANCER EDITORIAL MANAGER

in the graphics department, for your work on all the figures.

We’d like to say a special “thank you” to those who have contributed to the book, especially to Mark Davis, who wrote Chapter 10, “Parallel Advisor–Driven Design,” and to Fred Tedeschi, who wrote Chapter 11, “Debugging Parallel Applications.”

We also want to thank those who allowed us to write about their experiences in the case studies. Thanks to Lars Peters Endresen and Håvard Graff for their work in Chapter 13, “The World’s First Sudoku ‘Thirty-Niner’”; to Dr Yann Golanski, for his input into Chapter 14, “Nine Tips to Parallel-Programming Heaven”; and to Hans Pabst, for his help with Chapter 15, “Parallel Track Fitting in the CERN Collider.”

Our appreciation also goes to the many colleagues from Intel who tirelessly reviewed different chapters, in particular Levent Akyil, Bernth Andersson, Julian Horn, Martyn Corden, Maxym Dmytrychenko, Max Domeika, Hubert Haberstock, Markus Metzger, Mark Sabahi, and Thomas Zipplies.

Finally, thanks to James Reinders, who encouraged the writing of this book and has been kind enough to provide the Foreword.

— Stephen Blair-Chappell

Andrew Stokes

FOREWORD
INTRODUCTION

PART I: AN INTRODUCTION TO PARALLELISM

CHAPTER 1: PARALLELISM TODAY
The Emergence of Multi-Core and Many-Core Computing
Tools
Education
Maintainability
Parallelism and the Programmer
Examples of Mixing and Matching Parallel Constructs
Speedup and Scalability
Summary

CHAPTER 2: AN OVERVIEW OF PARALLEL STUDIO XE
What’s in Parallel Studio XE?
OpenMP
Static Security Analysis
Different Approaches to Using Parallel Studio XE
Summary

CHAPTER 3: PARALLEL STUDIO XE FOR THE IMPATIENT
Example 1: Working with Cilk Plus
Using Intel Parallel Amplifier XE for Hotspot Analysis
Example 2: Working with OpenMP
Summary

PART II: USING PARALLEL STUDIO XE

CHAPTER 4: PRODUCING OPTIMIZED CODE
Introduction
Optimizing Code in Seven Steps
Using the General Options on the Example Application
Generating Optimization Reports Using /Qopt-report
Determining That Auto-Vectorization Has Happened
Adding Interprocedural Optimization to the Example Application
The Impact of Interprocedural Optimization on Auto-Vectorization
Building Applications to Run on More Than One Type of CPU
Manual CPU Dispatch: Rolling Your Own CPU-Specific Code
Summary

CHAPTER 5: WRITING SECURE CODE
A Simple Security Flaw Example
Understanding Static Security Analysis
Creating a Build Specification File by Injection
Using Static Security Analysis in a QA Environment
Summary

CHAPTER 6: WHERE TO PARALLELIZE
Different Ways of Profiling
Hotspot Analysis Using the Intel Compiler
Hotspot Analysis Using the Auto-Parallelizer
Hotspot Analysis with Amplifier XE
Summary

CHAPTER 7: IMPLEMENTING PARALLELISM
C or C++, That Is the Question
The Beauty of Lambda Functions
Parallelizing Sections and Functions
OpenMP
TBB
Parallelizing Recursive Functions
OpenMP
TBB
Parallelizing Pipelined Applications
OpenMP
TBB
Summary

CHAPTER 8: CHECKING FOR ERRORS
Parallel Inspector XE Analysis Types
Controlling the Right Level of Detail
Avoiding Being Overwhelmed by the Amount of Data

CHAPTER 9: TUNING PARALLEL APPLICATIONS
Measuring the Baseline Using the Amplifier XE Command Line
Identifying Concurrency Hotspots
Conducting Further Analysis and Tuning
Using the Intel Software Autotuning Tool
Summary

CHAPTER 10: PARALLEL ADVISOR–DRIVEN DESIGN
Getting Started with the NQueens Example Program
Summary

CHAPTER 11: DEBUGGING PARALLEL APPLICATIONS
Introduction to the Intel Debugger
Using the Intel Debugger to Detect Data Races
Observing the Results
Using Suppression Filters to Discard Unwanted Events
Using Focus Filters to Examine a Selected Portion of Code
Runtime Investigation: Viewing the State of Your Application
Using the OpenMP Tasks Window to Investigate Variables
Using the OpenMP Spawn Tree Window to View the Behavior
Summary

CHAPTER 12: EVENT-BASED ANALYSIS WITH VTUNE AMPLIFIER XE
Testing the Health of an Application
Is CPI on Its Own a Good Enough Measure of Health?
Conducting a Hotspot Analysis
User Mode Hotspots Versus Lightweight Hotspots
Conducting a General Exploration Analysis
Using Amplifier XE’s Other Tools
Using Amplifier XE from the Command Line
Summary

PART III: CASE STUDIES

CHAPTER 13: THE WORLD’S FIRST SUDOKU “THIRTY-NINER”
The Sudoku Optimization Challenge
Hands-On Example: Optimizing the Sudoku Generator
Adding Parallelism to the Generator Using OpenMP
Summary

CHAPTER 14: NINE TIPS TO PARALLEL-PROGRAMMING HEAVEN
The Challenge: Simulating Star Formation
Identifying the Hotspot and Discovering the Calling Sequence
Implementing Parallelism
Summary

CHAPTER 15: PARALLEL TRACK FITTING IN THE CERN COLLIDER
The Stages of a High-Energy Physics Experiment
Configuring the Array Building Blocks Build Environment
Summary

CHAPTER 16: PARALLELIZING LEGACY CODE
Introducing the Dhrystone Benchmark
Adding Amplifier XE APIs to Timestamp the Dhrystone Loop
Attempt One: Synchronizing Shared Variable Access
Initializing and Accessing the Global Variables
Parallelizing the C++ Version
Attempt Three: Wrapping the Application in a C++ Class


Learning from real examples can filter theoretical distractions and inject less glamorous realities. Real experiences and examples help us to see what matters the most.

In this book, I am pleased that Stephen shares tips from his interviews to understand how to really use tools and develop parallel code. The result is a book with value that is not apparent from simply browsing the table of contents.

For instance, I know data layout critically affects the ability to process data in parallel, but I like to be convinced by real examples. The topic of data layouts, such as the need to use “structures of arrays” instead of “arrays of structures” (SOA vs. AOS), is brought to the forefront by Stephen asking the provocative question, “If you were doing the project again, is there anything you would do differently?” in the “Parallel Track Fitting in the CERN Collider” interview (Chapter 15). In response, the interviewed developer highlights the importance of data models to getting effective parallel programs. “The World’s First Sudoku ‘Thirty-Niner’” (Chapter 13) highlights that “much of the time taken was used in reworking the code so that there was less need to share data between the different running tasks.”

The ubiquitous nature of parallelism affects every aspect of programming today. I’m encouraged by Stephen’s work, which walks through each aspect instead of just coding. Covering the issues of discovery, debugging, and tuning is critical to understanding the challenges of parallel programming.

I hope this book is an inspiration to all who read it.

“Think Parallel.”

—James Reinders

Director, Parallel Evangelist, Intel

Portland, Oregon, March 2012


Nearly all the computers sold today have a multi-core processor, but only a small number of applications are written to take advantage of the extra cores. Most programmers are playing catch-up. A recent consultation with a group of senior programming engineers revealed the top three hurdles in adopting parallelism: the challenges of porting legacy code, the lack of education, and the lack of the right kinds of programming tools. This book helps to address some of these hurdles.

This book was written to help you use Intel Parallel Studio XE to write programs that use the latest features of multi-core CPUs. With the help of this book, you should be able to produce code that is fast, safe, and parallel. In addition to helping you write parallel code, some chapters cover other optimization topics that you can use in your code development, regardless of whether or not you are developing parallel code. Most of the chapters include hands-on activities that will help you apply the techniques being explained.

WHO THIS BOOK IS FOR

If you are writing parallel code or are interested in writing parallel code, this book is for you. The target audience includes:

• C and C++ developers who are adding parallelism to their code. The required technical skill is “average” to “experienced.” Knowledge of C programming is a prerequisite.

• Students and academics who are looking to gain practical experience in making code parallel.

• Owners and users of Intel Parallel Studio XE.

WHAT THIS BOOK COVERS

This book, written using Parallel Studio XE 2011, shows how you can profile, optimize, and parallelize your code. By reading this book, you will learn how to:

• Analyze applications to determine the best place to implement parallelism

• Implement parallelism using a number of language extensions/standards

• Detect and correct difficult-to-find parallel errors

• Tune parallel programs

• Write code that is more secure

• Use the compiler switches to create optimized code that takes advantage of the latest CPU extensions

• Perform an architectural analysis to answer the question, “Is my program making the best use of the CPU?”

HOW THIS BOOK IS STRUCTURED

The book comprises the following parts:

• Part I: An Introduction to Parallelism

• Part II: Using Parallel Studio XE

• Part III: Case Studies

Every chapter in the book, with the exception of the first two chapters, offers hands-on activities. These activities are an important part of the book, although you can read the book without completing them.

Chapters 6–9 are intended to be used in sequence, showing how to add parallelism to your code using a well-tested, four-step methodology (analyze, implement, error-check, and tune). Examples of parallelism are provided using Cilk Plus, OpenMP, and Threading Building Blocks.

The case studies are based on larger projects and show how Parallel Studio XE was used to parallelize them.

WHAT YOU NEED TO USE THIS BOOK

You need the following to use this book:

• Intel Parallel Studio XE. You can download an evaluation version from the Intel Software Evaluation Center (http://software.intel.com/en-us/articles/intel-software-evaluation-center/).

• If you are using Windows:

  • Visual Studio (not the Express edition) version 2005, 2008, or 2010

  • Windows XP, Windows 2008, or Windows 7

• If you are using Linux:

  • An installation of the GNU GCC compiler development tools

  • Debian* 6.0; Red Hat Enterprise Linux* 4 (Deprecated), 5, 6; SUSE Linux Enterprise Server* 10, 11 SP1; or Ubuntu* 10.04

• A PC based on an IA-32 or Intel 64 architecture processor supporting the Intel Streaming SIMD Extensions 2 (Intel SSE2) instructions (Intel Pentium 4 processor or later), or compatible non-Intel processor. If you use a non-Intel processor, you will not be able to carry out the activities in Chapter 12, “Event-Based Analysis with VTune Amplifier XE.”


As for styles in the text:

• We italicize new terms and important words when we introduce them.

• We show keyboard strokes like this: Ctrl+A.

• We show filenames, URLs, and code within the text like so: persistence.properties.

• We present code in two different ways:

  We use a monofont type with no highlighting for most code examples.

  We use bold to emphasize code that is particularly important in the present context or to show changes from a previous code snippet.

Listings include the filename in the title. If it is just a code snippet, you’ll find the filename in a code note such as this:

Code snippet filename


Because many books have similar titles, you may find it easiest to search by ISBN; this book’s ISBN is 978-0-470-89165-0.

Once you download the code, just decompress it with your favorite compression tool. Alternately, you can go to the main Wrox code download page at www.wrox.com/dynamic/books/download.aspx to see the code available for this book and all other Wrox books.

ERRATA

We make every effort to ensure that there are no errors in the text or in the code. However, no one is perfect, and mistakes do occur. If you find an error in one of our books, like a spelling mistake or faulty piece of code, we would be very grateful for your feedback. By sending in errata, you may save another reader hours of frustration, and at the same time, you will be helping us provide even higher-quality information.

To find the errata page for this book, go to www.wrox.com and locate the title using the Search box or one of the title lists. Then, on the book details page, click the Book Errata link. On this page, you can view all errata that have been submitted for this book and posted by Wrox editors. A complete book list, including links to each book’s errata, is also available at www.wrox.com/misc-pages/booklist.shtml.

If you don’t spot “your” error on the Book Errata page, go to www.wrox.com/contact/techsupport.shtml and complete the form there to send us the error you have found. We’ll check the information and, if appropriate, post a message to the book’s errata page and fix the problem in subsequent editions of the book.

P2P.WROX.COM

For author and peer discussion, join the P2P forums at p2p.wrox.com. The forums are a web-based system for you to post messages relating to Wrox books and related technologies and interact with other readers and technology users. The forums offer a subscription feature to e-mail you topics of interest of your choosing when new posts are made to the forums. Wrox authors, editors, other industry experts, and your fellow readers are present on these forums.

At http://p2p.wrox.com, you will find a number of different forums that will help you, not only as you read this book, but also as you develop your own applications. To join the forums, just follow these steps:

1. Go to p2p.wrox.com and click the Register link.

2. Read the terms of use and click Agree.

3. Complete the required information to join, as well as any optional information you wish to provide, and click Submit.

4. You will receive an e-mail with information describing how to verify your account and complete the joining process.

You can read messages in the forums without joining P2P, but in order to post your own messages, you must join.

Once you join, you can post new messages and respond to messages other users post. You can read messages at any time on the web. If you would like to have new messages from a particular forum e-mailed to you, click the Subscribe to this Forum icon by the forum name in the forum listing.

For more information about how to use the Wrox P2P, be sure to read the P2P FAQs for answers to questions about how the forum software works, as well as many common questions specific to P2P and Wrox books. To read the FAQs, click the FAQ link on any P2P page.

PART I

An Introduction to Parallelism

CHAPTER 1: Parallelism Today

CHAPTER 2: An Overview of Parallel Studio XE

CHAPTER 3: Parallel Studio XE for the Impatient

Parallelism Today

WHAT’S IN THIS CHAPTER?

• How parallelism arrived and why parallel programming is feared

• Different parallel models that you can use, along with some potential pitfalls this new type of programming introduces

• How to predict the behavior of parallel programs

The introduction of multi-core processors brings a new set of challenges for the programmer. After a brief discussion on the power density race, this chapter looks at the top six parallel programming challenges. Finally, the chapter presents a number of different programming models that you can use to add parallelism to your code.

THE ARRIVAL OF PARALLELISM

Parallelism is not new; indeed, parallel computer architectures were available in the 1950s. What is new is that parallelism is ubiquitous, available to everyone, and now in every computer.

The Power Density Race

Over recent decades, computer CPUs have become faster and more powerful; the clock speed of CPUs doubled almost every 18 months. This rise in speed led to a dramatic rise in the power density. Figure 1-1 shows the power density of different generations of processors.

Power density is a measure of how much heat the CPU generates; that heat is usually dissipated by a heat sink and cooling system. If the trend of the 1990s were to continue into the twenty-first century, the heat needing to be dissipated would be comparable to that of the surface of the sun; we would be at meltdown! A tongue-in-cheek cartoon competition appeared on an x86 user-forum website in the early 1990s. The challenge was to design an alternative use of the Intel Pentium Processor. The winner suggested a high-tech oven hot plate design using four CPUs side-by-side.

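[Figure 1-1 plots the power density of successive processor generations (8080, 8085, 8086, 286, 386, 486, Pentium processors) against reference levels: hot plate, nuclear reactor, rocket nozzle, and the Sun’s surface.]

FIGURE 1-1: The power density race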

Increasing CPU clock speed to get better software performance is well established. Computer game players use overclocking to get their games running faster. Overclocking involves increasing the CPU clock speed so that instructions are executed faster. Processors often are run at speeds above what the manufacturer specifies. One downside to overclocking is that it produces extra heat, which needs dissipating. Increasing the speed of a CPU by just a fraction can result in a chip that runs much hotter. So, for example, increasing a CPU clock speed by just over 20 percent causes the power consumption to be almost doubled.
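The “almost doubled” figure is consistent with a common first-order model, sketched here under stated assumptions: dynamic power is taken to be proportional to capacitance × voltage² × frequency, and supply voltage is taken to rise roughly in step with frequency, so power grows with the cube of clock speed. The cubic exponent and the function names are illustrative assumptions, not values taken from the text.

```python
"""First-order sketch of how CPU power scales with clock speed.

Assumption (not from the text): dynamic power ~ C * V^2 * f, with the supply
voltage V scaled in proportion to the frequency f, so power ~ f^3.
"""


def relative_power(clock_ratio: float, exponent: float = 3.0) -> float:
    """Return power relative to the baseline for a given relative clock speed."""
    if clock_ratio <= 0:
        raise ValueError("clock_ratio must be positive")
    return clock_ratio ** exponent


def clock_ratio_for_power(power_ratio: float, exponent: float = 3.0) -> float:
    """Invert the model: the relative clock speed that yields a given power ratio."""
    if power_ratio <= 0:
        raise ValueError("power_ratio must be positive")
    return power_ratio ** (1.0 / exponent)


if __name__ == "__main__":
    # Overclocking by exactly 20 percent under the cubic assumption.
    print(f"1.20x clock -> {relative_power(1.20):.2f}x power")
    # Clock increase at which power doubles under the same assumption.
    print(f"2.00x power at {clock_ratio_for_power(2.0):.2f}x clock")
```

Under this assumed model, a 20 percent overclock costs about 1.73 times the power, and power doubles at roughly a 26 percent overclock, which fits the “just over 20 percent” wording.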

Increasing clock speed was an important tool for the silicon manufacturer. Many of the performance claims and marketing messages were based purely on the clock speed. Intel and AMD typically were leapfrogging over each other to produce faster and faster chips, all of great benefit to the computer user. Eventually, as the physical limitations of the silicon were reached, further increases in CPU speed gave diminishing returns.

Even though the speed of the CPU is no longer growing rapidly, the number of transistors used in CPU design is still growing, with the new transistors used to supply added functionality and performance. Most of the recent performance gains in CPUs are because of improved connections to external memory, improved transistor design, extra parallel execution units, wider data registers and buses, and placing multiple cores on one die. The 3D-transistor, announced in May 2011, which exhibits reduced current leakage and improved switching times while lowering power consumption, will contribute to future microarchitecture improvements.

The Emergence of Multi-Core and Many-Core Computing

Hidden in the power density race is the secret to why multi-core CPUs have become today’s solution to the limits on performance.

Rather than overclocking a CPU, if it were underclocked by 20 percent, the power consumption would be almost half the original value. By putting two of these underclocked CPUs on the same die, you get a total performance improvement of more than 70 percent, with a power consumption being about the same as the original single-core processor. The first multi-core devices consisted of two underclocked CPUs on the same chip. Reducing power consumption is one of the key ingredients to the successful design of multi-core devices.
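The dual-core trade-off can be checked with back-of-the-envelope arithmetic. The sketch below assumes a rule-of-thumb model in which power is proportional to the cube of clock speed; that model, the 1.7 target used to represent a 70 percent improvement, and the function names are assumptions for illustration only.

```python
"""Back-of-the-envelope check of the two-underclocked-cores trade-off.

Assumption (not from the text): power scales with the cube of clock speed.
"""


def relative_power(clock_ratio: float, cores: int = 1) -> float:
    """Total power relative to one core at full clock, assuming power ~ f^3."""
    return cores * clock_ratio ** 3


def required_per_core_performance(total_gain: float, cores: int) -> float:
    """Per-core performance (relative to baseline) needed to reach total_gain,
    assuming the workload divides perfectly across the cores."""
    return total_gain / cores


if __name__ == "__main__":
    single = relative_power(0.8)          # one core underclocked by 20 percent
    dual = relative_power(0.8, cores=2)   # two such cores on one die
    print(f"one underclocked core:  {single:.3f}x power")
    print(f"two underclocked cores: {dual:.3f}x power")
    # A 70 percent overall improvement means a total of 1.7x the baseline.
    print(f"per-core performance needed: {required_per_core_performance(1.7, 2):.2f}x")
```

With these assumptions, one underclocked core draws about 0.512 times the original power and two draw about 1.024 times, which is roughly the original budget. A gain of more than 70 percent then implies that each core keeps more than 85 percent of its original throughput, assuming the work splits evenly.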

Gordon E. Moore observed that the number of transistors that can be placed on integrated circuits doubles about every two years, an observation famously referred to as Moore’s Law. Today, those transistors are being used to add additional cores. The current trend is that the number of cores in a CPU is doubling about every 18 months. Future devices are likely to have dozens of cores and are referred to as being many-core.
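To get a feel for what these doubling periods imply, the following minimal sketch projects a quantity that doubles at a fixed interval. The six-core starting point and the time horizons are hypothetical inputs chosen for illustration, not forecasts from the text.

```python
"""Project a quantity that doubles at a fixed period (illustrative only)."""


def project_doubling(start: float, years: float, doubling_period_years: float) -> float:
    """Return start * 2 ** (years / doubling_period_years)."""
    if doubling_period_years <= 0:
        raise ValueError("doubling_period_years must be positive")
    return start * 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Hypothetical starting point: a six-core CPU, cores doubling every 18 months.
    for years in (0, 3, 6):
        cores = project_doubling(6, years, doubling_period_years=1.5)
        print(f"after {years} years: about {cores:.0f} cores")
    # Transistor count doubling every two years grows 32-fold in a decade.
    print(f"transistor growth over 10 years: {project_doubling(1, 10, 2.0):.0f}x")
```

Starting from six cores, an 18-month doubling period gives 24 cores after three years and 96 after six, which is the “dozens of cores” territory the text describes.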

It is already possible to buy a regular PC machine that supports many hardware threads. For example, the workstation used to test some of the example programs in this book can support 24 parallel execution paths by having:

• A two-socket motherboard

• Six-core Xeon CPUs

• Hyper-threading, in which some of the internal electronics of the core are duplicated to double the number of hardware threads that can be supported
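The count of 24 follows from multiplying the three factors in the list above. A minimal sketch, with a hypothetical helper name, that also asks the current machine how many logical CPUs it reports:

```python
"""Count hardware threads for a machine description and for the current host."""
import os


def hardware_threads(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Parallel execution paths = sockets * cores per socket * threads per core."""
    if min(sockets, cores_per_socket, threads_per_core) < 1:
        raise ValueError("all counts must be at least 1")
    return sockets * cores_per_socket * threads_per_core


if __name__ == "__main__":
    # The workstation described above: two sockets, six cores each, Hyper-Threading.
    print("described workstation:", hardware_threads(2, 6, 2))
    # os.cpu_count() reports the logical CPUs of the machine running this script
    # (it may return None if the count cannot be determined).
    print("this machine:", os.cpu_count())
```

Without Hyper-Threading the same machine would expose 12 hardware threads, one per physical core.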

One of Intel’s first many-core devices was the Intel Teraflop Research Chip. The processor, which came out of the Intel research facilities, had 80 cores and could do one teraflop, which is one trillion floating-point calculations per second. In 2007, this device was demonstrated to the public. As shown in Figure 1-2, the heat sink is quite small, an indication that despite its huge processing capability, it is energy efficient.

FIGURE 1-2: The 80-core Teraflop Research Chip

There is a huge difference in power consumption between the lower and higher clock speeds; Table 1-1 provides sample values. With a one-teraflop performance (1 × 10^12 floating-point calculations per second), 62 watts of power is used; to get 1.81 teraflops of performance, the power consumption is four times larger.

TABLE 1-1: Power-to-Performance Relationship of the Teraflop Research Chip

SPEED (GHZ) | POWER (WATTS) | PERFORMANCE (TERAFLOPS)
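The two operating points quoted in the text can be turned into an energy-efficiency comparison. The sketch below uses only those figures and reads “four times larger” as exactly 4 × 62 watts, which is an assumption; the function name is hypothetical.

```python
"""Compare energy efficiency at the two operating points quoted in the text."""


def gflops_per_watt(teraflops: float, watts: float) -> float:
    """Convert a (teraflops, watts) operating point into gigaflops per watt."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return teraflops * 1000.0 / watts


if __name__ == "__main__":
    low_power_watts = 62.0                    # quoted for 1 teraflop
    high_power_watts = 4 * low_power_watts    # "four times larger" for 1.81 teraflops
    low = gflops_per_watt(1.0, low_power_watts)
    high = gflops_per_watt(1.81, high_power_watts)
    print(f"1.00 teraflops at {low_power_watts:.0f} W: {low:.1f} GFLOPS/W")
    print(f"1.81 teraflops at {high_power_watts:.0f} W: {high:.1f} GFLOPS/W")
    print(f"efficiency ratio: {low / high:.2f}")
```

On these numbers the chip delivers about 16.1 gigaflops per watt at the lower operating point and about 7.3 at the higher one, so the last 81 percent of performance costs more than half of the energy efficiency.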

The Intel Many Integrated Core Architecture (MIC) captures the essentials of Intel’s current many-core strategy (see Figure 1-3). The many cores are connected together on an internal network. A 32-core preproduction version of such devices is already available.

[Figure 1-3 is a block diagram in which each block is labeled “COHERENT CACHE”.]

FIGURE 1-3: Intel’s many-core architecture

Many programmers are still operating with a single-core computing mind-set and have not taken up the opportunities that multi-core programming brings.

For some programmers, the divide between what is available in hardware and what the software is doing is closing; for others, the gap is getting bigger.

Adding parallelism to programs requires new skills, knowledge, and the appropriate software development tools. This book introduces Intel Parallel Studio XE, a software suite that helps the C/C++ and Fortran programmer to transition from serial programmer to parallel programmer. Parallel Studio XE is designed to help the programmer in all phases of the development of parallel code.

The challenge (and opportunity) for the developer is knowing how to reap the rewards of improved performance through parallelism.

