PARALLEL PROGRAMMING
FOREWORD
INTRODUCTION
PART I AN INTRODUCTION TO PARALLELISM
CHAPTER 1 Parallelism Today
CHAPTER 2 An Overview of Parallel Studio XE
CHAPTER 3 Parallel Studio XE for the Impatient
PART II USING PARALLEL STUDIO XE
CHAPTER 4 Producing Optimized Code
CHAPTER 5 Writing Secure Code
CHAPTER 6 Where to Parallelize
CHAPTER 7 Implementing Parallelism
CHAPTER 8 Checking for Errors
CHAPTER 9 Tuning Parallel Applications
CHAPTER 10 Parallel Advisor–Driven Design
CHAPTER 11 Debugging Parallel Applications
CHAPTER 12 Event-Based Analysis with VTune Amplifier XE
PART III CASE STUDIES
CHAPTER 13 The World’s First Sudoku “Thirty-Niner”
CHAPTER 14 Nine Tips to Parallel-Programming Heaven
CHAPTER 15 Parallel Track Fitting in the CERN Collider
CHAPTER 16 Parallelizing Legacy Code
INDEX
Stephen Blair-Chappell Andrew Stokes
Copyright © 2012 by John Wiley & Sons, Inc., Indianapolis, Indiana
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means,
electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108
of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization
through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers,
MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the
Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201)
748-6008, or online at http://www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with
respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including
without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or
promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold
with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services.
If professional assistance is required, the services of a competent professional person should be sought. Neither the
publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Web site is referred to
in this work as a citation and/or a potential source of further information does not mean that the author or the publisher
endorses the information the organization or Web site may provide or recommendations it may make. Further, readers
should be aware that Internet Web sites listed in this work may have changed or disappeared between when this work was
written and when it is read.
For general information on our other products and services please contact our Customer Care Department within the
United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with
standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such
as a CD or DVD that is not included in the version you purchased, you may download this material at
http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.
Library of Congress Control Number: 2011945570
Trademarks: Wiley, the Wiley logo, Wrox, the Wrox logo, Programmer to Programmer, and related trade dress are
trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries,
and may not be used without written permission. Intel is a registered trademark of Intel Corporation. All other
trademarks are the property of their respective owners. John Wiley & Sons, Inc., is not associated with any product or vendor
mentioned in this book.
ABOUT THE AUTHORS
STEPHEN BLAIR-CHAPPELL has been working for Intel in the Software and Services Group (SSG) for the past 15 years. During his time with Intel, Stephen has worked on the compiler team as a developer and, more recently, as a technical consulting engineer helping users make the best use of the Intel software tools. Prior to working with Intel, Stephen was managing director of the UK office of CAD-UL, a German-based compiler and debugger company. During his time at CAD-UL, Stephen was primarily responsible for technical support in the UK. Projects he worked on during that time included the design and specification of a graphical linker; the development and teaching of protected mode programming courses to programmers; and support to many varied companies in the telecoms, automotive, and embedded industries.
Stephen first studied electronics as a technician at Matthew Boulton Technical College, and later studied Applied Software Engineering at Birmingham City University (BCU), where he also eventually taught. Outside work, Stephen is a regular contributor to the life of his local church, St Martin in the Bull Ring, Birmingham, where he plays the organ, preaches, and leads the occasional service.
ANDREW STOKES is a retired lecturer in software and electronics at Birmingham City University (BCU), UK. Prior to lecturing, Andrew was a software developer in the research and commercial fields. He first started software development in the 1980s at Cambridge University Engineering Laboratory, where he worked on software for scanning electron microscopes. These software developments continued in the commercial field, where he worked on graphical programs in support of a Finite Element Analysis package.
During his time at BCU, Andrew developed many software simulation tools, including programs for artificial neural network simulation, CPU simulation, processor design, code development tools, and a PROLOG expert system. Andrew continues these software interests during retirement, with a healthy interest in games programming, such as 3-D chess, where parallel programming is paramount. Away from computing, Andrew is a keen gardener and particularly likes the vibrant colors of the typical English garden.
ABOUT THE TECHNICAL EDITORS
KITTUR GANESH is a Senior Technical Consulting Engineer at Intel, providing consulting, support, and training for more than 7 years on various software products targeting Intel architecture. Previously, for more than 6 years at Intel, Kittur designed and developed software primarily used for fracturing design data of Intel chips. Prior to joining Intel more than 13 years ago, Kittur was involved in developing commercial software in the EDA industry for more than 10 years. Kittur has an M.S. (Computer Science), an M.S. (Industrial Engineering), and a B.S. (Mechanical Engineering).
PABLO HALPERN is a Senior Software Engineer at Intel Corporation, working in the parallel runtime libraries group. He is a member of the C++ Standards Committee and helped produce the recent C++11 revision of the standard. Pablo is the author of the well-received book, The C++ Standard Library from Scratch, and a coauthor of the paper, Reducers and Other Cilk++ Hyperobjects, which was named best paper at ACM SPAA in 2009. He has more than three decades of experience in the software industry, with expertise in C++, language and compiler design, large-scale development and testing, and network management protocols. During this time, he has developed and taught both beginning and advanced courses on C++ programming. He currently lives in New Hampshire with his wife and two children.
Mary Beth Wakefield
FREELANCER EDITORIAL MANAGER
in the graphics department, for your work on all the figures.
We’d like to say a special “thank you” to those who have contributed to the book, especially to Mark Davis, who wrote Chapter 10, “Parallel Advisor–Driven Design,” and to Fred Tedeschi, who wrote Chapter 11, “Debugging Parallel Applications.”
We also want to thank those who allowed us to write about their experiences in the case studies.
Thanks to Lars Peters Endresen and Håvard Graff for their work in Chapter 13, “The World’s First Sudoku ‘Thirty-Niner’”; to Dr. Yann Golanski, for his input into Chapter 14, “Nine Tips to Parallel-Programming Heaven”; and to Hans Pabst, for his help with Chapter 15, “Parallel Track Fitting in the CERN Collider.”
Our appreciation also goes to the many colleagues from Intel who tirelessly reviewed different chapters, in particular Levent Akyil, Bernth Andersson, Julian Horn, Martyn Corden, Maxym Dmytrychenko, Max Domeika, Hubert Haberstock, Markus Metzger, Mark Sabahi, and Thomas Zipplies.
Finally, thanks to James Reinders, who encouraged the writing of this book and has been kind enough to provide the Foreword.
— Stephen Blair-Chappell
Andrew Stokes
FOREWORD
INTRODUCTION
PART I: AN INTRODUCTION TO PARALLELISM
CHAPTER 1: PARALLELISM TODAY
The Emergence of Multi-Core and Many-Core Computing
Tools
Education
Maintainability
Parallelism and the Programmer
Examples of Mixing and Matching Parallel Constructs
Speedup and Scalability
Summary
CHAPTER 2: AN OVERVIEW OF PARALLEL STUDIO XE
What’s in Parallel Studio XE?
OpenMP
Static Security Analysis
Different Approaches to Using Parallel Studio XE
Summary
CHAPTER 3: PARALLEL STUDIO XE FOR THE IMPATIENT
Example 1: Working with Cilk Plus
Using Intel Parallel Amplifier XE for Hotspot Analysis
Example 2: Working with OpenMP
Summary
PART II: USING PARALLEL STUDIO XE
CHAPTER 4: PRODUCING OPTIMIZED CODE
Introduction
Optimizing Code in Seven Steps
Using the General Options on the Example Application
Generating Optimization Reports Using /Qopt-report
Determining That Auto-Vectorization Has Happened
Adding Interprocedural Optimization to the Example Application
The Impact of Interprocedural Optimization on Auto-Vectorization
Building Applications to Run on More Than One Type of CPU
Manual CPU Dispatch: Rolling Your Own CPU-Specific Code
Summary
CHAPTER 5: WRITING SECURE CODE
A Simple Security Flaw Example
Understanding Static Security Analysis
Creating a Build Specification File by Injection
Using Static Security Analysis in a QA Environment
Summary
CHAPTER 6: WHERE TO PARALLELIZE
Different Ways of Profiling
Hotspot Analysis Using the Intel Compiler
Hotspot Analysis Using the Auto-Parallelizer
Hotspot Analysis with Amplifier XE
Summary
CHAPTER 7: IMPLEMENTING PARALLELISM
C or C++, That Is the Question
The Beauty of Lambda Functions
Parallelizing Sections and Functions
OpenMP
TBB
Parallelizing Recursive Functions
OpenMP
TBB
Parallelizing Pipelined Applications
OpenMP
TBB
Summary
CHAPTER 8: CHECKING FOR ERRORS
Parallel Inspector XE Analysis Types
Controlling the Right Level of Detail
Avoiding Being Overwhelmed by the Amount of Data
CHAPTER 9: TUNING PARALLEL APPLICATIONS
Measuring the Baseline Using the Amplifier XE Command Line
Identifying Concurrency Hotspots
Conducting Further Analysis and Tuning
Using the Intel Software Autotuning Tool
Summary
CHAPTER 10: PARALLEL ADVISOR–DRIVEN DESIGN
Getting Started with the NQueens Example Program
Summary
CHAPTER 11: DEBUGGING PARALLEL APPLICATIONS
Introduction to the Intel Debugger
Using the Intel Debugger to Detect Data Races
Observing the Results
Using Suppression Filters to Discard Unwanted Events
Using Focus Filters to Examine a Selected Portion of Code
Runtime Investigation: Viewing the State of Your Application
Using the OpenMP Tasks Window to Investigate Variables
Using the OpenMP Spawn Tree Window to View the Behavior
Summary
CHAPTER 12: EVENT-BASED ANALYSIS WITH VTUNE AMPLIFIER XE
Testing the Health of an Application
Is CPI on Its Own a Good Enough Measure of Health?
Conducting a Hotspot Analysis
User Mode Hotspots Versus Lightweight Hotspots
Conducting a General Exploration Analysis
Using Amplifier XE’s Other Tools
Using Amplifier XE from the Command Line
Summary
PART III: CASE STUDIES
CHAPTER 13: THE WORLD’S FIRST SUDOKU “THIRTY-NINER”
The Sudoku Optimization Challenge
Hands-On Example: Optimizing the Sudoku Generator
Adding Parallelism to the Generator Using OpenMP
Summary
CHAPTER 14: NINE TIPS TO PARALLEL-PROGRAMMING HEAVEN
The Challenge: Simulating Star Formation
Identifying the Hotspot and Discovering the Calling Sequence
Implementing Parallelism
Summary
CHAPTER 15: PARALLEL TRACK FITTING IN THE CERN COLLIDER
The Stages of a High-Energy Physics Experiment
Configuring the Array Building Blocks Build Environment
Summary
CHAPTER 16: PARALLELIZING LEGACY CODE
Introducing the Dhrystone Benchmark
Adding Amplifier XE APIs to Timestamp the Dhrystone Loop
Attempt One: Synchronizing Shared Variable Access
Initializing and Accessing the Global Variables
Parallelizing the C++ Version
Attempt Three: Wrapping the Application in a C++ Class
FOREWORD
Learning from real examples can filter theoretical distractions and inject less glamorous realities. Real experiences and examples help us to see what matters the most.
In this book, I am pleased that Stephen shares tips from his interviews to understand how to really use tools and develop parallel code. The result is a book with value that is not apparent from simply browsing the table of contents.
For instance, I know data layout critically affects the ability to process data in parallel, but I like to be convinced by real examples. The topic of data layouts, such as the need to use “structures of arrays” instead of “arrays of structures” (SOA vs. AOS), is brought to the forefront by Stephen asking the provocative question, “If you were doing the project again, is there anything you would do differently?” in the “Parallel Track Fitting in the CERN Collider” interview (Chapter 15). In response, the interviewed developer highlights the importance of data models to getting effective parallel programs. “The World’s First Sudoku ‘Thirty-Niner’” (Chapter 13) highlights that “much of the time taken was used in reworking the code so that there was less need to share data between the different running tasks.”
The ubiquitous nature of parallelism affects every aspect of programming today. I’m encouraged by Stephen’s work, which walks through each aspect instead of just coding. Covering the issues of discovery, debugging, and tuning is critical to understanding the challenges of parallel programming.
I hope this book is an inspiration to all who read it.
“Think Parallel.”
—James Reinders
Director, Parallel Evangelist, Intel
Portland, Oregon, March 2012
INTRODUCTION
Nearly all the computers sold today have a multi-core processor, but only a small number of applications are written to take advantage of the extra cores. Most programmers are playing catch-up. A recent consultation with a group of senior programming engineers revealed the top three hurdles in adopting parallelism: the challenges of porting legacy code, the lack of education, and the lack of the right kinds of programming tools. This book helps to address some of these hurdles.
This book was written to help you use Intel Parallel Studio XE to write programs that use the latest features of multi-core CPUs. With the help of this book, you should be able to produce code that is fast, safe, and parallel. In addition to helping you write parallel code, some chapters cover other optimization topics that you can use in your code development, regardless of whether or not you are developing parallel code. Most of the chapters include hands-on activities that will help you apply the techniques being explained.
WHO THIS BOOK IS FOR
If you are writing parallel code or are interested in writing parallel code, this book is for you. The target audience includes:
• C and C++ developers who are adding parallelism to their code. The required technical skill is “average” to “experienced.” Knowledge of C programming is a prerequisite.
• Students and academics who are looking to gain practical experience in making code parallel.
• Owners and users of Intel Parallel Studio XE.
WHAT THIS BOOK COVERS
This book, written using Parallel Studio XE 2011, shows how you can profile, optimize, and parallelize your code. By reading this book, you will learn how to:
• Analyze applications to determine the best place to implement parallelism
• Implement parallelism using a number of language extensions/standards
• Detect and correct difficult-to-find parallel errors
• Tune parallel programs
• Write code that is more secure
• Use the compiler switches to create optimized code that takes advantage of the latest CPU extensions
• Perform an architectural analysis to answer the question, “Is my program making the best use of the CPU?”
HOW THIS BOOK IS STRUCTURED
The book consists of the following parts:
• Part I: An Introduction to Parallelism
• Part II: Using Parallel Studio XE
• Part III: Case Studies
Every chapter in the book, with the exception of the first two chapters, offers hands-on activities. These activities are an important part of the book, although you can read the book without completing them.
Chapters 6–9 are intended to be used in sequence, showing how to add parallelism to your code using a well-tested, four-step methodology (analyze, implement, error-check, and tune). Examples of parallelism are provided using Cilk Plus, OpenMP, and Threading Building Blocks.
The case studies are based on larger projects and show how Parallel Studio XE was used to parallelize them.
WHAT YOU NEED TO USE THIS BOOK
You need the following to use this book:
• Intel Parallel Studio XE. You can download an evaluation version from the Intel Software Evaluation Center (http://software.intel.com/en-us/articles/intel-software-evaluation-center/).
• If you are using Windows:
  • Visual Studio (not the Express edition) version 2005, 2008, or 2010
  • Windows XP, Windows 2008, or Windows 7
• If you are using Linux:
  • An installation of the GNU GCC compiler development tools
  • Debian* 6.0; Red Hat Enterprise Linux* 4 (Deprecated), 5, 6; SUSE Linux Enterprise Server* 10, 11 SP1; or Ubuntu* 10.04
• A PC based on an IA-32 or Intel 64 architecture processor supporting the Intel Streaming SIMD Extensions 2 (Intel SSE2) instructions (Intel Pentium 4 processor or later), or compatible non-Intel processor. If you use a non-Intel processor, you will not be able to carry out the activities in Chapter 12, “Event-Based Analysis with VTune Amplifier XE.”
As for styles in the text:
• We italicize new terms and important words when we introduce them.
• We show keyboard strokes like this: Ctrl+A.
• We show filenames, URLs, and code within the text like so: persistence.properties.
• We present code in two different ways:
  We use a monofont type with no highlighting for most code examples.
  We use bold to emphasize code that is particularly important in the present context or to show changes from a previous code snippet.
Listings include the filename in the title. If it is just a code snippet, you’ll find the filename in a code note such as this:
Code snippet filename
Because many books have similar titles, you may find it easiest to search by ISBN; this book’s ISBN is 978-0-470-89165-0.
Once you download the code, just decompress it with your favorite compression tool. Alternatively, you can go to the main Wrox code download page at www.wrox.com/dynamic/books/download.aspx to see the code available for this book and all other Wrox books.
ERRATA
We make every effort to ensure that there are no errors in the text or in the code. However, no one is perfect, and mistakes do occur. If you find an error in one of our books, like a spelling mistake or faulty piece of code, we would be very grateful for your feedback. By sending in errata, you may save another reader hours of frustration, and at the same time, you will be helping us provide even higher-quality information.
To find the errata page for this book, go to www.wrox.com and locate the title using the Search box or one of the title lists. Then, on the book details page, click the Book Errata link. On this page, you can view all errata that have been submitted for this book and posted by Wrox editors. A complete book list, including links to each book’s errata, is also available at www.wrox.com/misc-pages/booklist.shtml.
If you don’t spot “your” error on the Book Errata page, go to www.wrox.com/contact/techsupport.shtml and complete the form there to send us the error you have found. We’ll check the information and, if appropriate, post a message to the book’s errata page and fix the problem in subsequent editions of the book.
P2P.WROX.COM
For author and peer discussion, join the P2P forums at p2p.wrox.com. The forums are a web-based system for you to post messages relating to Wrox books and related technologies and interact with other readers and technology users. The forums offer a subscription feature to e-mail you topics of interest of your choosing when new posts are made to the forums. Wrox authors, editors, other industry experts, and your fellow readers are present on these forums.
At http://p2p.wrox.com, you will find a number of different forums that will help you, not only as you read this book, but also as you develop your own applications. To join the forums, just follow these steps:
1. Go to p2p.wrox.com and click the Register link.
2. Read the terms of use and click Agree.
3. Complete the required information to join, as well as any optional information you wish to provide, and click Submit.
4. You will receive an e-mail with information describing how to verify your account and complete the joining process.
You can read messages in the forums without joining P2P, but in order to post your own messages, you must join.
Once you join, you can post new messages and respond to messages other users post. You can read messages at any time on the web. If you would like to have new messages from a particular forum e-mailed to you, click the Subscribe to this Forum icon by the forum name in the forum listing.
For more information about how to use the Wrox P2P, be sure to read the P2P FAQs for answers to questions about how the forum software works, as well as many common questions specific to P2P and Wrox books. To read the FAQs, click the FAQ link on any P2P page.
PART I
An Introduction to Parallelism
CHAPTER 1: Parallelism Today
CHAPTER 2: An Overview of Parallel Studio XE
CHAPTER 3: Parallel Studio XE for the Impatient
Parallelism Today
WHAT’S IN THIS CHAPTER?
• How parallelism arrived and why parallel programming is feared
• Different parallel models that you can use, along with some potential pitfalls this new type of programming introduces
• How to predict the behavior of parallel programs
The introduction of multi-core processors brings a new set of challenges for the programmer. After a brief discussion on the power density race, this chapter looks at the top six parallel programming challenges. Finally, the chapter presents a number of different programming models that you can use to add parallelism to your code.
THE ARRIVAL OF PARALLELISM
Parallelism is not new; indeed, parallel computer architectures were available in the 1950s. What is new is that parallelism is ubiquitous, available to everyone, and now in every computer.
The Power Density Race
Over recent decades, computer CPUs have become faster and more powerful; the clock speed of CPUs doubled almost every 18 months. This rise in speed led to a dramatic rise in the power density. Figure 1-1 shows the power density of different generations of processors.
Power density is a measure of how much heat the CPU generates per unit area; that heat is usually dissipated by a heat sink and cooling system. If the trend of the 1990s were to continue into the twenty-first century, the heat needing to be dissipated would be comparable to that of the surface of the sun; we would be at meltdown! A tongue-in-cheek cartoon competition appeared on an x86 user-forum website in the early 1990s. The challenge was to design an alternative use of the Intel Pentium Processor. The winner suggested a high-tech oven hot plate design using four CPUs side-by-side.
FIGURE 1-1: The power density race (processor generations plotted: 8080, 8085, 8086, 286, 386, 486, Pentium processors; reference levels: hot plate, nuclear reactor, rocket nozzle, Sun’s surface)
Increasing CPU clock speed to get better software performance is well established. Computer game players use overclocking to get their games running faster. Overclocking involves increasing the CPU clock speed so that instructions are executed faster. Processors often are run at speeds above what the manufacturer specifies. One downside to overclocking is that it produces extra heat, which needs dissipating. Increasing the speed of a CPU by just a fraction can result in a chip that runs much hotter. So, for example, increasing a CPU clock speed by just over 20 percent causes the power consumption to be almost doubled.
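The "almost doubled" figure is broadly consistent with a common first-order model of CMOS dynamic power. The model is not stated in the text and is offered here only as a sketch: it assumes the supply voltage V has to rise roughly in proportion to the clock frequency f, treats the activity factor α and the switched capacitance C as constants, and ignores leakage power.

```latex
\[
P_{\text{dyn}} \approx \alpha\, C\, V^{2} f,
\qquad V \propto f
\;\Longrightarrow\;
P_{\text{dyn}} \propto f^{3}
\]
\[
\frac{P_{\text{overclocked}}}{P_{\text{nominal}}} \approx (1.2)^{3} = 1.728,
\qquad
(1.26)^{3} \approx 2.0
\]
```

Under these assumptions a 20 percent clock increase raises power by about 73 percent, and an increase of about 26 percent doubles it.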
Increasing clock speed was an important tool for the silicon manufacturer. Many of the performance claims and marketing messages were based purely on the clock speed. Intel and AMD typically were leapfrogging over each other to produce faster and faster chips, all of great benefit to the computer user. Eventually, as the physical limitations of the silicon were reached, further increases in CPU speed gave diminishing returns.
Even though the speed of the CPU is no longer growing rapidly, the number of transistors used in CPU design is still growing, with the new transistors used to supply added functionality and performance. Most of the recent performance gains in CPUs are because of improved connections to external memory, improved transistor design, extra parallel execution units, wider data registers and buses, and placing multiple cores on one die. The 3D-transistor, announced in May 2011, which exhibits reduced current leakage and improved switching times while lowering power consumption, will contribute to future microarchitecture improvements.
The Emergence of Multi-Core and Many-Core Computing
Hidden in the power density race is the secret to why multi-core CPUs have become today’s solution to the limits on performance.
If, rather than being overclocked, a CPU were underclocked by 20 percent, the power consumption would be almost half the original value. By putting two of these underclocked CPUs on the same die, you get a total performance improvement of more than 70 percent, with power consumption about the same as that of the original single-core processor. The first multi-core devices consisted of two underclocked CPUs on the same chip. Reducing power consumption is one of the key ingredients to the successful design of multi-core devices.
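The trade-off described above can be sketched numerically. The following is a rough illustration, not a model taken from the text: it assumes that power scales with the cube of the clock speed, that two cores scale perfectly, and that a core underclocked by 20 percent retains 87 percent of its performance. That last value (`PER_CORE_PERF`) is a hypothetical figure chosen only because it is consistent with the "more than 70 percent" claim.

```python
def relative_power(clock_scale: float, cores: int = 1) -> float:
    """Power relative to one core at nominal clock, assuming P ~ f**3."""
    return cores * clock_scale ** 3


def relative_throughput(per_core_perf: float, cores: int = 1) -> float:
    """Throughput relative to one nominal core, assuming perfect scaling."""
    return cores * per_core_perf


# Assumed inputs (illustrative only, not measured values).
UNDERCLOCK = 0.8       # clock reduced by 20 percent
PER_CORE_PERF = 0.87   # hypothetical performance kept by each slower core

single_power = relative_power(UNDERCLOCK)                 # about 0.51
dual_power = relative_power(UNDERCLOCK, cores=2)          # about 1.02
dual_perf = relative_throughput(PER_CORE_PERF, cores=2)   # about 1.74

print(f"one underclocked core : {single_power:.2f}x power")
print(f"two underclocked cores: {dual_power:.2f}x power, {dual_perf:.2f}x throughput")
```

With these inputs, one slower core draws roughly half the power, and two of them deliver about 1.7 times the throughput for roughly the original power budget. Real workloads rarely scale perfectly across cores, so the throughput figure is an upper bound under the stated assumptions.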
Gordon E. Moore observed that the number of transistors that can be placed on integrated circuits doubles about every two years, an observation famously referred to as Moore’s Law. Today, those transistors are being used to add additional cores. The current trend is that the number of cores in a CPU is doubling about every 18 months. Future devices are likely to have dozens of cores and are referred to as being many-core.
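Both doubling rates are instances of the same exponential growth rule, count = initial × 2^(years / doubling period). The sketch below applies that rule to a hypothetical starting point of a 4-core CPU; the starting value and the 6-year horizon are illustrative assumptions, not figures from the text.

```python
def projected_count(initial: float, years: float, doubling_period_years: float) -> float:
    """Exponential growth: the count doubles once per doubling period."""
    return initial * 2 ** (years / doubling_period_years)


# Hypothetical example: start from 4 cores and look 6 years ahead.
cores = projected_count(initial=4, years=6, doubling_period_years=1.5)

# Transistor count relative to today after the same 6 years (two-year doubling).
transistor_factor = projected_count(initial=1, years=6, doubling_period_years=2)

print(cores)              # 4 * 2**4
print(transistor_factor)  # 2**3
```

Over the same six years the core count grows sixteenfold while the transistor budget grows eightfold, which suggests why a trend like this cannot be extrapolated indefinitely.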
It is already possible to buy a regular PC machine that supports many hardware threads. For example, the workstation used to test some of the example programs in this book can support 24 parallel execution paths by having:
• A two-socket motherboard
• Six-core XEON CPUs
• Hyper-threading, in which some of the internal electronics of the core are duplicated to double the number of hardware threads that can be supported
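The figure of 24 follows directly from multiplying the three items in the list. The short sketch below does that arithmetic and, for comparison, asks the operating system how many logical CPUs it sees on the machine running the code. The function name `hardware_threads` is introduced here for illustration only.

```python
import os


def hardware_threads(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Number of parallel execution paths the hardware exposes."""
    return sockets * cores_per_socket * threads_per_core


# The workstation described above: 2 sockets x 6 cores x 2 hyper-threads.
workstation = hardware_threads(sockets=2, cores_per_socket=6, threads_per_core=2)
print(workstation)  # 24

# Logical CPUs on the current machine; may be None if it cannot be determined.
print(os.cpu_count())
```

The value reported by `os.cpu_count()` counts logical CPUs, so on a machine with hyper-threading enabled it is typically twice the number of physical cores.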
One of Intel’s first many-core devices was the Intel Teraflop Research Chip. The processor, which came out of the Intel research facilities, had 80 cores and could do one teraflop, which is one trillion floating-point calculations per second. In 2007, this device was demonstrated to the public. As shown in Figure 1-2, the heat sink is quite small, an indication that despite its huge processing capability, it is energy efficient.
FIGURE 1-2: The 80-core Teraflop Research Chip
There is a huge difference in power consumption between the lower and higher clock speeds; Table 1-1 provides sample values. With a one-teraflop performance (1 × 10^12 floating-point calculations per second), 62 watts of power is used; to get 1.81 teraflops of performance, the power consumption is four times larger.
TABLE 1-1: Power-to-Performance Relationship of the Teraflop Research Chip
SPEED (GHZ)    POWER (WATTS)    PERFORMANCE (TERAFLOPS)
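The two operating points quoted in the text can be compared in terms of energy efficiency. The sketch below uses only the figures given in the prose (1 teraflop at 62 watts, and 1.81 teraflops at four times that power) and takes "four times" literally, which is an assumption; the helper name `gflops_per_watt` is introduced here for illustration.

```python
def gflops_per_watt(teraflops: float, watts: float) -> float:
    """Energy efficiency in gigaflops per watt."""
    return teraflops * 1000.0 / watts


BASE_WATTS = 62.0  # power at one teraflop, as quoted in the text

low = gflops_per_watt(1.0, BASE_WATTS)        # lower clock speed
high = gflops_per_watt(1.81, 4 * BASE_WATTS)  # higher clock speed, 4x the power

print(f"1.00 teraflops: {low:.1f} GFLOPS per watt")   # 16.1
print(f"1.81 teraflops: {high:.1f} GFLOPS per watt")  # 7.3
print(f"efficiency ratio: {high / low:.2f}")          # 0.45
```

On these figures, an 81 percent gain in performance costs more than half of the chip's energy efficiency.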
The Intel Many Integrated Core Architecture (MIC) captures the essentials of Intel’s current many-core strategy (see Figure 1-3). The many cores are connected together on an internal network. A 32-core preproduction version of such devices is already available.
FIGURE 1-3: Intel’s many-core architecture (diagram labels: multiple COHERENT CACHE blocks)
Many programmers are still operating with a single-core computing mind-set and have not taken up the opportunities that multi-core programming brings.
For some programmers, the divide between what is available in hardware and what the software is doing is closing; for others, the gap is getting bigger.
Adding parallelism to programs requires new skills, knowledge, and the appropriate software development tools. This book introduces Intel Parallel Studio XE, a software suite that helps the C/C++ and Fortran programmer to transition from serial programmer to parallel programmer. Parallel Studio XE is designed to help the programmer in all phases of the development of parallel code.
The challenge (and opportunity) for the developer is knowing how to reap the rewards of improved performance through parallelism.