
Anthony Williams, C++ Concurrency in Action: Practical Multithreading


DOCUMENT INFORMATION

Basic information

Title: Anthony Williams, C++ Concurrency in Action: Practical Multithreading
Publisher: Manning Publications
Field: Computer Science
Type: practical programming guide
Year published: 2009
Pages: 337
File size: 2.09 MB


Content

This is an English-language book on information technology, for students and for anyone with a passion for the field. It presents the theory and programming methods of concurrency in the C++ language.


MEAP Edition Manning Early Access Program

Copyright 2009 Manning Publications

For more information on this and other Manning titles go to

www.manning.com


©Manning Publications Co. Please post comments or corrections to the Author Online forum.

Table of Contents

Chapter One: Introduction

Chapter Two: Managing Threads

Chapter Three: Sharing Data

Chapter Four: Synchronizing Concurrent Operations

Chapter Five: The C++ Memory Model and Operations on Atomic Types

Chapter Six: Designing Data Structures for Concurrency I: Lock-based Data Structures

Chapter Seven: Designing Data Structures for Concurrency II: Lock-free Concurrent Data Structures

Chapter Eight: Designing Concurrent Code

Chapter Nine: High Level Thread Management

Chapter Ten: Testing and Debugging Multi-threaded Applications

Appendix A: New Features of the C++ language used by the thread library


1

Introduction

These are exciting times for C++ users. Eleven years after the original C++ Standard was published in 1998, the C++ Standards committee is giving the language and its supporting library a major overhaul. The new C++ Standard (referred to as C++0x) is due to be published in 2010, and will bring with it a whole swathe of changes that will make working with C++ easier and more productive.

One of the most significant new features in the C++0x Standard is the support of multi-threaded programs. For the first time, the C++ Standard will acknowledge the existence of multi-threaded applications in the language, and provide components in the library for writing multi-threaded applications. This will make it possible to write multi-threaded C++ programs without relying on platform-specific extensions, and thus allow us to write portable multi-threaded code with guaranteed behaviour. It also comes at a time when programmers are increasingly looking to concurrency in general, and multi-threaded programming in particular, in order to improve application performance.

This book is about writing programs in C++ using multiple threads for concurrency, and the C++ language features and library facilities that make that possible. I'll start by explaining what I mean by concurrency and multi-threading, and why you would want to use it in your applications. After a quick detour into why you might not want to use it in your application, I'll give an overview of the concurrency support in C++, and round off this chapter with a simple example of C++ concurrency in action. Readers experienced with developing multi-threaded applications may wish to skip the early sections. In subsequent chapters we'll cover more extensive examples, and look at the library facilities in more depth. The book will finish with an in-depth reference to all the Standard C++ Library facilities for multi-threading and concurrency.

So, what do I mean by concurrency and multi-threading?


1.1 What is Concurrency?

At the simplest and most basic level, concurrency is about two or more separate activities happening at the same time. We encounter concurrency as a natural part of life: we can walk and talk at the same time, or perform different actions with each hand, and of course we each go about our lives independently of each other; you can watch football whilst I go swimming, and so on.

1.1.1 Concurrency in Computer Systems

When we talk about concurrency in terms of computers, we mean a single system performing multiple independent activities in parallel, rather than sequentially one after the other. It is not a new phenomenon: multi-tasking operating systems that allow a single computer to run multiple applications at the same time through task switching have been commonplace for many years, and high-end server machines with multiple processors that enable genuine concurrency have been available for even longer. What is new is the increased prevalence of computers that can genuinely run multiple tasks in parallel, rather than just giving the illusion of doing so.

Historically, most computers have had one processor with a single processing unit or core, and this remains true for many desktop machines today. Such a machine can really only perform one task at a time, but it can switch between tasks many times per second. By doing a bit of one task and then a bit of another and so on, it appears that the tasks are happening concurrently. This is called task switching. We still talk about concurrency with such systems: since the task switches are so fast, you can't tell at which point a task may be suspended as the processor switches to another one. The task switching provides an illusion of concurrency both to the user and to the applications themselves. Since there is only an illusion of concurrency, the behaviour of applications may be subtly different when executing in a single-processor task-switching environment compared to when executing in an environment with true concurrency. In particular, incorrect assumptions about the memory model (covered in chapter 5) may not show up in such an environment. This is discussed in more depth in chapter 10.

Computers containing multiple processors have been used for servers and high-performance computing tasks for a number of years, and now computers based around processors with more than one core on a single chip (multi-core processors) are becoming increasingly common as desktop machines too. Whether they have multiple processors or multiple cores within a processor (or both), these computers are capable of genuinely running more than one task in parallel. We call this hardware concurrency.


Figure 1.1 shows an idealized scenario of a computer with precisely two tasks to do, each divided into ten equally-sized chunks. On a dual-core machine (which thus has two processing cores), each task can execute on its own core. On a single-core machine doing task switching, the chunks from each task are interleaved. However, they are also spaced out a bit (in the diagram this is shown by the grey bars separating the chunks being thicker): in order to do the interleaving, the system has to perform a context switch every time it changes from one task to another, and this takes time. In order to perform a context switch, the OS has to save the CPU state and instruction pointer for the currently running task, work out which task to switch to, and reload the CPU state for the task being switched to. The CPU will then potentially have to load the memory for the instructions and data for the new task into cache, which can prevent the CPU executing any instructions, thus causing further delay.

Figure 1.1 Two approaches to concurrency: parallel execution on a dual-core machine vs. task switching on a single-core machine

Though the availability of concurrency in the hardware is most obvious with multi-processor or multi-core systems, some processors can execute multiple threads on a single core. The important factor to consider is really the number of hardware threads: the measure of how many independent tasks the hardware can genuinely run concurrently. Even with a system that has genuine hardware concurrency, it is easy to have more tasks than the hardware can run in parallel, so task switching is still used in these cases. For example, on a typical desktop computer there may be hundreds of tasks running, performing background operations, even when the computer is nominally idle. It is the task switching that allows these background tasks to run, and allows you to run your word processor, compiler, editor and web browser (or any combination of applications) all at once. Figure 1.2 shows task switching between four tasks on a dual-core machine, again for an idealized scenario with the tasks divided neatly into equal-sized chunks. In practice, there are many issues which will make the divisions uneven and the scheduling irregular. Some of these issues are covered in chapter 8, when we look at factors affecting the performance of concurrent code.


Figure 1.2 Task switching with two cores

All the techniques, functions and classes covered in this book can be used whether your application is running on a machine with one single-core processor or on a machine with many multi-core processors, and are not affected by whether the concurrency is achieved through task switching or by genuine hardware concurrency. However, as you may imagine, how you make use of concurrency in your application may well depend on the amount of hardware concurrency available. This is covered in chapter 8, where I cover the issues involved with designing concurrent code in C++.

1.1.2 Approaches to Concurrency

Imagine for a moment a pair of programmers working together on a software project. If your developers are in separate offices, they can go about their work peacefully, without being disturbed by each other, and they each have their own set of reference manuals. However, communication is not straightforward: rather than just turning round and talking, they have to use the phone or email, or get up and walk. Also, you've got the overhead of two offices to manage, and multiple copies of reference manuals to purchase.

Now imagine that you move your developers into the same office. They can now talk to each other freely to discuss the design of the application, and can easily draw diagrams on paper or on a whiteboard to help with design ideas or explanations. You've now only got one office to manage, and one set of resources will often suffice. On the negative side, they might find it harder to concentrate, and there may be issues with sharing resources ("Where's the reference manual gone now?").

These two ways of organising your developers illustrate the two basic approaches to concurrency. Each developer represents a thread, and each office represents a process. The first approach is to have multiple single-threaded processes, which is similar to having each developer in his own office, and the second approach is to have multiple threads in a single process, which is like having two developers in the same room. You can of course combine these in an arbitrary fashion and have multiple processes, some of which are multi-threaded and some of which are single-threaded, but the principles are the same. Let's now have a brief look at these two approaches to concurrency in an application.


Concurrency with Multiple Processes

The first way to make use of concurrency within an application is to divide the application into multiple, separate, single-threaded processes which are run at the same time, much as you can run your web browser and word processor at the same time. These separate processes can then pass messages to each other through all the normal interprocess communication channels (signals, sockets, files, pipes, and so on), as shown in figure 1.3. One downside is that such communication between processes is often either complicated to set up, slow, or both, since operating systems typically provide a lot of protection between processes to avoid one process accidentally modifying data belonging to another. Another downside is that there is an inherent overhead in running multiple processes: it takes time to start a process, the operating system must devote internal resources to managing the process, and so forth.

Figure 1.3 Communication between a pair of processes running concurrently

Of course, it's not all downside: the added protection operating systems typically provide between processes, and the higher-level communication mechanisms, mean that it can be easier to write safe concurrent code with processes than with threads. Indeed, environments such as that provided for the Erlang programming language use processes as the fundamental building block of concurrency to great effect.

Using separate processes for concurrency also has an additional advantage: you can run the separate processes on distinct machines connected over a network. Though this increases the communication cost, on a carefully designed system it can be a very cost-effective way of increasing the available parallelism and improving performance.


Concurrency with Multiple Threads

The alternative approach to concurrency is to run multiple threads in a single process. Threads are very much like lightweight processes: each thread runs independently of the others, and each thread may run a different sequence of instructions. However, all threads in a process share the same address space, and the majority of data can be accessed directly from all threads: global variables remain global, and pointers or references to objects or data can be passed around between threads. Though it is often possible to share memory between processes, this is more complicated to set up, and often harder to manage, as memory addresses of the same data are not necessarily the same in different processes. Figure 1.4 shows two threads within a process communicating through shared memory.

Figure 1.4 Communication between a pair of threads running concurrently in a single process

The shared address space and lack of protection of data between threads make the overhead associated with using multiple threads much smaller than that from using multiple processes, as the operating system has less book-keeping to do. However, the flexibility of shared memory also comes with a price: if data is accessed by multiple threads, the application programmer must ensure that the view of data seen by each thread is consistent whenever it is accessed. The issues surrounding sharing data between threads, and the tools to use and guidelines to follow to avoid problems, are covered throughout the book, notably in chapters 3, 4, 5 and 8. The problems are not insurmountable, provided suitable care is taken when writing the code, but they do mean that a great deal of thought must go into the communication between threads.

The low overhead associated with launching and communicating between multiple threads within a process, compared to launching and communicating between multiple single-threaded processes, means that this is the favoured approach to concurrency in mainstream languages including C++, despite the potential problems arising from the shared memory. In addition, the C++ standard does not provide any intrinsic support for communication between processes, so applications that use multiple processes will have to rely on platform-specific APIs to do so. This book therefore focuses exclusively on using multi-threading for concurrency, and future references to concurrency assume that this is achieved by using multiple threads.

Having clarified what we mean by concurrency, let's now look at why we would use concurrency in our applications.

1.2 Why Use Concurrency?

There are two main reasons to use concurrency in an application: separation of concerns and performance. In fact, I'd go so far as to say that they are pretty much the only reasons to use concurrency: anything else boils down to one or the other (or maybe even both) when you look hard enough (well, except for reasons like "because I want to").

1.2.1 Using Concurrency for Separation of Concerns

Separation of concerns is almost always a good idea when writing software: by grouping related bits of code together and keeping unrelated bits of code apart, we can make our programs easier to understand and test, and thus less likely to contain bugs. We can use concurrency to separate distinct areas of functionality, even when the operations in these distinct areas need to happen at the same time: without the explicit use of concurrency, we either have to write a task-switching framework or actively make calls to unrelated areas of code during an operation.

Consider a processing-intensive application with a user interface, such as a DVD player application for a desktop computer. Such an application fundamentally has two sets of responsibilities: not only does it have to read the data from the disk, decode the images and sound, and send them to the graphics and sound hardware in a timely fashion so the DVD plays without glitches, but it must also take input from the user, such as when the user clicks "pause" or "return to menu", or even "quit". In a single thread, the application has to check for user input at regular intervals during the playback, thus conflating the DVD playback code with the user interface code. By using multi-threading to separate these concerns, the user interface code and DVD playback code no longer have to be so closely intertwined: one thread can handle the user interface, and another the DVD playback. Of course there will have to be interaction between them, such as when the user clicks "pause", but now these interactions are directly related to the task at hand.

This gives the illusion of responsiveness, as the user interface thread can typically respond immediately to a user request, even if the response is simply to display a "busy" cursor or "please wait" message whilst the request is conveyed to the thread doing the work.


Similarly, separate threads are often used to run tasks which must run continuously in the background, such as monitoring the filesystem for changes in a desktop search application. Using threads in this way generally makes the logic in each thread much simpler, as the interactions between them can be limited to clearly identifiable points, rather than having to intersperse the logic of the different tasks.

In this case, the number of threads is independent of the number of CPU cores available, since the division into threads is based on the conceptual design rather than an attempt to increase throughput.

1.2.2 Using Concurrency for Performance

Multi-processor systems have existed for decades, but until recently they were mostly found only in supercomputers, mainframes and large server systems. However, chip manufacturers have increasingly been favouring multi-core designs with 2, 4, 16 or more processors on a single chip over better performance with a single core. Consequently, multi-core desktop computers, and even multi-core embedded devices, are now increasingly prevalent. The increased computing power of these machines comes not from running a single task faster, but from running multiple tasks in parallel. In the past, programmers have been able to sit back and watch their programs get faster with each new generation of processors, without any effort on their part; but now, as Herb Sutter put it, "The free lunch is over." [Sutter2005] If software is to take advantage of this increased computing power, it must be designed to run multiple tasks concurrently. Programmers must therefore take heed, and those who have hitherto ignored concurrency must now look to add it to their toolbox.

There are two ways to use concurrency for performance. The first, and most obvious, is to divide a single task into parts and run each in parallel, thus reducing the total runtime. This is task parallelism. Though this sounds straightforward, it can be quite a complex process, as there may be many dependencies between the various parts. The divisions may be either in terms of processing (one thread performs one part of the algorithm whilst another thread performs a different part) or in terms of data: each thread performs the same operation on different parts of the data. This latter approach is called data parallelism.

Algorithms which are readily susceptible to such parallelism are frequently called embarrassingly parallel. Despite the implication that you might be embarrassed to have code so easy to parallelize, this is a good thing: other terms I've encountered for such algorithms are naturally parallel and conveniently concurrent. Embarrassingly parallel algorithms have very good scalability properties: as the number of available hardware threads goes up, the parallelism in the algorithm can be increased to match. Such an algorithm is the perfect embodiment of "many hands make light work". For those parts of the algorithm that aren't embarrassingly parallel, you might be able to divide the algorithm into a fixed (and therefore not scalable) number of parallel tasks. Techniques for dividing tasks between threads are covered in chapter 8.

The second way to use concurrency for performance is to use the available parallelism to solve bigger problems: rather than processing one file at a time, process two or ten or twenty, as appropriate. Though this is really just an application of data parallelism, by performing the same operation on multiple sets of data concurrently, there's a different focus. It still takes the same amount of time to process one chunk of data, but now more data can be processed in the same amount of time. Obviously there are limits to this approach too, and it will not be beneficial in all cases, but the increase in throughput that comes from such an approach can actually make new things possible: increased resolution in video processing, for example, if different areas of the picture can be processed in parallel.

1.2.3 When Not to use Concurrency

It is just as important to know when not to use concurrency as it is to know when to use it. Fundamentally, the one and only reason not to use concurrency is when the benefit is not worth the cost. Code using concurrency is harder to understand in many cases, so there is a direct intellectual cost to writing and maintaining multi-threaded code, and the additional complexity can also lead to more bugs. Unless the potential performance gain is large enough, or the separation of concerns clear enough, to justify the additional development time required to get it right, and the additional costs associated with maintaining multi-threaded code, don't use concurrency.

Also, the performance gain might not be as large as expected: there is an inherent overhead associated with launching a thread, as the OS has to allocate the associated kernel resources and stack space and then add the new thread to the scheduler, all of which takes time. If the task being run on the thread is completed quickly, the actual time taken by the task may be dwarfed by the overhead of launching the thread, possibly making the overall performance of the application worse than if the task had been executed directly by the spawning thread.

Furthermore, threads are a limited resource. If you have too many threads running at once, this consumes OS resources and may make the system as a whole run slower. Not only that, but using too many threads can exhaust the available memory or address space for a process, since each thread requires a separate stack. This is particularly a problem for 32-bit processes with a "flat" architecture, where there is a 4GB limit on the available address space: if each thread has a 1MB stack (as is typical on many systems), then the address space would be entirely used up by 4096 threads, without allowing any space for code, static data or heap data. Though 64-bit (or larger) systems don't have this direct address-space limit, they still have finite resources: if you run too many threads, this will eventually cause problems. Though thread pools (see chapter 9) can be used to limit the number of threads, these are not a silver bullet, and they do have their own issues.

If the server side of a client-server application launches a separate thread for each connection, this works fine for a small number of connections, but it can quickly exhaust system resources by launching too many threads if the same technique is used for a high-demand server which has to handle many connections. In this scenario, careful use of thread pools can provide optimal performance (see chapter 9).

Finally, the more threads you have running, the more context switching the operating system has to do. Each context switch takes time that could be spent doing useful work, so at some point adding an extra thread will actually reduce the overall application performance rather than increase it. For this reason, if you are trying to achieve the best possible performance of the system, it is necessary to adjust the number of threads running to take account of the available hardware concurrency (or lack of it).

Use of concurrency for performance is just like any other optimization strategy: it has the potential to greatly improve the performance of your application, but it can also complicate the code, making it harder to understand and more prone to bugs. Therefore it is only worth doing for those performance-critical parts of the application where there is the potential for measurable gain. Of course, if the potential for performance gains is only secondary to clarity of design or separation of concerns, it may still be worth using a multi-threaded design.

Assuming that you've decided you do want to use concurrency in your application, whether for performance, separation of concerns, or because it's "multi-threading Monday", what does that mean for us C++ programmers?

1.3 Concurrency and Multi-threading in C++

Standardized support for concurrency through multi-threading is a new thing for C++. It is only with the upcoming C++0x standard that you will be able to write multi-threaded code without resorting to platform-specific extensions. In order to understand the rationale behind lots of the decisions in the new Standard C++ thread library, it's important to understand the history.

1.3.1 History of multi-threading in C++

The 1998 C++ Standard does not acknowledge the existence of threads, and the operational effects of the various language elements are written in terms of a sequential abstract machine. Not only that, but the memory model is not formally defined, so you can't write multi-threaded applications without compiler-specific extensions to the 1998 C++ Standard.

Of course, compiler vendors are free to add extensions to the language, and the prevalence of C APIs for multi-threading, such as those in the POSIX C Standard and the Microsoft Windows API, has led many C++ compiler vendors to support multi-threading with various platform-specific extensions. This compiler support is generally limited to allowing the use of the corresponding C API for the platform and ensuring that the C++ runtime library (such as the code for the exception-handling mechanism) works in the presence of multiple threads. Though very few compiler vendors have provided a formal multi-threading-aware memory model, the actual behaviour of the compilers and processors has been sufficiently good that a large number of multi-threaded C++ programs have been written.

Not content with using the platform-specific C APIs for handling multi-threading, C++ programmers have looked to their class libraries to provide object-oriented multi-threading facilities. Application frameworks such as MFC, and general-purpose C++ libraries such as Boost and ACE, have accumulated sets of C++ classes that wrap the underlying platform-specific APIs and provide higher-level facilities for multi-threading that simplify the task. Though the precise details of the class libraries have varied considerably, particularly in the area of launching new threads, the overall shape of the classes has had a lot in common. One particularly important design that is common to many C++ class libraries, and which provides considerable benefit to the programmer, has been the use of the Resource Acquisition Is Initialization (RAII) idiom with locks to ensure that mutexes are unlocked when the relevant scope is exited.

For many cases, the multi-threading support of existing C++ compilers, combined with the availability of platform-specific APIs and platform-independent class libraries such as Boost and ACE, provides a good, solid foundation on which to write multi-threaded C++ code, and as a result there are probably millions of lines of C++ code written as part of multi-threaded applications. However, the lack of Standard support means that there are occasions where the lack of a thread-aware memory model causes problems, particularly for those who try to gain higher performance by using knowledge of the processor hardware, or for those writing cross-platform code where the actual behaviour of the compilers varies between platforms.

1.3.2 Concurrency Support in the New Standard

All this changes with the release of the new C++0x Standard. Not only is there a brand new thread-aware memory model, but the C++ Standard Library has been extended to include classes for managing threads (see chapter 2), protecting shared data (see chapter 3), synchronizing operations between threads (see chapter 4), and low-level atomic operations (see chapter 5).

The new C++ thread library is heavily based on the prior experience accumulated through the use of the C++ class libraries mentioned above. In particular, the Boost thread library has been used as the primary model on which the new library is based, with many of the classes sharing their names and structure with the corresponding ones from Boost. As the new Standard has evolved, this has been a two-way flow, and the Boost thread library has itself changed to match the C++ Standard in many respects, so users transitioning from Boost should find themselves very much at home.

Concurrency support is just one of the changes with the new C++ Standard: as mentioned at the beginning of this chapter, there are many enhancements to the language itself to make programmers' lives easier. Though these are generally outside the scope of this book, some of those changes have had a direct impact on the thread library itself and the ways in which it can be used. Appendix A provides a brief introduction to these language features.

The support for atomic operations directly in C++ enables programmers to write efficient code with defined semantics, without the need for platform-specific assembly language. This is a real boon for those of us trying to write efficient, portable code: not only does the compiler take care of the platform specifics, but the optimizer can be written to take into account the semantics of the operations, thus enabling better optimization of the program as a whole.

1.3.3 Efficiency in the C++ Thread Library

One of the concerns that developers involved in high-performance computing often raise regarding C++ in general, and regarding C++ classes that wrap low-level facilities (such as those in the new Standard C++ Thread Library) specifically, is that of efficiency. If you're after the utmost in performance, then it is important to understand the implementation costs associated with using any high-level facilities, compared to using the underlying low-level facilities directly. This cost is the abstraction penalty.

The C++ Standards committee has been very aware of this when designing the Standard C++ Library in general, and the Standard C++ Thread Library in particular: one of the design goals has been that there should be little or no benefit to be gained from using the lower-level APIs directly, where the same facility is to be provided. The library has therefore been designed to allow for efficient implementation (with a very low abstraction penalty) on most major platforms.

Another goal of the C++ Standards committee has been to ensure that C++ provides sufficient low-level facilities for those wishing to work close to the metal for the ultimate performance. To this end, along with the new memory model comes a comprehensive atomic operations library for direct control over individual bits and bytes, and over the inter-thread synchronization and visibility of any changes. These atomic types and the corresponding operations can now be used in many places where developers would previously have chosen to drop down to platform-specific assembly language. Code using the new standard types and operations is thus more portable and easier to maintain.


The Standard C++ Library also provides higher-level abstractions and facilities that make writing multi-threaded code easier and less error-prone. Sometimes the use of these facilities does come with a performance cost due to the additional code that must be executed. However, this performance cost does not necessarily imply a higher abstraction penalty: in general the cost is no higher than would be incurred by writing equivalent functionality by hand, and the compiler may well inline much of the additional code anyway.

In some cases, the high-level facilities provide additional functionality beyond what may be required for a specific use. Most of the time this is not an issue: you don't pay for what you don't use. On rare occasions, this unused functionality will impact the performance of other code. If you are aiming for performance and the cost is too high, you may be better off hand-crafting the desired functionality from lower-level facilities. In the vast majority of cases, the additional complexity and chance of errors far outweighs the potential benefits from a small performance gain. Even if profiling does demonstrate that the bottleneck is in the C++ Standard Library facilities, it may be due to poor application design rather than a poor library implementation. For example, if too many threads are competing for a mutex, it will impact the performance significantly. Rather than trying to shave a small fraction of time off the mutex operations, it would probably be more beneficial to restructure the application so that there was less contention on the mutex. This sort of issue is covered in chapter 8.

In those very rare cases where the C++ Standard Library does not provide the performance or behaviour required, it might be necessary to use platform-specific facilities.

1.3.4 Platform-Specific Facilities

Whilst the C++ Thread Library provides reasonably comprehensive facilities for threading and concurrency, on any given platform there will be platform-specific facilities that go beyond what is offered. In order to gain easy access to those facilities without giving up the benefits of using the Standard C++ Thread Library, the types in the C++ Thread Library may offer a native_handle() member function which allows the underlying implementation to be directly manipulated using a platform-specific API. By its very nature, any operation performed using the native_handle() is entirely platform-dependent and out of the scope of this book (and of the C++ Standard Library itself).
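The shape of this escape hatch can be sketched as follows. Note that native_handle() is only conditionally supported by an implementation, and the handle's type varies by platform, so the sketch merely obtains the handle without using it:

```cpp
#include <thread>

void background_task() {}  // placeholder work

bool show_native_handle()
{
    std::thread t(background_task);
    // native_handle_type is platform-specific: e.g. a pthread_t on pthreads
    // platforms. Anything done with it is outside the C++ Standard.
    std::thread::native_handle_type handle = t.native_handle();
    (void)handle;              // not used here; just demonstrating access
    bool ok = t.joinable();
    t.join();
    return ok;
}
```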

Of course, before even considering using platform-specific facilities, it's important to understand what the Standard library provides, so let's get started with an example

1.4 Getting Started

OK, so you've got a nice shiny C++0x-compatible compiler. What next? What does a multi-threaded C++ program look like? It looks pretty much like any other C++ program, with the usual mix of variables, classes and functions. The only real distinction is that some functions might be running concurrently, so care needs to be taken to ensure that shared data is safe for concurrent access, as described in chapter 3. Of course, in order to run functions concurrently, specific functions and objects must be used to manage the different threads.

1.4.1 Hello Concurrent World

Let's start with a classic example: a program to print “Hello World”. A really simple “Hello World” program that runs in a single thread is shown below, to serve as our baseline when we move to multiple threads.

#include <iostream>

int main()
{
    std::cout<<"Hello World\n";
}

All this program does is write “Hello World” to the standard output stream. Let's compare it to the simple “Hello Concurrent World” shown in listing 1.1, which starts a separate thread to display the message.

Listing 1.1: A simple “Hello Concurrent World” program

Cueballs in Code and Text
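A minimal version of the listing, reconstructed to match the cueballs (#1-#4) discussed below. For checkability, std::cout is replaced here by a string stream and the body of main() is moved into a run() function; the original listing writes to std::cout directly from main():

```cpp
#include <sstream>
#include <thread>                 // #1 the new header for std::thread

std::ostringstream output;        // stand-in for std::cout, so the result can be inspected

void hello()                      // #2 the new thread's initial function
{
    output << "Hello Concurrent World\n";
}

void run()                        // in the original listing this body is main()
{
    std::thread t(hello);         // #3 launch a second thread running hello()
    t.join();                     // #4 wait for it before continuing
}
```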

The first difference is the extra #include <thread> (#1). The declarations for the threading support in the Standard C++ library are in new headers: the functions and classes for managing threads are declared in <thread>, whilst those for protecting shared data are declared in other headers.

Secondly, the code for writing the message has been moved to a separate function (#2). This is because every thread has to have an initial function, which is where the new thread of execution begins. For the initial thread in an application, this is main(), but for every other thread it is specified in the constructor of a std::thread object: in this case, the std::thread object named t (#3) has the new function hello() as its initial function. This is the next difference: rather than just writing directly to standard output or calling hello() from main(), this program launches a whole new thread to do it, bringing the thread count to two: the initial thread that starts at main(), and the new thread that starts at hello().

After the new thread has been launched (#3), the initial thread continues execution. If it didn't wait for the new thread to finish, it would merrily continue to the end of main(), and thus end the program, possibly before the new thread had had a chance to run. This is why the call to join() is there (#4): as described in chapter 2, this causes the calling thread (in main()) to wait for the thread associated with the std::thread object, in this case t.

If this seems like a lot of work to go to just to write a message to standard output, it is: as described in section 1.2.3 above, it is generally not worth the effort to use multiple threads for such a simple task, especially if the initial thread has nothing to do in the meantime. Later in the book, we will work through examples that show scenarios where there is a clear gain to using multiple threads.

1.5 Summary

In this chapter, we've covered what is meant by concurrency and multi-threading, and why we would choose to use it (or not) in our applications. We've also covered the history of multi-threading in C++, from the complete lack of support in the 1998 Standard, through various platform-specific extensions, to proper multi-threading support in the new C++ Standard, C++0x. This support is coming just in time to allow programmers to take advantage of the greater hardware concurrency becoming available with newer CPUs, as chip manufacturers choose to add more processing power in the form of multiple cores, which allow more tasks to be executed concurrently, rather than increasing the execution speed of a single core.

We've also seen how simple it can be to use the classes and functions from the C++ Standard Library, in the examples from section 1.4. In C++, using multiple threads is not complicated in and of itself; the complexity lies in designing the code so that it behaves as intended.

After the taster examples of section 1.4, it's time for something with a bit more substance In chapter 2 we'll look at the classes and functions available for managing threads


2

Managing Threads

OK, so you've decided to use concurrency for your application; in particular, you've decided to use multiple threads. What now? How do you launch these threads, how do you check that they've finished, and how do you keep tabs on them? The C++ Standard Library makes most thread-management tasks relatively easy, with just about everything managed through the std::thread object associated with a given thread, as you'll see. For those tasks that aren't so straightforward, the library provides the flexibility to build what you need from the basic building blocks.

In this chapter, we'll start by covering the basics: launching a thread, waiting for it to finish, or running it in the background. We'll then proceed to look at passing additional parameters to the thread function when it is launched, and at how to transfer ownership of a thread from one std::thread object to another. Finally, we'll look at choosing the number of threads to use, and at identifying particular threads.

2.1 Basic Thread Management

Every C++ program has at least one thread, which is started by the C++ runtime: the thread running main(). Your program can then launch additional threads which have another function as the entry point; these threads run concurrently with each other and with the initial thread. Just as the program exits when it returns from main(), a thread finishes when its entry-point function returns. As we'll see, if you have a std::thread object for a thread, you can wait for it to finish; but first we have to start it, so let's look at launching threads.

2.1.1 Launching a Thread

As we saw in chapter 1, threads are started by constructing a std::thread object that specifies the task to run on that thread. In the simplest case, that task is just a plain, ordinary void-returning function that takes no parameters. This function runs on its own thread until it returns, and then the thread stops. At the other extreme, the task could be a function object that takes additional parameters and performs a series of independent operations that are specified through some kind of messaging system whilst it is running, with the thread only stopping when it is signalled to do so, again via some kind of messaging system. It doesn't matter what the thread is going to do, or where it's launched from: starting a thread using the C++ thread library always boils down to constructing a std::thread object.

Since the callable object supplied to the constructor is copied into the thread, the original object can be destroyed immediately. However, if the object contains any pointers or references, it is important to ensure that those pointers and references remain valid for as long as they may be accessed from the new thread, otherwise undefined behaviour will result. In particular, it is a bad idea to create a thread within a function that has access to the local variables in that function, unless the thread is guaranteed to finish before the function exits. Listing 2.1 shows an example of just such a problematic function.

Listing 2.1: A function that returns whilst a thread still has access to local variables
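A sketch of the listing, which follows the description in the text: a callable that holds a reference to a local variable is launched on a thread that is then detached. The names func, do_something, and the loop count are reconstructions:

```cpp
#include <thread>

void do_something(int& i) { ++i; }       // stand-in for the real per-iteration work

struct func
{
    int& i;                              // reference to a variable owned by oops()
    explicit func(int& i_) : i(i_) {}
    void operator()()
    {
        for (unsigned j = 0; j < 1000000; ++j)
        {
            do_something(i);             // #1 potential access to a dangling reference
        }
    }
};

void oops()
{
    int some_local_state = 0;
    func my_func(some_local_state);
    std::thread my_thread(my_func);
    my_thread.detach();                  // #2 don't wait; the thread may outlive oops()
}
```

The test below exercises func directly rather than calling oops(), since oops() is deliberately broken.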


Cueballs in code and text

#1 Potential access to dangling reference

#2 The new thread might still be running

In this case, the new thread associated with my_thread will probably still be running when oops exits (#2), in which case the next call to do_something(i) (#1) will access an already-destroyed variable. This is just like normal single-threaded code: allowing a pointer or reference to a local variable to persist beyond the function exit is never a good idea. However, it is easier to make the mistake with multi-threaded code, as it is not necessarily immediately apparent that this has happened. In cases like this, it is desirable to ensure that the thread has completed execution before the function exits.

2.1.2 Waiting for a Thread to Complete

If you need to wait for a thread to complete, this can be done by calling join() on the associated std::thread instance. In the case of listing 2.1, inserting a call to my_thread.join() before the closing brace of the function body would be sufficient to ensure that the thread had finished before the function exited, and thus before the local variables were destroyed. In this case it would mean there was little point running the function on a separate thread, as the first thread would not be doing anything useful in the meantime; in real code, the original thread would either have work to do itself, or it would have launched several threads to do useful work before waiting for all of them to complete.

join() is very simple and brute-force: either you wait for a thread to finish, or you don't. If you need more fine-grained control over waiting for a thread, such as just checking whether a thread is finished, or waiting only for a certain period of time, then you have to use alternative mechanisms. The act of calling join() also cleans up any storage associated with the thread, so the std::thread object is no longer associated with the now-finished thread; indeed, it is not associated with any thread.

Listing 2.2: Waiting for a thread to finish

Cueballs in code and text

#A See definition in listing 2.1
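A runnable sketch of the pattern from the listing. Here do_something_in_current_thread is a hypothetical stand-in that can be made to throw, and the thread's work on local state is reduced to setting an atomic flag (the original listing reuses func from listing 2.1):

```cpp
#include <atomic>
#include <stdexcept>
#include <thread>

std::atomic<bool> worker_ran{false};

void do_something_in_current_thread(bool fail)   // hypothetical: may throw
{
    if (fail) throw std::runtime_error("oops");
}

void f(bool fail)
{
    std::thread t([]{ worker_ran = true; });
    try
    {
        do_something_in_current_thread(fail);
    }
    catch (...)
    {
        t.join();      // #2 join on the exception path...
        throw;         //    ...then let the exception propagate
    }
    t.join();          // #1 join on the normal path
}
```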

Listing 2.2 shows code to ensure that a thread with access to local state is finished before the function exits, whether the function exits normally (#1) or by an exception (#2). Just as it is important to ensure that any other locally allocated resources are properly cleaned up on function exit, local threads are no exception: if the thread must complete before the function exits, whether because it has a reference to other local variables or for any other reason, then it is important to ensure this is the case for all possible exit paths, whether normal or exceptional. One way of doing this is to use the standard Resource Acquisition Is Initialization (RAII) idiom and provide a class that does the join() in its destructor, as in listing 2.3. See how it simplifies the function f().

Listing 2.3: Using RAII to wait for a thread to complete
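A reconstruction following the cueball descriptions below (#1-#4), with the thread's work reduced to writing an atomic so the join can be observed; in the original listing the thread reuses func from listing 2.1:

```cpp
#include <atomic>
#include <thread>

class thread_guard
{
    std::thread& t;
public:
    explicit thread_guard(std::thread& t_) : t(t_) {}
    ~thread_guard()
    {
        if (t.joinable())                         // #2 join() may only be called once
        {
            t.join();                             // #3 wait in the destructor
        }
    }
    thread_guard(thread_guard const&) = delete;   // #4 copying would be dangerous
    thread_guard& operator=(thread_guard const&) = delete;
};

std::atomic<int> result{0};

void f()
{
    std::thread t([]{ result = 42; });
    thread_guard g(t);
    // do_something_in_current_thread();
}                                                 // #1 g destroyed here; destructor joins t
```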


Cueballs in code and text

When the execution of the current thread reaches the end of f (#1), the local objects are destroyed in reverse order of construction. Consequently, the thread_guard object g is destroyed first, and the thread is joined with in the destructor (#3). This even happens if the function exits because do_something_in_current_thread throws an exception.

The destructor of thread_guard in listing 2.3 first tests to see if the std::thread object is joinable() (#2) before calling join() (#3). This is important, because join() can only be called once for a given thread of execution, so it would be a mistake to do so if the thread had already been joined with.

The copy constructor and copy-assignment operator are marked =delete (#4) to ensure that they are not automatically provided by the compiler: copying or assigning such an object would be dangerous, as it might then outlive the scope of the thread it was joining.

The reason we have to take such precautions to ensure that our threads are joined when they reference local variables is that a thread can continue running even when the std::thread object that was managing it has been destroyed. Such a thread is said to be detached: it is no longer attached to a std::thread object. This means that the C++ runtime library is now responsible for cleaning up the resources associated with the thread when it exits, rather than that being the responsibility of the std::thread object. It is also no longer possible to wait for that thread to complete: once a thread becomes detached, it is not possible to obtain a std::thread object that references it, so it can no longer be joined with. Detached threads truly run in the background: ownership and control are passed over to the C++ runtime library.

2.1.3 Running Threads in the Background

Detached threads are often called daemon threads, after the UNIX concept of a daemon process that runs in the background without any explicit user interface. Such threads are typically long-running: they may well run for almost the entire lifetime of the application, performing a background task such as monitoring the file system, clearing unused entries out of object caches, or optimizing data structures. At the other extreme, it may make sense to use a detached thread where there is another mechanism for identifying when the thread has completed, or where the thread is used for a “fire and forget” task.

As we've already seen in section 2.1.2, one way to detach a thread is just to destroy the associated std::thread object. This is fine for those circumstances where you can destroy the std::thread object, either because it is a local object and is destroyed when the containing scope is exited, or because it was allocated dynamically, either directly with new or as part of a container. If the std::thread object cannot be destroyed at the point in the code where you wish to detach the thread, you can do so by calling the detach() member function of the std::thread object. After the call completes, the std::thread object is no longer associated with the actual thread of execution, and is therefore no longer joinable:

std::thread t(do_background_work);
t.detach();
assert(!t.joinable());

The thread of execution no longer has an associated management object, just as if the std::thread object had been destroyed. The C++ runtime library is therefore responsible for cleaning up the resources associated with running the thread when it completes.

Even if the std::thread object for a thread is to be destroyed at this point in the code, it is sometimes worth calling detach() to be explicit about your intent: it makes it clear to whoever maintains the code that this thread was intended to be detached. Given that multi-threaded code can be quite complex, anything that makes it easier to understand should be considered.

Of course, in order to detach the thread from a std::thread object, there must be a thread to detach: you cannot call detach() on a std::thread object with no associated thread of execution. This is exactly the same requirement as for join(), and you can therefore check it in exactly the same way: you can only call t.detach() for a std::thread object t when t.joinable() returns true.

Consider an application such as a word processor that can edit multiple documents at once. There are many ways to handle this, both at the UI level and internally. One way that does seem to be increasingly common at the moment is to have multiple independent top-level windows, one for each document being edited. Though these windows appear to be completely independent, each with their own menus and so forth, they are running within the same instance of the application. One way to handle this internally is to run each document-editing window in its own thread: each thread runs the same code, but with different data relating to the document being edited and the corresponding window properties. Opening a new document therefore requires starting a new thread. The thread handling the request is not going to care about waiting for the other thread to finish, as it is working on an unrelated document, so this makes it a prime case for running a detached thread.

Listing 2.4 shows a simple code outline for this approach: if the user chooses to open a new document, we prompt them for the document to open, then start a new thread to open that document (#1) and detach it (#2). Since the new thread is doing the same operation as the current thread, but on a different file, we can reuse the same function (edit_document) with the newly chosen filename as the supplied argument.

Listing 2.4: Detaching thread to handle other documents

void edit_document(std::string const& filename)

Cueballs in code and text
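Only the function signature of the listing survives above; the rest of the sketch below is a runnable reconstruction. The GUI helpers (the user_command type, open_document_and_display_gui, get_user_input) are invented stand-ins driven by a scripted command queue, so the detach-per-document pattern can be exercised without a real UI:

```cpp
#include <atomic>
#include <chrono>
#include <deque>
#include <mutex>
#include <string>
#include <thread>

// --- hypothetical stand-ins for the GUI layer --------------------------
struct user_command
{
    bool open_new_document;
    std::string filename;
};

std::atomic<int> docs_opened{0};
std::mutex script_mutex;
std::deque<user_command> scripted_input;      // commands "typed" by a fake user

void open_document_and_display_gui(std::string const&) { ++docs_opened; }

bool get_user_input(user_command& cmd)        // false once the script runs out
{
    std::lock_guard<std::mutex> lk(script_mutex);
    if (scripted_input.empty()) return false;
    cmd = scripted_input.front();
    scripted_input.pop_front();
    return true;
}

// --- the pattern from listing 2.4 --------------------------------------
void edit_document(std::string const& filename)
{
    open_document_and_display_gui(filename);
    user_command cmd;
    while (get_user_input(cmd))               // loop until "done editing"
    {
        if (cmd.open_new_document)
        {
            std::thread t(edit_document, cmd.filename);  // #1 same function, new file
            t.detach();                                  // #2 fire and forget
        }
        // else: process_user_input(cmd) in the real application
    }
}
```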

This example also shows a case where it is helpful to pass arguments to the function used to start a thread: rather than just passing the name of the function to the std::thread constructor (#1), we also pass in the filename parameter. Though other mechanisms could be used to do this, such as using a function object with member data instead of an ordinary function with parameters, the thread library provides us with an easy way of doing it.

2.2 Passing Arguments to a Thread Function

As seen in listing 2.4, passing arguments to the callable object or function is fundamentally as simple as passing additional arguments to the std::thread constructor. However, it is important to bear in mind that by default the arguments are copied into internal storage, where they can be accessed by the newly created thread of execution, even if the corresponding parameter in the function is expecting a reference. Here's a simple example:

void f(int i,std::string const& s);
std::thread t(f,3,"hello");

This creates a new thread of execution associated with t, which calls f(3,"hello"). Note that even though f takes a std::string as the second parameter, the string literal is passed as a char const*, and only converted to a std::string in the context of the new thread. This is particularly important when the argument supplied is a pointer to an automatic variable, as below:

void f(int i,std::string const& s);

void oops(int some_param)

Cueballs in code and text

In this case, it is the pointer to the local variable buffer (#1) that is passed through to the new thread (#2), and there's a significant chance that the function oops will exit before the buffer has been converted to a std::string on the new thread, thus leading to undefined behaviour. The solution is to cast to std::string before passing the buffer to the std::thread constructor:

void f(int i,std::string const& s);

void not_oops(int some_param)
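Only the signatures of the two versions survive above; the bodies below are a runnable reconstruction, with f defined to record what it receives so the safe version can be checked. The broken oops is shown for illustration but should not be called:

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <string>
#include <thread>

std::atomic<bool> done{false};
std::string received;                          // written by the new thread before done is set

void f(int i, std::string const& s)
{
    received = s;
    done = true;
}

void oops(int some_param)                      // broken: don't call this
{
    char buffer[1024];                         // #1 pointer to a local buffer...
    std::snprintf(buffer, sizeof(buffer), "%i", some_param);
    std::thread t(f, 3, buffer);               // #2 ...copied as char const*, converted later
    t.detach();                                // oops may return before the conversion happens
}

void not_oops(int some_param)
{
    char buffer[1024];
    std::snprintf(buffer, sizeof(buffer), "%i", some_param);
    std::thread t(f, 3, std::string(buffer));  // convert to std::string before passing
    t.detach();
}
```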


In this case, the problem was that we were relying on the implicit conversion of the pointer to the buffer into the std::string object expected as a function parameter; the std::thread constructor copies the supplied values as-is, without converting to the expected argument type.

It's also possible to get the reverse scenario: the object is copied, and what you wanted was a reference. This might happen if the thread is updating a data structure which is passed in by reference, for example:

Cueballs in code and text

Though update_data_for_widget (#1) expects the second parameter to be passed by reference, the std::thread constructor (#2) doesn't know that: it is oblivious to the types of the arguments expected by the function and just blindly copies the supplied values. When it calls update_data_for_widget, it will end up passing a reference to the internal copy of data, and not a reference to data itself. Consequently, when the thread finishes, these updates will be discarded as the internal copies of the supplied arguments are destroyed, and process_widget_data will be passed an unchanged data (#3) rather than a correctly updated version. For those of you familiar with std::bind, the solution will be readily apparent: you need to wrap the arguments that really need to be references in std::ref. In this case, if we change the thread invocation to

std::thread t(update_data_for_widget,w,std::ref(data));

then update_data_for_widget will be correctly passed a reference to data rather than a reference to a copy of data.
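A sketch of both versions; widget_id, widget_data, and the updated value are invented for illustration. Note that under the final C++11 rules the broken call doesn't merely lose the update: it fails to compile, because the thread's internal copy cannot bind to a non-const reference, so that line is shown commented out:

```cpp
#include <functional>   // std::ref
#include <thread>

struct widget_id {};
struct widget_data { int value; };

void update_data_for_widget(widget_id w, widget_data& data)  // #1 expects a reference
{
    data.value = 42;                       // hypothetical update
}

widget_data oops_again(widget_id w)
{
    widget_data data{0};
    // std::thread t(update_data_for_widget, w, data);  // #2 copies data; in final C++11
    //                                                  //    this line won't even compile
    // t.join();
    return data;                           // #3 would be passed on unchanged
}

widget_data fixed(widget_id w)
{
    widget_data data{0};
    std::thread t(update_data_for_widget, w, std::ref(data));  // wrap in std::ref
    t.join();
    return data;                           // now correctly updated
}
```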

If you're familiar with std::bind, the whole parameter-passing semantics will be unsurprising, since both the operation of the std::thread constructor and the operation of std::bind are defined in terms of the same mechanism. This means that, for example, you can pass a member function pointer as the function, provided you supply a suitable object pointer as the first argument:

class X


Cueballs in code and text

This code will invoke my_x.do_lengthy_work() on the new thread, since the address of my_x is supplied as the object pointer (#1). You can also supply arguments to such a member function call: the third argument to the std::thread constructor will be the first argument to the member function, and so forth.
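Only the fragment "class X" survives above; a sketch of the pattern follows, with the members of X invented so the calls can be observed:

```cpp
#include <thread>

class X
{
public:
    void do_lengthy_work() { work_done = true; }
    void do_lengthy_work_with_arg(int n) { result = n; }
    bool work_done = false;
    int result = 0;
};

X run_on_thread()
{
    X my_x;
    std::thread t(&X::do_lengthy_work, &my_x);              // #1 &my_x as the object pointer
    t.join();
    std::thread t2(&X::do_lengthy_work_with_arg, &my_x, 7); // extra args follow the object
    t2.join();
    return my_x;
}
```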

Another interesting scenario for supplying arguments is where the arguments cannot be copied, but can only be moved: the data held within one object is transferred over to another, leaving the original object “empty”. An example of such a type is std::unique_ptr, which provides automatic memory management for dynamically allocated objects. Only one std::unique_ptr instance can point to a given object at a time, and when that instance is destroyed, the pointed-to object is deleted. The move constructor and move-assignment operator allow the ownership of an object to be transferred around between std::unique_ptr instances. Such a transfer leaves the source object with a NULL pointer. This moving of values allows objects of this type to be accepted as function parameters or returned from functions. Where the source object is a temporary, the move is automatic, but where the source is a named value, the transfer must be requested directly by invoking std::move(). The example below shows the use of std::move to transfer ownership of a dynamic object into a thread:

By specifying std::move(p) in the std::thread constructor, the ownership of the big_object is transferred first into internal storage for the newly created thread, and then into process_big_object in turn.
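The example code itself is missing here; the sketch below fills in big_object and its data as invented details, and adds a join so the transfer can be observed:

```cpp
#include <atomic>
#include <memory>
#include <thread>

struct big_object
{
    int data = 0;
    void prepare_data(int v) { data = v; }
};

std::atomic<int> last_processed{0};

void process_big_object(std::unique_ptr<big_object> p)
{
    last_processed = p->data;              // p owns the object; it is freed when p dies
}

void launch()
{
    std::unique_ptr<big_object> p(new big_object);
    p->prepare_data(42);
    std::thread t(process_big_object, std::move(p));  // ownership moves into the thread
    // p is now null; the big_object travels via the thread's internal storage
    t.join();
}
```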

Several of the classes in the Standard thread library exhibit the same ownership semantics as std::unique_ptr, and std::thread is one of them. Though std::thread instances don't own a dynamic object in the same way as std::unique_ptr does, they do own a resource: each instance is responsible for managing a thread of execution. This ownership can be transferred between instances, since instances of std::thread are movable, even though they aren't copyable. This ensures that only one object is associated with a particular thread of execution at any one time, whilst allowing programmers the option of transferring that ownership between objects.

2.3 Transferring Ownership of a Thread

Suppose you want to write a function that creates a thread to run in the background, but passes back ownership of the new thread to the calling function rather than waiting for it to complete; or maybe you want to do the reverse: create a thread, and pass ownership in to some function that should wait for it to complete. In either case, you need to transfer ownership from one place to another.

This is where the move support of std::thread comes in. As described in the previous section, many resource-owning types in the C++ Standard Library, such as std::ifstream and std::unique_ptr, are movable but not copyable, and std::thread is one of them. This means that the ownership of a particular thread of execution can be moved between std::thread instances, as in the example below, which shows the creation of two threads of execution and the transfer of ownership of those threads between three std::thread instances: t1, t2, and t3.

Cueballs in code and text

Firstly, a new thread is started (#1) and associated with t1. Ownership is then transferred over to t2 when t2 is constructed, by invoking std::move() to explicitly move ownership (#2). At this point, t1 no longer has an associated thread of execution: the thread running some_function is now associated with t2.

Then a new thread is started and associated with a temporary std::thread object (#3). The subsequent transfer of ownership into t1 doesn't require a call to std::move() to explicitly move ownership, since the owner is a temporary object: moving from temporaries is automatic and implicit.

t3 is default-constructed (#4), which means that it is created without any associated thread of execution. Ownership of the thread currently associated with t2 is transferred into t3 (#5), again with an explicit call to std::move(), since t2 is a named object. After all these moves, t1 is associated with the thread running some_other_function, t2 has no associated thread, and t3 is associated with the thread running some_function.

The final move (#6) would transfer ownership of the thread running some_function back to t1, where it started. However, in this case t1 already has an associated thread (running some_other_function), so you can't just drop that thread by assigning a new value to t1: under the final C++11 rules, move-assigning to a std::thread that still owns a thread calls std::terminate() to terminate the program.
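The example code is missing from this extraction; the reconstruction below matches the cueballs, with the final assignment (#6) left commented out because move-assigning to a std::thread that still owns a thread calls std::terminate():

```cpp
#include <thread>

void some_function() {}
void some_other_function() {}

bool transfer_demo()
{
    std::thread t1(some_function);          // #1 start a thread, owned by t1
    std::thread t2 = std::move(t1);         // #2 explicit move from a named object
    t1 = std::thread(some_other_function);  // #3 implicit move from a temporary
    std::thread t3;                         // #4 default-constructed: no thread
    t3 = std::move(t2);                     // #5 explicit move again
    // t1 = std::move(t3);                  // #6 t1 still owns a thread, so this
    //                                      //    would call std::terminate()
    bool t2_is_empty = !t2.joinable();
    t1.join();                              // some_other_function's thread
    t3.join();                              // some_function's thread
    return t2_is_empty;
}
```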

The move support in std::thread means that ownership can readily be transferred out of a function, as shown in listing 2.5

Listing 2.5: Returning a std::thread from a function

Likewise, if ownership should be transferred into a function, it can just accept an instance of std::thread by value as one of the parameters, as shown here:
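The bodies of listing 2.5 and the by-value example are missing here; both directions can be sketched as follows, with some_function as a placeholder task:

```cpp
#include <thread>
#include <utility>

void some_function() {}

std::thread make_background_thread()   // listing 2.5 pattern: ownership moved out on return
{
    std::thread t(some_function);
    return t;                          // implicit move of the local into the caller
}

void wait_for(std::thread t)           // takes ownership by value and waits
{
    t.join();
}
```

A named std::thread must be handed in with std::move(), since it is not copyable: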

One benefit of the move support of std::thread is that we can build on the thread_guard class from listing 2.3 and have it actually take ownership of the thread. This avoids any unpleasant consequences should the thread_guard object outlive the thread it was referencing, and it also means that no one else can join or detach the thread once ownership has been transferred into the object. Since this would primarily be aimed at ensuring threads are completed before a scope is exited, I named this class scoped_thread. The implementation is shown in listing 2.6, along with a simple example.

Listing 2.6: scoped_thread and example usage

class scoped_thread

{


The example is very similar to that from listing 2.3, but the new thread is passed in directly to the scoped_thread (#1), rather than having to create a separate named variable for it. When the initial thread reaches the end of f (#2), the scoped_thread object is destroyed and then joins with (#3) the thread supplied to the constructor (#4). Whereas with the thread_guard class from listing 2.3 the destructor had to check that the thread was still joinable, we can do that in the constructor (#5) and throw an exception if not.
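Only the opening of the listing survives above; the reconstruction below follows the cueball descriptions, with the thread's work on local state simplified to writing an atomic:

```cpp
#include <atomic>
#include <stdexcept>
#include <thread>
#include <utility>

class scoped_thread
{
    std::thread t;
public:
    explicit scoped_thread(std::thread t_) :      // #4 takes ownership of the thread
        t(std::move(t_))
    {
        if (!t.joinable())                        // #5 check in the constructor...
            throw std::logic_error("No thread");
    }
    ~scoped_thread()
    {
        t.join();                                 // #3 ...so the destructor can just join
    }
    scoped_thread(scoped_thread const&) = delete;
    scoped_thread& operator=(scoped_thread const&) = delete;
};

std::atomic<int> local_state_copy{0};

void f()
{
    scoped_thread g{std::thread([]{ local_state_copy = 42; })};  // #1 passed in directly
    // do_something_in_current_thread();
}                                                 // #2 end of f: g's destructor joins
```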

The move support in std::thread also allows for containers of std::thread objects, if those containers are move-aware (like the updated std::vector<>). This means that you can write code like that in listing 2.7, which spawns a number of threads and then waits for them to finish.

Listing 2.7: Spawn some threads and wait for them to finish

void do_work(unsigned id);

void f()
{
    std::vector<std::thread> threads;
    for(unsigned i=0;i<20;++i)
    {
        threads.push_back(std::thread(do_work,i));   #1
    }
    std::for_each(threads.begin(),threads.end(),
                  std::mem_fn(&std::thread::join));  #2
}

#1 Spawn threads
#2 Call join() on each thread in turn

If the threads are being used to subdivide the work of an algorithm, this is often just what is required: before returning to the caller, all threads must have finished. Of course, the simple structure of listing 2.7 implies that the work done by the threads is self-contained, and the result of their operations is purely the side effects on shared data. If f() were to return a value to the caller that depended on the results of the operations performed by these threads, then as written this return value would have to be determined by examining the shared data after the threads had terminated. Alternative schemes for transferring the results of operations between threads are discussed in chapter 4.

Putting std::thread objects in a std::vector is a step towards automating the management of those threads: rather than creating separate variables for those threads and joining with them directly, they can be treated as a group. We can take this a step further by creating a dynamic number of threads determined at runtime, rather than a fixed number as in listing 2.7.

2.4 Choosing the Number of Threads at Runtime

One feature of the C++ Standard Library that helps here is std::thread::hardware_concurrency(). This function returns an indication of the number of threads that can truly run concurrently for a given execution of a program. On a multi-core system it might be the number of CPU cores, for example. This is only a hint, and the function might return 0 if this information is not available, but it can be a useful guide for splitting a task between threads.

Listing 2.8 shows a simple implementation of a parallel version of std::accumulate. It divides the work between the threads, with a minimum number of elements per thread in order to avoid the overhead of too many threads. Note that this implementation assumes that none of the operations will throw an exception, even though exceptions are possible: the std::thread constructor will throw if it cannot start a new thread of execution, for example. Handling exceptions in such an algorithm is beyond the scope of this simple example, and will be covered in chapter 8.


Listing 2.8: A parallel version of std::accumulate

template<typename Iterator,typename T>
struct accumulate_block
{
    void operator()(Iterator first,Iterator last,T& result)
    {
        result=std::accumulate(first,last,result);
    }
};

template<typename Iterator,typename T>
T parallel_accumulate(Iterator first,Iterator last,T init)
{
    unsigned long const length=std::distance(first,last);

    if(!length)                                                // #1
        return init;

    unsigned long const min_per_thread=25;
    unsigned long const max_threads=
        (length+min_per_thread-1)/min_per_thread;              // #2

    unsigned long const hardware_threads=
        std::thread::hardware_concurrency();

    unsigned long const num_threads=
        std::min(hardware_threads!=0?hardware_threads:2,
                 max_threads);                                 // #3

    unsigned long const block_size=length/num_threads;         // #4

    std::vector<T> results(num_threads);
    std::vector<std::thread> threads(num_threads-1);           // #5

    Iterator block_start=first;
    for(unsigned long i=0;i<(num_threads-1);++i)
    {
        Iterator block_end=block_start;
        std::advance(block_end,block_size);                    // #6
        threads[i]=std::thread(                                // #7
            accumulate_block<Iterator,T>(),
            block_start,block_end,std::ref(results[i]));
        block_start=block_end;                                 // #8
    }
    accumulate_block<Iterator,T>()(
        block_start,last,results[num_threads-1]);              // #9

    std::for_each(threads.begin(),threads.end(),
                  std::mem_fn(&std::thread::join));            // #10

    return std::accumulate(results.begin(),results.end(),init); // #11
}


Though this is quite a long function, it's actually really straightforward. If the input range is empty (#1), we just return the initial value init. Otherwise, there's at least one element in the range, so we can divide the number of elements to process by the minimum block size in order to give the maximum number of threads (#2). This is to avoid us creating 32 threads on a 32-core machine when we've only got 5 values in the range.

The number of threads to run is the minimum of our calculated maximum and the number of hardware threads (#3): we don't want to run more threads than the hardware can support, as the context switching would mean that more threads would decrease the performance. If the call to std::thread::hardware_concurrency() returned 0, we simply substitute a number of our choice: in this case I've chosen 2. We don't want to run too many threads, as that would slow things down on a single-core machine, but likewise we don't want to run too few, as then we'd be passing up the available concurrency.

The number of entries for each thread to process is simply the length of the range divided by the number of threads (#4). If you're worrying about the case where the number doesn't divide evenly, don't: we'll handle that later.

Now that we know how many threads we've got, we can create a std::vector<T> for the intermediate results, and a std::vector<std::thread> for the threads (#5). Note that we need to launch one fewer thread than num_threads, since we've already got one.

Launching the threads is just a simple loop: advance the block_end iterator to the end of the current block (#6), and launch a new thread to accumulate the results for this block (#7). The start of the next block is just the end of this one (#8).

After we've launched all the threads, this thread can then process the final block (#9). This is where we take account of any uneven division: we know the end of the final block must be last, and it doesn't matter how many elements are in that block.

Once we've accumulated the results for the last block, we can wait for all the threads we spawned with std::for_each (#10), as in listing 2.7, and then add up the results with a final call to std::accumulate (#11).

Before we leave this example, it's worth pointing out that where the addition operator for the type T is not associative (such as for float or double), the results of this parallel_accumulate may vary from those of std::accumulate, due to the grouping of the range into blocks. Also, the requirements on the iterators are slightly more stringent: they must be at least forward iterators, whereas std::accumulate can work with single-pass input iterators, and T must be default-constructible so that we can create the results vector. These sorts of requirement changes are common with parallel algorithms: by their very nature they are different in some manner in order to make them parallel, and this has consequences on the results and requirements. Parallel algorithms are covered in more depth in chapter 8.

In this case, all the information required by each thread was passed in when the thread was started, including the location in which to store the result of its calculation. This is not always the case: sometimes it is necessary to be able to identify the threads in some way for part of the processing. You could of course pass in an identifying number, such as the value of i in listing 2.8, but if the function that needs the identifier is several levels deep in the call stack, and could be called from any thread, it is inconvenient to have to do it that way. When we were designing the C++ thread library we foresaw this need, and so each thread has a unique identifier.

2.5 Identifying Threads

Thread identifiers are of type std::thread::id, and can be retrieved in two ways. Firstly, the identifier for a thread can be obtained from its associated std::thread object by calling the get_id() member function. If the std::thread object doesn't have an associated thread of execution, the call to get_id() returns a default-constructed std::thread::id object, which indicates “not any thread”. Alternatively, the identifier for the current thread can be obtained by calling std::this_thread::get_id().

Objects of type std::thread::id can be freely copied around and compared: they wouldn't be much use as identifiers otherwise. If two objects of type std::thread::id are equal, then either they represent the same thread, or both are holding the “not any thread” value. If two objects are not equal, then either they represent different threads, or one represents a thread and the other is holding the “not any thread” value.

The thread library doesn't limit you to checking whether or not thread identifiers are the same: objects of type std::thread::id offer the complete set of comparison operators, which provide a total ordering for all distinct values. This allows them to be used as keys in associative containers, or sorted, or compared in any other way that you as a programmer may see fit. Since the comparison operators provide a total order for all non-equal values of std::thread::id, they behave as you would intuitively expect: if a<b and b<c then a<c, and so forth. The Standard Library also provides std::hash<std::thread::id>, so that values of type std::thread::id can be used as keys in the new unordered associative containers too.

Instances of std::thread::id are often used to check whether or not a thread needs to perform some operation. For example, if threads are used to divide work as in listing 2.8, then the initial thread that launched the others might need to perform its work slightly differently in the middle of the algorithm. In this case it could store the result of std::this_thread::get_id() before launching the other threads, and then the core part of the algorithm (which is common to all threads) could check its own thread ID against the stored value.

Similarly, thread IDs could be used as keys into associative containers where there is specific data that needs associating with a thread, and alternative mechanisms such as thread-local storage are not appropriate. Such a container could, for example, be used by a controlling thread to store information about each of the threads under its control, or for passing information between threads.

The idea is that std::thread::id will suffice as a generic identifier for a thread in most circumstances: it is only if the identifier has semantic meaning associated with it (such as being an index into an array) that alternatives should be necessary. You can even write out an instance of std::thread::id to an output stream such as std::cout:

std::cout<<std::this_thread::get_id();

The exact output you get is strictly implementation-dependent: the only guarantee given by the standard is that thread IDs that compare equal should produce the same output, and those that are not equal should give different output. This is therefore primarily useful for debugging and logging, but the values have no semantic meaning, so there is not much more that could be said anyway.

2.6 Summary

In this chapter we've covered the basics of thread management with the C++ Standard Library: starting threads, waiting for them to finish, and not waiting for them to finish because we want them to run in the background. We've also seen how to pass arguments into the thread function when a thread is started, how to transfer the responsibility for managing a thread from one part of the code to another, and how groups of threads can be used to divide work. Finally, we've discussed identifying threads in order to associate data or behaviour with specific threads where that is inconvenient to achieve through alternative means.

Though you can do quite a lot with purely independent threads that each operate on separate data, as in listing 2.8 for example, sometimes it is desirable to share data between threads whilst they are running. Chapter 3 discusses the issues surrounding sharing data directly between threads, whilst chapter 4 covers more general issues surrounding synchronizing operations with and without shared data.


3

Sharing Data between Threads

One of the key benefits of using threads for concurrency is the possibility of easily and directly sharing data between them, so now that we've covered starting and managing threads, let's look at the issues surrounding shared data.

Imagine for a moment that you're sharing a flat with a friend. There's only one kitchen and only one bathroom. Unless you're particularly friendly, you can't both use the bathroom at the same time, and if your flatmate occupies the bathroom for a long time, it can be frustrating if you need to use it. Likewise, though it might be possible to both cook meals at the same time, if you've got a combined oven and grill, it's just not going to end well if one of you tries to grill some sausages at the same time as the other is baking cakes. Furthermore, we all know the frustration of sharing a space and getting half-way through a task only to find that someone has borrowed something you need, or changed something from how you had it.

It's the same with threads. If you're sharing data between threads, you need to have rules for which thread can access which bit of data when, and how any updates are communicated to the other threads that care about that data. The ease with which data can be shared between multiple threads in a single process is not just a benefit: it can also be a big drawback. Incorrect use of shared data is one of the biggest causes of concurrency-related bugs, and the consequences can be far worse than sausage-flavoured cakes.

This chapter is about sharing data safely between threads in C++, avoiding the potential problems that can arise, and maximizing the benefits.

3.1 Problems with Sharing Data Between Threads

When it comes down to it, the problems with sharing data between threads are all due to the consequences of modifying data. If all shared data is read-only, there is no problem, since the data read by one thread is unaffected by whether or not another thread is reading the same data. However, if data is shared between threads, and one or more threads start modifying the data, there is a lot of potential for trouble. In this case, you must take care to ensure that everything works out OK.

One concept that is widely used to help programmers reason about their code is that of invariants: statements that are always true about a particular data structure, such as “this variable contains the number of items in the list”. These invariants are often broken during an update, especially if the data structure is of any complexity or the update requires modification of more than one value.

Consider a doubly-linked list, where each node holds a pointer to both the next node in the list and the previous one. One of the invariants is that if you follow a “next” pointer from one node (A) to another (B), then the “previous” pointer from that node (B) points back to the first node (A). In order to remove a node from the list, the nodes on either side have to be updated to point to each other. Once one has been updated, the invariant is broken until the node on the other side has been updated too; after the update has completed, the invariant holds again.

The steps in deleting an entry from such a list are shown in figure 3.1:

1 Identify the node to delete (N)

2 Update the link from the node prior to N to point to the node after N

3 Update the link back from the node after N to point to the node prior to N

4 Delete the node N

As you can see, between steps 2 and 3 the links going in one direction are inconsistent with the links going in the opposite direction, and the invariant is broken.


Figure 3.1 Deleting a node from a doubly-linked list

The simplest potential problem with modifying data that is shared between threads is that of broken invariants. If you don't do anything special to ensure otherwise, then if one thread is reading the doubly-linked list whilst another is removing a node, it is quite possible for the reading thread to see the list with a node only partially removed (because only one of the links has been changed, as in step 2 of figure 3.1), so the invariant is broken. The
