Thinking in C++ (P26)


b.push_back(new X);

for(int i = 0; i < b.size(); i++)

cout << b[i]->vf() << endl;

The term “pointer magic” has been used to describe the way virtual inheritance is implemented. You can see the physical overhead of virtual inheritance with the following program:


alignment). The results are a bit surprising (these are from one particular compiler; yours may

Both b and nonv_inheritance contain the extra pointer, as expected. But when virtual inheritance is added, it would appear that the VPTR plus two extra pointers are added! By the time the multiple inheritance is performed, the object appears to contain five extra pointers (however, one of these is probably a second VPTR for the second multiply inherited subobject).

The curious can certainly probe into your particular implementation and look at the assembly language for member selection to determine exactly what these extra bytes are for, and the cost of member selection with multiple inheritance.19 The rest of you have probably seen enough to guess that quite a bit more goes on with virtual multiple inheritance, so it should be used sparingly (or avoided) when efficiency is an issue.

19 See also Jan Gray, “C++ Under the Hood”, a chapter in Black Belt C++ (edited by Bruce Eckel, M&T Press, 1995).

Upcasting

When you embed subobjects of a class inside a new class, whether you do it by creating member objects or through inheritance, each subobject is placed within the new object by the compiler. Of course, each subobject has its own this pointer, and as long as you’re dealing with member objects, everything is quite straightforward. But as soon as multiple inheritance is introduced, a funny thing occurs: An object can have more than one this pointer because the object represents more than one type during upcasting. The following example demonstrates this point:


Base1* b1 = &mi; // Upcast

Base2* b2 = &mi; // Upcast

out << "Base 1 pointer = " << b1 << endl;

out << "Base 2 pointer = " << b2 << endl;

} ///:~

The arrays of bytes inside each class are created with hexadecimal sizes, so the output addresses (which are printed in hex) are easy to read. Each class has a function that prints its this pointer, and these classes are assembled with both multiple inheritance and composition into the class MI, which prints its own address and the addresses of all the other subobjects. This function is called in main( ). You can clearly see that you get two different this pointers for the same object. The address of the MI object is taken and upcast to the two different types. Here’s the output:20

sizeof(mi) = 40 hex

mi this = 0x223e

Base1 this = 0x223e

Base2 this = 0x224e

Member1 this = 0x225e

Member2 this = 0x226e

Base 1 pointer = 0x223e

Base 2 pointer = 0x224e

20 For easy readability the code was generated for a small-model Intel processor.


Although object layouts vary from compiler to compiler and are not specified in Standard C++, this one is fairly typical. The starting address of the object corresponds to the address of the first class in the base-class list. Then the second inherited class is placed, followed by the member objects in order of declaration.

When the upcast to the Base1 and Base2 pointers occurs, you can see that, even though they’re ostensibly pointing to the same object, they must actually have different this pointers, so the proper starting address can be passed to the member functions of each subobject. The only way things can work correctly is if this implicit upcasting takes place when you call a member function for a multiply inherited subobject.

Persistence

Normally this isn’t a problem, because you want to call member functions that are concerned with that subobject of the multiply inherited object. However, if your member function needs to know the true starting address of the object, multiple inheritance causes problems. Ironically, this happens in one of the situations where multiple inheritance seems to be useful: persistence.

The lifetime of a local object is the scope in which it is defined. The lifetime of a global object is the lifetime of the program. A persistent object lives between invocations of a program: You can normally think of it as existing on disk instead of in memory. One definition of an object-oriented database is “a collection of persistent objects.”

To implement persistence, you must move a persistent object from disk into memory in order to call functions for it, and later store it to disk before the program expires. Four issues arise when storing an object on disk:

1. The object must be converted from its representation in memory to a series of bytes on disk.

2. Because the values of any pointers in memory won’t have meaning the next time the program is invoked, these pointers must be converted to something meaningful.

3. What the pointers point to must also be stored and retrieved.

4. When restoring an object from disk, the virtual pointers in the object must be respected.

Because the object must be converted back and forth between a layout in memory and a serial representation on disk, the process is called serialization (to write an object to disk) and deserialization (to restore an object from disk). Although it would be very convenient, these processes require too much overhead to support directly in the language. Class libraries will often build in support for serialization and deserialization by adding special member functions and placing requirements on new classes. (Usually some sort of serialize( ) function must be written for each new class.) Also, persistence is generally not automatic; you must usually explicitly write and read the objects.


MI-based persistence

Consider sidestepping the pointer issues for now and creating a class that installs persistence into simple objects using multiple inheritance. By inheriting the persistence class along with your new class, you automatically create classes that can be read from and written to disk. Although this sounds great, the use of multiple inheritance introduces a pitfall, as seen in the following example.


class WData1 : public Persistent, public Data {

ofstream f1("f1.dat"), f2("f2.dat");

assure(f1, "f1.dat"); assure(f2, "f2.dat");

WData1 d1(1.1, 2.2, 3.3);

WData2 d2(4.4, 5.5, 6.6);

d1.print("d1 before storage");

d2.print("d2 before storage");

d1.write(f1);

d2.write(f2);

} // Closes files

ifstream f1("f1.dat"), f2("f2.dat");

assure(f1, "f1.dat"); assure(f2, "f2.dat");

WData1 d1;

WData2 d2;

d1.read(f1);

d2.read(f2);

d1.print("d1 after storage");

d2.print("d2 after storage");

} ///:~

In this very simple version, the Persistent::read( ) and Persistent::write( ) functions take the this pointer and call iostream read( ) and write( ) functions. (Note that any type of iostream can be used.) A more sophisticated Persistent class would call a virtual write( ) function for each subobject.

With the language features covered so far in the book, the number of bytes in the object cannot be known by the Persistent class so it is inserted as a constructor argument. (In Chapter XX, run-time type identification shows how you can find the exact type of an object given only a base pointer; once you have the exact type you can find out the correct size with the sizeof operator.)

The Data class contains no pointers or VPTR, so there is no danger in simply writing it to disk and reading it back again. And it works fine in class WData1 when, in main( ), it’s written to file f1.dat and later read back again. However, when Persistent is second in the inheritance list of WData2, the this pointer for Persistent is offset to the end of the object, so it reads and writes past the end of the object. This not only produces garbage when reading the object from the file, it’s dangerous because it walks over any storage that occurs after the object.

This problem occurs in multiple inheritance any time a class must produce the this pointer for the actual object from a subobject’s this pointer. Of course, if you know your compiler always lays out objects in order of declaration in the inheritance list, you can ensure that you always put the critical class at the beginning of the list (assuming there’s only one critical class). However, such a class may exist in the inheritance hierarchy of another class and you may unwittingly put it in the wrong place during multiple inheritance. Fortunately, using run-time type identification (the subject of Chapter XX) will produce the proper pointer to the actual object, even if multiple inheritance is used.

Improved persistence

A more practical approach to persistence, and one you will see employed more often, is to create virtual functions in the base class for reading and writing and then require the creator of any new class that must be streamed to redefine these functions. The argument to the function is the stream object to write to or read from.21 Then the creator of the class, who knows best how the new parts should be read or written, is responsible for making the correct function calls. This doesn’t have the “magical” quality of the previous example, and it requires more coding and knowledge on the part of the user, but it works and doesn’t break when pointers are present:

21 Sometimes there’s only a single function for streaming, and the argument contains information about whether you’re reading or writing.

virtual void read(istream& in) = 0;

void print(const char* msg = "") const {

if(*msg) cout << msg << endl;


void read(istream& in) {

out << i << " "; // Store size of string

out << name << endl;

d1.write(out);

d2.write(out);

out << f[0] << " " << f[1] << " " << f[2];

}

// Must read in same order as write:

void read(istream& in) {

delete []name; // Remove old storage

int i;

in >> i >> ws; // Get int, strip whitespace

name = new char[i];


that it was created elsewhere and may be part of another class hierarchy so you don’t have control over its inheritance. However, for this scheme to work correctly you must have access to the underlying implementation so it can be stored; thus the use of protected.

The classes WData1 and WData2 use familiar iostream inserters and extractors to store and retrieve the protected data in Data to and from the iostream object. In write( ), you can see that spaces are added after each floating point number is written; these are necessary to allow parsing of the data on input.

The class Conglomerate not only inherits from Data, it also has member objects of type WData1 and WData2, as well as a pointer to a character string. In addition, all the classes that inherit from Persistent also contain a VPTR, so this example shows the kind of problem you’ll actually encounter when using persistence.

When you create write( ) and read( ) function pairs, the read( ) must exactly mirror what happens during the write( ), so read( ) pulls the bits off the disk the same way they were placed there by write( ). Here, the first problem that’s tackled is the char*, which points to a string of any length. The size of the string is calculated and stored on disk as an int (followed by a space to enable parsing) to allow the read( ) function to allocate the correct amount of storage.

When you have subobjects that have read( ) and write( ) member functions, all you need to do is call those functions in the new read( ) and write( ) functions. This is followed by direct storage of the members in the base class.

People have gone to great lengths to automate persistence, for example, by creating modified preprocessors to support a “persistent” keyword to be applied when defining a class. One can imagine a more elegant approach than the one shown here for implementing persistence, but it has the advantage that it works under all implementations of C++, doesn’t require special language extensions, and is relatively bulletproof.

Avoiding MI

The need for multiple inheritance in Persist2.cpp is contrived, based on the concept that you don’t have control of some of the code in the project. Upon examination of the example, you can see that MI can be easily avoided by using member objects of type Data, and putting the virtual read( ) and write( ) members inside Data or WData1 and WData2 rather than in a separate class. There are many situations like this one where multiple inheritance may be avoided; the language feature is included for unusual, special-case situations that would otherwise be difficult or impossible to handle. But when the question of whether to use multiple inheritance comes up, you should ask two questions:

1. Do I need to show the public interfaces of both these classes, or could one class be embedded with some of its interface produced with member functions in the new class?

2. Do I need to upcast to both of the base classes? (This applies when you have more than two base classes, of course.)

If you can answer “no” to both questions, you can avoid using MI and should probably do so.

One situation to watch for is when one class only needs to be upcast as a function argument. In that case, the class can be embedded and an automatic type conversion operator provided in your new class to produce a reference to the embedded object. Any time you use an object of your new class as an argument to a function that expects the embedded object, the type conversion operator is used. However, type conversion can’t be used for normal member selection; that requires inheritance.

Mixin types

Rodents & pets(play)


polymorphically

Later in the development of the project or sometime during its maintenance, you discover that the base-class interface provided by the vendor is incomplete: A function may be nonvirtual and you need it to be virtual, or a virtual function is completely missing in the interface, but essential to the solution of your problem. If you had the source code, you could go back and put it in. But you don’t, and you have a lot of existing code that depends on the original interface. Here, multiple inheritance is the perfect solution.

For example, here’s the header file for a library you acquire:

//: C06:Vendor.h

// Vendor-supplied class header

// You only get this & the compiled Vendor.obj

void A(const Vendor&);

void B(const Vendor&);

// Etc


#endif // VENDOR_H ///:~

Assume the library is much bigger, with more derived classes and a larger interface. Notice that it also includes the functions A( ) and B( ), which take a base-class reference and treat it polymorphically. Here’s the implementation file for the library:

extern ofstream out; // For trace info

void Vendor::v() const {


In your project, this source code is unavailable to you. Instead, you get a compiled file as Vendor.obj or Vendor.lib (or the equivalent for your system).

The problem occurs in the use of this library. First, the destructor isn’t virtual. This is actually a design error on the part of the library creator. In addition, f( ) was not made virtual; assume the library creator decided it wouldn’t need to be. And you discover that the interface to the base class is missing a function essential to the solution of your problem. Also suppose you’ve already written a fair amount of code using the existing interface (not to mention the functions A( ) and B( ), which are out of your control), and you don’t want to change it.

To repair the problem, create your own class interface and multiply inherit a new set of derived classes from your interface and from the existing classes:

virtual void v() const = 0;

virtual void f() const = 0;

// New interface function:

virtual void g() const = 0;

virtual ~MyBase() { out << "~MyBase()\n"; }


out << "calling A(p1p)\n";

A(p1p); // Same old behavior

out << "calling B(p1p)\n";

B(p1p); // Same old behavior

out << "delete mp\n";

// Deleting a reference to a heap object:

delete &mp; // Right behavior

} ///:~

In MyBase (which does not use MI), both f( ) and the destructor are now virtual, and a new virtual function g( ) has been added to the interface. Now each of the derived classes in the original library must be recreated, mixing in the new interface with MI. The functions Paste1::v( ) and Paste1::f( ) need to call only the original base-class versions of their functions. But now, if you upcast to MyBase as in main( )


The original library functions A( ) and B( ) still work the same (assuming the new v( ) calls its base-class version). The destructor is now virtual and exhibits the correct behavior.

Although this is a messy example, it does occur in practice and it’s a good demonstration of where multiple inheritance is clearly necessary: You must be able to upcast to both base classes.

Summary

The reason MI exists in C++ and not in other OOP languages is that C++ is a hybrid language and couldn’t enforce a single monolithic class hierarchy the way Smalltalk does. Instead, C++ allows many inheritance trees to be formed, so sometimes you may need to combine the interfaces from two or more trees into a new class.

If no “diamonds” appear in your class hierarchy, MI is fairly simple (although identical function signatures in base classes must be resolved). If a diamond appears, then you must deal with the problems of duplicate subobjects by introducing virtual base classes. This not only adds confusion, but the underlying representation becomes more complex and less efficient.

Multiple inheritance has been called the “goto of the 90’s”.22 This seems appropriate because, like a goto, MI is best avoided in normal programming, but can occasionally be very useful. It’s a “minor” but more advanced feature of C++, designed to solve problems that arise in special situations. If you find yourself using it often, you may want to take a look at your reasoning. A good Occam’s Razor is to ask, “Must I upcast to all of the base classes?” If not, your life will be easier if you embed instances of all the classes you don’t need to upcast to.

22 A phrase coined by Zack Urlocker.

Exercises

1. These exercises will take you step-by-step through the traps of MI. Create a base class X with a single constructor that takes an int argument and a member function f( ) that takes no arguments and returns void. Now inherit X into Y and Z, creating constructors for each of them that take a single int argument. Now multiply inherit Y and Z into A. Create an object of class A, and call f( ) for that object. Fix the problem with explicit disambiguation.

2. Starting with the results of exercise 1, create a pointer to an X called px, and assign to it the address of the object of type A you created before. Fix the problem using a virtual base class. Now fix X so you no longer have to call the constructor for X inside A.

3. Starting with the results of exercise 2, remove the explicit disambiguation for f( ), and see if you can call f( ) through px. Trace it to see which function gets called. Fix the problem so the correct function will be called in a class hierarchy.


7: Exception handling

Improved error recovery is one of the most powerful ways you can increase the robustness of your code.

Unfortunately, it’s almost accepted practice to ignore error conditions, as if we’re in a state of denial about errors. Some of the reason is no doubt the tediousness and code bloat of checking for many errors. For example, printf( ) returns the number of characters that were successfully printed, but virtually no one checks this value. The proliferation of code alone would be disgusting, not to mention the difficulty it would add in reading the code.

The problem with C’s approach to error handling could be thought of as one of coupling – the user of a function must tie the error-handling code so closely to that function that it becomes too ungainly and awkward to use.

One of the major features in C++ is exception handling, which is a better way of thinking about and handling errors. With exception handling,

1. Error-handling code is not nearly so tedious to write, and it doesn’t become mixed up with your “normal” code. You write the code you want to happen; later in a separate section you write the code to cope with the problems. If you make multiple calls to a function, you handle the errors from that function once, in one place.

2. Errors cannot be ignored. If a function needs to send an error message to the caller of that function, it “throws” an object representing that error out of the function. If the caller doesn’t “catch” the error and handle it, it goes to the next enclosing scope, and so on until someone catches the error.

This chapter examines C’s approach to error handling (such as it is), why it did not work very well for C, and why it won’t work at all for C++. Then you’ll learn about try, throw, and catch, the C++ keywords that support exception handling.

Error handling in C

In most of the examples in this book, assert( ) was used as it was intended: for debugging during development with code that could be disabled with #define NDEBUG for the shipping product. Runtime error checking uses the require.h functions developed in Chapter XX. These were a convenient way to say, “There’s a problem here you’ll probably want to handle with some more sophisticated code, but you don’t need to be distracted by it in this example.” The require.h functions may be enough for small programs, but for complicated products you may need to write more sophisticated error-handling code.

Error handling is quite straightforward in situations where you check some condition and you know exactly what to do because you have all the necessary information in that context. Of course, you just handle the error at that point. These are ordinary errors and not the subject of this chapter.

The problem occurs when you don’t have enough information in that context, and you need to pass the error information into a larger context where that information does exist. There are three typical approaches in C to handle this situation.

1. Return error information from the function or, if the return value cannot be used this way, set a global error condition flag. (Standard C provides errno and perror( ) to support this.) As mentioned before, the programmer may simply ignore the error information because tedious and obfuscating error checking must occur with each function call. In addition, returning from a function that hits an exceptional condition may not make sense.

2. Use the little-known Standard C library signal-handling system, implemented with the signal( ) function (to determine what happens when the event occurs) and raise( ) (to generate an event). Again, this has high coupling because it requires the user of any library that generates signals to understand and install the appropriate signal-handling mechanism; also in large projects the signal numbers from different libraries may clash with each other.

3. Use the nonlocal goto functions in the Standard C library: setjmp( ) and longjmp( ). With setjmp( ) you save a known good state in the program, and if you get into trouble, longjmp( ) will restore that state. Again, there is high coupling between the place where the state is stored and the place where the error occurs.

When considering error-handling schemes with C++, there’s an additional very critical problem: The C techniques of signals and setjmp/longjmp do not call destructors, so objects aren’t properly cleaned up. This makes it virtually impossible to effectively recover from an exceptional condition because you’ll always leave objects behind that haven’t been cleaned up and that can no longer be accessed. The following example demonstrates this with setjmp/longjmp:

using namespace std;

class Rainbow {

public:

Rainbow() { cout << "Rainbow()" << endl; }

~Rainbow() { cout << "~Rainbow()" << endl; }

cout << "Auntie Em! "

<< "I had the strangest dream "

<< endl;

}

} ///:~

setjmp( ) is an odd function because if you call it directly, it stores all the relevant information about the current processor state in the jmp_buf and returns zero. In that case it has the behavior of an ordinary function. However, if you call longjmp( ) using the same jmp_buf, it’s as if you’re returning from setjmp( ) again – you pop right out the back end of the setjmp( ). This time, the value returned is the second argument to longjmp( ), so you can detect that you’re actually coming back from a longjmp( ). You can imagine that with many different jmp_bufs, you could pop around to many different places in the program. The difference between a local goto (with a label) and this nonlocal goto is that you can go anywhere with setjmp/longjmp (with some restrictions not discussed here).


The problem with C++ is that longjmp( ) doesn’t respect objects; in particular it doesn’t call destructors when it jumps out of a scope.23 Destructor calls are essential, so this approach won’t work with C++.

Throwing an exception

If you encounter an exceptional situation in your code – that is, one where you don’t have enough information in the current context to decide what to do – you can send information about the error into a larger context by creating an object containing that information and “throwing” it out of your current context. This is called throwing an exception. Here’s what it looks like:

throw myerror("something bad happened");

myerror is an ordinary class whose constructor takes a char* as its argument. You can use any type when you throw (including built-in types), but often you’ll use special types created just for throwing exceptions.

The keyword throw causes a number of relatively magical things to happen. First it creates an object that isn’t there under normal program execution, and of course the constructor is called for that object. Then the object is, in effect, “returned” from the function, even though that object type isn’t normally what the function is designed to return. A simplistic way to think about exception handling is as an alternate return mechanism, although you get into trouble if you take the analogy too far – you can also exit from ordinary scopes by throwing an exception. But a value is returned, and the function or scope exits.

Any similarity to function returns ends there because where you return to is someplace completely different than for a normal function call. (You end up in an appropriate exception handler that may be miles away from where the exception was thrown.) In addition, only objects that were successfully created at the time of the exception are destroyed (unlike a normal function return that assumes all the objects in the scope must be destroyed). Of course, the exception object itself is also properly cleaned up at the appropriate point.

In addition, you can throw as many different types of objects as you want. Typically, you’ll throw a different type for each different type of error. The idea is to store the information in the object and the type of object, so someone in the bigger context can figure out what to do with your exception.

23 You may be surprised when you run the example – some C++ compilers have extended longjmp( ) to clean up objects on the stack. This is nonportable behavior.

Catching an exception

If a function throws an exception, it must assume that exception is caught and dealt with. As mentioned before, one of the advantages of C++ exception handling is that it allows you to concentrate on the problem you’re actually trying to solve in one place, and then deal with the errors from that code in another place.

The try block

If you’re inside a function and you throw an exception (or a called function throws an exception), that function will exit in the process of throwing. If you don’t want a throw to leave a function, you can set up a special block within the function where you try to solve your actual programming problem (and potentially generate exceptions). This is called the try block because you try your various function calls there. The try block is an ordinary scope, preceded by the keyword try:

checking. This means your code is a lot easier to write and easier to read because the goal of the code is not confused with the error checking.

Exception handlers

Of course, the thrown exception must end up someplace. This is the exception handler, and there’s one for every exception type you want to catch. Exception handlers immediately follow the try block and are denoted by the keyword catch:

Each catch clause (exception handler) is like a little function that takes a single argument of one particular type. The identifier (id1, id2, and so on) may be used inside the handler, just like a function argument, although sometimes there is no identifier because it’s not needed in the handler – the exception type gives you enough information to deal with it.

The handlers must appear directly after the try block. If an exception is thrown, the handling mechanism goes hunting for the first handler with an argument that matches the type of the exception. Then it enters that catch clause, and the exception is considered handled. (The search for handlers stops once the catch clause is finished.) Only the matching catch clause executes; it’s not like a switch statement where you need a break after each case to prevent the remaining ones from executing.

Notice that, within the try block, a number of different function calls might generate the same exception, but you only need one handler.

Termination vs resumption

There are two basic models in exception-handling theory. In termination (which is what C++ supports) you assume the error is so critical there’s no way to get back to where the exception occurred. Whoever threw the exception decided there was no way to salvage the situation, and they don’t want to come back.

The alternative is called resumption. It means the exception handler is expected to do something to rectify the situation, and then the faulting function is retried, presuming success the second time. If you want resumption, you still hope to continue execution after the exception is handled, so your exception is more like a function call – which is how you should set up situations in C++ where you want resumption-like behavior (that is, don’t throw an exception; call a function that fixes the problem). Alternatively, place your try block inside a while loop that keeps reentering the try block until the result is satisfactory.

Historically, programmers using operating systems that supported resumptive exception handling eventually ended up using termination-like code and skipping resumption. So although resumption sounds attractive at first, it seems it isn’t quite so useful in practice. One reason may be the distance that can occur between the exception and its handler; it’s one thing to terminate to a handler that’s far away, but to jump to that handler and then back again may be too conceptually difficult for large systems where the exception can be generated from many points.

The exception specification

You’re not required to inform the person using your function what exceptions you might throw. However, this is considered very uncivilized because it means he cannot be sure what code to write to catch all potential exceptions. Of course, if he has your source code, he can hunt through and look for throw statements, but very often a library doesn’t come with sources. C++ provides a syntax to allow you to politely tell the user what exceptions this function throws, so the user may handle them. This is the exception specification and it’s part of the function declaration, appearing after the argument list.

The exception specification reuses the keyword throw, followed by a parenthesized list of all the potential exception types. So your function declaration may look like

void f() throw(toobig, toosmall, divzero);

With exceptions, the traditional function declaration

void f();

means that any type of exception may be thrown from the function. If you say

void f() throw();

it means that no exceptions are thrown from a function.

For good coding policy, good documentation, and ease-of-use for the function caller, you should always use an exception specification when you write a function that throws

previous value of the unexpected( ) pointer so you can save it and restore it later. To use set_unexpected( ), you must include the header file <exception>. Here’s an example that shows a simple use of all the features discussed so far in the chapter:

Title	Chapter 15: Multiple Inheritance
Institution	Vietnam National University, Hanoi
Major	Computer Science
Type	Lecture Notes
Year of publication	2023
City	Hanoi

Format
Number of pages	50
File size	187.92 KB