b.push_back(new X);
for(int i = 0; i < b.size(); i++)
cout << b[i]->vf() << endl;
The term “pointer magic” has been used to describe the way virtual inheritance is implemented. You can see the physical overhead of virtual inheritance with the following program:
alignment). The results are a bit surprising (these are from one particular compiler; yours may
Both b and nonv_inheritance contain the extra pointer, as expected. But when virtual inheritance is added, it would appear that the VPTR plus two extra pointers are added! By the time the multiple inheritance is performed, the object appears to contain five extra pointers (however, one of these is probably a second VPTR for the second multiply inherited subobject).
The curious can certainly probe into your particular implementation and look at the assembly language for member selection to determine exactly what these extra bytes are for, and the cost of member selection with multiple inheritance.19 The rest of you have probably seen enough to guess that quite a bit more goes on with virtual multiple inheritance, so it should be used sparingly (or avoided) when efficiency is an issue.
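One hedged way to observe this overhead on your own compiler is to compare sizeof for a few small hierarchies. The class names below (Base, NonVirtual, Virtual, Left, Right, Diamond) and the size_report( ) helper are invented for this sketch and are not the program the text refers to; the numbers depend entirely on your compiler and platform.

```cpp
#include <sstream>
#include <string>

// Hypothetical classes, chosen only to expose layout overhead.
struct Base { int x; virtual ~Base() {} };

// Ordinary single inheritance: typically one VPTR plus the data.
struct NonVirtual : Base { int y; };

// Virtual inheritance: most compilers add bookkeeping to locate Base.
struct Virtual : virtual Base { int y; };

// A diamond: Left and Right share a single Base subobject.
struct Left : virtual Base { int l; };
struct Right : virtual Base { int r; };
struct Diamond : Left, Right { int d; };

// Builds a small report of the object sizes for this implementation.
std::string size_report() {
  std::ostringstream out;
  out << "sizeof(Base) = " << sizeof(Base) << '\n'
      << "sizeof(NonVirtual) = " << sizeof(NonVirtual) << '\n'
      << "sizeof(Virtual) = " << sizeof(Virtual) << '\n'
      << "sizeof(Diamond) = " << sizeof(Diamond) << '\n';
  return out.str();
}
```

Printing size_report( ) from main( ) shows how much each form of inheritance adds on your implementation; the exact figures are not portable.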
19 See also Jan Gray, “C++ Under the Hood”, a chapter in Black Belt C++ (edited by Bruce Eckel, M&T Press, 1995).
Upcasting
When you embed subobjects of a class inside a new class, whether you do it by creating member objects or through inheritance, each subobject is placed within the new object by the compiler. Of course, each subobject has its own this pointer, and as long as you’re dealing with member objects, everything is quite straightforward. But as soon as multiple inheritance is introduced, a funny thing occurs: An object can have more than one this pointer because the object represents more than one type during upcasting. The following example demonstrates this point:
Base1* b1 = &mi; // Upcast
Base2* b2 = &mi; // Upcast
out << "Base 1 pointer = " << b1 << endl;
out << "Base 2 pointer = " << b2 << endl;
} ///:~
The arrays of bytes inside each class are created with hexadecimal sizes, so the output addresses (which are printed in hex) are easy to read. Each class has a function that prints its this pointer, and these classes are assembled with both multiple inheritance and composition into the class MI, which prints its own address and the addresses of all the other subobjects. This function is called in main( ). You can clearly see that you get two different this pointers for the same object. The address of the MI object is taken and upcast to the two different types. Here’s the output:20
sizeof(mi) = 40 hex
mi this = 0x223e
Base1 this = 0x223e
Base2 this = 0x224e
Member1 this = 0x225e
Member2 this = 0x226e
Base 1 pointer = 0x223e
Base 2 pointer = 0x224e
20 For easy readability the code was generated for a small-model Intel processor.
Although object layouts vary from compiler to compiler and are not specified in Standard C++, this one is fairly typical. The starting address of the object corresponds to the address of the first class in the base-class list. Then the second inherited class is placed, followed by the member objects in order of declaration.
When the upcast to the Base1 and Base2 pointers occurs, you can see that, even though they’re ostensibly pointing to the same object, they must actually have different this pointers, so the proper starting address can be passed to the member functions of each subobject. The only way things can work correctly is if this implicit upcasting takes place when you call a member function for a multiply inherited subobject.
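The address adjustment described above can be checked with a reduced sketch. Base1, Base2, and MI here are stripped-down stand-ins for the classes in the listing, and base2_offset( ) and round_trip( ) are helper names invented for illustration; the actual offset is whatever your compiler chooses.

```cpp
#include <cstddef>

// Reduced stand-ins: each base just holds 0x10 bytes of data.
struct Base1 { char c[0x10]; };
struct Base2 { char c[0x10]; };
struct MI : Base1, Base2 {};

// Byte distance from the start of the MI object to its Base2 subobject.
std::ptrdiff_t base2_offset(const MI& mi) {
  const Base2* b2 = &mi;  // Implicit upcast; the compiler adjusts the address
  return reinterpret_cast<const char*>(b2) -
         reinterpret_cast<const char*>(&mi);
}

// A static_cast back down to MI* undoes the adjustment.
bool round_trip(MI& mi) {
  Base2* b2 = &mi;
  return static_cast<MI*>(b2) == &mi;
}
```

Comparing b2 with &mi directly still yields true, because the comparison first converts &mi to a Base2*; only the raw addresses differ.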
Persistence
Normally this isn’t a problem, because you want to call member functions that are concerned with that subobject of the multiply inherited object. However, if your member function needs to know the true starting address of the object, multiple inheritance causes problems. Ironically, this happens in one of the situations where multiple inheritance seems to be useful: persistence.
The lifetime of a local object is the scope in which it is defined. The lifetime of a global object is the lifetime of the program. A persistent object lives between invocations of a program: You can normally think of it as existing on disk instead of in memory. One definition of an object-oriented database is “a collection of persistent objects.”
To implement persistence, you must move a persistent object from disk into memory in order to call functions for it, and later store it to disk before the program expires. Four issues arise when storing an object on disk:
1. The object must be converted from its representation in memory to a series of bytes on disk.
2. Because the values of any pointers in memory won’t have meaning the next time the program is invoked, these pointers must be converted to something meaningful.
3. What the pointers point to must also be stored and retrieved.
4. When restoring an object from disk, the virtual pointers in the object must be respected.
Because the object must be converted back and forth between a layout in memory and a serial representation on disk, the process is called serialization (to write an object to disk) and deserialization (to restore an object from disk). Although it would be very convenient, these processes require too much overhead to support directly in the language. Class libraries will often build in support for serialization and deserialization by adding special member functions and placing requirements on new classes. (Usually some sort of serialize( ) function must be written for each new class.) Also, persistence is generally not automatic; you must usually explicitly write and read the objects.
MI-based persistence
Consider sidestepping the pointer issues for now and creating a class that installs persistence into simple objects using multiple inheritance. By inheriting the persistence class along with your new class, you automatically create classes that can be read from and written to disk. Although this sounds great, the use of multiple inheritance introduces a pitfall, as seen in the following example:
class WData1 : public Persistent, public Data {
ofstream f1("f1.dat"), f2("f2.dat");
assure(f1, "f1.dat"); assure(f2, "f2.dat");
WData1 d1(1.1, 2.2, 3.3);
WData2 d2(4.4, 5.5, 6.6);
d1.print("d1 before storage");
d2.print("d2 before storage");
d1.write(f1);
d2.write(f2);
} // Closes files
ifstream f1("f1.dat"), f2("f2.dat");
assure(f1, "f1.dat"); assure(f2, "f2.dat");
WData1 d1;
WData2 d2;
d1.read(f1);
d2.read(f2);
d1.print("d1 after storage");
d2.print("d2 after storage");
} ///:~
In this very simple version, the Persistent::read( ) and Persistent::write( ) functions take the this pointer and call iostream read( ) and write( ) functions. (Note that any type of iostream can be used.) A more sophisticated Persistent class would call a virtual write( ) function for each subobject.
With the language features covered so far in the book, the number of bytes in the object cannot be known by the Persistent class, so it is inserted as a constructor argument. (In Chapter XX, run-time type identification shows how you can find the exact type of an object given only a base pointer; once you have the exact type you can find out the correct size with the sizeof operator.)
The Data class contains no pointers or VPTR, so there is no danger in simply writing it to disk and reading it back again. And it works fine in class WData1 when, in main( ), it’s written to file f1.dat and later read back again. However, when Persistent is second in the inheritance list of WData2, the this pointer for Persistent is offset to the end of the object, so it reads and writes past the end of the object. This not only produces garbage when reading the object from the file, it’s dangerous because it walks over any storage that occurs after the object.
This problem occurs in multiple inheritance any time a class must produce the this pointer for the actual object from a subobject’s this pointer. Of course, if you know your compiler always lays out objects in order of declaration in the inheritance list, you can ensure that you always put the critical class at the beginning of the list (assuming there’s only one critical class). However, such a class may exist in the inheritance hierarchy of another class and you may unwittingly put it in the wrong place during multiple inheritance. Fortunately, using run-time type identification (the subject of Chapter XX) will produce the proper pointer to the actual object, even if multiple inheritance is used.
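As a minimal sketch of that last point, dynamic_cast to void* yields the address of the complete object from any polymorphic subobject pointer. This assumes Persistent is given at least one virtual function (which dynamic_cast requires, and which the simple version above does not have); the object_start( ) member is a name invented for this illustration.

```cpp
class Data {
public:
  float f[3];
  Data() { f[0] = f[1] = f[2] = 0; }
};

// Assumption: Persistent is polymorphic, as dynamic_cast requires.
class Persistent {
public:
  virtual ~Persistent() {}
  // Start of the complete object, wherever this subobject sits in it.
  const void* object_start() const {
    return dynamic_cast<const void*>(this);
  }
};

// Persistent is deliberately second in the inheritance list.
class WData2 : public Data, public Persistent {};
```

Whether the Persistent subobject really sits at a nonzero offset is up to the compiler; the point of the sketch is that object_start( ) gives the right answer either way.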
Improved persistence
A more practical approach to persistence, and one you will see employed more often, is to create virtual functions in the base class for reading and writing and then require the creator of any new class that must be streamed to redefine these functions. The argument to the function is the stream object to write to or read from.21 Then the creator of the class, who knows best how the new parts should be read or written, is responsible for making the correct function calls. This doesn’t have the “magical” quality of the previous example, and it requires more coding and knowledge on the part of the user, but it works and doesn’t break when pointers are present:
21 Sometimes there’s only a single function for streaming, and the argument contains information about whether you’re reading or writing.
virtual void read(istream& in) = 0;
void print(const char* msg = "") const {
if(*msg) cout << msg << endl;
void read(istream& in) {
out << i << " "; // Store size of string
out << name << endl;
d1.write(out);
d2.write(out);
out << f[0] << " " << f[1] << " " << f[2];
}
// Must read in same order as write:
void read(istream& in) {
delete []name; // Remove old storage
int i;
in >> i >> ws; // Get int, strip whitespace
name = new char[i];
that it was created elsewhere and may be part of another class hierarchy so you don’t have control over its inheritance. However, for this scheme to work correctly you must have access to the underlying implementation so it can be stored; thus the use of protected.
The classes WData1 and WData2 use familiar iostream inserters and extractors to store and retrieve the protected data in Data to and from the iostream object. In write( ), you can see that spaces are added after each floating point number is written; these are necessary to allow parsing of the data on input.
The class Conglomerate not only inherits from Data, it also has member objects of type WData1 and WData2, as well as a pointer to a character string. In addition, all the classes that inherit from Persistent also contain a VPTR, so this example shows the kind of problem you’ll actually encounter when using persistence.
When you create write( ) and read( ) function pairs, the read( ) must exactly mirror what happens during the write( ), so read( ) pulls the bits off the disk the same way they were placed there by write( ). Here, the first problem that’s tackled is the char*, which points to a string of any length. The size of the string is calculated and stored on disk as an int (followed by a space to enable parsing) to allow the read( ) function to allocate the correct amount of storage.
When you have subobjects that have read( ) and write( ) member functions, all you need to do is call those functions in the new read( ) and write( ) functions. This is followed by direct storage of the members in the base class.
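The mirroring rule can be sketched in a few lines. This is not the Persist2.cpp listing: Record and round_trip( ) are hypothetical names, std::string replaces the raw char*, and a stringstream stands in for the disk file.

```cpp
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>

class Persistent {
public:
  virtual void write(std::ostream& out) const = 0;
  virtual void read(std::istream& in) = 0;
  virtual ~Persistent() {}
};

// Hypothetical class with a variable-length string and fixed data.
class Record : public Persistent {
  std::string name_;
  float f_[3];
public:
  Record(const std::string& name = "", float a = 0, float b = 0, float c = 0)
    : name_(name) { f_[0] = a; f_[1] = b; f_[2] = c; }
  const std::string& name() const { return name_; }
  float value(int i) const { return f_[i]; }

  void write(std::ostream& out) const {
    out << name_.size() << ' ';  // Store size of string first
    out << name_ << '\n';
    out << f_[0] << ' ' << f_[1] << ' ' << f_[2] << ' ';
  }
  // Must read in same order as write:
  void read(std::istream& in) {
    std::size_t n = 0;
    in >> n;
    in.get();                    // Skip the single separating space
    name_.assign(n, '\0');       // Allocate exactly n characters
    if (n > 0) in.read(&name_[0], static_cast<std::streamsize>(n));
    in >> f_[0] >> f_[1] >> f_[2];
  }
};

// Writes a Record to an in-memory stream and restores a copy from it.
Record round_trip(const Record& original) {
  std::stringstream disk;        // Stands in for a file
  original.write(disk);
  Record restored;
  restored.read(disk);
  return restored;
}
```

Because the length is stored ahead of the characters, the name may contain spaces without confusing read( ).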
People have gone to great lengths to automate persistence, for example, by creating modified preprocessors to support a “persistent” keyword to be applied when defining a class. One can imagine a more elegant approach than the one shown here for implementing persistence, but the approach shown here has the advantage that it works under all implementations of C++, doesn’t require special language extensions, and is relatively bulletproof.
Avoiding MI
The need for multiple inheritance in Persist2.cpp is contrived, based on the concept that you don’t have control of some of the code in the project. Upon examination of the example, you can see that MI can be easily avoided by using member objects of type Data, and putting the virtual read( ) and write( ) members inside Data or WData1 and WData2 rather than in a separate class. There are many situations like this one where multiple inheritance may be avoided; the language feature is included for unusual, special-case situations that would otherwise be difficult or impossible to handle. But when the question of whether to use multiple inheritance comes up, you should ask two questions:
1. Do I need to show the public interfaces of both these classes, or could one class be embedded with some of its interface produced with member functions in the new class?
2. Do I need to upcast to both of the base classes? (This applies when you have more than two base classes, of course.)
If you can answer “no” to either question, you can avoid using MI and should probably do so.
One situation to watch for is when one class only needs to be upcast as a function argument. In that case, the class can be embedded and an automatic type conversion operator provided in your new class to produce a reference to the embedded object. Any time you use an object of your new class as an argument to a function that expects the embedded object, the type conversion operator is used. However, type conversion can’t be used for normal member selection; that requires inheritance.
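A small sketch of the embed-and-convert alternative follows. Engine, Car, and rated_power( ) are hypothetical names chosen for the illustration.

```cpp
// Hypothetical class that some existing function expects as an argument.
class Engine {
  int hp_;
public:
  explicit Engine(int hp) : hp_(hp) {}
  int horsepower() const { return hp_; }
};

// Existing function that takes the embedded type by reference.
int rated_power(const Engine& e) { return e.horsepower(); }

// Car embeds an Engine instead of inheriting from it.
class Car {
  Engine engine_;
public:
  explicit Car(int hp) : engine_(hp) {}
  // Automatic type conversion produces a reference to the embedded object.
  operator const Engine&() const { return engine_; }
};
```

With this in place rated_power(c) compiles for a Car c, but c.horsepower( ) does not, which is the member-selection limitation mentioned above.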
Mixin types
Rodents & pets(play)
polymorphically
Later in the development of the project or sometime during its maintenance, you discover that the base-class interface provided by the vendor is incomplete: A function may be nonvirtual and you need it to be virtual, or a virtual function is completely missing in the interface, but essential to the solution of your problem. If you had the source code, you could go back and put it in. But you don’t, and you have a lot of existing code that depends on the original interface. Here, multiple inheritance is the perfect solution.
For example, here’s the header file for a library you acquire:
//: C06:Vendor.h
// Vendor-supplied class header
// You only get this & the compiled Vendor.obj
void A(const Vendor&);
void B(const Vendor&);
// Etc
#endif // VENDOR_H ///:~
Assume the library is much bigger, with more derived classes and a larger interface. Notice that it also includes the functions A( ) and B( ), which take a base-class reference and treat it polymorphically. Here’s the implementation file for the library:
extern ofstream out; // For trace info
void Vendor::v() const {
In your project, this source code is unavailable to you. Instead, you get a compiled file as Vendor.obj or Vendor.lib (or the equivalent for your system).
The problem occurs in the use of this library. First, the destructor isn’t virtual. This is actually a design error on the part of the library creator. In addition, f( ) was not made virtual; assume the library creator decided it wouldn’t need to be. And you discover that the interface to the base class is missing a function essential to the solution of your problem. Also suppose you’ve already written a fair amount of code using the existing interface (not to mention the functions A( ) and B( ), which are out of your control), and you don’t want to change it.
To repair the problem, create your own class interface and multiply inherit a new set of
derived classes from your interface and from the existing classes:
virtual void v() const = 0;
virtual void f() const = 0;
// New interface function:
virtual void g() const = 0;
virtual ~MyBase() { out << "~MyBase()\n"; }
out << "calling A(p1p)\n";
A(p1p); // Same old behavior
out << "calling B(p1p)\n";
B(p1p); // Same old behavior
out << "delete mp\n";
// Deleting a reference to a heap object:
delete &mp; // Right behavior
} ///:~
In MyBase (which does not use MI), both f( ) and the destructor are now virtual, and a new virtual function g( ) has been added to the interface. Now each of the derived classes in the original library must be recreated, mixing in the new interface with MI. The functions Paste1::v( ) and Paste1::f( ) need to call only the original base-class versions of their functions. But now, if you upcast to MyBase as in main( )
The original library functions A( ) and B( ) still work the same (assuming the new v( ) calls its base-class version). The destructor is now virtual and exhibits the correct behavior.
Although this is a messy example, it does occur in practice and it’s a good demonstration of where multiple inheritance is clearly necessary: You must be able to upcast to both base classes.
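The repair can be condensed into a self-contained sketch. The Vendor classes here are simplified stand-ins (in the real scenario you would only have the header and the compiled library), and the functions return strings instead of writing trace output, purely so the behavior is easy to check.

```cpp
#include <string>

// Simplified stand-in for the vendor library (assumed, not the original).
class Vendor {
public:
  virtual std::string v() const { return "Vendor::v"; }
  std::string f() const { return "Vendor::f"; }  // Not virtual
  ~Vendor() {}                                   // Not virtual
};

class Vendor1 : public Vendor {
public:
  std::string v() const { return "Vendor1::v"; }
  std::string f() const { return "Vendor1::f"; }
};

// Library function that treats its argument polymorphically.
std::string A(const Vendor& x) { return x.v(); }

// Your own interface: f() and the destructor are virtual, g() is new.
class MyBase {
public:
  virtual std::string v() const = 0;
  virtual std::string f() const = 0;
  virtual std::string g() const = 0;
  virtual ~MyBase() {}
};

// Mix the new interface into the existing class with MI.
class Paste1 : public MyBase, public Vendor1 {
public:
  std::string v() const { return Vendor1::v(); }
  std::string f() const { return Vendor1::f(); }
  std::string g() const { return "Paste1::g"; }
};
```

A Paste1 can be upcast to MyBase to get the repaired interface, and still passed to A( ) as a Vendor, which is the double upcast the text identifies as the justification for MI.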
Summary
The reason MI exists in C++ and not in other OOP languages is that C++ is a hybrid language and couldn’t enforce a single monolithic class hierarchy the way Smalltalk does. Instead, C++ allows many inheritance trees to be formed, so sometimes you may need to combine the interfaces from two or more trees into a new class.
If no “diamonds” appear in your class hierarchy, MI is fairly simple (although identical function signatures in base classes must be resolved). If a diamond appears, then you must deal with the problems of duplicate subobjects by introducing virtual base classes. This not only adds confusion, but the underlying representation becomes more complex and less efficient.
Multiple inheritance has been called the “goto of the 90’s”.22 This seems appropriate because, like a goto, MI is best avoided in normal programming, but can occasionally be very useful. It’s a “minor” but more advanced feature of C++, designed to solve problems that arise in special situations. If you find yourself using it often, you may want to take a look at your reasoning. A good Occam’s Razor is to ask, “Must I upcast to all of the base classes?” If not, your life will be easier if you embed instances of all the classes you don’t need to upcast to.
22 A phrase coined by Zack Urlocker.
Exercises
1. These exercises will take you step-by-step through the traps of MI. Create a base class X with a single constructor that takes an int argument and a member function f( ) that takes no arguments and returns void. Now inherit X into Y and Z, creating a constructor for each of them that takes a single int argument. Now multiply inherit Y and Z into A. Create an object of class A, and call f( ) for that object. Fix the problem with explicit disambiguation.
2. Starting with the results of exercise 1, create a pointer to an X called px, and assign to it the address of the object of type A you created before. Fix the problem using a virtual base class. Now fix X so you no longer have to call the constructor for X inside A.
3. Starting with the results of exercise 2, remove the explicit disambiguation for f( ), and see if you can call f( ) through px. Trace it to see which function gets called. Fix the problem so the correct function will be called in a class hierarchy.
7: Exception handling
Improved error recovery is one of the most powerful ways you can increase the robustness of your code.
Unfortunately, it’s almost accepted practice to ignore error conditions, as if we’re in a state of denial about errors. Some of the reason is no doubt the tediousness and code bloat of checking for many errors. For example, printf( ) returns the number of characters that were successfully printed, but virtually no one checks this value. The proliferation of code alone would be disgusting, not to mention the difficulty it would add in reading the code.
The problem with C’s approach to error handling could be thought of as one of coupling – the user of a function must tie the error-handling code so closely to that function that it becomes too ungainly and awkward to use.
One of the major features in C++ is exception handling, which is a better way of thinking about and handling errors. With exception handling,
1. Error-handling code is not nearly so tedious to write, and it doesn’t become mixed up with your “normal” code. You write the code you want to happen; later in a separate section you write the code to cope with the problems. If you make multiple calls to a function, you handle the errors from that function once, in one place.
2. Errors cannot be ignored. If a function needs to send an error message to the caller of that function, it “throws” an object representing that error out of the function. If the caller doesn’t “catch” the error and handle it, it goes to the next enclosing scope, and so on until someone catches the error.
This chapter examines C’s approach to error handling (such as it is), why it did not work very well for C, and why it won’t work at all for C++. Then you’ll learn about try, throw, and catch, the C++ keywords that support exception handling.
Error handling in C
In most of the examples in this book, assert( ) was used as it was intended: for debugging during development with code that could be disabled with #define NDEBUG for the shipping product. Runtime error checking uses the require.h functions developed in Chapter XX. These were a convenient way to say, “There’s a problem here you’ll probably want to handle with some more sophisticated code, but you don’t need to be distracted by it in this example.” The require.h functions may be enough for small programs, but for complicated products you may need to write more sophisticated error-handling code.
Error handling is quite straightforward in situations where you check some condition and you know exactly what to do because you have all the necessary information in that context. Of course, you just handle the error at that point. These are ordinary errors and not the subject of this chapter.
The problem occurs when you don’t have enough information in that context, and you need to pass the error information into a larger context where that information does exist. There are three typical approaches in C to handle this situation:
1. Return error information from the function or, if the return value cannot be used this way, set a global error condition flag. (Standard C provides errno and perror( ) to support this.) As mentioned before, the programmer may simply ignore the error information because tedious and obfuscating error checking must occur with each function call. In addition, returning from a function that hits an exceptional condition may not make sense.
2. Use the little-known Standard C library signal-handling system, implemented with the signal( ) function (to determine what happens when the event occurs) and raise( ) (to generate an event). Again, this has high coupling because it requires the user of any library that generates signals to understand and install the appropriate signal-handling mechanism; also in large projects the signal numbers from different libraries may clash with each other.
3. Use the nonlocal goto functions in the Standard C library: setjmp( ) and longjmp( ). With setjmp( ) you save a known good state in the program, and if you get into trouble, longjmp( ) will restore that state. Again, there is high coupling between the place where the state is stored and the place where the error occurs.
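To make the first approach concrete, here is a small sketch of the errno style using the Standard C library function strtol( ). The wrapper name parse_long( ) is invented for this illustration; notice how much of the function is bookkeeping around the one call that does the work.

```cpp
#include <cerrno>
#include <cstdlib>

// Converts text to a long. Returns false on failure and leaves result alone.
bool parse_long(const char* text, long& result) {
  errno = 0;                        // The global flag must be cleared first
  char* end = nullptr;
  long value = std::strtol(text, &end, 10);
  if (errno == ERANGE) return false;              // Overflow, reported via errno
  if (end == text || *end != '\0') return false;  // No digits or trailing junk
  result = value;
  return true;
}
```

Every caller must remember to test the returned bool as well, and nothing in the language forces it to.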
When considering error-handling schemes with C++, there’s an additional very critical problem: The C techniques of signals and setjmp/longjmp do not call destructors, so objects aren’t properly cleaned up. This makes it virtually impossible to effectively recover from an exceptional condition because you’ll always leave objects behind that haven’t been cleaned up and that can no longer be accessed. The following example demonstrates this with setjmp/longjmp:
using namespace std;
class Rainbow {
public:
Rainbow() { cout << "Rainbow()" << endl; }
~Rainbow() { cout << "~Rainbow()" << endl; }
cout << "Auntie Em! "
<< "I had the strangest dream "
<< endl;
}
} ///:~
setjmp( ) is an odd function because if you call it directly, it stores all the relevant information about the current processor state in the jmp_buf and returns zero. In that case it has the behavior of an ordinary function. However, if you call longjmp( ) using the same jmp_buf, it’s as if you’re returning from setjmp( ) again – you pop right out the back end of the setjmp( ). This time, the value returned is the second argument to longjmp( ), so you can detect that you’re actually coming back from a longjmp( ). You can imagine that with many different jmp_bufs, you could pop around to many different places in the program. The difference between a local goto (with a label) and this nonlocal goto is that you can go anywhere with setjmp/longjmp (with some restrictions not discussed here).
The problem with C++ is that longjmp( ) doesn’t respect objects; in particular it doesn’t call destructors when it jumps out of a scope.23 Destructor calls are essential, so this approach won’t work with C++.
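The two-way return of setjmp( ) can be seen in a sketch that deliberately contains no objects with destructors, so the jump itself is well defined. The names checkpoint, fail_deep_inside( ), and try_risky_step( ) are invented for illustration.

```cpp
#include <csetjmp>

static std::jmp_buf checkpoint;  // Saved state (hypothetical name)

// Stands in for code far down the call chain that gets into trouble.
void fail_deep_inside(int code) {
  std::longjmp(checkpoint, code);  // Never returns to its caller
}

int try_risky_step() {
  switch (setjmp(checkpoint)) {
    case 0:                  // Direct call: state saved, carry on
      fail_deep_inside(47);
      return -1;             // Never reached
    case 47:                 // Arrived here via longjmp(checkpoint, 47)
      return 47;
    default:
      return -2;             // Some other longjmp value
  }
}
```

The value 47 travels from the longjmp( ) call to the second “return” of setjmp( ); if any local object with a destructor had been alive in between, it would simply have been skipped.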
Throwing an exception
If you encounter an exceptional situation in your code – that is, one where you don’t have enough information in the current context to decide what to do – you can send information about the error into a larger context by creating an object containing that information and “throwing” it out of your current context. This is called throwing an exception. Here’s what it looks like:
throw myerror("something bad happened");
myerror is an ordinary class, whose constructor takes a char* as its argument. You can use any type when you throw (including built-in types), but often you’ll use special types created just for throwing exceptions.
The keyword throw causes a number of relatively magical things to happen. First it creates an object that isn’t there under normal program execution, and of course the constructor is called for that object. Then the object is, in effect, “returned” from the function, even though that object type isn’t normally what the function is designed to return. A simplistic way to think about exception handling is as an alternate return mechanism, although you get into trouble if you take the analogy too far – you can also exit from ordinary scopes by throwing an exception. But a value is returned, and the function or scope exits.
Any similarity to function returns ends there because where you return to is someplace completely different than for a normal function call. (You end up in an appropriate exception handler that may be miles away from where the exception was thrown.) In addition, only objects that were successfully created at the time of the exception are destroyed (unlike a normal function return that assumes all the objects in the scope must be destroyed). Of course, the exception object itself is also properly cleaned up at the appropriate point.
In addition, you can throw as many different types of objects as you want. Typically, you’ll throw a different type for each different type of error. The idea is to store the information in the object and the type of object, so someone in the bigger context can figure out what to do with your exception.
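The cleanup behavior just described can be traced with a short sketch. MyError, Tracer, events, risky( ), and run( ) are all names invented for this illustration, and the try and catch keywords it needs are explained in the sections that follow.

```cpp
#include <string>
#include <vector>

// A special type created just for throwing.
struct MyError {
  std::string what;
  explicit MyError(const std::string& w) : what(w) {}
};

std::vector<std::string> events;  // Trace of constructor/destructor calls

struct Tracer {
  std::string name;
  explicit Tracer(const std::string& n) : name(n) {
    events.push_back("ctor " + name);
  }
  ~Tracer() { events.push_back("dtor " + name); }
};

void risky(bool fail) {
  Tracer a("a");
  if (fail) throw MyError("something bad happened");
  Tracer b("b");  // Never constructed when fail is true
}

std::string run(bool fail) {
  try {
    risky(fail);
  } catch (const MyError& e) {
    return e.what;  // Information carried out by the exception object
  }
  return "no error";
}
```

After run(true), the trace holds only the constructor and destructor of a: the object that existed when the exception was thrown is destroyed, and b, which was never created, is not.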
23 You may be surprised when you run the example – some C++ compilers have extended longjmp( ) to clean up objects on the stack. This is nonportable behavior.
Catching an exception
If a function throws an exception, it must assume that exception is caught and dealt with. As mentioned before, one of the advantages of C++ exception handling is that it allows you to concentrate on the problem you’re actually trying to solve in one place, and then deal with the errors from that code in another place.
The try block
If you’re inside a function and you throw an exception (or a called function throws an exception), that function will exit in the process of throwing. If you don’t want a throw to leave a function, you can set up a special block within the function where you try to solve your actual programming problem (and potentially generate exceptions). This is called the try block because you try your various function calls there. The try block is an ordinary scope, preceded by the keyword try:
checking. This means your code is a lot easier to write and easier to read because the goal of the code is not confused with the error checking.
Exception handlers
Of course, the thrown exception must end up someplace. This is the exception handler, and there’s one for every exception type you want to catch. Exception handlers immediately follow the try block and are denoted by the keyword catch:
Each catch clause (exception handler) is like a little function that takes a single argument of one particular type. The identifier (id1, id2, and so on) may be used inside the handler, just like a function argument, although sometimes there is no identifier because it’s not needed in the handler – the exception type gives you enough information to deal with it.
The handlers must appear directly after the try block. If an exception is thrown, the exception-handling mechanism goes hunting for the first handler with an argument that matches the type of the exception. Then it enters that catch clause, and the exception is considered handled. (The search for handlers stops once the catch clause is finished.) Only the matching catch clause executes; it’s not like a switch statement where you need a break after each case to prevent the remaining ones from executing.
Notice that, within the try block, a number of different function calls might generate the same exception, but you only need one handler.
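A brief sketch of handler selection: TooBig, TooSmall, check( ), and classify( ) are hypothetical names, and the limits are arbitrary. Two calls inside the try block can raise the same exception types, yet each type needs only one handler.

```cpp
#include <string>

struct TooBig {};
struct TooSmall {};

// Throws a different type for each kind of error.
void check(int x) {
  if (x > 100) throw TooBig();
  if (x < 0) throw TooSmall();
}

std::string classify(int a, int b) {
  try {
    check(a);  // Either call may throw either exception
    check(b);
  } catch (const TooBig&) {    // No identifier: the type says enough
    return "too big";
  } catch (const TooSmall&) {
    return "too small";
  }
  return "ok";
}
```

Only the handler whose parameter type matches is entered; the other is skipped without any need for a break.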
Termination vs resumption
There are two basic models in exception-handling theory. In termination (which is what C++ supports) you assume the error is so critical there’s no way to get back to where the exception occurred. Whoever threw the exception decided there was no way to salvage the situation, and they don’t want to come back.
The alternative is called resumption. It means the exception handler is expected to do something to rectify the situation, and then the faulting function is retried, presuming success the second time. If you want resumption, you still hope to continue execution after the exception is handled, so your exception is more like a function call – which is how you should set up situations in C++ where you want resumption-like behavior (that is, don’t throw an exception; call a function that fixes the problem). Alternatively, place your try block inside a while loop that keeps reentering the try block until the result is satisfactory.
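The while-loop alternative might look like the following sketch. NotReady, fetch( ), and fetch_with_retry( ) are invented names, and the attempt limit is an addition of this sketch to keep the loop from running forever.

```cpp
struct NotReady {};

// Hypothetical operation that fails a given number of times, then succeeds.
int fetch(int& failures_left) {
  if (failures_left > 0) {
    --failures_left;
    throw NotReady();
  }
  return 42;
}

// Reenters the try block until fetch() succeeds or attempts run out.
int fetch_with_retry(int failures, int max_attempts, int& attempts_used) {
  attempts_used = 0;
  while (attempts_used < max_attempts) {
    ++attempts_used;
    try {
      return fetch(failures);
    } catch (const NotReady&) {
      // A real handler would repair the cause here before looping.
    }
  }
  return -1;  // Gave up
}
```

Each pass is still termination in the C++ sense: the failed call is abandoned and a fresh call is made, rather than execution resuming at the point of the throw.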
Historically, programmers using operating systems that supported resumptive exception handling eventually ended up using termination-like code and skipping resumption. So although resumption sounds attractive at first, it seems it isn’t quite so useful in practice. One reason may be the distance that can occur between the exception and its handler; it’s one thing to terminate to a handler that’s far away, but to jump to that handler and then back again may be too conceptually difficult for large systems where the exception can be generated from many points.
The exception specification
You’re not required to inform the person using your function what exceptions you might throw. However, this is considered very uncivilized because it means he cannot be sure what code to write to catch all potential exceptions. Of course, if he has your source code, he can hunt through and look for throw statements, but very often a library doesn’t come with sources. C++ provides a syntax to allow you to politely tell the user what exceptions this function throws, so the user may handle them. This is the exception specification and it’s part of the function declaration, appearing after the argument list.
The exception specification reuses the keyword throw, followed by a parenthesized list of all the potential exception types. So your function declaration may look like
void f() throw(toobig, toosmall, divzero);
With exceptions, the traditional function declaration
void f();
means that any type of exception may be thrown from the function If you say
void f() throw();
it means that no exceptions are thrown from the function.
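A note for readers compiling with a current compiler: the throw(...) form of exception specification described here was deprecated in C++11 and removed from the language in C++17, and throw( ) itself was removed in C++20. Assuming a C++11 or later compiler, the surviving way to state “no exceptions” is noexcept, as in this sketch (f_any( ), f_none( ), and the two query functions are illustrative names).

```cpp
struct TooBig {};

// No specification: any type of exception may be thrown.
void f_any(int x) {
  if (x > 100) throw TooBig();
}

// noexcept plays the role of an empty throw() specification.
void f_none(int) noexcept {}

// The noexcept operator asks, at compile time, what a call promises.
bool promises_nothing_thrown_any()  { return noexcept(f_any(1)); }
bool promises_nothing_thrown_none() { return noexcept(f_none(1)); }
```

The list form, naming each exception type, has no direct modern equivalent; documenting the thrown types in comments is the usual substitute.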
For good coding policy, good documentation, and ease-of-use for the function caller, you should always use an exception specification when you write a function that throws
previous value of the unexpected( ) pointer so you can save it and restore it later. To use set_unexpected( ), you must include the header file <exception>. Here’s an example that shows a simple use of all the features discussed so far in the chapter: