ostream_iterator<string>(cout, "\n"));
} ///:~
This example was suggested by Nathan Myers, who invented the istreambuf_iterator and its relatives. This iterator extracts information character-by-character from a stream. Although the istreambuf_iterator template argument might suggest to you that you could extract, for example, ints instead of char, that’s not the case. The argument must be of some character type: a regular char or a wide character.
After the file is open, an istreambuf_iterator called p is attached to the istream so characters can be extracted from it. The set<string> called wordlist will be used to hold the resulting words.
The while loop reads words until the end of the input stream is found. This is detected using the default constructor for istreambuf_iterator, which produces the past-the-end iterator object end. Thus, if you want to test to make sure you’re not at the end of the stream, you simply say p != end.
The second type of iterator that’s used here is the insert_iterator, which creates an iterator that knows how to insert objects into a container. Here, the “container” is the string called word which, for the purposes of insert_iterator, behaves like a container. The constructor for insert_iterator requires the container and an iterator indicating where it should start inserting the characters. You could also use a back_insert_iterator, which requires that the container have a push_back( ) (string does).
After the while loop sets everything up, it begins by looking for the first alpha character, incrementing p until that character is found. Then it copies characters from one iterator to the other, stopping when a non-alpha character is found. Each word, assuming it is non-empty, is added to wordlist.
StreamTokenizer: a more flexible solution
The above program parses its input into strings of words containing only alpha characters, but that’s still a special case compared to the generality of strtok( ). What we’d like now is an actual replacement for strtok( ) so we’re never tempted to use it. WordList2.cpp can be modified to create a class called StreamTokenizer that delivers a new token as a string whenever you call next( ), according to the delimiters you give it upon construction (very similar to strtok( )):
The default delimiters for the StreamTokenizer constructor extract words with only alpha characters, as before, but now you can choose different delimiters to parse different tokens.
The implementation of next( ) looks similar to Wordlist2.cpp:
The first non-delimiter is found, then characters are copied until a delimiter is found, and the resulting string is returned. Here’s a test:
//: C04:TokenizeTest.cpp
//{L} StreamTokenizer
Now the tool is more reusable than before, but it’s still inflexible, because it can only work with an istream. This isn’t as bad as it first seems, since a string can be turned into an istream via an istringstream. But in the next section we’ll come up with the most general, reusable tokenizing tool, and this should give you a feeling of what “reusable” really means, and the effort necessary to create truly reusable code.
A completely reusable tokenizer
Since the STL containers and algorithms all revolve around iterators, the most flexible solution will itself be an iterator. You could think of the TokenIterator as an iterator that wraps itself around any other iterator that can produce characters. Because it is designed as an input iterator (the most primitive type of iterator) it can be used with any STL algorithm that requires only an input iterator. Not only is it a useful tool in itself, the TokenIterator is also a good example of how you can design your own iterators.
The TokenIterator is doubly flexible: first, you can choose the type of iterator that will produce the char input. Second, instead of just saying what characters represent the delimiters, TokenIterator will use a predicate, which is a function object whose operator( ) takes a char and decides if it should be in the token or not. Although the two examples given here have a static concept of what characters belong in a token, you could easily design your own function object to change its state as the characters are read, producing a more
template <class InputIter, class Pred = Isalpha>
class TokenIterator: public std::iterator<
TokenIterator(InputIter begin, InputIter end,
Pred pred = Pred())
: first(begin), last(end), predicate(pred) {
first = std::find_if(first, last, predicate);
while (first != last && predicate(*first))
Proxy(const std::string& w) : word(w) {}
std::string operator*() { return word; }
// Produce the actual value:
std::string operator*() const { return word; }
std::string* operator->() const {
return &(operator*());
}
// Compare iterators:
bool operator==(const TokenIterator&) {
return word.size() == 0 && first == last;
TokenIterator is inherited from the std::iterator template. It might appear that there’s some kind of functionality that comes with std::iterator, but it is purely a way of tagging an iterator so that a container that uses it knows what it’s capable of. Here, you can see input_iterator_tag as a template argument; this tells anyone who asks that a TokenIterator only has the capabilities of an input iterator, and cannot be used with algorithms requiring more sophisticated iterators. Apart from the tagging, std::iterator doesn’t do anything else, which means you must design all the other functionality in yourself.
TokenIterator may look a little strange at first, because the first constructor requires both a “begin” and “end” iterator as arguments, along with the predicate. Remember that this is a “wrapper” iterator that has no idea of how to tell whether it’s at the end of its input source, so the ending iterator is necessary in the first constructor. The reason for the second (default) constructor is that the STL algorithms (and any algorithms you write) need a TokenIterator sentinel to be the past-the-end value. Since all the information necessary to see if the TokenIterator has reached the end of its input is collected in the first constructor, this second constructor creates a TokenIterator that is merely used as a placeholder in algorithms.
The core of the behavior happens in operator++. This erases the current value of word using string::resize( ), then finds the first character that satisfies the predicate (thus discovering the beginning of the new token) using find_if( ) (from the STL algorithms, discussed in the following chapter). The resulting iterator is assigned to first, thus moving first forward to the beginning of the token. Then, as long as the end of the input is not reached and the predicate is satisfied, characters are copied into the word from the input. Finally, the TokenIterator object is returned, and must be dereferenced to access the new token.
The postfix increment requires a proxy object to hold the value before the increment, so it can be returned (see the operator overloading chapter for more details of this). Producing the actual value is a straightforward operator*. The only other functions that must be defined for an input iterator are the operator== and operator!= to indicate whether the TokenIterator has reached the end of its input. You can see that the argument for operator== is ignored; it only cares about whether it has reached its internal last iterator. Notice that operator!= is defined in terms of operator==.
A good test of TokenIterator includes a number of different sources of input characters, including an istreambuf_iterator, a char*, and a deque<char>::iterator. Finally, the original Wordlist.cpp problem is solved:
IsbIt begin(in), isbEnd;
copy(charIter, end2, back_inserter(wordlist2));
copy(wordlist2.begin(), wordlist2.end(), out);
copy(dcIter, end3, back_inserter(wordlist3));
copy(wordlist3.begin(), wordlist3.end(), out);
When using an istreambuf_iterator, you create one to attach to the istream object, and one with the default constructor as the past-the-end marker. Both of these are used to create the TokenIterator that will actually produce the tokens; the default constructor produces the faux TokenIterator past-the-end sentinel (this is just a placeholder, and as mentioned previously is actually ignored). The TokenIterator produces strings that are inserted into a container which must, naturally, be a container of string; here a vector<string> is used in all cases except the last (you could also concatenate the results onto a string). Other than that, a TokenIterator works like any other input iterator.
stack
The stack, along with the queue and priority_queue, are classified as adapters, which means they are implemented using one of the basic sequence containers: vector, list or deque. This, in my opinion, is an unfortunate case of confusing what something does with the details of its underlying implementation: the fact that these are called “adapters” is of primary value only to the creator of the library. When you use them, you generally don’t care that they’re adapters, but instead that they solve your problem. Admittedly there are times when it’s useful to know that you can choose an alternate implementation or build an adapter from an existing container object, but that’s generally one level removed from the adapter’s behavior. So, while you may see it emphasized elsewhere that a particular container is an adapter, I shall only point out that fact when it’s useful. Note that each type of adapter has a default container that it’s built upon, and this default is the most sensible implementation, so in most cases you won’t need to concern yourself with the underlying implementation.
The following example shows stack<string> implemented in the three possible ways: the default (which uses deque), with a vector and with a list:
int main(int argc, char* argv[]) {
requireArgs(argc, 1); // File name is argument
ifstream in(argv[1]);
assure(in, argv[1]);
Stack1 textlines; // Try the different versions
// Read file and store lines in the stack:
been used, this would have been a bit clearer)
The stack template has a very simple interface, essentially the member functions you see above. It doesn’t have sophisticated forms of initialization or access, but if you need that you can use the underlying container that the stack is implemented upon. For example, suppose you have a function that expects a stack interface but in the rest of your program you need the objects stored in a list. The following program stores each line of a file along with the leading number of spaces in that line (you might imagine it as a starting point for performing some kinds of source-code reformatting):
string line; // Without leading spaces
int lspaces; // Number of leading spaces
operator<<(ostream& os, const Line& l) {
for(int i = 0; i < l.lspaces; i++)
int main(int argc, char* argv[]) {
requireArgs(argc, 1); // File name is argument
// Turn the list into a stack for printing:
stack<Line, list<Line> > stk(lines);
re-inserts the leading spaces so the line prints properly, but you can easily change the number of spaces by changing the value of lspaces (the member functions to do this are not shown here).
In main( ), the input file is read into a list<Line>, then a stack is wrapped around this list so it can be sent to stackOut( ).
You cannot iterate through a stack; this emphasizes that you only want to perform stack operations when you create a stack. You can get equivalent “stack” functionality using a vector and its back( ), push_back( ) and pop_back( ) methods, and then you have all the additional functionality of the vector. Stack1.cpp can be rewritten to show this:
queue
The queue is a restricted form of a deque: you can only enter elements at one end, and pull them off the other end. Functionally, you could use a deque anywhere you need a queue, and you would then also have the additional functionality of the deque. The only reason you need to use a queue rather than a deque, then, is if you want to emphasize that you will only be performing queue-like behavior.
The queue is an adapter class like stack, in that it is built on top of another sequence container. As you might guess, the ideal implementation for a queue is a deque, and that is the default template argument for the queue; you’ll rarely need a different implementation.
Queues are often used when modeling systems where some elements of the system are waiting to be served by other elements in the system. A classic example of this is the “bank-teller problem,” where you have customers arriving at random intervals, getting into a line, and then being served by a set of tellers. Since the customers arrive randomly and each takes a random amount of time to be served, there’s no way to deterministically know how long the line will be at any time. However, it’s possible to simulate the situation and see what happens.
A problem in performing this simulation is the fact that, in effect, each customer and teller should be run by a separate process. What we’d like is a multithreaded environment, so that each customer or teller would have its own thread. However, Standard C++ (before C++11) has no model for multithreading so there is no standard solution to this problem. On the other hand, with a little adjustment to the code it’s possible to simulate enough multithreading to provide a satisfactory solution to our problem.
Multithreading means you have multiple threads of control running at once, in the same address space (this differs from multitasking, where you have different processes each running in their own address space). The trick is that you have fewer CPUs than you do threads (and very often only one CPU) so to give the illusion that each thread has its own CPU there is a time-slicing mechanism that says “OK, current thread, you’ve had enough time. I’m going to stop you and go give time to some other thread.” This automatic stopping and starting of threads is called pre-emptive and it means you don’t need to manage the threading process at all.
An alternative approach is for each thread to voluntarily yield the CPU to the scheduler, which then goes and finds another thread that needs running. This is easier to synthesize, but it still requires a method of “swapping” out one thread and swapping in another (this usually involves saving the stack frame and using the standard C library functions setjmp( ) and longjmp( ); see my article in the (XX) issue of Computer Language magazine for an example). So instead, we’ll build the time-slicing into the classes in the system. In this case, it will be the tellers that represent the “threads,” (the customers will be passive) so each teller will have an infinite-looping run( ) method that will execute for a certain number of “time units,” and then simply return. By using the ordinary return mechanism, we eliminate the need for any swapping. The resulting program, although small, provides a remarkably reasonable simulation:
//: C04:BankTeller.cpp
// Using a queue and simulated multithreading
// To model a bank teller system
#include <iostream>
#include <queue>
int getTime() { return serviceTime; }
void setTime(int newtime) {
static const int slice = 5;
int ttime; // Time left in slice
bool busy; // Is teller serving a customer?
public:
Teller(queue<Customer>& cq)
: customers(cq), ttime(0), busy(false) {}
Teller& operator=(const Teller& rv) {
bool isBusy() { return busy; }
void run(bool recursion = false) {
// Inherit to access protected implementation:
class CustomerQ : public queue<Customer> {
// Add a random number of customers to the
// queue, with random service times:
for(int i = 0; i < rand() % 5; i++)
In addition, you won’t know how many customers will be arriving in each interval, so this will also be determined randomly.
The Customer objects are kept in a queue<Customer>, and each Teller object keeps a reference to that queue. When a Teller object is finished with its current Customer object, that Teller will get another Customer from the queue and begin working on the new Customer, reducing the Customer’s service time during each time slice that the Teller is allotted. All this logic is in the run( ) member function, which is basically a three-way if statement based on whether the amount of time necessary to serve the customer is less than, greater than or equal to the amount of time left in the teller’s current time slice. Notice that if the Teller has more time after finishing with a Customer, it gets a new customer and recurses into itself.
Just as with a stack, when you use a queue, it’s only a queue and doesn’t have any of the other functionality of the basic sequence containers. This includes the ability to get an iterator in order to step through the queue. However, the underlying sequence container (that the queue is built upon) is held as a protected member inside the queue, and the identifier for this member is specified in the C++ Standard as ‘c’, which means that you can inherit from queue in order to access the underlying implementation. The CustomerQ class does exactly that, for the sole purpose of defining an ostream operator<< that can iterate through the queue and print out its members.
The driver for the simulation is the infinite while loop in main( ). At the beginning of each pass through the loop, a random number of customers are added, with random service times. Both the number of tellers and the queue contents are displayed so you can see the state of the system. After running each teller, the display is repeated. At this point, the system adapts by comparing the number of customers and the number of tellers; if the line is too long another teller is added and if it is short enough a teller can be removed. It is in this adaptation section of the program that you can experiment with policies regarding the optimal addition and removal of tellers. If this is the only section that you’re modifying, you may want to encapsulate policies inside of different objects.
Priority queues
When you push( ) an object onto a priority_queue, that object is sorted into the queue according to a function or function object (you can allow the default less template to supply this, or provide one of your own). The priority_queue ensures that when you look at the top( ) element it will be the one with the highest priority. When you’re done with it, you call pop( ) to remove it and bring the next one into place. Thus, the priority_queue has nearly the same interface as a stack, but it behaves differently.
Like stack and queue, priority_queue is an adapter which is built on top of one of the basic sequences; the default is vector.
It’s trivial to make a priority_queue that works with ints:
} ///:~
This pushes into the priority_queue 100 random values from 0 to 24. When you run this program you’ll see that duplicates are allowed, and the highest values appear first. To show how you can change the ordering by providing your own function or function object, the following program gives lower-valued numbers the highest priority:
priority_queue<int, vector<int>, Reverse> pqi;
// Could also say:
complex scheme for ordering your objects
If you look at the description for priority_queue, you see that the constructor can be handed a “Compare” object, as shown above. If you don’t use your own “Compare” object, the default template behavior is the less template function. You might think (as I did) that it would make sense to leave the template instantiation as priority_queue<int>, thus using the default template arguments of vector<int> and less<int>. Then you could inherit a new class from less<int>, redefine operator( ) and hand an object of that type to the priority_queue constructor. I tried this, and got it to compile, but the resulting program produced the same old less<int> behavior. The answer lies in the less< > template:
The operator( ) is not virtual, and although the constructor takes your subclass of less<int> by reference, the priority_queue stores its own comparison object of the template’s Compare type (a plain less<int>), so your object is sliced when it is copied. When operator( ) is called, it is therefore the base-class version that is used. While it is generally reasonable to expect ordinary classes to behave polymorphically, you cannot make this assumption when using the STL.
Of course, a priority_queue of int is trivial A more interesting problem is a to-do list, where each object contains a string and a primary and secondary priority value:
ToDoItem(string td, char pri ='A', int sec =1)
: item(td), primary(pri), secondary(sec) {}
friend bool operator<(
const ToDoItem& x, const ToDoItem& y) {
return os << td.primary << td.secondary
ToDoItem’s operator< is written as a non-member (friend) function so that it works with less< > (a const member function taking a const reference would also work). Other than that, everything happens automatically. The output is:
A1: Water lawn
A2: Feed dog
push_heap( ) and pop_heap( ) (they are the soul of the priority_queue; in fact you could say that the heap is the priority queue and priority_queue is just a wrapper around it). This turns out to be reasonably straightforward, but you might think that a shortcut is possible. Since the container used by priority_queue is protected (and has the identifier, according to the Standard C++ specification, named c) you can inherit a new class which provides access to the underlying implementation:
have to do it by hand, like this:
template<class T, class Compare>
class PQV : public vector<T> {
Compare comp;
public:
PQV(Compare cmp = Compare()) : comp(cmp) {
make_heap(begin(), end(), comp);
}
const T& top() { return front(); }
void push(const T& x) {
push_back(x);
push_heap(begin(), end(), comp);
But this program behaves in the same way as the previous one! What you are seeing in the underlying vector is called a heap. This heap represents the tree of the priority queue (stored in the linear structure of the vector), but when you iterate through it you do not get a linear priority-queue order. You might think that you can simply call sort_heap( ), but that only works once, and then you don’t have a heap anymore, but instead a sorted sequence. This means that to go back to using it as a heap the user must remember to call make_heap( ) first. This can be encapsulated into your custom priority queue:
template<class T, class Compare>
class PQV : public vector<T> {
Compare comp;
bool sorted;
void assureHeap() {
if(sorted) {
// Turn it back into a heap:
make_heap(begin(), end(), comp);
sorted = false;
}
}
public:
PQV(Compare cmp = Compare()) : comp(cmp) {
make_heap(begin(), end(), comp);
// Re-adjust the heap:
push_heap(begin(), end(), comp);
}
void pop() {
assureHeap();
// Move the top element to the last position:
pop_heap(begin(), end(), comp);
// Remove that element:
If sorted is true, then the vector is not organized as a heap, but instead as a sorted sequence. assureHeap( ) guarantees that it’s put back into heap form before performing any heap operations on it.
The first for loop in main( ) now has the additional quality that it displays the heap as it’s being built.
The only drawback to this solution is that the user must remember to call sort( ) before viewing it as a sorted sequence (although one could conceivably override all the methods that produce iterators so that they guarantee sorting). Another solution is to build a priority queue that is not a vector, but will build you a vector whenever you want one:
// Don't need to call make_heap(); it's empty:
PQV(Compare cmp = Compare()) : comp(cmp) {}
void push(const T& x) {
// Put it at the end:
v.push_back(x);
// Re-adjust the heap:
push_heap(v.begin(), v.end(), comp);
}
void pop() {
// Move the top element to the last position:
pop_heap(v.begin(), v.end(), comp);
// Remove that element:
v.pop_back();
}
const T& top() { return v.front(); }
bool empty() const { return v.empty(); }
int size() const { return v.size(); }
typedef vector<T> TVec;
TVec vector() {
TVec r(v.begin(), v.end());
// It’s already a heap
sort_heap(r.begin(), r.end(), comp);
// Put it into priority-queue order:
already a heap), then sorts it (thus it leaves PQV’s vector untouched), then reverses the order so that traversing the new vector produces the same effect as popping the elements from the priority queue.
You may observe that the approach of inheriting from priority_queue used in PriorityQueue4.cpp could be used with the above technique to produce more succinct code:
sort_heap(r.begin(), r.end(), comp);
// Put it into priority-queue order:
The brevity of this solution makes it the simplest and most desirable, plus it’s guaranteed that
the user will not have a vector in the unsorted state The only potential problem is that the