case of the big pointer array itself. The problem is that a big pointer array can have a variable number of elements in it, so it would be all too easy for us to accidentally step off the end of the big pointer array, with potentially disastrous results. To prevent such a possibility, I have created the data type called AccessVector. The purpose of this data type is to combine the safety features of a normal SVector with the ability to specify the address where the data for the SVector should start, rather than relying on the run-time memory allocation library to assign the address. Because this data type is designed to refer to an existing area of memory, the copy constructor is defaulted, as are the assignment operator and the destructor, and there is no SetSize function such as exists for "regular" SVectors. This data type allows us to "map" a predefined structure onto an existing set of data, which is exactly what we need to access a big pointer array safely.
As this suggests, the "big array header" is an AccessVector variable, which we can use just as though it were a normal SVector.47
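To make the idea of a bounds-checked "window" onto existing memory concrete, here is a minimal sketch. It is not the AccessVector from the quantum file sources: the name AccessVectorSketch, its constructor arguments, and the choice to throw std::out_of_range on a bad index are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical sketch: a bounds-checked view of memory owned by someone else.
template <class T>
class AccessVectorSketch
{
public:
    // Map 'count' elements starting at 'data'; nothing is allocated or copied.
    AccessVectorSketch(T* data, std::size_t count)
        : m_Data(data), m_Count(count) {}

    // Copy construction, assignment, and destruction are the compiler
    // defaults: a copy refers to the same memory, and destroying a view
    // frees nothing. There is deliberately no SetSize.

    std::size_t GetSize() const { return m_Count; }

    T& operator[](std::size_t index)
    {
        if (index >= m_Count)
            throw std::out_of_range("AccessVectorSketch: index out of range");
        return m_Data[index];
    }

private:
    T* m_Data;
    std::size_t m_Count;
};

// Example: treat part of an existing buffer as a three-element array.
inline bool AccessVectorDemo()
{
    unsigned buffer[4] = {10, 20, 30, 40};
    AccessVectorSketch<unsigned> view(buffer + 1, 3);
    view[0] = 99;                       // writes through to buffer[1]
    bool caught = false;
    try { view[3] = 0; }                // one past the end of the view
    catch (const std::out_of_range&) { caught = true; }
    return buffer[1] == 99 && caught;
}
```

The point of the design is that the view knows how many elements it covers, so stepping off the end is detected rather than silently corrupting whatever follows the array in the block.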
The LittlePointerBlock class
The interface for LittlePointerBlock is shown in Figure blocki.10.
The interface for the LittlePointerBlock class (from quantum\blocki.h) (Figure blocki.10)
codelist/blocki.10
There's nothing in this class that isn't exactly analogous to the corresponding functions in the big pointer array class. Therefore, I won't waste either your time or mine by repeating the analysis of the big pointer array class here. Instead, let's move along to another class that is somewhat more interesting, if only because it seems to have no purpose for existing.
The LeafBlock class
The interface for LeafBlock is shown in Figure blocki.11.
The interface for the LeafBlock class (from quantum\blocki.h) (Figure blocki.11)
codelist/blocki.11
This is a real oddity: a class that defines no new member functions or member variables. Of what value could such a class possibly be?
The answer is that it provides a "hook" for attaching a handle class, namely LeafBlockPtr. This allows us to use a leaf block just as we would any of the other quantum classes. If we did not have this class, we could always create another class called QuantumBlockPtr, which would have much the same effect as creating this class. So why did I create this class in the first place?
The answer is that originally it did have some member functions, but they eventually turned out to be superfluous. At this point in the development of this project, it would probably be unwise for me to go through the code to root out all of the references to this class. And after all, a class that defines no new member functions or member variables certainly can't take up too much extra space; in fact, using this class should have absolutely no effect on the size of the program or its execution time, so I think I'll leave it just as it is, at least for now.
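The claim that such a class costs nothing can be checked with a small experiment. The classes below are stand-ins invented for this sketch (QuantumBlockSketch and LeafBlockSketch are not the classes from blocki.h); the size equality holds on the mainstream compilers we know of, although the language standard does not promise it for every possible class.

```cpp
#include <cassert>

// Stand-in for the base block class; the real one has many more members.
class QuantumBlockSketch
{
public:
    QuantumBlockSketch() : m_ItemCount(0) {}
    void AddItem() { ++m_ItemCount; }
    int GetItemCount() const { return m_ItemCount; }
private:
    int m_ItemCount;
};

// Adds nothing but a distinct type name for a handle class to refer to.
class LeafBlockSketch : public QuantumBlockSketch
{
};

// A derived class with no new data members should add no storage.
static_assert(sizeof(LeafBlockSketch) == sizeof(QuantumBlockSketch),
              "an empty derived class should add no storage");
```

Because the derived class inherits every member function unchanged, calls made through it compile to exactly the same code as calls made through the base class.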
The FreeSpaceArray class
Finally, we're finished with the block classes. Our next target of opportunity will be the classes that maintain and provide access to the free space list. We'll start with FreeSpaceArray, whose interface is shown in Figure newquant.22.
The interface for the FreeSpaceArray class (from quantum\newquant.h) (Figure newquant.22)
codelist/newquant.22
The Normal Constructor for FreeSpaceArray
The first function we'll look at is the normal constructor for FreeSpaceArray, whose code is shown in Figure newquant.23.
The normal constructor for FreeSpaceArray (from quantum\newquant.cpp) (Figure newquant.23)
codelist/newquant.23
As you can see, all this function does (as is common in the case of constructors) is to initialize a number of member variables. Most of these initializations are fairly straightforward, but we should go over them briefly. First, we set the current lowest free block number to 0, because we have no idea what blocks might be free in the free space list, as we have not looked through it yet. Then we get the free space list count, the number of blocks in the free space list, and the quantum number adjustment from the quantum file object; this last value is used when we need to convert between block numbers and quantum numbers. Next, we resize the block pointer SVector so it can hold block pointers for all of the free space list blocks in the quantum file. Finally, we assign free space block pointers to all the elements of that SVector. Now we are ready to access the free space list.
The FreeSpaceArray::Get Function
The next function we'll look at is FreeSpaceArray::Get, whose code is shown in Figure newquant.24.
The FreeSpaceArray::Get function (from quantum\newquant.cpp) (Figure newquant.24)
codelist/newquant.24
The operation of this function is fairly straightforward. First, we check whether we're trying to access something that is off the end of the free space list. If so, we return a value that indicates that there is no free space in the quantum for which information was requested. However, if the input argument is valid, we calculate which block and which element in that block contains the information we need. We then call the Get function of the block pointer to retrieve that element. Finally, we return the result to the caller.
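The index arithmetic described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code from newquant.cpp: the entry layout, the use of nested std::vector objects in place of free space block pointers, and the convention that an all-zero entry means "no free space" are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical entry layout: which object owns the quantum, and a one-byte
// code for how much space is left in it.
struct FreeSpaceEntrySketch
{
    unsigned short objectNumber;  // assumed: 0 means "no owner"
    unsigned char freeSpaceCode;  // assumed: 0 means "no space available"
};

typedef std::vector<FreeSpaceEntrySketch> FreeSpaceBlockSketch;

// Look up entry 'index' in a list stored as equal-sized blocks.
inline FreeSpaceEntrySketch GetEntrySketch(
    const std::vector<FreeSpaceBlockSketch>& blocks,
    std::size_t entryCount, std::size_t entriesPerBlock, std::size_t index)
{
    if (index >= entryCount)
    {
        // Off the end of the list: report "no free space here".
        FreeSpaceEntrySketch none = {0, 0};
        return none;
    }
    std::size_t block = index / entriesPerBlock;    // which block holds it
    std::size_t element = index % entriesPerBlock;  // where within that block
    return blocks[block][element];
}
```

With four entries per block, for example, entry 9 lives in block 2 at element 1.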
The FreeSpaceArray::Set Function
The next function we'll look at is FreeSpaceArray::Set, whose code is shown in Figure newquant.25.
The FreeSpaceArray::Set function (from quantum\newquant.cpp) (Figure newquant.25)
codelist/newquant.25
This function is very similar to its counterpart, the Get function. However, there is one difference that we should look at: if the entry that we have just found in the array indicates that its quantum is completely empty (i.e., has the maximum available space) and this entry has a lower index than the current value of the "lowest free block" variable, then we reset the "lowest free block" variable to indicate that this is the lowest free block.
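Here is a minimal sketch of that bookkeeping. The names, the value 255 for "completely empty", the bool return value, and the decision to apply the test to the entry being stored are all assumptions made for this example rather than details taken from newquant.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct FreeSpaceEntrySketch
{
    unsigned short objectNumber;  // assumed: 0 means "no owner"
    unsigned char freeSpaceCode;
};

// Assumed code meaning "this quantum is completely empty".
const unsigned char kEmptyQuantumCode = 255;

// Hypothetical stand-in for the free space list and its member variables.
struct FreeSpaceListSketch
{
    std::size_t entryCount;
    std::size_t entriesPerBlock;
    std::vector<std::vector<FreeSpaceEntrySketch> > blocks;
    std::size_t lowestFreeBlock;
};

// Store 'entry' at 'index'; returns false if 'index' is off the end.
inline bool SetEntrySketch(FreeSpaceListSketch& list, std::size_t index,
                           FreeSpaceEntrySketch entry)
{
    if (index >= list.entryCount)
        return false;
    std::size_t block = index / list.entriesPerBlock;
    std::size_t element = index % list.entriesPerBlock;
    list.blocks[block][element] = entry;
    // Remember the lowest-numbered empty quantum we know about.
    if (entry.freeSpaceCode == kEmptyQuantumCode &&
        index < list.lowestFreeBlock)
        list.lowestFreeBlock = index;
    return true;
}
```

Note that the variable only ever moves downward here; storing a non-empty entry, or an empty one at a higher index, leaves it unchanged.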
Free the Quantum 16K!
This is an optimization whose purpose is to avoid searching the entire free list every time we want to find a block that isn't committed to any particular main object. In both the previous C implementation and the current C++ one, we first check the last quantum to which we added an item; if that has enough space to add the new item, we use it.48 In the old implementation, the free space list contained only a "free space code", indicating how much space was available in the quantum but not which object it belonged to. Therefore, when we wanted to find a quantum belonging to the current object that had enough space to store a new item, we couldn't use the free space list directly. As a substitute, the C code went through the current little pointer array, looking up each quantum referenced in that array in the free space list; if one of them had enough space, we used it. However, this was quite inefficient; since each quantum can hold dozens or hundreds of items, this algorithm might require us to look at the same quantum that many times!49
Although this wasn't too important in the old implementation, where the free space list was held in memory, it could cause serious delays in the current one if we used the standard virtual memory services to access the free space list. The free space list in the old program took up 16K, one byte for each quantum in the maximum quantum file size allowed. In the new implementation, using 16K blocks of virtual memory, that same free space list would occupy only one block, so searching such a list would not require any extra disk accesses. However, the current implementation can handle much larger quantum files that might contain tens or hundreds of thousands of blocks, with correspondingly larger free space lists. Using the old method, searching the free space list from beginning to end could take quite a while, because the search routine would not access the list in a linear manner and therefore might require extra disk accesses to access the same free space list entries several times. At the very least, the free space blocks would be artificially promoted to higher levels of activity and would therefore tend to crowd other quanta out of the buffers.
Even if the free space blocks were already resident, virtual memory accesses are considerably slower than "regular" accesses; it would be much faster to scan the free space list sequentially by quantum number than randomly according to the entries in the little pointer array. Of course, we could make a list of which quanta we had already examined and skip the check in those cases, but I decided to simplify matters by another method.
The FreeSpaceArray::FindSpaceForItem Function
In the current implementation, the free space list contains not just the free space for each quantum but also which object it belongs to (if any).50 This lets us write a FreeSpaceArray::FindSpaceForItem routine that finds a place to store a new item by scanning each block of the free list sequentially in memory, rather than using a virtual memory access to retrieve each free space entry; we stop when we find a quantum that belongs to the current object and has enough free space left to store the item (Figure newquant.26).51
The FreeSpaceArray::FindSpaceForItem function (from quantum\newquant.cpp) (Figure newquant.26)
codelist/newquant.26
However, if there isn't a quantum in the free space list that belongs to our desired main object and also has enough space left to add the new item, then we have to start a new quantum; how do we decide which one to use?
One way is to keep track of the first free space block we find in our search and use it if we can't find a suitable block already belonging to our object. However, I want to bias the storage mechanism to use blocks as close as possible to the beginning of the file, which should reduce head motion, as well as making it possible to shrink the file's allocated size if the amount of data stored in it decreases. My solution is to take a free block whenever it appears; if that happens to be before a suitable block belonging to the current object, so be it. This appears to be a self-limiting problem, since the next time we want to add to the same object, the newly assigned block will be employed if it has enough free space and is the first suitable block in the list.
This approach solves another problem as well, which is how we determine when to stop scanning the free space list in the first place. Of course, we could also maintain the block number of the last occupied block in the file and stop there. However, I felt this was unnecessary, since stopping at the first free block provides a natural shortcut, without contributing any obvious problems of its own. However, as with many design decisions, my analysis could be flawed: there's a possibility that using this algorithm with many additions and deletions could reduce the space efficiency of the file, although I haven't seen such an effect in my testing.
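The scanning policy described above can be sketched as a single loop. This is a simplified illustration, not the routine in Figure newquant.26: it treats the free space list as one flat std::vector rather than a series of blocks, ignores the "lowest free block" shortcut, and uses hypothetical names and sentinel values (kNoObject, kEmptyQuantumCode, kNotFound).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct FreeSpaceEntrySketch
{
    unsigned short objectNumber;
    unsigned char freeSpaceCode;
};

const unsigned short kNoObject = 0;           // assumed "no owner" marker
const unsigned char kEmptyQuantumCode = 255;  // assumed "completely empty"
const std::size_t kNotFound = static_cast<std::size_t>(-1);

// Return the index of the quantum to use for a new item belonging to
// 'objectNumber' that needs at least 'neededCode' worth of space.
inline std::size_t FindSpaceForItemSketch(
    const std::vector<FreeSpaceEntrySketch>& list,
    unsigned short objectNumber, unsigned char neededCode)
{
    for (std::size_t i = 0; i < list.size(); ++i)
    {
        const FreeSpaceEntrySketch& entry = list[i];
        // A quantum already owned by this object with enough room.
        if (entry.objectNumber == objectNumber &&
            entry.freeSpaceCode >= neededCode)
            return i;
        // The first unowned, empty quantum: take it and stop scanning,
        // even if a suitable owned quantum exists further along.
        if (entry.objectNumber == kNoObject &&
            entry.freeSpaceCode == kEmptyQuantumCode)
            return i;
    }
    return kNotFound;  // no owned quantum fits and no free quantum exists
}
```

Because the loop returns at the first free quantum, it never needs to know where the occupied part of the file ends, and new quanta are always allocated as close to the start of the file as possible.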
This mechanism did not mature without some growing pains. For example, the one-byte FreeSpaceCode, used to indicate the approximate space available in a quantum, is calculated by dividing the size by a constant (32 in the case of 16K blocks) and discarding the remainder. As a result, the size code calculated for items