Optimizing C++ - the WWW version

ISBN: 0-13-977430-0

Copyright 1999 by Prentice-Hall PTR

Copyright 2000 by Chrysalis Software Corporation


Return to the table of contents

Return to my main page


Free at Last: An Efficient Method of Handling Variable-Length Records

Introduction

In this chapter we will develop an algorithm that uses the quantum file access method to handle a file containing a large number of records each of which can vary dynamically in size. In order to appreciate the power of this access method, we will start by considering the much simpler problem of access to fixed-length records.

Algorithm Discussed

The Quantum File Access Method

A Harmless Fixation

Let us suppose that we want to access a number of fixed-length customer records by their record numbers.1 Given the record number, we can locate that customer's record by multiplying the record number by the length of a record, which gives us the offset into the file where that record will be found. We can then read or write that record as needed.

Of course, a real application needs to reuse records of customers who have become inactive, to prevent the customer file from growing indefinitely. To handle this problem, we could set a limit on the file size and, when it is reached, start reusing records that haven't been referenced for a long time, making sure to correct or delete any records in other files that refer to the deleted customer.

This fixed-length record system is not very difficult to implement, but it has significant drawbacks; address fields, for example, tend to vary greatly in length, with some records needing 25 or 30 characters for a city name or street address and others needing only 10. If we allocate enough storage for the extreme case, the records become much longer than if we had to handle only the average case. However, allocating enough for only the average case will leave those customers whose names or addresses won't fit into the allotted space quite unhappy, as I know from personal experience as a software developer! The obvious solution is to allow the fields (and therefore the records) to vary in length as necessary.

Checking Out of the Hotel Procrustes

Unfortunately, variable-length records are much more difficult to deal with than fixed-length ones. The most obvious reason, as discussed above, is that determining where fixed-length records are stored requires only a simple calculation; this is not true of variable-length records. However, we could remedy this fairly easily by maintaining a record index consisting of an array of structures containing the starting location and length of each record, as depicted in Figure recindex.2

A sample record index array and data for variable-length records (Figure recindex)

          +-      Starting  Record
| Index Address Length
| +---------------+
| 0 | 0 12 +----+
Record | 1 | 12 12 +----++
Index | 2 | 24 7 +----++-------+
Array | 3 | 31 2 +----++-------+------+
| 4 | 33 5 +----++-------+------+-+
| +---------------+ || | | |
+- +--------------+| | | |
| +---+ | | |
+- | | | | |
| ++-----------+-----------+------+-+----+
Record | |Steve HellerP.O.Box 0335BaldwinNY11510|
Data | +--------------------------------------+
+-
+-
Record | 0000000000111111111122222222223333333333
Offset | 0123456789012345678901234567890123456789
+-

We encounter a more serious difficulty when we want to delete a record and reuse the space it occupied.3 In some situations we can sidestep the problem by adding the new version of the record to the end of the file and changing the record pointer to refer to the new location of the record; however, in the case of an actively updated file such an approach would cause the file to grow rapidly.

But how can we reuse the space vacated by a deleted record of some arbitrary size? The chances of a new record being exactly the same size as any specific deleted one are relatively small, especially if the records average several hundred bytes each, as is not at all unusual in customer data files. A possible solution is to keep a separate free list for each record size and reuse a record of the correct size. However, there is a very serious problem with this approach: a new record may need 257 bytes, for example, and there may be no available record of exactly that size. Even though half of the records in the file might be deleted, none of them could be reused, and we would still be forced to extend the file. The attempt to solve this difficulty by using a record that is somewhat larger than necessary leads to many unusably small areas being left in the file (a situation known as fragmentation).

However, there is a relatively unknown way to make variable-length records more tractable: the quantum file access method.4 The key is to combine them into groups of fixed length, which can then be allocated and deallocated in an efficient manner.

The Quantum File Access Method

Before the following discussion will make much sense to you, I will need to explain in general terms what we're trying to accomplish: building a virtual memory system that can accommodate records of varying lengths in an efficient manner. This means that even though at any given time, we are storing most of our data on the disk rather than maintaining it all in memory, we will provide access to all the data as though it were in memory. To do this, we have to arrange that any actively used data is actually in memory when it is needed. In the present application, our data is divided into fixed-size blocks called quanta (plural of quantum),5 so the task of our virtual memory system is to ensure that the correct blocks are in memory as needed by the user.6 The quanta in the file are generally divided into a number of addressable units called items.7 When adding a record to the file, we search a free space list, looking for a quantum with enough free space for the new record. When we find one, we add the record to that quantum and store the record's location in the item reference array, or IRA, which replaces the record index in Figure recindex; this array consists of entries of the form "quantum number, item number".8 The item number refers to an entry in the item index stored at the beginning of the quantum; the items are stored in the quantum in order of their item index entries, which allows the size of an item to be calculated rather than having to be stored.

For example, if we were to create an array of variable-length strings, some of its item references might look like those illustrated in Figure itemref1.

Sample IRA, item index, and data, before deletions (Figure itemref1)

          +-
| Quantum Item
| Index Number Number
| +-------------+
Item | 0 | 3 1 +-----------------+
Reference | 1 | 3 2 +-----------------+-+
Array | 2 | 3 3 +-----------------+-+--+
(IRA) | 3 | 3 4 +-----------------+-+--+-+
| 4 | 3 5 +-----------------+-+--+-+-+
| +-------------+ | | | | |
+- | | | | |
| | | | |
+- Item # Offset Type Index | | | | |
| +--------------------------+ | | | | |
| 1 | 12 -+ VARSTRING 0 |±--+ | | | |
| 2 | 24 +| VARSTRING 1 |±----+ | | |
Item | 3 | +-31 || VARSTRING 2 |±-------+ | |
Index | 4 | ++-33 || VARSTRING 3 |±---------+ |
for | 5 |+++-38 || VARSTRING 4 |±-----------+
Quantum | 6 |||| 38 || UNUSED 0 |
3 | 7 |||| 38 || UNUSED 0 |
| 8 |||| 38 || UNUSED 0 |
| 9 |||| 38 || UNUSED 0 |
| 10 |||| 38 || UNUSED 0 |
| ++++----++-----------------+
+- ||| |+-----------------+
||| +------+ |
||+----+ | |
+- |+---+ | | |
| +--+----+-+------+-----------+-----------+
Quantum | | 11510NYBaldwinP.O.Box 0335Steve Heller|
Data | +----------------------------------------+
+-
+-
Quantum | 333333333222222222211111111110000000000
Offset | 876543210987654321098765432109876543210
+-

When we delete an item from a quantum, we have to update the free space list entry for that quantum to reflect the amount freed, so the space can be reused the next time an item is to be added to the file. We also have to slide the remaining items in the quantum together so that the free space is in one contiguous block, rather than in slivers scattered throughout the quantum. With a record index like the one in Figure recindex, we would have to change the record index entries for all the records that were moved. Since the records that were moved might be anywhere in the record index, this could impose unacceptable overhead on deletions; to avoid this problem, we will leave the item index entry for a deleted item empty rather than sliding the other entries down in the quantum, so that the IRA is unaffected by changes in the position of records within a quantum. If we delete element 1 from the array, the resulting quantum looks like Figure itemref2.

Sample IRA, item index, and data (Figure itemref2)

          +-
| Quantum Item
| Index Number Number
| +-------------+
Item | 0 | 3 1 +-------------------+
Reference | 1 | NONE 0 | |
Array | 2 | 3 3 +-------------------+----+
(IRA) | 3 | 3 4 +-------------------+----+-+
| 4 | 3 5 +-------------------+----+-+-+
| +-------------+ | | | |
+- | | | |
| | | |
+- Item # Offset Type Index | | | |
| +----------------------------+ | | | |
| 1 | 12 -+ VARSTRING 0 |±--+ | | |
| 2 | 12 | UNUSED 0 | | | |
Item | 3 | 19-+| VARSTRING 2 |±-------+ | |
Index | 4 | +--21 || VARSTRING 3 |±---------+ |
for | 5 |++--26 || VARSTRING 4 |±-----------+
Quantum | 6 ||| 38 || UNUSED 0 |
3 | 7 ||| 38 || UNUSED 0 |
| 8 ||| 38 || UNUSED 0 |
| 9 ||| 38 || UNUSED 0 |
| 10 ||| 38 || UNUSED 0 |
| +++-----++-------------------+
+- || |+-----------------+
|| +-----------+ |
|+---------------+ | |
+- +-----------+ | | |
| +--------------+----+-+------+-----------+
Quantum | | 11510NYBaldwinSteve Heller|
Data | +----------------------------------------+
+-
+-
Quantum | 333333333222222222211111111110000000000
Offset | 876543210987654321098765432109876543210
+-

Note that we set the offset value of the deleted item equal to the offset of the item before it so that its length can be calculated as zero. Now let us add a new street address of "123 Main Street". After this change, our data structures look like those in Figure itemref3.

Sample IRA, item index, and data, after field change (Figure itemref3)

          +-
| Quantum Item
| Index Number Number
| +-------------+
Item | 0 | 3 1 +------------------+
Reference | 1 | 3 2 +------------------+-+
Array | 2 | 3 3 +------------------+-+--+
(IRA) | 3 | 3 4 +------------------+-+--+-+
| 4 | 3 5 +------------------+-+--+-+-+
| +-------------+ | | | | |
+- | | | | |
| | | | |
+- | | | | |
|Item # Offset Type Index | | | | |
| +---------------------------+ | | | | |
| 1 | 12 -+ VARSTRING 0 |±--+ | | | |
| 2 | 27 +| VARSTRING 1 |±----+ | | |
Item | 3 | +-34 || VARSTRING 2 |±-------+ | |
Index | 4 | ++-36 || VARSTRING 3 |±---------+ |
for | 5 +-+-++-41 || VARSTRING 4 |±-----------+
Quantum | 6 | | || 38 || UNUSED 0 |
3 | 7 | | || 38 || UNUSED 0 |
| 8 | | || 38 || UNUSED 0 |
| 9 | | || 38 || UNUSED 0 |
| 10 | | || 38 || UNUSED 0 |
| | +-++----++------------------+
+- | || |+-----------------+
| || +---+ |
| |+-+ | |
+- | ++ | | |
| ++----+-+------+--------------+-----------+
Quantum | |11510NYBaldwin123 Main StreetSteve Heller|
Data | +-----------------------------------------+
+-
+-
Quantum | 443333333333222222222211111111110000000000
Offset | 109876543210987654321098765432109876543210
+-

This is the actual mechanism used to change the value of an item: we delete the old item and add the new one. Of course, an item that increases in size may no longer fit in the same quantum, in which case we have to move it to a different quantum. Keeping track of such changes is not difficult as long as we always use the standard method of finding the address of an item: look it up in the IRA to get its quantum number and item number, then use its quantum number and item number to look it up in the item index. This allows us to move the item to a new quantum and change only its entries in the IRA and affected item indexes.

For example, if an item was originally in quantum 3 but now no longer fits there, we might move it to quantum 6. In order to do this, we first delete it from quantum 3 and then add it to quantum 6. As before, all the other IRA entries for items in quantum 3 still refer to the same data items as before, after we have adjusted the item index to account for the removed data.

Of course, the deleted item still takes up a slot in the item index of the old quantum; in our implementation this requires 10 bytes of storage.9 We need a method of reclaiming that space, so that we do not gradually fill our quanta with indexes pointing to nothing. We have already seen part of the solution to this problem: whenever we are adding an item to a quantum, we search for an unused item index entry rather than simply adding a new entry at the end.

This mechanism alone would only slow the growth of the item indexes; actually reducing the size of an index requires us to check for empty entries at the end of the item index after deleting an item. If such excess entries exist, we reclaim the space they occupy by reducing the entry count in the item index header to the number actually in use. This reduces wasted space in the item index without affecting entries in the IRA. Since we remove unused item index entries only when they are at the end of the index, no IRA entries can refer to items later in the same quantum.

A Large Array

Our records are stored in the quanta pointed to by our IRA. The IRA will be stored in its own quanta, which allows us to make it as large as needed. However, this means that we can't allocate the IRA in one piece. Since a quantum has a fixed size, we will have to break up our IRA into segments that will fit in a quantum; these segments will be collectively referred to as the little pointer array. We'll also need some way to find the segment that we need for a particular element in the IRA, which will be the responsibility of the big pointer array.

Figure bigpointer shows the relationship between the big pointer array and the IRA segments.

From the big pointer array to the IRA (Figure bigpointer)

+------------------------------------------+
| +- |
| Big | element_count = 8000 |
| Pointer | last_quantum_added_to = 3 |
| Array | |
| Header | |
| +- |
| +- |
| | Quantum | Quantum
| | Index Number | 5
| | +-------------+ |
| Big | 0 | 2 +-------+---+
| Pointer | 1 | 10 +-------+---+-+
| Array | 2 | 12 +-------+---+-+--+
| | 3 | 14 +-------+---+-+--++
| | 4 | 11 +-------+---+-+--+++
| | +-------------+ | | | |||
| +- | | | |||
| | | | |||
+------------------------------------------+ | | |||
+--------------------------------------+ | | |||
| |±------+ | |||
| +- | Quantum | |||
| | Quantum Item | 2 | |||
| | Index Number Number | | |||
| Item | +-------------+ | | |||
| Reference | 0 | 3 2 | | | |||
| Array | 1 | 3 4 | | | |||
| (IRA) | 2 | 3 5 | | | |||
| segment | 3 | 3 3 | | | |||
| 0 | 4 | 3 1 | | | |||
| | +-------------+ | | |||
| +- | | |||
+--------------------------------------+ | |||
+--------------------------------------+ | |||
| |±--------+ |||
| +- | Quantum |||
| | Quantum Item | 10 |||
| | Index Number Number | |||
| Item | +-------------+ | |||
| Reference | 0 | 3 12 | | |||
| Array | 1 | 4 6 | | |||
| (IRA) | 2 | 4 1 | | |||
| segment | 3 | 4 2 | | |||
| 1 | 4 | 4 3 | | ²²²
| | +-------------+ |
| +- |
+--------------------------------------+

Each IRA segment in the real program, of course, holds more than five entries: with a 16 KB quantum, about 4000 entries will fit in one quantum. However, the segmentation of the IRA works in the same way as this example. The segment number can be determined by dividing the element number by the number of elements in each IRA segment; the index into the IRA segment is the remainder of that division.

The big pointer array sets a limit on the number of elements referenced by an IRA, since that array has to fit in one quantum. However, this is not a very restrictive limit, since one big pointer array can contain about 4000 elements, using a 16 KB quantum. Each of these elements points to a little pointer array, which contains pointers to about 4000 entries, as mentioned above. Therefore, we could theoretically create an array with about 16,000,000 items in it.

Because our ArrayIndex type is defined as an unsigned long, which is usually 4 bytes, we can actually create arrays of that size.10

Many Large Arrays

There is one more generalization of the quantum file access method that will make it more generally useful: the ability to create more than one array of variable-length items.

The mechanism by which we will accomplish this goal is called the "main object index", which contains the quantum number of the big pointer array for each object; the number of an object is set when the object is created, by a directory facility which we will examine later.

Figure mainindex shows the path from the main object index through the big pointer array, the IRA segment, and the item index, to finally arrive at the data.

From the main object index to the data (Figure mainindex)

      +-
| Index Quantum Number
| +----------+
Main | 0 |NO_QUANTUM|
Object| 1 | 5 +-------------------------+
Index | 2 | 9 | |
| 3 | 1 | |
| 4 | 7 | |
+- +----------+ |
+-----------------------------------------------+ |
|Big +- |±----+
|Pointer | element_count = 8000 |
|Array | last_quantum_added_to = 3 |
|Header +- |
| +- |
| | Quantum | Quantum
| | Index Number | 5
| | +-------------+ |
|Big | 0 | 2 +--------------+---+
|Pointer | 1 | 10 | | |
|Array | 2 | 12 | | |
| | 3 | 14 | | |
| | 4 | 11 | | |
| +- +-------------+ | |
+-----------------------------------------------+ |
+-------------------------------------------+ |
| +- Quantum Item |±------+
| | Index Number Number | Quantum
|Item | +-------------+ | 2
|Reference | 0 | 3 2 +----------+----------+
|Array | 1 | 3 4 +----------+----------+-+
|(IRA) | 2 | 3 5 +----------+----------+-+-+
|segment | 3 | 3 3 +----------+-------+ | | |
|0 | 4 | 3 1 +----------+-----+ | | | |
| +- +-------------+ | | | | | |
+-------------------------------------------+ | | | | |
+-----------------------------------------------+ | | | | |
| +- | | | | | |
| | Item # Offset Type Index | | | | | |
| | +----------------------------+ | | | | | |
| | 1 | 5 -+ VARSTRING 4 |±+-+ | | | |
| | 2 | 17 +| VARSTRING 0 |±+---+--+ | |
|Item | 3 | +-19 || VARSTRING 3 |±+---+ | |
|Index | 4 | ++-31 || VARSTRING 1 |±+--------+ |
| | 5 |+++-38 || VARSTRING 2 |±+----------+
| | 6 |||| 38 || UNUSED 0 | | Quantum 3
| | ++++----++-------------------+ +---------+
| +- ||| |+------------------------+ |
| ||| +-------------+ | |
| ||+----------------+ | | |
| +- |+-----+ | | | |
|Quantum | +-+------+-----------+-+-----------+----+ |
|Data | | BaldwinP.O.Box 0335NYSteve Heller11510| |
| +- +---------------------------------------+ |
| 333333333222222222211111111110000000000 |
| 876543210987654321098765432109876543210 |
+---------------------------------------------------------+

We might use this facility to keep track of both customer records and inventory records, each of which would be logically separate although they may be stored in the same quantum file. In order to facilitate reconstruction of the file if part of the IRA is lost, we won't use the same quantum for items from two different arrays; as a result, the space freed by deleting an item will only be reused when an item is to be added to the same array.

To find an element in a main object, we index into the main object index by the object's number, retrieving the quantum number of the big pointer array for that object. Then we calculate the IRA segment and the index into that segment and retrieve the quantum number and item number for the element we want. The last step is to use the item index to find the exact location of the element.

Of course, we need a place to store the main object index. A normal application might have five or ten main objects; therefore, our allocation of 256 entries in the main object index is extremely generous in most cases.11

The Light at the End of the Tunnel

Let's start our examination of the implementation of the quantum file code by looking at a simple program that uses it to store a number of strings and then retrieve them (Figure flextest.00).

Initial test program for quantum file implementation (quantum\flextest.cpp) (Figure flextest.00)

codelist/flextest.00

Possibly the most striking feature of this program is how simple it appears, which is a direct consequence of the power of C++. Writing a similar program in C would require much greater knowledge of the implementation of the quantum file algorithm. However, in C++ all we have to know is which header file to include, how to open the quantum file, and how to create a FlexArray object, which behaves like a variable-sized array of variable-sized strings. Once we have dealt with this small set of prerequisites, we can use this extremely powerful and flexible data type in virtually the same way we would use a built-in type.

Let's take a detailed look at exactly how we set up and use a quantum file in this program. First, we have a couple of statements that declare a temporary buffer variable which we'll use to construct data to store in the quantum file and delete any previous version of the quantum file itself. Next, we create a QuantumFile object called QF via the default constructor of the QuantumFile class. Then we call an Open function on that object, supplying the name of the quantum file to be created.

The next statement creates a FlexArray object called TestArray. The arguments to this constructor are a pointer to the QuantumFile object that we'll use to store the data for TestArray, and a name by which we could reuse this same FlexArray object in another program or another execution of this program.

Now we're up to the loop where we will start to use the TestArray object. The loop steps through the first 100 elements of TestArray, assigning each element a String consisting of digits corresponding to the position of that element in the TestArray object. The strstream named Temp is used to format the numeric values of the loop variable i into String values that can be stored in a FlexArray object.

Finally, after we've put the values into TestArray, the second loop displays the values in the TestArray object.

Pay No Attention to the Man Behind the Curtain

At this point, I wouldn't be at all surprised if you had come to the conclusion that I have something up my sleeve ... and you would be right. Underneath this apparent simplicity is a wonderland of complex programming, which I hope I will be able to explain in an understandable way. Let's begin by examining some of these data types we've just seen, in enough detail to demystify them.

The first of these data types is the String. As it happens, this is actually a typedef for a more general type called SVector, so we'll start with the question of exactly what an SVector might be.

An SVector is very much like an array, with the following exceptions:

  1. If you try to store something beyond the end of an SVector, you will get an error message rather than a mysterious crash.
  2. You can find out how many elements an SVector has, by calling its size member function.
  3. You can change the size of an SVector after it has been created, by calling its resize member function.
  4. You can assign one SVector of a particular type to another SVector of the same type.
  5. You can create an SVector by copying from another SVector of the same type.

While these are very useful characteristics for any type of SVector, they are even more important in one particular type of SVector: the String, which is an SVector of chars. As a specialization of the SVector type, the String has all of the characteristics of any other SVector, but has additional abilities as well. Primary among these is its ability to be read or written using the standard C++ input/output stream functions. This String type, or one very much like it, is one of the most obvious improvements in C++ over C, whose corresponding char* type is notoriously error-prone. In this program, we will use the String data type in much the same way as we would use the char* data type in a C program. Unfortunately, we will not be able to go into detail on the implementation of either the SVector or String classes in this book; we will just use their facilities as described immediately above.

A Very Useful typedef

Another "type" used in this simple test program is ArrayIndex. This is also a typedef, but for a much simpler underlying type: unsigned long.

In this case, the reason I didn't just use the underlying type was to enhance portability. The original version of this program was implemented on a 16-bit compiler, where the definition of ArrayIndex was unsigned short, primarily because it was impossible to create an array or SVector having more than 64K elements. When I ported this program to a 32-bit compiler, I changed the definition of ArrayIndex and several other similar typedefs and recompiled.

I would like to be able to tell you that the program worked without further changes, but that would be a lie: it actually took me a couple of days to get it working again. However, considering the size of the source code for the classes (over 100KB), I was quite happy to take such a short time to port it.

If I had used the explicit type unsigned short everywhere, it would have taken me much longer to figure out which occurrences of that type should be changed to unsigned long and which should remain as they were. As it was, I found a few places where I did not use a typedef or used the wrong one and ran into trouble as a result. Overall, though, I did a pretty good job in using appropriate types originally, which saved me a lot of trouble later.

A Quantum Leap

Now let's start our journey into the interior of the quantum file implementation, starting with the QuantumFile class itself, whose header file is shown in Figure blockh.00.

Header file for QuantumFile class (quantum\block.h) (Figure blockh.00)

codelist/blockh.00

I can't say that I'm completely happy with the design of this class. Ideally, the user of any class should not have to worry about any member data or member variables that the user cannot or should not use. Unfortunately, this is not easily accomplished in C++. I believe it is worthwhile to digress slightly here to discuss the reasons why this is so.

First, let's examine the difficulty of "hiding" private and protected member variables from the class user. The technical problem here is that the compiler cannot allocate space for a variable unless its size is known. This implies that you cannot create a variable of a given class without access to its interface definition, which must in turn include the definitions of all of its member variables.

This does not allow the class user to use those member variables unless they are public, so we don't have to worry about non-member functions accidentally altering member variables. However, it does mean that the user of a class must ensure that all of the data types needed in the declaration of a class are available when the header for that class is compiled, and that the names of those data types do not conflict with any previously defined names. This can be a source of considerable difficulty when trying to use a fairly large library in a context other than the one in which it was developed, which of course reduces the payoff that such reuse would otherwise provide.

The other problem with the design of this class is that it defines a number of functions that application programmers have no need for. These functions are intended for use by other classes that contribute to the functionality of the quantum file implementation. You might think that I could simply make these functions private to prevent their being called by outside classes. Unfortunately, I wasn't able to do that with Microsoft Visual C++ 5.0, one of the compilers with which this code must work, as that compiler apparently cannot understand friend declarations that specify classes that are nested within other classes, as some of mine are. Even if this did work, I'd still prefer to spare the application programmer the details of these functions.

Is there a solution to either or both of these problems? Yes, in fact both of these can be remedied. Let's start with the data problem.

The standard solution to avoid exposing member variables to the application programmer is to create a secondary data structure that contains the actual data for the object, and include only a pointer to such a structure in the main object. Because the compiler knows the size of a pointer to a structure without having to see the definition of the structure itself, the header file that defines this secondary structure can be included only when needed in the implementation of the class. This frees the application programmer from any concern about the representation of the data.

What about those functions that the application programmer doesn't need to (and shouldn't) call? This can be solved in much the same way as the data problem. In this case, however, the pointer in the visible class points to an object of a class that supplies the additional functions needed by the other implementation classes. Because the client program would not include the header file that defines this secondary class, it could not access those additional functions. Of course, the other implementation classes would include that header file so that they could access the additional functions as needed.

Why didn't I implement either of these solutions (or better yet, both)? Because doing so would have required wholesale changes to the implementation, and I ran out of time. Of course, had I considered these issues when I was first creating these classes, they would have been fairly simple to solve, but I didn't think of them at that time.

In any event, the design is usable as it stands, even if it could be improved, so let's move on to how we can use it in its present form. We'll start with the functions that are needed by client programs such as our test program flextest.cpp. Figure block.00 shows the first of these functions, QuantumFile::QuantumFile().

The default constructor for the QuantumFile class (from quantum\block.cpp) (Figure block.00)

codelist/block.00

As you can see, this function's implementation consists entirely of initialization expressions in the member initialization list. We'll look at each of the member variables of this class in detail when we get to the functions that use them. For now, I'll just mention that they are all needed to keep track of the status of the quantum file and the buffers that maintain parts of its data in memory.

Next, let's take a look at Figure block.01, which shows the first "regular member function" in this class, QuantumFile::Open. This is a good place to start, since we can't do anything to a quantum file until we open it, as with any other file.

The Open function for the QuantumFile class (from quantum\block.cpp) (Figure block.01)

codelist/block.01

After declaring a number of variables, this function initializes the variables that keep track of the number of physical read and write operations for benchmarking purposes (m_ReadCount and m_WriteCount, respectively). Then it resizes the SVectors that keep track of the block buffers (m_Block), which quantum is in each block buffer (m_BlockNumber), whether each block buffer has been modified and therefore needs to be written back to the disk (m_BufferModified), and the "time stamp" for each block buffer that indicates how recently it has been accessed (m_TimeStamp). We'll see how each of these variables is used in the program as we go through the code.

The next few lines check whether the caller has supplied a non-null file name for the quantum file. If not, we bail out, because we need a valid file name to use when we open the file to store our data.

Assuming that we do have some sort of file name, we try to open the file for reading, to see whether it is already there. If fopen returns NULL, indicating that the file doesn't exist yet, we create the file, clear the QuantumFileHeaderStruct structure called QFHS, set the main_object_index_count and free_space_list_count to their default values, and copy the current program_version_info into the quantum_version_info structure element.

The next operation is to initialize the member variables of QFHS that keep track of the offsets in the file where the main_object_index, the free_space_list, and the quantum file data area begin, so that these areas of the file can be located when we open the file again.

Since we haven't created any main objects yet, our next task is to clear the main object index, setting it to zeros (which indicates available entries, as quantum number zero is not used), except for the entry for object number zero, which is set to NoQuantum; we will not allow a main object number zero, for better error control. We also write the header block to the file at this point.

The last operation in the new file initialization is to set up the free space list to indicate that all quanta have the maximum available free space, except of course for quantum zero, which is not used.

Of course, the process of opening an already existing quantum file starts differently; we have to read in the quantum file header and check whether it was created by the same version of the program. The current program simply tests whether the version of the program that created the file is different from the one reading it and aborts if they are different. A production program would use this information to determine whether the versions were compatible, so that if we changed the program in a way that makes old files unusable, we could warn the user that the file is of an old type and should be converted. Alternatively, we might automatically convert it to a new format; in either case, it is essential that we know what version of our program created the file. After we have taken care of this detail, the rest of the code for opening either an old or a new file is the same.

We are almost done with the initialization, but there are a few more data structures to handle. First, we have to calculate the block numbers where the main object list, the free space list, and the data area start. Then we initialize the member variables that keep track of the current number of main objects and the number of elements in the free space list. Then we initialize the SVectors of block buffers, block numbers, modified flags, and time stamps. Next, we initialize variables that will allow us to access the free space list and main object list; we'll get into exactly how that works later.

Finally, if we have created a new quantum file (as opposed to opening one that already existed), we create a main object called "Directory", which we will use to keep track of the names of the main objects that the user will create.

Now let's take a look at Figure block.02, which shows the next user function, QuantumFile::Flush. This function is used to ensure that all data written to the quantum file has actually been stored on the disk. It's a good idea to call Flush whenever you have finished with a logical portion of your data updating, to make sure that a power failure or other crash doesn't lose data.

The Flush function for the QuantumFile class (from quantum\block.cpp) (Figure block.02)

codelist/block.02

This isn't a very complicated function, but it does show how the m_BufferModified SVector is used: Each buffer whose flag in that SVector is TRUE is written to the disk via the Write member function. Any buffer whose flag is not TRUE has not been modified since it was last read from the disk, so there is no reason to write it back to the disk. Avoiding unnecessary writes is very helpful in reducing disk I/O and improving the speed of the program. The Write function resets the flag for any buffer it writes, so if we called Flush twice in a row, no buffers would be written out on the second execution.12 The final function that we're going to cover in this class at this point is Close (Figure block.03). As its name suggests, this function closes the quantum file and takes care of all the housekeeping required at that time. However, you generally won't call this function explicitly, because it is called automatically when the QuantumFile object is destroyed at the end of its scope, as you can see by looking at the QuantumFile destructor (Figure block.04).

The Close function for the QuantumFile class (from quantum\block.cpp) (Figure block.03)

codelist/block.03

The destructor for the QuantumFile class (from quantum\block.cpp) (Figure block.04)

codelist/block.04

Allowing the destructor to close the file automatically is easier and (more important) safer than closing the file explicitly, because it's not always easy to determine when all of the objects in the quantum file are inactive. Hence, I recommend allowing the compiler to take care of closing the file.

Flexible Flying

Now that we've taken a look at the member functions of the QuantumFile class that user programs are concerned with, let's do the same with the FlexArray class. In the process, we'll start to delve into the underlying layers of the quantum file implementation in more detail. Let's start with the interface for the FlexArray class, which is shown in Figure newquant.00.

The interface for the FlexArray class (from quantum\newquant.h) (Figure newquant.00)

codelist/newquant.00

As before, we'll postpone discussion of the member variables of this class until we see how they are used in the member functions. We'll start with the normal constructor for the FlexArray class, which is shown in Figure newquant.01.

The normal constructor for the FlexArray class (from quantum\newquant.cpp) (Figure newquant.01)

codelist/newquant.01

As you can see, this function simply calls an Open function in the same class to do all of the actual work. The purpose of this handoff is to allow the user to create a FlexArray via the default FlexArray constructor before opening the quantum file and fill in the details later via an explicit Open call. Now that I've cleared that up, let's take a look at the function that actually does the work, which is shown in Figure newquant.02.

The Open function for the FlexArray class (from quantum\newquant.cpp) (Figure newquant.02)

codelist/newquant.02

The header for this function shouldn't be too alarming. The first argument is simply the address of the quantum file object in which the FlexArray will be created. The second argument is the name under which the FlexArray will be stored in the quantum file's directory, which will make it accessible to another program or a later run of this program. The ModifiableElement type is just another name for a String, left over from a previous incarnation of that type. The third argument is the initial number of elements that the FlexArray will contain, and the last argument is the maximum number of elements to which the FlexArray will expand if necessary. If you look at the interface for this class, you'll notice that these last two arguments have default values of 1 and UINT_MAX-1 elements, respectively. Therefore, if you don't have any idea how large a FlexArray might need to be, you can omit these arguments and it will expand as needed.

Now we're up to the first statement inside the function, which appears innocent enough. It assigns a value to a member variable called m_MOA, which stands for "Main Object Array". This member variable can then be used to look up the name of the FlexArray in the quantum file directory, as well as to access a number of other attributes of the quantum file. But what is the type of m_MOA, and how does it give us access to the quantum file attributes?

To answer these questions, we're going to have to get into some fairly complex implementation and language issues. We'll start this excursion with a look at the interface for the MainObjectArrayPtr class, shown in Figure newquant.03.

The interface for the MainObjectArrayPtr class (from quantum\newquant.h) (Figure newquant.03)

codelist/newquant.03

You will notice that this class has an embedded class called MainObjectArray defined within it, and that the embedded class has quite a few member functions. You may also notice that the MainObjectArrayPtr class itself has only a few member functions, most of which are the standard "structural" member functions -- the normal and default constructors, the destructor, and the assignment operator. One of the constructors is the conversion function that constructs a MainObjectArrayPtr object from a QuantumFile pointer (used in the FlexArray Open function), and the others are the default constructor and copy constructor. However, there is one member function that is anything but standard: operator->. Why would we want to redefine that operator, and what can a custom version of operator-> do?

All Hope Abandon, Ye Who Enter Here?

Before we get through with this question, we will have entered a realm of C++ where relatively few have trespassed. I don't believe in adding complexity for the sake of complexity, so why would I use a feature of C++ that most C++ programmers have never dealt with?

Because we need this feature to improve the ease of use and reliability of programs using this quantum file access method implementation. A little history lesson is appropriate here so you can see that I'm not speaking from a solely theoretical perspective.

The first edition of this book included an implementation of the quantum file access method written in C.13 It had many of the features that the current implementation has, but it was much harder to use. The main problem was that the user program had to deal with memory allocation issues, including remembering to free memory for variables whose storage was assigned inside the quantum file code. In addition, the syntax of a quantum file "array" was much less convenient than dealing with a built-in array. It was obvious to me that resolving all of these issues so that the user could ignore the inner workings of the quantum file would greatly improve the usability of this algorithm.

Unfortunately, this isn't possible in C; however, a major impetus behind the creation of C++ was to make it possible for class library designers to provide such facilities. Since the quantum file access method would benefit greatly from such an approach, I decided to rewrite it in C++.

Rewriting History

Why am I using the word "rewrite", rather than "convert", "translate", or other euphemisms? Judging from the number of ads for "C/C++ programmers" I see in the newspapers, some employers have the notion that switching to C++ can be accomplished by changing source file extensions to ".cpp" and recompiling. After removing some syntax errors revealed by the stricter type checking of the C++ compiler, the program compiles and runs as it did before. What's so difficult about object-oriented programming? If you are one of these employers, or you have tried the above experiment yourself and now believe that you are a "C++ programmer", let me break the news to you gently: all you have accomplished is to switch to the C subset of C++, which has nothing to do with object-oriented programming. Virtually none of the code or design from the original quantum file implementation has survived the transition to object orientation unscathed, despite the fact that, according to an expert on object-oriented design, that original C implementation was "object-oriented"!

However, my loss (if that's what it was) can be your gain: I am going to break a long-standing tradition of object-oriented design literature by disclosing not only the final design but also many of the missteps, errors, and difficulties that ensued from my original determination to tackle this rather complex project. So, with no further ado, let's begin with an often neglected concept which is at the core of the implementation of this project: operator overloading.

Warning: Overload!

One of the most powerful mechanisms for hiding the details of implementation from the class user in C++ is operator overloading, which means defining (or in some cases redefining) the semantics of one or more of the standard operators +, -, =, and so forth, as they apply to class objects. For better or worse, the semantics of these operators cannot be changed as they apply to "intrinsic" data types, so the calculation 2+2 is always going to result in 4; this restriction is probably necessary, as the potential confusion could be horrendous.14 A good example of the most common type of use for this facility is the implementation of +, -, etc., for manipulating Complex class objects, following the rules of complex arithmetic. The ability to provide this sort of intuitive operation to the number-crunching user is starting to make C++ (with the proper class libraries, of course) a viable alternative to FORTRAN, long the dominant language for such users.

Hello, Operator?

As usual, it's probably best to start with a relatively simple example: in this case, we'll look at an example of overloading operator-.15 Consider the program in Figure overload1.

Overloading operator- (Figure overload1)

codelist/minus.00

As you can see, I've created a very simplified version of the dreaded Point class, used in innumerable textbooks to represent a point on the Euclidean plane. The result of running our sample program is 5, which is the distance between the Point (1,1) and the Point (4,5), calculated by the normal Euclidean calculation:

Distance = sqrt(xdiff*xdiff + ydiff*ydiff)

where xdiff is the difference between the x coordinates of the two Points, and ydiff is the difference between their y coordinates; sqrt, of course, represents the square root function.

The "magic" is in the definition of operator-, which is the syntax for redefining an operator; in this case it's the subtraction operator. When the compiler sees the expression x-y, it looks for a definition of operator- that is specified for class Point, taking an argument which is also of class Point. Since there is such a definition, the compiler generates a call to the code specified in that definition. Had there not been such a definition, a syntax error would have resulted, since the compiler doesn't have any built-in knowledge of how to subtract two Points.

The Mask of Arrow

Operator overloading doesn't have to be terribly complicated, as that example illustrates. However, there's a trick to the overloading of operator->. When we overload other operators, such as our earlier example of operator-, our code takes over the implementation of the operator completely. That is, the result of calling our operator is whatever we say it is; in the operator- example, it's the double result of the Euclidean distance formula. However, this is not true with operator->; in that case alone, a different scheme is followed by the compiler. Figure overload2 shows some code derived from an early version of the project. How does the compiler interpret it?16

Dangerous operator-> overloading (Figure overload2)

codelist/danger.00

The line X->Get(1); will be compiled as follows:

1. Since X is not a pointer to any type, its class definition is examined for a definition of operator->.

2. Since this definition is found, a call to that code is inserted in the function being compiled.

3. Then the return value of the operator-> code is examined. Since it is a pointer type (BlockPtr *, to be exact), the compiler continues by generating code to call BlockPtr->Get(int) with this equal to the result returned by operator->, which is the same BlockPtr that we started with.

Type-Safety First

So far, this is doing exactly what we wanted it to do. However, there is one serious problem with this method of implementing operator->, which is illustrated by what happens when the line Y->Get(1); is compiled. Unlike our previous example, Y is a pointer (to a BlockPtr); therefore, the compiler doesn't look for a definition of operator->, but merely generates code to call BlockPtr->Get(int) with this equal to Y. As a result, our overloaded operator is never called.

Much the same problem will occur when the line X.Get(1); is compiled. The compiler is happy to generate a call to BlockPtr.Get(int) with this being equal to the address of X; again, though, our custom operator-> code won't be executed.

The reason for this problem is that the compiler stops looking for operator-> overloading whenever an actual pointer is found. After all, if you already have a pointer in a place where a pointer is needed, you must have what you want! Obviously, we're going to have to come up with a type-safe solution, so we can't accidentally use the wrong syntax and end up with a nasty bug.17 The solution is to have not one, but two classes involved in the virtual memory mechanism, an "outer" (or handle) class and an "inner" (or body) class.18 The handle class has two primary functions: the initializing of a body class object and the overloading of operator-> to return a pointer to that body class object, which actually does the work. Figure overload3 presents this solution.

Type-safe operator-> overloading (Figure overload3)

codelist/safe.00

One interesting thing about this program is that the main program refers to the Get function in exactly the same way as it did before: in fact, the only difference visible to the class user is that the compiler is now able to prevent him (and us) from compiling the incorrect references as we did before.

However, there are some changes to the internals that deserve comment. First of all, in the class definition, the previous BlockPtr class has been renamed to Block and made a nested class of a new BlockPtr class. The reason for the renaming is to prevent unnecessary changes to the user's code; that was successful, as noted above. Another design decision whose justification is not quite as obvious is the use of a nested class rather than a freestanding class. Why is this appropriate?

Hidden Virtues

Since the purpose of the nested Block class is solely to add type safety, rather than to provide any visible functionality, nesting it inside the BlockPtr class reduces "name space pollution". That is, if the user needs a Block class for some other purpose, its name won't conflict with this one. Such potential conflicts are going to become more serious as the use of class libraries increases; we should do our part to "preserve the environment" when possible.19 The fact that the Block class is visible only to members of its enclosing class, BlockPtr, is responsible for the somewhat odd appearance of the member function declarations in Figure overload3. For example, what are we to make of the declaration "char BlockPtr::Block::Get(int p_Index)"?

Of course, we C++ programmers are used to "qualified" function names, such as Block::Get. But why do we need two qualifiers? Because the compiler knows the meaning of Block only in the context of BlockPtr; that's what prevents the global name pollution that would ensue if Block were declared outside of BlockPtr. This means that we could, for example, have another class called Block nested inside another enclosing class called NewBlockPtr. If that Block class had a Get member function, we could declare it as "char NewBlockPtr::Block::Get(int p_Index);", and the compiler would have no problem telling which Block::Get we meant. Similarly, an unadorned "Block::Get(int p_Index);" would be yet another completely distinct function.20 One more safety feature in this new arrangement is worthy of note: the constructor for Block is protected, so that the user can't accidentally create one and use it instead of referencing it through a BlockPtr object the way we intend. This isn't very likely anyway, since the user would have to specify BlockPtr::Block as the type, which isn't easy to do by accident; however, according to the principles of object-oriented programming, it is best to hide the internals of our classes whenever possible, so that class user code can't depend on those internals. That restriction makes it possible for us to improve the implementation of our classes without breaking user code.

Polite Pointing

Now that we've investigated the notion of redefining operator->, let's return to the MainObjectArrayPtr class that we were discussing a little while ago. We'll start with Figure newquant.04, which shows the code for the normal constructor for that class.

The normal constructor for MainObjectArrayPtr (from quantum\newquant.cpp) (Figure newquant.04)

codelist/newquant.04

This function is fairly simple. It creates a new MainObjectArray object using the normal constructor for that class. This is the "body" object that will actually do the work of both of these classes. Then it sets the reference count of that object to 1, indicating that there is one MainObjectArray "handle" object using that "body" object.

The general idea of reference counting is fairly simple, as most great ideas are (after you understand them, at least). It's inefficient to copy a lot of data whenever we set one variable to the same value as another; on the other hand, copying a pointer to the data is much easier. However, we have to consider how we will know when we can delete the data being pointed to, which will be when no one needs the data anymore.

One way to share data safely in such a situation is to write the constructor(s), destructor, and assignment operator to keep track of the number of objects that are using a particular body object, and when there are no more users of that object, to delete it.21 We'll see how this works in the case of the MainObjectArray body class and its MainObjectArrayPtr handle class; the implementation is much the same in the other handle/body classes in this program.

However, right now we're more interested in how we can use operator-> in the FlexArray::Open function (Figure newquant.02). After creating a member variable called m_MOA via the normal constructor of the MainObjectArrayPtr class, the next line in the Open function invokes MainObjectArrayPtr::operator-> (Figure newquant.06).

MainObjectArrayPtr::operator-> (from quantum\newquant.cpp) (Figure newquant.06)

codelist/newquant.06

Given the definition of operator-> in Figure newquant.06, how will the compiler interpret the expression m_MOA->FindObjectByName, which is used in the function FlexArray::Open?

1. Since m_MOA is not a pointer to any type, the interface of its class (MainObjectArrayPtr) is examined for a definition of operator->.

2. Since there is such a definition, a call to that code is inserted in the function being compiled.

3. Then the return value of the operator-> code is examined. Since it is a pointer type (MainObjectArray *, to be exact), the compiler continues by generating code to call MainObjectArray->FindObjectByName with this equal to the result returned by operator->, which is the address of the body object of the MainObjectArrayPtr that we started with. Of course, the same analysis applies to the other uses of the m_MOA object in the Open function as well as the other functions that use handle objects.

Now let's continue with the analysis of the Open function in the FlexArray class. The statement we've just examined calls a function called FindObjectByName, which returns an index into the main object array. If the name supplied in the call was in fact the name of an existing object in the main object array, then the returned value will be the element number of that object in the main object array. On the other hand, if the specified name was not found in the main object array, the returned value will be the special value NoObject. In that case, Open will use another member function of MainObjectArray, namely FindAvailableObject, to locate a free element in the main object array. Then it will call the CreateMainObject function to create a new main object which will use that free element.

At this point in the function, we have a valid index into the main object array for our FlexArray, whether it existed previously or was just created. Finally, we retrieve the current and maximum element counts for the FlexArray object.

Now that we have covered the Open function in the FlexArray class, there's only one other function in that class that could use a significant amount of explanation: the implementation of operator []. However, I'm going to postpone coverage of that function until the next chapter, where we'll be examining the somewhat complex task of making one of our data types look as much like an array as possible.

In the meantime, we should go over the rest of the member functions of the MainObjectArrayPtr class, so you can see how this typical handle class works.

The MainObjectArrayPtr class

We'll start with the default constructor (Figure newquant.07), which is pretty simple. In fact, it's exactly like the normal constructor (Figure newquant.04), except that it uses the default constructor of its body class to create its body object rather than the normal constructor of the body class.

The default constructor for MainObjectArrayPtr (from quantum\newquant.cpp) (Figure newquant.07)

codelist/newquant.07

The MainObjectArrayPtr Copy Constructor

The copy constructor (Figure newquant.08) isn't much more complicated. It copies the pointer to the main object array (body) object from the existing object to the same member variable in the newly created object. Then it increments the reference count because there is now one more user of that body object.

The copy constructor for MainObjectArrayPtr (from quantum\newquant.cpp) (Figure newquant.08)

codelist/newquant.08

The MainObjectArrayPtr::operator = Function

The next function, MainObjectArrayPtr::operator = (Figure newquant.09), is a bit more complicated than the previous ones, because it has to account for the reference counts in both the current object and the right-hand object being copied from.

MainObjectArrayPtr::operator= (from quantum\newquant.cpp) (Figure newquant.09)

codelist/newquant.09

We start by decrementing the reference count of the current body object, because we won't be pointing to it after this operation (unless the right-hand object happens to point to the same body object as ours does). Then we check whether the reference count is less than or equal to 0, in which case we can (and will) delete the current body object.22 After taking care of that issue, we copy the right-hand object's main object array pointer to the corresponding member variable in our object and increment the reference count for that main object array. Finally, we return our object to the caller.

The MainObjectArrayPtr Destructor

The final function in the MainObjectArrayPtr class is the destructor (Figure newquant.10).

The destructor for the MainObjectArrayPtr class (from quantum\newquant.cpp) (Figure newquant.10)

codelist/newquant.10

This shouldn't seem at all mysterious after our analysis of the operator = function, as its operation is basically a subset of what that function does. That is, it decrements the reference count of its body object; if the resulting reference count is less than or equal to 0, it deletes the body object.

Now that we're through with the MainObjectArrayPtr class, we'll examine its body class, MainObjectArray, so we can see how this fundamental data type of the quantum file algorithm works.23

The MainObjectArrayPtr::MainObjectArray class

We'll skip over the default constructor and destructor, as these functions don't do anything particularly interesting. However, the normal constructor for this class does some initialization that we should pay attention to, as shown in Figure newquant.11.

The normal constructor for the MainObjectArray class (from quantum\newquant.cpp) (Figure newquant.11)

codelist/newquant.11

As you can see, this constructor begins by checking whether its argument is non-null; if so, it continues by initializing a member variable to the quantum file pointer provided by that argument. Then it uses that quantum file pointer to retrieve the number of main objects in the quantum file as well as the number of blocks that the main object list occupies. Then it resizes its block pointer list to that count and initializes each of the elements in the list to the appropriate block pointer. We'll see exactly how those block pointers work when we get back to the QuantumFile class, but for now we'll just assume that they allow access to the individual blocks that make up the main object list.

The MainObjectArray::Get Function

Now let's take a look at the Get function in this class (Figure newquant.12).

The MainObjectArray::Get function (from quantum\newquant.cpp) (Figure newquant.12)

codelist/newquant.12

This function is reasonably straightforward. It begins by checking the index of the element in the main object array that the caller wants to retrieve. If it is out of range, the return value is NoObject, indicating an error. However, assuming that the index is valid, the function calculates which block and element within that block corresponds to the requested entry. It then retrieves that element via the MainObjectBlock::Get function, which is called for the m_BlockPtr variable for that block.24 Finally, the result of the Get function, which is the quantum number of the big pointer quantum of the object in question, is returned to the calling function.

The MainObjectArray::Set Function

The next function we'll cover, MainObjectArray::Set (Figure newquant.13), is the counterpart to Get.

The MainObjectArray::Set function (from quantum\newquant.cpp) (Figure newquant.13)

codelist/newquant.13

This function takes two arguments, the element number in the main object array and the quantum number of that object's big pointer quantum. After checking that the element number is valid, it calculates the block number and element number in which the desired entry will be found, then calls the MainObjectBlock::Set function for the m_BlockPtr variable of the appropriate block in the main object array. Finally, if the entry being stored is equal to NoObject and the element number of this element is less than the current "lowest free object", then we update that variable to indicate that we should start looking at this place in the main object array the next time we are searching for a free object.

The MainObjectArray::FindAvailableObject Function

The next function we'll cover, MainObjectArray::FindAvailableObject (Figure newquant.14), is used to find an available entry in the main object array when we want to create a new object.

The MainObjectArray::FindAvailableObject function (from quantum\newquant.cpp) (Figure newquant.14)

codelist/newquant.14

This isn't terribly complicated either. It searches the main object list block by block, starting with the entry corresponding to the last known value of the "lowest free object" variable and continuing until it finds an empty entry (i.e., one whose value is NoObject). Once it finds such an entry, it calculates the corresponding element number in the main object array and returns it as the result after storing it in the "lowest free object" variable.25

The MainObjectArray::GetModifiableElement Function

The next function we'll cover, MainObjectArray::GetModifiableElement (Figure newquant.15), is used to retrieve a ModifiableElement value from a main object.

The MainObjectArray::GetModifiableElement function (from quantum\newquant.cpp) (Figure newquant.15)

codelist/newquant.15

Among its many uses, GetModifiableElement is the function that a FlexArray uses to retrieve values, so it is quite important to the functioning of our test program. Although it is somewhat longer than most of the other functions we've seen so far, it's really not too complicated, and it will serve as a good introduction to the more complex functions in this class.

We start by using the Get member function to look up the big pointer array quantum number for the object in which the element is contained. Once we have that quantum number, we call the function QuantumFile::MakeBigPointerBlockPtr to create a handle object that will allow us to access that quantum as a big pointer array. Next, we calculate which big pointer array entry contains the quantum number of the little pointer block that contains the element reference that we'll use to access the actual data. Once we know that, we can call the BigPointerBlock::GetBigArrayElement function to retrieve the quantum number for that little pointer block. Then we call the function QuantumFile::MakeLittlePointerBlockPtr to create a handle object that will allow us to access that quantum as a little pointer array. Once we have done that, we calculate which element in the little pointer array is the element reference to the element we are interested in, and retrieve that element reference from the little pointer array.

Now we're ready to access the element itself, assuming that it actually exists. First, though, we have to account for the possibility that it is a null item, that is, one that contains no data. If this is the case, we return an empty (i.e., default-constructed) ModifiableElement. Why have I handled this case separately, rather than storing a zero-length ModifiableElement in the leaf block?

A null item is an optimization: we could treat items containing no data like all other items. However, special handling for this case is worthwhile, since it is very common. For example, a word processor might store each line of a file as a string, removing the trailing null from each string before storing it; with that approach, strings consisting only of a trailing null would be stored as null items. The advantage of using null items is that, rather than occupying an item index slot indicating a length of zero, they are represented by item references that contain a special quantum number called NO_QUANTUM and a special item number called NULL_ITEM_NUMBER, neither of which can appear in a real item reference. This means that a null item does not occupy any space in a item index, and that we can avoid loading a quantum to retrieve its value.

However, let us suppose that the element reference is not null. In that case, we call QuantumFile::MakeLeafPointerBlockPtr to create a handle object that will allow us to access items in the quantum containing the desired item. Then we use QuantumBlock::GetModifiableItem to retrieve the actual item and return it to the caller.

The MainObjectArray::PutElement Function

The next function we'll cover, MainObjectArray::PutElement, shown in Figure newquant.16, is used to store a ModifiableElement value into a main object.

The MainObjectArray::PutElement function (from quantum\newquant.cpp) (Figure newquant.16)

codelist/newquant.16

This starts out by using a preprocessor macro called qfassert to check whether the size of the item to be stored is legal, i.e., that it is less than the maximum allowable item size.26 After that, the code is very similar to its counterpart, GetModifiableElement, until we retrieve the item reference from the little pointer array. Then, if the item reference is not null, we call QuantumBlock::DeleteItem to delete the current contents of that element. Then we check whether the size of the element to be stored is zero; if so, we store a null reference in the little pointer array (as noted previously).

On the other hand, if we have some data to be stored, we have to find a place to store it. The most logical place to look is in the quantum that we have most recently used to store something for this object, and this information is therefore maintained by the big pointer block access code in the BigPointerBlock class.27 Therefore, we check whether there may be enough free space in that quantum by calling QuantumFile::QGetFreeSpace.28 Because a free space code is only one byte (to reduce the size of the free space list), it cannot exactly represent the amount of free space in a quantum. Therefore, if the free space list entry indicates that this quantum may have enough space for the item, we have to retrieve the quantum via QuantumFile::MakeLeafBlockPtr and then call QuantumBlock::CalculateFreeSpace to calculate its free space more precisely. If the actual free space is larger than the size of the item, then we have found the quantum we will use to store the item.

However, if that quantum does not have enough space to store the item, then we will have to search for one that does. That is handled by QuantumFile::FindSpaceForItem, which returns the number of the first quantum that fits the bill.

At this point, we have a quantum in which to store the item. Therefore, after setting the "last quantum added to" member variable, we set the quantum number in the element reference to that quantum number. Then we call QuantumBlock::AddItem to add the item to that quantum, and set the item number in the element reference to the value returned from that function. Finally, we update the little array pointer that points to the newly added element.

The MainObjectArray::GetMainObjectElementCount Function

The next function, MainObjectArray::GetMainObjectElementCount (Figure newquant.17), is used to retrieve the number of elements that a main object currently contains.

The MainObjectArray::GetMainObjectElementCount function (from quantum\newquant.cpp) (Figure newquant.17)

codelist/newquant.17

You might think that this would be a trivial "get" function that merely returns the value of a member variable, but that is not the case. The reason is that this information is stored in the quantum file itself, as part of the big pointer array, because it needs to be maintained between executions of this program (or any other program that might want to use the file).

However, it's not too complicated. After using the Get function to retrieve the quantum number of the big pointer array for the object, it uses QuantumFile::MakeBigPointerBlockPtr to provide access to the big pointer array. Then it calls BigPointerBlock::GetBigArrayElementCount to retrieve the value to be returned.

The MainObjectArray::GetMainObjectMaxElementCount Function

The next function, MainObjectArray::GetMainObjectMaxElementCount (Figure newquant.18), is used to retrieve the maximum number of elements that a main object is permitted to contain. Because it is exactly like the previous function except for the function it calls in the BigPointerBlock class, I'll just list it without further comment.

The MainObjectArray::GetMainObjectMaxElementCount function (from quantum\newquant.cpp) (Figure newquant.18)

codelist/newquant.18

The MainObjectArray::CreateMainObject Function

The next function we'll cover, MainObjectArray::CreateMainObject, shown in Figure newquant.19, is used to create a new main object.

The MainObjectArray::CreateMainObject function (from quantum\newquant.cpp) (Figure newquant.19)

codelist/newquant.19

This function is somewhat complex, but that should be understandable because it takes quite a bit of work to create a new main object. We start off by checking whether the user has asked to create a zero-length object. Such an object would cause complications in the way that the size of the little pointer array is calculated; namely, the calculation of the number of blocks needed to hold the little pointer arrays (LittlePointerBlockCount) produces correct results for all values of the element count that are greater than 0 but produces a nonsensical result when the element count is 0. Because arrays can grow in size, a zero-element array may be a reasonable thing to create, so I didn't want to disallow them; the simplest solution, and the one I adopted, was to treat a request to create a zero-element array as a request for a one-element array. Because a zero-length object isn't very useful until it has been resized, this isn't likely to surprise the user.

Once that detail has been handled, we compute a free space entry indicating that this block is no longer available for use. Then we continue by calling QuantumFile::FindEmptyBlock to find a block for the big pointer array, then calling QuantumFile::MakeBigPointerBlockPtr to allow access to that block as a big pointer block.

Now we are ready to begin setting up the big pointer block for use. We start by calling BigPointerBlock::Clear to erase any previous data in the block.29 Next, we call BigPointerBlock::SetQuantumType to set the type of the quantum to indicate that it contains a big pointer array; this information is used for consistency checking when accessing a quantum that is supposed to be a big pointer array quantum.30 Then we call BigPointerBlock::SetMainObjectNumber to set the entry in the big pointer block that indicates which main object it belongs to. This is needed when we are updating the free space list, because each free space entry includes the number of the main object for that quantum. This is relevant because we do not share a quantum among several main objects, to simplify the task of programs that recover as much data as possible from a quantum file that has become corrupted.

The next step is to set up the header for the big pointer array. This contains the current element count, the maximum element count, and the last quantum that we've added an element to; of course, the latter variable is initialized to NoQuantum, as we have not yet added any data to this new object. Once we have set the header up, we call QuantumBlock::AddItem to add it to the big pointer block, so that it will be accessible the next time we use the big pointer array.

Now it's time to initialize the little pointer array for this object and the big pointer array that allows us to access the little pointer array. To start this process, we calculate the number of little pointer blocks that we will need, which is dependent on the number of elements in the object and the number of item references that fit in one block. Once we have determined that number, we can create a big pointer array containing that number of quantum numbers, and a temporary little pointer array that we will use to initialize each block in the little pointer array.

We're up to the loop that initializes all of the blocks in the little pointer array. It begins by calling QuantumFile::FindEmptyBlock to find an empty block that we can use for the little pointer array. Then we call QuantumFile::SetFreeSpace to reserve this quantum for the current main object and mark it as full.31 Next, we set the current element of the big pointer array to the quantum number of the quantum where we are storing the current little pointer array block.

Now we're ready to initialize the little pointer block. We start by calling QuantumFile::MakeLittlePointerBlockPtr to allow us to treat the current quantum as a little pointer block. Once we have done that, we call LittlePointerBlock::Clear to initialize the quantum to an unused state, LittlePointerBlock::SetQuantumType to set the quantum type to "little pointer array" (for consistency checking while retrieving data), LittlePointerBlock::SetMainObjectNumber to allow us to update the free space list when adding data to this quantum, QuantumBlock::AddItem to add a section of the little pointer array to this quantum, and LittlePointerBlock::ClearLittleArray to initialize that section of the little pointer array to null item references.

One detail that might not be immediately obvious is why we have to call QuantumFile::SetFreeSpace again at the end of the loop. The reason is that the QuantumBlock::AddItem function recalculates the free space for the block to which it adds the item. Since we want to prevent any further data from being added to this block, we have to reset the free space manually (to "none available") to indicate that this block is full.

Once we have initialized all of the little pointer array sections, we're ready to finish setting up the big pointer array for use. The first step in that process is to add the big pointer array data we have just filled in to the big pointer quantum, via QuantumBlock::AddItem. Then we use QuantumFile::SetFreeSpace to reserve the big pointer quantum for our new main object and mark it full. Then we use the member function Set to set the big pointer array quantum number for our new main object. Finally, we call the member function PutElement to set up the directory entry for the new main object.

The MainObjectArray::FindObjectByName Function

The next function we'll cover, MainObjectArray::FindObjectByName (Figure newquant.20), is used to look up a main object in the "directory".

The MainObjectArray::FindObjectByName function (from quantum\newquant.cpp) (Figure newquant.20)

codelist/newquant.20

This one is fairly simple. It steps through the elements of the main object directory, looking for an element whose contents are equal to the argument to the function. If it finds such an element, it returns the index of that element, which is the object number of the object with that name. On the other hand, if it gets to the end of the main object directory without finding a match, it returns the value NoObject.

The MainObjectArray::GrowMainObject Function

The next function, MainObjectArray::GrowMainObject (Figure newquant.21), is used to increase the size of a main object when needed to accommodate more elements.

The MainObjectArray::GrowMainObject function (from quantum\newquant.cpp) (Figure newquant.21)

codelist/newquant.21

This is about as complicated as CreateMainObject. However, because it is quite similar to that function, we'll be able to shorten the explanation considerably. We start by retrieving the existing big pointer block for the object. Then we copy the big pointer array from that block to a new big pointer array. Next, we calculate the size of the new little pointer array needed to hold references to all the elements in the new, enlarged, object.

The next few lines of code might be a little mysterious if I don't explain a few details of the size calculation. To simplify the accounting needed to keep track of the sizes of all of the little pointer arrays, I've decided always to allocate a little pointer array to the largest size that will fit into a quantum. However, setting the accessible size of an array to the actual number of elements allocated for its little pointer arrays has the drawback that the user wouldn't be able to find out when the program accessed an element that is within the allocated size but outside the number of elements specified when creating the array. Therefore, I've crafted a compromise: when an array is created, I set the number of accessible elements to the number the user specifies32; however, if the array is increased in size, the new size is always equal to the total allocation for all little pointer arrays. This means that if an array is expanded by a reference to an element that was already allocated but was outside the specified array size, GrowMainObject simply resets the size to the number of elements contained in all little pointer arrays already extant and returns that value. Otherwise, a new little pointer array is allocated, and the big pointer array is updated to reflect the change.

Assuming that we actually do need to allocate some new little pointer blocks, we continue by increasing the size of the big pointer array to accommodate the new number of little pointer blocks. Then we allocate and initialize each of the new little pointer blocks in exactly the same way as we allocated and initialized the original little pointer blocks when we created the object. Then we delete the old big pointer array from the big pointer block, add the new big pointer array to the big pointer block, set the free space entry for the big pointer block to indicate that it is full, and return the new element count to the caller.

Now that we have reached the end of the discussion of the MainObjectArray class, we'll return to the QuantumFile class. This time, we're going to get into the member functions that are used in the implementation, since we've already dealt with those that are used by outside programs.

The QuantumFile class

We'll skip over the QuantumFile member functions that fall in three categories:

1. Those that merely compute the number of blocks occupied by a segment of the quantum file (i.e., GetMainObjectBlockCount and QGetFreeSpaceListBlockCount).

2. Those that merely return a block of the appropriate type to be accessed as a little pointer block, big pointer block, and the like (i.e., MakeFreeSpaceBlockPtr, MakeMainObjectBlockPtr, MakeLittlePointerBlockPtr, MakeBigPointerBlockPtr, MakeLeafBlockPtr).

3. Those that merely call member functions of the FreeSpaceArray class (i.e., SetFreeSpace, QGetFreeSpace, FindEmptyBlock, FindSpaceForItem).

We'll see how the functions in the third category work when we examine the underlying functions that they call.

The QuantumFile::SetModified Function

However, that still leaves us with several functions to discuss in the QuantumFile class. We'll start with QuantumFile::SetModified, which is shown in Figure block.06.

The QuantumFile::SetModified function (from quantum\block.cpp) (Figure block.06)

codelist/block.06

This is actually a fairly simple function. It calls a global function called FindBuffer to look up the specified quantum number in the list of quantum buffers.33 If the quantum with that quantum number is in fact in one of the quantum buffers, then the "buffer modified" flag for that buffer is set, and the function returns. The only complication is a consistency check in the form of an "assert" that the quantum in question is actually in one of the buffers. If that is not the case, then we have a serious problem, because the program should not be trying to set the modified flag for a quantum that isn't in memory: that indicates that the buffer that contained the quantum has been reused too soon.

The QuantumFile::Position Function

The next function we will examine is QuantumFile::Position, which is shown in Figure block.07.

The QuantumFile::Position function (from quantum\block.cpp) (Figure block.07)

codelist/block.07

This function is used by the Read and Write functions to position the file pointer to the correct place for reading or writing a quantum. This is quite simple because every block is the same size. However, it is important to remember that the quantum file contains data other than the quanta themselves, so the block number of a quantum is not the same as its quantum number. To convert between these two numbers is the responsibility of the function that computes the block number supplied to Position.

The QuantumFile::Read Function

The next function we will examine is QuantumFile::Read, which is shown in Figure block.08.

The QuantumFile::Read function (from quantum\block.cpp) (Figure block.08)

codelist/block.08

After positioning the file pointer to the location in the file where this block is stored, we read it into the buffer. Then we set the modified flag for the buffer to FALSE, indicating that the disk and memory versions of the block are the same. Finally, we increment the read count, which is used for performance analysis.

The QuantumFile::Write Function

The next function we will examine is QuantumFile::Write, which is shown in Figure block.09. This is identical to the Read function, except of course that it writes the data to the file rather than reading it.

The QuantumFile::Write function (from quantum\block.cpp) (Figure block.09)

codelist/block.09

The QuantumFile::MakeBlockResident Function

The last function in this class is QuantumFile::MakeBlockResident. This is where we rejoin the thread of discussion about operator->. You see, MakeBlockResident is called during the execution of operator-> for any of the BlockPtr classes.34 Figure mbr1 shows an early implementation of this routine.

Early MakeBlockResident code (Figure mbr1)

codelist/mbr1.00

This is a fairly straightforward implementation of a least recently used (LRU) priority algorithm, but there are a few wrinkles to reduce overhead. First, we use the reference parameter p_OldBufferNumber to retrieve the buffer number (if any) in which the searched-for block was located on our last attempt. If the number of the block held in that buffer matches the one we're looking for, we will be able to avoid the overhead of searching the entire buffer list. The reason that p_OldBufferNumber is a reference parameter is so that we can update the caller's copy when we locate the block in a different buffer; that way, the next time that MakeBlockResident is called to retrieve the same block's address, we can check that buffer number first.

In order to make this work, we can't implement the LRU priority list by moving the most recently used block to the front of the block list; the saved buffer number would be useless if the blocks moved around in the list every time a block other than the most recently used one was referenced. Instead, each slot has an attached timestamp, updated by calling the time function every time the corresponding block is referenced. When we need to free up a buffer, we select the one with the lowest (i.e., oldest) timestamp. If that buffer has the "modified" attribute set, then it has been updated in memory and needs to be written back to the disk before being reused, so we do that. Then we read the new block into the buffer, update the caller's p_OldBufferNumber for next time, and return the address of the buffer.

This seems simple enough, without much scope for vast improvements in efficiency; at least, it seemed that way to me, until I ran Turbo ProfilerTM on it and discovered that it was horrendously inefficient. The profiler indicated that 73% of the CPU time of my test program was accounted for by calls to the time routine! To add insult to injury, the timing resolution of that routine is quite coarse (approximately 18 msec), with the result that most of the blocks would often get the same timestamp when running the program normally.35 Upon consideration, I realized that the best possible "timestamp" would be a simple count of the number of times that MakeBlockResident was called. This would entail almost no overhead and would be of exactly the correct resolution to decide which block was least recently used; this is the mechanism used in the current version.

Punching the Time Clock

One interesting consideration in the original design of this mechanism was what size of counter to use for the timestamp. At first, it seemed necessary (and sufficient) to use a 32-bit type such as an unsigned long, which would allow 4 billion accesses before the counter would "turn over" to 0. However, because the original implementation was compiled with a 16-bit compiler, the natural size to use would be an unsigned (meaning unsigned short) variable. Therefore, I had to decide whether a longer type was necessary.

After some thought, I decided that it wasn't. The question is: what happens when the counter turns over? To figure this out, let's do a thought experiment, using two-byte counters. Suppose that the priority list looks like Figure timestamp1, with the latest timestamp being 65535.

Timestamps before turnover (Figure timestamp1)

Block number    Timestamp

14 65533

22 65535

23 65000

9 65100

Let's suppose that the next reference is to block 9, with the counter turning over to 0. The list will now look like Figure timestamp2.

Timestamps immediately after turnover (Figure timestamp2)

Block number    Timestamp

14 65533

22 65535

23 65000

9 0

The next time that a new block has to be loaded, block 9 will be replaced, instead of block 23, which is actually the least recently used block. What effect will this have on performance? At first glance, it doesn't appear that the maximum possible effect could be very large; after all, each turnover would only cause each buffer to be replaced incorrectly once. If we have 100 buffers (a typical number), the worst case would be that the "wrong" buffer is replaced 100 times out of 64K, which is approximately 1.5% of the time; with fewer buffers, the effect is even smaller. There is no danger to the data, since the buffers will be written out if they have changed. I suspected that the cost (under a 16-bit compiler) of handling a unsigned long counter instead of an unsigned short on every call to MakeBlockResident would probably be larger than the cost of this inefficient buffer use, but it didn't appear important either way.

Getting Our Clocks Cleaned

Although the preceding analysis was good enough to convince me not to worry about the counter turning over, unfortunately it wasn't good enough to convince the machine. What actually happened was that after a large number of block references, the program started to run very slowly. I was right that the data wasn't in danger, but performance suffered greatly. Why? Let's go back to our example. Which buffer would be replaced when the next block needed to be read in? The one currently holding block 9, since it has the "lowest" priority. If that block number happened to be 32, for example, that would leave us with the arrangement in Figure timestamp3.

Timestamps shortly after turnover (Figure timestamp3)

Block number    Timestamp

14 65533

22 65535

23 65000

32 1

The problem should now be obvious: the newest block still has the "lowest" priority! The reason that the program started to run very slowly after turnover was that the "fossil" timestamps on the old blocks were preventing them from being reused for more active blocks, so every block that had to be read in had to share buffers with the ones that had been read in after turnover. The solution was fairly simple; on turnover, I set all of the timestamps to 0 to give every buffer the same priority. This isn't really optimal, since it doesn't preserve the relative priority of the blocks already in memory; however, it has the virtue of simplicity, and does reduce the problem to the fairly insignificant level indicated by my first analysis of the turnover problem.

Speed Demon

Is this the end of our concern for the MakeBlockResident routine? Not at all; as befits its central role in the virtual memory mechanism, this routine has undergone quite a few transformations during the development process. One attempt to speed it up took the form of creating a FastResidenceCheck routine that would have the sole purpose of checking whether the old buffer number saved from the previous call to load the same block number was still good; if so, it would return that buffer number after resetting the timestamp. The theoretical advantage of splitting this function off from the more general case was that such a routine might be simple enough to be inlined effectively, which would remove the overhead of one function call from the time needed to make sure that the block in question was memory resident. Unfortunately, this measure turned out to be ineffective; one reason was that the routines that called MakeBlockResident typically didn't reuse the object where the former buffer number was saved, but had to create another one every time they were called by their client routines. Therefore, the attempt to "remember" the previous buffer number wasn't successful in most cases.

While FastResidenceCheck was in use, it suffered from a bug caused by improperly initializing the old buffer number to 0, a valid buffer number. The result of this error was that when a block happened to be loaded into buffer number 0, operator-> didn't initialize the pointer to the ItemIndex array, since the new buffer number "matched" the old buffer number. This problem would have been solved anyway by the new versions of operator->, which always initialize any pointers that might need to be updated; after the attempt to avoid apparently redundant initializations of these pointers caused a couple of bugs, I decided that discretion was the better part of optimization. As a result of this change, we no longer have to inform the caller of the buffer number that this block contains, so the reference argument to MakeBlockResident for that purpose has been removed.

Virtual Perfection

The acid test of this virtual memory mechanism is to run the program with only one block buffer; unsurprisingly, this test revealed a nasty bug. It seems that I had neglected to initialize the value of the EarliestStamp variable, used to keep track of the buffer with the earliest timestamp. When running with only one buffer, it was possible under some circumstances for a block to be replaced before it was ever used; when this happened, the timestamp on the buffer it had occupied was left set to its initial value of ULONG_MAX. This initial value was significant because the search for the earliest timestamp also starts out by setting the TimeStamp variable to ULONG_MAX, which should be greater than any timestamp found in the search. If there were no blocks in the list with a "real" timestamp, the conditional statement that set the EarliestStamp value in the search loop was never executed. As a result, the EarliestStamp variable was left in an uninitialized state, which caused a wild access to a nonexistent block buffer. The fix was to initialize EarliestStamp to 0, so the first buffer will be selected under these circumstances; you can see this implemented in the current version of MakeBlockResident (Figure block.05).

The QuantumFile::MakeBlockResident function

The MakeBlockResident function (from quantum\block.cpp) (Figure block.05)

codelist/block.05

We've already covered most of the tricky parts of this function, but let's go over it one more time just to be on the safe side. We begin by incrementing the counter that serves as a timestamp; if it turns over to 0, we set all of the timestamps to 0 to prevent the newest block from being replaced continually.36 Then we look up the buffer number for the quantum in question, via the FindBuffer global function. If the quantum is found in a buffer, then we merely reset the timestamp on that buffer to indicate that it is the most recently used, and return a pointer to that buffer for use by the operator-> function that called MakeBlockResident.

However, in the event that the quantum we want isn't yet in a buffer, then we have to figure out which buffer it should be read into. We do this by examining the timestamp for each buffer, and selecting the buffer with the earliest timestamp.

Once we have found the buffer that we are going to reuse, we check whether it has been modified; if so, we write it back to the disk. Then we set its block number to the block number of the new quantum that is being read in, set its modified flag to FALSE, set the timestamp on the buffer to the current timestamp, and read the quantum into the buffer. Finally, we return a pointer to the buffer for use by the calling operator-> function.

The Ptr-Patter of Little Blocks

Now that we've seen how MakeBlockResident works, let's see how it is used in one of the "BlockPtr" classes. Because all of these classes work in much the same way, we will examine only one of them in any detail. I've chosen LittlePointerBlockPtr because it made for a better section title, but any of them would have done just as well. First, Figure blocki.00 shows the interface for this class.

The interface for the LittlePointerBlockPtr class (from quantum\blocki.h) (Figure blocki.00)

codelist/blocki.00

As you can see, this interface isn't very large. In fact, with only one member variable and three functions, it's the smallest one we'll see.37 However, that doesn't mean that it is completely without interest, as we'll discover. Let's start with the normal constructor, shown in Figure block.10.

The normal constructor for the LittlePointerBlockPtr class (from quantum\block.cpp) (Figure block.10)

codelist/block.10

This constructor is simple but unrevealing: to see the effects of the variables it is initializing, we will have to wait until we examine the LittlePointerBlock class. For now, let me just say that the block number is the number of the block where the little pointer block is stored, and the block manager is a pointer to the QuantumFile object that controls the quantum file where the little pointer block is stored. The block variable is a pointer to the buffer where the block is stored in memory; it is assigned a value in the next function we will examine, operator-> (Figure block.11).

The LittlePointerBlockPtr::operator-> Function

The LittlePointerBlockPtr::operator-> function (from quantum\block.cpp) (Figure block.11)

codelist/block.11

As we've seen in the prior discussion of operator->, the return value of this function will be used to point to the object for which the final function will be executed. In this case, that object will be the little pointer block object embedded in this LittlePointerBlockPtr object. The main purpose of this operator-> function is to make sure that the block that the little pointer block object refers to is in memory before we try to do anything with it. That's why QuantumFile::MakeBlockResident exists, and that's why we call it here. Once that function returns, we know that the little pointer block is resident in memory. At that point, we set the item index member variable to the address of the item index in the block, call LittlePointerBlock::SetLittlePointerArray to allow access to the little pointer array in the block, and return the address of our little pointer block object.

When the function that called our operator-> resumes execution, it will reapply the same function call to the pointer that we return. This time, because the return value is in fact a pointer rather than an object, the function call will go through to its final destination.38 This would seem to be a logical time to start our examination of how the body class, LittlePointerBlock, works. However, before we get to that class, we have quite a bit of ground to cover. You see, the block classes are arranged in an inheritance tree that begins with a very simple structure called Block, and builds in complexity from there. The best place to start is at the beginning, so that's what we'll do.

Just Up the Block

Figure blocki.01 shows the definition of the Block union.

The definition of the Block union (from quantum\blocki.h) (Figure blocki.01)

codelist/blocki.01

I don't use the union type very much. In fact, this Block type is the only example of a union in this book. That's because they can be the source of subtle errors if you don't keep track of the actual type of the data in a union very carefully. However, in this case it came in very handy because the different types of blocks have to be identified at run-time anyway; putting the logic that discriminates among the various types of blocks into the various block classes should make it reasonably safe, and in fact I haven't seen any problems from this implementation decision.

So what are all the types of data that can be stored in a Block? The simplest is an array of char taking up the entire block; this low-level type is used when we are copying a block or otherwise treating it as simply a bunch of undifferentiated data. The next possibility, a 64-element array of char, is used in the header block of a quantum file, and consists of a 64 character "version information string". This is used to determine whether the version of the quantum file implementation that wrote the file is the same as the current implementation, as described earlier. The next type is an array of QFreeSpaceEntry elements, which is used in the free space list classes. The next possible type is an array of MainObjectEntry elements, which is used in the main object classes. Finally, a Block can contain a QuantumBlockStruct, which is the data type used to access blocks that contain normal quanta.

The QuantumBlockStruct struct

The last of these types is actually a bit more complicated than it looks, as we'll see when we examine the definition of QuantumBlockStruct, shown in Figure blocki.02.

The QuantumBlockStruct struct (from quantum\blocki.h) (Figure blocki.02)

codelist/blocki.02

What makes this apparently innocent QuantumBlockStruct data type more mysterious than it looks? The clue is in the comment on the m_ItemIndex variable: this is actually a place holder for a variable-length array of item index elements. Although you cannot directly declare a variable-length array in C or C++, it is possible to use such a place holder to simplify the calculation of the beginning address of a variable-length array that is "mapped" onto a buffer, as we do when accessing data in a quantum block. We'll see how this works when we get to the first of the block classes that is used for accessing data in the normal quantum format rather than as one of the other types (e.g., free space entry or main object entries).

The QuantumBlockHeader struct

We should also take a look at the definition of QuantumBlockHeader, shown in Figure blocki.03.

The QuantumBlockHeader struct (from quantum\blocki.h) (Figure blocki.03)

codelist/blocki.03

This struct represents the data that is present at the beginning of every block that holds a normal quantum. It consists of three fields: the quantum type field, the object number field, and the item count field.

The quantum type field contains a value indicating the type of the quantum, such as "little pointer array", "big pointer array", or "leaf node". This information could be used to allow a deeper hierarchy of pointer types, but currently it is used only for consistency checking. That is, the top-level quantum pointed to by a main object entry must be a big pointer array quantum, the quantum pointed to by a big pointer array entry must be a little pointer array quantum, and the quantum pointed to by a little pointer entry must be a leaf node (i.e., one that contains user data).

The object number field contains the number of the main object to which this quantum belongs. This is used when we update the free space list, because we do not share a quantum among more than one main object.

The item count field represents the number of data items in this quantum, which is also the number of entries in the variable-length m_ItemIndex array.

The BaseBlockPtr class

Now that w