TOC PREV NEXT INDEX


I've just released a deduplicating backup product for
VMWare Workstation and Server.
Do you need to reduce the storage needed to maintain multiple backups?
Or do you need multiple snapshots on VMWare Server?
If you have either of these needs, just click
here to get started for only $29 (limited time offer ends July 31st, 2011)!

C++: A Dialog


CHAPTER 2 Hardware Fundamentals
Like any complex tool, the computer can be understood on several levels. For example, it's entirely possible to learn to drive an automobile without having the slightest idea how it works. The analogy with computers is that it's relatively easy to learn how to use a Web browser without having any notion of how such programs work. On the other hand, programming is much more closely analogous to designing an automobile than it is to driving one. Therefore, we're going to have to go into some detail about the internal workings of a computer, not at the level of electronic components, but at the lowest level accessible to a programmer.

This is a book on learning to program in C++, not on how a computer works.1 Therefore, it might seem better to start there and eliminate this detour, and indeed many (perhaps most) books on C++ do exactly that. However, in working out in detail how I'm going to explain C++ to you, I've come to the conclusion that it would be virtually impossible to explain why certain features of the language exist and how they actually work, without your understanding how they relate to the underlying computer hardware.

I haven't come to this position by pure logical deduction, either. In fact, I've worked backward from the concepts that you will need to know to program in C++ to the specific underlying information that you will have to understand first. I'm thinking in particular of one specific concept, the pointer, that is supposed to be extremely difficult for a beginning programmer in C++ to grasp. With the approach we're taking, you shouldn't have much trouble understanding this concept by the time you get to it in Chapter 7, "Creating a Homegrown string class". I'd be interested to know how you find my explanation, given the background that you'll have by that point. Don't hesitate to e-mail me about this topic (or any other, for that matter).

On the other hand, if you're an experienced programmer a lot of this will be just review for you. Nonetheless, it can't hurt to go over the basics one more time before diving into the ideas and techniques that make C++ different from other languages.2

Now let's begin with some definitions and objectives.

2.1. Definitions
A digit is one of the characters used in any positional numbering system to represent all numbers starting at 0 and ending at one less than the base of the numbering system. In the decimal system, there are ten digits, 0 - 9, and in the hexadecimal system there are sixteen digits, 0 - 9 and a - f.
A binary number system is one that uses only two digits, 0 and 1.
A hexadecimal number system is one that uses sixteen digits, 0 - 9 and a - f.
CPU is an abbreviation for central processing unit. This is the "active" part of your computer, which executes all the machine instructions that make the computer do useful work.
A machine instruction is one of the fundamental operations that a CPU can perform. Some examples of these operations are addition, subtraction, or other arithmetic operations; other possibilities include operations that control what instruction will be executed next. All C++ programs must be converted into machine instructions by a compiler before they can be executed by the CPU.
An assembly language program is the human-readable representation of a set of machine instructions; each assembly language statement corresponds to one machine instruction. By contrast, a C++ program consists of much higher-level operations which cannot be directly translated one-for-one into machine instructions.

2.2. Objectives of This Chapter

By the end of this chapter, you should
· Understand the programmer's view of the most important pieces of software in your computer.
· Understand the programmer's view of the most important pieces of hardware in your computer.
· Understand how whole numbers are stored in the computer.

2.3. Inside the Box

First we'll need to expand on the definition of hardware. As noted earlier, hardware means the physical components of a computer, the ones you can touch.3 Examples are the monitor, which displays your data while you're looking at it, the keyboard, the printer, and all of the interesting electronic and electromechanical components inside the case of your computer.4 Right now, we're concerned with the programmer's view of the hardware. The hardware components of a computer with which you'll be primarily concerned are the disk, RAM (Random Access Memory), and last but certainly not least, the CPU.5 We'll take up each of these topics in turn.

Disk

When you sit down at your computer in the morning, before you turn it on, where are the programs you're going to run? To make this more specific, suppose you're going to use a word processor to revise a letter you wrote yesterday before you turned the computer off. Where is the letter, and where is the word processing program?

You probably know the answer to this question; they are stored on a disk inside the case of your computer. Technically, this is a hard disk, to differentiate it from a floppy disk, the removable storage medium often used to distribute software or transfer files from one computer to another.6 Disks use magnetic recording media, much like the material used to record speech and music on cassette tapes, to store information in a way that will not be lost when power is turned off. How exactly is this information (which may be either executable programs or data such as word processing documents) stored?

We don't have to go into excruciating detail on the storage mechanism, but it is important to understand some of its characteristics. A disk consists of one or more circular platters, which are extremely flat and smooth pieces of metal or glass covered with a material that can be very rapidly and accurately magnetized in either of two directions, "north" or "south". To store large amounts of data, each platter is divided into many billions of small regions, each of which can be magnetized in either direction, independent of the other regions. The magnetization is detected and modified by recording heads, similar in principle to those used in tape cassette decks. However, in contrast to the cassette heads, which make contact with the tape while they are recording or playing back music or speech, the disk heads "fly" a few millionths of an inch away from the platters, which rotate at very high velocity in a vacuum.7

The separately magnetizable regions used to store information are arranged in groups called sectors, which are in turn arranged in concentric circles called tracks. All tracks on one side of a given platter (a recording surface) can be accessed by a recording head dedicated to that recording surface; each sector is used to store some number of bytes of the data, generally a few hundred to a few thousand. Byte is a coined word meaning a group of binary digits, or bits for short; there are 8 bits in a byte in just about every modern general-purpose computer system, including PCs and Macintoshes.8 You may wonder why the data aren't stored in the more familiar decimal system, which of course uses the digits from 0 through 9. This is not an arbitrary decision; on the contrary, there are a couple of very good reasons that data on a disk are stored using the binary system, in which each digit has only two possible states, 0 and 1. One of these reasons is that it's a lot easier to determine reliably whether a particular area on a disk is magnetized "north" or "south" than it is to determine 1 of 10 possible levels of magnetization. Another reason is that the binary system is also the natural system for data storage using electronic circuitry, which is used to store data in the rest of the computer.

While magnetic storage devices have been around in one form or another since the very early days of computing, the advances in technology just in the last 16 years have been staggering. To comprehend just how large these advances have been, we need to define the term used to describe storage capacities: the megabyte. The standard engineering meaning of mega is "multiply by 1 million", which would make a megabyte equal to 1 million (1,000,000) bytes. As we have just seen, however, the natural number system in the computer field is binary. Therefore, one megabyte is often used instead to specify the nearest round number in the binary system, which is 220 (2 to the power of 20), or 1,048,576 bytes. This wasn't obvious to Susan, so I explained it some more:

Susan: Just how important is it to really understand that the megabyte is 220 (1,048,576) bytes? I know that a meg is not really a meg; that is, it's more than a million. But I don't understand 220, so is it enough to just take your word on this and not get bogged down as to why I didn't go any further than plane geometry in high school? You see, it makes me worry and upsets me that I don't understand how you "round" a binary number.
Steve: 220 would be 2 to the power of 20; that is, 20 twos multiplied together. This is a "round" number in binary, just as 106 (1,000,000, or 6 tens multiplied together) is a round number in decimal.

Seventeen Years of Progress

With that detail out of the way, we can see just how far we've come in a relatively short time. In 1985, I purchased a 20 megabyte disk for $900 ($45 per megabyte) and its access time, which measures how long it takes to retrieve data, was approximately 100 milliseconds (milli = 1/1000, so a millisecond is 1/1000 of a second). In October 2001, a 40,000 megabyte disk cost as little as $130, or approximately 0.3 cent per megabyte; in addition to delivering almost 14000 times as much storage per dollar, this disk had an access time of 9 milliseconds, which is approximately 11 times as fast as the old disk. Of course, this significantly understates the amount of progress in technology in both economic and technical terms. For one thing, a 2001 dollar is worth considerably less than a 1985 dollar. In addition, the new drive is superior in every other measure as well. It is much smaller than the old one, consumes much less power, and has many times the projected reliability of the old drive.

This tremendous increase in performance and decrease in price has prevented the long-predicted demise of disk drives in favor of new technology. However, the inherent speed limitations of disks still require us to restrict their role to the storage and retrieval of data for which we can afford to wait a relatively long time.

You see, while 9 milliseconds isn't very long by human standards, it is a long time indeed to a modern computer. This will become more evident as we examine the next essential component of the computer, the RAM.

RAM

The working storage of the computer, where data and programs are stored while we're using them, is called RAM, which is an acronym for random access memory. For example, a word processor is stored in RAM while you're using it. The document you're working on is likely to be there as well unless it's too large to fit all at once, in which case parts of it will be retrieved from the disk as needed. Since we have already seen that both the word processor and the document are stored on the disk in the first place, why not leave them there and use them in place, rather than copying them into RAM?

The answer, in a word, is speed. RAM, which is sometimes called "internal storage", as opposed to "external storage" (the disk), is physically composed of millions of microscopic switches on a small piece of silicon known as a chip: a 4-megabit RAM chip has approximately 4 million of them.9 Each of these switches can be either on or off; we consider a switch that is on to be storing a 1, and a switch that is off to be storing a 0. Just as in storing information on a disk, where it is easier to magnetize a region in either of two directions, it's a lot easier to make a switch that can be turned on or off reliably and quickly than one that can be set to any value from 0 to 9 reliably and quickly. This is particularly important when you're manufacturing millions of them on a silicon chip the size of your fingernail.

A main difference between disk and RAM is what steps are needed to access different areas of storage. In the case of the disk, the head has to be moved to the right track (an operation known as a seek), and then we have to wait for the platter to spin so that the region we want to access is under the head (called rotational delay). On the other hand, with RAM, the entire process is electronic; we can read or write any byte immediately as long as we know which byte we want. To specify a given byte, we have to supply a unique number, called its memory address, or just address for short.

Memory Addresses

What is an address good for? Let's see how my discussion with Susan on this topic started.
Susan: About memory addresses, are you saying that each little itty bitty tiny byte of RAM is a separate address? Well, this is a little hard to imagine.
Steve: Actually, each byte of RAM has a separate address, which doesn't change, and a value, which does.
In case the notion of an address of a byte of memory on a piece of silicon is too abstract, it might help to think of an address as a set of directions to find the byte being addressed, much like directions to someone's house. For example, "Go three streets down, then turn left. It's the second house on the right". With such directions, the house number wouldn't need to be written on the house. Similarly, the memory storage areas in RAM are addressed by position; you can think of the address as telling the hardware which street and house you want, by giving directions similar in concept to the preceding example. Therefore, it's not necessary to encode the addresses into the RAM explicitly.

Susan wanted a better picture of this somewhat abstract idea:

Susan: Where are the bytes on the RAM, and what do they look like?
Steve: Each byte corresponds to a microscopic region of the RAM chip. As to what they look like, have you ever seen a printed circuit board such as the ones inside your computer? Imagine the lines on that circuit board reduced thousands of times in size to microscopic dimensions, and you'll have an idea of what a RAM chip looks like inside.
Since it has no moving parts, storing and retrieving data in RAM is much faster than waiting for the mechanical motion of a disk platter turning.10 As we've just seen, disk access times are measured in milliseconds, or thousandths of a second. However, RAM access times are measured in nanoseconds (ns); nano means one-billionth. In late 2001, a typical speed for RAM was 10 ns, which means that it is possible to read a given data item from RAM about 1,000,000 times as quickly as from a disk. In that case, why not use disks only for permanent storage, and read everything into RAM in the morning when we turn on the machine?

The reason is cost. In late 2001, the cost of 512 megabytes of RAM was approximately $120. For that same amount of money, you could have bought 37000 megabytes of disk space! Therefore, we must reserve RAM for tasks where speed is all-important, such as running your word processing program and holding a letter while you're working on it. Also, since RAM is an electronic storage medium (rather than a magnetic one), it does not maintain its contents when the power is turned off. This means that if you had a power failure while working with data only in RAM, you would lose everything you had been doing.11 This is not merely a theoretical problem, by the way; if you don't remember to save what you're doing in your word processor once in a while, you might lose a whole day's work from a power outage of a few seconds.12

Before we get to how a program actually works, we need to develop a better picture of how RAM is used. As I've mentioned before, you can think of RAM as consisting of a large number of bytes, each of which has a unique identifier called an address. This address can be used to specify which byte we mean, so the program might specify that it wants to read the value in byte 148257, or change the value in byte 66666. Susan wanted to make sure she had the correct understanding of this topic:

Susan: Are the values changed in RAM depending on what program is loaded in it?
Steve: Yes, and they also change while the program is executing. RAM is used to store both the program and its data.
This is all very well, but it doesn't answer the question of how the program actually uses or changes values in RAM, or performs arithmetic and other operations; that's the job of the CPU, which we will take up next.

The CPU

The CPU (central processing unit) is the "active" component in the computer. Like RAM, it is physically composed of millions of microscopic transistors on a chip; however, the organization of these transistors in a CPU is much more complex than on a RAM chip, as the latter's functions are limited to the storage and retrieval of data. The CPU, on the other hand, is capable of performing dozens or hundreds of different fundamental operations called machine instructions, or instructions for short. While each instruction performs a very simple function, the tremendous power of the computer lies in the fact that the CPU can perform (or execute) tens or hundreds of millions of these instructions per second.

These instructions fall into a number of categories: instructions that perform arithmetic operations such as adding, subtracting, multiplying, and dividing; instructions that move information from one place to another in RAM; instructions that compare two quantities to help make a determination as to which instructions need to be executed next and instructions that implement that decision; and other, more specialized types of instructions.13

Of course, adding two numbers together, for example, requires that the numbers be available for use. Possibly the most straightforward way of making them available is to store them in and retrieve them from RAM whenever they are needed, and indeed this is done sometimes. However, as fast as RAM is compared to disk drives (not to mention human beings), it's still pretty slow compared to modern CPUs. For example, the computer I'm using right now has a 500 megahertz (MHz) Pentium III CPU, which can execute an instruction in 2 ns.14

To see why RAM is a bottleneck, let's calculate how long it would take to execute an instruction if all the data had to come from and go back to RAM. A typical instruction would have to read some data from RAM and write its result back there; first, though, the instruction itself has to be loaded (or fetched) into the CPU before it can be executed. Let's suppose we have an instruction in RAM, reading and writing data also in RAM. Then the minimum timing to do such an instruction could be calculated as in Figure 2.1.

FIGURE 2.1. RAM vs. CPU speeds
Time Function
10 ns Read instruction from RAM
10 ns Read data from RAM
02 ns Execute instruction
10 ns Write result back to RAM
32 ns Total instruction execution time

To compute the effective speed of a CPU, we divide 1 second by the time it takes to execute one instruction.15 Given the assumptions in this example, the CPU could execute only about 31 MIPS (million instructions per second), which is a far cry from the 500 MIPS or more that we might expect.16 This seems very wasteful; is there a way to get more speed?

In fact, there is. As a result of a lot of research and development both in academia and in the semiconductor industry, it is possible to approach the rated performance of fast CPUs, as will be illustrated in Figure 2.12. Some of these techniques have been around for as long as we've had computers; others have fairly recently trickled down from supercomputers to microcomputer CPUs. One of the most important of these techniques is the use of a number of different kinds of storage devices having different performance characteristics; the arrangement of these devices is called the memory hierarchy. Figure 2.2 illustrates the memory hierarchy of my home machine. Susan and I had a short discussion about the layout of this figure.

Susan: OK, just one question on Figure 2.2. If you are going to include the disk in this hierarchy, I don't know why you have placed it over to the side of RAM and not above it, since it is slower and you appear to be presenting this figure in ascending order of speed from the top of the figure downward. Did you do this because it is external rather than internal memory and it doesn't "deserve" to be in the same lineage as the others?
Steve: Yes; it's not the same as "real" memory, so I wanted to distinguish it.
Before we get to the diagram, I should explain that a cache is a small amount of fast memory where frequently used data are stored temporarily. According to this definition, RAM functions more or less as a cache for the disk; after all, we have to copy data from a slow disk into fast RAM before we can use it for anything. However, while this is a valid analogy, I should point out that the situations aren't quite parallel. Our programs usually read data from disk into RAM explicitly; that is, we're aware of whether it's on the disk or in RAM, and we have to issue commands to transfer it from one place to the other. On the other hand, caches are "automatic" in their functioning. We don't have to worry about them and our programs work in exactly the same way with them as without them, except faster. In any event, the basic idea is the same: to use a faster type of memory to speed up repetitive access to data usually stored on a slower storage device.

We've already seen that the disk is necessary to store data and programs when the machine is turned off, while RAM is needed for its higher speed in accessing data and programs we're currently using.17 But why do we need the external cache?

Actually, we've been around this track before when we questioned why everything isn't loaded into RAM rather than read from the disk as needed; we're trading speed for cost. To have a cost-effective computer with good performance requires the designer to choose the correct amount of each storage medium.

So just as with the disk vs. RAM trade-off, the reason that we use the external cache is to improve performance. While RAM can be accessed about 100 million times per second, the external cache is made from a faster type of memory chip, which can be accessed about 250 million times per second. While not as extreme as the speed differential between disk and RAM, it is still significant.

However, we can't afford to use external cache exclusively instead of RAM because there isn't enough of it. Therefore, we must reserve external cache for tasks where speed is all-important, such as supplying frequently used data or programs to the CPU.

FIGURE 2.2. A memory hierarchy

The same analysis applies to the trade-off between the external cache and the internal cache. The internal cache's characteristics are similar to those of the external cache, but to a greater degree; it's even smaller and faster, allowing access at the rated speed of the CPU. Both characteristics have to do with its privileged position on the same chip as the CPU; this reduces the delays in communication between the internal cache and the CPU, but means that chip area devoted to the cache has to compete with area for the CPU, as long as the total chip size is held constant.

Unfortunately, we can't just increase the size of the chip to accommodate more internal cache because of the expense of doing so. Larger chips are more difficult to make, which reduces their yield, or the percentage of good chips. In addition, fewer of them fit on one wafer, which is the unit of manufacturing. Both of these attributes make larger chips more expensive to make.

How Caches Improve Performance

To oversimplify a bit, here's how caching reduces the effects of slow RAM. Whenever a data item is requested by the CPU, there are three possibilities.
1. It is already in the internal cache. In this case, the value is sent to the CPU without referring to RAM at all.
2. It is in the external cache. In this case, it will be "promoted" to the internal cache and sent to the CPU at the same time.
3. It is not in either the internal or external cache. In this case, it has to be entered into a location in the internal cache. If there is nothing already stored in that cache location, the new item is simply added to the cache. However, if there is a data item already in that cache location, then the old item is displaced to the external cache, and the new item is written in its place.18 If the external cache location is empty, that ends the activity; if it is not empty, then the item previously in that location is written out to RAM and its slot is used for the one displaced from the internal cache.19

How Registers Improve Performance

Another way to improve performance which has been employed for many years is to create a small number of private storage areas, called registers, that are on the same chip as the CPU itself.20 Programs use these registers to hold data items that are actively in use; data in registers can be accessed within the time allocated to instruction execution (2 ns in our example), rather than in the much longer time needed to access data in RAM. This means that the time needed to access data in registers is predictable, unlike data that may have been displaced from the internal cache by more recent arrivals and thus must be reloaded from the external cache or even from RAM. Most CPUs have some dedicated registers, which aren't available to application programmers (that's us), but are reserved for the operating system (e.g., DOS, Unix, OS/2) or have special functions dictated by the hardware design; however, we will be concerned primarily with the general registers intended for our use.21 These are used to hold working copies of data items called variables, which otherwise reside in RAM during the execution of the program. These variables represent specific items of data that we wish to keep track of in our programs.

The notion of using registers to hold temporary copies of variables wasn't crystal clear to Susan. Here's our discussion:

Susan: Here we go, getting lost. When you say, "The general registers are used to hold working copies of data items called variables, which reside in RAM", are you saying RAM stores info when not in use?
Steve: During execution of a program, when data aren't in the general registers, they are generally stored in RAM.
Susan: I didn't think RAM stores anything when turned off.
Steve: You're correct; RAM doesn't retain information when the machine is turned off. However, it is used to keep the "real" copies of data that we want to process but won't fit in the registers.22
You can put something in a variable and it will stay there until you store something else there; you can also look at it to find out what's in it. As you might expect, several types of variables are used to hold different kinds of data; the first ones we will look at are variables representing whole numbers (the so-called integer variables), which are a subset of the category called numeric variables. As this suggests, other variable types can represent numbers with fractional parts. We'll use these so-called floating-point variables later.

Different types of variables require different amounts of RAM to store them, depending on the amount of data they contain; a very common type of numeric variable, known as a short, as implemented by the compiler on the CD in the back of the book, requires 16 bits (that is, 2 bytes) of RAM to hold any of 65536 different values, from - 32768 to 32767, including 0.23 As we will see shortly, these odd-looking numbers are the result of using the binary system. By no coincidence at all, the early Intel CPUs such as the 8086 had general registers that contained 16 bits each; these registers were named ax, bx, cx, dx, si, di, and bp. Why does it matter how many bits each register holds? Because the number (and size) of instructions it takes to process a variable is much less if the variable fits in a register; therefore, most programming languages, C++ included, relate the size of a variable to the size of the registers available to hold it. A short is exactly the right size to fit into a 16-bit register and therefore can be processed efficiently by the early Intel machines, whereas longer variables had to be handled in pieces, causing a great decline in efficiency of the program.

Progress marches on: more recent Intel CPUs, starting with the 80386, have 32-bit general registers; these registers are called eax, ebx, ecx, edx, esi, edi, and ebp. You may have noticed that these names are simply the names of the old 16-bit registers with an e tacked onto the front. The reason for the name change is that when Intel increased the size of the registers to 32 bits with the advent of the 80386, it didn't want to change the behavior of previously existing programs that (of course) used the old names for the 16-bit registers. So the old names, as illustrated in Figure 2.2, now refer to the bottom halves of the "real" (that is, 32-bit) registers; instructions using these old names behave exactly as though they were accessing the 16-bit registers on earlier machines. To refer to the 32-bit registers, you use the new names eax, ebx, and so on, for extended ax, extended bx, and so forth.

We still have some way to go before we get to the end of our investigation of the hardware inside a computer, but let's pause here for a question from Susan about how much there is to know about this topic:

Susan: You said before that we were going to look at the computer at the "lowest level accessible to a programmer". Does it get any deeper than this?
Steve: Yes, it certainly does. There are entire disciplines devoted to layers below those that we have any access to as programmers. To begin with, the machine instructions and registers that we see as assembly language programmers are often just the surface manifestation of underlying code and hardware that operates at a deeper level (the so-called "microcode" level), just as C++ code is the surface manifestation of assembly language instructions.
Below even the lowest level of programmable computer hardware is the physical implementation of the circuitry as billions of transistors, and at the bottom we have the quantum mechanical level that allows such things as transistors in the first place. Luckily, the engineers who build the microprocessors (or actually, build the machines that make the microprocessors) have taken care of all those lower levels for us, so we don't have to worry about them.
Now that we've cleared that up, let's get back to the question of what it means to say that instructions using the 16-bit register names behave exactly as though they were accessing the 16-bit registers on earlier machines. Before I can explain this, you'll have to understand the binary number system on which all modern computers are based. To make this number system more intelligible, I have written the following little fable.

2.4. The Binary Number System

Once upon a time, the Acme company had a factory that made golf carts. One day, Bob, the President of Acme, decided to add an odometer to the carts so that the purchaser of the cart could estimate when to recharge the battery. To save money, Bob decided to buy the little numbered wheels for the odometers and have his employees put the odometers together. The minimum order was for a thousand odometer wheels, which was more than he needed for his initial run of 50 odometers. When he got the wheels, however, he noticed that they were defective. Instead of the numbers 0 - 9, each wheel had only two numbers, 0 and 1. Of course, he was quite irritated by this error and attempted to contact the company from which he had purchased the wheels, but it had closed down for a month for summer vacation. What was he to do until it reopened?

While he was fretting about this problem, the employee who had been assigned the task of putting the odometers together from the wheels came up with a possible solution. This employee, Jim, came into Bob's office and said, "Bob, I have an idea. Since we have lots of orders for these odometer-equipped carts, maybe we can make an odometer with these funny wheels and tell the customers how to read the numbers on the odometer."

Bob was taken aback by this idea. "What do you mean, Jim? How can anyone read those screwy odometers?"

Jim had given this some thought. "Let's take a look at what one of these odometers, say with five wheels, can display. Obviously, it would start out reading 00000, just like a normal odometer. Then when one mile has elapsed, the rightmost wheel turns to 1, so the whole display is 00001; again, this is just like a normal odometer."

"Now we come to the tricky part. The rightmost wheel goes back to 0, not having any more numbers to display, and pushes the `tens' wheel to 1; the whole number now reads 00010. Obviously, one more mile makes it 00011, which gives the situation shown in Figure 2.3."

FIGURE 2.3. The first few numbers

Normal odometer Funny odometer
00000 00000
00001 00001
00002 00010
00003 00011

Jim continued, "What's next? This time, the rightmost wheel turns over again to 0, triggering the second wheel to its next position. However, this time, the second wheel is already at its highest value, 1; therefore, it also turns over to 0 and increments the third wheel. It's not hard to follow this for a few more miles, as shown in Figure 2.4."

Bob said, "I get it. It's almost as though we were counting normally, except that you skip all the numbers that have anything but 0s or 1s in them."

FIGURE 2.4. The next few numbers

Normal odometer Funny odometer
00004 00100
00005 00101
00006 00110
00007 00111

"That's right, Bob. So I suppose we could make up a list of the `real' numbers and give it to the customers to use until we can replace these odometers with normal ones. Perhaps they'll be willing to work with us on this problem."

"Okay, Jim, if you think they'll buy it. Let's get a few of the customers we know best and ask them if they'll try it; we won't charge them for the odometers until we have the real ones, but maybe they'll stick with us until then. Perhaps any odometer would be better than no odometer at all."

Jim went to work, making some odometers out of the defective wheels; however, he soon figured out that he had to use more than five wheels because that allowed only numbers from 0 to 31. How did he know this?

Each wheel has two numbers, 0 and 1. So with one wheel, we have a total of two combinations. Two wheels can have either a 0 or a 1 for the first number, and for each position of the first wheel there are two possibilities for the second; 2*2 = 4, so there are a total of four combinations for two wheels.24 With three wheels, the same analysis holds: two numbers for the first wheel * two for the second wheel * two for the third wheel, or eight possibilities in all; actually, they are the same eight possibilities we saw in Figures 2.3 and 2.4.

A pattern is beginning to develop. For each added wheel, we get twice as many possible combinations. To see how this continues, take a look at Figure 2.525, which shows the count of combinations vs. the number of wheels for all wheel counts up to 16 (i.e., quantities that can be expressed by 16-bit numbers).

Jim decided that 14 wheels would do the job since the lifespan of the golf cart probably wouldn't exceed 16383 miles, and so he made up the odometers. The selected customers turned out to be agreeable and soon found that having even a weird odometer was better than none, especially since they didn't have to pay for it. However, one customer did have a complaint. The numbers on the wheels didn't seem to make sense when translated with the chart supplied by Acme.

The customer estimated that he had driven the cart about 9 miles, but the odometer displayed

11111111110111

which, according to his translation chart, was 16375 miles. What could have gone wrong?

Jim decided to have the cart brought in for a checkup and what he discovered was that the odometer cable had been hooked up backwards. That is, instead of turning the wheels forward, they were going backwards. That was part of the solution, but why was the value 16375? Just like a car odometer, in which 99999 (or 999999, if you have a six-wheel odometer) is followed by 0, going backwards from 0 reverses that progression. Similarly, the number 11111111111111 on the funny odometers would be followed by 00000000000000, since the "carry" off the leftmost digit is lost.

FIGURE 2.5. How many combinations?

Number of wheels Number of combinations
1 2
2
4
3
8
4
16
5
32
6
64
7
128
8
256
9
512
10
1024
11
2048
12
4096
13
8192
14
16384
15
32768
16
65536

Therefore, if you start out at 0 and go backward 1 mile, you'll get

11111111111111

The next mile will turn the last digit back to 0, producing

11111111111110

What happens next? The last wheel turns back to 1 and triggers the second wheel to switch as well, with the result

1111111111101

The next few "backward" numbers look like this:

11111111111100

11111111111011

11111111111010

11111111111001

11111111111000

11111111110111

and so on. If you look at the right-hand end of these numbers, you'll see that the progression is just the opposite of the "forward" numbers.

As for the customer's actual mileage, the last one of these is the number the customer saw on his backward odometer. Apparently, he was right about the distance driven, since this is the ninth "backward" number. So Jim fixed the backward odometer cable and reset the value to the correct number, 00000000001001, or 9 miles.

Eventually, Acme got the right odometer wheels with 0 - 9 on them, replaced the peculiar ones and everyone lived happily ever after.

THE END

Signed Variables

Since the wheels that made up the funny odometers contain only two digits, 0 and 1, the odometers use the binary system for counting. Now it should be obvious why we will see numbers like 65536 and 32768 in our discussions of the number of possible different values that a variable can hold; variables are stored in RAM as collections of bytes, each of which contains 8 bits.26 As the list of combinations indicates, 8 bits (1 byte) provide 256 different combinations, while 16 bits (2 bytes) can represent 65536 different possible values.

But what about the "backward" numbers with a lot of 1s on the left? As the fable suggests, they correspond to negative numbers. That is, if moving 2 miles forward from 0 registers as 00000000000010, and moving 2 miles backward from 0 registers as 11111111111110, then the latter number is in some sense equivalent to - 2 miles. This in fact is the way that negative integers are stored in the computer; integer variables that can store either positive or negative values are called signed variables. If we don't specify whether we want to be able to store either positive or negative values in a given variable, the C++ language assumes that we want that ability and provides it for us by default.

However, adding the ability to represent negative numbers has a drawback: you can't represent as many positive numbers. This should be fairly obvious, since if we interpret some of the possible patterns as negative, they can't also be used for positive values. Of course, we don't have to worry about negative numbers when counting, for example, how many employees there are working for our company; in such cases, we can specify that we want to use unsigned variables, which will always be interpreted as positive (or 0) values. An example is an unsigned short variable, which uses 16 bits (that is, 2 bytes) to hold any number from 0 to 65535, which totals 65536 different values.27 This capacity can be calculated as follows: since each byte is 8 bits, 2 bytes contain a total of 16 bits, and 216 is 65536.

It's important to understand that the difference between a short (that is, a signed short) and an unsigned short is not size, but which 65536 values each can hold. An unsigned short can hold any whole number from 0 to 65535, whereas a short can hold any value from - 32768 to +32767.28

I hope this is clear to you, but in case it isn't, let's see how Susan and I worked over this point.

Susan: I really don't think I understand what a short is besides being 2 bytes of RAM, and I don't really know what signed and unsigned mean.
Steve: A short is indeed 2 bytes (that is, 16 bits) of RAM, at least with our current compiler. This means that it can hold any of 216 (65536) different values. This is a very nice range of values for holding the number of pounds that a pumpkin weighs (for example). You'll see some more uses for this type of variable later.
The difference between a (signed) short and an unsigned short is exactly which 65536 values each can hold. An unsigned short can hold any whole number from 0 to 65535, whereas a (signed) short can hold any value from - 32768 to +32767. The difference between these is solely in the interpretation that we (and the compiler) give to the values. In other words, it's not possible to tell whether a given 2 bytes of RAM represent a short or an unsigned short by looking at the contents of those bytes; you have to know how the variable was defined in the program.
Susan: OK, let's start over. A short is 2 bytes of RAM. A short is a variable. A short is a numeric variable. It can be signed (why is that a default?), meaning its value can be - 32768 to +32767, or unsigned, meaning its value can be 0 - 65535. How's that?
Steve: That's fine. Since you've asked, the reason signed is the default is because that's the way it was in C, and changing it in C++ would "break" C programs that depend on this default. Bjarne Stroustrup, the inventor of C++, has a rule that C++ must be as close to C as possible but no closer. In this case, there's no real reason to change the default, so it wasn't changed.
Susan: Oh, why is it that every time you say something is fairly obvious, my mind just shuts down? When you say "if we interpret some of the possible patterns as negative, they can't also be used for positive values." Huh? Then if that is the case, would not the reverse also be true? I can see how this explains the values of the signed and unsigned short, but really, I don't think I have grasped this concept.
Steve: What I was trying to explain is that you have to choose one of the following two possibilities:
1. (signed) short range: - 32768 to +32767
2. unsigned short range: 0 to 65535
In other words, you have to decide whether you want a given variable to represent
1. Any of 65536 values from - 32768 to +32767, including 0.
2. Any of 65536 values from 0 to 65535
If you want a variable with a range like that in selection 1, use a (signed) short; if you prefer the range in selection 2, use an unsigned short. For example, for the number of lines in a text file, you could use an unsigned short, since the maximum number of lines could be limited to less than 65000 lines and couldn't ever be negative. On the other hand, to represent the number of copies of a book that have been sold in the last month, including the possibility of returns exceeding sales, a signed short would be better, since the value could be either positive or negative.29
Susan: In other words, if you are going to be using variables that might have a negative value, then use a signed short, and if you want strictly positive numbers, including 0 as "positive", then use an unsigned short. Right?
Steve: Exactly!
Susan: Well, then, how do you write a short to indicate that it is signed or unsigned?
Steve: When you define it, you have to specify that it is unsigned if you want it to be unsigned; the default is signed. In other words, if we define a variable x as short x;, it will be signed, whereas if we want a variable called x that is an unsigned short, we have to say unsigned short x;.
Susan: So does it make any difference if your variable is going to overlap the signed and unsigned short ranges? For example, if you are using numbers from 10000 to 30000, would it matter which short you used? It falls under the definition of both.
Steve: You can use whichever you wish in that case.

A (Very) Short Course in Hexadecimal Notation

You may have noticed that it's tedious and error prone to represent numbers in binary; a long string of 0s and 1s is hard to remember or to copy. For this reason, the pure binary system is hardly ever used to specify numbers in computing. However, we have already seen that binary is much more "natural" for computers than the more familiar decimal system. Is there a number system that we humans can use a little more easily than binary, while retaining the advantages of binary for describing internal events in the computer?

As it happens, there is. It's called hexadecimal, which means "base 16". As a rule, the term hexadecimal is abbreviated to hex. Since there are 16 possible combinations of 4 bits (2*2*2*2), hexadecimal notation allows 4 bits of a binary number to be represented by one hex digit. Unfortunately, however, there are only 10 "normal" digits, 0 - 9. To represent a number in any base, you need as many different digit values as the base, so that any number less than the base can be represented by one digit. For example, in base 2, you need only two digits, 0 and 1. In base 8 (octal), you need eight digits, 0 - 7.30 So far, so good. But what about base 16? To use this base, we need 16 digits. Since only 10 numeric digits are available, hex notation needs a source for the other six digits. Because letters of the alphabet are available and familiar, the first six letters, a - f, were adopted for this service.31,32

The notion of a base-16 numbering system can really throw someone who learned normal decimal arithmetic solely by rote, without understanding the concepts on which it is based. Susan and I spent quite a bit of time on it, starting with this discussion:

Susan: I don't get this at all! What is the deal with the letters in the hex system? I guess it would be okay if 16 wasn't represented by 10!
Steve: Well, there are only 10 "normal" digits, 0 - 9. To represent a number in any base, you need as many "digits" as the base, so that any number less than the base can be represented by one "digit". This is no problem with a base less than ten, such as octal, but what about base 16? To use this base we need 16 digits, 0 - 9 and a - f. One way to remember this is to imagine that the "hex" in "hexadecimal" stands for the six letters a - f and the "decimal" stands for the 10 digits 0 - 9.
Susan: OK, so a hex digit represents 16 bits? So then is hex equal to 2 bytes? According to the preceding a hex digit is 4 bits.
Steve: Yes, a hex digit represents 4 bits. Let's try a new approach. First, let me define a new term, a hexit. That's short for "hexadecimal digit", just like "bit" is short for "binary digit".
1. How many one-digit decimal numbers exist?
2. How many two-digit decimal numbers exist?
3. How many three-digit decimal numbers exist?
4. How many four-digit decimal numbers exist?
5. How many one-bit binary numbers exist?
6. How many two-bit binary numbers exist?
7. How many three-bit binary numbers exist?
8. How many four-bit binary numbers exist?
9. How many one-hexit hexadecimal numbers exist?
10. How many two-hexit hexadecimal numbers exist?
11. How many three-hexit hexadecimal numbers exist?
12. How many four-hexit hexadecimal numbers exist?
The answers are:
1. 10
2. 100
3. 1000
4. 10000
5. 2
6. 4
7. 8
8. 16
9. 16
10. 256
11. 4096
12. 65536
What do all these answers have in common? Let's look at the answers a little differently, in powers of 10, 2, and 16, respectively:
1. 10 = 101
2. 100 = 102
3. 1000 = 103
4. 10000 = 104
5. 2 = 21
6. 4 = 22
7. 8 = 23
8. 16 = 24
9. 16 = 161
10. 256 = 162
11. 4096 = 163
12. 65536 = 164
That is, a number that has one digit can represent "base" different values, where "base" is two, ten, or sixteen (in our examples). Every time we increase the size of the number by one more digit, we can represent "base" times as many possible different values, or in other words, we multiply the range of values that the number can represent by the base. Thus, a two-digit number can represent any of "base*base" values, a three-digit number can represent any of "base*base*base" values, and so on. That's the way positional number systems such as decimal, binary, and hex work. If you need a bigger number, you just add more digits.
Okay, so what does this have to do with hex? If you look at the above table, you'll see that 24 (16) is equal to 161. That means that 4 bits are exactly equivalent to one hexit in their ability to represent different numbers: exactly 16 possible numbers can be represented by four bits, and exactly 16 possible numbers can be represented by one hexit.
This means that you can write one hexit wherever you would otherwise have to use four bits, as illustrated in Figure 2.6.
So an 8-bit number, such as:
0101 1011
can be translated directly into a hex value, like this:
5 b
For this reason, binary is almost never used. Instead, we use hex as a shortcut to eliminate the necessity of reading, writing, and remembering long strings of bits.
Susan: A hex digit or hexit is like a four-wheel odometer in binary. Since each wheel is capable of only one of two values, being either (1) or (0), then the total number of possible values is 16. Thus your 2*2*2*2 = 16. I think I've got this down.
Steve: You certainly do!
Susan: If it has 4 bits and you have 2 of them then won't there be eight "wheels" and so forth? So 2 hex would hold XXXXXXXX places and 3 hex would hold XXXXXXXXXXXX places.
Steve: Correct. A one-hexit number is analogous to a one-digit decimal number. A one-hexit number contains 4 bits and therefore can represent any of 16 values. A two-hexit number contains 8 bits and therefore can represent any of 256 values.
Now that we've seen how each hex digit corresponds exactly to a group of four binary digits, here's an exercise you can use to improve your understanding of this topic: Invent a random string of four binary digits and see where it is in Figure 2.6. I guarantee it'll be there somewhere! Then look at the "hex" column and see what "digit" it corresponds to. There's nothing really mysterious about hex; since we have run out of digits after 9, we have to use letters to represent the numbers `ten', `eleven', `twelve', `thirteen', `fourteen', and `fifteen'.
FIGURE 2.6. Binary to hex conversion table
4-bit value 1-hexit value
0000 0
0001 1
0010 2
0011 3
0100 4
0101 5
0110 6
0111 7
1000 8
1001 9
1010 a
1011 b
1100 c
1101 d
1110 e
1111 f

Another reason to use hex rather than decimal is that byte values expressed as hex digits can be combined directly to produce larger values, which is not true with decimal digits. In case this isn't obvious, let's go over it in more detail. Since each hex digit (0 - f) represents exactly 4 bits, two of them (00 - ff) represent 8 bits, or one byte. Similarly, 4 hex digits (0000 - ffff) represent 16 bits, or a short value; the first two digits represent the first byte of the 2-byte value, and the last two digits, the second byte. This can be extended to any number of bytes. On the other hand, representing 4 bits requires two decimal digits, as the values range from 00 - 15, whereas it takes three digits (000 - 255) to represent one byte. A 2-byte value requires five decimal digits, since the value can be from 00000 to 65535. As you can see, there's no simple relationship between the decimal digits representing each byte and the decimal representation of a 2-byte value.

Figure 2.7 is a table showing the correspondence between some decimal, hex, and binary numbers, with the values of each digit position in each number base indicated, and the calculation of the total of all of the bit values in the binary representation.

FIGURE 2.7. Different representations of the same numbers

Decimal Hexadecimal Binary Sum of binary

Place Values Place Values Place Values digit values

10 1 16 1 16 8 4 2 1

0 0 0 0 0 0 0 0 = 0 + 0 + 0 + 0 + 0

1 0 1 0 0 0 0 1 = 0 + 0 + 0 + 0 + 1

2 0 2 0 0 0 1 0 = 0 + 0 + 0 + 2 + 0

3 0 3 0 0 0 1 1 = 0 + 0 + 0 + 2 + 1

4 0 4 0 0 1 0 0 = 0 + 0 + 4 + 0 + 0

5 0 5 0 0 1 0 1 = 0 + 0 + 4 + 0 + 1

6 0 6 0 0 1 1 0 = 0 + 0 + 4 + 2 + 0

7 0 7 0 0 1 1 1 = 0 + 0 + 4 + 2 + 1

8 0 8 0 1 0 0 0 = 0 + 8 + 0 + 0 + 0

9 0 9 0 1 0 0 1 = 0 + 8 + 0 + 0 + 1

1 0 0 a 0 1 0 1 0 = 0 + 8 + 0 + 2 + 0

1 1 0 b 0 1 0 1 1 = 0 + 8 + 0 + 2 + 1

1 2 0 c 0 1 1 0 0 = 0 + 8 + 4 + 0 + 0

1 3 0 d 0 1 1 0 1 = 0 + 8 + 4 + 0 + 1

1 4 0 e 0 1 1 1 0 = 0 + 8 + 4 + 2 + 0

1 5 0 f 0 1 1 1 1 = 0 + 8 + 4 + 2 + 1

1 6 1 0 1 0 0 0 0 = 16 + 0 + 0 + 0 + 0

1 7 1 1 1 0 0 0 1 = 16 + 0 + 0 + 0 + 1

1 8 1 2 1 0 0 1 0 = 16 + 0 + 0 + 2 + 0

1 9 1 3 1 0 0 1 1 = 16 + 0 + 0 + 2 + 1

Susan had some more thoughts on the hexadecimal number system. Let's listen in:
Susan: I think you need to spend a little more time reviewing the hex system, like an entire chapter.<G> Well, I am getting the impression that we are going to be working with hex, so I am trying to concentrate my understanding on that instead of binary. I think this all moves a little too fast for me. I don't know what your other reviewers are saying but I just feel like I get a definition of an abstract concept, and the next thing I know I am supposed to be doing something with it, like make it work. Ha! I personally need to digest new concepts, I really need to think them over a bit, to take them in and absorb them. I just can't start working with it right away.
As usual, I've complied with her request; the results are immediately ahead.

2.5. Exercises

Here are some exercises that you can use to check your understanding of the binary and hexadecimal number systems.

Consider the two types of numeric variables we've encountered so far, short and unsigned short. Let's suppose that x is a short, and y is an unsigned short, both of them currently holding the value 32767, or 7fff in hex.

1. What is the result of adding 1 to y, in both decimal and hex?
2. What is the result of adding 1 to x, in both decimal and hex?
Answers to exercises can be found at the end of the chapter.

2.6. Using the 16-bit Register Names

Before we took this detour into the binary and hexadecimal number systems, I promised to explain what it means to say that the instructions using the 16-bit register names "behave exactly as though they were accessing the 16-bit registers on earlier machines". After a bit more preparation, we'll be ready for that explanation.

First, let's take a look at some characteristics of the human-readable version of machine instructions: assembly language instructions. The assembly language instructions we will look at have a fairly simple format.33 The name of the operation is given first, followed by one or more spaces. The next element is the "destination", which is the register or RAM location that will be affected by the instruction's execution. The last element in an instruction is the "source", which represents another register, a RAM location, or a constant value to be used in the calculation. The source and destination are separated by a comma.34 Here's an example of a simple assembly language instruction:

add ax,1

In this instruction, add is the operation, ax is the destination, and the constant value 1 is the source. Thus, add ax,1 means to add 1 to the contents of ax, replacing the old contents of ax with the result.

Let's see what Susan has to say about the makeup of an assembly language instruction:

Susan: Why is the destination first and the source last? That seems backward to me.
Steve: I agree, it does seem backward. That's just the way Intel did it in their assembly language. Other machines and other assemblers have different arrangements; for example, Motorola 68000 assembly language has the source on the left and the destination on the right.
Susan: So the destination can be a register, cache, or RAM?
Steve: Yes, that's right. However, the cache is transparent to the programmer. That is, you don't say "write to the cache" or "read from the cache"; you just use RAM addresses and the hardware uses the cache as appropriate to speed up access to frequently used locations. On the other hand, you do have to address registers explicitly when writing an assembly language program.
Now we're finally ready to see what the statement about using the 16-bit register names on a 32-bit machine means. Suppose we have the register contents shown in Figure 2.8 (indicated in hexadecimal). If we were to add 1 to register ax, by executing the instruction add ax,1, the result would be as shown in Figure 2.9.
FIGURE 2.8. 32 and 16 bit registers, before add ax,1
32-bit register 32-bit contents 16-bit register 16-bit contents
eax 1235ffff ax ffff

FIGURE 2.9. 32 and 16 bit registers, after add ax,1

32-bit register 32-bit contents 16-bit register 16-bit contents
eax 12350000 ax 0000

In case this makes no sense, consider what happens when you add 1 to 9999 on a four digit counter such as an odometer. It "turns over" to 0000, doesn't it? The same applies here: ffff is the largest number that can be represented as four hex digits, so if you add 1 to a register that has only four (hex) digits of storage available, the result is 0000.

As you might imagine, Susan was quite intrigued with the above detail; here is her reaction.

Susan: I have an understanding retention half-life of about 30 nanoseconds, but while I was reading this I was understanding it except I am boggled as to how adding 1 to ffff makes 0000, see, I am still not clear on hex. Question: When you show the contents of a 32-bit register as being 12350000, then is the 1235 the upper half and the 0000 the lower half? Is that what you are saying?
Steve: That's right!
As this illustrates, instructions that refer to ax have no effect whatever on the upper part of eax; they behave exactly as though the upper part of eax did not exist. However, if we were to execute the instruction add eax,1 instead of add ax,1, the result would look like Figure 2.10.
FIGURE 2.10. 32 and
32-bit register 32-bit contents 16-bit register 16-bit contents
eax 12360000 ax 0000
16 bit registers, after add eax,1

In this case, eax is treated as a whole. Similar results apply to the other 32-bit registers and their 16-bit counterparts.

Revisiting the Memory Hierarchy

Unfortunately, it isn't possible to use only registers and avoid references to RAM entirely, if only because we'll run out of registers sooner or later. This is a good time to look back at the diagram of the "memory hierarchy" (Figure 2.2) and examine the relative speed and size of each different kind of memory.

The "size" attribute of the disk and RAM are specified in megabytes, whereas the size of an external cache is generally in the range from 64 kilobytes to a few megabytes. As I mentioned before, the internal cache is considerably smaller, usually in the 8 to 16 kilobyte range. The general registers, however, provide a total of 28 bytes of storage; this should make clear that they are the scarcest memory resource. To try to clarify why the registers are so important to the performance of programs, I've listed the "speed" attribute in number of accesses per second, rather than in milliseconds, nanoseconds, and so forth. In the case of the disk, this is about 100 accesses per second. Normal RAM can be accessed about 100 million times per second, while the external cache allows about 250 million accesses per second. The clear winners, though, are the internal cache and the registers, which can be accessed 500 million times per second and 1 billion times per second, respectively.

The Advantages and Disadvantages of Using Registers

In a way, the latter figure (1 billion accesses per second for registers) overstates the advantages of registers relative to the cache. You see, any given register can be accessed only 500 million times per second35; however, many instructions refer to two registers and still execute in one CPU cycle. Therefore, the maximum number of references per second is more than the number of instructions per second.

However, this leads to another question: Why not use instructions that can refer to more than one memory address (known as memory-to-memory instructions) and still execute in one CPU cycle? In that case, we wouldn't have to worry about registers; since there's (relatively) a lot of cache and very few registers, it would seem to make more sense to eliminate the middleman and simply refer to data in the cache.36 Of course, there is a good reason for the provision of both registers and cache. The main drawback of registers is that there are so few of them; on the other hand, one of their main advantages is also that there are so few of them. Why is this?

The main reason to use registers is that they make instructions shorter: since there are only a few registers, we don't have to use up a lot of bits specifying which register(s) to use. That is, with eight registers, we only need 3 bits to specify which register we need. In fact, there are standardized 3-bit codes that might be thought of as "register addresses", which are used to specify each register when it is used to hold a variable. Figure 2.11 is the table of these register codes.37

FIGURE 2.11. 32 and 16 bit register codes
Register address 16-bit register 32-bit register
000 ax eax
001 cx ecx
010 dx edx
011 bx ebx
100 sp esp
101 bp ebp
110 si esi
111 di edi

By contrast, with a "memory-to-memory" architecture, each instruction would need at least 2 bytes for the source address, and 2 bytes for the destination address.38 Adding 1 byte to specify what the instruction is going to do, this would make the minimum instruction size 5 bytes, whereas some instructions that use only registers can be as short as 1 byte. This makes a big difference in performance because the caches are quite limited in size; big programs don't fit in the caches, and therefore require a large number of RAM accesses. As a result, they execute much more slowly than small programs.

The Effect of Registers on Program Size

This explains why we want our programs to be smaller. However, it may not be obvious why using registers reduces the size of instructions, so here's an explanation.

Most of the data in use by a program are stored in RAM. When using a 32-bit CPU, it is theoretically possible to have over 4 billion bytes of memory (2^32 is the exact number). Therefore, that many distinct addresses for a given byte of data are possible, and to specify any of these requires 32 bits. Since there are only a few registers, specifying which one you want to use takes only a few bits; therefore, programs use register addresses instead of memory addresses wherever possible to reduce the number of bits in each instruction required to specify addresses.

I hope this is clear, but it might not be. It certainly wasn't to Susan. Here's the conversation we had on this topic:

Susan: I see that you are trying to make a point about why registers are more efficient in terms of making instructions shorter, but I just am not picturing exactly how they do this. How do you go from "make the instructions much shorter" to "we don't have to use up a lot of bits specifying which registers to use"?
Steve: Let's suppose that we want to move data from one place to another in memory. In that case, we'll have to specify two addresses: the "from" address and the "to" address. One way to do this is to store the addresses in the machine language instruction. Since each address is at least 16 bits, an instruction that contains two addresses needs to occupy at least 32 bits just for the addresses, as well as some more bits to specify exactly what instruction we want to perform. Of course, if we're using 32-bit addresses, then a "two-address" instruction would require 64 bits just for the two addresses, in addition to whatever bits were needed to specify the type of instruction.
Susan: OK. . . think I got this. . .
Steve: On the other hand, if we use registers to hold the addresses of the data, we need only enough bits to specify each of two registers. Since there aren't that many registers, we don't need as many bits to specify which ones we're referring to. Even on a machine that has 32 general registers, we'd need only 10 bits to specify two registers; on the Intel machines, with their shortage of registers, even fewer bits are needed to specify which register we're referring to.
Susan: Are you talking about the bits that are needed to define the instruction?
Steve: Yes.
Susan: How would you know how many bits are needed to specify the two registers?
Steve: If you have 32 different possibilities to select from, you need 5 bits to specify one of them, because 32 is 2 to the fifth power. If we have 32 registers, and any of them can be selected, that takes 5 bits to select any one of them. If we have to select two registers on a CPU with 32 registers, we need 10 bits to specify both registers.
Susan: So what does that have to do with it? All we are talking about is the instruction that indicates "select register" right? So that instruction should be the same and contain the same number of bits whether you have 1 or 32 registers.
Steve: There is no "select register" instruction. Every instruction has to specify whatever register or registers it uses. It takes 5 bits to select 1 of 32 items and only 3 bits to select 1 of 8 items; therefore, a CPU design that has 32 registers needs longer instructions than one that has only 8 registers.
Susan: I don't see why the number of registers should have an effect on the number of bits one instruction should have.
Steve: If you have two possibilities, how many bits does it take to select one of them? 1 bit. If you have four possibilities, how many bits does it take to select one of them? 2 bits. Eight possibilities require 3 bits; 16 possibilities require 4 bits; and finally 32 possibilities require 5 bits.
Susan: Some machines have 32 registers?
Steve: Yes. The PowerPC, for example. Some machines have even more registers than that.
Susan: If the instructions to specify a register are the same, then why would they differ just because one machine has more registers than another?
Steve: They aren't the same from one machine to another. Although every CPU that I'm familiar with has registers, each type of machine has its own way of executing instructions, including how you specify the registers.
Susan: OK, and in doing so it is selecting a register, right? An instruction should contain the same number of bits no matter how many registers it has to call on.
Steve: Let's take the example of an add instruction, which as its name implies, adds two numbers. The name of the instruction is the same length, no matter how many registers there are; that's true. However, the actual representation of the instruction in machine language has to have room for enough bits to specify which register(s) are being used in the instruction.
Susan: They are statements right? So why should they be bigger or smaller if there are more or fewer registers?
Steve: They are actually machine instructions, not C++ statements. The computer doesn't know how to execute C++ statements, so the C++ compiler is needed to convert C++ statements into machine instructions. Machine instructions need bits to specify which register(s) they are using; so, with more registers available, more bits in the instructions have to be used to specify the register(s) that the instructions are using.
Susan: Do all the statements change the values of bits they contain depending on the number of registers that are on the CPU?
Steve: Yes, they certainly do. To be more precise, the machine language instructions that execute a statement are larger or smaller depending on the number of registers in the machine because they need more bits to specify one of a larger number of registers.
Susan: "It takes five bits to select one of 32 items..."
"...and only three bits to select one of eight items." Why?
Steve: What is a bit? It is the amount of information needed to select one of two alternatives. For example, suppose you have to say whether a light is on or off. How many possibilities exist? Two. Since a single bit has two possible states, 0 or 1, we can represent "on" by 1 and "off" by 0 and thus represent the possible states of the light by one bit.
Now suppose that we have a fan that has four settings: low, medium, high, and off. Is one bit enough to specify the current setting of the fan? No, because one bit has only two possible states, while the fan has four. However, if we use two bits, then it will work. We can represent the states by bits as follows:
bits state
---- -----
00 off
01 low
10 medium
11 high
Note that this is an arbitrary mapping; there's no reason that it couldn't be like this instead:
bits state
---- -----
00 medium
01 high
10 off
11 low
However, having the lowest "speed" (that is, off) represented by the lowest binary value (00) and the increasing speeds corresponding to increasing binary values makes more sense and therefore is easier to remember.
This same process can be extended to represent any number of possibilities. If we have eight registers, for example, we can represent each one by 3 bits, as noted previously in Figure 2.11 on page 54. That is the actual representation in the Intel architecture; however, whatever representation might have been used, it would require 3 bits to select among eight possibilities. The same is true for a machine that has 32 registers, except that you need 5 bits instead of 3.
Susan: Okay, so then does that mean that more than one register can be in use at a time? Wait, where is the room that you are talking about?
Steve: Some instructions specify only one register (a "one-register" instruction), while others specify two (a "two-register" instruction); some don't specify any registers. For example, there are certain instructions whose effect is to determine which instruction will be executed next (the so-called "branch" instructions). These instructions often do not contain any register references at all, but rather specify the address of the next instruction directly. These instructions are used to implement if statements, for loops, and other flow control statements.
Susan: So, when you create an instruction you have to open up enough "room" to talk to all the registers at once?
Steve: No, you have to have enough room to specify any one register, for a one-register instruction, or any two registers for a two-register instruction.
Susan: Well, this still has me confused. If you need to specify only one register at any given time, then why do you always need to have all the room available? Anyway, where is this room? Is it in RAM or is it in the registers themselves? Let's say you are going to specify an instruction that uses only 1 of 32 registers. Are you saying that even though you are going to use just one register you have to make room for all 32?
Steve: The "room" that I'm referring to is the bits in the instruction that specify which register the instruction is using. That is, if there are eight registers and you want to use one of them in an instruction, 3 bits need to be set aside in the instruction to indicate which register you're referring to.
Susan: So you need the bits to represent the address of a register?
Steve: Right. However, don't confuse the "address of a register" with a memory address. They have nothing to do with one another, except that they both specify one of a number of possible places to store information. That is, register ax doesn't correspond to memory address 0, and so on.
Susan: Yes, I understand the bit numbers in relation to the number of registers.
Steve: That's good.
Susan: So the "address of a register" is just where the CPU can locate the register in the CPU, not an address in RAM. Is that right?
Steve: Right. The address of a register merely specifies which of the registers you're referring to; all of them are in the CPU.
After that comedy routine, let's go back to Susan's reaction to something I said earlier about registers and variables:
Susan: The registers hold only variables... Okay, I know what is bothering me! What else is there besides variables? Besides nonvariables, please don't tell me that. (Actually that would be good, now that I think of it.) But this is where I am having problems. You are talking about data, and a variable is a type of data. I need to know what else is out there so I have something else to compare it with. When you say a register can hold a variable, that is meaningless to me, unless I know what the alternatives are and where they are held.
Steve: What else is there besides variables? Well, there are constants, like the number 5 in the statement x = 5;. Constants can also be stored in registers. For example, let's suppose that the variable x, which is a short, is stored in location 1237. In that case, the statement x = 5; might generate an instruction sequence that looks like this:

mov ax,5

mov [1237],ax

where the number in the [] is the address of the variable x. The first of these instructions loads 5 into register ax, and the second one stores the contents of ax (5, in this case) into the memory location 1237.
Sometimes, however, constants aren't loaded into registers as in this case but are stored in the instructions that use them. This is the case in the following instruction:

add ax,3

This means to add 3 to whatever was formerly in register ax. The 3 never gets into a register but is stored as part of the instruction.39

Reading Instructions into Memory in Advance

Another way of reducing overhead is to read instructions from RAM in chunks, rather than one at a time, and feed them into the CPU as it needs them; this is called prefetching. This mechanism operates in parallel with instruction execution, loading instructions from RAM into special dedicated registers in the CPU before they're actually needed; these registers are known collectively as the prefetch queue. Since the prefetching is done by a separate unit in the CPU, the time to do the prefetching doesn't increase the time needed for instruction execution. When the CPU is ready to execute another instruction, it can get it from the prefetch queue almost instantly, rather than having to wait for the slow RAM to provide each instruction. Of course, it does take a small amount of time to retrieve the next instruction from the prefetch queue, but that amount of time is included in the normal instruction execution time.
Susan: I don't understand prefetching. What are "chunks"? I mean I understand what you have written, but I can't visualize this. So, there is just no time used to read an instruction when something is prefetched?
Steve: A separate piece of the CPU does the prefetching at the same time as instructions are being executed, so instructions that have already been fetched are available without delay when the execution unit is ready to "do" them.
The effect of combining the use of registers and prefetching the instructions can be very significant. In our example, if we use an instruction that has already been loaded, which reads data from and writes data only to registers, the timing reduces to that shown in Figure 2.12.
FIGURE 2.12. Instruction execution time, using registers and prefetching

Time Function
0 ns Read instruction from RAM
0 ns Read data from RAM
2 ns Execute instruction
0 ns Write result back to RAM
2 ns Total instruction execution time

As I indicated near the beginning of this chapter, the manufacturers aren't lying to us: if we design our programs to take advantage of these and other similar efficiency measures, we can often approach the maximum theoretical performance figures. You've just been subjected to a barrage of information on how a computer works. Let's review before continuing.

2.7. Review

The main components of the computer of most significance to programmers are disk, RAM, and the CPU; the first two of these store programs and data that are used by the CPU.

Computers represent pieces of information (or data) as binary digits, universally referred to as bits. Each bit can have the value 0 or 1. The binary system is used instead of the more familiar decimal system because it is much easier to make devices that can store and retrieve 1 of 2 values, than 1 of 10. Bits are grouped into sets of eight, called bytes.

The disk uses magnetic recording heads to store and retrieve groups of a few hundred to a few thousand bytes on rapidly spinning platters in a few milliseconds. The contents of the disk are not lost when the power is turned off, so it is suitable for more or less permanent storage of programs and data.

RAM, which is an acronym for Random Access Memory, is used to hold programs and data while they're in use. It is made of millions of microscopic transistors on a piece of silicon called a chip. Each bit is stored using a few of these transistors. RAM does not retain its contents when power is removed, so it is not good for permanent storage. However, any byte in a RAM chip can be accessed in about 10 nanoseconds, which is about a million times as fast as accessing a disk. Each byte in a RAM chip can be independently stored and retrieved without affecting other bytes, by providing the unique memory address belonging to the byte you want.

The CPU (also called the processor) is the active component in the computer. It is also made of millions of microscopic transistors on a chip. The CPU executes programs consisting of instructions stored in RAM, using data also stored in RAM. However, the CPU is so fast that even the typical RAM access time of 10 nanoseconds is a bottleneck; therefore, computer manufacturers have added both external cache and internal cache, which are faster types of memory used to reduce the amount of time that the CPU has to wait. The internal cache resides on the same chip as the CPU and can be accessed without delay. The external cache sits between the CPU and the regular RAM; it's faster than the latter, but not as fast as the internal cache. Finally, a very small part of the on-chip memory is organized as registers, which can be accessed within the normal cycle time of the CPU, thus allowing the fastest possible processing.

2.8. Conclusion

In this chapter, we've covered a lot of material on how a computer actually works. As you'll see, this background is essential if you're going to understand what really happens inside a program. In the next chapter, we'll get to the "real thing": how to write a program to make all this hardware do something useful.

2.9. Answers to Exercises

1. 32768 decimal, or 8000 in hex
2. -32768, or 8000 in hex
Why is the same hex value rendered here as -32768, while it was 32768 in question 1? The only difference between short and unsigned short variables is how their values are interpreted. In particular, short variables having hexadecimal values from 8000 to ffff are considered negative, while unsigned short values in that range are positive. That's why the range of short values is -32768 to +32767, whereas unsigned short variables can range from 0 to 65535.
By the way, whenever the number system being used isn't obvious, the convention is to add the letter h to the end of hexadecimal numbers. Thus, if it wasn't obvious that the number "8000" we were referring to in the previous paragraph was hexadecimal, we could write it as 8000h to make it perfectly clear.

1 For a really in-depth (or at least very funny) explanation of how computers work, you might want to take a look at Dave Barry in Cyberspace (ISBN 0-517-59575-3).

2 Some people believe that you should learn C before you learn C++. Obviously, I'm not one of those people; for that matter, neither is the inventor of C++, Bjarne Stroustrup. On page 169 of his book, The Design and Evolution of C++, he says "Learn C++ first. The C subset is easier to learn for C/C++ novices and easier to use than C itself."

3 Whenever I refer to a computer, I mean a modern microcomputer capable of running MS-DOS® or some version of Windows®; these are commonly referred to as PCs. Most of the fundamental concepts are the same in other kinds of computers, but the details differ.

4 Although it's entirely possible to program without ever seeing the inside of a computer, you might want to look in there anyway, just to see what the CPU, RAM chips, disk drives, and other components, look like. Some familiarization with the components will give you a head start if you ever want to expand the capacity of your machine.
By the way, Susan recommends that you clean out the dust bunnies with a computer vacuum cleaner while you are in there; it's amazing how much dust can accumulate inside a computer case in a year or two!

5 Other hardware components can be important to programmers of specialized applications; for example, game programmers need extremely fine control of how information is displayed on the monitor. However, we have enough to keep us busy learning how to write general data-handling programs. You can always learn how to write games later, if you're interested in doing so.

6 Although at one time, many small computers used floppy disks for their main storage, the tremendous decrease in hard disk prices means that today even the most inexpensive computer stores programs and data on a hard disk.

7 The heads have to be as close as possible to the platters because the influence of a magnet (called the magnetic field) drops off very rapidly with distance. Thus, the closer the heads are, the more powerful the magnetic field is and the smaller the region that can be used to store data so that it can be retrieved reliably, and therefore the more data that can be stored in the same physical space. Of course, this leaves open the question of why the heads aren't in contact with the surface; that would certainly solve the problem of being too far away. Unfortunately, this seemingly simple solution would not work at all. There is a name for the contact of heads and disk surface while the disk is spinning, head crash. The friction caused by such an event destroys both the heads and disk surface almost instantly.

8 In some old machines, bytes sometimes contained more or less than 8 bits, and there are specialized machines today that have different byte sizes,. The C++ language specification requires only that a byte has at least eight bits.

9 Each switch is made of several transistors. Unfortunately, an explanation of how a transistor works would take us too far afield. Consult any good encyclopedia, such as the Encyclopedia Britannica, for this explanation.

10 There's also another kind of electronic storage, called ROM, for read-only memory; as its name indicates, you can read from it, but you can't write to it. This is used for storing permanent information, such as the program that allows your computer to read a small program from your boot disk; that program, in turn, reads in the rest of the data and programs needed to start up the computer. This process, as you probably know, is called booting the computer. In case you're wondering where that term came from, it's an abbreviation for bootstrapping, which is intended to suggest the fanciful notion of pulling yourself up by your bootstraps. Also, you may have noticed that the terms RAM and ROM aren't symmetrical; why isn't RAM called RWM, read-write memory? Probably because it's too hard to pronounce.

11 The same disaster would happen if your system were to crash, which is not that unlikely under certain operating systems.

12 Most modern word processors can automatically save your work once in a while, for this very reason. I heartily recommend using this facility; it's saved my bacon more than once.

13 Each type of CPU has a different set of instructions, so programs compiled for one CPU cannot in general be run on a different CPU. Some CPUs, such as the very popular Pentium series from Intel, fall into a "family" of CPUs in which each new CPU can execute all of the instructions of the previous family members. This allows upgrading to a new CPU without having to throw out all of your old programs, but limits the ways in which the new CPU can be improved without affecting this "family compatibility".

14 Since frequency is measured in decimal units rather than in binary units, the mega in megahertz means one million (106), not 1,048,576 (220) as it does when referring to memory and disk capacity. I'm sorry if this is confusing, but it can't be helped.

15 In fact, the Pentium III or 4 can actually execute more than one instruction at a time if conditions are right. I'll ignore this detail in my analysis, but if I considered it, the discrepancy between memory speeds and CPU speeds would be even greater.

16 1 second/32 ns per instruction = 31,250,000 instructions per second.

17 These complementary roles played by RAM and the disk explain why the speed of the disk is also illustrated in the memory hierarchy.

18 It's also possible to have a cache that stores more than one item in a "location", in which case one of the other items already there will be displaced to make room for the new one. The one selected is usually the one that hasn't been accessed for the longest time, on the theory that it's probably not going to be accessed again soon; this is called the least recently used (abbreviated LRU) replacement algorithm.

19 This is fairly close to the actual way caches are used to reduce the time it takes to get frequently used data from RAM (known as caching reads); reducing the time needed to write changed values back to RAM (caching writes) is more complicated.

20 In case you're wondering how a small number of registers can help the speed of a large program, I should point out that no matter how large a program is, the vast majority of instructions and data items in the program are inactive at any given moment. In fact, perhaps only a dozen instructions are in various stages of execution at any given time even in the most advanced microprocessor CPU available in 2001. The computer's apparent ability to run several distinct programs simultaneously is an illusion produced by the extremely high rate of execution of instructions.

21 All of the registers are physically similar, being just a collection of circuits in the CPU used to hold a value. As indicated here, some registers are dedicated to certain uses by the design of the CPU, whereas others are generally usable. In the case of the general registers, which are all functionally similar or identical, a compiler often uses them in a conventional way; this stylized usage simplifies the compiler writer's job.

22 Since RAM doesn't maintain its contents when power is turned off, anything that a program needs to keep around for a long time, such as inventory data to be used later, should be saved on the disk. We'll see how that is accomplished in a future chapter.

23 The size of a short varies from one compiler and machine to another, but on the most common current compilers, especially for machines such as the ubiquitous "PC", it is 16 bits.

24 Note that the "*" symbol is used here to represent multiplication. We can't use the typical "x" symbol used in mathematics for this purpose, because letters are used for specific purposes in the C++ language. We'll get into that in detail later.

25 If you think that last number in Figure 2.5 looks familiar, you're right. It's the number of different values that, as I mentioned on page 30, can be stored in a type of numeric variable called a short. This is no coincidence; read on for the detailed explanation.

26 I should mention here that the C++ standard does not require the size of a byte to be 8 bits. But on the most common machines and compilers, it is.

27 As previously noted, the size of a short varies from one compiler and machine to another. However, on the most common current compilers, including the one on the CD in the back of the book, it is 16 bits.

28 You might also think that this would apply to numbers such as the total count of books sold in a particular category since publication date. But apparently that number can be negative, at least according to my royalty statements.

29 If neither of these does what you want, don't despair. Other types of numeric variables have different ranges, as we'll see starting in Chapter 9.

30 In the early days of computing, base 8 was sometimes used instead of base 16, especially on machines that used 12-bit and 36-bit registers; however, it has fallen into disuse because almost all current machines have 32-bit registers. 32, of course, is divisible by 4 but not by 3. Base 8 is unlikely to make a comeback because the upcoming generation of new CPUs will have 64-bit registers, which are again evenly divisible into hex digits but not octal ones.

31 Either upper or lower case letters are acceptable to most programs (and programmers). I'll use lower case because such letters are easier to distinguish than upper case ones; besides, I find them less irritating to look at.

32 See On Beyond Zebra (ISBN 0-394-80084-2) for another possible approach to this problem.

33 I'm simplifying here. There are instructions that follow other formats, but we'll stick with the simple ones for the time being.

34 The actual machine instructions being executed in the CPU don't have commas, register names, or any other human-readable form; they consist of fixed-format sequences of bits stored in RAM. The CPU actually executes machine language instructions rather than assembly language ones; a program called an assembler takes care of translating the assembly language instructions into machine instructions. However, we can usually ignore this step, because each assembly language instruction corresponds to one machine instruction. This correspondence is quite unlike the relationship between C++ statements and machine instructions, which is far more complex.

35 As before, we are ignoring the ability of the CPU to execute more than one instruction simultaneously.

36 Perhaps I should remind you that the programmer doesn't explicitly refer to the cache; you can just use normal RAM addresses and let the hardware take care of making sure that the most frequently referenced data ends up in the cache.

37 Don't blame me for the seemingly scrambled order of the codes; that's the way Intel's CPU architects assigned them to registers when they designed the 8086 and it's much too late to change them now. Luckily, we almost never have to worry about their values, because the assembler takes care of the translation of register names to register addresses.

38 If we want to be able to access more than 64 kilobytes worth of data, which is necessary in most modern programs, we'll need even more room to store addresses.

39 We'll go into this whole notion of using registers to represent and manipulate variables in grotesque detail in Chapter 3.


TOC PREV NEXT INDEX