Up to this point,
cache has been this magical place that automatically stores data when we need
it, perhaps fetching new data as the CPU requires it. However, a good question
is "how exactly does the cache do this?" Another might be "what
happens when the cache is full and the CPU is requesting additional data not in
the cache?" In this section, we'll take a look at the internal cache
organization and try to answer these questions along with a few others.
The basic
idea behind a cache is that a program accesses only a small amount of data at any
given time. If the cache is the same size as the typical amount of data the
program accesses at any one time, then we can put that data into the cache
and access most of it at a very high speed. Unfortunately, the data rarely
sits in contiguous memory locations; usually, there are a few bytes here, a few
bytes there, and some bytes somewhere else. In general, the data is spread out
all over the address space. Therefore, the cache design must accommodate
data objects that sit at widely varying addresses in memory.
The
idea of a cache system is that we can attach a different (non-contiguous)
address to each of the cache lines. So cache line #0 might correspond to
addresses $10000..$1000F and cache line #1 might correspond to addresses
$21400..$2140F. Generally, if a cache line is n bytes long (n is usually some
power of two) then that cache line will hold n bytes from main memory that fall
on an n-byte boundary.
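As a quick illustration of that boundary rule, the following C sketch (my own, not part of the original text) computes which 16-byte block a given address falls into and the byte's offset within that block; the particular address is just an example:

    #include <stdio.h>
    #include <stdint.h>

    #define LINE_SIZE 16u                        /* cache line size in bytes (a power of two) */

    int main(void)
    {
        uint32_t addr = 0x21407;                 /* falls inside the block $21400..$2140F */
        uint32_t base = addr & ~(LINE_SIZE - 1); /* 16-byte boundary: $21400              */
        uint32_t offs = addr &  (LINE_SIZE - 1); /* byte offset within the line: 7        */

        printf("address $%05X -> line base $%05X, offset %u\n",
               (unsigned)addr, (unsigned)base, (unsigned)offs);
        return 0;
    }
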
When the
cache controller reads a cache line from a lower level in the memory hierarchy,
a good question is "where does the data go in the cache?" The most
flexible cache system is the fully associative
cache. In a fully associative cache subsystem, the caching
controller can place a block of bytes in any one of the cache lines present in
the cache memory. While this is a very flexible system, the flexibility is not
without cost: to decide whether a requested block is already in the cache, the
controller must compare the request's address against the address associated with
every cache line, and the extra comparison circuitry makes the cache complex,
expensive, and potentially slow.
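As a rough sketch of what "any cache line may hold any block" implies, the following C fragment (illustrative only; the structure and names are my own) models a fully associative lookup in software. Real hardware performs all of the tag comparisons in parallel, which is exactly the circuitry that makes this organization expensive:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define LINE_SIZE 16u
    #define NUM_LINES 512u                  /* e.g. an 8K cache: 512 lines of 16 bytes */

    typedef struct {
        bool     valid;                     /* does this line hold anything yet?    */
        uint32_t tag;                       /* which block of memory the line holds */
        uint8_t  data[LINE_SIZE];
    } cache_line_t;

    static cache_line_t cache[NUM_LINES];

    /* Fully associative lookup: the block could be in any line, so the
     * search must consider every line in the cache.                     */
    static cache_line_t *fa_lookup(uint32_t addr)
    {
        uint32_t tag = addr / LINE_SIZE;    /* the block number of the address */

        for (uint32_t i = 0; i < NUM_LINES; i++) {
            if (cache[i].valid && cache[i].tag == tag)
                return &cache[i];           /* hit  */
        }
        return NULL;                        /* miss */
    }

    int main(void)
    {
        return fa_lookup(0x21407) ? 0 : 1;  /* the cache starts out empty, so this is a miss */
    }
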
At the other extreme is the direct mapped cache. In a direct mapped cache, a
block of main memory is always loaded into the same cache line in the cache.
Generally, some number of bits in the main memory address select the cache
line. For example, consider an 8 Kilobyte cache with 16-byte cache lines and a
32-bit main memory address: such a cache contains 512 cache lines, so the cache
controller could use bits 4 through 12 of the address to select the cache line
(bits 0 through 3 select a particular byte within the 16-byte line).
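In code, that line selection is nothing more than a shift and a mask; the sketch below (again my own illustration) uses the same 8 Kilobyte, 16-byte-line parameters:

    #include <stdio.h>
    #include <stdint.h>

    #define LINE_SIZE 16u                         /* 2^4 bytes per line                */
    #define NUM_LINES (8u * 1024u / LINE_SIZE)    /* 512 lines in an 8K cache          */

    /* Direct mapping: bits 0..3 select a byte within the line,
     * bits 4..12 select which of the 512 cache lines to use.    */
    static uint32_t dm_line(uint32_t addr)
    {
        return (addr / LINE_SIZE) % NUM_LINES;    /* equivalent to (addr >> 4) & 0x1FF */
    }

    int main(void)
    {
        printf("%u %u %u\n",
               (unsigned)dm_line(0x0000),         /* line 0   */
               (unsigned)dm_line(0x0010),         /* line 1   */
               (unsigned)dm_line(0x1000));        /* line 256 */
        return 0;
    }
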
Perhaps the biggest problem with a direct-mapped cache is
that it may not make effective use of all the cache memory. For example, the
cache scheme maps address zero to cache line #0. It also maps address $2000
(8K), $4000 (16K), $6000 (24K), $8000 (32K), and, in fact, it maps every
address that is an even multiple of eight kilobytes to cache line #0. This
means that if a program is constantly accessing data at addresses that are even
multiples of 8K and not accessing any other locations, the system will only use
cache line #0, leaving all the other cache lines unused. Each time the CPU
requests data from a block that does not currently reside in cache line #0, the
CPU will have to go down to a lower level in the memory hierarchy to access the
data. In this pathological case, the cache is effectively limited to the size
of one cache line.
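That pathological behavior follows directly from the mapping; the short sketch below (illustrative, using the same hypothetical parameters) shows every one of those multiples of 8K landing on cache line #0:

    #include <stdio.h>
    #include <stdint.h>

    #define LINE_SIZE 16u
    #define NUM_LINES 512u                        /* 8K direct-mapped cache */

    int main(void)
    {
        /* $0000, $2000, $4000, $6000, and $8000 all select cache line #0,
         * so a program touching only these addresses keeps evicting and
         * reloading that single line.                                      */
        for (uint32_t addr = 0; addr <= 0x8000; addr += 0x2000)
            printf("address $%05X -> cache line %u\n",
                   (unsigned)addr, (unsigned)((addr / LINE_SIZE) % NUM_LINES));
        return 0;
    }
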
If a fully associative cache
organization is too complex, expensive, and slow to implement, but a
direct-mapped cache organization isn't as good as we'd like, one might ask if
there is a compromise that gives us more capability than a direct-mapped
approach without all the complexity of a fully associative cache. The answer is
yes, we can create an n-way set associative cache
which is a compromise between these two extremes. The idea here is to break up
the cache into sets of cache lines. The CPU selects a particular set using some
subset of the address bits, just as for direct-mapping. Within each set there
are n cache lines. The caching controller uses a fully associative mapping
algorithm to select one of the n cache lines within the set.
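A rough C sketch of that two-step lookup appears below; the names, sizes, and structure are my own illustration rather than anything from a real controller (here n is 2, with the same 16-byte lines):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define LINE_SIZE 16u
    #define NUM_WAYS   2u                         /* n = 2: two lines per set            */
    #define NUM_SETS 256u                         /* 8K / (16 bytes * 2 ways) = 256 sets */

    typedef struct {
        bool     valid;
        uint32_t tag;
        uint8_t  data[LINE_SIZE];
    } cache_line_t;

    static cache_line_t cache[NUM_SETS][NUM_WAYS];

    static cache_line_t *sa_lookup(uint32_t addr)
    {
        uint32_t block = addr / LINE_SIZE;        /* which 16-byte block of memory */
        uint32_t set   = block % NUM_SETS;        /* address bits select the set   */
        uint32_t tag   = block / NUM_SETS;        /* remaining bits form the tag   */

        /* Within the set the search is fully associative, but only n lines
         * (two here) have to be compared rather than the whole cache.       */
        for (uint32_t way = 0; way < NUM_WAYS; way++) {
            if (cache[set][way].valid && cache[set][way].tag == tag)
                return &cache[set][way];          /* hit  */
        }
        return NULL;                              /* miss */
    }

    int main(void)
    {
        return sa_lookup(0x21407) ? 0 : 1;        /* the cache starts out empty, so a miss */
    }
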
As an
example, an 8 kilobyte two-way set associative cache subsystem with 16-byte
cache lines organizes the cache as 256 sets, each containing
two cache lines ("two-way" means each set contains two cache lines).
Eight bits from the memory address select one of these 256 different sets, and then
the cache controller can map the block of bytes to either cache line within that
set. The advantage of a two-way set associative cache over a direct mapped
cache is that you can have two accesses on 8 Kilobyte boundaries (using the
current example) and still get different cache lines for both accesses.
However, once you attempt to access a third memory location at an address that
is an even multiple of eight kilobytes, you will have a conflict.
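The arithmetic behind that claim is easy to check; in the sketch below (illustrative only), all three addresses select set #0, and with only two lines per set the third access must evict one of the first two:

    #include <stdio.h>
    #include <stdint.h>

    #define LINE_SIZE 16u
    #define NUM_SETS 256u                         /* 8K two-way set associative cache */

    int main(void)
    {
        uint32_t addrs[3] = { 0x0000, 0x2000, 0x4000 };   /* 8K apart */

        for (int i = 0; i < 3; i++)
            printf("address $%05X -> set %u\n",
                   (unsigned)addrs[i],
                   (unsigned)((addrs[i] / LINE_SIZE) % NUM_SETS));
        return 0;
    }
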