The CPU/Memory Interface
By Vince Freeman
It doesn't matter if a CPU runs at 300MHz or 3.0GHz -- if it isn't given any data to process, it's as useless as a printer waiting for you to refill the paper tray. That's why, while it may be the brains of the operation, the processor is only one component of a high-performance PC; the most important supporting architecture is the CPU/memory subsystem.
There's a hierarchy or spectrum of data storage areas between the CPU and system memory, from fastest to slowest or "closest" to "furthest away" (and extending still further, to data that's not even in memory but must be fetched from a relatively far-off, glacially slow hard disk). The front lines of any CPU access request are the data registers, which are high-speed, temporary storage areas within the CPU itself.
Registers hold data to be processed, the results of calculations, or addresses pointing to the location of desired data; they're of varying number, type, and size depending on the CPU design. For example, the Pentium 4 has 32 registers, split into four groups of eight (for x86 compatibility) that range from 32 to 128 bits in size and are split between various data types and tasks.
The CPU can act on data in the registers virtually instantaneously, but the registers are far too small to hold all the data required. Register storage is also one of the most expensive areas of a CPU to implement, so it's extremely rare that a processor redesign would stop at simply adding more internal registers (although it's a tantalizing possibility; at least in theory, you could create a hyper-threading CPU by doubling the registers and possibly the cache while leaving the CPU core mostly untouched).
In order to provide an intermediary between the CPU registers and slower system memory, modern processors include varying amounts of data buffer or cache memory. Larger and slower than the registers, CPU caches are temporary storage areas usually broken down into Level 1 (L1), Level 2 (L2), and sometimes Level 3 (L3) caches, getting slower and less expensive as you move outward.
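To make the lookup order concrete, here is a minimal sketch of a request walking outward through two cache levels before falling back to system memory. The capacities, the cycle costs, the LRU replacement policy, and the names `Level` and `access` are all assumptions chosen for illustration; they do not describe any particular processor.

```python
from collections import OrderedDict


class Level:
    """One cache level with a fixed capacity and an assumed access cost."""

    def __init__(self, name, capacity, cost):
        self.name = name
        self.capacity = capacity    # how many addresses this level can hold
        self.cost = cost            # hypothetical access cost in cycles
        self.lines = OrderedDict()  # addresses in least-recently-used order

    def lookup(self, addr):
        """Return True on a hit and mark the address as recently used."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        return False

    def fill(self, addr):
        """Store the address, evicting the least recently used one if full."""
        self.lines[addr] = True
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)


def access(levels, memory_cost, addr):
    """Search each level in order; return (where found, cycles spent)."""
    cycles = 0
    found = "memory"
    for level in levels:
        cycles += level.cost
        if level.lookup(addr):
            found = level.name
            break
    else:
        # Every cache level missed, so pay the system memory cost too.
        cycles += memory_cost
    # Copy the data into every level so the next request finds it closer.
    for level in levels:
        level.fill(addr)
    return found, cycles


# Hypothetical sizes and costs: a tiny fast L1 and a larger, slower L2.
hierarchy = [Level("L1", capacity=2, cost=2), Level("L2", capacity=4, cost=10)]
MEMORY_COST = 100

trace = [access(hierarchy, MEMORY_COST, addr) for addr in (1, 1, 2, 3, 1)]
for found, cycles in trace:
    print(f"found in {found:6s} after {cycles} cycles")
```

The first request for an address pays the full trip to memory, a repeat request is served from L1, and an address that has been pushed out of the small L1 can still be caught by the larger L2 at an intermediate cost.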
Most current desktop and notebook processors have an L1 cache at core level and an L2 cache on die (elsewhere on the chip). Where present (usually in servers), Level 3 cache can be either on-chip -- as in Intel's Itanium 2 -- or off-chip, on the motherboard or an expansion card.
Most desktop processors follow a standard configuration pairing a single L1 and L2 cache, though there are variations such as the forthcoming AMD Hammer's separate L1 instruction and data caches. AMD's Athlon XP and Intel's Pentium 4 share the standard L1/L2 design, though in very different ways: The current P4 combines an 8K Level 1 (plus tiny execution-trace) cache and 512K Level 2 cache (older versions had only 256K of L2), while the Athlon XP has a 128K Level 1 and 256K Level 2 configuration.
At first glance, the Athlon XP would seem to have the more robust design, especially given the small 8K L1 cache of its Intel rival. Looking at cache size is only part of the story, however, as the Pentium 4's internal cache has a broad 256-bit data pathway, while the Athlon XP has only 64-bit pathways.
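The effect of pathway width is simple arithmetic, sketched below. The 64-byte block size and the function name `transfers_needed` are assumptions for illustration only, and the sketch ignores clock speed and latency entirely.

```python
import math


def transfers_needed(block_bytes, path_bits):
    """Number of transfers needed to move block_bytes over a path_bits-wide path."""
    bytes_per_transfer = path_bits // 8
    return math.ceil(block_bytes / bytes_per_transfer)


BLOCK_BYTES = 64  # hypothetical block size, chosen only for illustration

wide = transfers_needed(BLOCK_BYTES, 256)   # 32 bytes per transfer
narrow = transfers_needed(BLOCK_BYTES, 64)  # 8 bytes per transfer
print(f"256-bit path: {wide} transfers, 64-bit path: {narrow} transfers")
```

Under these assumptions, the wider path moves the same block in a quarter as many transfers.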
To use the two essential buzzwords, the Pentium 4's smaller Level 1 cache allows it to maintain low cache latency (response time), but also yields a lower hit rate (the percentage of time a requested piece of data is in the cache, ready for access). Conversely, the Athlon XP's 128K Level 1 cache will have a higher hit rate, but can't match the Intel chip's L1 cache for latency.
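The trade-off between latency and hit rate can be put into numbers with the standard average access time estimate: the hit time plus the miss rate multiplied by the miss penalty. The figures below are invented for illustration and are not measurements of the Pentium 4 or the Athlon XP; `average_access_time` is a hypothetical helper name.

```python
def average_access_time(hit_time, hit_rate, miss_penalty):
    """Average cycles per access: hit time plus miss rate times miss penalty."""
    return hit_time + (1.0 - hit_rate) * miss_penalty


# Hypothetical caches: one small and fast, one larger and slightly slower.
small_fast = average_access_time(hit_time=2, hit_rate=0.90, miss_penalty=10)
large_slow = average_access_time(hit_time=3, hit_rate=0.97, miss_penalty=10)

print(f"small, fast cache: {small_fast:.2f} cycles per access")
print(f"large, slow cache: {large_slow:.2f} cycles per access")
```

With these made-up numbers the smaller cache comes out slightly ahead, but raising the miss penalty tips the balance toward the larger cache, which is why neither approach wins on paper alone.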
The story is somewhat reversed for Level 2 cache: both theoretical design and hands-on testing show that the Pentium 4's 512K cache handles larger data sets better than the Athlon XP's 256K cache. Correctly predicting which data or instructions the CPU will need next can also yield performance gains, and both internal (branch prediction) and external (data prefetch) mechanisms play a role.
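The point about larger data sets can be illustrated with a toy simulation of a program that repeatedly sweeps over a 384K working set. The sketch assumes a fully associative cache with LRU replacement and 64-byte lines, which is a deliberate simplification and not a model of either chip's real Level 2 cache; `sweep_hit_rate` is a hypothetical helper name.

```python
from collections import OrderedDict

LINE_BYTES = 64  # hypothetical cache line size


def sweep_hit_rate(cache_bytes, working_set_bytes, passes=4):
    """Hit rate when looping over a working set several times (LRU cache)."""
    capacity = cache_bytes // LINE_BYTES
    lines_in_set = working_set_bytes // LINE_BYTES
    cache = OrderedDict()
    hits = total = 0
    for _ in range(passes):
        for line in range(lines_in_set):
            total += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)        # mark as recently used
            else:
                cache[line] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict least recently used
    return hits / total


KB = 1024
small_l2 = sweep_hit_rate(256 * KB, 384 * KB)
large_l2 = sweep_hit_rate(512 * KB, 384 * KB)
print(f"256K cache: {small_l2:.0%} hits, 512K cache: {large_l2:.0%} hits")
```

In this simplified model, a working set that fits in the larger cache is served from it on every pass after the first, while the smaller cache keeps evicting lines just before they are needed again.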