The CPU/Memory Interface
October 16, 2002
Going Off-Chip
The middle ground between the CPU and vast but slow disk storage, of course, is system memory. The main determinant of memory performance or responsiveness is the speed and width of the CPU bus, followed by the memory bus and the memory's speed and type.
This is only logical: to go back to the printer analogy, a laser that can print 20 pages a minute won't be well-matched with a paper feeder that can deliver either 10 or 30. That is why the old experiment of pairing a 133MHz-system-bus Pentium III with DDR266 memory didn't yield noticeable performance gains. Matching today's 533MHz-bus Pentium 4 with DDR266, by contrast, is almost criminal: it leaves the processor starved for memory bandwidth, though the P4 responds handsomely when you replace DDR266 with DDR333 or DDR400 memory.
The ideal for fast memory performance is to match the speed of the CPU bus with that of the memory bus. This is called synchronous operation; one example is a 266MHz-bus Athlon XP 2200+ running on a DDR266 platform. Running asynchronously can also work, but there will be performance tradeoffs -- for instance, loading that 266MHz-bus Athlon XP with DDR333 or DDR400 is pretty much wasting money on the more expensive, faster memory.
There are ways around this limitation, and some are as simple as AMD's recent step up to a 333MHz system bus for the Athlon XP 2700+ and 2800+. This resulted in a 5- to 15-percent gain in overall system performance, just based on the higher memory bandwidth and its synchronous relationship or smooth mesh with DDR333. The asynchronous setup of current Pentium 4 DDR platforms is not a problem per se, but even DDR400 memory (maximum 3.2GB/sec) cannot supply the bandwidth that a synchronous mating of the Pentium 4's 533MHz bus and PC1066 RDRAM platform can (4.2GB/sec).
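The matching argument above can be sketched as a bit of arithmetic. The following is a rough illustration, not a benchmark: it assumes 64-bit front-side buses for both the Athlon XP and the Pentium 4, uses the effective transfer rates quoted in this article, and applies a deliberately simplified model in which the slower of the two links sets the ceiling. The function names are made up for this example.

```python
def peak_mb_s(width_bits, mega_transfers, channels=1):
    """Theoretical peak bandwidth in MB/sec (decimal megabytes)."""
    return width_bits // 8 * mega_transfers * channels


def usable_mb_s(bus_mb_s, memory_mb_s):
    """Simplified assumption: the slower link caps what the CPU can use."""
    return min(bus_mb_s, memory_mb_s)


# Effective transfer rates from the article; every DDR path here is 64-bit.
ddr = {name: peak_mb_s(64, rate)
       for name, rate in [("DDR266", 266), ("DDR333", 333), ("DDR400", 400)]}

athlon_bus = peak_mb_s(64, 266)                 # 266MHz-bus Athlon XP (assumed 64-bit)
p4_bus = peak_mb_s(64, 533)                     # 533MHz-bus Pentium 4 (assumed 64-bit)
dual_pc1066 = peak_mb_s(16, 1066, channels=2)   # two 16-bit RDRAM channels

# Faster memory cannot push an Athlon XP past what its bus can carry.
for name, mem in ddr.items():
    print(f"Athlon XP + {name}: {usable_mb_s(athlon_bus, mem)} MB/sec usable")

# How much of the Pentium 4's bus bandwidth each memory type can fill.
for name, mem in list(ddr.items()) + [("dual PC1066", dual_pc1066)]:
    print(f"Pentium 4 + {name}: {mem / p4_bus:.0%} of bus bandwidth")
```

Under these assumptions the Athlon XP stays at 2128 MB/sec whichever DDR grade is installed, while DDR400 fills roughly three-quarters of the Pentium 4's bus and dual-channel PC1066 matches it exactly. The exact figure for the latter is 4264 MB/sec, which the article quotes as 4.2GB/sec.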
Where the Chipset Comes In
Enhancing the Northbridge component of the motherboard chipset is also a popular strategy, as this is the hub that coordinates memory traffic and can hence help or hinder overall performance.
Standard Northbridge memory controllers utilize a single path to system memory, which in the case of DDR translates into a 64-bit link running at effective speeds of 200MHz to 400MHz. A different Northbridge can't do much about theoretical bandwidth, but it can improve real-world throughput by tightening memory timings and reducing access latencies. This is one area that VIA Technologies has really pushed with its KT series of AMD chipsets; the "performance-oriented" design that emerged after the original KT266 has really paid dividends in higher memory throughput for the company's KT266A through KT400 products.
A more fundamental change is to build a dual-channel design into the memory controller, thereby providing two data paths between the system memory and chipset. This really is as simple as it sounds, but the actual designs can be very different. For example, the Intel 850E uses a dual-channel link to RDRAM memory, but due to the nature of the latter, the initial design called for two 16-bit paths and required RDRAM modules to be installed in pairs. Newer RIMM 4200 (32-bit) RDRAM removed this limitation and supports single-module use.
Due to its smaller 16/32-bit data pathways, RDRAM needs to run at higher speeds than DDR to keep up -- a single channel of even exotic PC1066 RDRAM can only match the 2.1GB/sec bandwidth of standard DDR266, thanks to the latter's 64-bit path. That is why the i850's dual-channel format (doubling bandwidth to 4.2GB/sec) has been so integral to the performance success of the Pentium 4 RDRAM platform -- and also why a dual-channel DDR memory controller, yielding a 128-bit memory path, is a dream of the performance crowd.
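The width-versus-speed tradeoff described above comes down to one division. As a rough sketch (the helper name is invented for illustration, and the figures are theoretical peaks rather than measured throughput), this works out how fast a narrow path must run to match a wide one:

```python
def required_rate(target_mb_s, width_bits, channels=1):
    """Millions of transfers per second needed to reach a target bandwidth."""
    bytes_per_transfer = width_bits // 8 * channels
    return target_mb_s / bytes_per_transfer


# Standard DDR266: a 64-bit path at an effective 266MHz.
ddr266_mb_s = 64 // 8 * 266

# A single 16-bit RDRAM channel must run about four times faster to match it.
one_rdram = required_rate(ddr266_mb_s, 16)

# Doubling the channels doubles the peak in both camps.
dual_ddr266 = 128 // 8 * 266          # 128-bit dual-channel DDR266
dual_pc1066 = 2 * (16 // 8) * 1066    # two 16-bit PC1066 channels

print(f"DDR266 peak: {ddr266_mb_s} MB/sec")
print(f"16-bit RDRAM rate needed to match: {one_rdram:.0f}MHz effective")
print(f"Dual-channel DDR266: {dual_ddr266} MB/sec")
print(f"Dual-channel PC1066: {dual_pc1066} MB/sec")
```

The required rate comes out at 1064MHz, which is why it takes PC1066 just to pull level with one channel of DDR266, and why a 128-bit DDR266 path would land within a whisker of the i850's dual-channel figure.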
Nvidia's nForce and nForce2 chipsets offer dual-channel DDR support for the AMD camp, while Intel is expected to unveil a dual-channel DDR Pentium 4 chipset (you may have heard the codename Granite Bay) soon.
The speed of the CPU/memory subsystem will continue to increase as time goes by. AMD's forthcoming Hammer/Opteron processors will incorporate a dual-channel DDR memory controller onto the chip itself, doing away with an external Northbridge controller, while dual-DDR chipsets from many vendors will proliferate and CPU caches grow ever larger (AMD's "Barton" variant will double the Athlon XP's Level 2 cache to 512K, while Intel's "Banias" and "Prescott" are expected to raise the L2 ante to 1MB). With CPU clock rates rising ever higher, it's all memory and chipset manufacturers can do to keep pace.