Code Morphing Meets the Northbridge

Speed isn’t everything. Since its debut in 2000, Transmeta Corp. has always positioned its Crusoe processor as a balance of computing performance and low power consumption, optimized for longer battery life in slimline subnotebooks or operation without noisy cooling fans in blade servers or thin-client terminals.

Nevertheless, Windows XP does have an appetite for CPU power. When our sister site Hardware Central tested a 1.0GHz Crusoe TM5800-powered Sharp Actius MM10 notebook in September 2003, it posted the slowest benchmark scores since a 900MHz AMD Duron desktop from mid-2001. A year ago, HP chose the Crusoe chip for its first Tablet PC, the Compaq TC1000, but reviewers used the S word — sluggish. The new TC1100 features Intel’s low-voltage Pentium M.

Transmeta is happy to sell processors to makers of wearable computers or set-top boxes, but won’t give up the PC market without a fight. That’s why early 2004 will see the arrival of the Efficeon TM8600 — the company’s second-generation CPU, which Transmeta’s Web site says “offers a better balance of performance, power, cost, and platform choices than the Crusoe processor … In particular, the Efficeon provides a higher level of absolute performance and responsiveness that will be very impressive to the user.”

Along the way, Transmeta has also come up with one of the most innovative, integrated CPU designs to date: While on-chip caches have become commonplace and AMD squeezed a built-in memory controller into the Opteron and Athlon 64, Efficeon absorbs the entire system-chipset Northbridge onto the processor die. Here’s an overview of how Transmeta hopes to revitalize its different-drummer “Code Morphing” architecture for a new, more demanding generation of mobile devices.

Replacing Hardware with Software

Like Crusoe, Efficeon (the name is supposed to suggest a new eon, or era, of efficient computing) performs the feat of fully compatible, off-the-shelf Windows and Linux software execution without actually using the x86 architecture of most Intel and AMD chips.

The CPU is a proprietary, very-long-instruction-word (VLIW) design, with a native-instruction-set software library of precisely one program: the Code Morphing software layer written by Transmeta, which converts x86 instructions into VLIW instructions in real time. This on-the-fly emulation (more accurately, translation) fools x86 software — starting with the computer’s BIOS, then continuing to the operating system and applications — into thinking it’s running on x86 hardware.

In other words, Crusoe and Efficeon are hardware/software hybrids instead of pure hardware processors. By decoupling the x86 instruction set from the underlying hardware, they allow for even radical CPU redesigns with no need for software changes apart from the Code Morphing layer. The latter is copied from ROM into RAM at boot time to speed execution (stealing some 24MB of system memory in current Crusoe PCs; Transmeta hasn’t yet revealed the TM8600’s memory overhead).

The hybrid approach lets Transmeta keep the VLIW core relatively simple, using software to perform tasks — such as translating x86 instructions into RISC-like operations or micro-ops for parallel execution, then reshuffling them into the proper order — that are normally handled by x86 hardware. This means fewer logic transistors are required, permitting a smaller, cooler-running chip. It can also yield relative performance enhancements — instead of translating each x86 instruction every time it’s encountered, for instance, Transmeta’s scheme saves the result in a translation cache for reuse the next time that instruction comes along.
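The translation-cache idea can be illustrated with a minimal Python sketch. It is not Transmeta's implementation: the `translate` function, the `TranslationCache` class, and the choice to key entries by instruction address are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


def translate(x86_instruction: str) -> List[str]:
    """Hypothetical stand-in for the costly x86-to-native translation step."""
    return [f"native<{x86_instruction}>"]


class TranslationCache:
    """Remembers translations so repeated code is translated only once."""

    def __init__(self, translator: Callable[[str], List[str]]) -> None:
        self._translator = translator
        self._entries: Dict[int, List[str]] = {}  # keyed by address (assumption)
        self.hits = 0
        self.misses = 0

    def fetch(self, address: int, x86_instruction: str) -> List[str]:
        if address in self._entries:
            self.hits += 1      # reuse the saved translation
        else:
            self.misses += 1    # translate once, then remember the result
            self._entries[address] = self._translator(x86_instruction)
        return self._entries[address]


# A three-instruction loop body executed four times.
loop_body = [(0x1000, "add eax, ebx"), (0x1002, "dec ecx"), (0x1003, "jnz 0x1000")]
cache = TranslationCache(translate)
for _ in range(4):
    for address, instruction in loop_body:
        cache.fetch(address, instruction)

print(cache.misses, cache.hits)
```

In this toy run only the first pass through the loop pays the translation cost; the remaining passes are served from the cache, which is why loop-heavy code benefits most from the approach.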

Of course, the scheme also costs some performance, since the CPU has to dedicate some cycles to running the Code Morphing software instead of application code. Compared to Crusoe, Efficeon’s Code Morphing layer has been rewritten for higher efficiency, putting x86 code through up to four levels of translation. A “first gear” gathers data for flow analysis and filters out and handles infrequently executed instructions one at a time (the slowest way, but with the lowest overhead). A “second and third gear” translate and optimize “regions” of up to 100 x86 instructions at once. A “fourth gear” splices together regions and performs the most advanced optimizations for the most complex code.
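One way to picture the four gears is as a tiering policy that spends more optimization effort on code that runs more often. The sketch below assumes promotion is driven by a simple execution count; the thresholds and the `select_gear` function are hypothetical, since Transmeta has not published the actual criteria.

```python
# Hypothetical execution-count thresholds; Transmeta has published no real figures.
GEAR_THRESHOLDS = [(1000, 4), (100, 3), (10, 2)]


def select_gear(execution_count: int) -> int:
    """Return the translation gear for code seen `execution_count` times."""
    for threshold, gear in GEAR_THRESHOLDS:
        if execution_count >= threshold:
            return gear
    return 1  # rarely executed code stays in the low-overhead first gear


for count in (1, 10, 150, 5000):
    print(count, select_gear(count))
```

The design trade-off is the usual one for tiered translators: higher gears cost more up front, so they pay off only for code that is executed often enough to amortize the work.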

More Powerful with Less Power

Hardware-wise, the Efficeon is sort of like Crusoe times two: While Transmeta’s first CPU’s 128-bit VLIW architecture could handle instruction words (which the company calls “molecules”) containing up to four 32-bit instructions (dubbed “atoms”) per clock cycle, the TM8600 is a 256-bit VLIW design that can issue up to eight instructions per clock cycle. It has 6-stage integer and 8-stage floating-point pipelines, 64 32-bit integer registers, and the same number of 80-bit floating-point registers.
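The molecule arithmetic can be checked with a short Python sketch. It assumes the best case, in which every atom is independent and any atom can fill any slot; real code has dependencies and functional-unit limits that leave some slots empty. The `pack` helper is purely illustrative.

```python
from typing import List

ATOM_BITS = 32  # atom width given in the article


def atoms_per_molecule(molecule_bits: int) -> int:
    """Number of 32-bit atoms that fit in one VLIW molecule."""
    return molecule_bits // ATOM_BITS


def pack(atoms: List[str], molecule_bits: int) -> List[List[str]]:
    """Greedily fill molecules, assuming all atoms are independent."""
    width = atoms_per_molecule(molecule_bits)
    return [atoms[i:i + width] for i in range(0, len(atoms), width)]


atoms = [f"atom{i}" for i in range(16)]
crusoe_molecules = pack(atoms, 128)    # 128-bit molecules
efficeon_molecules = pack(atoms, 256)  # 256-bit molecules

print(len(crusoe_molecules), len(efficeon_molecules))
```

Under these idealized assumptions the same 16 atoms need half as many molecules, and therefore half as many issue cycles, on the wider design.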

Add compatibility with MMX, SSE, and SSE2 multimedia x86 instructions, and you get what Transmeta says is up to 50 percent better performance per megahertz than Crusoe on typical applications and 80 percent better performance on multimedia programs.

The TM8600 chip contains 192K of Level 1 cache — 64K for data and 128K for instructions — and 1MB of Level 2 cache (Transmeta says there’ll also be an economy version with only 512K of L2). It’s also the first x86 processor to fully integrate Northbridge core logic, including a low-latency memory interface compatible with DDR400 (or slower DDR333 or DDR266) memory in either conventional or server-friendly ECC flavors; an AGP 4X (AGP 2.0) graphics controller; a Low Pin Count (LPC) bus interface for flash ROM peripherals; and a low-voltage, point-to-point, 400MHz HyperTransport interface that the company boasts delivers a dozen times the throughput of the classic 32-bit, 33MHz PCI bus.
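The "dozen times" figure can be reproduced with back-of-the-envelope arithmetic. The article does not state the HyperTransport link width, so the sketch below assumes an 8-bit link in each direction, transferring on both clock edges and counted in both directions; treat those parameters as assumptions rather than Transmeta's specification.

```python
# Classic PCI: 32-bit (4-byte) bus, one transfer per 33.33MHz clock.
pci_bytes_per_sec = 4 * 33.33e6

# HyperTransport at 400MHz. Assumed: 8-bit (1-byte) link per direction,
# double data rate, with both directions counted together.
ht_link_bytes = 1
ht_clock_hz = 400e6
ht_bytes_per_sec = ht_link_bytes * ht_clock_hz * 2 * 2

ratio = ht_bytes_per_sec / pci_bytes_per_sec
print(f"PCI: {pci_bytes_per_sec / 1e6:.0f} MB/s")
print(f"HyperTransport: {ht_bytes_per_sec / 1e6:.0f} MB/s")
print(f"Ratio: {ratio:.1f}x")
```

With those assumptions the aggregate link bandwidth comes to about 1.6GB/sec against roughly 133MB/sec for PCI, which matches the claimed factor of twelve.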

For PC designers, this means higher memory performance and a lower chip count, the latter reducing both system power consumption and motherboard real-estate requirements. According to Transmeta, the Efficeon TM8600’s die size is just 119 square millimeters — destined to shrink to 68 square millimeters when production moves from TSMC’s 0.13-micron to Fujitsu’s 90-nanometer process technology in the second half of 2004 — and the total package size, including the Northbridge and AGP port, is just 29mm by 29mm or 841 square millimeters.
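The area figures are easy to sanity-check. The sketch below recomputes the package area and compares the quoted die shrink with ideal geometric scaling from 130 to 90 nanometers. The ideal-scaling formula is a textbook approximation, not a Transmeta figure.

```python
package_side_mm = 29
package_area_mm2 = package_side_mm ** 2

die_area_130nm = 119.0  # square millimeters, per Transmeta
die_area_90nm = 68.0    # square millimeters, per Transmeta

quoted_shrink = die_area_90nm / die_area_130nm
ideal_shrink = (90 / 130) ** 2  # area scales with the square of feature size

print(package_area_mm2)
print(f"quoted: {quoted_shrink:.2f}, ideal: {ideal_shrink:.2f}")
```

The quoted shrink leaves the die at about 57 percent of its original area, somewhat short of the roughly 48 percent that perfect scaling would give, which is typical because not every structure on a chip scales with the process.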

Combine that with the nForce3 Go 150 Southbridge chip announced by partner Nvidia Corp., and you’ve taken up only 58 percent of the footprint of Intel’s Pentium M processor plus its separate Northbridge and Southbridge.

What’s more, Transmeta claims, its LongRun dynamic power management technology helps a 1.1GHz Efficeon solution squeeze under a total thermal-design-power (TDP) ceiling of 7 watts, about the limit for fanless systems. The same ceiling restricts Intel’s Pentium M to 900MHz, and the Efficeon solution takes only one-eighth as much power (0.18 versus 1.45 watts) in standby mode. When fabrication moves to 90 nanometers, the chipmaker predicts, the same 7-watt TDP limit will permit a 1.6GHz instead of 1.1GHz clock speed.

No independent parties have benchmarked an Efficeon system yet, so we can’t verify Transmeta’s declaration that its new CPU will deliver unmatched “performance per watt per dollar.” The company’s launch presentation cited 1.3 times the floating-point and 1.2 to 2.7 times the integer performance of the above-mentioned, wattage-comparable Pentium M/900 and a narrow victory in the SysMark 2002 benchmark, though it also candidly included narrow-defeat numbers for the PCMark 2002 CPU and MobileMark 2002 tests.

But win or lose, it looks like Transmeta has prepared one of the most original and interesting processor designs of the year — and one that could keep it on mobile PC shoppers’ short lists alongside Intel and AMD.
