Tag-Team Processing Offers Supercomputing on a Shoestring
Sometimes one processor, no matter how powerful, just isn't enough. In fact, sometimes not even two Opterons or four Itaniums will get the job done. If you want to model seismic activity along the West Coast to predict when "the big one" is coming, or track every economic indicator in the U.S. to see if the dollar will weaken against the yen, your job involves so many variables and such complicated algorithms that not even the largest server can handle it.
In this case, you have two choices: invest in a supercomputer, which takes years to build and costs millions of dollars, or string together a series of less powerful systems and achieve nearly the same performance for a fraction of the cost.
The National Center for Supercomputing Applications (NCSA) is taking the second path as it tries to determine nothing less than the origin of the universe: Rather than a single, huge supercomputer, the Illinois-based center is building a cluster. This fall, NCSA will connect more than 1,280 Dell PowerEdge 1750 servers with two Intel Xeon CPUs apiece, running Red Hat Linux and linked by Myricom's Myrinet 2000 interconnect technology. The result should yield a peak performance of 17.7 teraflops (trillion floating point operations per second) -- as of today, fast enough to make it the third most powerful supercomputer on Earth. And you can order all of the parts online.
The Promise of Clustering
The clustering approach is important for several reasons. First, it provides a cost-effective way to get more computing power from existing PCs and workstations. Second, it's one of the few tools available to tackle ultra-complex problems such as weather and economic modeling.
Finally, it makes supercomputing-class power more available to businesses and institutions worldwide. When you think of supercomputing, do you think of the Cray brand? If you check out Top500, which tracks the strongest supercomputers in the world, you'll find it barely in the top 50 -- increasingly eclipsed by clusters of systems using commercially available Xeon and Itanium processors.
Clustering means different things to different people, but at the most basic level it involves connecting multiple computers so they work as a single system. While multiprocessing involves two or more CPUs in one machine, clustering involves two or more machines (each of which, in turn, may be a multiprocessing system).
The most common goals of clustering are load balancing and high availability. On the latter score, despite the demise of the dot-com economy, there are still thousands of dot-coms that want to keep their sites up 99.99 percent of the time -- which allows less than an hour (about 53 minutes) of downtime per year. With clustered systems, a backup is always running in real time, ready to take over if a failure occurs. In a so-called Web farm, if one server's CPU overheats or hard disk crashes, other servers in the cluster proceed without skipping a beat.
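The downtime budget implied by an availability target is simple arithmetic. The short Python sketch below works it out for a few targets; the function name is illustrative, and it assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years (an assumption)


def downtime_minutes_per_year(availability_percent):
    """Return the yearly downtime budget, in minutes, for an uptime target."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR * 60


for target in (99.0, 99.9, 99.99):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% uptime allows {minutes:.1f} minutes of downtime per year")
```

Each additional "nine" cuts the allowance by a factor of ten, which is why redundancy across machines, rather than a faster repair on one machine, is the usual route to such targets.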
IT managers can also set up scripts for load balancing, so that if one server gets overworked and starts to slow down, another can pick up the slack. Clustering also lets companies scale their networks as they grow -- a business might start with a relatively small, four-system server cluster, but as traffic increases, it can add more servers to handle the load without changing its entire architecture.
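To make the idea concrete, here is a minimal Python sketch of a dispatcher that skips failed servers, sends each request to the least-busy healthy one, and lets new machines join without other changes. The `Server` and `Cluster` classes and the `web1`-style names are hypothetical illustrations, not any vendor's product or API; real load balancers track health and load in far more sophisticated ways.

```python
class Server:
    """A hypothetical cluster member with a health flag and a load counter."""

    def __init__(self, name):
        self.name = name
        self.active_requests = 0
        self.healthy = True


class Cluster:
    """Routes each request to the least-loaded healthy server."""

    def __init__(self, servers):
        self.servers = list(servers)

    def add_server(self, server):
        # Scaling out: a new machine simply joins the pool.
        self.servers.append(server)

    def pick_server(self):
        # Failover: servers marked unhealthy are never considered.
        candidates = [s for s in self.servers if s.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers available")
        # Load balancing: choose the server with the fewest active requests.
        return min(candidates, key=lambda s: s.active_requests)

    def dispatch(self):
        server = self.pick_server()
        server.active_requests += 1
        return server.name


# Start with a four-system cluster, then simulate a failure on one machine.
cluster = Cluster(Server(f"web{i}") for i in range(1, 5))
cluster.servers[0].healthy = False
print([cluster.dispatch() for _ in range(4)])

# Traffic grows, so a fifth server is added; it is idle and gets the next request.
cluster.add_server(Server("web5"))
print(cluster.dispatch())
```

In this sketch the failed `web1` receives no traffic, and the newly added `web5` picks up the next request because it is the least busy.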