Team Effort: Clustering 101 By Dan Costa
Although these workday tasks are clustering's greatest hits, another application often gets more press: grid computing. The two terms are often used interchangeably -- both involve multiple systems working together to carry out a similar set of functions -- but there are differences. You can think of a cluster as grid computing under one roof: One company or department sets up the cluster and controls the whole system, which is usually localized or centralized.
Grid computing is more far-reaching; individual systems can be added or subtracted without a central control. What's more, miles can separate grid participants as long as there's a network connection between them. An example on a massive -- nay, cosmic -- scale is the SETI@home project, which enlists PC users all over the Internet to download a screen saver that uses spare clock cycles to sort through radio telescope data in search of signs of life in deep space.
One of the earliest examples of clustering was 1994's Beowulf Project, which connected 16 Intel DX4-based PCs via 10Mbps Ethernet; one PC acted as the master and user interface, with the others serving as slaves used solely for computation. Faster CPUs and networking technologies have been plugged into the same framework, which remains popular today as a sort of compromise between massively parallel processing and mere networks of workstations whose nodes may be available for other tasks.
While Beowulf technology is open-source, commercial versions that simplify the installation and configuration process are available from companies ranging from HP to Scyld Computing Corp. and Northrop Grumman. Recently, AMD announced that Scyld's cluster OS will be customized to support forthcoming Opteron server processors, allowing 64-bit clusters using an enhanced Linux kernel.
Beowulf-like clustering can be used to create a system using standard PC, server, and workstation components that rivals the muscle of a supercomputer for tasks like scientific investigations and sophisticated modeling. Although some companies like IBM are trying to apply grid technology to symmetric multiprocessing (SMP) environments to run enterprise applications, for now grid computing is best at tasks that involve SETI@home-style parallel processing -- jobs like modeling or genome sequencing that can be divided into parts and distributed across the grid, with results to be combined later. Accessing a single database isn't well suited to a cluster, but complex data mining is.
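The divide, distribute, and combine pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a thread pool on one machine stands in for the nodes of a cluster or grid, counting G and C bases is a toy stand-in for real genome analysis, and the function names (`split`, `count_gc`, `gc_fraction`) are hypothetical rather than taken from any clustering product.

```python
from concurrent.futures import ThreadPoolExecutor


def split(seq, parts):
    """Divide seq into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(seq), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks each take one leftover element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(seq[start:end])
        start = end
    return chunks


def count_gc(fragment):
    """Work done independently on one node: count G and C bases."""
    return sum(1 for base in fragment if base in "GC")


def gc_fraction(sequence, nodes=4):
    """Split the job, farm the pieces out, then combine the partial results."""
    if not sequence:
        return 0.0
    chunks = split(sequence, nodes)
    # Assumption: local worker threads play the role of cluster nodes.
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partial_counts = list(pool.map(count_gc, chunks))
    # Combine step: the partial counts simply add up.
    return sum(partial_counts) / len(sequence)


if __name__ == "__main__":
    print(gc_fraction("GATTACAGGC", nodes=3))
```

The job parallelizes cleanly because each chunk can be processed without reference to any other, and the combine step is a simple sum. Work that requires constant coordination over shared data, like the single-database case mentioned above, does not break apart this way.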
Clustering is often thought of as a Linux or Unix application, but Windows XP and Mac OS can be used in clusters, too; Web-server clusters often use off-the-shelf operating systems and software. More scientifically oriented applications tend to be written for whatever platform they run on.
Practically every computer company you can name, from IBM and Sun to HP and Microsoft, has invested in clustered solutions. According to IDC Research, Dell is the current market leader in x86 supercomputer clusters, with revenue of $65 million last year; IBM is close behind with $60 million and HP earned $48 million. Even Apple now offers a cluster-ready version of its Xserve server with dual 1.33GHz PowerPC G4 processors and a Gigabit Ethernet port.
To get the best performance from any cluster, it helps to use the fastest available CPUs and connections between systems, but clusters by nature are more than the sum of their parts. Just as grid computing solutions can accept the computational contributions of big servers and humble desktops alike, old Pentium IIIs and PowerPC chips have been strung together to create some very affordable high-performance computing systems. So think twice before you throw out that old PC -- depending on the applications you need to run, it might be reborn as part of a cluster.