
Premature optimisation
is the root of all evil

D. Knuth

My Supercomputing Cluster

A Beowulf by any other name.

Today's Gigaflop-class machines still not fast enough for you? You can't be cured of your rapacious appetite for processing power (after all, it's only natural), but you can take 2, or 7, or a whole bunch of machines and configure them to run as one ridiculously overpowered computer.

That's what I've built.

Update

I have now disassembled my first cluster and given the component computers away to a local computer club in anticipation of being able to start building a small cluster of modern machines sometime in the new year...


Background

My main research interest is the complex behaviours that can arise from the simple activities of lots of fairly simple units. Consider the termites. Consider your brain. I need lots of processing power to run my experiments in virtual space. Lots of iterations of lots of parallel processes.

I've started small - and cheap, so I can write and test some software on a working cluster before spending any money. You have to write special programs to take advantage of all of the processors (a MOSIX cluster can be an exception to this, but not for the tasks I have in mind). Eventually I'd like to build a number of powerful diskless nodes that boot over the network onto a central server - but that takes time and money, both of which are always in short supply...

Hardware

To start with I've got five surplus PCs that I got hold of for free: four Pentium 200MMX machines with 96MB of RAM each, and one Pentium II 300MHz with 256MB of RAM, which acts as the 'head'. The nodes all have old 3Com 10Mbit ISA network cards, while the head has a PCI 100Mbit NIC. (I realise that a single modern Athlon or Pentium 4 system has many times the processing power of all of these computers combined, but the point for the time being is to learn about setting something like this up, and to practise programming for a node...)

Networking

Network connection diagram

In order to act together, all the computers in the cluster must be in communication. I'm using very cheap, slow ethernet networking to provide this communication - which doesn't matter too much for the applications I'm going to be running.

In the style of the original Beowulf, I have a 'head' node connected to all of the subservient computational nodes. The head node is responsible for providing a point of access to the cluster, and for coordinating the cluster's behaviour. Later, when the nodes are diskless, it will provide file system and boot services to them. It is the only node to have a screen or keyboard, and in most Beowulf setups it is the only node with a network connection to any computers outside the cluster. When I do any work on the cluster, I do it from this machine only.

Software / Libraries

The head needs to talk to the nodes, and the nodes need to talk to the head - and each other. I could write raw TCP/IP code into each of my programs, but instead I'm using a system known as MPI. MPI provides a number of C (and Fortran) functions to send messages between machines. Another, older, project that provides similar functionality is PVM.

(There is another clustering technology known as MOSIX. This is quite different from both MPI and PVM. The latter two are just programming libraries that provide some communication services over a TCP/IP network. MOSIX goes much further than that. It is a set of kernel patches [the very core of an operating system] that can move running programs from one node to another, transparently. If you had a lot of people using your cluster, each new program would be started on the node that was least busy. The MOSIX system is great for groups, or web farms, but not for parallel programming.)

Programming

In order to use my new cluster, I do have to write my programs specially. Luckily the problems I am interested in solving are all 'embarrassingly parallel' - it is relatively easy to write programs to solve them on a cluster, as opposed to on a single machine. All I have to do is break the problem into chunks in the head node, send the individual chunks to each node, then collect and collate the results. I use the functions from the MPI library to pass these messages:

// Setup
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
MPI_Comm_size(MPI_COMM_WORLD, &num_procs);

// Send message to the head (rank 0) node
MPI_Send(message, strlen(message)+1, MPI_CHAR,
	0, tag, MPI_COMM_WORLD);

// Receive message
MPI_Recv(message, 100, MPI_CHAR, source, tag,
	MPI_COMM_WORLD, &status);

// Close
MPI_Finalize();

Messages pass only between the head and the nodes, and infrequently at that, so I can get away with very cheap networking hardware. Most people are not so lucky - their nodes need to talk to each other all the time. Gigabit ethernet is considered an entry-level technology for these unfortunates.

Performance

You might expect that a cluster of ten PCs would be able to solve a problem ten times faster than a single PC - and for embarrassingly parallel problems like mine it often comes close. However, sometimes it's not that simple: communication overhead, and any part of the work that has to run serially, eat into the speedup.

Some problems, like some very large scale simulations, can only be run on clusters, as the problem is too big to fit into the address space of one PC. Often there are machines that can handle the problem, but at a huge cost. (It's a lot cheaper to build a cluster of 20 PCs with 1GB of RAM each than a single computer with 20GB of RAM...)

Sometimes a cluster can hugely outperform a single computer, because working directly from the CPU cache is many times faster than using RAM. A cluster of 20 Athlon CPUs has a combined total of 2.5MB of L1 cache, and 5MB of L2 cache. Spreading a problem over these CPUs can be much faster than running it on a single system.

The Future

My first cluster is tiny: five old PCs, connected by a slow old network. However, it cost me nothing but some time, and it is a useful learning tool. I'm going to play around with it for a while - write and test some programs, set up some administration scripts and things...

When I have programs ready to take full advantage of a massive number of Gigaflops (and some money saved!), I'll go out and buy a bunch of very bare mid-tower PCs. Just memory, motherboard, a CPU and a network card, choosing the CPU with the best price/performance ratio - usually somewhere in the middle of the range. I'll set each node up to boot over the network, so I don't need to have hard drives installed. That makes them cheaper, less prone to failure, and reduces the amount of administration I have to do.

Resources

By far and away the best place to learn more about the general practice and theory of cluster computing is the online book written by the venerable Robert G. Brown, 'Engineering a Beowulf-Style Compute Cluster'.