The following is a simple guide to setting up a cluster server and nodes using Arch Linux. The advantage of this approach is its flexibility: a machine capable of high-speed parallel computation can be assembled from commodity hardware.
The procedure will be broadly similar on most Unix-based systems. The preference for Arch is driven by its keep-it-simple philosophy. 'Simple' is defined from a technical standpoint, not a usability standpoint: it is better to be technically elegant with a steeper learning curve than to be easy to use and technically crap. For a base system that is as lean and fast as possible, the minimalist base Arch install is therefore well suited to the task at hand.
The Open MPI Project is an open source MPI-2 implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available.
This guide assumes:
- all the machines have been formatted and the Arch base system installed according to the guide
- the machines are connected via a TCP/IP network, with the IP addresses and hostnames noted down, as they will be required in later steps
- each machine has a common login account (in this case baloo)
- all machines use the same processor architecture, either i686 or x86_64
It's always a good idea to start from a fully up-to-date Arch system, so run a quick:
pacman -Syu
Open MPI communicates between the nodes and the server over a secure connection provided by the OpenSSH secure shell. The full details of the OpenSSH options can be found on the Arch wiki or the main OpenSSH site. Here the bare minimum is given to get a cluster up and running.
Install it by calling:
pacman -S openssh
The default configuration of sshd (the server daemon) is enough for our needs. Inspect /etc/ssh/sshd_config, make sure all options are sane, then continue.
To allow the server to communicate with the nodes without a password being requested every time, we shall use SSH keys to enable seamless logon. Generate a key pair on each machine with ssh-keygen, accepting the defaults as given. No passphrase is selected; although this is inherently less secure than using one, it avoids the need to set up key management via a keyring.
Copying Keys to the server
Start the SSH daemon with rc.d start sshd on both the server and the slave node, and copy the public key from each node to the server. These will all end up in the home directory of our common user baloo, i.e. /home/baloo/.ssh/
The server's public key (id_rsa.pub) and each of the public keys copied over from the nodes are then appended to the authorized_keys file at ~/.ssh/authorized_keys on the server. To enable two-way communication, this file can then be copied back to all the nodes.
IMPORTANT: make sure the following are readable and writable only by their owner:
chmod 700 ~/
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
Logging into the remote machines via ssh should no longer require a password.
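If a password prompt still appears, wrong permissions on the home directory or the key files are a common cause. The following is a minimal sketch (assuming Python 3) that compares the modes on one machine against the chmod values used in this guide. The function name check_ssh_permissions and the EXPECTED_MODES table are illustrative names, not part of OpenSSH.

```python
import os
import stat

# Expected modes, taken from the chmod commands in this guide.
EXPECTED_MODES = {
    "": 0o700,                      # the home directory itself
    ".ssh": 0o700,
    ".ssh/authorized_keys": 0o600,
}


def check_ssh_permissions(home):
    """Return (path, actual_mode, expected_mode) for every mismatch under home.

    actual_mode is None when the path does not exist.
    """
    problems = []
    for relative, expected in EXPECTED_MODES.items():
        path = os.path.join(home, relative) if relative else home
        try:
            actual = stat.S_IMODE(os.stat(path).st_mode)
        except FileNotFoundError:
            problems.append((path, None, expected))
            continue
        if actual != expected:
            problems.append((path, actual, expected))
    return problems


if __name__ == "__main__":
    # Check the home directory of whoever runs the script (baloo in this guide).
    for path, actual, expected in check_ssh_permissions(os.path.expanduser("~")):
        found = "missing" if actual is None else oct(actual)
        print(f"{path}: found {found}, expected {oct(expected)}")
```

Run it as the common user on the server and on each node; no output means the three paths match the modes given above.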
Open MPI requires the programs that are to be run to be in a common location on every machine. Instead of copying the program executable over and over to the slave nodes, we set up a simple NFS shared directory: the actual folder lives on the server, and all the nodes mirror its contents.
On the server, create the directory that will be shared (/parallel in this instance) and edit /etc/exports so that the directory is exported to the remote nodes:
/parallel *(rw,sync)
Then change the ownership of the shared directory to nobody:
chown -R nobody:nobody /parallel
On each node, edit /etc/fstab to include the following line so the clients can access the shared /parallel directory (192.168.2.103 is the server's IP address in this example):
192.168.2.103:/parallel /parallel nfs defaults 0 0
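Once a node has mounted the share, it can be useful to confirm that /parallel really comes from the server and not from a local, empty directory. The sketch below (assuming Python 3 on Linux) looks the mount point up in /proc/mounts. The helper name find_mount is illustrative, and the reported filesystem type may be nfs or nfs4 depending on the NFS version negotiated.

```python
def find_mount(mounts_text, mount_point):
    """Return (device, fstype) for mount_point, or None if it is not mounted.

    mounts_text uses the /proc/mounts layout:
    device mount_point fstype options dump pass
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            found = (fields[0], fields[2])  # a later entry shadows an earlier one
    return found


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as handle:
            mount = find_mount(handle.read(), "/parallel")
    except OSError:
        mount = None  # /proc/mounts is not available on this system
    if mount is None:
        print("/parallel is not mounted")
    else:
        print(f"/parallel is mounted from {mount[0]} as {mount[1]}")
```

On a correctly configured node the device reported should be 192.168.2.103:/parallel, matching the fstab entry above.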
Setting the appropriate daemons to launch at start-up simply requires adding them to the DAEMONS array in /etc/rc.conf.
On the server:
DAEMONS=(…….sshd rpcbind nfs-common nfs-server ……)
On the nodes:
DAEMONS=(…….sshd rpcbind nfs-common ……)
With the preliminary setup out of the way we can now install the Open MPI package. It comes with built-in wrappers for C, C++ and Fortran; additionally, the Python wrappers can also be installed. It should be installed on both the server and the nodes:
pacman -S openmpi python-mpi4py python2-mpi4py
* The Python wrappers are only needed if you want to implement the parallel programs in MPI for Python.
To let Open MPI know on which machines to run your programs, create a hostfile in the default user's home directory, here ~/mhosts. If /etc/hosts was set up you can use the hostnames here; otherwise the IP addresses of the machines work just as well.
# The master node is a dual processor machine, hence slots=2
# The slave node is a quad core machine, hence slots=4
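Before launching a job it helps to know how many slots the hostfile offers in total, since that bounds the sensible value for -n. The following is a rough sketch (assuming Python 3) that reads a hostfile written in the usual "host slots=N" form. The hostnames master and node1 are placeholders, not the names used on the original cluster, and counting a host without slots= as one slot is a simplifying assumption: Open MPI's own default depends on the version.

```python
def parse_hostfile(text):
    """Map each host in an Open MPI style hostfile to its slot count."""
    slots = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        host, *options = line.split()
        count = 1  # assumption: a host without slots= counts as one slot
        for option in options:
            key, _, value = option.partition("=")
            if key == "slots" and value:
                count = int(value)
        slots[host] = count
    return slots


# Hypothetical hostfile contents: "master" and "node1" are placeholder names.
EXAMPLE = """\
# The master node is a dual processor machine, hence slots=2
master slots=2
# The slave node is a quad core machine, hence slots=4
node1 slots=4
"""

if __name__ == "__main__":
    hosts = parse_hostfile(EXAMPLE)
    print(hosts, "total slots:", sum(hosts.values()))
```

To check a real hostfile, pass the contents of ~/mhosts to parse_hostfile instead of EXAMPLE.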
Running Programs on the cluster
To run myprogram on the cluster, issue one of the following commands from the /parallel directory:
$ mpirun -n 4 --hostfile ~/mhosts ./myprogram
$ mpirun -n 4 --hostfile ~/mhosts python myprogram.py
or
$ mpiexec -n 4 --hostfile ~/mhosts ./myprogram
$ mpiexec -n 4 --hostfile ~/mhosts python myprogram.py
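For a first test of the Python route, something along the following lines could serve as myprogram.py. It is only a sketch, assuming Python 3 and the mpi4py package installed earlier: each process sums its own slice of the numbers 0..999 and rank 0 prints the combined total. The helper block_range and the problem size are illustrative choices, and the import is guarded only so the helper can be tried on a machine without MPI.

```python
try:
    from mpi4py import MPI  # provided by the python-mpi4py package
except ImportError:
    MPI = None  # lets block_range be tried without MPI installed


def block_range(n_items, rank, size):
    """Return the half-open range [start, stop) of items owned by rank."""
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)  # the first `extra` ranks get one more
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def main(n_items=1000):
    if MPI is None:
        print("mpi4py is not installed; nothing to run")
        return
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    start, stop = block_range(n_items, rank, size)
    partial = sum(range(start, stop))                 # each rank sums its slice
    total = comm.reduce(partial, op=MPI.SUM, root=0)  # combined on rank 0 only
    host = MPI.Get_processor_name()
    print(f"rank {rank} of {size} on {host}: items [{start}, {stop})")
    if rank == 0:
        print("total:", total)


if __name__ == "__main__":
    main()
```

Saved in /parallel, it is launched with the mpirun or mpiexec command shown above. Seeing more than one hostname in the output is a quick sign that the work is really being spread across the machines listed in ~/mhosts.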