Bigred Instructions
Instructions provided by the bigred system admin. Note: these may be a bit out of date (I (Ryan) think the Intel compilers are no longer available).
Basic Cluster Instructions (for bigred)
---------------------------------------------------------------------
To log into the cluster, just ssh into it, like so:
[user@mypc]$ ssh <cacs-username>@bigred.cacs.louisiana.edu
PROFILE SETUP
-------------
To set up your environment for Java and MPI compilation:
In the .bash_profile file in your home-dir, add (or append to) the PATH variable:
-------------in ~/.bash_profile----------------
For mpich, add/append:
PATH=$PATH:/opt/mpich/gnu/bin
For lam-mpi, add/append:
PATH=$PATH:/opt/lam/gnu/bin
For the Intel distribution of mpi, add/append:
PATH=$PATH:/opt/mpich/intel/bin
For java (particularly RMI) to work, add:
CLASSPATH=~/<classpath> (e.g. ~/javaprogs)
Then add/append:
PATH=$PATH:/usr/java/jdk1.5.0_05/bin
For SGE (this is recommended), add:
echo -n "Setting SGE environment... "
source $SGE_ROOT/default/common/settings.sh
echo done!
Then add/append:
export PATH CLASSPATH
-----------------------------------------------
Remember to re-source the .bash_profile file after making your changes:
[user@bigred]$ source ~/.bash_profile
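Put together, the profile additions for a GNU mpich, Java and SGE setup might look like the sketch below. The paths are the ones listed above and ~/javaprogs is the example classpath directory. The check that the SGE settings file is readable is an added precaution (an assumption, not part of the admin's instructions) so that a login does not print errors if SGE_ROOT is unset.

```shell
# ~/.bash_profile additions (sketch): GNU mpich + Java + SGE

# MPI compiler wrappers and mpirun (GNU mpich build)
PATH=$PATH:/opt/mpich/gnu/bin

# Java: classpath directory (example name) and JDK tools
CLASSPATH=$HOME/javaprogs
PATH=$PATH:/usr/java/jdk1.5.0_05/bin

# SGE: load the environment only if the settings file is readable
# (this guard is an added assumption, not from the original instructions)
if [ -n "${SGE_ROOT:-}" ] && [ -r "$SGE_ROOT/default/common/settings.sh" ]; then
    . "$SGE_ROOT/default/common/settings.sh"
fi

export PATH CLASSPATH
```

Swap /opt/mpich/gnu/bin for the lam-mpi or Intel path if you use one of those instead.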
SGE
---
SGE v6 is the job-management system installed on bigred. You should use SGE
to submit any medium/large jobs, especially parallel jobs. The major commands that
users will need to be familiar with to use SGE are:
qsub - To submit jobs.
qdel - To delete jobs.
qstat - To monitor jobs.
Jobs are submitted to SGE via a shell script. Some examples of what the scripts should look
like are available in /opt/gridengine/examples. In particular, see: sge-qsub-test.sh. The script
specifies which executable(s) will be run and what their environment should be. Some examples of
options that can be set in the script are:
#$ -cwd (runs the job from the current working directory, so its output files are written there)
#$ -j y (merges the error and output streams into one file)
#$ -S /usr/bin/bash (tells SGE to use the bash shell)
#$ -pe mpi 20 (requests the "mpi" parallel environment with 20 processors)
By default, SGE writes program output and errors to <script-name>.o<job-ID> and <script-name>.e<job-ID>,
respectively. So if your script name is myscript.sh and the job-ID is 1234, SGE will output two files,
myscript.sh.o1234 and myscript.sh.e1234. The "-j y" option merges these two into one file.
For MPICH jobs, the -pe switch is particularly important.
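As a rough illustration of how these options fit together, the commands below write a small job script to my-sge-script.sh. The program name ./test and the slot count of 4 are placeholders, and the fallback to one process when NSLOTS is unset is an added assumption so the script can also be run by hand. Depending on how the mpi parallel environment is configured on bigred, mpirun may need further options (such as a machine file); compare with sge-qsub-test.sh in /opt/gridengine/examples before relying on this sketch.

```shell
# Write a sample SGE job script (sketch). ./test and the slot count 4
# are placeholders for illustration.
cat > my-sge-script.sh <<'EOF'
#!/bin/bash
#$ -cwd
#$ -j y
#$ -S /usr/bin/bash
#$ -pe mpi 4

# NSLOTS is set by SGE to the number of slots granted by -pe;
# fall back to 1 so the script can also be run by hand for testing.
mpirun -np "${NSLOTS:-1}" ./test
EOF
```

Using NSLOTS instead of repeating the number keeps the -np value in step with whatever -pe requests.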
Once you have created your script, submit it like so:
[user@bigred]$ qsub my-sge-script.sh
your job 1234 ("my-sge-script.sh") has been submitted (sample output after running the command)
To delete it, you need the job-ID number (1234 in the example above), like so:
[user@bigred]$ qdel 1234
<username> has deleted job 1234 (sample output after running the command)
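If you submit jobs from your own scripts, the job-ID can be picked out of the qsub confirmation line instead of being copied by hand. The helper below is a sketch: job_id_from_qsub is a hypothetical name, and it assumes the message format shown in the sample output above, with the job-ID as the third word.

```shell
# Print the job-ID found in qsub's confirmation line (read from stdin).
# Assumes the format: your job <id> ("<name>") has been submitted
job_id_from_qsub() {
    awk '/has been submitted/ { print $3; exit }'
}
```

For example:
[user@bigred]$ job_id=$(qsub my-sge-script.sh | job_id_from_qsub)
[user@bigred]$ qdel "$job_id"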
To query the status of your SGE jobs use qstat, like so:
[user@bigred]$ qstat -u <username>
Sample output:
job-ID prior name user state submit/start at queue master ja-task-ID
---------------------------------------------------------------------------------------------
7140 0 <executable> <username> r 11/14/2005 11:53:43 compute-0- MASTER
7124 0 <executable> <username> r 11/14/2005 11:06:46 compute-0- MASTER
7126 0 <executable> <username> r 11/14/2005 11:08:50 compute-0- MASTER
7125 0 <executable> <username> r 11/14/2005 11:07:47 compute-0- MASTER
7142 0 <executable> <username> r 11/14/2005 11:55:14 compute-0- MASTER
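The qstat listing can also be checked from a script, for instance to wait until a job has left the queue. The following is a minimal sketch, assuming the job-ID is the first column as in the sample output above. The function names job_listed and wait_for_job are hypothetical, and the 30-second polling interval is an arbitrary choice.

```shell
# Succeed if the given job-ID appears in qstat output read from stdin.
# Assumes the job-ID is the first column, as in the sample output.
job_listed() {
    awk -v id="$1" '$1 == id { found = 1 } END { exit !found }'
}

# Poll until the job no longer appears in qstat, i.e. it has finished
# or been deleted. $1 is the job-ID, $2 an optional interval in seconds.
wait_for_job() {
    while qstat -u "$USER" | job_listed "$1"; do
        sleep "${2:-30}"
    done
}
```

For example:
[user@bigred]$ wait_for_job 7140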
There are also a host of other SGE commands, such as qalter, qrsh and qlogin. For more information,
see the qsub/qstat/qdel man-pages. Full documentation on using SGE can be found at:
http://gridengine.sunsource.net/documentation.html
MPI
---
To compile an MPI program:
[user@bigred]$ mpicc -o <executable-name> <sourcefile-name>
e.g.: [user@bigred]$ mpicc -o test test.c
To run an MPI program:
[user@bigred]$ mpirun -np <number-of-processors> <executable-name>
e.g.: [user@bigred]$ mpirun -np 4 test (will run a program on 4 processors)
Note that the GNU mpich environment is somewhat fragile, and a crashed program will
probably fail to free up shared memory arrays and semaphores. For this reason, there
is a clean IPC command located here:
/opt/mpich/gnu/clean/cleanipcs
This command must be run on every node on which a submitted program has run. So, if
SGE has allocated nodes compute-0-1 and compute-0-2 to run an 8-processor MPI job on,
then following a forced job kill you would have to run:
[user@bigred]$ ssh compute-0-1 /opt/mpich/gnu/clean/cleanipcs
[user@bigred]$ ssh compute-0-2 /opt/mpich/gnu/clean/cleanipcs
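When a job was spread over many nodes, the same cleanup can be run in a loop. This is only a sketch: clean_nodes is a hypothetical helper, and the REMOTE variable is an added convenience (not part of the cluster setup) that lets you preview the commands by setting REMOTE=echo before calling it.

```shell
# Sketch: run the IPC cleanup command on every node given as an argument.
CLEANIPCS=/opt/mpich/gnu/clean/cleanipcs   # path from the instructions above
REMOTE=${REMOTE:-ssh}                      # hypothetical knob: REMOTE=echo gives a dry run

clean_nodes() {
    for node in "$@"; do
        # Report a failure but carry on with the remaining nodes.
        $REMOTE "$node" "$CLEANIPCS" || echo "cleanup failed on $node" >&2
    done
}
```

For example:
[user@bigred]$ clean_nodes compute-0-1 compute-0-2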
You can use Ganglia to find out what nodes have been allocated for your MPI job. See the
Ganglia section.
For more info, check the man-pages on mpirun (man mpirun). There are
many more run-time options (such as what nodes to run your program on)
that you can specify.
NOTE: Running MPICH jobs directly on the cluster is NOT recommended, particularly when
the jobs are very large. You can directly run the job with one or two (-np 2) processors
for testing, but for actual job-submission, use the SGE job-manager. See the SGE section.
Java
----
For java, remember to put the programs you intend to compile in
the directory specified in the CLASSPATH variable in the .bash_profile.
For example, if in .bash_profile, you have:
CLASSPATH=~/javaprogs
Then create a directory called "javaprogs" in your home directory,
and put all your .java files in it. See the Profile Setup section.
NOTE: Running java jobs directly on the cluster is NOT recommended, particularly when
the jobs are very large. For actual job-submission, use the SGE job-manager. See the
SGE section.
RMI
---
Making an RMI program usually requires at a minimum:
1) An interface file
2) An implementation file
3) A client and/or server driver file
For example, if you create an interface file called "Test1.java", then
you will need a file called "Test1Impl.java" with your server-side
implementation code in it.
To compile java programs for RMI:
[user@bigred]$ javac <source-file>
[user@bigred]$ rmic <impl-file>
RMI support has been included in the cluster, but to allow the RMI system
to work, you will need to create a file called "policy" in your classpath
directory, and put the following in it:
---------------policy file-----------------
grant
{
// allows permission for all
permission java.security.AllPermission;
};
-------------------------------------------
Running RMI programs:
After compiling the programs, you need to start the RMI registry service
to run RMI programs, like so:
[user@bigred]$ rmiregistry &
[1] <processid> (this should be output after running the above command)
Remember to kill the RMI registry program after finishing:
[user@bigred]$ kill -9 <processid>
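In a script, the registry's process ID can be captured with $! rather than read off the terminal, which makes it harder to forget the cleanup step. The sketch below assumes rmiregistry is on your PATH (see the Profile Setup section). The function names and the REGISTRY_CMD override are hypothetical and exist only so the pattern can be tried out without Java. It sends a plain kill first, which is usually enough; kill -9 remains available if the registry does not exit.

```shell
# Sketch: start the RMI registry in the background, remember its PID,
# and stop it again when done. REGISTRY_CMD is a hypothetical override
# (defaults to rmiregistry).
start_registry() {
    ${REGISTRY_CMD:-rmiregistry} &
    REGISTRY_PID=$!                     # PID of the background process just started
}

stop_registry() {
    kill "$REGISTRY_PID" 2>/dev/null    # plain SIGTERM first
    wait "$REGISTRY_PID" 2>/dev/null    # reap the process so no zombie is left
    return 0
}
```

Call start_registry before launching your RMI server and client, and stop_registry once they have finished.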
For more info on programming with Java and RMI, see:
http://java.sun.com/j2se/1.5.0/docs/guide/rmi/index.html
NOTE: Running RMI jobs directly on the cluster is NOT recommended, particularly when
the jobs are very large. For actual job-submission, use the SGE job-manager. See the
SGE section.
Ganglia
-------
You can check on the status of the cluster and of running jobs/processes via the Ganglia
system. To do this, go to:
https://bigred.cacs.louisiana.edu
Then click on the "Cluster Status" link. To see the status of SGE jobs, click on
the "Job Queue" link on the cluster status page to see the list of currently
submitted SGE jobs. Details of each job can then be viewed by selecting the
relevant job ID. For each job, the nodes that have been assigned are listed. This
feature is useful for performing post-MPI IPC cleanup (see the MPI section).
Note that new SGE jobs may take a short while to show up in Ganglia. To monitor SGE jobs,
it may be preferable to log into the cluster and use qstat. See the SGE section.
