Cluster Queue Software: Sun Grid Engine (SGE)

SGE runs the 96-node computer cluster. Each node is a single-processor 866 MHz Linux workstation with 1 GB of memory. The Sun Grid Engine dispatches the jobs submitted by users to the appropriate execution hosts in the cluster and takes care of scheduling.

Documentation (in PDF format)

Because usage differs between the groups, the majority of the nodes are assigned to the Bioinformatics Group, and a smaller number is left to the Xray Group.

If you have any questions, send mail to holton at mbi.

Hints

  1. In all cases, write the shell scripts for your jobs with minimal I/O in mind. Keep track of file handles when reading from and writing to files in your home directory: too many stale or open file handles can corrupt data.

  2. Issuing a large number of INSERT commands to MySQL from your jobs can be unreliable. If you face several thousand or more INSERT commands, consider writing them to a file and, when the job is finished, feeding that file to the MySQL interpreter yourself.

  3. Consider copying all of your input data to local disk on the node your job runs on. If, as one of the first commands in your script, you copy your data with commands like:

    > mkdir -p /usr/temp/you
    > cp /data1/users/you/your/data/*.inp /usr/temp/you/

    you GREATLY reduce the NFS load for your job. This will
    • increase your performance
    • increase your accuracy
    • increase your reliability
    • lessen the network load for other users

    Once the job is finished, PLEASE remove your temporary data:

    > rm -fr /usr/temp/you

    Regardless, this data will be removed if you don't do it yourself.
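The three hints above can be combined in one job step. The following is a minimal sketch, not a site-provided tool: the function name run_hinted_job, the *.inp layout, and the MySQL table results(input, value) are all hypothetical, and it assumes your program prints one result per input file containing no single quotes.

```sh
#!/bin/sh
# Sketch of a job step following hints 1-3. run_hinted_job, the *.inp
# layout and the "results" table are hypothetical names.
#
# run_hinted_job SRC SCRATCH OUT CMD [ARGS...]
#   SRC      NFS directory holding the *.inp input files
#   SCRATCH  node-local work directory, e.g. /usr/temp/you
#   OUT      absolute path of the directory that receives the results
#   CMD      program run once per input file, with the file name appended
run_hinted_job() {
    src=$1; scratch=$2; out=$3; shift 3
    mkdir -p "$scratch" "$out" || return 1

    # Hint 3: one bulk copy over NFS instead of many small reads during the run.
    if ! cp "$src"/*.inp "$scratch"/; then
        rm -rf "$scratch"
        return 1
    fi

    # Hint 1: the subshell's output is redirected once, so inserts.sql is
    # opened a single time rather than once per input file.
    # Hint 2: INSERT statements go into that file instead of being sent to
    # MySQL from the job (assumes results contain no single quotes).
    (
        cd "$scratch" || exit 1
        for f in *.inp; do
            result=$("$@" "$f") || exit 1
            printf "INSERT INTO results (input, value) VALUES ('%s', '%s');\n" \
                "$f" "$result"
        done
    ) > "$out/inserts.sql"
    status=$?

    # Remove the node-local copy whether or not the run succeeded.
    rm -rf "$scratch"
    return $status
}
```

In a job script you might call it as `run_hinted_job /data1/users/you/your/data /usr/temp/you /your/home/directory/output yourjob.torun` (placeholder paths), and after the job has finished load the statements yourself with something like `mysql yourdatabase < /your/home/directory/output/inserts.sql`, where `yourdatabase` is a placeholder. NFS traffic is then limited to one copy in and one file written out.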

This is how you use the system:

1. Log into joule. Jobs can only be submitted from joule (as of now).

2. Source the environment for the cluster software:
> source /gridware/codine/default/common/settings.csh
   (A helpful addition to your .login, .cshrc, or .tcshrc that sets
   everything up is given at the end of this page.)

3. Jobs run by submitting a shell script to the queue. The -cwd
   option runs the job from the directory you submit from, and the
   files that capture the job's standard output and error output are
   written there. Without -cwd, these files are written to your home
   directory. Use absolute paths in your scripts anyway, so that your
   program's own output files go where you intend.

> qsub -cwd yourscript.sh

4. If you do not want these output files produced at all (you
   probably don't, since they are empty when everything runs
   properly), point them at /dev/null with the -o (standard output)
   and -e (error output) options:

> qsub -cwd -o /dev/null -e /dev/null yourscript.sh

5. You can see the status of all jobs in the queue with qstat:
> qstat
> qstat -f      : list all queues you can use (-F for an exhaustive listing)
> qstat -help   : show all options


6. You can delete a job with qdel, giving the job ID shown by qstat:
> qdel jobid
> qdel -help
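Steps 3 to 6 can be chained in a script if you capture the job ID at submission time. This is a minimal sketch: the helper name job_id_from_qsub is hypothetical, and it assumes qsub confirms a submission with a line of the form `Your job 12345 ("yourscript.sh") has been submitted`, with the ID as the third word. Check the exact message on joule before relying on it.

```sh
#!/bin/sh
# job_id_from_qsub MESSAGE  (hypothetical helper)
# Prints the job ID from qsub's confirmation line, assumed to look like:
#   Your job 12345 ("yourscript.sh") has been submitted
job_id_from_qsub() {
    printf '%s\n' "$1" | awk '{ print $3 }'
}

# Typical use on joule (not executed here):
#   msg=$(qsub -cwd -o /dev/null -e /dev/null yourscript.sh)
#   id=$(job_id_from_qsub "$msg")
#   qstat | grep "^ *$id "    # still listed? then it is queued or running
#   qdel "$id"                # remove the job if something went wrong
```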



It is important to use absolute paths within your script to make sure
the output goes where you want it to go.
No one can log into the cluster nodes themselves; all interaction is
done through the queueing software.

Example script:


#!/bin/sh

cd /your/home/directory/path/ || exit 1
yourjob.torun [args] > /your/home/directory/output/path
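Instead of typing the options on every qsub command line, SGE also reads options from lines beginning with `#$` near the top of the job script. A sketch of the options from steps 3 and 4 as embedded directives, meant to go directly after the `#!/bin/sh` line of the example script above; `-S /bin/sh` is an addition here that asks SGE to start the script with /bin/sh rather than the queue's default shell. If your installation uses a different directive prefix, check the qsub documentation on joule.

```sh
#$ -cwd
#$ -o /dev/null
#$ -e /dev/null
#$ -S /bin/sh
# qsub reads the "#$" lines above as options; /bin/sh treats them as comments.
```

With these lines in place, `qsub yourscript.sh` behaves like `qsub -cwd -o /dev/null -e /dev/null yourscript.sh`.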


The new cluster runs Linux, so only Linux executables will run on the new
cluster.

You can also set up the environment, and print a short reminder of the
queue commands, by putting the following into your .login, .cshrc, or .tcshrc
(whichever one you use to set variables):


if ( $HOST == joule.mbi.ucla.edu ) then
    source /gridware/codine/default/common/settings.csh
    echo "Queue stuff"
    echo " submit: qsub -cwd script.sh"
    echo " status: qstat"
    echo " "
endif


The above has been contributed by Carsten.