This document describes the newhorizons cluster (inaugurated October 2015), its queuing commands, and related help information.

Logins, file locations 
Your login information, file locations, etc. remain unchanged. The data stored in your home directory also remains unchanged, so there is no need to copy anything to the new system (during the transition period both systems have access to the same data concurrently). Most common programs (such as python, blast, pymol, …) are now part of the OS and simply available; there is no need to source anything or use specific paths. To obtain login access to the cluster, contact Alex Lisker (lisker@mbi.ucla.edu).

From inside the MBI network (ewald, roentgen, sayre, escher)

Connecting to newhorizons
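
For example, using the same user@newhorizons address as in the file-transfer example below (replace user with your own login):

shell> ssh user@newhorizons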

Moving files to newhorizons

shell> rsync -av myLocalFileOrDir user@newhorizons:/home/user/newLocation

New job scheduler (queuing engine)
We are migrating from the old SGE job scheduler to the modern SLURM scheduler. SLURM has been around for about ten years; it was developed by HP, then LLNL, and is now maintained by SchedMD. It is used on large clusters because it is faster, more powerful, and easier to use. However, it is different, so you will have to get used to it and adjust your scripts; see below. Documentation is available at: http://slurm.schedmd.com/documentation.html

Diskless nodes
We have a new SSD-based file server with 20 Gbps connectivity. This is where the cluster OS, software, and databases live. There is no longer a /local directory with local copies of databases. Further, /scratch is now an in-memory location on most nodes that is cleaned up daily. Your jobs can still write scratch files there, but the space is limited to 1 GB on small (old) nodes and 16 GB on large (new) nodes. Some nodes are equipped with local storage (SSDs or hard drives) available in /scratch. You can request such a node by specifying the --tmp= parameter.
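
For example, to ask for a node with at least 10 GB of local scratch space (the 10G value is illustrative; sbatch accepts --tmp=<size> with K/M/G/T suffixes):

shell> sbatch --tmp=10G job.sh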

Queues
The queue setup and limits remain unchanged. See clusterstatus (described below) for the current status.

Software packages
This is a brand new, home-brewed installation. If you are missing a program or package, try to find it with apt-cache search <keyword> and let me know; I'll install it right away.
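
For example, to look for an HMMER package (the keyword is just an illustration):

shell> apt-cache search hmmer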

Migration guide

Here’s a “Rosetta Stone” table of the common commands:

SGE                SLURM                 Comments
qsub job.sh        sbatch job.sh         Submit the script job.sh for batch processing
qdel 123           scancel 123           Stop/cancel job 123
qstat              squeue                Show jobs in the queue; see also snodes -v
qlogin             qlogin                Start an interactive session on a node; this is a wrapper for salloc
(none)             srun command          Run command on a compute node; output goes to the console
qhost              sinfo                 Show node status; see also snodes -v
qhold 123          scontrol hold 123     Hold job 123
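
A typical SLURM session using these commands might look like this (the job ID 123 is illustrative):

shell> sbatch job.sh       # submit job.sh for batch processing
shell> squeue              # check where it sits in the queue
shell> scancel 123         # cancel it if something went wrong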

I’ve also made some custom commands that should be helpful:

snodes
Show compute node status. Options:
-v: Show verbose output, including jobs running on each node
-a: Show status for all nodes (not only those that are online)
-n: No color output

clusterstatus
Quick summary of the cluster node status, overall and per queue.

sjobs
A simple wrapper to sacct. Shows status of completed jobs.

Script Variables
The script variables have changed. You may be using these in your scripts, so adjust them accordingly:

SGE               SLURM                     Comments
$JOB_ID           $SLURM_JOB_ID             The (numeric) ID of the current job
$SGE_TASK_ID      $SLURM_ARRAY_TASK_ID      The numeric identifier of the current task (1…n)
$PE_HOSTFILE      $SLURM_JOB_NODELIST       List of nodes assigned to your job (parallel environments)
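
For instance, a job script could log these values (a minimal sketch; the echo lines are purely illustrative):

echo "Job ID:     $SLURM_JOB_ID"
echo "Array task: $SLURM_ARRAY_TASK_ID"
echo "Node list:  $SLURM_JOB_NODELIST"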

Job submission parameters
The parameters to qsub and qlogin differ from those to sbatch. See http://slurm.schedmd.com/sbatch.html for the available command-line options, and see the example below for commonly used flags.
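
As a sketch of commonly used flags on the command line (all values are illustrative, not site defaults):

shell> sbatch --ntasks=1 --cpus-per-task=4 --mem=4G --time=12:00:00 job.sh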

Job submission scripts
If you use custom job parameters (via #$) in your job submission scripts, you will need to adjust those. SLURM looks for #SBATCH lines instead, and the flags have changed too (see job submission parameters above). Your job submission script can contain the SGE-style #$ and the #SBATCH specifications concurrently. All are optional.
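
A minimal job script might look like this (a sketch only; the flag values and the my_program name are placeholders for your own settings):

#!/bin/bash
#SBATCH --job-name=myjob         # name shown by squeue
#SBATCH --output=myjob.%j.log    # %j is replaced by the job ID
#SBATCH --time=2:00:00           # wall-clock time limit
#SBATCH --mem=4G                 # memory request

echo "Running job $SLURM_JOB_ID on $SLURM_JOB_NODELIST"
srun ./my_program                # my_program stands in for your own executable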