Logins, file locations
Your login information, file locations, etc. remain unchanged. The data stored in your home directory also remain unchanged, so there is no need to copy anything to the new system (during the transition period both systems have concurrent access to the same data). Most common programs (such as python, blast, pymol, …) are now part of the OS and are simply available; there is no need to source anything or use specific paths. To obtain login access to the cluster, contact Alex Lisker (lisker@mbi.ucla.edu).
From inside the MBI network (ewald, roentgen, sayre, escher)
Connecting to newhorizons
shell> ssh user@newhorizons
Moving files to newhorizons
shell> rsync -av myLocalFileOrDir user@newhorizons:/home/user/newLocation
New job scheduler (queuing engine)
We are migrating from the old SGE job scheduler to the modern SLURM scheduler. SLURM has been around for 10 years and was developed by HP, then LLNL, and now by SchedMD. It is used on large clusters because it is faster, more powerful, and easier to use. However, it works differently, so you will have to get used to it and adjust your scripts; see below. Documentation is available at: http://slurm.schedmd.com/documentation.html
Diskless nodes
We have a new SSD-based file server with 20 Gbps connectivity. This is where the cluster OS, software and databases live. There is no longer a /local directory with local copies of databases. Further, /scratch is now an in-memory location on most nodes and is cleaned up daily. Your jobs can still write scratch files there, but the space is limited to 1 GB on small (old) nodes and 16 GB on large (new) nodes. Some nodes are equipped with local storage (SSDs or hard drives) available in /scratch. You can request such a node by specifying the --tmp= parameter.
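As a rough sketch of how a job might request and use disk-backed scratch space: the 20G size, the job.XXXXXX directory template and the make_scratch_dir helper below are illustrative assumptions, not part of the cluster setup. The --tmp option itself is a standard sbatch option that requests a minimum amount of temporary disk space per node.

```shell
#!/bin/bash
#SBATCH --tmp=20G   # example size: ask for a node with at least 20 GB of temporary disk space

# Hypothetical helper: create a private working directory under a scratch
# root (default: /scratch) and print its path.
make_scratch_dir() {
    mktemp -d "${1:-/scratch}/job.XXXXXX"
}

# Only touch /scratch when actually running inside a SLURM job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    workdir="$(make_scratch_dir)"
    trap 'rm -rf "$workdir"' EXIT   # remove the directory when the job script exits
    echo "Using scratch directory: $workdir"
fi
```

Removing the directory at the end of the job is a courtesy to other users, since in-memory /scratch space is small and shared.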
Queues
The queue setup and limits remain unchanged. See clusterstatus for a current status.
Software packages
This is a brand new, home-brewed installation. If you are missing a program or package, try to find it with apt-cache search keyword and let me know. I’ll install it right away.
Migration guide
Here’s a “Rosetta Stone” table of the common commands:
| SGE | SLURM | Comments |
| --- | --- | --- |
| qsub job.sh | sbatch job.sh | Submit the script job.sh for batch processing |
| qdel 123 | scancel 123 | Stop/cancel the job 123 |
| qstat | squeue | Show jobs in the queue; see also snodes -v |
| qlogin | qlogin | Start an interactive session on a node; this is a wrapper for salloc |
| | srun command | Run command on a compute node, output goes to console |
| qhost | sinfo | Show node status; see also snodes -v |
| qhold 123 | scontrol hold 123 | Hold job 123 |
I’ve also made some custom commands that should be helpful:
snodes
Show compute node status. Options:
-v: Show verbose output, including jobs running on each node
-a: Show status for all nodes (not only those that are online)
-n: No color output
clusterstatus
Quick summary of the cluster node status, overall and per queue.
sjobs
A simple wrapper to sacct. Shows status of completed jobs.
Script Variables
The script variables have changed. You may be using those in your scripts, so adjust them accordingly:
| SGE | SLURM | Comments |
| --- | --- | --- |
| $JOB_ID | $SLURM_JOB_ID | The (numeric) ID of the current job |
| $SGE_TASK_ID | $SLURM_ARRAY_TASK_ID | The numeric identifier of the current task (1…n) |
| $PE_HOSTFILE | $SLURM_JOB_NODELIST | List of nodes assigned to your job (parallel environments) |
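During the transition it may be convenient for one script to run under either scheduler. The following is a minimal sketch of that idea, preferring the SLURM variables and falling back to the SGE ones; the describe_job function name and the "unknown" and "1" defaults are assumptions made for illustration.

```shell
#!/bin/bash
# Print the job and task IDs from whichever scheduler started this script.
# SLURM variables take precedence; the SGE variables are the fallback.
describe_job() {
    local job_id="${SLURM_JOB_ID:-${JOB_ID:-unknown}}"
    local task_id="${SLURM_ARRAY_TASK_ID:-${SGE_TASK_ID:-1}}"
    echo "job ${job_id}, task ${task_id}"
}

describe_job
```

Note that the node list variables are not drop-in replacements for each other: $PE_HOSTFILE points to a file, while $SLURM_JOB_NODELIST holds a compact host expression. Running scontrol show hostnames "$SLURM_JOB_NODELIST" expands it to one hostname per line.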
Job submission parameters
The parameters to qsub and qlogin differ from those to sbatch. See here for available command line options: http://slurm.schedmd.com/sbatch.html. See the example below for commonly used flags.
Job submission scripts
If you use custom job parameters (via #$) in your job submission scripts, you will need to adjust those. SLURM looks for #SBATCH lines instead, and the flags have changed too (see job submission parameters above). Your job submission script can contain the SGE-style #$ and #SBATCH specifications concurrently. All are optional.
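A minimal sketch of a script carrying both sets of specifications might look as follows. The job name, log file name, time limit and the run_job function are illustrative assumptions; the flags shown are standard qsub and sbatch options, but the selection is only an example.

```shell
#!/bin/bash
# SGE specifications (plain comments as far as SLURM is concerned)
#$ -N demo
#$ -cwd
#$ -o demo.log
# SLURM specifications (plain comments as far as SGE is concerned)
#SBATCH --job-name=demo
#SBATCH --output=demo.log
#SBATCH --time=00:10:00

# Placeholder for the actual work of the job.
run_job() {
    echo "demo job running in $(pwd)"
}

run_job
```

Keep all specifications at the top of the script: sbatch stops reading #SBATCH lines once it reaches the first command. There is no SLURM counterpart to -cwd here because sbatch starts the job in the directory it was submitted from by default.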