This document describes some basic commands for use with the newhorizons cluster (2015), which uses the Slurm job scheduler.

The simplest demonstration of Slurm usage is a submission script as simple as:

#!/bin/bash
#SBATCH -o myout.out
#SBATCH -e myerrors.error
#SBATCH --mem=16G
#SBATCH -p medium
mkdir -p /scratch/myoutputdir
cd /home/[USER]/[Your Job Dir]
./wait.sh
# Where myoutputdir is the location of the job's output.
# myout.out above is the file that receives the job's STDOUT,
# and myerrors.error receives its STDERR.

Where wait.sh contains:

#!/bin/bash
echo "waiting on 20 seconds..."
sleep 20

And to submit this script (saved as simple.sh), you type:

shell> sbatch simple.sh

It will run in the medium queue with 16 GB of requested memory, write standard output to myout.out and standard error to myerrors.error, both of which will be placed in the directory from which the job was submitted.
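When the job is accepted, sbatch prints a line such as "Submitted batch job 123456". To check whether your job is queued or running (the job ID is illustrative):

shell> squeue -u $USER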

Sample job submission script

#!/bin/bash
#SBATCH -o output.txt
#SBATCH -e errors.txt
#SBATCH -J my_job
#SBATCH -n 4
#SBATCH --mem=2G
#SBATCH --time=1:00:00
#SBATCH -p short
#SBATCH --mail-type=END,FAIL,TIME_LIMIT_80
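By default, mail notifications go to the submitting user. To direct them to a specific address instead, add --mail-user (assuming outbound mail is configured on the cluster):

#SBATCH --mail-user=user@example.com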

Run script.sh in the medium queue with 1 CPU and 8 GB of RAM.

sbatch -p medium --mem=8G script.sh

Run script.sh in the medium queue with 4 CPUs on 1 node and 32 GB of RAM (4 x 8 GB).

sbatch -p medium -n 4 -N 1 --mem-per-cpu=8G script.sh

Run script.sh in the medium queue with a 2 hour time limit, on a node with a GPU, requesting 16 CPUs and 16 GB of RAM (16 x 1 GB).

sbatch -p medium --time=2:00:00 --constraint=gpu -n 16 -N 1 --mem-per-cpu=1G script.sh
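Note: --constraint=gpu selects nodes tagged with a "gpu" feature. On Slurm installations that define GPUs as generic resources, the request is made with --gres instead; whether newhorizons supports this form is site-dependent:

sbatch -p medium --gres=gpu:1 script.sh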

Run a parallel MPI job on 64 CPUs, executing script.sh.

sbatch -p medium -n 64 script.sh
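For an MPI job, script.sh itself must launch the MPI ranks. A minimal sketch, where my_mpi_program is a placeholder and the program is assumed to be built against the cluster's MPI library:

#!/bin/bash
# srun starts one MPI rank per allocated task (64 with -n 64 above)
srun ./my_mpi_program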

Run an array job, executing script.sh 1000 times in parallel, each task with a different $SLURM_ARRAY_TASK_ID parameter between 1 and 1000.

sbatch -a 1-1000 script.sh

The maximum array size is 50,000 tasks. You can limit how many tasks run concurrently by appending %limit to the range:

sbatch -a 1-10000%200 script.sh

This runs 10,000 tasks in total, but at most 200 at a time.
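A common pattern is to use $SLURM_ARRAY_TASK_ID inside the script to select a per-task input. A sketch, where the input_N.txt naming and the process program are hypothetical:

#!/bin/bash
# Each array task works on its own numbered input file
INPUT=input_${SLURM_ARRAY_TASK_ID}.txt
./process "$INPUT"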

A more complete example:

#!/bin/bash
#SBATCH -e myerror.out
#SBATCH --mem=16G
#SBATCH -p medium

# Important: with an array job, a single output file would be written to by
# all tasks as they finish, and the result may be interleaved gibberish.
# So instead of setting -o here, set it on the command line:
# sbatch -o slurm-%A_%a.out --array=1-3 -N1 script
# where %A = SLURM_ARRAY_JOB_ID and %a = SLURM_ARRAY_TASK_ID

# This passes the job ID, array job ID, and array task index to wait_array.sh
cd /home/holton/Slurm
./wait_array.sh $SLURM_JOBID $SLURM_ARRAY_JOB_ID $SLURM_ARRAY_TASK_ID 

# Or you can redirect the command's output yourself using the same variables:
# ./wait_array.sh $SLURM_JOBID $SLURM_ARRAY_JOB_ID $SLURM_ARRAY_TASK_ID > $SLURM_ARRAY_TASK_ID.file.out
# which has the same effect.

echo ""
echo "it is done. this is the last message that will appear after the output from the above job"

Run it like this:
sbatch -o slurm-%A_%a.out -a 1-4 -N1 array_simple.sh
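wait_array.sh itself is not shown above. A minimal sketch in the same spirit as wait.sh, simply echoing the three IDs it is passed (an illustration, not the original script):

#!/bin/bash
echo "job id: $1, array job id: $2, array task id: $3"
sleep 20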

Start an interactive job with 1 CPU and 8 GB of RAM.

qlogin -l mem=8G

Start an interactive job with 4 CPUs and 8 GB (4 x 2 GB) of RAM.

qlogin -l mem=2G -n 4
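qlogin here is a site-provided wrapper. On a plain Slurm installation, an equivalent interactive session can usually be started with srun, assuming interactive jobs are allowed in the chosen partition:

srun -p medium --mem=8G --pty bash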

Please remember: only ask for resources you will actually use. If your program is single-threaded, don't request more than one CPU. If you require all CPUs to be on one physical node, specify that with -N 1. Note that --mem-per-cpu requests memory per requested CPU, while --mem requests memory per node.
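To see what a completed job actually used, and so right-size future requests, sacct reports per-job accounting (replace the job ID with your own; assumes job accounting is enabled on the cluster):

sacct -j 123456 --format=JobID,Elapsed,MaxRSS,AllocCPUS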