:!: I copied this over from my old wiki and it needs some updating. In particular, node sharing on the zBox partition is now deprecated (I think). -- Volker Hoffmann 2014/09/24 11:55 :!:

SLURM Scheduler

Basics

  • Submit batch jobs
$ sbatch script.job
  • Cancel jobs
$ scancel jobid
  • View the queue
$ squeue

See below for example job scripts.
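
When chaining the commands above in a script, it helps to capture the job ID that sbatch prints ("Submitted batch job <id>"). A minimal sketch; jobid_from_sbatch is a made-up helper name, not part of Slurm:

```shell
# Extract the numeric job ID from sbatch's "Submitted batch job <id>" line.
# jobid_from_sbatch is a hypothetical helper, not a Slurm command.
jobid_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Usage (needs a working Slurm installation):
#   jobid=$(sbatch script.job | jobid_from_sbatch)
#   squeue --jobs "$jobid"
#   scancel "$jobid"
```

Reading from stdin keeps the helper independent of sbatch itself, so it can be tried out with a plain echo.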

Random Tips & Tricks

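  • Attach to the standard I/O of a running job step
$ sattach jobid.jobstep
  • Defer a pending job by pushing its start time into the future, then release it later
$ scontrol update JobId=1234 StartTime=now+30days
... later ...
$ scontrol update JobId=1234 StartTime=now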
  • If you want squeue output to look like it does at CSCS, add the following to your .bashrc
alias squeue="squeue --format='%.12i %.8u %.9P %.32j %.12B %.2t %.12r %.14M %.14L %.6D %.10Q'"
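
The tips above could be collected into a .bashrc fragment along these lines. This is only a sketch: it sets the SQUEUE_FORMAT environment variable (documented in the squeue man page) instead of an alias, and defer_job / release_job are made-up wrapper names around the scontrol calls shown above.

```shell
# Fragment for ~/.bashrc.

# Same column layout as the alias, via squeue's SQUEUE_FORMAT variable:
# %i job ID, %u user, %P partition, %j job name, %B batch host, %t state,
# %r reason, %M time used, %L time left, %D node count, %Q priority.
export SQUEUE_FORMAT='%.12i %.8u %.9P %.32j %.12B %.2t %.12r %.14M %.14L %.6D %.10Q'

# Hypothetical helpers (the names are ours, not Slurm's).
# Set SCONTROL=echo for a dry run that only prints the arguments.
SCONTROL=${SCONTROL:-scontrol}

# defer_job JOBID [DAYS]: push the start time DAYS (default 30) into the future.
defer_job() {
    "$SCONTROL" update JobId="$1" StartTime="now+${2:-30}days"
}

# release_job JOBID: make the job eligible to start again.
release_job() {
    "$SCONTROL" update JobId="$1" StartTime=now
}
```

With SCONTROL=echo, defer_job 1234 7 prints the arguments it would pass to scontrol instead of touching the queue.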

Launch Interactive GPU Jobs (Compiling, Testing)

  • Allocate a GPU slot
salloc --ntasks 1 --gres gpu:1 --partition tasna --account gpu
  • Once allocated, launch bash shell
srun --pty bash
  • :!: Always do this from the front-end nodes. Slurm inherits your environment, so the CUDA tools (nvcc, etc.) won't be available if you launch the job from another computer.
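
A quick guard against launching from the wrong machine might look like the sketch below. require_tool is a made-up helper name; it only checks that a command is on your PATH before you ask for an allocation.

```shell
# Fail early if a required tool is missing from the current environment.
# require_tool is a hypothetical helper, not a Slurm or CUDA command.
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "error: '$1' not found in PATH; are you on a front-end node?" >&2
        return 1
    fi
}

# Usage:
#   require_tool nvcc && salloc --ntasks 1 --gres gpu:1 --partition tasna --account gpu
```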

Example Script for GPU Jobs

  • Lines starting with #XSBATCH are plain comments and are not parsed by SLURM. Use this to temporarily disable a directive.
#!/bin/bash
#SBATCH --output /home/ics/volker/Genga/Jobs/HitnRun/Reufer2012/Logs/cC03m_conex-%j.out
#SBATCH --job-name HitnRun/R12/cC03m/ConeX
#SBATCH --partition vesta
#SBATCH --account gpu
#SBATCH --ntasks 1
#SBATCH --gres gpu:1
#SBATCH --time 28-00:00:00
#XSBATCH --exclude=tasna5
#SBATCH --mail-user you@yourdomain.com
#SBATCH --mail-type END
#SBATCH --no-requeue

home=/home/ics/volker
data=/zbox/data/volker

genga=$home/Source/genga-dev-hitnrun/source/genga_hitnrun_coll24days_sm37
outdir=$data/HitnRun/Reufer2012/cC03m_conex

echo ""
echo "***** LAUNCHING *****"
date '+%F %H:%M:%S'
echo ""

echo "genga=$genga"
echo "outdir=$outdir"
echo "hostname=$(hostname)"
echo "cuda_visible_devices=$CUDA_VISIBLE_DEVICES"

echo ""
echo "***"
echo ""

# Abort if the output directory is missing, so the log never lands elsewhere
cd "$outdir" || exit 1
export DATE="$(date +%F_%H%M)"
srun "$genga" > "Run_$DATE.log"

echo ""
echo "***** DONE *****"
date '+%F %H:%M:%S'
echo ""
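
To double-check which directives in a job script are live and which are parked behind #XSBATCH, a pair of grep wrappers is enough. A sketch; both function names are made up for illustration:

```shell
# List the directives sbatch will see (lines starting with #SBATCH).
active_directives() {
    grep '^#SBATCH' "$1"
}

# List directives that were disabled by renaming them to #XSBATCH.
disabled_directives() {
    grep '^#XSBATCH' "$1"
}

# Usage:
#   active_directives script.job
#   disabled_directives script.job
```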

Example Script for MPI Jobs

The file /home/itp/volker/Slurm/blacklist contains a line-by-line listing of nodes we wish to avoid.

#!/bin/bash
#SBATCH -o /home/itp/volker/Mydisk/Jobs/AdaC/1024/Logs/t03000_E4__R1-%j.out
#SBATCH -J AdaC/1024/t03000_E4__R1
#SBATCH -p zbox
#SBATCH --time 0-24:00:00
#SBATCH --ntasks=256 --exclusive
#SBATCH --exclude=/home/itp/volker/Slurm/blacklist
#SBATCH --mail-user=volker@physik.uzh.ch
# Without --mail-type, Slurm sends no mail (the default type is NONE)
#SBATCH --mail-type=END

home=/home/itp/volker
scratch=/zbox/project/volker

nml=$home/Mydisk/NML/AdaC/1024/t03000_E4__R1.nml
ramses=$home/Source/ramses-dev/trunk/ramses/bin/ppd3d
data=$scratch/Mydisk/AdaC/1024/t03000_E4__R1
cd "$data" || exit 1

echo $nml
echo $ramses
echo "***"
pwd

echo ""
echo "***** LAUNCHING *****"
date '+%F %H:%M:%S'
echo ""

export DATE="$(date +%F_%H%M)"
time srun "$ramses" "$nml" > "$data/Run_$DATE.log"

echo ""
echo "***** DONE *****"
date '+%F %H:%M:%S'
echo ""
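
If you would rather pass the blacklisted nodes as an explicit comma-separated list (for example on the command line), the line-by-line file can be flattened as sketched below. nodes_from_file is a made-up helper, and skipping '#' comment lines is an assumption about how you might annotate the file.

```shell
# Turn a one-node-per-line file into a comma-separated node list.
# Blank lines and lines starting with '#' are skipped (assumed convention).
nodes_from_file() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | paste -sd, -
}

# Usage:
#   sbatch --exclude="$(nodes_from_file /home/itp/volker/Slurm/blacklist)" script.job
```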

Example Script for Node-Sharing Single-Core Jobs

#!/bin/bash
#SBATCH -o /home/itp/volker/Mydisk/Jobs/Viz4/AdaC/1024/Logs/t03000_E4__R1-%j.out
#SBATCH -J Viz4/AdaC/1024/t03000_E4__R1
#SBATCH -p zbox
#SBATCH --ntasks=1
#SBATCH --time=0-06:00:00
#SBATCH --exclude=/home/itp/volker/Slurm/blacklist
#SBATCH --mail-user=volker@physik.uzh.ch
# Without --mail-type, Slurm sends no mail (the default type is NONE)
#SBATCH --mail-type=END

# Load Python Environment
export WORKON_HOME=$HOME/.virtualenvs
export PROJECT_HOME=$HOME/Source
source $HOME/.local/bin/virtualenvwrapper.sh
workon scipy

imin=1
imax=47
opts="--together"
#fps=15

home=/home/itp/volker
scratch=/zbox/project/volker

script1=$home/Source/Viz4/reduce.py
script2=$home/Source/Viz4/plot_quad_xy.py
script3=$home/Source/Viz4/plot_quad_rz.py
script4=$home/Source/Viz4/plot_quad_r.py

data=$scratch/Mydisk/AdaC/1024/t03000_E4__R1

echo $data
echo $script1 $imin $imax --lofi
echo $script2 $imin $imax
echo $script3 $imin $imax 
echo $script4 $imin $imax 

echo ""
echo "***** LAUNCHING *****"
date '+%F %H:%M:%S'
echo ""

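# Abort if the data directory is missing instead of running in the wrong place
cd "$data" || exit 1
time python "$script1" "$imin" "$imax" --lofi
time python "$script2" "$imin" "$imax"
time python "$script3" "$imin" "$imax"
time python "$script4" "$imin" "$imax"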

# mencoder "mf://quad_r_*.png" -mf w=1600:h=1200:fps=${fps}:type=png -ovc lavc -lavcopts vcodec=mpeg4:mbd=2:trell -oac copy -o quad_r.avi
# mencoder "mf://quad_rz_*.png" -mf w=1600:h=1200:fps=${fps}:type=png -ovc lavc -lavcopts vcodec=mpeg4:mbd=2:trell -oac copy -o quad_rz.avi
# mencoder "mf://quad_xy_*.png" -mf w=1600:h=1200:fps=${fps}:type=png -ovc lavc -lavcopts vcodec=mpeg4:mbd=2:trell -oac copy -o quad_xy.avi

echo ""
echo "***** DONE *****"
date '+%F %H:%M:%S'
echo ""
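
All three scripts repeat the same LAUNCHING / DONE blocks. They could be factored into a small function, sketched here; banner is a made-up name and prints the same four lines as the original echo blocks.

```shell
# Print a timestamped marker such as "***** LAUNCHING *****" into the job log.
banner() {
    echo ""
    echo "***** $1 *****"
    date '+%F %H:%M:%S'
    echo ""
}

# Usage inside a job script:
#   banner LAUNCHING
#   srun ./my_program
#   banner DONE
```
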
slurm.txt · Last modified: 2015/03/24 11:47 by volker