slurm [2014/09/24 11:51] volker created
slurm [2015/03/24 11:47] (current) volker [Basics]
:!: I copied this over from my old wiki, and it needs some updating. In particular, node sharing on the zBox partition is now deprecated (I think). --- //[[volker@physik.uzh.ch|Volker Hoffmann]] 2014/09/24 11:55// :!:
  
====== SLURM Scheduler ======
  
  * [[http://slurm.schedmd.com/rosetta.html|Rosetta Stone of Schedulers]]
  * SLURM man pages: [[https://computing.llnl.gov/linux/slurm/man_index.html]]
  * Especially the sbatch man page: [[https://computing.llnl.gov/linux/slurm/sbatch.html]]

===== Basics =====

  * Submit batch jobs

<code>
$ sbatch script.job
</code>

  * Cancel jobs

<code>
$ scancel jobid
</code>

  * View the queue

<code>
$ squeue
</code>

See below for example job scripts.
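When scripting around these commands, it is handy to capture the job ID at submission time. A minimal sketch, assuming sbatch's parsable output mode (which prints just the job ID, or "jobid;cluster" on multi-cluster setups); submit_job is a hypothetical name:

<code bash>
#!/bin/bash
# Hypothetical helper around sbatch (the name is illustrative only).

# Submit a job script and print only its numeric job ID.
# --parsable makes sbatch print "jobid" or "jobid;cluster".
submit_job() {
    local out
    out=$(sbatch --parsable "$1") || return 1
    echo "${out%%;*}"   # drop an optional ";cluster" suffix
}

# Usage:
#   jobid=$(submit_job script.job)
#   squeue --jobs "$jobid"    # check its state
#   scancel "$jobid"          # cancel it
</code>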

===== Random Tips & Tricks =====

  * Attach to a running job step: [[https://computing.llnl.gov/linux/slurm/sattach.html]]

<code>
$ sattach jobid.jobstep
</code>
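If the job starts its program with srun, as the GPU example script further down this page does, that program usually runs as step 0 of the job, so jobid.0 is normally the step to attach to. A small sketch under that assumption (attach_job is a hypothetical name):

<code bash>
#!/bin/bash
# Hypothetical helper: attach to step STEP (default 0) of job JOBID.
# The first srun inside a batch script normally runs as step 0.
attach_job() {
    local jobid=$1 step=${2:-0}
    case $jobid in
        ''|*[!0-9]*)
            echo "usage: attach_job JOBID [STEP]" >&2
            return 2 ;;
    esac
    sattach "${jobid}.${step}"
}
</code>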

  * To hold a job, postpone its start time: [[https://computing.llnl.gov/linux/slurm/faq.html#hold]]

<code>
$ scontrol update JobId=1234 StartTime=now+30days
... later ...
$ scontrol update JobId=1234 StartTime=now
</code>
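The hold trick above can be wrapped in two small functions. postpone_job and resume_job are hypothetical names, and 30 days is simply the default taken from the example. (SLURM also offers scontrol hold and scontrol release for the same purpose.)

<code bash>
#!/bin/bash
# Hypothetical wrappers for the StartTime hold trick.

# Push the job's earliest start time DAYS (default 30) into the future.
postpone_job() {
    local jobid=$1 days=${2:-30}
    scontrol update JobId="$jobid" StartTime="now+${days}days"
}

# Let the job become eligible to start right away.
resume_job() {
    scontrol update JobId="$1" StartTime=now
}
</code>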

  * To make the squeue output look like it does at CSCS, add the following to your .bashrc:

<file>
alias squeue="squeue --format='%.12i %.8u %.9P %.32j %.12B %.2t %.12r %.14M %.14L %.6D %.10Q'"
</file>
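For reference, the field codes in that format string are (per the squeue man page): %i job ID, %u user, %P partition, %j job name, %B executing host, %t compact state, %r reason, %M time used, %L time left, %D node count and %Q priority; a prefix such as .12 right-justifies the field in a column at least 12 characters wide. A shell function is an alternative to the alias; the name myq and the restriction to your own jobs are assumptions added here:

<code bash>
#!/bin/bash
# CSCS-like squeue layout as a function (myq is a hypothetical name).
SQUEUE_CSCS_FORMAT='%.12i %.8u %.9P %.32j %.12B %.2t %.12r %.14M %.14L %.6D %.10Q'

# Show only your own jobs; extra arguments are passed through to squeue.
myq() {
    squeue --user "$USER" --format="$SQUEUE_CSCS_FORMAT" "$@"
}
</code>

Unlike the alias, this leaves plain squeue untouched.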

===== Launch Interactive GPU Jobs (Compiling, Testing) =====

  * Allocate a GPU slot:

<code>
salloc --ntasks 1 --gres gpu:1 --partition tasna --account gpu
</code>

  * Once the allocation is granted, launch a bash shell in it:

<code>
srun --pty bash
</code>

  * :!: Always do this from the front-end nodes. Slurm inherits your environment, so the CUDA tools (nvcc, etc.) won't be available if you launch the job from another computer.
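The two steps can also be combined, because srun creates its own allocation when run outside one. The sketch below (gpu_shell is a hypothetical name) first checks that nvcc is visible, following the warning above; partition and account are copied from the salloc line:

<code bash>
#!/bin/bash
# Hypothetical helper: refuse to request a GPU shell unless nvcc is visible,
# since the job inherits the environment it was launched from.
gpu_shell() {
    if ! command -v nvcc >/dev/null 2>&1; then
        echo "nvcc not found on PATH; run this from a front-end node" >&2
        return 1
    fi
    # Single-step alternative to salloc + srun: allocate and start a shell.
    srun --ntasks 1 --gres gpu:1 --partition tasna --account gpu --pty bash
}
</code>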
  
===== Example Script for GPU Jobs =====

Lines starting with #XSBATCH (like the exclude line in the script below) are ordinary comments and are not parsed by SLURM, so this is an easy way to switch a directive off.
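According to the sbatch man page, #SBATCH lines are only read up to the first line that is neither blank nor a comment, and any other comment (such as #XSBATCH) is ignored. To see which directives a script actually hands to sbatch, a small awk filter like this one can help; active_directives is a hypothetical name:

<code bash>
#!/bin/bash
# Hypothetical helper: print the #SBATCH directives sbatch would parse from
# a job script. Parsing stops at the first non-blank, non-comment line;
# other comment lines (e.g. #XSBATCH) are skipped.
active_directives() {
    awk '/^#SBATCH/                  { print; next }
         /^[ \t]*$/ || /^[ \t]*#/    { next }
                                     { exit }' "$1"
}

# Usage: active_directives gpu.job
</code>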
  
<file>
#!/bin/bash
#SBATCH --output /home/ics/volker/Genga/Jobs/HitnRun/Reufer2012/Logs/cC03m_conex-%j.out
#SBATCH --job-name HitnRun/R12/cC03m/ConeX
#SBATCH --partition vesta
#SBATCH --account gpu
#SBATCH --ntasks 1
#SBATCH --gres gpu:1
#SBATCH --time 28-00:00:00
#XSBATCH --exclude=tasna5
#SBATCH --mail-user you@yourdomain.com
#SBATCH --mail-type END
#SBATCH --no-requeue

# Paths to the genga executable and the run's output directory
home=/home/ics/volker
data=/zbox/data/volker

genga=$home/Source/genga-dev-hitnrun/source/genga_hitnrun_coll24days_sm37
outdir=$data/HitnRun/Reufer2012/cC03m_conex

echo ""
echo "***** LAUNCHING *****"
date '+%F %H:%M:%S'
echo ""

# Record what runs where, and on which GPU
echo "genga=$genga"
echo "outdir=$outdir"
echo "hostname=$(hostname)"
echo "cuda_visible_devices=$CUDA_VISIBLE_DEVICES"

echo ""
echo "***"
echo ""

# Run inside the output directory; abort if it does not exist
cd "$outdir" || exit 1
export DATE=$(date +%F_%H%M)
srun "$genga" > "Run_$DATE.log"

echo ""
...
</file>
===== Example Script for MPI Jobs =====

The file **/home/itp/volker/Slurm/blacklist** contains a line-by-line list of nodes we wish to avoid.
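To pass such a blacklist to sbatch explicitly, it can be turned into the comma-separated node list that sbatch's exclude option takes. A minimal sketch; exclude_list and the script name mpi.job are hypothetical, and blank lines or lines starting with # in the file are skipped:

<code bash>
#!/bin/bash
# Hypothetical helper: join the node names in a blacklist file (one per
# line) into a comma-separated list for sbatch --exclude.
exclude_list() {
    grep -v -e '^[[:space:]]*$' -e '^#' "$1" | paste -sd, -
}

# Usage (options given on the command line override #SBATCH lines):
#   sbatch --exclude="$(exclude_list /home/itp/volker/Slurm/blacklist)" mpi.job
</code>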
<file>
#!/bin/bash
slurm.1411552306.txt.gz · Last modified: 2014/09/24 11:51 by volker