slurm [2014/09/24 11:55] volker
slurm [2015/03/24 11:47] (current) volker [Basics]
====== SLURM Scheduler ======

  * [[http://slurm.schedmd.com/rosetta.html|Rosetta Stone of Schedulers]] (command equivalents across batch systems)
  * Man pages for all SLURM commands: [[https://computing.llnl.gov/linux/slurm/man_index.html]]
  * In particular, sbatch: [[https://computing.llnl.gov/linux/slurm/sbatch.html]]

===== Basics =====

  * Submit batch jobs

<code>
$ sbatch script.job
</code>

  * Cancel jobs

<code>
$ scancel jobid
</code>

  * View the queue

<code>
$ squeue
</code>

See below for example job scripts.
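sbatch confirms a submission by printing "Submitted batch job <jobid>", and that job ID is what scancel (and squeue -j) take. A minimal sketch, assuming a POSIX shell and exactly that confirmation line, which submits a script and keeps only the ID; submit_job is a hypothetical helper name, not a SLURM command:

```shell
#!/bin/sh
# Hypothetical helper: submit a job script and print only its job ID.
# Assumes sbatch prints "Submitted batch job <jobid>" on success.
submit_job() {
    local out
    out=$(sbatch "$@") || return 1          # propagate sbatch failures
    # The job ID is the 4th word of the confirmation line
    printf '%s\n' "$out" | awk '/^Submitted batch job/ { print $4 }'
}
```

For example, jobid=$(submit_job script.job), and later squeue -j "$jobid" or scancel "$jobid".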

===== Random Tips & Tricks =====

  * Attach to a running job step: [[https://computing.llnl.gov/linux/slurm/sattach.html]]

<code>
$ sattach jobid.jobstep
</code>

  * A job can be held by postponing its start time: [[https://computing.llnl.gov/linux/slurm/faq.html#hold]]

<code>
$ scontrol update JobId=1234 StartTime=now+30days
... later ...
$ scontrol update JobId=1234 StartTime=now
</code>
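The two scontrol calls above can be wrapped in small shell functions so the postponement is typed the same way every time. A sketch, assuming a POSIX shell; hold_job and release_job are hypothetical helper names, and the 30-day delay simply mirrors the example above:

```shell
#!/bin/sh
# Hypothetical helpers around the "postpone the start time" trick.

# Push a pending job's earliest start 30 days into the future.
hold_job() {
    scontrol update JobId="$1" StartTime=now+30days
}

# Make the job eligible to start again right away.
release_job() {
    scontrol update JobId="$1" StartTime=now
}
```

Usage: hold_job 1234, and later release_job 1234. SLURM also provides scontrol hold <jobid> and scontrol release <jobid>, which hold and release a pending job directly.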

  * If you want squeue output to look like it does at CSCS, add the following alias to your ~/.bashrc:

<file>
alias squeue="squeue --format='%.12i %.8u %.9P %.32j %.12B %.2t %.12r %.14M %.14L %.6D %.10Q'"
</file>
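In that format string each field has the form %[.][width]type: a leading "." right-justifies the column, the number is the minimum width, and the letter selects what squeue shows. A sketch that builds the identical string one annotated field at a time, assuming it is pasted into ~/.bashrc (SQ_FORMAT is an arbitrary variable name); bash expands aliases only in interactive shells, which is why this belongs in .bashrc rather than in job scripts:

```shell
# Build the squeue --format string field by field.
SQ_FORMAT='%.12i'               # job (or job step) ID
SQ_FORMAT="$SQ_FORMAT %.8u"     # user name
SQ_FORMAT="$SQ_FORMAT %.9P"     # partition
SQ_FORMAT="$SQ_FORMAT %.32j"    # job name
SQ_FORMAT="$SQ_FORMAT %.12B"    # executing (batch) host
SQ_FORMAT="$SQ_FORMAT %.2t"     # job state, compact form (PD, R, ...)
SQ_FORMAT="$SQ_FORMAT %.12r"    # reason the job is in its current state
SQ_FORMAT="$SQ_FORMAT %.14M"    # time used so far
SQ_FORMAT="$SQ_FORMAT %.14L"    # time left
SQ_FORMAT="$SQ_FORMAT %.6D"     # number of nodes
SQ_FORMAT="$SQ_FORMAT %.10Q"    # priority
alias squeue="squeue --format='$SQ_FORMAT'"
```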
  
===== Launch Interactive GPU Jobs (Compiling, Testing) =====
  
<code>
salloc --ntasks 1 --gres gpu:1 --partition tasna --account gpu
</code>
  
<code>
srun --pty bash
</code>

  * :!: Always do this from the front-end nodes. SLURM passes your current environment on to the job, so the CUDA tools (nvcc, etc.) will not be available if you launch the job from another machine.
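Because the job inherits the environment of the shell that runs salloc, a quick check before allocating a GPU can catch a missing CUDA toolchain early. A minimal sketch; require_nvcc is a hypothetical helper name, and it only checks that nvcc is on PATH, not that the CUDA version suits the GPUs:

```shell
#!/bin/sh
# Hypothetical pre-flight check: is nvcc reachable via the current PATH?
# salloc/srun pass this environment on to the job, so if nvcc is missing
# here it will be missing inside the allocation too.
require_nvcc() {
    if command -v nvcc >/dev/null 2>&1; then
        return 0
    fi
    echo "nvcc not found in PATH; log in to a front-end node first" >&2
    return 1
}
```

Usage: require_nvcc && salloc --ntasks 1 --gres gpu:1 --partition tasna --account gpu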
  
===== Example Script for GPU Jobs =====
  
Lines starting with #XSBATCH are ordinary comments and are not parsed by SLURM; renaming #SBATCH to #XSBATCH is a quick way to disable a directive.
  
<file>
#!/bin/bash
#SBATCH --output /home/ics/volker/Genga/Jobs/HitnRun/Reufer2012/Logs/cC03m_conex-%j.out
#SBATCH --job-name HitnRun/R12/cC03m/ConeX
#SBATCH --partition vesta
#SBATCH --account gpu
#SBATCH --ntasks 1
#SBATCH --gres gpu:1
#SBATCH --time 28-00:00:00
#XSBATCH --exclude=tasna5
#SBATCH --mail-user you@yourdomain.com
#SBATCH --mail-type END
#SBATCH --no-requeue

home=/home/ics/volker
data=/zbox/data/volker

genga=$home/Source/genga-dev-hitnrun/source/genga_hitnrun_coll24days_sm37
outdir=$data/HitnRun/Reufer2012/cC03m_conex

echo ""
echo "***** LAUNCHING *****"
date '+%F %H:%M:%S'
echo ""

# Record what runs where, and on which GPU(s)
echo "genga=$genga"
echo "outdir=$outdir"
echo "hostname=$(hostname)"
echo "cuda_visible_devices=$CUDA_VISIBLE_DEVICES"

echo ""
echo "***"
echo ""

# Abort rather than write the log into the wrong directory
cd "$outdir" || exit 1
export DATE=$(date +%F_%H%M)
srun "$genga" > "Run_$DATE.log"

echo ""
</file>
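sbatch only reads #SBATCH directives that appear before the first executable command in the script (the man page says it stops at the first non-comment, non-whitespace line), which is also why #XSBATCH lines have no effect. To check which options a script like the one above actually hands to sbatch, here is a rough approximation of that rule in awk; sbatch_directives is a hypothetical helper, and it assumes directives start in the first column:

```shell
#!/bin/sh
# Hypothetical helper: print the options sbatch would take from #SBATCH lines.
# Approximation of the documented rule: scanning stops at the first line that
# is neither blank nor a comment; "#XSBATCH" and other comments are skipped.
sbatch_directives() {
    awk '
        /^[ \t]*$/ { next }   # blank line: keep scanning
        !/^#/      { exit }   # first command: stop, like sbatch does
        /^#SBATCH/ { sub(/^#SBATCH[ \t]*/, ""); print }
    ' "$1"
}
```

Run on the GPU script above, it should list every #SBATCH option from --output through --no-requeue, but not --exclude=tasna5.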
===== Example Script for MPI Jobs =====

The file **/home/itp/volker/Slurm/blacklist** contains a line-by-line listing of nodes we wish to avoid.
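sbatch's --exclude option takes a comma-separated list of node names. A sketch that turns the blacklist file described above into such a list, assuming one node name per line; skipping blank lines and lines starting with # is an assumption about the file's format, and exclude_list is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: turn a one-node-per-line file into "node1,node2,...".
exclude_list() {
    # Keep the first word of each non-empty, non-comment line, then join with commas
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1" | paste -sd, -
}
```

Usage: sbatch --exclude="$(exclude_list /home/itp/volker/Slurm/blacklist)" script.job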
<file>
#!/bin/bash
slurm.1411552548.txt.gz · Last modified: 2014/09/24 11:55 by volker