This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
slurm [2014/09/29 15:30] volker |
slurm [2015/03/24 11:47] (current) volker [Basics] |
||
---|---|---|---|
Line 3: | Line 3: | ||
====== SLURM Scheduler ====== | ====== SLURM Scheduler ====== | ||
- | * [[http://slurm.schedmd.com/rosetta.html|Rosetta Stone of Scheduler]] | + | * [[http://slurm.schedmd.com/rosetta.html|Rosetta Stone of Schedulers]] |
* Cf. [[https://computing.llnl.gov/linux/slurm/man_index.html]] | * Cf. [[https://computing.llnl.gov/linux/slurm/man_index.html]] | ||
* Especially [[https://computing.llnl.gov/linux/slurm/sbatch.html]] | * Especially [[https://computing.llnl.gov/linux/slurm/sbatch.html]] | ||
+ | |||
+ | ===== Basics ===== | ||
+ | |||
+ | * Submit batch jobs | ||
+ | |||
+ | <code> | ||
+ | $ sbatch script.job | ||
+ | </code> | ||
+ | |||
+ | * Cancel jobs | ||
+ | |||
+ | <code> | ||
+ | $ scancel jobid | ||
+ | </code> | ||
+ | |||
+ | * View the queue | ||
+ | |||
+ | <code> | ||
+ | $ squeue | ||
+ | </code> | ||
+ | |||
+ | See below for example job scripts. | ||
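The three commands above are often chained in scripts: submit a job, remember its ID, then query or cancel it. The following is a minimal sketch, assuming sbatch prints its usual `Submitted batch job <id>` confirmation line. The helper name `parse_jobid` is hypothetical and not part of Slurm.

```shell
#!/bin/bash
# Hypothetical helper: extract the numeric job ID from sbatch's
# confirmation line, e.g. "Submitted batch job 1234".
parse_jobid() {
    # Print the last whitespace-separated field of the argument.
    printf '%s\n' "$1" | awk '{ print $NF }'
}

# Typical use on a cluster (commented out so the sketch runs anywhere):
#   jobid=$(parse_jobid "$(sbatch script.job)")
#   squeue --jobs "$jobid"
#   scancel "$jobid"
```

Keeping the parsing in a small function means the submit, view, and cancel steps all work from one variable instead of a job ID copied by hand.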
+ | |||
+ | ===== Random Tips & Tricks ===== | ||
+ | |||
+ | * Attach to a running job [[https://computing.llnl.gov/linux/slurm/sattach.html]] | ||
+ | |||
+ | <code> | ||
+ | $ sattach jobid.jobstep | ||
+ | </code> | ||
+ | |||
+ | * We can hold a job by postponing its start time [[https://computing.llnl.gov/linux/slurm/faq.html#hold]] | ||
+ | |||
+ | <code> | ||
+ | $ scontrol update JobId=1234 StartTime=now+30days | ||
+ | ... later ... | ||
+ | $ scontrol update JobId=1234 StartTime=now | ||
+ | </code> | ||
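The postpone-and-release pattern above can be wrapped in two small helpers. This is only a sketch: the function names `postpone_cmd` and `release_cmd` are hypothetical, and they merely print the scontrol command lines so that you can review them before running anything.

```shell
#!/bin/bash
# Hypothetical helpers that only print the scontrol command lines,
# so they can be reviewed before being run.
postpone_cmd() {
    # $1 = job ID, $2 = number of days to push the start time back
    printf 'scontrol update JobId=%s StartTime=now+%sdays\n' "$1" "$2"
}

release_cmd() {
    # $1 = job ID; resetting StartTime to "now" makes the job eligible again
    printf 'scontrol update JobId=%s StartTime=now\n' "$1"
}

postpone_cmd 1234 30
release_cmd 1234
```

Slurm also provides `scontrol hold <jobid>` and `scontrol release <jobid>`, which hold a job without touching its start time.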
+ | |||
+ | * If you want squeue output to look like it does at CSCS, add the following to your .bashrc | ||
+ | |||
+ | <file> | ||
+ | alias squeue="squeue --format='%.12i %.8u %.9P %.32j %.12B %.2t %.12r %.14M %.14L %.6D %.10Q'" | ||
+ | </file> | ||
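An alternative to the alias is the `SQUEUE_FORMAT` environment variable, which the squeue man page describes as the default for `--format`. The sketch below reuses the same format string and assumes your installed squeue honours that variable.

```shell
#!/bin/bash
# Assumption: the installed squeue reads SQUEUE_FORMAT as its default
# output format (see the environment variables section of "man squeue").
# The format string is the same one used in the alias.
export SQUEUE_FORMAT='%.12i %.8u %.9P %.32j %.12B %.2t %.12r %.14M %.14L %.6D %.10Q'
```

Unlike an alias, the variable also applies when squeue is called from scripts, and an explicit `--format` on the command line still takes precedence.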
===== Launch Interactive GPU Jobs (Compiling, Testing) ===== | ===== Launch Interactive GPU Jobs (Compiling, Testing) ===== | ||
Line 12: | Line 56: | ||
<code> | <code> | ||
- | salloc -n 1 --gres gpu:1 -p tasna -A gpu | + | salloc --ntasks 1 --gres gpu:1 --partition tasna --account gpu |
</code> | </code> | ||
Line 20: | Line 64: | ||
srun --pty bash | srun --pty bash | ||
</code> | </code> | ||
+ | |||
+ | * :!: Always do this from the front-end nodes. Because Slurm inherits your environment, the CUDA tools (nvcc, etc.) won't be available if you submit this job from other computers. | ||
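Because the job inherits the submitting shell's environment, a quick pre-flight check can catch a missing toolchain before any resources are allocated. This is a minimal sketch. The function name `check_tools` is hypothetical, and it only tests whether the named commands are on `PATH`.

```shell
#!/bin/bash
# Hypothetical pre-flight check: report whether the tools a GPU job
# needs are on PATH in the shell that will call salloc.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Typical use on a front-end node (commented out so the sketch runs anywhere):
#   check_tools nvcc && salloc --ntasks 1 --gres gpu:1 --partition tasna --account gpu
```

If `check_tools nvcc` reports a missing tool, you are probably not on a front-end node, or the CUDA environment has not been loaded in the current shell.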
===== Example Script for GPU Jobs ===== | ===== Example Script for GPU Jobs ===== | ||
Line 26: | Line 72: | ||
<file> | <file> | ||
- | #!/bin/bash | + | #!/bin/bash |
- | #SBATCH --output /home/ics/volker/Genga/Jobs/Debris/Chaos-41/gas_03/Logs/run_01-%j.out | + | #SBATCH --output /home/ics/volker/Genga/Jobs/HitnRun/Reufer2012/Logs/cC03m_conex-%j.out |
- | #SBATCH --job-name c41/gas_03/run_01 | + | #SBATCH --job-name HitnRun/R12/cC03m/ConeX |
- | #SBATCH --partition tasna | + | #SBATCH --partition vesta |
#SBATCH --account gpu | #SBATCH --account gpu | ||
#SBATCH --ntasks 1 | #SBATCH --ntasks 1 | ||
#SBATCH --gres gpu:1 | #SBATCH --gres gpu:1 | ||
#SBATCH --time 28-00:00:00 | #SBATCH --time 28-00:00:00 | ||
- | #XSBATCH --exclude=tasna1 | + | #XSBATCH --exclude=tasna5 |
- | #SBATCH --mail-user volker@cheleb.net | + | #SBATCH --mail-user you@yourdomain.com |
- | #SBATCH --mail-type ALL | + | #SBATCH --mail-type END |
#SBATCH --no-requeue | #SBATCH --no-requeue | ||
Line 42: | Line 88: | ||
data=/zbox/data/volker | data=/zbox/data/volker | ||
- | genga=$home/Source/genga-dev/source/genga_ss_gas_sm20 | + | genga=$home/Source/genga-dev-hitnrun/source/genga_hitnrun_coll24days_sm37 |
- | outdir=$data/Debris/Runs/Chaos-41/gas_03/run_01 | + | outdir=$data/HitnRun/Reufer2012/cC03m_conex |
- | echo $genga | ||
- | echo $outdir | ||
echo "" | echo "" | ||
- | |||
echo "***** LAUNCHING *****" | echo "***** LAUNCHING *****" | ||
echo `date '+%F %H:%M:%S'` | echo `date '+%F %H:%M:%S'` | ||
+ | echo "" | ||
+ | |||
+ | echo "genga="$genga | ||
+ | echo "outdir="$outdir | ||
+ | echo "hostname="`hostname` | ||
+ | echo "cuda_visible_devices="$CUDA_VISIBLE_DEVICES | ||
+ | |||
+ | echo "" | ||
+ | echo "***" | ||
echo "" | echo "" | ||
cd $outdir | cd $outdir | ||
export DATE=`date +%F_%H%M` | export DATE=`date +%F_%H%M` | ||
- | time srun $genga > Run_$DATE.log | + | srun $genga > Run_$DATE.log |
echo "" | echo "" |