This shows you the differences between two versions of the page.
|
slurm [2015/01/13 08:56] volker holding jobs |
slurm [2015/03/24 11:47] (current) volker [Basics] |
||
|---|---|---|---|
| Line 6: | Line 6: | ||
| * Cf. [[https://computing.llnl.gov/linux/slurm/man_index.html]] | * Cf. [[https://computing.llnl.gov/linux/slurm/man_index.html]] | ||
| * Especially [[https://computing.llnl.gov/linux/slurm/sbatch.html]] | * Especially [[https://computing.llnl.gov/linux/slurm/sbatch.html]] | ||
| + | |||
| + | ===== Basics ===== | ||
| + | |||
| + | * Submit batch jobs | ||
| + | |||
| + | <code> | ||
| + | $ sbatch script.job | ||
| + | </code> | ||
| + | |||
| + | * Cancel jobs | ||
| + | |||
| + | <code> | ||
| + | $ scancel jobid | ||
| + | </code> | ||
| + | |||
| + | * View the queue | ||
| + | |||
| + | <code> | ||
| + | $ squeue | ||
| + | </code> | ||
| + | |||
| + | See below for example job scripts. | ||
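The three commands above can be chained in a small script. The following is a minimal sketch, not part of the original page: `parse_jobid` is a hypothetical helper, and it assumes `sbatch` prints its usual confirmation line of the form "Submitted batch job <id>".

```shell
#!/bin/bash
# Hypothetical helper: pull the numeric job ID out of sbatch's confirmation
# line, which is assumed to read "Submitted batch job <id>".
parse_jobid() {
    printf '%s\n' "$1" | sed -n 's/^Submitted batch job \([0-9][0-9]*\).*/\1/p'
}

# Only talk to Slurm if it is installed and the job script exists.
if command -v sbatch >/dev/null 2>&1 && [ -f script.job ]; then
    jobid=$(parse_jobid "$(sbatch script.job)")
    if [ -n "$jobid" ]; then
        squeue -j "$jobid"      # show only this job in the queue
        # scancel "$jobid"      # uncomment to cancel the job again
    fi
fi
```

Keeping the job ID in a variable avoids copying it by hand from the `sbatch` output into `squeue` or `scancel`.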
| ===== Random Tips & Tricks ===== | ===== Random Tips & Tricks ===== | ||
| Line 12: | Line 34: | ||
| <code> | <code> | ||
| - | sattach jobid.jobstep | + | $ sattach jobid.jobstep |
| </code> | </code> | ||
| Line 22: | Line 44: | ||
| $ scontrol update JobId=1234 StartTime=now | $ scontrol update JobId=1234 StartTime=now | ||
| </code> | </code> | ||
| + | |||
| + | * If you want squeue output to look like it does at CSCS, add the following to your .bashrc: | ||
| + | |||
| + | <file> | ||
| + | alias squeue="squeue --format='%.12i %.8u %.9P %.32j %.12B %.2t %.12r %.14M %.14L %.6D %.10Q'" | ||
| + | </file> | ||
| + | |||
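As a possible alternative to the alias, `squeue` also reads its default output format from the `SQUEUE_FORMAT` environment variable. A minimal sketch using the same format string as above. Whether the variable or the alias suits your setup better is a judgment call, since aliases are normally not expanded in non-interactive scripts while exported variables are inherited:

```shell
# Same column layout as the alias above, set through the environment instead.
export SQUEUE_FORMAT='%.12i %.8u %.9P %.32j %.12B %.2t %.12r %.14M %.14L %.6D %.10Q'
```

An explicit `--format` option on the command line still overrides the variable.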
| ===== Launch Interactive GPU Jobs (Compiling, Testing) ===== | ===== Launch Interactive GPU Jobs (Compiling, Testing) ===== | ||
| Line 35: | Line 64: | ||
| srun --pty bash | srun --pty bash | ||
| </code> | </code> | ||
| + | |||
| + | * :!: Always do this from the front-end nodes. Because Slurm inherits your environment, the CUDA tools (nvcc, etc.) won't be available if you submit this job from other computers. | ||
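Because the job inherits the submitting shell's environment, it can help to check that the CUDA tools are visible before launching. A minimal sketch, not part of the original page: `require_tools` is a hypothetical helper that only checks whether the named commands are on your `PATH`.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if every named tool is found on PATH.
require_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool (are you on a front-end node?)" >&2
            return 1
        fi
    done
}

# Report whether the environment looks ready for an interactive GPU job.
if require_tools nvcc srun; then
    echo "CUDA and Slurm tools found; the interactive job will inherit them."
fi
```

If the check fails, log in to a front-end node and try again before issuing the interactive job.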
| ===== Example Script for GPU Jobs ===== | ===== Example Script for GPU Jobs ===== | ||
| Line 41: | Line 72: | ||
| <file> | <file> | ||
| - | #!/bin/bash | + | #!/bin/bash |
| - | #SBATCH --output /home/ics/volker/Genga/Jobs/Debris/Chaos-41/gas_03/Logs/run_01-%j.out | + | #SBATCH --output /home/ics/volker/Genga/Jobs/HitnRun/Reufer2012/Logs/cC03m_conex-%j.out |
| - | #SBATCH --job-name c41/gas_03/run_01 | + | #SBATCH --job-name HitnRun/R12/cC03m/ConeX |
| - | #SBATCH --partition tasna | + | #SBATCH --partition vesta |
| #SBATCH --account gpu | #SBATCH --account gpu | ||
| #SBATCH --ntasks 1 | #SBATCH --ntasks 1 | ||
| #SBATCH --gres gpu:1 | #SBATCH --gres gpu:1 | ||
| #SBATCH --time 28-00:00:00 | #SBATCH --time 28-00:00:00 | ||
| - | #XSBATCH --exclude=tasna1 | + | #XSBATCH --exclude=tasna5 |
| - | #SBATCH --mail-user volker@cheleb.net | + | #SBATCH --mail-user you@yourdomain.com |
| - | #SBATCH --mail-type ALL | + | #SBATCH --mail-type END |
| #SBATCH --no-requeue | #SBATCH --no-requeue | ||
| Line 57: | Line 88: | ||
| data=/zbox/data/volker | data=/zbox/data/volker | ||
| - | genga=$home/Source/genga-dev/source/genga_ss_gas_sm20 | + | genga=$home/Source/genga-dev-hitnrun/source/genga_hitnrun_coll24days_sm37 |
| - | outdir=$data/Debris/Runs/Chaos-41/gas_03/run_01 | + | outdir=$data/HitnRun/Reufer2012/cC03m_conex |
| - | echo $genga | ||
| - | echo $outdir | ||
| echo "" | echo "" | ||
| - | |||
| echo "***** LAUNCHING *****" | echo "***** LAUNCHING *****" | ||
| echo `date '+%F %H:%M:%S'` | echo `date '+%F %H:%M:%S'` | ||
| + | echo "" | ||
| + | |||
| + | echo "genga=$genga" | ||
| + | echo "outdir=$outdir" | ||
| + | echo "hostname=$(hostname)" | ||
| + | echo "cuda_visible_devices=$CUDA_VISIBLE_DEVICES" | ||
| + | |||
| + | echo "" | ||
| + | echo "***" | ||
| echo "" | echo "" | ||
| cd $outdir | cd $outdir | ||
| export DATE=`date +%F_%H%M` | export DATE=`date +%F_%H%M` | ||
| - | time srun $genga > Run_$DATE.log | + | srun $genga > Run_$DATE.log |
| echo "" | echo "" | ||