Batch Usage in JSC - Introduction to Slurm
Contents
1. System Usage: Modules
The installed software of the clusters is organized through a hierarchy of modules. Loading a module adapts your environment variables to give you access to a specific set of software and its dependencies. Preparing the module environment includes two steps:
1. Load one of the available toolchains. There are 3 levels of toolchains: (a) compilers, (b) compilers + MPI and (c) full toolchains.
2. Then load other application modules which were built with the currently loaded toolchain.
Useful commands:
  module avail
  module load intel-para/2014.11
  module list
  module avail
  module spider Boost/1.56.0
  module load Boost/1.56.0
  module unload Boost/1.56.0
  module purge

System Usage: Compilation
On our clusters at JSC we offer wrappers to the users in order to compile and execute parallel jobs using MPI. The current wrappers are mpicc, mpicxx, mpif77 and mpif90. Some commonly used compiler options:
  -g            Enables debugging
  -openmp       Enables the use of OpenMP
  -O0 ... -O3   Sets the optimization level
  -L            A path can be given to the linker for searching libraries
  -D            Defines a macro
  -U            Undefines a macro
  -I            Adds further directories to the include file search path
Compile an MPI program in C++:
  mpicxx -O2 -o mpi_prog program.cpp
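Putting the module and compilation steps above together, a typical login-node session might look like the following sketch (module versions as listed above; the source file and program names are illustrative):
  # Prepare the environment: toolchain first, then application modules
  module purge
  module load intel-para/2014.11    # compilers + MPI toolchain
  module load Boost/1.56.0          # library built with this toolchain
  module list                       # verify the loaded modules
  # Compile an MPI program in C++ with optimization and debug symbols
  mpicxx -O2 -g -o mpi_prog program.cpp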
2. Default limit settings are used when not given by the user, like number of nodes, number of tasks per node, wall time limit, etc. According to the group's contingent, user jobs are given a certain QoS:
  normal: the group has contingent; high job priorities
  lowcont: this month's contingent was used; penalty: lower job priorities
  nocont: all contingent of the 3-months time frame was used; penalty: lowest job priorities and a lower max wall time limit
  nolimits: special QoS for the admins, testing or maintenance
  suspended: the group's project has ended; the user cannot submit jobs

Slurm Architecture
(Diagram: master nodes, login nodes and compute nodes 1..N; users interact through the Slurm commands on the login nodes.)

JUROPATEST
JUROPATEST is a test system which allows users of JSC's current general-purpose supercomputer JUROPA to port and optimize their applications for the new Haswell CPU architecture. Partitions:
  batch: default partition, includes all compute nodes. Limits: max 6h (default 30mins), max 16 nodes (default 1 node)
  large: all compute nodes, intended for large jobs. Limits: max Lh (default 30mins), max 62 nodes (default 17 nodes)
  maint: all compute nodes, used by the admins
For both the batch and large partitions: max 4 running and max 20 pending/queued jobs per user.
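Which QoS, and therefore which of the limits above, currently applies to your account can be checked through Slurm's accounting database. A minimal sketch using sacctmgr (described later among the user commands):
  # Show the account association and the QoS attached to it
  sacctmgr show associations user=$USER format=User,Account,Partition,QOS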
3. Questions?
4. Batch Usage in JSC: Introduction to Slurm
May 2015, Chrysovalantis Paschoulas (c.paschoulas@fz-juelich.de)

Batch System Concepts (1)
A cluster consists of a set of tightly connected, identical computers that are presented as a single system and work together to solve computation-intensive problems. The nodes are connected through a high-speed local network and have access to shared resources like shared file systems. Batch processing is the execution of programs ("jobs") without user intervention. A job is the execution of user-defined work flows by the batch system. The resource manager is the software responsible for managing the resources of a cluster, like tasks, nodes, CPUs, memory, network, etc. It also manages the execution of the jobs, makes sure that jobs do not overlap on the resources, and handles their I/O. Usually it is controlled by the scheduler.

Batch System Concepts (2)
The scheduler (workload manager) is the software that controls the users' jobs on a cluster. It receives jobs from the users and controls the resource manager to make sure that the jobs are completed successfully. It handles job submission and puts jobs into queues. It offers many features, like user commands for managing the jobs (start, stop, hold) and interfaces for defining work flows and job dependencies, among others.
5. Job Script: Multiple Job Steps
Slurm introduces the concept of job steps. A job step can be viewed as a smaller job or allocation inside the current allocation. Job steps can be started only with the srun command. The following example shows the usage of job steps. With sbatch we allocate 32 compute nodes for 6 hours. Then we spawn 3 job steps: the first step runs on 16 compute nodes for 50 minutes, the second step on 2 nodes for 10 minutes, and the third step uses all 32 allocated nodes for 5 hours.
  #!/bin/bash
  #SBATCH -N 32
  #SBATCH --time=06:00:00
  #SBATCH --partition=batch
  srun -N 16 -n 32 -t 00:50:00 mpi_prog1
  srun -N 2 -n 4 -t 00:10:00 mpi_prog2
  srun -N 32 --ntasks-per-node=2 -t 05:00:00 mpi_prog3

Job Dependencies & Job Chains
Slurm supports dependency chains, which are collections of batch jobs with defined dependencies. Job dependencies can be defined using the --dependency (or -d) option of sbatch. The format is: sbatch -d <type>:<jobID> <jobscript>. Available dependency types: after, afterany, afternotok, afterok. Item 11 shows an example of a bash script for starting a chain of jobs. The script submits a chain of $NO_OF_JOBS jobs; each job will start only after successful completion of its predecessor.
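For instance, a post-processing job that should start once job 12345 has terminated, regardless of its exit status, would be submitted as follows (the job ID and script name are placeholders):
  sbatch -d afterany:12345 postprocess.sh
Job steps can also run concurrently inside a single allocation. The following minimal sketch, with illustrative program names, starts two steps in the background on disjoint halves of the allocation; wait blocks until both steps have finished:
  #!/bin/bash
  #SBATCH -N 32
  #SBATCH --time=06:00:00
  #SBATCH --partition=batch
  # Backgrounded srun calls run as concurrent job steps
  srun -N 16 -n 32 mpi_prog1 &
  srun -N 16 -n 32 mpi_prog2 &
  wait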
6. Further submission/allocation options (continued from item 7):
  --ntasks-per-node   Number of tasks per compute node
  -o, --output        Path to the job's standard output
  -p, --partition     Partition to be used by the job
  -t, --time          Maximum wall-clock time of the job

Slurm Job Submission: Examples
Submit a job requesting 2 nodes for 1 hour with 28 tasks per node (implied value of --ntasks: 56):
  sbatch -N2 --ntasks-per-node=28 --time=1:00:00 jobscript
Submit a job array of 4 jobs, with 1 node per job and the default wall time:
  sbatch --array=0-3 -N1 jobscript
Submit a job script to the large partition requesting 62 nodes for 2 hours:
  sbatch -N62 -p large -t 2:00:00 jobscript
Submit a job requesting all available mail notifications to the specified email address:
  sbatch -N2 --mail-user=email@address.com --mail-type=ALL jobscript
Specify a job name and the standard output/error files:
  sbatch -N1 -J myjob -o MyJob-%j.out -e MyJob-%j.err jobscript
Allocate 4 nodes for 1 hour:
  salloc -N4 --time=60

Slurm Spawning Command
With srun the users can spawn any kind of application, process or task inside a job allocation. srun should be used either (1) inside a job script submitted by sbatch, where it starts a job step, or (2) after calling salloc, to execute programs interactively. Command format: srun [options] executable [args]
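srun can also start several different programs within one job step via the --multi-prog option (listed with the other srun options in item 14). A minimal sketch, with illustrative program names: the configuration file maps task ranks to executables, and %t expands to the task number.
  # multi.conf
  0    ./coordinator
  1-3  ./worker %t
  # Start 4 tasks, each running the program assigned to its rank
  srun -n 4 --multi-prog multi.conf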
7. A job started with srun (see item 8) can contain multiple job steps executing sequentially or in parallel on independent or shared nodes within the job's node allocation.
  sshare is used to retrieve fair-share information for each user.
  sstat allows querying status information about a running job.
  sview is a graphical user interface to get state information for jobs, partitions and nodes.
  sacct is used to retrieve accounting information about jobs and job steps from Slurm's database.
  sacctmgr also allows the users to query some information about their accounts and other accounting information in Slurm's database.
For more detailed information please check the online documentation and the man pages.

Slurm Job Submission
There are 2 commands for job allocation: sbatch is used for batch jobs and salloc is used to allocate resources for interactive jobs. The format of these commands:
  sbatch [options] jobscript [args]
  salloc [options] [<command> [command args]]
List of the most important submission/allocation options:
  -c, --cpus-per-task   Number of logical CPUs (hardware threads) per task
  -e, --error           Path to the job's standard error
  -i, --input           Connect the job script's standard input directly to a file
  -J, --job-name        Set the name of the job
  --mail-user           Define the mail address for notifications
  --mail-type           When to send mail notifications; options: BEGIN, END, FAIL, ALL
  -N, --nodes           Number of compute nodes used by the job
  -n, --ntasks          Number of tasks (MPI processes)
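A typical sequence using the query commands described above might look like the following sketch (the job ID is a placeholder):
  squeue -u $USER        # list your pending and running jobs
  sprio -j <jobid>       # show the priority of a specific pending job
  sstat -j <jobid>       # status information while the job is running
  sacct -j <jobid>       # accounting information once the job has finished
  sshare -u $USER        # fair-share information for your user
  scancel <jobid>        # cancel a pending or running job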
8. Compile a hybrid MPI/OpenMP program in C (continuing the compilation examples from item 1):
  mpicc -openmp -o mpi_prog program.c

Slurm User Commands (1)
  salloc is used to request allocations for interactive jobs.
  sattach is used to attach standard input, output and error plus signal capabilities to a currently running job or job step.
  sbatch is used to submit a batch script, which can be a bash, Perl or Python script.
  scancel is used to cancel a pending or running job or job step.
  sbcast is used to transfer a file to all nodes allocated for a job.
  sgather is used to transfer a file from all allocated nodes to the currently active job. This command can be used only inside a job script.
  scontrol also provides some functionality for the users to manage jobs or to query and get some information about the system configuration.
  sinfo is used to retrieve information about the partitions, reservations and node states.
  smap graphically shows the state of the partitions and nodes using a curses interface. We recommend llview as an alternative, which is supported on all JSC machines.

Slurm User Commands (2)
  sprio can be used to query job priorities.
  squeue allows querying the list of pending and running jobs.
  srun is used to initiate job steps, mainly within a job, or to start interactive jobs.
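As an illustration of sbcast and sgather inside a job script, here is a minimal sketch; the /tmp staging path, the file names and the program name are assumptions for the example:
  #!/bin/bash
  #SBATCH -N 4
  #SBATCH --time=00:30:00
  # Stage the executable to node-local storage on every allocated node,
  # run one task per node from there, then collect a per-node output file.
  sbcast mpi_prog /tmp/mpi_prog
  srun -N 4 --ntasks-per-node=1 /tmp/mpi_prog
  sgather /tmp/output.dat output.dat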
9. Further scheduler features (continuing the list from item 4): interfaces for defining work flows and job dependencies; interfaces for job monitoring, profiling and accounting; partitions and queues to control jobs according to policies and limits; and scheduling mechanisms like backfilling according to priorities.
A batch system is the combination of a scheduler and a resource manager, combining all the features of these two parts in an efficient way. Slurm, for example, offers both functionalities: scheduling and resource management.

JSC Batch Model
  Job scheduling according to priorities: the jobs with the highest priorities will be scheduled next.
  Backfilling scheduling algorithm: the scheduler checks the queue and may schedule jobs with lower priorities that can fit into the gap created by freeing resources for the next highest-priority jobs.
  No node sharing: the smallest allocation for jobs is one compute node, so running jobs do not disturb each other.
  For each project a Linux group is created to which the users belong. Each user has available contingent from one project only.
  CPU quota modes: monthly and fixed. The projects are charged on a monthly basis or get a fixed amount until it is completely used.
  Accounted CPU quota per job: number of nodes × wall time (for example, a job that occupies 4 nodes for 6 hours is charged 24 node-hours).
  Contingent (CPU quota) states for the projects: normal, low contingent, no contingent. Contingent priorities: normal > lowcont > nocont.
  Users without contingent get some penalties for their jobs, but they are still allowed to submit and run jobs.
10. After a successful allocation (following the salloc command shown in item 11), salloc will start a shell on the login node where the submission happened. After the allocation the users can execute srun in order to spawn their applications interactively on the compute nodes, for example:
  srun -N 4 --ntasks-per-node=2 -t 00:10:00 -c 7 mpi_prog
The interactive session is terminated by exiting the shell. It is possible to obtain a remote shell on the compute nodes after salloc by running srun with the pseudo-terminal option --pty and a shell as argument:
  srun --cpu_bind=none -N 2 --pty /bin/bash
It is also possible to start an interactive job and get a remote shell on the compute nodes with srun alone, without salloc (not recommended):
  srun --cpu_bind=none -N 1 -n 1 -t 01:00:00 --pty /bin/bash -i

Further Information
  Updated status of the systems: see the "Message of today" at login, or get recent status updates by subscribing to the system high messages: http://www.fz-juelich.de/jsc/CompServ/services/high_msg.html
  JUROPA online documentation: http://www.fz-juelich.de/ias/jsc/juropa
  JUROPATEST online documentation: http://www.fz-juelich.de/ias/jsc/juropatest
  JUROPATEST Slurm User Manual (PDF): http://fz-juelich.de/ias/jsc/juropatest-batchpdf
  JURECA online documentation: http://www.fz-juelich.de/ias/jsc/jureca
  User support at FZJ: Email sc@fz-juelich.de, Phone +49 2461 61 2828
11. The chain-job script from item 5:
  #!/bin/bash
  NO_OF_JOBS=<no of jobs>
  JOB_SCRIPT=<jobscript name>
  JOBID=$(sbatch ${JOB_SCRIPT} 2>&1 | awk '{print $NF}')
  I=0
  while [ $I -le $NO_OF_JOBS ]; do
    JOBID=$(sbatch -d afterok:${JOBID} ${JOB_SCRIPT} 2>&1 | awk '{print $NF}')
    let I=$I+1
  done

Job Arrays
Slurm supports job arrays, which can be defined using the option -a or --array of the sbatch command. To address a job array, Slurm provides a base array ID and an array index for each job. The format for specifying an array job is <base job id>_<array index>. Slurm also exports 2 environment variables that can be used in the job scripts: SLURM_ARRAY_JOB_ID (the base array job ID) and SLURM_ARRAY_TASK_ID (the array index). Some additional options are available to specify the stdin/stdout/stderr file names in the job scripts: %A will be replaced by SLURM_ARRAY_JOB_ID and %a will be replaced by SLURM_ARRAY_TASK_ID.
  #!/bin/bash
  #SBATCH --nodes=1
  #SBATCH --output=prog-%A_%a.out
  #SBATCH --error=prog-%A_%a.err
  #SBATCH --time=02:00:00
  #SBATCH --array=1-20
  srun -N 1 --ntasks-per-node=1 prog input_${SLURM_ARRAY_TASK_ID}.txt

Interactive Jobs
Interactive sessions can be allocated using the salloc command. The following command will allocate 2 nodes for 30 minutes:
  salloc --nodes=2 -t 00:30:00
12. Job Script: Hybrid Job
In this example a hybrid MPI/OpenMP job is presented. The job will allocate 5 compute nodes for 2 hours. It will have 35 MPI tasks in total, 7 tasks per node, and 4 OpenMP threads per task; on each node 28 cores will be used (no SMT enabled). Note: it is important to define the environment variable OMP_NUM_THREADS, and it must match the value of the option --cpus-per-task (-c).
  #!/bin/bash
  #SBATCH -J TestJob
  #SBATCH -N 5
  #SBATCH -o TestJob-%j.out
  #SBATCH -e TestJob-%j.err
  #SBATCH --time=02:00:00
  #SBATCH --partition=large
  export OMP_NUM_THREADS=4
  srun -N 5 --ntasks-per-node=7 --cpus-per-task=4 hybrid_prog

Job Script: Hybrid Job with SMT
The CPUs on our clusters support Simultaneous Multi-Threading (SMT). SMT is enabled by default for Slurm. In order to use SMT, the users must either allocate more than half of the logical cores on each socket or set some specific CPU binding/affinity options. This example presents a hybrid application which will execute hybrid_prog on 3 nodes, using 2 MPI tasks per node and 28 OpenMP threads per task (56 CPUs per node). The job was executed on JUROPATEST with 28 logical cores per socket.
  #!/bin/bash
  #SBATCH --ntasks=6
  #SBATCH --ntasks-per-node=2
  #SBATCH --cpus-per-task=28
  #SBATCH --output=mpi-out.%j
  #SBATCH --error=mpi-err.%j
  #SBATCH --time=00:20:00
  #SBATCH --partition=batch
  export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
  srun hybrid_prog
13. Slurm Introduction (1)
Slurm is the batch system that will be used on JURECA. Slurm is an open-source project developed by SchedMD. For our clusters, psslurm, a plug-in of the psid daemon and part of the ParaStation Cluster tools, will replace slurmd on the compute nodes. psslurm is under development by ParTec and JSC in the context of our collaboration. Slurm's configuration at JSC:
  High availability for the main daemons slurmctld and slurmdbd
  Backfilling scheduling algorithm
  No node sharing
  Job scheduling according to priorities
  Accounting mechanism: slurmdbd with a MySQL/MariaDB database as back-end storage
  User and job limits configured by QoS and partitions
  No preemption configured: running jobs cannot be preempted
  Prologue and epilogue with pshealthcheck from ParaStation

Slurm Introduction (2)
Slurm groups the compute nodes into partitions, similar to the queues of Moab. Some limits and policies can be configured for each partition:
  allowed users, groups or accounts
  max nodes and max wall time limit per job
  max submitted/queued jobs per user
Other limits are also enforced by the Quality of Service (QoS) assigned according to the contingent of the user's group, e.g. the maximum wall time limit.
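The partitions and their current limits and node states can be inspected with the commands described elsewhere in this document; a short sketch, using the batch partition as an example:
  sinfo                            # overview of all partitions and node states
  sinfo -p batch -l                # long listing for a single partition
  scontrol show partition batch    # full configuration of the batch partition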
14. Some useful srun options (continuing the srun description from item 6):
  --forward-x          Enable X11 forwarding (only for interactive jobs)
  --pty                Execute a task in pseudo-terminal mode
  --multi-prog <file>  Run different programs with different arguments for each task, as specified in a text file
Note: in order to spawn MPI applications the users should always use srun, not mpiexec.

Job Script: Serial Job
Instead of passing options to sbatch on the command line, it is better to specify these options using the #SBATCH directives inside the job scripts. Here is a simple example where some system commands are executed inside the job script. This job will have the name "TestJob". One compute node will be allocated for 30 minutes. Output will be written to the defined files. The job will run in the default partition (batch).
  #!/bin/bash
  #SBATCH -J TestJob
  #SBATCH -N 1
  #SBATCH -o TestJob-%j.out
  #SBATCH -e TestJob-%j.err
  #SBATCH --time=30
  sleep 5
  hostname

Job Script: Parallel Job
Here is a simple example of a job script where we allocate 4 compute nodes for 1 hour. Inside the job script, with the srun command we request to execute the system command hostname on 2 nodes with 1 process per node, requesting a wall time of 10 minutes. In order to start a parallel job, users have to use the srun command, which will spawn processes on the allocated compute nodes of the job.
15. The corresponding job script for the parallel job described in item 14:
  #!/bin/bash
  #SBATCH -J TestJob
  #SBATCH -N 4
  #SBATCH -o TestJob-%j.out
  #SBATCH -e TestJob-%j.err
  #SBATCH --time=60
  srun -N 2 --ntasks-per-node=1 -t 00:10:00 hostname

Job Script: OpenMP Job
In this example the job will execute an OpenMP application named omp_prog. The allocation is for 1 node, and by default, since there is no node sharing, all CPUs of the node are available for the application. The output filenames are also defined and a wall time of 2 hours is requested. Note: it is important to define and export the variable OMP_NUM_THREADS that will be used by the executable.
  #!/bin/bash
  #SBATCH -J TestOMP
  #SBATCH -N 1
  #SBATCH -o TestOMP-%j.out
  #SBATCH -e TestOMP-%j.err
  #SBATCH --time=02:00:00
  export OMP_NUM_THREADS=56
  /home/user/test/omp_prog

Job Script: MPI Job
In the following example an MPI application will start 112 tasks on 4 nodes, running 28 tasks per node and requesting a wall time limit of 15 minutes in the batch partition. Each MPI task will run on a separate core of the CPU.
  #!/bin/bash
  #SBATCH --nodes=4
  #SBATCH --ntasks=112
  #SBATCH --output=mpi-out.%j
  #SBATCH --error=mpi-err.%j
  #SBATCH --time=00:15:00
  #SBATCH --partition=batch
  srun -N4 --ntasks-per-node=28 mpi_prog
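Instead of hard-coding the thread count as in the OpenMP example above, the value can also be derived from the allocation. A minimal sketch, assuming the SLURM_CPUS_ON_NODE variable that Slurm sets inside batch jobs:
  #!/bin/bash
  #SBATCH -J TestOMP
  #SBATCH -N 1
  #SBATCH --time=02:00:00
  # Use as many OpenMP threads as there are logical CPUs on the allocated node
  export OMP_NUM_THREADS=${SLURM_CPUS_ON_NODE}
  /home/user/test/omp_prog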