
bullx cluster suite User's Guide - cenapad-mg


Contents

2.2.8   MPIBull2 Example of Use ............................................. 2-11
2.2.9   MPIBull2 and NFS Clusters ........................................... 2-12
2.3.1   The mpibull2-params command ......................................... 2-14
2.3.2   Family names ........................................................ 2-17
2.4     Managing your MPI environment ....................................... 2-18
2.5     Profiling with mpianalyser .......................................... 2-19

Chapter 3  Scientific Libraries ............................................. 3-1
3.1     Overview ............................................................ 3-1
3.2     Bull Scientific Studio .............................................. 3-1
3.2.1   Scientific Libraries and Documentation .............................. 3-2
3.2.9   NETCDF .............................................................. 3-7
        pNETCDF ............................................................. 3-7
        METIS and ParMETIS .................................................. 3-8
    char   jobFile[MAXFILENAMELEN];
    int    numAskedHosts;
    char   **askedHosts;
    int    numExecHosts;
    char   **execHosts;
    int    jStatus;                        /* job status */
    double hostFactor;
    char   jobName[MAXLINELEN];
    char   command[MAXLINELEN];
    struct lsfRusage LSFrusage;
    char   *mailUser;                      /* user option mail string */
    char   *projectName;                   /* the project name for this job, used for accounting purposes */
    int    exitStatus;                     /* job status */
    int    maxNumProcessors;
    char   *loginShell;                    /* login shell specified by the user */
    char   *timeEvent;
    int    idx;                            /* array idx, must be 0 in JOB_NEW */
    int    maxRMem;
    int    maxRswap;
    char   inFileSpool[MAXFILENAMELEN];    /* spool input file */
    char   commandSpool[MAXFILENAMELEN];   /* spool command file */
    char   *rsvid;
    char   *sla;                           /* the service class under which the job runs */
    int    exceptMask;
    char   *additionalInfo;
    int    exitInfo;
    char   *warningAction;                 /* warning action (SIGNAL | CHKPNT | command), NULL if unspecified */
    int    warningTimePeriod;              /* warning time period in seconds, -1 if unspecified */
    char   *chargedSAAP;
    char   *licenseProject;                /* License Project */
    int    slurmJobId;                     /* job id from SLURM */

    /* part two is the SLURM info, minus the information duplicated from LSF */
    long   priority;
    char   partition[64];
    int    gid;
    int    blockId;
    int    numTasks;
    double aveVsize;
    int    maxRss;
    int    maxRssTaskId;
    double aveRss;
    int    maxPages;
    int    maxPagesTaskId;
    double avePages;
    int    minCpu;
To obtain the compiler documentation go to /opt/intel/Compiler/11.0.069/Documentation. Remember that if you are using MPIBull then a compiler version has to be used which is compatible with the compiler originally used to compile the MPI library.

4.4 Compiler Licenses

Three types of Intel compiler licenses are available:
- Single User: allows one user to operate the product on multiple computers, as long as only one copy is in use at any given time.
- Node Locked: locked to a node, allows any user who has access to this node to operate the product concurrently with other users, limited to the number of licenses purchased.
- Floating: locked to a network, allows any user who has access to the network server to operate the product concurrently with other users, limited to the number of licenses purchased.

The node-locked and floating licenses are managed by FlexLM from Macrovision. License installation and FlexLM configuration may differ according to your compiler, the license type, the number of licenses purchased and the period of support for your product. Please check the Bull Product Designation document delivered with your compiler and follow the instructions contained therein.

4.5 Intel Math Kernel Library Licenses

Intel Math Kernel Library licenses are required for each node on which you compile with MKL. However, the runtime libraries which are used on the compute nodes do not require a license.
-c       Obtains details of the user's configuration.

mpi_user >>> mpibull2-devices -c
MPIBULL2 home            : <install path>
User prefs directory     : /home_nfs/mpi_user/.MPIBull2/
Custom devices           : /home_nfs/mpi_user/.MPIBull2/site_libs
MPI Core flavor          : Standard / Error detection on
MPI Communication Driver : oshm (Shared Memory device, to be used on a single machine) [static][dynamic]

-d xxx   Sets the communication device driver specified.

mpi_user >>> mpibull2-devices -d ibmr_gen2

mpibull2-launch

This is a meta-launcher which connects to whatever process manager is specified by the user. It is used to ensure compatibility between different process manager launchers, and also to allow users to specify their custom key bindings. The purpose of mpibull2-launch is to help users to retain their launching commands. mpibull2-launch also interprets the user's special keybindings, in order to allow the user to retain their preferences regardless of the cluster and the MPI library. This means that the user's scripts will not need changing, except for the particular environment variables that are required.

The mpibull2-launch tool provides default keybindings. The user can check them using the metahelp option. If the user wishes to check some of the CPM (Cluster Process Manager) special commands, they should use options with the CPM launch name command, e.g. options srun.
The libraries for ParMETIS are located in the following directory:

/opt/scilibs/PARMETIS/ParMETIS-<version>/mpibull2-<version>/lib

More information is available from the documentation included in the SciStudio_shelf rpm. When this is installed the documentation files will be located under:

/opt/scilibs/SCISTUDIO_SHELF/SciStudio_shelf-<version>/PARMETIS/ParMETIS-<version>

SciPort

SCIPORT is a portable implementation of CRAY SCILIB that provides both single and double precision object libraries: SCIPORTS provides single precision and SCIPORTD provides double precision. The libraries for SCIPORT can be found in the following directory:

/opt/scilibs/SCIPORT/sciport-<version>/lib

More information is available from the documentation included in the SciStudio_shelf rpm. When this is installed the documentation files will be located under:

/opt/scilibs/SCISTUDIO_SHELF/SciStudio_shelf-<version>/SCIPORT/sciport-<version>

gmp_sci

GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers and floating point numbers. There is no practical limit to the precision, except the ones implied by the available memory in the machine. GMP has a rich set of functions, and the functions have a regular interface. The main target applications for GMP are cryptography applications and research, Internet security applications, algebra systems and computational algebra research.
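As a quick illustration of the GMP interface described above, here is a minimal, self-contained sketch. It is a generic example, not taken from the manual; it assumes the standard gmp.h header provided by the gmp_sci installation and linking with -lgmp.

/* gmp_demo.c - arbitrary precision arithmetic with GMP (illustrative sketch only) */
#include <stdio.h>
#include <gmp.h>

int main(void)
{
    mpz_t n;

    mpz_init(n);
    mpz_ui_pow_ui(n, 2, 200);           /* n = 2^200, far beyond any native integer range */
    gmp_printf("2^200 = %Zd\n", n);     /* gmp_printf understands the %Zd conversion for mpz_t */
    mpz_clear(n);
    return 0;
}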
8.4 TotalView

[Figure: the TotalView graphical user interface, showing the Group Control toolbar (Go, Halt, Kill, Restart, Next, Step, Out, Run), the process and thread status, the Stack Trace and Stack Frame panes, the source pane for the fork_loop.cxx example (function fork_wrapper), and the Action Points/Processes tabs.]
3.3.4 PBLAS

PBLAS stands for Parallel Basic Linear Algebra Subprograms. PBLAS is the parallelized version of BLAS for distributed memory machines. It requires the cyclic distribution by matrix block that the BLACS library offers. This library is included in the Intel MKL package.

3.3.5 LAPACK

LAPACK stands for Linear Algebra PACKage. This is a set of Fortran 77 routines used to resolve linear algebra problems such as the resolution of linear systems, eigenvalue computations, matrix computations, etc. However, it is not written for a parallel architecture. This library is included in the Intel MKL package.

3.4 NVIDIA CUDA Scientific Libraries

For clusters which include NVIDIA Tesla graphic accelerators, the NVIDIA Compute Unified Device Architecture (CUDA) Toolkit, including versions of the CUFFT and the CUBLAS scientific libraries, is installed automatically on the LOGIN, COMPUTE and COMPUTEX nodes. The CUFFT and CUBLAS libraries are not ABI compatible, by symbol, by call, or by libname, with the libraries included in Bull Scientific Studio. The use of the NVIDIA CUBLAS and CUFFT libraries needs to be made explicit, and is exclusive to systems which include the NVIDIA Tesla graphic accelerators.

3.4.1 CUFFT

CUFFT, the NVIDIA CUDA Fast Fourier Transform (FFT) library, is used for computing discrete Fourier transforms of complex or real-valued data sets.
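To make the CUFFT usage described above concrete, here is a minimal sketch of a 1-D complex-to-complex transform. It is an illustration only, not taken from the manual; the array size is arbitrary, and it assumes the standard cufft.h and cuda_runtime.h headers shipped with the CUDA Toolkit and linking with -lcufft -lcudart.

/* cufft_demo.c - minimal 1-D FFT with CUFFT (illustrative sketch only) */
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cufft.h>

#define NX 256

int main(void)
{
    cufftComplex *host, *dev;
    cufftHandle plan;
    int i;

    host = (cufftComplex *) malloc(NX * sizeof(cufftComplex));
    for (i = 0; i < NX; i++) {          /* fill a simple test signal */
        host[i].x = (float) i;
        host[i].y = 0.0f;
    }

    cudaMalloc((void **) &dev, NX * sizeof(cufftComplex));
    cudaMemcpy(dev, host, NX * sizeof(cufftComplex), cudaMemcpyHostToDevice);

    cufftPlan1d(&plan, NX, CUFFT_C2C, 1);           /* create a 1-D complex-to-complex plan */
    cufftExecC2C(plan, dev, dev, CUFFT_FORWARD);    /* in-place forward transform */

    cudaMemcpy(host, dev, NX * sizeof(cufftComplex), cudaMemcpyDeviceToHost);
    printf("bin 0: %f %+fi\n", host[0].x, host[0].y);

    cufftDestroy(plan);
    cudaFree(dev);
    free(host);
    return 0;
}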
List all the default family names. Use the FNAME argument (this could be a list) to specify a precise family name, or just a part of a name. Use the -l option to list all parameters for the family specified. The -v and -i options are as described above.

Examples

This command will list all family names with the string 'band' in their names:

mpibull2-params -f band

For each family name with the string 'band' inside, this command will list all the parameters and current values:

mpibull2-params -fl band

-m PARAMETER VALUE   Modify an MPI PARAMETER with VALUE. The exact name of the parameter should be used to modify a parameter. The parameter is set in the environment, independently of the shell syntax (ksh/csh) being used. The keyword 'default' should be used to restore the parameter to its original value. If necessary, the parameter can then be unset in its environment. The -m operator lists all the modified MPI parameters by comparing all the MPI parameters with their default values. If none of the MPI parameters have been modified then nothing is displayed. The -m operator is like the -d option. Use the -v option for a verbose mode.

Examples

This command will set the ROMIO_LUSTRE parameter in the current environment:

mpibull2-params -m mpibull2_romio_lustre true

This command will unset the ROMIO_LUSTRE parameter in the environment in which it is running and return it to its default value:
Parallel MPIBull2 jobs: mpiexec, mpirun (MPD); serial jobs: srun; OpenMP jobs on clusters with one node: salloc, srun <number of CPUs>; SLURM parallel MPI jobs: srun <number of CPUs per MPI task>.

Table 7-1. Launching an application without a Batch Manager for different clusters

Chapter 8. Application Debugging Tools

8.1 Overview

There are two types of debuggers: symbolic ones and non-symbolic ones.
- A symbolic debugger gives access to a program's source code. This means that the lines of the source file can be accessed and program variables can be accessed by name.
- A non-symbolic debugger enables access to the lines of the machine code program only, and to the top physical addresses.

The following debugging tools are described:
- 8.2 GDB
- 8.3 IDB
- 8.4 TotalView
- 8.5 DDT
- 8.6 MALLOC_CHECK_ - Debugging Memory Problems in C programs
- 8.7 Electric Fence

8.2 GDB

GDB stands for Gnu DeBugger. It is a powerful Open Source debugger which can be used either through a command line interface, or a graphical interface such as XXGDB or DDD (Data Display Debugger). It is also possible to use an emacs/xemacs interface. GDB supports parallel applications and threads. GDB is published under the GNU license.

8.3 IDB

IDB is a debugger delivered with Intel compilers. It can be used with C/C++ and F90 programs.
When this is installed the documentation files will be located under:

/opt/scilibs/SCISTUDIO_SHELF/SciStudio_shelf-<version>/SCALAPACK/ScaLAPACK-<version>

SCALAPACK is used for complex computations (system resolution, eigenvalue computations, etc.).

[Figure 3-2. Interdependence of the different mathematical libraries (Scientific Studio and Intel): global routines such as ScaLAPACK build on their local, sequential equivalents and on message-passing primitives.]

3.2.3.1 Using SCALAPACK

Local component routines are called by a single process, with arguments residing in local memory. Global component routines are synchronous and parallel; they are called with arguments that are matrices or vectors distributed over all the processes. SCALAPACK uses MPIBull2. The default installation of this library is as follows:

/opt/scilibs/SCALAPACK/ScaLAPACK-<version>/mpibull2-<version>

The following library is provided: libscalapack.a

Several tests are provided in the following directory:

/opt/scilibs/SCALAPACK/ScaLAPACK-<version>/mpibull2-<version>/tests

3.2.4 Blocksolve95

BlockSolve95 is a scalable parallel software library primarily intended for the solution of sparse linear systems that arise from physical models, especially problems involving multiple degrees of freedom at each node. BlockSolve95 uses the MPIBull2 library. The default installation
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0  0x40097354 in mallopt () from /lib/libc.so.6
(gdb) bt
#0  0x40097354 in mallopt () from /lib/libc.so.6
#1  0x4009615f in free () from /lib/libc.so.6
#2  0x0804852f in main () at example.c:18
(gdb)

The bt command is used to display the current memory stack. In this example the last line indicates that the problem came from line 18 in the main function of the example.c file. Looking at the example.c program on the previous page, we can see that line 18 corresponds to the second call to the free function, which created the memory overflow.

8.7 Electric Fence

Electric Fence is an open source malloc debugger for Linux and Unix. It stops your program on the exact instruction that overruns or under-runs a malloc buffer. Electric Fence is installed on the Management Node only.

Electric Fence helps you detect two common programming bugs:
- Software that overruns the boundaries of a malloc memory allocation.
- Software that touches a memory allocation that has been released.

You can use the following example, replacing icc -version by the command line of your program:

[test@host]$ LD_PRELOAD=/usr/local/tools/ElectricFence-2.2.2/lib/libefence.so.0.0 icc -version
Electric Fence 2.2.0 Copyright (C) 1987-1999 Bruce Perens.
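The example.c listing referred to above is on a page missing from this extract. The sketch below is a reconstruction of the kind of double-free bug it describes (the second call to free on the same pointer); it is not the original listing, and its line numbering will not match exactly.

/* example.c - sketch of the double-free bug discussed above (reconstruction) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *buf = malloc(128);

    if (buf == NULL)
        return 1;
    strcpy(buf, "some data");
    printf("%s\n", buf);

    free(buf);      /* first, correct release */
    free(buf);      /* second call on the same pointer: heap corruption */
    return 0;
}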
Table 5-1. Examples of different module configurations ................... 5-3
Table 7-1. Launching an application without a Batch Manager for different clusters ................... 7-3

Chapter 1. Introduction to the extreme computing Environment

The term extreme computing describes the development and execution of large scientific applications and programs that require a powerful computation facility, which can process enormous amounts of data to give highly precise results.

The bullx cluster suite is a software suite that is used to operate and manage a Bull extreme computing cluster of Xeon-based nodes. These clusters are based on Bull platforms using InfiniBand stacks or Gigabit Ethernet networks. The bullx cluster suite includes both Bull proprietary and Open Source software, which provides the infrastructure for optimal interconnect performance.

The Bull extreme computing cluster includes an administrative network based on a 10/100 Mbit or a Gigabit Ethernet network, and a separate console management network.

The bullx cluster suite delivery also provides a full environment for development, including optimized scientific libraries and MPI libraries, as well as debugging and performance optimization tools.

This manual describes these software components, and explains how to work within the bullx cluster suite environment.

1.1 Software Configuration

1.1.1 Operating System
UDP - User Datagram Protocol
UID - User ID
ULP - Upper Layer Protocol
USB - Universal Serial Bus
UTC - Coordinated Universal Time

V

VCRC - Variant Cyclic Redundancy Check
VDM - Voltaire Device Manager
VFM - Voltaire Fabric Manager
VGA - Video Graphic Adapter
VL - Virtual Lane
VLAN - Virtual Local Area Network
VNC - Virtual Network Computing. Used to enable access to Windows systems and Windows applications from the Bull NovaScale cluster management system.

W

WWPN - World Wide Port Name

X

XFS - eXtended File System
XHPC - Xeon High Performance Computing
XIB - Xeon InfiniBand
XRC - Extended Reliable Connection. Included in Mellanox ConnectX HCAs for memory scalability.

Index

B
Batch Management, 7-1
BLACS, 3-3
BLAS, 3-13
BlockSolve95, 3-5
Bull Scientific Studio, 3-1
bullx cluster suite, definition, 1-1

C
Compiler
  C, 1-2
  Fortran, 1-2, 4-1
  GCC, 1-2, 4-4
  GNU compilers, 4-1
  Intel C/C++, 4-2
  NVIDIA nvcc, 4-4
  NVIDIA nvcc and MPI, 4-5
Compiler licenses, 4-3
  FlexLM, 4-3
CUDA makefile system, 4-5
CUDA Toolkit, 5-14

D
Debugger
  DDT, 8-3
  Electric Fence, 8-7
  GDB, 1-2, 8-1, 8-6
  Intel Debugger, 1-2, 8-1
  MALLOC_CHECK_, 8-5
  Non-symbolic debugger, 8-1
  Symbolic debugger, 8-1
  TotalView, 8-2

E
eval command, 5-2

F
FFTW, 3-6
File System
  NFS, 1-3, 5-2

G
ga (Global Array), 3-10
gmp_sci, 3-8
(&lsb_acct_fg, &slurm_acct_fg, &newAcct);

If an extra combine account variable is needed, the user can define the new variable and call init_cacct_rec() to initialize the record, and free_cacct_ptrs() to free the memory used in the new variable. For example, to define a variable for the new record:

struct CombineAcct otherAcct;

Before using the variable otherAcct, do:

init_cacct_rec(&otherAcct);

When done, do the following to free the memory used by the otherAcct variable:

free_cacct_ptrs(&otherAcct);

The new record contains the combined accounting information as follows:

/* combine LSF and SLURM acct log information */
struct CombineAcct {
    /* part one is the LSF information */
    char   evenType[50];
    char   versionNumber[50];
    time_t eventTime;
    int    jobId;
    int    userId;
    long   options;
    int    numProcessors;
    time_t submitTime;
    time_t beginTime;
    time_t termTime;
    time_t startTime;
    char   userName[MAX_LSB_NAME_LEN];
    char   queue[MAX_LSB_NAME_LEN];
    char   *resReq;
    char   *dependCond;
    char   *preExecCmd;                 /* the command string to be pre-executed */
    char   fromHost[MAXHOSTNAMELEN];
    char   cwd[MAXFILENAMELEN];
    char   inFile[MAXFILENAMELEN];
    char   outFile[MAXFILENAMELEN];
    char   errFile[MAXFILENAMELEN];
and profile your programs in detail. You can also use Valgrind to build new tools.

The Valgrind distribution currently includes five production-quality tools: a memory error detector, a thread error detector, a cache and branch-prediction profiler, a call-graph generating cache profiler, and a heap profiler. It also includes two experimental tools: a data race detector and an instant memory leak detector.

The libraries for Valgrind are located in the following directories:

/opt/opens/VALGRIND_OPENS/valgrind_OpenS-<version>/share/doc/valgrind
/opt/opens/VALGRIND_OPENS/valgrind_OpenS-<version>/bin
/opt/opens/VALGRIND_OPENS/valgrind_OpenS-<version>/valgrind/include
/opt/opens/VALGRIND_OPENS/valgrind_OpenS-<version>/valgrind/lib
/opt/opens/VALGRIND_OPENS/valgrind_OpenS-<version>/include/valgrind/vki
/opt/opens/VALGRIND_OPENS/valgrind_OpenS-<version>/man
/opt/opens/VALGRIND_OPENS/valgrind_OpenS-<version>/lib/valgrind/amd64-linux
/opt/opens/VALGRIND_OPENS/valgrind_OpenS-<version>/lib/valgrind/x86-linux

More information is available from the documentation included in the OpenS_shelf rpm. When this is installed the documentation files will be located under:

/opt/opens/OPENS_SHELF/OpenS_shelf-<version>/VALGRIND/valgrind-<version>

3.3 Intel Scientific Libraries

Note: The scientific libraries in this section are all Intel proprietary libraries and must
combine accounting record. The calling sequence is:

int get_combine_acct_info(FILE *lsb_acct_fg, FILE *slurm_acct_fg, int jobId, CombineAcct *newAcct);

where:
- lsb_acct_fg is the pointer to the LSF accounting log file,
- slurm_acct_fg is the pointer to the SLURM accounting log file,
- jobId is the job id from the LSF accounting log file,
- newAcct is the address of the variable to hold the new record information.

This routine will use the input LSF job ID to locate the LSF accounting information in the LSF log file, then get the SLURM_JOBID and locate the SLURM accounting information in the SLURM log file. This routine will return a zero to indicate that both records are found and processed successfully; otherwise one or both records are in error and the content of the newAcct variable is undefined.

For example, to get the combine accounting information for a specified jobId (2010):

jobId = 2010;
status = get_combine_acct_info(lsb_acct_fg, slurm_acct_fg, jobId, &newAcct);

To display the record, call the display_combine_acct_record routine:

display_combine_acct_record(&newAcct);

When finished accessing the record, the user must close the log files and free the memory used in the newAcct variable by calling the cacct_wrapup routine. For example:

if (lsb_acct_fg != NULL)    /* if opened successfully before */
    cacct_wrapup
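Pulling the calls described above together, a minimal driver might look like the sketch below. This is only an illustration: the header name combine_acct.h and the way the log files are opened are assumptions, and only the routines quoted in this section (init_cacct_rec, get_combine_acct_info, display_combine_acct_record, cacct_wrapup) are used.

/* combine_acct_demo.c - illustrative sketch of the global accounting API */
#include <stdio.h>
#include "combine_acct.h"        /* assumed header declaring CombineAcct and the routines */

int main(void)
{
    FILE *lsb_acct_fg   = fopen("lsb.acct", "r");      /* LSF accounting log (example path) */
    FILE *slurm_acct_fg = fopen("slurm.acct", "r");    /* SLURM accounting log (example path) */
    struct CombineAcct newAcct;
    int jobId = 2010;
    int status;

    if (lsb_acct_fg == NULL || slurm_acct_fg == NULL)
        return 1;

    init_cacct_rec(&newAcct);                          /* initialize the combined record */

    status = get_combine_acct_info(lsb_acct_fg, slurm_acct_fg, jobId, &newAcct);
    if (status == 0)
        display_combine_acct_record(&newAcct);         /* both records found and merged */
    else
        fprintf(stderr, "job %d: LSF and/or SLURM record not found\n", jobId);

    /* close the log files and free the memory held by newAcct */
    cacct_wrapup(&lsb_acct_fg, &slurm_acct_fg, &newAcct);
    return 0;
}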
dot files and directories that every new user should start with. The skeleton files are copied to the new user's HOME directory with the -m option added to the useradd command. A set of sample dot files is located in /etc/skel. Copy everything but the .in and CVS files and directories to the skeleton directory. Edit and tailor them for your system.

If you have a pre-existing set of skeleton files, then make sure the following minimum set exists: .cshrc, .login, .kshenv, .profile. These can be automatically updated with the command:

env HOME=/etc/skel /opt/modules/default/bin/add.modules

Inspect the new dot files and, if they are OK, then remove all the old original files.

An alternative way of setting up the users' dot files can be found in ext/. This model can be used with the --with-dot-ext configure option.

User Shell RC (Dot) Files

The final step for a functioning Modules environment is to modify the user dot files to source the right files. One way to do this is to put a message in /etc/motd telling each user to run the command:

/opt/modules/default/bin/add.modules

This is a script that parses their existing dot files, prepending the appropriate commands to initialize the Modules environment. The user can re-run this script; it will find and remember what modules they initially loaded, then strip out the previous module initialization and restore it with an upgraded one.
lib/superlu_x86_64.a

Tests are provided for each library under the following directory:

/opt/scilibs/SUPERLU_<type>/SuperLU_<type>-<version>/test

FFTW

FFTW stands for the Fastest Fourier Transform in the West. FFTW is a C subroutine library for computing a discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and using both real and complex data.

There are three versions of FFTW in this distribution. They are located in the following directories:

/opt/scilibs/FFTW/FFTW3-<version>/lib
/opt/scilibs/FFTW/fftw-2-<version>/mpibull2-<version>/lib

Tests are also available in the following directory:

/opt/scilibs/FFTW/fftw-<version>/test

More information is available from the documentation included in the SciStudio_shelf rpm. When this is installed the documentation files will be located under:

/opt/scilibs/SCISTUDIO_SHELF/SciStudio_shelf-<version>/FFTW/fftw-<version>

PETSc

PETSc stands for Portable, Extensible Toolkit for Scientific Computation. PETSc is a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations. It employs the MPI standard for all message-passing communications (see http://www.mcs.anl.gov/mpi for more details).

The PETSc library is available in the following directory:

/opt/scilibs/PETSC/PETSc-2.3.3-p0-mpib
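Returning to the FFTW3 version listed above, a minimal call sequence looks like the sketch below. This is a generic example, not taken from the manual; it assumes the standard fftw3.h interface and linking with -lfftw3.

/* fftw3_demo.c - minimal 1-D DFT with FFTW3 (illustrative sketch only) */
#include <stdio.h>
#include <fftw3.h>

#define N 16

int main(void)
{
    fftw_complex *in  = fftw_malloc(sizeof(fftw_complex) * N);
    fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * N);
    fftw_plan plan;
    int i;

    for (i = 0; i < N; i++) {        /* simple test signal */
        in[i][0] = (double) i;       /* real part */
        in[i][1] = 0.0;              /* imaginary part */
    }

    plan = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(plan);

    printf("bin 0: %g %+gi\n", out[0][0], out[0][1]);

    fftw_destroy_plan(plan);
    fftw_free(in);
    fftw_free(out);
    return 0;
}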
option, and then breakpoints should be added to the program to control its execution.

TotalView is an XWindows application. Context-sensitive help provides you with basic information. You may download TotalView into the /opt/totalview directory.

Before running TotalView, update your environment by using the following command:

source /opt/totalview/totalview-vars.sh

Then enter:

totalview &

See http://www.totalviewtech.com/productsTV.htm for additional information, and for copies of the documentation for TotalView.

8.5 DDT

DDT is a proprietary debugging tool from Allinea, and is not included with the bullx cluster suite delivery.

Its source code browser shows at a glance the state of the processes within a parallel job, and simplifies the task of debugging large numbers of simultaneous processes. DDT has a range of features designed to debug effectively, from deadlock and memory leak tools to data comparison and group-wise process control, and it interoperates with all known MPIBull2 implementations.

For multi-threaded or OpenMP development, DDT allows threads to be controlled individually and collectively, with advanced capabilities to examine data across threads. The Parallel Stack Viewer allows the program state of all processes and threads to be seen at a glance, making parallel programs easier to manage.
the executable program modulecmd, and a directory containing a collection of master modulefiles.

MODULEPATH
This is the path that the module command searches when looking for modulefiles. Typically it is set to the master modulefiles directory, ${MODULESHOME}/modulefiles, by the initialization script. MODULEPATH can be set using 'module use', or by the module initialization script, to search group or personal modulefile directories before or after the master modulefile directory.

LOADEDMODULES
A colon-separated list of all loaded modulefiles.

_LOADED_MODULEFILES_
A colon-separated list of the full pathname for all loaded modulefiles.

_MODULESBEGINENV_
The filename of the file containing the initialization environment snapshot.

Files

/opt
The MODULESHOME directory.

${MODULESHOME}/etc/rc
The system-wide modules rc file. The location of this file can be changed using the MODULERCFILE environment variable, as described above.

${HOME}/.modulerc
The user-specific modules rc file.

${MODULESHOME}/modulefiles
The directory for system-wide modulefiles. The location of the directory can be changed using the MODULEPATH environment variable, as described above.

${MODULESHOME}/bin/modulecmd
The modulefile interpreter that gets executed upon each invocation of a module command.

${MODULESHOME}/init/<shellname>
The Modules package initialization file sourced into the user's environment.

${MODULESHOME}/i
       Compiling with nvcc and MPI ........................................ 4-5

Chapter 5  The User's Environment ......................................... 5-1
5.1    Cluster Access and Security ........................................ 5-1
5.1.1  ssh (Secure Shell) ................................................. 5-1
5.2    Global File Systems ................................................ 5-2
5.3    Environment Modules ................................................ 5-2
5.3.1  Using Modules ...................................................... 5-2
5.3.2  Setting Up the Shell RC Files ...................................... 5-4
5.4    Module Files ....................................................... 5-5
5.4.1  Upgrading via the Modules Command .................................. 5-6
5.5    The Module Command ................................................. 5-7
5.5.1  modulefiles ........................................................ 5-7
5.5.2  Modules Package Initialization ..................................... 5-8
5.5.3  Examples of Initialization ......................................... 5-9
5.5.4  Modulecmd Startup .................................................. 5-9
5.5.5  Module Command Line Switches ....................................... 5-9
5.5.6  Module Sub-Commands ................................................ 5-10
5.5.7  Modules Environment Variables ...................................... 5-12
5.6    The NVIDIA CUDA Development Environment ............................ 5-14
5.6.1  bullx cluster suite and CUDA ....................................... 5-14
the Bourne and Korn shells; .bashrc, .bash_env and .bash_profile for the GNU Bourne-Again Shell; .zshrc, .zshenv and .zlogin for zsh. The .modules file is checked for all shells. If a 'module load' line is found in any of these files, the modulefile(s) is (are) appended to any existing list of modulefiles. The 'module load' line must be located in at least one of the files listed above for any of the init sub-commands to work properly. If the 'module load' line is found in multiple shell initialization files, all of the lines are changed. This is the behaviour of:

initadd modulefile [modulefile ...]

initprepend modulefile [modulefile ...]
    Does the same as initadd, but prepends the given modules to the beginning of the list.

initrm modulefile [modulefile ...]
    Remove modulefile from the shell's initialization files.

initswitch modulefile1 modulefile2
    Switch modulefile1 with modulefile2 in the shell's initialization files.

initlist
    List all of the modulefiles loaded from the shell's initialization file.

initclear
    Clear all of the modulefiles from the shell's initialization files.

Modules Environment Variables

Environment variables are unset when unloading a modulefile. Thus it is possible to load a modulefile and then unload it without having the environment variables return to their prior state.

MODULESHOME
This is the location of the master Modules package file directory, containing the module command initialization scripts,
Device Architecture

CUFFT - CUDA Fast Fourier Transform

CVS - Concurrent Versions System

Cygwin - A Linux-like environment for Windows. Bull cluster management tools use Cygwin to provide SSH support on a Windows system, enabling command mode access.

D

DDN - Data Direct Networks
DDR - Double Data Rate
DHCP - Dynamic Host Configuration Protocol
DLID - Destination Local Identifier
DNS - Domain Name Server. A server that retains the addresses and routing information for TCP/IP LAN users.
DSO - Dynamic Shared Object

E

EBP - End Bad Packet Delimiter
ECT - Embedded Configuration Tool
EIP - Encapsulated IP
EPM - Errors per Million
EULA - End User License Agreement (Microsoft)

F

FDA - Fibre Disk Array
FFT - Fast Fourier Transform
FFTW - Fastest Fourier Transform in the West
FRU - Field Replaceable Unit
FTP - File Transfer Protocol

G

Ganglia - A distributed monitoring tool used to view information associated with a node, such as CPU load, memory consumption and network load.
GCC - GNU C Compiler
GDB - Gnu Debugger
GFS - Global File System
GMP - GNU Multiprecision Library
GID - Group ID
GNU - GNU's Not Unix
GPL - General Public License
GPT - GUID Partition Table
Gratuitous ARP - A gratuitous ARP request is an Address Resolution Protocol request packet where the source and destination IP are both set to the IP
Fortran 90 languages. These allow the user to concentrate on developing the application, without having to think about the internal mechanics of MPI. The man page files provide more details about the wrappers.

When using the compiling tools, the wrappers need to know which communication device and which linking strategy they should use. The compiling tools check whether one of the following conditions has been met:

- A device and linking strategy has been specified on the command line, using the -sd options.
- The environment variables DEF_MPIDEV and DEF_MPIDEV_LINK (required to ensure compatibility), or MPIBULL2_COMM_DRIVER and MPIBULL2_LINK_STRATEGY, have been set.
- Preferences have already been set up; the tools will use the device they find in the environment, set using the mpibull2-devices tool.
- Otherwise, the tools take the system default, using the dynamic socket device.

Note: You can obtain better performance by using the fast/static options to link statically with one of the dependent libraries, as shown in the commands below (see also the minimal MPI source sketch after this list):

mpicc -static prog.c
mpicc -fast prog.c

2.2.8 MPIBull2 Example of Use

2.2.8.1 Setting up the devices

When compiling an application the user may wish to keep the makefiles and build files which have already been generated. Bull has taken this into account: the code and build files can be kept as they are. All the user needs to do is to set up a few variables or
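For reference, the kind of source file the wrapper commands above compile is an ordinary MPI program. The minimal sketch below is only an illustration (the file name prog.c matches the name used in the manual's command examples); it can be built with mpicc prog.c -o prog and launched with srun or mpirun as described elsewhere in this guide.

/* prog.c - minimal MPI program (illustrative sketch only) */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}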
- Parallel prefix sum (scan) of large arrays
- Parallel Mersenne Twister random number generation

See The CUDA Zone at www.nvidia.com for more examples of applications developed within the CUDA environment, and for additional development tools and help.

Chapter 6. Resource Management using SLURM

6.1 SLURM Resource Management Utilities

As a cluster resource manager, SLURM has three key functions. Firstly, it allocates exclusive and/or non-exclusive access to resources (Compute Nodes) to users for a time period so that they can perform work. Secondly, it provides a framework for starting, executing and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates conflicting requests for resources by managing a queue of pending work.

Users interact with SLURM through various command line utilities:
- SRUN to submit a job for execution
- SBATCH for submitting a batch script to SLURM
- SALLOC for allocating resources for a SLURM job
- SATTACH to attach to a running SLURM job step
- STRIGGER used to set, get or clear SLURM event triggers
- SBCAST to transmit a file to all nodes running a job
- SCANCEL to terminate a pending or running job
- SQUEUE to monitor job queues
- SINFO to monitor partition and the overall system state
- SACCTMGR to view and modify SLURM account information. Used with the
Process at work, merge comm
Process 1 with 1 Threads runs at work
1: Got task from 900001 to 1000000
Merged and disconnected (MPI_Comm_disconnect)
Assigned tasks: 0 0 1 x10 compute, give up
3: Wallclock Time 45.2732
1: Wallclock Time 45.2732
Unpublishing my service toyMaster
2: Wallclock Time 45.2732
Closing my port of connection
master: master disconnected from 1
master disconnected from 2
master disconnected from 3
Master with 1 Threads joins computation
univ 1 disconnected from server
0: Wallclock Time 45.2757

2.2.7 MPIBull2 Tools

2.2.7.1 MPIBull2-devices

This tool may be used to change the user's preferences. It can also be used to disable a library. For example, if the program has already been compiled and the intention is to use dynamic MPI devices which have already been linked with the MPI Core, then it is now possible to specify a particular runtime device with this tool.

The following options are available with MPIBULL2-devices:

-dl    Provides the list of drivers. This is also supported by MPI wrappers.
-dlv   Provides the list of drivers, with the versions of the drivers.

mpi_user >>> mpibull2-devices -dl
MPIBULL2 Communication Devices :
  Original Devices :
  oshm  : Shared Memory device, to be used on a single machine            [static][dynamic]
  osock : Socket protocol (can be used over IPoIB, SDP, SCI)              [static][dynamic]
[Figure 8-2. The Graphical User Interface for DDT, showing the Current Memory Usage graphical and table views: total and current usage across processes, in bytes, with the largest allocations per process.]

DDT can find memory leaks and detect common memory usage errors before your program crashes. A programmable STL Wizard enables C++ Standard Template Library variables, and the abstract data they represent, including lists, maps, sets, multimaps and strings, to be viewed easily.

Developers of scientific code have full access to modules, allocated data, strings and derived types for Fortran 77, 90 and 95. MPI message queues can be examined in order to identify deadlocks in parallel code, and data may be viewed in 3D with the multi-dimensional array viewer. It is possible to run DDT with the PBS Professional Batch Manager.

See http://allinea.com for more information.

8.6 MALLOC_CHECK_ - Debugging Memory Problems in C programs

When developing an application, the developer should ensure that all
a library for multiple precision floating-point computation which is both efficient and has a well-defined semantics. The libraries for MPFR can be found in the following directories:

/opt/scilibs/MPFR/MPFR-<version>/lib
/opt/scilibs/MPFR/MPFR-<version>/include
/opt/scilibs/MPFR/MPFR-<version>/info

More information is available from the documentation included in the SciStudio_shelf rpm. When this is installed the documentation files will be located under:

/opt/scilibs/SCISTUDIO_SHELF/SciStudio_shelf-<version>/MPFR/MPFR-<version>

3.2.15 sHDF5/pHDF5

The HDF5 technology suite includes:
- A versatile data model that can represent very complex data objects and a wide variety of metadata.
- A completely portable file format with no limit on the number or size of data objects in the collection.
- A software library that runs on a range of computational platforms, from laptops to massively parallel systems, and implements a high-level API with C, C++, Fortran 90 and Java interfaces.
- A rich set of integrated performance features that allow for access-time and storage-space optimizations.
- Tools and applications for managing, manipulating, viewing and analyzing the data in the collection.

The libraries for sHDF5/pHDF5 can be found in the following directories:

/opt/scilibs/PHDF5/pHDF5-<version>/mpibull2-<version>/lib
/opt/scilibs/
be bought separately.

3.3.1 Intel Math Kernel Library

This library, which has been optimized by Intel for its processors, contains, among other things, the following libraries: BLAS, LAPACK and FFT. The Intel MKL is a fully thread-safe library.

The library is located in the /opt/intel/mkl<release_nb> directory. To use it, the environment has to be set by updating the LD_LIBRARY_PATH variable:

export LD_LIBRARY_PATH=/opt/intel/mkl<release_nb>/lib/64:$LD_LIBRARY_PATH

Example for MKL 7.2:

export LD_LIBRARY_PATH=/opt/intel/mkl72/lib/64:$LD_LIBRARY_PATH

3.3.2 Intel Cluster Math Kernel Library

The Intel Cluster Math Kernel Library contains all the highly optimized math functions of the Math Kernel Library, plus ScaLAPACK for Linux Clusters. The Intel Cluster MKL is a fully thread-safe library, and provides C and Fortran interfaces.

The Cluster MKL library is located in the /opt/intel/mkl<release_nb>/cluster directory.

3.3.3 BLAS

BLAS stands for Basic Linear Algebra Subprograms. This library contains linear algebraic operations that include matrixes and vectors. Its functions are separated into three parts:
- Level 1 routines to represent vectors and vector/vector operations.
- Level 2 routines to represent matrixes and matrix/vector operations.
- Level 3 routines, mainly for matrix/matrix operations.

This library is included in the Intel MKL package. For more information see www.netlib.org/blas.
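To give a concrete picture of calling a BLAS routine from the MKL package described above, here is a minimal sketch using the standard CBLAS interface. It is a generic example, not from the manual; it assumes the usual cblas.h header and whatever MKL link line applies to your installation.

/* dgemm_demo.c - 2x2 matrix multiply through the CBLAS interface (illustrative sketch) */
#include <stdio.h>
#include <cblas.h>

int main(void)
{
    /* C = alpha * A * B + beta * C, row-major storage */
    double a[4] = { 1.0, 2.0,
                    3.0, 4.0 };
    double b[4] = { 5.0, 6.0,
                    7.0, 8.0 };
    double c[4] = { 0.0, 0.0,
                    0.0, 0.0 };

    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,          /* M, N, K       */
                1.0, a, 2,        /* alpha, A, lda */
                b, 2,             /* B, ldb        */
                0.0, c, 2);       /* beta, C, ldc  */

    printf("%g %g\n%g %g\n", c[0], c[1], c[2], c[3]);
    return 0;
}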
commands, and srun.

OPTIONS
Please refer to the man page for more details on the options, including examples of use.

Example:  man salloc

6.6 SATTACH

NAME
sattach - Attach to a SLURM job step.

SYNOPSIS
sattach [OPTIONS] <jobid.stepid>

DESCRIPTION
sattach attaches to a running SLURM job step. By attaching, it makes available the I/O streams for all the tasks of a running SLURM job step. It is also suitable for use with a parallel debugger like TotalView.

OPTIONS
Please refer to the man page for more details on the options, including examples of use.

Example:  man sattach

6.7 SACCTMGR

NAME
sacctmgr - Used to view and modify SLURM account information.

SYNOPSIS
sacctmgr [OPTIONS] [COMMAND]

DESCRIPTION
sacctmgr is used to view or modify SLURM account information. The account information is maintained within a database, with the interface being provided by slurmdbd (Slurm Database daemon). This database serves as a central storehouse of user and computer information for multiple computers at a single site. SLURM account information is recorded based upon four parameters that form what is referred to as an association. These parameters are user, cluster, partition and account:
- user is the login name,
- cluster is the name of a Slurm-managed cluster as specified by the ClusterName parameter
debugger and mathematical libraries. In this way you can easily reproduce trial conditions, or use only proven environments.

The Modules environment is a program that can read and list module files, returning commands suitable for the shell to interpret, and most importantly for the eval command. Modulefiles are a kind of flat database which uses files.

In UNIX, a child process can not modify its parent environment. So how does Modules do this? Modules parses the given modules file and produces the appropriate shell commands to set/unset/append/un-append onto an environment variable. These commands are eval'd by the shell. Each shell provides some mechanism where commands can be executed and the resulting output can, in turn, be executed as shell commands. In the C shell, the Bourne shell and derivatives, this is the eval command.

This is the only way that a child process can modify the parent's (login shell) environment. Hence the module command itself is a shell alias or function that performs these operations. To the user, it looks just like any other command.

The module command is only used in the development environment, and not in other environments, such as that for the administration node.

See http://modules.sourceforge.net for more details.

Using Modules

The following command gives the list of available modules on a cluster:

module avail

------------------- /opt/modules/3.1.6/modulefiles -------------------
dot  module-in
either in short or long notation. The following switches are accepted:

--force, -f
Force active dependency resolution. This will result in modules found using a prereq command inside a module file being loaded automatically. Unloading module files using this switch will result in all required modules which have been loaded automatically using the -f switch being unloaded. This switch is experimental at the moment.

--terse, -t
Display avail and list output in short format.

--long, -l
Display avail and list output in long format.

--human, -h
Display short output of the avail and list commands in human-readable format.

--verbose, -v
Enable verbose messages during module command execution.

--silent, -s
Disable verbose messages. Redirect stderr to /dev/null if stderr is found not to be a tty. This is a useful option for module commands being written into .cshrc, .login or .profile files, because some remote shells (e.g. rsh(1)) and remote execution commands (e.g. rdist) get confused if there is output on stderr.

--create, -c
Create caches for module avail and module apropos. You must be granted write access to the ${MODULEHOME}/modulefiles directory if you try to invoke module with the -c option.

--icase, -i
This is a case-insensitive module parameter evaluation. Currently only implemented for the module apropos command.

--userlvl <lvl>, -u <lvl>
Set the user level to the specified
etc.

GMP is carefully designed to be as fast as possible, both for small operands and for huge operands. The speed is achieved by using full words as the basic arithmetic type, by using fast algorithms, with highly optimized assembly code for the most common inner loops for a lot of CPUs, and by a general emphasis on speed.

GMP is faster than any other bignum library. The advantage for GMP increases with the operand sizes for many operations, since GMP uses asymptotically faster algorithms.

The libraries for GMP_SCI can be found in the following directories:

/opt/scilibs/GMP_SCI/gmp_sci-<version>/lib
/opt/scilibs/GMP_SCI/gmp_sci-<version>/include
/opt/scilibs/GMP_SCI/gmp_sci-<version>/info

More information is available from the documentation included in the SciStudio_shelf rpm. When this is installed the documentation files will be located under:

/opt/scilibs/SCISTUDIO_SHELF/SciStudio_shelf-<version>/GMP/gmp-<version>

3.2.14 MPFR

The MPFR library is a C library for multiple-precision floating-point computations with correct rounding. MPFR has continuously been supported by INRIA (Institut National de Recherche en Informatique et en Automatique), and the current main authors come from the CACAO and Arénaire project teams at Loria (Nancy, France) and LIP (Lyon, France) respectively. MPFR is based on the GMP multiple-precision library. The main goal of MPFR is to provide
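As a quick illustration of how the MPFR library described above is called, here is a minimal sketch. It is a generic example, not taken from the manual; it assumes the standard mpfr.h interface and linking with -lmpfr -lgmp.

/* mpfr_demo.c - compute sqrt(2) to 200 bits with MPFR (illustrative sketch only) */
#include <stdio.h>
#include <gmp.h>
#include <mpfr.h>

int main(void)
{
    mpfr_t x, r;

    mpfr_init2(x, 200);               /* 200 bits of precision */
    mpfr_init2(r, 200);

    mpfr_set_ui(x, 2, GMP_RNDN);      /* x = 2, rounded to nearest */
    mpfr_sqrt(r, x, GMP_RNDN);        /* r = sqrt(x) with correct rounding */

    printf("sqrt(2) = ");
    mpfr_out_str(stdout, 10, 50, r, GMP_RNDN);   /* print 50 decimal digits */
    putchar('\n');

    mpfr_clear(x);
    mpfr_clear(r);
    return 0;
}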
from standard input. The batch script may contain options preceded with #SBATCH before any executable commands in the script.

sbatch exits immediately after the script has been successfully transferred to the SLURM controller and assigned a SLURM job ID. The batch script may not be granted resources immediately; it may sit in the queue of pending jobs for some time before the required resources become available. When the batch script is granted the resources for its job allocation, SLURM will run a single copy of the batch script on the first node in the set of allocated nodes.

OPTIONS
Please refer to the man page for more details on the options, including examples of use.

Example:  man sbatch

6.5 SALLOC (allocation)

NAME
salloc - Obtain a SLURM job allocation (a set of nodes), execute a command, and then release the allocation when the command is finished.

SYNOPSIS
salloc [OPTIONS] <command> [command_args]

DESCRIPTION
salloc is used to define a SLURM job allocation, which is a set of resources (nodes), possibly with some constraints (e.g. number of processors per node). When salloc obtains the requested allocation, it will then run the command specified by the user. Finally, when the user-specified command is complete, salloc relinquishes the job allocation.

The command may be any program the user wishes. Some typical commands are xterm, a shell script containing srun
in the following directories:

/opt/scilibs/GSL/GSL-<version>/lib
/opt/scilibs/GSL/GSL-<version>/bin
/opt/scilibs/GSL/GSL-<version>/include
/opt/scilibs/GSL/GSL-<version>/doc

More information is available from the documentation included in the SciStudio_shelf rpm. When this is installed the documentation files will be located under:

/opt/scilibs/SCISTUDIO_SHELF/SciStudio_shelf-<version>/GSL/gsl-<version>

3.2.18 pgapack

PGAPack is a general-purpose, data-structure-neutral parallel genetic algorithm package developed by Argonne National Laboratory. The libraries for pgapack can be found in the following directories:

/opt/scilibs/PGAPACK/pgapack-<version>/mpibull2-<version>/lib
/opt/scilibs/PGAPACK/pgapack-<version>/mpibull2-<version>/doc
/opt/scilibs/PGAPACK/pgapack-<version>/mpibull2-<version>/include
/opt/scilibs/PGAPACK/pgapack-<version>/mpibull2-<version>/man

More information is available from the documentation included in the SciStudio_shelf rpm. When this is installed the documentation files will be located under:

/opt/scilibs/SCISTUDIO_SHELF/SciStudio_shelf-<version>/PGAPACK/pgapack-<version>

3.2.19 valgrind

Valgrind is an award-winning instrumentation framework for building dynamic analysis tools. There are Valgrind tools that can automatically detect many memory management and threading bugs,
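As an illustration of using the GSL installation whose directories are listed above, the sketch below evaluates a special function. It is a generic example, not from the manual; it assumes the standard gsl/gsl_sf_bessel.h header and linking with -lgsl -lgslcblas -lm.

/* gsl_demo.c - evaluate a Bessel function with GSL (illustrative sketch only) */
#include <stdio.h>
#include <gsl/gsl_sf_bessel.h>

int main(void)
{
    double x = 5.0;
    double y = gsl_sf_bessel_J0(x);   /* regular cylindrical Bessel function J0(x) */

    printf("J0(%g) = %.18e\n", x, y);
    return 0;
}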
is to use a message-passing library, where a process uses library calls to exchange messages (information) with another process. This message passing allows processes running on multiple processors to cooperate.

Simply stated, MPI (Message Passing Interface) provides a standard for writing message-passing programs. An MPI application is a set of autonomous processes, each one running its own code, and communicating with each other through calls to subroutines of the MPI library.

Bull MPI2

Bull MPI2, Bull's second generation MPI library, is included in the bullx cluster suite delivery. This library enables dynamic communication with different device libraries, including InfiniBand (IB) interconnects, socket Ethernet/IB/EIB devices or single machine devices. Bull MPI2 is fully integrated with the SLURM resource manager.

See Chapter 2 for more information on MPI Libraries.

1.2.4 Data and Files

Application file I/O operations may be performed using locally mounted storage devices, or alternatively on remote storage devices using either Lustre or the NFS file systems. By using separate interconnects for administration and I/O operations, the Bull cluster system administrator is able to isolate user application traffic from administrative operations and monitoring. With this separation, application I/O performance and process communication can be made more predictable, while still enabling administrative operations to proceed.
<bruce@perens.com>

See http://perens.com/FreeSoftware for more information about Electric Fence.

Glossary and Acronyms

A

ABI - Application Binary Interface
ACL - Access Control List
ACT - Administration Configuration Tool
ANL - Argonne National Laboratory (MPICH2)
API - Application Programmer Interface
ARP - Address Resolution Protocol
ASIC - Application Specific Integrated Circuit

B

BAS - Bull Advanced Server
BIOS - Basic Input Output System
Blade - Thin server that is inserted in a blade chassis
BLACS - Basic Linear Algebra Communication Subprograms
BLAS - Basic Linear Algebra Subprograms
BMC - Baseboard Management Controller
BSBR - Bull System Backup Restore
BSM - Bull System Manager

C

CGI - Common Gateway Interface
CLI - Command Line Interface
ClusterDB - Cluster Database
CLM - Cluster Management
CMC - Chassis Management Controller
ConMan - A management tool, based on telnet, enabling access to all the consoles of the cluster.
Cron - A UNIX command for scheduling jobs to be executed sometime in the future. A cron is normally used to schedule a job that is executed periodically, for example to send out a notice every morning. It is also a daemon process, meaning that it runs continuously, waiting for specific events to occur.
CUBLAS - CUDA BLAS
CUDA - Compute Unified
no direct interaction with the CUDA driver is necessary. The basic model by which applications use the CUBLAS library is to create matrix and vector objects in the memory space of the Tesla graphics accelerator, fill them with data, call a sequence of CUBLAS functions, and finally load the results back to the host. To accomplish this, CUBLAS provides helper functions for creating and destroying objects in the graphics accelerator memory space, and for writing data to and retrieving data from these objects.

Because the CUBLAS core functions (as opposed to the helper functions) do not return an error status directly, for reasons of compatibility with existing BLAS libraries, CUBLAS provides a separate function that retrieves the last recorded error to help debugging.

The interface to the CUBLAS library is the header file cublas.h. Applications using CUBLAS need to link against the cublas.so Linux DSO when building for the device, and against the cublasemu.so Linux DSO when building for device emulation.

See The CUDA CUBLAS Library document, available from www.nvidia.com, for more information regarding the functions for this library.

Chapter 4. Compilers

This chapter describes the following topics:
- 4.1 Overview
- 4.2 Intel Fortran Compiler Professional Edition for Linux
- 4.3 Intel C++ Compiler Professional Edition for Linux
- 4.4 Intel Compiler Licenses
- 4.5 Intel Math Kernel Library Licenses
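To make the create/fill/call/read-back model described above concrete, here is a minimal sketch using the CUBLAS helper functions. It is a generic illustration, not taken from the manual; it assumes the legacy cublas.h interface shipped with the CUDA Toolkit and linking with -lcublas.

/* cublas_demo.c - SAXPY through the legacy CUBLAS interface (illustrative sketch only) */
#include <stdio.h>
#include <cublas.h>

#define N 8

int main(void)
{
    float x[N], y[N];
    float *dx, *dy;
    int i;

    for (i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    cublasInit();                                   /* attach to the accelerator */

    cublasAlloc(N, sizeof(float), (void **) &dx);   /* objects in device memory  */
    cublasAlloc(N, sizeof(float), (void **) &dy);
    cublasSetVector(N, sizeof(float), x, 1, dx, 1); /* host -> device            */
    cublasSetVector(N, sizeof(float), y, 1, dy, 1);

    cublasSaxpy(N, 3.0f, dx, 1, dy, 1);             /* y = 3*x + y on the device */
    if (cublasGetError() != CUBLAS_STATUS_SUCCESS)  /* core calls return no      */
        fprintf(stderr, "CUBLAS error\n");          /* status; query it here     */

    cublasGetVector(N, sizeof(float), dy, 1, y, 1); /* device -> host            */
    printf("y[0] = %g\n", y[0]);

    cublasFree(dx);
    cublasFree(dy);
    cublasShutdown();
    return 0;
}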
use the MPIBULL2-devices tool.

During the installation process, the /etc/profile.d/mpibull2.sh file will have been modified by the System Administrator according to the user's needs. This file determines the default settings (by default the rpm sets the osock socket TCP/IP driver). It is possible to override these settings by using environment variables (this is practical as it avoids modifying makefiles), or by using the tools options.

For example, the user can statically link their application against a static driver, as shown below. The default linking is dynamic, and this enables driver modification during runtime. Linking statically, as shown below, overrides the user's preferences but does not change them.

mpi_user >>> mpicc -sd ibmr_gen2 prog.c -o prog
mpicc : Linking statically MPI library with device (ibmr_gen2)

The following environment variables may also be used:

MPIBULL2_COMM_DRIVER       Specifies the default device to be linked against
MPIBULL2_LINK_STRATEGY     Specifies the link strategy (the default is dynamic); this is required to ensure compatibility
MPIBULL2_MPITOOLS_VERBOSE  Provides information when building (the default is verbose off)

mpi_user >>> export DEF_MPIDEV=ibmr_gen2
mpi_user >>> export MPIBULL2_MPITOOLS_VERBOSE=1
mpi_user >>> mpicc prog.c -o prog
mpicc : Using environment MPI variable specifications
mpicc : Linking dynamically MPI library with device
intel_fc version 8.0.019 / intel_cc version 8.0.022 / intel_db version 8.1.3 / intel_mkl version 7.0.017
Configuration 1 / Configuration 2 / Configuration 3 / Configuration 4

Table 5-1. Examples of different module configurations

5.3.2 Setting Up the Shell RC Files

A quick tutorial on shell rc (run command) files follows. When a user logs in, and if they have /bin/csh (/bin/sh) as their shell, the first rc file to be parsed by the shell is /etc/csh.login & /etc/csh.cshrc (/etc/profile); the order is implementation dependent. Then come the user's $HOME/.cshrc ($HOME/.kshenv) and finally $HOME/.login ($HOME/.profile).

All the other login shells are based on /bin/csh and /bin/sh, with additional features and rc files. Certain environment variables and aliases (functions) need to be set for Modules to work correctly. This is handled by the Module init files in /opt/modules/default/init, which contains separate init files for each of the various supported shells, where the default is a symbolic link to a module command version.

Skeleton Shell RC (Dot) Files

The skeleton files provide a default environment for new users when they are added to your system; this can be used if you do not have the time to set them up individually. The files are usually placed in /etc/skel (or wherever you specified with the --with-skel-path=<path> option to the configuration script), and contain a minimal set of
Some tool commands and device functionalities rely on the implementation of the MPI components. This simple tool maps keybindings to the underlying CPM. Therefore, a unique command can be used to launch a job on a different CPM, using the same syntax. The mpibull2-launch system takes into account the fact that a user might want to choose their own keybindings. A template file, named keylayout.tmpl, may be found in the tools rpm, which may be used to construct individual keybinding preferences.

Launching a job on a cluster using mpibull2-launch

For a SLURM CPM, use a command similar to the one below, and set MPIBULL2_LAUNCHER=srun to make this command compatible with the SLURM CPM:

mpibull2-launch -n 16 -N 2 -ptest job

Example for a user who wants to use the Y key for the partition:

PM Partition to use+Y:+partition

The user should edit a file using the format found in the example template, and then add custom bindings using the custom keybindings option. The + sign is used to separate the fields. The first field is the name of the command, the second the short option (with a colon if an argument is needed), and the third field is the long option.

mpiexec
This is a launcher which connects to the MPD ring.

mpirun
This is a launcher which connects to the MPD ring.

mpicc, mpiCC, mpicxx, mpif77 and mpif90
These are all compiler wrappers, and are available for C, C++, Fortran 77 and
2_user. Using the mpibull2-params tool, this file can then be used to restore the set of parameters, combined in exactly the same way, at a later date.

Notes:
- The effectiveness of a set of parameters will vary according to the application. For instance, a particular set of parameters may ensure low latency for an application, but reduce the bandwidth. By carefully defining the parameters for an application, the optimum, in terms of both latency and bandwidth, may be obtained.
- Some parameters are located in the /proc file system, and only super users can modify them.

The entry point of the mpibull2-params tool is an internal function of the environment. This function calls an executable to manage the MPI parameter settings and to create two temporary files. According to which shell is being used, one of these two files will be used to set the environment, and the two temporary files will then be removed. To update your environment automatically with this function, please source either the $MPI_HOME/bin/setenv_mpibull2.sh file or the $MPI_HOME/bin/setenv_mpibull2.csh file, according to which shell is used.

The mpibull2-params command

SYNOPSIS
mpibull2-params <operation_type> [options]

Actions
The following actions are possible for the mpibull2-params command:

List the MPI parameters and their values
List families of parameters
-m   Modify an MPI parameter
-d   Display all modified parameters
Save the current configuration
GSL, 3-10

I
IDB, 8-1
Intel C++ compiler, 4-2
Intel compiler licenses, 4-3
Intel Fortran compiler, 4-1

K
KSIS, 1-1

L
lapack_sci, 3-5
LSF, 6-16

M
METIS, 3-8
Modules, 1-2, 5-2
  command line switches, 5-9
  Commands, 5-2, 5-7
  Environment variables, 5-12
  modulecmd, 5-9
  Modulefiles, 5-7
  modulefiles directories, 5-5
  Shell RC files, 5-4
  Sub-Commands, 5-10
  TCL, 5-7
MPFR, 3-9
MPI libraries
  Bull MPI2, 1-2, 1-3
  MPI-2 standard, 2-2
  MPIBull2, 2-2
    Features, 2-3
    MPI_COMM_SPAWN, 2-7
    MPI_PUBLISH_NAME, 2-7
    Thread-safety, 2-5
  MPIBull2-devices, 2-9
  MPIBull2-launch, 2-9

N
NETCDF, 3-7
Nodes
  Compilation nodes, 5-1
  Login node, 5-1
  Service node, 5-1
NVIDIA CUDA
  cubin object, 4-4
  CUDA Toolkit, 4-4, 5-14, 5-15
  Software Developer Kit, 5-14, 5-15
NVIDIA Scientific Libraries, 3-14
  CUBLAS, 3-15
  CUFFT, 3-14

O
OpenS_shelf rpm, 3-2

P
Parallel Libraries, 2-1
PARAMETIS, 3-8
PBLAS, 3-14
PBS Professional
  Job script, 7-1
  Launching a job, 7-2
  Tracing a job, 7-2
  Using, 7-1
Performance and Profiling Tools
  Profilecomm, 2-19
PETSc, 3-7
pgapack, 3-11
pNETCDF, 3-7
profilecomm, 2-19

R
rlogin, 5-1
rsh, 5-1

S
SCALAPACK, 3-4
Scientific Libraries, 3-1
  BLACS, 3-3
  BLAS, 3-13
  BlockSolve95, 3-5
  Cluster MKL (Intel Cluster Math Kernel Library), 3-13
  FFTW, 3-6
  ga (Global Array), 3-10
  gmp_sci, 3-8
  GSL, 3-10
CUDA Toolkit and Software Developer Kit ........................................ 5-15

Chapter 6 Resource Management using SLURM ...................................... 6-1
6.1   SLURM Resource Management Utilities ...................................... 6-1
6.2   MPI Support .............................................................. 6-2
6.3   SRUN ..................................................................... 6-4
6.4   SBATCH (batch) ........................................................... 6-5
6.5   SALLOC (allocation) ...................................................... 6-6
6.6   SATTACH .................................................................. 6-7
6.7   SACCTMGR ................................................................. 6-8
6.8   SBCAST ................................................................... 6-9
6.9   SQUEUE ................................................................... 6-10
6.10  SINFO (Report Partition and Node Information) ............................ 6-11
6.11  SCANCEL (Signal/Cancel Jobs) ............................................. 6-12
6.12  SACCT (Accounting Data) .................................................. 6-13
6.13  STRIGGER ................................................................. 6-14
6.14  SVIEW .................................................................... 6-15
6.15  Global Accounting API .................................................... 6-16

Chapter 7 Launching an Application ............................................. 7-1
7.1   Using PBS Professional Batch Manager ..................................... 7-1
7.1.1 Pre-requisites ........................................................... 7-1
7.1.2 Submitting a script ...................................................... 7-1
7.1.3 Launching a job ..........................................................
LAPACK, 3-14; lapack_sci, 3-5; METIS, 3-8; MKL (Intel Math Kernel Library), 3-13; MPFR, 3-9; NetCDF, 3-7; PARMETIS, 3-8; PBLAS, 3-14; PETSc, 3-7; pgapack, 3-11; pNETCDF, 3-7; SCALAPACK, 3-4; SCIPORT, 3-8; sHDF5/pHDF5, 3-9; SuperLU, 3-6; valgrind, 3-12
Scientific Studio, 3-1; SCIPORT, 3-8; SciStudio_shelf rpm, 3-2; Secure Shell (ssh command), 5-1; sHDF5/pHDF5, 3-9
SLURM: Command Line Utilities, 6-1; Global Accounting API, 6-1, 6-16; sacct command, 6-1, 6-13; sacctmgr command, 6-1, 6-8; salloc command, 6-1, 6-6; sattach command, 6-1, 6-7; sbatch command, 6-1, 6-5; sbcast command, 6-9; scancel command, 6-1, 6-12; sinfo command, 6-1, 6-11; squeue command, 6-1, 6-10; srun command, 6-1, 6-4; strigger command, 6-1, 6-14; sview command, 6-1, 6-15
SuperLU, 3-6

T
TCL, 5-7

V
valgrind, 3-12
LAPACK    Linear Algebra PACKage

LDAP    Lightweight Directory Access Protocol

LDIF    LDAP Data Interchange Format. A plain text data interchange format to represent LDAP directory contents and update requests. LDIF conveys directory content as a set of records, one record for each object (or entry). It represents update requests, such as Add, Modify, Delete and Rename, as a set of records, one record for each update request.

LKCD    Linux Kernel Crash Dump. A tool used to capture and analyze crash dumps.

LOV    Logical Object Volume

LSF    Load Sharing Facility

LUN    Logical Unit Number

LVM    Logical Volume Manager

LVS    Linux Virtual Server

M

MAC    Media Access Control (a unique identifier address attached to most forms of networking equipment).

MAD    Management Datagram

Managed Switch    A switch with a management interface and configuration options.

MDS    MetaData Server

MDT    MetaData Target

MFT    Mellanox Firmware Tools

MIB    Management Information Base

MKL    Maths Kernel Library

MPD    MPI Process Daemons

MPFR    C library for multiple-precision floating-point computations

MPI    Message Passing Interface

MTBF    Mean Time Between Failures

MTU    Maximum Transmission Unit

N

Nagios    A tool used to monitor the services and resources of Bull HPC clusters.

NETCDF    Network Common Data Form

NFS    Network File System

NIC    Network Interface Ca
C, Fortran and C++ wrappers.

mpi_user >>> mpicc -cc=gcc prog.c -o prog

2.2.3 Configuring MPIBull2

MPIBull2 may be used for different architectures, including standalone SMPs, Ethernet, InfiniBand or Quadrics clusters. You have to select the device that MPIBull2 will use before launching an application with MPIBull2. The list of possible devices available is as follows:

• osock is the default device. This uses sockets to communicate and is the device of choice for Ethernet clusters.
• oshm should be used on a standalone machine; communication is through shared memory.
• ibmr_gen2, otherwise known as InfiniBand multi-rail gen2. This works over InfiniBand's verbs interface.

The device is selected by using the mpibull2-devices command with the -d switch. For example, enter the command below to use the shared memory device:

mpi_user >>> mpibull2-devices -d oshm

For more information on the mpibull2-devices command, see section 2.2.7.

2.2.4 Running MPIBull2

The MPI application requires a launching system in order to spawn the processes onto the cluster. Bull provides the SLURM Resource Manager as well as the MPD subsystem. For MPIBull2 to communicate with SLURM and MPD, the PMI interface has to be defined. By default, MPIBull2 is linked with MPD's PMI interface. If you are using SLURM, you must ensure that MPIBULL2_PRELIBS includes -lpmi so that your MPI app
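As a minimal sketch of the device selection and SLURM launch described above, a session could look like the lines below. The device name, the node and process counts, and the use of MPIBULL2_PRELIBS are illustrative assumptions to be adapted to your cluster.

mpi_user >>> mpibull2-devices -d ibmr_gen2        # select the InfiniBand multi-rail device
mpi_user >>> export MPIBULL2_PRELIBS="-lpmi"      # so the wrapper links SLURM's PMI library
mpi_user >>> mpicc my_app.c -o my_app
mpi_user >>> srun -N 2 -n 16 ./my_app             # launch 16 processes on 2 nodes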
DULESHOME/init/csh script. The sh, csh, tcsh, bash, ksh and zsh shells are all supported by modulecmd. In addition, python and perl 'shells' are supported, for which modulecmd writes the environment changes to stdout as python or perl code.

5.5.3 Examples of Initialization

In the following examples, replace $MODULESHOME with the actual directory name.

C Shell initialization (and derivatives):
source $MODULESHOME/init/csh
module load modulefile modulefile

Bourne Shell (sh) (and derivatives):
. $MODULESHOME/init/sh
module load modulefile modulefile

Perl:
require "$MODULESHOME/init/perl";
&module("load modulefile modulefile");

5.5.4 Modulecmd Startup

Upon invocation, modulecmd sources rc files which contain global, user and modulefile-specific setups. These files are interpreted as modulefiles.

Upon invocation of modulecmd, module RC files are sourced in the following order:
1. Global RC file, as specified by $MODULERCFILE or $MODULESHOME/etc/rc
2. User-specific module RC file: $HOME/.modulerc
3. All .modulerc and .version files found during modulefile searches.

5.5.5 Module Command Line Switches

The module command accepts command line switches as its first parameter. These may be used to control the output format of all information displayed and the module behaviour in the case of locating and interpreting module files. All switches may be entered
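For instance, a bash user would typically initialize and use the module command as sketched below; the modulefile name is only an example, and the init script location assumes the default $MODULESHOME layout.

. $MODULESHOME/init/bash        # defines the module function for bash
module avail                    # list the modulefiles found in $MODULEPATH
module load intel_cc            # load an environment (modulefile name is illustrative)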
SCIPORT ................................................................... 3-8
3.2.13 gmp_sci ............................................................ 3-8
3.2.14 MPFR ............................................................... 3-9
3.2.15 sHDF5/pHDF5 ........................................................ 3-9
3.2.16 ga (Global Array) .................................................. 3-10
3.2.17 GSL ................................................................ 3-10
3.2.18 pgapack ............................................................ 3-11
3.2.19 valgrind ........................................................... 3-12
3.3   Intel Scientific Libraries .......................................... 3-13
3.3.1 Intel Math Kernel Library ........................................... 3-13
3.3.2 Intel Cluster Math Kernel Library ................................... 3-13
3.3.3 BLAS ................................................................ 3-13
3.3.4 PBLAS ............................................................... 3-14
3.3.5 LAPACK .............................................................. 3-14
3.4   NVIDIA CUDA Scientific Libraries .................................... 3-14
3.4.1 CUFFT ............................................................... 3-14
3.4.2 CUBLAS .............................................................. 3-15

Chapter 4 Compilers ....................................................... 4-1
4.1   Overview ............................................................ 4-1
4.2   Intel Fortran Compiler Professional Edition for Linux ............... 4-1
4.3   Intel C++ Compiler Professional Edition for Linux ................... 4-2
4.4   Intel Compiler Licenses ............................................. 4-3
4.5   Intel Math Kernel Library Licenses .................................. 4-4
4.6   GNU Compilers ....................................................... 4-4
4.7   NVIDIA nvcc C Compiler .............................................. 4-4
L, with its optimized functions for maths processing. It is also compatible with GNU products. It also supports big-endian encoded files. Finally, this compiler allows the execution of applications which combine programs written in C and Fortran. See www.intel.com for more details.

Different versions of the compiler may be installed, to ensure compatibility with the compiler versions used to compile the libraries and applications on your system.

Note: It may be necessary to contact the System Administrator to ascertain the location of the compilers on your system. The paths shown in the examples below may vary.

To specify a particular environment use the command below:
source /opt/intel/Compiler/<maj_ver_nb>/<min_ver_nb>/bin/ifortvars.sh intel64

For example:
• To use version 11.0.069 of the Fortran compiler:
  source /opt/intel/Compiler/11.0/069/bin/ifortvars.sh intel64
• To display the version of the active compiler, enter:
  ifort --version
• To obtain the compiler documentation, go to /opt/intel/Compiler/11.0/069/Documentation.

Remember that if you are using MPI_Bull, then a compiler version has to be used which is compatible with the compiler originally used to compile the MPI library.

4.3 Intel C++ Compiler Professional Edition for Linux

The current version of the Intel C++ compiler is version 11. The main features of this compiler are:
• Advanced optimizatio
Linking Strategies

Thread safety
If the application needs an MPI library which provides the MPI_THREAD_MULTIPLE thread-safety level, then choose a device which supports thread safety and select a ts device. Use the mpibull2-devices command.

Note: Thread safety within the MPI library requires data locking. Linking with such a library may impact performance. A loss of around 10 to 30% has been observed on micro-benchmarks. Not all MPI drivers are delivered with a thread-safe version. Devices known to support MPI_THREAD_MULTIPLE include osock and oshm.

Using MPD
MPD is a simple launching system from MPICH-2. To use it, you need to launch the MPD daemons on the Compute hosts. If you have a single machine, just launch mpd & and your MPD setup is complete.

If you need to spawn MPI processes across several machines, you must use mpdboot to create a launching ring on the cluster. This is done as follows:

1. Create the hosts list:
   mpi_user >>> export cluster_machines="host1 host2 host3 host4"

2. Create the file used to store host information:
   mpi_user >>> for i in $cluster_machines; do echo $i >> machinefiles; done

3. Boot the MPD system on all the hosts:
   mpi_user >>> mpdboot -n $(cat machinefiles | wc -l) -f machinefiles

4. Check if everything is OK:
   mpi_user >>> mpdtrace

5. Run the application or try hostname:
   mpi_user >>
NUM
CUDA_SAFE_CALL( cudaMemcpy(d_values, values, sizeof(int)

It will now be possible to compile.

5. The makefile system will automatically recognize your MPI compiler and use it to obtain the right options. This has been tested for MPIBull products, as well as OpenMPI and MPICH products.

6. The makefile system will create two directories in your application directory. These are linux and obj, and are used to store the executable file and the object files respectively.

See The NVIDIA CUDA Compute Unified Device Architecture Programming Guide and The CUDA Compiler Driver documents, available from www.nvidia.com, for more information.

Chapter 5 The User's Environment

This chapter describes how to access the extreme computing environment, how to use the file systems, and how to use the modules package to switch and compare environments.
• 5.1 Cluster Access and Security
• 5.2 Global File Systems
• 5.3 Environment Modules
• 5.4 Module Files
• 5.5 The Module Command
• 5.6 The NVIDIA CUDA Development Environment

5.1 Cluster Access and Security

Typically, users connect to and use a cluster as described below:
• Users log on to the cluster platform either through Service Nodes, or through the Login Node when the configuration includes these special Login Node(s). Once logged on to a node, users can then launch their jobs.
• Compilation is possible on all nodes which h
PHDF5/pHDF5-<version>/mpibull2-<version>/bin
/opt/scilibs/PHDF5/pHDF5-<version>/mpibull2-<version>/include
/opt/scilibs/PHDF5/pHDF5-<version>/mpibull2-<version>/doc
/opt/scilibs/SHDF5/sHDF5-<version>/lib
/opt/scilibs/SHDF5/sHDF5-<version>/bin
/opt/scilibs/SHDF5/sHDF5-<version>/include
/opt/scilibs/SHDF5/sHDF5-<version>/doc

More information is available from the documentation included in the SciStudio_shelf rpm. When this is installed, the documentation files will be located under:
/opt/scilibs/SCISTUDIO_SHELF/SciStudio_shelf-<version>/PHDF5/pHDF5-<version>
/opt/scilibs/SCISTUDIO_SHELF/SciStudio_shelf-<version>/SHDF5/sHDF5-<version>

ga (Global Array)

The Global Arrays (GA) toolkit provides an efficient and portable 'shared memory' programming interface for distributed memory computers. Each process in a MIMD parallel program can asynchronously access logical blocks of physically distributed dense multi-dimensional arrays, without the need for explicit cooperation with other processes. Unlike other shared memory environments, the GA model exposes the non-uniform memory access (NUMA) characteristics of high performance computers to the programmer, and takes into account the fact that access to a remote portion of the shared data is slower than to the local portion. The location information for the shared data is available, and direct access to the local portion
TaskId;                          /* min cpu task */
char stepName[NAME_SIZE];        /* step process name */
char stepNodes[STEP_NODE_SIZE];  /* step node list */
int maxVsizeNode;                /* max vsize node */
int maxRssNodeId;                /* max rss node */
int maxPagesNodeId;              /* max pages node */
int minCpuTimeNodeId;            /* min cpu node */
char *account;                   /* account number */

/* The remaining descriptors from the original two-column listing - priority,
   partition, node group ID, Block ID, nproc, ave vsize, max rss, max rss task,
   ave rss, max pages, max pages task, ave pages, min cpu - describe the fields
   shown on the preceding page. */

Chapter 7 Launching an Application

7.1 Using PBS Professional Batch Manager

PBS Professional Batch Manager from Altair Engineering is the batch manager used by bullx cluster suite to run batch jobs.

See The PBS Professional Administrator's Guide and User's Guide, available on the PBS Professional CD-ROM, for more information on the options for using PBS Professional.

Important: PBS Professional does not work with SLURM and should only be installed on clusters which do not use SLURM. If SLURM has been installed, see your System Administrator or chapter 8 in the Administrator's Guide.

7.1.1 Pre-requisites

1. The User ssh keys should have been dispatched so that the User can access the Compute Nodes. See the Administrator's Guide for details on how to do this.
2. To use PBS Professional with MPIBull2, the
ample node_name

WARNING
A Warning notice indicates an action that could cause damage to a program, device, system, or data.

Table of Contents

Chapter 1 Introduction to the extreme computing Environment ............... 1-1
1.1   Software Configuration .............................................. 1-1
1.1.1 Operating System and … .............................................. 1-1
1.2   Program Execution Environment ....................................... 1-2
1.2.1 Resource Management ................................................. 1-2
1.2.2 Batch Management .................................................... 1-2
1.2.3 Parallel processing and MPI libraries ............................... 1-3

Chapter 2 Parallel Libraries .............................................. 2-1
2.1   Overview of Parallel Libraries ...................................... 2-1
2.2   MPIBull2 ............................................................ 2-2
2.2.1 Quick Start for MPIBull2 ............................................ 2-2
2.2.2 MPIBull2 Compilers .................................................. 2-2
2.2.3 Configuring MPIBull2 ................................................ 2-3
2.2.4 Running MPIBull2 .................................................... 2-3
2.2.5 MPIBull2_1.3.x features ............................................. 2-3
2.2.6 Advanced features
an pages carefully. The version directory may look something like this:

/opt/modules/versions:
3.0.5-rko  3.0.6-rko  3.0.7-rko  3.0.8-rko  3.0.9-rko

The model you should use for modulefiles is name/version. For example, the /opt/modules/modulefiles directory may have a directory named firefox which contains the following module files: 301, 405c, 451c, etc. When it is displayed with module avail it looks something like this:

firefox/301  firefox/405c  firefox/451c(default)  firefox/45c  firefox/46

The default is established with the .version file in the firefox directory, and it looks something like this:

#%Module1.0#########################################################
##
## version file for Firefox
##
set ModulesVersion "451c"

If the user does module load firefox, then the default firefox/451c will be used. The default can be changed by editing the .version file to point to a different module file in that directory. If no .version file exists, then Modules will just use the last module in the alphabetically ordered directory listing as the default.

5.4.1 Upgrading via the Modules Command

The theory is that Modules should use a similar package/version locality as the package environments it helps to define. Switching between versions of the module command should be as easy as switching between different packages via the m
57. and profilecomm Parallel Libraries 2 19 2 20 bullx cluster suite User s Guide Chapter 3 Scientific Libraries This chapter describes the following topics e 3 1 Overview e 3 2 Bull Scientific Studio 3 3 Intel Scientific Libraries e 3 4 NVIDIA CUDA Scientific Libraries Important See the Software Release Bulletin for details of the Scientific Libraries included with your delivery 3 1 Overview Scientific Libraries include tested optimized and validated functions that spare users the need to develop such subprograms themselves The advantages of scientific libraries are e Portability e Support for different types of data real complex double precision etc e Support for different kinds of storage banded matrix symmetrical etc The following sets of scientific libraries are available for Bull extreme computing clusters Bull Scientific Studio is included in the bullx cluster suite delivery and includes a range of Open Source libraries that can be used to facilitate the development and execution of a wide range of applications See The Software Release Bulletin for your delivery for details of the Scientific Studio libraries included in your release Proprietary scientific libraries that have to be purchased separately are available from Intel and from NVIDIA for those clusters which include NVIDIA graphic card accelerators 3 2 Bull Scientific Studio Bull Scientific Stud
ave compilers installed on them. The best approach is that compilers reside on Login Nodes, so that they do not interfere with performance on the Compute Nodes.

5.1.1 ssh (Secure Shell)

The ssh command is used to access a cluster node.

Syntax:
ssh [-l login_name] hostname | user@hostname [command]
ssh [-afgknqstvxACNTX1246] [-b bind_address] [-c cipher_spec] [-e escape_char]
    [-i identity_file] [-l login_name] [-m mac_spec] [-o option] [-p port]
    [-F configfile] [-L port:host:hostport] [-R port:host:hostport] [-D port]
    hostname | user@hostname [command]

ssh (the ssh client) can also be used as a command to log onto a remote machine and to execute commands on it. It replaces rlogin and rsh, and provides secure encrypted communications between two untrusted hosts over an insecure network. X11 connections and arbitrary TCP/IP ports can also be forwarded over the secure channel. ssh connects and logs onto the specified hostname. The user must verify his/her identity, using the appropriate protocol, before being granted access to the remote machine.

5.2 Global File Systems

The bullx cluster suite uses the NFS distributed file system.

5.3 Environment Modules

Environment modules provide a great way to customize your shell environment easily, particularly on the fly. For instance, an environment can consist of one set of compatible products, including a defined release of a FORTRAN compiler, a C compiler, a
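Typical ssh usage on a cluster is sketched below; the login node name and user account are placeholders to be replaced by the values used on your site.

ssh my_login@login-node                     # interactive login to a Login Node
ssh my_login@login-node 'module avail'      # run a single command remotely
ssh -X my_login@login-node                  # forward X11, e.g. for graphical tools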
bullx cluster suite User's Guide

extreme computing
Software
July 2009

BULL CEDOC
357 AVENUE PATTON
B.P. 20845
49008 ANGERS CEDEX 01
FRANCE

REFERENCE 86 A2 22FA 02

The following copyright notice protects this book under Copyright laws which prohibit such actions as, but not limited to, copying, distributing, modifying, and making derivative works.

Copyright © Bull SAS 2009
Printed in France

Trademarks and Acknowledgements
We acknowledge the rights of the proprietors of the trademarks mentioned in this manual. All brand names and software and hardware product names are subject to trademark and/or patent protection. Quoting of brand and product names is for information purposes only and does not represent trademark misuse.

The information in this document is subject to change without notice. Bull will not be liable for errors contained herein, or for incidental or consequential damages in connection with the use of this material.

Preface

Scope and Objectives
The purpose of this guide is to describe the tools and libraries included in the bullx cluster suite delivery which allow the development and testing of application programs on the Bull extreme computing clusters. In addition, various open source and proprietary tools are described.

Intended Readers
This guide is for users and developers of B
60. cluster nodes apart from the user login system For development the environment consists of Standard Linux tools such as GCC a collection of free compilers that can compile C C and FORTRAN GDB Gnu Debugger and other third party tools including the Intel FORTRAN Compiler the Intel C Compiler Intel MKL libraries and Intel Debugger IDB Optimized parallel libraries that are part of the bullx cluster suite These libraries include the Bull MPI2 message passing library Bull MPI2 complies with the MPI1 and 2 standards and is a high performance high quality native implementation Bull MPI2 exploits shared memory for intra node communication It includes a trace and profiling tool enabling data to be tracked Modules software provides a means for predefining and changing environments Each one includes a compiler a debugger and library releases which are compatible with each other So it is easy to invoke one given environment in order to perform tests and then compare the results with other environments 12 1 Resource Management The resource manager is responsible for the allocation of resources to jobs The resources are provided by nodes that are designated as compute resources Processes of the job are assigned to and executed on these allocated resources Both Gigabit Ethernet and InfiniBand clusters use the SLURM Simple Linux Utility for Resource Management open source highly scalable cluster management and job scheduling syst
61. cmd Each modulefile contains the changes to a user s environment needed to access an application TCL is a simple programming language which permits modulefiles to be arbitrarily complex depending on the needs of the application and the modulefile writer If support for extended tcl tclX has been configured for your installation of modules you may also use all the extended commands provided by tclX modulefiles can be used to implement site policies regarding the access and use of applications The User s Environment 5 7 9 9 2 5 8 A typical modulefiles file is a simple bit of code that sets or adds entries to the PATH or other environment variables TCL has conditional statements that are evaluated when the modulefile is loaded This is very effective for managing path or environment changes due to different OS releases or architectures The user environment information is encapsulated into a single modulefile kept in a central location The same modulefile is used by all users independent of the machine So from the user s perspective starting an application is exactly the same regardless of the machine or platform they are on modulefiles also hide the notion of different types of shells From the user s perspective changing the environment for one shell looks exactly the same as changing the environment for another shell This is useful for new or novice users and eliminates the need for statements such as if you r
creation, access and sharing of array-oriented scientific data. The library is located in the following directories:

/opt/scilibs/PNETCDF/pNetCDF-<version>/mpibull2-<version>/bin
/opt/scilibs/PNETCDF/pNetCDF-<version>/mpibull2-<version>/include
/opt/scilibs/PNETCDF/pNetCDF-<version>/mpibull2-<version>/lib
/opt/scilibs/PNETCDF/pNetCDF-<version>/mpibull2-<version>/man

More information is available from the documentation included in the SciStudio_shelf rpm. When this is installed, the documentation files will be located under:
/opt/scilibs/SCISTUDIO_SHELF/SciStudio_shelf-<version>/PNETCDF/pNetCDF-<version>

METIS and PARMETIS

METIS is a set of serial programs for partitioning graphs, partitioning finite element meshes, and producing fill-reducing orderings for sparse matrices. The algorithms implemented in METIS are based on the multilevel recursive-bisection, multilevel k-way, and multi-constraint partitioning schemes developed in our lab. ParMETIS is an MPI-based parallel library that implements a variety of algorithms for partitioning unstructured graphs and meshes, and for computing fill-reducing orderings of sparse matrices. ParMETIS extends the functionality provided by METIS and includes routines that are especially suited for parallel Adaptive Mesh Refinement computations and large scale numerical simulations.
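As a sketch, an MPI application using ParMETIS would typically be compiled and linked as below. The install prefix follows the /opt/scilibs convention used for the other libraries in this chapter, but the exact directory names are assumptions and should be checked on your system.

PARMETIS_DIR=/opt/scilibs/PARMETIS/ParMetis-<version>/mpibull2-<version>   # assumed layout
mpicc part_mesh.c -I$PARMETIS_DIR/include -L$PARMETIS_DIR/lib \
      -lparmetis -lmetis -o part_mesh
srun -n 8 ./part_mesh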
dd it and export the new value:

exportfs -a

2.2.10 Debugging

2.2.10.1 Parallel gdb

With the mpiexec launching tool it is possible to add the GNU debugger in the global options by using -gdb. All the gdb outputs are then aggregated, indicating when there are differences between processes. The -gdb option is very useful as it helps to pinpoint faulty code very quickly, without the need of intervention by external software. Refer to the gdb man page for more details about the options which are available.

2.2.10.2 TotalView

TotalView is a proprietary software application and is not included in the bullx cluster suite distribution. See chapter 8 for more details.

It is possible to submit jobs using the SLURM resource manager, with a command similar to the format below, or via MPD:

totalview srun -a <args> prog <prog_args>

Alternatively, it is possible to use the MPI process daemons (MPD) and to synchronize TotalView with the processes running on the MPD ring:

mpiexec -tv <args> prog <prog_args>

2.2.10.3 MARMOT MPI Debugger

MARMOT is an MPI debugging library. MARMOT surveys and automatically checks the correct usage of the MPI calls, and their arguments, made during runtime. It does not replace classical debuggers, but is used in addition to them. The usage of the MARMOT library will be specified when linking and building an application. This library will be linked to the a
e CUFFT library provides a simple interface for computing parallel FFTs on a Compute Node connected to a Tesla graphic accelerator, allowing the floating-point power and parallelism of the node to be fully exploited.

FFT libraries vary in terms of supported transform sizes and data types. For example, some libraries only implement Radix-2 FFTs, restricting the transform size to a power of two, while other implementations support arbitrary transform sizes. The CUFFT library delivered with bullx cluster suite supports the following features:

• 1D, 2D and 3D transforms of complex and real-valued data.
• Batch execution of multiple 1D transforms in parallel.
• 2D and 3D transforms in the [2, 16384] range in any dimension.
• 1D transforms up to 8 million elements.
• In-place and out-of-place transforms for real and complex data.

The interface to the CUFFT library is the header file cufft.h. Applications using CUFFT need to link against the cufft.so Linux DSO when building for the device, and against the cufftemu.so Linux DSO when building for device emulation.

See The CUDA CUFFT Library document available from www.nvidia.com for more information regarding types, API functions, code examples and the use of this library.

3.4.2 CUBLAS

CUBLAS is an implementation of BLAS (Basic Linear Algebra Subprograms) on top of the NVIDIA CUDA driver. The library is self-contained at the API level, that is
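As a sketch of how a program using the CUFFT library described above would be built, the commands below link against the CUFFT DSOs; the source file name is illustrative, and the exact emulation-mode options should be checked against the nvcc documentation for your CUDA Toolkit version.

nvcc fft_demo.cu -o fft_demo -lcufft      # device build, links against cufft.so
# for a device-emulation build, link against -lcufftemu instead (see the nvcc emulation options)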
e slurmdbd daemon to display data for all jobs and job steps in the SLURM accounting log.
• SVIEW used to display SLURM state information graphically. Requires an X-windows capable display.
• Global Accounting API for merging the data from a LSF accounting file and the SLURM accounting file into a single record.

Important: SLURM does not work with the PBS Professional Resource Manager and should only be installed on clusters which do not use PBS Professional.

Note: There is only a general explanation of each command in the following sections. For complete and detailed information, please refer to the man pages. For example, man srun.

6.2 MPI Support

The PMI (Process Management Interface) is provided by MPIBull2 to launch processes on a cluster and provide services to the MPI interface. For example, a call to pmi_get_appnum returns the job id. This interface uses sockets to exchange messages.

In MPIBull2, this mechanism uses the MPD daemons running on each compute node. Daemons can exchange information and answer the PMI calls.

SLURM replaces the Process Management Interface with its own implementation and its own daemons. No MPD is needed, and when a PMI request is sent (for example pmi_get_appnum), a SLURM extension must answer this request.

The following diagrams show the difference between the use of PMI with and without a resource manager that allows process mana
e using the C Shell do this, otherwise if you're using the Bourne shell do this". Announcing and accessing new software is uniform and independent of the user's shell. From the modulefile writer's perspective, this means one set of information will take care of all types of shells.

Example of a Module file:

#%Module1.0#####################################################################
set INTEL intel_cc
module-whatis "loads the icc 10.1.011 Intel C/C++ environment for EM64T"
set iccroot /opt/intel/cce/10.1.011
prepend-path PATH $iccroot/bin
prepend-path LD_LIBRARY_PATH $iccroot/lib
setenv MANPATH $iccroot/man
prepend-path INTEL_LICENSE_FILE $iccroot/licenses:/opt/intel/licenses

Modules Package Initialization

The Modules package and the module command are initialized when a shell-specific initialization script is sourced into the shell. The script creates the module command, as either an alias or function, creates Modules environment variables, and saves a snapshot of the environment in $HOME/.modulesbeginenv. The module alias or function executes the modulecmd program located in $MODULESHOME/bin and has the shell evaluate the command's output. The first argument to modulecmd specifies the type of shell.

The initialization scripts are kept in $MODULESHOME/init/shellname, where shellname is the name of the sourcing shell. For example, a C Shell user sources the $MO
67. el Math Kernel Library Licenses 4 6 GNU Compilers 4 7 NVIDIA nvcc C Compiler 4 1 Overview Compilers play an essential role in exploiting the full potential of Xeon processors Bull therefore recommends the use of Intel C C and Intel Fortran compilers GNU compilers are also available However these compilers are unable to compile link any program which uses MPI Bull For MPI Bull programs it is essential that Intel compilers are used Alternatively clusters that use NVIDIA Tesla graphic accelerators connected to the Compute Nodes will use the compilers supplied with the NVIDIA CUDA Toolkit and Software Development Kit 4 2 intel Fortran Compiler Professional Edition for Linux The current version of the Intel Fortran compiler is version 11 This supports the Fortran 95 Fortran 90 Fortran 77 Fortran IV standards whilst including many features from the Fortran 2003 language standard The main features of this compiler are Advanced optimization features including auto vectorization High Performance Parallel Optimizer HPO Interprocedural Optimization IPO Profile Guided Optimization PGO and Optimized Code Debugging Multi hreaded Application Support including OpenMP and Auto Parallelization to convert serial applications into parallel applications to fully exploit the processing power that is available Data preloading Loop unrolling The Professional Edition includes the Intel Math Kernel Library Intel MK
68. em SLURM has the following functions It allocates compute resources in terms of processing power and Computer Nodes to jobs for specified periods of time If required the resources may be allocated exclusively with priorities set for jobs It is also used to launch and monitor jobs on sets of allocated nodes and will also resolve any resource conflicts between pending jobs It helps to exploit the parallel processing capability of a cluster See Administrator s Guide and Chapter 6 in this manual for more information SLURM 1 2 2 Batch Management Different possibilities exist for handling batch jobs for extreme computing clusters PBS Professional a sophisticated scalable robust Batch Manager from Altair Engineering is supported as a standard PBS Pro can also be integrated with the MPI libraries 1 2 bullx cluster suite User s Guide See PBS Professional Administrator s Guide and User s Guide available on the PBS Pro CD ROM delivered for the clusters which use PBS Pro and the PBS Pro web site http www pbsgridworks com o AD aan PBS Pro does not work with SLURM and should only be installed on clusters which do not use SLURM LSF a software from Platform Company for managing and accelerating batch workload processing for compute and data intensive applications is optional on Bull extreme computing 1 2 3 Parallel processing and MPI libraries A common approach to parallel programming
69. equire a license fee 4 6 GNU Compilers GCC a collection of free compilers that can compile both C C and Fortran is part of the installed Linux distribution 4 7 nvcc C Compiler For clusters which include NVIDIA Tesla graphic accelerators the NVIDIA Compute Unified Device Architecture CUDA Toolkit is installed automatically on the LOGIN COMPUTE and COMPUTEX nodes The NVIDIA CUDA Toolkit provides a C development environment that includes the nvcc compiler This compiler provides command line options to invoke the different tools required for each compilation stage nvcc s basic workflow consists in separating device code from host code and compiling the device code into a binary form or cubin object The generated host code is outputted either as C code that can be compiled using another tool or directly as object code that invokes the host compiler during the last compilation stage Source files for CUDA applications consist of a mixture of conventional C host code and graphic accelerator device functions The CUDA compilation trajectory separates the device functions from the host code compiles the device functions using proprietary NVIDIA compilers assemblers compiles the host code using the general purpose C C compiler that is available on the host platform and afterwards embeds the compiled graphic accelerator functions as load images in the host object file In the linking stage specific CUDA runtime l
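A minimal sketch of the nvcc compilation trajectory described above is shown below; the file names are illustrative. nvcc compiles the device code itself and can either compile the host code as well or leave it to the general-purpose compiler, before driving the final link stage.

nvcc -c kernel.cu -o kernel.o        # device functions (and embedded host stubs) compiled by nvcc
gcc  -c host_main.c -o host_main.o   # conventional host code built with the native compiler
nvcc kernel.o host_main.o -o app     # nvcc drives the link, adding the CUDA runtime libraries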
fo  null  module-cvs  modules  use.own
------------------ /opt/modules/modulefiles ------------------
oscar-modules/1.0.3(default)

Modules available for the user are listed under the line /opt/modules/modulefiles.

The command to load a module is:
module load module_name

The command to verify the loaded modules list is:
module list

Using the avail command, it is possible that some modules will be marked (default):
module avail

These modules are those which have been loaded without the user specifying a module version number. For example, the following commands are the same:
module load configuration
module load configuration/2

The module unload command unloads a module.

The module purge command clears all the modules from the environment:
module purge

It is not possible to load modules which include different versions of intel_cc or intel_fc at the same time, because they cause conflicts.

Module Configuration Examples

Note: The configurations shown below are examples only. The module configurations for bullx cluster suite will differ.

intel_fc version 8.0.046   intel_cc version 8.0.066   intel_db version 8.1.3   intel_mkl version 7.0.017
intel_fc version 8.0.049   intel_cc version 8.0.071   intel_db version 8.1.3   intel_mkl version 7.0.017
intel_fc version 8.0.061   intel_cc version 8.0.071   intel_db version 8.1.3   intel_mkl version 7.0.0
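Building on the configuration examples above, switching between two compiler releases is a matter of module load/switch, as sketched below; the modulefile names and version strings are assumptions based on the examples listed and should be checked with module avail.

module load intel_fc/8.0.046 intel_cc/8.0.066     # load one configuration
module switch intel_cc/8.0.066 intel_cc/8.0.071   # swap just the C compiler release
module list                                       # verify what is loaded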
71. from the command line for example module load cuda bullx cluster suite User s Guide 50 2 NVIDIA recommends that the SDK is copied into the file system for each user To do this a makefile is used this produces around 60 MBs of binaries and libraries for each user The SDK is installed in the opt cuda sdk directory A patch has been applied to some of the files in order to suppress the relative paths that obliged the user to develop inside SDK These patches are mainly related to the CUDA environment and the MPI options provided for the nvcc compiler and linker Programme examples are included in the opt cuda sdk projects directory These programmes and the use of SDK are not documented however the source code can be examined to obtain an idea of developing a program in the CUDA environment SDK will be delivered precompiled to save time for the user and includes macros to help error tracking NVIDA CUDA Toolkit and Software Developer Kit The NVIDIA CUDA Toolkit provides a complete C development environment including e nvcc C compiler e CUDA FFT and BLAS libraries e visual profiler A GDB debugger CUDA runtime driver e CUDA programming manual The NVIDIA CUDA Developer Software Developer Kit provides CUDA examples with the source code to help get started with the CUDA environment Examples include e Matrix multiplication e Matrix transpose e Performance profiling using timers e
gement.

Figure 6-1. MPI Process Management With and Without Resource Manager
(Without a resource manager: an MPD ring, with an mpd daemon on each compute node, communicating by sockets. With a resource manager: slurmd daemons on each compute node create threads, with global SLURM communications, and the tasks communicate with the SLURM PMI module.)

MPIBull2 jobs can be launched directly by the srun command. SLURM's "none" MPI plug-in must be used to establish communications between the launched tasks. This can be accomplished either by using the SLURM configuration parameter MpiDefault=none in slurm.conf, or srun's --mpi=none option. The program must also be linked with SLURM's implementation of the PMI library, so that tasks can communicate host and port information at startup. (The system administrator can add this option to the mpicc and mpif77 commands directly, so the user will not need to bother.) Do not use SLURM's MVAPICH plug-in for MPIBull2.

mpicc -L<path_to_slurm_lib> -lpmi ...
srun -n 20 --mpi=none a.out

Notes
• Some MPIBull2 functions are not currently supported by the PMI library integrated with SLURM.
• the environment
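To check which MPI plug-ins the SLURM version installed on your cluster offers, the command below can be used; this is a sketch, see the srun man page for the authoritative list of options.

srun --mpi=list        # lists the available MPI plug-in types, e.g. none, mvapich, ...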
gt mpiexec -n 4 your_application

MPI Process Daemons (MPD) run on all nodes in a ring-like structure, and may be used in order to manage the launch of the different processes. The MPIBull2 library is PMI compliant, which means it can interact with any other PMI PM. This software has been developed by ANL. In order to set up the system, the MPD ring must firstly be knitted using the procedure below:

1. At the $HOME prompt, edit the .mpd.conf file by adding something like MPD_SECRETWORD=your_password, and chmod 600 the file.

2. Create a boot sequence file. Any type of file may be used. The MPD system will, by default, use the mpd.hosts file in your $HOME directory if a specific file is not specified in the boot sequence. This contains a list of hosts, separated by carriage returns. Colons can be added to the host name to specify the number of CPUs for the host, for example host1:4, host2:8.

Figure 2-2. MPD ring (mpiexec and mpirun connect to the ring of mpd daemons, which launch the processes on each host)

3. Boot the ring by using the mpdboot command, and specify the number of hosts to be included in the ring:

mpdboot -n 2 -f myhosts_file

Check that the ring is functioning correctly by using the mpdtrace or mpdringtest commands. If everything is okay, then jobs may be run on the cluster.

2.2.6.4 Dynamic Process Services

The main goal of these services is to provide a means t
home directory of the user should include the .mpd.conf file, which includes the user's password details. Only the user should have read and write rights for the .mpd.conf file.

3. If necessary, add /opt/pbs/default to the user's PATH.

7.1.2 Submitting a script

Run the command below to see the job submission script (named test.pbs in this example):

cat test.pbs

The script will appear similar to that below, and can be edited if necessary:

#!/bin/bash
#PBS -l select=2:ncpus=3:mpiprocs=3
#PBS -l place=scatter
source /opt/mpi/mpibull2-<version>/share/setenv_mpibull2.sh
mpirun -n 6 hostname

7.1.3 Launching a job

Use the qsub command to launch a job with this script, as below:

qsub test.pbs

The output will be in the format:
466.zeus0

This indicates that the number of the job is 466 on machine zeus0.

7.1.4 Displaying the results for a job

Use the command qstat to see the details of the jobs submitted:

qstat -an

zeus0:
                                                    Req'd   Req'd   Elap
Job ID      Username     Queue  Jobname   SessID NDS TSK Memory  Time  S  Time
----------  -----------  -----  --------  ------ --- --- ------  ----  -  -----
466.zeus0   <user_name>  workq  test.pbs   11449   2   6     --    --  R  00:00
   zeus8/0*3+zeus9/0*3

Here it is possible to see that, as specified in the script, the job is running on 3 CPUs on two nodes, named zeus8 and zeus9.

7.1.5 Tracing a job

Run the command tracejob to see the progress for a specific job, for example 466:

tracejob 466

This will give output similar to that below, show
75. ibraries are added to support remote SIMD procedure calls and to provide explicit GPU manipulations such as allocation of GPU memory buffers and host GPU data transfer The compilation trajectory involves several splitting compilation preprocessing and merging steps for each CUDA source file These steps are subtly different for different modes of CUDA compilation such as compilation for device emulation or the generation of fat device code binaries It is the purpose of the CUDA nvcc compiler driver to keep the intricate details of CUDA compilation hidden from developers Additionally instead of being a specific CUDA compilation driver nvcc mimics the behavior of general purpose compiler drivers e g GCC in that it accepts a range of conventional compiler options for example to define macros and include library paths and to manage the compilation process All non CUDA compilation steps are forwarded to the general C compiler that is available on the platform 4 4 bullx cluster suite User s Guide 4 7 1 Compiling with nvcc and MPI The CUDA development environment uses a makefile system A set of makefile rules indicates how to interact with the files the make encounters including the cu source files and c or cxx cpp host code files Note Only C and C formats are accepted in the CUDA programming environment Fortran programs should call the functions from C or C libraries The user can program in any language Pytho
ibrary documentation is found under /opt/opens/OPENS_SHELF/OpenS_shelf-<version>. For example, the SciStudio libraries are found under SCISTUDIO_SHELF/SciStudio_shelf-<version>/<library_name>; for instance, the SCIPORT documentation is included in the folder /opt/scilibs/SCISTUDIO_SHELF/SciStudio_shelf-<version>/SCIPORT/sciport-<version>. If there are multiple versions of a library, then there is a separate directory for each version number.

A typical documentation directory structure for a shelf rpm file is shown below:

Packaging information
• Configuration information
• README, notice
• Installation
ice ibmr_gen2

2.2.8.2 Submitting a job

If a user wants to submit a job, then according to the process management system they can use MPIEXEC, MPIRUN, SRUN or MPIBULL2-LAUNCH to launch the processes on the cluster (the online man pages give details of all the options for these launchers).

2.2.9 MPIBull2 and NFS Clusters

To use MPI and NFS together, the shared NFS directory must be mounted with the no attribute caching (noac) option added; otherwise the performance of the Input/Output operations will be impacted. To do this, edit the /etc/fstab file for the NFS directories on each client machine in a multi-host MPI environment.

Note: All the commands below must be carried out as root.

Run the command below on the NFS client machines:

grep nfs_noac /etc/fstab

The fstab entry for /nfs_noac should appear as below:

nfs_noac /nfs_noac nfs bg,intr,noac 0 0

If the noac option is not present, add it and then remount the NFS directory on each machine, using the commands below:

umount /nfs_noac
mount /nfs_noac

To improve performance, export the NFS directory from the NFS server with the async option. This is done by editing the /etc/exports file on the NFS server to include the async option, as below.

Example
The following is an example of an export entry that includes the async option for /nfs_noac:

grep nfs_noac /etc/exports

If the async option is not present, a
ied value. The argument of this option may be one of:
novice | nov       Novice
expert | exp       Experienced module user
advanced | adv     Advanced module user

Module Sub-Commands

• Print the use of each sub-command. If an argument is given, print the Module-specific help information for the modulefile:
  help [modulefile [modulefile ...]]

• Load modulefile into the shell environment:
  load modulefile [modulefile ...]
  add modulefile [modulefile ...]

• Remove modulefile from the shell environment:
  unload modulefile [modulefile ...]
  rm modulefile [modulefile ...]

• Switch loaded modulefile1 with modulefile2:
  switch modulefile1 modulefile2
  swap modulefile1 modulefile2

• Display information about a modulefile. The display sub-command will list the full path of the modulefile and all (or most) of the environment changes the modulefile will make when loaded. It will not display any environment changes found within conditional statements:
  display modulefile [modulefile ...]
  show modulefile [modulefile ...]

• List loaded modules:
  list

• List all available modulefiles in the current MODULEPATH:
  avail [path ...]
  All directories in the MODULEPATH are recursively searched for files containing the modulefile magic cookie. If an argument is given, then each directory in the MODULEPATH is searched for modulefiles whose pathname matches the argument. Multiple versions of an application can be suppo
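For example, the sub-commands above can be combined as below; the modulefile name is illustrative.

module display intel_cc     # show the environment changes this modulefile would make
module avail intel          # list available modulefiles whose pathname matches 'intel'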
ing all the job execution steps that have been carried out:

Job: 466.zeus0

10/30/2007 12:43:46  L  Considering job to run
10/30/2007 12:43:46  S  enqueuing into workq, state 1 hop 1
10/30/2007 12:43:46  S  Job Queued at request of <user_name>@zeus0, owner = <user_name>@zeus0, job name = test.pbs, queue = workq
10/30/2007 12:43:46  S  Job Run at request of Scheduler@zeus0 on hosts (zeus8:ncpus=3:mpiprocs=3)+(zeus9:ncpus=3:mpiprocs=3)
10/30/2007 12:43:46  S  Job Modified at request of Scheduler@zeus0
10/30/2007 12:43:46  L  Job run
10/30/2007 12:43:48  S  Obit received momhop:1 serverhop:1 state:4 substate:42
10/30/2007 12:43:48  S  Exit_status=0 resources_used.cpupercent=0 resources_used.cput=00:00:01 resources_used.mem=2764kb resources_used.ncpus=6 resources_used.vmem=30612kb resources_used.walltime=00:00:02
10/30/2007 12:43:48  S  dequeuing from workq, state 5

7.1.6 Exiting a job

If a job exits before it has completed, then use the command in the format below to look at the error log:

cat test.pbs.e466

If the mpirun -n 6 hostname command in the job script completes successfully, run the command below:

cat test.pbs.o466

The output will list the nodes used, for example:
zeus8
zeus8
zeus8
zeus9
zeus9
zeus9

7.2 Launching an Application without a Batch Manager

Application    Launching tool
               Clusters with no Resource Manager    …
Serial         …
OpenMP         …
io is based on the Open Source Management Framework (OSMF), and provides an integrated set of up-to-date and tested mathematical scientific libraries that can be used in multiple environments. They simplify modeling by fixing priorities, ensuring the cluster is in full production for the maximum amount of time, and are ideally suited for large multi-core systems.

Figure 3-1. Bull Scientific Studio structure

3.2.1 Scientific Libraries and Documentation

The scientific libraries are delivered with the tools included in Bull Scientific Studio for developing and running your application. All the libraries included in Bull Scientific Studio are documented in two rpm files, called SciStudio_shelf and OpenS_shelf, as shown in Figure 3-1. These files are included in the bullx cluster suite delivery and can be installed on any system. The install paths are:

/opt/scilibs/SCISTUDIO_SHELF/SciStudio_shelf-<version>
/opt/opens/OPENS_SHELF/OpenS_shelf-<version>

The SciStudio_shelf and the OpenS_shelf rpms are generated for each release and contain the documentation for each library included in the release. The documentation for each library is included in the directory for each library, based on the type of library. All of the Scientific Studio libraries are found in /opt/scilibs/SCISTUDIO_SHELF/SciStudio_shelf-<version> and the OpenS l
81. ions provide fine grained data parallelism and thread parallelism nested within coarse grained data parallelism and task parallelism They guide the programmer to partition the problem into coarse sub problems that can be solved independently in parallel and then into finer pieces that can be solved cooperatively in parallel Such decomposition preserves language expressivity by allowing threads to cooperate when solving each sub problem and at the same time enables transparent scalability since each sub problem can be scheduled to be solved on any of the available processor cores A compiled CUDA program can therefore execute on any number of processor cores and only the runtime system needs to know the physical processor count See The NVIDIA CUDA Compute Unified Device Architecture Programming Guide and the other documents in the opt cuda doc directory for more information bullx cluster suite and CUDA Bull provides a CUDA development environment based on the NVIDIA CUDA Toolkit including the nvcc compiler and runtime libraries The NVIDIA Software Developer Kit SDK including utilities and project examples is also delivered The CUDA Toolkit is delivered as RPMs and installed in opt cuda and includes the bin lib and man sub directories These files are sourced to load the CUDA environment variables by for example by using the command below source opt cuda bin cudavars sh Alternatively the module can be loaded
82. ittals and avoiding hardware problems The SINFO output is a table whose content and format can be controlled using the SINFO options NAME SINFO view information about SLURM nodes and partitions SYNOPSIS sinfo OPTIONS DESCRIPTION SINFO is used to view partition and node information for a system running SLURM OPTIONS Please refer to the man page for more details on the options including examples of use Example man sinfo Resource Management using SLURM 6 1 1 6 11 SCANCEL Signal Cancel Jobs SCANCEL cancels a running or waiting job or sends a specified signal to all processes on all nodes associated with a job only job owners or their administrators can cancel jobs SCANCEL may also be used to cancel a single job step instead of the whole job NAME SCANCEL Used to signal jobs or job steps that are under the control of SLURM SYNOPSIS scancel OPTIONS job id step id job id step id DESCRIPTION SCANCEL is used to signal or cancel jobs or job steps An arbitrary number of jobs or job steps may be signaled using job specification filters or a space separated list of specific job and or job step IDs A job or job step can only be signaled by the owner of that job or user root If an attempt is made by an unauthorized user to signal a job or job step an error message will be printed and the job will not be signaled OPTIONS Please refer to the man page for more details on
Figure 8-1. TotalView graphical interface (image taken from http://www.totalviewtech.com/productsTV.htm)

TotalView is a proprietary software application and is not included with the bullx cluster suite delivery.

TotalView is used in the same way as standard symbolic debuggers for C, C++ and Fortran (77, 90 and HPF) programs. It can also debug MPI or OpenMPI applications. TotalView has the advantage of being a debugger which supports multi-processes and multi-threading. It can take control of the various processes or threads of the program and make it possible for the user to visualize the evolution of the execution in the same window or in different windows. The processes may be local or remote. It works equally as well with mono-processor, SMP, clustered, distributed and MPP systems.

TotalView accepts new processes and threads exactly as generated by the application, and regardless of the processor used for the execution. It is also possible to connect to a process started up outside TotalView. Data tables can be filtered, displayed and viewed in order to monitor the behavior of the program. Finally, you can descend into ('call the components and details of') the objects and structures of the program.

The program which needs debugging must be compiled with the -g
l the buffers allocated during the run time of the application are freed afterwards. However, even if he is vigilant, it is not unusual for memory leaks to be introduced into the code.

A simple way to detect these memory leaks is to use the environment variable MALLOC_CHECK_. This variable ensures that allocation routines check that each allocated buffer is freed correctly. The routines then become more tolerant and allow byte overflows on both sides of blocks, or for the block to be released again. According to the value of MALLOC_CHECK_, when a release or allocation error appears, the application behaves as follows:

• If MALLOC_CHECK_ is set to 1, an error message is written when exiting normally.
• If MALLOC_CHECK_ is set to 2, an error message is written when exiting normally and the process aborts. A core file is created. You should check that it is possible to create a core file by using the command ulimit -c. If not, enter the command ulimit -c unlimited.
• For any other value of MALLOC_CHECK_, the error is ignored and no message appears.

Example C program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 256

int main(void)
{
    char *buffer;

    buffer = (char *) calloc(256, sizeof(char));
    if (buffer == NULL) {           /* allocation failed */
        exit(1);
    }
    strcpy(buffer, "fills the buffer");
    free(buffer);
    fprintf(stdout, "Buffer freed for the first time");
    free(buffer);
    fprintf
l2-<version> directory. The environment variables MPI_HOME, PATH, LD_LIBRARY_PATH, MANPATH and PYTHONPATH will need to be set or updated. These variables should not be set by the user. Use the setenv_mpibull2.{sh,csh} environment setting file, which may be sourced from the <mpibull2_install_path>/share directory by a user, or added to the profile for all users by the administrator.

MPIBull2 Compilers

The MPIBull2 library has been compiled with the latest Intel compilers which, according to Bull's test farms, are the fastest ones available for the Xeon architecture. Bull uses the Intel icc and ifort compilers to compile the MPI libraries. It is possible for the user to use their own compilers to compile their applications, for example gcc; however, see below.

In order to check the configuration and the compilers used to compile the MPI libraries, look at the <mpibull2_install_path>/share/doc/compilers_version text file.

MPI applications should be compiled using the MPIBull2 MPI wrappers to compilers:

C programs:    mpicc your_code.c
C++ programs:  mpiCC your_code.cc (or mpicxx your_code.cc for case-insensitive file systems)
F77 programs:  mpif77 your_code.f
F90 programs:  mpif90 your_code.f90

Wrappers to compilers simply add various command line flags and invoke a back-end compiler; they are not compilers in themselves. The command below is used to override the compiler type used by the wrapper: -cc, -fc and -cxx, used for
86. le on your system OPTIONS Please refer to the man page for more details on the options including examples of use Example man sacct Resource Management using SLURM 6 13 6 13 STRIGGER NAME strigger Used to set get or clear SLURM trigger information SYNOPSIS strigger set OPTIONS strigger get OPTIONS strigger clear OPTIONS DESCRIPTION strigger is used to set get or clear SLURM trigger information Triggers include events such as a node failing a job reaching its time limit or a job terminating These events can cause actions such as the execution of an arbitrary script Typical uses include notifying system administrators regarding node failures and terminating a job when its time limit is approaching Trigger events are not processed instantly but a check is performed for trigger events on a periodic basis currently every 15 seconds Any trigger events which occur within that interval will be compared against the trigger programs set at the end of the time interval The trigger program will be executed once for any event occurring in that interval with a hostlist expression for the nodelist or job ID as an argument to the program The record of those events e g nodes which went DOWN in the previous 15 seconds will then be cleared The trigger program must set a new trigger before the end of the next interval to insure that no trigger events are missed If desired multiple trigger p
87. lication can be linked with SLURM s PMI library See e Chapter for more information on SLURM e Section 2 2 6 3 for more information on MPD e Chapter 7 for more information on batch managers and launching jobs on extreme computing clusters 2 2 5 MPIBull2 1 3 x features MPIBull2_1 3 x includes the following features e It only has to be compiled once supports the NovaScale architecture and is compatible with the more powerful interconnects e It is designed so that both development and testing times are reduced and it delivers high performance on NovaScale architectures Parallel Libraries 2 3 2 2 6 2 2 6 1 2 4 e Fully compatible with MPICH2 MPI libraries Just set the library path to get all the MPIBull2 features e Supports both MPI 1 2 and MPI 2 standard functionalities including Dynamic processes osock only Onesided communications Extended collectives Thread safety see the Thread Safety Section below including the latest patches developed by Bull e Multi device functionality delivers high performance with an accelerated multi device support layer for fast interconnects The library supports Sockets based messaging for Ethernet SDP SCI and EIP Hybrid shared memory based messaging for shared memory InfiniBand architecture multirails driver Gen2 e Easy Runtime Selection makes it easy and cost effective to support multiple platforms With MPIBull2 Librar
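Picking up the one-sided communications item from the feature list above, the sketch below is illustrative rather than taken from the manual; it shows the MPI-2 RMA calls in their simplest form and assumes at least two processes (for example mpirun -np 2).

    /* onesided.c - illustrative sketch of MPI-2 one-sided communication. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size, value = 0;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size < 2) {
            if (rank == 0)
                fprintf(stderr, "run this example with at least two processes\n");
            MPI_Finalize();
            return 1;
        }

        /* every process exposes one integer in a window */
        MPI_Win_create(&value, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);
        if (rank == 0) {
            int payload = 42;
            /* rank 0 writes into rank 1's window; rank 1 posts no receive */
            MPI_Put(&payload, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
        }
        MPI_Win_fence(0, win);

        if (rank == 1)
            printf("rank 1 received %d through MPI_Put\n", value);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }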
88. llx cluster suite User s Guide If the user lacks a necessary dot file the script will copy one over from the skeleton directory The user will have to logout and login for it to come into effect Another way is for the system administrator to su username to each user and run it interactively The process can be semi automated with a single line command that obviates the need for direct interaction Su username c yes opt modules modules default bin add modules Power users can create a script to directly parse the etc passwd file to perform this command Otherwise just copy the passwd file and edit it to execute this command for each valid user 5 4 Module Files Once the above steps have been performed then it is important to have module files in each of the modulefiles directories For example the following module files will be installed 55585 58 opt modules 3 0 9 rko modulefiles dot module info modules null use own If you do not have your own module files in opt modules modulefiles then copy null to that directory On some systems an empty modulefiles directory will cause a core dump whilst on other systems there will be no problem Use opt modules default modulefiles modules as a template for creating your own module files For more information run module load modules You will then have ready access to the module 1 modulefile 4 man pages as well as the versions directory Study the m
89. me computing Environment 1 3 1 4 bullx cluster suite User s Guide Chapter 2 Parallel Libraries This chapter describes the following topics e 2 1 Overview of Parallel Libraries e 2 2 MPIBull2 e 2 3 mpibull2 params e 2 4 Managing your MPI environment e 2 5 Profiling with mpianalyser 2 1 Overview of Parallel Libraries A common approach to parallel programming is to use a message passing library where a process uses library calls to exchange messages information with another process This message passing allows processes running on multiple processors to cooperate Simply stated a MPI Message Passing Interface provides a standard for writing message passing programs A MPI application is a set of autonomous processes each one running its own code and communicating with each other through calls to subroutines of the MPI library Programming with MPI It is not in the scope of the present guide to describe how to program with MPI Please refer to the Web where you will find complete information Parallel Libraries 2 1 2 2 2 2 1 2 2 2 2 2 MPIBull2 MPIBull2 is a second generation MPI library This library enables dynamic communication with different device libraries including InfiniBand IB interconnects socket Ethernet IB EIB devices or single machine devices MPIBull2 conforms to the MPI 2 standard Quick Start for MPIBull2 MPIBULL2 is usually installed in the opt mpi mpibul
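As a concrete illustration of the message-passing model described in the overview above (a sketch, not an excerpt from this guide), two processes exchange a single integer with the basic point-to-point calls.

    /* pingpong.c - illustrative send/receive example; run with at least two processes. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size, msg;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (size < 2) {
            if (rank == 0)
                fprintf(stderr, "run this example with at least two processes\n");
            MPI_Finalize();
            return 1;
        }

        if (rank == 0) {
            msg = 123;
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);      /* send to rank 1 */
        } else if (rank == 1) {
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("rank 1 received %d from rank 0\n", msg);
        }

        MPI_Finalize();
        return 0;
    }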
90. mily names which are possible for a particular cluster environment Some of the parameter family names which are possible for bullx cluster suite are listed below LK Ethernet Core driver LK IPvA route LK IPvA driver OpenFabrics IB driver Marmot Debugging Library MPI Collective Algorithms MPI Errors CH3 drivers CHG drivers Shared Memory Execution Environment Infiniband RDMA IMBR mpibull2 driver Infiniband Gen2 mpibull2 driver UDAPL mpibull2 driver IBA VAPI mpibull2 driver MPIBull2 Postal Service MPIBull2_Romio Run the command mpibull2 params lt fl gt lt family gt to see the list of individual parameters included in the parameter families used within your cluster environment Parallel Libraries 2 17 2 4 2 18 Managing your MPI environment Bull provides different MPI libraries for different user requirements In order to help users manage different environment configurations Bull also ships Modules which can be used to switch from one MPI library environment to another This relies on the module software see Chapter 5 The directory used to store the module files is opt mpi modulefiles into which the different module files that include the mpich vitmpi libraries for InfiniBand and MPIBull2 environments are placed 9 ceri It is recommended that a file is created for example 99 mpimodules sh and 99 mpimodules sh csh and this is added to the etc profile d directory The line below sh
n, etc., as long as C/C++ routines are called.

Using the makefile system for the CUDA development environment
Carry out the following steps to use the makefile system.
1. Create the directory for the application code and populate it with the .cu and .cpp source files.
2. Set the environment:
       module load cuda
3. Create a makefile to build your application, as shown below.

       # Add source files here
       EXECUTABLE  := bitonic
       # CUDA source files (compiled with nvcc)
       CUFILES     := bitonic.cu
       # C/C++ source files (compiled with gcc/c++)
       CCFILES     := bitonic_gold.cpp

       ################################################################
       # Rules and targets
       ifneq ($(CUDA_MAKEFILE),)
       include $(CUDA_MAKEFILE)
       endif

The makefile in the example above builds an application named bitonic from two source files, bitonic.cu and bitonic_gold.cpp.

For the .cu file, nvcc wraps C++ by default, so the SEEK_* variables must be undefined before the mpi.h prototype file is included, as the two name spaces collide:

       #undef SEEK_SET
       #undef SEEK_END
       #undef SEEK_CUR
       #include <mpi.h>

       int main(int argc, char **argv)
       {
           CUT_DEVICE_INIT(argc, argv);
           int values[NUM];
           int err;
           MPI_Init(NULL, NULL);
           for (int i = 0; i < NUM; i++)
               ...
           int *dvalues;
           CUDA_SAFE_CALL(cudaMalloc((void **)&dvalues, sizeof(int)
92. n features including auto vectorization High Performance Parallel Optimizer HPO Interprocedural Optimization IPO Profile Guided Optimization PGO and Optimized Code Debugging e Multithreaded Application Support including OpenMP and Auto Parallelization to convert serial applications into parallel applications to fully exploit the processing power that is available e Data preloading e loop unrolling The Professional Edition includes Intel Threading Building Blocks Intel TBB Intel Integrated Performance Primitives Intel IPP and the Intel Math Kernel Library Intel MKL with its optimized functions for maths processing It is also compatible with GNU products See www intel com for more details 4 2 bullx cluster suite User s Guide Different versions of the compiler may be installed to ensure compatibility with the compiler version used to compile the libraries and applications on your system Note It may be necessary to contact the System Administrator to ascertain the location of the compilers on your system The paths shown in the examples below may vary To specify a particular environment use the command below source opt intel Compiler maj ver nb min ver nb bin iccvars sh intel64 For example e use version 11 0 069 of the C C compiler source opt intel Compiler 11 0 069 bin iccvars sh intel64 display the version of the active compiler enter icc version
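To illustrate the OpenMP and auto-parallelization support listed among the compiler features above, here is a minimal sketch that is not taken from this manual; the OpenMP compiler flag is an assumption (for example -openmp with this generation of icc) and should be checked against the compiler documentation.

    /* omp_sum.c - illustrative OpenMP example; compile with the compiler's
       OpenMP flag (assumed here) and set OMP_NUM_THREADS before running. */
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        const int n = 1000000;
        double sum = 0.0;
        int i;

        /* iterations are divided among the threads; the reduction clause
           combines the per-thread partial sums at the end of the loop */
        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < n; i++)
            sum += 1.0 / (double)(i + 1);

        printf("max threads: %d, sum = %f\n", omp_get_max_threads(), sum);
        return 0;
    }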
nder /opt/scilibs/SCISTUDIO_SHELF/SciStudio_shelf-<version>/BLACS/blacs-<ver>.

Using BLACS
BLACS is located in the following directory:
    /opt/scilibs/BLACS/blacs-<version>/mpibull2-<version>
The libraries include the following:
    libblacsCinit_MPI-LINUX-0.a
    libblacsF77init_MPI-LINUX-0.a
    libblacs_MPI-LINUX-0.a

Testing the Installation of the Library
The installation of the library can be tested using the tests found in the following directory:
    /opt/scilibs/BLACS/blacs-<version>/mpibull2-<version>/tests

Setting Up the Environment
First, the PATH and LD_LIBRARY_PATH variables must be set up to point to the MPI libraries that are to be tested:
    export MPI_HOME=/opt/mpi/mpibull2-<version>
    export PATH=$MPI_HOME/bin:$PATH
    export LD_LIBRARY_PATH=$MPI_HOME/lib:$LD_LIBRARY_PATH

Running the Tests
Then run the tests as follows:
    mpirun -np 4 xCbtest
    mpirun -np 4 xFbtest

3.2.3 SCALAPACK
SCALAPACK stands for SCALable Linear Algebra PACKage. This library is the scalable version of LAPACK. Both libraries use block partitioning to reduce data exchanges between the different memory levels to a minimum. SCALAPACK is used above all for eigenvalue problems and factorizations (LU, Cholesky and QR). Matrices are distributed using BLACS. More information is available from the documentation included in the SciStudio_shelf rpm.
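Returning to BLACS: the libraries listed above also provide a C interface. The sketch below is an assumption based on the usual BLACS C bindings rather than text from this manual; the prototypes are declared inline because the header location varies between installations. It initializes a 2x2 process grid, can be built with the mpicc wrapper linked against the BLACS libraries, and is intended to be run with 4 MPI processes.

    /* blacs_grid.c - hypothetical sketch of a 2x2 BLACS process grid. */
    #include <stdio.h>

    /* C-interface prototypes (normally supplied by the BLACS installation) */
    void Cblacs_pinfo(int *mypnum, int *nprocs);
    void Cblacs_get(int context, int request, int *value);
    void Cblacs_gridinit(int *context, char *order, int nprow, int npcol);
    void Cblacs_gridinfo(int context, int *nprow, int *npcol, int *myrow, int *mycol);
    void Cblacs_gridexit(int context);
    void Cblacs_exit(int error_code);

    int main(void)
    {
        int mypnum, nprocs, context;
        int nprow = 2, npcol = 2, myrow, mycol;

        Cblacs_pinfo(&mypnum, &nprocs);          /* who am I, how many processes */
        Cblacs_get(-1, 0, &context);             /* obtain the default system context */
        Cblacs_gridinit(&context, "Row", nprow, npcol);
        Cblacs_gridinfo(context, &nprow, &npcol, &myrow, &mycol);
        printf("process %d of %d sits at grid position (%d,%d)\n",
               mypnum, nprocs, myrow, mycol);

        Cblacs_gridexit(context);
        Cblacs_exit(0);
        return 0;
    }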
Displaying the results for a job ........ 7-2
........ 7-2
Exiting a job ........ 7-3
7.2 Launching an Application without a Batch Manager ........ 7-3
Chapter 8 Application Debugging Tools ........ 8-1
8.1 ........ 8-1
8.2 GDB ........ 8-1
8.3 ........ 8-1
8.4 TotalView ........ 8-2
8.5 DDT ........ 8-3
8.6 MALLOC_CHECK_ Debugging Memory Problems in C ........ 8-5
8.7 Electric Fence ........ 8-7
Glossary and Acronyms ........ G-1
Index ........ I-1

List of Figures
Figure 2-1  MPIBull2 Linking Strategies ........ 2-5
Figure 2-2  MPD ........ 2-6
Figure 3-1  Bull Scientific Studio ........ 3-2
Figure 3-2  Interdependence of the different mathematical libraries, Scientific Studio and Intel ........ 3-4
Figure 6-1  MPI Process Management With and Without Resource Manager ........ 6-2
Figure 8-1  TotalView graphical interface (image taken from http://www.totalviewtech.com/productsTV.htm) ........ 8-2
Figure 8-2  The Graphical User Interface for DDT ........ 8-4

List of Tables
95. nit modulespath The initial search path setup for module files This file is read by all shell init files MODULEPATH moduleavailcache File containing the cached list of all modulefiles for each directory in the MODULEPATH only when the avail cache is enabled MODULEPATH moduleavailcachedir File containing the names and modification times for all sub directories with an avail cache HOME modulesbeginenv A snapshot of the user s environment taken when Modules are initialized This information is used by the module update sub command The User s Environment 5 13 5 6 5 6 1 5 14 The NVIDIA CUDA Development Environment For clusters which include NVIDIA Tesla graphic accelerators the NVIDIA Compute Unified Device Architecture CUDA Toolkit is installed automatically on the LOGIN COMPUTE and COMPUTEX nodes so that the NVIDIA nvcc C compiler is in place for the application Note The NVIDIA Tesla C1060 card is used on NovaScale RA25 servers only whereas the NVIDIA Tesla 1070 accelerator is used by both NovaScale R422 E1 and R425 servers CUDA is a parallel programming environment designed to scale parallelism so that all the processor cores available are exploited As it builds on C extensions the CUDA development environment is easily mastered by application developers At its core are three key abstractions a hierarchy of thread groups shared memories and barrier synchronizations These abstract
nstalled (this assumes the /usr folder), build the Global Accounting API library by going to the /usr/lib/slurm/bullacct folder and executing the following command:

    make -f makefile.lib

This will build the library libcombine_acct.a. The makefile.lib file assumes that the SLURM product is installed in the /usr folder and LSF is installed in /app/slurm/lsf/6.2. If this is not the case, the SLURM_BASE and LSF_BASE variables in the makefile.lib file must be modified to point to the correct location.

2. After the library is built, add the library /usr/lib/slurm/bullacct/libcombine_acct.a to the link options when building an application that will use this API.

3. In the user application program, add the following for the new accounting record (this assumes SLURM is installed under the /opt/slurm folder):

    #include "/usr/lib/slurm/bullacct/combine_acct.h"

    /* file pointers for the LSF and SLURM log files */
    FILE *lsb_acct_fg   = NULL;   /* file pointer for the LSF accounting log file */
    FILE *slurm_acct_fg = NULL;   /* file pointer for the SLURM log file */
    int status, jobId;
    struct CombineAcct newAcct;   /* variable for the new records */

    /* call the cacct_init routine to open the LSF and SLURM log files
       and initialize the newAcct structure */
    status = cacct_init(&lsb_acct_fg, &slurm_acct_fg, &newAcct);

If the status returned is 0, implying no error (all log files were opened successfully), then call the get_combine_acct_info routine to get the
o develop software using multi-agent or master/server paradigms. They provide a mechanism to establish communication between newly created processes and an existing MPI application (MPI_COMM_SPAWN). They also provide a mechanism to establish communication between two existing MPI applications, even when one did not start the other (MPI_PUBLISH_NAME).

The MPI_PUBLISH_NAME structure is:
    MPI_PUBLISH_NAME(service_name, info, port_name)
        IN  service_name   a service name to associate with the port (string)
        IN  info           implementation-specific information (handle)
        IN  port_name      a port name (string)

Although these paradigms are useful for extreme computing clusters, there may be a performance impact. MPIBull2 includes these Dynamic Process Services, but with some restrictions:
• Only the osock socket MPI driver can be used with these dynamic processes.
• A PMI server implementing spawn answering routines must be used, as follows:
  - For all Bull clusters, the MPD sub-system is used; see the sections above for more details.
  - For clusters which use SLURM, an MPD ring must be deployed once SLURM's allocation has been guaranteed.
  - PBS Professional clusters can use MPD without any restrictions.
• The quantity of processes which can be spawned depends on the reservation previously allocated with the Batch Manager/Scheduler, if used.

See the chapter on Process Creation and Management in the MPI 2.1 Standard documentation, available f
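To make the publish/accept sequence concrete, here is a server-side sketch; it is illustrative only, the service name "toy_service" and the single-integer exchange are invented, and a name server (such as the MPD sub-system mentioned above) is assumed to be available.

    /* name_server.c - hypothetical server using the MPI-2 name publishing calls. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        char port_name[MPI_MAX_PORT_NAME];
        MPI_Comm client;
        MPI_Status status;
        int data;

        MPI_Init(&argc, &argv);

        MPI_Open_port(MPI_INFO_NULL, port_name);                  /* the system picks a port */
        MPI_Publish_name("toy_service", MPI_INFO_NULL, port_name);

        /* block until a client connects to the published port */
        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
        MPI_Recv(&data, 1, MPI_INT, 0, 0, client, &status);
        printf("server received %d from the client\n", data);

        MPI_Unpublish_name("toy_service", MPI_INFO_NULL, port_name);
        MPI_Comm_disconnect(&client);
        MPI_Close_port(port_name);
        MPI_Finalize();
        return 0;
    }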
98. odule command Suppose there is a change from 3 0 5 rko to version 3 0 6 rko The goal is to semi automate the changes to the user dot files so that the user is oblivious to the change The first step is to install the new module command amp files to opt modules 3 0 6 rko Test it out by loading with module load modules 3 0 6 rko You may get an error like 3 0 6 rko 25 ERROR 152 Module modules is currently not loaded This is OK and should not appear with future versions Make sure you have the new version with module version If it seems stable enough then advertise it to your more adventurous users Once you are satisfied that it appears to work adequately well then go into opt modules remove the old default symbolic link to the new versions For example cd opt modules rm default ln s 3 0 6 rko default This new version is now the default and will be referenced by all the users that log in and by those that have not loaded a specific module command version bullx cluster suite User s Guide 5 9 34 1 The Module Command Synopsis module switches sub command sub command args The Module command provides a user interface to the Modules package The Modules package provides for the dynamic modification of the user s environment via modulefiles Each modulefile contains the information needed to configure the shell for an application Once the Modules package is initialized
99. of the machine issuing the packet and the destination MAC is the broadcast address xx xx xx xx xx xx Ordinarily no reply packet will occur Gratuitous ARP reply is a reply to which no request has been made GSL GNU Scientific Library GT s Giga transfers per second GUI Graphical User Interface GUID Globally Unique Identifier H HBA Host Bus Adapter HCA Host Channel Adapter HDD Hard Disk Drive HoQ Head of Queue HPC High Performance Computing Hyper Threading A technology that enables multi threaded software applications to process threads in parallel within each processor resulting in increased utilization of processor resources IB InfiniBand IBTA InfiniBand Trade Association ICC Intel C Compiler IDE Integrated Device Electronics IFORT Intel Fortran Compiler IMB Intel MPI Benchmarks INCA Integrated Cluster Architecture Bull Blade platform IOC Input Output Board Compact with 6 PCI Slots IPMI Intelligent Platform Management Interface IPO Interprocedural Optimization IPoIB Internet Protocol over InfiniBand IPR IP Router iSM Storage Manager FDA storage systems ISV Independent Software Vendor Glossary and Acronyms G 3 K KDC Key Distribution Centre KSIS Utility for Image Building and Deployment KVM Keyboard Video Mouse allows the keyboard video monitor and mouse to be connected to the node L LAN Local Area Network LAP
100. of this library is as follows opt scilibs BLOCKSOLVE95 BlockSolve95 lt version gt mpibull2 lt version gt lib libO linux The following library is provided libBS95 a Some examples are also provided in the following directory opt scilibs BLOCKSOLVE95 BlockSolve95 lt version gt mpibull2 lt version gt examples More information is available from documentation included in the SciStudio_shelf rpm When this is installed the documentation files will be located under opt scilibs SCISTUDIO_SHELF SciStudio_shelf lt version gt BLOCKSOLVE95 BlockSolve95 lt ver gt lapack lapack_sci is a set of Fortran 77 routines used to resolve linear algebra problems such as the resolution of linear systems eigenvalue computations matrix computations etc However it is not written for a parallel architecture The default installation of this library is as follows opt scilibs LAPACK_SCI lapack_sci lt version gt Scientific Libraries 3 5 3 2 0 3 2 7 3 6 More information is available from documentation included in the SciStudio_shelf rpm When this is installed the documentation files will be located under opt scilibs SCISTUDIO_SHELF SciStudio_shelf lt version gt LAPACK SCI lt version gt SuperLU This library is used for the direct solution of large sparse nonsymmetrical systems of linear equations on high performance machines The routines will perform an LU decomposition with partial pivoting and triangular system
101. onfiguration into a file bullx cluster suite User s Guide Restore a configuration from a file h Show help message and exit Options The following options and arguments are possible for the mpibull2 params command Note options shown can be combined for example li or can be listed separately for example l i The different option combinations for each argument are shown below 4 iv PNAME List current default values of all MPI parameters Use the PNAME argument this could be a list to specify a precise MPI parameter name or just a part of a name Use the v verbose option to display all possible values including the default Use the i option to list all information Examples This command will list all the parameters with the string all or shm in their name mpibull2 params grep e all e shm will return the same result mpibull2 params 1 all shm This command will display all information possible values family purpose etc for each parameter name which includes the string all This command will also indicate when the current value has been returned by getenv i e the parameter has been modified in the current environment mpibull2 params li all This command will display current and possible values for each parameter name which includes the string rom It is practical to run this command before a parameter is modified mpibull2 params lv rom f FNAME
102. ould be pasted into this file This will make the configuration environment available to all users module use a opt mpi modulefiles 1 To check the modules which are available run the following command module av This will give output similar to that below a opt mpi modulefiles mpibull2 1 2 8 1 t mpich 1 2 7 pl vitmpi 24 1 2 To see which modules are loaded run the command module li This will give output similar to that below Currently Loaded Modulefiles 1 oscar modules 1 0 3 3 To change MPI environments run the following commands according to your needs module load mpich module li Currently Loaded Modulefiles 1 oscar modules 1 0 3 2 mpich 1 2 7 pl 4 To check which MPI environment is loaded run the command below which mpicc This will give output similar to that below opt mpi mpich 1 2 7 pl bin mpicc bullx cluster suite User s Guide 5 To remove a module e g mpich run the command below module rm mpich 6 Then load the new MPI environment by running the load command as below module load mpibull2 2 5 Profiling with mpianalyser mpianalyser is a profiling tool developed by Bull for its own MPI Bull implementation This is a non intrusive tool which allows the display of data from counters that has been logged when the application is run See Chapter 1 in the Application Tuning Guide for details on mpianalyser
103. pplication and to the MPIBULL2 library It is possible to specify the usage of this library manually by using the MPIBULL2 USE MPI MARMOT environment variable as shown in the example below export MPIBULL2 USE MPI MARMOT 1 mpicc bench c o bench or by using the marmot option with the MPI compiler wrapper as shown below mpicc marmot bench c o bench See the documentation in the share section of the marmot package or go to http www hlrs de organization amt projects marmot for more details on Marmot Parallel Libraries 2 13 2 3 2 3 1 2 14 mpibull2 params mpibull2 params is a tool that is used to list modify save restore the environment variables that are used by the mpibull2 library and or by the communication device libraries InfiniBand Quadrics etc The behaviour of mpibull2 MPI library may be modified using environment variable parameters to meet the specific needs of an application The purpose of the mpibull2 params tool is to help mpibull2 users to manage different sets of parameters For example different parameter combinations can be tested separately on a given application in order to find the combination that is best suited to its needs This is facilitated by the fact that mpibull2 params allow parameters to be set unset dynamically Once a specific combination of parameters has been tested and found to be good for a particular context they can be saved into a file by a mpibull
104. r in the slurm conf configuration file partition is the name of a Slurm partition on that cluster account is the bank account for a job The intended mode of operation is to initiate the sacctmgr command add delete modify and or list association records then commit the changes and exit OPTIONS Please refer to the man page for more details on the options including examples of use Example man sacctmgr bullx cluster suite User s Guide 6 8 SBCAST sbcast is used to copy a file to local disk on all nodes allocated to a job This should be executed after a resource allocation has taken place and can be faster than using a single file system mounted on multiple nodes NAME sbcast transmit a file to the nodes allocated to a SLURM job SYNOPSIS sbcast CfpsvV SOURCE DEST DESCRIPTION sbcast is used to transmit a file to all nodes allocated to the SLURM job which is currently active This command should only be executed within a SLURM batch job or within the shell spawned after the resources have been allocated to a SLURM SOURCE is the name of the file on the current node DEST should be the fully qualified pathname for the file copy to be created on each node DEST should be on the local file system for these nodes Note Parallel file systems may provide better performance than sbcast OPTIONS Please refer to the man page for more details on the options including examples of use E
105. rce Management using SLURM Describes the SLURM Resource Management utilities and commands Chapter 7 Batch Management and Launching an Application Describes how to use the PBS Professional Batch Manager and different program launching options Chapter 8 Debugging Tools Describes some debugging tools Glossary and Acronyms Provides a Glossary and lists some of the Acronyms used in the manual Bibliography Refer to the manuals included on the documentation CD delivered with your system OR download the latest manuals for your bullx cluster suite release and for your cluster hardware from http support bull com The bullx cluster suite Documentation CD ROM 86 A2 12FB includes the following manuals e bullx cluster suite Installation and Configuration Guide 86 A2 19FA e bullx cluster suite Administrator s Guide B6 2 20FA e bullx cluster suite User s Guide 86 2 22FA e bullx cluster suite Maintenance Guide 86 2 24FA e bullx cluster suite Application Tuning Guide 86 A2 23FA e bullx cluster suite High Availability Guide 86 A2 25FA e InfiniBand Guide 86 A2 42FD e LDAP Authentication Guide 86 A2 41FD The following document is delivered separately e The Software Release Bulletin SRB 86 A2 73E The Software Release Bulletin contains the latest information for your delivery This should be read first Contact your support representative for more information For Bull Sys
106. rd NIS Network Information Service NS NovaScale NTP Network Time Protocol NUMA Non Uniform Memory Access NVRAM Non Volatile Random Access Memory O OFA Open Fabrics Alliance OFED Open Fabrics Enterprise Distribution OPMA Open Platform Management Architecture OpenSM Open Subnet Manager OpenlB Open InfiniBand OpenSSH Open Source implementation of the SSH protocol OSC Object Storage Client OSS Object Storage Server OST Object Storage Target P PAM Platform Administration and Maintenance Software PAPI Performance Application Programming Interface PBLAS Parallel Basic Linear Algebra Subprograms PBS Portable Batch System PCI Peripheral Component Interconnect Intel PDSH Parallel Distributed Shell PDU Power Distribution Unit PETSc Portable Extensible Toolkit for Scientific Computation PGAPACK Parallel Genetic Algorithm Package Glossary and Acronyms G 5 PM Performance Manager Platform Management PMI Process Management Interface PMU Performance Monitoring Unit pNETCDF Parallel NetCDF Network Common Data Form PVFS Parallel Virtual File System Q QDR Quad Data Rate QoS Quality of Service A set of rules which guarantee a defined level of quality in terms of transmission rates error rates and other characteristics for a network R RAID Redundant Array of Independent Disks RDMA Remote Direct Memory Acces
107. rograms can be set for the same event This command can only set triggers if run by the user SlurmUser unless SlurmUser is configured as root user This is required for the slurmctld daemon to set the appropriate user and group IDs for the executed program Also note that the program is executed on the same node that the slurmctld daemon uses rather than on an allocated Compute Node To check the value of SlurmUser run the command scontrol show config grep SlurmUser OPTIONS Please refer to the man page for more details on the options including examples of use Example man strigger 6 14 bullx cluster suite User s Guide 6 14 SVIEW NAME sview Graphical user interface to view and modify SLURM state Note This command requires an XWindows capable display SYNOPSIS sview DESCRIPTION sview can be used to view the SLURM configuration job step node and partition state information Authorized users can also modify select information The primary display modes are Jobs and Partitions each with a selection tab There is also an optional map of the nodes on the left side of the window which will show the nodes associated with each job or partition Left click on the tab of the display you would like to see Rightclick on the tab in order to control which fields will be displayed Within the display window left click on the header to control the sort order of entries e g increa
rom http://www.mpi-forum.org/docs, for more information.

MPI Ports Publishing Example
The manual shows this example as a two-column Server/Client trace; the recoverable elements are summarized below.

    Server command:  mpiexec -n 1 server
    Client command:  mpiexec -n 4 toy

Both sides print the library banner: MPIBull2-1.3.9 s (Astlik) MPI_THREAD_FUNNELED, device osock.

Server side: MPI_Open_port and MPI_Publish_name ("Server is waiting for connections"), MPI_Comm_accept ("Master available", "Received from 0"), then the communicators are merged ("Now time to merge the communication"); the master distributes its work ("Master: number of tasks to distribute 10", "Sent a message to the following MPI process"), accepts communication from each slave in turn ("Slave 1 available", "Slave 2 available", "Slave 3 available", "Disconnected from slave"), and finally calls MPI_Unpublish_name and MPI_Close_port.

Client side: MPI_Get_attribute ("Got the universe size from server"), MPI_Lookup_name ("Lookup found service, attag 0, port 35453, description 10.11.0.11, ifname 10.11.0.11 port x4"), MPI_Comm_connect, then MPI_Send / MPI_Recv exchanges over the inter-communicator ("Sent stuff to the commInter", "Recv stuff to the commInter"); each slave process works on the merged communicator ("Slave Process at work, merge comm") and sends its message back to the master before disconnecting.
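A matching client-side sketch (again an illustration, not the toy program used in the trace above) looks up the published service and sends one integer to the server; the service name "toy_service" matches the hypothetical server sketch given earlier.

    /* name_client.c - hypothetical client using MPI_Lookup_name / MPI_Comm_connect. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        char port_name[MPI_MAX_PORT_NAME];
        MPI_Comm server;
        int data = 42;

        MPI_Init(&argc, &argv);

        MPI_Lookup_name("toy_service", MPI_INFO_NULL, port_name);   /* find the published port */
        MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);

        MPI_Send(&data, 1, MPI_INT, 0, 0, server);   /* send to rank 0 on the server side */
        printf("client sent %d to the server\n", data);

        MPI_Comm_disconnect(&server);
        MPI_Finalize();
        return 0;
    }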
109. rted by creating a subdirectory for the application containing modulefiles for each version use directory directory e Prepend directory to the MODULEPATH environment variable The append flag will append the directory to MODULEPATH use a append directory directory e Remove directory from the MODULEPATH environment variable unuse directory directory e Attempt to reload all loaded modulefiles The environment will be reconfigured to match the saved HOME modulesbeginenv and the modulefiles will be reloaded The update command will only change the environment variables that the modulefiles set update e Force the Modules Package to believe that no modules are currently loaded clear Unload all loaded modulefiles purge e Display the modulefile information set up by module whatis commands inside the specified modulefiles If no modulefiles are specified all the whatis information lines will be shown whatis modulefile modulefile The User s Environment 5 11 Sid 5 12 e Searches through the whatis information of all modulefiles for the specified string All module whatis information matching the search string will be displayed apropos string keyword string e Add modulefile to the shell s initialization file in the user s home directory The startup files checked are cshrc login and csh variables for the C Shell profile for the
110. s ROM Read Only Memory RPC Remote Procedure Call RPM RPM Package Manager G 6 bullx cluster suite User s Guide RSA Rivest Shamir and Adleman the developers of the RSA public key cryptosystem 5 SA Subnet Agent SAFTE SCSI Accessible Fault Tolerant Enclosures SAN Storage Area Network SCALAPACK SCALable Linear Algebra PACKage SCSI Small Computer System Interface SCIPORT Portable implementation of CRAY SCILIB SDP Socket Direct Protocol SDPOIB Sockets Direct Protocol over Infiniband SDR Sensor Data Record Single Data Rate SFP Small Form factor Pluggable transceiver extractable optical or electrical transmitter receiver module SEL System Event Log SIOH Server Input Output Hub SIS System Installation Suite SL Service Level SL2VL Service Level to Virtual Lane SLURM Simple Linux Utility for Resource Management an open source highly scalable cluster management and job scheduling system SM Subnet Manager SMP Symmetric Multi Processing The processing of programs by multiple processors that share a common operating system and memory SNMP Simple Network Management Protocol SOL Serial Over LAN SPOF Single Point of Failure SSH Secure Shell Syslog ng System Log New Generation T TCL Tool Command Language TCP Transmission Control Protocol TFTP Trivial File Transfer Protocol TGT Ticket Granting Ticket U
111. s of shared data is provided The libraries for ga are located in the following directory opt opens GA ga lt version gt mpibull2 lt version gt lib More information is available from documentation included in the OpenS shelf rpm When this is installed the documentation files will be located under opt opens OPENS_SHELF OpenS_shelf lt version gt GlobalArray ga lt version gt gsl The GNU Scientific Library GSL is a numerical library for C and C programmers It is free software provided under the GNU General Public License The library provides a wide range of mathematical routines such as random number generators special functions and least squares fitting There are over 1000 functions in total with an extensive test suite The complete range of subject areas covered by the library includes bullx cluster suite User s Guide Complex Numbers Roots of Polynomials Special Functions Vectors and Matrices Permutations Sorting BLAS Support Linear Algebra Eigensystems Fast Fourier Transforms Quadrature Random Numbers Quasi Random Sequences Random Distributions Statistics Histograms N Tuples Monte Carlo Integration Simulated Annealing Differential Equations Interpolation Numerical Differentiation Chebyshev Approximation Series Acceleration Discrete Hankel Transforms RootFinding Minimization LeastSquares Fitting Physical Constants IEEE Floating Point Discrete Wavelet Transforms Basis splines The gsl libraries can be found
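As a taste of the library, the classic GSL example below evaluates one of the special functions listed above; it is a sketch rather than an excerpt from this guide, and the link line given in the comment is an assumption that may differ on a given installation.

    /* gsl_bessel.c - illustrative GSL example; typically linked with
       -lgsl -lgslcblas -lm (link flags are an assumption). */
    #include <stdio.h>
    #include <gsl/gsl_sf_bessel.h>

    int main(void)
    {
        double x = 5.0;
        double y = gsl_sf_bessel_J0(x);   /* regular cylindrical Bessel function J0(x) */

        printf("J0(%g) = %.18e\n", x, y);
        return 0;
    }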
112. s solves through forward and back substitution The factorization routines can handle non square matrices but the triangular solves are performed only for square matrices The matrix commands may be pre ordered either through library or user supplied routines This pre ordering for sparse equations is completely separate from the factorization Working precision iterative refinement subroutines are provided for improved backward stability Routines are also provided to equilibrate the system estimate the condition number calculate the relative backward error and estimate error bounds for the refined solutions SuperLU_Dist is used for distributed memory More information is available from documentation included in the SciStudio_shelf rpm When this is installed the documentation files will be located under opt scilibs SCISTUDIO_SHELF SciStudio_shelf lt version gt SUPERLU_DIST SuperLU_DISC lt version gt opt scilibs SCISTUDIO_SHELF SciStudio_shelf lt version gt SUPERLU_MT SuperLU_MT lt version gt opt scilibs SCISTUDIO_SHELF SciStudio_shelf lt version gt SUPERLU_SEQ SuperLU_SEQ lt version gt SuperLU Libraires The following SuperLU Libraries are provided opt scilibs SUPERLU_DIST SuperLU_DIST lt version gt mpibull2 lt version gt lib superlu_Inx_x86_64 a opt scilibs SUPERLU_MT SuperLU MT lt version gt lib superlu_mt_PTHREAD a opt scilibs SUPERLU_SEQ SuperLU SEQ 2 0 lib superlu_x86_64 a opt scilibs SUPERLU_SEQ SuperLU SEQ3
113. sing or decreasing in the display You can also left click and drag the headers to move them right or left in the display If a JobID has an arrow next to it click on that arrow to display or hide information about that job s steps Right click on a line of the display to get more information about the record There is an Admin Mode option which permits the root user to modify many of the fields displayed such as node state or job time limit In the mode a SLURM Reconfigure Action is also available It is recommended that Admin Mode be used only while modifications are actively being made Disable Admin Mode immediately after the changes to avoid making unintended changes OPTIONS Please refer to the man page for more details on the options including examples of use Example man sview See https computing lInl gov linux slurm documentation html for more information Resource Management using SIURM 6 15 6 15 Global Accounting API Note The Global Accounting API only applies to clusters which use SLURM and the Load Sharing Facility LSF batch manager from Platform Computing together Both the LSF and SLURM products can produce an accounting file The Global Accounting API offers the capability of merging the data from these two accounting files and presenting it as a single record to the program using this API Perform the following steps to call the Global Accounting API 1 After SLURM has been i
        fprintf(stdout, "Buffer freed for the second time\n");
        return 0;
    }

A program which is executed with the environment variable MALLOC_CHECK_ set to 1 gives the following result:

    export MALLOC_CHECK_=1
    ./example
    Buffer freed for the first time
    Segmentation fault

    ulimit -c
    0

The limit for the core file size must be changed to allow files bigger than 0 bytes to be generated:

    ulimit -c unlimited

This allows an unlimited core file to be generated.

A program which is executed with the environment variable MALLOC_CHECK_ set to 2 gives the following result:

    export MALLOC_CHECK_=2
    ./example
    Buffer freed for the first time
    Segmentation fault (core dumped)

Example Program Analysis using the GDB Debugger
The core file should be analyzed to identify where the problem is; the program should be compiled with the -g option.

    gdb example core
    GNU gdb 6.3-debian
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "i386-linux". Using host libthread_db library /lib/libthread_db.so.1.
    Core was generated by `example'.
    Program terminated with signal 11, Segmentation fault.
115. stem and Installation The bullx cluster suite is based on a standard Linux distribution combined with a number of Open Source applications that exploit the best from the Open Systems community This combined with technology from Bull and its partners results in a powerful complete solution for the development execution and management of parallel and serial applications simultaneously Its key features are Strong manageability through Bull s systems management suite that is linked to state of the art workload management software e High bandwidth low latency interconnect networks e Scalable high performance file systems both distributed and parallel All cluster nodes use the same Linux distribution Parallel commands are provided to supply users and system administrators with single system attributes which make it easier to manage and to use cluster resources Software installation is carried out by first creating an image on a node loading this image onto the Management Node and then distributing it to the other nodes using the Image Building and Deployment KSIS utility This distribution is performed via the administration network Introduction to the extreme computing Environment 1 1 1 2 Program Execution Environment When a user logs onto the system the login session is directed to one of several nodes where the user may then develop and execute their applications Applications can be executed on other
116. tem Manager refer to the Bull System Manager documentation suite For clusters which use the PBS Professional Batch Manager e 5 Professional 10 0 Administrator s Guide on the PBS Professional CD ROM e PBS Professional 10 0 User s Guide on the PBS Professional CD ROM bullx cluster suite User s Guide For clusters which use LSF LSF Installation and Configuration Guide 86 A2 39FB the LSF CD ROM e Installing Platform LSF on UNIX and Linux on the LSF CD ROM For clusters which include the Bull Cool Cabinet e Site Preparation Guide 86 1 AOFA e RGck nRoll amp R ck to Build Installation and Service Guide 86 A1 17FA e Cabinet Installation Guide 86 1 20EV e Cool Cabinet Console User s Guide 86 A1 41FA e Cool Cabinet Service Guide 86 A7 42FA Highlighting e Commands entered by the user are in a frame in Courier font as shown below mkdir var lib newdir e System messages displayed on the screen are Courier New font between 2 dotted lines as shown below e Values to be entered in by the user are in Courier New for example COMI e Commands files directories and other items whose names are predefined by the system are in Bold as shown below The etc sysconfig dump file e use of Italics identifies publications chapters sections figures and tables that are referenced e lt gt identifies parameters to be supplied by the user for ex
117. the environment can be modified on a per module basis using the module command which interprets modulefiles Typically modulefiles instruct the module command to alter or to set shell environment variables such as PATH MANPATH etc modulefiles may be shared by many users on a system and users may have their own collection to supplement or replace the shared modulefiles The modulefiles are added to and removed from the current environment by the user The environment changes contained in a modulefile can be summarized through the module command as well If no arguments are given a summary of the module usage and sub commands are shown The action for the module command to take is described by the sub command and its associated arguments modulefiles modulefiles are the files containing TCL code for the Modules package modulefiles are written in the Tool Command Language TCL 3 and are interpreted by the modulecmd program via the module 1 user interface modulefiles can be loaded unloaded or switched on the fly while the user is working A modulefile begins with the magic cookie Module A version number may be placed after this string The version number is useful as the format of modulefiles may change If a version number does not exist then modulecmd will assume the modulefile is compatible with the latest version The current version for modulefiles will be 1 0 Files without the magic cookie will not be interpreted by module
118. the options including examples of use Example man scancel 6 12 bullx cluster suite User s Guide 6 12 SACCT Accounting Data NAME SACCT displays accounting data for all jobs and job steps in the SLURM job accounting log SYNOPSIS sacct options DESCRIPTION Accounting information for jobs invoked with SLURM is logged in the job accounting log file The SACCT command displays job accounting data stored in the job accounting log file in a variety of forms for your analysis The SACCT command displays information about jobs job steps status and exit codes by default The output can be tailored with the use of the fields option to specify the fields to be shown For the root user the SACCT command displays job accounting data for all users although there are options to filter the output to report only the jobs from a specified user or group For the non root user the SACCT command limits the display of job accounting data to jobs that were launched with their own user identifier UID by default Data for other users can be displayed with the all user or uid options Note Much of the data reported by SACCT has been generated by the wait3 and getrusage system calls Some systems gather and report incomplete information for these calls SACCT reports values of O for this missing data See the getrusage man page for your system to obtain information about which data are actually availab
119. ull extreme computing applications Prerequisites The installation of all hardware and software components of the cluster must have been completed The cluster administrator must have carried out basic administration tasks creation of users definition of the file systems network configuration etc See the Administrator s Guide for more details Structure This guide is organized as follows Chapter 1 Introduction to the extreme computing Environment Provides a general introduction to software environment Two types of programming libraries are used when running programs in the extreme computing environment Parallel libraries and Mathematical libraries These are described in the chapters 2 and 3 Chapter 2 Parallel Libraries Describes the Message Passing Interface MPI libraries to be used when parallel programming Chapter 3 Scientific Libraries Describes the scientific libraries and scientific functions delivered with the bullx cluster suite delivery and how these should be invoked Some of Intel s and NVIDIA proprietary libraries are also described Chapter 4 Compilers Describes the compilers available and how to use them Chapter 5 The User s Environment Describes the user s environment on extreme computing clusters including how clusters are accessed and the use of the file systems A description of Modules which can be used to change and compare environments is also included Preface Chapter 6 Resou
120. ull2 lt version gt lib linux intel opt More information is available from documentation included in the SciStudio_shelf rpm When this is installed the documentation files will be located under opt scilibs SCISTUDIO_SHELF SciStudio_shelf lt version gt PETSC PETSc lt version gt 3 2 9 NETCDF NetCDF Network Common Data Form allows the management of input output data NetCDF is an interface for array oriented data access and is a library that provides an implementation of the interface The NetCDF library also defines a machine independent format for representing scientific data Together the interface library and format support the creation access and sharing of scientific data The library is located in the following directories opt scilibs NETCDF netCDF lt version gt bin opt scilibs NETCDF netCDF lt version gt include opt scilibs NETCDF netCDF lt version gt lib opt scilibs NETCDF netCDF lt version gt man More information is available from documentation included in the SciStudio_shelf rpm When this is installed the documentation files will be located under opt scilibs SCISTUDIO_SHELF SciStudio_shelf lt version gt NETCDF netCDF lt version gt 3 2 10 pNETCDF Parallel NetCDF library provides high performance I O while still maintaining file format compatibility with Unidata s NetCDF NetCDF Network Common Data Form is a set of software libraries and machine independent data formats that support the
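As an illustration of the NetCDF C interface described above, the program below creates a small dataset with one dimension and one integer variable. It is a sketch, not an excerpt from the manual: the file and variable names are invented, and error checking is only shown for the first call for brevity.

    /* netcdf_write.c - hypothetical example writing a small array with NetCDF. */
    #include <stdio.h>
    #include <netcdf.h>

    #define NX 4

    int main(void)
    {
        int ncid, dimid, varid, status;
        int data[NX] = {1, 2, 3, 4};

        status = nc_create("example.nc", NC_CLOBBER, &ncid);      /* create a new dataset */
        if (status != NC_NOERR) {
            fprintf(stderr, "nc_create: %s\n", nc_strerror(status));
            return 1;
        }

        nc_def_dim(ncid, "x", NX, &dimid);                         /* one dimension        */
        nc_def_var(ncid, "values", NC_INT, 1, &dimid, &varid);     /* one integer variable */
        nc_enddef(ncid);                                           /* leave define mode    */

        nc_put_var_int(ncid, varid, data);                         /* write the data       */
        nc_close(ncid);
        return 0;
    }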
121. ult value mpibull2 params m mpibull2 romio lustre default d v This will display the difference between the current and the default configurations Displays all modified MPI parameters by comparing all MPI parameters with their default values 5 M FILE This will save all modified MPI parameters into FILE It is not possible to overwrite an existing file an error will be returned if one exists Without any specific arguments this file will create a file named with the date and time of the day in the current directory This command works silently by default Use the v option to list all modified MPI parameters in a standard output Example This command will for example try to save all the MPI parameters into the file named Thu_Feb_14_15_50_28_2008 mpibull2 params sv Output Example save the current setting mpibull2 mpid xxx 1 1 parameter s saved 2 16 bullx cluster suite User s Guide r v FILE Restore all the MPI parameters found in FILE and set the environment Without any arguments this will restore all modified MPI parameters to their default value This command works silently in the background by default Use the v option to list all restored parameters in a standard output Example This command will restore all modified parameters to default mpibull2 params r h Displays the help page 202 Family names The command mpibull2 params f will list the parameter fa
122. variable PMI DEBUG to a numeric value of 1 or higher for the PMI library to print debugging information Resource Management using 6 3 6 3 SRUN SRUN submits jobs to run under SLURM management SRUN can submit an interactive job and then persist to shepherd the job as it runs SLURM associates every set of parallel tasks job steps with the SRUN instance that initiated that set SRUN options allow the user to both e Specify the parallel environment for job s such as the number of nodes used node partition distribution of processes among nodes and total time e Control the behavior of a parallel job as it runs such as redirecting or labeling its output or specifying its reporting verbosity NAME srun run parallel jobs SYNOPSIS srun OPTIONS executable args DESCRIPTION Run a parallel job on cluster managed by SLURM If necessary srun will first create a resource allocation in which to run the parallel job OPTIONS Please refer to the man page for more details on the options including examples of use Example man srun 6 4 bullx cluster suite User s Guide 6 4 SBATCH batch NAME SBATCH Submit a batch script to SLURM SYNOPSIS sbatch OPTIONS SCRIPT ARGS DESCRIPTION sbatch submits a batch script to SLURM The batch script may be linked to sbatch using its file name and the command line If no file name is specified sbatch will read in a script
123. xample man sbcast Resource Management using SLURM 6 9 6 9 SQUEUE List Jobs SQUEUE displays by default the queue of running and waiting jobs or job steps including the Jobld used for SCANCEL and the nodes assigned to each running job However SQUEUE reports can be customized to cover any of the 24 different job properties sorted according to the most important properties It also displays the job ID and job name for every job being managed by the SLURM control daemon SLURMCTLD The status and resource information for each job such as time used so far or a list of committed nodes are displayed in a table whose content and format can be set using the SQUEUE options NAME SQUEUE view information about jobs located in the SLURM scheduling queue SYNOPSIS squeue OPTIONS DESCRIPTION SQUEUE is used to view job and job step information for jobs managed by SLURM OPTIONS Please refer to the man page for more details on the options including examples of use Example man squeue 6 10 bullx cluster suite User s Guide 6 10 SINFO Report Partition and Node Information SINFO displays a summary of status information on SLURM managed partitions and nodes not jobs Customizable SINFO reports can cover the node count state and name list for a whole partition or the CPUs memory disk space or current status for individual nodes as specified These reports can assist in planning job subm
124. y both users and developers can select drivers at runtime easily without modifying the application code The application is built once and works for all interconnects supported by Bull e Ensures that the applications achieve a high performance with a high degree of interoperability with standard tools and architectures e Common feature for all devices FUTEX Fast User mode muTEX mechanism in user mode Advanced features MPIBull2 Linking Strategies Designed to reduce development and testing time MPIBull2 includes two linking strategies for users Firstly the user can choose to build his application and link dynamically leaving the choice of the MPI driver until later according to which resources are available For instance if a small Ethernet cluster is the only resource available the user compiles and links dynamically using an osock driver whilst waiting for access to a bigger cluster via a different InfiniBand interconnect and which uses the ibmr_gen2 driver at runtime Secondly the User might want to use an out of the box application designed for a specific MPI device Bull provides the combination of a MPI Core and alll its supported devices which enables static libraries to be linked to by the User s application bullx cluster suite User s Guide 2 2 5 2 2 2 6 3 MPI CORE MPI CORE MPI DRIVERS OSOCK DYNAMIC STRATEGY STATIC STRATEGY USER S APPLICATION UA Figure 2 1 MPIBull2
