
HP MPI User's Guide


Contents

1. I/O functionality and the corresponding MPI-2.0 standard references:

    I/O functionality             Standard reference
    File manipulation             section 9.2
    File views                    section 9.3
    Data access                   section 9.4 (except sections 9.4.4 and 9.4.5)
    Consistency and semantics     section 9.6

HP MPI I/O has the following limitations:

- All nonblocking I/O requests use an MPIO_Request object instead of MPI_Request. The MPIO_Test and MPIO_Wait routines are provided to test and wait for MPIO_Request objects; MPIO_Test and MPIO_Wait have the same semantics as MPI_Test and MPI_Wait, respectively.
- The status argument is not returned in any MPI I/O operation.
- All calls that involve MPI I/O file offsets must use an 8-byte integer. Because HP-UX Fortran 77 only supports 4-byte integers, all Fortran 77 source files that involve file offsets must be compiled using HP-UX Fortran 90. In this case, the Fortran 90 offset is defined by INTEGER(KIND=MPI_OFFSET_KIND).

NOTE: Some I/O routines, for example MPI_File_open, MPI_File_delete, and MPI_File_set_info, take an input argument called info. Refer to Table 13 for the supported keys for this argument.

Table 13: Info object keys

    cb_buffer_size        Buffer size for collective I/O
    cb_nodes              Number of processes that actually perform I/O in collective I/O
    ind_rd_buffer_size    Buffer size for data sieving in independent reads
    ind_wr_buffer_size    Buffer size for data sieving in independent writes

If a given key is not supported, or if the value is invalid, ...
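A short C sketch of passing Table 13 keys to an I/O routine may help; this is illustrative rather than taken from the guide, the file name and buffer sizes are arbitrary, and the fragment assumes it runs inside an MPI program after MPI_Init:

    MPI_Info info;
    MPI_File fh;

    MPI_Info_create(&info);
    MPI_Info_set(info, "cb_buffer_size", "1048576");     /* collective I/O buffer size  */
    MPI_Info_set(info, "ind_rd_buffer_size", "524288");  /* data-sieving read buffer    */

    MPI_File_open(MPI_COMM_WORLD, "datafile",
                  MPI_MODE_CREATE | MPI_MODE_RDWR, info, &fh);
    /* ... reads and writes ... */
    MPI_File_close(&fh);
    MPI_Info_free(&info);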
2. Viewing instrumentation data with mpiview; Using XMPI (working with postmortem mode, working with interactive mode); Using CXperf; Using the profiling interface.

Using counter instrumentation

Counter instrumentation is a lightweight method for generating cumulative runtime statistics for your MPI applications. When you create an instrumentation profile, HP MPI creates two file formats: an ASCII format, and a graphical format readable by the mpiview utility.

You can create instrumentation profiles for applications linked with the standard HP MPI library. For applications linked with HP MPI version 1.7, you can also create profiles for applications linked with the thread-compliant library. Instrumentation is not supported for applications linked with the diagnostic library (ldmpi).

Creating an instrumentation profile

Create an instrumentation profile using one of the following methods:

Use the following syntax:

    mpirun -i spec -np # program

Refer to "Preparing mpiview instrumentation files" on page 23 and "mpirun" on page 49 for more details about implementation and syntax. For example, to create an instrumentation profile for an application called compute_pi.f, enter:

    mpirun -i compute_pi -np 2 compute_pi

This invocation creates an instrumentation profile in two formats: compute_pi.instr (ASCII) and compute_pi.mpiview (graphical).

Specify a filename prefix ...
3. Appendix A: Example applications

This appendix provides example applications that supplement the conceptual information throughout the rest of this book about MPI in general and HP MPI in particular. Table 10 summarizes the examples in this appendix. The example codes are also included in the /opt/mpi/help subdirectory of your HP MPI product.

Table 10: Example applications shipped with HP MPI

    send_receive.f (Fortran 77)     Illustrates a simple send and receive operation.
    ping_pong.c (C)                 Measures the time it takes to send and receive data
                                    between two processes.
    compute_pi.f (Fortran 77)       Computes pi by integrating f(x) = 4/(1 + x*x).
    master_worker.f90 (Fortran 90)  Distributes sections of an array and does computation
                                    on all sections in parallel.
    cart.C (C++)                    Generates a virtual topology.
    communicator.c (C)              Copies the default communicator MPI_COMM_WORLD.
    multi_par.f (Fortran 77)        Uses the alternating direction iterative (ADI) method
                                    on a 2-dimensional compute region.
    io.c (C)                        Writes data for each process to a separate file called
                                    iodatax, where x represents each process rank in turn;
                                    the data in iodatax is then read back.
    thread_safe.c (C)               Tracks the number of client requests handled (np >= 2)
                                    and prints a log of the requests to stdout.

These examples and the Makefile are located in the /opt/mpi/help subdirectory. The examples are presented for illustration purposes ...
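One way to try an example from that subdirectory is to copy it to a writable directory and use the same mpicc and mpirun commands the guide uses elsewhere for hello_world; the target directory and process count here are up to you:

    cp /opt/mpi/help/ping_pong.c .
    mpicc -o ping_pong ping_pong.c
    mpirun -np 2 ping_pong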
4. Application types that no longer require linking to the thread-compliant library include:

- +O3 +Oparallel
- Thread-parallel MLIB applications
- OpenMP
- pthreads, only if no two threads call MPI at the same time; otherwise, use the thread-compliant library for pthreads.

Running applications

This section introduces the methods to run your HP MPI application. Using one of the mpirun methods is required. The examples below demonstrate two basic methods; refer to "mpirun" on page 49 for all the mpirun command-line options.

NOTE: You should use the -j option to display the HP MPI job ID. The job ID is useful during troubleshooting, to check for a hung job using mpijob or to terminate a job using mpiclean.

There are two methods you can use to start your application (a sketch of an appfile follows this excerpt):

- Use mpirun with the -np option and the name of your program. For example,

    mpirun -j -np 4 hello_world

  starts an executable file named hello_world with four processes. This is the recommended method to run applications on a single host with a single executable file.

- Use mpirun with an appfile. For example,

    mpirun -f appfile

  where -f appfile specifies a text file (appfile) that is parsed by mpirun and contains process counts and a list of programs. You can use an appfile when you run a single executable file on a single host, and you must use this appfile method when you run on multiple hosts or run multiple ...
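As a sketch of the second method (host names and program names are placeholders, not from the guide), an appfile is a plain text file with one line per host and program, using the -h and -np entries shown later in this document:

    -h hostA -np 2 program1
    -h hostB -np 2 program2

It would then be launched with:

    mpirun -f my_appfile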
5.  JOB     USER     NPROCS   PROGNAME
    22623   charlie  12       /home/watts
    22513   keith    14       /home/richards
    22617   mick     100      /home/jagger
    22677   ron      4        /home/wood

When you specify the -j option, mpijob reports the following for each job:

    RANK        Rank for each process in the job.
    HOST        Host where the job is running.
    PID         Process identifier for each process in the job.
    LIVE        Indicates whether the process is running (an x is used) or has been terminated.
    PROGNAME    Program names used in the HP MPI application.

mpiclean

mpiclean kills processes in an HP MPI application. Invoke mpiclean on the host on which you initiated mpirun.

The MPI library checks for abnormal termination of processes while your application is running. In some cases, application bugs can cause processes to deadlock and linger in the system. When this occurs, you can use mpijob to identify hung jobs and mpiclean to kill all processes in the hung application.

mpiclean syntax has two forms:

    1. mpiclean [-help] [-v] -j id [id id ...]
    2. mpiclean [-help] [-v] -m

where

    -help    Prints usage information for the utility.
    -v       Turns on verbose mode.
    -m       Cleans up your shared memory segments.
    -j id    Kills the processes of job number id. You can specify multiple job IDs in a
             space-separated list. Obtain the job ID using the -j option when you invoke mpirun.

The first syntax is used for all servers and is the preferred method.
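A typical cleanup sequence, using the syntax above (the job ID shown is the first one from the example listing and is purely illustrative), would be to list the jobs and then kill the hung one:

    mpijob
    mpiclean -j 22623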
6. Debugging HP MPI applications

HP MPI allows you to use single-process debuggers to debug applications. The available debuggers are ADB, DDE, XDB, WDB, and GDB. You access these debuggers by setting options in the MPI_FLAGS environment variable. HP MPI also supports the multithread, multiprocess debugger TotalView on HP-UX 11.0 and later.

In addition to the use of debuggers, HP MPI provides a diagnostic library (DLIB) for advanced error checking and debugging. Another useful debugging tool, especially for deadlock investigations, is the XMPI utility. HP MPI also provides options to the MPI_FLAGS environment variable that report memory leaks, force MPI errors to be fatal, print the MPI job ID, and provide other functionality.

This section discusses single- and multi-process debuggers and the diagnostic library; refer to "MPI_FLAGS" on page 37 and "Using XMPI" on page 78 for information about using the MPI_FLAGS options and XMPI, respectively.

Using a single-process debugger

Because HP MPI creates multiple processes and ADB, DDE, XDB, WDB, and GDB only handle single processes, HP MPI starts one debugger session per process. HP MPI creates processes in MPI_Init, and each process instantiates a debugger session. Each debugger session in turn attaches to the process that created it. HP MPI provides MPI_DEBUG_CONT to avoid a possible race condition while the debugger ...
7.     node.shift(NORTH);  node.print();
       node.shift(EAST);   node.print();
       node.shift(SOUTH);  node.print();
       node.shift(WEST);   node.print();
   }

   //
   // Main program.  It is probably a good programming practice to call
   // MPI_Init and MPI_Finalize here.
   //
   int main(int argc, char **argv)
   {
       MPI_Init(&argc, &argv);
       body();
       MPI_Finalize();
       return 0;
   }

cart output

The output from running the cart executable is shown below. The application was run with -np 4. The output begins with the line "Dimensions: 2 2"; each of the four processes then prints its global rank, its Cartesian rank, its coordinates, and the data value it holds after each shift.

communicator.c

This C example shows how to make a copy of the default communicator MPI_COMM_WORLD using MPI_Comm_dup.

    #include <stdio.h>
    #include <mpi.h>

    main(argc, argv)
    int     argc;
    char    *argv[];
    {
        int          rank, size, data;
        MPI_Status   status;
        MPI_Comm     libcomm;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size != 2) {
8. (Table of contents, continued)

    Running and collecting profiling data
        Preparing mpiview instrumentation files
        Preparing XMPI files
    Directory structure

3 Understanding HP MPI
    Compiling applications
        Compilation utilities
        64-bit support
        Thread-compliant library
    Running applications
        Types of applications
        Running SPMD applications
        Running MPMD applications
    Runtime environment variables
        MPI_COMMD
        MPI_DLIB_FLAGS
        MPI_FLAGS
        MP_GANG
        MPI_GLOBMEMSIZE
        MPI_INSTR
        MPI_LOCALIP
        MPI_MT_FLAGS
        MPI_NOBACKTRACE
        MPI_REMSH
        MPI_SHMEMCNTL
        MPI_TMPDIR
        MPI_WORKDIR
        MPI_XMPI
        TOTALVIEW
    Runtime utilities ...
9.  #include <stdio.h>
    #include <mpi.h>

    int MPI_Send(void *buf, int count, MPI_Datatype type,
                 int to, int tag, MPI_Comm comm)
    {
        printf("Calling C MPI_Send to %d\n", to);
        return PMPI_Send(buf, count, type, to, tag, comm);
    }

    #pragma _HP_SECONDARY_DEF mpi_send mpi_send_

    void mpi_send(void *buf, int *count, int *type, int *to,
                  int *tag, int *comm, int *ierr)
    {
        printf("Calling Fortran MPI_Send to %d\n", *to);
        pmpi_send(buf, count, type, to, tag, comm, ierr);
    }

Tuning

This chapter provides information about tuning HP MPI applications to improve performance. The topics covered are:

    MPI_FLAGS options
    Message latency and bandwidth
    Multiple network interfaces
    Processor subscription
    MPI routine selection
    Multilevel parallelism
    Coding considerations

The tuning information in this chapter improves application performance in most, but not all, cases. Use this information together with the output from counter instrumentation, mpiview, or XMPI to determine which tuning changes are appropriate to improve your application's performance.

When you develop HP MPI applications, several factors can affect performance, whether your application runs on a single computer or in an environment consisting of multiple computers in a network. These factors are outlined in this chapter.

MPI_FLAGS options

By default, HP MPI validates all function parameters ...
10. Preparing XMPI files

You can use XMPI in either interactive or postmortem mode. To use XMPI's postmortem mode, you must first create a trace file; load this file into XMPI to view state information for each process in your application. The following example shows you how to create the trace file; for details about using XMPI in postmortem and interactive mode, refer to "Using XMPI" on page 78.

When you run your hello_world program and want to create instrumentation files to use with the XMPI utility, enter:

    mpirun -t hello_world -np 4 hello_world

where

    -t hello_world    Enables run-time raw trace generation for all processes and uses the
                      name following the -t option (in this case, hello_world) as the prefix
                      to your instrumentation file.
    -np               Specifies the number of processes to run.
    hello_world       Specifies the name of the executable to run.

mpirun creates a raw trace dump for each application process and uses the name following the -t option (in this case, hello_world) as the prefix for each file. MPI_Finalize consolidates all the raw trace dump files into a single file, hello_world.tr. Load hello_world.tr into XMPI for analysis.

Directory structure

All HP MPI files are stored in the /opt/mpi directory. The directory structure is organized as described in Table 3. If you move the HP MPI installation ...
11. Returns the last name that was associated with a given window.

Standard flexibility in HP MPI

HP MPI is fully compliant with the MPI-1.2 standard and supports the subset of the MPI-2.0 standard described in Appendix C, "MPI-2.0 features supported."

There are items in the MPI standard for which the standard allows flexibility in implementation. This appendix identifies HP MPI's implementation of many of these standard-flexible issues. Table 20 displays references to sections in the MPI standard that identify flexibility in the implementation of an issue. Accompanying each reference is HP MPI's implementation of that issue.

Table 20: HP MPI implementation of standard-flexible issues

Reference in MPI standard: MPI implementations are required to define the behavior of MPI_Abort, at least for a comm of MPI_COMM_WORLD. MPI implementations may ignore the comm argument and act as if comm was MPI_COMM_WORLD. See MPI-1.2 Section 7.5.
HP MPI's implementation: MPI_Abort kills the application; comm is ignored, and MPI_COMM_WORLD is used.

Reference in MPI standard: An implementation must document the implementation of different language bindings of the MPI interface if they are layered on top of each other. See MPI-1.2 Section 8.1.

Reference in MPI standard: MPI does not mandate what an MPI process is. MPI does not specify the execution model for each process; a process can be sequential ...
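A one-line C illustration of the MPI_Abort behavior described in the first table entry; the error code value is arbitrary:

    /* Kills the whole application; per the table above, HP MPI ignores comm
       and acts on MPI_COMM_WORLD. */
    MPI_Abort(MPI_COMM_WORLD, 1);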
13. #endif
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 1000 + i, MPI_COMM_WORLD);
    #ifdef CHECK
            memset(buf, 0, nbytes);
    #endif
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 2000 + i, MPI_COMM_WORLD, &status);
    #ifdef CHECK
            for (j = 0; j < nbytes; j++) {
                if (buf[j] != (char) (j + i)) {
                    printf("error: buf[%d] = %d, not %d\n", j, buf[j], j + i);
                    break;
                }
            }
    #endif
        }
        stop = MPI_Wtime();

        printf("%d bytes: %.2f usec/msg\n",
               nbytes, (stop - start) / NLOOPS / 2 * 1000000);
        if (nbytes > 0) {
            printf("%d bytes: %.2f MB/sec\n", nbytes,
                   nbytes / 1000000. / ((stop - start) / NLOOPS / 2));
        }
    } else {
        /*
         * warm-up loop
         */
        for (i = 0; i < 5; i++) {
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 1, MPI_COMM_WORLD, &status);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 1, MPI_COMM_WORLD);
        }
        for (i = 0; i < NLOOPS; i++) {
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 1000 + i, MPI_COMM_WORLD, &status);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 2000 + i, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    exit(0);

ping_pong output

The output from running the ping_pong executable is shown below. The application was run with -np 2. For each message size it reports the time per message in usec/msg and, for nonzero sizes, the bandwidth in MB/sec.

compute_pi.f

This Fortran 77 example computes pi by integrating f(x) = 4/(1 + x*x). Each process:

- Receives the number of intervals used in the approximation.
- Calculates the areas of its ...
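The compute_pi logic just described (broadcast the interval count, integrate a share of the rectangles locally, then combine the partial sums with a reduction) can be sketched in C roughly as follows. This is not the guide's Fortran source; variable names are illustrative, and rank and nprocs are assumed to already hold the process rank and size:

    int    n, i;              /* n: number of intervals, set at the root beforehand */
    double h, sum, x, mypi, pi;

    /* the root broadcasts the number of intervals to every process */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* each process sums the areas of its share of the rectangles */
    h = 1.0 / (double) n;
    sum = 0.0;
    for (i = rank + 1; i <= n; i += nprocs) {
        x = h * ((double) i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    mypi = h * sum;

    /* combine the partial sums at the root */
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);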
19. For example, compute_pi.f on page 138 uses MPI_BCAST to broadcast one integer from process 0 to every process in MPI_COMM_WORLD.

To code a scatter, use:

    MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                void *recvbuf, int recvcount, MPI_Datatype recvtype,
                int root, MPI_Comm comm);

where

    sendbuf      Specifies the starting address of the send buffer.
    sendcount    Specifies the number of elements sent to each process.
    sendtype     Denotes the datatype of the send buffer.
    recvbuf      Specifies the address of the receive buffer.
    recvcount    Indicates the number of elements in the receive buffer.
    recvtype     Indicates the datatype of the receive buffer elements.
    root         Denotes the rank of the sending process.
    comm         Designates the communication context that identifies a group of processes.

Computation

Computational operations do global reduction operations, such as sum, max, min, product, or user-defined functions across all members of a group. There are a number of global reduction functions:

    Reduce            Returns the result of a reduction at one node.
    All-reduce        Returns the result of a reduction at all nodes.
    Reduce-Scatter    Combines the functionality of reduce and scatter operations.
    Scan              Performs a prefix reduction on data distributed across a group.

Section 4.9, "Global Reduction Operations," in the MPI 1.0 standard describes each of these functions in detail. Reduction operations are binary ...
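A minimal C sketch of the scatter call with the arguments described above; the buffer names and sizes and the choice of rank 0 as root are illustrative, and the fragment assumes it runs inside an MPI program after MPI_Init:

    int sendbuf[8];     /* one element per process; significant only at the root */
    int myval;

    MPI_Scatter(sendbuf, 1, MPI_INT,     /* send one int to each process  */
                &myval,  1, MPI_INT,     /* each process receives one int */
                0, MPI_COMM_WORLD);      /* root rank and communicator    */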
20. (Table of contents, continued)

        Propagation of environment variables
        Interoperability
        Fortran 90 programming features
        UNIX open file descriptors
        External input and output
    Completing
    Frequently asked questions
        Time in MPI_Finalize
        MPI clean up
        Application hangs in MPI_Send

Appendix A: Example applications
    send_receive.f
        send_receive output
    ping_pong.c
        ping_pong output
    compute_pi.f
        compute_pi output
    master_worker.f90
        master_worker output
    cart.C
        cart output
    communicator.c
        communicator output
    multi_par.f
    io.c
        io output
    thread_safe.c
        thread_safe output

Appendix ...
21.     coords);

    //
    // A destructor
    //
    Node::~Node(void)
    {
        if (comm != MPI_COMM_NULL)
            MPI_Comm_free(&comm);
    }

    //
    // Shift function
    //
    void Node::shift(Direction dir)
    {
        if (comm == MPI_COMM_NULL)
            return;

        int direction, disp, src, dest;

        if (dir == NORTH) {
            direction = 0;  disp = -1;
        } else if (dir == SOUTH) {
            direction = 0;  disp = 1;
        } else if (dir == EAST) {
            direction = 1;  disp = 1;
        } else {
            direction = 1;  disp = -1;
        }
        MPI_Cart_shift(comm, direction, disp, &src, &dest);

        MPI_Status stat;
        MPI_Sendrecv_replace(&data, 1, MPI_INT, dest, 0, src, 0, comm, &stat);
    }

    //
    // Synchronize and print the data being held
    //
    void Node::print(void)
    {
        if (comm == MPI_COMM_NULL)
            return;

        MPI_Barrier(comm);
        if (lrank == 0)
            puts("");                   // line feed
        MPI_Barrier(comm);
        printf("(%d, %d) holds %d\n", coords[0], coords[1], data);
    }

    //
    // Print the object's profile
    //
    void Node::profile(void)
    {
        // Non-member does nothing
        if (comm == MPI_COMM_NULL)
            return;

        // Print Dimensions at first
        if (lrank == 0)
            printf("Dimensions: %d %d\n", dims[0], dims[1]);
        MPI_Barrier(comm);

        // Each process prints its profile
        printf("global rank %d: cartesian rank %d, coordinate (%d, %d)\n",
               grank, lrank, coords[0], coords[1]);
    }

    //
    // Program body
    //
    // Define a torus topology and demonstrate shift operations.
    //
    void body(void)
    {
        Node node;

        node.profile();
        node.print();
22.     -h hosta -np 2 program1
        -h hostb -np 2 program2

However, this places processes 0 and 1 on hosta and processes 2 and 3 on hostb, resulting in interhost communication between the ranks identified as having slow communication.

A more optimal appfile for this example would be:

        -h hosta -np 1 program1
        -h hostb -np 1 program2
        -h hosta -np 1 program1
        -h hostb -np 1 program2

This places ranks 0 and 2 on hosta and ranks 1 and 3 on hostb. This placement allows intrahost communication between ranks that are identified as communication hot spots. Intrahost communication yields better performance than interhost communication.

Multipurpose daemon process

HP MPI incorporates a multipurpose daemon process that provides start-up, communication, and termination services. The daemon operation is transparent. HP MPI sets up one daemon per host (or appfile entry) for communication. Refer to "Communicating using daemons" on page 62 for daemon details.

NOTE: Because HP MPI sets up one daemon per host (or appfile entry) for communication, when you invoke your application with -np x, HP MPI generates x+1 processes.

Generating multihost instrumentation profiles

To generate tracing output files for multihost applications ...
23.         if (rank == 0)
                printf("communicator: must have two processes\n");
            MPI_Finalize();
            exit(0);
        }

        MPI_Comm_dup(MPI_COMM_WORLD, &libcomm);

        if (rank == 0) {
            data = 12345;
            MPI_Send(&data, 1, MPI_INT, 1, 5, MPI_COMM_WORLD);
            data = 6789;
            MPI_Send(&data, 1, MPI_INT, 1, 5, libcomm);
        } else {
            MPI_Recv(&data, 1, MPI_INT, 0, 5, libcomm, &status);
            printf("received libcomm data = %d\n", data);
            MPI_Recv(&data, 1, MPI_INT, 0, 5, MPI_COMM_WORLD, &status);
            printf("received data = %d\n", data);
        }

        MPI_Comm_free(&libcomm);
        MPI_Finalize();
        exit(0);
    }

communicator output

The output from running the communicator executable is shown below. The application was run with -np 2.

    received libcomm data = 6789
    received data = 12345

multi_par.f

The Alternating Direction Iterative (ADI) method is often used to solve differential equations. In this example, multi_par.f, a compiler that supports OPENMP directives is required in order to achieve multi-level parallelism. multi_par.f implements the following logic for a 2-dimensional compute region:

          DO J=1,JMAX
            DO I=2,IMAX
              A(I,J)=A(I,J)+A(I-1,J)
            ENDDO
          ENDDO
          DO J=2,JMAX
            DO I=1,IMAX
              A(I,J)=A(I,J)+A(I,J-1)
            ENDDO
          ENDDO

There are loop-carried dependencies in the first inner DO loop (the array's rows) and in the second outer DO loop (the array's columns). Partitioning the array into columns ...
24. ... number of processes in the communicator.

send modes: Point-to-point communication in which messages are passed using one of four different types of blocking sends. The four send modes include standard mode (MPI_Send), buffered mode (MPI_Bsend), synchronous mode (MPI_Ssend), and ready mode (MPI_Rsend). The modes are all invoked in a similar manner and all pass the same arguments.

shared memory model: Model in which each process can access a shared address space. Concurrent accesses to shared memory are controlled by synchronization primitives.

SIMD: Single instruction multiple data. Category of applications in which homogeneous processes execute the same instructions on their own data.

SMP: Symmetric multiprocessor. A multiprocess computer in which all the processors have equal access to all machine resources. Symmetric multiprocessors have no manager or worker processes.

spin-yield: Refers to an HP MPI facility that allows you to specify the number of milliseconds a process should block (spin) waiting for a message before yielding the CPU to another process. Specify a spin-yield value in the MPI_FLAGS environment variable.

SPMD: Single program multiple data. Implementations of HP MPI where an application is completely contained in a single executable. SPMD applications begin with the invocation of a single process called the master. The master then spawns some number of identical child processes. The master and ...
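As the send-modes entry notes, the four blocking send calls share one argument list. A short C sketch makes the parallel explicit; buf, count, dest, and tag are placeholders, and MPI_Bsend additionally requires a buffer attached beforehand with MPI_Buffer_attach:

    MPI_Send (buf, count, MPI_INT, dest, tag, MPI_COMM_WORLD);   /* standard    */
    MPI_Bsend(buf, count, MPI_INT, dest, tag, MPI_COMM_WORLD);   /* buffered    */
    MPI_Ssend(buf, count, MPI_INT, dest, tag, MPI_COMM_WORLD);   /* synchronous */
    MPI_Rsend(buf, count, MPI_INT, dest, tag, MPI_COMM_WORLD);   /* ready       */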
25.             server, ack;
        MPI_Status  status;

        for (w = 0; w < MAX_WORK; w++) {
            server = rand() % size;
            MPI_Sendrecv(&rank, 1, MPI_INT, server, SERVER_TAG,
                         &ack,  1, MPI_INT, server, CLIENT_TAG,
                         MPI_COMM_WORLD, &status);
            if (ack != server) {
                printf("server failed to process my request\n");
                MPI_Abort(MPI_COMM_WORLD, MPI_ERR_OTHER);
            }
        }
    }

    void shutdown_servers(rank)
    int     rank;
    {
        int     request_shutdown = REQ_SHUTDOWN;

        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Send(&request_shutdown, 1, MPI_INT, rank, SERVER_TAG, MPI_COMM_WORLD);
    }

    main(argc, argv)
    int     argc;
    char    *argv[];
    {
        int             rank, size, rtn;
        pthread_t       mtid;
        MPI_Status      status;
        int             my_value, his_value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        rtn = pthread_create(&mtid, 0, server, (void *) &rank);
        if (rtn != 0) {
            printf("pthread_create failed\n");
            MPI_Abort(MPI_COMM_WORLD, MPI_ERR_OTHER);
        }

        client(rank, size);
        shutdown_servers(rank);

        rtn = pthread_join(mtid, 0);
        if (rtn != 0) {
            printf("pthread_join failed\n");
            MPI_Abort(MPI_COMM_WORLD, MPI_ERR_OTHER);
        }

        MPI_Finalize();
        exit(0);
    }

thread_safe output

The output from running the thread_safe executable is shown below. The application was run with -np 2; it prints one log line per request handled, each naming the server that processed the request.
26.     setenv XAPPLRESDIR $HOME/XMPI

You can copy the contents of XMPI to the .Xdefaults file in your home directory and customize it. If you change your .Xdefaults file during your login session, you can load the specifications immediately by typing the following command at a shell prompt:

    xrdb -load $HOME/.Xdefaults

The following section displays the contents of the /opt/mpi/lib/X11/app-defaults/XMPI X resource file:

    XMPI*title:                              XMPI
    XMPI*iconName:                           XMPI
    XMPI*multiClickTime:                     500
    XMPI*background:                         lightgray
    XMPI*fontList:                           helvetica bold r normal 120
    XMPI*msgFont:                            helvetica medium r normal 120
    XMPI*fo_func*fontList:                   helvetica bold o normal 120
    XMPI*dt_dtype*fontList:                  helvetica medium r normal 100
    XMPI*ctl_bar*bottomShadowColor:          darkslateblue
    XMPI*ctl_bar*background:                 slateblue
    XMPI*ctl_bar*foreground:                 white
    XMPI*banner*background:                  slateblue
    XMPI*banner*foreground:                  white
    XMPI*view_draw*background:               black
    XMPI*view_draw*foreground:               gray
    XMPI*trace_draw*foreground:              black
    XMPI*kiviat_draw*background:             gray
    XMPI*kiviat_draw*foreground:             black
    XMPI*matrix_draw*background:             gray
    XMPI*matrix_draw*foreground:             black
    XMPI*app_list*visibleItemCount:          8
    XMPI*aschema_text*columns:               24
    XMPI*prog_mgr*columns:                   16
    XMPI*comCol:                             cyan
    XMPI*rcomCol:                            plum
    XMPI*label_frame*XmLabel*background:     #D3B5B5
    XMPI*XmToggleButtonGadget*selectColor:   red
    XMPI*XmToggleButton*selectColor:         red
27. ... where the corresponding receive completed. Collective communications are represented by a trace for each process, showing the time spent in system overhead and the time spent blocked waiting for communication. Some send and receive segments may not have a matching segment; in this case, a stub line is drawn out of the send segment or into the receive segment.

To play the trace file, select Play or Fast Forward on the icon bar. For any given dial time, the state of the processes is reflected in the main window and the Kiviat diagram, as well as in the trace log window. Refer to "Viewing process information" on page 85 and "Viewing Kiviat information" on page 89 to learn how to interpret the information.

Viewing process information

When you play the trace file, the state of the processes is reflected in the main window and the Kiviat diagram. The following instructions describe how to view process information in the main window.

Step 1. Start XMPI and open a trace for viewing, as described in "Creating a trace file" on page 79. The XMPI main window fills with a group of tiled hexagons, each representing the current state of a process and labelled by the process's rank within MPI_COMM_WORLD. Figure 9 shows the XMPI main window displaying hexagons representing six processes, ranks 0 through 5.

Figure 9: XMPI process information (the main window, with its Application, Trace, and Options menus, showing the process hexagons)
28. MPI: The Complete Reference (2-volume set), MIT Press.

MPI 1.2 and 2.0 standards, available at http://www.mpi-forum.org:
    MPI: A Message-Passing Interface Standard
    MPI-2: Extensions to the Message-Passing Interface

TotalView documents, available at http://www.etnus.com:
    TotalView Command Line Interface Guide
    TotalView User's Guide
    TotalView Installation Guide

CXperf User's Guide
CXperf Command Reference
Parallel Programming Guide for HP-UX Systems

The following table shows World Wide Web sites that contain additional MPI information:

    http://www.hp.com/go/mpi
        Hewlett-Packard's HP MPI web page
    http://www.mpi-forum.org
        Official site of the MPI forum
    http://www.mcs.anl.gov/Projects/mpi/index.html
        Argonne National Laboratory's MPICH implementation of MPI
    http://www.mpi.nd.edu/lam
        University of Notre Dame's LAM implementation of MPI
    http://www.erc.msstate.edu/mpi
        Mississippi State University's MPI web page
    http://www.tc.cornell.edu/Services/Edu/Topics/MPI/Basics/more.asp
        Cornell Theory Center's MPI tutorial and lab exercises
    http://www.unix.mcs.anl.gov/romio
        Argonne National Laboratory's implementation of MPI I/O

Credits

HP MPI is based on MPICH from Argonne National Laboratory and Mississippi State University, and on LAM from the University of Notre Dame and the Ohio Supercomputer Center. The XMPI utility is based on LAM's version, available at http://www.mpi.nd.edu/lam
30.       open(8, file='adi.out')
          write(8) array
          close(8, status='keep')
    c
    c     Free the resources.
    c
          do rank=0,comm_size-1
             call mpi_type_free(twdtype(rank),ierr)
          enddo
          do blk=0,comm_size-1
             call mpi_type_free(rdtype(blk),ierr)
             call mpi_type_free(cdtype(blk),ierr)
          enddo
          deallocate(rbs,rbe,cbs,cbe,rdtype,cdtype,twdtype)
    c
    c     Finalize the MPI system.
    c
          call mpi_finalize(ierr)
          end
    c
    c**********************************************************************
          subroutine blockasgn(subs,sube,blockcnt,nth,blocks,blocke)
    c
    c     This subroutine:
    c       is given a range of subscripts and the total number of blocks in
    c       which the range is to be divided, and assigns a subrange to the
    c       caller that is the nth member of the blocks.
    c
          implicit none
          integer subs          ! (in)  subscript start
          integer sube          ! (in)  subscript end
          integer blockcnt      ! (in)  block count
          integer nth           ! (in)  my block (begins from 0)
          integer blocks        ! (out) assigned block start subscript
          integer blocke        ! (out) assigned block end subscript
    c
          integer d1,m1
    c
          d1=(sube-subs+1)/blockcnt
          m1=mod(sube-subs+1,blockcnt)
          blocks=nth*d1+subs+min(nth,m1)
          blocke=blocks+d1-1
          if(m1.gt.nth) blocke=blocke+1
          end
    c
    c**********************************************************************
          subroutine compcolumn(nrow,ncol,array,rbs,rbe,cbs,cbe)
    c
    c     This subroutine ...
31.     MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(name, &len);
        printf("Hello world! I'm %d of %d on %s\n", rank, size, name);
        MPI_Finalize();
        exit(0);
    }

Building and running on a single host

This example teaches you the basic compilation and run steps to execute hello_world.c on your local host with four-way parallelism. To build and run hello_world.c on a local host named jawbone:

Step 1. Change to a writable directory.

Step 2. Compile the hello_world executable file:

    mpicc -o hello_world /opt/mpi/help/hello_world.c

Step 3. Run the hello_world executable file:

    mpirun -np 4 hello_world

where -np 4 specifies that the number of processes to run is 4.

Step 4. Analyze hello_world output.

HP MPI prints the output from running the hello_world executable in non-deterministic order. The following is an example of the output:

    Hello world! I'm 1 of 4 on jawbone
    Hello world! I'm 3 of 4 on jawbone
    Hello world! I'm 0 of 4 on jawbone
    Hello world! I'm 2 of 4 on jawbone

Building and running on multiple hosts

This example teaches you to build and run hello_world.c using two hosts to achieve four-way parallelism. For this example, the local host is named jawbone and a remote host is named wizard. This assumes that both machines run e...
32.     MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        buf = (int *) malloc(SIZE);
        nints = SIZE / sizeof(int);
        for (i = 0; i < nints; i++)
            buf[i] = rank * 100000 + i;

        /* each process opens a separate file called FILENAME.<myrank> */
        filename = (char *) malloc(strlen(FILENAME) + 10);
        sprintf(filename, "%s.%d", FILENAME, rank);

        MPI_File_open(MPI_COMM_SELF, filename,
                      MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
        MPI_File_set_view(fh, (MPI_Offset) 0, MPI_INT, MPI_INT,
                          "native", MPI_INFO_NULL);
        MPI_File_write(fh, buf, nints, MPI_INT, &status);
        MPI_File_close(&fh);

        /* reopen the file and read the data back */
        for (i = 0; i < nints; i++)
            buf[i] = 0;
        MPI_File_open(MPI_COMM_SELF, filename,
                      MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
        MPI_File_set_view(fh, (MPI_Offset) 0, MPI_INT, MPI_INT,
                          "native", MPI_INFO_NULL);
        MPI_File_read(fh, buf, nints, MPI_INT, &status);
        MPI_File_close(&fh);

        /* check if the data read is correct */
        flag = 0;
        for (i = 0; i < nints; i++) {
            if (buf[i] != (rank * 100000 + i)) {
                printf("Process %d: error, read %d, should be %d\n",
                       rank, buf[i], rank * 100000 + i);
                flag = 1;
            }
        }
        if (!flag)
            printf("Process %d: data read back is correct\n", rank);

        MPI_File_delete(filename, MPI_INFO_NULL);
        free(buf);
        free(filename);

        MPI_Finalize();
        exit(0);
    }

io output

The output from running the io executable is shown below.
33. Routine Summary by Rank and Peer:

    Routine       Rank   Peer   Calls   Overhead   Blocking
    MPI_Bcast      0      0       1     0.021456   0.000000
                   1      0       1     0.021393   0.000000
    MPI_Reduce     0      0       1     0.000297   0.000000
                   1      0       1     0.000408   0.000000

Message Summary:

    Routine       Message Bin    Count
    MPI_Bcast     [0..32]          2
    MPI_Reduce    [0..32]          2

Message Summary by Rank:

    Routine       Rank   Message Bin    Count
    MPI_Bcast      0     [0..32]          1
                   1     [0..32]          1
    MPI_Reduce     0     [0..32]          1
                   1     [0..32]          1

Message Summary by Rank and Peer:

    Routine       Rank   Peer   Message Bin    Count
    MPI_Bcast      0      0     [0..32]          1
                   1      0     [0..32]          1
    MPI_Reduce     0      0     [0..32]          1
                   1      0     [0..32]          1

Viewing instrumentation data with mpiview

The mpiview utility is a graphical user interface that displays instrumentation data collected at runtime by an MPI application. The following sections describe how to use mpiview to analyze your instrumentation data files: loading an mpiview file, selecting a graph type, viewing multiple graphs, and analyzing graphs.

Loading an mpiview file

To view an instrumentation profile, invoke the mpiview utility and load your prefix.mpiview instrumentation file in one of the following ways:

- Provide the name of the instrumentation file when you invoke the mpiview utility. For example,

    mpiview compute_pi.mpiview

  loads the compute_pi.mpiview file created in the mpirun example command above.

- Invoke mpiview without a filename. Enter

    mpiview

  From the mpiview control window, select File from the menu bar, then Open.
34. The mpiview utility displays a dialog box from which you can select your instrumentation file. After you select the file, mpiview displays a message stating either that the file was read successfully or that an error occurred.

Selecting a graph type

From the Graph pulldown menu on the main control window, select the type of graph you want to view. There are seven graph types that display your data in different formats. Each time you select a graph, mpiview displays it in a separate window. Figure 3 displays the options on the Graph pulldown menu.

Figure 3: MPIVIEW Graph menu (the Graph pulldown of the MPIVIEW window, shown here for send_receive.mpiview, listing the seven graph types described below)

There are seven types from which to select:

    Application summary by rank          Displays data by rank.
    Routine summary                      Displays data by routine.
    Routine summary by rank              Displays data by rank and routine.
    Routine summary by rank and peer     Displays data by rank and its peer rank for a
                                         given routine.
    Message length summary by rank       Displays data by routine and message length for
                                         a given rank, or for all ranks.
    Message length summary by routine    Displays data by rank and message length for a
                                         given routine.
    Message length ...
35. These sections were copied by permission of the University of Tennessee. Parts of this book came from MPI Primer/Developing with LAM. That document is copyrighted by the Ohio Supercomputer Center. These sections were copied by permission of the Ohio Supercomputer Center.

Contents

Preface
    System platforms
    Notational conventions
    Associated documents
    Credits

1 Introduction
    The message-passing model
    MPI concepts
        Point-to-point communication
            Communicators
            Sending and receiving messages
        Collective operations
            Communication
            Computation
            Synchronization
        MPI datatypes and packing
        Multilevel parallelism
        Advanced topics

2 Getting started
    Configuring your environment
    Compiling and running your first application
        Building and running on a single host
        Building and running on multiple hosts
36. You must run 64-bit executables on the 64-bit system, though you can build 64-bit executables on the 32-bit system.

HP MPI supports a 64-bit version of the MPI library on platforms running HP-UX 11.0. Both 32- and 64-bit versions of the library are shipped with HP-UX 11.0. For HP-UX 11.0, you cannot mix 32-bit and 64-bit executables in the same application.

The mpicc and mpiCC compilation commands link the 64-bit version of the library if you compile with the +DA2.0W or +DD64 options. Use the following syntax:

    mpicc [or mpiCC] +DA2.0W [or +DD64] -o filename filename.c

When you use mpif90, compile with the +DA2.0W option to link the 64-bit version of the library; otherwise, mpif90 links the 32-bit version. For example, to compile the program myprog.f90 and link the 64-bit library, enter:

    mpif90 +DA2.0W -o myprog myprog.f90

Thread-compliant library

HP MPI provides a thread-compliant library for applications running under HP-UX 11.0 (32- and 64-bit). By default, the non-thread-compliant library (libmpi) is used when running HP MPI jobs. Linking to the thread-compliant library (libmtmpi) is now required only for applications that have multiple threads making MPI calls simultaneously. In previous releases, linking to the thread-compliant library was required for multithreaded applications, even if only one thread was making an MPI call at a time. See Table 15 on page 170.
37. ... added to messages to allow the system to detect mismatches. See MPI-1.2 Section 3.3.2.

HP MPI's implementation: HP MPI always sets the value of MPI_HOST to MPI_PROC_NULL. If you do not specify a host name to use, the hostname returned is that of the UNIX gethostname(2); if you specify a host name using the -n option to mpirun, HP MPI returns that host name. The default HP MPI library does not carry this information due to overload, but the HP MPI diagnostic library (DLIB) does; to link with the diagnostic library, use -ldmpi on the link line.

Reference in MPI standard: Vendors may write optimized collective routines matched to their architectures, or a complete library of collective communication routines can be written using MPI point-to-point routines and a few auxiliary functions. See MPI-1.2 Section 4.1.
HP MPI's implementation: Use HP MPI's collective routines instead of implementing your own with point-to-point routines. HP MPI's collective routines are optimized to use shared memory where possible for performance.

Reference in MPI standard: Error handlers in MPI take as arguments the communicator in use and the error code to be returned by the MPI routine that raised the error. An error handler can also take stdargs arguments, whose number and meaning are implementation dependent. See MPI-1.2 Section 7.2 and MPI-2.0 Section 4.12.6.
HP MPI's implementation: To ensure portability, HP MPI's implementation does not take stdargs. For example, in C, the user routine should be a C function of type MPI_Handler_function, defined as:
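The definition referred to is the error handler typedef from the MPI-1 standard's C binding; it is supplied here from the standard rather than from the guide's own text:

    typedef void (MPI_Handler_function)(MPI_Comm *, int *, ...);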
38. applications 21 122 singlehost applications 20 status 8 status variable 8 stdargs 181 stdin 126 stdio 126 181 stdout 126 stop playing trace log 83 storing temp files 46 structure constructor 15 subdivision of shared mem ory 46 subscription definition of 109 types 109 swapping overhead 41 synchronization 13 performance and 111 variables 3 synchronous send mode 7 T t option 48 53 tag See tag argument value tag argument value 87 88 tag field 87 88 tag variable 8 9 terminate MPI environment 4 thread communication 62 multiple 16 safety 170 thread compliant 30 170 Oparallel 30 library total transfer time 5 TOTALVIEW 48 Total View See multi process bugger de trace get full 93 get partial 93 view process info 85 Trace dialog 92 trace file create 79 Kiviat 89 play 84 state 84 85 89 viewing 80 trace file generation enable runtime 91 enable runtime raw 92 97 using mpirun 48 XMPI 48 trace log fast forward 83 magnification 83 play 83 rewind 83 set magnification 83 stop playing 83 trace magnification 83 Trace Selection dialog 81 tracing See trace file generation Tracing button 92 Tracing field 97 tracing options dialog See mpirun options fields troubleshooting 113 Fortran 90 125 HP MPI 121 130 message buffering 124 MPI Finalize 128 mpiclean 31 mpijob 31 UNIX file descriptors 126 using MPI FLAGS 114 using t
39. called an appfile Refer to Appfiles on page 55 for more information For example mpirun t my trace f my appfile enables tracing specifies the prefix for the tracing output fileis my trace and runs an appfile named my appfile Toinvoke LSF for applications where all processes execute the same program on the same host bsub lsf options pam mpi mpirun mpirun options program args In this case LSF assigns a host to the MPI job 50 Chapter 3 NOTE Understanding HP MPI Running applications For example bsub pam mpi mpirun np 4 compute pi requests a host assignment from LSF and runs the compute pi application with four processes Refer to Assigning hosts using LSF on page 64 for more information This is the last release of HP MPI that will support tightly coupled integration between LSF s Parallel Application Manager PAM and HP MPI Shell scripts will be provided to enable similar functionality when support for this feature is discontinued Toinvoke LSF for applications that run on multiple hosts bsub lsf options pam mpi mpirun mpirun options f appfile extra args for appfile In this case each host specified in the appfile is treated as a symbolic name referring to the host that LSF assigns to the MPI job For example bsub pam mpi mpirun f my appfile runs an appfile named my appfile and requests host assignments for all remote and local hosts specified in my appfile If my
40. debugger must be in the command search path See Debugging HP MPI applications on page 114 for more information ewdb Starts the application under the wdb debugger The debugger must be in the command search path See Debugging HP MPI applications on page 114 for more information 1 Reports memory leaks caused by not freeing memory allocated when an HP MPI job is run For example when you create a new communicator or user defined datatype after you call MPI Init you must free the memory allocated to these objects before you call MPI_Finalize InC this is analogous to making calls to malloc and free for each object created during program execution Setting the 1 option may decrease application performance Chapter 3 37 Understanding HP MPI Running applications s alpl t 38 Forces MPI errors to be fatal Using the f option sets theMPI ERRORS ARE FATAL error handler ignoring the programmer s choice of error handlers This option can help you detect nondeterministic error problems in your code If your code has a customized error handler that does not report that an MPI call failed you will not know that a failure occurred Thus your application could be catching an error with a user written error handler or with MPI ERRORS RETURN which masks a probl em Selects signal and maximum ti me delay for guaranteed message progression The sa option selects SIGALRM The sp option selects SIGPROF The option is the
41. displays the expanded Tracing options dialog Tracing options dialog mpirun options Print job ID FF Verbose Tracing Prefix FF No clobber F Initially off Simpler trace Buffer size 4096 kilo bytes Ok Thefields you can useto specify tracing options are Prefix Specifies the prefix name for the file where process write trace data The trace files for each process are consolidated to a prefix tr output file This is a required field No dobber Specifies no clobber which means that an HP MPI application aborts if a file with the name specified in the Prefix field already exists Initially off Specifies that trace generation is initially turned off Simpler trace Specifies a simpler tracing mode by omitting MPI_Test MPI_Testall MPI_Testany and MPI Testsome Calls that do not complete a request 98 Chapter 4 NOTE Buffer size Profiling Using XMPI Denotes the buffering size in kilobytes for dumping process trace data Actual buffering size may be rounded up by the system The default buffering size is 4096 kilobytes Specifying a large buffering size reduces the need to flush trace data to a file when process buffers reach capacity Flushing frequently can increase the overhead for 1 O HP MPI 1 7 is the last release that will support XMPI XMPI is not supported for Itanium based systems Chapter 4 99 Profiling Using CXperf Using CXperf CXperf allows you to profile each proce
42. environment variables When working with applications that run on multiple hosts you must set values for environment variables on each host that participates in the job A recommended way to accomplish this is to set the e option in the appfile h remote host e var val np program args Refer to Creating an appfile on page 55 for details Alternatively you can set environment variables using the cshrc file on each remote host if you are using a bin csh based shell 124 Chapter 6 Debugging and troubleshooting Troubleshooting HP MPI applications Interoperability Depending upon what server resources are available applications may run on heterogeneous systems For example suppose you create an MPMD application that calculates the average acceleration of particles in a simulated cyclotron The application consists of a four process program called sum_accelerations and an eight process program called calculate average Because you have access to a K Class server called K server and an V Class server called V server you create the following appfile h K server np 4 sum accelerations h V server np 8 calculate average Then you invoke mpirun passing it the name of the appfile you created Even though the two application programs run on different platforms all processes can communicate with each other resulting in twelve way parallelism The four processes belonging to the sum accelerations application are ranke
43. instrumentation For details refer to Using counter instrumentation on page 68 Using XMPI on page 78 e Using the diagnostics library on page 118 The profiling interface allows you to intercept calls made by the user program to the MPI library For example you may want to measurethe time spent in each call toa certain library routine or createa log file You can collect your information of interest and then call the underlying MPI implementation through a name shifted entry point All routines in the HP MPI library begin with theMPr prefix Consistent with the Profiling Interface section of the MPI 1 2 standard routines are also accessible using the PMPr prefix for example MPI Send and PMPI Send access the same routine To use the profiling interface write wrapper versions of the MPI library routines you want the linker to intercept These wrapper routines collect data for some statistic or perform some other action The wrapper then calls the MPI library routine using its PMPI prefix Chapter 4 101 Profiling Using the profiling interface Fortran profiling interface To facilitate improved Fortran performance we no longer implement Fortran calls as wrappers to C calls Consequentl y profiling routines built for C calls will no longer cause the corresponding Fortran calls to be wrapped automatically n order to profile Fortran routines separate wrappers need to be written for the Fortran calls For example
44. instrumentation file for the compute pi f application you can print the prefix instr file If you defined prefix for the file as compute pi as you did when you created the instrumentation file in Creating an instrumentation profile on page 68 you would print compute pi instr TheASCII instrumentation profile provides the version the date your application ran and summarizes information according to application rank and routines Figure 2 on page 71 is an example of an ASCII instrumentation profile Chapter 4 69 NOTE Profiling Using counter instrumentation The information available in the prefix instr file includes Overhead ti me The time a process or routine spends inside MPI For example the time a process spends doing message packing Blocking time The time a process or routine is blocked waiting for a message to arrive before resuming execution Communication hot spots The processes in your application between which the largest amount of time is spent in communication Message bin The range of message sizes in bytes The instrumentation profile reports the number of messages according to message length You do not get message size information for MPI Alltoallv instrumentation 70 Chapter 4 Profiling Using counter instrumentation Figure 2 displays the contents of the example report compute pi instr Figure 2 ASCII instrumentation profile Version HP MPI B6011 B6280 HP UX 10 20 Date Mon Feb 2
45. interconnect and software that functions collectively as a parallel machine collective communication Communication that involves sending or receiving messages among a group of processes at the same time The communication can be one to many many to one or many to many The main collective routines are MPI Bcast MPI Gather and MPI Scatter 185 communicator Global object that groups application processes together Processes in a communicator can communicate with each other or with processes in another group Conceptually communicators define a communication context and a static group of processes within that context context Internal abstraction used to define a safe communication space for processes Within a communicator context separates point to point and collective communications data parallel model Design model where data is partitioned and distributed to each process in an application Operations are performed on each set of data in parallel and intermediate results are exchanged between processes until a problem is solved derived data types User defined structures that specify a sequence of basic data types and integer displacements for noncontiguous data You create derived data types through the use of type constructor functions that describe the layout of sets of primitive types in memory Derived types may contain arrays as well as combinations of other primitive data types determinism A
is disabled. Process priorities for gangs are managed identically to timeshare policies. The timeshare priority scheduler determines when to schedule a gang for execution. While it is likely that scheduling a gang will preempt one or more higher-priority timeshare processes, the gang scheduling policy is fair overall. In addition, gangs are scheduled for a single time slice, which is the same for all processes in the system.

MPI processes are allocated statically at the beginning of execution. As an MPI process creates new threads, they are all added to the same gang if MP_GANG is enabled.

The MP_GANG syntax is as follows:

    ON|OFF

where

ON      Enables gang scheduling.
OFF     Disables gang scheduling.

For multihost configurations, you need to set MP_GANG for each appfile entry. Refer to the -e option in "Creating an appfile" on page 55.

You can also use the HP-UX utility mpsched(1) to enable gang scheduling. Refer to the HP-UX gang_sched(7) and mpsched(1) manpages for more information.

MPI_GLOBMEMSIZE

MPI_GLOBMEMSIZE specifies the amount of shared memory allocated for all processes in an HP MPI application.

The MPI_GLOBMEMSIZE syntax is as follows:

    amount

where amount specifies the total amount of shared memory in bytes for all processes. The default is 2 Mbytes for up to 64-way applications and 4 Mbytes for larger applications. Be sure that the value
(thread_safe output, Appendix A: a log listing, one line per client request, showing the server rank that processed each request.)
line for machine enterprise becomes compute_pi arg3 arg4 arg5. When you use the -- extra_args_for_appfile option, it must be specified at the end of the mpirun command line.

Setting remote environment variables

To set environment variables on remote hosts, use the -e option in the appfile. For example, to set the variable MPI_FLAGS:

    -h remote_host -e MPI_FLAGS=val [-np #] program [args]

Assigning ranks and improving communication

The ranks of the processes in MPI_COMM_WORLD are assigned and sequentially ordered according to the order the programs appear in the appfile.

For example, if your appfile contains

    -h voyager    -np 10 send_receive
    -h enterprise -np 8  compute_pi

HP MPI assigns ranks 0 through 9 to the 10 processes running send_receive and ranks 10 through 17 to the 8 processes running compute_pi.

You can use this sequential ordering of process ranks to your advantage when you optimize for performance on multihost systems. You can split process groups according to communication patterns to reduce or remove interhost communication hot spots.

For example, suppose you have the following:

•  A multihost run of four processes
•  Two processes per host on two hosts
•  Slow communication between ranks 0 and 2 and between ranks 1 and 3

You can identify communication hot spots using HP MPI's instrumentation (refer to "mpiview" on page 62). You could use an appfile that contains the following:
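One possible appfile for this case (a sketch only, since the exact listing is not reproduced here; host_a, host_b, and program are placeholder names). Alternating single-process entries place ranks 0 and 2 on one host and ranks 1 and 3 on the other, so the slow pairs communicate through shared memory instead of across the interconnect:

    -h host_a -np 1 program
    -h host_b -np 1 program
    -h host_a -np 1 program
    -h host_b -np 1 program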
number of seconds to wait before issuing a signal to trigger message progression. The default value for the MPI library is sp604800, which issues a SIGPROF once a week. If the application uses both signals for its own purposes, you must disable the heartbeat signals. A time value of zero seconds disables the heartbeats.

This mechanism is used to guarantee message progression in applications that use nonblocking messaging requests followed by prolonged periods of time in which HP MPI routines are not called.

Generating a UNIX signal introduces a performance penalty every time the application processes are interrupted. As a result, while some applications will benefit from it, others may experience a decrease in performance. As part of tuning the performance of an application, you can control the behavior of the heartbeat signals by changing their time period or by turning them off. This is accomplished by setting the time period of the s option in the MPI_FLAGS environment variable (for example, s600). Time is in seconds.

You can use the s[a][p]# option with the thread-compliant library as well as the standard non-thread-compliant library. Setting s[a][p]# for the thread-compliant library has the same effect as setting MPI_MT_FLAGS=ct when you use a value greater than 0 for #. The default value for the thread-compliant library is sp0. MPI_MT_FLAGS=ct tak
50. processes Each process can execute a different program Torun an MPMD application the mpirun command must reference an appfile that contains the list of programs to be run and the number of processes to be created for each program 32 Chapter 3 Understanding HP MPI Running applications A simple invocation of an MPMD application looks like this mpirun f appfile where appfileis the text file parsed by mpirun and contains a list of programs and process counts Suppose you decompose the poisson application into two source files poisson master uses a single master process and poisson child uses four child processes The appfilefor the example application contains the two lines shown below refer to Creating an appfile on page 55 for details np 1 poisson master np 4 poisson child To build and run the example application use the following command sequence mpicc o poisson master poisson master c mpicc o poisson child poisson child c mpirun f appfile See Creating an appfile on page 55 for more information about using appfiles Chapter 3 33 Understanding HP MPI Running applications Runtime environment variables Environment variables are used to alter the way HP MPI executes an application The variable settings determine how an application behaves and how an application allocates internal resources at runtime Many applications run without setting any environment variables However applicat
51. select the Save icon on the toolbar Select the Options pulldown menu then View Graph Data or select the Data icon on the toolbar Select the Options pulldown menu then Reset Orientation or select the Reset icon on the toolbar Usethe Graph Type radio button on the toolbar to select from a submenu of graph types Movethe mouse over any bar in the graph and dick the left mouse button Data values display in a pop up window beside the mouse arrow For example refer to the pop up for MPI Sendin Figure 4 on page 75 Place the cursor over the graph and hold down the middle mouse button while moving the mouse You can restrict rotation toa single axis by pressing the X y or z key while moving the mouse Hold down the Control key and the left mouse button Drag the mouse to stretch a rectangle over the area you want to zoom Releasethe Control key and the mouse button Select the Options pulldown menu then Show Legend HP MPI 1 7 is the last release that will support mpiview mipview is not supported for Itanium based systems Chapter 4 77 Profiling Using XMPI Using XMPI XMPI is an X Motif graphical user interface for running applications monitoring processes and messages and viewing trace files XMPI provides a graphical display of the state of processes within an HP MPI application This functionality is supported for applications linked with the standard HP MPI library but not for applications linked with t
52. session starts and attaches to a process MPI_DEBUG_CONT is an environment variable that HP MPI uses to temporarily halt debugger progress beyond MPI Init By default MPI DEBUG CONT is set to 0 and you must reset it to 1 to allow the debug session to continue past MPI Init Thefollowing procedure outlines the steps to follow when you usea single process debugger Set the eadb exdb edde ewdb Or egdb option in theMPI FLAGS environment variableto usethe ADB XDB DDE WDB or GDB debugger respectively Refer to MPI FLAGS on page 37 for information about MPI FLAGS options 114 Chapter 6 Step 2 Step 3 Step 4 Step 5 Step 6 Step 7 Debugging and troubleshooting Debugging HP MPI applications On remote hosts set DISPLAY to point to your console In addition use xhost to allow remote hosts to redirect their windows to your console Run your application When your application enters wMPI Init HP MPI starts one debugger Session per process and each debugger session attaches to its process Set a breakpoint anywhere following MPI Init in each session Set the global variable MPI DEBUG CONT to 1 using each session s command line interface or graphical user interface The syntax for setting the global variable depends upon which debugger you use adb mpi debug cont w 1 dde set mpi debug cont 1 xdb print MPI DEBUG CONT 1 wdb set MPI DEBUG CONT 1 gdb set MPI DEBUG CONT 1 Issu
six routines listed in Table 1.

Table 1    Six commonly used MPI routines

MPI routine        Description
MPI_Init           Initializes the MPI environment
MPI_Finalize       Terminates the MPI environment
MPI_Comm_rank      Determines the rank of the calling process within a group
MPI_Comm_size      Determines the size of the group
MPI_Send           Sends messages
MPI_Recv           Receives messages

A minimal example program that uses these six routines appears at the end of this passage.

CAUTION    You must call MPI_Finalize in your application to conform to the MPI Standard. HP MPI issues a warning when a process exits without calling MPI_Finalize. There should be no code before MPI_Init and after MPI_Finalize. Applications that violate this rule are nonportable and may give incorrect results.

As your application grows in complexity, you can introduce other routines from the library. For example, MPI_Bcast is an often-used routine for sending or broadcasting data from one process to other processes in a single operation. Use broadcast transfers to get better performance than with point-to-point transfers. The latter use MPI_Send to send data from each sending process and MPI_Recv to receive it at each receiving process.

The following sections briefly introduce the concepts underlying MPI library routines. For more detailed information, refer to MPI: A Message-Passing Interface Standard.

Point-to-point communication

Point-to-point communication involves sending and receiving messages between two processes. This is the simplest for
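The example program mentioned above is sketched here; it is illustrative only (the message text and buffer size are arbitrary) and uses nothing beyond the six routines in Table 1 plus standard C:

    #include <stdio.h>
    #include <string.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size, i;
        char msg[64];
        MPI_Status status;

        MPI_Init(&argc, &argv);                    /* initialize the MPI environment */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* rank of the calling process */
        MPI_Comm_size(MPI_COMM_WORLD, &size);      /* number of processes */

        if (rank == 0) {
            strcpy(msg, "hello from rank 0");
            for (i = 1; i < size; i++)             /* send to every other rank */
                MPI_Send(msg, (int) strlen(msg) + 1, MPI_CHAR, i, 0, MPI_COMM_WORLD);
        } else {
            MPI_Recv(msg, sizeof(msg), MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            printf("rank %d received: %s\n", rank, msg);
        }

        MPI_Finalize();                            /* no MPI calls after this point */
        return 0;
    }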
54. state Process rank Number of messages sent to process but not yet received Chapter 4 85 Figure 10 Step 2 Profiling Using XMPI The current state of a process is indicated by the color of the signal light in the hexagon The color of the signal light corresponds to the color in the XMPI trace log for a given process As the trace file plays and processes communicate with each other the signal light colors change Along with the signal light icon hexagons may contain a second icon indicating the number of messages sent to a process but not yet received Click once on the hexagon representing the process for which you want more information XMPI displays the XMPI Focus dialog that has a process area anda message queue area Figure 10 displays a Focus dialog XMPI Focus dialog HP MPI function being executed t Process area Message queue area i ID i ID El Values in the fields change as you play the trace file and processes communicate with each other 86 Chapter 4 Figure 11 Profiling Using XMPI The process area describes the state of a process together with the name and arguments for the HP MPI function being executed The fields indude peer comm tag cnt Displays the rank of the function s peer process A process is identified in theformat rank x rank y where rank xindicates the rank of the process in MPI COMM WORLD and rank y indicates the rank of the pr
55. the children all run the same executable standard send mode Form of blocking send where the sending process returns when the system can buffer the message or when the message is received stride Constant amount of memory space between data elements where the elements are stored noncontiguously Strided data are sent and received using derived data types synchronization Bringing multiple processes to the same point in their execution before any can continue For example MPI_Barrier isa collective routine that blocks the calling process until all receiving processes have called it This is a useful approach for separating two stages of a computation so messages from each stage are not overlapped synchronous send mode Form of blocking send wherethe sending process returns only if a matching 190 receive is posted and the receiving process has started to receive the message tag Integer label assigned to a message when it is sent Message tags are one of the synchronization variables used to ensure that a message is delivered to the correct receiving process task Uniquely addressable thread of execution thread Smallest notion of execution in a process All MPI processes have one or more threads Multithreaded processes have one address space but each process thread contains its own counter registers and stack This allows rapid context switching because threads require little or no memory management th
56. the host P address that is assigned throughout a session Ordinarily mpirun and XMPI determine the IP address of the host they are running on by calling gethostbyaddr However when a host uses a SLIP or PPP protocol the host s IP address is dynamically assigned only when the network connection is established In this case gethostbyaddr may not return the correct IP address TheMPr LOCALIP syntax is as follows where xxx xxx Xxx XXX Specifies the host IP address Chapter 3 43 Understanding HP MPI Running applications MPI MT FLAGS MPI MT FLAGS controls runtime options when you usethe thread compliant version of HP MPI TheMwPr MT FLAGS syntax is a comma separated list as follows ct single fun serial mult where ct Creates a hidden communication thread for each rank in thejob When you enable this option be careful not to oversubscribe your system For example if you enable ct for a 16 process application running on a 16 way machine the result will be a 32 way job single Asserts that only one thread executes fun Asserts that a process can be multithreaded but only the main thread makes MPI calls that is all calls are funneled to the main thread serial Asserts that a process can be multithreaded and multiple threads can make MPI calls but calls are serialized that is only one call is made at a time mult Asserts that multiple threads can call MPI at any time with no restrictions Sett
to obtain representative bandwidth measurements.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* for memset */
#include <math.h>
#include <mpi.h>

#define NLOOPS  1000
#define ALIGN   4096

main(argc, argv)
int     argc;
char    *argv[];
{
    int         i, j;
    double      start, stop;
    int         nbytes = 0;
    int         rank, size;
    MPI_Status  status;
    char        *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size != 2) {
        if ( ! rank) printf("ping_pong: must have two processes\n");
        MPI_Finalize();
        exit(0);
    }

    nbytes = (argc > 1) ? atoi(argv[1]) : 0;
    if (nbytes < 0) nbytes = 0;

/*
 * Page-align buffers and displace them in the cache to avoid collisions.
 */
    buf = (char *) malloc(nbytes + 524288 + (ALIGN - 1));
    if (buf == 0) {
        MPI_Abort(MPI_COMM_WORLD, MPI_ERR_BUFFER);
        exit(1);
    }

    buf = (char *) ((((unsigned long) buf) + (ALIGN - 1)) & ~(ALIGN - 1));
    if (rank == 1) buf += 524288;
    memset(buf, 0, nbytes);

/*
 * Ping-pong.
 */
    if (rank == 0) {
        printf("ping-pong %d bytes\n", nbytes);

/*
 * warm-up loop
 */
        for (i = 0; i < 5; i++) {
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 1, MPI_COMM_WORLD, &status);
        }
/*
 * timing loop
 */
        start = MPI_Wtime();
        for (i = 0; i < NLOOPS; i++) {
#ifdef CHECK
            for (j = 0; j < nbytes; j++) buf[j] = (char) (j
58. to the working directory on the host where mpirun runs When you enable tracing for multihost runs and invoke mpirun on a machine that is not running an MPI process HP MPI issues a warning and does not write the trace file TOTALVIEW When you usethe TotalView debugger HP MPI uses your PATH variable to find TotalView You can also set the absolute path and TotalView specific options in the TOTALVIEW environment variable This environment variable is used by mpirun setenv TOTALVIEW opt totalview bin totalview totalview options 48 Chapter 3 CAUTION Understanding HP MPI Running applications Runtime utility commands HP MPI provides a set of utility commands to supplement the MPI library routines These commands are listed below and described in the following sections mpirun This section also includes discussion of Shared library support Appfiles the Multipurpose daemon process and Generating multihost instrumentation profiles mpijob mpiclean xmpi mpiview mpirun The new HP MPI 1 7 start up provides the following advantages Provides support for shared libraries Allows many multi threaded applications to use high performance single threaded code paths e Includes a cleaner tear down mechanism for abnormal termination Provides a simplified path to provide bug fixes to the field HP MPI 1 7 is backward compatible at a source code level only It is not start up backward compatible
59. useful for error reporting debugging and profiling Routines used to associate names with objects indude those described in Table 19 on page 179 178 Appendix C Table 18 Table 19 Info object routines Object routine MPI Info create MPI Info set MPI Info delete MPI Info get MPI Info get valuelen MPI Info get nkeys MPI Info get nthkey MPI Info dup MPI Info free Naming object routines Object routine MPI 2 0 features supported Miscellaneous features Function Creates a new info object Adds the key value pair to info and overrides the value if a value for the same key was previously set Deletes a key value pair from info Retrieves the value associated with key in a previous call to MPI Info set Retrieves length of the value associated with key Returns the number of keys currently defined in info Returns the nth defined key in info Duplicates an existing info object creating a new object with the same key value pairs and ordering of keys Frees the info object Function MPI Comm set name Associates a name string with a communi cator MPI Comm get name Returns the last name that was associated with a given communicator MPI Type set name Associates a name string with a datatype MPI Type get name Returns the last name that was associated with a given datatype MPI Win set name Associates a name string with a window MPI Win get name
using MPI_Send and MPI_Recv each time when one process communicates with others. Also, use the HP MPI collectives rather than customizing your own.

•  Specify the source process rank whenever possible when calling MPI routines. Using MPI_ANY_SOURCE may increase latency.

•  Double-word align data buffers if possible. This improves byte-copy performance between sending and receiving processes because of double-word loads and stores.

•  Use MPI_Recv_init and MPI_Startall instead of a loop of MPI_Irecv calls in cases where requests may not complete immediately.

   For example, suppose you write an application with the following code section:

       j = 0;
       for (i = 0; i < size; i++) {
           if (i == rank) continue;
           MPI_Irecv(buf[i], count, dtype, i, 0, comm, &requests[j++]);
       }
       MPI_Waitall(size - 1, requests, statuses);

   Suppose that one of the iterations through MPI_Irecv does not complete before the next iteration of the loop. In this case, HP MPI tries to progress both requests. This progression effort could continue to grow if succeeding iterations also do not complete immediately, resulting in a higher latency.

   However, you could rewrite the code section as follows:

       j = 0;
       for (i = 0; i < size; i++) {
           if (i == rank) continue;
           MPI_Recv_init(buf[i], count, dtype, i, 0, comm, &requests[j++]);
       }
       MPI_Startall(size - 1, requests);
       MPI_Waitall(size - 1, requests, statuses);

   In this case, all iter
61. void MPI Handler function MPI Comm int Appendix D Standard flexibility in HP MPI Reference in MPI standard Continued HP MPI s implementation Continued MPI implementors may place a barrier inside MPI_FINALIZE See MPI 2 0 Section 3 2 2 MPI defines minimal requirements for thread compliant MPI implementations and MPI can beimplemented in environments where threads are not supported See MPI 2 0 Section 8 7 The format for spedifying the filename in MPI FILE OPEN is implementation dependent An implementation may require that filename include a string specifying additional information about the file See MPI 2 0 Section 9 2 1 HP MPI sMPI FINALIZE behaves as a barrier function such that the return from MPI FINALIZE is delayed until all potential future cancellations are processed HP MPI provides a thread compliant library libmtmpi Use 1ibmtmpi on the link line to use the libmtmpi Refer to Thread compliant library on page 170 for more information HP MPI I O supports a subset of the MPI 2 0 standard using ROMIO a portable implementation developed at Argonne National Laboratory No additional file information is necessary in your filename string Appendix D 183 Standard flexibility in HP MPI 184 Appendix D Glossary asynchronous Communication in which sending and receiving processes place no constraints on each other in terms of comp
62. 0 program 2 Chapter 1 Introduction MPI concepts MPI concepts The primary goals of MPI are efficient communication and portability Although several message passing libraries exist on different systems MPI is popular for the following reasons Support for full asynchronous communication P rocess communication can overlap process computation Group membership Processes may be grouped based on context Synchronization variables that protect process messaging When sending and receiving messages synchronization is enforced by source and destination information message labeling and context information Portability All implementations are based on a published standard that specifies the semantics for usage An MPI program consists of a set of processes and a logical communication medium connecting those processes An MPI process cannot directly access memory in another MPI process nter process communication requires calling MPI routines in both processes MPI defines a library of routines through which MPI processes communicate The MPI library routines provide a set of functions that support Point to poi nt communications Collective operations Process groups Communication contexts Process topol ogies Datatype manipulation Chapter 1 3 Table 1 CAUTION Introduction MPI concepts Although the MPI library contains a large number of routines you can design a large number of applications by using the
63. 17 36 59 1998 Scale Wall Clock Seconds Processes 2 User 33 65 PI 66 35 Overhead 66 35 Blocking 0 00 Total Message Count 4 inimum Message Range 4 0 32 aximum Message Range 4 0 32 Average Message Range 4 0 32 Top Routines PI_Init 86 39 Overhead 86 39 Blocking 0 00 PI_Bcast 12 96 Overhead 12 96 Blocking 0 00 PI_Finalize 0 43 Overhead 0 43 Blocking 0 00 PI_Reduce 0 21 Overhead 0 21 Blocking 0 00 il a Instrumentation Data A TEE Application Summary by Rank Rank Duration Overhead Blocking User MPI 1 0 248998 0 221605 0 000000 11 00 89 00 0 0 249118 0 108919 0 000000 56 28 43 72 Routine Summary Routine Calls Overhead Blocking PI Init 2 0 285536 0 000000 min 0 086926 0 000000 max 0 198610 0 000000 avg 0 142768 0 000000 PI Bcast 2 0 042849 0 000000 min 0 021393 0 000000 max 0 021456 0 000000 avg 0 021424 0 000000 PI Finalize 2 0 001434 0 000000 min 0 000240 0 000000 max 0 001194 0 000000 avg 0 000717 0 000000 PI Reduce 2 0 000705 0 000000 min 0 000297 0 000000 max 0 000408 0 000000 avg 0 000353 0 000000 Chapter 4 71 Profiling Using counter instrumentation Routine Summary by Rank Routine Rank Calls Overhead Blocking PI Init 0 1 0 086926 0 000000 1 1 0 198610 0 000000 PI Bcast 0 1 0 021456 0 000000 1 1 0 021393 0 000000 PI Finalize 0 1 0 000240 0 000000 1 1 0 001194 0 000000 PI Reduce 0 1 0 000297 0 000000 1 1 0 000408 0 000000 Routine Summary by
64. 93 See Trace dialog external input and output F 126 FAQ 129 Fast Forward See trace file fast forward trace log 83 file data partitioning See I O 166 file descriptor limit 126 Fortran 77 compiler 28 Fortran 77 examples array partitioning 148 compute_pi f 131 138 multi_par f 131 147 send_receive f 131 133 Fortran 90 compiler 28 Fortran 90 examples master_worker f90 140 Fortran 90 troubleshooting 125 Fortran compiler options 29 Fortran profiling 102 freeing memory 37 frequently asked questions 129 full trace 93 fully subscribed See subscription types G gang scheduling 40 109 gather 10 GDB 37 114 130 gethostname 181 getting started 17 ght 68 global reduce scatter 12 global reduction 12 global variables MPI_DEBUG_CONT 114 graph MPIVIEW 75 rotate 76 view multiple 76 window 73 zoom 76 graph legend 76 green See process colors group membership 3 group size 4 H header files 25 heart beat signals 38 hexagons 90 hosts assigning using LSF 64 multiple 55 61 HP MPI abort 98 building 122 change behavior 37 130 clean up 129 completing 128 debug 113 FAQ 113 129 frequently asked ques tions 129 jobs running 59 kill 61 multi process gers 116 profile process 100 ruming 123 debug 197 single process debug gers 114 specify shared memory 41 starting 50 122 troubleshooting 130 twisted data l
NOTE    HP MPI 1.7 is the last release that will support XMPI.

MPI 2.0 features supported

HP MPI is fully compliant with the MPI 1.2 standard and supports a subset of the MPI 2.0 standard. The MPI 2.0 features supported are identified in Table 11.

Table 11    MPI 2.0 features supported in HP MPI

MPI 2.0 feature              Standard reference
MPI I/O                      Chapter 9
Language interoperability    Section 4.12
Thread-compliant library     Section 8.7
MPI_Init NULL arguments      Section 4.2
One-sided communication      Chapter 6
Miscellaneous features       Sections 4.6 through 4.10 and section 8.3

Each of these features is briefly described in the sections of this appendix.

MPI I/O

UNIX I/O functions provide a model for a portable file system. However, the portability and optimization needed for parallel I/O cannot be achieved using this model. The MPI 2.0 standard defines an interface for parallel I/O that supports partitioning of file data among processes. The standard also supports a collective interface for transferring global data structures between process memories and files.

HP MPI I/O supports a subset of the MPI 2.0 standard using ROMIO, a portable implementation developed at Argonne National Laboratory. The subset is identified in Table 12, MPI I/O functionality supported by HP MPI.
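As an illustration of this interface (a sketch, not an excerpt from the guide's examples; the file name and block size are arbitrary), each process can write its own block of a shared file at an explicit offset:

    #include <stdio.h>
    #include <mpi.h>

    #define BLOCK 1024          /* integers written per process (illustrative) */

    int main(int argc, char *argv[])
    {
        int rank, i, data[BLOCK];
        MPI_File fh;
        MPI_Offset offset;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (i = 0; i < BLOCK; i++) data[i] = rank;

        /* All processes open the same file collectively. */
        MPI_File_open(MPI_COMM_WORLD, "iodata.out",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* Each rank writes its block at its own byte offset. */
        offset = (MPI_Offset) rank * BLOCK * sizeof(int);
        MPI_File_write_at(fh, offset, data, BLOCK, MPI_INT, &status);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }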
66. HP MPI User s Guide Sixth Edition Us HEWLETT PACKARD B 6060 96004 March 2001 O Copyright 2001 Hewlett Packard Company Edition Sixth B6060 96001 Remarks Released with HP MPI V1 7 March 2001 Edition Fifth B6060 96001 Remarks Released with HP MPI V1 6 J une 2000 Edition Fourth B6011 90001 Remarks Released with HP MPI V1 5 February 1999 Edition Third B6011 90001 Remarks Released with HP MPI V1 4 J une 1998 Edition Second B6011 90001 Remarks Released with HP MPI V1 3 October 1997 Edition First B6011 90001 Remarks Released with HP MPI V1 1 J anuary 1997 Notice Reproduction adaptation or translation without prior written permission is prohibited except as allowed under the copyright laws The information contained in this document is subject to change without notice Hewlett Packard makes no warranty of any kind with regard tothis material including but not limited to the implied warranties of merchantability and fitness for a particular purpose Hewlett Packard shall not be liable for errors contained herein or for incidental or consequential damages in connection with the furnishing performance or use of this material Parts of this book came from Cornell Theory Center s web document That document is copyrighted by the Cornell Theory Center Parts of this book came from MPI A Message Passing Interface That book is copyrighted by the University of Tennessee
67. ORLD 5 MPI COMMD 34 35 MPI Datatype MPI Type f2c 168 MPI DEBUG CONT 114 MPI DLIB FLAGS 34 35 MPI Finalize 4 129 199 MPI Fint MPI Comm c f 168 MPI Fint MPI Group c 2f 168 MPI Fint MPI Op c2f 168 MPI Fint MPI Request c2f 168 MPI Fint MPI Request f2c 169 MPI Fint 168 MPI FLAGS 34 37 104 using to troubleshoot 114 MPI FLAGS options DDE 114 E 104 GDB 114 WDB 114 XDB 114 y 104 MPI GET PROCESSOR NAME 181 MPI_GLOBMEMSIZE 34 41 MPI_Group MPI Group f2c 168 MPI handler function 181 MPI Type c2f 200 MPI Ibsend 9 MPI Init 4 174 MPI INSTR 34 41 68 MPI Irecv 9 MPI Irsend 9 MPI Isend 9 MPI Issend 9 MPI_LOCALIP 34 43 MPI_MT_FLAGS 44 45 MPI_NOBACKTRACE 34 MPI_Op MPI_Op_c2f 168 MPI Recv 4 8 high message width 110 low message latency 110 MPI Reduce 12 13 MPI Reduce 13 MPI_REMSH 45 MPI_ROOT variable 25 MPI Rsend 7 convert to MPI Ssend 40 MPI Scatter 12 MPI Send 4 7 130 convert to MPI Ssend 40 high message width 110 low message latency band band 110 MPI_SHMCNTL 40 MPI SHMEMCNTL 46 MPI Ssend 7 MPI Status c2f 169 MPI Status f2c 169 MPI_TMPDIR 34 46 MPI_TOPOLOGY See also improve net work perfor mance MPI_WORKDIR 34 46 MPI_XMPI 34 47 mpiCC utility 28 29 mpicc utility 28 29 mpiclean 31 49 61 128 mpif77 utility 28 29 mpif90 utility 28 29 MPIHP Trace off 69 79 MPIHP Trace on 69 79 mpijob
68. PI Send index 1 MPI INTEGER dest 0 MPI COMM WORLD ierr call MPI Send data index chunksize MPI REAL dest O0 amp MPI COMM WORLD ierr index index chunksize end do do i 1 numworkers source i call MPI_Recv index 1 MPI INTEGER source 1 MPI COMM WORLD status ierr call MPI Recv result index chunksize MPI REAL source 1 amp MPI COMM WORLD status ierr end do 140 Appendix A Example applications master_worker f90 do i 1 numworkers chunksize if result i ne i 1 then print element i expecting i41 actual is result i numfail numfail 1 endif enddo if numfail ne 0 then print out of ARRAYSIZE elements numfail wrong answers r else print correct results endif end if KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK Worker task ck ck Ck Ck Ck CK Ck KKK KKKKKKKKKKKKKKKK KK du taskid gt MASTER then call MPI_Recv index 1 MPI INTEGER MASTER 0 MPI COMM WORLD amp status ierr call MPI_Recv result index chunksize MPI REAL MASTER 0 amp MPI COMM WORLD status ierr do i index index chunksize 1 result i i 1 end do call MPI_Send index 1 MPI INTEGER MASTER 1 MPI COMM WORLD ierr call MPI Send result index chunksize MPI REAL MASTER 1 amp MPI COMM WORLD ierr end if call MPI Finalize ierr end program array manipulation master worker output The output from running the master worker exe
Win_create and returns a null handle.

Window attributes

HP MPI supports the MPI_Win_get_group function. MPI_Win_get_group returns a duplicate of the group of the communicator used to create the window, that is, the processes that share access to the window.

Data transfer

Data transfer operations are nonblocking: data transfer calls initiate the transfer, but the transfer may continue after the call returns. The transfer is completed, both at the origin and at the target, when a subsequent synchronization call is issued by the caller on the involved window object.

HP MPI supports two data transfer operations: MPI_Put and MPI_Get. MPI_Put is similar to execution of a send by the origin process and a matching receive by the target process, except that all arguments are provided by the call executed by the origin process.

Synchronization

Transfer operations complete at the origin and at the target when a subsequent synchronization call is issued by the caller on the involved window object. HP MPI supports three synchronization calls: MPI_Win_fence, MPI_Win_lock, and MPI_Win_unlock.

MPI_Win_fence is a collective synchronization call that supports a loosely synchronous model, where global computation phases alternate with global communication phases. All remote memory access calls originating at a given process and started before the fence call complete at that process.
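A minimal sketch of these one-sided calls (illustrative only; it assumes at least two ranks, and the value written is arbitrary): each process exposes one integer in a window, and rank 0 puts a value into rank 1's window between two fence calls:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, local = 0, value = 42;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Each process exposes one int of its memory in the window. */
        MPI_Win_create(&local, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);                 /* open an access epoch */
        if (rank == 0)                         /* write into rank 1's window */
            MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
        MPI_Win_fence(0, win);                 /* complete all transfers */

        if (rank == 1)
            printf("rank 1 window now holds %d\n", local);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }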
70. XMPI Dump dialog 1 2 2 00 teens 92 XMPI Express dialog 0 0 cece teen ees 93 XMPI monitor options dialog 0 0 c eect ee 95 XMPI buffer size dialog 2 0 eee 96 mpirun options dialog 2 2 2 0c tte 97 Tracing options dialog 1 2 0 eects 98 Multiple network interfaces 2 0 0 ccc tee 108 Array partitioning s cet oa E eR a3 et ey ae De ux 148 List of Figures ix List of Figures Table 1 Table 2 Table 3 Table 4 Table 5 Table 6 Table 7 Table 8 Table 9 Table 10 Table 11 Table 12 Table 13 Table 14 Table 15 Table 16 Table 17 Table 18 Table 19 Table 20 Tables Six commonly used MPI routines oooccccccccoc sees 4 MPI blocking and nonblocking calls ooooocococccconoco o 9 Organization of the opt mpi directory 0 0 0 eee eee ee 25 Man page Categories oooooccoocococco rn 26 Compilation utilities ooooooccoccccococo en 28 Compilation environment variables llll liess 29 MPIVIEW analysis functions ococcccococooo 77 Subscription types iris eee a eda pa iia 109 Non buffered messages and deadlock o oooooooooooommmo 124 Example applications shipped with HP MPI o cococcccccco 131 MPI 2 0 features supported in HP MPI 0 00 cece eee eee 165 MPI I O functionality supported by HP MPI 00000000 0 ee 166 Info object keys cess by bk ed ade bev ae ede bo RR DER EIS 167 Language inte
Your previous version of HP MPI must be retained in order to run executables built with archive libraries on previous versions of HP MPI.

The new HP MPI 1.7 start-up requires that MPI be installed in the same directory on every execution host. The default is the location from which mpirun is executed. This can be overridden with the MPI_ROOT environment variable. We recommend setting the MPI_ROOT environment variable prior to starting mpirun.

NOTE    Options -w and -W are no longer supported. Previous versions of HP MPI allowed mpirun to exit prior to application termination by specifying one of these options. Because they are no longer supported, place mpirun in the background to achieve similar functionality.

mpirun syntax has four formats.

For applications where all processes execute the same program on the same host:

    mpirun [-np #] [-help] [-version] [-djpv] [-ck] [-t spec] [-i spec]
           [-h host] [-l user] [-e var[=val]] [-sp paths] [-tv] program [args]

For example,

    mpirun -j -np 3 send_receive

runs the send_receive application with three processes and prints out the job ID.

For applications that consist of multiple programs or that run on multiple hosts:

    mpirun [-help] [-version] [-djpv] [-ck] [-t spec] [-i spec] [-commd] [-tv]
           -f appfile [-- extra_args_for_appfile]

In this case, each program in the application is listed in a file
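For example, a minimal sketch of the appfile form of invocation, assuming a csh-style shell and an installation under /opt/mpi (the paths, host names, and program names are placeholders):

    setenv MPI_ROOT /opt/mpi
    mpirun -f my_appfile

where my_appfile contains lines such as:

    -h hostA -np 4 program_a
    -h hostB -np 4 program_b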
and MPI_Rsend, but it is dependent on message size and at the discretion of the implementation.

QUESTION: How can I tell if the deadlock is because my code depends on buffering?

ANSWER: To quickly determine whether the problem is due to your code being dependent on buffering, set the z option for MPI_FLAGS. MPI_FLAGS modifies the general behavior of HP MPI, and in this case converts MPI_Send and MPI_Rsend calls in your code to MPI_Ssend, without you having to rewrite your code. MPI_Ssend guarantees synchronous send semantics, that is, a send can be started whether or not a matching receive is posted. However, the send completes successfully only if a matching receive is posted and the receive operation has started to receive the message sent by the synchronous send.

If your application still hangs after you convert MPI_Send and MPI_Rsend calls to MPI_Ssend, you know that your code is written to depend on buffering. You should rewrite it so that MPI_Send and MPI_Rsend do not depend on buffering.

Alternatively, use nonblocking communication calls to initiate send operations. A nonblocking send-start call returns before the message is copied out of the send buffer, but a separate send-complete call is needed to complete the operation. Refer also to "Sending and receiving messages" on page 6 for information about blocking and nonblocking communication. Refer to "MPI_FLAGS" on page 37 for information about MPI_FLAGS options.
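As an illustration (not taken from the guide's example set), the following sketch shows a two-rank exchange that depends on buffering, and a nonblocking variant that does not; with the z option set in MPI_FLAGS, both ranks of the first variant block in the send and the program hangs:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, peer, i;
        double sendbuf[100], recvbuf[100];
        MPI_Status status;
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        peer = 1 - rank;                      /* assumes exactly two ranks */
        for (i = 0; i < 100; i++) sendbuf[i] = rank;

    #ifdef DEPENDS_ON_BUFFERING
        /* Both ranks send first: works only if the system buffers the
         * message, and hangs under MPI_Ssend semantics (MPI_FLAGS=z). */
        MPI_Send(sendbuf, 100, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
        MPI_Recv(recvbuf, 100, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &status);
    #else
        /* Nonblocking send-start, then receive, then send-complete:
         * no dependence on buffering. */
        MPI_Isend(sendbuf, 100, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req);
        MPI_Recv(recvbuf, 100, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &status);
        MPI_Wait(&req, &status);
    #endif

        MPI_Finalize();
        return 0;
    }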
73. and synchronization functions One process specifies all communication parameters both for the sending side and the receiving side This mode of communication is best for applications with dynamically changing data access patterns where data distribution is fixed or slowly changing Each process can compute what data it needs to access or update at other processes Processes in such applications however may not know which data in their memory needs to be accessible by remote processes or even the identity of these remote processes In this case applications can open windows in their memory space that are accessible by remote processes HP MPI supports a subset of the MPI 2 0 one sided communication functionality Window creation The initialization process that allows each process in an intracommunicator group to specify in a collective operation a window in its memory that is made accessible to remote processes The window creation call returns an opaque object that represents the group of processes that own and access a set of windows and the attributes of each window as specified by the initialization call HP MPI supports the MPI Win create andtheMPI Win free functions MPI Win Create isa collective call executed by all processes in a group It returns a window object that can be used by these processes to perform remote memory access operati ons MPI Win free is also a collective call and frees the window object created by MPI
74. appfile contains the following items h voyager np 10 send receive h enterprise np 8 compute pi Host assignments are returned for the two symbolic links voyager and enterprise When requesting a host from LSF you must ensure that the path to your executable file is accessible by all machines in the resource pool Refer to Assigning hosts using LSF on page 64 for more information where mpirun options are ck Behaves likethe p option but supports two additional checks of your MPI application it checks if the specified host machines and programs are available and also checks for access or permission problems Chapter 3 51 Understanding HP MPI Running applications commd d e var val f appfile h host help i spec 1 user np 52 Routes all off host communication through daemons rather than between processes Refer to Communicating using daemons on page 62 for more information Turns on debug mode Sets the environment variable var for the program and gives it the value val if provided Environment variable substitutions for example F OO are supported in the val argument Specifies the appfilethat mpirun parses to get program and process count information for the run Refer to Creating an appfile on page 55 for details about setting up your appfile Specifies a host on which to start the processes default islocal host Prints usage information for the util
ary and are only valid on numeric data. Reductions are always associative but may or may not be commutative.

You can select a reduction operation from a predefined list (refer to section 4.9.2 in the MPI 1.0 standard) or define your own operation. The operations are invoked by placing the operation name, for example MPI_SUM or MPI_PROD, in op, as described in the MPI_Reduce syntax below.

To implement a reduction, use:

    MPI_Reduce(void *sendbuf, void *recvbuf, int count,
               MPI_Datatype dtype, MPI_Op op, int root, MPI_Comm comm);

where

sendbuf    Specifies the address of the send buffer.
recvbuf    Denotes the address of the receive buffer.
count      Indicates the number of elements in the send buffer.
dtype      Specifies the datatype of the send and receive buffers.
op         Specifies the reduction operation.
root       Indicates the rank of the root process.
comm       Designates the communication context that identifies a group of processes.

For example, compute_pi.f on page 138 uses MPI_Reduce to sum the elements provided in the input buffer of each process in MPI_COMM_WORLD, using MPI_SUM, and returns the summed value in the output buffer of the root process (in this case, process 0).

Synchronization

Collective routines return as soon as their participation in a communication is complete. However, the return of the calling process does not guarantee that the receiving processes have completed or even started the operation.
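Returning to the reduction call described above, a minimal C sketch (compute_pi.f itself is Fortran; this version is illustrative only) in which each rank contributes one partial value and rank 0 receives the sum:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;
        double partial, total;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        partial = (double) rank;      /* each rank's contribution (illustrative) */

        /* Sum the partial values; only root (rank 0) receives the result. */
        MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d ranks = %f\n", size, total);

        MPI_Finalize();
        return 0;
    }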
76. ate should be equal If an application is synchronized the segments representing each of the three states should be concentric Select the Application menu then Quit to close XMPI Working with interactive mode Interactive mode allows you to load and run an appfile to view state information for each process as your application runs Running an appfile Use these instructions to run and view your appfile Enter xmpi at your UNIX prompt to open the XMPI main window Refer to xmpi on page 61 for information about options you can specify with xmpi Figure 6 on page 80 shows the XMPI main window Select the Application menu then Browse amp Run XMPI opens the XMPI Application Browser dialog Select or type the full path name of the appropriate appfile in the Selection field and select Run The XMPI main window fills with a group of tiled hexagons each representing the current state of a process and labelled by the process s rank within MPI_COMM_WORLD The window is the same as the one XMPI invokes in postmortem mode Refer to Figure 9 on page 85 The state of a process is indicated by the color of the signal light in the hexagon Along with the signal light icon hexagons can contain an icon that indicates the number of messages sent toa process that it has yet to receive The process hexagons persist only as long as the application runs and disappear when the application completes 90 Chapter 4 Profiling Using XMPI T
77. ations usethe MPI blocking routines MPI Send and MPI Recv For asynchronous communications use the MPI nonblocking routines MPI Isend and MPI Irecv When using blocking routines try to avoid pending requests MPI must advance nonblocking messages so calls to blocking receives must advance pending requests occasionally resulting in lower application performance For tasks that require collective operations use the appropriate MPI collective routine HP MPI takes advantage of shared memory to perform efficient data movement and maximize your applicati on s communication performance Multilevel parallelism There are several ways to improve the performance of applications that use multilevel parallelism Usethe MPI library to provide coarse grained parallelism and a parallelizing compiler to provide fine grained that is thread based parallelism An appropriate mix of coarse and fine grained parallelism provides better overall performance e Assign only one multithreaded process per host when placing application processes This ensures that enough processors are available as different process threads become active 110 Chapter 5 Tuning Coding considerations Coding considerations The following are suggestions and items to consider when coding your MPI applications to improve performance Use HP MPI collective routines instead of coding your own with point to point routines because HP MPI s collective routines are opti
78. ations through MPI Recv init are progressed just once when MPI Startallis called This approach avoids the additional progression overhead when using MPI Irecv and can reduce application latency 106 Chapter 5 Tuning Multiple network interfaces Multiple network interfaces You can use multiple network interfaces for interhost communication while still having intrahost exchanges In this case the intrahost exchanges use shared memory between processes mapped to different same host P addresses To use multiple network interfaces you must specify which MPI processes are associated with each IP address in your appfile For example when you have two hosts hostO and host1 each communicating using two ethernet cards ethernetO and ethernet1 you have four host names as follows e hostO ethernetO e hostO ethernet1 e hostl ethernetO e hostl ethernet1 If your executable is called beavis exe and uses 64 processes your appfile should contain the following entries h host0 ethernet0 np 16 beavis exe h hostO0 ethernetl np 16 beavis exe h hostl ethernet0 np 16 beavis exe h hostl ethernetl np 16 beavis exe Now when the appfileis run 32 processes run on hostO and 32 processes run on host1 as shown in Figure 19 Chapter 5 107 Tuning Multiple network interfaces Figure 19 Multiple network interfaces Ranks 0 15 ethernet0 ethernet0 Ranks 32 47 4 y Ranks 16 31 ethernet1 ethernet1 Rank
79. ayout 149 utility files 25 HP MPI User s Guide ht ml 25 HP MPI utility files 25 HP UX gang scheduling 40 109 121 I i option 42 52 T O 166 181 IMPI 64 implement barrier 14 reduction 13 improve bandwidth 105 coding HP MPL 111 latency 105 network performance 107 improving interhost com munication 57 increase trace magnifica 198 tion 83 indexed constructor 15 initialize MPI environment 4 Initially off field 98 instrumentation mpiview file 73 tr file 79 91 ASCII profile 71 counter 68 creating profile 68 MPIVIEW 73 77 multihost 59 output file 68 XMPI 78 instrumentation bin 41 interactive mode 90 intercommunicators 5 interhost communication See multiple network in terfaces interoperability problems 125 interrupt calls to MPI library See profiling interface intracommunicators 5 message J j option 31 job ID 31 97 K kill MPI jobs 61 Kiviat dialog 84 85 89 views 89 L language bindings 181 language interoperability 168 latency 5 105 110 libmtmpi See linking thread com pliant library linking thread compliant li brary 30 170 load sharing facility See LSF logical values in Fortran77 40 LSF load sharing facility 64 M magnify trace log 83 main window XMPI 80 90 Makefile 132 man pages categories 26 compilation utilities 26 general HP MPI 26 HP MPI library 25 HP MPI utilities 25 runtime 26 master worker f90
80. behavior describing repeatability in observed parameters The order of a set of events does not vary from run torun 186 domain decomposition Breaking down an MPI application s computational space into regular data structures such that all computation on these structures is identical and performed in parallel explicit parallelism Programming stylethat requires you to specify parallel constructs directly Using the MPI library is an example of explicit parallelism functional decomposition Breaking down an MPI application s computational space into separate tasks such that all computation on these tasks is performed in parallel gather Many to one collective operation where each process including the root sends the contents of ts send buffer to the root granularity Measure of the work done between synchronization points Fine grained applications focus on execution at the instruction level of a program Such applications are load balanced but suffer from a low computation communi cation ratio Coarse grained applications focus on execution at the program level where multiple programs may be executed in parallel group Set of tasks that can be used to organize MPI applications Multiple groups are useful for solving problems in linear algebra and domain decomposition implicit parallelism Programming style where parallelismis achieved by software layering that is parallel constructs are generated thro
81. ble precision startt endt elapsed time keepers external compcolumn comprow subroutines execute in threads e e e Q qgaqaaqgqagqagqgqagqagqadgqaaagqaaqana MPI initialization call mpi_init ierr call mpi comm size mpi comm world comm size ierr call mpi comm rank mpi comm world comm rank ierr Data initialization and start up if comm rank eq 0 then write 6 Initializing nrow x ncol array call getdata nrow ncol array write 6 Start computation endif call mpi barrier MPI COMM WORLD ierr startt mpi_wtime Compose MPI datatypes for row column send receive Note that the numbers from rbs i to rbe i are the indices of the rows belonging to the i th block of rows These indices Specify a portion the i th portion of a column and the datatype rdtype i is created as an MPI contiguous datatype to refer to the i th portion of a column Note this is a contiguous datatype because fortran arrays are stored column wise For a range of columns to specify portions of rows the situation is similar the numbers from cbs j to cbe j are the indices of the columns belonging to the j th block of columns These indices specify a portion the j th portion of a row and the datatype cdtype j is created as an MPI vector datatype to refer to the j th portion of a row Note this a vector datatype because adjacent elements in a row are actually spaced nrow elements apart in memory allocate rbs 0 comm s
82. block to the rank 1 process that computes 1 2 in the second step By repeating these steps all processes finish summations in row wise fashion the first outer loop in the illustrated program The second outer loop the summations in column wise fashion is done in the same manner For example at the beginning of the second step for the column wise summations the rank 2 process receives data from the rank 1 process that computed the 3 0 block The rank 2 process also sends the last column of the 2 0 block to the rank 3 process Note that each process keeps the same blocks for both of the outer loop computations 148 Appendix A Example applications multi_par f This approach is good for distributed memory architectures on which repartitioning requires massive data communications that are expensive However on shared memory architectures the partitioning of the compute region does not imply data distribution The row and column block partitioning method requires just one synchronization at the end of each outer loop For distributed shared memory architectures the mix of the two methods can be effective The sample program implements the twisted data layout method with MPI and the row and column block partitioning method with OPE NMP thread directives In the first case the data dependency is easily satisfied as each thread computes down a different set of columns In the second case we still want to compute down the columns fo
83. buffer contains the message the receiving process is free to access it and the status object that returns information about the received message is set Chapter 1 9 Introduction MPI concepts Collective operations Applications may require coordinated operations among multiple processes For example all processes need to cooperate to sum sets of numbers distributed among them MPI provides a set of collective operations to coordinate operations among processes These operations are implemented such that all processes call the same operation with the same arguments Thus when sending and receiving messages one collective operation can replace multiple sends and receives resulting in lower overhead and higher performance Collective operations consist of routines for communication computation and synchronization These routines all specify a communicator argument that defines the group of participating processes and the context of the operation Collective operations are valid only for intracommunicators ntercommunicators are not allowed as arguments Communication Collective communication involves the exchange of data among all processes in a group The communication can be one to many many to one or many to many The single originating process in the one to many routines or the single receiving process in the many to one routines is called the root Collective communications have three basic patterns Broadcast a
cess can unpack data received in a contiguous buffer and store it in noncontiguous locations.

Using derived datatypes is more efficient than using MPI_Pack and MPI_Unpack. However, derived datatypes cannot handle the case where the data layout varies and is unknown by the receiver, for example, messages that embed their own layout description.

Section 3.12, Derived Datatypes, in the MPI 1.0 standard describes the construction and use of derived datatypes. The following is a summary of the types of constructor functions available in MPI:

•  Contiguous (MPI_Type_contiguous): Allows replication of a datatype into contiguous locations.

•  Vector (MPI_Type_vector): Allows replication of a datatype into locations that consist of equally spaced blocks.

•  Indexed (MPI_Type_indexed): Allows replication of a datatype into a sequence of blocks, where each block can contain a different number of copies and have a different displacement.

•  Structure (MPI_Type_struct): Allows replication of a datatype into a sequence of blocks such that each block consists of replications of different datatypes, copies, and displacements.

After you create a derived datatype, you must commit it by calling MPI_Type_commit. HP MPI optimizes collection and communication of derived datatypes. (A short sketch using one of these constructors follows below.)

Section 3.13, Pack and unpack, in the MPI 1.0 standard describes the details of the pack and unpack functions for MPI. Used tog
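The constructor sketch referred to above, written in C and not taken from the guide's examples (it assumes at least two ranks, and the matrix dimensions are arbitrary): a vector datatype describes one column of a row-major matrix, so the strided column can be sent with a single MPI_Send:

    #include <stdio.h>
    #include <mpi.h>

    #define ROWS 4
    #define COLS 8

    int main(int argc, char *argv[])
    {
        int rank, i, j;
        double a[ROWS][COLS];      /* row-major matrix on rank 0 */
        double col[ROWS];          /* contiguous copy received on rank 1 */
        MPI_Datatype column;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Column elements are COLS doubles apart in memory, so describe them
         * as ROWS blocks of 1 element with stride COLS, then commit. */
        MPI_Type_vector(ROWS, 1, COLS, MPI_DOUBLE, &column);
        MPI_Type_commit(&column);

        if (rank == 0) {
            for (i = 0; i < ROWS; i++)
                for (j = 0; j < COLS; j++)
                    a[i][j] = i * COLS + j;
            /* Send column 2 as a single message. */
            MPI_Send(&a[0][2], 1, column, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Receive the strided column into a contiguous array. */
            MPI_Recv(col, ROWS, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
            for (i = 0; i < ROWS; i++)
                printf("col[%d] = %g\n", i, col[i]);
        }

        MPI_Type_free(&column);
        MPI_Finalize();
        return 0;
    }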
85. cesses that belong to the communicator are highlighted in the XMPI main window Displays the value of the tag argument associated with the message when it was sent Shows the count of the message data elements associated with the message when it was sent When you select the icon to the right of the ent field XMPI opens the XMPI Datatype dialog The XMPI Datatype dialog displays thetype map of the datatype associated with the message when it was sent Refer to Figure 11 on page 87 for the Datatype dialog Displays thetotal number of messages and the number of messages of the type described in the current Focus dialog The format is number of messages of the type described in the Current focus dialog of total number of messages A message type is defined by its message envelope consisting of the sender the communicator the tag the count and the datatype For example if a process is waiting to receive 10 messages where six of the messages have one type of message envelope and the remaining four have another the copy field toggles between 6 of 10 and 4 of 10 Usetheicon tothe right of the copy field to view the different Focus dialogs that exist to describe each message type Chapter 4 Figure 12 Step 3 Step 1 Step 2 Profiling Using XMPI XMPI treats six messages each with the same envelope as one copy and the remaining four messages as a different copy This way one Focus dialog is necessary for each
86. compute the E block next to the computed block Receive the last row of the c block that the next block being computed depends on G nrb rb 1 ncb mod nrb comm rank comm size call mpi sendrecv array rbe rb cbs cb 1 cdtype cb dest O array rbs nrb 1 cbs ncb 1 cdtype ncb src 0 mpi comm world mstat ierr endif enddo c c Sum up in each row The same logic as the loop above except rows and columns are c switched c src mod comm rank 1 comm size comm size dest mod comm_rank 1 comm_size do cb 0 comm size 1 rb mod cb comm rank comm size comm size call comprow nrow ncol array rbs rb rbe rb cbs cb cbe cb if cb lt comm size 1 then ncb cb 1 nrb mod ncb comm_rank comm_size comm_size call mpi sendrecv array rbs rb cbe cb 1 rdtype rb dest O array rbs nrb cbs ncb 1 1 rdtype nrb src O0 mpi comm world mstat ierr endif enddo c c Gather computation results call mpi barrier MPI COMM WORLD ierr endt mpi_wtime if comm_rank eq 0 then do src 1 comm_size 1 call mpi recv array l twdtype src src 0 mpi comm world mstat ierr enddo 152 Appendix A Example applications multi_par f elapsed endt startt write 6 else Computation took elapsed seconds call mpi_send array 1 twdtype comm rank 0 0 mpi comm world id ierr endif Dump to a file if comm rank write 8 endif 0000000000000 eq 0 then print Dumping to adi out open
87. count 10 from MPI_ANY_SOURCE call MPI_Recv data count MPI_DOUBLE_PRECISION from tag MPI COMM WORLD status ierr Appendix A 133 Example applications send_receive f call MPI_Get_Count status MPI DOUBLE PRECISION st count ierr St source status MPI SOURCE st tag status MPI TAG print Status info source st source tag st tag count st count print rank received data i i 1 10 endif call MPI Finalize ierr stop end send receive output The output from running the send receive executable is shown below The application was run with np 10 Process 0 of 10 is alive Process 1 of 10 is alive Process 3 of 10 is alive Process 5 of 10 is alive Process 9 of 10 is alive Process 2 of 10 is alive Status info source 0 tag 2001 count 10 9 received 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 Process 4 of is alive Process 7 of is alive Process 8 of is alive Process 6 of is alive eO co 134 Appendix A Example applications ping_pong c ping_pong c This C example is used as a performance benchmark to measure the amount of time t takes to send and receive data between two processes The buffers are aligned and offset from each other to avoid cache conflicts caused by direct process to process byte copy operations To run this example Definethe CHECK macroto check data integrity e Increase the number of bytes to at least twice the cache size
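For reference, the round-trip timing pattern that ping_pong.c is built around looks like the following sketch. This is not the shipped source: buffer alignment, the CHECK data-integrity option, and command-line handling are omitted, and the message size and iteration count are placeholder values. Run it with at least two processes.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        int rank, i, iters = 1000;
        int nbytes = 1024;                 /* message size, for example */
        char *buf;
        double t0, t1;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        buf = malloc(nbytes);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
            } else if (rank == 1) {
                MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
                MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("%d byte round trip: %f usec\n",
                   nbytes, (t1 - t0) * 1e6 / iters);

        free(buf);
        MPI_Finalize();
        return 0;
    }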
88. create an appfile and assign ranks Writes an optimization report to stdout MPI Cart create and MPI Graph create optimize the mapping of processes onto the virtual topology if rank reordering is enabled 39 Understanding HP MPI Running applications E2 Sets 1 as the value of TRUE and 0 as the value for FALSE when returning logical values from HP MPI routines called within Fortran 77 applications D Dumps shared memory configuration information Use this option to get shared memory values that are useful when you want to set the MPT_SHMCNTL flag E Disables function parameter error checking Turning off argument checking can improve performance z Enables zero buffering mode Set this flag to convert MPI Send and MPI_Rsend Calls in your code to MPI_Ssend without rewriting your code Refer to Troubleshooting Application hangs in MPI Send on page 130 for information about how using this option can help uncover nonportable code in your MPI application MP GANG MP GANG enables gang scheduling Gang scheduling improves the latency for synchronization by ensuring that all runable processes in a gang are scheduled simultaneously Processes waiting at a barrier for example do not have to wait for processes that are not currently scheduled This proves most beneficial for applications with frequent synchronization operations Applications with infrequent synchronization however may perform better if gang scheduling
89. Collecting profiling data: When you run your hello_world program as described in "Compiling and running your first application" on page 19, you can set options so that you collect counter instrumentation and profiling data to view and analyze using the mpiview and XMPI utilities. This section describes the mpirun options you can use to collect instrumentation data. For complete details about how to use the mpiview and XMPI utilities to analyze profiling information, refer to Chapter 4, "Profiling". Preparing mpiview instrumentation files: Counter instrumentation provides cumulative statistics about your applications. Once you have created an instrumentation profile, you can view the data either in ASCII format or graphically using the mpiview utility. To create instrumentation files in both formats when you run the hello_world program, enter:
    mpirun -i hello_world -np 4 hello_world
where
-i hello_world   Enables runtime instrumentation profiling for all processes and uses the name following the -i option (in this case hello_world) as the prefix to your instrumentation file.
-np 4            Specifies the number of processes.
hello_world      Specifies the name of the executable.
This invocation creates an instrumentation profile in two formats, each with the prefix hello_world as defined by the -i option: hello_world.instr is in ASCII format and hello_world.mpiview is in graphical format. You can use the mpiview utility to analyze the .mpiview format
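After the run completes, you can inspect both output files. For example (assuming, as in the examples elsewhere in this guide, that mpiview accepts the graphical file name as an argument):

    more hello_world.instr            # cumulative statistics in ASCII
    mpiview hello_world.mpiview       # graphical display in mpiview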
90. cutable is shown below The application was run with np 2 correct results Appendix A 141 Example applications cart C cart C This C program generates a virtual topology The class Node represents a node in a 2 D torus E ach process is assigned a node or nothing Each node holds integer data and the shift operation exchanges the data with its neighbors Thus north east south west shifting returns the initial data tinclude lt stdio h gt tinclude lt mpi h gt define NDIMS 2 typedef enum NORTH SOUTH EAST WEST Direction A node in 2 D torus class Node private MPI Comm comm int dims NDIMS coords NDIMS int grank lrank int data public Node void Node void void profile void void print void void shift Direction y A constructor Node Node void int i nnodes periods NDIMS Create a balanced distribution PI Comm size MPI COMM WORLD amp nnodes for i 0 i NDIMS i dims i PI Dims create nnodes NDIMS dims 0 Establish a cartesian topology communicator for i 0 i lt NDIMS i periods i 1 PI Cart create MPI COMM WORLD NDIMS dims periods 1 amp comm Initialize the data PI Comm rank MPI COMM WORLD amp grank if comm MPI COMM NULL lrank MPI PROC NULL data 1 142 Appendix A else 4 MPI Comm rank comm amp lrank data lrank MPI Cart coords comm lrank NDIMS
91. d O through 3 and the eight processes belonging to the calculate average application are ranked 4 through 11 because HP MPI assigns ranks in MPI COMM WORLD according to the order the programs appear in the appfile Fortran 90 programming features The MPI 1 1 standard defines bindings for Fortran 77 but not Fortran 90 Although most Fortran 90 MPI applications work using the Fortran 77 MPI bindings some Fortran 90 features can cause unexpected behavior when used with HP MPI In Fortran 90 an array is not always stored in contiguous memory When noncontiguous array data are passed to an HP MPI subroutine Fortran 90 copies the data into temporary storage passes it to the HP MPI subroutine and copies it back when the subroutine returns As a result HP MPI is given the address of the copy but not of the original data In some cases this copy in and copy out operation can cause a problem For a nonblocking HP MPI call the subroutine returns immediately and the temporary storage is deallocated When HP MPI tries to access the already invalid memory the behavior is unknown Moreover HP MPI operates dose to the system level and needs to know the address of the original data However even if the address is known HP MPI does not know if the data are contiguous or not Chapter 6 125 CAUTION Debugging and troubleshooting Troubleshooting HP MPI applications UNIX open file descriptors UNIX imposes a limit to the number of file descr
92. dix B XMPI resource file 163
Appendix C MPI 2.0 features supported 165
    MPI I/O 166
    Language interoperability 168
    Thread compliant library 170
    MPI_Init NULL arguments 174
    One-sided communication 175
    Miscellaneous features 178
Appendix D Standard flexibility in HP MPI 181
Glossary 185
Index 193
viii Table of Contents
Figures
Figure 1 Daemon communication 63
Figure 2 ASCII instrumentation profile 71
Figure 3 MPIVIEW Graph menu 74
Figure 4 MPIVIEW graph window 75
Figure 5 MPIVIEW Window menu 76
Figure 6 XMPI main window 80
Figure 7 XMPI Trace Selection 81
Figure 8 XMPI trace log 82
Figure 9 XMPI process information 85
Figure 10 XMPI Focus dialog 86
Figure 11 XMPI Datatype dialog 87
Figure 12 XMPI Kiviat dialog 89
Figure 13 Figure 14 Figure 15 Figure 16 Figure 17 Figure 18 Figure 19 Figure 20
93. dtype adisp ablen Scatter initial data with using derived datatypes defined above for the partitioning MPI_send and MPI recv will find out the layout of the data from those datatypes This saves application programs to manually pack unpack the data and more importantly gives opportunities to the MPI system for optimal communication strategies if comm rank eq 0 then do dest 1 comm size 1 call mpi send array l twdtype dest dest 0 mpi comm world x ierr enddo else call mpi_recv array 1 twdtype comm_rank 0 0 mpi_comm_world mstat ierr endif Computation Sum up in each column Each MPI process or a rank computes blocks that it is assigned The column block number is assigned in the variable cb The Starting and ending subscripts of the column block cb are Stored in cbs cb and cbe cb respectively The row block number is assigned in the variable rb The starting and ending subscripts of the row block rb are stored in rbs rb and rbe rb respectively as well src mod comm rank 1 comm size Appendix A 151 Example applications multi_par f dest mod comm_rank 1 comm_size comm_size ncb comm rank do rb 0 comm size 1 cb ncb c c Compute a block The function will go thread parallel if the c compiler supports OPENMP directives c call compcolumn nrow ncol array rbs rb rbe rb cbs cb cbe cb if rb lt comm size 1 then c e Send the last row of the block to the rank that is to
94. e MPI MT FLAGS on page 44 for more information Alternatively you may set the s a p option for the MPI FLAGS environment variable For the thread compliant library setting MPI_FLAGS s a p has the same effect as setting MPI MT FLAGS ct when the value of is greater than 0 MPI MT FLAGS ct takes priority over the default MP T_FLAGS sp0 setting Refer to MPI FLAGS on page 37 To set the level of thread support for your job you can specify the appropriate run time option in MPI MT FLAGS or modify your application to useMPI Init threadinstead of MPI Init To modify your application replace the call to PI rnit with MPI Init thread int argc char argv int required int provided where required Specifies the desired level of thread support provi ded Specifies the provided level of thread support Table 16 shows the possible thread initialization values for required and the values returned by provided for the non thread compliant library libmpi and for the thread compliant library libmtmpi Appendix C 171 MPI 2 0 features supported Thread compliant library MPI library libmpi libmpi libmpi libmpi libmtmpi libmtmpi libmtmpi libmtmpi y z 0 o Thread initialization values Value for required MPI THREAD SINGLE MPI THREAD FUNNELED MPI THREAD SERIALIZED MPI THREAD MULTIPLE MPI THREAD SINGLE MPI THREAD FUNNELED MPI THREAD SERIALIZED MPI THREAD MULTIPLE Value returned by
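A minimal C sketch of the source change described above (not taken from the product examples) replaces the call to MPI_Init with MPI_Init_thread and checks the level of support actually granted. The requested level MPI_THREAD_MULTIPLE is only an example; choose the level your application needs.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int required = MPI_THREAD_MULTIPLE;   /* desired level of thread support */
        int provided;

        MPI_Init_thread(&argc, &argv, required, &provided);

        if (provided < required)
            printf("warning: only thread support level %d is provided\n", provided);

        /* ... application code ... */

        MPI_Finalize();
        return 0;
    }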
95. e el does summations of columns in a thread c implicit none integer nrow of rows integer ncol 4 of columns double precision array nrow ncol compute region integer rbs row block start subscript integer rbe row block end subscript integer cbs column block start subscript integer cbe column block end subscript c c Local variables c integer i j c c The OPENMP directive below allows the compiler to split the e values for j between a number of threads By making i and j c private each thread works on its own range of columns j c and works down each column at its own pace i e E Note no data dependency problems arise by having the threads all c working on different columns simultaneously c CSOMP PARALLEL DO PRIVATE i 3 do j cbs cbe do i max 2 rbs rbe array i j array i 1 j array i j enddo enddo C OMP END PARALLEL DO end COX RA RAI RARA RARA RARA RARA RA RARA kk ck ck k ck ck KKK KKK KK KKK KKK KK KK KKK KKK KKK KKK KKK kc subroutine comprow nrow ncol array rbs rbe cbs cbe This subroutine does summations of rows in a thread qaaaqaa implicit none of rows of columns compute region row block start subscript row block end subscript column block start subscript integer nrow integer ncol double precision array nrow ncol integer rbs integer rbe integer cbs 154 Appendix A aa 00000000000 CSO CSO CSO CSO c Example applications multi pa
96. e 82 90 XMPI Trace Selection 8l directory structure MPI 25 distribute sections compute in parallel 131 140 dtype variable 8 9 11 13 Dump 92 dump shmem configuration 196 40 edde 37 114 130 egdb 37 114 130 enable instrumentation 23 50 trace generation 24 91 92 97 verbose mode 97 enhanced debugging output 119 environment variables MP GANG 40 MPI CC 29 MPI COMMD 35 MPI CXX 29 MPI DLIB FLAGS 35 MPI F77 29 MPI F90 29 MPI FLAGS 37 114 MPI GLOBMEMSIZE 41 MPI_INSTR 41 68 MPI LOCALIP 43 MPI MT FLAGS 44 45 46 MPI REMSH 45 MPI_SHMEMCNTL 46 MPI_WORKDIR 46 MPI_XMPI 47 NLSPATH 65 runtime 34 40 setting in appfiles 57 TOTALVIEW 48 XAPPLRESDIR 163 error checking disable 40 error conditions 128 ewdb 37 114 130 example applications 131 161 cart C 131 142 communicator c 131 146 compiling and running 132 compute pi f 68 131 138 copy default communi cator 131 146 distribute sections com pute in parallel 131 140 generate virtual topolo gy 131 10 c 156 master worker f90 131 140 measure send receive time 131 multi par f 131 147 ping pong c 131 135 receive operation 131 send operation 131 send_receive f 133 thread_safe c 158 use ADI on 2D compute region 131 exceeding file descriptor limit 126 exdb 37 114 130 Express option get full trace 93 get partial trace
97. e all windows command For each graph you invoke from the Graph pulldown menu a new item appears in the Window pulldown menu Each new item has thetitle of the graph along with the name of the data file used to generate the graph Figure 5 displays an example of the Window menu containing the Close all windows option and four graph options MPIVIEW Window menu MPIVIEW send_receive mpiview Window Close all windows MPIVIEW Application summary by rank send_receive mpiview MPIVIEW Routine summary send_receive mpiview MPIVIEW Message length summary by routine send receive mpiview MPIVIEW Message summary by rank and peer send receive mpiview Analyzing graphs Each graph window provides functionality accessible through menu items the toolbar and using mouse manipulations Table 7 describes the functionality availableto help you analyze of your data 76 Chapter4 Table 7 Functionality Save graph as a postscri pt file Reset a three dimensional graph to its original position after you rotate it or usethe zoom feature Change the context of the graph View exact data values for regions Rotate a three dimensional graph Zoom on a particular section of a three dimensional graph Toggle the graph legend NOTE Display graphed data in text format Profiling Using counter instrumentation MPIVIEW analysis functions How to invoke Select the File pulldown menu then Save as or
98. e default value of is infinity Default stdio bline i Chapter 6 127 Debugging and troubleshooting Troubleshooting HP MPI applications Completing In HP MPI MPT_Finalize is a barrier like collective routine that waits until all application processes have called it before returning If your application exits without calling MPI Finalize pending requests may not complete When running an application mpirun waits until all processes have exited If an application detects an MPI error that leads to program termination it calls MPT_Abort instead You may want to code your error conditions using MPI Abort which deans up the application Each HP MPI application is identified by a job ID unique on the server wherempi run is invoked If you use the j option mpirun prints the job ID of the application that it runs Then you can invoke mpi job with the job ID to display the status of your application If your application hangs or terminates abnormally you can use mpiclean to kill any lingering processes and shared memory segments mpiclean uses thejob ID from mpirun j tospecify the application to terminate 128 Chapter 6 Debugging and troubleshooting Frequently asked questions Frequently asked questions This section describes frequently asked HP MPI questions These questions address the following issues Timein MPI Finalize MPI dean up e Application hangs in MPI Send Time in MPI Finalize QUESTION Whe
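For example, an error path coded with MPI_Abort (a standard MPI call; this fragment is illustrative and not from the shipped examples, and the input file name is hypothetical) lets HP MPI clean up all processes of the job instead of leaving requests pending:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank;
        FILE *fp;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        fp = fopen("input.dat", "r");       /* hypothetical input file */
        if (fp == NULL) {
            fprintf(stderr, "rank %d: cannot open input.dat\n", rank);
            MPI_Abort(MPI_COMM_WORLD, 1);   /* terminates and cleans up the whole job */
        }

        /* ... normal processing ... */
        fclose(fp);
        MPI_Finalize();
        return 0;
    }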
99. e specified for MPI GLOBMEMSIZE is less than the amount of global shared memory allocated for the host Otherwise swapping overhead will degrade application performance MPI INSTR MPI INSTR enables counter instrumentation for profiling HP MPI applications TheMPI INSTR syntax is a colon separated list no spaces between options as follows prefixr bzl 42 b 1 2 nd nc 0 n1 np nm c where prefix Specifies the instrumentation output file prefix The rank zero process writes the application s measurement data to prefix instr in ASCII and to prefix mpiview in a graphical format readable by mpiview If the prefix does not represent an absolute pathname the instrumentation output file is opened in the working directory of the rank zero process when MPI Init is called b 1 2 Redefines the instrumentation message bins to include a bin having byte range 1 and 2 inclusive The high bound of the range 2 can be infinity representing Chapter 3 41 NOTE Understanding HP MPI Running applications the largest possible message size When you specify a number of bin ranges ensure that the ranges do not overlap nd Disables rank by peer density information when running counter instrumentation nc Specifies no dobber If the instrumentation output file exists MPI Init aborts off Specifies counter instrumentation is initially turned off and only begins after all processes collectivel
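As an example of the colon-separated form, the following setting enables instrumentation with the output prefix my_profile, refuses to clobber an existing file, and defers collection until the application turns it on; the exact option string is an assumption based on the descriptions above, so verify it against your release. The nonstandard MPIHP_Trace_on and MPIHP_Trace_off routines then bracket the region of interest in the source (they are shown here without arguments; check the product header for the exact prototypes).

    setenv MPI_INSTR my_profile:nc:off

    /* in the application, collect data only for the region of interest */
    MPIHP_Trace_on();
    /* ... code to be measured ... */
    MPIHP_Trace_off();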
100. each executable using it You can use HP MPI 1 7 as archive or shared libraries However your previous version of HP MPI must be retained in order to run executables built with archive libraries on previous versions of HP MPI An advantage of shared libraries is that when the library is updated eg to fix a bug all programs which use the library immediately enjoy the fix The disk and memory savings of shared libraries is offset by a slight performance penalty when a shared executable starts up References to shared library routines must be resolved by finding the libraries containing those routines However references need be resolved only once sothe performance penalty is quite small In order to use shared libraries HP MPI must be installed on all machines in the same directory Shared libraries are used by default In order tolink with archive libraries use the aarchive shared linker option Archive libraries are not available on the Itanium based version of HP MPI 54 Chapter 3 Understanding HP MPI Running applications Appfiles An appfile is a text file that contains process counts and a list of programs When you invoke mpirun with the name of the appfile mpirun parses the appfile to get information for the run You can use an appfile when you run a single executable file on a single host and you must use an appfile when you run on multiple hosts or run multiple executable files Creating an appfile The format of e
101. elow The application was run with np 4 Process 1 data read back is correct Process 3 data read back is correct Process 2 data read back is correct Process 0 data read back is correct Appendix A 157 Example applications thread safe c thread safe c In this C example N dients loop MAx WORK times As part of a single work item a dient must request service from one of Nservers at random Each server keeps a count of the requests handled and prints a log of the requests to stdout include stdio h include lt mpi h gt include pthread h define MAX WORK 40 define SERVER TAG 88 define CLIENT TAG 99 define REQ SHUTDOWN 1 static int service cnt 0 int process request request int request i r f request REQ SHUTDOWN service_cnt eturn request void server args void args int rank request MPI_Status status rank int args while 1 MPI Recv amp request 1 MPI_INT MPI_ANY_SOURCE SERVER TAG MPI COMM WORLD amp status if process request request REQ SHUTDOWN break MPI Send amp rank 1 MPI INT status MPI SOURCE CLIENT TAG MPI COMM WORLD printf server d processed request d for client d n rank request status MPI SOURCE printf server d total service requests d n rank service cnt return void 0 158 Appendix A Example applications thread safe c void client rank size int rank int size int w
102. em platforms HP MPI version 1 7 runs under HP UX 11 0 or higher HP MPI is supported on multinode HP UX The HP UX operating system is used on Workstations s700 series Midrange servers s800 series High end servers xiv NOTE CAUTION Notational conventions This section describes notational conventions used in this book bold monospace monospace italic Brackets KeyCap In command examples bold monospace identifies input that must be typed exactly as shown In paragraph text monospace identifies command names system calls and data structures and types In command examples monospace identifies command output including error messages In paragraph text italic identifies titles of documents n command syntax diagrams italicidentifies variables that you must provide The following command example uses brackets to indicate that the variable output fileis optional command input file output file In command examples square brackets designate optional entries In paragraph text KeyCap indicates the keyboard keys or the user selectable buttons on the Graphical User Interface GUI that you must press to execute a command A note highlights important supplemental information A caution highlights procedures or information necessary to avoid damage to equipment damage to software loss of data or invalid test results XV Associated Documents Associated documents include
103. ents availablein shared memory for inbound messages nbound messages are sent from processes on one or more hosts to processes on a given host using the communication daemon The default valuefor in frags is 64 Increasing the number of fragments for applications with a large number of processes improves system throughput Refer to Communicating using daemons on page 62 for more information MPI DLIB FLAGS MPI DLIB FLAGS controls runtime options when you usethe diagnostics library TheMPI DLIB FLAGS syntax is a comma separated list as follows ns h strict nmsg nwarn dump prefix dumpf prefix x NUM where ns Disables message signature analysis h Disables default behavior in the diagnostic library that ignores user specified error handlers The default considers all errors to be fatal Chapter 3 35 Understanding HP MPI Running applications strict nmsg nwarn dump prefix dump f prefix xNUM Enables MPI object space corruption detection Setting this option for applications that make calls to routines in the MPI 2 0 standard may produce false error messages Disables detection of multiple buffer writes during receive operations and detection of send buffer corruptions Disables the warning messages that the diagnostic library generates by default when it identifies a receive that expected more bytes than were sent Dumps unformatted all sent and received messages to
104. env MPI_NOBACKTRACE. See "Backtrace functionality" on page 119 for more information. MPI_REMSH: MPI_REMSH specifies a command other than the default remsh to start remote processes. The mpirun, mpijob, and mpiclean utilities support MPI_REMSH. For example, you can set the environment variable to use a secure shell:
    setenv MPI_REMSH /bin/ssh
The alternative remote shell command should be a drop-in replacement for /usr/bin/remsh; that is, the argument syntax for the alternative shell should be the same as for /usr/bin/remsh. Chapter 3 45 Understanding HP MPI Running applications MPI_SHMEMCNTL: MPI_SHMEMCNTL controls the subdivision of each process's shared memory for the purposes of point-to-point and collective communications. The MPI_SHMEMCNTL syntax is a comma-separated list as follows: nenv,frag,generic where nenv specifies the number of envelopes per process pair (the default is 8), frag denotes the size in bytes of the message-passing fragments region (the default is 87.5 percent of shared memory after mailbox and envelope allocation), and generic specifies the size in bytes of the generic shared memory region (the default is 12.5 percent of shared memory after mailbox and envelope allocation). MPI_TMPDIR: By default, HP MPI uses the /tmp directory to store temporary files needed for its operations. MPI_TMPDIR is used to point to a different temporary directory. The MPI_TMPDIR syntax is: directory wh
105. env NLSPATH 65 setenv XAPPLRESDIR 163 constructor functions contiguous 15 indexed 15 structure 15 vector 15 context communication 8 13 context switching 109 contiguous and noncontigu ous data 14 contiguous constructor 15 convert objects between lan guages 168 copy field 88 copy See number of mes sage copies sent corresponding MPI block ing nonblocking calls 9 count variable 7 8 9 11 counter instrumentation 41 68 ASCII format 69 create profile 68 using mpiview 73 77 create appfile 55 ASCII profile 68 instrumentation profile 68 trace file 79 CXperf 100 D daemons multipurpose 58 number of processes 58 daemons communication 62 data element count 88 DDE 37 114 130 debug HP MPI 37 114 130 See also diagnostic li brary See also enhanced de bugging output See also MPI Flags debuggers 114 decrease trace magnifica tion 83 derived data types 14 dest variable 8 9 determine group size 4 no of messages sent nO of processes in com municator 6 rank of calling process 4 195 diagnostics library message signature anal ysis 118 MPI object space cor ruption 118 multiple buffer writes detection 118 using 118 dial time 82 90 dialogs Kiviat 84 85 89 mpirun options 97 XMPI Application Browser 90 XMPI buffer size 96 XMPI Confirmation 93 XMPI Datatype 87 XMPI Express 93 XMPI Focus 86 XMPI Kiviat 89 XMPI monitor options 95 XMPI Trac
106. ere directory specifies an existing directory used to store temporary files MPI_WORKDIR By default HP MPI applications execute in the directory where they are started MPI WORKDIR changes the execution directory The MPI WORKDIR syntax is shown below directory where directory specifies an existing directory where you want the application to execute 46 Chapter 3 Understanding HP MPI Running applications MPI XMPI MPI XMPI specifies options for runtime trace generation These options represent an alternate way to set tracing rather than using the trace options supplied with mpirun The argument list for MP1_XMPT contains the prefix name for the file where each process writes its own trace data Before your application exits MPI_Finalize consolidates the process trace files to a single trace file named prefix tr If the file prefix does not represent an absolute pathname for example tmp test the consolidated trace file is stored in the directory in which the process is executing MPI Init ThempPI_XMPI syntax is a colon separated list no spaces between options as follows prefix bsiHHH nc off s where prefix Spedifies the tracing output file prefix prefixisa required parameter bs Denotes the buffering size in kbytes for dumping raw trace data Actual buffering size may be rounded up by the system The default buffering size is 4096 kbytes Specifying a large buffering size reduces the need to
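For example (C-shell syntax; the directory names are illustrative only):

    setenv MPI_TMPDIR /scratch/tmp        # scratch area for HP MPI temporary files
    setenv MPI_WORKDIR /home/smith/run1   # directory in which the ranks execute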
107. ers for all MPI function calls If you have a well behaved application you can turn off argument checking by setting MPI FLAGS E to improve performance If you arerunning an application stand alone on a dedicated system setting MPI FLAGS y allows MPI to busy spin thereby improving latency See MPI FLAGS on page 37 for more information on the y option 104 Chapter 5 Tuning Message latency and bandwidth Message latency and bandwidth Latency is the time between the initiation of the data transfer in the sending process and the arrival of the first byte in the receiving process Latency is often dependent upon the length of messages being sent An application s messaging behavior can vary greatly based upon whether a large number of small messages or a few large messages are sent Message bandwidth is the reciprocal of the time needed to transfer a byte Bandwidth is normally expressed in megabytes per second Bandwidth becomes important when message sizes are large Toimprove latency or bandwidth or both Reducethe number of process communications by designing coarse grained applications Usederived contiguous data types for dense data structures to eliminate unnecessary byte copy operations in certain cases Use derived data types instead of MPI Pack and MPI Unpack if possible HP MPI optimizes noncontiguous transfers of derived data types Usecollective operations whenever possible This eliminates the overhead of
108. es how long the application has been running in seconds The time is indicated on the tool bar The trace log display area shows a separate trace for each process in the application Dial timeis represented as a vertical line The rank for each process is shown where the dial time line intersects a process trace Each process trace can have three colors Green Represents the length of time a process runs outside of MPI Red Represents the length of time a process is blocked waiting for communication to finish before the process resumes execution Yellow Represents a process s overhead time inside MPI for example time spent doing message packing Blocking point to point communications are represented by a trace for each process showing the time spent in system overhead and time spent blocked waiting for communication A line between process traces connects the appropriate send and receive trace segments The line starts at the beginning of the send segment and ends at the end of the receive segment For nonblocking point to point communications XMPI draws a system overhead segment when a send and receive are initiated When the communication is completed using a wait or a test XMPI draws Chapter 4 83 Profiling Using XMPI segments showing system overhead and blocking time Lines are drawn between matching sends and receives except in this case the line is drawn from the segment where the send was initiated to the segment
109. es only They may not necessarily represent the most efficient way to solvea given problem To build and run the examples follow the following procedure Changeto a writable directory Copy all files from the help directory to the current writable directory cp opt mpi help Compile all the examples or a single example To compile and run all the examples in the help directory at your UNIX prompt enter make To compile and run the thread safec program only at your UNIX prompt enter make thread safe 132 Appendix A send_receive f Example applications send_receive f In this Fortran 77 example process O sends an array to other processes in the default communicator MPI COMM WORLD program main include mpif h integer rank size to from tag count i ierr integer src dest integer st source st tag st count integer status MPI STATUS SIZE double precision data 100 call MPI Init ierr call MPI Comm rank MPI COMM WORLD rank ierr call MPI Comm size MPI COMM WORLD size ierr if size eq 1 then print must have at least 2 processes call MPI_Finalize ierr stop endif print Process rank of size is alive dest size 1 src 0 if rank eq src then to dest count 10 tag 2001 do i 1 10 data i I enddo call MPI_Send data count MPI_DOUBLE_PRECISION to tag MPI_COMM_WORLD ierr endif if rank eq dest then tag MPI_ANY_TAG
110. es priority over the default MP I_FLAGS sp0 Refer to MPI_MT_FLAGS on page 44 and Thread compliant library on page 170 for additional information Enables spin yield logic is the spin value and is an integer between zero and 10 000 The spin value specifies the number of milliseconds a process should block waiting for a message before yielding the CPU to another process How you apply spin yield logic depends on how well synchronized your processes are For example if you have a process that wastes CPU time blocked waiting for messages you can use spin yield to ensure that the process relinquishes the CPU to other processes Do this in your appfile by setting y to yo for the process in question This specifies zero milliseconds of spin that is immediate yield On the other extreme you can set spin yield for a process sothat it spins continuously that is it does not relinquish the CPU whileit waits for a message To spin without yielding specify y without a spin value If thetime a process is blocked waiting for messages is short you can possibly improve performance by setting a spin value between 0 and 10 000 that ensures the process does not relinquish the CPU until after the message is received thereby reducing latency The system treats a nonzero spin value as a recommendation only It does not guarantee that the value you specify is used Refer to Appfiles on page 55 for details about how to
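For example, to make only the rank started by one appfile line yield immediately while the other ranks keep the default behavior, you could set the variable on that line with the appfile -e option. The host and program names below are placeholders, and the -e var=val form is the one described under "Creating an appfile"; treat the exact line layout as a sketch to adapt:

    -h host_a -np 1 -e MPI_FLAGS=y0 coordinator
    -h host_a -np 7 worker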
111. Use the appropriate debugger command in each session to continue program execution. Each process runs and stops at the breakpoint you set after MPI_Init. Continue to debug each process using the appropriate commands for your debugger. Chapter 6 115 Debugging and troubleshooting Debugging HP MPI applications Using a multi-process debugger: HP MPI supports the TotalView debugger on HP-UX version 11.0 and later. The preferred method when you run TotalView with HP MPI applications is to use the mpirun runtime utility command. For example:
    mpicc myprogram.c -g
    mpirun -tv -np 2 a.out
In this example, myprogram.c is compiled using the HP MPI compiler utility for C programs (refer to "Compiling and running your first application" on page 19). The executable file is compiled with source line information, and then mpirun runs the a.out MPI program.
-g       Specifies that the compiler generate the additional information needed by the symbolic debugger.
-np 2    Specifies the number of processes to run (2, in this case).
-tv      Specifies that the MPI ranks are run under TotalView.
Alternatively, use mpirun to invoke an appfile:
    mpirun -tv -f my_appfile
-tv        Specifies that the MPI ranks are run under TotalView.
-f appfile Specifies that mpirun parses my_appfile to get program and process count information for the run. Refer to "Creating an appfile" on page 55 for details about setting up your appfile. Refer to "mpirun" on page 49 for detai
112. ether these routines allow you to transfer heterogeneous data in a single message thus amortizing the fixed overhead of sending and receiving a message over the transmittal of many elements Refer to Chapter 3 User Defined Datatypes and Packing in MPI The Complete Referencefor a discussion of this topic and examples of construction of derived datatypes from the basic datatypes using the MPI constructor functions Chapter 1 15 Introduction MPI concepts Multilevel parallelism By default processes in an MPI application can only do one task at a time Such processes are single threaded processes This means that each process has an address space together with a single program counter a set of registers and a stack A process with multiple threads has one address space but each process thread has its own counter registers and stack Multilevel parallelism refers to MPI processes that have multiple threads Processes become multithreaded through calls to multithreaded libraries parallel directives and pragmas and auto compiler parallelism Multilevel parallelism is beneficial for problems you can decompose into logical parts for parallel execution for example a looping construct that spawns multiple threads to do a computation and joins after the computation is complete The example program multi par f on page 147 is an example of multilevel parallelism Advanced topics This chapter only provides a brief introduc
113. ew and displays counter instrumentation data from the my data mpiview file For more information refer to Creating an instrumentation profile on page 68 and Viewing instrumentation data with mpiview on page 73 HP MPI 1 7 is the last release that supports XMPI and mpiview XMPI and mipview are not supported for Itanium based systems Communicating using daemons By default off host communication between processes is implemented using direct socket connections between process pairs For example if process A on host1 communicates with processes D and E on host2 then process A sends messages using a separate socket for each process D and E This is referred to as the n squared or direct approach because to run an n process application n sockets are required to allow processes on one host to communicate with processes on other hosts When you usethis direct approach you should be careful that the total number of open Sockets does not exceed the system limit You can also use an indirect approach and specify that all off host communication occur between daemons by specifying the comma option tothe mpirun command In this case the processes on a host use shared 62 Chapter 3 Figure 1 NOTE Daemon process Understanding HP MPI Running applications memory to send messages to and receive messages from the daemon The daemon in turn uses a socket connection to communicate with daemons on other hosts Figure 1 show
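For example, to run an existing appfile with daemon-based communication and enlarge the daemon fragment pools, you might use the commands below. The -commd spelling of the mpirun option and the comma-separated MPI_COMMD value are assumptions based on the descriptions in this guide, so verify them against your release; the fragment counts are illustrative.

    setenv MPI_COMMD 256,128        # out_frags,in_frags
    mpirun -commd -f appfile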
114. f each process buffer are unloaded every time a snapshot is taken For communication intensive applications process buffers can quickly fill and overflow You can enable or disable automatic snapshot while your application is running This can be useful during troubleshooting when the application runs to a certain point and you want to disable automatic snapshot to study process state information Monitor interval in seconds Specifies in seconds how often XMPI takes a snapshot when automatic snapshot is enabled Step 3 Selec Buffers from the Options menu XMPI opens the XMPI buffer size dialog as shown in Figure 16 Figure 16 XMPI buffer size dialog XMPI buffer size Specify the size in kilobytes for each process buffer When you run an application state information for each process is stored in a separate buffer You may need to increase buffer size if overflow problems occur 96 Chapter 4 Profiling Using XMPI Step 4 Select mpirun from the Options menu XMPI opens the mpirun options dialog as shown in Figure 17 Figure 17 mpirun options dialog mpirun options The fields include Print job ID Enables printing of the HP MPI jobID Verbose Enables verbose mode Tradng Enables runtime trace generation for all application processes When you select Tracing XMPI expands the options dialog to indude more tracing options as shown in Figure 18 Chapter 4 97 Figure 18 Profiling Using XMPI Figure 18
115. flush raw trace data to a file when process buffers reach capacity Flushing too frequently can cause communication routines to run slower nc Specifies no clobber which means that an HP MPI application aborts if a file with the name specified in prefix already exists off Denotes that trace generation is initially turned off and only begins after all processes collectively call MPIHP Trace on S Specifies a simpler tracing mode by omitting tracing for MPI Test MPI Testall MPI Testany and MPI Testsome calls that do not complete a request This option may reduce the size of trace data so that xmpi runs faster Chapter 3 47 NOTE Understanding HP MPI Running applications Even though you can specify tracing options through the MPI XMPI environment variable the recommended approach is to use the mpirun command with the t option instead In this case the specifications you provide with the t option take precedence over any specifications you may have set with MP1_XMPI Using mpirun to specify tracing options guarantees that multihost applications do tracing in a consistent manner Refer to mpirun on page 49 for more information Trace file generation in conjunction with XM PI and counter instrumentation are mutually exdusive profiling techniques To generate tracing output files for multihost applications you must invoke mpirun on a host where at least one MPI process is running HP MPI writes the trace file prefix tr
116. from the XMPI Application Browser dialog 94 Chapter 4 Figure 15 Profiling Using XMPI Changing default settings and viewing options You should initially run your appfile using the XMPI default settings You can change XMPI default settings and profile viewing options from the Options pulldown menu The Options menu has three commands Monitoring Controls automatic snapshot Buffers Controls buffer size for processes mpirun Controls tracing options Use the following instructions to change the XMPI default settings and your viewing options Step 1 Enter xmpi to open the XMPI main window You can specify options to change the default XM PI window settings size color position etc Refer to xmpi on page 61 for details Step 2 Select the Options menu then Monitoring XMPI opens the XMPI monitor options dialog as shown in Figure 15 XMPI monitor options dialog XMPI monitor options The fields include Automatic snapshot Enables the automatic snapshot function If automatic snapshot is enabled XMPI takes snapshots of the application you are running and displays state information for each process Chapter 4 95 Profiling Using XMPI If automatic snapshot is disabled XMPI displays information for each process when the application begins However you can only update this information manually Disabling automatic snapshot may lead to buffer overflow problems because the contents o
117. gram arguments and are not processed by mpirun Adding program arguments to your appfile When you invoke mpirun using an appfile arguments for your program are supplied on each line of your appfile Refer to Creating an appfile on page 55 HP MPI also provides an option on your mpirun command lineto provide additional program arguments to those in your appfile This is useful if you wish to specify extra arguments for each program listed in your appfile but do not wish to edit your appfile To use an appfile when you invoke mpirun use one of the following as described in mpirun on page 49 mpirun mpirun options f appfile extra args for appfile e bsub lsf options pam mpi mpirun mpirun options f appfile extra args for appfile The extra args for_appfile option is placed at the end of your command line after appfile to add options to each line of your appfile Arguments placed after are treated as program arguments and are not processed by mpirun Use this option when you want to specify program arguments for each line of the appfile but want to avoid editing the appfile For example suppose your appfile contains h voyager np 10 send receive argl arg2 h enterprise np 8 compute pi If you invoke mpirun using the following command line mpirun f appfile arg3 arg4 arg5 Thesend receive command line for machine voyager becomes send receive argl arg2 arg3 arg4 arg5 Thecompute pi command
Figure 7 XMPI Trace Selection dialog Chapter 4 81
119. he thread compliant library or the diagnostic library XMPI is useful when analyzing programs at the application level for example examining HP MPI datatypes and communicators You can run XMPI without having to recompile or relink your application XMPI runs in one of two modes postmortem mode or interactive mode In postmortem mode you can view trace information for each process in your application In interactive mode you can monitor process communications by taking snapshots while your application is running The default X resource settings that determine how XMPI displays on your workstation are stored in opt mpi lib X11 app defaults XM PI See Appendix B XMPI resource file for a list of these settings 78 Chapter 4 CAUTION Profiling Using XMPI Working with postmortem mode To use XMPI s postmortem mode you must first create a trace file Load the trace file into XMPI to view state information for each process in your application Creating a trace file To create a trace file use the following syntax mpirun t Spec np program as described in mpirun on page 49 and Preparing XMPI files on page 24 By default XMPI profiles the entire application from MPI Init to MPI Finalize However HP MPI provides nonstandard MPIHP Trace on and MPIHP_Trace_off routines to help troubleshoot application problems at finer granularity To useMPIHP Trace on and MPIHP Trace off 1 Insert the MPIHP Trace
120. he what com mand 18 121 version information 18 121 See MPIHP Trace off See MPIHP Trace on tuning 103 111 twisted data layout 149 U under subscribed See subscription types UNIX open file descriptors 126 unpacking and packing 14 using counter instrumentation 68 gang scheduling 40 mpiview 62 73 77 multiple network inter faces 107 profiling interface 101 XMPI in interactive mode 90 95 in postmortem mode 79 80 XMPI V variables buf 7 9 11 comm 8 9 11 12 13 count 7 8 9 11 dest 8 9 dtype 8 9 11 13 MPI DEBUG CONT 114 MPI ROOT 25 op 13 recvbuf 12 13 recvcount 12 recvtype 12 req 9 root 11 12 13 205 runtime 34 40 sendbuf 12 13 sendcount 12 sendtype 12 source 8 9 status 8 tag 8 9 XAPPLRESDIR 163 vector constructor 15 Verbose field 97 verbose mode 97 version using what 18 121 View 82 view kiviat information 89 multiple mpiview graphs 76 process info 85 trace file 80 view options changing and setting 95 viewing ASCII profile 69 instrumentation file 73 TI trace file 80 99 W WDB 37 114 130 what command 18 121 206 X X resource environment variable 163 XAPPLRESDIR 163 XDB 37 114 130 XMPL 78 99 Application Browser di alog 90 buffer size dialog 96 command line syntax 61 Confirmation dialog 93 Datatype dialog 87 display 78 Express dialog 93 Focus dialog 86 Focus dialog message queue
121. ications you must invoke mpirun on a host where at least one MPI process is running HP MPI writes the trace file prefix tr tothe working directory on the host where mpirun runs When you enable instrumentation for multihost runs and invoke mpirun either on a host where at least one MPI process is running or on a host remote from all your MPI processes HP MPI writes the instrumentation output files prefix instr and prefix mpiview to the working directory on the host that is running rank O mpijob mpi job lists the HP MPI jobs running on the system Invoke mpi job on the same host as you initiated mpirun mpiJjob syntax is shown below mpijob help a u j id id id where help Prints usage information for the utility a Lists jobs for all users u Sorts jobs by user name j id Provides process status for job id You can list a number of job IDs in a space separated list Chapter 3 59 Understanding HP MPI Running applications When you invoke mpi job it reports the following information for each job JOB HP MPI job identifier USER User name of the owner NPROCS Number of processes PROGNAME Program names used in the HP MPI application By default your jobs are listed by job ID in increasing order However you can specify the a and u options to change the default behavior An mpi job output using the a and u options is shown below listing jobs for all users and sorting them by user name
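For example, the following commands list all users' jobs sorted by user name and then query the status of one job by its ID; the job ID shown is a placeholder, and using -j with mpiclean to kill that same job is an assumption based on the mpijob syntax described above.

    mpijob -a -u
    mpijob -j 2354
    mpiclean -j 2354      # only if the job must be terminated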
122. igure 13 on page 92 for the XMPI Dump dialog The Dump utility is only available if you first enable runtime trace generation for all application processes as follows Select Options from the main window s pulldown menu then mpirun XMPI invokes an mpirun options dialog Select Tracing in the mpirun options dialog Enter a prefix for the tr file in the Prefix field Chapter 4 91 Profiling Using XMPI Refer to Changing default settings and viewing options on page 95 for more details about enabling runtime trace generation and the mpirun options dialog At any time while your application is running you can select Dump from the Trace menu XMPI invokes the Dump dialog displayed in Figure 13 Figure 13 XMPI Dump dialog Specify the name of the consolidated tr output file The name you specified in the Prefix field in the mpirun options trace dialog is entered by default You can usethis name or type another After you have created the tr output file you can resume snapshot monitoring Express utility The Express utility allows generation of an XMPI Trace log using the data collected up to the current time in the application s life span Refer to Figure 8 on page 82 for an example of a Tracelog Express like the Dump utility is only available if you first enable runtime trace generation for all application processes by selecting the Options pulldown menu then mpirun and then the Tracing button on
123. ing MPI_MT_FLAGS ct has the same effect as setting MPI_FLAGS s a p when the value of that is greater than 0 MPI MT FLAGS ct takes priority over the default MPI FLAGS spO setting Refer to MPI FLAGS on page 37 Thesingle fun serial and mult options are mutually exclusive For example if you specify the serial and mult options in MPI MT FLAGS only thelast option specified is processed in this case the mult option If no runtime option is specified the default is mult For more information about using MPI MT FLAGS with the thread compliant library refer to Thread compliant library on page 170 44 Chapter 3 Understanding HP MPI Running applications MPI NOBACKTRACE On PA RISC systems a stack traceis printed when the following signals occur within an application SIGILL SIGBUS SIGSEGV SIGSYS In the event one of these signals is not caught by a user signal handler HP MPI will display a brief stack tracethat can be used to locate the signal in the code Signal 10 bus error PROCEDURE TRACEBACK 0 0x0000489c bar Oxc a out 1 0x000048c4 foo Oxlc a out 2 0x000049d4 main Oxa4 a out 3 0xc013750c _start 0xa8 usr lib libc 2 4 0x0003b50 SSTARTS 0x1a0 a out This feature can be disabled for an individual signal handler by dedaring a user level signal handler for the signal To disable for all signals set the environment variable MPI NOBACKTRACE set
124. intercommunicators Many MPI applications depend upon knowing the number of processes and the process rank within a given communicator There are several communication management functions two of the more widely used are Chapter 1 5 Introduction MPI concepts MPI Comm size andMPI Comm rank The process rank is a unique number assigned to each member process from the sequence 0 through size 1 where sizeis the total number of processes in the communicator To determine the number of processes in a communicator use the following syntax MPI Comm size MPI Comm comm int size where comm Represents the communicator handle size Represents the number of processes in the group of comm To determine the rank of each process in comm use MPI Comm rank MPI Comm comm int rank where comm Represents the communicator handle rank Represents an integer between zero and size 1 A communicator is an argument to all communication routines The C code example communi cator c on page 146 displays the use MPI Comm dup one of the communicator constructor functions and MPI Comm free the function that marks a communication object for deallocati on Sending and receiving messages There are two methods for sending and receiving data blocking and nonblocking In blocking communications the sending process does not return until the send buffer is available for reuse In nonblocking communications the sending process return
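The following short C program (illustrative, not one of the shipped examples) shows the two calls in context: each process learns the communicator size and its own rank and prints them, much like the output of the send_receive.f example.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int size, rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes in the group */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process: 0 through size-1   */

        printf("Process %d of %d is alive\n", rank, size);

        MPI_Finalize();
        return 0;
    }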
125. ions that use a large number of nonblocking messaging requests require debugging support or need to control process placement may need a more customized configuration Environment variables are always local to the system where mpi run runs To propagate environment variables to remote hosts specify each variable in an appfile using the e option See Creating an appfile on page 55 for more information The environment variables that affect the behavior of HP MPI at runtime are listed below and described in the following sections PI_COMMD s PI_DLIB_FLAGS PI_FLAGS P_GANG PI_GLOBMEMSIZE s PI_INSTR PI_LOCALIP gt PI MT FLAGS MPI NOBACKTRACE PI REMSH PI SHMEMCNTL nd PI TMPDIR PI_WORKDIR PI_XMPI e TOTALVIEW 34 Chapter 3 Understanding HP MPI Running applications MPI COMMD MP I_COMMD routes all off host communication through daemons rather than between processes The MP I_COMMD syntax is as follows out frags in frags where out frags Specifies the number of 16K byte fragments availablein shared memory for outbound messages Outbound messages are sent from processes on a given host to processes on other hosts using the communication daemon The default value for out frags is 64 Increasing the number of fragments for applications with a large number of processes improves system throughput in frags Specifies the number of 16K byte fragm
126. iple executables. For details about building your appfile, refer to "Creating an appfile" on page 55. Starting an application without using the mpirun command is no longer supported. Types of applications: HP MPI supports two programming styles, SPMD applications and MPMD applications. Chapter 3 31 Understanding HP MPI Running applications Running SPMD applications: A single-program multiple-data (SPMD) application consists of a single program that is executed by each process in the application. Each process normally acts upon different data. Even though this style simplifies the execution of an application, using SPMD can also make the executable larger and more complicated. Each process calls MPI_Comm_rank to distinguish itself from all other processes in the application; it then determines what processing to do. To run an SPMD application, use the mpirun command like this:
    mpirun -np # program
where # is the number of processes and program is the name of your application. Suppose you want to build a C application called poisson and run it using five processes to do the computation. To do this, use the following command sequence:
    mpicc -o poisson poisson.c
    mpirun -np 5 poisson
Running MPMD applications: A multiple-program multiple-data (MPMD) application uses two or more separate programs to functionally decompose a problem. This style can be used to simplify the application source and reduce the size of spawned
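As an illustration of the MPMD style (the program names and process counts are hypothetical), an appfile can start different executables that cooperate in a single MPI_COMM_WORLD, for example one master and several workers; see "Creating an appfile" on page 55 for the full syntax:

    -np 1 master
    -np 4 worker

Run it with: mpirun -f my_appfile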
127. iptors that application processes can have open at onetime When running a multihost application each local process opens a socket to each remote process An HP MPI application with a large amount of off host processes can quickly reach the file descriptor limit Ask your system administrator to increase the limit if your applications frequently exceed the maxi mum External input and output You can use stdin stdout and stderr in your applications to read and write data All standard input is routed through the mpi run process Standard input to mpirun is selectively ignored default behavior replicated to all of the MPI processes or directed to a single process Input intended for any of the processes in an MPI application should therefore be directed to the standard input of mpi run Since mpirun reads stdin on behalf of the processes running an MPI application in the background will result in the application being suspended by most shells For this reason the default mode is to ignore stdin If your application uses stdin usethe following options for making standard input available to processes Similarly the stdout and stderr of MPI processes are combined to become the stdout and stderr of the mpi run process used to start the MPI application How the streams are combined and displayed is determined by the MPI standard IO settings Applications that read from stdin must use stdio i or stdio i n HP MPI standard O options ca
...either HP-UX release, and hello_world.c is built on HP-UX so that the same binary file can run on both hosts. To build and run hello_world.c on two hosts, use the following procedure, replacing jawbone and wizard with the names of your machines.

Step 1  Edit the .rhosts file on jawbone and wizard.

Add an entry for wizard in the .rhosts file on jawbone and an entry for jawbone in the .rhosts file on wizard. In addition to the entries in the .rhosts file, ensure that your remote machine permissions are set up so that you can use the remsh command to that machine. Refer to the HP-UX remsh(1) man page for details. You can use the MPI_REMSH environment variable to specify a command other than remsh to start your remote processes. Refer to "MPI_REMSH" on page 45.

Step 2  Ensure that the correct commands and permissions are set up on all hosts.

Step 3  Change to a writable directory.

Step 4  Compile the hello_world executable:

mpicc -o hello_world /opt/mpi/help/hello_world.c

Step 5  Copy the hello_world executable file from jawbone to a directory on wizard that is in your command path (PATH).

Step 6  Create an appfile.

An appfile is a text file that contains process counts and a list of programs. In this example, create an appfile named my_appfile containing the following two lines:

-np 2 hello_world
-h wizard -np 2 hello_world

The appfile should contain a separate line for each hos...
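As a preview of the remaining steps (the run command is shown only as a sketch here; the -f appfile form of mpirun is described later in this guide), the job built from this appfile is started with:

    mpirun -f my_appfile

which, given the entries above, starts two hello_world processes on the local host and two on wizard.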
...ity.

-i spec      Enables runtime instrumentation profiling for all processes. spec specifies options used when profiling. The options are the same as those for the environment variable MPI_INSTR. For example, the following is a valid command line:

             mpirun -i mytrace:nd:nc -f appfile

             Refer to "MPI_INSTR" on page 41 for an explanation of -i options.

-j           Prints the HP MPI job ID.

-l user      Specifies the username on the target host (the default is the local username).

-np #        Specifies the number of processes to run.

-p           Turns on pretend mode. That is, the system goes through the motions of starting an HP MPI application but does not create processes. This is useful for debugging and for checking whether the appfile is set up correctly.

-sp paths    Sets the target shell PATH environment variable to paths. Search paths are separated by a colon.

-t spec      Enables runtime trace generation for all processes. spec specifies options used when tracing. The options are the same as those for the environment variable MPI_XMPI. For example, the following is a valid command line:

             mpirun -t mytrace:off:nc -f appfile

             Refer to "MPI_XMPI" on page 47 for an explanation of -t options.

-tv          Specifies that the application runs with the TotalView debugger. This option is not supported when you run mpirun under LSF.

-v           Turns on verbose mode.

-version     Prints the version information.

args         Specifies command-line arguments to the program. A s...
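Putting a few of these options together (the appfile name below is hypothetical), you can dry-run an appfile before committing to a real launch, then start it verbosely with the job ID printed:

    # Pretend mode: check that the appfile is set up correctly
    # without creating any processes.
    mpirun -p -f my_appfile

    # Real run, verbose, printing the HP MPI job ID.
    mpirun -v -j -f my_appfile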
130. ize 1 rbe 0 comm size 1 cbs 0 comm size 1 cbe 0 comm size 1 rdtype 0 comm size 1 E cdtype 0 comm_size 1 twdtype 0 comm_size 1 do b1k 0 comm_size 1 call blockasgn 1 nrow comm size blk rbs blk rbe blk call mpi type contiguous rbe b1k rbs b1k 1 mpi double precision rdtype blk ierr call mpi type commit rdtype blk ierr call blockasgn 1 ncol comm size blk cbs blk cbe blk call mpi type vector cbe blk cbs b1k 1 1 nrow 2 mpi double precision cdtype blk ierr call mpi type commit cdtype blk ierr enddo 150 Appendix A Qo o oG 0 00000000 00000000000 Example applications multi_par f Compose MPI datatypes for gather scatter Each block of the partitioning is defined as a set of fixed length vectors Each process es partition is defined as a struct of such blocks allocate adtype 0 comm size 1 adisp 0 comm size 1 ablen 0 comm_size 1 call mpi type extent mpi double precision dsize ierr do rank 0 comm size 1 do rb 0 comm size 1 cb mod rb rank comm size call mpi type vector cbe cb cbs cb 1 rbe rb rbs rb 1 x nrow mpi double precision adtype rb ierr call mpi type commit adtype rb ierr adisp rb rbs rb 1 cbs cb 1 nrow dsize ablen rb 1 enddo call mpi type struct comm size ablen adisp adtype twdtype rank ierr call mpi type commit twdtype rank ierr do rb 0 comm size 1 call mpi type free adtype rb ierr enddo enddo deallocate a
131. l or multithreaded See MPI 1 2 Section 2 6 MPI does not provide mechanisms to specify the initial allocation of processes to an MPI computation and their initial binding to physical processes See MPI 1 2 Section 2 6 Fortran is layered on top of C and profile entry points are given for both languages MPI processes are UNIX processes and can be multithreaded HP MPI provides the mpirun np utility and appfiles Refer tothe relevant sections in this guide MPI does not mandate that any I O service be provided but does suggest behavior to ensure portability if it is provided See MPI 1 2 Section 2 8 Each process in HP MPI applications can read and write data to an external drive Refer to External input and output on page 126 for details Appendix D 181 Standard flexibility in HP MPI Reference in MPI standard Continued HP MPI s implementation Continued The value returned for P1 HOST gets the rank of the host process in the group associated with MPI COMM WORLD MPI PROC NULL is returned if thereis no host MPI does not specify what it means for a process to be a host nor does it specify that a HOST exists MPI provides MPI GET PROCESSOR NAME to return the name of the processor on which it was called at the moment of the call See MPI 1 2 Section 7 1 1 The current MPI definition does not require messages to carry data type information Type information might be
132. letion The communication operation between the two processes may also overlap with computation bandwidth Reciprocal of the time needed to transfer a byte Bandwidth is normally expressed in megabytes per second barrier Collective operation used to synchronize the execution of processes MPI Barrier blocks the calling process until all receiving processes have called it This is a useful approach for separating two stages of a computation so messages from each stage are not overlapped blocking receive Communication in which the receiving process does not return until its data buffer contains the data transferred by the sending process blocking send Communication in which the sending process does not return until its associated data buffer is availablefor reuse The data transferred can be copied directly into the matching receive buffer or a temporary system buffer broadcast Oneto many collective operation where the root process sends a message to all other processes in the communicator including itself buffered send mode Form of blocking send where the sending process returns when the message is buffered in application supplied space or when the message is received buffering Amount or act of copyingthat a system uses to avoid deadlocks A large amount of buffering can adversely affect performance and make MPI applications less portable and predictable cluster Group of computers linked together with an
...lid, the key is ignored.

The example C code io.c on page 156 demonstrates the use of MPI 2.0 standard parallel I/O functions. The io.c program has functions to manipulate files, access data, and change the process's view of data in the file.

Language interoperability

Language interoperability allows you to write mixed-language applications, or applications that call library routines written in another language. For example, you can write applications in Fortran or C that call MPI library routines written in C or Fortran, respectively.

MPI provides a special set of conversion routines for converting objects between languages. You can convert MPI communicators, data types, groups, requests, reduction operations, and status objects. Conversion routines are described in Table 14.

Table 14   Language interoperability conversion routines

Routine                                    Description
MPI_Fint MPI_Comm_c2f(MPI_Comm)            Converts a C communicator handle into a Fortran handle
MPI_Comm MPI_Comm_f2c(MPI_Fint)            Converts a Fortran communicator handle into a C handle
MPI_Fint MPI_Type_c2f(MPI_Datatype)        Converts a C data type into a Fortran data type
MPI_Datatype MPI_Type_f2c(MPI_Fint)        Converts a Fortran data type into a C data type
MPI_Fint MPI_Group_c2f(MPI_Group)          Converts a C group into a Fortran group
MPI_Group MPI_Group_f2c(MPI_Fint)          ...
MPI_Fint MPI_Op_c2f(MPI_Op)                ...
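As a sketch of how these conversion routines are typically used (the Fortran routine name fort_worker and its underscore name mangling are assumptions for illustration only), a C program can hand its communicator to Fortran code by converting the handle first:

    #include <mpi.h>

    /* A Fortran subroutine that expects an INTEGER communicator handle.
       The name and its mangling are hypothetical. */
    extern void fort_worker_(MPI_Fint *comm);

    void call_fortran_worker(MPI_Comm comm)
    {
        /* Convert the C handle to its Fortran representation. */
        MPI_Fint fcomm = MPI_Comm_c2f(comm);
        fort_worker_(&fcomm);
    }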
134. lity commands 0000 cee 49 MYO A UT ia eck vk ites ime A ws Pe al oe eee Seve teen ede ae ae ite 49 Shared library support o oocccccocococ ee 54 Appfiles uico eek Rcx RE ee ea RUE UR Pe 55 Multipurpose daemon process 00 c cece eee eee 58 Generating multihost instrumentation profiles 59 A 59 A RR MI 61 MMP de vasis we A ea AAs pra xq Vete DM Rp teca esos De feque 61 npivieWzls sb eme See CERCLE EE RS ex ACE 62 Communicating using daemons lsslesleses llle 62 IMPI iesu mre x ume emp EX eG n ERN x DE RUE 64 Assigning hosts using LSF oococccocococo esee 64 Native Language Support 0 0 0 cece ee 65 4 Profiling x34 A NACER URS REM RN EUR EN S EUR 67 Using counter instrumentation lllslleeeees esee 68 Creating an instrumentation profile oooo cocoooo 68 MPIHP Trace on and MPIHP Trace Off o oo ooooo 69 Viewing ASCII instrumentation data oocococcocooo 69 Viewing instrumentation data with MpivieW 73 Loading an mpiview file 0 000 c eee ees 73 Selecting a graph type 2 0 ee ee 73 Viewing multiple graphs 0 0 e ee 76 Analyzing graphs 0 000 cect ee 76 USINGXMP o RE ORT 78 Working with postmortem mode n s assas 79 vi Table of Contents Creating a tracefile o oooooooooocoornnonnrr aa 79 Viewing atracefile 0 cece eee 80 Working with i
135. live Process 5 of 10 is alive Process 6 of 10 is alive Process 2 of 10 is alive Process 4 of 10 is alive Process 8 of 10 is alive pi is approximately 3 1416009869231250 Error is 0000083333333318 Appendix A 139 Example applications master_worker f90 master_worker f90 In this Fortran 90 example a master task nitiates numtasks 1 number of worker tasks The master distributes an equal portion of an array to each worker task Each worker task receives its portion of the array and sets the value of each element to the element s index 1 Each worker task then sends its portion of the modified array back tothe master program array manipulation include mpif h integer kind 4 status MPI STATUS SIZE integer kind 4 parameter ARRAYSIZE 10000 MASTER 0 integer kind 4 numtasks numworkers taskid dest index i integer kind 4 arraymsg indexmsg source chunksize int4 real4 real kind 4 data ARRAYSIZE result ARRAYSIZE integer kind 4 numfail ierr call MPI Init ierr call MPI Comm rank MPI COMM WORLD taskid ierr call MPI Comm size MPI COMM WORLD numtasks ierr numworkers numtasks 1 chunksize ARRAYSIZE numworkers arraymsg 1 indexmsg 2 int4 4 real4 4 numfail 0 KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK Master task ck Ck ck Ck CK Ck Ck Ck Ck Ck Ck Ck kk Ck Sk KKK KKK KKK if taskid eq MASTER then data 0 0 index 1 do dest 1 numworkers call M
136. ll PATH environment variable You can set this option in your appfile e The cshrc file does not contain tty commands such as stt y if you are using a bin csh based shell 122 Chapter 6 Debugging and troubleshooting Troubleshooting HP MPI applications Running Run time problems originate from many sources and may include Shared memory Message buffering Propagation of environment variables e Interoperability Fortran 90 programming features UNIX open file descriptors External input and output Shared memory When an MPI application starts each MPI process attempts to allocate a section of shared memory This allocation can fail if the system i mposed limit on the maximum number of allowed shared memory identifiers is exceeded or if the amount of available physical memory is not sufficient to fill the request After shared memory allocation is done every MPI process attempts to attach to the shared memory region of every other process residing on the same host This attachment can fail if the number of shared memory segments attached to the calling process exceeds the system i mposed limit In this case usethe MPI GLOBMEMSIZE environment variable to reset your shared memory allocation Furthermore all processes must be able to attach to a shared memory region at the same virtual address For example if the first process to attach tothe segment attaches at address ADR then the virtual memory region
137. ls about mpirun Refer tothe MPI FLAGS on page 37 and the TotalView documentation for details about MPr FLAGS and TotalView command line options respectively By default mpirun searches for TotalView in your PATH settings You can also define the absolute path to TotalView using the TOTALVIEW environment variable setenv TOTALVIEW opt totalview bin total view totalview opti ons The TOTALVIEW environment variable is used by mpirun 116 Chapter 6 NOTE Debugging and troubleshooting Debugging HP MPI applications When attaching to a running MPI application you should attach to the MPI daemon process to enable debugging of all the MPI ranks in the application You can identify the daemon process as the one at the top of a hierarchy of MPI jobs the daemon also usually has the lowest PID among the MPI jobs Limitations The following limitations apply to using TotalView with HP MPI applications 1 All the executable files in your multihost MPI application must reside on your local machine that is the machine on which you start TotalView Refer to TotalView multihost example on page 117 for details about requirements for directory structure and file locations 2 TotalView sometimes displays extra HP UX threads that have no useful debugging information These are kernel threads that are created to deal with page and protection faults associated with one copy operations that HP MPI uses to improve perfor
...m of data transfer in a message-passing model and is described in Chapter 3, "Point-to-Point Communication", in the MPI 1.0 standard.

The performance of point-to-point communication is measured in terms of total transfer time. The total transfer time is defined as

total transfer time = latency + (message size / bandwidth)

where

latency         Specifies the time between the initiation of the data transfer in the sending process and the arrival of the first byte in the receiving process.

message size    Specifies the size of the message in Mbytes.

bandwidth       Denotes the reciprocal of the time needed to transfer a byte. Bandwidth is normally expressed in Mbytes per second.

Low latencies and high bandwidths lead to better performance.

Communicators

A communicator is an object that represents a group of processes and their communication medium or context. These processes exchange messages to transfer data. Communicators encapsulate a group of processes such that communication is restricted to processes within that group.

The default communicators provided by MPI are MPI_COMM_WORLD and MPI_COMM_SELF. MPI_COMM_WORLD contains all processes that are running when an application begins execution. Each process is the single member of its own MPI_COMM_SELF communicator. Communicators that allow processes within a group to exchange data are termed intracommunicators. Communicators that allow processes in two different groups to exchange data are called...
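As a worked example of the formula above (the numbers are illustrative only, not measured HP MPI figures): with a latency of 50 microseconds and a bandwidth of 100 Mbytes/s, a 1-Mbyte message takes roughly

total transfer time = 50 us + (1 Mbyte / 100 Mbytes/s)
                    = 50 us + 10,000 us
                    = approximately 10.05 ms

so large-message transfers are dominated by bandwidth, while very small messages are dominated by latency.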
139. mance You can ignore these kernel threads during your debugging session TotalView multihost example The following example demonstrates how to debug a typical HP MPI multihost application using Total View induding requirements for directory structure and file locations The MPI application is represented by an appfile named my_appfile which contains the following two lines h local host np 2 path to programl h remote host np 2 path to program2 my appfile resides on the local machine local host in the work mpiapps total directory To debug this application using TotalView in this example TotalView is invoked from the local machine 1 Place your binary files in accessible locations path to program1 exists on local host e path to program2 exists on remote host Chapter 6 117 NOTE Debugging and troubleshooting Debugging HP MPI applications To run the application under TotalView the directory layout on your local machine with regard tothe MPI executable files must mirror the directory layout on each remote machine Therefore in this case your setup must meet the following additional requirement e path to program2 exists on local host 2 In the work mpiapps total directory on local host invoke TotalView by passing the tv option to mpirun 9 mpirun tv f my appfile Using the diagnostics library HP MPI provides a diagnostics library DLIB for advanced run time error checking and a
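When you are ready to try the diagnostics library on your own code, the link step is the usual compile command plus the DLIB option described in this section; as a sketch (the program name is hypothetical):

    # Compile and link against the diagnostics library (DLIB)
    mpicc -o myprog myprog.c -ldmpi

As noted elsewhere in this chapter, DLIB reduces application performance, is not thread compliant, and cannot be combined with instrumentation or XMPI tracing.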
140. message type and not for each individual message For example if a communication involves a hundred messages all having the same envelope you can work with a single Focus dialog not with one hundred copies Select the Application menu then Quit to close XMPI Viewing Kiviat information When you play the trace file the state of the processes is reflected in the main window and the Kiviat graph Use the following instructions to view performance information in a Kiviat graph Start XMPI and open a trace for viewing as described in Creating a trace file on page 79 Select Kiviat from the Trace menu XMPI opens a window containing a Kiviat graph as shown in Figure 12 XMPI Kiviat XMPI Kiviat Red Process blocked Yellow MPI overhead Green Process running outside MPI The XMPI Kiviat shows in segmented pie chart format the cumulative time up to the current dial time spent by each process in running overhead and blocked states represented by green yellow and red respectively The process numbers are indicated on the graph Asthetrace file plays and processes communicate the Kiviat changes to reflect the time spent running blocked or in MPI overhead Chapter 4 89 Step 3 Step 1 Step 2 Step 3 Profiling Using XMPI Usethe XMPI Kiviat to determine whether processes are load balanced and applications are synchronized If an application is load balanced the amount of time processes spend in each st
...mized to use shared memory where possible for performance.

Use commutative MPI reduction operations.

• Use the MPI predefined reduction operations whenever possible, because they are optimized.
• When defining your own reduction operations, make them commutative. Commutative operations give MPI more options when ordering operations, allowing it to select an order that leads to best performance.

Use MPI derived datatypes when you exchange several small-size messages that have no dependencies.

Minimize your use of MPI_Test() polling schemes to minimize polling overhead.

Code your applications to avoid unnecessary synchronization. In particular, strive to avoid MPI_Barrier calls. Typically an application can be modified to achieve the same end result using targeted synchronization instead of collective calls. For example, in many cases a token-passing ring may be used to achieve the same coordination as a loop of barrier calls.

Debugging and troubleshooting

This chapter describes debugging and troubleshooting HP MPI applications. The topics covered are:

• Debugging HP MPI applications
  - Using a single-process debugger
  - Using a multi-process debugger
  - Using the diagnostics library
  - Enhanced debugging output
  - Backtrace functionality
• Troubleshooting HP MPI applications
  - Building
  - Starting
  - Running
  - Completing
  - Frequently asked questions
142. moococormiu Oorooot oot OOOO for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for total service requests total service requests pepe A aa SA ASA OO A SA OIGA OS O OA E O O O O OA OO A Example applications thread safe c 161 Example applications thread safe c 162 Appendix A XMPl resource file This appendix displays the contents of the XMPI Xresource file stored in opt mpi lib X11 app defaults XM PI You should make your own copy of the resource file when you wish to customize the contents Set your Xresource environment in one of the following ways By default the XMPI utility uses the XMPI Xresource file in Jopt mpi lib X 11 app defaults XM PI f you move your HP MPI product from it s default opt mpi install location set the MPI ROOT environment variable to point to the new location Also set the X application resource environment variable to point to your XMPI resource file To set the X application resource environment variable enter setenv XAPPLRESDIR MPI ROOT lib X11 app defaults XMPI You can copy the XMPI resource file to another location and customize it Set the XAPPLRESDIR environment variable to point to the new XMPI file For example if you copy the XMPI file to your home directory type the following command
143. my HP MPI includes ROMIO a portable implementation of MPI 1 O developed at the Argonne National Laboratory xvii xviii Introduction This chapter provides a brief introduction about basic Message Passing Interface MPI concepts and the HP implementation of MPI This chapter contains the syntax for some MPI functions Refer to MPI A MessagePassing Interface Standard for syntax and usage details for all MPI standard functions Also refer to MPI A Message Passing Interface Standard and to MPI The Complete Reference for in depth discussions of MPI concepts The introductory topics covered in this chapter include The message passing model e MPI concepts Point to point communication Collective operations MPI datatypes and packing Multilevel parallelism Advanced topics Chapter 1 Introduction The message passing model The message passing model Programming models are generally categorized by how memory is used In the shared memory model each process accesses a shared address space while in the message passing model an application runs as a collection of autonomous processes each with its own local memory In the message passing model processes communicate with other processes by sending and receiving messages When data is passed in a message the sending and receiving processes must work to transfer the data from the local memory of oneto the local memory of the other Message passing is used widely o
...n build with HP MPI and then turn tracing on, the application takes a long time inside MPI_Finalize. What is causing this?

ANSWER: When you turn tracing on, MPI_Finalize spends time consolidating the raw trace generated by each process into a single output file with a .tr extension.

MPI clean up

QUESTION: How does HP MPI clean up when something goes wrong?

ANSWER: HP MPI uses several mechanisms to clean up job files. Note that all processes in your application must call MPI_Finalize.

• When a correct HP MPI program (that is, one that calls MPI_Finalize) exits successfully, the root host deletes the job file.
• If you use mpirun, it deletes the job file when the application terminates, whether successfully or not.
• When an application calls MPI_Abort, MPI_Abort deletes the job file.
• If you use mpijob -j to get more information on a job, and the processes of that job have all exited, mpijob issues a warning that the job has completed and deletes the job file.
...n HP MPI 1.7's standard I/O processing are useful in providing more readable output.

The default behavior is to print a stack trace. Backtracing can be turned off entirely by setting the environment variable MPI_NOBACKTRACE. See "MPI_NOBACKTRACE" on page 45. Backtracing is only supported on HP PA-RISC systems.

Troubleshooting HP MPI applications

This section describes limitations in HP MPI, some common difficulties you may face, and hints to help you overcome those difficulties and get the best performance from your HP MPI applications. Check this information first when you troubleshoot problems. The topics covered are organized by development task and also include answers to frequently asked questions:

• Building
• Starting
• Running
• Completing
• Frequently asked questions

To get information about the version of HP MPI installed on your system, use the what command. The following is an example of the command and its output:

what /opt/mpi/bin/mpicc
/opt/mpi/bin/mpicc:
    HP MPI 01.07.00.00 (dd/mm/yyyy) B6060BA - HP-UX 11.0

This command returns the HP MPI version number, the date this version was released, HP MPI product numbers, and the operating system version.
...n be set by using the following options to mpirun:

mpirun -stdio=[bline[#] | bnone[#] | b[#]][,p][,r[#]][,i[#]]

where

i            Broadcasts standard input to all MPI processes.

i#           Directs standard input to the process with global rank #.

The following modes are available for buffering:

b[#]         Specifies that the output of a single MPI process is placed to the standard out of mpirun after # bytes of output have been accumulated.

bnone[#]     The same as b[#] except that the buffer is flushed both when it is full and when it is found to contain any data. Essentially provides no buffering from the user's perspective.

bline[#]     Displays the output of a process after a line feed is encountered, or the # byte buffer is full.

The default value of # in all cases is 10k bytes.

The following option is available for prepending:

p            Enables prepending. The global rank of the originating process is prepended to stdout and stderr output. Although this mode can be combined with any buffering mode, prepending makes the most sense with the modes b and bline.

The following option is available for combining repeated output:

r[#]         Combines repeated identical output from the same process by prepending a multiplier to the beginning of the output. At most # (the maximum) repeated outputs are accumulated without display. This option is used only with bline. Th...
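For example (a sketch only, since the option spelling above is reconstructed from the option descriptions, and the appfile and input file names are hypothetical), an appfile job that reads its input on rank 0 and wants line-buffered, rank-prefixed output might be started like this:

    mpirun -stdio=bline,p,i0 -f my_appfile < input.dat

Here bline buffers each process's output by line, p prepends the originating rank to every output line, and i0 directs mpirun's standard input to the process with global rank 0.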
...on parallel computers with distributed memory and on clusters of servers.

The advantages of using message passing include:

• Portability. Message passing is implemented on most parallel platforms.
• Universality. The model makes minimal assumptions about underlying parallel hardware. Message-passing libraries exist on computers linked by networks and on shared- and distributed-memory multiprocessors.
• Simplicity. The model supports explicit control of memory references for easier debugging.

However, creating message-passing applications may require more effort than letting a parallelizing compiler produce parallel applications.

In 1994, representatives from the computer industry, government labs, and academe developed a standard specification for interfaces to a library of message-passing routines. This standard is known as MPI 1.0 (MPI: A Message-Passing Interface Standard). Since this initial standard, versions 1.1 (June 1995), 1.2 (July 1997), and 2.0 (July 1997) have been produced. Versions 1.1 and 1.2 correct errors and minor omissions of MPI 1.0. MPI 2.0 (MPI-2: Extensions to the Message-Passing Interface) adds new functionality to MPI 1.2. You can find both standards in HTML format at http://www.mpi-forum.org.

MPI-1 compliance means compliance with MPI 1.2. MPI-2 compliance means compliance with MPI 2.0. Forward compatibility is preserved in the standard. That is, a valid MPI 1.0 program is a valid MPI 1.2 program and a valid MPI 2...
148. nalysis DLIB provides the following checks e Message signature analysis Detects type mismatches in MPI calls For example in the two calls below the send operation sends an integer but the matching receive operation receives a floating point number if rank 1 then MPI Send amp bufl 1 MPI INT 2 17 MPI COMM WORLD else if rank 2 MPI Recv amp buf2 1 MPI FLOAT 1 17 MPI COMM WORLD amp status MPI object space corruption Detects attempts to write into objects such as MPI Comm MPI Datatype MPI Request MPI Group and MPI Errhandler Multiple buffer writes Detects whether the data type specified in a receive or gather operation causes MPI to write to a user buffer more than once To disable these checks or enable formatted or unformatted printing of message data to a file set thewPI DLIB FLAGS environment variable options appropriately See MPI DLIB FLAGS on page 35 for more information To use the diagnostics library specify the 1dmpi option when you compile your application Using DLIB reduces application performance DLIB is not thread compliant Also you cannot use DLIB with instrumentation or XMPI tracing 118 Chapter 6 Debugging and troubleshooting Debugging HP MPI applications Enhanced debugging output HP MPI 1 7 provides improved readability and usefulness of MPI processes stdout and stderr More intuitive options have been added for handling standard input Directed Inpu
149. nd Scatter Root sends data to all processes induding itself Gather Root receives data from all processes induding itself Allgather and Alltoall E ach process communicates with each process including itself 10 Chapter 1 Introduction MPI concepts The syntax of the MPI collective functions is designed to be consistent with point to poi nt communications but collective functions are more restrictive than point to point functions Some of the important restrictions to keep in mind are The amount of data sent must exactly match the amount of data specified by the receiver Collective functions come in blocking versions only Collective functions do not use a tag argument meaning that collective calls are matched strictly according to the order of execution e Collective functions come in standard mode only For detailed discussions of collective communications refer to Chapter 4 Collective Communication in the MPI 1 0 standard Thefollowing examples demonstrate the syntax to code two collective operations a broadcast and a scatter To code a broadcast use MPI Bcast void buf int count MPI Datatype dtype int root MPI Comm comm where buf Specifies the starting address of the buffer count Indicates the number of buffer entries dtype Denotes the datatype of the buffer entries root Specifies the rank of the root comm Designates the communication context that identifies a group of processes
150. nding and receiving messages Applications based on message passing are nondeter ministic by default However when one process sends two or more messages to another the transfer is deterministic as the messages are always received in the order sent MIMD Multiple instruction multiple data Category of applications in which many instruction streams are applied concurrently to multiple data sets MPI Message passing interface Set of library routines used to design scalable parallel applications These routines provide a wide range of operations that include computation communication and synchronization MPI 1 2 is the current standard supported by major vendors 187 MPIVIEW AnHP MPI utility that is a graphical user interfaceto display instrumentation data collected at run time MPMD Multiple data multiple program Implementations of HP MPI that usetwo or more Separate executables to construct an application This design style can be used to simplify the application source and reduce the size of spawned processes Each process may run a different executable multilevel parallelism Refers to multithreaded processes that call MPI routines to perform computations This approach is beneficial for problems that can be decomposed into logical parts for parallel execution for example a looping construct that spawns multiple threads to perform a computation and then joins after the computation is complete multihos
151. nteractive mode 0 c eects 90 Running an appfile 0 0 0 2 cece 90 Changing default settings and viewing options 95 Using CAHIT xxr A E Qr ee Sie T regen ine 100 Using the profiling interface 1 0 0 eee 101 Fortran profiling interface 0 0 0 eee 102 5 TUNING ir ERE 103 MPI FEAGS Options stares tetanic aa a Ge aera eR RE 104 Message latency and bandwidth saaana cece ee nee ne eas 105 Multiple network interfaces 1 0 0 0 cee 107 Processor SUDSCTIPtiON o ooocoocococoocn teens 109 MPI routine selection 0 0 cts 110 Multilevel parallelism 00 0000 cece cee ee 110 Coding Considerati0NS oococccococnnc 111 6 Debugging and troubleshootiNg lt lt lt lt ooooocooococoncn nn nn nnn 113 Debugging HP MPI applications ocococccccococooo oo 114 Using a single process debugger lille eee eens 114 Using a multi process debugger oococccccccc eee 116 Limitations clue a RT RETI xc RR UE LE 117 TotalView multihost example 0 0000 eee eee ee 117 Using the diagnostics library 00000 cece eee 118 Enhanced debugging output 0 0 0 e ee eee 119 Backtrace functionality sasaaa aae 119 Troubleshooting HP MPI applicati0NS oooococococoo 121 Building 020 ES a ea eee eae e M Rs 122 Stat a gone 122 RUANO lt a e BE RIVE te 123 Shared memory ooo 123 Message bufferinQ oooococcccococr tee 124
152. ntries in an appfile is line oriented Lines that end with the backslash V character are continued on the next line forming a single logical line A logical line starting with the pound character is treated as a comment Each program along with its arguments is listed on a separate logical line The general form of an appfile entry is h remote host e var val 1 use sp paths np program args where h remote host Specifies the remote host where a remote executable fileis stored The default is to search the local host remote host is either a host name or an IP address e var val Sets the environment variable var for the program and gives it the value val The default is not to set environment variables When you use e with the n option the environment variable is set to val on the remote host 1 user Specifies the user name on the target host The default is the current user name sp paths Sets thetarget shell PATH environment variable to paths Search paths are separated by a colon np Specifies the number of processes to run The default value for is 1 program Specifies the name of the executable to run mpirun searches for the executable in the paths defined in the PATH environment variable Chapter 3 55 CAUTION Understanding HP MPI Running applications args Specifies command line arguments to the program Options following a program name in your appfile are treated as pro
153. o monitor and analyze your application when running interactive mode XMPI provides the following functionality Snapshot utility The snapshot utility helps you debug applications that hang If automatic snapshot is enabled XMPI takes periodic snapshots of the application and displays state information for each process in the XMPI main window the XMPI Focus dialog and the XMPI Datatype dialog You can usethis information to view the state of each process when an application hangs Refer to Changing default settings and viewing options on page 95 for information to enable automatic snapshot Refer to Figure 10 on page 86 and Figure 11 on page 87 for details about the XMPI Focus and Datatype dialogs If automatic snapshot is disabled XMPI displays information for each process when the application begins but does not update the information as the application runs You can take application snapshots manually by selecting the Application pulldown menu then Snapshot XMPI displays information for each process but this information is not updated until you take the next snapshot You can only take snapshots when an appfile is running and you cannot replay snapshots like trace files Dump utility The Dump utility consolidates all trace file data collected up tothe current time in the application s life span into a single output file prefix tr Define prefix in the XMPI Dump dialog as the name you want to give your tr file Refer to F
154. ocess within the current communicator Names the communicator used by the HP MPI function When you select the icon tothe right of the comm field the hexagons for processes that belong to the communicator are highlighted in the XMPI main window Displays the value of the tag argument associated with the message Shows the count of the message data elements associated with the message when it was sent When you select the icon tothe right of the cnt field XMPI opens the XMPI Datatype dialog as shown in Figure 11 XMPI Datatype dialog Chapter 4 XMPI Datatype The XMPI Datatype dialog displays the type map of the datatype associated with the message when it was sent The datatype can be one of the predefined datatypes or a user defined datatype The datatype information changes as the trace file plays and processes communicate with each other 87 Profiling Using XMPI The message queue area describes the current state of the queue of messages sent to the process but not yet received The fields include src comm tag cnt copy 88 Displays the rank of the process sending the message A process is identified in the format rank x rank y where rank xindicates the rank of the process in MPI COMM WORLD and rank y indicates the rank of the process within the current communicator Names the communicator used by the HP MPI function When you select the icon tothe right of the comm field the hexagons for pro
155. ompliant library supports calls to the following MPI 2 0 standard functions e MPI_Init_thread e MPI Is thread main MPI Query thread No other MPI 2 0 calls are supported in the thread compliant library Appendix C 173 MPI 2 0 features supported MPI_Init NULL arguments MPI Init NULL arguments In MPI 1 1 itis explicitly stated that an implementation is allowed to require that the arguments argc and argv passed by an application to MPI INIT in C be the same arguments passed into the application as the arguments to main In MPI 2 implementations are not allowed to impose this requirement HP MPI complies with this MPI 2 standard extension by allowing applications to pass NULL for both the argc and argv arguments of main However MPI Init NULL NULL is supported onl y when you use mpirun to run your MPI application For example use one of the following mpirun np4 my program mpirun f my appfile Refer to Compiling and running your first application on page 19 and mpirun on page 49 for details about the methods to run your HP MPI application 174 Appendix C MPI 2 0 features supported One sided communication One sided communication Message passing communication involves transferring data from the sending process to the receiving process and synchronization of the sender and receiver Remote memory access and one sided communication extend the communication mechanism of MPI by separating the communication
156. on and MPIHP Trace off pair around code that you want to profile 2 Build the application and invoke mpirun with the t off option t off specifies that tracing is enabled but initially turned off refer to mpirun on page 49 and MPI XMPI on page 47 Data collection begins after all processes collectively call MPIHP Trace on XMPI collects trace information only for code between MPIHP Trace on and MPIHP Trace off 3 Run thetracefilein XMPI to identify problems during application execution MPIHP Trace on and MPIHP Trace off are collective routines and must be called by all ranks in your application Otherwise the application deadlocks Chapter 4 79 Profiling Using XMPI Viewing a trace file Usethe following instructions to view a trace file Step 1 Enter xmpi at your UNIX prompt to open the XMPI main window Refer to xmpi on page 61 for information about options you can specify with xmpi Figure 6 shows the XMPI main window Figure 6 XMPI main window Application Trace Options S a BE ca e App lt None gt 80 Chapter 4 Profiling Using XMPI Step 2 Select the Trace pull down menu on the main window then View XMPI invokes the XMPI Trace Selection dialog in which you can find and select your trace file Figure 7 shows the Trace Selection dialog Figure 7 XMPI Trace Selection XMPI Trace Selection _ ra Directories Files 1 i 7 m isw fin idev fepm1 fepmscratc
157. on for the compiler you are using If you use your own build script specify all necessary input libraries To determine what libraries are needed check the contents of the compilation utilities stored in the HP MPI opt mpi bin subdirectory HP MPI supports a 64 bit version of the MPI library on platforms running HP UX 11 0 Both 32 and 64 bit versions of the library are shipped with HP UX 11 0 For HP UX 11 0 you cannot mix 32 bit and 64 bit executables in the same application HP MPI does not support Fortran applications that are compiled with the following options e autodblpad Fortran 77 programs e autodbi Fortran 90 programs e autodb14 Fortran 90 programs Starting Starting a MPI executable without the mpi run utility is no longer supported For example applications previously started by using a out np args must now be started using npirun np a out args When starting multihost applications make sure that All remote hosts are listed in your rhosts file on each machine and you can remsh tothe remote machines The mpirun command has the ck option you can use to determine whether the hosts and programs specified in your MPI application are available and whether there are access or permission problems Refer to mpirun on page 49 Application binaries are available on the necessary remote hosts and are executable on those machines The sp option is passed to mpi run to set the target she
158. on of opt mpi 25 over subscribed See subscription types overhead process 83 lan P packing and unpacking 14 parent process 10 partial trace 93 peer See rank performance collective routines 111 communication hot spots 57 derived data types 111 disable argument check ing 40 latency bandwidth 104 201 105 polling schemes 111 synchronization 111 permissions See rhosts file ping_pong c 131 play trace file 84 trace log 83 PMPI prefix 101 point to point communica tions blocking 83 nonblocking 83 overview 5 See also nonblocking communication See also blocking com munication portability 3 postmortem mode 79 prefix for output file 68 MPI 101 PMPI 101 Prefix field 92 98 print HP MPI job ID 97 problems autodb1 29 autodb14 29 autodblpad 29 202 application hangs at MPI Send 130 build 122 exceeding file descriptor limit 126 external input and out put 126 Fortran 90 behavior 125 interoperability 125 message buffering 124 performance 104 105 111 propagation of environ ment variables 124 runtime 123 126 shared memory 123 UNIX open file descrip tors 126 process blocked 83 colors 83 86 hexagons 90 multi threaded 16 overhead 83 profile in HP MPI 100 rank 5 83 rank of peer process 87 rank of root 13 rank of source 8 reduce communications 105 running 83 single threaded 16 state 86
159. p Converts a Fortran group into a C group Converts a C reduction operation into a Fortran reduction operati on PI Op MPI Op f2c MPI Fint PI Fint PI Request c2f MPI Request Converts a Fortran reduction operation into a C reduction operation Converts a C request into a Fortran request 168 Appendix C MPI 2 0 features supported Language interoperability Routine Description PI_Request PI Request f2c MPI Fint int MPI Status c2f MPI Status PI Fint int MPI Status f2c MPI Fint PI Status Converts a Fortran request into a C request Converts a C status into a Fortran status Converts a Fortran status into a C status PI file MPI File f2c MPI Fint fil Appendix C PI Fint MPI File c2f MPI File fil Converts a Fortran file handle into a C file handle Converts a C file handle into a Fortran file handle 169 MPI 2 0 features supported Thread compliant library Thread compliant library HP MPI provides a thread compliant library for applications running under HP UX 11 0 32 and 64 bits On HP UX 11 0 HP MPI supports concurrent MPI calls by threads and a blocking MPI call blocks only the invoking thread allowing another thread to be scheduled By default the non thread compliant library libmpi is used when running MPI jobs Linking to the thread compliant library libmtmpi is now required only for applications that have multiple
160. pace separated list of arguments extra args for appfile program Chapter 3 Specifies extra arguments to be applied to the programs listed in the appfile A space separated list of arguments Usethis option at the end of your command line to append extra arguments to each line of your appfile Refer to the example in Adding program arguments to your appfile on page 56 for details Specifies the name of the executable file to run 53 CAUTION Understanding HP MPI Running applications IMPI_options Specifies this mpirun is an IMPI client Refer to IMPI on page 64 for more information on IMPI as well as a complete list of IMPI options Isf options Specifies bsub options that the load sharing facility LSF applies to the entire job that is every host Refer to the bsub 1 man page for a list of options you can use Notethat LSF must be installed for Isf options to work correctly stdio options Specifies standard IO options Refer to External input and output on page 126 for more information on standard IO as well as a complete list of stdio options The help version p and tv options are not supported with the bsub pam mpi mpirun Startup method Shared library support When a library is shared programs using it contain only references to library routines as opposed to archive libraries which must be linked into every program using them The same copy of the shared library is referenced by
161. prefix msgs rank where rank is the rank of a specific process Dumps formatted all sent and received messages to prefix msgs rank where rank is the rank of a specific process Defines a type signature packing size NUM is an unsigned integer that specifies the number of signature leaf elements For programs with diverse derived datatypes the default value may be too small If NUM is too small the diagnostic library issues a warning during the MPI Finalize operation Refer to Using the diagnostics library on page 118 for more information 36 Chapter 3 Understanding HP MPI Running applications MPI FLAGS MPI FLAGS modifies the general behavior of HP MPI TheMPI FLAGS syntax is a comma separated list as follows edde exdb egdb eadb ewdb 1 s alp l I8 1 Ivy E78 1 0 FE2 IC ID E z where edde Starts the application under the dde debugger The debugger must be in the command search path See Debugging HP MPI applications on page 114 for more information exdb Starts the application under the xdb debugger The debugger must be in the command search path See Debugging HP MPI applications on page 114 for more information egdb Starts the application under the gdb debugger The debugger must be in the command search path See Debugging HP MPI applications on page 114 for more information eadb Starts the application under adb the absolute debugger The
162. provided MPI THREAD SINGLE MPI THREAD SINGLE MPI THREAD SINGLE MPI THREAD SINGLE MPI THREAD SINGLE MPI THREAD FUNNELED MPI THREAD SERIALIZED MPI THREAD MULTIPLE Table 17 shows the relationship between the possible thread support levels in MPI Init thread and the corresponding options in MPI MT FLAGS Table 17 Thread support levels MPI Init thread MPI THREAD SINGLE MPI THREAD FUNNELED MPI THREAD SERIALIZED MPI THREAD MULTIPLE MPI MT FLAGS single fun serial mult Behavior Only one thread will execute The process may be multithreaded but only the main thread will make MPI calls The process may be multithreaded and multiple threads can make MPI calls but only one call can be made at a time Multiple threads may call MPI at any time with no restrictions This option is the default Refer to example thread safe c on page 158 for a program that uses multiple threads 172 Appendix C MPI 2 0 features supported Thread compliant library To prevent application deadlock do not call the thread compliant library from a signal handler or cancel a thread that is executing inside an MPI routine Counter instrumentation refer to Using counter instrumentation on page 68 is supported for the thread compliant library in addition to the standard MPI library Therefore you can collect profiling information for applications linked with the thread compliant library The thread c
163. r cache reasons but to satisfy the data dependency each thread computes a different portion of the same column and the threads work left to right across the rows together implicit none include integer integer mpif h nrow of rows ncol of columns parameter nrow 1000 ncol1 1000 double precision array nrow ncol compute region integer blk block iteration counter integer rb row block number integer cb column block number integer nrb next row block number integer ncb next column block number integer rbs row block start subscripts integer rbe row block end subscripts integer cbs column block start subscripts integer cbe column block end subscripts integer rdtype row block communication datatypes integer cdtype column block communication datatypes integer twdtype twisted distribution datatypes integer ablen array of block lengths integer adisp array of displacements integer adtype array of datatypes allocatable rbs rbe cbs cbe rdtype cdtype twdtype ablen adisp adtype integer rank rank iteration counter integer comm size number of MPI processes integer comm rank sequential ID of MPI process integer ierr MPI error code integer mstat mpi status size MPI function status integer src source rank integer dest destination rank integer dsize size of double precision in bytes Appendix A 149 Example applications multi_par f dou
164. r f integer cbe column block end subscript Local variables integer i j The OPENMP directives below allow the compiler to split the values for i between a number of threads while j moves forward lock step between the threads By making j shared and i private all the threads work on the same column j at any given time but they each work on a different portion i of that column This is not as efficient as found in the compcolumn subroutine but is necessary due to data dependencies PARALLEL PRIVATE i do jemax 2 cbs cbe DO do i rbs rbe array i j array i j 1 array i j enddo END DO enddo END PARALLEL end COX RAI RIERA RARA RARA RARA RARA KKK KKK KKK KKK KKK KK KK KK KKK KKK KK KKK KKK KK ck k KK KKK Q subroutine getdata nrow ncol array Enter dummy data integer nrow ncol double precision array nrow ncol do j 1 ncol do i 1 nrow array i j j 1 0 ncolti enddo enddo end Appendix A 155 Example applications io c 10 C In this C example each process writes to a separate file called iodatax where x represents each process rank in turn Then the data in iodatax is read back include lt stdio h gt include lt string h gt include lt stdlib h gt include lt mpi h gt define SIZE 65536 define FILENAME iodata main argc argv int argc char argv int buf i rank nints len flag char filename MPI_File fh MPI_Status status MPI
165. ration To synchronize the execution of processes call MPI Barrier MPI Barrier blocks the calling process until all processes in the communicator have called it This is a useful approach for separating two stages of a computation so messages from each stage do not overlap Chapter 1 13 Introduction MPI concepts To implement a barrier use MPI Barrier MPI Comm comm where comm Identifies a group of processes and a communication context For example cart C on page 142 uses MPI Barrier to synchronize data before printing MPI datatypes and packing You can use predefined datatypes for example MPI INT in C to transfer data between two processes using point to point communication This transfer is based on the assumption that the data transferred is stored in contiguous memory for example sending an array in a C or Fortran application When you want to transfer data that is not homogeneous such as a structure or that is not contiguous in memory such as an array section you can use derived datatypes or packing and unpacking functions Derived datatypes Specifies a sequence of basic datatypes and integer displacements describing the data layout in memory You can use user defined datatypes or predefined datatypes in MPI communication functions Packing and Unpacking functions Provide MPI_Pack and MPI_Unpack functions so that a sending process can pack noncontiguous data intoa contiguous buffer and a receiving pro
166. read compliant An implementation where an MPI process may be multithreaded If it is each thread can issue MPI calls However the threads themselves are not separately addressable trace Information collected during program execution that you can use to analyze your application You can collect trace information and store it in a file for later use or analyze it directly when running your application interactively for example when you run an application in the XMPI utility yield Seespin yield XMPI An X Motif graphical user interface for running applications monitoring processes and messages and viewing trace files 191 192 Index Symbols manl Z 25 creating 55 autodbl 29 opt mpi share man improving communica autodbl4 29 man3 Z 25 tion on multihost autodblpad 29 systems 57 DA2 option 29 Numerics setting remote environ DD64 option 29 64 bit support 29 ment variables mpiview file 68 in 57 tr file 79 tr output file 92 lopt aCC bin aCC 28 lopt ansic bin cc 28 lopt fortran bin f77 28 opt fortran90 bin f90 28 opt mpi subdirectories 25 opt mpi directory organization of 25 lopt mpi bin 25 lopt mpi doc html 25 lopt mpi help 25 opt mpi include 25 opt mpi lib hpux32 25 opt mpi lib hpux64 25 lopt mpi lib pal 1 libfm pi a 25 opt mpi lib pa20 64 libfm pi a 25 opt mpi lib X 1 1 app de faults 25 opt mpi newconfig 25 opt mpi share man A abort HP MPI
167. rectangles Synchronizes for a global summation Process O prints the result of the calculation program main include mpif h double precision PI25DT parameter PI25DT 3 141592653589793238462643d0 double precision mypi pi h sum x f a integer n myid numprocs i ierr E C Function to integrate C f a 4 830 1 d0 a a call MPI INIT ierr call MPI COMM RANK MPI COMM WORLD myid ierr call MPI COMM SIZE MPI COMM WORLD numprocs ierr print Process myid of numprocs is alive sizetype 1 sumtype 2 if myid eq 0 then n 100 endif call MPI BCAST n 1 MPI INTEGER 0 MPI COMM WORLD ierr C C Calculate the interval size C h 1 0240 n sum 0 0d0 do 20 i myid 1 n numprocs x h dble i 0 5d0 sum sum f x 20 continue 138 Appendix A Example applications compute_pi f mypi h sum C C Collect all the partial sums C call MPI REDUCE mypi pi 1 MPI DOUBLE PRECISION MPI SUM 0 MPI COMM WORLD ierr C C Process 0 prints the result C if myid eq 0 then write 6 97 pi abs pi PI25DT 97 format pi is approximately F18 16 to EEEO LST ELS 16 endif call MPI_FINALIZE ierr stop end compute pi output The output from running the compute pi executable is shown below The application was run with np 10 Process 0 of 10 is alive Process 1 of 10 is alive Process 3 of 10 is alive Process 9 of 10 is alive Process 7 of 10 is a
168. refer tothe IP addresses that LSF assigns Use LSF todothis mapping by specifying a variant of mpi run to execute your job This is the last release of HP MPI that will support tightly coupled integration between LSF s Parallel Application Manager PAM and HP MPI Shell scripts will be provided to enable similar functionality when support for this feature is discontinued Native Language Support By default diagnostic messages and other feedback from HP MPI are provided in English Support for other languages is available through the use of the Native Language Support NL S catalog and the internationalization environment variable NLSPATH The default NLS search path for HP MPI is NLSPATH Refer tothe environ 5 man page for NLSPATH usage When an MPI language catalog is available it represents HP MPI messages in two languages The messages are paired so that the first in the pair is always the English version of a message and the second in the pair is the corresponding translation to the language of choice Refer to the hpnls 5 environ 5 and lang 5 man pages for more information about Native Language Support Chapter 3 65 Understanding HP MPI Running applications 66 Chapter 3 Profiling This chapter provides information about utilities you can use to analyze HP MPI applications The topics covered are Using counter instrumentation Creatingan instrumentation profile Viewing ASCII instrumentation data
169. refix using the MPI INSTR environment variable Refer to MPI INSTR on page 41 for syntax information For example setenv MPI INSTR compute pi Specifies the instrumentation output file prefix as compute pi Specifications you make using mpirun i override any specifications you make using the MPI INSTR environment variable 68 Chapter 4 CAUTION Profiling Using counter instrumentation MPIHP_Trace_on and MPIHP_Trace_off By default the entire application is profiled from MPI Init to MPI Finalize However HP MPI provides the nonstandard MPIHP Trace on and MPIHP Trace off routines to collect profile information for selected code sections only Tousethis functionality 1 Insert the MPIHP Trace on and MPIHP Trace off pair around code that you want to profile 2 Build the application and invoke mpi run with the i off option i off specifies that counter instrumentation is enabled but initially turned off refer to mpirun on page 49 and MPI INSTR on page 41 Data collection begins after all processes collectively call MPIHP Trace on HP MPI collects profiling information only for code between MPIHP Trace on and MPIHP Trace off MPIHP Trace on and MPIHP Trace off are collective routines and must be called by all ranks in your application Otherwise the application deadlocks Viewing ASCII instrumentation data TheASCII instrumentation profileis a text file with the instr extension For example to view the
ronment variable to point to the new location.

• Set PATH to $MPI_ROOT/bin.
• Set MANPATH to $MPI_ROOT/share/man.

MPI must be installed in the same directory on every execution host.

If you have HP MPI installed on your system and want to determine its version, use the what command. The what command returns:

• The path where HP MPI is installed
• The HP MPI version number
• The date this version was released
• The product number
• The operating system version

For example:

what /opt/mpi/bin/mpicc
/opt/mpi/bin/mpicc:
        HP MPI 01.07.00.00 (dd/mm/yyyy) B6060BA - HP-UX 11.0

Compiling and running your first application

To quickly become familiar with compiling and running HP MPI programs, start with the C version of a familiar hello_world program. This program is called hello_world.c and prints out the text string "Hello world! I'm r of s on host", where r is a process's rank, s is the size of the communicator, and host is the host on which the program is run. The processor name is the host name for this implementation.

The source code for hello_world.c is stored in /opt/mpi/help and is shown below.

#include <stdio.h>
#include <mpi.h>

void main(argc, argv)
int argc;
char *argv[];
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
171. roperability conversion routines o ooooooooo 168 HP MPI library usage sssseeee RR III III 170 Thread initialization values 0 0 0 cece 172 Thread support levels 0 0 0 cece eee 172 Info objed routines rt rub sq dag eee 179 Naming object routines 0 0 tees 179 HP MPI implementation of standard flexibleissues 181 List of Tables xi xii List of Tables Preface This guide describes the HP MPI version 1 7 implementation of the Message Passing Interface MPI standard The guide helps you use HP MPI to develop and run parallel applications You should already have experience developing UNIX applications You should also understand the basic concepts behind parallel processing be familiar with MPI and with the MPI 1 2 and MPI 2 0 standards MPI A Message Passing Interface Standard and MPI 2 Extensions to the Message Passing Interface respectively You can access HTML versions of the MPI 1 2 and 2 0 standards at http www mpi forum org This guide supplements the material in the MPI standards and MPI The Complete Reference The HP MPI User s Guideis provided in HTML format with HP MPI Refer to opt mpi doc html in your product See Directory structure on page 25 for more information Some sections in this book contain command line examples used to demonstrate HP MPI concepts These examples use the bin csh syntax for illustration purposes xiii Syst
s immediately and may only have started the message transfer operation, not necessarily completed it. The application may not safely reuse the message buffer after a nonblocking routine returns.

In nonblocking communications, the following sequence of events occurs:

1. The sending routine begins the message transfer and returns immediately.
2. The application does some computation.
3. The application calls a completion routine (for example, MPI_Test or MPI_Wait) to test or wait for completion of the send operation.

Blocking communication

Blocking communication consists of four send modes and one receive mode. The four send modes are:

Standard (MPI_Send)        The sending process returns when the system can buffer the message or when the message is received and the buffer is ready for reuse.

Buffered (MPI_Bsend)       The sending process returns when the message is buffered in an application-supplied buffer. Avoid using the MPI_Bsend mode because it forces an additional copy operation.

Synchronous (MPI_Ssend)    The sending process returns only if a matching receive is posted and the receiving process has started to receive the message.

Ready (MPI_Rsend)          The message is sent as soon as possible.

You can invoke any mode by using the appropriate routine name and passing the argument list. Arguments are the same for all modes.

For example, to code a standard blocking send, use

MPI_Send(void *b
173. s 48 63 hostO host1 HostO processes with rank 0 15 communicate with processes with rank 16 31 through shared memory shmem HostO processes also communicate through the hostO ethernetO and the hostO ethernet1 network interfaces with host1 processes 108 Chapter 5 Table 8 Tuning Processor subscription Processor subscription Subscription refers tothe match of processors and active processes on a host Table 8 lists possible subscription types Subscription types Subscription type Description Under subscribed More processors than active processes Fully subscribed Equal number of processors and active processes Over subscribed More active processes than processors When a host is over subscribed application performance decreases because of increased context switching Context switching can degrade application performance by slowing the computation phase increasing message latency and lowering message bandwidth Simulations that use timing sensitive algorithms can produce unexpected or erroneous results when run on an over subscribed system In asituation where your system is oversubscribed but your MPI application is not you can use gang scheduling to improve performance Refer to Gang scheduling for details Chapter 5 109 Tuning MPI routine selection MPI routine selection To achieve the lowest message latendies and highest message bandwidths for point to point synchronous communic
s before the fence call returns. Remote memory access operations started by a process after the fence call returns access their target window only after MPI_Win_fence has been called by the target process.

MPI_Win_lock and MPI_Win_unlock start and complete a remote memory access epoch, respectively. Remote memory access operations issued during the epoch complete at the origin and at the target before MPI_Win_unlock returns.

Restrictions for the HP MPI implementation of one-sided communication include:

• MPI window segments must be allocated using MPI_Alloc_mem; they cannot be placed in COMMON blocks, the stack, or the heap.
• Multi-host user programs that call one-sided communication functions must be started by mpirun with the -commd option. This option is not required for single-host programs.
• MPI_Accumulate is not supported.
• Non-contiguous derived datatypes are not supported for one-sided communications.
• One-sided communications are not supported in the diagnostic library.
• One-sided communications are not supported in the multithreaded library.
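The following is a minimal active-target sketch that stays within the restrictions listed above: window memory obtained with MPI_Alloc_mem, contiguous data only, and no MPI_Accumulate. The ring pattern and buffer size are illustrative choices, not taken from the HP MPI example set, and a multi-host run would also need the -commd option mentioned above.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, right, value;
    int *winbuf;                      /* window memory */
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Window memory must come from MPI_Alloc_mem, not the stack or the heap. */
    MPI_Alloc_mem((MPI_Aint) sizeof(int), MPI_INFO_NULL, &winbuf);
    *winbuf = -1;

    MPI_Win_create(winbuf, (MPI_Aint) sizeof(int), (int) sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    right = (rank + 1) % size;
    value = rank;

    /* First fence opens the access epoch on every rank. */
    MPI_Win_fence(0, win);

    /* Each rank writes its own rank number into its right neighbor's window. */
    MPI_Put(&value, 1, MPI_INT, right, 0, 1, MPI_INT, win);

    /* Second fence completes the epoch; the target windows are now updated. */
    MPI_Win_fence(0, win);

    printf("rank %d received %d\n", rank, *winbuf);

    MPI_Win_free(&win);
    MPI_Free_mem(winbuf);
    MPI_Finalize();
    return 0;
}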
s not mandate a buffering strategy. HP MPI does sometimes use buffering for MPI_Send and MPI_Rsend, but it is dependent on message size. Deadlock situations can occur when your code uses standard send operations and assumes buffering behavior for standard communication mode. Refer to "Frequently asked questions" on page 129 for an example of how to resolve a deadlock situation.

Nonblocking communication

MPI provides nonblocking counterparts for each of the four blocking send routines and for the receive routine. Table 2 lists blocking and nonblocking routine calls.

Table 2        MPI blocking and nonblocking calls

               Blocking mode        Nonblocking mode
               MPI_Send             MPI_Isend
               MPI_Bsend            MPI_Ibsend
               MPI_Ssend            MPI_Issend
               MPI_Rsend            MPI_Irsend
               MPI_Recv             MPI_Irecv

Nonblocking calls have the same arguments, with the same meaning, as their blocking counterparts, plus an additional argument for a request.

To code a standard nonblocking send, use

MPI_Isend(void *buf, int count, MPI_Datatype dtype, int dest, int tag, MPI_Comm comm, MPI_Request *req);

where

req            Specifies the request used by a completion routine when called by the application to complete the send operation.

To complete nonblocking sends and receives, you can use MPI_Wait or MPI_Test. The completion of a send indicates that the sending process is free to access the send buffer. The completion of a receive indicates that the receive buffer contains the message.
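For completeness, here is a small sketch that pairs the nonblocking calls from Table 2 with MPI_Wait; the ring neighbor pattern, message length, and tag are arbitrary illustration values rather than anything prescribed by HP MPI.

#include <stdio.h>
#include <mpi.h>

#define COUNT 1024

int main(int argc, char *argv[])
{
    int rank, size, right, left, i;
    double sendbuf[COUNT], recvbuf[COUNT];
    MPI_Request sreq, rreq;
    MPI_Status  status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    right = (rank + 1) % size;
    left  = (rank + size - 1) % size;
    for (i = 0; i < COUNT; i++)
        sendbuf[i] = (double) rank;

    /* Start the transfers; both calls return immediately. */
    MPI_Irecv(recvbuf, COUNT, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &rreq);
    MPI_Isend(sendbuf, COUNT, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &sreq);

    /* Overlap: do computation here that touches neither buffer. */

    /* Complete the operations before reusing either buffer. */
    MPI_Wait(&rreq, &status);
    MPI_Wait(&sreq, &status);

    printf("rank %d got data from rank %d\n", rank, left);

    MPI_Finalize();
    return 0;
}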
176. s the structure for daemon communication Daemon communication Socket connection p Daemon process Outbound Inbound shared memory y fragments Application E processes host1 host2 To use daemon communication specify the comma option in thempi run command Once you have set the comma option you can use the MP I_COMMD environment variable to specify the number of shared memory fragments used for inbound and outbound messages Refer to mpirun on page 49 and MPI COMMD on page 35 for more information Daemon communication can result in lower application performance Therefore use it only when scaling an application to a large number of hosts HP MPI sets up one daemon per host or appfile entry for communication If you invoke your application with np x HP MPI generates x 1 processes Chapter 3 63 Understanding HP MPI Running applications IMPI The Interoperable MPI protocol IMPI extends the power of MPI by allowing applications to run on heterogeneous clusters of machines with various architectures and operating systems while allowing the program to use a different implementation of MPI on each machine This is accomplished without requiring any modifications to the existing MPI specification That is IMPI does not add remove or modify the semantics of any of the existing MPI routines All current valid MPI programs can be run in this way without any changes to their
177. see Figure 18 on page 98 XMPI loads and displays the prefix tr output filein the XMPI Trace window After XMPI loads and displays the tr output filein the XMPI Trace window you cannot resume snapshot monitoring even though the application may still be running Chapter 4 93 Step 4 Step 5 Profiling Using XMPI In interactive mode XMPI gathers and displays data from the running appfile or a trace file When an application is running the data sourceis the appfile and automatic snapshot is enabled Even though the application may be creating trace data the snapshot function does not use it Instead the snapshot function acquires data from internal hooks in HP MPI At any point in interactive mode you can load and view a trace file by selecting the Trace menu then the View or Express command When you usethe View or Express command to load and view a trace file the data source switches to the loaded trace file and the snapshot function is disabled You must rerun your application to switch the data source from a trace file back to an appfile Select Clean from the Application menu at any timein interactive mode to kill the application and dose any associated XMPI Focus and XMPI Datatype dialogs XMPI displays the XM PI Confirmation dialog to confirm that you want toterminate the application Select Yes to terminate your application and dose associated dialogs You can run another application by selecting an appfile
source code.

In IMPI, all messages going out of a host go through the daemon. The messages between daemons have a fixed message format, and the protocols in different IMPI implementations are the same.

Currently, IMPI is not supported in the multithreaded library. If the user application is a multithreaded program, it is not allowed to start as an IMPI job.

An IMPI server is available for download from Notre Dame at http://www.lsc.nd.edu/research/impi

The IMPI syntax is

mpirun -client # ip port

where

-client        Specifies that this mpirun is an IMPI client.
#              Specifies the client number. The first is 0.
ip             Specifies the IP address of the IMPI server.
port           Specifies the port number of the IMPI server.

Assigning hosts using LSF

The load sharing facility (LSF) allocates one or more hosts to run an MPI job. In general, LSF improves resource utilization for MPI jobs that run in multihost environments.

LSF handles the job scheduling and the allocation of the necessary hosts, and HP MPI handles the task of starting up the application's processes on the hosts selected by LSF.

NOTE           By default, mpirun starts the MPI processes on the hosts specified by the user, in effect handling the direct mapping of host names to IP addresses. When you use LSF to start MPI applications, the host names (specified to mpirun, or implicit when the -h option is not used) are treated as symbolic variables that
179. ss in an HP MPI application Profile information is stored in a separate performance data file PDF for each process To analyze your profiling data using CXperf you must first use the merge utility to merge the data from the separate files into a single PDF Refer to the merge 1 man page Using CXperf you can instrument your application to collect performance using one or more of the following metrics Wall dock time CPU time Execution counts Cachemisses Latency Migrations Context switches e Page faults Instruction counts Data translation lookaside buffer DTLB misses e Instruction translation lookaside buffer I TL B misses You can display the data as a 3D Parallel profile a 2D Summary profile a text report or a dynamic call graph For more information refer tothe CXperf User s Guide and the CXperf Command Reference 100 Chapter 4 Profiling Using the profiling interface Using the profiling interface The MPI profiling interface provides a mechanism by which implementors of profiling tools can collect performance information without access to the underlying MPI implementation source code BecauseHP MPI provides several options for profiling your applications you may not need the profiling interface to write your own routines HP MPI makes use of MPI profiling interface mechanisms to provide the diagnostic library for debugging In addition HP MPI provides tracing and lightweight counter
ssed request 0 for client ...
(The remainder of the thread_safe log follows the same pattern, one line for each request handled, of the form: server <server rank> processed request <request count> for client <client rank>.)
starting at ADR must be available to all other processes. Placing MPI_Init so that it executes first can help avoid this problem. A process with a large stack size is also prone to this failure, so choose process stack size carefully.

Message buffering

According to the MPI standard, message buffering may or may not occur when processes communicate with each other using MPI_Send. MPI_Send buffering is at the discretion of the MPI implementation. Therefore, you should take care when coding communications that depend upon buffering to work correctly.

For example, when two processes use MPI_Send to simultaneously send a message to each other and use MPI_Recv to receive the messages, the results are unpredictable. If the messages are buffered, communication works correctly. If the messages are not buffered, however, each process hangs in MPI_Send, waiting for MPI_Recv to take the message. A sequence of operations labeled "Deadlock", as illustrated in Table 9, would result in such a deadlock. Table 9 also illustrates the sequence of operations that would avoid code deadlock.

Table 9        Non-buffered messages and deadlock

               Deadlock                            No deadlock
               Process 1      Process 2            Process 1      Process 2
               MPI_Send(2)    MPI_Send(1)          MPI_Send(2)    MPI_Recv(1)
               MPI_Recv(2)    MPI_Recv(1)          MPI_Recv(2)    MPI_Send(1)

Propagation of
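The "No deadlock" column of Table 9 corresponds to code like the following sketch for a two-process run; the buffer length and tag are arbitrary. (Another common way to avoid relying on buffering is MPI_Sendrecv, which is not shown here.)

#include <stdio.h>
#include <mpi.h>

#define LEN 4096

int main(int argc, char *argv[])
{
    int rank, i, tag = 0;
    double out[LEN], in[LEN];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < LEN; i++)
        out[i] = (double) rank;

    if (rank == 0) {
        /* Process 1 in Table 9: send first, then receive. */
        MPI_Send(out, LEN, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD);
        MPI_Recv(in,  LEN, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD, &status);
    } else if (rank == 1) {
        /* Process 2 in Table 9: receive first, then send. */
        MPI_Recv(in,  LEN, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &status);
        MPI_Send(out, LEN, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}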
182. summary by rank and peer Displays data by rank and its peer rank for a given routine Each time you select a graph mpiview displays it in a separate window with thetitle of the graph and the filename of the data file used to generate it in thetitlebar 74 Chapter 4 Profiling Using counter instrumentation Figure4 is an example of a graph window containing a Message length summary by rank and peer graph Figure4 MPIVIEW graph window Save graph as postscript View graph data Change context of graph Reset orientation MPIVIEW Message summary by rank and peer send receive mpiview File Options Help Qe 25 Graph Type MPI Send Message length summary by rank and peer for MPI Send L egend Pop up with data for MPI Send Chapter 4 75 Figure 5 File Graph Profiling Using counter instrumentation Viewing multiple graphs From the Window pulldown menu you can Select one of the graphs from the list to view The mpiview utility shuffles the window containing the selected graph to the top of your stack of overlapping windows Select Close all windows to dismiss all the graphs from your display The mpiview utility does not impose a limit on the number of graphs it displays at any time Each time you select a graph mpiview displays it in a separate window with the title of the graph and the filename of the data file in the titlebar The mpiview Window pulldown menu initially contains only the Clos
183. t Each line specifies the name of the executable file and the number of processes to run on the host The n option is followed by the name of the host where the specified processes must be run Instead of using the host name you may use its IP address Run the hello world executable file mpirun f my appfile The option specifies the filename that follows it is an appfile mpirun parses the appfile line by line for the information to run the program In this example mpirun runs the hello world program with two processes on thelocal machine jawbone and two processes on the remote machine wizard as dictated by the np 2 option on each line of the appfile Analyze hello world output HP MPI prints the output from running the hello world executable in non deterministic order The following is an example of the output Hello world I m 2 of 4 on wizard Hello world I m 0 of 4 on jawbone Hello world I m 3 of 4 on wizard Hello world I m 1 of 4 on jawbone Notice that processes 0 and 1 run on jawbone the local host while processes 2 and 3 run on wizard HP MPI guarantees that the ranks of the processes in MPI COMM WORLD are assigned and sequentially ordered according to the order the programs appear in the appfile The appfile in this example my appfile describes the local host on the first line and the remote host on the second line 22 Chapter 2 Getting started Running and collecting profiling data Running and colle
184. t A mode of operation for an MPI application where a cluster is used to carry out a parallel application run nonblocking receive Communication in which the receiving process returns before a message is stored in the receive buffer Nonblocking receives are useful when communication and computation can be effectively overlapped in an MPI application Use of nonblocking receives may also avoid system buffering and memory to memory copying 188 nonblocking send Communication in which the sending process returns before a message is stored in the send buffer Nonblocking sends are useful when communication and computation can be effectively overlapped in an MPI application non determinism A behavior describing non repeatable observed parameters The order of a set of events depends on run time conditions and so varies from run to run parallel efficiency An increase in speed in the execution of a parallel application point to point communication Communication where data transfer involves sending and receiving messages between two processes This is the simplest form of data transfer in a message passing model polling Mechanism to handle asynchronous events by actively checking to determine if an event has occurred process Address space together with a program counter a set of registers and a stack Processes can be single threaded or multithreaded Single threaded processes can only perform one task at a
185. t is directed to a specific MPI process Broadcast Input is copied to the stdin of all processes Ignore Input is ignored The default behavior is standard input is ignored Additional options are available to avoid confusing interleaving of output Line buffering block buffering or no buffering e Prepending of processes ranks to their stdout and stderr e Simplification of redundant output Backtrace functionality HP MPI 1 7 handles several common termination signals differently than earlier versions of HP MPI If any of the following signals are generated by an MPI application a stack trace is printed prior to termination SIGBUS bus error SIGSEGV segmentation violation e SIGILL illegal instruction e SIGSYS illegal argument to system call The backtrace is helpful in determining where the signal was generated and the call stack at the time of the error If a signal handler is established by the user code before calling MPI Init no backtrace will be printed for that signal type and the user s handler will be solely responsible for handling the signal Any signal handler installed after MPI Init Will also override the backtrace functionality for that signal after the point it is established If multiple processes cause a signal each of them will print a backtrace Chapter 6 119 Debugging and troubleshooting Debugging HP MPI applications In some cases the prepending and buffering options availablei
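The precedence rule described above (a handler installed before MPI_Init suppresses the HP MPI backtrace for that signal) can be sketched as follows; the handler body is a minimal placeholder rather than a recommended production handler.

#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <mpi.h>

/* Minimal handler: report the signal and exit.  Because it is installed
   before MPI_Init, HP MPI does not print its own backtrace for SIGSEGV;
   this handler is solely responsible for handling the signal. */
static void my_handler(int sig)
{
    char msg[] = "application signal handler caught a signal\n";
    (void) sig;
    write(STDERR_FILENO, msg, sizeof(msg) - 1);
    _exit(1);
}

int main(int argc, char *argv[])
{
    signal(SIGSEGV, my_handler);   /* installed before MPI_Init */

    MPI_Init(&argc, &argv);
    /* ... application code ... */
    MPI_Finalize();
    return 0;
}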
tallation directory from its default location in /opt/mpi, set the MPI_ROOT environment variable to point to the new location. Refer to "Configuring your environment" on page 18.

Organization of the /opt/mpi directory

Subdirectory                 Contents
bin                          Command files for the HP MPI utilities
doc/html                     The HP MPI User's Guide
help                         Source files for the example programs
include                      Header files
lib/X11/app-defaults         Application default settings for the XMPI trace utility and the mpiview profiling tool
lib/pa1.1                    MPI PA-RISC 32-bit libraries
lib/pa20_64                  MPI PA-RISC 64-bit libraries
lib/hpux32                   MPI Itanium 32-bit libraries
lib/hpux64                   MPI Itanium 64-bit libraries
newconfig/                   Configuration files and release notes
share/man/man1.Z             Man pages for the HP MPI utilities
share/man/man3.Z             Man pages for the HP MPI library

The man pages located in the /opt/mpi/share/man/man1.Z subdirectory can be grouped into three categories: general, compilation, and run time. There is one general man page, MPI.1, that is an overview describing general features of HP MPI. The compilation and run-time man pages are those that describe HP MPI utilities.

Table 4 describes the three categories of man pages in the man1.Z subdirectory that comprise man pages for HP MPI utilities.

Table 4        Man page categories

General        MPI.1          Describes the general features of HP MPI
Compilation
Run
187. the mpirun options trace dialog To invoke the XM PI Express dialog select the Trace pulldown menu then Express while your application is running 92 Chapter 4 Figure 14 Profiling Using XMPI Figure 14 displays the XMPI Express dialog XMPI Express dialog XMPI Express Select one of two options from the dialog Terminate the application and get full trace Specifies that the content of each process buffer is written to a trace file The write happens whether process buffers are partially or totally full Thetrace files for each process are consolidated in a prefix tr output file where prefix is the name you specified in the Prefix field of the Tracing options dialog see Figure 18 on page 98 XMPI loads and displays the prefix tr output filein the XMPI Trace window When you select this field XMPI displays the XMPI Confirmation dialog to confirm that you want to terminate the application You must select Yes before processing will continue After XMPI loads and displays the tr output filein the XMPI Trace window you cannot resume snapshot monitoring because the application has terminated Get partial trace that processes dump at every 4096 kilobytes Specifies that the content of each process buffer is written to a trace file only after the buffer becomes full Thetracefiles are then consolidated to a prefix tr output file where prefix is the name you specified in the Prefix field of the Tracing options dialog
188. threads making MPI calls simultaneously Table 15 shows which library to use for a given HP MPI application type Table 15 HP MPI library usage Application type Library to link Comments Non threaded MPI application libmpi Most MPI applications Non threaded MPI application libmtmpi libmpi Potential performance with mostly nonblocking communication Non parallel MLIB applications link with 1veclib improvement if run with libmtmpi and the communication thread MPI MT FLAGS ct Thread parallel MLIB applications link with 1veclib 03 Oparallel Using pthreads libmtmpi libmpi If the user is explicitly using pthreads and they guarantee that no 2 threads call MPI at the sametime libmpi can be used Otherwise use libmtmpi libmpi represents the non thread compliant library libmtmpi represents thethread compliant library NOTE When you use the thread compliant library overall performance is a function of the level of thread support required by the application Thread support levels are described in Table 16 on page 172 170 Appendix C MPI 2 0 features supported Thread compliant library To link with the thread compliant library use the 1ibmtmpi option when compiling your application To create a communication thread for each process in your job for example to overlap computation and communication specify the ct option in the MPI_MT_FLAGS environment variable Se
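When an application linked with the thread-compliant library needs a particular level of thread support, the MPI-2 MPI_Init_thread call can be used as in the sketch below. This is a general MPI-2 illustration rather than an excerpt from the HP MPI examples; whether MPI_THREAD_MULTIPLE is actually granted depends on which library the application is linked against, so the returned level should always be checked.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int provided, rank;

    /* Ask for full multithreaded support; the thread-compliant library
       (libmtmpi) is required for anything beyond single-threaded use. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0 && provided < MPI_THREAD_MULTIPLE)
        printf("warning: requested MPI_THREAD_MULTIPLE, got level %d\n",
               provided);

    /* ... threads may now make MPI calls according to the level provided ... */

    MPI_Finalize();
    return 0;
}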
189. time mpicc 1 mpiCC 1 mpif77 1 Describes the available compilation mpif90 1 utilities Refer to Compiling applications on page 28 for more information n mpidean 1 mpijob 1 Describes runtime utilities environment mpirun 1 mpivi ew 1 variables debugging thread safe and xmpi 1 mpienv 1 diagnostic libraries mpidebug 1 mpimtsafe 1 26 Chapter 2 Understanding HP MPI This chapter provides information about the HP MPI implementation of MPI Thetopics covered include details about compiling and running your HP MPI applications Compiling applications Compilation utilities 64 bit support Thread compliant library e Running applications Types of applications Runtime environment variables Runtime utility commands Communicating using daemons MPI Assigning hosts using LSF Native Language Support Chapter 3 27 Table 5 Understanding HP MPI Compiling applications Compiling applications The compiler you use to build HP MPI applications depends upon which programming language you use The HP MPI compiler utilities are shell scripts that invoke the appropriate native compiler You can pass the pathname of the MPI header files using the 1 option and link an MPI library for example the diagnostic or thread compliant library using the w1 L or 1 option By default HP MPI compiler utilities include a small amount of debug information in order to allow the TotalView debugger
190. time Multithreaded processes can perform multiple tasks concurrently as when overlapping computation and communication race condition Situation in which multiple processes vie for the same resource and receive it in an unpredictable manner Race conditions can lead to cases where applications do not run correctly from one invocation to the next rank Integer between zero and number of processes 1 that defines the order of a process in a communicator Determining the rank of a process is important when solving problems where a master process partitions and distributes work toslave processes The slaves perform some computation and return the result tothe master as the solution ready send mode Form of blocking send where the sending process cannot start until a matching receive is posted The sending process returns immediately reduction Binary operations such as summation multiplication and boolean applied globally to all processes in a communicator These operations areonly valid on numeric data and are always associative but may or may not be commutative scalable Ability to deliver an increase in application performance proportional to an increase in hardware resources normally adding more processors scatter One to many operation where the root s send buffer is partitioned into n segments and distributed to all processes such that the ith process receives the ith segment n represents the total
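As a concrete illustration of the scatter definition above, the following sketch distributes one four-element segment of the root's send buffer to each process; the segment length is an arbitrary choice made for the example.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, i;
    int *sendbuf = NULL;   /* significant only at the root */
    int segment[4];        /* each process receives one 4-element segment */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* The root builds a buffer with one segment per process. */
        sendbuf = (int *) malloc(size * 4 * sizeof(int));
        for (i = 0; i < size * 4; i++)
            sendbuf[i] = i;
    }

    /* The ith process receives the ith 4-element segment of the root buffer. */
    MPI_Scatter(sendbuf, 4, MPI_INT, segment, 4, MPI_INT, 0, MPI_COMM_WORLD);

    printf("rank %d received a segment starting with %d\n", rank, segment[0]);

    if (rank == 0)
        free(sendbuf);
    MPI_Finalize();
    return 0;
}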
191. tion to basic MPI concepts Advanced MPI topics include Error handling Process topologies User defined datatypes Process grouping Communicator attribute caching e TheMPI profiling interface To learn more about the basic concepts discussed in this chapter and advanced MPI topics refer to MPI The CompleteReference and MPI A Message Passing Interface Standard 16 Chapter 1 Getting started This chapter describes how to get started quickly using HP MPI The semantics of building and running a simple MPI program are described for single and multiple hosts You learn how to configure your environment before running your program You become familiar with the file structure in your HP MPI directory The goal of this chapter is to demonstrate the basics to getting started using HP MPI For complete details about running HP MPI and analyzing and interpreting profiling data refer to Chapter 3 Understanding HP MPI and Chapter 4 Profiling Thetopics covered in this chapter are Configuring your environment Compiling and running your first application Building and running on a single host Building and running on multiple hosts Running and collecting profiling data Directory structure Chapter 2 17 NOTE Getting started Configuring your environment Configuring your environment If you move the HP MPI installation directory from its default location in opt mpi e Set theMPI_ROOT envi
to function. However, certain compiler options are incompatible with this debug information. Use the -notv option to exclude debug information. The -notv option will also disable TotalView usage on the resulting executable. The -notv option applies to archive libraries only.

Compilation utilities

HP MPI provides separate compilation utilities and default compilers for the languages shown in Table 5.

Table 5        Compilation utilities

Language       Utility        Default compiler
C              mpicc          /opt/ansic/bin/cc
C++            mpiCC          /opt/aCC/bin/aCC
Fortran 77     mpif77         /opt/fortran/bin/f77
Fortran 90     mpif90         /opt/fortran90/bin/f90

If aCC is not available, mpiCC uses CC as the default C++ compiler.

Even though the mpiCC and mpif90 compilation utilities are shipped with HP MPI, all C++ and Fortran 90 applications use C and Fortran 77 bindings, respectively.

If you want to use a compiler other than the default one assigned to each utility, set the corresponding environment variable shown in Table 6.

Table 6        Compilation environment variables

MPI_CC         C compiler used by mpicc
MPI_CXX        C++ compiler used by mpiCC
MPI_F77        Fortran 77 compiler used by mpif77
MPI_F90        Fortran 90 compiler used by mpif90

CAUTION        HP MPI does not support applications that are compiled with the following options:

• +autodblpad (Fortran 77 programs)
• +autodbl (Fortran 90 programs)
• +autodbl4 (Fortran 90 programs)

64-bit support

HP-UX 11.0 is available as a 32- and 64-bit operating system.
193. to kill an MPI application You can only kill jobs that are your own The second syntax is used when an application aborts during MPI Init and the termination of processes does not destroy the allocated shared memory segments xmpi xmpi invokes the XMPI utility an X Motif graphical user interface for running applications monitoring processes and messages and viewing trace files The xmpi syntax is shown below xmpi h bg arg bd arg bw arg display arg fg arg geometry arg iconic title arg where the xmpi arguments are standard X M otif arguments Chapter 3 61 NOTE NOTE Understanding HP MPI Running applications The X resource settings that determine the default settings for displaying XMPI are in opt mpi lib X11 app defaults XM PI Refer to Using XM PI on page 78 and Appendix B XMPI resource file for more information HP MPI 1 7 is the last release that supports XMPI and mpiview XMPI and mipview are not supported for Itanium based systems mpiview mpiview invokes the mpiview utility a graphical user interface to display counter instrumentation data mpiview reads a prefix mpiview file containing the counter instrumentation data You specified the filename prefix either in the environment variable MPI INSTR refer to MPI INSTR on page 41 or by using the i option with the mpirun command refer to mpirun on page 49 For example mpiview my data mpiview invokes mpivi
uf, int count, MPI_Datatype dtype, int dest, int tag, MPI_Comm comm);

where

buf            Specifies the starting address of the buffer.
count          Indicates the number of buffer elements.
dtype          Denotes the datatype of the buffer elements.
dest           Specifies the rank of the destination process in the group associated with the communicator comm.
tag            Denotes the message label.
comm           Designates the communication context that identifies a group of processes.

To code a blocking receive, use

MPI_Recv(void *buf, int count, MPI_Datatype dtype, int source, int tag, MPI_Comm comm, MPI_Status *status);

where

buf            Specifies the starting address of the buffer.
count          Indicates the number of buffer elements.
dtype          Denotes the datatype of the buffer elements.
source         Specifies the rank of the source process in the group associated with the communicator comm.
tag            Denotes the message label.
comm           Designates the communication context that identifies a group of processes.
status         Returns information about the received message. Status information is useful when wildcards are used or the received message is smaller than expected. Status may also contain error codes.

Examples "send_receive.f" on page 133, "ping_pong.c" on page 135, and "master_worker.f90" on page 140 all illustrate the use of standard blocking sends and receives.

NOTE           You should not assume message buffering between processes, because the MPI standard does not mandate a buffering strategy.
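The following sketch ties the argument descriptions above together and shows why the status argument is useful when wildcards are used: the receiver learns the actual source and message length from status. The buffer size, tag, and message lengths are arbitrary illustration values.

#include <stdio.h>
#include <mpi.h>

#define MAXLEN 256

int main(int argc, char *argv[])
{
    int rank, size, count, i;
    double buf[MAXLEN];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (i = 0; i < MAXLEN; i++)
        buf[i] = (double) rank;

    if (rank != 0) {
        /* Each non-root rank sends a message whose length depends on its rank. */
        int len = 10 + (rank % 100);
        MPI_Send(buf, len, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
    } else {
        for (i = 1; i < size; i++) {
            /* Wildcard receive: status reports who sent it and how much arrived. */
            MPI_Recv(buf, MAXLEN, MPI_DOUBLE, MPI_ANY_SOURCE, 1,
                     MPI_COMM_WORLD, &status);
            MPI_Get_count(&status, MPI_DOUBLE, &count);
            printf("received %d doubles from rank %d\n",
                   count, status.MPI_SOURCE);
        }
    }

    MPI_Finalize();
    return 0;
}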
195. ugh the software High performance Fortran is an example of implicit parallelism intercommunicators Communicators that allow only processes within the same group or in two different groups to exchange data These communicators support only point to point communication intracommunicators Communicators that allow processes within the same group to exchange data These communicators support both point to point and collective communication instrumentation Cumulative statistical information collected and stored in ascii format Instrumentation is the recommended method for collecting profiling data latency Time between the initiation of the data transfer in the sending process and the arrival of the first byte in the receiving process load balancing Measure of how evenly the work load is distributed among an application s processes When an application is perfectly balanced all processes share the total work load and complete at the same time locality Degree to which computations performed by a processor depend only upon local data Locality is measured in several ways including the ratio of local to nonlocal data accesses message bin A message bin stores messages according to message length You can definea message bin by defining the byte range of the message to be stored in the bin usetheMPI_INSTR environment variable message passing model Model in which processes communicate with each other by se
196. umn sections supports parallelization of the first outer loop Partitioning the array into row sections supports parallelization of the second outer loop However this approach requires a massive data exchange among processes because of run time partition changes In this case twisted data layout partitioning is a better approach because the partitioning used for the parallelization of the first outer loop can accommodate the partitioning of the second outer loop The partitioning of the array is shown in Figure 20 Appendix A 147 Figure 20 row block Example applications multi_par f Array partitioning column block In this sample program the rank n process is assigned to the partition n at distribution initialization Because these partitions are not contiguous memory regions MPI s derived datatype is used to define the partition layout to the MPI system Each process starts with computing summations in row wisefashion For example the rank 2 process starts with the block that is on the Oth row block and 2nd column block denoted as 0 2 The block computed in the second step is 1 3 Computing the first row elements in this block requires the last row elements in the 0 3 block computed in the first step in the rank 3 process Thus the rank 2 process receives the data from the rank 3 process at the beginning of the second step Note that the rank 2 process also sends the last row elements of the 0 2
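The passage above notes that the twisted-data-layout partitions are not contiguous in memory and are therefore described to MPI with a derived datatype. The following is a small standalone C sketch of that idea, defining a column block of a row-major array with MPI_Type_vector; it is not an excerpt from multi_par.f (which is Fortran and uses its own block sizes), and the array dimensions are arbitrary.

#include <stdio.h>
#include <mpi.h>

#define NROWS 8
#define NCOLS 8
#define BLK   4          /* width of one column block (illustrative) */

int main(int argc, char *argv[])
{
    double a[NROWS][NCOLS];
    MPI_Datatype colblock;
    int rank, size, i, j;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (i = 0; i < NROWS; i++)
        for (j = 0; j < NCOLS; j++)
            a[i][j] = rank * 100.0 + i * NCOLS + j;

    /* NROWS blocks of BLK contiguous doubles, one per row, separated by a
       full row (NCOLS doubles): this describes a column block of the array. */
    MPI_Type_vector(NROWS, BLK, NCOLS, MPI_DOUBLE, &colblock);
    MPI_Type_commit(&colblock);

    /* Exchange the leftmost column block between ranks 0 and 1 (needs -np 2+). */
    if (size >= 2) {
        if (rank == 0) {
            MPI_Send(&a[0][0], 1, colblock, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Status status;
            MPI_Recv(&a[0][0], 1, colblock, 0, 0, MPI_COMM_WORLD, &status);
            printf("rank 1: a[0][0] is now %g\n", a[0][0]);
        }
    }

    MPI_Type_free(&colblock);
    MPI_Finalize();
    return 0;
}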
197. y call MPIHP Trace on nl Specifies not to dump a long breakdown of the measurement data to the instrumentation output file that is do not dump minimum maxi mum and average time data np Specifies that a per process breakdown of the measurement data is not dumped to the instrumentation output file nm Specifies that message size measurement data is not dumped to the instrumentation output file c Specifies that time measurement data is not dumped to the instrumentation output file Refer to U sing counter instrumentation on page 68 for more information Even though you can specify profiling options through the MPI_INSTR environment variable the recommended approach is to use the mpirun command with the i option instead Using mpirun to specify profiling options guarantees that multihost applications do profiling in a consistent manner Refer to mpirun on page 49for moreinformation Counter instrumentation and trace file generation used in conjunction with XMPI are mutually exclusive profiling techniques When you enable instrumentation for multihost runs and invoke mpirun either on a host where at least one MPI process is running or on a host remote from all your MPI processes HP MPI writes the instrumentation output files prefix instr and prefix mpiview to the working directory on the host that is running rank 0 42 Chapter 3 Understanding HP MPI Running applications MPI LOCALIP MP I_LOCALIP specifies
198. ypes n this case no action is taken e Allowing user functions at process termination D efines what actions take place when a process terminates These actions are specified by attaching an attribute to MPI COMM SELF with a callback function When MPI FINALIZE is called it first executes the equivalent of an MPI COMM FREE ON MPI COMM SELF This causes the delete callback function to be called on all keys associated with MP1_COMM_SELF The freeing of MPI COMM SELF occurs before any other part of MPI is affected Determining whether MPI has finished Allows layered libraries to determine whether MPI is still active by using MPI Finalize e Using the Info object Provides system dependent hints Sets key and value pairs both key and valueare strings for the opaque information object Info Info object routines include those described in Table 18 on page 179 e Associating information with status Sets the number of elements to associate with the status for requests n addition sets the status to associate with the cancel flag to indicate whether a request was cancelled Status routines indude MPI Status set elements Modifies the opaque part of status MPI Status set cancelled Indicates whether a status request is cancelled Associate a name with a communicator a window or a datatype Allows you to assodiate a printable identifier with an HP MPI communicator window or datatype This can be
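The "user functions at process termination" mechanism described above can be sketched with the standard attribute-caching interface. The MPI-2 names are used here (the older MPI_Keyval_create and MPI_Attr_put names could be substituted), and the cleanup action itself is just a placeholder printf.

#include <stdio.h>
#include <mpi.h>

/* Delete callback: MPI_Finalize effectively frees MPI_COMM_SELF first,
   so this function runs at process termination, before the rest of MPI
   shuts down. */
static int cleanup_fn(MPI_Comm comm, int keyval, void *attr_val, void *extra)
{
    (void) comm; (void) keyval; (void) attr_val; (void) extra;
    printf("per-process cleanup running during MPI_Finalize\n");
    return MPI_SUCCESS;
}

int main(int argc, char *argv[])
{
    int keyval;

    MPI_Init(&argc, &argv);

    /* Attach the callback to MPI_COMM_SELF so it fires at finalize time. */
    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, cleanup_fn, &keyval, NULL);
    MPI_Comm_set_attr(MPI_COMM_SELF, keyval, NULL);

    /* ... application code ... */

    MPI_Finalize();   /* triggers cleanup_fn via the attribute on MPI_COMM_SELF */
    return 0;
}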
