
Vampir 7 User Manual


Contents

1. 6 A Use Case
6.1 Introduction
In many cases, the Vampir suite has been successfully applied to identify performance bottlenecks and assist in their correction. To show in which ways the provided toolset can be used to find performance problems in program code, one optimization process is illustrated in this chapter. The following example is a three-part optimization of a weather forecast model including simulation of cloud microphysics. Every run of the code has been performed on 100 cores with manual function instrumentation, MPI communication instrumentation, and recording of the number of L2 cache misses.
In Figure 6.1, Vampir has been set up to show a high-level overview of the model's code. This layout can be achieved through two simple manipulations. First, set up the Master Timeline to adjust the process bar height to fit the chart height; all 100 processes are then arranged in one view. Second, change the event category in the Function Summary to show function groups; this condenses the many functions into fewer function groups. One run of the instrumented program took 290 seconds to finish. The first half of the trace (Figure 6.1 A) is the
2. [Figure 6.6: Overview showing a significant overall improvement]
By using the Vampir toolkit, three problems have been identified. As a consequence of addressing each problem, the duration of one iteration has been decreased from 3.5 seconds to 2.0 seconds. As shown by the Ruler (Chapter 4.1) in Figure 6.6, two large iterations now take 84 seconds to finish, whereas at first (Figure 6.1) they took roughly 140 seconds. This amounts to a 40% reduction in total runtime. This large improvement was achieved easily by using the insight into the program's runtime behavior provided by the Vampir toolkit to optimize the inefficient parts of the code.
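The 40% figure describes the reduction in runtime, which is not the same as a speedup factor. The short Python sketch below only uses the numbers quoted in this section (about 140 s versus 84 s for the two large iterations, 3.5 s versus 2.0 s per iteration) to show how both views of the same improvement are computed; the helper name runtime_change is hypothetical.

```python
def runtime_change(before_s: float, after_s: float) -> tuple[float, float]:
    """Return (relative reduction in runtime, speedup factor) for one measurement."""
    reduction = (before_s - after_s) / before_s  # fraction of time saved
    speedup = before_s / after_s                 # how many times faster
    return reduction, speedup


# Two large iterations: roughly 140 s before (Figure 6.1), 84 s after (Figure 6.6).
total_reduction, total_speedup = runtime_change(140.0, 84.0)
# One iteration: 3.5 s before, 2.0 s after.
iter_reduction, iter_speedup = runtime_change(3.5, 2.0)

print(f"two large iterations: {total_reduction:.0%} less time, {total_speedup:.2f}x faster")
print(f"one iteration: {iter_reduction:.0%} less time, {iter_speedup:.2f}x faster")
```

Run as is, it reports a 40% reduction (1.67x faster) for the two large iterations and about 43% (1.75x faster) for a single iteration.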
3. gram. Therefore, simply use the compiler wrappers without any parameters, e.g.:
% vtf90 hello.f90 -o hello
For manual instrumentation with the VampirTrace API (http://www.tu-dresden.de/zih/vampirtrace), simply include vt_user.inc (Fortran) or vt_user.h (C, C++) and label any user-defined sequence of statements for instrumentation as follows:
VT_USER_START('name') ... VT_USER_END('name')
VT_USER_START("name"); ... VT_USER_END("name");
in Fortran and C, respectively, and in C++ as follows:
VT_TRACER("name");
Afterwards, use
% vtcc -DVTRACE hello.c -o hello
to combine the manual instrumentation with automatic compiler instrumentation, or
% vtcc -vt:inst manual -DVTRACE hello.c -o hello
to prevent an additional compiler instrumentation. For a detailed description of manual instrumentation, please consult the VampirTrace User Manual.
2.3.2 Tracing an Application
Running a VampirTrace-instrumented application should normally result in an OTF trace file in the current working directory where the application was executed. On Linux, Mac OS, and Sun Solaris, the default name of the trace file will be equal to the application name. For other systems, the default name is a.otf, but it can be defined
4. [Figure 2.2: Loading a Trace Log File in Vampir (file open dialog offering the file types OTF trace files (*.otf), EPILOG trace files (*.elg, *.esd), and All files)]
[Figure 2.3: Progress Bar and Cancel Loading Button]
3 Basics
After loading has been completed, the Trace View window title displays the trace file's name, as depicted in Figure. By default, the Charts toolbar and the Zoom Toolbar are available. Furthermore, the default set of charts
5. Solution
To even out this asymmetry, the code that determines the size of the work packages for each process had to be reconsidered. To achieve the desired effect, an improved version of the domain decomposition has been implemented. Figure 6.3 shows that all occurrences of the MICROPHYSICS routine are vertically aligned, and thus balanced. Additionally, the MPI receive routine calls are now clearly shorter than before. Comparing the Function Summary of Figure and Figure 6.3 shows that the relative time spent in MPI receive has decreased and, in turn, the relative time spent inside MICROPHYSICS has increased greatly. This means that we now spend more time computing and less time communicating, which is exactly what we want.
6.2.2 Serial Optimization
Problem
All the displays in Vampir show information only for the time span visible in the Timeline. Thus, the most time-intensive routine of one iteration can be determined by zooming into one or more iterations and looking at the Function Summary: the function with the largest bar takes up the most time. In this example (Figure 6.2), the MICROPHYSICS routine can be identified as the most costly part of an iteration. Therefore, it is a good candidate for gaining speedup through serial optimization techniques.
Solution
In order to get a fine-grained view of the MICROPHYSICS routine's inner workings, we had to trace the program using full function instrumentation. Only then it
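Because every display only reflects the visible time span, the ranking in the Function Summary depends on the zoom interval. The Python sketch below illustrates this under simplifying assumptions: the input is a hypothetical flat list of exclusive (name, start, end) intervals for one process, and all names and timestamps are made up. It is not Vampir's implementation.

```python
from collections import defaultdict

# (function name, start in s, end in s): hypothetical exclusive intervals of one process
Interval = tuple[str, float, float]


def summary_in_window(intervals: list[Interval], t0: float, t1: float) -> dict[str, float]:
    """Accumulate time per function, counting only the part of each interval inside [t0, t1]."""
    totals: dict[str, float] = defaultdict(float)
    for name, start, end in intervals:
        overlap = min(end, t1) - max(start, t0)
        if overlap > 0:
            totals[name] += overlap
    return dict(totals)


def most_costly(intervals: list[Interval], t0: float, t1: float) -> str:
    """Name of the function with the largest bar in the windowed summary."""
    totals = summary_in_window(intervals, t0, t1)
    return max(totals, key=totals.get)


trace = [
    ("INIT", 0.0, 100.0),
    ("MICROPHYSICS", 100.0, 101.5),
    ("MPI_Recv", 101.5, 101.8),
    ("MICROPHYSICS", 101.8, 103.3),
    ("MPI_Recv", 103.3, 103.6),
]
print(most_costly(trace, 100.0, 104.0))  # zoomed into two iterations
print(most_costly(trace, 0.0, 104.0))    # whole run
```

Zoomed into the iterations, MICROPHYSICS has the largest bar; over the whole run, the long hypothetical initialization dominates instead.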
6. [Figure 3.2: Moving and Arranging Charts in the Trace View Window]
The Trace View window can host an arbitrary number of charts. Charts can be
8. It is possible to hide functions and function groups from the displayed information with the context menu entry Filter. To mark the function or function group to be filtered, just click on the associated label or color representation in the chart. Using the Process Filter (see Section 4.4) allows you to restrict this view to a set of processes. As a result, only the time consumed by these processes is displayed for each function group or function. Instead of using the filter, which affects all other displays by hiding processes, it is possible to select a single process via Set Process in the context menu of the Function Summary. This does not have any effect on other timeline displays.
The Function Summary can be shown as a Histogram (a bar chart like in the timeline charts) or as a Pie Chart. To switch between these representations, use the Set Chart Mode entry of the context menu. The shown functions or function groups can be sorted by name or value via the context menu option Sort By.
4.2.3 Process Summary
The Process Summary, shown in Figure 4.9, is similar to the Function Summary but shows the information for every process independently. This is useful for analyzing the balance between processes to reveal bottlenecks. For instance, finding that one process spends a significantly longer time performing the calculations could indicate an unbalanced distribution of
9. That enables a very powerful format with respect to storage size, human readability, and search capabilities on timed event records.
In order to support fast and selective access to large amounts of performance trace data, OTF is based on a stream model, i.e. single separate units representing segments of the overall data. OTF streams may contain multiple independent processes, whereas a process belongs to a single stream exclusively. As shown in Figure 1.1, each stream is represented by multiple files, which store definition records, performance events, status information, and event summaries separately. A single global master file holds the necessary information for the process-to-stream mappings.
Each file name starts with an arbitrary common prefix defined by the user. The master file is always named name.otf. The global definition file is named name.0.def. Events and local definitions are placed in files name.x.events and name.x.defs, where the latter files are optional. Snapshots and statistics are placed in files named name.x.snaps and name.x.stats, which are optional too.
Note: Open the master file (*.otf) to load a trace. When copying, moving, or deleting traces, it is important to take all associated files
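Since a trace is a set of files sharing one prefix, moving it safely means moving the whole set. The Python sketch below collects every file whose name starts with the prefix followed by a dot and copies them together; the helper names and directories are hypothetical, and the prefix matching is a simplification of the naming scheme described above, not an OTF library call.

```python
import shutil
from pathlib import Path


def trace_files(directory: Path, prefix: str) -> list[Path]:
    """Return the master file <prefix>.otf plus every other file named <prefix>.<...>.

    Assumes the naming scheme described above (<prefix>.0.def, <prefix>.x.events,
    and the optional .defs/.snaps/.stats files). Caveat: another trace whose prefix
    itself starts with '<prefix>.' would also be matched.
    """
    master = directory / f"{prefix}.otf"
    if not master.is_file():
        raise FileNotFoundError(master)
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.name.startswith(prefix + ".")
    )


def copy_trace(src_dir: Path, prefix: str, dst_dir: Path) -> list[Path]:
    """Copy all files of one trace into dst_dir so that the set stays complete."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(f, dst_dir / f.name)) for f in trace_files(src_dir, prefix)]
```

For example, copy_trace(Path("traces"), "hello", Path("backup")) would copy hello.otf together with all hello.* stream files, keeping them in one dedicated directory.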
10. been done in a before-and-after fashion to point out what changed by applying the specific improvements.
[Figure 6.1: Master Timeline and Function Summary showing an overview of the program run]
6.2 Identified Problems and Solutions
6.2.1 Computational Imbalance
Problem
As can be seen in Figure 6.2, each occurrence of the MICROPHYSICS routine (purple color) starts at the same time on all processes inside one iteration, but takes between 1.3 and 1.7 seconds to finish. This imbalance leads to idle time in subsequent synchronization calls on processes 1 to 4, because they have to wait for process 0 to finish its work (marked parts in Figure 6.2). This is wasted time that could be used for computational work if all MICROPHYSICS calls had the same duration. Another hint at this synchronization overhead is the fact that the MPI receive routine uses 17.6% of the time of one iteration (Function Summary in Figure 6.2).
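The cost of such an imbalance can be estimated from the per-process durations alone. The Python sketch below assumes that all processes start the phase together and that the following synchronization completes only when the slowest process arrives; the per-rank durations are hypothetical values within the 1.3 s to 1.7 s range, and max/mean - 1 is one common imbalance metric, not necessarily what Vampir reports.

```python
def sync_wait_times(durations: dict[int, float]) -> dict[int, float]:
    """Idle time per rank in the next synchronization.

    Assumes all ranks start the phase together and that the synchronization
    completes only when the slowest rank arrives.
    """
    slowest = max(durations.values())
    return {rank: slowest - d for rank, d in durations.items()}


def imbalance(durations: dict[int, float]) -> float:
    """Load imbalance as max/mean - 1 (0.0 means perfectly balanced)."""
    values = list(durations.values())
    mean = sum(values) / len(values)
    return max(values) / mean - 1.0


# Hypothetical per-rank MICROPHYSICS durations in seconds.
phase = {0: 1.7, 1: 1.3, 2: 1.4, 3: 1.5, 4: 1.3}
waits = sync_wait_times(phase)
print(waits)
print(f"idle time summed over ranks: {sum(waits.values()):.1f} s, imbalance: {imbalance(phase):.0%}")
```

Balancing the work packages drives every wait time, and therefore the imbalance, toward zero, which is what the improved domain decomposition achieves.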
11. chart. The functions take on the color of their function group.
[Figure 4.8: Function Summary]
12. initialization part: processes get started and synchronized, input is read and distributed among these processes, and the preparation of the cloud microphysics (function group MP) is done here. The second half is the iteration part, where the actual weather forecasting takes place.
In a normal weather simulation, this part would be much larger. But in order to keep the recorded trace data and the overhead introduced by tracing as small as possible, only a few iterations have been recorded. This is sufficient, since they all do the same work anyway. Therefore, the simulation has been configured to forecast the weather only 20 seconds into the future. The iteration part consists of two large iterations (Figure 6.1 B and C), each calculating 10 seconds of forecast. Each of these in turn is partitioned into several smaller iterations.
For our observations, we focus on only two of these small inner iterations, since this is the part of the program where most of the time is spent. The initialization work does not increase with increasing forecast duration and in a real-world run takes a relatively small amount of time. The constant part at the beginning of each large iteration takes less than a tenth of the whole iteration time. Therefore, by far the most time is spent in the small iterations, which makes them the best place to look for improvement. All screenshots starting with Figure have
13. installation depends on the operating system. To install Vampir on a Unix machine, place the tarball in an arbitrary directory and unpack it there. On Windows platforms, Vampir comes with an installer, which makes the installation very simple and straightforward: just run the installer and follow the installation wizard. Install Vampir in a directory of your choice; we recommend C:\Program Files. To run the installer in silent (unattended) mode, use the /S option. It is also possible to specify the output directory of the installation with /D=dir. An example of running a silent installation is as follows:
Vampir-7.3.0-Standard-setup-x86.exe /S /D=C:\Program Files
If you want to, you can associate Vampir with OTF trace files (*.otf) during the installation process. The Open Trace Format (OTF) is described in Chapter 1.2. This allows you to load a trace file quickly by double-clicking it. Subsequently, Vampir can be launched by double-clicking its icon or by using the command line interface (see Chapter 2.4).
2.2 Generation of Trace Data on Windows Systems
2.2.1 Enabling Performance Tracing
The generation of trace log files for the Vampir performance visualization tool requires a working monitoring system to be attached to your parallel program. The Event Tracing for Windows (ETW) infrastructure of the Windows client and server operating systems is such a monitor. The Windows HPC Server 2008 version of MS-MPI has built i
14. into account; otherwise, Vampir will treat the whole trace as invalid. Good practice is to keep all files belonging to one trace in a dedicated directory. Detailed information about the Open Trace Format can be found in the OTF documentation (http://www.tu-dresden.de/zih/otf).
1.3 Vampir and Windows HPC Server 2008
The Vampir performance visualization tool usually consists of a performance monitor (VampirTrace) that records performance data and a performance GUI, which is responsible for the graphical representation of the data. In Windows HPC Server 2008, the performance monitor is fully integrated into the operating system, which simplifies its use and provides access to a wide range of system metrics. A simple execution flag controls the generation of performance data. This is very convenient and an important difference from solutions based on explicit source, object, or binary modifications. Windows HPC Server 2008 is shipped with a translator that produces trace log files in Vampir's Open Trace Format (OTF). The resulting files can be visualized very efficiently with the Vampir 7 performance data browser.
2 Getting Started
2.1 Installation of Vampir
Vampir is available on all major platforms, but naturally its
15. manually by setting the environment variable VT_FILE_PREFIX to the desired name.
After a run of an instrumented application, the traces of the single processes need to be unified in terms of timestamps and event IDs. In most cases, this happens automatically. If it is necessary to perform unification of local traces manually, use the following command:
% vtunify <nproc> <prefix>
If VampirTrace was built with support for OpenMP and/or MPI, it is possible to speed up the unification of local traces significantly. To distribute the unification on multiple processes, the MPI-parallel version vtunify-mpi can be used as follows:
% mpirun -np <nranks> vtunify-mpi <nproc> <prefix>
(VampirTrace: http://www.tu-dresden.de/zih/vampirtrace)
2.4 Starting Vampir and Loading a Trace File
Viewing performance data with the Vampir GUI is very easy. On Windows, the tool can be started by double-clicking its desktop icon (if installed) or by using the Start Menu. On a Unix-based machine, run ./vampir in the directory where Vampir is installed. To open a trace file, select Open in the File menu, which provides the file open dialog depicted in Figure 2.2. It is possible to filter the files in the list: the file type input selector determines the visible files. The default, OTF Trace Files (*.otf), shows only files that can be processed by the tool. All file types can be displayed by using All Files. Alterna
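Conceptually, unification maps each process's local function IDs onto one global ID table and brings all local timestamps onto a common time base before merging the streams. The Python sketch below illustrates only that idea; it is not vtunify's algorithm, the dict layout of a local trace is hypothetical, and the constant per-process clock offset is a simplifying assumption.

```python
import heapq


def unify(local_traces: list[dict]) -> tuple[dict[int, str], list[tuple[float, int, int]]]:
    """Merge per-process traces into one time-ordered stream with global IDs.

    Each local trace is a dict with the (hypothetical) keys
      "rank": int, "clock_offset_s": float,
      "defs": {local_id: function_name},
      "events": [(local_time_s, local_id), ...] sorted by time.
    Returns (global_defs, events) with events as (global_time_s, rank, global_id).
    """
    global_ids: dict[str, int] = {}  # function name -> global ID
    streams = []
    for trace in local_traces:
        # The same function name on different processes gets the same global ID.
        to_global = {
            local_id: global_ids.setdefault(name, len(global_ids))
            for local_id, name in trace["defs"].items()
        }
        offset, rank = trace["clock_offset_s"], trace["rank"]
        # A constant per-process offset keeps each stream sorted by time.
        streams.append([(t + offset, rank, to_global[lid]) for t, lid in trace["events"]])
    merged = list(heapq.merge(*streams))  # k-way merge of the sorted streams
    return {gid: name for name, gid in global_ids.items()}, merged
```

Real tools use more elaborate clock synchronization than a single offset per process; the sketch only shows why both the IDs and the timestamps must be rewritten before the local traces can be viewed together.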
16. run of the application using the Microsoft tool mpicsync. Now the eventlog files can be converted into OTF files with the help of the tool etl2otf. The last necessary step is to copy the generated OTF files from the compute nodes into one shared directory. This directory then includes all files needed by the Vampir performance GUI, and the application performance can now be analyzed.
The following commands illustrate the procedure described above and show, as a practical example, how to trace an application on Windows HPC Server 2008. For proper utilization and thus successful tracing, the file system of the cluster needs to meet the following prerequisites:
- \\share\userHome is the shared user directory throughout the cluster
- the MS-MPI executable myApp.exe is available in the shared directory
- \\share\userHome\Trace is the directory where the OTF files are collected
1. Launch the application with tracing enabled (use of the -tracefile option):
mpiexec -wdir \\share\userHome\ -tracefile %USERPROFILE%\trace.etl myApp.exe
- -wdir sets the working directory; myApp.exe has to be there
- %USERPROFILE% translates to the local home directory, e.g. C:\Users\userHome, on each compute node; the eventlog file (.etl) is stored locally in this directory
17. was possible to inspect and measure the subroutines and sub-subroutines of MICROPHYSICS. This way, the most time-consuming subroutines were spotted and could be analyzed for optimization potential. The review showed that there were a couple of small functions that were called very often, so we simply inlined them. With Vampir, you can determine how often a function is called by changing the metric of the Function Summary to the number of invocations. The second inefficiency we discovered was invariant calculations being done inside loops, so we simply moved them in front of their loops.
Figure 6.3 sums up the tuning of the computational imbalance and the serial optimization. In the Timeline, you can see that the duration of the MICROPHYSICS routine is now equal among all processes. Through serial optimization, its duration has been decreased from about 1.5 to 1.0 seconds. A decrease in duration of about 33% is quite good given the simplicity of the changes.
6.2.3 The High Cache Miss Rate
[Screenshot: Timeline with Counter Data Timeline "Values of Counter PAPI_L2_TCM over Time" for Process 0]
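The second fix, moving invariant calculations in front of their loops, is loop-invariant code motion. The weather model's code is not shown in this manual, so the Python sketch below uses a hypothetical function saturation_factor; a small call counter plays the role of the Number of Invocations metric and shows that hoisting keeps the result identical while the invariant is computed only once.

```python
import math


class CallCounter:
    """Counts invocations, similar in spirit to the Number of Invocations metric."""

    def __init__(self) -> None:
        self.calls = 0


def saturation_factor(temperature: float, counter: CallCounter) -> float:
    """Hypothetical loop-invariant computation: depends only on temperature."""
    counter.calls += 1
    return 0.5 * math.exp(-1.0 / temperature)


def scale_before(values: list[float], temperature: float, counter: CallCounter) -> list[float]:
    out = []
    for v in values:
        # The invariant is recomputed on every iteration.
        out.append(v * saturation_factor(temperature, counter))
    return out


def scale_after(values: list[float], temperature: float, counter: CallCounter) -> list[float]:
    # Hoisted in front of the loop: computed once, same result.
    factor = saturation_factor(temperature, counter)
    return [v * factor for v in values]


before, after = CallCounter(), CallCounter()
values = [1.0, 2.0, 3.0, 4.0]
same = scale_before(values, 300.0, before) == scale_after(values, 300.0, after)
print(same, before.calls, after.calls)
```

This prints True 4 1: the same results, with one call instead of one per loop iteration.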
18. [Figure 4.7: Call Tree (with Callers and Callees panes)]
4.2.2 Function Summary
The Function Summary chart (Figure 4.8) gives an overview of the accumulated time consumption across all function groups and functions. For example, every time a process calls the MPI_Send function, the elapsed time of that function is added to the MPI function group time. The chart gives a condensed view of the execution of the application and allows a comparison between the different function groups, so that dominant function groups can be distinguished easily.
It is possible to change the information displayed via the context menu entry Set Metric, which offers values like Average Exclusive Time, Number of Invocations, Accumulated Inclusive Time, and others.
Note: Inclusive means the amount of time spent in a function and all of its subroutines. Exclusive means the amount of time spent in this function only.
The context menu entry Set Event Category specifies whether function groups or functions should be displayed in the
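The difference between inclusive and exclusive time can be computed from nested enter/leave events. The Python sketch below uses a hypothetical event list for one process and sums the exclusive times into function groups, mirroring the MPI_Send example above; the event layout, names, and grouping rule are assumptions, not Vampir's internal format.

```python
from collections import defaultdict

# (timestamp in s, "enter" or "leave", function name) for one process; hypothetical
Event = tuple[float, str, str]


def inclusive_exclusive(events: list[Event]) -> tuple[dict[str, float], dict[str, float]]:
    """Accumulate inclusive and exclusive time per function from nested enter/leave events.

    Recursive calls are counted once per activation, so recursion inflates inclusive time.
    """
    inclusive: dict[str, float] = defaultdict(float)
    exclusive: dict[str, float] = defaultdict(float)
    stack: list[list] = []  # [function, enter time, time spent in direct callees]
    for time, kind, func in events:
        if kind == "enter":
            stack.append([func, time, 0.0])
        elif kind == "leave":
            name, start, callee_time = stack.pop()
            if name != func:
                raise ValueError(f"leave of {func} does not match enter of {name}")
            duration = time - start
            inclusive[name] += duration
            exclusive[name] += duration - callee_time
            if stack:
                stack[-1][2] += duration  # charge this call to the caller's callee time
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return dict(inclusive), dict(exclusive)


def group_totals(per_function: dict[str, float], group_of) -> dict[str, float]:
    """Sum per-function times into function groups."""
    totals: dict[str, float] = defaultdict(float)
    for func, seconds in per_function.items():
        totals[group_of(func)] += seconds
    return dict(totals)


events = [
    (0.0, "enter", "main"),
    (1.0, "enter", "MPI_Send"),
    (3.0, "leave", "MPI_Send"),
    (3.0, "enter", "compute"),
    (7.0, "leave", "compute"),
    (10.0, "leave", "main"),
]
incl, excl = inclusive_exclusive(events)
groups = group_totals(excl, lambda f: "MPI" if f.startswith("MPI_") else "Application")
print(incl, excl, groups)
```

Here main has 10 s inclusive but only 4 s exclusive time, because 6 s are spent in its callees MPI_Send and compute.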
19. [Figure 4.4: Counter Data Timeline ("Values of Counter PAPI_FP_OPS over Time", Process 0)]
An example Counter Data Timeline chart is shown in Figure 4.4. The chart is restricted to one counter at a time: it shows the selected counter for one process. Using multiple instances of the Counter Data Timeline, counters or processes can be compared easily. The context menu entry Set Counter allows you to choose the displayed counter directly from a drop-down list. The entry Set Process selects the particular process for which the counter is shown.
4.1.3 Performance Radar
The Performance Radar chart provides a search for function occurrences in the trace file and an extended visualization of counter data. It can happen that a function is not shown in the Master and Process Timelines due to its short runtime. An alternative to zooming is the option Find Function: a color-coded timeline indicates the intervals in which the function is executed.
[Figure 4.5: Performance Radar Timeline Search ("Occurrences of Function ALL_SUB_K over Time", Processes 0 to 6)]
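One way to make very short calls visible in such a search view is to mark every time bin a call touches, regardless of how small the call is. The Python sketch below illustrates that idea with hypothetical call intervals; it is not a description of how Vampir implements Find Function.

```python
def occurrence_bins(calls: list[tuple[float, float]], t0: float, t1: float, n_bins: int) -> list[bool]:
    """Mark every bin of [t0, t1) that a call touches.

    Even a call far shorter than one bin marks its bin, so short
    occurrences remain visible in the search result.
    """
    width = (t1 - t0) / n_bins
    marked = [False] * n_bins
    for start, end in calls:
        if end < t0 or start >= t1:
            continue  # call lies completely outside the window
        first = int((max(start, t0) - t0) // width)
        last = min(int((min(end, t1) - t0) // width), n_bins - 1)
        for b in range(first, last + 1):
            marked[b] = True
    return marked


# Hypothetical calls of one function on one process: (start_s, end_s)
calls = [(1.25, 1.2500001), (2.5, 4.5), (5.5, 5.6), (12.0, 13.0)]
marks = occurrence_bins(calls, 0.0, 10.0, 10)
print("".join("#" if m else "." for m in marks))
```

The 0.1 µs call still marks bin 1, while the call at 12 s lies outside the window and is ignored.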
20. [Figure 4.10: Message Summary Chart with metric set to Message Transfer Rate, showing the average transfer rate (A) and the minimal/maximal transfer rate (B)]
All values are represented in a bar chart fashion. The number next to each bar is the group base, while the number inside a bar depicts the value of the chosen metric. The Set Metric submenu of the context menu can be used to switch between Aggregated Message Volume, Message Size, Number of Messages, and Message Transfer Rate. The group base can be changed via the context menu entry Group By. It is possible to choose between Message Size, Message Tag, and Communicator (MPI). Note: There will be one bar for every occurr
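The grouping and the metrics can be sketched as follows. The Python code below groups hypothetical message records by size, tag, or communicator and computes count, volume, and average/minimal/maximal transfer rate per group. How Vampir defines the average rate is not stated in this section; the sketch assumes total volume divided by total transfer time of the group, and the Message fields are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Message:
    size_bytes: int
    tag: int
    communicator: str
    duration_s: float  # hypothetical: time from send start to receive completion


def message_summary(messages: list[Message], group_by: str = "size_bytes") -> dict:
    """Group messages by size_bytes, tag, or communicator and compute per-group metrics.

    Assumption: the average rate is total volume / total transfer time of the group.
    """
    groups: dict = defaultdict(list)
    for m in messages:
        groups[getattr(m, group_by)].append(m)
    summary = {}
    for key, msgs in groups.items():
        rates = [m.size_bytes / m.duration_s for m in msgs]
        volume = sum(m.size_bytes for m in msgs)
        summary[key] = {
            "count": len(msgs),
            "volume_bytes": volume,
            "avg_rate": volume / sum(m.duration_s for m in msgs),
            "min_rate": min(rates),
            "max_rate": max(rates),
        }
    return summary


msgs = [
    Message(1_048_576, 1, "MPI_COMM_WORLD", 0.25),
    Message(1_048_576, 2, "MPI_COMM_WORLD", 0.5),
    Message(4_096, 1, "MPI_COMM_WORLD", 0.125),
]
for size, stats in message_summary(msgs).items():
    print(size, stats)
```

Calling message_summary(msgs, group_by="tag") regroups the same records by message tag, analogous to switching Group By in the context menu.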
21. [Figure 3.1: Trace View Window with Charts Toolbar (A) and Zoom Toolbar (B)]
is opened automatically after loading has been finished. The charts can be divided into three groups: timeline, statistical, and informational charts. Timeline charts show detailed event-based information for arbitrary time intervals, while statistical charts reveal accumulated measures that were computed from the corresponding event data. Informational charts provide additional or explanatory information regar
22. [Figure 2.1: MS-MPI Tracing Overview (1. Run myApp with tracing enabled; 2. Time-sync the ETL logs with mpicsync; 3. Convert the ETL logs to OTF; 4. Copy the OTF files to the head node share)]
2. Time-sync the eventlog files throughout all compute nodes:
mpiexec -cores 1 -wdir %USERPROFILE% mpicsync trace.etl
- -cores 1: run only one instance of mpicsync on each compute node
3. Format the eventlog files to OTF files:
mpiexec -cores 1 -wdir %USERPROFILE% etl2otf trace.etl
4. Copy all OTF files from the compute nodes to the trace directory on the share:
mpiexec -cores 1 -wdir %USERPROFILE% cmd /c copy /y *otf* \\share\userHome\Trace
More information about performance tracing of MPI applications can be found in the Microsoft HPC SDK tutorial "Tracing the Execution of MPI Applications with Windows HPC Server 2008" (http://resourcekit.windowshpc.net/MORE_INFO/TracingMPIApplications.html).
2.3 Generation of Trace Data on Linux Systems
The generation of trace files for the Vampir performance visualization tool requires a working monitoring system to be attached to your parallel program. Unlike Windows HPC Server 2008, where the performance monitor is integrated into the operating system, recording performance under Linux is done by a separate performance monitor. We recommend our VampirTrace
23. Chapter 4 Performance Data Visualization
It is possible to profile only one function or function group, or to hide functions and function groups from the displayed information. To mark the function or function group to be profiled or filtered, just click on the associated color representation in the chart; the context menu then offers the possibility to profile or filter via the entries Profile of Selected Function Group or Filter of Selected Function Group. Using the Process Filter (see Section 4.4) allows you to restrict this view to a set of processes.
The context menu entry Sort by allows you to order function profiles by Number of Clusters. This option is only accessible if the chart is clustered; otherwise, function profiles are sorted by process automatically. When profiling one function, you can additionally order functions by length via the context menu entry Sort by Value.
4.2.4 Message Summary
The Message Summary is a statistical chart showing an overview of the different messages grouped by certain characteristics, as shown in Figure 4.10.
24. [Figure 3.7: Docking of a Chart]
When hovering over the blank space between the labels and the graphical representation, a movable separator appears. After clicking a separator decoration, moving the mouse while keeping the left mouse button pressed causes resizing. The whole process is illustrated in Figure 3.8.
3.2 Context Menus
All chart displays have their own context menu with common entries as well as display-specific ones. The following section discusses only the most common entries. A context menu can be accessed by right-clicking in the display window. Common entries are:
- Reset Zoom: Go back to the initial state in horizontal zooming.
- Reset Vertical Zoom: Go back to the initial state in vertical zooming.
- Set Metric: Change the values that should be represented in the chart, e.g.
25. [Screenshot: Trace View with Function Summary, Message Summary, and Communication Matrix View charts and the Apply Global Process Filter dialog]
26. Exclusive Time to Inclusive Time.
• Sort By: Rearrange values or bars by a certain characteristic.

Figure 3.8: Resizing Labels. (A) Hover over a Separator Decoration. (B) Drag and Drop the Separator.

3.3 Zooming

Zooming is a key feature of Vampir. In most charts it is possible to zoom in and out to get abstract and detailed views of the visualized data. In the timeline charts, zooming produces a more detailed view of a particular time interval and therefore reveals new information that could not be seen in the larger section. Short function calls in the Master Timeline may not be visible unless an appropriate zooming level has been reached. If the execution time of these short functions is too short relative to the pixel resolution of your computer display, a shorter time interval must be selected.

Note: Other ch
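Whether a short call is visible is plain arithmetic: a timeline drawn across a fixed number of pixels assigns each pixel the interval length divided by the chart width. The Python sketch below assumes, purely for illustration, that a call becomes distinguishable once it spans at least one pixel; Vampir's actual rendering rules are not described in this manual, and all function names are hypothetical.

```python
def seconds_per_pixel(interval_s: float, width_px: int) -> float:
    """Time span represented by one horizontal pixel of a timeline chart."""
    return interval_s / width_px


def pixels_covered(duration_s: float, interval_s: float, width_px: int) -> float:
    """Horizontal extent, in pixels, of a call lasting duration_s."""
    return duration_s / seconds_per_pixel(interval_s, width_px)


def is_visible(duration_s: float, interval_s: float, width_px: int,
               min_px: float = 1.0) -> bool:
    # Assumption: a call is distinguishable once it spans at least min_px pixels.
    return pixels_covered(duration_s, interval_s, width_px) >= min_px


def max_visible_interval(duration_s: float, width_px: int,
                         min_px: float = 1.0) -> float:
    """Widest zoom interval at which a call of duration_s still spans min_px."""
    return duration_s * width_px / min_px
```

For example, a 2 ms call on a 1000-pixel-wide timeline covers only about 0.2 pixels when a whole 10 s run is shown, but roughly 4 pixels after zooming into a 0.5 s interval.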
27. Vampir 7 User Manual

Copyright 2011 GWT-TUD GmbH
Blasewitzer Str. 43
01307 Dresden, Germany
http://gwtonline.de

Support / Feedback / Bugreports

Please provide us feedback! We are very interested to hear what people like, dislike, or what features they are interested in.

If you experience problems or have suggestions about this application or manual, please contact service@vampir.eu.

When reporting a bug, please include as much detail as possible in order to reproduce it. Please send the version number of your copy of Vampir along with the bug report. The version is stated in the About Vampir dialog, accessible from the main menu under Help → About Vampir.

Please visit http://vampir.eu for updates and new versions.

Manual Version 2011-06-18 / Vampir 7.4

Contents
28. Figure 3.4: A Custom Chart Arrangement in the Trace View Window

Figure 3.5: Closing (right) and Undocking (left) of a Chart

added by clicking on the respective Charts toolbar icon or the corresponding Chart menu entry. With a few more clicks, charts can be combined to
29. This tool experience is now available for HPC systems that are based on Microsoft Windows HPC Server 2008. This new Windows edition of Vampir combines modern scalable event processing techniques with a fully redesigned graphical user interface.

1.1 Event-based Performance Tracing and Profiling

In software analysis, the term profiling refers to the creation of tables that summarize the runtime behavior of programs by means of accumulated performance measurements. Its simplest variant lists all program functions in combination with the number of invocations and the time that was consumed. This type of profiling is also called inclusive profiling, as the time spent in subroutines is included in the statistics computation.

A commonly applied method for analyzing details of parallel program runs is to record so-called trace log files during runtime. The data collection process itself is also referred to as tracing a program. Unlike profiling, the tracing approach records timed application events like function calls and message communication as a combination of timestamp, event type, and event-specific data. This creates a stream of events, which allows very detailed observations of parallel programs. With this technology, synchronization and communication patterns of parallel program runs can be traced and analyzed in terms of performance and correctness. The analysis is usually carried out in a postmortem step, i.e., after completio
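The difference between inclusive time (subroutine time counted, as in the inclusive profiling described above) and exclusive time can be made concrete by replaying a stream of enter/leave events. The Python sketch below assumes a single process, properly nested calls, and a hypothetical (timestamp, kind, function) tuple format; it is not the OTF record layout or Vampir's implementation.

```python
from collections import defaultdict


def profile(events):
    """Replay (timestamp, kind, function) events of one process.

    kind is "enter" or "leave"; events must be time-ordered and properly
    nested. Returns (calls, inclusive, exclusive) dictionaries.
    """
    calls = defaultdict(int)
    inclusive = defaultdict(float)
    exclusive = defaultdict(float)
    stack = []  # open calls: [function, enter_time, time_spent_in_callees]
    for t, kind, func in events:
        if kind == "enter":
            calls[func] += 1
            stack.append([func, t, 0.0])
        elif kind == "leave":
            name, start, callee_time = stack.pop()
            if name != func:
                raise ValueError(f"leave of {func!r} does not match enter of {name!r}")
            duration = t - start
            inclusive[name] += duration                # includes callees
            exclusive[name] += duration - callee_time  # own code only
            if stack:
                stack[-1][2] += duration  # charge this call to its caller
        else:
            raise ValueError(f"unknown event kind {kind!r}")
    return dict(calls), dict(inclusive), dict(exclusive)


events = [
    (0, "enter", "main"),
    (1, "enter", "foo"),
    (4, "leave", "foo"),
    (6, "enter", "foo"),
    (7, "leave", "foo"),
    (10, "leave", "main"),
]
calls, inc, exc = profile(events)
print(calls)  # {'main': 1, 'foo': 2}
print(inc)    # {'foo': 4.0, 'main': 10.0}
print(exc)    # {'foo': 4.0, 'main': 6.0}
```

Here main runs for 10 s in total (inclusive), but only 6 s are spent in its own code (exclusive), because its two calls to foo account for 4 s. For recursive functions this simple sketch counts nested activations of the same function more than once in the inclusive total.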
30. Figure 4.16: Comparison between Context Information

4.4 Information Filtering and Reduction

Due to the large amount of information that can be stored in trace files, it is usually necessary to reduce the displayed information according to some filter criteria. In Vampir, there are different ways of filtering. It is possible to limit the displayed information to a certain choice of processes or to specific types of communication events, e.g., to certain types of messages or collective operations. Deselecting an item in a filter means that this item is fully masked. In Vampir, filters are global; therefore, masked items will no longer show up in any chart. Filtering affects not only the different charts but also the Zoom Toolbar. The different filters can be reached via the Filter entry in the main menu.

Example: Figure 4.17 shows a typical process representation in the Process Filter window. This kind of representation is the same for all other filters. Processes can be filtered by their Process Group, Communicators, and Process Hierarchy. Items to be filtered are arra
31. [Screenshot: Function Summary values, Master Timeline process rows and Function Legend of the trace soc128.otf]
32. a custom chart arrangement as depicted in Figure 3.4. Customized layouts can be saved as described in Chapter 5.3.

Every chart can be undocked or closed by clicking the dedicated icon in its upper right corner, as shown in Figure 3.5. Undocking a chart means freeing it from the current arrangement and presenting it in its own window. To undock or dock a chart, follow Figure 3.6 or Figure 3.7, respectively.
33. arts can be affected when zooming in timeline displays: the interval chosen in a timeline chart such as the Master Timeline or Process Timeline also defines the time interval for the calculation of accumulated measurements in the statistical charts. Statistical charts like the Function Summary provide zooming of statistic values; in these cases, zooming does not affect any other chart. Zooming is disabled in the Pie Chart mode of the Function Summary, reachable via the context menu under Set Chart Mode → Pie Chart.

To zoom into an area, click and hold the left mouse button and select the area as shown in Figure 3.9.

Figure 3.9: Zooming within a Chart

It is possible to zoom horizontally and, in some charts, also vertically. Horizontal zooming in the Master Timeline defines the time interval to be visualized, whereas vertical zooming selects a group of processes to be displayed. To scroll horizontally, move the slider at the bottom or use the mouse wheel. Addit
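The coupling between a timeline zoom and the statistical charts can be pictured as clipping every recorded function interval to the selected window before accumulating time. A minimal Python sketch, assuming flat, non-overlapping (function, start, end) intervals of one process; it illustrates the idea and is not Vampir's internal algorithm.

```python
def time_in_window(intervals, window_start, window_end):
    """Accumulate per-function time inside [window_start, window_end).

    intervals: iterable of (function, start, end) tuples in seconds.
    Intervals partly outside the window contribute only their overlap.
    """
    totals = {}
    for func, start, end in intervals:
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:  # skip intervals that lie completely outside
            totals[func] = totals.get(func, 0) + overlap
    return totals


intervals = [
    ("EWALD_ENERGY", 0, 10),
    ("MPI_Bcast", 10, 12),
    ("EWALD_ENERGY", 12, 20),
]
print(time_in_window(intervals, 5, 15))  # {'EWALD_ENERGY': 8, 'MPI_Bcast': 2}
```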
34. button held to the intended position executes horizontal zooming in all charts.

Note: Instead of dragging boundaries, it is also possible to use the mouse wheel for zooming. Hover over the Zoom Toolbar and scroll up to zoom in and scroll down to zoom out.

Dragging the zoom area changes the section that is displayed without changing the zoom factor. For dragging, click into the highlighted zoom area and drag and drop it to the desired region. Zooming and dragging within the Zoom Toolbar is shown in Figure 3.10. Double-clicking in the Zoom Toolbar reverts to the initial zooming state. The colors represent user-defined groups of functions or activities. Please note that all charts added to the Trace View window will adapt their statistics according to this time interval selection. The Zoom Toolbar can be disabled and enabled with the toolbar's context menu entry Zoom Toolbar.

3.5 The Charts Toolbar

Use the Charts toolbar to open instances of the different charts. By default, it is situated in the upper left corner of the main window, as shown in Figure 3.1. Of course, it is possible to drag and drop it as desired. The Charts toolbar can be disabled with the toolbar's context menu entry Charts.

Table 3.1 shows the different icons representing the charts in the Charts toolbar. The icons are arranged in three groups divided by a small separator. The first group represents timeline charts whose zooming states aff
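The two Zoom Toolbar gestures differ in what they change: the mouse wheel changes the width of the visible interval, while dragging the highlighted area only shifts it. The Python sketch below models this with a hypothetical ZoomWindow class; zooming around the window centre and clamping to the trace bounds are assumptions, since the manual does not specify the exact anchor point or step size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoomWindow:
    """Visible range [start, end] of a trace spanning [trace_start, trace_end]."""
    start: float
    end: float
    trace_start: float
    trace_end: float

    @property
    def width(self) -> float:
        return self.end - self.start

    def zoom(self, factor: float) -> "ZoomWindow":
        """Scale the width around the centre; factor < 1 zooms in, > 1 zooms out."""
        full = self.trace_end - self.trace_start
        width = min(self.width * factor, full)  # never wider than the trace
        centre = (self.start + self.end) / 2
        start = max(self.trace_start, centre - width / 2)
        end = min(self.trace_end, start + width)
        start = end - width  # re-anchor if clamped at the right edge
        return ZoomWindow(start, end, self.trace_start, self.trace_end)

    def pan(self, delta: float) -> "ZoomWindow":
        """Shift the window by delta seconds without changing its width."""
        width = self.width
        start = min(max(self.start + delta, self.trace_start), self.trace_end - width)
        return ZoomWindow(start, start + width, self.trace_start, self.trace_end)


view = ZoomWindow(40.0, 60.0, 0.0, 100.0)
closer = view.zoom(0.5)   # like scrolling up: narrower window
moved = closer.pan(30.0)  # like dragging: same width, new position
print((closer.start, closer.end))  # (45.0, 55.0)
print((moved.start, moved.end))    # (75.0, 85.0)
```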
35. can be increased by a recipient that delays reception for some reason. This will cause the duration to increase by this delay and the message rate, which is the size of the message divided by the duration, to decrease accordingly.

Figure 4.11: Communication Matrix View

4.2.6 I/O Summary

The I/O Summary, shown in Figure 4.12, is a statistical chart giving an overview of the input/output operations recorded in the trace file.

Figure 4.12: I/O Summary

All values are represented in a histogram fashion. The text label indicates the group base, while the number inside each bar represents t
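The rate definition above (message size divided by the time from the start of the transmission to the successful reception) shows directly why a late receiver lowers the rate even on a fast link. A minimal Python sketch, assuming the receiver's delay simply adds to the message duration; the function names are hypothetical.

```python
def message_rate(size_bytes: int, send_start_s: float, recv_end_s: float) -> float:
    """Message rate in bytes/s: message size divided by message duration."""
    duration = recv_end_s - send_start_s
    if duration <= 0:
        raise ValueError("reception must complete after the send starts")
    return size_bytes / duration


def rate_with_receiver_delay(size_bytes: int, send_start_s: float,
                             recv_end_s: float, delay_s: float) -> float:
    """Same message, but the recipient completes reception delay_s seconds later."""
    return message_rate(size_bytes, send_start_s, recv_end_s + delay_s)


print(message_rate(1_048_576, 0.0, 0.5))                   # 2097152.0
print(rate_with_receiver_delay(1_048_576, 0.0, 0.5, 0.5))  # 1048576.0
```

A 0.5 s delay on the receiving side doubles the duration of this 1 MiB message and therefore halves its rate, although the communication path itself is unchanged.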
36. Figure 4.14: A chosen marker (A) and its representation in the Marker View (B)

The Context View may contain several tabs; a new empty one can be added by clicking on the add symbol on the right-hand side. If an object in another chart is selected, its information is displayed in the
37. current tab. If the Context View is closed, it opens automatically at that moment.

The Context View offers a comparison between the information that is displayed in different tabs. Just use the icon on the left-hand side and choose two objects in the dialog that appears. It is possible to compare different elements from different charts, which can be useful in some cases. The comparison shows a list of common properties: the corresponding values are displayed, together with their difference if the values are numbers. The first line always shows the names of the displays.

Figure 4.15: Context View showing context information (B) of a selected function (A)
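The comparison rule described above (common properties only, plus a difference for numeric values) can be sketched in Python. Properties are modelled as plain dictionaries and the difference is taken as value 2 minus value 1; both choices are assumptions for illustration, not Vampir's documented behavior.

```python
from numbers import Number


def _is_number(value) -> bool:
    # bool is a subclass of int in Python; treat it as non-numeric here.
    return isinstance(value, Number) and not isinstance(value, bool)


def compare_contexts(first: dict, second: dict) -> list:
    """Rows of (property, value 1, value 2, difference) for properties
    present in both objects; difference is None for non-numeric values."""
    rows = []
    for key, v1 in first.items():
        if key not in second:
            continue  # only common properties are compared
        v2 = second[key]
        diff = v2 - v1 if _is_number(v1) and _is_number(v2) else None
        rows.append((key, v1, v2, diff))
    return rows


a = {"Display": "Master Timeline", "Function": "MPI_Bcast",
     "Interval Begin": 5.0, "Duration": 0.25}
b = {"Display": "Master Timeline", "Function": "MPI_Bcast",
     "Interval Begin": 5.5, "Duration": 0.5, "Process": 4}
for row in compare_contexts(a, b):
    print(row)
```

The "Process" entry appears only in the second object, so it is left out of the comparison, mirroring the "common properties" rule.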
38. ding timeline and statistical charts. All available charts can be opened with the Charts toolbar, which is explained in Chapter 3.5. In the following section, we will explain the basic functions of the Vampir GUI that are generic to all charts. Feel free to go to Chapter 4 to skip the fundamentals and start directly with the details about the different charts.

3.1 Chart Arrangement

The utility of charts can be increased by correlating them and the information they provide. Vampir supports this mode of operation by allowing multiple charts to be displayed at the same time. Charts that display a sequence of events, such as the Master Timeline and the Process Timeline chart, are aligned vertically. This alignment ensures that the temporal relationship of events is preserved across chart boundaries.

The user can arrange the placement of the charts according to his preferences by dragging them into the desired position. When the left mouse button is pressed while the mouse pointer is located above a placement decoration, the layout engine will give visual clues as to where the chart may be moved. As soon as the user releases the left mouse button, the chart arrangement will be changed according to his intentions. The entire procedure is depicted in Figures 3.2 and [...].

The flexible display architecture furthermore allows increasing or decreasing the screen space that is used by a chart. Charts of particular interest
39. Figure 6.4: Before Tuning: Counter Data Timeline revealing a high amount of L2 cache misses inside the CLIPPING routine (light blue)

Figure 6.5: After Tuning: Visible improvement of the cache usage

Problem: As can be seen in the Counter Data Timeline (Figure 6.4), the CLIPPING routine (light blue) causes a high amount of L2 cache misses. Its duration is also long enough to make it a candidate for inspection. These inefficiencies in cache usage were caused by nested loops that accessed data in a very random, non-linear fashion. Data access can only profit from the cache if subsequent reads access data that are in the vicinity of the previously accessed data.

Solution: After the nested loops were reordered to match the memory order, the tuned version of the CLIPPING routine needs only a fraction of the original time (Figure 6.5).

6.3 Conclusion
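Why the loop order in the CLIPPING example matters for the cache can be shown without timing anything: for an array stored row by row, the innermost loop should run over the index that is contiguous in memory. The Python sketch below only counts how often consecutive accesses touch neighbouring elements of a hypothetical rows × cols array in row-major (C) order; it is not the CLIPPING code, which the manual does not show. Note that Fortran stores arrays column by column, so there the innermost loop should run over the first index instead.

```python
def access_offsets(rows: int, cols: int, inner_over_columns: bool = True) -> list:
    """Flat-buffer offsets visited by a nested loop over a rows x cols array
    stored in row-major (C) order, where element (i, j) lives at i * cols + j."""
    offsets = []
    if inner_over_columns:  # outer loop over i, inner over j: follows memory order
        for i in range(rows):
            for j in range(cols):
                offsets.append(i * cols + j)
    else:  # outer loop over j, inner over i: jumps `cols` elements per step
        for j in range(cols):
            for i in range(rows):
                offsets.append(i * cols + j)
    return offsets


def sequential_fraction(offsets: list) -> float:
    """Fraction of consecutive accesses that move to the very next element."""
    steps = list(zip(offsets, offsets[1:]))
    return sum(1 for a, b in steps if b - a == 1) / len(steps)


print(sequential_fraction(access_offsets(4, 1000, inner_over_columns=True)))   # 1.0
print(sequential_fraction(access_offsets(4, 1000, inner_over_columns=False)))  # 0.0
```

Both loop orders visit every element exactly once; only the order differs, and with it the chance that the next element is already in the cache line just loaded.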
40. Figure 5.3: Saving Policy Settings

In the Saving Behavior dialog, you tell Vampir what to do in the case of changed preferences. The user can choose the categories of settings (e.g., layout) that should be treated. Possible options are that the application automatically saves changes (Always) or never saves them (Never). The default option is to have Vampir ask you whether to save or discard changes.

Usually the settings are stored in the folder of the trace file. If the user has no write access to it, it is possible to place them in the Application Data Folder instead. All such stored settings are listed in the tab Locally Stored Preferences with their creation and modification dates.

Note: On loading, Vampir always favors settings in the Application Data Folder.

Default Preferences offers to save the preferences of the current trace file as default settings; these are then used for trace files without settings. Another option is to restore the default settings, in which case the current preferences of the trace file are reverted.
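The saving and loading rules above can be summarised as two small decisions. The Python sketch below is hypothetical: the SavingPolicy enum, the function names, and the representation of settings as dictionaries are illustrative assumptions, not Vampir's internal API; only the Always/Never/Ask options, the Ask default, and the preference for the Application Data Folder come from the text.

```python
from enum import Enum
from typing import Callable, Optional


class SavingPolicy(Enum):
    ALWAYS = "always"
    NEVER = "never"
    ASK = "ask"  # the manual's default: ask whether to save or discard


def should_save(policy: SavingPolicy, changed: bool,
                ask_user: Callable[[], bool]) -> bool:
    """Decide whether changed preferences of one category are written back."""
    if not changed:
        return False
    if policy is SavingPolicy.ALWAYS:
        return True
    if policy is SavingPolicy.NEVER:
        return False
    return ask_user()  # ASK: defer to the user's answer


def load_settings(app_data: Optional[dict], trace_dir: Optional[dict],
                  defaults: dict) -> dict:
    """Pick the settings to load: the Application Data Folder is favoured,
    then the trace file's folder, then the default preferences."""
    if app_data is not None:
        return app_data
    if trace_dir is not None:
        return trace_dir
    return defaults
```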
41. ect all other charts. The second group consists of statistical charts providing special information and statistics for a chosen interval. Vampir allows multiple instances for charts of these categories. The last group comprises informational charts providing specific textual information or legends. Only one instance of an informational chart can be opened at a time.

Figure 3.10: Zooming and Navigation within the Zoom Toolbar. (A, B) Zooming in/out with the Mouse Wheel. (C) Scrolling by Moving the Highlighted Zoom Area. (D) Zooming by Selecting and Moving a Boundary of the Highlighted Zoom Area.

3.6 Properties of the Trace File

Vampir provides an info dialog containing the most important characteristics of the opened trace file. This dialog
42. he receiver of the message. The corresponding function calls normally reflect a pair of MPI communication directives like MPI_Send and MPI_Recv. It is also possible to show a collective communication like MPI_Allreduce by selecting one corresponding message, as shown in Figure 4.3. Furthermore, additional information like message bursts, markers, and I/O events is available. The following table shows the symbols and descriptions of these objects.

Figure 4.3: Selected MPI Collective in Master Timeline

Message Burst: Due to a lack of pixels, it is not possible to display a large number of messages in a very short time interval. Therefore, outgoing messages are summarized as so-called message bursts. In this representation, you cannot determine which processes receive these messages. Zooming into this interval reveals the corresponding single messages.

Markers (single and multiple): To indicate particular points of interest during the runtime of an application, like errors or warnings, markers can be placed in a trace file. They are drawn as triangles, which are colored according to their types. To illustrate that two or more markers are located at the same pixel, a tricolored triangle is drawn.

I/O Events: Vampir shows detailed information about I/O operations if they are included in the trace file. I/O events are depicted as triangles at the beginning of an I/O interval. Multiple I/O events are
43. he so-called timeline chart. This chart type graphically presents the chain of events of monitored processes or counters on a horizontal time axis. Multiple timeline chart instances can be added to the Trace View window via the Chart menu or the Charts toolbar.

Note: To measure the duration between two events in a timeline chart, Vampir provides a tool called Ruler. To use the Ruler, click on any point of interest in a timeline display and move the mouse while holding both the left mouse button and the Shift key pressed. A ruler-like pattern appears in the current timeline chart, which provides a rough measurement directly. The exact time between the start point and the current mouse position is given in the status bar. If the Shift key is released before the left mouse button, Vampir will proceed with zooming.

4.1.1 Master Timeline and Process Timeline

In the Master Timeline and the Process Timeline, detailed information about functions, communication, and synchronization events is shown. Timeline charts are available for individual processes (Process Timeline) as well as for a collection of processes (Master Timeline). The Master Timeline consists of a collection of rows, each of which represents a single process, as shown in Figure 4.1. A Process Timeline shows the different levels of function calls in a stacked bar chart for a single process, as depicted in Figure 4.2.

Every timeline row consists of a process name on the left and a col
44. he value of the chosen metric. The Set Metric submenu of the context menu can be used to access the available metrics: Number of I/O Operations, Accumulated I/O Transaction Sizes, and all ranges of I/O Operation Size, I/O Transaction Time, or I/O Bandwidth. The I/O operations can be grouped by the characteristics Transaction Size, File Name, and Operation Type. The group base can be changed via the context menu entry Group I/O Operations by.

Note: There will be one bar for every occurring metric. For a quick and convenient overview, it is also possible to show minimum, maximum, and average values for the metrics Transaction Size Range of I/O Operations, Time Range of I/O Operations, and Bandwidth Range of I/O Operations all at once. The minimum and maximum values are shown in an additional smaller bar beneath the bar indicating the average value. The additional bar starts at the minimum and ends at the maximum value of the metric.

To select which I/O operation types should be considered for the statistic calculation, use the Set I/O Operations submenu of the context menu. Possible options are Read, Write, Read/Write, and Apply Global I/O Operations Filter (including all selected operation types from the I/O Events filter dialog, see Chapter 4.4).

4.3 Informational Charts

4.3.1 Function Legend

The Functi
45. ing group. However, if the metric is set to Message Transfer Rate, the minimal and the maximal transfer rates are given in an additional bar beneath the one showing the average transfer rate. The additional bar starts at the minimal rate and ends at the maximal one. To filter out messages, click on the associated label or color representation in the chart and choose Filter from the context menu.

4.2.5 Communication Matrix View

The Communication Matrix View is another way of analyzing communication imbalances. It shows information about messages sent between processes. The chart, as shown in Figure 4.11, is laid out as a table: its rows represent the sending processes, whereas the columns represent the receivers. The color legend on the right indicates the displayed values and changes depending on the displayed information.

It is possible to change the type of displayed values. Different metrics, like the average duration of messages passed from sender to recipient or the minimum and maximum bandwidth, are offered. To change the type of value that is displayed, use the context menu option Set Metric. Use the Process Filter to define which processes or groups should be displayed (see Section 4.4).

Note: A high duration is not automatically caused by a slow communication path between two processes, but can also be due to the fact that the time between starting transmission and successful reception of the message
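The layout of the Communication Matrix View (senders as rows, receivers as columns, one selectable metric per cell) can be sketched as a grouping step over recorded messages. The Python code below assumes a hypothetical (sender, receiver, size_bytes, duration_s) tuple per message with strictly positive durations; the metric names are illustrative, not Vampir's menu labels.

```python
from collections import defaultdict


def communication_matrix(messages, metric="avg_duration"):
    """Return {sender: {receiver: value}}; rows are senders, columns receivers.

    messages: iterable of (sender, receiver, size_bytes, duration_s) tuples
    with duration_s > 0. metric: "avg_duration", "min_bandwidth" or
    "max_bandwidth" (bandwidth = size / duration, in bytes per second).
    """
    pairs = defaultdict(list)
    for sender, receiver, size, duration in messages:
        pairs[(sender, receiver)].append((size, duration))

    matrix = defaultdict(dict)
    for (sender, receiver), items in pairs.items():
        durations = [d for _, d in items]
        bandwidths = [s / d for s, d in items]
        if metric == "avg_duration":
            value = sum(durations) / len(durations)
        elif metric == "min_bandwidth":
            value = min(bandwidths)
        elif metric == "max_bandwidth":
            value = max(bandwidths)
        else:
            raise ValueError(f"unknown metric {metric!r}")
        matrix[sender][receiver] = value
    return dict(matrix)


messages = [(0, 1, 1000, 0.5), (0, 1, 1000, 1.5), (1, 0, 4000, 2.0)]
print(communication_matrix(messages))                   # {0: {1: 1.0}, 1: {0: 2.0}}
print(communication_matrix(messages, "max_bandwidth"))  # {0: {1: 2000.0}, 1: {0: 2000.0}}
```

Keeping the per-pair message list around is what makes switching the metric cheap, much like choosing a different value via Set Metric without re-reading the trace.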
46. ionally, the zoom can be accessed with the help of the Zoom Toolbar by dragging the borders of the selection rectangle or scrolling the mouse wheel, as described in Chapter 3.4.

To return to the previous zooming state, use the global Undo, which can be found in the Edit menu. Alternatively, press Ctrl+Z to revert the last zoom. Accordingly, a zooming action can be repeated by selecting Redo in the Edit menu or pressing Ctrl+Shift+Z. Both functions work independently of the current mouse position. Next to Undo and Redo, Vampir shows which kind of action in which display can be undone or redone, respectively.

To get back to the initial zooming state quickly, select Reset Horizontal Zoom or Reset Vertical Zoom (see Section 3.2) in the context menu of the desired timeline display. Resetting the zoom is also an action that can be reverted by Undo.

3.4 The Zoom Toolbar

Vampir provides a Zoom Toolbar that can be used for zooming and navigation in the trace data. It is situated in the upper right corner of the Trace View window, as shown in Figure 3.1. Of course, it is possible to drag and drop it as desired. The Zoom Toolbar offers an overview of the data displayed in the corresponding charts. The current zoomed area is highlighted as a rectangle within the Zoom Toolbar. Clicking on one of the two boundaries and moving it with left mouse
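Undo and Redo for zoom actions behave like two stacks of earlier and later zoom states. A minimal Python sketch, assuming that a new zoom action discards the redo history, as is common in undo systems; the manual does not state this, and the ZoomHistory class is hypothetical. Resetting the zoom is modelled as an ordinary action, which is why it can be undone.

```python
class ZoomHistory:
    """Undo/redo history of zoom states, e.g. (start, end) intervals."""

    def __init__(self, initial):
        self.current = initial
        self._undo = []  # earlier states, most recent last
        self._redo = []  # undone states, most recent last

    def apply(self, state):
        """Record a new zoom (or reset) action."""
        self._undo.append(self.current)
        self.current = state
        self._redo.clear()  # assumption: a new action discards the redo branch

    def undo(self):
        if self._undo:
            self._redo.append(self.current)
            self.current = self._undo.pop()
        return self.current

    def redo(self):
        if self._redo:
            self._undo.append(self.current)
            self.current = self._redo.pop()
        return self.current


history = ZoomHistory((0.0, 100.0))
history.apply((20.0, 40.0))  # zoom in
history.apply((25.0, 30.0))  # zoom in further
history.undo()               # back to (20.0, 40.0)
history.apply((0.0, 100.0))  # Reset Horizontal Zoom, itself undoable
print(history.undo())        # (20.0, 40.0)
```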
47. is called Trace Properties and can be accessed via File → Get Info. The information originates from the trace file and includes details such as the filename, the creator, and the OTF version.

Description
Master Timeline (Section 4.1.1)
Process Timeline (Section 4.1.1)
Counter Data Timeline (Section 4.1.2)
Performance Radar (Section 4.1.3)
Function Summary (Section 4.2.2)
Message Summary (Section 4.2.4)
Process Summary (Section 4.2.3)
Communication Matrix View (Section 4.2.5)
I/O Summary (Section 4.2.6)
Call Tree (Section 4.2.1)
Function Legend (Section 4.3.1)
Context View (Section 4.3.3)
Marker View (Section [...])

Table 3.1: Icons of the Toolbar

4 Performance Data Visualization

This chapter deals with the different charts that can be used to analyze the behavior of a program and to compare different function groups, e.g., MPI and Calculation. Communication performance issues are covered in this chapter as well: various charts address the visualization of data transfers between processes. The following sections describe them in detail.

4.1 Timeline Charts

A very common chart type used in event-based performance analysis is t
48. Figure 3.6: Undocking of a Chart

Considering that labels (e.g., those showing names or values of functions) often need more space to show their whole text, there is a further form of resizing. In order to read labels completely, it is possible to resize the distribution of space between the labels and the graphical representation in a chart.
49. lor blindness.

5.2 Appearance

In the Appearance settings of the Preferences dialog, there are six different objects for which the color options can be changed: the functions, function groups, markers, counters, collectives, messages, and I/O events. Choose an entry and click on its color to make a modification. A color picker dialog opens where it is possible to adjust the color. For messages and collectives, the line width can also be changed. In order to quickly find the desired item, a search box is provided at the bottom of the dialog.

Figure 5.2: Appearance Settings

5.3 Saving Policy

Vampir detects whenever changes to the various settings are made. In the Saving Policy dialog, it is possible to adjust the saving behavior of the different components to your own needs.
50. [Figure 6.2: Before Tuning. Master Timeline and Function Summary identifying MICROPHYSICS (purple color) as predominant and unbalanced]
[Figure 6.3: After Tuning. Timeline and Function Summary showing an improvement in communication behavior]
51. …may get more space in order to render information in more detail.
52. …monitoring facility, which is available as Open Source software. During a program run of an application, VampirTrace generates an OTF trace file which can be analyzed and visualized by Vampir. The VampirTrace library allows MPI communication events of a parallel program to be recorded in a trace file. Additionally, certain program-specific events can also be included. To record MPI communication events, simply relink the program with the VampirTrace library. A new compilation of the program source code is only necessary if program-specific events should be added. Detailed information on the installation and usage of VampirTrace can be found in the VampirTrace User Manual.
2.3.1 Enabling Performance Tracing
To perform measurements with VampirTrace, the application program needs to be instrumented. Although VampirTrace handles this automatically by default, manual instrumentation is also possible. All the necessary instrumentation of user functions, MPI and OpenMP events is handled by the compiler wrappers of VampirTrace (vtcc, vtcxx, vtf77, vtf90 and, in Open MPI 1.3, the additional wrappers mpicc-vt, mpicxx-vt, mpif77-vt and mpif90-vt). All compile and link commands in the used makefile should be replaced by the VampirTrace compiler wrapper, which performs the necessary instrumentation of the program and links the suitable VampirTrace library. Automatic instrumentation is the most convenient method to instrument your pro…
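To illustrate replacing compile and link commands, the following Python sketch swaps the compiler at the start of a command line for a VampirTrace wrapper. The wrapper names are the ones listed above. The pairing of plain compiler names (cc, c++, f77, f90) with vtcc, vtcxx, vtf77 and vtf90, and the helper name instrument_command, are assumptions made for this example. Check the VampirTrace User Manual for how the wrappers select the underlying compiler on your system.

```python
import shlex

# Assumed pairing of plain compiler commands with the VampirTrace wrappers.
# Only the wrapper names (right-hand side) come from the manual.
WRAPPERS = {
    "cc": "vtcc",
    "c++": "vtcxx",
    "f77": "vtf77",
    "f90": "vtf90",
    # Additional wrappers available with Open MPI 1.3
    "mpicc": "mpicc-vt",
    "mpicxx": "mpicxx-vt",
    "mpif77": "mpif77-vt",
    "mpif90": "mpif90-vt",
}


def instrument_command(command: str) -> str:
    """Replace the compiler of a compile/link command by its wrapper.

    Commands that do not start with a known compiler are returned unchanged
    (apart from shell-style re-quoting by shlex.join).
    """
    words = shlex.split(command)
    if words and words[0] in WRAPPERS:
        words[0] = WRAPPERS[words[0]]
    return shlex.join(words)


if __name__ == "__main__":
    for line in ["mpicc -O2 -c solver.c", "f90 -o model model.f90", "ld -o app main.o"]:
        print(instrument_command(line))
```

For example, "mpicc -O2 -c solver.c" becomes "mpicc-vt -O2 -c solver.c", while a command such as "ld -o app main.o" is left as it is.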
53. …n Legend
Information Filtering and Reduction
5 Customization
5.1 General Preferences
5.2 Appearance
5.3 Saving Policy
6.1 Introduction
6.2 Identified Problems and Solutions
6.2.1 Computational Imbalance
6.2.2 Serial Optimization
6.2.3 The High Cache Miss Rate
6.3 Conclusion
1 Introduction
Performance optimization is a key issue for the development of efficient parallel software applications. Vampir provides a manageable framework for analysis, which enables developers to quickly display program behavior at any level of detail. Detailed performance data obtained from a parallel program execution can be analyzed with a collection of different performance views. Intuitive navigation and zooming are the key features of the tool, which help to quickly identify inefficient or faulty parts of a program code. Vampir implements optimized event analysis algorithms and customizable displays which enable a fast and interactive rendering of very complex performance monitoring data. Ultra-large data volumes can be analyzed with a parallel version of Vampir, which is available on request. Vampir has a product history of more than 15 years and is well established on Unix-based HPC systems.
54. …n of the program. Needless to say, program traces can also be used to calculate the profiles mentioned above. Computing profiles from trace data allows arbitrary time intervals and process groups to be specified, in contrast to fixed profiles accumulated during runtime.
1.2 The Open Trace Format (OTF)
The Open Trace Format (OTF) was designed as a well-defined trace format with open, public domain libraries for writing and reading. This open specification of the trace information enables analysis and visualization tools like Vampir to operate efficiently at large scale. The format addresses large applications written in an arbitrary combination of Fortran 77, Fortran (90/95/etc.), C and C++.
[Figure 1.1: Representation of Streams by Multiple Files. A master control file (name.otf) and global definitions (name.0.def), plus per-stream files for local definitions, events (name.x.events), statistics (name.x.stats) and snapshots (name.x.snaps)]
OTF uses a special ASCII data representation to encode its data items, with numbers and tokens in hexadecimal code without special prefixes.
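Computing a profile from trace data for an arbitrary time interval and process group can be sketched in Python as follows. The sketch assumes the trace has already been reduced to non-overlapping (exclusive) function segments per process. The Segment record, the function name profile and the example data are hypothetical and are not part of OTF or Vampir.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, NamedTuple


class Segment(NamedTuple):
    """One exclusive function interval of one process (hypothetical layout)."""
    process: int
    function: str
    begin: float  # seconds
    end: float    # seconds


def profile(segments: Iterable[Segment], t0: float, t1: float,
            processes: set[int]) -> dict[str, float]:
    """Sum the time per function inside [t0, t1] for the selected processes."""
    totals: dict[str, float] = defaultdict(float)
    for seg in segments:
        if seg.process not in processes:
            continue  # process is outside the chosen process group
        # Clip the segment to the time window; skip it if nothing remains.
        overlap = min(seg.end, t1) - max(seg.begin, t0)
        if overlap > 0:
            totals[seg.function] += overlap
    return dict(totals)


# Made-up example trace for two processes.
trace = [
    Segment(0, "MICROPHYSICS", 0.0, 2.0),
    Segment(0, "MPI_Wait", 2.0, 3.0),
    Segment(1, "MICROPHYSICS", 0.0, 1.0),
    Segment(1, "MPI_Wait", 1.0, 3.0),
]
print(profile(trace, 1.5, 3.0, {0}))
```

Restricting the window to 1.5 s to 3 s on process 0 gives 0.5 s for MICROPHYSICS and 1.0 s for MPI_Wait. The same segments yield a different profile for any other window or process set, which a profile accumulated at runtime cannot offer.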
55. …n support for this monitor. It enables application developers to quickly produce traces in production environments by simply adding an extra mpiexec flag (-trace). To trace an application, the user account is required to be a member of the Administrator or Performance Log Users groups. No special builds or administrative privileges are necessary. The cluster administrator will only have to add the Performance Log Users group to the head node's Users group if you want to use this group for tracing. Trace files will be generated during the execution of your application. The recorded trace log files include the following events: any MS-MPI application call and low-level communication within sockets, shared memory and NetworkDirect implementations. Each event includes a high-precision CPU clock timer for precise visualization and analysis.
2.2.2 Tracing an MPI Application
The steps necessary for monitoring the MPI performance of an MS-MPI application are depicted in Figure ??. First, the application needs to be available throughout all compute nodes in the cluster and has to be started with tracing enabled. The Event Tracing for Windows (ETW) infrastructure writes event logs (.etl files) containing the respective MPI events of the application on each compute node. To achieve consistent event data across all compute nodes, clock corrections need to be applied. This step is performed after the successful…
56. [Figure 5.1: General Settings]
"Show time as" decides whether the time format for the trace analysis is based on seconds or ticks. With the "Automatically open context view" option disabled, Vampir does not open the Context View after the selection of an item, like a message or function. "Use color gradient in charts" allows switching off the color gradient used in the performance charts. The next option changes the style and size of the font. "Show source code" makes it possible to open an editor showing the respective source file. To open a source file, first click on the intended function in the Master Timeline and then on the source code path in the Context View. For the source code location to work properly, you need a trace file with source code location support. The path to the source file can be adjusted in the Preferences dialog, and a limit for the size of the source file can be set too. In the Analysis section, the number of analysis threads can be chosen. If this option is disabled, Vampir determines the number automatically from the number of cores, e.g. two analysis threads on a dual-core machine. In the Updates section, the user can decide whether Vampir should check automatically for new versions. It is also possible to use Vampir with support for co…
57. …nged in a spreadsheet representation. In addition to selecting or deselecting an entire group of processes, it is also possible to filter single processes.
[Figure 4.17: Process Filter]
Different selection methods can be used in a filter. The check box "Include/Exclude All" either selects or deselects every item. Specific items can be selected or deselected by clicking into the check box next to them. Furthermore, multiple items can be selected or deselected at once: mark the desired entries by clicking their names while holding either the Shift or the Ctrl key. Holding the Shift key marks every item between the two clicked items. Holding the Ctrl key, on the other hand, lets you add specific items to, or remove them from, the marked ones. Clicking into the check box of one of the marked entries selects or deselects all of them.
Filter Object: Filter Criteria
Processes: Process Groups, Communicators, Process Hierarchy
Collective Operations: Communicators, Collective Operations
Messages: Message Communicator…
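The Shift and Ctrl marking behavior described above can be modeled with a small Python class. This is a sketch of the described semantics, not Vampir's implementation. The class name FilterList, the choice that every item starts included, and the rule that a Shift click extends from the last plain or Ctrl click are assumptions.

```python
from __future__ import annotations


class FilterList:
    """Sketch of the marking and check box behavior of a filter list."""

    def __init__(self, items: list[str]) -> None:
        self.items = items
        self.checked: set[str] = set(items)  # assumption: all items start included
        self.marked: set[str] = set()
        self.anchor: int | None = None       # index of the last plain or Ctrl click

    def click(self, item: str, shift: bool = False, ctrl: bool = False) -> None:
        """Click on an item's name, optionally holding Shift or Ctrl."""
        index = self.items.index(item)
        if shift and self.anchor is not None:
            # Shift: mark every item between the two clicked items.
            low, high = sorted((self.anchor, index))
            self.marked = set(self.items[low:high + 1])
        elif ctrl:
            # Ctrl: add the item to, or remove it from, the marked ones.
            self.marked ^= {item}
            self.anchor = index
        else:
            # Plain click: mark only this item.
            self.marked = {item}
            self.anchor = index

    def toggle_checkbox(self, item: str) -> None:
        """Click an item's check box; all marked items change together."""
        targets = self.marked if item in self.marked else {item}
        if item in self.checked:
            self.checked -= targets
        else:
            self.checked |= targets

    def include_exclude_all(self, include: bool) -> None:
        """The "Include/Exclude All" check box."""
        self.checked = set(self.items) if include else set()


processes = FilterList(["Process 0", "Process 1", "Process 2", "Process 3"])
processes.click("Process 1")
processes.click("Process 3", shift=True)   # marks Process 1 to Process 3
processes.toggle_checkbox("Process 2")     # excludes all three marked processes
print(sorted(processes.checked))
```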
58. …of Functions. By default, the Performance Radar shows the values of one counter for each process, as shown in Figure ??. In this mode, the user can choose between Line Plot and Color Coded drawing. In the latter case, a color scale at the bottom indicates the range of values. Clicking on "Set Counter" opens a dialog for choosing another counter and for calculating the sum or average values. Summarizing means that the values of the selected counter of all processes are summed up. The average is this sum divided by the number of processes. Both options provide a single graph.
[Figure 4.6: Performance Radar, Timeline Visualization of Counters]
4.2 Statistical Charts
4.2.1 Call Tree
The Call Tree, depicted in Figure 4.7, illustrates the invocation hierarchy of all monitored functions in a tree representation. The display reveals information about the number of invocations of a given function, the time spent in the different calls and the caller-callee relationship. The entries of the Call Tree can be s…
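Summing or averaging a counter over all processes, as offered by the Set Counter dialog of the Performance Radar, reduces to element-wise arithmetic. The Python sketch below assumes that every process was sampled at the same points in time, so the i-th samples of all processes line up. Real counter samples are not necessarily aligned like this and would first need to be resampled. The function name combine_counter and the sample values are hypothetical.

```python
from __future__ import annotations


def combine_counter(values_per_process: dict[str, list[float]], mode: str) -> list[float]:
    """Collapse per-process counter samples into a single graph.

    Assumes all processes were sampled at the same timestamps, so the i-th
    sample of every process belongs to the same point in time. zip() stops
    at the shortest series if the lengths differ.
    """
    series = list(values_per_process.values())
    if not series:
        raise ValueError("no processes selected")
    sums = [sum(samples) for samples in zip(*series)]
    if mode == "sum":
        return sums
    if mode == "average":
        # The average is the sum divided by the number of processes.
        return [total / len(series) for total in sums]
    raise ValueError(f"unknown mode: {mode!r}")


samples = {"Process 0": [1.0, 4.0], "Process 1": [3.0, 2.0]}
print(combine_counter(samples, "sum"), combine_counter(samples, "average"))
```

With these two processes, "sum" gives [4.0, 6.0] and "average" gives [2.0, 3.0].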
59. …on Legend lists all visible function groups of the loaded trace file along with their corresponding colors. If colors of functions are changed, they also appear in a tree-like fashion under their respective function group (see Figure 4.13).
4.3.2 Marker View
The Marker View lists all marker events included in the trace file. The display is arranged in a tree-like fashion and organizes the marker events into their respective groups and types. Additional information, like the time of occurrence in the trace file and its description, is provided for each marker.
[Figure 4.13: Function Legend]
By clicking on a marker event in the Marker View, this event gets selected in the timeline displays that are currently open, and vice versa. If this marker event is not visible, the zooming area jumps to this event automaticall…
60. …ored sequence of function calls or program phases on the right.
[Figure 4.1: Master Timeline]
The color of a function is defined by its group membership, e.g. MPI_Send, belonging to the function group MPI, has the same color (presumably red) as MPI_Recv, which also belongs to the function group MPI. Clicking on a function highlights it and causes the Context View display to show detailed information about that particular function, e.g. its corresponding function group name, time interval and the complete name. The Context View display is explained in Section 4.3.3. Some function invocations are very short and therefore do not show up in the overall view due to a lack of display pixels. A zooming mechanism is provided to inspect a specific time interval in more detail; for further information see Section 3.3. If zooming is performed, panning in horizontal direc…
61. …orted in various ways. Simply click on one header of the tree representation to re-sort the Call Tree by that characteristic. Please note that not all available characteristics are enabled by default. To add or remove characteristics, a context menu is provided, accessible by right-clicking on any of the tree headers. To leaf through the different function calls, the levels of the tree can be folded and unfolded, either by double-clicking a level or by using the fold level buttons next to the function name. Functions can be called by many different caller functions, which is hardly obvious in the tree representation. Therefore, a relation view shows all callers and callees of the currently selected function in two separate lists, as shown in the lower area of Figure 4.7. To find a certain function by its name, Vampir provides a search option accessible with the context menu entry "Show Find View". The entered keyword has to be confirmed by pressing the Return key. The Previous and Next buttons can then be used to flip through the results.
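The relation view can be thought of as two indexes built from caller/callee pairs. The Python sketch below builds them from a list of (caller, callee) pairs. The input format, the function name relation_view and the example function names are hypothetical, and the sketch does not reflect how Vampir stores its call tree.

```python
from __future__ import annotations

from collections import defaultdict


def relation_view(function: str, call_edges: list[tuple[str, str]]) -> tuple[list[str], list[str]]:
    """Return (callers, callees) of one function from (caller, callee) pairs."""
    callers: dict[str, set[str]] = defaultdict(set)
    callees: dict[str, set[str]] = defaultdict(set)
    for caller, callee in call_edges:
        callers[callee].add(caller)
        callees[caller].add(callee)
    # Two separate, sorted lists, like the two lists of the relation view.
    return sorted(callers[function]), sorted(callees[function])


# Hypothetical call pairs observed in a trace.
edges = [
    ("main", "step"),
    ("step", "halo_exchange"),
    ("main", "halo_exchange"),
    ("halo_exchange", "MPI_Wait"),
    ("step", "MPI_Wait"),
]
print(relation_view("halo_exchange", edges))
```

Here halo_exchange is reached from two different callers, main and step, which is exactly the situation that is hard to see in a pure tree representation.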
62. …s, Message Tags
I/O Events: I/O Groups, File Names, Operation Types
Table 4.2: Options of Filtering
5 Customization
The appearance of the trace file and various other application settings can be altered in the preferences, accessible via the main menu entry File > Preferences. Settings concerning the trace file itself, e.g. layout or function group colors, are saved individually next to the trace file in a file with the ending .vsettings. This way, the colors of individual trace files can be adjusted without interfering with others. The options "Import Preferences" and "Export Preferences" load and save the preferences of arbitrary trace files.
5.1 General Preferences
The General settings allow changing application-specific and trace-specific values.
63. …tents
1.1 Event-based Performance Tracing and Profiling
1.2 The Open Trace Format (OTF)
1.3 Vampir and Windows HPC Server 2008
2 Getting Started
2.1 Installation of Vampir
2.2 Generation of Trace Data on Windows Systems
2.2.1 Enabling Performance Tracing
2.2.2 Tracing an MPI Application
2.3 Generation of Trace Data on Linux Systems
2.3.1 Enabling Performance Tracing
2.3.2 Tracing an Application
2.4 Starting Vampir and Loading a Trace File
3 Basics
3.1 Chart Arrangement
…
3.6 Properties of the Trace File
…
4.1 Timeline Charts
4.1.1 Master Timeline and Process Timeline
4.1.2 Counter Data Timeline
4.1.3 Performance Radar
4.2 Statistical Charts
4.2.1 Call Tree
…
4.2.6 I/O Summary
4.3 Informational Charts
4.3.1 Functio…
64. …tion is possible with the scroll bar at the bottom. The Process Timeline resembles the Master Timeline with slight differences. The chart's timeline is divided into levels, which represent the different call stack levels of function calls. The initial function begins at the first level, a sub-function called by that function is located a level beneath, and so forth. If a sub-function returns to its caller, the graphical representation also returns to the level above.
[Figure 4.2: Process Timeline]
In addition to the display of categorized function invocations, Vampir's Master and Process Timeline also provide information about communication events. Messages exchanged between two different processes are depicted as black lines. In timeline charts, the progress in time is reproduced from left to right. The leftmost starting point of a message line and its underlying process bar therefore identify the sender of the message, whereas the rightmost position of the same line represents t…
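The level structure of the Process Timeline follows directly from the call stack. The Python sketch below replays one process's enter and leave events, assumed to be given as time-ordered (kind, function, time) tuples (a hypothetical format, not OTF records), and assigns each invocation its level, with level 1 for the initial function. The function name stack_levels and the example events are made up.

```python
from __future__ import annotations


def stack_levels(events: list[tuple[str, str, float]]) -> list[tuple[str, int, float, float]]:
    """Assign each function invocation of one process its call-stack level.

    events: time-ordered ("enter" | "leave", function, time) tuples
    (a hypothetical format). Returns (function, level, begin, end) tuples
    sorted by begin time; level 1 is the initial function.
    """
    stack: list[tuple[str, float]] = []  # currently open invocations
    result = []
    for kind, function, time in events:
        if kind == "enter":
            stack.append((function, time))
        elif kind == "leave":
            name, begin = stack.pop()
            if name != function:
                raise ValueError(f"leave of {function!r} does not match enter of {name!r}")
            # After popping, the stack holds exactly the callers of this call.
            result.append((function, len(stack) + 1, begin, time))
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return sorted(result, key=lambda entry: entry[2])


events = [
    ("enter", "main", 0), ("enter", "solve", 1), ("enter", "MPI_Send", 2),
    ("leave", "MPI_Send", 3), ("leave", "solve", 4),
    ("enter", "output", 5), ("leave", "output", 6), ("leave", "main", 7),
]
for function, level, begin, end in stack_levels(events):
    print(f"{'  ' * (level - 1)}{function} (level {level}, {begin}-{end} s)")
```

In this example main sits on level 1, solve and output on level 2 and MPI_Send on level 3; when solve returns, the next call (output) is drawn one level up again.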
65. …tively, on Windows a command line invocation is possible:
C:\Program Files\Vampir\Vampir.exe [trace file]
To open multiple trace files at once, pass them one after another as command line arguments:
C:\Program Files\Vampir\Vampir.exe [file 1] ... [file n]
It is also possible to start the application by double-clicking on an .otf file, if Vampir was associated with .otf files during the installation process.
The trace files to be loaded have to be compliant with the Open Trace Format (OTF) standard described in Chapter 1.2. Microsoft HPC Server 2008 is shipped with the translator program etl2otf.exe, which produces appropriate input files.
While Vampir is loading the trace file, an empty Trace View window with a progress bar at the bottom opens. After Vampir has loaded the trace data completely, a default set of charts will appear. The loading process can be interrupted at any point in time by clicking on the cancel button in the lower right corner, as shown in Figure 2.3. The GUI will still open, but because events in the trace file are traversed one after another, it shows only the earliest information from the trace file. For huge trace files with performance problems assumed to be at the beginning, this procedure is a suitable strategy to save time.
Basic functionality and navigation elements are described in Chapter 3. The available charts and the information provided by them are explained in Chapter 4.
66. …tricolored and occupy a line to the end of the interval. To see the whole interval of a single I/O event, the triangle has to be selected. In that case, a second triangle appears at the end of the interval.
Table 4.1: Additional Information in Master and Process Timeline
Since the Process Timeline reveals information about one process only, short black arrows are used to indicate outgoing communication. Clicking on message lines or arrows shows message details like sender process, receiver process, message length, message duration and message tag in the Context View display.
4.1.2 Counter Data Timeline
Counters are values collected over time to count certain events like floating point operations or cache misses. Counter values can be used to store not just hardware performance counters but arbitrary sample values. There can be counters for other statistical information as well, for instance counting the number of function calls or a value in an iterative approximation of the final result. Counters are defined during the instrumentation of the application and can be individually assigned to processes.
67. …work and therefore can slow down the whole application.
[Figure 4.9: Process Summary]
The context menu entry "Set Event Category" specifies whether function groups or functions should be displayed in the chart. The functions take on the color of their function group. The chart can base its analysis on Number of Invocations, Accumulated Inclusive Time or Accumulated Exclusive Time. To switch between these three modes, use the context menu entry "Set Metric". By default, the number of clustered profile bars depends on the window height. You can also disable the clustering or set a fixed number of clusters via the context menu entry "Clustering" by selecting the corresponding value in the spin box. To the left of the clustered profile bars there is an overview of the processes associated with each cluster. Moving the cursor over the blue areas of the rectangle shows the process name as a tooltip.
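The three metrics selectable via Set Metric can be derived from nested enter and leave events: inclusive time is the whole duration of an invocation, while exclusive time subtracts the time spent in directly called functions. The Python sketch below assumes a hypothetical, time-ordered list of (kind, function, time) tuples for one process. It illustrates the definitions only and says nothing about how Vampir computes them internally. The function name function_metrics is made up.

```python
from __future__ import annotations

from collections import Counter


def function_metrics(events: list[tuple[str, str, float]]) -> dict[str, dict[str, float]]:
    """Number of invocations, accumulated inclusive and exclusive time per function.

    events: time-ordered ("enter" | "leave", function, time) tuples of one
    process (hypothetical format), properly nested.
    """
    invocations: Counter[str] = Counter()
    inclusive: Counter[str] = Counter()
    exclusive: Counter[str] = Counter()
    stack: list[list] = []  # entries: [function, begin, time spent in direct callees]
    for kind, function, time in events:
        if kind == "enter":
            stack.append([function, time, 0.0])
            continue
        if kind != "leave":
            raise ValueError(f"unknown event kind: {kind!r}")
        name, begin, callee_time = stack.pop()
        duration = time - begin
        invocations[name] += 1
        inclusive[name] += duration                # whole invocation
        exclusive[name] += duration - callee_time  # minus direct callees
        if stack:
            stack[-1][2] += duration  # charge this call to its caller's callee time
    return {
        "Number of Invocations": dict(invocations),
        "Accumulated Inclusive Time": dict(inclusive),
        "Accumulated Exclusive Time": dict(exclusive),
    }


events = [
    ("enter", "main", 0), ("enter", "solve", 1), ("leave", "solve", 2),
    ("enter", "solve", 3), ("enter", "MPI_Send", 4), ("leave", "MPI_Send", 5),
    ("leave", "solve", 6), ("leave", "main", 8),
]
print(function_metrics(events))
```

For these made-up events, solve is invoked twice with 4 s inclusive but only 3 s exclusive time, because 1 s of its second call is spent in MPI_Send.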
68. …y. It is also possible to select markers and types; all events belonging to that marker or type then get selected in the Master Timeline and the Process Timeline. If Ctrl or Shift is pressed, the user can highlight several events. In this case, the borders of the zooming area in the timeline charts can be fitted to the timestamps of the two marker events that were chosen last.
4.3.3 Context View
As implied by its name, the Context View provides more detailed information about a selected object than its graphical representation does. An object, e.g. a function, function group, message or message burst, can be selected directly in a chart by clicking its graphical representation. Different context information is provided by the Context View for different types of objects. For example, the object-specific information for functions holds properties like Interval Begin, Interval End and Duration, as shown in Figure 4.15. The…
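Selecting all events of a marker group or type, and fitting the zooming area to the two markers chosen last, can be sketched in Python as follows. The MarkerEvent record, the function names select_markers and zoom_to_last_two, and all group, type and time values are assumptions for illustration only.

```python
from __future__ import annotations

from typing import NamedTuple


class MarkerEvent(NamedTuple):
    """A marker event as listed in the Marker View (hypothetical record)."""
    time: float   # seconds
    group: str
    marker: str   # marker type within the group
    description: str


def select_markers(events: list[MarkerEvent], group: str,
                   marker: str | None = None) -> list[MarkerEvent]:
    """All events of a group, optionally narrowed to one marker type."""
    return [e for e in events
            if e.group == group and (marker is None or e.marker == marker)]


def zoom_to_last_two(chosen: list[MarkerEvent]) -> tuple[float, float]:
    """Zooming area spanned by the timestamps of the two events chosen last."""
    if len(chosen) < 2:
        raise ValueError("choose at least two marker events")
    a, b = chosen[-2].time, chosen[-1].time
    return min(a, b), max(a, b)


markers = [
    MarkerEvent(1.191657, "Error", "TypeCheck", "example error"),
    MarkerEvent(2.5, "Warning", "Deadlock", "example warning"),
    MarkerEvent(0.4, "Error", "TypeCheck", "another example error"),
]
print(zoom_to_last_two(select_markers(markers, "Error")))
```

The zooming area always runs from the earlier to the later timestamp, regardless of the order in which the two markers were clicked.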
