
Vampir 8 User Manual


Contents

1. 4.4.2 Examples; 5.1 Process Filter; 5.4.1 Filter Options; 6 Comparison of Trace Files; 6.1 Starting and Saving a Comparison Session; 6.4 Usage of Predefined Markers; 7.1 General Preferences; 7.2 Appearance; 8.1 Introduction; 8.2 Identified Problems and Solutions; 8.2.1 Computational Imbalance; 8.2.2 Serial Optimization; 8.2.3 High Cache Miss Rate; 8.3 Conclusion

   1 Introduction

   Performance optimization is a key issue for the development of efficient parallel software applications. Vampir provides a manageable framework for analysis, which enables developers to quickly display program behavior at any level of detail. Detailed pe
2. Figure 4.32: Custom metric showing FLOPS only for function SOLVE_EM (trace counter PAPI_FP_OPS in increments per second, multiplied by the metric "Function is Active")

   Vampir also allows searching for invocations of individual functions below or above a certain threshold. In this example, invocations of the function SOLVE_EM with a FLOP rate above 150 M are searched for. The first step is therefore to construct a custom metric showing the FLOP rate only for the function SOLVE_EM. The process of constructing a custom metric is described in more detail in Section 4.4. The constructed custom metric is depicted in Figure 4.32. Figure 4.33 shows the constructed metric in the overlay. The color scale is set to highlight only functions above 150 M FLOPS. When zooming into an area of interest, the opacity slider can be used to reveal individual function invocations in the timeline (Figure 4.34).
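The threshold search described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Vampir's implementation: the `Invocation` record, the helper names `flop_rate` and `invocations_above`, and all sample numbers are assumptions made up for the example. The FLOP rate of an invocation is taken as the counter increments recorded while the function was active, divided by the invocation's duration.

```python
from dataclasses import dataclass


@dataclass
class Invocation:
    name: str     # function name
    start: float  # enter time in seconds
    end: float    # leave time in seconds
    flops: float  # counter increments while the function was active (hypothetical data)


def flop_rate(inv: Invocation) -> float:
    """FLOP rate of one invocation: counter increments per second."""
    return inv.flops / (inv.end - inv.start)


def invocations_above(invocations, name, threshold):
    """Keep only invocations of `name` whose FLOP rate exceeds `threshold`."""
    return [i for i in invocations if i.name == name and flop_rate(i) > threshold]


# Hypothetical trace excerpt; only SOLVE_EM invocations are of interest.
trace = [
    Invocation("SOLVE_EM", 0.0, 1.0, 120e6),
    Invocation("SOLVE_EM", 1.0, 2.0, 180e6),
    Invocation("MPI_Wait", 2.0, 2.5, 1e6),
    Invocation("SOLVE_EM", 2.5, 3.0, 100e6),
]

hits = invocations_above(trace, "SOLVE_EM", 150e6)
for hit in hits:
    print(f"{hit.name} at {hit.start} s: {flop_rate(hit) / 1e6:.0f} M FLOPS")
```

With these made-up numbers, the second and fourth invocations pass the 150 M threshold; the last one does so only because it is short, which is why a rate rather than a raw count is compared.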
3. Figure 5.8: Showing only functions with more than 250 ms duration

   Combining Function Name and Duration Rules

   This example combines the two previous rules. First, the any relation is used. Thus, the filter shows all functions that have a duration of more than 250 ms and, additionally, all MPI functions.

   Filter setup: Show only functions that match any of the following conditions: Name contains mpi; Duration is greater than 250 milliseconds.

   Figure 5.9: Combining rules using any
4. Figure 4.9: Performance Radar

   Figure 4.10: Adjusted value range in color scale

   with the highest or lowest value is displayed, respectively. In case of Average, the average of all data points on the respective pixel width is displayed. This procedure is also explained in the section Counter Data Timeline.

   The value range of the color scale can be easily adjusted with the left mouse button. To adjust the color-coded value range, just drag the edges of the color scale to the desired positions. Figure 4.10 depicts the Performance Radar chart shown in Figure 4.9 with a smaller value range of 1 G to 3 G FLOPS. This makes it easy to spot areas of high or low performance in the trace file. The selected value range can also be dragged to other positions in the color scale. A double click with the left mouse button on the color scale resets the selected value range. The option Options → Color Scale in the context me
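The per-pixel reduction and the adjustable value range can be illustrated with a short sketch. It assumes counter samples given as (time, value) pairs and a chart that is `width` pixels wide; the function names `bin_to_pixels` and `in_range` and the sample data are hypothetical and do not reflect how Vampir stores or draws its data.

```python
def bin_to_pixels(samples, t0, t1, width, mode="average"):
    """Reduce (time, value) samples in [t0, t1) to one value per pixel column."""
    columns = [[] for _ in range(width)]
    for t, v in samples:
        if t0 <= t < t1:
            col = int((t - t0) / (t1 - t0) * width)
            columns[col].append(v)
    reducers = {
        "maximum": max,
        "minimum": min,
        "average": lambda vs: sum(vs) / len(vs),
    }
    reduce = reducers[mode]
    # Pixel columns without any data point stay empty (None).
    return [reduce(vs) if vs else None for vs in columns]


def in_range(value, lo, hi):
    """True if a pixel value falls inside the selected color-scale range."""
    return value is not None and lo <= value <= hi


# Hypothetical FLOPS samples over one second, drawn on a two-pixel-wide chart.
samples = [(0.0, 1e9), (0.4, 3e9), (0.6, 2e9), (0.9, 4e9)]
average = bin_to_pixels(samples, 0.0, 1.0, 2, "average")
maximum = bin_to_pixels(samples, 0.0, 1.0, 2, "maximum")
highlighted = [in_range(v, 1e9, 3e9) for v in maximum]
```

Narrowing the range to 1 G to 3 G, as in the text, leaves the second pixel of the maximum view outside the highlighted range, which is how areas of unusually high or low performance stand out.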
5. Figure 4.20: Communication Matrix View (Process 0 to Process 15 as senders and receivers; color legend from 0 MiB/s to 680 MiB/s)

   The chart shown in Figure 4.20 is laid out as a table. Its rows represent the sending processes, whereas the columns represent the receivers. The color legend on the right indicates the displayed values. It adapts automatically to the currently shown value range.

   It is possible to change the type of displayed values. Different metrics, like the average duration of messages passed from sender to recipient or the minimum and maximum bandwidth, are offered. To change the type of value that is displayed, use the context menu option Set Metric. Use the Process Filter to define which processes/groups should be displayed (see Section 5.1).

   Note: A high duration is not automatically caused by a slow communication path between two processes. It can also be due to the fact that the time between starting transmission and successful reception of the message can be increased by a recipient that delays reception
6. Figure 4.22: Call Tree

   4.3 Informational Charts

   4.3.1 Function Legend

   The Function Legend lists all visible function groups of the loaded trace file along with their corresponding colors. If colors of functions are changed, they appear in a tree-like fashion under their respective function group as well (see Figure 4.23).

   4.3.2 Marker View

   The Marker View lists all marker events included in the trace file. The display organizes the marker events by their respective groups and types in a tree-like fashion. Additional information, like the time of occurrence or descriptions, is provided for each marker. Clicking on a marker event in the Marker View selects this event in the timeline displays. If the marker is located outside the visible area, the zoom jumps to this event automatically. It is possible to select marker events by their type as well. Then all events belon
7. The second example illustrates the usage of the all relation. Here, all shown functions have to satisfy both rules. Therefore, the filter shows only MPI functions that have a duration of more than 250 ms.

   Filter setup: Show only functions that match all of the following conditions: Name contains mpi; Duration is greater than 250 milliseconds.

   Figure 5.10: Combining rules using all

   Building Ranges with Number of Invocation Rules

   The combination of rules also allows filtering functions within a specified range of a criterion. The following example filter setup shows all functions whose number of invocations lies inside the range between 2000 and 15000.

   Show only functions that match all of the following conditions Number o
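Building a range from two invocation rules can be sketched as follows. The sketch assumes that the two rules are "is greater than 2000" and "is less than 15000" and that both bounds are exclusive; the function names and invocation counts are hypothetical. Combining the same kind of rules with any, using the opposite comparisons, selects the complement of the range.

```python
LOW, HIGH = 2000, 15000


def inside_range(count, low=LOW, high=HIGH):
    """'all' relation: both rules must hold, so the count lies inside the range."""
    return all([count > low, count < high])


def outside_range(count, low=LOW, high=HIGH):
    """'any' relation with flipped comparisons: the count lies outside the range."""
    return any([count < low, count > high])


# Hypothetical number of invocations per function.
invocations = {"func_a": 1200, "func_b": 7400, "func_c": 60000}

shown_inside = [name for name, n in invocations.items() if inside_range(n)]
shown_outside = [name for name, n in invocations.items() if outside_range(n)]
```

Note that combining "greater than 2000" and "less than 15000" with any instead of all would match every function, so the relation and the comparison direction have to be chosen together.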
8. Figure 6.11: Jump to a marker in the Master Timeline

   The Comparison View provides two additional ways of navigating with markers. If two markers of one trace are selected in the Marker View, the button Zoom Between Marker sets the trace zoom to the corresponding timestamps of the markers. If two markers of different traces are selected, the button Align Traces at Marker adjusts the time offset between the respective traces. The selected markers are then shown next to each other in the timeline charts, and consequently both traces are aligned at the respective markers.

   7 Customization

   The appearance of the trace file and various other application settings can be altered in the preferences, accessible via the main menu entry File → Preferences. Settings concerning the trace file itself, e.g., layout or function group colors, are saved individually next to t
9. Scalasca, Vampir, and Tau that obviates the need for multiple repetitions of the instrumentation and thus substantially reduces the amount of work required. It is open for other tools as well. Moreover, Score-P provides the new Open Trace Format Version 2 (OTF2) for the tracing data and the new CUBE4 profiling data format, which allow a better scaling of the tools with respect to both the run time of the process to be analyzed and the number of cores to be used. Score-P supports the programming paradigms serial, OpenMP, MPI, and hybrid (MPI combined with OpenMP).

   Internally, the instrumentation inserts special measurement calls into the application code at specific important points (events). This can be done in an almost automatic way using corresponding features of typical compilers, but also semi-automatically or in a fully manual way, thus giving the user complete control of the process. In general, automatic instrumentation is most convenient for the user. It is done by using the scorep command, which needs to be prefixed to all the compile and link commands usually employed to build the application. Thus, an application executable app that is normally generated from the two source files app1.f90 and app2.f90 via the command

   mpif90 app1.f90 app2.f90 -o app

   will now be built by

   scorep mpif90 app1.f90 app2.f90 -o app

   using the Score-P instrumentor. When makefiles are employed to build the application, it
10. This filter mode provides a number input field to select the call level. Available options:

   - Is greater than: All functions whose enter event is higher than the specified level are shown.
   - Is less than: All functions whose enter event is lower than the specified level are shown.

   5.4.2 Examples

   In this chapter, a few examples explain the usage of the function filter. They enable the user to understand the basic principles of function filtering in Vampir at a glance and illustrate a part of the set of filter options provided by Vampir.

   Unfiltered Trace File

   This section introduces the example trace file in an unfiltered state. The timelines show a part of the initialization of the WRF weather forecast code. The red color corresponds to communication (MPI), whereas the purple areas represent some input functions of the weather model.

   Figure 5.6: Maste
11. of all selected measuring points (e.g., processes) at a given time. The red maximum line shows the highest value that one of the selected measuring points achieved at a given time. A click with the left mouse button on any point in the chart reveals its details in the Context View display. Stated are the minimum, maximum, and average values, and the measuring points (e.g., processes) that achieved the maximum and minimum values at the selected point in time.

   The options dialog is depicted in Figure 4.8. It is accessible via the context menu under Options. It allows enabling and disabling the display of the graph's line, data points, and filling. It is also possible to enable an average line showing the average value of all data points in the visible area. Likewise, the chart's caption and y-axis label can be turned on and off. The switch Show zero line disables the auto-scaling of the y-axis for the lower bound and enforces a zero line in all situations.

   Figure 4.8: Counter Timeline options dialog

   The Counter Data Timeline chart allows creating custom metrics. The process of creating them is described in its own section. Created custom metrics become available in the Select Metric dialog.

   4.1.3 Performance Radar

   The Performance Ra
12. Function Filter

   The filtering of functions in Vampir is controlled via the Function Filter Dialog, which can be accessed via the main menu under Filter → Functions. Initially, a list of available rule sets is shown, as can be seen in the figure. By default, the list only shows a None entry. Only one filter can be active at a given time. To select the active filter, use the radio buttons on the left-hand side of the list. Clicking on the Add button creates a new set of rules and shows the input mask depicted in Figure 5.5.

   Figure 5.4: Function Filter Dialog with List of Rule Sets

   The Function Filter Dialog is built on the concept of filter rules. The user can define several individual rules. The rules are explained in more detail in Chapter 5.4.1. The header of the dialog defines how multiple rules are evaluated. One possibility is to combine the filter rules with an and relation. To choose this mode, all must be selected in the combo box in the header of the dialog. This means that all rules must evaluate to true in order to produce the filter output. The other option is to combine the rules with an or relation. To choose this mode, any must be selected in the combo box in the header of the dialog. In this case, at least one rule must evaluate to true in order to produce the filter output. The examples i
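How a rule set is evaluated under the two relations can be sketched as below. This is a minimal model under stated assumptions, not Vampir's code: the `Function` record, the rule constructors `name_contains` and `duration_greater_than`, the case-insensitive name match, and the sample durations are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Function:
    name: str
    duration_ms: float


Rule = Callable[[Function], bool]


def name_contains(text: str) -> Rule:
    # Case-insensitive matching is an assumption of this sketch.
    return lambda f: text.lower() in f.name.lower()


def duration_greater_than(ms: float) -> Rule:
    return lambda f: f.duration_ms > ms


def apply_filter(functions: List[Function], rules: List[Rule], relation: str):
    """Show functions matching 'all' (and relation) or 'any' (or relation) of the rules."""
    if relation not in ("all", "any"):
        raise ValueError("relation must be 'all' or 'any'")
    combine = all if relation == "all" else any
    return [f for f in functions if combine(rule(f) for rule in rules)]


functions = [
    Function("MPI_Recv", 300.0),
    Function("MPI_Send", 5.0),
    Function("WRF_INPUTIN", 400.0),
    Function("malloc", 0.01),
]
rules = [name_contains("mpi"), duration_greater_than(250.0)]

shown_all = [f.name for f in apply_filter(functions, rules, "all")]
shown_any = [f.name for f in apply_filter(functions, rules, "any")]
```

With these sample values, the all relation keeps only the long MPI call, while the any relation additionally keeps the short MPI call and the long non-MPI function.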
13. Figure 4.1: Master Timeline

   Figure 4.2: Process Timeline

   Figure 4.3: Selected MPI Collective in Master Timeline

   group MPI. Clicking on a function highlights it and causes the Context View display to show detailed information about that particular function, e.g., its corresponding function group name, time interval, and complete name. The Context View display is explained in Chapter 4.3.3. Clicking on a process label on the left-hand side provides basic information about the process in the Context View and highlights the label. Process rows c
14. and Function Summary showing an overview of the program run

   Getting a grasp of the program's overall behavior is a reasonable first step. In Figure 8.1, Vampir has been set up to provide such a high-level overview of the model's code. This layout can be achieved through two simple manipulations. Set up the Master Timeline to adjust the process bar height to fit the chart height. All 100 processes are now arranged into one view. Likewise, change the event category in the Function Summary to show function groups. This way, the many functions are condensed into fewer function groups.

   One run of the instrumented program took 290 seconds to finish. The first half of the trace (Figure 8.1 A) is the initialization part. Processes are started and synchronized, and input is read and distributed among these processes. The preparation of the cloud microphysics (function group MP) is done here as well.

   The second half is the iteration part, where the actual weather forecasting takes place. In a normal weather simulation this part would be much larger. But in order to keep the recorded trace data and the overhead introduced by tracing as small as possible, only a few iterations have been
15. be found in the VampirTrace user manual.

   2.2.3 Event Tracing for Windows (ETW)

   The Event Tracing for Windows (ETW) infrastructure of the Windows client and server operating systems provides a powerful software monitor. Starting with Windows HPC Server 2008, MS-MPI has built-in support for this monitor. It enables application developers to quickly produce traces in production environments by simply adding an extra mpiexec flag (-trace). Trace files are generated during the execution of the application. The recorded trace log files include the following events: any MS-MPI application call and low-level communication within the sockets, shared memory, and NetworkDirect implementations. Each event includes a high-precision CPU clock timer for precise visualization and analysis.

   http://www.tu-dresden.de/zih/vampirtrace

   The steps necessary for monitoring the MPI performance of an MS-MPI application are depicted in the accompanying figure. First, the application needs to be available throughout all compute nodes in the cluster and has to be started with tracing enabled. The Event Tracing for Windows (ETW) infrastructure writes event logs (.etl files) containing the respective MPI events of the application on each compute node. In order to achieve consistent event data across all compute nodes, clock corrections need to be applied. This step is performed after the successful run of the application using the Microsoft tool mpicsync. Now the event log files can be
16. code which determines the size of the work packages for each process had to be changed. To achieve the desired effect, an improved version of the domain decomposition has been implemented. Figure 8.3 shows that all occurrences of the MICROPHYSICS routine are vertically aligned and thus balanced. Additionally, the MPI receive routine calls are now clearly smaller than before. Comparing the Function Summary of Figure 8.2 and Figure 8.3 shows that the relative time spent in MPI receive has decreased, and in turn the time spent inside MICROPHYSICS has increased greatly. This means that we now spend more time computing and less time communicating, which is exactly what we want.

   8.2.2 Serial Optimization

   Inlining of frequently called functions and elimination of invariant calculations inside loops are two ways to improve the serial performance. This section shows how to detect candidate functions for serial optimization and suggests measures to speed them up.

   Problem: All performance charts in Vampir show information for the time span currently selected in the timeline. Thus, the most time-intensive routine of one iteration can be determined by zooming into one or more iterations and having a look at the Function Summary. The function with the largest bar takes up the most time. In this example (Figure 8.2), the MICROPHYSICS routine can be identified as the most costly part of an iteration. Therefore, it is a good candidate for gaining speedup
17. converted into OTF files with the help of the tool etl2otf. The last necessary step is to copy the generated OTF files from the compute nodes into one shared directory. This directory then includes all files needed by Vampir. The application performance can now be analyzed.

   Figure 2.1: MS-MPI Tracing Overview (run myApp with tracing enabled; time-sync the ETL logs with mpicsync; convert the ETL logs to OTF with etl2otf; copy the OTF files to the head node)

   The following commands illustrate the procedure described above and show, as a practical example, how to trace an application on Windows HPC Server 2008. For proper utilization and thus successful tracing, the file system of the cluster needs to meet the following prerequisites:

   - \\share\userHome is the shared user directory throughout the cluster
   - The MS-MPI executable myApp.exe is available in the shared directory
   - \\share\userHome\Trace is the directory where the OTF files are collected

   1. Launch the application with tracing enabled (use of the -tracefile option)
18. example demonstrates the opposite behavior of the previous example. In call paths that contain the function WRF_INPUTIN, only functions that lead to WRF_INPUTIN are shown. The function WRF_INPUTIN itself and its directly or indirectly called sub-functions are filtered. Other call paths remain unaffected by the filter and are still shown.

   Filter setup: Show only functions that match any of the following conditions: Call Path does not contain WRF_INPUTIN.

   Figure 5.14: Call path filter which does not contain WRF_INPUTIN

   Showing only Functions until a certain Call Level

   This example demonstrates the filtering of functions by their call level. Here, only functions with an enter event less than call level five are shown. All other functions are filtered.

   Show only functions that match of the following conditions Description Filte
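The call-level rule can be pictured as a stack-depth check on the enter events of one process. The sketch below is an illustration under assumptions: events are given as ("enter" | "leave", name) pairs in time order, the outermost call is counted as level 1, and the helper names and the tiny event list are made up.

```python
def call_levels(events):
    """Return (name, level) for every enter event; the outermost call has level 1."""
    depth = 0
    levels = []
    for kind, name in events:
        if kind == "enter":
            depth += 1
            levels.append((name, depth))
        elif kind == "leave":
            depth -= 1
    return levels


def show_below_level(events, level):
    """Keep functions whose enter event is below the given call level."""
    return [name for name, depth in call_levels(events) if depth < level]


# Hypothetical event stream of a single process.
events = [
    ("enter", "main"),
    ("enter", "INPUT"),
    ("enter", "READ"),
    ("leave", "READ"),
    ("leave", "INPUT"),
    ("enter", "SOLVE"),
    ("leave", "SOLVE"),
    ("leave", "main"),
]

visible = show_below_level(events, 3)
```

Here `READ` is entered at level 3 and is therefore filtered when only levels below 3 are shown; with a limit of five, as in the text, all four functions of this small example would remain visible.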
19. for some reason. This will cause the duration to increase by this delay and the message rate, which is the size of the message divided by the duration, to decrease accordingly.

   4.2.5 I/O Summary

   The I/O Summary, depicted in Figure 4.21, is a statistical chart giving an overview of the input/output operations recorded in the trace file.

   Figure 4.21: I/O Summary (All Processes, Number of I/O Operations per File Name)

   All values are represented in a histogram-like fashion. The text label indicates the group base, while the number inside each bar represents the value of the chosen metric. The Set Metric sub-menu of the context menu is used to switch between the available metrics: Number of I/O Operations, Aggregated I/O Transaction Size, Aggregated I/O Transaction Time, and values of I/O Transaction Size, I/O Transaction Time, or I/O
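The note on message rate at the start of this item (rate equals message size divided by duration) can be checked with a short calculation. The message size and the timings below are invented for illustration, and `message_rate` is a hypothetical helper.

```python
MIB = 2**20


def message_rate(size_bytes, duration_s):
    """Message rate in bytes per second: message size divided by duration."""
    return size_bytes / duration_s


size = 64 * MIB   # hypothetical message size
transfer = 0.25   # seconds between start of transmission and reception
delay = 0.25      # extra seconds caused by a recipient that receives late

rate = message_rate(size, transfer)
rate_delayed = message_rate(size, transfer + delay)

print(rate / MIB, "MiB/s without delay")
print(rate_delayed / MIB, "MiB/s with delayed reception")
```

The communication path is equally fast in both cases; only the late reception doubles the duration and halves the reported rate.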
20. iteration but takes between 1.3 and 1.7 seconds to finish. This imbalance leads to idle time in subsequent synchronization calls on processes 1 to 4, because they have to wait for process 0 to finish its work (marked parts in Figure 8.2). This is wasted time, which could be used for computational work if all MICROPHYSICS calls had the same duration. Another hint at this overhead in synchronization is the fact that the MPI receive routine uses 17.6% of the time of one iteration (Function Summary in Figure 8.2).

   Figure 8.2: Before Tuning: Master Timeline and Function Summary identifying MICROPHYSICS (purple color) as predominant and unbalanced

   Figure 8.3: After Tuning: Timeline and Function Summary showing an improvement in communication behavior

   Solution: To even out this asymmetry, the
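The idle time caused by such an imbalance can be estimated with a small calculation. The per-process durations below are hypothetical values chosen within the 1.3 to 1.7 second span mentioned above (in milliseconds, to keep the arithmetic exact); they are not measurements from the trace.

```python
# Hypothetical duration of one MICROPHYSICS call per process, in milliseconds.
durations_ms = {0: 1700, 1: 1300, 2: 1400, 3: 1500, 4: 1300}

# Every process has to wait in the next synchronization for the slowest one.
slowest = max(durations_ms.values())
idle_ms = {proc: slowest - d for proc, d in durations_ms.items()}

total_idle_ms = sum(idle_ms.values())
balanced_ms = sum(durations_ms.values()) / len(durations_ms)

print("idle time per process (ms):", idle_ms)
print("total idle time (ms):", total_idle_ms)
print("duration per process if perfectly balanced (ms):", balanced_ms)
```

With these numbers, process 0 never waits, while the other four together idle for 1300 ms per call; an even distribution would bring every process to 1440 ms.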
21. recorded. This is sufficient since they are all doing the same work anyway. Therefore, the simulation has been configured to forecast the weather only 20 seconds into the future. The iteration part consists of two large iterations (Figure 8.1 B and C), each calculating 10 seconds of forecast. Each of these in turn is partitioned into several smaller iterations.

   For our observations we focus on only two of these small inner iterations, since this is the part of the program where most of the time is spent. The initialization work does not increase with a longer forecast duration and would only take a relatively small amount of time in a real-world run. The constant part at the beginning of each large iteration takes less than a tenth of the whole iteration time. Therefore, by far the most time is spent in the small iterations. Thus, they are the most promising candidates for optimization.

   All screenshots starting with Figure 8.2 are in a before-and-after fashion to point out what changed by applying the specific improvements.

   8.2 Identified Problems and Solutions

   8.2.1 Computational Imbalance

   A varying size of work packages (and thus a varying processing time of this work) means waiting time in subsequent synchronization routines. This section points out two easy ways to recognize this problem.

   Problem: As can be seen in Figure 8.2, each occurrence of the MICROPHYSICS routine (purple color) starts at the same time on all processes inside one
22. regular expressions to filter processes. It is also possible to use a wildcard like Process. You can escape characters with a backslash. If no process name matches the given pattern, a simple string comparison will be performed.

   Examples:

   - Process x 13579 matches odd-numbered items
   - Process 0 1 S matches every tenth item
   - Process 0 2 matches every hundredth item
   - Process matches all containing Process
   - Process matches exactly Process

   5.2 Message and Collective Operations Filter

   Figure 5.2: Message Filter

   The figure shows a Message Filter dialog. This dialog allows filtering messages from the displayed trace data. Available options are to select or deselect messages based on their Message Tag or Message Communicator. The default is to show all messages. The Collectives Filter is designed accordingly. It allows filtering collective operations from the displayed trace data. The collectives can be filtered by their Communicator or their Collective Ope
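The kind of matching described for the Process Filter can be tried out with Python's `re` module. The patterns below are assumptions written to reproduce the described behavior (odd-numbered items, every tenth item, and so on); they are not the manual's original patterns, and Vampir's regular expression dialect may differ from Python's. The fallback to a substring test when nothing matches is likewise only one possible reading of "simple string comparison".

```python
import re


def filter_processes(names, pattern):
    """Match names against a regular expression; fall back to a plain string test."""
    try:
        regex = re.compile(pattern)
        matched = [n for n in names if regex.search(n)]
    except re.error:
        matched = []
    if matched:
        return matched
    # Assumed fallback: treat the pattern as literal text.
    return [n for n in names if pattern in n]


names = [f"Process {i}" for i in range(20)]

odd = filter_processes(names, r"[13579]$")         # names ending in an odd digit
tenth = filter_processes(names, r"0$")             # names ending in 0
everything = filter_processes(names, "Process")    # all names containing "Process"
exact = filter_processes(names, r"^Process 1$")    # exactly "Process 1"

# Parentheses are regex syntax, so this pattern matches nothing as a regex
# and is then compared as plain text.
literal = filter_processes(["Master (0)", "Worker (1)"], "Worker (1)")
```

Escaping the parentheses with a backslash, as the text suggests, would make the last pattern match as a regular expression directly.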
23. the calculation of accumulated measurements in the statistical charts. Statistical charts like the Function Summary provide zooming of statistic values. In these cases, zooming does not affect any other chart. Zooming is disabled in the Pie Chart mode of the Function Summary, accessible via the context menu under Set Chart Mode → Pie Chart.

   Figure 3.9: Zooming within a Chart

   To zoom into an area, click and hold the left mouse button and select the area, as shown in Figure 3.9. It is possible to zoom horizontally and, in some charts, also vertically. In the Master Timeline, horizontal zooming defines the time interval to be visualized, whereas vertical zooming selects a group of processes to be displayed. To scroll horizontally, move the slider at the bottom or use the mouse wheel. To get back to the initial state of zooming, select Reset Horizontal Zoom or Reset Vertical Zoom (see Sec
24. the tracing approach records timed application events like function calls and message communication as a combination of timestamp, event type, and event-specific data. This creates a stream of events, which allows very detailed observations of parallel programs. With this technology, synchronization and communication patterns of parallel program runs can be traced and analyzed in terms of performance and correctness. The analysis is usually carried out in a postmortem step, i.e., after completion of the program. Needless to say, program traces can also be used to calculate the profiles mentioned above. Computing profiles from trace data allows arbitrary time intervals and process groups to be specified. This is in contrast to profiles accumulated during runtime.

   1.2 The Open Trace Format (OTF)

   The Open Trace Format (OTF) was designed as a well-defined trace format with open, public domain libraries for writing and reading. This open specification of the trace information enables analysis and visualization tools like Vampir to operate efficiently at large scale. The format addresses large applications written in an arbitrary combination of Fortran 77, Fortran (90/95/etc.), C, and C++.

   Figure (OTF files): Local Definitions name.x.def; Events name.x.events; Statistics name.x.stats; Snapshots name.x.snaps; Master Contr
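The statement that profiles can be computed from a trace for arbitrary time intervals and process groups can be made concrete with a small sketch. It assumes a simplified event record (timestamp, process, "enter" or "leave", function name); this layout, the `profile` helper, and the sample events are illustrative assumptions and do not describe the OTF record format. The profile accumulates exclusive time per function, clipped to the requested interval.

```python
from collections import defaultdict


def profile(events, t_start, t_end, processes=None):
    """Exclusive time per function within [t_start, t_end] for the given processes."""
    totals = defaultdict(float)
    stacks = defaultdict(list)  # call stack per process
    last = {}                   # time of the previous event per process
    # Sort by timestamp only; the stable sort keeps the recorded order of ties.
    for t, proc, kind, name in sorted(events, key=lambda e: e[0]):
        if processes is not None and proc not in processes:
            continue
        stack = stacks[proc]
        if stack:
            # The time since the previous event belongs to the function on top.
            lo, hi = max(last[proc], t_start), min(t, t_end)
            if hi > lo:
                totals[stack[-1]] += hi - lo
        if kind == "enter":
            stack.append(name)
        else:
            stack.pop()
        last[proc] = t
    return dict(totals)


# Hypothetical trace of two processes.
events = [
    (0.0, 0, "enter", "main"), (1.0, 0, "enter", "solve"),
    (3.0, 0, "leave", "solve"), (4.0, 0, "leave", "main"),
    (0.0, 1, "enter", "main"), (2.0, 1, "enter", "solve"),
    (3.0, 1, "leave", "solve"), (4.0, 1, "leave", "main"),
]

whole_run = profile(events, 0.0, 4.0)
second_half_p0 = profile(events, 2.0, 4.0, processes={0})
```

A profile accumulated at run time would only offer the totals in `whole_run`; keeping the event stream is what allows the interval and the process group to be chosen afterwards.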
25. the zoom area changes the section that is displayed without changing the zoom factor. For dragging, click into the highlighted zoom area and drag and drop it to the desired position. Zooming and dragging within the Zoom Toolbar is illustrated in Figure 3.10. If the user double-clicks in the Zoom Toolbar, the zoom reverts to its initial state.

   Figure 3.10: Zooming and Navigation within the Zoom Toolbar: (A+B) Zooming in/out with the Mouse Wheel; (C) Scrolling by Moving the Highlighted Zoom Area; (D) Zooming by Selecting and Moving a Boundary of the Highlighted Zoom Area

   The colors represent user-defined groups of functions or activities. Please note that all charts added to the Trace View window will calculate their statistic information according to the selected time interval (zooming state) in the Zoom Toolbar. The Zoom Toolbar can be enabled and disabled with the toolbar's context menu entry Zoom Toolbar.

   3.5 The Charts Toolbar

   Icon / Name: Master Timeline, Proce
26. through serial optimization techniques.

Solution: In order to get a fine-grained view of the MICROPHYSICS routine's inner workings, we had to trace the program using full function instrumentation. Only then was it possible to inspect and measure subroutines and sub-subroutines of MICROPHYSICS. This way the most time-consuming subroutines were spotted and could be analyzed for optimization potential.

The review showed that there were a couple of small functions which were called a lot, so we simply inlined them. With Vampir you can determine how often a function is called by changing the metric of the Function Summary to the number of invocations. The second inefficiency we discovered was invariant calculations being done inside loops, so we moved them in front of the respective loops.

Figure 8.3 sums up the tuning of the computational imbalance and the serial optimization. In the timeline you can see that the duration of the MICROPHYSICS routine is now equal among all processes. Through serial optimization the duration has been decreased from about 1.5 to 1.0 second. A decrease in duration of about 33% is quite good given the simplicity of the changes.

8.2.3 High Cache Miss Rate

The latency gap between cache and main memory is about a factor of 8. Therefore, optimizing for cache usage is crucial for performance. If you don't access your data in a linear fas
27. 000 are shown, i.e., functions with less than 2000 invocations and functions with more than 15000 invocations.

[Figure: Filter dialog set to "Show only functions that match any of the following conditions" with the conditions "Number of Invocations is less than 2000" and "Number of Invocations is greater than 15000"; the Trace View shows the Timeline and the Function Summary (All Processes, Number of Invocations per Function) with the filter applied]
28. mpiexec -wdir \\share\userHome -tracefile %USERPROFILE%\trace.etl myApp.exe

• -wdir sets the working directory; myApp.exe has to be there
• %USERPROFILE% translates to the local home directory, e.g. C:\Users\userHome; on each compute node the event log file (.etl) is stored locally in this directory

2. Time-sync the event log files throughout all compute nodes:

mpiexec -cores 1 -wdir %USERPROFILE% mpicsync trace.etl

• -cores 1: run only one instance of mpicsync on each compute node

3. Format the event log files to OTF files:

mpiexec -cores 1 -wdir %USERPROFILE% etl2otf trace.etl

4. Copy all OTF files from compute nodes to the trace directory on the share:

mpiexec -cores 1 -wdir %USERPROFILE% cmd /c copy /y "*_otf*" "\\share\userHome\Trace"

2.3 Starting Vampir and Loading Performance Data

Figure 2.2: List of recent trace files

Viewing performance data with the Vampir GUI is very easy. On Windows, the tool can be started by double-clicking its desktop icon (if installed) or by using the Start Menu. On a Linux-based machine, run ./vampir in the directory where Vampir is installed. On Mac OS X systems, a double-click on the application icon opens Vampir. At startup, Vampir presents a list of recently l
29. 2 file

While Vampir is loading the trace file, an empty Trace View window with a progress bar at the bottom opens. After Vampir has loaded the trace data completely, a default set of charts appears. The loading process can be interrupted at any time by clicking the cancel button in the lower right corner of the Trace View. Because events in the trace file are loaded one after another, the GUI will open and show the earliest, already loaded information from the trace file. The basic functionality and navigation elements of the GUI are described in Chapter 3. The available charts and the information provided by them are explained in Chapter 4.

2.3.2 Loading a Trace File Subset

To handle large trace files and save time and memory resources, it is possible to load only a subset of the performance data from a trace file. For this purpose, the open dialog (Figure 2.3) provides the button Open Subset. Clicking this button opens a trace data pre-selection dialog, as depicted in Figure 2.4. An overview snapshot of the recorded application run is given at the top of the dialog. The time range of interest can be set with the edge markers on the left and right of the
30. Figure 4.5: Active overlay showing PAPI_FP_OPS in the Master Timeline

[Table: Name, Symbol, Description]

Message Burst: Due to a lack of pixels it is not possible to display a large amount of messages in a very short time interval. Therefore outgoing messages are summarized as so-called message bursts. In this representation you cannot determine which processes receive these messages. Zooming into this interval reveals the corresponding single messages.

Markers (multiple, single): To indicate particular points of interest during the run time of an application, like errors or warnings, markers can be placed in a trace file. They are drawn as triangles which are colored according to their types. To indicate that two or more markers are located at the same pixel, a tricolored triangle is drawn.

I/O Events: Vampir shows detailed information about I/O ope
31. Figure 4.33: SOLVE_EM invocations with highest FLOP rate

Figure 4.34: Using the opacity slider to investigate individual invocations of SOLVE_EM
32. All available charts work the same way as in the Trace View. Because the Comparison View couples the zoom of all trace files, the charts can be used to directly compare performance characteristics between the traces.

Figure 6.6: Zoom to compute iterations of trace C

As shown in Figure 6.5, trace A has the longest duration. The duration of trace C is so short that it is barely visible. Zooming into the compute iteration phase of trace C makes the iterations visible but, due to the coupled zoom, also displays only the MPI_Init phase of traces A and B (see Figure). In order to compare the compute iterations b
33. Bandwidth with respect to their selected value type. You can switch between the value types Minimum, Average, Maximum, and Average & Range via the context menu entry Set Value. Note: There will be one bar for every occurring metric. Furthermore, the value type Average & Range gives a quick and convenient overview and shows minimum, maximum, and average values at once. The minimum and maximum values are shown in an additional, smaller bar beneath the main bar indicating the average value. The additional bar starts at the minimum and ends at the maximum value of the metric (see Figure).

The I/O operations can be grouped by the characteristics Transaction Size, File Name, and Operation Type. The group base can be changed via the context menu entry Group I/O Operations by. In order to select the I/O operation types that should be considered for the statistic calculation, use the Set I/O Operations sub-menu of the context menu. Available options are Read, Write, Read/Write, and Apply Global I/O Operations Filter. The latter includes all selected operation types from the I/O Events filter dialog (see Section 5.3).

4.2.6 Call Tree

The Call Tree, depicted in Figure 4.22, illustrates the invocation hierarchy of all monitored functions in a tree representation. The display reveals information about the number of invocations of a given function, the time spent in the different calls, and the cal
34. Data Timeline chart is shown in Figure. The chart is restricted to one counter at a time. It shows the selected counter for one measuring point (e.g., process). Using multiple instances of the Counter Data Timeline, counters or processes can be compared easily. The displayed graph in the chart is constructed from actual measurements (data points). Since display space is limited, it is likely that there are more data points than display pixels available. In that case multiple data points need to be displayed on one pixel (width). Therefore the counter values are displayed in two graphs: a maximum line (red) and an average line (yellow). When multiple data points need to be displayed on one pixel width, the red line shows the data point with the highest value, and the yellow line indicates the average of all data points lying on this pixel width. An optional blue line shows the lowest value. When zooming into a smaller time range, fewer data points need to be displayed on the available pixel space. Eventually, when zooming far enough, only one data point needs to be displayed on one pixel. Then the three graphs merge. The actual measured data points can be displayed i
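The reduction of many data points to one pixel width, as described for the maximum, average, and minimum lines, can be sketched in Python as follows. This is an illustration under assumptions, not Vampir's rendering code; the function name bucket_per_pixel and its parameters are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def bucket_per_pixel(points: Sequence[Tuple[float, float]],
                     t_start: float, t_end: float,
                     width_px: int) -> List[Optional[Tuple[float, float, float]]]:
    """Reduce (time, value) data points to one (min, avg, max) triple per pixel.

    Pixels that receive no data point are reported as None.
    """
    span = t_end - t_start
    buckets: List[List[float]] = [[] for _ in range(width_px)]
    for t, value in points:
        if not (t_start <= t <= t_end):
            continue  # outside the visible time range
        # Map the timestamp to a pixel column; the right edge goes to the last pixel.
        px = min(int((t - t_start) / span * width_px), width_px - 1)
        buckets[px].append(value)
    return [(min(b), mean(b), max(b)) if b else None for b in buckets]
```

With a wide time range several points share a pixel and the three values differ. After zooming in far enough that each pixel holds a single point, minimum, average, and maximum coincide, which corresponds to the merging of the graphs described above.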
35. Figure 5.12: Show functions outside a specified range

Call Path contains WRF_INPUTIN: In this example only functions that are called directly or indirectly by WRF_INPUTIN are shown. As a consequence, all call paths start with WRF_INPUTIN. All other functions are filtered.

Figure 5.13: Call path filter which contains WRF_INPUTIN

Call Path does not contain WRF_INPUTIN: This
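The "match any of the following conditions" logic of the function filter examples can be sketched as a list of predicates. The sketch below is a hedged Python illustration, not Vampir's filter engine: FunctionStats, the condition helpers, and show_matching_any are hypothetical names, and it assumes that a call path lists the callers from the root down to the function itself.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class FunctionStats:
    """Hypothetical per-function record used only for this illustration."""
    name: str
    invocations: int
    call_path: Tuple[str, ...]  # assumed: root caller first, the function itself last


Condition = Callable[[FunctionStats], bool]


def invocations_less_than(limit: int) -> Condition:
    return lambda f: f.invocations < limit


def invocations_greater_than(limit: int) -> Condition:
    return lambda f: f.invocations > limit


def call_path_contains(name: str) -> Condition:
    return lambda f: name in f.call_path


def show_matching_any(functions: Iterable[FunctionStats],
                      conditions: Iterable[Condition]) -> List[FunctionStats]:
    """Keep the functions that satisfy at least one condition."""
    conds = list(conditions)
    return [f for f in functions if any(c(f) for c in conds)]
```

Passing invocations_less_than(2000) together with invocations_greater_than(15000) reproduces the "outside a specified range" example, while a single call_path_contains("WRF_INPUTIN") condition corresponds to the call path example.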
36. Figure 3.3: Moving and Arranging Charts in the Trace View Window (1)

Figure 3.4: Moving and Arranging Charts in the Trace View Window (2)

menu entry. With a few more clicks, charts can be combined to a custom chart arrangement as depicted in Figure 8.2. Customized layouts can be saved as described i
37. Image series showing different opacity settings for the performance data overlay, going from zero opacity in the top image to full opacity in the bottom image.

weather forecast code run. The timelines show the initialization in the beginning, followed by a number of compute iterations. Figure 4.13 depicts this trace file. The top image shows the pure timelines of the Master Timeline chart; the bottom image shows the values of the PAPI_FP_OPS counter superimposed on the timelines. Here the red areas indicate high computational activity and therefore mark the compute iterations.

High and Low FLOP Rate

Figure 4.14: Highlighted areas with a low FLOP rate

In order to analyze the FLOP rate, the overlay mode of the Master Timeline is con
38. It is possible to hide functions and function groups from the displayed information with the context menu entry Filter. In order to mark the function or function group to be filtered, just click on the associated label or color representation in the chart. Using the Process Filter (see Section 5.1) allows you to restrict this chart to a set of processes. As a result, only the consumed time of these processes is displayed for each function group or function. Instead of using the filter, which affects all other displays by hiding processes, it is possible to select a single process via Set Process in the context menu of the Function Summary. This does not have any effect on other charts. The Function Summary can be shown as a Histogram (a bar chart, like in timeline charts) or as a Pie Chart. To switch between these representations, use the Set Chart Mode entry of the context menu. The shown functions or function groups can be sorted by name or value via the context menu option Sort By.

4.2.2 Process Summary

The Process Summary, depicted in Figure 4.18, is similar to the Function Summary but shows the information for every process independently. This is useful for analyzing the balance between processes to reveal bottlenecks. For instance, finding that one process spends a significantl
39. Figure 6.4: Open Comparison View

To save a comparison session, use the menu entries File → Save or File → Save As. This stores a vcompare file containing the compared trace files, settings, and the Comparison View layout. To restore a comparison session, simply open the respective vcompare file. Previous comparison sessions are also available in the recent open files list of Vampir.

6.2 Usage of Charts

For the comparison of performance metrics, the Comparison View provides all common charts of Vampir. In contrast to the ordinary Trace View, the Comparison View opens one chart instance for each trace file, i.e., with three open trace files, one click on the Master Timeline icon opens three Master Timeline charts. By using the icon menus, accessible via the triangles next to the chart icons, it is also possible to open only one chart instance for the selected trace. Also, in order to distinguish the same charts between the trace files, a dedicated background color is assigned to all charts belongi
40. Figure 3.7: Docking of a Chart

Figure 3.8: Resizing Labels: (A) Hover a Separator Decoration; (B) Drag and Drop the Separator

• Sort By: Rearrange values or bars by a certain characteristic

3.3 Zooming

Zooming is a key feature of Vampir. In most charts it is possible to zoom in and out to get detailed or abstract views of the visualized data. In the timeline charts, zooming produces a more detailed view of a selected time interval and therefore reveals new information that was previously hidden in the larger section. Short function calls in the Master Timeline may not be visible unless an appropriate zooming level has been reached. In other words, if the execution time of functions is too short with respect to the available pixel resolution of your computer display, zooming into a shorter time interval is required in order to make them visible.

Note: Other charts are affected by zooming in the timeline displays. The interval chosen in a timeline chart, such as the Master Timeline or Process Timeline, also defines the time interval for
41. Figure 4.12: Master Timeline (top chart) and Performance Radar (bottom chart) displaying the same PAPI_FP_OPS counter

The selected metric is shown in a color-coded fashion, like in the Performance Radar chart. Figure 4.12 depicts the Master Timeline chart (top) and the Performance Radar chart (bottom), both displaying the same performance metric, PAPI_FP_OPS (floating point operations per second). As can be seen, the overlay mode provides the performance data visualization capabilities of the Performance Radar for the Master Timeline. To fully benefit from this combination, the opacity slider of the overlay control window should be used (see Figure 4.13). The slider allows you to quickly manipulate the opacity of the overlay, making underlying functions visible. This is particularly useful for first pinpointing performance-relevant areas and then directly analyzing the individual identified funct
42. Trace View Window

The utility of charts can be increased by correlating them and the information they provide. Vampir supports this mode of operation by allowing multiple charts to be displayed at the same time. All timeline charts, such as the Master Timeline and the Process Timeline, display a sequence of events. Those charts are therefore aligned vertically. This alignment ensures that the temporal relationship of events is preserved across chart boundaries.

The user can arrange the placement of the charts according to their preferences by dragging them into the desired position. When the left mouse button is pressed while the mouse pointer is located above a placement decoration, the layout engine gives visual clues as to where the chart may be moved. As soon as the user releases the left mouse button, the chart arrangement is changed accordingly. The entire procedure is depicted in Figures 3.3 and

The layout engine furthermore allows a flexible adjustment of the screen space that is used by a chart. Charts of particular interest may get more space in order to render information in more detail. The Trace View window can host an arbitrary number of charts. Charts can be added by clicking on the respective icon in the Charts toolbar or the corresponding Chart
43. Using the opacity slider (Figure 4.31), the individual function occurrences become visible in the Master Timeline.

Figure 4.30: MPI_Wait invocations with longest duration

Figure 4.31: Using the opacity slider to reveal MPI_Wait invocations in the timeline together with the superimposed color-coded duration
44. aining important characteristics of the opened trace file. These Trace Properties are displayed in the Context View dialog (Section 4.3.3) and can be opened via the main menu under File → Get Info. The information originates from the trace file and includes details such as file name, creator, or the OTF version.

4 Performance Data Visualization

This chapter deals with the different charts that can be used to analyze the behavior of a program and the comparison between different function groups, e.g., MPI and Calculation. Communication performance issues are regarded in this chapter as well. Various charts address the visualization of data transfers between processes. The following sections describe them in detail.

4.1 Timeline Charts

A very common chart type used in event-based performance analysis is the so-called timeline chart. This chart type graphically presents the chain of events of monitored processes or counters on a horizontal time axis. Multiple timeline chart instances can be added to the Trace View window via the Chart menu or the Charts toolbar.

Note: To measure the duration between two events in a timeline chart, Vampir provides a tool called Ruler. The Ruler is enabl
45. an be re-ordered by clicking and dragging the process label at the front of each row. If a process has been recorded with subordinated information, like threads, this information can be hidden and exposed by clicking the black arrow shape in front of the process label. Some function invocations are very short and are therefore not shown in the overall view due to a lack of display pixels. A zooming mechanism is provided to inspect a specific time interval in more detail. For further information on zooming, see Section 3.3. If zooming has been performed, scrolling in the horizontal direction is possible with the mouse wheel or the scroll bar at the bottom.

The Process Timeline resembles the Master Timeline with slight differences. The chart's timeline is divided into levels, which represent the different call stack levels of function calls. The initial function begins at the first level, a sub-function called by that function is located a level beneath, and so forth. If a sub-function returns to its caller, the graphical representation also returns to the level above.

In addition to the display of categorized function invocations, Vampir's Master and Process Timeline also provide information about communication events. Messages exchanged between two different processes are depicted as black lines. In timeline charts, the progress in time is reproduced from left to right. The leftmost (starting) point of a message line and its underlying process bar therefore id
46. ce C to the compute iterations of trace A and B. As shown in Figure 6.9, although the initialization of trace A took the longest, this machine was the fastest in computing the calculations.

6.4 Usage of Predefined Markers

Markers in traces point to particular places of interest in the trace data. These markers can be used to navigate in the trace files. For trace file comparison, markers are interesting due to their potential to quickly locate places in large trace data sets. With the help of markers it is possible to find the same location in multiple trace files with just a few clicks.

[Figure: Comparison View with timelines of two trace files and a Marker View listing MARMOT warnings and errors by type, process, time, duration, and description]
47. cted/deselected by clicking into the check box next to it. Furthermore, it is possible to select/deselect multiple items at once. For this, mark the desired entries by clicking their names while holding either the Shift or the Ctrl key. By holding the Shift key, every item between the two clicked items will be marked. Holding the Ctrl key, on the other hand, enables you to add specific items to, or remove them from, the marked ones. Clicking into the check box of one of the marked entries will cause selection/deselection for all of them.

5.1 Process Filter

Figure 5.1: Process Filter

Figure 5.1 shows a typical process representation in the Process Filter dialog. Processes can be filtered by their Process Hierarchy, Communicators, Process Group, and Representative Processes. Items to be filtered are arranged in a spreadsheet representation. In addition to selecting or deselecting an entire group of processes, it is also possible to filter single processes. You can also use
48. dar chart (Figure 4.9) displays counter data and provides the possibility to create custom metrics. In contrast to the Counter Data Timeline, the Performance Radar shows one counter for all processes at once. The values of the counter are displayed in a color-coded fashion. The displayed counter in the chart can be chosen via the context menu entry Set Metric. Custom metrics you have created are listed under this option as well. The option Adjust Bar Height to allows you to change the height of the displayed value bars in the chart. This is useful for traces with a large number of processes. Here the option Adjust Bar Height to → Fit Chart Height tries to display all processes in the chart. This provides an overview of the counter data across the entire application run. Set Chart Mode allows you to define whether minimum, maximum, or average values should be shown. This setting comes into effect when multiple measured data points need to be displayed on one pixel. If Maximum or Minimum is active, the data point

[Figure: Trace View with the Performance Radar showing the values of metric PAPI_FP_OPS over time for Process 0 to Process 15]
49. ds a list of trace files to be compared in the current session. The list is editable at any time using the plus and minus buttons. Clicking the OK button will load the respective trace files and open the Comparison View.

Figure 6.3: Comparison Session Manager listing three trace files for comparison

Figure 6.4 shows the resulting Comparison View. As indicated by the navigation toolbars at the top of the figure, all selected trace files are now included in a single Comparison View instance. The files in the view share a coupled zoom. The usage of charts and zooming in this view is described in the next section.
50. ed by default during every zoom operation in a timeline chart. In order to use the Ruler for measurement only, i.e., without performing any zoom, hold the Shift key while clicking on any point of interest in a timeline chart and move the mouse with the left mouse button pressed. A ruler-like pattern appears in the timeline chart, which provides the exact time between the start point and the current mouse position.

4.1.1 Master Timeline and Process Timeline

In the Master Timeline and the Process Timeline, detailed information about functions, communication, and synchronization events is shown. Timeline charts are available for individual processes (Process Timeline) as well as for a collection of processes (Master Timeline). The Master Timeline consists of a collection of rows. Each row represents a single process, as shown in Figure 4.1. A Process Timeline shows the different levels of function calls in a stacked bar chart for a single process, as depicted in Figure 4.2.

Every timeline row consists of a process name on the left and a colored sequence of function calls or program phases on the right. The color of a function is defined by its group membership, e.g., MPI_Send, belonging to the function group MPI, has the same color (presumably red) as MPI_Recv, which also belongs to the function
51. entify the sender of the message, whereas the rightmost position of the same line represents the receiver of the message. The corresponding function calls usually reflect a pair of MPI communication directives like MPI_Send and MPI_Recv. Collective communication like MPI_Gatherv is also displayed in the Master Timeline, as shown in Figure 4.3. Furthermore, additional information like message bursts, markers, and I/O events is available. Table 4.1 shows the symbols and descriptions of these objects. Since the Process Timeline reveals information of one process only, short black arrows are used to indicate outgoing communication. Clicking on message lines or arrows shows message details like sender process, receiver process, message length, message duration, and message tag in the Context View display.

Figure 4.4: Search for MPI_Bcast in the Master Timeline
52. etween the traces, they need to be aligned properly. This process is described in the next section.

6.3 Alignment of Multiple Trace Files

The Comparison View can shift individual trace files in time, which makes it possible to compare areas of traces that did not occur at the same time. For instance, to compare the compute iterations of the three example trace files, these areas need to be aligned with each other. For the example traces this is required because the initialization of the application took a different amount of time on each of the three machines.

Figure 6.7: Context menu controlling the time offset

There are several ways to shift the trace files in time. One option is to use the context menu of the Navigation Toolbar. A right click on the toolbar reveals the menu, as shown in Figure 6.7. The entry Set Time Offset sets the time offset for the respective trace file manually. The entry Reset Time Offset clears the offset.

Figure 6.8: Alignment in the Navi
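The effect of a time offset can be pictured with a short sketch. The following Python snippet is an illustration only and not part of Vampir: the trace names follow the example files, while the timestamps, the `compute_start` values, and the `align_offsets` helper are hypothetical.

```python
# Hypothetical start times (in seconds) of the first compute iteration
# in each trace; real values would be read off the timeline.
compute_start = {
    "A calcTest.otf": 0.5,
    "B calcTest.otf": 2.0,
    "C calcTest.otf": 3.25,
}

def align_offsets(starts, reference):
    """Return the offset to add to each trace so that its compute
    phase begins at the same time as in the reference trace."""
    ref = starts[reference]
    return {name: ref - t for name, t in starts.items()}

offsets = align_offsets(compute_start, "A calcTest.otf")

# Applying an offset shifts every timestamp of a trace by the same amount.
events_c = [3.25, 4.0, 5.5]
shifted_c = [t + offsets["C calcTest.otf"] for t in events_c]
print(offsets)
print(shifted_c)
```

A single offset per trace is enough here because only the length of the initialization differs; the spacing of events inside each trace is left untouched.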
53. f Invocations: Is greater than 2000; Number of Invocations: Is less than 15000

Figure 5.11: Show functions inside a specified range

This example demonstrates the opposite behavior of the previous example. Here, all functions whose number of invocations lies outside the range between 2000 and 15
54. figured to show the performance counter PAPI_FP_OPS. To identify functions with a high or low FLOP rate, the value range of the color scale can be limited. This is done by dragging the edges of the colored area of the scale to the desired minimum and maximum values. That way, only values inside the chosen range appear color-coded in the chart. Values outside the range are shown in gray. Figure 4.14 and Figure 4.15 depict two examples. Functions with a low FLOP rate are highlighted in Figure 4.14. The color scale is limited to a range between 100 M and 1.6 G FLOPS. The minimum value is raised to 100 M in order to gray out non-computing functions like MPI. In Figure 4.14, all areas with a low FLOP rate are highlighted in red. In this example, these areas represent functions at the beginning of each iteration. Functions with a high FLOP rate are highlighted in Figure 4.15.

Figure 4.15: Highlighted areas with a high FLOP rate

Here the color scale is set to highl
55. for trace files without settings. Another option is to restore the default settings. Then the current preferences of the trace file are reverted.

8 A Use Case

This chapter explains by example how Vampir can be used to discover performance problems in your code and how to correct them.

8.1 Introduction

In many cases the Vampir suite has been successfully applied to identify performance bottlenecks and assist in their correction. To show how the provided toolset can be used to find performance problems in program code, one optimization process is illustrated in this chapter. The following example is a three-part optimization of a weather forecast model including simulation of cloud microphysics. Every run of the code has been performed on 100 cores with manual function instrumentation, MPI communication instrumentation, and recording of the number of L2 cache misses.

Figure 8.1: Master Timeline
56. formation is provided in the Context View. For example, the object-specific information for functions includes properties like Interval Begin, Interval End, and Duration, shown in Figure 4.25. Objects may provide additional information for some items. In that case, such items are displayed as links. A click (double-click on OS X systems) on the link opens a new tab containing the additional information. The Context View may contain several tabs. A new empty tab can be added by clicking the symbol on the right-hand side. Information about newly selected objects is always displayed in the currently active tab. The Context View offers a mode for comparing information between tabs. The button on the left-hand side allows two objects to be chosen for comparison. It is possible to compare different objects from different charts. This might be useful in some analysis cases. The comparison shows a list of common properties along with the corresponding values. Differences are displayed as well. The first line always indicates the names of the respective charts, see Figure 4.26.

Figure 4.24: A chosen marker (A) and its representation in the Marker View (B)
57. forschung innovation GWT

Vampir 8 User Manual

Copyright 2013 GWT-TUD GmbH, Blasewitzer Str. 43, 01307 Dresden, Germany
http://gwtonline.de

Support / Feedback / Bugreports

Please provide us feedback! We are very interested to hear what people like, dislike, or what features they are interested in. If you experience problems or have suggestions about this application or manual, please contact service@vampir.eu. When reporting a bug, please include as much detail as possible so that it can be reproduced. Please send the version number of your copy of Vampir along with the bug report. The version is stated in the About Vampir dialog, accessible from the main menu under Help → About Vampir. Please visit http://vampir.eu for updates.

Manual Version: 2013-06 / Vampir 8.1

Contents

1.1 Event-based Performance Tracing and Profiling
1.2 The Open Trace Format (OTF)
1.3 Vampir and Windows HPC Server 2008
2 Getting Sta
58. gation Toolbar

The easiest way to achieve a coarse alignment is to drag the trace file in the Navigation Toolbar. While holding the Ctrl (Cmd on Mac OS X) modifier key, the trace can be dragged to the desired position with the left mouse button. In Figure 6.8 the compute iterations of all example trace files are coarsely aligned.

Figure 6.9: Alignment in the Master Timeline

After the coarse shifting, a finer alignment can be achieved in the Master Timeline or Process Timeline charts. To do so, the user needs to zoom into the area to compare. Then, while keeping the Ctrl (Cmd on Mac OS X) modifier key pressed, the trace can be dragged with the left mouse button in the Master Timeline. Figure 6.9 depicts the process of dragging tra
59. ging to that type are selected in the Master Timeline and the Process Timeline. By holding the Ctrl or Shift key, multiple marker events can be selected. If exactly two marker events are selected, the zoom is set automatically to the occurrence time of the markers.

Figure 4.23: Function Legend

4.3.3 Context View

As implied by its name, the Context View provides detailed information about a selected object in addition to its graphical representation. An object, e.g., a function, function group, message, or message burst, can be selected directly in a chart by clicking its graphical representation. For different types of objects different context in
60. he trace file in a file with the ending .vsettings. This way it is possible to adjust the colors for individual trace files without interfering with others. The options Import Preferences and Export Preferences load and save the preferences of arbitrary trace files.

7.1 General Preferences

The General settings allow application- and trace-specific values to be changed. Show time as decides whether the time format for the trace analysis is based on seconds or ticks. With the Automatically open context view option disabled, Vampir does not open the Context View after the selection of an item like a message or function. Use color gradient in charts switches the color gradient used in the performance charts on or off. The next option changes the style and size of the font. Show source code enables the internal source code viewer. This viewer shows the source code corresponding to selected locations in the trace file. To open a source file, first click on the intended function in the Master Timeline and then on the source code path in the Context View. For the source code location to work properly, you need a trace file with source code location support. The path to the source file can be adjusted in the Preferences dialog. A limit for the size of the source file to be opened can be set, too. In the Analysis section the number of analysis threads can be chosen. If this option is disabled, Vampir determines
61. hether to save or discard changes. Usually the settings are stored in the folder of the trace file. If the user has no write access to it, the settings can be placed in the Application Data Folder instead. All such stored settings are listed in the tab Locally Stored Preferences with creation and modification date. Note: On loading, Vampir always favors settings in the Application Data Folder.

Figure 7.3: Saving Policy Settings

Default Preferences offers to save the preferences of the current trace file as default settings. Then they are used
62. hion as the cache expects, so-called cache misses occur and the specific instructions have to suspend execution until the requested data arrives from main memory. A high cache miss rate therefore indicates that performance might be improved by reordering the memory access pattern to match the cache layout of the platform.

Problem

As can be seen in the Counter Data Timeline (Figure 8.4), the CLIPPING routine (light blue) causes a high number of L2 cache misses. Also, its duration is long enough to make it a candidate for inspection. These inefficiencies in cache usage were caused by nested loops that accessed data in a very random, non-linear fashion. Data access can only profit from the cache if subsequent read calls access data in the vicinity of the previously accessed data.

Solution

After reordering the nested loops to match the memory order, the tuned version of the CLIPPING routine now needs only a fraction of the original time (Figure 8.5).

Figure 8.4: Before Tuning; Counter Data Timeline revealing a high amount of L2 cache misses inside the CLIPPING rout
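Why loop order matters can be seen from the memory offsets that a nested loop touches. The sketch below is an illustration under stated assumptions and not the CLIPPING routine itself: it assumes a two-dimensional array stored row by row (row-major) in one flat buffer, and the shape as well as the names `offsets` and `max_stride` are made up for this example.

```python
ROWS, COLS = 4, 6  # hypothetical array shape

def offsets(row_outer):
    """Flat, row-major offsets in the order a nested loop visits them."""
    if row_outer:
        # Inner loop walks along a row: neighbouring elements in memory.
        return [r * COLS + c for r in range(ROWS) for c in range(COLS)]
    # Inner loop walks down a column: each step jumps COLS elements.
    return [r * COLS + c for c in range(COLS) for r in range(ROWS)]

def max_stride(seq):
    """Largest distance between two consecutive accesses."""
    return max(abs(b - a) for a, b in zip(seq, seq[1:]))

print(max_stride(offsets(row_outer=True)))
print(max_stride(offsets(row_outer=False)))
```

Both loop orders visit exactly the same elements; only the distance between consecutive accesses changes, and that distance decides whether the next element is likely to be in the cache already. For a column-major layout, as used by Fortran, the roles of the two loop orders are swapped.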
63. ibrary allows MPI communication events of a parallel program to be recorded in a trace file. Additionally, certain program-specific events can be included. To record MPI communication events, simply re-link the program with the VampirTrace library. A new compilation of the program source code is only necessary if program-specific events should be added. To perform measurements with VampirTrace, the application program needs to be instrumented, which is done automatically. All the necessary instrumentation steps are handled by the compiler wrappers of VampirTrace (vtcc, vtcxx, vtf77, vtf90, and the additional wrappers mpicc-vt, mpicxx-vt, mpif77-vt, and mpif90-vt in Open MPI 1.3). All compile and link commands in the used makefile should be replaced by the VampirTrace compiler wrapper, which performs the necessary instrumentation of the program and links the suitable VampirTrace library. Simply use the compiler wrappers without any parameters, e.g.: vtf90 hello.f90 -o hello. Running a VampirTrace-instrumented application results in an OTF trace file stored in the current working directory where the application was executed. On Linux, Mac OS X, and Sun Solaris the default name of the trace file will be equal to the application name. For other systems the default name is a.otf, but it can be defined manually by setting the environment variable VT_FILE_PREFIX to the desired name. Detailed information about the installation and usage of VampirTrace can
64. ight only areas with the highest FLOP rate. These areas are represented by functions in the compute iterations.

Memory Allocation

Figure 4.16: Functions with 160 MB to 175 MB allocated memory

The performance data overlay can also be used to identify functions with a certain amount of allocated memory. Figure 4.16 shows an example. Here, functions that have between 160 MB and 175 MB of memory allocated are highlighted. The highlighted range of allocated memory can easily be changed by adjusting the value range of the color scale.

4.2 Statistical Charts

4.2.1 Function Summary

The Function Summary chart (Figure 4.17) gives an overview of the accumulated time consumption acros
65. Figure 6.10: Open Marker View

The first step in using markers is to open the Marker View. Figure 6.10 shows a Comparison View with an open Marker View. The markers of all open traces are shown combined in one Marker View. After a click on a marker in the Marker View, the respective marker is highlighted in the Master Timeline and the Process Timeline.

Another way to navigate to a marker in the timeline charts is to use the Vampir zoom. If the user has zoomed the Master Timeline or the Process Timeline to the desired zoom level, a click on a marker in the Marker View shifts the timeline zoom to the marker position. Thus, the selected marker appears in the center of the timeline chart, see Figure 6.11.
66. ine (light blue)

Figure 8.5: After Tuning; visible improvement of the cache usage

8.3 Conclusion

By using the Vampir toolkit, three problems have been identified. As a consequence of addressing each problem, the duration of one iteration has been decreased from 3.5 seconds to 2.0 seconds.

Figure 8.6: Overview showing a significant overall improvement

As shown by the Ruler (see Section 4.1) in Figure 8.6, two large iterations now take 84 seconds to finish, whereas at first (Figure 8.1) they took roughly 140 seconds, which corresponds to a reduction in run time of 40%. This huge improvement has been achieved by using the insight into the program's runtime behavior provided by
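The numbers quoted in this conclusion can be checked with a short calculation. The snippet below is only a worked example using the two Ruler readings from the text (140 seconds before and 84 seconds after tuning); the variable names are chosen for illustration.

```python
before_s = 140.0  # two large iterations before tuning (Figure 8.1)
after_s = 84.0    # the same two iterations after tuning (Figure 8.6)

time_saved = (before_s - after_s) / before_s  # fraction of run time removed
speedup = before_s / after_s                  # how many times faster

print(f"run time reduced by {time_saved:.0%}")
print(f"speedup factor {speedup:.2f}")
```

The 40 percent figure is the share of run time that was saved. Expressed as a speedup factor, the same two measurements give roughly 1.67.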
67. ing the command line interface, see Chapter 2.3. At the first start Vampir will display instructions for license installation.

2.2 Generation of Performance Data

The generation of trace log files for the Vampir performance visualization tool requires a working monitoring system to be attached to your parallel program. The following software packages provide compatible monitoring systems with built-in support for the Vampir performance data file format.

2.2.1 Score-P

Score-P is the recommended code instrumentation and run-time measurement framework for Vampir 8. The goal of Score-P is to simplify the analysis of the behavior of high performance computing software and to allow the developers of such software to find out where and why performance problems arise, where bottlenecks may be expected, and where their codes offer room for further improvements with respect to the run time. A number of tools have been around to help in this respect, but typically each of these tools has only handled a certain subset of the questions of interest. A crucial problem in the traditional approach used to be the fact that each analysis tool had its own instrumentation system, so the user was commonly forced to repeat the instrumentation procedure if more than one tool was to be employed. In this context, Score-P offers the user a maximum of convenience by providing the Opari2 instrumentor as a common infrastructure for a number of analysis tools like Periscope
68. Figure 4.28: Custom metrics editor showing the construction of a custom Wait Time metric. The metric is defined by the addition of the duration of the MPI_Irecv and MPI_Wait functions.

4.4.2 Examples

Figure 4.29: Construction of a custom metric showing the MPI_Wait duration

In Vampir it is also possible to identify long-running functions. In this example, long-running invocations of the function MPI_Wait are highlighted. The first step is to construct a custom metric showing the MPI_Wait duration. The custom metric editor is described in more detail in Section 4.4. The constructed custom metric is depicted in Figure 4.29. Then the performance data overlay is used to show the custom metric in the Master Timeline. The color scale is configured to show only MPI_Wait invocations with a long duration. After the areas with the longest duration (deep red) have been identified, zooming into such an area will eventually reveal the respective MPI_Wait invocations
69. ions in the Master Timeline. The color scale of the performance data overlay is freely customizable. Clicking the wrench icon in the overlay control window opens the color scale options dialog. The color scale provides three modes: Default, Highlight, and Find. Additionally, the Custom mode allows the color scale to be adapted manually to the user's own preferences.

Examples

This section illustrates the usage of the Performance Radar chart and the Master Timeline overlay in a few examples. The trace file used for the examples shows a WRF

Figure 4.13
70. is convenient to define a placeholder variable to indicate whether a preparation step like an instrumentation is desired or only the pure compilation and linking. For example, if this variable is called PREP, then the lines defining the C compiler in the makefile can be changed from MPICC = mpicc to MPICC = $(PREP) mpicc (and analogously for linkers and other compilers). One can then use the same makefile to either build an instrumented version with the make PREP="scorep" command or a fully optimized and not instrumented default build by simply using make in the standard way, i.e., without specifying PREP on the command line. Detailed information about the installation and usage of Score-P can be found in the Score-P user manual (http://www.score-p.org).

2.2.2 VampirTrace

VampirTrace used to be the recommended monitoring facility for Vampir. It is still available as Open Source software but no longer under active development (see Score-P, Section 2.2.1). During a program run of an application, VampirTrace generates an OTF trace file, which can be analyzed and visualized by Vampir. The VampirTrace l
71. ives

• Message Volume in Transit: Aggregated number of bytes in transit via messages
• Simultaneous I/O Operations: Number of interleaved I/O directives
• Simultaneous Messages: Number of interleaved message passing directives
• Time Spent in MPI_Wait: Time spent in MPI_Wait routines

4.4.1 Metric Editor

The Custom Metrics Editor is used to define derived metrics based on existing counters and functions. This is particularly useful as the performance data overlay of the Master Timeline (Section 4.1.3) is capable of displaying such custom metrics as well. The editor is accessible via the list of customizable performance metrics, explained in the previous section, by clicking on the Edit button. Figure 4.28 shows an example construction of a custom metric Wait Time. This metric is the sum of the time spent in the functions MPI_Irecv and MPI_Wait. Custom metrics are built from input metrics that are linked together using a set of available operations. In the editor, the context menu (accessible via the right mouse button) is used to add new input metrics and operations. All created custom metrics become available in the Set Metric selections of the Performance Radar and Counter Data Timeline charts. They are available as well in the overlay mode of the Master Timeline. Custom metrics can be exported and imported in order to use them in multiple trace files.
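The way a custom metric is assembled from input metrics and operations can be mimicked in a few lines. The following Python sketch is an analogy only and does not use any Vampir interface: the invocation list, the function name SOLVE, and the helpers `duration_metric` and `add` are hypothetical. It also collapses each metric to a single total, whereas Vampir evaluates metrics over time.

```python
# Hypothetical function invocations of one process: (name, enter, leave) in seconds.
invocations = [
    ("MPI_Irecv", 0.0, 0.5),
    ("SOLVE", 0.5, 2.0),
    ("MPI_Wait", 2.0, 2.75),
    ("MPI_Irecv", 3.0, 3.25),
    ("MPI_Wait", 4.0, 5.0),
]

def duration_metric(events, function):
    """Input metric: total time spent in one function."""
    return sum(leave - enter for name, enter, leave in events if name == function)

def add(*metrics):
    """Operation node: adds its input metrics."""
    return sum(metrics)

# "Wait Time" = duration of MPI_Irecv + duration of MPI_Wait
wait_time = add(
    duration_metric(invocations, "MPI_Irecv"),
    duration_metric(invocations, "MPI_Wait"),
)
print(wait_time)
```

Keeping input metrics and operations as separate building blocks mirrors the editor: the same input metric can be reused under a different operation without being redefined.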
72. ler-callee relationship. The entries of the Call Tree can be sorted in various ways. Simply click on one header of the tree representation to use its characteristic to re-sort the Call Tree. Please note that not all available characteristics are enabled by default. To add or remove characteristics, use the Set Metric sub-menu of the context menu. To browse through the different function calls, it is possible to fold and unfold the levels of the tree. This can be achieved by double-clicking a level or by using the fold level buttons next to the function name. Functions can be called by many different caller functions, which is hardly obvious in the tree representation. Therefore, a relation view shows all callers and callees of the currently selected function in two separate lists, shown in the lower area in Figure 4.22. To find a certain function by its name, Vampir provides a search option accessible via the context menu entry Find. The entered keyword has to be confirmed by pressing the Return key. The Previous and Next buttons can be used to flip through the results.
73. llows the desired set of functions and function groups to be selected directly. Available options:

• Contains: The selected functions are shown
• Does not contain: The selected functions are filtered

Filtering Functions by Duration

Functions can also be filtered by their duration. The duration of a function refers to the time spent in this function from entry to exit. There are two options available:

• Is greater than: All functions whose duration is longer than the specified time are shown
• Is less than: All functions whose duration is shorter than the specified time are shown

Filtering Functions by Number of Invocations

The number of invocations of a function can also be used as a filter rule. This criterion refers to how often a function is executed in an application. There are two possible filter rules in this mode.

Number of Invocations shows functions based on their total number of invocations in the whole application run. There are two options available:

• Is greater than: All functions whose number of invocations is greater than the specified number are shown
• Is less than: All functions whose number of invocations is less than the specified number are shown

Number of Invocations per Process shows functions based on their individual number of invocations per process. Hence, if the number of invocations of a function varies over different processes, this function might be shown for
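The filter rules described above behave like simple predicates on a per-function value. The sketch below is a hedged illustration in Python, not Vampir's implementation: the invocation counts are invented, the helper names are hypothetical, and it assumes that a function is shown only if it satisfies every rule.

```python
# Hypothetical invocation counts per function, summed over all processes.
invocations = {
    "read": 12876,
    "free": 3840,
    "MPI_Bcast": 3584,
    "write": 2160,
    "malloc": 150,
    "MPI_Send": 48000,
}

def is_greater_than(limit):
    """Rule: keep values strictly above the limit."""
    return lambda count: count > limit

def is_less_than(limit):
    """Rule: keep values strictly below the limit."""
    return lambda count: count < limit

def apply_rules(counts, rules):
    """Return the names of functions whose count satisfies every rule."""
    return sorted(name for name, n in counts.items() if all(rule(n) for rule in rules))

# Show functions invoked more than 2000 and fewer than 15000 times.
shown = apply_rules(invocations, [is_greater_than(2000), is_less_than(15000)])
print(shown)
```

Combining "Is greater than" with "Is less than" in this way selects a range. Replacing `all` with `any` and swapping the two limits would instead select everything outside that range.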
74. mputed from the corresponding event data. Informational charts provide additional or explanatory information regarding timeline and statistical charts. All available charts can be opened with the Charts toolbar, which is explained in Chapter 3.5. In the following sections we will explain the basic functions of the Vampir GUI which are generic to all charts. If you are already familiar with the fundamentals, feel free to skip this chapter. The details of the different charts are explained in Chapter 4.

3.1 Chart Arrangement

Figure 3.2: A Custom Chart Arrangement in the
75. m: Go back to the initial state in vertical zooming.
• Set Metric: Set the values which should be represented in the chart, e.g., change from Exclusive Time to Inclusive Time.

Figure 3.6: Undocking of a Chart
76. metric. Therefore, the Set Metric sub-menu of the context menu can be used to switch between Aggregated Message Volume, Message Size, Number of Messages, and Message Transfer Rate.

The group base can be selected via the context menu entry Group By. Possible options are Message Size, Message Tag, and Communicator (MPI).

Note: There will be one bar for every occurring group. However, if the metric is set to Message Transfer Rate, the minimal and the maximal transfer rate are given in an additional small bar beneath the main bar, which shows the average transfer rate. The additional bar starts at the minimal rate and ends at the maximal rate, see Figure 4.19.

In order to filter out messages, click on the associated label or color representation in the chart and then choose Filter from the context menu.

4.2.4 Communication Matrix View

The Communication Matrix View is another way of analyzing communication imbalances. It shows information about messages sent between processes.
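The two statistics described here, transfer rates per message group and message data per sender/receiver pair, can be illustrated with a small Python sketch. It is not Vampir's implementation: the `Message` record, its field names, and both helper functions are hypothetical, and the average is assumed to be the arithmetic mean of the per-message rates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical message record; field names are assumptions."""
    sender: int
    receiver: int
    size_bytes: int
    duration_s: float
    tag: int = 0


def transfer_rate_summary(messages, group_key=lambda m: m.size_bytes):
    """Group messages (by size unless another key is given) and return
    the minimal, average, and maximal transfer rate in bytes per second."""
    rates_by_group = defaultdict(list)
    for m in messages:
        rates_by_group[group_key(m)].append(m.size_bytes / m.duration_s)
    return {
        group: {"min": min(rates), "avg": sum(rates) / len(rates), "max": max(rates)}
        for group, rates in rates_by_group.items()
    }


def communication_matrix(messages, num_processes):
    """Return matrix[sender][receiver] holding the aggregated message volume."""
    matrix = [[0] * num_processes for _ in range(num_processes)]
    for m in messages:
        matrix[m.sender][m.receiver] += m.size_bytes
    return matrix


messages = [
    Message(sender=0, receiver=1, size_bytes=1000, duration_s=0.5),
    Message(sender=0, receiver=1, size_bytes=1000, duration_s=0.25),
    Message(sender=1, receiver=0, size_bytes=500, duration_s=0.5),
]
summary = transfer_rate_summary(messages)
matrix = communication_matrix(messages, num_processes=2)
```

Passing `group_key=lambda m: m.tag` would group by message tag instead, which mirrors the Group By choice in the chart.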
77. n Chapter 5.4.2 illustrate both modes.

Figure 5.5: Function Filter Dialog with Rule Set

5.4.1 Filter Options

This chapter explains the various options available to build up filter rules.

Filtering Functions by Name

One way of filtering functions is by their name. This filter mode provides two different options.

Name provides a text field for an input string. Depending on the options, all functions whose names match the input string are shown. The matching is not case sensitive. Available options:
• Contains: The given input string must occur in the function name.
• Does not contain: The given input string must not occur in the function name.
• Is equal to: The given input string must be the same as the function name.
• Is not equal to: The given input string must not be the same as the function name.
• Begins with: The function name must start with the given input string.
• Ends with: The function name must end with the given input string.

List of Names provides a dialog that a
78. n Chapter 7.3. Every chart can be undocked or closed by clicking the dedicated icon in its upper right corner, as shown in Figure. Undocking a chart means to free the chart from the current arrangement and present it in its own window. To dock or undock a chart, follow Figure 3.6, respectively Figure.

Figure 3.5: Closing (right) and Undocking (left) of a Chart

Considering that labels, e.g., those showing names or values of functions, often need more space to show their whole text, there is a further option of resizing. In order to read labels completely, it might be useful to alter the distribution of space shared by the labels and the graphical representation in a chart. When hovering over the blank space between labels and graphical representation, a movable separator appears. By dragging the separator decoration with the left mouse button, the chart space provided for the labels can be resized. The whole process is illustrated in Figure.

3.2 Context Menus

All chart displays have their own context menu containing common as well as display-specific entries. In this section, only the most common entries are discussed. A context menu can be accessed by right-clicking anywhere in the chart window. Common entries are:
• Reset Zoom: Go back to the initial state in horizontal zooming.
• Reset Vertical Zoo
79. n the chart by enabling them via the context menu under Options.

Figure 4.7: Select metric dialog

The context menu entry Select Metric opens the selection dialog depicted in Figure 4.7. This dialog allows you to choose the counter displayed in the chart. Each counter is defined by its metric and its measuring point. Note: Depending on the measurement, not all metrics might be available on all measuring points. The two left buttons in the dialog decide whether the counter should be selected by metric or by measuring point first.

In the case of Select by Metric, the option Summarize multiple measuring points is also available. This option allows you to identify outliers by summarizing counters, e.g., PAPI_FP_OPS, over multiple measuring points, e.g., processes. Hence, when this option is active, multiple measuring points can be selected. The counter for the selected metric is then summarized over all selected measuring points. The displayed counter graphs in the chart then need to be read as follows: The yellow average line in the middle displays the average value, e.g., PAPI_FP_OPS
80. name starts with an arbitrary common prefix defined by the user. The master file is always named name.otf. The global definition file is named name.0.def. Events and local definitions are placed in files name.x.events and name.x.defs, where the latter files are optional. Snapshots and statistics are placed in files named name.x.snaps and name.x.stats, which are optional, too.

Note: Open the master file (.otf) to load a trace. When copying, moving, or deleting traces, it is important to take all associated files into account; otherwise Vampir will render the whole trace invalid. Good practice is to keep all files belonging to one trace in a dedicated directory.

Detailed information about the Open Trace Format can be found in the Open Trace Format (OTF) documentation.

1.3 Vampir and Windows HPC Server 2008

The Vampir performance visualization tool usually consists of a performance monitor, e.g., Score-P (see Section 2.2.1) or VampirTrace (see Section), that records performance data, and a performance GUI, which is responsible for the graphical representation of the data. In Windows HPC Server 2008, the performance monitor is fully integrated into the operating system, which simplifies its employment and provides access to a wide range of system metrics. A simple execution flag controls the generation of performance data. This is very convenient and an important difference to solutions based on explicit source, object, or binary modificatio
81. ng to one trace. The background color can be changed by clicking the respective colored rectangle next to the trace file path in the Navigation Toolbar.

Figure 6.5: Comparison View with open charts

Figure 6.5 depicts a Comparison View with open Master Timeline, Process Timeline, and Function Summary charts.
82.

5 Information Filtering and Reduction

Due to the large amount of information that can be stored in trace files, it is usually necessary to reduce the displayed information according to some filter criteria. In Vampir, there are different ways of filtering. It is possible to limit the displayed information to a certain choice of processes or to specific types of communication events, e.g., to certain types of messages or collective operations. Deselecting an item in a filter means that this item is fully masked. In Vampir, filters are global. Therefore, masked items will no longer show up in any chart. Filtering affects not only all performance charts but also the Zoom Toolbar. All filters can be reached via the Filter entry in the main menu. The available filters and their respective filter criteria are summarized in Table 5.1.

Filtered Object          Filter Criteria
Processes                Process Groups, Communicators, Process Hierarchy, Representative Processes
Messages                 Message Communicators, Message Tags
Functions                Name, Duration, Number of Invocations
Collective Operations    Communicators, Collective Operations
I/O Events               I/O Groups, File Names, Operation Types

Table 5.1: Object Filtering Options

Note: The available selection methods are the same across all filter dialogs except the Function Filter. The check box Include/Exclude All either selects or deselects every item. Specific items can be sele
83. ns. Windows HPC Server 2008 is shipped with a translator which produces trace log files in Vampir's Open Trace Format (OTF). The resulting files can be visualized with the Vampir performance data browser.

http://www.tu-dresden.de/zih/otf

2 Getting Started

2.1 Installation of Vampir

Vampir is available for all major platforms. Its installation process depends on the target operating system. The following sections explain the particular installation steps for each system.

2.1.1 Linux/Unix

An installer package is provided for Linux/Unix systems. To install Vampir, run the installer from the command line:

lvamnpir o 1l 0 85L8Bd3fd l10x 1822 8e6tup b n

Additional instructions are provided during installation. For an overview of all available options, run the installer package with the option ne1p. It is possible to run the installer in silent (unattended) mode with the s command line option. In this case, the installer assumes default values for all options.

By default, the installer associates Vampir with OTF and OTF2 files (.otf, .otf2). This allows you to quickly open a trace file by double-clicking its master file. Furthermore, a desktop icon and desktop-dependent menu items are generated. During the first start of Vampir, the license installation is completed. Finally, Vampir can be launched via the respective desktop icon or by using the command line interface (see Section 2.3).

2.1.2 Mac OS X

Open the dmg installati
84. nu of the chart allows you to customize the color scale to your own preferences.

Master Timeline Overlay Mode

Figure 4.11: Master Timeline with active performance data overlay

Figure 4.11 shows an overview of the performance data overlay mode available in the Master Timeline chart. The overlay is capable of displaying all metrics available in the Performance Radar chart and the Counter Data Timeline chart. It is activated via the chart's context menu under Options → Performance Data. When the overlay mode is active, a control window appears at the top of the Master Timeline chart. It allows you to configure the overlay and to select the displayed performance data metric.
85. oaded trace files, as depicted in Figure 2.2. Selecting a list entry and clicking the Open button loads the respective trace. The recent list is empty when Vampir is started for the first time.

2.3.1 Loading a Trace File

To open an arbitrary trace file, click on Open Other or select Open in the File menu, which provides the file open dialog depicted in Figure 2.3.

Figure 2.3: Loading a trace file in Vampir

It is possible to filter the files in the list. The file type input selector determines the visible files. The default is All trace files (otf, otf2, elg, esd), which only shows trace files that can be processed by the tool. All file types can be displayed by using All Files. After selection of the trace file, the loading process is started by a click on the Open button.

Alternatively, a command line invocation is possible. The following command line sequence shows an example for a Windows system. Other platforms work accordingly.

C:\Program Files\Vampir\Vampir.exe trace file

To open multiple trace files at once, you can give them one after another as command line arguments:

C:\Program Files\Vampir\Vampir.exe file 1 ... file n

If Vampir was associated with otf/otf2 files during the installation process, it is also possible to start the application by double-clicking an otf otf
86. ol

Figure 1.1: Representation of Streams by Multiple Files

OTF uses a special ASCII data representation to encode its data items with numbers and tokens in hexadecimal code without special prefixes. That enables a very powerful format with respect to storage size, human readability, and search capabilities on timed event records.

In order to support fast and selective access to large amounts of performance trace data, OTF is based on a stream model, i.e., single separate units representing segments of the overall data. OTF streams may contain multiple independent processes, whereas a process belongs to a single stream exclusively. As shown in Figure, each stream is represented by multiple files which store definition records, performance events, status information, and event summaries separately. A single global master file holds the necessary information for the process-to-stream mappings. Each file
87. ol window also provides an opacity control slider. This slider allows you to adjust the opacity of the overlay and thus makes the underlying functions easily visible without the need to disable the overlay mode.

4.1.2 Counter Data Timeline

Counters are values collected over time to count certain events like floating point operations or cache misses. Counter values can be used to store not just hardware performance counters but arbitrary sample values. There can be counters for different statistical information as well, for instance counting the number of function calls or a value in an iterative approximation of the final result. Counters are defined during the instrumentation of the application and can be individually assigned to processes.

Figure 4.6: Counter Data Timeline

An example Counter
88. Figure 6.1: Comparison View

The Comparison View window depicted in Figure provides all comparison features. This chapter introduces its usage with the help of screenshots. For this purpose, the comparison of three trace files is demonstrated step by step. The example trace files show one test application performing ten iterations of simple calculations. Each trace represents the run of this application on a different machine.

6.1 Starting and Saving a Comparison Session

Figure 6.2: Vampir start window

The first step in order to compare trace files in Vampir is to start a comparison session. A comparison session is set up using the Comparison Session Manager. This dialog is accessible via the main menu entry File → Comparison Session Manager or by clicking the Open Other button in the Vampir start window (Figure). The Comparison Session Manager depicted in Figure 6.3 hol
89. on package and drag the Vampir icon into the applications folder on your computer. You might need administrator rights to do so. Alternatively, you can also drag the Vampir application to another directory that is writable for you. After that, double-click on the Vampir application and follow the instructions for license installation.

2.1.3 Windows

On Windows platforms, the provided Vampir installer makes the installation simple and straightforward. Just run the installer and follow the installation wizard. Install Vampir in a folder of your choice, e.g., C:\Program Files. In order to run the installer in silent (unattended) mode, use the s option. It is also possible to specify the output folder of the installation with D dir. An example of a silent installation command is as follows:

Vampir 8 1 0 Standard x86 setup exe S D C Program Files

You also have the option to associate Vampir with OTF and OTF2 files (.otf, .otf2) during the installation process. This allows you to load a trace file quickly by double-clicking its master file. Subsequently, Vampir can be launched by double-clicking its icon or by us
90. or function group or to hide arbitrary functions and function groups from the displayed information. To mark the function or function group to be profiled or filtered, just click on the associated color representation in the chart. The context menu entries Profile of Selected Function Group and Filter Selected Function Group will then provide the possibility to profile or filter the selected function or function group. Using the Process Filter (see Section) allows you to restrict this view to a set of processes.

The context menu entry Sort by allows you to order function profiles by Number of Clusters. This option is only available if the chart is currently showing clusters. Otherwise, function profiles are sorted automatically by process. While profiling one function, the menu entry Sort by Value allows you to order functions by their execution time.

4.2.3 Message Summary

The Message Summary is a statistical chart showing an overview of all messages grouped by certain characteristics (Figure 4.19).

Figure 4.19: Message Summary Chart with metric set to Message Transfer Rate, showing the average transfer rate (A) and the minimal/maximal transfer rate (B)

All values are represented in a bar chart fashion. The number next to each bar is the group base, while the number inside a bar depicts the values depending on the chosen
91. overview snapshot. Likewise, the time range to be loaded can be set explicitly in the input fields From and To. If markers are available in the trace file, their timing information can be used as reference points as well. Two markers need to be selected first; use Shift + mouse click for the second marker. Next, click on Zoom Between Marker to set the respective time interval in the From and To input fields. The event data to be loaded can also be restricted to certain processes or threads of execution by disabling unwanted instances in the selection area entitled Processes (see Section 5.1 for further details). Once the data subset of interest is specified, a click on the OK button starts the loading process.
92. Figure 5.15: Showing only functions with a call level less than five

6 Comparison of Trace Files

In Vampir, the comparison of trace files integrates seamlessly with the functionality explained in the previous chapters of this document. The user can benefit from experience already gained. For the comparison of performance characteristics, all common charts are provided. Additionally, in order to compare multiple trace files effectively, their zoom is coupled and synchronized. For the comparison of areas of interest, the displayed trace regions can be shifted freely in time. This allows for arbitrary alignments of the trace files and thus enables comparison of user-selected areas in the trace data.
93. Figure 4.25: Context View showing context information (B) of a selected function (A)

Figure 4.26: Comparison between Context Information

4.4 Customizable Performance Metrics

Vampir is shipped with a set of predefined customizable metrics that reflect known sources of performance issues and can serve as a starting point for application-specific customizations. Figure 4.27 shows the list of custom metrics that are predefined in Vampir. The list is accessible via the conte
94. r Timeline and Process Timeline without filtering

Showing only MPI Functions

In this example, only functions that contain the string "mpi" (not case sensitive) somewhere in their name are shown. Since only MPI functions start with "MPI" in their name, this filter setting shows all MPI functions and filters the others.

Filter rule: Show only functions that match any of the following conditions: Name Contains "mpi"

Figure 5.7: Showing only MPI

Showing only Functions with at least 250 ms Duration

This example demonstrates the filtering of functions by their duration. Here, only long function occurrences with a minimum duration of 250 ms are shown. All other functions are filtered.

Filter rule: Show only functions that match any of the following conditions: Duration Is greater than 250 Milliseconds
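The two filter rules of these examples can be mimicked outside the GUI with a short Python sketch. This is an illustration only and not Vampir's filter engine: the `FunctionCall` record and the helpers `name_contains`, `duration_greater_than`, and `apply_filter` are hypothetical names, and the `match_all` switch is an assumption about how several conditions could be combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionCall:
    """Hypothetical record of one function occurrence in a trace."""
    name: str
    enter_s: float
    leave_s: float

    @property
    def duration_s(self) -> float:
        # Duration: time from the entry to the exit of the function.
        return self.leave_s - self.enter_s


def name_contains(text):
    """Rule 'Name Contains <text>'; the matching is not case sensitive."""
    needle = text.lower()
    return lambda call: needle in call.name.lower()


def duration_greater_than(seconds):
    """Rule 'Duration Is greater than <seconds>'."""
    return lambda call: call.duration_s > seconds


def apply_filter(calls, rules, match_all=False):
    """Keep the calls that match any rule (or all rules if match_all is set)."""
    combine = all if match_all else any
    return [call for call in calls if combine(rule(call) for rule in rules)]


calls = [
    FunctionCall("MPI_Wait", 0.0, 0.5),
    FunctionCall("solve_em", 1.0, 1.125),
    FunctionCall("mpi_send", 2.0, 2.0625),
]
only_mpi = apply_filter(calls, [name_contains("mpi")])
only_long = apply_filter(calls, [duration_greater_than(0.25)])
```

Here `only_mpi` keeps both MPI calls regardless of letter case, while `only_long` keeps only the occurrence that lasts longer than 250 ms.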
95. rance

In the Appearance settings of the Preferences dialog, there are six different objects for which the color options can be changed: functions/function groups, markers, counters, collectives, messages, and I/O events. Choose an entry and click on its color to make a modification. A color picker dialog opens where it is possible to adjust the color. For messages and collectives, a change of the line width is also available. In order to quickly find the desired item, a search box is provided at the bottom of the dialog.

Figure 7.2: Appearance Settings

7.3 Saving Policy

Vampir detects whenever changes to the various settings are made. In the Saving Policy dialog, it is possible to adjust the saving behavior of the different components to your own needs. In the dialog Saving Behavior, you tell Vampir what to do in the case of changed preferences. The user can choose the categories of settings, e.g., the layout, that should be affected by the selected behavior. Possible options are that the application automatically Always or Never saves changes. The default option is to have Vampir asking you w
96. ration type.

5.3 I/O Filter

Figure 5.3: I/O Filter (dialog panels: I/O Groups, File Names, Operation Types)

Figure 5.3 depicts the I/O Filter dialog. The dialog allows you to selectively filter I/O events displayed in timelines and statistics. Available filter criteria are I/O Groups and the I/O Operation Type. It is also possible to filter I/O operations based on input and output files.

5.4
97. rations, if they are included in the trace file. I/O events are depicted as triangles at the beginning of an I/O interval. In order to see the whole interval of a single I/O event, its triangle has to be selected. In that case, a second triangle indicating the end of the interval appears. Multiple I/O events are tricolored and drawn as a triangle with a line to the end of the interval.

Table 4.1: Additional Information in the Master and Process Timeline

The Master Timeline also provides the possibility to search for function and function group occurrences. In order to activate the search mode, use the context menu and select Find. After activation, an input field appears at the top of the Master Timeline. A search string can be written in this field, and all corresponding function and function group occurrences are highlighted in yellow in the Master Timeline. An example search for the function MPI_Bcast is depicted in Figure.

Furthermore, the Master Timeline also features an overlay mode for performance counter data (Figure 4.5). In order to activate the overlay mode, use the context menu Options → Performance Data. When the overlay mode is active, a control window appears at the top of the Master Timeline. It allows you to select the displayed counter data metric. The counter data is displayed in a color-coded fashion like in the Performance Radar (Section). The color scale can be freely customized by clicking on the wrench icon. The contr
98. rformance data obtained from a parallel program execution can be analyzed with a collection of different performance views. Intuitive navigation and zooming are the key features of the tool, which help to quickly identify inefficient or faulty parts of a program code. Vampir implements optimized event analysis algorithms and customizable displays, which enable fast and interactive rendering of very complex performance monitoring data. Ultra-large data volumes can be analyzed with a parallel version of Vampir, which is available on request.

Vampir has a product history of more than 15 years and is well established on Unix-based HPC systems. This tool experience is also available for HPC systems that are based on Microsoft Windows HPC Server 2008.

1.1 Event-based Performance Tracing and Profiling

In software analysis, the term profiling refers to the creation of tables which summarize the runtime behavior of programs by means of accumulated performance measurements. Its simplest variant lists all program functions in combination with the number of invocations and the time that was consumed. This type of profiling is also called inclusive profiling, as the time spent in subroutines is included in the statistics computation.

A commonly applied method for analyzing details of parallel program runs is to record so-called trace log files during runtime. The data collection process itself is also referred to as tracing a program. Unlike profiling
99. rted
2.1 Installation of Vampir
2.1.1 Linux/Unix
2.1.2 Mac OS X
2.1.3 Windows
2.2 Generation of Performance Data
2.2.3 Event Tracing for Windows (ETW)
2.3 Starting Vampir and Loading Performance Data
2.3.1 Loading a Trace File
2.3.2 Loading a Trace File Subset
3.1 Chart Arrangement
3.6 Properties of the Trace File
4.1 Timeline Charts
4.1.1 Master Timeline and Process Timeline
4.2.1 Function Summary
4.2.5 I/O Summary
4.3 Informational Charts
4.3.1 Function Legend
4.4.1 Metric Editor
100. s all function groups and functions. For example, every time a process calls the MPI_Send function, the elapsed time of that function is added to the MPI function group time. The chart gives a condensed view of the execution of the application. The different function groups can be compared, and dominant function groups can be distinguished easily. The information displayed can be changed via the context menu entry Set Metric, which offers options like Average Exclusive Time, Number of Invocations, Accumulated Inclusive Time, etc.

Figure 4.17: Function Summary (All Processes, Accumulated Exclusive Time per Function Group)

Note: Inclusive means the amount of time spent in a function and all of its subroutines. Exclusive means the amount of time spent in just this function.

The context menu entry Set Event Category specifies whether function groups or functions should be displayed in the chart. Functions are drawn in the color of their function group.
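The difference between inclusive and exclusive time in the note above can be illustrated with a small sketch. It is not Vampir's implementation; it assumes a hypothetical, simplified event list for one process (properly nested enter/leave records without recursion) and a hypothetical mapping from functions to function groups. The names `summarize`, `events`, and `group_of`, and the group name "Application", are made up for this illustration.

```python
from collections import defaultdict

def summarize(events, group_of):
    """Accumulate inclusive and exclusive time per function.

    events: list of (timestamp, kind, function) for a single process,
            with kind being "enter" or "leave" and calls properly nested
            (hypothetical format, not a real trace file layout).
    group_of: dict mapping each function name to its function group.
    """
    inclusive = defaultdict(float)
    exclusive = defaultdict(float)
    stack = []  # each entry: [function, enter_time, time_spent_in_callees]

    for timestamp, kind, function in events:
        if kind == "enter":
            stack.append([function, timestamp, 0.0])
        else:
            name, start, callee_time = stack.pop()
            assert name == function, "enter/leave events must be nested"
            duration = timestamp - start
            inclusive[name] += duration                # function plus its subroutines
            exclusive[name] += duration - callee_time  # just this function
            if stack:
                stack[-1][2] += duration               # charge the caller's callee time

    # Exclusive time of each function is added to its function group.
    groups = defaultdict(float)
    for name, seconds in exclusive.items():
        groups[group_of[name]] += seconds
    return dict(inclusive), dict(exclusive), dict(groups)

# Made-up events: main calls solve (which calls MPI_Send) and MPI_Send directly.
events = [
    (0.0, "enter", "main"),
    (1.0, "enter", "solve"),
    (2.0, "enter", "MPI_Send"),
    (2.5, "leave", "MPI_Send"),
    (4.0, "leave", "solve"),
    (5.0, "enter", "MPI_Send"),
    (5.5, "leave", "MPI_Send"),
    (6.0, "leave", "main"),
]
group_of = {"main": "Application", "solve": "Application", "MPI_Send": "MPI"}

inclusive, exclusive, groups = summarize(events, group_of)
print(inclusive)
print(exclusive)
print(groups)
```

In this sketch the exclusive times of all functions add up to the total run time of the process, which is why accumulated exclusive time is a natural metric for comparing function groups, whereas inclusive times overlap between callers and callees.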
101. Figure 2.4: Selecting a trace data subset to be loaded

3 Basics

After loading has been completed, the Trace View window title displays the trace file's name, as depicted in Figure 3.1. By default, the Charts Toolbar and the Zoom Toolbar are available.

Figure 3.1: Trace View Window with Charts Toolbar (A) and Zoom Toolbar (B)

Furthermore, a default set of charts is opened automatically after loading has been finished. The charts can be divided into three groups: timeline, statistical, and informational charts. Timeline charts show detailed event based information for arbitrary time intervals, while statistical charts reveal accumulated measures which were co
102. some processes and filtered for others. There are two options available:

• Is greater than: All functions whose number of invocations is greater than the specified number are shown.
• Is less than: All functions whose number of invocations is less than the specified number are shown.

Filtering Functions by Call Path

The Call Path filter provides a string input field for a pattern. Depending on the options, all functions with their related events are shown which satisfy a substring match against the given pattern. This filter mode provides two opposing options:

• Contains: The call path must contain a function whose name contains the given pattern. This specifically means that functions that lead to the matched function won't be shown anymore. The matched function itself, along with its possibly called sub-functions, is still shown. All other call paths that do not contain a matched function are filtered out as well and won't be shown.
• Does not contain: The call path must not contain a function whose name contains the given pattern. This specifically means that only functions that lead to the matched function will be shown, excluding the matched function itself as well as its possibly called sub-functions. Call paths that do not contain a matched function are still shown and remain unaffected by the filter.

Filtering Functions by Call Level

Functions can also be filtered by their Call Level
103.
Name                         Description
                             Section 4.1.1
ss Timeline                  Section 4.1.1
Counter Data Timeline        Section 4.1.2
Performance Radar            Section 4.1.3
Function Summary             Section 4.2.1
Message Summary              Section 4.2.3
Process Summary              Section 4.2.2
Communication Matrix View    Section 4.2.4
I/O Summary                  Section
Call Tree                    Section 4.2.6
Function Legend              Section 4.3.1
Marker View                  Section 4.3.2
Context View                 Section 4.3.3

Table 3.1: Icons of the Charts Toolbar

The Charts Toolbar is used to open instances of the available performance charts. It is located in the upper left corner of the Trace View window, as shown in Figure 3.1. The toolbar can be dragged and dropped to alternative positions. The Charts Toolbar can be disabled with the toolbar's context menu entry Charts.

Table 3.1 gives an overview of the available performance charts with their corresponding icons. The icons are arranged in three groups, divided by small separators. The first group represents timeline charts, whose zooming states affect all other charts. The second group consists of statistical charts, providing special information and statistics for a chosen interval. Vampir allows multiple instances for charts of these categories. The last group comprises informational charts, providing specific textual information or legends. Only one instance of an informational chart can be opened at a time.

3.6 Properties of the Trace File

Vampir provides an info dialog cont
104. the Vampir toolkit to optimize the inefficient parts of the code.
105. the number automatically by the number of cores, e.g., two analysis threads on a dual core machine. In the Updates section, the user can decide if Vampir should check automatically for new versions.

Figure 7.1: General Settings

Vampir also features a color blindness support mode. On Linux systems, the Document layout option is also available. If this option is enabled, all open Trace View windows need to stay in one main window. If it is disabled, the Trace View windows can be moved freely over the desktop.

7.2 Appea
106. tion in the context menu of the respective performance chart. Additionally, the zoom can be accessed with the help of the Zoom Toolbar by dragging the borders of the selection rectangle or by scrolling the mouse wheel, as described in Chapter

In order to return to the previous zooming state, an undo functionality, accessible via the Edit menu, is provided. Alternatively, the key combination Ctrl+Z also reverts the last zoom. Accordingly, a reverted zooming action can be redone by selecting Redo in the Edit menu or by pressing Ctrl+Shift+Z. The undo functionality is not bound to single performance charts but works across the entire application. The labels of the Undo and Redo menu entries also state which kind of action will be undone or redone next.

3.4 The Zoom Toolbar

Vampir provides a Zoom Toolbar that can be used for zooming and navigation in the trace data. It is located in the upper right corner of the Trace View window shown in Figure

Its position can be adjusted via drag and drop. The Zoom Toolbar offers an overview and summary of the loaded trace data. The currently zoomed area is highlighted as a rectangle within the Zoom Toolbar. By dragging the two boundaries of the highlighted rectangle, the horizontal zooming state can be adjusted.

Note: Instead of dragging boundaries, it is also possible to use the mouse wheel for zooming. Hover over the Zoom Toolbar and scroll up and down to zoom in and out, respectively.

Dragging
107. xt menu entry Customize Metrics in the Performance Radar or the Counter Data Timeline chart.

Figure 4.27: List of predefined customizable performance metrics

The following time dependent metrics are provided:

• FLOPS in User Defined Function: Floating point performance for a given function, which can be set by the user, see Section 4.4.1
• I/O Bandwidth: Aggregated file I/O bandwidth (requires that I/O events have been recorded)
• IO Volume in Transit: Aggregated number of bytes in transit to and from the I/O system
• MPI Latencies: Duration of individual MPI calls
• Message Data Rate: Bytes per second exchanged with message passing directives
• Message Transfer Times: Latencies of individual message passing direct
108. y high time performing the calculations could indicate an unbalanced distribution of work and therefore can slow down the whole application.

Figure 4.18: Process Summary (Individual Processes, Accumulated Exclusive Time per Function)

The context menu entry Set Event Category specifies whether function groups or functions should be displayed in the chart. Functions are drawn in the color of their function group. The chart calculates statistics based on Number of Invocations, Accumulated Inclusive Time, or Accumulated Exclusive Time. To change between these three modes, use the context menu entry Set Metric. The number of clustered profile bars is based on the chart height by default. You can also disable the clustering or set a fixed number of clusters via the context menu entry Clustering by selecting the corresponding value in the spin box. Located left of the clustered profile bars is a graphical overview indicating the processes associated with each cluster. Moving the cursor over the blue areas in the overview opens a tooltip stating the respective process name. It is possible to profile only one function
