Vampir 7 User Manual

Contents

1. [Figure 1.1: Representation of Streams by Multiple Files. Diagram labels: Local Definitions, Events, Global Definitions, Statistics, Snapshots.] OTF uses a special ASCII data representation to encode its data items, with numbers and tokens in hexadecimal code without special prefixes. That enables a very powerful format with respect to storage size, human readability, and search capabilities on timed event records. In order to support fast and selective access to large amounts of performance trace data, OTF is based on a stream model, i.e. single separate units representing segments of the overall data. OTF streams may contain multiple independent processes, whereas a process belongs to a single stream exclusively. As shown in Figure 1.1, each stream is represented by multiple files, which store definition records, performance events, status information, and event summaries separately. A single global master file holds the necessary information for the process-to-stream mappings. Each file name starts with an arbitrary c
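To make the encoding idea in this excerpt concrete, here is a minimal Python sketch of reading and writing numbers as hexadecimal tokens without a `0x` prefix. The whitespace-separated layout and the helper names are assumptions made for illustration; they are not the actual OTF record grammar.

```python
def parse_hex_tokens(line):
    """Parse whitespace-separated hexadecimal tokens that carry no prefix.

    Hypothetical helper: real OTF records have their own grammar.
    """
    return [int(token, 16) for token in line.split()]


def format_hex_tokens(values):
    """Encode integers as lowercase hexadecimal tokens without a prefix."""
    return " ".join(format(value, "x") for value in values)


values = parse_hex_tokens("1a ff 3e8")
print(values)                     # [26, 255, 1000]
print(format_hex_tokens(values))  # 1a ff 3e8
```

The text stays readable and searchable with ordinary tools, while hexadecimal digits keep large numbers shorter than their decimal form.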
2. Contents: 1.3 Vampir and Windows HPC Server 2008; 2 Getting Started; 2.1 Installation of Vampir; 2.1.1 Unix, Linux; 2.1.2 Mac OS X; 2.1.3 Windows; 2.2 Generation of Trace Data on Windows Systems; 2.2.1 Enabling Performance Tracing; 2.2.2 Tracing an MPI Application; 2.3 Generation of Trace Data on Linux Systems; 2.3.1 Enabling Performance Tracing; 2.3.2 Tracing an Application; 2.4 Starting Vampir and Loading a Trace File; 3.1 Chart Arrangement; 3.6 Properties of the Trace File; 4.1 Timeline Charts; 4.1.1 Master Timeline and Process Timeline; 4.2 Statistical Charts; 4.2.1 Function Summary; 4.2.5 I/O Summary; 4.3 Informational Charts; 4.3.1 Function Legend; 4.4 Information Filtering an
3. of three trace files is demonstrated step by step. The three example trace files shown are measurements of one test application performing ten iterations of simple calculations, run on three different machines. 5.1 Starting the Compare View. The first step in comparing trace files in Vampir is to load the respective files. The trace files can be loaded one after another via File → Open in the main menu. Figure 5.2 shows Vampir with three open trace files. [Figure 5.2: three Vampir Trace View windows, one each for the Comparison A, B, and C traces of calcTest.otf.]
4. [Figure 6.3: Saving Policy Settings.] Default Preferences offers to save the preferences of the current trace file as default settings. They are then used for trace files without settings. Another option is to restore the default settings. The current preferences of the trace file are then reverted. 7 A Use Case. This chapter explains by example how Vampir can be used to discover performance problems in your code and how to correct them. 7.1 Introduction. In many cases the Vampir suite has been successfully applied to identify performance bottlenecks and assist their correction. To show in which ways the provided toolset can be used to find performance problems in program code, one optimization process is illustrated in this chapter. The following example is a three-part optimization of a weather forecast model including simulation of cloud microphysics. Every run of the code has been performed on 100 cores with manual function instrumentation, MPI communication instrumentation, and recording of the number of L2 cache misses. [Screenshot: Vampir Trace View of the SuccessStory trace.]
5. 6 Customization. The appearance of the trace file and various other application settings can be altered in the preferences, accessible via the main menu entry File → Preferences. Settings concerning the trace file itself, e.g. layout or function group colors, are saved individually next to the trace file in a file with the ending .vsettings. This way it is possible to adjust the colors for individual trace files without interfering with others. The options Import Preferences and Export Preferences provide the loading and saving of preferences of arbitrary trace files. 6.1 General Preferences. The General settings allow changing application- and trace-specific values. Show time as decides whether the time format for the trace analysis is based on seconds or ticks. With the Automatically open context view option disabled, Vampir does not open the context view after the selection of an item like a message or function. Use color gradient in charts allows switching off the color gradient used in the performance charts. The next option allows changing the style and size of the font. Show source code enables the internal source code viewer. This viewer shows the source code corresponding to selected locations in the trace file. In order to open a source file, first click on the intended function in the Master Timeline and then on the source code path in the Context View. For the source code locatio
6. [Figure 4.22: Process Filter. The dialog lists Process 0 to Process 15 with check boxes; number of processes: 16, selected processes: 16.] Different selection methods can be used in a filter. The check box Include/Exclude All either selects or deselects every item. Specific items can be selected or deselected by clicking the check box next to them. Furthermore, it is possible to select or deselect multiple items at once. To do so, mark the desired entries by clicking their names while holding either the Shift or the Ctrl key. Holding the Shift key marks every item between the two clicked items. Holding the Ctrl key, on the other hand, adds specific items to, or removes them from, the marked ones. Clicking the check box of one of the marked entries selects or deselects all of them.
Table 4.2: Options of Filtering (Filter Object: Filter Criteria)
Processes: Process Groups, Communicators, Process Hierarchy
Collective Operations: Communicators, Collective Operations
Messages: Message Communicators, [illegible]
I/O Events: I/O Groups, File Names, Operation Types
4.5 Function Filtering. The filtering of functions in Vampir is controlled via the Function Filter Dialog depicted in Figure 4.23. This dialog can be accessed via the
7. [Figure 4.8: Counter Timeline options dialog. Controls: Counter type (Predefined, Custom); Painting preferences (Graph: Line, Points; Minimum, Maximum, Average); Show average line; Show caption; Show zero line.] under Options. It provides the possibility to include or exclude the graph's lines and data points from the chart. It is also possible to enable an average line showing the average value of all data points in the visible area. The counter type setting is used to determine how the data points should be connected. This depends on the type of the counter and is usually predefined during the recording of the trace data. Nevertheless, this setting can also be changed afterwards in Vampir. The Counter Data Timeline chart allows creating custom metrics. This process is described in a dedicated section of the manual. Created custom metrics become available in the Select Metric dialog. 4.1.3 Performance Radar. The Performance Radar chart displays counter data and provides the possibility to create custom metrics. In contrast to the Counter Data Timeline, the Performance Radar shows one counter for all processes at once. The values of the counter are displayed in a color-coded fashion. The displayed counter in the chart can be chosen via the context menu entry Set Metric. Custom metrics you have created are listed under this option as well. The option Adjust Bar Height to allows changing the heigh
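The average line mentioned in this excerpt can be illustrated with a short Python sketch. It assumes a plain arithmetic mean over the samples whose timestamps fall inside the visible interval; whether Vampir weights samples by time is not stated in the text, and the function name is hypothetical.

```python
def visible_average(points, t_start, t_end):
    """Mean value of the (time, value) samples inside [t_start, t_end].

    Assumption: unweighted arithmetic mean; returns None if nothing is visible.
    """
    visible = [value for time, value in points if t_start <= time <= t_end]
    if not visible:
        return None
    return sum(visible) / len(visible)


samples = [(0.0, 2.0), (1.0, 4.0), (2.0, 6.0), (3.0, 100.0)]
print(visible_average(samples, 0.5, 2.5))  # 5.0
```

Zooming changes the visible interval, so the average moves with the zoom; here the outlier at 3.0 s is excluded.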
8. Marker View will shift the timeline zoom to the marker position. Thus the marker appears in the center of the timeline display (see Figure 5.11). [Figure 5.11: Jump to marker in the Master Timeline. Screenshot of the Vampir Compare View with two Marker View panels listing MARMOT errors and warnings by process and time.]
9. Context View provides more detailed information about a selected object than its graphical representation does. An object, e.g. a function, function group, message, or message burst, can be selected directly in a chart by clicking its graphical representation. For different types of objects, different context information is provided by the Context View. For example, the object-specific information for functions includes properties like Interval Begin, Interval End, and Duration, shown in Figure 4.20. The Context View may contain several tabs. A new empty one can be added by clicking the symbol on the right-hand side. Information about newly selected objects is always displayed in the currently active tab. The Context View offers a mode for the comparison of information between tabs. Just use the button on the left-hand side and choose two objects to compare. It is possible to compare different objects from different charts. This can be useful in some [Figure 4.20: Context View screenshot for the Large/wrf.otf trace with an MPI_Wait function selected.]
10. [Figure 4.5: Active overlay showing PAPI_FP_OPS in the Master Timeline.]
Table of symbols (Symbol: Description)
Message Burst: Due to a lack of pixels, it is not possible to display a large amount of messages in a very short time interval. Therefore, outgoing messages are summarized as so-called message bursts. In this representation you cannot determine which processes receive these messages. Zooming into this interval reveals the corresponding single messages.
Markers (single, multiple): To indicate particular points of interest during the run time of an application, like errors or warnings, markers can be placed in a trace file. They are drawn as triangles, which are colored according to their types. To indicate that two or more markers are located at the same pixel, a tricolored triangle is drawn.
I/O Events: Vampir shows detailed information about I/O operations if they are included in the trace file. I/O events are depicted as triangles at the beginning of an I/O interval. In order to see the whole interval of a single I/O event its
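The message burst idea (too many messages for the available pixels) can be sketched in Python as a bucketing step. This is only an illustration under stated assumptions: the function name, the one-column-per-pixel bucketing, and the "more than one message per column" threshold are hypothetical and do not describe Vampir's actual rendering code.

```python
from collections import Counter


def burst_columns(send_times, t_start, t_end, width_px):
    """Count messages per pixel column and keep the crowded columns.

    Assumption: a column holding more than one message is drawn as a burst.
    """
    seconds_per_pixel = (t_end - t_start) / width_px
    columns = Counter()
    for time in send_times:
        if t_start <= time < t_end:
            columns[int((time - t_start) / seconds_per_pixel)] += 1
    return {column: count for column, count in columns.items() if count > 1}


# Three sends within 2 ms collapse into one column at 0.1 s per pixel.
print(burst_columns([0.0, 0.001, 0.002, 5.0], 0.0, 10.0, 100))  # {0: 3}
# Zoomed in to 10 ms across 100 pixels, each message gets its own column.
print(burst_columns([0.0, 0.001, 0.002, 5.0], 0.0, 0.01, 100))  # {}
```

The second call mirrors the statement that zooming into the interval reveals the single messages again.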
11. [Screenshot: Vampir Trace View of the WRF trace wrf.otf with a Master Timeline (Process 0 to Process 14), a Process Timeline, and a Function Summary showing Number of Invocations per Function.] Figure 4.30: Show fun
12. &lt;nproc&gt; &lt;prefix&gt; (VampirTrace: http://www.tu-dresden.de/zih/vampirtrace) 2.4 Starting Vampir and Loading a Trace File. Viewing performance data with the Vampir GUI is very easy. On Windows the tool can be started by double-clicking its desktop icon (if installed) or by using the Start Menu. On a Linux-based machine, run vampir in the directory where Vampir is installed. A double click on the application icon opens Vampir on Mac OS X systems. [Figure 2.2: Loading a Trace Log File in Vampir. Screenshot of the Open Trace File dialog with the file wrf.otf and the file type selector.] To open a trace file, select Open in the File menu, which provides the file open dialog depicted in Figure 2.2. It is possible to filter the files in the list. The file type input selector determines the visible files. The default, OTF Trace Files (*.otf), shows only files that can be processed by the tool. All file types can be displayed by using All Files. Alternatively, a command line invocation is always possible. Shown here is an example for a W
13. [Figure 4.23: Function Filter Dialog.] 4.5.1 Filter Options. This chapter explains the various options available to build up filter rules.
Filtering Functions by Name. One way of filtering functions is by their name. This filter mode provides two different options.
Name provides a text field for an input string. Depending on the options, all functions whose names match the input string are shown. The matching is not case sensitive. Available options:
- Contains: The given input string must occur in the function name.
- Does not contain: The given input string must not occur in the function name.
- Is equal to: The given input string must be the same as the function name.
- Is not equal to: The given input string must not be the same as the function name.
- Begins with: The function name must start with the given input string.
- Ends with: The function name must end with the given input string.
List of Names provides a dialog that allows directly selecting the desired set of functions and function groups. Available options:
- Contains: The selected functions are shown.
- Does not contain: The selected functions are filtered.
Filtering Functions by Duration. Functions can also be filtered by their duration. Duration of a function refers to the time spent in this function from the entry to the exit of the function. There are two options available:
- Is greater than: All functions whose duration time is longer than the specified time
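The six name rules listed in this excerpt map naturally onto string predicates. The following Python sketch shows one way to express them, including the case-insensitive matching the text describes; the rule table and function name are illustrative assumptions, not Vampir's implementation.

```python
# Each rule takes the lowercased function name and the lowercased input string.
NAME_RULES = {
    "contains": lambda name, text: text in name,
    "does not contain": lambda name, text: text not in name,
    "is equal to": lambda name, text: name == text,
    "is not equal to": lambda name, text: name != text,
    "begins with": lambda name, text: name.startswith(text),
    "ends with": lambda name, text: name.endswith(text),
}


def filter_by_name(functions, rule, text):
    """Return the functions that stay visible under the given name rule."""
    matches = NAME_RULES[rule]
    needle = text.lower()  # matching is not case sensitive
    return [name for name in functions if matches(name.lower(), needle)]


functions = ["MPI_Bcast", "MPI_Wait", "SOLVE_EM", "calc_mpi_helper"]
print(filter_by_name(functions, "contains", "mpi"))
# ['MPI_Bcast', 'MPI_Wait', 'calc_mpi_helper']
print(filter_by_name(functions, "begins with", "mpi"))
# ['MPI_Bcast', 'MPI_Wait']
```

The example also shows why "Begins with" can be the safer rule for isolating MPI calls: "Contains" additionally keeps any user function that merely has "mpi" somewhere in its name.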
14. -cores 1: run only one instance of mpicsync on each compute node.
3. Format the event log files to OTF files:
mpiexec -cores 1 -wdir %USERPROFILE% etl2otf trace.etl
4. Copy all OTF files from the compute nodes to the trace directory on the share:
mpiexec -cores 1 -wdir %USERPROFILE% cmd /c copy /y "*_otf*" "\\share\userHome\Trace"
2.3 Generation of Trace Data on Linux Systems. The generation of trace files for the Vampir performance visualization tool requires a working monitoring system to be attached to your parallel program. In contrast to Windows HPC Server 2008, where the performance monitor is integrated into the operating system, recording performance under Linux is done by a separate performance monitor. We recommend our VampirTrace monitoring facility, which is available as Open Source software. During a program run of an application, VampirTrace generates an OTF trace file, which can be analyzed and visualized by Vampir. The VampirTrace library allows MPI communication events of a parallel program to be recorded in a trace file. Additionally, certain program-specific events can also be included. To record MPI communication events, simply relink the program with the Vamp
15. [Figure 5.8: Alignment in the Navigation Toolbar.] There are several ways to shift the trace files in time. One way is to use the context menu of the Navigation Toolbar. A right click on the toolbar reveals the menu shown in Figure 5.7. Here the entry Set Time Offset allows manually setting the time offset for the trace file. The entry Reset Time Offset resets the offset. The easiest way to achieve a coarse alignment is to drag the trace file in the Navigation Toolbar itself. While holding the Ctrl (Cmd on Mac OS X) modifier key, the trace can be dragged to the desired position with the left mouse button. In Figure 5.8 the compute iterations of all example trace files are coarsely aligned. [Figure 5.9: Alignment in the Master Timeline. Screenshot of the Vampir Compare View with the traces A, B, and C of calcTest.otf.] After t
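What a time offset does can be shown with a small Python sketch. It assumes that alignment means adding a constant offset to every timestamp of a trace so that one chosen event (for example, the start of the first compute iteration) coincides across traces. The function names and the sample times are hypothetical.

```python
def alignment_offsets(event_times):
    """Offsets (s) that move the same logical event to a common time.

    event_times maps a trace name to the time of that event in the trace.
    Assumption: the latest occurrence is the reference, so offsets are >= 0.
    """
    reference = max(event_times.values())
    return {name: reference - time for name, time in event_times.items()}


def shifted(timestamps, offset):
    """Apply a time offset to all timestamps of one trace."""
    return [time + offset for time in timestamps]


offsets = alignment_offsets({"A": 2.0, "B": 5.0, "C": 3.5})
print(offsets)                         # {'A': 3.0, 'B': 0.0, 'C': 1.5}
print(shifted([2.0, 4.0], offsets["A"]))  # [5.0, 7.0]
```

Dragging a trace in the Navigation Toolbar corresponds to choosing such an offset by eye; Set Time Offset corresponds to entering it as a number.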
16. [Figure 7.1: Master Timeline and Function Summary showing an overview of the program run. The Function Summary shows Accumulated Exclusive Time per function group, including MPI, MP_UTIL, Application, VT_API, and COUPLE.] Getting a grasp of the program's overall behavior is a reasonable first step. In Figure 7.1 Vampir has been set up to provide such a high-level overview of the model's code. This layout can be achieved through two simple manipulations. Set up the Master Timeline to adjust the process bar height to fit the chart height. All 100 processes are now arranged into one view. Likewise, change the event category in the Function Summary to show function groups. This way the many functions are condensed into fewer function groups. One run of the instrumented program took 290 seconds to finish. The first half of the trace (Figure 7.1 A) is the initialization part. Processes get started and synced, and input is read and distributed among these processes. The preparation of the cloud microphysics (function group MP) is done here as well. The second half is the iteration part wh
17. Vampir's Open Trace Format (OTF). The resulting files can be visualized with the Vampir 7 performance data browser. (OTF: http://www.tu-dresden.de/zih/otf) 2 Getting Started. 2.1 Installation of Vampir. Vampir is available on all major platforms, but naturally its installation depends on the operating system. 2.1.1 Unix, Linux. In order to install Vampir on a Unix/Linux machine, it is sufficient to unpack the tarball into the installation folder. After that, start the Vampir application and follow the instructions for license installation. 2.1.2 Mac OS X. Open the .dmg installation package and drag the Vampir icon into the applications folder on your computer. You might need administrator rights to do so. Alternatively, you can also drag the Vampir application to another directory that is writable for you. After that, double-click on the Vampir application and follow the instructions for license installation. 2.1.3 Windows. On Windows platforms the provided Vampir installer makes the installation very simple and straightforward. Just run the installer and follow the installation wizard. Install Vampir in a folder of your choice, e.g. C:\Program Files. In order to run the installer in silent (unattended) mode, use the /S option. It is also possible to specify the output folder of the installation with /D=dir. An example of a silent installation command is as follows: Vampir-7.5.0-Standard-setup-x86.exe /S /D=C:\Program Files
18. are shown.
- Is less than: All functions whose duration time is shorter than the specified time are shown.
Filtering Functions by Number of Invocations. The number of invocations of a function can also be used as a filter rule. This criterion refers to how often a function is executed in an application. There are two possible filter rules in this mode.
Number of Invocations shows functions based on their total number of invocations in the whole application run. There are two options available:
- Is greater than: All functions whose number of invocations is greater than the specified number are shown.
- Is less than: All functions whose number of invocations is less than the specified number are shown.
Number of Invocations per Process shows functions based on their individual number of invocations per process. Hence, if the number of invocations of a function varies over different processes, this function might be shown for some processes and filtered for others. There are two options available:
- Is greater than: All functions whose number of invocations is greater than the specified number are shown.
- Is less than: All functions
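The difference between the two invocation rules is easiest to see on data. The Python sketch below uses a hypothetical nested mapping (function name to per-process invocation counts) and made-up numbers; it illustrates the described semantics and is not taken from Vampir.

```python
def shown_by_total(counts, threshold):
    """Functions whose invocations summed over all processes exceed threshold."""
    return sorted(
        name for name, per_process in counts.items()
        if sum(per_process.values()) > threshold
    )


def shown_per_process(counts, threshold):
    """For each function, the processes on which it exceeds threshold."""
    result = {}
    for name, per_process in counts.items():
        processes = sorted(p for p, n in per_process.items() if n > threshold)
        if processes:
            result[name] = processes
    return result


# Hypothetical counts: {function: {process id: number of invocations}}
counts = {
    "MPI_Wait": {0: 120, 1: 80},
    "SOLVE_EM": {0: 10, 1: 10},
    "INPUT_WRF": {0: 150, 1: 0},
}
print(shown_by_total(counts, 100))     # ['INPUT_WRF', 'MPI_Wait']
print(shown_per_process(counts, 100))  # {'MPI_Wait': [0], 'INPUT_WRF': [0]}
```

With the per-process rule, MPI_Wait stays visible on process 0 but is filtered on process 1, which is the "shown for some processes and filtered for others" case from the text.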
19. be opened at a time. 3.6 Properties of the Trace File. Vampir provides an info dialog containing important characteristics of the opened trace file. This Trace Properties dialog can be accessed via the main menu under File → Get Info. The information originates from the trace file and includes details such as file name, creator, or the OTF version. 4 Performance Data Visualization. This chapter deals with the different charts that can be used to analyze the behavior of a program and the comparison between different function groups, e.g. MPI and Calculation. Communication performance issues are regarded in this chapter as well. Various charts address the visualization of data transfers between processes. The following sections describe them in detail. 4.1 Timeline Charts. A very common chart type used in event-based performance analysis is the so-called timeline chart. This chart type graphically presents the chain of events of monitored processes or counters on a horizontal time axis. Multiple timeline chart instances can be added to the Trace View window via the Chart menu or the Charts toolbar. Note: To measu
20. context menu of the respective performance chart. Additionally, the zoom can be changed with the help of the Zoom Toolbar, by dragging the borders of the selection rectangle or by scrolling the mouse wheel, as described in Chapter 3.4. In order to return to the previous zooming state, an Undo functionality, accessible via the Edit menu, is provided. Alternatively, the key combination Ctrl+Z also reverts the last zoom. Accordingly, a reverted zooming action can be redone by selecting Redo in the Edit menu or by pressing Ctrl+Shift+Z. The Undo functionality is not bound to single performance charts but works across the entire Vampir application. The labels of the Undo and Redo menu entries also state which kind of action will be undone or redone next. 3.4 The Zoom Toolbar. Vampir provides a Zoom Toolbar that can be used for zooming and navigation in the trace data. It is located in the upper right corner of the Trace View window shown in Figure 3.1. It is possible to adjust its position via drag and drop. The Zoom Toolbar offers an overview and summary of the loaded trace data. The currently zoomed area is highlighted as a rectangle within the Zoom Toolbar. By dragging the two boundaries of the highlighted rectangle, the horizontal zooming state can be adjusted. Note: Instead of dragging boundaries, it is also possible to use the mouse wheel for zooming. Hover over the Zoom Toolbar and scroll up and d
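The Undo and Redo behavior for zooming described in this excerpt follows the usual two-stack pattern. The Python sketch below is a generic illustration of that pattern with a hypothetical class name; it makes no claim about how Vampir stores its history.

```python
class ZoomHistory:
    """Two-stack undo/redo history for a zoom interval (start, end)."""

    def __init__(self, start, end):
        self.current = (start, end)
        self._undo = []
        self._redo = []

    def zoom(self, start, end):
        """Apply a new zoom; a fresh action invalidates the redo stack."""
        self._undo.append(self.current)
        self._redo.clear()
        self.current = (start, end)

    def undo(self):
        """Return to the previous zoom state, if there is one."""
        if self._undo:
            self._redo.append(self.current)
            self.current = self._undo.pop()
        return self.current

    def redo(self):
        """Reapply the most recently undone zoom, if there is one."""
        if self._redo:
            self._undo.append(self.current)
            self.current = self._redo.pop()
        return self.current


history = ZoomHistory(0.0, 100.0)
history.zoom(10.0, 20.0)
history.zoom(12.0, 15.0)
print(history.undo())  # (10.0, 20.0)
print(history.redo())  # (12.0, 15.0)
```

Because the text says Undo works across the entire application, a single shared history object (rather than one per chart) would be the closer analogy.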
21. delay and the message rate, which is the size of the message divided by the duration, to decrease accordingly. 4.2.5 I/O Summary. The I/O Summary, depicted in Figure 4.16, is a statistical chart giving an overview of the input/output operations recorded in the trace file. [Figure 4.16: I/O Summary. Screenshot showing Number of I/O Operations per File Name for all processes of the Large/wrf.otf trace.] All values are represented in a histogram-like fashion. The text label indicates the group base, while the number inside each bar represents the value of the chosen metric. The Set Metric sub-menu of the context menu is used to switch between the available metrics: Number of I/O Operations, Accumulated I/O Transaction Sizes, and all ranges of I/O Operation Size, I/O Transaction Time, or I/O Bandwidth. The I/O operations can be grouped by the characteristics Transaction Size, File Name, and Operation Type. The group base can be changed via the context menu entry Group I/O Operations by. Note: There will be one bar for every o
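The grouping behind the I/O Summary (one bar per group, one metric per bar) can be sketched in a few lines of Python. The record fields, sample values, and function names below are hypothetical; only the metric Number of I/O Operations and the message-rate formula (size divided by duration) come from the text.

```python
from collections import Counter


def count_io_operations(operations, group_by):
    """Number of I/O operations per group, e.g. per file name."""
    return Counter(operation[group_by] for operation in operations)


def message_rate(size_bytes, duration_s):
    """Message rate as described in the text: size divided by duration."""
    return size_bytes / duration_s


# Hypothetical I/O records for illustration only.
operations = [
    {"file_name": "namelist.input", "operation_type": "read", "size": 512},
    {"file_name": "namelist.input", "operation_type": "read", "size": 256},
    {"file_name": "<STDOUT>", "operation_type": "write", "size": 80},
]
print(count_io_operations(operations, "file_name"))
# Counter({'namelist.input': 2, '<STDOUT>': 1})
print(count_io_operations(operations, "operation_type"))
# Counter({'read': 2, 'write': 1})
print(message_rate(1024, 0.5))  # 2048.0
```

Switching the `group_by` key plays the role of the context menu entry Group I/O Operations by.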
22. display a sequence of events. Those charts are therefore aligned vertically. This alignment ensures that the temporal relationship of events is preserved across chart boundaries. Users can arrange the placement of the charts according to their preferences by dragging them into the desired position. When the left mouse button is pressed while the mouse pointer is located above a placement decoration, the layout engine gives visual clues as to where the chart may be moved. As soon as the left mouse button is released, the chart arrangement is changed accordingly. The entire procedure is depicted in the figures beginning with Figure 3.3. The layout engine furthermore allows a flexible adjustment of the screen space that is used by a chart. Charts of particular interest may get more space in order to render information in more detail. The Trace View window can host an arbitrary number of charts. Charts can be added by clicking on the respective Charts toolbar icon or the corresponding Chart menu entry. With a few more clicks, charts can be combined into a custom chart arrangement. [Screenshot: Vampir Trace View of the Large/wrf.otf trace with a Function Summary showing Accumulated Exclusive Time per Function.]
23. events is available. Table 4.1 shows the symbols and descriptions of these objects. Since the Process Timeline reveals information of one process only, short black arrows are used to indicate outgoing communication. Clicking on message lines or arrows shows message details like sender process, receiver process, message length, message duration, and message tag in the Context View display. The Master Timeline also provides the possibility to search for function and function group occurrences. In order to activate the search mode, use the context menu and select Find. After activation, an input field appears at the top of the Master Timeline. [Figure 4.4: Search for MPI_Bcast in the Master Timeline.] [Screenshot: Master Timeline of the Large/wrf.otf trace with the metric overlay PAPI_FP_OPS and an opacity control.]
24. fashion under their respective function group as well (see Figure 4.18). 4.3.2 Marker View. The Marker View lists all marker events included in the trace file. The display organizes the marker events based on their respective groups and types in a tree-like fashion. Additional information, like the time of occurrence or descriptions, is provided for each marker. By clicking on a marker event in the Marker View, this event becomes selected in the timeline displays. If this marker is located outside the visible area, the zoom jumps to this event automatically. It is possible to select marker events by their type as well. [Figure 4.19: A chosen marker (A) and its representation in the Marker View (B).] Then all events belonging to that type are selected in the Master Timeline and the Process Timeline. By holding the Ctrl or Shift key, multiple marker events can be selected. If exactly two marker events are selected, the zoom is set automatically to the occurrence time of the markers. 4.3.3 Context View. As implied by its name, the
25. respectively in C++ as follows: VT_TRACER("name"); Afterwards use
vtcc -DVTRACE hello.c -o hello
to combine the manual instrumentation with automatic compiler instrumentation, or
vtcc -vt:inst manual -DVTRACE hello.c -o hello
to prevent an additional compiler instrumentation. For a detailed description of manual instrumentation, please consult the VampirTrace User Manual. 2.3.2 Tracing an Application. Running a VampirTrace-instrumented application should normally result in an OTF trace file stored in the current working directory where the application was executed. On Linux, Mac OS X, and Sun Solaris the default name of the trace file is equal to the application name. For other systems the default name is a.otf, but it can be defined manually by setting the environment variable VT_FILE_PREFIX to the desired name. After a run of an instrumented application, the traces of the single processes need to be unified in terms of timestamps and event IDs. In most cases this happens automatically. If it is necessary to perform unification of local traces manually, use the following command:
% vtunify &lt;nproc&gt; &lt;prefix&gt;
If VampirTrace was built with support for OpenMP and/or MPI, it is possible to speed up the unification of local traces significantly. To distribute the unification on multiple processes, the MPI-parallel version vtunify-mpi can be used as follows:
% mpirun -np &lt;nranks&gt; vtunify-mpi
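For scripted workflows, the two unification commands above can be assembled programmatically. The Python sketch below only builds the argument lists and the environment and does not run anything. It assumes the argument order shown in the excerpt (number of processes, then prefix) and the common `-np` spelling for mpirun; the helper names and the sample prefix are hypothetical.

```python
import os


def unify_command(nproc, prefix, nranks=None):
    """Build the argument list for serial or MPI-parallel trace unification.

    Assumption: arguments are <nproc> <prefix>, as in the excerpt above.
    """
    arguments = [str(nproc), prefix]
    if nranks is None:
        return ["vtunify"] + arguments
    return ["mpirun", "-np", str(nranks), "vtunify-mpi"] + arguments


def trace_environment(prefix):
    """Copy of the environment with VT_FILE_PREFIX set to the desired name."""
    environment = dict(os.environ)
    environment["VT_FILE_PREFIX"] = prefix
    return environment


print(unify_command(16, "hello"))
# ['vtunify', '16', 'hello']
print(unify_command(16, "hello", nranks=4))
# ['mpirun', '-np', '4', 'vtunify-mpi', '16', 'hello']
```

The lists can be passed to `subprocess.run` on a system where VampirTrace is installed; check the installed tools' own help output first, since option spellings may differ between versions.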
26. saving behavior of the different components to one's own needs. In the dialog Saving Behavior you tell Vampir what to do in the case of changed preferences. The user can choose the categories of settings, e.g., the layout, that should be affected by the selected behavior. Possible options are that the application automatically Always or Never saves changes. The default option is to have Vampir ask you whether to save or discard changes.

Usually the settings are stored in the folder of the trace file. If the user has no write access to it, it is possible to place them in the Application Data Folder instead. All such stored settings are listed in the tab Locally Stored Preferences with their creation and modification dates.

Note: On loading, Vampir always favors settings in the Application Data Folder.

[Figure: Preferences dialog with the tabs Saving Behavior, Locally Stored Preferences, and Default Preferences; the category tree lists Displays, Appearance, and Layout, and the save options are Always, Never, and Ask]
27. whose number of invocations is less than the specified number are shown.

4.5.2 Examples

In this chapter a few examples explain the usage of the function filter. This enables the user to understand the basic principles of function filtering in Vampir at a glance. It also illustrates a part of the set of available filter options provided by Vampir.

Unfiltered Trace File

This section introduces the example trace file in an unfiltered state. The timelines show a part of the initialization of the WRF weather forecast code. The red color corresponds to communication (MPI), whereas the purple areas represent some input functions of the weather model.

Figure 4.24: Master Timeline and Process Timeline without filtering

Showing only MPI Functions

In this example only functions that contain the string "mpi" (not case sensitive) somewhere in their name are shown. Since only MPI functions have names starting with MPI, this filter setting shows all MPI functions and filters out the others.

Show only functions that match any of the follo
28. Customize Metrics. Figure 4.11 shows an example custom metric Wait Time. This metric is the sum of the time spent in the functions MPI_Irecv and MPI_Wait. Custom metrics are built from input metrics that are linked together with the available operations. The context menu, accessible via the right mouse button, allows adding new input metrics and operations. All created custom metrics become available in the Set Metric selections of the Performance Radar and Counter Data Timeline charts. They are also available in the overlay mode of the Master Timeline. Custom metrics can be exported and imported in order to use them in multiple trace files.

4.2 Statistical Charts

4.2.1 Function Summary

The Function Summary chart (Figure 4.12) gives an overview of the accumulated time consumption across all function groups and functions. For example, every time a process calls the MPI_Send function, the elapsed time of that function is added to the MPI function group time. The chart gives a condensed view of the execution of the application. A comparison between the different function groups can be made and dom
29. 5.2 Usage of Charts

For the comparison of performance metrics, the Compare View provides all common charts of Vampir. In contrast to the ordinary Trace View, the Compare View opens one chart instance for each trace file, i.e., with three open trace files, one click on the Master Timeline icon opens three Master Timeline charts. Also, in order to distinguish the same charts between the trace files, all charts belonging to one trace file have an individual background color. Figure 5.5 depicts a Compare View with open Master Timeline, Process Timeline, and Function Summary charts.

[Figure: Compare View with Timeline and Function Summary charts (Number of Invocations per Function) opened for each calcTest.otf trace file]
30. [Figure: Context View showing context information (B) of a selected function (A); the selected function belongs to Function Group MPI, with Interval Begin 99.203092 s, Interval End 99.28449 s, and Duration 81.3979 ms]

Figure 4.21: Comparison between Context Information

    Property        Value 1          Comparison  Value 2          Diff
    Display         Master Timeline              Master Timeline
    Type            Function                     Function
    Function        MPI_Wait                     MPI_Wait
    Function Group  MPI                          MPI
    Interval Begin  99.346965 s      >           99.203092 s      0.143873 s
    Interval End    99.411091 s      >           99.28449 s       0.126601 s
    Duration        64.1259 ms       <           81.3979 ms       17.272 ms

cases the comparison shows a list of common properties. The corresponding values as well as the differences are displayed. The first line always indicates the names of the respective chart
31. Figure 4.15: Communication Matrix View (color legend ranging from 0 MiB/s to 680 MiB/s)

indicates the displayed values. It adapts automatically to the currently shown value range.

It is possible to change the type of displayed values. Different metrics, such as the average duration of messages passed from sender to recipient or the minimum and maximum bandwidth, are offered. To change the type of value that is displayed, use the context menu option Set Metric.

Use the Process Filter to define which processes or process groups should be displayed (see Section 4.4).

Note: A high duration is not automatically caused by a slow communication path between two processes. It can also be due to the fact that the time between starting transmission and successful reception of the message can be increased by a recipient that delays reception for some reason. This will cause the duration to increase by this
32. Figure 5.10: Open Marker View (one Marker View per trace file, listing MARMOT Error and MARMOT Warning markers with process, time, and description)

The first step in using markers is to open the Marker View. Figure 5.10 shows a Compare View with an open Marker View for each trace file. After a click on one marker in the Marker View, the selected marker is highlighted in the Master Timeline and the Process Timeline.

Another way to navigate to a marker in the timeline displays is to use the Vampir zoom. If the user has zoomed the Master Timeline or Process Timeline to the desired zoom level, then a click on a marker in the
33. Table 3.1: Icons of the Charts Toolbar

    Chart                      Described in
    Master Timeline            Section 4.1.1
    Process Timeline           Section 4.1.1
    Counter Data Timeline      Section 4.1.2
    Performance Radar          Section 4.1.3
    Function Summary           Section 4.2.1
    Message Summary            Section 4.2.3
    Process Summary            Section 4.2.2
    Communication Matrix View  Section 4.2.4
    I/O Summary                Section 4.2.5
    Call Tree                  Section 4.2.6
    Function Legend            Section 4.3.1
    Context View               Section 4.3.3
    Marker View                Section 4.3.2

The Charts Toolbar is used to open instances of the available performance charts. It is located in the upper left corner of the Trace View window, as shown in Figure 8.1. The toolbar can be dragged and dropped to alternative positions. The Charts Toolbar can be disabled with the toolbar's context menu entry Charts.

Table 3.1 gives an overview of the available performance charts with their corresponding icons. The icons are arranged in three groups, divided by small separators. The first group represents timeline charts, whose zooming states affect all other charts. The second group consists of statistical charts, which provide special information and statistics for a chosen interval. Vampir allows multiple instances of charts in these categories. The last group comprises informational charts, which provide specific textual information or legends. Only one instance of an informational chart can
34. ations and having a look at the Function Summary. The function with the largest bar takes up the most time. In this example (Figure 7.2) the MICROPHYSICS routine can be identified as the most costly part of an iteration. Therefore, it is a good candidate for gaining speedup through serial optimization techniques.

Solution: In order to get a fine-grained view of the inner workings of the MICROPHYSICS routine, we had to trace the program using full function instrumentation. Only then was it possible to inspect and measure the subroutines and sub-subroutines of MICROPHYSICS. This way the most time-consuming subroutines were spotted and could be analyzed for optimization potential.

The review showed that there were a couple of small functions which were called very often, so we simply inlined them. With Vampir you can determine how often a function is called by changing the metric of the Function Summary to the number of invocations.

The second inefficiency we discovered was invariant calculations being performed inside loops, so we moved them in front of the respective loops.

Figure 7.3 sums up the tuning of the compu
35. ccurring metric. For a quick and convenient overview it is also possible to show minimum, maximum, and average values at once. This option is available for the metrics Transaction Size Range of I/O Operations, Time Range of I/O Operations, and Bandwidth Range of I/O Operations. The minimum and maximum values are shown in an additional, smaller bar beneath the main bar, which indicates the average value. The additional bar starts at the minimum and ends at the maximum value of the metric (see Figure).

In order to select the I/O operation types that should be considered for the statistic calculation, use the Set I/O Operations sub-menu of the context menu. Available options are Read, Write, Read/Write, and Apply Global I/O Operations Filter. The latter includes all selected operation types from the I/O Events filter dialog (see Chapter 4.4).

4.2.6 Call Tree

The Call Tree, depicted in Figure 4.17, illustrates the invocation hierarchy of all monitored functions in a tree representation. The display reveals information about the number of invocations of a given function, the time spent in the different ca
36. ctions outside a specified range

5 Comparison of Trace Files

The comparison of trace files in Vampir extends existing functionality. That way the user can benefit from analysis experience already gained. For the comparison of performance characteristics, all common charts are provided. In order to compare multiple trace files effectively, their zoom needs to be coupled and synchronized. For the comparison of selected areas of interest, the trace files need to be freely shiftable in time. This allows for arbitrary alignments of the trace files and thus enables comparison of user-selected areas in the trace data.

Figure 5.1: Compare View

All comparison features are provided in the Compare View window depicted in Figure 5.1.

This section introduces the Compare View window and explains its usage. It illustrates the functionality with the help of screenshots. For this purpose the comparison
37. ctively Figure 3.7.

Figure 3.5: Closing (right) and Undocking (left) of a Chart

Considering that labels, e.g., those showing names or values of functions, often need more space to show their whole text, there is a further option for resizing. In order to read labels completely, it can be useful to alter the distribution of space shared by the labels and the graphical representation in a chart. When hovering over the blank space between the labels and the graphical representation, a movable separator appears. By dragging the separator decoration with the left mouse button, the chart space provided for the labels can be resized. The whole process is illustrated in Figure 3.8.

3.2 Context Menus

All chart displays have their own context menu containing common as well as display-specific entries. In this section only the most common entries are discussed. A context menu can be accessed by right-clicking anywhere in the chart window. Common entries are:

- Reset Zoom: Go back to the initial state in horizontal zooming.
- Reset Vertical Zoom: Go back to the initial state in vertical zooming.
- Set Metric: Set the values which should be represented in the chart, e.g., change from Exclusive Time to Inclusive Time.
- Sort By: Rearrange values or bars by a certain charac
38. d Reduction
4.5 Function Filtering
4.5.1 Filter Options
4.5.2 Examples
5 Comparison of Trace Files
5.1
5.2 Usage of Charts
5.3 Alignment of Multiple Trace Files
5.4 Usage of Predefined Markers
6
6.1 General Preferences
6.2 Appearance
6.3 Saving Policy
7 A Use Case
7.1 Introduction
7.2 Identified Problems and Solutions
7.2.1 Computational Imbalance
7.2.2 Serial Optimization
7.2.3 High Cache Miss Rate
7.3 Conclusion

1 Introduction

Performance optimization is a key issue for the development of efficient parallel software applications. Vampir provides a manageable framework for analysis, which enables developers to quickly display program behavior at any level of detail. Detailed performance data obtained from a parallel program execution can be analyzed with a collection of differe
39. Figure 3.8: Resizing Labels: (A) Hover a Separator Decoration; (B) Drag and Drop the Separator

3.3 Zooming

Zooming is a key feature of Vampir. In most charts it is possible to zoom in and out to get detailed or abstract views of the visualized data. In the timeline charts, zooming produces a more detailed view of a selected time interval and therefore reveals new information that was previously hidden in the larger section. Short function calls in the Master Timeline may not be visible unless an appropriate zooming level has been reached. In other words, if the execution time of functions is too short with respect to the available pixel resolution of your computer display, zooming into a shorter time interval is required in order to make them visible.

Note: Other charts are affected by zooming in the timeline displays. The interval chosen in a timeline chart, such as the Master Timeline or Process Timeline, also defines the time interval for the calculation of accumulated measurements in the statistical charts.

Statistical charts like the Function Summary provide zooming of statistic values. In these cases zooming does not affect a
40. ds only a fraction of the original time (Figure 7.5).

Figure 7.4: Before Tuning: Counter Data Timeline revealing a high amount of L2 cache misses inside the CLIPPING routine (light blue)

Figure 7.5: After Tuning: Visible improvement of the cache usage

7.3 Conclusion

By using the Vampir toolkit, three problems have been identified. As a consequence of addressing each problem, the duration of one iteration has been decreased from 3.5 seconds to 2.0 seconds.

[Figure: Trace View of the tuned run (pmp_tuned.otf) with Timeline and Function Summary (Accumulated Exclusive Time per Function) charts]
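The improvement stated in the conclusion can also be read as a speedup factor and a relative time saving. The following is a minimal sketch of that arithmetic in Python, using only the two durations given in the text; the variable names are illustrative and not part of Vampir.

```python
# Durations of one iteration, taken from the conclusion of the use case.
before_s = 3.5
after_s = 2.0

# Speedup: how many times faster one iteration runs after tuning.
speedup = before_s / after_s

# Relative reduction: fraction of the original iteration time that was saved.
reduction = (before_s - after_s) / before_s

print(f"speedup: {speedup:.2f}x")
print(f"time saved per iteration: {reduction:.1%}")
```

Under these numbers an iteration runs 1.75 times faster, which corresponds to saving roughly 43% of the original iteration time.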
41. ere the actual weather forecasting takes place. In a normal weather simulation this part would be much larger. But in order to keep the recorded trace data and the overhead introduced by tracing as small as possible, only a few iterations have been recorded. This is sufficient since they all do the same work anyway. Therefore, the simulation has been configured to forecast the weather only 20 seconds into the future. The iteration part consists of two large iterations (Figure 7.1 B and C), each calculating 10 seconds of forecast. Each of these in turn is partitioned into several smaller iterations.

For our observations we focus on only two of these small inner iterations, since this is the part of the program where most of the time is spent. The initialization work does not increase with a longer forecast duration and would take only a relatively small amount of time in a real-world run. The constant part at the beginning of each large iteration takes less than a tenth of the whole iteration time. Therefore, by far the most time is spent in the small iterations. Thus they are the most promising candidates for optimization.

All screenshots starting with Figure 7.2 are presented in a before-and-after fashion to point out what changed by applying the specific improvements.

7.2 Identified Problems and Solutions

7.2.1 Computational Imbalance

A varying size of work packages (and thus a varying processing time for this work) means waiting time in subsequen
42. Vampir 7 User Manual

GWT forschung + innovation

Copyright 2011 GWT-TUD GmbH, Blasewitzer Str. 43, 01307 Dresden, Germany
http://gwtonline.de

Support / Feedback / Bug Reports

Please provide us with feedback! We are very interested to hear what people like or dislike, and what features they are interested in.

If you experience problems or have suggestions about this application or manual, please contact service@vampir.eu.

When reporting a bug, please include as much detail as possible so that it can be reproduced. Please send the version number of your copy of Vampir along with the bug report. The version is stated in the About Vampir dialog, accessible from the main menu under Help, About Vampir.

Please visit http://vampir.eu for updates.

Manual Version: 2011-11-11 / Vampir 7.5

Contents

1 Introduction
1.1 Event-based Performance Tracing and Profiling
1.2 The Open Trace Format (OTF)
43. ge transfer rate (A) and the minimal/maximal transfer rate (B)

between Aggregated Message Volume, Message Size, Number of Messages, and Message Transfer Rate. The group base can be selected via the context menu entry Group By. Possible options are Message Size, Message Tag, and Communicator (MPI).

Note: There will be one bar for every occurring group. However, if the metric is set to Message Transfer Rate, the minimal and the maximal transfer rates are given in an additional small bar beneath the main bar, which shows the average transfer rate. The additional bar starts at the minimal rate and ends at the maximal rate (see Figure 4.14).

In order to filter out messages, click on the associated label or color representation in the chart and then choose Filter from the context menu.

4.2.4 Communication Matrix View

The Communication Matrix View is another way of analyzing communication imbalances. It shows information about messages sent between processes. The chart, as shown in Figure 4.15, is laid out as a table. Its rows represent the sending processes, whereas the columns represent the receivers. The color legend on the right

[Figure: Communication Matrix View showing the Average Message Data Rate between processes]
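The layout of the Communication Matrix View (rows are senders, columns are receivers, each cell holds one metric value) can be illustrated with a short sketch. The Python example below is not Vampir code: the message records, the function name `communication_matrix`, and the choice to average the per-message rates are all assumptions made for illustration, since the manual does not state how the average data rate is computed internally.

```python
from collections import defaultdict

# Hypothetical message records: (sender, receiver, size in bytes, duration in seconds).
messages = [
    (0, 1, 4 * 2**20, 0.010),
    (0, 1, 2 * 2**20, 0.004),
    (1, 0, 1 * 2**20, 0.004),
    (2, 3, 8 * 2**20, 0.016),
]

def communication_matrix(msgs, nprocs):
    """Return matrix[sender][receiver] with the average data rate in MiB/s.

    Cells without any message are None. The average is taken over the
    per-message rates (an assumption of this sketch).
    """
    rates = defaultdict(list)
    for sender, receiver, size, duration in msgs:
        rates[(sender, receiver)].append(size / duration / 2**20)
    return [
        [sum(rates[(s, r)]) / len(rates[(s, r)]) if (s, r) in rates else None
         for r in range(nprocs)]
        for s in range(nprocs)
    ]

matrix = communication_matrix(messages, nprocs=4)
for sender, row in enumerate(matrix):
    cells = ["     -" if value is None else f"{value:6.1f}" for value in row]
    print(f"Process {sender}: " + " ".join(cells))
```

Reading a row shows where a process sends its data, while reading a column shows from whom a process receives. Swapping the cell computation for a minimum, maximum, or sum would correspond to choosing a different metric for the same matrix.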
44. he coarse shifting, a finer alignment can be done in the Master Timeline. To do so, the user needs to zoom into the area to compare. Then, while keeping the Ctrl (Cmd on Mac OS X) modifier key pressed, the trace can be dragged with the left mouse button in the Master Timeline. Figure depicts the process of dragging trace B to the compute iterations of trace A. As can be seen in Figure 5.9, although the initialization of trace A took the longest time, this machine was the fastest in computing the iterations.

5.4 Usage of Predefined Markers

The Open Trace Format (OTF) allows defining markers that point to particular places of interest in the trace data. These markers can be used to navigate in the trace files. For trace file comparison, markers are interesting due to their potential to quickly locate places in large trace data. With the help of markers it is possible to find the same location in multiple trace files with just a few clicks.

[Figure: Compare View with Timeline charts for Process 0 to Process 3]
45. he filter shows only MPI functions that have a duration time of more than 250 ms.

Filter settings: Show only functions that match all of the following conditions:
- Name: Contains "mpi"
- Duration: Is greater than 250 Milliseconds

Figure 4.28: Combining rules using "all"

Building Ranges with Number of Invocation Rules

The combination of rules also allows for the filtering of functions within a specified range of a criterion. The following example filter setup shows all functions whose number of invocations lies inside the range between 2000 and 15000.

Filter settings: Show only functions that match all of the following conditions:
- Number of Invocations: Is greater than 2000
- Number of Invocations: Is less than 15000

[Figure: Trace View of wrf.otf with Timeline and Function Summary (Number of Invocations) charts after applying the range filter]
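The way rules combine in these examples can be sketched as predicates joined by "all" or "any". The Python code below is an illustration only, not Vampir's implementation: the `FunctionStats` record, the rule helpers, and the sample values are hypothetical, and the sketch simplifies matters by attaching a single duration and an invocation count to each function.

```python
from dataclasses import dataclass

@dataclass
class FunctionStats:
    name: str
    duration_ms: float   # hypothetical duration of the function
    invocations: int     # hypothetical number of invocations

# Each helper returns a rule, i.e. a predicate over one FunctionStats record.
def name_contains(text):
    return lambda f: text.lower() in f.name.lower()   # not case sensitive

def duration_greater_than(ms):
    return lambda f: f.duration_ms > ms

def invocations_greater_than(n):
    return lambda f: f.invocations > n

def invocations_less_than(n):
    return lambda f: f.invocations < n

def apply_filter(functions, rules, mode="all"):
    """Keep the functions that match all (or any) of the given rules."""
    combine = all if mode == "all" else any
    return [f for f in functions if combine(rule(f) for rule in rules)]

# Hypothetical sample data.
functions = [
    FunctionStats("MPI_Wait", 300.0, 5000),
    FunctionStats("MPI_Send", 20.0, 12000),
    FunctionStats("SOLVE_EM", 400.0, 150),
    FunctionStats("ADVANCE_UV", 5.0, 20000),
]

long_mpi = apply_filter(
    functions, [name_contains("mpi"), duration_greater_than(250)], mode="all")
in_range = apply_filter(
    functions, [invocations_greater_than(2000), invocations_less_than(15000)], mode="all")

print([f.name for f in long_mpi])
print([f.name for f in in_range])
```

A range is simply two rules on the same criterion combined with "all"; combining the same two rules with "any" would match every function, because each invocation count is either greater than 2000 or less than 15000.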
46. Figure 3.3: Moving and Arranging Charts in the Trace View Window (1)

Figure 3.4: Moving and Arranging Charts in the Trace View Window (2)

as depicted in Figure. Customized layouts can be saved as described in Chapter 6.3.

Every chart can be undocked or closed by clicking the dedicated icon in its upper right corner, as shown in Figure. Undocking a chart means freeing the chart from the current arrangement and presenting it in its own window. To dock or undock a chart, follow Figure 3.6 respe
47. ific time interval in more detail. For further information on zooming see Section 3.3. If zooming has been performed, scrolling in the horizontal direction is possible with the mouse wheel or the scroll bar at the bottom.

The Process Timeline resembles the Master Timeline with slight differences. The chart's timeline is divided into levels, which represent the different call stack levels of function calls. The initial function begins at the first level, a sub-function called by that function is located one level beneath, and so forth. If a sub-function returns to its caller, the graphical representation also returns to the level above.

In addition to the display of categorized function invocations, Vampir's Master and Process Timeline also provide information about communication events. Messages exchanged between two different processes are depicted as black lines. In timeline charts, the progress in time is reproduced from left to right. The leftmost starting point of a message line and its underlying process bar therefore identify the sender of the message, whereas the rightmost position of the same line represents the receiver of the message. The corresponding function calls usually reflect a pair of MPI communication directives like MPI_Send and MPI_Recv.

Collective communication like MPI_Gatherv is also displayed in the Master Timeline, as shown in Figure 4.3.

Furthermore, additional information like message bursts, markers, and I/O
48. inant function groups can be distinguished easily.

Figure 4.12: Function Summary

It is possible to change the information displayed via the context menu entry Set Metric, which offers options like Average Exclusive Time, Number of Invocations, Accumulated Inclusive Time, etc.

Note: Inclusive means the amount of time spent in a function and all of its subroutines. Exclusive means the amount of time spent in just this function.

The context menu entry Set Event Category specifies whether function groups or functions should be displayed in the chart. Functions are shown in the color of their function group.

It is possible to hide functions and function groups from the displayed information with the context menu entry Filter. In order to mark the function or function group to be filtered, just click on the associated label or color representation in the chart.

Using the Process Filter see Sect
49. indows system. Other platforms work accordingly.

    C:\Program Files\Vampir\Vampir.exe [trace file]

To open multiple trace files at once, you can give them one after another as command line arguments:

    C:\Program Files\Vampir\Vampir.exe [file 1] ... [file n]

If Vampir was associated with .otf files during the installation process, it is also possible to start the application by double-clicking an .otf file.

The trace files to be loaded have to be compliant with the Open Trace Format (OTF) standard described in Chapter 1.2. Microsoft HPC Server 2008 is shipped with the translator program etl2otf.exe, which produces appropriate input files for this platform.

While Vampir is loading the trace file, an empty Trace View window with a progress bar at the bottom opens. After Vampir has loaded the trace data completely, a default set of charts appears. The loading process can be interrupted at any time by clicking on the cancel button in the lower right corner of the Trace View, depicted in Figure 2.3. Because events in the trace file are loaded one after another, the GUI will open but show only the earliest, already loaded information from the trace file. For large trace files with performance problems assumed to be at the beginning, this procedure is a suitable strategy to save time.

Figure 2.3: Progre
50. ion 4.4 allows you to restrict this chart to a set of processes. As a result, only the consumed time of these processes is displayed for each function group or function. Instead of using the filter, which affects all other displays by hiding processes, it is possible to select a single process via Set Process in the context menu of the Function Summary. This does not have any effect on other charts.

The Function Summary can be shown as a Histogram (a bar chart, like in timeline charts) or as a Pie Chart. To switch between these representations, use the Set Chart Mode entry of the context menu.

The shown functions or function groups can be sorted by name or value via the context menu option Sort By.

4.2.2 Process Summary

The Process Summary, depicted in Figure 4.13, is similar to the Function Summary but shows the information for every process independently. This is useful for analyzing the balance between processes to reveal bottlenecks. For instance, finding that one process spends significantly more time performing the calculations could indicate an unbalanced distribution of work, which can slow down the whole application.

The context menu entry Set Event Category specifies whether function groups or functions should be displayed in the chart. Functions are shown in the color of their function group.

The chart calculates statistics based on Number of Invocations, Accumulated Inclusive Time, or Accum
51. ir Trace library. A new compilation of the program source code is only necessary if program-specific events should be added. Detailed information on the installation and usage of VampirTrace can be found in the VampirTrace User Manual (http://www.tu-dresden.de/zih/vampirtrace).

2.3.1 Enabling Performance Tracing

To perform measurements with VampirTrace, the application program needs to be instrumented. VampirTrace handles this automatically by default; nevertheless, manual instrumentation is possible as well. All the necessary instrumentation of user functions, MPI, and OpenMP events is handled by the compiler wrappers of VampirTrace (vtcc, vtcxx, vtf77, vtf90, and the additional wrappers mpicc-vt, mpicxx-vt, mpif77-vt, and mpif90-vt in Open MPI 1.3). All compile and link commands in the makefile should be replaced by the VampirTrace compiler wrapper, which performs the necessary instrumentation of the program and links the suitable VampirTrace library. Automatic instrumentation is the most convenient method to instrument your program. Simply use the compiler wrappers without any additional parameters, e.g.:

    vtf90 hello.f90 -o hello

For manual instrumentation with the VampirTrace API, simply include vt_user.inc (Fortran) or vt_user.h (C, C++) and label any user-defined sequence of statements for instrumentation as follows:

    VT_USER_START("name")
    ...
    VT_USER_END("name")

in Fortran and C
52. it is almost not visible. Zooming into the compute iterations of trace C would make them visible but would also only reveal the MPI_Init phase of traces A and B, see Figure 5.6. In order to compare the compute iterations, the trace files need to be aligned properly. This process is described in Section 5.3.

5.3 Alignment of Multiple Trace Files

The Compare View can shift individual trace files in time, which makes it possible to compare areas of the data that did not occur at the same time. For instance, in order to compare the compute iterations of the three example trace files, these areas need to be aligned with each other. This is required because the initialization of the application took different times on the three machines.

Figure 5.7: Context menu controlling the time offset
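The alignment described above boils down to giving each trace a time offset so that the regions of interest start together. The following is a minimal Python sketch of that arithmetic only; the function name and the start times are hypothetical and are not taken from the example traces, and Vampir itself applies offsets interactively through the context menu.

```python
def alignment_offsets(compute_start):
    """Return the offset (in seconds) to add to each trace so that the
    compute phases of all traces begin at the same point in time.

    compute_start maps a trace label to the time at which its compute
    iterations begin (hypothetical, illustrative values).
    """
    # Align every trace to the one whose compute phase starts latest,
    # so that all offsets are non-negative.
    reference = max(compute_start.values())
    return {name: reference - start for name, start in compute_start.items()}


# Hypothetical start times of the compute iterations in traces A, B, and C.
starts = {"A": 12.0, "B": 9.5, "C": 0.25}
offsets = alignment_offsets(starts)
for name in sorted(offsets):
    print(name, offsets[name])
```

After shifting, every trace's compute phase begins at the same timestamp, which is the condition needed for a side-by-side comparison under one coupled zoom.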
53. Figure 5.2: Vampir with three open trace files

Now the Compare View can be opened. This is done from the main menu by selecting Window → Compare Traces, see Figure 5.3.

Figure 5.3: Vampir Window menu entries

Figure 5.4 shows the opened Compare View. As can be seen from the navigation toolbars, all three open trace files are included in the Compare View. The files in the view share one coupled zoom. The usage of charts and zooming in this view is described in Section 5.2.

Figure 5.4: Open Compare View
54. lls and the caller/callee relationship.

Figure 4.17: Call Tree

The entries of the Call Tree can be sorted in various ways. Simply click on one header of the tree representation to use its characteristic to re-sort the Call Tree. Please note that not all available characteristics are enabled by default. To add or remove characteristics, use the Set Metric sub-menu of the context menu. To leaf through the different function calls, it is possible to fold and unfold the levels of the tree. This can be achieved by double-clicking a level or by u
55. main menu under Filter → Functions. The filter can be enabled or disabled using the checkbox on the left-hand side of the dialog header ("Show only functions that ..."). The Function Filter dialog is built on the concept of filter rules. The user can define several individual rules. The rules are explained in more detail in Chapter 4.5.1. The header of the dialog defines how multiple rules are evaluated. One possibility is to combine the filter rules with an "and" relation. To choose this mode, "all" must be selected in the combo box in the header of the dialog. This means that all rules must be true in order to produce the filter output. The other option is to combine the rules with an "or" relation. To choose this mode, "any" must be selected in the combo box in the header of the dialog. In this case, each rule is applied individually to the trace file. The examples in Chapter 4.5.2 illustrate both modes.

Figure 4
56. n. This setting comes into effect when multiple measured data points need to be displayed on one pixel. If Maximum or Minimum is active, the data point with the highest or lowest value is displayed, respectively. In the case of Average, the average of all data points on the respective pixel width is displayed. This process is also explained in the section on the Counter Data Timeline (4.1.2). The value range of the color scale can be easily adjusted with the left mouse button. In order to adjust the color-coded value range, just drag the edges of the color scale to the desired positions. Figure 4.10 shows the Performance Radar chart of Figure 4.9 with a smaller value range of 1G to 3G FLOPS. This makes it easy to spot areas of high or low performance in the trace file. The selected value range can also be dragged to other positions in the color scale. A double click with the left mouse button on the color scale resets the selected value range. The option Options → Color Scale in the context menu of the chart allows you to customize the color scale to your own preferences.

Figure 4.11: Custom Metrics Editor

The Custom Metrics Editor allows you to derive your own metrics based on existing counters and functions. The editor is accessible via the context menu entry
57. If you want to, you can associate Vampir with OTF trace files (*.otf) during the installation process. The Open Trace Format (OTF) is described in Chapter 1.2. This allows you to load a trace file quickly by double-clicking it. Subsequently, Vampir can be launched by double-clicking its icon or by using the command line interface, see Chapter 2.4. At the first start, Vampir will display instructions for license installation.

2.2 Generation of Trace Data on Windows Systems

2.2.1 Enabling Performance Tracing

The generation of trace log files for the Vampir performance visualization tool requires a working monitoring system to be attached to your parallel program. The Event Tracing for Windows (ETW) infrastructure of the Windows client and server operating systems is such a monitor. The Windows HPC Server 2008 version of MS-MPI has built-in support for this monitor. It enables application developers to quickly produce traces in production environments by simply adding an extra mpiexec flag (-trace). In order to trace an application, the user account is required to be a member of the Administrator or Performance Log Users groups. No special builds or administrati
58. n Chapter 3.5. In the following sections, we will explain the basic functions of the Vampir GUI, which are generic to all charts. If you are already familiar with the fundamentals, feel free to skip this chapter. The details of the different charts are explained in Chapter 4.

3.1 Chart Arrangement

Figure 3.2: A Custom Chart Arrangement in the Trace View Window

The utility of charts can be increased by correlating them and the information they provide. Vampir supports this mode of operation by allowing multiple charts to be displayed at the same time. All timeline charts, such as the Master Timeline and the Process Timeline
59. n all measurement points. The two left buttons in the dialog decide whether the counter should be selected by metric or by measuring point first. In the case of Select by Metric, the option Summarize multiple measuring points is also available. This option helps to identify outliers by summarizing counters (e.g., PAPI_FP_OPS) over multiple measuring points (e.g., processes). Hence, when this option is active, multiple measuring points can be selected. The counter for the selected metric is then summarized over all selected measuring points. The counter graphs displayed in the chart then need to be read as follows. The yellow average line in the middle displays the average value (e.g., PAPI_FP_OPS) of all selected measuring points (e.g., processes) at a given time. The red maximum line shows the highest value that one of the selected measuring points achieved at a given time. The blue minimum line shows the smallest value that one of the selected measuring points (e.g., processes) achieved at a given time. A click with the left mouse button on any point in the chart reveals its details in the Context View display. Stated are the minimum, maximum, and average values and the measurement points (e.g., processes) that achieved the maximum and minimum values at the selected point in time. The options dialog is depicted in Figure 4.8. It can be enabled via the context menu.
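The summarization over several measuring points can be illustrated with a small Python sketch. It assumes one counter sample per measuring point at a single point in time; the function name, the dictionary layout, and the sample values are hypothetical and only mirror what the text says the Context View reports (minimum, maximum, average, and which measuring points achieved the extremes).

```python
def summarize(samples):
    """Summarize one counter over several measuring points at one time.

    samples maps a measuring point (e.g., a process name) to its counter
    value (e.g., PAPI_FP_OPS) at the selected point in time.
    """
    if not samples:
        raise ValueError("no measuring points selected")
    lowest = min(samples, key=samples.get)    # point on the minimum line
    highest = max(samples, key=samples.get)   # point on the maximum line
    average = sum(samples.values()) / len(samples)  # the average line
    return {
        "min": samples[lowest], "min_at": lowest,
        "max": samples[highest], "max_at": highest,
        "avg": average,
    }


# Hypothetical floating point operation rates of three processes.
summary = summarize({"Process 0": 2.0e9, "Process 1": 3.0e9, "Process 2": 1.0e9})
print(summary)
```

A measuring point that sits far from the average at many points in time is the kind of outlier this option is meant to expose.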
60. n to work properly, you need a trace file with source code location support. The path to the source file can be adjusted in the Preferences dialog. A limit for the size of the source file to be opened can be set, too. In the Analysis section, the number of analysis threads can be chosen. If this option is disabled, Vampir determines the number automatically from the number of cores, e.g., two analysis threads on a dual-core machine. In the Updates section, the user can decide whether Vampir should check automatically for new versions.

Figure 6.1: General Settings

Vampir also features a color blindness support mode. On Linux systems, the Document layout option is also available. If this option is enabled, all open Trace View windows need to stay in one main wind
61. nalyzed now.

Figure 2.1: MS-MPI Tracing Overview

The following commands illustrate the procedure described above and show, as a practical example, how to trace an application on Windows HPC Server 2008. For proper utilization and thus successful tracing, the file system of the cluster needs to meet the following prerequisites:

• \\share\userHome is the shared user directory throughout the cluster
• The MS-MPI executable myApp.exe is available in the shared directory
• \\share\userHome\Trace is the directory where the OTF files are collected

1. Launch the application with tracing enabled (use of the -tracefile option):

    mpiexec -wdir \\share\userHome\ -tracefile %USERPROFILE%\trace.etl myApp.exe

• -wdir sets the working directory; myApp.exe has to be there
• %USERPROFILE% translates to the local home directory, e.g., C:\Users\userHome; on each compute node the eventlog file (.etl) is stored locally in this directory

2. Time-sync the eventlog files throughout all compute nodes:

    mpiexec -cores 1 -wdir %USERPROFILE% mpicsync trace.etl
62. Figure 4.29: Show functions inside a specified range

This example demonstrates the opposite behavior of the previous example. Here, all functions whose number of invocations lies outside the range between 2000 and 15000 are shown, i.e., functions with fewer than 2000 invocations and functions with more than 15000 invocations.

Show only functions that match any of the following conditions:

• Number of Invocations is less than 2000
• Number of Invocations is greater than 15000
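The two combination modes of the Function Filter correspond to a logical OR ("any") and a logical AND ("all") over the individual rules. The Python sketch below illustrates this with the invocation-count rules from the range examples; the function name, the rule representation as predicates, and the invocation counts are hypothetical and do not reflect how Vampir implements its filter.

```python
def apply_filter(functions, rules, mode):
    """Return the names of the functions that pass the filter.

    functions: dict mapping a function name to its number of invocations.
    rules: list of predicates that take the invocation count.
    mode: "any" (rules combined with OR) or "all" (rules combined with AND).
    """
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    combine = any if mode == "any" else all
    return [name for name, count in functions.items()
            if combine(rule(count) for rule in rules)]


# Hypothetical invocation counts, for illustration only.
functions = {"read": 18000, "free": 10850, "write": 3840, "init": 12}

# Outside the range: fewer than 2000 OR more than 15000 invocations.
outside = apply_filter(functions, [lambda n: n < 2000, lambda n: n > 15000], "any")
# Inside the range: more than 2000 AND fewer than 15000 invocations.
inside = apply_filter(functions, [lambda n: n > 2000, lambda n: n < 15000], "all")
print(outside)
print(inside)
```

Note that the "outside" rules can never both hold for one function, so combining them with "all" would show nothing; this is why that example relies on "any".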
63. nt performance views. Intuitive navigation and zooming are the key features of the tool, which help to quickly identify inefficient or faulty parts of a program code. Vampir implements optimized event analysis algorithms and customizable displays, which enable fast and interactive rendering of very complex performance monitoring data. Ultra-large data volumes can be analyzed with a parallel version of Vampir, which is available on request. Vampir has a product history of more than 15 years and is well established on Unix-based HPC systems. This tool experience is also available for HPC systems that are based on Microsoft Windows HPC Server 2008.

1.1 Event-based Performance Tracing and Profiling

In software analysis, the term profiling refers to the creation of tables which summarize the runtime behavior of programs by means of accumulated performance measurements. Its simplest variant lists all program functions in combination with the number of invocations and the time that was consumed. This type of profiling is also called inclusive profiling, as the time spent in subroutines is included in the statistics computation. A commonly applied method for analyzing details of parallel program runs is to record so-called trace log files during runtime. The data collection process itself is also referred to as tracing a program. Unlike profiling, the tracing approach records timed application events like function calls and message communication a
64. ny other chart. Zooming is disabled in the Pie Chart mode of the Function Summary, accessible via the context menu under Set Chart Mode → Pie Chart.

Figure 3.9: Zooming within a Chart

To zoom into an area, click and hold the left mouse button and select the area, as shown in Figure 3.9. It is possible to zoom horizontally and, in some charts, also vertically. In the Master Timeline, horizontal zooming defines the time interval to be visualized, whereas vertical zooming selects a group of processes to be displayed. To scroll horizontally, move the slider at the bottom or use the mouse wheel. To get back to the initial state of zooming, select Reset Horizontal Zoom or Reset Vertical Zoom (see Section 3.2) in the
65. Figure 5.5: Compare View with open charts

All charts work the same way as in the Trace View. Because the Compare View couples the zoom of all trace files, the charts can be used to compare performance characteristics. As can be seen in Figure 5.5, trace A has the longest duration.

Figure 5.6: Zoom to compute iterations of trace C

The duration of trace C is so short that
66. Figure 7.6: Overview showing a significant overall improvement

As shown by the Ruler (Chapter 4.1) in Figure 7.6, two large iterations now take 84 seconds to finish, whereas at first (Figure 7.1) they took roughly 140 seconds. This is a total speed gain of 40%. This large improvement has been achieved by using the insight into the program's runtime behavior provided by the Vampir toolkit to optimize the inefficient parts of the code.
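The quoted gain can be checked with a short calculation. The Python sketch below uses the two durations read off the Ruler (roughly 140 seconds before, 84 seconds after); the function name is hypothetical. It also shows the equivalent speedup factor, which is a different way of expressing the same result.

```python
def runtime_reduction(before, after):
    """Fraction of the original runtime that was saved."""
    return (before - after) / before


before, after = 140.0, 84.0  # seconds for two large iterations
reduction = runtime_reduction(before, after)
speedup = before / after

print(f"runtime reduction: {reduction:.0%}")
print(f"speedup factor: {speedup:.2f}")
```

The 40% figure is the share of runtime saved; expressed as a speedup factor, the optimized code runs about 1.67 times as fast.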
67. oint with the lowest value, and the yellow line indicates the average of all data points lying on this pixel width. When zooming into a smaller time range, fewer data points need to be displayed on the available pixel space. Eventually, when zooming in far enough, only one data point needs to be displayed on one pixel. The three graphs then merge. The actual measured data points can be displayed in the chart by enabling them via the context menu under Options.

Figure 4.7: Select metric dialog

The context menu entry Select Metric opens the selection dialog depicted in Figure 4.7. This dialog allows you to choose the counter displayed in the chart. Each counter is defined by its metric and its measuring point. Note: depending on the measurement, not all metrics might be available o
68. ommon prefix defined by the user. The master file is always named name.otf. The global definition file is named name.0.def. Events and local definitions are placed in files name.x.events and name.x.defs, where the latter files are optional. Snapshots and statistics are placed in files named name.x.snaps and name.x.stats, which are optional, too. Note: Open the master file (*.otf) to load a trace. When copying, moving, or deleting traces, it is important to take all associated files into account; otherwise Vampir will render the whole trace invalid. Good practice is to hold all files belonging to one trace in a dedicated directory. Detailed information about the Open Trace Format can be found in the Open Trace documentation.

1.3 Vampir and Windows HPC Server 2008

The Vampir performance visualization tool usually consists of a performance monitor (VampirTrace) that records performance data and a performance GUI, which is responsible for the graphical representation of the data. In Windows HPC Server 2008, the performance monitor is fully integrated into the operating system, which simplifies its employment and provides access to a wide range of system metrics. A simple execution flag controls the generation of performance data. This is very convenient and an important difference from solutions based on explicit source, object, or binary modifications. Windows HPC Server 2008 is shipped with a translator which produces trace log files in
69. ow. If it is disabled, the Trace View windows can be moved freely over the desktop.

6.2 Appearance

In the Appearance settings of the Preferences dialog, there are six different objects for which the color options can be changed: the functions/function groups, markers, counters, collectives, messages, and I/O events. Choose an entry and click on its color to make a modification. A color picker dialog opens where it is possible to adjust the color. For messages and collectives, a change of the line width is also available. In order to quickly find the desired item, a search box is provided at the bottom of the dialog.

Figure 6.2: Appearance Settings

6.3 Saving Policy

Vampir detects whenever changes to the various settings are made. In the Saving Policy dialog, it is possible to adjust the
70. own to zoom in and out, respectively. Dragging the zoom area changes the section that is displayed without changing the zoom factor. For dragging, click into the highlighted zoom area and drag and drop it to the desired position. Zooming and dragging within the Zoom Toolbar is illustrated in Figure 3.10. If the user double-clicks in the Zoom Toolbar, the view reverts to the initial zooming state.

Figure 3.10: Zooming and Navigation within the Zoom Toolbar: (A+B) Zooming in/out with the Mouse Wheel; (C) Scrolling by Moving the Highlighted Zoom Area; (D) Zooming by Selecting and Moving a Boundary of the Highlighted Zoom Area

The colors represent user-defined groups of functions or activities. Please note that all charts added to the Trace View window will calculate their statistic information according to the selected time interval (zooming state) in the Zoom Toolbar. The Zoom Toolbar can be enabled and disabled with the toolbar's context menu entry Zoom Toolbar.

3.5 The Charts Toolbar

Description M
71. Figure 4.1: Master Timeline

Figure 4.2: Process Timeline

Figure 4.3: Selected MPI Collective in Master Timeline

show detailed information about that particular function, e.g., its corresponding function group name, time interval, and complete name. The Context View display is explained in Chapter 4.3.3. Some function invocations are very short; hence, they are not shown in the overall view due to a lack of display pixels. A zooming mechanism is provided to inspect a spec
72. re the duration between two events in a timeline chart, Vampir provides a tool called Ruler. In order to use the Ruler, hold down the Shift key while clicking on any point of interest in a timeline chart, then move the mouse while keeping the left mouse button pressed. A ruler-like pattern appears in the timeline chart, which shows the exact time between the start point and the current mouse position.

4.1.1 Master Timeline and Process Timeline

In the Master Timeline and the Process Timeline, detailed information about functions, communication, and synchronization events is shown. Timeline charts are available for individual processes (Process Timeline) as well as for a collection of processes (Master Timeline). The Master Timeline consists of a collection of rows. Each row represents a single process, as shown in Figure 4.1. A Process Timeline shows the different levels of function calls in a stacked bar chart for a single process, as depicted in Figure 4.2. Every timeline row consists of a process name on the left and a colored sequence of function calls or program phases on the right. The color of a function is defined by its group membership, e.g., MPI_Send, belonging to the function group MPI, has the same color (presumably red) as MPI_Recv, which also belongs to the function group MPI. Clicking on a function highlights it and causes the Context View display to
73. rom the displayed information. To mark the function or function group to be profiled or filtered, just click on the associated color representation in the chart. The context menu entries Profile of Selected Function (Group) and Filter Selected Function (Group) will then provide the possibility to profile or filter the selected function or function group. Using the Process Filter (see Section 4.4) allows you to restrict this view to a set of processes. The context menu entry Sort by allows you to order function profiles by Number of Clusters. This option is only available if the chart is currently showing clusters. Otherwise, function profiles are sorted automatically by process. While profiling one function, the menu entry Sort by Value allows you to order functions by their execution time.

4.2.3 Message Summary

The Message Summary is a statistical chart showing an overview of all messages grouped by certain characteristics (Figure 4.14). All values are represented in a bar chart fashion. The number next to each bar is the group base, while the number inside a bar depicts the value depending on the chosen metric. Therefore, the Set Metric sub-menu of the context menu can be used to switch

Figure 4.14: Message Summary Chart with metric set to Message Transfer Rate showing the avera
74. s Figure

4.4 Information Filtering and Reduction

Due to the large amount of information that can be stored in trace files, it is usually necessary to reduce the displayed information according to some filter criteria. In Vampir, there are different ways of filtering. It is possible to limit the displayed information to a certain choice of processes or to specific types of communication events, e.g., to certain types of messages or collective operations. Deselecting an item in a filter means that this item is fully masked. In Vampir, filters are global. Therefore, masked items will no longer show up in any chart. Filtering affects not only all performance charts but also the Zoom Toolbar. The available filters can be reached via the Filter entry in the main menu. As an example, Figure 4.22 shows a typical process representation in the Process Filter dialog. This kind of representation is the same in all other filter dialog windows. Processes can be filtered by their Process Group, Communicators, and Process Hierarchy. Items to be filtered are arranged in a spreadsheet representation. In addition to selecting or deselecting an entire group of processes, it is also possible to filter single processes.
75. s a combination of timestamp, event type, and event-specific data. This creates a stream of events, which allows very detailed observations of parallel programs. With this technology, synchronization and communication patterns of parallel program runs can be traced and analyzed in terms of performance and correctness. The analysis is usually carried out in a postmortem step, i.e., after completion of the program. Needless to say, program traces can also be used to calculate the profiles mentioned above. Computing profiles from trace data allows arbitrary time intervals and process groups to be specified. This is in contrast to fixed profiles accumulated during runtime.

1.2 The Open Trace Format (OTF)

The Open Trace Format (OTF) was designed as a well-defined trace format with open, public-domain libraries for writing and reading. This open specification of the trace information enables analysis and visualization tools like Vampir to operate efficiently at large scale. The format addresses large applications written in an arbitrary combination of Fortran 77, Fortran 90/95, etc., C, and C++.

[Figure: files of a stream: Local Definitions (name.x.def), Events (name.x.events), Statistics (name.x.stats), Snapshots (name.x.snaps); Master Control (name.otf)]
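To illustrate how a profile can be computed from a stream of timed events for an arbitrary time interval, here is a minimal Python sketch. It assumes a simplified, hypothetical event layout (timestamp, kind, function name) for a single process; this is not the OTF record format, and the function name and event values are made up for illustration.

```python
from collections import defaultdict


def profile(events, start, end):
    """Compute invocation counts and inclusive time per function in [start, end].

    events: time-ordered list of (timestamp, kind, function) tuples for one
    process, where kind is "enter" or "leave".
    """
    stack = []                      # currently active calls: (function, enter time)
    calls = defaultdict(int)
    inclusive = defaultdict(float)
    for timestamp, kind, function in events:
        if kind == "enter":
            stack.append((function, timestamp))
            if start <= timestamp <= end:
                calls[function] += 1
        elif kind == "leave":
            name, entered = stack.pop()
            if name != function:
                raise ValueError("unbalanced enter/leave events")
            # Only the part of the call that overlaps the interval counts.
            overlap = min(timestamp, end) - max(entered, start)
            if overlap > 0:
                inclusive[function] += overlap
    return dict(calls), dict(inclusive)


# Hypothetical event stream of one process (times in seconds).
events = [
    (0.0, "enter", "main"),
    (1.0, "enter", "compute"),
    (3.0, "leave", "compute"),
    (3.0, "enter", "MPI_Send"),
    (4.0, "leave", "MPI_Send"),
    (5.0, "leave", "main"),
]

whole_run = profile(events, 0.0, 5.0)
window = profile(events, 2.0, 4.0)
print(whole_run)
print(window)
```

The same event stream yields different profiles for different intervals, which is exactly what a fixed profile accumulated during runtime cannot provide.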
76. sing the fold level buttons next to the function name. Functions can be called by many different caller functions, which is hardly obvious in the tree representation. Therefore, a relation view shows all callers and callees of the currently selected function in two separate lists, shown in the lower area in Figure 4.17. In order to find a certain function by its name, Vampir provides a search option accessible via the context menu entry Find. The entered keyword has to be confirmed by pressing the Return key. The Previous and Next buttons can be used to flip through the results. 4.3 Informational Charts. 4.3.1 Function Legend. The Function Legend lists all visible function groups of the loaded trace file along with their corresponding color. [Figure 4.18: Function Legend] If colors of functions are changed, they appear in a tree like
77. [Figure 4.26: Showing only functions with more than 250 ms duration] Combining Function Name and Duration Rules. This example combines the two previous rules. First, the any relation is used. Thus, the filter shows all functions that have a duration of at least 250 ms and, additionally, all MPI functions. [Filter dialog: Show only functions that match any of the following conditions: Name contains mpi; Duration is greater than 250 Milliseconds] [Figure 4.27: Combining rules using any] The second example illustrates the usage of the all relation. Here, all shown functions have to satisfy both rules. Therefore t
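The difference between the any and all relations can be sketched in a few lines of Python. This is only an illustration of the rule logic described above, not Vampir's filter engine: the record fields `name` and `duration_ms`, the helper names, and the case-insensitive "contains" match are assumptions.

```python
def name_contains_mpi(call):
    # Assumption: the "contains" rule is matched case-insensitively.
    return "mpi" in call["name"].lower()

def longer_than_250_ms(call):
    return call["duration_ms"] > 250

def apply_filter(calls, rules, relation="any"):
    """Keep the calls that satisfy any (or all) of the given rules."""
    combine = any if relation == "any" else all
    return [call for call in calls if combine(rule(call) for rule in rules)]

# Hypothetical function occurrences with made-up durations.
calls = [
    {"name": "MPI_Recv", "duration_ms": 10},
    {"name": "MODEL_INPUT", "duration_ms": 400},
    {"name": "MPI_Wait", "duration_ms": 300},
    {"name": "ADVECT_V", "duration_ms": 5},
]
rules = [name_contains_mpi, longer_than_250_ms]

shown_any = [c["name"] for c in apply_filter(calls, rules, "any")]
shown_all = [c["name"] for c in apply_filter(calls, rules, "all")]
print(shown_any)
print(shown_all)
```

With any, a call is shown as soon as one rule matches, so short MPI calls and long non-MPI calls both remain. With all, only calls that are both MPI functions and longer than 250 ms remain.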
78. ss Bar and Cancel Loading Button. The basic functionality and navigation elements of the GUI are described in Chapter. The available charts and the information provided by them are explained in Chapter. 3 Basics. After loading has been completed, the Trace View window title displays the trace file's name, as depicted in Figure 3.1. By default, the Charts toolbar and the Zoom Toolbar are available. [Figure 3.1: Trace View Window with Charts Toolbar (A) and Zoom Toolbar (B)] Furthermore, a default set of charts is opened automatically after loading has finished. The charts can be divided into three groups: timeline, statistical, and informational charts. Timeline charts show detailed event-based information for arbitrary time intervals, while statistical charts reveal accumulated measures which were computed from the corresponding event data. Informational charts provide additional or explanatory information regarding timeline and statistical charts. All available charts can be opened with the Charts toolbar, which is explained i
79. st hardware performance counters but arbitrary sample values. There can be counters for different statistical information as well, for instance counting the number of function calls or a value in an iterative approximation of the final result. Counters are defined during the instrumentation of the application and can be individually assigned to processes. [Figure 4.6: Counter Data Timeline] An example Counter Data Timeline chart is shown in Figure. The chart is restricted to one counter at a time. It shows the selected counter for one measuring point (e.g., a process). Using multiple instances of the Counter Data Timeline, counters or processes can be compared easily. The displayed graph in the chart is constructed from actual measurements (data points). Since display space is limited, it is likely that there are more data points than display pixels available. In that case, multiple data points need to be displayed on one pixel (width). Therefore, the counter values are displayed in three graphs: a maximum line (red), an average line (yellow), and a minimum line (blue). When multiple data points need to be displayed on one pixel width, the red line shows the data point with the highest value, the blue line shows the p
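The reduction of many data points to one pixel column can be sketched as follows. This is a minimal illustration of the maximum/average/minimum idea described above, not Vampir's rendering code; the function name `bucket_min_avg_max` and the equal-width bucketing by sample index are assumptions.

```python
def bucket_min_avg_max(values, pixel_count):
    """Reduce a series of counter samples to one (min, avg, max) per pixel.

    Samples are split into pixel_count consecutive buckets by index.
    A pixel without samples yields None.
    """
    n = len(values)
    reduced = []
    for pixel in range(pixel_count):
        start = pixel * n // pixel_count
        end = (pixel + 1) * n // pixel_count
        bucket = values[start:end]
        if not bucket:
            reduced.append(None)
            continue
        reduced.append((min(bucket), sum(bucket) / len(bucket), max(bucket)))
    return reduced

# Six hypothetical counter samples drawn on a chart three pixels wide.
samples = [1, 5, 3, 7, 2, 2]
for column in bucket_min_avg_max(samples, 3):
    print(column)
```

Drawing the three reduced series as separate lines keeps short spikes visible (in the maximum line) even when a single pixel covers many samples.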
80. t of the displayed value bars in the chart. This is useful for traces with a high number of processes. Here, the option Adjust Bar Height to Fit Chart Height tries to display all processes in the chart. This enables an overview of the counter data across the entire application run. [Figure 4.9: Performance Radar, showing the values of metric PAPI_FP_OPS over time for Process 0 to Process 15] [Figure 4.10: Adjusted value range in color scale] Set Chart Mode allows you to define whether minimum, maximum, or average values should be show
81. t synchronization routines. This section points out two easy ways to recognize this problem. Problem: As can be seen in Figure 7.2, each occurrence of the MICROPHYSICS routine (purple color) starts at the same time on all processes inside one iteration but takes between 1.3 and 1.7 seconds to finish. This imbalance leads to idle time in subsequent synchronization calls on processes 1 to 4, because they have to wait for process 0 to finish its work (marked parts in Figure 7.2). [Figure 7.2: Before Tuning: Master Timeline and Function Summary identifying MICROPHYSICS (purple color) as predominant and unbalanced] [Figure 7.3: After Tuning: Timeline and Function Summary showing an improvement in communication behavior] This is wasted time which could be used for computational
82. tational imbalance and the serial optimization. In the timeline, you can see that the duration of the MICROPHYSICS routine is now equal among all processes. Through serial optimization, the duration has been decreased from about 1.5 to 1.0 second. A decrease in duration of about 33% is quite good given the simplicity of the changes made. 7.2.3 High Cache Miss Rate. The latency gap between cache and main memory is about a factor of 8. Therefore, optimizing for cache usage is crucial for performance. If you don't access your data in a linear fashion, as the cache expects, so-called cache misses occur, and the affected instructions have to suspend execution until the requested data arrives from main memory. A high cache miss rate therefore indicates that performance might be improved by reordering the memory access pattern to match the cache layout of the platform. Problem: As can be seen in the Counter Data Timeline (Figure 7.4), the CLIPPING routine (light blue) causes a high number of L2 cache misses. Also, its duration is long enough to make it a candidate for inspection. These inefficiencies in cache usage were caused by nested loops which accessed data in a very random, non-linear fashion. Data access can only profit from the cache if subsequent read calls access data in the vicinity of the previously accessed data. Solution: After reordering the nested loops to match the memory order, the tuned version of the CLIPPING routine now nee
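Why loop order matters for cache usage can be made concrete with a small Python sketch that lists the memory offsets a nested loop touches. It is a hedged illustration, not the CLIPPING code: it assumes a two-dimensional array stored in row-major order (as in C; Fortran arrays are column-major, so the favorable loop order is reversed there), and the helper names are made up.

```python
def access_offsets(rows, cols, inner_loop_over_columns=True):
    """Return the linear offsets visited by a nested loop over a
    rows x cols array stored in row-major order (an assumption)."""
    offsets = []
    if inner_loop_over_columns:
        # Inner loop walks along a row: neighboring elements in memory.
        for i in range(rows):
            for j in range(cols):
                offsets.append(i * cols + j)
    else:
        # Inner loop walks down a column: jumps of `cols` elements.
        for j in range(cols):
            for i in range(rows):
                offsets.append(i * cols + j)
    return offsets

def adjacent_fraction(offsets):
    """Fraction of accesses that touch the element directly after the
    previously accessed one."""
    steps = [b - a for a, b in zip(offsets, offsets[1:])]
    return sum(1 for step in steps if step == 1) / len(steps)

print(adjacent_fraction(access_offsets(4, 4, inner_loop_over_columns=True)))
print(adjacent_fraction(access_offsets(4, 4, inner_loop_over_columns=False)))
```

The loop order that matches the memory order visits neighboring elements one after another, which is the access pattern a cache rewards; the swapped order jumps by a full row on almost every access.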
83. teristic [Figure 3.6: Undocking of a Chart] [Figure 3.7: Docking of a Chart]
84. triangle has to be selected. In that case, a second triangle indicating the end of the interval appears. Multiple I/O events are tricolored and drawn as a triangle with a line to the end of the interval. [Table 4.1: Additional Information in the Master and Process Timeline] A search string can be written in this field, and all corresponding function and function group occurrences are highlighted in yellow in the Master Timeline. An example search for the function MPI_Bcast is depicted in Figure 4.4. Furthermore, the Master Timeline also features an overlay mode for performance counter data (Figure 4.5). In order to activate the overlay mode, use the context menu Options → Performance Data. When the overlay mode is active, a control window appears at the top of the Master Timeline. It allows you to select the displayed counter data (metric). The counter data is displayed in a color-coded fashion, like in the Performance Radar (Section). The color scale can be freely customized by clicking on the wrench icon. The control window also provides an opacity control slider. This slider adjusts the opacity of the overlay and thus makes the underlying functions easily visible without the need to disable the overlay mode. 4.1.2 Counter Data Timeline. Counters are values collected over time to count certain events like floating point operations or cache misses. Counter values can be used to store not ju
85. ulated Exclusive Time. To change between these three modes, use the context menu entry Set Metric. The number of clustered profile bars is based on the chart height by default. You can also disable the clustering or set a fixed number of clusters via the context menu entry Clustering by selecting the corresponding value in the spin box. Located to the left of the clustered profile bars is a graphical overview indicating the processes associated with each cluster. Moving the cursor over the blue areas in the overview opens a tooltip stating the respective process name. [Figure 4.13: Process Summary] It is possible to profile only one function or function group or to hide arbitrary functions and function groups f
86. ve privileges are necessary. The cluster administrator will only have to add the Performance Log Users group to the head node's Users group if you want to use this group for tracing. Trace files will be generated during the execution of your application. The recorded trace log files include the following events: any MS-MPI application call and low-level communication within sockets, shared memory, and NetworkDirect implementations. Each event includes a high-precision CPU clock timer for precise visualization and analysis. 2.2.2 Tracing an MPI Application. The steps necessary for monitoring the MPI performance of an MS-MPI application are depicted in Figure. First, the application needs to be available throughout all compute nodes in the cluster and has to be started with tracing enabled. The Event Tracing for Windows (ETW) infrastructure writes eventlogs (.etl files) containing the respective MPI events of the application on each compute node. In order to achieve consistent event data across all compute nodes, clock corrections need to be applied. This step is performed after the successful run of the application using the Microsoft tool mpicsync. Now the eventlog files can be converted into OTF files with the help of the tool etl2otf. The last necessary step is to copy the generated OTF files from the compute nodes into one shared directory. This directory then includes all files needed by the Vampir performance GUI. The application performance can be a
87. wing conditions: Name contains mpi. [Figure 4.25: Showing only MPI] Showing only Functions with at least 250 ms Duration. This example demonstrates the filtering of functions by their duration. Here, only long function occurrences with a minimum duration of 250 ms are shown. All other functions are filtered out. [Filter dialog: Show only functions that match any of the following conditions: Duration is greater than 250 Milliseconds]
88. work if all MICROPHYSICS calls had the same duration. Another hint at this synchronization overhead is the fact that the MPI receive routine uses 17.6% of the time of one iteration (Function Summary in Figure 7.2). Solution: To even out this asymmetry, the code which determines the size of the work packages for each process had to be changed. To achieve the desired effect, an improved version of the domain decomposition has been implemented. Figure 7.3 shows that all occurrences of the MICROPHYSICS routine are vertically aligned, and thus balanced. Additionally, the MPI receive routine calls are now clearly shorter than before. Comparing the Function Summary of Figure 7.2 and Figure 7.3 shows that the relative time spent in MPI receive has decreased and, in turn, the time spent inside MICROPHYSICS has increased greatly. This means that we now spend more time computing and less time communicating, which is exactly what we want. 7.2.2 Serial Optimization. Inlining of frequently called functions and elimination of invariant calculations inside loops are two ways to improve the serial performance. This section shows how to detect candidate functions for serial optimization and suggests measures to speed them up. Problem: All performance charts in Vampir show information for the time span currently selected in the timeline. Thus, the most time-intensive routine of one iteration can be determined by zooming into one or more iter
