
Valgrind Documentation


Contents

1. In C++ it's important to deallocate memory in a way compatible with how it was allocated. The deal is: if allocated with malloc, calloc, realloc, valloc or memalign, you must deallocate with free. If allocated with new[], you must deallocate with delete[]. If allocated with new, you must deallocate with delete. The worst thing is that on Linux apparently it doesn't matter if you do muddle these up, and it all seems to work OK, but the same program may then crash on a different platform, Solaris for example. So it's best to fix it properly. According to the KDE folks, "it's amazing how many C++ programmers don't know this". Pascal Massimino adds the following clarification: delete[] must be used for objects allocated by new[] because the compiler stores the size of the array and the pointer-to-member to the destructor of the array's content just before the pointer actually returned. This implies a variable-sized overhead in what's returned by new or new[]. 3.3.5. Passing system call parameters with inadequate read/write permissions. Memcheck checks all parameters to system calls. It checks all the direct parameters themselves. Also, if a system call needs to read from a buffer provided by your program, Memcheck checks that the entire buffer is addressable and has valid data, i.e. it is readable. Also, if the system call needs to write to a user-supplied buffer, Memcheck checks that the buffer is address
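The pairing rule in the excerpt above can be illustrated with a minimal C++ sketch. The `Widget` type and the `matched_pairs_demo` function are hypothetical names invented for this illustration; they only make constructor and destructor calls countable.

```cpp
#include <cstdlib>

// Hypothetical type: counts live objects so mismatches would be visible.
struct Widget {
    static int live;
    Widget() { ++live; }
    ~Widget() { --live; }
};
int Widget::live = 0;

// Pairs each allocation with its matching deallocation and returns the
// number of Widget objects still alive afterwards (0 when all pairs match).
int matched_pairs_demo() {
    void* raw = std::malloc(64);   // malloc/calloc/realloc -> free
    std::free(raw);

    Widget* one = new Widget;      // new -> delete
    delete one;

    Widget* many = new Widget[4];  // new[] -> delete[]
    delete[] many;                 // runs all four destructors

    return Widget::live;
}
```

Swapping any of the three deallocation calls for another is undefined behaviour in C++, which is why a mismatch may appear to work on one platform and crash on another.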
2. The tool does nothing except run the program uninstrumented. These steps don't have to be followed exactly: you can choose different names for your source files, and use a different --prefix for ./configure. Now that we've set up, built and tested the simplest possible tool, onto the interesting stuff. 4.2.6. Writing the code. A tool must define at least these four functions: pre_clo_init(), post_clo_init(), instrument(), fini(). Also, it must use the macro VG_DETERMINE_INTERFACE_VERSION exactly once in its source code. If it doesn't, you will get a link error involving VG_(tool_interface_version). This macro is used to ensure the core/tool interface used by the core and a plugged-in tool are binary compatible. In addition, if a tool wants to use some of the optional services provided by the core, it may have to define other functions and tell the core about them. 4.2.7. Initialisation. Most of the initialisation should be done in pre_clo_init(). Only use post_clo_init() if a tool provides command line options and must do some initialisation after option processing takes place ("clo" stands for "command line options"). First of all, various "details" need to be set for a tool, using the functions VG_(details_*)(). Some are compulsory, some aren't. Some are used when constructing the startup message; detail_bug_reports_to is used if VG_(tool_panic)() is ever called, or a tool asser
3. 3.1.3. Associations. The most important extension to the original format of Cachegrind is the ability to specify call relationships among functions. More generally, you specify associations among positions. For this, the second part of the file can also contain association specifications. These look similar to position specifications, but consist of 2 lines. For calls, the format looks like:

   calls=(Call Count) (Destination position)
   (Source position) (Inclusive cost of call)

   The destination only specifies subpositions like line number. Therefore, to be able to specify a call to another function in another source file, you have to precede the above lines with a "cfn=" specification for the name of the called function, and a "cfl=" specification if the function is in another source file. The 2nd line looks like a regular cost line, with the difference that the inclusive cost spent inside of the function call has to be specified. Other associations are, for example, (conditional) jumps. See the reference below for details. 3.1.4. Extended Example. The following example shows 3 functions, main, func1, and func2. Function main calls func1 once and func2 3 times. func1 calls func2 2 times.

   events: Instructions

   fl=file1.c
   fn=main
   16 20
   cfn=func1
   calls=1 50
   16 400
   cfl=file2.c
   cfn=func2
   calls=3 20
   16 400

   fn=func1
   51 100
   cfl=file2.c
   cfn=func2
   calls=2 20
   51 300
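As a rough illustration of the two-line call association described above, the following C++ sketch reads a "calls=" line and the cost line that follows it. The `CallRecord` struct and `parse_call` function are hypothetical names for this example only, and the sketch assumes plain (uncompressed) numeric subpositions with a single line-number position column.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical holder for one call association.
struct CallRecord {
    std::uint64_t call_count = 0;
    std::uint64_t dest_line = 0;               // destination subposition
    std::uint64_t source_line = 0;             // source subposition
    std::vector<std::uint64_t> inclusive_cost; // one entry per event type
};

// Parses "calls=<count> <dest>" and the following "<source> <cost>..." line.
// Returns false if either line does not have the expected shape.
bool parse_call(const std::string& calls_line,
                const std::string& cost_line,
                CallRecord& out) {
    const std::string prefix = "calls=";
    if (calls_line.compare(0, prefix.size(), prefix) != 0) return false;

    std::istringstream head(calls_line.substr(prefix.size()));
    if (!(head >> out.call_count >> out.dest_line)) return false;

    std::istringstream body(cost_line);
    if (!(body >> out.source_line)) return false;

    out.inclusive_cost.clear();
    std::uint64_t cost = 0;
    while (body >> cost) out.inclusive_cost.push_back(cost);
    return !out.inclusive_cost.empty();
}
```

Fed the last two lines of the example ("calls=2 20" followed by "51 300"), this records 2 calls to line 20 of the callee, made from line 51, with an inclusive cost of 300 for the single event type.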
4. Cachegrind is a cache profiler. It performs detailed simulation of the I1, D1 and L2 caches in your CPU and so can accurately pinpoint the sources of cache misses in your code. If you desire, it will show the number of cache misses, memory references and instructions accruing to each line of source code, with per-function, per-module and whole-program summaries. If you ask really nicely it will even show counts for each individual machine instruction. On x86 and AMD64, Cachegrind auto-detects your machine's cache configuration using the CPUID instruction, and so needs no further configuration info in most cases. Cachegrind is nicely complemented by Josef Weidendorfer's amazing KCacheGrind visualisation tool (http://kcachegrind.sourceforge.net), a KDE application which presents these profiling results in a graphical and easier-to-understand form. Introduction. Helgrind finds data races in multithreaded programs. Helgrind looks for memory locations which are accessed by more than one (POSIX p-)thread, but for which no consistently used (pthread_mutex_)lock can be found. Such locations are indicative of missing synchronisation between threads, and could cause hard-to-find timing-dependent problems. Helgrind ("Hell's Gate", in Norse mythology) implements the so-called "Eraser" data-race-detection algorithm, along with various refinements (thread-segment lifetimes) which reduce the number of false errors it reports. It is as yet somewhat of
5. Why all the hassle? Because imagine the potential chaos of both the simulated and real CPUs executing in glibc.so. It just seems simpler and cleaner to be completely self-contained, so that only the simulated CPU visits glibc.so. In practice it's not much hassle anyway. Also, valgrind starts up before glibc has a chance to initialise itself, and who knows what difficulties that could lead to. Finally, glibc has definitions for some types, specifically sigset_t, which conflict with (are different from) the Linux kernel's idea of same. When Valgrind wants to fiddle around with signal stuff, it wants to use the kernel's definitions, not glibc's definitions. So it's simplest just to keep glibc out of the picture entirely. To find out which glibc symbols are used by Valgrind, reinstate the link flags -nostdlib -Wl,-no-undefined. This causes linking to fail, but will tell you what you depend on. I have mostly, but not entirely, got rid of the glibc dependencies; what remains is, IMO, fairly harmless. AFAIK the current dependencies are: memset, memcmp, stat, system, sbrk, setjmp and longjmp. Similarly, valgrind should not really import any headers other than the Linux kernel headers, since it knows of no API other than the kernel interface to talk to. At the moment this is really not in a good state, and vg_syscall_mem imports, via vg_unsafe.h, a significant number of C-library headers so as to know the sizes of various structs passed across the kernel boun
6. Using and understanding the Valgrind core. Here is an important point about the relationship between the commentary and profiling output from tools. The commentary contains a mix of messages from the Valgrind core and the selected tool. If the tool reports errors, it will report them to the commentary. However, if the tool does profiling, the profile data will be written to a file of some kind, depending on the tool, and independent of what --log-* options are in force. The commentary is intended to be a low-bandwidth, human-readable channel. Profiling data, on the other hand, is usually voluminous and not meaningful without further processing, which is why we have chosen this arrangement. 2.4. Reporting of errors. When one of the error-checking tools (Memcheck, Helgrind) detects something bad happening in the program, an error message is written to the commentary. For example:

   ==25832== Invalid read of size 4
   ==25832==    at 0x8048724: BandMatrix::ReSize(int, int, int) (bogon.cpp:45)
   ==25832==    by 0x80487AF: main (bogon.cpp:66)
   ==25832==  Address 0xBFFFF74C is not stack'd, malloc'd or free'd

   This message says that the program did an illegal 4-byte read of address 0xBFFFF74C, which, as far as Memcheck can tell, is not a valid stack address, nor corresponds to any currently malloc'd or free'd blocks. The read is happening at line 45 of bogon.cpp, called from line 66 of the same file, etc. For errors associated with an identifi
7. valgrind: vg_intercept.c:598 (vgAllRoadsLeadToRome_select): Assertion `ms_end >= ms_now' failed. Implement pthread_mutexattr_setpshared. Understand Pentium 4 branch hints. Also implemented a couple more obscure x86 instructions. Lots of other minor bug fixes. We have a decent regression test system, for the first time. This doesn't help you directly, but it does make it a lot easier for us to track the quality of the system, especially across multiple Linux distributions. You can run the regression tests with 'make regtest' after 'make install' completes. On SuSE 8.2 and Red Hat 9 I get this: == 84 tests, 0 stderr failures, 0 stdout failures ==. On Red Hat 8, I get this: == 84 tests, 2 stderr failures, 1 stdout failure == corecheck/tests/res_search (stdout), memcheck/tests/sigaltstack (stderr). sigaltstack is probably harmless. res_search doesn't work on R H 8 even running natively, so I'm not too worried. On Red Hat 7.3, a glibc-2.2.5 system, I get these harmless failures: == 84 tests, 2 stderr failures, 1 stdout failure == corecheck/tests/pth_atfork1 (stdout), corecheck/tests/pth_atfork1 (stderr), memcheck/tests/sigaltstack (stderr). You need to run on a PII system, at least, since some tests contain P6-specific instructions, and the test machine needs access to the internet so that corecheck/tests/res_search (a test that the DNS resolver works) can function. NEWS. As ever, thanks for the vast
8. 96966 valgrind fails when application opens more than 16 sockets
   97398 valgrind: vg_libpthread.c:2667 Assertion failed
   97407 valgrind: vg_mylibc.c:1226 (vgPlain_safe_fd): Assertion
   97427 "Warning: invalid file descriptor -1 in syscall close()"
   97785 missing backtrace
   97792 build in obj dir fails - autoconf / makefile cleanup
   97880 pthread_mutex_lock fails from shared library (special ker
   97975 program aborts without any VG messages
   98129 Failed when open and close file 230000 times using stdio
   98175 Crashes when using valgrind-2.2.0 with a program using al
   98288 Massif broken
   98303 UNIMPLEMENTED FUNCTION pthread_condattr_setpshared
   98630 failed--compilation missing warnings.pm, fails to make he
   98756 Cannot valgrind signal-heavy kdrive X server
   98966 valgrinding the JVM fails with a sanity check assertion
   99035 Valgrind crashes while profiling
   99142 loops with message "Signal 11 being dropped from thread 0
   99195 threaded apps crash on thread start (using QThread::start
   99348 Assertion `vgPlain_lseek(core_fd, 0, 1) == phdrs[i].p_off
   99568 False negative due to mishandling of mprotect
   99738 valgrind memcheck crashes on program that uses sigitimer
   99923 0-sized allocations are reported as leaks
   99949 program seg faults after exit()
   100036 "newSuperblock's request for 1048576 bytes failed"
   100116 valgrind: (pthread_cond_init): Assertion `sizeof(*cond)
   100486 memcheck reports "valgrind: the `impossible' ha
9. Consider the example code above. After the preliminary pass, we know we need two cost centres, one iCC and one dCC. So we allocate an array to store these, which looks like this:

   |(uninit)|  tag         (1 byte)
   |(uninit)|  instr_size  (1 byte)
   |(uninit)|  (padding)   (2 bytes)
   |(uninit)|  instr_addr  (4 bytes)
   |(uninit)|  I.a         (8 bytes)
   |(uninit)|  I.m1        (8 bytes)
   |(uninit)|  I.m2        (8 bytes)

   |(uninit)|  tag         (1 byte)
   |(uninit)|  instr_size  (1 byte)
   |(uninit)|  data_size   (1 byte)
   |(uninit)|  (padding)   (1 byte)
   |(uninit)|  instr_addr  (4 bytes)
   |(uninit)|  I.a         (8 bytes)
   |(uninit)|  I.m1        (8 bytes)
   |(uninit)|  I.m2        (8 bytes)
   |(uninit)|  D.a         (8 bytes)
   |(uninit)|  D.m1        (8 bytes)
   |(uninit)|  D.m2        (8 bytes)

   We can see now why we need tags to distinguish between the two types of cost centres. We also record the size of the array. We look up the debug info of the first instruction in the basic block, and then stick the array into a table indexed by filename and function name. This makes it easy to dump the information quickly to file at the end. 2.4. Instrumentation. How Cachegrind works. The instrumentation pass has two main jobs: 1. Fill in the gaps in the allocated cost centres. 2. Add UCode to call the cache simulator for each instruction. The instrumentation pass steps through the UCode and the cost centres in tandem. As each original x86 instruction's UCode is processed, the appropriate gaps in the instruction's cost centre are filled in, for example: ENSTRACe
10. For "line", the position is the line number of a source file, which is responsible for the events raised. Note that the mapping of "instr" and "line" positions is given by the debugging line information produced by the compiler. This field is optional. If not specified, "line" is supposed only. events: event type abbreviations [Cachegrind]. A list of short names of the event types logged in this file. The order is the same as in cost lines. The first event type is the second or third number in a cost line, depending on the value of "positions". Callgrind does not add additional cost types. Specify exactly once. Cost types from original Cachegrind are: Ir: Instruction read access; I1mr: Instruction Level 1 read cache miss; I2mr: Instruction Level 2 read cache miss. summary: costs [Callgrind], totals: costs [Cachegrind]. The value or the total number of events covered by this trace file. Both keys have the same meaning, but the "totals:" line happens to be at the end of the file, while "summary:" appears in the header. This was added to allow postprocessing tools to know the total cost in advance. The two lines always give the same cost counts. Callgrind Format Specification. 3.2.3. Description of Body Lines. There exist lines spec=position. The values for position specifications are arbitrary strings. When starting with "(" and a digit, it's a string in compressed format. Otherwise it's the real position string. This allows
11. INCEIPo $9 GETVFo q18 TESTVo q18 Jnzo $0x40435A50 (-rOSZACP) JMPo $0x40435A5B. The Design and Implementation of Valgrind. 1.2.10. Translation from UCode. This is all very simple, even though vg_from_ucode.c is a big file. Position-independent x86 code is generated into a dynamically allocated array emitted_code; this is doubled in size when it overflows. Eventually the array is handed back to the caller of VG_(translate), who must copy the result into TC and TT, and free the array. This file is structured into four layers of abstraction, which, thankfully, are glued back together with extensive __inline__ directives. From the bottom upwards: (1) Address-mode emitters, emit_amode_regmem_reg et al. (2) Emitters for specific x86 instructions. There are quite a lot of these, with names such as emit_movv_offregmem_reg. The v suffix is Intel parlance for a 16/32 bit insn; there are also b suffixes for 8 bit insns. (3) The next level up are the synth_* functions, which synthesise possibly a sequence of raw x86 instructions to do some simple task. Some of these are quite complex because they have to work around Intel's silly restrictions on subregister naming. See synth_nonshiftop_reg_reg for example. (4) Finally, at the top of the heap, we have emitUInstr(), which emits code for a single uinstr. Some comments: The hack for FPU instructions becomes apparent here. To do a FPU ucode instruction, we load the simulated FPU's state into from i
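The "doubled in size when it overflows" strategy mentioned above for emitted_code can be sketched as follows. This is a generic illustration of a doubling byte buffer, not the code in vg_from_ucode.c; `CodeBuffer`, `emit_byte`, `release` and the initial capacity of 8 are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical growable buffer for emitted machine-code bytes.
struct CodeBuffer {
    unsigned char* bytes = nullptr;
    std::size_t used = 0;
    std::size_t capacity = 0;
};

// Appends one byte, doubling the allocation whenever the buffer is full.
// Returns false if the allocation fails; the buffer is left unchanged.
bool emit_byte(CodeBuffer& buf, unsigned char b) {
    if (buf.used == buf.capacity) {
        std::size_t new_capacity = buf.capacity == 0 ? 8 : buf.capacity * 2;
        void* grown = std::realloc(buf.bytes, new_capacity);
        if (grown == nullptr) return false;
        buf.bytes = static_cast<unsigned char*>(grown);
        buf.capacity = new_capacity;
    }
    buf.bytes[buf.used++] = b;
    return true;
}

// Frees the buffer once the caller has copied the bytes elsewhere,
// mirroring the "caller copies the result and frees the array" hand-off.
void release(CodeBuffer& buf) {
    std::free(buf.bytes);
    buf = CodeBuffer{};
}
```

Doubling keeps the total copying cost proportional to the final size, so the emitters can append one byte at a time without worrying about how large a translation will end up.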
12. PUTL t2, %EAX
   0x40435A54: testb $0x20, 1(%ecx,%eax,2)
      7: GETL %EAX, t6
      8: GETL %ECX, t8
      9: LEA2L 1(t8,t6,2), t4
     10: LDB (t4), t10
     11: MOVB $0x20, t12
     12: ANDB t12, t10 (-wOSZACP)
     13: INCEIPo $9
   0x40435A59: jnz-8 0x40435A50
     14: Jnzo $0x40435A50 (-rOSZACP)
     15: JMPo $0x40435A5B

   Notice how the block always ends with an unconditional jump to the next block. This is a bit unnecessary, but makes many things simpler. Most x86 instructions turn into sequences of GET, PUT, LEA1, LEA2, LOAD and STORE. Some complicated ones, however, rely on calling helper bits of code in vg_helpers.S. The ucode instructions PUSH, POP, CALL, CALLM_S and CALLM_E support this. The calling convention is somewhat ad hoc and is not the C calling convention. The helper routines must save all integer registers, and the flags, that they use. Args are passed on the stack underneath the return address, as usual, and if result(s) are to be returned, it (they) are either placed in dummy arg slots created by the ucode PUSH sequence, or just overwrite the incoming args. In order that the instrumentation mechanism can handle calls to these helpers, VG_(saneUCodeBlock) enforces the following restrictions on calls to helpers: Each CALL uinstr must be bracketed by a preceding CALLM_S marker (dummy uinstr) and a trailing CALLM_E marker. These markers are used by the instrumentation mechanism later to establi
13. specifically, main executables do not have sonames. Any object lacking a soname is treated as if its soname was NONE, which is why the original example above had a name I_WRAP_SONAME_FNNAME_ZU(NONE,foo). 2.10.3. Wrapping Semantics. The ability for a wrapper to replace an infinite family of functions is powerful but brings complications in situations where ELF objects appear and disappear (are dlopen'd and dlclose'd) on the fly. Valgrind tries to maintain sensible behaviour in such situations. For example, suppose a process has dlopened (an ELF object with soname) object1.so, which contains function1. It starts to use function1 immediately. After a while it dlopens wrappers.so, which contains a wrapper for function1 in (soname) object1.so. All subsequent calls to function1 are rerouted to the wrapper. If wrappers.so is later dlclose'd, calls to function1 are naturally routed back to the original. Alternatively, if object1.so is dlclose'd but wrappers.so remains, then the wrapper exported by wrappers.so becomes inactive, since there is no way to get to it: there is no original to call any more. However, Valgrind remembers that the wrapper is still present. If object1.so is eventually dlopen'd again, the wrapper will become active again. In short, valgrind inspects all code loading/unloading events to ensure that the set of currently active wrappers remains consis
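The load/unload scenario above boils down to one rule: a wrapper is active only while both the object that exports it and the object it targets are loaded. The C++ sketch below models just that rule. It is a toy model under that assumption, not Valgrind's redirection machinery, and the `WrapperTracker` class and its method names are hypothetical.

```cpp
#include <set>
#include <string>

// Toy model: tracks which sonames are currently loaded.
class WrapperTracker {
public:
    void dlopen_object(const std::string& soname) { loaded_.insert(soname); }
    void dlclose_object(const std::string& soname) { loaded_.erase(soname); }

    // The wrapper is still remembered as long as its own object is loaded.
    bool wrapper_present(const std::string& wrapper_so) const {
        return loaded_.count(wrapper_so) != 0;
    }

    // The wrapper only takes effect while its target object is loaded too.
    bool wrapper_active(const std::string& wrapper_so,
                        const std::string& target_so) const {
        return loaded_.count(wrapper_so) != 0 && loaded_.count(target_so) != 0;
    }

private:
    std::set<std::string> loaded_;
};
```

Replaying the excerpt's sequence (open object1.so, open wrappers.so, close object1.so, reopen object1.so) makes the wrapper go inactive, active, inactive-but-present, and active again.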
14. 3 August 2005: 3.0.0 is a major overhaul of Valgrind. The most significant user-visible change is that Valgrind now supports architectures other than x86. The new architectures it supports are AMD64 and PPC32, and the infrastructure is present for other architectures to be added later. AMD64 support works well, but has some shortcomings. It generally won't be as solid as the x86 version. For example, support for more obscure instructions and system calls may be missing. We will fix these as they arise. Address space may be limited; see the point about position-independent executables below. If Valgrind is built on an AMD64 machine, it will only run 64-bit executables. If you want to run 32-bit x86 executables under Valgrind on an AMD64, you will need to build Valgrind on an x86 machine and copy it to the AMD64 machine. And it probably won't work if you do something tricky like exec'ing a 32-bit program from a 64-bit program while using --trace-children=yes. We hope to improve this situation in the future. The PPC32 support is very basic. It may not work reliably even for small programs, but it's a start. Many thanks to Paul Mackerras for his great work that enabled this support. We are working to make PPC32 usable as soon as possible. Other user-visible changes: Valgrind is no longer built by default as a position-independent executable (PIE), as this caused too many problems. Without PIE enabled, AMD64 p
15. 4. Set any breakpoints you want and proceed as normal for GDB: (gdb) b vgPlain_do_exec. The macro VG_(FUNC) is expanded to vgPlain_FUNC, so if you want to set a breakpoint on VG_(do_exec), you could do like this in GDB. 5. Run the tool with required options: (gdb) run pwd. Writing a New Valgrind Tool. GDB may be able to give you useful information. Note that by default most of the system is built with -fomit-frame-pointer, and you'll need to get rid of this to extract useful tracebacks from GDB. 4.2.11.3. UCode Instrumentation Problems. If you are having problems with your VEX UIR instrumentation, it's likely that GDB won't be able to help at all. In this case, Valgrind's --trace-flags option is invaluable for observing the results of instrumentation. 4.2.11.4. Miscellaneous. If you just want to know whether a program point has been reached, using the OINK macro (in include/pub_tool_libcprint.h) can be easier than using GDB. The other debugging command line options can be useful too (run valgrind --help-debug for the list). 4.3. Advanced Topics. Once a tool becomes more complicated, there are some extra things you may want/need to do. 4.3.1. Suppressions. If your tool reports errors and you want to suppress some common ones, you can add suppressions to the suppression files. The relevant files are valgrind/*.supp; the final suppression file is aggregated from these files by combining the relevant .supp file
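The name expansion mentioned above (VG_(FUNC) becoming vgPlain_FUNC) is ordinary preprocessor token pasting. The sketch below shows the mechanism in isolation; the two macros are modelled on the convention the excerpt describes, and `add_one` and `call_by_expanded_name` are hypothetical functions used only to show that both spellings name the same symbol.

```cpp
// Token-pasting helper: joins two identifiers into one.
#define VGAPPEND(str1, str2) str1##str2
// VG_(foo) expands to the single identifier vgPlain_foo.
#define VG_(str) VGAPPEND(vgPlain_, str)

// Defined through the macro, so the real symbol is vgPlain_add_one.
int VG_(add_one)(int x) { return x + 1; }

// Calls the same function by its expanded name, which is the name a
// debugger sees and the one to use when setting a breakpoint.
int call_by_expanded_name() { return vgPlain_add_one(41); }
```

This is why the breakpoint in step 4 is set on vgPlain_do_exec rather than on the VG_(do_exec) spelling that appears in the source.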
16. Addr8, Addr16, meaning an invalid address during a memory access of 1, 2, 4, 8 or 16 bytes respectively. Or Param, meaning an invalid system call parameter error. Or Free, meaning an invalid or mismatching free. Or Overlap, meaning a src/dst overlap in memcpy() or a similar function. Or Leak, meaning a memory leak. The extra information line: for Param errors, this is the name of the offending system call parameter. No other error kinds have this extra line. The first line of the calling context: for Value and Addr errors, it is either the name of the function in which the error occurred, or, failing that, the full path of the .so file or executable containing the error location. For Free errors, it is the name of the function doing the freeing (e.g. free, __builtin_vec_delete, etc). For Overlap errors, it is the name of the function with the overlapping arguments (e.g. memcpy(), strcpy(), etc). Lastly, there's the rest of the calling context. 3.5. Details of Memcheck's checking machinery. Read this section if you want to know, in detail, exactly what and how Memcheck is checking. 3.5.1. Valid-value (V) bits. It is simplest to think of Memcheck implementing a synthetic CPU which is identical to a real CPU, except for one crucial detail. Every bit (literally) of data processed, stored and handled by the real CPU has, in the synthetic CPU, an associated "valid-value" bit, which says whether or not the accompanying bit has a legitimate
17. This merely gives a handy name to the suppression, by which it is referred to in the summary of used suppressions printed out when a program finishes. It's not important what the name is; any identifying string will do. Second line: name of the tool(s) that the suppression is for (if more than one, comma-separated), and the name of the suppression itself, separated by a colon (Nb: no spaces are allowed), e.g. tool_name1,tool_name2:suppression_name. Recall that Valgrind is a modular system, in which different instrumentation tools can observe your program whilst it is running. Since different tools detect different kinds of errors, it is necessary to say which tool(s) the suppression is meaningful to. Tools will complain, at startup, if a tool does not understand any suppression directed to it. Tools ignore suppressions which are not directed to them. As a result, it is quite practical to put suppressions for all tools into the same suppression file. Valgrind's core can detect certain PThreads API errors, for which this line reads: core:PThread. Next line: a small number of suppression types have extra information after the second line (e.g. the Param suppression for Memcheck). Remaining lines: This is the calling context for the error, the chain of function calls that led to it. There can be up to four of these lines. Locations may be either names of shared objects/executables
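The shape of the second line of a suppression (comma-separated tool names, a colon, then the suppression name, with no spaces) can be checked with a few lines of C++. This is a minimal sketch of that one rule only, not Valgrind's own suppression reader; `SuppHeader` and `parse_tool_line` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result of parsing "tool1,tool2:suppression_name".
struct SuppHeader {
    std::vector<std::string> tools;
    std::string name;
};

// Returns false for lines containing spaces, lacking a colon, or having
// an empty tool name or an empty suppression name.
bool parse_tool_line(const std::string& line, SuppHeader& out) {
    if (line.find(' ') != std::string::npos) return false;  // no spaces allowed

    const std::size_t colon = line.find(':');
    if (colon == std::string::npos || colon == 0 || colon + 1 == line.size())
        return false;

    out.tools.clear();
    const std::string tool_list = line.substr(0, colon);
    std::size_t start = 0;
    while (true) {
        const std::size_t comma = tool_list.find(',', start);
        const std::string tool = tool_list.substr(
            start, comma == std::string::npos ? std::string::npos : comma - start);
        if (tool.empty()) return false;
        out.tools.push_back(tool);
        if (comma == std::string::npos) break;
        start = comma + 1;
    }

    out.name = line.substr(colon + 1);
    return true;
}
```

With this, "core:PThread" yields one tool and the name PThread, while "core: PThread" is rejected because of the space.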
18. it should be fairly obvious how the instrumentation machinery hangs together. One point, if you do this: in order to make it easy to differentiate TempRegs carrying values from TempRegs carrying V bit vectors, Valgrind prints the former as (for example) t28 and the latter as q28; the fact that they carry the same number serves to indicate their relationship. This is purely for the convenience of the human reader; the register allocator and code generator don't regard them as different. 1.2.6. Translation into UCode. VG_(disBB) allocates a new UCodeBlock and then uses disInstr to translate x86 instructions one at a time into UCode, dumping the result in the UCodeBlock. This goes on until a control-flow transfer instruction is encountered. Despite the large size of vg_to_ucode.c, this translation is really very simple. Each x86 instruction is translated entirely independently of its neighbours, merrily allocating new TempRegs as it goes. The idea is to have a simple translator (in reality, no more than a macro-expander), and the resulting bad UCode translation is cleaned up by the UCode optimisation phase which follows. To give you an idea of some x86 instructions and their translations, this is a complete basic block, as Valgrind sees it: 0x40435A50: incl %edx 0: GETL %EDX, t0 1: INCL t0 (-wOSZAP) 2: PUTL t0, %EDX 0x40435A51: movsbl (%edx),%eax 3: GETL %EDX, t2 4: LDB (t2), t2 5: WIDENL_Bs t2
19. name where name is a directory (all tests within will be run), or a single .vgperf test file, or the name of a program which has a like-named .vgperf file. E.g.: perl perf/vg_perf perf/; perl perf/vg_perf perf/bz2.vgperf; perl perf/vg_perf perf/bz2. README_DEVELOPERS. To compare multiple versions of Valgrind, use the --vg= option multiple times. For example, if you have two Valgrinds next to each other, one in trunk1/ and one in trunk2/, from within either trunk1/ or trunk2/ do this to compare them on all the performance tests: perl perf/vg_perf --vg=../trunk1 --vg=../trunk2 perf/. Debugging Valgrind with GDB. To debug the valgrind launcher program (<prefix>/bin/valgrind) just run it under gdb in the normal way. Debugging the main body of the valgrind code (and/or the code for a particular tool) requires a bit more trickery but can be achieved without too much problem by following these steps: (1) Set VALGRIND_LAUNCHER to <prefix>/bin/valgrind: export VALGRIND_LAUNCHER=/usr/local/bin/valgrind (2) Run "gdb <prefix>/lib/valgrind/<platform>/<tool>": gdb /usr/local/lib/valgrind/ppc32-linux/lackey (3) Do "handle SIGSEGV SIGILL nostop noprint" in GDB to prevent GDB from stopping on a SIGSEGV or SIGILL: (gdb) handle SIGILL SIGSEGV nostop noprint (4) Set any breakpoints you want and proceed as normal for gdb. The macro VG_(FUNC) is expanded to vgPlain_FUNC, so if you want to set a breakpoint VG_(do
20. not to mention fiddly and fragile. It needs to be cleaned up. The only perhaps surprise is that the whole thing is run on top of a setjmp-installed exception handler, because, supposing a translation got a segfault, we have to bail out of the Valgrind-supplied exception handler VG_(oursignalhandler) and immediately start running the client's segfault handler, if it has one. In particular we can't finish the current basic block and then deliver the signal at some convenient future point, because signals like SIGILL, SIGSEGV and SIGBUS mean that the faulting insn should not simply be re-tried. (I'm sure there is a clearer way to explain this.) 1.2.12. Lazy updates of the simulated program counter. Simulated %EIP is not updated after every simulated x86 insn as this was regarded as too expensive. Instead ucode INCEIP insns move it along as and when necessary. Currently we don't allow it to fall more than 4 bytes behind reality (see VG_(disBB) for the way this works). Note that %EIP is always brought up to date by the inner dispatch loop in VG_(dispatch), so that if the client takes a fault we know at least which basic block this happened in. 1.2.13. Signals. Horrible, horrible. vg_signals.c. Basically, since we have to intercept all system calls anyway, we can see when the client tries to install a signal handler. If it does so, we make a note of what the client asked to happen, and ask the kernel to route the signal to our own s
21. so the FPU hack applies. This should be fairly easy. 1.4.2. Fix stabs-info reader. The machinery in vg_symtab2.c which reads "stabs" style debugging info is pretty weak. It usually correctly translates simulated program counter values into line numbers and procedure names, but the file name is often completely wrong. I think the logic used to parse "stabs" entries is weak. It should be fixed. The simplest solution, IMO, is to copy either the logic or simply the code out of GNU binutils which does this; since GDB can clearly get it right, binutils (or GDB?) must have code to do this somewhere. 1.4.3. BT/BTC/BTS/BTR. These are x86 instructions which test, complement, set, or reset a single bit in a word. At the moment they are both incorrectly implemented and incorrectly instrumented. The incorrect instrumentation is due to use of helper functions. This means we lose bit-level definedness tracking, which could wind up giving spurious uninitialised-value use errors. The Right Thing to do is to invent a couple of new UOpcodes, I think GET_BIT and SET_BIT, which can be used to implement all 4 x86 insns, get rid of the helpers, and give bit-accurate instrumentation rules for the two new UOpcodes. I realised the other day that they are mis-implemented too. The x86 insns take a bit-index and a register or memory location to access. For registers the bit index clearly can only be in the range zero to register-width minus 1, and I assumed the sam
22. that you have to use an IP address here, rather than a hostname. Writing to a network socket is pretty useless if you don't have something listening at the other end. We provide a simple listener program, valgrind-listener, which accepts connections on the specified port and copies whatever it is sent to stdout. Probably someone will tell us this is a horrible security risk. It seems likely that people will write more sophisticated listeners in the fullness of time. valgrind-listener can accept simultaneous connections from up to 50 valgrinded processes. In front of each line of output it prints the current number of active connections in round brackets. valgrind-listener accepts two command-line flags: -e or --exit-at-zero: when the number of connected processes falls back to zero, exit. Without this, it will run forever, that is, until you send it Control-C. portnumber: changes the port it listens on from the default (1500). The specified port must be in the range 1024 to 65535. The same restriction applies to port numbers specified by a --log-socket= to Valgrind itself. If a valgrinded process fails to connect to a listener, for whatever reason (the listener isn't running, invalid or unreachable host or port, etc), Valgrind switches back to writing the commentary to stderr. The same goes for any process which loses an established connection to a listener. In other words, killing the listener doesn't kill the processes sending data to it.
23. to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others. This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software. We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference. 1. APPLICABILITY AND DEFINITIONS. This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The "Document", below, refers to any such manual or work. Any member of the public is a lice
24. whose eyes can see to the far regions of the nine worlds. Only those judged worthy by the guardians are allowed to pass through Valgrind. All others are refused entrance. It's not short for "value grinder", although that's not a bad guess. 2. Compiling, installing and configuring 2.1. When I try building Valgrind, make dies partway with an assertion failure, something like this: make: expand.c:489: allocated_variable_append: Assertion `current_variable_set_list->next != 0' failed. It's probably a bug in make. Some, but not all, instances of version 3.79.1 have this bug, see www.mail-archive.com/bug-make@gnu.org/msg01658.html. Try upgrading to a more recent version of make. Alternatively, we have heard that unsetting the CFLAGS environment variable avoids the problem. 2.2. When I try to build Valgrind, make fails with /usr/bin/ld: cannot find -lc collect2: ld returned 1 exit status. You need to install the glibc-static-devel package. 3. Valgrind aborts unexpectedly 3.1. Programs run OK on Valgrind, but at exit produce a bunch of errors involving __libc_freeres() and then die with a segmentation fault. When the program exits, Valgrind runs the procedure __libc_freeres() in glibc. This is a hook for memory debuggers, so they can ask glibc to free up any memory it has used. Doing that is needed to ensure that Valgrind doesn't incorrec
25. <yes|no> [default: no] When enabled, assume that reads and writes some small distance below the stack pointer are due to bugs in gcc 2.96, and do not report them. The "small distance" is 256 bytes by default. Note that gcc 2.96 is the default compiler on some older Linux distributions (RedHat 7.X) and so you may need to use this flag. Do not use it if you do not have to, as it can cause real errors to be overlooked. A better alternative is to use a more recent gcc/g++ in which this bug is fixed. --partial-loads-ok=<yes|no> [default: no] Controls how memcheck handles word-sized, word-aligned loads from addresses for which some bytes are addressable and others are not. When yes, such loads do not elicit an address error. Instead, the loaded V bytes corresponding to the illegal addresses indicate Undefined, and those corresponding to legal addresses are loaded from shadow memory, as usual. When no, loads from partially invalid addresses are treated the same as loads from completely invalid addresses: an illegal-address error is issued, and the resulting V bytes indicate valid data. Note that code that behaves in this way is in violation of the ISO C/C++ standards, and should be considered broken. If at all possible, such code should be fixed. This flag should be used only as a last resort. --undef-value-errors=<yes|no> [default: yes] Controls whether memcheck detects dangerous uses of
26. 1.2 How to navigate this manual; 2 Using and understanding the Valgrind core; 2.1 What Valgrind does with your program; 2.2 Getting started; 2.3 The Commentary; 2.4 Reporting of errors; 2.5 Suppressing errors; 2.6 Command-line flags for the Valgrind core; 2.6.1 Tool-selection option; 2.6.2 Basic Options; 2.6.3 Error-related options; 2.6.4 malloc()-related Options; 2.6.5 Uncommon Options; 2.6.6 Debugging Valgrind Options; 2.6.7 Setting default Options; 2.7 The Client Request mechanism; 2.8 Support for Threads; 2.9 Handling of Signals; 2.10 Function Wrapping; 2.10.1 A Simple Example
27. 1.4.5 User-defined Permission Ranges; 2 How Cachegrind Works; 2.1 Cache profiling; 2.2 Cost centres; 2.3 Storing cost-centres; 2.4 Instrumentation; 2.5 Handling basic block retranslations; 2.6 The cache simulation; 2.7 Output; 2.8 Summary of performance features; 2.9 Annotation; 2.10 Similar work, extensions; 3 Callgrind Format Specification; 3.1 Overview; 3.1.1 Basic Structure; 3.1.2 Simple Example; 3.1.3 Associations; 3.1.4 Extended Example; 3.1.5 Name Compression; 3.1.6 Subposition Compressio
28. 12345 There are several graphical front-ends for Valgrind, such as Valkyrie, Alleyoop and Valgui. See http://www.valgrind.org/downloads/guis.html for a list. BUGS FIXED: 109861 amd64 hangs at startup; 110301 ditto; 111554 valgrind crashes with Cannot allocate memory; 111809 Memcheck tool doesn't start java; 111901 cross-platform run of cachegrind fails on opteron; 113468 vgPlain_mprotect_range: Assertion `r != -1' failed; 92071 Reading debugging info uses too much memory; 109744 memcheck loses track of mmap from direct ld-linux.so.2; 110183 tail of page with _end; 82301 FV memory layout too rigid; 98278 Infinite recursion possible when allocating memory; 108994 Valgrind runs out of memory due to 133x overhead; 115643 valgrind cannot allocate memory; 105974 vg_hashtable.c static hash table; 109323 ppc32: dispatch.S uses Altivec insn, which doesn't work on POWER; 109345 ptrace_setregs not yet implemented for ppc; 110831 Would like to be able to run against both 32 and 64 bit binaries on AMD64; 110829 == 110831; 111781 compile of valgrind-3.0.0 fails on my linux (gcc 2.X prob); 112670 Cachegrind: cg_main.c:486 (handleOneStatement); 112941 vex x86: 0xD9 0xF4 (fxtract); 110201 == 112941; 113015 vex amd64->IR: 0xE3 0x14 0x48 0x83 (jrcxz); 113126 Crash with binaries built with -gstabs+/-ggdb; 104065 == 113126; 115741 == 113126; 113403 Partial SSE3 support on x86; 113541 vex: Grp5(x86) (alt encoding inc/dec) case 1; 113642 val
29. 4 1 Remove limit on number of semaphores supported. Add support for syscalls: set_tid_address (258), acct (51). Support instruction "repne movs", not official but seems to occur. Implement an emulated soft limit for file descriptors in addition to the current reserved area, which effectively acts as a hard limit. The setrlimit system call now simply updates the emulated limits as best as possible: the hard limit is not allowed to move at all and just returns EPERM if you try and change it. This should stop reductions in the soft limit causing assertions when valgrind tries to allocate descriptors from the reserved area. (This actually came from bug #83998.) Major overhaul of Cachegrind implementation. First user-visible change is that cachegrind.out files are now typically 90% smaller than they used to be; code annotation times are correspondingly much smaller. Second user-visible change is that hit/miss counts for code that is unloaded at run-time are no longer dumped into a single "discard" pile, but accurately preserved. Client requests for telling valgrind about memory pools. Developer (cvs head) release 2.1.1 (12 March 2004) 2.1.1 contains some internal structural changes needed for V's long-term future. These don't affect end-users. Most notable user-visible changes are: Greater isolation between Valgrind and the program being run, so the program is less likely to inadvertently kill Valgrind by
30. 5 uninit imada iii ey ey fem tag 1 byte instr size 1 bytes padding 2 bytes instr addr 4 bytes Tea 8 bytes E oom 8 bytes I m2 8 bytes tag 1 byte instr_size 1 byte data_size 1 byte padding 1 byte instr_addr 4 bytes la 8 bytes JE Sail 8 bytes Toma 8 bytes D a 8 bytes ID roll 8 bytes Dome 8 bytes Note that this step is not performed if a basic block is re translated see Handling basic block retranslations for more information GCC inserts padding before the instr size field so that it is word aligned The instrumentation added to call the cache simulation function looks like this instrumentation is indented to distinguish it from the original UCode 120 How Cachegrind works MOVL SOM c20 EE t20 SEAX PUSHL eax PUSHL ecx PUSHL Sedx OVL SOx4091F8A4 t46 address of 1st CC PUSHL t46 CALLMo SOx12 second cachesim function CLEARO 0x4 POPL Sedx POPL ecx POPL eax INCEIPo 5 LEA1L SAA MOVL OO reals OVL clad 15207 STL well ela PUSHL eax PUSHL ecx PUSHL Sedx PUSHL t42 OVL SOx4091F8C4 t44 address of 2nd CC PUSHL t44 CALLMo SiO sles second cachesim function CLEARo 0x8 POPL Sedx POPL ecx POPL eax TENCE EO SEES Consider the first instruction s UCode Each call is surrounded by three PUSHL and POPL instructions to save and restore the caller save registers Then t
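The two cost-centre layouts listed above (1-byte tag and size fields, padding, a 4-byte instruction address, then 8-byte counters) could be written as C structs roughly like this. It is a sketch under assumptions: the type and field names (`I_a`, `I_m1`, `D_a` and so on) are reconstructions of the garbled field list, not the definitions in the Cachegrind sources, and the padding is spelled out explicitly rather than left to the compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Instruction-only cost centre: 32 bytes in total (assumed layout). */
typedef struct {
    uint8_t  tag;          /* instruction-type identification tag */
    uint8_t  instr_size;   /* instruction size in bytes */
    uint8_t  padding[2];   /* keeps instr_addr word-aligned */
    uint32_t instr_addr;   /* 32-bit (x86) instruction address */
    uint64_t I_a;          /* instruction cache accesses */
    uint64_t I_m1;         /* level-1 instruction cache misses */
    uint64_t I_m2;         /* level-2 instruction cache misses */
} iCC;

/* Instruction + data cost centre: 56 bytes in total (assumed layout). */
typedef struct {
    uint8_t  tag;
    uint8_t  instr_size;
    uint8_t  data_size;    /* data reference size in bytes */
    uint8_t  padding[1];
    uint32_t instr_addr;
    uint64_t I_a, I_m1, I_m2;   /* instruction cache counters */
    uint64_t D_a, D_m1, D_m2;   /* data cache counters */
} idCC;
```

With these definitions the first eight bytes hold the unchanging per-instruction information and everything after them is counters, which matches the byte sizes given in the listing.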
31. Entitled "Acknowledgements", and any sections Entitled "Dedications". You must delete all sections Entitled "Endorsements". 6. COLLECTIONS OF DOCUMENTS You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects. You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document. 7. AGGREGATION WITH INDEPENDENT WORKS A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an "aggregate" if the copyright resulting from the compilation is not used to limit the legal rights of the compilation's users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document. If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half o
32. Suppression update for Debian unstable; 122067 amd64: fcmovnu (0xDB 0xD9); n-i-bz ppc32: broken signal handling in cpu feature detection; n-i-bz ppc32: rounding mode problems (improved, partial fix only); 119482 ppc32: mtfsb1; n-i-bz ppc32: mtocrf/mfocrf. (3.1.1: 15 March 2006, vex r1597, valgrind r5771). Release 3.1.0 (25 November 2005) 3.1.0 is a feature release with a number of significant improvements: AMD64 support is much improved, PPC32 support is good enough to be usable, and the handling of memory management and address space is much more robust. In detail: AMD64 support is much improved. The 64-bit vs. 32-bit issues in 3.0.X have been resolved, and it should "just work" now in all cases. On AMD64 machines both 64-bit and 32-bit versions of Valgrind are built. The right version will be invoked automatically, even when using --trace-children and mixing execution between 64-bit and 32-bit executables. Also, many more instructions are supported. PPC32 support is now good enough to be usable. It should work with all tools, but please let us know if you have problems. Three classes of CPUs are supported: integer only (no FP, no Altivec), which covers embedded PPC uses; integer and FP but no Altivec (G3-ish); and CPUs capable of Altivec too (G4, G5). Valgrind's address space management has been overhauled. As a result, Valgrind should be much more robust with programs that use large amounts of memory. There should be many
33. a crash, do the following. Try running with "--vex-guest-chase-thresh=0 --trace-flags=10000000 --trace-notbelow=999999". This should print one line for each block translated, and that includes the address. Then re-run with 999999 changed to the highest bb number shown. This will print the one line per block, and also will print a disassembly of the block in which the fault occurred. 8. README_PACKAGERS Greetings, packaging person! This information is aimed at people building binary distributions of Valgrind. Thanks for taking the time and effort to make a binary distribution of Valgrind. The following notes may save you some trouble. Unfortunate but true: when you configure to build with the --prefix=/foo/bar/xyzzy option, the prefix /foo/bar/xyzzy gets baked into valgrind. The consequence is that you _must_ install valgrind at the location specified in the prefix. If you don't, it may appear to work, but will break doing some obscure things, particularly doing fork() and exec(). So you can't build a relocatable RPM / whatever from Valgrind. Don't strip the debug info off stage2 or libpthread.so. Valgrind will still work if you do, but it will generate less helpful error messages. Here's an example: Mismatched free() / delete / delete [] at 0x40043249: free (vg_clientfuncs.c:171) by 0x4102BB4E: QGArray::~QGArray(void) (tools/qgarray.cpp:149) by 0x4C261C41: PptDoc::~PptDoc(void) (include/qmemarray.h:60) by 0
34. a lot of its contents are out of date and misleading. 1.1.1. History Valgrind came into public view in late Feb 2002. However, it has been under contemplation for a very long time, perhaps seriously for about five years. Somewhat over two years ago, I started working on the x86 code generator for the Glasgow Haskell Compiler (http://www.haskell.org/ghc), gaining familiarity with x86 internals on the way. I then did Cacheprof, gaining further x86 experience. Some time around Feb 2000 I started experimenting with a user-space x86 interpreter for x86-Linux. This worked, but it was clear that a JIT-based scheme would be necessary to give reasonable performance for Valgrind. Design work for the JITter started in earnest in Oct 2000, and by early 2001 I had an x86-to-x86 dynamic translator which could run quite large programs. This translator was in a sense pointless, since it did not do any instrumentation or checking. Most of the rest of 2001 was taken up designing and implementing the instrumentation scheme. The main difficulty, which consumed a lot of effort, was to design a scheme which did not generate large numbers of false uninitialised-value warnings. By late 2001 a satisfactory scheme had been arrived at, and I started to test it on ever-larger programs, with an eventual eye to making it work well enough so that it was helpful to folks debugging the upcoming version 3 of KDE. I've used KDE since before version 1.0, and wanted Valgrind to be an
35. a warning message explaining that annotations for the file might be incorrect. If you compile some files with -g and some without, some events that take place in a file without debug info could be attributed to the last line of a file with debug info (whichever one gets placed before the non-debug-info file in the executable). This list looks long, but these cases should be fairly rare. Note: stabs is not an easy format to read. If you come across bizarre annotations that look like they might be caused by a bug in the stabs reader, please let us know. 4.3.3. Accuracy Valgrind's cache profiling has a number of shortcomings: It doesn't account for kernel activity: the effect of system calls on the cache contents is ignored. It doesn't account for other process activity (although this is probably desirable when considering a single program). It doesn't account for virtual-to-physical address mappings; hence the entire simulation is not a true representation of what's happening in the cache. It doesn't account for cache misses not visible at the instruction level, e.g. those arising from TLB misses, or speculative execution. Valgrind will schedule threads differently from how they would be when running natively. This could warp the results for threaded programs. The x86/amd64 instructions "bts", "btr" and "btc" will incorrectly be counted as doing a data read if both the argument
36. address (A) bit. This indicates whether or not the program can legitimately read or write that location. It does not give any indication of the validity of the data at that location (that's the job of the V bits), only whether or not the location may be accessed. Every time your program reads or writes memory, Memcheck checks the A bits associated with the address. If any of them indicate an invalid address, an error is emitted. Note that the reads and writes themselves do not change the A bits, only consult them. So how do the A bits get set/cleared? Like this: When the program starts, all the global data areas are marked as accessible. When the program does malloc/new, the A bits for exactly the area allocated, and not a byte more, are marked as accessible. Upon freeing the area the A bits are changed to indicate inaccessibility. When the stack pointer register (SP) moves up or down, A bits are set. The rule is that the area from SP up to the base of the stack is marked as accessible, and below SP is inaccessible. (If that sounds illogical, bear in mind that the stack grows down, not up, on almost all Unix systems, including GNU/Linux.) Tracking SP like this has the useful side-effect that the section of stack used by a function for local variables etc. is automatically marked accessible on function entry and inaccessible on exit. When doing system calls, A bits are changed appropriately. For example, mmap magically makes files appe
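The A-bit rules above can be illustrated with a toy shadow map. This is a minimal sketch, not Memcheck's implementation: it uses one byte per address over a 256-byte pretend address space, and `a_bits`, `mark_range`, `is_accessible` and `on_sp_change` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MEM_SIZE 256                 /* size of the pretend address space */

uint8_t a_bits[TOY_MEM_SIZE];            /* 1 = accessible, 0 = inaccessible */

/* Set the A bits for [start, start+len), e.g. on malloc (1) or free (0). */
void mark_range(size_t start, size_t len, int accessible)
{
    size_t i;
    for (i = start; i < start + len && i < TOY_MEM_SIZE; i++)
        a_bits[i] = accessible ? 1 : 0;
}

/* Consult (never modify) the A bits for an access of len bytes at addr.
   Returns 1 only if every byte touched is accessible. */
int is_accessible(size_t addr, size_t len)
{
    size_t i;
    if (addr > TOY_MEM_SIZE || len > TOY_MEM_SIZE - addr)
        return 0;                        /* falls outside the toy memory */
    for (i = addr; i < addr + len; i++)
        if (!a_bits[i])
            return 0;
    return 1;
}

/* The stack grows down: a lower SP exposes new accessible stack,
   a higher SP makes the abandoned area inaccessible again. */
void on_sp_change(size_t old_sp, size_t new_sp)
{
    if (new_sp < old_sp)
        mark_range(new_sp, old_sp - new_sp, 1);
    else
        mark_range(old_sp, new_sp - old_sp, 0);
}
```

An access one byte past an allocated area fails the check because the bits are set for exactly the allocated range and not a byte more.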
37. amount of feedback and bug reports. We may not answer all messages, but we do at least look at all of them, and tend to fix the most frequently reported bugs. Version 1.9.6 (7 May 2003 or thereabouts) Major changes in 1.9.6: Improved threading support for glibc >= 2.3.2 (SuSE 8.2, RedHat 9, to name but two). It turned out that 1.9.5 had problems with threading support on glibc >= 2.3.2, usually manifested by threaded programs deadlocking in system calls, or running unbelievably slowly. Hopefully these are fixed now. 1.9.6 is the first valgrind which gives reasonable support for glibc-2.3.2. Also fixed a 2.3.2 problem with pthread_atfork(). Majorly expanded FAQ.txt. We've added workarounds for all common problems for which a workaround is known. Minor changes in 1.9.6: Fix identification of the main thread's stack. Incorrect identification of it was causing some on-stack addresses to not get identified as such. This only affected the usefulness of some error messages; the correctness of the checks made is unchanged. Support for kernels >= 2.5.68. Dummy implementations of __libc_current_sigrtmin, __libc_current_sigrtmax and __libc_allocate_rtsig, hopefully good enough to keep alive programs which previously died for lack of them. Fix bug in the VALGRIND_DISCARD_TRANSLATIONS client request. Fix bug in the DWARF2 debug line info loader, when instructions following each other have source lines far from each ot
38. an experimental tool, so your feedback is especially welcomed here. Helgrind has been hacked on extensively by Jeremy Fitzhardinge, and we have him to thank for getting it to a releasable state. NOTE: Helgrind is, unfortunately, not available in Valgrind 3.1.X, as a result of threading changes that happened in the 2.4.0 release. We hope to reinstate its functionality in a future 3.2.0 release. A couple of minor tools (Lackey and Nulgrind) are also supplied. These aren't particularly useful; they exist to illustrate how to create simple tools and to help the valgrind developers in various ways. Nulgrind is the null tool: it adds no instrumentation. Lackey is a simple example tool which counts instructions, memory accesses, and the number of integer and floating point operations your program does. Valgrind is closely tied to details of the CPU and operating system, and to a lesser extent, the compiler and basic C libraries. Nonetheless, as of version 3.1.0 it supports several platforms: x86/Linux (mature), AMD64/Linux (maturing), and PPC32/Linux (immature but works well). Valgrind uses the standard Unix ./configure, make, make install mechanism, and we have attempted to ensure that it works on machines with kernel 2.4 or 2.6 and glibc 2.2.X to 2.3.X. Valgrind is licensed under the GNU General Public License, version 2. The valgrind/*.h headers that you may wish to include in your code (e.g. valgrind.h, memcheck.h) are distributed under a BSD-st
39. apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems int
40. as needed to only see event counters happening while inside of the program part you want to profile. The second option can be used if the program part you want to profile is called many times. Option 1, i.e. creating a lot of dumps, is not practical here. Collection state can be toggled at entry and exit of a given function with the option --toggle-collect. If you use this flag, collection state should be switched off at the beginning. Note that the specification of --toggle-collect implicitly sets --collect-state=no. Collection state can also be toggled by using a Valgrind Client Request in your application. For this, include valgrind/callgrind.h and specify the macro CALLGRIND_TOGGLE_COLLECT at the needed positions. This will only have an effect if run under supervision of the Callgrind tool. --toggle-collect=<prefix> Toggle collection on entry/exit of a function whose name starts with <prefix>. --collect-jumps=<no|yes> [default: no] This specifies whether information for (conditional) jumps should be collected. As above, callgrind_annotate currently is not able to show you the data. You have to use KCachegrind to get jump arrows in the annotated code. 5.4.5. Cost entity separation options These options specify how event counts should be attributed to execution contexts. More specifically, they specify e.g. if the recursion level or the call chain leading to a function should be accounted
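As a rough illustration of the client-request approach described above, the fragment below brackets a region of interest with CALLGRIND_TOGGLE_COLLECT. The function `sum_of_squares` is a made-up example, and the no-op fallback definition is an assumption added only so the sketch still compiles on machines where valgrind/callgrind.h is not installed.

```c
/* Use the real header when the compiler can find it. */
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#  endif
#endif

/* Fallback (assumption for this sketch): without the header the
   macro becomes a no-op, so the program behaves identically. */
#ifndef CALLGRIND_TOGGLE_COLLECT
#  define CALLGRIND_TOGGLE_COLLECT ((void)0)
#endif

/* Hypothetical workload: only the loop is meant to be profiled. */
long sum_of_squares(int n)
{
    long sum = 0;
    int i;

    CALLGRIND_TOGGLE_COLLECT;        /* collection on (it starts off) */
    for (i = 0; i < n; i++)
        sum += (long)i * i;
    CALLGRIND_TOGGLE_COLLECT;        /* collection off again */
    return sum;
}
```

Because each request toggles the state, the program should start with collection switched off so that the first toggle turns it on, as the text notes.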
41. attach=yes conflicts with --trace-children=yes. You can't use them together. Valgrind refuses to start up in this situation. May 2002: this is a historical relic which could be easily fixed if it gets in your way. Mail us and complain if this is a problem for you. Nov 2002: if you're sending output to a logfile or to a network socket, I guess this option doesn't make any sense. Caveat emptor. --db-command=<command> [default: gdb -nw %f %p] Specify the debugger to use with the --db-attach command. The default debugger is gdb. This option is a template that is expanded by Valgrind at runtime. %f is replaced with the executable's file name and %p is replaced by the process ID of the executable. This specifies how Valgrind will invoke the debugger. By default it will use whatever GDB is detected at build time, which is usually /usr/bin/gdb. Using this command, you can specify some alternative command to invoke the debugger you want to use. The command string given can include one or more instances of the %p and %f expansions. Each instance of %p expands to the PID of the process to be debugged and each instance of %f expands to the path to the executable for the process to be debugged. --input-fd=<number> [default: 0, stdin] When using --db-attach=yes and --gen-suppressions=yes, Valgrind will stop so as to read keyboard input from you, when each error occurs. By default it reads fr
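The template expansion described above (each %f becomes the executable path, each %p becomes the PID) can be sketched as follows. This is an illustration only: `expand_db_command` is a hypothetical function, not Valgrind's own code, and it simply copies any other character through unchanged.

```c
#include <stdio.h>
#include <string.h>

/* Expand every "%f" in tmpl to exe_path and every "%p" to pid,
   writing the result to out (capacity out_len, NUL-terminated).
   Returns 0 on success, -1 if the result does not fit. */
int expand_db_command(const char *tmpl, const char *exe_path, long pid,
                      char *out, size_t out_len)
{
    size_t used = 0;

    if (out_len == 0)
        return -1;
    out[0] = '\0';

    while (*tmpl != '\0') {
        size_t room = out_len - used;   /* always >= 1 here */
        int n;

        if (tmpl[0] == '%' && tmpl[1] == 'f') {
            n = snprintf(out + used, room, "%s", exe_path);
            tmpl += 2;
        } else if (tmpl[0] == '%' && tmpl[1] == 'p') {
            n = snprintf(out + used, room, "%ld", pid);
            tmpl += 2;
        } else {
            n = snprintf(out + used, room, "%c", *tmpl);
            tmpl += 1;
        }
        if (n < 0 || (size_t)n >= room)
            return -1;                  /* output truncated */
        used += (size_t)n;
    }
    return 0;
}
```

For example, expanding the default template with the path /bin/ls and PID 1234 yields the command line gdb -nw /bin/ls 1234.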
42. beginning. Valgrind hopefully will have emitted a proper message to that effect before dying in this way. This is a known problem which we should fix. Read the Valgrind FAQ for more advice about common problems, crashes, etc. 2.13. Limitations The following list of limitations seems long. However, most programs actually work fine. Valgrind will run Linux ELF binaries, on a kernel 2.4.X or 2.6.X system, on the x86, amd64, ppc32 and ppc64 architectures, subject to the following constraints: On x86 and amd64, there is no support for 3DNow! instructions. If the translator encounters these, Valgrind will generate a SIGILL when the instruction is executed. Apart from that, on x86 and amd64, essentially all instructions are supported, up to and including SSE2. Version 3.1.0 includes limited support for SSE3 on x86. This could be improved if necessary. On ppc32 and ppc64, almost all integer, floating point and Altivec instructions are supported. Specifically: integer and FP insns that are mandatory for PowerPC, the "General-purpose optional" group (fsqrt, fsqrts, stfiwx), the "Graphics optional" group (fre, fres, frsqrte, frsqrtes), and the Altivec (also known as VMX) SIMD instruction set, are supported. Atomic instruction sequences are not properly supported, in the sense that their atomicity is not preserved. This will affect any use of synchronization via memory shared between processes. They
43. binary-incompatible changes. If the core and tool have the same major version number X, they should work together. If X doesn't match, Valgrind will abort execution with an explanation of the problem. This approach was chosen so that if the interface changes in the future, old tools won't work and the reason will be clearly explained, instead of possibly crashing mysteriously. We have attempted to minimise the potential for binary-incompatible changes by means such as minimising the use of naked structs in the interface. 4.4. Final Words This whole core/tool business is under active development, although it's slowly maturing. The first consequence of this is that the core/tool interface will continue to change in the future; we have no intention of freezing it and then regretting the inevitable stupidities. Hopefully most of the future changes will be to add new features, hooks, functions, etc., rather than to change old ones, which should cause a minimum of trouble for existing tools, and we've put some effort into future-proofing the interface to avoid binary incompatibility. But we can't guarantee anything. The versioning system should catch any incompatibilities. Just something to be aware of. The second consequence of this is that we'd love to hear your feedback about it: If you love it or hate it; If you find bugs; If you write a tool; If you have suggestions for new features, needs, trackable events, functions; If you have s
44. by D1mr + D1mw, and that L2 total accesses is given by I2mr + D2mr + D2mw. Events shown: the events shown (a subset of events gathered). This can be adjusted with the --show option. Event sort order: the sort order in which functions are shown. For example, in this case the functions are sorted from highest Ir counts to lowest. If two functions have identical Ir counts, they will then be sorted by I1mr counts, and so on. This order can be adjusted with the --sort option. Note that this dictates the order the functions appear. It is not the order in which the columns appear; that is dictated by the "events shown" line (and can be changed with the --show option). Threshold: cg_annotate by default omits functions that cause very low numbers of misses to avoid drowning you in information. In this case, cg_annotate shows summaries for the functions that account for 99% of the Ir counts; Ir is chosen as the threshold event since it is the primary sort event. The threshold can be adjusted with the --threshold option. Chosen for annotation: names of files specified manually for annotation; in this case none. Auto-annotation: whether auto-annotation was requested via the --auto=yes option. In this case no. Then follows summary statistics for the whole program. These are similar to the summary provided when running valgrind --tool=cachegrind. Then follows function-by-function statistics. Each funct
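The sort order and threshold described above can be sketched in C as below. This is one plausible reading, not cg_annotate's actual code: functions are ordered by Ir (highest first) with I1mr as the tie-breaker, and the threshold keeps the leading functions until they cover the requested percentage of total Ir. The type `FnCost` and the function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *name;         /* function name */
    unsigned long long Ir;    /* instructions executed */
    unsigned long long I1mr;  /* L1 instruction-cache read misses */
} FnCost;

/* qsort comparator: highest Ir first, then highest I1mr first. */
static int by_ir_then_i1mr(const void *pa, const void *pb)
{
    const FnCost *a = pa, *b = pb;
    if (a->Ir != b->Ir)
        return (a->Ir < b->Ir) - (a->Ir > b->Ir);
    return (a->I1mr < b->I1mr) - (a->I1mr > b->I1mr);
}

void sort_functions(FnCost *fns, size_t n)
{
    qsort(fns, n, sizeof fns[0], by_ir_then_i1mr);
}

/* Given sorted input, return how many leading functions are needed to
   account for at least pct percent of the total Ir count.
   (Assumes the counts are small enough that count * 100 cannot overflow.) */
size_t count_within_threshold(const FnCost *sorted, size_t n, unsigned pct)
{
    unsigned long long total = 0, running = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += sorted[i].Ir;
    for (i = 0; i < n; i++) {
        if (running * 100 >= total * pct)
            break;                      /* threshold already reached */
        running += sorted[i].Ir;
    }
    return i;
}
```

With three functions whose Ir counts are 980, 10 and 10, a 99% threshold keeps the first two; the one with the higher I1mr count comes first among the tied pair.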
45. checking, determines how willing memcheck is to consider different backtraces to be the same. When set to low, only the first two entries need match. When med, four entries have to match. When high, all entries need to match. For hardcore leak debugging, you probably want to use --leak-resolution=high together with --num-callers=40 or some such large number. Note however that this can give an overwhelming amount of information, which is why the defaults are 4 callers and low-resolution matching. Note that the --leak-resolution= setting does not affect memcheck's ability to find leaks. It only changes how the results are presented. --freelist-vol=<number> [default: 5000000] When the client program releases memory using free (in C) or delete (C++), that memory is not immediately made available for re-allocation. Instead, it is marked inaccessible and placed in a queue of freed blocks. The purpose is to defer as long as possible the point at which freed-up memory comes back into circulation. This increases the chance that memcheck will be able to detect invalid accesses to blocks for some significant period of time after they have been freed. This flag specifies the maximum total size, in bytes, of the blocks in the queue. The default value is five million bytes. Increasing this increases the total amount of memory used by memcheck but may detect invalid uses of freed blocks which would otherwise go undetected. --workaround-gcc296-bugs=
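The three leak-resolution levels above amount to comparing a different number of leading backtrace entries. The sketch below shows that idea; `LeakResolution` and `same_backtrace` are hypothetical names, and the handling of traces shorter than the comparison depth (they must then have equal length) is an assumption of this sketch rather than documented behaviour.

```c
#include <stddef.h>

typedef enum { RES_LOW, RES_MED, RES_HIGH } LeakResolution;

/* Backtraces are arrays of code addresses, innermost frame first.
   Returns 1 if the two traces count as "the same" at this resolution:
   low compares the first 2 entries, med the first 4, high all of them. */
int same_backtrace(const unsigned long *a, size_t na,
                   const unsigned long *b, size_t nb,
                   LeakResolution res)
{
    size_t limit, ca, cb, i;

    if (res == RES_LOW)
        limit = 2;
    else if (res == RES_MED)
        limit = 4;
    else
        limit = (size_t)-1;             /* high: compare everything */

    ca = na < limit ? na : limit;       /* entries to compare from a */
    cb = nb < limit ? nb : limit;       /* entries to compare from b */
    if (ca != cb)
        return 0;
    for (i = 0; i < ca; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Two leaks whose traces differ only in the fifth frame are therefore merged at low and med resolution but reported separately at high.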
46. data cache accesses. The iCC and dCC structs also store unchanging information about the instruction: an instruction-type identification tag (explained below); instruction size; data reference size (idCC only); instruction address. Note that data address is not one of the fields for idCC. This is because for many memory-referencing instructions the data address can change each time it's executed (e.g. if it uses register-offset addressing). We have to give this item to the cache simulation in a different way (see Instrumentation section below). Some memory-referencing instructions do always reference the same address, but we don't try to treat them specially, in order to keep things simple. Also note that there is only room for recording info about one data cache access in an idCC. So what about instructions that do a read then a write, such as inc (%esi)? In a write-allocate cache, as simulated by Valgrind, the write cannot miss, since it immediately follows the read which will drag the block into the cache if it's not already there. So the write access isn't really interesting, and Valgrind doesn't record it. This means that Valgrind doesn't measure memory references, but rather memory references that could miss in the cache. This behaviour is the same as that used by the AMD Athlon hardware counters. It also has the benefit of simplifying the implementation: instructions that read and write memory can be treated like instructions t
47. enabled. I have no plan to remove or disable them later. Over the past couple of months, as valgrind has become more widely used, they have shown their worth, pulling up various bugs which would otherwise have appeared as hard-to-find segmentation faults. I am of the view that it's acceptable to spend 5% of the total running time of your valgrindified program doing assertion checks and other internal sanity checks. Aside from the assertions, valgrind contains various sets of internal sanity checks, which get run at varying frequencies during normal operation. VG_(do_sanity_checks) runs every 1000 basic blocks, which means 500 to 2000 times/second for typical machines at present. It checks that Valgrind hasn't overrun its private stack, and does some simple checks on the memory permissions maps. Once every 25 calls it does some more extensive checks on those maps. Etc, etc. The following components also have sanity check code, which can be enabled to aid debugging: The low-level memory manager (VG_(mallocSanityCheckArena)). This does a complete check of all blocks and chains in an arena, which is very slow. Is not engaged by default. The symbol table reader(s): various checks to ensure uniqueness of mappings; see VG_(read_symbols) for a start. Is permanently engaged. The A and V bit tracking stuff in vg_memory.c. This can be compiled with cpp symbol VG_DEBUG_MEMORY defined, which removes all the fast, optimised cases, and uses simple bu
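The two check frequencies mentioned above (a cheap check every 1000 basic blocks, a more extensive one on every 25th cheap check) amount to a pair of counters. This is a sketch of that scheduling only; sanity_state and on_basic_block are invented names, and the real checks are replaced by counters.

```c
#include <stdint.h>

#define BLOCKS_PER_CHECK     1000u  /* cheap checks every 1000 basic blocks */
#define CHECKS_PER_EXTENSIVE 25u    /* extensive checks on every 25th call */

typedef struct {
    uint64_t blocks_run;        /* basic blocks executed so far */
    uint64_t cheap_checks;      /* times the cheap checks have run */
    uint64_t extensive_checks;  /* times the extensive checks have run */
} sanity_state;

/* Call once per executed basic block. */
void on_basic_block(sanity_state *s)
{
    s->blocks_run++;
    if (s->blocks_run % BLOCKS_PER_CHECK != 0)
        return;
    /* Stack-overrun and simple permission-map checks would go here. */
    s->cheap_checks++;
    if (s->cheap_checks % CHECKS_PER_EXTENSIVE == 0) {
        /* The slower, more thorough map checks would go here. */
        s->extensive_checks++;
    }
}
```

So the extensive checks run once per 25000 basic blocks, which is why they can afford to be slower.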
48. errno, h_errno or the DNS resolver functions in threaded programs, 20030716 should improve matters. This snapshot seems stable enough to run OpenOffice.org 1.1rc on Red Hat 7.3, SuSE 8.2 and Red Hat 9, and that's a big threaded app if ever I saw one. Automatic generation of suppression records; you no longer need to write them by hand. Use --gen-suppressions=yes. strcpy/memcpy/etc check their arguments for overlaps, when running with the Memcheck or Addrcheck skins. malloc_usable_size() is now supported. New client requests: VALGRIND_COUNT_ERRORS, VALGRIND_COUNT_LEAKS: useful with regression testing. VALGRIND_NON_SIMD_CALL[0123]: for running arbitrary functions on real CPU (use with caution!) The GDB attach mechanism is more flexible. Allow the GDB to be run to be specified by --gdb-path=/path/to/gdb, and specify which file descriptor V will read its input from with --input-fd=<number>. Cachegrind gives more accurate results (wasn't tracking instructions in malloc() and friends previously, is now). Complete support for the MMX instruction set. Partial support for the SSE and SSE2 instruction sets. Work for this is ongoing. About half the SSE/SSE2 instructions are done, so some SSE based programs may work. Currently you need to specify --skin=addrcheck. Basically not suitable for real use yet. Significant speedups (10%-20%) for standard memory checking. Fix assertion failure in pthread_once(). Fix this
49. for, and whether the thread ID should be remembered. Also see Avoiding cycles. --separate-threads=<no|yes> [default: no] This option specifies whether profile data should be generated separately for every thread. If yes, the file names get "-threadID" appended. --fn-recursion=<level> [default: 2] Separate function recursions, maximal <level>. See Avoiding cycles. --fn-caller=<callers> [default: 0] Separate contexts by maximal <callers> functions in the call chain. See Avoiding cycles. --skip-plt=<no|yes> [default: yes] Ignore calls to/from PLT sections. --fn-skip=<function> Ignore calls to/from a given function. E.g. if you have a call chain A > B > C, and you specify function B to be ignored, you will only see A > C. This is very convenient to skip functions handling callback behaviour. E.g. for the SIGNAL/SLOT mechanism in Qt, you only want to see the function emitting a signal to call the slots connected to that signal. First, determine the real call chain to see the functions needed to be skipped, then use this option. --fn-group<number>=<function> Put a function into a separate group. This influences the context name for cycle avoidance. All functions inside of such a group are treated as being the same for context name building, which resembles the call chain leading to a context. By specifying function groups with this optio
50. for example, the pid of the traced process is 12345. This is helpful when valgrinding a whole tree of processes at once, since it means that each process writes to its own logfile, rather than the result being jumbled up in one big logfile. If filename.12345 already exists, then it will name new files filename.12345.1 and so on. If you want to specify precisely the file name to use, without the trailing .12345 part, you can instead use --log-file-exactly=filename. You can also use the --log-file-qualifier=<VAR> option to modify the filename according to the environment variable VAR. This is rarely needed, but very useful in certain circumstances (eg. when running MPI programs). In this case, the trailing .12345 part is replaced by the contents of $VAR. The idea is that you specify a variable which will be set differently for each process in the job, for example BPROC_RANK or whatever is applicable in your MPI setup. 3. The least intrusive option is to send the commentary to a network socket. The socket is specified as an IP address and port number pair, like this: --log-socket=192.168.0.1:12345 if you want to send the output to host IP 192.168.0.1 port 12345 (I have no idea if 12345 is a port of pre-existing significance). You can also omit the port number: --log-socket=192.168.0.1, in which case a default port of 1500 is used. This default is defined by the constant VG_CLO_DEFAULT_LOGPORT in the sources. Note, unfortunately
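The naming scheme above (filename.12345, then filename.12345.1 and so on) can be written down in a couple of lines. This is a sketch of the naming rule only, with a hypothetical helper called format_log_name; it does not check whether files exist and is not taken from the Valgrind sources.

```c
#include <stdio.h>

/* Build "<base>.<pid>" when seq is 0, or "<base>.<pid>.<seq>" otherwise.
   Returns what snprintf returns: the length of the full name, which is
   >= size if the buffer was too small. */
int format_log_name(char *buf, size_t size, const char *base,
                    long pid, unsigned seq)
{
    if (seq == 0)
        return snprintf(buf, size, "%s.%ld", base, pid);
    return snprintf(buf, size, "%s.%ld.%u", base, pid, seq);
}
```

A caller would try seq = 0 first and move on to 1, 2, ... while the resulting name is already taken.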
51. funny things with pointers. "still reachable" means your program is probably ok: it didn't free some memory it could have. This is quite common and often reasonable. Don't use --show-reachable=yes if you don't want to see these reports. "suppressed" means that a leak error has been suppressed. There are some suppressions in the default suppression files. You can ignore suppressed errors. 7. How To Get Further Assistance Please read all of this section before posting. If you think an answer is incomplete or inaccurate, please e-mail valgrind@valgrind.org. Read the appropriate section(s) of the Valgrind Documentation. Read the Distribution Documents. Search the valgrind-users mailing list archives, using the group name gmane.comp.debugging.valgrind. Only when you have tried all of these things and are still stuck, should you post to the valgrind-users mailing list. In which case, please read the following carefully. Making a complete posting will greatly increase the chances that an expert or fellow user reading it will have enough information and motivation to reply. Make sure you give full details of the problem, including the full output of valgrind -v your-prog, if applicable. Also which Linux distribution you're using (Red Hat, Debian, etc) and its version number. You are in little danger of making your posting too long unless you include large chunks of Valgrind's (unsuppressed) output, so err on the side of giving too much
52. if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entire
53. indirect contribution to the KDE 3 development effort. At the start of Feb 02 the kde-core-devel crew started using it, and gave a huge amount of helpful feedback and patches in the space of three weeks. Snapshot 20020306 is the result. In the best Unix tradition, or perhaps in the spirit of Fred Brooks' depressing-but-completely-accurate epitaph "build one to throw away; you will anyway", much of Valgrind is a second or third rendition of the initial idea. The instrumentation machinery (vg_translate.c, vg_memory.c) and core CPU simulation (vg_to_ucode.c, vg_from_ucode.c) have had three redesigns and rewrites; the register allocator, low-level memory manager (vg_malloc2.c) and symbol table reader (vg_symtab2.c) are on the second rewrite. In a sense, this document serves to record some of the knowledge gained as a result. 1.1.2. Design overview Valgrind is compiled into a Linux shared object, valgrind.so, and also a dummy one, valgrinq.so, of which more later. The valgrind shell script adds valgrind.so to the LD_PRELOAD list of extra libraries to be loaded with any dynamically linked library. This is a standard trick, one which I assume the LD_PRELOAD mechanism was developed to support. valgrind.so is linked with the -z initfirst flag, which requests that its initialisation code is run before that of any other object in the executable image. When this happens, valgrind gains control. The real CPU
54. is very easy: you just replace the three files and recompile. 2.7. Output Output is fairly straightforward, basically printing the cost centre for every instruction, grouped by files and functions. Total counts (eg. total cache accesses, total L1 misses) are calculated when traversing this structure rather than during execution, to save time; the cache simulation functions are called so often that even one or two extra adds can make a sizeable difference. Input file has the following format:
    file         ::= desc_line* cmd_line events_line data_line+ summary_line
    desc_line    ::= "desc:" ws? non_nl_string
    cmd_line     ::= "cmd:" ws? cmd
    events_line  ::= "events:" ws? (event ws)+
    data_line    ::= file_line | fn_line | count_line
    file_line    ::= ("fl=" | "fi=" | "fe=") filename
    fn_line      ::= "fn=" fn_name
    count_line   ::= line_num ws? (count ws)+
    summary_line ::= "summary:" ws? (count ws)+
    count        ::= num | "."
Where: non_nl_string is any string not containing a newline; cmd is a command line invocation; filename and fn_name can be anything; num and line_num are decimal numbers; ws is whitespace; nl is a newline. The contents of the "desc:" lines is printed out at the top of the summary. This is a generic way of providing simulation specific information, eg. for giving the cache configuration for cache simulation. Counts can be "." to represent "N/A", eg. the number of write misses for an instruction t
55. least a little; you might have to do more complicated things with it later on. In particular, the name of the foobar_SOURCES variable determines the name of the tool, which determines what name must be passed to the --tool option to use the tool. Copy none/nl_main.c into foobar/, renaming it as fb_main.c. Edit it by changing the lines in pre_clo_init() to something appropriate for the tool. These fields are used in the startup message, except for bug_reports_to which is used if a tool assertion fails. Edit Makefile.am, adding the new directory foobar to the SUBDIRS variable. Edit configure.in, adding foobar/Makefile to the AC_OUTPUT list. Run: autogen.sh ./configure --prefix=`pwd`/inst make install It should automake, configure and compile without errors, putting copies of the tool in foobar/ and inst/lib/valgrind/. 8. You can test it with a command like: inst/bin/valgrind --tool=foobar date (almost any program should work; date is just an example). The output should be something like this: ==738== foobar-0.0.1, a foobarring tool for x86-linux. ==738== Copyright (C) 1066AD, and GNU GPL'd, by J. Random Hacker. ==738== Built with valgrind-1.1.0, a program execution monitor. ==738== Copyright (C) 2000-2003, and GNU GPL'd, by Julian Seward. ==738== Estimated CPU clock rate is 1400 MHz ==738== For more details, rerun with: -v ==738== Wed Sep 25 10:31:54 BST 2002 ==738==
56. like this can unnecessarily increase the amount of memory they are using over time. 6.1.1. Why Use a Heap Profiler? Everybody knows how useful time profilers are for speeding up programs. They are particularly useful because people are notoriously bad at predicting where the bottlenecks in their programs are. But the story is different for heap profilers. Some programming languages, particularly lazy functional languages like Haskell, have quite sophisticated heap profilers. But there are few tools as powerful for profiling C and C++ programs. Why is this? Maybe it's because C and C++ programmers must think that they know where the memory is being allocated. After all, you can see all the calls to malloc() and new and new[], right? But, in a big program, do you really know which heap allocations are being executed, how many times, and how large each allocation is? Can you give even a vague estimate of the memory footprint for your program? Do you know this for all the libraries your program uses? What about administration bytes required by the heap allocator to track heap blocks: have you thought about them? What about the stack? If you are unsure about any of these things, maybe you should think about heap profiling. Massif can tell you these things. Or maybe it's because it's relatively easy to add basic heap profiling functionality into a program, to tell you how many bytes you have allocated for certain objects, or similar. But this in
57. new architecture support. The new JIT unfortunately translates more slowly than the old one, so programs may take longer to start. We believe the code quality it produces is about the same, so once started, programs should run at about the same speed. Feedback about this would be useful. On the plus side, Vex and hence Memcheck tracks value flow properly through floating point and vector registers, something the 2.X line could not do. That means that Memcheck is much more likely to be usably accurate on vectorised code. There is a subtle change to the way exiting of threaded programs is handled. In 3.0, Valgrind's final diagnostic output (leak check, etc) is not printed until the last thread exits. If the last thread to exit was not the original thread which started the program, any other process wait()-ing on this one to exit may conclude it has finished before the diagnostic output is printed. This may not be what you expect. 2.X had a different scheme which avoided this problem, but caused deadlocks under obscure circumstances, so we are trying something different for 3.0. Small changes in control log file naming which make it easier to use valgrind for debugging MPI-based programs. The relevant new flags are --log-file-exactly= and --log-file-qualifier=. As part of adding AMD64 support, DWARF2 CFI-based stack unwinding support was added. In principle this means Valgrind can produce meaningful backtraces on x86 co
58. now run itself (see README_DEVELOPERS for how). This is not much use to you, but it means the developers can now profile Valgrind using Cachegrind. As a result a couple of performance bad cases have been fixed. The XML output format has changed slightly. See docs/internals/xml-output.txt. Core dumping has been reinstated (it was disabled in 3.0.0 and 3.0.1). If your program crashes while running under Valgrind, a core file with the name "vgcore.<pid>" will be created (if your settings allow core file creation). Note that the floating point information is not all there. If Valgrind itself crashes, the OS will create a normal core file. The following are some user-visible changes that occurred in earlier versions that may not have been announced, or were announced but not widely noticed. So we're mentioning them now. The --tool flag is optional once again; if you omit it, Memcheck is run by default. The --num-callers flag now has a default value of 12. It was previously 4. The --xml=yes flag causes Valgrind's output to be produced in XML format. This is designed to make it easy for other programs to consume Valgrind's output. The format is described in the file docs/internals/xml-format.txt. The --gen-suppressions flag supports an "all" value that causes every suppression to be printed without asking. The --log-file option no longer puts "pid" in the filename, eg. the old name "foo.pid12345" is now "foo.
59. of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. <one line to give the program's name and a brief idea of what it does.> Copyright (C) <year> <name of author> This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c'
60. or wildcards matching function names. They begin obj: and fun: respectively. Function and object names to match against may use the wildcard characters * and ?. Important note: C++ function names must be mangled. If you are writing suppressions by hand, use the --demangle=no option to get the mangled names in your error messages. Finally, the entire suppression must be between curly braces. Each brace must be the first character on its own line. A suppression only suppresses an error when the error matches all the details in the suppression. Here's an example: { __gconv_transform_ascii_internal/__mbrtowc/mbtowc Memcheck:Value4 fun:__gconv_transform_ascii_internal fun:__mbr*toc fun:mbtowc } What it means is: for Memcheck only, suppress a use-of-uninitialised-value error, when the data size is 4, when it occurs in the function __gconv_transform_ascii_internal, when that is called from any function of name matching __mbr*toc, when that is called from mbtowc. It doesn't apply under any other circumstances. The string by which this suppression is identified to the user is __gconv_transform_ascii_internal/__mbrtowc/mbtowc. (See Writing suppression files for more details on the specifics of Memcheck's suppression kinds.) Another example, again for the Memcheck tool: { libX11.so.6.2/libX11.so.6.2/libXaw.so.7.0 Memcheck:Value4 obj:/usr/X11R6/lib/libX11.so.6.2 obj:/usr/X11R6/lib/libX11.so.6.2 obj:/usr/X11R6/lib/libXaw.so.7.0 } Suppress any size 4 uni
61. point of terminology: most references to "valgrind" in the rest of this section (Section 2) refer to the Valgrind core services. 2.1. What Valgrind does with your program Valgrind is designed to be as non-intrusive as possible. It works directly with existing executables. You don't need to recompile, relink, or otherwise modify the program to be checked. Simply put valgrind --tool=tool_name at the start of the command line normally used to run the program. For example, if you want to run the command ls -l using the heavyweight memory-checking tool Memcheck, issue the command: valgrind --tool=memcheck ls -l (Memcheck is the default, so if you want to use it you can actually omit the --tool flag.) Regardless of which tool is in use, Valgrind takes control of your program before it starts. Debugging information is read from the executable and associated libraries, so that error messages and other outputs can be phrased in terms of source code locations (if that is appropriate). Your program is then run on a synthetic CPU provided by the Valgrind core. As new code is executed for the first time, the core hands the code to the selected tool. The tool adds its own instrumentation code to this and hands the result back to the core, which coordinates the continued execution of this instrumented code. The amount of instrumentation code added varies widely between tools. At one end of the scale, Memcheck adds code to check every memory access and every
62. pointer, and the assigned function will be called each time this happens. More information about "details", "needs" and "trackable events" can be found in include/pub_tool_tooliface.h. 4.2.8. Instrumentation instrument() is the interesting one. It allows you to instrument VEX IR, which is Valgrind's RISC-like intermediate language. VEX IR is described in Introduction to UCode. The easiest way to instrument VEX IR is to insert calls to C functions when interesting things happen. See the tool "Lackey" (lackey/lk_main.c) for a simple example of this, or Cachegrind (cachegrind/cg_main.c) for a more complex example. 4.2.9. Finalisation This is where you can present the final results, such as a summary of the information collected. Any log files should be written out at this point. 4.2.10. Other Important Information Please note that the core/tool split infrastructure is quite complex and not brilliantly documented. Here are some important points, but there are undoubtedly many others that I should note but haven't thought of. The files include/pub_tool_*.h contain all the types, macros, functions, etc. that a tool should (hopefully) need, and are the only .h files a tool should need to #include. In particular, you can't use anything from the C library (there are deep reasons for this, trust us). Valgrind provides an implementation of a reasonable subset of the C library, details of which are in pub_tool_libc*.h. Similarly, when wri
63. program will use the native libpthread, but not all of its facilities will work. In particular, synchronisation of processes via shared-memory segments will not work. This relies on special atomic instruction sequences which Valgrind does not emulate in a way which works between processes. Unfortunately there's no way for Valgrind to warn when this is happening, and such calls will mostly work; it's only when there's a race that it will fail. Valgrind also supports direct use of the clone() system call, futex() and so on. clone() is supported where either everything is shared (a thread) or nothing is shared (fork-like); partial sharing will fail. Again, any use of atomic instruction sequences in shared memory between processes will not work reliably. 2.9. Handling of Signals Valgrind has a fairly complete signal implementation. It should be able to cope with any valid use of signals. If you're using signals in clever ways (for example, catching SIGSEGV, modifying page state and restarting the instruction), you're probably relying on precise exceptions. In this case, you will need to use --vex-iropt-precise-memory-exns=yes. If your program dies as a result of a fatal core-dumping signal, Valgrind will generate its own core file (vgcore.NNNNN) containing your program's state. You may use this core file for post-mortem debugging with gdb or similar. (Note: it will not generate a core if your core dump size limit is 0.) At the time of writ
64. should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items, whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. <signature of Ty Coon>, 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Library General Public License instead of this License. 2. The GNU Free Documentation License GNU Free Documentation License Version 1.2, November 2002 Copyright (C) 2000,2001,2002 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. 0. PREAMBLE The purpose of this License is to make a manual, textbook, or other functional and useful document "free" in the sense of freedom
65. j += a[i]; } if ( j == 77 ) printf("hello there\n"); then Valgrind will complain, at the if, that the condition depends on uninitialised values. Note that it doesn't complain at the j += a[i];, since at that point the undefinedness is not "observable". It's only when a decision has to be made as to whether or not to do the printf (an observable action of your program) that Memcheck complains. Most low level operations, such as adds, cause Memcheck to use the V bits for the operands to calculate the V bits for the result. Even if the result is partially or wholly undefined, it does not complain. Checks on definedness only occur in three places: when a value is used to generate a memory address, when a control flow decision needs to be made, and when a system call is detected, Valgrind checks definedness of parameters as required. If a check should detect undefinedness, an error message is issued. The resulting value is subsequently regarded as well-defined. To do otherwise would give long chains of error messages. In effect, we say that undefined values are non-infectious. This sounds overcomplicated. Why not just check all reads from memory, and complain if an undefined value is loaded into a CPU register? Well, that doesn't work well, because perfectly legitimate C programs routinely copy uninitialised values around in memory, and we don't want endless complaints about that. Here's the canonical example. Consider a struc
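The rule described above (propagate V bits silently through arithmetic, complain only at an observable use, then regard the value as defined) can be mimicked with a toy shadow-value model. The sketch below is not Memcheck's code. The names shadow_word, sw_add and sw_branch_on are invented, a set V bit is taken to mean "undefined", and the carry handling in sw_add (everything at or above the lowest undefined bit becomes undefined) is a deliberately pessimistic approximation chosen for the example.

```c
#include <stdint.h>

typedef struct {
    uint32_t value;  /* the actual data */
    uint32_t vbits;  /* shadow bits: 1 = undefined, 0 = defined */
} shadow_word;

/* Add two shadowed values.  No complaint is made here, however undefined
   the operands are; the undefinedness is just carried into the result. */
shadow_word sw_add(shadow_word a, shadow_word b)
{
    shadow_word r;
    uint32_t u = a.vbits | b.vbits;
    r.value = a.value + b.value;
    /* A carry out of an undefined bit can disturb every higher bit, so
       mark the lowest undefined bit and all bits above it as undefined. */
    r.vbits = u | (0u - u);
    return r;
}

/* A control-flow decision on w: this is an observable use, so report an
   error if any bit is undefined, then treat w as defined from now on so
   that one bad value does not produce a chain of reports. */
int sw_branch_on(shadow_word *w, int *errors)
{
    if (w->vbits != 0) {
        (*errors)++;
        w->vbits = 0;
    }
    return w->value == 77;
}
```

Summing an uninitialised array element into j through sw_add reports nothing; the first sw_branch_on on j reports once, and a second one stays quiet, which is the non-infectious behaviour the text describes.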
66. started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of anoth
67. the CPU so that we can keep going sensibly afterwards. In fact the only thing which is important is our own stack pointer, but for paranoia reasons I save and restore our own FPU state as well, even though that's probably pointless. The complication on the above complication is, that for horrible reasons to do with signals, we may have to handle a second client system call whilst the client is blocked inside some other system call (unbelievable!). That means there's two sets of places to dump Valgrind's stack pointer and FPU state across the syscall, and we decide which to use by consulting VG_(syscall_depth), which is in turn maintained by VG_(wrap_syscall). 1.2.3. Introduction to UCode UCode lies at the heart of the x86-to-x86 JITter. The basic premise is that dealing with the x86 instruction set head-on is just too darn complicated, so we do the traditional compiler-writer's trick and translate it into a simpler, easier-to-deal-with form. In normal operation, translation proceeds through six stages, coordinated by VG_(translate): 1. Parsing of an x86 basic block into a sequence of UCode instructions (VG_(disBB)). 2. UCode optimisation (vg_improve), with the aim of caching simulated registers in real registers over multiple simulated instructions, and removing redundant simulated %EFLAGS saving/restoring. 3. UCode instrumentation (vg_instrument), which adds value and address checking code. 4. Post-instrumentation cleanup (vg_c
68. the Valgrind core. #include <stdio.h> #include "valgrind.h" int I_WRAP_SONAME_FNNAME_ZU(NONE,foo)( int x, int y ) { int result; OrigFn fn; VALGRIND_GET_ORIG_FN(fn); printf("foo's wrapper: args %d %d\n", x, y); CALL_FN_W_WW(result, fn, x,y); printf("foo's wrapper: result %d\n", result); return result; } To become active, the wrapper merely needs to be present in a text section somewhere in the same process' address space as the function it wraps, and for its ELF symbol name to be visible to Valgrind. In practice, this means either compiling to a .o and linking it in, or compiling to a .so and LD_PRELOADing it in. The latter is more convenient in that it doesn't require relinking. All wrappers have approximately the above form. There are three crucial macros: I_WRAP_SONAME_FNNAME_ZU: this generates the real name of the wrapper. This is an encoded name which Valgrind notices when reading symbol table information. What it says is: I am the wrapper for any function named foo which is found in an ELF shared object with an empty ("NONE") soname field. The specification mechanism is powerful in that wildcards are allowed for both sonames and function names. The fine details are discussed below. VALGRIND_GET_ORIG_FN: once in the wrapper, the first priority is to get hold of the address of the original (and any other supporting information needed). This is stored in a value of opaque type OrigFn. The in
69. the difference if you profile with --skip-plt=no. If a call is ignored, cost events happening will be attached to the enclosing function. If you have a recursive function, you can distinguish the first 10 recursion levels by specifying --fn-recursion10=funcprefix. Or for all functions with --fn-recursion=10, but this will give you much bigger profile data files. In the profile data, you will see the recursion levels of "func" as the different functions with names "func", "func'2", "func'3" and so on. If you have call chains "A > B > C" and "A > C > B" in your program, you usually get a "false" cycle "B <> C". Use --fn-caller2=B --fn-caller2=C, and functions "B" and "C" will be treated as different functions depending on the direct caller. Using the apostrophe for appending this "context" to the function name, you get "A > B'A > C'B" and "A > C'A > B'C", and there will be no cycle. Use --fn-caller=3 to get a 2-caller dependency for all functions. Note that doing this will increase the size of profile data files. 5.4. Command line option reference In the following, options are grouped into classes, in the same order as the output of callgrind --help. 5.4.1. Miscellaneous options --help Show summary of options. This is a short version of this manual section. --version Show version of callgrind. 5.4.2. Dump creation options These options influence the name
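The context names quoted above ("func'2" for a recursion level, "B'A" for B when called from A) follow a simple pattern. The helpers below only illustrate that naming pattern for the single-caller case; recursion_context_name and caller_context_name are hypothetical names, and nothing here is taken from Callgrind's sources.

```c
#include <stdio.h>

/* Name for a given recursion level: level 1 is plain "func",
   deeper levels become "func'2", "func'3" and so on. */
int recursion_context_name(char *buf, size_t size,
                           const char *fn, unsigned level)
{
    if (level <= 1)
        return snprintf(buf, size, "%s", fn);
    return snprintf(buf, size, "%s'%u", fn, level);
}

/* Name for a function separated by its direct caller, e.g. "B'A"
   for B when it is called from A. */
int caller_context_name(char *buf, size_t size,
                        const char *fn, const char *caller)
{
    return snprintf(buf, size, "%s'%s", fn, caller);
}
```

With the chains "A > B > C" and "A > C > B", B is named "B'A" in the first chain and "B'C" in the second, so the two occurrences are distinct nodes and the false cycle between B and C disappears.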
70. the movl $0x1,0xffffffec(%ebp) instruction covers the address range 0x8048f2b-0x804833 by itself, and attributes the counts for the mov %esi,%esi to it. Inlined functions can cause strange results in the function-by-function summary. If a function inline_me() is defined in foo.h and inlined in the functions f1(), f2() and f3() in bar.c, there will not be a foo.h:inline_me() function entry. Instead, there will be separate function entries for each inlining site, ie. foo.h:f1(), foo.h:f2() and foo.h:f3(). To find the total counts for foo.h:inline_me(), add up the counts from each entry. The reason for this is that although the debug info output by gcc indicates the switch from bar.c to foo.h, it doesn't indicate the name of the function in foo.h, so Valgrind keeps using the old one. Sometimes, the same filename might be represented with a relative name and with an absolute name in different parts of the debug info, eg: /home/user/proj/proj.h and ../proj.h. In this case, if you use auto-annotation, the file will be annotated twice with the counts split between the two. Files with more than 65,535 lines cause difficulties for the stabs debug info reader. This is because the line number in the struct nlist defined in a.out.h under Linux is only a 16-bit value. Valgrind can handle some files with more than 65,535 lines correctly by making some guesses to identify line number overflows. But some cases are beyond it, in which case you'll get
71. valgrind MPI wrappers 16386: Try MPIWRAP_DEBUG=help for possible options and then be relatively quiet. You can give a list of comma-separated options in MPIWRAP_DEBUG. These are: verbose: show entries/exits of all wrappers. Also show extra debugging info, such as the status of outstanding MPI_Requests resulting from uncompleted MPI_Irecvs. quiet: opposite of verbose, only print anything when the wrappers want to report a detected programming error, or in case of catastrophic failure of the wrappers. warn: by default, functions which lack proper wrappers are not commented on, just silently ignored. This causes a warning to be printed for each unwrapped function used, up to a maximum of three warnings per function. strict: print an error message and abort the program if a function lacking a wrapper is used. If you want to use Valgrind's XML output facility (--xml=yes), you should pass quiet in MPIWRAP_DEBUG so as to get rid of any extraneous printing from the wrappers. 2.16.4. Abilities and limitations 2.16.4.1. Functions All MPI2 functions except MPI_Wtick, MPI_Wtime and MPI_Pcontrol have wrappers. The first two are not wrapped because they return a double, and Valgrind's function-wrap mechanism cannot handle that (it could easily enough be extended to). MPI_Pcontrol cannot be wrapped as it has variable arity: int MPI_Pcontrol(const int level, ...) Most functi
72. value. In the discussions which follow, this bit is referred to as the V (valid-value) bit. Each byte in the system therefore has 8 V bits which follow it wherever it goes. For example, when the CPU loads a word-size item (4 bytes) from memory, it also loads the corresponding 32 V bits from a bitmap which stores the V bits for the process' entire address space. If the CPU should later write the whole or some part of that value to memory at a different address, the relevant V bits will be stored back in the V-bit bitmap. In short, each bit in the system has an associated V bit, which follows it around everywhere, even inside the CPU. Yes, all the CPU's registers (integer, floating point, vector and condition registers) have their own V bit vectors. Copying values around does not cause Memcheck to check for, or report on, errors. However, when a value is used in a way which might conceivably affect the outcome of your program's computation, the associated V bits are immediately checked. If any of these indicate that the value is undefined, an error is reported. Here's an (admittedly nonsensical) example: int i, j; int a[10], b[10]; for ( i = 0; i < 10; i++ ) { j = a[i]; b[i] = j; } Memcheck emits no complaints about this, since it merely copies uninitialised values from a[] into b[], and doesn't use them in any way. However, if the loop is changed to: for ( i = 0; i < 10
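The rule described here, that copying is silent while a use is checked, can be modelled in a few lines of C. This is only a toy sketch and not Memcheck's implementation: the ShadowWord type, the function names, the convention that a set V bit means "undefined", and the crude rule used for addition are all assumptions made for illustration.

```c
#include <stdint.h>

/* A 32-bit value paired with its 32 V bits. In this sketch a set V bit
 * means the corresponding value bit is undefined (an assumed convention). */
typedef struct {
    uint32_t value;
    uint32_t vbits;
} ShadowWord;

/* Copying moves the V bits along with the value and never reports. */
ShadowWord shadow_copy(ShadowWord src)
{
    return src;
}

/* Addition, using a deliberately crude rule chosen for this sketch:
 * the result is fully undefined if either input has any undefined bit. */
ShadowWord shadow_add(ShadowWord a, ShadowWord b)
{
    ShadowWord result;
    result.value = a.value + b.value;
    result.vbits = (a.vbits | b.vbits) ? 0xFFFFFFFFu : 0u;
    return result;
}

/* A use that could affect the program's outcome, such as a conditional
 * branch. Returns 1 if an error would be reported, 0 otherwise. */
int shadow_check_use(ShadowWord w)
{
    return w.vbits != 0;
}
```

Copying an undefined word any number of times with shadow_copy never triggers a report; only passing the result (or something computed from it) to shadow_check_use does.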
73. versions. A key indicator of this is if Memcheck says "All heap blocks were freed -- no leaks are possible" when you know your program calls malloc(). The workaround is to avoid statically linking your program. Why doesn't Memcheck find the array overruns in this program? int static[5]; int main(void) { int stack[5]; static[5] = 0; stack[5] = 0; return 0; } Unfortunately, Memcheck doesn't do bounds checking on static or stack arrays. We'd like to, but it's just not possible to do in a reasonable way that fits with how Memcheck works. Sorry. 6. Miscellaneous 6.1. I tried writing a suppression but it didn't work. Can you write my suppression for me? Yes! Use the --gen-suppressions=yes feature to spit out suppressions automatically for you. You can then edit them if you like, eg. combining similar automatically generated suppressions using wildcards like '*'. If you really want to write suppressions by hand, read the manual carefully. Note particularly that C++ function names must be mangled. 6.2. With Memcheck's memory leak detector, what's the difference between "definitely lost", "possibly lost", "still reachable", and "suppressed"? The details are in the Memcheck section of the user manual. In short: "definitely lost" means your program is leaking memory: fix it! "possibly lost" means your program is probably leaking memory, unless you're doing
74. well, but then having to push/pop it around special uses. %ebp points permanently at VG_(baseBlock). Valgrind's translations are position-independent, partly because this is convenient, but also because translations get moved around in TC as part of the LRUing activity. All static entities which need to be referred to from generated code, whether data or helper functions, are stored starting at VG_(baseBlock) and are therefore reached by indexing from %ebp. There is but one exception, which is that by placing the value VG_EBP_DISPATCH_CHECKED in %ebp just before a return to the dispatcher, the dispatcher is informed that the next address to run, in %eax, requires special treatment. The real machine's FPU state is pretty much unimportant, for reasons which will become obvious. Ditto its %eflags register. The state of the simulated CPU is stored in memory, in VG_(baseBlock), which is a block of 200 words IIRC. Recall that %ebp points permanently at the start of this block. Function vg_init_baseBlock decides what the offsets of various entities in VG_(baseBlock) are to be, and allocates word offsets for them. The code generator then emits %ebp-relative addresses to get at those things. The sequence in which entities are allocated has been carefully chosen so that the 32 most popular entities come first, because this means 8-bit offsets can be used in the generated code. If I was clever, I could make %ebp point 32 words along VG
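The figure of 32 entries follows from simple arithmetic: an x86 8-bit displacement is signed and covers -128 to 127 bytes, so with 4-byte words it reaches word offsets 0 to 31 from the base register. The sketch below checks this; the function name fits_in_disp8 and the bias_words parameter are hypothetical and not part of Valgrind. A nonzero bias models a base pointer placed some words into the block.

```c
#include <stdint.h>

enum { WORD_SIZE = 4 };  /* bytes per word on 32-bit x86 */

/* Returns 1 if word number `word` of the block can be addressed with a
 * signed 8-bit displacement from a base pointer that points
 * `bias_words` words past the start of the block, 0 otherwise. */
int fits_in_disp8(int word, int bias_words)
{
    int byte_offset = (word - bias_words) * WORD_SIZE;
    return byte_offset >= INT8_MIN && byte_offset <= INT8_MAX;
}
```

With a bias of 0, words 0 to 31 qualify. With a bias of 32, words 0 to 63 do, which is presumably the point of the remark about moving %ebp 32 words along.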
75. will appear to work, but fail sporadically. If your program does its own memory management, rather than using malloc/new/free/delete, it should still work, but Valgrind's error checking won't be so effective. If you describe your program's memory management scheme using "client requests" (see The Client Request mechanism), Memcheck can do better. Nevertheless, using malloc/new and free/delete is still the best approach. Valgrind's signal simulation is not as robust as it could be. Basic POSIX-compliant sigaction and sigprocmask functionality is supplied, but it's conceivable that things could go badly awry if you do weird things with signals. Workaround: don't. Programs that do non-POSIX signal tricks are in any case inherently unportable, so should be avoided if possible. Machine instructions, and system calls, have been implemented on demand. So it's possible, although unlikely, that a program will fall over with a message to that effect. If this happens, please report ALL the details printed out, so we can try and implement the missing feature. Memory consumption of your program is substantially increased whilst running under Valgrind. This is due to the large amount of administrative information maintained behind the scenes. Another cause is that Valgrind dynamically translates the original executable. Translated, instrumented code is 12-18 times larger than the original, so you can easily end up with 50 MB of translations when running (eg) a w
76. [Cachegrind summary output for process 31751, garbled in extraction: I1 and L2i miss rates; D refs, D1 misses and L2 misses, each split into rd and wr (19,098 wr for L2 misses); D1 and L2d miss rates; combined L2 misses and L2 miss rate.] Cache accesses for instruction fetches are summarised first, giving the number of fetches made (this is the number of instructions executed, which can be useful to know in its own right), the number of I1 misses, and the number of L2 instruction (L2i) misses. Cache accesses for data follow. The information is similar to that of the instruction fetches, except that the values are also shown split between reads and writes (note each row's rd and wr values add up to the row's total). Combined instruction and data figures for the L2 cache follow that. 4.2.1. Output file As well as printing summary information, Cachegrind also writes line-by-line cache profiling information to a file named cachegrind.out.pid. This file is human-readable, but is best interpreted by the accompanying program cg_annotate, described in the next section. Things to note about the cachegrind.out.pid file: It is written every time Cachegrind is run, and will overwrite any existing cachegrind.out.pid in the current directory (but that won't ha
77. 1.9.5 available to your users. You can regard the 1.0.X branch as obsolete: 1.9.5 is stable and vastly superior. There are no plans at all for further releases of the 1.0.X branch. If you want a leading-edge valgrind, consider building the cvs head (from SourceForge), or getting a snapshot of it. Current cool stuff going in includes MMX support (done); SSE/SSE2 support (in progress); a significant (10-20%) performance improvement (done); and the usual large collection of minor changes. Hopefully we will be able to improve our NPTL support, but no promises. 5. README Release notes for Valgrind. If you are building a binary package of Valgrind for distribution, please read README_PACKAGERS. It contains some important information. If you are developing Valgrind, please read README_DEVELOPERS. It contains some useful information. For instructions on how to build/install, see the end of this file. Valgrind works on most reasonably recent Linux setups. If you have problems, consult FAQ.txt to see if there are workarounds. Executive Summary Valgrind is an award-winning suite of tools for debugging and profiling Linux programs. With the tools that come with Valgrind, you can automatically detect many memory management and threading bugs, avoiding hours of frustrating bug-hunting and making your programs more stable. You can also perform detailed profiling, to speed up and reduce memory use of your programs. The Valgrind distribution curre
78. 184 isn't a link. We could re-run the program with a greater --depth value if we wanted more information. Sometimes you will get a code location like this: [entry garbled in extraction; it shows the code address 0xFFFFFFFF] The code address isn't really 0xFFFFFFFF; that's impossible. This is what Massif does when it can't work out what the real code address is. Massif produces this information in a plain text file by default, or HTML with the --format=html option. The plain text version obviously doesn't have the links, but a similar effect can be achieved by searching on the code addresses. (In Vim, the '*' and '#' searches are ideal for this.) 6.3.1. Accuracy The information should be pretty accurate. Some approximations made might cause some allocation contexts to be attributed with less memory than they actually allocated, but the amounts should be minuscule. The heap admin spacetime figure is an approximation, as described above. If anyone knows how to improve its accuracy, please let us know. 6.4. Massif Options Massif-specific options are: --heap=<yes|no> [default: yes] When enabled, profile heap usage in detail. Without it, the massif.pid.txt or massif.pid.html will be very short. --heap-admin=<number> [default: 8] The number of admin bytes per block to use. This can only be an estimate of the average, since it may vary. The allocator used by glibc requires somewhere between 4 and 15 bytes per block, depending on var
79. 20 700 As position specifications carry no information themselves, but only change the meaning of subsequent cost lines or associations, they can appear anywhere in the file without any negative consequence. In particular, you can define name compression mappings directly after the header, and before any cost lines. Thus, the above example can also be written as: events: Instructions # define file ID mapping fl=(1) file1.c fl=(2) file2.c # define function ID mapping fn=(1) main fn=(2) func1 fn=(3) func2 fl=(1) fn=(1) 16 20 3.1.6. Subposition Compression If a Calltree data file should hold costs for each assembler instruction of a program, you specify subposition "instr" in the "positions:" header line, and each cost line has to include the address of some instruction. Addresses are allowed to have a size of 64 bits to support 64-bit architectures. This motivates subposition compression: instead of every cost line starting with a 16-character-long address, one is allowed to specify relative subpositions. A relative subposition is always based on the corresponding subposition of the last cost line, and starts with a "+" to specify a positive difference, a "-" to specify a negative difference, or consists of "*" to specify the same subposition. Assume the following example (subpositions can always be specified as hexadecimal numbers, beginning with "0x"): positions: instr line even
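The decoding rule for relative subpositions can be sketched as a small C function. This is a hedged illustration rather than the parser used by any Callgrind tool: the name decode_subposition is hypothetical, and the sketch assumes the token has already been split off from the cost line.

```c
#include <stdint.h>
#include <stdlib.h>

/* Decode one subposition token, given the subposition of the previous
 * cost line. "+N" adds N, "-N" subtracts N, "*" repeats the previous
 * value, and anything else is read as an absolute number.
 * strtoull with base 0 accepts decimal and "0x"-prefixed hexadecimal
 * (note that it would also read a leading "0" as octal). */
uint64_t decode_subposition(const char *token, uint64_t previous)
{
    switch (token[0]) {
    case '+':
        return previous + strtoull(token + 1, NULL, 0);
    case '-':
        return previous - strtoull(token + 1, NULL, 0);
    case '*':
        return previous;
    default:
        return strtoull(token, NULL, 0);
    }
}
```

For instance, after an absolute address 0x80001234, the token "+3" decodes to 0x80001237 and a following "*" leaves that value unchanged.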
80. 32: crash syscall timer_settime 124499 amd64->IR: 0xF 0xE 0x48 0x85 (femms) 124528 FATAL: aspacem assertion failed: segment_is_sane 124697 vex x86->IR: 0xF 0x70 0xC9 0x0 (pshufw) 124892 vex x86->IR: 0xF3 0xAE (REPx SCASB) 126216 == 124892 124808 ppc32: sys_sched_getaffinity not handled n-i-bz Very long stabs strings crash m_debuginfo n-i-bz amd64->IR: 0x66 0xF 0xF5 (pmaddwd) 125492 ppc32: support a bunch more syscalls 121617 ppc32/64: coredumping gives assertion failure 121814 Coregrind return error as exitcode patch 126517 == 121814 125607 amd64->IR: 0x66 0xF 0xA3 0x2 (btw etc) 125651 amd64->IR: 0xF8 0x49 0xFF 0xE3 (clc) 126253 x86 movx is wrong 126451 3.2 SVN doesn't work on ppc32 CPU's without FPU 126217 increase threads 126243 vex x86->IR: popw mem 126583 amd64->IR: 0x48 0xF 0xA4 0xC2 (shld $1,%rax,%rdx) 126668 amd64->IR: 0x1C 0xFF (sbb $0xff,%al) 126696 support for CDROMREADRAW ioctl and CDROMREADTOCENTRY fix 126722 assertion: segment_is_sane at m_aspacemgr/aspacemgr.c:1624 126938 bad checking for syscalls linkat, renameat, symlinkat (3.2.0RC1: 27 May 2006, vex r1626, valgrind r5947). (3.2.0: 7 June 2006, vex r1628, valgrind r5957). Release 3.1.1 (15 March 2006) 3.1.1 fixes a bunch of bugs reported in 3.1.0. There is no new functionality. The fixed bugs are: (note: "n-i-bz" means "not in bugzilla"; this bug does not have a bugzilla entry). n-i-bz ppc
81. 32: fsub 3,3,3 in dispatcher doesn't clear NaNs n-i-bz ppc32: __NR_{set,get}priority 117332 x86: missing line info with icc 8.1 117366 amd64: 0xDD 0x7C (fnstsw) 118274 == 117366 117367 amd64: 0xD9 0xF4 (fxtract) 117369 amd64: __NR_getpriority (140) 117419 ppc32: lfsu f5 4 r11 117419 ppc32: fsqrt 117936 more stabs problems (segfaults while reading debug info) 119914 == 117936 120345 == 117936 118239 amd64: 0xF 0xAE 0x3F (clflush) 118939 vm86old system call n-i-bz memcheck/tests/mempool reads freed memory n-i-bz AshleyP's custom-allocator assertion n-i-bz Dirk strict-aliasing stuff n-i-bz More space for debugger cmd line (Dan Thaler) n-i-bz Clarified leak checker output message n-i-bz AshleyP's --gen-suppressions output fix n-i-bz cg_annotate's --sort option broken n-i-bz OSet 64-bit fastcmp bug n-i-bz VG_(getgroups) fix (Shinichi Noda) n-i-bz ppc32: allocate from callee-saved FP/VMX regs n-i-bz misaligned path word-size bug in mc_main.c 119297 Incorrect error message for sse code 120410 x86: prefetchw (0xF 0xD 0x48 0x4) 120728 TIOCSERGETLSR, TIOCGICOUNT, HDIO_GET_DMA ioctls 120658 Build fixes for gcc 2.96 120734 x86: Support for changing EIP in signal handler n-i-bz memcheck/tests/zeropage de-looping fix n-i-bz x86: fxtract doesn't work reliably 121662 x86: lock xadd (0xF0 0xF 0xC0 0x2) 121893 calloc does not always return zeroed memory 121901 no support for syscall tkill n-i-bz
82. 6 Translation into UCode; 1.2.7 UCode optimisation; 1.2.8 UCode instrumentation; 1.2.9 UCode post-instrumentation cleanup; 1.2.10 Translation from UCode; 1.2.11 Top-level dispatch loop; 1.2.12 Lazy updates of the simulated program counter; 1.2.13 Signals; 1.2.14 To be written; 1.3 Extensions; 1.3.1 Bugs; 1.3.2 Threads; 1.3.3 Verification suite; 1.3.4 Porting to other platforms; 1.4 Easy stuff which ought to be done; 1.4.1 MMX Instructions; 1.4.2 Fix stabs-info reader; 1.4.3 BT/BTC/BTS/BTR; 1.4.4 Using PREFETCH Instructions
83. [Annotated source listing, garbled in extraction: the function init_hash_table(char *file_name, ...) from the user-annotated file, with per-line counts for Ir, I1mr, I2mr, Dr, D1mr, D2mr, Dw, D1mw and D2mw. The body creates a Word_Info, clears table[i] for each i up to TABLE_SIZE, opens the file with fopen and exits with "Couldn't open" on failure, then inserts each word read by get_word and finally frees data and closes the file.] Although column widths are automatically minimised, a wide terminal is clearly useful. Each source file is clearly marked (User-annotated source) as having been chosen manually for annotation. If the file was found in one of the directories specified with the -I/--include option, the directory and file are both given. Each line is annotated with its event counts. Events not applicable for a line are represented by a '.'; this is useful for distinguishing between an event which cannot happen, and one which can but did not. Sometimes only a small section of a source file is executed. To minimise uninteresting output, Cachegrind only shows annotated lines and lines within a small distance of annotated lines. Gaps are marked with the line numb
84. AGS=-O2 LIBS=-lposix ./configure Or on systems that have the env program, you can do it like this: env CPPFLAGS=-I/usr/local/include LDFLAGS=-s ./configure Compiling For Multiple Architectures You can compile the package for more than one kind of computer at the same time, by placing the object files for each architecture in their own directory. To do this, you must use a version of make that supports the VPATH variable, such as GNU make. cd to the directory where you want the object files and executables to go and run the configure script. configure automatically checks for the source code in the directory that configure is in and in "..". If you have to use a make that does not support the VPATH variable, you have to compile the package for one architecture at a time in the source code directory. After you have installed the package for one architecture, use make distclean before reconfiguring for another architecture. Installation Names By default, make install will install the package's files in /usr/local/bin, /usr/local/man, etc. You can specify an installation prefix other than /usr/local by giving configure the option --prefix=PATH. You can specify separate installation prefixes for architecture-specific files and architecture-independent files. If you give configure the option --exec-prefix=PATH, the package will use PA
85. AM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion
86. Data Race Detector for Multithreaded Programs. Stefan Savage, Michael Burrows, Greg Nelson, Patrick Sobalvarro and Thomas Anderson. ACM Transactions on Computer Systems, 15(4):391-411, November 1997. We also incorporate significant improvements from this paper: Runtime Checking of Multithreaded Applications with Visual Threads. Jerry J. Harrow, Jr. Proceedings of the 7th International SPIN Workshop on Model Checking of Software, Stanford, California, USA, August 2000, LNCS 1885, pp331-342. K. Havelund, J. Penix, and W. Visser, editors. 7.2. What Helgrind Does Basically what Helgrind does is to look for memory locations which are accessed by more than one thread. For each such location, Helgrind records which of the program's (pthread_mutex_)locks were held by the accessing thread at the time of the access. The hope is to discover that there is indeed at least one lock which is used by all threads to protect that location. If no such lock can be found, then there is (apparently) no consistent locking strategy being applied for that location, and so a possible data race might result. Helgrind also allows for "thread segment lifetimes". If the execution of two threads cannot overlap (for example, if your main thread waits on another thread with a pthread_join() operation), they can both access the same variable without holding a lock. There's a lot of other sophistication in Helgrind, aimed at reducing the number of false reports, and at producing us
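The core of the check described above, finding a lock that is held on every access to a location, amounts to intersecting the sets of locks held at each access. The following is a minimal sketch of that idea only, not Helgrind's code: the LockSet bitmask, the Location type and both function names are hypothetical, and the sketch ignores the "more than one thread" condition and thread segment lifetimes.

```c
#include <stdint.h>

/* A set of locks as a bitmask: bit k set means lock number k is held. */
typedef uint32_t LockSet;
#define ALL_LOCKS ((LockSet)0xFFFFFFFFu)

/* Per-location state: the locks that have been held on every access so far. */
typedef struct {
    LockSet candidates;
} Location;

/* Before any access, every lock is still a candidate protector. */
void location_init(Location *loc)
{
    loc->candidates = ALL_LOCKS;
}

/* Record an access made while holding `held`. Returns 1 if no lock has
 * been held consistently, meaning a data race is possible; 0 otherwise. */
int location_access(Location *loc, LockSet held)
{
    loc->candidates &= held;
    return loc->candidates == 0;
}
```

If one thread accesses a location holding locks 0 and 1 and another holding only lock 1, lock 1 remains a candidate and nothing is reported; a third access holding only lock 0 empties the set.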
87. ESTVL qa (is the addr defined?) LOADV (ta), qloaded (fetch V bits for the addr) LOAD (ta), tloaded (do the original load) At the point where the LOADV is done, we know the actual address (ta) from which the real LOAD will be done. We also know that the LOADV will take around 20 x86 insns to do. So it seems plausible that doing a prefetch of ta just before the LOADV might just avoid a miss at the LOAD point, and that might be a significant performance win. Prefetch insns are notoriously temperamental, more often than not making things worse rather than better, so this would require considerable fiddling around. It's complicated because Intels and AMDs have different prefetch insns with different semantics, so that too needs to be taken into account. As a general rule, even placing the prefetches before the LOADV insn is too near the LOAD; the ideal distance is apparently circa 200 CPU cycles. So it might be worth having another analysis/transformation pass which pushes prefetches as far back as possible, hopefully immediately after the effective address becomes available. Doing too many prefetches is also bad because they soak up bus bandwidth / cpu resources, so some cleverness in deciding which loads to prefetch and which to not might be helpful. One can imagine not prefetching client-stack-relative (%EBP or %ESP) accesses, since the stack in general tends to show good locality anyway. There's quite a lot of experimentation to do here, bu
88. [Function-by-function cg_annotate output, garbled in extraction: per-function counts for entries including sysdeps/generic/lockfile.c, vg_clientmalloc.c:malloc, vg_clientmalloc.c:vg_trap_here_WRAPPER, tolower@@GLIBC_2.0, fgetc@@GLIBC_2.0 and vg_clientmalloc.c:vg_bogus_epilogue.] First up is a summary of the annotation options: I1 cache, D1 cache, L2 cache: cache configuration. So you know the configuration with which these results were obtained. Command: the command line invocation of the program under examination. Events recorded: event abbreviations are: Ir: I cache reads (ie. instructions executed); I1mr: I1 cache read misses; I2mr: L2 cache instruction read misses; Dr: D cache reads (ie. memory reads); D1mr: D1 cache read misses; D2mr: L2 cache data read misses; Dw: D cache writes (ie. memory writes); D1mw: D1 cache write misses; D2mw: L2 cache data write misses. Note that D1 total accesses is given
89. IR: fucomp 0xDD 0xE9 114196 vex x86->IR: out %eax,%dx (0xEF 0xC9 0xC3 0x90) 114289 Memcheck fails to intercept malloc when used in an uclibc environment 114756 mbind syscall support 114757 Valgrind dies with assertion: Assertion 'noLargerThan > 0' failed 114563 stack tracking module not informed when valgrind switches threads 114564 clone() and stacks 114565 == 114564 115496 glibc crashes trying to use sysinfo page 116200 enable fsetxattr, fgetxattr, and fremovexattr for amd64 (3.1.0RC1: 20 November 2005, vex r1466, valgrind r5224). (3.1.0: 26 November 2005, vex r1471, valgrind r5235). Release 3.0.1 (29 August 2005) 3.0.1 fixes a bunch of bugs reported in 3.0.0. There is no new functionality. Some of the fixed bugs are critical, so if you use/distribute 3.0.0, an upgrade to 3.0.1 is recommended. The fixed bugs are: (note: "n-i-bz" means "not in bugzilla"; this bug does not have a bugzilla entry). 109313 (== 110505) x86 cmpxchg8b n-i-bz x86: track but ignore changes to %eflags.AC (alignment check) 110102 dis_op2_E_G(amd64) 110202 x86 sys_waitpid(#286) 110203 clock_getres(,0) 110208 execve fail wrong retval 110274 SSE1 now mandatory for x86 110388 amd64 0xDD 0xD1 110464 amd64 0xDC 0x1D FCOMP 110478 amd64 0xF 0xD PREFETCH n-i-bz XML <unique> printing wrong n-i-bz Dirk r4359 (amd64 syscalls from trunk) 110591 amd64 and x86: rdtsc not implemented properly n-i-bz Nick r4384 (stub imple
90. [UCode listing, instructions 28 to 43, garbled in extraction: a run of TAG2o operations (ImproveAND1_TQ and DifD1) and MOVL copies on shadow registers q10 to q18, then ANDB, GETVFo q18, TESTVo q18, SETVo, Jnzo 0x40435A50 and JMPo 0x40435A5B.] 1.2.9 UCode post-instrumentation cleanup This pass, coordinated by vg_cleanup(), removes redundant definedness computation created by the simplistic instrumentation pass. It consists of two passes, vg_propagate_definedness() followed by vg_delete_redundant_SETVs. vg_propagate_definedness() is a simple constant-propagation and constant-folding pass. It tries to determine which TempRegs containing V bits will always indicate "fully defined", and it propagates this information as far as it can, and folds out as many operations as possible. For example, the instrumentation for an ADD of a literal to a variable quantity will be reduced down so that the definedness of the result is simply the definedness of the variable quantity, since the literal is by definition fully defined. vg_delete_redundant_SETVs removes SETVs on shadow TempRegs for which the next action is a write. I don't think there's anything else worth saying about this; it is simple. Read the source
91. LE(addr, len) VALGRIND_CHECK_INITIALISED(addr, len) I then include in my sources a header defining these macros, rebuild my app, run under Valgrind, and get user-defined checks. Now here's a neat trick. It's a nuisance to have to re-link the app with some new library which implements the above macros. So the idea is to define the macros so that the resulting executable is still completely stand-alone, and can be run without Valgrind, in which case the macros do nothing, but when run on Valgrind, the Right Thing happens. How to do this? The idea is for these macros to turn into a piece of inline assembly code, which (1) has no effect when run on the real CPU, (2) is easily spotted by Valgrind's JITter, and (3) no sane person would ever write, which is important for avoiding false matches in (2). So here's a suggestion: VALGRIND_MAKE_NOACCESS(addr, len) becomes (roughly speaking) movl addr, %eax movl len, %ebx movl $1, %ecx (1 describes the action; MAKE_WRITABLE might be 2, etc) rorl $13, %ecx rorl $19, %ecx rorl $11, %eax rorl $21, %eax The rotate sequences have no effect, and it's unlikely they would appear for any other reason, but they define a unique byte-sequence which the JITter can easily spot. Using the operand constraints section at the end of a gcc inline-assembly statement, we can tell gcc that the assembly fragment kills %eax, %ebx, %ecx and the condition codes, so this fragment is ma
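The claim that the rotate sequences have no effect can be checked in portable C: rotating a 32-bit register right twice, by amounts that sum to 32, returns the original value. This is a sketch under that assumption about the rotate amounts; the helper names rotate_right32 and apply_marker_rotations are hypothetical and only model the arithmetic, not the inline assembly itself.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits, for 0 < n < 32. */
uint32_t rotate_right32(uint32_t value, unsigned n)
{
    return (value >> n) | (value << (32u - n));
}

/* Apply two successive right rotations, as a marker sequence would.
 * When first + second == 32 the register ends up unchanged. */
uint32_t apply_marker_rotations(uint32_t value, unsigned first, unsigned second)
{
    return rotate_right32(rotate_right32(value, first), second);
}
```

Any pair of amounts summing to 32, such as 13 and 19 or 11 and 21, leaves the value intact, so the sequence is a no-op on a real CPU while still being a distinctive byte pattern.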
92. MPI_COMBINER_HVECTOR, MPI_COMBINER_INDEXED, MPI_COMBINER_HINDEXED and MPI_COMBINER_STRUCT. This should cover all MPI-1.1 types. The mechanism (function walk_type) should extend easily to cover MPI2 combiners. MPI defines some named structured types (MPI_FLOAT_INT, MPI_DOUBLE_INT, MPI_LONG_INT, MPI_2INT, MPI_SHORT_INT, MPI_LONG_DOUBLE_INT) which are pairs of some basic type and a C int. Unfortunately the MPI specification makes it impossible to look inside these types and see where the fields are. Therefore these wrappers assume the types are laid out as struct { float val; int loc; } (for MPI_FLOAT_INT), etc, and act accordingly. This appears to be correct at least for Open MPI 1.0.2 and for Quadrics MPI. If strict is an option specified in MPIWRAP_DEBUG, the application will abort if an unhandled type is encountered. Otherwise, the application will print a warning message and continue. Some effort is made to mark/check memory ranges corresponding to arrays of values in a single pass. This is important for performance since asking Valgrind to mark/check any range, no matter how small, carries quite a large constant cost. This optimisation is applied to arrays of primitive types (double, float, int, long, long long, short, char, and long double on platforms where sizeof(long double) == 8). For arrays of all other types, the wrappers handle each element individually and so there can
93. 6. More information The Valgrind Quick Start Guide 1. Introduction The Valgrind distribution has multiple tools. The most popular is the memory checking tool (called Memcheck) which can detect many common memory errors such as: touching memory you shouldn't (eg. overrunning heap block boundaries); using values before they have been initialized; incorrect freeing of memory, such as double-freeing heap blocks; memory leaks. What follows is the minimum information you need to start detecting memory errors in your program with Memcheck. Note that this guide applies to Valgrind version 2.4.0 and later; some of the information is not quite right for earlier versions. 2. Preparing your program Compile your program with -g to include debugging information so that Memcheck's error messages include exact line numbers. Using -O0 is also a good idea, if you can tolerate the slowdown. With -O1 line numbers in error messages can be inaccurate, although generally speaking Memchecking code compiled at -O1 works fairly well. Use of -O2 and above is not recommended as Memcheck occasionally reports uninitialised-value errors which don't really exist. 3. Running your program under Memcheck If you normally run your program like this: myprog arg1 arg2 Use this command line: valgrind --leak-check=yes myprog arg1 arg2 Memcheck is the default tool. The --leak-ch
94. ND_NON_SIMD_CALL[0123]: executes a function of 0, 1, 2 or 3 args in the client program on the real CPU, not the virtual CPU that Valgrind normally runs code on. These are used in various ways internally to Valgrind. They might be useful to client programs. Warning: Only use these if you really know what you are doing. VALGRIND_PRINTF(format, ...): printf a message to the log file when running under Valgrind. Nothing is output if not running under Valgrind. Returns the number of characters output. VALGRIND_PRINTF_BACKTRACE(format, ...): printf a message to the log file along with a stack backtrace when running under Valgrind. Nothing is output if not running under Valgrind. Returns the number of characters output. VALGRIND_STACK_REGISTER(start, end): Register a new stack. Informs Valgrind that the memory range between start and end is a unique stack. Returns a stack identifier that can be used with other VALGRIND_STACK_* calls. Valgrind will use this information to determine if a change to the stack pointer is an item pushed onto the stack or a change over to a new stack. Use this if you're using a user-level thread package and are noticing spurious errors from Valgrind about uninitialized memory reads. VALGRIND_STACK_DEREGISTER(id): Deregister a previously registered stack. Informs Valgrind that previously registered memory range with stack id id is no longer a stack. VALGRIND_STACK_CHANGE(id, start, end): Change a previously registered s
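The following C sketch shows one way the stack-registration requests above might be used around a privately allocated stack. It assumes valgrind.h is installed as <valgrind/valgrind.h>; where the header cannot be found, the sketch substitutes no-op stand-ins, which are an assumption of this example and not part of Valgrind. The function name run_on_private_stack is hypothetical, and the actual switch to the new stack is left out.

```c
#include <assert.h>
#include <stdlib.h>

#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#    define HAVE_VALGRIND_H 1
#  endif
#endif

#ifndef HAVE_VALGRIND_H
/* No-op stand-ins so the sketch builds without the header.
   These are an assumption of this example, not Valgrind code. */
#  define VALGRIND_PRINTF(...)                 0
#  define VALGRIND_STACK_REGISTER(start, end)  0u
#  define VALGRIND_STACK_DEREGISTER(id)        ((void)(id))
#endif

/* Allocates a private stack, tells Valgrind about it, and tears it down.
   Returns 0 on success, -1 if the allocation fails. */
int run_on_private_stack(size_t size)
{
    assert(size > 0);
    char *stack = malloc(size);
    if (stack == NULL)
        return -1;

    unsigned id = VALGRIND_STACK_REGISTER(stack, stack + size);
    (void)VALGRIND_PRINTF("registered private stack, id %u\n", id);

    /* A user-level thread package would switch to the new stack here. */

    VALGRIND_STACK_DEREGISTER(id);
    free(stack);
    return 0;
}
```

Outside Valgrind the requests do nothing, so the function behaves like a plain allocate-and-free.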
95. NX_, LINXY, PLAX_, PLAXY. GEN* for generic syscalls (in syswrap-generic.c), LIN* for linux specific ones (in syswrap-linux.c) and PLA* for the platform dependent ones (in syswrap-$(PLATFORM)-linux.c). The *XY variant if it requires a PRE() and POST() function, and the *X_ variant if it only requires a PRE() function. The 2nd arg of these macros indicates if the syscall could possibly block. If you find this difficult, read the wrappers for other syscalls for ideas. A good tip is to look for the wrapper for a syscall which has a similar behaviour to yours, and use it as a starting point. If you need structure definitions and/or constants for your syscall, copy them from the kernel headers into include/vki*.h and co., with the appropriate vki_*/VKI_* name mangling. Don't #include any kernel headers. And certainly don't #include any glibc headers. Test it. Note that a common error is to call POST_MEM_WRITE( ... ) with 0 (NULL) as the first (address) argument. This usually means your logic is slightly inadequate. It's a sufficiently common bug that there's a built-in check for it, and you'll get a "probably sanity check failure" for the syscall wrapper you just made, if this is the case. 4. Once happy, send us the patch. Pretty please. Writing your own ioctl wrappers. Is pretty much the same as writing syscall wrappers, except that all the action happens within PRE(ioctl) and POST(ioctl). T
96. 1. Introduction. 1.1. An Overview of Valgrind. Valgrind is a suite of simulation-based debugging and profiling tools for programs running on Linux (x86, amd64 and ppc32). The system consists of a core, which provides a synthetic CPU in software, and a series of tools, each of which performs some kind of debugging, profiling, or similar task. The architecture is modular, so that new tools can be created easily and without disturbing the existing structure. A number of useful tools are supplied as standard. In summary, these are: 1. Memcheck detects memory-management problems in your programs. All reads and writes of memory are checked, and calls to malloc/new/free/delete are intercepted. As a result, Memcheck can detect the following problems: use of uninitialised memory; reading/writing memory after it has been free'd; reading/writing off the end of malloc'd blocks; reading/writing inappropriate areas on the stack; memory leaks, where pointers to malloc'd blocks are lost forever; mismatched use of malloc/new/new [] vs free/delete/delete []; overlapping src and dst pointers in memcpy() and related functions. Problems like these can be difficult to find by other means, often lying undetected for long periods, then causing occasional, difficult-to-diagnose crashes.
97. PRELOAD=$prefix/lib/valgrind/<platform>/libmpiwrap.so mpirun [args] $prefix/bin/valgrind ./hello You should see something similar to the following: valgrind MPI wrappers 31901: Active for pid 31901; valgrind MPI wrappers 31901: Try MPIWRAP_DEBUG=help for possible options (repeated for every process in the group). If you do not see these, there is a build/installation problem of some kind. The MPI functions to be wrapped are assumed to be in an ELF shared object with soname matching libmpi.so*. This is known to be correct at least for Open MPI and Quadrics MPI, and can easily be changed if required. 2.16.2. Getting started. Compile your MPI application as usual, taking care to link it using the same mpicc that your Valgrind build was configured with. Use the following basic scheme to run your application on Valgrind with the wrappers engaged: MPIWRAP_DEBUG=[wrapper-args] LD_PRELOAD=$prefix/lib/valgrind/<platform>/libmpiwrap.so mpirun [mpirun-args] $prefix/bin/valgrind [valgrind-args] [application] [app-args] As an alternative to LD_PRELOADing libmpiwrap.so, you can simply link it to your application if desired. This should not disturb native behaviour of your application in any way. 2.16.3. Controlling the wrapper library. Environment variable MPIWRAP_DEBUG is consulted at startup. The default behaviour is to print a starting banner: valgrind MPI wrappers 16386: Active for pid 16386
98. PTL-only setups. Greater isolation between Valgrind and the program being run, so the program is less likely to inadvertently kill Valgrind by doing wild writes. Massif: a new space profiling tool. Try it! It's cool, and it'll tell you in detail where and when your C/C++ code is allocating heap. Draws pretty .ps pictures of memory use against time. A potentially powerful tool for making sense of your program's space use. File descriptor leakage checks. When enabled, Valgrind will print out a list of open file descriptors on exit. Improved SSE2/SSE3 support. Time-stamped output; use --time-stamp=yes. Stable release 2.2.0 (31 August 2004): CHANGES RELATIVE TO 2.1.2. 2.2.0 is not much different from 2.1.2, released seven weeks ago. A number of bugs have been fixed, most notably #85658, which gave problems for quite a few people. There have been many internal cleanups, but those are not user visible. The following bugs have been fixed since 2.1.2: 85658 Assert in coregrind/vg_libpthread.c:2326 (open64) != (void*)0 failed. This bug was reported multiple times, and so the following duplicates of it are also fixed: 87620, 85796, 85935, 86065, 86919, 86988, 87917, 88156. 80716 Semaphore mapping bug caused by unmap (sem_destroy) (was fixed prior to 2.1.2). 86987 semctl and shmctl syscalls family is not handled properly. 86696 valgrind 2.1.2 + RH AS2.1 + librt. 86730 valgrind locks up at end of run with assertio
99. Profiling Tools. Most widely known is the GCC profiling tool GProf: one needs to compile an application with the compiler option -pg. Running the program generates a file gmon.out, which can be transformed into human-readable form with the command-line tool gprof. A disadvantage here is the need to recompile everything, and also the need to statically link the executable. Another profiling tool is Cachegrind, part of Valgrind. It uses the processor emulation of Valgrind to run the executable, and catches all memory accesses, which are used to drive a cache simulator. The program does not need to be recompiled, it can use shared libraries and plugins, and the profile measurement doesn't influence the memory access behaviour. The trace includes the number of instruction/data memory accesses and 1st/2nd level cache misses, and relates it to source lines and functions of the run program. A disadvantage is the slowdown involved in the processor emulation, around 50 times slower. Cachegrind can only deliver a flat profile. There is no call relationship among the functions of an application stored. Thus, inclusive costs, i.e. costs of a function including the cost of all functions called from there, cannot be calculated. Callgrind extends Cachegrind by including call relationship and exact event counts spent while doing a call. Because Callgrind and Cachegrind are based on simulation, the slowdown due to p
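The difference between a flat profile and inclusive costs can be made concrete with a toy call tree. The C sketch below is only an illustration of the definition given above: the fn_node type, the MAX_CALLEES limit and the inclusive_cost function are invented here and say nothing about how Callgrind stores its data.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLEES 4

/* One node of a toy call tree; a flat profile only knows self_cost. */
struct fn_node {
    const char *name;
    unsigned long self_cost;                  /* events spent in the function body */
    size_t n_callees;
    const struct fn_node *callees[MAX_CALLEES];
};

/* Inclusive cost = own cost plus the inclusive cost of everything called.
   Assumes the call structure is a tree (no recursion or cycles). */
unsigned long inclusive_cost(const struct fn_node *fn)
{
    assert(fn != NULL && fn->n_callees <= MAX_CALLEES);
    unsigned long total = fn->self_cost;
    for (size_t i = 0; i < fn->n_callees; i++)
        total += inclusive_cost(fn->callees[i]);
    return total;
}
```

For a caller with self cost 20 that calls two leaves costing 30 and 50, the inclusive cost is 100, a figure that cannot be derived from the three self costs without the call relationship.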
100. READABLE --> MAKE_MEM_DEFINED; CHECK_WRITABLE --> CHECK_MEM_IS_ADDRESSABLE; CHECK_READABLE --> CHECK_MEM_IS_DEFINED; CHECK_DEFINED --> CHECK_VALUE_IS_DEFINED. The reason for the change is that the old names are subtly misleading. The old names will still work, but they are deprecated and may be removed in a future release. We also added a new client request, MAKE_MEM_DEFINED_IF_ADDRESSABLE(a, len), which is like MAKE_MEM_DEFINED but only affects a byte if the byte is already addressable. The way client requests are encoded in the instruction stream has changed. Unfortunately, this means 3.2.0 will not honour client requests compiled into binaries using headers from earlier versions of Valgrind. We will try to keep the client request encodings more stable in future. BUGS FIXED: 108258 NPTL pthread cleanup handlers not called; 117290 valgrind is sigKILL'd on startup; 117295 == 117290; 118703 m_signals.c:1427 Assertion 'tst->status == VgTs_WaitSys'; 118466 add %reg, %reg generates incorrect validity for bit 0; 123210 New: strlen from ld-linux on amd64; 123244 DWARF2 CFI reader: unhandled CFI instruction 0:18; 123248 syscalls in glibc-2.4: openat, fstatat, symlinkat; 123258 socketcall.recvmsg(msg.msg_iov[i] points to uninit; 123535 mremap(new_addr) requires MREMAP_FIXED in 4th arg; 123836 small typo in the doc; 124029 ppc compile failed: vor gcc 3.3.5; 124222 Segfault 9 don't know what type is; 124475 ppc
101. RS for details on running Valgrind under Valgrind. To do simple tick-based profiling of a tool, include the line #include "vg_profile.c" in the tool somewhere, and rebuild (you may have to make clean first). Then run Valgrind with the --profile=yes option. The profiler is stack-based; you can register a profiling event with VG_(register_profile_event)() and then use the VGP_PUSHCC and VGP_POPCC macros to record time spent doing certain things. New profiling event numbers must not overlap with the core profiling event numbers. See include/pub_tool_profile.h for details and Memcheck for an example. 4.3.5. Other Makefile Hackery. If you add any directories under valgrind/foobar/, you will need to add an appropriate Makefile.am to it, and add a corresponding entry to the AC_OUTPUT list in valgrind/configure.in. If you add any scripts to your tool (see Cachegrind for an example) you need to add them to the bin_SCRIPTS variable in valgrind/foobar/Makefile.am. 4.3.6. Core/tool Interface Versions. In order to allow for the core/tool interface to evolve over time, Valgrind uses a basic interface versioning system. All a tool has to do is use the VG_DETERMINE_INTERFACE_VERSION macro exactly once in its code. If not, a link error will occur when the tool is built. The interface version number has the form X.Y. Changes in Y indicate binary compatible changes. Changes in X indicate
102. TH as the prefix for installing programs and libraries. Documentation and other data files will still use the regular prefix. In addition, if you use an unusual directory layout you can give options like --bindir=PATH to specify different values for particular kinds of files. Run configure --help for a list of the directories you can set and what kinds of files go in them. If the package supports it, you can cause programs to be installed with an extra prefix or suffix on their names by giving configure the option --program-prefix=PREFIX or --program-suffix=SUFFIX. Optional Features. Some packages pay attention to --enable-FEATURE options to configure, where FEATURE indicates an optional part of the package. They may also pay attention to --with-PACKAGE options, where PACKAGE is something like gnu-as or x (for the X Window System). The README should mention any --enable- and --with- options that the package recognizes. For packages that use the X Window System, configure can usually find the X include and library files automatically, but if it doesn't, you can use the configure options --x-includes=DIR and --x-libraries=DIR to specify their locations. Specifying the System Type. There may be some features configure can not figure out automatically, but needs to determine by the type of host the package will run on. Usu
103. Valgrind Documentation. Release 3.2.0, 7 June 2006. Copyright 2000-2006 AUTHORS. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in the section entitled The GNU Free Documentation License. Valgrind Documentation, Table of Contents: The Valgrind Quick Start Guide; Valgrind User Manual; Valgrind FAQ; Valgrind Technical Documentation; Valgrind Distribution Documents; GNU Licenses. The Valgrind Quick Start Guide. Release 3.2.0, 7 June 2006. Copyright 2000-2006 Valgrind Developers. Email: valgrind@valgrind.org. The Valgrind Quick Start Guide, Table of Contents: 1. Introduction; 2. Preparing your program; 3. Running your program under Memcheck; 4. Interpreting Memcheck's output; 5. Caveats
104. X and 4.X. An important fact about demangling is that function names mentioned in suppressions files should be in their mangled form. Valgrind does not demangle function names when searching for applicable suppressions, because to do otherwise would make suppressions file contents dependent on the state of Valgrind's demangling machinery, and would also be slow and pointless. --num-callers=<number> [default: 12]: By default, Valgrind shows twelve levels of function call names to help you identify program locations. You can change that number with this option. This can help in determining the program's location in deeply-nested call chains. Note that errors are commoned up using only the top four function locations (the place in the current function, and that of its three immediate callers). So this doesn't affect the total number of errors reported. The maximum value for this is 50. Note that higher settings will make Valgrind run a bit more slowly and take a bit more memory, but can be useful when working with programs with deeply-nested call chains. --error-limit=<yes|no> [default: yes]: When enabled, Valgrind stops reporting errors after 10,000,000 in total, or 1,000 different ones, have been seen. This is to stop the error tracking machinery from becoming a huge performance overhead in programs with many errors. --error-exitcode=<number> [default: 0]: Specifies an alternative exit code to return if Valgrind reported any erro
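The remark that errors are commoned up using only the top four function locations can be sketched as a comparison function. The C code below is a simplified model under stated assumptions: the trace type and same_error_location are invented names, frames are reduced to plain addresses, and the comparison ignores everything except location.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define COMPARE_DEPTH 4   /* the current function plus its three immediate callers */

/* A recorded stack trace, innermost frame first. Illustrative type only. */
struct trace {
    size_t depth;
    const unsigned long *frames;
};

/* Two traces count as the same location if their top COMPARE_DEPTH frames match.
   Frames deeper than that never influence the result. */
bool same_error_location(const struct trace *a, const struct trace *b)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < COMPARE_DEPTH; i++) {
        bool a_has = i < a->depth;
        bool b_has = i < b->depth;
        if (a_has != b_has)
            return false;            /* one trace ends earlier than the other */
        if (!a_has)
            break;                   /* both ended: everything so far matched */
        if (a->frames[i] != b->frames[i])
            return false;
    }
    return true;
}
```

In this model, raising the number of frames shown changes how much of each trace is printed but not which traces are treated as duplicates.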
105. _(baseBlock) so that I'd have another 32 words of short-form offsets available, but that's just complicated, and it's not important: the first 32 words take 99% (or whatever) of the traffic. Currently, the sequence of stuff in VG_(baseBlock) is as follows: 9 words, holding the simulated integer registers, %EAX .. %EDI, and the simulated flags, %EFLAGS. Another 9 words, holding the V bit "shadows" for the above 9 regs. The addresses of various helper routines called from generated code: VG_(helper_value_check4_fail), VG_(helper_value_check0_fail), which register V-check failures; VG_(helperc_STOREV4), VG_(helperc_STOREV1), VG_(helperc_LOADV4), VG_(helperc_LOADV1), which do stores and loads of V bits to/from the sparse array which keeps track of V bits in memory; and VGM_(handle_esp_assignment), which messes with memory addressability resulting from changes in %ESP. The simulated %EIP. 24 spill words, for when the register allocator can't make it work with 5 measly registers. Addresses of helpers VG_(helperc_STOREV2), VG_(helperc_LOADV2). These are here because 2-byte loads and stores are relatively rare, so are placed above the magic 32-word offset boundary. For similar reasons, addresses of helper functions VGM_(fpu_write_check) and VGM_(fpu_read_check), which handle the A/V maps testing and changes required by FPU writes
106. _exec, you could do like this in GDB: (gdb) b vgPlain_do_exec (5) Run the tool with required options: (gdb) run pwd Self-hosting. To run Valgrind under Valgrind: (1) Check out 2 trees, "inner" and "outer". inner runs the app directly and is what you will be profiling. outer does the profiling. (2) Configure inner with --enable-inner and build/install as usual. (3) Configure outer normally and build/install as usual. (4) Choose a very simple program (date) and try outer/.../bin/valgrind --sim-hints=enable-outer --trace-children=yes --tool=cachegrind -v inner/.../bin/valgrind --tool=none -v prog If you omit the --trace-children=yes, you'll only monitor inner's launcher program, not its stage2. The whole thing is fragile, confusing and slow, but it does work well enough for you to get some useful performance data. The inner Valgrind has most of its output (i.e. those lines beginning with "==<pid>==") prefixed with a '>', which helps a lot. At the time of writing the allocator is not annotated with client requests, so Memcheck is not as useful as it could be. It also has not been tested much, so don't be surprised if you hit problems. When using self-hosting with an outer callgrind tool, use --pop-on-jump (on the outer). Otherwise, callgrind has much higher memory requirements. Printing out problematic blocks. If you want to print out a disassembly of a particular block that causes
107. ains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law
108. alled by Valgrind's core. They are then linked against the coregrind library (libcoregrind.a) that valgrind provides as well as the VEX library (libvex.a) that also comes with valgrind and provides the JIT engine. Each tool is linked as a statically linked program and placed in the valgrind library directory, from where valgrind will load it automatically when the --tool option is used to select it. 4.2.4. Getting the code. To write your own tool, you'll need the Valgrind source code. A normal source distribution should do, although you might want to check out the latest code from the Subversion repository. See the information about how to do so at the Valgrind website. 4.2.5. Getting started. Valgrind uses GNU automake and autoconf for the creation of Makefiles and configuration. But don't worry, these instructions should be enough to get you started even if you know nothing about those tools. In what follows, all filenames are relative to Valgrind's top-level directory valgrind/. 1. Choose a name for the tool, and an abbreviation that can be used as a short prefix. We'll use foobar and fb as an example. 2. Make a new directory foobar/ which will hold the tool. 3. Copy none/Makefile.am into foobar/. Edit it by replacing all occurrences of the string "none" with "foobar" and the one occurrence of the string "nl_" with "fb_". It might be worth trying to understand this file, at
109. ally configure can figure that out, but if it prints a message saying it can not guess the host type, give it the --host=TYPE option. TYPE can either be a short name for the system type, such as sun4, or a canonical name with three fields: CPU-COMPANY-SYSTEM. See the file config.sub for the possible values of each field. If config.sub isn't included in this package, then this package doesn't need to know the host type. If you are building compiler tools for cross-compiling, you can also use the --target=TYPE option to select the type of system they will produce code for and the --build=TYPE option to select the type of system on which you are compiling the package. Sharing Defaults. If you want to set default values for configure scripts to share, you can create a site shell script called config.site that gives default values for variables like CC, cache_file, and prefix. configure looks for PREFIX/share/config.site if it exists, then PREFIX/etc/config.site if it exists. Or, you can set the CONFIG_SITE environment variable to the location of the site script. A warning: not all configure scripts look for a site script. Operation Controls. configure recognizes the following options to control how it operates. --cache-file=FILE: Use and save the results of the tests in FILE instead of ./config.cache. Set FILE to /dev/nu
110. already a slow tool. So the best solution is to turn off optimisation altogether. Since this often makes things unmanageably slow, a plausible compromise is to use -O. This gets you the majority of the benefits of higher optimisation levels whilst keeping relatively small the chances of false complaints from Memcheck. All other tools (as far as we know) are unaffected by optimisation level. Valgrind understands both the older "stabs" debugging format, used by gcc versions prior to 3.1, and the newer DWARF format used by gcc 3.1 and later. We continue to refine and debug our debug-info readers, although the majority of effort will naturally enough go into the newer DWARF2 reader. When you're ready to roll, just run your application as you would normally, but place valgrind --tool=tool_name in front of your usual command-line invocation. Note that you should run the real (machine-code) executable here. If your application is started by, for example, a shell or perl script, you'll need to modify it to invoke Valgrind on the real executables. Running such scripts directly under Valgrind will result in you getting error reports pertaining to /bin/sh, /usr/bin/perl, or whatever interpreter you're using. This may not be what you want and can be confusing. You can force the issue by giving the flag --trace-children=yes, but confusion is still likely. 2.3. The Commentary. Valgrind tools write a commentary, a stream of text, detailing error reports and other si
111. and format of the profile data files. --base=<prefix> [default: callgrind.out]: Specify the base name for the dump file names. To distinguish different profile runs of the same application, .<pid> is appended to the base dump file name, with <pid> being the process ID of the profile run (with multiple dumps happening, the file name is modified further; see below). This option is especially useful if your application changes its working directory. Usually, the dump file is generated in the current working directory of the application at program termination. By giving an absolute path with the base specification, you can force a fixed directory for the dump files. --dump-instr=<no|yes> [default: no]: This specifies that event counting should be performed at per-instruction granularity. This allows for assembler code annotation, but currently the results can only be shown with KCachegrind. --dump-line=<no|yes> [default: yes]: This specifies that event counting should be performed at source line granularity. This allows source annotation for sources which are compiled with debug information (-g). --compress-strings=<no|yes> [default: yes]: This option influences the output format of the profile data. It specifies whether strings (file and function names) should be identified by numbers. This shrinks the file size, but makes it more difficult for humans to read (which is not recommended eithe
112. aning of this License. 2. VERBATIM COPYING. You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3. You may also lend copies, under the same conditions stated above, and you may publicly display copies. 3. COPYING IN QUANTITY. If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of th
113. apper and replacement functions. This gives two limitations: firstly, longjumping out of wrappers will rapidly lead to disaster, since the shadow stack will not get correctly cleared. Secondly, since the shadow stack has finite size, recursion between wrapper/replacement functions is only possible to a limited depth, beyond which Valgrind has to abort the run. This depth is currently 16 calls. For all platforms (x86, amd64, ppc32, ppc64 linux) all the above comments apply on a per-thread basis. In other words, wrapping is thread-safe: each thread must individually observe the above restrictions, but there is no need for any kind of inter-thread cooperation. 2.10.6. Limitations: original function signatures. As shown in the above example, to call the original you must use a macro of the form CALL_FN_*. For technical reasons it is impossible to create a single macro to deal with all argument types and numbers, so a family of macros covering the most common cases is supplied. In what follows, 'W' denotes a machine-word-typed value (a pointer or a C long), and 'v' denotes C's void type. The currently available macros are: CALL_FN_v_v: call an original of type void fn(void); CALL_FN_W_v: call an original of type long fn(void); CALL_FN_v_W: void fn(long); CALL_FN_W_W: long fn(long)
114. ar in the process' address space, so the A bits must be updated if mmap succeeds. Optionally, your program can tell Valgrind about such changes explicitly, using the client request mechanism described above. 3.5.3. Putting it all together. Memcheck's checking machinery can be summarised as follows: Each byte in memory has 8 associated V (valid-value) bits, saying whether or not the byte has a defined value, and a single A (valid-address) bit, saying whether or not the program currently has the right to read/write that address. When memory is read or written, the relevant A bits are consulted. If they indicate an invalid address, Valgrind emits an Invalid read or Invalid write error. When memory is read into the CPU's registers, the relevant V bits are fetched from memory and stored in the simulated CPU. They are not consulted. When a register is written out to memory, the V bits for that register are written back to memory too. When values in CPU registers are used to generate a memory address, or to determine the outcome of a conditional branch, the V bits for those values are checked, and an error emitted if any of them are undefined. When values in CPU registers are used for any other purpose, Valgrind computes the V bits for the result, but does not check them. Once the V bits for a value in the CPU have been checked, they are then set to indicate validity. This a
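The rules summarised above can be modelled in a few lines of C. This is a toy sketch, not Memcheck's implementation: the toy_shadow type, the 64-byte memory size and both functions are invented, and the choice that a set V bit means "defined" is an assumption made purely for readability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MEM_SIZE 64

/* Toy shadow memory: one A bit and eight V bits per byte.
   Here a V bit of 1 means "defined"; that encoding is this sketch's choice. */
struct toy_shadow {
    bool    a[TOY_MEM_SIZE];   /* may the program access this byte? */
    uint8_t v[TOY_MEM_SIZE];   /* per-bit definedness of the byte */
};

enum toy_result { TOY_OK, TOY_INVALID_ADDRESS, TOY_UNDEFINED_VALUE };

/* Load one byte into a "register": the A bit is checked,
   the V bits are only copied along with the value. */
enum toy_result toy_load(const struct toy_shadow *s, size_t addr, uint8_t *reg_v)
{
    assert(s != NULL && reg_v != NULL);
    if (addr >= TOY_MEM_SIZE || !s->a[addr])
        return TOY_INVALID_ADDRESS;
    *reg_v = s->v[addr];
    return TOY_OK;
}

/* Using a register to decide a branch (or to form an address) checks its V bits.
   After a report the bits are marked valid so the same value is not reported again. */
enum toy_result toy_branch_on(uint8_t *reg_v)
{
    assert(reg_v != NULL);
    if (*reg_v != 0xFF) {
        *reg_v = 0xFF;
        return TOY_UNDEFINED_VALUE;
    }
    return TOY_OK;
}
```

Loading an allocated but uninitialised byte succeeds silently in this model; the complaint only appears when the loaded value is used to decide a branch, and only once.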
115. ation and unstripped (best): Invalid write of size 1 at 0x80483BF: really (malloc1.c:20) by 0x8048370: main (malloc1.c:9). With no debug information, unstripped: Invalid write of size 1 at 0x80483BF: really (in /auto/homes/njn25/grind/head5/a.out) by 0x8048370: main (in /auto/homes/njn25/grind/head5/a.out). With no debug information, stripped: Invalid write of size 1 at 0x80483BF: (within /auto/homes/njn25/grind/head5/a.out) by 0x8048370: (within /auto/homes/njn25/grind/head5/a.out) by 0x42015703: __libc_start_main (in /lib/tls/libc-2.3.2.so) by 0x80482CC: (within /auto/homes/njn25/grind/head5/a.out). With debug information and -fomit-frame-pointer: Invalid write of size 1 at 0x80483C4: really (malloc1.c:20) by 0x42015703: __libc_start_main (in /lib/tls/libc-2.3.2.so) by 0x80482CC: ??? (start.S:81). A leak error message involving an unloaded shared object: 84 bytes in 1 blocks are possibly lost in loss record 488 of 713 at 0x1B9036DA: operator new(unsigned) (vg_replace_malloc.c:132) by 0x1DB63EEB: ??? by 0x1DB4B800: ??? by 0x1D65E007: ??? by 0x8049EE6: main (main.cpp:24). 4.3. The stack traces given by Memcheck (or another tool) seem to have the wrong function name in them. What's happening? Occasionally Valgrind stack traces get the wrong function names. This is caused by glibc using aliases to effectively give one function two names. Most of the time Val
116. ations. Now pay attention. I shall say this only once, and it is important you understand this. In what follows I will refer to registers in the host (real) cpu using their standard names, %eax, %edi, etc. I refer to registers in the simulated CPU by capitalising them: %EAX, %EDI, etc. These two sets of registers usually bear no direct relationship to each other; there is no fixed mapping between them. This naming scheme is used fairly consistently in the comments in the sources. Host registers, once things are up and running, are used as follows: %esp, the real stack pointer, points somewhere in Valgrind's private stack area, VG_(stack), or, transiently, into its signal delivery stack, VG_(sigstack). %edi is used as a temporary in code generation; it is almost always dead, except when used for the Left value-tag operations. %eax, %ebx, %ecx, %edx and %esi are available to Valgrind's register allocator. They are dead (carry unimportant values) in between translations, and are live only in translations. The one exception to this is %eax, which, as mentioned far above, has a special significance to the dispatch loop VG_(dispatch): when a translation returns to the dispatch loop, %eax is expected to contain the original-code address of the next translation to run. The register allocator is so good at minimising spill code that using five regs and not having to save/restore %edi actually gives better code than allocating to %edi as
117. ause Cachegrind doesn't report errors. VALGRIND_MALLOCLIKE_BLOCK: If your program manages its own memory instead of using the standard malloc()/new/new[], tools that track information about heap blocks will not do nearly as good a job. For example, Memcheck won't detect nearly as many errors, and the error messages won't be as informative. To improve this situation, use this macro just after your custom allocator allocates some new memory. See the comments in valgrind.h for information on how to use it. VALGRIND_FREELIKE_BLOCK: This should be used in conjunction with VALGRIND_MALLOCLIKE_BLOCK. Again, see memcheck/memcheck.h for information on how to use it. VALGRIND_CREATE_MEMPOOL: This is similar to VALGRIND_MALLOCLIKE_BLOCK, but is tailored towards code that uses memory pools. See the comments in valgrind.h for information on how to use it. VALGRIND_DESTROY_MEMPOOL: This should be used in conjunction with VALGRIND_CREATE_MEMPOOL. Again, see the comments in valgrind.h for information on how to use it. VALGRIND_MEMPOOL_ALLOC: This should be used in conjunction with VALGRIND_CREATE_MEMPOOL. Again, see the comments in valgrind.h for information on how to use it. VALGRIND_MEMPOOL_FREE: This should be used in conjunction with VALGRIND_CREATE_MEMPOOL. Again, see the comments in valgrind.h for information on how to use it. VALGRI
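As a hedged illustration of where the two block macros above would sit in a custom allocator, the C sketch below wraps a trivial bump allocator. The allocator itself (arena, arena_alloc, arena_free) is hypothetical. The sketch assumes valgrind.h is available as <valgrind/valgrind.h> and that the macros take the arguments (addr, sizeB, rzB, is_zeroed) and (addr, rzB); where the header cannot be found it falls back to no-op stand-ins that are not part of Valgrind.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#    define HAVE_VALGRIND_H 1
#  endif
#endif

#ifndef HAVE_VALGRIND_H
/* No-op stand-ins so the sketch builds without the header (an assumption). */
#  define VALGRIND_MALLOCLIKE_BLOCK(addr, sizeB, rzB, is_zeroed) ((void)0)
#  define VALGRIND_FREELIKE_BLOCK(addr, rzB)                     ((void)0)
#endif

#define ARENA_SIZE 1024

/* Hypothetical bump allocator: hands out 16-byte aligned slices of one array. */
static _Alignas(16) unsigned char arena[ARENA_SIZE];
static size_t arena_used;

void *arena_alloc(size_t size)
{
    size_t aligned = (size + 15u) & ~(size_t)15u;
    if (size == 0 || aligned < size || aligned > ARENA_SIZE - arena_used)
        return NULL;
    void *p = &arena[arena_used];
    arena_used += aligned;
    /* Announce the block just after the allocator has carved it out:
       no red zones (0), contents not zeroed (0). */
    VALGRIND_MALLOCLIKE_BLOCK(p, size, 0, 0);
    return p;
}

void arena_free(void *p)
{
    assert(p != NULL);
    /* The bump allocator never reuses memory, so only the tool is notified. */
    VALGRIND_FREELIKE_BLOCK(p, 0);
}
```

With the requests in place, a tool that tracks heap blocks can treat each slice as a separate block instead of seeing one large static array.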
118. aution: If you don't understand what this option does then you almost certainly don't need it. Currently known variants are: bproc: Support the sys_broc system call on x86. This is for running on BProc, which is a minor variant of standard Linux which is sometimes used for building clusters. --show-emwarns=yes|no [default: no]: When enabled, Valgrind will emit warnings about its CPU emulation in certain cases. These are usually not interesting. --smc-check=none|stack|all [default: stack]: This option controls Valgrind's detection of self-modifying code. Valgrind can do no detection, detect self-modifying code on the stack, or detect self-modifying code anywhere. Note that the default option will catch the vast majority of cases, as far as we know. Running with all will slow Valgrind down greatly, but running with none will rarely speed things up, since very little code gets put on the stack for most programs. 2.6.6 Debugging Valgrind Options. There are also some options for debugging Valgrind itself. You shouldn't need to use them in the normal run of things. If you wish to see the list, use the --help-debug option. 2.6.7 Setting default Options. Note that Valgrind also reads options from three places: 1. The file ~/.valgrindrc 2. The environment variable $VALGRIND_OPTS 3. The file ./.valgrindrc. These are processed in the given order, before the command-line options. Opt
119. ay. VG_(tt_fast) is used as a direct-map cache for fast lookups in TT; it usually achieves a hit rate of around 98% and facilitates an orig-to-trans lookup in 4 x86 insns, which is not bad. Function VG_(dispatch) in vg_dispatch.S is the heart of the JIT dispatcher. Once a translated code address has been found, it is executed simply by an x86 call to the translation. At the end of the translation, the next original code addr is loaded into %eax, and the translation then does a ret, taking it back to the dispatch loop, with, interestingly, zero branch mispredictions. The address requested in %eax is looked up first in VG_(tt_fast), and, if not found, by calling C helper VG_(search_transtab). If there is still no translation available, VG_(dispatch) exits back to the top-level C dispatcher VG_(toploop), which arranges for VG_(translate) to make a new translation. All fairly unsurprising, really. There are various complexities described below. The translator, orchestrated by VG_(translate), is complicated but entirely self-contained. It is described in great detail in subsequent sections. Translations are stored in TC, with TT tracking administrative information. The translations are subject to an approximate LRU-based management scheme. With the current settings, the TC can hold at most about 15MB of translations, and LRU passes prune it to about 13.5MB. Given that the orig-to-translation expansion ratio is about 13:1 to 14:1, this means TC holds transla
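The two-level lookup described in entry 119 (a direct-mapped cache in front of a slower table search) can be sketched in C as follows. This is an illustration of the general technique only, not Valgrind's code: the table sizes, the linear slow search, and the names add_translation, search_slow and lookup are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAST_SIZE 1024u   /* hypothetical; must be a power of two */
#define SLOW_MAX  4096u   /* hypothetical capacity of the slow table */

typedef struct { uintptr_t orig; void *trans; } Entry;

static Entry fast[FAST_SIZE];   /* direct-mapped: one slot per index, no chaining */
static Entry slow[SLOW_MAX];    /* stand-in for the full translation table */
static size_t slow_used = 0;
static unsigned long fast_hits = 0, fast_misses = 0;

/* Record a translation in the slow table only. */
void add_translation(uintptr_t orig, void *trans)
{
    assert(trans != NULL && slow_used < SLOW_MAX);
    slow[slow_used].orig = orig;
    slow[slow_used].trans = trans;
    slow_used++;
}

/* Slow path: linear search (a real table would hash). */
static void *search_slow(uintptr_t orig)
{
    for (size_t i = 0; i < slow_used; i++)
        if (slow[i].orig == orig)
            return slow[i].trans;
    return NULL;
}

/* Fast path first; on a miss fall back to the slow table and refill the slot. */
void *lookup(uintptr_t orig)
{
    Entry *e = &fast[orig & (FAST_SIZE - 1)];
    if (e->trans != NULL && e->orig == orig) {
        fast_hits++;
        return e->trans;
    }
    fast_misses++;
    void *t = search_slow(orig);
    if (t != NULL) {          /* overwrite whatever occupied the slot */
        e->orig = orig;
        e->trans = t;
    }
    return t;                 /* NULL means a new translation is needed */
}
```

Two addresses that share the same low bits evict each other, which is the price paid for a lookup that needs only a mask, a load and a compare.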
120. be a very large performance cost. 2.16.5 Writing new wrappers. For the most part the wrappers are straightforward. The only significant complexity arises with nonblocking receives. The issue is that MPI_Irecv states the recv buffer and returns immediately, giving a handle (MPI_Request) for the transaction. Later the user will have to poll for completion with MPI_Wait etc, and when the transaction completes successfully, the wrappers have to paint the recv buffer. But the recv buffer details are not presented to MPI_Wait; only the handle is. The library therefore maintains a shadow table which associates uncompleted MPI_Requests with the corresponding buffer address/count/type. When an operation completes, the table is searched for the associated address/count/type info, and memory is marked accordingly. Access to the table is guarded by a (POSIX pthreads) lock, so as to make the library thread-safe. The table is allocated with malloc and never freed, so it will show up in leak checks. Writing new wrappers should be fairly easy. The source file is auxprogs/libmpiwrap.c. If possible, find an existing wrapper for a function of similar behaviour to the one you want to wrap, and use it as a starting point. The wrappers are organised in sections in the same order as the MPI 1.1 spec, to aid navigation. When adding a wrapper, remember to comment out the definition of the default wrapper in the long list of defaults at the bottom of the file (do not remov
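The shadow-table idea in entry 120 can be sketched without any MPI dependency. The C below is an assumption-laden illustration, not the code in libmpiwrap.c: RequestHandle stands in for MPI_Request, the integer type field stands in for an MPI datatype, and shadow_add and shadow_take are hypothetical names. As in the description, the table is malloc'd, never freed, and guarded by a pthreads lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef int RequestHandle;          /* hypothetical stand-in for MPI_Request */

typedef struct {
    RequestHandle req;
    void *buf;                      /* receive buffer address */
    int count;                      /* element count */
    int type;                       /* stand-in for the MPI datatype */
    int in_use;
} ShadowEntry;

static ShadowEntry *table = NULL;   /* grown on demand, never freed */
static size_t table_cap = 0;
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called where a nonblocking receive is posted. Returns 0, or -1 if out of memory. */
int shadow_add(RequestHandle req, void *buf, int count, int type)
{
    int rc = -1;
    size_t i;
    pthread_mutex_lock(&table_lock);
    for (i = 0; i < table_cap && table[i].in_use; i++) { /* find a free slot */ }
    if (i == table_cap) {
        size_t new_cap = table_cap ? table_cap * 2 : 8;
        ShadowEntry *grown = realloc(table, new_cap * sizeof *grown);
        if (grown != NULL) {
            for (size_t j = table_cap; j < new_cap; j++)
                grown[j].in_use = 0;
            table = grown;
            table_cap = new_cap;
        }
    }
    if (i < table_cap) {
        table[i] = (ShadowEntry){ req, buf, count, type, 1 };
        rc = 0;
    }
    pthread_mutex_unlock(&table_lock);
    return rc;
}

/* Called on completion: copies the entry to *out and removes it. Returns 1 if found. */
int shadow_take(RequestHandle req, ShadowEntry *out)
{
    int found = 0;
    assert(out != NULL);
    pthread_mutex_lock(&table_lock);
    for (size_t i = 0; i < table_cap; i++) {
        if (table[i].in_use && table[i].req == req) {
            *out = table[i];
            table[i].in_use = 0;
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return found;
}
```

A wait-style wrapper would call shadow_take with the completed handle and then mark the returned buffer range as defined.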
121. be done but would give measurable performance overheads, and so far no need for it has been found. As on x86/AMD64, IEEE754 exceptions are not supported: all floating point exceptions are handled using the default IEEE fixup actions. Valgrind detects, ignores, and can warn about, attempts to unmask the 5 IEEE FP exception kinds by writing to the floating-point status and control register (fpscr). Vector (Altivec, VMX): essentially as with x86/AMD64 SSE/SSE2: no exceptions, and limited observance of rounding mode. For Altivec, FP arithmetic is done in IEEE/Java mode, which is more accurate than the Linux default setting. "More accurate" means that denormals are handled properly, rather than simply being flushed to zero. Programs which are known not to work are: emacs starts up but immediately concludes it is out of memory and aborts. It may be that Memcheck does not provide a good enough emulation of the mallinfo function. Emacs works fine if you build it to use the standard malloc/free routines. 2.14 An Example Run. This is the log for a run of a small program using Memcheck. The program is in fact correct, and the reported error is as the result of a potentially serious code generation bug in GNU g++ (snapshot 20010527). sewardj@phoenix:~/newmat10$ ~/Valgrind-6/valgrind -v ./bogon ==25832== Valgrind 0.10, a memory error detector for x86 RedHat 7.1. ==25832== Copyright (C) 2000-2001, an
122. blefree.c:10) Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd at 0x4004FFDF: free (vg_clientmalloc.c:577) by 0x80484C7: main (tests/doublefree.c:10). Memcheck keeps track of the blocks allocated by your program with malloc/new, so it can know exactly whether or not the argument to free/delete is legitimate or not. Here, this test program has freed the same block twice. As with the illegal read/write errors, Memcheck attempts to make sense of the address free'd. If, as here, the address is one which has previously been freed, you will be told that, making duplicate frees of the same block easy to spot. 3.3.4 When a block is freed with an inappropriate deallocation function. In the following example, a block allocated with new[] has wrongly been deallocated with free: Mismatched free() / delete / delete [] at 0x40043249: free (vg_clientfuncs.c:171) by 0x4102BB4E: QGArray::~QGArray(void) (tools/qgarray.cpp:149) by 0x4C261C41: PptDoc::~PptDoc(void) (include/qmemarray.h:60) by 0x4C261F0E: PptXml::~PptXml(void) (pptxml.cc:44) Address 0x4BB292A8 is 0 bytes inside a block of size 64 alloc'd at 0x4004318C: __builtin_vec_new (vg_clientfuncs.c:152) by 0x4C21BC15: KLaola::readSBStream(int) const (klaola.cc:314) by 0x4C21C155: KLaola::stream(KLaola::OLENode const *) (klaola.cc:416) by 0x4C21788F: OLEFilter::convert(QCString const &) (olefilter.cc:272)
123. bug and erroneously jumps to a non-code address, in which case you'll get a SIGILL signal. Memcheck may issue a warning just before this happens, but it might not if the jump happens to land in addressable memory. I tried running a Java program (or another program that uses a just-in-time compiler) under Valgrind but something went wrong. Does Valgrind handle such programs? Valgrind can handle dynamically generated code, so long as none of the generated code is later overwritten by other generated code. If this happens, though, things will go wrong as Valgrind will continue running its translations of the old code (this is true on x86 and AMD64; on PPC32 there are explicit cache flush instructions which Valgrind detects). You should try running with --smc-check=all in this case; Valgrind will run much more slowly, but should detect the use of the out-of-date code. Alternatively, if you have the source code to the JIT compiler, you can insert calls to the VALGRIND_DISCARD_TRANSLATIONS client request to mark out-of-date code, saving you from using --smc-check=all. Apart from this, in theory Valgrind can run any Java program just fine, even those that use JNI and are partially implemented in other languages like C and C++. In practice, Java implementations tend to do nasty things that most programs do not, and Valgrind sometimes falls over these corner cases. If your Java programs do not run under Valgrind, even with --smc-check=all, please file a bu
124. but not 0x80; 76762 vg_to_ucode.c:3748 (dis_push_segreg): Assertion sz == 4 failed; 76747 cannot include valgrind.h in c++ program; 76223 parsing B(3,10) gave NULL type, impossible happens; 75604 shmdt handling problem; 76416 Problems with gcc 3.4 snap 20040225; 75614 using -gstabs when building your programs: the "impossible" happened; 75787 Patch for some CDROM ioctls CDORM_GET_MCN, CDROM_SEND_PACKET; 75294 gcc 3.4 snapshot's libstdc++ have unsupported instructions (REP RET); 73326 vg_symtab2.c:272 (addScopeRange): Assertion range->size > 0 failed; 72596 not recognizing __libc_malloc; 69489 Would like to attach ddd to running program; 72781 Cachegrind crashes with kde programs; 73055 Illegal operand at DXTCV11CompressBlockSSE2 (more SSE opcodes); 73026 Descriptor leak check reports port numbers wrongly; 71705 README_MISSING_SYSCALL_OR_IOCTL out of date; 72643 Improve support for SSE/SSE2 instructions; 72484 valgrind leaves its own signal mask in place when execing; 72650 Signal Handling always seems to restart system calls; 72006 The mmap system call turns all errors in ENOMEM; 71781 gdb attach is pretty useless; 71180 unhandled instruction bytes: 0xF 0xAE 0x85 0xE8; 69886 writes to zero page cause valgrind to assert on exit; 71791 crash when valgrinding gimp 1.3 (stabs reader problem); 69783 unhandled syscall: 218; 69782 unhandled instruction bytes: 0x66 0xF 0x2B 0x80; 70385 valgrind fails if th
125. check otherwise typically reports for MPI applications. The wrappers also take the opportunity to carefully check size and definedness of buffers passed as arguments to MPI functions, hence detecting errors such as passing undefined data to PMPI_Send, or receiving data into a buffer which is too small. Unlike most of the rest of Valgrind, the wrapper library is subject to a BSD-style license, so you can link it into any code base you like. See the top of auxprogs/libmpiwrap.c for details. 2.16.1 Building and installing the wrappers. The wrapper library will be built automatically if possible. Valgrind's configure script will look for a suitable mpicc to build it with. This must be the same mpicc you use to build the MPI application you want to debug. By default, Valgrind tries mpicc, but you can specify a different one by using the configure-time flag --with-mpicc=. Currently the wrappers are only buildable with mpiccs which are based on GNU gcc or Intel's icc. Check that the configure script prints a line like this: checking for usable MPI2-compliant mpicc and mpi.h... yes, mpicc. If it says ... no, your mpicc has failed to compile and link a test MPI2 program. If the configure test succeeds, continue in the usual way with make and make install. The final install tree should then contain libmpiwrap.so. Compile up a test MPI program (eg, MPI hello-world) and try this: LD_
126. checking for strcpy, memcpy, etc. Do overlap checking with Addrcheck as well as Memcheck. Fix this: Memcheck: the "impossible" happened: get_error_name: unexpected type. Install headers needed to compile new skins. Remove leading spaces and colon in the LD_LIBRARY_PATH / LD_PRELOAD passed to non-traced children. Fix file descriptor leak in valgrind-listener. Fix longstanding bug in which the allocation point of a block resized by realloc was not correctly set. This may have caused confusing error messages. Snapshot 20030716 (16 July 2003). 20030716 is a snapshot of our current CVS head (development) branch. This is the branch which will become valgrind-2.0. It contains significant enhancements over the 1.9.X branch. Despite this being a snapshot of the CVS head, it is believed to be quite stable, at least as stable as 1.9.6 or 1.0.4, if not more so, and therefore suitable for widespread use. Please let us know asap if it causes problems for you. Two reasons for releasing a snapshot now are: It's been a while since 1.9.6, and this snapshot fixes various problems that 1.9.6 has with threaded programs on glibc-2.3.X based systems. So as to make available improvements in the 2.0 line. Major changes in 20030716, as compared to 1.9.6: More fixes to threading support on glibc-2.3.1 and 2.3.2-based systems (SuSE 8.2, Red Hat 9). If you have had problems with inconsistent/illogical behaviour of
127. cially if you are using the C++ STL. Reading them from the bottom up can help. If the stack trace is not big enough, use the --num-callers option to make it bigger. The code addresses (eg. 0x804838F) are usually unimportant, but occasionally crucial for tracking down weirder bugs. Some error messages have a second component which describes the memory address involved. This one shows that the written memory is just past the end of a block allocated with malloc() on line 5 of example.c. It's worth fixing errors in the order they are reported, as later errors can be caused by earlier errors. Memory leak messages look like this: ==19182== 40 bytes in 1 blocks are definitely lost in loss record 1 of 1 ==19182== at 0x1B8FF5CD: malloc (vg_replace_malloc.c:130) ==19182== by 0x8048385: f (a.c:5) ==19182== by 0x80483AB: main (a.c:11). The stack trace tells you where the leaked memory was allocated. Memcheck cannot tell you why the memory leaked, unfortunately. (Ignore the vg_replace_malloc.c, that's an implementation detail.) There are several kinds of leaks; the two most important categories are: "definitely lost": your program is leaking memory, fix it! "probably lost": your program is leaking memory, unless you're doing funny things with pointers (such as moving them to point to the middle of a heap block). If you don't understand an error message, please consult Explanation of error me
128. ctor of up to four, depending on program behaviour. This means you should be able to run programs that use more memory than before without hitting problems. Addrcheck has been removed. It has not worked since version 2.4.0, and the speed and memory improvements to Memcheck make it redundant. If you liked using Addrcheck because it didn't give undefined value errors, you can use the new Memcheck option --undef-value-errors=no to get the same behaviour. The number of undefined-value errors incorrectly reported by Memcheck has been reduced (such false reports were already very rare). In particular, efforts have been made to ensure Memcheck works really well with gcc 4.0/4.1-generated code on X86/Linux and AMD64/Linux. Josef Weidendorfer's popular Callgrind tool has been added. Folding it in was a logical step given its popularity and usefulness, and makes it easier for us to ensure it works "out of the box" on all supported targets. The associated KDE KCachegrind GUI remains a separate project. A new release of the Valkyrie GUI for Memcheck, version 1.2.0, accompanies this release. Improvements over previous releases include improved robustness, many refinements to the user interface, and use of a standard autoconf/automake build system. You can get it from http://www.valgrind.org/downloads/guis.html. Valgrind now works on PPC64/Linux. As with the AMD64/Linux port, this supports programs using up to 32G of address space. On 64-bit capab
129. d GNU GPL'd, by Julian Seward. ==25832== Startup, with flags: ==25832== --suppressions=/home/sewardj/Valgrind/redhat71.supp ==25832== reading syms from /lib/ld-linux.so.2 ==25832== reading syms from /lib/libc.so.6 ==25832== reading syms from /mnt/pima/jrs/Inst/lib/libgcc_s.so.0 ==25832== reading syms from /lib/libm.so.6 ==25832== reading syms from /mnt/pima/jrs/Inst/lib/libstdc++.so.3 ==25832== reading syms from /home/sewardj/Valgrind/valgrind.so ==25832== reading syms from /proc/self/exe ==25832== ==25832== Invalid read of size 4 ==25832== at 0x8048724: _ZN10BandMatrix6ReSizeEiii (bogon.cpp:45) ==25832== by 0x80487AF: main (bogon.cpp:66) ==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd ==25832== ==25832== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) ==25832== malloc/free: in use at exit: 0 bytes in 0 blocks. ==25832== malloc/free: 0 allocs, 0 frees, 0 bytes allocated. ==25832== For a detailed leak analysis, rerun with: --leak-check=yes ==25832== ==25832== exiting, did 1881 basic blocks, 0 misses. ==25832== 223 translations, 3626 bytes in, 56801 bytes out. The GCC folks fixed this about a week before gcc-3.0 shipped. 2.15 Warning Messages You Might See. Most of these only appear if you run in verbose mode (enabled by -v): More than 100 errors detected but in less detail than b
130. dary. This is of course completely bogus, since there is no guarantee that the C library's definitions of these structs matches those of the kernel. I have started to sort this out using vg_kerneliface.h, into which I had intended to copy all kernel definitions which valgrind could need, but this has not gotten very far. At the moment it mostly contains definitions for sigset_t and struct sigaction, since the kernel's definition for these really does clash with glibc's. I plan to use a vki_ prefix on all these types and constants, to denote the fact that they pertain to Valgrind's Kernel Interface. Another advantage of having a vg_kerneliface.h file is that it makes it simpler to interface to a different kernel. One can, for example, easily imagine writing a new vg_kerneliface.h for FreeBSD, or x86 NetBSD. 1.1.5 Current limitations. Support for weird (non-POSIX) signal stuff is patchy. Does anybody care? 1.2 The instrumenting JITter. This really is the heart of the matter. We begin with various side issues. 1.2.1 Run-time storage, and the use of host registers. Valgrind translates client (original) basic blocks into instrumented basic blocks, which live in the translation cache TC, until either the client finishes or the translations are ejected from TC to make room for newer ones. Since it generates x86 code in memory, Valgrind has complete control of the use of registers in the transl
131. de compiled with -fomit-frame-pointer, providing you also compile your code with -fasynchronous-unwind-tables. The documentation build system has been completely redone. The documentation masters are now in XML format, and from that HTML, PostScript and PDF documentation is generated. As a result the manual is now available in book form. Note that the documentation in the source tarballs is pre-built, so you don't need any XML processing tools to build Valgrind from a tarball. Changes that are not user-visible: The code has been massively overhauled in order to modularise it. As a result we hope it is easier to navigate and understand. Lots of code has been rewritten. BUGS FIXED: 110046 sz == 4 assertion failed; 109810 vex amd64->IR: unhandled instruction bytes: 0xA3 0x4C 0x70 0xD7; 109802 Add a plausible stack size command line parameter; 109783 unhandled ioctl TIOCMGET (running hw detection tool discover); 109780 unhandled ioctl BLKSSZGET (running fdisk -l /dev/hda); 109718 vex x86->IR: unhandled instruction: ffreep; 109429 AMD64 unhandled syscall: 127 (sigpending); 109401 false positive uninit in strchr from ld-linux.so.2; 109385 "stabs" parse failure; 109378 amd64: unhandled instruction REP NOP; 109376 amd64: unhandled instruction LOOP Jb; 109363 AMD64 unhandled instruction bytes; 109362 AMD64 unhandled syscall: 24 (sched_yield); 109358 fork() won't work with valgrind-3.0 SVN; 109332 amd64 unhandled instruc
132. de harmless when not running on Valgrind, runs quickly when not on Valgrind, and does not require any other library support. 1.4.5.2 Part 2: Using it to detect Interference between Stack Variables. Currently Valgrind cannot detect errors of the following form: void fooble ( void ) { int a[10]; int b[10]; a[10] = 99; } Now imagine rewriting this as: void fooble ( void ) { int spacer0; int a[10]; int spacer1; int b[10]; int spacer2; VALGRIND_MAKE_NOACCESS(&spacer0, sizeof(int)); VALGRIND_MAKE_NOACCESS(&spacer1, sizeof(int)); VALGRIND_MAKE_NOACCESS(&spacer2, sizeof(int)); a[10] = 99; } Now the invalid write is certain to hit spacer0 or spacer1, so Valgrind will spot the error. There are two complications. 1. The first is that we don't want to annotate sources by hand, so the Right Thing to do is to write a C/C++ parser, annotator, prettyprinter which does this automatically, and run it on post-CPP'd C/C++ source. The parser/prettyprinter is probably not as hard as it sounds; I would write it in Haskell, a powerful functional language well suited to doing symbolic computation, with which I am intimately familiar. There is already a C parser written in Haskell by someone in the Haskell community, and that would probably be a good starting point. 2. The second complication is how to get rid of these NOACCESS records inside Valgrind when the instru
133. ded. Valgrind is aware of the memory state changes caused by a subset of the MPI functions, and will carefully check data passed to the (P)MPI_ interface. A new flag, --error-exitcode=, has been added. This allows changing the exit code in runs where Valgrind reported errors, which is useful when using Valgrind as part of an automated test suite. Various segfaults when reading old-style "stabs" debug information have been fixed. A simple performance evaluation suite has been added. See perf/README and README_DEVELOPERS for details. There are various bells and whistles. New configuration flags: --enable-only32bit, --enable-only64bit. By default, on 64-bit platforms (ppc64-linux, amd64-linux) the build system will attempt to build a Valgrind which supports both 32-bit and 64-bit executables. This may not be what you want, and you can override the default behaviour using these flags. Please note that Helgrind is still not working. We have made an important step towards making it work again, however, with the addition of function wrapping (see below). Other user-visible changes: Valgrind now has the ability to intercept and wrap arbitrary functions. This is a preliminary step towards making Helgrind work again, and was required for MPI support. There are some changes to Memcheck's client requests. Some of them have changed names: MAKE_NOACCESS --> MAKE_MEM_NOACCESS, MAKE_WRITABLE --> MAKE_MEM_UNDEFINED, MAKE
134. ded about line numbers past the end of a file. This can be caused by the above problem, ie. shortening the source file while using an old cachegrind.out.pid file. If this happens, the figures for the bogus lines are printed anyway (clearly marked as bogus) in case they are important. 4.3.2 Things to watch out for. Some odd things that can occur during annotation: If annotating at the assembler level, you might see something like this: 1 0 0 . . . . . . leal -12(%ebp),%eax; 1 0 0 . . . 1 0 0 movl %eax,84(%ebx); 2 0 0 0 0 0 1 0 0 movl $1,-20(%ebp); . . . . . . . . . .align 4,0x90; 1 0 0 . . . . . . movl $.LnrB,%eax; 1 0 0 . . . 1 0 0 movl %eax,-16(%ebp). How can the third instruction be executed twice when the others are executed only once? As it turns out, it isn't. Here's a dump of the executable, using objdump -d: 8048f25: 8d 45 f4 lea 0xfffffff4(%ebp),%eax; 8048f28: 89 43 54 mov %eax,0x54(%ebx); 8048f2b: c7 45 ec 01 00 00 00 movl $0x1,0xffffffec(%ebp); 8048f32: 89 f6 mov %esi,%esi; 8048f34: b8 08 8b 07 08 mov $0x8078b08,%eax; 8048f39: 89 45 f0 mov %eax,0xfffffff0(%ebp). Notice the extra mov %esi,%esi instruction. Where did this come from? The GNU assembler inserted it to serve as the two bytes of padding needed to align the movl $.LnrB,%eax instruction on a four-byte boundary, but pretended it didn't exist when adding debug information. Thus when Valgrind reads the debug info it thinks that
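The "two bytes of padding" in entry 134 follows from simple alignment arithmetic, sketched below in C. The function name padding_needed is hypothetical, and only the low bits of the address matter: an instruction that would otherwise start at an address ending in hex 32 needs two filler bytes to reach the next multiple of four, which ends in hex 34.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of padding required so that addr becomes a multiple of align.
   align must be a non-zero power of two. */
unsigned padding_needed(uintptr_t addr, unsigned align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (unsigned)((align - (addr & (align - 1))) & (align - 1));
}
```

A two-byte no-op such as a register-to-itself move is one way an assembler can fill exactly that gap.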
135. documentation for explanations of the tool-specific macros. RUNNING_ON_VALGRIND: returns 1 if running on Valgrind, 0 if running on the real CPU. If you are running Valgrind on itself, it will return the number of layers of Valgrind emulation we're running on. VALGRIND_DISCARD_TRANSLATIONS: discard translations of code in the specified address range. Useful if you are debugging a JITter or some other dynamic code generation system. After this call, attempts to execute code in the invalidated address range will cause Valgrind to make new translations of that code, which is probably the semantics you want. Note that code invalidations are expensive because finding all the relevant translations quickly is very difficult. So try not to call it often. Note that you can be clever about this: you only need to call it when an area which previously contained code is overwritten with new code. You can choose to write code into fresh memory, and just call this occasionally to discard large chunks of old code all at once. Alternatively, for transparent self-modifying-code support, use --smc-check=all. VALGRIND_COUNT_ERRORS: returns the number of errors found so far by Valgrind. Can be useful in test harness code when combined with the --log-fd=-1 option; this runs Valgrind silently, but the client program can detect when errors occur. Only useful for tools that report errors, e.g. it's useful for Memcheck, but for Cachegrind it will always return zero bec
136. doing wild writes. Massif: a new space profiling tool. Try it! It's cool, and it'll tell you in detail where and when your C/C++ code is allocating heap. Draws pretty .ps pictures of memory use against time. A potentially powerful tool for making sense of your program's space use. Fixes for many bugs, including support for more SSE2/SSE3 instructions, various signal/syscall things, and various problems with debug info readers. Support for glibc-2.3.3 based systems. We are now doing automatic overnight build-and-test runs on a variety of distros. As a result, we believe 2.1.1 builds and runs on: Red Hat 7.2, 7.3, 8.0, 9, Fedora Core 1, SuSE 8.2, SuSE 9. The following bugs, and probably many more, have been fixed. These are listed at http://bugs.kde.org. Reporting a bug for valgrind in the http://bugs.kde.org is much more likely to get you a fix than mailing developers directly, so please continue to keep sending bugs there. 69616 glibc 2.3.2 w/NPTL is massively different than what valgrind expects; 69856 I don't know how to instrument MMXish stuff (Helgrind); 73892 valgrind segfaults starting with Objective-C debug info (fix for S-type stabs); 73145 Valgrind complains too much about close(reserved fd); 73902 Shadow memory allocation seems to fail on RedHat 8.0; 68633 VG_N_SEMAPHORES too low (V itself was leaking semaphores); 75099 impossible to trace multiprocess programs; 76839 the "impossible" happened: disInstr: INT
137. don't want to continuously be reminded of them. Note: By far the easiest way to add suppressions is to use the --gen-suppressions=yes flag described in Command-line flags for the Valgrind core. Each error to be suppressed is described very specifically, to minimise the possibility that a suppression-directive inadvertently suppresses a bunch of similar errors which you did want to see. The suppression mechanism is designed to allow precise yet flexible specification of errors to suppress. If you use the -v flag, at the end of execution, Valgrind prints out one line for each used suppression, giving its name and the number of times it got used. Here's the suppressions used by a run of valgrind --tool=memcheck ls -l: --27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getgrgid_r --27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getpwuid_r --27579-- supp: 6 strrchr/_dl_map_object_from_fd/_dl_map_object. Multiple suppressions files are allowed. By default, Valgrind uses $PREFIX/lib/valgrind/default.supp. You can ask to add suppressions from another file, by specifying --suppressions=/path/to/file.supp. If you want to understand more about suppressions, look at an existing suppressions file whilst reading the following documentation. The file glibc-2.2.supp, in the source distribution, provides some good examples. Each suppression has the following components: First line: its name
138. done by helper fns, so bit-level accuracy is lost there. This should be fixed by doing them inline; it will probably require adding a couple new uinstrs. Also, left and right rotates through the carry flag (x86 rcl and rcr) are approximated via a single V bit; so far this has not caused anyone to complain. The non-carry rotates, rol and ror, are much more common and are done exactly. Re-visiting the instrumentation for AND and OR, they seem rather verbose, and I wonder if it could be done more concisely now. The lowercase o on many of the uopcodes in the running example indicates that the size field is zero, usually meaning a single-bit operation. Anyroads, the post-instrumented version of our running example looks like this: Instrumented code: 0: GETVL %EDX, q0; 1: GETL %EDX, t0; 2: TAG1o q0 = Left4 ( q0 ); 3: INCL t0; 4: PUTVL q0, %EDX; 5: PUTL t0, %EDX; 6: TESTVL q0; 7: SETVL q0; 8: LOADVB (t0), q0; 9: LDB (t0), t0; 10: TAG1o q0 = SWiden14 ( q0 ); 11: WIDENL_Bs t0; 12: PUTVL q0, %EAX; 13: PUTL t0, %EAX; 14: GETVL %ECX, q8; 15: GETL %ECX, t8; 16: MOVL q0, q4; 17: SHLL $0x1, q4; 18: TAG2o q4 = UifU4 ( q8, q4 ); 19: TAG1o q4 = Left4 ( q4 ); 20: LEA2L 1(t8,t0,2), t4; 21: TESTVL q4; 22: SETVL q4; 23: LOADVB (t4), q10; 24: LDB (t4), t10; 25: SETVB q12; 26: MOVB $0x20, t12; 27: MOVB q10, q14; 28: TAG2o q14
139. e CALL_FN_v_WW: void fn ( long, long ); CALL_FN_W_WW: long fn ( long, long ); CALL_FN_v_WWW: void fn ( long, long, long ); CALL_FN_W_WWW: long fn ( long, long, long ); CALL_FN_W_WWWW: long fn ( long, long, long, long ); CALL_FN_W_5W: long fn ( long, long, long, long, long ); CALL_FN_W_6W: long fn ( long, long, long, long, long, long ); and so on, up to CALL_FN_W_12W. The set of supported types can be expanded as needed. It is regrettable that this limitation exists. Function wrapping has proven difficult to implement, with a certain apparently unavoidable level of ickyness. After several implementation attempts, the present arrangement appears to be the least-worst tradeoff. At least it works reliably in the presence of dynamic linking and dynamic code loading/unloading. You should not attempt to wrap a function of one type signature with a wrapper of a different type signature. Such trickery will surely lead to crashes or strange behaviour. This is not, of course, a limitation of the function wrapping implementation, merely a reflection of the fact that it gives you sweeping powers to shoot yourself in the foot if you are not careful. Imagine the instant havoc you could wreak by writing a wrapper which matched any function name in any soname, in effect one which claimed to be a wrapper for all functions in the process. 2.10.7 Examples. In the source tree, memcheck/tests/wrap[1-8].c provide a series of examples, ranging from very simple
140. e Document and satisfy these conditions, can be treated as verbatim copying in other respects. If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages. If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public. It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document. 4. MODIFICATIONS. You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 abo
141. e applied to memory locations too. But evidently not: for memory locations the index can be arbitrary, and the processor will index arbitrarily into memory as a result. This too should be fixed. Sigh. Presumably indexing outside the immediate word is not actually used by any programs yet tested on Valgrind, for otherwise they presumably would simply not work at all. If you plan to hack on this, first check the Intel docs to make sure my understanding is really correct. 1.4.4 Using PREFETCH Instructions. Here's a small but potentially interesting project for performance junkies. Experiments with valgrind's code generator and optimiser(s) suggest that reducing the number of instructions executed in the translations and mem-check helpers gives disappointingly small performance improvements. Perhaps this is because performance of Valgrindified code is limited by cache misses. After all, each read in the original program now gives rise to at least three reads: one for the VG_(primary_map), one of the resulting secondary, and the original. Not to mention, the instrumented translations are 13 to 14 times larger than the originals. All in all one would expect the memory system to be hammered to hell and then some. So here's an idea. An x86 insn involving a read from memory, after instrumentation, will turn into ucode of the following form: calculate effective addr, into ta and qa T
142. e data from one memory block to another (or something similar): memcpy(), strcpy(), strncpy(), strcat(), strncat(). The blocks pointed to by their src and dst pointers aren't allowed to overlap. Memcheck checks for this. For example:
==27492== Source and destination overlap in memcpy(0xbffff294, 0xbffff280, 21)
==27492==    at 0x40026CDC: memcpy (mc_replace_strmem.c:71)
==27492==    by 0x804865A: main (overlap.c:40)
You don't want the two blocks to overlap because one of them could get partially trashed by the copying. 44 Memcheck a heavyweight memory checker You might think that Memcheck is being overly pedantic reporting this in the case where dst is less than src. For example, the obvious way to implement memcpy() is by copying from the first byte to the last. However, the optimisation guides of some architectures recommend copying from the last byte down to the first. Also, some implementations of memcpy() zero dst before copying, because zeroing the destination's cache line(s) can improve performance. The moral of the story is: if you want to write truly portable code, don't make any assumptions about the language implementation. 3.3.7. Memory leak detection Memcheck keeps track of all memory blocks issued in response to calls to malloc/calloc/realloc/new. So when the program exits, it knows which blocks have not been freed. If --leak-check is set appropriately, for each remaining block, Memcheck scans th
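In the overlap report quoted above, the two addresses are 0xbffff294 - 0xbffff280 = 20 bytes apart while the length is 21, so the blocks share one byte. A minimal sketch of an overlap test and a safe copy follows; `ranges_overlap` and `copy_bytes` are hypothetical helper names introduced for illustration, and the check is not Memcheck's own implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if [dst, dst+n) and [src, src+n) share at least one byte. */
int ranges_overlap(const void *dst, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    if (n == 0)
        return 0;
    /* The blocks overlap when their start addresses are closer than n. */
    return (d < s) ? (s - d < n) : (d - s < n);
}

/* Copy n bytes, falling back to memmove when the blocks overlap,
   since memmove is defined for overlapping ranges and memcpy is not. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    if (ranges_overlap(dst, src, n))
        return memmove(dst, src, n);
    return memcpy(dst, src, n);
}
```

Calling `copy_bytes(buf + 2, buf, 4)` on a buffer holding "abcdefgh" takes the memmove path, because the start addresses are only 2 bytes apart for a 4-byte copy.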
143. e entire address space of the process looking for pointers to the block. Each block fits into one of the three following categories. Still reachable: A pointer to the start of the block is found. This usually indicates programming sloppiness. Since the block is still pointed at, the programmer could, at least in principle, free it before program exit. Because these are very common and arguably not a problem, Memcheck won't report such blocks unless --show-reachable=yes is specified. Possibly lost, or "dubious": A pointer to the interior of the block is found. The pointer might originally have pointed to the start and have been moved along, or it might be entirely unrelated. Memcheck deems such a block as "dubious", because it's unclear whether or not a pointer to it still exists. Definitely lost, or "leaked": The worst outcome is that no pointer to the block can be found. The block is classified as "leaked", because the programmer could not possibly have freed it at program exit, since no pointer to it exists. This is likely a symptom of having lost the pointer at some earlier point in the program. For each block mentioned, Memcheck will also tell you where the block was allocated. It cannot tell you how or why the pointer to a leaked block has been lost; you have to work that out for yourself. In general, you should attempt to ensure your programs do not have any leaked or dubious blocks at exit. For example: 8 bytes in 1 blocks are defin
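The three categories above can be provoked deliberately. The sketch below is illustrative only: `make_leaks` and the two globals are hypothetical names, and the classification described in the comments is what the text above leads one to expect from an unoptimised build, not a guaranteed result (an optimising compiler may remove the third allocation, and stale stack contents can change what the leak scan finds).

```c
#include <stdlib.h>

char *still_reachable;   /* keeps a pointer to the START of a block */
char *interior_only;     /* keeps a pointer into the MIDDLE of a block */

/* Allocate three blocks and deliberately never free them.
   Returns 0 on success, -1 if any allocation failed. */
int make_leaks(void)
{
    char *a = malloc(16);
    char *b = malloc(16);
    char *c = malloc(16);
    if (a == NULL || b == NULL || c == NULL) {
        free(a);
        free(b);
        free(c);
        return -1;
    }
    still_reachable = a;     /* start pointer survives: "still reachable" */
    interior_only = b + 4;   /* only an interior pointer: "possibly lost" */
    c[0] = 'x';              /* c goes out of scope here: "definitely lost" */
    return 0;
}
```

Calling `make_leaks()` from `main` and running the program under Memcheck with leak checking enabled should list the three blocks under their respective categories, with the still-reachable one shown only when --show-reachable=yes is given.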
144. e is of no importance. Additionally, inherited event types can be introduced for which no raw data is available, but which are calculated from given types. Suppose the last example, you could add "event: Sum = Ir + Dr" to specify an additional event type "Sum", which is calculated by adding costs for "Ir" and "Dr". 3.2. Reference 3.2.1. Grammar
ProfileDataFile := FormatVersion? Creator? PartData*
FormatVersion := "version:" Space* Number "\n"
Creator := "creator:" NoNewLineChar* "\n"
PartData := (HeaderLine "\n")+ (BodyLine "\n")+
HeaderLine := (empty line) | ('#' NoNewLineChar*) | PartDetail | Description | EventSpecification | CostLineDef
PartDetail := TargetCommand | TargetID
128 Callgrind Format Specification
TargetCommand := "cmd:" Space* NoNewLineChar*
TargetID := ("pid" | "thread" | "part") ":" Space* Number
Description := "desc:" Space* Name Space* ":" NoNewLineChar*
EventSpecification := "event:" Space* Name InheritedDef? LongNameDef?
InheritedDef := "=" InheritedExpr
InheritedExpr := Name | Number Space* ("*" Space*)? Name | InheritedExpr Space* "+" Space* InheritedExpr
LongNameDef := ":" NoNewLineChar*
CostLineDef := "events:" Space* Name (Space+ Name)* | "positions:" "instr"? (Space+ "line")?
BodyLine := (empty line) | ('#' NoNewLineChar*) | CostLine | PositionSpecification | AssoziationSpecification
CostLine := SubPositionList Costs?
SubPosi
145. e it, just comment it out. 2.16.6. What to expect when using the wrappers 37 Using and understanding the Valgrind core The wrappers should reduce Memcheck's false-error rate on MPI applications. Because the wrapping is done at the MPI interface, there will still potentially be a large number of errors reported in the MPI implementation below the interface. The best you can do is try to suppress them. You may also find that the input-side (buffer length/definedness) checks find errors in your MPI use, for example passing too short a buffer to MPI_Recv. Functions which are not wrapped may increase the false error rate. A possible approach is to run with MPI_DEBUG containing warn. This will show you functions which lack proper wrappers but which are nevertheless used. You can then write wrappers for them. 3. Memcheck: a heavyweight memory checker To use this tool, you may specify --tool=memcheck on the Valgrind command line. You don't have to, though, since Memcheck is the default tool. 3.1. Kinds of bugs that Memcheck can find Memcheck is Valgrind's heavyweight memory checking tool. All reads and writes of memory are checked, and calls to malloc/new/free/delete are intercepted. As a result, Memcheck can detect the following problems: Use of uninitialised memory. Reading/writing memory after it has been free'd. Reading/writing off the end of malloc'd blocks. Reading/writing inappropriate areas on the stac
146. e of the first 8 words of VG_(baseBlock). Such tags can exist at any point in the translation process. Last, but not least, TempReg. The field contains the number of one of an infinite set of virtual (integer) registers. TempRegs are used everywhere throughout the translation process; you can have as many as you want. The register allocator maps as many as it can into RealRegs and turns the rest into SpillNos, so TempRegs should not exist after the register allocation phase. TempRegs are always 32 bits long, even if the data they hold is logically shorter. In that case the upper unused bits are required, and, I think, generally assumed, to be zero. TempRegs holding V bits for quantities shorter than 32 bits are expected to have ones in the unused places, since a one denotes "undefined". 1.2.5. UCode instructions (type UInstr) UCode was carefully designed to make it possible to do register allocation on UCode and then translate the result into x86 code without needing any extra registers ... well, that was the original plan, anyway. Things have gotten a little more complicated since then. In what follows, UCode instructions are referred to as uinstrs, to distinguish them from x86 instructions. Uinstrs of course have uopcodes which are (naturally) different from x86 opcodes. A uinstr (type UInstr) contains various fields, not all of which are used by any one uopcode: Three 16-bit operand fields, val1, val2 and val3. Three tag fields, tag1, ta
147. e size of the level 2 cache. 4.2.3. Annotating C/C++ programs Before using cg_annotate, it is worth widening your window to be at least 120 characters wide if possible, as the output lines can be quite long. To get a function-by-function summary, run cg_annotate --pid in a directory containing a cachegrind.out.pid file. The --pid is required so that cg_annotate knows which log file to use when several are present. The output looks like this: 54 Cachegrind a cache profiler
I1 cache: 65536 B, 64 B, 2-way associative
D1 cache: 65536 B, 64 B, 2-way associative
L2 cache: 262144 B, 64 B, 8-way associative
Command: concord vg_to_ucode.c
Events recorded: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Events shown: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Event sort order: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Threshold: 99%
Chosen for annotation:
Auto-annotation: on
[Table: a PROGRAM TOTALS row followed by per-function rows; columns Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw file:function]
148. e soft file descriptor limit is less than about 828 69529 "rep; nop" should do a yield 70827 programs with lots of shared libraries report "mmap failed" for some of them when reading symbols 71028 glibc's strnlen is optimised enough to confuse valgrind Unstable (cvs head) release 2.1.0 (15 December 2003) For whatever it's worth, 2.1.0 actually seems pretty darn stable to me (Julian). It looks eminently usable, and given that it fixes some significant bugs, may well be worth using on a day-to-day basis. 2.1.0 is known to build and pass regression tests on: SuSE 9, SuSE 8.2, RedHat 8. 2.1.0 most notably includes Jeremy Fitzhardinge's complete overhaul of handling of system calls and signals, and their interaction with threads. In general, the accuracy of the system call, thread and signal simulations is much improved. Specifically: Blocking system calls behave exactly as they do when running natively (not on valgrind). That is, if a syscall blocks only the calling thread when running natively, then it behaves the same on valgrind. No more mysterious hangs because V doesn't know that some syscall or other should block only the calling thread. Interrupted syscalls should now give more faithful results. Finally, signal contexts in signal handlers are supported. As a result, konqueror on SuSE 9 no longer segfaults when notified of file changes in directories it is watching. 173 NEWS Other changes: Robert Walsh's f
149. e supervised application for which this profile was generated. cmd: [program name + args] [Cachegrind] This specifies the full command line of the supervised application for which this profile was generated. part: [number] [Callgrind] This specifies a sequentially incremented number for each dump generated, starting at 1. desc: [type]: [value] [Cachegrind] This specifies various information for this dump. For some types, the semantic is defined, but any description type is allowed. Unknown types should be ignored. There are the types "I1 cache", "D1 cache" and "L2 cache", which specify parameters used for the cache simulator. These are the only types originally used by Cachegrind. Additionally, Callgrind uses the following types: "Timerange" gives a rough range of the basic block counter, for which the cost of this dump was collected. Type "Trigger" states the reason why this trace was generated, e.g. program termination or forced interactive dump. positions: [instr] [line] [Callgrind] For cost lines, this defines the semantic of the first numbers. Any combination of "instr", "bb" and "line" is allowed, but has to be in this order, which corresponds to position numbers at the start of the cost lines later in the file. If "instr" is specified, the position is the address of an instruction whose execution raised the events given later on the line. This address is relative to the offset of the binary/shared library file to not have to specify relocation info
150. eb browser. Valgrind can handle dynamically-generated code just fine. If you regenerate code over the top of old code (ie. at the same memory addresses), if the code is on the stack Valgrind will realise the code has changed, and work correctly. This is necessary to handle the trampolines GCC uses to implement nested functions. If you regenerate code somewhere other than the stack, you will need to use the --smc-check=all flag, and Valgrind will run more slowly than normal. As of version 3.0.0, Valgrind has the following limitations in its implementation of x86/AMD64 floating point relative to IEEE754. Precision: There is no support for 80 bit arithmetic. Internally, Valgrind represents all such "long double" numbers in 64 bits, and so there may be some differences in results. Whether or not this is critical remains to be seen. Note, the x86/amd64 fldt/fstpt instructions (read/write 80-bit numbers) are correctly simulated, using conversions to/from 64 bits, so that in-memory images of 80-bit numbers look correct if anyone wants to see. The impression observed from many FP regression tests is that the accuracy differences aren't significant. Generally speaking, if a program relies on 80-bit precision, there may be difficulties porting it to non x86/amd64 platforms which only support 64-bit FP precision. Even on x86/amd64, the program may get different results depending on whether it is compil
151. eck option turns on the detailed memory leak detector. Your program will run much slower (eg. 20 to 30 times) than normal, and use a lot more memory. Memcheck will issue messages about memory errors and leaks that it detects. 4. Interpreting Memcheck's output Here's an example C program with a memory error and a memory leak. The Valgrind Quick Start Guide
#include <stdlib.h>

void f(void)
{
   int* x = malloc(10 * sizeof(int));
   x[10] = 0;        // problem 1: heap block overrun
}                    // problem 2: memory leak -- x not freed

int main(void)
{
   f();
   return 0;
}
Most error messages look like the following, which describes problem 1, the heap block overrun:
==19182== Invalid write of size 4
==19182==    at 0x804838F: f (example.c:6)
==19182==    by 0x80483AB: main (example.c:11)
==19182==  Address 0x1BA45050 is 0 bytes after a block of size 40 alloc'd
==19182==    at 0x1B8FF5CD: malloc (vg_replace_malloc.c:130)
==19182==    by 0x8048385: f (example.c:5)
==19182==    by 0x80483AB: main (example.c:11)
Things to notice: There is a lot of information in each error message; read it carefully. The 19182 is the process ID; it's usually unimportant. The first line ("Invalid write...") tells you what kind of error it is. Here, the program wrote to some memory it should not have due to a heap block overrun. Below the first line is a stack trace telling you where the problem occurred. Stack traces can get quite large, and be confusing, espe
152. ed malloc'd/free'd block (for example, reading free'd memory), Valgrind reports not only the location where the error happened, but also where the associated block was malloc'd/free'd. Valgrind remembers all error reports. When an error is detected, it is compared against old reports, to see if it is a duplicate. If so, the error is noted, but no further commentary is emitted. This avoids you being swamped with bazillions of duplicate error reports. If you want to know how many times each error occurred, run with the -v option. When execution finishes, all the reports are printed out, along with, and sorted by, their occurrence counts. This makes it easy to see which errors have occurred most frequently. Errors are reported before the associated operation actually happens. If you're using a tool (Memcheck) which does address checking, and your program attempts to read from address zero, the tool will emit a message to this effect, and the program will then duly die with a segmentation fault. In general, you should try and fix errors in the order that they are reported. Not doing so can be confusing. For example, a program which copies uninitialised values to several memory locations, and later uses them, will generate several error messages when run on Memcheck. The first such error message may well give the most direct clue to the root cause of the problem. The process of detecting duplicate errors is quite an expensive one and can become a sig
153. ed %EIP by the specified literal amount. This supports lazy %EIP updating, as described below. Stages 1 and 2 of the 6-stage translation process mentioned above deal purely with these uopcodes, and no others. They are sufficient to express pretty much all the x86 32-bit protected-mode instruction set, at least everything understood by a pre-MMX original Pentium (P54C). Stages 3, 4, 5 and 6 also deal with the following extra "instrumentation" uopcodes. They are used to express all the definedness-tracking and -checking machinery which valgrind does. In later sections we show how to create checking code for each of the uopcodes above. Note that these instrumentation uopcodes, although some appearing complicated, have been carefully chosen so that efficient x86 code can be generated for them. GNU superopt v2.5 did a great job helping out here. Anyways, the uopcodes are as follows: GETV and PUTV are analogues to GET and PUT above. They are identical except that they move the V bits for the specified values back and forth to TempRegs, rather than moving the values themselves. Similarly, LOADV and STOREV read and write V bits from the synthesised shadow memory that Valgrind maintains. In fact they do more than that, since they also do address-validity checks, and emit complaints if the read/written addresses are unaddressible. TESTV, whose parameters are a TempReg and a size, tests the V bits in the TempReg, at the specified operat
154. ed only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when
155. ed to use SSE2 instructions (64-bits only), or x87 instructions (80-bit). The net effect is to make FP programs behave as if they had been run on a machine with 64-bit IEEE floats, for example PowerPC. On amd64 FP arithmetic is done by default on SSE2, so amd64 looks more like PowerPC than x86 from an FP perspective, and there are far fewer noticeable accuracy differences than with x86. Rounding: Valgrind does observe the 4 IEEE-mandated rounding modes (to nearest, to +infinity, to -infinity, to zero) for the following conversions: float to integer, integer to float where there is a possibility of loss of precision, and float-to-float rounding. For all other FP operations, only the IEEE default mode (round to nearest) is supported. Numeric exceptions in FP code: IEEE754 defines five types of numeric exception that can happen: invalid operation (sqrt of negative number, etc), division by zero, overflow, underflow, inexact (loss of precision). For each exception, two courses of action are defined by 754: either (1) a user-defined exception handler may be called, or (2) a default action is defined, which "fixes things up" and allows the computation to proceed without throwing an exception. Currently Valgrind only supports the default fixup actions. Again, feedback on the importance of exception support would be appreciated. When Valgrind detects that the program is trying to exceed any of these limitations (setting exception handlers, rounding mode, or p
156. 4.2.10. Other Important Information
4.2.11. Words of Advice
4.3. Advanced Topics
4.3.1. Suppressions
4.3.2. Documentation
4.3.3. Regression Tests
4.3.4. Profiling
4.3.5. Other Makefile Hackery
4.3.6. Core/tool Interface Versions
4.4. Final Words
1. The Design and Implementation of Valgrind Detailed technical notes for hackers, maintainers and the overly-curious 1.1. Introduction This document contains a detailed, highly-technical description of the internals of Valgrind. This is not the user manual; if you are an end-user of Valgrind, you do not want to read this. Conversely, if you really are a hacker-type and want to know how it works, I assume that you have read the user manual thoroughly. You may need to read this document several times, and carefully. Some important things, I only say once. Note: this document is now very old, and
157. efore. Subsequent errors will still be recorded. After 100 different errors have been shown, Valgrind becomes more conservative about collecting them. It then requires only the program counters in the top two stack frames to match when deciding whether or not two errors are really the same one. Prior to this point, the PCs in the top four frames are required to match. This hack has the effect of slowing down the appearance of new errors after the first 100. The 100 constant can be changed by recompiling Valgrind. More than 1000 errors detected. I'm not reporting any more. Final error counts may be inaccurate. Go fix your program! After 1000 different errors have been detected, Valgrind ignores any more. It seems unlikely that collecting even more different ones would be of practical help to anybody, and it avoids the danger that Valgrind spends more and more of its time comparing new errors against an ever-growing collection. As above, the 1000 number is a compile-time constant. Warning: client switching stacks? Valgrind spotted such a large change in the stack pointer, %esp, that it guesses the client is switching to a different stack. At this point it makes a kludgey guess where the base of the new stack is, and sets memory permissions accordingly. You may get many bogus error messages following this, if Valgrind guesses wrong. At the moment "large change" is defined as a change of more
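The duplicate-detection rule described above (four matching frames at first, two once 100 different errors have been shown) can be sketched as follows. This is an illustration under stated assumptions, not Valgrind's source: the `ErrorRecord` type, the `kind` field and both function names are hypothetical.

```c
#include <stddef.h>

#define MAX_FRAMES 4

/* Hypothetical record of one reported error. */
typedef struct {
    int kind;                        /* assumed: an error-kind code */
    unsigned long pcs[MAX_FRAMES];   /* program counters, innermost frame first */
} ErrorRecord;

/* How many top frames must match: four until 100 different errors
   have been shown, two afterwards. */
int frames_to_compare(int n_errors_shown)
{
    return n_errors_shown >= 100 ? 2 : 4;
}

/* Return 1 if a and b count as the same error under the current rule. */
int same_error(const ErrorRecord *a, const ErrorRecord *b, int n_errors_shown)
{
    int n = frames_to_compare(n_errors_shown);
    if (a->kind != b->kind)
        return 0;
    for (int i = 0; i < n; i++) {
        if (a->pcs[i] != b->pcs[i])
            return 0;
    }
    return 1;
}
```

Two errors that differ only in their third and fourth frames are distinct before the threshold and merged after it, which is how the looser rule slows the appearance of "new" errors.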
158. eful error reports. We hope to have more documentation one day. 7.3. Helgrind Options Helgrind-specific options are: --private-stacks=<yes|no> [default: no] Assume thread stacks are used privately. --show-last-access=<yes|some|no> [default: no] Show location of last word access on error. 8. Nulgrind: the "null" tool A tool that does not very much at all Nulgrind is the minimal tool for Valgrind. It does no initialisation or finalisation, and adds no instrumentation to the program's code. It is mainly of use for Valgrind's developers for debugging and regression testing. Nonetheless you can run programs with Nulgrind. They will run roughly 5 times more slowly than normal, for no useful effect. Note that you need to use the option --tool=none to run Nulgrind (ie. not --tool=nulgrind). 9. Lackey: a simple profiler and memory tracer To use this tool, you must specify --tool=lackey on the Valgrind command line. 9.1. Overview Lackey is a simple valgrind tool that does some basic program measurement. It adds quite a lot of simple instrumentation to the program's code. It is primarily intended to be of use as an example tool. It measures and reports various things. 1. When command line option --basic-counts=yes is specified, it prints the following statistics and information about the execution of the client program: a. The number of calls to _dl_runtime_resolve(), the function in glibc's dynamic linker that r
159. ell c 8 An uninitialised-value use error is reported when your program uses a value which hasn't been initialised, in other words, is undefined. Here, the undefined value is used somewhere inside the printf() machinery of the C library. This error was reported when running the following small program: int main() { int x; printf ("x = %d\n", x); } It is important to understand that your program can copy around junk (uninitialised) data as much as it likes. Memcheck observes this and keeps track of the data, but does not complain. A complaint is issued only when your program attempts to make use of uninitialised data. In this example, x is uninitialised. Memcheck observes the value being passed to _IO_printf and thence to _IO_vfprintf, but makes no comment. However, _IO_vfprintf has to examine the value of x so it can turn it into the corresponding ASCII string, and it is at this point that Memcheck complains. Sources of uninitialised data tend to be: Local variables in procedures which have not been initialised, as in the example above. The contents of malloc'd blocks, before you write something there. In C++, the new operator is a wrapper round malloc, so if you create an object with new, its fields will be uninitialised until you (or the constructor) fill them in, which is only Right and Proper. 3.3.3. Illegal frees For example: Invalid free() at 0x4004FFDF: free (vg_clientmalloc.c:577) by 0x80484C7: main (tests/dou
160. er work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it cont
161. ers, so you know which part of a file the shown code comes from, eg:
(figures and code for line 704)
-- line 704 ----------------------------------------
-- line 878 ----------------------------------------
(figures and code for line 878)
The amount of context to show around annotated lines is controlled by the --context option. To get automatic annotation, run cg_annotate --auto=yes. cg_annotate will automatically annotate every source file it can find that is mentioned in the function-by-function summary. Therefore, the files chosen for auto-annotation are affected by the --sort and --threshold options. Each source file is clearly marked (Auto-annotated source) as being chosen automatically. Any files that could not be found are mentioned at the end of the output, eg:
The following files chosen for auto-annotation could not be found:
  getc.c
  ctype.c
  ../sysdeps/generic/lockfile.c
This is quite common for library files, since libraries are usually compiled with debugging information, but the source files are often not present on a system. If a file is chosen for annotation both manually and automatically, it is marked as User-annotated source. Use the -I/--include option to tell Valgrind where to look for source files if the filenames found from the debugging information aren't specific enough. Beware that cg_annotate can take some time to digest large cachegrind.out.pid files, e.g. 30 seconds or more. Also beware that auto-annotation can produce a lot of output if your
162. esolves function references to shared objects. You can change the name of the function tracked with command line option --fnname=<name>. b. The number of conditional branches encountered and the number and proportion of those taken. c. The number of basic blocks entered and completed by the program. Note that due to optimisations done by the JIT, this is not really an accurate value. d. The number of guest (x86, amd64, ppc, etc.) instructions and IR statements executed. IR is Valgrind's RISC-like intermediate representation via which all instrumentation is done. e. Ratios between some of these counts. f. The exit code of the client program. 2. When command line option --detailed-counts=yes is specified, a table is printed with counts of loads, stores and ALU operations for various types of operands. The types are identified by their IR name ("I1" ... "I128", "F32", "F64", and "V128"). 3. When command line option --trace-mem=yes is specified, it prints out the size and address of almost every load and store made by the program. See the comments at the top of the file lackey/lk_main.c for details about the output format, how it works, and inaccuracies in the address trace. Note that Lackey runs quite slowly, especially when --detailed-counts=yes is specified. It could be made to run a lot faster by doing a slightly more sophisticated job of the instrumentation, but that would undermine its role as a simple example tool. Hence we ha
163. 2.10.2. Wrapping Specifications
2.10.3. Wrapping Semantics
2.10.4. Debugging
2.10.5. Limitations - control flow
2.10.6. Limitations - original function signatures
2.10.7. Examples
2.11. Building and Installing
2.12. If You Have Problems
2.13. Limitations
2.14. An Example Run
2.15. Warning Messages You Might See
2.16. Debugging MPI Parallel Programs with Valgrind
2.16.1. Building and installing the wrappers
2.16.2. Getting started
2.16.3. Controlling the wrapper library
2.16.4. Abilities and limitations
2.16.5. Writing new wrappers
2.16.6. What to expect when using the wrappers
3. Memcheck: a heavyweight memory checker
3.1. Kinds of bugs that Memcheck can find
3.2. Command-line flags specific to Memcheck
3.3. Explanation of error message
164. etails. Documentation: A comprehensive user guide is supplied. Point your browser at $PREFIX/share/doc/valgrind/manual.html, where $PREFIX is whatever you specified with --prefix= when building. Building and installing it: To install from the Subversion repository: 0. Check out the code from SVN, following the instructions at http://www.valgrind.org/downloads/repository.html. 1. cd into the source directory. 2. Run ./autogen.sh to setup the environment (you need the standard autoconf tools to do so). 3. Continue with the following instructions... To install from a tar.bz2 distribution: 4. Run ./configure, with some options if you wish. The standard options are documented in the INSTALL file. The only interesting one is the usual --prefix=/where/you/want/it/installed. 5. Do "make". 6. Do "make install", possibly as root if the destination permissions require that. 7. See if it works. Try "valgrind ls -l". Either this works, or it bombs out with some complaint. In that case, please let us know (see www.valgrind.org). Important! Do not move the valgrind installation into a place different from that specified by --prefix at build time. This will cause things to break in subtle ways, mostly when Valgrind handles fork/exec calls. The Valgrind Developers 6. README_MISSING_SYSCALL_OR_IOCTL Dealing with missing system call or ioctl wrappers in Valgrind You're probably reading this because Valgrind bombed out whils
165. evious three macros. To do this, the above macros return a small-integer "block handle". You can pass this block handle to VALGRIND_DISCARD. After doing so, Valgrind will no longer be able to relate addressing errors to the user-defined block associated with the handle. The permissions settings associated with the handle remain in place; this just affects how errors are reported, not whether they are reported. Returns 1 for an invalid handle and 0 for a valid handle (although passing invalid handles is harmless). Always returns 0 when not run on Valgrind. VALGRIND_CHECK_MEM_IS_ADDRESSABLE and VALGRIND_CHECK_MEM_IS_DEFINED: check immediately whether or not the given address range has the relevant property, and if not, print an error message. Also, for the convenience of the client, returns zero if the relevant property holds; otherwise, the returned value is the address of the first byte for which the property is not true. Always returns 0 when not run on Valgrind. VALGRIND_CHECK_VALUE_IS_DEFINED: a quick and easy way to find out whether Valgrind thinks a particular value (lvalue, to be precise) is addressable and defined. Prints an error message if not. Returns no value. VALGRIND_DO_LEAK_CHECK: run the memory leak detector right now. Returns no value. I guess this could be used to incrementally check for leaks between arbitrary places in the program's execution. Warnin
166. f the entire aggregate, the Document's Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate. The GNU Free Documentation License. 8. TRANSLATION. Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail. If a section in the Document is Entitled "Acknowledgements", "Dedications", or "History", the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title. 9. TERMINATION. You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. An
167. f SSE/SSE2 floating point instructions, or at least the subset emitted by Icc. Also added support for the following instructions: MOVNTDQ, UCOMISD, UNPCKLPS, UNPCKHPS, SQRTSS, PUSH/POP FS/GS, and PUSH CS (Nb: there is no POP CS). CFI support for GDB version 6. Needed to enable newer GDBs to figure out where they are when using --gdb-attach=yes. Fix this: mc_translate.c:1091 (memcheck_instrument): Assertion "u_in->size == 4 || u_in->size == 16" failed. Return an error rather than panicking when given a bad socketcall(). Fix checking of syscall rt_sigtimedwait(). Implement __NR_clock_gettime (syscall 265). Needed on Red Hat Severn. Fixed bug in overlap check in strncpy(): it was assuming the src was n bytes long, when it could be shorter, which could cause false positives. Support use of select() for very large numbers of file descriptors. Don't fail silently if the executable is statically linked, or is setuid/setgid. Print an error message instead. Support for old DWARF-1 format line number info. Snapshot 20031012 (12 October 2003). Three months worth of bug fixes, roughly. Most significant single change is improved SSE/SSE2 support, mostly thanks to Dirk Mueller. 20031012 builds on Red Hat Fedora ("Severn") but doesn't really work (curiously, mozilla runs OK, but a modest "ls -l" bombs). I hope to get a working version out soon. It may or may not work ok on the forthcoming SuSE 9; I hear pos
168. f one of these functions was called from several different places in the program? Which one of these is responsible for most of the memory used? For _nl_intern_locale_data(), this question is answered by clicking on the 22.1% link, which takes us to the following part of the file: Context accounted for 22.1% of measured spacetime: 0x401767D0: _nl_intern_locale_data (in /lib/i686/libc-2.3.2.so). Called from: 22.1%: 0x40176F95: _nl_load_locale_from_archive (in /lib/i686/libc-2.3.2.so). At this level, we can see all the places from which _nl_load_locale_from_archive() was called such that it allocated memory at 0x401767D0. (We can click on the top 22.1% link to go back to the parent entry.) At this level, we have moved beyond the information presented in the graph. In this case, it is only called from one place. We can again follow the link for more detail, moving to the following part of the file: Context accounted for 22.1% of measured spacetime: 0x401767D0: _nl_intern_locale_data (in /lib/i686/libc-2.3.2.so); 0x40176F95: _nl_load_locale_from_archive (in /lib/i686/libc-2.3.2.so). Called from: 22.1%: 0x40176184: _nl_find_locale (in /lib/i686/libc-2.3.2.so). In this way we can dig deeper into the call stack, to work out exactly what sequence of calls led to some memory being allocated. At this point, with a call depth of 3, the information runs out (thus the address of the child entry, 0x40176
169. fewer "memory exhausted" messages, and debug symbols should be read correctly on large (eg. 300MB) executables. On 32-bit machines the full address space available to user programs (usually 3GB or 4GB) can be fully utilised. On 64-bit machines up to 32GB of space is usable; when using Memcheck that means your program can use up to about 14GB. A side effect of this change is that Valgrind is no longer protected against wild writes by the client. This feature was nice but relied on the x86 segment registers and so wasn't portable. Most users should not notice, but as part of the address space manager change, the way Valgrind is built has been changed. Each tool is now built as a statically linked stand-alone executable, rather than as a shared object that is dynamically linked with the core. The "valgrind" program invokes the appropriate tool depending on the --tool option. This slightly increases the amount of disk space used by Valgrind, but it greatly simplified many things and removed Valgrind's dependence on glibc. Please note that Addrcheck and Helgrind are still not working. Work is underway to reinstate them (or equivalents). We apologise for the inconvenience. Other user-visible changes: The --weird-hacks option has been renamed --sim-hints. The --time-stamp option no longer gives an absolute date and time. It now prints the time elapsed since the program began. It should build with gcc-2.96. Valgrind can
170. fl=file2.c fn=func2 20 700. One can see that in "main" only code from line 16 is executed, where the other functions are also called. Inclusive cost of main is 820, which is the sum of self cost 20 and the costs spent in the calls (400 + 400). Function "func1" is located in "file1.c", the same as "main". Therefore, a "cfl=" specification for the call to "func1" is not needed. The function "func1" only consists of code at line 51 of "file1.c", where "func2" is called. 3.1.5 Name Compression. With the introduction of association specifications like calls, the same function or file name has to be specified multiple times. As absolute filenames or symbol names in C++ can be quite long, it is advantageous to be able to specify integer IDs for position specifications. To support name compression, a position specification can be not only of the format "spec=name", but also "spec=(ID) name" to specify a mapping of an integer ID to a name, and "spec=(ID)" to reference a previously defined ID mapping. There is a separate ID mapping for each position specification, i.e. you can use ID 1 for both a file name and a symbol name. With string compression, the example from 1.4 looks like this:
    events: instructions
    fl=(1) file1.c
    fn=(1) main
    16 20
    cfn=(2) func1
    calls=1 50
    16 400
    cfl=(2) file2.c
    cfn=(3) func2
    calls=3 20
    16 400
    fn=(2)
    51 100
    cfl=(2)
    cfn=(3)
    calls=2 20
    51 300
    fl=(2)
    fn=(3)
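The ID mapping described above can be sketched in C. This is only an illustration under stated assumptions, not how any Callgrind tool is implemented: the names IdTable and resolve_name and the fixed limits MAX_IDS and MAX_NAME are hypothetical. The caller is assumed to have already split off the "spec=" prefix and to keep one table per position specification.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_IDS  64    /* hypothetical upper bound on IDs, for brevity */
#define MAX_NAME 256   /* hypothetical maximum name length */

/* One table per position specification (e.g. one for fl=, one for fn=). */
typedef struct {
    char names[MAX_IDS][MAX_NAME];
    int  defined[MAX_IDS];
} IdTable;

/* Resolve the value part of a position specification.
   "(3) func2" defines ID 3 and returns "func2";
   "(3)"       returns the name previously bound to ID 3;
   "func2"     (uncompressed) is returned unchanged.
   Returns NULL on a malformed value or an undefined ID. */
const char *resolve_name(IdTable *t, const char *value)
{
    assert(t != NULL && value != NULL);
    if (value[0] != '(')
        return value;                       /* uncompressed form */

    char *end;
    long id = strtol(value + 1, &end, 10);
    if (end == value + 1 || *end != ')' || id < 0 || id >= MAX_IDS)
        return NULL;

    const char *name = end + 1;
    while (*name == ' ')
        name++;

    if (*name != '\0') {                    /* "(ID) name": define the mapping */
        snprintf(t->names[id], MAX_NAME, "%s", name);
        t->defined[id] = 1;
    }
    return t->defined[id] ? t->names[id] : NULL;
}
```

Keeping a separate IdTable for file names and for function names mirrors the rule that ID 1 may stand for both a file and a symbol at the same time.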
171. for file and symbol names as position strings (as these never start with a digit). The compressed format is either "(" number ")" space position, or only "(" number ")". The first relates position to number in the context of the given format specification from this line to the end of the file; it makes the (number) an alias for position. Compressed format is always optional. Position specifications allowed: ob= [Callgrind]: The ELF object where the cost of next cost lines happens. fl= [Cachegrind], fi= [Cachegrind], fe= [Cachegrind]: The source file including the code which is responsible for the cost of next cost lines. fi=/fe= is used when the source file changes inside of a function, i.e. for inlined code. fn= [Cachegrind]: The name of the function where the cost of next cost lines happens. cob= [Callgrind]: The ELF object of the target of the next call cost lines. cfl= [Callgrind]: The source file including the code of the target of the next call cost lines. cfn= [Callgrind]: The name of the target function of the next call cost lines. calls= [Callgrind]: The number of nonrecursive calls which are responsible for the cost specified by the next call cost line. This is the cost spent inside of the called function. After calls= there MUST be a cost line. This is the cost spent in the called function. The first number is the source line from where the call happened. jump=count target position [Callgrind]: Uncond
172. formation is acquired using VALGRIND_GET_ORIG_FN. It is crucial to make this macro call before calling any other wrapped function in the same thread. CALL_FN_W_WW: eventually we will want to call the function being wrapped. Calling it directly does not work, since that just gets us back to the wrapper and tends to kill the program in short order by stack overflow. Instead, the result lvalue, OrigFn and arguments are handed to one of a family of macros of the form CALL_FN_*. These cause Valgrind to call the original and avoid recursion back to the wrapper. 2.10.2 Wrapping Specifications. This scheme has the advantage of being self-contained. A library of wrappers can be compiled to object code in the normal way, and does not rely on an external script telling Valgrind which wrappers pertain to which originals. Each wrapper has a name which, in the most general case, says: I am the wrapper for any function whose name matches FNPATT and whose ELF "soname" matches SOPATT. Both FNPATT and SOPATT may contain wildcards (asterisks) and other characters (spaces, dots, @, etc) which are not generally regarded as valid C identifier names. This flexibility is needed to write robust wrappers for POSIX pthread functions, where typically we are not completely sure of either the function name or the soname, or alternatively we want to wrap a whole bunch of functions at once. For example, pthread_create in GNU libpthread is usually a versioned symbol, one who
173. formation might only be simple, like total counts for the whole program's execution. What about space usage at different points in the program's execution, for example? And reimplementing heap profiling code for each project is a pain. Massif can save you this effort. 6.2 Using Massif. 6.2.1 Overview. First off, as for normal Valgrind use, you probably want to compile with debugging info (the -g flag). But, as opposed to Memcheck, you probably do want to turn optimisation on, since you should profile your program as it will be normally run. Then, run your program with valgrind --tool=massif in front of the normal command line invocation. When the program finishes, Massif will print summary space statistics. It also creates a graph representing the program's heap usage in a file called massif.pid.ps, which can be read by any PostScript viewer, such as Ghostview. It also puts detailed information about heap consumption in a file massif.pid.txt (text format) or massif.pid.html (HTML format), where pid is the program's process id. 6.2.2 Basic Results of Profiling. To gather heap profiling information about the program prog, type: valgrind --tool=massif prog. The program will execute (slowly). Upon completion, summary statistics that look like this will be printed: ==27519== Total spacetime: [value illegible]; ==27519== heap: 24.0%; ==27519== heap admin: [value illegible]; ==27519== stack(s): [value illegible]. All measu
174. from/to the relevant address in the (simulated) V-bit memory. FPU loads and stores are different. As above, the definedness of the address is first tested. However, the helper routine for FPU loads (VGM_(fpu_read_check)) emits an error if either the address is invalid or the referenced area contains undefined values. It has to do this because we do not simulate the FPU at all, and so cannot track definedness of values loaded into it from memory, so we have to check them as soon as they are loaded into the FPU, ie, at this point. We notionally assume that everything in the FPU is defined. It follows therefore that FPU writes first check the definedness of the address, then the validity of the address, and finally mark the written bytes as well-defined. If anyone is inspired to extend Valgrind to MMX/SSE insns, I suggest you use the same trick. It works provided that the FPU/MMX unit is not used merely as a conduit to copy partially undefined data from one place in memory to another. Unfortunately the integer CPU is used like that (when copying C structs with holes, for example) and this is the cause of much of the elaborateness of the instrumentation here described. vg_instrument in vg_translate.c actually does the instrumentation. There are comments explaining how each uinstr is handled, so we do not repeat that here. As explained already, it is bit-accurate, except for calls to helper functions. Unfortunately the x86 insns bt/bts/btc/btr are
175. g: not properly tested! VALGRIND_COUNT_LEAKS: fills in the four arguments with the number of bytes of memory found by the previous leak check to be leaked, dubious, reachable and suppressed. Again, useful in test harness code, after calling VALGRIND_DO_LEAK_CHECK. VALGRIND_GET_VBITS and VALGRIND_SET_VBITS: allow you to get and set the V (validity) bits for an address range. You should probably only set V bits that you have got with VALGRIND_GET_VBITS. Only for those who really know what they are doing. 4. Cachegrind: a cache profiler. Detailed technical documentation on how Cachegrind works is available in How Cachegrind works. If you only want to know how to use it, this is the page you need to read. 4.1 Cache profiling. To use this tool, you must specify --tool=cachegrind on the Valgrind command line. Cachegrind is a tool for doing cache simulations and annotating your source line-by-line with the number of cache misses. In particular, it records: L1 instruction cache reads and misses; L1 data cache reads and read misses, writes and write misses; L2 unified cache reads and read misses, writes and write misses. On a modern machine, an L1 miss will typically cost around 10 cycles, and an L2 miss can cost as much as 200 cycles. Detailed cache profiling can be very useful for improving the performance of your program. Also, since one instruction cache read is performed per instruction executed
176. g cycles. Each group of functions with any two of them happening to have a call chain from one to the other is called a cycle. For example, with A calling B, B calling C, and C calling A, the three functions A, B, C build up one cycle. If a call chain goes multiple times around inside of a cycle, with profiling, you cannot distinguish event counts coming from the first round or the second. Thus, it makes no sense to attach any inclusive cost to a call among functions inside of one cycle. If "A > B" appears multiple times in a call chain, you have no way to partition the one big sum of all appearances of "A > B". Thus, for profile data presentation, all functions of a cycle are seen as one big virtual function. Unfortunately, if you have an application using some callback mechanism (like any GUI program), or even with normal polymorphism (as in OO languages like C++), it's quite possible to get large cycles. As it is often impossible to say anything about performance behaviour inside of cycles, it is useful to introduce some mechanisms to avoid cycles in call graphs. This is done by treating the same function in different ways, depending on the current execution context, either by giving them different names or by ignoring calls to functions. There is an option to ignore calls to a function with --fn-skip=funcprefix. E.g., you usually do not want to see the trampoline functions in the PLT sections for calls to functions in shared libraries. You can see
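The definition of a cycle given above (two functions belong together when a call chain leads from each to the other) can be checked mechanically. The following is a minimal sketch, not Callgrind's own algorithm: it assumes a small call graph stored as an adjacency matrix, and the names CallGraph, same_cycle and MAX_FUNCS are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_FUNCS 16   /* hypothetical small limit for the sketch */

/* calls[i][j] is true when function i directly calls function j. */
typedef struct {
    int  n;
    bool calls[MAX_FUNCS][MAX_FUNCS];
} CallGraph;

/* Compute reach[i][j]: there is a call chain of length >= 1 from i to j
   (Warshall's transitive closure). */
static void transitive_closure(const CallGraph *g,
                               bool reach[MAX_FUNCS][MAX_FUNCS])
{
    memcpy(reach, g->calls, sizeof g->calls);
    for (int k = 0; k < g->n; k++)
        for (int i = 0; i < g->n; i++)
            for (int j = 0; j < g->n; j++)
                if (reach[i][k] && reach[k][j])
                    reach[i][j] = true;
}

/* Two functions belong to the same cycle when each can reach the other. */
bool same_cycle(const CallGraph *g, int a, int b)
{
    assert(g != NULL && a >= 0 && a < g->n && b >= 0 && b < g->n);
    bool reach[MAX_FUNCS][MAX_FUNCS];
    transitive_closure(g, reach);
    return reach[a][b] && reach[b][a];
}
```

With A, B, C numbered 0, 1, 2 and the edges A to B, B to C, C to A set in calls, same_cycle reports all three as one cycle, while a function that is only called from A and calls nothing stays outside it.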
177. g from a signal handler when VDSOs are turned off in FC2. 69508 java 1.4.2 client fails with erroneous "stack size too small". This fix makes more of the pthread stack attribute related functions work properly. Java still doesn't work though. 71906 malloc alignment should be 8, not 4. All memory returned by malloc/new etc is now at least 8-byte aligned. 81970 vg_alloc_ThreadState: no free slots available (closed because the workaround is simple: increase VG_N_THREADS, rebuild and try again). 78514 Conditional jump or move depends on uninitialized value(s) (a slight mishandling of FP code in memcheck). 77952 pThread Support (crash) (due to initialisation-ordering probs) (also 85118). 80942 Addrcheck wasn't doing overlap checking as it should. 78048 return NULL on malloc/new etc failure, instead of asserting. 73655 operator new() override in user .so files often doesn't get picked up. 83060 Valgrind does not handle native kernel AIO. 69872 Create proper coredumps after fatal signals. 82026 failure with new glibc versions: libc functions are not exported. 70344 UNIMPLEMENTED FUNCTION: tcdrain. 81297 Cancellation of pthread_cond_wait does not require mutex. 82872 Using debug info from additional packages (wishlist). 83025 Support for ioctls FIGETBSZ and FIBMAP. 83340 Support for ioctl HDIO_GET_IDENTITY. 79714 Support for the semtimedop system call. 77022 Support for ioctls FBIOGET_VSCREENINFO and FBIOGET_FSCREENINFO. 82098 hp2ps ansif
178. g report, and hopefully we'll be able to fix the problem. Valgrind Frequently Asked Questions. 4. Valgrind behaves unexpectedly. 4.1. My program uses the C++ STL and string classes. Valgrind reports "still reachable" memory leaks involving these classes at the exit of the program, but there should be none. First of all: relax, it's probably not a bug, but a feature. Many implementations of the C++ standard libraries use their own memory pool allocators. Memory for quite a number of destructed objects is not immediately freed and given back to the OS, but kept in the pool(s) for later re-use. The fact that the pools are not freed at the exit() of the program causes Valgrind to report this memory as still reachable. The behaviour not to free pools at the exit() could be called a bug of the library though. Using gcc, you can force the STL to use malloc and to free memory as soon as possible by globally disabling memory caching. Beware! Doing so will probably slow down your program, sometimes drastically. With gcc 2.91, 2.95, 3.0 and 3.1, compile all source using the STL with -D__USE_MALLOC. Beware! This is removed from gcc starting with version 3.3. With gcc 3.2.2 and later, you should export the environment variable GLIBCPP_FORCE_NEW before running your program. With gcc 3.4 and later, that variable has changed name to GLIBCXX_FORCE_NEW. There are other ways to disable memory pooling: using t
179. g this right is critical, and so VG_(saneUCodeBlock) makes various checks on the use of these uopcodes. It is important to understand that these uopcodes have nothing to do with the x86 call, return, push or pop instructions, and are not used to implement them. Those guys turn into combinations of GET, PUT, LOAD, STORE, ADD, SUB, and JMP. What these uopcodes support is calling of helper functions such as VG_(helper_imul_32_64), which do stuff which is too difficult or tedious to emit inline. FPU, FPU_R and FPU_W. Valgrind doesn't attempt to simulate the internal state of the FPU at all. Consequently it only needs to be able to distinguish FPU ops which read and write memory from those that don't, and for those which do, it needs to know the effective address and data transfer size. This is made easier because the x86 FP instruction encoding is very regular, basically consisting of 16 bits for a non-memory FPU insn and 11 (IIRC) bits + an address mode for a memory FPU insn. So our FPU uinstr carries the 16 bits in its val1 field. And FPU_R and FPU_W carry 11 bits in that field, together with the identity of a TempReg or (later) RealReg which contains the address. JIFZ is unique, in that it allows a control-flow transfer which is not deemed to end a basic block. It causes a jump to a literal (original) address if the specified argument is zero. Finally, INCEIP advances the simulat
180. g2 and tag3. Each of these has a value of type Tag, and they describe what the val1, val2 and val3 fields contain. A 32-bit literal field. Two FlagSets, specifying which x86 condition codes are read and written by the uinstr. An opcode byte, containing a value of type Opcode. A size field, indicating the data transfer size (1/2/4/8/10) in cases where this makes sense, or zero otherwise. A condition-code field, which, for jumps, holds a value of type Condcode, indicating the condition which applies. The encoding is as it is in the x86 insn stream, except we add a 17th value CondAlways to indicate an unconditional transfer. Various 1-bit flags, indicating whether this insn pertains to an x86 CALL or RET instruction, whether a widening is signed or not, etc. UOpcodes (type Opcode) are divided into two groups: those necessary merely to express the functionality of the x86 code, and extra uopcodes needed to express the instrumentation. The former group contains: GET and PUT, which move values from the simulated CPU's integer registers (ArchRegs) into TempRegs, and back. GETF and PUTF do the corresponding thing for the simulated %EFLAGS. There are no corresponding insns for the FPU register stack, since we don't explicitly simulate its registers. LOAD and STORE, which, in RISC-like fashion, are the only uinstrs able to interact with memory. MOV and CMOV allow u
181. ght (c) YEAR YOUR NAME. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License". If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the "with...Texts." line with this: with the Invariant Sections being LIST THEIR TITLES, with the Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST. If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation. If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.
182. gnificant events. All lines in the commentary have the following form: ==12345== Some message from Valgrind. The 12345 is the process ID. This scheme makes it easy to distinguish program output from Valgrind commentary, and also easy to differentiate commentaries from different processes which have become merged together, for whatever reason. By default, Valgrind tools write only essential messages to the commentary, so as to avoid flooding you with information of secondary importance. If you want more information about what is happening, re-run, passing the -v flag to Valgrind. A second -v gives yet more detail. You can direct the commentary to three different places: 1. The default: send it to a file descriptor, which is by default 2 (stderr). So, if you give the core no options, it will write commentary to the standard error stream. If you want to send it to some other file descriptor, for example number 9, you can specify --log-fd=9. This is the simplest and most common arrangement, but can cause problems when valgrinding entire trees of processes which expect specific file descriptors, particularly stdin/stdout/stderr, to be available for their own use. 2. A less intrusive option is to write the commentary to a file, which you specify by --log-file=filename. Note carefully that the commentary is not written to the file you specify, but instead to one called filename.12345, if
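Because every commentary line carries the process ID, a log in which several processes have been merged can be split again with very little code. The sketch below assumes the "==PID== message" prefix that Valgrind prints; the function name parse_commentary is hypothetical and nothing here is part of Valgrind itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* If `line` is Valgrind commentary of the assumed form "==PID== message",
   store the PID in *pid, point *msg at the message text and return 1.
   Otherwise (ordinary program output) return 0 and leave outputs untouched. */
int parse_commentary(const char *line, long *pid, const char **msg)
{
    assert(line != NULL && pid != NULL && msg != NULL);
    if (strncmp(line, "==", 2) != 0 || !isdigit((unsigned char)line[2]))
        return 0;

    char *end;
    long value = strtol(line + 2, &end, 10);
    if (strncmp(end, "==", 2) != 0)
        return 0;

    end += 2;
    if (*end == ' ')
        end++;                 /* skip the single separating space */
    *pid = value;
    *msg = end;
    return 1;
}
```

A filter built on this can route each line to a per-process file keyed on the PID and pass everything else through as program output.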
183. grind chooses a suitable name, but very occasionally it gets it wrong. Examples we know of are printing "bcmp" instead of "memcmp", "index" instead of "strchr", and "rindex" instead of "strrchr". 5. Memcheck doesn't find my bug. 5.1. I try running "valgrind --tool=memcheck my_program" and get Valgrind's startup message, but I don't get any errors and I know my program has errors. There are two possible causes of this. First, by default, Valgrind only traces the top-level process. So if your program spawns children, they won't be traced by Valgrind by default. Also, if your program is started by a shell script, Perl script, or something similar, Valgrind will trace the shell, or the Perl interpreter, or equivalent. To trace child processes, use the --trace-children=yes option. If you are tracing large trees of processes, it can be less disruptive to have the output sent over the network. Give Valgrind the flag --log-socket=127.0.0.1:12345 (if you want logging output sent to port 12345 on localhost). You can use the valgrind-listener program to listen on that port: valgrind-listener 12345. Obviously you have to start the listener process first. See the manual for more details. Second, if your program is statically linked, most Valgrind tools won't work as well, because they won't be able to replace certain functions, such as malloc, with their own
184. grind crashes when trying to read debug information; 113810 vex x86->IR: 66 0F F6 (66 PSADBW, SSE PSADBW); 113796 read() and write() do not work if buffer is in shared memory; 113851 vex x86->IR: (pmaddwd) 0x66 0xF 0xF5 0xC7; 114366 vex amd64 cannot handle asm fninit; 114412 vex amd64->IR: 0xF 0xAD 0xC2 0xD3 (128-bit shift, shrdq); 114455 vex amd64->IR: 0xF 0xAC 0xD0 0x1 (also shrdq); 115590 amd64->IR: 0x67 0xE3 0x9 0xEB (address size override); 115953 valgrind svn r5042 does not build with parallel make (-j3); 116057 maximum instruction size: VG_MAX_INSTR_SZB too small; 116483 shmat fails with invalid argument; 102202 valgrind crashes when realloc'ing until out of memory; 109487 == 102202; 110536 == 102202; 112687 == 102202; 111724 vex amd64->IR: 0x41 0xF 0xAB (more BT{,S,R,C} fun n games); 111748 vex amd64->IR: 0xDD 0xE2 (fucom); 111785 make fails if CC contains spaces; 111829 vex x86->IR: sbb AL, Ib; 111851 vex x86->IR: 0x9F 0x89 (lahf/sahf); 112031 iopl on AMD64 and README_MISSING_SYSCALL_OR_IOCTL update; 112152 code generation for Xin_MFence on x86 with SSE0 subarch; 112167 == 112152; 112789 == 112152; 112199 naked ar tool is used in vex makefile; 112501 vex x86->IR: movq (0xF 0x7F 0xC1 0xF) (mmx MOVQ); 113583 == 112501; 112538 memalign crash; 113190 Broken links in docs/html; 113230 Valgrind sys_pipe on x86-64 wrongly thinks file descriptors should be 64bit; 113996 vex amd64->
185. hat doesn't write to memory. The number of counts in each line and the summary line should not exceed the number of events in the event line. If the number in each line is less, cg_annotate treats those missing as though they were a "." entry. A file line changes the current file name. A fn line changes the current function name. A count line contains counts that pertain to the current filename/fn name. An "fl=" file line and a fn line must appear before any count lines to give the context of the first count lines. Each file line should be immediately followed by a fn line. "fi=" file lines are used to switch filenames for inlined functions; "fe=" file lines are similar, but are put at the end of a basic block in which the file name hasn't been switched back to the original file name. (fi and fe lines behave the same; they are only distinguished to help debugging.) 2.8 Summary of performance features. Quite a lot of work has gone into making the profiling as fast as possible. This is a summary of the important features: The basic block-level cost centre storage allows almost free cost centre lookup. Only one function call is made per instruction simulated; even this accounts for a sizeable percentage of execution time, but it seems unavoidable if we want flexibility in the cache simulator. Unchanging information about an instruction is stored in its cost centre, avoiding unnecessary argument pushing, and minimising UCode instrume
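The padding rule for short count lines can be illustrated with a small parser. This sketch is not cg_annotate's code. It assumes a count line consists of a source line number followed by whitespace-separated counts, and it uses a hypothetical sentinel NO_COUNT to stand for a "." entry.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define NO_COUNT (-1L)   /* hypothetical sentinel for a "." entry */

/* Parse a count line "line_num c1 c2 ..." for a file that declares
   n_events events. Missing or "." counts become NO_COUNT.
   Returns the source line number, or -1 if the line is malformed
   or carries more counts than there are events. */
long parse_count_line(const char *s, int n_events, long counts[])
{
    assert(s != NULL && counts != NULL && n_events > 0);
    char *end;
    long line_num = strtol(s, &end, 10);
    if (end == s || line_num < 0)
        return -1;

    int seen = 0;
    for (;;) {
        while (isspace((unsigned char)*end))
            end++;
        if (*end == '\0')
            break;
        if (seen == n_events)
            return -1;                      /* more counts than events */
        if (*end == '.') {
            counts[seen++] = NO_COUNT;
            end++;
        } else {
            const char *start = end;
            long v = strtol(start, &end, 10);
            if (end == start || v < 0)
                return -1;
            counts[seen++] = v;
        }
    }
    while (seen < n_events)
        counts[seen++] = NO_COUNT;          /* pad the missing entries */
    return line_num;
}
```

Rejecting a line with too many counts follows the statement that the number of counts should not exceed the number of events; padding follows the rule for lines that have fewer.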
186. hat read memory. 2.3 Storing cost centres. Cost centres are stored in a way that makes them very cheap to lookup, which is important since one is looked up for every original x86 instruction executed. Valgrind does JIT translations at the basic block level, and cost centres are also set up and stored at the basic block level. By doing things carefully, we store all the cost centres for a basic block in a contiguous array, and lookup comes almost for free. Consider this part of a basic block (for exposition purposes, pretend it's an entire basic block): movl $0x0,%eax; movl $0x99,-4(%ebp). The translation to UCode looks like this: MOVL $0x0, t20; PUTL t20, %EAX; INCEIPo $5; LEA1L [operands illegible]; MOVL $0x99, t18; STL t18, (t14); INCEIPo $7. The first step is to allocate the cost centres. This requires a preliminary pass to count how many x86 instructions were in the basic block, and their types (and thus sizes). UCode translations for single x86 instructions are delimited by the INCEIPo instruction, the argument of which gives the byte size of the instruction (note that lazy INCEIP updating is turned off to allow this). We can tell if an x86 instruction references memory by looking for LDL and STL UCode instructions, and thus what kind of cost centre is required. From this we can determine how many cost centres we need for the basic block, and their sizes. We can then allocate them in a single array
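The two-pass allocation described above (count the instructions and their kinds, then allocate one array) can be sketched as follows. The structure layouts and all names here (InstrCC, MemCC, InstrKind, alloc_bb_cost_centres) are hypothetical and chosen only for illustration; the real Cachegrind cost centres hold different fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cost-centre layouts; the real structures differ. */
typedef struct { unsigned long accesses, misses; } InstrCC;   /* no memory reference */
typedef struct { InstrCC instr; unsigned long d_accesses, d_misses; } MemCC; /* reads or writes memory */

typedef enum { INSTR_PLAIN, INSTR_MEM } InstrKind;

static size_t cc_size(InstrKind k)
{
    return k == INSTR_MEM ? sizeof(MemCC) : sizeof(InstrCC);
}

/* Preliminary pass: record each instruction's byte offset and sum the sizes.
   Then allocate one zeroed, contiguous block for the whole basic block.
   The caller owns the block and releases it with free(). */
void *alloc_bb_cost_centres(const InstrKind kinds[], size_t n, size_t offsets[])
{
    assert(kinds != NULL && offsets != NULL);
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        offsets[i] = total;
        total += cc_size(kinds[i]);
    }
    return calloc(1, total ? total : 1);
}
```

Because the offsets are fixed when the block is translated, instrumentation for the i-th instruction can embed the address block + offsets[i] directly, which is why the lookup at run time costs almost nothing.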
187. he address of the instruction's cost centre is pushed onto the stack, to be the first argument to the cache simulation function. The address is known at this point because we are doing a simultaneous pass through the cost centre array. This means the cost centre lookup for each instruction is almost free (just the cost of pushing an argument for a function call). Then the call to the cache simulation function for non-memory-reference instructions is made (note that the CALLMo UInstruction takes an offset into a table of predefined functions; it is not an absolute address), and the single argument is CLEARed from the stack. The second instruction's UCode is similar. The only difference is that, as mentioned before, we have to pass the address of the data item referenced to the cache simulation function too. This explains the MOVL t14, t42 and PUSHL t42 UInstructions. (Note that the seemingly redundant MOVing will probably be optimised away during register allocation.) Note that instead of storing unchanging information about each instruction (instruction size, data size, etc) in its cost centre, we could have passed in these arguments to the simulation function. But this would slow the calls down (two or three extra arguments pushed onto the stack). Also it would bloat the UCode instrumentation by amounts similar to the space required for them in the cost centre; bloated UCode would also fill the translation cache more quickly, requiring more translat
188. he basic block count, with the result that you can get close to the basic block causing a problem but can't home in on it exactly. My kludgey hack is to define SIGNAL_SIMULATION to 1 towards the bottom of vg_syscall_mem.c, so that signal handlers are run on the real CPU and don't change the BB counts. A second hole in the switch-back-to-real-CPU story is that Valgrind's way of delivering signals to the client is different from that of the kernel. Specifically, the layout of the signal delivery frame, and the mechanism used to detect a sighandler returning, are different. So you can't expect to make the transition inside a sighandler and still have things working, but in practice that's not much of a restriction. Valgrind's implementation of malloc, free, etc (in vg_clientmalloc.c, not the low-level stuff in vg_malloc2.c) is somewhat complicated by the need to handle switching back at arbitrary points. It does work tho. 1.1.4 Correctness. There's only one of me, and I have a Real Life (tm) as well as hacking Valgrind (allegedly). That means I don't have time to waste chasing endless bugs in Valgrind. My emphasis is therefore on doing everything as simply as possible, with correctness, stability and robustness being the number one priority, more important than performance or functionality. As a result: The code is absolutely loaded with assertions, and these are permanently
189. he directory containing the package's source code and type "./configure" to configure the package for your system. If you're using "csh" on an old version of System V, you might need to type "sh ./configure" instead to prevent "csh" from trying to execute "configure" itself. Running "configure" takes a while. While running, it prints some messages telling which features it is checking for. 2. Type "make" to compile the package. 3. Optionally, type "make check" to run any self-tests that come with the package. 4. Type "make install" to install the programs and any data files and documentation. 5. You can remove the program binaries and object files from the source code directory by typing "make clean". To also remove the files that "configure" created (so you can compile the package for a different kind of computer), type "make distclean". There is also a "make maintainer-clean" target, but that is intended mainly for the package's developers. If you use it, you may have to get all sorts of other programs in order to regenerate files that came with the distribution. Compilers and Options Some systems require unusual options for compilation or linking that the "configure" script does not know about. You can give "configure" initial values for variables by setting them in the environment. Using a Bourne-compatible shell, you can do that on the command line like this: CC=c89 CFL
190. he malloc_alloc template with your objects (not portable, but should work for gcc) or even writing your own memory allocators. But all this goes beyond the scope of this FAQ. Start by reading http://gcc.gnu.org/onlinedocs/libstdc++/ext/howto.html#3 if you absolutely want to do that. But beware: 1. there are currently changes underway for gcc which are not totally reflected in the docs right now ("now" == 26 Apr 03); 2. allocators belong to the more messy parts of the STL and people went to great lengths to make it portable across platforms. Chances are good that your solution will work on your platform, but not on others. 4.2. The stack traces given by Memcheck (or another tool) aren't helpful. How can I improve them? If they're not long enough, use --num-callers to make them longer. If they're not detailed enough, make sure you are compiling with -g to add debug information. And don't strip symbol tables (programs should be unstripped unless you run "strip" on them; some libraries ship stripped). Also, for leak reports involving shared objects, if the shared object is unloaded before the program terminates, Valgrind will discard the debug information and the error message will be full of ??? entries. The workaround here is to avoid calling dlclose() on these shared objects. Also, -fomit-frame-pointer and -fstack-check can make stack traces worse. Some example sub-traces: With debug inform
191. heck is more effective for heap-allocated data than for stack-allocated data. If you have to use this flag, you may wish to consider rewriting your code to allocate on the heap rather than on the stack. 2.6.4 malloc()-related Options For tools that use their own version of malloc() (e.g. Memcheck and Massif), the following options apply. --alignment=<number> [default: 8] By default Valgrind's malloc, realloc, etc, return 8-byte aligned addresses. This is standard for most processors. However, some programs might assume that malloc et al return 16-byte or more aligned memory. The supplied value must be between 8 and 4096 inclusive, and must be a power of two. 2.6.5 Uncommon Options These options apply to all tools, as they affect certain obscure workings of the Valgrind core. Most people won't need to use these. --run-libc-freeres=<yes|no> [default: yes] The GNU C library (libc.so), which is used by all programs, may allocate memory for its own uses. Usually it doesn't bother to free that memory when the program ends; there would be no point, since the Linux kernel reclaims all process resources when a process exits anyway, so it would just slow things down. The glibc authors realised that this behaviour causes leak checkers, such as Valgrind, to falsely report leaks in glibc, when a leak check is done at exit. In order to avoid this, they provided a routine called __l
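The constraint on the --alignment value stated above (between 8 and 4096 inclusive, and a power of two) can be checked with a few lines of C. This is only an illustrative sketch: alignment_is_valid is a hypothetical helper name, not a function from the Valgrind sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not part of Valgrind): returns true when n meets
   the documented constraint: 8 <= n <= 4096 and n is a power of two. */
bool alignment_is_valid(unsigned long n)
{
    if (n < 8 || n > 4096)
        return false;
    /* A power of two has exactly one bit set, so clearing the lowest
       set bit with n & (n - 1) must leave zero. */
    return (n & (n - 1)) == 0;
}
```

With this helper, 8, 16 and 4096 are accepted, while 4, 24 and 8192 are rejected.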
192. her (e.g. with inlined functions). Debug info reading: read symbols from both "symtab" and "dynsym" sections, rather than merely from the one that comes last in the file. New syscall support: prctl(), creat(), lookup_dcookie(). When checking calls to accept(), recvfrom(), getsocketopt(), don't complain if buffer values are NULL. Try and avoid assertion failures in mash_LD_PRELOAD_and_LD_LIBRARY_PATH. Minor bug fixes in cg_annotate. Version 1.9.5 (7 April 2003) It occurs to me that it would be helpful for valgrind users to record in the source distribution the changes in each release. So I now attempt to mend my errant ways. Changes in this and future releases will be documented in the NEWS file in the source distribution. Major changes in 1.9.5: (Critical bug fix): Fix a bug in the FPU simulation. This was causing some floating point conditional tests not to work right. Several people reported this. If you had floating point code which didn't work right on 1.9.1 to 1.9.4, it's worth trying 1.9.5. Partial support for Red Hat 9. RH9 uses the new Native Posix Threads Library (NPTL), instead of the older LinuxThreads. This potentially causes problems with V which will take some time to correct. In the meantime we have partially worked around this, and so 1.9.5 works on RH9. Threaded programs still work, but they may deadlock, because some system calls (accept, read, write, etc) which should be nonblocking,
193. here's a default case, sometimes it isn't correct and you have to write a more specific case to get the right behaviour. As above, please create a bug report and attach the patch as described on http://www.valgrind.org. 7. README_DEVELOPERS Building and not installing it To run Valgrind without having to install it, run coregrind/valgrind with the VALGRIND_LIB environment variable set to <dir>/.in_place, where <dir> is the root of the source tree (and must be an absolute path). Eg: VALGRIND_LIB=~/grind/head4/.in_place ~/grind/head4/coregrind/valgrind This allows you to compile and run with "make" instead of "make install", saving you time. I recommend compiling with "make --quiet" to further reduce the amount of output spewed out during compilation, letting you actually see any errors, warnings, etc. Running the regression tests To build and run all the regression tests, run "make [--quiet] regtest". To run a subset of the regression tests, execute: perl tests/vg_regtest <name> where <name> is a directory (all tests within will be run) or a single .vgtest test file, or the name of a program which has a like-named .vgtest file. Eg: perl tests/vg_regtest memcheck; perl tests/vg_regtest memcheck/tests/badfree.vgtest; perl tests/vg_regtest memcheck/tests/badfree Running the performance tests To build and run all the performance tests, run "make [--quiet] perf". To run a subset of the performance suite, execute: perl perf/vg_perf
194. iant Sections and required Cover Texts given in the Document's license notice. H. Include an unaltered copy of this License. I. Preserve the section Entitled "History", Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled "History" in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence. J. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission. K. For any section Entitled "Acknowledgements" or "Dedications", Preserve the Title of the section, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein. L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the sect
195. ibc_freeres specifically to make glibc release all memory it has allocated. Memcheck therefore tries to run __libc_freeres at exit. Unfortunately, in some versions of glibc, __libc_freeres is sufficiently buggy to cause segmentation faults. This is particularly noticeable on Red Hat 7.1. So this flag is provided in order to inhibit the run of __libc_freeres. If your program seems to run fine on Valgrind, but segfaults at exit, you may find that --run-libc-freeres=no fixes that, although at the cost of possibly falsely reporting space leaks in libc.so. --sim-hints=hint1,hint2,... Pass miscellaneous hints to Valgrind which slightly modify the simulated behaviour in nonstandard or dangerous ways, possibly to help the simulation of strange features. By default no hints are enabled. Use with caution! Currently known hints are: lax-ioctls: Be very lax about ioctl handling; the only assumption is that the size is correct. Doesn't require the full buffer to be initialized when writing. Without this, using some device drivers with a large number of strange ioctl commands becomes very tiresome. enable-inner: Enable some special magic needed when the program being run is itself Valgrind. --kernel-variant=variant1,variant2,... Handle system calls and ioctls arising from minor variants of the default kernel for this platform. This is useful for running on hacked kernels or with kernel modules which support nonstandard ioctls, for example. Use with c
196. ible. After the system call, Memcheck updates its tracked information to precisely reflect any changes in memory permissions caused by the system call. Here's an example of two system calls with invalid parameters:
    #include <stdlib.h>
    #include <unistd.h>
    int main( void )
    {
      char* arr  = malloc(10);
      int*  arr2 = malloc(sizeof(int));
      write( 1 /* stdout */, arr, 10 );
      exit(arr2[0]);
    }
You get these complaints:
    Syscall param write(buf) points to uninitialised byte(s)
       at 0x25A48723: __write_nocancel (in /lib/tls/libc-2.3.3.so)
       by 0x259AFAD3: __libc_start_main (in /lib/tls/libc-2.3.3.so)
       by 0x8048348: (within /auto/homes/njn25/grind/head4/a.out)
     Address 0x25AB8028 is 0 bytes inside a block of size 10 alloc'd
       at 0x259852B0: malloc (vg_replace_malloc.c:130)
       by 0x80483F1: main (a.c:5)
    Syscall param exit(error_code) contains uninitialised byte(s)
       at 0x25A21B44: __GI__exit (in /lib/tls/libc-2.3.3.so)
       by 0x8048426: main (a.c:8)
because the program has (a) tried to write uninitialised junk from the malloc'd block to the standard output, and (b) passed an uninitialised value to exit. Note that the first error refers to the memory pointed to by buf (not buf itself), but the second error refers to the argument error_code itself. 3.3.6 Overlapping source and destination blocks The following C library functions copy som
197. ication (wishlist); 83573 Valgrind SIGSEGV on execve; 82999 show which cmdline option was erroneous (wishlist); 83040 make valgrind VPATH and distcheck-clean (wishlist); 83998 Assertion `newfd > vgPlain_max_fd' failed (see below); 82722 Unchecked mmap in as_pad leads to mysterious failures later; 78958 memcheck seg faults while running Mozilla; 85416 Arguments with colon (e.g. --logsocket) ignored. Additionally there are the following changes, which are not connected to any bug report numbers, AFAICS: Rearranged address space layout relative to 2.1.1, so that Valgrind/tools will run out of memory later than currently in many circumstances. This is good news esp. for Calltree. It should be possible for client programs to allocate over 800MB of memory when using memcheck now. Improved checking when laying out memory. Should hopefully avoid the random segmentation faults that 2.1.1 sometimes caused. Support for Fedora Core 2 and SuSE 9.1. Improvements to NPTL support to the extent that V now works properly on NPTL-only setups. Renamed the following options: --logfile-fd --> --log-fd, --logfile --> --log-file, --logsocket --> --log-socket, to be consistent with each other and other options (esp. --input-fd). Add support for SIOCGMIIPHY, SIOCGMIIREG and SIOCSMIIREG ioctls and improve the checking of other interface related ioctls. Fix building with gcc-3
198. ich Valgrind stores the call stack at the time of the malloc call. When the client calls free, Valgrind tries to find the shadow block corresponding to the address passed to free, and emits an error message if none can be found. If it is found, the block is placed on the freed blocks queue vg_freed_list, it is marked as inaccessible, and its shadow block now records the call stack at the time of the free call. Keeping free'd blocks in this queue allows Valgrind to spot all (presumably invalid) accesses to them. However, once the volume of blocks in the free queue exceeds VG_(clo_freelist_vol), blocks are finally removed from the queue. Keeping track of A and V bits (note: if you don't know what these are, you haven't read the user guide carefully enough) for memory is done in vg_memory.c. This implements a sparse array structure which covers the entire 4G address space in a way which is reasonably fast and reasonably space efficient. The 4G address space is divided up into 64K sections, each covering 64KB of address space. Given a 32-bit address, the top 16 bits are used to select one of the 65536 entries in VG_(primary_map). The resulting "secondary" (SecMap) holds A and V bits for the 64KB chunk of address space corresponding to the lower 16 bits of the address. 1.1.3 Design decisions Some design decisions were motivated by the need to make Valgrind debuggable. Imagine you are writing a CPU simulator. It works fairly well. However, you run some
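The two-level lookup described above (top 16 bits select a primary entry, low 16 bits index into the secondary) can be sketched as follows. This is a simplified illustration, not the code in vg_memory.c: the type and function names are hypothetical, each A bit is stored in a whole byte rather than packed, V bits are omitted, and the convention that 1 means "invalid address" is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SECMAP_BYTES 65536u   /* one secondary covers 64KB of address space */

/* Hypothetical secondary map: one A bit per client byte, stored unpacked.
   Assumed convention: 1 = invalid address, 0 = valid address. */
typedef struct { uint8_t abit[SECMAP_BYTES]; } SecMap;

/* 65536 primary entries, selected by the top 16 bits of the address.
   NULL means no secondary exists yet, i.e. the whole chunk is invalid. */
static SecMap *primary_map[65536];

/* Find the secondary for addr, allocating it lazily (sparse structure). */
SecMap *get_secmap(uint32_t addr)
{
    uint32_t hi = addr >> 16;
    if (primary_map[hi] == NULL) {
        SecMap *sm = malloc(sizeof *sm);
        if (sm == NULL)
            abort();
        memset(sm->abit, 1, sizeof sm->abit);   /* invalid by default */
        primary_map[hi] = sm;
    }
    return primary_map[hi];
}

void set_abit(uint32_t addr, uint8_t bit)
{
    get_secmap(addr)->abit[addr & 0xFFFFu] = bit;
}

uint8_t get_abit(uint32_t addr)
{
    SecMap *sm = primary_map[addr >> 16];
    return sm ? sm->abit[addr & 0xFFFFu] : 1;
}
```

Only chunks that are actually touched get a secondary, which is what keeps the structure reasonably space efficient while each lookup stays at two array indexings.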
199. ied when certain interesting events occur. But the core takes care of all the hard work. 4.1.3 Execution Spaces An important concept to understand before writing a tool is that there are three spaces in which program code executes: 1. User space: this covers most of the program's execution. The tool is given the code and can instrument it any way it likes, providing (more or less) total control over the code. Code executed in user space includes all the program code, almost all of the C library (including things like the dynamic linker), and almost all parts of all other libraries. 2. Core space: a small proportion of the program's execution takes place entirely within Valgrind's core. This includes dynamic memory management (malloc() etc.), thread scheduling, and signal handling. A tool has no control over these operations; it never "sees" the code doing this work and thus cannot instrument it. However, the core provides hooks so a tool can be notified when certain interesting events happen, for example when dynamic memory is allocated or freed, the stack pointer is changed, or a pthread mutex is locked, etc. Note that these hooks only notify tools of events relevant to user space. For example, when the core allocates some memory for its own use, the tool is not notified of this, because it's not directly part of the supervised program's execution. 3. Kernel space: execution in the ke
200. ies to establish what the illegal address might relate to, since that's often useful. So, if it points into a block of memory which has already been freed, you'll be informed of this, and also where the block was free'd at. Likewise, if it should turn out to be just off the end of a malloc'd block, a common result of off-by-one errors in array subscripting, you'll be informed of this fact, and also where the block was malloc'd. In this example, Memcheck can't identify the address. Actually the address is on the stack, but, for some reason, this is not a valid stack address: it is below the stack pointer and that isn't allowed. In this particular case it's probably caused by gcc generating invalid code, a known bug in some ancient versions of gcc. Note that Memcheck only tells you that your program is about to access memory at an illegal address. It can't stop the access from happening. So, if your program makes an access which normally would result in a segmentation fault, your program will still suffer the same fate, but you will get a message from Memcheck immediately prior to this. In this particular example, reading junk on the stack is non-fatal, and the program stays alive. 3.3.2 Use of uninitialised values For example: Conditional jump or move depends on uninitialised value(s) at 0x402DFA94: _IO_vfprintf (_itoa.h:49) by 0x402E8476: _IO_printf (printf.c:36) by 0x8048472: main (tests/manu
201. ignal handler, VG_(oursignalhandler). This simply notes the delivery of signals, and returns. Every 1000 basic blocks, we see if more signals have arrived. If so, VG_(deliver_signals) builds signal delivery frames on the client's stack, and allows their handlers to be run. Valgrind places in these signal delivery frames a bogus return address, VG_(signalreturn_bogusRA), and checks all jumps to see if any jump to it. If so, this is a sign that a signal handler is returning, and if so Valgrind removes the relevant signal frame from the client's stack, restores from the signal frame the simulated state before the signal was delivered, and allows the client to run onwards. We have to do it this way because some signal handlers never return, they just longjmp(), which nukes the signal delivery frame. The Linux kernel has a different but equally horrible hack for detecting signal handler returns. Discovering it is left as an exercise for the reader. 1.2.14 To be written The following is a list of as-yet-not-written stuff. Apologies. 1. The translation cache and translation table 2. Exceptions, creating new translations 3. Self-modifying code 4. Errors, error contexts, error reporting, suppressions 5. Client malloc/free 6. Low-level memory management 7. A and V bitmaps 8. Symbol table management 9. Dealing with system calls 10. Namespace management 11. GDB attaching 12. Non-depende
202. ile descriptor leakage checks. When enabled, Valgrind will print out a list of open file descriptors on exit. Along with each file descriptor, Valgrind prints out a stack backtrace of where the file was opened and any details relating to the file descriptor such as the file name or socket details. To use, give: --track-fds=yes. Implemented a few more SSE/SSE2 instructions. Less crud on the stack when you do "where" inside a GDB attach. Fixed the following bugs: 68360: Valgrind does not compile against 2.6.0-testX kernels; 68525: CVS head doesn't compile on C90 compilers; 68566: pkgconfig support (wishlist); 68588: Assertion `sz == 4' failed in vg_to_ucode.c (disInstr); 69140: valgrind not able to explicitly specify a path to a binary; 69432: helgrind asserts encountering a MutexErr when there are EraserErr suppressions. Increase the max size of the translation cache from 200k average bbs to 300k average bbs. Programs on the size of OOo (680m17) are thrashing the cache at the smaller size, creating large numbers of retranslations and wasting significant time as a result. Stable release 2.0.0 (5 Nov 2003) 2.0.0 improves SSE/SSE2 support, fixes some minor bugs, and improves support for SuSE 9 and the Red Hat "Severn" beta. Further improvements to SSE/SSE2 support. The entire test suite of the GNU Scientific Library (gsl-1.4) compiled with Intel Icc 7.1-20030307Z "-g -O -xW" now works. I think this gives pretty good coverage o
203. iler. The steps are described in detail in the following sections. 4.1.2 Cache simulation specifics Cachegrind uses a simulation for a machine with a split L1 cache and a unified L2 cache. This configuration is used for all (modern) x86-based machines we are aware of. Old Cyrix CPUs had a unified I and D L1 cache, but they are ancient history now. The more specific characteristics of the simulation are as follows. Write-allocate: when a write miss occurs, the block written to is brought into the D1 cache. Most modern caches have this property. Bit-selection hash function: the line(s) in the cache to which a memory block maps is chosen by the middle bits M..(M+N-1) of the byte address, where line size = 2^M bytes and (cache size / line size) = 2^N. Inclusive L2 cache: the L2 cache replicates all the entries of the L1 cache. This is standard on Pentium chips, but AMD Athlons use an exclusive L2 cache that only holds blocks evicted from L1. Ditto AMD Durons and most modern VIAs. The cache configuration simulated (cache size, associativity and line size) is determined automagically using the CPUID instruction. If you have an old machine that (a) doesn't support the CPUID instruction, or (b) supports it in an early incarnation that doesn't give any cache information, then Cachegrind will fall back to using a default configuration (that of a model 3/4 Athlon). Cachegrind will tell you if this happens. You can manually specify
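The bit-selection hash described above amounts to a shift and a mask. The following sketch follows the formula exactly as stated in the text (line size 2^M bytes, cache size divided by line size equal to 2^N); cache_line_index is a hypothetical name, and this is not Cachegrind's own code.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits M..(M+N-1) of a 32-bit byte address.
   m: log2 of the line size in bytes.
   n: log2 of (cache size / line size). */
uint32_t cache_line_index(uint32_t addr, unsigned m, unsigned n)
{
    assert(m < 32 && n < 32);
    /* Drop the M offset-within-line bits, then keep the next N bits. */
    return (addr >> m) & ((UINT32_C(1) << n) - 1u);
}
```

For example, with 64-byte lines (M = 6) and 1024 lines (N = 10), addresses 0x1000 and 0x103F fall in the same line and so map to the same index, while 0x11000, which is exactly 64KB further on, wraps around to that index as well.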
204. in fact do block. This is a known bug which we are looking into. If you can, your best bet (unfortunately) is to avoid using 1.9.5 on a Red Hat 9 system, or on any NPTL-based distribution. If your glibc is 2.3.1 or earlier, you're almost certainly OK. Minor changes in 1.9.5: Added some #errors to valgrind.h to ensure people don't include it accidentally in their sources. This is a change from 1.0.X which was never properly documented. The right thing to include is now memcheck.h. Some people reported problems and strange behaviour when (incorrectly) including valgrind.h in code with 1.9.1 to 1.9.4. This is no longer possible. Add some __extension__ bits and pieces so that gcc configured for valgrind-checking compiles even with -Werror. If you don't understand this, ignore it. Of interest to gcc developers only. Removed a pointless check which caused problems interworking with Clearcase. V would complain about shared objects whose names did not end ".so", and refuse to run. This is now fixed. In fact it was fixed in 1.9.4 but not documented. Fixed a bug causing an assertion failure of "waiters == 1" somewhere in vg_scheduler.c, when running large threaded apps, notably MySQL. Add support for the munlock system call (124). Some comments about future releases: 1.9.5 is, we hope, the most stable Valgrind so far. It pretty much supersedes the 1.0.X branch. If you are a valgrind packager, please consider making
205. information. Clearly written subject lines and message bodies are appreciated, too. Finally, remember that, despite the fact that most of the community are very helpful and responsive to emailed questions, you are probably requesting help from unpaid volunteers, so you have no guarantee of receiving an answer. Valgrind Technical Documentation Release 3.2.0 7 June 2006 Copyright 2000-2006 Valgrind Developers Email: valgrind@valgrind.org Table of Contents 1. The Design and Implementation of Valgrind 1.1. Introduction 1.1.1. History 1.1.2. Design overview 1.1.3. Design decisions 1.1.4. Correctness 1.1.5. Current limitations 1.2. The instrumenting JITter 1.2.1. Run-time storage, and the use of host registers 1.2.2. Startup, shutdown, and system calls 1.2.3. Introduction to UCode 1.2.4. UCode operand tags: type Tag 1.2.5. UCode instructions: type UInstr 1.2
206. ing, the core dumps do not include all the floating point register information. If Valgrind itself crashes (hopefully not) the operating system will create a core dump in the usual way. 2.10 Function wrapping Valgrind versions 3.2.0 and above can do function wrapping on all supported targets. In function wrapping, calls to some specified function are intercepted and rerouted to a different, user-supplied function. This can do whatever it likes, typically examining the arguments, calling onwards to the original, and possibly examining the result. Any number of different functions may be wrapped. Function wrapping is useful for instrumenting an API in some way. For example, wrapping functions in the POSIX pthreads API makes it possible to notify Valgrind of thread status changes, and wrapping functions in the MPI (message-passing) API allows notifying Valgrind of memory status changes associated with message arrival/departure. Such information is usually passed to Valgrind by using client requests in the wrapper functions, although that is not of relevance here. 2.10.1 A Simple Example Supposing we want to wrap some function: int foo ( int x, int y ) { return x + y; } A wrapper is a function of identical type, but with a special name which identifies it as the wrapper for foo. Wrappers need to include supporting macros from valgrind.h. Here is a simple wrapper which prints the arguments and return value:
207. ion size (0/1/2/4 byte) and emits an error if any of them indicate undefinedness. This is the only uopcode capable of doing such tests. SETV, whose parameters are also TempReg and a size, makes the V bits in the TempReg indicate definedness, at the specified operation size. This is usually used to generate the correct V bits for a literal value, which is of course fully defined. GETVF and PUTVF are analogues to GETF and PUTF. They move the single V bit used to model definedness of %EFLAGS between its home in VG_(baseBlock) and the specified TempReg. TAG1 denotes one of a family of unary operations on TempRegs containing V bits. Similarly, TAG2 denotes one in a family of binary operations on V bits. These 10 uopcodes are sufficient to express Valgrind's entire definedness-checking semantics. In fact most of the interesting magic is done by the TAG1 and TAG2 suboperations. First, however, I need to explain about V-vector operation sizes. There are 4 sizes: 1, 2 and 4, which operate on groups of 8, 16 and 32 V bits at a time, supporting the usual 1, 2 and 4 byte x86 operations. However there is also the mysterious size 0, which really means a single V bit. Single V bits are used in various circumstances; in particular, the definedness of %EFLAGS is modelled with a single V bit. Now might be a good time to also point out that for V bits, 1 means "undefined" and 0 means "defined". Similarly, for A bits, 1 means "invalid add
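The operation sizes and the "1 means undefined" convention described above can be illustrated with a small check in the spirit of the definedness test. This is a sketch under stated assumptions, not Valgrind's implementation: vbits_mask and any_undefined are hypothetical names, and the V bits of a TempReg are assumed to sit in the low bits of a 32-bit word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask selecting the V bits that matter for an operation size.
   Size 0 is the special single-V-bit case; 1, 2 and 4 cover
   8, 16 and 32 V bits respectively. */
uint32_t vbits_mask(int size)
{
    switch (size) {
    case 0:  return 0x1u;
    case 1:  return 0xFFu;
    case 2:  return 0xFFFFu;
    case 4:  return 0xFFFFFFFFu;
    default: assert(!"unsupported operation size"); return 0;
    }
}

/* True if any V bit covered by the operation size is 1 (undefined). */
bool any_undefined(uint32_t vbits, int size)
{
    return (vbits & vbits_mask(size)) != 0;
}
```

A fully defined literal therefore has all-zero V bits, and a value whose only undefined bit is bit 8 fails a 2-byte check but passes a 1-byte one.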
208. ion is identified by a file_name:function_name pair. If a column contains only a dot, it means the function never performs that event (eg. the third row shows that strcmp() contains no instructions that write to memory). The name ??? is used if the file name and/or function name could not be determined from debugging information. If most of the entries have the form ???:??? the program probably wasn't compiled with -g. If any code was invalidated (either due to self-modifying code or unloading of shared objects) its counts are aggregated into a single cost centre written as (discarded):(discarded). It is worth noting that functions will come from three types of source files: 1. From the profiled program (concord.c in this example). 2. From libraries (eg. getc.c). 3. From Valgrind's implementation of some libc functions (eg. vg_clientmalloc.c:malloc). These are recognisable because the filename begins with vg_, and is probably one of vg_main.c, vg_clientmalloc.c or vg_mylibc.c. There are two ways to annotate source files: by choosing them manually, or with the --auto=yes option. To do it manually, just specify the filenames as arguments to cg_annotate. For example, the output from running cg_annotate concord.c for our example produces the same output as above followed by an annotated version of concord.c, a section of which looks like: [User-annotated source listing for concord.c: table of per-line event counts, garbled in extraction]
209. ion titles. M. Delete any section Entitled "Endorsements". Such a section may not be included in the Modified Version. N. Do not retitle any existing section to be Entitled "Endorsements" or to conflict in title with any Invariant Section. O. Preserve any Warranty Disclaimers. If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles. You may add a section Entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties, for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard. You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but yo
210. ions. Nicholas Nethercote, njn@valgrind.org: Nick did the core/tool generalisation, wrote Cachegrind and Massif, and tons of other stuff. Paul Mackerras: Paul did a lot of the initial per-architecture factoring that forms the basis of the 3.0 line and is also to be seen in 2.4.0. He also did UCode-based dynamic translation support for PowerPC and created a set of ppc-linux derivatives of the 2.X release line. Dirk Mueller, dmuell@gmx.net: Dirk contributed the malloc-free mismatch checking stuff and various other bits and pieces, and acted as our KDE liaison. Donna Robinson, donna@terpsichore.ws: Keeper of the very excellent http://www.valgrind.org. Julian Seward, julian@valgrind.org: Julian was the original designer and author of Valgrind, created the dynamic translation framework, wrote Memcheck and Addrcheck, and did lots of other things. Robert Walsh, rjwalsh@valgrind.org: Robert added file descriptor leakage checking, new library interception machinery, support for client allocation pools, and minor other tweakage. Frederic Gobry helped with autoconf and automake. Daniel Berlin modified readelf's dwarf2 source line reader, written by Nick Clifton, for use in Valgrind. Michael Matz and Simon Hausmann modified the GNU binutils demangler(s) for use in Valgrind. And lots and lots of other people sent bug reports, patches, and very helpful feedback. 2. AUTHORS Cerion Armour Brow
211. ions for large programs and slowing them down more. 2.5 Handling basic block retranslations The above description ignores one complication. Valgrind has a limited size cache for basic block translations; if it fills up, old translations are discarded. If a discarded basic block is executed again, it must be re-translated. However, we can't use this approach for profiling: we can't throw away cost centres for instructions in the middle of execution! So when a basic block is translated, we first look for its cost centre array in the hash table. If there is no cost centre array, it must be the first translation, so we proceed as described above. But if there is a cost centre array already, it must be a retranslation. In this case, we skip the cost centre allocation and initialisation steps, but still do the UCode instrumentation step. 2.6 The cache simulation The cache simulation is fairly straightforward. It just tracks which memory blocks are in the cache at the moment (it doesn't track the contents, since that is irrelevant). The interface to the simulation is quite clean. The functions called from the UCode contain calls to the simulation functions in the files vg_cachesim_{I1,D1,L2}.c; these calls are inlined so that only one function call is done per simulated x86 instruction. The file vg_cachesim.c simply #includes the three files containing the simulation, which makes plugging in new cache simulations
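The retranslation rule in section 2.5 above (look up the block's cost centre array first, and allocate one only on the first translation) can be sketched as a lookup-or-create on a chained hash table. Everything here is hypothetical illustration: the CostCentre fields, the table size and the function name get_cc_array are assumptions, not Cachegrind's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define N_BUCKETS 1024u

/* Hypothetical per-instruction cost centre. */
typedef struct { uint64_t accesses, misses; } CostCentre;

/* One hash-table entry per original basic block address. */
typedef struct BBEntry {
    uint32_t        bb_addr;
    size_t          n_instrs;
    CostCentre     *ccs;       /* one cost centre per instruction */
    struct BBEntry *next;      /* chaining for bucket collisions */
} BBEntry;

static BBEntry *table[N_BUCKETS];

/* Return the cost centre array for a basic block. On a retranslation the
   existing array, with its accumulated counts, is returned unchanged. */
CostCentre *get_cc_array(uint32_t bb_addr, size_t n_instrs,
                         bool *is_retranslation)
{
    assert(n_instrs > 0 && is_retranslation != NULL);
    size_t h = bb_addr % N_BUCKETS;

    for (BBEntry *e = table[h]; e != NULL; e = e->next) {
        if (e->bb_addr == bb_addr) {
            *is_retranslation = true;   /* skip allocation and init */
            return e->ccs;
        }
    }

    /* First translation: allocate zero-initialised cost centres. */
    BBEntry    *e   = malloc(sizeof *e);
    CostCentre *ccs = calloc(n_instrs, sizeof *ccs);
    if (e == NULL || ccs == NULL)
        abort();
    e->bb_addr  = bb_addr;
    e->n_instrs = n_instrs;
    e->ccs      = ccs;
    e->next     = table[h];
    table[h]    = e;
    *is_retranslation = false;
    return ccs;
}
```

In either case the caller would still go on to instrument the UCode; only the allocation and initialisation steps depend on the flag.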
212. ions processed later override those processed earlier; for example, options in ./.valgrindrc will take precedence over those in ~/.valgrindrc. The first two are particularly useful for setting the default tool to use. Any tool-specific options put in $VALGRIND_OPTS or the .valgrindrc files should be prefixed with the tool name and a colon. For example, if you want Memcheck to always do leak checking, you can put the following entry in ~/.valgrindrc: --memcheck:leak-check=yes. This will be ignored if any tool other than Memcheck is run. Without the memcheck: part, this will cause problems if you select other tools that don't understand --leak-check=yes. 2.7 The Client Request mechanism. Valgrind has a trapdoor mechanism via which the client program can pass all manner of requests and queries to Valgrind and the current tool. Internally, this is used extensively to make malloc, free, etc., work, although you don't see that. For your convenience, a subset of these so-called client requests is provided to allow you to tell Valgrind facts about the behaviour of your program, and also to make queries. In particular, your program can tell Valgrind about changes in memory range permissions that Valgrind would not otherwise know about, and so allows clients to get Valgrind to do arbitrary custom checks. Clients need to include a header file to make this work. Which header file depends on which client requests you use. Some client requests a
213. ious factors. It also requires admin space for freed blocks, although massif does not count this. --stacks=<yes|no> [default: yes]: When enabled, include stack(s) in the profile. Threaded programs can have multiple stacks. --depth=<number> [default: 3]: Depth of call chains to present in the detailed heap information. Increasing it will give more information, but massif will run the program more slowly, using more memory, and produce a bigger massif.pid.txt or massif.pid.hp file. --alloc-fn=<name>: Specify a function that allocates memory. This is useful for functions that are wrappers to malloc(), which can fill up the context information uselessly (and give very uninformative bands on the graph). Functions specified will be ignored in contexts, i.e. treated as though they were malloc(). This option can be specified multiple times on the command line, to name multiple functions. --format=<text|html> [default: text]: Produce the detailed heap information in text or HTML format. The file suffix used will be either .txt or .html. 7 Helgrind: a data-race detector. To use this tool, you must specify --tool=helgrind on the Valgrind command line. Note: Helgrind does not work in Valgrind 3.1.0. We hope to reinstate it in version 3.2.0. 7.1 Data Races. Helgrind is a valgrind tool for detecting data races in C and C++ programs that use the Pthreads library. It uses the Eraser algorithm described in Eraser: A Dynamic
214. iously issued to you by malloc/calloc/realloc. free/delete/delete[]: you may only pass to these functions a pointer previously issued to you by the corresponding allocation function. Otherwise Valgrind complains. If the pointer is indeed valid, Valgrind marks the entire area it points at as unaddressable and places the block in the freed-blocks queue. The aim is to defer as long as possible reallocation of this block. Until that happens, all attempts to access it will elicit an invalid-address error, as you would hope. Memcheck: a heavyweight memory checker. 3.6 Client Requests. The following client requests are defined in memcheck.h. See memcheck.h for exact details of their arguments. VALGRIND_MAKE_MEM_NOACCESS, VALGRIND_MAKE_MEM_UNDEFINED and VALGRIND_MAKE_MEM_DEFINED: these mark address ranges as completely inaccessible, accessible but containing undefined data, and accessible and containing defined data, respectively. Subsequent errors may have their faulting addresses described in terms of these blocks. Returns a block handle. Returns zero when not run on Valgrind. VALGRIND_MAKE_MEM_DEFINED_IF_ADDRESSABLE: this is just like VALGRIND_MAKE_MEM_DEFINED but only affects those bytes that are already addressable. VALGRIND_DISCARD: at some point you may want Valgrind to stop reporting errors in terms of the blocks defined by the pr
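As an illustration of the client requests listed in snippet 214, here is a minimal sketch of a bump allocator that tells Memcheck which parts of its private buffer are off limits. It assumes the header is installed as `<valgrind/memcheck.h>`, which is an assumption about the install layout. When the header cannot be found, the sketch substitutes no-op macros that evaluate to zero, mirroring the "returns zero when not run on Valgrind" behaviour described above. `Pool`, `pool_init`, `pool_alloc` and `pool_reset` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Use the real client requests when the header is available (assumed path). */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_MEMCHECK_H 1
#  endif
#endif

/* Fallback no-ops so the sketch still builds without Valgrind installed. */
#ifndef HAVE_MEMCHECK_H
#  define VALGRIND_MAKE_MEM_NOACCESS(addr, len)  ((void)(addr), (void)(len), 0)
#  define VALGRIND_MAKE_MEM_UNDEFINED(addr, len) ((void)(addr), (void)(len), 0)
#endif

enum { POOL_SIZE = 256, POOL_ROUND = 16 };

typedef struct {
    union {
        unsigned char bytes[POOL_SIZE];
        long double   force_align;     /* keeps the buffer suitably aligned */
    } mem;
    size_t used;
} Pool;

/* Nothing has been handed out yet: the whole buffer is inaccessible. */
void pool_init(Pool *p)
{
    assert(p != NULL);
    p->used = 0;
    (void)VALGRIND_MAKE_MEM_NOACCESS(p->mem.bytes, POOL_SIZE);
}

/* Hand out n bytes: accessible, but holding undefined data. */
void *pool_alloc(Pool *p, size_t n)
{
    size_t rounded = (n + (POOL_ROUND - 1)) & ~(size_t)(POOL_ROUND - 1);
    unsigned char *q;

    assert(p != NULL);
    if (rounded < n || rounded > POOL_SIZE - p->used)
        return NULL;                   /* overflow or pool exhausted */

    q = p->mem.bytes + p->used;
    p->used += rounded;
    (void)VALGRIND_MAKE_MEM_UNDEFINED(q, n);
    return q;
}

/* Release everything at once and forbid access to the old chunks. */
void pool_reset(Pool *p)
{
    assert(p != NULL);
    p->used = 0;
    (void)VALGRIND_MAKE_MEM_NOACCESS(p->mem.bytes, POOL_SIZE);
}
```

Only the requested n bytes are marked accessible, so under Memcheck a read just past the end of a chunk would land in the rounding padding and be reported as an invalid access.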
215. irable property. There is some mucking around to do with subregisters; %AL vs %AH, %AX vs %EAX, etc. I can't remember how it works, but in general we are very conservative, and these tend to invalidate the caching. Redundant-PUT elimination: this annuls PUTs of values back to simulated CPU registers if a later PUT would overwrite the earlier PUT value, and there are no intervening reads of the simulated register (ArchReg). As before, we are paranoid when faced with subregister references. Also, PUTs of %ESP are never annulled, because it is vital the instrumenter always has an up-to-date %ESP value available; %ESP changes affect addressability of the memory around the simulated stack pointer. The implication of the above paragraph is that the simulated machine's registers are only lazily updated once the above two optimisation phases have run, with the exception of %ESP. TempRegs go dead at the end of every basic block, from which it is inferrable that any TempReg caching a simulated CPU reg is flushed (back into the relevant VG_(baseBlock) slot) at the end of every basic block. The further implication is that the simulated registers are only up-to-date in between basic blocks, and not at arbitrary points inside basic blocks. And the consequence of that is that we can only deliver signals to the client in between basic blocks. None of this seems any problem in practice. Finally, there is a simple def-use thing for c
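The redundant-PUT elimination described in snippet 215 can be sketched as a single backward pass over one basic block. This is a simplified model, not Valgrind's UCode: the `Insn` type, the opcode and register enums and `eliminate_redundant_puts` are all hypothetical, and subregister handling is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

enum { OP_GET, OP_PUT, OP_NOP };               /* simplified opcodes      */
enum { REG_EAX, REG_EBX, REG_ESP, NUM_REGS };  /* simulated CPU registers */

typedef struct {
    int op;   /* OP_GET reads a simulated register, OP_PUT writes one */
    int reg;  /* which simulated register is involved                 */
} Insn;

/* Walk one basic block backwards and annul every PUT whose value is
   overwritten by a later PUT with no GET of that register in between.
   PUTs of REG_ESP are always kept. Returns the number of PUTs annulled. */
int eliminate_redundant_puts(Insn *code, int n)
{
    /* overwritten[r] != 0: a later PUT to r exists with no GET before it.
       All zeros at the block end, so the last PUT to each register stays. */
    int overwritten[NUM_REGS] = { 0 };
    int removed = 0;
    int i;

    assert(code != NULL || n == 0);

    for (i = n - 1; i >= 0; i--) {
        int r = code[i].reg;
        if (code[i].op == OP_GET) {
            overwritten[r] = 0;        /* the value below this point is read */
        } else if (code[i].op == OP_PUT) {
            if (overwritten[r] && r != REG_ESP) {
                code[i].op = OP_NOP;   /* annul the redundant PUT */
                removed++;
            } else {
                overwritten[r] = 1;
            }
        }
    }
    return removed;
}
```

Starting the pass with every register treated as live matches the statement that simulated registers must be up to date at the end of every basic block.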
216. is is useful when running MPI programs. For further details, see Section 2.3, The Commentary, in the manual. --log-socket=<ip-address:port-number>: Specifies that Valgrind should send all of its messages to the specified port at the specified IP address. The port may be omitted, in which case port 1500 is used. If a connection cannot be made to the specified socket, Valgrind falls back to writing output to the standard error (stderr). This option is intended to be used in conjunction with the valgrind-listener program. For further details, see Section 2.3, The Commentary, in the manual. 2.6.3 Error-related options. These options are used by all tools that can report errors, e.g. Memcheck, but not Cachegrind. --xml=<yes|no> [default: no]: When enabled, output will be in XML format. This is aimed at making life easier for tools that consume Valgrind's output as input, such as GUI front ends. Currently this option only works with Memcheck. --xml-user-comment=<string>: Embeds an extra user comment string at the start of the XML output. Only works when --xml=yes is specified; ignored otherwise. Using and understanding the Valgrind core. --demangle=<yes|no> [default: yes]: Enable/disable automatic demangling (decoding) of C++ names. Enabled by default. When enabled, Valgrind will attempt to translate encoded C++ names back to something approaching the original. The demangler handles symbols mangled by g++ versions 2.X, 3
217. itely lost in loss record 1 of 14 [stack trace garbled in extraction] 88 (8 direct, 80 indirect) bytes in 1 blocks are definitely lost in loss record 13 of 14 [stack trace garbled in extraction]. The first message describes a simple case of a single 8-byte block that has been definitely lost. The second case mentions both direct and indirect leaks. The distinction is that a direct leak is a block which has no pointers to it. An indirect leak is a block which is only pointed to by other leaked blocks. Both kinds of leak are bad. The precise area of memory in which Memcheck searches for pointers is: all naturally aligned machine-word-sized words for which all A bits indicate addressability and all V bits indicate that the stored value is actually valid. 3.4 Writing suppression files. The basic suppression format is described in Suppressing errors. The suppression (2nd) line should have the form Memcheck:suppression_type. The Memcheck suppression types are as follows: Value1, Value2, Value4, Value8, Value16, meaning an uninitialised-value error when using a value of 1, 2, 4, 8 or 16 bytes. Or: Cond (or its old name, Value0), meaning use of an uninitialised CPU condition code. Or: Addr1, Addr2, Addr4
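The direct/indirect distinction drawn in snippet 217 can be modelled on a small graph of heap blocks. The sketch below applies the two definitions literally: a block not reachable from any root is a direct leak if no other block points to it, and an indirect leak otherwise. It is a simplification under stated assumptions and makes no claim to match Memcheck's actual leak checker (for instance in how cycles of leaked blocks are reported). `HeapGraph`, `LeakKind` and `classify_leaks` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

enum { MAX_BLOCKS = 16 };

typedef enum { REACHABLE, DIRECT_LEAK, INDIRECT_LEAK } LeakKind;

typedef struct {
    int n;                                  /* number of heap blocks         */
    int root[MAX_BLOCKS];                   /* pointer found outside heap    */
    int points_to[MAX_BLOCKS][MAX_BLOCKS];  /* [a][b]: block a points at b   */
} HeapGraph;

/* Depth-first marking of everything reachable from block b. */
static void mark_reachable(const HeapGraph *g, int b, int *reached)
{
    int j;
    if (reached[b])
        return;
    reached[b] = 1;
    for (j = 0; j < g->n; j++)
        if (g->points_to[b][j])
            mark_reachable(g, j, reached);
}

/* Fill out[0..n-1] with the classification of each block. */
void classify_leaks(const HeapGraph *g, LeakKind *out)
{
    int reached[MAX_BLOCKS] = { 0 };
    int i, j;

    assert(g != NULL && out != NULL);
    assert(g->n >= 0 && g->n <= MAX_BLOCKS);

    for (i = 0; i < g->n; i++)
        if (g->root[i])
            mark_reachable(g, i, reached);

    for (i = 0; i < g->n; i++) {
        if (reached[i]) {
            out[i] = REACHABLE;
            continue;
        }
        /* Any block pointing at an unreached block is itself unreached,
           so an incoming pointer here always comes from a leaked block. */
        out[i] = DIRECT_LEAK;
        for (j = 0; j < g->n; j++) {
            if (j != i && g->points_to[j][i]) {
                out[i] = INDIRECT_LEAK;
                break;
            }
        }
    }
}
```

In a three-block heap where block 0 is referenced from a root and block 1 points at block 2, this labels block 1 a direct leak and block 2 an indirect one.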
218. iting b manual xml to help you see how it s looking The generated files end up in valgrind docs html Use the following command within valgrind docs 9 make html docs 7 When you have finished also generate pdf and ps output to check all is well from within valgrind docs make print docs Check the output pdf and ps files in valgrind docs print 141 Writing a New Valgrind Tool 4 3 3 Regression Tests Valgrind has some support for regression tests If you want to write regression tests for your tool 1 Make a directory foobar tests Make sure the name of the directory is tests as the build system assumes that any tests for the tool will be in a directory by that name 2 Edit configure in adding foobar tests Makefile to the AC OUTPUT list 3 Write foobar tests Makefile am Use memcheck tests Makefile amas an example 4 Write the tests vgtest test description files stdout exp and stderr exp expected output files Note that Valgrind s output goes to stderr Some details on writing and running tests are given in the comments at the top of the testing script tests vg_regtest 5 Write a filter for stderr results foobar tests filter_stderr It can call the existing filters in tests See memcheck tests filter_stderr for an example in particular note the dir trick that ensures the filter works correctly from any directory 4 3 4 Profiling To profile a tool use Cachegrind on it Read README DEVELOPE
219. itional jump executed count times to the given target position jcnd exe count jumpcount target position Callgrind Conditional jump executed exe count times with jumpcount jumps to the given target position 132 4 Writing a New Valgrind Tool 4 1 Introduction 4 1 1 Supervised Execution Valgrind provides a generic infrastructure for supervising the execution of programs This is done by providing a way to instrument programs in very precise ways making it relatively easy to support activities such as dynamic error detection and profiling Although writing a tool is not easy and requires learning quite a few things about Valgrind it is much easier than instrumenting a program from scratch yourself Nb What follows is slightly out of date 4 1 2 Tools The key idea behind Valgrind s architecture is the division between its core and tools The core provides the common low level infrastructure to support program instrumentation including the JIT compiler low level memory manager signal handling and a scheduler for pthreads It also provides certain services that are useful to some but not all tools such as support for error recording and suppression But the core leaves certain operations undefined which must be filled by tools Most notably tools define how program code should be instrumented They can also call certain functions to indicate to the core that they would like to use certain services or be notif
220. itive noises about it but haven t been able to verify this myself not until I get hold of a copy of 9 A detailed list of changes in no particular order Describe gen suppressions in the FAQ Syscall NR waitpid supported Minor MMX bug fix v prints program s argv at startup More glibc 2 3 suppressions Suppressions for stack underrun bug s in the c support library distributed with Intel Icc 7 0 Fix problems reading proc self maps Fix a couple of messages that should have been suppressed by q but weren t Make Addrcheck understand Overlap suppressions At startup check if program is statically linked and bail out if so Cachegrind Auto detect Intel Pentium M also VIA Nehemiah 175 NEWS Memcheck addrcheck minor speed optimisations Handle syscall __NR_brk more correctly than before Fixed incorrect allocate free mismatch errors when using operator new unsigned std nothrow_t const z operator new unsigned std nothrow_t const amp Support POSIX pthread spinlocks Fixups for clean compilation with gcc 3 3 1 Implemented more opcodes push es push ds pop es pop ds movntq sfence pshufw pavgb ucomiss enter mov imm32 esp all in and out opcodes inc dec esp A whole bunch of SSE SSE2 instructions Memcheck don t bomb on SSE SSE2 code Snapshot 20030725 25 July 2003 Fixes some minor problems in 20030716 Fix bugs in overlap
221. itten in the body of this License 9 The Free Software Foundation may publish revised and or new versions of the General Public License from time to time Such new versions will be similar in spirit to the present version but may differ in detail to address new problems or concerns Each version is given a distinguishing version number If the Program specifies a version number of this License which applies to it and any later version you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation If the Program does not specify a version number of this License you may choose any version ever published by the Free Software Foundation 10 If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different write to the author to ask for permission For software which is copyrighted by the Free Software Foundation write to the Free Software Foundation we sometimes make exceptions for this Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally NO WARRANTY 11 BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE THERE IS NO WARRANTY FOR THE PROGRAM TO THE EXTENT PERMITTED BY APPLICABLE LAW EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND OR OTHER PARTIES PROVIDE THE PROGR
222. k Memory leaks, where pointers to malloc'd blocks are lost forever. Mismatched use of malloc/new/new [] vs free/delete/delete []. Overlapping src and dst pointers in memcpy() and related functions. 3.2 Command-line flags specific to Memcheck. --leak-check=<no|summary|yes|full> [default: summary]: When enabled, search for memory leaks when the client program finishes. A memory leak means a malloc'd block which has not yet been free'd, but to which no pointer can be found. Such a block can never be free'd by the program, since no pointer to it exists. If set to summary, it says how many leaks occurred. If set to full or yes, it gives details of each individual leak. --show-reachable=<yes|no> [default: no]: When disabled, the memory leak detector only shows blocks to which it cannot find any pointer at all, or to which it can only find a pointer into the middle. These blocks are prime candidates for memory leaks. When enabled, the leak detector also reports on blocks which it could find a pointer to. Your program could, at least in principle, have freed such blocks before exit. Contrast this to blocks for which no pointer, or only an interior pointer, could be found: they are more likely to indicate memory leaks, because you do not actually have a pointer to the start of the block which you can hand to free, even if you wanted to. --leak-resolution=<low|med|high> [default: low]: When doing leak
223. k up to the result's sizes as needed. If, by seeing that all the args are got rid of with CLEAR and none with POP, Valgrind sees that the result of the call is not actually used, it immediately examines the result's V bit with a TESTV/SETV pair. If it did not do this, there would be no observation point to detect that some of the args to the helper were undefined. Of course, if the helper's results are indeed used, we don't do this, since the result usage will presumably cause the result definedness to be checked at some suitable future point. In general Valgrind tries to track definedness on a bit-for-bit basis, but as the above para shows, for calls to helpers we throw in the towel and approximate down to a single bit. This is because it's too complex and difficult to track bit-level definedness through complex ops such as integer multiply and divide, and in any case there are no reasonable code fragments which attempt to (eg) multiply two partially-defined values and end up with something meaningful, so there seems little point in modelling multiplies, divides, etc, in that level of detail. Integer loads and stores are instrumented with firstly a test of the definedness of the address, followed by a LOADV or STOREV respectively. These turn into calls to (for example) VG_(helperc_LOADV4). These helpers do two things: they perform an address-valid check, and they load or store V bits
224. l tell you and will leave behind a file named massif.pid.hp, containing the raw heap profiling data. Here's an example graph: [Figure: example Massif graph of bytes against time in ms; legend not recoverable]. The graph is broken into several bands. Most bands represent a single line of your program that does some heap allocation; each such band represents all the allocations and deallocations done from that line. Up to twenty bands are shown; less significant allocation sites are merged into "other" and/or "OTHER" bands. The accompanying text/HTML file produced by Massif has more detail about these heap allocation bands. Then there are single bands for the stack(s) and heap admin bytes. Note: it's the height of a band that's important. Don't let the ups and downs caused by other bands confuse you. For example, the read_alias_file band in the example has the same height all the time it's in existence. The triangles on the x-axis show each point at which a memory census was taken. These aren't necessarily evenly spread; Massif only takes a census when memory is allocated or deallocated. The time on the x-axis is wallclock time, which is not ideal because you can get different graphs for different executions of the same program, due to random OS delays. But it's not too bad, and it becomes less
225. large program like Netscape and after tens of millions of instructions it crashes How can you figure out where in your simulator the bug is Valgrind s answer is cheat Valgrind is designed so that it is possible to switch back to running the client program on the real CPU at any point Using the stop after flag you can ask Valgrind to run just some number of basic blocks and then run the rest of the way on the real CPU If you are searching for a bug in the simulated CPU you can use this to do a binary search which quickly leads you to the specific basic block which is causing the problem This is all very handy It does constrain the design in certain unimportant ways Firstly the layout of memory when viewed from the client s point of view must be identical regardless of whether it is running on the real or simulated CPU This means that Valgrind can t do pointer swizzling well no great loss and it can t run on the same stack as the client again no great loss Valgrind operates on its own stack VG_ stack which it switches to at startup temporarily switching back to the client s stack when doing system calls for the client Valgrind also receives signals on its own stack VG_ sigstack but for different gruesome reasons discussed below This nice clean switch back to the real CPU whenever you like story is muddied by signals Problem is that signals arrive at arbitrary times and tend to slightly perturb t
226. le PPC64/Linux setups you get a dual architecture build so that both 32-bit and 64-bit executables can be run. Linux on POWER5 is supported, and POWER4 is also believed to work. Both 32-bit and 64-bit DWARF2 is supported. This port is known to work well with both gcc-compiled and xlc/xlf-compiled code. Floating point accuracy has been improved for PPC32/Linux. Specifically, the floating point rounding mode is observed on all FP arithmetic operations, and multiply-accumulate instructions are preserved by the compilation pipeline. This means you should get FP results which are bit-for-bit identical to a native run. These improvements are also present in the PPC64/Linux port. Lackey, the example tool, has been improved: It has a new option --detailed-counts (off by default) which causes it to print out a count of loads, stores and ALU operations done, and their sizes. It has a new option --trace-mem (off by default) which causes it to print out a trace of all memory accesses performed by a program. It's a good starting point for building Valgrind tools that need to track memory accesses. Read the comments at the top of the file lackey/lk_main.c for details. The original instrumentation (counting numbers of instructions, jumps, etc) is now controlled by a new option --basic-counts. It is on by default. MPI support: partial support for debugging distributed applications using the MPI library specification has been ad
227. leanup, removing redundant value-check computations. 5. Register allocation (vg_do_register_allocation), which, note, is done on UCode. 6. Emission of final instrumented x86 code (VG_(emit_code)). Notice how steps 2, 3, 4 and 5 are simple UCode-to-UCode transformation passes, all on straight-line blocks of UCode (type UCodeBlock). Steps 2 and 4 are optimisation passes and can be disabled for debugging purposes, with --optimise=no and --cleanup=no respectively. Valgrind can also run in a no-instrumentation mode, given --instrument=no. This is useful for debugging the JITter quickly without having to deal with the complexity of the instrumentation mechanism too. In this mode, steps 3 and 4 are omitted. These flags combine, so that --instrument=no together with --optimise=no means only steps 1, 5 and 6 are used. --single-step=yes causes each x86 instruction to be treated as a single basic block. The translations are terrible but this is sometimes instructive. The --stop-after=N flag switches back to the real CPU after N basic blocks. It also re-JITs the final basic block executed and prints the debugging info resulting, so this gives you a way to get a quick snapshot of how a basic block looks as it passes through the six stages mentioned above. If you want to see full information for every block translated (probably not, but still ...) find, in VG_(translate), the lines dis = True; dis = debugging_translation; and comment out
228. lf exe are supported 165 NEWS BUGS FIXED 88520 88604 88614 88703 88886 89032 89106 89139 89198 89263 89440 89481 89663 89792 90111 90128 90778 90834 91028 91162 91199 91325 91599 91604 91821 91844 92264 92331 92420 92513 92528 93096 93117 93128 93174 93309 93328 93763 93776 93810 94378 94429 94645 94953 95667 96243 96252 96520 96660 96747 96923 pipe fork dup2 kills the main program Valgrind Aborts when using VALGRIND_OPTS and user progra valgrind vg libpthread c 2323 read Assertion read_pt Stabs parser fails to handle ioctl wrappers for TIOCMBIS and TIOCMBIC valgrind pthread_cond_timedwait fails the impossible happened Missing sched_setaffinity amp sched_getaffinity valgrind lacks support for SIOCSPGRP and SIOCGPGRP Missing ioctl translations for scsi generic and CD playing tests deadlock c line endings impossible happened EXEC FAILED valgrind 2 2 0 crash on Redhat 7 2 Report pthread_mutex_lock deadlocks instead of returnin statvfs64 gives invalid error warning crash memory fault with stabs generated by gnat for a run VALGRIND_CHECK_DEFINEDO not as documented in memcheck h cachegrind crashes at end of program without reporting re valgrind vg_memory c 229 vgPlain_unmap_range Assertio valgrind crash while debugging drivel 1 2 1 Unimplemented function Signal routing does not propagate the siginfo structure Assertion cv void 0 rw_
229. lgrind. Valgrind runs in the same namespace as the client, at least from ld.so's point of view, and it therefore absolutely had better not export any symbol with a name which could clash with that of the client or any of its libraries. Therefore, all globally visible symbols exported from valgrind.so are defined using the VG_ CPP macro. As you'll see from vg_constants.h, this prepends some arbitrary prefix to the symbol, in order that it be, we hope, globally unique. Currently the prefix is vgPlain_. For convenience there are also VGM_, VGP_ and VGOFF_. All locally defined symbols are declared static and do not appear in the final shared object. To check this, I periodically do nm valgrind.so | grep " T ", which shows you all the globally exported text symbols. They should all have an approved prefix, except for those like malloc, free, etc, which we deliberately want to shadow and take precedence over the same names exported from glibc.so, so that valgrind can intercept those calls easily. Similarly, nm valgrind.so | grep " D " allows you to find any rogue data-segment symbol names. Valgrind tries, and almost succeeds, in being completely independent of all other shared objects, in particular of glibc.so. For example, we have our own low-level memory manager in vg_malloc2.c, which is a fairly standard malloc/free scheme augmented with arenas, and vg_mylibc.c exports reimplementations of various bits and pieces you'd normally get from the C library
230. lgrind h into your source and add CALLGRIND DUMP STATS when you want a dump to happen Use CALLGRIND ZERO STATS to only Zero cost centers In Valgrind terminology this method is called Client requests The given macros generate a special instruction pattern with no effect at all i e a NOP When run under Valgrind the CPU simulation engine detects the special instruction pattern and triggers special actions like the ones described above 65 Callgrind a heavyweight profiler If you are running a multi threaded application and specify the command line option separate threads yes every thread will be profiled on its own and will create its own profile dump Thus the last two methods will only generate one dump of the currently running thread With the other methods you will get multiple dumps one for each thread on a dump request 5 3 3 Limiting the range of collected events For aggregating events function enter leave instruction execution memory access into event numbers first the events must be recognizable by Callgrind and second the collection state must be switched on Event collection is only possible if instrumentation for program code is switched on This is the default but for faster execution identical to valgrind tool none it can be switched off until the program reaches a state in which you want to start collecting profiling data Callgrind can start without instrumentation by specifying optio
231. ll to disable caching for debugging configure help Print a summary of the options to configure and exit quiet silent q Do not print messages saying which checks are being made To suppress all normal output redirect it to dev null any error messages will still be shown srcdir DIR Look for the package s source code in directory DIR Usually configure can determine that directory automatically version Print the version of Autoconf used to generate the configure script and exit configure also accepts some other not widely useful options 152 4 NEWS Release 3 2 0 7 June 2006 3 2 0 is a feature release with many significant improvements and the usual collection of bug fixes This release supports X86 Linux AMD64 Linux PPC32 Linux and PPC64 Linux Performance especially of Memcheck is improved Addrcheck has been removed Callgrind has been added PPC64 Linux support has been added Lackey has been improved and MPI support has been added In detail Memcheck has improved speed and reduced memory use Run times are typically reduced by 15 30 averaging about 24 for SPEC CPU2000 The other tools have smaller but noticeable speed improvments We are interested to hear what improvements users get Memcheck uses less memory due to the introduction of a compressed representation for shadow memory The space overhead has been reduced by a fa
232. lookup clears orig and sends the NULL value to rw_new Small problems building valgrind with top_builddir ne t signal 11 SIGSEGV at get_tcb libpthread c 86 in corec UNIMPLEMENTED FUNCTION pthread_condattr_setpshared per target flags necessitate AM_PROG_CC_C_O valgrind doesn t compile with linux 2 6 8 1 9 Valgrind 2 2 0 generates some warning messages vg_symtab2 c 170 addLoc Assertion loc gt size gt 0 failed unhandled ioctl Ox4B3A and 0x5601 Tool and core interface versions do not match Can t run valgrind tool memcheck because of unimplement Valgrind can crash if passed bad args to certain syscalls Stack frame in new thread is badly aligned Wrong types used with sys_sigprocmask usr include asm msr h is missing valgrind vg memory c 508 vgPlain find map space Asser fcntl argument checking a bit too strict Assertion tst sigqueue head tst gt sigqueue_tail failed valgrind 2 2 0 segfault with mmap64 in glibc 2 3 3 Impossible happened PINSRW mem valgrind the impossible happened SIGSEGV Valgrind does not work with any KDE app Assertion res 0 failed stage2 loader of valgrind fails to allocate memory All programs crashing at dl start in lib ld 2 3 3 so ioctl CDROMREADTOCENTRY causes bogus warnings After looping in a segfault handler the impossible happens Zero sized arrays crash valgrind trace back with SIGFPE 166 NEWS 96948 valgrind stops with assertion failure regarding mmap2
233. ls are provided callgrind_annotate This command reads in the profile data and prints a sorted lists of functions optionally with annotation For graphical visualization of the data check out KCachegrind callgrind_control This command enables you to interactively observe and control the status of currently running applications without stopping the application You can get statistics information the current stack trace and request zeroing of counters and dumping of profiles data To use Callgrind you must specify tool callgrind on the Valgrind command line or use the supplied script callgrind Callgrind s cache simulation is based on the Cachegrind tool of the Valgrind package Read Cachegrind s documentation first this page describes the features supported in addition to Cachegrind s features 5 2 Purpose 5 2 1 Profiling as part of Application Development With application development a common step is to improve runtime performance To not waste time on optimizing functions which are rarely used one needs to know in which parts of the program most of the time is spent This is done with a technique called profiling The program is run under control of a profiling tool which gives the time distribution of executed functions in the run After examination of the program s profile it should be clear if and where optimization is useful Afterwards one should verify any runtime changes by another profile run 5 2 2
234. ly from distribution of the Program If any portion of this section is held invalid or unenforceable under any particular circumstance the balance of the section is intended to apply and the section as a whole is intended to apply in other 198 The GNU General Public License circumstances It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system it is up to the author donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License 8 If the distribution and or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries so that distribution is permitted only in or among countries not thus excluded In such case this License incorporates the limitation as if wr
235. mentations of Addrcheck and Helgrind; 110652 AMD64 valgrind crashes on cwtd instruction; 110653 AMD64 valgrind crashes on sarb 0x4 foo rip instruction; 110656 PATH usr bin bin valgrind foobar stats fooba; 110657 Small test fixes; 110671 vex x86->IR: unhandled instruction bytes: 0xF3 0xC3 (rep ret); n-i-bz Nick (Cachegrind should not assert when it encounters a client request); 110685 amd64->IR: unhandled instruction bytes: 0xE1 0x56 (loope Jb); 110830 configuring with --host fails to build 32 bit on 64 bit target; 110875 Assertion when execve fails; n-i-bz Updates to Memcheck manual; n-i-bz Fixed broken malloc_usable_size; 110898 opteron instructions missing: btq btsq btrq bsfq; 110954 x86->IR: unhandled instruction bytes: 0xE2 0xF6 (loop Jb); n-i-bz Make suppressions work for lines in stacktraces; 111006 bogus warnings from linuxthreads; 111092 x86: dis_Grp2 Reg: unhandled case (x86); 111231 sctp getladdrs and sctp getpaddrs returns uninitialized memory; 111102 (comment 44) Fixed 64-bit unclean silly arg message; n-i-bz vex x86->IR: unhandled instruction bytes: 0x14 0x0; n-i-bz minor umount/fcntl wrapper fixes; 111090 Internal Error running Massif; 101204 noisy warning; 111513 Illegal opcode for SSE instruction (x86 movups); 111555 VEX/Makefile: CC is set to gcc; n-i-bz Fix XML bugs in FAQ. (3.0.1: 29 August 05, vex/branches/VEX_3_0_BRANCH r1367, valgrind/branches/VALGRIND_3_0_BRANCH r4574.) Release 3.0.0
236. mented function exits; after all, these refer to stack addresses and will make no sense whatever when some other function happens to re-use the same stack address range, probably shortly afterwards. I think I would be inclined to define a special stack-specific macro VALGRIND_MAKE_NOACCESS_STACK(addr, len) which causes Valgrind to record the client's ESP at the time it is executed. Valgrind will then watch for changes in ESP and discard such records as soon as the protected area is uncovered by an increase in ESP. I hesitate with this scheme only because it is potentially expensive, if there are hundreds of such records, and considering that changes in ESP already require expensive messing with stack access permissions. This is probably easier and more robust than for the instrumenter program to try and spot all exit points for the procedure and place suitable deallocation annotations there. Plus C++ procedures can bomb out at any point if they get an exception, so spotting return points at the source level just won't work at all. Although some work, it's all eminently doable, and it would make Valgrind into an even more useful tool. 2 How Cachegrind works 2.1 Cache profiling Note: this document is now very old, and a lot of its contents are out of date and misleading. Valgrind is a very nice platform for doing cache profiling and other kinds of simulation, because it converts horrible x86 instruction
237. n --instr-atstart=no. Instrumentation can be switched on interactively with callgrind_control -i on, and off by specifying "off" instead of "on". Furthermore, instrumentation state can be programmatically changed with the macros CALLGRIND_START_INSTRUMENTATION and CALLGRIND_STOP_INSTRUMENTATION. In addition to enabling instrumentation, you must also enable event collection for the parts of your program you are interested in. By default, event collection is enabled everywhere. You can limit collection to specific function(s) by using --toggle-collect=funcprefix. This will toggle the collection state on entering and leaving the specified functions. When this option is in effect, the default collection state at program start is "off". Only events happening while running inside of functions starting with funcprefix will be collected. Recursive calls of functions with funcprefix do not trigger any action. It is important to note that with instrumentation switched off, the cache simulator cannot see any memory access events, and thus any simulated cache state will be frozen and wrong without instrumentation. Therefore, to get useful cache events (hits/misses) after switching on instrumentation, the cache first must warm up, probably leading to many cold misses which would not have happened in reality. If you do not want to see these, start event collection a few million instructions after you have switched on instrumentation. 5.3.4 Avoidin
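The toggling behaviour described in item 237 can be modelled as a small state machine. The sketch below is a hypothetical illustration only, not Callgrind's implementation: the type ToggleCollect and the functions toggle_init, toggle_enter and toggle_leave are invented names, and it assumes that any nested call to a function matching the prefix counts as a recursive call that triggers no action.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the toggle-collect behaviour; not Callgrind code. */
typedef struct {
    const char *prefix;  /* function-name prefix given to the option */
    int depth;           /* nesting depth of matching functions */
    bool collecting;     /* current collection state */
} ToggleCollect;

/* With the option in effect, collection starts in the "off" state. */
void toggle_init(ToggleCollect *t, const char *prefix) {
    t->prefix = prefix;
    t->depth = 0;
    t->collecting = false;
}

static bool matches(const ToggleCollect *t, const char *fn) {
    return strncmp(fn, t->prefix, strlen(t->prefix)) == 0;
}

/* Toggle only on the outermost entry; nested entries do nothing. */
void toggle_enter(ToggleCollect *t, const char *fn) {
    if (!matches(t, fn)) return;
    if (t->depth++ == 0) t->collecting = !t->collecting;
}

/* Toggle back only when the outermost matching call returns. */
void toggle_leave(ToggleCollect *t, const char *fn) {
    if (!matches(t, fn)) return;
    if (t->depth > 0 && --t->depth == 0) t->collecting = !t->collecting;
}
```

Calling toggle_enter for a non-matching function such as main leaves the state off, while entering a function whose name starts with the prefix switches it on until that outermost call returns.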
238. n you can shorten the context name, as functions in the same group will not appear in sequence in the name. --fn-recursion<number>=<function>: Separate <number> recursions for <function>. See Avoiding cycles. --fn-caller<number>=<function>: Separate <number> callers for <function>. See Avoiding cycles. 5.4.6 Cache simulation options --simulate-cache=<yes|no> [default: no]: Specify if you want to do full cache simulation. By default, only instruction read accesses will be profiled. 6 Massif: a heap profiler To use this tool, you must specify --tool=massif on the Valgrind command line. 6.1 Heap profiling Massif is a heap profiler, i.e. it measures how much heap memory programs use. In particular, it can give you information about: heap blocks; heap administration blocks; stack sizes. Heap profiling is useful to help you reduce the amount of memory your program uses. On modern machines with virtual memory, this provides the following benefits: It can speed up your program; a smaller program will interact better with your machine's caches and avoid paging. If your program uses lots of memory, it will reduce the chance that it exhausts your machine's swap space. Also, there are certain space leaks that aren't detected by traditional leak checkers, such as Memcheck's. That's because the memory isn't ever actually lost (a pointer remains to it), but it's not in use. Programs that have leaks
239. n failure in __pthread_unwind; 86641 memcheck doesn't work with Mesa OpenGL/ATI on Suse 9.1 (also fixes 74298, a duplicate of this); 85947 MMX/SSE unhandled instruction sfence; 84978 Wrong error "Conditional jump or move depends on uninitialised value" resulting from "sbbl %reg, %reg"; 86254 ssort fails when signed int return type from comparison is too small to handle result of unsigned int subtraction; 87089 memalign(4, xxx) makes valgrind assert; 86407 Add support for low-level parallel port driver ioctls; 70587 Add timestamps to Valgrind output (wishlist); 84937 vg_libpthread.c:2505 (se_remap): Assertion res 0 (fixed prior to 2.1.2); 86317 cannot load libSDL-1.2.so.0 using valgrind; 86989 memcpy from mac_replace_strmem.c complains about uninitialized pointers passed when length to copy is zero; 85811 gnu pascal symbol causes segmentation fault; ok in 2.0.0; 79138 writing to sbrk()'d memory causes segfault; 77369 sched deadlock while signal received during pthread_join and the joined thread exited; 88115 In signal handler for SIGFPE, siginfo->si_addr is wrong under Valgrind; 78765 Massif crashes on app exit if FP exceptions are enabled. Additionally there are the following changes, which are not connected to any bug report numbers, AFAICS: Fix scary bug causing mis-identification of SSE stores vs loads and so causing memcheck to sometimes give nonse
240. [...] Valgrind Technical Documentation: 3.1.7 Miscellaneous; 3.2 Reference; 3.2.1 Grammar; 3.2.2 Description of Header Lines; 3.2.3 Description of Body Lines; 4 Writing a New Valgrind Tool; 4.1 Introduction; 4.1.1 Supervised Execution; 4.1.2 Tools; 4.1.3 Execution Spaces; 4.2 Writing a Tool; 4.2.1 Why write a tool?; 4.2.2 Suggested tools; 4.2.3 How tools work; 4.2.4 Getting the code; 4.2.5 [...]; 4.2.6 Writing the code; 4.2.7 Initialisation; 4.2.8 Instrumentation; 4.2.9 Finalisation [...]
241. n worked on PowerPC instruction set support using the Vex dynamic-translation framework. Jeremy Fitzhardinge wrote Helgrind and totally overhauled low-level syscall/signal and address space layout stuff, among many other things. Tom Hughes did a vast number of bug fixes, and helped out with support for more recent Linux/glibc versions. Nicholas Nethercote did the core/tool generalisation, wrote Cachegrind and Massif, and tons of other stuff. Paul Mackerras did a lot of the initial per-architecture factoring that forms the basis of the 3.0 line and is also to be seen in 2.4.0. He also did UCode-based dynamic translation support for PowerPC, and created a set of ppc-linux derivatives of the 2.X release line. Dirk Mueller contributed the malloc/free mismatch checking stuff and other bits and pieces, and acted as our KDE liaison. Julian Seward was the original founder, designer and author, created the dynamic translation frameworks, wrote Memcheck and Addrcheck, and did lots of other things. Robert Walsh added file descriptor leakage checking, new library interception machinery, support for client allocation pools, and minor other tweakage. Josef Weidendorfer wrote Callgrind and the associated KCachegrind GUI. Frederic Gobry helped with autoconf and automake. Daniel Berlin modified readelf's dwarf2 source line reader, written by Nick Clifton, for use in Valgrind. Michael Matz and Simon Hausmann modified the GNU binutils demangler
242. nce on glibc or anything else; 13. The leak detector; 14. Performance problems; 15. Continuous sanity checking; 16. Tracing, or not tracing, child processes; 17. Assembly glue for syscalls. 1.3 Extensions Some comments about Stuff To Do. 1.3.1 Bugs Stephan Kulow and Marc Mutz report problems with kmail in KDE 3 CVS (RC2 ish) when run on Valgrind. Stephan has it deadlocking; Marc has it looping at startup. I can't repro either behaviour. Needs repro-ing and fixing. 1.3.2 Threads Doing a good job of thread support strikes me as almost a research-level problem. The central issues are how to do fast cheap locking of the VG_(primary_map) structure, whether or not accesses to the individual secondary maps need locking, what race-condition issues result, and whether the already-nasty mess that is the signal simulator needs further hackery. I realise that threads are the most-frequently-requested feature, and I am thinking about it all. If you have guru-level understanding of fast mutual exclusion mechanisms and race conditions, I would be interested in hearing from you. 1.3.3 Verification suite Directory tests/ contains various ad-hoc tests for Valgrind. However, there is no systematic verification or regression suite that, for example, exercises all the stuff in vg_memory.c, to ensure that illegal memory accesses and undefined value uses are detected as they should be. It would be good to have such a suite. 1.3.4 Porting to o
243. nconditional and conditional moves of values between TempRegs. ALU operations. Again in RISC-like fashion, these only operate on TempRegs (before reg-alloc) or RealRegs (after reg-alloc). These are: ADD, ADC, AND, OR, XOR, SUB, SBB, SHL, SHR, SAR, ROL, ROR, RCL, RCR, NOT, NEG, INC, DEC, BSWAP, CC2VAL and WIDEN. WIDEN does signed or unsigned value widening. CC2VAL is used to convert condition codes into a value, zero or one. The rest are obvious. To allow for more efficient code generation, we bend slightly the restriction at the start of the previous para: for ADD, ADC, XOR, SUB and SBB, we allow the first (source) operand to also be an ArchReg, that is, one of the simulated machine's registers. Also, many of these ALU ops allow the source operand to be a literal. See VG_(saneUInstr) for the final word on the allowable forms of uinstrs. LEA1 and LEA2 are not strictly necessary, but facilitate better translations. They record the fancy x86 addressing modes in a direct way, which allows those amodes to be emitted back into the final instruction stream more or less verbatim. CALLM calls a machine-code helper, one of the methods whose address is stored at some VG_(baseBlock) offset. PUSH and POP move values to/from TempReg to the real (Valgrind's) stack, and CLEAR removes values from the stack. CALLM_S and CALLM_E delimit the boundaries of call setups and clearings, for the benefit of the instrumentation passes. Gettin
244. nd becomes trapped in valgrind.so and the translations it generates. The synthetic CPU provided by Valgrind does, however, return from this initialisation function. So the normal startup actions, orchestrated by the dynamic linker ld.so, continue as usual, except on the synthetic CPU, not the real one. Eventually main is run and returns, and then the finalisation code of the shared objects is run, presumably in inverse order to which they were initialised. Remember, this is still all happening on the simulated CPU. Eventually valgrind.so's own finalisation code is called. It spots this event, shuts down the simulated CPU, prints any error summaries and/or does leak detection, and returns from the initialisation code on the real CPU. At this point, in effect the real and synthetic CPUs have merged back into one, Valgrind has lost control of the program, and the program finally exit()s back to the kernel in the usual way. The normal course of activity, once Valgrind has started up, is as follows. Valgrind never runs any part of your program (usually referred to as the "client"), not a single byte of it, directly. Instead it uses function VG_(translate) to translate basic blocks (BBs, straight-line sequences of code) into instrumented translations, and those are run instead. The translations are stored in the translation cache (TC), vg_tc, with the translation table (TT), vg_tt, supplying the original-to-translation code address mapping. Auxiliary arr
245. nd OpenOffice.org 1.0. I use these as test programs, and I know they fairly thoroughly exercise Valgrind. The command lines to use are: valgrind -v --trace-children=yes --workaround-gcc296-bugs=yes mozilla; valgrind -v --trace-children=yes --workaround-gcc296-bugs=yes soffice. If you find any more hints/tips for packaging, please report them as a bug report. See http://www.valgrind.org for details. GNU Licenses Table of Contents: 1 The GNU General Public License; 2 The GNU Free Documentation License. 1 The GNU General Public License GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can
246. nd to which originals. As mentioned somewhere above, TempRegs carrying values have names like t28, and each one has a shadow carrying its V bits, with names like q28. This pairing aids in reading instrumented ucode. One decision about all this is where to have "observation points", that is, where to check that V bits are valid. I use a minimalistic scheme, only checking where a failure of validity could cause the original program to (seg)fault. So the use of values as memory addresses causes a check, as do conditional jumps (these cause a check on the definedness of the condition codes). And arguments PUSHed for helper calls are checked, hence the weird restrictions on helper call preambles described above. Another decision is that once a value is tested, it is thereafter regarded as defined, so that we do not emit multiple undefined-value errors for the same undefined value. That means that TESTV uinstrs are always followed by SETV on the same (shadow) TempRegs. Most of these SETVs are redundant and are removed by the post-instrumentation cleanup phase. The instrumentation for calling helper functions deserves further comment. The definedness of results from a helper is modelled using just one V bit. So, in short, we do pessimising casts of the definedness of all the args down to a single bit, and then UifU these bits together. So this single V bit will say "undefined" if any part of any arg is undefined. This V bit is then pessimally cast bac
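The helper-call scheme in item 246 (pessimising casts down to one bit, then UifU) can be sketched in a few lines of C. This is an illustrative model, not Valgrind's code. It assumes that a V bit of 1 means "undefined" and 0 means "defined", that UifU is a bitwise OR under that encoding, and that the single bit is finally widened back to a full 32-bit result (the source sentence describing that last step is cut off). All function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: V bit 1 = undefined, 0 = defined. */

/* Pessimising cast down: any undefined bit makes the whole value undefined. */
uint32_t pessimise_to_bit(uint32_t vbits) {
    return vbits != 0 ? 1u : 0u;
}

/* UifU ("undefined if either is undefined") is OR under this encoding. */
uint32_t uifu(uint32_t a, uint32_t b) {
    return a | b;
}

/* Pessimising cast up: one undefined bit taints every bit of the result. */
uint32_t pessimise_from_bit(uint32_t bit) {
    return bit ? 0xFFFFFFFFu : 0u;
}

/* V bits of a helper's result, derived from the V bits of all its args. */
uint32_t helper_result_vbits(const uint32_t *arg_vbits, int nargs) {
    uint32_t bit = 0;
    for (int i = 0; i < nargs; i++)
        bit = uifu(bit, pessimise_to_bit(arg_vbits[i]));
    return pessimise_from_bit(bit);
}
```

The design trades precision for simplicity: a single undefined bit in any argument marks the entire result undefined, which is conservative but needs no knowledge of what the helper computes.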
247. nificant performance overhead if your program generates huge quantities of errors. To avoid serious problems, Valgrind will simply stop collecting errors after 1000 different errors have been seen, or 10000000 errors in total have been seen. In this situation you might as well stop your program and fix it, because Valgrind won't tell you anything else useful after this. Note that the 1000/10000000 limits apply after suppressed errors are removed. These limits are defined in m_errormgr.c and can be increased if necessary. To avoid this cutoff you can use the --error-limit=no flag. Then Valgrind will always show errors, regardless of how many there are. Use this flag carefully, since it may have a bad effect on performance. 2.5 Suppressing errors The error-checking tools detect numerous problems in the base libraries, such as the GNU C library and the XFree86 client libraries, which come pre-installed on your GNU/Linux system. You can't easily fix these, but you don't want to see these errors (and yes, there are many!) So Valgrind reads a list of errors to suppress at startup. A default suppression file is cooked up by the configure script when the system is built. You can modify and add to the suppressions file at your leisure, or, better, write your own. Multiple suppression files are allowed. This is useful if part of your project contains errors you can't or don't want to fix, yet you
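The cutoff rule in item 247 can be made concrete with a small counter. The sketch below is a hypothetical model and not the code in m_errormgr.c: the struct, the constant names and record_error are invented for illustration, and it assumes suppressed errors are simply not counted at all.

```c
#include <stdbool.h>

/* Limits quoted in the text; the names are illustrative only. */
#define MAX_DISTINCT_ERRORS 1000UL
#define MAX_TOTAL_ERRORS    10000000UL

typedef struct {
    unsigned long distinct;  /* different errors seen so far */
    unsigned long total;     /* all unsuppressed errors seen so far */
} ErrorLimiter;

/* Returns true if the error is recorded, false if it is dropped.
   limit_enabled == false models running with the error limit switched off. */
bool record_error(ErrorLimiter *e, bool is_new_kind, bool suppressed,
                  bool limit_enabled) {
    if (suppressed)
        return false;  /* limits apply after suppressed errors are removed */
    if (limit_enabled &&
        (e->distinct >= MAX_DISTINCT_ERRORS || e->total >= MAX_TOTAL_ERRORS))
        return false;  /* cutoff reached: stop collecting */
    e->total++;
    if (is_new_kind)
        e->distinct++;
    return true;
}
```

In this model the 1001st distinct error is dropped when the limit is enabled, and recorded when it is disabled.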
248. nitialised value error which occurs anywhere in libX11.so.6.2, when called from anywhere in the same library, when called from anywhere in libXaw.so.7.0. The inexact specification of locations is regrettable, but is about all you can hope for, given that the X11 libraries shipped with Red Hat 7.2 have had their symbol tables removed. Note: since the above two examples did not make it clear, you can freely mix the obj: and fun: styles of description within a single suppression record. 2.6 Command-line flags for the Valgrind core As mentioned above, Valgrind's core accepts a common set of flags. The tools also accept tool-specific flags, which are documented separately for each tool. You invoke Valgrind like this: valgrind [valgrind-options] your-prog [your-prog-options] Valgrind's default settings succeed in giving reasonable behaviour in most cases. We group the available options by rough categories. 2.6.1 Tool-selection option The single most important option. --tool=<name> [default=memcheck]: Run the Valgrind tool called name, e.g. Memcheck, Cachegrind, etc. 2.6.2 Basic Options These options work with all tools. -h, --help: Show help for all options, both for the core and for the selected tool. --help-debug: Same as --help, but also lists debugging options which usually are only of use to Valgrind's developers. --version: Show the version number of the Valg
249. nse results on SSE code. Add support for the POSIX message queue system calls. Fix to allow 32-bit Valgrind to run on AMD64 boxes. Note: this does NOT allow Valgrind to work with 64-bit executables, only with 32-bit executables on an AMD64 box. At configure time, only check whether linux/mii.h can be processed so that we don't generate ugly warnings by trying to compile it. Add support for POSIX clocks and timers. Developer (cvs head) release 2.1.2 (18 July 2004) 2.1.2 contains four months worth of bug fixes and refinements. Although officially a developer release, we believe it to be stable enough for widespread day-to-day use. 2.1.2 is pretty good, so try it first, although there is a chance it won't work. If so then try 2.0.0 and tell us what went wrong. 2.1.2 fixes a lot of problems present in 2.0.0 and is generally a much better product. Relative to 2.1.1, a large number of minor problems with 2.1.1 have been fixed, and so if you use 2.1.1 you should try 2.1.2. Users of the last stable release, 2.0.0, might also want to try this release. The following bugs, and probably many more, have been fixed. These are listed at http://bugs.kde.org. Reporting a bug for valgrind in the http://bugs.kde.org is much more likely to get you a fix than mailing developers directly, so please continue to keep sending bugs there. 76869 Crashes when running any tool under Fedora Core 2 test1. This fixes the problem with returnin
250. nsee, and is addressed as "you". You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law. A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language. A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them. The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none. The "Cover Texts" are certain sh
251. ntation bloat. Summary counts are calculated at the end, rather than during execution. The cachegrind.out output files can contain huge amounts of information; the file format was carefully chosen to minimise file sizes. 2.9 Annotation Annotation is done by cg_annotate. It is a fairly straightforward Perl script that slurps up all the cost centres, and then runs through all the chosen source files, printing out cost centres with them. It too has been carefully optimised. 2.10 Similar work, extensions It would be relatively straightforward to do other simulations and obtain line-by-line information about interesting events. A good example would be branch prediction: all branches could be instrumented to interact with a branch prediction simulator, using very similar techniques to those described above. In particular, cg_annotate would not need to change; the file format is such that it is not specific to the cache simulation, but could be used for any kind of line-by-line information. The only part of cg_annotate that is specific to the cache simulation is the name of the input file (cachegrind.out), although it would be very simple to add an option to control this. 3 Callgrind Format Specification This chapter describes the Callgrind Profile Format, Version 1. A synonymous name is "Calltree Profile Format". These names actually mean the same since Callgrind was previously named Calltree. The format description is meant f
252. ntly includes four tools: a memory error detector, a thread error detector, a cache profiler and a heap profiler. To give you an idea of what Valgrind tools do, when a program is run under the supervision of Memcheck, the memory error detector tool, all reads and writes of memory are checked, and calls to malloc/new/free/delete are intercepted. As a result, Memcheck can detect if your program: Accesses memory it shouldn't (areas not yet allocated, areas that have been freed, areas past the end of heap blocks, inaccessible areas of the stack). Uses uninitialised values in dangerous ways. Leaks memory. Does bad frees of heap blocks (double frees, mismatched frees). Passes overlapping source and destination memory blocks to memcpy and related functions. Problems like these can be difficult to find by other means, often lying undetected for long periods, then causing occasional, difficult-to-diagnose crashes. When one of these errors occurs, you can attach GDB to your program, so you can poke around and see what's going on. Valgrind is closely tied to details of the CPU, operating system and, to a lesser extent, compiler and basic C libraries. This makes it difficult to make it portable. Nonetheless, it is available for the following platforms: x86/Linux, AMD64/Linux and PPC32/Linux. Valgrind is licensed under the GNU General Public License, version 2. Read the file COPYING in the source distribution for d
253. ods are: Dump on program termination. This method is the standard way and doesn't need any special action from your side. Spontaneous, interactive dumping. Use callgrind_control -d [hint [PID/Name]] to request the dumping of profile information of the supervised application with PID or Name. hint is an arbitrary string you can optionally specify to later be able to distinguish profile dumps. The control program will not terminate before the dump is completely written. Note that the application must be actively running for detection of the dump command. So, for a GUI application, resize the window, or for a server, send a request. If you are using KCachegrind for browsing of profile information, you can use the toolbar button "Force dump". This will request a dump and trigger a reload after the dump is written. Periodic dumping after execution of a specified number of basic blocks. For this, use the command line option --dump-every-bb=count. Dumping at enter/leave of all functions whose name starts with funcprefix. Use the options --dump-before=funcprefix and --dump-after=funcprefix. To zero cost counters before entering a function, use --zero-before=funcprefix. The prefix method for specifying function names was chosen to ease the use with C++: you don't have to specify full signatures. You can specify these options multiple times for different function prefixes. Program controlled dumping. Put #include <valgrind/cal
254. of a problem the longer a program runs. Massif takes censuses at an appropriate timescale; censuses take place less frequently as the program runs for longer. There is no point having more than 100-200 censuses on a single graph. The graphs give a good overview of where your program's space use comes from, and how that varies over time. The accompanying text/HTML file gives a lot more information about heap use. 6.3 Details of Heap Allocations The text/HTML file contains information to help interpret the heap bands of the graph. It also contains a lot of extra information about heap allocations that you don't see in the graph. Here's part of the information that accompanies the above graph. Heap allocation functions accounted for 50.8% of measured spacetime. Called from: 22.1%: 0x401767D0: _nl_intern_locale_data (in /lib/i686/libc-2.3.2.so); 8.6%: 0x4017C393: read_alias_file (in /lib/i686/libc-2.3.2.so); (several entries omitted) and 6 other insignificant places. The first part shows the total spacetime due to heap allocations, and the places in the program where most memory was allocated. (Nb: if this program had been compiled with -g, actual line numbers would be given.) These places are sorted, from most significant to least, and correspond to the bands seen in the graph. Insignificant sites (accounting for less than 0.5% of total spacetime) are omitted. That alone can be useful, but often isn't enough. What i
255. om the standard input (stdin), which is problematic for programs which close stdin. This option allows you to specify an alternative file descriptor from which to read input. --max-stackframe=<number> [default=2000000]: The maximum size of a stack frame; if the stack pointer moves by more than this amount then Valgrind will assume that the program is switching to a different stack. You may need to use this option if your program has large stack-allocated arrays. Valgrind keeps track of your program's stack pointer. If it changes by more than the threshold amount, Valgrind assumes your program is switching to a different stack, and Memcheck behaves differently than it would for a stack pointer change smaller than the threshold. Usually this heuristic works well. However, if your program allocates large structures on the stack, this heuristic will be fooled, and Memcheck will subsequently report large numbers of invalid stack accesses. This option allows you to change the threshold to a different value. You should only consider use of this flag if Valgrind's debug output directs you to do so. In that case it will tell you the new threshold you should specify. In general, allocating large structures on the stack is a bad idea, because (1) you can easily run out of stack space, especially on systems with limited memory or which expect to support large numbers of threads each with a small stack, and (2) because the error checking performed by Memc
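The stack-switch heuristic in item 255 boils down to comparing the size of a stack pointer move against a threshold. The following is a minimal sketch of that comparison only, not Valgrind's implementation; the function name looks_like_stack_switch is hypothetical, and it assumes the move is measured as an absolute difference in either direction.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the stack pointer moved by more than max_stackframe bytes,
   which the heuristic interprets as a switch to a different stack. */
bool looks_like_stack_switch(uintptr_t old_sp, uintptr_t new_sp,
                             uintptr_t max_stackframe) {
    uintptr_t delta = old_sp > new_sp ? old_sp - new_sp : new_sp - old_sp;
    return delta > max_stackframe;
}
```

With the default threshold of 2000000, a 3000000-byte array allocated on the stack would be misread as a stack switch in this model, which is the failure mode the text warns about.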
256. on summary. Functions are shown that account for more than X% of the primary sort event. If auto-annotating, also affects which files are annotated. Note: thresholds can be set for more than one of the events by appending any events for the --sort option with a colon and a number (no spaces, though). E.g. if you want to see the functions that cover 99% of L2 read misses and 99% of L2 write misses, use this option: --sort=D2mr:99,D2mw:99. --auto=no [default], --auto=yes: When enabled, automatically annotates every file that is mentioned in the function-by-function summary that can be found. Also gives a list of those that couldn't be found. --context=N [default: 8]: Print N lines of context before and after each annotated line. Avoids printing large sections of source files that were not executed. Use a large number (e.g. 10,000) to show all source lines. -I<dir>, --include=<dir> [default: empty string]: Adds a directory to the list in which to search for files. Multiple -I/--include options can be given to add multiple directories. 4.3.1 Warnings There are a couple of situations in which cg_annotate issues warnings. If a source file is more recent than the cachegrind.out.pid file. This is because the information in cachegrind.out.pid is only recorded with line numbers, so if the line numbers change at all in the source (e.g. lines added, deleted, swapped), any annotations will be incorrect. If information is recor
257. ondition codes. If an earlier uinstr writes the condition codes, and the next uinsn along which actually cares about the condition codes writes the same or larger set of them, but does not read any, the earlier uinsn is marked as not writing any condition codes. This saves a lot of redundant cond-code saving and restoring. The effect of these transformations on our short block is rather unexciting, and shown below. On longer basic blocks they can dramatically improve code quality. ar Ss Celece Gar enans El eo cO ia 4 sue 38 celece Gar rename Co rco cO a S e S at 1 annul flag write OSZAP due to later OSZACP Improved code Ds Came EDX tO ERNE 10 Z PUTL tO SEDX 4 LDB GeO y O 5 WIDENL Bs tO 6 PULL t0 SEAX 8 GETL ECX t8 9 LEA2L d e 2 T4 10 LDB e cio 11 MOVB S0207 eile 12 ANDB C2 EON CERO SANI ESTAN CHE OSS 14 Jnzo 0x40435A50 rOSZACP 15 JMPo 0x40435A5B 1.2.8 UCode instrumentation Once you understand the meaning of the instrumentation uinstrs, discussed in detail above, the instrumentation scheme is fairly straightforward. Each uinstr is instrumented in isolation, and the instrumentation uinstrs are placed before the original uinstr. Our running example continues below. I have placed a blank line after every original ucode, to make it easier to see which instrumentation uinstrs correspo
258. one, two or all three levels (I1/D1/L2) of the cache from the command line using the --I1, --D1 and --L2 options. Other noteworthy behaviour: References that straddle two cache lines are treated as follows: If both blocks hit --> counted as one hit. If one block hits, the other misses --> counted as one miss. If both blocks miss --> counted as one miss (not two). Instructions that modify a memory location (e.g. inc and dec) are counted as doing just a read, i.e. a single data reference. This may seem strange, but since the write can never cause a miss (the read guarantees the block is in the cache) it's not very interesting. Thus it measures not the number of times the data cache is accessed, but the number of times a data cache miss could occur. If you are interested in simulating a cache with different properties, it is not particularly hard to write your own cache simulator, or to modify the existing ones in vg_cachesim_I1.c, vg_cachesim_D1.c, vg_cachesim_L2.c and vg_cachesim_gen.c. We'd be interested to hear from anyone who does. 4.2 Profiling programs. To gather cache profiling information about the program ls -l, invoke Cachegrind like this: valgrind --tool=cachegrind ls -l. The program will execute (slowly). Upon completion, summary statistics that look like this will be printed: ==31751== I refs: [count illegible] ==31751== I1 misses: [count illegible] ==31751== L2 misses: [count illegible] 3175
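The rule for references that straddle two cache lines, as described in the Cachegrind fragment above, can be sketched in C. This is only an illustration of the counting rule, not Cachegrind's simulator code: the names `straddles` and `count_straddling_ref` are hypothetical, and the line size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdbool.h>

/* How one data reference ends up being counted. */
typedef enum { REF_HIT, REF_MISS } RefOutcome;

/* Hypothetical helper: does the access [addr, addr + size) cross a
   cache-line boundary? Assumes line_size is a power of two. */
static bool straddles(unsigned long addr, unsigned size, unsigned line_size)
{
    assert(size > 0 && line_size > 0);
    assert((line_size & (line_size - 1)) == 0);
    return (addr / line_size) != ((addr + size - 1) / line_size);
}

/* Counting rule for a straddling reference: one hit only if both
   blocks hit; otherwise exactly one miss, even if both blocks miss. */
static RefOutcome count_straddling_ref(bool first_block_hits,
                                       bool second_block_hits)
{
    return (first_block_hits && second_block_hits) ? REF_HIT : REF_MISS;
}
```

With 64-byte lines, an 8-byte access at address 60 straddles two lines while the same access at address 56 does not; in the straddling case the two per-block lookups are folded into a single counted event.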
259. ons are wrapped with a default wrapper which does nothing except complain or abort if it is called, depending on settings in MPIWRAP_DEBUG listed above. The following functions have "real", do-something-useful wrappers: PMPI_Send PMPI_Bsend PMPI_Ssend PMPI_Rsend PMPI_Recv PMPI_Get_count PMPI_Isend PMPI_Ibsend PMPI_Issend PMPI_Irsend PMPI_Irecv PMPI_Wait PMPI_Waitall PMPI_Test PMPI_Testall PMPI_Iprobe PMPI_Probe PMPI_Cancel PMPI_Sendrecv PMPI_Type_commit PMPI_Type_free PMPI_Bcast PMPI_Gather PMPI_Scatter PMPI_Alltoall PMPI_Reduce PMPI_Allreduce PMPI_Op_create PMPI_Comm_create PMPI_Comm_dup PMPI_Comm_free PMPI_Comm_rank PMPI_Comm_size PMPI_Error_string PMPI_Init PMPI_Initialized PMPI_Finalize. A few functions such as PMPI_Address are listed as HAS_NO_WRAPPER. They have no wrapper at all, as there is nothing worth checking, and giving a no-op wrapper would reduce performance for no reason. Note that the wrapper library can itself generate large numbers of calls to the MPI implementation, especially when walking complex types. The most common functions called are PMPI_Extent, PMPI_Type_get_envelope, PMPI_Type_get_contents and PMPI_Type_free. 2.16.4.2 Types. MPI-1.1 structured types are supported, and walked exactly. The currently supported combiners are MPI_COMBINER_NAMED, MPI_COMBINER_CONTIGUOUS, MPI_COMBINER_VECTOR
260. or the user to be able to understand the file contents; but more important, it is given for authors of measurement or visualization tools to be able to write and read this format. 3.1 Overview. The profile data format is ASCII based. It is written by Callgrind, and it is upward compatible with the format used by Cachegrind (i.e. Cachegrind uses a subset). It can be read by callgrind_annotate and KCachegrind. This chapter gives an overview of format features and examples. For detailed syntax, look at the format reference. 3.1.1 Basic Structure. Each file has a header part of an arbitrary number of lines of the format "key: value". The lines with key "positions" and "events" define the meaning of cost lines in the second part of the file: the value of "positions" is a list of subpositions, and the value of "events" is a list of event type names. Cost lines consist of subpositions followed by 64-bit counters for the events, in the order specified by the "positions" and "events" header lines. The "events" header line is always required, in contrast to the optional line for "positions", which defaults to "line", i.e. a line number of some source file. In addition, the second part of the file contains position specifications of the form "spec=name". "spec" can be e.g. "fn" for a function name or "fl" for a file name. Cost lines are always related to the function/file specifications given directly before. 3.1.2 Simple Example. events: Cycles Instruction
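The basic structure of the profile data format described above (header lines of the form "key: value", position specifications of the form "spec=name", and cost lines made of a subposition followed by counters) can be illustrated with a small C sketch. It is not taken from Callgrind or callgrind_annotate: the function names are hypothetical, and it assumes a single subposition written as an absolute decimal number, so relative or compressed positions are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef enum { LINE_HEADER, LINE_POSITION_SPEC, LINE_COST, LINE_OTHER } LineKind;

/* Classify one line: cost lines start with a digit, "spec=name" lines
   contain '=' before any ':', header lines contain "key:". */
static LineKind classify_line(const char *line)
{
    assert(line != NULL);
    if (isdigit((unsigned char)line[0]))
        return LINE_COST;
    const char *colon = strchr(line, ':');
    const char *equals = strchr(line, '=');
    if (equals != NULL && (colon == NULL || equals < colon))
        return LINE_POSITION_SPEC;
    if (colon != NULL)
        return LINE_HEADER;
    return LINE_OTHER;
}

/* Parse "subposition counter counter ..." into a position and up to
   max_events 64-bit counters. Returns the number of counters read,
   or -1 if the line does not start with a number. */
static int parse_cost_line(const char *line, uint64_t *position,
                           uint64_t *costs, int max_events)
{
    assert(line != NULL && position != NULL && costs != NULL);
    char *end;
    *position = strtoull(line, &end, 10);
    if (end == line)
        return -1;
    int count = 0;
    while (count < max_events) {
        const char *start = end;
        uint64_t value = strtoull(start, &end, 10);
        if (end == start)
            break;              /* no more numbers on this line */
        costs[count++] = value;
    }
    return count;
}
```

A reader built this way would take the number of events from the "events" header line and check that each cost line supplies no more counters than that.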
261. ort passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words. A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not "Transparent" is called "Opaque". Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary
262. ould be determined. In addition to the standard coverage information, such a tool could record extra information that would help a user generate test cases to exercise unexercised paths. For example, for each conditional branch, the tool could record all inputs to the conditional test, and print these out when annotating. Run-time type checking: A nice example of a dynamic checker is given in this paper: Debugging via Run-Time Type Checking. Alexey Loginov, Suan Hsi Yong, Susan Horwitz and Thomas Reps. Proceedings of Fundamental Approaches to Software Engineering, April 2001. Similar is the tool described in this paper: Run-Time Type Checking for Binary Programs. Michael Burrows, Stephen N. Freund, Janet L. Wiener. Proceedings of the 12th International Conference on Compiler Construction (CC 2003), April 2003. This approach can find quite a range of bugs, particularly in C and C++ programs, and could be implemented quite nicely as a Valgrind tool. Ways to speed up this run-time type checking are described in this paper: Reducing the Overhead of Dynamic Analysis. Suan Hsi Yong and Susan Horwitz. Proceedings of Runtime Verification '02, July 2002. Valgrind's client requests could be used to pass information to a tool about which elements need instrumentation and which don't. We would love to hear from anyone who implements these or other tools. 4.2.3 How tools work. Tools must define various functions for instrumenting programs that are c
263. own pthread implementation. Instead, Valgrind is finally capable of running the native thread library, either LinuxThreads or NPTL. This means our libpthread has gone, along with the bugs associated with it. Valgrind now supports the kernel's threading syscalls, and lets you use your standard system libpthread. As a result: There are many fewer system dependencies and strange library-related bugs. There is a small performance improvement, and a large stability improvement. On the downside, Valgrind can no longer report misuses of the POSIX PThreads API. It also means that Helgrind currently does not work. We hope to fix these problems in a future release. Note that running the native thread libraries does not mean Valgrind is able to provide genuine concurrent execution on SMPs. We still impose the restriction that only one thread is running at any given time. There are many other significant changes too: Memcheck is (once again) the default tool. The default stack backtrace is now 12 call frames, rather than 4. Suppressions can have up to 25 call frame matches, rather than 4. Memcheck and Addrcheck use less memory. Under some circumstances, they no longer allocate shadow memory if there are large regions of memory with the same A/V states, such as an mmaped file. The memory leak detector in Memcheck and Addrcheck has been improved. It now report
264. ppen very often because it takes some time for process ids to be recycled. It can be huge: ls -l generates a file of about 350KB. Browsing a few files and web pages with a Konqueror built with full debugging information generates a file of around 15 MB. The .pid suffix on the output file name serves two purposes. Firstly, it means you don't have to rename old log files that you don't want to overwrite. Secondly, and more importantly, it allows correct profiling with the --trace-children=yes option of programs that spawn child processes. 4.2.2 Cachegrind options. Manually specifies the I1/D1/L2 cache configuration, where size and line_size are measured in bytes. The three items must be comma-separated, but with no spaces, e.g. valgrind --tool=cachegrind --I1=65535,2,64. You can specify one, two or three of the I1/D1/L2 caches. Any level not manually specified will be simulated using the configuration found in the normal way (via the CPUID instruction for automagic cache configuration, or failing that, via defaults). Cache-simulation specific options are: --I1=<size>,<associativity>,<line size>: Specify the size, associativity and line size of the level 1 instruction cache. --D1=<size>,<associativity>,<line size>: Specify the size, associativity and line size of the level 1 data cache. --L2=<size>,<associativity>,<line size>: Specify the size, associativity and lin
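The "size,associativity,line size" value format described above (three comma-separated numbers with no spaces) can be checked with a short C sketch. This is an assumption-laden illustration rather than Cachegrind's own option parser: `CacheConfig` and `parse_cache_config` are hypothetical names, and the function expects only the text after the "=" sign.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One cache level: total size and line size in bytes, plus associativity. */
typedef struct {
    long size;
    long assoc;
    long line_size;
} CacheConfig;

/* Parse "size,assoc,line_size". Returns 1 on success, 0 if the text
   contains whitespace, has the wrong number of fields, has trailing
   characters, or holds a non-positive value. */
static int parse_cache_config(const char *text, CacheConfig *out)
{
    char extra;
    assert(text != NULL && out != NULL);
    if (strpbrk(text, " \t") != NULL)
        return 0;   /* the items must not be separated by spaces */
    int fields = sscanf(text, "%ld,%ld,%ld%c",
                        &out->size, &out->assoc, &out->line_size, &extra);
    if (fields != 3)
        return 0;   /* too few fields, or trailing characters */
    return out->size > 0 && out->assoc > 0 && out->line_size > 0;
}
```

Called with the value from the example above, "65535,2,64", the sketch fills in a size of 65535 bytes, an associativity of 2 and a line size of 64 bytes; "65535, 2,64" is rejected because of the space.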
265. ppened V. 100833 second call to mremap fails with EINVAL. 101156 vgPlain_find_map_space Assertion addr 1 << 12 1. 101173 Assertion recDepth > 0 && recDepth < 500 failed. 101291 creating threads in a forked process fails. 101313 valgrind causes different behavior when resizing a window. 101423 segfault for c array of floats. 101562 valgrind massif dies on SIGINT even with signal handler. Stable release 2.2.0 (31 August 2004): CHANGES RELATIVE TO 2.0.0. 2.2.0 brings nine months worth of improvements and bug fixes. We believe it to be a worthy successor to 2.0.0. There are literally hundreds of bug fixes and minor improvements. There are also some fairly major user-visible changes: A complete overhaul of handling of system calls and signals, and their interaction with threads. In general, the accuracy of the system call, thread and signal simulations is much improved: Blocking system calls behave exactly as they do when running natively (not on valgrind). That is, if a syscall blocks only the calling thread when running natively, then it behaves the same on valgrind. No more mysterious hangs because V doesn't know that some syscall or other should block only the calling thread. Interrupted syscalls should now give more faithful results. Signal contexts in signal handlers are supported. Improvements to NPTL support to the extent that V now works properly on N
266. program is large. 4.2.4 Annotating assembler programs. Valgrind can annotate assembler programs too, or annotate the assembler generated for your C program. Sometimes this is useful for understanding what is really happening when an interesting line of C code is translated into multiple instructions. To do this, you just need to assemble your .s files with assembler-level debug information. gcc doesn't do this, but you can use the GNU assembler with the --gstabs option to generate object files with this information, e.g. as --gstabs foo.s. You can then profile and annotate source files in the same way as for C/C++ programs. 4.3 cg_annotate options. pid: Indicates which cachegrind.out.pid file to read. Not actually an option; it is required. -h, --help; -v, --version: Help and version, as usual. --sort=A,B,C [default: order in cachegrind.out.pid]: Specifies the events upon which the sorting of the function-by-function entries will be based. Useful if you want to concentrate on e.g. I cache misses (--sort=I1mr,I2mr), or D cache misses (--sort=D1mr,D2mr), or L2 misses (--sort=D2mr,I2mr). --show=A,B,C [default: all, using order in cachegrind.out.pid]: Specifies which events to show (and the column order). Default is to use all present in the cachegrind.out.pid file (and use the order in the file). --threshold=X [default: 99%]: Sets the threshold for the function-by-functi
267. ptors on exit. Along with each file descriptor is printed a stack backtrace of where the file was opened and any details relating to the file descriptor, such as the file name or socket details. --time-stamp=<yes|no> [default: no]: When enabled, each message is preceded with an indication of the elapsed wallclock time since startup, expressed as days, hours, minutes, seconds and milliseconds. --log-fd=<number> [default: 2, stderr]: Specifies that Valgrind should send all of its messages to the specified file descriptor. The default, 2, is the standard error channel (stderr). Note that this may interfere with the client's own use of stderr, as Valgrind's output will be interleaved with any output that the client sends to stderr. --log-file=<filename>: Specifies that Valgrind should send all of its messages to the specified file. In fact, the file name used is created by concatenating the text filename, "." and the process ID (i.e. <filename>.<pid>), so as to create a file per process. The specified file name may not be the empty string. --log-file-exactly=<filename>: Just like --log-file, but the suffix ".pid" is not added. If you trace multiple processes with Valgrind when using this option, the log file may get all messed up. --log-file-qualifier=<VAR>: When used in conjunction with --log-file, causes the log file name to be qualified using the contents of the environment variable $VAR. Th
268. r way. However, this currently has to be switched off if the files are to be read by callgrind_annotate. --compress-pos=<no|yes> [default: yes]: This option influences the output format of the profile data. It specifies whether numerical positions are always specified as absolute values or are allowed to be relative to previous numbers. This shrinks the file size. However, this currently has to be switched off if the files are to be read by callgrind_annotate. --combine-dumps=<no|yes> [default: no]: When multiple profile data parts are to be generated, these parts are appended to the same output file if this option is set to "yes". Not recommended. 5.4.3 Activity options. These options specify when actions relating to event counts are to be executed. For interactive control use callgrind_control. --dump-every-bb=<count> [default: 0, never]: Dump profile data every <count> basic blocks. Whether a dump is needed is only checked when Valgrind's internal scheduler is run. Therefore, the minimum useful setting is about 100000. The count is a 64-bit value to make long dump periods possible. --dump-before=<prefix>: Dump when entering a function starting with <prefix>. --zero-before=<prefix>: Zero all costs when entering a function starting with <prefix>. --dump-after=<prefix>: Dump when leaving a function starting with <prefix>. 5.4.4 Data collection options. These option
269. rams; 4.2.1 Output file; 4.2.2 Cachegrind options; 4.2.3 Annotating C/C++ programs; 4.2.4 Annotating assembler programs; 4.3 cg_annotate options; 4.3.1 Warnings; 4.3.2 Things to watch out for; 4.3.3 Accuracy; [entry illegible]; 5 Callgrind: a heavyweight profiler; 5.1 Overview; [entry illegible]; 5.2.1 Profiling as part of Application Development; 5.2.2 Profiling Tools; 5.3 Usage; 5.3.1 Basics; 5.3.2 Multiple profiling dumps from one program run; 5.3.3 Limiting the range of collected events; 5.3.4 Avoiding Cycles; 5.4 Command line option reference; 5.4.1 Miscellaneous options
270. re handled by the core, and are defined in the header file valgrind/valgrind.h. Tool-specific header files are named after the tool, e.g. valgrind/memcheck.h. All header files can be found in the include/valgrind directory of wherever Valgrind was installed. The macros in these header files have the magical property that they generate code in-line which Valgrind can spot. However, the code does nothing when not run on Valgrind, so you are not forced to run your program under Valgrind just because you use the macros in this file. Also, you are not required to link your program with any extra supporting libraries. The code left in your binary has negligible performance impact: on x86, amd64 and ppc32, the overhead is 6 simple integer instructions and is probably undetectable except in tight loops. However, if you really wish to compile out the client requests, you can compile with -DNVALGRIND (analogous to -DNDEBUG's effect on assert()). You are encouraged to copy the valgrind/*.h headers into your project's include directory, so your program doesn't have a compile-time dependency on Valgrind being installed. The Valgrind headers, unlike most of the rest of the code, are under a BSD-style license, so you may include them without worrying about license incompatibility. Here is a brief description of the macros available in valgrind.h, which work with more than one tool (see the tool-specific
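As a hedged illustration of the client request mechanism described above, the following C sketch uses the RUNNING_ON_VALGRIND macro from valgrind/valgrind.h to shorten a workload when the program runs under Valgrind. The function `iterations_for_run` and the factor of 10 are made up for this example, and the fallback definition of the macro is an assumption added here so that the sketch also compiles on machines without the Valgrind headers.

```c
#include <assert.h>

/* Use the real header when the compiler can find it. */
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#  endif
#endif

/* Assumed fallback for builds without the header: behave as if the
   program is never running under Valgrind. */
#ifndef RUNNING_ON_VALGRIND
#  define RUNNING_ON_VALGRIND 0
#endif

/* Hypothetical helper: cut the iteration count by an arbitrary factor
   of 10 when running under Valgrind, since execution is much slower. */
static int iterations_for_run(int native_iterations)
{
    assert(native_iterations >= 0);
    return RUNNING_ON_VALGRIND ? native_iterations / 10 : native_iterations;
}
```

When run natively the macro's in-line code does nothing observable and evaluates to 0, so the full iteration count is used; compiling with -DNVALGRIND removes the client request code altogether.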
271. reads. Some other boring helper addresses: VG_(helper_value_check2_fail) and VG_(helper_value_check1_fail). These are probably never emitted now, and should be removed. The entire state of the simulated FPU, which I believe to be 108 bytes long. Finally, the addresses of various other helper functions in vg_helpers.S, which deal with rare situations which are tedious or difficult to generate code in-line for. As a general rule, the simulated machine's state lives permanently in memory at VG_(baseBlock). However, the JITter does some optimisations which allow the simulated integer registers to be cached in real registers over multiple simulated instructions within the same basic block. These are always flushed back into memory at the end of every basic block, so that the in-memory state is up-to-date between basic blocks. (This flushing is implied by the statement above that the real machine's allocatable registers are dead in between simulated blocks.) 1.2.2 Startup, shutdown, and system calls. Getting into Valgrind (VG_(startup), called from valgrind.so's initialisation section) really means copying the real CPU's state into VG_(baseBlock), then installing our own stack pointer, etc., into the real CPU, and then starting up the JITter. Exiting valgrind involves copying the simulated state back to the real state. Unfortunately, there's a complication at startup time. Problem is that at the point where we need to take a snapsho
272. recision control, it can print a message giving a traceback of where this has happened, and continue execution. This behaviour used to be the default, but the messages are annoying and so showing them is now optional. Use --show-emwarns=yes to see them. The above limitations define precisely the IEEE754 'default' behaviour: default fixup on all exceptions, round-to-nearest operations, and 64-bit precision. As of version 3.0.0, Valgrind has the following limitations in its implementation of x86/AMD64 SSE2 FP arithmetic, relative to IEEE754. Essentially the same: no exceptions, and limited observance of rounding mode. Also, SSE2 has control bits which make it treat denormalised numbers as zero (DAZ) and a related action, flush denormals to zero (FTZ). Both of these cause SSE2 arithmetic to be less accurate than IEEE requires. Valgrind detects, ignores, and can warn about, attempts to enable either mode. As of version 3.2.0, Valgrind has the following limitations in its implementation of PPC32 and PPC64 floating point arithmetic, relative to IEEE754. Scalar (non-Altivec): Valgrind provides a bit-exact emulation of all floating point instructions, except for "fre" and "fres", which are done more precisely than required by the PowerPC architecture specification. All floating point operations observe the current rounding mode. However, fpscr[FPRF] is not set after each operation. That could
273. rements are done in "spacetime", i.e. space (in bytes) multiplied by time (in milliseconds). Note that because Massif slows a program down a lot, the actual spacetime figure is fairly meaningless; it's the relative values that are interesting. Which entries you see in the breakdown depends on the command line options given. The above example measures all the possible parts of memory: Heap: number of words allocated on the heap, via malloc(), new and new[]. Heap admin: each heap block allocated requires some administration data, which lets the allocator track certain things about the block. It is easy to forget about this, and if your program allocates lots of small blocks, it can add up. This value is an estimate of the space required for this administration data. Stack(s): the spacetime used by the program's stack(s). (Threaded programs can have multiple stacks.) This includes signal handler stacks. 6.2.3 Spacetime Graphs. As well as printing summary information, Massif also creates a file representing a spacetime graph, massif.pid.hp. It will produce a file called massif.pid.ps, which can be viewed in a PostScript viewer. Massif uses a program called hp2ps to convert the raw data into the PostScript graph. It's distributed with Massif, but came originally from the Glasgow Haskell Compiler. You shouldn't need to worry about this at all. However, if the graph creation fails for any reason, Massif wil
274. ress and 0 means valid address. This seems counterintuitive (and so it is), but testing against zero on x86s saves instructions compared to testing against all 1s, because many ALU operations set the Z flag for free, so to speak. With that in mind, the tag ops are: (UNARY) Pessimising casts: VgT_PCast40, VgT_PCast20, VgT_PCast10, VgT_PCast01, VgT_PCast02 and VgT_PCast04. A "pessimising cast" takes a V-bit vector at one size, and creates a new one at another size, pessimised in the sense that if any of the bits in the source vector indicate undefinedness, then all the bits in the result indicate undefinedness. In this case the casts are all to or from a single V bit, so for example VgT_PCast40 is a pessimising cast from 32 bits to 1, whereas VgT_PCast04 simply copies the single source V bit into all 32 bit positions in the result. Surprisingly, these ops can all be implemented very efficiently. There are also the pessimising casts VgT_PCast14, from 8 bits to 32, VgT_PCast12, from 8 bits to 16, and VgT_PCast11, from 8 bits to 8. This last one seems nonsensical, but in fact it isn't a no-op because, as mentioned above, any undefined (1) bits in the source infect the entire result. (UNARY) Propagating undefinedness upwards in a word: VgT_Left4, VgT_Left2 and VgT_Left1. These are used to simulate the worst-case effects of carry propagation in adds and subtracts. They return a V vector identical
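The pessimising casts described above can be modelled in a few lines of C, using the convention from the text that a V bit of 1 means undefined and 0 means defined. This is only a sketch of the semantics, not the code Valgrind generates; the lower-case function names are hypothetical stand-ins for the VgT_PCast ops.

```c
#include <assert.h>
#include <stdint.h>

/* Model of VgT_PCast40: 32 V bits down to 1. The result is undefined
   (1) if any source bit is undefined. */
static uint32_t pcast40(uint32_t vbits)
{
    return vbits != 0 ? 1u : 0u;
}

/* Model of VgT_PCast04: copy a single V bit into all 32 positions. */
static uint32_t pcast04(uint32_t vbit)
{
    assert(vbit <= 1u);
    return vbit ? 0xFFFFFFFFu : 0u;
}

/* Model of VgT_PCast14: 8 V bits up to 32. Any undefined source bit
   makes the whole 32-bit result undefined. */
static uint32_t pcast14(uint8_t vbits)
{
    return vbits != 0 ? 0xFFFFFFFFu : 0u;
}

/* Model of VgT_PCast11: 8 bits to 8. Not a no-op, because a single
   undefined bit infects the entire byte. */
static uint8_t pcast11(uint8_t vbits)
{
    return vbits != 0 ? 0xFFu : 0u;
}
```

In this model every cast reduces to one comparison against zero, which is consistent with the remark that the ops can be implemented very efficiently.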
275. rind core. Tools can have their own version numbers. There is a scheme in place to ensure that tools only execute when the core version is one they are known to work with. This was done to minimise the chances of strange problems arising from tool-vs-core version incompatibilities. -q, --quiet: Run silently, and only print error messages. Useful if you are running regression tests or have some other automated test machinery. -v, --verbose: Be more verbose. Gives extra information on various aspects of your program, such as: the shared objects loaded, the suppressions used, the progress of the instrumentation and execution engines, and warnings about unusual behaviour. Repeating the flag increases the verbosity level. -d: Emit information for debugging Valgrind itself. This is usually only of interest to the Valgrind developers. Repeating the flag produces more detailed output. If you want to send us a bug report, a log of the output generated by -v -v -d -d will make your report more useful. --tool=<toolname> [default: memcheck]: Run the Valgrind tool called toolname, e.g. Memcheck, Addrcheck, Cachegrind, etc. --trace-children=<yes|no> [default: no]: When enabled, Valgrind will trace into child processes. This can be confusing and isn't usually what you want, so it is disabled by default. --track-fds=<yes|no> [default: no]: When enabled, Valgrind will print out a list of open file descri
276. rnel. Two kinds: a. System calls: can't be directly observed by either the tool or the core. But the core does have some idea of what happens to the arguments, and it provides hooks for a tool to wrap system calls. b. Other: all other kernel activity (e.g. process scheduling) is totally opaque and irrelevant to the program. It should be noted that a tool only has direct control over code executed in user space. This is the vast majority of code executed, but it is not absolutely all of it, so any profiling information recorded by a tool won't be totally accurate. 4.2 Writing a Tool. 4.2.1 Why write a tool? Before you write a tool, you should have some idea of what it should do. What is it you want to know about your programs of interest? Consider some existing tools: memcheck: among other things, performs fine-grained validity and addressability checks of every memory reference performed by the program. cachegrind: tracks every instruction and memory reference to simulate instruction and data caches, tracking cache accesses and misses that occur on every line in the program. helgrind: tracks every memory access and mutex lock/unlock to determine if a program contains any data races. lackey: does simple counting of various things: the number of calls to a particular function (_dl_runtime_resolve()); the number of basic blocks, guest instructions, VEX instructions executed; the number of branches executed and the propor
277. rocessing the synthetic runtime events does not influence the results. See Usage for more details on the possibilities. 5.3 Usage. 5.3.1 Basics. To start a profile run for a program, execute callgrind [callgrind options] your-program [program options]. While the simulation is running, you can observe execution with callgrind_control -b. This will print out a current backtrace. To annotate the backtrace with event counts, run callgrind_control -e -b. After program termination, a profile data file named callgrind.out.pid is generated, with pid being the process ID of the execution of this profile run. The data file contains information about the calls made in the program among the functions executed, together with events of type "Instruction Read Accesses" (Ir). If you are additionally interested in measuring the cache behaviour of your program, use Callgrind with the option --simulate-cache=yes. This will further slow down the run approximately by a factor of 2. If the program section you want to profile is somewhere in the middle of the run, it is beneficial to fast forward to this section without any profiling at all, and switch it on later. This is achieved by using --instr-atstart=no and interactively using callgrind_control -i on before the interesting code section is about to be executed. If you want to be able to see assembler annotation, specify --dump-instr=yes. This will produce profile data at instruction granularity. No
278. roduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE: TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION. 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is cover
279. rograms will only be able to access 2GB of address space. We will fix this eventually, but not for the moment. Use --enable-pie at configure time to turn this on. Support for programs that use stack-switching has been improved. Use the --max-stackframe flag for simple cases, and the VALGRIND_STACK_REGISTER, VALGRIND_STACK_DEREGISTER and VALGRIND_STACK_CHANGE client requests for trickier cases. Support for programs that use self-modifying code has been improved, in particular programs that put temporary code fragments on the stack. This helps for C programs compiled with GCC that use nested functions, and also Ada programs. This is controlled with the --smc-check flag, although the default setting should work in most cases. Output can now be printed in XML format. This should make it easier for tools such as GUI front-ends and automated error-processing schemes to use Valgrind output as input. The --xml flag controls this. As part of this change, ELF directory information is read from executables, so absolute source file paths are available if needed. Programs that allocate many heap blocks may run faster, due to improvements in certain data structures. Addrcheck is currently not working. We hope to get it working again soon. Helgrind is still not working, as was the case for the 2.4.0 release. The JITter has been completely rewritten, and is now in a separate library, called Vex. This enabled a lot of the user-visible changes, such as
280. roveOR1_TQ. These help out with AND and OR operations. AND and OR have the inconvenient property that the definedness of the result depends on the actual values of the arguments as well as their definedness. At the bit level: 1 AND undefined = undefined, but 0 AND undefined = 0, and similarly 0 OR undefined = undefined, but 1 OR undefined = 1. It turns out that gcc (quite legitimately) generates code which relies on this fact, so we have to model it properly in order to avoid flooding users with spurious value errors. The ultimate definedness result of AND and OR is calculated using UifU on the definedness of the arguments, but we also DifD in some "improvement" terms which take into account the above phenomena. ImproveAND takes as its first argument the actual value of an argument to AND (the T) and the definedness of that argument (the Q), and returns a V-bit vector which is defined (0) for bits which have value 0 and are defined; this, when DifD into the final result, causes those bits to be defined even if the corresponding bit in the other argument is undefined. The ImproveOR ops do the dual thing for OR arguments. Note that XOR does not have this property that one argument can make the other irrelevant, so there is no need for such complexity for XOR. That's all the tag ops. If you stare at this long enough, and then run Valgrind and stare at the pre- and post-instrumented ucode
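The definedness rule for AND and OR described above can be written out as a bit-level C sketch, again with V bit 1 meaning undefined and 0 meaning defined. The function names are hypothetical and the code models only the scheme as the text states it (UifU on the argument definedness, then DifD with the two improvement terms); it is not Valgrind's instrumentation code.

```c
#include <assert.h>
#include <stdint.h>

/* UifU: undefined if either is undefined. DifD: defined if either is defined. */
static uint32_t uifu(uint32_t qa, uint32_t qb) { return qa | qb; }
static uint32_t difd(uint32_t qa, uint32_t qb) { return qa & qb; }

/* Improvement term for AND: defined (0) exactly where the argument
   bit has value 0 and is itself defined. */
static uint32_t improve_and(uint32_t value, uint32_t vbits)
{
    return value | vbits;
}

/* Improvement term for OR: defined (0) exactly where the argument
   bit has value 1 and is itself defined. */
static uint32_t improve_or(uint32_t value, uint32_t vbits)
{
    return ~value | vbits;
}

/* V bits of (a AND b), given the values and V bits of both arguments. */
static uint32_t and_definedness(uint32_t a, uint32_t qa,
                                uint32_t b, uint32_t qb)
{
    uint32_t improvement = difd(improve_and(a, qa), improve_and(b, qb));
    return difd(uifu(qa, qb), improvement);
}

/* V bits of (a OR b), given the values and V bits of both arguments. */
static uint32_t or_definedness(uint32_t a, uint32_t qa,
                               uint32_t b, uint32_t qb)
{
    uint32_t improvement = difd(improve_or(a, qa), improve_or(b, qb));
    return difd(uifu(qa, qb), improvement);
}
```

With a fully defined zero as one argument and a fully undefined word as the other, the AND result comes out fully defined, matching "0 AND undefined = 0"; with a fully defined all-ones argument instead, the result stays undefined, matching "1 AND undefined = undefined".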
281. 5.4.2 Dump creation options; 5.4.3 Activity options; 5.4.4 Data collection options; 5.4.5 Cost entity separation options; 5.4.6 Cache simulation options; 6 Massif: a heap profiler; 6.1 Heap profiling; 6.1.1 Why Use a Heap Profiler?; 6.2 Using Massif; 6.2.1 Overview; 6.2.2 Basic Results of Profiling; 6.2.3 Spacetime Graphs; 6.3 Details of Heap Allocations; 6.3.1 Accuracy; 6.4 Massif Options; 7 Helgrind: a data-race detector; 7.1 Data Races; 7.2 What Helgrind Does; 7.3 Helgrind Options; 8 Nulgrind: the "null" tool; 9 Lackey: a simple profiler and memory tracer
282. rs in the run. When set to the default value (zero), the return value from Valgrind will always be the return value of the process being simulated. When set to a nonzero value, that value is returned instead, if Valgrind detects any errors. This is useful for using Valgrind as part of an automated test suite, since it makes it easy to detect test cases for which Valgrind has reported errors, just by inspecting return codes. --show-below-main=<yes|no> [default: no] By default, stack traces for errors do not show any functions that appear beneath main() (or similar functions such as glibc's __libc_start_main(), if main() is not present in the stack trace); most of the time it's uninteresting C library stuff. If this option is enabled, those entries below main() will be shown. --suppressions=<filename> [default: $PREFIX/lib/valgrind/default.supp] Specifies an extra file from which to read descriptions of errors to suppress. You may use as many extra suppressions files as you like. 17 Using and understanding the Valgrind core --gen-suppressions=<yes|no|all> [default: no] When set to yes, Valgrind will pause after every error shown and print the line: ---- Print suppression ? --- [Return/N/n/Y/y/C/c] ---- The prompt's behaviour is the same as for the --db-attach option (see below). If you choose to, Valgrind will print out a suppression for this error. You can then cut and paste it into a suppression file if
283. s for use in Valgrind. And lots and lots of other people sent bug reports, patches, and very helpful feedback. Thank you all. 3. INSTALL Basic Installation: These are generic installation instructions. The configure shell script attempts to guess correct values for various system-dependent variables used during compilation. It uses those values to create a Makefile in each directory of the package. It may also create one or more .h files containing system-dependent definitions. Finally, it creates a shell script config.status that you can run in the future to recreate the current configuration, a file config.cache that saves the results of its tests to speed up reconfiguring, and a file config.log containing compiler output (useful mainly for debugging configure). If you need to do unusual things to compile the package, please try to figure out how configure could check whether to do them, and mail diffs or instructions to the address given in the README so they can be considered for the next release. If at some point config.cache contains results you don't want to keep, you may remove or edit it. The file configure.in is used to create configure by a program called autoconf. You only need configure.in if you want to change it or regenerate configure using a newer version of autoconf. The simplest way to compile this package is: 1. cd to t
284. s Flops fl=file.f fn=main 15 90 14 2 16 20 12 The above example gives profile information for event types "Cycles", "Instructions", and "Flops". Thus, cost lines give the number of CPU cycles passed by, number of executed instructions, and number of floating point operations executed while running code corresponding to some source position. As there is no line specifying the value of "positions", it defaults to "line", which means that the first number of a cost line is always a line number. Thus, the first cost line specifies that in line 15 of source file "file.f" there is code belonging to function "main". While running, 90 CPU cycles passed by, and 2 of the 14 instructions executed were floating point operations. Similarly, the next line specifies that there were 12 instructions executed in the context of function "main" which can be related to line 16 in file "file.f", taking 20 CPU cycles. If a cost line specifies fewer event counts than given in the "events" line, the rest is assumed to be zero. I.e., there was no floating point instruction executed relating to line 16. Note that regular cost lines always give self (also called exclusive) cost of code at a given position. If you specify multiple cost lines for the same position, these will be summed up. On the other hand, in the example above there is no specification of how many times function "main" actually was called: profile data only contains sums. 124 Callgrind Format Specification
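The rule that missing event counts default to zero can be illustrated with a small parser for a single cost line. This is a minimal sketch, assuming the default "line" positions mode and plain decimal numbers; the function name parse_cost_line and the MAX_EVENTS limit are made up for the example and are not part of Callgrind.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_EVENTS 8   /* arbitrary limit for this sketch */

/* Parse a cost line such as "15 90 14 2": a line number followed by up to
   n_events counts. Counts missing from the line are stored as zero.
   Returns the number of counts actually present, or -1 if the line does
   not start with a number. */
int parse_cost_line(const char *text, int n_events,
                    unsigned long *line_no, unsigned long long costs[])
{
    char *end;
    int seen = 0;

    assert(n_events >= 0 && n_events <= MAX_EVENTS);

    *line_no = strtoul(text, &end, 10);
    if (end == text)
        return -1;                 /* no line number at all */
    text = end;

    for (int i = 0; i < n_events; i++)
        costs[i] = 0;              /* default for omitted counts */

    while (seen < n_events) {
        unsigned long long v = strtoull(text, &end, 10);
        if (end == text)
            break;                 /* no more numbers on this line */
        costs[seen++] = v;
        text = end;
    }
    return seen;
}
```

For the second cost line of the example, "16 20 12" with three declared event types, the sketch reports two counts present and leaves the Flops slot at zero.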
285. s are registers (e.g. btsl %eax, %edx). This should only happen rarely. x86/amd64 FPU instructions with data sizes of 28 and 108 bytes (e.g. fsave) are treated as though they only access 16 bytes. These instructions seem to be rare, so hopefully this won't affect accuracy much. Another thing worth noting is that results are very sensitive. Changing the size of the valgrind.so file, the size of the program being profiled, or even the length of its name can perturb the results. Variations will be small, but don't expect perfectly repeatable results if your program changes at all. While these factors mean you shouldn't trust the results to be super-accurate, hopefully they should be close enough to be useful. 4.3.4 Todo: Program start-up/shut-down calls a lot of functions that aren't interesting and just complicate the output. Would be nice to exclude these somehow. 5 Callgrind: a heavyweight profiler 5.1 Overview: Callgrind is a Valgrind tool for profiling programs. The collected data consists of the number of instructions executed on a run, their relationship to source lines, and the call relationship among functions together with call counts. Optionally, a cache simulator (similar to cachegrind) can produce further information about the memory access behavior of the application. The profile data is written out to a file at program termination. For presentation of the data, and interactive control of the profiling, two command line too
286. s depending on the versions of linux, X and glibc on a system. Suppression types have the form tool_name:suppression_name. The tool_name here is the name you specify for the tool during initialisation with VG_(details_name)(). 4.3.2 Documentation: As of version 3.0.0, Valgrind documentation has been converted to XML. Why? See The XML FAQ. 4.3.2.1 The XML Toolchain: If you are feeling conscientious and want to write some documentation for your tool, please use XML. The Valgrind Docs use the following toolchain and versions: xmllint: using libxml version 20607; xsltproc: using libxml 20607, libxslt 10102 and libexslt 802; pdfxmltex: pdfTeX (Web2C 7.4.5) 3.14159-1.10b; pdftops: version 3.00; DocBook: version 4.2. Latency: you should note that latency is a big problem: DocBook is constantly being updated, but the tools tend to lag behind somewhat. It is important that the versions get on with each other, so if you decide to upgrade something, then you need to ascertain whether things still work nicely; this cannot be assumed. Stylesheets: The Valgrind docs use various custom stylesheet layers, all of which are in valgrind/docs/lib/. You shouldn't need to modify these in any way. 140 Writing a New Valgrind Tool Catalogs: Catalogs provide a mapping from generic addresses to specific local directories on a given machine. Most recent Linux distributions have adopted a common place for storing catalogs (/etc/xml/). Assuming that yo
287. s for details. So the cleaned-up running example looks like this. As above, I have inserted line breaks after every original (non-instrumentation) uinstr to aid readability. As with straightforward ucode optimisation, the results in this block are undramatic because it is so short; longer blocks benefit more because they have more redundancy which gets eliminated. at at at at at at at 298 312 8 41 31 8 25 8 22 8 PISE ES 258 218 AEP S 0 312 8 ISE 34 SO S6 319 SNP 39g 40 42 43 C1 CO 00 10 109 delete UifU1 due to defd argl change ImproveAND1 TQ to MOV due to defd arg2 delete SETV delete MOV delete SETV delete SETV delete SETV GETVL SEDX q0 GET SEDX tO TAG O q0 Left4 q0 INCL to PUTVL q0 EDX PUTL t0 SEDX TESTVL 90 LOADVB t0 q0 LDB ONO TAG1o q0 SWiden14 q0 IDENL_Bs t0 PUTVL q0 SEAX PUTE tO SEAX GETV SECX q8 GE SECX t8 MOVL q0 q4 SHLL 0x1 q4 TAG20 q4 UifU4 q8 q4 TAGlo q4 Left4 q4 LEA2L JL 23719025 4 1 4 TESTVL q4 LOADVB 124 CHO LDB ie Mete MOVB GOZO L2 MOVIL al0 q14 TAG20 ql4 ImproveAND1_TQ t10 q14 TAG20 SO Dial eje clo y MOVL t12 q14 TAG20 SO e Dim eje Glo y MOVL eLO Gils TAG1o cile casco ele PUTVFo qil 6 ANDB t12 t10 wOSZACP
288. s from Memcheck; 3.3.1 Illegal read / Illegal write errors; 3.3.2 Use of uninitialised values; 3.3.3 Illegal frees; 3.3.4 When a block is freed with an inappropriate deallocation function; 3.3.5 Passing system call parameters with inadequate read/write permissions; 3.3.6 Overlapping source and destination blocks; 3.3.7 Memory leak detection; 3.4 Writing suppression files; 3.5 Details of Memcheck's checking machinery; 3.5.1 Valid-value (V) bits; 3.5.2 Valid-address (A) bits; 3.5.3 Putting it all together; 3.6 Client Requests; 4 Cachegrind: a cache profiler; 4.1 Cache profiling; 4.1.1 Overview; 4.1.2 Cache simulation specifics; 4.2 Profiling prog
289. s into nice clean RISC-like UCode. For example, for cache profiling we are interested in instructions that read and write memory; in UCode there are only four instructions that do this: LOAD, STORE, FPU_R and FPU_W. By contrast, because of the x86 addressing modes, almost every instruction can read or write memory. Most of the cache profiling machinery is in the file vg_cachesim.c. These notes are a somewhat haphazard guide to how Valgrind's cache profiling works. 2.2 Cost centres: Valgrind gathers cache profiling about every instruction executed, individually. Each instruction has a cost centre associated with it. There are two kinds of cost centre: one for instructions that don't reference memory (iCC), and one for instructions that do (idCC): typedef struct _CC { ULong a; ULong m1; ULong m2; } CC; typedef struct _iCC { /* word 1 */ UChar tag; UChar instr_size; /* words 2+ */ Addr instr_addr; CC I; } iCC; typedef struct _idCC { /* word 1 */ UChar tag; UChar instr_size; UChar data_size; /* words 2+ */ Addr instr_addr; CC I; CC D; } idCC; Each CC has three fields a, m1, m2 for recording references, level 1 misses and level 2 misses. Each of these is a 64-bit ULong; the numbers can get very large, i.e. greater than the 4.2 billion allowed by a 32-bit unsigned int. 117 How Cachegrind works An iCC has one CC for instruction cache accesses. An idCC has two, one for instruction cache accesses, and one for
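To make the role of the three counters concrete, here is a minimal sketch of how a cost centre might be updated on each simulated access. It is not Cachegrind's code: the CC type is re-declared with standard fixed-width integers in place of ULong, the function cc_record is hypothetical, and the sketch assumes a two-level hierarchy in which a level 2 miss can only follow a level 1 miss.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the CC described above: accesses, L1 misses, L2 misses.
   64-bit counters are used because 32 bits overflow at about 4.2 billion. */
typedef struct {
    uint64_t a;
    uint64_t m1;
    uint64_t m2;
} CC;

/* Record one cache access in a cost centre.
   Assumption: an L2 miss is only possible if L1 missed as well. */
void cc_record(CC *cc, int l1_miss, int l2_miss)
{
    assert(!l2_miss || l1_miss);
    cc->a++;
    if (l1_miss)
        cc->m1++;
    if (l2_miss)
        cc->m2++;
}
```

Under this scheme an instruction that touches memory would update two such counters per execution, one for its instruction fetch and one for its data access.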
290. s more types of memory leak, including leaked cycles. When reporting leaked memory, it can distinguish between directly leaked memory (memory with no references), and indirectly leaked memory (memory only referred to by other leaked memory). Memcheck's confusion over the effect of mprotect() has been fixed: previously mprotect could erroneously mark undefined data as defined. Signal handling is much improved and should be very close to what you get when running natively. One result of this is that Valgrind observes changes to sigcontexts passed to signal handlers. Such modifications will take effect when the signal returns. You will need to run with --single-step=yes to make this useful. Valgrind is built in Position Independent Executable (PIE) format if your toolchain supports it. This allows it to take advantage of all the available address space on systems with 4Gbyte user address spaces. Valgrind can now run itself (requires PIE support). Syscall arguments are now checked for validity. Previously all memory used by syscalls was checked, but now the actual values passed are also checked. Syscall wrappers are more robust against bad addresses being passed to syscalls: they will fail with EFAULT rather than killing Valgrind with SIGSEGV. Because clone() is directly supported, some non-pthread uses of it will work. Partial sharing (where some resources are shared, and some are not) is not supported. open() and readlink() on /proc/se
291. s specify when events are to be aggregated into event counts. Also see Limiting range of event collection. --instr-atstart=<yes|no> [default: yes] Specify if you want Callgrind to start simulation and profiling from the beginning of the program. When set to no, Callgrind will not be able to collect any information, including calls, but it will have at most a slowdown of around 4, which is the minimum Valgrind overhead. Instrumentation can be interactively switched on via callgrind_control -i on. Note that the resulting call graph will most probably not contain main, but will contain all the functions executed after instrumentation was switched on. Instrumentation can also be programmatically switched on/off. See the Callgrind include file <callgrind.h> for the macro you have to use in your source code. For cache simulation, results will be less accurate when switching on instrumentation later in the program run, as the simulator starts with an empty cache at that moment. Switch on event collection later to cope with this error. --collect-atstart=<yes|no> [default: yes] Specify whether event collection is switched on at beginning of the profile run. To only look at parts of your program, you have two possibilities: 1. Zero event counters before entering the program part you want to profile, and dump the event counters to a file after leaving that program part. 2. Switch on/off collection state
292. se name ends in, e.g., "@GLIBC_2.3". Hence we are not sure what its real name is. We also want to cover any soname of the form libpthread.so*. So the header of the wrapper will be: int I_WRAP_SONAME_FNNAME_ZZ(libpthreadZdsoZd0, pthreadZucreateZAZa) ( ... formals ... ) { ... body ... } In order to write unusual characters as valid C function names, a Z-encoding scheme is used. Names are written literally, except that a capital Z acts as an escape character, with the following encoding: Za encodes *; Zp encodes +; Zc encodes :; Zd encodes .; Zu encodes _; Zh encodes -; Zs encodes (space); ZA encodes @; ZZ encodes Z. Hence libpthreadZdsoZd0 is an encoding of the soname libpthread.so.0 and pthreadZucreateZAZa is an encoding of the function name pthread_create@*. The macro I_WRAP_SONAME_FNNAME_ZZ constructs a wrapper name in which both the soname (first component) and function name (second component) are Z-encoded. Encoding the function name can be tiresome and is often unnecessary, so a second macro, I_WRAP_SONAME_FNNAME_ZU, can be used instead. The _ZU variant is also useful for writing wrappers for C++ functions, in which the function name is usually already mangled using some other convention in which Z plays an important role; having to encode a second time quickly becomes confusing. Since the function name field may contain wildcards, it can be anything, including just *. The same is true for the soname. However, some ELF objects
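The Z-encoding described above is mechanical enough to sketch as a small helper. This is an illustrative encoder only, covering the nine escapes listed in the text; the function name z_encode and its buffer-size convention are assumptions for the example, not part of Valgrind.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Z-encode `in` into `out`, which has room for `cap` bytes including the
   terminating NUL. Returns 0 on success, -1 if `out` is too small. */
int z_encode(const char *in, char *out, size_t cap)
{
    /* plain[i] is written as 'Z' followed by escaped[i]. */
    static const char plain[]   = "*+:._- @Z";
    static const char escaped[] = "apcduhsAZ";
    size_t n = 0;

    if (cap == 0)
        return -1;

    for (; *in != '\0'; in++) {
        const char *hit = strchr(plain, *in);
        if (hit != NULL) {
            if (n + 2 >= cap)
                return -1;         /* no room for escape pair plus NUL */
            out[n++] = 'Z';
            out[n++] = escaped[hit - plain];
        } else {
            if (n + 1 >= cap)
                return -1;         /* no room for character plus NUL */
            out[n++] = *in;
        }
    }
    out[n] = '\0';
    return 0;
}
```

Feeding it the soname libpthread.so.0 and the function name pattern pthread_create@* reproduces the two encoded components used in the wrapper header above.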
293. sh the boundaries of the PUSH, POP and CLEAR sequences for the call. PUSH, POP and CLEAR may only appear inside sections bracketed by CALLM_S and CALLM_E, and nowhere else. In any such bracketed section, no two PUSH insns may push the same TempReg. Dually, no two POPs may pop the same TempReg. Finally, although this is not checked, args should be removed from the stack with CLEAR, rather than POPs into a TempReg which is not subsequently used. This is because the instrumentation mechanism assumes that all values POPped from the stack are actually used. Some of the translations may appear to have redundant TempReg-to-TempReg moves. This helps the next phase, UCode optimisation, to generate better code. 1.2.7 UCode optimisation: UCode is then subjected to an improvement pass (vg_improve()), which blurs the boundaries between the translations of the original x86 instructions. It's pretty straightforward. Three transformations are done. Redundant GET elimination: actually, more general than that; eliminates redundant fetches of ArchRegs. In our running example, uinstr 3 GETs %EDX into t2 despite the fact that, by looking at the previous uinstr, it is already in t0. The GET is therefore removed, and t2 renamed to t0. Assuming t0 is allocated to a host register, it means the simulated %EDX will exist in a host CPU register for more than one simulated x86 instruction, which seems to me to be a highly des
294. ssages from Memcheck in the Valgrind User Manual, which has examples of all the error messages Memcheck produces. 5. Caveats: Memcheck is not perfect; it occasionally produces false positives, and there are mechanisms for suppressing these (see Suppressing errors in the Valgrind User Manual). However, it is typically right 99% of the time, so you should be wary of ignoring its error messages. After all, you wouldn't ignore warning messages produced by a compiler, right? The suppression mechanism is also useful if Memcheck is reporting errors in library code that you cannot change; the default suppression set hides a lot of these, but you may come across more. Memcheck also cannot detect every memory error your program has. For example, it can't detect if you overrun the bounds of an array that is allocated statically or on the stack. But it should detect every error that could crash your program (e.g. cause a segmentation fault). 6. More information: Please consult the Valgrind FAQ and the Valgrind User Manual, which have much more information. Note that the other tools in the Valgrind distribution can be invoked with the --tool option. Valgrind User Manual, Release 3.2.0, 7 June 2006. Copyright 2000-2006 Valgrind Developers. Email: valgrind@valgrind.org. Valgrind User Manual, Table of Contents: 1. Introduction; 1.1. An Overview of Valgrind
295. ssential. The "happenings" mostly involve reading/writing of memory. So, let's look at an example of a wrapper for a system call which should be familiar to many Unix programmers. The syscall wrapper for time(): removing the debug printing clutter, it looks like this: PRE(time) { /* time_t time(time_t *t); */ PRINT("time ( %p )", arg1); if (arg1 != (UWord)NULL) { PRE_MEM_WRITE("time", arg1, sizeof(time_t)); } } POST(time) { if (arg1 != (UWord)NULL) { POST_MEM_WRITE(arg1, sizeof(vki_time_t)); } } The first thing we do happens before the syscall occurs, in the PRE() function: if a non-NULL buffer is passed in as the argument, tell the tool that the buffer is about to be written to: if (arg1 != (UWord)NULL) { PRE_MEM_WRITE("time", arg1, sizeof(vki_time_t)); } Finally, the really important bit, after the syscall occurs, in the POST() function: if, and only if, the system call was successful, tell the tool that the memory was written: if (arg1 != (UInt)NULL) { POST_MEM_WRITE(arg1, sizeof(vki_time_t)); } The POST() function won't be called if the syscall failed, so you don't need to worry about checking that in the POST() function. (Note: this is sometimes a bug; some syscalls do return results when they "fail", for example nanosleep returns the amount of unslept time if interrupted. TODO: add another per-syscall flag for this case.) Note that we use the t
296. t running your program and advised you to read this file. The good news is that, in general, it's easy to write the missing syscall or ioctl wrappers you need, so that you can continue your debugging. If you send the resulting patches to me, then you'll be doing a favour to all future Valgrind users too. Note that an ioctl is just a special kind of system call, really; so there's not a lot of need to distinguish them (at least conceptually) in the discussion that follows. All this machinery is in coregrind/m_syswrap. What are syscall/ioctl wrappers? What do they do? Valgrind does what it does, in part, by keeping track of everything your program does. When a system call happens, for example a request to read part of a file, control passes to the Linux kernel, which fulfills the request, and returns control to your program. The problem is that the kernel will often change the status of some part of your program's memory as a result, and tools (instrumentation plug-ins) may need to know about this. Syscall and ioctl wrappers have two jobs: 1. Tell a tool what's about to happen, before the syscall takes place. A tool could perform checks beforehand, e.g. if memory about to be written is actually writeable. This part is useful, but not strictly essential. 2. Tell a tool what just happened, after a syscall takes place. This is so it can update its view of the program's state, e.g. that memory has just been written to. This step is e
297. t I think it might make an interesting week's work for someone. As of 15-ish March 2002, I've started to experiment with this, using the AMD prefetch/prefetchw insns. 1.4.5 User-defined Permission Ranges: This is quite a large project, perhaps a month's hacking for a capable hacker to do a good job, but it's potentially very interesting. The outcome would be that Valgrind could detect a whole class of bugs which it currently cannot. The presentation falls into two pieces. 1.4.5.1 Part 1: User-defined Address-range Permission Setting: Valgrind intercepts the client's malloc, free, etc. calls, watches system calls, and watches the stack pointer move. This is currently the only way it knows about which addresses are valid and which not. Sometimes the client program knows extra information about its memory areas. For example, the client could at some point know that all elements of an array are out-of-date. We would like to be able to convey to Valgrind this information that the array is now addressable-but-uninitialised, so that Valgrind can then warn if elements are used before they get new values. What I would like are some macros like this: VALGRIND_MAKE_NOACCESS(addr, len) VALGRIND_MAKE_WRITABLE(addr, len) VALGRIND_MAKE_READABLE(addr, len) and also, to check that memory is addressible/initialised, VALGRIND_CHECK_ADDRESSIB
298. t like this: 47 Memcheck a heavyweight memory checker struct S { int x; char c; }; struct S s1, s2; s1.x = 42; s1.c = 'z'; s2 = s1; The question to ask is: how large is struct S, in bytes? An int is 4 bytes and a char one byte, so perhaps a struct S occupies 5 bytes? Wrong. All (non-toy) compilers we know of will round the size of struct S up to a whole number of words, in this case 8 bytes. Not doing this forces compilers to generate truly appalling code for subscripting arrays of struct S's. So s1 occupies 8 bytes, yet only 5 of them will be initialised. For the assignment s2 = s1, gcc generates code to copy all 8 bytes wholesale into s2 without regard for their meaning. If Memcheck simply checked values as they came out of memory, it would yelp every time a structure assignment like this happened. So the more complicated semantics described above is necessary. This allows gcc to copy s1 into s2 any way it likes, and a warning will only be emitted if the uninitialised values are later used. 3.5.2 Valid-address (A) bits: Notice that the previous subsection describes how the validity of values is established and maintained without having to say whether the program does or does not have the right to access any particular memory location. We now consider the latter issue. As described above, every bit in memory or in the CPU has an associated valid-value (V) bit. In addition, all bytes in memory, but not in the CPU, have an associated valid
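The padding argument above can be checked directly with sizeof. The following is a small sketch rather than anything taken from Memcheck: the helper names are hypothetical, and the exact sizes depend on the platform ABI (on the common case of a 4-byte int with 4-byte alignment, the struct occupies 8 bytes and 3 of them are padding).

```c
#include <assert.h>
#include <stddef.h>

struct S { int x; char c; };

/* Number of bytes in a struct S that belong to no member. */
size_t padding_in_S(void)
{
    return sizeof(struct S) - (sizeof(int) + sizeof(char));
}

/* The assignment from the text: only x and c are written before the copy,
   so any padding bytes in s1 are never initialised. */
struct S copy_example(void)
{
    struct S s1, s2;
    s1.x = 42;
    s1.c = 'z';
    s2 = s1;   /* may copy padding too; harmless unless the padding is later used */
    return s2;
}
```

Only the named members are read back after the copy, which is why a checker that complained at copy time, rather than at use time, would report an error that never affects the program.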
299. t of the real CPU's state, the offsets in VG_(baseBlock) are not set up yet, because to do so would involve disrupting the real machine's state significantly. The way round this is to dump the real machine's state into a temporary, static block of memory, VG_(m_state_static). We can then set up the VG_(baseBlock) offsets at our leisure, and copy into it from VG_(m_state_static) at some convenient later time. This copying is done by VG_(copy_m_state_static_to_baseBlock)(). On exit, the inverse transformation is (rather unnecessarily) used: stuff in VG_(baseBlock) is copied to VG_(m_state_static), and the assembly stub then copies from VG_(m_state_static) into the real machine registers. Doing system calls on behalf of the client (vg_syscall.S) is something of a half-way house. We have to make the world look sufficiently like that which the client would normally have to make the syscall actually work properly, but we can't afford to lose control. So the trick is to copy all of the client's state, except its program counter, into the real CPU, do the system call, and copy the state back out. Note that the client's state includes its stack pointer register, so one effect of this partial restoration is to cause the system call to be run on the client's stack, as it should be. As ever there are complications. We have to save some of our own state somewhere when restoring the client's state into
300. t slow fallbacks instead. Not engaged by default. Ditto VG_DEBUG_LEAKCHECK. The JITter parses x86 basic blocks into sequences of UCode instructions. It then sanity checks each one with VG_(saneUInstr)() and sanity checks the sequence as a whole with VG_(saneUCodeBlock)(). This stuff is engaged by default, and has caught some way-obscure bugs in the simulated CPU machinery in its time. The system call wrapper does VG_(first_and_last_secondaries_look_plausible)() after every syscall; this is known to pick up bugs in the syscall wrappers. Engaged by default. The main dispatch loop, in VG_(dispatch)(), checks that translations do not set %ebp to any value different from VG_EBP_DISPATCH_CHECKED or &VG_(baseBlock). In effect this test is free, and is permanently engaged. There are a couple of ifdefed-out consistency checks I inserted whilst debugging the new register allocator, vg_do_register_allocation. I try to avoid techniques, algorithms, mechanisms, etc., for which I can supply neither a convincing argument that they are correct, nor sanity-check code which might pick up bugs in my implementation. I don't always succeed in this, but I try. Basically the idea is: avoid techniques which are, in practice, unverifiable, in some sense. When doing anything, always have in mind: "how can I verify that this is correct?" Some more specific things are: The Design and Implementation of Va
301. tack. Informs Valgrind that the previously registered stack with stack id id has changed its start and end values. Use this if your user-level thread package implements stack growth. Note that valgrind.h is included by all the tool-specific header files (such as memcheck.h), so you don't need to include it in your client if you include a tool-specific header. 2.8 Support for Threads: Valgrind supports programs which use POSIX pthreads. Getting this to work was technically challenging, but it all works well enough for significant threaded applications to work. The main thing to point out is that although Valgrind works with the built-in threads system (e.g. NPTL or LinuxThreads), it serialises execution so that only one thread is running at a time. This approach avoids the horrible implementation problems of implementing a truly multiprocessor version of Valgrind, but it does mean that threaded apps run only on one CPU, even if you have a multiprocessor machine. Valgrind schedules your program's threads in a round-robin fashion, with all threads having equal priority. It switches threads every 50000 basic blocks (on x86, typically around 300000 instructions), which means you'll get a much finer interleaving of thread executions than when run natively. This in itself may cause your program to behave differently if you have some kind of concurrency, critical race, locking, or similar, bugs. Your
302. te that the resulting profile data can only be viewed with KCachegrind. For assembler annotation, it also is interesting to see more details of the control flow inside of functions, i.e. (conditional) jumps. This will be collected by further specifying --collect-jumps=yes. 5.3.2 Multiple profiling dumps from one program run: Often, you aren't interested in time characteristics of a full program run, but only of a small part of it (e.g. execution of one algorithm). If there are multiple algorithms or one algorithm running with different input data, it's even useful to get different profile information for multiple parts of one program run. Profile data files have names of the form callgrind.out.pid.part-threadID where pid is the PID of the running program, part is a number incremented on each dump (".part" is skipped for the dump at program termination), and threadID is a thread identification ("-threadID" is only used if you request dumps of individual threads with --separate-threads=yes). There are different ways to generate multiple profile dumps while a program is running under Callgrind's supervision. Nevertheless, all methods trigger the same action, which is "dump all profile information since the last dump or program start, and zero cost counters afterwards". To allow for zeroing cost counters without dumping, there is a second action "zero all cost counters now". The different meth
303. tent. A second possible problem is that of conflicting wrappers. It is easily possible to load two or more wrappers, both of which claim to be wrappers for some third function. In such cases Valgrind will complain about conflicting wrappers when the second one appears, and will honour only the first one. 2.10.4 Debugging: Figuring out what's going on given the dynamic nature of wrapping can be difficult. The --trace-redir=yes flag makes this possible by showing the complete state of the redirection subsystem after every mmap/munmap event affecting code (text). There are two central concepts: A "redirection specification" is a binding of a (soname pattern, fnname pattern) pair to a code address. These bindings are created by writing functions with names made with the I_WRAP_SONAME_FNNAME_{ZZ,_ZU} macros. An "active redirection" is a code-address to code-address binding currently in effect. The state of the wrapping-and-redirection subsystem comprises a set of specifications and a set of active bindings. The specifications are acquired/discarded by watching all mmap/munmap events on code (text) sections. The active binding set is (conceptually) recomputed from the specifications, and all known symbol names, following any change to the specification set. --trace-redir=yes shows the contents of both sets following any such event. -v prints a line of text each time an active specification is used for the first time. Hence for ma
304. that 2000000 in the value of the %esp (stack pointer) register. Warning: client attempted to close Valgrind's logfile fd <number>: Valgrind doesn't allow the client to close the logfile, because you'd never see any diagnostic information after that point. If you see this message, you may want to use the --log-fd=<number> option to specify a different logfile file-descriptor number. Warning: noted but unhandled ioctl <number>: Valgrind observed a call to one of the vast family of ioctl system calls, but did not modify its memory status info (because I have not yet got round to it). The call will still have gone through, but you may get spurious errors after this as a result of the non-update of the memory info. Warning: set address range perms: large range <number>: Diagnostic message, mostly for benefit of the Valgrind developers, to do with memory permissions. 2.16 Debugging MPI Parallel Programs with Valgrind: Valgrind supports debugging of distributed-memory applications which use the MPI message passing standard. This support consists of a library of wrapper functions for the PMPI_* interface. When incorporated into the application's address space, either by direct linking or by LD_PRELOAD, the wrappers intercept calls to PMPI_Send, PMPI_Recv, etc. They then use client requests to inform Valgrind of memory state changes caused by the function being wrapped. This reduces the number of false positives that Mem
305. that both 32-bit and 64-bit executables can be run. Sometimes this cleverness is a problem for a variety of reasons. These two flags allow for single-target builds in this situation. If you issue both, the configure script will complain. Note they are ignored on 32-bit-only platforms (x86-linux, ppc32-linux).

The configure script tests the version of the X server currently indicated by the current $DISPLAY. This is a known bug. The intention was to detect the version of the current XFree86 client libraries, so that correct suppressions could be selected for them, but instead the test checks the server version. This is just plain wrong.

If you are building a binary package of Valgrind for distribution, please read README_PACKAGERS. It contains some important information.

Apart from that, there's not much excitement here. Let us know if you have build problems.

2.12. If You Have Problems

Contact us at http://www.valgrind.org.

See Limitations for the known limitations of Valgrind, and for a list of programs which are known not to work on it.

All parts of the system make heavy use of assertions and internal self-checks. They are permanently enabled, and we have no plans to disable them. If one of them breaks, please mail us!

If you get an assertion failure on the expression blockSane(ch) in VG_(free)() in m_mallocfree.c, this may have happened because your program wrote off the end of a malloc'd block, or before its
306. the second line. This will spew out debugging junk faster than you can possibly imagine.

1.2.4. UCode operand tags: type Tag

UCode is, more or less, a simple two-address RISC-like code. In keeping with the x86 AT&T assembly syntax, generally speaking the first operand is the source operand, and the second is the destination operand, which is modified when the uinstr is notionally executed.

UCode instructions have up to three operand fields, each of which has a corresponding Tag describing it. Possible values for the tag are:

- NoValue: indicates that the field is not in use.
- Lit16: the field contains a 16-bit literal.
- Literal: the field denotes a 32-bit literal, whose value is stored in the lit32 field of the uinstr itself. Since there is only one lit32 for the whole uinstr, only one operand field may contain this tag.
- SpillNo: the field contains a spill slot number, in the range 0 to 23 inclusive, denoting one of the spill slots contained inside VG_(baseBlock). Such tags only exist after register allocation.
- RealReg: the field contains a number in the range 0 to 7 denoting an integer x86 ("real") register on the host. The number is the Intel encoding for integer registers. Such tags only exist after register allocation.
- ArchReg: the field contains a number in the range 0 to 7 denoting an integer x86 register on the simulated CPU. In reality this means a reference to on
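The constraints listed for the operand tags can be written down as a small checker. The following is a minimal sketch, not Valgrind's actual declarations: the struct layout and the names ToyUInstr, operand_ok and uinstr_ok are hypothetical, and only the ranges stated in the text above are encoded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the tags described in the text (names as in the text,
   declaration order is an assumption). */
typedef enum { NoValue, Lit16, Literal, SpillNo, RealReg, ArchReg } Tag;

typedef struct {
    Tag      tag[3];   /* one tag per operand field */
    unsigned val[3];   /* field contents: literal, slot or register number */
    unsigned lit32;    /* the single 32-bit literal of the uinstr */
} ToyUInstr;

/* Check the range stated in the text for one operand field. */
bool operand_ok(Tag tag, unsigned val)
{
    switch (tag) {
    case NoValue: return true;
    case Lit16:   return val <= 0xFFFFu;  /* 16-bit literal */
    case Literal: return true;            /* value lives in lit32 */
    case SpillNo: return val <= 23;       /* spill slots 0..23 */
    case RealReg: return val <= 7;        /* host register 0..7 */
    case ArchReg: return val <= 7;        /* simulated register 0..7 */
    }
    return false;
}

/* A uinstr is well formed if every field is in range and at most one
   field carries the Literal tag, since there is only one lit32. */
bool uinstr_ok(const ToyUInstr *u)
{
    int literals = 0;
    assert(u != NULL);
    for (int i = 0; i < 3; i++) {
        if (!operand_ok(u->tag[i], u->val[i])) return false;
        if (u->tag[i] == Literal) literals++;
    }
    return literals <= 1;
}
```

The sketch does not model the rule that SpillNo and RealReg only appear after register allocation; that would need a phase flag which the text does not describe.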
307. ther platforms

It would be great if Valgrind was ported to FreeBSD and x86 NetBSD, and to x86 OpenBSD, if it's possible (doesn't OpenBSD use a.out-style executables, not ELF?)

The main difficulties, for an x86-ELF platform, seem to be:

- You'd need to rewrite the /proc/self/maps parser (vg_procselfmaps.c). Easy.
- You'd need to rewrite vg_syscall_mem.c, or, more specifically, provide one for your OS. This is tedious, but you can implement syscalls on demand, and the Linux kernel interface is, for the most part, going to look very similar to the *BSD interfaces, so it's really a copy-paste-and-modify-on-demand job. As part of this, you'd need to supply a new vg_kerneliface.h file.
- You'd also need to change the syscall wrappers for Valgrind's internal use, in vg_mylibc.c.

All in all, I think a port to x86-ELF *BSDs is not really very difficult, and in some ways I would like to see it happen, because that would force a more clear factoring of Valgrind into platform-dependent and independent pieces. Not to mention, *BSD folks also deserve to use Valgrind just as much as the Linux crew do.

1.4. Easy stuff which ought to be done

1.4.1. MMX Instructions

MMX insns should be supported, using the same trick as for FPU insns. If the MMX registers are not used to copy uninitialised junk from one place to another in memory, this means we don't have to actually simulate the internal MMX unit state
308. ting a tool, you shouldn't need to look at any of the code in Valgrind's core, although it might be useful sometimes to help understand something.

The pub_tool_*.h files have a reasonable amount of documentation in them that should hopefully be enough to get you going. But ultimately, the tools distributed (Memcheck, Cachegrind, Lackey, etc.) are probably the best documentation of all, for the moment.

Note that the VG_ macro is used heavily. This just prepends a longer string in front of names to avoid potential namespace clashes.

4.2.11. Words of Advice

Writing and debugging tools is not trivial. Here are some suggestions for solving common problems.

4.2.11.1. Segmentation Faults

If you are getting segmentation faults in C functions used by your tool, the usual GDB command

    gdb <prog> core

usually gives the location of the segmentation fault.

4.2.11.2. Debugging C functions

If you want to debug C functions used by your tool, you can achieve this by following these steps:

1. Set VALGRIND_LAUNCHER to <prefix>/bin/valgrind:

    export VALGRIND_LAUNCHER=/usr/local/bin/valgrind

2. Then run gdb <prefix>/lib/valgrind/<platform>/<tool>:

    gdb /usr/local/lib/valgrind/ppc32-linux/lackey

3. Do handle SIGSEGV SIGILL nostop noprint in GDB to prevent GDB from stopping on a SIGSEGV or SIGILL:

    (gdb) handle SIGILL SIGSEGV nostop noprint
309. tion ADC Ev,Gv
109314 Bogus memcheck report on amd64
108883 Crash: vg_memory.c:905 (vgPlain_init_shadow_range): Assertion vgPlain_defined_init_shadow_page() failed
108349 mincore syscall parameter checked incorrectly
108059 build infrastructure: small update
107524 epoll_ctl event parameter checked on EPOLL_CTL_DEL
107123 Vex dies with unhandled instructions: 0xD9 0x31 0xF 0xAE
106841 auxmap & openGL problems
106713 SDL_Init causes valgrind to exit
106352 setcontext and makecontext not handled correctly
106293 addresses beyond initial client stack allocation not checked in VALGRIND_DO_LEAK_CHECK
106283 PIE client programs are loaded at address 0
105831 Assertion vgPlain_defined_init_shadow_page() failed
105039 long run-times probably due to memory manager
104797 valgrind needs to be aware of BLKGETSIZE64
103594 unhandled instruction: FICOM
103320 Valgrind 2.4.0 fails to compile with gcc 3.4.3 and -O0
103168 potentially memory leak in coregrind/ume.c
102039 bad permissions for mapped region at address 0xB7C73680
101881 weird assertion problem
101543 Support fadvise64 syscalls
75247 x86_64/amd64 support (the biggest "bug" we have ever fixed)

(3.0RC1: 27 July 05, vex r1303, valgrind r4283).
(3.0.0: 3 August 05, vex r1313, valgrind r4316).

Stable release 2.4.0 (March 2005) -- CHANGES RELATIVE TO 2.2.0

2.4.0 brings many significant changes and bug fixes. The most significant user-visible change is that we no longer supply our
310. tion fails. Others have other uses.

Second, various "needs" can be set for a tool, using the functions VG_(needs_*)(). They are mostly booleans, and can be left untouched (they default to False). They determine whether a tool can do various things such as: record, report and suppress errors; process command line options; wrap system calls; record extra information about malloc'd blocks, etc.

For example, if a tool wants the core's help in recording and reporting errors, it must call VG_(needs_tool_errors) and provide definitions of eight functions for comparing errors, printing out errors, reading suppressions from a suppressions file, etc. While writing these functions requires some work, it's much less than doing error handling from scratch because the core is doing most of the work. See the function VG_(needs_tool_errors) in include/pub_tool_tooliface.h for full details of all the needs.

Third, the tool can indicate which events in core it wants to be notified about, using the functions VG_(track_*)(). These include things such as blocks of memory being malloc'd, the stack pointer changing, a mutex being locked, etc. If a tool wants to know about this, it should provide a pointer to a function, which will be called when that event happens.

For example, if the tool wants to be notified when a new block of memory is malloc'd, it should call VG_(track_new_mem_heap)() with an appropriate function
311. tion of them which were taken.

These examples give a reasonable idea of what kinds of things Valgrind can be used for. The instrumentation can range from very lightweight (e.g. counting the number of times a particular function is called) to very intrusive (e.g. memcheck's memory checking).

4.2.2. Suggested tools

Here is a list of ideas we have had for tools that should not be too hard to implement.

- branch profiler: A machine's branch prediction hardware could be simulated, and each branch annotated with the number of predicted and mispredicted branches. Would be implemented quite similarly to Cachegrind, and could reuse the cg_annotate script to annotate source code. The biggest difficulty with this is the simulation; the chip-makers are very cagey about how their chips do branch prediction. But implementing one or more of the basic algorithms could still give good information.
- coverage tool: Cachegrind can already be used for doing test coverage, but it's massive overkill to use it just for that. It would be easy to write a coverage tool that records how many times each basic block was executed. Again, the cg_annotate script could be used for annotating source code with the gathered information. However, cg_annotate is only designed for working with single program runs. It could be extended relatively easily to deal with multiple runs of a program, so that the coverage of a whole test suite c
312. tionList := (SubPosition+ Space+)+
    SubPosition := Number | "+" Number | "-" Number | "*"
    Costs := (Number Space+)+
    PositionSpecification := Position "=" Space* PositionName "\n"
    Position := CostPosition | CalledPosition
    CostPosition := "ob" | "fl" | "fi" | "fe" | "fn"
    CalledPosition := "cob" | "cfl" | "cfn"
    PositionName := ( "(" Number ")" )? (Space* NoNewLineChar* )?
    AssoziationSpecification := CallSpezification | JumpSpecification
    CallSpecification := CallLine "\n" CostLine
    CallLine := "calls=" Space* Number Space+ SubPositionList
    JumpSpecification := ...
    Space := " " | "\t"
    Number := HexNumber | (Digit)+
    Digit := "0" | ... | "9"
    HexNumber := "0x" (Digit | HexChar)+
    HexChar := "a" | ... | "f" | "A" | ... | "F"
    Name := Alpha (Digit | Alpha)*
    Alpha := "a" | ... | "z" | "A" | ... | "Z"
    NoNewLineChar := all characters without "\n"

3.2.2. Description of Header Lines

The header has an arbitrary number of lines of the format "key: value". Possible key values for the header are:

- version: number [Callgrind]
  This is used to distinguish future profile data formats. A major version of 0 or 1 is supposed to be upwards compatible with Cachegrind's format. It is optional; if not appearing, version 1 is assumed. Otherwise, this has to be the first header line.
- pid: process id [Callgrind]
  This specifies the process ID of th
313. tions for more or less a megabyte of original code, which generally comes to about 70000 basic blocks for C compiled with optimisation on. Generating new translations is expensive, so it is worth having a large TC to minimise the (capacity) miss rate.

The dispatcher, VG_(dispatch), receives hints from the translations which allow it to cheaply spot all control transfers corresponding to x86 call and ret instructions. It has to do this in order to spot some special events:

- Calls to VG_(shutdown). This is Valgrind's cue to exit. NOTE: actually this is done a different way; it should be cleaned up.
- Returns of system call handlers, to the return address VG_(signalreturn_bogusRA). The signal simulator needs to know when a signal handler is returning, so we spot jumps (returns) to this address.
- Calls to vg_trap_here. All malloc, free, etc calls that the client program makes are eventually routed to a call to vg_trap_here, and Valgrind does its own special thing with these calls. In effect this provides a trapdoor, by which Valgrind can intercept certain calls on the simulated CPU, run the call as it sees fit itself (on the real CPU), and return the result to the simulated CPU, quite transparently to the client program.

Valgrind intercepts the client's malloc, free, etc, calls, so that it can store additional information. Each block malloc'd by the client gives rise to a shadow block in wh
314. tly report space leaks in glibc.

Problem is that running __libc_freeres() in older glibc versions causes this crash.

WORKAROUND FOR 1.1.X and later versions of Valgrind: use the --run-libc-freeres=no flag. You may then get space leak reports for glibc allocations (please _don't_ report these to the glibc people, since they are not real leaks), but at least the program runs.

My (buggy) program dies like this:

    valgrind: vg_malloc2.c:442 (bszW_to_pszW): Assertion `pszW >= 0' failed.

If Memcheck (the memory checker) shows any invalid reads, invalid writes and invalid frees in your program, the above may happen. Reason is that your program may trash Valgrind's low-level memory manager, which then dies with the above assertion, or something like this. The cure is to fix your program so that it doesn't do any illegal memory accesses. The above failure will hopefully go away after that.

My program dies, printing a message like this along the way:

    disInstr: unhandled instruction bytes: 0x66 0xF 0x2E 0x5

Older versions did not support some x86 instructions, particularly SSE/SSE2 instructions. Try a newer Valgrind; we now support almost all instructions. If it still happens with newer versions, and the failing instruction is an SSE/SSE2 instruction, you might be able to recompile your program without it by using the flag -march to gcc. Either way, let us know and we'll try to fix it.

Another possibility is that your program has a
315. to quite advanced. auxprogs/libmpiwrap.c is an example of wrapping a big, complex API (the MPI-2 interface). This file defines almost 300 different wrappers.

2.11. Building and Installing

We use the standard Unix ./configure, make, make install mechanism, and we have attempted to ensure that it works on machines with kernel 2.4 or 2.6 and glibc 2.2.X or 2.3.X. You may then want to run the regression tests with make regtest.

There are five options (in addition to the usual --prefix=) which affect how Valgrind is built:

- --enable-inner
  This builds Valgrind with some special magic hacks which make it possible to run it on a standard build of Valgrind (what the developers call "self-hosting"). Ordinarily you should not use this flag as various kinds of safety checks are disabled.
- --enable-tls
  TLS (Thread Local Storage) is a relatively new mechanism which requires compiler, linker and kernel support. Valgrind tries to automatically test if TLS is supported and if so enables this option. Sometimes it cannot test for TLS, so this option allows you to override the automatic test.
- --with-vex=
  Specifies the path to the underlying VEX dynamic-translation library. By default this is taken to be in the VEX directory off the root of the source tree.
- --enable-only64bit, --enable-only32bit
  On 64-bit platforms (amd64-linux, ppc64-linux), Valgrind is by default built in such a way
316. to the original, except that if the original contained any undefined bits, then it and all bits above it are marked as undefined too. Hence the Left bit in the names.

(UNARY) Signed and unsigned value widening: VgT_SWiden14, VgT_SWiden24, VgT_SWiden12, VgT_ZWiden14, VgT_ZWiden24 and VgT_ZWiden12. These mimic the definedness effects of standard signed and unsigned integer widening. Unsigned widening creates zero bits in the new positions, so VgT_ZWiden* accordingly mark those parts of their argument as defined. Signed widening copies the sign bit into the new positions, so VgT_SWiden* copies the definedness of the sign bit into the new positions. Because 1 means undefined and 0 means defined, these operations can (fascinatingly) be done by the same operations which they mimic. Go figure.

(BINARY) Undefined-if-either-Undefined, Defined-if-either-Defined: VgT_UifU4, VgT_UifU2, VgT_UifU1, VgT_UifU0, VgT_DifD4, VgT_DifD2, VgT_DifD1. These do simple bitwise operations on pairs of V-bit vectors, with UifU giving undefined if either arg bit is undefined, and DifD giving defined if either arg bit is defined. Abstract interpretation junkies, if any make it this far, may like to think of them as meets and joins (or is it joins and meets) in the definedness lattices.

(BINARY; one value, one V bits) Generate argument improvement terms for AND and OR: VgT_ImproveAND4_TQ, VgT_ImproveAND2_TQ, VgT_ImproveAND1_TQ, VgT_ImproveOR4_TQ, VgT_ImproveOR2_TQ, VgT_Imp
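Because 1 means undefined and 0 means defined, each of the operations above reduces to ordinary bit arithmetic. The sketch below illustrates this in plain C on 32-bit V-bit vectors. The function names (uifu4, difd4, zwiden_to4, swiden_to4, left4) are hypothetical stand-ins modelled on the tag names; this is an illustration of the stated semantics, not code taken from Valgrind.

```c
#include <assert.h>
#include <stdint.h>

/* Convention from the text: in a V-bit vector, 1 = undefined, 0 = defined. */

/* UifU: a result bit is undefined if either argument bit is undefined. */
uint32_t uifu4(uint32_t va, uint32_t vb) { return va | vb; }

/* DifD: a result bit is defined if either argument bit is defined. */
uint32_t difd4(uint32_t va, uint32_t vb) { return va & vb; }

/* Unsigned widening of the low from_bytes bytes to 4 bytes:
   the new upper bits are zero, which here means "defined". */
uint32_t zwiden_to4(uint32_t v, int from_bytes)
{
    assert(from_bytes == 1 || from_bytes == 2);
    uint32_t mask = (1u << (8 * from_bytes)) - 1u;
    return v & mask;
}

/* Signed widening: the new upper bits copy the V bit of the sign bit. */
uint32_t swiden_to4(uint32_t v, int from_bytes)
{
    assert(from_bytes == 1 || from_bytes == 2);
    int bits = 8 * from_bytes;
    uint32_t mask = (1u << bits) - 1u;
    uint32_t low  = v & mask;
    uint32_t sign = (low >> (bits - 1)) & 1u;
    return sign ? (low | ~mask) : low;
}

/* "Left": the lowest undefined bit and every bit above it become
   undefined; a fully defined vector stays fully defined. */
uint32_t left4(uint32_t v) { return v | (0u - v); }
```

For instance, widening a byte whose sign bit is undefined (V bits 0x80) with swiden_to4 marks all 24 new bits undefined, while zwiden_to4 leaves them defined, which matches the description of VgT_SWiden14 and VgT_ZWiden14.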
317. ts: ticks

    fn=func
    0x80001234 90 1
    0x80001237 90 5
    0x80001238 91 6

With subposition compression, this looks like

    positions: instr line
    events: ticks

    fn=func
    0x80001234 90 1
    +3 * 5
    +1 +1 6

Remark: For assembler annotation to work, instruction addresses have to be corrected to correspond to addresses found in the original binary. I.e. for relocatable shared objects, often a load offset has to be subtracted.

3.1.7. Miscellaneous

3.1.7.1. Cost Summary Information

For the visualization to be able to show cost percentage, a sum of the cost of the full run has to be known. Usually, it is assumed that this is the sum of all cost lines in a file. But sometimes, this is not correct. Thus, you can specify a "summary:" line in the header giving the full cost for the profile run. This has another effect: an import filter can show a progress bar while loading a large data file if it knows the cost sum in advance.

3.1.7.2. Long Names for Event Types and inherited Types

Event types for cost lines are specified in the "events:" line with an abbreviated name. For visualization, it makes sense to be able to specify some longer, more descriptive name. For an event type "Ir" which means "Instruction Fetches", this can be specified in the header line

    event: Ir : Instruction Fetches
    events: Ir Dr

In this example, "Dr" itself has no long name associated. The order of "event:" lines and the "events:" lin
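Subposition compression, as used in the example above, replaces each position by its difference from the previous cost line, or by "*" when it is unchanged. A minimal sketch of a writer for the two-position case ("instr line") follows. The helper names compress_subpos and format_cost_line are hypothetical, and the sketch assumes exactly one event column.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write the compressed form of cur relative to prev into buf:
   "*" if unchanged, otherwise "+N" or "-N". */
void compress_subpos(unsigned long prev, unsigned long cur,
                     char *buf, size_t n)
{
    assert(buf != NULL && n > 0);
    if (cur == prev)
        snprintf(buf, n, "*");
    else if (cur > prev)
        snprintf(buf, n, "+%lu", cur - prev);
    else
        snprintf(buf, n, "-%lu", prev - cur);
}

/* Format one compressed cost line for positions "instr line" and a
   single event count, relative to the previous cost line. */
void format_cost_line(unsigned long prev_addr, unsigned long prev_line,
                      unsigned long addr, unsigned long line,
                      unsigned long cost, char *out, size_t n)
{
    char a[32], l[32];
    compress_subpos(prev_addr, addr, a, sizeof a);
    compress_subpos(prev_line, line, l, sizeof l);
    snprintf(out, n, "%s %s %lu", a, l, cost);
}
```

Feeding it the second and third cost lines of the uncompressed example reproduces the relative forms: the address moves by 3 with the line unchanged, then both move by 1.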
318. ts VG_(baseBlock) into the real FPU using an x86 frstor insn, do the ucode FPU insn on the real CPU, and write the updated FPU state back into VG_(baseBlock) using an fnsave instruction. This is pretty brutal, but is simple and it works, and even seems tolerably efficient. There is no attempt to cache the simulated FPU state in the real FPU over multiple back-to-back ucode FPU instructions.

FPU_R and FPU_W are also done this way, with the minor complication that we need to patch in some addressing mode bits so the resulting insn knows the effective address to use. This is easy because of the regularity of the x86 FPU instruction encodings.

An analogous trick is done with ucode insns which claim, in their flags_r and flags_w fields, that they read or write the simulated %EFLAGS. For such cases we first copy the simulated %EFLAGS into the real %eflags, then do the insn, then, if the insn says it writes the flags, copy back to %EFLAGS. This is a bit expensive, which is why the ucode optimisation pass goes to some effort to remove redundant flag-update annotations.

And so, that's the end of the documentation for the instrumenting translator! It's really not that complex, because it's composed as a sequence of simple(ish) self-contained transformations on straight-line blocks of code.

1.2.11. Top-level dispatch loop

Urk. In VG_(toploop). This is basically boring and unsurprising
319. u have the various tools listed above installed, you probably won't need to modify your catalogs. But if you do, then just add another group to this file, reflecting your local installation.

4.3.2.2. Writing the Documentation

Follow these steps (using foobar as the example tool name again):

1. Make a directory valgrind/foobar/docs/.
2. Copy the XML documentation file for the tool Nulgrind from valgrind/none/docs/nl-manual.xml to foobar/docs/, and rename it to foobar/docs/fb-manual.xml. Note: there is a really stupid tetex bug with underscores in filenames, so don't use '_'.
3. Write the documentation. There are some helpful bits and pieces on using xml markup in valgrind/docs/xml/xml_help.txt.
4. Include it in the User Manual by adding the relevant entry to valgrind/docs/xml/manual.xml. Copy and edit an existing entry.
5. Validate foobar/docs/fb-manual.xml using the following command from within valgrind/docs/:

    make valid

   You will probably get errors that look like this:

    ./xml/index.xml:5: element chapter: validity error : No declaration for attribute base of element chapter

   Ignore (only) these; they're not important.

   Because the xml toolchain is fragile, it is important to ensure that fb-manual.xml won't break the documentation set build. Note that just because an xml file happily transforms to html does not necessarily mean the same holds true for pdf/ps.
6. You can (re-)generate the HTML docs while you are wr
320. u may replace the old one, on explicit permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.

5. COMBINING DOCUMENTS

You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled "History" in the various original documents, forming one section Entitled "History"; likewise combine any sections
321. uggestions for making tools easier to write. If you have suggestions for improving this documentation. If you don't understand something. Or anything else!

Happy programming.

Valgrind Distribution Documents
Release 3.2.0 7 June 2006
Copyright 2000-2006 Valgrind Developers
Email: valgrind@valgrind.org

Table of Contents

1. ACKNOWLEDGEMENTS
2. AUTHORS
3. INSTALL
4. NEWS
5. README
6. README_MISSING_SYSCALL_OR_IOCTL
7. README_DEVELOPERS
8. README_PACKAGERS

1. ACKNOWLEDGEMENTS

Cerion Armour-Brown, cerion@open-works.co.uk
Cerion worked on PowerPC instruction set support using the Vex dynamic-translation framework.

Jeremy Fitzhardinge, jeremy@valgrind.org
Jeremy wrote Helgrind and totally overhauled low-level syscall/signal and address space layout stuff, among many other improvements.

Tom Hughes, tom@valgrind.org
Tom did a vast number of bug fixes, and helped out with support for more recent Linux/glibc vers
322. undefined value errors. When set to no, Memcheck behaves like Addrcheck, a lightweight memory-checking tool that used to be part of Valgrind, which didn't detect undefined value errors. Use this option if you don't like seeing undefined value errors.

3.3. Explanation of error messages from Memcheck

Despite considerable sophistication under the hood, Memcheck can only really detect two kinds of errors: use of illegal addresses, and use of undefined values. Nevertheless, this is enough to help you discover all sorts of memory-management nasties in your code. This section presents a quick summary of what error messages mean. The precise behaviour of the error-checking machinery is described in Details of Memcheck's checking machinery.

3.3.1. Illegal read / Illegal write errors

For example:

    Invalid read of size 4
       at 0x40F6BBCC: (within /usr/lib/libpng.so.2.1.0.9)
       by 0x40F6B804: (within /usr/lib/libpng.so.2.1.0.9)
       by 0x40B07FF4: read_png_image__FP8QImageIO (kernel/qpngio.cpp:326)
       by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621)
     Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd

This happens when your program reads or writes memory at a place which Memcheck reckons it shouldn't. In this example, the program did a 4-byte read at address 0xBFFFF0E0, somewhere within the system-supplied library libpng.so.2.1.0.9, which was called from somewhere else in the same library, called from line 326 of qpngio.cpp, and so on.

Memcheck tr
323. value computed, increasing the size of the code at least 12 times, and making it run 25-50 times slower than natively. At the other end of the spectrum, the ultra-trivial "none" tool (a.k.a. Nulgrind) adds no instrumentation at all and causes in total "only" about a 4 times slowdown.

Valgrind simulates every single instruction your program executes. Because of this, the active tool checks, or profiles, not only the code in your application but also in all supporting dynamically-linked (.so-format) libraries, including the GNU C library, the X client libraries, Qt, if you work with KDE, and so on.

If you're using one of the error-detection tools, Valgrind will often detect errors in libraries, for example the GNU C or X11 libraries, which you have to use. You might not be interested in these errors, since you probably have no control over that code. Therefore, Valgrind allows you to selectively suppress errors, by recording them in a suppressions file which is read when Valgrind starts up. The build mechanism attempts to select suppressions which give reasonable behaviour for the libc and XFree86 versions detected on your machine. To make it easier to write suppressions, you can use the --gen-suppressions=yes option which tells Valgrind to print out a suppression for each error that appears, which you can then copy into a suppressions file.

Different error-checking tools report different kinds of errors. The suppression mechanism therefore allows you to sa
324. ve, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:

A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement.
C. State on the Title page the name of the publisher of the Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.
F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invar
325. ve chosen not to do so.

9.2. Lackey Options

Lackey-specific options are:

--fnname=<name> [default: _dl_runtime_resolve()]
    Count calls to <name>.
--detailed-counts=<no|yes> [default: no]
    Count loads, stores and alu ops.

Valgrind FAQ
Release 3.2.0 7 June 2006
Copyright 2000-2006 Valgrind Developers
Email: valgrind@valgrind.org

Table of Contents

Valgrind Frequently Asked Questions

Valgrind Frequently Asked Questions

1. Background

1.1. How do you pronounce "Valgrind"?

The "Val" as in the word "value". The "grind" is pronounced with a short 'i', ie. "grinned" (rhymes with "tinned") rather than "grined" (rhymes with "find"). Don't feel bad: almost everyone gets it wrong at first.

1.2. Where does the name "Valgrind" come from?

From Nordic mythology. Originally (before release) the project was named Heimdall, after the watchman of the Nordic gods. He could "see a hundred miles by day or night, hear the grass growing, see the wool growing on a sheep's back" (etc). This would have been a great name, but it was already taken by a security package "Heimdal".

Keeping with the Nordic theme, Valgrind was chosen. Valgrind is the name of the main entrance to Valhalla (the Hall of the Chosen Slain in Asgard). Over this entrance there resides a wolf and over it there is the head of a boar and on it perches a huge eagle
326. voids long chains of errors.

When values are loaded from memory, valgrind checks the A bits for that location and issues an illegal-address warning if needed. In that case, the V bits loaded are forced to indicate Valid, despite the location being invalid. This apparently strange choice reduces the amount of confusing information presented to the user. It avoids the unpleasant phenomenon in which memory is read from a place which is both unaddressible and contains invalid values, and, as a result, you get not only an invalid-address (read/write) error, but also a potentially large set of uninitialised-value errors, one for every time the value is used.

There is a hazy boundary case to do with multi-byte loads from addresses which are partially valid and partially invalid. See the description of the flag --partial-loads-ok for details.

Memcheck intercepts calls to malloc, calloc, realloc, valloc, memalign, free, new, new[], delete and delete[]. The behaviour you get is:

- malloc/new/new[]: the returned memory is marked as addressible but not having valid values. This means you have to write on it before you can read it.
- calloc: returned memory is marked both addressible and valid, since calloc clears the area to zero.
- realloc: if the new size is larger than the old, the new section is addressible but invalid, as with malloc. If the new size is smaller, the dropped-off section is marked as unaddressible. You may only pass to realloc a pointer prev
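The A-bit and V-bit rules described above can be modelled with a toy shadow map that keeps one A flag and one V flag per byte. Everything in the sketch below is an assumption made for illustration: the names (Shadow, on_malloc, on_load and so on), the 64-byte address space, and the simplification that realloc resizes a block in place. It is not how Memcheck stores its shadow state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MEM 64   /* size of the toy address space (assumption) */

typedef struct {
    bool a[TOY_MEM];   /* A bit: byte is addressible */
    bool v[TOY_MEM];   /* V bit: byte holds a valid value */
} Shadow;

typedef struct { bool address_error; bool value_valid; } LoadResult;

void shadow_init(Shadow *s) { memset(s, 0, sizeof *s); }

static void mark(Shadow *s, size_t start, size_t len, bool a, bool v)
{
    assert(start + len <= TOY_MEM);
    for (size_t i = start; i < start + len; i++) { s->a[i] = a; s->v[i] = v; }
}

/* malloc: addressible, but contents not yet valid. */
void on_malloc(Shadow *s, size_t start, size_t len) { mark(s, start, len, true, false); }

/* calloc: addressible and valid, since the area is zeroed. */
void on_calloc(Shadow *s, size_t start, size_t len) { mark(s, start, len, true, true); }

/* realloc (toy, in place): a grown tail is addressible but invalid,
   a dropped tail becomes unaddressible. */
void on_realloc(Shadow *s, size_t start, size_t old_len, size_t new_len)
{
    if (new_len > old_len)
        mark(s, start + old_len, new_len - old_len, true, false);
    else
        mark(s, start + new_len, old_len - new_len, false, false);
}

/* A store to an addressible byte makes it valid. */
bool on_store(Shadow *s, size_t addr)
{
    assert(addr < TOY_MEM);
    if (!s->a[addr]) return false;   /* would be an invalid write */
    s->v[addr] = true;
    return true;
}

/* A load from an unaddressible byte reports an address error and
   forces the loaded V bit to valid, avoiding follow-on errors. */
LoadResult on_load(const Shadow *s, size_t addr)
{
    LoadResult r;
    assert(addr < TOY_MEM);
    r.address_error = !s->a[addr];
    r.value_valid   = r.address_error ? true : s->v[addr];
    return r;
}
```

The forced-valid result in on_load is the design choice the text calls "apparently strange": one address error is reported at the load, and later uses of the loaded value stay quiet.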
327. word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only.

The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.

A section "Entitled XYZ" means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as "Acknowledgements", "Dedications", "Endorsements", or "History".) To "Preserve the Title" of such a section when you modify the Document means that it remains a section "Entitled XYZ" according to this definition.

The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the me
328. x4C261F0E: PptXml::~PptXml(void) (pptxml.cc:44)
   Address 0x4BB292A8 is 0 bytes inside a block of size 64 alloc'd
   at 0x4004318C: __builtin_vec_new (vg_clientfuncs.c:152)
   by 0x4C21BC15: KLaola::readSBStream(int) const (klaola.cc:314)
   by 0x4C21C155: KLaola::stream(KLaola::OLENode const *) (klaola.cc:416)
   by 0x4C21788F: OLEFilter::convert(QCString const &) (olefilter.cc:272)
This tells you that some memory allocated with new[] was freed with free(). If stage2 was stripped, the message would look like this:
   Mismatched free() / delete / delete []
   at 0x40043249: (inside stage2)
   by 0x4102BB4E: QGArray::~QGArray(void) (tools/qgarray.cpp:149)
   by 0x4C261C41: PptDoc::~PptDoc(void) (include/qmemarray.h:60)
   by 0x4C261F0E: PptXml::~PptXml(void) (pptxml.cc:44)
   Address 0x4BB292A8 is 0 bytes inside a block of size 64 alloc'd
   at 0x4004318C: (inside stage2)
   by 0x4C21BC15: KLaola::readSBStream(int) const (klaola.cc:314)
   by 0x4C21C155: KLaola::stream(KLaola::OLENode const *) (klaola.cc:416)
   by 0x4C21788F: OLEFilter::convert(QCString const &) (olefilter.cc:272)
This isn't so helpful. Although you can tell there is a mismatch, the names of the allocating and deallocating functions are no longer visible. The same kind of thing occurs in various other messages from valgrind. Please test the final installation works by running it on something huge. I suggest checking that it can start and exit successfully both Mozilla 1.0 a
329. ximum debugging effectiveness you will need to use both flags. One final comment. The function-wrapping facility is closely tied to Valgrind's ability to replace (redirect) specified functions, for example to redirect calls to malloc to its own implementation. Indeed, a replacement function can be regarded as a wrapper function which does not call the original. However, to make the implementation more robust, the two kinds of interception (wrapping vs replacement) are treated differently. --trace-redir=yes shows specifications and bindings for both replacement and wrapper functions. To differentiate the two, replacement bindings are printed using R-> whereas wraps are printed using W->. 2.10.5. Limitations - control flow. For the most part, the function-wrapping implementation is robust. The only important caveat is: in a wrapper, get hold of the OrigFn information using VALGRIND_GET_ORIG_FN before calling any other wrapped function. Once you have the OrigFn, arbitrary intercalling, recursion between, and longjumping out of wrappers should work correctly. There is never any interaction between wrapped functions and merely replaced functions (eg malloc), so you can call malloc etc safely from within wrappers. The above comments are true for {x86,amd64,ppc32}-linux. On ppc64-linux, function wrapping is more fragile due to the (arguably poorly designed) ppc64-linux ABI. This mandates the use of a shadow stack which tracks entries/exits of both wr
330. y which tool or tool(s) each suppression applies to. 2.2. Getting started. First off, consider whether it might be beneficial to recompile your application and supporting libraries with debugging info enabled (the -g flag). Without debugging info, the best Valgrind tools will be able to do is guess which function a particular piece of code belongs to, which makes both error messages and profiling output nearly useless. With -g, you'll hopefully get messages which point directly to the relevant source code lines. Another flag you might like to consider, if you are working with C++, is -fno-inline. That makes it easier to see the function-call chain, which can help reduce confusion when navigating around large C++ apps. For whatever it's worth, debugging OpenOffice.org with Memcheck is a bit easier when using this flag. You don't have to do this, but doing so helps Valgrind produce more accurate and less confusing error reports. Chances are you're set up like this already, if you intended to debug your program with GNU gdb or some other debugger. This paragraph applies only if you plan to use Memcheck: on rare occasions, optimisation levels at -O2 and above have been observed to generate code which fools Memcheck into wrongly reporting uninitialised-value errors. We have looked in detail into fixing this, and unfortunately the result is that doing so would give a further significant slowdown in what is
331. y other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 10. FUTURE REVISIONS OF THIS LICENSE. The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/. Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation. ADDENDUM: How to use this License for your documents. To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page: Copyri
332. yle license, so you may include them in your code without worrying about license conflicts. Some of the PThreads test cases, pth_*.c, are taken from "Pthreads Programming" by Bradford Nichols, Dick Buttlar & Jacqueline Proulx Farrell, ISBN 1-56592-115-1, published by O'Reilly & Associates, Inc. 1.2. How to navigate this manual. The Valgrind distribution consists of the Valgrind core, upon which are built Valgrind tools, which do different kinds of debugging and profiling. This manual is structured similarly. First, we describe the Valgrind core, how to use it, and the flags it supports. Then, each tool has its own chapter in this manual. You only need to read the documentation for the core and for the tool(s) you actually use, although you may find it helpful to be at least a little bit familiar with what all tools do. If you're new to all this, you probably want to run the Memcheck tool. If you want to write a new tool, read Writing a New Valgrind Tool. Be aware that the core understands some command-line flags, and the tools have their own flags which they know about. This means there is no central place describing all the flags that are accepted; you have to read the flags documentation both for Valgrind's core and for the tool you want to use. 2. Using and understanding the Valgrind core. This section describes the Valgrind core services, flags and behaviours. That means it is relevant regardless of what particular tool you are using. A
333. you can find out how many instructions are executed per line, which can be useful for traditional profiling and test coverage. Any feedback, bug fixes, suggestions, etc, welcome. 4.1.1. Overview. First off, as for normal Valgrind use, you probably want to compile with debugging info (the -g flag). But by contrast with normal Valgrind use, you probably do want to turn optimisation on, since you should profile your program as it will be normally run. The two steps are: 1. Run your program with valgrind --tool=cachegrind in front of the normal command-line invocation. When the program finishes, Cachegrind will print summary cache statistics. It also collects line-by-line information in a file cachegrind.out.pid, where pid is the program's process id. This step should be done every time you want to collect information about a new program, a changed program, or about the same program with different input. 2. Generate a function-by-function summary, and possibly annotate source files, using the supplied cg_annotate program. Source files to annotate can be specified manually on the command line, or "interesting" source files can be annotated automatically with the --auto=yes option. You can annotate C/C++ files or assembly-language files equally easily. This step can be performed as many times as you like for each Step 1. You may want to do multiple annotations showing different information each time.
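As an illustration of something worth profiling this way, here is a minimal C sketch that sums the same array in two traversal orders. The array size N and the function names are assumptions made for this example only. Both functions return the same total; any difference in cache behaviour between them is the kind of thing the per-line annotation is meant to expose.

```c
#include <assert.h>
#include <stddef.h>

#define N 256   /* arbitrary size, chosen only for illustration */

static int grid[N][N];

/* Fill the grid with small, predictable values. */
void fill_grid(void)
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            grid[i][j] = (int)((i + j) % 7);
}

/* Row-major traversal: consecutive accesses are adjacent in memory. */
long sum_row_major(void)
{
    long total = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            total += grid[i][j];
    return total;
}

/* Column-major traversal: consecutive accesses are N ints apart. */
long sum_col_major(void)
{
    long total = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            total += grid[i][j];
    return total;
}

/* Sanity check: both traversal orders must produce the same sum. */
void self_check(void)
{
    fill_grid();
    assert(sum_row_major() == sum_col_major());
}
```

To try it, add a main that calls self_check, compile with -g and your usual optimisation level, run the result under valgrind --tool=cachegrind, and then pass the resulting cachegrind.out.pid file to cg_annotate.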
334. you don't want to hear about the error in the future. When set to all, Valgrind will print a suppression for every reported error, without querying the user. This option is particularly useful with C++ programs, as it prints out the suppressions with mangled names, as required. Note that the suppressions printed are as specific as possible. You may want to common up similar ones, eg by adding wildcards to function names. Also, sometimes two different errors are suppressed by the same suppression, in which case Valgrind will output the suppression more than once, but you only need to have one copy in your suppression file (having more than one won't cause problems). Also, the suppression name is given as <insert a suppression name here>; the name doesn't really matter, it's only used with the -v option, which prints out all used suppression records. --db-attach=<yes|no> [default: no] When enabled, Valgrind will pause after every error shown and print the line: ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ---- Pressing Ret, or N Ret or n Ret, causes Valgrind not to start a debugger for this error. Pressing Y Ret or y Ret causes Valgrind to start a debugger for the program at this point. When you have finished with the debugger, quit from it, and the program will continue. Trying to continue from inside the debugger doesn't work. C Ret or c Ret causes Valgrind not to start a debugger, and not to ask again. Note db
335. ype vki_time_t. This is a copy of the kernel type, with vki_ prefixed. Our copies of such types are kept in the appropriate vki*.h file(s). We don't include kernel headers or glibc headers directly. Writing your own syscall wrappers (see below for ioctl wrappers). If Valgrind tells you that system call NNN is unimplemented, do the following: 1. Find out the name of the system call: grep NNN /usr/include/asm/unistd.h. This should tell you something like __NR_mysyscallname. Copy this entry to coregrind/vki_unistd-$(VG_PLATFORM).h. 2. Do 'man 2 mysyscallname' to get some idea of what the syscall does. Note that the actual kernel interface can differ from this, so you might also want to check a version of the Linux kernel source. NOTE: any syscall which has something to do with signals or threads is probably "special", and needs more careful handling. Post something to valgrind-developers if you aren't sure. 3. Add a case to the already-huge collection of wrappers in the coregrind/m_syswrap/syswrap-*.c files. For each in-memory parameter which is read or written by the syscall, do one of PRE_MEM_READ, PRE_MEM_RASCIIZ or PRE_MEM_WRITE for that parameter. Then do the syscall. Then, if the syscall succeeds, issue suitable POST_MEM_WRITE calls. (There's no need for POST_MEM_READ calls.) Also, add it to the syscall_table[] array; use one of GENX_, GENXY, LI
