CUDA-GDB (NVIDIA CUDA Debugger)
Contents
1. Current CUDA kernel 0, device 0, sm 0, warp 0, lane 0, grid 1, block (0,0), thread (0,0,0)

Changing the coordinate or kernel focus
To change the focus, specify a value for the parameter you want to change. For example:

To change the physical coordinates:
    (cuda-gdb) cuda device 0 sm 1 warp 2 lane 3
    New CUDA focus: device 0, sm 1, warp 2, lane 3, grid 1, block (10,0), thread (67,0,0)

To change the logical coordinates (thread parameter):
    (cuda-gdb) cuda thread (15,0,0)
    New CUDA focus: device 0, sm 1, warp 0, lane 15, grid 1, block (10,0), thread (15,0,0)

To change the logical coordinates (block and thread parameters):
    (cuda-gdb) cuda block (1,0) thread (3,0,0)
    New CUDA focus: device 0, sm 3, warp 0, lane 3, grid 1, block (1,0), thread (3,0,0)

Note: If the specified set of coordinates is incorrect, cuda-gdb tries to find the lowest set of valid coordinates. If cuda thread_selection is set to logical, the lowest set of valid logical coordinates is selected; if it is set to physical, the lowest set of valid physical coordinates is selected. Use set cuda thread_selection to switch the value.

To change the kernel focus, specify a value for the kernel parameter:
    (cuda-gdb) cuda kernel 0
    Switching to CUDA Kernel 0 <<<(0,0),(0,0,0)>>>
2. #0 acos_main <<<(240,1),(128,1,1)>>> (parms={arg = 0x5100000, res = 0x5100200, n = 5}) at acos.cu:367
    367   int totalThreads = gridDim.x * blockDim.x;

Getting Help
For more information, use the cuda-gdb help with the "help cuda" and "help set cuda" commands.

cuda-gdb info commands
Note: The commands "info cuda state" and "info cuda threads" have been removed.

info cuda system
This command displays system information that includes the number of GPUs in the system, with device header information for each GPU. The device header includes the GPU type, compute capability of the GPU, number of SMs per GPU, number of warps per SM, number of threads (lanes) per warp, and the number of registers per thread. Example:
    (cuda-gdb) info cuda system
    Number of devices: 1
    DEV: 0/1 Device Type: gt200 SM Type: sm_13 SM/WP/LN: 30/32/32 Regs/LN: 128

info cuda device
This command displays the device information with an SM header, in addition to the device header, per GPU. The SM header lists all the SMs that are actively running CUDA blocks, with the valid warp mask in each SM. The example below shows eight valid warps running on one of the SMs.
    (cuda-gdb) info cuda device
    DEV: 0/1 Device Type: gt200 SM Type: sm_13 SM/WP/LN: 30/32/32 Regs/LN: 128
    SM: 0/30 valid warps: 00000000000000
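The per-device figures that info cuda system reports (GPU type, compute capability, SM count, warp width, register count) can also be cross-checked from application code with the standard CUDA runtime device query. The following is a minimal sketch for illustration only, not an excerpt from this manual; note that cudaDeviceProp reports registers per block rather than per lane:

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void) {
        int count = 0;
        cudaGetDeviceCount(&count);                  // number of CUDA devices in the system
        for (int dev = 0; dev < count; ++dev) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, dev);     // fill in the device properties
            printf("DEV %d: %s, compute capability %d.%d, %d SMs, warp size %d, %d regs/block\n",
                   dev, prop.name, prop.major, prop.minor,
                   prop.multiProcessorCount, prop.warpSize, prop.regsPerBlock);
        }
        return 0;
    }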
3. > cuda-gdb now supports device function calls on GPU types sm_20 and higher:
- Users can step in, step out, and finish nested functions.
- Application (user) stack overflow and hardware CRS stack overflow are detected in cuda-gdb. The user can override these limits with the following APIs: in CUDA runtime applications, use cudaThreadSetLimit() and cudaThreadGetLimit(); with the CUDA driver API, use cuCtxSetLimit() and cuCtxGetLimit(). (A usage sketch follows this item.)
> Debugger compatibility:
- The debugger version is tied to the compiler version; both binaries need to be from the same toolkit version.
- The CUDA driver is backward compatible with debugger versions 3.0 and higher.

CUDA-GDB FEATURES AND EXTENSIONS
Just as programming in CUDA C is an extension to C programming, debugging with cuda-gdb is a natural extension to debugging with gdb. cuda-gdb supports debugging CUDA applications that use the CUDA driver APIs in addition to runtime APIs, and it supports debugging just-in-time (JIT) compiled PTX kernels. The following sections describe the cuda-gdb features that facilitate debugging CUDA applications:
> Debugging CUDA applications on GPU hardware in real time
> Extending the gdb debugging environment
> Supporting an initialization file
> Pausing CUDA execution at any function symbol or source file line number
> Single stepping individual warps
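As a rough illustration of the limit-override APIs named above, the sketch below raises the device call stack size with the runtime API and shows the driver-API equivalent in comments. The 16 KB figure is an arbitrary example value, not a recommendation from this manual:

    #include <stdio.h>
    #include <cuda.h>
    #include <cuda_runtime.h>

    int main(void) {
        // CUDA runtime API: query, then raise, the per-thread device stack size.
        size_t stackSize = 0;
        cudaThreadGetLimit(&stackSize, cudaLimitStackSize);
        printf("default device stack size: %zu bytes\n", stackSize);
        cudaThreadSetLimit(cudaLimitStackSize, 16 * 1024);   // example value only

        // CUDA driver API equivalent (assumes a context is already current):
        //   cuCtxGetLimit(&stackSize, CU_LIMIT_STACK_SIZE);
        //   cuCtxSetLimit(CU_LIMIT_STACK_SIZE, 16 * 1024);
        return 0;
    }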
4.  19                       ((0x55555555 & array[threadIdx.x]) << 1);
    20
    21      idata[threadIdx.x] = array[threadIdx.x];
    22  }
    23
    24  int main(void) {
    25      void *d = NULL; int i;
    26      unsigned int idata[N], odata[N];
    27
    28      for (i = 0; i < N; i++)
    29          idata[i] = (unsigned int)i;
    30
    31      cudaMalloc((void**)&d, sizeof(int)*N);
    32      cudaMemcpy(d, idata, sizeof(int)*N,
    33                 cudaMemcpyHostToDevice);
    34
    35      bitreverse<<<1, N, N*sizeof(int)>>>(d);
    36
    37      cudaMemcpy(odata, d, sizeof(int)*N,
    38                 cudaMemcpyDeviceToHost);
    39
    40      for (i = 0; i < N; i++)
    41          printf("%u -> %u\n", idata[i], odata[i]);
    42
    43      cudaFree((void*)d);
    44      return 0;
    45  }

Walking Through the Code
1. Begin by compiling the bitreverse.cu CUDA application for debugging by entering the following command at a shell prompt:
    nvcc -g -G bitreverse.cu -o bitreverse
This command assumes the source file name to be bitreverse.cu and that no additional compiler flags are required for compilation. See also Compiling for Debugging.
2. Start the CUDA debugger by entering the following command at a shell prompt:
    cuda-gdb bitreverse
3. Set breakpoints. Set both the host (main) and GPU (bitreverse) breakpoints here. Also set a breakpoint at a particular line in the device function (bitreverse.cu:21).
    (cuda-gdb) b main
    Breakpoint 1 at 0x400db0: file bitreverse.cu, line 25.
5. indicates the current kernel (only 1 right now). The first number is the kernel id; the second number is the device id.
    (cuda-gdb) info cuda kernels
    0 Device 0 acos_main <<<(240,1),(128,1,1)>>> (parms={arg = 0x5100000, res = 0x5100200, n = 5}) at acos.cu:367

Breaking into running applications
cuda-gdb provides support for debugging kernels that appear to be hanging or looping indefinitely. The CTRL+C signal freezes the GPU and reports back the source code location. The current thread focus will be on the host; you can use "cuda kernel <n>" to switch to the device kernel you need. At this point, the program can be modified and then either resumed or terminated at the developer's discretion. This feature is limited to applications running within the debugger. It is not possible to break into and debug applications that have been previously launched.

Checking Memory Errors
The CUDA MemoryChecker feature is enabled, which allows detection of global memory violations and mis-aligned global memory accesses. This feature is off by default and can be enabled using the following variable in cuda-gdb before the application is run:
    set cuda memcheck on
Once CUDA memcheck is enabled, global memory violations and mis-aligned global memory accesses will be detected only in the run or continue mode
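For illustration, the hypothetical kernel below (not from this manual) contains the kind of defect the memory checker is meant to catch: the highest thread index writes one element past the end of the allocation, producing an out-of-bounds global memory access.

    #include <cuda_runtime.h>

    // Hypothetical example: each thread writes data[i + 1], so the last thread
    // writes one element past the end of the allocation.
    __global__ void shift_right(int *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i + 1] = data[i];   // out of bounds when i == n - 1
    }

    int main(void) {
        int *d = NULL;
        cudaMalloc((void**)&d, 256 * sizeof(int));
        shift_right<<<1, 256>>>(d, 256);   // run under cuda-gdb after: set cuda memcheck on
        cudaThreadSynchronize();
        cudaFree(d);
        return 0;
    }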
6. info cuda sm
This command displays the warp header, in addition to the SM and the device headers, for every active SM. The warp header lists all the warps, with valid, active, and divergent lane mask information for each warp. The warp header also includes the block index within the grid to which it belongs. The example below lists eight warps with 32 active threads each; there is no thread divergence on any of the valid active warps.
    (cuda-gdb) info cuda sm
    DEV: 0/1 Device Type: gt200 SM Type: sm_13 SM/WP/LN: 30/32/32 Regs/LN: 128
    SM: 0/30 valid warps: 00000000000000
    WP: 0/32 valid/active/divergent lanes: 0xffffffff 0xffffffff 0x00000000 block (0,0)
    WP: 1/32 valid/active/divergent lanes: 0xffffffff 0xffffffff 0x00000000 block (0,0)
    WP: 2/32 valid/active/divergent lanes: 0xffffffff 0xffffffff 0x00000000 block (0,0)
    WP: 3/32 valid/active/divergent lanes: 0xffffffff 0xffffffff 0x00000000 block (0,0)
    WP: 4/32 valid/active/divergent lanes: 0xffffffff 0xffffffff 0x00000000 block (0,0)
    WP: 5/32 valid/active/divergent lanes: 0xffffffff 0xffffffff 0x00000000 block (0,0)
    WP: 6/32 valid/active/divergent lanes: 0xffffffff 0xffffffff 0x00000000 block (0,0)
    WP: 7/32 valid/active/divergent lanes: 0xffffffff 0xffffffff 0x00000000 block (0,0)

info cuda warp
7. Ultra, Tesla C870, Quadro Plex 1000 Model IV, Tesla D870, Quadro Plex 2100 Model S4, Tesla S870

KNOWN ISSUES
The following are known issues with the current release:
> Debugging 32-bit CUDA code on a 64-bit host system is not supported.
> X11 cannot be running on the GPU that is used for debugging, because the debugger effectively makes the GPU look hung to the X server, resulting in a deadlock or crash. Two possible debugging setups exist:
- remotely accessing a single GPU (using VNC, ssh, etc.)
- using two GPUs, where X11 is running on only one
Note: The CUDA driver automatically excludes the device used by X11 from being picked by the application being debugged. This can change the behavior of the application.
> Multi-GPU applications are not supported. CUDA-GDB can debug only CUDA applications that use one GPU. In multi-GPU applications, the CUDA driver exposes only one GPU to the application that is being debugged. This can alter the multi-GPU application's behavior under the debugger.
> The debugger enforces blocking kernel launches.
> Device memory allocated via cudaMalloc() is not visible outside of the kernel function.
> Host memory allocated with cudaMallocHost() is not visible in CUDA-GDB.
> Not all illegal program behavior can be caught in the debugger.
> On GPUs with SM type less than sm_20, it is not possible to step over a subroutine in the device code.
8. You can also access the shared memory indexed into the starting offset to see what the stored values are:
    (cuda-gdb) p *(@shared int*)0x20
    $3 = 0
    (cuda-gdb) p *(@shared int*)0x24
    $4 = 128
    (cuda-gdb) p *(@shared int*)0x28
    $5 = 64

The example below shows how to access the starting address of the input parameter to the kernel:
    (cuda-gdb) p &data
    $6 = (const @global void * const @parameter *) 0x10
    (cuda-gdb) p *(const @global void * const @parameter *) 0x10
    $7 = (const @global void * const @parameter) 0x110000

Switching to any coordinate or kernel
To support CUDA thread and block switching, new commands have been introduced to inspect or change the logical coordinates (kernel, grid, block, thread) and the physical coordinates (device, sm, warp, lane). The difference between grid and kernel is that grid is a unique identifier for a kernel launch on a given device (a per-device launch id), whereas kernel is a unique identifier for a kernel launch across multiple devices.

Inspecting the coordinates or kernel
To see the current selection, use the cuda command followed by a space-separated list of parameters.
Example: Determining the coordinates:
    (cuda-gdb) cuda device sm warp lane block thread
    Current CUDA focus: device 0, sm 0, warp 0, lane 0, block (0,0), thread (0,0,0)
Example: Determining the kernel focus:
    (cuda-gdb) cuda kernel
9. and not while single-stepping through the code. You can also run the CUDA memory checker as a standalone tool: cuda-memcheck.

INSTALLATION AND DEBUG COMPILATION
Included in this chapter are instructions for installing cuda-gdb and for using NVCC, the NVIDIA CUDA compiler driver, to compile CUDA programs for debugging.

Installation Instructions
Follow these steps to install NVIDIA cuda-gdb:
1. Visit the NVIDIA CUDA Zone download page: http://www.nvidia.com/object/cuda_get.html
2. Select the appropriate Linux operating system (see Host Platform Requirements).
3. Download and install the 3.1 CUDA Driver.
4. Download and install the 3.1 CUDA Toolkit. This installation should point the environment variable LD_LIBRARY_PATH to /usr/local/cuda/lib and should also include /usr/local/cuda/bin in the environment variable PATH.
5. Download and install the 3.1 CUDA Debugger.

Compiling for Debugging
NVCC, the NVIDIA CUDA compiler driver, provides a mechanism for generating the debugging information necessary for cuda-gdb to work properly. The -g -G option pair must be passed to NVCC when an application is compiled in order to debug with cuda-gdb; for example:
    nvcc -g -G foo.cu -o foo
Using this line to compile the CUDA application foo.cu:
> forces -O0 (mostly unoptimized) compilation
10. A special case is the stepping of the thread barrier call __syncthreads(). In this case, an implicit breakpoint is set immediately after the barrier and all threads are continued to this point.

Currently it is not possible to step over a device subroutine. Since all device subroutines are implicitly inlined, cuda-gdb always steps into a device subroutine.

Displaying device memory in the device kernel
The gdb print command has been extended to decipher the location of any program variable, and it can be used to display the contents of any CUDA program variable, including:
> allocations made via cudaMalloc()
> data that resides in various GPU memory regions, such as shared, local, and global memory
> special CUDA runtime variables, such as threadIdx

Variable Storage and Accessibility
Depending on the variable type and usage, variables can be stored either in registers or in local, shared, const, or global memory. You can print the address of any variable to find out where it is stored and directly access the associated memory. The example below shows how the variable array, which is of type shared int*, can be directly accessed in order to see what the stored values are in the array:
    (cuda-gdb) p &array
    $1 = (@shared int (*)[0]) 0x20
    (cuda-gdb) p array[0]@4
    $2 = {0, 128, 64, 192}
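As a rough illustration of the storage classes mentioned above, the hypothetical kernel below (not taken from this manual) places values in shared memory, local memory, global memory, and registers; printing the address of each variable under cuda-gdb would reveal the memory segment it lives in.

    #define N 256

    __global__ void storage_demo(int *gdata) {          // gdata points to global memory
        __shared__ int tile[N];                         // shared memory, one copy per block
        int scratch[4] = {0, 1, 2, 3};                  // small arrays often spill to local memory
        int reg = threadIdx.x;                          // simple scalars usually live in registers

        tile[threadIdx.x] = gdata[threadIdx.x] + reg;
        __syncthreads();                                // barrier: stepping it continues all threads past it
        gdata[threadIdx.x] = tile[threadIdx.x] + scratch[threadIdx.x % 4];
    }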
11. the user with a seamless debugging environment that allows simultaneous debugging of GPU and CPU code. Standard debugging features are inherently supported for host code, and additional features have been provided to support debugging CUDA code. cuda-gdb is supported on 32-bit and 64-bit Linux.

Note: All information contained within this document is subject to change.

What's New in Version 3.1
In this latest cuda-gdb version, the following improvements have been made:
> cuda-gdb now displays kernel launch and kernel termination messages, including the kernel id, the kernel name, and the device on which the kernel is launched or terminated.
> cuda-gdb now displays the device frame stack separately from the host frame stack, instead of being unified in a single call stack.
> The cuda-gdb commands cuda block and cuda thread now accept more flexible notations: instead of (x,y) for cuda block and (x,y,z) for cuda thread, x,y and x,y,z are accepted, respectively.
> New cuda-gdb commands:
- info cuda kernels: displays the list of currently active kernels and the device on which they run. See "info cuda kernels".
- cuda kernel <n>: without an argument, displays the kernel in focus; with an argument, switches focus to kernel n, where n is the kernel id. See "To change the kernel focus".
12. > Displaying device memory in the device kernel
> Switching to any coordinate or kernel
> cuda-gdb info commands
> Breaking into running applications
> Checking Memory Errors

Debugging CUDA applications on GPU hardware in real time
The goal of cuda-gdb is to provide developers a mechanism for debugging a CUDA application on actual hardware in real time. This enables developers to verify program correctness without the potential variations introduced by simulation and emulation environments.

Extending the gdb debugging environment
GPU memory is treated as an extension to host memory, and CUDA threads and blocks are treated as extensions to host threads. Furthermore, there is no difference between cuda-gdb and gdb when debugging host code. The user can inspect either a specific host thread or a specific CUDA thread.
> To switch focus to a host thread, use the "thread N" command.
> To switch focus to a CUDA thread, use the "cuda device sm warp lane kernel grid block thread" command.
Note: It is important to use the "cuda device 0" or "cuda kernel 0" command to switch to the required device or kernel before using any of the other CUDA thread commands.

Supporting an initialization file
cuda-gdb supports an initialization file
13. 0x80, 0x40, 0xc0, 0x20, 0xa0, 0x60, 0xe0, 0x10, 0x90, 0x50, 0xd0} (remainder of the hexadecimal view of array printed above)
    (cuda-gdb) p &data
    $9 = (@global void * @parameter *) 0x10
    (cuda-gdb) p *(@global void * @parameter *) 0x10
    $10 = (@global void * @parameter) 0x100000
This verifies thread (170,0,0) is working on the correct data (170).
11. Delete the breakpoints and continue the program to completion:
    (cuda-gdb) delete b
    Delete all breakpoints? (y or n) y
    (cuda-gdb) continue
    Continuing.
    Program exited normally.
    (cuda-gdb)
This concludes the cuda-gdb walkthrough.

SUPPORTED PLATFORMS
The general platform and GPU requirements for running NVIDIA cuda-gdb are described in this section.

Host Platform Requirements
NVIDIA supports cuda-gdb on the 32-bit and 64-bit Linux distributions listed below:
> Red Hat Enterprise Linux 5.3
> Red Hat Enterprise Linux 4.8
> Fedora 10
> Novell SLED 11
> openSUSE 11.1
> Ubuntu 9.04

GPU Requirements
Debugging is supported on all CUDA-capable GPUs with a compute capability of 1.1 or later. Compute capability is a device attribute that a CUDA application can query; for more information, see the latest NVIDIA CUDA Programming Guide on the NVIDIA CUDA Zone Web site: http://www.nvidia.com/object/cuda_home.html
These GPUs have a compute capability of 1.0 and are not supported: GeForce 8800 GTS, Quadro FX 4600, GeForce 8800 GTX, Quadro FX 5600, GeForce 8800
14. (cuda-gdb) info cuda threads
    bitreverse <<<(1,1),(256,1,1)>>> (data=0x100000) at bitreverse.cu:9
The above output indicates that there is one CUDA block with 256 threads executing, and all the threads are on the same pc.
    (cuda-gdb) bt
    #0 bitreverse <<<(1,1),(256,1,1)>>> (data=0x100000) at bitreverse.cu:9

Switch to the host thread:
    (cuda-gdb) thread
    Current thread is 2 (Thread 140609798666000 (LWP 4153))
    (cuda-gdb) thread 2
    Switching to thread 2 (Thread 140609798666000 (LWP 4153))
    #0 0x0000000000400e7d in main () at bitreverse.cu:35
    35      bitreverse<<<1, N, N*sizeof(int)>>>(d);
    (cuda-gdb) bt
    #0 0x0000000000400e7d in main () at bitreverse.cu:35

Switch to the CUDA kernel:
    (cuda-gdb) info cuda kernels
    0 Device 0 bitreverse <<<(1,1),(256,1,1)>>> (data=0x100000) at bitreverse.cu:9
    (cuda-gdb) cuda kernel 0
    Switching to CUDA Kernel 0 <<<(0,0),(0,0,0)>>>
    #0 bitreverse <<<(1,1),(256,1,1)>>> (data=0x100000) at bitreverse.cu:9
    9       unsigned int *idata = (unsigned int*)data;
    (cuda-gdb) bt
    #0 bitreverse <<<(1,1),(256,1,1)>>> (data=0x100000) at bitreverse.cu:9
The above output indicates that the host thread of focus has LWP ID 9146 and the current CUDA thread
15. NVIDIA CUDA-GDB: NVIDIA CUDA Debugger

TABLE OF CONTENTS
1. Introduction
   cuda-gdb: The NVIDIA CUDA Debugger
   What's New in Version 3.1
2. cuda-gdb Features and Extensions
   Debugging CUDA applications on GPU hardware in real time
   Extending the gdb debugging environment
   Supporting an initialization file
   Pausing CUDA execution at any function symbol or source file line number
   Single stepping individual warps
   Displaying device memory in the device kernel
   Variable Storage and Accessibility
   Switching to any coordinate or kernel
   Inspecting the coordinates or kernel
   Changing the coordinate or kernel focus
   Getting Help
   cuda-gdb info commands
   Breaking into running applications
   Checking Memory Errors
16. has block coordinates (0,0) and thread coordinates (0,0,0).
7. Corroborate this information by printing the block and thread indices:
    (cuda-gdb) print blockIdx
    $1 = {x = 0, y = 0}
    (cuda-gdb) print threadIdx
    $2 = {x = 0, y = 0, z = 0}
8. The grid and block dimensions can also be printed:
    (cuda-gdb) print gridDim
    $3 = {x = 1, y = 1}
    (cuda-gdb) print blockDim
    $4 = {x = 256, y = 1, z = 1}
9. Since thread (0,0,0) reverses the value of 0, switch to a different thread to show more interesting data:
    (cuda-gdb) cuda thread (170,0,0)
    Switching to CUDA Kernel 0 (device 0, sm 0, warp 5, lane 0, grid 1, block (0,0), thread (170,0,0))
10. Advance kernel execution and verify some data:
    (cuda-gdb) n
    12      array[threadIdx.x] = idata[threadIdx.x];
    (cuda-gdb) n
    14      array[threadIdx.x] = ((0xf0f0f0f0 & array[threadIdx.x]) >> 4) |
    (cuda-gdb) n
    16      array[threadIdx.x] = ((0xcccccccc & array[threadIdx.x]) >> 2) |
    (cuda-gdb) n
    18      array[threadIdx.x] = ((0xaaaaaaaa & array[threadIdx.x]) >> 1) |
    (cuda-gdb) n
    Breakpoint 3, bitreverse <<<(1,1),(256,1,1)>>> (data=0x100000) at bitreverse.cu:21
    21      idata[threadIdx.x] = array[threadIdx.x];
    (cuda-gdb) p array[0]@12
    $5 = {0, 128, 64, 192, 32, 160, 96, 224, 16, 144, 80, 208}
    (cuda-gdb) p/x array[0]@12
    $6 = {0x0, 0x
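As a hypothetical companion check, not part of the original walkthrough, the values printed from array can be verified on the host with a plain C reference implementation of the same 8-bit reversal, for example by comparing it against odata after the kernel finishes:

    #include <stdio.h>

    // Reference 8-bit bit reversal, mirroring the mask-and-shift steps in bitreverse().
    static unsigned int reverse8(unsigned int x) {
        x = ((0xf0f0f0f0 & x) >> 4) | ((0x0f0f0f0f & x) << 4);
        x = ((0xcccccccc & x) >> 2) | ((0x33333333 & x) << 2);
        x = ((0xaaaaaaaa & x) >> 1) | ((0x55555555 & x) << 1);
        return x & 0xff;   // only the low 8 bits are meaningful for inputs 0..255
    }

    int main(void) {
        // Thread (170,0,0) works on input value 170; its reversed value should be 85.
        printf("reverse8(170) = %u\n", reverse8(170));
        printf("reverse8(3)   = %u\n", reverse8(3));    // expected 192
        return 0;
    }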
17. The initialization file must reside in your home directory: ~/.cuda-gdbinit. This file accepts any cuda-gdb command or extension as input to be processed when the cuda-gdb command is executed. It is just like the .gdbinit file used by standard versions of gdb, only renamed.

Pausing CUDA execution at any function symbol or source file line number
cuda-gdb supports setting breakpoints at any host or device function residing in a CUDA application by using the function symbol name or the source file line number. This can be accomplished in the same way for either host or device code. For example, if the kernel's function name is mykernel_main, the break command is as follows:
    (cuda-gdb) break mykernel_main
The above command sets a breakpoint at a particular device location (the address of mykernel_main) and forces all resident GPU threads to stop at this location. There is currently no method to stop only certain threads or warps at a given breakpoint.

Single stepping individual warps
cuda-gdb supports stepping GPU code at the finest granularity of a warp. This means that typing next or step from the cuda-gdb command line, when in the focus of device code, advances all threads in the same warp as the current thread of focus. In order to advance the execution of more than one warp, a breakpoint must be set at the desired location, and then the application execution continued.
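To make the warp granularity concrete, here is a hypothetical two-warp kernel (not from this manual). When stepping it under cuda-gdb, next or step advances only the 32 threads in the warp of the thread in focus; the other warp stays where it is until execution is continued to a breakpoint.

    // 64 threads per block = 2 warps of 32 on current GPUs.
    __global__ void two_warps(int *out) {
        int i = threadIdx.x;
        int warp = i / 32;          // 0 for lanes 0-31, 1 for lanes 32-63
        out[i] = warp * 1000 + i;   // distinct value per thread
    }
    // Example launch: two_warps<<<1, 64>>>(out);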
18. 3. Installation and Debug Compilation
   Installation Instructions
   Compiling for Debugging
   Compiling/Debugging for Fermi and Tesla GPUs
   Compiling for Fermi GPUs
   Compiling for Fermi and Tesla GPUs
4. cuda-gdb Walk-through
   bitreverse.cu Source Code
   Walking Through the Code
Appendix A. Supported Platforms
   Host Platform Requirements
   GPU Requirements
Appendix B. Known Issues
   Graphics Driver

INTRODUCTION
This document introduces cuda-gdb, the NVIDIA CUDA debugger, and describes what is new in version 3.1.

cuda-gdb: The NVIDIA CUDA Debugger
cuda-gdb is an extension to the standard i386/AMD64 port of gdb, the GNU Project debugger, version 6.6. It is designed to present
19. > makes the compiler include symbolic debugging information in the executable

Compiling/Debugging for Fermi and Tesla GPUs

Compiling for Fermi GPUs
If you are using the latest Fermi board, add the following flags to target Fermi output when compiling the application:
    -gencode arch=compute_20,code=sm_20

Compiling for Fermi and Tesla GPUs
If you are targeting both Fermi and Tesla GPUs, include these two flags:
    -gencode arch=compute_20,code=sm_20 -gencode arch=compute_10,code=sm_10

CUDA-GDB WALK-THROUGH
This chapter presents a cuda-gdb walk-through of eleven steps, based on the source code bitreverse.cu, which performs a simple 8-bit bit reversal on a data set.

bitreverse.cu Source Code
    1   #include <stdio.h>
    2   #include <stdlib.h>
    3
    4   // Simple 8-bit bit reversal Compute test
    5
    6   #define N 256
    7
    8   __global__ void bitreverse(void *data) {
    9       unsigned int *idata = (unsigned int*)data;
    10      extern __shared__ int array[];
    11
    12      array[threadIdx.x] = idata[threadIdx.x];
    13
    14      array[threadIdx.x] = ((0xf0f0f0f0 & array[threadIdx.x]) >> 4) |
    15                           ((0x0f0f0f0f & array[threadIdx.x]) << 4);
    16      array[threadIdx.x] = ((0xcccccccc & array[threadIdx.x]) >> 2) |
    17                           ((0x33333333 & array[threadIdx.x]) << 2);
    18      array[threadIdx.x] = ((0xaaaaaaaa & array[threadIdx.x]) >> 1) |
20. This command takes the detailed information one level deeper by displaying lane information for all the threads in the warps. The lane header includes all the active threads per warp. It includes the program counter in addition to the thread index within the block to which it belongs. The example below lists the 32 active lanes on the first active warp (index 0).
    (cuda-gdb) info cuda warp
    DEV: 0/1 Device Type: gt200 SM Type: sm_13 SM/WP/LN: 30/32/32 Regs/LN: 128
    SM: 0/30 valid warps: 00000000000000
    WP: 0/32 valid/active/divergent lanes: 0xffffffff 0xffffffff 0x00000000 block (0,0)
    LN: 0/32 pc 0x00000000000002b8 thread (0,0,0)
    LN: 1/32 pc 0x00000000000002b8 thread (1,0,0)
    LN: 2/32 pc 0x00000000000002b8 thread (2,0,0)
    LN: 3/32 pc 0x00000000000002b8 thread (3,0,0)
    LN: 4/32 pc 0x00000000000002b8 thread (4,0,0)
    LN: 5/32 pc 0x00000000000002b8 thread (5,0,0)
    LN: 6/32 pc 0x00000000000002b8 thread (6,0,0)
    LN: 7/32 pc 0x00000000000002b8 thread (7,0,0)
    LN: 8/32 pc 0x00000000000002b8 thread (8,0,0)
    LN: 9/32 pc 0x00000000000002b8 thread (9,0,0)
    LN: 10/32 pc 0x00000000000002b8 thread (10,0,0)
    LN: 11/32 pc 0x00000000000002b8 thread (11,0,0)
    LN: 12/32 pc 0x00000000000002b8 thread (12,0,0)
    LN: 13/32 pc 0x00000000000002b8 thread (13,0,0)
    LN: 14/32 pc 0x00000000000002b8 thread (14,0,0)
    LN: 15/32 pc 0x00000000000002b8 thread (15,0,0)
    LN: 16/32 pc 0x00000000000002b8 thread (16,0,0)
    LN: 17/32 pc 0x00000000000002b8 thread (17,0,0)
    LN: 18/32 pc 0x00000000000002b8 thread (18,0,0)
    LN: 19/32 pc 0x00000000000002b8 thread (19,0,0)
    LN: 20/32 pc 0x00000000000002b8 thread (20,0,0)
    LN: 21/32 pc 0x00000000000002b8 thread (21,0,0)
    LN: 22/32 pc 0x00000000000002b8 thread (22,0,0)
    LN: 23/32 pc 0x00000000000002b8 thread (23,0,0)
    LN: 24/32 pc 0x00000000000002b8 thread (24,0,0)
21. > Multi-threaded applications may not work.
> Device allocations larger than 100 MB on Tesla GPUs, and larger than 32 MB on Fermi GPUs, may not be accessible in the debugger.
> Breakpoints in divergent code may not behave as expected.
> Debugging applications using textures is not supported. CUDA-GDB may output the following error message when setting breakpoints in kernels using textures:
    Cannot access memory at address 0x0

Notice
ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.
Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice.
22. This publication supersedes and replaces all other information previously supplied. NVIDIA Corporation products are not authorized as critical components in life support devices or systems without express written approval of NVIDIA Corporation.

Trademarks
NVIDIA, the NVIDIA logo, NVIDIA nForce, GeForce, NVIDIA Quadro, NVDVD, NVIDIA Personal Cinema, NVIDIA Soundstorm, Vanta, TNT2, TNT, RIVA, RIVA TNT, VOODOO, VOODOO GRAPHICS, WAVEBAY, Accuview Antialiasing, Detonator, Digital Vibrance Control, ForceWare, NVRotate, NVSensor, NVSync, PowerMizer, Quincunx Antialiasing, Sceneshare, See What You've Been Missing, StreamThru, SuperStability, T-BUFFER, The Way It's Meant to be Played Logo, TwinBank, TwinView and the Video & Nth Superscript Design Logo are registered trademarks or trademarks of NVIDIA Corporation in the United States and/or other countries. Other company and product names may be trademarks or registered trademarks of the respective owners with which they are associated.

Copyright
© 2007-2010 NVIDIA Corporation. All rights reserved.

www.nvidia.com
23. LN: 25/32 pc 0x00000000000002b8 thread (25,0,0)
    LN: 26/32 pc 0x00000000000002b8 thread (26,0,0)
    LN: 27/32 pc 0x00000000000002b8 thread (27,0,0)
    LN: 28/32 pc 0x00000000000002b8 thread (28,0,0)
    LN: 29/32 pc 0x00000000000002b8 thread (29,0,0)
    LN: 30/32 pc 0x00000000000002b8 thread (30,0,0)
    LN: 31/32 pc 0x00000000000002b8 thread (31,0,0)

info cuda lane
This command displays information per thread level, for every thread, if you are not interested in the warp-level information.
    (cuda-gdb) info cuda lane
    DEV: 0/1 Device Type: gt200 SM Type: sm_13 SM/WP/LN: 30/32/32 Regs/LN: 128
    SM: 0/30 valid warps: 00000000000000
    WP: 0/32 valid/active/divergent lanes: 0xffffffff 0xffffffff 0x00000000 block (0,0)
    LN: 0/32 pc 0x00000000000001b8 thread (0,0,0)

info cuda kernels
This command displays the list of currently active kernels and the device on which they run. In the following output example, the
24. (cuda-gdb) b bitreverse
    Breakpoint 2 at 0x40204f: file bitreverse.cu, line 8.
    (cuda-gdb) b 21
    Breakpoint 3 at 0x40205b: file bitreverse.cu, line 21.
4. Run the CUDA application, and it executes until it reaches the first breakpoint (main) set in step 3.
    (cuda-gdb) r
    Starting program: /old/ssalian-local/src/rel/gpgpu/toolkit/r3.1/bin/x86_64_Linux_debug/bitreverse
    [Thread debugging using libthread_db enabled]
    [New process 4153]
    [New Thread 140609798666000 (LWP 4153)]
    [Switching to Thread 140609798666000 (LWP 4153)]
    Breakpoint 1, main () at bitreverse.cu:25
    25      void *d = NULL; int i;
5. At this point, commands can be entered to advance execution or to print the program state. For this walkthrough, continue to the device kernel.
    (cuda-gdb) c
    Continuing.
    Breakpoint 3 at 0x1e30910: file bitreverse.cu, line 21.
    [Launch of CUDA Kernel 0 on Device 0]
    [Switching to CUDA Kernel 0 <<<(0,0),(0,0,0)>>>]
    Breakpoint 2, bitreverse <<<(1,1),(256,1,1)>>> (data=0x100000) at bitreverse.cu:9
    9       unsigned int *idata = (unsigned int*)data;
cuda-gdb has detected that a CUDA device kernel has been reached, so it prints the current CUDA thread of focus.
6. Verify the CUDA thread of focus with the info cuda threads command, and switch between the host thread and the CUDA threads.