GPUmat User Guide

Contents

1. 3.6 Indexed references

   Table 3.10: subsasgn performance analysis

   N   Operation          CPU        GPU ver. 0.23   GPU ver. 0.22   GPU assign
   1   A(1:end) = B       0.007636   0.0126          0.01822         0.000382
   2   A(1:10) = B        0.00006    0.000638        0.000333        0.000327
   3   A(:) = B           0.003462   0.000706        0.000338        0.000371
   4   A(1:2:end) = B     0.004054   0.006677        0.030853        0.000364
   5   A(end:-5:1) = B    0.002161   0.003077        0.018304        0.000318
   6   A(end:-5:1) = B    0.001726   0.000756        0.000904        0.000318
   7   A(G) = B           0.000291   0.000658        0.003723        0.000356

   Each Matlab indexing expression on the host arrays Ah and Bh has an equivalent slice call on the GPU variables. Ranges are passed as [start, stride, end] vectors:

     A = slice(B, [1,1,END]);   % Matlab syntax: Ah = Bh(1:end)
     A = slice(B, [1,1,10]);    % Matlab syntax: Ah = Bh(1:10)

   Further Matlab/slice pairs follow the same pattern, using:

     A = rand(100, GPUsingle);
     B = rand(10, 10, GPUsingle);

   GPUmat Guide Version 0.27, Copyright gp-you.org. Chapter 3: GPUmat overview.

   Indexed assignment has an equivalent assign call. With

     A = rand(100, GPUsingle);
     B = rand(4, 10, GPUsingle);
     Ah = single(A);
     Bh = single(B);

   a Matlab assignment from Bh into an indexed region of Ah corresponds to a call of the form assign(1, A, B, ...) with one range argument per dimension.

   3.7 GPUmat functions

   GPUmat…
2. >> whos
     Name   Size        Bytes   Class       Attributes
     A      1x1000000   924     GPUsingle
     ans    1x1         4       uint32

   3.9 Low level GPU memory management

   Memory management using high level functions is explained in section 3.8.

   Memory management methods summary:
     GPUallocVector   Allocates a variable on GPU memory

   GPU variables are managed in the following way:
   - The GPUsingle/GPUdouble class implements a destructor which takes care of clearing unused memory regions. There is no need to explicitly clean up the GPU memory. If necessary, it can be done using the Matlab clear command.
   - If the user creates a Matlab pointer to the GPU memory using low level functions, the memory is not automatically cleaned when the variable is no longer used. In this case the user must manually clean the GPU memory.

   These concepts are explained in the next sections.

   3.9.1 Memory management using the GPU classes

   The following code shows how to allocate and delete a GPUsingle or GPUdouble:

     A = rand(100,100,GPUsingle);
     clear A
     B = GPUsingle();        % creates empty GPUsingle
     setReal(B);             % REAL type
     setSize(B,[100 100]);   % must set GPUsingle size
     GPUallocVector(B);      % allocate GPU memory
     clear B

     if GPUisDoublePrecision
       A = rand(100,100,GPUdouble);
       clear A
       B = GPUdouble();      % creates empty GPUdouble
       setReal(B);           % REAL type
       …
3. GPUcompileStop

   Not every GPUmat function is supported in compilation mode. Check the function reference for more details.

   A Matlab variable passed to a GPUmat function is hard coded if it is not defined in GPUcompileStart as an input parameter. For example:

     A = randn(5,5,GPUsingle);
     GPUcompileStart('code_ex2','-f',A)
     assign(1, A, single(1), ':')
     GPUcompileStop

   In the above code all the arguments of the function assign are hard coded except A. The function code_ex2 always performs the same operation on the input argument. For example:

     >> A = randn(3,3,GPUsingle)
     ans =
         0.3848  1.0992  0.4760
         0.3257  0.6532  2.0516
         1.2963  0.5051  0.4483
     Single precision REAL GPU type.

     >> code_ex2(A)
     >> A
     ans =
         1  1  1
         1  1  1
         1  1  1
     Single precision REAL GPU type.

   4.4 Limitations (Chapter 4: GPUmat compiler)

   The following code is similar, but allows the user to define the arguments of the assign function:

     A = randn(5,5,GPUsingle);
     a = 1; % dummy
     b = 1; % dummy
     c = 1; % dummy
     GPUcompileStart('code_ex3','-f',A,a,b,c)
     assign(1, A, a, b, c)
     GPUcompileStop

   The following command

     A = randn(3,3,GPUsingle);
     code_ex3(A, single(2), …)
     A

   generates the following output:

     ans =
         0.8776  0.6011  0.2676
         1.0336  0.6740  0.1866
         0.4198  1.0952  0.9509
4. Chapter 6: Function Reference. 6.3 High level functions - alphabetical list.

   6.3.63 permute
   permute - Permute array dimensions
   SYNTAX
     R = permute(X, ORDER)
     Argument and result types: GPUsingle, GPUdouble
   MODULE NAME
     NUMERICS
   DESCRIPTION
     R = PERMUTE(X, ORDER) rearranges the dimensions of X so that they are in the order specified by the vector ORDER.
     Compilation supported
   EXAMPLE
     A = rand(3,4,5,GPUsingle);
     B = permute(A, [3 2 1]);

   6.3.64 plus
   plus - Plus
   SYNTAX
     R = X + Y
     R = plus(X, Y)
     X - GPUsingle, GPUdouble
     Y - GPUsingle, GPUdouble
     R - GPUsingle, GPUdouble
   MODULE NAME
     NUMERICS
   DESCRIPTION
     X + Y (plus(X, Y)) adds matrices X and Y. X and Y must have the same dimensions unless one is a scalar (a 1-by-1 matrix). A scalar can be added to anything.
     Compilation supported
   EXAMPLE
     A = rand(10,GPUsingle);
     B = rand(10,GPUsingle);
     R = A + B;
     A = rand(10,GPUsingle) + i*rand(10,GPUsingle);
     B = rand(10,GPUsingle) + i*rand(10,GPUsingle);
     R = A + B;

   6.3.65 power
   power - Array power
   SYNTAX
     R = X .^ Y
     R = power(X, Y)
     X - GPUsingle, GPUdouble
     Y - GPUsingle, GPUdouble
     R - GPUsingle, GPUdouble
   MODULE NAME
     NUMERICS
   DESCRIPTION
     Z = X.^Y denotes element-by-element powers.
     Compilation supported
   EXAMPLE
     …
5. Compilation supported
   EXAMPLE
     A = GPUsingle([1 2 0 4]);
     B = GPUsingle([1 0 0 4]);
     R = zeros(size(A), GPUsingle);
     GPUeq(A, B, R);

   6.4.32 GPUexp
   GPUexp - Exponential
   SYNTAX
     GPUexp(X, R)
     X - GPUsingle, GPUdouble
     R - GPUsingle, GPUdouble
   MODULE NAME
     NUMERICS
   DESCRIPTION
     GPUexp(X, R) is equivalent to EXP(X), but the result is returned in the input parameter R.
     Compilation supported
   EXAMPLE
     X = rand(1,5,GPUsingle) + i*rand(1,5,GPUsingle);
     R = complex(zeros(size(X), GPUsingle));
     GPUexp(X, R);

   6.4.33 GPUeye
   GPUeye - Identity matrix
   SYNTAX
     GPUeye(R)
     R - GPUsingle, GPUdouble
   MODULE NAME
     NUMERICS
   DESCRIPTION
     GPUeye(R) fills the matrix R with 1's on the diagonal and zeros elsewhere.
     Compilation supported
   EXAMPLE
     X = rand(10,GPUsingle);
     GPUeye(X);

   6.4.34 GPUfill
   GPUfill - Fill a GPU variable
   SYNTAX
     GPUfill(A, offset, incr, m, p, offsetp, type)
     A - GPUsingle, GPUdoubl…
6. Compilation supported
   EXAMPLE
     A = GPUsingle([1 2 0 4]);
     B = GPUsingle([1 0 0 4]);
     R = zeros(size(B), GPUsingle);
     GPUgt(A, B, R);

   6.4.39 GPUimag
   GPUimag - Imaginary part of complex number
   SYNTAX
     GPUimag(X, R)
     X - GPUsingle, GPUdouble
     R - GPUsingle, GPUdouble
   MODULE NAME
     NUMERICS
   DESCRIPTION
     GPUimag(X, R) is equivalent to imag(X), but the result is returned in the input parameter R.
     Compilation supported
   EXAMPLE
     A = rand(10,GPUsingle) + sqrt(-1)*rand(10,GPUsingle);
     R = zeros(size(A), GPUsingle);
     GPUimag(A, R);

   6.4.40 GPUldivide
   GPUldivide - Left array divide
   SYNTAX
     GPUldivide(X, Y, R)
     X - GPUsingle, GPUdouble
     Y - GPUsingle, GPUdouble
     R - GPUsingle, GPUdouble
   MODULE NAME
     NUMERICS
   DESCRIPTION
     GPUldivide(A, B, R) is equivalent to ldivide(A, B), but the result is returned in the input parameter R.
     Compilation supported
   EXAMPLE
     A = rand(10,GPUsingle);
     B = rand(10,GPUsingle);
     R = zeros(size(B), GPUsingle);
     GPUldivide(A, B, R);

   6.4.41 GPUle
   GPUle - Less than or equal
   SYNTAX
     GPUle(X, Y, R)
7.   X - GPUsingle, GPUdouble
     Y - GPUsingle, GPUdouble
     R - GPUsingle, GPUdouble
   MODULE NAME
     NUMERICS
   DESCRIPTION
     GPUle(A, B, R) is equivalent to le(A, B), but the result is returned in the input parameter R.
     Compilation supported
   EXAMPLE
     A = GPUsingle([1 2 0 4]);
     B = GPUsingle([1 0 0 4]);
     R = zeros(size(A), GPUsingle);
     GPUle(A, B, R);

   6.4.42 GPUlog
   GPUlog - Natural logarithm
   SYNTAX
     GPUlog(X, R)
     X - GPUsingle, GPUdouble
     R - GPUsingle, GPUdouble
   MODULE NAME
     NUMERICS
   DESCRIPTION
     GPUlog(X, R) is equivalent to LOG(X), but the result is returned in the input parameter R.
     Compilation supported
   EXAMPLE
     X = rand(10,GPUsingle);
     R = zeros(size(X), GPUsingle);
     GPUlog(X, R);

   6.4.43 GPUlog10
   GPUlog10 - Common (base 10) logarithm
   SYNTAX
     GPUlog10(X, R)
     X - GPUsingle, GPUdouble
     R - GPUsingle, GPUdouble
   MODULE NAME
     NUMERICS
   DESCRIPTION
     GPUlog10(X, R) is equivalent to LOG10(X), but the result is returned in the input parameter R.
     Compilation supported
   EXAMPLE
     X = rand(10,GPUsingle);
     R = zeros(size(X), GPUsingle);
     GPUlog10(X, R);
8. setReal - Set a GPU variable as real
   SYNTAX
     setReal(A)
     A - GPU variable
   MODULE NAME
     na
   DESCRIPTION
     setReal(P) sets the GPU variable P as real. It should be called before using GPUallocVector.
     Compilation not supported
   EXAMPLE
     A = GPUsingle();
     setSize(A,[10 10]);
     setReal(A);
     GPUallocVector(A);

   6.3.72 setSize
   setSize - Set GPU variable size
   SYNTAX
     setSize(A, SIZE)
     A - GPU variable
   MODULE NAME
     na
   DESCRIPTION
     setSize(R, SIZE) sets the size of R to SIZE.
     Compilation not supported
   EXAMPLE
     A = GPUsingle();
     setSize(A,[10 10]);
     A = GPUdouble();
     setSize(A,[10 10]);

   6.3.73 sin
   sin - Sine of argument in radians
   SYNTAX
     R = sin(X)
     X - GPUsingle, GPUdouble
     R - GPUsingle, GPUdouble
   MODULE NAME
     NUMERICS
   DESCRIPTION
     SIN(X) is the sine of the elements of X.
     Compilation supported
   EXAMPLE
     X = rand(10,GPUsingle);
     R = sin(X);
     X = rand(10,GPUdouble);
     R = sin(X);
   MATLAB COMPATIBILITY
     Not implemented for complex X.

   6.3.74 single
   single - Converts a GPU variable into a Matlab single precision variable
9. 6.3.16 cosh
   cosh - Hyperbolic cosine
   SYNTAX
     R = cosh(X)
     X - GPUsingle, GPUdouble
     R - GPUsingle, GPUdouble
   MODULE NAME
     NUMERICS
   DESCRIPTION
     COSH(X) is the hyperbolic cosine of the elements of X.
     Compilation supported
   EXAMPLE
     X = rand(10,GPUsingle);
     R = cosh(X);
   MATLAB COMPATIBILITY
     Not implemented for complex X.

   6.3.17 ctranspose
   ctranspose - Complex conjugate transpose
   SYNTAX
     R = X'
     R = ctranspose(X)
     X - GPUsingle, GPUdouble
     R - GPUsingle, GPUdouble
   MODULE NAME
     NUMERICS
   DESCRIPTION
     X' is the complex conjugate transpose of X.
     Compilation supported
   EXAMPLE
     X = rand(10,GPUsingle) + i*rand(10,GPUsingle);
     R = X';
     R = ctranspose(X);

   6.3.18 display
   display - Display GPU variable
   SYNTAX
     display(X)
     X - GPUsingle, GPUdouble
   MODULE NAME
     na
   DESCRIPTION
     Prints GPUsingle information. DISPLAY(X) is called for the object X when the semicolon is not used to terminate a statement.
     Compilation supported
   EXAMPLE
     A = rand(10,GPUsingle);
     display(A)
     A

   6.3.19 doubl…
10. … numerics module.

    6.4.71 GPUzeros
    GPUzeros - GPU zeros array
    SYNTAX
      GPUzeros(R)
      R - GPUsingle, GPUdouble
    MODULE NAME
      NUMERICS
    DESCRIPTION
      GPUzeros(R) sets all the elements of R to zero.
      Compilation supported
    EXAMPLE
      A = rand(5,GPUsingle);
      GPUzeros(A);

    6.4.72 memCpyDtoD
    memCpyDtoD - Device to device memory copy
    SYNTAX
      memCpyDtoD(R, X, index, count)
      R - GPUsingle, GPUdouble
      X - GPUsingle, GPUdouble
    MODULE NAME
      NUMERICS
    DESCRIPTION
      memCpyDtoD(R, X, index, count) copies count elements from X to R(index).
      Compilation supported
    EXAMPLE
      R = rand(100,100,GPUsingle);
      X = rand(100,100,GPUsingle);
      memCpyDtoD(R, X, 100, 20);

    6.4.73 memCpyHtoD
    memCpyHtoD - Host to device memory copy
    SYNTAX
      memCpyHtoD(R, X, index, count)
      R - GPUsingle, GPUdouble
      X - Matlab array
    MODULE NAME
      NUMERICS
    DESCRIPTION
      memCpyHtoD(R, X, index, count) copies count elements from the Matlab variable X (CPU) to R(index).
      Compilation supported
    EXAMPLE
      R = rand(100,100,GPUsingle);
      X = single(rand(100,100));
      memCpyHtoD…
11. … GPU single complex.

    2. Using real, imag, complex, GPUreal, GPUimag, GPUcomplex:

      A = GPUsingle([1 2 3 4 5] + sqrt(-1)*[6 7 8 9 10]);
      RE = real(A);
      IM = imag(A);
      % same as above code, with low level functions
      RE = zeros(size(A), GPUsingle);
      IM = zeros(size(A), GPUsingle);
      GPUreal(A, RE);
      GPUimag(A, IM);
      % convert to complex
      D = complex(RE, IM);
      % same as above code, with low level functions
      E = complex(zeros(size(RE), GPUsingle));
      GPUcomplex(RE, IM, E);

    3. Multiply a real array by the imaginary unit:

      Gh = rand(10);                 % Matlab real variable
      G = GPUsingle(Gh)*sqrt(-1);    % sqrt(-1) gives imaginary unit

    3.11 Coding guidelines

    To maximize execution performance, keep in mind the following points:
    - Memory transfers: avoid excessive memory transfers between GPU and CPU memory.
    - Vectorized operations and for-loops: the best performance in both Matlab and GPUmat can be achieved by using vectorized operations and avoiding for-loops. More information can be found at the following link: Matlab Code Vectorization Guide.
    - Use low level functions to avoid the creation of too many intermediate and temporary variables. This can speed up the code or help to solve out-of-GPU-memory errors.
    - Compile the function using the GPUmat compiler. The compiler can be used to record GPU functions into a new Matlab function. Plea…
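The first two coding guidelines (few transfers, vectorized operations) can be sketched as follows. This is an illustrative sketch only, not taken from the guide: it assumes GPUmat has been started and a CUDA device is available, and the array length N and the expression being computed are arbitrary choices.

```matlab
% Sketch only: assumes GPUmat is started and GPUsingle is available.
N  = 1000;                  % arbitrary problem size (assumption)
Ah = single(rand(1, N));    % data prepared on the host

% One transfer to the GPU, vectorized arithmetic on the GPU,
% one transfer back. No for-loop and no per-element transfer.
A  = GPUsingle(Ah);         % host -> GPU
C  = exp(A) + A .* A;       % element-wise operations run on the GPU
Ch = single(C);             % GPU -> host
```

The alternative of looping over elements on the host and converting each one to a GPU variable would transfer data on every iteration, which is the pattern the guidelines advise against.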
12.   X - GPUsingle, GPUdouble
      R - GPUsingle, GPUdouble
    MODULE NAME
      NUMERICS
    DESCRIPTION
      SQRT(X) is the square root of the elements of X. NaN results are produced if X is not positive.
      Compilation supported
    EXAMPLE
      X = rand(10,GPUsingle);
      R = sqrt(X);
    MATLAB COMPATIBILITY
      Not implemented for complex X.

    6.3.79 subsref
    subsref - Subscripted reference
    SYNTAX
      R = X(I)
      X - GPUsingle, GPUdouble
      I - GPUsingle, GPUdouble, Matlab range
      R - GPUsingle, GPUdouble
    MODULE NAME
      NUMERICS
    DESCRIPTION
      A(I) (subsref) is an array formed from the elements of A specified by the subscript vector I. The resulting array is the same size as I, except for the special case where A and I are both vectors. In this case, A(I) has the same number of elements as I but has the orientation of A.
      Compilation not supported
    EXAMPLE
      A = GPUsingle([1 2 3 4 5]);
      A = GPUdouble([1 2 3 4 5]);
      idx = GPUsingle([1 2]);
      B = A(idx);

    6.3.80 sum
    sum - Sum of elements
    SYNTAX
      R = sum(X)
      R = sum(X, DIM)
      X - GPUsingle, GPUdouble
      DIM - integer
      R - GPUsingle, GPUdouble
    MODULE NAME
      na
    DESCRIPTION
      S = SUM(X) is the sum of the elements of the vector X. S = SUM(X, …
13. …
    6.3.79 subsref
    6.3.80 sum
    6.3.81 tan
    6.3.82 tanh
    6.3.83 times
    6.3.84 unpackfC2C
    6.3.85 unpackfC2R
    6.3.86 vertcat
    6.3.87 zeros
    6.4 Low level functions - alphabetical list
    6.4.1 cuCheckStatus
    6.4.2 cudaCheckStatus
    6.4.3 cudaGetDeviceCount
    6.4.4 cudaGetDeviceMajorMinor
    6.4.5 cudaGetDeviceMemory
    6.4.6 cudaGetDeviceMultProcCount
    6.4.7 cudaGetLastError
    6.4.8 cudaSetDevice
    6.4.9 cudaThreadSynchronize
    6.4.10 cufftPlan3d
    6.4.11 …
    6.4.12 cuMemGetInfo
    6.4.13 …
14. SYNTAX
      R = single(X)
      X - GPU or Matlab variable
      R - Matlab variable
    MODULE NAME
      na
    DESCRIPTION
      B = SINGLE(A) returns the contents of the GPU variable A into a single precision Matlab array.
      Compilation not supported
    EXAMPLE
      A = rand(100,GPUsingle);
      Ah = single(A);
      A = rand(100,GPUdouble);
      Ah = single(A);

    6.3.75 sinh
    sinh - Hyperbolic sine
    SYNTAX
      R = sinh(X)
      X - GPUsingle, GPUdouble
      R - GPUsingle, GPUdouble
    MODULE NAME
      NUMERICS
    DESCRIPTION
      SINH(X) is the hyperbolic sine of the elements of X.
      Compilation supported
    EXAMPLE
      X = rand(10,GPUsingle);
      R = sinh(X);
      X = rand(10,GPUdouble);
      R = sinh(X);
    MATLAB COMPATIBILITY
      Not implemented for complex X.

    6.3.76 size
    size - Size of array
    SYNTAX
      R = size(X)
      [M,N] = SIZE(X)
      [M1,M2,...,MN] = SIZE(X)
      X - GPU variable
    MODULE NAME
      NUMERICS
    DESCRIPTION
      D = SIZE(X), for M-by-N matrix X, returns the two-element row vector D = [M,N] containing the number of rows and columns in the matrix.
      Compilation not supported
    EXAMPLE
      X = rand(10,GPUsingle);
      size(X)
      X = rand(10,GPUdouble);
      size(X)
15. … are executed on the GPU. To convert the GPU variables A, B and C into the Matlab variables Ah, Bh and Ch, use the functions single and double as follows:

      %% Single precision
      A = rand(100,GPUsingle);    % A is on GPU memory
      B = rand(100,GPUsingle);    % B is on GPU memory
      C = A+B;                    % executed on GPU, C is on GPU memory
      Ah = single(A);             % Ah is on HOST, A is on GPU
      Bh = single(B);             % Bh is on HOST, B is on GPU
      Ch = single(C);             % Ch is on HOST, C is on GPU

      %% Double precision
      if GPUisDoublePrecision
        A = rand(100,GPUdouble);  % A is on GPU memory
        B = rand(100,GPUdouble);  % B is on GPU memory
        C = A+B;                  % executed on GPU, C is on GPU memory
        Ah = double(A);           % Ah is on HOST, A is on GPU
        Bh = double(B);           % Bh is on HOST, B is on GPU
        Ch = double(C);           % Ch is on HOST, C is on GPU
      end

    The following code shows a different way to initialize the arrays A and B, by using the colon function. The original Matlab code is the following:

      A = single(colon(0,1,1000));    % A is on CPU memory
      B = single(colon(0,1,1000));    % B is on CPU memory
      C = A+B;                        % executed on CPU, C is on CPU memory

    The ported GPUmat code is the following:

      A = colon(0,1,1000,GPUsingle);  % A is on GPU memory
      B = colon(0,1,1000,GPUsingle);  % B is on GPU memory
      C = A+B;                        % executed on GPU, C is on GPU memory

    The Matlab expression A = single(colon(0,1,1000)) is equivalent to A = sing…
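As a quick sanity check of the colon port described above, the GPU result can be copied back with single and compared with the CPU version. This is a sketch under the assumption that GPUmat has been started and a CUDA device is available; the variable names and the messages printed are illustrative, and no particular outcome is claimed here.

```matlab
% Sketch only: assumes GPUmat has been started.
Ah = single(colon(0, 1, 1000));      % reference range built on the CPU
A  = colon(0, 1, 1000, GPUsingle);   % same range built directly on the GPU
Bh = single(A);                      % copy the GPU array back to the host

% Both Ah and Bh are plain Matlab single arrays at this point.
if isequal(size(Ah), size(Bh)) && max(abs(Ah - Bh)) == 0
    disp('GPU and CPU ranges match');
else
    disp('GPU and CPU ranges differ');
end
```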
16.   … dev);
      if (status ~= 0)
        error('Error getting total memory');
      end
      totmem = totmem/1024/1024;
      disp(['Total memory ' num2str(totmem) ' MB']);

    6.4.6 cudaGetDeviceMultProcCount
    cudaGetDeviceMultProcCount - Returns device multi-processors count
    MODULE NAME
      na
    DESCRIPTION
      [STATUS, COUNT] = cudaGetDeviceMultProcCount(DEV) returns the number of multi-processors of the device DEV. STATUS is the result of the operation.
      Compilation not supported
    EXAMPLE
      dev = 0;
      [status, count] = cudaGetDeviceMultProcCount(dev);
      if (status ~= 0)
        error('Error getting number of multi proc');
      end
      disp(['Mult. processors ' num2str(count)]);

    6.4.7 cudaGetLastError
    cudaGetLastError - Wrapper to CUDA cudaGetLastError function
    MODULE NAME
      na
    DESCRIPTION
      [STATUS] = cudaGetLastError() returns the last error from the runtime call. STATUS is the result of the operation.
      Original function declaration:
        cudaError_t cudaGetLastError(void)
      Compilation not supported

    6.4.8 cudaSetDevice
    cudaSetDevice - Wrapper to CUDA cudaSetDevice function
    MODULE NAME
      na
    DESCRIPTION
      [STATUS] = cudaSetDevice(DEV) sets the device to DEV and returns the result of the operation…
17.   … exp(tmp1);
      clear tmp1
      C = tmp2*2.0;
      clear tmp2

    The creation of the intermediate variables tmp1 and tmp2 can be avoided by using low level functions. Some high level functions have a corresponding low level function that performs exactly the same operation without returning any value. The output vector should be passed as an input argument, as follows:

      A = rand(100,GPUsingle);       % A is on GPU
      B = rand(100,GPUsingle);       % B is on GPU
      % C = exp(A+B)*2.0, C is on GPU
      % create output vector C
      C = zeros(size(A),GPUsingle);
      GPUplus(A, B, C);
      GPUexp(C, C);
      GPUtimes(C, 2.0, C);

    In the above code the result C is created using the zeros function. C is then updated with the sum of A and B, then with exp(C), and finally it is multiplied by 2.0. At the end of the calculations C contains the result of exp(A+B)*2.0, and no intermediate temporary variable has been created. By using low level functions it is possible to avoid out-of-memory errors. Temporary variables might not be deleted immediately by the Matlab garbage collector, but in the above example only one variable, the result, is created.

    3.11.4 Matlab and GPU variables

    Operations and functions involving Matlab and GPU variables at the same time are not defined, except operations involving GPU variables and Matlab scalars. The following is an exampl…
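The low level sequence GPUplus, GPUexp, GPUtimes from section 3.11 can be cross-checked against the high level expression by copying both results to the host. This is a sketch only: it assumes GPUmat has been started, and the variable names C1, C2 and err are illustrative and do not come from the guide. Small floating point differences between the two paths are possible, so the difference is printed and not asserted to be zero.

```matlab
% Sketch only: assumes GPUmat has been started.
A = rand(100, GPUsingle);
B = rand(100, GPUsingle);

% High level version: temporaries are created for A+B and exp(A+B).
C1 = exp(A + B) .* 2.0;

% Low level version: a single preallocated output, updated in place.
C2 = zeros(size(A), GPUsingle);
GPUplus(A, B, C2);       % C2 = A + B
GPUexp(C2, C2);          % C2 = exp(C2)
GPUtimes(C2, 2.0, C2);   % C2 = C2 .* 2.0

% Compare on the host, where the standard Matlab functions apply.
err = max(max(abs(single(C1) - single(C2))));
disp(['max abs difference: ' num2str(err)]);
```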
18. 6.1.7 Numerical functions - Low level (6.1 Functions by category)

    Name            Description
    GPUabs          Absolute value
    GPUacos         Inverse cosine
    GPUacosh        Inverse hyperbolic cosine
    GPUand          Logical AND
    GPUasin         Inverse sine
    GPUasinh        Inverse hyperbolic sine
    GPUatan         Inverse tangent, result in radians
    GPUatanh        Inverse hyperbolic tangent
    GPUceil         Round towards plus infinity
    GPUconj         GPUconj(X, R) is the complex conjugate of X
    GPUcos          Cosine of argument in radians
    GPUcosh         Hyperbolic cosine
    GPUctranspose   Complex conjugate transpose
    GPUeq           Equal
    GPUexp          Exponential
    GPUfloor        Round towards minus infinity
    GPUge           Greater than or equal
    GPUgt           Greater than
    GPUldivide      Left array divide
    GPUle           Less than or equal
    GPUlog          Natural logarithm
    GPUlog10        Common (base 10) logarithm
    GPUlog1p        Compute log(1+z) accurately
    GPUlog2         Base 2 logarithm and dissect floating point number
    GPUlt           Less than
    GPUminus        Minus
    GPUmtimes       Matrix multiply
    GPUne           Not equal
    GPUnot          Logical NOT
    GPUor           Logical OR
    GPUplus         Plus
    GPUpower        Array power
    GPUrdivide      Right array divide
    GPUsin          Sine of argument in radians
    GPUtan          Tangent of argument in radians
    GPUtanh         Hyperbolic tangent
    GPUtimes        Array multiply
    GPUtranspose    Transpose
    GPUuminus       Unary minus
    resh…
19.   … 10,GPUsingle);
      Ah = single(A);
      Bh = single(B);
      Ah(1:10,1:10) = Bh;
      assign(1, A, …);
      assign(1, A, …);
      assign(1, A, single(10), [1,1,10], [1,1,10]);

    6.3.8 atan
    atan - Inverse tangent, result in radians
    SYNTAX
      R = atan(X)
      X - GPUsingle, GPUdouble
      R - GPUsingle, GPUdouble
    MODULE NAME
      NUMERICS
    DESCRIPTION
      ATAN(X) is the arctangent of the elements of X.
      Compilation supported
    EXAMPLE
      X = rand(10,GPUsingle);
      R = atan(X);
    MATLAB COMPATIBILITY
      Not implemented for complex X.

    6.3.9 atanh
    atanh - Inverse hyperbolic tangent
    SYNTAX
      R = atanh(X)
      X - GPUsingle, GPUdouble
      R - GPUsingle, GPUdouble
    MODULE NAME
      NUMERICS
    DESCRIPTION
      ATANH(X) is the inverse hyperbolic tangent of the elements of X.
      Compilation supported
    EXAMPLE
      X = rand(10,GPUsingle);
      R = atanh(X);
    MATLAB COMPATIBILITY
      Not implemented for complex X.

    6.3.10 ceil
    ceil - Round towards plus infinity
    SYNTAX
      R = ceil(X)
      X - GPUsingle, GPUdouble
      R - GPUsingle, GPUdouble
    MODULE NAME
      NUMERICS
    DESCRIPTION
      CEIL(X) rounds the elements of …
20.   … 1000,GPUdouble);  % B is a GPU variable
      end

    A = zeros(size, GPUsingle)
    A = zeros(size, GPUdouble)
    Has the same behavior as the Matlab zeros function. Creates a GPU array of zeros (single or double precision). Example:

      A = zeros(1,1000,GPUsingle);    % A is a GPU variable
      if GPUisDoublePrecision
        B = zeros(1,1000,GPUdouble);  % B is a GPU variable
      end

    A = ones(size, GPUsingle)
    A = ones(size, GPUdouble)
    Has the same behavior as the Matlab ones function. Creates a GPU array of ones (single or double precision). Example:

      A = ones(1,1000,GPUsingle);     % A is a GPU variable
      if GPUisDoublePrecision
        B = ones(1,1000,GPUdouble);   % B is a GPU variable
      end

    Some examples of GPU variable creation can be found in the file CreateGPUVariables.m, located in the example folder.

    Matlab variables and GPU variables are not converted automatically when combined, as the following code shows:

      if GPUisDoublePrecision
        A = ones(1,1000,GPUdouble);   % A is a GPU variable (double prec.)
        B = ones(1,1000);             % B is a CPU variable (double prec.)
        % A+B gives an error. The CPU variable B is not
        % automatically converted to a GPU variable:
        % C = A+B;
        % A+1 is OK. The scalar 1 is automatically converted
        % to a GPU variable:
        C = A+1;
      end

    If Matlab types and GPU types are combined, the conversion of one type to the other is not automatic, except for scalars.

    3.3 Performing calculati…
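Since a non-scalar Matlab array is not converted automatically, one way to combine it with a GPU variable is to convert it explicitly first. The following is a sketch under the assumption that GPUmat has been started and double precision is supported on the device; the names Bh and Ch are illustrative.

```matlab
% Sketch only: assumes GPUmat has been started.
if GPUisDoublePrecision
    A  = ones(1, 1000, GPUdouble);   % GPU variable
    Bh = ones(1, 1000);              % Matlab (CPU) variable

    B  = GPUdouble(Bh);              % explicit host -> GPU conversion
    C  = A + B;                      % both operands are GPU variables
    Ch = double(C);                  % copy the result back to the host
end
```

The explicit conversion makes every host to GPU transfer visible in the code, which also helps when following the guideline to keep such transfers to a minimum.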
21. Contents (continued), 6.4 Low level functions - alphabetical list:

    6.4.14 …
    6.4.15 …
    6.4.16 GPUabs
    6.4.17 GPUacos
    6.4.18 GPUacosh
    6.4.19 GPUallocVector
    6.4.20 GPUand
    6.4.21 GPUasin
    6.4.22 GPUasinh
    6.4.23 GPUatan
    6.4.24 GPUatanh
    6.4.25 GPUceil
    6.4.26 GPUcomplex
    6.4.27 GPUconj
    6.4.28 GPUcos
    6.4.29 GPUcosh
    6.4.30 GPUctranspose
    6.4.31 GPUeq
    6.4.32 GPUexp
    6.4.33 GPUeye
    6.4.34 GPUfill
    6.4.35 GPUfloor
    6.4.36 GPUge
    6.4.37 GPUgetUserModule
    6.4.38 GPUgt
    6.4.39 GPUimag
    6.4.40 GPUldivide
    6.4.41 GPUle
    6.4.42 GPUlog
    6.4.43 GPUlog10
    6.4.44 GPUlog1p
    6.4.45 GPUlog2
    6.4.46 GPUlt
    6.4.47 GPUminus
    6.4.48 GPUmtimes
    6.4.49 GPUne
    6.4.50 GPUnot
    …
22.   A = ones(5,GPUsingle);
      GPUrand(A);

    6.4.56 GPUrandn
    GPUrandn - GPU pseudorandom generator
    SYNTAX
      GPUrandn(R)
      R - GPUsingle, GPUdouble
    MODULE NAME
      RAND
    DESCRIPTION
      GPUrandn(R) returns in R a matrix containing pseudorandom values drawn from the normal distribution.
      Compilation supported
    EXAMPLE
      A = ones(5,GPUsingle);
      GPUrandn(A);

    6.4.57 GPUrdivide
    GPUrdivide - Right array divide
    SYNTAX
      GPUrdivide(X, Y, R)
      X - GPUsingle, GPUdouble
      Y - GPUsingle, GPUdouble
      R - GPUsingle, GPUdouble
    MODULE NAME
      NUMERICS
    DESCRIPTION
      GPUrdivide(X, Y, R) is equivalent to rdivide(X, Y), but the result is returned in the input parameter R.
      Compilation supported
    EXAMPLE
      A = rand(10,GPUsingle);
      B = rand(10,GPUsingle);
      R = zeros(size(A), GPUsingle);
      GPUrdivide(A, B, R);

    6.4.58 GPUreal
    GPUreal - Real part of complex number
    SYNTAX
      GPUreal(X, R)
      X - GPUsingle, GPUdouble
      R - GPUsingle, GPUdouble
    MODULE NAME
      NUMERICS
    DESCRIPTION
      GPUreal(X, R) is equivalent to real(X), but the result is returned in the input…
23. … DIM) sums along the dimension DIM.
      Note: currently the performance of sum(X,DIM) with DIM>1 is 3x or 4x better than that of sum(X,DIM) with DIM=1.
      Compilation not supported
    EXAMPLE
      X = rand(5,5,GPUsingle) + i*rand(5,5,GPUsingle);
      R = sum(X);
      E = sum(X,2);

    6.3.81 tan
    tan - Tangent of argument in radians
    SYNTAX
      R = tan(X)
      X - GPUsingle, GPUdouble
      R - GPUsingle, GPUdouble
    MODULE NAME
      NUMERICS
    DESCRIPTION
      TAN(X) is the tangent of the elements of X.
      Compilation supported
    EXAMPLE
      X = rand(10,GPUsingle);
      R = tan(X);
      X = rand(10,GPUdouble);
      R = tan(X);
    MATLAB COMPATIBILITY
      Not implemented for complex X.

    6.3.82 tanh
    tanh - Hyperbolic tangent
    SYNTAX
      R = tanh(X)
      X - GPUsingle, GPUdouble
      R - GPUsingle, GPUdouble
    MODULE NAME
      NUMERICS
    DESCRIPTION
      TANH(X) is the hyperbolic tangent of the elements of X.
      Compilation supported
    EXAMPLE
      X = rand(10,GPUsingle);
      R = tanh(X);
      X = rand(10,GPUdouble);
      R = tanh(X);
    MATLAB COMPATIBILITY
      Not implemented for complex X.

    6.3.83 times
    times - Array multiply
    SYNTAX
      R = X .* Y
      R = tim…
24. 6.2 Operators

    6.2.19 A.*B
    times - Array multiply
    SYNTAX
      R = X .* Y
      R = times(X, Y)
      X - GPUsingle, GPUdouble
      Y - GPUsingle, GPUdouble
      R - GPUsingle, GPUdouble
    DESCRIPTION
      X.*Y denotes element-by-element multiplication. X and Y must have the same dimensions unless one is a scalar. A scalar can be multiplied into anything.
      Compilation supported
    EXAMPLE
      A = rand(10,GPUsingle);
      B = rand(10,GPUsingle);
      R = A .* B;
      A = rand(10,GPUsingle) + i*rand(10,GPUsingle);
      B = rand(10,GPUsingle) + i*rand(10,GPUsingle);
      R = A .* B;
      A = rand(10,GPUdouble) + i*rand(10,GPUdouble);
      B = rand(10,GPUdouble) + i*rand(10,GPUdouble);
      R = A .* B;

    6.2.20 [A;B]
    vertcat - Vertical concatenation
    SYNTAX
      R = [X;Y]
      X - GPUsingle, GPUdouble
      Y - GPUsingle, GPUdouble
      R - GPUsingle, GPUdouble
    DESCRIPTION
      [A;B] is the vertical concatenation of matrices A and B. A and B must have the same number of columns. Any number of matrices can be concatenated within one pair of brackets.
      Compilation not supported
    EXAMPLE
      A = [zeros(10,1,GPUsingle); colon(0,1,10,GPUsingle)'];

    6.3 High level functions - alphabetical list

    6.3.1 abs
    abs - Absolute value
    SYNTAX
      R = abs(X)
      X - GPUsingle, GPUdouble
      R - GPU…
25. 6.3.86 vertcat
    vertcat - Vertical concatenation
    SYNTAX
      R = [X;Y]
      X - GPUsingle, GPUdouble
      Y - GPUsingle, GPUdouble
      R - GPUsingle, GPUdouble
    MODULE NAME
      na
    DESCRIPTION
      [A;B] is the vertical concatenation of matrices A and B. A and B must have the same number of columns. Any number of matrices can be concatenated within one pair of brackets.
      Compilation not supported
    EXAMPLE
      A = [zeros(10,1,GPUsingle); colon(0,1,10,GPUsingle)'];

    6.3.87 zeros
    zeros - GPU zeros array
    SYNTAX
      zeros(N,GPUsingle)
      zeros(M,N,GPUsingle)
      zeros([M,N],GPUsingle)
      zeros(M,N,P,...,GPUsingle)
      zeros([M N P ...],GPUsingle)
      zeros(N,GPUdouble)
      zeros(M,N,GPUdouble)
      zeros([M,N],GPUdouble)
      zeros(M,N,P,...,GPUdouble)
      zeros([M N P ...],GPUdouble)
    MODULE NAME
      NUMERICS
    DESCRIPTION
      zeros(N,GPUsingle) is an N-by-N GPU matrix of zeros. zeros(M,N,GPUsingle) or zeros([M,N],GPUsingle) is an M-by-N GPU matrix of single precision zeros. zeros(M,N,P,...,GPUsingle) or zeros([M N P ...],GPUsingle) is an M-by-N-by-P-by-... GPU array of single pre…
26. R

6.3.39 ifft
ifft - Inverse discrete Fourier transform
SYNTAX
    R = ifft(X)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    IFFT(X) is the inverse discrete Fourier transform of X.
    Compilation supported.
EXAMPLE
    X = rand(1,5,GPUsingle) + i*rand(1,5,GPUsingle);
    R = fft(X);
    X = ifft(R);

6.3.40 ifft2
ifft2 - Two-dimensional inverse discrete Fourier transform
SYNTAX
    R = ifft2(X)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    IFFT2(F) returns the two-dimensional inverse Fourier transform of matrix F.
    Compilation supported.
EXAMPLE
    X = rand(5,5,GPUsingle) + i*rand(5,5,GPUsingle);
    R = fft2(X);
    X = ifft2(R);

6.3.41 imag
imag - Imaginary part of complex number
SYNTAX
    R = imag(X)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    R = imag(X) returns the imaginary part of the elements of X.
    Compilation supported.
EXAMPLE
    A = rand(10,GPUsingle) + sqrt(-1)*rand(10,GPUsingle);
    R = imag
27. R X 100 20

6.4.74 reshape
reshape - Reshape array
SYNTAX
    R = reshape(X, m, n)
    R = reshape(X, m, n, p, ...)
    R = reshape(X, [m n p ...])
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    na
DESCRIPTION
    R = reshape(X, m, n) returns the m-by-n matrix R whose elements are taken column-wise from X. R = reshape(X, m, n, p, ...) or R = reshape(X, [m n p ...]) returns an n-dimensional array with the same elements as X but reshaped to have the size m-by-n-by-p-by-... .
    Compilation not supported.
EXAMPLE
    X = rand(30,1,GPUsingle);
    R = reshape(X, 6, 5);
    R = reshape(X, [6 5]);

6.4.75 round
round - Round towards nearest integer
SYNTAX
    R = round(X)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    ROUND(X) rounds the elements of X to the nearest integers.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle);
    R = round(X)
    X = rand(10,GPUdouble);
    R = round(X)
MATLAB COMPATIBILITY
    Not implemented for complex X.

6.4.76 setComplex
setComplex - Set a GPU variable as complex
SYNTA
28.
    X = rand(10,GPUsingle);
    R = zeros(size(X), GPUsingle);
    GPUsinh(X, R)

6.3.36 GPUsqrt
GPUsqrt - Square root
SYNTAX
    GPUsqrt(X, R)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUsqrt(X, R) is equivalent to sqrt(X), but the result is returned in the input parameter R.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle);
    R = zeros(size(X), GPUsingle);
    GPUsqrt(X, R)

6.3.37 GPUstart
GPUstart - Starts the GPU environment and loads required components
SYNTAX
    GPUstart
MODULE NAME
    na
DESCRIPTION
    Start GPU environment and load required components.
    Compilation not supported.
EXAMPLE
    GPUstart

6.3.38 gt
gt - Greater than
SYNTAX
    R = X > Y
    R = gt(X, Y)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    A > B (gt(A, B)) does element-by-element comparisons between A and B.
    Compilation supported.
EXAMPLE
    A = GPUsingle([1 2 0 4]);
    B = GPUsingle([1 0 0 4]);
    R = A > B;
    single(R)
    R = gt(A, B);
    single
29. Single precision REAL GPU type.
    ans = 2 2 2 2 2 2 2 2 2
    Single precision REAL GPU type.

Indexed assignments are not implemented. For example, the following code generates an error:

    A = randn(5,5,5,GPUsingle);
    GPUcompileStart('code_ex1','-f',A)
    3 NCR 8 sys
    GPUcompileStop(R)

    A = randn(5,5,5,GPUsingle);
    GPUcompileStart('code_ex1','-f',A)
    KGRS e yl
    GPUcompileStop

The above code can be replaced with the following:

    A = randn(5,5,5,GPUsingle);
    GPUcompileStart('code_ex1','-f',A)
    se aliea Cie EL 1 8 222 90
    GPUcompileStop(R)

    A = randn(5,5,5,GPUsingle);
    GPUcompileStart('code_ex1','-f',A)
    assi ingie CMER
    GPUcompileStop

The above example shows that native Matlab indexed reference and assignment statements have to be replaced with the functions slice or assign.

4.5 Compilation errors

4.5.1 GPUfor 1 - Unable to parse iterator
GPUmat was not able to parse the iterator. The following code contains an error:

    GPUfor jt 1 M
    GPUend

The above code generates an error of type GPUfor 1.

4.5.2 GPUfor 2 - Iterator name cannot be i or j
The variables i and j cannot be used as iterator names. The following code generates an error:

    GPUfor j=1:10
    GPUend

The above code can be modified as follows:

    GPUfor jt=1:10
    GPUend

4.5.3
30. in Chapter 6.

Chapter 2 Quick start

The most important concepts about GPUmat are the following:

- GPUmat defines the following GPU variables (or classes): (i) GPUsingle, (ii) GPUdouble. They correspond to single and double precision floating point variables respectively. We will refer to these variables as GPU variables because, although they are available from the Matlab workspace like any other Matlab variable, they are allocated on the GPU memory. Matlab variables are allocated on CPU memory.
- GPUmat defines functions and operators that are called from Matlab and executed on the GPU. These functions work with the GPUsingle or GPUdouble classes.

The next example creates two single precision variables, Ah and A, allocated on the CPU memory and on the GPU memory respectively. Ah is used to initialize A.

    Ah = single(rand(100,100)); % Ah is on CPU memory
    A  = GPUsingle(Ah);         % A is on GPU memory

In the above code the function single is used to create the single precision Matlab array Ah, and similarly the GPUsingle function is used to create a single precision GPU variable. Although it is always possible to use GPUsingle or GPUdouble to create a GPU variable, these functions perform a memory transfer from CPU memory to GPU memory: they copy the content of the CPU array to the GPU memory. It is faster if the GPU array is created directly on the GPU memory. For example it is possible to dire
31. is a GPU variable

The syntax to create a Matlab variable is very similar to the above code:

    Ah = colon(0,2,1000); % Ah is a CPU variable

Existing variables can also be used efficiently to create others. The following example shows how to create a complex GPU variable using the colon function:

    A = colon(0,2,6,GPUsingle); % A is a real GPU variable
    B = sqrt(-1)*A;             % B is a complex GPU variable
    C = 1 + B;                  % C is B with all real parts set to 1

The previous commands result in:

    >> A
    Single precision REAL GPU type.
    >> B
    ans =
       0   0 + 2.0000i   0 + 4.0000i   0 + 6.0000i
    Single precision COMPLEX GPU type.
    >> C
    ans =
       1.0000   1.0000 + 2.0000i   1.0000 + 4.0000i   1.0000 + 6.0000i

The function colon is very efficient for creating a GPU variable because array values are created directly on the GPU memory, without any data transfer between CPU and GPU.

    A = rand(size, GPUsingle)
    A = rand(size, GPUdouble)
    A = randn(size, GPUsingle)
    A = randn(size, GPUdouble)

These have the same behavior as the Matlab rand and randn functions. They create a GPU array of random numbers (single or double precision). Example:

    A = rand(1,1000,GPUsingle);    % A is a GPU variable
    if GPUisDoublePrecision
      B = rand(1,1000,GPUdouble);  % B is a GPU variable
    end
    A = randn(1,1000,GPUsingle);   % A is a GPU variable
    if GPUisDoublePrecision
      B = randn(1
32. loads the required library components
    GPUstop    Stops the GPU environment
    GPUinfo    Prints information about available CUDA capable GPUs
Table 3.1: GPU management functions

Table 3.1 shows the functions used to start GPUmat and to manage the GPU. The GPUstart and GPUstop commands are used to start and to stop GPUmat respectively. If more than one GPU is installed in the system, the user will be prompted to select the GPU device to use. The command GPUinfo prints information about the installed GPUs:

    GPUinfo
    There is 1 device supporting CUDA
    CUDA Driver Version:                     2.30
    CUDA Runtime Version:                    2.30
    Device 0: "GeForce GTX 275"
    CUDA Capability Major revision number:   1
    CUDA Capability Minor revision number:   3
    Total amount of global memory:           939196416 bytes
    Number of multiprocessors:               30
    Number of cores:                         240

3.2 Creating a GPU variable

A GPU variable is a Matlab variable that is allocated on GPU memory and is created using the Matlab classes GPUsingle or GPUdouble. The GPUsingle and GPUdouble classes are equivalent to the single and double precision real/complex types in Matlab. Functions to create a GPU variable are shown in Table 3.2 and explained in more detail in the next paragraphs. It is important to know that a memory transfer between GPU and CPU is required if the GPU variable is initialized with a Matlab array. A mem
33. ne
ne - Not equal
SYNTAX
    R = X ~= Y
    R = ne(X, Y)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    A ~= B (ne(A, B)) does element-by-element comparisons between A and B.
    Compilation supported.
EXAMPLE
    A = GPUsingle([1 2 0 4]);
    B = GPUsingle([1 0 0 4]);
    R = A ~= B;
    single(R)
    R = ne(A, B);
    single(R)

6.3.59 not
not - Logical NOT
SYNTAX
    R = ~X
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    ~A (not(A)) performs a logical NOT of input array A.
    Compilation supported.
EXAMPLE
    A = GPUsingle([1 2 0 4]);
    R = ~A;
    single(R)

6.3.60 numel
numel - Number of elements in an array or subscripted array expression
SYNTAX
    R = numel(X)
    X - GPU variable
    R - number of elements
MODULE NAME
    NUMERICS
DESCRIPTION
    N = NUMEL(A) returns the number of elements N in array A.
    Compilation not supported.
EXAMPLE
    X = rand(10,GPUsingle);
    numel(X)
    X = rand(10,GPUdouble);
    numel(X)

6.3.61 ones
ones - GPU one
34. Contents (continued)
    6.3.50 log10
    6.3.51 log1p
    6.3.52 log2
    6.3.54 minus
    6.3.55 mrdivide
    6.3.58 ne
    6.3.59 not
    6.3.60 numel
    6.3.61 ones
    6.3.63 permute
    6.3.65 power
    6.3.66 rand
    6.3.68 rdivide
    6.3.69 real
35. 6.3.22 eye
eye - Identity matrix
SYNTAX
    eye(N, CLASSNAME)
    eye(M, N, CLASSNAME)
    eye([M, N], CLASSNAME)
    eye(M, N, P, ..., CLASSNAME)
    eye([M N P ...], CLASSNAME)
    CLASSNAME - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    EYE(M, N, CLASSNAME) or EYE([M, N], CLASSNAME) is an M-by-N matrix with 1's of class CLASSNAME on the diagonal and zeros elsewhere. CLASSNAME can be GPUsingle or GPUdouble.
    Compilation supported.
EXAMPLE
    X = eye(2,3,GPUsingle)
    X = eye(4,5,GPUdouble)

6.3.23 fft
fft - Discrete Fourier transform
SYNTAX
    R = fft(X)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    FFT(X) is the discrete Fourier transform (DFT) of vector X.
    Compilation supported.
EXAMPLE
    X = rand(1,5,GPUsingle) + i*rand(1,5,GPUsingle);
    R = fft(X)

6.3.24 fft2
fft2 - Two-dimensional discrete Fourier transform
SYNTAX
    R = fft2(X)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    FFT2(X) returns the two-dimensional Fourier transform of matrix X.
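As a hedged illustration of the fft2 entry, the following sketch transforms a small complex matrix and then transforms it back. It uses only functions named in this guide (rand, fft2, ifft2, single); the variable names F, Y and err are illustrative, and the expectation that ifft2 undoes fft2 up to rounding error is an assumption based on Matlab's convention, not something this guide states.

```matlab
% Sketch only: assumes GPUstart has already been run and that the
% variable i has not been reassigned (it is used as the imaginary unit).
X = rand(5,5,GPUsingle) + i*rand(5,5,GPUsingle); % complex input created on the GPU
F = fft2(X);   % forward 2-D transform, executed on the GPU
Y = ifft2(F);  % inverse 2-D transform, result stays on the GPU

% Copy both arrays to CPU memory, then compare them with plain Matlab functions.
err = max(max(abs(single(Y) - single(X))));
```

Converting with single before the comparison keeps the check on the CPU, so it does not depend on which reduction functions GPUmat implements.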
36. Contents (continued)
    3.11.3 Reduce intermediate variables creation
    3.11.4 Matlab and GPU variables
    3.12 Performance analysis
4 GPUmat compiler
    4.1
    4.2
    4.3 System requirements
    4.4 Limitations
    4.5 Compilation errors
        4.5.1 GPUfor 1 - Unable to parse iterator
        4.5.2 GPUfor 2 - Iterator name cannot be i or j
        4.5.3 GPUfor 3 - GPUfor iterator must be a Matlab double precision variable
        4.5.4 NUMERICS 1 - Function compilation is not implemented
        4.5.5 GPUMANAGER 13 - GPUtype variable not available in compilation context
        4.5.6 GPUMANAGER 15 - Compilation stack overflow
    4.6 Not implemented functions
    4.7 Additional compilation options
5 Developer's section
6 Function Reference
    6.1 Functions by category
        6.1.1 GPU startup and management
        6.1.2 GPU variables management
        6.1.3 GPU memory management
        6.1.4 Random numbers generator - High level
        6.1.5 Random numbers generator - Low level
        6.1.6 Numerical functions - High level
        6.1.7 Numerical functions - Low level
        6.1.8 General information
        6.1.9 User defined modul
37. 6

Using the colon function to create a vector with arbitrary real increments between the elements:

    A = colon(0,0.1,0.5,GPUsingle); % A is on GPU memory

results in

    A = 0   0.1000   0.2000   0.3000   0.4000   0.5000

In the following example the function single is used to convert the GPU variable A into the Matlab variable Ch, while the function double is used to convert the double precision GPU variable B into the double precision Matlab variable Dh. Every time a GPU variable is converted into a Matlab variable, the data is copied from GPU memory to CPU memory.

    Ah = single(rand(100,100));   % Ah is on CPU memory
    A  = GPUsingle(Ah);           % Create GPU variable A
    % The following creates the variable A
    % without a CPU to GPU memory transfer
    A  = rand(100,100,GPUsingle); % Create GPU variable A
    Ch = single(A);               % Convert A (GPU) to Ch (CPU)
    if GPUisDoublePrecision
      Bh = rand(100,100);           % Bh is on CPU memory
      B  = GPUdouble(Bh);           % Create GPU variable B
      % The following creates the variable B
      % without a CPU to GPU memory transfer
      B  = rand(100,100,GPUdouble); % Create GPU variable B
      Dh = double(B);               % Convert B (GPU) to Dh (CPU)
    end

The following example shows:

- The creation of the GPU variable A, initialized with the Matlab array Ah.
- The calculation of exp(A). The execution is on the GPU and the result is stored in the GPU variable C.
- The conversion of the result C into the Matlab va
38. A

6.3.42 iscomplex
iscomplex - True for complex array
SYNTAX
    R = iscomplex(X)
    X - GPU variable
    R - logical (0 or 1)
MODULE NAME
    NUMERICS
DESCRIPTION
    ISCOMPLEX(X) returns 1 if X does have an imaginary part, and 0 otherwise.
    Compilation not supported.
EXAMPLE
    A = rand(5,GPUsingle);
    iscomplex(A)
    A = rand(5,GPUsingle) + i*rand(5,GPUsingle);
    iscomplex(A)

6.3.43 isempty
isempty - True for empty GPUsingle array
SYNTAX
    R = isempty(X)
    X - GPU variable
    R - logical (0 or 1)
MODULE NAME
    NUMERICS
DESCRIPTION
    ISEMPTY(X) returns 1 if X is an empty GPUsingle array and 0 otherwise. An empty GPUsingle array has no elements, that is prod(size(X)) == 0.
    Compilation not supported.
EXAMPLE
    A = GPUsingle();
    isempty(A)
    A = rand(5,GPUsingle) + i*rand(5,GPUsingle);
    isempty(A)

6.3.44 isreal
isreal - True for real array
SYNTAX
    R = isreal(X)
    X - GPU variable
    R - logical (0 or 1)
MODULE NAME
    NUMERICS
DESCRIPTION
    ISREAL(X) returns 1 if X does not have an imaginary part, and 0 otherwise.
    Compilation
39.
    A = rand(10,GPUsingle);
    B = 2;
    R = A .^ B;
    A = rand(10,GPUsingle) + i*rand(10,GPUsingle);
    R = A .^ B;
MATLAB COMPATIBILITY
    Implemented for REAL exponents only.

6.3.66 rand
rand - GPU pseudorandom generator
SYNTAX
    rand(N, GPUsingle)
    rand(M, N, GPUsingle)
    rand([M, N], GPUsingle)
    rand(M, N, P, ..., GPUsingle)
    rand([M, N, P, ...], GPUsingle)
    rand(N, GPUdouble)
    rand(M, N, GPUdouble)
    rand([M, N], GPUdouble)
    rand(M, N, P, ..., GPUdouble)
    rand([M, N, P, ...], GPUdouble)
MODULE NAME
    RAND
DESCRIPTION
    rand(N, GPUsingle) is an N-by-N GPU matrix of values generated with a pseudorandom generator (uniform distribution). rand(M, N, GPUsingle) or rand([M, N], GPUsingle) is an M-by-N GPU matrix. rand(M, N, P, ..., GPUsingle) or rand([M, N, P, ...], GPUsingle) is an M-by-N-by-P-by-... GPU array of single precision values. rand(M, N, P, ..., GPUdouble) or rand([M, N, P, ...], GPUdouble) is an M-by-N-by-P-by-... GPU array of double precision values.
    Compilation supported.
EXAMPLE
    A = rand(10,GPUsingle)
    B = rand(10,10,GPUsingle)
    C = rand([10 10],GPUsingle)
    A = rand(10,GPUdouble)
    B = rand(10,10,GPUdouble)
    C = rand([10 10],GPUdouble)
40. ALL

The GPUstart command is started from the directory where the library was unpacked. The GPUstart command should generate the following output in your Matlab command window:

    >> GPUstart
    Starting GPU
    There is 1 device supporting CUDA
    CUDA Driver Version:                     2.30
    CUDA Runtime Version:                    2.30
    Device 0: "GeForce GTX 275"
    CUDA Capability Major revision number:   1
    CUDA Capability Minor revision number:   3
    Total amount of global memory:           939196416 bytes
    Number of multiprocessors:               30
    Number of cores:                         240
    - CUDA compute capability 1.3
    ...done
    - Loading module EXAMPLES_CODEOPT
    - Loading module EXAMPLES_NUMERICS
      -> numerics13.cubin
    - Loading module NUMERICS
      -> numerics13.cubin

If you get an error, make sure that GPUmat is in the Matlab path, or run the diagnostic command:

    >> GPUmatSystemCheck

The above command generates a report about the system configuration:

    *** GPUmat system diagnostics
    * Running on            -> "win32"
    * GPUmat version        -> 0.2
    * GPUmat build          -> 23-Oct-2009
    * GPUmat architecture   -> "win32"
    *** ARCHITECTURE TEST
    *** GPUmat architecture test -> passed.
    *** CUDA TEST
    *** CUDA CUBLAS -> installed.
    *** CUDA CUFFT  -> installed.
    *** CUDA CUDART -> installed.

On Windows it is also necessary to have the Microsoft Visual C++ 2008 Redistributable Package installed.

GPUstart generate
41. AMPLE
    freemem = 0;
    c = 0;
    [status, freemem, c] = cuMemGetInfo(freemem, c);

6.4.13 getPtr
getPtr - Get pointer on GPU memory
SYNTAX
    R = getPtr(X)
    X - GPU variable
    R - the pointer to the GPU memory region
MODULE NAME
    NUMERICS
DESCRIPTION
    This is a low level function used to get the pointer value to the GPU memory of a GPU variable.
    Compilation not supported.
EXAMPLE
    A = rand(10,GPUsingle);
    getPtr(A)

6.4.14 getSizeOf
getSizeOf - Get the size of the GPU datatype (similar to sizeof in C)
SYNTAX
    R = getSizeOf(X)
    X - GPU variable
    R - the size of the GPU variable datatype
MODULE NAME
    NUMERICS
DESCRIPTION
    This is a low level function used to get the size of the datatype of the GPU variable.
    Compilation not supported.
EXAMPLE
    A = rand(10,GPUsingle);
    getSizeOf(A)

6.4.15 getType
getType - Get the type of the GPU variable
SYNTAX
    R = getType(X)
    X - GPU variable
    R - the type of the GPU variable
MODULE NAME
    NUMERICS
DESCRIPTION
    This is a low level function used to get the type of the GPU v
42. B to GPUsingle as follows:

    for i=1:1e6
      A = rand(3,3,GPUsingle);
      B = rand(3,3,GPUsingle);
      C = A.*B;
      % do something with C
    end

Nevertheless, matrix operations can be used instead of the for loop by creating two arrays with 3 x 3e6 elements and multiplying them element by element:

    A = rand(3,3e6,GPUsingle); % A is on GPU
    B = rand(3,3e6,GPUsingle); % B is on GPU
    C = A.*B;                  % C is on GPU

The following Matlab code performs the matrix addition C = A + B using a for loop statement:

    A = rand(100);
    B = rand(100);
    C = zeros(100);
    for i=1:size(A,1)
      for j=1:size(B,2)
        C(i,j) = A(i,j) + B(i,j);
      end
    end

To port the code to the GPU, it is suggested to use element-by-element addition instead of the for loop:

    A = rand(100,GPUsingle); % A is on GPU
    B = rand(100,GPUsingle); % B is on GPU
    C = A + B;               % C is on GPU

3.11.3 Reduce intermediate variables creation

Consider the following code:

    A = rand(100,GPUsingle); % A is on GPU
    B = rand(100,GPUsingle); % B is on GPU
    C exp A B 2 0            % C is on GPU

In the above code the calculation of C is done internally by Matlab with the following steps:

    A = rand(100,GPUsingle); % A is on GPU
    B = rand(100,GPUsingle); % B is on GPU
    % C exp A B 2 0          % C is on GPU
    tmp1 A B
    tmp2
43. CPU

3.4 Porting existing Matlab code

To port existing Matlab code, Matlab variables (except scalars) have to be converted to GPU variables. The easiest way to do this is to call GPUsingle or GPUdouble with the existing Matlab variable, but this is not the most efficient approach because it involves a memory transfer between CPU and GPU. Here is an example:

    Name      Description
    a + b     Binary addition
    a - b     Binary subtraction
    -a        Unary minus
    a.*b      Element-wise multiplication
    a*b       Matrix multiplication
    a./b      Right element-wise division
    a.\b      Left element-wise division
    a.^b      Element-wise power
    a < b     Less than
    a > b     Greater than
    a <= b    Less than or equal to
    a >= b    Greater than or equal to
    a ~= b    Not equal to
    a == b    Equality
    a & b     Logical AND
    a | b     Logical OR
    ~a        Logical NOT
    a'        Complex conjugate transpose
    a.'       Matrix transpose
Table 3.9: Operators defined for GPU variables

    Ah = [0:10:1000];   % Ah is on CPU
    A  = GPUsingle(Ah); % A is on GPU, single precision
    if GPUisDoublePrecision
      B = GPUdouble(Ah); % B is on GPU, double precision
    end

The above code can be written more efficiently using the colon function as follows:

    A = colon(0,10,1000,GPUsingle); % A is on GPU
    if GPUisDoublePrecision
      B = colon(0,10,1000,GPU
44. Compilation supported.
EXAMPLE
    X = rand(5,5,GPUsingle) + i*rand(5,5,GPUsingle);
    R = fft2(X)

6.3.25 floor
floor - Round towards minus infinity
SYNTAX
    R = floor(X)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    FLOOR(X) rounds the elements of X to the nearest integers towards minus infinity.
    Compilation supported.
EXAMPLE
    X = rand(1,5,GPUsingle);
    R = floor(X)
MATLAB COMPATIBILITY
    Not implemented for complex X.

6.3.26 ge
ge - Greater than or equal
SYNTAX
    R = X >= Y
    R = ge(X, Y)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    A >= B (ge(A, B)) does element-by-element comparisons between A and B.
    Compilation supported.
EXAMPLE
    A = GPUsingle([1 2 0 4]);
    B = GPUsingle([1 0 0 4]);
    R = A >= B;
    single(R)
    R = ge(A, B);
    single(R)

6.3.27 GPUcompileAbort
GPUcompileAbort - Aborts the GPUmat compilation
SYNTAX
    GPUcompileAbort
MODULE NAME
    na
DESCRIPTION
    Aborts the GPUmat compilation. Check the ma
45. 6.2.3 A == B (eq)
eq - Equal
SYNTAX
    R = X == Y
    R = eq(X, Y)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
DESCRIPTION
    A == B (eq(A, B)) does element-by-element comparisons between A and B.
    Compilation supported.
EXAMPLE
    A = GPUsingle([1 2 0 4]);
    B = GPUsingle([1 0 0 4]);
    R = A == B;
    single(R)
    R = eq(A, B);
    single(R)

6.2.4 A >= B (ge)
ge - Greater than or equal
SYNTAX
    R = X >= Y
    R = ge(X, Y)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
DESCRIPTION
    A >= B (ge(A, B)) does element-by-element comparisons between A and B.
    Compilation supported.
EXAMPLE
    A = GPUsingle([1 2 0 4]);
    B = GPUsingle([1 0 0 4]);
    R = A >= B;
    single(R)
    R = ge(A, B);
    single(R)

6.2.5 A > B (gt)
gt - Greater than
SYNTAX
    R = X > Y
    R = gt(X, Y)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
DESCRIPTION
    A > B (gt(A, B)) does element-by-element comparisons between A and B.
    Compilation supported.
EXAMPLE
    A = GPUsingle([1 2 0 4]);
    B = GPUsingle([1 0 0 4]);
    R = A > B;
    single(R)
    R = gt(A, B);
    single(R)
46. ESCRIPTION
    Stops GPU environment.
    Compilation not supported.

6.4.62 GPUsync
GPUsync - Wait until all GPU operations are completed
SYNTAX
    GPUsync
MODULE NAME
    na
DESCRIPTION
    Wait until all GPU operations are completed.
    Compilation supported.
EXAMPLE
    A = rand(10,GPUsingle);
    B = rand(10,GPUsingle);
    tic; A.*B; GPUsync; toc;

6.4.63 GPUtan
GPUtan - Tangent of argument in radians
SYNTAX
    GPUtan(X, R)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUtan(X, R) is equivalent to tan(X), but the result is returned in the input parameter R.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle);
    R = zeros(size(X), GPUsingle);
    GPUtan(X, R)

6.4.64 GPUtanh
GPUtanh - Hyperbolic tangent
SYNTAX
    GPUtanh(X, R)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUtanh(X, R) is equivalent to tanh(X), but the result is returned in the input parameter R.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle);
    R = zeros(size(X), GPUsingle);
    GPUtanh(X, R)
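The GPUsync and GPUtanh entries can be combined into a small timing sketch. It measures the high-level call tanh(X), which returns a new GPU variable, against the low-level call GPUtanh(X, R), which writes into a buffer allocated beforehand. The array size and the names S, tHigh and tLow are illustrative assumptions, and no particular timing outcome is implied.

```matlab
% Sketch only: assumes GPUstart has already been run.
X = rand(1000,GPUsingle);       % input created directly on the GPU
R = zeros(size(X), GPUsingle);  % result buffer, allocated once

% GPUsync is called before toc so that the measured interval covers the
% GPU work itself, as in the GPUsync example.
tic; S = tanh(X);   GPUsync; tHigh = toc;  % high-level call, new GPU variable S
tic; GPUtanh(X, R); GPUsync; tLow  = toc;  % low-level call, result written into R
```

Timing a single call is noisy, so in practice each measurement would usually be repeated several times and averaged.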
47. 6.3.77 slice
slice - Subscripted reference
SYNTAX
    R = slice(X, R1, R2, ..., RN)
    X - GPUsingle, GPUdouble
    R1, R2, ..., RN - Range
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    slice(X, R1, ..., RN) is an array formed from the elements of X specified by the ranges R1, R2, ..., RN. A range can be constructed as follows:
    - [inf, stride, sup]: defines a range between inf and sup with the specified stride. It is similar to the Matlab syntax A(inf:stride:sup). The special keyword END (please note, uppercase END) can be used.
    - ':': similar to the colon used in Matlab indexing.
    - {[i1, i2, ..., in]}: any array enclosed by brackets is considered an array of indexes, similar to A([1 2 3 4 1 2]) in Matlab.
    - i1: a single value is interpreted as an index, similar to A(10) in Matlab.
    Compilation supported.
EXAMPLE
    Bh = single(rand(100));
    B = GPUsingle(Bh);
    Ah = Bh(1:end);
    A = slice(B,[1,1,END]);
    Ah = Bh(1:10);
    A = slice(B,[1,1,10]);
    Ah = Bh([2 3 1]);
    A = slice(B,{[2,3,1]});
    Ah = Bh([2 3 1],1);
    A = slice(B,{[2,3,1]},1);
    Ah = Bh(:,:);
    A = slice(B,':',':');

6.3.78 sqrt
sqrt - Square root
SYNTAX
    R = sqrt(X)
    X
48. GPUfor 3 - GPUfor iterator must be a Matlab double precision variable
A valid iterator must be a Matlab double precision variable.

4.5.4 NUMERICS 1 - Function compilation is not implemented
Some functions cannot be used during the compilation. Please check Section 4.6 for a list of functions that are not implemented.

4.5.5 GPUMANAGER 13 - GPUtype variable not available in compilation context
A variable accessed during the compilation must be defined in the compilation context. A new variable is automatically added to the compilation context, whereas an existing variable must be declared when calling the function GPUcompileStart. For example:

    A = randn(5,5,GPUsingle);
    GPUcompileStart('code_ex4','-f',A)
    R = exp(A);
    GPUcompileStop

In the above code the variable R is created during the compilation and is automatically added to the compilation context. The variable A must be passed to the function GPUcompileStart, otherwise an error is generated.

4.5.6 GPUMANAGER 15 - Compilation stack overflow
The compiler stack is limited. This error can occur in the following cases:

- The script being compiled is too long. Try to split your code into different parts.
- Matlab for loop. If you compile a for loop (not a GPUfor loop), the GPUmat compiler ge
49. 6.4.29 GPUcosh
GPUcosh - Hyperbolic cosine
SYNTAX
    GPUcosh(X, R)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUcosh(X, R) is equivalent to COSH(X), but the result is returned in the input parameter R.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle);
    R = zeros(size(X), GPUsingle);
    GPUcosh(X, R)

6.4.30 GPUctranspose
GPUctranspose - Complex conjugate transpose
SYNTAX
    GPUctranspose(X, R)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUctranspose(X, R) is equivalent to ctranspose(X), but the result is returned in the input parameter R.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle) + i*rand(10,GPUsingle);
    R = complex(zeros(size(X), GPUsingle));
    GPUctranspose(X, R)

6.4.31 GPUeq
GPUeq - Equal
SYNTAX
    GPUeq(X, Y, R)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUeq(A, B, R) is equivalent to eq(A, B), but the result is returned in the input parameter R.
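A minimal sketch of the GPUeq calling pattern, modelled on the GPUcosh example and on the sample data used for eq elsewhere in this reference. It assumes that R must be a real GPU array of the same size as the inputs, which this entry does not state explicitly; the name Rh is illustrative.

```matlab
% Sketch only: assumes GPUstart has already been run.
A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = zeros(size(A), GPUsingle);  % preallocated result (assumed: same size as A and B)
GPUeq(A, B, R);                 % element-wise comparison, written into R
Rh = single(R);                 % copy the result to CPU memory for inspection
```

Because the output is passed in, the same R can be reused across repeated calls rather than creating a new GPU variable each time.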
50. 6.4.68 GPUuserModuleLoad
GPUuserModuleLoad - Loads CUDA (.cubin) module
SYNTAX
    GPUuserModuleLoad(module_name, filename)
    module_name - string
    filename - string
MODULE NAME
    na
DESCRIPTION
    GPUuserModuleLoad(module_name, filename) loads the CUDA .cubin module filename and assigns to it the name module_name. The module handler can be retrieved using GPUgetUserModule.
    Compilation not supported.
EXAMPLE
    %GPUuserModuleLoad('numerics','numerics.cubin')

6.4.69 GPUuserModulesInfo
GPUuserModulesInfo - Prints loaded CUDA (.cubin) modules
SYNTAX
    GPUuserModulesInfo
MODULE NAME
    na
DESCRIPTION
    GPUuserModulesInfo displays the modules loaded using GPUuserModuleLoad.
    Compilation not supported.
EXAMPLE
    %GPUuserModulesInfo

6.4.70 GPUuserModuleUnload
GPUuserModuleUnload - Unloads CUDA (.cubin) module
SYNTAX
    GPUuserModuleUnload(module_name)
    module_name - string
MODULE NAME
    na
DESCRIPTION
    GPUuserModuleUnload(module_name) unloads the module module_name.
    Compilation not supported.
EXAMPLE
    GPUuserModuleUnload
51. GPUmat User Guide
Version 0.27
December 2010

Contents

1 Introduction
    1.1 About GPUs
    1.2 System requirements
    1.3 Credits and licensing
    1.4 How to install
    1.5 Terminology
    1.6 Documentation overview
2 Quick start
    2.1 Matrix addition example
    2.2 Matrix multiplication example
    2.3 FFT calculation example
    2.4 GPUmat compiler
    2.5 Variable assignment
    2.6 Performance analysis
3 GPUmat overview
    3.1 Starting the GPU environment
    3.2 Creating a GPU variable
    3.3 Performing calculations on the GPU
    3.4 Porting existing Matlab code
    3.5 Converting a GPU variable into a Matlab variable
    3.6 Indexed references
    3.7 GPUmat functions
    3.8 GPU memory management
    3.9 Low level GPU memory management
        3.9.1 Memory management using the GPU classes
        3.9.2 Memory management using low level functions
    3.10 Complex numbers
    3.11 Coding guidelines
        3.11.1 Memory transfers
        3.11.2 Vectorized code and for loops
52. 6.3.50 log10
log10 - Common (base 10) logarithm
SYNTAX
    R = log10(X)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    LOG10(X) is the base 10 logarithm of the elements of X. NaN results are produced if X is not positive.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle);
    R = log10(X)
MATLAB COMPATIBILITY
    Not implemented for complex X.

6.3.51 log1p
log1p - Compute log(1+z) accurately
SYNTAX
    R = log1p(X)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    LOG1P(Z) computes log(1+z). Only REAL values are accepted.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle);
    R = log1p(X)
MATLAB COMPATIBILITY
    Not implemented for complex X.

6.3.52 log2
log2 - Base 2 logarithm and dissect floating point number
SYNTAX
    R = log2(X)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    Y = LOG2(X) is the base 2 logarithm of the elements of X.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle);
    R = log2(X)
MATLAB COMPATIBILITY
    Not implemented for complex X.
53. DESCRIPTION
    GPUisDoublePrecision returns 1 if the GPU supports double precision.
    Compilation supported.
EXAMPLE
    GPUisDoublePrecision

6.3.33 GPUmem
GPUmem - Returns the free memory (bytes) on the selected GPU device
SYNTAX
    GPUmem
MODULE NAME
    na
DESCRIPTION
    Returns the free memory (bytes) on the selected GPU device.
    Compilation supported.
EXAMPLE
    GPUmem
    GPUmem/1024/1024

6.3.34 GPUround
GPUround - Round towards nearest integer
SYNTAX
    GPUround(X, R)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUround(X, R) is equivalent to round(X), but the result is returned in the input parameter R.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle);
    R = zeros(size(X),GPUsingle);
    GPUround(X, R)

6.3.35 GPUsinh
GPUsinh - Hyperbolic sine
SYNTAX
    GPUsinh(X, R)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUsinh(X, R) is equivalent to sinh(X), but the result is returned in the input parameter R.
    Compilation supported.
EXAMPLE
54. 6.4.44 GPUlog1p
GPUlog1p - Compute log(1+z) accurately
SYNTAX
    GPUlog1p(X, R)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUlog1p(X, R) is equivalent to LOG1P(X), but the result is returned in the input parameter R.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle);
    R = zeros(size(X),GPUsingle);
    GPUlog1p(X, R)

6.4.45 GPUlog2
GPUlog2 - Base 2 logarithm and dissect floating point number
SYNTAX
    GPUlog2(X, R)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUlog2(X, R) is equivalent to LOG2(X), but the result is returned in the input parameter R.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle);
    R = zeros(size(X),GPUsingle);
    GPUlog2(X, R)

6.4.46 GPUlt
GPUlt - Less than
SYNTAX
    GPUlt(X, Y, R)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUlt(X, Y, R) is equivalent to lt(X, Y), but the result is returned in the input parameter R.
    Compilation supported.
EXAMPLE
    A = GPUsingle([1 2 0 4]);
    B = GPUsingle([1 0 0 4]);
    R = zeros(size(B),GPUsingle);
    GPU
55. ASINH(X) is the inverse hyperbolic sine of the elements of X.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle);
    R = asinh(X)
MATLAB COMPATIBILITY
    Not implemented for complex X.

6.3.7 assign
assign - Indexed assignment
SYNTAX
    assign(dir, P, Q, R1, R2, ..., RN)
    P - GPUsingle, GPUdouble
    Q - GPUsingle, GPUdouble, Matlab (scalar supported)
MODULE NAME
    NUMERICS
DESCRIPTION
    ASSIGN(DIR, P, Q, R1, R2, ..., RN) performs one of the following operations, depending on the value of the parameter DIR:
        DIR = 0 -> P = Q(R1, R2, ..., RN)
        DIR = 1 -> P(R1, R2, ..., RN) = Q
    R1, R2, ..., RN represents a sequence of ranges. A range can be constructed as follows:
    • [inf, stride, sup] defines a range between inf and sup with the specified stride. It is similar to the Matlab syntax A(inf:stride:sup). The special keyword END (please note: uppercase END) can be used.
    • ':' is similar to the colon used in Matlab indexing.
    • {[i1, i2, ..., in]}: any array enclosed by brackets is considered an index array, similar to A([1 2 3 4 1 2]) in Matlab.
    • i1: a single value is interpreted as an index, similar to A(10) in Matlab.
    Compilation supported.
EXAMPLE
    A = rand(100,GPUsingle);
    B = rand(10
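The range rule described for assign can be illustrated without a GPU. The following is a minimal sketch in plain Matlab, not GPUmat code: it expands one inf, stride, sup range into the index vector it stands for. The names rangeSpec, dimLen and END_MARK are hypothetical, and using Inf as a stand-in for the END keyword is an assumption made only for this illustration.

```matlab
% Sketch only: expand an (inf, stride, sup) range into explicit indexes.
% END_MARK is a hypothetical stand-in for the GPUmat END keyword.
END_MARK  = Inf;
dimLen    = 10;                  % length of the indexed dimension
rangeSpec = [2, 3, END_MARK];    % inf = 2, stride = 3, sup = END

sup = rangeSpec(3);
if isinf(sup)
    sup = dimLen;                % END resolves to the last element
end
idx = rangeSpec(1):rangeSpec(2):sup;   % same indexes as 2:3:end

% Compare with ordinary Matlab indexing on a 10-element vector.
v = 1:dimLen;
isSame = isequal(v(idx), v(2:3:end));
```

With these values the range selects every third element starting from the second one, which is what the equivalent Matlab expression v(2:3:end) selects.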
56. SYNTAX
    R = X <= Y
    R = le(X, Y)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    A <= B (le(A, B)) does element-by-element comparisons between A and B.
    Compilation supported.
EXAMPLE
    A = GPUsingle([1 2 0 4]);
    B = GPUsingle([1 0 0 4]);
    R = A <= B;
    single(R)
    R = le(A, B);
    single(R)

6.3.48 length
length - Length of vector
SYNTAX
    R = length(X)
    X - GPU variable
MODULE NAME
    NUMERICS
DESCRIPTION
    LENGTH(X) returns the length of vector X. It is equivalent to MAX(SIZE(X)) for non-empty arrays and 0 for empty ones.
    Compilation not supported.
EXAMPLE
    A = rand(5,GPUsingle);
    length(A)

6.3.49 log
log - Natural logarithm
SYNTAX
    R = log(X)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    LOG(X) is the natural logarithm of the elements of X. NaN results are produced if X is not positive.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle);
    R = log(X)
MATLAB COMPATIBILITY
    Not implemented for complex X.
57. DESCRIPTION
    GPUacosh(X, R) is equivalent to ACOSH(X), but the result is returned in the input parameter R.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle) + 1;
    R = zeros(size(X),GPUsingle);
    GPUacosh(X, R)

6.4.19 GPUallocVector
GPUallocVector - Variable allocation on GPU memory
SYNTAX
    GPUallocVector(P)
    P - GPU variable
MODULE NAME
    na
DESCRIPTION
    GPUallocVector(P) allocates the required GPU memory for P. The size of the allocated variable depends on the size of P. A complex variable is allocated as an interleaved sequence of real and imaginary values. This means that the memory size for a complex variable on the GPU is numel(P)*2*SIZE_OF_FLOAT. It is mandatory to set the size of the variable before calling GPUallocVector.
    Compilation not supported.
EXAMPLE
    A = GPUsingle();
    setSize(A,[100 100]);
    GPUallocVector(A);

    A = GPUsingle();
    setSize(A,[100 100]);
    setComplex(A);
    GPUallocVector(A);

6.4.20 GPUand
GPUand - Logical AND
SYNTAX
    GPUand(A, B, R)
    A - GPUsingle, GPUdouble
    B - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUand(A, B, R) is equivalent to A & B, but the result is returned in the input parameter
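As a worked instance of the size rule given for GPUallocVector, the sketch below computes the bytes needed by the 100-by-100 example, first as a real and then as a complex variable. It is plain Matlab arithmetic, not a GPUmat call. The value SIZE_OF_FLOAT = 4 is an assumption that holds for single precision; a double precision variable would presumably use 8.

```matlab
% Sketch: memory needed by a GPU variable whose complex form is stored
% as interleaved real/imaginary values.
SIZE_OF_FLOAT = 4;                 % assumption: single precision, 4 bytes
dims          = [100 100];
numElements   = prod(dims);        % plays the role of numel(P)

realBytes    = numElements * SIZE_OF_FLOAT;        % real variable
complexBytes = numElements * 2 * SIZE_OF_FLOAT;    % complex variable

fprintf('real: %d bytes, complex: %d bytes\n', realBytes, complexBytes);
```

The complex variable needs exactly twice the memory of the real one, which is why setComplex has to be called before GPUallocVector.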
58. 6.2.6 A <= B, le
Less than or equal
SYNTAX
    R = X <= Y
    R = le(X, Y)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
DESCRIPTION
    A <= B (le(A, B)) does element-by-element comparisons between A and B.
    Compilation supported.
EXAMPLE
    A = GPUsingle([1 2 0 4]);
    B = GPUsingle([1 0 0 4]);
    R = A <= B;
    single(R)
    R = le(A, B);
    single(R)

6.2.7 A < B, lt
Less than
SYNTAX
    R = X < Y
    R = lt(X, Y)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
DESCRIPTION
    A < B (lt(A, B)) does element-by-element comparisons between A and B.
    Compilation supported.
EXAMPLE
    A = GPUsingle([1 2 0 4]);
    B = GPUsingle([1 0 0 4]);
    R = A < B;
    single(R)
    R = lt(A, B);
    single(R)

6.2.8 A - B, minus
Minus
SYNTAX
    R = X - Y
    R = minus(X, Y)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
DESCRIPTION
    X - Y subtracts matrix Y from X. X and Y must have the same dimensions unless one is a scalar. A scalar can be subtracted from anything.
    Compilation supported.
EXAMPLE
    A = rand(10,GPUsingle);
    B = rand(10,GPUsingle);
    R = A - B
    A = rand(10,GPUdouble);
    B = rand(10,GPUdouble);
    R = A - B
59. PU memory.

The following code shows a typical situation where the GPU memory is not sufficient and some variables must be removed manually:

    A = rand(6000,3000,GPUsingle); % A is on GPU
    B = rand(6000,3000,GPUsingle); % B is on GPU
    C = rand(6000,3000,GPUsingle); % C is on GPU
    Device memory allocation error. Available memory is 65274 KB, required 70312 KB

In the above example the variable C cannot be allocated because there is not enough free GPU memory (see the error message). In this case another variable, such as A or B, must be deleted. If A and B are also needed, then the GPU card does not have enough memory to hold all the variables. To delete a variable, for example A, use the clear command as follows:

    clear A

Check the file MemoryExample.m, located in the example folder, to understand how to use the functions for memory management. The file performs the following actions:
• Displays the available GPU memory.
• Creates a GPUsingle variable on the GPU workspace and displays the available free memory.
• Cleans up the GPU variable and displays the available GPU memory once more.

A very useful Matlab command is whos, which can be used to check how many GPU variables are on the Matlab workspace. The following Matlab output shows the result of the whos command and the presence of a GPUsingle A on the Matlab workspace.
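Returning to the allocation error shown earlier in this section, the figure in the error message can be checked by hand. The sketch below is plain Matlab arithmetic and assumes 4 bytes per single precision element; the variable names are chosen only for this illustration.

```matlab
% Sketch: memory required by one 6000-by-3000 GPUsingle variable.
rows = 6000;
cols = 3000;
bytesPerElement = 4;                              % assumption: single precision

requiredBytes = rows * cols * bytesPerElement;    % 72,000,000 bytes
requiredKB    = floor(requiredBytes / 1024);      % whole kilobytes

availableKB = 65274;                              % value quoted in the error message
fits = requiredKB <= availableKB;                 % false: C cannot be allocated

fprintf('required %d KB, available %d KB\n', requiredKB, availableKB);
```

The computed requirement matches the 70312 KB reported in the error message, and it exceeds the 65274 KB that were still free after A and B had been allocated.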
60. 6.4.65 GPUtimes
GPUtimes - Array multiply
SYNTAX
    GPUtimes(X, Y, R)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUtimes(X, Y, R) is equivalent to times(X, Y), but the result is returned in the input parameter R.
    Compilation supported.
EXAMPLE
    A = rand(10,GPUsingle);
    B = rand(10,GPUsingle);
    R = zeros(size(A),GPUsingle);
    GPUtimes(A, B, R)

6.4.66 GPUtranspose
GPUtranspose - Transpose
SYNTAX
    GPUtranspose(X, R)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUtranspose(X, R) is equivalent to transpose(X), but the result is returned in the input parameter R.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle);
    R = zeros(size(X),GPUsingle);
    GPUtranspose(X, R)

6.4.67 GPUuminus
GPUuminus - Unary minus
SYNTAX
    GPUuminus(X, R)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUuminus(X, R) is equivalent to uminus(X), but the result is returned in the input parameter R.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle);
    R = zeros(size(X),GPUsingle);
    GPUuminus(X, R)
61.
    B = randn(1,5,GPUsingle);
    GPUcompileStart('myfor2', '-f', A, B)
    GPUfor it=1:5
      GPUfor jt=1:5
        assign(1, A, B, jt, it)
      GPUend
    GPUend
    GPUcompileStop

4.3 System requirements

Your system must be configured to compile Matlab mex functions. Please check the Matlab manual for more details about Building MEX Files. A valid compiler must be installed in order to compile. Under Windows we suggest Microsoft Visual C++ Express Edition, a free product from Microsoft. Under Linux we suggest the free GNU GCC compiler. To configure the compiler under Matlab, run the following command:

    mex -setup

To check from GPUmat whether the system is properly configured, run the following script after starting GPUmat:

    GPUcompileCheck

4.4 Limitations

The GPUmat compiler records GPU functions only. Matlab functions are not included in the compilation. The following are some examples:

    A = randn(5,5,5,GPUsingle);
    a = 1;
    GPUcompileStart('code_ex1', '-f', A)
    if (a)
      R = exp(A);
    else
      R = floor(A);
    end
    GPUcompileStop(R)

In the above code only one branch of the if statement is evaluated. Therefore only one command is executed on the GPU and recorded to the compiled function. The above code is equivalent to the following:

    A = randn(5,5,5,GPUsingle);
    GPUcompileStart('code_ex1', '-f', A)
    R = exp(A);
62. U to the CPU. The same happens when a GPUsingle is created and initialized using a Matlab array. Because of the limited memory bandwidth between the HOST and the GPU, the data transfer between CPU and GPU may be time consuming, and therefore its usage should be limited.

3.6 Indexed references

The elements of a GPU array can be accessed as any other Matlab array, for example:

    A = rand(50,GPUsingle); % A is on GPU
    A(1:end)
    A 1 1 10 AG 1 10 A 21 30 A B B B A

The above commands are translated in Matlab to calls to the functions subsref and subsasgn. The implementation and the source code of these functions are documented in the GPUmat User Modules Wiki on Sourceforge (see Chapter 5 for further details). The functions slice and assign can also be used to access the elements of a GPU array. They have a syntax very similar to the standard Matlab indexing, but are faster than subsref and subsasgn. Table 3.10 shows the performance analysis of the subsasgn function for different GPUmat versions, compared to the function assign and the CPU time. More details about the above tests are presented on the GPUmat User Modules Wiki.

The following are some slice and assign examples, also available in the Examples folder (file SliceAssign.m):

    Bh = single(rand(100));
    B = rand(100,GPUsingle);
    %% Matlab syntax
    Ah = Bh(1:end);
    %% Equivalent slice syntax
63. SYNTAX
    setComplex(A)
    A - GPU variable
MODULE NAME
    na
DESCRIPTION
    setComplex(P) sets the GPU variable P as complex. It should be called before using GPUallocVector.
    Compilation not supported.
EXAMPLE
    A = GPUsingle();
    setSize(A,[10 10]);
    setComplex(A);
    GPUallocVector(A);

Bibliography

[1] NVIDIA. Cuda Programming Guide. NVIDIA Corporation.
[2] Cuda. http://www.nvidia.com/object/cuda_home.html
[3] Gpgpu. http://www.gpgpu.org
64. GPUatan(X, R) is equivalent to ATAN(X), but the result is returned in the input parameter R.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle);
    R = zeros(size(X),GPUsingle);
    GPUatan(X, R)

6.4.24 GPUatanh
GPUatanh - Inverse hyperbolic tangent
SYNTAX
    GPUatanh(X, R)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUatanh(X, R) is equivalent to ATANH(X), but the result is returned in the input parameter R.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle);
    R = zeros(size(X),GPUsingle);
    GPUatanh(X, R)

6.4.25 GPUceil
GPUceil - Round towards plus infinity
SYNTAX
    GPUceil(X, R)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUceil(X, R) is equivalent to CEIL(X), but the result is returned in the input parameter R.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle);
    R = zeros(size(X),GPUsingle);
    GPUceil(X, R)

6.4.26 GPUcomplex
GPUcomplex - Construct complex data from real and imaginary components
SYNTAX
    GPUcomplex(X, R)
    GPUco
65. X to the nearest integers towards infinity.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle);
    R = ceil(X)
MATLAB COMPATIBILITY
    Not implemented for complex X.

6.3.11 clone
clone - Creates a copy of a GPUtype
SYNTAX
    R = clone(X)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    CLONE(X) creates a copy of X.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle);
    R = clone(X)

6.3.12 colon
colon - Colon
SYNTAX
    R = colon(J, K, GPUsingle)
    R = colon(J, D, K, GPUsingle)
MODULE NAME
    NUMERICS
DESCRIPTION
    COLON(J, K, GPUsingle) is the same as J:K and COLON(J, D, K, GPUsingle) is the same as J:D:K.
    J:K is the same as [J, J+1, ..., K]. J:K is empty if J > K.
    J:D:K is the same as [J, J+D, ..., J+m*D], where m = fix((K-J)/D). J:D:K is empty if D == 0, if D > 0 and J > K, or if D < 0 and J < K.
    Compilation supported.
EXAMPLE
    A = colon(1,2,10,GPUsingle)

6.3.13 complex
complex - Construct complex data from real and imaginary components
SYNTAX
    R = complex(X)
    R = c
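The J:D:K rule quoted in the colon entry can be checked with ordinary Matlab values. The sketch below is plain Matlab and does not call GPUmat; it uses the same arguments as the colon(1,2,10,GPUsingle) example, on the assumption that the GPU version follows the rule exactly as the description states.

```matlab
% Sketch: the J:D:K rule, checked with ordinary Matlab values.
J = 1; D = 2; K = 10;

m        = fix((K - J) / D);      % number of whole strides that fit
expected = J + (0:m) * D;         % [J, J+D, ..., J+m*D]
matches  = isequal(J:D:K, expected);

% Empty cases named in the description.
emptyPositiveStride = isempty(5:1:2);    % D > 0 and J > K
emptyNegativeStride = isempty(2:-1:5);   % D < 0 and J < K
```

Here the last element is J+m*D = 9, not K = 10, because the stride does not land on the upper bound.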
66. GPUne(X, Y, R) is equivalent to ne(X, Y), but the result is returned in the input parameter R.
    Compilation supported.
EXAMPLE
    A = GPUsingle([1 2 0 4]);
    B = GPUsingle([1 0 0 4]);
    R = zeros(size(B),GPUsingle);
    GPUne(A, B, R)

6.4.50 GPUnot
GPUnot - Logical NOT
SYNTAX
    GPUnot(X, R)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUnot(X, R) is equivalent to not(X), but the result is returned in the input parameter R.
    Compilation supported.
EXAMPLE
    A = GPUsingle([1 2 0 4]);
    R = zeros(size(A),GPUsingle);
    GPUnot(A, R)

6.4.51 GPUones
GPUones - GPU ones array
SYNTAX
    GPUones(R)
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUones(R) sets all the elements of R to one.
    Compilation supported.
EXAMPLE
    A = rand(5,GPUsingle);
    GPUones(A)

6.4.52 GPUor
GPUor - Logical OR
SYNTAX
    GPUor(X, Y, R)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUor(X, Y, R) is equivalent to or(X, Y), but the
67. alysis

The easiest way to evaluate performance in Matlab is to use the tic and toc commands, as follows:

    A = rand(1000,1000); % A is on CPU
    B = rand(1000,1000); % B is on CPU
    tic; A.*B; toc;      % executed on CPU

The GPU code performance can be evaluated in a similar way by using tic, toc and the GPUsync command, as follows:

    A = rand(1000,1000,GPUsingle);
    B = rand(1000,1000,GPUsingle);
    tic; A.*B; GPUsync; toc;

The following example shows a simple Matlab script that compares the execution time of the element-by-element multiplication between two matrices A and B on the GPU and on the CPU:

    N = 100:100:2000;
    timecpu = zeros(1,length(N));
    timegpu = zeros(1,length(N));
    index = 1;
    for i=N
      Ah = single(rand(i));   % CPU
      A = rand(i,GPUsingle);  % GPU
      %% Execution on GPU
      tic;
      A.*A;
      GPUsync;
      timegpu(index) = toc;
      %% Execution on CPU
      tic;
      Ah.*Ah;
      timecpu(index) = toc;
      % increase index
      index = index + 1;
    end

The above code calculates the two vectors timecpu and timegpu, which can be used to evaluate the speed-up between the GPU and the CPU as follows:

    speedup = timecpu./timegpu

Chapter 3
GPUmat overview

GPUmat functions are grouped into high level and low level functions. High level functions can be used in a similar way as existing Matlab functions, while to use low level f
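To show what the speed-up vector from the timing script contains, here is a minimal sketch in plain Matlab. The timings are made-up values chosen only for illustration, not measurements; in practice timecpu and timegpu are filled by the tic and toc calls of the script.

```matlab
% Sketch with hypothetical timings (seconds), one entry per matrix size.
N       = [100 200 300];          % hypothetical matrix sizes
timecpu = [0.02 0.08 0.32];       % made-up CPU times
timegpu = [0.01 0.02 0.04];       % made-up GPU times

speedup = timecpu ./ timegpu;     % element-wise ratio, one value per size

% Find the size at which the GPU gains the most over the CPU.
[bestSpeedup, bestIdx] = max(speedup);
fprintf('best speed-up %.1fx at N = %d\n', bestSpeedup, N(bestIdx));
```

A value above 1 means the GPU run was faster for that size; values below 1 usually indicate that the problem is too small to amortize the GPU overhead.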
68. Name                    Description
    reshape                 Reshape array
    round                   Round towards nearest integer

6.1.8 General information

    Name                    Description
    display                 Display GPU variable
    getPtr                  Get pointer on GPU memory
    getSizeOf               Get the size of the GPU datatype (similar to sizeof in C)
    getType                 Get the type of the GPU variable
    GPUisDoublePrecision    Check if GPU is double precision
    iscomplex               True for complex array
    isempty                 True for empty GPUsingle array
    isreal                  True for real array
    isscalar                True if array is a scalar
    length                  Length of vector
    ndims                   Number of dimensions
    numel                   Number of elements in an array or subscripted array expression
    size                    Size of array

6.1.9 User defined modules

    Name                    Description
    GPUgetUserModule        Returns CUDA (.cubin) module handler
    GPUuserModuleLoad       Loads CUDA .cubin module
    GPUuserModulesInfo      Prints loaded CUDA .cubin modules
    GPUuserModuleUnload     Unloads CUDA .cubin module

6.1.10 GPUmat compiler

    Name                    Description
    GPUcompileAbort         Aborts the GPUmat compilation
    GPUcompileStart         Starts the GPUmat compiler
    GPUcompileStop          Stops the GPUmat compiler

6.1.11 Complex numbers
69. variable:
        FLOAT = 0
        COMPLEX FLOAT = 1
        DOUBLE = 2
        COMPLEX DOUBLE = 3
    Compilation not supported.
EXAMPLE
    A = rand(10,GPUsingle);
    getType(A)

6.4.16 GPUabs
GPUabs - Absolute value
SYNTAX
    R = GPUabs(X, R)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUabs(X, R) is equivalent to ABS(X), but the result is returned in the input parameter R.
    Compilation supported.
EXAMPLE
    X = rand(1,5,GPUsingle) + i*rand(1,5,GPUsingle);
    R = zeros(size(X),GPUsingle);
    GPUabs(X, R)

6.4.17 GPUacos
GPUacos - Inverse cosine
SYNTAX
    GPUacos(X, R)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    GPUacos(X, R) is equivalent to ACOS(X), but the result is returned in the input parameter R.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle);
    R = zeros(size(X),GPUsingle);
    GPUacos(X, R)

6.4.18 GPUacosh
GPUacosh - Inverse hyperbolic cosine
SYNTAX
    GPUacosh(X, R)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTI
70. precision zeros.
    zeros(M,N,P,...,GPUdouble) or zeros([M N P ...],GPUdouble) is an M-by-N-by-P-by-... GPU array of double precision zeros.
    Compilation supported.
EXAMPLE
    A = zeros(10,GPUsingle)
    B = zeros(10,10,GPUsingle)
    C = zeros([10 10],GPUsingle)
    A = zeros(10,GPUdouble)
    B = zeros(10,10,GPUdouble)
    C = zeros([10 10],GPUdouble)

6.4 Low level functions - alphabetical list

6.4.1 cuCheckStatus
cuCheckStatus - Check the CUDA DRV status
MODULE NAME
    na
DESCRIPTION
    cuCheckStatus(STATUS, MSG) returns EXIT_FAILURE (1) or EXIT_SUCCESS (0) depending on the STATUS value, and throws an error with message MSG.
    Compilation not supported.
EXAMPLE
    [status] = cuInit;
    cuCheckStatus(status, 'Error initialize CUDA driver');

6.4.2 cudaCheckStatus
cudaCheckStatus - Check the CUDA run-time status
MODULE NAME
    na
DESCRIPTION
    RET = cudaCheckStatus(STATUS, MSG) returns EXIT_FAILURE (1) or EXIT_SUCCESS (0) depending on the STATUS value, and throws an error with message MSG.
    Compilation not supported.
EXAMPLE
    status = cudaGetLastError;
    cudaCheckStatus(status, 'Kernel execution error');

6.4.3 cudaGetDeviceCount
cudaGetDeviceCount - Wra
71. copy

    Name              Description
    memCpyHtoD        Host-Device memory copy
    ones              GPU ones array
    repmat            Replicate and tile an array
    setComplex        Set a GPU variable as complex
    setReal           Set a GPU variable as real
    setSize           Set GPU variable size
    single            Converts a GPU variable into a Matlab single precision variable
    zeros             GPU zeros array

6.1.3 GPU memory management

    Name              Description
    GPUallocVector    Variable allocation on GPU memory
    GPUmem            Returns the free memory (bytes) on the selected GPU device

6.1.4 Random numbers generator - High level

    Name              Description
    rand              GPU pseudorandom generator
    randn             GPU pseudorandom generator

6.1.5 Random numbers generator - Low level

    Name              Description
    GPUrand           GPU pseudorandom generator
    GPUrandn          GPU pseudorandom generator

6.1.6 Numerical functions - High level

    Name              Description
    abs               Absolute value
    acos              Inverse cosine
    acosh             Inverse hyperbolic cosine
    and               Logical AND
    asin              Inverse sine
    asinh             Inverse hyperbolic sine
    assign            Indexed assignment
    atan              Inverse tangent, result in radians
    atanh             Inverse hyperbolic tangent
    ceil              Round towards plu
72. Operator    Description
    A - B       Subtraction
    -A          Unary minus
    A .* B      Element-wise multiplication
    A * B       Matrix multiplication
    A ./ B      Right element-wise division
    A .\ B      Left element-wise division
    A .^ B      Element-wise power
    A < B       Less than
    A > B       Greater than
    A <= B      Less than or equal to
    A >= B      Greater than or equal to
    A ~= B      Not equal to
    A == B      Equality
    A & B       Logical AND
    A | B       Logical OR
    ~A          Logical NOT
    A'          Complex conjugate transpose
    A.'         Matrix transpose

6.2.1 A & B, and
Logical AND
SYNTAX
    R = A & B
    R = and(A, B)
    A - GPUsingle, GPUdouble
    B - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
DESCRIPTION
    A & B performs a logical AND of arrays A and B and returns an array containing elements set to either logical 1 (TRUE) or logical 0 (FALSE).
    Compilation supported.
EXAMPLE
    A = GPUsingle([1 3 0 4]);
    B = GPUsingle([0 1 10 2]);
    R = A & B;
    single(R)

6.2.2 A', ctranspose
Complex conjugate transpose
SYNTAX
    R = X'
    R = ctranspose(X)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
DESCRIPTION
    X' is the complex conjugate transpose of X.
    Compilation supported.
EXAMPLE
    X = rand(10,GPUsingle) + i*rand(10,GPUsingle);
    R = X'
    R = ctranspose(X)
73. ctly use the function rand as follows:

    % Ah is on CPU memory
    Ah = single(rand(100,100));
    % A is directly created on GPU memory
    A = rand(100,100,GPUsingle);

In the above code there is no memory transfer between CPU and GPU. In a similar way we can create two double precision variables, the Matlab variable Bh and the GPU variable B, as follows:

    if GPUisDoublePrecision
      Bh = rand(100,100); % Bh is on CPU memory
      B = GPUdouble(Bh);  % B is on GPU memory
    end

The optimized version of the above code, without CPU to GPU memory transfer, is the following:

    if GPUisDoublePrecision
      Bh = rand(100,100);          % Bh is on CPU memory
      B = rand(100,100,GPUdouble); % B is on GPU memory
    end

If a double precision Matlab array is used to initialize a GPUsingle variable, it is converted to a single precision variable, resulting in a loss of precision:

    Ah = rand(100,100); % Ah is on CPU memory, double precision
    A = GPUsingle(Ah);  % A is on GPU memory, single precision

During the initialization of the GPU variable A, the data in the Matlab array Ah is copied from the CPU memory to the GPU memory. The data transfer is transparent to the user.

There are several ways to create a GPU variable, as explained in Section 3.2. The command

    A = colon(0,2,6,GPUsingle) % A is on GPU memory
    if GPUisDoublePrecision
      B = colon(0,2,6,GPUdouble) % B is on GPU memory
    end

results in

    0 2 4 6
    0 2 4
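The loss of precision mentioned above, when a double precision array initializes a single precision variable, can be observed without a GPU. The sketch below is plain Matlab and uses the built-in single() as a CPU stand-in for the conversion; that the GPUsingle conversion rounds in the same way is an assumption.

```matlab
% Sketch: precision lost when a double is stored in single precision.
% Matlab's built-in single() is used here as a CPU stand-in.
Ah  = 0.1;                      % double precision value
As  = single(Ah);               % single precision copy
err = abs(double(As) - Ah);     % rounding error introduced by the conversion

fprintf('rounding error: %g\n', err);
```

The error is small but not zero, so results computed from a GPUsingle initialized this way should be compared against single precision CPU results, not against the original double values.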
74. currently implements only a subset of Matlab functions. The most important operators and numerical functions are implemented, and users with programming experience can extend the library by using the low level and high level functions that are available and documented in the library. Table 3.11 shows a short summary of implemented functions and operators.

    Implemented functions                  Example
    Matlab operators:                      A = rand(1000,GPUsingle);
    A + B, A - B, A * B, A .* B, etc.      B = rand(1000,GPUsingle);
                                           C = A + B;
    Numerical functions:                   A = rand(1000,GPUsingle);
    exp, sqrt, log, etc.                   B = rand(1000,GPUsingle);
                                           C = exp(A);
                                           D = sqrt(C) + B;
    Fast Fourier Transform                 RE = rand(1000,GPUsingle);
                                           IM = i*rand(1000,GPUsingle);
                                           C = fft(RE + IM);

    Table 3.11: Some GPUmat functions

3.8 GPU memory management

The memory is managed automatically by GPUmat. Any GPU variable is automatically destroyed, following exactly the same life cycle as any other Matlab variable. Nevertheless, the GPU memory is limited and, if necessary, the user can manually remove GPU variables by using the Matlab built-in command clear. Table 3.12 shows the functions used to manage the GPU memory.

    Name      Description
    clear     Matlab built-in command, removes the specified variables
    GPUmem    Returns available GPU memory in bytes

    Table 3.12: Functions used to manage the G
75. d
EXAMPLE
    A = rand(10,GPUsingle);
    B = A^5
    A = rand(10,GPUdouble);
    B = A^5
MATLAB COMPATIBILITY
    Supported only A^n, where n is scalar.

6.3.56 mtimes
mtimes - Matrix multiply
SYNTAX
    R = X * Y
    R = mtimes(X, Y)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
MODULE NAME
    NUMERICS
DESCRIPTION
    mtimes(X, Y) is the matrix product of X and Y.
    Compilation supported.
EXAMPLE
    A = rand(10,GPUsingle);
    B = rand(10,GPUsingle);
    R = A * B
    A = rand(10,GPUdouble);
    B = rand(10,GPUdouble);
    R = A * B
    A = rand(10,GPUsingle) + i*rand(10,GPUsingle);
    B = rand(10,GPUsingle) + i*rand(10,GPUsingle);
    R = A * B

6.3.57 ndims
ndims - Number of dimensions
SYNTAX
    R = ndims(X)
    X - GPU variable
MODULE NAME
    NUMERICS
DESCRIPTION
    N = NDIMS(X) returns the number of dimensions in the array X. The number of dimensions in an array is always greater than or equal to 2. Trailing singleton dimensions are ignored. Put simply, it is LENGTH(SIZE(X)).
    Compilation not supported.
EXAMPLE
    X = rand(10,GPUsingle);
    ndims(X)
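The ndims rules stated above (at least 2, trailing singleton dimensions ignored, equal to LENGTH(SIZE(X))) can be tried on ordinary Matlab arrays. The sketch below is plain Matlab; that GPU variables report the same values is what the description implies, but it is not verified here.

```matlab
% Sketch: ndims on ordinary Matlab arrays.
X2 = zeros(10);           % 10-by-10 matrix
X3 = zeros(2, 3, 4);      % 2-by-3-by-4 array
Xt = zeros(10, 1, 1);     % trailing singleton dimensions are dropped

n2 = ndims(X2);
n3 = ndims(X3);
nt = ndims(Xt);

% ndims is the length of the size vector.
sameAsLength = (ndims(X3) == length(size(X3)));
```

A column vector created as zeros(10,1,1) therefore still reports two dimensions, the minimum for any array.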
76. d under Windows and Linux, with Matlab ver. R2007a or newer installed. CUDA should be installed on the system. Follow the instructions on NVIDIA's CUDA website [2] to download and install the software.

1.3 Credits and licensing

Copyright gp-you.org. GPUmat is distributed as Freeware. By using GPUmat, you accept all the terms and conditions specified in the license.txt file in the GPUmat installation folder.
Please send any suggestions, questions or bug reports to gp-you@gp-you.org.

1.4 How to install

To install GPUmat, unpack the downloaded package and follow these steps:
• STEP0 (Windows): Microsoft Visual C++ 2008 Redistributable Package installation. This package is required only on Windows. You might have this package already installed. Try to run GPUmat by following steps STEP1 to STEP3. If it fails, install the C++ Redistributable by running the executable vcredist_x86.exe or vcredist_x64.exe (depending on the architecture), which you find in the etc folder in the GPUmat installation package.
• STEP1: start Matlab and change directory to the folder where the library was unpacked.
• STEP2: start GPUmat using the GPUstart command.
• STEP3 (optional, but suggested): add the library path to the Matlab path by using the File -> Set Path menu. The Matlab documentation describes how to add a new path. This step is not mandatory if the
77. double   % B is on GPU
    end

Matlab scalars are automatically converted into GPU variables, as described in previous sections.

3.5 Converting a GPU variable into a Matlab variable

Although a GPU variable is available from Matlab, its content is stored in GPU memory. Converting a GPU variable into a Matlab variable means transferring the content of the variable from GPU memory to CPU memory. The following example describes how to convert a GPU variable A into a Matlab array Ah by using the functions single and double:

    Ah = rand(10);
    A  = GPUsingle(Ah);     % A is on GPU memory
    Ch = single(A);         % Ch is on CPU memory
    if GPUisDoublePrecision
      B  = GPUdouble(Ah);   % B is on GPU memory
      Dh = double(B);       % Dh is on CPU memory
    end

To visualize the content of a GPU variable in the Matlab command window, just type its name, as for any other Matlab array:

    A = rand(5,GPUsingle)   % A is on GPU
    ans =
        0.8147 0.0975 0.1576 0.1419 0.6557
        0.9058 0.2785 0.9706 0.4218 0.0357
        0.1270 0.5469 0.9572 0.9157 0.8491
        0.9134 0.9575 0.4854 0.7922 0.9340
        0.6324 0.9649 0.8003 0.9595 0.6787
    Single precision REAL GPU type.

Every time the content of a GPUsingle is read in Matlab, the system performs a memory transfer from the GP
78. e

    Ah = rand(5);            % Ah is on CPU
    A  = rand(5,GPUsingle);  % A is on GPU
    Bh = 1;                  % Bh is on CPU
    Ah + A
    Unknown operation between double and GPUsingle
    A + Bh
    ans =
        1.8147 1.0975 1.1576 1.1419 1.6557
        1.9058 1.2785 1.9706 1.4218 1.0357
        1.1270 1.5469 1.9572 1.9157 1.8491
        1.9134 1.9575 1.4854 1.7922 1.9340
        1.6324 1.9649 1.8003 1.9595 1.6787
    Single precision REAL GPU type.

Adding Ah and A generates an error, whereas adding A and Bh is possible because Bh is a scalar. A can be converted into a Matlab variable and added to Ah or, in a similar way, Ah can be converted into a GPU variable and added to A, as follows:

    Ah = rand(5);
    A  = rand(5,GPUsingle);
    Ah + single(A);       % A converted into Matlab
    Ch = single(A);       % A converted into Matlab
    Ch = Ah + Ch;         % adding Ah and Ch
    D  = GPUsingle(Ah);   % Ah converted into GPUsingle
    D  = A + D;           % adding A and D
    A + GPUsingle(Ah);    % A added directly to GPUsingle(Ah)

3.12 Performance analysis

The easiest way to evaluate performance in Matlab is to use the tic and toc commands, as follows:

    A = rand(1000,1000);  % A is on CPU
    B = rand(1000,1000);  % B is on CPU
    tic; A.*B; toc;       % executed on CPU

The GPU code performance can be evaluated in a similar way by using tic, toc and the GPUsync command, as follows:

    A = rand(1000,1000,GPUsingle);
    B = rand(1000,1000,GPUsingle);
    tic; A.*B; GPUsy
79. e);
    GPUfloor(X, R)

6.4.36 GPUge

GPUge - Greater than or equal

SYNTAX
    GPUge(X, Y, R)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble

MODULE NAME
    NUMERICS

DESCRIPTION
    GPUge(A, B, R) is equivalent to ge(A, B), but the result is returned in the input parameter R.
    Compilation supported

EXAMPLE
    A = GPUsingle([1 2 0 4]);
    B = GPUsingle([1 0 0 4]);
    R = zeros(size(B), GPUsingle);
    GPUge(A, B, R);

6.4.37 GPUgetUserModule

GPUgetUserModule - Returns CUDA (.cubin) module handler

SYNTAX
    GPUgetUserModule(module_name)
    module_name - string

MODULE NAME
    na

DESCRIPTION
    GPUgetUserModule(module_name) returns the handler of the loaded module module_name.
    Compilation not supported

EXAMPLE
    GPUgetUserModule('numerics')

6.4.38 GPUgt

GPUgt - Greater than

SYNTAX
    GPUgt(X, Y, R)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble

MODULE NAME
    NUMERICS

DESCRIPTION
    GPUgt(A, B, R) is equivalent to gt(A, B), but the result is returned in the input parameter R.
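The in-place calling convention of the low-level comparison functions can be sketched by placing GPUgt next to its high-level counterpart gt. This is a minimal sketch that mirrors the GPUge example; it assumes GPUmat has been started with GPUstart, the names R1 and R2 are illustrative only, and the expectation that both results agree rests solely on the statement that GPUgt(A, B, R) is equivalent to gt(A, B).

```matlab
% Sketch only: assumes GPUmat is installed and started with GPUstart.
A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);

% High-level form: the result is returned as a new GPU variable.
R1 = gt(A, B);

% Low-level form: the caller creates R2 first and GPUgt writes into it.
R2 = zeros(size(B), GPUsingle);
GPUgt(A, B, R2);

% Transfer both results to CPU memory to compare them by eye.
single(R1)
single(R2)
```

The low-level form leaves the choice of the result variable to the caller, which is why R2 has to exist, with the right size, before GPUgt is called.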
80. e

double - Converts a GPU variable into a Matlab double precision variable

SYNTAX
    R = double(X)
    X - GPUsingle, GPUdouble
    R - double precision Matlab variable

MODULE NAME
    na

DESCRIPTION
    B = DOUBLE(X) converts the content of the GPU variable X into a double precision Matlab array.
    Compilation not supported

EXAMPLE
    A = rand(100,GPUsingle);
    Ah = double(A);

6.3.20 eq

eq - Equal

SYNTAX
    R = X == Y
    R = eq(X, Y)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble

MODULE NAME
    NUMERICS

DESCRIPTION
    A == B (eq(A, B)) does element by element comparisons between A and B.
    Compilation supported

EXAMPLE
    A = GPUsingle([1 2 0 4]);
    B = GPUsingle([1 0 0 4]);
    R = A == B;
    single(R)
    R = eq(A, B);
    single(R)

6.3.21 exp

exp - Exponential

SYNTAX
    R = exp(X)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble

MODULE NAME
    NUMERICS

DESCRIPTION
    EXP(X) is the exponential of the elements of X, e to the X. For complex Z = X + i*Y, EXP(Z) = EXP(X)*(COS(Y) + i*SIN(Y)).
    Compilation supported

EXAMPLE
    X = rand(1,5,GPUsingle) + i*rand(1,5,GPUsingle);
    R = exp(X)
81. e

    Ah = single(rand(1000));  % Ah is a Matlab variable
    A  = GPUsingle(Ah);       % GPU variable
    if GPUisDoublePrecision
      Ah = rand(1000);        % Ah is a Matlab variable
      A  = GPUdouble(Ah);     % GPU variable
    end

Precision is lost in the conversion from double to single precision if the GPU variable is initialized with a double precision Matlab array Ah, as follows:

    Ah = rand(1000);      % Ah is a double precision Matlab variable
    A  = GPUsingle(Ah);   % GPU variable

Conversion between double and single precision is possible using the functions GPUsingle and GPUdouble, as follows:

    if GPUisDoublePrecision
      Ah = rand(1000);          % Ah is a Matlab variable
      A  = GPUdouble(Ah);       % GPU variable (double prec.)
    end
    Bh = single(rand(1000));    % Bh is a Matlab variable
    B  = GPUsingle(Bh);         % GPU variable (single prec.)
    % convert GPU single to double
    if GPUisDoublePrecision
      C = GPUdouble(B);
    end
    % convert GPU double to single
    D = GPUsingle(A);

    A = colon(begin, stride, end, GPUsingle)
    A = colon(begin, stride, end, GPUdouble)

Creates a GPU variable A with values in the range [begin:end]. The increment between elements is stride. This command is similar to the Matlab colon command. Example:

    A = colon(0,2,1000,GPUsingle);
    A
82. e
    offset, incr, m, p, offsetp, type - Matlab

MODULE NAME
    NUMERICS

DESCRIPTION
    GPUfill(A, offset, incr, m, p, offsetp, type) fills an existing array with specific values.
    Compilation supported

EXAMPLE
    %% Fill with ones
    A = zeros(5,GPUsingle);
    GPUfill(A, [illegible]);

    %% Fill with ones and element every 2
    A = zeros(5,GPUsingle);
    GPUfill(A, [illegible]);

    %% Fill with ones and element every 2,
    %% starting from the 2nd element
    A = zeros(5,GPUsingle);
    GPUfill(A, [illegible]);

    %% Fill with a sequence of numbers from 1 to numel(A)
    A = zeros(5,GPUsingle);
    GPUfill(A, 1, 1, numel(A), 0, 0, 0);

    %% Fill with a sequence of numbers from 1 to numel(A)
    %% An element every 2 is modified
    A = zeros(5,GPUsingle);
    GPUfill(A, 1, 1, numel(A), 2, 0, 0);

    %% type = 2 to modify both real and complex part
    A = zeros(2,complex(GPUsingle));
    GPUfill(A, 1, 1, numel(A), 0, 0, 2);

    %% Modify only the complex part
    A = zeros(2,complex(GPUsingle));
    GPUfill(A, 1, 1, numel(A), [illegible]);

6.4.35 GPUfloor

GPUfloor - Round towards minus infinity

SYNTAX
    GPUfloor(X, R)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble

MODULE NAME
    NUMERICS

DESCRIPTION
    GPUfloor(X, R) is equivalent to FLOOR(X), but the result is returned in the input parameter R.
    Compilation supported

EXAMPLE
    X = rand(1,5,GPUsingle);
    R = zeros(size(X), GPUsingl
83. ed in the examples folder, and it shows how to port existing Matlab code and run it on the GPU. The example creates two variables A and B, adds them, and stores the result in the variable C. The original Matlab code is the following:

    A = single(rand(100));  % A is on CPU memory
    B = single(rand(100));  % B is on CPU memory
    C = A + B;              % executed on CPU. C is on CPU memory

The above code in double precision is the following:

    A = rand(100);  % A is on CPU memory
    B = rand(100);  % B is on CPU memory
    C = A + B;      % executed on CPU. C is on CPU memory

The ported GPUmat code (single and double precision) is the following:

    %% Single precision
    A = rand(100,GPUsingle);   % A is on GPU memory
    B = rand(100,GPUsingle);   % B is on GPU memory
    C = A + B;                 % executed on GPU. C is on GPU memory
    %% Double precision
    if GPUisDoublePrecision
      A = rand(100,GPUdouble); % A is on GPU memory
      B = rand(100,GPUdouble); % B is on GPU memory
      C = A + B;               % executed on GPU. C is on GPU memory
    end

Please note the differences between the original code and the modified code:

- Every Matlab variable has been converted to the GPUsingle or GPUdouble class: A = rand(100) becomes A = rand(100,GPUsingle).
- Any operation on GPUsingle variables generates a GPUsingle, i.e. C in the modified code is also a GPUsingle.
- Functions involving GPUsingle variables, like A + B in the above example
84. ensions unless one is a scalar. A scalar can be divided with anything.
    Compilation supported

EXAMPLE
    A = rand(10,GPUsingle);
    B = rand(10,GPUsingle);
    R = A ./ B
    A = rand(10,GPUsingle) + i*rand(10,GPUsingle);
    B = rand(10,GPUsingle) + i*rand(10,GPUsingle);
    R = A ./ B

6.3.69 real

real - Real part of complex number

SYNTAX
    R = real(X)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble

MODULE NAME
    NUMERICS

DESCRIPTION
    R = real(X) returns the real part of the elements of X.
    Compilation supported

EXAMPLE
    A = rand(10,GPUsingle) + sqrt(-1)*rand(10,GPUsingle);
    R = real(A);

6.3.70 repmat

repmat - Replicate and tile an array

SYNTAX
    R = repmat(X, M, N)
    R = REPMAT(X, [M N])
    R = REPMAT(X, [M N P ...])
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble

MODULE NAME
    NUMERICS

DESCRIPTION
    R = repmat(X, M, N) creates a large matrix R consisting of an M-by-N tiling of copies of X. The statement repmat(X, N) creates an N-by-N tiling.
    Compilation supported

EXAMPLE
    A = rand(10,GPUsingle);
    repmat(A, 3, 4, 5)

6.3.71 setReal
85. ent

Variable assignment in GPUmat is different from Matlab. For example, the following commands create two arrays A and B in Matlab, and B is assigned to A:

    A = rand(3);  % CPU
    B = rand(3);  % CPU
    A = B;

In the above example A and B have the same values but are distinct variables. It means that the following statement has effect only on A:

    >> A(1) = 10;
    >> A
    >> B
    A =
       10.0000 0.7379 0.7817
        0.4959 0.3107 0.1115
        0.9885 0.6004 0.5793
    B =
        0.0068 0.7379 0.7817
        0.4959 0.3107 0.1115
        0.9885 0.6004 0.5793

The above commands behave differently in GPUmat. If a GPUmat variable B is assigned to a GPUmat variable A, then the two objects are exactly the same. It means that the following command has effect on both A and B:

    A = rand(3,GPUsingle);  % GPU
    B = rand(3,GPUsingle);  % GPU
    A = B;
    >> A(1) = 10;
    >> A
    >> B
    ans =
       10.0000 0.0946 0.3821
        0.3778 0.9091 0.6603
        0.5180 0.2076 0.7584
    Single precision REAL GPU type.
    ans =
       10.0000 0.0946 0.3821
        0.3778 0.9091 0.6603
        0.5180 0.2076 0.7584
    Single precision REAL GPU type.

To assign the GPUmat variable B to A, the clone command must be used, as follows:

    A = rand(3,GPUsingle);  % GPU
    B = rand(3,GPUsingle);  % GPU
    A = clone(B);

2.6 Performance an
86. es
6.1.10 GPUmat compiler
6.1.11 Complex numbers
6.1.12 CUDA Driver functions
6.1.13 CUDA run-time functions
6.2 Operators
6.3 High level functions - alphabetical list
6.3.1 abs
6.3.2 acos
6.3.4 and
6.3.9 atanh
87. es(X, Y)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble

MODULE NAME
    NUMERICS

DESCRIPTION
    X .* Y denotes element by element multiplication. X and Y must have the same dimensions unless one is a scalar. A scalar can be multiplied into anything.
    Compilation supported

EXAMPLE
    A = rand(10,GPUsingle);
    B = rand(10,GPUsingle);
    R = A .* B
    A = rand(10,GPUsingle) + i*rand(10,GPUsingle);
    B = rand(10,GPUsingle) + i*rand(10,GPUsingle);
    R = A .* B
    A = rand(10,GPUdouble) + i*rand(10,GPUdouble);
    B = rand(10,GPUdouble) + i*rand(10,GPUdouble);
    R = A .* B

6.3.84 unpackfC2C

unpackfC2C - Unpack one complex array into two single precision arrays

SYNTAX
    UNPACKFC2C(IDATA, RE_ODATA, IM_ODATA)

MODULE NAME
    na

DESCRIPTION
    UNPACKFC2C(IDATA, RE_ODATA, IM_ODATA) unpacks the values of IDATA into two arrays, RE_ODATA and IM_ODATA, as shown in the example. The type of the elements of IDATA is complex.
    Compilation not supported

6.3.85 unpackfC2R

unpackfC2R - Transforms a complex array into a real array, discarding the imaginary part

SYNTAX
    UNPACKFC2R(IDATA, RE_ODATA)

MODULE NAME
    na

DESCRIPTION
    UNPACKFC2R(IDATA, RE_ODATA) transforms the complex array IDATA into the array RE_ODATA, discarding the imaginary part. The type of the elements of IDATA is complex.
    Compilation not supported
88. etSize(B, [100 100]);   % must set GPUdouble size
      GPUallocVector(B);       % allocate GPU memory
      clear B;
    end

3.9.2 Memory management using low level functions

The following code shows how to allocate a variable with 100 single precision floating point elements by using CUBLAS functions:

    % create a new pointer
    GPUptr = 0;
    % allocate using cublasAlloc
    SIZE_OF_FLOAT = 4;
    NUMEL = 100;
    [status, GPUptr] = cublasAlloc(NUMEL, SIZE_OF_FLOAT, GPUptr);
    cublasCheckStatus(status, 'Device memory allocation error');

The function cublasFree is used to free the memory:

    status = cublasFree(GPUptr);
    cublasCheckStatus(status, 'memory free error (GPUptr)');

3.10 Complex numbers

A complex number is represented as a sequence of two values, the real and the imaginary part respectively. A complex vector is a sequence of complex numbers, i.e. a sequence of interleaved real and imaginary values. There are different methods to create a complex GPU variable:

- Initialize a GPU variable with a Matlab complex number.
- Multiply a real number by the imaginary unit.
- Use the GPUreal and GPUimag functions, or the corresponding high level functions real and imag.

The above points are explained in the following example:

    % 1) Initialize a GPUsingle with a Matlab complex array
    Gh = rand(10) + sqrt(-1)*rand(10);  % Matlab complex variable
    G  = GPUsingle(Gh);
89. ge.net/projects/matcuda

The GPUmat User Modules project explains how to access GPUmat internal functions directly from a mex file, and how to add a user-implemented GPU kernel to GPUmat. Documentation for this project can be found in the GPUmat installation folder, on the Sourceforge web site and on the Sourceforge Wiki page:

http://sourceforge.net/apps/mediawiki/gpumatmodules

Some examples can be found in the GPUmat installation folder (modules).

The matCUDA project is a collection of Matlab wrappers to the CUDA CUBLAS and CUFFT libraries. Documentation can be found in the GPUmat installation folder, on the Sourceforge web site and on the Sourceforge Wiki page:

http://sourceforge.net/apps/mediawiki/matcuda

Chapter 6 Function Reference

6.1 Functions - by category

6.1.1 GPU startup and management

    Name        Description
    GPUinfo     Prints information about the GPU device
    GPUstart    Starts the GPU environment and loads required components
    GPUstop     Stops the GPU environment

6.1.2 GPU variables management

    Name        Description
    colon       Colon
    double      Converts a GPU variable into a Matlab double precision variable
    eye         Identity matrix
    GPUdouble   GPUdouble constructor
    GPUeye      Identity matrix
    GPUfill     Fill a GPU variable
    GPUones     GPU ones array
    GPUsingle   GPUsingle constructor
    GPUsync     Wait until all GPU operations are completed
    GPUzeros    GPU zeros array
    memCpyDtoD  Device-Device memory
90. 6.3.13 complex
6.3.14 conj
6.3.15 cos
6.3.16 cosh
6.3.17 ctranspose
6.3.18 display
6.3.19 double
6.3.20 eq
6.3.21 exp
6.3.27 GPUcompileAbort
6.3.28 GPUcompileStart
6.3.29 GPUcompileStop
6.3.30 GPUdouble
6.3.32 GPUisDoublePrecision
6.3.37 GPUstart
6.3.42 iscomplex
6.3.43 isempty
6.3.44 isreal
6.3.45 isscalar
6.3.46 ldivide
91. [inf, stride, sup] - defines a range between inf and sup with the specified stride. It is similar to the Matlab syntax A(inf:stride:sup). The special keyword END (please note, uppercase END) can be used.
    ':' - similar to the colon used in Matlab indexing.
    {[i1, i2, ..., in]} - any array enclosed by brackets is considered an indexes array, similar to A([1 2 3 4 1 2]) in Matlab.
    i1 - a single value is interpreted as an index. Similar to A(10) in Matlab.
    Compilation supported

EXAMPLE
    Bh = single(rand(100));
    B = GPUsingle(Bh);
    Ah = Bh(1:end);
    A = slice(B,[1,1,END]);
    Ah = Bh(1:10);
    [illegible]
    Ah = Bh([2 3 1],1);
    A = slice(B,{[2 3 1]},1);
    Ah = Bh([illegible]);
    [illegible]

6.2.18 A(I)

subsref - Subscripted reference

SYNTAX
    R = X(I)
    X - GPUsingle, GPUdouble
    I - GPUsingle, GPUdouble, Matlab range
    R - GPUsingle, GPUdouble

DESCRIPTION
    A(I) (subsref) is an array formed from the elements of A specified by the subscript vector I. The resulting array is the same size as I, except for the special case where A and I are both vectors. In this case, A(I) has the same number of elements as I but has the orientation of A.
    Compilation not supported

EXAMPLE
    A = GPUsingle([1 2 3 4 5]);
    A = GPUdouble([1 2 3 4 5]);
    idx = GPUsingle([1 2]);
    B = A(idx)
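The relation between a subscripted reference and slice can be sketched by requesting the same two elements both ways. This is a minimal sketch, not a statement of the library's behavior: it assumes GPUmat has been started with GPUstart, and it assumes that a single [inf, stride, sup] range applied to a vector selects elements in the same way as slice(B,[1,1,END]) in the example above. The names B1 and B2 are illustrative.

```matlab
% Sketch only: assumes GPUmat is installed and started with GPUstart.
A = GPUsingle([1 2 3 4 5]);

% Subscripted reference with a GPU index vector, as in the subsref example.
idx = GPUsingle([1 2]);
B1 = A(idx);

% The same elements requested with slice and an [inf, stride, sup] range
% (assumed here to mean: from element 1 to element 2 in steps of 1).
B2 = slice(A, [1, 1, 2]);

% Transfer to CPU memory to inspect both results.
single(B1)
single(B2)
```

Note that the slice entry is marked "Compilation supported" while subsref is not, so the slice form is the one to reach for inside code recorded by the GPUmat compiler.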
92. 6.2.15 A .^ B

power - Array power

SYNTAX
    R = X .^ Y
    R = power(X, Y)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble

DESCRIPTION
    Z = X .^ Y denotes element by element powers.
    Compilation supported

EXAMPLE
    A = rand(10,GPUsingle);
    B = [illegible]
    R = A .^ B
    A = rand(10,GPUsingle) + i*rand(10,GPUsingle);
    R = A .^ B

MATLAB COMPATIBILITY
    Implemented for REAL exponents only.

6.2.16 A ./ B

rdivide - Right array divide

SYNTAX
    R = X ./ Y
    R = rdivide(X, Y)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble

DESCRIPTION
    A ./ B denotes element by element division. A and B must have the same dimensions unless one is a scalar. A scalar can be divided with anything.
    Compilation supported

EXAMPLE
    A = rand(10,GPUsingle);
    B = rand(10,GPUsingle);
    R = A ./ B
    A = rand(10,GPUsingle) + i*rand(10,GPUsingle);
    B = rand(10,GPUsingle) + i*rand(10,GPUsingle);
    R = A ./ B

6.2.17 slice

slice - Subscripted reference

SYNTAX
    R = slice(X, R1, R2, ..., RN)
    X - GPUsingle, GPUdouble
    R1, R2, ..., RN - Range
    R - GPUsingle, GPUdouble

DESCRIPTION
    slice(X, R1, ..., RN) is an array formed from the elements of X specified by the ranges R1, R2, ..., RN. A range can be constructed as follows:
93. le 0 1 1000 and creates a vector with single precision elements having values from 0 to 1000.

Scalars are automatically converted to GPU variables, as follows:

    A = rand(100,GPUsingle);  % A is on GPU memory
    C = A + 1;                % executed on GPU. C is on GPU memory
                              % equivalent to C = A + GPUsingle(1)

In the above example the Matlab scalar can be converted to a GPU variable using GPUsingle, but this is not necessary because GPUmat does the conversion automatically. Non-scalar variables are not cast automatically between GPU and Matlab. The following code generates an error:

    A = colon(0,1,1000,GPUsingle);  % A is on GPU memory
    B = colon(0,1,1000);            % B is on CPU memory
    C = A + B;                      % ERROR

Element-by-element operations, such as the matrix addition A + B, are highly optimized for the GPU. It is suggested to use this kind of operation, as explained in Section 3.11.

2.2 Matrix multiplication example

This section describes the code to perform the following tasks:

- Create A and B on the GPU memory.
- Multiply A and B and store the results in C.
- Convert the result C into the Matlab variable Ch.

    A = rand(100,100,GPUsingle);  % A is on GPU memory
    B = rand(100,100,GPUsingle);  % B is on GPU memory
    C = A*B;                      % executed on GPU. C is on GPU memory
    Ch = single(C);               % Ch is on CPU memory

The equivalent code
94. lt(A, B, R);

6.4.47 GPUminus

GPUminus - Minus

SYNTAX
    GPUminus(X, Y, R)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble

MODULE NAME
    NUMERICS

DESCRIPTION
    GPUminus(X, Y, R) is equivalent to minus(X, Y), but the result is returned in the input parameter R.
    Compilation supported

EXAMPLE
    X = rand(10,GPUsingle);
    Y = rand(10,GPUsingle);
    R = zeros(size(X), GPUsingle);
    GPUminus(Y, X, R);

6.4.48 GPUmtimes

GPUmtimes - Matrix multiply

SYNTAX
    GPUmtimes(X, Y, R)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble

MODULE NAME
    NUMERICS

DESCRIPTION
    GPUmtimes(X, Y, R) is equivalent to mtimes(X, Y), but the result is returned in the input parameter R.
    Compilation supported

EXAMPLE
    A = rand(10,GPUsingle);
    B = rand(10,GPUsingle);
    R = zeros(size(A), GPUsingle);
    GPUmtimes(A, B, R);

6.4.49 GPUne

GPUne - Not equal

SYNTAX
    GPUne(X, Y, R)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble

MODULE NAME
    NUMERICS

DESCRIPTION
    GPUne(X
95. me        Description
    complex     Construct complex data from real and imaginary components
    GPUcomplex  Construct complex data from real and imaginary components
    GPUimag     Imaginary part of complex number
    GPUreal     Real part of complex number
    imag        Imaginary part of complex number
    real        Real part of complex number

6.1.12 CUDA Driver functions

    Name           Description
    cuCheckStatus  Check the CUDA DRV status
    cuInit         Wrapper to CUDA driver function cuInit
    cuMemGetInfo   Wrapper to CUDA driver function cuMemGetInfo

6.1.13 CUDA run-time functions

    Name                        Description
    cudaCheckStatus             Check the CUDA run-time status
    cudaGetDeviceCount          Wrapper to CUDA cudaGetDeviceCount function
    cudaGetDeviceMajorMinor     Returns CUDA compute capability major and minor numbers
    cudaGetDeviceMemory         Returns device total memory
    cudaGetDeviceMultProcCount  Returns device multi-processors count
    cudaGetLastError            Wrapper to CUDA cudaGetLastError function
    cudaSetDevice               Wrapper to CUDA cudaSetDevice function
    cudaThreadSynchronize       Wrapper to CUDA cudaThreadSynchronize function

6.2 Operators

Operators are used in mathematical expressions such as A + B. GPUmat overloads Matlab operators for the GPUsingle class.

    Name    Description
    a + b   Binary addition
    a - b   Binary subtra
96. mpileStart function, we run the GPUmat code that should be recorded in the compiled function, as follows:

    R = exp(A);

The function GPUcompileStop, used to stop the compilation, has the following interface:

    GPUcompileStop(r1, r2, ..., rn)

Parameters r1 to rn are the output arguments of the compiled function. They can only be GPUtype variables (GPUsingle, GPUdouble, etc.). The following example creates the function [R1, R2] = myfun(A1, A2) (two input and two output arguments):

    A = randn(5,GPUsingle);
    B = randn(5,GPUsingle);
    % A and B are dummy variables
    GPUcompileStart('myfun','-f',A,B)
    R1 = exp(A);
    R2 = floor(B);
    GPUcompileStop(R1,R2)

The following is another example:

    A = randn(5,GPUsingle);  % A is a dummy variable
    GPUcompileStart('myfun1','-f',A)
    R1 = floor(exp(A));
    GPUcompileStop(R1)

Find more examples in the GPUmat folder examples, file GPUmatCompiler.m.

4.2 For loops

It is possible to generate for loops in the compiled code by using GPUfor and GPUend. The following is an example:

    A = randn(5,5,5,GPUsingle);
    B = randn(5,GPUsingle);
    GPUcompileStart('myfor1','-f',A,B)
    GPUfor it=1:5
      assign([illegible])
    GPUend
    GPUcompileStop

The following is another example with nested loops:

    A = randn(5,5,5,GPUsingle);
97. mplex(X, Y, R)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble

MODULE NAME
    NUMERICS

DESCRIPTION
    GPUcomplex(X, R) is equivalent to complex(X), but the result is returned in the input parameter R.
    Compilation supported

EXAMPLE
    RE = rand(10,GPUsingle);
    IM = rand(10,GPUsingle);
    R = complex(zeros(size(RE), GPUsingle));
    GPUcomplex(RE, R);
    R = complex(RE, IM);

6.4.27 GPUconj

GPUconj - GPUconj(X, R) is the complex conjugate of X

SYNTAX
    GPUconj(X, R)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble

MODULE NAME
    NUMERICS

DESCRIPTION
    GPUconj(X, R) is equivalent to CONJ(X), but the result is returned in the input parameter R.
    Compilation supported

EXAMPLE
    A = rand(1,5,GPUsingle) + i*rand(1,5,GPUsingle);
    R = complex(zeros(size(A), GPUsingle));
    GPUconj(A, R)

6.4.28 GPUcos

GPUcos - Cosine of argument in radians

SYNTAX
    GPUcos(X, R)
    X - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble

MODULE NAME
    NUMERICS

DESCRIPTION
    GPUcos(X, R) is equivalent to COS(X), but the result is returned in the input parameter R.
    Compilation supported

EXAMPLE
    X = rand(10,GPUsingle);
    R = zeros(size(X), GPUsingle);
    GPUcos(X, R)
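Because the low-level functions return their result in a variable supplied by the caller, the same result variable can be passed to several calls in a row. The following is a minimal sketch of that pattern, assuming GPUmat has been started with GPUstart; the loop, the variable k and the idea that reusing R saves GPU allocations are illustrative assumptions and are not stated in this guide.

```matlab
% Sketch only: assumes GPUmat is installed and started with GPUstart.
X = rand(10, GPUsingle);

% Result variable created once, with the same size as the input.
R = zeros(size(X), GPUsingle);

for k = 1:5
    % Each call writes cos(X) into the existing variable R.
    GPUcos(X, R);
end

% Transfer the final result to CPU memory.
Rh = single(R);
```

With the high-level form, R = cos(X), a result is returned by every call instead, so the low-level form is the one that lets the caller decide where results are stored.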
98. nc; toc;

The GPUsync command is used to synchronize the GPU code: Matlab waits until the GPU execution is completed. The execution of the GPU code is asynchronous, i.e. control is returned to Matlab as soon as the GPUmat function has been called, which does not necessarily mean that the GPU has finished its task. To force Matlab to wait until the GPU has finished its task, the GPUsync command must be used. Here is an example:

    A = rand(1000,1000,GPUsingle);
    B = rand(1000,1000,GPUsingle);
    tic; A.*B; GPUsync; toc;
    Elapsed time is 0.010231 seconds.
    tic; A.*B; toc;
    Elapsed time is 0.003808 seconds.

Asynchronous execution is entirely managed by GPUmat and is transparent to the user. GPUsync should be used only when measuring the GPU execution time.

Chapter 4 GPUmat compiler

4.1 Overview

The GPUmat compiler allows the user to record several GPU operations into a single Matlab function (see Table 4.1 for a summary of the available GPUmat compiler functions). Please check Section 4.3 for the system requirements.

    Name             Description
    GPUcompileStart  Starts the compilation
    GPUcompileStop   Stops the compilation
    GPUcompileAbort  Aborts the compilation
    GPUfor           Starts a for loop
    GPUend           Ends a for loop
    GPUcompileMEX    Compiles a .cpp file

    Table 4.1: GPUmat compiler functions

By using the compiler it is possible to generate optimized code that is executed fas
99. nerates code for each iteration of the loop (the loop is unrolled). As a result, the generated code may fill the compiler stack. It is suggested to replace the native Matlab for loop statements with the GPUmat GPUfor loop commands.

4.6 Not implemented functions

Not every GPUmat function can be used during the compilation. In general, every function that retrieves a GPUtype property, such as size or numel, is not implemented. Find more information for each function in Chapter 6.

4.7 Additional compilation options

GPUcompileStart can be executed with the additional options in Table 4.2.

    Name       Description
    -f         Force compilation. Overwrites target file.
    -verbose0  Verbosity level 0
    -verbose1  Verbosity level 1
    -verbose2  Verbosity level 2
    -verbose4  Verbosity level 4

    Table 4.2: GPUcompileStart options

For example:

    A = randn(5,5,GPUsingle);
    GPUcompileStart('code_ex5','-f','-verbose4',A)
    R = exp(A);
    GPUcompileStop

Chapter 5 Developer's section

Starting from GPUmat version 0.22, this chapter is maintained through the following external open source projects:

- GPUmat User Modules on Sourceforge: http://sourceforge.net/projects/gpumatmodules
- matCUDA on Sourceforge: http://sourcefor
100. not supported

EXAMPLE
    A = rand(5,GPUsingle);
    isreal(A)
    A = rand(5,GPUsingle) + i*rand(5,GPUsingle);
    isreal(A)

6.3.45 isscalar

isscalar - True if array is a scalar

SYNTAX
    R = isscalar(X)
    X - GPU variable
    R - logical (0 or 1)

MODULE NAME
    NUMERICS

DESCRIPTION
    ISSCALAR(S) returns 1 if S is a 1x1 matrix and 0 otherwise.
    Compilation not supported

EXAMPLE
    A = rand(5,GPUsingle);
    isscalar(A)
    A = GPUsingle(1);
    isscalar(A)
    A = GPUdouble(1);
    isscalar(A)

6.3.46 ldivide

ldivide - Left array divide

SYNTAX
    R = X .\ Y
    R = ldivide(X, Y)
    X - GPUsingle, GPUdouble
    Y - GPUsingle, GPUdouble
    R - GPUsingle, GPUdouble

MODULE NAME
    NUMERICS

DESCRIPTION
    A .\ B denotes element by element division. A and B must have the same dimensions unless one is a scalar. A scalar can be divided with anything.
    Compilation supported

EXAMPLE
    A = rand(10,GPUsingle);
    B = rand(10,GPUsingle);
    R = A .\ B
    A = rand(10,GPUsingle) + i*rand(10,GPUsingle);
    B = rand(10,GPUsingle) + i*rand(10,GPUsingle);
    R = A .\ B

6.3.47 le

le - Less than or equal

SY
101. nual for more information.
    Compilation not supported

EXAMPLE
    A = randn(5,GPUsingle);  % A is a dummy variable
    % Compile function C = myexp(B)
    GPUcompileStart('myexp','-f',A)
    R = exp(A);
    GPUcompileAbort

6.3.28 GPUcompileStart

GPUcompileStart - Starts the GPUmat compiler

SYNTAX
    GPUcompileStart(NAME, OPTIONS, X1, X2, ..., XN)
    NAME - Function name
    OPTIONS - Compilation options
    X1, X2, ..., XN - GPUsingle, GPUdouble, Matlab variables

MODULE NAME
    na

DESCRIPTION
    Starts the GPUmat compiler. Check the manual for more information.
    Compilation not supported

EXAMPLE
    A = randn(5,GPUsingle);  % A is a dummy variable
    % Compile function C = myexp(B)
    GPUcompileStart('myexp','-f',A)
    R = exp(A);
    GPUcompileStop(R)

6.3.29 GPUcompileStop

GPUcompileStop - Stops the GPUmat compiler

SYNTAX
    GPUcompileStop(X1, X2, ..., XN)
    X1, X2, ..., XN - GPUsingle, GPUdouble, Matlab variables

MODULE NAME
    na

DESCRIPTION
    Stops the GPUmat compiler. Check the manual for more information.
    Compilation not supported

EXAMPLE
    A = randn(5,GPUsingle);  % A is a dummy variable
    % Compile function C = myexp(B)
    GPUcompileStart('myexp','-f',A)
    R = e
102. Contents (continued, section 6.4 Low level functions): ... GPUpower, GPUreal, ... GPUtan, GPUtimes, GPUtranspose, ... GPUuserModuleLoad, GPUuserModulesInfo, GPUuserModuleUnload, GPUzeros, memCpyDtoD, ... Bibliography.

Chapter 1 Introduction

GPUmat enables Matlab code to run on the Graphical Processing Unit (GPU). The foll
103. omplex(X,Y)
  X - GPUsingle, GPUdouble
  Y - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

MODULE NAME
  NUMERICS

DESCRIPTION
  R = complex(X,Y) creates a complex output R from the two real inputs X and Y. R = complex(X) creates a complex output R from the real input X. The imaginary part is set to 0.
  Compilation supported.

EXAMPLE
  RE = rand(10,GPUsingle);
  IM = rand(10,GPUsingle);
  R = complex(RE);
  R = complex(RE, IM);

6.3.14 conj

conj - CONJ(X) is the complex conjugate of X

SYNTAX
  R = conj(X)
  X - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

MODULE NAME
  NUMERICS

DESCRIPTION
  For a complex X, CONJ(X) = REAL(X) - i*IMAG(X).
  Compilation supported.

EXAMPLE
  A = rand(1,5,GPUsingle) + i*rand(1,5,GPUsingle);
  B = conj(A)

6.3.15 cos

cos - Cosine of argument in radians

SYNTAX
  R = cos(X)
  X - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

MODULE NAME
  NUMERICS

DESCRIPTION
  COS(X) is the cosine of the elements of X.
  Compilation supported.

EXAMPLE
  X = rand(10,GPUsingle);
  R = cos(X)

MATLAB COMPATIBILITY
  Not implemented for complex X.
104. 6.3.67 randn

randn - GPU pseudorandom generator

SYNTAX
  randn(N,GPUsingle)
  randn(M,N,GPUsingle)
  randn([M,N],GPUsingle)
  randn(M,N,P,...,GPUsingle)
  randn([M,N,P,...],GPUsingle)
  randn(N,GPUdouble)
  randn(M,N,GPUdouble)
  randn([M,N],GPUdouble)
  randn(M,N,P,...,GPUdouble)
  randn([M,N,P,...],GPUdouble)

MODULE NAME
  RAND

DESCRIPTION
  randn(N,GPUsingle) is an N-by-N GPU matrix of values generated with a pseudorandom generator (normal distribution).
  randn(M,N,GPUsingle) or randn([M,N],GPUsingle) is an M-by-N GPU matrix.
  randn(M,N,P,...,GPUsingle) or randn([M,N,P,...],GPUsingle) is an M-by-N-by-P-by-... GPU array of single precision values.
  randn(M,N,P,...,GPUdouble) or randn([M,N,P,...],GPUdouble) is an M-by-N-by-P-by-... GPU array of double precision values.
  Compilation supported.

EXAMPLE
  randn(10,GPUsingle)
  randn(10,10,GPUsingle)
  randn([10 10],GPUsingle)
  randn(10,GPUdouble)
  randn(10,10,GPUdouble)
  randn([10 10],GPUdouble)

6.3.68 rdivide

rdivide - Right array divide

SYNTAX
  R = X ./ Y
  R = rdivide(X,Y)
  X - GPUsingle, GPUdouble
  Y - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

MODULE NAME
  NUMERICS

DESCRIPTION
  A./B denotes element-by-element division. A and B must have the same dim
105. on the CPU is the following:

  A = single(rand(100,100)); % A is on CPU memory
  B = single(rand(100,100)); % B is on CPU memory
  C = A*B;                   % executed on CPU. C is on CPU memory

2.3 FFT calculation example

This section describes the code to perform the following tasks:
- Create two arrays A and B on the GPU.
- Calculate the 1D FFT of A.
- Calculate the 2D FFT of B.
- Transfer the results from the GPU into Matlab variables Ah and Bh.

  A = rand(1,100,GPUsingle);   % GPU
  B = rand(100,100,GPUsingle); % GPU
  %% 1D FFT
  FFT_A = fft(A);  % executed on GPU
  %% 2D FFT
  FFT_B = fft2(B); % executed on GPU
  %% Convert GPU into Matlab variables
  Ah = single(A);         % Ah is on HOST
  Bh = single(B);         % Bh is on HOST
  FFT_Ah = single(FFT_A); % FFT_Ah is on HOST
  FFT_Bh = single(FFT_B); % FFT_Bh is on HOST

The equivalent code that executes the above operations entirely on the CPU is the following:

  A = single(rand(1,100));   % CPU
  B = single(rand(100,100)); % CPU
  %% 1D FFT
  FFT_A = fft(A);  % executed on CPU
  %% 2D FFT
  FFT_B = fft2(B); % executed on CPU

2.4 GPUmat compiler

The GPUmat compiler is used to record GPU operations into a new function. The compiled function is optimized and faster than the non-compiled code. Moreover, the GPUmat compiler can be used to optimize for loops, as shown in the GPUmatCompiler.m file located in the GPUmat example folder.

2.5 Variable assignm
106. ons on the GPU

The following example explains the mechanism that allows Matlab functions to be executed on the GPU:

  A = rand(10,GPUsingle); % A is on GPU
  B = exp(A);             % exp calculated on GPU

The exp function in the above code is the one implemented in GPUmat and not the built-in function. Matlab uses the GPUmat function because the argument of exp is a GPUsingle type. The following example shows similar code executed on the CPU:

  A = single(rand(10)); % A is on CPU
  B = exp(A);           % exp calculated on CPU

The mechanism to execute a function on the GPU is the following:
- Functions involving GPU variables are executed on the GPU by using GPUmat functions.
- Not every Matlab function is defined in GPUmat. This means that not all Matlab code is executed on the GPU, but only the Matlab code that uses functions defined in GPUmat.

The complete function reference can be found in Chapter 6. GPUmat also implements Matlab operators, which means that algebraic expressions on GPU variables are also defined in GPUmat and executed on the GPU. GPU operators are shown in Table 3.9. Here is an example:

  A = rand(100,100,GPUsingle); % GPU variable
  B A 5 A xAx2 1               % run on GPU
  C = A < B;                   % run on GPU
  % Same operation performed on CPU
  A = single(A);               % CPU variable
  B A 5 A Ax2 1                % run on CPU
  C = A < B;                   % run on
107. ory transfer is a time-consuming task and might reduce the performance of the code.

Function                                    Description
A = GPUsingle(Ah)                           Creates a GPU array A initialized with the Matlab array Ah.
A = GPUdouble(Ah)                           Requires GPU-CPU memory transfer.
A = rand(size, GPUsingle)                   Creates a GPU array initialized with random numbers
A = rand(size, GPUdouble)                   (uniform distribution).
A = randn(size, GPUsingle)                  Creates a GPU array initialized with random numbers
A = randn(size, GPUdouble)                  (normal distribution).
A = zeros(size, GPUsingle)                  Creates a GPU array initialized with zeros.
A = zeros(size, GPUdouble)
A = ones(size, GPUsingle)                   Creates a GPU array initialized with ones.
A = ones(size, GPUdouble)
A = colon(begin, stride, end, GPUsingle)    Creates a regularly spaced GPU vector A with values
A = colon(begin, stride, end, GPUdouble)    in the range begin to end.
C = vertcat(A,B) or C = [A;B]               Vertical concatenation. Can be applied to more than
                                            2 GPU vectors.

Table 3.2: Functions used to create GPU variables

A = GPUsingle(Ah), A = GPUdouble(Ah)

Creates a GPU single or double precision variable A initialized with the Matlab array Ah. A has the same properties as Ah, such as the size and the number of elements. Requires GPU-CPU memory transfer. Exampl
108. owing is a summary of GPUmat's most important features:
- GPU computational power can be easily accessed from Matlab without any GPU knowledge.
- Matlab code is directly executed on the GPU. The execution is transparent to the user.
- GPUmat speeds up Matlab functions by using the GPU multi-processor architecture.
- Existing Matlab code can be ported and executed on GPUs with few modifications.
- GPU resources are accessed using the Matlab scripting language. The fast code prototyping capability of the scripting language is combined with the fast code execution on the GPU.
- GPUmat can be used as a Source Development Kit to create new functions and extend the library functionality.
- GPU operations can be easily recorded into new functions using the GPUmat compiler.

1.1 About GPUs

Although GPUs have traditionally been used only for computer graphics, a recent technique called GPGPU (General-purpose computing on graphics processing units) allows GPUs to perform numerical computations usually handled by the CPU. The advantage of using GPUs for general-purpose computation is the speed-up that can be achieved thanks to the parallel architecture of these devices. One of the most promising GPGPU technologies is the CUDA SDK [1], developed by NVIDIA. For further information about CUDA, GPGPU and related topics, please check [2, 3].

1.2 System requirements

GPUmat was teste
109. peration in STATUS.
  Original function declaration:
    cudaError_t cudaSetDevice(int dev)
  Compilation not supported.

6.4.9 cudaThreadSynchronize

cudaThreadSynchronize - Wrapper to CUDA cudaThreadSynchronize function

MODULE NAME
  na

DESCRIPTION
  STATUS = cudaThreadSynchronize. STATUS is the result of the operation.
  Original function declaration:
    cudaError_t cudaThreadSynchronize(void)
  Compilation not supported.

6.4.10 cufftPlan3d

cufftPlan3d - Wrapper to CUFFT cufftPlan3d function

MODULE NAME
  na

DESCRIPTION
  Wrapper to CUFFT cufftPlan3d function.
  Original function declaration:
    cufftResult cufftPlan3d(cufftHandle *plan, int nx, int ny, int nz, cufftType type);
  The original function returns only a cufftResult, whereas the wrapper also returns the plan.
  Compilation not supported.

6.4.11 cuInit

cuInit - Wrapper to CUDA driver function cuInit

MODULE NAME
  na

DESCRIPTION
  Wrapper to CUDA driver function cuInit.
  Compilation not supported.

6.4.12 cuMemGetInfo

cuMemGetInfo - Wrapper to CUDA driver function cuMemGetInfo

MODULE NAME
  na

DESCRIPTION
  Wrapper to CUDA driver function cuMemGetInfo.
  Compilation not supported.

EX
110. pper to CUDA cudaGetDeviceCount function

MODULE NAME
  na

DESCRIPTION
  Wrapper to CUDA cudaGetDeviceCount function.
  Compilation not supported.

EXAMPLE
  count = 0;
  [status,count] = cudaGetDeviceCount(count);
  if (status ~= 0)
    error('Unable to get the number of devices');
  end

6.4.4 cudaGetDeviceMajorMinor

cudaGetDeviceMajorMinor - Returns CUDA compute capability major and minor numbers

MODULE NAME
  na

DESCRIPTION
  Returns CUDA compute capability major and minor numbers.
  [STATUS, MAJOR, MINOR] = cudaGetDeviceMajorMinor(DEV) returns the compute capability number (major, minor) of the device DEV. STATUS is the result of the operation.
  Compilation not supported.

EXAMPLE
  dev = 0;
  [status,major,minor] = cudaGetDeviceMajorMinor(dev);
  if (status ~= 0)
    error('Unable to get the compute capability');
  end
  major
  minor

6.4.5 cudaGetDeviceMemory

cudaGetDeviceMemory - Returns device total memory

MODULE NAME
  na

DESCRIPTION
  [STATUS, TOTMEM] = cudaGetDeviceMemory(DEV) returns the total memory of the device DEV. STATUS is the result of the operation.
  Compilation not supported.

EXAMPLE
  dev = 0;
  [status,totmem] = cudaGetDeviceMemory
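The device-query wrappers documented in this excerpt can be combined into a short loop over all devices. The following is a minimal sketch rather than code from the manual: it assumes that GPUmat is installed and on the Matlab path, that devices are numbered from 0 as in the examples above, and that a status of 0 means success. The loop variable and the message format are illustrative only.

```matlab
% Sketch: print the compute capability of every CUDA device.
% Assumptions: GPUmat is on the path; status 0 means success.
count = 0;
[status, count] = cudaGetDeviceCount(count);
if (status ~= 0)
  error('Unable to get the number of devices');
end

for dev = 0:(count - 1)
  [status, major, minor] = cudaGetDeviceMajorMinor(dev);
  if (status ~= 0)
    error('Unable to get the compute capability');
  end
  fprintf('Device %d: compute capability %d.%d\n', dev, major, minor);
end
```

Checking the status after every call keeps a failure on one device from being silently reported as a capability of 0.0.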
111. r R.
  Compilation supported.

EXAMPLE
  A = GPUsingle([1 3 0 4]);
  B = GPUsingle([0 1 10 2]);
  R = zeros(size(A), GPUsingle);
  GPUand(A, B, R);

6.4.21 GPUasin

GPUasin - Inverse sine

SYNTAX
  GPUasin(X, R)
  X - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

MODULE NAME
  NUMERICS

DESCRIPTION
  GPUasin(X, R) is equivalent to ASIN(X), but the result is returned in the input parameter R.
  Compilation supported.

EXAMPLE
  X = rand(10,GPUsingle);
  R = zeros(size(X), GPUsingle);
  GPUasin(X, R)

6.4.22 GPUasinh

GPUasinh - Inverse hyperbolic sine

SYNTAX
  GPUasinh(X, R)
  X - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

MODULE NAME
  NUMERICS

DESCRIPTION
  GPUasinh(X, R) is equivalent to ASINH(X), but the result is returned in the input parameter R.
  Compilation supported.

EXAMPLE
  X = rand(10,GPUsingle);
  R = zeros(size(X), GPUsingle);
  GPUasinh(X, R)

6.4.23 GPUatan

GPUatan - Inverse tangent, result in radians

SYNTAX
  GPUatan(X, R)
  X - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

MODULE NAME
  NUMERICS

DESCRIPTION
  GPUatan
112. result is returned in the input parameter R.
  Compilation supported.

EXAMPLE
  A = GPUsingle([1 2 0 4]);
  B = GPUsingle([1 0 0 4]);
  R = zeros(size(B), GPUsingle);
  GPUor(A, B, R);

6.4.53 GPUplus

GPUplus - Plus

SYNTAX
  GPUplus(X, Y, R)
  X - GPUsingle, GPUdouble
  Y - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

MODULE NAME
  NUMERICS

DESCRIPTION
  GPUplus(X, Y, R) is equivalent to plus(X, Y), but the result is returned in the input parameter R.
  Compilation supported.

EXAMPLE
  A = rand(10,GPUsingle);
  B = rand(10,GPUsingle);
  R = zeros(size(B), GPUsingle);
  GPUplus(A, B, R);

6.4.54 GPUpower

GPUpower - Array power

SYNTAX
  GPUpower(X, Y, R)
  X - GPUsingle, GPUdouble
  Y - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

MODULE NAME
  NUMERICS

DESCRIPTION
  GPUpower(X, Y, R) is equivalent to power(X, Y), but the result is returned in the input parameter R.
  Compilation supported.

6.4.55 GPUrand

GPUrand - GPU pseudorandom generator

SYNTAX
  GPUrand(R)
  R - GPUsingle, GPUdouble

MODULE NAME
  RAND

DESCRIPTION
  GPUrand(R) returns in R a matrix containing pseudorandom values drawn from the standard uniform distribution.
  Compilation supported.

EXAMPLE
113. 6.2.9 A / B

mrdivide - Slash or right matrix divide

SYNTAX
  R = X / Y
  X - GPUsingle, GPUdouble
  Y - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

DESCRIPTION
  Slash or right matrix divide.
  Compilation supported.

EXAMPLE
  A = rand(10,GPUsingle);
  B = A / 5
  A = rand(10,GPUdouble);
  B = A / 5

MATLAB COMPATIBILITY
  Supported only A / n where n is scalar.

6.2.10 A * B

mtimes - Matrix multiply

SYNTAX
  R = X * Y
  R = mtimes(X,Y)
  X - GPUsingle, GPUdouble
  Y - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

DESCRIPTION
  mtimes(X,Y) is the matrix product of X and Y.
  Compilation supported.

EXAMPLE
  A = rand(10,GPUsingle);
  B = rand(10,GPUsingle);
  R = A * B
  A = rand(10,GPUdouble);
  B = rand(10,GPUdouble);
  R = A * B
  A = rand(10,GPUsingle) + i*rand(10,GPUsingle);
  B = rand(10,GPUsingle) + i*rand(10,GPUsingle);
  R = A * B

6.2.11 A ~= B

ne - Not equal

SYNTAX
  R = X ~= Y
  R = ne(X,Y)
  X - GPUsingle, GPUdouble
  Y - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

DESCRIPTION
  A ~= B (ne(A, B)) does element-by-element comparisons between A and B.
  Compilation supported.

EXAMPLE
  A = GPUsingle([1 2 0 4]);
  B = GPUsingle([1 0 0 4]);
  R = A ~= B;
  single(R)
  R = ne(A, B);
  single(R)
114. riable Ch:

  Ah = single(rand(100,100)); % Ah is on CPU memory
  A = GPUsingle(Ah);          % Create A (GPU) initialized with Ah (CPU)
  C = exp(A);                 % exp(A) performed on GPU
  Ch = single(C);             % convert C (GPU) to Ch (CPU)

The above example without CPU to GPU memory transfer is the following:

  Ah = single(rand(100,100));  % Ah is on CPU memory
  A = rand(100,100,GPUsingle); % Create A (GPU)
  C = exp(A);                  % exp(A) performed on GPU
  Ch = single(C);              % convert C (GPU) to Ch (CPU)

Please note that in the above code Ah and A are different. The previous example in double precision is the following:

  if GPUisDoublePrecision
    Ah = rand(100,100); % Ah is on CPU memory
    A = GPUdouble(Ah);  % Create A (GPU) initialized with Ah (CPU)
    C = exp(A);         % exp(A) performed on GPU
    Ch = double(C);     % convert C (GPU) to Ch (CPU)
  end

To visualize the contents of a GPU variable, type the name of the variable in the Matlab command window:

  A = rand(5,GPUsingle)
  ans =
    0.8147  0.0975  0.1576  0.1419  0.6557
    0.9058  0.2785  0.9706  0.4218  0.0357
    0.1270  0.5469  0.9572  0.9157  0.8491
    0.9134  0.9575  0.4854  0.7922  0.9340
    0.6324  0.9649  0.8003  0.9595  0.6787
  Single precision REAL GPU type.

The next sections show different examples: matrix addition, matrix multiplication and FFT calculation.

2.1 Matrix addition example

The following code can be found in the QuickStart.m file locat
115. 6.3.4 and

and - Logical AND

SYNTAX
  R = A & B
  R = and(A,B)
  A - GPUsingle, GPUdouble
  B - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

MODULE NAME
  NUMERICS

DESCRIPTION
  A & B performs a logical AND of arrays A and B and returns an array containing elements set to either logical 1 (TRUE) or logical 0 (FALSE).
  Compilation supported.

EXAMPLE
  A = GPUsingle([1 3 0 4]);
  B = GPUsingle([0 1 10 2]);
  R = A & B;
  single(R)

6.3.5 asin

asin - Inverse sine

SYNTAX
  R = asin(X)
  X - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

MODULE NAME
  NUMERICS

DESCRIPTION
  ASIN(X) is the arcsine of the elements of X. NaN (Not A Number) results are obtained if ABS(x) > 1.0 for some element.
  Compilation supported.

EXAMPLE
  X = rand(10,GPUsingle);
  R = asin(X)

MATLAB COMPATIBILITY
  NaN returned if ABS(x) > 1.0. In this case Matlab returns a complex number. Not implemented for complex X.

6.3.6 asinh

asinh - Inverse hyperbolic sine

SYNTAX
  R = asinh(X)
  X - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

MODULE NAME
  NUMERICS

DESCRIPTION
  ASI
116. s an error if this package is not installed. The GPU environment will not work correctly if a CUDA-compatible graphics card and the CUDA toolkit are not installed on the system.

1.5 Terminology

The following is a summary of common terms and concepts used in this manual:
- GPU: Graphics Processing Unit. It is the graphics card. We assume that the GPU is compatible with NVIDIA's CUDA SDK.
- HOST: The computer where the GPU is installed.
- CPU: The Central Processing Unit installed on the HOST.
- GPU memory: the memory available on the GPU.
- CPU memory: the memory available on the HOST.
- CUDA-capable GPU: a GPU compatible with the NVIDIA CUDA SDK.

1.6 Documentation overview

This manual is organized as follows:
- Quick start: describes GPUmat basic concepts by using simple examples.
- Overview: describes GPUmat high level functions.
- GPUmat compiler: describes how to record new functions using the GPUmat compiler.
- Developer's section: describes low level functions and how to implement new functions in GPUmat.

The first two chapters contain enough information to understand the basic concepts of the library and are intended for users with at least some experience with Matlab. Chapter 5 is intended for users familiar with GPU programming concepts, in particular with the CUDA SDK. The Function reference can be found
117. s array

SYNTAX
  ones(N,GPUsingle)
  ones(M,N,GPUsingle)
  ones([M,N],GPUsingle)
  ones(M,N,P,...,GPUsingle)
  ones([M,N,P,...],GPUsingle)
  ones(N,GPUdouble)
  ones(M,N,GPUdouble)
  ones([M,N],GPUdouble)
  ones(M,N,P,...,GPUdouble)
  ones([M,N,P,...],GPUdouble)

MODULE NAME
  NUMERICS

DESCRIPTION
  ones(N,GPUsingle) is an N-by-N GPU matrix of ones.
  ones(M,N,GPUsingle) or ones([M,N],GPUsingle) is an M-by-N GPU matrix of ones.
  ones(M,N,P,...,GPUsingle) or ones([M,N,P,...],GPUsingle) is an M-by-N-by-P-by-... GPU array of ones.
  ones(M,N,P,...,GPUdouble) or ones([M,N,P,...],GPUdouble) is an M-by-N-by-P-by-... GPU array of ones.
  Compilation supported.

EXAMPLE
  ones(10,GPUsingle)
  ones(10,10,GPUsingle)
  ones([10 10],GPUsingle)
  ones(10,GPUdouble)
  ones(10,10,GPUdouble)
  ones([10 10],GPUdouble)

6.3.62 or

or - Logical OR

SYNTAX
  R = X | Y
  R = or(X,Y)
  X - GPUsingle, GPUdouble
  Y - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

MODULE NAME
  NUMERICS

DESCRIPTION
  A | B (or(A, B)) performs a logical OR of arrays A and B.
  Compilation supported.

EXAMPLE
  A = GPUsingle([1 2 0 4]);
  B = GPUsingle([1 0 0 4]);
  R = A | B;
  single(R)
  R = or(A, B);
  single(R)
118. s infinity
  clone - Creates a copy of a GPUtype
  conj - CONJ(X) is the complex conjugate of X
  cos - Cosine of argument in radians
  cosh - Hyperbolic cosine
  ctranspose - Complex conjugate transpose
  eq - Equal
  exp - Exponential
  fft - Discrete Fourier transform
  fft2 - Two-dimensional discrete Fourier transform
  floor - Round towards minus infinity
  ge - Greater than or equal
  GPUround - Round towards nearest integer
  GPUsinh - Hyperbolic sine
  GPUsqrt - Square root
  gt - Greater than
  ifft - Inverse discrete Fourier transform
  ifft2 - Two-dimensional inverse discrete Fourier transform
  ldivide - Left array divide
  le - Less than or equal
  log - Natural logarithm
  log10 - Common (base 10) logarithm
  log1p - Compute log(1+x) accurately
  log2 - Base 2 logarithm and dissect floating point number
  lt - Less than
  minus - Minus
  mrdivide - Slash or right matrix divide
  mtimes - Matrix multiply
  ne - Not equal
  not - Logical NOT
  or - Logical OR
  permute - Permute array dimensions
  plus - Plus
  power - Array power
  rdivide - Right array divide
  sin - Sine of argument in radians
  sinh - Hyperbolic sine
  slice - Subscripted reference
  sqrt - Square root
  subsref - Subscripted reference
  sum - Sum of elements
  tan - Tangent of argument in radians
  tanh - Hyperbolic tangent
  times - Array multiply
  vertcat - Vertical concatenation

6
119. se check Chapter 4 for more details.

The next sections explain the previous points in more detail.

3.11.1 Memory transfers

The most time-consuming task is the memory transfer from/to the GPU, such as initializing a GPU variable with a Matlab array. Here is an example:

  Ah = rand(1000);   % Ah is on CPU memory
  A = GPUsingle(Ah); % A is on GPU memory

In the above code the variable Ah is used to initialize the GPU variable A, which means that data is transferred from the CPU to the GPU memory. Vice versa, when a GPU variable is converted into a Matlab variable, there is a memory transfer from the GPU to the CPU:

  A = rand(1000,GPUsingle); % A is on GPU memory
  Ah = single(A);           % Ah is on CPU memory

The fastest way to initialize or create a GPU variable is to use existing variables on the GPU memory to create other GPU variables, or to use functions such as zeros, colon or rand, which directly create values on the GPU without transferring data from Matlab. Please check Section 3.2 for more information about creating new GPU variables with GPUmat.

3.11.2 Vectorized code and for loops

Another way to improve the code performance is to avoid for loops by using vectorized operations. For example:

  for i=1:1e6
    A = rand(3,3);
    B = rand(3,3);
    C A B
    %% do something with C
  end

The above code can be executed as is on the GPU by converting A and
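The two guidelines in this excerpt (create data directly on the GPU, and prefer whole-array operations to element-wise loops) can be combined in a few lines. The following is a minimal sketch under stated assumptions, not code from the manual: the array size N and the particular expression are arbitrary illustrations, and only functions listed in this guide (rand, times, plus, single) are used.

```matlab
% Sketch: vectorized GPU computation with a single transfer at the end.
% Assumptions: GPUmat is started; N and the expression are illustrative.
N = 1e6;

% Create the inputs directly on the GPU (no CPU -> GPU transfer).
A = rand(1, N, GPUsingle);
B = rand(1, N, GPUsingle);

% One vectorized expression over all N elements,
% instead of a for loop over i = 1:N.
C = A .* B + 1;    % runs on the GPU

% Transfer the result to the CPU once, when it is actually needed.
Ch = single(C);
```

Keeping intermediate results such as C on the GPU and converting only the final result limits the memory transfers to one.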
120. single, GPUdouble

MODULE NAME
  NUMERICS

DESCRIPTION
  ABS(X) is the absolute value of the elements of X. When X is complex, ABS(X) is the complex modulus (magnitude) of the elements of X.
  Compilation supported.

EXAMPLE
  X = rand(1,5,GPUsingle) + i*rand(1,5,GPUsingle);
  abs(X)

6.3.2 acos

acos - Inverse cosine

SYNTAX
  R = acos(X)
  X - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

MODULE NAME
  NUMERICS

DESCRIPTION
  ACOS(X) is the arccosine of the elements of X. NaN (Not A Number) results are obtained if ABS(x) > 1.0 for some element.
  Compilation supported.

EXAMPLE
  X = rand(10,GPUsingle);
  R = acos(X)

MATLAB COMPATIBILITY
  NaN returned if ABS(x) > 1.0. In this case Matlab returns a complex number. Not implemented for complex X.

6.3.3 acosh

acosh - Inverse hyperbolic cosine

SYNTAX
  R = acosh(X)
  X - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

MODULE NAME
  NUMERICS

DESCRIPTION
  ACOSH(X) is the inverse hyperbolic cosine of the elements of X.
  Compilation supported.

EXAMPLE
  X = rand(10,GPUsingle) + 1;
  R = acosh(X)

MATLAB COMPATIBILITY
  NaN is returned if X < 1.0. Not implemented for complex X.
121. 6.2.12 ~A

not - Logical NOT

SYNTAX
  R = ~X
  X - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

DESCRIPTION
  ~A (not(A)) performs a logical NOT of the input array A.
  Compilation supported.

EXAMPLE
  A = GPUsingle([1 2 0 4]);
  R = ~A;
  single(R)

6.2.13 A | B

or - Logical OR

SYNTAX
  R = X | Y
  R = or(X,Y)
  X - GPUsingle, GPUdouble
  Y - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

DESCRIPTION
  A | B (or(A, B)) performs a logical OR of arrays A and B.
  Compilation supported.

EXAMPLE
  A = GPUsingle([1 2 0 4]);
  B = GPUsingle([1 0 0 4]);
  R = A | B;
  single(R)
  R = or(A, B);
  single(R)

6.2.14 A + B

plus - Plus

SYNTAX
  R = X + Y
  R = plus(X,Y)
  X - GPUsingle, GPUdouble
  Y - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

DESCRIPTION
  X + Y (plus(X, Y)) adds matrices X and Y. X and Y must have the same dimensions unless one is a scalar (a 1-by-1 matrix). A scalar can be added to anything.
  Compilation supported.

EXAMPLE
  A = rand(10,GPUsingle);
  B = rand(10,GPUsingle);
  R = A + B
  A = rand(10,GPUsingle) + i*rand(10,GPUsingle);
  B = rand(10,GPUsingle) + i*rand(10,GPUsingle);
  R = A + B
122. 6.3.53 lt

lt - Less than

SYNTAX
  R = X < Y
  R = lt(X,Y)
  X - GPUsingle, GPUdouble
  Y - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

MODULE NAME
  NUMERICS

DESCRIPTION
  A < B (lt(A, B)) does element-by-element comparisons between A and B.
  Compilation supported.

EXAMPLE
  A = GPUsingle([1 2 0 4]);
  B = GPUsingle([1 0 0 4]);
  R = A < B;
  single(R)
  R = lt(A, B);
  single(R)

6.3.54 minus

minus - Minus

SYNTAX
  R = X - Y
  R = minus(X,Y)
  X - GPUsingle, GPUdouble
  Y - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

MODULE NAME
  NUMERICS

DESCRIPTION
  X - Y subtracts matrix Y from X. X and Y must have the same dimensions unless one is a scalar. A scalar can be subtracted from anything.
  Compilation supported.

EXAMPLE
  A = rand(10,GPUsingle);
  B = rand(10,GPUsingle);
  R = A - B
  A = rand(10,GPUdouble);
  B = rand(10,GPUdouble);
  R = A - B

6.3.55 mrdivide

mrdivide - Slash or right matrix divide

SYNTAX
  R = X / Y
  X - GPUsingle, GPUdouble
  Y - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

MODULE NAME
  NUMERICS

DESCRIPTION
  Slash or right matrix divide.
  Compilation supporte
123. t parameter R.
  Compilation supported.

EXAMPLE
  A = rand(10,GPUsingle) + sqrt(-1)*rand(10,GPUsingle);
  R = zeros(size(A), GPUsingle);
  GPUreal(A, R)

6.4.59 GPUsin

GPUsin - Sine of argument in radians

SYNTAX
  GPUsin(X, R)
  X - GPUsingle, GPUdouble
  R - GPUsingle, GPUdouble

MODULE NAME
  NUMERICS

DESCRIPTION
  GPUsin(X, R) is equivalent to sin(X), but the result is returned in the input parameter R.
  Compilation supported.

EXAMPLE
  X = rand(10,GPUsingle);
  R = zeros(size(X), GPUsingle);
  GPUsin(X, R)

6.4.60 GPUsingle

GPUsingle - GPUsingle constructor

SYNTAX
  R = GPUsingle()
  R = GPUsingle(A)
  A - Either a GPU variable or a Matlab array
  R - GPUsingle variable

MODULE NAME
  na

DESCRIPTION
  GPUsingle is used to create a Matlab variable allocated on the GPU memory. Operations on GPUsingle objects are executed on the GPU.
  Compilation supported.

EXAMPLE
  Ah = rand(100);
  A = GPUsingle(Ah);
  Bh = rand(100) + i*rand(100);
  B = GPUsingle(Bh);

6.4.61 GPUstop

GPUstop - Stops the GPU environment

SYNTAX
  GPUstop

MODULE NAME
  na

D
124. ter than the native GPUmat code. Nevertheless, there are some limitations (see Section 4.4). The compilation is performed as follows:
- Start the compilation. Define the input arguments of the generated function.
- Execute operations on the GPU by running GPUmat code. Every GPU operation is recorded into the generated function.
- Stop the compilation. Define the output arguments of the generated function.

The following code generates a function [r1, r2, ..., rn] = name(p1, p2, ..., pn), where p1 to pn are input parameters and r1 to rn are output parameters:

  GPUcompileStart(name, p1, p2, ..., pn)
  GPUcompileStop(r1, r2, ..., rn)

For example, the following code shows how to compile a function myexp having one input and one output argument and the same behavior as the native GPUmat exp function:

  A = randn(5,GPUsingle); % A is a dummy variable
  GPUcompileStart('myexp','-f',A)
  R = exp(A);
  GPUcompileStop(R)

The GPUcompileStart function is used to start the compilation and has the following interface:

  GPUcompileStart(name, p1, p2, ..., pn)

The parameter name is the name of the compiled function. Parameters p1 to pn are the input arguments of the compiled function. They can be a GPUtype (GPUsingle, GPUdouble, etc.) or a Matlab variable. The variable A in the above example is a dummy variable. It is used to define the first input argument of the function myexp. After calling the GPUco
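Once recorded, the generated function is called like any other Matlab function. The following is a minimal sketch, not taken from the manual, that compiles myexp as in the example above and then compares its result with the native exp. It assumes that the GPU environment is started and that the compiled function is available on the Matlab path after GPUcompileStop; the variable names X, Y1, Y2 and maxdiff are hypothetical, and the test input deliberately has the same size as the dummy variable.

```matlab
% Sketch: record exp(A) into myexp, then call the compiled function.
% Assumptions: GPUmat is started; myexp is callable after GPUcompileStop.
A = randn(5, GPUsingle);           % dummy variable, defines the input argument
GPUcompileStart('myexp', '-f', A);
R = exp(A);                        % recorded into myexp
GPUcompileStop(R);                 % R defines the output argument

% Hypothetical check: compare the compiled function with native exp.
X  = randn(5, GPUsingle);
Y1 = myexp(X);
Y2 = exp(X);
maxdiff = max(max(abs(single(Y1) - single(Y2))));
```

The comparison is done on the CPU after converting both results with single, so it relies only on built-in Matlab functions.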
125. unctions the user needs some experience in GPU programming. For example, low level functions can directly manage GPU memory, which is automatically handled by a Garbage Collector in high level functions. Low level functions can also directly access CUDA libraries, such as CUBLAS and CUFFT. The detailed list of high level and low level functions can be found in Chapter 6.

GPUmat can be used in the following ways:
- As any other Matlab toolbox, by using high level functions. This is the easiest way to use GPUmat.
- As a GPU Source Development Kit, in order to integrate functions that are not available in the library, by using both low and high level functions.

The GPUmat compiler can also be used to record GPU operations into new functions. This chapter describes how to use the GPUmat high level functions. Users can find further information about low level functions in Chapter 5. The full function reference is in Chapter 6.

This chapter describes the following topics:
- Starting the GPU environment
- Creating a GPU variable
- Performing calculations on the GPU
- Converting a GPU variable into a Matlab variable
- Indexed references
- GPUmat functions
- GPU memory management
- Complex numbers
- Compatibility between Matlab and GPUmat
- GPUmat code performance

3.1 Starting the GPU environment

Name       Description
GPUstart   Starts GPU environment and
126. xp(A);
  GPUcompileStop(R)

6.3.30 GPUdouble

GPUdouble - GPUdouble constructor

SYNTAX
  R = GPUdouble()
  R = GPUdouble(A)
  A - Either a GPU variable or a Matlab array
  R - GPUdouble variable

MODULE NAME
  na

DESCRIPTION
  GPUdouble is used to create a Matlab variable allocated on the GPU memory. Operations on GPUdouble objects are executed on the GPU.
  Compilation supported.

EXAMPLE
  GPUdouble(rand(100,100));
  Ah = rand(100);
  A = GPUdouble(Ah);
  Bh = rand(100) + i*rand(100);
  B = GPUdouble(Bh);

6.3.31 GPUinfo

GPUinfo - Prints information about the GPU device

SYNTAX
  GPUinfo

MODULE NAME
  na

DESCRIPTION
  GPUinfo displays information about each CUDA-capable device installed on the system. Printed information includes total memory and number of processors. GPUinfo(N) displays information about the specific device with index N.
  Compilation supported.

EXAMPLE
  GPUinfo(0)

6.3.32 GPUisDoublePrecision

GPUisDoublePrecision - Check if GPU is double precision

SYNTAX
  GPUisDoublePrecision

MODULE NAME
  na

DESCRIPT
