User's Manual

1.
        // input - same struct is output from "cg_multipaintVP.cg"
        struct MultiPaintV2F {
            float4 HPosition : POSITION;
            float4 TexCoords : TEXCOORD0;   // base ST coordinates
            float3 OPosition : TEXCOORD1;   // position (obj space)
            float3 Normal    : TEXCOORD2;   // normal (eye space)
            float3 VPosition : TEXCOORD3;   // view pos (obj space)
            float3 T         : TEXCOORD4;   // tangent (obj space)
            float3 B         : TEXCOORD5;   // binormal (obj space)
            float3 N         : TEXCOORD6;   // normal (obj space)
            float4 LightVecO : [...]        // [...]
        };

        // channels in our material map
        #define [...] x
        #define METALNESS y
        #define NORM_SPEC_EXPON z

        // subfields in "SpecData"
        #define MINPOWER x
        #define MAXPOWER y
        #define MAXSPEC z

        // subfields in "ReflData"
        #define FRESNEL_MIN x
        #define FRESNEL_MAX y
        #define FRESNEL_EXPON z
        #define REFL_STRENGTH w

    [808-00504-0000-006  NVIDIA  167 / Cg Language Toolkit]

        // subfields in "BumpData"
        #define BUMP_SCALE x

        half4 main(MultiPaintV2F IN,
                   uniform sampler2D ColorMap,     // [...]
                   uniform sampler2D MaterialMap,  // see above
                   uniform sampler2D NormalMap,    // tangent-space normals
                   uniform samplerCUBE EnvMap,     // environment (skybox)
                   uniform float4 SpecData,        // see above
                   uniform float4 ReflData,        // see above
                   uniform float4 BumpData         // see above
                  ) : COLOR
        {
            half4 surfCol = tex2D(Col
2.
        };
        DWORD declaration[] = {
            D3DVSD_STREAM(0),
            D3DVSD_REG(cgD3D8ResourceToInputRegister(CG_POSITION), D3DVSDT_FLOAT3),
            D3DVSD_REG(cgD3D8ResourceToInputRegister(CG_COLOR0), D3DVSDT_D3DCOLOR),
            D3DVSD_STREAM(1),
            D3DVSD_SKIP(4),
            D3DVSD_REG(cgD3D8ResourceToInputRegister(CG_TEXCOORD0), D3DVSDT_FLOAT2),
            D3DVSD_END()
        };

    If it is possible to do so, the functions cgD3D9ResourceToDeclUsage() and cgD3D8ResourceToInputRegister() convert a CGresource enumerated type into a Direct3D vertex shader input register:

        BYTE cgD3D9ResourceToDeclUsage(CGresource resource);
        DWORD cgD3D8ResourceToInputRegister(CGresource resource);

    If the resource is not a vertex shader input resource, the call to cgD3D9ResourceToDeclUsage() returns CGD3D9_INVALID_REG and the call to cgD3D8ResourceToInputRegister() returns CGD3D8_INVALID_REG.

    To write the vertex declarations described above based on the program parameters (which eliminates the reference to any semantic), use cgD3D9ResourceToDeclUsage() or cgD3D8ResourceToInputRegister():

        CGparameter position = cgGetNamedParameter(program, "position");
        CGparameter color = cgGetNamedParameter(program, "color");
        CGparameter texCoord = cgGetNamedParameter(program, "texCoord");
        const D3DVERTEXELEMENT9 declaration[] = {
            [...] D3DDECL
3. With this program handle, cgEvaluateProgram() evaluates the program over the same one-, two-, or three-dimensional domain. Its parameters are as follows:
    - a CGprogram handle
    - a float * to an output buffer
    - the number of components in the output buffer (1, 2, 3, or 4)
    - the number of positions in the x dimension at which to evaluate the function
    - the number of positions in the y dimension
    - the number of positions in the z dimension

    The total size of the buffer should be equal to the product of the number of positions in each of the dimensions and the number of components in the buffer.

        #define RES 256
        #define NCOMPS 4
        float *buf = new float[NCOMPS * RES * RES];
        cgEvaluateProgram(tp, buf, NCOMPS, RES, RES, 1);
        // Do something with buf
        delete [] buf;

    It is a runtime error to pass a CGprogram that doesn't have the CG_PROFILE_GENERIC profile to cgEvaluateProgram().

    Annotations

    Additionally, each variable, technique, pass, and program in the file can have an optional annotation. The annotation is a per-variable-instance structure that contains data that the effect author wants to communicate to a CgFX-aware application, such as an artist tool. The application can then allow the variable to be manipulated based on a GUI element that is appropriate for the type of annotation. An annotation can be used to describe a user in
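The buffer-size rule in the excerpt above can be sketched in plain C. This is only an illustration: `eval_buffer_floats` and `alloc_eval_buffer` are hypothetical helper names, not part of the Cg runtime, and the sketch assumes nothing about how the runtime orders values inside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not a Cg runtime function): number of floats an
 * output buffer needs when ncomps components are evaluated at
 * nx * ny * nz positions. */
size_t eval_buffer_floats(size_t ncomps, size_t nx, size_t ny, size_t nz)
{
    assert(ncomps >= 1 && ncomps <= 4);      /* the text allows 1, 2, 3, or 4 */
    assert(nx >= 1 && ny >= 1 && nz >= 1);   /* unused dimensions are passed as 1 */
    return ncomps * nx * ny * nz;
}

/* Hypothetical helper: allocate a zero-initialized buffer of that size.
 * The caller releases it with free(). */
float *alloc_eval_buffer(size_t ncomps, size_t nx, size_t ny, size_t nz)
{
    return calloc(eval_buffer_floats(ncomps, nx, ny, nz), sizeof(float));
}
```

With the excerpt's values (NCOMPS 4, RES 256, and a z dimension of 1), the helper yields 4 * 256 * 256 * 1 floats, which matches the size passed to `new float[...]` there.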
4. [...]

    Shadow Mapping

    Description

    This effect shows generating texture coordinates for shadow mapping, along with using the shadow map in the lighting equation per pixel (Fig. 19).

    Fig. 19. Example of Shadow Mapping

    Vertex Shader Source Code for Shadow Mapping

        struct appdata {
            float3 Position : POSITION;
            float3 Normal   : NORMAL;
        };

        struct vpconn {
            float4 Hposition : POSITION;
            float4 TexCoord0 : TEXCOORD0;
            float4 TexCoord1 : TEXCOORD1;
            float4 Color0    : COLOR0;
        };

        vpconn main(appdata IN,
                    uniform float4x4 WorldViewProj,
                    uniform float4x4 TexTransform,
                    uniform float3x3 WorldIT,
                    uniform float3 LightVec)
        {
            vpconn OUT;
            float3 worldNormal = normalize(mul(WorldIT, IN.Normal));
            float ldotn = max(dot(LightVec, worldNormal), 0);
            OUT.Color0 [...]
            float4 tempPos;
            tempPos.xyz = IN.Position.xyz;
            tempPos.w = 1.0;
            OUT.TexCoord0 = mul(TexTransform, tempPos);
            OUT.TexCoord1 = mul(TexTransform, tempPos);
            OUT.Hposition = mul(WorldViewProj, tempPos);
            return OUT;
        }

    Pixel Shader Source Code for Shadow Mapping

        struct v2f_simple {
            float4 Hposition : POSITION;
            float4 TexCoord0 : TEXCOORD0;
            float4 TexCoord1 : TEXCOORD1;
            float4 Color0    : COLOR0;
5.
        HRESULT hresult = cgD3D8LoadProgram(vertexProgram, TRUE, D3DXASM_DEBUG,
                                            D3DUSAGE_SOFTWAREVERTEXPROCESSING, declaration);
        HRESULT hresult = cgD3D8LoadProgram(fragmentProgram, TRUE, 0, 0, 0);

    If you want to apply the same vertex program to several sets of geometric data, each having a different layout, you need to load the program with different vertex declarations in Direct3D 8. To do so, you need to make a duplicate of the program using cgCopyProgram() for each of these declarations. Here is a code sample illustrating this operation:

        CGprogram program1, program2;
        program1 = cgCreateProgramFromFile(context, CG_SOURCE, [...]);
        const DWORD* declaration1 = cgD3D8GetVertexDeclaration(program1);
        cgD3D8LoadProgram(program1, TRUE, 0, 0, declaration1);
        program2 = cgCopyProgram(program1);
        const DWORD declaration2[] = {
            // Custom declaration
        };
        if (cgD3D8ValidateVertexDeclaration(program2, declaration2))
            cgD3D8LoadProgram(program2, TRUE, 0, 0, declaration2);

    Only the loading functions differ between Direct3D 9 and Direct3D 8; the unloading and binding functions are the same.

    To release the Direct3D resources allocated by cgD3D9LoadProgram(), such as the Direct3D shader object and any shadowed parameter, use:

        HRESULT cgD3D9UnloadProgram(CGprogram program);

    Note that cgD3D9UnloadProgram() does not free any core runtime resources, such as
6.
    - CG_INVALID_PROFILE_ERROR: Returned when the profile is not supported.
    - CG_INVALID_VALUE_TYPE_ERROR: Returned when an unknown value type is assigned to a parameter.
    - CG_NOT_MATRIX_PARAM_ERROR: Returned when the parameter is not of a matrix type.
    - CG_INVALID_ENUMERANT_ERROR: Returned when the enumerant parameter has an invalid value.
    - CG_NOT_4x4_MATRIX_ERROR: Returned when the parameter must be a 4x4 matrix type.
    - CG_FILE_READ_ERROR: Returned when the file cannot be read.
    - CG_FILE_WRITE_ERROR: Returned when the file cannot be written.
    - CG_MEMORY_ALLOC_ERROR: Returned when a memory allocation fails.
    - CG_INVALID_CONTEXT_HANDLE_ERROR: Returned when an invalid context handle is used.
    - CG_INVALID_PROGRAM_HANDLE_ERROR: Returned when an invalid program handle is used.
    - CG_INVALID_PARAM_HANDLE_ERROR: Returned when an invalid parameter handle is used.
    - CG_UNKNOWN_PROFILE_ERROR: Returned when the specified profile is unknown.
    - CG_VAR_ARG_ERROR: Returned when the variable arguments are specified incorrectly.
    - CG_INVALID_DIMENSION_ERROR: Returned when the dimension value is invalid.
    - CG_ARRAY_PARAM_ERROR: Returned when the parameter must be an array.
    - CG_OUT_OF_ARRAY_BOUNDS_ERROR: Returned when the index into an array is out of bounds.

    API-Specific Cg Runtimes

    Each API-specific Cg runtime provides an additional se
7.
    Function                                                             Description
    tex1D(sampler1D tex, float2 sz)                                      1D nonprojective depth compare
    tex1D(sampler1D tex, float2 sz, float dsdx, float dsdy)              1D nonprojective depth compare with derivatives
    tex1Dproj(sampler1D tex, float2 sq)                                  1D projective
    tex1Dproj(sampler1D tex, float3 szq)                                 1D projective depth compare
    tex2D(sampler2D tex, float2 s)                                       2D nonprojective
    tex2D(sampler2D tex, float2 s, float2 dsdx, float2 dsdy)             2D nonprojective with derivatives
    tex2D(sampler2D tex, float3 sz)                                      2D nonprojective depth compare
    tex2D(sampler2D tex, float3 sz, float2 dsdx, float2 dsdy)            2D nonprojective depth compare with derivatives
    tex2Dproj(sampler2D tex, float3 sq)                                  2D projective
    tex2Dproj(sampler2D tex, float4 szq)                                 2D projective depth compare

    Table 3. Texture Map Functions (continued)

    Function                                                             Description
    texRECT(samplerRECT tex, float2 s)                                   2D RECT nonprojective
    texRECT(samplerRECT tex, float2 s, float2 dsdx, float2 dsdy)         2D RECT nonprojective with derivatives
    texRECT(samplerRECT tex, float3 sz)                                  2D RECT nonprojective depth compare
    texRECT(samplerRECT tex, float3 sz, float2 dsdx, float2 dsdy)        2D RECT nonprojective depth compare with derivatives
    texRECTproj(samplerRECT tex, float3 sq)                              2D RECT projective
    texRECTproj(samplerRECT tex, float3 szq)                             2D RECT pro
8.
        };

        float4 main(v2f_simple IN,
                    uniform sampler2D ShadowMap,
                    uniform sampler2D SpotLight) : COLOR
        {
            float4 shadow = tex2D(ShadowMap, IN.TexCoord0.xy);
            float4 spotlight = tex2D(SpotLight, IN.TexCoord1.xy);
            float4 lighting = IN.Color0;
            return shadow * spotlight * lighting;
        }

    Shadow Volume Extrusion

    Description

    This effect uses vertex programs to generate shadow volumes by extruding geometry along the light vector (Fig. 20).

    Fig. 20. Example of Shadow Volume Extrusion

    Vertex Shader Source Code for Shadow Volume Extrusion

        struct appdata {
            [...] Position : POSITION;
            float3 Normal : NORMAL;
            [...] : COLOR0;
            float2 TexCoord0 : TEXCOORD0;
        };

        struct vpconn {
            float4 Hposition : POSITION;
            [...] Color0 : [...];
            float2 TexCoord0 : TEXCOORD0;
        };

        vpconn main(appdata IN,
                    uniform float4x4 WorldViewProj,
                    uniform float4 LightPos,    // in object space
                    uniform float4 Fatness,
                    uniform float4 ShadowExtrudeDist,
                    uniform float4 Factors)
        {
            vpconn OUT;
            // Create normalized vector from vertex to light
            float4 light_to_vert = normalize(IN.Position - LightPos);
            // N dot L to decide if point should be moved away
            // from the light to extrude the volume
            float ndotl = dot(light_to_vert.xyz, IN.Normal.xyz);
            // Inset the position along the normal vector direction
9.
            out float4 colorO : COLOR0,
            out float4 texCoordO : TEXCOORD0,
            const uniform float4x4 ModelViewMatrix)
        {
            positionO = mul(position, ModelViewMatrix);
            colorO = color;
            texCoordO = texCoord;
        }

    Fragment Program

    The following Cg code is assumed to be in a file called FragmentProgram.cg.

        void FragmentProgram(
            in float4 color : COLOR0,
            in float4 texCoord : TEXCOORD0,
            out float4 colorO : COLOR,
            const uniform sampler2D BaseTexture,
            const uniform float4 SomeColor)
        {
            colorO = color * tex2D(BaseTexture, texCoord) + SomeColor;
        }

    Direct3D 9 Application

    The following C code links the previous vertex and fragment programs to the Direct3D 9 application.

        #include <cg/cg.h>
        #include <cg/cgD3D9.h>

        IDirect3DDevice9* device;         // Initialized somewhere else
        IDirect3DTexture9* texture;       // Initialized somewhere else
        D3DXMATRIX matrix;                // Initialized somewhere else
        D3DXCOLOR constantColor;          // Initialized somewhere else
        CGcontext context;
        CGprogram vertexProgram, fragmentProgram;
        IDirect3DVertexDeclaration9* vertexDeclaration;
        IDirect3DVertexShader9* vertexShader;
        IDirect3DPixelShader9* pixelShader;
        CGparameter baseTexture, someColor, modelViewMatrix;

        // Called at application startup
        void OnStartup()
        {
            // Create context
            context = cgCreateContext();
        }

        // Called whenever the Direct3D device needs to be create
10.
        // Fragment program
        struct myvf {
            float4 diffusecolor : COLOR0;
            float4 uv0 : TEXCOORD0;
            float4 uv1 : TEXCOORD1;
        };
        fragout bar(myvf indata) {
            float4 x = indata.uv0;
            [...]
        }

    The following binding semantics are available in all Cg vertex profiles for output from vertex programs: POSITION, PSIZE, FOG, COLOR0, COLOR1, and TEXCOORD0 to TEXCOORD7. All vertex programs must declare and set a vector output that uses the POSITION binding semantic. This value is required for rasterization.

    To ensure interoperability between vertex programs and fragment programs, both must use the same struct for their respective outputs and inputs. For example:

        struct myvert2frag {
            float4 pos : POSITION;
            float4 uv0 : TEXCOORD0;
            float4 uv1 : TEXCOORD1;
        };
        // Vertex program
        myvert2frag vertmain([...]) {
            myvert2frag outdata;
            [...]
            return outdata;
        }
        // Fragment program
        void fragmain(myvert2frag indata) {
            float4 tcoord = indata.uv0;
            [...]
        }

    Note that values associated with some vertex output semantics are intended for, and are used by, the rasterizer. These values cannot actually be used in the fragment program, even though they appear in the input struct. For example, the indata.pos value associated with the POSITION fragment semantic may not be read in the fragmain shader.

    Varying Outputs from Fragment Programs

    Binding semantics are always required on the outputs of fr
11. The OpenGL ARB Vertex Program Profile is used to compile Cg source code to vertex programs compatible with version 1.0 of the GL_ARB_vertex_program extension.
    - Profile name: arbvp1
    - How to invoke: Use the compiler option -profile arbvp1.

    This section describes the capabilities and restrictions of Cg when using the arbvp1 profile.
    - The arbvp1 profile is similar to the vp20 profile except for the format of its output and its capability of accessing OpenGL state easily.
    - ARB_vertex_program has the same capabilities as NV_vertex_program and DirectX 8 vertex shaders, so the limitations that this profile places on the Cg source code written by the programmer are the same as those of the NV_vertex_program profile.

    Accessing OpenGL State

    The arbvp1 profile allows Cg programs to refer to the OpenGL state directly, unlike the vp20 profile. However, if you want to write Cg programs that are compatible with the vp20, vp30, and dx8vs profiles, you should use the alternate mechanism of setting uniform variables with the necessary state using the Cg runtime.

    The compiler relies on the feature of ARB vertex assembly programs that enables parts of the OpenGL state to be written automatically to program parameter registers as the state changes. The OpenGL driver handles this state-tracking feature. A special variable semantic called state can be used to refer to every part of the OpenGL state that ARB vertex programs can reference. Following this pa
12.
    [...]ls  312
    Table 51. ps_1_x Uniform Input Binding Semantics  313
    Table 52. ps_1_x Varying Input Binding Semantics  314
    Table 53. ps_1_x Varying Output Binding Semantics  314
    Table 54. ps_1_x Auxiliary Texture Functions  315

    Foreword

    We are in the midst of a great transition in computer graphics, both in terms of graphics hardware and in terms of the visual quality and authoring process for games, interactive applications, and animation. Graphics hardware has evolved from "big iron" graphics workstations costing hundreds of thousands of dollars to single-chip graphics processing units (GPUs) whose performance and features have grown to match, and now even to exceed, traditional workstations. The processing power provided by a modern GPU in a single frame rivals the amount of computation that used to be expended for an offline-rendered animation frame. Indeed, at the launch of GeForce3 on the Apple Macintosh, a convincing version of Pixar's Luxo Jr. was demonstrated running interactively in real time. At the 2001 SIGGRAPH conference, an interactive version of a more recent film, Square Studios' Final Fantasy, was shown running in real time, again on a GeForce3.

    Although these feats of computation are astounding, there is much more to come. Today's GPUs evolve very quickly. Typically, a product generation is only six months long, and with
13.
        // Application code that is traced
        cgD3D9EnableDebugTracing(CG_FALSE);

    Note that each debug trace output sets an error equal to cgD3D9DebugTrace. So, if an error callback has been registered with the core runtime using cgSetErrorCallback(), each debug trace output triggers a call to this error callback (see "Using Error Callbacks" on page 116).

    Direct3D Error Reporting

    Error reporting in Cg includes defined error types, functions that allow testing for errors, and support for error callbacks.

    Direct3D Error Types

    The Direct3D runtime generates errors of type CGerror, reported by the Cg core runtime, and of type HRESULT, reported by the Direct3D runtime. In addition, it returns the errors listed in the next two groups that are specific to the Direct3D Cg runtime.
    - CGerror
        cgD3D9Failed: Set when a Direct3D runtime function makes a Direct3D call that returns an error.
        cgD3D9DebugTrace: Set when a debug message is output to the debug console when using the debug DLL (see "Direct3D Debugging Mode" on page 112).
    - HRESULT
        CGD3D9ERR_INVALIDPARAM: Returned when a parameter value cannot be set.
        CGD3D9ERR_INVALIDPROFILE: Returned when a program with an unexpected profile is passed to a function.
        CGD3D9ERR_INVALIDSAMPLERSTATE: Returned when a parameter of type D3DTEXTURESTAGESTATETYPE, which is not a valid sampler state, is passed to a sampler state function.
14.
    Binding Semantics Name                Corresponding Data
    [...]                                 Input point size; Generic Attribute 6
    BLENDINDICES, ATTR7                   Generic Attribute 7
    TEXCOORD0 to TEXCOORD7,               Input texture coordinates (texcoord0 to texcoord7);
      ATTR8 to ATTR15                       Generic Attributes 8 to 15
    TANGENT, ATTR14                       Generic Attribute 14
    BINORMAL, ATTR15                      Generic Attribute 15

    The valid binding semantics for varying output parameters in the vp20 profile are summarized in Table 31. These binding semantics map to NV_vertex_program output registers. The two sets act as aliases to each other.

    Table 31. vp20 Varying Output Binding Semantics

    Binding Semantics Name                Corresponding Data
    POSITION, HPOS                        Output position
    PSIZE, PSIZ                           Output point size
    FOG, FOGC                             Output fog coordinate
    COLOR0, COL0                          Output primary color
    COLOR1, COL1                          Output secondary color
    BCOL0                                 Output backface primary color
    BCOL1                                 Output backface secondary color
    TEXCOORD0 to TEXCOORD3, TEX0 to TEX3  Output texture coordinates

    The profile also allows WPOS to be present as binding semantics on a member of a varying output data structure, provided the member with this binding semantics is not referenced. This allows Cg programs to have the same structure specify the varying output of a vp20 profile prog
15. Structure methods are called using the "." notation: given an object of type Foo, the valueTimesTwo method is called by writing the object's name followed by .valueTimesTwo().

    Interfaces

    Interfaces may be declared in order to define a set of methods that a structure must provide in order to implement that interface. Programs and functions can take interfaces as parameters, where the specific structure types being passed to them may be resolved at runtime. Depending on hardware limitations, some profiles may require that the concrete types associated with a particular usage of interfaces be resolved by the runtime before the program can execute.

    Interfaces are specified with the interface keyword:

        interface Light {
            float3 illuminate(float3 position);
        };

    A structure indicates that it implements a particular interface with a colon and the name of the interface:

        struct [...] : Light {
            float3 illuminate(float3 position) { [...] }
        };

    A structure may only implement a single interface, and inheritance between structures is not supported.

    Types

    Cg's types are as follows:
    - The int type is preferably 32-bit two's complement. Profiles may optionally treat int as float.
    - The float type is as close as possible to the IEEE single-precision (32-bit) floating point. Profiles must support the float data type.
    - The half type is lower-precision IEEE-like floating point. Profiles must su
16. Varying input binding semantics in the fp20 profile consist of COLOR0, COLOR1, TEXCOORD0, TEXCOORD1, TEXCOORD2, and TEXCOORD3. These map to output registers in vertex shaders.

    The valid binding semantics for varying input parameters in the fp20 profile are summarized in Table 36.

    Table 36. fp20 Varying Input Binding Semantics

    Binding Semantics Name                Corresponding Data
    COLOR, COLOR0, COL, COL0              Input color value v0
    COLOR1, COL1                          Input color value v1
    TEXCOORD0 to TEXCOORD3, TEX0 to TEX3  Input texture coordinates (t0 to t3)
    FOGP, FOG                             Input fog color and factor

    Additionally, the fp20 profile allows POSITION, PSIZE, TEXCOORD4, TEXCOORD5, TEXCOORD6, and TEXCOORD7 to be specified on varying inputs, provided these inputs are not referenced. This allows Cg programs to have the same structure specify the varying output of a vp20 profile program and the varying input of an fp20 profile program.

    The valid binding semantics for varying output parameters in the fp20 profile are summarized in Table 37.

    Table 37. fp20 Varying Output Binding Semantics

    Binding Semantics Name                Corresponding Data
    COLOR, COLOR0, COL, COL0              Output color (float4)
    DEPTH, DEPR                           Output depth (float)

    The output depth value is special in that it may only be assigned a value of the form

        float4 t = <texture shader operation>;
        float z = dot(texCoord<n>, t
17.
            float4 HPosition : POSITION;
            [...] OPosition : [...];
            [...]
            float3 Normal : [...];
            float3 TexCoord0 : TEXCOORD0;
            [...] Color0 : [...];
            float3 LightPos : [...];
            float3 ViewerPos : TEXCOORD5;
        };

        void calcLighting(out float diffuse, out float specular,
                          float3 normal, float3 fragPos, float3 lightPos,
                          float3 eyePos, float specularExp)
        {
            float3 light = lightPos - fragPos;
            float len = length(light);
            light = light / len;
            float3 eye = normalize(eyePos - fragPos);
            float3 halfVec = normalize([...]);
            float attenuation = [...];
            float4 lighting = lit(dot(light, normal),
                                  dot(halfVec, normal), specularExp);
            diffuse = lighting.y * attenuation;
            specular = lighting.z * attenuation;
        }

        float4 main(vert2frag IN,
                    uniform float4 LightPos,
                    uniform sampler3D noise_map,
                    uniform sampler2D nv_map,
                    uniform samplerCUBE cube_map,
                    uniform float4 interpolate) : COLOR
        {
            float diffuse, specular;
            float3 biVariate = float3(IN.OPosition.x, IN.OPosition.z, [...]);
            float3 uniVariate = float3(IN.OPosition.x, IN.OPosition.z, 0.0);
            float3 normal = normalize(IN.Normal);
            float3 noiseTex = float3(IN.OPosition.x, IN.OPosition.z, [...]);
            float3 noiseSum = tex3D(noise_map, biVariate [...] 3).rgb [...] 12 [...]
                              tex3D(noise_map, noiseTex [...]).rgb [...] 18
18.
    [...] -> M[n][m]                       Componentwise
    M[n][m] [...] M[n][m] -> M[n][m]       Componentwise
    M[n][m] [...] M[n][m] -> M[n][m]       Componentwise

    Operators

    Boolean: && || !

    Boolean operators may be applied to bool packed bool vectors, in which case they are applied in elementwise fashion to produce a result vector of the same size. Each operand must be a bool vector of the same size. Both sides of && and || are always evaluated; there is no short-circuiting as there is in C.

    Comparisons: < > <= >= != ==

    Comparison operators may be applied to numeric vectors. Both operands must be vectors of the same size. The comparison operation is performed in elementwise fashion to produce a bool vector of the same size.

    Comparison operators may also be applied to bool vectors. For the purpose of relational comparisons, true is treated as one and false is treated as zero. The comparison operation is performed in elementwise fashion to produce a bool vector of the same size.

    Comparison operators may also be applied to numeric or bool scalars.

    Arithmetic: + - * / % ++ -- unary- unary+

    The arithmetic operator % is the remainder operator, as in C. It may only be applied to two operands of cint or int type. When / or % is used with cint or int operands, C rules for integer / and % apply.

    The C operators that combine assignment with arithmetic operat
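The elementwise rules described in the excerpt above can be modeled in plain C as a rough sketch. The `vec3` and `bvec3` types and the `less_than` and `and3` helpers are illustrative stand-ins (assumed names) for Cg's float3, bool3, `<`, and `&&`; they are not Cg or Cg runtime definitions.

```c
#include <stdbool.h>

typedef struct { float x, y, z; } vec3;   /* stand-in for a Cg float3 */
typedef struct { bool x, y, z; } bvec3;   /* stand-in for a Cg bool3  */

/* Elementwise a < b: each component is compared independently and the
 * result is a bool vector of the same size. */
bvec3 less_than(vec3 a, vec3 b)
{
    bvec3 r = { a.x < b.x, a.y < b.y, a.z < b.z };
    return r;
}

/* Elementwise AND. Both operands are complete vectors that have already
 * been computed by the time this function runs, which mirrors the rule
 * that both sides are always evaluated (no short-circuiting). */
bvec3 and3(bvec3 a, bvec3 b)
{
    bvec3 r = { a.x && b.x, a.y && b.y, a.z && b.z };
    return r;
}
```

Passing whole vectors to a function is one way to picture why short-circuiting cannot apply: there is no single truth value available on the left side to decide whether the right side is needed.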
19.
            [...]
            float4 sheenColor [...]
            float4 skinColor = tex2D(tex1, In.texcoords);
            float3 g = [...];
            float3 albedo = [...];
            // oiliness mask
            float4 oiliness = 0.9 * tex2D(tex2, In.texcoords);
            // Get eye-space eye vector
            float3 v = normalize([...] In.eyeSpacePosition);
            // Get eye-space light and halfangle vectors
            float3 l = normalize(eyeSpaceLightPosition - In.eyeSpacePosition);
            [...]
            // Get tangent-space normal vector from normal map
            float3 tangentSpaceNormal = tex2D(tex0, In.texcoords).rgb;
            float3 bumpscale = { bscale, bscale, 1.0 };
            tangentSpaceNormal = tangentSpaceNormal * bumpscale;
            // Transform it into eye space
            float3 n;
            n[0] = dot(In.tangentToEyeMat0.xyz, tangentSpaceNormal);
            n[1] = dot(In.tangentToEyeMat1, tangentSpaceNormal);
            n[2] = dot(In.tangentToEyeMat2, tangentSpaceNormal);
            n = normalize(n);
            // Compute the lighting equation
            float ndotl = max(dot(n, l), 0);   // clamp 0 to 1
            float ndoth = max(dot(n, h), 0);   // clamp 0 to 1
            [...]
            // Compute oil, sheen, subsurf scattering contributions
            [...]
            float4 sheen;
            [...]
            // Compute fresnel at sheen layer, ramp it up a bit
            Kr = fresnel(v, n, eta
20.
    Binding Semantics Name                Corresponding Data
    COLOR, COLOR0, COL, COL0              Output color (float4)
    DEPTH, DEPR                           Output depth (float)

    The output depth value is special in that it may only be assigned a value in the ps_1_3 profile and must be of the form

        float4 t = <texture addressing operation>;
        float z = dot(texCoord<n>, t.xyz);
        float w = dot(texCoord<n+1>, t.xyz);
        depth = z / w;

    Auxiliary Texture Functions

    Because the capabilities of the texture addressing instructions are limited in DirectX pixel shader 1_X, a set of auxiliary functions is provided in these profiles that express the functionality of the more complex texture addressing instructions. These functions are provided merely as a convenience for writing ps_1_x Cg programs. The same result can be achieved by writing the expanded form of each function directly. The expanded form has the added advantage of being supported on other profiles. These functions are summarized in Table 54.

    Table 54. ps_1_x Auxiliary Texture Functions

    Texture Function:
        offsettex2D(uniform sampler2D tex, float2 st, float4 prevlookup, uniform float4 m)
    Description:
        Performs the following:
            float2 newst = st + m.xy * prevlookup.xx + m.zw * prevlookup.yy;
            return tex2D(tex, newst);
        where st are texture coordinates associated with sampler tex, prevlookup is the result of a previous texture o
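As a worked illustration of the coordinate arithmetic that offsettex2D is described as performing, here is a small C sketch. It assumes the expression reads `st + m.xy * prevlookup.xx + m.zw * prevlookup.yy` (the operators are reconstructed from a damaged extract), and the `vec2`/`vec4` types and the `offset_coords` name are hypothetical stand-ins, not Cg definitions. The texture lookup itself is left out.

```c
typedef struct { float x, y; } vec2;          /* stand-in for a Cg float2 */
typedef struct { float x, y, z, w; } vec4;    /* stand-in for a Cg float4 */

/* Hypothetical helper: computes only the offset coordinates
 *   newst = st + m.xy * prevlookup.xx + m.zw * prevlookup.yy
 * written out per component. */
vec2 offset_coords(vec2 st, vec4 prevlookup, vec4 m)
{
    vec2 newst;
    /* m.xy * prevlookup.xx contributes (m.x * p.x, m.y * p.x);
     * m.zw * prevlookup.yy contributes (m.z * p.y, m.w * p.y). */
    newst.x = st.x + m.x * prevlookup.x + m.z * prevlookup.y;
    newst.y = st.y + m.y * prevlookup.x + m.w * prevlookup.y;
    return newst;
}
```

Read this way, m acts as a 2x2 matrix applied to the first two channels of the previous lookup, and the product is added to the original coordinates.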
21. 3. Use the Cg Standard Library

    The functions in the Cg Standard Library have been carefully written for both efficiency and correctness. By using Standard Library functions when appropriate, you can automatically take advantage of the work that went into making sure they compile to fast code on GPUs, while you concentrate on the hard problems you're solving in your own shaders.

    Particularly fast Standard Library functions include dot(), which computes the dot product of two vectors; abs(), which computes the absolute value of a variable; saturate(), which clamps a value to be between zero and one; and min() and max(), which return the minimum and maximum of a pair of values. You won't be able to write more efficient implementations of these functions than the Standard Library provides, because many of them compile directly to GPU assembly language instructions.

    Writing a dot product function of your own,

        float mydot(float3 a, float3 b) {
            return a.x * b.x + a.y * b.y + a.z * b.z;
        }

    compiles to a handful of instructions, while the built-in dot() function compiles to a single specialized dot product instruction. There's no other way to get to this instruction other than by using the Standard Library.

    Two functions deserve particular attention. The abs() function usually has no cost in either vertex or fragment programs, because the GPU can evaluate the function while executing other instructions. Similarly, the saturat
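For readers who want to check shader arithmetic on the CPU, the behavior the excerpt attributes to dot() and saturate() can be sketched in C. The `vec3` type and the `dot3` and `saturate_ref` names are assumptions made for this example; they say nothing about how a GPU implements the built-ins.

```c
typedef struct { float x, y, z; } vec3;   /* stand-in for a Cg float3 */

/* Reference dot product: the same expansion as the hand-written mydot. */
float dot3(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Reference saturate: clamps a value to be between zero and one. */
float saturate_ref(float v)
{
    if (v < 0.0f) return 0.0f;
    if (v > 1.0f) return 1.0f;
    return v;
}
```

These are useful only as test oracles; in a shader the advice in the excerpt still applies, and the built-in functions are the ones to call.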
22. [...] Returns x otherwise.

    Function                  Description
    sign(x)                   1 if x > 0; -1 if x < 0; 0 otherwise.
    sin(x)                    Sine of x.
    sincos(float x,           s is set to the sine of x, and c is set to the cosine of x.
      out s, out c)           If sin(x) and cos(x) are both needed, this function is more
                              efficient than calculating each individually.
    sinh(x)                   Hyperbolic sine of x.
    smoothstep(min, max, x)   For values of x between min and max, returns a smoothly varying
                              value that ranges from 0 at x = min to 1 at x = max. x is clamped
                              to the range [min, max] and then the interpolation formula is
                              evaluated:
                                -2*((x - min)/(max - min))^3 + 3*((x - min)/(max - min))^2
    step(a, x)                0 if x < a; 1 if x >= a.
    sqrt(x)                   Square root of x; x must be greater than zero.
    tan(x)                    Tangent of x.
    tanh(x)                   Hyperbolic tangent of x.
    transpose(M)              Matrix transpose of matrix M. If M is an AxB matrix, the transpose
                              of M is a BxA matrix whose first column is the first row of M,
                              whose second column is the second row of M, whose third column is
                              the third row of M, and so on.

    Geometric Functions

    Table 2, "Geometric Functions," presents the geometric functions that are provided in the Cg Standard Library.

    Table 2. Geometric Functions

    Function                  Description
    distance(pt1, pt2)        Euclidean distance between points pt1 and pt2.
    faceforward(N, I, Ng)     N if dot
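The piecewise definitions of sign, step, and smoothstep in the table above translate directly into scalar C. The following is a minimal reference sketch; the `_ref` function names are made up for this example, and the smoothstep body assumes the interpolation formula is the standard cubic -2t^3 + 3t^2 with t = (x - min)/(max - min).

```c
#include <assert.h>

/* sign(x): 1 if x > 0, -1 if x < 0, 0 otherwise. */
float sign_ref(float x)
{
    return (x > 0.0f) ? 1.0f : (x < 0.0f) ? -1.0f : 0.0f;
}

/* step(a, x): 0 if x < a, 1 if x >= a. */
float step_ref(float a, float x)
{
    return (x < a) ? 0.0f : 1.0f;
}

/* smoothstep(min, max, x): clamp, then evaluate -2t^3 + 3t^2. */
float smoothstep_ref(float min, float max, float x)
{
    assert(max > min);                    /* avoid dividing by zero */
    float t = (x - min) / (max - min);
    if (t < 0.0f) t = 0.0f;               /* same effect as clamping x to [min, max] */
    if (t > 1.0f) t = 1.0f;
    return -2.0f * t * t * t + 3.0f * t * t;
}
```

At the midpoint t = 0.5 the cubic gives -2(0.125) + 3(0.25) = 0.5, and at the endpoints it gives 0 and 1, which matches the range described in the table.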
23. User's Manual: A Developer's Guide to Programmable Graphics

    Release 1.4, September 2005

    Cg Language Toolkit

    ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.

    Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent or patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all information previously supplied. NVIDIA Corporation products are not authorized for use as critical components in life support devices or systems without express written approval of NVIDIA Corporation.

    Trademarks

    NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the United States and other countries. Microsoft, Windows, the Windows logo, and DirectX are registered trademarks of Microsoft Corporation
24. [...]based Cg profiles, there is no such implied mapping.

    Binding semantics may be specified directly on program parameters rather than on struct elements. Thus, the following vertex program definition is legal:

        outdata foo(float3 myPosition : POSITION,
                    float3 myNormal : NORMAL,
                    float3 myTangent : TANGENT,
                    float refractive_index : TEXCOORD3)
        {
            [...]
        }

    Within the program, the parameters are referred to by their variable names: myPosition, myNormal, myTangent, and refractive_index.

    Varying Outputs to and from Vertex Programs

    The outputs of a vertex program pass through the rasterizer and are made available to a fragment program as varying inputs. For a vertex program and fragment program to interoperate, they must agree on the data being passed between them. As it does with the data flow between the application and vertex program, Cg uses binding semantics to specify the data flow between the vertex program and fragment program.

    This example shows the use of binding semantics for vertex program output:

        // Vertex program
        struct myvf {
            float4 pout : POSITION;   // [...]
            float4 diffusecolor : COLOR0;
            float4 uv0 : TEXCOORD0;
            float4 uv1 : TEXCOORD1;
        };
        myvf foo([...]) {
            myvf outstuff;
            [...]
            return outstuff;
        }

    And this example shows how to use this same data as the input to a fragment program
25.
        float4 main(MyInterface foo) : COLOR
        {
            [...]
        }

    Listing 3. Cg Program 3

    Notice that both Cg Program 1 and Cg Program 2 define the val method of the MyInterface and MyStruct types using the float type, whereas Cg Program 3 does so using the half type. As a result, the MyInterface and MyStruct types defined in Cg Program 3 are not equivalent to types in the other two programs, even though the types have the same names.

    The following C program creates all three of the above Cg programs and connects shared parameter instances to their input parameters.

        static CGprogram CreateProgram(const char *program_str)
        {
            return cgCreateProgram(Context, CG_SOURCE, program_str,
                                   CG_PROFILE_ARBFP1, "main", [...]);
        }

        [...]
            CGcontext Context;
            CGprogram Program1, Program2, Program3;
            CGparameter ms1, ms3;

            // Disable automatic compilation, since the programs
            // cannot be compiled until concrete structs are connected
            // to each program's interface parameters.
            Context = cgCreateContext();
            cgSetAutoCompile(Context, CG_COMPILE_MANUAL);

            // Create the programs.
            Program1 = CreateProgram(Program1String);
            Program2 = CreateProgram(Program2String);
            Program3 = CreateProgram(Program3String);

            // Create two shared parameters: one of the MyStruct
            // type from Program1 and one of the MyStruct type
            // from Program3.
26.     ms1 = cgCreateParameter(Context, cgGetNamedUserType(Program1, "MyStruct"));
        ms3 = cgCreateParameter(Context, cgGetNamedUserType(Program3, "MyStruct"));

        // Connect the same shared parameter to Program1 and Program2.
        cgConnectParameter(ms1, cgGetNamedParameter(Program1, "foo"));
        cgConnectParameter(ms1, cgGetNamedParameter(Program2, "foo"));

        // The following would generate an error, because the type of
        // the ms1 parameter is not equivalent to type "MyStruct"
        // from Program3:
        // cgConnectParameter(ms1, cgGetNamedParameter(Program3, "foo"));
        cgConnectParameter(ms3, cgGetNamedParameter(Program3, "foo"));

        // Now we can compile all three programs.
        cgCompileProgram(Program1);
        cgCompileProgram(Program2);
        cgCompileProgram(Program3);

        /* ... */
    }

Parameter Properties

Parameter properties encompass validity, references, size, and other attributes.

Parameter Type

The Cg language defines a number of built-in parameter types such as float4, int3x3, and so on. In addition, user-defined types may be specified in a program when declaring structure and interface types. For example, if the following Cg code is included in the source to a CGprogram created via cgCreateProgram(), the types MyInterface and MyStruct will be added to the resulting CGprogram:

    interface MyInterface {
        float SomeMethod(float x);
    };

    struct MyStruct
27. newResult xyz tlostatl 0 00 normal normalize normal calculate diffuse lighting off the normal Ue that was just calculated floats liceos Elec 0 9 19 5 float3 lightVec normalize lightPos position float diffuselnten dot lightVec normal e wp the final color The first term is a semi random term based Ve on the total height of this straw The second term is the diffuse lighting component UT Coloro nongmalize ars chris lnea NADO Sion return OUT 204 808 00504 0000 006 NVIDIA Basic Profile Sample Shaders Refraction Description This effect performs custom texture coordinate generation to compute a refracted vector per vertex that is then used to look up in a cube map Fresnel is also calculated to blend between reflection and refraction Fig 18 Fig 18 Example of Refraction 808 00504 0000 006 205 NVIDIA Cg Language Toolkit Vertex Shader Source Code for Refraction SUSE 3UmpexUR e float4 Position PO SIERO NIS float4 Normal NORMAL y SEXucl DULPUES float4 hPosition B IONS IL ILO P float4 fresnelTerm COLORO float4 refractVec B LE COORD OT float4 reflectVec EAS O OE Daa y fresnel approximation fixed rast ares mle n o at NNI float3 fresnelValues fixed power fresnelValues x fixed scale fresnelValues y fixed bias fresnelValues z ieirwica dias jocww il 0 cdta 1 Texo
28. variables.

Binding Semantics for Uniform Data

The valid binding semantics for uniform parameters in the vp20 profile are summarized in Table 29.

Table 29: vp20 Uniform Input Binding Semantics

    Binding Semantics Name | Corresponding Data
    register(c0) to register(c95), C0 to C95 | Constant register [0..95]. The aliases c0 to c95 (lowercase) are also accepted. If used with a variable that requires more than one constant register (for example, a matrix), the semantic specifies the first register that is used.

Binding Semantics for Varying Input/Output Data

The valid binding semantics for varying input parameters in the vp20 profile are summarized in Table 30. One can also use TANGENT and BINORMAL instead of TEXCOORD6 and TEXCOORD7. A second set of binding semantics, ATTR0 to ATTR15, can also be used. The two sets act as aliases to each other.

Table 30: vp20 Varying Input Binding Semantics

    Binding Semantics Name | Corresponding Data
    POSITION, ATTR0 | Input Vertex, Generic Attribute 0
    BLENDWEIGHT, ATTR1 | Input vertex weight, Generic Attribute 1
    NORMAL, ATTR2 | Input normal, Generic Attribute 2
    COLOR0, DIFFUSE, ATTR3 | Input primary color, Generic Attribute 3
    COLOR1, SPECULAR, ATTR4 | Input secondary color, Generic Attribute 4
    TESSFACTOR, FOGCOORD, ATTR5 | Input fog coordinate, Generic Attribute 5
    PSIZE, ATTR6
29. 0.5 * lightVectorInTangentSpace + 0.5;

        // compute view vector
        float3 viewVector = normalize(EyePosition.xyz - IN.Position.xyz);

        // compute half angle vector
        float3 halfAngleVector = normalize(LightVector.xyz + viewVector);

        // transform half-angle vector from object space to tangent space
        OUT.HalfAngleVector.xyz = mul(objToTangentSpace, halfAngleVector);

        // transform position to projection space
        OUT.Position = mul(WorldViewProj, IN.Position);

        return OUT;
    }

Pixel Shader Source Code for Bump Dot3x2

    struct v2f {
        float4 Position : POSITION;             // in projection space
        float4 Normal : COLOR0;                 // in tangent space
        float4 LightVectorUnsigned : COLOR1;    // in tangent space
        float3 TexCoord0 : TEXCOORD0;
        float3 TexCoord1 : TEXCOORD1;
        float4 LightVector : TEXCOORD2;         // in tangent space
        float4 HalfAngleVector : TEXCOORD3;     // in tangent space
    };

    float4 main(v2f IN,
                uniform sampler2D DiffuseMap,
                uniform sampler2D NormalMap,
                uniform sampler2D IlluminationMap,
                uniform float Ambient) : COLOR
    {
        // fetch base color
        float4 color = tex2D(DiffuseMap, IN.TexCoord0.xy);

        // fetch bump normal and expand it to [-1,1]
        float4 bumpNormal = 2 * (tex2D(NormalMap, IN.TexCoord1.xy) - 0.5);

        // compute the dot product between
        // the bump normal and the light vector,
        // compute the dot product between
        // the bump normal and the half a
30. 5 set the two color coefficients the magic constants are arbitrary these two color coefficients are used 808 00504 0000 006 159 NVIDIA Cg Language Toolkit to calculate the contribution from each of the two environment cubemaps one bright one dark OUT Color0 fres 1 4 min reflected y 0 xxxx Elige 4 2 oS oS E OUT Colorl fres 1 26 xxxx return OUT Pixel Shader Source Code for Improved Water float4 main in float3 colorO0 INCOLORO iia flogs colori amp COLOR in float3 reflectVec EIE A9 9 1 in tloadks rer llechViccDark S EEREXGOORDST uniform samplerCUBE environmentMaps 2 Y 2 COLOR float3 reflectColor texCUBE environmentMaps 0 reflectVec rgb float3 reflectColorDark texCUBE environmentMaps 1 reflectVecDark rgb floats color retlectColor color0 refliectColorDark colorl return floatti color 1 0 160 808 00504 0000 006 NVIDIA Advanced Profile Sample Shaders Melting Paint Description This shader uses an environment map with procedurally modified texture lookups to create a melting effect on the surface texture the NVIDIA logo in this example The reflection vector is shifted using a noise function giving the appearance of a bumpy surface The surface texture s texture coordinates are shifted in a time dependent manner also based on a noise texture Fig 7 Example of Melting Paint Verte
31. structure that is a uniform parameter to the program. This requirement also applies when the array is indirectly a uniform program parameter (that is, it and/or the structure containing it has been passed via a chain of in function parameters).

There are two operations that must be supported:

- Rvalue subscripting by a run-time computed value or a compile-time value.
- Passing the entire array as a parameter to a function, where the corresponding formal function parameter is declared as in.

The following operations are explicitly not required to be supported:

- Lvalue subscripting
- Copying
- Other operators, including multiply, add, compare, and so on

Note that when the array is rvalue subscripted, the result is an expression, and this expression is no longer considered to be a uniform program parameter. Therefore, if this expression is an array, its subsequent use must conform to the standard rules for array usage.

These rules are not limited to arrays of numeric types, and thus imply support for arrays of struct, arrays of matrices, and arrays of vectors when the array is a uniform program parameter. Maximum array sizes may be limited by the number of available registers or other resource limits, and compilers are permitted to issue error messages in these cases. However, profiles must support sizes of at least float arr[8], float4 arr[8], and float4x4 arr[4][4].

Fragment profile
32. Ng · I < 0; otherwise -N.

    Function | Description
    length(v) | Euclidean length of a vector.
    normalize(v) | Returns a vector of length 1 that points in the same direction as vector v.
    reflect(i, n) | Computes reflection vector from entering ray direction i and surface normal n. Only valid for 3-component vectors.
    refract(i, n, eta) | Given entering ray direction i, surface normal n, and relative index of refraction eta, computes refraction vector. If the angle between i and n is too large for a given eta, returns (0, 0, 0). Only valid for 3-component vectors.

Texture Map Functions

Table 3, "Texture Map Functions," presents the texture functions that are provided in the Cg Standard Library. These texture functions are fully supported by the ps_2, arbfp1, fp30, and fp40 profiles. The two-dimensional variants of these functions are supported by the vp40 profile. All of the functions in the table return a float4 value.

Because of the limited pixel programmability of older hardware, the ps_1 and fp20 profiles use a different set of texture mapping functions. See "Language Profiles" on page 255 for more information.

Table 3: Texture Map Functions

    Function | Description
    tex1D(sampler1D tex, float s) | 1D nonprojective
    tex1D(sampler1D tex, float s, float dsdx, float dsdy) | 1D nonprojective with derivatives
33.     // Ensure the resulting declaration is compatible with the
        // shader. This is really just a sanity check.
        assert(cgD3D9ValidateVertexDeclaration(vertexProgram, declaration));

        device->CreateVertexDeclaration(declaration, &vertexDeclaration);

        // Load the program with the expanded interface.
        // Parameter shadowing is enabled (second parameter = TRUE).
        cgD3D9LoadProgram(vertexProgram, TRUE, 0);

        // Create the pixel shader.
        fragmentProgram = cgCreateProgramFromFile(context, CG_SOURCE,
            "FragmentProgram.cg", pixelProfile, "FragmentProgram", pixelOptions);

        // Load the program with the expanded interface.
        // Parameter shadowing is enabled (second parameter = TRUE).
        // Ignore vertex shader specific flags, such as declaration usage.
        cgD3D9LoadProgram(fragmentProgram, TRUE, 0);

        // Grab some parameters.
        modelViewMatrix = cgGetNamedParameter(vertexProgram, "ModelViewMatrix");
        baseTexture = cgGetNamedParameter(fragmentProgram, "BaseTexture");
        someColor = cgGetNamedParameter(fragmentProgram, "SomeColor");

        // Sanity check that parameters have the expected size.
        assert(cgD3D9TypeToSize(cgGetParameterType(modelViewMatrix)) == 16);
        assert(cgD3D9TypeToSize(cgGetParameterType(someColor)) == 4);

        // Set parameters that don't change.
        // They can be set only once, since parame
34. Program Iteration

The programs within a context are sequentially ordered and can be iterated over by using cgGetFirstProgram() and cgGetNextProgram():

    CGprogram cgGetFirstProgram(CGcontext context);
    CGprogram cgGetNextProgram(CGprogram program);

The first program of the sequence is retrieved by cgGetFirstProgram(). If the context is invalid or does not contain any program, the function returns zero. Given a program, cgGetNextProgram() returns the program immediately next in the sequence, or zero if there is none.

Here is how those two functions would typically be used, given a valid context named context:

    CGprogram program = cgGetFirstProgram(context);
    while (program != 0) {
        /* Here is the code that handles the program */
        program = cgGetNextProgram(program);
    }

Nothing is guaranteed regarding the order of the programs in the sequence or how cgGetFirstProgram() and cgGetNextProgram() behave when programs are created or destroyed during iteration.

Program Query

Program queries encompass validity, compilation results, and attributes.

Program Validity

Use cgIsProgram() to check whether a program handle references a valid program:

    CGbool cgIsProgram(CGprogram program);

Compilation Result

You can query the result of the compilation resulting from the last call to cgCreateProgram() for a given context by using cgGetLastListing():

    const char *cgGetLastListing(CGcontext context);
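
Because nothing is guaranteed when programs are created or destroyed during iteration, one cautious pattern is to copy the handles into an array first and only then act on them. The sketch below shows that pattern against a small mock, so that it builds without the Cg toolkit. MockProgram, MockContext, mockGetFirstProgram, mockGetNextProgram, and snapshot_programs are all hypothetical stand-ins. In a real application the loop would call cgGetFirstProgram() and cgGetNextProgram() on CGcontext and CGprogram handles instead.

```c
#include <stddef.h>

/* Hypothetical stand-ins for CGprogram and CGcontext. They exist only
   so that this sketch compiles without the Cg runtime headers. */
typedef struct MockProgram {
    int id;
    struct MockProgram *next;
} MockProgram;

typedef struct MockContext {
    MockProgram *first;
} MockContext;

/* Mock of cgGetFirstProgram(): returns NULL for an invalid or empty
   context, as the text describes for the real call returning zero. */
static MockProgram *mockGetFirstProgram(const MockContext *ctx) {
    return ctx != NULL ? ctx->first : NULL;
}

/* Mock of cgGetNextProgram(): returns NULL when there is no next. */
static MockProgram *mockGetNextProgram(const MockProgram *program) {
    return program != NULL ? program->next : NULL;
}

/* Copy at most cap program handles into out and return how many were
   stored. The caller can then create or destroy programs while walking
   the copied array rather than the live sequence. */
static size_t snapshot_programs(const MockContext *ctx,
                                MockProgram **out, size_t cap) {
    size_t n = 0;
    MockProgram *program = mockGetFirstProgram(ctx);
    while (program != NULL && n < cap) {
        out[n++] = program;
        program = mockGetNextProgram(program);
    }
    return n;
}
```

A caller would size the array for the number of programs it expects, call snapshot_programs once, and then loop over the returned count. If the array is too small, the snapshot is silently truncated at cap entries, so a real implementation might count the programs first.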
35. R T Kee sinooclasicee 0 0 0 5 me g ite d40 Xx Compute the refracted light ray and the refraction coefficient Ke2 ame SL mn ical 1X2 IZ Mp 122 Ssmooehsiten 00 Oh oy NN ua 1 0 IXe2p For oil contribution modulate the oiliness mask by a specular term Oil 0 5 olliness poy acloitla m p For sheen contribution modulate Fresnel term by sheen color times specular Modulate by additional diffuse term to soften it a bit sheen 2 5 Kr sheenColor ndot1 0 2 pow ndoth m Compute single scattering approximation to subsurface scattering Here we compute 3 scattering terms simultaneously and the results end up in the x y z components of a float3 Using 3 terms approximates distribution of multiply scattered light For details see Matt Pharr s SIGGRAPH 2001 RenderMan course notes Layered Media for Surface Shaders float3 temp singleScatter T2 T n g albedo thickness uloste 2 5 sikamCollor mdgtl Eug EZ gt temp x temp y temp z Add contributions from oil sheen and subsurface scattering and modulate by light color and result of a shadow map lookup return lightColor tex2Dproj tex3 In shadowcoords r oil sheen subsurf 808 00504 0000 006 179 NVIDIA Cg Language Toolkit Thin Film Effect Description This demo shows a thin film interference effect
36. This book is intended as an introduction to Cg, as well as a practical handbook to get programmers started developing in Cg. It includes a language description and a reference for the standard and run-time libraries, and is full of helpful examples. The goal for this book is to be both an introduction and a tool for the new user, as well as a reference and resource for developers as they become more proficient.

Welcome to the world of Cg!

David Kirk
Chief Scientist
NVIDIA Corporation

Preface

The goal of this book is to introduce to you Cg, a new high-level language for graphics programming. To that end, we have organized this document into the following sections:

- "Introduction to the Cg Language" on page 1: A quick introduction to the current release of Cg, with everything you need to know to start working with it.
- "Cg Standard Library Functions" on page 33: A list of the Standard Library functions, which can help to reduce your program development time.
- "Introduction to the Cg Runtime Library" on page 43: An introduction to the Cg runtime APIs, which allow you to easily compile Cg programs and pass data to them from within applications.
- "Introduction to CgFX" on page 117: The CgFX API, which supports this Cg extended file format, is described.
- "A Brief Tutorial" on page 145: A description of a simple Cg program and Microsoft Vi
37. This moves the shadow volume points inside the model slightly to minimize popping of shadowed areas as each facet comes in and out of shadow The Fatness value should be negative float4 inset pos IN Normal Fatness xyz IN POSANE I OI SAV VBP inset_pos w IN Position w scale the vector from light to vertex 212 808 00504 0000 006 NVIDIA Basic Profile Sample Shaders float4 extrusion_vec light_to_vert ShadowExtrudeDist if ndotl lt 0 then the vertex faces Vay away from the light so move it It will be moved along the direction from 1 d light to vertex to extrude the shadow volume iPlWoxeE chew low cor Ll lt 0 5 Move the back facing shadow volume points float4 new_position extrusion_vec away inset_pos Transform position to hclip space OUT Hposition mul WorldViewProj new position Set the color to blue for when the shadow volume il is rendered in color for illustrative purposes float color TPloat4 r0 0 BRactors x 0 OUT Color0 color OUT TexCoord0 xy IN TexCoord0 return OUT 808 00504 0000 006 213 NVIDIA Cg Language Toolkit Sine Wave Demo Description This effect modifies the vertex positions using a sine function based on the current time It demonstrates use of the built in sin function It also computes a normal based on the perturbed mesh and uses this to compute a reflection vector to
38. and less than the value of GL_MAX_CLIP_PLANES.

Table 6: CgFX OpenGL State Manager States (continued)

    State Name | Type | Valid Enumerants | Requires
    ColorMask | bool4 | | 1.0
    ColorMatrix | float4x4 | | ARB_imaging
    ColorMaterial | int2 | Front, Back, FrontAndBack; Emission, Ambient, Diffuse, Specular, AmbientAndDiffuse | 1.0
    CullFace | int | Front, Back, FrontAndBack | 1.0
    DepthBounds | float2 | | EXT_depth_bounds_test
    DepthFunc | int | Never, Less, LEqual, Equal, Greater, NotEqual, GEqual, Always | 1.0
    DepthMask | bool | | 1.0
    DepthRange | float2 | | 1.0
    FogMode | int | Linear, Exp, Exp2 | 1.0
    FogDensity | float | | 1.0
    FogStart | float | | 1.0
    FogEnd | float | | 1.0
    FogColor | float4 | | 1.0
    FragmentEnvParameter[ndx] | float4 | | ARB_fragment_program. ndx must be greater than or equal to zero and less than the value of GL_MAX_PROGRAM_ENV_PARAMETERS_ARB for the GL_FRAGMENT_PROGRAM_ARB target to glGetProgramivARB.
    FragmentLocalParameter[ndx] | float4 | | ARB_fragment_program. ndx must be greater than or equal to zero and less than the value of GL_MAX_PROGRAM_LOCAL_PARAMETERS_ARB for the GL_FRAGMENT_PROGRAM_ARB target to glGetProgramivARB.
    FogCoordSrc | int | FragmentDepth, FogCoord | OpenGL 1.4 or EXT_fog_coord
    FogDistanceMode | int | EyeRadial, EyePlane, EyePlaneAbsolute | NV_fog_distance
    FragmentProgram | compile statement | | ARB_fragment_program or NV_fragment_program
    FrontFace | int | CW, CCW | 1.0
    L
39. coreCg 50 control constructs used 19 core Cg context 50 Core Cg error reporting 71 Core Cg parameter 54 Core Cg program 50 core Cg runtime 49 D data types bool 11 fixed 11 float 11 half 11 int 11 sampler 11 supported 11 data types for performance 325 debugging function 41 declaration Cg definition 224 definition as used in Cg 224 derivative functions 41 Direct3D Cg runtime 85 cgD3D9EnableDebugTracing 114 cgD3D9GetLastError 115 cgD3D9TranslateHRESULT 116 CGerror 114 debugging mode 112 error callbacks 116 error testing 115 error types 114 expanded interface 98 cgD3D8LoadProgram 103 cgD3D8SetSamplerState 102 cgD3D9BindProgram 105 cgD3D9EnableParameterShadowing 103 cgD3D9GetDevice 98 cgD3D9GetlatestPixelProfile cgD3D9GetLatestVertexProfile cgD3D9GetOptimalOptions 105 808 00504 0000 006 cgD3D9IsParameterShadowingEnable d 103 cgD3D9IsProgramLoaded 104 cgD3D9LoadProgram 103 cgD3D9SetDevice 98 cgD3D9SetSamplerState 102 cgD3D9SetTexture 102 cgD3D9SetTextureWrapMode 102 cgD3D9SetUniform 100 cgD3D9SetUniformArray 101 cgD3D9SetUniformMatrix 101 cgD3D9SetUniformMatrixArray 10 1 cgD3D9UnloadProgam 104 Direct3D 8 application 109 Direct3D 9 application 106 Direct3D device 98 fragment program 106 lost devices 98 parameters 100 array 101 sampler 102 uniform 100 profile support 105 program executiion 103 vertex program 106 HRESULT 114 minimal interface 85 cgD3D8
40. discard texl a col0 sum scale_by_one_half

How different NV_texture_shader and NV_register_combiners instruction set modifiers are expressed in Cg programs is summarized in Table 32. For more details on the context in which each modifier is allowed and the ways in which modifiers may be combined, refer to the NV_texture_shader and NV_register_combiners documentation.

Table 32: NV_texture_shader and NV_register_combiners Instruction Set Modifiers

    Instruction/Register Modifier | Cg Expression
    scale_by_two | 2 * x
    scale_by_four | 4 * x
    scale_by_one_half | x / 2
    bias_by_negative_one_half | x - 0.5
    bias_by_negative_one_half_scale_by_two | 2 * (x - 0.5)
    unsigned(reg) | saturate(x), i.e. min(1, max(0, x))
    unsigned_invert(reg) | 1 - saturate(x)
    half_bias(reg) | x - 0.5
    -reg | -x
    expand(reg) | 2 * (x - 0.5)

Language Constructs and Support

Data Types

In the fp20 profile, operations occur on signed clamped floating-point values in the range -1 to 1. This profile allows all data types to be used, but all operations are carried out in the above range. Refer to the NV_texture_shader and NV_register_combiners documentation for more details.

Statements and Operators

The fp20 profile supports all of the Cg language constructs, with the following exceptions:

- Arbitrary swizzles are not supported (though arbitrary write masks are). Only t
41. modulos and casts from floating point types.

- fixed or sampler data types are not supported, but the profile does provide the minimal partial support that is required for these data types by the core language specification (that is, it is legal to declare variables using these types, as long as no operations are performed on the variables).

Statements and Operators

This profile is a superset of the vp20 profile. Any program that compiles for the vp20 profile should also compile for the vp30 profile, although the converse is not true. The additional capabilities of the vp30 profile, beyond those of vp20, are:

- for, while, and do loops are supported without requiring loop unrolling.
- Full support for if/else, allowing non-constant conditional expressions.

Bindings

Binding Semantics for Uniform Data

The valid binding semantics for uniform parameters in the vp30 profile are summarized in Table 23.

Table 23: vp30 Uniform Input Binding Semantics

    Binding Semantics Name | Corresponding Data
    register(c0) to register(c255), C0 to C255 | Constant register [0..255]. The aliases c0 to c255 (lowercase) are also accepted. If used with a variable that requires more than one constant register (for example, a matrix), the semantic specifies the first register that is used.

Binding Seman
42. register(c95), C0 to C95 | Constant register [0..95]. The aliases c0 to c95 (lowercase) are also accepted. If used with a variable that requires more than one constant register (for example, a matrix), the semantic specifies the first register that is used.

Binding Semantics for Varying Input/Output Data

The valid binding semantics for varying input parameters in the vs_1_1 profile are summarized in Table 46. These map to the input registers in DirectX 8.1 vertex shaders.

Table 46: vs_1_1 Varying Input Binding Semantics

    Binding Semantics Name | Corresponding Data
    POSITION | Vertex shader input register: v0
    BLENDWEIGHT | Vertex shader input register: v1
    BLENDINDICES | Vertex shader input register: v2
    NORMAL | Vertex shader input register: v3
    PSIZE | Vertex shader input register: v4
    COLOR0, DIFFUSE | Vertex shader input register: v5
    COLOR1, SPECULAR | Vertex shader input register: v6
    TEXCOORD0 to TEXCOORD7 | Vertex shader input register: v7 to v14
    TANGENT (1) | Vertex shader input register: v14
    BINORMAL | Vertex shader input register: v15

    (1) TANGENT is an alias for TEXCOORD7.

The valid binding semantics for varying output parameters in the vs_1_x profile These map to output registers in DirectX 8.1 ve
43. setucate Clore NIE Dohe ssColor AmbiColor baseTex DiffLight ffPupil AmbiColor saturate dot xAxis Ln lfAng normalize Ln Vn abs dot Nf halfAng cl pow ndh GlossData PHONG smoothstep GlossData GLOSS1 GlossData GLOSS2 lerp GlossData DROP specl s2 ecularLight SpecColor specl tColor missColor e gt 0 0h radedEta BallData ETA 808 00504 0000 006 173 NVIDIA Cg Language Toolkit gradedEta 1 0h gradedEta half3 faceColor BgColor half3 refVector refract Vn Nf gradedEta if dot refVector refVector gt 0 now let s intersect with the iris plane half irisT intersect_plane IN OPosition refVector planeEquation half fadeT irisT BallData LENS DENSITY fadeT fadeT fadeT faceColor DiffPupil xxx iit aeisi gt 0 d half3 irisPoint IN OPosition irisT refVector Halts Sierss th issscaile imi spon MBULIES 0 On OO Sia O Sim y faceColor tex2D ColorMap irisST yz rgb faceColor lerp faceColor LensColor fadeT hitColor lerp missColor faceColor smoothstep 0 0h GRADE slice hitColor hitColor SpecularLight maSiewien walii inwicolloie LaO p 174 808 00504 0000 006 NVIDIA Advanced Profile Sample Shaders Skin Description This effect demonstrates some techniques for rendering skin ranging from simple Blinn Phong Bump Mapping to more compl
44. state.texgen[0].eye.q
    state.texgen[0].object.s
    state.texgen[0].object.t
    state.texgen[0].object.r
    state.texgen[0].object.q
    state.fog.color
    state.fog.params
    state.clip[0].plane

The state semantics of type float that can be accessed are listed in Table 15.

Table 15: float state Semantics

    state.point.size
    state.point.attenuation

Position Invariance

- The arbvp1 profile supports position invariance, as described in the core language specification.
- The modelview-projection matrix is not specified using a binding semantic of GL_MVP.

Data Types

This profile implements data types as follows:

- float data type is implemented as defined in the ARB_vertex_program specification.
- half data type is implemented as float.
- fixed or sampler data types are not supported, but the profile does provide the minimal partial support that is required for these data types by the core language specification (that is, it is legal to declare variables using these types, as long as no operations are performed on the variables).

Compatibility with the vp20 Vertex Program Profile

Programs that work with the vp20 profile are compatible with the arbvp1 profile as long as they use the Cg run-time to manage all uniform parameters, including OpenGL state. That is, arbvp1 and vp20 profiles can be used interchangeably without changing the Cg source code or the application
45. tempnorm xyz normalVec normalize normalVec compute th ye gt vertex vector float3 eyeVec EyeVector Xxyz compute the view depth for the thin film float viewdepth 1 0 dot normalVec eyeVec FilmDepth x OUT filmDepth viewdepth xx store normalized light vector float3 lightVec normalize float3 LightVector calculate half angle vector float3 halfAngleVec normalize lightVec eyeVec 808 00504 0000 006 181 NVIDIA Cg Language Toolkit calculate diffuse component float diffuse dot normalVec lightVec calculate specular component float specular dot normalVec halfAngleVec use the lit instruction to calculate lighting automatically clamp igata ikGileicaing lite clilirituse secular 32 output final lighting results OUT diffCol float4 lighting y OUT specCol float4 lighting z return OUT Pixel Shader Source Code for Thin Film Effect STEUCE Wie eloco clinical COLOR 0 float3 specCol EOL float2 filmDepth TEXCOORDO y void main v2f IN Gwt iloac color 3 COCOR uniform sampler2D fringeMap uniform sampler2D diffMap diffuse material color eloco chiro thoacs 0 3 0 3 0 5 p lookup fringe value based on view depth float3 fringeCol float3 tex2D fringeMap IN filmDepth modulate specular lighting by fringe color combine with regular lighting color rgb fringeCol IN specCol IN
46. tex3D noise map biVariate 6 rgb 18 normal normalize normal noiseSum calcLighting diffuse specular normal IN OPosition IN LightPos IN ViewerPos 32 float3 nvShift tex3D noise map uniVariate 3 rgb 2 tex3D noise map uniVariate rgb 4 tex3D noise map biVariate 3 rgb 16 yal yla are dumerpodetesax Sp 0 FANASIMIL IEE s sx ANASINILIETE 7 biVariate float3 IN OPosition x IN OPosition z INODORO 0p Float texloowel loiveiciace ss7 4 t Eloeuz lo 3259 nvShift yx float2 0 interpolate x 8 float3 nvDecal tex2D nv_map float2 1 texCoord x texCoord y rgb Imes OOlaice 2 U 27 335 float3 eye IN ViewerPos JEN OP Sive iL om p float3 lightMetal texCUBE cube map reflect normal eye rgb loss der Meral Cchiiriruse iloacs 5 25 0 a Specular se fil eat iv DN float3 finalColor lerp lightMetal darkMetal nvDecal x wejeulicin ilo erc4 ienmalCoillei 1 5 164 808 00504 0000 006 NVIDIA Advanced Profile Sample Shaders MultiPaint Description MultiPaint presents a single pass solution to a common production problem mixing multiple kinds of materials on a single polygonal surface MultiPaint provides a simple BRDF bidirectional reflectance distribution function that is still complex enough to represent many common metallic and dielectric surfaces and controls all key factors of the variable BRDF through texturing This permits you to cre
47. the appropriate code path at run time. An example of this situation would be a fragment shader that supported a generic light source model for shading. Depending on how its parameters were set, it might implement a point light, a spotlight, or a light source that projected a texture map to determine the light distribution. Rather than having a series of if/else tests to determine which light model to use, having a separate version of the shader for each light type is generally more efficient.

Appendix D
Cg Compiler Options

This appendix describes the command-line options for the Cg compiler, cgc.exe:

- -profile prof: Compile for the prof profile.
- -profileopts profopts: Specify a comma-separated list of profile-specific options. See the profile specification for valid options.
- -entry fname: Specify the main function name as fname.
- -o fname: Write the output to file fname.
- -Dmacro[=value]: Define a macro with optional value.
- -Ipathname: Specify path to an include directory.
- -l filename: Write compiler messages to filename rather than to standard output.
- -strict: Enforce strict type checking.
- -nofx: Do not treat CgFX keywords as reserved words.
- -quiet: Suppress printing the header to stdout.
- -nocode: Compile, but do not generate any code.
- -nostdlib: Do not include the stdlib.h hea
48. use the "Advanced Profile Sample Shaders" on page 153 and "Basic Profile Sample Shaders" on page 189 as a basis to build your own effects.

Release Notes

Release notes for Cg are now contained in a separate document that is part of the Cg distribution. Please report any bugs, issues, and feedback to NVIDIA by e-mailing cgsupport@nvidia.com. We will expeditiously address any reported problems.

Online Updates

Any changes, additions, or corrections are posted at the NVIDIA Cg Web site:

    http://developer.nvidia.com/Cg

Refer to this site often to keep up on the latest changes and additions to the Cg language. Information on how to report any bugs you may find in the release is also available on this site.

Introduction to the Cg Language

Historically, graphics hardware has been programmed at a very low level. Fixed-function pipelines were configured by setting states such as the texture-combining modes. More recently, programmers configured programmable pipelines by using programming interfaces at the assembly language level. In theory, these low-level programming interfaces provided great flexibility. In practice, they were painful to use and presented a serious barrier to the effective use of hardware.

Using a high-level programming language, rather than the low-level languages of the past, provides several advantages:

- A high-level language speeds up the tweak-and-run cycle when a
49. xut eese dee we ahah aed ane boe kei SR eae eae wees 304 Language Constructs and Support wie ca kac a ee ee o pdg ee 304 BINGINGS iu acies ura pao HO Ec Reh eee Rede Fe parar Kad qr d 306 OPUS ari C Xp PEUPLE Ris ed RO eal aes eine 307 DirectX Pixel Shader 1 x Profiles ps_1_ 0 ccc cee eee oraka 308 aU Dag PCT 308 Modifies cura cute s mapa qued EEA Ad xe Rp PT AREE EU BENE RE 309 Language Constructs and SUPPOM aa Ra 310 Standard Library FUNCUONS sucio rt dad e 311 BINGINGS cet bee pin AAA BRERA E A RP hale 312 Auxiliary Texture FUNCIONS 24 2246 963245 8 2 VAGRRARE OAS AER NS dde qx 315 Examples os aid a e set Ei 319 Appendix C Nine Steps to High Performance C9 lt ccooooccccc o 321 Appendix D Cg Compiler OptiONS i5 x iconos a dca A a el A ew a a 329 MAER EE 331 808 00504 0000 006 vii NVIDIA Cg Language Toolkit viii 808 00504 0000 006 NVIDIA Contents Figures and Tables List of Figures Figs 2 CgsModelofthe GPU o isa eek yx Fono m SOR A Gok a Box BOR A ox 3 2 Fig 2 The Parts of the Cg Runtime API 2 2 2 2 0 022 eee 45 Fig 3 The Cg Simple Workspace sois ee RRR Ee X ox UR RO a 145 Fig 4 Thesimple cg Shader 2 cns 146 Fig 5 Example of Improved Skinning lens 154 Fig 6 Example of Improved Water les 157 Fig 7 Example of Melting Paint ues Rm ike Ug a a de RR GR A 161 Fig 8 Example of MultiPaint ns 165 Fig 9 Example of R
50. 123 saturate for performance 324 scalar type category 232 semantics aliasing 243 restrictions 243 shader sample anisotropic lighting 190 bump dot 3x2 diffuse and specular 192 bump reflection mapping 196 fresnel 200 grass 202 improved skinning 154 improved water 157 matrix palette skinning 217 melting paint 161 multipaint 165 ray traced refraction 170 refraction 205 shadow mapping 208 shadow volume extrusion 211 sine wave demo 214 skin 175 shader simple cg example 146 shaders advanced profile samples 153 basic profile samples 189 shading computations for performance 326 shadow mapping 208 pixel shader code example 210 sample shader 208 vertex shader code example 209 shadow volume extrusion sample shader 211 vertex shader code example 212 shadow volumes 211 silent incompatibilities with C 221 simple cg basic transformations 149 passing arguments 149 Sine function 202 214 sine wave demo sample shader 214 vertex shader code example 215 sinh x 37 skin pixel shader code example 175 sample shader 175 skinning improved sample shader 154 vertex shader code example 155 smearing scalar to vector 237 Stanford shading language relation to Cg 221 State assignment 118 statements introduction 18 statements in Cg 244 structures introduction 13 swizzle for performance 323 swizzle operator 22 swizzle operator described 245 336 808 00504 0000 006 NVIDIA T technique 117 technique validation 120
51. 43 for more details. Consider the following effect:

    float3 DiffuseColor <
        string type = "color";
        float3 minValue = float3(0, 0, 0);
        float3 maxValue = float3(10, 10, 10);
    > = { 1, 1, 1 };

    technique FixedFunctionLighting {
        pass {
            LightingEnable = true;
            LightEnable[0] = true;
            LightPosition[0] = float4(10, 10, 10, 1);
            LightAmbient[0] = float4(1, 1, 1, 1);
            LightDiffuse[0] = (float4(2 * DiffuseColor, 1));
            LightSpecular[0] = float4(1, 1, 1, 1);
            MaterialShininess = 10.f;
            MaterialAmbient = float4(1, 1, 1, 1);
            MaterialDiffuse = [value illegible in source];
            MaterialSpecular = float4(5, 5, 5, 1);
        }
    }

The effect defines a single effect parameter, DiffuseColor, with three associated annotations: a string named type and two float3s named minValue and maxValue. These annotations exist purely for the use of the application using the effect file; the Cg runtime does not interpret the annotation names or values in any way. The effect parameter is initialized to the value (1, 1, 1).

The effect also defines a single technique, named FixedFunctionLighting, which in turn contains a single rendering pass. The rendering pass sets the appropriate OpenGL state to perform per-vertex lighting using the built-in fixed-function material model of OpenGL. The complete set of supported OpenGL states is listed in the section "OpenGL State" on page 129. Note that the LightDiffuse[0] state value
52. Bar DESEA eye iL Foo Mar il T ooo 1014 Bar i Fooalt B

Parameter Values

The core Cg runtime provides a number of entry points for setting and retrieving parameter values. In addition, the graphics API-specific Cg runtimes provide further entry points for managing parameter values. When managing numeric parameters, choosing which set of entry points to use is largely a matter of programmer preference. In some circumstances it may be slightly more efficient to use the core Cg runtime entry points. However, parameters that hold graphics API-specific quantities, such as sampler handles, must be set using the API-specific entry points, because the core Cg runtime, which is graphics API-agnostic, provides no such entry points.

The most often used parameter value routines set and get a parameter's current value. A parameter's current value is initialized to any default value assigned in the Cg source, or 0 otherwise. The current value of a numeric parameter can be queried using the family of entry points

    int cgGetParameterValue{i,f,d}{r,c}(CGparameter param, int nvals, type *v);

The given parameter must be a scalar, vector, matrix, or a (possibly multidimensional) array of scalars, vectors, or matrices. There are versions of each function to retrieve the values into an int, float, or double buffer; these are signified by the i, f, and d in the entry point
53. CG_SOURCE, "FragmentProgram.cg", [profile and entry-point arguments illegible in source], 0);

    CComPtr<ID3DXBuffer> byteCode;
    const char* progSrc = cgGetProgramString(fragmentProgram, CG_COMPILED_PROGRAM);
    D3DXAssembleShader(progSrc, strlen(progSrc), 0, 0, 0, &byteCode, 0);
    device->CreatePixelShader(byteCode->GetBufferPointer(), &pixelShader);

    // Grab some parameters.
    modelViewMatrix = cgGetNamedParameter(vertexProgram, "ModelViewMatrix");
    baseTexture = cgGetNamedParameter(fragmentProgram, "BaseTexture");
    someColor = cgGetNamedParameter(fragmentProgram, "SomeColor");

    // Sanity check that parameters have the expected size.
    assert(cgD3D9TypeToSize(cgGetParameterType(modelViewMatrix)) == 16);
    assert(cgD3D9TypeToSize(cgGetParameterType(someColor)) == 4);

    // Called to render the scene.
    void OnRender()
    {
        // Get the Direct3D resource locations for parameters.
        // This can be done earlier and saved.
        DWORD modelViewMatrixRegister = cgGetParameterResourceIndex(modelViewMatrix);
        DWORD baseTextureUnit = cgGetParameterResourceIndex(baseTexture);
        DWORD someColorRegister = cgGetParameterResourceIndex(someColor);

        // Set the Direct3D state.
        device->SetVertexShaderConstantF(modelViewMatrixRegister, [remaining arguments illegible in source]);
        device->SetPixelShaderConstantF(someColorRegister, [remaining arguments illegible in source]);
        device->SetVertexDeclara
54. CgFX file may contain one technique for an advanced GPU with powerful fragment programmability and another technique for older graphics hardware supporting fixed-function texture blending. CgFX techniques can also be used for functionality, level-of-detail, or performance fallbacks. For example:

    technique PixelShaderVersion { ... };
    technique FixedFunctionVersion { ... };
    technique LowDetailVersion { ... };

An application can query which techniques are present in an effect and can choose an appropriate one at run time, based on whatever criteria are appropriate.

Each technique contains one or more passes. Each pass represents a set of render states and shaders to apply for a single rendering pass within a technique. For instance, the first pass might lay down depth only, so that subsequent passes can apply an additive alpha-blending technique without requiring polygon sorting. Each pass may contain a vertex program, a fragment program, or both, and each pass may use fixed-function vertex processing, fixed-function pixel processing, or both. For example, a first pass might use fixed-function pixel processing to output the ambient color. The next pass could use an fp30 fragment program, and pass three might use an arbfp1 fragment program.

State Assignments

Each pass also contains render state assignments, such as alpha blending, depth writes, and texture filtering modes, to name a few. For example:

    pass firstPass { DepthTestEnable = tru
55. Comparison operators are allowed (>, <, >=, <=, ==, !=) and Boolean operators (||, &&, ?:) are allowed. However, the logic operators (&, |, ^, ~) are not.

Data Types

The profiles implement data types as follows:
- float data types are implemented as IEEE 32-bit single precision.
- half and double data types are treated as float.
- The int data type is supported using floating-point operations, which adds extra instructions for proper truncation for divides, modulos, and casts from floating-point types.
- fixed or sampler data types are not supported, but the profiles do provide the minimal partial support that is required for these data types by the core language specification; that is, it is legal to declare variables using these types as long as no operations are performed on the variables.

Using Arrays

Variable indexing of arrays is allowed as long as the array is a uniform constant. For compatibility reasons, arrays indexed with variable expressions need not be declared const, just uniform. However, writing to an array that is later indexed with a variable expression yields unpredictable results.

Array data is not packed, because vertex program indexing does not permit it. Each element of the array takes a single 4-float program parameter register. For example, float arr[10], float2 arr[10], float3 arr[10], and float4 arr[10] all consume 10 program parameter registers. It is more efficient to access an array
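The register accounting for unpacked arrays described above can be sketched in C. This is only an illustration of the arithmetic, not profile or compiler code; the helper names registers_for_array and wasted_floats are hypothetical and do not exist in the Cg toolkit.

```c
#include <assert.h>

/* Hypothetical helper: number of 4-float program parameter registers
   consumed by an array of scalars or vectors. Array data is not packed,
   so every element occupies one whole register whatever its width. */
int registers_for_array(int element_count, int components_per_element)
{
    assert(components_per_element >= 1 && components_per_element <= 4);
    return element_count;
}

/* Hypothetical helper: float slots left unused inside those registers. */
int wasted_floats(int element_count, int components_per_element)
{
    return registers_for_array(element_count, components_per_element)
           * (4 - components_per_element);
}
```

Under these assumptions, float arr[10] and float4 arr[10] both need 10 registers, but the scalar array leaves 30 float slots unused, which is one reason to prefer wider element types when the data allows it.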
56. IDENTITY for applying no transformation at all
- CG_GL_MATRIX_TRANSPOSE for transposing the matrix
- CG_GL_MATRIX_INVERSE for inverting the matrix
- CG_GL_MATRIX_INVERSE_TRANSPOSE for inverting and transposing the matrix

Setting Uniform Arrays of Scalar, Vector, and Matrix Parameters

To set the values of arrays of uniform scalar or vector parameters, use the cgGLSetParameterArray functions:

    void cgGLSetParameterArray1f(CGparameter parameter, long startIndex, long numberOfElements, const float* array);
    void cgGLSetParameterArray1d(CGparameter parameter, long startIndex, long numberOfElements, const double* array);
    void cgGLSetParameterArray2f(CGparameter parameter, long startIndex, long numberOfElements, const float* array);
    void cgGLSetParameterArray2d(CGparameter parameter, long startIndex, long numberOfElements, const double* array);
    void cgGLSetParameterArray3f(CGparameter parameter, long startIndex, long numberOfElements, const float* array);
    void cgGLSetParameterArray3d(CGparameter parameter, long startIndex, long numberOfElements, const double* array);
    void cgGLSetParameterArray4f(CGparameter parameter, long startIndex, long numberOfElements, const float* array);
    void cgGLSetParameterArray4d(CGparameter parameter, long startIndex, long numberOfElements, const double* array);

The digit in the name of those functions indica
57. IDirect3DDevice8::CreateVertexShader. A data stream is basically an array of data structures. Each of those structures is of a particular type, called the vertex format of the stream.

Here is an example of a vertex declaration for Direct3D 9:

    const D3DVERTEXELEMENT9 declaration[] = {
        { 0, 0 * sizeof(float), D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT,
          D3DDECLUSAGE_POSITION, 0 },  // Position
        { 0, 3 * sizeof(float), D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT,
          D3DDECLUSAGE_NORMAL, 0 },    // Normal
        { 0, 8 * sizeof(float), D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT,
          D3DDECLUSAGE_TEXCOORD, 0 },  // Base texture
        { 1, 0 * sizeof(float), D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT,
          D3DDECLUSAGE_TEXCOORD, 1 },  // Tangent
        D3DDECL_END()
    };

Here is an example of a vertex declaration for Direct3D 8:

    const DWORD declaration[] = {
        D3DVSD_STREAM(0),
        D3DVSD_REG(D3DVSDE_POSITION, D3DVSDT_FLOAT3),   // Position
        D3DVSD_REG(D3DVSDE_NORMAL, D3DVSDT_FLOAT3),     // Normal
        D3DVSD_SKIP(2),                                 // Skip the diffuse and specular color
        D3DVSD_REG(D3DVSDE_TEXCOORD0, D3DVSDT_FLOAT2),  // Base texture
        D3DVSD_STREAM(1),                               // Tangent basis stream
        D3DVSD_REG(D3DVSDE_TEXCOORD1, D3DVSDT_FLOAT3),  // Tangent
        D3DVSD_END()
    };

Both declarations tell the Direct3D runtim
58. Multiple color outputs are not supported in pixel shaders. Only COLOR0 is supported.

DirectX Vertex Shader 1.1 Profile (vs_1_1)

The DirectX Vertex Shader 1.1 profile is used to compile Cg source code to DirectX 8.1 Vertex Shaders and DirectX 9 VS 1.1 shaders.
- Profile name: vs_1_1
- How to invoke: Use the compiler option -profile vs_1_1

The vs_1_1 profile limits Cg to match the capabilities of DirectX Vertex Shaders. This section describes how using the vs_1_1 profile affects the Cg source code that the developer writes.

Memory Restrictions

DirectX 8 vertex shaders have a limited amount of memory for instructions and data.

Program Instruction Limits

DirectX 8 vertex shaders are limited to 128 instructions. If the compiler needs to produce more than 128 instructions to compile a program, it reports an error.

Vector Register Limits

Likewise, there are limited numbers of registers to hold program parameters and temporary results. Specifically, there are 96 read-only vector registers and 12 read/write vector registers. If the compiler needs more registers to compile a program than are available, it generates an error.

Language Constructs and Support

Data Types

This profile implements data types as follows:
- float data types are implemented as IEEE 32-bit single precision.
- half and double data types are treated as float.

8. To unders
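The vs_1_1 limits quoted above amount to a simple budget check. The C sketch below only restates those three numbers; the function fits_vs_1_1 and its parameters are hypothetical, and the real compiler's accounting of instructions and registers is not described in this section.

```c
#include <assert.h>

/* Limits quoted in the text for the vs_1_1 profile. */
enum {
    VS11_MAX_INSTRUCTIONS = 128,
    VS11_READ_ONLY_REGS   = 96,
    VS11_READ_WRITE_REGS  = 12
};

/* Hypothetical check: returns 1 if a program with the given needs fits
   the profile, 0 if the compiler would have to report an error. */
int fits_vs_1_1(int instructions, int read_only_regs, int read_write_regs)
{
    return instructions    <= VS11_MAX_INSTRUCTIONS
        && read_only_regs  <= VS11_READ_ONLY_REGS
        && read_write_regs <= VS11_READ_WRITE_REGS;
}
```

A program that needs 129 instructions, or a thirteenth temporary register, fails the check even when every other resource is well within budget.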
59. MyInterface {
        float Scale;
        float SomeMethod(float x) { return (Scale * x); }
    };

In order to obtain the unique enumerant associated with a parameter's type, the following entry point should be used:

    CGtype cgGetParameterNamedType(CGparameter param);

The CGtype associated with a named user-defined type in a program can be retrieved using

    CGtype cgGetNamedUserType(CGhandle handle, const char* name);

Here, handle can be either a CGprogram or a CGeffect.

Struct types can implement a given interface. In such a case, the indicated interface is known as a parent type of the struct type. In the example above, MyStruct has a single parent type, MyInterface. The parent types of a given named type may be obtained with the following entry points:

    int cgGetNumParentTypes(CGtype type);
    CGtype cgGetParentType(CGtype type, int index);

Note that the Cg language specification currently makes it impossible for a struct type to have more than a single parent type.

All of the user-defined types associated with a program may be obtained with the following entry points:

    int cgGetNumUserTypes(CGprogram program);
    CGtype cgGetUserType(CGprogram program, int index);

Note that the runtime treats interface program parameters as if they were structure parameters with no concrete data or function members. In older applications that use the Cg runtime, you may encounter the
60. MySampler

    cgD3D9SetTexture(mySampler, myDefaultPoolTexture);
    [...]

See the Direct3D documentation for a full explanation of lost devices and how to properly handle them.

Setting Expanded Interface Parameters

This section discusses setting the various types of parameters of the expanded interface, including uniform scalar, uniform vector, uniform matrix, uniform arrays of the three previous types, and sampler.

Setting Uniform Scalar, Vector, and Matrix Parameters

The function cgD3D9SetUniform sets floating-point parameters like float3 and float4x3:

    HRESULT cgD3D9SetUniform(CGparameter parameter, const void* value);

The amount of data required depends on the type of parameter, but is always specified as an array of one or more floating-point values. The type is void* so that a compatible user-defined structure can be passed in without type casting. Here is some code illustrating the use of cgD3D9SetUniform for setting a vectorParam of type float3, a matrixParam of type float2x3, and an arrayParam of type float2x2[3]:

    D3DXVECTOR3 vectorData(1, 2, 3);
    float matrixData[2][3] = [initializer illegible in source];
    float arrayData[3][2][2] = [initializer illegible in source];
    cgD3D9SetUniform(vectorParam, &vectorData);
    cgD3D9SetUniform(matrixParam, matrixData);
    cgD3D9SetUniform(arrayParam, arrayData);

As mentioned previously, cgD3D9TypeToSize can be used to determine how many values are requi
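As a rough check on the buffer sizes in the example above, the number of floating-point values can be computed from the parameter's shape. The C sketch below assumes tightly packed values, as the shapes of matrixData[2][3] and arrayData[3][2][2] suggest; uniform_float_count is a hypothetical helper, not part of the Cg Direct3D runtime, and it does not replace a call to cgD3D9TypeToSize.

```c
#include <assert.h>

/* Hypothetical helper: floats needed for a uniform with the given shape,
   assuming tightly packed data. A vector is treated as a 1 x N matrix
   and a non-array parameter as an array of one element. */
int uniform_float_count(int rows, int cols, int array_elements)
{
    assert(rows >= 1 && cols >= 1 && array_elements >= 1);
    return rows * cols * array_elements;
}
```

With these assumptions, a float3 needs 3 values, a float2x3 needs 6, and a float2x2[3] needs 12, which matches the dimensions of the three buffers declared in the example.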
61. [...] where strq are texture coordinates associated with sampler tex, prevlookup is the result of a previous texture operation, intermediate_coord1 are texture coordinates associated with the (n-2)th texture unit, intermediate_coord2 are texture coordinates associated with the (n-1)th texture unit, and eye is the eye-ray vector. This function can be used to generate the dot_product_reflect_cube_map_const_eye NV_texture_shader instruction combination.

tex_dp3x2_depth(float3 str, float4 intermediate_coord, float4 prevlookup)

Performs the following:

    float z = dot(intermediate_coord.xyz, prevlookup.xyz);
    float w = dot(str, prevlookup.xyz);
    return z / w;

where str are texture coordinates associated with the nth texture unit, intermediate_coord are texture coordinates associated with the (n-1)th texture unit, and prevlookup is the result of a previous texture operation. This function can be used in conjunction with the DEPTH varying out semantic to generate the dot_product_depth_replace NV_texture_shader instruction combination.

Examples

The following examples show how a developer can use Cg to achieve NV_texture_shader and NV_register_combiners functionality.

Example 1

    struct VertexOut {
        float4 color : COLOR0;
        float4 texCoord0 : TEXCOORD0;
        float4 texCoord1 : TEXCOORD1;
    };

    float4 main(VertexOut IN,
                uniform sampler2D diffuseMap,
                un
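The arithmetic that tex_dp3x2_depth is described as performing (two dot products followed by a divide) can be checked on the CPU with a small C sketch. The vec3 type and the functions v3, dot3, and dp3x2_depth are hypothetical stand-ins written for this illustration; they mirror the three lines quoted above and say nothing about how the hardware evaluates the instruction.

```c
#include <assert.h>

struct vec3 { float x, y, z; };

/* Convenience constructor (hypothetical). */
struct vec3 v3(float x, float y, float z)
{
    struct vec3 v = { x, y, z };
    return v;
}

float dot3(struct vec3 a, struct vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Mirrors the description: z = dot(intermediate_coord.xyz, prevlookup.xyz),
   w = dot(str, prevlookup.xyz), result = z / w. */
float dp3x2_depth(struct vec3 str, struct vec3 intermediate_coord,
                  struct vec3 prevlookup)
{
    float z = dot3(intermediate_coord, prevlookup);
    float w = dot3(str, prevlookup);
    return z / w;
}
```

The sketch makes one property of the formula visible: the result is undefined when dot(str, prevlookup.xyz) is zero, so texture coordinates and lookups that can produce a zero w need care.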
62. The final skinned positions are computed using these bones, along with the weights supplied per vertex. Tangent-space bases are skinned in a similar fashion and then used to transform the light vector into tangent space for per-pixel bump mapping (Fig. 22).

Fig. 22 Example of Matrix Palette Skinning

Vertex Shader Source Code for Matrix Palette Skinning

    struct appdata {
        float3 Position : POSITION;
        float2 Weights : BLENDWEIGHT0;
        float2 Indices : BLENDINDICES;
        float3 Normal : NORMAL;
        float2 TexCoord0 : TEXCOORD0;
        float3 S : TEXCOORD1;
        float3 T : TEXCOORD2;
        float3 SxT : TEXCOORD3;
    };

    struct vpconn {
        float4 Hposition : POSITION;
        float4 TexCoord0 : TEXCOORD0;
        float4 TexCoord1 : TEXCOORD1;
        float4 Color0 : COLOR0;
    };

    vpconn main(appdata IN,
                uniform float4x4 WorldViewProj,
                uniform float3x4 Bones[26],
                uniform float3 LightVec)
    {
        vpconn OUT;
        float4 tempPos;
        tempPos.xyz = IN.Position.xyz;
        tempPos.w = 1.0;

        // grab first bone matrix
        float i = IN.Indices.x;

        // transform position
        float3 pos0 = mul(Bones[i], tempPos);

        // create 3x3 version of bone matrix
        float3x3 m;
        m._m00_m01_m02 = Bones[i]._m00_m01_m02;
        m._m10_m11_m12 = Bones[i]._m10_m11_m12;
        m._m20_m21_m22 = Bones[i]._m20_m21_m22;

        // transform S, T, SxT
        float3 s0 = mul(m, IN.S);
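The two-bone blend that the description refers to can be sketched on the CPU in C. This is an assumption-laden illustration rather than the manual's shader: it assumes the final position is the weighted sum of the vertex transformed by each of its two bones, and the types vec3 and bone and the functions transform_point, skin_two_bones, and demo_skin are hypothetical.

```c
#include <assert.h>

struct vec3 { float x, y, z; };

/* 3x4 bone matrix, row-major: a 3x3 linear part plus a translation column. */
struct bone { float m[3][4]; };

/* Transform a point (implicit w = 1) by one bone matrix. */
struct vec3 transform_point(const struct bone *b, struct vec3 p)
{
    struct vec3 r;
    r.x = b->m[0][0] * p.x + b->m[0][1] * p.y + b->m[0][2] * p.z + b->m[0][3];
    r.y = b->m[1][0] * p.x + b->m[1][1] * p.y + b->m[1][2] * p.z + b->m[1][3];
    r.z = b->m[2][0] * p.x + b->m[2][1] * p.y + b->m[2][2] * p.z + b->m[2][3];
    return r;
}

/* Assumed blend: weighted sum of the point transformed by two palette entries. */
struct vec3 skin_two_bones(const struct bone *palette, int i0, int i1,
                           float w0, float w1, struct vec3 p)
{
    struct vec3 a = transform_point(&palette[i0], p);
    struct vec3 b = transform_point(&palette[i1], p);
    struct vec3 r;
    r.x = w0 * a.x + w1 * b.x;
    r.y = w0 * a.y + w1 * b.y;
    r.z = w0 * a.z + w1 * b.z;
    return r;
}

/* Demo: bone 0 is the identity, bone 1 translates by +2 along x.
   Equal weights should move the point (1, 1, 1) halfway, to x = 2. */
struct vec3 demo_skin(void)
{
    struct bone palette[2] = {
        { { {1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0} } },
        { { {1, 0, 0, 2}, {0, 1, 0, 0}, {0, 0, 1, 0} } }
    };
    struct vec3 p = { 1.0f, 1.0f, 1.0f };
    return skin_two_bones(palette, 0, 1, 0.5f, 0.5f, p);
}
```

Tangent-space vectors would go through the same blend using only the 3x3 part of each bone, which is why the shader builds a float3x3 copy of the bone matrix.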
63. about the performance of this and other NVIDIA GPUs. The fp40 profile therefore provides two options to control whether the compiler should emit branches or conditionally executed code for the if statements and loops within Cg shaders. The options are described in Table 22.

Table 22 fp40 Compiler Branching Options

Compiler Option: ifcvt all|none|count=N
Description: Changes the if-conversion mode based on the option selected.
- all: All if statements are converted to conditional writes.
- none: All if statements generate branching code.
- count=N: Sets if limit cost to N operations.

Compiler Option: unroll all|none|count=N
Description: Changes the loop unrolling mode based on the option selected.
- all: All loop statements that can be unrolled will be.
- none: All loop statements that can be implemented with branching will be.
- count=N: Sets loop limit cost to N operations.

Setting both ifcvt and unroll to all yields behavior similar to the fp30 profile, for which branch instructions are not available. Using ifcvt none places the burden on the Cg fragment program author to use if statements where they want true branches and to use conditional expressions otherwise.

FACE Semantic

The FACE semantic can be applied to a varying parameter to a program. The value of such a parameter has a value less than zero if the fragment being render
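The difference between branching code and an if statement converted to a conditional write can be illustrated in plain C. This is only a conceptual sketch of the two code shapes; the function names are hypothetical, and a C compiler is free to generate either form for either function.

```c
#include <assert.h>

/* Branching form: only the chosen side is evaluated. */
float shade_branch(float ndotl, float lit, float unlit)
{
    if (ndotl > 0.0f)
        return lit;
    else
        return unlit;
}

/* If-converted form: both values already exist, and the condition only
   selects which one is written to the result. */
float shade_ifcvt(float ndotl, float lit, float unlit)
{
    float result = unlit;
    result = (ndotl > 0.0f) ? lit : result;  /* conditional write */
    return result;
}
```

Both forms return the same value. The trade-off is that the converted form always pays for both sides, which tends to favor conversion for short bodies and true branches for expensive ones.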
64. corresponding formal parameter in any function in the set, remove all functions whose corresponding parameter does not match exactly.
   b. If there is a defined promotion for the type of the actual parameter to the unqualified type of the formal parameter of any function, remove all functions for which this is not true from the set.
   c. If there is a valid implicit cast that converts the type of the actual parameter to the unqualified type of the formal parameter of any function, remove all functions without this cast.
   d. Fail.
6. Choose a function based on profile:
   a. If there is at least one function with a profile that exactly matches the compilation profile, discard all functions that don't exactly match.
   b. Otherwise, if there is at least one function with a wildcard profile that matches the compilation profile, determine the most specific matching wildcard profile in the candidate set. Discard all functions except those with this most specific wildcard profile. How specific a given wildcard profile name is relative to a particular profile is determined by the profile specification.
7. If the number of functions remaining in the set is not one, then fail.

Global Variables

Global variables are declared and used as in C. Uniform non-static variables may have a semantic associated with them. Uniform non-static variables may have their value set throu
65. corresponding to the fixed-function light's diffuse color is set with an expression involving the DiffuseColor effect parameter. If the value of this parameter is changed by the application and the pass's state is later set, the parameter's new value is used in the expression that sets the light's diffuse color. Note also that this expression is parenthesized. In general, CgFX requires that most expressions like this one, involving effect parameters, be in parentheses. This is necessary so that CgFX can distinguish between effect parameters and built-in enumerant values representing constants.

The code below demonstrates how to create an effect given the name of an effect file. After creating a Cg context, cgGLRegisterStates sets up the state assignments that support the standard OpenGL state manager. Most applications will want to do this immediately after creating the CGcontext. Next, the effect is created and associated with the given context.

    CGcontext context = cgCreateContext();
    cgGLRegisterStates(context);
    CGeffect effect = cgCreateEffectFromFile(context, [effect file name illegible in source], NULL);
    if (!effect) {
        fprintf(stderr, "Unable to create effect!\n");
        const char* listing = cgGetLastListing(context);
        if (listing)
            fprintf(stderr, "%s\n", listing);
        exit(1);
    }

Technique Validation

Before using any of the techniques in an effect, it's important to validate the te
66. diffCol diffCol color a 1 0

Car Paint 9

Description

This car paint shader uses gonioreflectometric paint samples measured by Cornell University. The samples were converted into a 2D texture map, which is indexed using NdotL and NdotH as the s,t coordinate pair and which provides the diffuse component of our lighting equation. The specular term is calculated using the Blinn model and also includes a term that simulates the clear coat's metallic flecks. The fleck normal mipmap chain has randomly generated vectors that reside within a positive-Z cone in tangent space. The cone is reduced gradually at every level, such that in the distance the flecks are pointing mostly up. The flecks' specular power and their contribution are reduced by distance to give the paint a grainier appearance up close and a more uniform appearance from afar.

Next, the view vector is reflected off a wavy normal map, which represents the object's natural undulations, to index into the environment map. The shininess of the clear coat itself is calculated by scaling the Fresnel term by the luminance of the environment map. The luminance transfer function selects only the perceptually bright areas of the environment map in order not to reflect the darker areas of the scene. Finally, the shader lerps between the diffuse paint color and the reflection based on the Fresnel term and adds the
67. each new product generation comes a two-fold increase in performance. Graphics processor performance increases at approximately three times the rate of microprocessors (Moore's Law cubed). In addition to the performance increases, each year brings new hardware features supported by new application programming interfaces (APIs).

This dizzying pace is difficult for developers to adapt to, but adapt they must. Developers and users are demanding better rendering quality and more realistic imagery and experiences. Users don't care about the details; they simply want games and other interactive applications to look more like movies, special effects, and animation. Developers want more power, always more, along with more flexibility in controlling the massively capable GPUs of today and tomorrow. APIs do not, and cannot, keep up with the rapid pace of innovation in GPUs. As APIs and underlying technologies change, programmers, artists, and software publishers struggle to adapt to the change and the churn of the hardware/software platform.

What's needed is to raise the level of abstraction for interaction with GPUs. Continued updates and improvements to the hardware and APIs are too painful if developers are too close to the metal. This problem was exacerbated by the advent of programmability in GPUs. Older GPUs had a small number of controllable or configurable rendering paths, but th
68. either entry point. Only unsized arrays may be modified using these entry points.

Parameter Attributes

A parameter's general class can be queried using

    CGparameterclass cgGetParameterClass(CGparameter param);

The returned CGparameterclass value enumerates the high-level parameter classes:
- CG_PARAMETERCLASS_SCALAR: A scalar type, such as CG_INT or CG_FLOAT
- CG_PARAMETERCLASS_VECTOR: A vector type, such as CG_INT1 or CG_FLOAT4
- CG_PARAMETERCLASS_MATRIX: A matrix type, such as CG_INT1X2 or CG_FLOAT4X4
- CG_PARAMETERCLASS_STRUCT: A struct or interface
- CG_PARAMETERCLASS_SAMPLER: A sampler type, such as sampler1D or samplerCUBE
- CG_PARAMETERCLASS_OBJECT: A texture, string, or program

The program that the parameter corresponds to is found using cgGetParameterProgram:

    CGprogram cgGetParameterProgram(CGparameter parameter);

To determine whether the parameter is varying, uniform, or constant, cgGetParameterVariability is used:

    CGenum cgGetParameterVariability(CGparameter parameter);

The call returns CG_VARYING if the parameter is a varying parameter, CG_UNIFORM if the parameter is a uniform parameter, or CG_CONSTANT if the parameter is a constant parameter. A constant parameter is a parameter whose value never changes for the life of a compiled program, so that changing its value requires recompiling the program. For some profiles the
69. float and int types, except for the usual arithmetic conversion behavior and function-overloading rules (see "Function Overloading" on page 240).

The usual arithmetic conversions for binary operators are defined as follows:
1. If either operand is double, the other is converted to double.
2. Otherwise, if either operand is float, the other operand is converted to float.
3. Otherwise, if either operand is half, the other operand is converted to half.
4. Otherwise, if either operand is fixed, the other operand is converted to fixed.
5. Otherwise, if either operand is cfloat, the other operand is converted to cfloat.
6. Otherwise, if either operand is int, the other operand is converted to int.
7. Otherwise, both operands have type cint.

Note that conversions happen prior to performing the operation.

Assignment

Assignment of an expression to an object or compile-time typed value converts the expression to the type of the object or value. The resulting value is then assigned to the object or value. The value of the assignment expressions (=, *=, and so on) is defined as in C++: An assignment expression has the value of the left operand after the assignment, but is not an lvalue. The type of an assignment expression is the type of the left operand unless the left operand has a qualified type, in which case it is the unqualified version of the type of the l
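Because the seven conversion rules above form a strict priority order, they can be modeled as taking the higher-ranked of the two operand types. The C sketch below illustrates only that ordering; the enum sketch_type and the function usual_arithmetic_conversion are hypothetical names and are unrelated to the CGtype enumerants of the Cg runtime.

```c
#include <assert.h>

/* Hypothetical ranking, lowest to highest, following rules 1 to 7:
   cint < int < cfloat < fixed < half < float < double. */
enum sketch_type {
    T_CINT, T_INT, T_CFLOAT, T_FIXED, T_HALF, T_FLOAT, T_DOUBLE
};

/* Result type of a binary operator applied to operands of types a and b:
   the operand with the lower rank is converted to the higher-ranked type. */
enum sketch_type usual_arithmetic_conversion(enum sketch_type a,
                                             enum sketch_type b)
{
    return (a > b) ? a : b;
}
```

For instance, under this model half combined with fixed yields half (rule 3), cfloat combined with int yields cfloat (rule 5), and two cint operands stay cint (rule 7).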
70. floating point is supported. It is recommended that you use fixed, half, and float, in that order, for maximum performance. Reversing this order provides maximum precision. You are encouraged to use the fastest type that meets your needs for precision.

Statements and Operators
- Full support for if/else
- No for and while loops, unless they can be unrolled by the compiler
- Support for flexible texture mapping
- Support for screen-space derivative functions
- No support for variable indexing of arrays

Bindings

Binding Semantics for Uniform Data

The valid binding semantics for uniform parameters in the fp30 profile are summarized in Table 26.

Table 26 fp30 Uniform Input Binding Semantics

Binding Semantics Name: register(s0) to register(s15), TEXUNIT0 to TEXUNIT15
Corresponding Data: Texunit N, where N is in the range 0 to 15. May be used only with uniform inputs with sampler types.

Binding Semantics Name: register(c0) to register(c31), C0 to C31
Corresponding Data: Constant register N, where N is in range 0 to 15. May be used only with uniform inputs.

Binding Semantics for Varying Input/Output Data

The valid binding semantics for varying input parameters in the fp30 profile are summarized in Table 27. These binding semantics map to NV_fragment_program input registers. The two sets act as aliases to each other. The profile also allows POSITION, FOG, PSIZE, HPOS, FOGC, PSIZ, BCOL0, BCOL1
71. if x is infinite.

isnan(x): Returns true if x is NaN (not a number).

ldexp(x, n): x * 2^n

lerp(a, b, f): Linear interpolation: (1 - f)*a + b*f, where a and b are matching vector or scalar types. Parameter f can be either a scalar or a vector of the same type as a and b.

lit(ndotl, ndoth, m): Computes lighting coefficients for ambient, diffuse, and specular light contributions. Returns a 4-vector as follows:
- The x component of the result vector contains the ambient coefficient, which is always 1.0.
- The y component contains the diffuse coefficient, which is zero if (n · l) < 0; otherwise (n · l).
- The z component contains the specular coefficient, which is zero if either (n · l) < 0 or (n · h) < 0; (n · h)^m otherwise.
- The w component is 1.0.
There is no vectorized version of this function.

log(x): Natural logarithm ln(x); x must be greater than zero.

log2(x): Base 2 logarithm of x; x must be greater than zero.

log10(x): Base 10 logarithm of x; x must be greater than zero.

max(a, b): Maximum of a and b.

min(a, b): Minimum of a and b.

Table 1 Mathematical Functions (continued)

modf(x, out ip): Splits x into integral and fractional parts, each with the same sign as x. Stores the integral part in ip and returns the fractional part.

mul(M, N): Matrix
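The lerp and lit entries above can be mirrored in a few lines of C to check the formulas by hand. This is a CPU-side sketch, not the Cg standard library: the names sketch_lerp and sketch_lit are hypothetical, and to stay free of the math library the specular exponent m is restricted here to a non-negative integer and evaluated by repeated multiplication, which the table does not require.

```c
#include <assert.h>

struct float4 { float x, y, z, w; };

/* Linear interpolation as tabulated: (1 - f)*a + b*f. */
float sketch_lerp(float a, float b, float f)
{
    return (1.0f - f) * a + b * f;
}

/* Lighting coefficients as tabulated. Assumption for this sketch only:
   m is a non-negative integer, so (n.h)^m is a simple product. */
struct float4 sketch_lit(float ndotl, float ndoth, int m)
{
    struct float4 r;
    float spec = 1.0f;
    int i;

    assert(m >= 0);
    for (i = 0; i < m; ++i)
        spec *= ndoth;

    r.x = 1.0f;                                   /* ambient: always 1 */
    r.y = (ndotl < 0.0f) ? 0.0f : ndotl;          /* diffuse */
    r.z = (ndotl < 0.0f || ndoth < 0.0f) ? 0.0f   /* specular */
                                         : spec;
    r.w = 1.0f;
    return r;
}
```

Note that the specular term is forced to zero whenever the surface faces away from the light, even if n · h is positive, which prevents highlights from appearing on unlit faces.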
72. images. To interface Cg programs with applications, you must do two things:
1. Compile the programs for the correct profile. In other words, compile the programs into a form that is compatible with the 3D API used by the application and the underlying hardware.
2. Link the programs to the application program. This allows the application to feed varying and uniform data to the programs.

You have two choices as to when to perform these operations. You can perform them at compile time, when the application program is compiled into an executable, or you can perform them at run time, when the application is actually executed. The Cg runtime is an application programming interface that allows an application to compile and link Cg programs at run time.

Benefits of the Cg Runtime

Future Compatibility

Most applications need to run on a range of profiles. If an application precompiles its Cg programs (the compile-time choice), it must store a compiled version of each program for each profile. This is reasonable for one program, but is cumbersome for an application that uses many programs. What's worse, the application is frozen in time. It supports only the profiles that existed when it was compiled; it cannot take advantage of the optimizations that future compilers could offer. In contrast, programs compiled by applications at run time
- Benefit from future compiler optimizations for
73. instructions, no limit on texture instructions, no limit on texture-dependent reads, and support for predication. This section describes the capabilities and restrictions of Cg when using these profiles.

Program Instruction Limit

DirectX 9 pixel shaders have a limit on the number of instructions in a pixel shader.
- PS 2.0 (ps_2_0) pixel shaders are limited to 32 texture instructions and 64 arithmetic instructions.
- Extended PS 2 (ps_2_x) shaders have a maximum total instruction count of between 96 and 1024 instructions. There is no separate texture instruction limit on extended pixel shaders.

If the compiler needs to produce more than the maximum allowed number of instructions to compile a program, it reports an error.

Vector Register Limit

Likewise, there are limited numbers of registers to hold program parameters and temporary results. Specifically, there are 32 read-only vector registers and 12 to 32 read/write vector registers. If the compiler needs more registers to compile a program than are available, it generates an error.

7. To understand the capabilities of DirectX PS 2.0 Pixel Shaders and the code produced by the compiler, refer to the Pixel Shader Reference in the DirectX 9 SDK documentation.

Language Constructs and Support

Data Types

This profile implements data types as follows:
- The float data type is implemented as IEEE 32-bit si
74. is no enum or union.
- Bit-field declarations in structures are not allowed.
- Variables may be defined anywhere before they are used, rather than just at the beginning of a scope as in C. (That is, we adopt the C++ rules that govern where variable declarations are allowed.) Variables may not be redeclared within the same scope.
- Vector constructors, such as the form float4(1, 2, 3, 4), may be used anywhere in an expression.
- A struct definition automatically performs a corresponding typedef, as in C++.
- An interface can be specified to define a set of methods that comprises an abstract interface.
- A struct type can be declared as implementing an interface by adding a colon and the name of the interface after the name of the struct. Methods can be defined in the body of a struct definition.
- C++-style comments are allowed in addition to C-style comments.

Detailed Language Specification

Definitions

The following definitions are based on the ANSI C standard:
- Object: An object is a region of data storage in the execution environment, the contents of which can represent values. When referenced, an object may be interpreted as having a particular type.
- Declaration: A declaration specifies the interpretation and attributes of a set of identifiers.
- Definition: A declaration that also causes storage to be reserved for an object or co
75. it this way:

    [component-by-component listing garbled in source; its last statement operates on a.w and b.w]

than to write it this way:

    [single float4 statement assigning c from a and b; operator garbled in source]

The compiler does its best to find vectorization in your programs, but the more vectorized your original code is, the better starting place it has to work from.

A more specific example comes from a common computation done for tangent-space bump mapping. Given a texture map that encodes a bump map by storing the offset along the tangent direction in x, the offset along the binormal in y, and the offset along the normal in z, the bump-mapped normal is computed by scaling the tangent, binormal, and normal appropriately. In C or C++, the natural way to write this computation is as shown:

    // Tangent, binormal, normal. Passed in from vertex program
    float3 T, B, N;
    float3 Nbump; // Bump-mapped normal
    float3 bump = tex2D(bumpSampler, uv);
    Nbump.x = bump.x * T.x + bump.y * B.x + bump.z * N.x;
    Nbump.y = bump.x * T.y + bump.y * B.y + bump.z * N.y;
    Nbump.z = bump.x * T.z + bump.y * B.z + bump.z * N.z;

However, here we have written a series of computations that add and multiply single pairs of floating-point values at a time. After a little algebra, we can rewrite this as three multiplies of a float3 and a float and two float3 additions, which runs several times faster than the original:

    Nbump = bump.x * T + bump.y * B + bump.z * N;

2. Use Swiz
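The equivalence of the two bump-mapping forms above can be checked on the CPU. The following is a minimal C sketch, not Cg: Vec3, vec3_scale, vec3_add and the two bump_normal_* functions are hypothetical names introduced here only for illustration. It shows that the vector form is three float3-by-float multiplies and two float3 additions, and that it yields the same result as the component-by-component form.

```c
/* Hypothetical stand-in for Cg's float3; not part of the Cg runtime. */
typedef struct { float x, y, z; } Vec3;

/* One "float3 times float" multiply. */
Vec3 vec3_scale(Vec3 v, float s)
{
    Vec3 r = { v.x * s, v.y * s, v.z * s };
    return r;
}

/* One float3 addition. */
Vec3 vec3_add(Vec3 a, Vec3 b)
{
    Vec3 r = { a.x + b.x, a.y + b.y, a.z + b.z };
    return r;
}

/* Component-by-component form: nine scalar multiplies, six scalar adds. */
Vec3 bump_normal_scalar(Vec3 bump, Vec3 T, Vec3 B, Vec3 N)
{
    Vec3 r;
    r.x = bump.x * T.x + bump.y * B.x + bump.z * N.x;
    r.y = bump.x * T.y + bump.y * B.y + bump.z * N.y;
    r.z = bump.x * T.z + bump.y * B.z + bump.z * N.z;
    return r;
}

/* Vector form: three scales and two additions. */
Vec3 bump_normal_vector(Vec3 bump, Vec3 T, Vec3 B, Vec3 N)
{
    Vec3 tb = vec3_add(vec3_scale(T, bump.x), vec3_scale(B, bump.y));
    return vec3_add(tb, vec3_scale(N, bump.z));
}
```

On a GPU the benefit comes from the hardware executing each float3 operation as a single instruction; on a CPU this sketch only demonstrates that the algebra is the same.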
76. look up in a cube map (Fig. 21).

[Figure: Fig. 21 Example of Sine Wave]

Vertex Shader Source Code for Sine Wave

    struct appdata {
        float4 TexCoord0 : TEXCOORD0;
    };
    struct vpconn {
        float4 HPOS : POSITION;
        float4 COL0 : COLOR0;
        float4 TEX0 : TEXCOORD0;
    };
    vpconn main(appdata IN,
                uniform float4x4 WorldViewProj,
                uniform float3x4 WorldView,
                uniform float3x3 WorldViewIT,
                uniform float3 WavesX,
                uniform float3 WavesY,
                uniform float3 WavesH,
                uniform float Time,
                [remaining parameters garbled in source])
    {
        vpconn OUT;
        float3 angle = WavesX * IN.TexCoord0.x + WavesY * IN.TexCoord0.y;
        angle = angle + Time;
        float3 sine, cosine;
        sincos(angle, sine, cosine);
        // position is (u, sum(h * sin(angle)), v, 1)
        float4 position;
        position.xz = IN.TexCoord0.xy;
        position.y = dot(WavesH, sine);
        position.w = 1.0f;
        OUT.HPOS = mul(WorldViewProj, position);
        [normal computation garbled in source; it combines WavesH, WavesX, WavesY and cosine]
    }

Matrix Palette Skinning

Description

This effect performs matrix palette skinning using two bones per vertex. All the bones for the mesh are set in the constant memory, and each vertex includes two indices that indicate which bones influence this vertex
77. main parameter gives the name of the function to use as the main entry point when the program is executed. Lastly, args is a null-terminated list of null-terminated strings that is passed as an argument to the compiler.

Loading a Program

After you compile a program, you need to pass the resulting object code to the 3D API that you're using. For this, you need to invoke the Cg runtime's API-specific functions. The Direct3D-specific functions require the Direct3D device structure in order to make the necessary Direct3D calls. The application passes it to the runtime using the following call:

    cgD3D9SetDevice(Device);

You must do this every time a new Direct3D device is created, typically only at the beginning of the application. You can then load a Cg program in this way for the Direct3D 9 Cg runtime:

    cgD3D9LoadProgram(program, CG_FALSE, 0);

or this way for the Direct3D 8 Cg runtime:

    cgD3D8LoadProgram(program, CG_FALSE, 0, 0, vertexDeclaration);

The parameter vertexDeclaration is the Direct3D 8 vertex declaration array that describes where to find the necessary vertex attributes in the vertex streams. See "Expanded Interface Program Execution" on page 103 for the details on the arguments to cgD3D8LoadProgram and cgD3D9LoadProgram. In OpenGL, the equivalent call is:

    cgGLLoadProgram(program);

Modifying Program Parameters
78. parameter shadowing is turned off for a given program and the value of any of its uniform parameters is set by some function of the Direct3D Cg runtime, it is immediately downloaded to the GPU constant memory (the memory containing the values of all the uniform parameters). When parameter shadowing is turned on, the value is shadowed instead and no Direct3D call is made at the time it is set; only when the program is bound are all of its parameters actually downloaded to the constant memory. This means that a parameter value set after binding the program is not used during the execution of the program until the next time the program is bound. Parameter shadowing applies to all parameter settings, including texture stage state and texture mode.

Disabling parameter shadowing allows the runtime to consume less memory, but forces the application to do the work of making sure that the constant memory contains all the right values every time it activates a program.

OpenGL Cg Runtime

This section discusses setting parameters and program execution for the OpenGL Cg runtime.

Note: Before any OpenGL Cg runtime functions can be executed, an OpenGL context must be created with either wglCreateContext or glXCreateContext.

Setting Parameters in OpenGL

In accordance with the OpenGL convention, many of the functions described below come in two versions: a version operating on float
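The ordering rule for parameter shadowing described above (a value set after binding is not seen until the next bind) can be modeled with a toy in plain C. This is only a sketch of the described behavior, not the runtime's implementation: ToyParam, toy_set and toy_bind are hypothetical names, and a single float stands in for GPU constant memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one uniform parameter; all names here are hypothetical. */
typedef struct {
    bool  shadowing;  /* is parameter shadowing turned on? */
    bool  has_value;  /* has a shadowed value been set yet? */
    float shadow;     /* copy kept by the runtime */
    float constant;   /* what "GPU constant memory" currently holds */
} ToyParam;

/* Setting a value: downloaded at once without shadowing, stored otherwise. */
void toy_set(ToyParam *p, float value)
{
    assert(p != NULL);
    if (p->shadowing) {
        p->shadow = value;
        p->has_value = true;
    } else {
        p->constant = value;
    }
}

/* Binding the program: shadowed values are downloaded only now. */
void toy_bind(ToyParam *p)
{
    assert(p != NULL);
    if (p->shadowing && p->has_value)
        p->constant = p->shadow;
}
```

With shadowing on, calling toy_set after toy_bind leaves constant unchanged until toy_bind is called again, which is the behavior an application has to plan around.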
79. product of matrix M and matrix N, as shown below:

    | M11 M21 M31 M41 |   | N11 N21 N31 N41 |
    | M12 M22 M32 M42 |   | N12 N22 N32 N42 |
    | M13 M23 M33 M43 |   | N13 N23 N33 N43 |
    | M14 M24 M34 M44 |   | N14 N24 N34 N44 |

If M has size AxB, and N has size BxC, returns a matrix of size AxC.

mul(M, v): Product of matrix M and column vector v, as shown below:

    | M11 M21 M31 M41 |   | V1 |
    | M12 M22 M32 M42 |   | V2 |
    | M13 M23 M33 M43 |   | V3 |
    | M14 M24 M34 M44 |   | V4 |

If M is an AxB matrix and v is a Bx1 vector, returns an Ax1 vector.

mul(v, M): Product of row vector v and matrix M, as shown below:

                        | M11 M21 M31 M41 |
    | V1 V2 V3 V4 |     | M12 M22 M32 M42 |
                        | M13 M23 M33 M43 |
                        | M14 M24 M34 M44 |

If v is a 1xA vector and M is an AxB matrix, returns a 1xB vector.

noise(x): Either a 1-, 2-, or 3-dimensional noise function, depending on the type of its argument. The returned value is between zero and one and is always the same for a given input value.
pow(x, y): x raised to the power y.
radians(x): Degree-to-radian conversion.
round(x): Closest integer to x.

Table 1 Cg Standard Library Functions: Mathematical Functions (continued)

rsqrt(x): Reciprocal square root of x; x must be greater than zero.
saturate(x): Equivalent to clamp(x, 0, 1). Returns 0 if x is less than 0. Returns 1 if x is greater than 1.
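The size rules for mul(M, v) and mul(v, M) above can be illustrated in plain C. This is a minimal sketch under stated assumptions: matrices are stored as flat row-major float arrays, and the function names mul_matrix_vector, mul_vector_matrix and saturate_scalar are hypothetical (they are not Cg or Cg runtime functions).

```c
#include <assert.h>
#include <stddef.h>

/* out = M * v, where M is A x B (row-major) and v is a B x 1 column vector.
   The result is an A x 1 vector. */
void mul_matrix_vector(const float *M, const float *v, float *out,
                       size_t A, size_t B)
{
    assert(M != NULL && v != NULL && out != NULL);
    for (size_t i = 0; i < A; ++i) {
        float sum = 0.0f;
        for (size_t j = 0; j < B; ++j)
            sum += M[i * B + j] * v[j];
        out[i] = sum;
    }
}

/* out = v * M, where v is a 1 x A row vector and M is A x B (row-major).
   The result is a 1 x B vector. */
void mul_vector_matrix(const float *v, const float *M, float *out,
                       size_t A, size_t B)
{
    assert(M != NULL && v != NULL && out != NULL);
    for (size_t j = 0; j < B; ++j) {
        float sum = 0.0f;
        for (size_t i = 0; i < A; ++i)
            sum += v[i] * M[i * B + j];
        out[j] = sum;
    }
}

/* Scalar version of saturate: clamp x to the range [0, 1]. */
float saturate_scalar(float x)
{
    if (x < 0.0f) return 0.0f;
    if (x > 1.0f) return 1.0f;
    return x;
}
```

For a 2x3 matrix, mul_matrix_vector takes a 3-element vector and produces 2 elements, while mul_vector_matrix takes a 2-element vector and produces 3, mirroring the AxB rules in the table.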
80. profile is an enumerant specifying the profile to which the program must be compiled.
- entry is the name of the function that must be considered as the main entry point by the compiler. If the value is zero, the name main is used.
- args is a pointer to a null-terminated array of null-terminated strings that are passed as arguments to the compiler. The pointer may itself be null.

The only difference between the two functions is how program is interpreted. For cgCreateProgramFromFile, program is a string containing the name of a file containing source code; for cgCreateProgram, program directly contains source code. If the enumerant programType is equal to CG_SOURCE, the source code is Cg source code; if it is equal to CG_OBJECT, the source code is precompiled object code and does not require any further compilation.

The CGprogram handle returned by cgCreateProgramFromFile is valid if it is different from zero, which means that the program has been successfully created and compiled. The program is destroyed by passing its handle to cgDestroyProgram:

    void cgDestroyProgram(CGprogram program);

The Cg runtime allows for either automatic or manual compilation of programs. Compilation of a program is required before the program may be used when drawing. As such, program compilation is necessary sometime after the program is first created, or whenever it enters an uncompiled state. A program may enter an uncompiled state for a variety
81. program and any of its parameter handles. On the other hand, destroying a program with cgDestroyProgram or cgDestroyContext releases any Direct3D resources by indirectly calling cgD3D9UnloadProgram.

Function cgD3D9IsProgramLoaded returns CG_TRUE if a program is loaded:

    CGbool cgD3D9IsProgramLoaded(CGprogram program);

All programs must be loaded before they can be bound. Binding a program is done by calling cgD3D9BindProgram:

    HRESULT cgD3D9BindProgram(CGprogram program);

This function activates the Direct3D shader corresponding to program by calling IDirect3DDevice9::SetVertexShader or IDirect3DDevice9::SetPixelShader, depending on the program's profile. If parameter shadowing is enabled for program, it also sets all the shadowed parameters and their associated Direct3D states (such as texture stage states for the sampler parameters). No value or state tracking is performed by the runtime, so this setting is done regardless of the current values of these parameters or of their states. If a shadowed parameter has not been set by the time cgD3D9BindProgram is called, no Direct3D call of any sort is issued for this parameter.

Only one vertex program and one fragment program can be bound at any given time, so binding a program of a given type implicitly unbinds any other program of the same type.

Expanded Interface Profil
82. program, except for specifying a different profile. However, if any of the glProgramParameterxxNV routines are used, the application program needs to be changed to use the corresponding ARB functions.

Since there is no ARB function corresponding to glTrackMatrixNV, an application using glTrackMatrixNV and the arbvp1 profile needs to be modified. One solution is to change the Cg source code to refer to the matrix using the state structure, so that the matrix is automatically tracked by the OpenGL driver as part of its GL_ARB_vertex_program support. Another solution is for the application to use the Cg runtime routine cgGLSetStateMatrixParameter to load the appropriate matrix or matrices when necessary.

Another potential incompatibility between the arbvp1 and vp20 profiles is the way that input varying semantics are handled. In the vp20 profile, semantic names such as POSITION and ATTR0 are aliases of each other, the same way NV_vertex_program aliases Vertex and Attribute 0 (see Table 30, "vp20 Varying Input Binding Semantics," on page 281). In the arbvp1 profile, the semantic names are not aliased, because ARB_vertex_program allows the conventional attributes (such as vertex position) to be separate from the generic attributes (such as Attribute 0). For this reason, it is important to follow the conventions given in Table 17, "arbvp1 Varying Input Binding Semantics," on page 261, so that arbvp1 programs work for all implementations of ARB_vertex_pr
83. programs to behave correctly under other pixel shader profiles. The swizzles required on the texture coordinate parameter to the projective texture lookup functions are listed in Table 34.

Table 34 Required Projective Texture Lookup Swizzles

    Texture Lookup Function    Texture Coordinate Swizzle
    tex1Dproj                  .xw / .ra
    tex2Dproj                  .xyw / .rga
    texRECTproj                .xyw / .rga
    tex3Dproj                  .xyzw / .rgba
    texCUBEproj                .xyzw / .rgba

Bindings

Manual Assignment of Bindings

The Cg compiler can determine bindings between texture units and uniform sampler parameters/texture coordinate inputs automatically. This automatic assignment is based on the context in which uniform sampler parameters and texture coordinate inputs are used together.

To specify bindings between texture units and uniform parameters/texture coordinates to match their application, all sampler uniform parameters and texture coordinate inputs that are used in the program must have matching binding semantics; for example, TEXUNIT<n> may only be used with TEXCOORD<n>. Partially specified binding semantics may not work in all cases. Fundamentally, this restriction is due to the close coupling between texture samplers and texture coordinates in the NV_texture_shader extension.

Binding Semantics for Uniform Data

If a binding semantic for a uniform parameter is not specified, then the compiler will allocat
84. sample of which is given below. For a complete list, see "Texture Map Functions" on page 38.

- Standard nonprojective texture lookup:

    tex2D(sampler2D tex, float2 s);
    texRECT(samplerRECT tex, float2 s);
    texCUBE(samplerCUBE tex, float3 s);

- Standard projective texture lookup:

    tex2Dproj(sampler2D tex, float3 sq);
    texRECTproj(samplerRECT tex, float3 sq);
    texCUBEproj(samplerCUBE tex, float4 sq);

- Nonprojective texture lookup with user-specified filter kernel size:

    tex2D(sampler2D tex, float2 s, float2 dsdx, float2 dsdy);
    texRECT(samplerRECT tex, float2 s, float2 dsdx, float2 dsdy);
    texCUBE(samplerCUBE tex, float3 s, float3 dsdx, float3 dsdy);

  The filter size is specified by providing the derivatives of the texture coordinates with respect to pixel coordinates x (dsdx) and y (dsdy). For more information, see "Texture Map Functions" on page 38.

- Shadowmap lookup:

    tex2Dproj(sampler2D tex, float4 szq);
    texRECTproj(samplerRECT tex, float4 szq);

  In these functions, the z component of the texture coordinate holds a depth value to be compared against the shadowmap. Shadowmap lookups require the associated texture unit to be configured by the application for depth-compare texturing; otherwise, no depth comparison is actually performed.

Effects

Cg includes a powerful, versatile shader specification and interchange format: CgFX. For artists and developers of rea
85. [end of a code listing, garbled in source; its last statement is "return x;"]

Texture Lookups in Advanced Fragment Profiles

Cg's advanced fragment profiles and the vp40 profile provide a variety of texture lookup functions. Please note that Cg uses a different set of texture lookup functions for basic fragment profiles because of the restricted pixel programmability of that hardware. Basic fragment profile lookup functions aren't discussed in this introductory chapter.

Advanced fragment profile texture lookup functions always require at least two parameters:
- Texture sampler: A texture sampler is a variable with the type sampler, sampler1D, sampler2D, sampler3D, samplerCUBE, or samplerRECT, and represents the combination of a texture image with a filter, clamp, wrap, or similar configuration. Texture sampler variables cannot be set directly within the Cg language; instead, they must be provided by the application as uniform parameters to a Cg program.
- Texture coordinate: Depending on the type of texture lookup, the coordinate may be a scalar, a two-vector, a three-vector, or a four-vector.

The following fragment program uses the tex2D function to perform a 2D texture lookup to determine the fragment's RGBA color:

    void applytex(uniform sampler2D mytexture,
                  float2 uv : TEXCOORD0,
                  out float4 outcolor : COLOR)
    {
        outcolor = tex2D(mytexture, uv);
    }

Cg provides a wide variety of texture lookup functions, a
86. [Figure: Fig. 3 The Cg Simple Workspace]

As usual, click the FileView tab to view the various files in the project. What's different in this case, though, is that in addition to the usual Source Files and Header Files folders, there is also a Cg Programs folder. This Cg Programs folder should contain one Cg program, simple.cg, which is what you can use for experimentation. Double-click simple.cg to ope
87. source code. Examples shown are:
- Anisotropic Lighting
- Bump Dot3x2 Diffuse and Specular
- Bump Reflection Mapping
- Fresnel
- Grass
- Refraction
- Shadow Mapping
- Shadow Volume Extrusion
- Sine Wave Demo
- Matrix Palette Skinning

Anisotropic Lighting

Description

The anisotropic lighting effect (Fig. 13) shows the vertex program's half-angle vector calculation. It uses HdotN and LdotN per vertex to look up into a 2D texture to achieve interesting lighting effects.

[Figure: Fig. 13 Example of Anisotropic Lighting]

Vertex Shader Source Code for Anisotropic Lighting

    struct appdata {
        float3 Position : POSITION;
        float3 Normal : NORMAL;
    };
    struct vpconn {
        float4 Hposition : POSITION;
        float4 TexCoord0 : TEXCOORD0;
    };
    vpconn main(appdata IN,
                uniform float4x4 WorldViewProj,
                uniform [type garbled in source] WorldIT,
                uniform float3x4 World,
                uniform float3 LightVec,
                uniform float3 EyePos)
    {
        vpconn OUT;
        float3 worldNormal = normalize(mul(WorldIT, IN.Normal));
        // build float4
        float4 tempPos;
        tempPos.xyz = IN.Position.xyz;
        tempPos.w = 1.0;
        // compute world space position
        float3 worldSpacePos = mul(World, tempPos);
        // vector from vertex to eye, normalized
        [remaining statements garbled in source; they compute the vertex-to-eye vector from EyePos, assign OUT.TexCoord0 and OUT.Hposition, and return OUT]
88. structure that is defined in simple.cg is vertout, which connects the vertex to the fragment:

    // define outputs from vertex shader
    struct vertout {
        float4 HPosition : POSITION;
        float4 Color : COLOR;
    };

The vertout structure also contains only two members: HPosition, the vertex position in homogeneous coordinates, and Color, the vertex color. Again, binding semantics are used to specify register locations for the variables. In this case, the homogeneous position information resides in the hardware register corresponding to POSITION, and the color information resides in the hardware register corresponding to COLOR.

Passing Arguments

Now let's take a look at the body of the program, section by section, starting with the declaration of main:

    vertout main(appin IN,
                 uniform float4x4 ModelViewProj,
                 uniform float4x4 ModelViewIT,
                 uniform float4 LightVec)

As required for a vertex program, main takes an application-to-vertex structure as input and returns a vertex-to-fragment structure. In this case, we are using the two structure types we have already defined: appin and vertout. Notice that main takes in three uniform parameters: two matrices and one vector. All three parameters are passed to simple.cg by the application using the runtime library.

The first matrix, ModelViewProj, is the concatenation of the modelview and projection matrices. Together, these matrices transfo
89. supported by the Cg Standard Library. Vertex profiles are not required to support these functions.

Table 4 Derivative Functions

    Function    Description
    ddx(a)      Approximate partial derivative of a with respect to the screen-space x coordinate.
    ddy(a)      Approximate partial derivative of a with respect to the screen-space y coordinate.

Debugging Function

Table 5, "Debugging Function," presents the debugging function that is supported by the Cg Standard Library. Vertex profiles are not required to support this function.

Table 5 Debugging Function

    Function                Description
    void debug(float4 x)    If the compiler's DEBUG option is specified, calling this function causes the value x to be copied to the COLOR output of the program, and execution of the program is terminated. If the compiler's DEBUG option is not specified, this function does nothing.

The debug function is intended to allow a program to be compiled twice, once with the DEBUG option and once without. By executing both programs, you can obtain one frame buffer containing the final output of the program and a second containing an intermediate value to be examined for debugging.

Predefined Fragment Program Output Structures

A number of helper structure types for use in fragment programs are predefined in the standard library. Variables of th
90. Shadow Mapping
    Description
    Vertex Shader Source Code for Shadow Mapping
    Pixel Shader Source Code for Shadow Mapping
Shadow Volume Extrusion
    Description
    Vertex Shader Source Code for Shadow Volume Extrusion
Sine Wave Demo
    Description
    Vertex Shader Source Code for Sine Wave
Matrix Palette Skinning
    Description
    Vertex Shader Source Code for Matrix Palette Skinning
Appendix A Cg Language Specification
    Language Overview
        Silent Incompatibilities
        Similar Operations That Must be Expressed Differently
        Differences from ANSI C
    Detailed Language Specification
        Definitions
        [entry illegible in source]
        The Uniform Modifier
        Function Declarations
        Overloading of Fu
91. texture handle should be used for the sampler2D in the effect file. Secondly, the application must use the Cg runtime to set the texture state given in the sampler_state block at the appropriate time. Under OpenGL, the easiest way to achieve these goals is to call:

    cgGLSetupSampler(param, textureID);

This entry point binds the given texture, associates the texture handle with the given parameter, and initializes the sampler state by calling cgSetSamplerState. Alternatively, an application can perform these steps itself. The code below shows this in practice:

    CGparameter p = cgGetNamedEffectParameter(effect, "samp");
    GLuint handle;
    glGenTextures(1, &handle);
    glBindTexture(GL_TEXTURE_2D, handle);
    cgGLSetTextureParameter(p, handle);
    cgSetSamplerState(p);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, RES, RES, 0, GL_RGBA, GL_FLOAT, data);

Note the calls to cgGLSetTextureParameter and cgSetSamplerState. The first call is the usual runtime call that needs to be made to tell the runtime which OpenGL texture object is associated with a given parameter. The cgSetSamplerState call ends up making the glTexParameter calls that set up the texture state defined in the sampler_state block. It expects that the appropriate texture object has been bound with glBindTexture first.

After the sampler has been initialized in either of these manners, there are two possibilities for how the texture para
92. the same meaning as they do for the cgGLSetMatrixParameter functions.

Setting Varying Parameters

The values of fragment program varying parameters are set as the result of the interpolation across the triangles performed by the GPU, so only the values of vertex program varying parameters are set by the application. Setting a vertex varying parameter requires two steps. The first step consists of passing a pointer to an array containing the values for each vertex. This is done using cgGLSetParameterPointer:

    void cgGLSetParameterPointer(CGparameter parameter,
                                 GLint size,
                                 GLenum type,
                                 GLsizei stride,
                                 GLvoid* array);

The variable size indicates the number of values per vertex that are stored in array. It is equal to 1, 2, 3, or 4. If fewer values are set than the parameter requires, the non-specified values default to 0 for x, y, and z, and 1 for w.

The enumerant type specifies the data type of the values stored in array: GL_SHORT, GL_INT, GL_FLOAT, or GL_DOUBLE.

The parameter stride is the byte offset between any two consecutive vertices. Passing a value of zero for stride is equivalent to passing a byte offset equal to size multiplied by the size of type in bytes; in other words, it means that there is no gap between two consecutive vertex values. Note that the minimum size for array is implicitly defined by the biggest vertex index specified in the t
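The stride and default-value rules above are simple arithmetic, sketched here in plain C without any OpenGL or Cg calls. The function names effective_stride and expand_vertex are hypothetical, and the sketch assumes GL_FLOAT data whose stride is a multiple of sizeof(float).

```c
#include <assert.h>
#include <stddef.h>

/* A stride of zero means "tightly packed": size values of type_bytes each. */
size_t effective_stride(int size, size_t type_bytes, size_t stride)
{
    assert(size >= 1 && size <= 4);
    assert(type_bytes > 0);
    return stride != 0 ? stride : (size_t)size * type_bytes;
}

/* Fetch vertex number index from a float array and expand it to four
   components, filling unspecified ones with 0 for x, y, z and 1 for w. */
void expand_vertex(const float *array, int size, size_t stride,
                   size_t index, float out[4])
{
    assert(array != NULL && out != NULL);
    const unsigned char *base = (const unsigned char *)array;
    size_t step = effective_stride(size, sizeof(float), stride);
    const float *src = (const float *)(base + index * step);

    out[0] = 0.0f;
    out[1] = 0.0f;
    out[2] = 0.0f;
    out[3] = 1.0f;
    for (int i = 0; i < size; ++i)
        out[i] = src[i];
}
```

For example, with size 2 and stride 0, vertex 1 starts 2 * sizeof(float) bytes into the array and expands to (x, y, 0, 1).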
93. to which output values are to be written, ncomp is the number of components per pixel in the output buffer (1, 2, 3, or 4), and nx, ny, and nz indicate the number of positions at which the function should be evaluated in each of the x, y, and z dimensions. The total size of the buffer should be equal to the product of the number of positions in each of the dimensions and the number of components in the buffer, as in the example below:

    #define RES 256
    #define NCOMPS 4
    float *buf = new float[NCOMPS * RES * RES];
    cgEvaluateProgram(tp, buf, NCOMPS, RES, RES, 1);
    // do something with buf
    delete[] buf;

It is an error to pass a CGprogram that doesn't have the CG_PROFILE_GENERIC profile to cgEvaluateProgram.

Annotations

Using annotations, it is possible to attach additional information to parameters, techniques, programs, and passes in the effect file for use by the application. An annotation is a list of variables and values, denoted by angle brackets, immediately following a declaration, as in the effect below:

    [annotated parameter declaration garbled in source]
    technique fancyHalo < bool optional = true; > {
        pass < string geometry = "character";
               string destination = "texture"; > ...

CgFX does not interpret the meaning of annotations in any way; annotations exist solely for the convenience of the application. The example abov
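The buffer-size rule above (number of components times the number of positions in each dimension) can be written as a small helper. This is a sketch in plain C with a hypothetical function name, eval_buffer_floats; it does not call cgEvaluateProgram and makes no assumption about how the runtime lays out values inside the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Number of floats the output buffer must hold: ncomp components for
   each of the nx * ny * nz evaluation positions. */
size_t eval_buffer_floats(int ncomp, int nx, int ny, int nz)
{
    assert(ncomp >= 1 && ncomp <= 4);
    assert(nx >= 1 && ny >= 1 && nz >= 1);
    return (size_t)ncomp * (size_t)nx * (size_t)ny * (size_t)nz;
}
```

For the example above (NCOMPS of 4, RES of 256, nz of 1), this gives 4 * 256 * 256 floats, which matches the size passed to new.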
94. values (marked with an f), and a version operating on double values (marked with a d).

Setting Uniform Scalar and Uniform Vector Parameters

To set the values of scalar parameters or vector parameters, use the cgGLSetParameter functions:

    void cgGLSetParameter1f(CGparameter parameter, float x);
    void cgGLSetParameter1fv(CGparameter parameter, const float* array);
    void cgGLSetParameter1d(CGparameter parameter, double x);
    void cgGLSetParameter1dv(CGparameter parameter, const double* array);
    void cgGLSetParameter2f(CGparameter parameter, float x, float y);
    void cgGLSetParameter2fv(CGparameter parameter, const float* array);
    void cgGLSetParameter2d(CGparameter parameter, double x, double y);
    void cgGLSetParameter2dv(CGparameter parameter, const double* array);
    void cgGLSetParameter3f(CGparameter parameter, float x, float y, float z);
    void cgGLSetParameter3fv(CGparameter parameter, const float* array);
    void cgGLSetParameter3d(CGparameter parameter, double x, double y, double z);
    void cgGLSetParameter3dv(CGparameter parameter, const double* array);
    void cgGLSetParameter4f(CGparameter parameter, float x, float y, float z, float w);
    void cgGLSetParameter4fv(CGparameter parameter, const float* array);
    void cgGLSetParameter4d(CGparameter parameter, double x, double y, double z, double w);
    void cgGLSetParameter4dv(CGparameter parameter, const dou
95. vp20 Vertex Shader profile and the DirectX VS 1.1 profile is that the vp20 profile supports two additional outputs: BCOL0 (for back-facing primary color) and BCOL1 (for back-facing secondary color).

Position Invariance
- The vp20 profile supports position invariance, as described in the core language specification.
- The modelview-projection matrix must be specified using a binding semantic of GL_MVP.

Data Types

This profile implements data types as follows:
- float data types are implemented as IEEE 32-bit single precision.
- half and double data types are implemented as float.
- int data type is supported using floating-point operations, which add extra instructions for proper truncation for divides, modulos, and casts from floating-point types.
- fixed or sampler data types are not supported, but the profile does provide the minimal partial support that is required for these data types by the core language specification; that is, it is legal to declare variables using these types, as long as no operations are performed on the

Footnote 3: To understand the NV_vertex_program and the code produced by the compiler using the vp20 profile, see the GL_NV_vertex_program extension documentation.
Footnote 4: See "OpenGL NV_vertex_program 1.0 Profile (vp20)" on page 279 for a full explanation of the data types, statements, and operators supported by this profile.
96. - CGD3D9ERR_INVALIDVEREXDECL: Returned when a program is loaded with the expanded interface, but the given declaration is incompatible.
- CGD3D9ERR_NODEVICE: Returned when a required Direct3D device is 0. This typically occurs when an expanded interface function is called and a Direct3D device has not been set with cgD3D9SetDevice.
- CGD3D9ERR_NOTMATRIX: Returned when a parameter that is not a matrix type is passed to a function that expects one.
- CGD3D9ERR_NOTLOADED: Returned when a parameter has not been loaded with the expanded interface by cgD3D9LoadProgram.
- CGD3D9ERR_NOTSAMPLER: Returned when a parameter that is not a sampler parameter is passed to a function that expects one.
- CGD3D9ERR_NOTUNIFORM: Returned when a parameter that is not uniform is passed to a function that expects one.
- CGD3D9ERR_NULLVALUE: Returned when a value of zero is passed to a function that requires a non-zero value.
- CGD3D9ERR_OUTOFRANGE: Returned when an array range specified to a function is out of range.
- CGD3D9_INVALID_REG: Returned when a register number is requested for an invalid parameter type. This error is specific to the minimal interface functions and does not trigger an error callback.

Testing for Errors

When a Direct3D runtime function is called that returns an error of type HRESULT, the proper method of testing for success or failure is to use the Win32 macros
97. Table 48 ps_1_x Instruction Set Modifiers (continued)

    Instruction/Register Modifier    Cg Expression
    instr_sat                        saturate(x) (i.e. min(1, max(0, x)))
    reg_bias                         x - 0.5
    1-reg                            1 - x
    -reg                             -x
    reg_bx2                          2 * (x - 0.5)

Language Constructs and Support

Data Types

In the ps_1_x profiles, operations occur on signed clamped floating-point values in the range -MaxPixelShaderValue to MaxPixelShaderValue, where MaxPixelShaderValue is determined by the DirectX implementation. These profiles allow all data types to be used, but all operations are carried out in the above range. Refer to the DirectX pixel shader 1.x documentation for more details.

Statements and Operators

The DirectX pixel shader 1.x profiles support all of the Cg language constructs, with the following exceptions:
- Arbitrary swizzles are not supported (though arbitrary write masks are). Only the following swizzles are allowed:
    .x/.r .y/.g .z/.b .w/.a
    .xy/.rg .xyz/.rgb .xyzw/.rgba
    .xxx/.rrr .yyy/.ggg .zzz/.bbb .www/.aaa
    .xxxx/.rrrr .yyyy/.gggg .zzzz/.bbbb .wwww/.aaaa
- Matrix swizzles are not supported.
- Boolean operators other than <, <=, > and >= are not supported. Furthermore, <, <=, > and >= are only supported as the condition in the ?: operator.
- Bitwise integer operators are not supported.
- / is not supported unless the divisor is a non-zero constant or it is used to compute t
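The modifier expressions in Table 48 and the clamped range described above are plain arithmetic, so they can be mirrored on the CPU for checking expected values. This is a minimal C sketch with hypothetical function names (ps1x_sat, ps1x_bias, ps1x_invert, ps1x_negate, ps1x_bx2, ps1x_clamp); max_value stands in for MaxPixelShaderValue, which the DirectX implementation determines.

```c
#include <assert.h>

/* instr_sat: saturate(x), i.e. min(1, max(0, x)). */
float ps1x_sat(float x)
{
    float lo = x > 0.0f ? x : 0.0f;
    return lo < 1.0f ? lo : 1.0f;
}

/* reg_bias: x - 0.5 */
float ps1x_bias(float x)   { return x - 0.5f; }

/* 1-reg: 1 - x */
float ps1x_invert(float x) { return 1.0f - x; }

/* -reg: -x */
float ps1x_negate(float x) { return -x; }

/* reg_bx2: 2 * (x - 0.5) */
float ps1x_bx2(float x)    { return 2.0f * (x - 0.5f); }

/* Clamp to [-max_value, max_value]; max_value plays the role of
   MaxPixelShaderValue and is an input here, not a known constant. */
float ps1x_clamp(float x, float max_value)
{
    assert(max_value > 0.0f);
    if (x < -max_value) return -max_value;
    if (x >  max_value) return  max_value;
    return x;
}
```

A typical use of reg_bx2 is expanding a value stored in the [0, 1] range of a texture back to the [-1, 1] range: ps1x_bx2(0.75f) gives 0.5.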
98. 1(float4 v, float4 low, float4 high)
    {
        return saturate((v - low) / (high - low));
    }

    float4 remapFrom01(float4 v, float4 low, float4 high)
    {
        return lerp(low, high, v);
    }

Don't forget vectorization here as well. If two float-valued functions have the same domain and range, you can pack them into two texture components of the same texture. Only one texture lookup is needed to load them both, and vectorized versions of the remap can be used to do the remapping more efficiently as well.

5. Use Data Types with Minimum Sufficient Precision

For profiles that support multiple precisions, a general rule of thumb is that if you can do a computation with fixed-precision variables, the computation is faster than if you use half; and if you use half, the computation is faster than if you use float. Although sometimes you need the range and extra precision that half and float offer, you should avoid using them unless necessary.

6. Use the Right Standard Library Routines for Shading Computations

If you're implementing a shading model such as Lambertian, Blinn, or Phong, you'll generally be performing some dot-product routines, clamping negative results to zero, and raising some of the values to a power to compute a specular exponent. There are a few tricks that can speed up this process:
- Be sure to use the dot function when computing dot products.
- If you need t
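The two remapping helpers at the top of this page are inverses of each other for values inside [low, high]. The following scalar C sketch illustrates that; remap_to01, remap_from01, saturate01 and lerp_scalar are hypothetical names chosen here (the name of the first Cg function is cut off in the text above), and lerp is written in the (1 - f) * a + b * f form.

```c
#include <assert.h>

/* Clamp x to [0, 1]. */
float saturate01(float x)
{
    if (x < 0.0f) return 0.0f;
    if (x > 1.0f) return 1.0f;
    return x;
}

/* Linear interpolation between a and b by factor f. */
float lerp_scalar(float a, float b, float f)
{
    return (1.0f - f) * a + b * f;
}

/* Map v from [low, high] to [0, 1], clamping values outside the range. */
float remap_to01(float v, float low, float high)
{
    assert(high != low);  /* avoid division by zero */
    return saturate01((v - low) / (high - low));
}

/* Map v from [0, 1] back to [low, high]. */
float remap_from01(float v, float low, float high)
{
    return lerp_scalar(low, high, v);
}
```

Encoding a function's output with remap_to01 before storing it in a texture, and decoding with remap_from01 after the lookup, keeps the stored values within the [0, 1] range that fixed-range texture formats can represent.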
99. 22 operator enhancements 247 precedence 247 operators arithmetic 20 boolean 21 conditional 22 introduction 18 808 00504 0000 006 swizzle 22 write mask 22 P packed type modifier 230 parameter shadowing 73 parameters modifiable function passing 19 parameters in function definitions syntax 227 pass 117 120 pass state 120 performance techniques abs 324 avoiding matrix transposes 328 computation frequency 327 conditional code in fragment programs 328 datatypes 325 dot 324 min 324 saturate 324 shading computations 326 swizzle 323 texture maps 324 vectorization 321 pixel program defined 3 pixel shader defined 3 position invariance 250 profile arbfpl 263 arbvpl 256 fp20 283 fp30 274 ps_1_1 ps_1_2 ps_1_3 308 ps_2_0 ps_2_x 300 vp20 279 vp30 270 vs_1_1 304 vs20 vs2x 296 profile defined 3 program declaring 5 kinds of inputs 5 program profiles fragment 252 335 NVIDIA Cg Language Toolkit vertex 250 programming model GPU 2 ps 1 x profile 308 ps 2 0 profile 300 ps 2 x profile 300 R ray traced refraction pixel shader code example 172 sample shader 170 vertex shader code example 171 recursion function 19 reflection vector 200 refraction pixel shader code example 207 sample shader 205 vertex shader code example 206 release notes xvi Renderman relation toCg 221 reserved words 249 runtime core Cg 49 S sampler data type 11 sampler type specification 230 samplers
100. 3
tex2Dproj(sampler2D, float4)
tex3D(sampler3D, float3)

Bindings

Table 49 Supported Standard Library Functions (continued)

tex3Dproj(sampler3D, float4)
texCUBE(samplerCUBE, float3)
texCUBEproj(samplerCUBE, float4)

Note: The non-projective texture lookup functions are actually done as projective lookups on the underlying hardware. Because of this, the w component of the texture coordinates passed to these functions from the application or vertex program must contain the value 1.

Texture coordinate parameters for projective texture lookup functions must have swizzles that match the swizzle done by the generated texture addressing instruction. While this may seem burdensome, it is intended to allow ps_1_x profile programs to behave correctly under other pixel shader profiles. The swizzles required on the texture coordinate parameter to the projective texture lookup functions are listed in Table 50.

Table 50 Required Projective Texture Lookup Swizzles

Texture Lookup Function    Texture Coordinate Swizzle
tex1Dproj                  .xw/.ra
tex2Dproj                  .xyw/.rga
texRECTproj                .xyw/.rga
tex3Dproj                  .xyzw/.rgba
texCUBEproj                .xyzw/.rgba

Manual Assignment of Bindings

The Cg compiler can determine bindings between texture units and uniform sampler parameters/texture coordinate inputs automatically. This automatic assi
101. 3 May be used only with uniform inputs with sampler types.

register(c0) to register(c7), C0 to C7: constant register 0 to 7.

Binding Semantics for Varying Input/Output Data

The varying input binding semantics in the ps_1_x profiles are the same as the varying output binding semantics of the vs_1_1 profile. Varying input binding semantics in the ps_1_x profiles consist of COLOR0, COLOR1, TEXCOORD0, TEXCOORD1, TEXCOORD2, and TEXCOORD3. These map to output registers in DirectX vertex shaders.

The valid binding semantics for varying input parameters in the ps_1_x profiles are summarized in Table 52.

Table 52 ps_1_x Varying Input Binding Semantics

Binding Semantics Name                  Corresponding Data
COLOR, COLOR0, COL, COL0                Input color value v0
COLOR1, COL1                            Input color value v1
TEXCOORD0-TEXCOORD3, TEX0-TEX3          Input texture coordinates t0-t3

Additionally, the ps_1_x profiles allow POSITION, FOG, PSIZE, TEXCOORD4, TEXCOORD5, TEXCOORD6, and TEXCOORD7 to be specified on varying inputs, provided these inputs are not referenced. This allows Cg programs to have the same structure specify the varying output of a vs_1_1 profile program and the varying input of a ps_1_x profile program.

The valid binding semantics for varying output parameters in the ps_1_x profile are summarized in Table 53.

Table 53 ps_1_x Varying Output Binding Semantics
102. 33 overloading by profile 226 standard library 33 texture map 38 G geometric functions 38 GL_ARB_vertex 256 global variables 241 graphics hardware evolution of xiii grass sample shader 202 vertex shader code example 202 H half datatype 11 half type specification 229 l if statements 244 inputs uniform 5 varying 5 6 int data type 11 int type specification 229 integral type category 232 interfaces 125 J Java relation to Cg 221 L language profiles conceptof 3 M mathematical functions 33 matrices multiplying 20 matrices support of 12 matrix palette skinning 217 334 808 00504 0000 006 NVIDIA sample shader 217 vertex shader code example 218 matrix transposes and performance 328 melting paint pixel shader code example 163 sample shader 161 vertex shader code example 161 min for performance 324 miscellaneous operators 249 modifiable function parameters passing 19 multipaint pixel shader code example 167 sample shader 165 vertex shader code example 166 namespaces 237 numeric type category 232 O object Cg definition 224 open profile functions 227 OpenGL Cg runtime 73 error reporting 85 OpenGL application 82 parameter setting 74 OpenGL CGerror 85 OpenGL profiles ARB fragment program 263 ARB vertex program 256 NV fragment program 274 NV register combiners 283 NV texture shader 283 NV vertex program 279 NV vertex program 2 0 270 operations expressed differently from C 2
103. 4
MaxTexIndirections=<n>    where n >= 1 (default: infinite)
NumDrawBuffers=<n>        where 1 <= n <= 4 (default: 1)

OpenGL NV_vertex_program 3.0 Profile (vp40)

The vp40 profile is an extended version of the arbvp1 profile. It has all of the capabilities of arbvp1 and the added capability described in this section.

Vertex Texturing

The vp40 profile supports accessing texture maps in programs. Textures are available via the usual sampler types and the tex*() standard library calls.

OpenGL NV_fragment_program 2.0 Profile (fp40)

The fp40 profile is an extended version of the arbfp1 profile. It has all of the capabilities of arbfp1 as well as the added capabilities described in this section.

Branching

The branching support in fp40 allows some if statements and looping constructs to be implemented with branching. In profiles such as fp30, conditional execution of code was always implemented with predicated instructions, and loops were always unrolled. In the GeForce 6800 GPU, there is a cost associated with executing a branch in the fragment shading engine. As such, it is possible that the cost of the branch will outweigh the savings from skipping over a block of conditionally executed code or of executing an unrolled loop. Please refer to the NVIDIA developer Web site for more information
104. computations to be performed in slower, high-precision arithmetic. If the C behavior is desired, the constant should be explicitly typed to force the type promotion:

    halfvar * 2.0f   // is compiled as ((float)halfvar) * 2.0f

Cg uses the following type suffixes for constants:

- f for float
- h for half
- x for fixed

Structures and Member Functions

Cg supports structures the same way C does. Cg adopts the C++ convention of implicitly performing a typedef based on the tag name when a struct is declared:

    struct mystruct { ... };
    mystruct s;   // Define "s" as a "mystruct"

Structures may define member functions in addition to member variables. Member functions provide a convenient way of encapsulating helper functions associated with the data in the structure, or as a means of describing the behavior of a data object. Structure member functions are declared and defined within the body of the structure definition:

    struct Foo {
      float val;
      float helper(float x) {
        return val + x;
      }
    };

Member functions may reference their arguments or the member variables of the structure in which they are defined. The result of referring to a variable outside the scope of the enclosing structure (such as global variables) is undefined; instead, passing such variables as arguments to member functions that need them is recommended. Member functions are invoked using the usual
105. Arithmetic Operators TOM C uis eee RR ERE Chae Aes heehee ERR E SAU d 20 Multiplication FUNCHONSS cascos ett pops aperte a prec Eo Foe ob e uot gd 20 Vector COnStructOFr 3 acoso img RR CER ORR RC p Ra REA RA RR AN 21 Boolean and Comparison Operators 1 1 0 ee 21 Swizzle Operator serani eret eii eng d eyed cole tea Merten ew ate 22 Wite Mask Operdtotiin einander aci au gc a kg E ace toned a aeg cut v a lec bees 22 Conditional Operator a p smpi piede E Vb qal acd Bop eda IR OR ORE US EOD OR d 22 Texture Lookups in Advanced Fragment Profiles llis 23 clc PP m 24 Imc e TP 25 Passes svo yk ROT OAD SEE Gc AEG EHE EEO aS EEUU XI EUER RA ADR KON 26 State ASS MENS cotas s gE SENCER EErEE ia E eR REA e SCR d 26 Parameters and Semantics i s sonrasi a tanir ea ras 27 Vertex and Fragment Programs usada 27 Textures and Samplers 24 3 dog hok RE RES LRG ERG ae EEE OE ORY qus 29 Interfaces and Unsized Arrays s vues pec ur Vows ea RE eee ok C Ree EO we 29 Running Cg Programs On the CPU oca RR e Re Re eR IL Ue ea o RR 30 808 00504 0000 006 NVIDIA Cg Language Toolkit ANNOIN S ua arras d Seb eo dede tede Sed BOE taa ra c qd c po ER dod d ong 32 More DS Tea cep PEE 32 Cg Standard Library Functions 000 ccc e eee 33 Mathematical FUNCOMS ou eee eke CEA Cet ra Ree eee Rl deu RY 33 Geomettic FUNCHONS as qued wade ERE aah EQ Re E PD EO AWE Cp 38 Texture Map FUNCUONS ac cee kbavav even Shee
106. Assignments 141
Table 9 Type Conversions 235
Table 10 Expanded Operators 247
Table 11 Vertex Output Binding Semantics 251
Table 12 Fragment Output Binding Semantics 252
Table 16 arbvp1 Uniform Input Binding Semantics 260
Table 17 arbvp1 Varying Input Binding Semantics 261
Table 18 arbvp1 Varying Output Binding Semantics 261
Table 19 arbfp1 Uniform Input Binding Semantics 265
Table 20 arbfp1 Varying Input Binding Semantics 265
Table 21 arbfp1 Varying Output Binding Semantics 265
Table 22 fp40 Compiler Branching Options 269
Table 23 vp30 Uniform Input Binding Semantics 271
Table 24 vp30 Varying Input Binding Semantics 272
Table 25 vp30 Varying Output Binding Semantics 272
Table 26 fp30 Uniform Input Binding Semantics 275
Table 27 fp30 Varying Input Binding Semantics 275
Table 28 fp30 Varying Output Binding Semantics 276
Table 29 vp20 Uniform Input Binding Semantics 280
Table 30 vp20 Varying Input Binding Semantics 281
Table 31 vp20 Varying Output Binding Semantics 281
Table 32
107. COORDO float3 3 MEPXCIOQEND ILS ARO e cis pares Hoar BE COORD P EON SD des loat N p WaxXCOOROSs a m ales Space Si y float Position 2 POSITION im projection space float4 Normal COLORO in tangent space float4 LightVectorUnsigned COLOR1 in tangent space float3 TexCoord0 TEXCOORDO float3 TexCoordl TEXCOORD1 Hoa icto or EL XCOORDZ in tangent space float4 HalfAngleVector TEXCOORD3 in tangent space v2f main a2v IN uniform float4x4 WorldViewProj uniform float4 LightVector in object space uniform float4 EyePosition in object space v2 OUT pass texture coordinates for fetching the diffuse map OUT TexCoord0 xy IN TexCoord xy pass texture coordinates for fetching the normal map OUT TexCoordl xy IN TexCoord xy compute the 3x3 transform from tangent space to object space float3x3 objToTangentSpace objToTangentSpace 0 IN T objToTangentSpace 1 TEIN S ehe objToTangentSpace 2 TEIN SINE transform normal from 808 00504 0000 006 193 NVIDIA Cg Language Toolkit object space to tangent space OUT Normal xyz 0 5 mul objToTangentSpace IN Normal Oar transform light vector from object space to tangent space float3 lightVectorInTangentSpace mul objToTangentSpace LightVector xyz OUT LightVector xyz lightVectorInTangentSpace OUT LightVectorUnsigned xyz
108. Car Paint 9 pixel shader code example 186 vertex shader code example 184 cfloat type specification 229 Cg brief tutorial 145 defined 1 language introduction 1 necessity for xiv standard library functions 33 Cg compiler cgc exe 329 command line options 329 Cg runtime API specific 72 benefits 44 compiling 46 context creation 46 Direct3D 85 NVIDIA cgD3D9GetLastError 115 CGerror 114 debugging mode 112 error callbacks 116 error testing 115 error types 114 Direct3D cgD3D9EnableDebugTracing 114 Direct3D cgD3D9TranslateHRESULT 116 Direct3D expanded interface 98 cgD3D8LoadProgram 103 cgD3D8SetSamplerState 102 cgD3D9BindProgram 105 cgD3D9EnableParameterShadowing 103 cgD3D9GetDevice 98 cgD3D9GetLatestPixelProfile 105 cgD3D9GetLatestVertexProfile 105 Cg Language Toolkit cgD3D9GetOptimalOptions 105 cgD3D9IsParameterShadowingEnable d 103 cgD3D9IsProgramLoaded 104 cgD3D9LoadProgram 103 cgD3D9SetDevice 98 cgD3D9SetSamplerState 102 cgD3D9SetTexture 102 cgD3D9SetTextureWrapMode 102 cgD3D9SetUniform 100 cgD3D9SetUniformArray 101 cgD3D9SetUniformMatrix 101 cgD3D9SetUniformMatrixArray 10 T cgD3D9UnloadProgam 104 Direct3D 8 application 109 Direct3D 9 application 106 Direct3D device 98 fragment program 106 lost devices 98 parameters 100 array 101 sampler 102 uniform 100 profile support 105 program executiion 103 vertex program 106 Direct3D HRESULT 114 Direct3D
109. Car Paint Q cara neared RC ne RR RR 186 Basic Profile Sample Shaders coooocononc eee 189 AnisotropicEighlfit suo Ss gue cinta wie Aa g AN Rod a B N O E 190 Descriptio eve 2L TP E REOR EAS A LL Ep A ORE EAA dd 190 Vertex Shader Source Code for Anisotropic Lighting o oooo 191 Bump Dot3x2 Diffuse and Specular serrara sek prr tex iiy d eser ad Rage 192 DESCUPUON acer eae ee qe m pp Roo ep or eod mee a dg mee Cic 192 Vertex Shader Source Code for Bump Dot3X2 ssll ee 193 Pixel Shader Source Code for Bump Dot3x2 0 000 cee ee 194 B mp Reflection Mappllii 5 agii mogsa ie ee CAR IS a RR tap Te Te cde AMM alg ea 196 Descrip seres Sen PO me a ee ee Rer Sens par desunt e a 196 808 00504 0000 006 iii NVIDIA Cg Language Toolkit Vertex Shader Source Code for Bump Reflection Mapping 00005 197 Pixel Shader Source Code for Bump and Reflection Mapping 199 o A A AO 200 DESCAPHON mtr 200 Vertex Shader Source Code for Fresnel 0 0 0 0 0c 200 GaSe ce ow EHI ARO oo 202 DESCARTO pra inte edi a eie as sido qui we ie Rosen o 202 Vertex Shader Source Code for Grass liliis 202 Refraction ci exa Rx eR RR ECCE AAA AAA RO OR GC CH IRR 205 BDeSCHBLOTI rsrs arx eos genes RE aede CE n Sa EA Eee dee we d aua 205 Vertex Shader Source Code for Refraction 2 206 Pixel Shader Source Code for Refraction l l eee 207 Shadow MappIDg 32 35 x bred e dun pee
110. D TRACE Activating vertex shader for program 3
cgD3D TRACE Setting shadowed parameters for program 3
cgD3D TRACE Setting registers for uniform parameter ModelViewProj of type float4x4
cgD3D TRACE Setting constant registers 0-3 for parameter ModelViewProj of type float4x4
cgD3D TRACE Activating pixel shader for program 24
cgD3D TRACE Setting shadowed parameters for program 24
cgD3D TRACE Setting texture for sampler parameter BaseTexture
cgD3D TRACE Setting SamplerState 0 D3DTSS_MAGFILTER for sampler parameter BaseTexture
cgD3D TRACE Setting SamplerState 0 D3DTSS_MINFILTER for sampler parameter BaseTexture
cgD3D TRACE Setting SamplerState 0 D3DTSS_MIPFILTER for sampler parameter BaseTexture
cgD3D TRACE Deleting vertex shader for program 3
cgD3D TRACE Deleting pixel shader for program 24

To use the debug DLL:

1. Link your application against cgD3D9d.lib (or cgD3D8d.lib) instead of cgD3D9.lib (or cgD3D8.lib).
2. Make sure that the application can find cgD3D9d.dll (or cgD3D8d.dll).
3. Turn on (and turn off) tracing of portions of your code using cgD3D9EnableDebugTracing():

    void cgD3D9EnableDebugTracing(CGbool enable);

Here is how you would enable debug tracing for part of the application code:

    cgD3D9EnableDebugTracing(CG_TRUE);
111. E E EA To ER ARR rra LR A nre 260 OPUS ba ios o ciao es 262 OpenGL ARB Fragment Program Profile azbf p1 sisse 263 Accessing OpenGL State a oo sem ding er ed a 263 808 00504 0000 006 NVIDIA Cg Language Toolkit MET SUB DOME a aca fered anita dood tin a ra dete aR ale d 263 Resource EImits 5s pia ae 264 Language Constructs and Support es 264 Bindlhgs uiuit pem hehe Shae ahaa Rode Wea Bese aed Rone Seg cb hem ee Stans 265 anc CP ree IM 266 OpenGL NV vertex program 3 0 Profile vb40 leen 267 Vertex Textulitigi uox douce a ia da aed mab RR CU Re Rn 267 OpenGL NV fragment program 2 0 Profile p40 isse 268 sud e PPP RPE CA CREM SERA E eee RRR EE 268 FACE Semantis viele made CaaS Ad A wale EE eed xa 269 OpenGL NV vertex program 2 0 Profile vb30 see 270 Position MATERIE a sms tue dc deti ia dove een d goede Pidgin y bie Ay dak q dd 270 Language CONSHUCES ccc peor ice Roe RC RC Cee A AAA Re Ro 270 Biridilngs suscita dies a a sinus A aR ho ade tested Meese UE EU TIE 271 OpenGL NV fragment program Profile p30 llle 274 Language Constructs and Suppor orig wurst es macetna sos ac ema 9 dled mundo 274 BIN S pei a al Waco ER a BSG 215 Pack and Unpack FUNGOS s a sii cue ceo iix Dee xk a EROR RR RR Ra RC URS 216 OpenGL NV vertex program 1 0 Profile vP20 lise 279 AUCI Werer nse errar eRe 279 Position I nVallaflcBs ir REUS oak A AE Ka 279 Data Types veria REOR E RA A O
112. Enable                         bool    1.0
LightModelLocalViewerEnable         bool    1.0
LightModelTwoSideEnable             bool    1.0
LineSmoothEnable                    bool    1.0
LineStippleEnable                   bool    1.0
LogicOpEnable                       bool    1.0
MultisampleEnable                   bool    1.3 or ARB_multisample
NormalizeEnable                     bool    1.0
PointSmoothEnable                   bool    1.0

Table 7 Enable/Disable States (continued)

Enable/Disable State Name           Type    Requires
PointSpriteEnable                   bool    2.0, ARB_point_sprite, or NV_point_sprite
PolygonOffsetFillEnable             bool    OpenGL 1.1
PolygonOffsetLineEnable             bool    1.1
PolygonOffsetPointEnable            bool    1.1
PolygonSmoothEnable                 bool    1.0
PolygonStippleEnable                bool    1.0
RescaleNormalEnable                 bool    1.2 or EXT_rescale_normal
SampleAlphaToCoverageEnable         bool    1.3 or ARB_multisample
SampleAlphaToOneEnable              bool    1.3 or ARB_multisample
SampleCoverageEnable                bool    1.3 or ARB_multisample
ScissorTestEnable                   bool    1.0
StencilTestEnable                   bool    1.0
TexGenSEnable[ndx]                  bool    1.0; ndx must be greater than or equal to zero and less than the value of GL_MAX_TEXTURE_COORDS
TexGenTEnable[ndx]                  bool    Same as TexGenSEnable
TexGenREnable[ndx]                  bool    Same as TexGenSEnable
TexGenQEnable[ndx]                  bool    Same as TexGenSEnable
Texture1DEnable[ndx]                bool    1.0; ndx must be greater than or equal to zero and less than the value of GL_MAX_TEXTURE_IMAGE_UNITS
Texture2DEnable[ndx]                bool    Same as Texture1DEnable
Texture3DEnable[ndx]
113. FAILED and SUCCEEDED. Simply testing the error against zero or D3D_OK is not sufficient, because there could be more than one success value.

As an added convenience, and for uniformity with the core runtime, the Direct3D runtime also supplies cgD3D9GetLastError(), which is analogous to cgGetLastError() but returns the last Direct3D runtime error of type HRESULT for which the FAILED() macro returns TRUE:

    HRESULT cgD3D9GetLastError();

The last error is always cleared immediately after the call.

The function cgD3D9TranslateHRESULT() converts an error of type HRESULT into a string:

    const char* cgD3D9TranslateHRESULT(HRESULT hr);

This function should be called instead of DXGetErrorDescription9() because it also translates errors that the Cg Direct3D runtime generates.

Using Error Callbacks

Here is an example of a possible error callback that sorts out debug trace errors from core runtime errors and from Direct3D runtime errors:

    void MyErrorCallback() {
      CGerror error = cgGetError();
      if (error == cgD3D9DebugTrace) {
        // This is a debug trace output.
        // A breakpoint could be set here to step from one
        // debug output to the other.
        return;
      }
      char buffer[1024];
      if (error == cgD3D9Failed)
        sprintf(buffer, "A Direct3D error occurred: %s'\n",
                cgD3D9TranslateHRESULT(cgD3D9GetLastError()));
      else
        sprintf(buffer, "A Cg error occurred: '%s'\n", cgD3D9Tra
114. JD s COLORI return Los gt QR T ww 2 suv

    float bar;
    technique NewSimpleFrag {
      pass {
        VertexProgram = NULL;
        FragmentProgram = compile arbfp1 main(2 * bar);
      }
    }

Here, the value 2 * bar is associated with the foo parameter of main(). When the value of bar is changed by the application, the value of foo in main() is set appropriately.

Finally, vertex or fragment programs may be assigned the value NULL in the state assignment. This signifies that no program should be used in this pass.

Textures and Samplers

CgFX makes it possible to define state related to textures in the effect file. The short effect file below shows an example:

    sampler2D samp = sampler_state {
      generateMipMap = true;
      minFilter = LinearMipMapLinear;
      magFilter = Linear;
    };
    float4 texsimple(uniform sampler2D sampler,
                     float2 uv : TEXCOORD0) : COLOR {
      return tex2D(sampler, uv);
    }
    technique TextureSimple {
      pass {
        FragmentProgram = compile arbfp1 texsimple(samp);
      }
    }

Interfaces and Unsized Arrays

CgFX also supports Cg's interfaces and unsized arrays features. Given an effect file with Cg programs that use these features, the compile statement can be used in two different ways to resolve the interfaces and unsized arrays so that the program can be compiled. Consider the following example: a Light interface has been defined, with SpotLight implementing t
115. L Profile Support

A convenient function is provided that gives the best available profile for vertex or fragment programs, depending on the available OpenGL extensions:

    CGprofile cgGLGetLatestProfile(CGGLenum profileType);

Parameter profileType is equal to CG_GL_VERTEX or CG_GL_FRAGMENT.

Function cgGLGetLatestProfile() may be used in conjunction with cgCreateProgram() or cgCreateProgramFromFile() to ensure that the best available vertex and fragment profiles are used for compilation. This allows you to make your application future-ready, because the Cg programs are automatically compiled for the best profiles that are available at runtime, even if these profiles did not exist at the time the application was written.

Another function that helps you achieve optimal compilation is cgGLSetOptimalOptions(). It sets implicit compiler arguments that are appended to the argument list passed to cgCreateProgram() or cgCreateProgramFromFile():

    void cgGLSetOptimalOptions(CGprofile profile);

OpenGL Program Execution

All programs must be loaded before they can be bound. To load a program, use cgGLLoadProgram():

    void cgGLLoadProgram(CGprogram program);

Binding a program only works if its profile is enabled. This is done by calling cgGLEnableProfile() with the program profile:

    void cgGLEnableProfile(CGprofile profile);

The binding itself is done using cgGLBindProgram():

    void
116. If no call to cgCreateProgram() has been made for the context, cgGetLastListing() returns zero. Otherwise, it returns a string containing the output you would typically get from the command-line version of the compiler.

Program Attributes

To retrieve the context the program belongs to, use cgGetProgramContext():

    CGcontext cgGetProgramContext(CGprogram program);

Retrieving the profile the program has been compiled to is done with cgGetProgramProfile():

    CGprofile cgGetProgramProfile(CGprogram program);

The function pair cgGetProfile() and cgGetProfileString() allows you to find the correspondence between a profile enumerant and its corresponding string:

    CGprofile cgGetProfile(const char* profileString);
    const char* cgGetProfileString(CGprofile profile);

If the string passed to cgGetProfile() does not correspond to any profile, CG_PROFILE_UNKNOWN is returned.

The function cgGetProgramString() retrieves various strings related to the program, depending on the value of the enumerant stringType:

    const char* cgGetProgramString(CGprogram program, CGenum stringType);

The variable stringType can have any of these values:

- CG_PROGRAM_SOURCE: The original Cg source program is returned.
- CG_PROGRAM_ENTRY: The main entry point of the Cg source program is returned.
- CG_PROGRAM_PROFILE: The profile string is returned.
- CG_COMPILED_PROGRAM: The resulting compiled program is returned.

Core Cg Parameters

Cg par
117. MirrorClamp MirrorClampToEdge MirrorClampToBorder OpenGL 1 2 or EXT texture3D for WrapR 1 2 or EXT_texture_edge_clamp for ClampToEdge 1 3 or ARB_texture_border_clamp for ClampToBorder 1 4 ARB_texture_mirrored_repeat OF IBM_texture_mirrored_repeat for MirroredRepeat EXT_texture_mirror_clamp or ATI_texture_mirror_once for MirrorClamp Or MirrorClampToEdge EXT texture mirror clamp for MirrorClampToBorder 808 00504 0000 006 NVIDIA 14 Cg Language Toolkit Table 8 sampler_state State Assignments continued Name Type Valid Values Requires BorderColor float4 OpenGL 1 0 CompareMode int None 1 4 or ARB_ shadow CompareRToTexture CompareFunc int Never Less LEqual 1 40rARB shadow 1 5 or Equal Greater EXT shadow funcs for Never Less NotEqual GEqual Equal Greater NotEqual Of Always Always DepthMode int Alpha Intensity 1 40r ARB depth texture Luminance GenerateMipMa bool 1 4 or SGIS generate mipmap P LODBias float 1 4 MinFilter int Nearest Linear 1 0 LinearMipMapNearest NearestMipMapNearest NearestMipMapLinear LinearMipMapLinear MagFilter int Nearest Linear 1 0 MaxMipLevel float 1 20r EXT texture lod MaxAnisotropy float EXT texture filter anisotropic MinMipLevel float 1 2 or EXT texture lod Texture texture Reference to texture parameter OpenGL State Not Specifiable with State Assignments By design state assi
118. NV_texture_shader and NV_register_combiners Instruction Set Modifiers 285
Table 33 Supported Standard Library Functions 286
Table 34 Required Projective Texture Lookup Swizzles 288
Table 35 fp20 Uniform Binding Semantics 289
Table 36 fp20 Varying Input Binding Semantics 289
Table 37 fp20 Varying Output Binding Semantics 290
Table 38 fp20 Auxiliary Texture Functions 291
Table 39 vs_2 Uniform Input Binding Semantics 298
Table 40 vs_2 Varying Input Binding Semantics 298
Table 41 vs_2 Varying Output Binding Semantics 299
Table 42 ps_2 Uniform Input Binding Semantics 302
Table 43 ps_2 Varying Input Binding Semantics 302
Table 44 ps_2 Varying Output Binding Semantics 302
Table 45 vs_1_1 Uniform Input Binding Semantics 306
Table 46 vs_1_1 Varying Input Binding Semantics 306
Table 47 vs_1_1 Varying Output Binding Semantics 307
Table 48 ps_1_x Instruction Set Modifiers 309
Table 49 Supported Standard Library Functions 311
Table 50 Required Projective Texture Lookup Swizzles
119. ON Tangent space VIEW distance attenuation O view dElbexeuE eean CEN tanV z viewP w 808 00504 0000 006 185 NVIDIA Cg Language Toolkit Vi NI EWTANG O tange O oia O norma O fresn return ine mal AL O normalize View normalize View normalize View FresnelApprox Tangent 0 Tangent 1 Tangent 2 Pixel Shader Source Code for Car Paint 9 column column 0 il Sala 2 This shader is based on the Time Machine temporal rust shader Car paint data was measured by Cornell University from samples provided by Ford Motor Company EN SPSS MO MA float4 HPosition POSITION coord position in window float2 uv EXCOORDO wavy fleckmap coords clones Lale ame EXCOORD1 light pos tangent space float4 halfangle EXCOORD2 Blinn halfangle floats reflection TEXCOORDS ARE vector per vertex float4 view EXCOORD4 view tangent space float3 tangent EXCOORD5 view tangent matrix float3 binormal EXCOORD6 float3 normal EXCOORD7 float fresn COLORO y PIXEL SHADER float4 main VS_OUTPUT vert uniform sampler2D WavyMap register s0 uniform samplerCUBE EnvironmentMap register s1 uniform sampler2D PaintMap register s2 uniform sampler2D FleckMap register s3 uniform float Ambient COLOR NEWPAINTSPEC UNUSED S
120. OSITION position elijo spaca float4 TexCoords TEXCOORDO base ST coordinates float3 OPosition TEXCOORD1 position obj space float3 Normal TEXCOORD2 normal eye space float3 VPosition TEXCOORD3 view pos obj space iloet3 7 TEXCOORD4 tangent obj space loe s 18 TEXCOORD5 binormal obj space floats STI TEXCOORD6 normal obj space float4 LightVecO SER IDEE 2 ASIE Dre cie elos sees MultiPaintV2F main appin IN uniform float4x4 ModelViewProj uniform float4x4 ModelViewIT uniform float4x4 ModelViewl uniform float4 TexRepeats uniform float4 LightVec eye space MultiPaintV2F OUT OUT HPosition mul ModelViewProj IN Position pass through object space position OUD OPosit Lone ENE Os On 2727 transform normal to eye space OUT Normal normalize mul ModelViewIT IN Normal xyz OUT TexCoords IN UV TexRepeats pass through object space normal tangent binormal 166 808 00504 0000 006 NVIDIA Advanced Profile Sample Shaders OUT N normalize IN Normal xyz QUE 1 IN Temejsat o xS TP OU Pee EN AB no a gt transform view pos origin to obj space OUT VPosition mul ModelViewI float4 0 0 0 1 xyz transform light vector to obj space OUT LightVecO mul ModelViewI LightVec return OUT Pixel Shader Source Code for MultiPaint T define WHITE half4 1 0h 1 0h 1 0h 1 0h
121. OpenGL is a trademark of SGI Other company and product names may be trademarks of the respective companies with which they are associated Updates Any changes additions or corrections will be posted at the NVIDIA Cg Web site http developer nvidia com Cg Refer to this site often to keep up on the latest changes and additions to the Cg language Copyright 2002 2005 NVIDIA Corporation All rights reserved NVIDIA NVIDIA Corporation 2701 San Tomas Expressway Santa Clara CA 95050 www nvidia com Foreword asia aaa aa xiii Preface iaa a o AAA CN a xv Release Notes ies se ERREUR Keene E RUE NEQOE BI xvi Online Updates a RN xvi Introduction to the Cg Language 545 eode ru Rx A KR RI RR Ra ad E 1 Th Cg Language is creed ih Ee P TU Rob Paco REPRE RP E QUEE Rob dtd 2 Cg s Programming Model for GPUs soci const 0 0000 2 Cg Language PMOTIES ecards 3 Declaring Programs IMCO esep inie Rag a a A qo adobe 5 Program Inputs and OUtpUls s s e a e n oboe Roe E ER Gees 5 Working With Data 3 2 2x gioid e ga E Rex g s EORR E dnbie do ao 11 Basie D ta TYPES cea cds id ia oia Sande 11 TYPE CONVETSIONE souci orangia tee eR IRR it mc IRIS ART o WUE RAO EAO Ree 12 Structures and Member FUNCOMS coi o dae eS 13 AMS P PIERDE 14 Statements and OPEO S sapa xac ones kasd ee dra bbc obe qq a Py Roe darus 18 CONTEOLEIOW asi ss eee LAER REE A Ca CR Ra OR e nd 19 Function Definitions and Function Overloading lille 19
122. PEC POWER GLOSSINESS FLECK SPEC POWER float4 NewPaintSpec OW 648 08 Does Bo je float3 ClearCoat 099 0 59 vi Oy dibaie Teo float3 FleckColor 1 S 1 05 soe is float3 WavyScale eeu Une E VU Ps 186 NVIDIA 808 00504 0000 006 Advanced Profile Sample Shaders Tangent space LIGHT vector float3 L normalize vert light Tangent space HALF ANGLE vector float3 H normalize vert halfangle xyz Tangent space VIEW vector float3 V normalize vert view xyz float v_dist vert view w Tangent space WAVY_NORMAL float3 wavyN float3 tex2D WavyMap vert uv 2 1 wavyN normalize wavyN WavyScale PAINT A normal map map could be loaded here instead if we wanted more detail In this case we have a uniform tangent space normal 0 0 1 llore ig ol jl Mas mE elote mella kozy float3 paint color float3 tex2D PaintMap tlosciez um cl 1 mel 1m p SPECULAR POWER use a saturated diffuse term to clamp the backlighting n_d_h saturate n_d_1 4 pow n_d_h NewPaintSpec y REFLECTION ENVIRONMENT Reflect view vector about wavy normal and bring to view space float3 R reflect V wavyN R R x vert tangent R y vert binormal R z vert normal float3 reflect_color float3 texCUBE EnvironmentMap R FLECKS Load random 3 vector flecks from fleck map Reduce tiling artifact
123. PLEVEL D3DTSS_MAXANISOTROPY

Parameter value is a value appropriate for the corresponding type. Here is an example of how to use this function:

    cgD3D8SetTextureStageState(parameter, D3DTSS_MAGFILTER, D3DTEXF_LINEAR);

The texture wrap mode is set using:

    HRESULT cgD3D9SetTextureWrapMode(CGparameter parameter, DWORD value);

The input value is either zero or a combination of D3DWRAP_U, D3DWRAP_V, and D3DWRAP_W. Here is an example of how to use this function:

    cgD3D9SetTextureWrapMode(parameter, D3DWRAP_U | D3DWRAP_V);

Parameter Shadowing

Parameter shadowing can be enabled or disabled on a per-program basis:

- When loading the program (see "Expanded Interface Program Execution" on page 103)
- At any time using

    HRESULT cgD3D9EnableParameterShadowing(CGprogram program, CGbool enable);

  for which enable should be set to CG_TRUE to enable parameter shadowing and to CG_FALSE to disable it.

To know if parameter shadowing is enabled for a given program, use:

    CGbool cgD3D9IsParameterShadowingEnabled(CGprogram program);

This function returns CG_TRUE if parameter shadowing is enabled for program.

Expanded Interface Program Execution

To load a program in Direct3D 9, use cgD3D9LoadProgram():

    HRESULT cgD3D9LoadProgram(CGprogram program,
                              CGbool parameterShadowingEnabled,
                              DWORD assembleFlags);

This function assembles the result of the compilation
124. Plane float 4 Same as ndx TexGenSEyePlane TexGenQObjectPlane float4 Same as ndx TexGenSEyePlane TexturelD ndx sampler1D OpenGL 1 0 ndx must be greater or equal to zero and less than the value of GL MAX TEXTURE IMAGE UNITS Texture2D ndx sampler2D Same as TexturelD Texture3D ndx sampler3D 1 2 or EXT texture3D ndx must be greater or equal to zero and less than the value of GL MAX TEXTURE IMAGE UNITS TextureRectangle ndx samplerRECT ARB texture rectangle EXT texture rectangle Apple or NV texture rectangle ndx must be greater or equal to zero and less than the value of GL MAX TEXTURE IMAGE UNITS 808 00504 0000 006 NVIDIA 137 Cg Language Toolkit Table 6 CgFX OpenGL State Manager States continued State Name Type Valid Enumerants Requires TextureCubeMap ndx TextureEnvColor ndx samplerCUBE float4 1 3 ARB_texture_cube_map or EXT_texture_cube_ map ndx must be greater or equal to zero and less than the value of GL MAX TEXTURE IMAGE UNITS OpenGL 1 0 ndx must be greater or equal to zero and less than the value of GL MAX TEXTURE UNITS TextureEnvMode ndx int Modulate Decal Blend Replace Add 1 0 1 3 ARB texture env add Oor EXT texture env addfor Add ndx must be greater or equal to zero and less than the value of GL MAX TEXTURE UNITS VertexEnvParameter ndx float4 ARB vertex program ndx must be greate
125. PointSize float 1 0 PointSizeMin float 1 4 ARB point parameters or EXT point parameters 134 808 00504 0000 006 NVIDIA Introduction to CgFX Table 6 CgFX OpenGL State Manager States continued State Name Type Valid Enumerants Requires PointSizeMax float OpenGL 1 4 ARB point parameters or EXT point parameters PointSpriteCoordOrigin int LowerLeft 2 0 UpperLeft PointSpriteCoordReplace bool 2 0 ARB point sprite ndx Or NV point sprite ndx must be greater than or equal to zero and less than the value of GL MAX TEXTURE COORDS PointSpriteRMode int Zero R S NV point sprite PolygonMode int2 Front Back 1 0 FrontAndBack Point Line Fill PolygonOffset float2 1 1 ProjectionMatrix float4x4 1 0 Scissor int4 1 0 ShadeModel int Flat Smooth 1 0 StencilFunc int3 Never Less 1 0 LEqual Equal Greater NotEqual GEqual Always StencilMask int 1 0 Stencilop int3 Keep Zero 1 0 Replace Incr Decr Invert IncrWrap DecrWrap StencilFuncSeparate int4 Front Back 2 0 or FrontAndBack Never Less LEqual Equal Greater NotEqual GEqual Always EXT stencil two side 808 00504 0000 006 NVIDIA 135 Cg Language Toolkit Table 6 CgFX OpenGL State Manager States continued State Name Type Valid Enumerants Requires StencilMaskSeparate int2 Front Back OpenGL 2 0 or FrontAndBack EXT stencil two side StencilOpSepara
126. RESULT cgD3D9SetUniformMatrixArray(CGparameter parameter, DWORD startIndex, DWORD numberOfElements, const D3DMATRIX* matrices);
The parameters startIndex and numberOfElements have the same meanings as for cgD3D9SetUniformMatrix. The upper-left portion of each matrix of the array matrices is extracted to fit the size of the element of the array parameter parameter. Array matrices is assumed to have numberOfElements elements.
Setting Sampler Parameters
You assign a Direct3D texture to a sampler parameter using:
    HRESULT cgD3D9SetTexture(CGparameter parameter, IDirect3DBaseTexture9* texture);
To set the sampler state in the Direct3D 9 Cg runtime, use:
    HRESULT cgD3D9SetSamplerState(CGparameter parameter, D3DSAMPLERSTATETYPE type, DWORD value);
Parameter type is any of the D3DSAMPLERSTATETYPE enumerants, and parameter value is a value appropriate for the corresponding type. Here is an example of how to use this function:
    cgD3D9SetSamplerState(parameter, D3DSAMP_MAGFILTER, D3DTEXF_LINEAR);
To set the texture stage state in the Direct3D 8 Cg runtime, use:
    HRESULT cgD3D8SetTextureStageState(CGparameter parameter, D3DTEXTURESTAGESTATETYPE type, DWORD value);
Parameter type must be one of the following values: D3DTSS_ADDRESSU, D3DTSS_ADDRESSV, D3DTSS_ADDRESSW, D3DTSS_BORDERCOLOR, D3DTSS_MAGFILTER, D3DTSS_MINFILTER, D3DTSS_MIPFILTER, D3DTSS_MIPMAPLODBIAS, D3DTSS_MAXMI
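The "upper-left portion" rule stated in this entry for cgD3D9SetUniformMatrixArray can be pictured with a short sketch. This is only an illustration under stated assumptions: plain arrays of 16 floats in row-major order stand in for D3DMATRIX, and the helper names extract_upper_left and extract_upper_left_array are hypothetical, not part of the Cg runtime.

```c
#include <assert.h>

/* Hypothetical helper: copy the upper-left rows x cols block of one
   4x4 matrix (16 floats, row-major) into a tightly packed destination. */
void extract_upper_left(const float *src, int rows, int cols, float *dst)
{
    assert(rows >= 1 && rows <= 4 && cols >= 1 && cols <= 4);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            dst[r * cols + c] = src[r * 4 + c];
}

/* Hypothetical helper: apply the same extraction to `count` consecutive
   4x4 matrices, as one might for an array parameter whose elements are
   smaller than 4x4. */
void extract_upper_left_array(const float *matrices, int count,
                              int rows, int cols, float *dst)
{
    for (int i = 0; i < count; ++i)
        extract_upper_left(matrices + 16 * i, rows, cols,
                           dst + rows * cols * i);
}
```

For a float3x3 array element, rows and cols would both be 3, so the fourth row and fourth column of each source matrix are simply dropped.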
127. RGUERCROROR AR RA 279 Sp ae 280 OpenGL NV texture shader and NV register combiners Profile fp20 283 OVeIVIGW c expe AA al Me Ade eed Na eU tap EOE oh ER 283 sca cd 283 uno C 284 Language Constr cts arid SUpDOFt sspe re ur Pewee a Parr 285 Standard Library EURCEIORS ci ace ee eat RR op BOR RC RR a RU RO Re ee OR 286 Sp aee C 288 Auxiliary Texture FubcHorls xut sie edet de ime grind UIS ap aro RR pci AE e OR 290 Examples s sacs m RR a PEER KL Na cm UE Mac RON 295 DirectX Vertex Shader 2 x Profiles vs 2 se II 296 Or 296 Memory eus dunk PA eee RO ER EPEN G ADEE PET CRE GE ae ee 296 Statements and Operators 2 vrs rar wie ee DRE AES LEE EASE dd 297 Data TYPES wise cece ee cee ecb OO ERORROEGUPR CRAY AR CRESS CREE OES RRS 297 USIN GLASS ace trud ports eder a cs O O E edad andes a 297 BIMGIINGS ac aces ia ea id da Rara Bonn de Sons oa mae eR 298 E ceeded Resa 299 DirectX Pixel Shader 2 x Profiles ps 2 0 cee eee eee 300 A E peddle tata soo ide eats A Sorde 300 Language Constructs and SUPPO xu ose tera a ads 301 e amimga Eo EORR E SOROR Se mead A aud de meti a ed Do 302 ORT OMS ccn dre biti Rodin ea en tao harta do aed 303 vi 808 00504 0000 006 NVIDIA Limitations inthis Implementation pan le 303 DirectX Vertex Shader 1 1 Profile vs 1 1 s RR 304 Memory RestriCLIOns
128. ResourceToDeclUsage 90 cgD3D8ValidateVertexDeclaration 88 cgD3D9ResourceToDeclUsage 90 cgD3D9ValidateVertexDeclaration 88 Direct3D 8 application 95 Direct3D 9 application 92 fragment program 92 type retrieval 91 vertex declaration 85 vertex declaration for Direct3D 8 86 vertex declaration for Direct3D 9 86 vertex program 91 Direct3D debug DLL using 113 DirectX pixel shader 1 x profiles 308 DirectX pixel shader 2 x profile 300 DirectX vertex shader 1 1 profile 304 Cg Language Toolkit DirectX vertex shader 2 x profile 296 dot for performance 324 dx8ps profile deprecated 308 E effect 117 Effect parameter 118 effect parameters 121 evaluating Cg programs 127 explicit casts compile time 235 numeric 236 numeric matrix 236 numeric vector 236 F fixed datatype 11 fixed type specification 229 float data type 11 float type specification 229 floating type category 232 for statements 244 fp20 profile 283 fp30 profile 274 fragment profiles texture lookups 23 fragment program 121 predefined output structures 42 varying output 9 fragment program profiles 252 OpenGL ARB 263 OpenGL NV fragment program 274 fragment program defined 3 fresnel 200 sample shader 200 vertex shader code example 200 function calls 228 multiplying 20 open profile 227 function definitions introduction 19 function overloading 240 introduction 19 functions debugging 41 declaring 226 derivative 41 geometric 38 mathematical
129. Specular and diffuse lighting are computed per vertex in a Cg program along with a view depth parameter which is computed using the view vector surface normal and the depth of the thin film on the surface of the object The view depth is then perturbed in an ad hoc manner per fragment by the underlying decal texture and is then used to lookup into a 1D texture containing the precomputed destructive interference for red green blue wavelengths given a particular view depth This interference value is then used to modulate the specular lighting component of the standard lighting equation Fig 11 Example of Thin Film Effect Vertex Shader Source Code for Thin Film Effect define inputs from application JEJEUIG E UE ElOat4 Position e POSITION 180 808 00504 0000 006 NVIDIA Advanced Profile Sample Shaders float3 Normal NORMAL y define outputs from vertex shader Siew VAE float4 HPOS POSITION tloei4 crece COLOR OF float specCol 8 COMLORIL float2 filmDepth TEXCOORDO y v2f main a2v IN uniform float4x4 WorldViewProj uniform float4x4 WorldViewIT uniform float4x4 WorldView uniform float4 LightVector uniform float4 FilmDepth uniform float4 EyeVector WE QUUD transform position to clip space OUT HPOS mul WorldViewProj IN Position float4 tempnorm float4 IN Normal 0 0 transform normal from model space to view spac float3 normalVec mul WorldViewIT
130. TYPE FLOAT3 D3DDECLMETHOD_DEFAULT cgD3D9ResourceToDeclUsage cgGetParameterResource position cgGetParameterResourceIndex position LO 3 Sizcor alo D3DDECLTYPE D3DCOLOR D3DDECLMETHOD DEFAULT cgD3D9ResourceToDeclUsage cgGetParameterResource color cgGetParameterResourceIndex color i 4 8 sizcor loe D3DDECLTYPE FLOAT2 D3DDECLMETHOD DEFAULT cgD3D9ResourceToDeclUsage cgGetParameterResource texCoord cgGetParameterResourceIndex texCoord D3DD3CL END y DWORD declaration D3DVSD_STREAM 0 D3DVSD_REG cgD3D8ResourceToInputRegister cgGetParameterResource position D3DVSDT_FLOAT3 D3DVSD REG cgD3D8ResourceToInputRegister 90 808 00504 0000 006 NVIDIA Introduction to the Cg Runtime Library cgGetParameterResource color D3DVSDT_D3DCOLOR D3DVSD_STREAM 1 D3DVSD_SKIP 4 D3DVSD REG cgD3D8ResourceToInputRegister cgGetParameterResource texCoord D3DVSDT FLOAT2 D3DVSD END y The size specified as the second argument of the D3DVSD_REG macro call of a Direct3D 8 declaration does not need to match the size of the corresponding parameter for the vertex declaration to be valid Those sizes are specified to describe how the data is laid out in the streams not to perform any type checking with the shader code The data referr
131. Table 6 CgFX OpenGL State Manager States continued State Name Type Valid Enumerants Requires BlendFuncSeparate int4 Zero One OpenGL 1 4 or rgb_src DestColor EXT blend func separate rgb dst OneMinusDestColor 1 4 or NV blend square a src SrcAlpha for SrcColor or a_dst OneMinusSrcAlpha OneMinusSrcColor for DstAlpha rgb_src and DstColor or OneMinusDstAlpha OneMinusDstColor for SrcAlphaSaturate rgb_dst SrcColor OneMinusSrcColor ConstantColor OneMinusConstantColor ConstantAlpha OneMinusConstantAlpha BlendEquation int FuncAdd 1 4 or ARB_imaging or FuncSubtract Min EXT blend subtract for Max LogicOp FuncSubtract Or FuncReverseSubtract Or EXT blend minmax for Min Or Max or EXT_blend_logic_op for LogicOp BlendEquationSeparate int2 rgb FuncAdd EXT_blend_equation_ alpha FuncSubtract Min separate or 1 4 Max LogicOp ARB_imaging Or EXT_blend_subtract for FuncSubtract or FuncReverseSubtract Ol 1 4 ARB_imaging or EXT_blend_minmax for Min Or Max or EXT_blend_logic_op for LogicOp BlendColor float4 1 4 ARB_imaging or EXT blend color ClearColor float4 1 0 ClearStencil int 1 0 ClearDepth float 1 0 808 00504 0000 006 NVIDIA 131 Cg Language Toolkit Table 6 CgFX OpenGL State Manager States continued State Name Type Valid Enumerants Requires ClipPlane ndx float4 OpenGL 1 0 ndx must be greater than or equal to zero
132. The runtime gives you the option of modifying the values of your program parameters. The first step is to get a handle to the parameter:
    CGparameter myParameter = cgGetNamedParameter(program, "myParameter");
The string "myParameter" is the name of the parameter as it appears in the program source code. The second step is to set the parameter value. The function used depends on the parameter type. Here is an example in OpenGL:
    cgGLSetParameter4fv(myParameter, value);
Here is the same example in Direct3D:
    cgD3D9SetUniform(myParameter, value);
Numeric parameters may also be set using core Cg runtime calls such as:
    cgSetParameterValuefr(myParameter, 4, value);
These function calls assign the four floating-point values contained in the array value to the parameter myParameter, which is assumed to be of type float4. In both APIs there are variants of these calls to set matrices, arrays, textures, and texture states. The core Cg runtime provides variants of these calls to set the value of numeric parameters, including scalars, vectors, arrays, and structures. The graphics API-specific runtimes must be used to set API-specific values such as sampler handles.
Executing a Program
Before you can execute a program in OpenGL, you must enable its corresponding profile:
    cgGLEnableProfile(CG_PROFILE_ARBVP1);
In Direct3D, nothing explicitly needs to be done to
133. X greater than
Unlike C, Cg allows all boolean operators to be applied to vectors, in which case boolean operations are performed in an elementwise fashion. The result of such a boolean expression is a vector of bool elements with the same number of elements as the two source vectors. Also unlike C, the logical AND (&&) and logical OR (||) operators cannot be used for short-circuiting evaluation: side effects of both sides of these expressions always occur, regardless of the value of the boolean expression.
Swizzle Operator
Cg has a swizzle operator (.) that allows the components of a vector to be rearranged to form a new vector. The new vector need not be the same size as the original vector: elements can be repeated or omitted. The characters x, y, z, and w represent the first, second, third, and fourth components of the original vector, respectively. The characters r, g, b, and a can be used for the same purpose. Because the swizzle operator is implemented efficiently in the GPU hardware, its use is usually free. The following are some examples of swizzling:
    float3(a, b, c).zyx      yields float3(c, b, a)
    float4(a, b, c, d).xxyy  yields float4(a, a, b, b)
    float2(a, b).yyxx        yields float4(b, b, a, a)
    float4(a, b, c, d).w     yields d
The swizzle operator can also be used to create a vector from a scalar:
    a.xxxx  yields float4(a, a, a, a)
The precedence of th
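The swizzle rules in this entry (components may be repeated or omitted, x/y/z/w and r/g/b/a name components 0 to 3, and a scalar behaves like a one-component vector) can be modelled on the CPU. The following is a minimal sketch in plain C, not Cg and not anything the compiler or GPU actually does; the function names swizzle_index and swizzle are hypothetical, and the sketch does not enforce the rule that the two character sets may not be mixed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map one swizzle character to a component index, or -1 if invalid.
   Both character sets (xyzw and rgba) name components 0..3. */
int swizzle_index(char c)
{
    switch (c) {
    case 'x': case 'r': return 0;
    case 'y': case 'g': return 1;
    case 'z': case 'b': return 2;
    case 'w': case 'a': return 3;
    default:            return -1;
    }
}

/* Apply `pattern` (1 to 4 characters) to a source vector of `src_len`
   components. Writes strlen(pattern) components to `dst` and returns
   that count, or returns 0 (leaving dst untouched) if the pattern is
   invalid or names a component the source does not have. */
size_t swizzle(const float *src, size_t src_len,
               const char *pattern, float *dst)
{
    size_t n = strlen(pattern);
    assert(src_len >= 1 && src_len <= 4);
    if (n < 1 || n > 4)
        return 0;
    /* Validate the whole pattern first so dst is untouched on failure. */
    for (size_t i = 0; i < n; ++i) {
        int idx = swizzle_index(pattern[i]);
        if (idx < 0 || (size_t)idx >= src_len)
            return 0;
    }
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[swizzle_index(pattern[i])];
    return n;
}
```

With a source of {1, 2, 3}, the pattern "zyx" produces {3, 2, 1}; with a one-component source {5}, the pattern "xxxx" produces {5, 5, 5, 5}, mirroring the scalar-to-vector case.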
134. _shader instruction combinations.
texCUBE_reflect_dp3x3(uniform samplerCUBE tex, float4 strq, float4 intermediate_coord1, float4 intermediate_coord2, float4 prevlookup)
Performs the following:
    float3 E = float3(intermediate_coord2.w, intermediate_coord1.w, strq.w);
    float3 N = float3(dot(intermediate_coord1.xyz, prevlookup.xyz), dot(intermediate_coord2.xyz, prevlookup.xyz), dot(strq.xyz, prevlookup.xyz));
    return texCUBE(tex, 2 * dot(N, E) / dot(N, N) * N - E);
where strq are texture coordinates associated with sampler tex, prevlookup is the result of a previous texture operation, intermediate_coord1 are texture coordinates associated with the (n-2) texture unit, and intermediate_coord2 are texture coordinates associated with the (n-1) texture unit. This function can be used to generate the "dot product reflect cube map eye from qs" NV_texture_shader instruction combination.
Table 38. fp20 Auxiliary Texture Functions (continued)
Texture Function / Description
texCUBE_reflect_eye_dp3x3(uniform samplerCUBE tex, float3 str, float4 intermediate_coord1, float4 intermediate_coord2, float4 prevlookup, uniform float3 eye)
Performs the following:
    float3 N = float3(dot(intermediate_coord1.xyz, prevlookup.xyz), dot(intermediate_coord2.xyz, prevlookup.xyz), dot(coords.xyz, prevlookup.xyz));
    return texCUBE(tex, 2 * dot(N, E) / dot(N
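The lookup direction used by texCUBE_reflect_dp3x3 in this entry is R = 2 * dot(N, E) / dot(N, N) * N - E, that is, E reflected about N. Dividing by dot(N, N) means N does not have to be normalized first. A minimal CPU-side sketch of that arithmetic follows; the vec3 type and the function names dot3 and reflect_about_normal are hypothetical stand-ins, not Cg or NV_texture_shader API.

```c
#include <assert.h>

/* Hypothetical three-component vector type for the sketch. */
typedef struct { float x, y, z; } vec3;

float dot3(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* R = 2 * dot(N, E) / dot(N, N) * N - E.
   N must be non-zero but need not be unit length. */
vec3 reflect_about_normal(vec3 n, vec3 e)
{
    float nn = dot3(n, n);
    assert(nn > 0.0f);
    float s = 2.0f * dot3(n, e) / nn;
    vec3 r = { s * n.x - e.x, s * n.y - e.y, s * n.z - e.z };
    return r;
}
```

For N = (0, 0, 2) and E = (1, 0, 1) the result is (-1, 0, 1): the component along N is preserved and the tangential component is negated, even though N has length 2.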
135. age Specification correct precision and range but is not required to produce bit exact results It is recommended that compilers provide an option either to forbid these optimizations or to guarantee that they are made in bit exact fashion Operator Precedence Cg uses the same operator precedence as C for operators that are common between the two languages The swizzle and write mask operators have the same precedence as the structure member operator and the array index operator 1 Operator Enhancements The standard C arithmetic operators unary are extended to support vectors and matrices Sizes of vectors and matrices must be appropriately matched according to standard mathematical rules Scalar to vector promotion see Smearing of Scalars to Vectors on page 237 allows relaxation of these rules Table 10 Expanded Operators Operator Description M n m Matrix with n rows and mcolumns V n Vector with n elements V n gt V n M n gt M n Unary vector negate Unary matrix negate vin V n gt V n Componentwise V n V n gt V n Componentwise Componentwise V n V n V n V n Vin gt V n Componentwise vin V n gt V n Componentwise M n m M n m gt M n m Componentwise M n m M n m gt M n m Componentwise M n m M n m
136. agment programs. Fragment programs are required to declare and set a vector output that uses the COLOR semantic. This value is usually used by the hardware as the final color of the fragment. Some fragment profiles also support the DEPTH output semantic, which allows the depth value of the fragment to be modified, and some support additional color outputs for hardware that supports multiple render targets (MRTs). As with vertex programs, fragment programs may return their outputs in the body of a structure. However, it is usually more convenient to either declare outputs as out parameters:
    void main(..., out float4 color : COLOR, out float depth : DEPTH) { ... color = diffuseColor * ...; depth = ...; }
or to associate a semantic with the return value of the shader:
    float4 main(...) : COLOR { ... return diffuseColor * ...; }
The following example shows a simple vertex program that calculates diffuse and specular lighting. Two structures for varying data, appin and vertout, are also declared. Don't worry about understanding exactly what the program is doing; the goal is simply to give you an idea of what Cg code looks like. "A Brief Tutorial" on page 145 explains this shader in detail.
    // Define inputs from application
    struct appin { float4 Position : POSITION; float4 Normal : NORMAL; };
    // Define outputs fr
137. alues are propagated do not appear as lvalues within any kind of control statement if for or while or construct Profiles may choose to support more general constant propagation techniques but such support is not required Q Profiles may optionally support fully general for and while loops New Vector Operators These new operators are defined for vector types Q Vector construction operator lt typeID gt This operator builds a vector from multiple scalars or shorter vectors float4 scalar scalar scalar scalar float4 float3 scalar Q Matrix construction operator typeID 244 808 00504 0000 006 NVIDIA Appendix A Cg Language Specification This operator builds a matrix from multiple rows Each row may be specified either as multiple scalars or as any combination of scalars and vectors with the appropriate size float3x3 1 2 3 4 5 6 7 8 9 float3x3 float3 float3 float3 float3x3 1 float2 float3 float3 1 1 1 Q Swizzle operator a b xxyz A swizzle operator exampl Atleast one swizzle character must follow the operator There are two sets of swizzle characters and they may not be mixed Set one is xyzw 0123 and set two is rgba 0123 The vector swizzle operator may only be applied to vectors or to scalars Applying the vector swizzle operator to a scalar gives the same result as applying the operator to a vector of length one Thus myscalar xxx and
138. ameterArray3d(CGparameter parameter, long startIndex, long numberOfElements, double *array);
    cgGLGetParameterArray4f(CGparameter parameter, long startIndex, long numberOfElements, float *array);
    cgGLGetParameterArray4d(CGparameter parameter, long startIndex, long numberOfElements, double *array);
Similar functions exist to set the values of arrays of uniform matrix parameters:
    void cgGLSetMatrixParameterArrayfr(CGparameter parameter, long startIndex, long numberOfElements, const float *array);
    void cgGLSetMatrixParameterArrayfc(CGparameter parameter, long startIndex, long numberOfElements, const float *array);
    void cgGLSetMatrixParameterArraydr(CGparameter parameter, long startIndex, long numberOfElements, const double *array);
    void cgGLSetMatrixParameterArraydc(CGparameter parameter, long startIndex, long numberOfElements, const double *array);
and to query those values:
    void cgGLGetMatrixParameterArrayfr(CGparameter parameter, long startIndex, long numberOfElements, float *array);
    void cgGLGetMatrixParameterArrayfc(CGparameter parameter, long startIndex, long numberOfElements, float *array);
    void cgGLGetMatrixParameterArraydr(CGparameter parameter, long startIndex, long numberOfElements, double *array);
    void cgGLGetMatrixParameterArraydc(CGparameter parameter, long startIndex, long numberOfElements, double *array);
The c and r suffixes have
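In the Cg OpenGL runtime, the r suffix on these matrix calls selects row-major order for the flat array and the c suffix selects column-major order. The difference is only in how the same matrix is laid out in memory, which the following sketch illustrates in plain C. The function name row_major_to_column_major is hypothetical and the code does not call the Cg runtime.

```c
#include <assert.h>

/* Re-lay a rows x cols matrix stored row-major (in[r * cols + c])
   into column-major storage (out[c * rows + r]). */
void row_major_to_column_major(const float *in, int rows, int cols,
                               float *out)
{
    assert(rows > 0 && cols > 0);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            out[c * rows + r] = in[r * cols + c];
}
```

A 2x3 matrix with rows (1, 2, 3) and (4, 5, 6) is {1, 2, 3, 4, 5, 6} in row-major order and {1, 4, 2, 5, 3, 6} in column-major order. Passing one layout to a call that expects the other would set the transpose of the intended matrix.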
139. ameters fall into three broad categories: program parameters, effect parameters, and shared parameters.
Program parameters are associated with Cg programs. A parameter that is declared as part of the program's entry point belongs to the program's namespace. A parameter that is declared globally in the file scope of the Cg program belongs to the program's global namespace.
Effect parameters are associated with Cg Effects. See the "Introduction to CgFX" chapter for more information on managing effect parameters.
Shared parameters are associated with Cg contexts. See "Shared Parameters" on page 59 for more details.
Cg functions exist for retrieving, creating, and querying program parameters.
Program Parameter Retrieval
Parameters associated with Cg programs may be retrieved iteratively or directly.
Iteration
A program has a sequence of parameters that can be iterated over by using cgGetFirstParameter() and cgGetNextParameter():
    CGparameter cgGetFirstParameter(CGprogram program, CGenum namespace);
    CGparameter cgGetNextParameter(CGparameter parameter);
A call to cgGetFirstParameter() returns the first parameter of the sequence. If the program is invalid or does not contain any parameter, the call returns zero. Given a parameter, cgGetNextParameter() returns the parameter immediately next in the sequence, or zero if there is none. The namespace
140. and CLP0-CLP5 to be present as binding semantics on a member of a varying input data structure, provided the member with this binding semantics is not referenced. This allows Cg programs to have the same structure specify the varying output of a vp30 profile program and the varying input of an fp30 profile program.
Table 27. fp30 Varying Input Binding Semantics
    Binding Semantics Name                    Corresponding Data
    COLOR0, COL0                              Input color (float4)
    COLOR1, COL1                              Input color1 (float4)
    TEXCOORD0-TEXCOORD7, TEX0-TEX7            Input texture coordinates (float4)
    WPOS                                      Window position coordinates (float4)
The valid binding semantics for varying output parameters in the fp30 profile are summarized in Table 28.
Table 28. fp30 Varying Output Binding Semantics
    Binding Semantics Name                    Corresponding Data
    COLOR, COLOR0, COL                        Output color (float4)
    DEPTH, DEPR                               Output depth (float)
Pack and Unpack Functions
The fp30 profile provides a number of functions for packing multiple floating-point values into a single 32-bit result. Corresponding unpacking functions are also provided. These functions map directly to the packing and unpacking instructions defined by the NV_fragment_program OpenGL extension.
pack_2half
    float pack_2half(float2 a)
    float pack_2half(half2 a)
Converts the components of a into a pair of 16-bit f
141. are associated with a CGcontext. They may be created with the following entry points:
    CGparameter cgCreateParameter(CGcontext ctx, CGtype type);
    CGparameter cgCreateParameterArray(CGcontext ctx, CGtype type, int length);
    CGparameter cgCreateParameterMultiDimArray(CGcontext ctx, CGtype type, int dim, const int *lengths);
Only parameters of concrete types may be created. In particular, parameters of abstract interface types may not be created. By default, a created parameter has uniform variability and undefined values.
Shared Parameter Deletion
Shared parameters may be deleted using:
    void cgDeleteParameter(CGparameter param);
When a shared parameter is deleted, all parameters connected to it are disconnected, and vice versa.
Connecting Parameters
Once created, a shared parameter may be connected to any number of program, effect, or shared parameters using:
    void cgConnectParameter(CGparameter source, CGparameter sink);
where source is the shared parameter and sink is the target parameter that will inherit the shared parameter's values. Once a parameter has had a source connected to it, its value should no longer be set directly. Instead, its value can be set indirectly by setting the value of the associated source. A parameter that has been connected to a shared source parameter may be disconnected using:
    void cgDisconnectParameter(CGparameter param);
Shared Parameters and Interfaces
Using Cg, it is possible to create families of code modules that share a common inte
142. argument of cgGetFirstParameter() specifies the name space of the parameters returned by this function and subsequent calls to cgGetNextParameter(). Every parameter belongs to a particular name space that defines its scope. When CG_GLOBAL is specified, the program's global parameters (i.e., those parameters that are in the file scope of the program's entry point) are iterated over. When CG_PROGRAM is specified, the parameters specified in the program's entry point declaration are iterated over. Here is how those two functions would typically be used, given a valid program called program:
    CGparameter parameter = cgGetFirstParameter(program, CG_PROGRAM);
    while (parameter != 0) {
        /* Here is the code that handles the parameter */
        parameter = cgGetNextParameter(parameter);
    }
These functions don't provide access to the fields of a structure parameter (type CG_STRUCT) or the elements of an array parameter (type CG_ARRAY). In other words, if a struct or array parameter is declared, these entry points will return a handle to the struct or array itself. One way to access the fields of a structure is to use cgGetFirstStructParameter() along with cgGetNextParameter():
    CGparameter cgGetFirstStructParameter(CGparameter parameter);
If parameter is not of type CG_STRUCT, cgGetFirstStructParameter() returns zero. Similarly, to get access to the elements of an ar
143. ariable may only be used by passing it to another function as an in parameter. Assignment to sampler variables is not permitted, and sampler expressions are not permitted.
The following sampler types are always defined: sampler, sampler1D, sampler2D, sampler3D, samplerCUBE, and samplerRECT. The base sampler type may be used in any context in which a more specific sampler type is valid. However, a sampler variable must be used in a consistent way throughout the program. For example, it cannot be used in place of both a sampler1D and a sampler2D in the same program.
Fragment profiles are required to fully support the sampler, sampler1D, sampler2D, sampler3D, and samplerCUBE data types. Fragment profiles are required to provide partial support (see "Partial Support of Types" on page 231) for the samplerRECT data type and may optionally provide full support for this data type. Vertex profiles are required to provide partial support for the six sampler data types and may optionally provide full support for these data types.
An array type is a collection of one or more elements of the same type. An array variable has a single index. Some array types may be optionally designated as packed, using the packed type modifier. The storage format of a packed type may be different from the storage format of the corresponding unpacked type. The storage format of packed types is implementation dependent but must be consistent for any particular combinatio
144. ariables are initialized with the same value, but the variables are not aliased thereafter. Output aliasing is illegal, but implementations are not required to detect it. If the compiler does not issue an error on a program that aliases output binding semantics, the results are undefined.
Restrictions on Semantics Within a Structure
For a particular profile, it is illegal to mix input binding semantics and output binding semantics within a particular struct. That is, for a particular top-level function, a struct must be either input-only or output-only. Likewise, a struct must consist exclusively of uniform inputs or exclusively of non-uniform inputs. It is illegal to use binding semantics to mix the two within a single struct.
Additional Details for Binding Semantics
The following rules are somewhat redundant but provide extra clarity:
- Semantics names are case-insensitive.
- Semantics attached to parameters to non-main functions are ignored.
- Input semantics may be aliased by multiple variables.
- Output semantics may not be aliased.
How Programs Receive and Return Data
A program is just a non-static function that has been designated as the main entry point at compilation time. The varying inputs to the program come from this top-level function's varying in parameters. The uniform inputs to the program come from the top-level function's uniform in parameters and from any non-static global variables that are referenced by the
145. as main must be declared as uniform. A structure that implements a particular interface may be used wherever its interface type is expected. For example:
    float3 myfunc(Light light) { float3 result = light.illuminate(...); ... }
    float4 main(uniform SpotLight spot) { float3 color = myfunc(spot); ... }
Here the SpotLight variable spot may be used as a generic Light in the call to myfunc() because SpotLight implements the Light interface.
It is possible to declare a local variable of an interface type. However, a concrete structure must be assigned to that variable before any of the interface's methods may be called. For example:
    Light mylight;
    SpotLight spot;
    float3 color;
    /* initialize spot */
    color = mylight.illuminate(...);  // ERROR
    mylight = spot;
    color = mylight.illuminate(...);  // OK
Under all current profiles, the concrete implementation of all interface method calls must be resolvable at compile time. There is no dynamic run-time determination of which implementation to call under any current profile. See the interfaces_ogl example included in the Cg distribution for an example of the use of interfaces.
Notes and Caveats
The following limitations may be addressed in future releases:
- There is no inheritance per se in Cg; a structure may not inherit from another structure.
- Structures may only implement a single interface.
- Interfa
146. as Rr eee die eR RR RU ORE crar eds 121 Textures and Samplers iu s xa vox CREDO x bete dA eats URN UR Be RR IR 123 Interfaces and Unsized ATTAYS lt lt sies es ep kom Rem oh RR E heh a 125 Evaluating Cg Programs using the Virtual Machine llle 127 ANOTACIONES aa ara ped 128 OpenGL State socio a A RA A A A EORR 129 OpenGL Sampler State us crop e a e A 141 OpenGL State Not Specifiable with State Assignments o oo ooooooooo 142 ABrief Tutorial 02 0 0 cece ee 145 Loading the WORKS PaCS sit a sig ginau go at AR aeg g a ae bet bom Rd 145 Understanding SIMP O air AAA 146 Program Listing Tor SIMPE C conos rar ei Meee ARR REA ERU 147 Definitions for Structures with Varying Data 0 0 0 0 cece es 148 Passing ArQUMENES 4 s cepe ee pp PCR rar eee oe Ide 149 ii 808 00504 0000 006 NVIDIA Basic Transformations aaa rre 149 Prepare TOR LATINAS ue aerobic sedis ese ah ok cae 150 Calculating the Vertex Colom 3 aac cepe ra etek ERU PRR Rd Ra RR gen 151 Further Experimentation ssa east rui BSG SIRE TES XR EN RAE 152 Advanced Profile Sample Shaders le eeeeeler nnne 153 Improved SKIMMING sos aciei gaa oa de pai s 154 A as g n pi etus ae eave tesi pants Rud lese gi S 154 Vertex Shader Source Code for Improved Skinning 000000 eae 155 Improved Water ego e ob m ee bb eed AA 157 DESCNPUIOM sis qom cos o ERE P XUL ee TARE Re Hd x 157 Vertex Shader Source Code for Impr
147. ate multiple materials without switching shaders splitting your model or resorting to multiple passes Uses for MultiPaint might include complex armor built of inlaid metals woods and stones all modeled on a single simple poly mesh buildings composed of multiple types of stone glass and metal expressed as simple cubes cloth with inlaid metallic threads or as in this demo metal partially covered with peeling paint Using multiple BRDFs is common in the offline world but rarely optimized instead two different shaders may be evaluated and their results blended using a mask texture or chained through if statements For maximum real time performance MultiPaint instead integrates all of the key parts of the BRDFs as multiple painted textures so that only one pass through the shader is required to create the mixed appearance This permits a single pass shader containing diffuse specular and environmental lighting effects in a compact fast executing package Fig 8 Example of MultiPaint 808 00504 0000 006 165 NVIDIA Cg Language Toolkit Vertex Shader Source Code for MultiPaint define inputs from vertex buffer struct appin float4 Position 2 IPOS IW IONS float4 UV TO LEXCOORDIO float4 Tangent ee LEEXCOOR DAN float4 Binormal ESO EID float4 Normal 2 TEXCOORD 3 y output same struct is the input to cg multipaint cg spice Milicia qd float4 HPosition 3 P
148. ay Traced Refraction
Fig 10 Example of Skin
Fig 11 Example of Thin Film Effect
Fig 12 Example of Car Paint 9
Fig 13 Example of Anisotropic Lighting
Fig 14 Example of Bump Dot3x2 Diffuse and Specular
Fig 15 Example of Bump Reflection Mapping
Fig 16 Example of Fresnel
Fig 17 Example of Grass
Fig 18 Example of Refraction
Fig 19 Example of Shadow Mapping
Fig 20 Example of Shadow Volume Extrusion
Fig 21 Example of Sine Wave
Fig 22 Example of Matrix Palette Skinning

List of Tables
Table 1 Mathematical Functions
Table 2 Geometric Functions
Table 3 Texture Map Functions
Table 4 Derivative Functions
Table 5 Debugging Function
Table 6 CgFX OpenGL State Manager States
Table 7 Enable/Disable States
Table 8 sampler_state State
149. ber of the source can be converted to the target.
ii. Not allowed if target is larger than source. Warning issued if target is smaller than source.
iii. Only allowed if source and target are the same total size.
iv. Only allowed if both source and target have the same number of members and each member of the source can be converted to the corresponding member of the target.

Explicit casts are:
- Compile-time type when applied to expressions of compile-time type.
- Numeric type when applied to expressions of numeric or compile-time type.
- Numeric vector type when applied to another vector type of the same number of elements.
- Numeric matrix type when applied to another matrix type of the same number of rows and columns.

Type Equivalency

Type T1 is equivalent to type T2 if any of the following are true:
- T2 is equivalent to T1.
- T1 and T2 are the same scalar, vector, or structure type. A packed array type is not equivalent to the same size unpacked array.
- T1 is a typedef name of T2.
- T1 and T2 are arrays of equivalent types with the same number of elements.
- The unqualified types of T1 and T2 are equivalent, and both types have the same qualifications.
- T1 and T2 are functions with equivalent return types, the same number of parameters, and all corresponding parameters are pair-wise equivalent.

Type Promotion Rules

The cfloat and cint types behave like
150. ble array. The digit in the name of those functions indicates how many scalar values are set by the function. The v suffix is for functions that operate on an array of values as opposed to individual arguments. If more values are set than the parameter requires, the extra values are ignored. If fewer values are set than the parameter requires, the last value is smeared. The cgGLSetParameter functions may be called for either uniform or varying parameters. When called for a varying parameter, the appropriate immediate-mode OpenGL entry point is called.

The corresponding parameter value retrieval functions are as follows:

    cgGLGetParameter1f(CGparameter parameter, float* array);
    cgGLGetParameter1d(CGparameter parameter, double* array);
    cgGLGetParameter2f(CGparameter parameter, float* array);
    cgGLGetParameter2d(CGparameter parameter, double* array);
    cgGLGetParameter3f(CGparameter parameter, float* array);
    cgGLGetParameter3d(CGparameter parameter, double* array);
    cgGLGetParameter4f(CGparameter parameter, float* array);
    cgGLGetParameter4d(CGparameter parameter, double* array);

Setting Uniform Matrix Parameters

The cgGLSetMatrixParameter functions are used to set any matrix:

    void cgGLSetMatrixParameterfr(CGparameter parameter, const float* matrix);
    void cgGLSetMatrixParameterfc(CGparameter parameter, const float* matrix);
    void cgGLSetMatrixParameterdr(CGparameter parameter, const double* matrix);
    void cgGLSetMatrixParameterdc(CGpa
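The ignore-extra and smear-last rule described above can be pictured with a small C sketch. The helper `smear_values` is hypothetical and is not part of the Cg runtime; it only mimics how a parameter needing `required` scalars could be filled from `given` supplied values, and the Cg library's own implementation may differ.

```c
#include <stddef.h>

/* Hypothetical helper (NOT a Cg runtime function): fill a parameter that
 * needs `required` scalars from `given` supplied values. Extra values are
 * ignored; if too few are supplied, the last one is repeated ("smeared"). */
void smear_values(float *dst, size_t required, const float *src, size_t given)
{
    if (given == 0)
        return;                       /* nothing to copy or smear */
    for (size_t i = 0; i < required; ++i) {
        size_t j = (i < given) ? i : given - 1;
        dst[i] = src[j];
    }
}
```

Under this sketch, supplying two values (1, 2) to a four-component parameter would yield (1, 2, 2, 2), while supplying three values to a two-component parameter would drop the third.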
151. blic register reinterpret_cast return row_major sampler sampler_state sampler1D sampler2D sampler3D samplerCUBE shared short signed sizeof static static_cast string struct switch technique template texture texture1D texture2D texture3D textureCUBE textureRECT this throw true try typedef typeid typename uniform union unsigned using vector vertexfragment vertexshader virtual void volatile while __identifier (two underscores before identifier)

Cg Standard Library Functions

Cg provides a set of built-in functions and predefined structures with binding semantics to simplify GPU programming. These functions are discussed in Cg Standard Library Functions.

Vertex Program Profiles

A few features of the Cg language that are specific to vertex program profiles are required to be implemented in the same manner for all vertex program profiles.

Mandatory Computation of Position Output

Vertex program profiles may (and typically do) require that the program compute a position output. This homogeneous clip-space position is used by the hardware rasterizer and must be stored in a program output with an output binding semantic of POSITION (or HPOS for backward compatibility).

Position Invariance

In many graphics APIs, the user can choose between two different approaches to specifying per-vertex computations: use a built-in configurable fixed fu
152. bool | OpenGL 1.2 or EXT_texture3D. ndx must be greater than or equal to zero and less than the value of GL_MAX_TEXTURE_IMAGE_UNITS.

Table 7 Enable/Disable States (continued)

    State Name | Type | Requires
    TextureRectangleEnable[ndx] | bool | ARB_texture_rectangle, EXT_texture_rectangle (Apple), or NV_texture_rectangle. ndx must be greater than or equal to zero and less than the value of GL_MAX_TEXTURE_IMAGE_UNITS.
    TextureCubeMapEnable[ndx] | bool | OpenGL 1.3, ARB_texture_cube_map, or EXT_texture_cube_map. ndx must be greater than or equal to zero and less than the value of GL_MAX_TEXTURE_IMAGE_UNITS.

OpenGL Sampler State

The following table lists the state assignments available in sampler_state blocks when using the CgFX OpenGL state manager. Any state values given are set when the cgSetSamplerState routine is called with the CGparameter handle for a particular sampler. Note that some of these states are defined in OpenGL extensions; for example, MirrorClampToBorder is defined in the EXT_texture_mirror_clamp extension. Any state that is based on an extension not supported by the current OpenGL context is ignored by the CgFX runtime.

Table 8 sampler_state State Assignments

    Name | Type | Valid Values | Requires
    WrapS, WrapT, WrapR | int | Repeat, Clamp, ClampToEdge, ClampToBorder, MirroredRepeat
153. bout the contents of a Cg file.

Cg also includes built-in vector data types that are based on the basic data types. A sample of these built-in vector data types includes (but is not limited to) the following:

    float4 float3 float2 float1
    bool4  bool3  bool2  bool1

Additional support is provided for matrices of up to four-by-four elements. Here are some examples of matrix declarations:

    float1x1 matrix1; // One-element matrix
    float2x3 matrix2; // Two-by-three matrix (six elements)
    float4x2 matrix3; // Four-by-two matrix (eight elements)
    float4x4 matrix4; // Four-by-four matrix (sixteen elements)

Note that the multi-dimensional array float M[4][4] is not type-equivalent to the matrix float4x4 M. There are no unions or bit fields in Cg at present.

Type Conversions

Type conversions in Cg work largely as they do in C. Type conversions may be explicitly specified using the C (newtype) cast operator. Cg automatically performs type promotion in mixed-type expressions, just as C does. For example, the expression floatvar * halfvar is compiled as floatvar * (float)halfvar.

Cg uses different type-promotion rules than C does in one case: a constant without an explicit type suffix does not cause type promotion. Cg compiles the expression halfvar * 2.0 as halfvar * (half)2.0. In contrast, C would compile it as (double)halfvar * 2.0. Cg uses different rules than C to minimize inadvertent type promotions that cause
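The C side of this contrast can be checked directly. The sketch below is plain C, not Cg, and uses float in place of half because C has no half type; the function names are illustrative only. It shows that an unsuffixed constant such as 2.0 promotes the product to double, which is the behavior Cg deliberately avoids.

```c
#include <stddef.h>

/* In C, the unsuffixed constant 2.0 has type double, so a float operand
 * is promoted and the whole product has type double. */
size_t size_of_unsuffixed_product(void)
{
    float f = 1.5f;
    return sizeof(f * 2.0);    /* type of the expression is double */
}

/* With an f suffix the constant is a float, so no promotion to double
 * takes place and the product keeps type float. */
size_t size_of_suffixed_product(void)
{
    float f = 1.5f;
    return sizeof(f * 2.0f);   /* type of the expression is float */
}
```

In Cg, by contrast, the constant takes on the type of the other operand, so a half expression stays in half precision.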
154. by a large grid of vertices because of the free rotation, but switching to wireframe or increasing the frustum angle makes it apparent that the vertices are a static mesh, with the height, normal, and texture coordinates being calculated on the fly based on the direction and height of the viewer. This technique allows for very GPU-friendly water animations because the static mesh can be precomputed. The vertices are displaced using sine waves, and in this example a loop is used to sum five sine waves to achieve realistic effects.

Fig 6 Example of Improved Water

Vertex Shader Source Code for Improved Water

    struct app2vert {
        float4 Position : POSITION;
    };

    struct vert2frag {
        float4 Position  : POSITION;
        float4 TexCoord0 : TEXCOORD0;
        float4 TexCoord1 : TEXCOORD1;
        float4 Color0    : COLOR0;
        float4 Color1    : COLOR1;
    };

    void calcWave(out float disp, out float2 normal,
                  float dampening, float3 viewPosition,
                  float waveTime, float height,
                  float frequency, float2 waveDirection)
    {
        float distance1 = dot(viewPosition.xy, waveDirection);
        distance1 = frequency * distance1 + waveTime;
        disp = height * sin(distance1) / dampening;
        normal = -cos(distance1) * height * frequency *
                 (waveDirection.xy) / (.4 * dampening);
    }

    vert2frag main(app2vert IN,
                   uniform float4x4 ModelViewProj,
                   uniform float4x4 ModelView,
                   uniform float4x4 ModelViewIT,
                   uni
155.         FragmentProgram = compile arbfp1 main(.2f);
        }
    }

    technique AsmFrag {
        pass {
            FragmentProgram = asm {
                !!FP1.0
                TEX o[COLR], f[TEX0], TEX0, 2D;
                END
            };
        }
    }

Compile statements are generally the most commonly used of these three options for specifying programs. They take the profile that the program is to be compiled to (fp30, fp40, arbfp1, vp20, and so on), the name of the function in the effect file to be compiled, and a list of expressions (.2 in the above example). These expressions have a one-to-one correspondence with the uniform parameters of the program being compiled; there must be exactly one for each uniform program parameter. In the example above, the expression .2 sets the value of the foo parameter to main. Because it is using a literal value, CgFX is able to compile the shader into a particularly efficient version that just includes returning the uv value.

Inline assembly is given with the asm keyword, with the assembly language code between braces, as in the example above. CgFX depends on having the appropriate header at the start of the assembly (!!FP1.0 for fp30, !!ARBvp1.0 for arbvp1, and so on) to determine which assembly profile the code is given in.

It is also possible to include effect parameters in the expression used in the compile statement. For example:

    float4 main(uniform float foo, float4 uv : TEXCOORD
156. called FragmentProgram.cg:

    void FragmentProgram(in float4 color    : COLOR0,
                         in float4 texCoord : TEXCOORD0,
                         out float4 colorO  : COLOR0,
                         const uniform sampler2D BaseTexture,
                         const uniform float4 SomeColor)
    {
        colorO = color * tex2D(BaseTexture, texCoord) + SomeColor;
    }

OpenGL Application

This C code links the previous vertex and fragment programs to the application.

    #include <cg/cg.h>
    #include <cg/cgGL.h>

    float* vertexPositions;  // Initialized somewhere else
    float* vertexColors;     // Initialized somewhere else
    float* vertexTexCoords;  // Initialized somewhere else
    GLuint texture;          // Initialized somewhere else
    float constantColor[];   // Initialized somewhere else

    CGcontext context;
    CGprogram vertexProgram, fragmentProgram;
    CGprofile vertexProfile, fragmentProfile;
    CGparameter position, color, texCoord, baseTexture, someColor,
                modelViewMatrix;

    // Called at initialization
    void CgGLInit()
    {
        // Create context
        context = cgCreateContext();

        // Initialize profiles and compiler options
        vertexProfile = cgGLGetLatestProfile(CG_GL_VERTEX);
        cgGLSetOptimalOptions(vertexProfile);
        fragmentProfile = cgGLGetLatestProfile(CG_GL_FRAGMENT);
        cgGLSetOptimalOptions(fragmentProfile);

        // Create the vertex program
        vertexProgram = cgCreateProgramFromFile(
            context, CG_SOURCE, "VertexProgram.cg",
            vertexProfile, "V
157. can be retrieved by calling cgGLGetTextureEnum (see the following discussion).

The second step consists of enabling the texture unit associated with the sampler parameter for a specific drawing call. It is strongly recommended that applications allow the Cg OpenGL runtime library to perform this second step itself. This is accomplished by calling

    void cgGLSetManageTextureParameters(CGcontext context, CGbool enable);

with enable set to a non-zero value after the Cg context has been created. When automatic texture parameter management is in effect, the Cg OpenGL runtime automatically enables all appropriate texture units when a CGprogram is bound.

If, despite the above, you wish to manage texture parameters yourself, you can use the helper function

    void cgGLEnableTextureParameter(CGparameter parameter);

which must be called after cgGLSetTextureParameter and before the actual drawing call. The equivalent disabling function is

    void cgGLDisableTextureParameter(CGparameter parameter);

You can retrieve the texture object assigned to a sampler parameter using

    GLuint cgGLGetTextureParameter(CGparameter parameter);

You can retrieve the OpenGL enumerant for the texture unit associated with a sampler parameter using

    GLenum cgGLGetTextureEnum(CGparameter parameter);

The returned enumerant has the form GL_TEXTURE<n>_ARB, where <n> is the texture unit index.

OpenG
158. ces cannot be extended or combined. Although there is no structure inheritance, it is possible to define a default implementation of a particular interface method. The default implementation can be defined as a global function, and structures that implement that interface may then call this default method via a wrapper.

Note also that interface and structure parameters of top-level functions such as main may be connected to structures that are created in the runtime. See the Cg runtime documentation for more details.

Statements and Operators

Cg supports the following types of statements and operators:
- Control flow
- Function definitions and function overloads
- Arithmetic operators from C
- Multiplication function
- Vector constructor
- Boolean and comparison operators
- Swizzle operator
- Write mask operator
- Conditional operator

Control Flow

Cg uses the following C control constructs:
- Function calls and the return statement
- if/else
- while
- for

These control constructs require that their conditional expressions be of type bool. Because Cg expressions like i < 3 are of type bool, this change from C is normally not apparent.

Profiles like vs_2_x, vp30, and vp40 support branch instructions, so for and while loops are fully supported in these profiles. In other profiles, for and while loops may only be us
159. cgGLBindProgram(CGprogram program);

Only one vertex program and one fragment program can be bound at any given time, so binding a program implicitly unbinds any other program of that type.

Profiles are disabled using cgGLDisableProfile:

    void cgGLDisableProfile(CGprofile profile);

Some profiles may not be supported on some systems. For example, a given profile is not supported if the OpenGL extensions it requires are not available. You can check whether a profile is supported by using cgGLIsProfileSupported:

    CGbool cgGLIsProfileSupported(CGprofile profile);

It returns CG_TRUE if profile is supported and CG_FALSE otherwise.

OpenGL Program Examples

This section presents code that illustrates how to use functions from the OpenGL Cg interface to make Cg programs work with OpenGL. The vertex and fragment programs below are used in OpenGL Application.

OpenGL Vertex Program

The following Cg code is assumed to be in a file called VertexProgram.cg:

    void VertexProgram(in float4 position   : POSITION,
                       in float4 color      : COLOR0,
                       in float4 texCoord   : TEXCOORD0,
                       out float4 positionO : POSITION,
                       out float4 colorO    : COLOR0,
                       out float4 texCoordO : TEXCOORD0,
                       const uniform float4x4 ModelViewMatrix)
    {
        positionO = mul(position, ModelViewMatrix);
        colorO = color;
        texCoordO = texCoord;
    }

OpenGL Fragment Program

The following Cg code is assumed to be in a file
160. chniques. Validation fails, for instance, if a technique includes a compile state assignment that references a profile that isn't supported on the current graphics hardware. Similarly, validation fails if the technique includes a state assignment that uses an unsupported OpenGL extension.

Effects are commonly written such that the application can iterate over the given techniques in order and then choose the first technique that passes validation to apply the effect. For this reason, techniques are usually given in order of decreasing quality.

The code below iterates through the techniques in a CGeffect, in turn attempting to validate each of them and printing an error for the ones that fail.

    CGtechnique technique = cgGetFirstTechnique(effect);
    while (technique) {
        if (cgValidateTechnique(technique) == CG_FALSE)
            fprintf(stderr, "Technique %s did not validate. Skipping.\n",
                    cgGetTechniqueName(technique));
        technique = cgGetNextTechnique(technique);
    }

The function cgIsTechniqueValidated can be used to check whether the given technique has been validated.

Note that any Cg programs referenced in a technique are not compiled until the technique is validated. This makes it possible to modify the uncompiled program by connecting concrete shared structs to interface effect parameters, marking uniforms as literals, changing the program's profile, and so on.

Passes and Pass State

The heart of CgFX is applyin
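The "first technique that validates wins" selection policy described above can be sketched without the Cg runtime. The `Technique` struct and `first_valid_technique` function below are hypothetical stand-ins, not Cg types or API calls; in a real application the `validates` flag would come from the runtime's validation result for each technique.

```c
#include <stddef.h>

/* Hypothetical stand-in for a CgFX technique (NOT a Cg runtime type):
 * a name plus a flag saying whether validation succeeds on this system. */
typedef struct {
    const char *name;
    int validates;   /* nonzero if the technique passes validation */
} Technique;

/* Techniques are assumed to be listed in order of decreasing quality.
 * Return the first one that validates, or NULL if none does. */
const Technique *first_valid_technique(const Technique *list, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (list[i].validates)
            return &list[i];
    }
    return NULL;
}
```

Because the list is ordered by decreasing quality, returning at the first success gives the best technique the current hardware supports; a NULL result means the effect cannot be applied at all.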
161. compiler has to add some that correspond to literal constant values in the code.

A parameter's variability can also be modified via the core Cg runtime, using

    void cgSetParameterVariability(CGparameter parameter, CGenum vary);

Here vary may be one of:
- CG_UNIFORM: The parameter is set to uniform variability.
- CG_LITERAL: The parameter is marked as a literal, whose value can be assumed to be a compile-time constant during compilation. This feature can be used to bake parameter values into the compiled Cg program, which often produces much more efficient compiled code.
- CG_DEFAULT: The parameter reverts to its default variability, as specified in the program text, or is made to inherit its variability from any source it has been connected to.

Note that parameters may not currently be set to CG_VARYING variability.

To obtain the parameter direction, use cgGetParameterDirection:

    CGenum cgGetParameterDirection(CGparameter parameter);

It returns CG_IN if the parameter is an input parameter, CG_OUT if the parameter is an output parameter, or CG_INOUT if the parameter is both an input and an output parameter.

The entry point cgGetParameterName retrieves the parameter name:

    const char* cgGetParameterName(CGparameter parameter);

Use cgGetParameterSemantic to retrieve the parameter semantic string:

    const char* cgGetParameterSemantic(CGparameter parameter);

If the parameter does not have any semantic, an empty string is
162. ction set and machine architecture limit programmability in these profiles compared to what is allowed by Cg constructs. Thus, these profiles place additional restrictions on what can and cannot be done in a Cg program.

The main differences between these profiles from the Cg perspective are that additional texture addressing operations are exposed in ps_1_2 and ps_1_3, and that the depth value output is made available in a limited form in ps_1_3.

Operations in the DirectX pixel shader 1.x profiles can be categorized as texture addressing operations and arithmetic operations. Texture addressing operations are operations that generate texture addressing instructions; arithmetic operations are operations that generate arithmetic instructions. A Cg program in one of these profiles is limited to generating a maximum of four texture addressing instructions and eight arithmetic instructions. Since these numbers are quite small, users need to be very aware of this limitation while writing Cg code for these profiles.

9. For more details about the underlying instruction sets, their capabilities, and their limitations, refer to the MSDN documentation of DirectX pixel shaders 1.1, 1.2, and 1.3.

Modifiers

There are certain simple arithmetic operations that can be applied to inputs of texture addressing operations and to inputs and outputs of arithmetic operations without generating an a
163. ctors; this causes a warning if it is done implicitly. A matrix may also be converted implicitly to a matrix of the same size and shape and compatible element type. A matrix may be converted to a smaller matrix type (the upper-left submatrix is selected) or to a vector of the same total size, but a warning is issued if an explicit cast is not used.
- Structure conversions: A structure may be explicitly cast to the type of its first member, or to another structure type with the same number of members if each member of the struct can be converted to the corresponding member of the new struct. No implicit conversions of struct types are allowed.
- Array conversions: No conversions of array types are allowed.

Table 9 summarizes the type conversions discussed here. The table entries have the following meanings (but please pay attention to the footnotes):
- Allowed: allowed implicitly or explicitly
- Warning: allowed, but warning issued if implicit
- Explicit: only allowed with explicit cast
- No: not allowed

Table 9 Type Conversions

    Target Type | Source Type: Scalar | Vector  | Matrix  | Struct   | Array
    Scalar      | Allowed             | Warning | Warning | Explicit | No
    Vector      | Allowed             | Allowed | Warning | Explicit | No
    Matrix      | Allowed             | Warning | Allowed | Explicit | No
    Struct      | Explicit            | No      | No      | Explicit | No
    Array       | No                  | No      | No      | No       | No

i. Only allowed if the first mem
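The matrix-to-smaller-matrix rule (the upper-left submatrix is selected) can be illustrated in C. This is only a sketch of the selection rule on row-major flat arrays; the function name and the flat-array layout are assumptions made for the example and say nothing about how a Cg compiler stores matrices.

```c
/* Copy the upper-left dst_rows x dst_cols block of a row-major source
 * matrix with src_cols columns into a row-major destination matrix.
 * Assumes dst_cols <= src_cols and that the source has at least
 * dst_rows rows. Illustrative helper only. */
void upper_left_submatrix(const float *src, int src_cols,
                          float *dst, int dst_rows, int dst_cols)
{
    for (int r = 0; r < dst_rows; ++r) {
        for (int c = 0; c < dst_cols; ++c) {
            dst[r * dst_cols + c] = src[r * src_cols + c];
        }
    }
}
```

Converting a 4x4 matrix to a 3x3 matrix under this rule keeps rows 0 to 2 and columns 0 to 2, dropping the last row and the last column.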
164. d
    void OnCreateDevice()
    {
        // Create the vertex shader
        vertexProgram = cgCreateProgramFromFile(
            context, CG_SOURCE, "VertexProgram.cg",
            CG_PROFILE_VS_2_0, "VertexProgram", 0);
        CComPtr<ID3DXBuffer> byteCode;
        const char* progSrc = cgGetProgramString(vertexProgram,
                                                 CG_COMPILED_PROGRAM);
        D3DXAssembleShader(progSrc, strlen(progSrc), 0, 0, 0,
                           &byteCode, 0);

        // If your program uses explicit binding semantics (like
        // this one), you can create a vertex declaration using
        // those semantics.
        const D3DVERTEXELEMENT9 declaration[] = {
            { 0, 0 * sizeof(float), D3DDECLTYPE_FLOAT3,
              D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0 },
            { 0, 3 * sizeof(float), D3DDECLTYPE_D3DCOLOR,
              D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_COLOR, 0 },
            { 0, 4 * sizeof(float), D3DDECLTYPE_FLOAT2,
              D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 0 },
            D3DDECL_END()
        };

        // Make sure the resulting declaration is compatible with
        // the shader. This is really just a sanity check.
        assert(cgD3D9ValidateVertexDeclaration(vertexProgram,
                                               declaration));

        device->CreateVertexDeclaration(declaration,
                                        &vertexDeclaration);
        // GetBufferPointer returns a void pointer, so cast it to
        // the DWORD token stream CreateVertexShader expects.
        device->CreateVertexShader(
            (const DWORD*)byteCode->GetBufferPointer(),
            &vertexShader);

        // Create the pixel shader
        fragmentProgram = cgCreateProgramFromFile(context,
165. de that will be generated for a function named by an identifier is a definition.

Profiles

Compilation of a Cg program (a top-level function) always occurs in the context of a compilation profile. The profile specifies whether certain optional language features are supported. These optional language features include certain control constructs and standard library functions. The compilation profile also defines the precision of the float, half, and fixed data types, and specifies whether the fixed and sampler data types are fully or only partially supported. The choice of a compilation profile is made externally to the language, by using a compiler command-line switch, for example.

The profile restrictions are only applied to the top-level function that is being compiled and to any variables or functions that it references, either directly or indirectly. If a function is present in the source code but not called directly or indirectly by the top-level function, it is free to use capabilities that are not supported by the current profile.

The intent of these rules is to allow a single Cg source file to contain many different top-level functions that are targeted at different profiles. The core Cg language specification is sufficiently complete to allow all of these functions to be parsed. The restrictions provided by a compilation profile are only needed for code generati
166. deprecated entry point:

    CGtype cgGetParameterType(CGparameter parameter);

This entry point differs from cgGetNamedUserType in that it always returns CG_STRUCT for any struct parameter, rather than returning the enumerant associated with the user-defined type of the struct.

The name associated with a given type enumerant can be queried using

    const char* cgGetTypeString(CGtype type);

Conversely, cgGetType maps a type name to its enumerant. If the string passed to cgGetType does not correspond to any type, CG_UNKNOWN_TYPE is returned.

The function cgGetParameterBaseType returns the basic type of vector, matrix, and array parameters. For example, given a float4x4 parameter, cgGetParameterBaseType returns the CG_FLOAT type. Similarly, given a multidimensional array of float4x4s, it also returns CG_FLOAT.

It is also possible to determine the general class of the type of a parameter:

    CGparameterclass cgGetParameterClass(CGparameter param);

It returns one of the following enumerated values:

    CG_PARAMETERCLASS_UNKNOWN
    CG_PARAMETERCLASS_SCALAR
    CG_PARAMETERCLASS_VECTOR
    CG_PARAMETERCLASS_OBJECT
    CG_PARAMETERCLASS_MATRIX
    CG_PARAMETERCLASS_STRUCT
    CG_PARAMETERCLASS_ARRAY

Parameter Type Equivalency

If a program containing a user-defined type is created in a context that already contains another program or effect that defines a user type with the same name, the two type definitions are compared. If both type definitions are found to be equivalent, the CGtype enumerant associated with the user typ
167. der file before compilation.
- -longprogs: Allow code generation that is longer than a profile's limit.
- -debug: Activate the debug function.
- -v: Print the compiler's version to stdout.
- -h: Print a short help message.
- -maxunrollcount N: Set the maximum loop unroll count to N. Loops with greater than N iterations are not unrolled. Defaults to 256.
- -posinv: Generate a position-invariant vertex program if position invariance is supported by the current profile.

Index

A
abs, for performance
animation of geometry
anisotropic lighting: sample shader; vertex shader code example
Annotation
ANSI C: differences from Cg; relation to Cg
arbfp1 profile
arbvp1 profile
arithmetic operators
arithmetic precision
arithmetic range
array type specification
arrays: declaration and use of; support of

B
binding semantics: defined; overview
Blinn Phong Bump Mapping
bool data type
bool type specification
boolean operators
built-in functions
bump dot3x2 diffuse and specular: pixel shader code example; sample shader; vertex shader code example
bump reflection mapping: pixel shader code example; sample shader; vertex shader code example

C
C preprocessor, supporting
C, relation to Cg
168. e function usually has no cost in fragment programs. Do not hesitate to use these functions when appropriate.

4. Use Texture Maps to Encode Complex Functions

For profiles that support texture maps, filtered texture-map lookups are extraordinarily efficient. If you have a complex function that takes more than a handful of arithmetic operations to evaluate, you might want to encode the function in a texture map.

Say that you have written a function f(x, y) that is a bottleneck in your shader. Assume for now that it is always called with values of x and y between zero and one, and that the value that f(x, y) computes is always between zero and one. If the function is reasonably smooth and you don't need to compute it at extremely high precision, you can precompute the function in your application and store it in a texture map, replacing calls like

    float val = f(x, y);

with code like

    float val = tex2D(fsampler, float2(x, y));

This method can also be applied to one- and three-dimensional functions, using 1D and 3D texture maps.

More generally, the values you pass to the function may not be in the range [0,1], and the values your function returns may not be in the range [0,1]. In this case, the following two utility functions can serve as a base: remapTo01 remaps the range [low, high] into [0,1]; remapFrom01 does the opposite.

    float4 remapTo0
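The listing above is cut off in this fragment, so here is a minimal scalar sketch in C of what such a pair of remapping helpers could look like. The names `remap_to_01` and `remap_from_01`, the scalar float signatures, and the clamping step are assumptions made for illustration; they are not the manual's float4 Cg versions.

```c
/* Map v from the range [low, high] into [0, 1], clamping values that
 * fall outside the range. Assumes high != low. Illustrative sketch. */
float remap_to_01(float v, float low, float high)
{
    float t = (v - low) / (high - low);
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    return t;
}

/* Inverse mapping: take t in [0, 1] back to the range [low, high]
 * by linear interpolation between the two endpoints. */
float remap_from_01(float t, float low, float high)
{
    return low + t * (high - low);
}
```

With helpers of this kind, the inputs would be remapped into [0,1] before the texture lookup and the fetched value remapped back to the function's real output range afterwards; for example, 1 in the range [-2, 2] maps to 0.75 and back to 1.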
169. e; DepthFunc = Less; AlphaTestEnable = true; AlphaFunc = float2(Equal, 0

Parameters and Semantics

The CgFX file also contains global Cg parameters. These variables are usually passed as uniform parameters to Cg functions or as the values for render or texture state settings. For instance, a bool variable might be used as a uniform parameter to a Cg function or as a value enabling or disabling the alpha blend render state:

    bool AlphaBlending = false;
    float bumpHeight = 0.5f;

These variables can contain a user-defined semantic, which helps applications provide the correct data to the shader without having to decipher the variable names:

    float4x4 myViewMatrix : ViewMatrix;
    texture2D someTexture : DiffuseMap;

A CgFX-enabled application can then query the CgFX file for its variables and their semantics.

Vertex and Fragment Programs

With the OpenGL state manager, vertex and fragment programs are defined via assignments to the VertexProgram and FragmentProgram states, respectively. Three different types of expressions can be on the right-hand side of these program types:
- Compile statements
- In-line assembly
- NULL

These three possibilities are demonstrated in the effect file below.

    float4 main(uniform float foo, float4 uv : TEXCOORD0) : COLOR
    {
        return foo > 0 ? uv : 2 * uv;
    }

    technique SimpleFrag {
        pass {
            VertexPro
170. e Support

Two convenient functions are provided that give the highest vertex and pixel shader versions supported by the device:

    CGprofile cgD3D9GetLatestVertexProfile();
    CGprofile cgD3D9GetLatestPixelProfile();

This allows you to make your application future-ready, because the Cg programs are automatically compiled for the best profiles that are available at runtime, even if these profiles did not exist at the time the application was written.

Another function that allows you optimal compilation is cgD3D9GetOptimalOptions. It returns a string representing the optimal set of compiler options for a given profile:

    char const* cgD3D9GetOptimalOptions(CGprofile profile);

This string is meant to be used as part of the argument parameter to cgCreateProgram. It does not need to be destroyed by the application. However, its content could change if cgD3D9GetOptimalOptions is called again for the same profile but for a different Direct3D device.

Expanded Interface Program Examples

In this section we provide programs that illustrate how and when to use functions from the expanded interface to make Cg programs work with Direct3D. For the sake of clarity, the examples do very little error checking, but a production application should check the return values of all Cg functions. The vertex and fragment programs that follow are referenced in Expanded Interface Direct3D 9 App
171. e application must then create a shared array of concrete light instances. To do so, the application proceeds as it would when operating on a CGprogram, by retrieving the CGtype corresponding to each type of concrete instance to be created and calling cgCreateParameter or cgCreateParameterArray to create the shared parameter of the given type. Lastly, the shared parameter is connected to the effect parameter. This process is illustrated below.

    CGtype spotType = cgGetNamedUserType(effect, "SpotLight");
    CGparameter spots = cgCreateParameterArray(context, spotType, 4);
    CGparameter lights = cgGetNamedEffectParameter(effect, "lights");
    cgConnectParameter(spots, lights);

Note that cgGetNamedUserType in this case is passed a CGeffect handle rather than a CGprogram handle.

Later, when the associated technique is validated, any programs that make use of the abstract effect parameters are compiled.

Note that abstract parameters may not be used on the right-hand side of any state assignments other than compile state assignments. Doing so results in an error at effect creation time.

Evaluating Cg Programs using the Virtual Machine

There are many situations where it is useful to execute Cg programs on the CPU using the Cg runtime Virtual Machine (VM). Although running Cg programs on the CPU doesn't offer the same performance as execution on the GPU, it is som
172. e in the new program will be identical to that of the identical user type in the existing program or effect. If the types are not equivalent, the new type will be assigned a unique CGtype. In this way, type equivalency of parameters shared between multiple programs and effects can be assured simply by comparing CGtype enumerants. In order for two types to be considered equivalent, they must meet the following requirements: • The type names must match. Both types must have the exact same name. • The parent types, if any, must match. If the type is a structure, both must either not implement an interface or both implement interfaces that are type equivalent. • The member variables and methods must match. They must both have the exact same member variables and methods. The order and name of the variables must match exactly, and the order and name of the methods must match. The signature of the methods, including argument and return types, must be identical. Type equivalency is useful when using shared parameter instances with multiple programs by connecting them with cgConnectParameter. Parameter Validity. The function cgIsParameter allows you to check whether a parameter handle references a valid parameter or not: CGbool cgIsParameter(CGparameter parameter); A parameter handle becomes invalid when the program or the context of the program it corresponds to is de
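The three equivalence requirements above can be sketched as a comparison over type descriptors. This is a minimal illustration in C, not the Cg runtime's implementation: TypeDesc and types_equivalent are hypothetical names, members and methods are represented as plain signature strings, and interface equivalence is simplified to a name comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical descriptor for a user-defined type; not a Cg runtime structure. */
typedef struct {
    const char *name;            /* type name */
    const char *interface_name;  /* implemented interface, or NULL if none */
    const char *const *members;  /* member and method signatures, in declaration order */
    int member_count;
} TypeDesc;

/* Returns true when both descriptors meet the three requirements listed above.
   Assumption: two interfaces are treated as equivalent when their names match. */
bool types_equivalent(const TypeDesc *a, const TypeDesc *b)
{
    assert(a != NULL && b != NULL);
    if (strcmp(a->name, b->name) != 0)
        return false;                          /* type names must match */
    if ((a->interface_name == NULL) != (b->interface_name == NULL))
        return false;                          /* both or neither implement an interface */
    if (a->interface_name != NULL &&
        strcmp(a->interface_name, b->interface_name) != 0)
        return false;
    if (a->member_count != b->member_count)
        return false;
    for (int i = 0; i < a->member_count; ++i)
        if (strcmp(a->members[i], b->members[i]) != 0)
            return false;                      /* order, names, and signatures must match */
    return true;
}
```

Under this model, two descriptors that list the same members in a different order compare as not equivalent, which mirrors the requirement that the order of variables and methods must match.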
173. e most recent technology is highly programmable and becoming ever more so. We can now write short vertex and fragment programs to be executed by the GPU. This requires great skill and is only possible with short programs. When GPU hardware grows to allow programs of hundreds, thousands, or even more instructions, assembly coding will no longer be practical. Rather than programming each rendering state, each bit, byte, and word of data and control through a low-level assembly language, we want to express our ideas in a more straightforward form using a high-level language. Thus Cg (C for Graphics) becomes necessary and inevitable. Just as C was derived to expose the specific capabilities of processors while allowing higher-level abstraction, Cg allows the same abstraction for GPUs. Cg changes the way programmers can program, focusing on the ideas, the concepts, and the effects they wish to create, not on the details of the hardware implementation. Cg also decouples programs from specific hardware, because the language is functional, not hardware-implementation specific. Also, since Cg can be compiled at run time on any platform or operating system and for any graphics hardware, Cg programs are truly portable. Finally, and perhaps best of all, Cg programs are future-proof and can adapt to run well on future products. The compiler can optimize directly for a new target GPU that perhaps did not even exist when the original Cg program was written.
174. e of an object when used in an expression. The qualifiers are: • const: The value of a const-qualified object cannot be changed after its initial assignment. The definition of a const-qualified object that is not a parameter must contain an initializer. Named compile-time values are inherently qualified as const, but an explicit qualification is also allowed. The value of a static const cannot be changed after compilation, and thus its value may be used in constant folding during compilation. A uniform const, on the other hand, is only const for a given execution of the program; its value may be changed via the runtime between executions. • in and out: Formal parameters may be qualified as in, out, or both (by using in out or inout). By default, formal parameters are in-qualified. An in-qualified parameter is equivalent to a call-by-value parameter. An out-qualified parameter is equivalent to a call-by-result parameter, and an inout-qualified parameter is equivalent to a value/result parameter. An out-qualified parameter cannot be const-qualified, nor may it have a default value. Type Conversions. Some type conversions are allowed implicitly, while others require an explicit cast. Some implicit conversions may cause a warning, which can be suppressed by using an explicit cast. Explicit casts are indicated using C-style syntax: casting variable to the float4 type can be achieved using floa
175. e one automatically. Scalar uniform parameters may be allocated to either the xyz or the w portion of a constant register, depending on how they are used within the Cg program. When using the output of the compiler without the Cg runtime, you must set all values of a scalar uniform to the desired scalar value, not just the x component. The valid binding semantics for uniform parameters in the fp20 profile are summarized in Table 35. Table 35. fp20 Uniform Binding Semantics. Binding semantics name: register(s0)-register(s3), TEXUNIT0-TEXUNIT3. Corresponding data: texture unit N, where N is in the range [0..3]; may be used only with uniform inputs with sampler types. The ps_1_x profiles allow the programmer to decide which constant register a uniform variable will reside in by specifying the C<n>/register(c<n>) binding semantic. This is not allowed in the fp20 profile, since the NV_register_combiners extension does not have a single bank of constant registers. While the NV_register_combiners extension does describe constant registers, these constant registers are per-combiner stage, and specifying bindings to them in the program would overly constrain the compiler. Binding Semantics for Varying Input/Output Data. The varying input binding semantics in the fp20 profile are the same as the varying output binding semantics of the vp20 profile
176. e shows a few common uses for annotations: the annotation of LightDir indicates what sort of user interface widget would be appropriate to provide the user for setting that parameter. The technique's annotation might indicate that applying the technique was optional when rendering the scene. In the example above, the pass annotations indicate to the application which part of the scene geometry to draw when rendering that pass, as well as where to store the image from rendering the pass. Given a handle to a technique, pass, or parameter, there are API entry points for iterating through the annotations in turn: CGannotation cgGetFirstTechniqueAnnotation(CGtechnique); CGannotation cgGetFirstPassAnnotation(CGpass); CGannotation cgGetFirstParameterAnnotation(CGparameter); CGannotation cgGetFirstProgramAnnotation(CGprogram); CGannotation cgGetNextAnnotation(CGannotation); In addition, there are entry points for retrieving annotations by name: CGannotation cgGetNamedTechniqueAnnotation(CGtechnique, const char*); CGannotation cgGetNamedPassAnnotation(CGpass, const char*); CGannotation cgGetNamedParameterAnnotation(CGparameter, const char*); CGannotation cgGetNamedProgramAnnotation(CGprogram, const char*); Given an annotation handle, its values may be retrieved through the use of one of the cgGet*AnnotationValues entry points: const float* cgGetFloatAnnotationVal
177. e specified length. For example: [declarations of func, vals1, and vals2 are illegible in the source] float myv1 = func(vals1); /* match: 6 == 6 */ float myv2 = func(vals2); /* no match: 5 != 6 */ Unsized arrays may only be declared as function parameters; they may not be declared as variables. Furthermore, in all current profiles, the actual array length and address calculations implied by array indexing must be known at compile time. Unsized array parameters of top-level functions such as main may be connected to sized arrays that are created in the runtime, or their size may be set directly for convenience. See the cgSetArraySize manual in the Cg core runtime documentation for details. Interfaces. Cg supports interfaces, a language construct found in other languages, including Java and C#, and in C++ as pure virtual classes. Interfaces provide a means of abstractly describing the member functions a particular structure provides, without specifying how those functions are implemented. When used in conjunction with parameter instantiation by the Cg runtime, this abstraction makes it possible to plug any structure that implements a given interface into a program, even if the structure was not known to the author of the original program. An interface declaration describes a set of member functions that a structure must define in order to impleme
178. e supported by this profile are presented in Table 33. See the standard library documentation for descriptions of these functions. Table 33. Supported Standard Library Functions: dot(floatN, floatN); lerp(floatN, floatN, floatN); lerp(floatN, floatN, float); tex1D(sampler1D, float); tex1D(sampler1D, float2); tex1Dproj(sampler1D, float2); tex1Dproj(sampler1D, float3); tex2D(sampler2D, float2); tex2D(sampler2D, float3); tex2Dproj(sampler2D, float3); tex2Dproj(sampler2D, float4); texRECT(samplerRECT, float2); texRECT(samplerRECT, float3); texRECTproj(samplerRECT, float3); texRECTproj(samplerRECT, float4); tex3D(sampler3D, float3); tex3Dproj(sampler3D, float4); texCUBE(samplerCUBE, float3); texCUBEproj(samplerCUBE, float4). Note: The nonprojective texture lookup functions are actually done as projective lookups on the underlying hardware. Because of this, the w component of the texture coordinates passed to these functions from the application or vertex program must contain the value 1. Texture coordinate parameters for projective texture lookup functions must have swizzles that match the swizzle done by the generated texture shader instruction. While this may seem burdensome, it is intended to allow fp20 profile
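The note about nonprojective lookups can be illustrated with a small model of the projective divide. This is a sketch in C under the assumption that the hardware divides the coordinates by the last component; TexCoord2 and project are hypothetical names, not part of Cg or any driver interface.

```c
#include <assert.h>

typedef struct { float s, t; } TexCoord2;

/* Projective divide as assumed for this sketch: (s, t) / q. */
TexCoord2 project(float s, float t, float q)
{
    assert(q != 0.0f);
    TexCoord2 r = { s / q, t / q };
    return r;
}
```

With q equal to 1 the divide leaves the coordinates unchanged, which is why a nonprojective lookup implemented this way needs that component set to 1. Any other value would silently rescale the lookup.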
179. e swizzle operator is the same as that of the array subscripting operator ([]). Write Mask Operator. The write mask operator is placed on the left-hand side of an assignment statement. It can be used to selectively overwrite the components of a vector. It is illegal to specify a particular component more than once in a write mask, or to specify a write mask when initializing a variable as part of a declaration. The following is an example of a write mask: float4 color = float4(1.0, 1.0, 0.0, 0.0); color.a = 1.0; /* Set alpha to 1.0, leaving RGB alone */ The write mask operator can be a powerful tool for generating efficient code because it maps well to the capabilities of GPU hardware. The precedence of the write mask operator is the same as that of the swizzle operator. Conditional Operator. Cg includes C's if/else conditional statement and conditional operator (?:). With the conditional operator, the control variable may be a bool vector. If so, the second and third operands must be similarly sized vectors, and selection is performed on an elementwise basis. Unlike C, any side effects associated with the second and third operands always occur, regardless of the conditional. As an example, the following would be a very efficient way to implement a vector clamp function, if the min and max functions did not exist: float4 clamp(float4 x, float minval, float maxval
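To make the two operators above concrete on the CPU side, here is a minimal C model. It is only an illustration of the semantics described in this entry: the float4 struct, write_masked, and clamp4 are hypothetical helpers, and the bit-mask encoding of the write mask is an assumption of this sketch, not Cg syntax.

```c
#include <assert.h>

typedef struct { float v[4]; } float4;   /* x, y, z, w stored in v[0..3] */

/* Model of a write mask such as color.a = value: only the components whose
   bit is set in mask (bit i selects v[i]) are overwritten. */
float4 write_masked(float4 dst, float value, unsigned mask)
{
    assert(mask <= 0xFu);
    for (int i = 0; i < 4; ++i)
        if (mask & (1u << i))
            dst.v[i] = value;
    return dst;
}

/* Model of an element-wise conditional: each component is selected
   independently, first against minval, then against maxval. */
float4 clamp4(float4 x, float minval, float maxval)
{
    assert(minval <= maxval);
    for (int i = 0; i < 4; ++i) {
        x.v[i] = (x.v[i] < minval) ? minval : x.v[i];
        x.v[i] = (x.v[i] > maxval) ? maxval : x.v[i];
    }
    return x;
}
```

Calling write_masked with mask 0x8 on the vector (1, 1, 0, 0) changes only the fourth component, matching the write mask example in the text.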
180. e the device changes or is destroyed: void OnDestroyDevice() { device->DeleteVertexShader(vertexShader); device->DeletePixelShader(pixelShader); } /* Called before application shuts down */ void OnShutdown() { /* This frees any core runtime resources. The minimal interface has no dynamic storage to free. */ cgDestroyContext(context); } Direct3D Expanded Interface. If you use the expanded interface for a program, then, to avoid inconsistencies, it is advisable to stick with the expanded interface for all shader-related operations that can be performed through its functions, such as shader setting, shader activation, and parameter setting, including setting texture stage states. Setting the Direct3D Device. The expanded interface encapsulates more functionality than the minimal interface to ease program and parameter management. It does this by making the appropriate Direct3D calls at the appropriate times. Because some of these calls require the Direct3D device, it must be communicated to the Cg runtime: HRESULT cgD3D9SetDevice(IDirect3DDevice9* device); You can get the Direct3D device currently associated with the runtime using cgD3D9GetDevice: IDirect3DDevice9* cgD3D9GetDevice(); When cgD3D9SetDevice is called with zero as an input, all Direct3D resources used by the expanded interface are released. Since a Direct3D device
181. e to find: (1) the positions of the vertices in stream 0, as the first three floating-point values of the vertex format; (2) the normals, as the next three floating-point values following the three floating-point values in stream 0; and (3) the texture coordinates, as the two floating-point values located at an offset equal to twice the size of a DWORD from the end of the normal data in stream 0. The tangents are provided in stream 1 as a second texture coordinate set that is found as the first three floating-point values of the vertex format. To get a vertex declaration from a Cg vertex program for the Direct3D 9 Cg runtime, use cgD3D9GetVertexDeclaration: CGbool cgD3D9GetVertexDeclaration(CGprogram program, D3DVERTEXELEMENT9 declaration[MAXD3DDECLLENGTH]); MAXD3DDECLLENGTH is a Direct3D 9 constant that gives the maximum length of a Direct3D 9 declaration. If no declaration can be derived from the program, cgD3D9GetVertexDeclaration fails and returns CG_FALSE. To get a vertex declaration from a Cg vertex program for the Direct3D 8 Cg runtime, use cgD3D8GetVertexDeclaration: CGbool cgD3D8GetVertexDeclaration(CGprogram program, DWORD declaration[MAX_FVF_DECL_SIZE]); MAX_FVF_DECL_SIZE is a Direct3D constant that gives the maximum length of a Direct3D declaration. If no declaration can be derived from the program, cgD3D8GetVertexDeclaration fails and ret
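The stream 0 layout described at the start of this entry can be written down as a C struct to check the byte offsets. This is a sketch only: Stream0Vertex and texcoord_offset are hypothetical names, a DWORD is modeled as uint32_t, and the code assumes 4-byte floats with no padding between members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stream 0 vertex matching the layout described above. */
typedef struct {
    float    position[3];  /* first three floats of the vertex format */
    float    normal[3];    /* next three floats */
    uint32_t skipped[2];   /* two DWORD-sized slots that the declaration skips */
    float    texCoord[2];  /* two floats after the skipped slots */
} Stream0Vertex;

/* Byte offset of the texture coordinates, computed from the description:
   end of the normal data plus twice the size of a DWORD. */
size_t texcoord_offset(void)
{
    assert(sizeof(float) == 4);   /* assumption of this sketch */
    size_t end_of_normal = offsetof(Stream0Vertex, normal) + 3 * sizeof(float);
    return end_of_normal + 2 * sizeof(uint32_t);
}
```

Under these assumptions the normals start at byte 12 and the texture coordinates at byte 32, which agrees with the offset the compiler assigns to the texCoord member.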
182. eColor, &constantColor); /* Called to render the scene */ void OnRender() { /* Load model-view matrix */ D3DXMATRIX modelViewMatrix; /* ... */ /* Set the parameters that change every frame. This must be done before binding the programs. */ cgD3D8SetUniformMatrix(modelViewMatrix, &modelViewMatrix); /* Bind the programs. This downloads any parameter values that have been previously set. */ cgD3D8BindProgram(vertexProgram); cgD3D8BindProgram(fragmentProgram); /* Draw scene */ } /* Called before the device changes or is destroyed */ void OnDestroyDevice() { /* Calling this function tells the expanded interface to release its internal reference to the Direct3D device and free its Direct3D resources */ cgD3D8SetDevice(0); } /* Called before application shuts down */ void OnShutdown() { /* This frees any core runtime resource */ cgDestroyContext(context); } Direct3D Debugging Mode. In addition to the error-reporting mechanisms described in Direct3D Error Reporting on page 114, a debug version of the Direct3D 9 or Direct3D 8 Cg runtime DLL is provided to assist you with the development of applications using the Direct3D 9 or Direct3D 8 Cg runtime. This version does not have debug symbols, but when used in place of the regular version, it uses the Win32 function OutputDebugString to output many helpful messages and traces to the d
183. [title illegible] 38; Derivative Functions 41; Debugging Function 41; Predefined Fragment Program Output Structures 42; Introduction to the Cg Runtime Library 43; Introducing the Cg Runtime 43; Benefits of the Cg Runtime 44; Overview of the Cg Runtime 45; Core Cg Runtime 49; Core Cg Context 50; Core Cg Program 50; Core Cg Parameters 54; Core Cg Error Reporting 71; API-Specific Cg Runtimes 72; Parameter Shadowing 73; OpenGL Cg Runtime 73; Direct3D Cg Runtime 85; Introduction to CgFX 117; CgFX Overview 117; [title illegible] 117; Getting Started 118; Technique Validation 120; Passes and Pass State 120; Effect Parameters 121; Vertex and Fragment Programs u
184. ebug output console. Examples of information the debug DLL outputs are the following: • Any Direct3D or Cg core runtime errors. • Debugging information about parameters that are managed by the expanded interface. • Potential performance warnings. Here is a sample trace: cgD3D TRACE Creating vertex shader for program 3; cgD3D TRACE Discovering parameters for vertex program 3; cgD3D TRACE Discovered uniform parameter ModelViewProj of type float4x4; cgD3D TRACE Finished discovering parameters for vertex program 3; cgD3D TRACE Creating pixel shader for program 24; cgD3D TRACE Discovering parameters for pixel program 24; cgD3D TRACE Discovered sampler parameter BaseTexture; cgD3D TRACE Discovered uniform parameter SomeColor of type float4; cgD3D TRACE Finished discovering parameters for pixel program 24; cgD3D TRACE Shadowing state for sampler parameter BaseTexture; cgD3D TRACE Shadowing sampler state D3DTSS_MAGFILTER for sampler parameter BaseTexture; cgD3D TRACE Shadowing sampler state D3DTSS_MINFILTER for sampler parameter BaseTexture; cgD3D TRACE Shadowing sampler state D3DTSS_MIPFILTER for sampler parameter BaseTexture; cgD3D TRACE Shadowing 16 values for uniform parameter ModelViewProj of type float4x4; cgD3
185. ed. Function cgGetParameterResourceIndex retrieves the numerical portion of the resource: unsigned long cgGetParameterResourceIndex(CGparameter parameter); For example, if the resource for a given parameter is CG_TEXCOORD7, cgGetParameterResourceIndex returns 7. The cgGetParameterValues function retrieves the default or constant value of a uniform parameter: const double* cgGetParameterValues(CGparameter parameter, CGenum valueType, int* numberOfValuesReturned); It retrieves the default value if valueType is equal to CG_DEFAULT, and the constant value if valueType is equal to CG_CONSTANT. The components of the value are returned in row-major order as a pointer to an array of double elements. After cgGetParameterValues is called, the number of components available in the array is stored in the integer pointed to by numberOfValuesReturned. Core Cg Error Reporting. An error code is associated with each type of runtime error that can be generated. The runtime caches both the most recently generated error and the error that was first generated since the error code was last checked by the application. Applications can query the cached error codes, as well as the error message corresponding to either, using: CGerror error = cgGetError(); CGerror error = cgGetFirstError(); const char* errorString = cgGetErrorString(error); An error code of 0 i
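The caching behavior described above (one slot for the first error since the last check, one for the most recent error) can be modeled in a few lines of C. This is a sketch of the idea, not the Cg runtime's code: ErrorCache, record_error, take_first_error, and take_last_error are hypothetical names, and the choice that each query clears only its own slot is an assumption of this sketch.

```c
#include <assert.h>

enum { ERR_NONE = 0 };   /* stand-in for "no error"; real codes are CGerror values */

typedef struct {
    int first;  /* first error generated since this slot was last queried */
    int last;   /* most recently generated error */
} ErrorCache;

/* Called whenever an error is generated. */
void record_error(ErrorCache *c, int code)
{
    assert(c != 0 && code != ERR_NONE);
    if (c->first == ERR_NONE)
        c->first = code;   /* keep the earliest unqueried error */
    c->last = code;        /* always overwrite the most recent one */
}

/* Return and clear the first cached error (assumed query semantics). */
int take_first_error(ErrorCache *c)
{
    int e = c->first;
    c->first = ERR_NONE;
    return e;
}

/* Return and clear the most recent cached error (assumed query semantics). */
int take_last_error(ErrorCache *c)
{
    int e = c->last;
    c->last = ERR_NONE;
    return e;
}
```

Keeping both slots lets an application that checks errors only occasionally still find the call that first went wrong, instead of only the last symptom.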
186. ed if the compiler can fully unroll them (that is, if the compiler can determine the iteration count at compile time). Likewise, return can only appear as the last statement in a function in these profiles. Function recursion (and co-recursion) is forbidden in Cg. The switch, case, and default keywords are reserved, but they are not supported by any profiles in the current release of the Cg compiler. Definitions and Function Overloading. To pass a modifiable function parameter in C, the programmer must explicitly use pointers. C++ provides a built-in pass-by-reference mechanism that avoids the need to explicitly use pointers, but this mechanism still implicitly assumes that the hardware supports pointers. Cg must use a different mechanism because the vertex and fragment hardware of the GPU does not support the use of pointers. Cg passes modifiable function parameters by value-result instead of by reference. The difference between these two methods is subtle: it is only apparent when two function parameters are aliased by a function call. In Cg, the two parameters have separate storage in the function, whereas in C++ they would share storage. To reinforce this distinction, Cg uses a different syntax than C++ to declare function parameters that are modified: function blah1(out float x) /* x is output-only */ function blah2(inout float x) /* x is input and output */ function blah3(in float x) /* x is input-only */ function blah4(float x) /* x is inpu
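The aliasing difference between pass-by-reference and pass-by-value-result can be reproduced in plain C. The sketch below models value-result with explicit copy-in and copy-out; both function names are hypothetical, and the copy-out order (first parameter, then second) is an assumption of this sketch rather than something stated in the text.

```c
#include <assert.h>

/* By reference: when called as f(&v, &v), both parameters share storage. */
void bump_by_reference(float *a, float *b)
{
    assert(a != 0 && b != 0);
    *a += 1.0f;
    *b += *a;   /* if a == b, this reads the value just written through a */
}

/* Value-result model: copy in, work on separate storage, copy out at the end. */
void bump_by_value_result(float *a, float *b)
{
    assert(a != 0 && b != 0);
    float la = *a;   /* copy in */
    float lb = *b;
    la += 1.0f;
    lb += la;        /* lb never sees writes made to la's storage */
    *a = la;         /* copy out, in parameter order (assumed) */
    *b = lb;
}
```

Starting from v = 1, calling bump_by_reference(&v, &v) leaves v at 4, while bump_by_value_result(&v, &v) leaves it at 3. With two distinct variables the two functions produce identical results, which is why the difference only shows up under aliasing.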
187. ed in Cg is to associate a binding semantic with each element of the packet. This is a bind-by-name approach. For example, an output with the binding semantic Foo is fed to an input with the binding semantic Foo. Profiles may allow the user to define arbitrary identifiers in this semantic namespace, or they may restrict the allowed identifiers to a predefined set. Often, these predefined names correspond to the names of hardware registers or API resources. In some cases, predefined names may control non-programmable parts of the hardware. For example, vertex programs normally compute a position that is fed to the rasterizer, and this position is stored in an output with the binding semantic POSITION. For any profile, there are two namespaces for predefined binding semantics: the namespace used for in variables and the namespace used for out variables. The primary implication of having two namespaces is that the binding semantic cannot be used to implicitly specify whether a variable is in or out. Binding Semantics. A binding semantic may be associated with an input to a top-level function in one of three ways: • The binding semantic is specified in the formal parameter declaration for the function. The syntax for formal parameters to a function is: [const] [in | out | inout] <type> <identifier> [: <binding-semantic>] [<initializer>] • If the f
188. ed is back-facing, greater than zero if it is front-facing, and zero if the fragment was from a line or a point. OpenGL NV_vertex_program 2.0 Profile (vp30). The vp30 vertex program profile is used to compile Cg source code to vertex programs for use by the NV_vertex_program2 OpenGL extension. • Profile name: vp30. • How to invoke: Use the compiler option -profile vp30. The vp30 profile limits Cg to match the capabilities of the NV_vertex_program2 extension. This section describes the capabilities and restrictions of Cg when using the vp30 profile. Position Invariance. Under vp30, unlike other profiles, the following points can be made: • The -posinv option won't cause an OPTION driver directive to be added to the assembly code header (see the OpenGL specification for more details on this directive). • The instructions for transforming the position using the modelview-projection matrix are emitted. They are true because the final assembly code itself guarantees that the position calculation is invariant compared to the fixed pipeline calculation. Language Constructs. Data Types. This profile implements data types as follows: • float data type is implemented as IEEE 32-bit single precision. • half data type is implemented as float. • int data type is supported using floating-point operations, which adds extra instructions for proper truncation for divides
189. ed matrix types. Implementations must also predefine type identifiers (in the global scope) to represent these types: packed TYPE1 TYPE1x1[1]; packed TYPE1 TYPE3x1[3]; packed TYPE2 TYPE1x2[1]; packed TYPE2 TYPE3x2[3]; packed TYPE3 TYPE1x3[1]; packed TYPE3 TYPE3x3[3]; packed TYPE4 TYPE1x4[1]; packed TYPE4 TYPE3x4[3]; packed TYPE1 TYPE2x1[2]; packed TYPE1 TYPE4x1[4]; packed TYPE2 TYPE2x2[2]; packed TYPE2 TYPE4x2[4]; packed TYPE3 TYPE2x3[2]; packed TYPE3 TYPE4x3[4]; packed TYPE4 TYPE2x4[2]; packed TYPE4 TYPE4x4[4]; For example, implementations must predefine the type identifiers float2x1, float3x3, float4x4, and so on. A typedef follows the usual matrix-naming convention of TYPE_rows_X_columns. If we declare float4x4 a, then a[3] is equivalent to a._m30_m31_m32_m33. Both expressions extract row 3 of the matrix (the fourth row, since indices start at zero). • Implementations are required to support indexing of vectors and matrices with constant indices. • A struct type is a collection of one or more members of possibly different types. • An interface type defines a collection of methods that comprises an abstract interface. Partial Support of Types. This specification mandates partial support for some types. Partial support for a type requires the following: • Definitions and declarations using the type are supported. • Assignment and copy of objects of that type are supported (including implicit copie
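The equivalence between indexing a matrix row and naming its four elements can be checked with a small C model. This is an illustration only: the float4x4 and float4 structs and both helper functions are hypothetical, and the row-major m[row][column] storage is an assumption of this sketch.

```c
#include <assert.h>

typedef struct { float m[4][4]; } float4x4;   /* m[row][column], zero-based */
typedef struct { float v[4]; } float4;

/* Model of a[i]: extract row i as a four-component vector. */
float4 row(const float4x4 *a, int i)
{
    assert(a != 0 && i >= 0 && i < 4);
    float4 r;
    for (int c = 0; c < 4; ++c)
        r.v[c] = a->m[i][c];
    return r;
}

/* Model of a._m30_m31_m32_m33: gather the four named elements. */
float4 m30_m31_m32_m33(const float4x4 *a)
{
    assert(a != 0);
    float4 r = {{ a->m[3][0], a->m[3][1], a->m[3][2], a->m[3][3] }};
    return r;
}
```

For any matrix, row(&a, 3) and m30_m31_m32_m33(&a) return the same four values, which is the sense in which the two Cg expressions are equivalent.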
190. ed to by a D3DVSD_REG macro call is expanded to the four floating-point values of the corresponding hardware register, and the missing values are set to 0 for x, y, and z and to 1 for w. Minimal Interface Type Retrieval. Use cgD3D9TypeToSize to retrieve the size of a CGtype enumerated type in terms of floating-point numbers: DWORD cgD3D9TypeToSize(CGtype type); More precisely, it is the number of floating-point values required to store a parameter of type type. This function does not apply to some types, like the sampler types, in which case it returns zero. It is useful because applications can determine how many floating-point values they have to provide to set the value of a given parameter. Minimal Interface Program Examples. In this section we provide some code samples that illustrate how and when to use functions from the minimal interface to make Cg programs work with Direct3D. To enhance clarity, the examples do very little error checking, but a production application should check the return values of all Cg functions. The vertex and fragment programs below are referenced in Direct3D 9 Application on page 92 and Direct3D 8 Application on page 95. Vertex Program. The following Cg code is assumed to be in a file called VertexProgram.cg: void VertexProgram(in float4 position : POSITION, in float4 color : COLOR0, in float4 texCoord : TEXCOORD0, out float4 positionO : POSITION
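The register expansion rule stated at the start of this entry (missing components become 0 for x, y, and z and 1 for w) can be sketched in C as follows. The function name expand_to_register is hypothetical and does not correspond to any Direct3D or Cg call.

```c
#include <assert.h>

/* Expand count supplied floats (0..4) to a full x, y, z, w register.
   Components that are not supplied take the defaults 0, 0, 0, 1. */
void expand_to_register(const float *values, int count, float out[4])
{
    static const float defaults[4] = { 0.0f, 0.0f, 0.0f, 1.0f };
    assert(count >= 0 && count <= 4);
    assert(count == 0 || values != 0);
    for (int i = 0; i < 4; ++i)
        out[i] = (i < count) ? values[i] : defaults[i];
}
```

For example, a two-component texture coordinate (0.25, 0.75) expands to (0.25, 0.75, 0, 1), and a three-component position picks up w = 1, which is the usual homogeneous form.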
191. effect contains one or more techniques. A technique is intended to encapsulate the information needed to produce a visual effect: graphics state, shaders, and at least one rendering pass. Pass: Each technique contains one or more rendering passes. Passes store graphics state, possibly including fixed-function state settings and vertex and fragment shaders. The passes are generally processed in order: CgFX sets the graphics state for a pass, the application draws the scene geometry, the state for the next pass is set, geometry is drawn again, and so on. State assignment: Passes hold state assignments that describe the graphics state for the pass. Annotation: Annotations make it possible to associate meta-data with parameters, techniques, passes, and so on. For example, a parameter like lightIntensity might have annotations indicating the minimum and maximum valid values for the parameter. Effect parameter: Parameters declared in the global scope of the effect file are effect parameters. Effect parameter values may be set and queried using the Cg runtime API. Effect parameters may be referenced on the right-hand side of state assignments and also as global parameters within Cg functions and programs defined within the effect. Getting Started. We expect that the reader is generally familiar with the Cg runtime. See Introduction to the Cg Runtime Library on page
192. eft operand. The side effect of updating the stored value of the left operand occurs between the previous and the next sequence point. Smearing of Scalars to Vectors. If a binary operator is applied to a vector and a scalar, the scalar is automatically type-promoted to a same-sized vector by replicating the scalar into each component. The ternary ?: operator also supports smearing. The binary rule is applied to the second and third operands first, and then the binary rule is applied to this result and the first operand. Namespaces. Just as in C, there are two namespaces. Each has multiple scopes, as in C. • Tag namespace, which consists of struct tags. • Regular namespace: typedef names (including an automatic typedef from a struct declaration), variables, and function names. Arrays and Subscripting. Arrays are declared as in C, except that they may optionally be declared to be packed, as described under Types on page 229. Arrays in Cg are first-class types, so array parameters to functions and programs must be declared using array syntax rather than pointer syntax. Likewise, assignment of an array-typed object implies an array copy rather than a pointer copy. Arrays with size 1 may be declared but are considered a different type from the corresponding non-array type. Because the language does not currently support pointers, the storage order of arrays
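The smearing rule for a binary operator applied to a vector and a scalar can be modeled in C by replicating the scalar first and then operating component by component. The helper names smear and mul_vector_scalar are hypothetical, and vectors are represented as plain float arrays of up to four components.

```c
#include <assert.h>

/* Replicate scalar s into an n-component vector (n = 1..4). */
void smear(float s, float *out, int n)
{
    assert(out != 0 && n >= 1 && n <= 4);
    for (int i = 0; i < n; ++i)
        out[i] = s;
}

/* vector * scalar, evaluated as vector * smear(scalar). */
void mul_vector_scalar(const float *v, float s, float *out, int n)
{
    float sv[4];
    assert(v != 0);
    smear(s, sv, n);            /* promote the scalar to a same-sized vector */
    for (int i = 0; i < n; ++i)
        out[i] = v[i] * sv[i];  /* ordinary component-wise operation */
}
```

Multiplying (1, 2, 3) by the scalar 2 this way yields (2, 4, 6), the same result a Cg expression mixing a float3 and a float would produce under the rule above.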
193. egister N, where N is in range [0..31] (C0-C31). May only be used with uniform inputs. Binding Semantics for Varying Input/Output Data. The valid binding semantics for varying input parameters in the ps_2_0 and ps_2_x profiles are summarized in Table 43. Table 43. ps_2 Varying Input Binding Semantics (binding semantics name: corresponding data, type). COLOR0: input color 0, float4. COLOR1: input color 1, float4. TEXCOORD0-TEXCOORD7: input texture coordinates, float4. The valid binding semantics for varying output parameters in the ps_2_0 and ps_2_x profiles are summarized in Table 44. Table 44. ps_2 Varying Output Binding Semantics (binding semantics name: corresponding data, type). COLOR, COLOR0: output color, float4. DEPTH: output depth, float. Options. The ps_2_x profile allows the following profile-specific options: NumTemps=<n> where 0 <= n <= 32 (default 32); NumInstructionSlots=<n> where n >= 0 (default 1024); Predication=<b> where b = 0 or 1 (default 1); ArbitrarySwizzle=<b> where b = 0 or 1 (default 1); GradientInstructions=<b> where b = 0 or 1 (default 1); NoDependentReadLimit=<b> where b = 0 or 1 (default 1); NoTexInstructionLimit=<b> where b = 0 or 1 (default 1). Limitations in this Implementation. Currently, this profile implementation has the following limitations: • Dynamic flow control is not supported in extended pixel shaders.
194. enable a specific profile. Next, you bind the program to the current state. This means that in subsequent drawing calls the program is executed for every vertex (in the case of a vertex program) and for every fragment (in the case of a fragment program). Here's how to bind a program in OpenGL: cgGLBindProgram(program); Here's how to bind a program in Direct3D: cgD3D9BindProgram(program); You can only bind one vertex and one fragment program at a time for a particular profile. Therefore, the same vertex program is executed until another vertex program is bound. Similarly, the same fragment program is executed as long as no other fragment program is bound. In OpenGL, you disable profiles by the following call: cgGLDisableProfile(CG_PROFILE_ARBVP1); Disabling a profile also disables the execution of the corresponding vertex or fragment program. Releasing Resources. When your application is ready to close, it is good programming practice to free resources that you've acquired. Because the Direct3D runtime keeps an internal reference to the Direct3D device, you must tell it to release this reference when you are done using the runtime. This is done with the following call: cgD3D9SetDevice(0); To free resources allocated for a program, call this function: cgDestroyProgram(program); To free resources allocated for a context, use this function: cgD
195. encouraged to issue a warning. Implementations may choose to recognize more general versions of the second condition (such as the variables being copy-propagated from the original inputs and outputs), but this additional generality is not required. Binding Semantics for Outputs. As shown in Table 11, there are two output binding semantics for vertex program profiles. Table 11. Vertex Output Binding Semantics (name: meaning; type; default value). POSITION: homogeneous clip-space position, fed to rasterizer; float4; undefined. PSIZE: point size; float; undefined. Profiles may define additional output binding semantics with specific behaviors, and these definitions are expected to be consistent across commonly used profiles. Fragment Program Profiles. A few features of the Cg language that are specific to fragment program profiles are required to be implemented in the same manner for all fragment program profiles. Binding Semantics for Outputs. As shown in Table 12, there are three output binding semantics for fragment program profiles. Profiles may define additional output binding semantics with specific behaviors, and these definitions are expected to be consistent across commonly used profiles. Table 12. Fragment Output Binding Semantics (name: meaning; type; default value). COLOR: RGBA output color; float4; undefined. COLOR0: Sa
196. erally are executed many more times than vertex programs. Therefore, move computation from fragment programs into vertex programs whenever possible. Recall that varying outputs from vertex programs are automatically linearly interpolated before being passed to the fragment program. There are three main cases where you can move computation from a fragment program into a vertex program:
    - The result is constant over all fragments. If the vertex shader computes a value that is the same for all vertices, so that all fragments receive the same value after interpolation, any computation that the fragment shaders do that is based solely on such values can be moved to the vertex shader, as long as it doesn't require texture map lookups or other fragment-only operations.
    - The result is linear across a triangle. If the fragment shader is computing a value that varies linearly over the face of the triangle (for example, the distance from the fragment to a light source, to be used for attenuation), the value can be computed in the vertex shader at each vertex, passed to the fragment shader, and automatically interpolated by the GPU along the way.
    - The result is nearly linear across a triangle. When a value computed by a fragment shader varies slowly over triangles, it may be an acceptable approximation to compute its value at each vertex and use its linearly interpolated value in the fragment shader. For example, the usual Gouraud shading algorithm take
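    The "linear across a triangle" case can be checked numerically on the CPU. The following is a minimal sketch in plain C rather than Cg. The type and function names (Vec3, linear_quantity, interpolate, interpolation_error) and the triangle data are illustrative assumptions, not part of the Cg toolkit, and the sketch uses simple barycentric weights without perspective correction. It evaluates a quantity that is linear in position at the three vertices, interpolates it, and compares the result with evaluating the same quantity directly at the interpolated position.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

/* A quantity that varies linearly with position: dot(p, dir) + offset. */
float linear_quantity(Vec3 p, Vec3 dir, float offset)
{
    return p.x * dir.x + p.y * dir.y + p.z * dir.z + offset;
}

/* Barycentric interpolation of a per-vertex scalar (weights must sum to 1). */
float interpolate(float a, float b, float c, float wa, float wb, float wc)
{
    float sum = wa + wb + wc;
    assert(sum > 0.999f && sum < 1.001f);
    return wa * a + wb * b + wc * c;
}

/* Absolute difference between "compute per vertex, then interpolate" and
   "compute per fragment" for one sample point inside a fixed triangle. */
float interpolation_error(float wa, float wb, float wc)
{
    const Vec3 va = { 0.0f, 0.0f, 0.0f };
    const Vec3 vb = { 4.0f, 0.0f, 0.0f };
    const Vec3 vc = { 0.0f, 2.0f, 1.0f };
    const Vec3 dir = { 0.5f, -1.0f, 2.0f };
    const float offset = 3.0f;
    Vec3 p;
    float per_vertex, per_fragment, diff;

    /* Vertex-program side: evaluate at each vertex, then interpolate. */
    per_vertex = interpolate(linear_quantity(va, dir, offset),
                             linear_quantity(vb, dir, offset),
                             linear_quantity(vc, dir, offset),
                             wa, wb, wc);

    /* Fragment-program side: evaluate at the interpolated position. */
    p.x = interpolate(va.x, vb.x, vc.x, wa, wb, wc);
    p.y = interpolate(va.y, vb.y, vc.y, wa, wb, wc);
    p.z = interpolate(va.z, vb.z, vc.z, wa, wb, wc);
    per_fragment = linear_quantity(p, dir, offset);

    diff = per_vertex - per_fragment;
    return diff < 0.0f ? -diff : diff;
}
```

    For a quantity that is only nearly linear, the same comparison would give a small nonzero error, which is the approximation the third case accepts.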
197. ertexProgram", 0);
        // Load the program
        cgGLLoadProgram(vertexProgram);
        // Create the fragment program
        fragmentProgram = cgCreateProgramFromFile(context, CG_SOURCE, "FragmentProgram.cg", fragmentProfile, "FragmentProgram", 0);
        // Load the program
        cgGLLoadProgram(fragmentProgram);
        // Grab some parameters
        position = cgGetNamedParameter(vertexProgram, "position");
        color = cgGetNamedParameter(vertexProgram, "color");
        texCoord = cgGetNamedParameter(vertexProgram, "texCoord");
        modelViewMatrix = cgGetNamedParameter(vertexProgram, "ModelViewMatrix");
        baseTexture = cgGetNamedParameter(fragmentProgram, "BaseTexture");
        someColor = cgGetNamedParameter(fragmentProgram, "SomeColor");
        // Set parameters that don't change. They can be set only once
        // because of parameter shadowing.
        cgGLSetTextureParameter(baseTexture, texture);
        cgGLSetParameter4fv(someColor, constantColor);
    }
    // Called to render the scene
    void Display()
    {
        // Set the varying parameters
        cgGLEnableClientState(position);
        cgGLSetParameterPointer(position, 3, GL_FLOAT, 0, vertexPositions);
        cgGLEnableClientState(color);
        cgGLSetParameterPointer(color, 1, GL_FLOAT, 0, vertexColors);
        cgGLEnableClientState(texCoord);
        cgGLSetParameterPointer(texCoord, 2, GL_FLOAT, 0, vertexTexCoords);
        // Set the uniform parameters that change ev
198. ery frame
        cgGLSetStateMatrixParameter(modelViewMatrix, CG_GL_MODELVIEW_PROJECTION_MATRIX, CG_GL_MATRIX_IDENTITY);
        // Enable the profiles
        cgGLEnableProfile(vertexProfile);
        cgGLEnableProfile(fragmentProfile);
        // Bind the programs
        cgGLBindProgram(vertexProgram);
        cgGLBindProgram(fragmentProgram);
        // Enable texture
        cgGLEnableTextureParameter(baseTexture);
        // Draw scene
        // ...
        // Disable texture
        cgGLDisableTextureParameter(baseTexture);
        // Disable the profiles
        cgGLDisableProfile(vertexProfile);
        cgGLDisableProfile(fragmentProfile);
        // Set the varying parameters
        cgGLDisableClientState(position);
        cgGLDisableClientState(color);
        cgGLDisableClientState(texCoord);
    }
    // Called before application shuts down
    void CgShutdown()
    {
        // This frees any runtime resource.
        cgDestroyContext(context);
    }
    OpenGL Error Reporting
    Here is the list of the CGerror errors specific to the OpenGL Cg runtime:
    - CG_PROGRAM_LOAD_ERROR: Returned when the program could not be loaded.
    - CG_PROGRAM_BIND_ERROR: Returned when the program could not be bound.
    - CG_PROGRAM_NOT_LOADED_ERROR: Returned when the program must be loaded before the operation may be used.
    - CG_UNSUPPORTED_GL_EXTENSION_ERROR: Returned when an unsupported OpenGL extension is required to perform the operation.
    Any OpenGL Cg runtime f
199. es where you can push the GPU to its limits through careful programming. The Cg language shields you from the majority of the low-level details of GPU hardware, enabling you to think about your shaders at a higher level than the low-level GPU instruction sets. However, just as an understanding of modern computer architecture (such as cache and memory hierarchy issues) is important for writing fast C and C++ code, understanding a bit about the GPU can help you write better Cg code. This appendix focuses on techniques for maximizing performance from vertex and fragment programs written in Cg and running on the NVIDIA GeForce FX architecture (specifically, the vp30, fp30, arbfp1, ps_2_0, ps_2_x, vs_2_0, and vs_2_x profiles), although many of the principles are more broadly applicable.
    Program for Vectorization
    The GPU can generally perform four arithmetic operations as quickly as it can perform a single operation. Therefore, if you have two vectors of four floating-point values:
        float4 a, b;
    you can add the two vectors together:
        float4 c = a + b;
    with no more computational expense than adding together two of their elements:
        float c = a.x + b.x;
    This has two implications for efficient programming. First, you should try to write code that naturally maps to these vector operations. If you want to add two float4 variables together, it may be substantially less efficient to write
200. ese types can be used to hold the outputs of a fragment program. Their use is strictly optional. For the ps_1 and fp20 profiles, the fragout structure is defined as follows:
        struct fragout {
            float4 col : COLOR;
        };
    The ps_2, arbfp1, and fp30 profiles have two fragment output types defined:
        struct fragout {
            half4 col : COLOR;
            float depth : DEPTH;
        };
        struct fragout_float {
            float4 col : COLOR;
            float depth : DEPTH;
        };
    Introduction to the Cg Runtime Library
    This chapter introduces the Cg Runtime Library. It assumes that you have some basic knowledge of the Cg language, as well as the OpenGL or Direct3D APIs, depending on which one you use in your applications. The first section, "Introducing the Cg Runtime" on page 43, describes the benefits of using the Cg Runtime Library and gives a brief overview of how it is used in an application to create and manage Cg programs. The next two sections, "Core Cg Runtime" on page 49 and "API-Specific Cg Runtimes" on page 72, describe the APIs composing the Cg Runtime. This chapter is primarily focused on using the Cg runtime to directly create and manage Cg programs. The following chapter, "Introduction to CgFX", describes how the runtime may also be used to create and manage Cg-based shader effects.
    Introducing the Cg Runtime
    Cg programs are lines of code that describe shading, but they need the support of applications to create
201. estroyContext(context);
    Note that destroying a context destroys all the programs it contains as well.
    Core Cg Runtime
    The core Cg runtime provides all the functions necessary to manage Cg programs from within the application. It makes no assumption about which 3D API the application uses, so any application could easily ignore the API-specific Cg runtime libraries and content itself with the core Cg runtime. The core Cg runtime is built around three main concepts (context, program, and parameter), which are represented by the CGcontext, CGprogram, and CGparameter object types. Those concepts are hierarchically related to one another: a program has several parameters, a context contains several programs and shared parameters, and the application can define several contexts. The next sections describe these three basic object types and the runtime entry points that operate on them. The three object types have some points in common:
    - The use of CGbool, which is an integer type equal to either CG_TRUE or CG_FALSE
    - The use of CGenum, which is an enumerated type used to specify various enumerated values that are not necessarily related
    - The convention that functions that return a value of type CGcontext, CGprogram, CGparameter, or const char* indicate failure by returning zero
    Core Cg Context
    The Cg runtime provides functions for creating, destroying, and quer
202. etimes useful, as in tabularizing complex functions into texture maps. Programs that are to run on the VM are declared as follows:
        float foo 4 S AMAN O A DO SON O NS E SE 8 COLOR ISIE UCIN EOS jon APP
    The POSITION semantic denotes the parameter or parameters that are initialized with the coordinates of each point at which the function is evaluated. The value passed varies from zero to one in each of the dimensions over which the function is being evaluated. The PSIZE semantic denotes the parameter that is initialized with the spacing between samples at which the function is being evaluated. Lastly, the COLOR semantic denotes which parameter or function return value holds the computed value. Thus, the function above could have been written as a void function, but with an "out float4 ret : COLOR" parameter and an assignment to ret instead of using a return statement. Given an effect file with such a program, a CGprogram handle to it can be retrieved by creating a program using the CG_PROFILE_GENERIC profile:
        CGprogram tp cgCreateProgramFromEffect effect CG_PROFILE_GENERIC Minas JUI P
    Given such a program handle, cgEvaluateProgram evaluates the program over the same one-, two-, or three-dimensional domain:
        cgEvaluateProgram(CGprogram prog, float *obuf, int ncomp, int nx, int ny, int nz);
    where prog is the CGprogram handle retrieved using cgCreateProgramFromEffect, obuf is the buffer
203. ex Subsurface Scattering lighting models. It also illustrates the use of Rim lighting and simple translucency for capturing some of the more subtle properties of skin resulting from complex, non-local lighting interactions. Finally, it shows how the various techniques can be combined to produce compelling stylized skin.
    Fig. 10 Example of Skin
    Pixel Shader Source Code for Skin
        struct fragin {
            float2 texcoords : TEXCOORD0;
            float4 shadowcoords : TEXCOORD1;
            float4 tangentToEyeMat0 : TEXCOORD4;
            float3 tangentToEyeMat1 : TEXCOORD5;
            float3 tangentToEyeMat2 : TEXCOORD6;
            float3 eyeSpacePosition : TEXCOORD7;
        };
        H itlkeyeue S laciomase i icleacs wil closes w2 EloemsS ep float costheta lost ae float3 gtemp Costnera cle vl we jp g2 g g gtemp 1 0 xxx g2 2 0 g costheta gtemp pow gtemp 1 5 xxx gtemp 1 0 xxx g2 gtemp return gtemp
        // Computes the single-scattering approximation to scattering from
        // a one-dimensional volumetric surface.
        Proart o sine lescnteci locas vy Eloses vo Eloctes a float3 g float3 albedo float thickness float win abs dot wi n float won abs dot wo n float eterm float3 result term 1L Seo wa won thickness JF result eterm albedo hgphase wo wi g win won return result
        // i is the incident ray, n is the surface normal, eta
204. ex2D(diffuseMap, IN.texCoord0.xy);
        float4 normal = 2 * (tex2D(normalMap, IN.texCoord1.xy) - 0.5);
        float3 light_vector = 2 * (IN.color.rgb - 0.5);
        float4 dot_result = saturate(dot(light_vector, normal.xyz).xxxx);
        return dot_result * diffuseTexColor;
    }
    Example 2
    struct VertexOut {
        float4 texCoord0 : TEXCOORD0;
        float4 texCoord1 : TEXCOORD1;
        float4 texCoord2 : TEXCOORD2;
        float4 texCoord3 : TEXCOORD3;
    };
    float4 main(VertexOut IN,
                uniform sampler2D normalMap,
                uniform sampler2D intensityMap,
                uniform sampler2D colorMap) : COLOR
    {
        float4 normal = 2 * (tex2D(normalMap, IN.texCoord0.xy) - 0.5);
        float2 intensCoord = float2(dot(IN.texCoord1.xyz, normal.xyz),
                                    dot(IN.texCoord2.xyz, normal.xyz));
        float4 intensity = tex2D(intensityMap, intensCoord);
        float4 color = tex2D(colorMap, IN.texCoord3.xy);
        return color * intensity;
    }
    Appendix C Nine Steps to High Performance Cg
    Writing Cg code that compiles to efficient programs requires techniques and approaches that are different from efficient programming in C, C++, or Java. While some of the basic lessons are the same (such as using efficient underlying algorithms), the hardware programming model of modern GPUs is substantially different from that of modern CPUs. This can lead to pitfalls, where you may be disappointed by your shader's performance, as well as to opportuniti
205. float2 newst = float2(dot(intermediate_coord.xyz, prevlookup.xyz), dot(str, prevlookup.xyz)); return tex2D(tex, newst); where str are texture coordinates associated with sampler tex, prevlookup is the result of a previous texture operation, and intermediate_coord are texture coordinates associated with the previous texture unit. This function can be used to generate the texm3x2pad/texm3x2tex instruction combination in all ps_1_x profiles.
        tex3D_dp3x3(sampler3D tex, float3 str, float4 intermediate_coord1, float4 intermediate_coord2, float4 prevlookup)
        texCUBE_dp3x3(samplerCUBE tex, float3 str, float4 intermediate_coord1, float4 intermediate_coord2, float4 prevlookup)
    Performs the following:
        float3 newst = float3(dot(intermediate_coord1.xyz, prevlookup.xyz),
                              dot(intermediate_coord2.xyz, prevlookup.xyz),
                              dot(str, prevlookup.xyz));
        return tex3D/CUBE(tex, newst);
    where (Table 54, ps_1_x Auxiliary Texture Functions, continued) str are texture coordinates associated with sampler tex, prevlookup is the result of a previous texture operation, intermediate_coord1 are texture coordinates associated with the n-2 texture unit, and intermediate_coord2 are texture coordinates associated with the n-1 texture unit. This function can be used to generate the texm3x3pad/texm3x3pad/texm3x3tex instruction combination in a
206. float3(myscalar, myscalar, myscalar) yield the same value. If only one swizzle character is specified, the result is a scalar, not a vector of length one. Therefore, the expression b.y returns a scalar. Care is required when swizzling a constant scalar because of ambiguity in the use of the decimal point character. For example, to create a three-vector from a scalar, use one of the following: (1).xxx or 1..xxx or 1.0.xxx or 1.0f.xxx. The size of the returned vector is determined by the number of swizzle characters. Therefore, the size of the result may be larger or smaller than the size of the original vector. For example, float2(0, 1).xxyy and float4(0, 0, 1, 1) yield the same result.
    - Matrix swizzle operator: For any matrix type of the form <type><rows>x<columns>, the notation <matrixObject>._m<row><col>[_m<row><col>][...] can be used to access individual matrix elements (in the case of only one <row><col> pair) or to construct vectors from elements of a matrix (in the case of more than one <row><col> pair). The row and column numbers are zero-based. For example:
        float4x4 myMatrix;
        float myFloatScalar;
        float4 myFloatVec4;
        // Set myFloatScalar to myMatrix[3][2]
        myFloatScalar = myMatrix._m32;
        // Assign the main diagonal of myMatrix to myFloatVec4
        myFloatVec4 = myMatrix._m00_m11_m22_m33;
    For compatibility wit
207. foo parameter of main. When the value of bar is changed by the application, the value of foo in main is set appropriately. The second class of program state assignment types is assembly code. In-line assembly is indicated using the asm keyword, with the assembly language code between braces, as in the example above. CgFX depends on having the appropriate header at the start of the assembly (!!FP1.0 for fp30, !!ARBvp1.0 for arbvp1, and so on) to determine the profile for which the code is given. Finally, vertex or fragment programs may be assigned the value NULL in the state assignment. This signifies that no such program should be used in this pass.
    Textures and Samplers
    CgFX also makes it possible to define state related to textures in the effect file. The effect file below shows an example. The full set of supported OpenGL texture state is listed in "OpenGL State" on page 129.
        sampler2D samp = sampler_state {
            generateMipMap = true;
            minFilter = LinearMipMapLinear;
            magFilter = Linear;
        };
        float4 texsimple(uniform sampler2D sampler, float2 uv : TEXCOORD0) : COLOR
        {
            return tex2D(sampler, uv);
        }
        technique TextureSimple {
            pass {
                FragmentProgram = compile arbfp1 texsimple(samp);
            }
        }
    Given this effect file, the application must take an extra step or two when setting up the texture in OpenGL. First, the application must indicate which
208. form float4x4 TextureMat, pusubirg yem loe Wiis uniform float4 Wave1, uniform float4 Wave1Origin, uniform float4 Wave2, uniform float4 Wave2Origin, const uniform float4 WaveData[5])
    {
        vert2frag OUT;
        float4 position float4 IN Position x O0 JUIN 18 Sali 3L oma Wp 1L p
        float4 normal = float4(0, 1, 0, 0);
        float dampening 1 dot position xyz position xyz 1000
        float 2 giabsmp delit 2o woe ax OF a 5e O
            float waveTime Time x WaveData i z
            float frequency = WaveData[i].z;
            float height = WaveData[i].w;
            float2 waveDir = WaveData[i].xy;
            calcWave(disp, norm, dampening, IN.Position.xyz, waveTime, height, frequency, waveDir);
            OSLELOM y POSE lomy SO
            normal xz normal xz norm
        OUT.HPosition = mul(ModelViewProj, position);
        // transform normal into eye space
        normal = mul(ModelViewIT, normal);
        normal.xyz = normalize(normal.xyz);
        // get a vector from the vertex to the eye
        float3 eyeToVert = mul(ModelView, position).xyz;
        eyeToVert = normalize(eyeToVert);
        // calculate the reflected vector for cubemap look-up
        float4 reflected = mul(TextureMat, reflect(eyeToVert, normal.xyz).xyzz);
        // output two reflection vectors for the two environment cubemaps
        OUT.TexCoord0 = reflected;
        OUT.TexCoord1 = reflected;
        Ii Calevlete a Exesmel term mote that ix 0
        float fres 1 dot eyeToVert normal xyz
        fres pow fres
209. g the state defined in the passes in a technique. The loop below demonstrates the standard approach for looping over a technique's passes and applying their states in turn.
        CGpass pass = cgGetFirstPass(technique);
        while (pass) {
            cgSetPassState(pass);
            drawGeom();
            cgResetPassState(pass);
            pass = cgGetNextPass(pass);
        }
    Each of the state assignments in a pass translates directly to an OpenGL API call. For example, "LightingEnable = true" translates to the call glEnable(GL_LIGHTING), and "LightPosition[0] = float4(10, 10, 10, 1)" translates to the call glLightfv(GL_LIGHT0, GL_POSITION, v), where v is an array of four GLfloat values. Before or after the call to cgSetPassState, the application is of course free to set other OpenGL state as desired. However, any state set before the call to cgSetPassState may be overridden by the pass. Note that if the technique containing the indicated pass has not been validated, calling cgSetPassState triggers an attempted validation of the technique. If validation fails, a runtime error results. After the geometry has been drawn, cgResetPassState resets the state that was set by the pass to the default values as specified by OpenGL. Note that it does not reset state to its values before cgSetPassState; an application that desires this behavior should either push and pop OpenGL state or should manually examine the state assignme
210. gCreateProgramFromFile(context, CG_SOURCE, "VertexProgram.cg", CG_PROFILE_VS_1_1, "VertexProgram", 0);
        CComPtr<ID3DXBuffer> byteCode;
        const char* progSrc = cgGetProgramString(vertexProgram, CG_COMPILED_PROGRAM);
        // Normally, you also grab the constants and prepend them to your
        // vertex declaration. Not shown here for brevity.
        D3DXAssembleShader(progSrc, strlen(progSrc), 0, 0, 0, &byteCode, 0);
        // If your program uses explicit binding semantics (like this one),
        // you can create a vertex declaration using those semantics.
        DWORD declaration[] = {
            D3DVSD_STREAM(0),
            D3DVSD_REG(D3DVSDE_POSITION, D3DVSDT_FLOAT3),
            D3DVSD_REG(D3DVSDE_DIFFUSE, D3DVSDT_D3DCOLOR),
            D3DVSD_REG(D3DVSDE_TEXCOORD0, D3DVSDT_FLOAT2),
            D3DVSD_END()
        };
        // Make sure the resulting declaration is compatible with the shader.
        // This is really just a sanity check.
        assert(cgD3D8ValidateVertexDeclaration(vertexProgram, declaration));
        // Create the shader handle using the declaration.
        device->CreateVertexShader(declaration, byteCode->GetBufferPointer(), &vertexShader, 0);
        // Create the pixel shader.
        fragmentProgram = cgCreateProgramFromFile(context, CG_SOURCE, "FragmentProgram.cg", CG_PROFILE_PS_1_1, "FragmentProgram", 0);
        CComPtr<ID3DXBuffer> byteCode;
        const char* progSrc = cgGetProgramString(fragmentProg
211. gh the run-time API.
    Use of Uninitialized Variables
    It is incorrect for a program to use an uninitialized variable. However, the compiler is not obligated to detect such errors, even if it would be possible to do so by compile-time data-flow analysis. The value obtained from reading an uninitialized variable is undefined. This same rule applies to the implicit use of a variable that occurs when it is returned by a top-level function. In particular, if a top-level function returns a struct, and some element of that struct is never written, then the value of that element is undefined.
    Note: Variables are not defined as being initialized to zero because this would result in a performance penalty in cases where the compiler is unable to determine if a variable is properly initialized by the programmer.
    Preprocessor
    Cg profiles must support the full ANSI C standard preprocessor capabilities: #if, #define, and so on. However, Cg profiles are not required to support macro-like #define or the use of #include directives.
    Overview of Binding Semantics
    In stream-processing architectures, data packets flow between different programmable units. On a GPU, for example, packets of vertex data flow from the application to the vertex program. Because packets are produced by one program (the application, in this case) and consumed by another (the vertex program), there must be some method for defining the interface between the two. The approach us
212. gnment is based on the context in which uniform sampler parameters and texture coordinate inputs are used together. To specify bindings between texture units and uniform parameters/texture coordinates to match their application, all sampler uniform parameters and texture coordinate inputs that are used in the program must have matching binding semantics; that is, TEXUNIT<n> may only be used with TEXCOORD<n>. Partially specified binding semantics may not work in all cases. Fundamentally, this restriction is due to the close coupling between texture samplers and texture coordinates in DirectX pixel shaders 1_X.
    Binding Semantics for Uniform Data
    If a binding semantic for a uniform parameter is not specified, then the compiler will allocate one automatically. Scalar uniform parameters may be allocated to either the xyz or the w portion of a constant register, depending on how they are used within the Cg program. When using the output of the compiler without the Cg runtime, you must set all values of a scalar uniform to the desired scalar value, not just the x component. The valid binding semantics for uniform parameters in the ps_1_x profiles are summarized in Table 51.
    Table 51 ps_1_x Uniform Input Binding Semantics
        Binding Semantics Name         Corresponding Data
        register(s0) - register(s3)    Texture unit N, where N is in range [0..3]
        TEXUNIT0 - TEXUNIT
213. gnments are limited to OpenGL state related to rendering geometric primitives. OpenGL state that is not assignable using the built-in OpenGL state manager includes the following:
    - Pixel path state, such as pixel transfer and convolution state
    - Per-vertex attributes, such as glColor or glNormal
    - Client-side state, such as vertex arrays and pixel store modes
    - Vertex and pixel buffer object state
    - Miscellaneous state for evaluators, feedback, selection, or occlusion queries
    - Texture environment (GL_COMBINE) state. Although related to rendering, it is complex and redundant with fragment color operations better specified with Cg fragment programs.
    Future enhancements may allow assignments for currently unassignable OpenGL state.
    A Brief Tutorial
    This section walks you through the sample Cg Microsoft Visual Studio workspace we have provided, along with a simple Cg program that you can use for experimentation.
    Loading the Workspace
    When you load the Cg_Simple file, your workspace should look like the image in Fig. 3.
    [Fig. 3: screenshot of the cg_simple workspace in Microsoft Visual C++]
214. h OpenGL or Direct3D. It addresses the following four issues:
    - The Cg language lets you easily express how an object should be rendered. Although current Cg profiles describe only a single rendering pass, many shading techniques, such as shadow volumes or shadow maps, require more than one rendering pass.
    - Many applications need to target a wide range of graphics hardware functionality and performance. Thus, versions of shaders that run on older hardware and versions that aid performance for distant objects are important.
    - Each Cg program typically targets a single profile and doesn't specify how to fall back to other profiles, to assembly-language shaders, or to fixed-function vertex or fragment processing.
    - To generate images with Cg programs, some information about their environment is needed. For instance, some programs might require alpha blending to be turned on and depth writes to be disabled. Others may need a certain texture format to work correctly. This information is not present in standard Cg source files.
    Techniques
    Each CgFX file usually presents a certain effect that the shader author is trying to achieve, such as bump mapping, environment mapping, or anisotropic lighting. The CgFX file contains one or more techniques, each of which describes a way to achieve the effect. Each technique usually targets a certain level of GPU functionality, so a
215. h the D3DMatrix data type, Cg also allows one-based swizzles, using a form with the m omitted after the _ symbol: <matrixObject>._<row><col>[_<row><col>][...] In this form, the indexes for <row> and <col> are one-based, rather than the C standard zero-based. So the two forms are functionally equivalent:
        float4x4 myMatrix;
        float4 myVec;
        // These two statements are functionally equivalent:
        myVec = myMatrix._m00_m23_m11_m31;
        myVec = myMatrix._11_34_22_42;
    Because of the confusion that can be caused by the one-based indexing, use of the latter notation is strongly discouraged. The matrix swizzles may only be applied to matrices. When multiple components are extracted from a matrix using a swizzle, the result is an appropriately sized vector. When a swizzle is used to extract a single component from a matrix, the result is a scalar.
    - The write mask operator: It can only be applied to an lvalue that is a vector. It allows assignment to particular elements of a vector or matrix, leaving other elements unchanged. The only restriction is that a component cannot be repeated.
    Arithmetic Precision and Range
    Some hardware may not conform exactly to IEEE arithmetic rules. Fixed-point data types do not have IEEE-defined rules. Optimizations are allowed to produce slightly different results than unoptimized code. Constant folding must be done with approximately the
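    The equivalence between the zero-based (_m<row><col>) and one-based (_<row><col>) matrix swizzle forms can be illustrated on the CPU. The following is a minimal sketch in plain C, not Cg; matrix_swizzle is a hypothetical helper written for this illustration, and it handles only 4x4 matrices and swizzles of at most four components. It decodes a swizzle string in either form and gathers the named elements.

```c
#include <assert.h>

/* Gathers the elements named by a matrix swizzle string such as
   "_m00_m23" (zero-based) or "_11_34" (one-based) into out[].
   Returns the number of components extracted. */
int matrix_swizzle(float m[4][4], const char *swizzle, float out[4])
{
    const char *p = swizzle;
    int count = 0;

    while (*p != '\0') {
        int zero_based, row, col;

        assert(*p == '_');              /* every component starts with '_' */
        ++p;
        zero_based = (*p == 'm');       /* 'm' selects zero-based indexing */
        if (zero_based)
            ++p;

        assert(p[0] >= '0' && p[0] <= '9');
        assert(p[1] >= '0' && p[1] <= '9');
        row = p[0] - '0';
        col = p[1] - '0';
        if (!zero_based) {              /* convert one-based to zero-based */
            --row;
            --col;
        }

        assert(row >= 0 && row < 4 && col >= 0 && col < 4);
        assert(count < 4);
        out[count++] = m[row][col];
        p += 2;
    }
    return count;
}
```

    With this helper, "_m00_m23_m11_m31" and "_11_34_22_42" select the same four elements, which mirrors the two equivalent statements in the excerpt.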
216. half angle vector
        float3 eyeVec = float3(0.0, 0.0, 1.0);
        float3 halfVec = normalize(lightVec + eyeVec);
        // Calculate diffuse component
        float diffuse = dot(normalVec, lightVec);
        // Calculate specular component
        float specular = dot(normalVec, halfVec);
        // Use the lit function to compute lighting vector from
        // diffuse and specular values
        float4 lighting = lit(diffuse, specular, 32);
        // Blue diffuse material
        float3 diffuseMaterial = float3(0.0, 0.0, 1.0);
        // White specular material
        float3 specularMaterial = float3(1.0, 1.0, 1.0);
        // Combine diffuse and specular contributions and
        // output final vertex color
        OUT.Color.rgb = lighting.y * diffuseMaterial + lighting.z * specularMaterial;
        OUT.Color.a = 1.0;
        return OUT;
    }
    Definitions for Structures with Varying Data
    The first thing to notice is the definitions of structures with binding semantics for varying data. Let's take a look at the appin structure:
        // define inputs from application
        struct appin {
            float4 Position : POSITION;
            float4 Normal : NORMAL;
        };
    This structure contains only two members: Position and Normal. Because this data varies per vertex, the binding semantics POSITION and NORMAL tell the compiler that the position information is associated with the predefined attribute POSITION and that the normal information is associated with the predefined attribute NORMAL. The other
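    To make the lit call in the shader above more concrete, here is a minimal CPU-side sketch in plain C of the coefficients it is generally documented to return: x is 1, y is the diffuse term clamped at zero, z is the specular term raised to the exponent (and zero when the diffuse term is not positive), and w is 1. The names Float4, ipow and lit_sketch are illustrative assumptions, and the sketch takes an integer exponent so that it needs no math library; the real lit accepts a floating-point exponent.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Float4;

/* Integer power by repeated multiplication. */
float ipow(float base, unsigned exponent)
{
    float result = 1.0f;
    while (exponent-- > 0u)
        result *= base;
    return result;
}

/* Sketch of the lighting coefficients: (1, diffuse, specular, 1). */
Float4 lit_sketch(float n_dot_l, float n_dot_h, unsigned shininess)
{
    Float4 r;
    assert(shininess > 0u);
    r.x = 1.0f;                                  /* ambient coefficient */
    r.y = n_dot_l > 0.0f ? n_dot_l : 0.0f;       /* clamped diffuse term */
    r.z = (n_dot_l > 0.0f && n_dot_h > 0.0f)     /* specular only when lit */
              ? ipow(n_dot_h, shininess)
              : 0.0f;
    r.w = 1.0f;
    return r;
}
```

    In the shader, lighting.y then scales the diffuse material and lighting.z scales the specular material, so a surface facing away from the light receives neither contribution.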
217. have a cgD3D9 prefix. Because most of the functions are identical between the two runtimes, we describe the Direct3D 9 Cg runtime with the understanding that the description applies to the Direct3D 8 Cg runtime as well, unless otherwise indicated. The same prefix convention used for the function names is also used for the type names, macro names, and enumerant values.
    Header Files
    Here is how to include the core Cg runtime API into your C or C++ program:
        #include <Cg/cg.h>
    Here is how to include the OpenGL Cg runtime API:
        #include <Cg/cgGL.h>
    Here is how to include the Direct3D 9 Cg runtime API:
        #include <Cg/cgD3D9.h>
    And here is how to include the Direct3D 8 Cg runtime API:
        #include <Cg/cgD3D8.h>
    Creating a Context
    A context is a container for multiple Cg programs. It holds the Cg programs as well as their shared data. Here's how to create a context:
        CGcontext context = cgCreateContext();
    Compiling a Program
    Compile a Cg program by adding it to a context with cgCreateProgram:
        CGprogram program = cgCreateProgram(context, CG_SOURCE, myVertexProgramString, CG_PROFILE_ARBVP1, "main", args);
    CG_SOURCE indicates that myVertexProgramString, a string argument, contains Cg source code, not precompiled object code. Indeed, the Cg runtime also lets you create a program from precompiled object code, if you want to. CG_PROFILE_ARBVP1 is the profile the program is to be compiled to. The
218. he Cg Runtime Library
    Overview of the Cg Runtime
    The Cg runtime API consists of three parts (Fig. 2):
    - A core set of functions and structures that encapsulates the entire functionality of the runtime
    - A set of functions specific to OpenGL built on top of the core set
    - A set of functions specific to Direct3D built on top of the core set
    To make it easier for application writers, the OpenGL and Direct3D runtime libraries adopt the philosophy and data structure style of their respective API.
    Fig. 2 The Parts of the Cg Runtime API
    The rest of the section provides instructions for using the Cg runtime in the framework of an application. Each step includes source code for OpenGL and Direct3D programming. Functions that involve only pure Cg resource management belong to the core runtime and have a cg prefix. In these cases, the same code is used for OpenGL and Direct3D. When functions from the OpenGL or Direct3D Cg runtimes are used, notice that the API name is indicated by the function name. Functions belonging to the OpenGL Cg runtime library have a cgGL prefix, and functions in the Direct3D Cg runtime library have a cgD3D prefix. There are actually two Direct3D Cg runtime libraries: one for Direct3D 8 and one for Direct3D 9. Functions belonging to the Direct3D 8 Cg runtime have a cgD3D8 prefix, and functions belonging to the Direct3D 9 Cg runtime
219. he depth output in ps_1_3.
    - % is not supported.
    - Ternary ?: is supported if the boolean test expression is a compile-time boolean constant, a uniform scalar boolean, or a scalar comparison to a constant value in the range [-0.5, 1.0] (for example, a > 0.5 ? b : c).
    - do, for, and while loops are supported only when they can be completely unrolled.
    - Arrays, vectors, and matrices may be indexed only by compile-time constant values or index variables in loops that can be completely unrolled.
    - The discard statement is not supported. The similar but less general clip() function is supported.
    - The use of an allocation-rule-identifier for an input or output struct is optional.
    Standard Library Functions
    Because the DirectX pixel shader 1_X profiles have limited capabilities, not all of the Cg standard library functions are supported. Table 49 presents the Cg standard library functions that are supported by these profiles. See the standard library documentation for descriptions of these functions.
    Table 49 Supported Standard Library Functions
        dot(floatN, floatN)
        lerp(floatN, floatN, floatN)
        lerp(floatN, floatN, float)
        tex1D(sampler1D, float)
        tex1D(sampler1D, float2)
        tex1Dproj(sampler1D, float2)
        tex1Dproj(sampler1D, float3)
        tex2D(sampler2D, float2)
        tex2D(sampler2D, float3)
        tex2Dproj(sampler2D, float
220. he following swizzles are allowed: .x/.r, .y/.g, .z/.b, .w/.a, .xy/.rg, .xyz/.rgb, .xyzw/.rgba, .xxx/.rrr, .yyy/.ggg, .zzz/.bbb, .www/.aaa, .xxxx/.rrrr, .yyyy/.gggg, .zzzz/.bbbb, .wwww/.aaaa. Matrix swizzles are not supported. Boolean operators other than <, <=, >, and >= are not supported. Furthermore, <, <=, >, and >= are only supported as the condition in the ?: operator. Bitwise integer operators are not supported. / is not supported unless the divisor is a non-zero constant or it is used to compute the depth output. % is not supported. Ternary ?: is supported if the boolean test expression is a compile-time boolean constant, a uniform scalar boolean, or a scalar comparison to a constant value in the range [-0.5, 1.0] (for example, a > 0.5 ? b : c). do, for, and while loops are supported only when they can be completely unrolled. Arrays, vectors, and matrices may be indexed only by compile-time constant values or index variables in loops that can be completely unrolled. The discard statement is not supported; the similar but less general clip() function is supported. The use of an allocation-rule identifier for an input or output struct is optional. Standard Library Functions: Because the fp20 profile has limited capabilities, not all of the Cg standard library functions are supported. The Cg standard library functions that ar
221. he interface. The main program takes an unsized array of Light interface objects, loops over them, and returns the sum of the values returned by their respective value() methods: interface Light { float4 value(); }; struct SpotLight : Light { float4 value() { ... } }; float4 main(uniform Light l[]) : COLOR { float4 v = float4(0, 0, 0, 0); for (int i = 0; i < l.length; ++i) v += l[i].value(); return v; } Recall that all uniform parameters to the program must have expressions in the parenthesized list in the compile statement, and therefore one expression is necessary here for the one parameter. The first way that main() can be compiled is to give the name of an effect parameter that resolves both the actual size of the array and the concrete type that implements the Light interface: SpotLight spots[...]; technique { pass { FragmentProgram = compile arbfp1 main(spots); } } Alternatively, the application can leave the resolution of the concrete types and array size until later, so that they can be set via Cg runtime calls from the application. This was the usual approach before CgFX 1.4. For this case, the expression passed to the compile statement should just be an unsized array of the abstract interface type: Light lights[]; technique { pass { FragmentProgram = compile arbfp1 main(lights); } } Running Cg Programs on the CPU: There are
222. ic Profile Sample Shaders flost3 tU mul m IN T Eloato sxtU aula INS next bone i IN Indices y create 3x3 version of bone m _m00_m01_m02 Bones i _m00_m01_m02 m _m10_m11_m12 Bones i _m10_m11_m12 Tier Omer leer Bone AA Omer IA float3 posl mul Bones i tempPos I tzanstomn S UT Sx float3 sl mulim INS logics el il Gm dS 1 p tiloacs ssel dl Gm I N SIN E final blending li blemil m Ey Su float3 finalS sO IN Weights x sl IN Weights y Eloats Esa t0 IN Weights x tl IN Weights y float3 finalSxT sxt0 IN Weights x sxt1 IN Weights y blend between the two positions float3 finalPos pos0 IN Weights x posl1 IN Weights y float3x3 worldToTangentSpace worldToTangentSpace _m00_m01_m02 finalS worldToTangentSpace _m10_m11_m12 finalT worldToTangentSpace _m20_m21_m22 c 3Ligvel IL Sean e float3 tangentLight normalize mul worldToTangentSpace LightVec secale emd bias ech bie ut embleme tangentLight tangentLight 1 0 0 5 0 2 create float4 with 1 0 alpha float4 tempLight tempLight xyz tangentLight xyz tempLight w 1 0 OUT Color0 tempLight 808 00504 0000 006 219 NVIDIA Cg Language Toolkit 220 808 00504 0000 006 NVIDIA Appendix A Cg Language Specification Language Overview The Cg language is primarily modeled on ANSI C but adopts some ideas fro
223. ic data types are float, half, and fixed. Fragment profiles are required to support all three data types, but may choose to implement half and fixed at float precision. Vertex profiles are required to support half and float, but may choose to implement half at float precision. Vertex profiles may omit support for fixed operations, but must still support definition of fixed variables. Cg allows profiles to omit run-time support for int. Cg allows profiles to treat double as float. Many operators support per-element vector operations. The ?:, ||, &&, !, and comparison operators can be used with bool four-vectors to perform four conditional operations simultaneously. The side effects of all operands to the ?:, ||, and && operators are always executed. Non-static global variables, and parameters to top-level functions such as main(), may be designated as uniform. A uniform variable may be read and written within a program, just like any other variable. However, the uniform modifier indicates that the initial value of the variable or parameter is expected to be constant across a large number of invocations of the program. A new set of sampler* types represents handles to texture objects. Functions may have default values for their parameters, as in C++. These defaults are expressed using assignment syntax. Function overloading is supported. There
224. iform sampler2D normalMap COLOR float4 diffuseTexColor tex2D diffuseMap IN texCoord0 xy float4 normal 2 tex2D normalMap IN texCoordl xy 0 5 flog lagi wector 2 Mealor eo 0 5 float4 dot result saturate dot stg hivavie cols morma 9 2 cse n return dot_result diffuseTexColor Example 2 struct VertexOut float4 texCoord0 TEXCOORDO loan mises CO SC amen OOD lee float4 texCoord2 TEXCOORD2 suits Mte C oco MEETUPS ORD or y float4 main VertexOut IN uniform sampler2D normalMap uniform sampler2D intensityMap uniform sampler2D colorMap COLOR float4 normal 2 tex2D normalMap IN texCoord0 xy 0 5 float2 intensCoord float2 dot IN texCoordl xyz normal xyz dot IN texCoord2 xyz normal xyz float4 intensity tex2D intensityMap intensCoord float4 color tex2D colorMap IN texCoord3 xy recta colar iaeteas iy 808 00504 0000 006 295 NVIDIA Cg Language Toolkit DirectX Vertex Shader 2 x Profiles vs 2 Overview Memory The DirectX Vertex Shader 2 0 profiles are used to compile Cg source code to DirectX 9 VS 2 0 vertex shaders and DirectX 9 VS 2 0 Extended vertex shaders Q Profile names vs_2_0 for DirectX 9 VS 2 0 vertex shaders vs_2_x for DirectX 9 VS 2 0 extended vertex shaders Q How to invoke Use the compiler options profile vs_2_0 profile vs_2_x This section describes how using the vs_2_0 and vs 2 x
225. ightModelAmbient float4 1 0 LightAmbient ndx float4 1 0 ndx must be greater or equal to 0 and less than the value of GL MAX LIGHTS LightConstantAttenuation float Same as LightAmbient ndx LightDiffuse ndx float4 Same as LightAmbient LightLinearAttenuation float Same as LightAmbient ndx LightPosition ndx float4 Same as LightAmbient LightQuadraticAttenuation float Same as LightAmbient ndx LightSpecular ndx float4 Same as LightAmbient LightSpotCutoff ndx float Same as LightAmbient LightSpotDirection ndx float3 Same as LightAmbient 808 00504 0000 006 NVIDIA 133 Cg Language Toolkit Table 6 CgFX OpenGL State Manager States continued State Name Type Valid Enumerants Requires Light SpotExponent float Same as LightAmbient ndx LightModelColorControl int SingleColor OpenGL 1 2 or SeparateSpecular EXT_separate_ specular_color LineStipple int2 1 0 LineWidth float 1 0 LogicOp int Clear And 1 0 AndReverse Copy AndInverted Noop Xor Or Nor Equiv Invert OrReverse CopyInverted Nand Set MaterialAmbient float4 1 0 MaterialDiffuse float4 1 0 MaterialEmission float4 1 0 MaterialShininess float 1 0 MaterialSpecular float4 1 0 ModelViewMatrix float4x4 1 0 PointDistanceAttenuation float3 1 4 ARB point parameters or EXT point parameters PointFadeThresholdSize float 1 4 ARB point parameters or EXT point parameters
226. interface s reference to that texture so it can be destroyed and the Direct3D device can be reset from a lost state Later after resetting the Direct3D device and recreating the texture it needs to be re bound to the sampler parameter For example IDirect3DDevice9 device Initialized elsewhere IDirect3DTexture9 myDefaultPoolTexture CGprogram program void OneTimeLoadScene Load the program with cgD3D9LoadProgram and enable parameter shadowing 1 E EA cgD3D9LoadProgram program TRUE 0 0 0 Ferry Bind sampler parameter GCparameter parameter parameter cgGetParameterByName program MySampler cgD3D9SetTexture parameter myDefaultPoolTexture void OnLostDevice First release all necessary resources PrepareForReset Next actually reset the Direct3D devic device ese aan e NF Finally recreate all those resource OnReset void PrepareForReset Pe oso El Releas xpanded interface referenc cgD3D9SetTexture mySampler 0 Release local reference and any other references to the texture myDefaultPoolTexture gt Release PES naue fU 808 00504 0000 006 99 NVIDIA Cg Language Toolkit void OnReset Recreate myDefaultPoolTexture in D3DPOOL_DEFAULT Pe soa El Since the texture was just recreated it must be re bound to the parameter GCparameter parameter parameter cgGetParameterByName prog
227. ions such as += are also supported when the corresponding arithmetic operator is supported by Cg. Conditional Operator (?:): If the first operand is of type bool, one of the following statements must hold for the second and third operands: both operands have compatible structure types; both operands are scalars with numeric or bool type; both operands are vectors with numeric or bool type, where the two vectors are of the same size, which is less than or equal to four. If the first operand is a packed vector of bool, then the conditional selection is performed on an elementwise basis. Both the second and third operands must be numeric vectors of the same size as the first operand. Unlike C, side effects in the expressions in the second and third operands are always executed, regardless of the condition. Miscellaneous Operators (typecast, comma): Cg supports C's typecast and comma operators. Reserved Words: The following are the reserved words in Cg: asm, asm_fragment, auto, bool, break, case, catch, char, class, column_major, compile, const, const_cast, continue, decl, default, delete, discard, do, double, dword, dynamic_cast, else, emit, enum, explicit, extern, false, fixed, float, for, friend, get, goto, half, if, in, inline, inout, int, interface, long, matrix, mutable, namespace, new, operator, out, packed, pass, pixelfragment, pixelshader, private, protected, pu
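The elementwise form of the conditional operator described above can be illustrated in plain C. This is only a sketch of the semantics, not Cg code: the types `Bool4` and `Float4` and the function `select4` are hypothetical names introduced for the illustration. Because C evaluates all function arguments before the call, the helper also mirrors the rule that both the second and third operands are always evaluated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for Cg's bool4 and float4 vector types. */
typedef struct { bool v[4]; } Bool4;
typedef struct { float v[4]; } Float4;

/* Per-component selection: result[i] = cond[i] ? if_true[i] : if_false[i].
 * Both value arguments are fully evaluated by the caller before this runs. */
Float4 select4(Bool4 cond, Float4 if_true, Float4 if_false)
{
    Float4 out;
    for (int i = 0; i < 4; ++i)
        out.v[i] = cond.v[i] ? if_true.v[i] : if_false.v[i];
    return out;
}
```

With a condition of (true, false, true, false), the result takes components 0 and 2 from the second operand and components 1 and 3 from the third.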
228. is destroyed only when all references to it are removed, the application should call cgD3D9SetDevice() with zero as an input when it is done with a Direct3D device, so that it gets destroyed when the application shuts down. Otherwise, Direct3D does not shut down properly and reports memory leaks to the debug console. Note that calling cgD3D9SetDevice() with zero as an input does not affect the Cg core runtime resources in any way: all the related core runtime handles (of type CGprogram, CGparameter, and so on) remain valid. If you call cgD3D9SetDevice() a second time with a different device, all programs managed by the old device are rebuilt using the new device. Responding to Lost Direct3D Devices: The expanded interface may hold references to Direct3D resources that need to be recreated in response to a lost device. In particular, certain sampler parameters might need to be released before a Direct3D device can be reset from a lost state. The expanded interface is holding a reference to a texture that needs to be reset in response to a lost device if both of the following are true for a texture: it was created in the D3DPOOL_DEFAULT pool; and it was bound to a sampler parameter, using cgD3D9SetTexture(), of a program for which parameter shadowing is enabled. In this case, the parameter must be set to zero using cgD3D9SetTexture() to remove the expanded
229. is only visible when an application passes parameters to a vertex or fragment program Therefore the compiler is currently free to allocate temporary variables as it sees fit The declaration and use of arrays of arrays is in the same style as in C That is if the 2D array A is declared as loew AJANTA then the following statements are true Q The array is indexed as A row column Q The array can be built with a constructor using A i ALO 101 ALt ALO 2 ALOlI3I A 1 0 All 1 ALII 2 A 1 31 A 2 0 AI21 1 Al2 2 A 2 31 AIST tol ASTID Allil s m Q A 0 is equivalent to A 0 0 A 0 1 A 0 2 A 0 3 Support must be provided for any struct containing arrays Minimum Array Requirements Profiles are required to provide partial support for certain kinds of arrays This partial support is designed to support vectors and matrices in all profiles For vertex profiles it is additionally designed to support arrays of light state indexed by light number passed as uniform parameters and arrays of skinning matrices passed as uniform parameters Profiles must support subscripting copying and swizzling of vectors and matrices However subscripting with run time computed indices is not required to be supported Vertex profiles must support the following operations for any non packed array that is a uniform parameter to the program or is an element of a 238 808 00504 0000 006
230. is the ratio of indices of refraction r is the reflected ray is the transmitted ray 176 808 00504 0000 006 NVIDIA Advanced Profile Sample Shaders loe fresnel loss al Eloeis im lose Elica out logars r out Eloato t jJ float result float cui float sz loat Elec Refraction vector courtesy Paul Heckbert cl dot i n cs 1 0 eta eus 4 O cilci p celes los es2 gt 0 0 ig titlag Gtar cl seiet es2 a Start y le ws ilssSevohy unit lemojela Cie 1 0 10 0 Compute Fresnel terms From Global Illumination Compendeum Plot reXoloxtic P filogie Cosic_cliw cosils itle Coar eiv COSL close Esp flog tor loge lez meore lou m E y dosis chiy cosl meote ely cosa bw sos el Teitt is Cesie_chiv Cosi eta os COS IES SIS so cosl_ciwy_ cos eta cosi civ coss etaj fg o top ki 0 5 EE SE pol E resulti telag se Lo cr ka P r reflect i n return result float4 main fragin In uniform sampler2D tex0 uniform sampler2D texl uniform sampler2D tex2 uniform sampler2D tex3 uniform float3 eyeSpaceLightPosition uniform float thickness 808 00504 0000 006 177 NVIDIA Cg Language Toolkit uniform float4 ambient COLOR float bscale In tangentToEyeMat0 w float eta 1 0 1 4 If ratio OH UACLGSS Qu refraccion museum float m 34 specular exponent tloatd ilagincColoe
231. jective depth compare. tex3D(sampler3D tex, float3 s): 3D nonprojective. tex3D(sampler3D tex, float3 s, float3 dsdx, float3 dsdy): 3D nonprojective with derivatives. tex3Dproj(sampler3D tex, float4 szq): 3D projective depth compare. texCUBE(samplerCUBE tex, float3 s): cubemap nonprojective. texCUBE(samplerCUBE tex, float3 s, float3 dsdx, float3 dsdy): cubemap nonprojective with derivatives. texCUBEproj(samplerCUBE tex, float4 sq): cubemap projective. In the table, the name of the second argument to each function indicates how its values are used when performing the texture lookup: s indicates a 1-, 2-, or 3-component texture coordinate; z indicates a depth comparison value for shadowmap lookups; q indicates a perspective value, and is used to divide the texture coordinate (s) before the texture lookup is performed. For convenience, the standard library also defines versions of the texture functions prefixed with h4, such as h4tex2D(), that return half4 values, and prefixed with x4, such as x4tex2D(), that return fixed4 values. When the texture functions that allow specifying a depth comparison value are used, the associated texture unit must be configured for depth-compare texturing. Otherwise, no depth comparison is actually performed. Derivative Functions: Table 4, Derivative Functions, presents the derivative functions that are
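The role of q in the projective lookups can be shown with a small C sketch. It only models the coordinate division that the text describes, for a 2D coordinate; the type `TexCoord2` and the function `project_texcoord` are hypothetical names, and the actual texture fetch is left out.

```c
#include <assert.h>

/* Hypothetical 2D texture coordinate after the perspective divide. */
typedef struct { float s, t; } TexCoord2;

/* Divide the texture coordinate (s, t) by the perspective value q,
 * as a projective lookup does before the texture is sampled. */
TexCoord2 project_texcoord(float s, float t, float q)
{
    assert(q != 0.0f);              /* the divide is undefined for q == 0 */
    TexCoord2 out = { s / q, t / q };
    return out;
}
```

For example, the coordinate (1, 2) with q = 4 is looked up at (0.25, 0.5).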
232. l functions are for multiplying matrices by vectors, and matrices by matrices. Matrix by column-vector multiply (matrix-column vector): mul(M, v); Row vector by matrix multiply (row vector-matrix): mul(v, M); Matrix by matrix multiply (matrix-matrix): mul(M, N); It is important to use the correct version of mul(). Otherwise, you are likely to get unexpected results. More detail on the mul() functions is provided in "Cg Standard Library Functions" on page 33. Vector Constructor: Cg allows vectors (up to size 4) to be constructed using the following notation: y load 3 0 2560 150 1 0 The vector constructor can appear anywhere in an expression. Furthermore, vectors can be constructed from smaller vectors: MA amp sss float b tloatst a 0 0 1 0 Boolean and Comparison Operators: Cg includes three of the standard C boolean operators: && (logical AND), || (logical OR), ! (logical negation). In C, these operators consume and produce values of type int, but in Cg they consume and produce values of type bool. This difference is not normally noticeable, except when declaring a variable that will hold the value of a boolean expression. Cg also supports the C comparison operators, which produce values of type bool: < (less than), <= (less than or equal to), != (inequality), == (equality), >= (greater than or equal to), >
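Why the choice between mul(M, v) and mul(v, M) matters can be seen in a plain C sketch. The two helpers below are hypothetical analogues written for illustration, using a row-major 3x3 array; they are not the Cg library functions themselves.

```c
#include <assert.h>
#include <stddef.h>

/* 3x3 matrix stored row-major: m[row][col]. */
typedef struct { float m[3][3]; } Mat3;
typedef struct { float v[3]; } Vec3;

/* Analogue of mul(M, v): v is a column vector,
 * result[r] = sum over c of M[r][c] * v[c]. */
Vec3 mul_matrix_column(const Mat3 *M, const Vec3 *v)
{
    assert(M != NULL && v != NULL);
    Vec3 out;
    for (int r = 0; r < 3; ++r) {
        out.v[r] = 0.0f;
        for (int c = 0; c < 3; ++c)
            out.v[r] += M->m[r][c] * v->v[c];
    }
    return out;
}

/* Analogue of mul(v, M): v is a row vector,
 * result[c] = sum over r of v[r] * M[r][c]. */
Vec3 mul_row_matrix(const Vec3 *v, const Mat3 *M)
{
    assert(M != NULL && v != NULL);
    Vec3 out;
    for (int c = 0; c < 3; ++c) {
        out.v[c] = 0.0f;
        for (int r = 0; r < 3; ++r)
            out.v[c] += v->v[r] * M->m[r][c];
    }
    return out;
}
```

With v = (1, 0, 0), the first helper returns the first column of M and the second returns the first row, so the two forms agree only when M is symmetric. Swapping the argument order is equivalent to multiplying by the transpose.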
233. l bits set to 0 corresponds to the value -128/127 and a representation with all bits set to 1 corresponds to +127/127. The four signed integers are then packed into a single 32-bit result. This operation may be reversed using the unpack_4byte function. C Pseudocode: ib.x = round(127.0 * clamp(a.x, -128/127.0, 127/127.0) + 128.0); ib.y = round(127.0 * clamp(a.y, -128/127.0, 127/127.0) + 128.0); ib.z = round(127.0 * clamp(a.z, -128/127.0, 127/127.0) + 128.0); ib.w = round(127.0 * clamp(a.w, -128/127.0, 127/127.0) + 128.0); result = (ib.w << 24) | (ib.z << 16) | (ib.y << 8) | ib.x; unpack_4byte: half4 unpack_4byte(float a). Unpacks four 8-bit integers from a and scales the results into individual 16-bit floating-point values between -128/127 and +127/127. C Pseudocode: result.x = (((a >> 0) & 0xFF) - 128) / 127.0; result.y = (((a >> 8) & 0xFF) - 128) / 127.0; result.z = (((a >> 16) & 0xFF) - 128) / 127.0; result.w = (((a >> 24) & 0xFF) - 128) / 127.0; pack_4ubyte: float pack_4ubyte(float4 a); float pack_4ubyte(half4 a). Converts the four components of a into 8-bit unsigned integers. The unsigned integers are such that a representation with all bits set to 0 corresponds to 0.0 and a representation with all bits set to 1 corresponds to 1.0. The four unsigned integers are then packed into a single 32-bit result. This operation can be reversed using
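The pack_4byte and unpack_4byte pseudocode can be made concrete in C. The sketch below is an emulation under stated assumptions, not the Cg implementation: it returns the packed bits as a `uint32_t` rather than as a float, takes the four components as a plain array, and rounds by adding 0.5 and truncating, which is valid here because the biased value always lies in [0, 255]. The helper names `clamp_range` and `quantize_signed8` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp x to the closed interval [lo, hi]. */
float clamp_range(float x, float lo, float hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Map one component to a biased 8-bit code: -128/127 -> 0, +127/127 -> 255. */
uint32_t quantize_signed8(float x)
{
    float biased = 127.0f * clamp_range(x, -128.0f / 127.0f, 1.0f) + 128.0f;
    return (uint32_t)(biased + 0.5f);   /* round to nearest for values in [0, 255] */
}

/* Pack a[0..3] (x, y, z, w) into one 32-bit word, x in the low byte. */
uint32_t pack_4byte(const float a[4])
{
    assert(a != NULL);
    return (quantize_signed8(a[3]) << 24) | (quantize_signed8(a[2]) << 16) |
           (quantize_signed8(a[1]) << 8)  |  quantize_signed8(a[0]);
}

/* Reverse of pack_4byte: extract each byte, remove the bias, rescale. */
void unpack_4byte(uint32_t packed, float out[4])
{
    assert(out != NULL);
    for (int i = 0; i < 4; ++i) {
        uint32_t code = (packed >> (8 * i)) & 0xFFu;
        out[i] = ((float)code - 128.0f) / 127.0f;
    }
}
```

The values 0, 1, and -1 survive a round trip exactly, since they map to the codes 128, 255, and 1. Other inputs are quantized to the nearest multiple of 1/127.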
234. l time graphics, this format provides several key benefits: encapsulation of multiple rendering techniques, enabling fallbacks for level-of-detail, functionality, and performance; support for Cg, assembly language, and fixed-function shaders; editable parameters and GUI descriptions embedded in the file; multipass shaders; and render state and texture state specification. In practical terms, by wrapping both Cg vertex programs and Cg fragment programs together with render state, texture state, and pass information, developers can describe a complete rendering effect. Although individual Cg programs may contain the core rendering algorithms necessary for an effect, only when combined with this additional environmental information does the shader become complete and self-contained. The addition of artist-friendly GUI descriptions and fallbacks enables CgFX files to integrate well with the production workflow used by artists and programmers. CgFX encapsulates in a single text file everything needed to apply a rendering effect. This feature lets a third-party tool or another 3D application use a CgFX text file as is, with no external information other than the necessary geometry and texture data. In this sense, CgFX acts as an interchange format. CgFX allows shaders to be exchanged without the associated C++ code that is normally necessary to make a Cg program work wit
235. ldSpacePos h normalize l e float3 halfAngle normalize vertToEye LightVec X max dot LightVec worldNormal 0 0 y max dot halfAngle worldNormal 0 0 transform into homogeneous clip space mul WorldViewProj tempPos 808 00504 0000 006 191 NVIDIA Cg Language Toolkit Bump Dot3x2 Diffuse and Specular Description The bump dot3x2 diffuse and specular effect mixes bump mapping with diffuse and specular lighting based on the texm3x2tex DirectX 8 pixel shader instruction DOT_PRODUCT_TEXTURE_2D in OpenGL This instruction computes the dot product of the normal and the light vector corresponding to the diffuse light component and the dot product of the normal and the half angle vector corresponding to the specular light component This results into two scalar values that are used as texture coordinates to look up a 2D illumination texture containing the diffuse color and the specular term in its alpha component Since the normal fetched from the normal map is in tangent space both the light vector and the half angle vector are transformed to this space by the vertex shader Fig 14 Fig 14 Example of Bump Dot3x2 Diffuse and Specular 192 808 00504 0000 006 NVIDIA Basic Profile Sample Shaders Vertex Shader Source Code for Bump Dot3x2 Semuce ex f y float4 Position POSITION in object space float3 Normal NORMAL in object space float2 TexCcoord TEX
236. ler2D ColorMap color components radius irisDepth eta lensDensity uniform float4 BallData 172 808 00504 0000 006 NVIDIA Advanced Profile Sample Shaders components phongExp gloss1 gloss2 drop uniform uniform uniform uniform uniform uniform float4 GlossData float3 AmbiColor losa WalirirtColloie float3 SpecColor float3 LensColor ioar o BG COMO COLOR const half3 baseTex half3 1 0h 1 0h 1 0h const half GRADE 0 05h const half3 yAxis half3 0 0h 1 0h 0 0h const ha const ha ii lectu nadle abest Sarei nali asea wee aLa half3 pu Pip alae Sx half D avale eLa half4 pl view half3 Vn nalia NE half3 Ln naes IDa half3 mi als Dal half3 ha half ndh half spe nal ye specl specl half3 Sp aces inal skit elie half g 1f3 xAxis half3 1 0h 0 0h 0 0h ies bellet aedis 0 Ola 0 Ola 0 5 Ole y ally constants could be done in VP or on CPU sSize BallData RADIUS 0h BallData IRIS DEPTH BallData IRIS DEPTH sScale 0 3333h max 0 01h irisSize sDist BallData RADIUS BallData IRIS DEPTH fowiceinceie beller a gt halra GuxesbeysL sic 0 Ola 5 Olin E axis returns simple irisDist dot pupilCenter xAxis cs MN ROP oO sate Wome real smile aneEquation half4 xAxis D vector TO surface normalize IN OPosition IN VPosition normalize IN N IN LightVecO xyz EtaGine Distcolor
237. les supports MRTs. The MaxDrawBuffers profile option may be used to explicitly set the number of draw buffers (that is, render targets) available on the target hardware. If the input program requires more than the specified number of draw buffers, compilation fails. If the MaxDrawBuffers profile option is not specified, the stand-alone Cg compiler (cgc) assumes that the target hardware supports MRTs to whatever extent required by the input program. When compiling programs using the Cg runtime, be sure to call cgGLSetOptimalOptions() under OpenGL or call cgD3D9GetOptimalOptions() under Direct3D. These functions allow you to automatically determine the value for the MaxDrawBuffers profile option that is appropriate for the graphics hardware on the target machine. (Footnote 2: To understand the capabilities of OpenGL ARB fragment programs and the code produced by the compiler, refer to the ARB fragment program extension in the OpenGL Extensions documentation.) Resource Limits: The ARB_fragment_program specification allows an OpenGL implementation to place limits on the numbers and types of resources that a fragment program may use. If these resource limits must be exceeded to compile a Cg program, the compilation will fail. Resources that may be limited include the number of instructions, the number of registers, and the number of dependent texture reads. The arbfp1 profile suppo
238. lication" on page 106 and "Expanded Interface Direct3D 8 Application" on page 109. Expanded Interface Vertex Program: The following Cg code is assumed to be in a file called VertexProgram.cg: void VertexProgram(in float4 position : POSITION, in float4 color : COLOR0, in float4 texCoord : TEXCOORD0, out float4 positionO : POSITION, out float4 colorO : COLOR0, out float4 texCoordO : TEXCOORD0, const uniform float4x4 ModelViewMatrix) { positionO = mul(position, ModelViewMatrix); colorO = color; texCoordO = texCoord; } Expanded Interface Fragment Program: The following Cg code is assumed to be in a file called FragmentProgram.cg: void FragmentProgram(in float4 color : COLOR0, in float4 texCoord : TEXCOORD0, out float4 colorO : COLOR0, const uniform sampler2D BaseTexture, const uniform float4 SomeColor) { colorO = color * tex2D(BaseTexture, texCoord) + SomeColor; } Expanded Interface Direct3D 9 Application: The following C code links the previous vertex and fragment programs to the Direct3D 9 application: #include <cg/cg.h> #include <cg/cgD3D9.h> IDirect3DDevice9* device; // Initialized somewhere else IDirect3DTexture9* texture; // Initialized somewhere else D3DXCOLOR constantColor; // Initialized somewhere else CGcontext context; IDirect3DVertexDeclaration9* vertexDeclaration; CGprogram vertexProgram, fragmentProgram; CGparameter baseTexture, someColor, modelViewMat
239. ll ps_1_x profiles. texCUBE_reflect_dp3x3(uniform samplerCUBE tex, float4 strq, float4 intermediate_coord1, float4 intermediate_coord2, float4 prevlookup): Performs the following: float3 E = float3(intermediate_coord2.w, intermediate_coord1.w, strq.w); float3 N = float3(dot(intermediate_coord1.xyz, prevlookup.xyz), dot(intermediate_coord2.xyz, prevlookup.xyz), dot(strq.xyz, prevlookup.xyz)); return texCUBE(tex, 2 * dot(N, E) / dot(N, N) * N - E); where strq are texture coordinates associated with sampler tex, prevlookup is the result of a previous texture operation, intermediate_coord1 are texture coordinates associated with the (n-2) texture unit, and intermediate_coord2 are texture coordinates associated with the (n-1) texture unit. This function can be used to generate the texm3x3pad/texm3x3pad/texm3x3vspec instruction combination in all ps_1_x profiles. Table 54 (continued), ps_1_x Auxiliary Texture Functions: texCUBE_reflect_eye_dp3x3(uniform samplerCUBE tex, float3 str, float4 intermediate_coord1, float4 intermediate_coord2, float4 prevlookup, uniform float3 eye): Performs the following: float3 N = float3(dot(intermediate_coord1.xyz, prevlookup.xyz), dot(intermediate_coord2.xyz, prevlookup.xyz), dot(coords.xyz, prevlookup.xyz)); return texCUBE(tex, 2 * dot(N, E) / dot(N, N) * N - E); where s
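The lookup direction passed to texCUBE in both functions is the reflection of E about N, written so that N does not need to be normalized. A minimal C sketch of just that vector computation follows; `Vec3`, `dot3`, and `reflect_about_normal` are hypothetical names, and the cube map fetch itself is omitted.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

/* Three-component dot product. */
float dot3(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* R = 2 * dot(N, E) / dot(N, N) * N - E.
 * Dividing by dot(N, N) makes the result independent of the length of N. */
Vec3 reflect_about_normal(Vec3 n, Vec3 e)
{
    float nn = dot3(n, n);
    assert(nn > 0.0f);                  /* N must not be the zero vector */
    float k = 2.0f * dot3(n, e) / nn;
    Vec3 r = { k * n.x - e.x, k * n.y - e.y, k * n.z - e.z };
    return r;
}
```

For N = (0, 0, 2) and E = (1, 0, 1), the scale factor is 2 * 2 / 4 = 1, so R = (-1, 0, 1): the component of E along N is kept and the perpendicular component is negated.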
240. loating point values. The two converted components are then packed into a single 32-bit result. This operation can be reversed using the unpack_2half function. C Pseudocode: result = (((half)a.y) << 16) | (half)a.x; unpack_2half: half2 unpack_2half(float a). Unpacks a 32-bit value into two 16-bit floating-point values. C Pseudocode: result.x = (a >> 0) & 0xFFFF; result.y = (a >> 16) & 0xFFFF; pack_2ushort: float pack_2ushort(float2 a); float pack_2ushort(half2 a). Converts the components of a into a pair of 16-bit unsigned integers. The two converted components are then packed into a single 32-bit return value. This operation can be reversed using the unpack_2ushort function. C Pseudocode: us.x = round(65535.0 * clamp(a.x, 0.0, 1.0)); us.y = round(65535.0 * clamp(a.y, 0.0, 1.0)); result = (us.y << 16) | us.x; unpack_2ushort: float2 unpack_2ushort(float a). Unpacks two 16-bit unsigned integer values from a and scales the results into individual floating-point values between 0.0 and 1.0. C Pseudocode: result.x = ((a >> 0) & 0xFFFF) / 65535.0; result.y = ((a >> 16) & 0xFFFF) / 65535.0; pack_4byte: float pack_4byte(float4 a); float pack_4byte(half4 a). Converts the four components of a into 8-bit signed integers. The signed integers are such that a representation with al
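The pack_2ushort and unpack_2ushort pseudocode can be written out in C as follows. This is a sketch under assumptions: the packed bits are returned as a `uint32_t` instead of being stored in a float, the two components are passed as separate arguments, and rounding is done by adding 0.5 and truncating, which is valid because the scaled value is never negative. The helper `clamp_unit` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp x to [0, 1]. */
float clamp_unit(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* Convert x and y to 16-bit unsigned codes and pack them, x in the low half. */
uint32_t pack_2ushort(float x, float y)
{
    uint32_t ux = (uint32_t)(65535.0f * clamp_unit(x) + 0.5f);
    uint32_t uy = (uint32_t)(65535.0f * clamp_unit(y) + 0.5f);
    return (uy << 16) | ux;
}

/* Reverse of pack_2ushort: mask out each 16-bit half and rescale to [0, 1]. */
void unpack_2ushort(uint32_t packed, float *x, float *y)
{
    assert(x != NULL && y != NULL);
    *x = (float)((packed >> 0)  & 0xFFFFu) / 65535.0f;
    *y = (float)((packed >> 16) & 0xFFFFu) / 65535.0f;
}
```

Note that each half is masked with 0xFFFF; a mask of 0xFF would keep only the low 8 bits of each 16-bit value. Inputs outside [0, 1] are clamped before conversion, so they do not wrap.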
241. m modern languages such as C++ and Java, and from earlier shading languages such as RenderMan and the Stanford shading language. The language also introduces a few new ideas. In particular, it includes features designed to represent data flow in stream-processing architectures such as GPUs. Profiles, which are specified at compile time, may subset certain features of the language, including the ability to implement loops and the precision at which certain computations are performed. Silent Incompatibilities: Most of the changes from ANSI C are either omissions or additions, but there are a few potentially silent incompatibilities. These are changes within Cg that could cause a program that compiles without errors to behave in a manner different from C. The type promotion rules for constants are different when the constant is not explicitly typed using a type cast or type suffix. In general, a binary operation between a constant that is not explicitly typed and a variable is performed at the variable's precision, rather than at the constant's default precision. Declarations of struct perform an automatic typedef (as in C++) and thus could override a previously declared type. Arrays are first-class types that are distinct from pointers. As a result, array assignments semantically perform a copy operation for the entire array. Similar Operations That Must be Expressed Differen
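The first silent incompatibility can be demonstrated from the C side. In C, the untyped constant 1.0 is a double, so a float operand is promoted and the addition is carried out in double precision. The second helper below emulates the Cg rule in C by giving the constant the variable's type; it is an illustration of the rule, not Cg code, and both function names are hypothetical.

```c
#include <assert.h>

/* C rules: 1.0 is a double, so x is promoted and the sum is a double. */
double add_one_c_rules(float x)
{
    return x + 1.0;
}

/* Cg-style rule, emulated in C: the constant takes the variable's type,
 * so the sum is rounded to float. The volatile store forces that rounding
 * even on targets that evaluate float expressions in higher precision. */
float add_one_variable_precision(float x)
{
    volatile float r = x + 1.0f;
    return r;
}
```

For x = 16777216 (2^24), the double result is 16777217, while the float result stays at 16777216, because 16777217 is not representable in single precision and the tie rounds to even. Code ported from C that relies on the higher-precision intermediate can therefore change behavior without any compiler diagnostic.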
242. many situations such as tabularizing complex functions into texture maps where it is useful to execute Cg programs on the CPU and not on the GPU While the CPU path doesn t offer the same performance it can be useful because it doesn t have the resource limits associated with GPUs Programs that run on a CPU in this manner are declared like the following float foo 4 f mioara sie loewe js TEOSTON wihoeic2 clellicae 2 ASIA E COLOR 30 808 00504 0000 006 NVIDIA Introduction to the Cg Language ESC UE FOC 113037 The POSITION semantic denotes the parameter or parameters that should be set with the coordinates of each point at which the function is evaluated there is a coordinate value from zero to one for each dimension over which the function is being evaluated The PSIZE semantic denotes a parameter that should be initialized with the value of the spacing between samples at which the function is being evaluated and the COLOR semantic denotes where the result of the function should be returned Thus the function above could have been written as a void function with an out float4 ret COLOR parameter and an assignment to ret instead of the return statement Given an effect file with such a program a CGprogram handle to it can be retrieved by creating a program with the following CG_PROFILE_GENERIC profile CGprogram tp cgCreateProgramFromEffect effect CE IPINOM Iii Cla WiewuayesY ATA
243. me as COLOR. DEPTH: fragment depth value (float, in range [0,1]); default value is the interpolated depth from the rasterizer (in range [0,1]). If a program desires an output color alpha of 1.0, it should explicitly write a value of 1.0 to the w component of the COLOR output. The language does not define a default value for this output. Note: If the target hardware uses a default value for this output, the compiler may choose to optimize away an explicit write specified by the user if it matches the default hardware value. Such defaults are not exposed in the language. In contrast, the language does define a default value for the DEPTH output. This default value is the interpolated depth obtained from the rasterizer. Semantically, this default value is copied to the output at the beginning of the execution of the fragment program. Note: Although the DEPTH output is assigned a default value, as with all outputs, its value cannot be read in a Cg program. As discussed earlier, when a binding semantic is applied to an output, the type of the output variable is not required to match the type of the binding semantic. For example, the following is legal, although not recommended: struct myfragoutput { float2 mycolor : COLOR; } In such cases, the variable is implicitly copied (with a typecast) to the semantic upon program completion. If the variable s
244. meter(parameter, i)); break;
            default:
                /* Here is the code that handles the parameter */
                break;
        }
    } while ((parameter = cgGetNextParameter(parameter)) != 0);

In practice, it is usually simpler to iterate over all of the leaf parameters (that is, non-aggregate parameters) directly, using cgGetNextLeafParameter:

    CGparameter cgGetFirstLeafParameter(CGprogram program, CGenum namespace);
    CGparameter cgGetNextLeafParameter(CGparameter parameter);

These functions iterate through all the simple parameters, including structure fields and array elements, that serve as inputs to the program. Nothing is guaranteed regarding the order of the parameters in the sequence.

Direct Retrieval

Any parameter of a program can also be retrieved directly by using its name with cgGetNamedParameter:

    CGparameter cgGetNamedProgramParameter(CGprogram program, CGenum namespace, const char *name);

Here namespace may be either CG_GLOBAL or CG_PROGRAM, as above. If the program has no parameter corresponding to name, cgGetNamedParameter returns zero. The Cg syntax is used to retrieve structure fields or array elements. Let's take the following code snippet as an example:

    struct FooStruct {
        float A;
        float4 B;
    };
    struct BarStruct {
        FooStruct Foo[2];
    };
    void main(BarStruct Bar[3]) {
        // ...
    }

The following are valid names for retrieving the corresponding parameter
245. meters are managed. By far the easiest method is to enable texture management in the context:

    cgGLSetManageTextureParameters(context, CG_TRUE);

If this is done, then when the CGprogram is bound by a call to cgSetPassState, the texture parameters used are associated with the appropriate hardware texture units automatically. Alternatively, the mapping of texture parameters to hardware units can be handled explicitly by the application using the routine cgGLEnableTextureParameter:

    CGparameter progParam = cgGetNamedParameter(prog, "sampler");
    cgGLEnableTextureParameter(progParam);

However, note that it is not possible to call cgGLEnableTextureParameter with a handle to an effect's sampler parameter; the handle must be to an actual program parameter. In general, the first approach is to be preferred for its simplicity.

Interfaces and Unsized Arrays

CgFX also supports Cg's interfaces and unsized arrays features. Given an effect file with Cg programs that use these features, the compile statement can be used in two different ways to resolve the interfaces and unsized arrays so that the program can be compiled. The abstract types may be resolved using Cg code itself, or they may be resolved using the Cg runtime. Consider the following example: a Light interface has been defined, with SpotLight implementing the interface. The main program takes an
246. minimal interface 85; cgD3D8ResourceToDeclUsage 90; cgD3D8ValidateVertexDeclaration 88; cgD3D9ResourceToDeclUsage 90; cgD3D9ValidateVertexDeclaration 88; Direct3D 8 application 95; Direct3D 9 application 92; fragment program 92; type retrieval 91; vertex declaration 85; vertex declaration for Direct3D 8 86; vertex declaration for Direct3D 9 86; vertex program 91; header files 46; loading 47; modifying parameters 47; OpenGL 73; error reporting 85; OpenGL application 82; OpenGL parameter setting 74; parameter shadowing 73; program execution 48; releasing resources 49; Cg Runtime Library overview 45; Cg standard library 33; Cg Simple file 145; cgc.exe (Cg compiler) 329; cgD3D9EnableParameterShadowing 103; CGerror: Direct3D 114, OpenGL 85; cint type specification 229; command-line options, Cg compiler 329; comparison operators 248, introduction 21; compilation profiles, use of 225; compiler options: command line 329, debug 330, Dmacro 329, entry 329, h 330, pathname 329, I filename 329, longprogs 330, maxunrollcount 330, nocode 329, nofx 329, nostdlib 329, 0 329, profile 329, profileopts 329, quiet 329, strict 329, v 330; compile-time type category 232; computation frequency for performance 327; concrete type category 232; conditional code in fragment programs and performance 328; conditional operator 248; conditional operators 22; constants, typing of 232; construction operator described 244; context
247. modulate the lighting contributions with the material properties to get the final vertex color, and we assign it to the output structure's color field, OUT.Color. Finally, we set the alpha channel of the final color to 1.0, so that our object will be opaque, and return the computed position and color values stored in the OUT structure.

Further Experimentation

Use simple.cg as a framework to try more advanced experiments, perhaps by adding more parameters to the program or by performing more complex calculations in the vertex program. Have fun experimenting!

Advanced Profile Sample Shaders

This chapter provides a set of advanced profile sample shaders written in Cg. Each shader comes with an accompanying snapshot, description, and source code. Examples shown are:
- Improved Skinning
- Improved Water
- Melting Paint
- MultiPaint
- Ray-Traced Refraction
- Skin
- Thin Film Effect
- Car Paint 9

Improved Skinning

Description: This shader takes in a set of all the transformation matrices that can affect a particular bone. Each bone also sends in a list of matrices that affect it. There is then a simple loop that, for each vertex, goes through each bone that affects that vertex and transforms it. This allows just one Cg program to do the entire skinning for vertices affected by any number of bones, instead of having one pr
248. n(2); } }

    technique AsmFrag {
        pass {
            FragmentProgram = asm { ... END };
        }
    }

The most common of these three options for specifying programs is using compile statements. The first argument following the compile keyword is the name of the profile to which the program is to be compiled (for example, fp30, fp40, arbfp1, or vp20). The next argument gives the name of the function in the effect file that serves as the program entry point, followed by a list of expressions, for example (2). These expressions have a one-to-one correspondence with the uniform parameters of the program being compiled: there must be exactly one for each uniform program parameter, no more and no less. In the example above, the expression 2 sets the value for the foo parameter to main. Because it is a literal value, CgFX is able to compile the program to a particularly efficient version that just includes returning the uv value. It is also possible to include references to effect parameters in the expression used in the compile statement, for example:

    float4 main(uniform float foo, float4 uv : TEXCOORD0) : COLOR {
        return (foo > 0) ? uv : 2 * uv;
    }

    float bar;

    technique NewSimpleFrag {
        pass {
            VertexProgram = NULL;
            FragmentProgram = compile arbfp1 main(2 * bar);
        }
    }

Here the value 2 * bar is associated with the
249. n> MaxLocalParams=<n>; where 1 <= n <= 32 (default 32); where 1 <= n <= 8 (default 1); where 16 <= n <= 4096 (default 1024); where 16 <= n <= 256 (default 96).

OpenGL ARB Fragment Program Profile (arbfp1)

The OpenGL ARB Fragment Program Profile is used to compile Cg source code to fragment programs compatible with version 1.0 of the GL_ARB_fragment_program OpenGL extension.
- Profile name: arbfp1
- How to invoke: Use the compiler option -profile arbfp1

The arbfp1 profile limits Cg to match the capabilities of OpenGL ARB fragment programs. This section describes the capabilities and restrictions of Cg when using the arbfp1 profile.

Accessing OpenGL State

The arbfp1 profile supports access to OpenGL state, with the same set of state semantics provided by the arbvp1 profile. See "Accessing OpenGL State" on page 256 for more information about this feature.

MRT Support

This profile supports multiple render targets (MRTs). When MRTs are used, up to three additional four-component outputs may be written in addition to the COLOR and DEPTH outputs supported in other profiles. These new outputs are available via the output semantics COLOR1 through COLOR3. The use of MRTs is an optional feature of the ARB_fragment_program and the DirectX PixelShader 2 specifications; consequently, not all hardware that supports these profi
250. n it for editing. While you are editing simple.cg, you can press Ctrl+F7 at any time to compile it. Because of the way the project is set up, any errors in your code will be shown just as when you compile a normal C or C++ program. You can also double-click on an error, which takes you to the location in the source code that caused the error.

Understanding simple.cg

The Cg Simple application runs the shader defined in simple.cg on a torus. The provided version of simple.cg calculates diffuse and specular lighting for each vertex. A screenshot of the shader is shown in Fig. 4.

Fig. 4 The simple.cg Shader

Program Listing for simple.cg

The following is the program listing for simple.cg:

    // Define inputs from application.
    struct appin {
        float4 Position : POSITION;
        float4 Normal   : NORMAL;
    };

    // Define outputs from vertex shader.
    struct vertout {
        float4 HPosition : POSITION;
        float4 Color     : COLOR;
    };

    vertout main(appin IN,
                 uniform float4x4 ModelViewProj,
                 uniform float4x4 ModelViewIT,
                 uniform float4 LightVec)
    {
        vertout OUT;

        // Transform vertex position into homogeneous clip space.
        OUT.HPosition = mul(ModelViewProj, IN.Position);

        // Transform normal from model space to view space.
        float3 normalVec = normalize(mul(ModelViewIT, IN.Normal).xyz);

        // Store normalized light vector.
        float3 lightVec = normalize(LightVec.xyz);

        // Calculate
251. n of compiler and profile. The operations supported on a packed type in a particular profile may be different than the operations supported on the corresponding unpacked type in that same profile. Profiles may define a maximum allowable size for packed arrays, but must support at least size 4 for packed vector (one-dimensional array) types, and 4x4 for packed matrix (two-dimensional array) types.

When declaring an array of arrays in a single declaration, the packed modifier only refers to the outermost array. However, it is possible to declare a packed array of packed arrays by declaring the first level of array in a typedef using the packed keyword and then declaring a packed array of this type in a second statement. It is not possible to have a packed array of unpacked arrays.

- For any supported numeric data type TYPE, implementations must support the following packed array types, which are called vector types. Type identifiers must be predefined for these types in the global scope:

    typedef packed TYPE TYPE1[1];
    typedef packed TYPE TYPE2[2];
    typedef packed TYPE TYPE3[3];
    typedef packed TYPE TYPE4[4];

  For example, implementations must predefine the type identifiers float1, float2, float3, float4, and so on for any other supported numeric type.
- For any supported numeric data type TYPE, implementations must support the following packed array types, which are call
252. name, respectively. Similarly, there are versions of each function that retrieve any matrices in the given parameter in row-major or column-major order. These are specified using r or c, respectively. At most nvals values will be copied into the given array v. The total number of values copied into v is returned. For example, cgGetParameterValueic retrieves the values of the given parameter into the supplied array of integer data, and copies matrix data in column-major order.

The total number of values associated with a given parameter (and hence the required length of the given array) can be computed using the core Cg runtime:

    int nrows = cgGetParameterRows(param);
    int ncols = cgGetParameterColumns(param);
    int asize = cgGetArrayTotalSize(param);
    int ntotal = nrows * ncols;
    if (asize > 0) ntotal *= asize;

A similar family of entry points exists for setting a parameter's values:

    void cgSetParameterValue{i,f,d}{r,c}(CGparameter param, int nvals, type *v);

The entry points in this family are identical to those of the cgGetParameterValue family. The total number of values in a parameter may be computed as above. If nvals is less than the total size of the parameter, an error is generated.

The core Cg runtime also allows the application to query a parameter's default values:

    const double *cgGetParameterValues(CGparameter parameter, CGenum valueT
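The size computation and the row-major versus column-major distinction in excerpt 252 can be illustrated in plain C. This is a sketch under stated assumptions, not the runtime's implementation: total_values and copy_column_major are hypothetical helper names, and the source matrix is assumed to be stored row-major.

```c
/* Number of scalar values a parameter holds: rows * cols, multiplied by
   the total array size when the parameter is an array (asize > 0). */
static int total_values(int nrows, int ncols, int asize)
{
    int ntotal = nrows * ncols;
    if (asize > 0)
        ntotal *= asize;
    return ntotal;
}

/* Copy at most nvals values of one rows x cols matrix, assumed to be stored
   row-major in src, into dst in column-major order.
   Returns the number of values actually copied. */
static int copy_column_major(const float *src, int rows, int cols,
                             float *dst, int nvals)
{
    int copied = 0;
    for (int c = 0; c < cols; ++c) {
        for (int r = 0; r < rows; ++r) {
            if (copied == nvals)
                return copied;   /* destination is full */
            dst[copied++] = src[r * cols + c];
        }
    }
    return copied;
}
```

An array of three float4x4 matrices therefore needs room for total_values(4, 4, 3) values before any retrieval call is made.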
253. nction pipeline or specify a user-written vertex program. If the user wishes to mix these two approaches, it is sometimes desirable to guarantee that the position computed by the first approach is bit-identical to the position computed by the second approach. This position invariance is particularly important for multipass rendering. Support for position invariance is optional in Cg vertex profiles, but for those vertex profiles that support it, the following rules apply:

- Position invariance with respect to the fixed-function pipeline is guaranteed if two conditions are met:
  - The vertex program is compiled using a compiler option indicating position invariance (-posinv, for example).
  - The vertex program computes position as follows:

        OUT_POSITION = mul(MVP, IN_POSITION)

    where OUT_POSITION is a variable (or structure element) of type float4 with an output binding semantic of POSITION or HPOS; IN_POSITION is a variable (or structure element) of type float4 with an input binding semantic of POSITION; and MVP is a uniform variable (or structure element) of type float4x4 with an input binding semantic that causes it to track the fixed-function modelview-projection matrix. (The name of this binding semantic is currently profile-specific; for OpenGL profiles, the semantic GL_MVP is recommended.)
- If the first condition is met but not the second, the compiler is
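For readers unfamiliar with the expression mul(MVP, IN_POSITION) in excerpt 253, the following C sketch spells out the arithmetic, treating the position as a column vector. The names vec4, mat4, and mul_mat_vec are hypothetical, and the sketch says nothing about bit-exact invariance, which depends on the hardware performing the operations in the same order as the fixed-function pipeline.

```c
typedef struct { float v[4]; } vec4;
typedef struct { float m[4][4]; } mat4;   /* indexed as m[row][column] */

/* Matrix times column vector: out[r] is the dot product of row r with p. */
static vec4 mul_mat_vec(const mat4 *M, vec4 p)
{
    vec4 out;
    for (int r = 0; r < 4; ++r) {
        out.v[r] = 0.0f;
        for (int c = 0; c < 4; ++c)
            out.v[r] += M->m[r][c] * p.v[c];
    }
    return out;
}
```

With a matrix whose last column holds a translation, a point with w = 1 is shifted by that translation, which is the usual behavior of a modelview-projection transform on positions.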
254. nctions by Profile 226
    Syntax for Parameters in Function Definitions 227
    Function Calls 228
    Method Call 228
    Types 229
    Partial Support of Types 231
    Type Categories 232
    Constants 232
    Type Qualifiers 233
    Type Conversions 234
    Type Equivalency 236
    Type-Promotion Rules 236
    Namespaces 237
    Arrays and Subscripting 238
    Unsized Arrays 239
    Function Overloading 240
    Use of Uninitialized Variables 241
    Preprocessor 241
    Overview of Binding Semantics 241
    Binding Semantics 242
    Aliasing of Semantics 243
    Restrictions on Semantics Within a Structure 243
    Additional Details for Binding Semantics 243
    How Programs Receive and Return Data 243
    Statements
255. ndicates no error. When either error-fetching entry point is called, its cached error value is reset to 0.

More comprehensive error checking and handling can be achieved using Cg's error handler callback mechanism. Each time an error occurs, the core Cg runtime calls an error handler callback function optionally provided by the application. The application registers the error handler using:

    typedef void (*CGerrorHandlerFunc)(CGcontext ctx, CGerror err, void *appdata);
    void cgSetErrorHandler(CGerrorHandlerFunc func, void *data);

When an error occurs, the Cg runtime calls the specified function, passing the CGcontext in which the error occurred, the code associated with the triggering error, and a copy of the data pointer registered by the application. A typical implementation of the error handler might look like this:

    void HandleCgError(CGcontext ctx, CGerror err, void *appdata)
    {
        fprintf(stderr, "Cg error: %s\n", cgGetErrorString(err));
        const char *listing = cgGetLastListing(ctx);
        if (listing != NULL) {
            fprintf(stderr, " last listing: %s\n", listing);
        }
    }

Here is a list of some of the CGerror codes specific to the core Cg runtime:
- CG_NO_ERROR: Returned when no error has occurred.
- CG_COMPILER_ERROR: Returned when the compiler generated an error. A call to cgGetLastListing should be made to get more details on the actual compiler error.
- CG_INVALID_PARAMETER_ERROR: Returned when the parameter used is invalid
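The two mechanisms in excerpt 255 (a cached error that is cleared when fetched, and a registered callback that receives an application data pointer) can be mimicked in a few lines of C. Everything named mock_ below is hypothetical and stands in for the runtime; the context argument of the real callback is left out to keep the sketch short.

```c
#include <stddef.h>

typedef int mock_error;   /* 0 means "no error" */
typedef void (*mock_error_handler)(mock_error err, void *appdata);

static mock_error_handler g_handler = NULL;
static void *g_appdata = NULL;
static mock_error g_last_error = 0;

/* Register a callback together with an application-owned data pointer. */
static void mock_set_error_handler(mock_error_handler func, void *data)
{
    g_handler = func;
    g_appdata = data;
}

/* What the mock runtime does when something goes wrong:
   cache the code, then notify the application if a handler is set. */
static void mock_raise(mock_error err)
{
    g_last_error = err;
    if (g_handler != NULL)
        g_handler(err, g_appdata);
}

/* Fetching the error returns the cached value and resets it to 0. */
static mock_error mock_get_error(void)
{
    mock_error err = g_last_error;
    g_last_error = 0;
    return err;
}

/* Example handler: counts errors in a counter owned by the application. */
static void count_errors(mock_error err, void *appdata)
{
    (void)err;
    ++*(int *)appdata;
}
```

Passing the counter through the data pointer avoids a global variable in the application, which is the main reason callback APIs of this shape carry such a pointer.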
256. ngle precision.
- half, fixed, and double data types are treated as float. half data types can be used to specify a partial-precision hint for pixel shader instructions.
- The int data type is supported using floating-point operations.
- sampler types are supported to specify sampler objects used for texture fetches.

Statements and Operators

With the ps_2_0 profiles, while, do, and for statements are allowed only if the loops they define can be unrolled, because there is no dynamic branching in PS 2.0 shaders. In the current Cg implementation, extended ps_2_x shaders also have the same limitation. Comparison operators are allowed (>, <, >=, <=, ==, !=) and Boolean operators (||, &&, ?:) are allowed. However, the logic operators (&, |, ^, ~) are not.

Using Arrays and Structures

Variable indexing of arrays is not allowed. Array and structure data is not packed.

Bindings

Binding Semantics for Uniform Data

The valid binding semantics for uniform parameters in the ps_2_0 and ps_2_x profiles are summarized in Table 42.

Table 42 ps_2 Uniform Input Binding Semantics
    Binding Semantics Name: register(s0)-register(s15), TEXUNIT0-TEXUNIT15
    Corresponding Data: Texture unit N, where N is in range [0..15]. May only be used with uniform inputs with sampler types.
    Binding Semantics Name: register(c0)-register(c31)
    Corresponding Data: Constant r
257. ngle vector
    // fetch the illumination map using
    // the result of the two previous dot products
    // as texture coordinates
    // returns the diffuse color in the
    // color components and the specular color in the alpha component
    float2 illumCoord = float2(dot(IN.LightVector.xyz, bumpNormal.xyz),
                               dot(IN.HalfAngleVector.xyz, bumpNormal.xyz));
    float4 illumination = tex2D(IlluminationMap, illumCoord);
    // expand iterated normal to [-1,1]
    float4 normal = 2 * (IN.Normal - 0.5);
    // compute self-shadowing term
    float shadow = saturate(4 * dot(normal.xyz, IN.LightVectorUnsigned.xyz));
    // compute final color
    return (Ambient * color + shadow * (illumination * color + illumination.wwww));

Bump Reflection Mapping

Description: This effect mixes bump mapping and reflection mapping based on the texm3x3vspec DirectX 8 pixel shader instruction (DOT_PRODUCT_REFLECT_CUBE_MAP in OpenGL). This instruction computes three dot products to transform the normal fetched from the normal map into the environment cube space, reflects the transformed normal with respect to the eye vector, and fetches a cube map to get the final color. The vertex shader is responsible for computing the transform matrix and the eye vector (Fig. 15).

Fig. 15 Example of Bump Reflection Mapping

Vertex Shader Source Code fo
258. nguage. Typically, the initial value of a uniform variable or parameter is stored in a different class of hardware register. Furthermore, the external mechanism for specifying the initial value of uniform variables or parameters may be different than that used for specifying the initial value of non-uniform variables or parameters. Parameters qualified as uniform are normally treated as persistent state, while non-uniform parameters are treated as streaming data, with a new value specified for each stream record (such as within a vertex array).

Function Declarations

Functions are declared essentially as in C. A function that does not return a value must be declared with a void return type. A function that takes no parameters may be declared in one of two ways:
- As in C, using the void keyword: functionName(void)
- With no parameters at all: functionName()

Functions may be declared as static. If so, they may not be compiled as a program and are not visible from other compilation units.

Overloading of Functions by Profile

Cg supports overloading of functions by compilation profile. This capability allows a function to be implemented differently for different profiles. It is also useful because different profiles may support different subsets of the language capabilities, and because the most efficient implementation of a function may be different for different profiles.
259. nguage features, especially in fragment programs. These are referred to as basic profiles. See "Language Profiles" on page 255 for detailed descriptions of these and related profiles.

Declaring Programs in Cg

CPU code generally consists of one program specified by main() in C. In contrast, a Cg program can have any name. A program is defined using the following syntax:

    <return type> <program name>(<parameters>) [: <semantic name>] { /* ... */ }

Program Inputs and Outputs

The programmable processors in GPUs operate on streams of data. The vertex processor operates on a stream of vertices, and the fragment processor operates on a stream of fragments. A programmer can think of the main program as being executed just once on a CPU. In contrast, a program is executed repeatedly on a GPU, once for each element of data in a stream. The vertex program is executed once for each vertex, and the fragment program is executed once for each fragment.

The Cg language adds several capabilities to C to support this stream-based programming model. For new Cg programmers, these capabilities often take some time to understand because they have no direct correspondence to C capabilities. However, the sample programs later in this document demonstrate that it really is easy to use these capabilities in Cg programs.

Two Kinds of Program Inputs

A Cg program can consume two different kinds of inputs:
- Varying inputs are u
260. notation:

    float4 main(uniform Foo myfoo, uniform float myval) : COLOR {
        return myfoo.helper(myval);
    }

Note that in the current release, member variables must be declared before member functions that reference them; additionally, member functions may not be overloaded based on profile.

Arrays

Arrays are supported in Cg and are declared just as in C. Because Cg does not support pointers, arrays must always be defined using array syntax rather than pointer syntax:

    // Declare a function that accepts an array of five skinning matrices.
    returnType foo(float4x4 mymatrix[5])

Basic profiles place substantial restrictions on array declaration and usage. General-purpose arrays can only be used as uniform parameters to a vertex program. The intent is to allow an application to pass arrays of skinning matrices and arrays of light parameters to a vertex program.

The most important difference from C is that arrays are first-class types. That means array assignments actually copy the entire array, and arrays that are passed as parameters are passed by value (the entire array is copied before making any changes) rather than by reference.

Unsized Arrays

Cg supports unsized arrays: arrays with one or more dimensions having no specified length. This makes it possible to write Cg functions that operate on arrays of arbitrary size. For example:

    float myfunc(float val
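The by-value array semantics described in excerpt 260 differ from C, where an array argument decays to a pointer. A C programmer can get the same copy behavior by wrapping the array in a struct, as the following sketch shows; the names weights and normalize_and_sum are made up for this illustration.

```c
#define NUM_WEIGHTS 5

/* Wrapping the array in a struct gives it value semantics in C:
   assignment and parameter passing both copy all five elements. */
typedef struct { float w[NUM_WEIGHTS]; } weights;

/* Receives its own copy of the array, normalizes that copy in place,
   and returns the original sum. The caller's data is never modified. */
static float normalize_and_sum(weights copy)
{
    float total = 0.0f;
    for (int i = 0; i < NUM_WEIGHTS; ++i)
        total += copy.w[i];
    for (int i = 0; i < NUM_WEIGHTS; ++i)
        copy.w[i] = (total != 0.0f) ? copy.w[i] / total : 0.0f;
    return total;
}
```

After `weights b = a;` the two variables are independent, which mirrors what a plain array assignment does in Cg.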
261. nslateCGerror(error)); OutputDebugString(buffer); } cgSetErrorCallback(MyErrorCallback);

Introduction to CgFX

CgFX Overview

CgFX is an extended file format for Cg. In addition to Cg programs, CgFX files can also represent both fixed-function graphics state and meta-information about shader parameters. The CgFX API makes it possible to load CgFX effects files, traverse the data in them, set the associated graphics state, and so on. This chapter introduces this new API and the ideas behind it, and is intended to make it easy to get started using CgFX.

This chapter assumes that the OpenGL state manager, implemented as part of the CgGL runtime, is being used. Because CgFX allows for extensible custom state managers, alternate state managers that accept different state syntax may also be available. For example, a "Direct3D" state manager might accept Direct3D-style state names, while a "Direct3D Under OpenGL" state manager might accept Direct3D-style state names but allow for rendering using OpenGL.

Key Concepts

Effect: An effect file contains a collection of shader source code, parameters, and rendering techniques. An effect encapsulates one or more different methods to render a particular visual effect. For example, the effect might provide one approach intended for use on fixed-function hardware and a different approach on more modern programmable hardware.

Technique: Each
262. nt the named interface. Interfaces contain only function prototype definitions. They do not contain actual function implementations or data members. For example, the following example defines an interface named Light consisting of two methods, illuminate and color:

    interface Light {
        float3 illuminate(float3 P, out float3 L);
        float3 color(void);
    };

A Cg structure may optionally implement an interface. This is signified by placing a ":" and the name of the interface after the name of the structure being defined. The methods required by the interface must be defined within the body of the structure. For example:

    struct SpotLight : Light {
        sampler2D shadow;
        samplerCUBE distribution;
        float3 Plight, Clight;
        float3 illuminate(float3 P, out float3 L) {
            L = normalize(Plight - P);
            return Clight * tex2D(shadow, P).xxx * texCUBE(distribution, L).xyz;
        }
        float3 color(void) {
            return Clight;
        }
    };

Here, the SpotLight structure is defined, which implements the Light interface. Note that the illuminate and color methods are defined within the body of the structure, and that their implementations are able to reference data members of the SpotLight structure (for example, Plight, Clight, shadow, and distribution).

Function parameters, local variables, and global variables all may have interface types. Interface parameters to top-level functions such
263. nts in the pass in order to determine what state was changed so that it can set it back to the desired values. The routines to manually traverse the state in a pass are explained in "OpenGL State" on page 129.

Effect Parameters

Handles to effect parameters can be retrieved using cgGetNamedEffectParameter. Given such a handle, the name of the parameter can be found with cgGetParameterName, its value can be set using the Cg runtime value-setting entry points, and so on:

    CGparameter c = cgGetNamedEffectParameter(effect, "Color");
    cgSetParameter3fv(c, Color);
    CGparameter mvp = cgGetNamedEffectParameter(effect, "ModelViewProjection");
    cgGLSetStateMatrixParameter(mvp, CG_GL_MODELVIEW_PROJECTION_MATRIX, CG_GL_MATRIX_IDENTITY);

Vertex and Fragment Programs

With the OpenGL state manager, vertex and fragment programs are defined via assignments to the VertexProgram and FragmentProgram states, respectively. Three different classes of expressions can be given on the right-hand side of these state assignments:
- Compile statements
- In-line assembly
- NULL

These three possibilities are demonstrated in the effect file below:

    float4 main(uniform float foo, float4 uv : TEXCOORD0) : COLOR {
        return (foo > 0) ? uv : 2 * uv;
    }

    technique SimpleFrag {
        pass {
            VertexProgram = NULL;
            FragmentProgram = compile arbfp1 mai
264. ny constant that is not explicitly typed is implicitly typed. If the constant includes a decimal point, it is implicitly typed as cfloat. If it does not include a decimal point, it is implicitly typed as cint.

By default, constants are base 10. For compatibility with C, integer hexadecimal constants may be specified by prefixing the constant with 0x, and integer octal constants may be specified by prefixing the constant with 0.

Compile-time constant folding is preferably performed at the same precision that would be used if the operation were performed at run time. Some compilation profiles may allow some precision flexibility for the hardware; in such cases the compiler should ideally perform the constant folding at the highest hardware precision allowed for that data type in that profile. If constant folding cannot be performed at run-time precision, it may optionally be performed using the precision indicated below for each of the numeric data types:
- float: s23e8 ("fp32") IEEE single-precision floating point
- half: s10e5 ("fp16") floating point with IEEE semantics
- fixed: s1.10 fixed point, clamping to [-2, 2)
- double: s52e11 ("fp64") IEEE double-precision floating point
- int: signed 32-bit integer

Type Qualifiers

The type of an object may be qualified with one or more qualifiers. Qualifiers apply only to objects. Qualifiers are removed from the valu
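To make the fixed format listed in excerpt 264 concrete, here is a small C sketch that maps a float onto a grid with 10 fractional bits and clamps it to [-2, 2). The function name quantize_fixed_s1_10 is hypothetical, and rounding to the nearest step is an assumption; the text specifies only the format and the clamping range.

```c
/* Quantize x to a signed fixed-point value with 10 fractional bits,
   clamped to [-2, 2). Rounding to nearest is an assumption of this sketch. */
static float quantize_fixed_s1_10(float x)
{
    const float scale = 1024.0f;            /* 2^10 steps per unit */
    const float lo = -2.0f;                 /* smallest representable value */
    const float hi = 2.0f - 1.0f / scale;   /* largest representable value */
    long steps;

    if (x < lo) x = lo;
    if (x > hi) x = hi;

    /* Round half away from zero without needing the math library. */
    steps = (long)(x * scale + (x >= 0.0f ? 0.5f : -0.5f));
    return (float)steps / scale;
}
```

An input of 3.0 comes back as 2 - 1/1024, the largest value the format can hold, which shows why results that may exceed this range should use half or float instead.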
265. o clamp the result of a dot product computation to the range [0,1] in a fragment program, use the saturate function instead of max. This is often written as max(0, dot(N, L)), but as long as the N and L vectors are normalized, this can be written equivalently as saturate(dot(N, L)), because the dot product of two normalized vectors is never greater than one. Given that saturate is free in fragment programs (see "3. Use the Cg Standard Library" on page 324), this compiles to more efficient code.

- Use the lit Standard Library function, if appropriate. The lit function implements a diffuse-glossy Blinn shading model. It takes three parameters:
  - The dot product of the normalized surface normal and the light vector
  - The dot product of a half-angle vector and the normal
  - The specular exponent
  It returns a 4-vector where:
  - The x and w components are always one.
  - The y component is equal to the diffuse dot product, or to zero if the product is less than zero.
  - The z component is equal to the specular dot product raised to the given exponent, or to zero if the diffuse dot product was less than zero.
  All this is done substantially more efficiently than if the corresponding operations were written out in Cg code.

7. Take Advantage of the Different Levels of Computation Frequency

Always keep in mind the fact that fragment programs gen
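The behavior of saturate and lit described in excerpt 265 can be written out as a C reference sketch. Two simplifications are assumptions of this sketch rather than part of the text: the exponent is restricted to a non-negative integer so that no math library is needed, and a negative specular dot product is also mapped to zero. The names float4, ipow, and lit_ref are local to this example.

```c
typedef struct { float x, y, z, w; } float4;

/* Clamp a value to the range [0, 1]. */
static float saturate(float v)
{
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* base raised to a non-negative integer power, by repeated multiplication. */
static float ipow(float base, unsigned n)
{
    float result = 1.0f;
    while (n-- > 0)
        result *= base;
    return result;
}

/* Reference version of the lit result:
   x and w are one, y is the clamped diffuse term, and z is the specular
   term, which is zero whenever the diffuse dot product is negative. */
static float4 lit_ref(float n_dot_l, float n_dot_h, unsigned exponent)
{
    float4 r;
    r.x = 1.0f;
    r.y = n_dot_l < 0.0f ? 0.0f : n_dot_l;
    r.z = (n_dot_l < 0.0f || n_dot_h < 0.0f) ? 0.0f : ipow(n_dot_h, exponent);
    r.w = 1.0f;
    return r;
}
```

For inputs in [-1, 1], which is what the dot product of two normalized vectors produces, saturate(d) and max(0, d) give the same result, so the substitution recommended above does not change the image.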
266. o small objects using a single-bounce ray-traced pass. In this example, the polygonal surface is sampled and a refraction vector is calculated. This vector is then intersected with a plane that is defined as being perpendicular to the object's x axis. The intersection point is calculated and used as texture indices for a painted iris. The demo permits varying the index of refraction, the depth, and the density of the lens. Note that the choice of geometry is arbitrary: this sample is a sphere, but any polygonal model can be used.

Fig. 9 Example of Ray-Traced Refraction

Vertex Shader Source Code for Ray-Traced Refraction

    struct appin {
        float4 Position : POSITION;
        float4 Normal   : NORMAL;
    };

    // output, same struct is the input to fragment shader
    struct EyeV2F {
        float4 HPosition : POSITION;   // clip space pos
        float3 OPosition : TEXCOORD0;  // obj-coords location
        float3 VPosition : TEXCOORD1;  // eye pos (obj space)
        float3 N         : TEXCOORD2;  // normal (obj space)
        float4 LightVecO : TEXCOORD3;  // light dir (obj space)
    };

    EyeV2F main(appin IN,
                uniform float4x4 ModelViewProj,
                uniform float4x4 ModelViewI,
                uniform float4 LightVec)  // in EYE coords
    {
        EyeV2F OUT;
        // calculate clip-space position for rasterizer use
        OUT.HPosition = mul(ModelViewProj, IN.Position);
        // pass through object-space position
        OUT.OPositi
267. D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 0 },
        D3DDECL_END()
    };

and the following Direct3D 8 vertex declaration is valid:

    DWORD declaration[] = {
        D3DVSD_STREAM(0),
        D3DVSD_REG(D3DVSDE_POSITION, D3DVSDT_FLOAT3),
        D3DVSD_REG(D3DVSDE_DIFFUSE, D3DVSDT_D3DCOLOR),
        D3DVSD_STREAM(1),
        D3DVSD_SKIP(4),
        D3DVSD_REG(D3DVSDE_TEXCOORD0, D3DVSDT_FLOAT2),
        D3DVSD_END()
    };

This is true because D3DDECLUSAGE_POSITION and D3DVSDE_POSITION match the hardware register associated with the predefined semantic POSITION, D3DDECLUSAGE_DIFFUSE and D3DVSDE_DIFFUSE match the register associated with COLOR0, and D3DDECLUSAGE_TEXCOORD0 and D3DVSDE_TEXCOORD0 match the register associated with TEXCOORD0.

The above declarations can also be written the following way, using cgD3D9ResourceToDeclUsage or cgD3D8ResourceToInputRegister:

    const D3DVERTEXELEMENT9 declaration[] = {
        { 0, 0 * sizeof(float),
          D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT,
          cgD3D9ResourceToDeclUsage(CG_POSITION), 0 },
        { 0, 3 * sizeof(float),
          D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT,
          cgD3D9ResourceToDeclUsage(CG_COLOR0), 0 },
        { 1, 4 * sizeof(float),
          D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT,
          cgD3D9ResourceToDeclUsage(CG_TEXCOORD0), 0 },
        D3DDECL_END()
    };
268. of program using D3DXAssembleShader() with assembleFlags as the D3DXASM flags. Depending on the program's profile, it then either uses IDirect3DDevice9::CreateVertexShader() to create a Direct3D 9 vertex shader or uses IDirect3DDevice9::CreatePixelShader() to create a Direct3D 9 pixel shader. Here is a typical use of the function:

    HRESULT hresult = cgD3D9LoadProgram(vertexProgram, TRUE, D3DXASM_DEBUG);
    HRESULT hresult = cgD3D9LoadProgram(fragmentProgram, TRUE, 0);

To load a program in Direct3D 8, use cgD3D8LoadProgram():

    HRESULT cgD3D8LoadProgram(CGprogram program,
                              BOOL parameterShadowingEnabled,
                              DWORD assembleFlags,
                              DWORD vertexShaderUsage,
                              const DWORD* declaration);

This function assembles the result of the compilation of program using D3DXAssembleShader() with assembleFlags as the D3DXASM flags. Depending on the program's profile, it then either uses IDirect3DDevice8::CreateVertexShader() to create a Direct3D vertex shader, with declaration as the vertex declaration and vertexShaderUsage as the usage control, or uses IDirect3DDevice8::CreatePixelShader() to create a Direct3D pixel shader.

The value of parameterShadowingEnabled should be set to TRUE to enable parameter shadowing for the program. This behavior can be changed after the program is created by calling cgD3D8EnableParameterShadowing(). Here is a typical use of the function
269. of reasons, including:

- Changing variability of parameters. Parameters may be changed from uniform variability to literal variability (compile-time constant). See the cgSetParameterVariability manual page for more information.
- Changing value of literal parameters. Changing the value of a literal parameter will require recompilation, since the value is used at compile time. See the cgSetParameter and cgSetMatrixParameter manual pages for more information.
- Resizing unsized arrays. Changing the length of a parameter array may require recompilation, depending on the capabilities of the program profile. See the cgSetArraySize and cgSetMultiDimArraySize manual pages for more information.
- Connecting structures to interface parameters. Structure parameters can be connected to interface program parameters to control the behavior of the program. Changing these connections requires recompilation on all current profiles. See the cgConnectParameter manual page and the Interfaces section of this document for more details.

When a program enters an uncompiled state, it is automatically unloaded and unbound. In order to be used again, the program must be recompiled (either automatically or manually; see the following) and then reloaded and rebound. Compilation can be performed manually by the application via cgCompileProgram(CGprogram program) or automatically by the runtime. Com
270. of vectors than an array of matrices. Accessing a matrix requires a floor calculation, followed by a multiply by a constant to compute the register index. Because vectors (and scalars) take one register, neither the floor nor the multiply is needed. It is faster to do matrix skinning using arrays of vectors with a premultiplied index than using arrays of matrices.

Bindings

Binding Semantics for Uniform Data

The valid binding semantics for uniform parameters in the vs_2_0 and vs_2_x profiles are summarized in Table 39.

Table 39 vs_2 Uniform Input Binding Semantics

    Binding Semantics Name            Corresponding Data
    register(c0)-register(c255)       Constant register [0..255].
    C0-C255                           The aliases c0-c255 (lowercase) are also accepted.
                                      If used with a variable that requires more than one
                                      constant register (for example, a matrix), the semantic
                                      specifies the first register that is used.

Binding Semantics for Varying Input/Output Data

Only the binding semantic names need be given for these profiles. The vertex parameter input registers are allocated dynamically. All the semantic names, except POSITION, can have a number from 0 to 15 after them.

Table 40 vs_2 Varying Input Binding Semantics

    POSITION    PSIZE       BLENDWEIGHT   BLENDINDICES   NORMAL
    TEXCOORD    COLOR       TANGENT       TESSFACTOR     BINORMAL

The valid binding semantics for varying output
271. ogram. The arbvp1 conventions are compatible with the vp20 and vp30 profiles.

Loading Constants

Applications that do not use the Cg run time are no longer required to load constant values into program parameter registers as indicated by the #const expressions in the Cg compiler output. The compiler produces output that causes the OpenGL driver to load them. However, uniform variables that have a default definition still require constant values to be loaded into the appropriate program parameter registers, as ARB vertex programs do not support this feature. Application programs either have to use the Cg run time, parse and handle the #default commands, or have to avoid initializing uniform variables in the Cg source code.

Bindings

Binding Semantics for Uniform Data

The valid binding semantics for uniform parameters in the arbvp1 profile are summarized in Table 16.

Table 16 arbvp1 Uniform Input Binding Semantics

    Binding Semantics Name            Corresponding Data
    register(c0)-register(c255)       Local parameter with index n, n = [0..255].
    C0-C255                           The aliases c0-c255 (lowercase) are also accepted.
                                      If used with a variable that requires more than one
                                      constant register (for example, a matrix), the semantic
                                      specifies the first local parameter that is used.

Binding Semantics for Varying Input/Output Data

The valid binding semantics for uniform parameters in the a
272. ogram. To see if it is compatible with the program, use cgD3D9ValidateVertexDeclaration():

    CGbool cgD3D9ValidateVertexDeclaration(CGprogram program,
                                           const D3DVERTEXELEMENT9* declaration);

for the Direct3D 9 Cg runtime, or use cgD3D8ValidateVertexDeclaration():

    CGbool cgD3D8ValidateVertexDeclaration(CGprogram program,
                                           const DWORD* declaration);

for the Direct3D 8 Cg runtime. A call to cgD3D9ValidateVertexDeclaration() or cgD3D8ValidateVertexDeclaration() returns CG_TRUE if the vertex declaration is compatible with the program. A Direct3D 9 declaration is compatible with the program if the declaration has an entry matching every varying input parameter used by the program. A Direct3D 8 declaration is compatible with the program if the declaration has a D3DVSD_REG() macro call matching every varying input parameter used by the program. For the program

    void main(float4 position : POSITION,
              float4 color    : COLOR0,
              float4 texCoord : TEXCOORD0)
    { ... }

the following Direct3D 9 vertex declaration is valid:

    const D3DVERTEXELEMENT9 declaration[] = {
      { 0, 0 * sizeof(float),
        D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT,
        D3DDECLUSAGE_POSITION, 0 },
      { 0, 3 * sizeof(float),
        D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT,
        D3DDECLUSAGE_COLOR, 0 },
      { 1, 4 * sizeof(float),
273. ogram for one bone another program for two bones and so on Fig 5 Example of Improved Skinning 154 808 00504 0000 006 NVIDIA Advanced Profile Sample Shaders Vertex Shader Source Code for Improved Skinning SERUCIE imauies float4 position POSITION float4 weights BLENDWEIGHT float4 normal NORMAL float4 matrixIndices TESSFACTOR float4 numBones SPECULAR y SstrUCC QUEPUES float4 hPosition POSITION float4 color COLORO y Oui dUlcs mesita Eajoules I N uniform float4x4 modelViewProj uniform float3x4 boneMatrices 30 vision los colo uniform float4 lightPos outputs OUT float4 index IN matrixIndices float4 weight IN weights float4 position float3 normal for float i 0 i lt IN numBones x transform the offset by bone i position Position neto float 4 mul boneMatrices index x il OMe transform normal by bone i normal normal weight x mul float3x3 boneMatrices ind TN mal 7 o XS P ff shit ower iin the index and the index weight variables weight for the current bone into i 1 d JUIN DOS ELO ZE Pepe this moves X component of the index and weight variables 808 00504 0000 006 NVIDIA 155 Cg Language Toolkit 156 808 00504 0000 006 NVIDIA Advanced Profile Sample Shaders Improved Water Description This demo gives the appearance that the viewer is surrounded
274. om vertex shader

    struct vertout {
        float4 HPosition : POSITION;
        float4 Color     : COLOR;
    };

    vertout main(appin IN,
                 uniform float4x4 ModelViewProj,
                 uniform float4x4 ModelViewIT,
                 uniform float4 LightVec)
    {
        vertout OUT;

        // Transform vertex position into homogeneous clip space.
        OUT.HPosition = mul(ModelViewProj, IN.Position);

        // Transform normal from model space to view space.
        float3 normalVec = normalize(mul(ModelViewIT, IN.Normal).xyz);

        // Store normalized light vector.
        float3 lightVec = normalize(LightVec.xyz);

        // Calculate half-angle vector.
        float3 eyeVec = float3(0.0, 0.0, 1.0);
        float3 halfVec = normalize(lightVec + eyeVec);

        // Calculate diffuse component.
        float diffuse = dot(normalVec, lightVec);

        // Calculate specular component.
        float specular = dot(normalVec, halfVec);

        // Use the lit function to compute lighting vector
        // from diffuse and specular values.
        float4 lighting = lit(diffuse, specular, 32);

        // Blue diffuse material
        float3 diffuseMaterial = float3(0.0, 0.0, 1.0);

        // White specular material
        float3 specularMaterial = float3(1.0, 1.0, 1.0);

        // Combine diffuse and specular contributions and
        // output final vertex color.
        OUT.Color.rgb = lighting.y * diffuseMaterial +
                        lighting.z * specularMaterial;
        OUT.Color.a = 1.0;

        return OUT;
    }

Working with Data

Like C, Cg supports feature
275. ompiled into GPU assembly code, either on demand at run time or beforehand. Cg makes it easy to combine a Cg fragment program with a handwritten vertex program, or even with the non-programmable OpenGL or DirectX vertex pipeline. Likewise, a Cg vertex program can be combined with a handwritten fragment program, or with the non-programmable OpenGL or DirectX fragment pipeline.

Cg Language Profiles

Because all CPUs support essentially the same set of basic capabilities, the C language supports this set on all CPUs. However, GPU programmability has not quite yet reached this same level of generality. For example, the current generation of programmable vertex processors supports a greater range of capabilities than do the programmable fragment processors. Cg addresses this issue by introducing the concept of language profiles. A Cg profile defines a subset of the full Cg language that is supported on a particular hardware platform or API. The current release of the Cg compiler supports the following profiles:

- OpenGL ARB vertex programs. Runtime profile: CG_PROFILE_ARBVP1. Compiler option: -profile arbvp1
- OpenGL ARB fragment programs. Runtime profile: CG_PROFILE_ARBFP1. Compiler option: -profile arbfp1
- OpenGL NV40 vertex programs. Runtime profile: CG_PROFILE_VP40. Compiler option: -profile vp40
- OpenGL NV40 fragment programs. Runtime profile: CG_PROFILE_FP40. Compiler option: -profile fp40
- OpenGL NV30 vertex programs. Runtime p
276. on = IN.Position.xyz;
        // object-space normal
        OUT.N = normalize(IN.Normal.xyz);
        // transform view pos and light vec to obj space
        OUT.VPosition = mul(ModelViewI, float4(0, 0, 0, 1)).xyz;
        OUT.LightVecO = normalize(mul(ModelViewI, LightVec));
        return OUT;
    }

Pixel Shader Source Code for Ray Traced Refraction

    // Assume ray direction is normalized.
    // Vector planeEq is encoded half4(A,B,C,D)
    // where (Ax + By + Cz + D) = 0
    // and half3(A,B,C) has been normalized.
    // Returns distance along ray to intersection;
    // distance is negative if there is no intersection.
    half intersect_plane(half3 rayOrigin, half3 rayDir, half4 planeEq)
    {
        half3 planeN = planeEq.xyz;
        half denominator = dot(planeN, rayDir);
        half result = -1.0h;
        // d == 0 -> parallel || d > 0 -> faces away
        if (denominator < 0.0h) {
            half top = dot(planeN, rayOrigin) + planeEq.w;
            result = -top / denominator;
        }
        return result;
    }

    // subfields in BallData
    #define RADIUS x
    #define IRIS_DEPTH y
    #define ETA z
    #define LENS_DENSITY w

    // subfields in SpecData
    #define PHONG x
    #define GLOSS1 y
    #define GLOSS2 z
    #define DROP w

    struct EyeV2F {
        float4 HPosition : POSITION;
        float3 OPosition : TEXCOORD0;
        float3 VPosition : TEXCOORD1;
        float3 N         : TEXCOORD2;
        float4 LightVecO : TEXCOORD3;
    };

    half4 main(EyeV2F IN,
               uniform samp
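The ray-plane test performed by intersect_plane in the pixel shader above can be checked on the CPU. The following is a minimal C sketch of the same arithmetic, under the same assumptions stated in the shader comments (normalized ray direction, normalized plane normal). The names vec3, dot3 and intersect_plane_ref are hypothetical helpers introduced for this illustration; they are not part of the Cg Toolkit.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper type for this sketch; not part of the Cg Toolkit. */
typedef struct { float x, y, z; } vec3;

static float dot3(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* The plane is n.x*X + n.y*Y + n.z*Z + d = 0 with n normalized.
 * Returns the distance along the ray to the intersection, or -1
 * when the ray is parallel to the plane or the plane faces away. */
static float intersect_plane_ref(vec3 origin, vec3 dir, vec3 n, float d)
{
    /* Precondition taken from the shader comment: dir is normalized. */
    assert(fabsf(dot3(dir, dir) - 1.0f) < 1e-3f);

    float denominator = dot3(n, dir);
    float result = -1.0f;
    if (denominator < 0.0f) {          /* plane faces the ray */
        float top = dot3(n, origin) + d;
        result = -top / denominator;   /* solve n.(o + t*dir) + d = 0 for t */
    }
    return result;
}
```

For a ray starting at (2, 0, 0) and travelling along -x toward the plane x = 0, the sketch returns a distance of 2; a ray travelling along +y is parallel to that plane and yields -1.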
277. on and are therefore only applied to those functions for which code is being generated. This specification uses the word program to refer to the top-level function, any functions the top-level function calls, and any global variables or typedef definitions it references. Each profile must have a separate specification that describes its characteristics and limitations. This core Cg specification requires certain minimum capabilities for all profiles. In some cases, the core specification distinguishes between vertex-program and fragment-program profiles, with different minimum capabilities for each.

The Uniform Modifier

Non-static global variables and parameters passed to functions such as main() can be declared with an optional qualifier uniform. To specify a uniform variable, use this syntax:

    uniform <type> <variable>

For example,

    uniform float4 myVector;

or

    float4 foo(uniform float4 uv);

If the uniform qualifier is specified for a function that is not top level, it is meaningless and is ignored. The intent of this rule is to allow a function to serve either as a top-level function or as one that is not. Note that uniform variables may be read and written just like non-uniform variables. The uniform qualifier simply provides information about how the initial value of the variable is to be specified and stored through a mechanism external to the la
278. onent of x is not equal to 0. Returns false otherwise.

    asin(x)            Arcsine of x in range [-π/2, π/2]; x should be in [-1, 1].
    atan(x)            Arctangent of x in range [-π/2, π/2].
    atan2(y, x)        Arctangent of y/x in range [-π, π].
    ceil(x)            Smallest integer not less than x.
    clamp(x, a, b)     x clamped to the range [a, b] as follows:
                         Returns a if x is less than a.
                         Returns b if x is greater than b.
                         Returns x otherwise.
    cos(x)             Cosine of x.
    cosh(x)            Hyperbolic cosine of x.
    cross(a, b)        Cross product of vectors a and b; a and b must be 3-component vectors.
    degrees(x)         Radian-to-degree conversion.
    determinant(M)     Determinant of matrix M.
    dot(a, b)          Dot product of vectors a and b.
    exp(x)             Exponential function e^x.
    exp2(x)            Exponential function 2^x.
    floor(x)           Largest integer not greater than x.
    fmod(x, y)         Remainder of x/y, with the same sign as x. If y is zero, the result is implementation-defined.

Table 1 Cg Standard Library Functions: Mathematical Functions (continued)

    Function           Description
    frac(x)            Fractional part of x.
    frexp(x, out exp)  Splits x into a normalized fraction in the interval [1/2, 1), which is returned, and a power of 2, which is stored in exp. If x is zero, both parts of the result are zero.
    isfinite(x)        Returns true if x is finite.
    isinf(x)           Returns true
279. orMap IN TexCoords xy half4 material tex2D MaterialMap IN TexCoords xy half3 Nt tex2D NormalMap IN TexCoords xy rgb ha teo 0 5 045127 0 510 y SpecData MAXSPEC should range from 0 1 half specStr material SPEC STR SpecData MAXSPEC half specPower SpecData MINPOWER material NORM SPEC EXPON SpecData MAXPOWER SpecData MINPOWER palm oiva morma lla vs IN essen mM MEISNIIO EO SENESIEGIET half3 Ln normalize IN LightVecO xyz half3 Nb normalize BumpData BUMP SCALE nal nal Fh Fh Fh w nal ns naL naL imn dew dem ns ns Na nar half4 reflColor ie dew de nal Nt x IN T Nt y IN B NE Z ENS cizre got NO Hn normalize Vn Ln 4 lighting lit diff dot Hn Nb specPower diffResult lighting y surfCol specCol lerp WHITE surfCol material METALNESS specResult lighting z specStr specCol 3 reflVect reflect Vn Nb texCUBE EnvMap reflVect fakeFresnel ReflData FRESNEL_MIN ReflData FRESNEL_MAX pow saturate 1 0h dot Vn IN N ReflData FRESNEL_EXPON 168 808 00504 0000 006 NVIDIA Advanced Profile Sample Shaders 808 00504 0000 006 169 NVIDIA Cg Language Toolkit Ray Traced Refraction Description This shader presents a method for adding high quality details t
280. orm samplerCUBE EnvironmentMap,
              uniform float3 EyeVector) : COLOR
    {
        // fetch the bump normal from the normal map
        float4 normal = tex2D(NormalMap, IN.TexCoord.xy);

        // transform the bump normal into cube space,
        // then use the transformed normal and eye vector
        // to compute the reflection vector that is
        // used to fetch the cube map
        return texCUBE_reflect_eye_dp3x3(EnvironmentMap,
                                         IN.TangentToCubeSpace2.xyz,
                                         IN.TangentToCubeSpace0,
                                         IN.TangentToCubeSpace1,
                                         normal,
                                         EyeVector);
    }

Fresnel

Description

This effect computes a reflection vector to look up into an environment map for reflections and modulates this by a Fresnel term. The result is reflections only at grazing angles (Fig. 16).

Fig. 16 Example of Fresnel

Vertex Shader Source Code for Fresnel

    struct app2vert {
        float4 Position  : POSITION;
        float4 Normal    : NORMAL;
        float4 TexCoord0 : TEXCOORD0;
    };

    struct vert2frag {
        float4 HPosition : POSITION;
        float4 Color0    : COLOR0;
        float4 TexCoord0 : TEXCOORD0;
    };

    vert2frag main(app2vert IN,
                   uniform float4x4 ModelViewProj,
                   uniform float4x4 ModelView,
                   uniform float4x4 ModelViewIT)
    {
        vert2frag OUT;

    #ifdef PROFILE_ARBVP1
        ModelViewProj = glstate.matrix.mvp;
        ModelView = glstate.matrix.modelview[0];
        ModelViewIT = glstate.matrix.inv
281. ormal parameter is a struct, the binding semantic may be specified with an element of the struct when the struct is defined:

    struct <struct-tag> {
        <type> <identifier> [: <binding-semantic>];
        ...
    };

- If the input to the function is implicit (a non-static global variable that is read by the function), the binding semantic may be specified when the non-static global variable is declared:

    <type> <identifier> [: <binding-semantic>] [= <initializer>];

  If the non-static global variable is a struct, the binding semantic may be specified when the struct is defined, as described in the second bullet above.

- A binding semantic may be associated with the output of a top-level function in a similar manner:

    <type> <identifier>(<parameter-list>) [: <binding-semantic>] { <body> }

Another method available for specifying a semantic for an output value is to return a struct and to specify the binding semantic(s) with elements of the struct when the struct is defined. In addition, if the output is a formal parameter, the binding semantic may be specified using the same approach used to specify binding semantics for inputs.

Aliasing of Semantics

Semantics must honor a copy-on-input and copy-on-output model. Thus, if the same input binding semantic is used for two different variables, those v
282. ormalize vert Binormal normalize vert Normal j 184 808 00504 0000 006 NVIDIA Advanced Profile Sample Shaders FRESNEL OFFSET SCALE POWER UNUSED float4 Fresnel O Ce p c float3x3 ViewTangent mul ModelTangent float3x3 ModelViewIT Generate VIEW SPACE vectors float3 viewN normalize mul float3x3 ModelView vert ONormal float4 viewP mul ModelView vert OPosition viewP w l saturate sqrt dot viewP xyz ViewP xyz 0 01 float3 viewV viewP xyz Generate OBJECT SPACE vectors float3 objV normalize EyePosition vert OPosition xyz float3 objL normalize LightVector float3 objH normalize objL objV Generate TANGENT SPACE vectors float3 tanL mul ModelTangent objL float3 tanV mul ModelTangent objV float3 tanH mul ModelTangent objH Generate REFLECTION vector for per vertex reflection look up float3 reflection reflect viewV viewN Generate FRESNEL term float ndv saturate dot viewN viewV float FresnelApprox pow 1 ndv Fresnel z Fresnel y Fresnel x Fill OUTPUT parameters ONU yart wwe TEXCOORDO xy O dbitcoylee tanL Tangent space LIGHT Tangent space HALF ANGLE O halfangle float4 tanH x tanH y tanH z l exp viewP w G reflection deux op View space REFLECTI
283. oved Water
  Pixel Shader Source Code for Improved Water
Melting Paint
  Description
  Vertex Shader Source Code for Melting Paint
  Pixel Shader Source Code for Melting Paint
MultiPaint
  Description
  Vertex Shader Source Code for MultiPaint
  Pixel Shader Source Code for MultiPaint
Ray Traced Refraction
  Description
  Vertex Shader Source Code for Ray Traced Refraction
  Pixel Shader Source Code for Ray Traced Refraction
Skin
  Description
  Pixel Shader Source Code for Skin
Thin Film Effect
  Description
  Vertex Shader Source Code for Thin Film Effect
  Pixel Shader Source Code for Thin Film Effect
Car Paint 9
  Description
  Vertex Shader Source Code for Car Paint 9
  Pixel Shader Source Code for
284. parameters in the vs_2_0 and vs_2_x profiles are summarized in Table 41. These map to output registers in DirectX 9 vertex shaders.

Table 41 vs_2 Varying Output Binding Semantics

    Binding Semantics Name    Corresponding Data
    POSITION                  Output position: oPos
    PSIZE                     Output point size: oPts
    FOG                       Output fog value: oFog
    COLOR0-COLOR1             Output color values: oD0, oD1
    TEXCOORD0-TEXCOORD7       Output texture coordinates: oT0-oT7

Options

The vs_2_x profile allows the following profile-specific options:

    DynamicFlowControlDepth=<n>    where n = 0 or 24 (default 24)
    NumTemps=<n>                   where 12 <= n <= 32 (default 16)
    Predication=<b>                (default true)

DirectX Pixel Shader 2.x Profiles (ps_2_*)

The DirectX Pixel Shader 2.0 Profiles are used to compile Cg source code to DirectX 9 PS 2.0 pixel shaders and DirectX 9 PS 2.0 extended pixel shaders.

- Profile names: ps_2_0 (for DirectX 9 PS 2.0 pixel shaders) and ps_2_x (for DirectX 9 PS 2.0 extended pixel shaders)
- How to invoke: Use the compiler options -profile ps_2_0 or -profile ps_2_x

The ps_2_0 profile limits Cg to match the capabilities of DirectX PS 2.0 pixel shaders. The ps_2_x profile is the same as the ps_2_0 profile but allows extended features such as arbitrary swizzles, larger limit on number of
285. pare for lighting

    // store normalized light vector
    float3 lightVec = normalize(LightVec.xyz);

    // calculate half-angle vector
    float3 eyeVec = float3(0.0, 0.0, 1.0);
    float3 halfVec = normalize(lightVec + eyeVec);

At this point we have to ensure that all our vectors are normalized. We start by normalizing LightVec (see note 1). Then, in preparation for specular lighting, we have to define the half-angle vector halfVec, which is the vector halfway between the light and the eye vectors, that is, (lightVec + eyeVec)/2. We normalize halfVec, so we don't need to bother with the division by two, because it cancels out after normalization anyway. In this example, we assume that the eye is at (0, 0, 1), but an application would typically pass the eye position also as a uniform parameter, since it would be unchanged from vertex to vertex. We use Cg's inline vector construction capability to build a 3-component float vector that contains the eye position, and then we assign this value to eyeVec.

Note 1: Because LightVec is uniform, it is more efficient to normalize it once in the application rather than on a per-vertex basis. It is done here for illustrative purposes.

Calculating the Vertex Color

Now we have to calculate the vertex color to output.

Calculating the Diffuse and Specular Lighting Contributions

In this example, we're going to calculate just a simple combination of diffuse and specular ligh
286. pecification. The profile name must immediately precede the type name in the function declaration. For example, to define two different versions of the function myfunc for the profileA and profileB profiles:

    profileA float myfunc(float x) { /* ... */ };
    profileB float myfunc(float x) { /* ... */ };

If a type is defined (using a typedef) that has the same name as a profile, the identifier is treated as a type name and is not available for profile overloading at any subsequent point in the file. If a function definition does not include a profile, the function is referred to as an open-profile function. Open-profile functions apply to all profiles.

Several wildcard profile names are defined. The name vs matches any vertex profile, while the name ps matches any fragment or pixel profile. The names ps_1 and ps_2 match any DirectX 8 pixel shader 1.x profile or DirectX 9 pixel shader 2.x profile, respectively. Similarly, the names vs_1 and vs_2 match any DirectX vertex shader 1.x or 2.x, respectively. Additional valid wildcard profile names may be defined by individual profiles.

In general, the most specific version of a function is used. More details are provided in Function Overloading on page 240, but roughly speaking, the search order is the following:

1. Version of the function with the exact profile overload
2. Version of the function with the most specific wildcard profile overload (such as vs or ps_1)
3. Version of the function
287. peration, and m is the 2-D bump environment mapping matrix. This function can generate the texbem instruction in all ps_1_x profiles.

offsettex2DScaleBias(uniform sampler2D tex, float2 st, float4 prevlookup, uniform float4 m, uniform float scale, uniform float bias)

    Performs the following:

        float2 newst = st + m.xy * prevlookup.xx + m.zw * prevlookup.yy;
        float4 result = tex2D(tex, newst);
        return result * saturate(prevlookup.z * scale + bias);

    where st are texture coordinates associated with sampler tex, prevlookup is the result of a previous texture operation, m is the 2-D bump environment mapping matrix, scale is the 2-D bump environment mapping scale factor, and bias is the 2-D bump environment mapping offset. This function can generate the texbeml instruction in all ps_1_x profiles.

Table 54 ps_1_x Auxiliary Texture Functions (continued)

    Texture Function    Description

tex1D_dp3(sampler1D tex, float3 str, float4 prevlookup)

    Performs the following:

        return tex1D(tex, dot(str, prevlookup.xyz));

    where str are texture coordinates associated with sampler tex, and prevlookup is the result of a previous texture operation. This function can be used to generate the texdp3tex instruction in the ps_1_2 and ps_1_3 profiles.

tex2D_dp3x2(uniform sampler2D tex, float3 str, float4 intermediate_coord, float4 prevlookup)

    Performs the following
288. peration or arithmetic operation can occur in the program. A texture shader operation may not have any dependency on the output of an arithmetic operation unless

- the arithmetic operation is a valid input modifier for the texture shader operation
- the arithmetic operation is part of a complex texture shader operation, which are summarized in the section Auxiliary Texture Functions on page 290

Note 5: For more details about the underlying instruction sets, their capabilities, and their limitations, please refer to the NV_texture_shader and NV_register_combiners extensions in the OpenGL Extensions documentation.

Modifiers

There are certain simple arithmetic operations that can be applied to inputs of texture shader operations and to inputs and outputs of arithmetic operations without generating a register combiner instruction. These operations are referred to as input modifiers and output modifiers. Instead of generating a register combiners instruction, the arithmetic operation modifies the assembly instruction or source registers to which it is applied. For example, the following Cg expression

    z = (x - 0.5 + y) / 2

could generate the following register combiner instruction (assuming x is in tex0, y is in tex1, and z is in col0):

    rgb {
        discard = half_bias(tex0.rgb);
        discard = tex1.rgb;
        col0 = sum();
        scale_by_one_half();
    }
    alpha {
        discard = half_bias(tex0.a
289. pilation behavior is controlled via

    void cgSetAutoCompile(CGcontext ctx, CGenum flag);

Here flag may be one of the following enumerants:

- CG_COMPILE_MANUAL. In this mode, the application is responsible for manually compiling a program. The application may check to see if a program requires recompilation with the entry point cgIsProgramCompiled(). The program may then be compiled via cgCompileProgram(). This mode provides the application with the most control over how and when program recompilation occurs.
- CG_COMPILE_IMMEDIATE. In this mode, the Cg runtime will force compilation automatically and immediately when a program enters an uncompiled state or when the program is first created. This is the default mode.
- CG_COMPILE_LAZY. This mode is similar to CG_COMPILE_IMMEDIATE but will delay program compilation until the program object code is needed. The advantage of this method is the reduction of extraneous recompilations. The disadvantage is that compile-time errors will not be encountered when the program enters an uncompiled state, but will instead be encountered at some later time (most likely when the program is loaded or bound).

A call to cgIsProgramCompiled() determines whether a program needs to be recompiled:

    CGbool cgIsProgramCompiled(CGprogram program);

To recompile a program, use cgCompileProgram():

    void cgCompileProgram(CGprogram program);
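The difference between the three compilation modes described above is mostly a question of when the compile step runs. The following C sketch is a toy model of that timing only; it does not call the Cg runtime, and every name in it (compile_mode, toy_program, toy_compile, toy_invalidate, toy_use) is hypothetical. It assumes that an "invalidate" event stands for any change that puts a program into the uncompiled state, and that a "use" event stands for loading or binding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the three modes; not the real Cg enumerants. */
typedef enum { MODE_MANUAL, MODE_IMMEDIATE, MODE_LAZY } compile_mode;

typedef struct {
    bool compiled;       /* is object code currently valid? */
    int  compile_count;  /* how many times we compiled */
} toy_program;

static void toy_compile(toy_program *p)
{
    assert(p != NULL);
    p->compiled = true;
    p->compile_count++;
}

/* A parameter change puts the program in the uncompiled state.
 * Only the immediate mode recompiles right away. */
static void toy_invalidate(toy_program *p, compile_mode mode)
{
    assert(p != NULL);
    p->compiled = false;
    if (mode == MODE_IMMEDIATE)
        toy_compile(p);
}

/* Object code is needed (load or bind). The lazy mode compiles
 * here; the manual mode reports false so the caller can compile. */
static bool toy_use(toy_program *p, compile_mode mode)
{
    assert(p != NULL);
    if (!p->compiled && mode == MODE_LAZY)
        toy_compile(p);
    return p->compiled;
}
```

In this model, three invalidations followed by one use cost three compiles in the immediate mode but only one in the lazy mode, which is the reduction of extraneous recompilations mentioned above. In the manual mode the use reports that the program is still uncompiled until the application compiles it itself.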
290. ponding Data

    POSITION, HPOS                    Output position
    PSIZE, PSIZ                       Output point size

Table 25 vp30 Varying Output Binding Semantics (continued)

    Binding Semantics Name            Corresponding Data
    FOG, FOGC                         Output fog coordinate
    COLOR0, COL0                      Output primary color
    COLOR1, COL1                      Output secondary color
    BCOL0                             Output backface primary color
    BCOL1                             Output backface secondary color
    TEXCOORD0-TEXCOORD7, TEX0-TEX7    Output texture coordinates
    CLP0-CLP5                         Output clip distances

The profile allows WPOS to be present as binding semantics on a member of a structure of a varying output data structure, provided the member with this binding semantics is not referenced. This allows Cg programs to have the same structure specify the varying output of a vp30 profile program and the varying input of an fp30 profile program.

OpenGL NV_fragment_program Profile (fp30)

The fp30 Fragment Program Profile is used to compile Cg source code to fragment programs for use by the NV_fragment_program OpenGL extension.

- Profile name: fp30
- How to invoke: Use the compiler option -profile fp30

This section describes the capabilities and restrictions of Cg when using the fp30 profile.

Language Constructs and Support

Data Types

- fixed type (s1.10 fixed point) is supported
- half type (s10e5
291. ponding Data

    register(s0)-register(s15)    Texunit image unit N, where N is in range [0..15].
    TEXUNIT0-TEXUNIT15            May only be used with uniform inputs with sampler* types.
    register(c0)-register(c31)    Local Parameter N, where N is in range [0..31].
    C0-C31                        May only be used with uniform inputs.

Binding Semantics for Varying Input/Output Data

The valid binding semantics for varying input parameters in the arbfp1 profile are summarized in Table 20.

Table 20 arbfp1 Varying Input Binding Semantics

    Binding Semantics Name    Corresponding Data (type)
    COLOR0                    Input color 0 (float4)
    COLOR1                    Input color 1 (float4)
    TEXCOORD0-TEXCOORD7       Input texture coordinates (float4)

The valid binding semantics for varying output parameters in the arbfp1 profile are summarized in Table 21.

Table 21 arbfp1 Varying Output Binding Semantics

    Binding Semantics Name    Corresponding Data
    COLOR, COLOR0             Output color (float4)
    DEPTH                     Output depth (float)

Options

The ARB fragment program profile allows the following profile-specific options:

    NumTemps=<n>                   where 0 <= n <= 32 (default 32)
    NumInstructionSlots=<n>        where n >= 0 (default 1024)
    NumMathInstructionSlots=<n>    where n >= 0 (default 1024)
    NoDependentReadLimit=<b>       where b = 0 or 1 (default 1)
    NumTexInstructionSlots=<n>     where n >= 0 (default 102
292. pport the half type, but may choose to implement it with the same precision as the float type.

- The fixed type is a signed type with a range of at least [-2, 2) and with at least 10 bits of fractional precision. Overflow operations on the data type clamp rather than wrap. Fragment profiles must support the fixed type, but may implement it with the same precision as the half or float types. Vertex profiles are required to provide partial support (see Partial Support of Types on page 231) for the fixed type. Vertex profiles have the option to provide full support for the fixed type or to implement the fixed type with the same precision as the half or float types.
- The bool type represents Boolean values. Objects of bool type are either true or false.
- The cint type is 32-bit two's complement. This type is meaningful only at compile time; it is not possible to declare objects of type cint.
- The cfloat type is IEEE single-precision (32-bit) floating point. This type is meaningful only at compile time; it is not possible to declare objects of type cfloat.
- The void type may not be used in any expression. It may only be used as the return type of functions that do not return a value.

The sampler* types are handles to texture objects. Formal parameters of a program or function may be of type sampler*. No other definition of sampler* variables is permitted. A sampler* v
293. profiles affects the Cg source code that the developer writes. The vs_2_0 profile limits Cg to match the capabilities of DirectX VS 2.0 vertex shaders. The vs_2_x profile is the same as the vs_2_0 profile but allows extended features such as dynamic flow control (branching). DirectX 9 vertex shaders have a limited amount of memory for instructions and data. Program Instruction Limit: DirectX 9 vertex shaders are limited to 256 instructions. If the compiler needs to produce more than 256 instructions to compile a program, it reports an error. Vector Register Limit: Likewise, there are limited numbers of registers to hold program parameters and temporary results. Specifically, there are 256 read-only vector registers and 12-32 read/write vector registers. If the compiler needs more registers to compile a program than are available, it generates an error. [Footnote 6: To understand the DirectX VS 2.0 Vertex Shaders and the code the compiler produces, see the Vertex Shader Reference in the DirectX 9 SDK documentation.] Statements and Operators: If the vs_2_0 profile is used, then if, while, do, and for statements are allowed only if the loops they define can be unrolled, because there is no dynamic branching in unextended VS 2.0 shaders. If the vs_2_x profile is used, then if, while, and do statements are fully supported as long as the DynamicFlowControlDepth option is not 0
294. r Bump Reflection Mapping Exc ex f y float4 Position POSITION in object space float2 TexCoord TEXCOORDO float3 TEXCOORD1 in object space Hoar SiS mlb COORD 2 E in object space moat SN EX COORDS in object space SiO wd d y Floats Posicion s POSITION su projection Space float4 TexCoord TEXCOORDO tiret Bow or the S29 transtorm hit from tangent to cube space float4 TangentToCubespace0 TEXCOORD1 second row of the 3x3 transform Ml from tangent to cube space float4 TangentToCubeSpacel TEXCOORD2 ff third row ue the BRS Eras oran J from tangent to cube space float4 TangentToCubeSpace2 TEXCOORD3 v2f main a2v IN uniform float4x4 WorldViewProj uniform float3x4 ObjToCubeSpace uniform float3 EyePosition in cube space uniform float BumpScale WAI OUP pass texture coordinates for Ue fetching the normal map OUT TexCoord xy IN TexCoord xy compute 3x3 transform from tangent to object space float3x3 objToTangentSpace first rows are the tangent and binormal scaled by the bump scale objToTangentSpace 0 BumpScale IN T 808 00504 0000 006 197 NVIDIA Cg Language Toolkit objToTangentSpace 1 BumpScale IN B objToTangentSpace 2 ENENG compute the 3x3 transform from Hi tangent space to cube space TangentToCubeSpace Hy object2cube tangent2object Va object2cube tran
295. r or equal to zero and less than the value of GL MAX PROGRAM LOCAL PARAMETERS ARB forthe GL VERTEX PROGRAM ARB target to glGetProgramivARB VertexLocalParameter ndx float4 ARB_vertex_program ndx must be greater or equal to zero and less than the value of GL MAX PROGRAM LOCAL PARAMETERS ARB for the GL VERTEX PROGRAM ARB target to glGetProgramivARB VertexProgram compile statement ARB_vertex_program or NV vertex program 138 NVIDIA 808 00504 0000 006 Introduction to CgFX Similarly there is a simple algorithm for determining the relationship between enumerants for glEnable and for glDisable and each of the states in the table below for example the state assignment BlendEnable false corresponds to a call to glDisable GL_BLEND Table 7 Enable Disable States Enable Disable State Name Type Requires AlphaTestEnable bool OpenGL 1 0 AutoNormalEnable bool 1 0 BlendEnable bool 1 0 ClipPlaneEnable ndx bool 1 0 ndx must be greater or equal to zero and less than the value of GL_MAX_CLIP_PLANES ColorLogicOpEnable bool 1 2 CullFaceEnable bool 1 0 DepthBoundsEnable bool EXT_depth_bounds DepthClampEnable bool NV_depth_clamp DepthTestEnable bool 1 0 DitherEnable bool 1 0 FogEnable bool 1 0 LightEnable ndx bool 1 0 ndx must be greater or equal to O and less than the value of GL_MAX_LIGHTS Lighting
296. ragraph are three lists of the state fields that can be accessed. The array indexes are shown as 0, but an array can be accessed using any non-negative integer that is less than the limit of the array. For example, the diffuse component of the second light would be accessed by using the semantic state.light[1].diffuse (assuming that GL_MAX_LIGHTS is at least 2), as shown in the following code: void main(uniform float4 lightColor : state.light[1].diffuse) [Footnote 1: See "OpenGL NV_vertex_program 1.0 Profile (vp20)" on page 279 for a full explanation of the data types, statements, and operators supported by this profile.] The state semantics of type float4x4 that can be accessed are in Table 13. Table 13, float4x4 state Semantics: state.matrix.modelview[0], state.matrix.projection, state.matrix.mvp, state.matrix.texture[0], state.matrix.palette[0], state.matrix.program[0], state.matrix.inverse.modelview[0], state.matrix.inverse.projection, state.matrix.inverse.mvp, state.matrix.inverse.texture[0], state.matrix.inverse.palette[0], state.matrix.inverse.program[0], state.matrix.transpose.modelview[0], state.matrix.transpose.projection, state.matrix.transpose.mvp, state.matrix.transpose.texture[0], state.matrix.transpose.palette[0], state.matrix.transpose.program[0], state.matrix.invtrans.modelview[0], state.matrix.invtrans.projection, state.matrix.invtrans.m
297. ram CG COMPILED PROGRAM D3DXAssembleShader progSrc strlen progSrc 0 O0 0 amp byteCode 0 device gt CreatePixelShader byteCode GetBufferPointer amp pixelShader Grab some parameters modelViewMatrix cgGetNamedParameter vertexProgram ModelViewMatrix baseTexture cgGetNamedParameter fragmentProgram BaseTexture someColor cgGetNamedParameter fragmentProgram SomeColor 96 808 00504 0000 006 NVIDIA Introduction to the Cg Runtime Library Sanity check that parameters have th xpected siz assert cgD3D8TypeToSize cgGetParameterType modelViewMatrix 16 assert cgD3D8TypeToSize cgGetParameterType someColor SE Il Called to render ie seen void OnRender Get the Direct3D resource locations for parameters This can be done earlier and saved DWORD modelViewMatrixRegister cgGetParameterResourcelndex modelViewMatrix DWORD baseTextureUnit cgGetParameterResourcelndex baseTexture DWORD someColorRegister cgGetParameterResourceIndex someColor Set the Duzect3D state device gt SetVertexShaderConstant modelViewMatrixRegister amp matrix 4 device gt SetPixelShaderConstant someColorRegister fcComstentColor js device gt SetTexture baseTextureUnit texture device gt SetVertexShader vertexShader device gt SetPixelShader pixelShader Draw scene Called befor
298. ram and the varying input of an fp30 profile program. OpenGL NV_texture_shader and NV_register_combiners Profile (fp20): The OpenGL NV_texture_shader and NV_register_combiners profile is used to compile Cg source code to the nvparse text format for the NV_texture_shader and NV_register_combiners family of OpenGL extensions. Profile name: fp20. How to invoke: Use the compiler option -profile fp20. This document describes the capabilities and restrictions of Cg when using the fp20 profile. Overview: Operations in the fp20 profile can be categorized as texture shader operations and arithmetic operations. Texture shader operations are operations which generate texture shader instructions; arithmetic operations are operations which generate register combiners instructions. The underlying instruction set and machine architecture limit programmability in this profile compared to what is allowed by Cg constructs. Thus, this profile places additional restrictions on what can and cannot be done in a Cg program. Restrictions: A Cg program in one of these profiles is limited to generating a maximum of four texture shader instructions and eight register combiner instructions. Since these numbers are quite small, users need to be very aware of this limitation while writing Cg code for these profiles. The fp20 profile also restricts when a texture shader o
299. rameter parameter, const double *matrix); The matrix is passed as an array of floating-point values whose size matches the number of coefficients of the matrix. The r suffix is for functions that assume the matrix is laid out in row order, and the c suffix is for functions that assume the matrix is laid out in column order. The corresponding parameter value retrieval functions are: void cgGLGetMatrixParameterfr(CGparameter parameter, float *matrix); void cgGLGetMatrixParameterfc(CGparameter parameter, float *matrix); void cgGLGetMatrixParameterdr(CGparameter parameter, double *matrix); void cgGLGetMatrixParameterdc(CGparameter parameter, double *matrix); Use cgGLSetStateMatrixParameter() to set an OpenGL 4x4 state matrix: void cgGLSetStateMatrixParameter(CGparameter parameter, GLenum stateMatrixType, GLenum transform); The variable stateMatrixType is an enumerate type specifying the state matrix to be used to set the parameter: CG_GL_MODELVIEW_MATRIX for the current model-view matrix; CG_GL_PROJECTION_MATRIX for the current projection matrix; CG_GL_TEXTURE_MATRIX for the current texture matrix; CG_GL_MODELVIEW_PROJECTION_MATRIX for the concatenated model-view and projection matrices. The variable transform is an enumerate type specifying a transformation applied to the state matrix before it is used to set the parameter value: CG_GL_MATRIX
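The difference between the row-order (r) and column-order (c) layouts mentioned above can be sketched in plain C. The helper below is hypothetical and does not call the Cg runtime; it only shows how the same 4x4 matrix is indexed in each layout, which is the distinction the r and c suffixes refer to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the Cg runtime: copies a 4x4 matrix
   stored in row order into a buffer stored in column order. */
void row_to_column_order(const float *row_major, float *column_major)
{
    assert(row_major != NULL && column_major != NULL);
    for (int r = 0; r < 4; ++r) {
        for (int c = 0; c < 4; ++c) {
            /* Element (r, c) sits at index r*4 + c in row order
               and at index c*4 + r in column order. */
            column_major[c * 4 + r] = row_major[r * 4 + c];
        }
    }
}
```

An application that already holds its matrices in one layout can usually just pick the matching function suffix instead of converting.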
300. ray, you can use cgGetArrayDimension(), cgGetArraySize(), cgGetArrayParameter(), and cgGetNextParameter(): int cgGetArrayDimension(CGparameter parameter); int cgGetArraySize(CGparameter parameter, int dimension); CGparameter cgGetArrayParameter(CGparameter parameter, int index); These three functions return 0 if parameter is not of type CG_ARRAY. Function cgGetArrayDimension() gives the dimension of the array. It returns 1 for float4 array[10], 2 for float4 array[10][100], and so on. Next, cgGetArraySize() gives the size of every dimension. For example, for float4 array[10][100], cgGetArraySize(array, 0) returns 10 and cgGetArraySize(array, 1) returns 100. An array anArray has cgGetArraySize(anArray, 0) elements. If its dimension is greater than one, those elements are themselves arrays. Here is how these iteration functions could be used, given a valid program named program: void IterateProgramParameters(CGprogram program) { RecurseProgramParameters(cgGetFirstParameter(program, CG_PROGRAM)); } void RecurseProgramParameters(CGparameter parameter) { if (parameter == 0) return; do { switch (cgGetParameterType(parameter)) { case CG_STRUCT: RecurseProgramParameters(cgGetFirstStructParameter(parameter)); break; case CG_ARRAY: { int arraySize = cgGetArraySize(parameter, 0); for (int i = 0; i < arraySize; ++i) RecurseProgramParameters(cgGetArrayPara
301. rbvp1 profile are summarized in Table 17. The set of binding semantics for varying input data to arbvp1 consists of POSITION, BLENDWEIGHT, NORMAL, COLOR0, COLOR1, TESSFACTOR, PSIZE, BLENDINDICES, and TEXCOORD0-TEXCOORD7. One can also use TANGENT and BINORMAL instead of TEXCOORD6 and TEXCOORD7. Additionally, a set of generic binding semantics of ATTR0-ATTR15 can be used. In OpenGL implementations, conventional and generic vertex attributes may or may not be aliases for each other; see the ARB_vertex_program specification for more details. The mapping of these semantics to the corresponding setting command is listed in the table. Table 17, arbvp1 Varying Input Binding Semantics (Binding Semantics Name: Corresponding Data): POSITION: Input Vertex, through Vertex command. BLENDWEIGHT: Input vertex weight, through WeightARB, VertexWeightEXT command. NORMAL: Input normal, through Normal command. COLOR0, DIFFUSE: Input primary color, through Color command. COLOR1, SPECULAR: Input secondary color, through SecondaryColorEXT command. FOGCOORD: Input fog coordinate, through FogCoordEXT command. TEXCOORD0-TEXCOORD7: Input texture coordinates (texcoord0-texcoord7), through MultiTexCoord command. ATTR0-ATTR15: Generic Attribute 0-15, through VertexAttrib command. PSIZE, ATTR6: Generic Attribute 6. The valid binding semantics for varying o
302. red for setting a parameter of a particular type. For convenience, there is also a function to set a parameter from a 4x4 matrix of type D3DMATRIX: HRESULT cgD3D9SetUniformMatrix(CGparameter parameter, const D3DMATRIX *matrix); The upper-left portion of the matrix is extracted to fit the size of the input parameter, so that you could set matrixParam this way as well: D3DXMATRIX matrix(1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0); cgD3D9SetUniformMatrix(matrixParam, &matrix); In the example above, every element of matrixParam is set to 1. Setting Uniform Arrays of Scalar, Vector, and Matrix Parameters: To set an array parameter, use cgD3D9SetUniformArray(): HRESULT cgD3D9SetUniformArray(CGparameter parameter, DWORD startIndex, DWORD numberOfElements, const void *array); The parameters startIndex and numberOfElements specify which elements of the array parameter are set: those are the numberOfElements elements of indices ranging from startIndex to startIndex + numberOfElements - 1. It is assumed that array contains enough values to set all those elements. As with cgD3D9SetUniform(), cgD3D9TypeToSize() can be used to determine how many values are required, and the type is void* so a compatible user-defined structure can be passed in without type casting. There is a convenience function equivalent to cgD3D9SetUniformMatrix(): H
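The "upper-left portion" rule can be pictured with a small C sketch that does not use Direct3D or the Cg runtime at all. The function extract_upper_left is a hypothetical stand-in for what the runtime is described as doing; the 2x3 parameter size in the usage note is an assumption chosen so that every extracted element is 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only: copies the upper-left rows x cols block
   of a row-major 4x4 matrix into a tightly packed rows x cols buffer. */
void extract_upper_left(const float src[16], int rows, int cols, float *dst)
{
    assert(src != NULL && dst != NULL);
    assert(rows >= 1 && rows <= 4 && cols >= 1 && cols <= 4);
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            dst[r * cols + c] = src[r * 4 + c];
        }
    }
}
```

For a source matrix whose first two rows are 1, 1, 1, 0 and whose remaining rows are zero, extracting a 2x3 block yields six ones, which is consistent with "every element of matrixParam is set to 1" if matrixParam is assumed to be a 2x3 matrix.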
303. returned. There is a one-to-one correspondence between a set of predefined semantics (POSITION, COLOR, and so on) and hardware resources (registers, texture units, and so on). In the Cg runtime, a hardware resource is represented by the type CGresource, and cgGetParameterResource() retrieves the resource assigned to a parameter: CGresource cgGetParameterResource(CGparameter parameter); If the parameter does not have any associated resource, cgGetParameterResource() returns CG_UNDEFINED. The two functions cgGetResource() and cgGetResourceString() allow you to determine the correspondence between a resource enumerant and its corresponding string: CGresource cgGetResource(const char *resourceString); const char *cgGetResourceString(CGresource resource); If the string passed to cgGetResource() does not correspond to any resource, CG_UNDEFINED is returned. Using cgGetParameterBaseResource() allows you to retrieve the base resource for a parameter in a Cg program: CGresource cgGetParameterBaseResource(CGparameter parameter); The base resource is the first resource in a set of sequential resources. For example, if a given parameter has a resource equal to CG_TEXCOORD7, its base resource is CG_TEXCOORD0. Only parameters with resources whose name ends with a number have a base resource. All other parameters return CG_UNDEFINED. When cgGetParameterBaseResource() is call
304. rface, each member of which has a different implementation. This ability makes it easy for applications to construct material trees on the fly, to change the number or type of texture maps applied to an object at application runtime, and so on. Specifying which particular implementation of an interface to use is accomplished through connecting parameters. In particular, a shared instance of a struct that implements the interface is created by the application. This shared instance is then connected to the interface parameter. The act of connecting the parameters causes the interface parameter to inherit the shared parameter's implementation of the interface. This process can be thought of as implementing compile-time polymorphism. It is legal to connect a shared parameter of a user-defined structure type to an interface parameter, as long as the structure type implements that interface type. At runtime, the entry point cgIsParentType(), coupled with cgGetParameterNamedType(), can be used to determine type parenthood. When a structure parameter is connected to an interface parameter, copies of any child (that is, member) variables associated with the source structure parameter are automatically created as children of the sink parameter. Under most circumstances, these member variable copies can be ignored by the application, since their values and variability are a
305. riangles drawn. The second step consists in enabling the varying parameter for a specific drawing call: void cgGLEnableClientState(CGparameter parameter); The equivalent disabling function is: void cgGLDisableClientState(CGparameter parameter); Another way to set the vertex varying parameter is to use the cgGLSetParameter functions. When a cgGLSetParameter function is called for a varying parameter, the appropriate immediate-mode OpenGL entry point is called. The cgGLGetParameter functions do not apply to varying parameters. Setting Sampler Parameters: Setting a sampler parameter requires two steps. First, an OpenGL texture object handle must be assigned to the sampler parameter. Next, the texture unit associated with the sampler must be enabled prior to drawing. The first step must be done explicitly by the application. The second step may also be performed explicitly by the application, or the OpenGL Cg runtime can be instructed to automatically manage texture units itself. The first step consists in assigning an OpenGL texture object to the sampler parameter using: void cgGLSetTextureParameter(CGparameter parameter, GLuint textureName); where textureName is the OpenGL texture name. Note that when your application makes OpenGL calls to initialize the texture environment for a given sampler, it is important to remember to set the active texture unit to that associated with the sampler before doing so. The sampler's texture unit
306. rithmetic instruction. From here on, these operations are referred to as input modifiers and output modifiers. The ps_1_x profiles also restrict when a texture addressing operation or arithmetic operation can occur in the program. A texture addressing operation may not have any dependency on the output of an arithmetic operation unless: The arithmetic operation is a valid input modifier for the texture addressing operation. The arithmetic operation is part of a complex texture addressing operation (which are summarized in the section on Auxiliary Texture Functions). Input and output modifiers may be used to perform simple arithmetic operations without generating an arithmetic instruction. Instead, the arithmetic operation modifies the assembly instruction or source registers to which it is applied. For example, the following Cg expression: z = (x - 0.5 + y) / 2 could generate the following pixel shader instruction (assuming x is in t0, y is in t1, and z is in r0): add_d2 r0, t0_bias, t1 How different DirectX pixel shader 1_X instruction set modifiers are expressed in Cg programs is summarized in Table 48. For more details on the context in which each modifier is allowed and ways in which modifiers may be combined, refer to the DirectX pixel shader 1_X documentation. Table 48, ps_1_x Instruction Set Modifiers (Instruction/Register Modifier: Cg Expression): instr_x2: 2*x. instr_x4: 4*x. instr_d2: x/2.
307. rix 106 808 00504 0000 006 NVIDIA Introduction to the Cg Runtime Library Called at application startup void OnStartup J Create Gomes context cgCreateContext Called whenever the Direct3D device needs to be created void OnCreateDevice Pass the Direct3D device to th xpanded interfac cgD3D9SetDevice device Determine the best profiles to use CGprofile vertexProfile cgD3D9GetLatestVertexProfile CGprofile pixelProfile cgD3D9GetLatestPixelProfile Grab the optimal options for each profile const char vertexOptions cgD3D9GetOptimalOptions vertexProfile 0 const char pixelOptions cgD3D9GetOptimalOptions pixelProfile 0 Create the vertex shader vertexProgram cgCreateProgramFromFile context CG_SOURCE VertexProgram cg vertexProfile VertexProgram vertexOptions If your program uses explicit binding semantics you can create a vertex declaration using those semantics const D3DVERTEXELEMENT9 declaration t 97 sizeof float D3DDECLTYPE FLOAT3 D3DDECLMETHOD DEFAULT D3DDECLUSAGE POSITION 0 9 sizeof float D3DDECLTYPE D3DCOLOR D3DDECLMETHOD DEFAULT D3DDECLUSAGE COLOR O0 TROF sizeof float D3DDECLTYPE_FLOAT2 D3DDECLMETHOD_DEFAULT D3DDECLUSAGE TEXCOORD 0 D3DD3CL_END I
308. rm float scale, uniform float bias) offsettexRECTScaleBias(uniform samplerRECT tex, float2 st, float4 prevlookup, uniform float4 m, uniform float scale, uniform float bias): Performs the following: float2 newst = st + m.xy * prevlookup.xx + m.zw * prevlookup.yy; float4 result = tex2D/RECT(tex, newst); return result * saturate(prevlookup.z * scale + bias); where st are texture coordinates associated with sampler tex, prevlookup is the result of a previous texture operation, m is the offset texture matrix, scale is the offset texture scale, and bias is the offset texture bias. This function can be used to generate the offset_2d_scale or offset_rectangle_scale NV_texture_shader instructions. Table 38, fp20 Auxiliary Texture Functions (continued) (Texture Function: Description): tex1D_dp3(sampler1D tex, float3 str, float4 prevlookup): Performs the following: return tex1D(tex, dot(str, prevlookup.xyz)); where str are texture coordinates associated with sampler tex, and prevlookup is the result of a previous texture operation. This function can be used to generate the dot_product_1d NV_texture_shader instruction. tex2D_dp3x2(uniform sampler2D tex, float3 str, float4 intermediate_coord, float4 prevlookup), texRECT_dp3x2(uniform samplerRECT tex, float3 str, float4 intermediate_coord, float4 prevlookup): Performs the following: float2 ne
309. rm points from model space to clip space. The second matrix, ModelViewIT, is the inverse transpose of the modelview matrix. The third parameter, LightVec, is a vector that specifies the location of the light source. Basic Transformations: Now we start the body of the vertex program: vertout OUT; OUT.HPosition = mul(ModelViewProj, IN.Position); A vertex program is responsible for calculating the homogeneous clip-space position of the vertex given the vertex's model-space coordinates. Therefore, the vertex's model-space position (given by IN.Position) needs to be transformed by the concatenation of the modelview and projection matrices (called ModelViewProj in this example). The transformed position is assigned directly to OUT.HPosition. Note that you are not responsible for the perspective division when using vertex programs. The hardware automatically performs the division after executing the vertex program. Since we want to do our lighting in eye space, we have to transform the model-space normal IN.Normal to eye space: // transform normal from model space to view space float3 normalVec = normalize(mul(ModelViewIT, IN.Normal).xyz); Remember that when transforming normals, we need to multiply by the inverse transpose of the modelview matrix. Then we normalize the eye-space normal vector and store it as normalVec. Prepare for Lighting: The subsequent steps pre
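Why normals need the inverse transpose can be checked numerically. The C sketch below is not taken from the manual: the helper names inverse_transpose3, mul3, and dot3 are hypothetical, matrices are assumed to be row-major 3x3 arrays, and the inverse transpose is computed from cofactors. With a non-uniform scale, transforming a normal by the matrix itself breaks its perpendicularity to the surface tangent, while the inverse transpose preserves it.

```c
#include <assert.h>
#include <stddef.h>

/* Writes the inverse transpose of the row-major 3x3 matrix m into out.
   Returns 1 on success, 0 if m is singular. The inverse transpose equals
   the cofactor matrix divided by the determinant. */
int inverse_transpose3(const float m[9], float out[9])
{
    float c[9];
    assert(m != NULL && out != NULL);
    c[0] = m[4] * m[8] - m[5] * m[7];
    c[1] = m[5] * m[6] - m[3] * m[8];
    c[2] = m[3] * m[7] - m[4] * m[6];
    c[3] = m[2] * m[7] - m[1] * m[8];
    c[4] = m[0] * m[8] - m[2] * m[6];
    c[5] = m[1] * m[6] - m[0] * m[7];
    c[6] = m[1] * m[5] - m[2] * m[4];
    c[7] = m[2] * m[3] - m[0] * m[5];
    c[8] = m[0] * m[4] - m[1] * m[3];
    float det = m[0] * c[0] + m[1] * c[1] + m[2] * c[2];
    if (det == 0.0f) return 0;
    for (int i = 0; i < 9; ++i) out[i] = c[i] / det;
    return 1;
}

/* out = m * v, treating v as a column vector. */
void mul3(const float m[9], const float v[3], float out[3])
{
    for (int r = 0; r < 3; ++r)
        out[r] = m[r * 3] * v[0] + m[r * 3 + 1] * v[1] + m[r * 3 + 2] * v[2];
}

/* Dot product of two 3-vectors. */
float dot3(const float a[3], const float b[3])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```

For example, with a scale of 2 along x, a tangent (1, 1, 0), and a normal (1, -1, 0), the tangent becomes (2, 1, 0); the normal transformed by the matrix itself becomes (2, -1, 0), which is no longer perpendicular to it, whereas the normal transformed by the inverse transpose becomes (0.5, -1, 0), which is.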
310. rmance or precision reasons, it is generally wiser to use the standard library functions when possible. The standard library functions will continue to be optimized for future GPUs, meaning that a shader written today will automatically be optimized for the latest architectures at compile time. Additionally, the standard library provides a convenient unified interface for both vertex and fragment programs. This section describes the contents of the Cg Standard Library, including: mathematical functions; geometric functions; texture map functions; derivative functions; predefined helper struct types. Where appropriate, functions are overloaded to support scalar and vector variations when the input and output types are the same. Mathematical Functions: Table 1, "Mathematical Functions," lists the mathematical functions that the Cg Standard Library provides. The list includes functions useful for trigonometry, exponentiation, rounding, and vector and matrix manipulations, among others. All functions work on scalars and vectors of all sizes, except where noted. Table 1, Mathematical Functions (Function: Description): abs(x): Absolute value of x. acos(x): Arccosine of x in range [0, π], x in [-1, 1]. all(x): Returns true if every component of x is not equal to 0. Returns false otherwise. any(x): Returns true if any comp
311. rofile: CG_PROFILE_VP30; compiler option: -profile vp30. OpenGL NV30 fragment programs: runtime profile CG_PROFILE_FP30; compiler option -profile fp30. OpenGL NV2X vertex programs: runtime profile CG_PROFILE_VP20; compiler option -profile vp20. OpenGL NV2X fragment programs: runtime profile CG_PROFILE_FP20; compiler option -profile fp20. DirectX 9 vertex shaders: runtime profiles CG_PROFILE_VS_2_X, CG_PROFILE_VS_2_0; compiler options -profile vs_2_x, -profile vs_2_0. DirectX 9 pixel shaders: runtime profiles CG_PROFILE_PS_2_X, CG_PROFILE_PS_2_0; compiler options -profile ps_2_x, -profile ps_2_0. DirectX 8 vertex shaders: runtime profile CG_PROFILE_VS_1_1; compiler option -profile vs_1_1. DirectX 8 pixel shaders: runtime profiles CG_PROFILE_PS_1_3, CG_PROFILE_PS_1_2, CG_PROFILE_PS_1_1; compiler options -profile ps_1_3, -profile ps_1_2, -profile ps_1_1. The DirectX 9 profiles (vs_2_x and ps_2_x), OpenGL ARB profiles (arbfp1 and arbvp1), NV30 OpenGL profiles (fp30 and vp30), and NV40 OpenGL profiles (fp40 and vp40) generally support longer, more complex programs and offer more features and functionality to the developer. These are referred to as advanced profiles. The DirectX 8 profiles (vs_1_1 and ps_1_3) and NV2X OpenGL profiles (fp20 and vp20) have more restrictions on program length and available
312. rtex shaders are summarized in Table 47. Table 47, vs_1_1 Varying Output Binding Semantics (Binding Semantics Name: Corresponding Data): POSITION: Output position (oPos). PSIZE: Output point size (oPts). FOG: Output fog value (oFog). COLOR0-COLOR1: Output color values (oD0, oD1). TEXCOORD0-TEXCOORD7: Output texture coordinates (oT0-oT7). When using the vs_1_1 profile under DirectX 9, it is necessary to tell the compiler to produce dcl statements to declare varying inputs. The option -profileopts dcls causes dcl statements to be added to the compiler output. DirectX Pixel Shader 1.x Profiles (ps_1_*). Overview: The DirectX pixel shader 1_X profiles are used to compile Cg source code to DirectX PS 1.1, PS 1.2, or PS 1.3 pixel shader assembly. Profile names: ps_1_1 (for DirectX PS 1.1 pixel shaders), ps_1_2 (for DirectX PS 1.2 pixel shaders), ps_1_3 (for DirectX PS 1.3 pixel shaders). How to invoke: Use the compiler options -profile ps_1_1, -profile ps_1_2, -profile ps_1_3. The deprecated profile dx8ps is also available and is synonymous with ps_1_1. This document describes the capabilities and restrictions of Cg when using the DirectX pixel shader 1_X profiles. DirectX PS 1.4 is not currently supported by any Cg profile; all statements about ps_1_x in the remainder of this document refer only to ps_1_1, ps_1_2, and ps_1_3. The underlying instru
313. rts a number of options that allow these limits to be specified on the compiler command line; see "Options" on page 262 for details. These limits may also be values appropriate for the host computer's GPU, which are set using the cgGLSetOptimalOptions() Cg runtime call. Language Constructs and Support. Data Types: This profile implements data types as follows: The float data type is implemented as IEEE 32-bit single precision. The half, fixed, and double data types are treated as float. The int data type is supported using floating-point operations. The sampler* types are supported to specify sampler objects used for texture fetches. Statements and Operators: With the ARB fragment program profiles, while, do, and for statements are allowed only if the loops they define can be unrolled, because there is no dynamic branching in ARB fragment program 1. Comparison operators (>, <, >=, <=, ==, !=) and Boolean operators (||, &&, ?:) are allowed. However, the logic operators (&, |, ^, ~) are not. Using Arrays and Structures: Variable indexing of arrays is not allowed. Array and structure data is not packed. Bindings. Binding Semantics for Uniform Data: The valid binding semantics for uniform parameters in the arbfp1 profile are found in Table 19. Table 19, arbfp1 Uniform Input Binding Semantics (Binding Semantics Name: Corres
314. s Here my unc is declared to be a function of a single parameter vals which is a one dimensional array of floats However the length of the vals array is not specified The effect of this declaration is that any subsequent call to myfunc that passes a one dimensional array of floats of any size resolves to the declared function For example float myfunc float vals nicard mein 4 14 808 00504 0000 006 NVIDIA Introduction to the Cg Language float valel 2 5 float valsz 76 float myvall myfunc vals1 match float myval2 myfunc vals2 match The actual length of an array parameter sized or unsized may be queried via the length pseudo member float myfunc float vals AS 07 for aime 2 Op 3 lt vals encep i i sum vals i return sum The size of a particular dimension of a multidimensional array may be queried by dereferencing the appropriate number of dimensions of the array For example vals2d 0 length gives the length of the second dimension of the two dimensional vals2d array lost myjaruiae Gelkoeie yelsz2en 11 d float sum 0 fore ine i Op a lt velszel lenguas au 4 tow aum J 7 aL lt weise lencia 3a if sum vals 0 3 3 11 g return sum If the length of any dimension of an array parameter is specified that parameter only matches calls with variables whose corresponding dimension is of th
315. s advantage of this situation to compute lighting per vertex rather than per pixel. In a similar manner, it may be advantageous to move any vertex shader computation that is solely dependent on the values of uniform parameters to the CPU and then to pass the result of the computation into the vertex shader with different uniform parameters. For example, if the vertex shader is passed a float3 vector giving the direction of a distant light source, the vector should be normalized on the CPU and passed to the vertex shader. This avoids the need to repeatedly and unnecessarily recompute normalize(lightvector) in the vertex shader. 8. Avoid Matrix Transposes Just for Multiplication: Computing the transpose of a matrix can often be avoided. If you would like to multiply a transposed float3x3 matrix m by a float3 v, mul(v, m) is equivalent to, and more efficient than, mul(transpose(m), v). 9. Minimize Conditional Code in Fragment Programs: GPUs don't currently support branching in fragment programs; a program with a large amount of code that is conditionally executed (for example, in an if-else expression) tends to run at the same speed as if all of it were executed. Therefore, if you have a large amount of conditional code and it is possible to evaluate the condition on the CPU, it may be advantageous to have multiple versions of the shader source code and to bind the one with
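The equivalence behind tip 8 can be verified on the CPU. The following C sketch is an illustration only: the function names are hypothetical, matrices are assumed to be row-major 3x3 arrays, and vec_times_mat3 treats v as a row vector in the way mul(v, m) is described above.

```c
#include <assert.h>
#include <stddef.h>

/* out = v * m, with v treated as a row vector (analogous to mul(v, m)). */
void vec_times_mat3(const float v[3], const float m[9], float out[3])
{
    assert(v != NULL && m != NULL && out != NULL);
    for (int c = 0; c < 3; ++c)
        out[c] = v[0] * m[c] + v[1] * m[3 + c] + v[2] * m[6 + c];
}

/* out = m * v, with v treated as a column vector (analogous to mul(m, v)). */
void mat_times_vec3(const float m[9], const float v[3], float out[3])
{
    assert(v != NULL && m != NULL && out != NULL);
    for (int r = 0; r < 3; ++r)
        out[r] = m[r * 3] * v[0] + m[r * 3 + 1] * v[1] + m[r * 3 + 2] * v[2];
}

/* out = transpose(m). */
void transpose3(const float m[9], float out[9])
{
    assert(m != NULL && out != NULL);
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            out[c * 3 + r] = m[r * 3 + c];
}
```

Both paths produce the same vector; the first simply skips building the transposed matrix, which is the saving the tip describes.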
316. s are not required to support any operations on arbitrarily sized arrays; only support for vectors and matrices is required. Unsized Arrays: An unsized array may be declared by declaring an array with no length specified between the brackets: float a[]; The actual length of the array may then be set by the runtime before program execution. In program code, the length of any array can be queried using the syntax a.length, where length acts like an undeclared structure parameter that holds the actual length of the array at runtime. Function Overloading: Multiple functions may be defined with the same name, as long as the definitions can be distinguished by unqualified parameter types and do not have an open-profile conflict (see "Overloading of Functions by Profile" on page 226). Function matching rules: 1. Add all visible functions with a matching name in the calling scope to the set of function candidates. 2. Eliminate functions whose profile conflicts with the current compilation profile. 3. Eliminate functions with the wrong number of formal parameters. If a candidate function has excess formal parameters, and each of the excess parameters has a default value, do not eliminate the function. 4. If the set is empty, fail. 5. For each actual parameter expression in sequence, perform the following: a. If the type of the actual parameter matches the unqualified type of the
317. s by sampling at different frequencies float3 fleckN float3 tex2D FleckMap vert uv 37 2 1 Pecki E S eleat9 ex2bIlleciMapPVeE5uVvePS 2 1 2 5 ie exei 2 p float fleck_n_d_h saturate dot fleckN H float3 fleck color FleckColor pow fleck n d h 808 00504 0000 006 187 NVIDIA Cg Language Toolkit lerp NewPaintSpec y NewPaintSpec w v_dist Control the ambient fleckiness and also attenuate with distance fleck_color fleck_color Ambient vert halfangle w DIFFUSE close Ie cl Series ial il 2 float3 paintResult lerp Ambient paint_color parme dolos le er FRESNEL float Fresnel saturate dot ClearCoat reflect_color Fresnel pow Fresnel NewPaintSpec z This helps make the clear coat less omnipresent only the really perceptually bright areas reflect Vi ThS moste Fresnel saturate vert fresn Fresnel Show more of the specular reflection environment when in fresnel zones diffuse 1 fresnel environment fresnel paintResult lerp paintResult reflect color Fresnel SPECULAR diffuse specular flecks paintResult paintResult n_d_h fleck_color OUTPUT return paintResult xyzz 188 808 00504 0000 006 NVIDIA Basic Profile Sample Shaders This chapter provides a set of basic profile sample shaders written in Cg Each shader comes with an accompanying snapshot description and
318. s that create and manipulate data: Basic types; Structures; Arrays; Type conversions. Basic Data Types: Cg supports seven basic data types. float: A 32-bit IEEE floating-point (s23e8) number that has one sign bit, a 23-bit mantissa, and an 8-bit exponent. This type is supported in all profiles, although the DirectX 8 pixel profiles implement it with reduced precision and range for some operations. half: A 16-bit IEEE-like floating-point (s10e5) number. int: A 32-bit integer. Profiles may omit support for this type or have the option to treat int as float. fixed: A 12-bit fixed-point (s1.10) number. It is supported in all fragment profiles. bool: Boolean data is produced by comparisons and is used in if and conditional operator constructs. This type is supported in all profiles. sampler: The handle to a texture object comes in six variants: sampler, sampler1D, sampler2D, sampler3D, samplerCUBE, and samplerRECT. With one exception, these types are supported in all pixel profiles (fragment profiles) and the NV40 vertex program profile. The samplerRECT type is not supported in the DirectX profiles. string: Although it is not possible to use strings in Cg program code for any currently existing profile, they can be set and have their values queried through the Cg runtime API; thus they can be useful for storing information a
319. s when passing function parameters. Top-level function parameters may be defined using that type. If a type is partially supported, variables may be defined using that type, but no useful operations can be performed on them. Partial support for types makes it easier to share data structures in code that is targeted at different profiles. Type Categories: The integral type category includes types cint and int. The floating type category includes types cfloat, float, half, and fixed. Note that floating really means floating or fixed/fractional. The numeric type category includes integral and floating types. The compile-time type category includes types cfloat and cint. These types are used by the compiler for constant type conversions. The concrete type category includes all types that are not included in the compile-time type category. The scalar type category includes all types in the numeric category, the bool type, and all types in the compile-time category. In this specification, a reference to a <category> type (such as a reference to a numeric type) means one of the types included in the category (such as float, half, or fixed). Constants: A constant may be explicitly typed or implicitly typed. Explicit typing of a constant is performed, as in C, by suffixing the constant with a single character indicating the type of the constant: f for float, d for double, h for half, x for fixed. A
320. sche EUR RE ROO QUR RPE AR aR CH Ree RE CUR p good 244 Minimum Requirements for if while and for Statements 244 New Vector Operators versan RO E ROROE Oe RS EROR COE Re Rhee Roe 244 Arithmetic Precision and Range 2 rs 246 Operator Precedentes 247 Operator ENNaNCEmMENtS okai a i dok aca a e eR a i 247 Or AAA IC PE 248 Reserved WOPIS 2 coram avidus i a ORCI mnie DANCER Poma TE nau e AE 249 Cg Standard Library FUNCIONS riseire epp ron in Parens dace 250 Vertex Program Profiles creon scs qo Bcd toa x uot RU RR a 250 Mandatory Computation of Position Output lee 250 Position nVatia CQ cache hedged nage de qux Yo OE tee po di barra e aas 250 Binding Semantics for Outputs sva kei o pp dead eed E a RE E Lgs 251 Fragment Program Proves ai fees x gia dba ak ose ERR EO DR REOR OR D d 252 Binding Semantics for Outputs i us sepe qp rr Cad qun g d de Rae us 252 Appendix B Language Profiles oo rr RR REIN a Rara EE Rua ad Rmi au 255 OpenGL ARB Vertex Program Profile arbvp1 liliis 256 OVAs cura Reps d E RE REN ERRASSE IEEE Sd de mtb dug ds 256 Accessing OpenGL State ss exsdeer RERO eke meee hed EALE EN DEPRE 256 Position nValidliCBa axons acs 8 gig a bob regen SORS Rebeka Qd Debout ded anis 258 Data pesto a a Ia ERR ORI ge Ud BC ee ae 258 Compatibility with the vp20 Vertex Program Profile o0oo oooooo 259 Loading CONSEANES soria cda PCT ER 260 Bihdinds ua ri a ma gig gR E m a
321. sed for data that is specified with each element of the stream of input data. For example, the varying inputs to a vertex program are the per-vertex values that are specified in vertex arrays. For a fragment program, the varying inputs are the interpolants, such as texture coordinates. Uniform inputs are used for values that are specified separately from the main stream of input data and don't change with each stream element. For example, a vertex program typically requires a transformation matrix as a uniform input. Often, uniform inputs are thought of as graphics state. Varying Inputs to a Vertex Program: A vertex program typically consumes several different per-vertex (varying) inputs. For example, the program might require that the application specify the following varying inputs for each vertex (typically in a vertex array): model space position; model space normal vector; texture coordinate. In a fixed-function graphics pipeline, the set of possible per-vertex inputs is small and predefined. This predefined set of inputs is exposed to the application through the graphics API. For example, OpenGL 1.4 provides the ability to specify a vertex array of normal vectors. In a programmable graphics pipeline, there is no longer a small set of predefined inputs. It is perfectly reasonable for the developer to write a vertex program that uses a per-vertex refractive inde
322. shader is developed. The ultimate test for a shader is "Does it look right?" To that end, the ability to quickly prototype and modify a shader is crucial to the rapid development of high-quality effects. The compiler optimizes code automatically and performs low-level tasks, such as register allocation, that are tedious and prone to error. Shading code written in a high-level language is much easier to read and understand. It also allows new shaders to be easily created by modifying previously written shaders. What better way to learn than from a shader written by the best artists and programmers? Shaders written in a high-level language are portable to a wider range of hardware platforms than shaders written in assembly code. This chapter introduces Cg (C for Graphics), a high-level language tailored for programming GPUs. Cg offers all the advantages just described, allowing programmers to finally combine the inherent power of the GPU with a language that makes GPU programming easy. The Cg Language: Cg is based on C, but with enhancements and modifications that make it easy to write programs that compile to highly optimized GPU code. Cg code looks almost exactly like C code, with the same syntax for declarations, function calls, and most data types. Before describing the Cg language in detail, it is important to explain the reason for some of the differences that exis
323. sionality of an array is queried using int cgGetArrayDimension CGparameter param Dimensions are enumerated starting at 0 zero The length of a particular dimension of an array can be retrieved by calling int cgGetArraySize CGparameter param int dimension The total number of elements in an array may be queried using int cgGetArrayTotalSize CGparameter param Here param may be an array of any dimension the returned value is the total number of elements across all dimensions of the array The type of each element of an array can be queried using CGtype cgGetArrayType CGparameter param For example if a parameter were declared sc lleyene d array 21 1517 cgGetArrayType would return CG_FLOAT4 If it were declared misere uses L3 cgGetArrayType would return the enumerant corresponding to the user defined mystruct type Unsized Array Length Unsized arrays can be assigned concrete sizes via the runtime Under many profiles setting the size of unsized arrays associated with a Cg program is required before the program can be compiled 808 00504 0000 006 67 NVIDIA Cg Language Toolkit The length of one dimensional unsized arrays can be set using void cgSetArraySize CGparameter param int size The size of multidimensional arrays may be set using void cgSetMultiDimArraySize CGparameter param int sizes Note that arrays with completely determined lengths may not have their size changed using
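The relationship between the per-dimension length and the total element count described above can be sketched in plain C. This is only an illustration under the assumption that the total is the product of the per-dimension lengths, as the text states; array_total_size is a hypothetical helper and does not call the Cg runtime.

```c
/* Hypothetical helper: given the length of each dimension (what
 * cgGetArraySize would report per dimension), compute the total number of
 * elements across all dimensions (what cgGetArrayTotalSize is described
 * as returning). Returns 0 when there are no dimensions. */
int array_total_size(const int *sizes, int dimensions) {
    if (dimensions <= 0)
        return 0;
    int total = 1;
    for (int d = 0; d < dimensions; ++d)
        total *= sizes[d];   /* dimensions are enumerated starting at 0 */
    return total;
}
```

For a two-dimensional array with lengths 2 and 5, the helper reports 10 elements; for a one-dimensional array the total equals the single length.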
324. specular highlights Fig 12 Example of Car Paint 9 808 00504 0000 006 183 NVIDIA Cg Language Toolkit Vertex Shader Source Code for Car Paint 9 This shader is based on the Time Machine temporal rust shader Car paint data was measured by Cornell University from samples provided by Ford Motor Company Siecuce azy di float4 OPosition POSITION float3 ONormal NORMAL lomo why EXCOORDO float3 Tangent EXCOORD1 float3 Binormal EXCOORD2 float3 Normal EXCOORD3 y Struct VS OUTBRUT Y float4 HPosition POSITION coord position in window late 2 uw TEXCOORDO wavy fleckmap coords loaro Licime TEXCOORD1 light pos tangent space float4 halfangle TEXCOORD2 Blinn halfangle float3 reflection TEXCOORD3 Refl vector per vertex float4 view TEXCOORD4 view tangent space float3 tangent TEXCOORD5 view tangent matrix float3 binormal TEXCOORD6 float3 normal 8 ECC OORD Wim 7 float fresn COLORO y VS_OUTPUT main a2v vert TRANSFORMATIONS uniform float4x4 ModelView uniform float4x4 ModelViewIT uniform float4x4 ModelViewProj uniform float3 LightVector uniform float3 EyePosition Obj space Obj space VS OUTPUT O Generate homogeneous POSITION O HPosition mul ModelViewProj vert OPosition Generate BASIS matrix float3x3 ModelTangent normalize vert Tangent n
325. spose objToTangentSpace since the inverse of a rotation is its transpose P4 So a row of TangentToCubeSpace is the transform by objToTangentSpace of the corresponding row of il ObjToCubeSpace OUT TangentToCubeSpaceO0 xyz mul objToTangentSpace ObjToCubeSpace 0 xyz OUT TangentToCubeSpacel xyz mul objToTangentSpace ObjToCubeSpace l1 xyz OUT TangentToCubeSpace2 xyz mul objToTangentSpace ObjToCubeSpace 2 xyz compute the eye vector T going from eye to shaded point in cube space float3 eyeVector mul ObjToCubeSpace IN Position EyePosition OUT TangentToCubeSpace0 w eyeVector x OUT TangentToCubeSpacel w eyeVector y OUT TangentToCubeSpace2 w eyeVector z transform position to projection space OUT Position mul WorldViewProj IN Position retura OUT 198 808 00504 0000 006 NVIDIA Basic Profile Sample Shaders Pixel Shader Source Code for Bump and Reflection Mapping EEwKeE wed 1 float4 Position POSITION in projection space float4 TexCoord TEXCOORDO0 JE ESSE WEE C ME cR SEO d from tangent to cube space float4 TangentToCubeSpace0 TEXCOORD1 second row of the 3x3 transform from tangent to cube space float4 TangentToCubeSpacel TEXCOORD2 third row of the 3x3 transform f d from tangent to cube space float4 TangentToCubeSpace2 TEXCOORD3 y AM WIE IN uniform sampler2D NormalMap unif
326. stroyed. Parameter References: A parameter that is referenced by the original Cg source code may be optimized out of the compiled program by the compiler, in which case the application can simply ignore it and not set its value. Calling cgIsParameterReferenced() allows you to check whether a parameter is potentially used by the final compiled program: CGbool cgIsParameterReferenced(CGparameter parameter); Note that the value returned by this entry point is conservative, but not always exact, particularly if the program has not yet been compiled. Also note that no error is generated if you set the value of a parameter that is not referenced. Parameter Size: A number of core Cg runtime entry points are provided for querying and setting parameter size and length. The number of rows or columns associated with a parameter can be retrieved using: int cgGetParameterRows(CGparameter param); int cgGetParameterColumns(CGparameter param); A scalar parameter is considered to have a single row and a single column, while a vector parameter has a single row and columns equal to the length of the vector. If param is a matrix parameter, the values returned correspond to those of the matrix. If param is an array, the number of rows or columns associated with each element of the array is returned. If param is not a numeric type, 0 is returned by either entry point. The dimen
327. sual Studio workspace, both provided on the accompanying CD, that you can use to start experimenting with Cg. "Advanced Profile Sample Shaders" on page 153: A list of sample NV30 shaders, complete with source code. "Basic Profile Sample Shaders" on page 189: A list of sample NV2X shaders, complete with source code. "Appendix A: Cg Language Specification" on page 221: The formal Cg language specification. "Appendix B: Language Profiles" on page 255: Describes features and restrictions of the currently supported language profiles: DirectX 8 vertex, DirectX 8 pixel, OpenGL ARB vertex, NV2X OpenGL vertex, NV30 OpenGL vertex, NV30 OpenGL fragment, OpenGL ARB fragment, NV40 OpenGL vertex, and NV40 OpenGL fragment. "Appendix C: Nine Steps to High-Performance Cg" on page 321: Strategies for getting the most out of your Cg code. "Appendix D: Cg Compiler Options" on page 329: A list of the various command-line options that the Cg compiler accepts. Cg Developer's CD: The CD provided with this book contains the entire Cg release, which allows you to get started immediately. The readme.txt file on the CD describes the contents of the release in detail. You can begin working with Cg immediately by reading the "Introduction to the Cg Language" on page 1 and then going through "A Brief Tutorial" on page 145. Once you have a basic understanding of the Cg language
328. sulting declaration is compatible with the shader This is really just a sanity check assert cgD3D8ValidateVertexDeclaration vertexProgram declaration Load the program with th xpanded interfac Parameter shadowing is enabled second parameter TRUE 110 808 00504 0000 006 NVIDIA Introduction to the Cg Runtime Library cgD3D8LoadProgram vertexProgram TRUE 0 0 declaration Ii Create the pizel shader fragmentProgram cgCreateProgramFromFile context CG_SOURCE FragmentProgram cg pixelProfile FragmentProgram pixelOptions Load the program with th xpanded interfac Parameter shadowing is enabled second parameter TRUE Ignore vertex shader specifc flags like declaration and usage cgD3D8LoadProgram fragmentProgram TRUE 0 0 0 Grab some parameters modelViewMatrix cgGetNamedParameter vertexProgram ModelViewMatrix baseTexture cgGetNamedParameter fragmentProgram BaseTexture someColor cgGetNamedParameter fragmentProgram SomeColor Sanity check that parameters have th xpected siz assert cgD3D8TypeToSize cgGetParameterType modelViewMatrix 16 assert CgD3D8TypeToSize cgGetParameterType someColor NI Set parameters that don t change They can be set only once since parameter shadowing is enabled cgD3D8SetTexture baseTexture texture cgD3D8SetUniform som
329. t between Cg and C. Fundamentally, it comes down to the difference in the programming models for GPUs and for CPUs. Cg's Programming Model for GPUs: CPUs normally have only one programmable processor. In contrast, GPUs have at least two programmable processors, the vertex processor and the fragment processor, plus other non-programmable hardware units. The processors, the non-programmable parts of the graphics hardware, and the application are all linked through data flows. Cg's model of the GPU is illustrated by Fig. 1. [Fig. 1: Cg's Model of the GPU] The Cg language allows you to write programs for both the vertex processor and the fragment processor. We refer to these programs as vertex programs and fragment programs, respectively. (Fragment programs are also known as pixel programs or pixel shaders, and we use these terms interchangeably in this document.) Cg code can be c
330. t of functions on top of the core Cg runtime to ease the integration of Cg to an application based on this API. They essentially interface between the core runtime data structures and the API data structures to provide the following facilities. Setting the parameter values: A distinction is made between texture, matrix, array, vector, and scalar values, as those various types are handled differently by each API and have different data structures. Executing the program: Program execution is divided into program loading (passing the result of the Cg compiler to the API) and program binding (setting the program as the one to execute for any subsequent draw calls). This is because those two operations are usually done at different times. A program is loaded each time it is recompiled, and it is bound each time it needs to be executed for a particular draw call. Parameter Shadowing: When the value of a uniform parameter is set by some function of the OpenGL Cg runtime, it is actually stored internally (or shadowed) by either the Cg or the OpenGL runtime so that it does not need to be reset every time the program is about to be executed. This behavior is referred to as parameter shadowing. If the Direct3D Cg runtime expanded interface (described in "Direct3D Expanded Interface" on page 98) is used, parameter shadowing can be turned on or off on a per-program basis. When
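The idea behind parameter shadowing can be sketched in a few lines of C. This is a toy model, not how the Cg runtime is implemented: the Device and Program types, the register layout, and the function names are all hypothetical. It assumes only what the text says, namely that a value set once is remembered by the runtime and does not have to be set again before each execution.

```c
#include <string.h>

#define MAX_PARAMS 4

/* Stand-in for the constant registers shared by every program on the GPU. */
typedef struct { float reg[MAX_PARAMS][4]; } Device;

/* Each program keeps its own remembered ("shadowed") uniform values. */
typedef struct {
    float shadow[MAX_PARAMS][4];
    int   is_set[MAX_PARAMS];
} Program;

/* Setting a uniform only records the value in the program's shadow copy. */
void program_set_uniform(Program *p, int index, const float value[4]) {
    memcpy(p->shadow[index], value, sizeof(float) * 4);
    p->is_set[index] = 1;
}

/* Binding pushes every remembered value to the device, so the application
 * does not have to set the values again before each draw call. */
void program_bind(const Program *p, Device *dev) {
    for (int i = 0; i < MAX_PARAMS; ++i)
        if (p->is_set[i])
            memcpy(dev->reg[i], p->shadow[i], sizeof(float) * 4);
}
```

In this model, binding a second program overwrites the shared registers, and rebinding the first program restores its values without any further set calls. Without shadowing, the application would have to resend those values itself.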
331. t only default as in C. Cg supports function overloading by the number of operands and by operand type. The choice of a function is made by matching one operand at a time, starting at the first operand. The formal language specification provides more details on the matching rules, but it is not normally necessary to study them because the overloading generally works in an intuitive manner. For example, the following code declares two versions of a function, one that takes two bool operands and one that takes two float operands: bool same(float a, float b) { return (a == b); } bool same(bool a, bool b) { return (a == b); } Arithmetic Operators from C: Cg includes all the standard C arithmetic operators and allows the operators to be used on vectors as well as on scalars. The vector operations are always performed in elementwise fashion. For example, float3(a, b, c) * float3(A, B, C) equals float3(a*A, b*B, c*C). These operators can also be used in a form that mixes scalar and vector: the scalar is smeared to create a vector of the necessary size to perform an elementwise operation. Thus, a * float3(A, B, C) is equal to float3(a*A, a*B, a*C). The built-in arithmetic operators do not currently support matrix operands. It is important to remember that matrices are not the same as vectors, even if their dimensions are the same. Multiplication Functions: Cg's mu
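The elementwise and scalar-smearing behavior described above can be modeled on the CPU with a short C sketch, using multiplication as the example operator. The float3 struct and the three functions are hypothetical stand-ins written for this illustration; in Cg itself the built-in operators do this work directly.

```c
/* Hypothetical CPU-side stand-in for Cg's float3. */
typedef struct { float x, y, z; } float3;

/* Elementwise product: each component is multiplied independently. */
float3 float3_mul(float3 a, float3 b) {
    float3 r = { a.x * b.x, a.y * b.y, a.z * b.z };
    return r;
}

/* "Smearing": replicate a scalar into every component of a vector. */
float3 float3_smear(float s) {
    float3 r = { s, s, s };
    return r;
}

/* Mixed scalar/vector product: smear the scalar, then go elementwise. */
float3 scalar_mul(float s, float3 v) {
    return float3_mul(float3_smear(s), v);
}
```

With these definitions, multiplying (1, 2, 3) by (4, 5, 6) gives (4, 10, 18), and multiplying the scalar 2 by (1, 2, 3) gives (2, 4, 6), matching the elementwise rule in the text.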
332. t4 variable. Scalar conversions: Implicit conversion of any scalar numeric type to any other scalar numeric type is allowed. A warning may be issued if the conversion is implicit and a loss of precision is possible. Implicit conversion of any scalar object type to any compatible scalar object type is allowed. Conversions between incompatible scalar object types, or between object and numeric types, are not allowed, even with an explicit cast. A sampler is compatible with sampler1D, sampler2D, sampler3D, samplerCube, and samplerRECT. No other object types are compatible (sampler1D is not compatible with sampler2D, even though both are compatible with sampler). Scalar types may be implicitly converted to vectors and matrices of compatible type. The scalar is replicated to all elements of the vector or matrix. Scalar types may also be explicitly cast to structure types if the scalar type can be legally cast to every member of the structure. Vector conversions: Vectors may be converted to scalar types (the first element of the vector is selected). A warning is issued if this is done implicitly. A vector may also be implicitly converted to another vector of the same size and compatible element type. A vector may be converted to a smaller compatible vector or a matrix of the same total size, but a warning is issued if an explicit cast is not used. Matrix conversions: Matrices may be converted to a scalar type (element (0,0) is selected). As with ve
333. tand the DirectX VS 1.1 Vertex Shaders and the code the compiler produces, see the Vertex Shader Reference in the DirectX 8.1 SDK documentation. The int data type is supported using floating-point operations, which adds extra instructions for proper truncation for divides, modulos, and casts from floating-point types. The fixed or sampler data types are not supported, but the profile does provide the minimal partial support that is required for these data types by the core language specification; that is, it is legal to declare variables using these types as long as no operations are performed on the variables. Statements and Operators: The if, while, do, and for statements are allowed only if the loops they define can be unrolled, because there is no branching in VS 1.1 shaders. There are no subroutine calls either, so all functions are inlined. Comparison operators are allowed (>, <, >=, <=, ==, !=) and Boolean operators (||, &&, ?:) are allowed. However, the logic operators (&, |, ^, ~) are not allowed. Using Arrays: Variable indexing of arrays is allowed as long as the array is a uniform constant. For compatibility reasons, arrays indexed with variable expressions need not be declared const, just uniform. However, writing to an array that is later indexed with a variable expression yields unpredictable results. Array data is not packed because verte
334. te int4 Keep Zero 2 0 or Replace Incr EXT stencil two side Decr Invert IncrWrap DecrWrap TexGenSMode ndx int ObjectLinear 1 0 or 1 3 EyeLinear ARB texture cube map SphereMap EXT texture cube map Ol ReflectionMap NV texgen reflection for NormalMap ReflectionMap or NormalMap ndx must be greater or equal to zero and less than the value of GL MAX TEXTURE COORDS TexGenTMode ndx int Same as TexGenSMode TexGenRMode ndx int ObjectLinear 1 0 or 1 3 EyeLinear ARB texture cube map ReflectionMap EXT texture cube map Or NormalMap NV texgen reflection for ReflectionMap Or NormalMap ndx must be greater or equal to zero and less than the value of GL MAX TEXTURE COORDS TexGenQMode ndx int ObjectLinear 1 0 ndx must be greater or EyeLinear equal to zero and less than the value of GL MAX TEXTURE COORDS TexGenSEyePlane ndx float 4 1 0 ndx must be greater or equal to zero and less than the value of GL MAX TEXTURE COORDS TexGenTEyePlane ndx float 4 Same as TexGenSEyePlane TexGenREyePlane ndx float 4 Same as TexGenSEyePlane 136 NVIDIA 808 00504 0000 006 Introduction to CgFX Table 6 CgFX OpenGL State Manager States continued State Name Type Valid Enumerants Requires TexGenQEyeP lane ndx float4 Same as TexGenSEyePlane TexGenSObjectPlane float 4 Same as ndx TexGenSEyePlane TexGenTObjectPlane float 4 Same as ndx TexGenSEyePlane TexGenRObject
335. ter shadowing is enabled cgD3D9SetTexture baseTexture texture cgD3D9SetUniform someColor amp constantColor Called io render the seen void OnRender Load model view matrix D3DXMATRIX modelViewMatrix J d 108 808 00504 0000 006 NVIDIA Introduction to the Cg Runtime Library Set the parameters that change every frame This must be done before binding the programs cgD3D9SetUniformMatrix modelViewMatrix amp modelViewMatrix Set the vertex declaration device gt SetVertexDeclaration vertexDeclaration Bind the programs This downloads any parameter values that have been previously set cgD3D9BindProgram vertexProgram cgD3D9BindProgram fragmentProgram Draw scene Called before the device changes or is destroyed void OnDestroyDevice fy Calling tales function tells da xpanded interface to release its internal reference to the Direct3D devic eme free its Directs resources cgD3D9SetDevice 0 Called before application shuts down void OnShutdown This frees any core runtime resource cgDestroyContext context Expanded Interface DirectD3D 8 Application The following C code links the previous vertex and fragment programs to the Direct3D 8 application include lt cg cg h gt include lt cg cgD3D8 h gt IDirect3DDevice8 device Initialized somewher is IDirect3DTexture8 texture Ini
336. terface element for manipulating uniform parameters, or to describe the type of render target a rendering pass is expecting: float bumpHeight < string gui = "slider"; float uimin = 0.0f; float uimax = 1.0f; float uistep = 0.1f; > = 0.5f; The annotation appears after the optional semantic and before variable initialization. Applications can query for annotations and use them to expose certain parameters to artists in a CgFX-aware tool, such as Discreet's 3ds max 5 or Alias|Wavefront's Maya 4.5. More Details: The purpose of this chapter has been to give you a brief overview of Cg so that you can get started quickly and experiment to gain hands-on experience. If you would like some more detail about any of the language features described in this chapter, see "Cg Language Specification" on page 221. Cg Standard Library Functions: Cg provides a set of built-in functions and predefined structures with binding semantics to simplify GPU programming. These functions are similar in spirit to the C standard library, providing a convenient set of common functions. In many cases, the functions map to a single native GPU instruction, meaning they are executed very quickly. Of those functions that map to multiple native GPU instructions, you may expect the most useful to become more efficient in the near future. Although customized versions of specific functions can be written for perfo
337. ters and pointer-related capabilities, such as the & and -> operators, are not supported. Arrays are supported, but with some limitations on size and dimensionality. Restrictions on the use of computed subscripts are also permitted. Arrays may be designated as packed. The operations allowed on packed arrays may be different from those allowed on unpacked arrays. Predefined packed types are provided for vectors and matrices. It is strongly recommended these predefined types be used. Unsized arrays can be created by declaring an array's dimension as []. The array's actual dimension can be set at runtime before a final compilation step. There is a built-in swizzle operator, .xyzw or .rgba, for vectors. This operator allows the components of a vector to be rearranged and also replicated. It also allows the creation of a vector from a scalar. For an lvalue, the swizzle operator allows components of a vector or matrix to be selectively written. There is a similar built-in swizzle operator for matrices: ._m<row><col>[_m<row><col>][...]. This operator allows access to individual matrix components and allows the creation of a vector from elements of a matrix. For compatibility with DirectX 8 notation, there is a second form of matrix swizzle, which is described later. Numeric data types are different. Cg's primary numer
338. tes the type of the parameter array elements: 1 for arrays of float1, 2 for arrays of float2, and so on. The variables startIndex and numberOfElements specify which elements of the array parameter are set: they are the numberOfElements elements at the indices that range from startIndex to startIndex+numberOfElements-1. Passing a value of 0 for numberOfElements tells the functions to set all the values starting at index startIndex up to the last valid index of the array, namely cgGetArraySize(parameter, 0)-1. This is equivalent to setting numberOfElements to cgGetArraySize(parameter, 0)-startIndex. The parameter array is an array of scalar values. It must have numberOfElements values for the cgGLSetParameterArray1 functions, 2*numberOfElements values for the cgGLSetParameterArray2 functions, and so on. The corresponding parameter value retrieval functions are as follows: void cgGLGetParameterArray1f(CGparameter parameter, long startIndex, long numberOfElements, float *array); void cgGLGetParameterArray1d(CGparameter parameter, long startIndex, long numberOfElements, double *array); void cgGLGetParameterArray2f(CGparameter parameter, long startIndex, long numberOfElements, float *array); void cgGLGetParameterArray2d(CGparameter parameter, long startIndex, long numberOfElements, double *array); void cgGLGetParameterArray3f(CGparameter parameter, long startIndex, long numberOfElements, float *array); void cgGLGetPar
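The sizing rules above reduce to a little arithmetic, sketched here in C. Both helpers are hypothetical and exist only to make the rules concrete; they do not call the Cg runtime, and arraySize stands for the value the text obtains from cgGetArraySize for dimension 0.

```c
/* Hypothetical helper: how many array elements a call actually covers.
 * A numberOfElements of 0 means "from startIndex to the end of the array". */
long effective_element_count(long arraySize, long startIndex,
                             long numberOfElements) {
    if (numberOfElements == 0)
        return arraySize - startIndex;
    return numberOfElements;
}

/* Hypothetical helper: how many scalars the caller's buffer must hold.
 * componentsPerElement is 1 for float1 arrays, 2 for float2, and so on. */
long scalars_required(int componentsPerElement, long elementCount) {
    return (long)componentsPerElement * elementCount;
}
```

For a float2 array of 10 elements with startIndex 3 and numberOfElements 0, the call covers 7 elements, so the buffer must hold 14 scalars.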
339. texture lookups 23 texture map functions 38 texture maps for performance 324 textures 123 thin film effect pixel shader code example 182 vertex shader code example 180 tutorial 145 type conversions array 235 matrix 234 scalar 234 structure 235 vector 234 type equivalency 236 type promotion 236 assignment 237 smearing 237 type qualifiers 233 const 233 in 233 out 233 types general discussion 229 partial support 231 12 234 U uniform inputs 5 uniform modifer use of 225 uninitialized variables use of 241 unsized arrays 125 V variables global 241 uninitialized use of 241 varying inputs 5 6 vector data types 12 vector operators new 244 vectorization for performance 321 vectors constructing 21 808 00504 0000 006 vertex color 149 vertex position 149 vertex program 121 varying output 7 vertex program profiles 250 vertex programs defined 3 virtual machine 127 void type specification 229 vp20 profile 279 vp30 profile 270 vs_1_1 profile 304 vs_2_0 profile 296 vs 2 x profile 296 Ww water improved pixel shader code example sample shader 157 vertex shader code example web site NVIDIA xvi while statements 244 workspace loading 145 write mask operator 22 described 246 337 NVIDIA 160 158 Cg Language Toolkit 338 808 00504 0000 006 NVIDIA
340. the CgFX state assignment BlendFunc int2 Zero DstAlpha When a state assignment depends on the presence of an OpenGL extension for example BlendFuncSeparate requires either EXT_blend_func_separate or the presence of OpenGL 1 4 it is possible to successfully load an effect file that uses that extension in one of its techniques even if the OpenGL context doesn t support that extension However validation of any technique that uses such an unsupported extension in of its passes will fail The following table lists the names of the states supported by the CgFX OpenGL state manager their types and valid enumerants The Requires column in the tables below indicates what OpenGL version or extension is required for each state assignment Table 6 CgFX OpenGL State Manager States State Name Type Valid Enumerants Requires AlphaFunc float2 Never Less OpenGL 1 0 enum LEqual Equal reference_ Greater NotEqual value GEqual Always BlendFunc int2 src Zero One 1 0 1 4 or factor DestColor NV blend square for dst factor OneMinusDestColor SrcColor or SrcAlpha OneMinusSrcColor for OneMinusSrcAlpha src factor and DstAlpha DstColor Or OneMinusDstAlpha OneMinusDstColor for SrcAlphaSaturate dst factor SrcColor OneMinusSrcColor ConstantColor OneMinusConstantColor ConstantAlpha OneMinusConstantAlpha 130 808 00504 0000 006 NVIDIA Introduction to CgFX
341. the existing profiles
- Run on future profiles corresponding to new 3D APIs or to hardware that did not exist at the time the Cg programs were written

No Dependency Limitations

If you link a Cg program to the application when it is compiled, the application is too dependent on the result of the compilation. The application program has to refer to the Cg program input parameters by using the hardware register names that are output by the Cg compiler. This approach is awkward for two reasons:
- The register names can't be easily matched to the corresponding meaningful names in the Cg program without looking at the compiler output.
- Register allocations can change each time the Cg program, the Cg compiler, or the compilation profile changes. This means you have the inconvenience of updating the application each time as well.

In contrast, linking a Cg program to the application program at run time removes the dependency on the Cg compiler. With the runtime, you need to alter the application code only when you add, delete, or modify Cg input parameters.

Input Parameter Management

The Cg runtime also offers additional facilities to manage the input parameters of the Cg program. In particular, it makes data types such as arrays and matrices easier to deal with. These additional functions also encompass the necessary 3D API calls to minimize code length and reduce programmer errors.
342. the unpack_4ubyte function.

C Pseudocode:
    ub.x = round(255.0 * clamp(a.x, 0.0, 1.0));
    ub.y = round(255.0 * clamp(a.y, 0.0, 1.0));
    ub.z = round(255.0 * clamp(a.z, 0.0, 1.0));
    ub.w = round(255.0 * clamp(a.w, 0.0, 1.0));
    result = (ub.w << 24) | (ub.z << 16) | (ub.y << 8) | ub.x;

unpack_4ubyte

    half4 unpack_4ubyte(float a)

Unpacks the four 8-bit integers in a and scales the results into individual 16-bit floating point values between 0.0 and 1.0.

C Pseudocode:
    result.x = ((a >> 0) & 0xFF) / 255.0;
    result.y = ((a >> 8) & 0xFF) / 255.0;
    result.z = ((a >> 16) & 0xFF) / 255.0;
    result.w = ((a >> 24) & 0xFF) / 255.0;

Appendix B Language Profiles

OpenGL NV_vertex_program 1.0 Profile (vp20)

Overview

The vp20 Vertex Program profile is used to compile Cg source code to vertex programs for use by the NV_vertex_program OpenGL extension.
- Profile name: vp20
- How to invoke: Use the compiler option -profile vp20.

This section describes the capabilities and restrictions of Cg when using the vp20 profile.

The vp20 profile limits Cg to match the capabilities of the NV_vertex_program extension. NV_vertex_program has the same capabilities as DirectX 8 vertex shaders, so the limitations that this profile places on the Cg source code written by the programmer are the same as those of the DirectX VS 1.1 shader profile. Aside from the syntax of the compiler output, the only difference between the
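The pack and unpack pseudocode earlier in this chunk can be tried out on the CPU. The following is a minimal C sketch, not the profile's implementation: it assumes the packed value is carried in a uint32_t rather than in a Cg float, and the names vec4, clamp01, to_ubyte, pack_4ubyte_sketch, and unpack_4ubyte_sketch are hypothetical helpers introduced only for illustration.

```c
#include <stdint.h>

/* Four-component float vector, standing in for Cg's half4/float4 (assumption). */
typedef struct { float x, y, z, w; } vec4;

/* Clamp v to the range [0, 1]. */
static float clamp01(float v) {
    if (v < 0.0f) return 0.0f;
    if (v > 1.0f) return 1.0f;
    return v;
}

/* Scale a clamped component to 0..255 and round to the nearest integer. */
static uint32_t to_ubyte(float v) {
    return (uint32_t)(255.0f * clamp01(v) + 0.5f);
}

/* Pack four components into one 32-bit word: x in the low byte, w in the high byte. */
uint32_t pack_4ubyte_sketch(vec4 a) {
    return (to_ubyte(a.w) << 24) | (to_ubyte(a.z) << 16) |
           (to_ubyte(a.y) << 8)  |  to_ubyte(a.x);
}

/* Extract each byte and rescale it to a float between 0.0 and 1.0. */
vec4 unpack_4ubyte_sketch(uint32_t a) {
    vec4 r;
    r.x = (float)((a >> 0)  & 0xFFu) / 255.0f;
    r.y = (float)((a >> 8)  & 0xFFu) / 255.0f;
    r.z = (float)((a >> 16) & 0xFFu) / 255.0f;
    r.w = (float)((a >> 24) & 0xFFu) / 255.0f;
    return r;
}
```

Because every byte value k maps to k / 255.0 and back to k after rounding, packing the result of an unpack should reproduce the original word in this sketch.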
343. tialized somewher ls D3DXCOLOR constantColor Initialized somewher ls CGcontext context CGprogram vertexProgram fragmentProgram CGparameter baseTexture someColor modelViewMatrix 808 00504 0000 006 109 NVIDIA Cg Language Toolkit Called at application startup void OnStartup Vi Create comerse context cgCreateContext Ii Called whenever the Direct sn device meses to ba crearad void OnCreateDevice Pass the Direct3D device to th xpanded interfac cgD3D8SetDevice device Determine the best profiles to use CGprofile vertexProfile cgD3D8GetLatestVertexProfile CGprofile pixelProfile cgD3D8GetLatestPixelProfile Grab the optimal options for each profile const char vertexOptions cgD3D8GetOptimalOptions vertexProfile 0 const char pixelOptions cgD3D8GetOptimalOptions pixelProfile 0 Create the vartez ssl vertexProgram cgCreateProgramFromFile context CG_SOURCE VertexProgram cg vertexProfile VertexProgram vertexOptions If your program uses explicit binding semantics like this one you can create a vertex declaration using those semantics DWORD declaration D3DVSD STREAM 0 D3DVSD REG D3DVSDE POSITION D3DVSDT FLOAT3 D3DVSD REG D3DVSDE DIFFUSE D3DVSDT_D3DCOLOR D3DVSD REG D3DVSDE TEXCOORDO D3DVSDT FLOAT2 D3DVSD END Ensure the re
344. tics for Varying Input/Output Data

The valid binding semantics for varying input parameters in the vp30 profile are summarized in Table 24. One can also use TANGENT and BINORMAL instead of TEXCOORD6 and TEXCOORD7. These binding semantics map to NV_vertex_program2 input attribute parameters. The two sets act as aliases to each other.

Table 24 vp30 Varying Input Binding Semantics

Binding Semantics Name               Corresponding Data
POSITION, ATTR0                      Input Vertex, Generic Attribute 0
BLENDWEIGHT, ATTR1                   Input vertex weight, Generic Attribute 1
NORMAL, ATTR2                        Input normal, Generic Attribute 2
COLOR0, DIFFUSE, ATTR3               Input primary color, Generic Attribute 3
COLOR1, SPECULAR, ATTR4              Input secondary color, Generic Attribute 4
TESSFACTOR, FOGCOORD, ATTR5          Input fog coordinate, Generic Attribute 5
PSIZE, ATTR6                         Input point size, Generic Attribute 6
BLENDINDICES, ATTR7                  Generic Attribute 7
TEXCOORD0-TEXCOORD7, ATTR8-ATTR15    Input texture coordinates (texcoord0-texcoord7), Generic Attributes 8-15
TANGENT, ATTR14                      Generic Attribute 14
BINORMAL, ATTR15                     Generic Attribute 15

The valid binding semantics for varying output parameters in the vp30 profile are summarized in Table 25. These binding semantics map to NV_vertex_program2 output registers. The two sets act as aliases to each other.

Table 25 vp30 Varying Output Binding Semantics

Binding Semantics Name               Corres
345. ting

    // calculate diffuse component
    float diffuse = dot(normalVec, lightVec);
    // calculate specular component
    float specular = dot(normalVec, halfVec);
    // Use the lit function to compute lighting vector from
    // diffuse and specular values
    float4 lighting = lit(diffuse, specular, 32);

Here we use the Cg Standard Library to perform dot products using dot(). We also make use of the Standard Library's lit() function to calculate a Blinn-style lighting vector based on the previously computed dot products. The returned vector holds the diffuse lighting contribution in the y-coordinate and the specular lighting contribution in the z-coordinate. Remember to take advantage of the Standard Library to help speed up your development cycle.

Modulating the Diffuse and Specular Lighting Contributions

Once the diffuse and specular lighting contributions (lighting.y and lighting.z) have been calculated, we need to modulate them with the object's material properties:

    // blue diffuse material
    float3 diffuseMaterial = float3(0.0, 0.0, 1.0);
    // white specular material
    float3 specularMaterial = float3(1.0, 1.0, 1.0);
    // combine diffuse and specular contributions
    // and output final vertex color
    OUT.Color.rgb = lighting.y * diffuseMaterial + lighting.z * specularMaterial;
    OUT.Color.a = 1.0;
    return OUT;

We define the object's diffuse material color as blue. We
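To make the role of lighting.y and lighting.z concrete, here is a small C sketch of the lighting vector and the material modulation described above. It is an illustration under stated assumptions, not the Standard Library's code: lit_sketch follows the commonly documented behavior of lit() (ambient 1 in x, clamped diffuse in y, specular in z that is zero when either dot product is negative, 1 in w), it takes an integer exponent so that a simple multiplication loop can stand in for pow(), and the names vec4, color3, pow_int, lit_sketch, and modulate_sketch are hypothetical.

```c
#include <assert.h>

/* Stand-ins for Cg's float4 and float3 (assumptions for this sketch). */
typedef struct { float x, y, z, w; } vec4;
typedef struct { float r, g, b; } color3;

/* Integer power by repeated multiplication; replaces pow() in this sketch. */
static float pow_int(float base, int exponent) {
    assert(exponent >= 0);
    float result = 1.0f;
    for (int i = 0; i < exponent; ++i) {
        result *= base;
    }
    return result;
}

/* Blinn-style lighting vector: diffuse term in y, specular term in z. */
vec4 lit_sketch(float n_dot_l, float n_dot_h, int shininess) {
    vec4 r;
    r.x = 1.0f;                               /* ambient coefficient */
    r.y = n_dot_l > 0.0f ? n_dot_l : 0.0f;    /* diffuse, clamped at zero */
    r.z = (n_dot_l < 0.0f || n_dot_h < 0.0f)  /* no highlight when facing away */
              ? 0.0f
              : pow_int(n_dot_h, shininess);
    r.w = 1.0f;
    return r;
}

/* rgb = lighting.y * diffuseMaterial + lighting.z * specularMaterial */
color3 modulate_sketch(vec4 lighting, color3 diffuse_material, color3 specular_material) {
    color3 c;
    c.r = lighting.y * diffuse_material.r + lighting.z * specular_material.r;
    c.g = lighting.y * diffuse_material.g + lighting.z * specular_material.g;
    c.b = lighting.y * diffuse_material.b + lighting.z * specular_material.b;
    return c;
}
```

With the blue diffuse and white specular materials from the text, the blue channel receives both contributions while red and green receive only the specular highlight.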
346. tion vertexDeclaration vice gt SetTexture baseTextureUnit texture vice gt SetVertexShader vertexShader vice gt SetPixelShader pixelShader Q 000 Draw scene 94 808 00504 0000 006 NVIDIA Introduction to the Cg Runtime Library Called before the device changes or is destroyed void OnDestroyDevice vertexShader gt Release pixelShader gt Release vertexDeclaration gt Release Called before application shuts down void OnShutdown This frees any core runtime resources The minimal interface has no dynamic storage to free cgDestroyContext context Direct3D 8 Application The following C code links the previous vertex and fragment programs to the Direct3D 8 application include lt cg cg h gt include lt cg cgD3D8 h gt IDirect3DDevice8 device Initialized somewhere else IDirect3DTexture8 texture Initialized somewhere else D3DXMATRIX matrix Initialized somewhere else D3DXCOLOR constantColor Initialized somewhere else CGcontext context CGprogram vertexProgram fragmentProgram DWORD vertexShader pixelShader CGparameter baseTexture someColor modelViewMatrix Called at application startup void OnStartup Z2 Create comerse context cgCreateContext Called whenever the Direct3D device needs to be created void OnCreateDevice Create the vertex shader vertexProgram c
347. tly. There are several changes that force the same operation to be expressed differently in Cg than in C:
- A Boolean type, bool, is introduced, with corresponding implications for operators and control constructs.
- Arrays are first-class types, because Cg does not support pointers.
- Functions pass values by value/result, and thus use an out or inout modifier in the formal parameter list to return a parameter. By default, formal parameters are in, but it is acceptable to specify this explicitly. Parameters can also be specified as in out, which is semantically the same as inout.

Differences from ANSI C

Cg was developed based on the ANSI C language with the following major additions, deletions, and changes. (This is a summary; more detail is provided later in this document.)
- Language profiles (described in "Profiles" on page 225) may subset language capabilities in a variety of ways. In particular, language profiles may restrict the use of for and while loops. For example, some profiles may only support loops that can be fully unrolled at compile time.
- A binding semantic may be associated with a structure tag, a variable, or a structure element to denote that object's mapping to a specific hardware or API resource. See "Binding Semantics" on page 242.
- Reserved keywords goto, break, and continue are not supported.
- Reserved keywords switch, case, and default are not supported. Labels are not supported either.
- Poin
348. top-level function or by any functions that it calls. The output of the program comes from the return value of the function (which is always implicitly varying) and from any out parameters, which must also be varying. Parameters to a program of type sampler are implicitly const.

Statements

Statements are expressed just as in C, unless an exception is stated elsewhere in this document. Additionally,
- The if, while, and for statements require bool expressions in the appropriate places.
- Assignment is performed using =. The assignment operator returns a value just as in C, so assignments may be chained.
- The new discard statement terminates execution of the program for the current data element (such as the current vertex or current fragment) and suppresses its output. Vertex profiles may choose to omit support for discard.

Minimum Requirements for if, while, and for Statements

The minimum requirements are as follows:
- All profiles should support if, but such support is not strictly required for older hardware.
- All profiles should support for and while loops if the number of loop iterations can be determined at compile time. "Can be determined at compile time" is defined as follows: The loop-iteration expressions can be evaluated at compile time by use of intra-procedural constant propagation and folding, where the variables through which constant v
349. trans modelview 0 tendif OUT HPosition mul ModelViewProj IN Position float3 normal normalize mul ModelViewIT TENES mas Ez float3 eyeToVert normalize mul ModelView IN POS Eon s x72 P reflect th ye vector across the normal vector for reflection OUT TexCoord0 float4 reflect eyeToVert normal 1 0 float 0 15 compute the fresnel term float oneMCosAngle 1 dot eyeToVert normal oneMCosAngle pow oneMCosAngle 5 OUT Color0 lerp oneMCosAngle 1 0 xxxx rerurn OUT 808 00504 0000 006 201 NVIDIA Cg Language Toolkit Grass Description This effect shows procedural animation of geometry using a Sine function along with calculation of a normal for the procedurally deformed geometry Fig 17 Fig 17 Example of Grass Vertex Shader Source Code for Grass Serle EMS d float4 Position POSITION float4 Normal NORMAL 202 808 00504 0000 006 NVIDIA Basic Profile Sample Shaders float4 TexCoordO0 TEXCOORDO float4 Coloro COLORO y struct vertout float4 Hposition BOSTON Hali MN ike COLORO float4 TexCoordO0 TEXCOORDO y vertout main app2vert IN uniform uniform uniform uniform float4x4 ModelViewProj float4x4 ModelView float4x4 ModelViewIT float4 Constants vertout OUT we need to figure OUT what the position is float4 position position z 0 position y 0 IN Position add IN the act
350. trq are texture coordinates associated with sampler tex, prevlookup is the result of a previous texture operation, intermediate_coord1 are texture coordinates associated with the (n-2)th texture unit, intermediate_coord2 are texture coordinates associated with the (n-1)th texture unit, and eye is the eye-ray vector. This function can be used to generate the texm3x3pad/texm3x3pad/texm3x3spec instruction combination in all ps_1_x profiles.

tex_dp3x2_depth(float3 str, float4 intermediate_coord, float4 prevlookup)

Performs the following:

    float z = dot(intermediate_coord.xyz, prevlookup.xyz);
    float w = dot(str, prevlookup.xyz);
    return z / w;

where str are texture coordinates associated with the nth texture unit, intermediate_coord are texture coordinates associated with the (n-1)th texture unit, and prevlookup is the result of a previous texture operation. This function can be used with the DEPTH varying out semantic to generate the texm3x2pad/texm3x2depth instruction combination in ps_1_3.

Examples

The following examples illustrate how a developer can use Cg to achieve DirectX pixel shader 1_X functionality.

Example 1

    struct VertexOut {
        float4 color     : COLOR0;
        float4 texCoord0 : TEXCOORD0;
        float4 texCoord1 : TEXCOORD1;
    };

    float4 main(VertexOut IN,
                uniform sampler2D diffuseMap,
                uniform sampler2D normalMap) : COLOR
    {
        float4 diffuseTexColor = t
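The arithmetic behind tex_dp3x2_depth, as listed above, is two dot products and a divide. The C sketch below mirrors that computation on the CPU for illustration only. The vec3 type and the names dot3 and dp3x2_depth_sketch are hypothetical, and the assert on w is a choice made for this sketch: the text does not say how the hardware treats a zero denominator.

```c
#include <assert.h>

/* Three-component vector; the .xyz parts of the Cg arguments (assumption). */
typedef struct { float x, y, z; } vec3;

/* Standard three-component dot product. */
static float dot3(vec3 a, vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* z / w, where z uses the (n-1)th unit's coordinates and w uses the nth unit's. */
float dp3x2_depth_sketch(vec3 str, vec3 intermediate_coord, vec3 prevlookup) {
    float z = dot3(intermediate_coord, prevlookup);
    float w = dot3(str, prevlookup);
    assert(w != 0.0f);  /* sketch-only guard; hardware behavior is not stated */
    return z / w;
}
```

For example, with str = (0, 0, 2), intermediate_coord = (1, 0, 0), and prevlookup = (3, 0, 1), z is 3 and w is 2, so the sketch returns 1.5.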
351. ual base location of d cas straw uel IN Colum POSE Lam POSWELOM a WN Colo eO xp position a Tou s UN Color z7 figure OUT where the wind is coming from float4 origin lora 20 0 20 0 4 float4 dir POSLEOM Gueiepusp Wf al tae imncsnesity Our idas wmi float inten sin Constants x 2 length dir JUN POSE LOIN ye dir normalize dir Bezier curve stuff here float4 0 0 0 0 Float 0 UN Colocd w 2 0 0 P loca eli x lt nimeem IN Color Y chile atimta 0 do the Bezier linear interpolation steps ILO de IN Colort o were we need to do som iloare rere lil ihoaee orii Tlosr ciel 808 00504 0000 006 203 NVIDIA Cg Language Toolkit loge temo lheroa ciellil qued ww clock cenos lema eri gus i Pp float4 result lerp temp temp2 t add IN the height and wind displacement components position position result position w 1 transform for sending to the reg combiners OUT Hposition mul ModelViewProj position calculate the texture coordinate 721 from the position passed IN UA econ lose N osea ar 195 5300 8 ML 10 9 find the normal we need one more point to do a partial cedo lez Curl etrr Er 05 8 tempa lewo erele ceils Er0 05 float4 newResult lerp temp temp2 t 0 05 do a crossproduct with a vector that EY is horizontal across the screen float normal cross result
352. ues(CGannotation, int *nvalues);
    const int *cgGetIntAnnotationValues(CGannotation, int *nvalues);
    const char *cgGetStringAnnotationValue(CGannotation);
    const int *cgGetBooleanAnnotationValues(CGannotation, int *nvalues);

OpenGL State

When cgGLRegisterStates() is called, the CgFX OpenGL runtime initializes state assignments that correspond to almost all appropriate or useful OpenGL API calls. The set of states and state callbacks that are registered by this call compose the CgFX OpenGL state manager.

There is a one-to-one mapping between the state assignments that are provided by the OpenGL state manager and the corresponding OpenGL calls. Given an OpenGL call of interest, it is intended to be simple to determine which state assignment it corresponds to, and vice versa. For example, the state assignment

    ClearColor = float4(0, 1, 0, 1);

leads to the call glClearColor(0, 1, 0, 1) when the state assignment is executed during a call to cgSetPassState().

For calls that take enumerated values (for example, GL_DEST_COLOR for glBlendFunc()), corresponding enumerants are defined by the CgFX OpenGL state manager, again with a straightforward mapping: GL_DEST_COLOR corresponds to DestColor, and so forth.

When an OpenGL call takes multiple parameters or multiple enumerants, a corresponding vector type is used; for example, a call to glBlendFunc(GL_ZERO, GL_DST_ALPHA) corresponds to
353. unction can generate an OpenGL error in addition to the Cg-specific error. These errors are checked in Cg as in any OpenGL application, by using glGetError().

Direct3D Cg Runtime

The Direct3D Cg runtime is composed of two interfaces:
- Minimal interface: This interface makes no Direct3D calls itself and should be used when you prefer to keep the Direct3D code in the application itself.
- Expanded interface: This interface makes the Direct3D calls necessary to provide enhanced program and parameter management and should be used when you prefer to let the Cg runtime manage the Direct3D shaders.

Direct3D Minimal Interface

The minimal interface simply supplies convenient functions to convert some information provided by the core runtime to information specific to Direct3D.

Vertex Declaration

In Direct3D, you have to supply a vertex declaration that establishes a mapping between the vertex shader input registers and the data provided by the application as data streams. In Direct3D 9, this vertex declaration is bound to the current state the same way the vertex shader is (see the Direct3D 9 documentation on IDirect3DDevice9::CreateVertexDeclaration() and IDirect3DDevice9::SetVertexDeclaration() for a detailed explanation). In Direct3D 8, the vertex declaration is required at the time you create the vertex shader (for more information, see the Direct3D 8 documentation on
354. unsized array of Light interface objects loops over them and returns the sum of the values returned by their respective value methods interface Light float4 value y struct Soo lic 2 lem 1 floats value recaen Elo ata aa O y loa mesa Was cora biome LIY 3 COOR 1 float4 v float4 0 0 0 0 foe aime 3b Of a lt lL lewwguimge sri Ww a Ifa velue O p return v Recall that all uniform parameters to the program must have expressions in the parenthesized list in the compile statement and therefore one expression is necessary here for the 1 parameter 808 00504 0000 006 125 NVIDIA Cg Language Toolkit Resolution using Cg The first way that main can be compiled is to provide the name of an effect parameter that resolves both the actual size of the array as well as the concrete type that implements the Light interface SpotLight spots 4 technique pass FragmentProgram compile arbfpl main spots Resolution using the Cg runtime Alternatively the application can leave the resolution of the concrete types and array size until later so that they may be set via Cg runtime calls from the application as one typically does for Cg programs that are not CgFX For this case the expression passed to the compile statement should just be an unsized array of the abstract interface type ieme liiciaes technique pass FragmentProgram compile arbfpl main lights Th
355. urns CG FALSE The declaration returned by cgD3D9GetVertexDeclaration or cgD3D8GetVertexDeclaration is for a single stream so that for the following program mole masa atin deluxe Posiriom e POSITION in float4 color 2 COLOROF in tloat4 texCoord TEXCOORDO out float4 hpos 2 IOS IVE ION i it is equivalent to const D3DVERTEXELEMENT9 declaration LO 0 slizcor float D3DDECLTYPE_FLOAT4 D3DDECLMETHOD_DEFAULT D3DDECLUSAGE POSITION O0 Jj LO 4 v sizcor ac Leste y D3DDECLTYPE_FLOAT4 D3DDECLMETHOD DEFAULT D3DDECLUSAGE COLOR 0 i 9 8 epibeAxexouE elote y D3DDECLTYPE_FLOAT4 D3DDECLMETHOD DEFAULT D3DDECLUSAGE TEXCOORD 0 D3DD3CL_END y for the Direct3D 9 Cg runtime and it is equivalent to const DWORD declaration 808 00504 0000 006 87 NVIDIA Cg Language Toolki D3DVSD_STREAM 0 D3DVSD_REG D3DVSDE_POSITION D3DVSDT_FLOAT4 D3DVSD_REG D3DVSDE_DIFFUSE D3DVSDT_FLOAT4 D3DVSD_REG D3DVSDE_TEXCOORDO D3DVSDT_FLOAT4 O D3DVSD_END y for the Direct3D 8 Cg runtime Usually though you want to apply a vertex program to geometric data that come in multiple streams or with specific vertex formats In this case the vertex declaration is based on the vertex formats rather than the pr
356. utomatically set by the Cg runtime However in some situations it may be useful to query a sink side member parameter for its underlying resource for example A shared instance of a structure whose type in defined in one Cg program or effect may be connected to parameters of other programs or effects provided that the entities involved define the source structure types and destination interface types equivalently See Parameter Type Equivalency on page 65 or more details If the types are not equivalent cgconnect Parameter generates a runtime error The following example illustrates structure to interface connection by creating three programs all of which define a type named Foo with one program s definition differing from the others interface MyInterface close Weill itllke te xx p y struct MyStruct MyInterface float Scale float Val float x return Scale x y float4 main MyInterface foo COLOR Stevia GECKO Well 52 2 P 808 00504 0000 006 61 NVIDIA Cg Language Toolkit Listing 1 Cg Program 1 interface MyInterface float Val float x y Sici licio Mistico EMNIN er decem float Scale float Val float x return Scale x y float4 main MyInterface foo COLOR erica too Well 5 8 AKA y Listing 2 Cg Program 2 interface MyInterface half valiai x y struct MyStruct MyInterface float Scale palk valjali recura Scala sx y
357. utput parameters in the arbvp1 profile are found in Table 18. These binding semantics map to ARB_vertex_program output registers. The two sets act as aliases to each other.

Table 18 arbvp1 Varying Output Binding Semantics

Binding Semantics Name             Corresponding Data
POSITION, HPOS                     Output position
PSIZE, PSIZ                        Output point size
FOG, FOGC                          Output fog coordinate
COLOR0, COL0                       Output primary color
COLOR1, COL1                       Output secondary color
BCOL0                              Output backface primary color
BCOL1                              Output backface secondary color
TEXCOORD0-TEXCOORD7, TEX0-TEX7     Output texture coordinates

Note: The application must call glEnable(GL_COLOR_SUM_ARB) in order to enable COLOR1 output when using the arbvp1 profile.

The profile also allows WPOS to be present as binding semantics on a member of a structure of a varying output data structure, provided the member with this binding semantics is not referenced. This allows Cg programs to have the same structure specify the varying output of an arbvp1 profile program and the varying input of an fp30 profile program.

Options

The arbvp1 profile supports the following profile-specific options:

    NumTemps=<n>
    MaxAddressRegs=<n>
    MaxInstructions=<
358. vector size is shorter than the semantic's vector size, the larger-numbered components of the semantic receive their default values if applicable, and otherwise are undefined. In the case above, the R and G components of the output color are obtained from mycolor, while the B and A components of the color are undefined.

Appendix B Language Profiles

This appendix describes the language capabilities that are available in each of the following profiles supported by the Cg compiler:
- OpenGL ARB Vertex Program Profile (arbvp1)
- OpenGL ARB Fragment Program Profile (arbfp1)
- OpenGL NV_vertex_program 3.0 Profile (vp40)
- OpenGL NV_fragment_program 2.0 Profile (fp40)
- OpenGL NV_vertex_program 2.0 Profile (vp30)
- OpenGL NV_fragment_program Profile (fp30)
- OpenGL NV_vertex_program 1.0 Profile (vp20)
- OpenGL NV_texture_shader and NV_register_combiners Profile (fp20)
- DirectX Vertex Shader 2.x Profiles (vs_2_*)
- DirectX Pixel Shader 2.x Profiles (ps_2_*)
- DirectX Vertex Shader 1.1 Profile (vs_1_1)
- DirectX Pixel Shader 1.x Profiles (ps_1_*)

In each case, the capabilities are a subset of the full capabilities described by the Cg language specification in "Cg Language Specification" on page 221.

OpenGL ARB Vertex Program Profile (arbvp1)

Overview
359. vp state matrix invtrans texture 0 state matrix invtrans palette 0 state matrix invtrans program 0 Accessible state semantics of type float4 are listed in Table 14 Table 14 float4 state Semantics state material ambient state material diffuse state material specular state material emission state material shininess state material front ambient state material front diffuse state material front specular state material front emission state material front shininess state material back ambient state material back diffuse state material back specular state material back emission 808 00504 0000 006 257 NVIDIA Cg Language Toolkit Table 14 float4 state Semantics continued state material back shininess state light 0 ambient state light 0 diffuse state light 0 specular state light 0 position state light 0 attenuation state light 0 spot direction state light 0 half state lightmodel ambient state lightmodel scenecolor state lightmodel front scenecolor state lightmodel back scenecolor state lightprod 0 ambient state lightprod 0 diffuse state lightprod 0 specular state lightprod 0 front ambient state lightprod 0 front diffuse state lightprod 0 front specular state lightprod 0 back ambient state lightprod 0 back diffuse state lightprod 0 back specular state texgen 0 eye s state texgen 0 eye t state texgen 0 eye r
360. with no profile overload. This search process allows generic versions of a function to be defined that can be overridden as needed for particular hardware.

Syntax for Parameters in Function Definitions

Functions are declared in a manner similar to C, but the parameters in function definitions may include a binding semantic (see "Binding Semantics" on page 242) and a default value. Each parameter in a function definition takes the following form:

    [uniform] <type> identifier [: <binding_semantic>] [= <default>]

where
- <type> may include the qualifiers in, out, inout, and const, as discussed in "Type Qualifiers" on page 233.
- <default> is an expression that resolves to a constant at compile time.

Default values are only permitted for uniform parameters, and for in parameters to functions that are not top-level.

Function Calls

A function call returns an rvalue. Therefore, if a function returns an array, the array may be read but not written. For example, the following is allowed:

    y = myfunc(x)[2];

But this is not:

    myfunc(x)[2] = y;

For multiple function calls within an expression, the calls can occur in any order; it is undefined.

Method Calls

Structures may have methods declared and defined in their structure definitions. For example:

    struct Foo {
        float value;
        float valueTimesTwo() { return 2 * value; }
    };
361. wst = float2(dot(intermediate_coord.xyz, prevlookup.xyz), dot(str, prevlookup.xyz));
    return tex2D/RECT(tex, newst);

where str are texture coordinates associated with sampler tex, prevlookup is the result of a previous texture operation, and intermediate_coord are texture coordinates associated with the previous texture unit. This function can be used to generate the dot_product_2d or dot_product_rectangle NV_texture_shader instruction combinations.

Table 38 fp20 Auxiliary Texture Functions (continued)

Texture Function:
    tex3D_dp3x3(sampler3D tex, float3 str, float4 intermediate_coord1, float4 intermediate_coord2, float4 prevlookup)
    texCUBE_dp3x3(samplerCUBE tex, float3 str, float4 intermediate_coord1, float4 intermediate_coord2, float4 prevlookup)

Description: Performs the following:

    float3 newst = float3(dot(intermediate_coord1.xyz, prevlookup.xyz),
                          dot(intermediate_coord2.xyz, prevlookup.xyz),
                          dot(str, prevlookup.xyz));
    return tex3D/CUBE(tex, newst);

where str are texture coordinates associated with sampler tex, prevlookup is the result of a previous texture operation, intermediate_coord1 are texture coordinates associated with the (n-2)th texture unit, and intermediate_coord2 are texture coordinates associated with the (n-1)th texture unit. This function can be used to generate the dot_product_3d or dot_product_cube_map NV_texture
362. x Shader Source Code for Melting Paint define inputs from application struct app2vert float4 Position TEOSTITON float4 Normal NORMAL 808 00504 0000 006 161 NVIDIA Cg Language Toolkit 1 1 H St LUCE A Ime dram H vert2f ve Wy Hi Ou 14 Ou TE oat4 ColorO LODOLOBOU oat4 TexCoord0 TEXCOORDD vert2frag oat4 HPosition T ROSTETON oat3 OPosition EAS O ORINA oat3 EPosition ECO ORDES oat3 Normal TEXCOORD1 oat3 TexCoord0 LOU EXCODEIDUS oat4 ColorO COLO loat3 LightPos TEXCOORD4 loat3 ViewerPos TEXCOORD5 rag main app2vert In uniform float4x4 ModelViewProj uniform float4x4 ModelView uniform float4x4 ModelViewl uniform float4 ViewerPos uniform float4 LightPos ae ere teo tout Vertex positions In clip space t HPosition mul ModelViewProj In Position In object space E OROSite tom Im POSE Oa a SUE In eye space t EPosition mul ModelView In Position xyz t Normal normalize In Normal xyz Copy the texture coordinates t TexCoord0 In TexCoord0 xyz Generate a white color t Color0 LightPos t LightPos mul ModelViewI LightPos xyz t ViewerPos mul ModelViewI float4 0 0 0 1 xyz cuca Ova p 162 808 00504 0000 006 NVIDIA Advanced Profile Sample Shaders Pixel Shader Source Code for Melting Paint struct vert2frag
363. x program indexing does not permit it. Each element of the array takes a single 4-float program parameter register. For example, float arr[10], float2 arr[10], float3 arr[10], and float4 arr[10] all consume ten program parameter registers.

It is more efficient to access an array of vectors than an array of matrices. Accessing a matrix requires a floor calculation, followed by a multiply by a constant to compute the register index. Because vectors (and scalars) take one register, neither the floor nor the multiply is needed. It is faster to do matrix skinning using arrays of vectors with a premultiplied index than using arrays of matrices.

Constants

Literal constants can be used with this profile, but it is not possible to store them in the program itself. Instead, the compiler will issue, as comments, a list of program parameter registers and the constants that need to be loaded into them. The Cg run-time system will handle loading the constants, as directed by the compiler.

Note: If the Cg run-time system is not used, it is the responsibility of the programmer to make sure that the constants are loaded properly.

Bindings

Binding Semantics for Uniform Data

The valid binding semantics for uniform parameters in the vs_1_1 profile are summarized in Table 45.

Table 45 vs_1_1 Uniform Input Binding Semantics

Binding Semantics Name             Corresponding Data
register(c0)
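The register accounting described above can be illustrated with a short C sketch. It is a simplified model, not compiler output: it assumes that each matrix occupies rows_per_matrix consecutive registers starting at a known base, and the function names vector_array_registers and matrix_register_index are hypothetical. The cast to int plays the role of the floor for a non-negative index, and the multiplication is the "multiply by a constant" that the text mentions.

```c
#include <assert.h>

/* An array of scalars or vectors uses one 4-float register per element. */
int vector_array_registers(int element_count) {
    assert(element_count >= 0);
    return element_count;
}

/*
 * Register holding row 0 of a matrix array element.
 * Assumption for this sketch: matrices are laid out back to back,
 * each taking rows_per_matrix registers.
 */
int matrix_register_index(int base_register, float index, int rows_per_matrix) {
    assert(index >= 0.0f);
    int whole = (int)index;                            /* floor for non-negative values */
    return base_register + whole * rows_per_matrix;    /* multiply by a constant */
}
```

Under these assumptions, element 2.7 of an array of four-row matrices based at register 10 starts at register 18, whereas element i of a vector array needs neither step. This is the extra work behind the advice to use arrays of vectors with a premultiplied index for skinning.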
364. x value, as long as the application provides this value with each vertex.

Cg provides a flexible mechanism for specifying these per-vertex inputs in the form of a set of predefined names. Each program input must be bound to a name from this set. In the following structure, the vertex program definition binds its parameters to the predefined names POSITION, NORMAL, TANGENT, and TEXCOORD3. The application must provide the vertex array data associated with these predefined names.

    struct myinputs {
        float3 myPosition       : POSITION;
        float3 myNormal         : NORMAL;
        float3 myTangent        : TANGENT;
        float  refractive_index : TEXCOORD3;
    };

    outdata foo(myinputs indata) {
        /* ... */
        // Within the program, the parameters are referred to as
        // "indata.myPosition", "indata.myNormal", and so on.
        /* ... */
    }

We refer to the predefined names as binding semantics. The following set of binding semantics is supported in all Cg vertex program profiles. Some Cg profiles support additional binding semantics.

    POSITION
    BLENDWEIGHT
    NORMAL
    TANGENT
    BINORMAL
    PSIZE
    BLENDINDICES
    TEXCOORD0-TEXCOORD7

The binding semantic POSITION0 is equivalent to the binding semantic POSITION; likewise, the other binding semantics have similar equivalents.

In the OpenGL Cg profiles, binding semantics implicitly specify the mapping of varying inputs to particular hardware registers. However, in DirectX
365. xwese 9 sealer outputs main inputs IN uniform float4x4 ModelViewProj uniform float4x4 ModelView uniform float4x4 ModelViewIT uniform float theta outputs OUT OUT hPosition mul ModelViewProj IN Position convert the position and normal into appropriate spaces float3 eyeToVert mul ModelView IN Position xyz eyeToVert normalize eyeToVert float3 normal mul ModelViewIT IN Normal xyz normal normalize normal OUT refractVec xyz refract eyeToVert normal theta 206 808 00504 0000 006 NVIDIA Basic Profile Sample Shaders DIU acte c WIR OUT reflectVec xyz reflect eyeToVert normal OUT reflectVec w 1 calculate the fresnel reflection OUT fresnelTerm fast fresnel eyeToVert normal Ergat o 3 0 1 0 9 0 Pp return OUT Pixel Shader Source Code for Refraction ellos mata aum sEllowHES eer ACE Wee TEXCOORDO iin Tlosis retlecivee e WE YMCOORIDI y in float3 fresnelTerm COLORO uniform samplerCUBE environmentMaps 2 uniform float enableRefraction uniform float enableFresnel COLOR float3 refractColor texCUBE environmentMaps 0 refractVec rgb float3 reflectColor texCUBE environmentMaps 1 reflectVec rgb float3 reflectRefract lerp refractColor reflectColor fresnelTerm float3 finalColor enableRefraction enableFresnel reflectRefract refractColor enableFresnel reflectColor fresnelTerm
366. xyz); float w = dot(texCoord<n+1>, t.xyz); depth = z / w; Auxiliary Texture Functions: Because the capabilities of the texture shader instructions are limited in NV_texture_shader, a set of auxiliary functions is provided in these profiles that express the functionality of the more complex texture shader instructions. These functions are provided merely as a convenience for writing fp20 Cg programs. The same result can be achieved by writing the expanded form of each function directly. Using the expanded form has the additional advantage of being supported on other profiles. These functions are summarized in Table 38. Table 38: fp20 Auxiliary Texture Functions. Texture function: offsettex2D(uniform sampler2D tex, float2 st, float4 prevlookup, uniform float4 m); offsettexRECT(uniform samplerRECT tex, float2 st, float4 prevlookup, uniform float4 m). Description: Performs the following: float2 newst = st + m.xy * prevlookup.xx + m.zw * prevlookup.yy; return tex2D/RECT(tex, newst); where st are the texture coordinates associated with sampler tex, prevlookup is the result of a previous texture operation, and m is the offset texture matrix. This function can be used to generate the offset_2d or offset_rectangle NV_texture_shader instructions. Texture function: offsettex2DScaleBias(uniform sampler2D tex, float2 st, float4 prevlookup, uniform float4 m, unifo
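The coordinate arithmetic in the expanded form of the offset lookup can be sketched in plain C. This is an illustration only, assuming the expansion is `newst = st + m.xy * prevlookup.xx + m.zw * prevlookup.yy` with component-wise multiplication; the type and function names (`float2`, `float4`, `offset_coords`) are stand-ins defined here, and the final texture fetch is left out because it needs a GPU sampler.

```c
/* Minimal stand-ins for the Cg vector types used by the lookup. */
typedef struct { float x, y; } float2;
typedef struct { float x, y, z, w; } float4;

/* Computes the perturbed texture coordinates:
     newst = st + m.xy * prevlookup.xx + m.zw * prevlookup.yy
   m holds a 2x2 offset matrix packed as (x, y, z, w); only the first two
   components of the previous lookup result are used. */
static float2 offset_coords(float2 st, float4 prevlookup, float4 m) {
    float2 newst;
    newst.x = st.x + m.x * prevlookup.x + m.z * prevlookup.y;
    newst.y = st.y + m.y * prevlookup.x + m.w * prevlookup.y;
    return newst;
}
```

With `m` set to `(1, 0, 0, 1)` the previous lookup is added to `st` unchanged, which is a convenient sanity check before trying a real offset matrix.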
367. ying contexts. Context Creation and Destruction: Programs can only be created as part of a context that acts as a program container. A context is created by calling cgCreateContext(): CGcontext cgCreateContext(void); A context is destroyed by cgDestroyContext(): void cgDestroyContext(CGcontext context); cgDestroyContext() deletes all data associated with the context, including all programs it contains. cgDestroyContext() should be called before destroying any associated OpenGL context or Direct3D device. Context Query: To check whether a context handle references a valid context, use cgIsContext(): CGbool cgIsContext(CGcontext context); Core Cg Program: There are Cg functions for creating, destroying, iterating over, and querying programs. Program Creation and Destruction: A program is created by calling either cgCreateProgram(): CGprogram cgCreateProgram(CGcontext context, CGenum programType, const char* program, CGprofile profile, const char* entry, const char** args); or cgCreateProgramFromFile(): CGprogram cgCreateProgramFromFile(CGcontext context, CGenum programType, const char* program, CGprofile profile, const char* entry, const char** args); These functions create a program object, add it to the specified context, and compile the associated source code. For both of them, context is a valid context handle.
368. ype, int* numberOfValuesReturned); This entry point retrieves the parameter's default value if valueType is equal to CG_DEFAULT. The components of the value are returned in row-major order as a pointer to an array of double elements. The number of components available in the array is returned in numberOfValuesReturned. cgGetParameterValues() can also be used to retrieve a parameter's constant values, but this functionality is rarely used; see the corresponding manual page for more details. Shared Parameters: The core Cg runtime supports the creation of instances of any type of concrete parameter (for example, built-in types and user-defined structures) within a Cg context. A parameter instance may be connected to any number of compatible parameters, including any program or effect parameter within the context. When an instance is connected to another parameter, the second parameter inherits its values from the instance. Furthermore, if the variability of the second parameter has not been explicitly set by a call to cgSetParameterVariability(), its variability is also inherited from the instance. The ability to create and easily manage shared, context-global parameters provides a powerful means for creating parameter trees and for sharing data and user-defined objects between multiple Cg programs or effects. Shared Parameter Creation: Shared parameters
369. zles to Make the Most of Vectorization: The GPU can swizzle the values in vectors with no performance penalty (recall that a swizzle can be used to rearrange the elements of a vector). Given a vector [...], swizzles construct new vectors [...], and so forth. By swizzling your data carefully, you can still take advantage of vectorization even when you don't want to use the same component of both vectors on both sides of your computation. For example, consider the computation of the cross product. Given two three-dimensional vectors, the cross product returns a new vector that is perpendicular to the given vectors. It is computed by: float3 a, b; float3 c = float3(a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x); Here we've again got a lot of arithmetic operations, each using a single pair of float values. Some cleverness lets us turn this into a vectorized operation. Below is the implementation of the cross function from the Cg Standard Library, requiring just two vector multiply operations and one vector subtraction operation: float3 cross(float3 a, float3 b) { return a.yzx * b.zxy - a.zxy * b.yzx; } Confirm for yourself that this computes the same value as the first section of code for the cross product; note that it exposes much more vectorized computation for the GPU to process efficiently.
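The equivalence of the two cross-product forms can be confirmed with a small CPU-side emulation. The sketch below is plain C rather than Cg, so swizzles are written as helper functions; `float3`, `swz_yzx`, `swz_zxy`, `mul3`, `sub3`, `cross_scalar`, and `cross_swizzled` are hypothetical names introduced only for this illustration. It assumes the swizzled form is `a.yzx * b.zxy - a.zxy * b.yzx` with component-wise multiplication.

```c
/* Three-component vector standing in for Cg's float3. */
typedef struct { float x, y, z; } float3;

/* Swizzle helpers: reorder components the way v.yzx and v.zxy would. */
static float3 swz_yzx(float3 v) { float3 r = { v.y, v.z, v.x }; return r; }
static float3 swz_zxy(float3 v) { float3 r = { v.z, v.x, v.y }; return r; }

/* Component-wise multiply and subtract; each models one vector operation. */
static float3 mul3(float3 a, float3 b) {
    float3 r = { a.x * b.x, a.y * b.y, a.z * b.z };
    return r;
}
static float3 sub3(float3 a, float3 b) {
    float3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

/* Textbook form: one scalar expression per output component. */
static float3 cross_scalar(float3 a, float3 b) {
    float3 r = { a.y * b.z - a.z * b.y,
                 a.z * b.x - a.x * b.z,
                 a.x * b.y - a.y * b.x };
    return r;
}

/* Swizzled form: two vector multiplies and one vector subtraction. */
static float3 cross_swizzled(float3 a, float3 b) {
    return sub3(mul3(swz_yzx(a), swz_zxy(b)),
                mul3(swz_zxy(a), swz_yzx(b)));
}
```

Expanding the swizzled form component by component gives `(a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x)`, which is exactly the scalar form. On a CPU the helpers cost real instructions, so the performance benefit described in the text applies only to hardware where swizzles are free.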
