
Cg Toolkit User's Manual

1. // Enable the profiles
    cgGLEnableProfile(vertexProfile);
    cgGLEnableProfile(fragmentProfile);
    // Bind the programs
    cgGLBindProgram(vertexProgram);
    cgGLBindProgram(fragmentProgram);
    // Enable texture
    cgGLEnableTextureParameter(baseTexture);
    // Draw scene
    // Disable texture
    cgGLDisableTextureParameter(baseTexture);
    // Disable the profiles
    cgGLDisableProfile(vertexProfile);
    cgGLDisableProfile(fragmentProfile);
    // Disable the varying parameters
    cgGLDisableClientState(position);
    cgGLDisableClientState(color);
    cgGLDisableClientState(texCoord);
    }
    // Called before application shuts down
    void CgShutdown()
    {
      // This frees any runtime resource.
      cgDestroyContext(context);
    }
Page 56, 808-00504-0000-004, NVIDIA
OpenGL Error Reporting
Here is the list of the CGerror errors specific to the OpenGL Cg runtime:
- CG_PROGRAM_LOAD_ERROR: Returned when the program could not be loaded.
- CG_PROGRAM_BIND_ERROR: Returned when the program could not be bound.
- CG_PROGRAM_NOT_LOADED_ERROR: Returned when the program must be loaded before the operation may be used.
- CG_UNSUPPORTED_GL_EXTENSION_ERROR: Returned when an unsupported OpenGL extension is required to perform the operation.
Any OpenGL Cg runtime function can generate an OpenGL error in addition to the Cg-specific error. These errors are checked in Cg as in any OpenGL application, by using glGetError().
Direct3D Cg Runtime
The Direct3D Cg runtime is…
2. To load a program in Direct3D 8, use cgD3D8LoadProgram():
    HRESULT cgD3D8LoadProgram(CGprogram program,
                              BOOL parameterShadowingEnabled,
                              DWORD assembleFlags,
                              DWORD vertexShaderUsage,
                              const DWORD* declaration);
This function assembles the result of the compilation of program using D3DXAssembleShader(), with assembleFlags as the D3DXASM flags. Depending on the program's profile, it then either uses IDirect3DDevice8::CreateVertexShader() to create a Direct3D vertex shader, with declaration as the vertex declaration and vertexShaderUsage as the usage control, or uses IDirect3DDevice8::CreatePixelShader() to create a Direct3D pixel shader. Set parameterShadowingEnabled to TRUE to enable parameter shadowing for the program. This behavior can be changed after the program is created by calling cgD3D8EnableParameterShadowing().
Here is a typical use of the function:
    HRESULT hresult = cgD3D8LoadProgram(vertexProgram, TRUE, D3DXASM_DEBUG,
                                        D3DUSAGE_SOFTWAREPROCESSING, declaration);
    HRESULT hresult = cgD3D8LoadProgram(fragmentProgram, TRUE, 0, 0, 0);
If you want to apply the same vertex program to several sets of geometric data, each having a different layout, you need to load the program with a different vertex declaration for each layout in Direct3D 8. To do so, make a duplicate of the program using cgCopyProgram() for each of these declarations. Here is a code sam…
3. …D3DVSD_END()
    };
If it is possible to do so, the functions cgD3D9ResourceToDeclUsage() and cgD3D8ResourceToInputRegister() convert a CGresource enumerated type into a Direct3D vertex shader input register:
    BYTE cgD3D9ResourceToDeclUsage(CGresource resource);
    DWORD cgD3D8ResourceToInputRegister(CGresource resource);
If the resource is not a vertex shader input resource, the call to cgD3D9ResourceToDeclUsage() returns CGD3D9_INVALID_REG and the call to cgD3D8ResourceToInputRegister() returns CGD3D8_INVALID_REG.
To write the vertex declarations described above based on the program parameters, which eliminates any reference to a semantic, use cgD3D9ResourceToDeclUsage() or cgD3D8ResourceToInputRegister():
    CGparameter position = cgGetNamedParameter(program, "position");
    CGparameter color = cgGetNamedParameter(program, "color");
    CGparameter texCoord = cgGetNamedParameter(program, "texCoord");
    const D3DVERTEXELEMENT9 declaration[] = {
      { 0, 0 * sizeof(float), D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT,
        cgD3D9ResourceToDeclUsage(cgGetParameterResource(position)),
        cgGetParameterResourceIndex(position) },
      { 0, 3 * sizeof(float), D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT,
        cgD3D9ResourceToDeclUsag…
4. … // TANGENT SPACE …
    … mul(Model…) … mul(ModelView, …) …
    … saturate(sqrt(dot(viewP.xyz, …))) …
    … normalize(objL) … normalize(objV) …
    … float3x3 ModelViewIT … mul((float3x3)ModelView, vert.ONormal) … vert.OPosition …
    … viewP.xyz … EyePosition … vert.OPosition.xyz … LightVector … objL … objH …
    // Generate REFLECTION vector for per-vertex …
    float3 reflection = reflect(viewV, viewN);
    // Generate FRESNEL term
    float ndv = saturate(dot(viewN, viewV));
    float FresnelApprox = (pow(1 - ndv, Fresnel.z) * Fresnel.y) + Fresnel.x;
    // Fill OUTPUT parameters
    … vert.uv … // TEXCOORD0.xy
    … tanL …  // Tangent-space LIGHT
    O.halfangle = float4(tanH.x, tanH.y, tanH.z, … exp(… viewP.w)); // Tangent-space HALF ANGLE
    O.reflection = reflection;                 // View-space REFLECTION
    O.view = float4(tanV.x, tanV.y, …);        // Tangent-space VIEW, distance attenuation
    // VIEW-TANGENT
    O.tangent = normalize(View…);   // … column 0
    O.binormal = normalize(View…);  // … column 1
    O.normal = normalize(View…);    // … column 2
    O.fresn = FresnelApprox;
    return O;
    }
Pixel Shader Source Code for Car Paint 9
This sh…
5.  v2f main(a2v IN,
             uniform float4x4 WorldViewProj,
             uniform float4 LightVector,  // in object space
             uniform float4 EyePosition)  // in object space
    {
      v2f OUT;
      // pass texture coordinates for fetching the diffuse map
      OUT.TexCoord0.xy = IN.TexCoord.xy;
      // pass texture coordinates for fetching the normal map
      OUT.TexCoord1.xy = IN.TexCoord.xy;
      // compute the 3x3 transform from object space to tangent space
      float3x3 objToTangentSpace;
      objToTangentSpace[0] = IN.T;
      objToTangentSpace[1] = IN.B;
      objToTangentSpace[2] = IN.N;
      // transform normal from object space to tangent space
      OUT.Normal.xyz = 0.5 * mul(objToTangentSpace, IN.Normal) + 0.5;
      // transform light vector from object space to tangent space
      float3 lightVectorInTangentSpace = mul(objToTangentSpace, LightVector.xyz);
      OUT.LightVector.xyz = lightVectorInTangentSpace;
      OUT.LightVectorUnsigned.xyz = 0.5 * lightVectorInTangentSpace + 0.5;
      // compute view vector
      float3 viewVector = normalize(EyePosition.xyz - IN.Position.xyz);
      // compute half-angle vector
      float3 halfAngleVector = normalize(LightVector.xyz + viewVector);
      // transform half-angle vector from object space to tangent space
      OUT.HalfAngleVector.xyz = mul(objToTangentSpace, halfAngleVector);
      // transform position to projection space
      OUT.Position = mul(WorldViewProj, IN.Position);
      return OUT;
    }
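The vertex shader in excerpt 5 builds a 3x3 matrix from the T, B, and N rows and range-compresses signed vectors with 0.5 * v + 0.5 so they fit a COLOR output. A minimal C sketch of that arithmetic follows; the float3/float3x3 structs and helper names are hypothetical stand-ins for the Cg types, and mul(M, v) is treated as "dot each row with v".

```c
/* Hypothetical stand-ins for Cg's float3 and float3x3 types. */
typedef struct { float x, y, z; } float3;
typedef struct { float3 row[3]; } float3x3;

static float dot3(float3 a, float3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* mul(M, v) with v as a column vector: each output component is the
   dot product of one matrix row with v. */
static float3 mul_mat_vec(float3x3 m, float3 v) {
    float3 r = { dot3(m.row[0], v), dot3(m.row[1], v), dot3(m.row[2], v) };
    return r;
}

/* Rows T, B, N: multiplying by this matrix expresses an object-space
   vector in tangent-space coordinates. */
static float3x3 object_to_tangent(float3 t, float3 b, float3 n) {
    float3x3 m = { { t, b, n } };
    return m;
}

/* Maps each component from [-1, 1] to [0, 1], as in 0.5 * v + 0.5. */
static float3 pack_unsigned(float3 v) {
    float3 r = { 0.5f * v.x + 0.5f, 0.5f * v.y + 0.5f, 0.5f * v.z + 0.5f };
    return r;
}
```

The packed value is what the pixel shader later expands back with 2 * (x - 0.5).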
6. Examples
The following examples illustrate how a developer can use Cg to achieve DirectX pixel shader 1_X functionality.
Example 1
    struct VertexOut
    {
      float4 color     : COLOR0;
      float4 texCoord0 : TEXCOORD0;
      float4 texCoord1 : TEXCOORD1;
    };
    float4 main(VertexOut IN,
                uniform sampler2D diffuseMap,
                uniform sampler2D normalMap) : COLOR
    {
      float4 diffuseTexColor = tex2D(diffuseMap, IN.texCoord0.xy);
      float4 normal = 2 * (tex2D(normalMap, IN.texCoord1.xy) - 0.5);
      float3 light_vector = 2 * (IN.color.rgb - 0.5);
      float4 dot_result = saturate(dot(light_vector, normal.xyz).xxxx);
      return dot_result * diffuseTexColor;
    }
Example 2
    struct VertexOut
    {
      float4 texCoord0 : TEXCOORD0;
      float4 texCoord1 : TEXCOORD1;
      float4 texCoord2 : TEXCOORD2;
      float4 texCoord3 : TEXCOORD3;
    };
    float4 main(VertexOut IN,
                uniform sampler2D normalMap,
                uniform sampler2D intensityMap,
                uniform sampler2D colorMap) : COLOR
    {
      float4 normal = 2 * (tex2D(normalMap, IN.texCoord0.xy) - 0.5);
      float2 intensCoord = float2(dot(IN.texCoord1.xyz, normal.xyz),
                                  dot(IN.texCoord2.xyz, normal.xyz));
      float4 intensity = tex2D(intensityMap, intensCoord);
      float4 color = tex2D(colorMap, IN.texCoord3.xy);
      return color * intensity;
    }
OpenGL NV_vertex_program 1.0 Profile (vp20)
The vp20 Vertex Program profile is used to compile Cg…
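Example 1 expands both the normal-map texel and the interpolated color with 2 * (x - 0.5), takes a saturated dot product, and modulates the diffuse texel. The C below is a sketch of that per-pixel arithmetic with texture fetches replaced by plain float4 values; the struct and function names are hypothetical.

```c
/* Hypothetical stand-in for a Cg float4 color. */
typedef struct { float r, g, b, a; } float4;

static float saturatef(float x) {
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* Expands a [0, 1] texel component to [-1, 1]: 2 * (x - 0.5). */
static float expand(float x) {
    return 2.0f * (x - 0.5f);
}

/* Mirrors Example 1: saturate(dot(light, normal)) * diffuseTexColor, where
   the light vector comes from the vertex color and the normal from the map. */
static float4 dot3_diffuse(float4 diffuseTex, float4 normalTex, float4 color) {
    float nx = expand(normalTex.r), ny = expand(normalTex.g), nz = expand(normalTex.b);
    float lx = expand(color.r), ly = expand(color.g), lz = expand(color.b);
    float d = saturatef(lx * nx + ly * ny + lz * nz);
    float4 out = { d * diffuseTex.r, d * diffuseTex.g, d * diffuseTex.b, d * diffuseTex.a };
    return out;
}
```

Because the dot product is replicated to all four channels (.xxxx), the alpha of the diffuse texel is scaled as well.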
7.  OUT.refractVec.xyz = refract(eyeToVert, normal, theta);
      OUT.refractVec.w = 1;
      OUT.reflectVec.xyz = reflect(eyeToVert, normal);
      OUT.reflectVec.w = 1;
      // calculate the fresnel reflection
      OUT.fresnelTerm = … fast_fresnel(eyeToVert, normal, …);
      return OUT;
    }
Pixel Shader Source Code for Refraction
    float4 main(in float3 refractVec  : TEXCOORD0,
                in float3 reflectVec  : TEXCOORD1,
                in float3 fresnelTerm : COLOR0,
                uniform samplerCUBE environmentMaps[2],
                uniform float enableRefraction,
                uniform float enableFresnel) : COLOR
    {
      float3 refractColor = texCUBE(environmentMaps[0], refractVec).rgb;
      float3 reflectColor = texCUBE(environmentMaps[1], reflectVec).rgb;
      float3 reflectRefract = lerp(refractColor, reflectColor, fresnelTerm);
      float3 finalColor = enableRefraction ? (enableFresnel ? reflectRefract : refractColor)
                                           : … enableFresnel … reflectColor … fresnelTerm …;
      return float4(finalColor, 1.0);
    }
Shadow Mapping
Description
This effect shows generating texture coordinates for shadow mapping, along with using the shadow map in the lighting equation per pixel (Figure 19).
Figure 19. Example of Shadow Mapping
Vertex Shader Source Code for Shadow Mapping
    struct appdata {
      …
      float3 Normal : NORMAL;
    …
    struct vpc…
8. offsettex2D(uniform sampler2D tex, float2 st, float4 prevlookup, uniform float4 m)
Performs the following:
    float2 newst = st + m.xy * prevlookup.xx + m.zw * prevlookup.yy;
    return tex2D(tex, newst);
where st are the texture coordinates associated with sampler tex, prevlookup is the result of a previous texture operation, and m is the 2D bump environment-mapping matrix. This function can generate the texbem instruction in all ps_1_X profiles.
offsettex2DScaleBias(uniform sampler2D tex, float2 st, float4 prevlookup, uniform float4 m, uniform float scale, uniform float bias)
Performs the following:
    float2 newst = st + m.xy * prevlookup.xx + m.zw * prevlookup.yy;
    float4 result = tex2D(tex, newst);
    return result * saturate(prevlookup.z * scale + bias);
where st are the texture coordinates associated with sampler tex, prevlookup is the result of a previous texture operation, m is the 2D bump environment-mapping matrix, scale is the 2D bump environment-mapping scale factor, and bias is the 2D bump environment-mapping offset. This function can generate the texbeml instruction in all ps_1_X profiles.
tex1D_dp3(sampler1D tex, float3 str, float4 prevlookup)
Performs the following:
    return tex1D(tex, dot(str, prevlookup.xyz));
where str are texture coordi…
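The coordinate math of offsettex2D and the extra factor of offsettex2DScaleBias can be checked on the CPU. In this sketch (hypothetical struct and function names) the texture fetch itself is omitted; only the perturbed coordinates and the saturate(prevlookup.z * scale + bias) factor are computed, written out component by component.

```c
typedef struct { float x, y; } float2;
typedef struct { float x, y, z, w; } float4;

/* newst = st + m.xy * prevlookup.xx + m.zw * prevlookup.yy */
static float2 offset_coords(float2 st, float4 prevlookup, float4 m) {
    float2 r;
    r.x = st.x + m.x * prevlookup.x + m.z * prevlookup.y;
    r.y = st.y + m.y * prevlookup.x + m.w * prevlookup.y;
    return r;
}

/* Factor that offsettex2DScaleBias multiplies into the fetched texel:
   saturate(prevlookup.z * scale + bias). */
static float luminance_factor(float4 prevlookup, float scale, float bias) {
    float f = prevlookup.z * scale + bias;
    return f < 0.0f ? 0.0f : (f > 1.0f ? 1.0f : f);
}
```

With m = (1, 0, 0, 1) the matrix is the identity, so the previous lookup's x and y are added directly to st.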
9.  // use the lit instruction to calculate lighting, automatically clamped
      float4 lighting = lit(diffuse, specular, 32);
      // output final lighting results
      OUT.diffCol = (float4)lighting.y;
      OUT.specCol = (float4)lighting.z;
      return OUT;
    }
Pixel Shader Source Code for Thin Film Effect
    struct v2f {
      …
      float3 diffCol   : COLOR0;
      float3 specCol   : COLOR1;
      float2 filmDepth : TEXCOORD0;
    };
    void main(v2f IN,
              out float4 color : COLOR,
              uniform sampler2D fringeMap,
              uniform sampler2D diffMap)
    {
      // diffuse material color
      float3 diffCol = float3(0.3, 0.3, 0.5);
      // lookup fringe value based on view depth
      float3 fringeCol = (float3)tex2D(fringeMap, IN.filmDepth);
      // modulate specular lighting by fringe color, combine with regular lighting
      color.rgb = fringeCol * IN.specCol + IN.diffCol * diffCol;
      …
Car Paint 9
Description
This car paint shader uses gonioreflectometric paint samples measured by Cornell University. The samples were converted into a 2D texture map, which is indexed using NdotL and NdotH as the (s, t) coordinate pair and which provides the diffuse component of our lighting equation. The specular term is calculated using the Blinn model and also includes a term that simulates the clear coat's metallic flecks. The fleck normal mipmap chain has randomly generated vectors that reside within a positive-Z cone in tangent space. The con…
10. … 2 could generate the following pixel shader instruction, assuming x is in t0, y is in t1, and z is in r0:
    add_d2 r0, t0, t1_bias
Table 34 summarizes how the different DirectX pixel shader 1_X instruction and register modifiers are expressed in Cg programs. For more details on the context in which each modifier is allowed and the ways in which modifiers may be combined, refer to the DirectX pixel shader 1_X documentation.
Table 34. ps_1_X Instruction Set Modifiers
Instruction/Register Modifier | Cg Expression
instr_x2 | 2 * x
instr_x4 | 4 * x
instr_d2 | x / 2
instr_sat | saturate(x), i.e., min(1, max(0, x))
reg_bias | x - 0.5
1 - reg | 1 - x
-reg | -x
reg_bx2 | 2 * (x - 0.5)
Language Constructs and Support
Data Types
In the ps_1_X profiles, operations occur on signed, clamped floating-point values in the range -MaxPixelShaderValue to MaxPixelShaderValue, where MaxPixelShaderValue is determined by the DirectX implementation. These profiles allow all data types to be used, but all operations are carried out in the above range. Refer to the DirectX pixel shader 1_X documentation for more details.
Statements and Operators
The DirectX pixel shader 1_X profiles support all of the Cg language constructs, with the following exceptions:
- Arbitrary swizzles are not supported (though arbitrary write masks are). Only the following swizzles are allowed: x…
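Each row of Table 34 is a simple scalar function, so the modifiers can be written out directly. The C below is a sketch with hypothetical function names, one per table row, useful for checking what a given Cg expression computes before it is folded into a modifier.

```c
/* One function per ps_1_X modifier row in Table 34 (names are hypothetical). */
static float mod_x2(float x)     { return 2.0f * x; }            /* instr_x2  */
static float mod_x4(float x)     { return 4.0f * x; }            /* instr_x4  */
static float mod_d2(float x)     { return x / 2.0f; }            /* instr_d2  */
static float mod_sat(float x)    {                               /* instr_sat */
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}
static float mod_bias(float x)   { return x - 0.5f; }            /* reg_bias  */
static float mod_invert(float x) { return 1.0f - x; }            /* 1 - reg   */
static float mod_negate(float x) { return -x; }                  /* -reg      */
static float mod_bx2(float x)    { return 2.0f * (x - 0.5f); }   /* reg_bx2   */
```

For example, reg_bx2 maps a texel value of 0.75 to 0.5, which is how normal-map texels are expanded to signed vectors.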
11. Binding Semantics for Varying Input/Output Data
Only the binding semantic names need be given for these profiles. The vertex parameter input registers are allocated dynamically. All the semantic names except POSITION can have a number from 0 to 15 after them.
Table 11. vs_2_* Varying Input Binding Semantics
POSITION, PSIZE, BLENDWEIGHT, BLENDINDICES, NORMAL, TEXCOORD, COLOR, TANGENT, TESSFACTOR, BINORMAL
Table 12 summarizes the valid binding semantics for varying output parameters in the vs_2_0 and vs_2_x profiles. These map to output registers in DirectX 9 vertex shaders.
Table 12. vs_2_* Varying Output Binding Semantics
Binding Semantics Name | Corresponding Data
POSITION | Output position (oPos)
PSIZE | Output point size (oPts)
FOG | Output fog value (oFog)
COLOR0 to COLOR1 | Output color values (oD0, oD1)
TEXCOORD0 to TEXCOORD7 | Output texture coordinates (oT0 to oT7)
Options
The vs_2_x profile allows the following profile-specific options:
- DynamicFlowControlDepth=<n>, where n = 0 or 24 (default 24)
- NumTemps=<n>, where 12 <= n <= 32 (default 16)
- Predication (default true)
DirectX Pixel Shader 2.x Profiles (ps_2_*)
The DirectX Pixel Shader 2.0 Profiles are used to compile Cg source code to DirectX…
12.   // … It will be moved along the direction from light to vertex to extrude the shadow volume.
      float away = (float)(ndotl < 0);
      // Move the back-facing shadow volume points
      float4 new_position = … extrusion … away …;
      // Transform position to hclip space
      OUT.HPosition = mul(WorldViewProj, new_position);
      // Set the color to blue for when the shadow volume is rendered in color, for illustrative purposes
      float4 color = float4(0, 0, Factors.x, 0);
      OUT.Color0 = color;
      OUT.TexCoord0.xy = IN.TexCoord0;
      return OUT;
    }
Sine Wave Demo
Description
This effect modifies the vertex positions using a sine function based on the current time. It demonstrates use of the built-in sin() function. It also computes a normal based on the perturbed mesh and uses it to compute a reflection vector to look up in a cube map (Figure 21).
Figure 21. Example of Sine Wave
Vertex Shader Source Code for Sine Wave
    struct appdata {
      …
      float4 TexCoord0 : TEXCOORD0;
    };
    struct vpconn {
      float4 HPOS : POSITION;
      float4 COL0 : COLOR0;
      float4 TEX0 : TEXCOORD0;
    };
    vpconn main(appdata IN,
                uniform float4x4 WorldViewProj,
                uniform float3x4 WorldView,
                uniform float3x3 WorldViewIT,
                uniform float3 WavesX,
                uniform float3 WavesY,
                uniform float3 WavesH,
                uniform float3 Time)
    {
      vpconn OUT;
      float3 a…
13.   …
      // actually constants; could be done in VP or on CPU
      half irisSize = BallData.RADIUS … ;
      half irisScale = 0.3333h … max(0.01h, irisSize);
      …
      half D = … dot(pupilCenter, xAxis);
      half4 planeEquation = half4(xAxis, D);
      // view vector TO surface
      half3 Vn = normalize(IN.OPosition … IN.VPosition);
      half3 Nf = normalize(IN.N);
      half3 Ln = IN.LightVec0.xyz;
      …
      half3 missColor = AmbiColor … baseTex … DiffLight;
      half3 DiffPupil = AmbiColor … saturate(dot(xAxis, Ln));
      half3 halfAng = normalize(Ln … Vn);
      half ndh = abs(dot(Nf, halfAng));
      half spec1 = pow(ndh, GlossData.PHONG);
      half s2 = smoothstep(GlossData.GLOSS1, GlossData.GLOSS2, spec1);
      spec1 = lerp(GlossData.DROP, spec1, s2);
      half3 SpecularLight = SpecColor * spec1;
      half3 hitColor = missColor;
      if (slice > 0.0h) {
        half gradedEta = BallData.ETA;
        gradedEta = 1.0h / gradedEta;
        half3 faceColor = BgColor;
        half3 refVector = refract(Vn, Nf, gradedEta);
        if (dot(refVector, refVector) > 0) {
          // now let's in…
14. mul(v, M): Product of row vector v and matrix M, as shown below:
$$\mathrm{mul}(v, M) = \begin{bmatrix} v_1 & v_2 & v_3 & v_4 \end{bmatrix} \begin{bmatrix} M_{11} & M_{12} & M_{13} & M_{14} \\ M_{21} & M_{22} & M_{23} & M_{24} \\ M_{31} & M_{32} & M_{33} & M_{34} \\ M_{41} & M_{42} & M_{43} & M_{44} \end{bmatrix}$$
If v is a 1×A vector and M is an A×B matrix, returns a 1×B vector.
noise(x): Either a 1-, 2-, or 3-dimensional noise function, depending on the type of its argument. The returned value is between zero and one, and is always the same for a given input value.
pow(x, y): $x^y$.
radians(x): Degree-to-radian conversion.
round(x): Closest integer to x.
rsqrt(x): Reciprocal square root of x; x must be greater than zero.
sign(x): 1 if x > 0; -1 if x < 0; 0 otherwise.
sin(x): Sine of x.
sincos(float x, out s, out c): s is set to the sine of x, and c is set to the cosine of x. If sin(x) and cos(x) are both needed, this function is more efficient than calculating each individually.
sinh(x): Hyperbolic sine of x.
smoothstep(min, max, x): For values of x between min and max, returns a smoothly varying value that ranges from 0 at x = min to 1 at x = max. x is clamped to the range [min, max], and then the interpolation formula is evaluated:
$$-2\left(\frac{x-\min}{\max-\min}\right)^3 + 3\left(\frac{x-\min}{\max-\min}\right)^2$$
step(a, x): 0 if x < a; 1 if x >= a.
sqrt(x): Square root of x; x must be greater than zero.
tan(x): Tangent of…
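The smoothstep and step entries translate directly to C. This sketch clamps x first and then evaluates the cubic from the table; the parameters are named lo and hi (instead of min and max) only to avoid clashing with common macros.

```c
/* Clamps x to [lo, hi]. */
static float clampf(float x, float lo, float hi) {
    return x < lo ? lo : (x > hi ? hi : x);
}

/* smoothstep(min, max, x): clamp, normalize to t in [0, 1], then -2t^3 + 3t^2. */
static float smoothstep(float lo, float hi, float x) {
    float t = (clampf(x, lo, hi) - lo) / (hi - lo);
    return -2.0f * t * t * t + 3.0f * t * t;
}

/* step(a, x): 0 if x < a, 1 if x >= a. */
static float step(float a, float x) {
    return x >= a ? 1.0f : 0.0f;
}
```

At the midpoint the cubic returns exactly 0.5, and its slope is zero at both ends, which is what makes the transition smooth.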
15.   // This frees any core runtime resources.
      // The minimal interface has no dynamic storage to free.
      cgDestroyContext(context);
Direct3D 8 Application
The following C code links the previous vertex and fragment programs to the Direct3D 8 application:
    #include <cg/cg.h>
    #include <cg/cgD3D8.h>
    IDirect3DDevice8* device;      // Initialized somewhere else
    IDirect3DTexture8* texture;    // Initialized somewhere else
    D3DXMATRIX matrix;             // Initialized somewhere else
    D3DXCOLOR constantColor;       // Initialized somewhere else
    CGcontext context;
    CGprogram vertexProgram, fragmentProgram;
    DWORD vertexShader, pixelShader;
    CGparameter baseTexture, someColor, modelViewMatrix;
    // Called at application startup
    void OnStartup()
    {
      // Create context
      context = cgCreateContext();
    }
    // Called whenever the Direct3D device needs to be created
    void OnCreateDevice()
    {
      // Create the vertex shader
      vertexProgram = cgCreateProgramFromFile(context, CG_SOURCE, "VertexProgram.cg",
                                              CG_PROFILE_VS_1_1, "VertexProgram", 0);
      CComPtr<ID3DXBuffer> byteCode;
      const char* progSrc = cgGetProgramString(vertexProgram, CG_COMPILED_PROGRAM);
      // Normally, you also grab the constants and prepend them to your
      // vertex declaration. Not shown here for brevity.
      D3DXAssembleShader(progSrc, strlen(progSrc), 0, 0, &byteCode, 0);
      // If your program uses explicit binding semantic…
16. …fp30 profiles, and will also be supported by all future profiles that have texture-mapping capabilities. All of the functions in Table 3 return a float4 value. Because of the limited pixel programmability of older hardware, the ps_1_X and fp20 profiles use a different set of texture-mapping functions. See "Language Profiles" on page 195 for more information.
Table 3. Texture Map Functions
Function | Description
tex1D(sampler1D tex, float s) | 1D nonprojective
tex1D(sampler1D tex, float s, float dsdx, float dsdy) | 1D nonprojective with derivatives
tex1D(sampler1D tex, float2 sz) | 1D nonprojective depth compare
tex1D(sampler1D tex, float2 sz, float dsdx, float dsdy) | 1D nonprojective depth compare with derivatives
tex1Dproj(sampler1D tex, float2 sq) | 1D projective
tex1Dproj(sampler1D tex, float3 szq) | 1D projective depth compare
tex2D(sampler2D tex, float2 s) | 2D nonprojective
tex2D(sampler2D tex, float2 s, float2 dsdx, float2 dsdy) | 2D nonprojective with derivatives
tex2D(sampler2D tex, float3 sz) | 2D nonprojective depth compare
tex2D(sampler2D tex, float3 sz, float2 dsdx, float2 dsdy) | 2D nonprojective depth compare with derivatives
tex2Dproj(sampler2D tex, float3 sq) | 2D projective
…
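Table 3 lists the projective variants without spelling out what "projective" means. A minimal sketch, assuming the conventional projective divide (the s and t coordinates are divided by the last component q before the 2D lookup), is shown below; proj2d_coords is a hypothetical helper, and the depth-compare variants are not covered.

```c
typedef struct { float x, y; } float2;
typedef struct { float x, y, z; } float3;

/* Coordinates actually sampled by a 2D projective lookup with float3 sq:
   (s / q, t / q), where q is the third component. */
static float2 proj2d_coords(float3 sq) {
    float2 st = { sq.x / sq.z, sq.y / sq.z };
    return st;
}
```

This is the division that lets shadow-map and projected-texture coordinates be interpolated linearly and divided per pixel.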
17. If fewer values are set than the parameter requires, the last value is smeared. The cgGLSetParameter functions may be called for either uniform or varying parameters. When called for a varying parameter, the appropriate immediate-mode OpenGL entry point is called.
The corresponding parameter value retrieval functions are as follows:
    void cgGLGetParameter1f(CGparameter parameter, float* array);
    void cgGLGetParameter1d(CGparameter parameter, double* array);
    void cgGLGetParameter2f(CGparameter parameter, float* array);
    void cgGLGetParameter2d(CGparameter parameter, double* array);
    void cgGLGetParameter3f(CGparameter parameter, float* array);
    void cgGLGetParameter3d(CGparameter parameter, double* array);
    void cgGLGetParameter4f(CGparameter parameter, float* array);
    void cgGLGetParameter4d(CGparameter parameter, double* array);
Setting Uniform Matrix Parameters
The cgGLSetMatrixParameter functions are used to set any matrix:
    void cgGLSetMatrixParameterfr(CGparameter parameter, const float* matrix);
    void cgGLSetMatrixParameterfc(CGparameter parameter, const float* matrix);
    void cgGLSetMatrixParameterdr(CGparameter parameter, const double* matrix);
    void cgGLSetMatrixParameterdc(CGparameter parameter, const double* matrix);
The matrix is passed as an array of floating-point values whose size matches the number of coefficients of the matrix. The r suffix is for functions that assume the matrix is laid out in row o…
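The r and c suffixes imply a transpose between the two array layouts. The sketch below assumes the c suffix means column order (the excerpt is cut off before saying so) and shows the index mapping for a 4x4 matrix; row_to_column_major is a hypothetical helper, not part of the Cg runtime.

```c
/* Converts a row-major 4x4 matrix (element [r][c] at rowMajor[r * 4 + c])
   to column-major order (element [r][c] at colMajor[c * 4 + r]). */
static void row_to_column_major(const float *rowMajor, float *colMajor) {
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            colMajor[c * 4 + r] = rowMajor[r * 4 + c];
}
```

Under that assumption, a row-major array could be passed to the fr variant as-is, or converted this way and passed to the fc variant, with the same result.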
18. …More precisely, it is the number of floating-point values required to store a parameter of type type. This function does not apply to some types, like the sampler types, in which case it returns zero. It is useful because applications can determine how many floating-point values they have to provide to set the value of a given parameter.
Minimal Interface Program Examples
In this section, we provide some code samples that illustrate how and when to use functions from the minimal interface to make Cg programs work with Direct3D. To enhance clarity, the examples do very little error checking, but a production application should check the return values of all Cg functions. The vertex and fragment programs below are referenced in "Direct3D 9 Application" on page 64 and "Direct3D 8 Application" on page 67.
Vertex Program
The following Cg code is assumed to be in a file called VertexProgram.cg:
    void VertexProgram(in float4 position : POSITION,
                       in float4 color : COLOR0,
                       in float4 texCoord : TEXCOORD0,
                       out float4 positionO : POSITION,
                       out float4 colorO : COLOR0,
                       out float4 texCoordO : TEXCOORD0,
                       const uniform float4x4 ModelViewMatrix)
    {
      positionO = mul(position, ModelViewMatrix);
      colorO = color;
      texCoordO = texCoord;
    }
Fragment Program
The following Cg code is assumed to be in a file called FragmentProgram.cg:
    void FragmentProgram(in float4 color : COLOR0,
                         in float4 texCoord : TEXCOORD0,
                         …
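The vertex program computes mul(position, ModelViewMatrix), treating position as a row vector, so output component j is the dot product of position with column j of the matrix. A C sketch of that product (mul_row_vec is a hypothetical name) shows why, in this convention, a translation sits in the last row of the matrix.

```c
/* Row vector times matrix, as in mul(v, M): out[j] = sum over i of v[i] * M[i][j]. */
static void mul_row_vec(const float v[4], float m[4][4], float out[4]) {
    for (int j = 0; j < 4; ++j) {
        out[j] = 0.0f;
        for (int i = 0; i < 4; ++i)
            out[j] += v[i] * m[i][j];
    }
}
```

With the identity in the upper-left 3x3 block and (10, 20, 30, 1) as the last row, the point (1, 2, 3, 1) maps to (11, 22, 33, 1).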
19. Pixel Shader Source Code for Bump Dot3x2
    struct v2f
    {
      float4 Position            : POSITION;   // in projection space
      float4 Normal              : COLOR0;     // in tangent space
      float4 LightVectorUnsigned : COLOR1;     // in tangent space
      float3 TexCoord0           : TEXCOORD0;
      float3 TexCoord1           : TEXCOORD1;
      float4 LightVector         : TEXCOORD2;  // in tangent space
      float4 HalfAngleVector     : TEXCOORD3;  // in tangent space
    };
    float4 main(v2f IN,
                uniform sampler2D DiffuseMap,
                uniform sampler2D NormalMap,
                uniform sampler2D IlluminationMap,
                uniform float Ambient) : COLOR
    {
      // fetch base color
      float4 color = tex2D(DiffuseMap, IN.TexCoord0.xy);
      // fetch bump normal and expand it to [-1,1]
      float4 bumpNormal = 2 * (tex2D(NormalMap, IN.TexCoord1.xy) - 0.5);
      // compute the dot product between the bump normal and the light vector,
      // compute the dot product between the bump normal and the half-angle vector,
      // fetch the illumination map using the result of the two previous dot
      // products as texture coordinates; it returns the diffuse color in the
      // color components and the specular color in the alpha component
      float2 illumCoord = float2(dot(IN.LightVector.xyz, bumpNormal.xyz),
                                 dot(IN.HalfAngleVector.xyz, bumpNormal.xyz));
      float4 illumination = tex2D(IlluminationMap, illumCoord);
      // expand iterated normal to [-1,1]
      float4 normal = 2 * (IN.Normal - 0.5);
      // compute self-shadowing…
20. The arithmetic operator % is the remainder operator, as in C. It may only be applied to two operands of cint or int type. When / or % is used with cint or int operands, C rules for integer / and % apply.
The C operators that combine assignment with arithmetic operations (such as +=) are also supported when the corresponding arithmetic operator is supported by Cg.
Conditional Operator ?:
If the first operand is of type bool, one of the following statements must hold for the second and third operands:
- Both operands have compatible structure types.
- Both operands are scalars with numeric or bool type.
- Both operands are vectors with numeric or bool type, where the two vectors are of the same size, which is less than or equal to four.
If the first operand is a packed vector of bool, then the conditional selection is performed on an elementwise basis. Both the second and third operands must be numeric vectors of the same size as the first operand.
Unlike C, side effects in the expressions in the second and third operands are always executed, regardless of the condition.
Miscellaneous Operators: (typecast) and ,
Cg supports C's typecast and comma operators.
Reserved Words
The following are the reserved words in Cg: asm, bool, catch, column_major, const_cast, default, do, dynamic_cast, enum, false, for, goto, in, int…
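Two points in this excerpt are easy to check in plain C: integer / and % under C99 rules (division truncates toward zero, and (a / b) * b + a % b == a), and elementwise selection with a bool vector. The sketch below uses hypothetical helper names; note that in select4 both source arrays are already evaluated before selection, which matches Cg's "both operands are always executed" behavior rather than C's short-circuiting ?:.

```c
#include <stdbool.h>

/* C99 rules: integer division truncates toward zero,
   and the remainder takes the sign of the dividend. */
static int int_div(int a, int b) { return a / b; }
static int int_rem(int a, int b) { return a % b; }

/* Elementwise ?: with a packed bool vector: out[i] = cond[i] ? a[i] : b[i]. */
static void select4(const bool cond[4], const float a[4], const float b[4], float out[4]) {
    for (int i = 0; i < 4; ++i)
        out[i] = cond[i] ? a[i] : b[i];
}
```

For example, -7 / 2 is -3 and -7 % 2 is -1, so -3 * 2 + (-1) recovers -7.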
21. User's Manual: A Developer's Guide to Programmable Graphics. Release 1.1, February 2003. NVIDIA Cg Language Toolkit.
ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.
Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent or patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all information previously supplied. NVIDIA Corporation products are not authorized for use as critical components in life support devices or systems without express written approval of NVIDIA Corporation.
Trademarks: NVIDIA and the NVIDIA logo are trademarks of NVIDIA Corporation. Microsoft, Windows, the Windows logo, and DirectX are registered trademarks of Microsoft Corporation. OpenGL is a trademark of SGI. Other company and product name…
22. …provided these inputs are not referenced. This allows Cg programs to use the same structure to specify the varying output of a vp20 profile program and the varying input of an fp20 profile program.
Table 49 summarizes the valid binding semantics for varying output parameters in the fp20 profile.
Table 49. fp20 Varying Output Binding Semantics
Binding Semantics Name | Corresponding Data
COLOR, COLOR0, COL, COL0 | Output color (float4)
DEPR, DEPTH | Output depth (float)
The output depth value is special in that it may only be assigned a value of the form:
    float4 t = <texture shader operation>;
    float z = dot(texCoord<n>, t.xyz);
    float w = dot(texCoord<n+1>, t.xyz);
    depth = z / w;
Auxiliary Texture Functions
Because the capabilities of the texture shader instructions are limited in NV_texture_shader, a set of auxiliary functions is provided in these profiles that express the functionality of the more complex texture shader instructions. These functions are merely provided as a convenience for writing fp20 Cg programs. The same result can be achieved by writing the expanded form of each function directly. Using the expanded form has the additional advantage of being supported on other profiles. Table 50 summarizes these functions.
Table 50. fp20 Auxiliary Texture Functions
Texture Function | Description
offsettex2D…
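The only allowed form for the fp20 output depth is a ratio of two dot products against the same texture-shader result t. The C sketch below evaluates that ratio; depth_replace and the float3 struct are hypothetical names, and tcN / tcN1 stand for the xyz of texCoord<n> and texCoord<n+1>.

```c
typedef struct { float x, y, z; } float3;

static float dot3(float3 a, float3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* depth = dot(texCoord<n>, t.xyz) / dot(texCoord<n+1>, t.xyz) */
static float depth_replace(float3 tcN, float3 tcN1, float3 t) {
    return dot3(tcN, t) / dot3(tcN1, t);
}
```

Choosing tcN and tcN1 to select single components of t reduces the formula to a plain component ratio such as t.y / t.z.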
23. void cgGLSetParameter1f(CGparameter parameter, float x);
    void cgGLSetParameter1fv(CGparameter parameter, const float* array);
    void cgGLSetParameter1d(CGparameter parameter, double x);
    void cgGLSetParameter1dv(CGparameter parameter, const double* array);
    void cgGLSetParameter2f(CGparameter parameter, float x, float y);
    void cgGLSetParameter2fv(CGparameter parameter, const float* array);
    void cgGLSetParameter2d(CGparameter parameter, double x, double y);
    void cgGLSetParameter2dv(CGparameter parameter, const double* array);
    void cgGLSetParameter3f(CGparameter parameter, float x, float y, float z);
    void cgGLSetParameter3fv(CGparameter parameter, const float* array);
    void cgGLSetParameter3d(CGparameter parameter, double x, double y, double z);
    void cgGLSetParameter3dv(CGparameter parameter, const double* array);
    void cgGLSetParameter4f(CGparameter parameter, float x, float y, float z, float w);
    void cgGLSetParameter4fv(CGparameter parameter, const float* array);
    void cgGLSetParameter4d(CGparameter parameter, double x, double y, double z, double w);
    void cgGLSetParameter4dv(CGparameter parameter, const double* array);
The digit in the name of those functions indicates how many scalar values are set by the function. The v suffix is for functions that operate on an array of values, as opposed to individual arguments. If more values are set than the parameter requires, the extra values are ignored.
24. …CG_PROFILE_ARBVP1, "main", args);
CG_SOURCE indicates that myVertexProgramString, a string argument, contains Cg source code, not precompiled object code. (The Cg runtime also lets you create a program from precompiled object code, if you want to.) CG_PROFILE_ARBVP1 is the profile the program is to be compiled to. The "main" parameter gives the name of the function to use as the main entry point when the program is executed. Lastly, args is a null-terminated list of null-terminated strings that is passed as an argument to the compiler.
Loading a Program
After you compile a program, you need to pass the resulting object code to the 3D API that you're using. For this, you need to invoke the Cg runtime's API-specific functions.
The Direct3D-specific functions require the Direct3D device structure in order to make the necessary Direct3D calls. The application passes it to the runtime using the following call:
    cgD3D9SetDevice(Device);
You must do this every time a new Direct3D device is created, typically only at the beginning of the application. You can then load a Cg program in this way for the Direct3D 9 Cg runtime:
    cgD3D9LoadProgram(program, CG_FALSE, 0);
or this way for the Direct3D 8 Cg runtime:
    cgD3D8LoadProgram(program, CG_FALSE, 0, 0, vertexDeclaration);
The parameter vertexDeclaration is the Direct3D 8 vertex declaration array that…
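The args parameter is described as a null-terminated list of null-terminated strings. The C sketch below shows that shape and how a consumer walks it until the NULL sentinel. The option strings are hypothetical placeholders, and count_args is an illustrative helper, not a Cg runtime function.

```c
#include <stddef.h>

/* Hypothetical compiler options; each entry is a NUL-terminated C string,
   and the array itself ends with a NULL pointer. */
static const char *example_args[] = {
    "-DHYPOTHETICAL_DEFINE=1",
    "-DANOTHER_DEFINE",
    NULL
};

/* Counts entries up to (not including) the NULL terminator. */
static size_t count_args(const char *const *list) {
    size_t n = 0;
    while (list[n] != NULL)
        ++n;
    return n;
}
```

Passing a null pointer, or an array containing only NULL, means no extra options are given.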
25. Cg Developer's CD
The CD provided with this book contains the entire Cg release, which allows you to get started immediately. The readme.txt file on the CD describes the contents of the release in detail.
You can begin working with Cg immediately by reading "Introduction to the Cg Language" on page 1 and then going through "A Brief Tutorial" on page 89. Once you have a basic understanding of the Cg language, use "Advanced Profile Sample Shaders" on page 97 and "Basic Profile Sample Shaders" on page 133 as a basis for building your own effects.
Release Notes
Release notes for Cg are now contained in a separate document that is part of the Cg distribution. Please report any bugs, issues, and feedback to NVIDIA by e-mailing cgsupport@nvidia.com. We will expeditiously address any reported problems.
Online Updates
Any changes, additions, or corrections are posted at the NVIDIA Cg Web site, http://developer.nvidia.com/Cg. Refer to this site often to keep up with the latest changes and additions to the Cg language. Information on how to report any bugs you may find in the release is also available on this site.
Introduction to the Cg Language
Historically, graphics hardware has been programmed at a very low level. Fixed-function pipelines were configured by setting states such as the texture-combining modes. More recently, programmers con…
26. Scalar types may be implicitly converted to vectors and matrices of compatible type. The scalar is replicated to all elements of the vector or matrix. Scalar types may also be explicitly cast to structure types if the scalar type can be legally cast to every member of the structure.

Vector conversions
Vectors may be converted to scalar types (the first element of the vector is selected). A warning is issued if this is done implicitly. A vector may also be implicitly converted to another vector of the same size and compatible element type. A vector may be converted to a smaller compatible vector, or to a matrix of the same total size, but a warning is issued if an explicit cast is not used.

Matrix conversions
Matrices may be converted to a scalar type (element [0][0] is selected). As with vectors, this causes a warning if it is done implicitly. A matrix may also be converted implicitly to a matrix of the same size and shape and compatible element type. A matrix may be converted to a smaller matrix type (the upper-left submatrix is selected), or to a vector of the same total size, but a warning is issued if an explicit cast is not used.

Structure conversions
A structure may be explicitly cast to the type of its first member, or to another structure type with the same number of members if each member of the struct can be converted to the corresponding member of the new struct. No implicit conversions of struct types are allowed.
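Here is a minimal C sketch of three of these rules, with Cg vectors and matrices modeled as plain float arrays: smearing a scalar into a vector, taking the first element of a vector, and keeping the upper-left block of a matrix. The function names are hypothetical. The row-major layout of the 4x4 matrix is an assumption made for the illustration.

```c
#include <stddef.h>

/* Scalar-to-vector conversion: replicate s into all n elements of out. */
void smear_scalar(float s, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = s;
}

/* Vector-to-scalar conversion: the first element is selected. */
float vector_to_scalar(const float *v)
{
    return v[0];
}

/* Matrix-to-smaller-matrix conversion: copy the upper-left rows x cols
 * block of a row-major 4x4 matrix (16 floats) into out (rows*cols floats). */
void truncate_mat4(const float *in, size_t rows, size_t cols, float *out)
{
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            out[r * cols + c] = in[r * 4 + c];
}
```

For example, truncating a 4x4 matrix whose elements are 0 through 15 to 3x3 keeps elements 0, 1, 2, 4, 5, 6, 8, 9, and 10.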
27. This profile implements data types as follows:
- float data types are implemented as IEEE 32-bit single precision.
- half and double data types are implemented as float.
- int data type is supported using floating-point operations, which add extra instructions for proper truncation for divides, modulos, and casts from floating-point types.
- fixed or sampler data types are not supported, but the profile does provide the minimal partial support that is required for these data types by the core language specification; that is, it is legal to declare variables using these types, as long as no operations are performed on the variables.

Bindings
Binding Semantics for Uniform Data
Table 41 summarizes the valid binding semantics for uniform parameters in the vp20 profile.

Table 41 vp20 Uniform Input Binding Semantics
Binding Semantics Name                              Corresponding Data
register(c0) to register(c95), C0 to C95            Constant registers 0 to 95

The aliases c0 to c95 (lowercase) are also accepted. If used with a variable that requires more than one constant register (for example, a matrix), the semantic specifies the first register that is used.

Binding Semantics for Varying Input/Output Data
Table 42 summarizes the valid binding semantics for varying input parameters in the vp20 profile. One can also use TANGENT and BINORMAL instead of TEXCOORD6 and TEXCOORD7. A second se
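The register-range rule above can be sketched as follows. The sketch assumes that each vp20 constant register holds one four-component row. Under that assumption, a float4x4 bound with C10 would occupy c10 through c13. The helper name is hypothetical.

```c
/* Hypothetical helper: index of the last vp20 constant register used by a
 * uniform whose semantic names its first register (for example C10).
 * Assumes one four-component row per constant register, so a matrix with
 * `rows` rows needs `rows` consecutive registers. Returns -1 if the range
 * does not fit in c0..c95. */
int last_constant_register(int first, int rows)
{
    int last = first + rows - 1;
    if (first < 0 || rows < 1 || last > 95)
        return -1;
    return last;
}
```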
28. although many of the principles are more broadly applicable.

1. Program for Vectorization
The GPU can generally perform four arithmetic operations as quickly as it can perform a single operation. Therefore, if you have two vectors of four floating-point values,
    float4 a, b;
you can add the two vectors together,
    float4 c = a + b;
with no more computational expense than adding together two of their elements:
    float c = a.x + b.x;
This has two implications for efficient programming. First, you should try to write code that naturally maps to these vector operations. If you want to add two float4 variables together, it may be substantially less efficient to write it this way:
    float4 c;
    c.x = a.x + b.x;
    c.y = a.y + b.y;
    c.z = a.z + b.z;
    c.w = a.w + b.w;
than to write it this way:
    float4 c = a + b;
The compiler does its best to find vectorization in your programs, but the more vectorized your original code is, the better starting place it has to work from.
A more specific example comes from a common computation done for tangent-space bump mapping. Given a texture map that encodes a bump map by storing the offset along the tangent direction in x, the offset along the binormal in y, and the offset along the normal in z, the bump-mapped normal is computed by scaling the tangent, binormal, and normal appropriately. In C or C++, the natural way to write this computation is as shown: Tangen
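The following C sketch shows the bump-mapping computation described above: the perturbed normal is tangent * bump.x + binormal * bump.y + normal * bump.z. C has no built-in vector type, so the sketch works component by component. That is exactly the scalar style the text recommends avoiding in Cg. In Cg the same sum can likely be expressed as one matrix product, for example mul(bump, float3x3(tangent, binormal, normal)), which maps naturally onto vector operations. Treat that exact Cg form as an assumption. The vec3 type and function names are illustrative.

```c
typedef struct { float x, y, z; } vec3;

/* acc + v * s, evaluated one component at a time. */
static vec3 madd3(vec3 acc, vec3 v, float s)
{
    vec3 r = { acc.x + v.x * s, acc.y + v.y * s, acc.z + v.z * s };
    return r;
}

/* Perturbed normal = tangent * bump.x + binormal * bump.y + normal * bump.z */
vec3 bumped_normal(vec3 tangent, vec3 binormal, vec3 normal, vec3 bump)
{
    vec3 m = { 0.0f, 0.0f, 0.0f };
    m = madd3(m, tangent, bump.x);   /* offset along the tangent  */
    m = madd3(m, binormal, bump.y);  /* offset along the binormal */
    m = madd3(m, normal, bump.z);    /* offset along the normal   */
    return m;
}
```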
29. float3(a, b, c).zyx yields float3(c, b, a)
float4(a, b, c, d).xxyy yields float4(a, a, b, b)
float2(a, b).yyxx yields float4(b, b, a, a)
float4(a, b, c, d).w yields d
The swizzle operator can also be used to create a vector from a scalar:
a.xxxx yields float4(a, a, a, a)
The precedence of the swizzle operator is the same as that of the array subscripting operator ([]).

Write Mask Operator
The write mask operator is placed on the left-hand side of an assignment statement. It can be used to selectively overwrite the components of a vector. It is illegal to specify a particular component more than once in a write mask, or to specify a write mask when initializing a variable as part of a declaration. The following is an example of a write mask:
    float4 color = float4(...);
    color.a = 1.0;    // Set alpha to 1.0, leaving RGB alone
The write mask operator can be a powerful tool for generating efficient code because it maps well to the capabilities of GPU hardware. The precedence of the write mask operator is the same as that of the swizzle operator.

Conditional Operator
Cg includes C's if/else conditional statement and ?: conditional operator. With the conditional operator, the control variable may be a bool vector. If so, the second and third operands must be similarly sized vectors, and selection is performed on an elementwise basis. Unlike C, any side
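The C sketch below makes the swizzle and write-mask semantics concrete by emulating them on a four-element array. A swizzle reads components in any order and may repeat them. A write mask assigns only the listed components and rejects a component named twice. The helper names are hypothetical. The sketch does not check whether a pattern mixes xyzw and rgba letters.

```c
#include <string.h>

/* Map a component letter (x/y/z/w or r/g/b/a) to an index 0..3, or -1. */
static int component_index(char c)
{
    const char *xyzw = "xyzw";
    const char *rgba = "rgba";
    const char *p;
    if (c == '\0')
        return -1;
    p = strchr(xyzw, c);
    if (p != NULL)
        return (int)(p - xyzw);
    p = strchr(rgba, c);
    return p != NULL ? (int)(p - rgba) : -1;
}

/* Swizzle: out[i] = v[index of pattern[i]]; components may repeat.
 * Returns the result length (1..4), or -1 for an invalid pattern. */
int swizzle(const float v[4], const char *pattern, float out[4])
{
    int n = 0;
    for (; pattern[n] != '\0'; ++n) {
        int k = component_index(pattern[n]);
        if (n >= 4 || k < 0)
            return -1;
        out[n] = v[k];
    }
    return n > 0 ? n : -1;
}

/* Write mask: assign s to each listed component, leaving the rest alone.
 * Returns 0, or -1 if a component is unknown or named more than once. */
int write_mask(float v[4], const char *mask, float s)
{
    int seen[4] = { 0, 0, 0, 0 };
    const char *p;
    for (p = mask; *p != '\0'; ++p) {      /* validate before writing */
        int k = component_index(*p);
        if (k < 0 || seen[k])
            return -1;
        seen[k] = 1;
    }
    for (p = mask; *p != '\0'; ++p)
        v[component_index(*p)] = s;
    return 0;
}
```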
30. const DWORD* declaration);
for the Direct3D 8 Cg runtime. A call to cgD3D9ValidateVertexDeclaration or cgD3D8ValidateVertexDeclaration returns CG_TRUE if the vertex declaration is compatible with the program. A Direct3D 9 declaration is compatible with the program if the declaration has an entry matching every varying input parameter used by the program. A Direct3D 8 declaration is compatible with the program if the declaration has a D3DVSD_REG macro call matching every varying input parameter used by the program.
For the program
    void main(float4 position : POSITION,
              float4 color    : COLOR0,
              float4 texCoord : TEXCOORD0)
    {
    }
the following Direct3D 9 vertex declaration is valid:
    const D3DVERTEXELEMENT9 declaration[] = {
        { 0, 0 * sizeof(float), D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0 },
        { 0, 3 * sizeof(float), D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_COLOR, 0 },
        { 0, 4 * sizeof(float), D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 0 },
        D3DDECL_END()
    };
and the following Direct3D 8 vertex declaration is valid:
    DWORD declaration[] = {
        D3DVSD_STREAM(0),
        D3DVSD_REG(D3DVSDE_POSITION, D3DVSDT_FLOAT3),
        D3DVSD_REG(D3DVSDE_DIFFUSE, D3DVSDT_D3DCOLOR),
        D3DVSD_STREAM(1),
        D3DVSD
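The compatibility rule above says a declaration must supply every varying input the program uses. The C sketch below models that rule with semantics represented as strings. This is a simplification: the runtime's own check takes the actual declaration array. The function name and string representation are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if every program input semantic appears among the declaration's
 * semantics, 0 otherwise. Extra declaration entries are allowed. */
int declaration_is_compatible(const char *const *inputs, size_t n_inputs,
                              const char *const *decl, size_t n_decl)
{
    for (size_t i = 0; i < n_inputs; ++i) {
        int found = 0;
        for (size_t j = 0; j < n_decl && !found; ++j)
            found = strcmp(inputs[i], decl[j]) == 0;
        if (!found)
            return 0;   /* this varying input has no matching entry */
    }
    return 1;
}
```

For the example program, a declaration covering POSITION, COLOR0, and TEXCOORD0 passes this check. One that omits the texture coordinate does not.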
31. eight elements
    float4x4 matrix4;   // Four-by-four matrix, sixteen elements
Note that the multidimensional array float M[4][4] is not type-equivalent to the matrix float4x4 M.
There are no unions or bit fields in Cg at present.

Type Conversions
Type conversions in Cg work largely as they do in C. Type conversions may be explicitly specified using the C (newtype) cast operator. Cg automatically performs type promotion in mixed-type expressions, just as C does. For example, the expression floatvar * halfvar is compiled as floatvar * (float)halfvar.
Cg uses different type-promotion rules than C does in one case: a constant without an explicit type suffix does not cause type promotion. Cg compiles the expression halfvar * 2.0 as halfvar * (half)2.0. In contrast, C would compile it as (double)halfvar * 2.0. Cg uses different rules than C to minimize inadvertent type promotions that cause computations to be performed in slower, higher-precision arithmetic. If the C behavior is desired, the constant should be explicitly typed to force the type promotion: halfvar * 2.0f is compiled as (float)halfvar * 2.0f.
Cg uses the following type suffixes for constants:
- f for float
- h for half
- x for fixed

Structures and Arrays
Cg supports structures the same way C does. Cg adopts the C++ convention of implicitly performing a typedef based on the tag name when a struct is de
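C's own rule, which Cg deliberately departs from, can be checked directly in C. C has no half type, so float stands in for half here. The sketch only shows that an unsuffixed constant promotes a float expression to double, while an f-suffixed constant does not. The function names are illustrative.

```c
#include <stddef.h>

/* In C, 2.0 has type double, so float * 2.0 has type double. */
size_t size_of_unsuffixed_product(float v)
{
    return sizeof(v * 2.0);
}

/* The f suffix makes the constant a float, so the product stays float. */
size_t size_of_suffixed_product(float v)
{
    return sizeof(v * 2.0f);
}
```

Cg's rule corresponds to treating the unsuffixed constant as if it already had the other operand's type. That avoids the silent move to the wider type.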
32. float3 eyeSpacePosition : TEXCOORD7;
};

// Henyey-Greenstein phase function
float3 hgphase(float3 v1, float3 v2, float3 g)
{
    float costheta;
    float3 g2;
    float3 gtemp;

    costheta = dot(v1, v2);
    g2 = g * g;
    gtemp = 1 + g2 - 2 * g * costheta;
    gtemp = pow(gtemp, 1.5.xxx);
    gtemp = (1 - g2) / gtemp;
    return gtemp;
}

// Computes the single-scattering approximation to scattering from
// a one-dimensional volumetric surface.
float3 singleScatter(float3 wi, float3 wo, float3 n, float3 g,
                     float3 albedo, float thickness)
{
    float win = abs(dot(wi, n));
    float won = abs(dot(wo, n));
    float eterm;
    float3 result;

    eterm = 1.0 - exp(-((1 / win) + (1 / won)) * thickness);
    result = eterm * albedo * hgphase(wo, wi, g) / (win + won);
    return result;
}

// i is the incident ray
// n is the surface normal
// eta is the ratio of indices of refraction
// r is the reflected ray
// t is the transmitted ray
float fresnel(float3 i, float3 n, float eta)
{
ine acloenes 12 Obie elotes qu float result lote Cy itle ES27 ie llOete 1616 Weve p
    // Refraction vector courtesy Paul Heckbert
cl cdo i im 4 esa 50er es La 0 Eil 91 y ella losw es2 059 ula Greece sal sp exeo A ISS allready wait tenga or 0 0 0
    // Compute Fresnel terms (from Global Illumination Compendium)
LOBE MOORE ile coss chiy COSp logre Cosi oh cosi itle
33. for DirectX PS 1.1 pixel shaders
- ps_1_2 for DirectX PS 1.2 pixel shaders
- ps_1_3 for DirectX PS 1.3 pixel shaders
- How to invoke: Use the compiler options -profile ps_1_1, -profile ps_1_2, or -profile ps_1_3.
The deprecated profile dx8ps is also available and is synonymous with ps_1_1.
This document describes the capabilities and restrictions of Cg when using the DirectX pixel shader 1.X profiles.

Overview
DirectX PS 1.4 is not currently supported by any Cg profile; all statements about ps_1_X in the remainder of this document refer only to ps_1_1, ps_1_2, and ps_1_3.
The underlying instruction set and machine architecture limit programmability in these profiles compared to what is allowed by Cg constructs. Thus, these profiles place additional restrictions on what can and cannot be done in a Cg program.
The main differences between these profiles, from the Cg perspective, are that additional texture addressing operations are exposed in ps_1_2 and ps_1_3, and that the depth value output is made available in a limited form in ps_1_3.
Operations in the DirectX pixel shader 1.X profiles can be categorized as texture addressing operations and arithmetic operations. Texture addressing operations are operations that generate texture addressing instructions; arithmetic operations are operations that generate arithmetic instructions. A Cg program in one of these profiles is limited to generating a maximum of four texture addressing instructions and
34. lighting z attenuation
float main(vert2frag IN,
           uniform float4 LightPos,
           uniform sampler3D noise_map,
           uniform sampler2D nv_map,
           uniform samplerCUBE cube_map,
           uniform float4 interpolate) IEC OO
{
    float diffuse, specular;
float3 biVariate float3 IN OPosition x IN OPosition z NRO BO Sion aN Osito OE float3 uniVariate float3 IN OPosition x IN OPosition z 0 0
    float3 normal = normalize(IN.Normal);
float3 noiseTex float3 IN OPosition x IN OPosition z 6 dN gexexsstieskom a 2 5 0 float3 noiseSum tex3D noise_map biVariate 3 rgb 12 tex3D noise_map noiseTex rgb 18 tex3D noise_map biVariate 6 rgb 18 normal normalize normal noiseSum calcLighting diffuse specular normal IN OPosition IN LightPos IN ViewerPos 32 float3 nvShift tex3D noise_map uniVariate 3 rgb 2 tex3D noise_map uniVariate rgb 4 tex3D noise_map biVariate 3 rgb 16 SS ANS A A SOS nvShift y 0 biVariate float3 IN OPosition x IN OPosition z NOLO Sascsonenya 9 float2 texCoord biVariate xy 4 float2 1 1 5 nvShift yx float2 0 interpolate x 8 float3 nvDecal LES OF mejo OETA Ulsa SC OO 1 suelo l interpolate x 7 xxx float3 eye IN ViewerPos IN OPosition float3 lightMetal texCUBE cube_map reflect normal eye rgb MAPS RAMSAR RS 7 dEdkoyeuES 5 25 0 sp enxexeuwubeuc s ar ko
35. <= 1024 (default 1024)
- Predication=<b>, where b = 0 or 1 (default 1)
- ArbitrarySwizzle=<b>, where b = 0 or 1 (default 1)
- GradientInstructions=<b>, where b = 0 or 1 (default 1)
- NoDependentReadLimit=<b>, where b = 0 or 1 (default 1)
- NoTexInstructionLimit=<b>, where b = 0 or 1 (default 1)

Limitations in this Implementation
Currently, this profile implementation has the following limitations:
- Dynamic flow control is not supported in extended pixel shaders.
- Multiple color outputs are not supported in pixel shaders. Only COLOR0 is supported.

OpenGL ARB Vertex Program Profile (arbvp1)
The OpenGL ARB Vertex Program Profile is used to compile Cg source code to vertex programs compatible with version 1.0 of the GL_ARB_vertex_program extension.
- Profile name: arbvp1
- How to invoke: Use the compiler option -profile arbvp1
This section describes the capabilities and restrictions of Cg when using the arbvp1 profile.

Overview
- The arbvp1 profile is similar to the vp20 profile except for the format of its output and its capability of accessing OpenGL state easily.
- ARB_vertex_program has the same capabilities as NV_vertex_program and DirectX 8 vertex shaders, so the limitations that this profile places on the Cg source code written by the programmer are the same as those of the NV_vertex_program profile.

Accessing OpenGL State
The ar
36. // pass through object-space position
    IN.Normal.xyz
    // pass through object-space normal, tangent, and binormal
    OUT.N = normalize(IN.Normal.xyz);
    OUT.T = IN.Tangent.xyz;
    OUT.B = IN.Binormal.xyz;
    // transform view pos (origin) to obj space
    OUT.VPosition = mul(ModelViewI, float4(0, 0, 0, 1)).xyz;
    // transform light vector to obj space
    OUT.LightVec0 = mul(ModelViewI, ...);
    return OUT;
}

Pixel Shader Source Code for MultiPaint
    #define WHITE half4(1.0h, 1.0h, 1.0h, 1.0h)

    // input: same struct as is output from multipaintVP.cg
    struct MultiPaintV2F {
        float4 HPosition : POSITION;   // position (clip space)
        float4 TexCoords : TEXCOORD0;  // base ST coordinates
        float3 OPosition : TEXCOORD1;  // position (obj space)
        float3 Normal    : TEXCOORD2;  // normal (eye space)
        float3 VPosition : TEXCOORD3;  // view pos (obj space)
        float3 T         : TEXCOORD4;  // tangent (obj space)
        float3 B         : TEXCOORD5;  // binormal (obj space)
        float3 N         : TEXCOORD6;  // normal (obj space)
        float4 LightVec0 : TEXCOORD7;  // light dir (obj space)
    };

    // channels in our material map
    #define SPEC_STR x
    #define METALNESS y
    #define NORM_SPEC_EXPON z
    // subfields in SpecData
    #define MINPOWER x
    #define MAXPOWER y
    #define MAXSPEC z
    // ReflData
    #define FRESNEL_MIN x
37. texRECT(samplerRECT, float3)
texRECTproj(samplerRECT, float3)
texRECTproj(samplerRECT, float4)
tex3D(sampler3D, float3)
tex3Dproj(sampler3D, float4)
texCUBE(samplerCUBE, float3)
texCUBEproj(samplerCUBE, float4)

Note: The nonprojective texture lookup functions are actually done as projective lookups on the underlying hardware. Because of this, the w component of the texture coordinates passed to these functions from the application or vertex program must contain the value 1.
Texture coordinate parameters for projective texture lookup functions must have swizzles that match the swizzle done by the generated texture shader instruction. While this may seem burdensome, it is intended to allow fp20 profile programs to behave correctly under other pixel shader profiles. Table 46 lists the swizzles required on the texture coordinate parameter to the projective texture lookup functions.

Table 46 Required Projective Texture Lookup Swizzles
Texture Lookup Function    Texture Coordinate Swizzle
tex1Dproj                  .xw/.ra
tex2Dproj                  .xyw/.rga
texRECTproj                .xyw/.rga
tex3Dproj                  .xyzw/.rgba
texCUBEproj                .xyzw/.rgba

Bindings
Manual Assignment of Bindings
The Cg compiler can determine bindings between texture units and uniform sampler parameters/texture coordinate inputs automatically. This automatic assignment is
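A projective 2D lookup divides the s and t coordinates by q, taken from the .xyw components listed in Table 46. The C sketch below shows that divide. It also shows why a nonprojective lookup implemented as a projective one needs w == 1: with q = 1 the divide leaves s and t unchanged. The struct and function names are illustrative.

```c
typedef struct { float x, y, z, w; } float4_t;

/* Computes the 2D lookup position (x/w, y/w) used by a tex2Dproj-style
 * lookup. Returns 0 on success, or -1 if w is zero. */
int tex2Dproj_coords(float4_t coord, float *s, float *t)
{
    if (coord.w == 0.0f)
        return -1;
    *s = coord.x / coord.w;   /* s / q */
    *t = coord.y / coord.w;   /* t / q */
    return 0;
}
```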
38. w;
        float2 waveDir = WaveData[i].xy;
        calcWave(disp, norm, dampening, IN.Position.xyz, waveTime, height, frequency, waveDir);
        position.y = position.y + disp;
        normal.xz = normal.xz + norm;
    }
    OUT.HPosition = mul(ModelViewProj, position);

    // transform normal into eye space
    normal = mul(ModelViewIT, normal);
    normal.xyz = normalize(normal.xyz);

    // get a vector from the vertex to the eye
    float3 eyeToVert = mul(ModelView, position).xyz;
    eyeToVert = normalize(eyeToVert);

    // calculate the reflected vector for cubemap look-up
    float4 reflected = mul(TextureMat, reflect(eyeToVert, normal.xyz).xyzz);

    // output two reflection vectors for the two environment cubemaps
    OUT.TexCoord0 = reflected;
    OUT.TexCoord1 = reflected;

    // Calculate a Fresnel term (note that 0 0)
    float fres = 1 + dot(eyeToVert, normal.xyz);
    fres = pow(fres, 5);

    // set the two color coefficients (the magic constants are arbitrary);
    // these two color coefficients are used to calculate the contribution
    // from each of the two environment cubemaps (one bright, one dark)
Ow od cec SS A SSL O AO ATAN So 0 2 OW Colori e eres db 2 9 Roos
    return OUT;
}

Pixel Shader Source Code for Improved Water
float4 main(in float3 color0 : COLOR0,
            in float3 color1 : COLOR1,
            in float3 reflectVec : TEXCOORD0,
            in float3 reflectVecDark : TEXCOORD1,
            uniform samplerCUBE envir
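The water vertex shader relies on two vector formulas, sketched here in C. The first is reflect, using the standard definition i - 2 * dot(n, i) * n for a unit normal n. The second is a Fresnel-style falloff of the form (1 + dot(e, n))^5. The + in that falloff is an assumption about the original shader. With it, the term is 0 when the eye-to-vertex direction is anti-parallel to the normal and 1 at grazing angles. The type and function names are illustrative.

```c
typedef struct { float x, y, z; } v3;

static float dot3(v3 a, v3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* reflect(i, n) = i - 2 * dot(n, i) * n, with n assumed to be unit length. */
v3 reflect3(v3 i, v3 n)
{
    float d = 2.0f * dot3(n, i);
    v3 r = { i.x - d * n.x, i.y - d * n.y, i.z - d * n.z };
    return r;
}

/* (1 + dot(e, n))^5 for a unit eye-to-vertex vector e and unit normal n. */
float fresnel_term(v3 eye_to_vert, v3 n)
{
    float f = 1.0f + dot3(eye_to_vert, n);
    return f * f * f * f * f;   /* pow(f, 5) without the math library */
}
```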
39. without generating a register combiner instruction. These operations are referred to as input modifiers and output modifiers. Instead of generating a register combiner instruction, the arithmetic operation modifies the assembly instruction or source registers to which it is applied. For example, the following Cg expression
    z = (x - 0.5 + y) / 2
could generate the following register combiner instruction (assuming x is in tex0, y is in tex1, and z is in col0):
    rgb {
        discard = half_bias(tex0.rgb);
        discard = tex1.rgb;
        col0 = sum();
        scale_by_one_half();
    }
    alpha {
        discard = half_bias(tex0.a);
        discard = tex1.a;
        col0 = sum();
        scale_by_one_half();
    }
Table 44 summarizes how different NV_texture_shader and NV_register_combiners instruction set modifiers are expressed in Cg programs. For more details on the context in which each modifier is allowed and ways in which modifiers may be combined, refer to the NV_texture_shader and NV_register_combiners documentation.

Table 44 NV_texture_shader and NV_register_combiners Instruction Set Modifiers
Instruction/Register Modifier    Cg Expression
scale by two                     2 * x
scale by four                    4 * x
scale by one half                x / 2
bias by negative one half        x - 0.5

Table 44 NV_texture_shader and NV_register_combiners Instruction Set Modifiers (continued)
Instruction/Register Modifier    Cg Expression
bias by
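The scalar C sketch below checks that the combiner sequence above computes the same value as the Cg expression, read here as z = (x - 0.5 + y) / 2. The sequence is: bias tex0 by negative one half, sum with tex1, then scale by one half. The sketch ignores any range clamping the combiner hardware applies. The function names are hypothetical.

```c
/* Step-by-step model of the combiner stages. */
float combiner_stages(float tex0, float tex1)
{
    float biased = tex0 - 0.5f;   /* half_bias(tex0): bias by -1/2 */
    float sum = biased + tex1;    /* col0 = sum()                  */
    return sum * 0.5f;            /* scale_by_one_half()           */
}

/* The Cg expression, evaluated directly. */
float cg_expression(float x, float y)
{
    return (x - 0.5f + y) / 2.0f;
}
```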
40. 0, and a representation with all bits set to 1 corresponds to 1.0. The four unsigned integers are then packed into a single 32-bit result. This operation can be reversed using the unpack_4ubyte function.
C Pseudocode
    ix = round(255.0 * clamp(a.x, 0.0, 1.0));
    iy = round(255.0 * clamp(a.y, 0.0, 1.0));
    iz = round(255.0 * clamp(a.z, 0.0, 1.0));
    iw = round(255.0 * clamp(a.w, 0.0, 1.0));
    result = (iw << 24) | (iz << 16) | (iy << 8) | ix;

unpack_4ubyte
half4 unpack_4ubyte(float a)
Unpacks the four 8-bit integers in a and scales the results into individual 16-bit floating-point values between 0.0 and 1.0.
C Pseudocode
    result.x = ((a >> 0) & 0xFF) / 255.0;
    result.y = ((a >> 8) & 0xFF) / 255.0;
    result.z = ((a >> 16) & 0xFF) / 255.0;
    result.w = ((a >> 24) & 0xFF) / 255.0;

DirectX Vertex Shader 1.1 Profile (vs_1_1)
The DirectX Vertex Shader 1.1 profile is used to compile Cg source code to DirectX 8.1 Vertex Shaders and DirectX 9 VS 1.1 shaders.
- Profile name: vs_1_1
- How to invoke: Use the compiler option -profile vs_1_1
The vs_1_1 profile limits Cg to match the capabilities of DirectX Vertex Shaders. This section describes how using the vs_1_1 profile affects the Cg source code that the developer writes.

Memory Restrictions
DirectX 8 vertex shaders have a limited amount of memory for instructions and data
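The pack_4ubyte and unpack_4ubyte pseudocode translates into C roughly as follows. Cg's unpack_4ubyte returns half4, but this sketch uses float. It rounds by adding 0.5 and truncating, which matches round() for the nonnegative values left after clamping. The function names mirror the Cg functions, but this is an illustration, not the library's implementation.

```c
#include <stdint.h>

/* Clamp to [0, 1], scale to [0, 255], and round to the nearest integer. */
static uint32_t to_ubyte(float v)
{
    if (v < 0.0f) v = 0.0f;
    if (v > 1.0f) v = 1.0f;
    return (uint32_t)(255.0f * v + 0.5f);
}

/* Packs four components into one 32-bit value, x in the low byte. */
uint32_t pack_4ubyte(const float a[4])
{
    return to_ubyte(a[0])
         | (to_ubyte(a[1]) << 8)
         | (to_ubyte(a[2]) << 16)
         | (to_ubyte(a[3]) << 24);
}

/* Reverses pack_4ubyte: each byte is scaled back into [0.0, 1.0]. */
void unpack_4ubyte(uint32_t a, float out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (float)((a >> (8 * i)) & 0xFFu) / 255.0f;
}
```

For example, packing (0, 1, 0, 1) gives 0xFF00FF00. Packing (2.0, -1.0, 0.5, 0.0) clamps the first two components and gives 0x008000FF.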
41. The parameter type is retrieved by cgGetParameterType:
    CGtype cgGetParameterType(CGparameter parameter);
One of five types is returned:
1. CG_STRUCT if the parameter is a structure,
2. CG_ARRAY if the parameter is an array,
3. CG_HALF if the parameter is a half-based type,
4. CG_FLOAT if the parameter is a float-based type, or
5. CG_SAMPLER if the parameter is a sampler-based type.
The pair of functions cgGetType and cgGetTypeString indicates the correspondence between a type enumerant and its corresponding string:
    CGtype cgGetType(const char* typeString);
    const char* cgGetTypeString(CGtype type);
If the string passed to cgGetType does not correspond to any type, CG_UNKNOWN_TYPE is returned.
Function cgGetParameterName retrieves the parameter name:
    const char* cgGetParameterName(CGparameter parameter);
Use cgGetParameterSemantic to retrieve the parameter semantic string:
    const char* cgGetParameterSemantic(CGparameter parameter);
If the parameter does not have any semantic, an empty string is returned.
There is a one-to-one correspondence between a set of predefined semantics (POSITION, COLOR, and so on) and hardware resources (registers, texture units, and so on). In the Cg runtime, a hardware resource is represented by the type CGresource, and cgGetParameterResource retrieves the resource assigned to a parameter:
    CGresource cgGetParameterResource(CGparame
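The following C sketch shows the kind of two-way lookup cgGetType and cgGetTypeString provide: enumerant to string and back, with an "unknown" value for unmatched strings. The enum, its values, and the table contents are hypothetical stand-ins. The real runtime defines its own CGtype values. Returning NULL for an unlisted enumerant is this sketch's choice, not documented runtime behavior.

```c
#include <stddef.h>
#include <string.h>

typedef enum { TOY_UNKNOWN_TYPE, TOY_FLOAT4, TOY_HALF3, TOY_SAMPLER2D } toy_type;

static const struct {
    toy_type type;
    const char *name;
} toy_types[] = {
    { TOY_FLOAT4, "float4" },
    { TOY_HALF3, "half3" },
    { TOY_SAMPLER2D, "sampler2D" },
};

static const size_t toy_type_count = sizeof toy_types / sizeof toy_types[0];

/* String to enumerant; unmatched strings map to TOY_UNKNOWN_TYPE,
 * in the same spirit as cgGetType returning CG_UNKNOWN_TYPE. */
toy_type toy_get_type(const char *s)
{
    for (size_t i = 0; i < toy_type_count; ++i)
        if (strcmp(toy_types[i].name, s) == 0)
            return toy_types[i].type;
    return TOY_UNKNOWN_TYPE;
}

/* Enumerant to string; NULL for enumerants not in this toy table. */
const char *toy_get_type_string(toy_type t)
{
    for (size_t i = 0; i < toy_type_count; ++i)
        if (toy_types[i].type == t)
            return toy_types[i].name;
    return NULL;
}
```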
42. 204
arithmetic operators 14, 189
arithmetic precision 188
arithmetic range 188
array type specification 172
arrays
    declaration and use of 179
    support of 12
B
binding semantics 183
    defined 6
    overview 183
Blinn-Phong Bump Mapping 119
bool data type 11
bool type specification 172
boolean operators 15, 189
built-in functions 19
bump dot3x2 diffuse and specular
    pixel shader code example 138
    sample shader 136
    vertex shader code example 137
bump reflection mapping
    pixel shader code example 143
    sample shader 140
    vertex shader code example 141
C
C preprocessor, supporting 182
C, relation to Cg 165
Car Paint 9
    pixel shader code example 130
    vertex shader code example 128
cfloat type specification 172
Cg
    brief tutorial 89
    defined 1
    language introduction 1
    necessity for xii
    standard library functions 19
Cg compiler
    cgc.exe 265
    command-line options 265
Cg runtime 29
    API-specific 45
    benefits 29
    compiling 32
    context creation 32
    Direct3D 57
    cgD3D9GetLastError 87
    CGerror 86
    debugging mode 83
    error callbacks 87
    error testing 87
    error types 85
    Direct3D cgD3D9EnableDebugTracing 85
    Direct3D cgD3D9TranslateHRESULT 87
    Direct3D expanded interface 69
    cgD3D8LoadProgram 75
    cgD3D8SetSamplerState 73
    cgD3D9BindProgram 76
    cgD3D9EnableParameterShadowing 74
    cgD3D9GetDevice 70
    cgD3D9GetLatestPixelProfile 76
    cgD3D9GetLatestVertexProfile 76
    cgD3D9GetOptimal
43. - CGerror
    - cgD3D9Failed: Set when a Direct3D runtime function makes a Direct3D call that returns an error.
    - cgD3D9DebugTrace: Set when a debug message is output to the debug console when using the debug DLL (see "Direct3D Debugging Mode" on page 82).
- HRESULT
    - CGD3D9ERR_INVALIDPARAM: Returned when a parameter value cannot be set.
    - CGD3D9ERR_INVALIDPROFILE: Returned when a program with an unexpected profile is passed to a function.
    - CGD3D9ERR_INVALIDSAMPLERSTATE: Returned when a parameter of type D3DTEXTURESTAGESTATETYPE that is not a valid sampler state is passed to a sampler state function.
    - CGD3D9ERR_INVALIDVEREXDECL: Returned when a program is loaded with the expanded interface but the given declaration is incompatible.
    - CGD3D9ERR_NODEVICE: Returned when a required Direct3D device is 0. This typically occurs when an expanded interface function is called and a Direct3D device has not been set with cgD3D9SetDevice.
    - CGD3D9ERR_NOTMATRIX: Returned when a parameter that is not a matrix type is passed to a function that expects one.
    - CGD3D9ERR_NOTLOADED: Returned when a parameter has not been loaded with the expanded interface by cgD3D9LoadProgram.
    - CGD3D9ERR_NOTSAMPLER: Returned when a parameter that is not a sampler parameter is passed to a function that expects one.
    - CGD3D9ERR_NOTUNIFORM: Returned when a parameter that is not uniform is passed to a function that expects
44. 9 PS 2.0 pixel shaders and DirectX 9 PS 2.0 extended pixel shaders.
- Profile names: ps_2_0 for DirectX 9 PS 2.0 pixel shaders; ps_2_x for DirectX 9 PS 2.0 extended pixel shaders
- How to invoke: Use the compiler options -profile ps_2_0 or -profile ps_2_x
The ps_2_0 profile limits Cg to match the capabilities of DirectX PS 2.0 pixel shaders. The ps_2_x profile is the same as the ps_2_0 profile but allows extended features such as arbitrary swizzles, a larger limit on the number of instructions, no limit on texture instructions, no limit on texture dependent reads, and support for predication. This section describes the capabilities and restrictions of Cg when using these profiles.

Program Instruction Limit
DirectX 9 pixel shaders have a limit on the number of instructions in a pixel shader:
- PS 2.0 (ps_2_0) pixel shaders are limited to 32 texture instructions and 64 arithmetic instructions.
- Extended PS 2 (ps_2_x) shaders have a maximum total instruction count of between 96 and 1024 instructions. There is no separate texture instruction limit on extended pixel shaders.
If the compiler needs to produce more than the maximum allowed number of instructions to compile a program, it reports an error.

Vector Register Limit
Likewise, there are limited numbers of registers to hold program parameters and temporary results. Specifically, there are 32 read-only vector registers and 12 to 32 read-write vector registers. If the compile
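The instruction limits above can be expressed as a small pre-check, sketched here in C. For ps_2_0, texture and arithmetic instructions are limited separately (32 and 64). For ps_2_x, only the total is limited, by a slot count that must lie between 96 and 1024. The profile enum and function are hypothetical and are not part of cgc, which performs its own checks and reports an error.

```c
typedef enum { PS_2_0, PS_2_X } ps2_profile;

/* Returns 1 if the instruction counts fit the profile's limits, else 0.
 * `slots` is only used for ps_2_x, where it must be within 96..1024. */
int fits_instruction_limits(ps2_profile p, int tex, int arith, int slots)
{
    if (p == PS_2_0)
        return tex <= 32 && arith <= 64;     /* separate limits */
    if (slots < 96 || slots > 1024)
        return 0;                            /* invalid ps_2_x slot count */
    return tex + arith <= slots;             /* single combined limit */
}
```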
45. Input Binding Semantics 232
Table 38 ps_1_x Varying Input Binding Semantics 233
Table 39 ps_1_x Varying Output Binding Semantics 233
Table 40 ps_1_x Auxiliary Texture Functions 234
Table 41 vp20 Uniform Input Binding Semantics 241
Table 42 vp20 Varying Input Binding Semantics 242
Table 43 vp20 Varying Output Binding Semantics 242
Table 44 NV_texture_shader and NV_register_combiners Instruction Set Modifiers 245
Table 45 Supported Standard Library Functions 247
Table 46 Required Projective Texture Lookup Swizzles 248
Table 47 fp20 Uniform Binding Semantics 249
Table 48 fp20 Varying Input Binding Semantics 250
Table 49 fp20 Varying Output Binding Semantics 250
Table 50 fp20 Auxiliary Texture Functions 251

Foreword
We are in the midst of a great transition in computer graphics, both in terms of graphics hardware and in terms of the visual quality and authoring process for games, interactive applications, and animation. Graphics hardware has evolved from big-iron graphics workstations costing hundreds of thousands of dollars to single-chip graphics processing units (GPUs) whose performance and features have grown to mat
46. MAX y
    #define FRESNEL_EXPON z
STRENGTH w
    // subfields in BumpData
    #define BUMP_SCALE x

    half4 main(MultiPaintV2F IN,
               uniform sampler2D ColorMap,     // color
               uniform sampler2D MaterialMap,  // see above
               uniform sampler2D NormalMap,    // tangent-space normals
               uniform samplerCUBE EnvMap,     // environment skybox
               uniform float4 SpecData,        // see above
               uniform float4 ReflData,        // see above
               uniform float4 BumpData         // see above
              ) : COLOR
    {
        half4 surfColor = tex2D(ColorMap, IN.TexCoords.xy);
        half4 material = tex2D(MaterialMap, IN.TexCoords.xy);
        half3 Nt = tex2D(NormalMap, IN.TexCoords.xy).rgb
REALESO Sia 0 Sint 0 310 P JEN 5L ess C toYonetols oor d MANI PE CSS O Ml Mbit c re M i ie meson SH SIMA wer SpecData MINPOWER material NORM SP AC EXPON SpecData MAXPOW normalize IN VPosition ER SpecData MINPOWER IN OPosition normalize IN LightVecO xyz normalize BumpData BUMP SCALE IME A JEN IN dot Ln Nb normalize Vn Ln amo Mate Clute Choice sim NINA Ne EIN 318 E Nb specPower half4 diffResult lighting y surfCol ol lerp WHITE surfCol ha
47. Range
Some hardware may not conform exactly to IEEE arithmetic rules. Fixed-point data types do not have IEEE-defined rules.
Optimizations are allowed to produce slightly different results than unoptimized code. Constant folding must be done with approximately the correct precision and range, but is not required to produce bit-exact results. It is recommended that compilers provide an option either to forbid these optimizations or to guarantee that they are made in bit-exact fashion.

Operator Precedence
Cg uses the same operator precedence as C for operators that are common between the two languages.
The swizzle and write mask operators (.) have the same precedence as the structure member operator (.) and the array index operator ([]).

Operator Enhancements
The standard C arithmetic operators (+, -, *, /, %, and unary -) are extended to support vectors and matrices. Sizes of vectors and matrices must be appropriately matched, according to standard mathematical rules. Scalar-to-vector promotion (see "Smearing of Scalars to Vectors" on page 179) allows relaxation of these rules.

Table 7 Expanded Operators
Operator                 Description
M[n][m]                  Matrix with n rows and m columns
V[n]                     Vector with n elements
-V[n] -> V[n]            Unary vector negate
-M[n] -> M[n]            Unary matrix negate
V[n] * V[n] -> V[n]      Componentwise *
V[n] / V[n] -> V[n]      Componentwise /
V[n] % V[n] -> V[n]      Componentwise %
V[n]
48. Semantics 202
Table 19 arbvp1 Uniform Input Binding Semantics 208
Table 20 arbvp1 Varying Input Binding Semantics 209
Table 21 arbvp1 Varying Output Binding Semantics 210
Table 22 arbfp1 Uniform Input Binding Semantics 212
Table 23 arbfp1 Varying Input Binding Semantics 213
Table 24 arbfp1 Varying Output Binding Semantics 213
Table 25 vp30 Uniform Input Binding Semantics 215
Table 26 vp30 Varying Input Binding Semantics 216
Table 27 vp30 Varying Output Binding Semantics 216
Table 28 fp30 Uniform Input Binding Semantics 219
Table 29 fp30 Varying Input Binding Semantics 219
Table 30 fp30 Varying Output Binding Semantics 220
Table 31 vs_1_1 Uniform Input Binding Semantics 225
Table 32 vs_1_1 Varying Input Binding Semantics 225
Table 33 vs_1_1 Varying Output Binding Semantics 226
Table 34 ps_1_x Instruction Set Modifiers 228
Table 35 Supported Standard Library Functions 230
Table 36 Required Projective Texture Lookup Swizzles 231
Table 37 ps_1_x Uniform
49. Vertex shader input register v3
PSIZE                    Vertex shader input register v4
COLOR0, DIFFUSE          Vertex shader input register v5
COLOR1, SPECULAR         Vertex shader input register v6
TEXCOORD0-TEXCOORD7      Vertex shader input registers v7-v14
TANGENT*                 Vertex shader input register v14
BINORMAL                 Vertex shader input register v15
* TANGENT is an alias for TEXCOORD7.

Table 33 summarizes the valid binding semantics for varying output parameters in the vs_1_1 profile. These map to output registers in DirectX 8.1 vertex shaders.

Table 33 vs_1_1 Varying Output Binding Semantics
Binding Semantics Name   Corresponding Data
POSITION                 Output position (oPos)
PSIZE                    Output point size (oPts)
FOG                      Output fog value (oFog)
COLOR0-COLOR1            Output color values (oD0, oD1)
TEXCOORD0-TEXCOORD7      Output texture coordinates (oT0-oT7)

Options
When using the vs_1_1 profile under DirectX 9, it is necessary to tell the compiler to produce dcl statements to declare varying inputs. The option -profileopts dcls causes dcl statements to be added to the compiler output.

DirectX Pixel Shader 1.x Profiles (ps_1_X)
The DirectX pixel shader 1.X profiles are used to compile Cg source code to DirectX PS 1.1, PS 1.2, or PS 1.3 pixel shader assembly.
- Profile names: ps_1_1
50. a cgGLSetParameter function is called for a varying parameter, the appropriate immediate-mode OpenGL entry point is called. The cgGLGetParameter functions do not apply to varying parameters.

Setting Sampler Parameters
Setting a sampler parameter requires two steps. The first step consists of assigning an OpenGL texture object to the sampler parameter using
    void cgGLSetTextureParameter(CGparameter parameter, GLuint textureName);
where textureName is the OpenGL texture name. The second step consists of enabling the sampler parameter for a specific drawing call:
    void cgGLEnableTextureParameter(CGparameter parameter);
Function cgGLEnableTextureParameter must be called after cgGLSetTextureParameter and before the actual drawing call. The equivalent disabling function is
    void cgGLDisableTextureParameter(CGparameter parameter);
You can retrieve the texture object assigned to a sampler parameter using
    GLuint cgGLGetTextureParameter(CGparameter parameter);
You can retrieve the OpenGL enumerant for the texture unit associated with a sampler parameter using
    GLenum cgGLGetTextureEnum(CGparameter parameter);
The returned enumerant has the form GL_TEXTUREi_ARB, where i is the texture unit index.

OpenGL Profile Support
A convenience function is provided that gives the best available profile for vertex or fragment programs, depending on the available OpenGL extensions:
    CGprofile cgGLGetLatestProfile(CGGLenum profileType);
Param
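The GL_TEXTUREi_ARB enumerants are consecutive values starting at GL_TEXTURE0_ARB (0x84C0), so the texture unit index can be recovered by subtraction. The C sketch below uses a local constant in place of the GL header so that it stands alone. The helper name is hypothetical. In a real application the returned enumerant could instead be passed directly to a call such as glActiveTextureARB.

```c
/* Value of GL_TEXTURE0_ARB; GL_TEXTURE1_ARB is 0x84C1, and so on. */
#define TEXTURE0_ARB_VALUE 0x84C0u

/* Recover the texture unit index i from a GL_TEXTUREi_ARB enumerant. */
unsigned texture_unit_from_enum(unsigned texture_enum)
{
    return texture_enum - TEXTURE0_ARB_VALUE;
}
```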
51. an application based on this API. They essentially interface between the core runtime data structures and the API data structures to provide the following facilities:
- Setting the parameter values. A distinction is made between texture, matrix, array, vector, and scalar values, because each API handles these types differently and uses different data structures for them.
- Executing the program. Program execution is divided into program loading (passing the result of the Cg compiler to the API) and program binding (setting the program as the one to execute for any subsequent draw calls). The split exists because these two operations usually happen at different times: a program is loaded each time it is recompiled, and it is bound each time it needs to be executed for a particular draw call.

Parameter Shadowing
When the value of a uniform parameter is set by some function of the OpenGL Cg runtime, it is actually stored internally, or shadowed, by either the Cg or the OpenGL runtime, so that it does not need to be reset every time the program is about to be executed. This behavior is referred to as parameter shadowing. If the Direct3D Cg runtime expanded interface described in "Direct3D Expanded Interface" on page 69 is used, parameter shadowing can be turned on or off on a per-program basis. When parameter shadowing is turned off for a given program and the value of any of its uniform paramete
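The following C sketch models the idea of parameter shadowing with a hypothetical ShadowedProgram type and a stand-in array for the API's constant storage; none of these names belong to the Cg runtime. It shows why shadowing lets an application set a uniform once: the value is remembered per program and re-sent when the program is bound.

```c
#include <string.h>

#define MAX_UNIFORMS 8

/* Hypothetical model: each program keeps a CPU-side copy of its uniform
   values so they survive across binds of other programs. */
typedef struct {
    float value[MAX_UNIFORMS][4];  /* shadowed float4 uniforms */
    int   isSet[MAX_UNIFORMS];     /* which slots hold a shadowed value */
    int   shadowingEnabled;        /* per-program switch */
} ShadowedProgram;

/* Stand-in for the API's constant-register storage (hypothetical). */
static float deviceConstants[MAX_UNIFORMS][4];

/* Set a uniform: it is sent to the device now and, if shadowing is on,
   also remembered by the program. */
void set_uniform(ShadowedProgram *p, int slot, const float v[4])
{
    memcpy(deviceConstants[slot], v, sizeof(float) * 4);
    if (p->shadowingEnabled) {
        memcpy(p->value[slot], v, sizeof(float) * 4);
        p->isSet[slot] = 1;
    }
}

/* Bind a program: shadowed values are re-uploaded, so the application
   does not need to set them again before drawing. */
void bind_program(const ShadowedProgram *p)
{
    for (int i = 0; i < MAX_UNIFORMS; ++i)
        if (p->shadowingEnabled && p->isSet[i])
            memcpy(deviceConstants[i], p->value[i], sizeof(float) * 4);
}
```

With shadowing turned off, bind_program re-sends nothing, so the application would have to set the uniforms again after another program overwrote them.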
52. and Destruction
Programs can only be created as part of a context, which acts as a program container. A context is created by calling cgCreateContext:

    CGcontext cgCreateContext(void);

A context is destroyed by cgDestroyContext:

    void cgDestroyContext(CGcontext context);

Context Query
To check whether a context handle references a valid context, use cgIsContext:

    CGbool cgIsContext(CGcontext context);

Core Cg Program
There are Cg functions for creating, destroying, iterating over, and querying programs.

Program Creation and Destruction
A program is created by calling either cgCreateProgram:

    CGprogram cgCreateProgram(CGcontext context, CGenum programType,
                              const char* program, CGprofile profile,
                              const char* entry, const char** args);

or cgCreateProgramFromFile:

    CGprogram cgCreateProgramFromFile(CGcontext context, CGenum programType,
                                      const char* program, CGprofile profile,
                                      const char* entry, const char** args);

These functions create a program object, add it to the specified context, and compile the associated source code. For both of them:
- context is a valid context handle.
- profile is an enumerant specifying the profile to which the program must be compiled.
- entry is the name of the function that must be considered as the main entry point by the compiler. If the value is zero, the name main is used.
- args is a pointer to a null
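The ownership rule (a context is a container, and programs live inside it) can be illustrated with a small C model. Everything here, including Context, Program, and the helper functions, is hypothetical; it mirrors two behaviors: in the Cg runtime, destroying a context also deletes the programs it contains, and a zero entry name selects main.

```c
#include <stdlib.h>

/* Hypothetical program record owned by a context. */
typedef struct Program {
    const char *entry;        /* entry-point function name */
    struct Program *next;     /* next program in the same context */
} Program;

/* Hypothetical context acting as a program container. */
typedef struct {
    Program *programs;        /* singly linked list of owned programs */
    int count;
} Context;

Context *create_context(void)
{
    return calloc(1, sizeof(Context));
}

/* A program can only be created inside a context; a null entry name
   falls back to "main", as documented above. */
Program *create_program(Context *ctx, const char *entry)
{
    Program *p = malloc(sizeof *p);
    if (p == NULL) return NULL;
    p->entry = entry ? entry : "main";
    p->next = ctx->programs;
    ctx->programs = p;
    ctx->count++;
    return p;
}

/* Destroying the container frees every program it owns. */
void destroy_context(Context *ctx)
{
    Program *p = ctx->programs;
    while (p != NULL) {
        Program *next = p->next;
        free(p);
        p = next;
    }
    free(ctx);
}
```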
53. assumed to be in a file called FragmentProgram.cg:

    void FragmentProgram(
        in float4 color : COLOR0,
        in float4 texCoord : TEXCOORD0,
        out float4 colorO : COLOR0,
        const uniform sampler2D BaseTexture,
        const uniform float4 SomeColor)
    {
        colorO = color * tex2D(BaseTexture, texCoord) + SomeColor;
    }

Expanded Interface Direct3D 9 Application
The following C code links the previous vertex and fragment programs to the Direct3D 9 application:

    #include <cg/cg.h>
    #include <cg/cgD3D9.h>

    IDirect3DDevice9* device;    // Initialized somewhere else
    IDirect3DTexture9* texture;  // Initialized somewhere else
    D3DXCOLOR constantColor;     // Initialized somewhere else

    CGcontext context;
    IDirect3DVertexDeclaration9* vertexDeclaration;
    CGprogram vertexProgram, fragmentProgram;
    CGparameter baseTexture, someColor, modelViewMatrix;

    // Called at application startup
    void OnStartup()
    {
        // Create context
        context = cgCreateContext();
    }

    // Called whenever the Direct3D device needs to be created
    void OnCreateDevice()
    {
        // Pass the Direct3D device to the expanded interface
        cgD3D9SetDevice(device);

        // Determine the best profiles to use
        CGprofile vertexProfile = cgD3D9GetLatestVertexProfile();
        CGprofile pixelProfile = cgD3D9GetLatestPixelProfile();

        // Grab the optimal options for each profile
        const char* vertexOptions[] = { cgD3D9GetOptimalOptions(vertexProfile), 0 };
        cons
54. but is cumbersome for an application that uses many programs. What's worse, the application is frozen in time: it supports only the profiles that existed when it was compiled, and it cannot take advantage of the optimizations that future compilers could offer. In contrast, programs compiled by applications at run time:
- Benefit from future compiler optimizations for the existing profiles.
- Run on future profiles corresponding to new 3D APIs or to hardware that did not exist at the time the Cg programs were written.

No Dependency Limitations
If you link a Cg program to the application when it is compiled, the application is too dependent on the result of the compilation. The application program has to refer to the Cg program input parameters by using the hardware register names that are output by the Cg compiler. This approach is awkward for two reasons:
- The register names can't be easily matched to the corresponding meaningful names in the Cg program without looking at the compiler output.
- Register allocations can change each time the Cg program, the Cg compiler, or the compilation profile changes. This means you have the inconvenience of updating the application each time as well.

In contrast, linking a Cg program to the application program at run time removes the dependency on the Cg compiler. With the runtime, you need to alter the application code only when you add, delete, or modify Cg input parameters.

Input Parameter Ma
55. Melting Paint 105
    Description 105
    Vertex Shader Source Code for Melting Paint 105
    Pixel Shader Source Code for Melting Paint 107
MultiPaint 109
    Description 109
    Vertex Shader Source Code for MultiPaint 110
    Pixel Shader Source Code for MultiPaint 111
Ray Traced Refraction 114
    Description 114
    Vertex Shader Source Code for Ray Traced Refraction 115
    Pixel Shader Source Code for Ray Traced Refraction 116
Skin 119
    Description 119
    Pixel Shader Source Code for Skin 119
Thin Film Effect 124
    Description 124
    Vertex Shader Source Code for Thin Film Effect 124
    Pixel Shader Source Code for Thin Film Effect 126
Car Paint 127
    Description 127
    Vertex Shader Source Code for Car Paint 128
    Pixel Shader Source Code for Car Paint 130
Basic Profile Sample Shaders 133
Anisotropic Lighting 134
    Description 134
    Ver
56. cgD3D9SetDevice 69
cgD3D9SetSamplerState 73
cgD3D9SetTexture 73
cgD3D9SetTextureWrapMode 74
cgD3D9SetUniform 72
cgD3D9SetUniformArray 73
cgD3D9SetUniformMatrix 72
cgD3D9SetUniformMatrixArray 73
cgD3D9UnloadProgram 76
Direct3D 8 application 81
Direct3D 9 application 78
Direct3D device 69
fragment program 77
lost devices 70
parameters 72
    array 73
    sampler 73
    uniform 72
profile support 76
program execution 74
vertex program 77
HRESULT 86
minimal interface 57
    cgD3D8ResourceToDeclUsage 61
    cgD3D8ValidateVertexDeclaration 60
    cgD3D9ResourceToDeclUsage 61
    cgD3D9ValidateVertexDeclaration 60
    Direct3D 8 application 67
    Direct3D 9 application 64
    fragment program 63
    type retrieval 63
    vertex declaration 57
    vertex declaration for Direct3D 8 58
    vertex declaration for Direct3D 9 58
    vertex program 63
Direct3D debug DLL, using 85
DirectX pixel shader 1.x profiles 227
DirectX pixel shader 2.x profile 200
DirectX vertex shader 1.1 profile 223
DirectX vertex shader 2.x profile 196
dot, for performance 259
dx8ps profile (deprecated) 227

E
explicit casts
    compile-time 177
    numeric 177
    numeric matrix 177
    numeric vector 177

F
fixed data type 11
fixed type specification 171
float data type 10
float type specification 171
floating type category 174
for statements 185
fp20 profile 244
fp30 profile 218
fragment profiles, texture lookups 17
fragment program
57. easy.

The Cg Language
Cg is based on C, but with enhancements and modifications that make it easy to write programs that compile to highly optimized GPU code. Cg code looks almost exactly like C code, with the same syntax for declarations, function calls, and most data types.

Before describing the Cg language in detail, it is important to explain the reason for some of the differences that exist between Cg and C. Fundamentally, it comes down to the difference in the programming models for GPUs and for CPUs.

Cg's Programming Model for GPUs
CPUs normally have only one programmable processor. In contrast, GPUs have at least two programmable processors, the vertex processor and the fragment processor, plus other non-programmable hardware units. The processors, the non-programmable parts of the graphics hardware, and the application are all linked through data flows. Figure 1 illustrates Cg's model of the GPU.

Figure 1. Cg's Model of the GPU

The Cg language allows you to write programs for both the vertex processor and the fragment p
58. effects associated with the second and third operands always occur, regardless of the conditional. As an example, the following would be a very efficient way to implement a vector clamp function if the min and max functions did not exist:

    float3 clamp(float3 x, float3 a, float3 b)
    {
        x = (x < a) ? a : x;
        x = (x > b) ? b : x;
        return x;
    }

Texture Lookups in Advanced Fragment Profiles
Cg's advanced fragment profiles provide a variety of texture lookup functions. Note that Cg uses a different set of texture lookup functions for basic fragment profiles because of the restricted pixel programmability of that hardware. Basic fragment profile lookup functions aren't discussed in this introductory chapter.

Advanced fragment profile texture lookup functions always require at least two parameters:
- Texture sampler. A texture sampler is a variable of type sampler, sampler1D, sampler2D, sampler3D, samplerCUBE, or samplerRECT, and represents the combination of a texture image with a filter, clamp, wrap, or similar configuration. Texture sampler variables cannot be set directly within the Cg language; instead, they must be provided by the application as uniform parameters to a Cg program.
- Texture coordinate. Depending on the type of texture lookup, the coordinate may be a scalar, a two-vector, a three-vector, or a four-vector.

The following fragment program use
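C has no vector form of ?:, but the same clamp can be written per component. In this C sketch the select is a function, so both candidate values are evaluated as arguments before one is chosen, which parallels the rule above that the second and third operands are always evaluated. The float3 struct and the names select_f and clamp3 are hypothetical.

```c
/* Minimal 3-component vector for illustration. */
typedef struct { float x, y, z; } float3;

/* Per-component select: both ifTrue and ifFalse are evaluated by the
   caller before the choice is made. */
static float select_f(int cond, float ifTrue, float ifFalse)
{
    return cond ? ifTrue : ifFalse;
}

/* Vector clamp built only from comparisons and selects. */
float3 clamp3(float3 x, float3 a, float3 b)
{
    /* x = (x < a) ? a : x;  applied to each component */
    x.x = select_f(x.x < a.x, a.x, x.x);
    x.y = select_f(x.y < a.y, a.y, x.y);
    x.z = select_f(x.z < a.z, a.z, x.z);
    /* x = (x > b) ? b : x;  applied to each component */
    x.x = select_f(x.x > b.x, b.x, x.x);
    x.y = select_f(x.y > b.y, b.y, x.y);
    x.z = select_f(x.z > b.z, b.z, x.z);
    return x;
}
```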
59. eight arithmetic instructions. Since these numbers are quite small, users need to be very aware of this limitation while writing Cg code for these profiles.

There are certain simple arithmetic operations that can be applied to inputs of texture addressing operations and to inputs and outputs of arithmetic operations without generating an arithmetic instruction. From here on, these operations are referred to as input modifiers and output modifiers.

[6] For more details about the underlying instruction sets, their capabilities, and their limitations, refer to the MSDN documentation of DirectX pixel shaders 1.1, 1.2, and 1.3.

The ps_1_x profiles also restrict when a texture addressing operation or arithmetic operation can occur in the program. A texture addressing operation may not have any dependency on the output of an arithmetic operation unless:
- The arithmetic operation is a valid input modifier for the texture addressing operation, or
- The arithmetic operation is part of a complex texture addressing operation; these are summarized in the section on Auxiliary Texture Functions.

Modifiers
Input and output modifiers may be used to perform simple arithmetic operations without generating an arithmetic instruction. Instead, the arithmetic operation modifies the assembly instruction or source registers to which it is applied. For example, the following Cg expression z x 0 5 y
60. float4 intermediate_coord2, float4 prevlookup)
Performs the following:

    float3 newst = float3(dot(intermediate_coord1.xyz, prevlookup.xyz),
                          dot(intermediate_coord2.xyz, prevlookup.xyz),
                          dot(str, prevlookup.xyz));
    return tex3D/CUBE(tex, newst);

where str are texture coordinates associated with sampler tex, prevlookup is the result of a previous texture operation, intermediate_coord1 are texture coordinates associated with the n-2 texture unit, and intermediate_coord2 are texture coordinates associated with the n-1 texture unit. This function can be used to generate the dot_product_3d or dot_product_cube_map NV_texture_shader instruction combinations.

Table 50. fp20 Auxiliary Texture Functions (continued)
Texture Function:
    texCUBE_reflect_dp3x3(uniform samplerCUBE tex, float4 strq,
                          float4 intermediate_coord1, float4 intermediate_coord2,
                          float4 prevlookup)
Description: Performs the following:

    float3 E = float3(intermediate_coord2.w, intermediate_coord1.w, strq.w);
    float3 N = float3(dot(intermediate_coord1.xyz, prevlookup.xyz),
                      dot(intermediate_coord2.xyz, prevlookup.xyz),
                      dot(strq.xyz, prevlookup.xyz));
    return texCUBE(tex, 2 * dot(N, E) / dot(N, N) * N - E);

where strq are texture coordinates associated with sampler tex, prevlookup is the result of a previous texture operation, intermediate_coord1 are texture coordinates associated with the n-2 te
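The reflection formula in the table can be checked on the CPU. This C sketch computes the texCUBE_reflect_dp3x3 lookup direction from the three coordinate rows and the previous lookup; it takes the eye vector E as an argument rather than assembling it from the .w components, and the vec3 type and function name are hypothetical.

```c
typedef struct { float x, y, z; } vec3;

static float dot3(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Lookup direction used by texCUBE_reflect_dp3x3:
   N = (dot(row1, prev), dot(row2, prev), dot(row3, prev))
   R = 2 * dot(N, E) / dot(N, N) * N - E
   row1, row2, row3 stand for the .xyz of intermediate_coord1,
   intermediate_coord2, and strq. */
vec3 reflect_dp3x3_direction(vec3 row1, vec3 row2, vec3 row3,
                             vec3 prevlookup, vec3 E)
{
    vec3 N = { dot3(row1, prevlookup), dot3(row2, prevlookup), dot3(row3, prevlookup) };
    float k = 2.0f * dot3(N, E) / dot3(N, N);
    vec3 R = { k * N.x - E.x, k * N.y - E.y, k * N.z - E.z };
    return R;
}
```

Dividing by dot(N, N) means the formula does not require the transformed normal to be unit length.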
61. going from eye to shaded point in cube space

        float3 eyeVector = mul(ObjToCubeSpace, IN.Position) - EyePosition;
        OUT.TangentToCubeSpace0.w = eyeVector.x;
        OUT.TangentToCubeSpace1.w = eyeVector.y;
        OUT.TangentToCubeSpace2.w = eyeVector.z;

        // transform position to projection space
        OUT.Position = mul(WorldViewProj, IN.Position);

        return OUT;
    }

Pixel Shader Source Code for Bump and Reflection Mapping

    struct v2f {
        float4 Position : POSITION;  // in projection space
        float4 TexCoord : TEXCOORD0;
        // first row of the 3x3 transform from tangent to cube space
        float4 TangentToCubeSpace0 : TEXCOORD1;
        // second row of the 3x3 transform from tangent to cube space
        float4 TangentToCubeSpace1 : TEXCOORD2;
        // third row of the 3x3 transform from tangent to cube space
        float4 TangentToCubeSpace2 : TEXCOORD3;
    };

    float4 main(v2f IN,
                uniform sampler2D NormalMap,
                uniform samplerCUBE EnvironmentMap,
                uniform float3 EyeVector) : COLOR
    {
        // fetch the bump normal from the normal map
        float4 normal = tex2D(NormalMap, IN.TexCoord.xy);

        // transform the bump normal into cube space
        // then use the transformed normal and eye vector
        // to compute the reflection vector that is
        // used to fetch the cube map
        return texCUBE_reflect_eye_dp3x3(EnvironmentMap,
                                         IN.TangentToCubeSpace2.xyz,
                                         IN.TangentToCubeSpace0,
                                         IN.TangentToCubeSpace1,
                                         normal,
                                         EyeVec
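The vertex shader passes the cube-space eye vector to the pixel stage in the otherwise unused .w components of the three transform rows. This C sketch reproduces that packing step, assuming ObjToCubeSpace is a 3x4 matrix applied to a homogeneous position (its declaration is outside this excerpt); the function and type names are hypothetical.

```c
typedef struct { float x, y, z, w; } vec4;

/* eyeVector = objToCube * position - eyePosition, with one component
   stored in the .w of each tangent-to-cube-space row. The .xyz of the
   rows are left untouched. */
void pack_eye_vector(const float objToCube[3][4], vec4 position,
                     const float eyePosition[3],
                     vec4 *row0, vec4 *row1, vec4 *row2)
{
    float cubePos[3];
    for (int r = 0; r < 3; ++r)
        cubePos[r] = objToCube[r][0] * position.x + objToCube[r][1] * position.y
                   + objToCube[r][2] * position.z + objToCube[r][3] * position.w;
    row0->w = cubePos[0] - eyePosition[0];
    row1->w = cubePos[1] - eyePosition[1];
    row2->w = cubePos[2] - eyePosition[2];
}
```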
62. in the same program.

Fragment profiles are required to fully support the sampler, sampler1D, sampler2D, sampler3D, and samplerCUBE data types. Fragment profiles are required to provide partial support (see "Partial Support of Types" on page 173) for the samplerRECT data type and may optionally provide full support for this data type. Vertex profiles are required to provide partial support for the six sampler data types and may optionally provide full support for these data types.

An array type is a collection of one or more elements of the same type. An array variable has a single index.

Some array types may be optionally designated as packed, using the packed type modifier. The storage format of a packed type may be different from the storage format of the corresponding unpacked type. The storage format of packed types is implementation dependent, but must be consistent for any particular combination of compiler and profile. The operations supported on a packed type in a particular profile may be different from the operations supported on the corresponding unpacked type in that same profile.

Profiles may define a maximum allowable size for packed arrays, but must support at least size 4 for packed vector (one-dimensional array) types and 4x4 for packed matrix (two-dimensional array) types.

- When declaring an array of arrays in a single declaration, the packed mod
63. inputs IN,
                 uniform float4x4 modelViewProj,
                 uniform float3x4 boneMatrices[30],
                 uniform float4 color,
                 uniform float4 lightPos)
    {
        outputs OUT;

        float4 index = IN.matrixIndices;
        float4 weight = IN.weights;

        float4 position;
        float3 normal;

        for (float i = 0; i < IN.numBones.x; i += 1) {
            // transform the offset by bone i
            position = position + weight.x * float4(mul(boneMatrices[index.x], IN.position).xyz, 1.0);

            // transform normal by bone i
            normal = normal + weight.x * mul((float3x3)boneMatrices[index.x], IN.normal.xyz).xyz;

            // shift over the index/weight variables; this moves the index and
            // weight for the current bone into

Improved Water
Description
This demo gives the appearance that the viewer is surrounded by a large grid of vertices because of the free rotation, but switching to wireframe or increasing the frustum angle makes it apparent that the vertices are a static mesh, with the height, normal, and texture coordinates being calculated on the fly based on the direction and height of the viewer. This technique allows for very GPU-friendly water animations because the static mesh can be precomputed. The vertices are displaced using sine waves, and in this example, a loop is used to sum five sine waves to achieve realistic effects.

Figure 6. Example of Improv
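The skinning loop can be mirrored in C to see how the index/weight rotation walks through the bones. This sketch assumes IN.position has w equal to 1, stores bones as 3x4 row-major matrices, handles only the position (not the normal), and starts the accumulator at zero; the function name is hypothetical.

```c
typedef struct { float x, y, z; } vec3;

/* Accumulate weight.x * (bone[index.x] * pos) for numBones bones, then
   rotate index and weight (yzwx) so the next bone sits in .x. */
vec3 skin_position(const float bones[][3][4], const float index[4],
                   const float weight[4], int numBones, vec3 pos)
{
    float idx[4], w[4];
    vec3 out = { 0.0f, 0.0f, 0.0f };
    for (int k = 0; k < 4; ++k) { idx[k] = index[k]; w[k] = weight[k]; }

    for (int i = 0; i < numBones; ++i) {
        const float (*m)[4] = bones[(int)idx[0]];
        /* mul(boneMatrices[index.x], position) with position.w == 1 */
        float px = m[0][0] * pos.x + m[0][1] * pos.y + m[0][2] * pos.z + m[0][3];
        float py = m[1][0] * pos.x + m[1][1] * pos.y + m[1][2] * pos.z + m[1][3];
        float pz = m[2][0] * pos.x + m[2][1] * pos.y + m[2][2] * pos.z + m[2][3];
        out.x += w[0] * px;
        out.y += w[0] * py;
        out.z += w[0] * pz;

        /* index = index.yzwx; weight = weight.yzwx; */
        float ti = idx[0], tw = w[0];
        for (int k = 0; k < 3; ++k) { idx[k] = idx[k + 1]; w[k] = w[k + 1]; }
        idx[3] = ti;
        w[3] = tw;
    }
    return out;
}
```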
64. is invalid.
- CG_INVALID_PROFILE_ERROR: Returned when the profile is not supported.
- CG_INVALID_VALUE_TYPE_ERROR: Returned when an unknown value type is assigned to a parameter.
- CG_NOT_MATRIX_PARAM_ERROR: Returned when the parameter is not of a matrix type.
- CG_INVALID_ENUMERANT_ERROR: Returned when the enumerant parameter has an invalid value.
- CG_NOT_4x4_MATRIX_ERROR: Returned when the parameter must be a 4x4 matrix type.
- CG_FILE_READ_ERROR: Returned when the file cannot be read.
- CG_FILE_WRITE_ERROR: Returned when the file cannot be written.
- CG_MEMORY_ALLOC_ERROR: Returned when a memory allocation fails.
- CG_INVALID_CONTEXT_HANDLE_ERROR: Returned when an invalid context handle is used.
- CG_INVALID_PROGRAM_HANDLE_ERROR: Returned when an invalid program handle is used.
- CG_INVALID_PARAM_HANDLE_ERROR: Returned when an invalid parameter handle is used.
- CG_UNKNOWN_PROFILE_ERROR: Returned when the specified profile is unknown.
- CG_VAR_ARG_ERROR: Returned when the variable arguments are specified incorrectly.
- CG_INVALID_DIMENSION_ERROR: Returned when the dimension value is invalid.
- CG_ARRAY_PARAM_ERROR: Returned when the parameter must be an array.
- CG_OUT_OF_ARRAY_BOUNDS_ERROR: Returned when the index into an array is out of bounds.

API-Specific Cg Runtimes
Each API-specific Cg runtime provides an additional set of functions on top of the core Cg runtime to ease the integration of Cg to
65. is not referenced. This allows the same structure to specify both the varying output of an arbvp1 profile program and the varying input of an fp30 profile program.

OpenGL ARB Fragment Program Profile (arbfp1)
The OpenGL ARB Fragment Program Profile is used to compile Cg source code to fragment programs compatible with version 1.0 of the GL_ARB_fragment_program OpenGL extension.
- Profile name: arbfp1
- How to invoke: Use the compiler option -profile arbfp1

The arbfp1 profile limits Cg to match the capabilities of OpenGL ARB fragment programs. This section describes the capabilities and restrictions of Cg when using the arbfp1 profile.

Program Instruction Limits
OpenGL ARB fragment programs have a limit on the number of instructions in an ARB fragment program. ARB fragment programs are limited to the number of instructions that can be queried from the underlying OpenGL implementation using MAX_PROGRAM_INSTRUCTIONS_ARB, with a minimum value of 72. There are also limits on the number of texture instructions (minimum limit of 24) and arithmetic instructions (minimum limit of 48) that can be queried from the OpenGL implementation. If the compiler needs to produce more than the maximum allowed number of instructions to compile a program, it reports an error.

Vector Register Limits
Likewise, there are limited numbers of registers that can be queried from OpenGL implementat
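A tool that targets arbfp1 has to compare instruction counts against these limits. This C sketch uses the documented minimums as defaults; the ArbFpLimits type and the rule that the total is texture plus arithmetic instructions are assumptions for illustration, and real values would be queried from the OpenGL implementation (for example, with glGetProgramivARB).

```c
#include <stdbool.h>

/* Hypothetical limits record for an arbfp1 target. */
typedef struct {
    int maxInstructions;      /* MAX_PROGRAM_INSTRUCTIONS_ARB, at least 72 */
    int maxTexInstructions;   /* texture instructions, at least 24 */
    int maxAluInstructions;   /* arithmetic instructions, at least 48 */
} ArbFpLimits;

/* The documented minimum values, usable when nothing better is known. */
static const ArbFpLimits kArbFpMinimums = { 72, 24, 48 };

/* Returns true if a program with the given instruction counts fits.
   Assumes the total count is texture plus arithmetic instructions. */
bool fits_arbfp1_limits(const ArbFpLimits *lim, int texInstr, int aluInstr)
{
    return texInstr <= lim->maxTexInstructions
        && aluInstr <= lim->maxAluInstructions
        && texInstr + aluInstr <= lim->maxInstructions;
}
```

A program that fits the minimums fits every conforming implementation, which is why the minimums are a safe default when the real limits have not been queried.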
66.     m = float3x3(Bones[i]._m00_m01_m02,
                     Bones[i]._m10_m11_m12,
                     Bones[i]._m20_m21_m22);
        float3 pos1 = mul(Bones[i], tempPos);

        // transform S, T, and SxT
        float3 s1 = mul(m, IN.S);
        float3 t1 = mul(m, IN.T);
        float3 sxt1 = mul(m, IN.SxT);

        // final blending
        float3 finalS = s0 * IN.Weights.x + s1 * IN.Weights.y;
        float3 finalT = t0 * IN.Weights.x + t1 * IN.Weights.y;
        float3 finalSxT = sxt0 * IN.Weights.x + sxt1 * IN.Weights.y;

        // blend between the two positions
        float3 finalPos = pos0 * IN.Weights.x + pos1 * IN.Weights.y;

        float3x3 worldToTangentSpace;
        worldToTangentSpace._m00_m01_m02 = finalS;
        worldToTangentSpace._m10_m11_m12 = finalT;
        worldToTangentSpace._m20_m21_m22 = finalSxT;

        float3 tangentLight = normalize(mul(worldToTangentSpace, LightVec));

        // scale and bias
        tangentLight = (tangentLight + 1.0) * 0.5;

        // create float4 with 1.0 alpha
        float4 tempLight;
        tempLight.xyz = tangentLight.xyz;
        tempLight.w = 1.0;

Appendix A Cg Language Specification

Language Overview
The Cg language is primarily modeled on ANSI C, but adopts some ideas from modern languages such as C++ and Java, and from earlier shading languages such as RenderMan and the Stanford shading language. The language al
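The two-bone blend and the final scale-and-bias step are simple enough to check on the CPU. This C sketch implements a weighted blend of two per-bone vectors and the mapping of a unit vector from [-1, 1] to [0, 1] with alpha 1.0; the names are hypothetical, and the exact scale-and-bias expression is assumed.

```c
typedef struct { float x, y, z; } vec3;

/* Weighted blend of two per-bone results, as in
   finalPos = pos0 * Weights.x + pos1 * Weights.y. */
vec3 blend2(vec3 a, vec3 b, float wa, float wb)
{
    vec3 r = { a.x * wa + b.x * wb, a.y * wa + b.y * wb, a.z * wa + b.z * wb };
    return r;
}

/* Scale and bias a unit vector from [-1, 1] into [0, 1] so it can be
   stored as a color, then append alpha = 1.0. */
void pack_unit_vector(vec3 v, float out[4])
{
    out[0] = (v.x + 1.0f) * 0.5f;
    out[1] = (v.y + 1.0f) * 0.5f;
    out[2] = (v.z + 1.0f) * 0.5f;
    out[3] = 1.0f;
}
```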
67. matrix type when applied to another matrix type of the same number of rows and columns.

Type Equivalency
Type T1 is equivalent to type T2 if any of the following are true:
- T2 is equivalent to T1.
- T1 and T2 are the same scalar, vector, or structure type. A packed array type is not equivalent to the same size unpacked array.
- T1 is a typedef name of T2.
- T1 and T2 are arrays of equivalent types with the same number of elements.
- The unqualified types of T1 and T2 are equivalent, and both types have the same qualifications.
- T1 and T2 are functions with equivalent return types, the same number of parameters, and all corresponding parameters are pair-wise equivalent.

Type Promotion Rules
The cfloat and cint types behave like float and int types, except for the usual arithmetic conversion behavior and function overloading rules (see "Function Overloading" on page 181).

The usual arithmetic conversions for binary operators are defined as follows:
1. If either operand is double, the other is converted to double.
2. Otherwise, if either operand is float, the other operand is converted to float.
3. Otherwise, if either operand is half, the other operand is converted to half.
4. Otherwise, if either operand is fixed, the other operand is converted to fixed.
5. Otherwise, if either operand is cfloat, the other operand is converted to cfloat.
6. Otherwise, if either operand i
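Rules 1 to 5 amount to picking the higher-ranked of the two operand types. The following C sketch encodes only those five rules; the CgType enumeration is hypothetical, and pairs that the later rules (not shown in this excerpt) would settle are reported as T_OTHER.

```c
/* Hypothetical enumeration of the Cg numeric types named in rules 1-5. */
typedef enum { T_DOUBLE, T_FLOAT, T_HALF, T_FIXED, T_CFLOAT, T_OTHER } CgType;

/* Apply the first five usual-arithmetic-conversion rules in order: the
   first type in the list that either operand has wins. */
CgType usual_arithmetic_conversion(CgType a, CgType b)
{
    static const CgType order[] = { T_DOUBLE, T_FLOAT, T_HALF, T_FIXED, T_CFLOAT };
    for (int i = 0; i < 5; ++i)
        if (a == order[i] || b == order[i])
            return order[i];
    return T_OTHER;   /* decided by rules beyond this excerpt */
}
```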
68. negative one half, scale by two    2 * (x - 0.5)

unsigned(reg)            saturate(x), i.e., x clamped to [0, 1]
unsigned_invert(reg)     1 - saturate(x)
half_bias(reg)           x - 0.5
-reg                     -x
expand(reg)              2 * (x - 0.5)

Language Constructs and Support

Data Types
In the fp20 profile, operations occur on signed clamped floating-point values in the range -1 to 1. These profiles allow all data types to be used, but all operations are carried out in the above range. Refer to the NV_texture_shader and NV_register_combiners documentation for more details.

Statements and Operators
The fp20 profile supports all of the Cg language constructs, with the following exceptions:
- Arbitrary swizzles are not supported, though arbitrary write masks are. Only the following swizzles are allowed: .x/.r, .y/.g, .z/.b, .w/.a, .xy/.rg, .xyz/.rgb, .xyzw/.rgba, .xxx/.rrr, .yyy/.ggg, .zzz/.bbb, .www/.aaa, .xxxx/.rrrr, .yyyy/.gggg, .zzzz/.bbbb, .wwww/.aaaa.
- Matrix swizzles are not supported.
- Boolean operators other than <, <=, >, and >= are not supported. Furthermore, <, <=, >, and >= are only supported as the condition in the ?: operator.
- Bitwise integer operators are not supported.
- / is not supported unless the divisor is a non-zero constant or it is used to compute the depth output.
- % is not supported.
- Ternary ?: is supported if the boolean test exp
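The modifiers above are ordinary arithmetic once written out. This C sketch expresses each as a function on float; the function names are hypothetical, and clamping the results of half bias and expand to [-1, 1] is an assumption based on the signed clamped range described under Data Types.

```c
/* Clamp to the signed range fp20 operations are carried out in. */
static float clamp_signed(float x) { return x < -1.0f ? -1.0f : (x > 1.0f ? 1.0f : x); }

/* Clamp to [0, 1]. */
static float saturate(float x) { return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x); }

float mod_unsigned(float x)        { return saturate(x); }                        /* unsigned(reg) */
float mod_unsigned_invert(float x) { return 1.0f - saturate(x); }                 /* unsigned_invert(reg) */
float mod_half_bias(float x)       { return clamp_signed(x - 0.5f); }             /* half_bias(reg) */
float mod_negate(float x)          { return -x; }                                 /* -reg */
float mod_expand(float x)          { return clamp_signed(2.0f * (x - 0.5f)); }    /* expand(reg) */
```

expand is the usual way to turn a normal stored in a [0, 1] texture back into the [-1, 1] range.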
69. not normally noticeable, except when declaring a variable that will hold the value of a boolean expression.

Cg also supports the C comparison operators, which produce values of type bool:
<     less than
<=    less than or equal to
!=    inequality
==    equality
>=    greater than or equal to
>     greater than

Unlike C, Cg allows all boolean operators to be applied to vectors, in which case boolean operations are performed in an elementwise fashion. The result of such a boolean expression is a vector of bool elements, with the same number of elements as the two source vectors. Also unlike C, the logical AND (&&) and logical OR (||) operators cannot be used for short-circuiting evaluation: the side effects of both sides of these expressions always occur, regardless of the value of the boolean expression.

Swizzle Operator
Cg has a swizzle operator that allows the components of a vector to be rearranged to form a new vector. The new vector need not be the same size as the original vector; elements can be repeated or omitted. The characters x, y, z, and w represent the first, second, third, and fourth components of the original vector, respectively. The characters r, g, b, and a can be used for the same purpose. Because the swizzle operator is implemented efficiently in the GPU hardware, its use is usually free. The following are some examples of swizzling:

    float3 a
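To make the swizzle rules concrete, this C sketch builds a new vector by picking components by letter (xyzw or rgba), allowing repeats and omissions, and also shows an elementwise comparison that yields one bool per component. The vecN type and the function names are hypothetical; the sketch does not check whether a pattern mixes the xyzw and rgba sets.

```c
typedef struct { float v[4]; int n; } vecN;   /* up to 4 components */

/* Component index for a swizzle letter; -1 if the letter is invalid. */
static int swizzle_index(char c)
{
    switch (c) {
    case 'x': case 'r': return 0;
    case 'y': case 'g': return 1;
    case 'z': case 'b': return 2;
    case 'w': case 'a': return 3;
    default:            return -1;
    }
}

/* Build a new vector whose components are picked from src by pattern,
   e.g. "wzyx" or "rrg". Returns a 0-component vector if the pattern is
   invalid for src. */
vecN swizzle(vecN src, const char *pattern)
{
    vecN out = { {0}, 0 };
    for (int i = 0; pattern[i] != '\0'; ++i) {
        int k = swizzle_index(pattern[i]);
        if (i >= 4 || k < 0 || k >= src.n) { out.n = 0; return out; }
        out.v[i] = src.v[k];
        out.n = i + 1;
    }
    return out;
}

/* Elementwise a < b, producing one bool (0 or 1) per component. */
void less_than4(const float a[4], const float b[4], int result[4])
{
    for (int i = 0; i < 4; ++i)
        result[i] = a[i] < b[i];
}
```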
70. profile implements data types as follows:
- The float data type is implemented as IEEE 32-bit single precision.
- The half data type is implemented as float.
- The int data type is supported using floating-point operations, which adds extra instructions for proper truncation for divides, modulos, and casts from floating-point types.
- The fixed and sampler data types are not supported, but the profile does provide the minimal partial support that is required for these data types by the core language specification; that is, it is legal to declare variables using these types, as long as no operations are performed on the variables.

Statements and Operators
This profile is a superset of the vp20 profile. Any program that compiles for the vp20 profile should also compile for the vp30 profile, although the converse is not true. The additional capabilities of the vp30 profile beyond those of vp20 are:
- for, while, and do loops are supported without requiring loop unrolling.
- Full support for if/else, allowing non-constant conditional expressions.

Bindings

Binding Semantics for Uniform Data
Table 25 summarizes the valid binding semantics for uniform parameters in the vp30 profile.

Table 25. vp30 Uniform Input Binding Semantics
Binding Semantics Name                       Corresponding Data
register(c0)-register(c255), C0-C255        Constant register 0-255

The aliases c0-c255 (lowercase) are als
71. program compute a position output. This homogeneous clip-space position is used by the hardware rasterizer and must be stored in a program output with an output binding semantic of POSITION (or HPOS, for backward compatibility).

Position Invariance
In many graphics APIs, the user can choose between two different approaches to specifying per-vertex computations: use a built-in, configurable fixed-function pipeline, or specify a user-written vertex program. If the user wishes to mix these two approaches, it is sometimes desirable to guarantee that the position computed by the first approach is bit-identical to the position computed by the second approach. This position invariance is particularly important for multipass rendering.

Support for position invariance is optional in Cg vertex profiles, but for those vertex profiles that support it, the following rules apply:
- Position invariance with respect to the fixed-function pipeline is guaranteed if two conditions are met:
  - The vertex program is compiled using a compiler option indicating position invariance (-posinv, for example).
  - The vertex program computes position as follows:

        OUT_POSITION = mul(MVP, IN_POSITION);

    where:
    - OUT_POSITION is a variable (or structure element) of type float4 with an output binding semantic of POSITION or HPOS.
    - IN_POSITION is a variable (or structure element) of type float4 with an input binding semantic of POSITION.
    - MVP is a uniform variable or structu
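The position-invariance rule depends on the exact expression mul(MVP, IN_POSITION), in which the position is treated as a column vector. The following C sketch spells out that matrix-vector product; the types and function name are hypothetical. Computing the same position by applying separate modelview and projection matrices can round differently, which is one reason the rule pins down a single matrix multiply.

```c
typedef struct { float m[4][4]; } float4x4;
typedef struct { float x, y, z, w; } float4v;

/* mul(M, v) with v as a column vector: each output component is the dot
   product of one row of M with v. */
float4v mul_m4_v4(const float4x4 *M, float4v v)
{
    const float in[4] = { v.x, v.y, v.z, v.w };
    float out[4];
    for (int r = 0; r < 4; ++r)
        out[r] = M->m[r][0] * in[0] + M->m[r][1] * in[1]
               + M->m[r][2] * in[2] + M->m[r][3] * in[3];
    float4v result = { out[0], out[1], out[2], out[3] };
    return result;
}
```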
72. space coordinates. Therefore, the vertex's model-space position (given by IN.Position) needs to be transformed by the concatenation of the modelview and projection matrices (called ModelViewProj in this example). The transformed position is assigned directly to OUT.HPosition. Note that you are not responsible for the perspective division when using vertex programs; the hardware automatically performs the division after executing the vertex program.

Since we want to do our lighting in eye space, we have to transform the model-space normal IN.Normal to eye space:

    // transform normal from model space to view space
    float3 normalVec = normalize(mul(ModelViewIT, IN.Normal).xyz);

Remember that when transforming normals, we need to multiply by the inverse transpose of the modelview matrix. Then we normalize the eye-space normal vector and store it as normalVec.

Prepare for Lighting
The subsequent steps prepare for lighting:

    // store normalized light vector
    float3 lightVec = normalize(LightVec.xyz);

    // calculate half angle vector
    float3 eyeVec = float3(0.0, 0.0, 1.0);
    float3 halfVec = normalize(lightVec + eyeVec);

At this point, we have to ensure that all our vectors are normalized. We start by normalizing LightVec. Then, in preparation for specular lighting, we have to define the half-angle vector halfVec, which is the vector halfway between the light and the eye vector
73. term
        float shadow = saturate(4 * dot(normal.xyz, IN.LightVectorUnsigned.xyz));

        // compute final color
        return Ambient * color + shadow * (illumination * color + illumination.wwww);
    }

Bump Reflection Mapping

Description
This effect mixes bump mapping and reflection mapping, based on the texm3x3vspec DirectX 8 pixel shader instruction (DOT_PRODUCT_REFLECT_CUBE_MAP in OpenGL). This instruction computes three dot products to transform the normal fetched from the normal map into the environment cube space, reflects the transformed normal with respect to the eye vector, and fetches a cube map to get the final color. The vertex shader is responsible for computing the transform matrix and the eye vector (Figure 15).

Figure 15. Example of Bump Reflection Mapping

Vertex Shader Source Code for Bump Reflection Mapping

    struct a2v {
        float4 Position : POSITION;  // in object space
        float2 TexCoord : TEXCOORD0;
        float3 T : TEXCOORD1;        // in object space
        float3 B : TEXCOORD2;        // in object space
        float3 N : TEXCOORD3;        // in object space
    };

    struct v2f {
        float4 Position : POSITION;  // in projection space
        float4 TexCoord : TEXCOORD0;
        // first row of the 3x3 transform from tangent to cube space
        float4 TangentToCubeSpace0 : TEXCOORD1;
        // second row of the 3x3 tran
74. than b. Returns x otherwise.
cos(x)              Cosine of x.
cosh(x)             Hyperbolic cosine of x.
cross(a, b)         Cross product of vectors a and b; a and b must be 3-component vectors.
degrees(x)          Radian-to-degree conversion.
determinant(M)      Determinant of matrix M.
dot(a, b)           Dot product of vectors a and b.
exp(x)              Exponential function e^x.
exp2(x)             Exponential function 2^x.
floor(x)            Largest integer not greater than x.

Table 1. Cg Standard Library Functions: Mathematical Functions (continued)
Function            Description
fmod(x, y)          Remainder of x/y, with the same sign as x. If y is zero, the result is implementation-defined.
frac(x)             Fractional part of x.
frexp(x, out exp)   Splits x into a normalized fraction in the interval [1/2, 1), which is returned, and a power of 2, which is stored in exp. If x is zero, both parts of the result are zero.
isfinite(x)         Returns true if x is finite.
isinf(x)            Returns true if x is infinite.
isnan(x)            Returns true if x is NaN (not a number).
ldexp(x, n)         x * 2^n.
lerp(a, b, f)       Linear interpolation: (1 - f) * a + b * f, where a and b are matching vector or scalar types. Parameter f can be either a scalar or a vector of the same type as a and b.
lit(ndotl, ndoth, m)  Computes lighting coefficients for ambient, diffuse, and specular light contributions. Retu
75. ...the Cg Language, on page 1: A quick introduction to the current release of Cg, with everything you need to know to start working with it.
- Cg Standard Library Functions, on page 19: A list of the Standard Library functions, which can help to reduce your program development time.
- Using the Cg Runtime Library, on page 29: An introduction to the Cg runtime APIs, which allow you to easily compile Cg programs and pass data to them from within applications.
- A Brief Tutorial, on page 89: A description of a simple Cg program and a Microsoft Visual Studio workspace, both provided on the accompanying CD, that you can use to start experimenting with Cg.
- Advanced Profile Sample Shaders, on page 97: A list of sample NV30 shaders, complete with source code.
- Basic Profile Sample Shaders, on page 133: A list of sample NV2X shaders, complete with source code.
- Appendix A, Cg Language Specification, on page 165: The formal Cg language specification.
- Appendix B, Language Profiles, on page 195: Describes features and restrictions of the currently supported language profiles: DirectX 8 vertex, DirectX 8 pixel, OpenGL ARB vertex, NV2X OpenGL vertex, NV30 OpenGL vertex, and NV30 OpenGL fragment.
- Appendix C, Nine Steps to High Performance Cg, on page 257: Strategies for getting the most out of your Cg code.
- Appendix D, Cg Compiler Options, on page 265: A list of the various command-line options that the Cg compiler accepts.
76. ...the language by using a compiler command-line switch, for example.

The profile restrictions are only applied to the top-level function that is being compiled and to any variables or functions that it references, either directly or indirectly. If a function is present in the source code but is not called directly or indirectly by the top-level function, it is free to use capabilities that are not supported by the current profile.

The intent of these rules is to allow a single Cg source file to contain many different top-level functions that are targeted at different profiles. The core Cg language specification is sufficiently complete to allow all of these functions to be parsed. The restrictions provided by a compilation profile are only needed for code generation, so they are only applied to those functions for which code is being generated. This specification uses the word "program" to refer to the top-level function, any functions the top-level function calls, and any global variables or typedef definitions it references.

Each profile must have a separate specification that describes its characteristics and limitations. This core Cg specification requires certain minimum capabilities for all profiles. In some cases, the core specification distinguishes between vertex program and fragment program profiles, with different minimum capabilities for each.

The Unifor
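The "directly or indirectly" rule above amounts to a reachability question on the call graph. The following C sketch illustrates the rule and is not a description of how any Cg compiler works. It assumes a hypothetical call graph stored as an adjacency matrix, where calls[i][j] is true when function i calls function j. It marks every function reachable from the top-level function, and only those functions would be subject to the profile's restrictions.

```c
#include <stdbool.h>

#define MAX_FUNCS 8

/* Hypothetical call graph: calls[i][j] is true when function i calls function j. */
static void mark_reachable(bool calls[MAX_FUNCS][MAX_FUNCS], int n,
                           int f, bool reachable[MAX_FUNCS])
{
    if (reachable[f]) return;          /* already visited; also stops on recursion cycles */
    reachable[f] = true;
    for (int g = 0; g < n; ++g)
        if (calls[f][g])
            mark_reachable(calls, n, g, reachable);
}

/* Sets reachable[i] for every function the top-level function uses directly or
 * indirectly; unmarked functions may use features the profile does not support. */
void profile_checked_functions(bool calls[MAX_FUNCS][MAX_FUNCS], int n,
                               int top_level, bool reachable[MAX_FUNCS])
{
    for (int i = 0; i < n; ++i)
        reachable[i] = false;
    mark_reachable(calls, n, top_level, reachable);
}
```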
77. ...three ways:
- The binding semantic is specified in the formal parameter declaration for the function. The syntax for formal parameters to a function is:

    [const] [in | out | inout] <type> <identifier> [: <binding-semantic>] [= <initializer>]

- If the formal parameter is a struct, the binding semantic may be specified with an element of the struct when the struct is defined:

    struct <struct-tag> {
        <type> <identifier> [: <binding-semantic>];
    };

- If the input to the function is implicit (a non-static global variable that is read by the function), the binding semantic may be specified when the non-static global variable is declared:

    <type> <identifier> [: <binding-semantic>] [= <initializer>];

  If the non-static global variable is a struct, the binding semantic may be specified when the struct is defined, as described in the second bullet above.

- A binding semantic may be associated with the output of a top-level function in a similar manner:

    <type> <identifier> ( <parameter-list> ) [: <binding-semantic>] { <body> }

  Another method available for specifying a semantic for an output value is to return a struct and to specify the binding semantic(s) with elements of the struct when the struct is defined. In addition, if the output is a formal parameter, the binding semantic may be specified using the same a
78. ...(uniform sampler2D tex, float2 st, float4 prevlookup, uniform float4 m)
offsettexRECT(uniform samplerRECT tex, float2 st, float4 prevlookup, uniform float4 m)

Performs the following:

    float2 newst = st + m.xy * prevlookup.xx + m.zw * prevlookup.yy;
    return tex2D/RECT(tex, newst);

where st are texture coordinates associated with sampler tex, prevlookup is the result of a previous texture operation, and m is the offset texture matrix. This function can be used to generate the offset_2d or offset_rectangle NV_texture_shader instructions.

Table 50. fp20 Auxiliary Texture Functions (continued)

offsettex2DScaleBias(uniform sampler2D tex, float2 st, float4 prevlookup, uniform float4 m, uniform float scale, uniform float bias)
offsettexRECTScaleBias(uniform samplerRECT tex, float2 st, float4 prevlookup, uniform float4 m, uniform float scale, uniform float bias)

Performs the following:

    float2 newst = st + m.xy * prevlookup.xx + m.zw * prevlookup.yy;
    float4 result = tex2D/RECT(tex, newst);
    return result * saturate(prevlookup.z * scale + bias);

where st are texture coordinates associated with sampler tex, prevlookup is the result of a previous texture operation, m is the offset texture matrix, scale is the offset texture scale, and bias is the offset texture bias. This function can be used to generate the offset_2d_scale o
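The coordinate arithmetic in these two entries can be checked on the CPU. This C sketch reproduces only the math and performs no texture fetch. The float2/float4 structs and the function names are hypothetical stand-ins for the Cg types. Note that m.xy * prevlookup.xx + m.zw * prevlookup.yy expands to a 2x2 matrix-vector product.

```c
/* Hypothetical stand-ins for the Cg vector types. */
typedef struct { float x, y; } float2;
typedef struct { float x, y, z, w; } float4;

/* newst = st + m.xy * prevlookup.xx + m.zw * prevlookup.yy, written per component */
float2 offset_coords(float2 st, float4 prevlookup, float4 m)
{
    float2 newst;
    newst.x = st.x + m.x * prevlookup.x + m.z * prevlookup.y;
    newst.y = st.y + m.y * prevlookup.x + m.w * prevlookup.y;
    return newst;
}

static float saturate1(float v) { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }

/* Factor the ScaleBias variants multiply into the fetched color:
 * saturate(prevlookup.z * scale + bias) */
float offset_scale_factor(float4 prevlookup, float scale, float bias)
{
    return saturate1(prevlookup.z * scale + bias);
}
```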
79. ...used.

Figure 9. Example of Ray Traced Refraction

Vertex Shader Source Code for Ray Traced Refraction

    struct appin {
        float4 Position : POSITION;
        float4 Normal   : NORMAL;
    };

    // output: same struct is the input to the fragment shader
    struct EyeV2F {
        float4 HPosition : POSITION;  // clip-space pos
        float3 OPosition : TEXCOORD0; // obj coords location
        float3 VPosition : TEXCOORD1; // eye pos, obj space
        float3 N         : TEXCOORD2; // normal, obj space
        float4 LightVec0 : TEXCOORD3; // light dir, obj space
    };

    EyeV2F main(appin IN,
                uniform float4x4 ModelViewProj,
                uniform float4x4 ModelViewI,
                uniform float4 LightVec) // in EYE coords
    {
        EyeV2F OUT;
        // calculate clip-space position for rasterizer use
        OUT.HPosition = mul(ModelViewProj, IN.Position);
        // pass through object-space position
        OUT.OPosition = IN.Position.xyz;
        // object-space normal
        OUT.N = normalize(IN.Normal.xyz);
        // transform view pos and light vec to obj space
        OUT.VPosition = mul(ModelViewI, float4(0, 0, 0, 1)).xyz;
        OUT.LightVec0 = normalize(mul(ModelViewI, LightVec));
        return OUT;
    }

Pixel Shader Source Code for Ray Traced Refraction

    // Assume ray direction is normalized.
    // Vector planeEq is encoded as half4(A, B, C, D), where Ax + By + Cz + D = 0
    // and half3(A,
80. void cgGLDisableProfile(CGprofile profile);

Some profiles may not be supported on some systems. For example, a given profile is not supported if the OpenGL extensions it requires are not available. You can check whether a profile is supported by using cgGLIsProfileSupported:

    CGbool cgGLIsProfileSupported(CGprofile profile);

It returns CG_TRUE if profile is supported and CG_FALSE otherwise.

OpenGL Program Examples

This section presents code that illustrates how to use functions from the OpenGL Cg interface to make Cg programs work with OpenGL. The vertex and fragment programs below are used in "OpenGL Application" on page 54.

OpenGL Vertex Program

The following Cg code is assumed to be in a file called VertexProgram.cg:

    void VertexProgram(in float4 position : POSITION,
                       in float4 color : COLOR0,
                       in float4 texCoord : TEXCOORD0,
                       out float4 positionO : POSITION,
                       out float4 colorO : COLOR0,
                       out float4 texCoordO : TEXCOORD0,
                       const uniform float4x4 ModelViewMatrix)
    {
        positionO = mul(position, ModelViewMatrix);
        colorO = color;
        texCoordO = texCoord;
    }

OpenGL Fragment Program

The following Cg code is assumed to be in a file called FragmentProgram.cg:

    void FragmentProgram(in float4 color : COLOR0,
                         in float4 texCoord : TEXCOORD0,
                         out float4 colorO : COLOR0,
                         const uniform sampler2D BaseTexture,
                         const uniform float4 SomeColor)
    {
        colorO = color * tex2D(
81. ...x.
- tanh(x): Hyperbolic tangent of x.
- transpose(M): Matrix transpose of matrix M. If M is an AxB matrix, the transpose of M is a BxA matrix whose first column is the first row of M, whose second column is the second row of M, whose third column is the third row of M, and so on.

Geometric Functions

Table 2 presents the geometric functions that are provided in the Cg Standard Library.

Table 2. Geometric Functions

- distance(pt1, pt2): Euclidean distance between points pt1 and pt2.
- faceforward(N, I, Ng): N if dot(Ng, I) < 0; -N otherwise.
- length(v): Euclidean length of a vector.
- normalize(v): Returns a vector of length 1 that points in the same direction as vector v.
- reflect(i, n): Computes the reflection vector from entering ray direction i and surface normal n. Only valid for 3-component vectors.
- refract(i, n, eta): Given entering ray direction i, surface normal n, and relative index of refraction eta, computes the refraction vector. If the angle between i and n is too large for a given eta, returns (0, 0, 0). Only valid for 3-component vectors.

Texture Map Functions

Table 3 presents the texture functions that are provided in the Cg Standard Library. These texture functions are fully supported by the ps_2_*, arbfp1, and
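As a quick sanity check of reflect and faceforward, here is a small C sketch. The vec3 type and the function names are hypothetical. reflect uses the standard formula i - 2 * dot(n, i) * n, which assumes n is normalized.

```c
typedef struct { float x, y, z; } vec3;  /* hypothetical 3-component vector */

static float dot3(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

static vec3 scale3(vec3 v, float s) { vec3 r = { v.x * s, v.y * s, v.z * s }; return r; }

/* reflect(i, n) = i - 2 * dot(n, i) * n; n is assumed to be normalized */
vec3 reflect3(vec3 i, vec3 n)
{
    vec3 d = scale3(n, 2.0f * dot3(n, i));
    vec3 r = { i.x - d.x, i.y - d.y, i.z - d.z };
    return r;
}

/* faceforward(N, I, Ng) = N if dot(Ng, I) < 0, otherwise -N */
vec3 faceforward3(vec3 N, vec3 I, vec3 Ng)
{
    return dot3(Ng, I) < 0.0f ? N : scale3(N, -1.0f);
}
```

For example, a ray travelling down and to the right, (1, -1, 0), hitting a floor with normal (0, 1, 0) reflects to (1, 1, 0).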
82. ...you can precompute the function in your application and store it in a texture map, replacing calls like

    float val = f(x, y);

with code like

    float val = tex2D(fSampler, float2(x, y)).x;

This method can also be applied to one- and three-dimensional functions, using 1D and 3D texture maps.

More generally, the values you pass to the function may not be in the range [0, 1], and the values your function returns may not be in the range [0, 1]. In this case, the following two utility functions can serve as a base. remapTo01 remaps the range [low, high] into [0, 1], and remapFrom01 does the opposite:

    float4 remapTo01(float4 v, float4 low, float4 high)
    {
        return saturate((v - low) / (high - low));
    }

    float4 remapFrom01(float4 v, float4 low, float4 high)
    {
        return lerp(low, high, v);
    }

Don't forget vectorization here as well. If two float-valued functions have the same domain and range, you can pack them into two components of the same texture. Only one texture lookup is needed to load them both, and vectorized versions of the remap functions can do the remapping more efficiently as well.

5. Use Data Types with Minimum Sufficient Precision

For profiles that support multiple precisions, a general rule of thumb is that if you can do a computation with fixed-precision variables, the computation is faster than if you use half, and if you use half, the comput
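On the application side, the same mapping decides what to store in the lookup texture. Here is a scalar C version of the two remap helpers, with hypothetical function names. remap_from_01 expands lerp(low, high, v) as low + v * (high - low), which is algebraically the same expression.

```c
static float saturate_f(float v) { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }

/* Maps [low, high] onto [0, 1], clamping values that fall outside the range. */
float remap_to_01(float v, float low, float high)
{
    return saturate_f((v - low) / (high - low));
}

/* Inverse mapping: lerp(low, high, v) = low + v * (high - low). */
float remap_from_01(float v, float low, float high)
{
    return low + v * (high - low);
}
```

A value stored with remap_to_01 and read back with remap_from_01 over the same [low, high] returns the original value, as long as it lay inside the range and ignoring texture quantization.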
83. Array conversions: No conversions of array types are allowed.

Table 6 summarizes the type conversions discussed here. The table entries have the following meanings, but please pay attention to the footnotes:
- Allowed: allowed implicitly or explicitly
- Warning: allowed, but a warning is issued if implicit
- Explicit: only allowed with an explicit cast
- No: not allowed

Table 6. Type Conversions (rows: target type; columns: source type)

    Target \ Source   Scalar    Vector    Matrix    Struct    Array
    Scalar            Allowed   Warning   Warning   Explicit  No
    Vector            Allowed   Allowed   Warning   Explicit  No
    Matrix            Allowed   Warning   Allowed   Explicit  No
    Struct            Explicit  No        No        Explicit  No
    Array             No        No        No        No        No

Footnotes:
i. Only allowed if the first member of the source can be converted to the target.
ii. Not allowed if the target is larger than the source. A warning is issued if the target is smaller than the source.
iii. Only allowed if source and target are the same total size.
iv. Only allowed if both source and target have the same number of members and each member of the source can be converted to the corresponding member of the target.

Explicit casts are:
- Compile-time type when applied to expressions of compile-time type.
- Numeric type when applied to expressions of numeric or compile-time type.
- Numeric vector type when applied to another vector type of the same number of elements.
- Numeric
84. ...155
Description 155
Vertex Shader Source Code for Shadow Volume Extrusion 156
Sine Wave Demo 158
Description 158
Vertex Shader Source Code for Sine Wave 159
Matrix Palette Skinning 161
Description 161
Vertex Shader Source Code for Matrix Palette Skinning 162
Appendix A. Cg Language Specification 165
Language Overview 165
Silent Incompatibilities 165
Similar Operations That Must Be Expressed Differently 165
Differences from ANSI C 166
Detailed Language Specification 168
Definitions 168
Profiles 168
The Uniform Modifier 169
Function Declarations 169
Overloading of Functions by Profile 170
Syntax for Parameters in Function Definitions 171
Function Calls 171
85. ...188
Operator Enhancements 188
Reserved Words 191
Cg Standard Library Functions 191
Vertex Program Profiles 192
Mandatory Computation of Position Output 192
Position Invariance 192
Binding Semantics for Outputs 193
Fragment Program Profiles 193
Binding Semantics for Outputs 193
Appendix B. Language Profiles 195
DirectX Vertex Shader 2.x Profiles (vs_2_*) 196
Overview 196
Memory 196
Statements and Operators 197
Data Types 197
Using Arrays 197
Bindings 198
DirectX Pixel Shader 2.x Profiles (ps_2_*) 200
Memory 200
Language Constructs and Support 201
Bindings 202
Limitations i
86. ...
3. If the number of functions remaining in the set is not one, then fail.

Global Variables

Global variables are declared and used as in C. Uniform non-static variables may have a semantic associated with them. Uniform non-static variables may have their value set through the run-time API.

Use of Uninitialized Variables

It is incorrect for a program to use an uninitialized variable. However, the compiler is not obligated to detect such errors, even if it would be possible to do so by compile-time data-flow analysis. The value obtained from reading an uninitialized variable is undefined. This same rule applies to the implicit use of a variable that occurs when it is returned by a top-level function. In particular, if a top-level function returns a struct and some element of that struct is never written, then the value of that element is undefined.

Note: Variables are not defined as being initialized to zero because this would result in a performance penalty in cases where the compiler is unable to determine whether a variable is properly initialized by the programmer.

Preprocessor

Cg profiles must support the full ANSI C standard preprocessor capabilities: #if, #define, and so on. However, Cg profiles are not required to support macro-like #define or the use of #include directives.

Overview of Binding Semantics

In stream-processing architectures, data packets f
87. DirectX Vertex Shader 2.x Profiles (vs_2_*)

The DirectX Vertex Shader 2.0 profiles are used to compile Cg source code to DirectX 9 VS 2.0 vertex shaders and DirectX 9 VS 2.0 Extended vertex shaders.
- Profile names: vs_2_0 (for DirectX 9 VS 2.0 vertex shaders) and vs_2_x (for DirectX 9 VS 2.0 Extended vertex shaders)
- How to invoke: Use the compiler option -profile vs_2_0 or -profile vs_2_x

This section describes how using the vs_2_0 and vs_2_x profiles affects the Cg source code that the developer writes.

Overview

The vs_2_0 profile limits Cg to match the capabilities of DirectX VS 2.0 vertex shaders (see footnote 1). The vs_2_x profile is the same as the vs_2_0 profile but allows extended features such as dynamic flow control (branching).

Memory

DirectX 9 vertex shaders have a limited amount of memory for instructions and data.

Program Instruction Limit

DirectX 9 vertex shaders are limited to 256 instructions. If the compiler needs to produce more than 256 instructions to compile a program, it reports an error.

Vector Register Limit

Likewise, there are limited numbers of registers to hold program parameters and temporary results. Specifically, there are 256 read-only vector registers and 12 to 32 read-write vector registers. If the compiler needs more registers to compile a program than are available, it generates an error.

Footnote 1: To understand the DirectX VS 2.0 Vertex Shaders and the code the compile
88. ...535.0;

pack_4byte

    float pack_4byte(float4 a)
    float pack_4byte(half4 a)

Converts the four components of a into 8-bit signed integers. The signed integers are such that a representation with all bits set to 0 corresponds to the value -128/127, and a representation with all bits set to 1 corresponds to 127/127. The four signed integers are then packed into a single 32-bit result. This operation may be reversed using the unpack_4byte function.

C Pseudocode:

    ub.x = round(127.0 * clamp(a.x, -128/127, 127/127) + 128);
    ub.y = round(127.0 * clamp(a.y, -128/127, 127/127) + 128);
    ub.z = round(127.0 * clamp(a.z, -128/127, 127/127) + 128);
    ub.w = round(127.0 * clamp(a.w, -128/127, 127/127) + 128);
    result = (ub.w << 24) | (ub.z << 16) | (ub.y << 8) | (ub.x << 0);

unpack_4byte

    half4 unpack_4byte(float a)

Unpacks four 8-bit integers from a and scales the results into individual 16-bit floating-point values between -128/127 and 127/127.

C Pseudocode:

    result.x = (((a >> 0) & 0xFF) - 128) / 127.0;
    result.y = (((a >> 8) & 0xFF) - 128) / 127.0;
    result.z = (((a >> 16) & 0xFF) - 128) / 127.0;
    result.w = (((a >> 24) & 0xFF) - 128) / 127.0;

pack_4ubyte

    float pack_4ubyte(float4 a)
    float pack_4ubyte(half4 a)

Converts the four components of a into 8-bit unsigned integers. The unsigned integers are such that a representation with all bits set to 0 corresponds to 0
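The pack_4byte and unpack_4byte pseudocode translates almost directly into C. This can be useful for preparing packed data on the CPU in the same layout. The following is a minimal sketch with some simplifications. The packed value is returned as a uint32_t rather than as a float whose bits hold the pattern. The unpacked components are 32-bit floats rather than 16-bit halves. round() is emulated by adding 0.5 before truncation, which works because after clamping the scaled value lies in [0, 255]. The vec4f type and helper names are hypothetical.

```c
#include <stdint.h>

typedef struct { float x, y, z, w; } vec4f;  /* hypothetical stand-in for float4/half4 */

static float clampf(float v, float lo, float hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* One component: round(127 * clamp(c, -128/127, 127/127) + 128), a code in 0..255 */
static uint32_t to_byte(float c)
{
    float v = 127.0f * clampf(c, -128.0f / 127.0f, 1.0f) + 128.0f;
    return (uint32_t)(v + 0.5f);  /* v is in [0, 255], so +0.5 then truncation rounds */
}

/* x goes to the lowest byte, w to the highest, as in the pseudocode. */
uint32_t pack_4byte(vec4f a)
{
    return (to_byte(a.w) << 24) | (to_byte(a.z) << 16) |
           (to_byte(a.y) << 8)  | (to_byte(a.x) << 0);
}

vec4f unpack_4byte(uint32_t a)
{
    vec4f r;
    r.x = ((float)((a >> 0)  & 0xFFu) - 128.0f) / 127.0f;
    r.y = ((float)((a >> 8)  & 0xFFu) - 128.0f) / 127.0f;
    r.z = ((float)((a >> 16) & 0xFFu) - 128.0f) / 127.0f;
    r.w = ((float)((a >> 24) & 0xFFu) - 128.0f) / 127.0f;
    return r;
}
```

Zero maps to the byte 0x80, so packing (0, 0, 0, 0) gives 0x80808080, and inputs outside [-128/127, 1] are clamped before packing.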
89. ...
Here we've again got a lot of arithmetic operations, each using a single pair of float values. Some cleverness lets us turn this into a vectorized operation. Below is the implementation of the cross function from the Cg Standard Library, requiring just two vector multiply operations and one vector subtraction operation:

    float3 cross(float3 a, float3 b)
    {
        return a.yzx * b.zxy - a.zxy * b.yzx;
    }

Confirm for yourself that this computes the same value as the first section of code for the cross product. Note that it exposes much more vectorized computation for the GPU to process efficiently.

3. Use the Cg Standard Library

The functions in the Cg Standard Library have been carefully written for both efficiency and correctness. By using Standard Library functions when appropriate, you automatically take advantage of the work that went into making sure they compile to fast code on GPUs, while you concentrate on the hard problems you're solving in your own shaders. Particularly fast Standard Library functions include dot, which computes the dot product of two vectors; abs, which computes the absolute value of a variable; saturate, which clamps a value to be between zero and one; and min and max, which return the minimum and maximum of a pair of values. You won't be able to write more efficient implementations of these functions than the Standard Library pr
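To verify this numerically, here is a C sketch that computes the cross product both ways. The first version works component by component. The second emulates the yzx/zxy swizzles with hypothetical helper functions. The two agree because each output component uses the same two products: for example, x = a.y * b.z - a.z * b.y in both.

```c
typedef struct { float x, y, z; } float3;  /* hypothetical stand-in for the Cg type */

/* Component-by-component form: six multiplies and three subtractions. */
float3 cross_scalar(float3 a, float3 b)
{
    float3 r;
    r.x = a.y * b.z - a.z * b.y;
    r.y = a.z * b.x - a.x * b.z;
    r.z = a.x * b.y - a.y * b.x;
    return r;
}

static float3 mul3(float3 a, float3 b) { float3 r = { a.x * b.x, a.y * b.y, a.z * b.z }; return r; }
static float3 sub3(float3 a, float3 b) { float3 r = { a.x - b.x, a.y - b.y, a.z - b.z }; return r; }
static float3 yzx(float3 v) { float3 r = { v.y, v.z, v.x }; return r; }
static float3 zxy(float3 v) { float3 r = { v.z, v.x, v.y }; return r; }

/* Swizzled form: a.yzx * b.zxy - a.zxy * b.yzx (two vector multiplies, one subtract). */
float3 cross_swizzled(float3 a, float3 b)
{
    return sub3(mul3(yzx(a), zxy(b)), mul3(zxy(a), yzx(b)));
}
```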
90. ...
    // B, C) has been normalized.
    // Returns distance along ray to intersection; distance is negative if no intersection.
    half intersect_plane(half3 rayOrigin, half3 rayDir, half4 planeEq)
    {
        half3 planeN = planeEq.xyz;
        half denominator = dot(planeN, rayDir);
        half result = -1.0h;
        // d == 0 -> parallel; d > 0 -> faces away
        if (denominator < 0.0h) {
            half top = dot(planeN, rayOrigin) + planeEq.w;
            result = -top / denominator;
        }
        return result;
    }

    // subfields in BallData
    #define RADIUS x
    #define IRIS_DEPTH y
    #define ETA z
    #define LENS_DENSITY w
    // subfields in SpecData
    #define PHONG x
    #define GLOSS1 y
    #define GLOSS2 z
    #define DROP w

    struct EyeV2F {
        float4 HPosition : POSITION;
        float3 OPosition : TEXCOORD0;
        float3 VPosition : TEXCOORD1;
        float3 N         : TEXCOORD2;
        float4 LightVec0 : TEXCOORD3;
    };

    half4 main(EyeV2F IN,
               uniform sampler2D ColorMap,   // color
               // components: radius, irisDepth, eta, lensDensity
               uniform float4 BallData,
               // components: phongExp, gloss1, gloss2, drop
               uniform float4 GlossData,
               uniform float3 AmbiColor,
               uniform float3 DiffColor,
               uniform float3 SpecColor,
               uniform float3 LensColor) : COLOR
    {
        const half3 baseTex = half3(1.0h, 1.0h, 1.0h);
        const half GRADE = 0.05h;
        const half3 yAxis =
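The plane-intersection helper can be exercised on the CPU with the following C sketch, which uses float in place of half. The types are hypothetical. The distance comes from substituting rayOrigin + t * rayDir into Ax + By + Cz + D = 0, which gives t = -(dot(N, rayOrigin) + D) / dot(N, rayDir). As in the shader, -1 signals "no hit" when the ray is parallel to the plane or meets its back side.

```c
typedef struct { float x, y, z; } vec3;
typedef struct { float x, y, z, w; } plane4;  /* (A, B, C, D) with Ax + By + Cz + D = 0 */

static float dot3(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Distance along a normalized ray to the plane, or -1 when the ray is parallel
 * to the plane or the plane faces away (denominator >= 0). */
float intersect_plane(vec3 rayOrigin, vec3 rayDir, plane4 planeEq)
{
    vec3 planeN = { planeEq.x, planeEq.y, planeEq.z };
    float denominator = dot3(planeN, rayDir);
    float result = -1.0f;
    if (denominator < 0.0f) {
        float top = dot3(planeN, rayOrigin) + planeEq.w;
        result = -top / denominator;
    }
    return result;
}
```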
91. ...BaseTexture, texCoord) + SomeColor;
    }

OpenGL Application

This C code links the previous vertex and fragment programs to the application:

    #include <cg/cg.h>
    #include <cg/cgGL.h>

    float* vertexPositions;   // Initialized somewhere else
    float* vertexColors;      // Initialized somewhere else
    float* vertexTexCoords;   // Initialized somewhere else
    GLuint texture;           // Initialized somewhere else
    float* constantColor;     // Initialized somewhere else

    CGcontext context;
    CGprogram vertexProgram, fragmentProgram;
    CGprofile vertexProfile, fragmentProfile;
    CGparameter position, color, texCoord, baseTexture, someColor, modelViewMatrix;

    // Called at initialization
    void CgGLInit()
    {
        // Create context
        context = cgCreateContext();

        // Initialize profiles and compiler options
        vertexProfile = cgGLGetLatestProfile(CG_GL_VERTEX);
        cgGLSetOptimalOptions(vertexProfile);
        fragmentProfile = cgGLGetLatestProfile(CG_GL_FRAGMENT);
        cgGLSetOptimalOptions(fragmentProfile);

        // Create the vertex program
        vertexProgram = cgCreateProgramFromFile(context, CG_SOURCE, "VertexProgram.cg",
                                                vertexProfile, "VertexProgram", 0);
        // Load the program
        cgGLLoadProgram(vertexProgram);

        // Create the fragment program
        fragmentProgram = cgCreateProgramFromFile(context, CG_SOURCE, "FragmentProgram.cg",
                                                  fragmentProfile, "FragmentProgram", 0);
        // Loa
92. ...CopyProgram:

    CGprogram cgCopyProgram(CGprogram program);

This function creates a new program object that is a copy of program and adds it to the same context. This lets you keep several versions of the same original program, each modified in a particular way.

Program Iteration

The programs within a context are sequentially ordered and can be iterated over by using cgGetFirstProgram and cgGetNextProgram:

    CGprogram cgGetFirstProgram(CGcontext context);
    CGprogram cgGetNextProgram(CGprogram program);

The first program of the sequence is retrieved by cgGetFirstProgram. If the context is invalid or does not contain any program, the function returns zero. Given a program, cgGetNextProgram returns the program immediately next in the sequence, or zero if there is none. Here is how those two functions would typically be used, given a valid context named context:

    CGprogram program = cgGetFirstProgram(context);
    while (program != 0) {
        // Here is the code that handles the program
        program = cgGetNextProgram(program);
    }

Nothing is guaranteed regarding the order of the programs in the sequence, or how cgGetFirstProgram and cgGetNextProgram behave when programs are created or destroyed during iteration.

Program Query

Program queries encompass validity, compilation results, and attributes.

Program Validity

Use cgIsProgram to check whether a program handle refere
93. ...D3D9SetTextureWrapMode(parameter, D3DWRAP_U | D3DWRAP_V);

Parameter Shadowing

Parameter shadowing can be enabled or disabled on a per-program basis:
- When loading the program (see "Expanded Interface Program Execution" on page 74)
- At any time, using

    HRESULT cgD3D9EnableParameterShadowing(CGprogram program, CGbool enable);

  for which enable should be set to CG_TRUE to enable parameter shadowing and to CG_FALSE to disable it.

To know whether parameter shadowing is enabled for a given program, use

    CGbool cgD3D9IsParameterShadowingEnabled(CGprogram program);

This function returns CG_TRUE if parameter shadowing is enabled for program.

Expanded Interface Program Execution

To load a program in Direct3D 9, use cgD3D9LoadProgram:

    HRESULT cgD3D9LoadProgram(CGprogram program, CGbool parameterShadowingEnabled, DWORD assembleFlags);

This function assembles the result of the compilation of program using D3DXAssembleShader, with assembleFlags as the D3DXASM flags. Depending on the program's profile, it then either uses IDirect3DDevice9::CreateVertexShader to create a Direct3D 9 vertex shader or uses IDirect3DDevice9::CreatePixelShader to create a Direct3D 9 pixel shader. Here is a typical use of the function:

    HRESULT hresult = cgD3D9LoadProgram(vertexProgram, TRUE, D3DXASM_DEBUG);
    hresult = cgD3D9LoadProgram(fragmentProgram, TRUE, 0);
95. ...DTSS_MIPFILTER for sampler parameter BaseTexture
    cgD3D TRACE Deleting vertex shader for program 3
    cgD3D TRACE Deleting pixel shader for program 24

To use the debug DLL:
1. Link your application against cgD3D9d.lib or cgD3D8d.lib instead of cgD3D9.lib or cgD3D8.lib.
2. Make sure that the application can find cgD3D9d.dll or cgD3D8d.dll.
3. Turn tracing on and off for portions of your code using cgD3D9EnableDebugTracing:

    void cgD3D9EnableDebugTracing(CGbool enable);

Here is how you would enable debug tracing for part of the application code:

    cgD3D9EnableDebugTracing(CG_TRUE);
    // Application code that is traced
    cgD3D9EnableDebugTracing(CG_FALSE);

Note that each debug trace output sets an error equal to cgD3D9DebugTrace. So if an error callback has been registered with the core runtime using cgSetErrorCallback, each debug trace output triggers a call to this error callback (see "Using Error Callbacks" on page 87).

Direct3D Error Reporting

Error reporting in Cg includes defined error types, functions that allow testing for errors, and support for error callbacks.

Direct3D Error Types

The Direct3D runtime generates errors of type CGerror, reported by the Cg core runtime, and of type HRESULT, reported by the Direct3D runtime. In addition, it returns the errors listed in the next two groups, which are specific to the Direct3D Cg runtime.
96. ...GetArraySize gives the size of every dimension. For example, for float4 array[10][100], cgGetArraySize(array, 0) returns 10 and cgGetArraySize(array, 1) returns 100. An array anArray has cgGetArraySize(anArray, 0) elements. If its dimension is greater than one, those elements are themselves arrays.

Here is how all these iteration functions would typically be used, given a valid program named program:

    void RecurseProgramParameters(CGparameter parameter);

    void IterateProgramParameters(CGprogram program)
    {
        RecurseProgramParameters(cgGetFirstParameter(program, CG_PROGRAM));
    }

    void RecurseProgramParameters(CGparameter parameter)
    {
        if (parameter == 0)
            return;
        do {
            switch (cgGetParameterType(parameter)) {
            case CG_STRUCT:
                RecurseProgramParameters(cgGetFirstStructParameter(parameter));
                break;
            case CG_ARRAY: {
                int arraySize = cgGetArraySize(parameter, 0);
                for (int i = 0; i < arraySize; ++i)
                    RecurseProgramParameters(cgGetArrayParameter(parameter, i));
                break;
            }
            default:
                // Here is the code that handles the parameter
                break;
            }
        } while ((parameter = cgGetNextParameter(parameter)) != 0);
    }

If you do not need to know how the parameters are organized in terms of structures and arrays, you can also iterate through all of them using cgGetFirstLeafParameter and cgGetNextLeafParameter:

    CGparameter cgGetFirstLeafParameter(CGprogram program, CGenum namespace);
    CGparameter cgGetNextLeafParameter(CGparameter
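The traversal pattern above does not depend on Cg itself, so it can be tried without the runtime. The following C sketch mirrors the shape of RecurseProgramParameters using a hypothetical Node type in place of CGparameter. In the sketch, next stands in for cgGetNextParameter, first_child for cgGetFirstStructParameter, and elements/array_size for cgGetArrayParameter and cgGetArraySize. Where the real code would handle each parameter, the sketch simply counts leaves.

```c
#include <stddef.h>

/* Hypothetical node kinds standing in for CG_STRUCT, CG_ARRAY, and everything else. */
typedef enum { NODE_LEAF, NODE_STRUCT, NODE_ARRAY } NodeKind;

typedef struct Node {
    NodeKind kind;
    struct Node *next;         /* next sibling (like cgGetNextParameter) */
    struct Node *first_child;  /* first struct member (like cgGetFirstStructParameter) */
    struct Node **elements;    /* array elements (like cgGetArrayParameter) */
    int array_size;            /* number of elements (like cgGetArraySize(p, 0)) */
} Node;

/* Walks siblings, descending into structs and arrays, and counts leaf parameters. */
int count_leaves(const Node *p)
{
    int count = 0;
    for (; p != NULL; p = p->next) {
        switch (p->kind) {
        case NODE_STRUCT:
            count += count_leaves(p->first_child);
            break;
        case NODE_ARRAY:
            for (int i = 0; i < p->array_size; ++i)
                count += count_leaves(p->elements[i]);
            break;
        default:
            count += 1;  /* a leaf: real code would handle the parameter here */
            break;
        }
    }
    return count;
}
```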
97. ...
                         const uniform sampler2D BaseTexture,
                         const uniform float4 SomeColor)
    {
        colorO = color * tex2D(BaseTexture, texCoord) + SomeColor;
    }

Direct3D 9 Application

The following C code links the previous vertex and fragment programs to the Direct3D 9 application:

    #include <cg/cg.h>
    #include <cg/cgD3D9.h>

    IDirect3DDevice9* device;    // Initialized somewhere else
    IDirect3DTexture9* texture;  // Initialized somewhere else
    D3DXMATRIX matrix;           // Initialized somewhere else
    D3DXCOLOR constantColor;     // Initialized somewhere else

    CGcontext context;
    CGprogram vertexProgram, fragmentProgram;
    IDirect3DVertexDeclaration9* vertexDeclaration;
    IDirect3DVertexShader9* vertexShader;
    IDirect3DPixelShader9* pixelShader;
    CGparameter baseTexture, someColor, modelViewMatrix;

    // Called at application startup
    void OnStartup()
    {
        // Create context
        context = cgCreateContext();
    }

    // Called whenever the Direct3D device needs to be created
    void OnCreateDevice()
    {
        // Create the vertex shader
        vertexProgram = cgCreateProgramFromFile(context, CG_SOURCE, "VertexProgram.cg",
                                                CG_PROFILE_VS_2_0, "VertexProgram", 0);
        CComPtr<ID3DXBuffer> byteCode;
        const char* progSrc = cgGetProgramString(vertexProgram, CG_COMPILED_PROGRAM);
        D3DXAssembleShader(progSrc, strlen(progSrc), 0, 0, 0, &byteCode, 0);
        // If your program uses explicit binding semantics (like this one), you c
98. ...Options 77
cgD3D9IsParameterShadowingEnabled 74
cgD3D9IsProgramLoaded 76
cgD3D9LoadProgram 74
cgD3D9SetDevice 69
cgD3D9SetSamplerState 73
cgD3D9SetTexture 73
cgD3D9SetTextureWrapMode 74
cgD3D9SetUniform 72
cgD3D9SetUniformArray 73
cgD3D9SetUniformMatrix 72
cgD3D9SetUniformMatrixArray 73
cgD3D9UnloadProgram 76
Direct3D 8 application 81
Direct3D 9 application 78
Direct3D device 69
fragment program 77
lost devices 70
parameters 72
array 73
sampler 73
uniform 72
profile support 76
program execution 74
vertex program 77
Direct3D HRESULT 86
Direct3D minimal interface 57
cgD3D8ResourceToDeclUsage 61
cgD3D8ValidateVertexDeclaration 60
cgD3D9ResourceToDeclUsage 61
cgD3D9ValidateVertexDeclaration 60
Direct3D 8 application 67
Direct3D 9 application 64
fragment program 63
type retrieval 63
vertex declaration 57
vertex declaration for Direct3D 8 58
vertex declaration for Direct3D 9 58
vertex program 63
header files 32
loading 32
modifying parameters 33
OpenGL 46
error reporting 57
OpenGL application 54
OpenGL parameter setting 46
parameter shadowing 46
program execution 33
releasing resources 34
Cg Runtime Library overview 30
Cg standard library 19
Cg_Simple file 89
cgc.exe Cg compiler 265
cgD3D9EnableParameterShadowing 74
CGerror
Direct3D 86
OpenGL 57
cint type specification 172
command-line options, Cg compiler 265
comparison operators 189
introduction 15
compilation pro
99. Program Instruction Limits

The DirectX 8 vertex shaders are limited to 128 instructions. If the compiler needs to produce more than 128 instructions to compile a program, it reports an error.

Vector Register Limits

Likewise, there are limited numbers of registers to hold program parameters and temporary results. Specifically, there are 96 read-only vector registers and 12 read-write vector registers. If the compiler needs more registers to compile a program than are available, it generates an error.

Language Constructs and Support

Data Types

This profile implements data types as follows:
- float data types are implemented as IEEE 32-bit single precision.
- half and double data types are treated as float.
- int data type is supported using floating-point operations, which adds extra instructions for proper truncation for divides, modulos, and casts from floating-point types.
- fixed or sampler data types are not supported, but the profile does provide the minimal partial support that is required for these data types by the core language specification. That is, it is legal to declare variables using these types, as long as no operations are performed on the variables.

Footnote 5: To understand the DirectX VS 1.1 Vertex Shaders and the code the compiler produces, see the Vertex Shader Reference in the DirectX 8.1 SDK documentation.

Statements and Operators

The if, while, do, an
100. SKIP(4),
    D3DVSD_REG(D3DVSDE_TEXCOORD0, D3DVSDT_FLOAT2),
    D3DVSD_END()
};
This is true because D3DDECLUSAGE_POSITION and D3DVSDE_POSITION match the hardware register associated with the predefined semantic POSITION; D3DDECLUSAGE_COLOR (usage index 0) and D3DVSDE_DIFFUSE match the register associated with COLOR0; and D3DDECLUSAGE_TEXCOORD (usage index 0) and D3DVSDE_TEXCOORD0 match the register associated with TEXCOORD0.
The above declarations can also be written the following way, using cgD3D9ResourceToDeclUsage or cgD3D8ResourceToInputRegister:
const D3DVERTEXELEMENT9 declaration[] = {
    {0, 0,  D3DDECLTYPE_FLOAT3,   D3DDECLMETHOD_DEFAULT, cgD3D9ResourceToDeclUsage(CG_POSITION0), 0},
    {0, 12, D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT, cgD3D9ResourceToDeclUsage(CG_COLOR0),    0},
    {1, 4,  D3DDECLTYPE_FLOAT2,   D3DDECLMETHOD_DEFAULT, cgD3D9ResourceToDeclUsage(CG_TEXCOORD0), 0},
    D3DDECL_END()
};
DWORD declaration[] = {
    D3DVSD_STREAM(0),
    D3DVSD_REG(cgD3D8ResourceToInputRegister(CG_POSITION0), D3DVSDT_FLOAT3),
    D3DVSD_REG(cgD3D8ResourceToInputRegister(CG_COLOR0),    D3DVSDT_D3DCOLOR),
    D3DVSD_STREAM(1),
    D3DVSD_SKIP(4),
    D3DVSD_REG(cgD3D8ResourceToInputRegister(CG_TEXCOORD0), D3DVSDT_FLOAT2),
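The byte offsets in a vertex declaration follow from how each stream's vertices are laid out in memory. As a minimal C sketch, here are CPU-side structures that a two-stream declaration like this could describe. The struct names `Stream0Vertex` and `Stream1Vertex` and the `skipped` field are hypothetical. The sketch assumes that 4 bytes of data the shader ignores precede the texture coordinate in stream 1, and that the compiler adds no padding. Padding-free layout holds for these all-4-byte members on common ABIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stream 0 layout: FLOAT3 position followed by a packed D3DCOLOR. */
typedef struct {
    float    position[3]; /* bytes 0..11  (FLOAT3) */
    uint32_t color;       /* bytes 12..15 (D3DCOLOR, one packed 32-bit value) */
} Stream0Vertex;

/* Hypothetical stream 1 layout: 4 ignored bytes, then a FLOAT2 texture coordinate. */
typedef struct {
    uint32_t skipped;     /* bytes 0..3, not read by the vertex shader (assumption) */
    float    texCoord[2]; /* bytes 4..11 (FLOAT2) */
} Stream1Vertex;

/* Offsets that would go into the Offset member of each declaration entry. */
static const size_t kPositionOffset = offsetof(Stream0Vertex, position);
static const size_t kColorOffset    = offsetof(Stream0Vertex, color);
static const size_t kTexCoordOffset = offsetof(Stream1Vertex, texCoord);
```

Deriving offsets with `offsetof` instead of hard-coding them keeps the declaration in sync if the structure changes.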
101. T,
             uniform float4 LightVec)
{
    vertout OUT;
    // Transform vertex position into homogeneous clip space.
    OUT.HPosition = mul(ModelViewProj, IN.Position);
    // Transform normal from model space to view space.
    float3 normalVec = normalize(mul(ModelViewIT, IN.Normal).xyz);
    // Store normalized light vector.
    float3 lightVec = normalize(LightVec.xyz);
    // Calculate half-angle vector.
    float3 eyeVec = float3(0.0, 0.0, 1.0);
    float3 halfVec = normalize(lightVec + eyeVec);
    // Calculate diffuse component.
    float diffuse = dot(normalVec, lightVec);
    // Calculate specular component.
    float specular = dot(normalVec, halfVec);
    // Use the lit function to compute lighting vector from diffuse and specular values.
    float4 lighting = lit(diffuse, specular, 32);
    // Blue diffuse material
    float3 diffuseMaterial = float3(0.0, 0.0, 1.0);
    // White specular material
    float3 specularMaterial = float3(1.0, 1.0, 1.0);
    // Combine diffuse and specular contributions and output final vertex color.
    OUT.Color.rgb = lighting.y * diffuseMaterial + lighting.z * specularMaterial;
    OUT.Color.a = 1.0;
    return OUT;
}
Working with Data
Like C, Cg supports features that create and manipulate data:
• Basic types
• Structures
• Arrays
• Type conversions
Basic Data Types
Cg supports six basic data types:
• float: A 32-bit IEEE floating-point (s23e8) number that ha
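The lit call in this vertex program packs ambient, diffuse, and specular terms into a single float4. The program then reads them back as lighting.y and lighting.z. As a rough CPU-side illustration, the C sketch below follows the behavior commonly documented for lit:

* The diffuse term is clamped at zero.
* The specular term is zero whenever the diffuse dot product is not positive.

The sketch assumes an integer specular exponent, such as the 32 used above, so that no math library is needed. `Float4`, `pow_int`, and `lit_sketch` are hypothetical names, not Cg runtime APIs.

```c
/* Hypothetical 4-component vector standing in for Cg's float4. */
typedef struct { float x, y, z, w; } Float4;

/* Raise base to a non-negative integer power by repeated squaring. */
static float pow_int(float base, unsigned exponent) {
    float result = 1.0f;
    while (exponent > 0) {
        if (exponent & 1u) result *= base;
        base *= base;
        exponent >>= 1;
    }
    return result;
}

/* Sketch of lit(NdotL, NdotH, m), restricted to integer exponents:
   x = ambient (1), y = diffuse, z = specular, w = 1. */
static Float4 lit_sketch(float nDotL, float nDotH, unsigned m) {
    Float4 r;
    r.x = 1.0f;
    r.y = nDotL > 0.0f ? nDotL : 0.0f;
    r.z = (nDotL > 0.0f) ? pow_int(nDotH > 0.0f ? nDotH : 0.0f, m) : 0.0f;
    r.w = 1.0f;
    return r;
}
```

With this layout, `lighting.y * diffuseMaterial + lighting.z * specularMaterial` in the shader weights each material by its own lighting term.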
102. Uniform Arrays of Scalar, Vector, and Matrix Parameters
To set the values of arrays of uniform scalar or vector parameters, use the cgGLSetParameterArray functions:
void cgGLSetParameterArray1f(CGparameter parameter, long startIndex, long numberOfElements, const float* array);
void cgGLSetParameterArray1d(CGparameter parameter, long startIndex, long numberOfElements, const double* array);
void cgGLSetParameterArray2f(CGparameter parameter, long startIndex, long numberOfElements, const float* array);
void cgGLSetParameterArray2d(CGparameter parameter, long startIndex, long numberOfElements, const double* array);
void cgGLSetParameterArray3f(CGparameter parameter, long startIndex, long numberOfElements, const float* array);
void cgGLSetParameterArray3d(CGparameter parameter, long startIndex, long numberOfElements, const double* array);
void cgGLSetParameterArray4f(CGparameter parameter, long startIndex, long numberOfElements, const float* array);
void cgGLSetParameterArray4d(CGparameter parameter, long startIndex, long numberOfElements, const double* array);
The digit in the name of those functions indicates the type of the parameter array elements: 1 for arrays of float1, 2 for arrays of float2, and so on. The variables startIndex and numberOfElements specify which elements of the array parameter are set. They are the numberOfElements elements whose indices range fr
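As a minimal illustration of how startIndex and numberOfElements select a slice of the array parameter, the C sketch below copies source data into a CPU-side shadow of the array. The `set_array_slice` function and the shadow buffer are hypothetical stand-ins, not part of the Cg runtime. The source is assumed to be tightly packed, with n values per element, where n matches the digit in the function name.

```c
/* Hypothetical CPU-side model of cgGLSetParameterArray{n}f.
   shadow holds arraySize elements of n floats each; src holds
   numberOfElements elements, tightly packed (assumption).
   Returns 0 on success, -1 if the requested range is invalid. */
static int set_array_slice(float *shadow, long arraySize, int n,
                           long startIndex, long numberOfElements,
                           const float *src) {
    if (n < 1 || n > 4 || startIndex < 0 || numberOfElements < 0 ||
        startIndex + numberOfElements > arraySize)
        return -1;
    /* Element i of src lands in array element startIndex + i. */
    for (long i = 0; i < numberOfElements; ++i)
        for (int c = 0; c < n; ++c)
            shadow[(startIndex + i) * n + c] = src[i * n + c];
    return 0;
}
```

For example, setting three float2 elements starting at index 4 touches array elements 4, 5, and 6 only.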
103. Use the compiler option -profile fp30. This section describes the capabilities and restrictions of Cg when using the fp30 profile.
Language Constructs and Support
Data Types
• fixed type (s1.10 fixed point) is supported.
• half type (s10e5 floating point) is supported.
It is recommended that you use fixed, half, and float, in that order, for maximum performance. Reversing this order provides maximum precision. You are encouraged to use the fastest type that meets your needs for precision.
Statements and Operators
• Full support for if/else.
• No for and while loops, unless they can be unrolled by the compiler.
• Support for flexible texture mapping.
• Support for screen-space derivative functions.
• No support for variable indexing of arrays.
Bindings
Binding Semantics for Uniform Data
Table 28 summarizes the valid binding semantics for uniform parameters in the fp30 profile.
Table 28. fp30 Uniform Input Binding Semantics
Binding Semantics Name: Corresponding Data
register(s0) through register(s15), TEXUNIT0 through TEXUNIT15: Texunit N, where N is in the range 0-15. May be used only with uniform inputs with sampler types.
register(c0) through register(c31), C0 through C31: Constant register N, where N is in the range C0-C31. May only be used with uniform inputs.
Binding Semantics for Varying Input/Output Data
Table 29 summarizes the valid binding semantics for v
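To illustrate the precision side of the fixed/half/float trade-off, here is a small C sketch. It models an s1.10 value (sign, 1 integer bit, 10 fractional bits) as a signed count of 1/1024 steps, clamped to [-2, 2 - 1/1024]. The rounding mode (round half away from zero) and the clamping are assumptions made for illustration, not a statement of how fp30 hardware rounds. `to_fixed_s1_10` is a hypothetical helper.

```c
/* Model an s1.10 value: integer multiples of 1/1024 in [-2048/1024, 2047/1024]. */
static float to_fixed_s1_10(float x) {
    /* Clamp first so the scaled value always fits in a long. */
    if (x > 2.0f) x = 2.0f;
    if (x < -2.0f) x = -2.0f;
    /* Round to the nearest multiple of 1/1024 (half away from zero, assumed). */
    long steps = (long)(x * 1024.0f + (x >= 0.0f ? 0.5f : -0.5f));
    if (steps > 2047) steps = 2047;   /* largest representable: 2 - 1/1024 */
    if (steps < -2048) steps = -2048; /* smallest representable: -2 */
    return (float)steps / 1024.0f;
}
```

Values smaller than half a step, such as 0.0004, collapse to zero, and anything at or above 2 saturates. These effects are why fixed is fastest but least precise.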
104. V[n] → V[n]: componentwise
V[n], V[n] → V[n]: componentwise
M[n][m], M[n][m] → M[n][m]: componentwise
Table 7. Expanded Operators (continued)
Operator: Description
M[n][m], M[n][m] → M[n][m]: componentwise
M[n][m], M[n][m] → M[n][m]: componentwise
M[n][m], M[n][m] → M[n][m]: componentwise
M[n][m], M[n][m] → M[n][m]: componentwise
Operators
Boolean: &&, ||, !
Boolean operators may be applied to packed bool vectors, in which case they are applied elementwise to produce a result vector of the same size. Each operand must be a bool vector of the same size. Both sides of && and || are always evaluated; there is no short-circuiting as there is in C.
Comparisons: <, >, <=, >=, !=, ==
Comparison operators may be applied to numeric vectors. Both operands must be vectors of the same size. The comparison operation is performed elementwise to produce a bool vector of the same size.
Comparison operators may also be applied to bool vectors. For the purpose of relational comparisons, true is treated as one and false is treated as zero. The comparison operation is performed elementwise to produce a bool vector of the same size.
Comparison operators may also be applied to numeric or bool scalars.
Arithmetic: unary unary
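A C analogy of these rules, assuming 4-component vectors, is sketched below. A comparison yields one bool per component. The Boolean AND combines two bool vectors whose operands have both been fully evaluated, which mirrors Cg's lack of short-circuiting. The `Float4` and `Bool4` types and the function names are hypothetical.

```c
#include <stdbool.h>

typedef struct { float v[4]; } Float4;
typedef struct { bool v[4]; } Bool4;

/* Elementwise a < b, producing a bool vector of the same size. */
static Bool4 less_than(Float4 a, Float4 b) {
    Bool4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] < b.v[i];
    return r;
}

/* Elementwise AND. As C function arguments, both operand vectors are
   fully evaluated before this runs, like both sides of && in Cg. */
static Bool4 and4(Bool4 a, Bool4 b) {
    Bool4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] && b.v[i];
    return r;
}
```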
105. • The vector swizzle operator may only be applied to vectors or to scalars.
• Applying the vector swizzle operator to a scalar gives the same result as applying the operator to a vector of length one. Thus, myscalar.xxx and float3(myscalar, myscalar, myscalar) yield the same value.
• If only one swizzle character is specified, the result is a scalar, not a vector of length one. Therefore, the expression b.y returns a scalar.
• Care is required when swizzling a constant scalar because of ambiguity in the use of the decimal point character. For example, to create a three-vector from a scalar, use one of the following: (1).xxx or 1..xxx or 1.0.xxx or 1.0f.xxx
• The size of the returned vector is determined by the number of swizzle characters. Therefore, the size of the result may be larger or smaller than the size of the original vector. For example, float2(0, 1).xxyy and float4(0, 0, 1, 1) yield the same result.
• Matrix swizzle operator
For any matrix type of the form <type><rows>x<columns>, the notation matrixObject._m<row><col>[_m<row><col>][...] can be used to access individual matrix elements (in the case of only one <row><col> pair) or to construct vectors from elements of a matrix (in the case of more than one row/col pair). The row and column numbers are zero-based. For example, floa
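The size rules for vector swizzles can be mimicked in C with a small helper that maps each swizzle character (x, y, z, w or r, g, b, a) to a source component. `swizzle` is a hypothetical helper for illustration only. It returns the number of components produced. That count is 1 for a single-character swizzle (a scalar in Cg); otherwise it equals the mask length, whether larger or smaller than the source. For simplicity, it does not reject masks that mix the two character sets.

```c
/* Map a swizzle character to a component index, or -1 if invalid. */
static int component_index(char c) {
    switch (c) {
    case 'x': case 'r': return 0;
    case 'y': case 'g': return 1;
    case 'z': case 'b': return 2;
    case 'w': case 'a': return 3;
    default:            return -1;
    }
}

/* Apply a swizzle mask (up to 4 characters) to a vector of length n.
   A scalar is treated as a vector of length one. Returns the result size,
   or -1 if the mask is too long, invalid, or reads past the source. */
static int swizzle(const float *src, int n, const char *mask, float out[4]) {
    int count = 0;
    for (const char *p = mask; *p != '\0'; ++p) {
        int idx = component_index(*p);
        if (count == 4 || idx < 0 || idx >= n) return -1;
        out[count++] = src[idx];
    }
    return count;
}
```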
106. Figure 3. The Cg_Simple Workspace
As usual, click the FileView tab to view the various files in the project. What's different in this case, though, is that in addition to the usual Source Files and Header Files folders, there is also a Cg Programs folder. This Cg Programs folder should contain one Cg program, simple.cg, which is what you can use for experimentation. Double-click simple.cg to open it for editing. While you are editing simple.cg, you can press Control+F7 at any time to compile it. Because of the way the project is set up, any errors in your code will be shown just as when you compile a normal C or C++ program. You can also double-click on an error, which takes you to the location in the source code that caused the error.
Understanding simple.cg
The Cg_Simple application runs the shader defined in simple.cg on a torus. The provi
107. … 223; Memory Restrictions 223; Language Constructs and Support 223; … 225; … 226. DirectX Pixel Shader 1.x Profiles (ps_1_1, ps_1_2, ps_1_3) 227: Overview 227; Modifiers 228; Language Constructs and Support 229; Standard Library Functions 230; Bindings 232; Auxiliary Texture Functions 234; Examples 239. OpenGL NV_vertex_program 1.0 Profile (vp20) 240: Overview 240; Position Invariance 240; Data Types 241; … 241. OpenGL NV_texture_shader and NV_register_combiners Profile (fp20) 244: Overview 244; Restrictions 244; Modifiers 245; Language Constructs and Support 246; Standard Library Functions 247; … 249; Auxiliary Texture Functions 251; Examples 256. Appendix C Nin
108. ader is based on the Time Machine temporal rust shader. Car paint data was measured by Cornell University from samples provided by Ford Motor Company.
struct VS_OUTPUT {
    float4 HPosition  : POSITION;  // coord position in window
    float2 UV         : TEXCOORD0; // wavy/fleckmap coords
    float3 light      : TEXCOORD1; // light pos (tangent space)
    float4 halfangle  : TEXCOORD2; // Blinn halfangle
    float3 reflection : TEXCOORD3; // Refl vector (per vertex)
    float4 view       : TEXCOORD4; // view (tangent space)
    float3 tangent    : TEXCOORD5; // view to tangent matrix
    float3 binormal   : TEXCOORD6;
    float3 normal     : TEXCOORD7;
    …                 : COLOR0;    // fresn…
};
// PIXEL SHADER
float4 main(VS_OUTPUT vert,
            uniform sampler2D   WavyMap        : register(s0),
            uniform samplerCUBE EnvironmentMap : register(s1),
            uniform sampler2D   PaintMap       : register(s2),
            uniform sampler2D   FleckMap       : register(s3),
            uniform float       Ambient,
            // NEWPAINTSPEC: UNUSED, SPEC POWER, GLOSSINESS, FLECK SPEC POWER
            uniform float4      NewPaintSpec,
            uniform float3      ClearCoat,
            uniform float3      WavyScale,
            …
    // Tangent-space LIGHT vector
    float3 L = normalize(vert.light);
    // Tangent-space HALF-ANGLE vector
    float3 H = normalize(vert.halfan
109. al reference to the Direct3D device and free its Direct3D resources.
    cgD3D8SetDevice(0);
}
// Called before application shuts down
void OnShutdown()
{
    // This frees any core runtime resource.
    cgDestroyContext(context);
}
Direct3D Debugging Mode
In addition to the error reporting mechanisms described in "Direct3D Error Reporting" on page 85, a debug version of the Direct3D 9 or Direct3D 8 Cg runtime DLL is provided to assist you with the development of applications using the Direct3D 9 or Direct3D 8 Cg runtime. This version does not have debug symbols, but when used in place of the regular version, it uses the Win32 function OutputDebugString to output many helpful messages and traces to the debug output console. Examples of information the debug DLL outputs are the following:
• Any Direct3D or Cg core runtime errors
• Debugging information about parameters that are managed by the expanded interface
• Potential performance warnings
Here is a sample trace:
cgD3D TRACE Creating vertex shader for program 3
cgD3D TRACE Discovering parameters for vertex program 3
cgD3D TRACE Discovered uniform parameter ModelViewProj of type float4x4
cgD3D TRACE Finished discovering parameters for vertex program 3
cgD3D TRACE Creating pixel shader for program 24
cgD3D TRACE Discovering parameters for pixel pr
110. an create a vertex declaration using those semantics:
const D3DVERTEXELEMENT9 declaration[] = {
    {0, 0,  D3DDECLTYPE_FLOAT3,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0},
    {0, 12, D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_COLOR,    0},
    {1, 4,  D3DDECLTYPE_FLOAT2,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 0},
    D3DDECL_END()
};
// Make sure the resulting declaration is compatible with the shader.
// This is really just a sanity check.
assert(cgD3D9ValidateVertexDeclaration(vertexProgram, declaration));
device->CreateVertexDeclaration(declaration, &vertexDeclaration);
device->CreateVertexShader((const DWORD*)byteCode->GetBufferPointer(), &vertexShader);
// Create the pixel shader.
fragmentProgram = cgCreateProgramFromFile(context, CG_SOURCE, "FragmentProgram.cg", CG_PROFILE_PS_2_0, "main", 0);
{
    CComPtr<ID3DXBuffer> byteCode;
    const char* progSrc = cgGetProgramString(fragmentProgram, CG_COMPILED_PROGRAM);
    D3DXAssembleShader(progSrc, strlen(progSrc), 0, 0, 0, &byteCode, 0);
    device->CreatePixelShader((const DWORD*)byteCode->GetBufferPointer(), &pixelShader);
}
// Grab some parameters.
modelViewMatrix = cgGetNamedPara
111. and gives a brief overview of how it is used in an application. The next two sections, "Core Cg Runtime" on page 34 and "API-Specific Cg Runtimes" on page 45, give an exhaustive description of the APIs composing the Cg runtime.
Introducing the Cg Runtime
Cg programs are lines of code that describe shading, but they need the support of applications to create images. To interface Cg programs with applications, you must do two things: (1) compile the programs for the correct profile, that is, into a form that is compatible with the 3D API used by the application and the underlying hardware; and (2) link the programs to the application program, which allows the application to feed varying and uniform data to the programs.
You have two choices as to when to perform these operations. You can perform them at compile time, when the application program is compiled into an executable, or you can perform them at run time, when the application is actually executed. The Cg runtime is an application programming interface that allows an application to compile and link Cg programs at run time.
Benefits of the Cg Runtime
Future Compatibility
Most applications need to run on a range of profiles. If an application precompiles its Cg programs (the compile-time choice), it must store a compiled version of each program for each profile. This is reasonable for one program
112. are and set a vector output that uses the COLOR semantic. This value is usually used by the hardware as the final color of the fragment. Some fragment profiles also support the DEPTH output semantic, which allows the depth value of the fragment to be modified. As with vertex programs, fragment programs may return their outputs in the body of a structure. However, it is usually more convenient either to declare outputs as out parameters:
void main(float4 diffuseColor : COLOR0,
          out float4 color : COLOR)
{
    color = diffuseColor;
}
or to associate a semantic with the return value of the shader:
float4 main(float4 diffuseColor : COLOR0) : COLOR
{
    return diffuseColor;
}
The following example shows a simple vertex program that calculates diffuse and specular lighting. Two structures for varying data, appin and vertout, are also declared. Don't worry about understanding exactly what the program is doing; the goal is simply to give you an idea of what Cg code looks like. "A Brief Tutorial" on page 89 explains this shader in detail.
// Define inputs from application.
struct appin
{
    float4 Position : POSITION;
    float4 Normal   : NORMAL;
};
// Define outputs from vertex shader.
struct vertout
{
    float4 HPosition : POSITION;
    float4 Color     : COLOR0;
};
vertout main(appin IN,
             uniform float4x4 ModelViewProj,
             uniform float4x4 ModelViewI
113. arying input parameters in the fp30 profile. These binding semantics map to NV_fragment_program input registers. The two sets act as aliases to each other. The profile also allows POSITION, FOG, PSIZE, HPOS, FOGC, PSIZ, BCOL0, BCOL1, and CLP0-CLP5 to be present as binding semantics on a member of a structure of a varying input data structure, provided the member with this binding semantics is not referenced. This allows Cg programs to have the same structure specify the varying output of a vp30 profile program and the varying input of an fp30 profile program.
Table 29. fp30 Varying Input Binding Semantics
Binding Semantics Name: Corresponding Data (type)
COLOR0, COL0: Input color0 (float4)
COLOR1, COL1: Input color1 (float4)
TEXCOORD0 through TEXCOORD7, TEX0 through TEX7: Input texture coordinates (float4)
WPOS: Window position coordinates (float4)
Table 30 summarizes the valid binding semantics for varying output parameters in the fp30 profile.
Table 30. fp30 Varying Output Binding Semantics
Binding Semantics Name: Corresponding Data (type)
COLOR, COLOR0, COL: Output color (float4)
DEPTH, DEPR: Output depth (float)
Pack and Unpack Functions
The fp30 profile provides a number of functions for packing multiple floating-point values into a single 32-bit result. Corresponding unpacking functions are also provided. These functions map directly to
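As a rough model of what a packing function does, the C sketch below packs four values into one 32-bit word:

* Each value is clamped to [0, 1].
* It is scaled to 0..255 and rounded.
* It is stored in its own byte.

Unpacking reverses the process. The helper names, the rounding, and the choice of placing the first component in the low-order byte are assumptions for illustration. They are not a description of the exact bit layout the fp30 functions use.

```c
#include <stdint.h>

/* Clamp to [0, 1], scale to [0, 255], and round to the nearest integer. */
static uint32_t to_unorm8(float v) {
    if (!(v > 0.0f)) v = 0.0f; /* also maps NaN to 0 */
    if (v > 1.0f) v = 1.0f;
    return (uint32_t)(v * 255.0f + 0.5f);
}

/* Pack four normalized floats into one 32-bit word, first component
   in the lowest byte (assumed layout). */
static uint32_t pack_4ubyte_sketch(const float v[4]) {
    return to_unorm8(v[0]) | (to_unorm8(v[1]) << 8) |
           (to_unorm8(v[2]) << 16) | (to_unorm8(v[3]) << 24);
}

/* Inverse: recover four values in [0, 1] from the packed word. */
static void unpack_4ubyte_sketch(uint32_t packed, float out[4]) {
    for (int i = 0; i < 4; ++i)
        out[i] = (float)((packed >> (8 * i)) & 0xFFu) / 255.0f;
}
```

Only 256 levels per component survive the round trip, so packing trades precision for storage.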
114. ated as the main entry point at compilation time. The varying inputs to the program come from this top-level function's varying in parameters. The uniform inputs to the program come from the top-level function's uniform in parameters and from any non-static global variables that are referenced by the top-level function or by any functions that it calls. The output of the program comes from the return value of the function (which is always implicitly varying) and from any out parameters, which must also be varying.
Parameters to a program of type sampler are implicitly const.
Statements
Statements are expressed just as in C, unless an exception is stated elsewhere in this document. Additionally:
• The if, while, and for statements require bool expressions in the appropriate places.
• Assignment is performed using =. The assignment operator returns a value, just as in C, so assignments may be chained.
• The new discard statement terminates execution of the program for the current data element (such as the current vertex or current fragment) and suppresses its output. Vertex profiles may choose to omit support for discard.
Minimum Requirements for if, while, and for Statements
The minimum requirements are as follows:
• All profiles should support if, but such support is not strictly required for older hardware.
• All profiles should support for and while loops if the number of loop iterations can be determined at compile t
115. ategory 174. O: object, Cg definition 168; open profile functions 170; OpenGL Cg runtime 46 (error reporting 57, OpenGL application 54, parameter setting 46); OpenGL CGerror 57; OpenGL profiles: ARB_fragment_program 211, ARB_vertex_program 204, NV_fragment_program 218, NV_register_combiners 244, NV_texture_shader 244, NV_vertex_program 240, NV_vertex_program 2.0 214; operations expressed differently from C 165; operator enhancements 188, precedence 188; operators: arithmetic 14, boolean 15, conditional 17, introduction 13, swizzle 16, write mask 16. P: packed type modifier 172; parameter shadowing 46; parameters, modifiable (function passing) 14; parameters in function definitions, syntax 171; performance techniques: abs 259, avoiding matrix transposes 263, computation frequency 262, conditional code in fragment programs 263, data types 261, dot 259, min 259, saturate 260, shading computations 261, swizzle 258, texture maps 260, vectorization 257; pixel program, defined 2; pixel shader, defined 2; position invariance 192; profile: arbfp1 211, arbvp1 204, fp20 244, fp30 218, ps_1_1/ps_1_2/ps_1_3 227, ps_2_0/ps_2_x 200, vp20 240, vp30 214, vs_1_1 223, vs_2_0/vs_2_x 196; profile, defined 3; program: declaring 4, kinds of inputs 5; program profiles: fragment 193, vertex 192; programming model, GPU 2; ps_1_x profile 227; ps_2_0 profile 200; ps_2_x profile 200. R: ray-traced refraction pixel shader code e
116. ation is faster than if you use float. Although sometimes you need the range and extra precision that half and float offer, you should avoid using them unless necessary.
6. Use the Right Standard Library Routines for Shading Computations
If you're implementing a shading model such as Lambertian, Blinn, or Phong, you'll generally be performing some dot products, clamping negative results to zero, and raising some of the values to a power to compute a specular exponent. There are a few tricks that can speed up this process:
• Be sure to use the dot function when computing dot products.
• If you need to clamp the result of a dot product computation to the range [0, 1] in a fragment program, use the saturate function instead of max. This is often written as max(0, dot(N, L)), but as long as the N and L vectors are normalized, it can be written equivalently as saturate(dot(N, L)), because the dot product of two normalized vectors is never greater than one. Given that saturate is free in fragment programs (see "3. Use the Cg Standard Library" on page 259), this compiles to more efficient code.
• Use the lit Standard Library function if appropriate. The lit function implements a diffuse/glossy Blinn shading model. It takes three parameters:
  - The dot product of the normalized surface normal and the light vector
  - The dot product of a half-angle vector and the normal
  - The specular exponent
It returns a 4
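Here is a quick CPU-side check of the equivalence between max(0, dot(N, L)) and saturate(dot(N, L)). For normalized N and L the two agree, because the dot product cannot exceed one. Once a vector is not normalized, they diverge. The helpers below are hypothetical C stand-ins for the Cg functions of similar names. The test vectors are chosen to be (approximately) unit length, so no normalization or math library is needed.

```c
typedef struct { float x, y, z; } Float3;

static float dot3(Float3 a, Float3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* saturate(x): clamp x to [0, 1]. */
static float saturate(float x) { return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x); }

/* max(0, x): clamp only from below. */
static float max0(float x) { return x > 0.0f ? x : 0.0f; }
```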
117. based on the context in which uniform sampler parameters and texture coordinate inputs are used together. To specify bindings between texture units and uniform parameters/texture coordinates to match their application, all sampler uniform parameters and texture coordinate inputs that are used in the program must have matching binding semantics; for example, TEXUNIT<n> may only be used with TEXCOORD<n>. Partially specified binding semantics may not work in all cases. Fundamentally, this restriction is due to the close coupling between texture samplers and texture coordinates in the NV_texture_shader extension.
Binding Semantics for Uniform Data
If a binding semantic for a uniform parameter is not specified, then the compiler will allocate one automatically. Scalar uniform parameters may be allocated to either the xyz or the w portion of a constant register, depending on how they are used within the Cg program. When using the output of the compiler without the Cg runtime, you must set all values of a scalar uniform to the desired scalar value, not just the x component.
Table 47 summarizes the valid binding semantics for uniform parameters in the fp20 profile.
Table 47. fp20 Uniform Binding Semantics
Binding Semantics Name: Corresponding Data
register(s0) through register(s3), TEXUNIT0 through TEXUNIT3: Texture unit N, where N is in the range 0-3. May be used only with uniform inputs with sampler types.
The ps_1_X profile
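When you bypass the Cg runtime, one way to satisfy the scalar-uniform rule is to replicate the scalar into every component of its four-component constant register before uploading it. The helper below is a hypothetical C sketch of that step. How the register is then uploaded depends on the API and is not shown.

```c
/* Replicate a scalar uniform into all four components of a constant
   register, so the value is correct whether the compiled program
   reads .x, .xyz, or .w. */
static void smear_scalar_uniform(float reg[4], float value) {
    for (int i = 0; i < 4; ++i) reg[i] = value;
}
```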
118. bvp1 profile allows Cg programs to refer to the OpenGL state directly, unlike the vp20 profile. However, if you want to write Cg programs that are compatible with vp20 and dx8vs profiles, you should use the alternate mechanism of setting uniform variables with the necessary state using the Cg runtime.
The compiler relies on the feature of ARB vertex assembly programs that enables parts of the OpenGL state to be written automatically to program parameter registers as the state changes. The OpenGL driver handles this state-tracking feature.
A special variable called glstate, defined as a structure, can be used to refer to every part of the OpenGL state that ARB vertex programs can reference. Following this paragraph are three lists of the glstate fields that can be accessed. The array indexes are shown as 0, but an array can be accessed using any non-negative integer that is less than the limit of the array. For example, to access the diffuse component of the second light, use glstate.light[1].diffuse, assuming that GL_MAX_LIGHTS is at least 2.
(3) See "DirectX Vertex Shader 1.1 Profile (vs_1_1)" on page 223 for a full explanation of the data types, statements, and operators supported by this profile.
Table 16 lists the glstate fields of type float4x4 that can be accessed.
Table 16. float4x4 glstate Fields
119. calar to vector 179; Stanford shading language, relation to Cg 165; statements, introduction 13; statements in Cg 185; structures, introduction 12; swizzle, for performance 258; swizzle operator 16; swizzle operator, described 186. T: texture lookups 17; texture map functions 25; texture maps, for performance 260; thin film effect: pixel shader code example 126, vertex shader code example 124; tutorial 89; type conversions 11, 176 (array 177, matrix 176, scalar 176, structure 176, vector 176); type equivalency 178; type promotion 178 (assignment 178, smearing 179); type qualifiers 175 (const 175, in 175, out 175); types: general discussion 171, partial support 173. U: uniform inputs 5; uniform modifier, use of 169; uninitialized variables, use of 182. V: variables: global 182, uninitialized, use of 182; varying inputs 5; vector data types 11; vector operators, new 186; vectorization, for performance 257; vectors, constructing 15; vertex color 93; vertex position 93; vertex program varying output 7; vertex program profiles 192; vertex programs, defined 2; void type specification 172; vp20 profile 240; vp30 profile 214; vs_1_1 profile 223; vs_2_0 profile 196; vs_2_x profile 196. W: water, improved: pixel shader code example 104, sample shader 101, vertex shader code example 102; web site, NVIDIA xiv; while statements 185; workspace, loading 89; write mask operator 16, described 187.
120. cally activates the Direct3D shader corresponding to program by calling IDirect3DDevice9::SetVertexShader or IDirect3DDevice9::SetPixelShader, depending on the program's profile. If parameter shadowing is enabled for program, it also sets all the shadowed parameters and their associated Direct3D states, such as texture stage states for the sampler parameters. No value or state tracking is performed by the runtime, so this setting is done regardless of the current values of these parameters or of their states. If a shadowed parameter has not been set by the time cgD3D9BindProgram is called, no Direct3D call of any sort is issued for this parameter.
Only one vertex program and one fragment program can be bound at any given time, so binding a program of a given type implicitly unbinds any other program of the same type.
Expanded Interface Profile Support
Two convenience functions are provided that give the highest vertex and pixel shader versions supported by the device:
CGprofile cgD3D9GetLatestVertexProfile();
CGprofile cgD3D9GetLatestPixelProfile();
This allows you to make your application future-ready, because the Cg programs are automatically compiled for the best profiles that are available at run time, even if these profiles did not exist at the time the application was written.
Another function that allows you optimal compilation is cgD3D9GetOpti
121. capable GPUs of today and tomorrow. APIs do not and cannot keep up with the rapid pace of innovation in GPUs. As APIs and underlying technologies change, programmers, artists, and software publishers struggle to adapt to the change and the churn of the hardware/software platform. What's needed is to raise the level of abstraction for interaction with GPUs. Continued updates and improvements to the hardware and APIs are too painful if developers are too close to the metal.
This problem was exacerbated by the advent of programmability in GPUs. Older GPUs had a small number of controllable or configurable rendering paths, but the most recent technology is highly programmable and becoming ever more so. We can now write short vertex and fragment programs to be executed by the GPU. This requires great skill and is only possible with short programs. When GPU hardware grows to allow programs of hundreds, thousands, or even more instructions, assembly coding will no longer be practical. Rather than programming each rendering state and each bit, byte, and word of data and control through a low-level assembly language, we want to express our ideas in a more straightforward form, using a high-level language. Thus Cg (C for Graphics) becomes necessary and inevitable. Just as C was derived to expose the specific capabilities of processors while allowing higher-level abstraction, Cg allows
122. ce for both vertex and fragment programs. This section describes the contents of the Cg Standard Library, including:
• Mathematical functions
• Geometric functions
• Texture map functions
• Derivative functions
• Predefined helper struct types
Where appropriate, functions are overloaded to support scalar and vector variations when the input and output types are the same.
Mathematical Functions
Table 1 lists the mathematical functions that the Cg Standard Library provides. The list includes functions useful for trigonometry, exponentiation, rounding, and vector and matrix manipulations, among others. All functions work on scalars and vectors of all sizes, except where noted.
Table 1. Mathematical Functions
Function: Description
abs(x): Absolute value of x.
acos(x): Arccosine of x in range [0, π], x in [−1, 1].
all(x): Returns true if every component of x is not equal to 0. Returns false otherwise.
any(x): Returns true if any component of x is not equal to 0. Returns false otherwise.
asin(x): Arcsine of x in range [−π/2, π/2]; x should be in [−1, 1].
atan(x): Arctangent of x in range [−π/2, π/2].
atan2(y, x): Arctangent of y/x in range [−π, π].
ceil(x): Smallest integer not less than x.
clamp(x, a, b): x clamped to the range [a, b] as follows:
  - Returns a if x is less than a.
  - Returns b if x is greater
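For reference, the clamp, all, and any entries in Table 1 can be written out in C for an n-component vector. These are hypothetical sketches of the documented behavior. For simplicity, clamp takes scalar bounds a and b, and the functions carry an `_n` suffix to make clear that they are not the Cg functions themselves.

```c
#include <stdbool.h>

/* clamp: a if x < a, b if x > b, x otherwise, applied per component. */
static void clamp_n(const float *x, float a, float b, float *out, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = x[i] < a ? a : (x[i] > b ? b : x[i]);
}

/* all: true if every component is non-zero. */
static bool all_n(const float *x, int n) {
    for (int i = 0; i < n; ++i)
        if (x[i] == 0.0f) return false;
    return true;
}

/* any: true if at least one component is non-zero. */
static bool any_n(const float *x, int n) {
    for (int i = 0; i < n; ++i)
        if (x[i] != 0.0f) return true;
    return false;
}
```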
123. cgGetFirstParameter and cgGetNextParameter allow you to iterate through all the parameters of a program that are within the scope of the context. Here is how those two functions would typically be used, given a valid program called program:
CGparameter parameter = cgGetFirstParameter(program, CG_PROGRAM);
while (parameter != 0) {
    // Here is the code that handles the parameter.
    parameter = cgGetNextParameter(parameter);
}
These functions don't give access to the fields of a structure parameter (type CG_STRUCT) or the elements of an array parameter (type CG_ARRAY).
To get access to the fields of a structure, you use cgGetFirstStructParameter along with cgGetNextParameter:
CGparameter cgGetFirstStructParameter(CGparameter parameter);
If parameter is not of type CG_STRUCT, cgGetFirstStructParameter returns zero.
To get access to the elements of an array, you use cgGetArrayDimension, cgGetArraySize, cgGetArrayParameter, and cgGetNextParameter:
int cgGetArrayDimension(CGparameter parameter);
int cgGetArraySize(CGparameter parameter, int dimension);
CGparameter cgGetArrayParameter(CGparameter parameter, int index);
These three functions return 0 if parameter is not of type CG_ARRAY. Function cgGetArrayDimension gives the dimension of the array. It returns 1 for float4 array[10], 2 for float4 array[10][100], and so on. Next, cg
124. ch and now even to exceed traditional workstations. The processing power provided by a modern GPU in a single frame rivals the amount of computation that used to be expended for an offline-rendered animation frame. Indeed, at the launch of GeForce3 on the Apple Macintosh, a convincing version of Pixar's Luxo Jr. was demonstrated running interactively in real time. At the 2001 SIGGRAPH conference, an interactive version of a more recent film, Square Studios' Final Fantasy, was shown running in real time, again on a GeForce.

Although these feats of computation are astounding, there is much more to come. Today's GPUs evolve very quickly. Typically, a product generation is only six months long, and each new product generation brings a twofold increase in performance. Graphics processor performance increases at approximately three times the rate of microprocessors (Moore's Law cubed).

In addition to the performance increases, each year brings new hardware features supported by new application programming interfaces (APIs). This dizzying pace is difficult for developers to adapt to, but adapt they must. Developers and users are demanding better rendering quality and more realistic imagery and experiences. Users don't care about the details; they simply want games and other interactive applications to look more like movies, special effects, and animation. Developers want more power (always more), along with more flexibility in controlling the massively
125.
    float3 finalColor = lerp(lightMetal, darkMetal, nvDecal.x);
    return float4(finalColor, 1);

MultiPaint

Description
MultiPaint presents a single-pass solution to a common production problem: mixing multiple kinds of materials on a single polygonal surface. MultiPaint provides a simple BRDF (bidirectional reflectance distribution function) that is still complex enough to represent many common metallic and dielectric surfaces, and it controls all key factors of the variable BRDF through texturing. This permits you to create multiple materials without switching shaders, splitting your model, or resorting to multiple passes.

Uses for MultiPaint might include complex armor built of inlaid metals, woods, and stones, all modeled on a single simple poly mesh; buildings composed of multiple types of stone, glass, and metal, expressed as simple cubes; cloth with inlaid metallic threads; or, as in this demo, metal partially covered with peeling paint.

Using multiple BRDFs is common in the offline world but rarely optimized: instead, two different shaders may be evaluated and their results blended using a mask texture, or chained through if statements. For maximum real-time performance, MultiPaint instead integrates all of the key parts of the BRDFs as multiple painted textures, so that only one pass through the shader is required to create the mixed appearance. This permit
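The last line of the shader fragment above blends two metal colors with lerp, using a texture channel (nvDecal.x) as the mask. Assuming the usual definition lerp(a, b, t) = a + t * (b - a), a minimal C sketch of that blend looks like this; the float3 struct and lerp3 are hypothetical names for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for Cg's float3. */
typedef struct { float r, g, b; } float3;

/* lerp(a, b, t) = a + t * (b - a): returns a when t == 0 and b when t == 1. */
float3 lerp3(float3 a, float3 b, float t)
{
    float3 out = { a.r + t * (b.r - a.r),
                   a.g + t * (b.g - a.g),
                   a.b + t * (b.b - a.b) };
    return out;
}
```

A mask value of 0 keeps the first material, 1 selects the second, and values in between mix them, which is what lets one pass represent several materials.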
126. Arrays are supported in Cg and are declared just as in C. Because Cg does not support pointers, arrays must always be defined using array syntax rather than pointer syntax:

    // Declare a function that accepts an array of five skinning matrices.
    returnType foo(float4x4 mymatrices[5]) { /* ... */ }

Basic profiles place substantial restrictions on array declaration and usage. General-purpose arrays can only be used as uniform parameters to a vertex program. The intent is to allow an application to pass arrays of skinning matrices and arrays of light parameters to a vertex program.

The most important difference from C is that arrays are first-class types. That means array assignments actually copy the entire array, and arrays that are passed as parameters are passed by value (the entire array is copied before making any changes) rather than by reference.

Statements and Operators
Cg supports the following types of statements and operators:
- Control flow
- Function definitions and function overloads
- Arithmetic operators from C
- Multiplication function
- Vector constructor
- Boolean and comparison operators
- Swizzle operator
- Write mask operator
- Conditional operator

Control Flow
Cg uses the following C control constructs:
- Function calls and the return stat
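Because C arrays decay to pointers when passed to functions, C code does not show the copy semantics described above by default. One way to emulate Cg's behavior in C, sketched below with the hypothetical FiveFloats type, is to wrap the array in a struct, since structs are passed and assigned by value.

```c
#include <assert.h>

/* Hypothetical wrapper type: a struct is copied when passed by value. */
typedef struct { float m[5]; } FiveFloats;

/* Receives a copy: the caller's array is untouched (Cg-like). */
float modify_copy(FiveFloats a)
{
    a.m[0] = 100.0f;
    return a.m[0];
}

/* Receives a pointer to the first element: the caller's array changes (C-like). */
void modify_in_place(float *a)
{
    a[0] = 100.0f;
}
```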
127. composed of two interfaces:
- Minimal interface: This interface makes no Direct3D calls itself and should be used when you prefer to keep the Direct3D code in the application itself.
- Expanded interface: This interface makes the Direct3D calls necessary to provide enhanced program and parameter management and should be used when you prefer to let the Cg runtime manage the Direct3D shaders.

Direct3D Minimal Interface
The minimal interface simply supplies convenient functions to convert some information provided by the core runtime into information specific to Direct3D.

Vertex Declaration
In Direct3D, you have to supply a vertex declaration that establishes a mapping between the vertex shader input registers and the data provided by the application as data streams. In Direct3D 9, this vertex declaration is bound to the current state the same way the vertex shader is (see the Direct3D 9 documentation on IDirect3DDevice9::CreateVertexDeclaration and IDirect3DDevice9::SetVertexDeclaration for a detailed explanation). In Direct3D 8, the vertex declaration is required at the time you create the vertex shader (for more information, see the Direct3D 8 documentation on IDirect3DDevice8::CreateVertexShader).

A data stream is basically an array of data structures. Each of those structures is of a particular type, called the vertex format of the stream. Here is an example of a vertex d
128. coordinates as the two floating-point values located at an offset equal to twice the size of a DWORD from the end of the normal data in stream 0. The tangents are provided in stream 1 as a second texture coordinate set that is found as the first three floating-point values of the vertex format.

To get a vertex declaration from a Cg vertex program for the Direct3D 9 Cg runtime, use cgD3D9GetVertexDeclaration:

    CGbool cgD3D9GetVertexDeclaration(CGprogram program,
                                      D3DVERTEXELEMENT9 declaration[MAXD3DDECLLENGTH]);

MAXD3DDECLLENGTH is a Direct3D 9 constant that gives the maximum length of a Direct3D 9 declaration. If no declaration can be derived from the program, cgD3D9GetVertexDeclaration fails and returns CG_FALSE.

To get a vertex declaration from a Cg vertex program for the Direct3D 8 Cg runtime, use cgD3D8GetVertexDeclaration:

    CGbool cgD3D8GetVertexDeclaration(CGprogram program,
                                      DWORD declaration[MAX_FVF_DECL_SIZE]);

MAX_FVF_DECL_SIZE is a Direct3D constant that gives the maximum length of a Direct3D declaration. If no declaration can be derived from the program, cgD3D8GetVertexDeclaration fails and returns CG_FALSE.

The declaration returned by cgD3D9GetVertexDeclaration or cgD3D8GetVertexDeclaration is for a single stream, so that for the following program:

    void main(in float4 position : POSITION,
              in float4 color : COLOR0,
              in float4 texCoord : TEXCOORD0,
              out f
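The stream layout described above can be checked with offsetof. This C sketch assumes a hypothetical Stream0Vertex struct holding three position floats, three normal floats, two DWORD colors (the diffuse and specular colors that the example declarations skip), and two texture coordinate floats, plus a hypothetical Stream1Vertex holding the tangent. uint32_t stands in for DWORD, and a 4-byte float is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C layout of stream 0; uint32_t stands in for DWORD. */
typedef struct {
    float    position[3];   /* first three floats */
    float    normal[3];     /* next three floats */
    uint32_t diffuse;       /* skipped by the declaration */
    uint32_t specular;      /* skipped by the declaration */
    float    texcoord[2];   /* two DWORDs after the end of the normal */
} Stream0Vertex;

/* Hypothetical C layout of stream 1 (tangent basis stream). */
typedef struct {
    float tangent[3];       /* first three floats of the vertex format */
} Stream1Vertex;

/* Offset of the texture coordinates, computed the way the text describes. */
size_t expected_texcoord_offset(void)
{
    return 6 * sizeof(float) + 2 * sizeof(uint32_t);
}
```

Because every member is 4-byte aligned, the compiler inserts no padding, so the struct offsets match the byte offsets a vertex declaration would describe.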
129. cs for Uniform Data

Table 13 summarizes the valid binding semantics for uniform parameters in the ps_2_0 and ps_2_x profiles.

Table 13. ps_2 Uniform Input Binding Semantics
Binding Semantics Name | Corresponding Data
register(s0)-register(s15), TEXUNIT0-TEXUNIT15 | Texture unit N, where N is in range [0..15]. May only be used with uniform inputs with sampler types.
register(c0)-register(c31), C0-C31 | Constant register N, where N is in range [0..31]. May only be used with uniform inputs.

Binding Semantics for Varying Input/Output Data

Table 14 summarizes the valid binding semantics for varying input parameters in the ps_2_0 and ps_2_x profiles.

Table 14. ps_2 Varying Input Binding Semantics
Binding Semantics Name | Corresponding Data | Type
COLOR0 | Input color 0 | float4
COLOR1 | Input color 1 | float4
TEXCOORD0-TEXCOORD7 | Input texture coordinates | float4

Table 15 summarizes the valid binding semantics for varying output parameters in the ps_2_0 and ps_2_x profiles.

Table 15. ps_2 Varying Output Binding Semantics
Binding Semantics Name | Corresponding Data | Type
COLOR, COLOR0 | Output color | float4
DEPTH | Output depth | float

Options
The ps_2_x profile allows the following profile-specific options:
- NumTemps=<n>, where 12 <= n <= 32 (default 32)
- NumInstructionSlots=<n>, where 96 <= n
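As an illustration of the ranges in Table 13, the following C sketch validates explicit register(...) bindings for ps_2_0/ps_2_x uniform inputs. The function name is hypothetical, it handles only the register(sN) and register(cN) spellings (not TEXUNITn or Cn), and it is not how the Cg compiler itself performs the check.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if `binding` is register(sN) with N in [0, 15] or
   register(cN) with N in [0, 31], per Table 13; returns 0 otherwise. */
int ps2_uniform_register_ok(const char *binding)
{
    char kind = 0;
    int index = -1;
    int consumed = 0;

    /* %n records how many characters were read, including the ')'.
       It is only reached if the closing parenthesis matched. */
    if (sscanf(binding, "register(%c%d)%n", &kind, &index, &consumed) != 2)
        return 0;
    if (consumed == 0 || binding[consumed] != '\0')
        return 0;   /* missing ')' or trailing characters */

    if (kind == 's') return index >= 0 && index <= 15;  /* texture units */
    if (kind == 'c') return index >= 0 && index <= 31;  /* constant registers */
    return 0;
}
```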
130. ctX 9 pixel shaders
    Runtime profiles: CG_PROFILE_PS_2_X, CG_PROFILE_PS_2_0
    Compiler options: -profile ps_2_x, -profile ps_2_0
- OpenGL ARB vertex programs
    Runtime profile: CG_PROFILE_ARBVP1
    Compiler option: -profile arbvp1
- OpenGL ARB fragment programs
    Runtime profile: CG_PROFILE_ARBFP1
    Compiler option: -profile arbfp1
- OpenGL NV30 vertex programs
    Runtime profile: CG_PROFILE_VP30
    Compiler option: -profile vp30
- OpenGL NV30 fragment programs
    Runtime profile: CG_PROFILE_FP30
    Compiler option: -profile fp30
- DirectX 8 vertex shaders
    Runtime profile: CG_PROFILE_VS_1_1
    Compiler option: -profile vs_1_1
- DirectX 8 pixel shaders
    Runtime profiles: CG_PROFILE_PS_1_3, CG_PROFILE_PS_1_2, CG_PROFILE_PS_1_1
    Compiler options: -profile ps_1_3, -profile ps_1_2, -profile ps_1_1
- OpenGL NV2X vertex programs
    Runtime profile: CG_PROFILE_VP20
    Compiler option: -profile vp20
- OpenGL NV2X fragment programs
    Runtime profile: CG_PROFILE_FP20
    Compiler option: -profile fp20

The DirectX 9 profiles (vs_2_x and ps_2_x), OpenGL ARB profiles (arbfp1 and arbvp1), and NV30 OpenGL profiles (fp30 and vp30) generally support longer, more complex programs and offer more features and functionality to the developer. These are referred to as advanced profiles. The DirectX 8 profiles (vs_1_1 and ps_1_3) and NV2X OpenGL profiles (fp20 and vp20) have more restrictions on program length and available features e
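The grouping in the last paragraph can be captured as a lookup on compiler-option profile names. This C sketch (profile_class is a hypothetical helper) returns -1 for profiles such as ps_2_0 or ps_1_1 that the paragraph does not explicitly classify, rather than guessing.

```c
#include <assert.h>
#include <string.h>

/* Profiles the text names as advanced. */
static const char *const advanced_profiles[] = {
    "vs_2_x", "ps_2_x", "arbvp1", "arbfp1", "vp30", "fp30"
};

/* Profiles the text names as having more restrictions. */
static const char *const restricted_profiles[] = {
    "vs_1_1", "ps_1_3", "fp20", "vp20"
};

/* Returns 1 for an advanced profile, 0 for a restricted one,
   and -1 for a name the text does not classify. */
int profile_class(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof advanced_profiles / sizeof advanced_profiles[0]; ++i)
        if (strcmp(name, advanced_profiles[i]) == 0) return 1;
    for (i = 0; i < sizeof restricted_profiles / sizeof restricted_profiles[0]; ++i)
        if (strcmp(name, restricted_profiles[i]) == 0) return 0;
    return -1;
}
```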
131. ction parameters are aliased by a function call. In Cg, the two parameters have separate storage in the function, whereas in C they would share storage. To reinforce this distinction, Cg uses a different syntax than C to declare function parameters that are modified:

    function blah1(out float x);    // x is output only
    function blah2(inout float x);  // x is input and output
    function blah3(in float x);     // x is input only
    function blah4(float x);        // x is input only (default, as in C)

Cg supports function overloading by the number of operands and by operand type. The choice of a function is made by matching one operand at a time, starting at the first operand. The formal language specification provides more details on the matching rules, but it is not normally necessary to study them, because overloading generally works in an intuitive manner. For example, the following code declares two versions of a function, one that takes two bool operands and one that takes two float operands:

    bool same(float a, float b) { return a == b; }
    bool same(bool a, bool b) { return a == b; }

Arithmetic Operators from C
Cg includes all the standard C arithmetic operators and allows the operators to be used on vectors as well as on scalars. The vector operations are always performed in elementwise fashion. For example:

    float3(a, b, c) * float3(A, B, C)

equals float3(a*A, b*B, c*C).

These operators can also be used in a form that mixe
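Elementwise semantics mean that float3 * float3 multiplies component by component; it is neither a dot product nor a cross product. A minimal C sketch with a hypothetical float3 struct and helper name:

```c
#include <assert.h>

/* Hypothetical stand-in for Cg's float3. */
typedef struct { float x, y, z; } float3;

/* Elementwise product, as Cg's * does for two float3 operands. */
float3 mul_elementwise(float3 a, float3 b)
{
    float3 r = { a.x * b.x, a.y * b.y, a.z * b.z };
    return r;
}
```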
132. d for statements are allowed only if the loops they define can be unrolled, because there is no branching in vs_1_1 shaders. There are no subroutine calls either, so all functions are inlined.

Comparison operators are allowed (>, <, >=, <=, ==, !=), and Boolean operators (||, &&, !) are allowed. However, the logic operators (&, |, ^, ~) are not allowed.

Using Arrays
Variable indexing of arrays is allowed as long as the array is a uniform constant. For compatibility reasons, arrays indexed with variable expressions need not be declared const, just uniform. However, writing to an array that is later indexed with a variable expression yields unpredictable results.

Array data is not packed, because vertex program indexing does not permit it. Each element of the array takes a single 4-float program parameter register. For example, float arr[10], float2 arr[10], float3 arr[10], and float4 arr[10] all consume ten program parameter registers.

It is more efficient to access an array of vectors than an array of matrices. Accessing a matrix requires a floor calculation followed by a multiply by a constant to compute the register index. Because vectors and scalars take one register, neither the floor nor the multiply is needed. It is faster to do matrix skinning using arrays of vectors with a premultiplied index than using arrays of matrices.

Constants
Literal constants can be used with this profile, but it is not possible to sto
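The register cost of unpacked arrays can be expressed as a small calculation. The C sketch below uses hypothetical helper names; packed_registers is included only to show what packing would have saved, since vs_1_1 does not pack array elements.

```c
#include <assert.h>

/* vs_1_1 does not pack arrays: each element of float, float2, float3,
   or float4 arr[length] occupies one 4-float parameter register. */
int vs11_array_registers(int length, int components)
{
    assert(components >= 1 && components <= 4);
    (void)components;           /* the vector width does not change the count */
    return length;
}

/* For comparison only: registers needed if elements WERE packed tightly. */
int packed_registers(int length, int components)
{
    int floats = length * components;
    return (floats + 3) / 4;    /* round up to whole 4-float registers */
}
```

For float arr[10], the unpacked layout costs 10 registers where tight packing would need only 3; for float4 arr[10] the two are equal.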
133.
    /* Load the program */
    cgGLLoadProgram(fragmentProgram);

    /* Grab some parameters */
    position = cgGetNamedParameter(vertexProgram, "position");
    color = cgGetNamedParameter(vertexProgram, "color");
    texCoord = cgGetNamedParameter(vertexProgram, "texCoord");
    modelViewMatrix = cgGetNamedParameter(vertexProgram, "ModelViewMatrix");
    baseTexture = cgGetNamedParameter(fragmentProgram, "BaseTexture");
    someColor = cgGetNamedParameter(fragmentProgram, "SomeColor");

    /* Set parameters that don't change. They can be set only once
       because of parameter shadowing. */
    cgGLSetTextureParameter(baseTexture, texture);
    cgGLSetParameter4fv(someColor, constantColor);
}

/* Called to render the scene */
void Display()
{
    /* Set the varying parameters */
    cgGLEnableClientState(position);
    cgGLSetParameterPointer(position, 3, GL_FLOAT, 0, vertexPositions);
    cgGLEnableClientState(color);
    cgGLSetParameterPointer(color, 1, GL_FLOAT, 0, vertexColors);
    cgGLEnableClientState(texCoord);
    cgGLSetParameterPointer(texCoord, 2, GL_FLOAT, 0, vertexTexCoords);

    /* Set the uniform parameters that change every frame */
    cgGLSetStateMatrixParameter(modelViewMatrix,
                                CG_GL_MODELVIEW_PROJECTION_MATRIX,
                                CG_GL_MATRIX_IDENTITY);
134. d within the same scope.
- Vector constructors, such as the form float4(1, 2, 3, 4), may be used anywhere in an expression.
- A struct definition automatically performs a corresponding typedef, as in C++.
- C++-style // comments are allowed in addition to C-style /* */ comments.

Detailed Language Specification

Definitions
The following definitions are based on the ANSI C standard:
- Object: An object is a region of data storage in the execution environment, the contents of which can represent values. When referenced, an object may be interpreted as having a particular type.
- Declaration: A declaration specifies the interpretation and attributes of a set of identifiers.
- Definition: A declaration that also causes storage to be reserved for an object, or code to be generated for a function named by an identifier, is a definition.

Profiles
Compilation of a Cg program (a top-level function) always occurs in the context of a compilation profile. The profile specifies whether certain optional language features are supported. These optional language features include certain control constructs and standard library functions. The compilation profile also defines the precision of the float, half, and fixed data types, and specifies whether the fixed and sampler data types are fully or only partially supported. The choice of a compilation profile is made externally to
135. ded version of simple.cg calculates diffuse and specular lighting for each vertex. Figure 4 shows a screenshot of the shader.

Figure 4. The simple.cg Shader

Program Listing for simple.cg
The following is the program listing for simple.cg:

    // Define inputs from application.
    struct appin
    {
        float4 Position : POSITION;
        float4 Normal   : NORMAL;
    };

    // Define outputs from vertex shader.
    struct vertout
    {
        float4 HPosition : POSITION;
        float4 Color     : COLOR;
    };

    vertout main(appin IN,
                 uniform float4x4 ModelViewProj,
                 uniform float4x4 ModelViewIT,
                 uniform float4 LightVec)
    {
        vertout OUT;

        // Transform vertex position into homogeneous clip space.
        OUT.HPosition = mul(ModelViewProj, IN.Position);

        // Transform normal from model space to view space.
        float3 normalVec = normalize(mul(ModelViewIT, IN.Normal).xyz);

        // Store normalized light vector.
        float3 lightVec = normalize(LightVec.xyz);

        // Calculate half angle vector.
        float3 eyeVec = float3(0.0, 0.0, 1.0);
        float3 halfVec = normalize(lightVec + eyeVec);

        // Calculate diffuse component.
        float diffuse = dot(normalVec, lightVec);

        // Calculate specular component.
        float specular = dot(normalVec, halfVec);

        // Use the lit function to compute lighting vector from
        // diffuse and specular values.
        float4 lighting = lit(diffuse, specular, 32);

        // Blue
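The first statement in main, mul(ModelViewProj, IN.Position), treats the position as a column vector, so each output component is the dot product of one matrix row with the input. A C sketch of that product, assuming hypothetical float4 and float4x4 structs with row i stored in m[i]:

```c
#include <assert.h>

/* Hypothetical stand-ins for Cg's float4 and float4x4 (row i in m[i]). */
typedef struct { float v[4]; } float4;
typedef struct { float m[4][4]; } float4x4;

/* mul(M, p) with p treated as a column vector: out[i] = dot(row i, p). */
float4 mul_m4v4(const float4x4 *M, float4 p)
{
    float4 out;
    for (int i = 0; i < 4; ++i) {
        out.v[i] = 0.0f;
        for (int j = 0; j < 4; ++j)
            out.v[i] += M->m[i][j] * p.v[j];
    }
    return out;
}
```

With a translation matrix that stores the offset in the last column, a point with w = 1 is moved by that offset, which is the behavior a model-view-projection transform relies on.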
136. describes where to find the necessary vertex attributes in the vertex streams. See "Expanded Interface Program Execution" on page 74 for the details on the arguments to cgD3D8LoadProgram and cgD3D9LoadProgram. In OpenGL, the equivalent call is:

    cgGLLoadProgram(program);

Modifying Program Parameters
The runtime gives you the option of modifying the values of your program parameters. The first step is to get a handle to the parameter:

    CGparameter myParameter = cgGetNamedParameter(program, "myParameter");

The string "myParameter" is the name of the parameter as it appears in the program source code.

The second step is to set the parameter value. The function used depends on the parameter type. Here is an example in OpenGL:

    cgGLSetParameter4fv(myParameter, value);

Here is the same example in Direct3D:

    cgD3D9SetUniform(myParameter, value);

These function calls assign the four floating-point values contained in the array value to the parameter myParameter, which is assumed to be of type float4. In both APIs, there are variants of these calls to set matrices, arrays, textures, and texture states.

Executing a Program
Before you can execute a program in OpenGL, you must enable its corresponding profile:

    cgGLEnableProfile(CG_PROFILE_ARBVP1);

In Direct3D, nothing explicitly needs to be done to enable a specific profile. Next, you bind the program
137. diffuse material.
        float3 diffuseMaterial = float3(0.0, 0.0, 1.0);

        // White specular material.
        float3 specularMaterial = float3(1.0, 1.0, 1.0);

        // Combine diffuse and specular contributions and
        // output final vertex color.
        OUT.Color.rgb = lighting.y * diffuseMaterial +
                        lighting.z * specularMaterial;
        OUT.Color.a = 1.0;

        return OUT;
    }

Definitions for Structures with Varying Data
The first thing to notice is the definitions of structures with binding semantics for varying data. Let's take a look at the appin structure:

    // Define inputs from application.
    struct appin
    {
        float4 Position : POSITION;
        float4 Normal   : NORMAL;
    };

This structure contains only two members: Position and Normal. Because this data varies per vertex, the binding semantics POSITION and NORMAL tell the compiler that the position information is associated with the predefined attribute POSITION and that the normal information is associated with the predefined attribute NORMAL.

The other structure that is defined in simple.cg is vertout, which connects the vertex to the fragment:

    // Define outputs from vertex shader.
    struct vertout
    {
        float4 HPosition : POSITION;
        float4 Color     : COLOR;
    };

The vertout structure also contains only two members: HPosition, the vertex position in homogeneous coordinates, and Color, the vertex color. Again, binding semantics are used to specify register locations for the va
138.
        // DIFFUSE
        ...
        // FRESNEL
        ...
        // This helps make the clear coat less omnipresent: only the
        // really perceptually bright areas reflect the most.
        ...
        // Show more of the specular reflection environment when in
        // Fresnel zones.
        ...
        // SPECULAR
        ...
        // OUTPUT
        return paintResult.xyzz;
    }

Basic Profile Sample Shaders
This chapter provides a set of basic profile sample shaders written in Cg. Each shader comes with an accompanying snapshot, description, and source code. Examples shown are:
- Anisotropic Lighting
- Bump Dot3x2 Diffuse and Specular
- Bump Reflection Mapping
- Fresnel
- Grass
- Refraction
- Shadow Mapping
- Shadow Volume Extrusion
- Sine Wave Demo
- Matrix Palette Skinning

Anisotropic Lighting

Description
The anisotropic lighting effect (Figure 13) shows the vertex program's half-angle vector calculation. It use
139.
        // Transform it into eye space.
        float3 n;
        n[0] = dot(In.tangentToEyeMat0.xyz, tangentSpaceNormal);
        n[1] = dot(In.tangentToEyeMat1, tangentSpaceNormal);
        n[2] = dot(In.tangentToEyeMat2, tangentSpaceNormal);
        n = normalize(n);

        // Compute the lighting equation.
        ...

        // Compute oil, sheen, and subsurface scattering contributions.
        float4 oil;
        float4 sheen;
        float4 subsurf;

        // Compute Fresnel at sheen layer (ramp it up a bit).
        ...

        // Compute the refracted light ray and the refraction coefficient.
        ...

        // For oil contribution, modulate the oiliness mask by a specular term.
        ...

        // For sheen contribution, modulate Fresnel term by sheen color times
        // specular. Modulate by additional diffuse term to soften it a bit.
        ...

        // Compute single scattering approximation to subsurface scatter
140.
          cgD3D9ResourceToDeclUsage(cgGetParameterResource(color)),
          cgGetParameterResourceIndex(color) },
        { 1, 4 * sizeof(float), D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT,
          cgD3D9ResourceToDeclUsage(cgGetParameterResource(texCoord)),
          cgGetParameterResourceIndex(texCoord) },
        D3DDECL_END()
    };

    DWORD declaration[] = {
        D3DVSD_STREAM(0),
        D3DVSD_REG(cgD3D8ResourceToInputRegister(cgGetParameterResource(position)),
                   D3DVSDT_FLOAT3),
        D3DVSD_REG(cgD3D8ResourceToInputRegister(cgGetParameterResource(color)),
                   D3DVSDT_D3DCOLOR),
        D3DVSD_STREAM(1),
        D3DVSD_SKIP(4),
        D3DVSD_REG(cgD3D8ResourceToInputRegister(cgGetParameterResource(texCoord)),
                   D3DVSDT_FLOAT2),
        D3DVSD_END()
    };

The size specified as the second argument of the D3DVSD_REG macro call of a Direct3D 8 declaration does not need to match the size of the corresponding parameter for the vertex declaration to be valid. Those sizes are specified to describe how the data is laid out in the streams, not to perform any type checking against the shader code. The data referred to by a D3DVSD_REG macro call is expanded to the four floating-point values of the corresponding hardware register, and the missing values are set to 0 for x, y, and z and to 1 for w.

Minimal Interface Type Retrieval
Use cgD3D9TypeToSize to retrieve the size of a CGtype enumerated type in terms of floating-point numbers:

    DWORD cgD3D9TypeToSize(CGtype type);
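The default-filling rule described above can be sketched in C as follows. expand_to_register is a hypothetical helper that only illustrates the rule; the expansion itself is done by Direct3D, not by application code.

```c
#include <assert.h>

/* Expands `count` (0..4) supplied floats to a full 4-float register,
   filling missing components with 0 for x, y, z and 1 for w. */
void expand_to_register(const float *src, int count, float out[4])
{
    const float defaults[4] = { 0.0f, 0.0f, 0.0f, 1.0f };
    assert(count >= 0 && count <= 4);
    for (int i = 0; i < 4; ++i)
        out[i] = (i < count) ? src[i] : defaults[i];
}
```

For example, a FLOAT2 texture coordinate (u, v) arrives in the shader as (u, v, 0, 1), and a FLOAT3 position (x, y, z) arrives as (x, y, z, 1).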
141. e Extrusion

Vertex Shader Source Code for Shadow Volume Extrusion

    struct appdata
    {
        float4 Position     : POSITION;
        float3 Normal       : NORMAL;
        float4 DiffuseColor : COLOR0;
        float2 TexCoord0    : TEXCOORD0;
    };

    struct vpconn
    {
        float4 Hpos      : POSITION;
        float4 Color0    : COLOR0;
        float2 TexCoord0 : TEXCOORD0;
    };

    vpconn main(appdata IN,
                uniform float4x4 WorldViewProj,
                uniform float4 LightPos,   // in object space
                uniform float4 Fatness,
                uniform float4 ShadowExtrudeDist,
                uniform float4 Factors)
    {
        vpconn OUT;

        // Create normalized vector from vertex to light.
        float4 light_to_vert = normalize(IN.Position - LightPos);

        // N dot L to decide if point should be moved away
        // from the light to extrude the volume.
        float ndotl = dot(-light_to_vert.xyz, IN.Normal.xyz);

        // Inset the position along the normal vector direction.
        // This moves the shadow volume points inside the model
        // slightly to minimize popping of shadowed areas as each
        // facet comes in and out of shadow.
        // The Fatness value should be negative.
        ...

        // Scale the vector from light to vertex.
        ...

        // If ndotl < 0, then the vertex faces
        // away from the light, so move it
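The comments in the shader describe the core idea of shadow volume extrusion: vertices whose normals face away from the light are pushed further away from it. This C sketch illustrates only that idea under simplifying assumptions: vec3, dot3, and extrude_vertex are hypothetical, the push direction (P - L) is not normalized, and the Fatness inset and projection steps of the actual shader are omitted.

```c
#include <assert.h>

/* Hypothetical helper type. */
typedef struct { float x, y, z; } vec3;

float dot3(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* If the normal N faces away from the light L (N . (L - P) < 0),
   push P along the light-to-vertex direction (P - L) scaled by
   `extrude`; otherwise leave P where it is. */
vec3 extrude_vertex(vec3 P, vec3 N, vec3 L, float extrude)
{
    vec3 to_light = { L.x - P.x, L.y - P.y, L.z - P.z };
    if (dot3(N, to_light) >= 0.0f)
        return P;                       /* faces the light: keep */
    vec3 out = { P.x + extrude * (P.x - L.x),
                 P.y + extrude * (P.y - L.y),
                 P.z + extrude * (P.z - L.z) };
    return out;
}
```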
142. e Qualifiers on page 175.
- default is an expression that resolves to a constant at compile time. Default values are only permitted for uniform parameters and for in parameters to functions that are not top-level.

Function Calls
A function call returns an rvalue. Therefore, if a function returns an array, the array may be read but not written. For example, the following is allowed:

    y = myfunc(x)[2];

But this is not:

    myfunc(x)[2] = y;

For multiple function calls within an expression, the calls can occur in any order; the order is undefined.

Types
Cg's types are as follows:
- The int type is preferably 32-bit two's complement. Profiles may optionally treat int as float.
- The float type is as close as possible to the IEEE single-precision (32-bit) floating point. Profiles must support the float data type.
- The half type is lower-precision IEEE-like floating point. Profiles must support the half type, but may choose to implement it with the same precision as the float type.
- The fixed type is a signed type with a range of at least [-2, 2) and with at least 10 bits of fractional precision. Overflow operations on the data type clamp rather than wrap. Fragment profiles must support the fixed type, but may implement it with the same precision as the half or float types. Vertex profiles are required to provide partial support (see "Partial Support of Types" on page 173) for the fixed type. Vertex profiles have the o
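To illustrate the minimum guarantees of the fixed type, this C sketch quantizes values to 10 fractional bits and clamps them to [-2, 2). The helper names are hypothetical, round-to-nearest is an assumption, and real profiles may provide more range or precision than this minimum.

```c
#include <assert.h>

#define FIXED_ONE 1024L                 /* 2^10 steps per unit */
#define FIXED_MIN (-2L * FIXED_ONE)     /* -2.0 */
#define FIXED_MAX (2L * FIXED_ONE - 1)  /* 2.0 - 1/1024 */

/* Quantizes x to the minimal fixed format, clamping on overflow. */
double to_fixed(double x)
{
    double scaled = x * FIXED_ONE;
    long q;
    /* Clamp first so the conversion to long cannot overflow. */
    if (scaled <= (double)FIXED_MIN) return (double)FIXED_MIN / FIXED_ONE;
    if (scaled >= (double)FIXED_MAX) return (double)FIXED_MAX / FIXED_ONE;
    q = (scaled >= 0.0) ? (long)(scaled + 0.5) : -(long)(-scaled + 0.5);
    return (double)q / FIXED_ONE;
}

/* Addition saturates rather than wrapping around. */
double fixed_add(double a, double b)
{
    return to_fixed(to_fixed(a) + to_fixed(b));
}
```

Adding 1.5 and 1.5 yields the largest representable value, 2047/1024, instead of wrapping to a negative number.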
143. e Steps to High Performance Cg .... 257
Appendix D Cg Compiler Options .... 265

Figures and Tables

List of Figures
Figure 1 Cg's Model of the GPU .... 2
Figure 2 The Parts of the Cg Runtime API .... 31
Figure 3 The Cg_Simple Workspace .... 89
Figure 4 The simple.cg Shader .... 90
Figure 5 Example of Improved Skinning .... 98
Figure 6 Example of Improved Water .... 101
Figure 7 Example of Melting Paint .... 105
Figure 8 Example of MultiPaint .... 109
Figure 9 Example of Ray Traced Refraction .... 114
Figure 10 Example of Skin .... 119
Figure 11 Example of Thin Film Effect .... 124
Figure 12 Example of Car Paint 9 .... 127
Figure 13 Example of Anisotropic Lighting .... 134
Figure 14 Example of Bump Dot3x2 Diffuse and Specular .... 136
Figure 15 Example of Bump Reflection Mapping .... 140
Figure 16 Example of Fresnel .... 144
Figure 17 Example of Grass .... 146
Figure 18 Exampl
144. e desired scalar value, not just the x component.

Table 37 summarizes the valid binding semantics for uniform parameters in the ps_1_x profiles.

Table 37. ps_1_x Uniform Input Binding Semantics
Binding Semantics Name | Corresponding Data
register(s0)-register(s3), TEXUNIT0-TEXUNIT3 | Texture unit N, where N is in range [0..3]. May be used only with uniform inputs with sampler types.
register(c0)-register(c7), C0-C7 | Constant register [0..7]

Binding Semantics for Varying Input/Output Data
The varying input binding semantics in the ps_1_x profiles are the same as the varying output binding semantics of the vs_1_1 profile. Varying input binding semantics in the ps_1_x profiles consist of COLOR0, COLOR1, TEXCOORD0, TEXCOORD1, TEXCOORD2, and TEXCOORD3. These map to output registers in DirectX vertex shaders.

Table 38 summarizes the valid binding semantics for varying input parameters in the ps_1_x profiles.

Table 38. ps_1_x Varying Input Binding Semantics
Binding Semantics Name | Corresponding Data
COLOR, COLOR0, COL, COL0 | Input color value v0
COLOR1, COL1 | Input color value v1
TEXCOORD0-TEXCOORD3, TEX0-TEX3 | Input texture coordinates t0-t3

Additionally, the ps_1_x profiles allow POSITION, FOG, PSIZE, TEXCOORD4, TEXCOORD5, TEXCOORD6, and TEXCOORD7 to be specified on varying inputs, provided these inputs are not r
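Table 38 amounts to a lookup from semantic name to register. This C sketch (ps1x_input_register is a hypothetical helper) returns NULL for names the table does not list, including TEXCOORD4 through TEXCOORD7, whose use is subject to the additional condition in the last sentence.

```c
#include <assert.h>
#include <string.h>

/* Maps a ps_1_x varying-input semantic name to the register listed
   in Table 38, or returns NULL if the name is not in that table. */
const char *ps1x_input_register(const char *semantic)
{
    static const struct { const char *name; const char *reg; } table[] = {
        { "COLOR", "v0" }, { "COLOR0", "v0" }, { "COL", "v0" }, { "COL0", "v0" },
        { "COLOR1", "v1" }, { "COL1", "v1" },
        { "TEXCOORD0", "t0" }, { "TEX0", "t0" },
        { "TEXCOORD1", "t1" }, { "TEX1", "t1" },
        { "TEXCOORD2", "t2" }, { "TEX2", "t2" },
        { "TEXCOORD3", "t3" }, { "TEX3", "t3" },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i)
        if (strcmp(semantic, table[i].name) == 0)
            return table[i].reg;
    return NULL;
}
```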
145. e is reduced gradually at every level, such that in the distance the flecks are pointing mostly up. The flecks' specular power and their contribution are reduced by distance to give a grainier appearance up close and a more uniform appearance from afar.

Next, the view vector is reflected off a wavy normal map, which represents the object's natural undulations, to index into the environment map. The shininess of the clear coat itself is calculated by scaling the Fresnel term by the luminance of the environment map (1). Finally, the shader lerps between the diffuse paint color and the reflection based on the Fresnel term and adds the specular highlights.

Figure 12. Example of Car Paint 9

(1) The luminance transfer function selects only the perceptually bright areas of the environment map in order not to reflect the darker areas of the scene.

Vertex Shader Source Code for Car Paint 9

    // This shader is based on the Time Machine temporal rust shader.
    // Car paint data was measured by Cornell University from samples
    // provided by Ford Motor Company.

    struct a2v
    {
        float4 OPosition;
        float3 ONormal;
        float2 uv;
        float3 Tangent;
        float3 Binormal;
        float3 Normal;
    };

    struct VS_OUTPUT
    {
        float4 ...;
        float2 ...;
        float3 ...;
        float4 ...;
        float3 ...;
        float4 ...;
        float3 ...;
        float3 ...;
        ...
        float ...;
    };

    VS_OUTPUT main(// TRANSFORMATIONS
                   uniform float4x4 ...,
                   uniform float4x4 ...,
                   uniform
146. e of Refraction .... 149
Figure 19 Example of Shadow Mapping .... 152
Figure 20 Example of Shadow Volume Extrusion .... 155
Figure 21 Example of Sine Wave .... 158
Figure 22 Example of Matrix Palette Skinning .... 161

List of Tables
Table 1 Mathematical Functions .... 20
Table 2 Geometric Functions .... 24
Table 3 Texture Map Functions .... 25
Table 4 Derivative Functions .... 27
Table 5 Debugging Function .... 28
Table 6 Type Conversions .... 177
Table 7 Expanded Operators .... 188
Table 8 Vertex Output Binding Semantics .... 193
Table 9 Fragment Output Binding Semantics .... 193
Table 10 vs_2 Uniform Input Binding Semantics .... 198
Table 11 vs_2 Varying Input Binding Semantics .... 198
Table 12 vs_2 Varying Output Binding Semantics .... 199
Table 13 ps_2 Uniform Input Binding Semantics .... 202
Table 14 ps_2 Varying Input Binding Semantics .... 202
Table 15 ps_2 Varying Output Binding
147. e previously computed dot products. The returned vector holds the diffuse lighting contribution in the y coordinate and the specular lighting contribution in the z coordinate.

Remember to take advantage of the Standard Library to help speed up your development cycle.

Modulating the Diffuse and Specular Lighting Contributions
Once the diffuse and specular lighting contributions (lighting.y and lighting.z) have been calculated, we need to modulate them with the object's material properties:

    // Blue diffuse material.
    float3 diffuseMaterial = float3(0.0, 0.0, 1.0);

    // White specular material.
    float3 specularMaterial = float3(1.0, 1.0, 1.0);

    // Combine diffuse and specular contributions and
    // output final vertex color.
    OUT.Color.rgb = lighting.y * diffuseMaterial +
                    lighting.z * specularMaterial;
    OUT.Color.a = 1.0;

    return OUT;

We define the object's diffuse material color as blue and its specular material color as white. We modulate the lighting contributions with the material properties to get the final vertex color, and we assign it to the output structure's color field, OUT.Color. Finally, we set the alpha channel of the final color to 1.0 so that our object will be opaque, and return the computed position and color values stored in the OUT structure.

Further Experimentation
Use simple.cg as a framework to try more advanced experiments, perhaps by adding more parameters to the program or by performing more complex calculations in the vertex program. Have fun experimen
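The lit call and the material modulation can be sketched in C. lit_sketch follows the Cg Standard Library definition of lit (x = 1; y = n.l, or 0 if n.l < 0; z = 0 if either dot product is negative, otherwise (n.h)^m; w = 1), but it assumes an integral exponent such as the 32 used in simple.cg so that no math library is needed. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-ins for Cg's float4 and float3. */
typedef struct { float x, y, z, w; } float4;
typedef struct { float r, g, b; } float3;

/* Integer power by repeated multiplication (assumes m >= 0). */
static float powi(float base, int m)
{
    float result = 1.0f;
    while (m-- > 0) result *= base;
    return result;
}

/* lit(ndotl, ndoth, m) with an integral exponent m. */
float4 lit_sketch(float ndotl, float ndoth, int m)
{
    float4 r;
    r.x = 1.0f;                                   /* ambient */
    r.y = (ndotl < 0.0f) ? 0.0f : ndotl;          /* diffuse */
    r.z = (ndotl < 0.0f || ndoth < 0.0f) ? 0.0f   /* specular */
                                         : powi(ndoth, m);
    r.w = 1.0f;
    return r;
}

/* OUT.Color.rgb = lighting.y * diffuseMaterial + lighting.z * specularMaterial */
float3 shade(float4 lighting, float3 diffuseMaterial, float3 specularMaterial)
{
    float3 c = { lighting.y * diffuseMaterial.r + lighting.z * specularMaterial.r,
                 lighting.y * diffuseMaterial.g + lighting.z * specularMaterial.g,
                 lighting.y * diffuseMaterial.b + lighting.z * specularMaterial.b };
    return c;
}
```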
148. e standard library documentation for descriptions of these functions.

Table 35. Supported Standard Library Functions
dot(floatN, floatN)
lerp(floatN, floatN, floatN)
lerp(floatN, floatN, float)
tex1D(sampler1D, float)
tex1D(sampler1D, float2)
tex1Dproj(sampler1D, float2)
tex1Dproj(sampler1D, float3)
tex2D(sampler2D, float2)
tex2D(sampler2D, float3)
tex2Dproj(sampler2D, float3)
tex2Dproj(sampler2D, float4)
tex3D(sampler3D, float3)
tex3Dproj(sampler3D, float4)
texCUBE(samplerCUBE, float3)
texCUBEproj(samplerCUBE, float4)

Note: The non-projective texture lookup functions are actually done as projective lookups on the underlying hardware. Because of this, the w component of the texture coordinates passed to these functions from the application or vertex program must contain the value 1.

Texture coordinate parameters for projective texture lookup functions must have swizzles that match the swizzle done by the generated texture addressing instruction. While this may seem burdensome, it is intended to allow ps_1_x profile programs to behave correctly under other pixel shader profiles.

Table 36 lists the swizzles required on the texture coordinate parameter to the projective texture lookup functions.

Table 36. Required Projective Texture Lookup Swizzles
Te
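The reason the w component must be 1 follows from what a projective lookup does: it divides the coordinates by the last component before sampling. A C sketch of that division for a 2D lookup (project2 is a hypothetical helper; the hardware performs this step itself):

```c
#include <assert.h>

/* Projective 2D lookup coordinates: (s/q, t/q). A non-projective lookup
   done on projective hardware gives the intended result only when q == 1. */
void project2(float s, float t, float q, float out[2])
{
    out[0] = s / q;
    out[1] = t / q;
}
```

If the application left q at some other value, a call that looks like a plain tex2D would silently sample at scaled coordinates.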
149. [...]
        return result;
    }

    float4 main(fragin In,
                uniform sampler2D tex0,
                uniform sampler2D tex1,
                uniform sampler2D tex2,
                uniform sampler2D tex3,
                uniform float3 eyeSpaceLightPosition,
                uniform float thickness,
                uniform float4 ambient) : COLOR
    {
        [...]
        float bscale = In.tangentToEyeMat0.w;
        [...]  // ratio of indices of refraction (air/skin)
        [...]  // specular exponent
        [...]  // skin colors
        [...]  // albedo
        [...]  // oiliness mask

        // Get eye-space eye vector
        float3 v = normalize(-In.eyeSpacePosition);

        // Get eye-space light and half-angle vectors
        float3 l = normalize(eyeSpaceLightPosition - In.eyeSpacePosition);
        [...]

        // Get tangent-space normal vector from normal map
        float3 tangentSpaceNormal = [...] bscale [...] tex2D(tex0, In.texcoor
150. eclaration for Direct3D 9:

    const D3DVERTEXELEMENT9 declaration[] = {
        { 0, 0 * sizeof(float),
          D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT,
          D3DDECLUSAGE_POSITION, 0 },    // Position
        { 0, 3 * sizeof(float),
          D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT,
          D3DDECLUSAGE_NORMAL, 0 },      // Normal
        { 0, 8 * sizeof(float),
          D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT,
          D3DDECLUSAGE_TEXCOORD, 0 },    // Base texture
        { 1, 0 * sizeof(float),
          D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT,
          D3DDECLUSAGE_TEXCOORD, 1 },    // Tangent
        D3DDECL_END()
    };

Here is an example of a vertex declaration for Direct3D 8:

    const DWORD declaration[] = {
        D3DVSD_STREAM(0),
        D3DVSD_REG(D3DVSDE_POSITION, D3DVSDT_FLOAT3),   // Position
        D3DVSD_REG(D3DVSDE_NORMAL, D3DVSDT_FLOAT3),     // Normal
        D3DVSD_SKIP(2),  // Skip the diffuse and specular color
        D3DVSD_REG(D3DVSDE_TEXCOORD0, D3DVSDT_FLOAT2),  // Base texture
        D3DVSD_STREAM(1),  // Tangent basis stream
        D3DVSD_REG(D3DVSDE_TEXCOORD1, D3DVSDT_FLOAT3),  // Tangent
        D3DVSD_END()
    };

Both declarations tell the Direct3D runtime to find (1) the positions of the vertices in stream 0 as the first three floating-point values of the vertex format, (2) the normals in stream 0 as the next three floating-point values following the positions, and (3) the texture
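D3DVSD_SKIP(2) skips two DWORDs, which hold the packed diffuse and specular colors. As a result, the base texture coordinates start 8 floats (32 bytes) into each stream-0 vertex. A C struct makes that offset easy to check. Stream0Vertex is a hypothetical application-side type, and the check assumes that float and uint32_t are both 4 bytes with no padding between members.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU-side layout of one stream-0 vertex. */
typedef struct {
    float    position[3];  /* POSITION */
    float    normal[3];    /* NORMAL */
    uint32_t diffuse;      /* packed color, skipped by D3DVSD_SKIP(2) */
    uint32_t specular;     /* packed color, skipped by D3DVSD_SKIP(2) */
    float    texcoord[2];  /* base texture coordinates */
} Stream0Vertex;

/* Byte offset of the base texture coordinates within one vertex. */
static size_t texcoord_offset(void)
{
    return offsetof(Stream0Vertex, texcoord);
}
```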
151. ed Water

Vertex Shader Source Code for Improved Water

    struct app2vert {
        float4 Position : POSITION;
    };

    struct vert2frag {
        float4 HPosition : POSITION;
        float4 TexCoord0 : TEXCOORD0;
        float4 TexCoord1 : TEXCOORD1;
        float4 Color0    : COLOR0;
        float4 Color1    : COLOR1;
    };

    void calcWave(out float disp,
                  out float2 normal,
                  float dampening,
                  float3 viewPosition,
                  float waveTime,
                  float height,
                  float frequency,
                  float2 waveDirection)
    {
        float distance1 = dot(viewPosition.xy, waveDirection);
        distance1 = frequency * distance1 + waveTime;
        disp = height * sin(distance1) / dampening;
        normal = cos(distance1) * height * frequency *
                 waveDirection.xy / (4 * dampening);
    }

    vert2frag main(app2vert IN,
                   uniform float4x4 ModelViewProj,
                   uniform float4x4 ModelView,
                   uniform float4x4 ModelViewIT,
                   uniform float4x4 TextureMat,
                   uniform float Time,
                   uniform float4 Wave1,
                   uniform float4 Wave1Origin,
                   uniform float4 Wave2,
                   uniform float4 Wave2Origin,
                   const uniform float4 WaveData[5])
    {
        vert2frag OUT;
        float4 position = float4(IN.Position.x, 0, [...]);
        float4 normal = float4(0, 1, 0, 0);
        float dampening = 1 + dot(position.xyz, position.xyz) / 1000;
        [...]
        float2 norm = [...];
        [...]
            float waveTime = Time.x * WaveData[i].z;
            float frequency = WaveData[i].z;
            float height = WaveData[i]
152. ed for any struct containing arrays.

Minimum Array Requirements

Profiles are required to provide partial support for certain kinds of arrays. This partial support is designed to support vectors and matrices in all profiles. For vertex profiles, it is additionally designed to support arrays of light state (indexed by light number) passed as uniform parameters, and arrays of skinning matrices passed as uniform parameters.

Profiles must support subscripting, copying, and swizzling of vectors and matrices. However, subscripting with run-time computed indices is not required to be supported.

Vertex profiles must support the following operations for any nonpacked array that is a uniform parameter to the program, or is an element of a structure that is a uniform parameter to the program. This requirement also applies when the array is indirectly a uniform program parameter (that is, it and/or the structure containing it has been passed via a chain of in function parameters). The two operations that must be supported are:
• Rvalue subscripting by a run-time computed value or a compile-time value.
• Passing the entire array as a parameter to a function, where the corresponding formal function parameter is declared as in.

The following operations are explicitly not required to be supported:
• Lvalue subscripting.
• Copying.
• Other operators, including multiply, add, compare, and so on.
153. eferenced. This allows Cg programs to have the same structure specify the varying output of a vs_1_1 profile program and the varying input of a ps_1_X profile program.

Table 39 summarizes the valid binding semantics for varying output parameters in the ps_1_X profile.

Table 39 ps_1_X Varying Output Binding Semantics
    Binding Semantics Name      Corresponding Data
    COLOR, COLOR0, COL, COL0    Output color (float4)
    DEPTH, DEPR                 Output depth (float)

The output depth value is special in that it may only be assigned a value in the ps_1_3 profile, and it must be of the form:

    float4 t = <texture addressing operation>;
    float z = dot(texCoord<n>, t.xyz);
    float w = dot(texCoord<n+1>, t.xyz);
    depth = z / w;

Auxiliary Texture Functions

The texture addressing instructions in DirectX pixel shader 1.X have limited capabilities. For this reason, these profiles provide a set of auxiliary functions that express the functionality of the more complex texture addressing instructions. These functions are merely provided as a convenience for writing ps_1_X Cg programs. The same result can be achieved by writing the expanded form of each function directly, and the expanded form has the additional advantage of being supported on other profiles. Table 40 summarizes these functions.

Table 40 ps_1_X Auxiliary Texture Functions
Texture Function Description
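In C, the arithmetic of the required output-depth form reduces to two dot products and a divide. ps13_depth() is a hypothetical helper that only mirrors the math. On the hardware, t comes from the texture addressing operation and the two vectors come from consecutive texture coordinate sets.

```c
typedef struct { float x, y, z; } vec3;

static float dot3(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* z = dot(texCoord<n>, t.xyz), w = dot(texCoord<n+1>, t.xyz),
   depth = z / w. */
static float ps13_depth(vec3 texCoordN, vec3 texCoordN1, vec3 t)
{
    float z = dot3(texCoordN, t);
    float w = dot3(texCoordN1, t);
    return z / w;
}
```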
154. ement
• if/else
• while
• for

These control constructs require that their conditional expressions be of type bool. Because Cg expressions like i < 3 are of type bool, this change from C is normally not apparent.

The vs_2_x and vp30 profiles support branch instructions, so for and while loops are fully supported in these profiles. In other profiles, for and while loops may only be used if the compiler can fully unroll them, that is, if the compiler can determine the iteration count at compile time. Likewise, in these profiles, return can only appear as the last statement in a function.

Function recursion and co-recursion are forbidden in Cg.

The switch, case, and default keywords are reserved, but they are not supported by any profiles in the current release of the Cg compiler.

Function Definitions and Function Overloading

To pass a modifiable function parameter in C, the programmer must explicitly use pointers. C++ provides a built-in pass-by-reference mechanism that avoids the need to explicitly use pointers, but this mechanism still implicitly assumes that the hardware supports pointers. Cg must use a different mechanism because the vertex and fragment hardware of the GPU does not support the use of pointers.

Cg passes modifiable function parameters by value/result instead of by reference. The difference between these two methods is subtle; it is only apparent when two fun
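The difference between value/result and by-reference passing shows up when the same variable is passed for two modifiable parameters. Cg itself has no pointers, so this C sketch simulates both schemes explicitly with hypothetical function names. The left-to-right copy-out order is an assumption made for illustration. In this example, either order gives the same result.

```c
/* By reference: both parameters alias the caller's variable. */
static void add_one_each_by_reference(int *a, int *b)
{
    *a += 1;
    *b += 1;
}

/* Value/result (copy-in, copy-out): work on local copies, then copy
   the results back to the caller when the function returns. */
static void add_one_each_value_result(int *a, int *b)
{
    int la = *a, lb = *b;  /* copy in */
    la += 1;
    lb += 1;
    *a = la;               /* copy out (order assumed left to right) */
    *b = lb;
}
```

Passing the same variable x = 0 for both parameters yields 2 by reference but 1 by value/result, because each copy-out overwrites x with a value computed from the original 0.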
155. eter profileType is equal to CG_GL_VERTEX or CG_GL_FRAGMENT.

Function cgGLGetLatestProfile may be used in conjunction with cgCreateProgram or cgCreateProgramFromFile to ensure that the best available vertex and fragment profiles are used for compilation. This makes your application future-ready: the Cg programs are automatically compiled for the best profiles available at run time, even if these profiles did not exist when the application was written.

Another function that allows optimal compilation is cgGLSetOptimalOptions. It sets implicit compiler arguments that are appended to the argument list passed to cgCreateProgram or cgCreateProgramFromFile:

    void cgGLSetOptimalOptions(CGprofile profile);

OpenGL Program Execution

All programs must be loaded before they can be bound. To load a program, use cgGLLoadProgram:

    void cgGLLoadProgram(CGprogram program);

Binding a program only works if its profile is enabled. This is done by calling cgGLEnableProfile with the program's profile:

    void cgGLEnableProfile(CGprofile profile);

The binding itself is done using cgGLBindProgram:

    void cgGLBindProgram(CGprogram program);

Only one vertex program and one fragment program can be bound at any given time, so binding a program implicitly unbinds any other program of that type. Profiles are disabled using cgGLDisableProfile
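The enable and bind rules can be summarized with a toy state model in C. This is hypothetical code, not the Cg runtime. It only records which profiles are enabled and which program of each type is bound.

```c
/* Toy model of the OpenGL Cg binding rules (hypothetical names). */
enum { VERTEX = 0, FRAGMENT = 1 };

typedef struct {
    int enabled[2];  /* is the vertex/fragment profile enabled? */
    int bound[2];    /* id of the bound program, 0 = none */
} BindState;

/* Returns 1 on success, 0 if the program's profile is not enabled. */
static int bind_program(BindState *s, int type, int programId)
{
    if (!s->enabled[type])
        return 0;                /* binding only works with the profile enabled */
    s->bound[type] = programId;  /* implicitly unbinds the previous program */
    return 1;
}
```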
156. ex program specification. The half data type is implemented as float. The fixed and sampler data types are not supported, but the profile does provide the minimal partial support that the core language specification requires for these data types: it is legal to declare variables using these types, as long as no operations are performed on the variables.

Compatibility with the vp20 Vertex Program Profile

Programs that work with the vp20 profile are compatible with the arbvp1 profile as long as they use the Cg run time to manage all uniform parameters, including OpenGL state. That is, the arbvp1 and vp20 profiles can be used interchangeably without changing the Cg source code or the application program, except for specifying a different profile. However, if any of the glProgramParameterxxNV routines are used, the application program needs to be changed to use the corresponding ARB functions.

Since there is no ARB function corresponding to glTrackMatrixNV, an application using glTrackMatrixNV and the arbvp1 profile needs to be modified. One solution is to change the Cg source code to refer to the matrix using the glstate structure, so that the matrix is automatically tracked by the OpenGL driver as part of its ARB_vertex_program support. Another solution is for the application to use the Cg run-time routine cgGLSetStateMatrixParameter to load the appropriate matrix or matrices when necessary.

Another potential incompatibility between
157. figured programmable pipelines by using programming interfaces at the assembly language level. In theory, these low-level programming interfaces provided great flexibility. In practice, they were painful to use and presented a serious barrier to the effective use of hardware.

Using a high-level programming language, rather than the low-level languages of the past, provides several advantages:
• A high-level language speeds up the "tweak-and-run" cycle when a shader is developed. The ultimate test for a shader is "Does it look right?" To that end, the ability to quickly prototype and modify a shader is crucial to the rapid development of high-quality effects.
• The compiler optimizes code automatically and performs low-level tasks, such as register allocation, that are tedious and prone to error.
• Shading code written in a high-level language is much easier to read and understand. It also allows new shaders to be easily created by modifying previously written shaders. What better way to learn than from a shader written by the best artists and programmers?
• Shaders written in a high-level language are portable to a wider range of hardware platforms than shaders written in assembly code.

This chapter introduces Cg ("C for Graphics"), a new high-level language tailored for programming GPUs. Cg offers all the advantages just described, allowing programmers to finally combine the inherent power of the GPU with a language that makes GPU programming
158. files, use of 168 compiler options, command line 265 debug 266 Dmacro 265 entry 265 h 266 Ipathname 265 filename 265 longprogs 266 maxunrollcount 266 nocode 265 nofx 265 nostdlib 265 o 265 profile 265 profileopts 265 quiet 265 strict 265 v 266 compile-time type category 174 computation frequency, for performance 262 concrete type category 174 conditional code in fragment programs, and performance 263 conditional operator 190 conditional operators 17 constants, typing of 174 construction operator, described 186 context, core Cg 35 control constructs, used 13 core Cg context 35 core Cg runtime 34 D data types bool 11 fixed 11 float 10 half 11 int 11 sampler 11 supported 10 data types, for performance 261 debugging function 28 declaration, Cg definition 168 definition, as used in Cg 168 derivative functions 27 Direct3D Cg runtime 57 cgD3D9EnableDebugTracing 85 cgD3D9GetLastError 87 cgD3D9TranslateHRESULT 87 CGerror 86 debugging mode 83 error callbacks 87 error testing 87 error types 85 expanded interface 69 cgD3D8LoadProgram 75 cgD3D8SetSamplerState 73 cgD3D9BindProgram 76 cgD3D9EnableParameterShadowing 74 cgD3D9GetDevice 70 cgD3D9GetLatestPixelProfile 76 cgD3D9GetLatestVertexProfile 76 cgD3D9GetOptimalOptions 77 cgD3D9IsParameterShadowingEnabled 74 cgD3D9IsProgramLoaded 76 cgD3D9LoadProgram 74
159. [...] inten = IN.Color0.y; [...]

        // do the Bezier linear interpolation steps here
        float t = IN.Color0.w;
        [...]
        float4 temp2 = lerp(ctrl2, ctrl3, t);
        float4 result = lerp(temp, temp2, t);

        // add IN the height and wind displacement components
        position = position + result;
        position.w = 1;

        // transform for sending to the register combiners
        OUT.Hposition = mul(ModelViewProj, position);

        // calculate the texture coordinate from the position passed IN
        [...]

        // find the normal; we need one more point to do a partial derivative
        [...]
        float4 newResult = lerp(temp, temp2, t + 0.05);

        // do a cross product with a vector that is horizontal across the screen
        float3 normal = cross([...] result [...] newResult.xyz [...]);
        normal = normalize(normal);

        // calculate diffuse lighting off the normal that was just calculated
        [...]
        float3 lightVec = normalize(lightPos - position.xyz);
        float diffuseInten = dot(lightVec, normal);

        // [...] the final color. The first term is a semi-random term based
        // on the total height of this straw. The second term is the diffuse
        // lighting component.
        OUT.Color0 = [...] normalize(ctrl3) [...] diffuseInten [...];
        [...]
        retur
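The lerp steps above are de Casteljau's construction of a quadratic Bezier curve, and newResult is a second point slightly further along the same curve. Below is a C sketch of both steps, using Cg's definition lerp(a, b, t) = a + t * (b - a). The names are hypothetical, and the step size passed in plays the role of the shader's 0.05.

```c
typedef struct { float x, y, z; } vec3;

/* Cg-style lerp: a + t * (b - a). */
static vec3 lerp3(vec3 a, vec3 b, float t)
{
    vec3 r = { a.x + t * (b.x - a.x), a.y + t * (b.y - a.y), a.z + t * (b.z - a.z) };
    return r;
}

/* Quadratic Bezier by repeated linear interpolation (de Casteljau),
   the temp/temp2/result pattern of the shader. */
static vec3 bezier3(vec3 p0, vec3 p1, vec3 p2, float t)
{
    vec3 temp = lerp3(p0, p1, t);
    vec3 temp2 = lerp3(p1, p2, t);
    return lerp3(temp, temp2, t);
}

/* Difference to a second point dt further along the curve; this
   approximates the curve direction used to build a normal. */
static vec3 bezier_step(vec3 p0, vec3 p1, vec3 p2, float t, float dt)
{
    vec3 a = bezier3(p0, p1, p2, t);
    vec3 b = bezier3(p0, p1, p2, t + dt);
    vec3 d = { b.x - a.x, b.y - a.y, b.z - a.z };
    return d;
}
```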
160. [...]

    struct a2v {
        [...]  // inputs bound to POSITION, NORMAL, TEXCOORD0, TEXCOORD1,
               // TEXCOORD2, TEXCOORD3
    };

    struct VS_OUTPUT {
        float4 HPosition  : POSITION;   // coord position in window
        [...]  uv         : TEXCOORD0;  // wavy/fleck map coords
        [...]  light      : TEXCOORD1;  // light pos (tangent space)
        [...]  halfangle  : TEXCOORD2;  // Blinn half-angle
        [...]  reflection : TEXCOORD3;  // Refl vector (per vertex)
        [...]  view       : TEXCOORD4;  // view (tangent space)
        [...]  tangent    : TEXCOORD5;  // view-tangent matrix
        [...]  binormal   : TEXCOORD6;
        [...]  normal     : TEXCOORD7;
        [...]  fresnel    : COLOR0;
    };

    VS_OUTPUT main(a2v vert,
                   uniform float4x4 ModelView,
                   uniform float4x4 ModelViewIT,
                   uniform float4x4 ModelViewProj,
                   uniform float3 LightVector,   // Object space
                   uniform float3 EyePosition)   // Object space
    {
        VS_OUTPUT O;

        // Generate homogeneous POSITION
        O.HPosition = mul(ModelViewProj, vert.OPosition);

        // Generate BASIS matrix
        float3x3 ModelTangent = float3x3(normalize(vert.Tangent),
                                         normalize(vert.Binormal),
                                         normalize(vert.Normal));

        // FRESNEL
        float4 Fresn = [...];  // OFFSET, SCALE, POWER, UNUSED
        float3x3 ViewTangent = [...] mul(ModelTangent, [...]);

        // Generate VIEW SPACE vectors
        float3 viewN = [...];
        float4 viewP = [...];
        float3 viewV = [...];

        // Generate OBJECT SPACE vectors
        float3 objL = normalize([...]);
        float3 objH = normalize([...]);

        // Generate TANGENT SPACE vectors
        float3 tanL = [...];
        float3 tanV = [...];
        float3 tanH = [...];
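Building ModelTangent from the normalized tangent, binormal, and normal makes those three vectors the matrix rows. As a result, mul(ModelTangent, v) projects v onto each axis. Here is a C sketch of that transform, with hypothetical names:

```c
typedef struct { float x, y, z; } vec3;

static float dot3(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Equivalent of mul(float3x3(T, B, N), v): each output component is
   the dot product of one matrix row with v. */
static vec3 to_tangent_space(vec3 T, vec3 B, vec3 N, vec3 v)
{
    vec3 r = { dot3(T, v), dot3(B, v), dot3(N, v) };
    return r;
}
```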
161. for a program. To avoid any unfortunate inconsistencies, it is advisable to stick with the expanded interface for all shader-related operations that can be performed through its functions, such as shader setting, shader activation, and parameter setting (including setting texture stage states).

Setting the Direct3D Device

The expanded interface encapsulates more functionality than the minimal interface to ease program and parameter management. It does this by making the appropriate Direct3D calls at the appropriate times. Because some of these calls require the Direct3D device, the device must be communicated to the Cg runtime:

    HRESULT cgD3D9SetDevice(IDirect3DDevice9* device);

You can get the Direct3D device currently associated with the runtime using cgD3D9GetDevice:

    IDirect3DDevice9* cgD3D9GetDevice();

When cgD3D9SetDevice is called with zero as an input, all Direct3D resources used by the expanded interface are released. A Direct3D device is destroyed only when all references to it are removed. The application should therefore call cgD3D9SetDevice with zero as an input when it is done with a Direct3D device, so that the device gets destroyed when the application shuts down. Otherwise, Direct3D does not shut down properly and reports memory leaks to the debug console.

Note that calling cgD3D9SetDevice with zero as an input does not affect the Cg core runtime resources in any wa
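A toy reference-counting model shows why the final cgD3D9SetDevice(0) call matters. This C code is hypothetical and does not use COM or Direct3D. It assumes only that the runtime holds its own reference to the device until it is given zero.

```c
/* Toy reference-counted "device": destroyed when the last reference
   is released (hypothetical, not the COM API). */
typedef struct { int refs; int destroyed; } Device;

static void device_add_ref(Device *d) { d->refs++; }

static void device_release(Device *d)
{
    if (--d->refs == 0)
        d->destroyed = 1;
}

/* Model of a runtime that keeps a reference to its current device;
   passing NULL (zero) drops that reference. */
typedef struct { Device *current; } Runtime;

static void runtime_set_device(Runtime *rt, Device *d)
{
    if (d)
        device_add_ref(d);               /* take the new reference first */
    if (rt->current)
        device_release(rt->current);     /* then drop the old one */
    rt->current = d;
}
```

If the application releases its own reference but never calls the equivalent of cgD3D9SetDevice(0), the count stays at 1 and the device is never destroyed. This corresponds to the leak reported to the debug console.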
162. fragmentProgram);
        // Draw scene...
    }

    // Called before the device changes or is destroyed
    void OnDestroyDevice()
    {
        // Calling this function tells the expanded interface to release its
        // internal reference to the Direct3D device and free its Direct3D
        // resources
        cgD3D9SetDevice(0);
    }

    // Called before application shuts down
    void OnShutdown()
    {
        // This frees any core runtime resource
        cgDestroyContext(context);
    }

Expanded Interface Direct3D 8 Application

The following C code links the previous vertex and fragment programs to the Direct3D 8 application:

    #include <cg/cg.h>
    #include <cg/cgD3D8.h>

    IDirect3DDevice8* device;    // Initialized somewhere else
    IDirect3DTexture8* texture;  // Initialized somewhere else
    D3DXCOLOR constantColor;     // Initialized somewhere else

    CGcontext context;
    CGprogram vertexProgram, fragmentProgram;
    CGparameter baseTexture, someColor, modelViewMatrix;

    // Called at application startup
    void OnStartup()
    {
        // Create context
        context = cgCreateContext();
    }

    // Called whenever the Direct3D device needs to be created
    void OnCreateDevice()
    {
        // Pass the Direct3D device to the expanded interface
        cgD3D8SetDevice(device);

        // Determine the best profiles to use
        CGprofile vertexProfile = cgD3D8GetLatestVertexProfile();
        CGprofile pixelProfile = cgD3D8GetLatestPixelProfi
163. Note that when the array is rvalue subscripted, the result is an expression, and this expression is no longer considered to be a uniform program parameter. Therefore, if this expression is an array, its subsequent use must conform to the standard rules for array usage.

These rules are not limited to arrays of numeric types. They therefore imply support for arrays of structs, arrays of matrices, and arrays of vectors when the array is a uniform program parameter.

Maximum array sizes may be limited by the number of available registers or other resource limits, and compilers are permitted to issue error messages in these cases. However, profiles must support sizes of at least float arr[8], float4 arr[8], and float4x4 arr[4][4].

Fragment profiles are not required to support any operations on arbitrarily sized arrays; only support for vectors and matrices is required.

Overloading

Multiple functions may be defined with the same name, as long as the definitions can be distinguished by unqualified parameter types and do not have an open-profile conflict (see "Overloading of Functions by Profile" on page 170).

Function-matching rules:
1. Add all visible functions with a matching name in the calling scope to the set of function candidates.
2. Eliminate functions whose profile conflicts with the current compilation profile.
3. Eliminate functions with the wrong number of formal parameters. If a candidate function has exces
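Steps 1 to 3 of the matching rules amount to a filter over the candidate set. The C sketch below is simplified in three ways. Profiles are plain strings, and NULL means "no profile restriction". Only exact parameter counts are accepted, so the exception for excess parameters that step 3 begins to describe is not modeled. The Candidate type and filter_candidates() are hypothetical.

```c
#include <string.h>

/* Hypothetical description of one candidate function. */
typedef struct {
    const char *name;
    const char *profile;   /* NULL = usable in any profile */
    int         nparams;
} Candidate;

/* Writes the indices of surviving candidates to out[] and returns
   their count. */
static int filter_candidates(const Candidate *fns, int n, const char *name,
                             const char *profile, int nargs, int *out)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (strcmp(fns[i].name, name) != 0)
            continue;  /* step 1: name must match */
        if (fns[i].profile && strcmp(fns[i].profile, profile) != 0)
            continue;  /* step 2: profile conflict */
        if (fns[i].nparams != nargs)
            continue;  /* step 3: wrong number of formal parameters */
        out[count++] = i;
    }
    return count;
}
```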
164. g Output Binding Semantics (continued)
    Binding Semantics Name          Corresponding Data
    COLOR0, COL0                    Output primary color
    COLOR1, COL1                    Output secondary color
    BCOL0                           Output backface primary color
    BCOL1                           Output backface secondary color
    TEXCOORD0-TEXCOORD3, TEX0-TEX3  Output texture coordinates

The profile also allows WPOS to be present as binding semantics on a member of a structure of a varying output data structure, provided the member with this binding semantics is not referenced. This allows Cg programs to have the same structure specify the varying output of a vp20 profile program and the varying input of an fp30 profile program.

OpenGL NV_texture_shader and NV_register_combiners Profile (fp20)

The OpenGL NV_texture_shader and NV_register_combiners profile is used to compile Cg source code to the nvparse text format for the NV_texture_shader and NV_register_combiners family of OpenGL extensions.
• Profile name: fp20
• How to invoke: Use the compiler option -profile fp20

This document describes the capabilities and restrictions of Cg when using the fp20 profile.

Overview

Operations in the fp20 profile can be categorized as texture shader operations and arithmetic operations. Texture shader operations are operations that generate texture shader instructions; arithmetic operations are operations that generate register combiners inst
165. g semantic. For example, the following is legal, although not recommended:

    struct myfragoutput {
        float2 mycolor : COLOR;
    };

In such cases, the variable is implicitly copied (with a typecast) to the semantic upon program completion. If the variable's vector size is shorter than the semantic's vector size, the larger-numbered components of the semantic receive their default values if applicable, and otherwise are undefined. In the case above, the R and G components of the output color are obtained from mycolor, while the B and A components of the color are undefined.

Appendix B Language Profiles

This appendix describes the language capabilities that are available in each of the following profiles supported by the Cg compiler:
• DirectX Vertex Shader 2.x Profiles (vs_2_*)
• DirectX Pixel Shader 2.x Profiles (ps_2_*)
• OpenGL ARB Vertex Program Profile (arbvp1)
• OpenGL ARB Fragment Program Profile (arbfp1)
• OpenGL NV_vertex_program 2.0 Profile (vp30)
• OpenGL NV_fragment_program Profile (fp30)
• DirectX Vertex Shader 1.1 Profile (vs_1_1)
• DirectX Pixel Shader 1.x Profiles (ps_1_*)
• OpenGL NV_vertex_program 1.0 Profile (vp20)
• OpenGL NV_texture_shader and NV_register_combiners Profile (fp20)

In each case, the capabilities are a subset of the full capabilities described by the Cg language specification in "Cg Language Specification" on page 165.
166. ghtEXT command
    NORMAL                  Input normal, through Normal command
    COLOR0, DIFFUSE         Input primary color, through Color command
    COLOR1, SPECULAR        Input secondary color, through SecondaryColorEXT command
    FOGCOORD                Input fog coordinate, through FogCoordEXT command
    TEXCOORD0-TEXCOORD7     Input texture coordinates (texcoord0-texcoord7), through MultiTexCoord command
    ATTR0-ATTR15            Generic attributes 0-15, through VertexAttrib command
    PSIZE, ATTR6            Generic attribute 6

Table 21 summarizes the valid binding semantics for varying output parameters in the arbvp1 profile. These binding semantics map to ARB_vertex_program output registers. The two sets of names act as aliases of each other.

Table 21 arbvp1 Varying Output Binding Semantics
    Binding Semantics Name          Corresponding Data
    POSITION, HPOS                  Output position
    PSIZE, PSIZ                     Output point size
    FOG, FOGC                       Output fog coordinate
    COLOR0, COL0                    Output primary color
    COLOR1, COL1                    Output secondary color
    BCOL0                           Output backface primary color
    BCOL1                           Output backface secondary color
    TEXCOORD0-TEXCOORD7, TEX0-TEX7  Output texture coordinates

Note: The application must call glEnable(GL_COLOR_SUM_ARB) in order to enable COLOR1 output when using the arbvp1 profile.

The profile also allows WPOS to be present as binding semantics on a member of a structure of a varying output data structure, provided the member with this binding semantics
167. [...]gle.xyz);

        // Tangent-space VIEW vector
        float3 V = normalize(vert.view.xyz);
        float v_dist = vert.view.w;

        // Tangent-space WAVY NORMAL
        float3 wavyN = float3(tex2D(WavyMap, vert.uv)) * 2 - 1;
        wavyN = normalize(wavyN [...] WavyScale [...]);

        // PAINT
        // A normal map could be loaded here instead if we wanted more detail.
        // In this case we have a uniform tangent-space normal (0, 0, 1).
        [...]
        float3 paint_color = float3(tex2D(PaintMap, [...]));

        // SPECULAR POWER
        // use a saturated diffuse term to clamp the backlighting
        [...] saturate(n_d [...]) [...] pow(n_d_h, NewPaintSpec.y);

        // REFLECTION ENVIRONMENT
        // Reflect view vector about wavy normal and bring to view space
        float3 R = reflect(V, wavyN);
        R = R.x * vert.tangent + R.y * vert.binormal + R.z * vert.normal;
        float3 reflect_color = float3(texCUBE(EnvironmentMap, R));

        // FLECKS
        // Load random 3-vector flecks from fleck map.
        // Reduce tiling artifacts by sampling at different frequencies.
        float3 fleckN = float3(tex2D(FleckMap, vert.uv * 37)) * 2 - 1;
        [...] float3(tex2D(FleckMap, vert.uv * 23)) * 2 - 1 [...];
        [...] saturate(dot(fleckN, H)) [...];
        float3 fleck_color = [...] lerp(NewPaintSpec.y, NewPaintSpec.w, v_dist) [...];

        // Control the ambient fleckiness and also attenuate with
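This fragment program repeatedly uses two building blocks: expanding a [0, 1] texel to a [-1, 1] vector (tex * 2 - 1), and Cg's reflect(i, n) = i - 2 * dot(n, i) * n for a unit normal n. Both are sketched in C below. The type and function names are hypothetical.

```c
typedef struct { float x, y, z; } vec3;

static vec3 v3(float x, float y, float z) { vec3 r = { x, y, z }; return r; }

static float dot3(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* reflect(i, n) = i - 2 * dot(n, i) * n, with n of unit length. */
static vec3 reflect3(vec3 i, vec3 n)
{
    float d = 2.0f * dot3(n, i);
    return v3(i.x - d * n.x, i.y - d * n.y, i.z - d * n.z);
}

/* Expand a normal-map texel from [0, 1] to [-1, 1]: texel * 2 - 1. */
static vec3 unpack_normal(vec3 texel)
{
    return v3(texel.x * 2.0f - 1.0f, texel.y * 2.0f - 1.0f, texel.z * 2.0f - 1.0f);
}
```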
168. glstate.matrix.modelview[0], glstate.matrix.projection, glstate.matrix.mvp, glstate.matrix.texture[0], glstate.matrix.palette[0], glstate.matrix.program[0]
glstate.matrix.inverse.modelview[0], glstate.matrix.inverse.projection, glstate.matrix.inverse.mvp, glstate.matrix.inverse.texture[0], glstate.matrix.inverse.palette[0], glstate.matrix.inverse.program[0]
glstate.matrix.transpose.modelview[0], glstate.matrix.transpose.projection, glstate.matrix.transpose.mvp, glstate.matrix.transpose.texture[0], glstate.matrix.transpose.palette[0], glstate.matrix.transpose.program[0]
glstate.matrix.invtrans.modelview[0], glstate.matrix.invtrans.projection, glstate.matrix.invtrans.mvp, glstate.matrix.invtrans.texture[0], glstate.matrix.invtrans.palette[0], glstate.matrix.invtrans.program[0]

Table 17 lists the glstate fields of type float4 that can be accessed.

Table 17 float4 glstate Fields
glstate.material.ambient, glstate.material.diffuse, glstate.material.specular, glstate.material.emission, glstate.material.shininess
glstate.material.front.ambient, glstate.material.front.diffuse, glstate.material.front.specular, glstate.material.front.emission, glstate.material.front.shininess
glstate.material.back.ambient, glstate.material.back.diffuse, glstate.material.back.specular, glstate.material.back.emission, glstate.material.back.shininess
glstate.light[0].ambient, glstate.light[0].diffuse, glstate.light[0].specular, glstate.light[0].position, glstate.light[0].attenuation, glstate.light[0].spot.directio
169. hader-specific flags like declaration and usage
        cgD3D8LoadProgram(fragmentProgram, TRUE, 0, 0, 0);

        // Grab some parameters
        modelViewMatrix = cgGetNamedParameter(vertexProgram, "ModelViewMatrix");
        baseTexture = cgGetNamedParameter(fragmentProgram, "BaseTexture");
        someColor = cgGetNamedParameter(fragmentProgram, "SomeColor");

        // Sanity check that parameters have the expected size
        assert(cgD3D8TypeToSize(cgGetParameterType(modelViewMatrix)) == 16);
        assert(cgD3D8TypeToSize(cgGetParameterType(someColor)) == 4);

        // Set parameters that don't change. They can be set only once since
        // parameter shadowing is enabled
        cgD3D8SetTexture(baseTexture, texture);
        cgD3D8SetUniform(someColor, &constantColor);
    }

    // Called to render the scene
    void OnRender()
    {
        // Load model-view matrix
        D3DXMATRIX matrix;
        // ...

        // Set the parameters that change every frame.
        // This must be done before binding the programs.
        cgD3D8SetUniformMatrix(modelViewMatrix, &matrix);

        // Bind the programs. This downloads any parameter values
        // that have been previously set.
        cgD3D8BindProgram(vertexProgram);
        cgD3D8BindProgram(fragmentProgram);

        // Draw scene...
    }

    // Called before the device changes or is destroyed
    void OnDestroyDevice()
    {
        // Calling this function tells the expanded interface to release its intern
170. hat parameters have the expected size
        assert(cgD3D8TypeToSize(cgGetParameterType(modelViewMatrix)) == 16);
        assert(cgD3D8TypeToSize(cgGetParameterType(someColor)) == 4);
    }

    // Called to render the scene
    void OnRender()
    {
        // Get the Direct3D resource locations for parameters.
        // This can be done earlier and saved.
        DWORD modelViewMatrixRegister = cgGetParameterResourceIndex(modelViewMatrix);
        DWORD baseTextureUnit = cgGetParameterResourceIndex(baseTexture);
        DWORD someColorRegister = cgGetParameterResourceIndex(someColor);

        // Set the Direct3D state
        device->SetVertexShaderConstant(modelViewMatrixRegister, matrix, 4);
        device->SetPixelShaderConstant(someColorRegister, &constantColor, 1);
        device->SetTexture(baseTextureUnit, texture);
        device->SetVertexShader(vertexShader);
        device->SetPixelShader(pixelShader);

        // Draw scene...
    }

    // Called before the device changes or is destroyed
    void OnDestroyDevice()
    {
        device->DeleteVertexShader(vertexShader);
        device->DeletePixelShader(pixelShader);
    }

    // Called before application shuts down
    void OnShutdown()
    {
        // This frees any core runtime resources.
        // The minimal interface has no dynamic storage to free.
        cgDestroyContext(context);
    }

Direct3D Expanded Interface

If you use the expanded interface
171. he application provides this value with each vertex.

Cg provides a flexible mechanism for specifying these per-vertex inputs in the form of a set of predefined names. Each program input must be bound to a name from this set. In the following structure, the vertex program definition binds its parameters to the predefined names POSITION, NORMAL, TANGENT, and TEXCOORD3. The application must provide the vertex array data associated with these predefined names.

    struct myinputs {
        float3 myPosition       : POSITION;
        float3 myNormal         : NORMAL;
        float3 myTangent        : TANGENT;
        float  refractive_index : TEXCOORD3;
    };

    outdata foo(myinputs indata)
    {
        // ...
    }

Within the program, the parameters are referred to as indata.myPosition, indata.myNormal, and so on.

We refer to the predefined names as binding semantics. The following set of binding semantics is supported in all Cg vertex program profiles (some Cg profiles support additional binding semantics):
    POSITION, BLENDWEIGHT, NORMAL, TANGENT, BINORMAL, PSIZE, BLENDINDICES, TEXCOORD0-TEXCOORD7

The binding semantic POSITION0 is equivalent to the binding semantic POSITION; likewise, the other binding semantics have similar equivalents.

In the OpenGL Cg profiles, binding semantics implicitly specify the mapping of varying inputs to particular hardware registers. However, in DirectX-based Cg profiles there is no such
172. ifier only refers to the outermost array. However, it is possible to declare a packed array of packed arrays: declare the first level of array in a typedef using the packed keyword, and then declare a packed array of this type in a second statement. It is not possible to have a packed array of unpacked arrays.
• For any supported numeric data type TYPE, implementations must support the following packed array types, which are called vector types. Type identifiers must be predefined for these types in the global scope:
      typedef packed TYPE TYPE1[1];
      typedef packed TYPE TYPE2[2];
      typedef packed TYPE TYPE3[3];
      typedef packed TYPE TYPE4[4];
  For example, implementations must predefine the type identifiers float1, float2, float3, float4, and so on for any other supported numeric type.
• For any supported numeric data type TYPE, implementations must support the following packed array types, which are called matrix types. Implementations must also predefine type identifiers in the global scope to represent these types:
      packed TYPE1 TYPE1x1[1];  packed TYPE1 TYPE2x1[2];  packed TYPE1 TYPE3x1[3];  packed TYPE1 TYPE4x1[4];
      packed TYPE2 TYPE1x2[1];  packed TYPE2 TYPE2x2[2];  packed TYPE2 TYPE3x2[3];  packed TYPE2 TYPE4x2[4];
      packed TYPE3 TYPE1x3[1];  packed TYPE3 TYPE2x3[2];  packed TYPE3 TYPE3x3[3];  packed TYPE3 TYPE4x3[4];
      packed TYPE4 TYPE1x4[1];  packed TYPE4 TYPE2x4[2];  packed TYPE4 TYPE3x4[3];  packed TYPE4 TYPE4x4[4];
  For example, implementatio
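C has no packed keyword, but plain C typedefs give a rough analogue of these vector and matrix types. In both cases, TYPEmxn is m rows, each of which is a TYPEn vector. This is an illustration only: the C typedef names below merely mimic the Cg ones.

```c
/* C analogue of the Cg vector and matrix types (illustration only). */
typedef float float3[3];     /* like: typedef packed float float3[3];  */
typedef float3 float2x3[2];  /* like: packed float3 float2x3[2];       */
typedef float float4[4];
typedef float4 float4x4[4];  /* like: packed float4 float4x4[4];       */
```

As in Cg, m[1][2] selects row 1 and then component 2 of that row's vector.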
173. iform Scalar, Vector, and Matrix Parameters

The function cgD3D9SetUniform sets floating-point parameters like float3 and float4x3:

    HRESULT cgD3D9SetUniform(CGparameter parameter, const void* value);

The amount of data required depends on the type of the parameter, but it is always specified as an array of one or more floating-point values. The type is void* so that a compatible user-defined structure can be passed in without type casting. Here is some code illustrating the use of cgD3D9SetUniform for setting vectorParam of type float3, matrixParam of type float2x3, and arrayParam of type float2x2[3]:

    D3DXVECTOR3 vectorData(1, 2, 3);
    float matrixData[2][3] = { 1, 2, 3, 4, 5, 6 };
    float arrayData[3][2][2] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };
    cgD3D9SetUniform(vectorParam, &vectorData);
    cgD3D9SetUniform(matrixParam, matrixData);
    cgD3D9SetUniform(arrayParam, arrayData);

As mentioned previously, cgD3D9TypeToSize can be used to determine how many values are required for setting a parameter of a particular type.

For convenience, there is also a function to set a parameter from a 4x4 matrix of type D3DMATRIX:

    HRESULT cgD3D9SetUniformMatrix(CGparameter parameter, const D3DMATRIX* matrix);

The upper-left portion of the matrix is extracted to fit the size of the input parameter, so you could also set matrixParam this way:

    D3DXMATRIX matrix(1, 2, 3, 0,
                      4, 5, 6, 0,
                      0, 0, 0, 0,
                      0, 0, 0, 0);
    cgD3D9SetUnifor
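The extraction that cgD3D9SetUniformMatrix performs can be pictured as copying the top-left rows x cols block of a row-major 4x4 matrix into a tightly packed array. upper_left() is a hypothetical C helper that illustrates the idea. It is not the runtime's implementation.

```c
/* Copy the upper-left rows x cols block of a row-major 4x4 matrix
   into a packed array (hypothetical helper; rows, cols <= 4). */
static void upper_left(float m[4][4], int rows, int cols, float *out)
{
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            *out++ = m[r][c];
}
```

For a float2x3 parameter, rows = 2 and cols = 3, so the fourth column and the last two rows are ignored.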
174. iform parameters in the arbfp1 profile.
Table 22 arbfp1 Uniform Input Binding Semantics
    Binding Semantics Name                              Corresponding Data
    register(s0) to register(s15), TEXUNIT0 to TEXUNIT15   Texture image unit N, where N is in range 0 to 15. May only be used with uniform inputs with sampler types.
    register(c0) to register(c31), C0 to C31               Local parameter N, where N is in range 0 to 31. May only be used with uniform inputs.
Binding Semantics for Varying Input/Output Data
Table 23 summarizes the valid binding semantics for varying input parameters in the arbfp1 profile.
Table 23 arbfp1 Varying Input Binding Semantics
    Binding Semantics Name     Corresponding Data (type)
    COLOR0                     Input color 0 (float4)
    COLOR1                     Input color 1 (float4)
    TEXCOORD0 to TEXCOORD7     Input texture coordinates (float4)
Table 24 summarizes the valid binding semantics for varying output parameters in the arbfp1 profile.
Table 24 arbfp1 Varying Output Binding Semantics
    Binding Semantics Name     Corresponding Data (type)
    COLOR, COLOR0              Output color (float4)
    DEPTH                      Output depth (float)
Options
The ARB fragment program profile allows the following profile-specific options:
    NumTemps=<n>, where 16 <= n <= 32 (default 32)
    NumInstructionSlots=<n>, where 72 <= n <= 1024 (default 1024)
    NoDependentReadLimit=<b>, where b = 0 or 1 (default 1)
    N
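Before handing these options to the compiler, an application might check them against the ranges listed above. The following is a minimal sketch. The helper names are hypothetical. The Name=<value> string form follows the option list above, and how the strings then reach the compiler (for example, cgc's command line or the runtime's argument array) is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true when all three arbfp1 option values lie in the
   documented ranges (NumTemps 16..32, NumInstructionSlots 72..1024,
   NoDependentReadLimit 0 or 1). */
static bool arbfp1_options_valid(int numTemps, int numInstructionSlots,
                                 int noDependentReadLimit)
{
    return numTemps >= 16 && numTemps <= 32 &&
           numInstructionSlots >= 72 && numInstructionSlots <= 1024 &&
           (noDependentReadLimit == 0 || noDependentReadLimit == 1);
}

/* Formats one option as "Name=<value>". Returns the string length, or -1
   if the name is empty or buf is too small to hold the result. */
static int format_option(const char *name, int value, char *buf, size_t len)
{
    if (name == NULL || strlen(name) == 0)
        return -1;
    int n = snprintf(buf, len, "%s=%d", name, value);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The defaults (32, 1024, 1) pass the check. A value such as NumTemps=15 is rejected before it ever reaches the compiler.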
175. iform variables in the Cg source code.
Bindings
Binding Semantics for Uniform Data
Table 19 summarizes the valid binding semantics for uniform parameters in the arbvp1 profile.
Table 19 arbvp1 Uniform Input Binding Semantics
    Binding Semantics Name                          Corresponding Data
    register(c0) to register(c255), C0 to C255      Local parameter with index n, n = 0 to 255
The aliases c0 to c255 (lowercase) are also accepted. If used with a variable that requires more than one constant register (for example, a matrix), the semantic specifies the first local parameter that is used.
Binding Semantics for Varying Input/Output Data
Table 20 summarizes the valid binding semantics for varying input parameters in the arbvp1 profile. The set of binding semantics for varying input data to arbvp1 consists of POSITION, BLENDWEIGHT, NORMAL, COLOR0, COLOR1, TESSFACTOR, PSIZE, BLENDINDICES, and TEXCOORD0 to TEXCOORD7. One can also use TANGENT and BINORMAL instead of TEXCOORD6 and TEXCOORD7. Additionally, a set of binding semantics ATTR0 to ATTR15 can be used. The mapping of these semantics to the corresponding setting command is listed in the table.
Table 20 arbvp1 Varying Input Binding Semantics
    Binding Semantics Name     Corresponding Data
    POSITION                   Input vertex, through Vertex command
    BLENDWEIGHT                Input vertex weight, through WeightARB, VertexWei
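A binding such as C4 on a matrix names only the first local parameter. The following sketch computes the full range and checks it against C0 to C255. It rests on an assumption that this section does not state: each matrix row (at most four components) occupies one four-component local parameter. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper. Assumption: one local parameter per matrix row, so
   a matrix with `rows` rows bound to C<first> occupies
   C<first> .. C<first + rows - 1>. Returns false if that range falls
   outside C0..C255. */
static bool arbvp1_local_range(int first, int rows, int *last)
{
    assert(rows >= 1);
    if (first < 0 || first + rows - 1 > 255)
        return false;
    *last = first + rows - 1;
    return true;
}
```

Under this assumption, a float4x4 bound to C4 would occupy C4 through C7, and the same matrix bound to C253 would not fit.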
176. ime. "Can be determined at compile time" is defined as follows: the loop iteration expressions can be evaluated at compile time by use of intra-procedural constant propagation and folding, where the variables through which constant values are propagated do not appear as lvalues within any kind of control statement (if, for, or while) or ?: construct. Profiles may choose to support more general constant propagation techniques, but such support is not required.
• Profiles may optionally support fully general for and while loops.
New Vector Operators
These new operators are defined for vector types:
• Vector construction operator: <typeID>(...). This operator builds a vector from multiple scalars or shorter vectors:
    float4(scalar, scalar, scalar, scalar)
    float4(float3, scalar)
• Matrix construction operator: <typeID>(...). This operator builds a matrix from multiple rows. Each row may be specified either as multiple scalars or as any combination of scalars and vectors with the appropriate size:
    float3x3(1, 2, 3, 4, 5, 6, 7, 8, 9)
    float3x3(float3, float3, float3)
    float3x3(1, float2, float3, 1, 1, 1)
• Swizzle operator: .
    a = b.xxyz;   // A swizzle operator example
  At least one swizzle character must follow the operator. There are two sets of swizzle characters, and they may not be mixed. Set one is xyzw = 0123 and set two is rgba = 0123
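The swizzle rules above can be modeled directly: each character selects a source component, xyzw and rgba both map to indices 0 to 3, and the two sets may not be mixed. The following C sketch applies a mask to a four-component source. The function name and the cap of four result components are assumptions for illustration. The sketch does not check whether a component exists in a shorter source vector.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the vector swizzle operator on a float4-like
   source. Returns the number of components written to out, or -1 if the
   mask is empty, longer than four characters, uses an unknown character,
   or mixes the xyzw and rgba sets. */
static int swizzle4(const float src[4], const char *mask, float out[4])
{
    static const char set_xyzw[] = "xyzw";
    static const char set_rgba[] = "rgba";
    size_t n = strlen(mask);
    if (n < 1 || n > 4)
        return -1;
    /* The first character decides which set the whole mask must use. */
    const char *set = strchr(set_xyzw, mask[0]) ? set_xyzw : set_rgba;
    for (size_t i = 0; i < n; ++i) {
        const char *p = strchr(set, mask[i]);
        if (p == NULL)
            return -1;               /* unknown character or mixed sets */
        out[i] = src[p - set];       /* x/r -> 0, y/g -> 1, z/b -> 2, w/a -> 3 */
    }
    return (int)n;
}
```

For b = (1, 2, 3, 4), the mask "xxyz" produces (1, 1, 2, 3), matching the a = b.xxyz example. A mixed mask such as "xg" is rejected.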
177. ime type category includes types cfloat and cint. These types are used by the compiler for constant type conversions.
• The concrete type category includes all types that are not included in the compile-time type category.
• The scalar type category includes all types in the numeric category, the bool type, and all types in the compile-time category.
In this specification, a reference to a <category> type (such as a reference to a numeric type) means one of the types included in the category (such as float, half, or fixed).
Constants
A constant may be explicitly typed or implicitly typed. Explicit typing of a constant is performed, as in C, by suffixing the constant with a single character indicating the type of the constant:
• f for float
• d for double
• h for half
• x for fixed
Any constant that is not explicitly typed is implicitly typed. If the constant includes a decimal point, it is implicitly typed as cfloat. If it does not include a decimal point, it is implicitly typed as cint.
By default, constants are base 10. For compatibility with C, integer hexadecimal constants may be specified by prefixing the constant with 0x, and integer octal constants may be specified by prefixing the constant with 0.
Compile-time constant folding is preferably performed at the same precision that would be used if the operation were performed at run time. Some c
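The typing rules for constants form a small decision procedure, sketched below in C. It is a hypothetical illustration, not the compiler's lexer. It treats any 0x-prefixed constant as cint so that a hexadecimal digit such as f is not mistaken for a suffix, and it ignores forms this section does not cover, such as exponents.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classifier for Cg constant literals following the rules
   above: an explicit suffix (f, d, h, x) fixes the type; otherwise a
   decimal point means cfloat and its absence means cint. Returns NULL
   for an empty string. */
static const char *constant_type(const char *lit)
{
    size_t n = strlen(lit);
    if (n == 0)
        return NULL;
    if (n > 1 && lit[0] == '0' && (lit[1] == 'x' || lit[1] == 'X'))
        return "cint";                    /* integer hexadecimal constant */
    switch (lit[n - 1]) {
    case 'f': return "float";
    case 'd': return "double";
    case 'h': return "half";
    case 'x': return "fixed";
    }
    return strchr(lit, '.') ? "cfloat" : "cint";
}
```

For example, 1.5 is classified as cfloat, 1.5h as half, and 017 (an octal integer) as cint.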
178. imilar functions exist to set the values of arrays of uniform matrix parameters:
    void cgGLSetMatrixParameterArrayfr(CGparameter parameter, long startIndex, long numberOfElements, const float* array);
    void cgGLSetMatrixParameterArrayfc(CGparameter parameter, long startIndex, long numberOfElements, const float* array);
    void cgGLSetMatrixParameterArraydr(CGparameter parameter, long startIndex, long numberOfElements, const double* array);
    void cgGLSetMatrixParameterArraydc(CGparameter parameter, long startIndex, long numberOfElements, const double* array);
and to query those values:
    void cgGLGetMatrixParameterArrayfr(CGparameter parameter, long startIndex, long numberOfElements, float* array);
    void cgGLGetMatrixParameterArrayfc(CGparameter parameter, long startIndex, long numberOfElements, float* array);
    void cgGLGetMatrixParameterArraydr(CGparameter parameter, long startIndex, long numberOfElements, double* array);
    void cgGLGetMatrixParameterArraydc(CGparameter parameter, long startIndex, long numberOfElements, double* array);
The c and r suffixes have the same meaning as they do for the cgGLSetMatrixParameter functions.
Setting Varying Parameters
The values of fragment program varying parameters are set as the result of the interpolation across the triangles performed by the GPU, so only the values of vertex program varying parameters are set by the applicatio
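This sketch assumes the usual cgGL convention: the r suffix expects row-major data and the c suffix column-major data. Under that assumption, the two suffixes differ only by a per-matrix transpose, which the following C code performs. It assumes 4x4 matrices stored contiguously, and `column_to_row_major` is a hypothetical name. With those assumptions, passing the converted array to an ...fr function should set the same values as passing the original array to the matching ...fc function.

```c
#include <assert.h>
#include <stddef.h>

/* Converts `count` consecutive 4x4 matrices from column-major to row-major
   order by transposing each one. src and dst must not overlap. */
static void column_to_row_major(const float *src, float *dst, size_t count)
{
    assert(src != dst);                       /* not an in-place transpose */
    for (size_t m = 0; m < count; ++m)
        for (int r = 0; r < 4; ++r)
            for (int c = 0; c < 4; ++c)
                /* element (r, c) lives at 4*c + r in column-major order */
                dst[16 * m + 4 * r + c] = src[16 * m + 4 * c + r];
}
```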
179. implied mapping.
Binding semantics may be specified directly on program parameters rather than on struct elements. Thus, the following vertex program definition is legal:
    outdata foo(float3 myPosition : POSITION,
                float3 myNormal : NORMAL,
                float3 myTangent : TANGENT,
                float refractive_index : TEXCOORD3)
    {
        ...
    }
Within the program, the parameters are referred to by their variable names: myPosition, myNormal, myTangent, and refractive_index.
Varying Outputs to and from Vertex Programs
The outputs of a vertex program pass through the rasterizer and are made available to a fragment program as varying inputs. For a vertex program and fragment program to interoperate, they must agree on the data being passed between them. As it does with the data flow between the application and vertex program, Cg uses binding semantics to specify the data flow between the vertex program and fragment program.
This example shows the use of binding semantics for vertex program output:
    // Vertex program
    struct myvf {
        float4 pout : POSITION;  // Used for rasterization
        float4 diffusecolor : COLOR0;
        float4 uv0 : TEXCOORD0;
        float4 uv1 : TEXCOORD1;
    };
    myvf foo(...)
    {
        myvf outstuff;
        ...
        return outstuff;
    }
And this example shows how to use this same data as the input to a fragment program:
    // Fragment program
    struct myvf {
        float4 diffu
180. ing
    // Here we compute 3 scattering terms simultaneously, and the results
    // end up in the x, y, z components of a float3.
    // Using 3 terms approximates distribution of multiply scattered light.
    // For details, see Matt Pharr's SIGGRAPH 2001 RenderMan course notes,
    // "Layered Media for Surface Shaders".
    float3 temp = singleScatter(T2, T, n, g, albedo, thickness);
    float subsurf = /* ... */ (temp.x + temp.y + temp.z);
    // Add contributions from oil, sheen, and subsurface scattering, and
    // modulate by light color and result of a shadow map lookup.
    return lightColor * tex2Dproj(tex3, In.shadowcoords).r * (oil + sheen + subsurf);
Thin Film Effect
Description
This demo shows a thin film interference effect. Specular and diffuse lighting are computed per vertex in a Cg program, along with a view depth parameter, which is computed using the view vector, the surface normal, and the depth of the thin film on the surface of the object. The view depth is then perturbed in an ad hoc manner per fragment by the underlying decal texture. The perturbed view depth is then used to look up into a 1D texture containing the precomputed destructive interference for red, green, and blue wavelengths at a given view depth. This interference value is then used to modulate the specular lighting component of the standard lighting equation.
Figure 11 Example of Thin Film Effect
Vertex Shader S
181. ion to hold local program parameters (minimum limit of 24) and temporary results (minimum limit of 16). If the compiler needs more temporaries or local parameters to compile a program than are available, it generates an error.
To understand the capabilities of OpenGL ARB fragment programs and the code produced by the compiler, refer to the ARB fragment program extension in the OpenGL Extensions documentation.
Language Constructs and Support
Data Types
This profile implements data types as follows:
• The float data type is implemented as IEEE 32-bit single precision.
• The half, fixed, and double data types are treated as float.
• The int data type is supported using floating-point operations.
• Sampler types are supported to specify sampler objects used for texture fetches.
Statements and Operators
With the ARB fragment program profiles, while, do, and for statements are allowed only if the loops they define can be unrolled, because there is no dynamic branching in ARB fragment program.
Comparison operators are allowed (>, <, >=, <=, ==, !=), and Boolean operators (||, &&) are allowed. However, the logic operators (&, |, ^, ~) are not.
Using Arrays and Structures
Variable indexing of arrays is not allowed. Array and structure data is not packed.
Bindings
Binding Semantics for Uniform Data
Table 22 summarizes the valid binding semantics for un
182. ions in the Direct3D Cg runtime library have a cgD3D prefix. There are actually two Direct3D Cg runtime libraries: one for Direct3D 8 and one for Direct3D 9. Functions belonging to the Direct3D 8 Cg runtime have a cgD3D8 prefix, and functions belonging to the Direct3D 9 Cg runtime have a cgD3D9 prefix. Because most of the functions are identical between the two runtimes, we describe the Direct3D 9 Cg runtime with the understanding that the description applies to the Direct3D 8 Cg runtime as well, unless otherwise indicated. The same prefix convention used for the function names is also used for the type names, macro names, and enumerant values.
Header Files
Here is how to include the core Cg runtime API into your C or C++ program:
    #include <Cg/cg.h>
Here is how to include the OpenGL Cg runtime API:
    #include <Cg/cgGL.h>
Here is how to include the Direct3D 9 Cg runtime API:
    #include <Cg/cgD3D9.h>
And here is how to include the Direct3D 8 Cg runtime API:
    #include <Cg/cgD3D8.h>
Creating a Context
A context is a container for multiple Cg programs. It holds the Cg programs as well as their shared data. Here is how to create a context:
    CGcontext context = cgCreateContext();
Compiling a Program
Compile a Cg program by adding it to a context with cgCreateProgram:
    CGprogram program = cgCreateProgram(context, CG_SOURCE, myVertexProgramString
183. ions on size and dimensionality. Restrictions on the use of computed subscripts are also permitted. Arrays may be designated as packed. The operations allowed on packed arrays may be different from those allowed on unpacked arrays. Predefined packed types are provided for vectors and matrices. It is strongly recommended that these predefined types be used.
• There is a built-in swizzle operator, .xyzw or .rgba, for vectors. This operator allows the components of a vector to be rearranged and also replicated. It also allows the creation of a vector from a scalar.
• For an lvalue, the swizzle operator allows components of a vector or matrix to be selectively written.
• There is a similar built-in swizzle operator for matrices: ._m<row><col>[_m<row><col>][...]. This operator allows access to individual matrix components and allows the creation of a vector from elements of a matrix. For compatibility with DirectX 8 notation, there is a second form of matrix swizzle, which is described later.
• Numeric data types are different. Cg's primary numeric data types are float, half, and fixed. Fragment profiles are required to support all three data types, but may choose to implement half and fixed at float precision. Vertex profiles are required to support half and float, but may choose to implement half at float precision. Vertex profiles may omit supp
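The matrix swizzle ._m<row><col>[_m<row><col>][...] builds a vector by naming individual matrix elements. The following C sketch models that lookup for a 4x4 matrix. It assumes zero-based row and column digits and row-major storage, and it does not model the one-based DirectX-style form. The function name is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the matrix swizzle on a 4x4 row-major matrix:
   each 4-character group "_m<row><col>" selects one element, and up to four
   groups build a vector. Returns the number of elements written to out,
   or -1 if the swizzle is malformed or an index is out of range. */
static int matrix_swizzle4x4(const float m[16], const char *swz, float out[4])
{
    size_t len = strlen(swz);
    if (len == 0 || len % 4 != 0 || len / 4 > 4)
        return -1;
    for (size_t i = 0; i < len / 4; ++i) {
        const char *g = swz + 4 * i;
        if (g[0] != '_' || g[1] != 'm' ||
            g[2] < '0' || g[2] > '3' || g[3] < '0' || g[3] > '3')
            return -1;
        out[i] = m[4 * (g[2] - '0') + (g[3] - '0')];  /* row, then column */
    }
    return (int)(len / 4);
}
```

For example, "_m00_m11_m22_m33" extracts the main diagonal as a four-component vector.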
184. l constructs.
• Arrays are first-class types because Cg does not support pointers.
• Functions pass values by value-result, and thus use an out or inout modifier in the formal parameter list to return a parameter. By default, formal parameters are in, but it is acceptable to specify this explicitly. Parameters can also be specified as in out, which is semantically the same as inout.
Differences from ANSI C
Cg was developed based on the ANSI C language with the following major additions, deletions, and changes. (This is a summary; more detail is provided later in this document.)
• Language profiles (described in "Profiles" on page 168) may subset language capabilities in a variety of ways. In particular, language profiles may restrict the use of for and while loops. For example, some profiles may only support loops that can be fully unrolled at compile time.
• A binding semantic may be associated with a structure tag, a variable, or a structure element to denote that object's mapping to a specific hardware or API resource. See "Binding Semantics" on page 183.
• Reserved keywords goto, break, and continue are not supported. Reserved keywords switch, case, and default are not supported. Labels are not supported either.
• Pointers and pointer-related capabilities, such as the * and -> operators, are not supported.
• Arrays are supported, but with some limitat
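Value-result passing differs from C pointer passing when arguments alias. The following C sketch contrasts the two. The Cg function in the comment is a hypothetical example, and the C version models its copy-in/copy-out behavior rather than reproducing compiler output. By reference, the second addition sees the already-updated value; by value-result, the body works on copies and the inout value is written back only on return.

```c
#include <assert.h>

/* C pointer semantics: a and b may alias the same variable. */
static void add_twice_by_reference(float *a, const float *b)
{
    *a += *b;
    *a += *b;
}

/* Model of a hypothetical Cg function
       void add_twice(inout float a, in float b) { a += b; a += b; }
   under value-result passing: copy in, work on locals, copy out. */
static void add_twice_value_result(float *a_out, float b_in)
{
    float a = *a_out;   /* copy in */
    a += b_in;
    a += b_in;
    *a_out = a;         /* copy out when the function returns */
}
```

Starting from x = 1 and passing x as both arguments gives different results. By reference, x becomes 4, because the second addition sees 2. By value-result, x becomes 3, because b keeps its copied-in value of 1.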
185. lar namespace:
• typedef names (including an automatic typedef from a struct declaration)
• Variables
• Function names
Arrays and Subscripting
Arrays are declared as in C, except that they may optionally be declared to be packed, as described under "Types" on page 171. Arrays in Cg are first-class types, so array parameters to functions and programs must be declared using array syntax, rather than pointer syntax. Likewise, assignment of an array-typed object implies an array copy rather than a pointer copy.
Arrays with size [1] may be declared but are considered a different type from the corresponding non-array type.
Because the language does not currently support pointers, the storage order of arrays is only visible when an application passes parameters to a vertex or fragment program. Therefore, the compiler is currently free to allocate temporary variables as it sees fit.
The declaration and use of arrays of arrays is in the same style as in C. That is, if the 2D array A is declared as
    float A[4][4];
then the following statements are true:
• The array is indexed as A[row][column].
• The array can be built with a constructor using
    A = {{A[0][0], A[0][1], A[0][2], A[0][3]},
         {A[1][0], A[1][1], A[1][2], A[1][3]},
         {A[2][0], A[2][1], A[2][2], A[2][3]},
         {A[3][0], A[3][1], A[3][2], A[3][3]}};
• A[0] is equivalent to {A[0][0], A[0][1], A[0][2], A[0][3]}.
Support must be provid
186. le
    // Grab the optimal options for each profile
    const char* vertexOptions[] = { cgD3D8GetOptimalOptions(vertexProfile), 0 };
    const char* pixelOptions[] = { cgD3D8GetOptimalOptions(pixelProfile), 0 };
    // Create the vertex shader
    vertexProgram = cgCreateProgramFromFile(context, CG_SOURCE, "VertexProgram.cg",
                                            vertexProfile, "VertexProgram", vertexOptions);
    // If your program uses explicit binding semantics (like this one),
    // you can create a vertex declaration using those semantics
    DWORD declaration[] = {
        D3DVSD_STREAM(0),
        D3DVSD_REG(D3DVSDE_POSITION, D3DVSDT_FLOAT3),
        D3DVSD_REG(D3DVSDE_DIFFUSE, D3DVSDT_D3DCOLOR),
        D3DVSD_REG(D3DVSDE_TEXCOORD0, D3DVSDT_FLOAT2),
        D3DVSD_END()
    };
    // Ensure the resulting declaration is compatible with the shader.
    // This is really just a sanity check.
    assert(cgD3D8ValidateVertexDeclaration(vertexProgram, declaration));
    // Load the program with the expanded interface.
    // Parameter shadowing is enabled (second parameter = TRUE).
    cgD3D8LoadProgram(vertexProgram, TRUE, 0, 0, declaration);
    // Create the pixel shader
    fragmentProgram = cgCreateProgramFromFile(context, CG_SOURCE, "FragmentProgram.cg",
                                              pixelProfile, "FragmentProgram", pixelOptions);
    // Load the program with the expanded interface.
    // Parameter shadowing is enabled (second parameter = TRUE).
    // Ignore vertex s
187. lf4 specResult = lighting.z * specStr * specCol;
    /* ... */
    half3 reflVect = reflect(Vn, Nb);
    half4 reflColor = texCUBE(EnvMap, reflVect);
    half fakeFresnel = /* ... ReflData.FRESNEL_MIN ... ReflData.FRESNEL_MAX ... */
                       pow(saturate(1.0h - dot(Vn, IN.N)), ReflData.FRESNEL_EXPON);
    half4 paintShine = fakeFresnel * reflColor;
    half4 metalShine = surfCol * reflColor;
    half4 shineCol = ReflData.REFL_STRENGTH * lerp(paintShine, metalShine, material.METALNESS);
    half4 finalColor = specResult + diffResult + shineCol;
    finalColor.w = 1.0h;
    return finalColor;
Ray Traced Refraction
Description
This shader presents a method for adding high-quality details to small objects using a single-bounce ray-traced pass. In this example, the polygonal surface is sampled and a refraction vector is calculated. This vector is then intersected with a plane that is defined as being perpendicular to the object's x axis. The intersection point is calculated and used as texture indices for a painted iris. The demo permits varying the index of refraction, the depth, and the density of the lens. Note that the choice of geometry is arbitrary (this sample is a sphere), but any polygonal model can be
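In the sample, the refraction vector itself comes from the shader. The step after it, intersecting that vector with a plane perpendicular to the object's x axis, is plain ray-plane geometry, sketched below in C. The plane position planeX, the vec3 type, and the function name are illustrative assumptions, not taken from the sample's source. Mapping the hit point's (y, z) to iris texture coordinates is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { float x, y, z; } vec3;

/* Intersects the ray origin + t * dir (t >= 0) with the plane x = planeX,
   which is perpendicular to the object's x axis. On success, writes the hit
   point's y and z coordinates to *u and *v. */
static bool intersect_x_plane(vec3 origin, vec3 dir, float planeX,
                              float *u, float *v)
{
    if (dir.x > -1e-6f && dir.x < 1e-6f)
        return false;                    /* ray is parallel to the plane */
    float t = (planeX - origin.x) / dir.x;
    if (t < 0.0f)
        return false;                    /* plane lies behind the origin */
    *u = origin.y + t * dir.y;
    *v = origin.z + t * dir.z;
    return true;
}
```

Because the plane is axis-aligned, the intersection needs only one division. In the sample, the depth parameter would presumably move planeX.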
188. loat4 hpos : POSITION;
it is equivalent to
    const D3DVERTEXELEMENT9 declaration[] = {
        { 0, 0 * sizeof(float), D3DDECLTYPE_FLOAT4, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0 },
        { 0, 4 * sizeof(float), D3DDECLTYPE_FLOAT4, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_COLOR, 0 },
        { 0, 8 * sizeof(float), D3DDECLTYPE_FLOAT4, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 0 },
        D3DDECL_END()
    };
for the Direct3D 9 Cg runtime, and it is equivalent to
    const DWORD declaration[] = {
        D3DVSD_STREAM(0),
        D3DVSD_REG(D3DVSDE_POSITION, D3DVSDT_FLOAT4),
        D3DVSD_REG(D3DVSDE_DIFFUSE, D3DVSDT_FLOAT4),
        D3DVSD_REG(D3DVSDE_TEXCOORD0, D3DVSDT_FLOAT4),
        D3DVSD_END()
    };
for the Direct3D 8 Cg runtime.
Usually, though, you want to apply a vertex program to geometric data that come in multiple streams or with specific vertex formats. In this case, the vertex declaration is based on the vertex formats rather than on the program. To check that it is compatible with the program, use cgD3D9ValidateVertexDeclaration:
    CGbool cgD3D9ValidateVertexDeclaration(CGprogram program, const D3DVERTEXELEMENT9* declaration);
for the Direct3D 9 Cg runtime, or cgD3D8ValidateVertexDeclaration:
    CGbool cgD3D8ValidateVertexDeclaration(CGprogram program
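The byte offsets in the Direct3D 9 declaration (0, 4, and 8 times sizeof(float)) are simply where each float4 starts in a tightly packed vertex. A small C sketch makes this explicit with offsetof. The Vertex struct is a hypothetical application-side layout chosen to match the three float4 inputs, not a type from the Cg or Direct3D headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-stream vertex layout matching three float4 inputs. */
typedef struct {
    float position[4];   /* POSITION:          offset 0 * sizeof(float) */
    float color[4];      /* COLOR0 / DIFFUSE:  offset 4 * sizeof(float) */
    float texCoord[4];   /* TEXCOORD0:         offset 8 * sizeof(float) */
} Vertex;

/* Byte offsets a vertex declaration would list for this layout. */
static const size_t kVertexElementOffsets[3] = {
    offsetof(Vertex, position),
    offsetof(Vertex, color),
    offsetof(Vertex, texCoord),
};
```

Deriving offsets with offsetof, rather than writing literals, keeps a declaration in sync if the struct ever changes.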
189. lookups require the associated texture unit to be configured by the application for depth-compare texturing; otherwise, no depth comparison is actually performed.
More Details
The purpose of this chapter has been to give you a brief overview of Cg so that you can get started quickly and experiment to gain hands-on experience. If you would like more detail about any of the language features described in this chapter, see "Cg Language Specification" on page 165.
Cg Standard Library Functions
Cg provides a set of built-in functions and predefined structures with binding semantics to simplify GPU programming. These functions are similar in spirit to the C standard library, providing a convenient set of common functions. In many cases, the functions map to a single native GPU instruction, meaning they are executed very quickly. Of those functions that map to multiple native GPU instructions, you may expect the most useful to become more efficient in the near future.
Although customized versions of specific functions can be written for performance or precision reasons, it is generally wiser to use the standard library functions when possible. The standard library functions will continue to be optimized for future GPUs, so a shader written today will automatically be optimized for the latest architectures at compile time. Additionally, the standard library provides a convenient unified interfa
190. low between different programmable units. On a GPU, for example, packets of vertex data flow from the application to the vertex program. Because packets are produced by one program (the application, in this case) and consumed by another (the vertex program), there must be some method for defining the interface between the two. The approach used in Cg is to associate a binding semantic with each element of the packet. This is a bind-by-name approach. For example, an output with the binding semantic FOO is fed to an input with the binding semantic FOO.
Profiles may allow the user to define arbitrary identifiers in this semantic namespace, or they may restrict the allowed identifiers to a predefined set. Often, these predefined names correspond to the names of hardware registers or API resources.
In some cases, predefined names may control non-programmable parts of the hardware. For example, vertex programs normally compute a position that is fed to the rasterizer, and this position is stored in an output with the binding semantic POSITION.
For any profile, there are two namespaces for predefined binding semantics: the namespace used for in variables and the namespace used for out variables. The primary implication of having two namespaces is that the binding semantic cannot be used to implicitly specify whether a variable is in or out.
Binding Semantics
A binding semantic may be associated with an input to a top-level function in one of
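Bind-by-name means a producer output and a consumer input are connected because they carry the same semantic, regardless of variable names. The following C sketch models that linking step. The Binding type and `link_by_semantic` are hypothetical. Semantics are compared as exact strings here, which is a simplification of whatever rules a given profile applies.

```c
#include <assert.h>
#include <string.h>

/* One element of a data packet: a variable name and its binding semantic. */
typedef struct {
    const char *name;
    const char *semantic;
} Binding;

/* For each consumer input, find the producer output with the same semantic.
   match[i] receives the index of the producer output feeding input i, or -1
   if none does. Returns the number of unmatched inputs. */
static int link_by_semantic(const Binding *outputs, int numOutputs,
                            const Binding *inputs, int numInputs, int *match)
{
    int unmatched = 0;
    for (int i = 0; i < numInputs; ++i) {
        match[i] = -1;
        for (int o = 0; o < numOutputs; ++o) {
            if (strcmp(inputs[i].semantic, outputs[o].semantic) == 0) {
                match[i] = o;
                break;
            }
        }
        if (match[i] < 0)
            ++unmatched;
    }
    return unmatched;
}
```

A vertex output named diffusecolor bound to COLOR0 feeds a fragment input with a different name, as long as that input is also bound to COLOR0.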
191. m Modifier Function
Non-static global variables and parameters passed to functions (such as main) can be declared with an optional qualifier, uniform. To specify a uniform variable, use this syntax:
    uniform type variable
For example:
    uniform float4 myVector;
or
    fragout foo(uniform float4 uv)
If the uniform qualifier is specified for a function that is not top-level, it is meaningless and is ignored. The intent of this rule is to allow a function to serve either as a top-level function or as one that is not.
Note that uniform variables may be read and written just like non-uniform variables. The uniform qualifier simply provides information about how the initial value of the variable is to be specified and stored, through a mechanism external to the language.
Typically, the initial value of a uniform variable or parameter is stored in a different class of hardware register. Furthermore, the external mechanism for specifying the initial value of uniform variables or parameters may be different than that used for specifying the initial value of non-uniform variables or parameters.
Parameters qualified as uniform are normally treated as persistent state, while non-uniform parameters are treated as streaming data, with a new value specified for each stream record (such as within a vertex array).
Declarations
Functions are declared essentially as in C. A function that does not return a value must be declared with a void return ty
192. m is returned.
• CG_PROGRAM_ENTRY: The main entry point of the Cg source program is returned.
• CG_PROGRAM_PROFILE: The profile string is returned.
• CG_COMPILED_PROGRAM: The resulting compiled program is returned.
Core Cg Parameter
Cg functions exist for retrieving and querying parameters.
Parameter Retrieval
Parameter retrieval can be either iterative or direct.
Iteration
A program has a sequence of parameters that can be iterated over by using cgGetFirstParameter and cgGetNextParameter:
    CGparameter cgGetFirstParameter(CGprogram program, CGenum namespace);
    CGparameter cgGetNextParameter(CGparameter parameter);
A call to cgGetFirstParameter returns the first parameter of the sequence. If the program is invalid or does not contain any parameter, the call returns zero. Given a parameter, cgGetNextParameter returns the parameter immediately next in the sequence, or zero if there is none.
The namespace argument of cgGetFirstParameter specifies the namespace of the parameters returned by this function and by subsequent calls to cgGetNextParameter. Every parameter belongs to a particular namespace that defines its scope. For now, the scope of any parameter is limited to the program it belongs to, so the only possible value for namespace is CG_PROGRAM.
Note: In the future, other namespaces, such as the context, may be defined, in which case
193. mMatrix(matrixParam, &matrix);
In the example above, every element of matrixParam is set to 1.
Setting Uniform Arrays of Scalar, Vector, and Matrix Parameters
To set an array parameter, use cgD3D9SetUniformArray:
    HRESULT cgD3D9SetUniformArray(CGparameter parameter, DWORD startIndex, DWORD numberOfElements, const void* array);
The parameters startIndex and numberOfElements specify which elements of the array parameter are set: the numberOfElements elements with indices ranging from startIndex to startIndex + numberOfElements - 1. It is assumed that array contains enough values to set all those elements. As with cgD3D9SetUniform, cgD3D9TypeToSize can be used to determine how many values are required, and the type is void* so that a compatible user-defined structure can be passed in without type casting.
There is a convenience function equivalent to cgD3D9SetUniformMatrix:
    HRESULT cgD3D9SetUniformMatrixArray(CGparameter parameter, DWORD startIndex, DWORD numberOfElements, const D3DMATRIX* matrices);
The parameters startIndex and numberOfElements have the same meanings as for cgD3D9SetUniformArray. The upper-left portion of each matrix of the array matrices is extracted to fit the size of the element of the array parameter parameter. The array matrices is assumed to have numberOfElements elements.
Setting Sampler Parameters
You assign a Direc
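The range rule above (indices startIndex through startIndex + numberOfElements - 1) fixes how many floats the source array must hold. The following C sketch checks the range and computes that count. It is a hypothetical helper: elementSize stands in for the per-element value count that cgD3D9TypeToSize would report, and rejecting a zero-length range is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: validates the element range [startIndex,
   startIndex + numberOfElements - 1] of an array parameter of length
   arrayLength and computes how many floats the source array must hold. */
static bool uniform_array_span(unsigned arrayLength, unsigned elementSize,
                               unsigned startIndex, unsigned numberOfElements,
                               unsigned *floatsNeeded)
{
    if (numberOfElements == 0 || startIndex >= arrayLength ||
        numberOfElements > arrayLength - startIndex)   /* avoids overflow */
        return false;
    *floatsNeeded = numberOfElements * elementSize;
    return true;
}
```

Consider arrayParam of type float2x2[3], where each element holds four values. Setting elements 1 and 2 needs 8 floats, while a request for elements 2 and 3 runs past the end of the array.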
194. malOptions. It returns a string representing the optimal set of compiler options for a given profile:
    char const* cgD3D9GetOptimalOptions(CGprofile profile);
This string is meant to be used as part of the argument parameter to cgCreateProgram. It does not need to be destroyed by the application. However, its content could change if cgD3D9GetOptimalOptions is called again for the same profile but for a different Direct3D device.
Expanded Interface Program Examples
In this section, we provide programs that illustrate how and when to use functions from the expanded interface to make Cg programs work with Direct3D. For the sake of clarity, the examples do very little error checking, but a production application should check the return values of all Cg functions.
The vertex and fragment programs that follow are referenced in "Expanded Interface Direct3D 9 Application" on page 78 and "Expanded Interface Direct3D 8 Application" on page 81.
Expanded Interface Vertex Program
The following Cg code is assumed to be in a file called VertexProgram.cg:
    void VertexProgram(in float4 position : POSITION,
                       in float4 color : COLOR0,
                       in float4 texCoord : TEXCOORD0,
                       out float4 positionO : POSITION,
                       out float4 colorO : COLOR0,
                       out float4 texCoordO : TEXCOORD0,
                       const uniform float4x4 ModelViewMatrix)
    {
        positionO = mul(position, ModelViewMatrix);
        colorO = color;
        texCoordO = texCoord;
    }
Expanded Interface Fragment Program
The following Cg code is
195. matrix, new, packed, pixelshader, public, return, sampler_state, sampler3D, short, static, struct, template, texture2D, textureRECT, true, typeid, union, vector, virtual, while, asm_fragment, break, char, compile, continue, delete, double, else, explicit, fixed, friend, half, inline, interface, mutable, operator, pass, private, register, row_major, sampler1D, samplerCUBE, signed, static_cast, switch, texture, texture3D, this, try, typename, unsigned, vertexfragment, void, auto, case, class, const, decl, discard, dword, emit, extern, float, get, if, inout, long, namespace, out, pixelfragment, protected, reinterpret_cast, sampler, sampler2D, shared, sizeof, string, technique, texture1D, textureCUBE, throw, typedef, uniform, using, vertexshader, volatile, __identifier (two underscores before identifier)
Cg Standard Library Functions
Cg provides a set of built-in functions and predefined structures with binding semantics to simplify GPU programming. These functions are discussed in "Cg Standard Library Functions" on page 19.
Vertex Program Profiles
A few features of the Cg language that are specific to vertex program profiles are required to be implemented in the same manner for all vertex program profiles.
Mandatory Computation of Position Output
Vertex program profiles may (and typically do) require that the
196. me provides all the functions necessary to manage Cg programs from within the application. It makes no assumption about which 3D API the application uses, so any application could easily ignore the API-specific Cg runtime libraries and content itself with the core Cg runtime.
The core Cg runtime is built around three main concepts: context, program, and parameter, which are represented by the CGcontext, CGprogram, and CGparameter object types. These concepts are hierarchically related: a program has several parameters, a context contains several programs, and the application can define several contexts.
Note: In the future, it will also be possible to define parameters at the level of the context, so that they are shared among all the programs of a context.
The next sections go over those three basic object types and the related functions. The three object types have some points in common:
• The use of CGbool, which is an integer type equal to either CG_TRUE or CG_FALSE.
• The use of CGenum, which is an enumerated type used to specify various enumerated values that are not necessarily related.
• The convention that functions that return a value of type CGcontext, CGprogram, CGparameter, or const char* indicate failure by returning zero.
Core Cg Context
Cg provides functions for creating, destroying, and querying contexts.
Context Creation
197. meter(vertexProgram, "ModelViewMatrix");
        baseTexture = cgGetNamedParameter(fragmentProgram, "BaseTexture");
        someColor = cgGetNamedParameter(fragmentProgram, "SomeColor");
        // Sanity check that parameters have the expected size
        assert(cgD3D9TypeToSize(cgGetParameterType(modelViewMatrix)) == 16);
        assert(cgD3D9TypeToSize(cgGetParameterType(someColor)) == 4);
    }
    // Called to render the scene
    void OnRender()
    {
        // Get the Direct3D resource locations for parameters.
        // This can be done earlier and saved.
        DWORD modelViewMatrixRegister = cgGetParameterResourceIndex(modelViewMatrix);
        DWORD baseTextureUnit = cgGetParameterResourceIndex(baseTexture);
        DWORD someColorRegister = cgGetParameterResourceIndex(someColor);
        // Set the Direct3D state
        device->SetVertexShaderConstantF(modelViewMatrixRegister, matrix, 4);
        device->SetPixelShaderConstantF(someColorRegister, /* ... */);
        device->SetVertexDeclaration(vertexDeclaration);
        device->SetTexture(baseTextureUnit, texture);
        device->SetVertexShader(vertexShader);
        device->SetPixelShader(pixelShader);
        // Draw scene
    }
    // Called before the device changes or is destroyed
    void OnDestroyDevice()
    {
        vertexShader->Release();
        pixelShader->Release();
        vertexDeclaration->Release();
    }
    // Called before application shuts down
    void OnShutdown()
198. n Setting a vertex varying parameter requires two steps. The first step consists of passing a pointer to an array containing the values for each vertex. This is done using cgGLSetParameterPointer:
    void cgGLSetParameterPointer(CGparameter parameter, GLint size, GLenum type, GLsizei stride, GLvoid* array);
The variable size indicates the number of values per vertex that are stored in array. It is equal to 1, 2, 3, or 4. If fewer values are set than the parameter requires, the unspecified values default to 0 for x, y, and z and to 1 for w. The enumerated type type specifies the data type of the values stored in array: GL_SHORT, GL_INT, GL_FLOAT, or GL_DOUBLE. The parameter stride is the byte offset between any two consecutive vertices. Passing a value of zero for stride is equivalent to passing a byte offset equal to size multiplied by the size of type in bytes. In other words, it means that there is no gap between two consecutive vertex values. Note that the minimum size for array is implicitly defined by the biggest vertex index specified in the triangles drawn. The second step consists of enabling the varying parameter for a specific drawing call:
    void cgGLEnableClientState(CGparameter parameter);
The equivalent disabling function is:
    void cgGLDisableClientState(CGparameter parameter);
Another way to set a vertex varying parameter is to use the cgGLSetParameter functions. When
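The stride and default-value rules for cgGLSetParameterPointer can be illustrated in plain C. This sketch assumes GL_FLOAT data and uses hypothetical helper names (effective_stride, fetch_vertex_float, min_array_bytes). It imitates how the values in array are interpreted and is not part of the Cg or OpenGL API.

```c
#include <stddef.h>

/* A stride of zero means tightly packed vertices: size components of
   componentBytes bytes each. */
size_t effective_stride(int size, size_t componentBytes, size_t stride)
{
    return stride != 0 ? stride : (size_t)size * componentBytes;
}

/* Hypothetical helper: read vertex `index` from float data (size = 1..4)
   and expand it to four components. Missing x, y, z default to 0 and a
   missing w defaults to 1. */
void fetch_vertex_float(const float *array, int size, size_t stride,
                        size_t index, float out[4])
{
    const float defaults[4] = {0.0f, 0.0f, 0.0f, 1.0f};
    size_t step = effective_stride(size, sizeof(float), stride);
    const float *v = (const float *)((const char *)array + index * step);
    for (int i = 0; i < 4; ++i)
        out[i] = (i < size) ? v[i] : defaults[i];
}

/* Minimum number of bytes array must provide when the largest vertex
   index used by the drawn triangles is maxIndex. */
size_t min_array_bytes(int size, size_t componentBytes, size_t stride,
                       size_t maxIndex)
{
    return maxIndex * effective_stride(size, componentBytes, stride)
           + (size_t)size * componentBytes;
}
```

For example, with three floats per vertex and a stride of 0, a draw that references vertex index 9 needs at least 9 * 12 + 12 = 120 bytes.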
199. n OUT;
    }
Refraction
Description: This effect performs custom texture coordinate generation to compute a refracted vector per vertex, which is then used to look up into a cube map. A Fresnel term is also calculated to blend between reflection and refraction (Figure 18).
Figure 18. Example of Refraction
Vertex Shader Source Code for Refraction
    struct inputs {
        float4 Position : POSITION;
        float4 Normal   : NORMAL;
    };
    struct outputs {
        float4 hPosition   : POSITION;
        float4 fresnelTerm : COLOR0;
        float4 refractVec  : TEXCOORD0;
        float4 reflectVec  : TEXCOORD1;
    };
    // fresnel approximation
    fixed fast_fresnel(float3 I, float3 N, float3 fresnelValues)
    {
        fixed power = fresnelValues.x;
        fixed scale = fresnelValues.y;
        fixed bias  = fresnelValues.z;
        return bias + pow(1.0 - dot(I, N), power) * scale;
    }
    outputs main(inputs IN,
                 uniform float4x4 ModelViewProj,
                 uniform float4x4 ModelView,
                 uniform float4x4 ModelViewIT,
                 uniform float theta)
    {
        outputs OUT;
        OUT.hPosition = mul(ModelViewProj, IN.Position);
        // convert the position and normal into appropriate spaces
        float3 eyeToVert = mul(ModelView, IN.Position).xyz;
        eyeToVert = normalize(eyeToVert);
        float3 normal = mul(ModelViewIT, IN.Normal).xyz;
        normal = normalize(normal);
200. n glstate.light[0].half
Table 17. float4 glstate Fields (continued)
    glstate.lightmodel.ambient
    glstate.lightmodel.scenecolor
    glstate.lightmodel.front.scenecolor
    glstate.lightmodel.back.scenecolor
    glstate.lightprod[0].ambient
    glstate.lightprod[0].diffuse
    glstate.lightprod[0].specular
    glstate.lightprod[0].front.ambient
    glstate.lightprod[0].front.diffuse
    glstate.lightprod[0].front.specular
    glstate.lightprod[0].back.ambient
    glstate.lightprod[0].back.diffuse
    glstate.lightprod[0].back.specular
    glstate.texgen[0].eye.s
    glstate.texgen[0].eye.t
    glstate.texgen[0].eye.r
    glstate.texgen[0].eye.q
    glstate.texgen[0].object.s
    glstate.texgen[0].object.t
    glstate.texgen[0].object.r
    glstate.texgen[0].object.q
    glstate.fog.color
    glstate.fog.params
    glstate.clip[0].plane
Table 18 lists the glstate fields of type float that can be accessed.
Table 18. float glstate Fields
    glstate.point.size
    glstate.point.attenuation
Position Invariance
The arbvp1 profile supports position invariance, as described in the core language specification. The modelview-projection matrix is not specified using a binding semantic of _GL_MVP.
Data Types
This profile implements data types as follows: the float data type is implemented as defined in the ARB_vert
201. n this Implementation 203
OpenGL ARB Vertex Program Profile (arbvp1) 204
    Overview 204
    Accessing OpenGL State 204
    Position Invariance 206
    Data Types 207
    Compatibility with the vp20 Vertex Program Profile 207
    Loading Constants 208
    Bindings 208
OpenGL ARB Fragment Program Profile (arbfp1) 211
    Overview 211
    Language Constructs and Support 212
    Bindings 212
    Options 213
    Limitations in the Implementation 213
OpenGL NV_vertex_program 2.0 Profile (vp30) 214
    Position Invariance 214
    Language Constructs 214
    Bindings 215
OpenGL NV_fragment_program Profile (fp30) 218
    Language Constructs and Support 218
    Bindings 219
    Pack and Unpack Functions 220
DirectX Vertex Shader 1.1 Profile (vs_1_1)
202. nagement. The Cg runtime also offers additional facilities to manage the input parameters of the Cg program. In particular, it makes data types such as arrays and matrices easier to deal with. These additional functions also encompass the necessary 3D API calls, which minimizes code length and reduces programmer errors. Overview of the Cg Runtime. The Cg runtime API consists of three parts (Figure 2): a core set of functions and structures that encapsulates the entire functionality of the runtime; a set of functions specific to OpenGL, built on top of the core set; and a set of functions specific to Direct3D, built on top of the core set. To make it easier for application writers, the OpenGL and Direct3D runtime libraries adopt the philosophy and data structure style of their respective APIs.
Figure 2. The Parts of the Cg Runtime API
The rest of the section provides instructions for using the Cg runtime in the framework of an application. Each step includes source code for OpenGL and Direct3D programming. Functions that involve only pure Cg resource management belong to the core runtime and have a cg prefix. In these cases, the same code is used for OpenGL and Direct3D. When functions from the OpenGL or Direct3D Cg runtimes are used, the function name indicates the API: functions belonging to the OpenGL Cg runtime library have a cgGL prefix, and funct
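The naming convention makes it possible to tell which part of the runtime a function belongs to from its prefix alone. This C sketch assumes the Direct3D functions use the cgD3D prefix, as cgD3D8LoadProgram and cgD3D9SetTexture do elsewhere in this manual. runtime_of and the Runtime enum are hypothetical. The API-specific prefixes must be tested first, because every cgGL and cgD3D name also begins with cg.

```c
#include <string.h>

/* Hypothetical classification of Cg runtime function names by prefix. */
typedef enum { RUNTIME_UNKNOWN, RUNTIME_CORE, RUNTIME_OPENGL, RUNTIME_DIRECT3D } Runtime;

static int has_prefix(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

Runtime runtime_of(const char *functionName)
{
    /* Longer, API-specific prefixes first. */
    if (has_prefix(functionName, "cgGL"))
        return RUNTIME_OPENGL;
    if (has_prefix(functionName, "cgD3D"))
        return RUNTIME_DIRECT3D;
    if (has_prefix(functionName, "cg"))
        return RUNTIME_CORE;
    return RUNTIME_UNKNOWN;
}
```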
203. name vs matches any vertex profile, while the name ps matches any fragment or pixel profile. The names ps_1 and ps_2 match any DirectX 8 pixel shader 1.x profile or DirectX 9 pixel shader 2.x profile, respectively. Similarly, the names vs_1 and vs_2 match any DirectX vertex shader 1.x or 2.x profile, respectively. Individual profiles may define additional valid wildcard profile names. In general, the most specific version of a function is used. More details are provided in "Function Overloading" on page 181. Roughly speaking, the search order is the following:
    1. Version of the function with the exact profile overload
    2. Version of the function with the most specific wildcard profile overload (such as vs or ps_1)
    3. Version of the function with no profile overload
This search process allows generic versions of a function to be defined and then overridden as needed for particular hardware.
Syntax for Parameters in Function Definitions
Functions are declared in a manner similar to C, but the parameters in function definitions may include a binding semantic (see "Binding Semantics" on page 183) and a default value. Each parameter in a function definition takes the following form:
    [uniform] type identifier [: <binding-semantic>] [= <default>]
where type may include the qualifiers in, out, inout, and const, as discussed in Typ
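The three-step search order can be modeled in a few lines of C. This is a simplified sketch that rests on several assumptions:
- A wildcard such as vs or vs_1 matches a profile when it is a prefix that ends at an underscore or at the end of the name. This covers only DirectX-style profile names.
- The rule that ps also matches fragment profiles such as fp30 is not modeled.
- An empty string stands for the version with no profile overload.

select_overload and wildcard_matches are hypothetical names, not part of the compiler.

```c
#include <stddef.h>
#include <string.h>

/* Simplified wildcard rule (assumption): prefix match ending at '_' or at
   the end of the profile name. */
static int wildcard_matches(const char *wildcard, const char *profile)
{
    size_t n = strlen(wildcard);
    return strncmp(wildcard, profile, n) == 0 &&
           (profile[n] == '\0' || profile[n] == '_');
}

/* Returns the index of the overload to use, or -1 if none applies. */
int select_overload(const char *const *overloads, int count, const char *profile)
{
    int best = -1, generic = -1;
    size_t bestLen = 0;
    for (int i = 0; i < count; ++i) {
        size_t len = strlen(overloads[i]);
        if (len == 0) {
            generic = i;                  /* step 3: no profile overload */
        } else if (strcmp(overloads[i], profile) == 0) {
            return i;                     /* step 1: exact profile match */
        } else if (wildcard_matches(overloads[i], profile) && len > bestLen) {
            best = i;                     /* step 2: most specific wildcard */
            bestLen = len;
        }
    }
    return best >= 0 ? best : generic;
}
```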
204. nates associated with sampler tex, and prevlookup is the result of a previous texture operation. This function can be used to generate the texdp3tex instruction in the ps_1_2 and ps_1_3 profiles.
tex2D_dp3x2(uniform sampler2D tex, float3 str, float4 intermediate_coord, float4 prevlookup)
Performs the following:
    float2 newst = float2(dot(intermediate_coord.xyz, prevlookup.xyz),
                          dot(str, prevlookup.xyz));
    return tex2D(tex, newst);
where str are texture coordinates associated with sampler tex, prevlookup is the result of a previous texture operation, and intermediate_coord are texture coordinates associated with the previous texture unit. This function can be used to generate the texm3x2pad/texm3x2tex instruction combination in all ps_1_x profiles.
Table 40. ps_1_x Auxiliary Texture Functions (continued)
tex3D_dp3x3(sampler3D tex, float3 str, float4 intermediate_coord1, float4 intermediate_coord2, float4 prevlookup)
texCUBE_dp3x3(samplerCUBE tex, float3 str, float4 intermediate_coord1, float4 intermediate_coord2, float4 prevlookup)
Performs the following:
    float3 newst = float3(dot(intermediate_coord1.xyz, prevlookup.xyz),
                          dot(intermediate_coord2.xyz, prevlookup.xyz),
                          dot(str, prevlookup.xyz));
    return tex3D(tex, newst);   // texCUBE(tex, newst) for texCUBE_dp3x3
where str are texture coordinates associated with sampler tex, prevlookup is the result of a p
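The coordinate arithmetic in tex2D_dp3x2 can be checked on the CPU. This C sketch computes only the 2D coordinate newst and omits the texture fetch. vec2, vec3, and dp3x2_coords are hypothetical names, and 3-component structs stand in for the .xyz swizzles of the float4 inputs.

```c
/* Hypothetical vector types standing in for Cg's float2/float3. */
typedef struct { float x, y, z; } vec3;
typedef struct { float s, t; } vec2;

static float dot_v3(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* newst = (dot(intermediate_coord.xyz, prevlookup.xyz), dot(str, prevlookup.xyz)) */
vec2 dp3x2_coords(vec3 str, vec3 intermediate_coord, vec3 prevlookup)
{
    vec2 newst;
    newst.s = dot_v3(intermediate_coord, prevlookup);
    newst.t = dot_v3(str, prevlookup);
    return newst;
}
```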
205. nces a valid program:
    CGbool cgIsProgram(CGprogram program);
Compilation Result. You can query the result of the compilation produced by the last call to cgCreateProgram for a given context by using cgGetLastListing:
    const char* cgGetLastListing(CGcontext context);
If no call to cgCreateProgram has been made for the context, cgGetLastListing returns zero. Otherwise, it returns a string containing the output you would typically get from the command-line version of the compiler.
Program Attributes. To retrieve the context that the program belongs to, use cgGetProgramContext:
    CGcontext cgGetProgramContext(CGprogram program);
To retrieve the profile that the program has been compiled to, use cgGetProgramProfile:
    CGprofile cgGetProgramProfile(CGprogram program);
The function pair cgGetProfile and cgGetProfileString lets you find the correspondence between a profile enumerant and its corresponding string:
    CGprofile cgGetProfile(const char* profileString);
    const char* cgGetProfileString(CGprofile profile);
If the string passed to cgGetProfile does not correspond to any profile, CG_PROFILE_UNKNOWN is returned. The function cgGetProgramString retrieves various strings related to the program, depending on the value of the enumerant stringType:
    const char* cgGetProgramString(CGprogram program, CGenum stringType);
The variable stringType can have any of these values:
    CG_PROGRAM_SOURCE: The original Cg source progra
206. nctions (continued)
texCUBE_reflect_eye_dp3x3(uniform samplerCUBE tex, float3 str, float4 intermediate_coord1, float4 intermediate_coord2, float4 prevlookup, uniform float3 eye)
Performs the following:
    float3 N = float3(dot(intermediate_coord1.xyz, prevlookup.xyz),
                      dot(intermediate_coord2.xyz, prevlookup.xyz),
                      dot(str, prevlookup.xyz));
    return texCUBE(tex, 2 * dot(N, eye) / dot(N, N) * N - eye);
where:
- str are texture coordinates associated with sampler tex
- prevlookup is the result of a previous texture operation
- intermediate_coord1 are texture coordinates associated with the (n-2)th texture unit
- intermediate_coord2 are texture coordinates associated with the (n-1)th texture unit
- eye is the eye-ray vector
This function can be used to generate the texm3x3pad/texm3x3pad/texm3x3spec instruction combination in all ps_1_x profiles.
tex_dp3x2_depth(float3 str, float4 intermediate_coord, float4 prevlookup)
Performs the following:
    float z = dot(intermediate_coord.xyz, prevlookup.xyz);
    float w = dot(str, prevlookup.xyz);
    return z / w;
where str are texture coordinates associated with the nth texture unit, intermediate_coord are texture coordinates associated with the (n-1)th texture unit, and prevlookup is the result of a previous texture operation. This function can be used with the DEPTH varying out semantic to generate the texm3x2pad/texm3x2depth instruction combination in ps_1_3
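The two formulas above translate directly to scalar C. The names in this sketch are hypothetical and the texture fetches are omitted. It also shows why texCUBE_reflect_eye_dp3x3 divides by dot(N, N): the reflection 2 * dot(N, E) / dot(N, N) * N - E is unchanged when N is scaled, so the per-pixel normal does not need to be normalized.

```c
/* Hypothetical 3-component vector. */
typedef struct { float x, y, z; } v3;

static float v3_dot(v3 a, v3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Reflection of eye vector E about (possibly unnormalized) normal N. */
v3 reflect_eye(v3 N, v3 E)
{
    float k = 2.0f * v3_dot(N, E) / v3_dot(N, N);
    v3 r = { k * N.x - E.x, k * N.y - E.y, k * N.z - E.z };
    return r;
}

/* Depth value of tex_dp3x2_depth: z / w. */
float dp3x2_depth(v3 str, v3 intermediate_coord, v3 prevlookup)
{
    float z = v3_dot(intermediate_coord, prevlookup);
    float w = v3_dot(str, prevlookup);
    return z / w;
}
```

Reflecting E = (1, 0, 1) about N = (0, 0, 1) or about the unnormalized N = (0, 0, 2) gives the same result, (-1, 0, 1).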
207. ngle = WavesX * IN.TexCoord0.x + WavesY * IN.TexCoord0.y;
        angle = angle + Time;
        float3 sine, cosine;
        sincos(angle, sine, cosine);
        // position is (x, sum of h * sin(angle), z)
        float4 position;
        position.xz = IN.TexCoord0.xy;
        position.y = dot(WavesH, sine);
        position.w = 1.0;
        OUT.HPOS = mul(WorldViewProj, position);
        // normal is (-h * WaveX * cos(angle), 1, -h * WaveY * cos(angle))
        float3 normal;
        normal.x = -dot(WavesH, WavesX * cosine);
        normal.y = 1.0;
        normal.z = -dot(WavesH, WavesY * cosine);
        // transform normal into eye space
        normal = mul(WorldViewIT, normal);
        normal = normalize(normal);
        // Transform vertex to eye space and compute the vector from the eye
        // to the vertex. Because the eye is at 0, no subtraction is necessary.
        // Because the reflection of this vector looks into a cube map,
        // normalization is also unnecessary.
        float3 eyeVector = mul(WorldView, position).xyz;
        OUT.TEX0.xyz = reflect(eyeVector, normal);
        return OUT;
    }
Matrix Palette Skinning
Description: This effect performs matrix palette skinning using two bones per vertex. All the bones for the mesh are set in constant memory, and each vertex includes two indices that indicate which bones influence it. The final skinned positions are computed using these bones along with the weights
208. niform float4x4 ModelViewI,
                   uniform float4 ViewerPos,
                   uniform float4 LightPos)
    {
        vert2frag Out;
        // Vertex positions
        // In clip space
        Out.HPosition = mul(ModelViewProj, In.Position);
        // In object space
        Out.OPosition = In.Position.xyz;
        // In eye space
        Out.EPosition = mul(ModelView, In.Position).xyz;
        Out.Normal = normalize(In.Normal.xyz);
        // Copy the texture coordinates
        Out.TexCoord0 = In.TexCoord0.xyz;
        // Generate a white color
        Out.Color0 = ...;
        Out.LightPos = mul(ModelViewI, LightPos).xyz;
        Out.ViewerPos = mul(ModelViewI, float4(0, 0, 0, 1)).xyz;
        return Out;
    }
Pixel Shader Source Code for Melting Paint
    struct vert2frag {
        float4 HPosition : POSITION;
        float3 OPosition : TEXCOORD2;
        float3 EPosition : TEXCOORD3;
        float3 Normal    : TEXCOORD1;
        float3 TexCoord0 : TEXCOORD0;
        float4 Color0    : COLOR0;
        float3 LightPos  : TEXCOORD4;
        float3 ViewerPos : TEXCOORD5;
    };
    void calcLighting(out float diffuse,
                      out float specular,
                      ...
                      float3 eyePos,
                      float specularExp)
    {
        ...
        float len = length(light);
        ...
        float3 eye = normalize(eyePos - fragPos);
        float3 halfVec = normalize(eye + light);
        ...
        ... dot(halfVec, normal) ... specularExp ...
        // diffuse lighting ... attenuation ... specular
209. ns must predefine the type identifiers float2x1, float3x3, float4x4, and so on. A typedef follows the usual matrix naming convention TYPErowsxcolumns; for example, float2x3 has two rows and three columns. If we declare float4x4 a, then a[3] is equivalent to a._m30_m31_m32_m33. Both expressions extract the row with index 3 (the fourth row) of the matrix. Implementations are required to support indexing of vectors and matrices with constant indices. A struct type is a collection of one or more members of possibly different types.
Partial Support of Types. This specification mandates partial support for some types. Partial support for a type requires the following: definitions and declarations using the type are supported; assignment and copy of objects of that type are supported, including implicit copies when passing function parameters; and top-level function parameters may be defined using that type. If a type is partially supported, variables may be defined using that type, but no useful operations can be performed on them. Partial support for types makes it easier to share data structures in code that is targeted at different profiles.
Type Categories. The integral type category includes types cint and int. The floating type category includes types cfloat, float, half, and fixed. Note that "floating" really means floating or fixed/fractional. The numeric type category includes integral and floating types. The compile t
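As a quick check of the indexing rule, this C sketch stores element _mRC of a 4x4 matrix at m[R][C] and returns row 3. That row corresponds to a[3] and a._m30_m31_m32_m33 in Cg, and it is the fourth row because indices start at 0. The Float4x4/Float4 types and matrix_row are hypothetical illustrations, not Cg types.

```c
/* Hypothetical types: element _mRC is stored at m[R][C]. */
typedef struct { float m[4][4]; } Float4x4;
typedef struct { float v[4]; } Float4;

/* Equivalent of Cg's a[row], i.e. a._m<row>0_m<row>1_m<row>2_m<row>3. */
Float4 matrix_row(const Float4x4 *a, int row)
{
    Float4 r;
    for (int c = 0; c < 4; ++c)
        r.v[c] = a->m[row][c];
    return r;
}
```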
210. ntensCoord = float2(dot(IN.texCoord1.xyz, normal.xyz),
                                dot(IN.texCoord2.xyz, normal.xyz));
        intensity = tex2D(intensityMap, intensCoord);
        color = tex2D(colorMap, IN.texCoord3.xy);
        ...
Appendix C: Nine Steps to High-Performance Cg
Writing Cg code that compiles to efficient programs requires techniques and approaches that differ from efficient programming in C, C++, or Java. Some basic lessons are the same, such as using efficient underlying algorithms, but the hardware programming model of modern GPUs is substantially different from that of modern CPUs. This can lead to pitfalls where your shader's performance disappoints. It also creates opportunities to push the GPU to its limits through careful programming. The Cg language shields you from the majority of the low-level details of GPU hardware, enabling you to think about your shaders at a higher level than the low-level GPU instruction sets. However, just as an understanding of modern computer architecture (such as cache and memory hierarchy issues) is important for writing fast C and C++ code, understanding a bit about the GPU can help you write better Cg code. This appendix focuses on techniques for maximizing performance from vertex and fragment programs written in Cg and running on the NVIDIA GeForce FX architecture, specifically the vp30, fp30, arbfp1, ps_2_0, ps_2_x, vs_2_0, and vs_2_x profiles.
211. o accepted. If used with a variable that requires more than one constant register (for example, a matrix), the semantic specifies the first register that is used.
Binding Semantics for Varying Input/Output Data
Table 26 summarizes the valid binding semantics for varying input parameters in the vp30 profile. TANGENT and BINORMAL can also be used instead of TEXCOORD6 and TEXCOORD7. These binding semantics map to NV_vertex_program2 input attribute parameters, and the two sets act as aliases of each other.
Table 26. vp30 Varying Input Binding Semantics
    Binding Semantics Name               Corresponding Data
    POSITION, ATTR0                      Input vertex, Generic Attribute 0
    BLENDWEIGHT, ATTR1                   Input vertex weight, Generic Attribute 1
    NORMAL, ATTR2                        Input normal, Generic Attribute 2
    COLOR0, DIFFUSE, ATTR3               Input primary color, Generic Attribute 3
    COLOR1, SPECULAR, ATTR4              Input secondary color, Generic Attribute 4
    TESSFACTOR, FOGCOORD, ATTR5          Input fog coordinate, Generic Attribute 5
    PSIZE, ATTR6                         Input point size, Generic Attribute 6
    BLENDINDICES, ATTR7                  Generic Attribute 7
    TEXCOORD0-TEXCOORD7, ATTR8-ATTR15    Input texture coordinates (texcoord0-texcoord7), Generic Attributes 8-15
    TANGENT, ATTR14                      Generic Attribute 14
    BINORMAL, ATTR15                     Generic Attribute 15
Table 27 summarizes the valid binding semantics for var
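The two naming sets in Table 26 are aliases, so any semantic name resolves to a single generic-attribute index. This hypothetical C helper (vp30_attribute_index is not a Cg function) encodes the table. It shows, for example, that TANGENT and TEXCOORD6 both resolve to attribute 14.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Resolves a vp30 varying-input semantic to its generic attribute index
   according to Table 26; returns -1 for names not in the table. */
int vp30_attribute_index(const char *semantic)
{
    static const struct { const char *name; int attr; } table[] = {
        {"POSITION", 0}, {"BLENDWEIGHT", 1}, {"NORMAL", 2},
        {"COLOR0", 3}, {"DIFFUSE", 3}, {"COLOR1", 4}, {"SPECULAR", 4},
        {"TESSFACTOR", 5}, {"FOGCOORD", 5}, {"PSIZE", 6},
        {"BLENDINDICES", 7}, {"TANGENT", 14}, {"BINORMAL", 15},
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i)
        if (strcmp(semantic, table[i].name) == 0)
            return table[i].attr;
    /* TEXCOORD0..TEXCOORD7 alias ATTR8..ATTR15 */
    if (strncmp(semantic, "TEXCOORD", 8) == 0 && semantic[8] >= '0' &&
        semantic[8] <= '7' && semantic[9] == '\0')
        return 8 + (semantic[8] - '0');
    /* ATTR0..ATTR15 */
    if (strncmp(semantic, "ATTR", 4) == 0 && isdigit((unsigned char)semantic[4])) {
        char *end;
        long n = strtol(semantic + 4, &end, 10);
        if (*end == '\0' && n <= 15)
            return (int)n;
    }
    return -1;
}
```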
212. ogram, TRUE, 0, 0, 0);
        ...
        // Bind sampler parameter
        CGparameter parameter;
        parameter = cgGetNamedParameter(program, "MySampler");
        cgD3D9SetTexture(parameter, myDefaultPoolTexture);
    }
    void OnLostDevice()
    {
        // First release all necessary resources
        PrepareForReset();
        // Next actually reset the D3D device
        device->Reset(...);
        // Finally recreate all those resources
        OnReset();
    }
    void PrepareForReset()
    {
        ...
        // Release expanded interface reference
        cgD3D9SetTexture(mySampler, 0);
        // Release local reference and any other references to the texture
        myDefaultPoolTexture->Release();
        ...
    }
    void OnReset()
    {
        // Recreate myDefaultPoolTexture in D3DPOOL_DEFAULT
        ...
        // Since the texture was just recreated, it must be re-bound to the parameter
        CGparameter parameter;
        parameter = cgGetNamedParameter(program, "MySampler");
        cgD3D9SetTexture(parameter, myDefaultPoolTexture);
        ...
    }
See the Direct3D documentation for a full explanation of lost devices and how to handle them properly.
Setting Expanded Interface Parameters. This section discusses setting the various types of parameters of the expanded interface: uniform scalars, uniform vectors, uniform matrices, uniform arrays of these three types, and samplers. Setting Un
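The order of operations in OnLostDevice matters for two reasons. Direct3D 9 cannot reset a device while resources in D3DPOOL_DEFAULT are still alive. A texture bound through cgD3D9SetTexture is one such live reference, which is why PrepareForReset unbinds it. The C sketch below is a toy model of that bookkeeping, not Direct3D:
- DefaultPoolTexture, SamplerBinding, bind_texture, and model_device_reset are all hypothetical.
- The reset is simply modeled as failing while the texture's reference count is non-zero.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: a default-pool texture with a reference count and a
   sampler binding that holds one reference while bound. */
typedef struct { int refCount; } DefaultPoolTexture;
typedef struct { DefaultPoolTexture *bound; } SamplerBinding;

/* Binding takes a reference; unbinding (t == NULL) drops it. */
static void bind_texture(SamplerBinding *s, DefaultPoolTexture *t)
{
    if (s->bound != NULL) s->bound->refCount--;
    s->bound = t;
    if (t != NULL) t->refCount++;
}

/* Model of the reset: fails while any reference to the texture remains. */
static bool model_device_reset(const DefaultPoolTexture *t)
{
    return t->refCount == 0;
}
```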
213. ogram 24
    cgD3D TRACE Discovered sampler parameter BaseTexture
    cgD3D TRACE Discovered uniform parameter SomeColor of type float4
    cgD3D TRACE Finished discovering parameters for pixel program 24
    cgD3D TRACE Shadowing state for sampler parameter BaseTexture
    cgD3D TRACE Shadowing sampler state D3DTSS_MAGFILTER for sampler parameter BaseTexture
    cgD3D TRACE Shadowing sampler state D3DTSS_MINFILTER for sampler parameter BaseTexture
    cgD3D TRACE Shadowing sampler state D3DTSS_MIPFILTER for sampler parameter BaseTexture
    cgD3D TRACE Shadowing 16 values for uniform parameter ModelViewProj of type float4x4
    cgD3D TRACE Activating vertex shader for program 3
    cgD3D TRACE Setting shadowed parameters for program 3
    cgD3D TRACE Setting registers for uniform parameter ModelViewProj of type float4x4
    cgD3D TRACE Setting constant registers 0-3 for parameter ModelViewProj of type float4x4
    cgD3D TRACE Activating pixel shader for program 24
    cgD3D TRACE Setting shadowed parameters for program 24
    cgD3D TRACE Setting texture for sampler parameter BaseTexture
    cgD3D TRACE Setting SamplerState 0 D3DTSS_MAGFILTER for sampler parameter BaseTexture
    cgD3D TRACE Setting SamplerState 0 D3DTSS_MINFILTER for sampler parameter BaseTexture
    cgD3D TRACE Setting SamplerState 0 D3
214. om startIndex to startIndex + numberOfElements - 1. Passing a value of 0 for numberOfElements tells the functions to set all the values from index startIndex up to the last valid index of the array, namely cgGetArraySize(parameter, 0) - 1. This is equivalent to setting numberOfElements to cgGetArraySize(parameter, 0) - startIndex. The parameter array is an array of scalar values. It must have numberOfElements values for the cgGLSetParameterArray1 functions, 2 * numberOfElements values for the cgGLSetParameterArray2 functions, and so on. The corresponding parameter value retrieval functions are as follows:
    void cgGLGetParameterArray1f(CGparameter parameter, long startIndex, long numberOfElements, float* array);
    void cgGLGetParameterArray1d(CGparameter parameter, long startIndex, long numberOfElements, double* array);
    void cgGLGetParameterArray2f(CGparameter parameter, long startIndex, long numberOfElements, float* array);
    void cgGLGetParameterArray2d(CGparameter parameter, long startIndex, long numberOfElements, double* array);
    void cgGLGetParameterArray3f(CGparameter parameter, long startIndex, long numberOfElements, float* array);
    void cgGLGetParameterArray3d(CGparameter parameter, long startIndex, long numberOfElements, double* array);
    void cgGLGetParameterArray4f(CGparameter parameter, long startIndex, long numberOfElements, float* array);
    void cgGLGetParameterArray4d(CGparameter parameter, long startIndex, long numberOfElements, double* array);
S
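The index arithmetic above fits in a few helper functions. In this sketch the function names are hypothetical, arraySize stands in for cgGetArraySize(parameter, 0), and components is the digit in the function name (1 for cgGLSetParameterArray1f through 4 for cgGLSetParameterArray4f).

```c
/* numberOfElements == 0 means "from startIndex to the end of the array". */
long effective_element_count(long arraySize, long startIndex, long numberOfElements)
{
    return numberOfElements == 0 ? arraySize - startIndex : numberOfElements;
}

/* Number of scalars the array argument must hold. */
long required_scalar_count(long arraySize, long startIndex, long numberOfElements,
                           int components)
{
    return effective_element_count(arraySize, startIndex, numberOfElements) * components;
}

/* Last array index touched: startIndex + count - 1. */
long last_index(long arraySize, long startIndex, long numberOfElements)
{
    return startIndex + effective_element_count(arraySize, startIndex, numberOfElements) - 1;
}
```

For a 10-element array, startIndex = 3 and numberOfElements = 0 cover indices 3 through 9. A cgGLSetParameterArray4f call with those arguments would therefore read 28 floats.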
215. ompilation profiles may allow the hardware some flexibility in precision. In such cases, the compiler should ideally perform constant folding at the highest hardware precision allowed for that data type in that profile. If constant folding cannot be performed at run-time precision, it may optionally be performed using the precision indicated below for each of the numeric data types:
    float: s23e8 (fp32) IEEE single-precision floating point
    half: s10e5 (fp16) floating point with IEEE semantics
    fixed: s1.10 fixed point, clamping to [-2, 2)
    double: s52e11 (fp64) IEEE double-precision floating point
    int: signed 32-bit integer
Type Qualifiers. The type of an object may be qualified with one or more qualifiers. Qualifiers apply only to objects. Qualifiers are removed from the value of an object when it is used in an expression. The qualifiers are:
    const: The value of a const-qualified object cannot be changed after its initial assignment. The definition of a const-qualified object that is not a parameter must contain an initializer. Named compile-time values are inherently qualified as const, but an explicit qualification is also allowed. The value of a static const cannot be changed after compilation, so its value may be used in constant folding during compilation. A uniform const, on the other hand, is const only for a given execution of the program; its value may be changed via the runtime between executions.
    in and out: F
216. ompiled program by the compiler, in which case the application can simply ignore it and not set its value. Calling cgIsParameterReferenced lets you check whether a parameter is actually used by the final compiled program:
    CGbool cgIsParameterReferenced(CGparameter parameter);
No error is generated if you set the value of a parameter that is not referenced.
Parameter Attributes. The program that the parameter corresponds to is found using cgGetParameterProgram:
    CGprogram cgGetParameterProgram(CGparameter parameter);
To determine whether the parameter is varying, uniform, or constant, use cgGetParameterVariability:
    CGenum cgGetParameterVariability(CGparameter parameter);
The call returns CG_VARYING if the parameter is a varying parameter, CG_UNIFORM if the parameter is a uniform parameter, or CG_CONSTANT if the parameter is a constant parameter. A constant parameter is a parameter whose value never changes for the life of a compiled program, so changing its value requires recompiling the program. For some profiles, the compiler has to add parameters that correspond to literal constant values in the code. To obtain the parameter direction, use cgGetParameterDirection:
    CGenum cgGetParameterDirection(CGparameter parameter);
It returns CG_IN if the parameter is an input parameter, CG_OUT if the parameter is an output parameter, or CG_INOUT if the parameter is both an input and an output parameter.
217. one
    CGD3D9ERR_NULLVALUE: Returned when a value of zero is passed to a function that requires a non-zero value.
    CGD3D9ERR_OUTOFRANGE: Returned when an array range specified to a function is out of range.
    CGD3D9_INVALID_REG: Returned when a register number is requested for an invalid parameter type. This error is specific to the minimal interface functions and does not trigger an error callback.
Testing for Errors. When you call a Direct3D runtime function that returns an error of type HRESULT, the proper way to test for success or failure is to use the Win32 macros FAILED and SUCCEEDED. Simply testing the error against zero or D3D_OK is not sufficient, because there can be more than one success value. As an added convenience, and for uniformity with the core runtime, the Direct3D runtime also supplies cgD3D9GetLastError. It is analogous to cgGetLastError but returns the last Direct3D runtime error of type HRESULT for which the FAILED macro returns TRUE:
    HRESULT cgD3D9GetLastError();
The last error is always cleared immediately after the call. The function cgD3D9TranslateHRESULT converts an error of type HRESULT into a string:
    const char* cgD3D9TranslateHRESULT(HRESULT hr);
Call this function instead of DXGetErrorDescription9, because it also translates errors that the Cg Direct3D runtime generates.
Using Err
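Comparing against D3D_OK is insufficient because every non-negative HRESULT is a success code. S_FALSE (1), for example, succeeds but is not equal to D3D_OK (0). So that it compiles without <windows.h>, this sketch reproduces the Win32 definitions of HRESULT (a 32-bit signed integer), SUCCEEDED, FAILED, S_FALSE, and E_FAIL. On Windows, use the real headers instead. is_success is a hypothetical helper.

```c
#include <stdint.h>

/* Reproductions of the Win32/Direct3D definitions, for illustration only. */
typedef int32_t HRESULT;

#define SUCCEEDED(hr) (((HRESULT)(hr)) >= 0)
#define FAILED(hr)    (((HRESULT)(hr)) < 0)

#define D3D_OK  ((HRESULT)0)
#define S_FALSE ((HRESULT)1)
#define E_FAIL  ((HRESULT)-2147467259) /* 0x80004005 as a signed 32-bit value */

/* Correct test: any non-negative code is a success, not just D3D_OK. */
int is_success(HRESULT hr)
{
    return SUCCEEDED(hr);
}
```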
218. onmentMaps[2]
        ...
        float3 reflectColor = texCUBE(environmentMaps[0], reflectVec).rgb;
        float3 reflectColorDark = texCUBE(environmentMaps[1], reflectVecDark).rgb;
        ...
Melting Paint
Description: This shader uses an environment map with procedurally modified texture lookups to create a melting effect on the surface texture (the NVIDIA logo in this example). The reflection vector is shifted using a noise function, giving the appearance of a bumpy surface. The surface texture's coordinates are shifted in a time-dependent manner, also based on a noise texture.
Figure 7. Example of Melting Paint
Vertex Shader Source Code for Melting Paint
    // define inputs from application
    struct app2vert {
        float4 Position  : POSITION;
        float4 Normal    : NORMAL;
        float4 Color0    : COLOR0;
        float4 TexCoord0 : TEXCOORD0;
    };
    struct vert2frag {
        float4 HPosition : POSITION;
        float3 OPosition : TEXCOORD2;
        float3 EPosition : TEXCOORD3;
        float3 Normal    : TEXCOORD1;
        float3 TexCoord0 : TEXCOORD0;
        float4 Color0    : COLOR0;
        float3 LightPos  : TEXCOORD4;
        float3 ViewerPos : TEXCOORD5;
    };
    vert2frag main(app2vert In,
                   uniform float4x4 ModelViewProj,
                   uniform float4x4 ModelView,
                   u
219. onn {
        float4 Hposition : POSITION;
        float4 TexCoord0 : TEXCOORD0;
        float4 TexCoord1 : TEXCOORD1;
        float4 Color0    : COLOR0;
    };
    vpconn main(appdata IN,
                uniform float4x4 WorldViewProj,
                uniform float4x4 TexTransform,
                uniform float3x3 WorldIT,
                uniform float3 LightVec)
    {
        vpconn OUT;
        float3 worldNormal = normalize(mul(WorldIT, IN.Normal));
        float ldotn = max(dot(LightVec, worldNormal), 0.0);
        OUT.Color0.xyz = ldotn;
        float4 tempPos;
        tempPos.xyz = IN.Position.xyz;
        tempPos.w = 1.0;
        OUT.TexCoord0 = mul(TexTransform, tempPos);
        OUT.TexCoord1 = mul(TexTransform, tempPos);
        OUT.Hposition = mul(WorldViewProj, tempPos);
        return OUT;
    }
Pixel Shader Source Code for Shadow Mapping
    struct v2f_simple {
        float4 Hposition : POSITION;
        float4 TexCoord0 : TEXCOORD0;
        float4 TexCoord1 : TEXCOORD1;
        float4 Color0    : COLOR0;
    };
    float4 main(v2f_simple IN,
                uniform sampler2D ShadowMap,
                uniform sampler2D SpotLight) : COLOR
    {
        float4 shadow = tex2D(ShadowMap, IN.TexCoord0.xy);
        float4 spotlight = tex2D(SpotLight, IN.TexCoord1.xy);
        float4 lighting = IN.Color0;
        return lighting * shadow * spotlight;
    }
Shadow Volume Extrusion
Description: This effect uses vertex programs to generate shadow volumes by extruding geometry along the light vector (Figure 20).
Figure 20. Example of Shadow Volum
220. ons, arrays indexed with variable expressions need not be declared const, just uniform. However, writing to an array that is later indexed with a variable expression yields unpredictable results. Array data is not packed, because vertex program indexing does not permit it. Each element of the array takes a single four-float program parameter register. For example, float arr[10], float2 arr[10], float3 arr[10], and float4 arr[10] all consume 10 program parameter registers. It is more efficient to access an array of vectors than an array of matrices. Accessing a matrix requires a floor calculation followed by a multiply by a constant to compute the register index. Because vectors and scalars take one register each, neither the floor nor the multiply is needed. Matrix skinning is therefore faster with arrays of vectors and a premultiplied index than with arrays of matrices.
Bindings
Binding Semantics for Uniform Data. Table 10 summarizes the valid binding semantics for uniform parameters in the vs_2_0 and vs_2_x profiles.
Table 10. vs_2_* Uniform Input Binding Semantics
    Binding Semantics Name                   Corresponding Data
    register(c0)-register(c255), C0-C255     Constant register 0-255
The aliases c0-c255 (lowercase) are also accepted. If used with a variable that requires more than one constant register (for example, a matrix), the semantic specifies the first register that is used.
221. Arithmetic Operators 14
    Multiplication Functions 15
    Vector Constructor 15
    Boolean and Comparison Operators 15
    Swizzle Operator 16
    Write Mask Operator 16
    Conditional Operator 17
    Texture Lookups in Advanced Fragment Profiles 17
    More Details 18
    Cg Standard Library Functions 19
    Mathematical Functions 19
    Geometric Functions 24
    Texture Map Functions 25
    Derivative Functions 27
    Debugging Function 28
    Predefined Fragment Program Output Structures 28
Using the Cg Runtime Library 29
    Introducing the Cg Runtime 29
    Benefits of the Cg Runtime 29
    Overview of the Cg Runtime 30
    Core Cg Runtime 34
    Core Cg Context 35
    Core Cg Program 35
    Core Cg Parameter
222. opts profopts: Specify a comma-separated list of profile-specific options. See the profile specification for valid options. -entry fname: Specify the main function name as fname. -o fname: Write the output to file fname. -Dmacro[=value]: Define a macro with an optional value. -Ipathname: Specify the path to an include directory. -l filename: Write compiler messages to filename rather than to standard output. -strict: Enforce strict type checking. -nofx: Do not treat CgFX keywords as reserved words. -quiet: Suppress printing the header to stdout. -nocode: Compile but do not generate any code. -nostdlib: Do not include the stdlib.h header file before compilation. -longprogs: Allow code generation that is longer than a profile's limit. -debug: Activate the debug function. -v: Print the compiler's version to stdout. -h: Print a short help message. -maxunrollcount N: Set the maximum loop unroll count to N. Loops with more than N iterations are not unrolled. Defaults to 256. -posinv: Generate a position-invariant vertex program if position invariance is supported by the current profile. A: abs, for performance 259; animation of geometry 146; anisotropic lighting: sample shader 134, vertex shader code example 135; ANSI C: differences from Cg 166, relation to Cg 165; arbfp1 profile 211; arbvp1 profile
223. or Callbacks. Here is an example of a possible error callback that separates debug trace errors from core runtime errors and from Direct3D runtime errors: void MyErrorCallback() { CGerror error = cgGetError(); if (error == cgD3D9DebugTrace) { /* This is a debug trace output. A breakpoint could be set here to step from one debug output to the other. */ return; } char buffer[1024]; if (error == cgD3D9Failed) sprintf(buffer, "A Direct3D error occurred: '%s'\n", cgD3D9TranslateHRESULT(cgD3D9GetLastError())); else sprintf(buffer, "A Cg error occurred: '%s'\n", cgD3D9TranslateCGerror(error)); OutputDebugString(buffer); } cgSetErrorCallback(MyErrorCallback); A Brief Tutorial. This section walks you through the sample Cg Microsoft Visual Studio workspace we have provided, along with a simple Cg program that you can use for experimentation. Loading the Workspace. When you load the Cg Simple file, your workspace should look like the image in Figure 3.
224. ormal parameters may be qualified as in, out, or both by using in, out, or inout. By default, formal parameters are in-qualified. An in-qualified parameter is equivalent to a call-by-value parameter. An out-qualified parameter is equivalent to a call-by-result parameter, and an inout-qualified parameter is equivalent to a value-result parameter. An out-qualified parameter cannot be const-qualified, nor may it have a default value. Type Conversions. Some type conversions are allowed implicitly, while others require a cast. Some implicit conversions may cause a warning, which can be suppressed by using an explicit cast. Explicit casts are indicated using C-style syntax: casting variable to the float4 type can be achieved using (float4)variable. Scalar conversions: Implicit conversion of any scalar numeric type to any other scalar numeric type is allowed. A warning may be issued if the conversion is implicit and a loss of precision is possible. Implicit conversion of any scalar object type to any compatible scalar object type is allowed. Conversions between incompatible scalar object types, or between object and numeric types, are not allowed, even with an explicit cast. A sampler is compatible with sampler1D, sampler2D, sampler3D, samplerCUBE, and samplerRECT. No other object types are compatible; sampler1D is not compatible with sampler2D, even though both are compatible with sampler.
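Because these qualifiers describe copy semantics rather than references, they can be modeled in C with explicit local copies. The sketch below is a hypothetical illustration with invented function names. It shows call by value (in), call by result (out), and value-result (inout). C pointers are used only as the channel for the final copy-out, not to imply that Cg passes references.

```c
/* in: the callee receives a copy; changes never reach the caller. */
float scale_in(float x)
{
    x *= 2.0f;              /* modifies the local copy only */
    return x;
}

/* out (call by result): the callee writes a local, which is copied to the
 * caller's variable when the function returns. */
void make_half_out(float *result)
{
    float local;            /* an out parameter starts with no defined value */
    local = 0.5f;
    *result = local;        /* copy out at return */
}

/* inout (value-result): copy in on entry, work on the copy, copy out on return. */
void accumulate_inout(float *value, float delta)
{
    float local = *value;   /* copy in */
    local += delta;
    *value = local;         /* copy out */
}
```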
225. ort for fixed operations but must still support definition of fixed variables. Cg allows profiles to omit run-time support for int. Cg allows profiles to treat double as float. Many operators support per-element vector operations. The ?:, ||, &&, !, and comparison operators can be used with bool4 vectors to perform four conditional operations simultaneously. The side effects of all operands to the || and && operators are always executed. Non-static global variables and parameters to top-level functions (such as main) may be designated as uniform. A uniform variable may be read and written within a program, just like any other variable. However, the uniform modifier indicates that the initial value of the variable or parameter is expected to be constant across a large number of invocations of the program. A new set of sampler types represents handles to texture objects. Functions may have default values for their parameters, as in C++. These defaults are expressed using assignment syntax. Function overloading is supported. There is no enum or union. Bit-field declarations in structures are not allowed. Variables may be defined anywhere before they are used, rather than just at the beginning of a scope as in C. That is, we adopt the C++ rules that govern where variable declarations are allowed. Variables may not be redeclare
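The rule that both operands of && and || are always evaluated differs from C, where these operators short-circuit. The C sketch below contrasts the two. `c_and` uses C's own short-circuit &&, while `cg_style_and` evaluates both operands first, as the text describes for Cg. `and4` shows the per-element form on a four-element bool array. All names are hypothetical, and the counter exists only to make the side effects observable.

```c
#include <stdbool.h>

/* Counts how many operands were evaluated, to make side effects visible. */
static int side_effects = 0;

static bool operand(bool value)
{
    side_effects++;              /* the "side effect" of evaluating an operand */
    return value;
}

/* C semantics: the right operand is skipped when the left one is false. */
bool c_and(bool a, bool b)
{
    return operand(a) && operand(b);
}

/* Cg semantics as described above: both operands run, then they combine. */
bool cg_style_and(bool a, bool b)
{
    bool left = operand(a);
    bool right = operand(b);
    return left && right;
}

/* Per-element form on a bool4-like array: four conditions at once. */
void and4(const bool a[4], const bool b[4], bool out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = a[i] && b[i];
}

int evaluated_count(void) { return side_effects; }
void reset_count(void) { side_effects = 0; }
```

Porting C code that relies on short-circuiting to guard a side effect, such as an index check before an access, therefore needs care.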
226. osAngle 1 0 xxxx return OUT; } Grass. Description: This effect shows procedural animation of geometry using a sine function, along with calculation of a normal for the procedurally deformed geometry (Figure 17). Figure 17. Example of Grass. Vertex Shader Source Code for Grass: struct app2vert { float4 Position : POSITION; float4 Normal : NORMAL; float4 TexCoord0 : TEXCOORD0; float4 Color0 : COLOR0; }; struct vertout { float4 Hposition : POSITION; float4 Color0 : COLOR0; float4 TexCoord0 : TEXCOORD0; }; vertout main(app2vert IN, uniform float4x4 ModelViewProj, uniform float4x4 ModelView, uniform float4x4 ModelViewIT, uniform float4 Constants) { vertout OUT; /* we need to figure out what the position is */ float4 position = IN.Position; position z 0 POSO ny Up /* add in the actual base location of the straw, stored in Color0.xz */ POSTON 5 AO Sato aP JUN Colo sre POSTEN 7 O O Son EN CONO OR /* figure out where the wind is coming from */ float4 origin float4 20 0 20 0 float4 dir position origin /* find the intensity of the wind */ float inten sin Constants x 2 length dir JIN c POSILIE LOI S77 dir = normalize(dir); /* we need to do some Bezier curv Eloet ermi float4 0 0 0 0 loci ceriz alo N Color 7 2 9 OP ODE float4 ctr13
227. ource Code for Thin Film Effect: /* define inputs from application */ struct a2v { float4 Position : POSITION; float3 Normal : NORMAL; }; /* define outputs from vertex shader */ struct v2f { float4 HPOS : POSITION; float4 diffCol : COLOR0; float4 specCol : COLOR1; float2 filmDepth : TEXCOORD0; }; v2f main(a2v IN, uniform float4x4 WorldViewProj, uniform float4x4 WorldViewIT, uniform float4x4 WorldView, uniform float4 LightVector, uniform float4 FilmDepth, uniform float4 EyeVector) { v2f OUT; /* transform position to clip space */ OUT.HPOS = mul(WorldViewProj, IN.Position); float4 tempnorm = float4(IN.Normal, 0.0); /* transform normal from model space to view space */ float3 normalVec = mul(WorldViewIT, tempnorm).xyz; normalVec = normalize(normalVec); /* compute the eye->vertex vector */ float3 eyeVec = EyeVector.xyz; /* compute the view depth for the thin film */ float viewdepth = (1.0 / dot(normalVec, eyeVec)) * FilmDepth.x; OUT.filmDepth = viewdepth.xx; /* store normalized light vector */ float3 lightVec = normalize((float3)LightVector); /* calculate half-angle vector */ float3 halfAngleVec = normalize(lightVec + eyeVec); /* calculate diffuse component */ float diffuse = dot(normalVec, lightVec); /* calculate specular component */ float specular = dot(normalVec, halfAngleVec);
228. ovides, because many of them compile directly to GPU assembly language instructions. Writing a dot product function of your own (one that returns a.x*b.x + a.y*b.y + a.z*b.z, for example) compiles to a handful of instructions, while the built-in dot function compiles to a single specialized dot product instruction. The Standard Library is the only way to get to this instruction. Two functions deserve particular attention. The abs function usually has no cost in either vertex or fragment programs, because the GPU can evaluate the function while executing other instructions. Similarly, the saturate function usually has no cost in fragment programs. Do not hesitate to use these functions when appropriate. 4. Use Texture Maps to Encode Complex Functions. For profiles that support texture maps, filtered texture map lookups are extraordinarily efficient. If you have a complex function that takes more than a handful of arithmetic operations to evaluate, you might want to encode the function in a texture map. Say that you have written a function f(x, y) that is a bottleneck in your shader. Assume for now that it is always called with values of x and y between zero and one, and that the value that f(x, y) computes is always between zero and one. If the function is reasonably smooth and you don't need to compute it at extremely high precision
229. parameter. These functions iterate through all the simple parameters, structure fields, and array elements that are input to the program. Nothing is guaranteed regarding the order of the parameters in the sequence. Direct Retrieval. Any parameter of a program can be retrieved directly by using its name with cgGetNamedParameter: CGparameter cgGetNamedParameter(CGprogram program, const char* name); If the program has no parameter corresponding to name, cgGetNamedParameter returns zero. The Cg syntax is used to retrieve structure fields or array elements. Take the following code snippet as an example: struct FooStruct { float4 A; float4 B; }; struct BarStruct { FooStruct Foo[2]; }; void main(BarStruct Bar[3]) { /* ... */ } The following are valid names for retrieving the corresponding parameter: "Bar", "Bar[1]", "Bar[1].Foo", "Bar[1].Foo[0]", and "Bar[1].Foo[0].B". Parameter Query. Parameter queries encompass validity, references, and attributes. Parameter Validity. The function cgIsParameter allows you to check whether a parameter handle references a valid parameter: CGbool cgIsParameter(CGparameter parameter); A parameter handle becomes invalid when the program, or the context of the program it corresponds to, is destroyed. Parameter References. A parameter that is referenced by the original Cg source code may be optimized out of the c
230. pe. A function that takes no parameters may be declared in one of two ways: as in C, using the void keyword, functionName(void); or with no parameters at all, functionName(). Functions may be declared as static. If so, they may not be compiled as a program and are not visible from other compilation units. Overloading of Functions by Profile. Cg supports overloading of functions by compilation profile. This capability allows a function to be implemented differently for different profiles. It is also useful because different profiles may support different subsets of the language capabilities, and because the most efficient implementation of a function may be different for different profiles. The profile name must immediately precede the type name in the function declaration. For example, to define two different versions of the function myfunc for the profileA and profileB profiles: profileA float myfunc(float x) {...}; profileB float myfunc(float x) {...}; If a type is defined using a typedef that has the same name as a profile, the identifier is treated as a type name and is not available for profile overloading at any subsequent point in the file. If a function definition does not include a profile, the function is referred to as an open-profile function. Open-profile functions apply to all profiles. Several wildcard profile names are defined. The
231. ple illustrating this operation: CGprogram program1, program2; program1 = cgCreateProgramFromFile(context, CG_SOURCE, "VertexProgram.cg", CG_PROFILE_VS_1_1, 0, 0); const DWORD* declaration1 = cgD3D8GetVertexDeclaration(program1); cgD3D8LoadProgram(program1, TRUE, 0, 0, declaration1); program2 = cgCopyProgram(program1); const DWORD declaration2[] = { /* custom declaration */ }; if (cgD3D8ValidateVertexDeclaration(program2, declaration2)) cgD3D8LoadProgram(program2, TRUE, 0, 0, declaration2); Only the loading functions differ between Direct3D 9 and Direct3D 8; the unloading and binding functions are the same. To release the Direct3D resources allocated by cgD3D9LoadProgram, such as the Direct3D shader object and any shadowed parameter, use: HRESULT cgD3D9UnloadProgram(CGprogram program); Note that cgD3D9UnloadProgram does not free any core runtime resources, such as program and any of its parameter handles. On the other hand, destroying a program with cgDestroyProgram or cgDestroyContext releases any Direct3D resources by indirectly calling cgD3D9UnloadProgram. The function cgD3D9IsProgramLoaded returns CG_TRUE if a program is loaded: CGbool cgD3D9IsProgramLoaded(CGprogram program); All programs must be loaded before they can be bound. Binding a program is done by calling cgD3D9BindProgram: HRESULT cgD3D9BindProgram(CGprogram program); This function basi
232. ply transposed float3x3 matrix m by a float3 v, mul(v, m) is equivalent to, and more efficient than, mul(transpose(m), v). 9. Minimize Conditional Code in Fragment Programs. GPUs don't currently support branching in fragment programs, so a program with a large amount of conditionally executed code (for example, in an if-else statement) tends to run at the same speed as if all of it were executed. Therefore, if you have a large amount of conditional code and it is possible to evaluate the condition on the CPU, it may be advantageous to have multiple versions of the shader source code and to bind the one with the appropriate code path at run time. An example of this situation would be a fragment shader that supported a generic light source model for shading. Depending on how its parameters were set, it might implement a point light, a spotlight, or a light source that projected a texture map to determine the light distribution. Rather than having a series of if-else tests to determine which light model to use, having a separate version of the shader for each light type is generally more efficient. Appendix D: Cg Compiler Options. This appendix describes the command-line options for the Cg compiler, cgc.exe. -profile prof: Compile for the prof profile. -profile
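The equivalence can be checked numerically. In Cg, mul(m, v) treats v as a column vector and mul(v, m) treats it as a row vector. So mul(v, m) yields the same result as mul(transpose(m), v) without ever forming the transpose. The C sketch below models both forms for a 3 × 3 matrix stored row by row. `mul_mv`, `mul_vm`, and `transpose3` are hypothetical helpers, not Cg library functions.

```c
/* m[row][col]; mul_mv models Cg's mul(m, v) with v as a column vector. */
void mul_mv(float m[3][3], const float v[3], float out[3])
{
    for (int r = 0; r < 3; r++)
        out[r] = m[r][0] * v[0] + m[r][1] * v[1] + m[r][2] * v[2];
}

/* mul_vm models Cg's mul(v, m) with v as a row vector: out[c] = sum over r of v[r] * m[r][c]. */
void mul_vm(const float v[3], float m[3][3], float out[3])
{
    for (int c = 0; c < 3; c++)
        out[c] = v[0] * m[0][c] + v[1] * m[1][c] + v[2] * m[2][c];
}

/* Explicit transpose, the extra work that mul(v, m) avoids. */
void transpose3(float m[3][3], float t[3][3])
{
    for (int r = 0; r < 3; r++)
        for (int c = 0; c < 3; c++)
            t[c][r] = m[r][c];
}
```

In a shader, this means writing mul(v, m) directly rather than calling transpose first.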
233. pproach used to specify binding semantics for inputs. Aliasing of Semantics. Semantics must honor a copy-on-input and copy-on-output model. Thus, if the same input binding semantic is used for two different variables, those variables are initialized with the same value, but the variables are not aliased thereafter. Output aliasing is illegal, but implementations are not required to detect it. If the compiler does not issue an error on a program that aliases output binding semantics, the results are undefined. Restrictions on Semantics Within a Structure. For a particular profile, it is illegal to mix input binding semantics and output binding semantics within a particular struct. That is, for a particular top-level function, a struct must be either input-only or output-only. Likewise, a struct must consist exclusively of uniform inputs or exclusively of non-uniform inputs. It is illegal to use binding semantics to mix the two within a single struct. Additional Details for Binding Semantics. The following rules are somewhat redundant but provide extra clarity: semantics names are case-insensitive; semantics attached to parameters to non-main functions are ignored; input semantics may be aliased by multiple variables; output semantics may not be aliased. How Programs Receive and Return Data. A program is just a non-static function that has been design
234. predefined output structures 28; varying output 8; fragment program profiles 193: OpenGL ARB 211, OpenGL NV_fragment_program 218; fragment program, defined 2; fresnel: sample shader 144, vertex shader code example 144; function: calls 171, multiplying 15, open profile 170; function definitions, introduction 14; function overloading 181, introduction 14; functions: debugging 28, declaring 169, derivative 27, geometric 24, mathematical 19, overloading by profile 170, standard library 19, texture map 25. G: geometric functions 24; GL_ARB_vertex 204; global variables 182; graphics hardware, evolution of xi; grass: sample shader 146, vertex shader code example 146. H: half data type 11; half type specification 171. I: if statements 185; inputs: uniform 5, varying 5; int data type 11; int type specification 171; integral type category 174. J: Java, relation to Cg 165. L: language profiles, concept of 3. M: mathematical functions 19; matrices: multiplying 15, support of 11; matrix palette skinning: sample shader 161, vertex shader code example 162; matrix transposes and performance 263; melting paint: pixel shader code example 107, sample shader 105, vertex shader code example 105; min, for performance 259; miscellaneous operators 190; modifiable function parameters, passing 14; multipaint: pixel shader code example 111, sample shader 109, vertex shader code example 110. N: namespaces 179; numeric type c
235. ption to provide full support for the fixed type or to implement the fixed type with the same precision as the half or float types. The bool type represents Boolean values; objects of bool type are either true or false. The cint type is 32-bit two's complement. This type is meaningful only at compile time; it is not possible to declare objects of type cint. The cfloat type is IEEE single-precision (32-bit) floating point. This type is meaningful only at compile time; it is not possible to declare objects of type cfloat. The void type may not be used in any expression. It may only be used as the return type of functions that do not return a value. The sampler types are handles to texture objects. Formal parameters of a program or function may be of type sampler. No other definition of sampler variables is permitted. A sampler variable may only be used by passing it to another function as an in parameter. Assignment to sampler variables is not permitted, and sampler expressions are not permitted. The following sampler types are always defined: sampler, sampler1D, sampler2D, sampler3D, samplerCUBE, and samplerRECT. The base sampler type may be used in any context in which a more specific sampler type is valid. However, a sampler variable must be used in a consistent way throughout the program. For example, it cannot be used in place of both a sampler1D and a sampler2D
236. r offset rectangle scale NV_texture_shader instructions. tex1D_dp3(sampler1D tex, float3 str, float4 prevlookup): Performs the following: return tex1D(tex, dot(str, prevlookup.xyz)); where str are texture coordinates associated with sampler tex, and prevlookup is the result of a previous texture operation. This function can be used to generate the dot product 1D NV_texture_shader instruction. Table 50. p20 Auxiliary Texture Functions (continued) (Texture Function: Description): tex2D_dp3x2(uniform sampler2D tex, float3 str, float4 intermediate_coord, float4 prevlookup) and texRECT_dp3x2(uniform samplerRECT tex, float3 str, float4 intermediate_coord, float4 prevlookup): Performs the following: float2 newst = float2(dot(intermediate_coord.xyz, prevlookup.xyz), dot(str, prevlookup.xyz)); return tex2D(tex, newst); (texRECT for the rectangle variant) where str are texture coordinates associated with sampler tex, prevlookup is the result of a previous texture operation, and intermediate_coord are texture coordinates associated with the previous texture unit. This function can be used to generate the dot product 2D or dot product rectangle NV_texture_shader instruction combinations. tex3D_dp3x3(sampler3D tex, float3 str, float4 intermediate_coord1, float4 intermediate_coord2, float4 prevlookup), texCUBE_dp3x3(samplerCUBE tex, float3 str, float4 intermediate_coord1
237. r, y/g, z/b, w/a, xy/rg, xyz/rgb, xyzw/rgba, xxx/rrr, yyy/ggg, zzz/bbb, www/aaa, xxxx/rrrr, yyyy/gggg, zzzz/bbbb, wwww/aaaa. Matrix swizzles are not supported. Boolean operators other than <, <=, >, and >= are not supported. Furthermore, <, <=, >, and >= are only supported as the condition in the ?: operator. Bitwise integer operators are not supported. / is not supported unless the divisor is a non-zero constant or it is used to compute the depth output in ps_1_3. % is not supported. Ternary ?: is supported if the boolean test expression is a compile-time boolean constant, a uniform scalar boolean, or a scalar comparison to a constant value in the range 0.5 to 1.0 (for example, a > 0.5 ? b : c). do, for, and while loops are supported only when they can be completely unrolled. Arrays, vectors, and matrices may be indexed only by compile-time constant values or by index variables in loops that can be completely unrolled. The discard statement is not supported; the similar but less general clip function is supported. The use of an allocation-rule identifier for an input or output struct is optional. Standard Library Functions. Because the DirectX pixel shader 1_X profiles have limited capabilities, not all of the Cg standard library functions are supported. Table 35 presents the Cg standard library functions that are supported by these profiles. See th
238. r fragment program profiles. Profiles may define additional output binding semantics with specific behaviors, and these definitions are expected to be consistent across commonly used profiles. Table 9. Fragment Output Binding Semantics (Name: Meaning; Type; Default Value): COLOR: RGBA output color; float4; undefined. COLOR0: same as COLOR. DEPTH: fragment depth value; float; interpolated depth from the rasterizer, in the range [0, 1]. If a program desires an output color alpha of 1.0, it should explicitly write a value of 1.0 to the w component of the COLOR output. The language does not define a default value for this output. Note: If the target hardware uses a default value for this output, the compiler may choose to optimize away an explicit write specified by the user if it matches the default hardware value. Such defaults are not exposed in the language. In contrast, the language does define a default value for the DEPTH output: the interpolated depth obtained from the rasterizer. Semantically, this default value is copied to the output at the beginning of the execution of the fragment program. As discussed earlier, when a binding semantic is applied to an output, the type of the output variable is not required to match the type of the bindin
239. r needs more registers to compile a program than are available, it generates an error. 2. To understand the capabilities of DirectX PS 2.0 pixel shaders and the code produced by the compiler, refer to the Pixel Shader Reference in the DirectX 9 SDK documentation. Language Constructs and Support. Data Types. This profile implements data types as follows: the float data type is implemented as IEEE 32-bit single precision; the half, fixed, and double data types are treated as float (half data types can be used to specify the partial precision hint for pixel shader instructions); the int data type is supported using floating-point operations; sampler types are supported to specify sampler objects used for texture fetches. Statements and Operators. With the ps_2_0 profiles, while, do, and for statements are allowed only if the loops they define can be unrolled, because there is no dynamic branching in PS 2.0 shaders. In the current Cg implementation, extended ps_2_x shaders also have the same limitation. Comparison operators are allowed (>, <, >=, <=, ==, and !=), and Boolean operators (||, &&) are allowed. However, the bitwise logic operators (such as &) are not. Using Arrays and Structures. Variable indexing of arrays is not allowed. Array and structure data is not packed. Bindings. Binding Semanti
240. r produces, see the Vertex Shader Reference in the DirectX 9 SDK documentation. Statements and Operators. If the vs_2_0 profile is used, then if, while, do, and for statements are allowed only if the loops they define can be unrolled, because there is no dynamic branching in unextended VS 2.0 shaders. If the vs_2_x profile is used, then if, while, and do statements are fully supported as long as the DynamicFlowControlDepth option is not 0. Comparison operators are allowed (>, <, >=, <=, ==, and !=), and Boolean operators (||, &&) are allowed. However, the bitwise logic operators (such as &) are not. Data Types. The profiles implement data types as follows: float data types are implemented as IEEE 32-bit single precision; half and double data types are treated as float; the int data type is supported using floating-point operations, which adds extra instructions for proper truncation for divides, modulos, and casts from floating-point types; fixed and sampler data types are not supported, but the profiles do provide the minimal partial support that is required for these data types by the core language specification (that is, it is legal to declare variables using these types, as long as no operations are performed on the variables). Using Arrays. Variable indexing of arrays is allowed as long as the array is a uniform constant. For compatibility reas
241. ... 39; Core Cg Errors ... 44; API-Specific Cg Runtimes ... 45; Parameter Shadowing ... 46; OpenGL Cg Runtime ... 46; Direct3D Cg Runtime ... 57; A Brief Tutorial ... 89; Loading the Workspace ... 89; Understanding simple.cg ... 90; Program Listing for simple.cg ... 91; Definitions for Structures with Varying Data ... 92; Passing Arguments ... 93; Basic Transformations ... 93; Prepare for Lighting ... 94; Calculating the Vertex Color ... 94; Further Experimentation ... 95; Advanced Profile Sample Shaders ... 97; Improved Skinning ... 98; Description ... 98; Vertex Shader Source Code for Improved Skinning ... 99; Improved Water ... 101; Description ... 101; Vertex Shader Source Code for Improved Water ... 102; Pixel Shader Source Code for Improved Water ... 104; Melting Paint
242. rder, and the c suffix is for functions that assume the matrix is laid out in column order. The corresponding parameter value retrieval functions are: void cgGLGetMatrixParameterfr(CGparameter parameter, float* matrix); void cgGLGetMatrixParameterfc(CGparameter parameter, float* matrix); void cgGLGetMatrixParameterdr(CGparameter parameter, double* matrix); void cgGLGetMatrixParameterdc(CGparameter parameter, double* matrix); Use cgGLSetStateMatrixParameter to set an OpenGL 4x4 state matrix: void cgGLSetStateMatrixParameter(CGparameter parameter, GLenum stateMatrixType, GLenum transform); The variable stateMatrixType is an enumerated type specifying the state matrix to be used to set the parameter: CG_GL_MODELVIEW_MATRIX for the current model-view matrix; CG_GL_PROJECTION_MATRIX for the current projection matrix; CG_GL_TEXTURE_MATRIX for the current texture matrix; CG_GL_MODELVIEW_PROJECTION_MATRIX for the concatenated model-view and projection matrices. The variable transform is an enumerated type specifying a transformation applied to the state matrix before it is used to set the parameter value: CG_GL_MATRIX_IDENTITY for applying no transformation at all; CG_GL_MATRIX_TRANSPOSE for transposing the matrix; CG_GL_MATRIX_INVERSE for inverting the matrix; CG_GL_MATRIX_INVERSE_TRANSPOSE for inverting and transposing the matrix. Setting
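The r and c suffixes differ only in how the 16 values are laid out in the caller's array. The C sketch below shows that correspondence. Reading a row-major array as if it were column-major yields the transposed matrix, so converting between the two layouts is a transpose. The helper names are hypothetical, and the sketch does not call the Cg runtime. For reference, OpenGL's own matrix functions such as glLoadMatrixf expect column-major order.

```c
/* Array position of element (row r, column c) of a 4x4 matrix in each layout. */
int row_major_index(int r, int c)    { return r * 4 + c; }
int column_major_index(int r, int c) { return c * 4 + r; }

/* Re-lay a row-major array as column-major (equivalently, transpose it in place of a copy). */
void row_to_column_major(const float row_major[16], float column_major[16])
{
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            column_major[column_major_index(r, c)] = row_major[row_major_index(r, c)];
}
```

Passing a row-major array to a c-suffixed function without converting it would therefore load the transpose of the intended matrix.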
243. re them in the program itself. Instead, the compiler will issue, as comments, a list of program parameter registers and the constants that need to be loaded into them. The Cg run-time system will handle loading the constants as directed by the compiler. Note: If the Cg run-time system is not used, it is the responsibility of the programmer to make sure that the constants are loaded properly. Bindings. Binding Semantics for Uniform Data. Table 31 summarizes the valid binding semantics for uniform parameters in the vs_1_1 profile. Table 31. vs_1_1 Uniform Input Binding Semantics (Binding Semantics Name: Corresponding Data): register(c0) to register(c95), C0 to C95: constant register 0 to 95. The aliases c0 to c95 (lowercase) are also accepted. If used with a variable that requires more than one constant register (for example, a matrix), the semantic specifies the first register that is used. Binding Semantics for Varying Input/Output Data. Table 32 summarizes the valid binding semantics for varying input parameters in the vs_1_1 profile. These map to the input registers in DirectX 8.1 vertex shaders. Table 32. vs_1_1 Varying Input Binding Semantics (Binding Semantics Name: Corresponding Data): POSITION: vertex shader input register v0; BLENDWEIGHT: vertex shader input register v1; BLENDINDICES: vertex shader input register v2; NORMAL
244. re element of type float4x4 with an input binding semantic that causes it to track the fixed-function modelview-projection matrix. The name of this binding semantic is currently profile-specific; for OpenGL profiles, the semantic _GL_MVP is recommended. If the first condition is met but not the second, the compiler is encouraged to issue a warning. Implementations may choose to recognize more general versions of the second condition (such as the variables being copy-propagated from the original inputs and outputs), but this additional generality is not required. Binding Semantics for Outputs. As shown in Table 8, there are two output binding semantics for vertex program profiles. Table 8. Vertex Output Binding Semantics (Name: Meaning; Type; Default Value): POSITION: homogeneous clip-space position, fed to rasterizer; float4; undefined. PSIZE: point size; float; undefined. Profiles may define additional output binding semantics with specific behaviors, and these definitions are expected to be consistent across commonly used profiles. Fragment Program Profiles. A few features of the Cg language that are specific to fragment program profiles are required to be implemented in the same manner for all fragment program profiles. Binding Semantics for Outputs. As shown in Table 9, there are three output binding semantics fo
245. rent kinds of inputs. Varying inputs are used for data that is specified with each element of the stream of input data. For example, the varying inputs to a vertex program are the per-vertex values that are specified in vertex arrays. For a fragment program, the varying inputs are the interpolants, such as texture coordinates. Uniform inputs are used for values that are specified separately from the main stream of input data and don't change with each stream element. For example, a vertex program typically requires a transformation matrix as a uniform input. Uniform inputs are often thought of as graphics state. Varying Inputs to a Vertex Program. A vertex program typically consumes several different per-vertex varying inputs. For example, the program might require that the application specify the following varying inputs for each vertex, typically in a vertex array: model-space position, model-space normal vector, and texture coordinate. In a fixed-function graphics pipeline, the set of possible per-vertex inputs is small and predefined. This predefined set of inputs is exposed to the application through the graphics API. For example, OpenGL 1.4 provides the ability to specify a vertex array of normal vectors. In a programmable graphics pipeline, there is no longer a small set of predefined inputs. It is perfectly reasonable for the developer to write a vertex program that uses a per-vertex refractive index value, as long as t
246. ression is a compile-time boolean constant, a uniform scalar boolean, or a scalar comparison to a constant value in the range [-0.5, 1.0] (for example, a > 0.5 ? b : c).
- do, for, and while loops are supported only when they can be completely unrolled.
- Arrays, vectors, and matrices may be indexed only by compile-time constant values or by index variables in loops that can be completely unrolled.
- The discard statement is not supported. The similar but less general clip() function is supported.
- The use of an allocation-rule identifier for an input or output struct is optional.
Standard Library Functions
Because the fp20 profile has limited capabilities, not all of the Cg standard library functions are supported. Table 45 presents the Cg standard library functions that are supported by this profile. See the standard library documentation for descriptions of these functions.
Table 45. Supported Standard Library Functions
  dot(floatN, floatN)
  lerp(floatN, floatN, floatN)
  lerp(floatN, floatN, float)
  tex1D(sampler1D, float)
  tex1D(sampler1D, float2)
  tex1Dproj(sampler1D, float2)
  tex1Dproj(sampler1D, float3)
  tex2D(sampler2D, float2)
  tex2D(sampler2D, float3)
  tex2Dproj(sampler2D, float3)
  tex2Dproj(sampler2D, float4)
  texRECT(samplerRECT, float2)
247. revious texture operation, intermediate_coord1 are texture coordinates associated with the n-2 texture unit, and intermediate_coord2 are texture coordinates associated with the n-1 texture unit. This function can be used to generate the texm3x3pad/texm3x3pad/texm3x3tex instruction combination in all ps_1_x profiles.
Table 40. ps_1_x Auxiliary Texture Functions (continued)
  Texture Function:
    texCUBE_reflect_dp3x3(uniform samplerCUBE tex, float4 strq, float4 intermediate_coord1, float4 intermediate_coord2, float4 prevlookup)
  Description: Performs the following:
    float3 E = float3(intermediate_coord2.w, intermediate_coord1.w, strq.w);
    float3 N = float3(dot(intermediate_coord1.xyz, prevlookup.xyz),
                      dot(intermediate_coord2.xyz, prevlookup.xyz),
                      dot(strq.xyz, prevlookup.xyz));
    return texCUBE(tex, 2 * dot(N, E) / dot(N, N) * N - E);
  where strq are texture coordinates associated with sampler tex, prevlookup is the result of a previous texture operation, intermediate_coord1 are texture coordinates associated with the n-2 texture unit, and intermediate_coord2 are texture coordinates associated with the n-1 texture unit. This function can be used to generate the texm3x3pad/texm3x3pad/texm3x3vspec instruction combination in all ps_1_x profiles.
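The vector passed to texCUBE by texCUBE_reflect_dp3x3 is the reflection of E about N, R = 2 * dot(N, E) / dot(N, N) * N - E. The following is a minimal CPU-side sketch of just that formula in C, assuming a hypothetical float3 struct and helper names; it is not part of the Cg runtime. Dividing by dot(N, N) is what lets N be unnormalized, which is why the test below uses a normal of length 2.

```c
#include <assert.h>

typedef struct { float x, y, z; } float3;   /* hypothetical stand-in for Cg's float3 */

static float dot3(float3 a, float3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* R = 2 * dot(N, E) / dot(N, N) * N - E.
   Dividing by dot(N, N) means N does not have to be unit length. */
static float3 reflect_dp3x3_vector(float3 N, float3 E) {
    float nn = dot3(N, N);
    assert(nn > 0.0f);                       /* N must be nonzero */
    float s = 2.0f * dot3(N, E) / nn;
    float3 r = { s * N.x - E.x, s * N.y - E.y, s * N.z - E.z };
    return r;
}
```

For example, reflecting E = (1, 0, 1) about the unnormalized N = (0, 0, 2) gives (-1, 0, 1), the same as reflecting about the unit z axis.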
248. riables. In this case, the homogeneous position information resides in the hardware register corresponding to POSITION, and the color information resides in the hardware register corresponding to COLOR.
Passing Arguments
Now let's take a look at the body of the program section by section, starting with the declaration of main:
    vertout main(appin IN,
                 uniform float4x4 ModelViewProj,
                 uniform float4x4 ModelViewIT,
                 uniform float4 LightVec)
As required for a vertex program, main takes an application-to-vertex structure as input and returns a vertex-to-fragment structure. In this case, we are using the two structure types we have already defined, appin and vertout.
Notice that main takes in three uniform parameters: two matrices and one vector. All three parameters are passed to simple.cg by the application using the run-time library. The first matrix, ModelViewProj, is the concatenation of the modelview and projection matrices. Together these matrices transform points from model space to clip space. The second matrix, ModelViewIT, is the inverse transpose of the modelview matrix. The third parameter, LightVec, is a vector that specifies the location of the light source.
Basic Transformations
Now we start the body of the vertex program:
    vertout OUT;
    OUT.HPosition = mul(ModelViewProj, IN.Position);
A vertex program is responsible for calculating the homogeneous clip-space position of the vertex, given the vertex's model
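Because ModelViewProj is the concatenation of the modelview and projection matrices, applying it once gives the same result as applying the modelview matrix and then the projection matrix. The C sketch below checks this on the CPU, assuming row-major float4x4 and float4 structs of our own; the translation stands in for a modelview matrix and a uniform scale stands in for a projection matrix, purely for illustration.

```c
#include <assert.h>

typedef struct { float m[4][4]; } float4x4;   /* hypothetical: m[row][col] */
typedef struct { float v[4]; } float4;

/* mul(M, v): M times a column vector. */
static float4 mul_mv(const float4x4 *M, float4 v) {
    float4 r;
    for (int i = 0; i < 4; ++i) {
        r.v[i] = 0.0f;
        for (int j = 0; j < 4; ++j)
            r.v[i] += M->m[i][j] * v.v[j];
    }
    return r;
}

/* mul(A, B): matrix product, so mul(mul(A, B), v) == mul(A, mul(B, v)). */
static float4x4 mul_mm(const float4x4 *A, const float4x4 *B) {
    float4x4 r;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            r.m[i][j] = 0.0f;
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += A->m[i][k] * B->m[k][j];
        }
    return r;
}
```

The projection is on the left (mvp = mul(proj, modelView)) because, with column vectors, the matrix applied first sits closest to the vector.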
249. rns a 4-vector as follows:
- The x component of the result vector contains the ambient coefficient, which is always 1.0.
- The y component contains the diffuse coefficient, which is zero if n·l < 0; otherwise, n·l.
- The z component contains the specular coefficient, which is zero if either n·l < 0 or n·h < 0; otherwise, (n·h)^m.
- The w component is 1.0.
There is no vectorized version of this function.
  log(x)           Natural logarithm ln(x). x must be greater than zero.
  log2(x)          Base-2 logarithm of x. x must be greater than zero.
  log10(x)         Base-10 logarithm of x. x must be greater than zero.
  max(a, b)        Maximum of a and b.
  min(a, b)        Minimum of a and b.
  modf(x, out ip)  Splits x into integral and fractional parts, each with the same sign as x. Stores the integral part in ip and returns the fractional part.
  mul(M, N)        Matrix product of matrix M and matrix N: $\mathrm{mul}(M, N)_{ij} = \sum_{k=1}^{B} M_{ik} N_{kj}$. If M has size AxB and N has size BxC, returns a matrix of size AxC.
  mul(M, v)        Product of matrix M and column vector v: $\mathrm{mul}(M, v)_{i} = \sum_{j=1}^{B} M_{ij} v_{j}$. If M is an AxB matrix and v is a Bx1 vector, returns an Ax1 vector.
  mul(v, M)
250. rocessor. We refer to these programs as vertex programs and fragment programs, respectively. Fragment programs are also known as pixel programs or pixel shaders, and we use these terms interchangeably in this document. Cg code can be compiled into GPU assembly code, either on demand at run time or beforehand.
Cg makes it easy to combine a Cg fragment program with a handwritten vertex program, or even with the non-programmable OpenGL or DirectX vertex pipeline. Likewise, a Cg vertex program can be combined with a handwritten fragment program, or with the non-programmable OpenGL or DirectX fragment pipeline.
Cg Language Profiles
Because all CPUs support essentially the same set of basic capabilities, the C language supports this set on all CPUs. However, GPU programmability has not quite yet reached this same level of generality. For example, the current generation of programmable vertex processors supports a greater range of capabilities than do the programmable fragment processors. Cg addresses this issue by introducing the concept of language profiles. A Cg profile defines a subset of the full Cg language that is supported on a particular hardware platform or API. The current release of the Cg compiler supports the following profiles:
- DirectX 9 vertex shaders
  Runtime profiles: CG_PROFILE_VS_2_X, CG_PROFILE_VS_2_0
  Compiler options: -profile vs_2_x, -profile vs_2_0
- Dire
251. rs is set by some function of the Direct3D Cg runtime, it is immediately downloaded to the GPU constant memory (the memory containing the values of all the uniform parameters). When parameter shadowing is turned on, the value is shadowed instead, and no Direct3D call is made at the time it is set; only when the program is bound are all of its parameters actually downloaded to the constant memory. This means that a parameter value set after binding the program is not used during the execution of the program until the next time the program is bound.
Parameter shadowing applies to all parameter settings, including texture stage state and texture mode.
Disabling parameter shadowing allows the runtime to consume less memory, but forces the application to make sure that the constant memory contains all the right values every time it activates a program.
OpenGL Cg Runtime
This section discusses setting parameters and program execution for the OpenGL Cg runtime.
Setting Parameters in OpenGL
In accordance with the OpenGL convention, many of the functions described below come in two versions: a version operating on float values, marked with an f, and a version operating on double values, marked with a d.
Setting Uniform Scalar and Uniform Vector Parameters
To set the values of scalar parameters or vector parameters, use the cgGLSetParameter functions:
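The timing rule for parameter shadowing (values set after binding take effect only at the next bind) can be modeled with a toy structure. The following C sketch is purely illustrative: ToyProgram, toy_set_parameter, and toy_bind_program are hypothetical names, not Cg runtime or Direct3D calls, and the array merely stands in for GPU constant memory.

```c
#include <assert.h>
#include <string.h>

#define NUM_CONSTANTS 4   /* hypothetical: constant registers used by one program */

/* Toy model of one program's uniform parameters (not the Cg runtime API). */
typedef struct {
    float constant_memory[NUM_CONSTANTS]; /* stands in for GPU constant memory */
    float shadow[NUM_CONSTANTS];          /* runtime-side shadow copy */
    int shadowing_enabled;
} ToyProgram;

static void toy_set_parameter(ToyProgram *p, int reg, float value) {
    assert(reg >= 0 && reg < NUM_CONSTANTS);
    if (p->shadowing_enabled)
        p->shadow[reg] = value;           /* no device call at set time */
    else
        p->constant_memory[reg] = value;  /* downloaded immediately */
}

static void toy_bind_program(ToyProgram *p) {
    if (p->shadowing_enabled)             /* download every shadowed value at bind time */
        memcpy(p->constant_memory, p->shadow, sizeof p->shadow);
}
```

With shadowing on, a value set after the bind stays invisible in constant_memory until toy_bind_program runs again; with it off, the write lands immediately and no shadow copy is consulted.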
252. ructions. The underlying instruction set and machine architecture limit programmability in this profile compared to what is allowed by Cg constructs. Thus, this profile places additional restrictions on what can and cannot be done in a Cg program.
Restrictions
A Cg program in one of these profiles is limited to generating a maximum of four texture shader instructions and eight register combiner instructions. Since these numbers are quite small, users need to be very aware of this limitation while writing Cg code for these profiles.
The fp20 profile also restricts when a texture shader operation or arithmetic operation can occur in the program. A texture shader operation may not have any dependency on the output of an arithmetic operation unless:
- the arithmetic operation is a valid input modifier for the texture shader operation, or
- the arithmetic operation is part of a complex texture shader operation (these are summarized in the section "Auxiliary Texture Functions" on page 251).
[9] For more details about the underlying instruction sets, their capabilities, and their limitations, please refer to the NV_texture_shader and NV_register_combiners extensions in the OpenGL Extensions documentation.
Modifiers
There are certain simple arithmetic operations that can be applied to inputs of texture shader operations and to inputs and outputs of arithmetic operations
253. s like this one, you can create a vertex declaration using those semantics:
    DWORD declaration[] = {
        D3DVSD_STREAM(0),
        D3DVSD_REG(D3DVSDE_POSITION,  D3DVSDT_FLOAT3),
        D3DVSD_REG(D3DVSDE_DIFFUSE,   D3DVSDT_D3DCOLOR),
        D3DVSD_REG(D3DVSDE_TEXCOORD0, D3DVSDT_FLOAT2),
        D3DVSD_END()
    };
    // Make sure the resulting declaration is compatible with the shader.
    // This is really just a sanity check.
    assert(cgD3D8ValidateVertexDeclaration(vertexProgram, declaration));
    // Create the shader handle using the declaration
    device->CreateVertexShader(declaration,
                               (DWORD*)byteCode->GetBufferPointer(),
                               &vertexShader, 0);
    // Create the pixel shader
    fragmentProgram = cgCreateProgramFromFile(context, CG_SOURCE, "FragmentProgram.cg",
                                              CG_PROFILE_PS_1_1, "main", 0);
    {
        CComPtr<ID3DXBuffer> byteCode;
        const char* progSrc = cgGetProgramString(fragmentProgram, CG_COMPILED_PROGRAM);
        D3DXAssembleShader(progSrc, strlen(progSrc), 0, 0, &byteCode, 0);
        device->CreatePixelShader((DWORD*)byteCode->GetBufferPointer(), &pixelShader);
    }
    // Grab some parameters
    modelViewMatrix = cgGetNamedParameter(vertexProgram, "ModelViewMatrix");
    baseTexture = cgGetNamedParameter(fragmentProgram, "BaseTexture");
    someColor = cgGetNamedParameter(fragmentProgram, "SomeColor");
    // Sanity check t
254. s, that is, (lightVec + eyeVec) / 2. We normalize halfVec, so we don't need to bother with the division by two, because it cancels out after normalization anyway.
In this example, we assume that the eye is at (0, 0, 1), but an application would typically pass the eye position in as a uniform parameter as well, since it would be unchanged from vertex to vertex. We use Cg's inline vector construction capability to build a 3-component float vector that contains the eye position, and then we assign this value to eyeVec.
Calculating the Vertex Color
Now we have to calculate the vertex color to output.
Calculating the Diffuse and Specular Lighting Contributions
In this example, we're going to calculate just a simple combination of diffuse and specular lighting:
    // calculate diffuse component
    float diffuse = dot(normalVec, lightVec);   // [1]
    // calculate specular component
    float specular = dot(normalVec, halfVec);
    // Use the lit function to compute lighting vector from diffuse and specular values
    float4 lighting = lit(diffuse, specular, 32);
[1] Because LightVec is uniform, it is more efficient to normalize it once in the application rather than on a per-vertex basis. It is done here for illustrative purposes.
Here we use the Cg Standard Library to perform dot products using dot(). We also make use of the Standard Library's lit() function to calculate a Blinn-style lighting vector based on th
255. s HdotN and LdotN per vertex to look up into a 2D texture to achieve interesting lighting effects.
Figure 13. Example of Anisotropic Lighting
Vertex Shader Source Code for Anisotropic Lighting
    struct appdata {
        float4 Position : POSITION;
        float3 Normal   : NORMAL;
    };
    struct vpconn {
        float4 Hposition : POSITION;
        float4 TexCoord0 : TEXCOORD0;
    };
    vpconn main(appdata IN,
                uniform float4x4 WorldViewProj,
                uniform float3x3 WorldIT,
                uniform float3x4 World,
                uniform float3 LightVec,
                uniform float3 EyePos)
    {
        vpconn OUT;
        float3 worldNormal = normalize(mul(WorldIT, IN.Normal));
        // build float4
        float4 tempPos;
        tempPos.xyz = IN.Position.xyz;
        tempPos.w = 1.0;
        // compute world space position
        float3 worldSpacePos = mul(World, tempPos);
        // vector from vertex to eye, normalized
        float3 vertToEye = normalize(EyePos - worldSpacePos);
        // h = normalize(l + e)
        float3 halfAngle = normalize(vertToEye + LightVec);
        OUT.TexCoord0.x = max(dot(LightVec, worldNormal), 0.0);
        OUT.TexCoord0.y = max(dot(halfAngle, worldNormal), 0.0);
        // transform into homogeneous clip space
        OUT.Hposition = mul(WorldViewProj, tempPos);
        return OUT;
    }
Bump Dot3x2 Diffuse and Specular
Description
The bump dot3x2 diffuse and specular effect mixes bump mapping with diffuse and specular lighting based on the
256. s a single-pass shader containing diffuse, specular, and environmental lighting effects in a compact, fast-executing package.
Figure 8. Example of MultiPaint
Vertex Shader Source Code for MultiPaint
    // define inputs from vertex buffer
    struct appin {
        float4 Position : POSITION;
        float4 UV       : TEXCOORD0;
        float4 Tangent  : TEXCOORD1;
        float4 Binormal : TEXCOORD2;
        float4 Normal   : TEXCOORD3;
    };
    // vertex output; the same struct is the fragment program's input struct
    struct MultiPaintV2F {
        float4 HPosition : POSITION;   // position (clip space)
        float4 TexCoords : TEXCOORD0;  // base ST coordinates
        float3 OPosition : TEXCOORD1;  // position (obj space)
        float3 Normal    : TEXCOORD2;  // normal (eye space)
        float3 VPosition : TEXCOORD3;  // view pos (obj space)
        float3 T         : TEXCOORD4;  // tangent (obj space)
        float3 B         : TEXCOORD5;  // binormal (obj space)
        float3 N         : TEXCOORD6;  // normal (obj space)
        float4 LightVecO : TEXCOORD7;  // light vector (obj space)
    };
    MultiPaintV2F main(appin IN,
                       uniform float4x4 ModelViewProj,
                       uniform float4x4 ModelViewIT,
                       uniform float4x4 ModelViewI,
                       uniform float4 TexRepeats,
                       uniform float4 LightVec)   // light vector (eye space)
    {
        MultiPaintV2F OUT;
        OUT.HPosition = mul(ModelViewProj, IN.Position);
        OUT.OPosition = IN.Position.xyz;
        // transform normal to eye space
        OUT.Normal = normalize(mul(ModelViewIT, IN.Normal).xyz);
        OUT.TexCoords = IN.UV * TexRepeats;
257. s allow the programmer to decide which constant register a uniform variable will reside in by specifying the C<n> or register(c<n>) binding semantic. This is not allowed in the fp20 profile, since the NV_register_combiners extension does not have a single bank of constant registers. While the NV_register_combiners extension does describe constant registers, these constant registers are per combiner stage, and specifying bindings to them in the program would overly constrain the compiler.
Binding Semantics for Varying Input/Output Data
The varying input binding semantics in the fp20 profile are the same as the varying output binding semantics of the vp20 profile. Varying input binding semantics in the fp20 profile consist of COLOR0, COLOR1, TEXCOORD0, TEXCOORD1, TEXCOORD2, and TEXCOORD3. These map to output registers in vertex shaders. Table 48 summarizes the valid binding semantics for varying input parameters in the fp20 profile.
Table 48. fp20 Varying Input Binding Semantics
  Binding Semantics Name            Corresponding Data
  COLOR, COLOR0, COL, COL0          Input color value v0
  COLOR1, COL1                      Input color value v1
  TEXCOORD0-TEXCOORD3, TEX0-TEX3    Input texture coordinates t0-t3
  FOGP, FOG                         Input fog color and factor
Additionally, the fp20 profile allows POSITION, PSIZE, TEXCOORD4, TEXCOORD5, TEXCOORD6, and TEXCOORD7 to be specified on varying inputs
258. s formal parameters, and each of the excess parameters has a default value, do not eliminate the function.
4. If the set is empty, fail. For each actual parameter expression in sequence, perform the following:
   a. If the type of the actual parameter matches the unqualified type of the corresponding formal parameter in any function in the set, remove all functions whose corresponding parameter does not match exactly.
   b. If there is a defined promotion for the type of the actual parameter to the unqualified type of the formal parameter of any function, remove all functions for which this is not true from the set.
   c. If there is a valid implicit cast that converts the type of the actual parameter to the unqualified type of the formal parameter of any function, remove all functions without this cast.
   d. Fail.
5. Choose a function based on profile:
   a. If there is at least one function with a profile that exactly matches the compilation profile, discard all functions that don't exactly match.
   b. Otherwise, if there is at least one function with a wildcard profile that matches the compilation profile, determine the most specific matching wildcard profile in the candidate set. Discard all functions except those with this most specific wildcard profile. How specific a given wildcard profile name is relative to a particular profile is determined by the profile specification
259. s int, the other operand is converted to int.
7. Otherwise, both operands have type cint.
Note that conversions happen prior to performing the operation.
Assignment
Assignment of an expression to an object or compile-time typed value converts the expression to the type of the object or value. The resulting value is then assigned to the object or value.
The value of assignment expressions (= and the compound assignment operators) is defined as in C: an assignment expression has the value of the left operand after the assignment but is not an lvalue. The type of an assignment expression is the type of the left operand, unless the left operand has a qualified type, in which case it is the unqualified version of the type of the left operand. The side effect of updating the stored value of the left operand occurs between the previous and the next sequence point.
Smearing of Scalars to Vectors
If a binary operator is applied to a vector and a scalar, the scalar is automatically type-promoted to a same-sized vector by replicating the scalar into each component. The ternary ?: operator also supports smearing. The binary rule is applied to the second and third operands first, and then the binary rule is applied to this result and the first operand.
Namespaces
Just as in C, there are two namespaces. Each has multiple scopes, as in C.
- Tag namespace, which consists of struct tags
- Regu
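The two-step smearing rule for ?: can be modeled in C with explicit vector structs. This sketch is an illustrative model of ours, not Cg compiler code: smear_float3 and smear_bool3 replicate a scalar into each component, and select3 stands for the ?: expression once all three operands have the same vector size, modeled here as a per-component select. For the Cg expression flag ? v : 0.0 with a bool flag and a float3 v, the third operand is first smeared against v, and then the condition is smeared against that result.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { float x, y, z; } float3;   /* hypothetical stand-ins for Cg vector types */
typedef struct { bool x, y, z; } bool3;

/* Smearing: replicate a scalar into every component of a same-sized vector. */
static float3 smear_float3(float s) { float3 r = { s, s, s }; return r; }
static bool3 smear_bool3(bool s) { bool3 r = { s, s, s }; return r; }

/* Per-component select modeling  c ? a : b  after all operands are float3/bool3. */
static float3 select3(bool3 c, float3 a, float3 b) {
    float3 r = { c.x ? a.x : b.x, c.y ? a.y : b.y, c.z ? a.z : b.z };
    return r;
}
```

So flag ? v : 0.0 corresponds to select3(smear_bool3(flag), v, smear_float3(0.0f)).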
260. s may be trademarks of the respective companies with which they are associated.
Updates
Any changes, additions, or corrections will be posted at the NVIDIA Cg Web site: http://developer.nvidia.com/Cg. Refer to this site often to keep up on the latest changes and additions to the Cg language.
Copyright
Copyright NVIDIA Corporation 2002
NVIDIA Corporation, 2701 San Tomas Expressway, Santa Clara, CA 95050
www.nvidia.com
Foreword
Preface
    Release Notes
    Online Updates
Introduction to the Cg Language
    The Cg Language
    Cg's Programming Model for GPUs
    Cg Language Profiles
    Declaring Programs in Cg
    Program Inputs and Outputs
    Working with Data
    Basic Data Types
    Type Conversions
    Structures
    Arrays
    Statements and Operators
    Control Flow
    Function Definitions and Function Overloading
261. s one sign bit, a 23-bit mantissa, and an 8-bit exponent. This type is supported in all profiles, although the DirectX 8 pixel profiles implement it with reduced precision and range for some operations.
- half: A 16-bit IEEE-like floating-point (s10e5) number.
- int: A 32-bit integer. Profiles may omit support for this type or have the option to treat int as float.
- fixed: A 12-bit fixed-point (s1.10) number. It is supported in all fragment profiles.
- bool: Boolean data is produced by comparisons and is used in if and conditional operator (?:) constructs. This type is supported in all profiles.
- sampler*: The handle to a texture object comes in six variants: sampler, sampler1D, sampler2D, sampler3D, samplerCUBE, and samplerRECT. These types are supported in all pixel and fragment profiles, with one exception: samplerRECT is not supported in the DirectX profiles.
Cg also includes built-in vector data types that are based on the basic data types. A sample of these built-in vector data types includes, but is not limited to, the following:
    float4, float3, float2, float1, bool4, bool3, bool2, bool1
Additional support is provided for matrices of up to four by four elements. Here are some examples of matrix declarations:
    float1x1 matrix1;  // One-element matrix
    float2x3 matrix2;  // Two-by-three matrix (six elements)
    float4x2 matrix3;  // Four-by-two matrix
262. s scalar and vector, the scalar is smeared to create a vector of the necessary size to perform an elementwise operation. Thus, a * float3(A, B, C) is equal to float3(a*A, a*B, a*C).
The built-in arithmetic operators do not currently support matrix operands. It is important to remember that matrices are not the same as vectors, even if their dimensions are the same.
Multiplication Functions
Cg's mul functions are for multiplying matrices by vectors and matrices by matrices:
- Matrix by column vector multiply (matrix * column vector):
    mul(M, v);
- Row vector by matrix multiply (row vector * matrix):
    mul(v, M);
- Matrix by matrix multiply (matrix * matrix):
    mul(M, N);
It is important to use the correct version of mul; otherwise, you are likely to get unexpected results. More detail on the mul functions is provided in "Cg Standard Library Functions" on page 19.
Vector Constructor
Cg allows vectors up to size 4 to be constructed using the following notation:
    y = x * float4(3.0, 2.0, 1.0, -1.0);
The vector constructor can appear anywhere in an expression.
Boolean and Comparison Operators
Cg includes three of the standard C boolean operators:
    &&  logical AND
    ||  logical OR
    !   logical negation
In C, these operators consume and produce values of type int, but in Cg they consume and produce values of type bool. This difference is
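The difference between mul(M, v) and mul(v, M) is which index of M is summed over. The following C sketch, using row-major 3x3 structs of our own (not a Cg API), implements both and shows that multiplying by the unit vector (1, 0, 0) picks out the first column of M in one case and the first row in the other.

```c
#include <assert.h>

typedef struct { float m[3][3]; } float3x3;   /* hypothetical: m[row][col] */
typedef struct { float v[3]; } float3;

/* mul(M, v): M times a column vector; result[i] = sum_j M[i][j] * v[j]. */
static float3 mul_matrix_vector(const float3x3 *M, float3 v) {
    float3 r;
    for (int i = 0; i < 3; ++i) {
        r.v[i] = 0.0f;
        for (int j = 0; j < 3; ++j) r.v[i] += M->m[i][j] * v.v[j];
    }
    return r;
}

/* mul(v, M): a row vector times M; result[j] = sum_i v[i] * M[i][j]. */
static float3 mul_vector_matrix(float3 v, const float3x3 *M) {
    float3 r;
    for (int j = 0; j < 3; ++j) {
        r.v[j] = 0.0f;
        for (int i = 0; i < 3; ++i) r.v[j] += v.v[i] * M->m[i][j];
    }
    return r;
}
```

Equivalently, mul(v, M) equals mul(transpose(M), v), which is why swapping the argument order silently transposes the transform.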
263. s the tex2D function to perform a 2D texture lookup to determine the fragment's RGBA color:
    void applytex(uniform sampler2D mytexture,
                  float2 uv : TEXCOORD0,
                  out float4 outcolor : COLOR)
    {
        outcolor = tex2D(mytexture, uv);
    }
Cg provides a wide variety of texture lookup functions, a sample of which is given below. For a complete list, see "Texture Map Functions" on page 25.
- Standard nonprojective texture lookup:
    tex2D(sampler2D tex, float2 s)
    texRECT(samplerRECT tex, float2 s)
    texCUBE(samplerCUBE tex, float3 s)
- Standard projective texture lookup:
    tex2Dproj(sampler2D tex, float3 sq)
    texRECTproj(samplerRECT tex, float3 sq)
    texCUBEproj(samplerCUBE tex, float4 sq)
- Nonprojective texture lookup with user-specified filter kernel size:
    tex2D(sampler2D tex, float2 s, float2 dsdx, float2 dsdy)
    texRECT(samplerRECT tex, float2 s, float2 dsdx, float2 dsdy)
    texCUBE(samplerCUBE tex, float3 s, float3 dsdx, float3 dsdy)
  The filter size is specified by providing the derivatives of the texture coordinates with respect to pixel coordinates x (dsdx) and y (dsdy). For more information, see "Texture Map Functions" on page 25.
- Shadowmap lookup:
    tex2Dproj(sampler2D tex, float4 szq)
    texRECTproj(samplerRECT tex, float4 szq)
  In these functions, the z component of the texture coordinate holds a depth value to be compared against the shadowmap. Shadowmap
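A projective lookup divides the coordinates by their last component before sampling. The C sketch below computes only the coordinates that are effectively looked up; the struct and function names are hypothetical. For the shadowmap form, it assumes the usual OpenGL convention that the depth z is also divided by q before the comparison, which the excerpt does not spell out.

```c
#include <assert.h>

typedef struct { float x, y; } float2;        /* hypothetical stand-ins for Cg vector types */
typedef struct { float x, y, z; } float3;
typedef struct { float x, y, z, w; } float4;

/* Coordinate sampled by tex2Dproj(tex, float3 sq): (s / q, t / q). */
static float2 proj_coord2(float3 sq) {
    assert(sq.z != 0.0f);
    float2 r = { sq.x / sq.z, sq.y / sq.z };
    return r;
}

/* Shadowmap form tex2Dproj(tex, float4 szq): (s / q, t / q) plus depth z / q (assumed). */
static float3 proj_coord_shadow(float4 szq) {
    assert(szq.w != 0.0f);
    float3 r = { szq.x / szq.w, szq.y / szq.w, szq.z / szq.w };
    return r;
}
```

For example, sq = (2, 4, 2) samples at (1, 2), and szq = (2, 4, 1, 2) samples at (1, 2) and compares a depth of 0.5.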
264. secolor : COLOR0;
        float4 uv0 : TEXCOORD0;
        float4 uv1 : TEXCOORD1;
    };
    fragout bar(myvf indata)
    {
        float4 x = indata.uv0;
        ...
    }
The following binding semantics are available in all Cg vertex profiles for output from vertex programs: POSITION, PSIZE, FOG, COLOR0, COLOR1, and TEXCOORD0 through TEXCOORD7. All vertex programs must declare and set a vector output that uses the POSITION binding semantic. This value is required for rasterization.
To ensure interoperability between vertex programs and fragment programs, both must use the same struct for their respective outputs and inputs. For example:
    struct myvert2frag {
        float4 pos : POSITION;
        float4 uv0 : TEXCOORD0;
        float4 uv1 : TEXCOORD1;
    };
    // Vertex program
    myvert2frag vertmain(...)
    {
        myvert2frag outdata;
        ...
        return outdata;
    }
    // Fragment program
    void fragmain(myvert2frag indata)
    {
        float4 tcoord = indata.uv0;
        ...
    }
Note that values associated with some vertex output semantics are intended for and are used by the rasterizer. These values cannot actually be used in the fragment program, even though they appear in the input struct. For example, the indata.pos value associated with the POSITION fragment semantic may not be read in the fragmain shader.
Varying Outputs from Fragment Programs
Binding semantics are always required on the outputs of fragment programs. Fragment programs are required to decl
265. sform
        // from tangent to cube space
        float4 TangentToCubeSpace1 : TEXCOORD2;
        // third row of the 3x3 transform
        // from tangent to cube space
        float4 TangentToCubeSpace2 : TEXCOORD3;
    };
    ... main(... IN,
             uniform float4x4 WorldViewProj,
             uniform float3x4 ObjToCubeSpace,
             uniform float3 EyePosition,   // in cube space
             uniform float BumpScale)
    {
        // pass texture coordinates for
        // fetching the normal map
        OUT.TexCoord.xy = IN.TexCoord.xy;
        // compute 3x3 transform from tangent to object space
        float3x3 objToTangentSpace;
        // first rows are the tangent and binormal
        // scaled by the bump scale
        objToTangentSpace[0] = BumpScale * IN.T;
        objToTangentSpace[1] = BumpScale * IN.B;
        objToTangentSpace[2] = IN.N;
        // compute the 3x3 transform from
        // tangent space to cube space:
        // TangentToCubeSpace = object2cube * tangent2object
        //                    = object2cube * transpose(objToTangentSpace)
        // (since the inverse of a rotation is its transpose)
        // so a row of TangentToCubeSpace is the transform by
        // objToTangentSpace of the corresponding row of ObjToCubeSpace
        OUT.TangentToCubeSpace0.xyz = mul(objToTangentSpace, ObjToCubeSpace[0].xyz);
        OUT.TangentToCubeSpace1.xyz = mul(objToTangentSpace, ObjToCubeSpace[1].xyz);
        OUT.TangentToCubeSpace2.xyz = mul(objToTangentSpace, ObjToCubeSpace[2].xyz);
        // compute the eye vector f
266. so introduces a few new ideas. In particular, it includes features designed to represent data flow in stream-processing architectures such as GPUs. Profiles, which are specified at compile time, may subset certain features of the language, including the ability to implement loops and the precision at which certain computations are performed.
Silent Incompatibilities
Most of the changes from ANSI C are either omissions or additions, but there are a few potentially silent incompatibilities. These are changes within Cg that could cause a program that compiles without errors to behave in a manner different from C:
- The type promotion rules for constants are different when the constant is not explicitly typed using a type cast or type suffix. In general, a binary operation between a constant that is not explicitly typed and a variable is performed at the variable's precision, rather than at the constant's default precision.
- Declarations of struct perform an automatic typedef (as in C++) and thus could override a previously declared type.
- Arrays are first-class types that are distinct from pointers. As a result, array assignments semantically perform a copy operation for the entire array.
Similar Operations That Must be Expressed Differently
There are several changes that force the same operation to be expressed differently in Cg than in C:
- A Boolean type, bool, is introduced, with corresponding implications for operators and contro
267. source code to vertex programs for use by the NV_vertex_program OpenGL extension.
- Profile name: vp20
- How to invoke: Use the compiler option -profile vp20
This section describes the capabilities and restrictions of Cg when using the vp20 profile.
Overview
The vp20 profile limits Cg to match the capabilities of the NV_vertex_program extension. NV_vertex_program has the same capabilities as DirectX 8 vertex shaders, so the limitations that this profile places on the Cg source code written by the programmer are the same as those of the DirectX VS 1.1 shader profile.
Aside from the syntax of the compiler output, the only difference between the vp20 profile and the DirectX VS 1.1 profile is that the vp20 profile supports two additional outputs: BCOL0 for back-facing primary color and BCOL1 for back-facing secondary color.
Position Invariance
- The vp20 profile supports position invariance, as described in the core language specification.
- The modelview projection matrix must be specified using a binding semantic of _GL_MVP.
[7] To understand the NV_vertex_program extension and the code produced by the compiler using the vp20 profile, see the GL_NV_vertex_program extension documentation.
[8] See "DirectX Vertex Shader 1.1 Profile (vs_1_1)" on page 223 for a full explanation of the data types, statements, and operators supported by this profile.
Data Types
268. specially in fragment programs. These are referred to as basic profiles. See "Language Profiles" on page 195 for detailed descriptions of these and related profiles.
Declaring Programs in Cg
CPU code generally consists of one program, specified by main in C. In contrast, a Cg program can have any name. A program is defined using the following syntax:
    <return-type> <program-name>(<parameters>) [: <semantic-name>]
    {
        /* ... */
    }
Program Inputs and Outputs
The programmable processors in GPUs operate on streams of data. The vertex processor operates on a stream of vertices, and the fragment processor operates on a stream of fragments.
A programmer can think of the main program as being executed just once on a CPU. In contrast, a program is executed repeatedly on a GPU, once for each element of data in a stream. The vertex program is executed once for each vertex, and the fragment program is executed once for each fragment.
The Cg language adds several capabilities to C to support this stream-based programming model. For new Cg programmers, these capabilities often take some time to understand, because they have no direct correspondence to C capabilities. However, the sample programs later in this document demonstrate that it really is easy to use these capabilities in Cg programs.
Two Kinds of Program Inputs
A Cg program can consume two diffe
269. st double *cgGetParameterValues(CGparameter parameter, CGenum valueType, int *numberOfValuesReturned);
It retrieves the default value if valueType is equal to CG_DEFAULT, and the constant value if valueType is equal to CG_CONSTANT. The components of the value are returned in row-major order, as a pointer to an array of type double elements. After cgGetParameterValues is called, the number of components available in the array is pointed to by numberOfValuesReturned.
Core Cg Error
The core Cg runtime reports an error by setting a global variable containing the error code. You query it, as well as the corresponding error string, as follows:
    CGerror error = cgGetError();
    const char* errorString = cgGetErrorString(error);
Each time an error occurs, the core Cg runtime also calls a callback function, optionally provided by the application, that usually calls cgGetError:
    void MyErrorCallback(void)
    {
        const char* errorString = cgGetErrorString(cgGetError());
    }
    cgSetErrorCallback(MyErrorCallback);
Here is the list of all the CGerror errors specific to the core Cg runtime:
- CG_NO_ERROR: Returned when no error has occurred.
- CG_COMPILER_ERROR: Returned when the compiler generated an error. A call to cgGetLastListing should be made to get more details on the actual compiler error.
- CG_INVALID_PARAMETER_ERROR: Returned when the parameter used
270. ...supplied per vertex. Tangent-space bases are skinned in a similar fashion and then used to transform the light vector into tangent space for per-pixel bump mapping (Figure 22).

Figure 22. Example of Matrix Palette Skinning

Vertex Shader Source Code for Matrix Palette Skinning

    struct appdata {
        // [...] Position (used below; declaration illegible in the source)
        float2 Weights   : BLENDWEIGHT0;
        float2 Indices   : BLENDINDICES;
        float3 Normal    : NORMAL;
        float2 TexCoord0 : TEXCOORD0;
        // [...] S, T, and SxT (used below; declarations illegible in the source)
    };

    struct vpconn {
        float4 Hposition : POSITION;
        float4 TexCoord0 : TEXCOORD0;
        float4 TexCoord1 : TEXCOORD1;
        float4 Color0    : COLOR0;
    };

    vpconn main(appdata IN,
                uniform float4x4 WorldViewProj,
                uniform float3x4 Bones[26],
                uniform float3 LightVec)
    {
        vpconn OUT;
        float4 tempPos;
        tempPos.xyz = IN.Position.xyz;
        tempPos.w = 1.0;

        // grab first bone matrix
        float i = IN.Indices.x;

        // transform position
        float3 pos0 = mul(Bones[i], tempPos);

        // create 3x3 version of bone matrix
        float3x3 m;
        m._m00_m01_m02 = Bones[i]._m00_m01_m02;
        m._m10_m11_m12 = Bones[i]._m10_m11_m12;
        m._m20_m21_m22 = Bones[i]._m20_m21_m22;
        float3 s0   = mul(m, IN.S);
        float3 t0   = mul(m, IN.T);
        float3 sxt0 = mul(m, IN.SxT);

        // next bone
        i = IN.Indices.y;

        // create 3x3 version of bone m...
271. ...t, binormal, normal: passed in from vertex program

    [...]   (code partly illegible in the source)
    float3 Nbump;   // bump-mapped normal
    float3 bump = tex2D(bumpSampler, uv);
    Nbump.x = [...]   (remaining scalar statements illegible in the source)

However, here we have written a series of computations that add and multiply single pairs of floating-point values at a time. After a little algebra, we can rewrite this as three multiplies of a float3 and a float, and two float3 additions, which runs several times faster than the original. Now:

    [code illegible in the source]

Use Swizzles to Make the Most of Vectorization

The GPU can swizzle the values in vectors with no performance penalty. (Recall that a swizzle can be used to rearrange the elements of a vector.) Given a vector

    float3 a = float3(0, 1, 2);

swizzles construct new vectors: [examples illegible in the source], and so forth. By swizzling your data carefully, you can still take advantage of vectorization even when you don't want to use the same components of both vectors on both sides of your computation. For example, consider the computation of the cross product. Given two three-dimensional vectors, the cross product returns a new vector that is perpendicular to both of them. It is computed by...
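Both ideas above can be mirrored in C with a minimal float3 struct. This is a hedged illustration: the shader's own statements are illegible in the source, so basis_transform assumes the rows m[0], m[1], m[2] are combined by the components of bump, which matches the operation count the text gives (three float3-by-float multiplies and two float3 additions). cross3 uses the common swizzle formulation a.yzx * b.zxy - a.zxy * b.yzx. All helper names are hypothetical.

```c
/* Minimal stand-in for Cg's float3. */
typedef struct { float x, y, z; } float3;

static float3 f3(float x, float y, float z) { float3 v = { x, y, z }; return v; }
static float3 add3(float3 a, float3 b) { return f3(a.x + b.x, a.y + b.y, a.z + b.z); }
static float3 sub3(float3 a, float3 b) { return f3(a.x - b.x, a.y - b.y, a.z - b.z); }
/* Componentwise product, like a * b on two float3 values in Cg. */
static float3 cmul3(float3 a, float3 b) { return f3(a.x * b.x, a.y * b.y, a.z * b.z); }
static float3 scale3(float3 a, float s) { return f3(a.x * s, a.y * s, a.z * s); }

/* The two swizzles needed below: v.yzx and v.zxy. */
static float3 yzx(float3 v) { return f3(v.y, v.z, v.x); }
static float3 zxy(float3 v) { return f3(v.z, v.x, v.y); }

/* Vectorized form: bump.x*m[0] + bump.y*m[1] + bump.z*m[2],
   i.e. three float3-by-float multiplies and two float3 additions. */
static float3 basis_transform(float3 bump, const float3 m[3])
{
    return add3(add3(scale3(m[0], bump.x), scale3(m[1], bump.y)),
                scale3(m[2], bump.z));
}

/* Cross product via swizzles: a.yzx * b.zxy - a.zxy * b.yzx. */
static float3 cross3(float3 a, float3 b)
{
    return sub3(cmul3(yzx(a), zxy(b)), cmul3(zxy(a), yzx(b)));
}
```

Expanding cross3 component by component gives a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, and a.x*b.y - a.y*b.x, so two vector multiplies and one vector subtract replace six scalar multiplies and three scalar subtracts.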
272. ...t char* pixelOptions[] = { cgD3D9GetOptimalOptions(pixelProfile), 0 };

    // Create the vertex shader
    vertexProgram = cgCreateProgramFromFile(context, CG_SOURCE, "VertexProgram.cg",
                                            vertexProfile, "VertexProgram", vertexOptions);

    // If your program uses explicit binding semantics, you can create
    // a vertex declaration using those semantics
    const D3DVERTEXELEMENT9 declaration[] = {
        { 0, 0 * sizeof(float), D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0 },
        { 0, 3 * sizeof(float), D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_COLOR, 0 },
        { 0, 4 * sizeof(float), D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 0 },
        D3DDECL_END()
    };

    // Ensure the resulting declaration is compatible with the shader.
    // This is really just a sanity check.
    assert(cgD3D9ValidateVertexDeclaration(vertexProgram, declaration));
    device->CreateVertexDeclaration(declaration, &vertexDeclaration);

    // Load the program with the expanded interface.
    // Parameter shadowing is enabled (second parameter = TRUE).
    cgD3D9LoadProgram(vertexProgram, TRUE, 0);

    // Create the pixel shader
    fragmentProgram = cgCreateProgramFromFile(context, CG_SOURCE, "FragmentProgram.cg",
                                              pixelProfile, "FragmentProgram", pixelOp...
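One way to double-check a declaration like the one above is to describe the same vertex as a C struct and compare member offsets with the offsets in the declaration. The Vertex struct below is hypothetical, and the check assumes 4-byte float and uint32_t with no padding between members, which holds on common platforms.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU-side vertex matching the declaration above. */
typedef struct {
    float    position[3]; /* POSITION,  D3DDECLTYPE_FLOAT3,   offset 0 * sizeof(float) */
    uint32_t color;       /* COLOR,     D3DDECLTYPE_D3DCOLOR, offset 3 * sizeof(float) */
    float    texcoord[2]; /* TEXCOORD0, D3DDECLTYPE_FLOAT2,   offset 4 * sizeof(float) */
} Vertex;

/* Returns nonzero when the struct layout agrees with the declaration offsets. */
static int layout_matches_declaration(void)
{
    return offsetof(Vertex, position) == 0 * sizeof(float)
        && offsetof(Vertex, color)    == 3 * sizeof(float)
        && offsetof(Vertex, texcoord) == 4 * sizeof(float);
}
```

Running such a check once at startup catches a mismatch between the vertex buffer layout and the declaration before it shows up as corrupted geometry.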
273. ...t of binding semantics, ATTR0 through ATTR15, can also be used. The two sets act as aliases to each other.

Table 42. vp20 Varying Input Binding Semantics

    Binding Semantics Name               Corresponding Data
    POSITION, ATTR0                      Input vertex, Generic Attribute 0
    BLENDWEIGHT, ATTR1                   Input vertex weight, Generic Attribute 1
    NORMAL, ATTR2                        Input normal, Generic Attribute 2
    COLOR0, DIFFUSE, ATTR3               Input primary color, Generic Attribute 3
    COLOR1, SPECULAR, ATTR4              Input secondary color, Generic Attribute 4
    TESSFACTOR, FOGCOORD, ATTR5          Input fog coordinate, Generic Attribute 5
    PSIZE, ATTR6                         Input point size, Generic Attribute 6
    BLENDINDICES, ATTR7                  Generic Attribute 7
    TEXCOORD0-TEXCOORD7, ATTR8-ATTR15    Input texture coordinates (texcoord0-texcoord7), Generic Attributes 8-15
    TANGENT, ATTR14                      Generic Attribute 14
    BINORMAL, ATTR15                     Generic Attribute 15

Table 43 summarizes the valid binding semantics for varying output parameters in the vp20 profile. These binding semantics map to NV_vertex_program output registers. The two sets act as aliases to each other.

Table 43. vp20 Varying Output Binding Semantics

    Binding Semantics Name    Corresponding Data
    POSITION, HPOS            Output position
    PSIZE, PSIZ               Output point size
    FOG, FOGC                 Output fog coordinate
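The aliasing in Table 42 amounts to a many-to-one mapping from semantic names to generic attribute indices. The hypothetical helper below encodes that table in C; it is an illustration of the mapping, not part of the Cg runtime or compiler.

```c
#include <string.h>

/* Parses a non-empty run of decimal digits; returns -1 otherwise. */
static int parse_index(const char *s)
{
    int n = 0;
    if (*s == '\0')
        return -1;
    for (; *s != '\0'; ++s) {
        if (*s < '0' || *s > '9')
            return -1;
        n = n * 10 + (*s - '0');
        if (n > 99)
            return -1;
    }
    return n;
}

/* Hypothetical helper: generic attribute index aliased by a vp20 input
   semantic (Table 42), or -1 if the name is not in the table. */
static int vp20_input_attribute(const char *semantic)
{
    static const struct { const char *name; int attr; } named[] = {
        { "POSITION", 0 },  { "BLENDWEIGHT", 1 }, { "NORMAL", 2 },
        { "COLOR0", 3 },    { "DIFFUSE", 3 },     { "COLOR1", 4 },
        { "SPECULAR", 4 },  { "TESSFACTOR", 5 },  { "FOGCOORD", 5 },
        { "PSIZE", 6 },     { "BLENDINDICES", 7 },
        { "TANGENT", 14 },  { "BINORMAL", 15 },
    };
    for (size_t i = 0; i < sizeof named / sizeof named[0]; ++i)
        if (strcmp(semantic, named[i].name) == 0)
            return named[i].attr;

    if (strncmp(semantic, "TEXCOORD", 8) == 0) {  /* TEXCOORD0-7 -> 8-15 */
        int n = parse_index(semantic + 8);
        return (n >= 0 && n <= 7) ? 8 + n : -1;
    }
    if (strncmp(semantic, "ATTR", 4) == 0) {      /* ATTR0-15 -> 0-15 */
        int n = parse_index(semantic + 4);
        return (n >= 0 && n <= 15) ? n : -1;
    }
    return -1;
}
```

One consequence visible in the mapping: TEXCOORD6 and TEXCOORD7 share generic attributes 14 and 15 with TANGENT and BINORMAL, so a vp20 program cannot use both members of such a pair as independent inputs.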
274. ...t3D texture to a sampler parameter using

    HRESULT cgD3D9SetTexture(CGparameter parameter, IDirect3DBaseTexture9* texture);

To set the sampler state in the Direct3D 9 Cg runtime, use

    HRESULT cgD3D9SetSamplerState(CGparameter parameter, D3DSAMPLERSTATETYPE type, DWORD value);

Parameter type is any of the D3DSAMPLERSTATETYPE enumerants, and parameter value is a value appropriate for the corresponding type. Here is an example of how to use this function:

    cgD3D9SetSamplerState(parameter, D3DSAMP_MAGFILTER, D3DTEXF_LINEAR);

To set the texture stage state in the Direct3D 8 Cg runtime, use

    HRESULT cgD3D8SetTextureStageState(CGparameter parameter, D3DTEXTURESTAGESTATETYPE type, DWORD value);

Parameter type must be one of the following values: D3DTSS_ADDRESSU, D3DTSS_ADDRESSV, D3DTSS_ADDRESSW, D3DTSS_BORDERCOLOR, D3DTSS_MAGFILTER, D3DTSS_MINFILTER, D3DTSS_MIPFILTER, D3DTSS_MIPMAPLODBIAS, D3DTSS_MAXMIPLEVEL, D3DTSS_MAXANISOTROPY.

Parameter value is a value appropriate for the corresponding type. Here is an example of how to use this function:

    cgD3D8SetTextureStageState(parameter, D3DTSS_MAGFILTER, D3DTEXF_LINEAR);

The texture wrap mode is set using

    HRESULT cgD3D9SetTextureWrapMode(CGparameter parameter, DWORD value);

The input value is either zero or a combination of D3DWRAP_U, D3DWRAP_V, and D3DWRAP_W. Here is an example of how to use this function: cg...
275. ...t4x4 myMatrix;
    float myFloatScalar;
    float4 myFloatVec4;

    // Set myFloatScalar to myMatrix[3][2]
    myFloatScalar = myMatrix._m32;

    // Assign the main diagonal of myMatrix to myFloatVec4
    myFloatVec4 = myMatrix._m00_m11_m22_m33;

For compatibility with the D3DMatrix data type, Cg also allows one-based swizzles, using a form with the m omitted after the _ symbol:

    matrixObject._<row><col>[_<row><col>][...]

In this form, the indexes for <row> and <col> are one-based, rather than the C-standard zero-based. So the two forms are functionally equivalent:

    float4x4 myMatrix;
    float4 myVec;

    // These two statements are functionally equivalent:
    myVec = myMatrix._m00_m23_m11_m31;
    myVec = myMatrix._11_34_22_42;

Because of the confusion that the one-based indexing can cause, use of the latter notation is strongly discouraged.

The matrix swizzles may only be applied to matrices. When multiple components are extracted from a matrix using a swizzle, the result is an appropriately sized vector. When a swizzle is used to extract a single component from a matrix, the result is a scalar.

- The write-mask operator: It can only be applied to an lvalue that is a vector. It allows assignment to particular elements of a vector or matrix, leaving other elements unchanged. The only restriction is that a component cannot be repeated.

Arithmetic Precision and...
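The correspondence between the two swizzle spellings is a plain +1 on each index, and _mRC always names row R, column C. The hypothetical C helpers below illustrate both points on a row-major 4x4 array; they are not part of any Cg tool.

```c
/* Hypothetical helper: converts one zero-based element selector such as
   "_m23" into its one-based equivalent "_34". Returns 0 on success, or -1
   if `in` is not of the form _mRC with R and C in 0..3. */
static int zero_to_one_based(const char *in, char out[4])
{
    if (in[0] != '_' || in[1] != 'm')
        return -1;
    if (in[2] < '0' || in[2] > '3' || in[3] < '0' || in[3] > '3' || in[4] != '\0')
        return -1;
    out[0] = '_';
    out[1] = (char)(in[2] + 1);  /* row:    0..3 -> 1..4 */
    out[2] = (char)(in[3] + 1);  /* column: 0..3 -> 1..4 */
    out[3] = '\0';
    return 0;
}

/* myMatrix._m00_m11_m22_m33 as a C loop over a row-major 4x4 array:
   _mRC names row R, column C, so the diagonal is m[i][i]. */
static void main_diagonal(float m[4][4], float out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = m[i][i];
}
```

Applying zero_to_one_based to each of _m00, _m23, _m11, and _m31 yields _11, _34, _22, and _42, the two equivalent statements shown above.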
276. ...ter parameter);

If the parameter does not have any associated resource, cgGetParameterResource returns CG_UNDEFINED.

The two functions cgGetResource and cgGetResourceString allow you to determine the correspondence between a resource enumerant and its corresponding string:

    CGresource cgGetResource(const char* resourceString);
    const char* cgGetResourceString(CGresource resource);

If the string passed to cgGetResource does not correspond to any resource, CG_UNDEFINED is returned.

Use cgGetParameterBaseResource to retrieve the base resource for a parameter in a Cg program:

    CGresource cgGetParameterBaseResource(CGparameter parameter);

The base resource is the first resource in a set of sequential resources. For example, if a given parameter has a resource equal to CG_TEXCOORD7, its base resource is CG_TEXCOORD0. Only parameters with resources whose name ends with a number have a base resource. All other parameters return CG_UNDEFINED when cgGetParameterBaseResource is called.

The function cgGetParameterResourceIndex retrieves the numerical portion of the resource:

    unsigned long cgGetParameterResourceIndex(CGparameter parameter);

For example, if the resource for a given parameter is CG_TEXCOORD7, cgGetParameterResourceIndex returns 7.

The cgGetParameterValues function retrieves the default or constant value of a uniform parameter: con...
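The base-resource and resource-index rules can be illustrated at the string level. The hypothetical helpers below work on plain names such as "TEXCOORD7" rather than on CGresource enumerants, so they only mirror the behavior described above; they are not the Cg functions.

```c
#include <stddef.h>
#include <string.h>

/* Length of `name` without its trailing decimal digits. */
static size_t prefix_length(const char *name)
{
    size_t i = strlen(name);
    while (i > 0 && name[i - 1] >= '0' && name[i - 1] <= '9')
        --i;
    return i;
}

/* Numerical portion of a resource name ("TEXCOORD7" -> 7),
   or -1 if the name does not end with a number. */
static long resource_index(const char *name)
{
    size_t i = prefix_length(name), len = strlen(name);
    long n = 0;
    if (i == len)
        return -1;
    for (; i < len; ++i)
        n = n * 10 + (name[i] - '0');
    return n;
}

/* Base resource name ("TEXCOORD7" -> "TEXCOORD0"). Returns 0 on success,
   -1 if the name has no trailing number or `out` is too small. */
static int base_resource(const char *name, char *out, size_t out_size)
{
    size_t i = prefix_length(name);
    if (i == strlen(name) || i + 2 > out_size)
        return -1;
    memcpy(out, name, i);
    out[i] = '0';
    out[i + 1] = '\0';
    return 0;
}
```

A name without a trailing number, such as "POSITION", has neither an index nor a base, matching the CG_UNDEFINED case in the text.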
277. ...terminated array of null-terminated strings that are passed as arguments to the compiler. The pointer may itself be null.

The only difference between the two functions is how program is interpreted. For cgCreateProgramFromFile, program is a string containing the name of a file that contains source code; for cgCreateProgram, program directly contains source code. If the enumerant programType is equal to CG_SOURCE, the source code is Cg source code; if it is equal to CG_OBJECT, the source code is precompiled object code and does not require any further compilation.

The CGprogram handle returned by cgCreateProgramFromFile is valid if it is different from zero, which means that the program has been successfully created and compiled.

The program is destroyed by passing its handle to cgDestroyProgram:

    void cgDestroyProgram(CGprogram program);

Note: In the future, it will be possible to modify a program that has been created by cgCreateProgram or cgCreateProgramFromFile through the runtime (by changing the variability or the semantics of some parameters, for example), so that it will need to be recompiled.

A call to cgIsProgramCompiled determines whether a program needs to be recompiled:

    CGbool cgIsProgramCompiled(CGprogram program);

To recompile a program, use cgCompileProgram:

    void cgCompileProgram(CGprogram program);

A useful function in this context is cg...
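The args parameter described above is a null-terminated array of null-terminated strings, and it may itself be null. The C sketch below shows that shape and how such an array is walked; the two macro-definition options are hypothetical examples, and which options a given compiler and profile accept is outside the scope of this sketch.

```c
#include <stddef.h>

/* Hypothetical compiler arguments (preprocessor macro definitions).
   The final NULL marks the end of the array. */
static const char *example_args[] = { "-DNUM_LIGHTS=2", "-DUSE_FOG=1", NULL };

/* Counts entries in a null-terminated array of strings; a null array
   pointer (allowed by the API described above) counts as zero arguments. */
static size_t count_args(const char *const *args)
{
    size_t n = 0;
    if (args == NULL)
        return 0;
    while (args[n] != NULL)
        ++n;
    return n;
}
```

An array like example_args would be passed as the last argument of cgCreateProgram or cgCreateProgramFromFile; forgetting the terminating NULL makes any consumer read past the end of the array.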
278. ...tersect with the iris plane
        [...]   (several statements illegible in the source)
        faceColor = lerp(faceColor, LensColor, fadeT);
        hitColor = lerp(missColor, faceColor, smoothstep(0.0, [...], slice));
        [...]
        return half4(hitColor, 1);
    }

Skin

Description

This effect demonstrates some techniques for rendering skin, ranging from simple Blinn-Phong bump mapping to more complex subsurface-scattering lighting models. It also illustrates the use of rim lighting and simple translucency for capturing some of the more subtle properties of skin that result from complex, non-local lighting interactions. Finally, it shows how the various techniques can be combined to produce compelling stylized skin.

Figure 10. Example of Skin

Pixel Shader Source Code for Skin

    struct [name illegible in the source] {
        float2 texcoords        : TEXCOORD0;
        float4 shadowcoords     : TEXCOORD1;
        float4 tangentToEyeMat0 : TEXCOORD4;
        float3 tangentToEyeMat1 : TEXCOORD5;
        float3 tangentToEyeMat2 : TEXCOORD6;
279. ...tex Shader Source Code for Anisotropic Lighting  135
Bump Dot3x2 Diffuse and Specular  136
    Description  136
    Vertex Shader Source Code for Bump Dot3x2  137
    Pixel Shader Source Code for Bump Dot3x2  138
Bump Reflection Mapping  140
    Description  140
    Vertex Shader Source Code for Bump Reflection Mapping  141
    Pixel Shader Source Code for Bump and Reflection Mapping  143
Fresnel  144
    Description  144
    Vertex Shader Source Code for Fresnel  144
Grass  146
    Description  146
    Vertex Shader Source Code for Grass  146
Refraction  149
    Description  149
    Vertex Shader Source Code for Refraction  150
    Pixel Shader Source Code for Refraction  151
Shadow Mapping  152
    Description  152
    Vertex Shader Source Code for Shadow Mapping  153
    Pixel Shader Source Code for Shadow Mapping  154
Shadow Volume Extrusion ...
280. (Table 3, Texture Map Functions, continued)

    Function                                                         Description
    tex2Dproj(sampler2D tex, float4 szq)                             2D projective depth compare
    texRECT(samplerRECT tex, float2 s)                               2D RECT nonprojective
    texRECT(samplerRECT tex, float2 s, float2 dsdx, float2 dsdy)     2D RECT nonprojective with derivatives
    texRECT(samplerRECT tex, float3 sz)                              2D RECT nonprojective depth compare
    texRECT(samplerRECT tex, float3 sz, float2 dsdx, float2 dsdy)    2D RECT nonprojective depth compare with derivatives
    texRECTproj(samplerRECT tex, float3 sq)                          2D RECT projective
    texRECTproj(samplerRECT tex, float3 szq)                         2D RECT projective depth compare
    tex3D(sampler3D tex, float3 s)                                   3D nonprojective
    tex3D(sampler3D tex, float3 s, float3 dsdx, float3 dsdy)         3D nonprojective with derivatives
    tex3Dproj(sampler3D tex, float4 szq)                             3D projective depth compare
    texCUBE(samplerCUBE tex, float3 s)                               Cubemap nonprojective
    texCUBE(samplerCUBE tex, float3 s, float3 dsdx, float3 dsdy)     Cubemap nonprojective with derivatives
    texCUBEproj(samplerCUBE tex, float4 sq)                          Cubemap projective

In the table, the name of the second argument to each function indicates how its values are used when performing the texture lookup: s indicates a 1-, 2-, or 3-component texture coordinate; z indicates a depth comparison...
281. ...texm3x2tex DirectX 8 pixel shader instruction (DOT_PRODUCT_TEXTURE_2D in OpenGL). This instruction computes the dot product of the normal and the light vector, corresponding to the diffuse light component, and the dot product of the normal and the half-angle vector, corresponding to the specular light component. This results in two scalar values that are used as texture coordinates to look up a 2D illumination texture containing the diffuse color, with the specular term in its alpha component. Since the normal fetched from the normal map is in tangent space, both the light vector and the half-angle vector are transformed to this space by the vertex shader (Figure 14).

Figure 14. Example of Bump Dot3x2 Diffuse and Specular

Vertex Shader Source Code for Bump Dot3x2

    struct a2v {
        float4 Position : POSITION;   // in object space
        float3 Normal   : NORMAL;     // in object space
        float2 TexCoord : TEXCOORD0;
        float3 T        : TEXCOORD1;  // in object space
        float3 B        : TEXCOORD2;  // in object space
        float3 N        : TEXCOORD3;  // in object space
    };

    struct [name illegible in the source] {
        float4 Position            : POSITION;   // in projection space
        float4 Normal              : COLOR0;     // in tangent space
        float4 LightVectorUnsigned : COLOR1;     // in tangent space
        float3 TexCoord0           : TEXCOORD0;
        float3 TexCoord1           : TEXCOORD1;
        float4 LightVector         : TEXCOORD2;  // in tangent space
        float4 HalfAngleVector     : TEXCOORD3;  // in tangent space
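The part of texm3x2tex described above that happens before the lookup is just two dot products. The C sketch below computes those two texture coordinates; it assumes all three vectors are already unit length and in tangent space, and the names are hypothetical.

```c
typedef struct { float x, y, z; } Vec3;

static float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* The two lookup coordinates: s = N.L (diffuse), t = N.H (specular).
   All vectors are assumed to be unit length and in tangent space. */
static void dot3x2_coords(Vec3 n, Vec3 l, Vec3 h, float *s, float *t)
{
    *s = dot3(n, l);
    *t = dot3(n, h);
}
```

The illumination texture then acts as a precomputed 2D lighting function: its RGB at (s, t) holds the diffuse color and its alpha holds the specular term, so any lighting curve can be baked in without extra arithmetic in the pixel shader.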
282. ...that varies linearly over the face of the triangle (for example, the distance from the fragment to a light source, to be used for attenuation), the value can be computed in the vertex shader at each vertex, passed to the fragment shader, and automatically interpolated by the GPU along the way.

- The result is nearly linear across a triangle. When a value computed by a fragment shader varies slowly over triangles, it may be an acceptable approximation to compute its value at each vertex and use its linearly interpolated value in the fragment shader. For example, the usual Gouraud shading algorithm takes advantage of this situation to compute lighting per vertex rather than per pixel.

In a similar manner, it may be advantageous to move any vertex shader computation that depends solely on the values of uniform parameters to the CPU, and then pass the result of the computation into the vertex shader as a uniform parameter. For example, if the vertex shader is passed a float3 vector giving the direction of a distant light source, the vector should be normalized on the CPU before it is passed to the vertex shader. This avoids repeatedly and unnecessarily recomputing normalize(lightvector) in the vertex shader.

8 Avoid Matrix Transposes Just for Multiplication

Computing the transpose of a matrix can often be avoided. If you would like to multi...
283. ...the arbvp1 and vp20 profiles is the way that input varying semantics are handled. In the vp20 profile, semantic names such as POSITION and ATTR0 are aliases of each other, the same way NV_vertex_program aliases Vertex and Attribute 0 (see Table 42, "vp20 Varying Input Binding Semantics," on page 242). In the arbvp1 profile, the semantic names are not aliased, because ARB_vertex_program allows the conventional attributes (such as vertex position) to be separate from the generic attributes (such as Attribute 0). For this reason, it is important to follow the conventions given in Table 20, "arbvp1 Varying Input Binding Semantics," on page 209, so that arbvp1 programs work for all implementations of ARB_vertex_program. The arbvp1 conventions are compatible with the vp20 and vp30 profiles.

Loading Constants

Applications that do not use the Cg runtime are no longer required to load constant values into program parameter registers, as indicated by the const expressions in the Cg compiler output. The compiler produces output that causes the OpenGL driver to load them. However, uniform variables that have a default definition still require constant values to be loaded into the appropriate program parameter registers, because ARB vertex programs do not support this feature. Application programs either have to use the Cg runtime, parse and handle the default commands, or avoid initializing un...
284. ...the packing and unpacking instructions defined by the NV_fragment_program OpenGL extension.

pack_2half

    float pack_2half(float2 a)
    float pack_2half(half2 a)

Converts the components of a into a pair of 16-bit floating-point values. The two converted components are then packed into a single 32-bit result. This operation can be reversed using the unpack_2half function.

C pseudocode:

    result = (((half)a.y) << 16) | (half)a.x;

unpack_2half

    half2 unpack_2half(float a)

Unpacks a 32-bit value into two 16-bit floating-point values.

C pseudocode:

    result.x = (a >> 0) & 0xFFFF;
    result.y = (a >> 16) & 0xFFFF;

pack_2ushort

    float pack_2ushort(float2 a)
    float pack_2ushort(half2 a)

Converts the components of a into a pair of 16-bit unsigned integers. The two converted components are then packed into a single 32-bit return value. This operation can be reversed using the unpack_2ushort function.

C pseudocode:

    ushort.x = round(65535.0 * clamp(a.x, 0.0, 1.0));
    ushort.y = round(65535.0 * clamp(a.y, 0.0, 1.0));
    result = (ushort.y << 16) | ushort.x;

unpack_2ushort

    float2 unpack_2ushort(float a)

Unpacks two 16-bit unsigned integer values from a and scales the results into individual floating-point values between 0.0 and 1.0.

C pseudocode:

    result.x = ((a >> 0) & 0xFFFF) / 65535.0;
    result.y = ((a >> 16) & 0xFFFF) / 65...
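The pack_2ushort and unpack_2ushort pseudocode translates directly into compilable C. In Cg the packed bits are returned in a float; this sketch keeps the 32-bit pattern in a uint32_t instead so that it stays portable, and the rounding is written as add-one-half-and-truncate, which equals round() for the non-negative values that clamping guarantees.

```c
#include <stdint.h>

static float clamp01(float v) { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }

/* ushort = round(65535 * clamp(a, 0, 1)); result = (ushort.y << 16) | ushort.x */
static uint32_t pack_2ushort(float ax, float ay)
{
    uint32_t ux = (uint32_t)(65535.0f * clamp01(ax) + 0.5f);
    uint32_t uy = (uint32_t)(65535.0f * clamp01(ay) + 0.5f);
    return (uy << 16) | ux;
}

/* Extract each 16-bit field and rescale it to [0, 1]. */
static void unpack_2ushort(uint32_t a, float *rx, float *ry)
{
    *rx = (float)((a >> 0) & 0xFFFFu) / 65535.0f;
    *ry = (float)((a >> 16) & 0xFFFFu) / 65535.0f;
}
```

Round-tripping through pack and unpack quantizes each component to a multiple of 1/65535, which is why inputs outside [0, 1] come back clamped.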
285. ...the same abstraction for GPUs. Cg changes the way programmers can program, letting them focus on the ideas, the concepts, and the effects they wish to create, not on the details of the hardware implementation. Cg also decouples programs from specific hardware, because the language is functional, not specific to a hardware implementation. Also, since Cg can be compiled at run time on any platform and operating system, and for any graphics hardware, Cg programs are truly portable. Finally, and perhaps best of all, Cg programs are future-proof and can adapt to run well on future products. The compiler can optimize directly for a new target GPU that perhaps did not even exist when the original Cg program was written.

This book is intended as an introduction to Cg as well as a practical handbook to get programmers started developing in Cg. It includes a language description and a reference for the standard and runtime libraries, and it is full of helpful examples. The goal for this book is to be both an introduction and a tool for the new user, as well as a reference and resource for developers as they become more proficient.

Welcome to the world of Cg.

David Kirk
Chief Scientist
NVIDIA Corporation

Preface

The goal of this book is to introduce you to Cg, a new high-level language for graphics programming. To that end, we have organized this document into the following sections:

- Introduction to...
286. ...ting

Advanced Profile Sample Shaders

This chapter provides a set of advanced profile sample shaders written in Cg. Each shader comes with an accompanying snapshot, description, and source code. Examples shown are:

- Improved Skinning
- Improved Water
- Melting Paint
- MultiPaint
- Ray Traced Refraction
- Skin
- Thin Film Effect
- Car Paint

Improved Skinning

Description

This shader takes in a set of all the transformation matrices that can affect a particular bone. Each bone also sends in a list of matrices that affect it. A simple loop then goes, for each vertex, through each bone that affects that vertex and transforms it. This allows a single Cg program to do the entire skinning for vertices affected by any number of bones, instead of having one program for one bone, another program for two bones, and so on.

Figure 5. Example of Improved Skinning

Vertex Shader Source Code for Improved Skinning

    struct inputs {
        float4 position      : POSITION;
        float4 weights       : BLENDWEIGHT;
        float4 normal        : NORMAL;
        float4 matrixIndices : TESSFACTOR;
        float4 numBones      : SPECULAR;
    };

    struct outputs {
        float4 hPosition : POSITION;
        float4 color     : COLOR0;
    };

    outputs main...
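The single-loop skinning approach can be sketched on the CPU in C. This is not the sample's shader (its body is truncated in the source); it assumes up to four bone influences per vertex, as suggested by the float4 weights and matrixIndices inputs, integer matrix indices, and bone matrices laid out like Cg's float3x4. All names are hypothetical.

```c
#include <assert.h>

/* A bone matrix laid out like Cg's float3x4: 3 rows of
   (rotation | translation), applied to the point (x, y, z, 1). */
typedef struct { float m[3][4]; } Bone;
typedef struct { float x, y, z; } Point3;

static Point3 transform_point(const Bone *b, Point3 p)
{
    Point3 r;
    r.x = b->m[0][0] * p.x + b->m[0][1] * p.y + b->m[0][2] * p.z + b->m[0][3];
    r.y = b->m[1][0] * p.x + b->m[1][1] * p.y + b->m[1][2] * p.z + b->m[1][3];
    r.z = b->m[2][0] * p.x + b->m[2][1] * p.y + b->m[2][2] * p.z + b->m[2][3];
    return r;
}

/* One routine for any bone count from 0 to 4: blend the position as
   transformed by each influencing bone, weighted by that bone's weight. */
static Point3 skin_position(const Bone *palette, Point3 pos,
                            const float weights[4], const int indices[4],
                            int num_bones)
{
    Point3 out = { 0.0f, 0.0f, 0.0f };
    assert(num_bones >= 0 && num_bones <= 4);
    for (int i = 0; i < num_bones; ++i) {
        Point3 t = transform_point(&palette[indices[i]], pos);
        out.x += weights[i] * t.x;
        out.y += weights[i] * t.y;
        out.z += weights[i] * t.z;
    }
    return out;
}
```

The weights are normally expected to sum to one; the loop bound num_bones is what lets one routine replace separate one-bone, two-bone, and three-bone variants.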
287. ...tions);

        // Load the program with the expanded interface.
        // Parameter shadowing is enabled (second parameter = TRUE).
        // Ignore vertex shader-specific flags, such as declaration usage.
        cgD3D9LoadProgram(fragmentProgram, TRUE, 0);

        // Grab some parameters.
        modelViewMatrix = cgGetNamedParameter(vertexProgram, "ModelViewMatrix");
        baseTexture = cgGetNamedParameter(fragmentProgram, "BaseTexture");
        someColor = cgGetNamedParameter(fragmentProgram, "SomeColor");

        // Sanity check that parameters have the expected size.
        assert(cgD3D9TypeToSize(cgGetParameterType(modelViewMatrix)) == 16);
        assert(cgD3D9TypeToSize(cgGetParameterType(someColor)) == 4);

        // Set parameters that don't change. They can be set only once,
        // since parameter shadowing is enabled.
        cgD3D9SetTexture(baseTexture, texture);
        cgD3D9SetUniform(someColor, &constantColor);
    }

    // Called to render the scene
    void OnRender()
    {
        // Load model-view matrix.
        D3DXMATRIX modelView;
        // ...

        // Set the parameters that change every frame.
        // This must be done before binding the programs.
        cgD3D9SetUniformMatrix(modelViewMatrix, &modelView);

        // Set the vertex declaration.
        device->SetVertexDeclaration(vertexDeclaration);

        // Bind the programs. This downloads any parameter values
        // that have been previously set.
        cgD3D9BindProgram(vertexProgram);
        cgD3D9BindProgram(...
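Why set-once parameters survive under parameter shadowing can be shown with a small model: "set" only records a CPU-side copy, and "bind" downloads every recorded value. The C sketch below is a hypothetical model of that behavior, not the Cg runtime's implementation; a real runtime may, for example, skip values that have not changed, which this sketch does not attempt.

```c
#include <assert.h>
#include <string.h>

#define MAX_PARAMS 8

typedef struct {
    float shadow[MAX_PARAMS][4];  /* CPU-side copies kept by "set" */
    int   has_value[MAX_PARAMS];
    float device[MAX_PARAMS][4];  /* stand-in for hardware constant registers */
    int   uploads;                /* values downloaded so far */
} ShadowedProgram;

/* With shadowing, setting a value only records it. */
static void set_param(ShadowedProgram *p, int index, const float v[4])
{
    assert(index >= 0 && index < MAX_PARAMS);
    memcpy(p->shadow[index], v, sizeof p->shadow[index]);
    p->has_value[index] = 1;
}

/* Binding downloads every value that has been set so far. */
static void bind_program(ShadowedProgram *p)
{
    for (int i = 0; i < MAX_PARAMS; ++i) {
        if (p->has_value[i]) {
            memcpy(p->device[i], p->shadow[i], sizeof p->device[i]);
            ++p->uploads;
        }
    }
}
```

Even if another program overwrites the constant registers between frames, the next bind restores the shadowed value, which is why someColor and baseTexture in the example above can be set only once.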
288. ...to the COLOR output of the program, and execution of the program is terminated. If the compiler's DEBUG option is not specified, this function does nothing.

The debug function is intended to allow a program to be compiled twice, once with the DEBUG option and once without. By executing both programs, you can obtain one frame buffer containing the final output of the program and a second containing an intermediate value to be examined for debugging.

Predefined Fragment Program Output Structures

A number of helper structure types for use in fragment programs are predefined in the standard library. Variables of these types can be used to hold the outputs of a fragment program. Their use is strictly optional.

For the ps_1 and fp20 profiles, the fragout structure is defined as follows:

    struct fragout {
        /* type illegible in the source */ col : COLOR;
    };

The ps_2, arbfp1, and fp30 profiles have two fragment output types defined:

    struct fragout {
        half4 col   : COLOR;
        float depth : DEPTH;
    };

    struct fragout_float {
        float4 col  : COLOR;
        float depth : DEPTH;
    };

Using the Cg Runtime Library

This chapter describes the Cg Runtime Library. It assumes that you have some basic knowledge of the Cg language as well as of the OpenGL or Direct3D API, depending on which one you use in your applications. The first section, "Introducing the Cg Runtime" on page 29, talks about the benefits of using the Cg Runtime Library...
289. ...to the current state. This means that in subsequent drawing calls, the program is executed for every vertex (in the case of a vertex program) or for every fragment (in the case of a fragment program).

Here's how to bind a program in OpenGL:

    cgGLBindProgram(program);

Here's how to bind a program in Direct3D:

    cgD3D9BindProgram(program);

You can only bind one vertex and one fragment program at a time for a particular profile. Therefore, the same vertex program is executed until another vertex program is bound. Similarly, the same fragment program is executed as long as no other fragment program is bound.

In OpenGL, you disable profiles with the following call:

    cgGLDisableProfile(CG_PROFILE_ARBVP1);

Disabling a profile also disables the execution of the corresponding vertex or fragment program.

Releasing Resources

When your application is ready to close, it is good programming practice to free the resources that you've acquired. Because the Direct3D runtime keeps an internal reference to the Direct3D device, you must tell it to release this reference when you are done using the runtime. This is done with the following call:

    cgD3D9SetDevice(0);

To free resources allocated for a program, call this function:

    cgDestroyProgram(program);

To free resources allocated for a context, use this function:

    cgDestroyContext(context);

Note that destroying a context also destroys all the programs it contains.

Core Cg Runtime

The core Cg runti...
290. ...tor

Fresnel

Description

This effect computes a reflection vector to look up into an environment map for reflections, and modulates this by a Fresnel term. The result is reflections only at grazing angles (Figure 16).

Figure 16. Example of Fresnel

Vertex Shader Source Code for Fresnel

    struct app2vert {
        float4 Position  : POSITION;
        float4 Normal    : NORMAL;
        float4 TexCoord0 : TEXCOORD0;
    };

    struct vert2frag {
        float4 HPosition : POSITION;
        float4 Color0    : COLOR0;
        float4 TexCoord0 : TEXCOORD0;
    };

    vert2frag main(app2vert IN,
                   uniform float4x4 ModelViewProj,
                   uniform float4x4 ModelView,
                   uniform float4x4 ModelViewIT)
    {
        vert2frag OUT;
    #ifdef PROFILE_ARBVP1
        ModelViewProj = glstate.matrix.mvp;
        ModelView = glstate.matrix.modelview[0];
        ModelViewIT = glstate.matrix.invtrans.modelview[0];
    #endif
        OUT.HPosition = mul(ModelViewProj, IN.Position);
        float3 normal = normalize(mul(ModelViewIT, IN.Normal).xyz);
        float3 eyeToVert = normalize(mul(ModelView, IN.Position).xyz);

        // reflect the eye vector across the normal vector for reflection
        OUT.TexCoord0 = float4(reflect(eyeToVert, normal), 1.0);

        float f0 = /* [...] value illegible in the source */;

        // compute the Fresnel term
        float oneMCosAngle = 1 - dot(eyeToVert, normal);
        oneMCosAngle = pow(oneMCosAngle, 5);
        OUT.Color0 = lerp(oneMC...
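The two building blocks of the Fresnel shader, reflect() and a fifth-power Fresnel factor blended with lerp(), can be reproduced in C. The shader's final lerp arguments are truncated in the source, so the blend below (f0 at normal incidence rising to 1 at grazing angles, commonly known as Schlick's approximation) is an assumption rather than the sample's exact formula. reflect and lerp follow the Cg standard library definitions; the C names are hypothetical.

```c
typedef struct { float x, y, z; } V3;

static float dotv(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* reflect(i, n) = i - 2 * dot(n, i) * n, as in the Cg standard library. */
static V3 reflectv(V3 i, V3 n)
{
    float d = 2.0f * dotv(n, i);
    V3 r = { i.x - d * n.x, i.y - d * n.y, i.z - d * n.z };
    return r;
}

/* lerp(a, b, t) = a + t * (b - a), as in Cg. */
static float lerpf(float a, float b, float t) { return a + t * (b - a); }

/* Schlick-style Fresnel factor (an assumption, see above): f0 when
   cos_angle is 1, rising to 1 as cos_angle approaches 0. */
static float fresnel(float cos_angle, float f0)
{
    float one_m_cos = 1.0f - cos_angle;
    float p = one_m_cos * one_m_cos;  /* (1 - cos)^2 */
    p = p * p * one_m_cos;            /* (1 - cos)^5 */
    return lerpf(f0, 1.0f, p);
}
```

Writing the fifth power as two squarings and one multiply avoids a general pow call; the result is the same for this integer exponent.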
291. ...umTexInstructionSlots <n>, where n > 24.

Limitations in the Implementation

Currently, this profile implementation has the following limitations:

- The OpenGL ARB fragment program profile is still in a developmental beta stage, because the extension and its support are not yet widely available.
- OpenGL state access in ARB fragment programs is not yet implemented.

OpenGL NV_vertex_program 2.0 Profile (vp30)

The vp30 Vertex Program profile is used to compile Cg source code to vertex programs for use by the NV_vertex_program2 OpenGL extension.

- Profile name: vp30
- How to invoke: Use the compiler option -profile vp30

The vp30 profile limits Cg to match the capabilities of the NV_vertex_program2 extension. This section describes the capabilities and restrictions of Cg when using the vp30 profile.

Position Invariance

The vp30 profile supports position invariance, as described in the core language specification.

- The modelview projection matrix must be specified using a binding semantic of _GL_MVP.
- Unlike the vp20 and arbvp1 profiles, this profile causes the compiler to emit the instructions for transforming the position using the modelview projection matrix.
- The assembly code position-invariant option is not used, because the hardware guarantees that the position calculation is invariant compared to the fixed-pipeline calculation.

Language Constructs

Data Types

This...
292. ...ure coordinates associated with the nth texture unit, intermediate-coord are texture coordinates associated with the (n-1)th texture unit, and prevlookup is the result of a previous texture operation. This function can be used in conjunction with the DEPTH varying out semantic to generate the dot product depth replace NV_texture_shader instruction combination.

Examples

The following examples illustrate how a developer can use Cg to achieve NV_texture_shader and NV_register_combiners functionality.

Example 1

    struct VertexOut {
        float4 color     : COLOR0;
        float4 texCoord0 : TEXCOORD0;
        float4 texCoord1 : TEXCOORD1;
    };

    float4 main(VertexOut IN,
                uniform sampler2D diffuseMap,
                uniform sampler2D normalMap) : COLOR
    {
        float4 diffuseTexColor = tex2D(diffuseMap, IN.texCoord0.xy);
        float4 normal = 2 * (tex2D(normalMap, IN.texCoord1.xy) - 0.5);
        float3 light_vector = 2 * (IN.color.rgb - 0.5);
        float4 dot_result = saturate(dot(light_vector, normal.xyz).xxxx);
        return dot_result * diffuseTexColor;
    }

Example 2

    struct VertexOut {
        float4 texCoord0 : TEXCOORD0;
        float4 texCoord1 : TEXCOORD1;
        float4 texCoord2 : TEXCOORD2;
        float4 texCoord3 : TEXCOORD3;
    };

    float4 main(VertexOut IN,
                uniform sampler2D normalMap,
                uniform sampler2D intensityMap,
                uniform sampler2D colorMap) : COLOR
    {
        float4 normal = 2 * (tex2D(normalMap, IN.texCoord0.xy) - 0.5);
        ...
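Example 1 relies on the range expansion 2 * (x - 0.5), which turns a texel or color channel stored in [0, 1] into a signed vector component in [-1, 1]. The C sketch below performs the same per-pixel arithmetic on one pixel; clamping N.L to [0, 1] mirrors what register combiners would do and is an assumption of this sketch. Names are hypothetical.

```c
typedef struct { float r, g, b; } Color3;

/* 2 * (x - 0.5): maps a [0, 1] texel or color channel to [-1, 1]. */
static float expand_signed(float c) { return 2.0f * (c - 0.5f); }

static float saturate1(float v) { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }

/* N.L from an encoded normal and an encoded light vector, clamped to
   [0, 1], then used to modulate the diffuse texture color. */
static Color3 dot3_modulate(Color3 encoded_normal, Color3 encoded_light, Color3 diffuse)
{
    float nl = expand_signed(encoded_normal.r) * expand_signed(encoded_light.r)
             + expand_signed(encoded_normal.g) * expand_signed(encoded_light.g)
             + expand_signed(encoded_normal.b) * expand_signed(encoded_light.b);
    nl = saturate1(nl);
    Color3 out = { diffuse.r * nl, diffuse.g * nl, diffuse.b * nl };
    return out;
}
```

An encoded value of (0.5, 0.5, 1.0) expands to the unit vector (0, 0, 1), the usual "straight up" normal in a tangent-space normal map.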
293. ...  171
Partial Support of Types  173
Type Categories  174
Constants  174
Type Qualifiers  175
Type Conversions  176
Type Equivalency  178
Type Promotion Rules  178
Namespaces  179
Arrays and Subscripting  179
Function Overloading  181
Global Variables  182
Use of Uninitialized Variables  182
Preprocessor  182
Overview of Binding Semantics  183
Binding Semantics  183
Aliasing of Semantics  184
Restrictions on Semantics Within a Structure  184
Additional Details for Binding Semantics  184
How Programs Receive and Return Data  185
Statements  185
Minimum Requirements for if, while, and for Statements  185
New Vector Operators  186
Arithmetic Precision and Range  188
Operator Precedence ...
value for shadow map lookups; q indicates a perspective value and is used to divide the texture coordinates before the texture lookup is performed.

For convenience, the standard library also defines versions of the texture functions prefixed with h4 (such as h4tex2D) that return half4 values, and versions prefixed with x4 (such as x4tex2D) that return fixed4 values.

When the texture functions that allow specifying a depth comparison value are used, the associated texture unit must be configured for depth-compare texturing. Otherwise, no depth comparison is actually performed.

Derivative Functions

Table 4 presents the derivative functions that are supported by the Cg Standard Library. Vertex profiles are not required to support these functions.

Table 4 Derivative Functions

Function    Description
ddx(a)      Approximate partial derivative of a with respect to the screen-space x coordinate
ddy(a)      Approximate partial derivative of a with respect to the screen-space y coordinate

Debugging Function

Table 5 presents the debugging function that is supported by the Cg Standard Library. Vertex profiles are not required to support this function.

Table 5 Debugging Function

Function                Description
void debug(float4 x)    If the compiler's DEBUG option is specified, calling this function causes the value x to be copied
vector where the x and w components are always one. The y component is equal to the diffuse dot product, or to zero if the product is less than zero. The z component is equal to the specular dot product raised to the given exponent, or to zero if the diffuse dot product was less than zero. All this is done substantially more efficiently than if the corresponding operations were written out in Cg code.

7. Take Advantage of the Different Levels of Computation Frequency

Always keep in mind that fragment programs are generally executed many more times than vertex programs. Therefore, move computation from fragment programs into vertex programs whenever possible. Recall that varying outputs from vertex programs are automatically linearly interpolated before being passed to the fragment program.

There are three main cases where you can move computation from a fragment program into a vertex program:
- The result is constant over all fragments. If the vertex shader computes a value that is the same for all vertices, so that all fragments receive the same value after interpolation, then any computation the fragment shader performs based solely on such values can be moved to the vertex shader, as long as it doesn't require texture map lookups or other fragment-only operations.
- The result is linear across a triangle. If the fragment shader is computing a value
  example 116
  sample shader 114
  vertex shader code example 115
recursion function 13
reflection vector 144
refraction
  pixel shader code example 151
  sample shader 149
  vertex shader code example 150
release notes xiv
Renderman relation to Cg 165
reserved words 191
runtime core Cg 34

S

sampler data type 11
sampler type specification 172
saturate for performance 260
scalar type category 174
semantics
  aliasing 184
  restrictions 184
shader sample
  anisotropic lighting 134
  bump dot 3x2 diffuse and specular 136
  bump reflection mapping 140
  fresnel 144
  grass 146
  improved skinning 98
  improved water 101
  matrix palette skinning 161
  melting paint 105
  multipaint 109
  ray traced refraction 114
  refraction 149
  shadow mapping 152
  shadow volume extrusion 155
  sine wave demo 158
  skin 119
shader simple Cg example 90
shaders
  advanced profile samples 97
  basic profile samples 133
shading computations for performance 261
shadow mapping 152
  pixel shader code example 154
  sample shader 152
  vertex shader code example 153
shadow volume extrusion
  sample shader 155
  vertex shader code example 156
shadow volumes 155
silent incompatibilities with C 165
simple Cg
  basic transformations 93
  passing arguments 93
Sine function 146, 158
sine wave demo
  sample shader 158
  vertex shader code example 159
sinh(x) 23
skin
  pixel shader code example 119
  sample shader 119
skinning improved
  sample shader 98
  vertex shader code example 99
smearing s
Texture Lookup Function    Texture Coordinate Swizzle
tex1Dproj                  .xw / .ra
tex2Dproj                  .xyw / .rga
texRECTproj                .xyw / .rga
tex3Dproj                  .xyzw / .rgba
texCUBEproj                .xyzw / .rgba

Bindings

Manual Assignment of Bindings

The Cg compiler can determine bindings between texture units and uniform sampler parameters/texture coordinate inputs automatically. This automatic assignment is based on the context in which uniform sampler parameters and texture coordinate inputs are used together.

To specify bindings between texture units and uniform parameters/texture coordinates to match their application, all sampler uniform parameters and texture coordinate inputs that are used in the program must have matching binding semantics; that is, TEXUNIT<n> may only be used with TEXCOORD<n>. Partially specified binding semantics may not work in all cases. Fundamentally, this restriction is due to the close coupling between texture samplers and texture coordinates in DirectX pixel shaders 1.X.

Binding Semantics for Uniform Data

If a binding semantic for a uniform parameter is not specified, then the compiler will allocate one automatically. Scalar uniform parameters may be allocated to either the xyz or the w portion of a constant register, depending on how they are used within the Cg program. When using the output of the compiler without the Cg runtime, you must set all values of a scalar uniform to th
texture unit, and intermediate_coord2 are texture coordinates associated with the (n-1)th texture unit. This function can be used to generate the dot_product_reflect_cube_map_eye_from_qs NV_texture_shader instruction combination.

Table 50 fp20 Auxiliary Texture Functions (continued)

texCUBE_reflect_eye_dp3x3(uniform samplerCUBE tex, float3 str, float4 intermediate_coord1, float4 intermediate_coord2, float4 prevlookup, uniform float3 eye)

Performs the following:

    float3 N = float3(dot(intermediate_coord1.xyz, prevlookup.xyz),
                      dot(intermediate_coord2.xyz, prevlookup.xyz),
                      dot(str, prevlookup.xyz));
    return texCUBE(tex, 2 * dot(N, eye) / dot(N, N) * N - eye);

where str are texture coordinates associated with sampler tex, prevlookup is the result of a previous texture operation, intermediate_coord1 are texture coordinates associated with the (n-2)th texture unit, intermediate_coord2 are texture coordinates associated with the (n-1)th texture unit, and eye is the eye ray vector. This function can be used to generate the dot_product_reflect_cube_map_const_eye NV_texture_shader instruction combination.

tex_dp3x2_depth(float3 str, float4 intermediate_coord, float4 prevlookup)

Performs the following:

    float z = dot(intermediate_coord.xyz, prevlookup.xyz);
    float w = dot(str, prevlookup.xyz);
    return z / w;

where str are text
y all the related core runtime handles of type CGprogram, CGparameter, and so on remain valid. If you call cgD3D9SetDevice a second time with a different device, all programs managed by the old device are rebuilt using the new device.

Responding to Lost Direct3D Devices

The expanded interface may hold references to Direct3D resources that need to be recreated in response to a lost device. In particular, certain sampler parameters might need to be released before a Direct3D device can be reset from a lost state. The expanded interface is holding a reference to a texture that needs to be reset in response to a lost device if both of the following are true for that texture:
- It was created in the D3DPOOL_DEFAULT pool.
- It was bound, using cgD3D9SetTexture, to a sampler parameter of a program for which parameter shadowing is enabled.

In this case, the parameter must be set to zero using cgD3D9SetTexture to remove the expanded interface's reference to that texture, so that the texture can be destroyed and the Direct3D device can be reset from a lost state. Later, after resetting the Direct3D device and recreating the texture, the texture needs to be re-bound to the sampler parameter. For example:

    IDirect3DDevice9* device; // Initialized elsewhere
    IDirect3DTexture9* myDefaultPoolTexture;
    CGprogram program;

    void OneTimeLoadScene()
    {
      // Load the program with cgD3D9LoadProgram and
      // enable parameter shadowing
      HR(cgD3D9LoadProgram(pr
varying output parameters in the vp30 profile. These binding semantics map to NV_vertex_program2 output registers. The two sets act as aliases to each other.

Table 27 vp30 Varying Output Binding Semantics

Binding Semantics Name            Corresponding Data
POSITION, HPOS                    Output position
PSIZE, PSIZ                       Output point size
FOG, FOGC                         Output fog coordinate
COLOR0, COL0                      Output primary color
COLOR1, COL1                      Output secondary color
BCOL0                             Output backface primary color
BCOL1                             Output backface secondary color
TEXCOORD0-TEXCOORD7, TEX0-TEX7    Output texture coordinates
CLP0-CLP5                         Output clip distances

The profile allows WPOS to be present as a binding semantic on a member of a varying output data structure, provided the member with this binding semantic is not referenced. This allows Cg programs to use the same structure to specify the varying output of a vp30 profile program and the varying input of an fp30 profile program.

OpenGL NV_fragment_program Profile (fp30)

The fp30 Fragment Program Profile is used to compile Cg source code to fragment programs for use by the NV_fragment_program OpenGL extension.
- Profile name: fp30
- How to invoke
