
Computer Graphics 3 Lecture 4: GPU Programming





  1. Computer Graphics 3, Lecture 4: GPU Programming. Dr. Benjamin Mora, University of Wales Swansea.

  2. Content
     • Introduction.
     • Vertex and Fragment Programs.
     • Programming the GPU.
     • Assembly Code.
     • High Level Languages.
     • Examples of applications.
     • Conclusion.

  3. Introduction

  4. Introduction
     • OpenGL (SGI) shaped the early design of current graphics processors (GPUs).
     • Fixed pipeline:
       • Once the various tests are passed, the fragment color is replaced by the new (textured and interpolated) one.
       • Not realistic enough.
     • The graphics pipeline is fed with primitives such as triangles, points, etc., which are rasterized.
     • Two main stages:
       • Vertex processing.
       • Fragment (rasterized pixel) processing.
     • These two stages have been made programmable to achieve more realism.

  5. Introduction
     • Latest evolutions:
       • Unified shaders.
         • The hardware automatically balances its processing units between vertex and fragment programs.
         • The smaller the image, the more CPU-bound and vertex-bound the program is.
         • The larger the image, the more fragment-bound (pixel-bound) the program is.
         • Anti-aliasing and texture-filtering settings also affect this balance.
       • Geometry shaders (discussed separately).

  6. Vertex and Fragment Programs

  7. Vertex and Fragment Programs. Source: Daniel Weiskopf, Basics of GPU-Based Programming, http://www.vis.uni-stuttgart.de/vis04_tutorial/vis04_weiskopf_intro_gpu.pdf

  8. Vertex and Fragment Programs
     [Figure: Vertices → Transform and Lighting (Vertex Programs: user-defined vertex processing) → Setup → Rasterization → Texture Fetch, Fragment Shading (Fragment Programs: user-defined per-pixel processing) → Tests (z, stencil…) → Blending → Frame Buffer]
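The diagram can be made concrete with a toy CPU model of the same flow. This is Python used purely as executable pseudocode, not a description of how a GPU is actually programmed. The stage names (`vertex_program`, `rasterize`, `fragment_program`, `draw`) and the translation `offset`, which stands in for a uniform parameter, are hypothetical. The sketch also assumes positions are already in window coordinates, and it skips clipping, perspective-correct interpolation, fill rules and blending.

```python
# Toy CPU model of the programmable pipeline: vertex program -> rasterization
# -> fragment program -> z test -> frame buffer. Hypothetical names throughout.

def vertex_program(v, offset=(0.0, 0.0)):
    """User-defined per-vertex step: translate the position by a uniform offset."""
    x, y, z = v["pos"]
    return {"pos": (x + offset[0], y + offset[1], z), "color": v["color"]}

def fragment_program(frag):
    """User-defined per-fragment step: output the interpolated color."""
    return frag["color"]

def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(tri, width, height):
    """Yield (x, y, fragment) for every pixel centre covered by the triangle."""
    p0, p1, p2 = (v["pos"] for v in tri)
    area = edge(p0, p1, p2)
    if area == 0:
        return  # degenerate triangle: no fragments
    for py in range(height):
        for px in range(width):
            c = (px + 0.5, py + 0.5)
            # Barycentric weights; dividing by the signed area makes them
            # non-negative inside the triangle for either winding order.
            w = (edge(p1, p2, c) / area, edge(p2, p0, c) / area, edge(p0, p1, c) / area)
            if min(w) >= 0:
                depth = sum(wi * v["pos"][2] for wi, v in zip(w, tri))
                color = tuple(sum(wi * v["color"][k] for wi, v in zip(w, tri)) for k in range(3))
                yield px, py, {"depth": depth, "color": color}

def draw(triangles, width, height, offset=(0.0, 0.0)):
    color_buf = [[(0.0, 0.0, 0.0)] * width for _ in range(height)]
    depth_buf = [[float("inf")] * width for _ in range(height)]
    for tri in triangles:
        verts = [vertex_program(v, offset) for v in tri]
        for px, py, frag in rasterize(verts, width, height):
            if frag["depth"] < depth_buf[py][px]:  # z test, like GL_LESS
                depth_buf[py][px] = frag["depth"]
                color_buf[py][px] = fragment_program(frag)  # replace, no blending
    return color_buf
```

Replacing `vertex_program` or `fragment_program` with another function is the CPU analogue of loading a different program. The fixed stages in between (rasterization and the z test) stay the same.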

  9. Programming the GPU

  10. Programming the GPU
     • Low-level languages (pseudo-assembler):
       • Help to understand what is possible on the GPU.
       • Large programs are a pain to maintain and optimize.
       • May be specific to a graphics card generation or vendor.
     • High-level languages:
       • Easier to write.
       • Early compilers were not very good.
       • Code may be more portable.
       • Support loops.

  11. Current Low-Level Languages (APIs)
     • DirectX 9:
       • Vertex Shader 2.0.
       • Pixel Shader 2.0.
     • OpenGL extensions:
       • GL_ARB_vertex_program.
       • GL_ARB_fragment_program.
     • Vendor APIs:
       • NVidia vertex and fragment programs (GL_NV_vertex_program, GL_NV_fragment_program).

  12. Current High-Level Languages (APIs)
     • Microsoft, ATI: High Level Shading Language (HLSL).
     • NVidia: Cg.
     • OpenGL Shading Language (GLSL).

  13. How to use them?
     • Assembly programs:
       • Can be loaded (and compiled) at run time (OpenGL).
       • Several programs can be loaded at once, so the suitable rendering style (i.e., program) can be applied to each scene primitive.
       • Loading them up front avoids the latency of compiling the pseudo-assembly code while rendering.
     • High-level programs:
       • Are first compiled into (pseudo-)assembly code, either offline or through a runtime library (e.g., the Cg runtime).
       • The resulting (pseudo-)assembly code can then be used like any assembly program.

  14. Vertex Programs
     • Vertex program:
       • Bypasses the fixed transform and lighting (T&L) unit.
       • Uses a GPU instruction set to perform all vertex math.
       • Input: arbitrary vertex attributes.
       • Output: transformed vertex attributes:
         • homogeneous clip-space position (required),
         • colors (front/back, primary/secondary),
         • fog coordinate,
         • texture coordinates,
         • point size.

  15. Vertex Programs
     • Customized computation of vertex attributes:
       • Anything that can be interpolated linearly between vertices can be computed.
     • Limitations:
       • Vertices can neither be generated nor destroyed (geometry shaders are needed for that).
       • No information about the topology or ordering of vertices is available.
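The rule "anything that can be interpolated linearly" can be checked numerically. In this sketch (Python as executable pseudocode, with hypothetical helper names), a per-vertex function plays the role of a vertex program. The rasterizer's work is reduced to a barycentric blend at the triangle's centroid. An attribute that is linear in the vertex position is reproduced exactly, but a quadratic one is not. Screen-space (not perspective-correct) interpolation is assumed.

```python
def per_vertex(program, positions):
    """Run a 'vertex program' independently on each vertex: exactly one output
    per input vertex, with no access to neighbouring vertices or topology."""
    return [program(p) for p in positions]

def interpolate(values, weights):
    """What the rasterizer does with each output: a barycentric (linear) blend."""
    return sum(v * w for v, w in zip(values, weights))

def linear_attr(p):
    # Linear in the vertex position (e.g., a ramp): interpolation is exact.
    return 2.0 * p[0] + 3.0 * p[1] + 1.0

def quadratic_attr(p):
    # Not linear in position: interpolation only approximates it.
    return p[0] ** 2

triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
centroid_weights = (1 / 3, 1 / 3, 1 / 3)   # barycentric weights of the centroid
centroid = (4 / 3, 4 / 3)
```

At the centroid, the interpolated linear attribute equals `linear_attr(centroid)`. The interpolated quadratic attribute gives 16/3 instead of the true value 16/9, which is why such quantities belong in a fragment program.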

  16. Vertex Programs
     • Vertex programs bypass the following OpenGL functionality:
       • Vertex transformations: the modelview and projection matrix transformations.
       • Normal transformations and normalizations.
       • Color material.
       • Per-vertex lighting.
       • Texture coordinate generation.
       • Texture matrix transformations.
       • Raster position transformation.
       • Client-defined clip planes.
       • Per-vertex processing in EXT_point_parameters.
       • Per-vertex processing in NV_fog_distance.
       • Per-vertex point size computations.

  17. Vertex Programs
     • What is not replaced?
       • The view frustum clip.
       • Perspective divide (division by w).
       • The viewport transformation.
       • The depth range transformation.
       • Clamping of the primary and secondary colors to [0,1].
       • Primitive assembly and per-fragment operations.
       • Evaluators (except the AUTO_NORMAL normalization).
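The fixed steps that still follow a vertex program can be written out directly. The sketch below (Python, with hypothetical function names) applies the standard OpenGL formulas for three of them. The perspective divide comes first. The viewport transform uses the parameters of `glViewport(x, y, width, height)`, and the depth range transform uses those of `glDepthRange(near, far)`. The last step is the [0,1] color clamp. Frustum clipping is omitted, so the vertex is assumed to lie inside the view volume.

```python
def to_window(clip, viewport, depth_range=(0.0, 1.0)):
    """Clip-space (x, y, z, w) -> window coordinates (x_w, y_w, z_w)."""
    x, y, z, w = clip
    xn, yn, zn = x / w, y / w, z / w        # perspective divide -> NDC in [-1, 1]
    vx, vy, vw, vh = viewport               # as passed to glViewport
    n, f = depth_range                      # as passed to glDepthRange
    return ((xn + 1) * vw / 2 + vx,         # viewport transform
            (yn + 1) * vh / 2 + vy,
            (f - n) / 2 * zn + (n + f) / 2) # depth range transform

def clamp01(color):
    """Clamp each color component to [0, 1], as done after the vertex program."""
    return tuple(min(max(c, 0.0), 1.0) for c in color)
```

For example, the clip-space origin (0, 0, 0, 1) with a 640×480 viewport maps to the window centre (320, 240) at depth 0.5.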

  18. NV Vertex Programs
     • Versions: 1.0, 1.1, 2.0, 3.0.
     • Version 1.0:
       • 12 temporary 4-component (xyzw) vector registers: R0 to R11.
       • 96 read-only 4-component vector registers (program parameters c[0] to c[95]), specified outside glBegin/glEnd.
       • 8 matrices.
       • 17 different vertex program instructions (27 in the Shader Model 3.0 version).
       • At most 128 instructions per program.

  19. NV Vertex Programs
     • Input parameters for the vertices (v[]):

       Mnemonic  Number  Typical meaning
       OPOS      0       object position
       WGHT      1       vertex weight
       NRML      2       normal
       COL0      3       primary color
       COL1      4       secondary color
       FOGC      5       fog coordinate
       TEX0      8       texture coordinate 0
       TEX1      9       texture coordinate 1
       TEX2      10      texture coordinate 2
       TEX3      11      texture coordinate 3
       TEX4      12      texture coordinate 4
       TEX5      13      texture coordinate 5
       TEX6      14      texture coordinate 6
       TEX7      15      texture coordinate 7

  20. NV Vertex Programs
     • Output values for the vertices (o[]):

       Mnemonic  Typical meaning                      Components
       HPOS      Homogeneous clip-space position      (x,y,z,w)
       COL0      Primary color (front-facing)         (r,g,b,a)
       COL1      Secondary color (front-facing)       (r,g,b,a)
       BFC0      Back-facing primary color            (r,g,b,a)
       BFC1      Back-facing secondary color          (r,g,b,a)
       FOGC      Fog coordinate                       (f,*,*,*)
       PSIZ      Point size                           (p,*,*,*)
       TEX0      Texture coordinate set 0             (s,t,r,q)
       TEX1      Texture coordinate set 1             (s,t,r,q)
       TEX2      Texture coordinate set 2             (s,t,r,q)
       TEX3      Texture coordinate set 3             (s,t,r,q)
       TEX4      Texture coordinate set 4             (s,t,r,q)
       TEX5      Texture coordinate set 5             (s,t,r,q)
       TEX6      Texture coordinate set 6             (s,t,r,q)
       TEX7      Texture coordinate set 7             (s,t,r,q)

  21. NV Vertex Programs
     • Vertex program instructions (inputs: s = scalar, v = vector; output: v = vector, ssss = scalar replicated to all four components):

       OpCode  Inputs  Output            Operation
       ARL     s       address register  address register load
       MOV     v       v                 move
       MUL     v,v     v                 multiply
       ADD     v,v     v                 add
       MAD     v,v,v   v                 multiply and add
       RCP     s       ssss              reciprocal
       RSQ     s       ssss              reciprocal square root
       DP3     v,v     ssss              3-component dot product
       DP4     v,v     ssss              4-component dot product
       DST     v,v     v                 distance vector
       MIN     v,v     v                 minimum
       MAX     v,v     v                 maximum
       SLT     v,v     v                 set on less than
       SGE     v,v     v                 set on greater than or equal
       EXP     s       v                 exponential base 2
       LOG     s       v                 logarithm base 2
       LIT     v       v                 light coefficients
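The "v" and "ssss" conventions in the table can be read as a small Python model. This is a sketch for illustration, not a faithful emulator. Registers are 4-tuples, and scalar inputs are plain floats; in the assembly, a scalar input is a single component chosen with a swizzle, e.g. `R0.w`. Following our reading of the NV_vertex_program specification, `RSQ` uses the absolute value of its input, and `DST` returns (1, a.y·b.y, a.z, b.w). Floating-point corner cases are not modelled. For example, `RCP` of 0 gives infinity on the GPU but raises an error here.

```python
import math

def replicate(s):
    """'ssss' output: one scalar written to all four components."""
    return (s, s, s, s)

def MUL(a, b): return tuple(x * y for x, y in zip(a, b))
def ADD(a, b): return tuple(x + y for x, y in zip(a, b))
def MAD(a, b, c): return ADD(MUL(a, b), c)
def DP3(a, b): return replicate(sum(x * y for x, y in zip(a[:3], b[:3])))
def DP4(a, b): return replicate(sum(x * y for x, y in zip(a, b)))
def RCP(s): return replicate(1.0 / s)
def RSQ(s): return replicate(1.0 / math.sqrt(abs(s)))   # uses |s|
def MIN(a, b): return tuple(min(x, y) for x, y in zip(a, b))
def MAX(a, b): return tuple(max(x, y) for x, y in zip(a, b))
def SLT(a, b): return tuple(1.0 if x < y else 0.0 for x, y in zip(a, b))
def SGE(a, b): return tuple(1.0 if x >= y else 0.0 for x, y in zip(a, b))

def DST(a, b):
    """With a = (-, d*d, d*d, -) and b = (-, 1/d, -, 1/d) this gives (1, d, d*d, 1/d)."""
    return (1.0, a[1] * b[1], a[2], b[3])
```

`SLT`/`SGE` producing 1.0 or 0.0 per component is what lets a vertex program select between values arithmetically, without branching.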

  22. NV Vertex Programs
     • Special instruction manipulation:
       • Use of negated values:

           MOV R0, -R1;
           ADD R0, R1, -R2;   # R0 <= R1 - R2 (vector operation)

       • Registers can be swizzled:

           MOV R1, R1.wzyx;
           ADD R1, R1, R1.xzxy;

                   x   y   z   w
           Old R1: 1   3   7   11
           New R1: 2   10  8   14   (after ADD R1, R1, R1.xzxy)
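Source negation and swizzling can be modelled the same way. The sketch below (Python, with hypothetical helper names) assumes that the old R1 is (1, 3, 7, 11), which is how we read the slide's figure. Under that assumption, `ADD R1, R1, R1.xzxy` yields (2, 10, 8, 14). Using R1 as both source and destination is safe because the operands are read before the result is written.

```python
IDX = {"x": 0, "y": 1, "z": 2, "w": 3}

def swizzle(reg, pattern):
    """Source swizzle such as R1.wzyx or R1.xzxy: re-order/replicate components."""
    return tuple(reg[IDX[c]] for c in pattern)

def neg(reg):
    """Source negation such as -R2."""
    return tuple(-v for v in reg)

def add(a, b):
    """ADD: componentwise sum."""
    return tuple(x + y for x, y in zip(a, b))

old_r1 = (1.0, 3.0, 7.0, 11.0)
new_r1 = add(old_r1, swizzle(old_r1, "xzxy"))   # ADD R1, R1, R1.xzxy;
```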

  23. NV Vertex Programs
     • Example: normal normalization (the !!VP1.0 header must come first in the program string):

           !!VP1.0
           # v[NRML] = (nx,ny,nz)
           #
           # R0.xyz = normalize(v[NRML])
           # R0.w = 1/sqrt(nx*nx + ny*ny + nz*nz)
           #
           MOV R1, v[NRML];
           DP3 R0.w, R1, R1;
           RSQ R0.w, R0.w;
           MUL R0.xyz, R1, R0.wwww;
           # Then use R0 to compute shading...
           MOV o[COL0], ...
           END
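A line-by-line Python mirror of the normalization snippet shows what ends up in R0: the unit normal in xyz and 1/|n| in w. This is a sketch, and `normalize_vp1` is a hypothetical name. A zero-length normal is not handled. The GPU would produce infinity there, whereas Python raises `ZeroDivisionError`.

```python
import math

def normalize_vp1(normal):
    """Mirror of the !!VP1.0 snippet; returns R0 = (nx', ny', nz', 1/|n|)."""
    r1 = tuple(normal[:3]) + (0.0,)                          # MOV R1, v[NRML];  (w unused)
    r0_w = r1[0] * r1[0] + r1[1] * r1[1] + r1[2] * r1[2]     # DP3 R0.w, R1, R1;
    r0_w = 1.0 / math.sqrt(abs(r0_w))                        # RSQ R0.w, R0.w;  (RSQ uses |x|)
    r0_xyz = tuple(c * r0_w for c in r1[:3])                 # MUL R0.xyz, R1, R0.wwww;
    return r0_xyz + (r0_w,)
```

For the normal (3, 0, 4), R0 becomes approximately (0.6, 0, 0.8, 0.2).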

  24. NV Vertex Programs
     • Simple specular and diffuse lighting computation with an eye-space normal:

           !!VP1.0
           #
           # c[0-3]  = modelview projection (composite) matrix
           # c[4-7]  = modelview inverse transpose
           # c[32]   = normalized eye-space light direction (infinite light)
           # c[33]   = normalized constant eye-space half-angle vector (infinite viewer)
           # c[35].x = pre-multiplied monochromatic diffuse light color & diffuse material
           # c[35].y = pre-multiplied monochromatic ambient light color & diffuse material
           # c[36]   = specular color
           # c[38].x = specular power
           #
           # outputs homogeneous position and color
           #
           DP4 o[HPOS].x, c[0], v[OPOS];
           DP4 o[HPOS].y, c[1], v[OPOS];
           DP4 o[HPOS].z, c[2], v[OPOS];
           DP4 o[HPOS].w, c[3], v[OPOS];
           DP3 R0.x, c[4], v[NRML];
           DP3 R0.y, c[5], v[NRML];
           DP3 R0.z, c[6], v[NRML];            # R0 = n' = transformed normal
           DP3 R1.x, c[32], R0;                # R1.x = L DOT n'
           DP3 R1.y, c[33], R0;                # R1.y = hHat DOT n'
           MOV R1.w, c[38].x;                  # R1.w = specular power
           LIT R2, R1;                         # compute lighting values
           MAD R3, c[35].x, R2.y, c[35].y;     # diffuse + ambient
           MAD o[COL0].xyz, c[36], R2.z, R3;   # + specular
           END
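The lighting program can be traced on the CPU in the same way. The sketch below implements `LIT` as we understand it from the NV_vertex_program specification:
- negative N·L and N·H are clamped to 0;
- the specular term (N·H)^power is produced only when N·L > 0;
- the power is clamped to roughly ±128.

The sketch then follows the program instruction by instruction. The dictionary `c` of constant registers and the function names are assumptions of this sketch. Colors are not clamped here, because that happens after the vertex program.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lit(src):
    """LIT; src = (N.L, N.H, unused, power) -> (1, diffuse, specular, 1).
    0 ** power is taken as 0 in this sketch."""
    diffuse = max(src[0], 0.0)
    n_dot_h = max(src[1], 0.0)
    power = min(max(src[3], -128.0), 128.0)   # spec clamps the power to about +/-128
    specular = n_dot_h ** power if diffuse > 0.0 and n_dot_h > 0.0 else 0.0
    return (1.0, diffuse, specular, 1.0)

def lighting_vp(opos, nrml, c):
    """Mirror of the lighting program; c maps register index -> 4-tuple."""
    hpos = tuple(dot(c[i], opos) for i in range(4))               # DP4 o[HPOS].x..w
    r0 = tuple(dot(c[4 + i][:3], nrml) for i in range(3))         # DP3 R0.x..z  (n')
    r1 = (dot(c[32][:3], r0), dot(c[33][:3], r0), 0.0, c[38][0])  # N.L, N.H, -, power
    r2 = lit(r1)                                                  # LIT R2, R1;
    r3 = c[35][0] * r2[1] + c[35][1]                              # diffuse + ambient
    col0 = tuple(c[36][k] * r2[2] + r3 for k in range(3))         # + specular
    return hpos, col0
```

With identity matrices, light and half-angle vectors along +z, and a normal facing +z, the diffuse and specular terms are both 1. A normal facing -z leaves only the ambient term.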

  25. NV Fragment Programs
     • Similar to vertex programs:
       • Programs are loaded the same way.
       • Inputs and outputs are different.
       • Different instruction set: more instructions, but many are the same.
     • Versions available: 1.0, 2.0, and 4.0.
     • 64 constant vector registers.
     • 32 registers at 32-bit floating-point precision, or 64 registers at 16-bit floating-point precision.

  26. NV Fragment Programs
     • Fragment program inputs:

       Register  Description                           Components
       f[WPOS]   Position of the fragment center       (x,y,z,1/w)
       f[COL0]   Interpolated primary color            (r,g,b,a)
       f[COL1]   Interpolated secondary color          (r,g,b,a)
       f[FOGC]   Interpolated fog distance/coord       (z,0,0,0)
       f[TEX0]   Texture coordinate (unit 0)           (s,t,r,q)
       f[TEX1]   Texture coordinate (unit 1)           (s,t,r,q)
       f[TEX2]   Texture coordinate (unit 2)           (s,t,r,q)
       f[TEX3]   Texture coordinate (unit 3)           (s,t,r,q)
       f[TEX4]   Texture coordinate (unit 4)           (s,t,r,q)
       f[TEX5]   Texture coordinate (unit 5)           (s,t,r,q)
       f[TEX6]   Texture coordinate (unit 6)           (s,t,r,q)
       f[TEX7]   Texture coordinate (unit 7)           (s,t,r,q)

  27. NV Fragment Programs
     • Fragment program outputs:

       Register  Description
       o[COLR]   Final RGBA fragment color, fp32 format (color programs)
       o[COLH]   Final RGBA fragment color, fp16 format (color programs)
       o[DEPR]   Final fragment depth value, fp32 format
       o[TEX0]   TEXTURE0 output, fp16 format (combiner programs)
       o[TEX1]   TEXTURE1 output, fp16 format (combiner programs)
       o[TEX2]   TEXTURE2 output, fp16 format (combiner programs)
       o[TEX3]   TEXTURE3 output, fp16 format (combiner programs)

     • Output registers are write-only.

  28. NV Fragment Programs
     • Fragment program instruction set (V2.0). Suffixes: R, H, X = 32-bit float, 16-bit float, 12-bit fixed-point precision; C = update the condition codes; _SAT = clamp the result to [0,1]. Inputs/outputs: v = vector, s = scalar, ssss = scalar replicated to all four components.

       Instruction         Inputs  Output  Description
       ADD[RHX][C][_SAT]   v,v     v       add
       COS[RH][C][_SAT]    s       ssss    cosine
       DDX[RH][C][_SAT]    v       v       derivative relative to x
       DDY[RH][C][_SAT]    v       v       derivative relative to y
       DP3[RHX][C][_SAT]   v,v     ssss    3-component dot product
       DP4[RHX][C][_SAT]   v,v     ssss    4-component dot product
       DST[RH][C][_SAT]    v,v     v       distance vector
       EX2[RH][C][_SAT]    s       ssss    exponential base 2
       FLR[RHX][C][_SAT]   v       v       floor
       FRC[RHX][C][_SAT]   v       v       fraction
       KIL                 none    none    conditionally discard fragment
       LG2[RH][C][_SAT]    s       ssss    logarithm base 2
       LIT[RH][C][_SAT]    v       v       compute light coefficients
       LRP[RHX][C][_SAT]   v,v,v   v       linear interpolation
       MAD[RHX][C][_SAT]   v,v,v   v       multiply and add
       MAX[RHX][C][_SAT]   v,v     v       maximum
       MIN[RHX][C][_SAT]   v,v     v       minimum
       MOV[RHX][C][_SAT]   v       v       move
       MUL[RHX][C][_SAT]   v,v     v       multiply
       PK2H                v       ssss    pack two 16-bit floats
       PK2US               v       ssss    pack two unsigned 16-bit scalars
       PK4B                v       ssss    pack four signed 8-bit scalars
       PK4UB               v       ssss    pack four unsigned 8-bit scalars
       POW[RH][C][_SAT]    s,s     ssss    exponentiation (x^y)

  29. NV Fragment Programs
     • Fragment program instruction set (V2.0), continued:

       Instruction         Inputs  Output  Description
       RCP[RH][C][_SAT]    s       ssss    reciprocal
       RFL[RH][C][_SAT]    v,v     v       reflection vector
       RSQ[RH][C][_SAT]    s       ssss    reciprocal square root
       SEQ[RHX][C][_SAT]   v,v     v       set on equal
       SFL[RHX][C][_SAT]   v,v     v       set on false
       SGE[RHX][C][_SAT]   v,v     v       set on greater than or equal
       SGT[RHX][C][_SAT]   v,v     v       set on greater than
       SIN[RH][C][_SAT]    s       ssss    sine
       SLE[RHX][C][_SAT]   v,v     v       set on less than or equal
       SLT[RHX][C][_SAT]   v,v     v       set on less than
       SNE[RHX][C][_SAT]   v,v     v       set on not equal
       STR[RHX][C][_SAT]   v,v     v       set on true
       SUB[RHX][C][_SAT]   v,v     v       subtract
       TEX[C][_SAT]        v       v       texture lookup
       TXD[C][_SAT]        v,v,v   v       texture lookup with partial derivatives
       TXP[C][_SAT]        v       v       projective texture lookup
       UP2H[C][_SAT]       s       v       unpack two 16-bit floats
       UP2US[C][_SAT]      s       v       unpack two unsigned 16-bit scalars
       UP4B[C][_SAT]       s       v       unpack four signed 8-bit scalars
       UP4UB[C][_SAT]      s       v       unpack four unsigned 8-bit scalars
       X2D[RH][C][_SAT]    v,v,v   v       2D coordinate transformation

  30. NV Fragment Programs
     • Simple example: red colouring of the fragments (i.e., rasterized pixels):

           !!FP1.0
           DEFINE red = {1.0, 0, 0, 0};
           MOV o[COLR], red;
           END

     • Simple example: applying a single texture:

           !!FP1.0
           TEX R0, f[TEX0], TEX0, 2D;   # last parameter can be 1D, 2D, 3D, RECT
           MOV o[COLR], R0;
           END
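The two fragment programs can be mirrored in Python as functions that map an interpolated fragment to an output color. The `TEX` emulation assumes nearest-neighbour filtering and repeat wrapping, on a texture stored as rows indexed by t. On real hardware, the result depends on the texture's filter and wrap state. All names are hypothetical.

```python
import math

def tex2d_nearest(texture, s, t):
    """Nearest-neighbour 2D lookup with repeat wrapping (an assumption of this sketch)."""
    h, w = len(texture), len(texture[0])
    i = int(math.floor((s - math.floor(s)) * w)) % w   # wrap s into [0, 1), pick column
    j = int(math.floor((t - math.floor(t)) * h)) % h   # wrap t into [0, 1), pick row
    return texture[j][i]

def fp_red(frag):
    return (1.0, 0.0, 0.0, 0.0)                  # DEFINE red = {1.0, 0, 0, 0}; MOV o[COLR], red;

def fp_single_texture(frag, texture):
    s, t = frag["TEX0"][:2]
    r0 = tex2d_nearest(texture, s, t)            # TEX R0, f[TEX0], TEX0, 2D;
    return r0                                    # MOV o[COLR], R0;
```

The red program ignores its input entirely. The textured program depends only on the interpolated texture coordinate, so each fragment can be processed independently of the others.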

  31. NV Fragment Programs
     • Useful instructions:
       • LRP: linear interpolation.
       • SIN, COS, …
       • SGE, SLT, …: set each component to 1.0 or 0.0 according to a comparison.
       • KIL: discard the fragment (stop its computation).
       • Pack and unpack instructions.
     • Most instructions execute in one cycle (not counting texture accesses).
     • Most instructions can update the condition codes (C suffix, e.g., MOV => MOVC), and register writes can be made conditional on them.
     • Most instructions can clamp their results to [0, 1] (_SAT suffix, e.g., MOV => MOV_SAT).
     • Loops are now possible with the latest generation.
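Several of these features are easy to state as code. The sketch below (Python, with hypothetical names) covers three of them:
- the `LRP` formula, where the first operand is the interpolation factor;
- the `_SAT` clamp;
- a small program with a `KIL`-style discard, in which fragments with alpha below a threshold produce no output and the rest are brightened and saturated.

Condition codes themselves are not modelled.

```python
def LRP(t, a, b):
    """LRP d, t, a, b  ->  d = t*a + (1 - t)*b, componentwise."""
    return tuple(ti * ai + (1.0 - ti) * bi for ti, ai, bi in zip(t, a, b))

def sat(v):
    """_SAT suffix: clamp each component to [0, 1]."""
    return tuple(min(max(x, 0.0), 1.0) for x in v)

def shade(color, threshold):
    """Hypothetical program: discard low-alpha fragments, else brighten and saturate."""
    if color[3] < threshold:
        return None                               # KIL: no color is written
    return sat(tuple(2.0 * x for x in color))     # like MUL_SAT o[COLR], color, {2,2,2,2}
```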

  32. (Silly) Limitations
     • Most of the limitations exist for performance reasons.
     • At the fragment level, the frame buffer cannot really be accessed in read-write mode:
       • The new pixel value cannot be computed from the old one (only the fixed blending stage combines them).
     • Floating-point (FP32) filtering and blending are only available on recent graphics cards (NV 8x00 generation). Previous cards (e.g., the GeForce 7800 series) could only filter and blend at FP16 precision.
     • The actual number of physical registers may be smaller than the number of logical registers:
       • Programs that use many registers run slower.
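The FP16 limitation matters most when many passes are blended into the same pixel. The sketch below uses the half-precision format code `'e'` of Python's `struct` module to round each additive blend result to IEEE 754 FP16. Python floats stand in for FP32; both represent these particular sums exactly. The helper names are hypothetical, and real blending hardware may differ in rounding details.

```python
import struct

def to_fp16(x):
    """Round a float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def blend_add(dst, src, fp16):
    """Additive blending (dst + src) into an FP16 or full-precision target."""
    out = dst + src
    return to_fp16(out) if fp16 else out

def accumulate(passes, contribution, fp16):
    """Add the same small contribution into one pixel over many passes."""
    value = 1.0
    for _ in range(passes):
        value = blend_add(value, contribution, fp16)
    return value
```

Near 1.0, FP16 values are spaced 1/1024 apart. A contribution of 1/4096 is therefore rounded away on every pass, and `accumulate(1000, 1/4096, True)` stays at 1.0, while the full-precision target reaches 1 + 1000/4096.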

  33. High Level Languages
     • Why?
       • Assembly programming becomes tedious for long shaders.
       • Programming and debugging are inefficient or difficult.
       • High-level languages are more portable.
     • But:
       • The final code may be slower.

  34. High Level Languages: Cg Overview
     • C for Graphics.
     • Syntax similar to C for easy shader writing.
     • See the Cg manual: http://developer.nvidia.com/object/cg_toolkit.html
     • Vertex and fragment programs take specific input vectors and values, and have to return specific outputs.
     • Data structures must be declared for the input and output parameters of a function.

  35. Cg: Inputs
     • Two kinds of shader inputs:
       • Varying inputs: specific to each entity processed.
         • Vertex: position, normals, etc.
         • Fragment: interpolated values such as colors, texture coordinates, etc.
       • Uniform inputs: values that do not change while vertices are streamed.
         • Vertex level: transformation matrix.
         • Fragment level: constant parameters, …

  36. Cg: Vertex Program Inputs
     • Supported inputs to a Cg vertex program (binding semantics):
       • POSITION, BLENDWEIGHT, NORMAL, TANGENT, BINORMAL, PSIZE, BLENDINDICES, TEXCOORD0–TEXCOORD7.
     • Every parameter can be declared as a float vector of 1 to 4 components (float, float2, float3, float4), e.g.:

           float3 myPosition : POSITION;

  37. Cg: Vertex Program Inputs
     • Example from the Cg User's Manual:

           struct myinputs {
               float3 myPosition       : POSITION;
               float3 myNormal         : NORMAL;
               float3 myTangent        : TANGENT;
               float  refractive_index : TEXCOORD3;
           };

           outdata foo(myinputs indata)
           {
               /* ... */
               // Within the program, the parameters are referred to as
               // "indata.myPosition", "indata.myNormal", and so on.
               /* ... */
           }

  38. Cg: Vertex Program Inputs
     • Inputs can also be specified directly as function parameters (rather than through a struct).
     • Example from the Cg User's Manual:

           outdata foo(float3 myPosition       : POSITION,
                       float3 myNormal         : NORMAL,
                       float3 myTangent        : TANGENT,
                       float  refractive_index : TEXCOORD3)
           {
               /* ... */
           }

  39. Cg: Vertex Program Varying Output
     • The vertex program's output type should match the fragment program's input type.
     • The binding semantics help the compiler associate each vertex output with the corresponding fragment input (interoperability).
     • The semantics do not actually impose a specific use for those channels:
       • For example, texture coordinates can be used to pass colors or locations.

  40. Cg: Vertex Program Varying Output
     • Supported outputs of a vertex program:
       • POSITION, PSIZE, FOG, COLOR0–COLOR1, TEXCOORD0–TEXCOORD7.

  41. Cg: Vertex Program Varying Output
     • Example from the Cg User's Manual:

           // Vertex program (inside a Cg file...)
           struct myvf {
               float4 pout         : POSITION;   // used for rasterization
               float4 diffusecolor : COLOR0;
               float4 uv0          : TEXCOORD0;
               float4 uv1          : TEXCOORD1;
           };

           myvf foo(/* ... */)
           {
               myvf outstuff;
               /* ... */
               return outstuff;
           }

  42. Cg: Input/Output Interoperability
     • Example from the Cg User's Manual:

           struct myvert2frag {
               float4 pos : POSITION;
               float4 uv0 : TEXCOORD0;
               float4 uv1 : TEXCOORD1;
           };

           // Vertex program
           myvert2frag vertmain(...)
           {
               myvert2frag outdata;
               /* ... */
               return outdata;
           }

           // Fragment program
           void fragmain(myvert2frag indata)
           {
               float4 tcoord = indata.uv0;
               /* ... */
           }

  43. Cg: Fragment Program Varying Output
     • Two supported outputs: COLOR and DEPTH.
     • Examples:

           void main(/* ... */,
                     out float4 color : COLOR,
                     out float depth  : DEPTH)
           {
               /* ... */
               color = diffuseColor * /* ... */;
               depth = /* ... */;
           }

           float4 main(/* ... */) : COLOR
           {
               /* ... */
               return diffuseColor * /* ... */;
           }

  44. Cg: General Coding
     • Several variable types are supported:
       • float, half (16 bits), fixed (12 bits).
       • int, bool.
       • Vectors: float1, float4, bool1, bool4, …
       • Matrices: float1x1, float2x2, …
       • Arrays.
     • Auxiliary functions can be declared.
     • A wide set of functions and operators is also available.

  45. Cg: General Coding
     • Control flow: if, else, while, for.
     • Function definitions and function overloading.
     • Arithmetic operators from C.
     • Multiplication function (mul): matrix × vector, vector × matrix, matrix × matrix.
     • Vector constructor.
     • Boolean and comparison operators.
     • Swizzle operator:
       • float4 a; => a.xxxx
     • Write mask operator:
       • float4 color = float4(1.0, 1.0, 0.0, 0.0); color.a = 2.0;
     • Conditional operator (?:).
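With `mul`, the operand order decides whether the vector is treated as a column or a row. The Python sketch below (hypothetical names) mirrors `mul(M, v)` and `mul(v, M)` using a translation matrix. Only `mul(M, v)` translates the point; `mul(v, M)` gives the same result as `mul(transpose(M), v)`.

```python
def mul_mv(M, v):
    """Like Cg's mul(M, v): v is treated as a column vector, result = M v."""
    return tuple(sum(row[k] * v[k] for k in range(len(v))) for row in M)

def mul_vm(v, M):
    """Like Cg's mul(v, M): v is treated as a row vector, result = v M."""
    return tuple(sum(v[r] * M[r][c] for r in range(len(v))) for c in range(len(M[0])))

def transpose(M):
    return tuple(zip(*M))

# Translation by (5, 6, 7), stored row by row.
T = ((1, 0, 0, 5),
     (0, 1, 0, 6),
     (0, 0, 1, 7),
     (0, 0, 0, 1))
```

`mul_mv(T, (1, 2, 3, 1))` moves the point to (6, 8, 10, 1). `mul_vm((1, 2, 3, 1), T)` leaves xyz unchanged and puts 39 in w, which is the typical symptom of a matrix being used with the wrong convention.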

  46. Cg: General Coding
     • Standard non-projective texture lookup:
       • tex2D(sampler2D tex, float2 s);
       • texRECT(samplerRECT tex, float2 s);
       • texCUBE(samplerCUBE tex, float3 s);
     • Standard projective texture lookup:
       • tex2Dproj(sampler2D tex, float3 sq);
       • texRECTproj(samplerRECT tex, float3 sq);
       • texCUBEproj(samplerCUBE tex, float4 sq);
     • Math functions:
       • abs, cos, sin, tan, acos, asin, atan, clamp, determinant, exp, log, floor, lerp, min, max, pow, sqrt, normalize, …
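Two of the listed operations are simple formulas worth spelling out. As we read the Cg documentation, the projective variants first divide the coordinates by their last component; for `tex2Dproj` with a `float3`, the lookup uses `sq.xy / sq.z`. Also, `lerp(a, b, w)` returns `a + w*(b - a)`, so its weight is the last argument, unlike the assembly `LRP`. The Python helpers below are hypothetical illustrations, not the Cg library.

```python
def project(sq):
    """Coordinates used by a projective lookup: all but the last component,
    divided by the last one (e.g., sq.xy / sq.z for tex2Dproj with float3)."""
    *coords, q = sq
    return tuple(c / q for c in coords)

def lerp(a, b, w):
    """Cg-style lerp(a, b, w) = a + w*(b - a), componentwise."""
    return tuple(x + w * (y - x) for x, y in zip(a, b))
```

Projective lookups are what make projected textures (e.g., a slide projector effect or shadow maps) work: the vertex program outputs homogeneous texture coordinates, and the division happens per fragment.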

  47. Applications

  48. Application: Procedural Texturing
     • Textures that are computed rather than image based:
       • Combination of noise (Perlin noise) and various mathematical expressions.
       • Representation of wood, marble, stone, clouds, waves, bumps…
     • Can be computed at the fragment level:
       • Adds computation, but reduces memory bandwidth.
       • Avoids the problem of mapping textures onto curved surfaces.
     Ref: New York University Media Research Lab, http://mrl.nyu.edu/projects/texture/

  49. Application: Phong Shading
     • The traditional OpenGL pipeline implements Gouraud shading (interpolation):
       • Colors and lighting are computed at the vertices, then linearly interpolated.
       • Can miss specular highlights that occur in the middle of a triangle.
     • Phong interpolation is better:
       • First linearly interpolate the normal across the triangle (and renormalize it).
       • Then compute Phong shading from the interpolated normal.
     Ref: New York University Media Research Lab, http://mrl.nyu.edu/projects/texture/
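The missed highlight is easy to reproduce along one edge of a triangle. In this Python sketch (hypothetical names), the two vertex normals lean 30° away from the half-angle vector H. As a result, the specular term (N·H)^shininess, the same half-vector form used by the earlier vertex program, is almost zero at both vertices. Gouraud interpolation keeps it near zero at the midpoint. Interpolating and renormalizing the normal recovers the full highlight there.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def specular(n, h, shininess):
    """(N.H)^shininess with a renormalized N and negative values clamped to 0."""
    return max(0.0, sum(a * b for a, b in zip(normalize(n), h))) ** shininess

def lerp3(a, b, t):
    return tuple(x + t * (y - x) for x, y in zip(a, b))

def gouraud(n0, n1, h, shininess, t):
    """Light at the two vertices, then interpolate the resulting intensities."""
    s0, s1 = specular(n0, h, shininess), specular(n1, h, shininess)
    return s0 + t * (s1 - s0)

def phong(n0, n1, h, shininess, t):
    """Interpolate the normal, renormalize it, then light per fragment."""
    return specular(lerp3(n0, n1, t), h, shininess)
```

With a shininess of 50, the vertices give about 0.0007 each (0.866^50). At the midpoint, Gouraud still gives about 0.0007, whereas Phong gives 1.0.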

  50. Application: Phong Shading. Source: Ian Fergusson, https://www.cis.strath.ac.uk/teaching/ug/classes/52.359/lect13.pdf
