Status – Week 240 Victor Moya
Summary • Post Geometry Pipeline. • Rasterization. • Triangle Setup. • Triangle Traversal. • Interpolation. • Current status.
Post Geometry Pipeline • Divide by w? • Clipping? • NVidia doesn’t seem to have geometric clipping. • Alpha kill in NV2x for user clip planes. • ATI seems to have geometric clipping. • Proper user clipping. • No support for transformed and lit vertex clipping. • What do we do?
Post Geometry Pipeline • Clipping: • 6 frustum clip planes. • At least 6 user clip planes. • Hardware requirements: • Plane – edge intersection (?). • Generates new vertices (1 or 2 per triangle). • Interpolate output attributes at the new vertex. • Can generate new triangles (1 per triangle). • Affects primitive assembly. • At least frustum clipping should be fast.
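The hardware requirements listed for clipping (plane – edge intersection, new vertices, attribute interpolation at the new vertex, extra triangles) can be illustrated with a minimal sketch of one Sutherland-Hodgman step against a single plane. This is only an illustration, not the simulator's design: the function names, the vertex layout (a position tuple plus an attribute tuple) and the inside test a*x + b*y + c*z + d*w >= 0 are all assumptions.

```python
def plane_distance(plane, pos):
    """Signed value of the plane equation a*x + b*y + c*z + d*w at pos."""
    return sum(p * v for p, v in zip(plane, pos))


def lerp(a, b, t):
    """Linear interpolation between two tuples of equal length."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))


def clip_triangle(tri, plane):
    """Clip one triangle against one plane (hypothetical helper).

    tri: three vertices, each a pair (position, attributes).
    Returns a list of 0, 1 or 2 triangles.
    A vertex lying exactly on the plane may yield a degenerate triangle;
    a real implementation would filter those out.
    """
    out = []
    for i in range(3):
        cur, nxt = tri[i], tri[(i + 1) % 3]
        d_cur = plane_distance(plane, cur[0])
        d_nxt = plane_distance(plane, nxt[0])
        if d_cur >= 0:
            out.append(cur)                      # keep vertices on the inside
        if (d_cur >= 0) != (d_nxt >= 0):         # edge crosses the plane
            t = d_cur / (d_cur - d_nxt)          # plane - edge intersection
            out.append((lerp(cur[0], nxt[0], t),  # new vertex position
                        lerp(cur[1], nxt[1], t)))  # interpolated attributes
    # Fan-triangulate the clipped polygon (3 or 4 vertices).
    return [(out[0], out[i], out[i + 1]) for i in range(1, len(out) - 1)]


# One vertex outside the plane x >= 0: the result is a quad, so 2 triangles.
tri = (((-1, 0, 0, 1), (0.0,)), ((1, 0, 0, 1), (2.0,)), ((1, 2, 0, 1), (2.0,)))
clipped = clip_triangle(tri, (1, 0, 0, 0))
```

With one vertex outside the plane the triangle becomes two triangles, which is why clipping affects primitive assembly.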
Post Geometry Pipeline • Viewport Transformation • Delay to end of rasterization (at the conversion of fragment attributes from fixed point to floating point). • Use fixed point device coordinates [-1, 1] for rasterization. • Rasterization.
[Figure: pipeline block diagram with boxes MC, StF, StL, Shader, StOC, StC, PA, TS, TT and Int; connections are labelled 1, 2 or A*TL+L.] MC: Memory Controller • StF: Streamer Fetch • StL: Streamer Loader • Shader: Vertex Shader • StOC: Streamer Output Cache • StC: Streamer Commit • PA: Primitive Assembly • TS: Triangle Setup • TT: Triangle Traversal • Int: Interpolation
Rasterization • We can divide it into three phases: • Setup. • Calculate linear equation coefficients, start values and slopes. • Perform area and face culling. • Traversal. • Traverse the triangle generating fragments inside the triangle. • Clipping of fragments by frustum and user clip. • Interpolation. • Interpolate all fragment attributes for the generated fragment.
Triangle Setup • Use 2DH rasterization setup. • Create matrix (inverse or just adjoint matrix?) from the three vertex 2DH positions. • Calculate determinant. • Cull for sign (face culling) and zero (zero area). • Send the edge equation coefficients and/or start and slope values to Triangle Traversal. • Optional: send other equations (1/w, clip planes, interpolators …).
Triangle Setup • Adjoint rasterization matrix adj(M): • First level: 18 muls. • Second level: 9 adds. • $a_0 = y_1 w_2 - y_2 w_1$ • $a_1 = y_2 w_0 - y_0 w_2$ • $a_2 = y_0 w_1 - y_1 w_0$ • $b_0 = x_2 w_1 - x_1 w_2$ • $b_1 = x_0 w_2 - x_2 w_0$ • $b_2 = x_1 w_0 - x_0 w_1$ • $c_0 = x_1 y_2 - x_2 y_1$ • $c_1 = x_2 y_0 - x_0 y_2$ • $c_2 = x_0 y_1 - x_1 y_0$
Triangle Setup • Matrix determinant det(M): • 1 DP3: $\{w_0, w_1, w_2\} \cdot \{c_0, c_1, c_2\}$ • Inverse matrix $M^{-1}$ (not needed?): • First level: 1 reciprocal: 1/det(M). • Second level: 9 muls. • Edge equations: • Rows of $M^{-1}$ (or of adj(M), which differs only by the factor 1/det(M)). • $E_0 = [a_0, b_0, c_0]$ • $E_1 = [a_1, b_1, c_1]$ • $E_2 = [a_2, b_2, c_2]$
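The setup steps above (adjoint matrix, determinant as one DP3, culling on sign and zero) can be sketched as follows. The function name, the returned dictionary and the choice that a positive determinant means front facing are assumptions made for the example; the actual facing convention depends on the winding order the pipeline uses.

```python
def triangle_setup(v0, v1, v2):
    """2DH triangle setup from three (x, y, w) vertex positions.

    Returns None for zero-area triangles, otherwise the three edge
    equations (rows of the adjoint matrix) and the determinant.
    """
    (x0, y0, w0), (x1, y1, w1), (x2, y2, w2) = v0, v1, v2

    # Adjoint matrix: 18 multiplications, 9 subtractions.
    a0, a1, a2 = y1 * w2 - y2 * w1, y2 * w0 - y0 * w2, y0 * w1 - y1 * w0
    b0, b1, b2 = x2 * w1 - x1 * w2, x0 * w2 - x2 * w0, x1 * w0 - x0 * w1
    c0, c1, c2 = x1 * y2 - x2 * y1, x2 * y0 - x0 * y2, x0 * y1 - x1 * y0

    # Determinant as a single DP3: {w0, w1, w2} . {c0, c1, c2}.
    det = w0 * c0 + w1 * c1 + w2 * c2
    if det == 0:
        return None  # zero area: cull

    return {
        "edges": [(a0, b0, c0), (a1, b1, c1), (a2, b2, c2)],
        "det": det,
        "front_facing": det > 0,  # assumed convention, see lead-in
    }


setup = triangle_setup((0, 0, 1), (4, 0, 1), (0, 4, 1))
```

For this example triangle each edge equation evaluates to det(M) at its own vertex and to 0 at the other two, which is the property the traversal and interpolation stages rely on.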
Triangle Setup • 1/w equation: • Sum of rows (param vector {1, 1, 1}). • Can be calculated as the sum of the edge equations. • Additional equations: • param vector $\{u_0, u_1, u_2\} \times M^{-1}$: 3 DP3. • Frustum/Viewport clip (here $x_0$, $y_0$, $w$, $h$ presumably denote the viewport origin and size): • $D_0 = [1, 0, -x_0]$ • $D_1 = [-1, 0, x_0 + w]$ • $D_2 = [0, 1, -y_0]$ • $D_3 = [0, -1, y_0 + h]$
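As a rough illustration of the equations on this slide, the sketch below builds an attribute equation as a parameter vector times the matrix rows (3 DP3), obtains the 1/w equation from the parameter vector {1, 1, 1}, and constructs the four viewport clip equations. The helper names are hypothetical, the matrix used is the adjoint of the triangle (0,0,1), (4,0,1), (0,4,1) with determinant 16, and reading x0, y0, w, h as the viewport origin and size is an assumption.

```python
def vector_times_matrix(u, rows):
    """{u0, u1, u2} x M: one DP3 per output coefficient (3 DP3 in total)."""
    return tuple(sum(u[i] * rows[i][j] for i in range(3)) for j in range(3))


def evaluate(coeffs, x, y):
    """Value of the linear equation a*x + b*y + c at pixel (x, y)."""
    a, b, c = coeffs
    return a * x + b * y + c


def viewport_clip_equations(x0, y0, width, height):
    """D0..D3: each is non-negative for pixels inside the viewport."""
    return [(1, 0, -x0), (-1, 0, x0 + width), (0, 1, -y0), (0, -1, y0 + height)]


# Adjoint rows of the example triangle (0,0,1), (4,0,1), (0,4,1); det(M) = 16.
edges = [(-4, -4, 16), (4, 0, 0), (0, 4, 0)]
det = 16

inv_w_eq = vector_times_matrix((1, 1, 1), edges)    # sum of the rows
attr_eq = vector_times_matrix((10, 20, 30), edges)  # attribute 10/20/30 at v0/v1/v2
clip_eqs = viewport_clip_equations(0, 0, 640, 480)
```

Because the adjoint is used instead of the inverse, both equations carry an extra factor det(M). It cancels when an attribute is divided by the 1/w value, which is one reason the inverse may not be needed.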
[Figure: Triangle Setup datapath built from multipliers (*), adders (+) and a DP3 unit.]
Triangle Traversal • Different algorithms: • I don’t know which is better. • Scanline. • Centerline (PixelVision). • Tiled (Neon, McCormack). • Incremental and Hierarchical Hilbert Order (McCool). • Others?
Triangle Traversal • Traversal algorithm effects: • Can improve the texture access pattern (Neon, Hilbert). • Can improve framebuffer memory access (Neon). • Traversal algorithm requirements: • Must produce at least one 2x2 fragment block per cycle, or a multiple (two 2x2, three 2x2, etc.). • Must be efficient and generate as few fragments as possible outside the triangle. • Antialiasing?
Triangle Traversal • Uses the edge equation coefficients and/or the start and slope values calculated from them to walk the triangle. • One ‘step’ per cycle. • Fixed point arithmetic: integer addition. • Requires saving state (2 to 3 saved states) or walking back (which spends cycles). • Tests the sign of the edge equation values at n positions per cycle. • May test frustum and znear/zfar clip at the same time.
Triangle Traversal • Hardware requirements: • Multiple fixed point adders. • Multiple sign testers. • Registers for current (at least 3 for each edge equation) and saved states. • Registers for edge slopes/increments (as many as fragments generated per cycle and edge equations?).
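The idea of walking the triangle with integer additions and sign tests can be sketched with a brute-force bounding-box scan. This is deliberately not one of the algorithms listed earlier (scanline, centerline, tiled, Hilbert); it only shows the incremental arithmetic and the saved state. The function name, the integer pixel grid, the "all values >= 0 means inside" rule (which assumes a positive determinant and ignores fill rules for shared edges) and the single fragment per step are assumptions.

```python
def traverse(edges, x_min, y_min, x_max, y_max):
    """Return the pixels of a bounding box that pass all edge sign tests.

    edges: list of (a, b, c) integer edge equation coefficients.
    After the start values are computed, only additions are used.
    """
    fragments = []
    # Start values of every edge equation at the first pixel.
    row_start = [a * x_min + b * y_min + c for a, b, c in edges]
    for y in range(y_min, y_max + 1):
        values = list(row_start)              # saved state: start of this row
        for x in range(x_min, x_max + 1):
            if all(v >= 0 for v in values):   # sign test on each edge equation
                fragments.append((x, y))
            # Step to x + 1: add the x slope of each edge equation.
            values = [v + a for v, (a, _, _) in zip(values, edges)]
        # Step to y + 1 from the saved row start: add the y slope.
        row_start = [v + b for v, (_, b, _) in zip(row_start, edges)]
    return fragments


# Edge equations of the triangle (0,0,1), (4,0,1), (0,4,1).
edges = [(-4, -4, 16), (4, 0, 0), (0, 4, 0)]
frags = traverse(edges, 0, 0, 4, 4)
```

A hardware traversal would test several positions per step (for example a 2x2 block) and skip most of the empty area, but the per-step work stays the same: one addition and one sign test per edge equation and position.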
[Figure: Traversal Algorithm datapath with adders (+) and a TEST stage.]
Interpolation • Using barycentric method: • Use the edge equation result (McCool): • $F_0(x,y) = E_0$ • $F_1(x,y) = E_1$ • $F_2(x,y) = E_2$ • Calculate sum of edge equations at the fragment: • $R'(x,y) = F_0(x,y) + F_1(x,y) + F_2(x,y)$ • Calculate reciprocal: • $r = 1/R'(x,y)$ • Interpolate attribute at the fragment: • $p_k(x,y) = p_{k0}\,r\,F_0(x,y) + p_{k1}\,r\,F_1(x,y) + p_{k2}\,r\,F_2(x,y)$
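The barycentric formulas above translate almost directly into code. The sketch below is a minimal floating-point version under assumed names; the edge equations are those of the triangle (0,0,1), (4,0,1), (0,4,1), and the attribute values 10, 20, 30 are made up for the example.

```python
def interpolate_barycentric(edges, params, x, y):
    """Interpolate one attribute at fragment (x, y).

    edges: three (a, b, c) edge equations; params: (pk0, pk1, pk2).
    Only valid for fragments covered by the triangle, where the sum
    of the edge equations is non-zero.
    """
    f = [a * x + b * y + c for a, b, c in edges]  # F0, F1, F2 at the fragment
    r = 1.0 / sum(f)                              # r = 1 / R'(x, y)
    return sum(p * r * fi for p, fi in zip(params, f))


edges = [(-4, -4, 16), (4, 0, 0), (0, 4, 0)]
value = interpolate_barycentric(edges, (10, 20, 30), 1, 1)
```

The sum and the reciprocal are shared by all attributes of a fragment, so each additional attribute only costs the final three-term sum.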
Interpolation • Alternative (Olano & Greer): • At setup: • Use 2DH method and calculate coefficients for all the attributes. • Calculate 1/w (sum of rows) coefficients. • Requires a vector matrix mul per attribute. • At traverse/interpolation: • Interpolate 1/w and attributes using fixed point incremental arithmetic. • Calculate reciprocal of 1/w. • Multiply the interpolated attribute by the reciprocal of 1/w.
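A minimal sketch of this alternative, assuming hypothetical helper names and using floating point instead of the fixed point arithmetic mentioned on the slide: the setup part computes one linear equation per attribute plus the 1/w equation, and the per-fragment part steps both along a row with additions, then takes one reciprocal and one multiplication.

```python
def setup_equation(edges, u):
    """Vector x matrix: coefficients (a, b, c) of the equation for vector u."""
    return tuple(sum(u[i] * edges[i][j] for i in range(3)) for j in range(3))


def interpolate_row(edges, params, x_start, y, count):
    """Interpolate one attribute for `count` fragments along row y.

    Only valid for fragments covered by the triangle (1/w non-zero).
    """
    inv_w_eq = setup_equation(edges, (1, 1, 1))  # 1/w: sum of the rows
    attr_eq = setup_equation(edges, params)      # attribute over w

    # Start values at the first fragment of the row.
    inv_w = inv_w_eq[0] * x_start + inv_w_eq[1] * y + inv_w_eq[2]
    attr = attr_eq[0] * x_start + attr_eq[1] * y + attr_eq[2]

    values = []
    for _ in range(count):
        w = 1.0 / inv_w           # reciprocal of the interpolated 1/w
        values.append(attr * w)   # one multiplication per attribute
        inv_w += inv_w_eq[0]      # incremental step in x: one addition each
        attr += attr_eq[0]
    return values


edges = [(-4, -4, 16), (4, 0, 0), (0, 4, 0)]
row = interpolate_row(edges, (10, 20, 30), 0, 0, 3)
```

Compared with the barycentric version, more work moves to setup and more state (current value and slope per attribute) must be stored, while the per-fragment cost per attribute drops to one addition and one multiplication.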
Interpolation • Barycentric coordinates (McCool): • no cost at setup. • store the parameter values at the three triangle edges. • fixed: 1 addition, 1 reciprocal and 3 muls • per parameter: 1 DP3. • Interpolation using Olano & Greer: • vector matrix mul at setup per parameter and 1/w: 3 DP3. • store current state and slope increment for all the parameters and 1/w. • fixed: 1 addition, 1 reciprocal • per parameter: 1 addition, 1 mul.
Interpolation • How many attributes/parameters can be interpolated per cycle? • XBOX: • 5 interpolators? • general interpolator: color diffuse + color specular (shared). • Texture interpolators: 4? • Note: each of those interpolators is for a 4D vector.
[Figure: interpolator datapath from VERTEX ATTRIBUTES to FRAGMENT ATTRIBUTES, built from multipliers (*), adders (+) and a reciprocal (1/x) unit.]
Current status • Implemented Primitive Assembly box (with trivial degenerate triangle rejection). • Added GPU_VERTEX_OUTPUT_ATTRIBUTE register. • Boolean vector of MAX_VERTEX_ATTRIBUTES entries that records whether a vertex output register is written in the shader (and therefore must be transmitted). • Now the transmission latency for a vertex between the Shader and Streamer Commit, and between Streamer Commit and Primitive Assembly, is determined by the number of output attributes.
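One possible reading of this latency rule is sketched below. The boolean vector and MAX_VERTEX_ATTRIBUTES come from the slide, but everything else is an assumption: the value 16, the function name, the two cycle parameters, and the formula itself, which is modelled on the A*TL+L label in the pipeline diagram (taken here as attributes times per-attribute transfer latency plus a base latency).

```python
MAX_VERTEX_ATTRIBUTES = 16  # hypothetical value, not taken from the slides


def transmission_latency(output_written, cycles_per_attribute=1, base_latency=1):
    """Cycles to transmit one vertex, assuming a latency of A*TL + L.

    output_written: boolean vector with one entry per vertex output
    register, True when the shader writes that register.
    """
    attributes = sum(1 for written in output_written if written)  # A
    return attributes * cycles_per_attribute + base_latency       # A*TL + L


# A shader that writes only three output registers (e.g. position, colour,
# one texture coordinate) transmits three attributes per vertex.
mask = [False] * MAX_VERTEX_ATTRIBUTES
mask[0] = mask[1] = mask[4] = True
latency = transmission_latency(mask)
```

Under this reading, a shader that writes fewer outputs gets a proportionally shorter transfer between boxes.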
Current Status • Started Triangle Setup box and support classes.
Current Status • Comments: • Should the Streamer Loader to Shader transmission also have a transmission latency penalty? • Where are the vertex output attributes stored? • How many times must we pay the vertex transmission penalty?
Current Status • Signal Analyzer: • Already works with large traces.
References • Triangle Scan Conversion using 2D Homogeneous Coordinates, Marc Olano, Trey Greer. • Tiled Polygon Traversal Using Half-Plane Edge Functions, Joel McCormack, Robert McNamara. • Incremental and Hierarchical Hilbert Order Edge Equation Polygon Rasterization, Michael D. McCool, Chris Wales, Kevin Moule.
References • A Parallel Algorithm for Polygon Rasterization, Juan Pineda.