GPU Acceleration in Registration

GPU Acceleration in Registration Danny Ruijters 26 April 2007

Outline • The GPU • Rigid 3D-3D Registration • Elastic Registration • Conclusions

The GPU

The graphics card • Raserization of primitives • Texture mapping • Colour interpolation

The GPU • Graphics Processing Unit • Programmable processor in the graphics rendering pipeline • Parallel execution (SIMD like)

video memory on-chip cache memory vertex shading (T&L) pre-TnL cache geometry system memory commands post-TnL cache CPU triangle setup rasterization texture cache fragment shading and raster operations textures frame buffer Graphics rendering pipeline

video memory on-chip cache memory transfer limited vertex shading (T&L) pre-TnL cache transform limited geometry system memory commands post-TnL cache setup limited CPU triangle setup texture limited raster limited rasterization CPU limited fragment shader limited fragment shading and raster operations texture cache textures frame buffer frame buffer limited Bottlenecks

128 processing units Local cache Shared memory

Performance

Performance • Parallelism & pipelining (up to 16 parallel pipelines) • Vector processor • Moore’s Law: CPU: 2* performance per 18 months • GPU: 2* performance per 6 months

Textures & buffers • 1D, 2D, 3D textures • 2D output buffers (frame buffer, accumulation buffer, stencil buffer, p-buffer) • 8, 10, 12, 16 bit integers, 16, 32 bit floating point • 1 (intensity), 2 (luminance-alpha), 3 (RGB), 4 (RGBA) components per pixel

Historic overview GPU • RenderMan (1988, pre-history) • Intel MMX (SIMD, 1997, pre-history) • Register combiners (nVidia, 1999, bronze age) • Vender specific APIs (2001, iron age) • Generic assembly-like language (2002, middle-ages) • Different high-level languages (2003, industrial age) • CUDA: general purpose C-like language (2007, modern age)

Register combiners (1999, bronse age) // Stage 0 // spare0.rgb = gradient dot ViewDir, spare1.rgb = -(gradient dot ViewDir) glCombinerInputNV(GL_COMBINER0_NV,GL_RGB,GL_VARIABLE_A_NV,GL_TEXTURE0_ARB,GL_EXPAND_NORMAL_NV,GL_RGB); glCombinerInputNV(GL_COMBINER0_NV,GL_RGB,GL_VARIABLE_B_NV,GL_CONSTANT_COLOR1_NV,GL_EXPAND_NORMAL_NV,GL_RGB); glCombinerInputNV(GL_COMBINER0_NV,GL_RGB,GL_VARIABLE_C_NV,GL_TEXTURE0_ARB,GL_EXPAND_NEGATE_NV,GL_RGB); glCombinerInputNV(GL_COMBINER0_NV,GL_RGB,GL_VARIABLE_D_NV,GL_CONSTANT_COLOR1_NV,GL_EXPAND_NORMAL_NV,GL_RGB); glCombinerOutputNV(GL_COMBINER0_NV,GL_RGB,GL_SPARE0_NV,GL_SPARE1_NV,GL_DISCARD_NV,GL_NONE,GL_NONE,GL_TRUE,GL_TRUE,GL_FALSE);

GL_ARB_fragment_program (2002) !!ARBfp1.0 ATTRIB coord = fragment.texcoord[0]; ATTRIB color = fragment.color; OUTPUT out = result.color; TEMP texel; TEMP lookup; TEX texel, coord, texture[0], 3D; TEX lookup, texel, texture[1], 1D; MUL out, lookup, color; END

GLSlang (2003) uniform vec3 ViewDir; void main (void) { floatvalue; vec3 gradient; gradient = texture3(0, gl_TexCoord0) * 2.0 - 1.0; value = 1.0 - abs(dot(gradient, ViewDir)); value *= 1.3 * dot(gradient, gradient); value = clamp(value, 0.0, 1.0); gl_FragColor = vec4(value); }

CUDA (2007) • Compute Unified Device Architecture • General purpose C-like language • nVidia only • Very recently released

Rigid 3D-3D Registration

3DRA – MR registration

3DRA – XperCT Registration 1 Pre-operative

3DRA – XperCT Registration 2 Post-operative: verification of the embolization

3DRA Slice

Mutual information F. Maes et al., "Multimodality Image Registration by Maximization of Mutual Information,“ IEEE Transactions on Medical Imaging 16(2), pp. 187-198, April 1997

Joint histogram

Resampling Joint histogram: increment(g,g)

3DRA – MR, before, after

3DRA – MR: CPU interpolation

3DRA – MR: GPU interpolation

Elastic Registration

Elastic deformation • Parameterized deformation: • B-spline deformation:

Cubic B-spline

GPU linear interpolation • Hardwired: linear interpolation is much faster than separate lookups

= GPU Cubic Interpolation • Compose cubic interpolation from weighted sum of linear interpolations: C. Sigg, M. Hadwiger, “Fast Third-Order Texture Filtering”, GPU Gems 2

= Outline of proof

GPU Cubic Interpolation • 2D: 4 linear-interpolated lookups, instead of 16 direct lookups • 3D: 8 linear-interpolated lookups, instead of 64 direct lookups

GPU Linear Interpolation Accuracy

Linear deformation, linear interpolation

Linear deformation, cubic interpolation

Cubic deformation, linear interpolation

Cubic deformation, cubic interpolation

Optimization • Many parameters: huge parameter space • Solution: use derivatives like Jacobian, Hessian • Examples: Gradient Descent, Quasi-Newton, Levenberg-Marquardt

GPU Elastic Registration Iteration • Generate deformed image on GPU & store to texture • Calculate Similarity Measure & First-Order Derivative on GPU • Texture with reference image • Texture with deformed image

First-Order Derivative of Sim. Measure J. Kybic, M. Unser, “Fast Parametric Elastic Image Registration”

Derivative of the Similarity Measure SSD:

Derivative of the Deformed Image • Sobel operator to calculate gradients:

Derivative of the Control Points • Constant • B-spline: separatable kernel of fixed size

Original Fluoroscopy Sequence

2 * 2 Control Points

8 * 8 Control Points

Deformation Field

GPU Elastic Registration • 40 images: Quasi Newton: 16 seconds • Gradient Descent: 63 seconds • 8 * 8 Control Points: rest motion • Multi-resolution deformation field, with reduced parameters (discussed with Dirk Loeckx)

GPU Acceleration in Registration

GPU Acceleration in Registration

Presentation Transcript

GPU

KD-Tree Acceleration Structures for a GPU Raytracer

GPU Acceleration of Finite Element Computations

GPU

Acceleration Techniques for GPU-based Volume Rendering

GPU Acceleration of SVG November 2011

Parallelizing and Optimizing Programs for GPU Acceleration using CUDA

GPU Acceleration in ITK v4

Adventures in Acceleration

GPU acceleration in Matlab

GPU

Performance Measurement of Applications with GPU Acceleration using CUDA

GPU in HPC

Performance study of multi-GPU acceleration of LU Factorization

Harvesting the Opportunity of GPU-based Acceleration Matei Ripeanu

GPU

GPU Acceleration in ITK v4

GPU Manufacturers -JETALL GPU

Acceleration Techniques for GPU-based Volume Rendering

GPU