
Vertex shader performance. Hardware vs software vertex assembly

Mark R

Diamond Member
I've been fiddling with a 3D engine for visualization of complex data. The core of the engine requires finding the intersection of a clipping plane with a series of parallelepipeds.

The problem has been that performing the clipping and finding the vertices in the cut surface is quite involved - and I'd originally done it in software (because it required quite a lot of pain, including having to sort the transformed vertices into clockwise order).

I did eventually modify it to precalculate the awkward bits (vertex orders, etc.) and then just use the hints for each polygon. I was doing this in software, which meant rebuilding the vertex buffer 100-odd times per frame, i.e. once per object. As the bulk of this final stage was just a few dot-products, I thought I'd just try porting it to a vertex shader.
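For readers who haven't met this kind of clipping before, here is a minimal Python sketch of the general technique, not the engine's actual code. It assumes a plane given as dot(normal, p) = d and a parallelepiped given as an origin plus three edge vectors; the function name and parameters are made up for illustration. The signed distances are the "few dot-products", and the sort at the end is the clockwise ordering step. It ignores the degenerate case where the plane passes exactly through a corner.

```python
import math

def add(a, b): return tuple(x + y for x, y in zip(a, b))
def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def slice_parallelepiped(origin, e0, e1, e2, normal, d):
    """Return the cut polygon (3 to 6 vertices) or [] if the plane misses."""
    # Build the 8 corners; bit k of the index says whether edge k is added.
    corners = []
    for i in range(8):
        p = tuple(origin)
        for bit, edge in ((1, e0), (2, e1), (4, e2)):
            if i & bit:
                p = add(p, edge)
        corners.append(p)

    # Signed distance of each corner from the plane: one dot product each.
    dist = [dot(normal, c) - d for c in corners]

    # The 12 edges join corners whose indices differ in exactly one bit.
    points = []
    for i in range(8):
        for bit in (1, 2, 4):
            if i & bit:
                continue
            j = i | bit
            if dist[i] * dist[j] < 0:  # endpoints strictly on opposite sides
                t = dist[i] / (dist[i] - dist[j])
                points.append(add(corners[i], scale(sub(corners[j], corners[i]), t)))
    if len(points) < 3:
        return []

    # Sort by angle around the centroid, measured in the plane.
    centroid = scale(tuple(map(sum, zip(*points))), 1.0 / len(points))
    u = sub(points[0], centroid)
    v = cross(normal, u)

    def angle(p):
        r = sub(p, centroid)
        return math.atan2(dot(r, v), dot(r, u))

    # Descending angle gives clockwise order as seen from the side the
    # normal points toward.
    return sorted(points, key=angle, reverse=True)

# Usage: cut a unit cube diagonally through its centre.
hexagon = slice_parallelepiped((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1),
                               (1, 1, 1), 1.5)
```

Which edges are crossed and in what order depends only on the sign pattern of the eight distances, which is presumably the sort of thing that can be precalculated as a hint.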

I built the shader and set it off. I got quite a surprise when the vertex-shader-enabled engine ran 10x slower than the pure software version. Realising that the problem was the DirectX debug runtimes, I reinstalled the normal runtimes. The engine performance jumped hugely, but the vertex setup time was still 2x-3x slower than the pure software version.

So, why is my vertex processing so slow? Is it because I only draw one polygon at a time (max 6 vertices)? Each object will be intersected exactly once, so will yield a convex hull with up to 6 vertices, and each object uses a different texture, so multiple objects can't be batched together.

Any hints/tips for making it faster?

Cliffs:
1. 3D engine uses complex algorithm to determine visibility
2. Rendering converted to a two-step process: a) painful software algorithm + b) simple final setup
3. Hardware-accelerated (vertex shader) version of the simple final setup is 2x-3x slower than the software version on an 8800GTX
 
Originally posted by: Mark R
So, why is my vertex processing so slow? Is it because I only draw one polygon at a time (max 6 vertices)? Each object will be intersected exactly once, so will yield a convex hull with up to 6 vertices, and each object uses a different texture, so multiple objects can't be batched together.

Any hints/tips for making it faster?

Bingo. You need to drive GPUs with massive datasets to make them efficient. Otherwise, the computation is dominated by transfer time and setup time on the GPU.
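To put rough shape on that, here is a toy cost model in Python. It is only a sketch: the per-draw overhead and per-vertex cost below are invented placeholder numbers, not measurements from an 8800GTX or any other hardware, and real drivers don't behave this linearly. The point is just how a fixed per-call cost swamps the useful work when each call carries 6 vertices.

```python
def frame_time_us(num_draws, verts_per_draw, per_draw_overhead_us, per_vertex_us):
    """Total time for a frame: every draw call pays a fixed overhead
    plus a cost proportional to the vertices it submits."""
    return num_draws * (per_draw_overhead_us + verts_per_draw * per_vertex_us)

def overhead_fraction(verts_per_draw, per_draw_overhead_us, per_vertex_us):
    """Share of each draw call spent on fixed overhead rather than vertices."""
    work = verts_per_draw * per_vertex_us
    return per_draw_overhead_us / (per_draw_overhead_us + work)

# Hypothetical costs, chosen only for illustration.
OVERHEAD_US = 30.0    # assumed fixed cost per draw call
PER_VERTEX_US = 0.01  # assumed cost per vertex

# 100 objects drawn one at a time, 6 vertices each (the situation in the post).
small_draws = frame_time_us(100, 6, OVERHEAD_US, PER_VERTEX_US)

# The same 600 vertices submitted in a single call, if batching were possible.
one_batch = frame_time_us(1, 600, OVERHEAD_US, PER_VERTEX_US)

print(f"100 small draws: {small_draws:.1f} us, one batch: {one_batch:.1f} us")
print(f"overhead share at 6 verts/draw: {overhead_fraction(6, OVERHEAD_US, PER_VERTEX_US):.3f}")
```

Under these assumed numbers nearly all of the time goes to per-call overhead, so the shader itself could be free and the frame would barely get faster.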

Review the following:

* Ryoo et al., "Optimization Principles and Application Performance Evaluation of a Multithreaded GPU Using CUDA". Mostly about GPGPU, but you will get an idea of the scale required and the difficulty of performance-tuning GPU code.



 
Yeah, GPUs shine when you can load data once and then do -a lot- of computation on it. For example, I was doing some finite element stuff, part of which required the calculation of a ~100x100 Jacobian matrix. And this happens for a few thousand elements across a few million time steps. But the matrix doesn't change over the time steps, so on a CPU it's faster to pre-calculate it. On a GPU (9800 series), it's actually faster to recalculate the thing every time you need it, rather than calculate it once and load it from memory afterward.

It's really a very different world.
 