Tessellation review by xbitlabs

Lonyo

Lifer
Aug 10, 2002
21,938
6
81
What it basically says is that most current real world implementations of tessellation are more than sufficiently dealt with by existing tessellation designs from both NV and ATI, but going forward, both NV and ATI have insufficient methods of dealing with tessellation, so they will both need to up their game (although ATI a little more than NV since their performance drop is a bit bigger).

Not a huge amount of news, and it would be nicer if they had done some lower-end tests to see how the current mid-range (from ATI) deals with it.
The (lack of) difference in scaling between the NV cards is quite interesting, since the GF104 should in theory have a decent amount less power but the percentage decrease in higher workloads isn't that much greater.

The fact that the drop as you add tessellation in all the benches where there is a noticeable drop is similar between the GTX480 and GTX460 suggests again that the tessellator itself isn't a limiting factor.
The GTX460 has 7 polymorph engines at 675 MHz (or 1350 MHz, doesn't matter), while the GTX480 has 15 at 700 MHz, so it should have only ~45% of the tessellation capability of the GF100/GTX480 (polymorph engines x clockspeed). Yet the performance drop when tessellation is used is roughly similar, and nowhere near the drop in theoretical tessellation performance. So the overall card is more the limiting factor when you increase tessellation levels beyond "normal" (i.e. the games where there's pretty much no hit, e.g. Dirt 2, AvP) and into the more extreme levels of Metro 2033, Stone Giant and Unigine.
So the ATI tessellator and the reduced tessellation performance of the GF104 vs GF100 don't seem to be the limiting factors. IMO.
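
For anyone who wants to check the ~45% figure, here is a rough sketch of the arithmetic. It assumes theoretical tessellation throughput scales linearly with engine count times clock, which is a simplification; the function name is made up for illustration, and the engine counts and clocks are the ones quoted above.

```python
# Rough theoretical tessellation throughput, modelled as
# (number of polymorph engines) x (core clock in MHz).
# The linear "engines x clock" model is an assumption, not a measured result.

def relative_throughput(engines: int, clock_mhz: float) -> float:
    """Return a unitless throughput score for one card."""
    return engines * clock_mhz

gtx460 = relative_throughput(7, 675)    # GF104 figures quoted in the post
gtx480 = relative_throughput(15, 700)   # GF100 figures quoted in the post

ratio = gtx460 / gtx480
print(f"GTX460 / GTX480 = {ratio:.2f}")  # prints "GTX460 / GTX480 = 0.45"
```

Using the shader clock (1350 vs 1400) gives the same ratio, which is why the choice of clock doesn't matter here.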
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
I found it interesting that even in this review the 5830 matches the 460 in Metro 2033.
http://benchmarkreviews.com/index.p...k=view&id=576&Itemid=72&limit=1&limitstart=12

It looks like a stock GTX460 can't compete with 5850 or GTX470 in Metro 2033:

http://www.hardwarecanucks.com/foru...force-gtx-460-cyclone-768mb-oc-review-12.html

We need to see more games with DX11 effects to see if GTX480 can keep up its serious performance advantage over the 5870 as was the case in STALKER and Metro 2033 (without Tessellation at 2560x1600).
 

happy medium

Lifer
Jun 8, 2003
14,387
480
126

Skurge

Diamond Member
Aug 17, 2009
5,195
1
71
Maybe they trimmed out the GTX460 a little too much. That could be why the GTX465 is faster.
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
I find it more interesting that the gtx 465 looks much better?
Bear in mind that the 465 has more Polymorph Engines since it has more SMs. If you are tessellation limited then the 465 should be a good 10% faster at stock clocks.
 

Borealis7

Platinum Member
Oct 19, 2006
2,901
205
106
Taking into consideration the rumors about SI (not really a new architecture), I don't think tessellation performance will increase much in ATI cards in the next generation.

Also, it would be nice to see more games with a choice of tessellation levels like "low", "medium" and "high", so you could enjoy some tessing without completely killing your FPS.
 

Scali

Banned
Dec 3, 2004
2,495
0
0
This is an area where I would really like to see some synthetic tests.
E.g., making the shading as simple as possible, no textures etc., so you really measure the geometry workload, not the pixel processing workload.

Another test that would be interesting is to compare on-the-fly tessellated geometry with the same geometry that was preprocessed to have the same detail. Currently we only compare static lowpoly with dynamic highpoly. But I'd like to see dynamic highpoly vs static highpoly, to get a better idea of how much is saved by generating the geometry on-the-fly, rather than loading it from memory.
It would also give some sort of indication about whether the polycount itself may run into bottlenecks in other parts of the pipeline.
 

dug777

Lifer
Oct 13, 2004
24,778
4
0
I find it interesting that in Metro2033 the Fermi cards look like they are taking a bigger hit percentage-wise than the Cypress cards when you look at the minimum fps, at least at the top two resolutions (the GTX 480 does better at the first res tested, I think).

I thought nvidia's way of doing it was supposed to be more efficient?
 

Scali

Banned
Dec 3, 2004
2,495
0
0
I thought nvidia's way of doing it was supposed to be more efficient?

Depends on which benchmark you take.
Obviously with programmable hardware, it all depends on how the functionality is implemented by the developer.
Stone Giant and Heaven show exactly what nVidia had in mind: There is a pretty linear decrease in performance going from low to high tessellation settings.
The Radeons seem to have 'reached their fill' at a certain point and drop off almost exponentially from there on.

Heaven relies almost completely on tessellation, where turning tessellation off results in 'wrong' geometry, because too much detail is missing.

But that's not a realistic scenario for games. If you disable tessellation in games, you still want things to look correct. I think this results in the 'flaw' that games with tessellation are based on the non-tessellated geometry, rather than simplified geometry (which is how tessellation should really be used... reduce detail).
 

dug777

Lifer
Oct 13, 2004
24,778
4
0
Agreed that the synthetics don't seem to relate well to real world impacts. As you mention, they show the linear decrease for Fermi, and that isn't replicated in Metro 2033, the only practical real world example tested there where turning tessellation on actually changes the gameplay experience in terms of FPS.

It's interesting to see that the GTX 480 minimum gets smashed at even 19x10 there. Would love to know what is going on there. It certainly obliterates the GTX 460 on paper in sheer tessellation horsepower and in every other way that I am aware of, but takes a bigger % hit when you turn it on at that res and 25x16 (where the GTX 480's avg fps also tumbles).

Could this be a memory bandwidth issue? From the depths of my memory, isn't memory bandwidth the only area where the GTX 480 didn't take a huge stride forward over the GTX 285? I suppose that doesn't explain the % drop relative to the GTX 460, although it may explain the GTX 480's 25x16 performance hit.

It will be interesting, when we get some further tessellation-heavy games, to see if that Metro 2033 experience is replicated, or if they scale more as per the synthetics.
 

Sylvanas

Diamond Member
Jan 20, 2004
3,752
0
0
I find it interesting that in Metro2033 the Fermi cards look like they are taking a bigger hit percentage-wise than the Cypress cards when you look at the minimum fps, at least at the top two resolutions (the GTX 480 does better at the first res tested, I think).

I thought nvidia's way of doing it was supposed to be more efficient?

Indeed, at 1920x1200, the 480 takes a 45% hit in minimums while the 5870 takes a 25% hit. The first thing that comes to mind is how they determine a 'minimum'. If it is one frame at one instant during a 1-minute Fraps run, then that is not an accurate measurement to go by (due to all sorts of irregularities in the application and/or background processes). If it occurs more than once through the whole run, or lasts for a 'relatively' long time in one particular scene, then that is what should be called a minimum.
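
To make the distinction concrete, here is a minimal sketch of the two ways a 'minimum' could be computed from a frame-time log. I don't know how the review actually measured it; the function names and the 30-frame window are my own assumptions for illustration.

```python
from typing import List

def single_frame_minimum(frame_times_ms: List[float]) -> float:
    """Minimum fps taken from the single slowest frame in the run."""
    return 1000.0 / max(frame_times_ms)

def sustained_minimum(frame_times_ms: List[float], window: int = 30) -> float:
    """Lowest average fps over any run of `window` consecutive frames.

    The window length is an arbitrary choice (assumption); a longer window
    ignores short hiccups more aggressively.
    """
    if len(frame_times_ms) < window:
        raise ValueError("need at least `window` frames")
    worst_fps = float("inf")
    for start in range(len(frame_times_ms) - window + 1):
        total_ms = sum(frame_times_ms[start:start + window])
        worst_fps = min(worst_fps, window * 1000.0 / total_ms)
    return worst_fps

# Made-up log: a steady 50 fps (20 ms per frame) with one 100 ms hiccup.
frames = [20.0] * 600
frames[300] = 100.0

print(single_frame_minimum(frames))          # 10.0
print(round(sustained_minimum(frames), 1))   # 44.1
```

With a single hiccup the one-frame minimum reports 10 fps, while the sustained figure barely moves from 50 fps, which is why the method matters when comparing those percentage hits.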

With that out of the way, it baffles me. If we assume the minimums occur when high load is placed on the GPUs, and in this case a 'high load' of geometry, Fermi should win out in these circumstances every time; it's the nature of their architecture. Fermi's Polymorph engines are load balancing and have many more resources available than the 5870's fixed pipeline. If geometry load jumped to 100% at one instant, then Fermi should allocate that to each Polymorph engine independently, and should that not be enough, then it waits until resources are available, just like what happens with Cypress. The difference is that with Cypress the 'queue' for available resources in ATI's geometry block would naturally take longer to clear than that of Fermi's 15 resources given the same data.

If I were to guess, I'd say it has to do with drivers/application: either bad optimisation by Nvidia or good optimisation from AMD. I'd say it's a mix of both (Nvidia has noted improvements in Metro2033 in recent drivers; to my knowledge ATI has not, which goes to show perhaps Nvidia has more to give here). Nonetheless, it's an impressive showing by ATI. Given that the 25% hit is incurred solely by increased geometry load and has nothing to do with shaders (and thus nothing to do with Vec5 or the compiler), it goes to show that the same data is dispatched quicker on Cypress than Fermi.
 

Scali

Banned
Dec 3, 2004
2,495
0
0
It should scale more like the synthetic benchmarks because that is the correct way of applying tessellation.
You feed a low poly mesh, which is your 'control cage', and the parameters of those control points generate the detailed geometry.
Pixar does it the same way, going all the way down to subpixel-level with their tessellation. The actual control meshes are NURBS patches with a pretty low number of points, really.

What we see with current games is that they are basically DX9-class games with tessellation added on top, to make already high-poly geometry even more high-poly. Much like what TruForm used to do (and with the same bugs, as I see in the Metro 2033 screenshot: the weapon looks a bit 'inflated' in some points... other than that, you barely see what the tessellation is doing anyway).
Then you have a relatively low amplification factor, and that is suited quite well to the naive approach of the Radeons (eg each triangle results in 4 new triangles being generated).
If you turn up the amplification factor (Pixar-style... ~2000 triangle mesh turns into millions of triangles... Heaven comes pretty close to this scenario), then you need something considerably more powerful. I think nVidia's idea is okay, it's just not applied to the scale that we want it to be yet.
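
As a back-of-the-envelope sketch of what those amplification factors mean: assume each refinement level splits every triangle into 4 (the 1-to-4 case mentioned above). Real tessellators take per-patch tessellation factors rather than recursive levels, so this is only arithmetic, and the function names are made up.

```python
def triangles_after_subdivision(base_triangles: int, levels: int, factor: int = 4) -> int:
    """Triangle count after `levels` rounds, each multiplying the count by `factor`."""
    return base_triangles * factor ** levels

def levels_needed(base_triangles: int, target_triangles: int, factor: int = 4) -> int:
    """Smallest number of rounds needed to reach at least `target_triangles`."""
    levels = 0
    count = base_triangles
    while count < target_triangles:
        count *= factor
        levels += 1
    return levels

# One round on a ~2000 triangle control mesh: a low amplification factor.
print(triangles_after_subdivision(2000, 1))   # 8000

# Getting the same mesh into the millions takes five rounds.
print(levels_needed(2000, 1_000_000))         # 5
print(triangles_after_subdivision(2000, 5))   # 2048000
```

So the 'Pixar-style' case is roughly a 1000x amplification of the source mesh, against 4x for the simple case.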

But what games are doing currently with tessellation is just nonsense. It's really just a checkbox feature. They run up the polycount to insane levels, without really making the game look better, or getting better performance.

If you want to use tessellation properly, you need to adjust your entire geometry to this. Unigine Heaven does this. Sadly it cannot be run with high-poly static geometry, so we cannot do an interesting comparison with ~equal polycount with tessellation on and off.
 

BFG10K

Lifer
Aug 14, 2000
22,709
3,003
126
I have tessellation disabled in all three games I play that use the feature. In CoP and AvP 3 the difference in IQ is basically non-existent outside of still screenshots, while Metro2033 has a massive performance hit at the resolution I use (2560x1600).
 

thilanliyan

Lifer
Jun 21, 2005
12,060
2,273
126
What we see with current games is that they are basically DX9-class games with tessellation added on top, to make already high-poly geometry even more high-poly. Much like what TruForm used to do (and with the same bugs, as I see in the Metro 2033 screenshot: the weapon looks a bit 'inflated' in some points... other than that, you barely see what the tessellation is doing anyway).

Do you think games will eventually reach the level of what Unigine is doing? When do you think that would happen?
 

evolucion8

Platinum Member
Jun 17, 2005
2,867
3
81
Unigine uses tessellation in a very inefficient way. I'm pretty confident that tessellation done properly will look as good as Unigine and will run faster than it does currently.

http://www.beyond3d.com/content/reviews/54/9

That's a nice article that gives a glimpse of what's going on in Unigine and why it's so hard on Cypress.
 

Scali

Banned
Dec 3, 2004
2,495
0
0
Do you think games will eventually reach the level of what Unigine is doing? When do you think that would happen?

Well, Unigine is a game engine, any game could license it and make use of it right now.
The biggest problem is that you need to design your content with tessellation in mind.
I have no idea when game developers will make that decision though.
Perhaps not until tessellation hardware is commonplace... Although technically that would not be a requirement.
Namely, once you've prepared your tessellation content, you can also pre-tessellate it to generate more detailed static geometry for non-tessellation hardware.
It's more difficult (although not entirely impossible) to take detailed geometry and reduce the detail to replace it with a tessellated solution.
 

Scali

Banned
Dec 3, 2004
2,495
0
0
Unigine uses tessellation in a very inefficient way. I'm pretty confident that tessellation done properly will look as good as Unigine and will run faster than it does currently.

http://www.beyond3d.com/content/reviews/54/9

That's a nice article that gives a glimpse of what's going on in Unigine and why it's so hard on Cypress.

Typical of Beyond3D.
"This is a consequence of having numerous small triangles in the scene (just look at something like the dragon's leg or the roof), and is one of the cases where upping setup rate beyond 1 triangle/clock could have helped (we're pretty sure the rasteriser itself isn't the one causing the stalls, given pixel/triangle ratios)."

Well yea, the idea of tessellation is to generate detail... the whole point is to generate numerous small triangles, so that the result doesn't look like triangles anymore, but like perfectly curved surfaces.

The problem with Cypress is that it cannot do tessellation for large amplification factors, as I've already mentioned before.
Yet large amplification factors are pretty much the ONLY scenario where tessellation makes sense. Which is why you want a more powerful parallelized tessellation and triangle setup engine. nVidia didn't put that thing in Fermi just for kicks.
By having low poly source meshes, you also reduce the workload for things like vertex skinning a lot (although the Heaven benchmark doesn't really make use of that).
 

Janooo

Golden Member
Aug 22, 2005
1,067
13
81
Well, why did you not continue with the quote?
"One thing worth noting is that between 60 and 80(!)% of these primitives get culled, which doesn't strike us as terribly efficient: you've just hammered the GPU with some heavy tessellation, generated a sea of triangles out of which a huge portion won't be used since they're back facing, or 0 area for example."
... in other words useless tessellation.
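
To illustrate what 'back facing, or 0 area' means in practice, here is a small sketch that classifies screen-space triangles by signed area and reports the culled fraction. It assumes counter-clockwise winding means front facing (with y pointing up) and uses a made-up triangle list; it does not model how Cypress or Fermi actually cull.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]

def signed_area(a: Point, b: Point, c: Point) -> float:
    """Signed 2D area: positive for counter-clockwise winding, negative for clockwise."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))

def classify(tri: Triangle, eps: float = 1e-9) -> str:
    """Label a triangle as 'front', 'back' or 'zero-area' (CCW = front is an assumption)."""
    area = signed_area(*tri)
    if abs(area) <= eps:
        return "zero-area"
    return "front" if area > 0 else "back"

def culled_fraction(tris: List[Triangle]) -> float:
    """Fraction of triangles that would be thrown away after being generated."""
    culled = sum(1 for t in tris if classify(t) != "front")
    return culled / len(tris)

tris = [
    ((0, 0), (1, 0), (0, 1)),  # counter-clockwise: front facing
    ((0, 0), (0, 1), (1, 0)),  # clockwise: back facing
    ((0, 0), (1, 1), (2, 2)),  # collinear points: zero area
    ((0, 0), (2, 0), (0, 2)),  # front facing
    ((0, 0), (0, 2), (2, 0)),  # back facing
]
print(culled_fraction(tris))   # 0.6
```

Every culled triangle here was still generated and set up before being discarded, which is the inefficiency the quote is pointing at.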
 

Scali

Banned
Dec 3, 2004
2,495
0
0
Well, why did you not continue with the quote?
... in other words useless tessellation.

It's a synthetic benchmark, who cares?
The entire thing is 'useless' anyway, except for measuring how well various parts of the GPU, including the tessellation unit, perform.

But you can always trust Beyond3D to attack the benchmark, rather than the weaknesses it uncovers with certain hardware.
 

Janooo

Golden Member
Aug 22, 2005
1,067
13
81
You contradict yourself:
Well, Unigine is a game engine, any game could license it and make use of it right now.

It seems it's not a weakness. Yes, AMD's tessellation unit is not as strong as NV's, but I would say it is good enough for any reasonable in-game tessellation (the kind that improves image quality and does not force work on invisible areas).