[SemiAccurate] Nvidia's Fermi GTX480 is broken and unfixable


alcoholbob

Diamond Member
May 24, 2005
6,390
470
126
Next interesting question would be: Is Fermi's tessellation prowess due to a fundamentally superior design, or is it due to tessellation being heavily cut in RV870 so it could fit into a smaller die size?

ATI 5xxx uses a hardware tessellator. Fermi's is entirely software-based, and uses shader power. It's pretty easy for Fermi to win a tessellation benchmark since it can dedicate a ton of hardware to tessellation. That said, the 5xxx won't have to worry about much of a performance hit with tessellation enabled, while Fermi will see a much larger performance hit since it has to dedicate shaders to that function.
 

C@rnage

Junior Member
Feb 20, 2010
2
0
0
astrallite,fermi is hardware tesselated.how can it be software based if it uses cuda cores to render it.i believe fermi scored almost twice as high in the unigine benchmark for tesselation.this is just my personal outlook.i think fermi will surprise us all in terms of performance.i say it will be on par with the 5970.kinda like the way the 5870 is shadowed by the gtx295.anyways,this is my last post.dont care much for these forums.take care
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
astrallite,fermi is hardware tesselated.how can it be software based if it uses cuda cores to render it.i believe fermi scored almost twice as high in the unigine benchmark for tesselation.this is just my personal outlook.i think fermi will surprise us all in terms of performance.i say it will be on par with the 5970.kinda like the way the 5870 is shadowed by the gtx295.anyways,this is my last post.dont care much for these forums because they're not green enough.take care

fixed
 

yh125d

Diamond Member
Dec 23, 2006
6,886
0
76
i think fermi will surprise us all in terms of performance.i say it will be on par with the 5970.

It better be, it's releasing 6+ months later. It's more like a half gen card at this point

kinda like the way the 5870 is shadowed by the gtx295.

What? 5870 isn't overshadowed by the GTX295 in anything but price

anyways,this is my last post.dont care much for these forums.take care

Buh bye. We don't care much for you either
 

Schmide

Diamond Member
Mar 7, 2002
5,788
1,093
126
astrallite,fermi is hardware tesselated.how can it be software based if it uses cuda cores to render it.i believe fermi scored almost twice as high in the unigine benchmark for tesselation

Fermi has no dedicated tessellation unit like Cypress. Much of its functionality is extremely flexible and reprogrammable. NVidia hinted earlier that part of their delays with getting DX11 tessellation working was the reordering of the extra polygons after processing. Depending on how you draw the line between hardware, microcode, and software it could go either way.

I'm certain there were some trade-offs made in designing it this way.
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
astrallite,fermi is hardware tesselated.how can it be software based if it uses cuda cores to render it
That means the drivers are responsible for making the CUDA cores (which are "general purpose", so to speak) do tessellation instead of having dedicated chips in the card that do only tessellation.

I actually do not know if this is in fact how different the Fermi and Cypress designs are. I merely interpreted Astrallite's post.

EDIT: Schmide beat me to it as I was typing; his answer is far better.
 

Fox5

Diamond Member
Jan 31, 2005
5,957
7
81
That means the drivers are responsible for making the CUDA cores (which are "general purpose", so to speak) do tessellation instead of having dedicated chips in the card that do only tessellation.

I actually do not know if this is in fact how different the Fermi and Cypress designs are. I merely interpreted Astrallite's post.

EDIT: Schmide beat me to it as I was typing; his answer is far better.

Whoa, I thought Nvidia did have dedicated hardware tessellators? Sure, that goes against its design approach, but the hype basically makes it sound like they beefed up the tessellator and AA hardware (and some other stuff) so many of the extra effects come for free, on top of having a ton of general-purpose compute power.
 

Keysplayr

Elite Member
Jan 16, 2003
21,219
56
91
Astralite said:
ATI 5xxx uses a hardware tessellator. Fermi's is entirely software-based, and uses shader power. It's pretty easy for Fermi to win a tessellation benchmark since it can dedicate a ton of hardware to tessellation. That said, the 5xxx won't have to worry about much of a performance hit with tessellation enabled, while Fermi will see a much larger performance hit since it has to dedicate shaders to that function.

Fermi has no dedicated tessellation unit like Cypress. Much of its functionality is extremely flexible and reprogrammable. NVidia hinted earlier that part of their delays with getting DX11 tessellation working was the reordering of the extra polygons after processing. Depending on how you draw the line between hardware, microcode, and software it could go either way.

I'm certain there were some trade offs made in designing it this way.


Woah, I thought nvidia did have dedicated hardware tesselators? Sure, that goes against the design approach of it, but the hype for it basically makes it sound like they beefed up on the tesselator and AA hardware (and some other stuff) so many of the extra effects come for free, on top of having a ton of general purpose compute power.

Tessellation on GF100 is performed in the polymorph engines. Each cluster of 32 shaders has 1 polymorph engine, for a total of 16 polymorph engines per GF100.

gf100a.jpg

gf100b.jpg


So if a 128-shader SKU is released, it would have 4 polymorph engines; a 256-shader SKU would have 8, and a 384-shader SKU 12.
It appears tessellation performance scales up the higher you go in GPU rank.
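
A quick sketch of that scaling, assuming nothing more than the one-engine-per-32-shaders ratio stated above. The function name is made up for illustration, and the 512-shader entry is only what 16 engines times 32 shaders implies, not a confirmed SKU.

```python
# Hypothetical helper: counts polymorph engines from a shader count,
# using the 32-shaders-per-cluster ratio given in the post.
SHADERS_PER_CLUSTER = 32


def polymorph_engines(shader_count: int) -> int:
    """Return the number of polymorph engines for a given shader count."""
    if shader_count <= 0 or shader_count % SHADERS_PER_CLUSTER != 0:
        raise ValueError("shader count must be a positive multiple of 32")
    return shader_count // SHADERS_PER_CLUSTER


# 128/4, 256/8, 384/12 are the examples from the post; 512 is implied by 16 * 32.
for shaders in (128, 256, 384, 512):
    print(f"{shaders} shaders -> {polymorph_engines(shaders)} polymorph engines")
```

This only counts units. It says nothing about how much work a single polymorph engine can do.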
 

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,002
126
Wow, Charlie's article is pretty interesting. I'm sure it's neither 0% correct nor 100% correct. I just wonder whether it falls closer to the 0% or the 100% mark.
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,732
432
126
Yep. Regardless, the 9% or whatever number seems absurdly low, I really doubt that. I bet the first yields are somewhere in the 40-60% range, but who knows. :)

If NVIDIA can get 40%-60% yields that means ATI would be close to 80% yields due to difference in sizes alone. We know that isn't happening.

Let me try to dig up a post from IDC where he presented the formula and did some math.

EDIT:

Here it is:

Yep, smaller die generally means higher yield. So not only do you get more die on the wafer when the die are smaller, but you get even more of them off the wafer as being functional and sellable as well.

There are more details than you probably ever care to read about on the topic in my posts here and here.

The simple rule-of-thumb equation for yields between two chips which are similar in design/complexity and produced in the same fab is:

Yield_Chip_A = Yield_Chip_B^(DieSize_Chip_A/DieSize_Chip_B)

Juniper's die size is 166mm^2. Cypress is 334mm^2.

So if Cypress' yields are say 50% then Juniper's yield entitlement would be:

Yield_Juniper = 0.5^(166/334) = 71%

A simplistic calculator for determining the number of dies per wafer (DPW) based on no more info than the die-size of the chips (S) and the diameter of the wafer (d):

DPW = d * pi * (d/(4*S) - 1/sqrt(2*S))


Using d = 294 mm (3 mm wafer edge exclusion, aka WEE) and the S from above we arrive at an estimated 358 dies per wafer for Juniper and 167 dies per wafer for Cypress.

After factoring in the yields (remember we assumed the Cypress yield, so this is all relative, not absolute), that means that per wafer AMD nets either 50%*167 = 84 good Cypress chips or 71%*358 = 254 good Juniper chips.

That ratio is 3:1. Now the ratio will change obviously as the background defect density drops and overall functional yields improve, and also the yields we are talking about here do not include losses from parametric yields (which are further biased against larger chips, so if anything the ratio of good Juniper chips per wafer to good Cypress chips per wafer is even larger than 3:1).
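
For anyone who wants to re-run these numbers, here is a minimal Python sketch of the arithmetic in this post. The function names are made up for illustration. The dies-per-wafer function uses the common approximation DPW = d*pi*(d/(4*S) - 1/sqrt(2*S)), which is assumed here because it reproduces the 358 and 167 figures quoted in the post. The 50% Cypress yield is the same assumption the post makes, not a measured value.

```python
import math


def relative_yield(yield_b: float, die_a_mm2: float, die_b_mm2: float) -> float:
    """Rule of thumb: Yield_A = Yield_B ^ (DieSize_A / DieSize_B)."""
    return yield_b ** (die_a_mm2 / die_b_mm2)


def dies_per_wafer(usable_diameter_mm: float, die_mm2: float) -> int:
    """Approximate whole dies per wafer from die area S and usable diameter d."""
    d, s = usable_diameter_mm, die_mm2
    return math.floor(d * math.pi * (d / (4 * s) - 1 / math.sqrt(2 * s)))


JUNIPER_MM2 = 166.0     # die sizes as given in the post
CYPRESS_MM2 = 334.0
USABLE_D_MM = 294.0     # 300 mm wafer minus 3 mm edge exclusion on each side
CYPRESS_YIELD = 0.50    # assumed, as in the post

juniper_yield = relative_yield(CYPRESS_YIELD, JUNIPER_MM2, CYPRESS_MM2)
juniper_dpw = dies_per_wafer(USABLE_D_MM, JUNIPER_MM2)
cypress_dpw = dies_per_wafer(USABLE_D_MM, CYPRESS_MM2)

good_juniper = juniper_yield * juniper_dpw
good_cypress = CYPRESS_YIELD * cypress_dpw
ratio = good_juniper / good_cypress

print(f"Juniper yield entitlement: {juniper_yield:.0%}")
print(f"Dies per wafer: Juniper {juniper_dpw}, Cypress {cypress_dpw}")
print(f"Good dies per wafer: Juniper {good_juniper:.0f}, Cypress {good_cypress:.0f}")
print(f"Juniper:Cypress ratio: {ratio:.1f}:1")
```

Changing CYPRESS_YIELD shows how the ratio moves as defect density improves. Like the post, this ignores parametric yield losses.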



Remember the shortage of Cypress isn't about the ratio of demand for Juniper:Cypress not being 1:1...the actual demand for 57xx products could be 10x more than that for 58xx, but if AMD anticipated the ratio to be 11x and they started enough Juniper wafers to support that much 57xx demand then that means AMD has excess supply of Juniper (11x > 10x) and so they could have done a better job of balancing the fixed 40nm wafer allocation by starting fewer Juniper wafers and more Cypress.

I've no doubt they did account for bigger demand of the cheaper products, it just appears the ratio they chose resulted in more 57xx supply than demand. Not a bad thing, just not optimal from a gross margins point-of-view.

And you can follow his link for more details.
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
I keep getting this itchy "2900xt" feeling about gf100. This isn't all bad as 2900xt was the basis for the next couple of series that amd came out with (and were enormously better/more successful), but all signs point to "buy amd this year or wait for gen 2 fermi before buying it".
 

MrK6

Diamond Member
Aug 9, 2004
4,458
4
81
ATI 5xxx uses a hardware tessellator. Fermi's is entirely software-based, and uses shader power. It's pretty easy for Fermi to win a tessellation benchmark since it can dedicate a ton of hardware to tessellation. That said, the 5xxx won't have to worry about much of a performance hit with tessellation enabled, while Fermi will see a much larger performance hit since it has to dedicate shaders to that function.
You know that's a great point. So it might come down to how a game or other application implements each feature; very interesting to consider.
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
Wait, what about the polymorph engine Keys reminded us of? Isn't that a full-fledged tessellator, or still "general purpose"? Wouldn't it be equal to ATI's hardware tessellator?
 

Keysplayr

Elite Member
Jan 16, 2003
21,219
56
91
You know that's a great point. So it might come down to how a game or other application implements each feature; very interesting to consider.

I'll post it again.

Tessellation on GF100 is performed in the polymorph engines. Each cluster of 32 shaders has 1 polymorph engine, for a total of 16 polymorph engines per GF100.

gf100a.jpg

gf100b.jpg


So if a 128-shader SKU is released, it would have 4 polymorph engines; a 256-shader SKU would have 8, and a 384-shader SKU 12.
It appears tessellation performance scales up the higher you go in GPU rank.
 

Keysplayr

Elite Member
Jan 16, 2003
21,219
56
91
Wait, what about the polymorph engine Keys reminded us of? Isn't that a full-fledged tessellator, or still "general purpose"? Wouldn't it be equal to ATI's hardware tessellator?

The polymorph engines handle Vertex Fetch, Tessellation, Viewport Transform, Attribute Setup and Stream Output.

Point is, tessellation is NOT handled by the shaders as some keep saying. But it won't seem to matter, as I'm sure tessellation just received a demotion in its importance in upcoming games.
 

MrK6

Diamond Member
Aug 9, 2004
4,458
4
81
The Polymorph engines handle Vertex Fetch, Tesselation, Viewport Transform, Attribute Setup and Stream Output.

Point is, Tesselation is NOT handled by the shaders as some keep saying. But it wont seem to matter as I'm sure Tesselation just received a demotion in it's importance in upcoming games.
Great to know, thanks Keys. I haven't really been following the news, but why was tesselation demoted? Performance issues?
 

Keysplayr

Elite Member
Jan 16, 2003
21,219
56
91
Great to know, thanks Keys. I haven't really been following the news, but why was tesselation demoted? Performance issues?

No prob.
I meant that I've been noticing that people are starting to give less importance to tessellation since the data came out on how good Fermi is at it. I dunno why. Well, I do, but unsure why.
Tessellation will go from "The shiznit" to "meh" in record time.
 

waffleironhead

Diamond Member
Aug 10, 2005
7,123
623
136
Great to know, thanks Keys. I haven't really been following the news, but why was tesselation demoted? Performance issues?

Because one of the two major vendors doesn't have any hardware available for sale that runs tessellation? (yet) :sneaky:
 

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,002
126
No prob.
I meant that I've been noticing that people are starting to give less importance to tesselation since the data of how good Fermi is at it. I dunno why. Well, I do, but unsure why.
Tesselation will go from "The shiznit" to "meh" in record time.

I think tessellation is awesome. It's a shame that it's been pretty much unused; AMD's had it for years. Tessellation is a part of DX11 and it's supported by both vendors. We'll see it used in games and it'll bring a benefit to the visuals. I think being a part of DX and being supported by both vendors guarantees its success and that it'll be used often and take off. This is completely different than say, oh I don't know... something like PhysX.
 

MrK6

Diamond Member
Aug 9, 2004
4,458
4
81
No prob.
I meant that I've been noticing that people are starting to give less importance to tesselation since the data of how good Fermi is at it. I dunno why. Well, I do, but unsure why.
Tesselation will go from "The shiznit" to "meh" in record time.
Well, there are going to be fanboys anywhere, but I take it this is more of a forum issue than a real-world one? I'm with SlowSpyder on this one, and I'll go ahead and say that I think tessellation is the next big step, just like FSAA, AF, or HDR was. Unfortunately, it's only now getting the recognition and support it should have had a while ago. However, given what you've stated about Fermi's tessellators (the polymorph engines), it would seem that they're a discrete part of the GPU, much like AMD's solution (sorry if I used some incorrect terminology in there). I'm curious as to why they perform so much better. Are they better implemented, more efficient, just beefier (i.e., more raw computing power), some combination of these factors, or something else?
 

waffleironhead

Diamond Member
Aug 10, 2005
7,123
623
136
I'll post it again.

Tesselation on GF100 is performed in the polymorph engines. Each cluster of 32 shaders has 1 polymorph engine. Total of 16 polymorph engines per GF100.



So if a 128 shader SKU is released, it would have 4 polymorph engines, 256/8, 384/12.
It appears tesselation is scalable the higher you go in GPU rank.

I see the difference you are pointing out, after reading Anand's article. Cypress and the rest of the 5xxx line have one fixed unit for tessellation, which means all of the chips from top to bottom will have the same tessellation capabilities. Fermi and its derivatives will scale their performance, as the tessellation is spread across the clusters. The fewer the clusters, the lower the tessellation ability/performance. I guess we will have to wait on benchmarks across the Fermi lineup to see how it holds up when they only have 128/4 or 64/2, for example. Thanks for pointing out where the tessellation is done on chip.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
8,520
9,954
136
I see the difference you are pointing out, after reading Anand's article. Cypress and the rest of the 5xxx line have one fixed unit for tessellation, which means all of the chips from top to bottom will have the same tessellation capabilities. Fermi and its derivatives will scale their performance, as the tessellation is spread across the clusters. The fewer the clusters, the lower the tessellation ability/performance. I guess we will have to wait on benchmarks across the Fermi lineup to see how it holds up when they only have 128/4 or 64/2, for example. Thanks for pointing out where the tessellation is done on chip.

-I really like Fermi's approach to scalable tessellation, and ATI's fixed function tessellator (FFT) always left me scratching my head, precisely because it effectively made tessellation a bottleneck for high-end cards (which, as it appears now, should have had another FFT) rather than something that scales with the card's shader power.

The question still remains, however, as to how effective a single polymorph engine (PME) is compared to an FFT. If 16 PMEs can only do as much as ATI's FFT, you're right back to square one.
 

Mr. Pedantic

Diamond Member
Feb 14, 2010
5,027
0
76
-I really like Fermi's approach to scalable tessellation, and ATI's fixed function tessellator (FFT) always left me scratching my head, precisely because it effectively made tessellation a bottleneck for high-end cards (which, as it appears now, should have had another FFT) rather than something that scales with the card's shader power.

The question still remains, however, as to how effective a single polymorph engine (PME) is compared to an FFT. If 16 PMEs can only do as much as ATI's FFT, you're right back to square one.
Given what NVidia has done with GT200(b), I'm not sure the scalability they've built in is as great an idea as it sounds in theory. As long as it does the job for the cards it's powering, it's fine.