GK110--when do you think it will be released?


RussianSensation

Elite Member
Sep 5, 2003
The chip you are describing is going back to Fermi. The node is exactly the same 28nm. How do you increase functional units between 50-87% and stay at a reasonable power consumption level while nearly doubling the size of the die?

GTX280 @ 65nm = 576mm^2, GPU clock of 602MHz

GTX480 @ 40nm = 520-530mm^2, GPU clock of 700MHz

Average power use:

GTX280 = 136W (peak of 171W)
GTX480 = 223W (peak of 257W)
http://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_480_Fermi/30.html

GTX280 = 32 ROPs, 240 SPs, 80 TMUs, 512-bit bus
GTX480 = 48 ROPs (50% more), 480 SPs (2x more), 60 TMUs (25% less), 384-bit bus (25% less).

A full-node shrink wasn't even enough to stop the power consumption from going past 250W at load. Notice how the GTX480 is a smaller die than the 280, and despite the full node shrink, power consumption still went up 50%!!

Right now GTX680 uses 166W on average, with a peak of 186W. The die size is 294mm^2.

This is what you are proposing:

50% more ROPs
87.5% more CUDA cores
87.5% more TMUs
2x the memory bus width
520-550mm^2 die size (or at least 77% larger)
Additional full compute functionality eating up more transistor space, since this is inherent in the GK110 architecture (meaning dynamic scheduling, etc., since you can't just remove that from GK110)

All of that at 850mhz GPU clocks on the same 28nm node? Really? I need to know how NV will keep this chip under 250W of peak power OR we are back to Fermi days. That's pretty interesting since Performance/watt was NV's focus for Kepler from the beginning. You are saying they will throw it all away and make another 250W card with a jet engine blower fan?
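
Quick napkin math on that GTX280 -> GTX480 jump, using the figures above (die sizes are approximate):

```python
# Rough napkin math on the GTX280 -> GTX480 jump (figures quoted above; die sizes approximate).
gtx280_peak_w, gtx480_peak_w = 171, 257
gtx280_die, gtx480_die = 576, 525          # mm^2, using the midpoint of 520-530

print(f"Peak power: +{(gtx480_peak_w - gtx280_peak_w) / gtx280_peak_w:.0%}")   # ~ +50%
print(f"Die size:   {(gtx480_die - gtx280_die) / gtx280_die:.0%}")             # ~ -9%, a slightly smaller die
```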
 

tviceman

Diamond Member
Mar 25, 2008
The chip you are describing is going back to Fermi. The node is exactly the same 28nm. How do you increase functional units between 50-87% and stay at a reasonable power consumption level while nearly doubling the size of the die?

GTX280 @ 65nm = 576mm^2 GPU clocks of 602 mhz

GTX480 @ 40nm = 520-530mm^2 GPU clocks of 700mhz

Average power use:

GTX280 = 136W (peak of 171W)
GTX480 = 223W (peak of 257W)
http://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_480_Fermi/30.html

GTX280 = 32 ROPs, 240 SPs, 80 TMUs, 512-bit bus
GTX480 = 48 ROPs (50% more), 480 SPs (2x more), 60 TMUs (25% less), 384-bit bus (25% less).

A full-node shrink wasn't even enough to stop the power consumption from going past 250W at load.

Right now GTX680 uses 166W on average, with a peak of 186W. The die size is 294mm^2.

This is what you are proposing:

50% more ROPs
87.5% more CUDA cores
87.5% more TMUs
2x the memory bus width
520-550mm^2 die size
Additional full compute functionality eating up more transistor space, since this is inherent in the GK110 architecture (meaning dynamic scheduling, etc., since you can't just remove that from GK110)

All of that at 850mhz GPU clocks on the same 28nm node? Really? I need to know how NV will keep this chip under 250W of peak power OR we are back to Fermi days which everyone here seems to have hated since Performance/watt is all the rage.


I think GK110 is going to have high power consumption, no way around that. But when you compare Fermi V1 to anything when trying to extrapolate performance per watt improvements, face palm yourself please :D. If you use anything Fermi to make comparisons, it should be from the 500 series only. Nvidia had WORSE performance per watt going from the GTX280 to the GTX480. Fermi V1 was horribly inefficient; hell, even the GTX 460 was barely more efficient than the GTX285 despite having fewer ROPs and TMUs: http://www.techpowerup.com/reviews/Zotac/GeForce_GTX_460_1_GB/32.html

Look at what they did with GK104 vs. GF114: performance per watt increased significantly, by 51.5% at 1920x1200, and massively, by 69.5% at 2560x1600. Don't you think it's reasonable to assume they will have similar (or perhaps even better) perf/watt gains with GK110 over GF110? An 850MHz core clock would be a substantially bigger drop, in both absolute MHz and percentage terms, than the drop GF110 took relative to GF114. Remember when everyone here was saying Nvidia was tapped out on 40nm and couldn't deliver a fully functional Fermi core because they hit a wall? Look how all that turned out in the end.

For all intents and purposes, GK110 is going to be a second-generation 28nm product. It's going to inherit the improvements gained from the node maturing and from the understanding of how to manufacture chips more efficiently on it. Nvidia has made their largest generational performance-per-watt improvement in a long, long time (perhaps EVER???). Yes, there will be a power wall limiting how high GK110 can be clocked, but I think with the improvements Kepler has made over Fermi and the time Nvidia is taking with GK110, it's going to be a beast and have a beastly price tag.
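
Just to sketch what that reasoning implies (rough numbers only; the ~250W budget, the GTX580 board power, and a repeat of the ~50% perf/watt gain are my assumptions, not anything Nvidia has said):

```python
# A rough sketch of the argument, not a prediction: assume GK110 repeats the ~50%
# perf/W gain GK104 showed over GF114, inside an assumed ~250W single-GPU budget.
gtx580_power_w = 244.0          # assumed GTX580-class board power
gk110_budget_w = 250.0          # assumed single-GPU power ceiling
perf_per_watt_gain = 1.50       # if GK110 repeats the ~50% gain GK104 showed over GF114

projected_vs_gtx580 = perf_per_watt_gain * (gk110_budget_w / gtx580_power_w)
print(f"Projected GK110 performance: ~{projected_vs_gtx580:.2f}x a GTX580")   # ~1.5x at similar power
```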
 

RussianSensation

Elite Member
Sep 5, 2003
Not talking about performance/watt. For this discussion it's a meaningless metric.

If performance/watt stays the same from GTX680 to GTX780, that changes nothing in the context of the viability of manufacturing the chip you describe. You can increase performance 80% and then power consumption grows 80%. You have the same performance/watt as the GTX680, but your chip peaks at 337W. Real peak power consumption is the problem. How do you keep a chip with 50-87% more functional units at a 500mm^2+ die size on the 28nm node below 250W? That's not even logical without dropping GPU clocks massively.

You think you can increase the die size 70-80% and just suffer a 50W penalty on the 28nm node? 50W over the GTX680 is already 236W, but that's just a 27% increase in power consumption with a 50-87% increase in functional units and a 77% increase in die size @ 520mm^2. I don't need to face palm myself because you are living in dream land right now. Nvidia does not have some magical 28nm process that no one else on the planet has access to.

If this was so easy to accomplish, why didn't everyone just do this? Increase CPU/GPU functional units/cores by 50-87% and increase the die size 77% with a 27% power consumption increase on the same node.
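
The napkin math behind those numbers, using the TechPowerUp GTX680 peak figure quoted above:

```python
# The napkin math behind those figures (GTX680 peak power from TechPowerUp above).
gtx680_peak_w = 186.0

# Case 1: perf/W stays flat and performance goes up ~80%, so power goes up ~80%.
print(f"Flat perf/W, +80% performance: ~{gtx680_peak_w * 1.8:.0f}W peak")   # ~335W, the ~337W ballpark above

# Case 2: the chip somehow only pays a ~50W penalty.
proposed_w = gtx680_peak_w + 50
print(f"+50W scenario: {proposed_w:.0f}W peak, only +{proposed_w / gtx680_peak_w - 1:.0%} power")   # 236W, +27%
```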
 

tviceman

Diamond Member
Mar 25, 2008
Not talking about performance/watt. For this discussion it's a meaningless metric.

It isn't meaningless, it's at the heart of the matter. 40nm gen 1 didn't net any perf/watt increases, gen 2 netted decent improvements, and 28nm has netted huge increases. They can significantly increase performance within the same power envelope. Factor in that gk110 is essentially going to be a gen 2 28nm chip....

That's not even logical without dropping GPU clocks massively. You think you can increase the die size 70-80% and just suffer a 50W penalty on 28nm node? 50W over GTX680 is already at 236W but that's just a 27% increase in power consumption with 50-87% increase in functional units and a 77% increase in die size @ 520mm^2. I don't need to face palm myself because you are living in dream land right now.

If GK110 ended up at 850MHz, you don't think a 26.5% drop in core clocks is a huge drop? It IS huge. The drop in core clocks that GF110 had relative to GF114 was only 6.5%. I'm saying GK110's clocks will be at least another 20% lower than GK104's on top of that. The clocks could be around 800MHz, representing roughly 30% slower clocks, and that still would not meaningfully change its performance. The amount of power savings they'll attain from being able to run at a significantly lower voltage because of the lower clocks would be substantial.
 

tviceman

Diamond Member
Mar 25, 2008
If this was so easy to accomplish, why didn't everyone just do this? Increase CPU/GPU functional units/cores by 50-87% and increase the die size 77% with a 27% power consumption increase on the same node.

Who said it was easy? I didn't. They're taking an awfully long time with GK110, so obviously it isn't easy. But the signs and the foundation are there. A second-generation 28nm chip, with all the node process improvements, the experience from manufacturing other chips, the big performance-per-watt increase Nvidia made going from 40nm to 28nm... it's all there.
 

RussianSensation

Elite Member
Sep 5, 2003
850mhz is a 20% drop in GPU clocks. (850 - 1058) / 1058
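
Same 1058 -> 850 gap measured from either end, which is where the different percentages in this thread come from:

```python
# The same 1058 -> 850 MHz gap, measured against either clock. The percentages
# differ depending on which clock you divide by.
gtx680, hypothetical_gk110 = 1058, 850

print(f"GK110 vs GTX680: {(hypothetical_gk110 - gtx680) / gtx680:.1%}")               # ~ -19.7%, i.e. ~20% lower
print(f"GTX680 vs GK110: +{(gtx680 - hypothetical_gk110) / hypothetical_gk110:.1%}")  # ~ +24.5% faster
```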

The drop in core clocks that GF110 had over GF114 was only 6.5%

GF114 --> GF110 is not even remotely comparable to what you are proposing.
http://www.gpureview.com/show_cards.php?card1=641&card2=637

GTX580 only had 50% more ROPs, 33% more CUDA cores, a 50% wider memory bus, and the same # of TMUs, and it suffered a 44% power consumption penalty at peak.

GTX560 Ti peak 159W
GTX580 peak 229W (+44%)

"GTX780" you have outlined has 50% more ROPs, 87% more CUDA cores, 2x memory bus width, 87% more TMUs, and suffers a 27-30% power consumption penalty?

You are defying the laws of physics right there!

GTX680 peak 186W

Then you get to this: a GTX690 with 2 x 294mm^2 dies = $999.
Somehow NV will sell you a 520-550mm^2 die chip for $649? So you are suggesting TSMC will drop wafer prices 35-40% by Spring 2013? OR NV is taking a hit on their gross margins?

NV may sell you a 520mm^2 28nm chip, but it'll be more reasonable to expect harvested K20 dies without the full 15 SMX units.
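
Putting the two scale-ups side by side (peak power from TechPowerUp; the GK110 penalty is the 27-30% being claimed, not a measured number):

```python
# The two scale-ups side by side (peak power numbers from TechPowerUp).
gf110_power_penalty = (229 - 159) / 159          # GTX560 Ti -> GTX580, measured
claimed_gk110_penalty = 0.27                     # low end of the 27-30% being proposed

print(f"GF114 -> GF110: +50% ROPs, +33% cores, 50% wider bus -> +{gf110_power_penalty:.0%} peak power")
print(f"GK104 -> GK110: +50% ROPs, +87% cores, 2x wider bus  -> +{claimed_gk110_penalty:.0%} peak power (claimed)")
```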
 

boxleitnerb

Platinum Member
Nov 1, 2011
The question is, will you be willing to drop $599-649 for a GK110 Geforce card? I fully believe that is what we're going to get price wise. If the performance of GK110 over GK104 can be extrapolated by viewing GF110's performance over GF114, I think we're going to see a gtx785 @ $649, gtx780 @ $549, and a gtx770 @ $449 all based off GK110, all faster than GK104. GK104's refresh, and again I am guessing, will start at $379-399.

One good thing is that if GK110 is going into full production in October / November, then hopefully by the time they bring it to their Geforce lineup yields will be good enough that the highest end part won't have any fused off SMX units. It will be a BEAST right from the start.

Short answer:
Yes, I am. In fact, I'm going to get 3 of them :biggrin:

Not talking about performance/watt. For this discussion it's a meaningless metric.

If performance/watt stays the same from GTX680 to GTX780, that changes nothing in the context of the viability of manufacturing the chip you describe. You can increase performance 80% and then power consumption grows 80%. You have the same performance/watt as the GTX680, but your chip peaks at 337W. Real peak power consumption is the problem. How do you keep a chip with 50-87% more functional units at a 500mm^2+ die size on the 28nm node below 250W? That's not even logical without dropping GPU clocks massively.

You think you can increase the die size 70-80% and just suffer a 50W penalty on the 28nm node? 50W over the GTX680 is already 236W, but that's just a 27% increase in power consumption with a 50-87% increase in functional units and a 77% increase in die size @ 520mm^2. I don't need to face palm myself because you are living in dream land right now. Nvidia does not have some magical 28nm process that no one else on the planet has access to.

If this was so easy to accomplish, why didn't everyone just do this? Increase CPU/GPU functional units/cores by 50-87% and increase the die size 77% with a 27% power consumption increase on the same node.

That is what the turbo is for. Lower the base clock and the voltage accordingly and go only as high as you can at every instant. I also believe they won't use the full 15 SMX right from the start.

Btw what do you think AMD will do? They have the same problem, even worse due to the fact that the 7970 GHz Edition is already quite close to the 250W wall. Not much room for improvement I would say.
 

BenSkywalker

Diamond Member
Oct 9, 1999
Trying to calculate how much power BigK will draw based on Fermi isn't going to work. One rather massive difference between the two is the lack of hot clocks for the shader cores. That could have a rather large impact on the overall power draw of the part.

1020MHz vs. 1150MHz sometimes results in a ~50% increase in power:

http://www.guru3d.com/article/radeon-hd-7950-overclock-guide/3

The lack of hot clocks on Kepler may yield a staggering increase in perf/watt. It may end up that BigK is the most power-hungry part ever released, but there are reasons to believe it won't have nearly the power draw issues that its predecessors had.
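
Rough illustration of why a ~13% overclock can cost ~50% more power: dynamic power goes roughly as frequency times voltage squared, and big overclocks usually need a voltage bump. The voltage figures here are purely illustrative, not taken from the guru3d article:

```python
# Dynamic power scales roughly with frequency * voltage^2.
# Voltages below are assumed stock vs. overvolted values, for illustration only.
f0, f1 = 1020, 1150          # MHz
v0, v1 = 1.09, 1.25          # volts (illustrative)

power_ratio = (f1 / f0) * (v1 / v0) ** 2
print(f"Estimated dynamic power increase: ~{power_ratio - 1:.0%}")   # ~ +48%
```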
 

Hypertag

Member
Oct 12, 2011
850mhz is a 20% drop in GPU clocks. (850 - 1058) / 1058



GF114 --> GF110 is not even remotely comparable to what you are proposing.
http://www.gpureview.com/show_cards.php?card1=641&card2=637

GTX580 only had 50% more ROPs, 33% more CUDA cores, 50% memory bus width, same # of TMUs, and suffered a 44% power consumption penalty at peak.

GTX560 Ti peak 159W
GTX580 peak 229W (+44%)

"GTX780" you have outlined has 50% more ROPs, 87% more CUDA cores, 2x memory bus width, 87% more TMUs, and suffers a 27-30% power consumption penalty?

You are defying the laws of physics right there!

GTX680 peak 186W

Then you get to this: 294mm^2 x 2 GTX690 = $999
Somehow NV will sell you a 520-550mm^2 die chip for $649? So you are suggesting TMSC will drop wafer prices 35-40% by Spring 2013? OR NV is taking a hit on their gross margins?

NV may sell you a 520mm^2 28nm chip but it'll be more reasonable to expect failed K20 die without the full 15 SMX units.

1) Using GTX 480 power consumption figures is just silly. Make comparisons with the good fermi chips, not the crap.
1a) The shader hot clocks on fermi really harmed power usage.
2) Use GTX 690 power consumption. GTX 690 has one chip running close to stock GTX 680 speeds, and another throttled to 900 MHz. Some tests show the GTX 690 as the graphics card offering the highest performance per watt (ahead of the 7850/7870).
3) If GK110 is as "limited" by power as you claim, then more performance can always be achieved by running more shaders/ROPs ... at a lower voltage and clock speed. A 12-SMX chip with the voltage needed to support 1GHz would use more power and still perform worse than 14 SMX units at 900MHz (rough sketch after this list).
4) GF100 (the crap version) had terrible yields, correct? If your theory that "bad yields means it will never go out of high margin professional / compute segment" is correct, then GF100 would have lived in that segment only. Instead they produced an enormous number of chips in order to spread costs with massive volume.
5) If this theory were correct, then why is Nvidia selling any GTX 690s for $1,000? The Tesla K10 is a GTX 690 with 4GB of RAM per GPU. Clearly the K10 has a far, far higher margin than the GTX 690. Why would Nvidia make this choice?
5a) Note that the GTX 690 launched before the K10.
6) As far as I remember from the GK110 white paper, it is 384-bit, not 512-bit.
7) I am assuming the 680's power consumption is higher due to a higher voltage setting. The clock speeds on the 680 are not set to maximize performance per watt. It was about pushing GK104 to beat the weaker-than-expected 7970. As your numbers pointed out, even the 670 is much closer to a performance-per-watt sweet spot.
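
Rough sketch of point 3 (the voltages are made up just to illustrate the trade-off, not real GK110 numbers):

```python
# Throughput scales with units * clock, while dynamic power scales roughly with
# units * clock * voltage^2. Voltages are illustrative assumptions only.
def throughput(units, mhz):
    return units * mhz

def relative_power(units, mhz, volts):
    return units * mhz * volts ** 2

narrow_fast = (12, 1000, 1.10)    # 12 SMX pushed to 1GHz at a higher voltage
wide_slow = (14, 900, 1.00)       # 14 SMX at 900MHz at a lower voltage

print("perf  12SMX/14SMX:", round(throughput(12, 1000) / throughput(14, 900), 2))                 # ~0.95
print("power 12SMX/14SMX:", round(relative_power(*narrow_fast) / relative_power(*wide_slow), 2))  # ~1.15
```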
 

raghu78

Diamond Member
Aug 23, 2012
The GTX 560 Ti had 1.95 billion transistors and the GTX 580 had 3 billion. The GTX 680 has 3.5 billion transistors while GK110 has 7.1 billion; that's a 100% increase, compared to the 54% increase from GTX 560 Ti to GTX 580. Also, GTX 680 TDP is 195W while the GTX 560 Ti is 170W, so around 20-25W more power consumption. So the GTX 780 has a smaller power gap and a bigger die size relative to its mid-range chip (GTX 680, aka GTX 760 Ti) if we assume it will draw the same power as the GTX 580. Hitting 850MHz on the GTX 780 is not going to be easy, especially on a fully enabled chip. What seems reasonable to expect is 13-14 SMX, with the rest of the chip fully enabled, at 750-800MHz. So it all depends on how TSMC's 28nm process is performing in Q1 2013 (especially on a huge die) and how good Nvidia's chip design is.
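
The ratios I'm talking about, from the public transistor counts and TDPs:

```python
# Public transistor counts (billions) and TDPs.
gf114_t, gf110_t = 1.95, 3.0
gk104_t, gk110_t = 3.5, 7.1

print(f"GF114 -> GF110: +{gf110_t / gf114_t - 1:.0%} transistors")   # ~54%
print(f"GK104 -> GK110: +{gk110_t / gk104_t - 1:.0%} transistors")   # ~103%, roughly double
print(f"GTX 560 Ti -> GTX 680 TDP gap: {195 - 170}W")                # 25W
```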
 

Arzachel

Senior member
Apr 7, 2011
1) Using GTX 480 power consumption figures is just silly. Make comparisons with the good fermi chips, not the crap.
1a) The shader hot clocks on fermi really harmed power usage.

The reason companies usually begin production on a new node with the largest part is that some process bugs creep up only when you go over a given size. Are you seriously expecting Nvidia to get GK110 perfect on the first try when the largest die they have done on 28nm before is GK104? On the same note, Nvidia didn't use hotclocking just for kicks, but because it was a trade-off between performance/mm^2 and performance/W. It caused GF100/110 to be as power hungry as they were, but it also made them actually manufacturable. That just makes the claims of 80% more performance with barely any increase in power consumption even sillier.

2) Use GTX 690 power consumption. GTX 690 has one chip running close to stock GTX 680 speeds, and another throttled to 900 MHz. Some tests show the GTX 690 as the graphics card offering the highest performance per watt (ahead of the 7850/7870).

So, we should go on cherry picked chips used on an extremely limited availability card? ...See last sentence above.

3) If GK110 is as "limited" by power as you claim, then more performance can always be achieved by running more shaders/ROPs ... at a lower voltage and clock speed. A 12-SMX chip with the voltage needed to support 1GHz would use more power and still perform worse than 14 SMX units at 900MHz.

Back to point nr. 1. While more execution units are a more power-efficient way of increasing performance past a certain clockspeed, they add complexity and die size. You need the resulting chip to actually yield.

4) GF100 (the crap version) had terrible yields, correct? If your theory that "bad yields means it will never go out of high margin professional / compute segment" is correct, then GF100 would have lived in that segment only. Instead they produced an enormous number of chips in order to spread costs with massive volume.

TSMC provided per good die pricing for Nvidia on 40nm, which allowed them to pull stunts such as having near zero yields for fully working chips a few months before release.

5) If this theory were correct, then why is Nvidia selling any GTX 690s for $1,000? The Tesla K10 is a GTX 690 with 4GB of RAM per GPU. Clearly the K10 has a far, far higher margin than the GTX 690. Why would Nvidia make this choice?
5a) Note that the GTX 690 launched before the K10.

(I'm guessing the K10 is the GK104 Tesla.) The time needed to validate professional drivers is what commands the price difference. Pushing out an extremely rare dual-GPU card with cherry-picked dies was quicker.

6) As far as I remember from the GK110 white paper, it is 384-bit, not 512-bit.
7) I am assuming the 680's power consumption is higher due to a higher voltage setting. The clock speeds on the 680 are not set to maximize performance per watt. It was about pushing GK104 to beat the weaker-than-expected 7970. As your numbers pointed out, even the 670 is much closer to a performance-per-watt sweet spot.

And neither of these makes any difference. You can gain a percent of efficiency here and there in what is essentially napkin math. What people are expecting is for Nvidia to magick the laws of physics away.
 

3DVagabond

Lifer
Aug 10, 2009
They've only delivered 32 cards so far, with another 1000 arriving this week. That still leaves 13560 more to go. Even if this is their only customer for these cards (highly unlikely), it's going to be a while before they have any "spare" cards for us.

The only way we are likely to see any, any time soon, is if HD 8970 is a monster card. Not likely, though.

I don't remember where I read it, but I seem to recall 275W for the K20. I think that will be difficult without a pretty dramatic drop in clocks.
 

Keysplayr

Elite Member
Jan 16, 2003
They've only delivered 32 cards so far, with another 1000 arriving this week. That still leaves 13560 more to go. Even if this is their only customer for these cards (highly unlikely), it's going to be a while before they have any "spare" cards for us.

The only way we are likely to see any, any time soon, is if HD 8970 is a monster card. Not likely, though.

I don't remember where I read it, but I seem to recall 275W for the K20. I think that will be difficult without a pretty dramatic drop in clocks.

With 3000+ wafer starts per day, and being overly pessimistic regarding yield, 500 wafers (+/- 30% yield) would accommodate ORNL's order of 14560 over what they have now.
And, you never know, BigK may never make it to the consumer desktop at all, if it's compute-heavy and just being used for Tesla HPC solutions while compute-light is the order of the day for desktops.
Not saying it's what I would like to see happen, but I am saying it's a possibility.
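
Rough check of that wafer math; the gross-dies-per-wafer figure is my own estimate for a ~550mm^2 die on a 300mm wafer, the rest follows from the numbers above:

```python
# Rough wafer math. Gross dies per wafer is an estimate, not an official figure.
import math

wafer_area_mm2 = math.pi * (300 / 2) ** 2            # ~70,700 mm^2 for a 300mm wafer
gross_dies = int(wafer_area_mm2 / 550 * 0.8)         # ~100 after a rough edge-loss discount

wafers, assumed_yield = 500, 0.30
print(f"~{int(wafers * gross_dies * assumed_yield):,} good dies")   # comfortably more than the 14,560 needed
```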
 

Arzachel

Senior member
Apr 7, 2011
With 3000+ wafer starts per day, and being overly pessimistic regarding yield, 500 wafers (+/- 30% yield) would accommodate ORNL's order of 14560 over what they have now.
And, you never know, BigK may never make it to the consumer desktop at all, if it's compute-heavy and just being used for Tesla HPC solutions while compute-light is the order of the day for desktops.
Not saying it's what I would like to see happen, but I am saying it's a possibility.

30% is not "overly pessimistic". 3% yields are enough to pay off the wafer without using binned chips, and 3% is also better than GF100 did at a comparable point in time. Napkin math says a 30%-yielding K20 wafer is more money than they would get from a 100%-yielding GK104 wafer with all of the chips going into GTX680s. <3% yields is being pessimistic, 5-10% is being realistic, and 30% is not worth considering, because at that point they'd be pushing out more of them.
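
Napkin math behind the "pays off the wafer" bit; the Tesla price, wafer cost, and gross die count are assumptions for illustration, not known figures:

```python
# All figures below are assumptions for illustration only.
gross_dies_per_wafer = 100      # ~550mm^2 die on a 300mm wafer, roughly
tesla_price = 3000.0            # assumed revenue per good K20, USD
wafer_cost = 6000.0             # assumed 28nm wafer cost, USD

for y in (0.03, 0.10, 0.30):
    revenue = gross_dies_per_wafer * y * tesla_price
    print(f"{y:.0%} yield: ~${revenue:,.0f} revenue vs ~${wafer_cost:,.0f} wafer cost")
```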

I don't remember where I read it, but I seem to recall 275W for the K20. I think that will be difficult without a pretty dramatic drop in clocks.

I'd say it'll run at a 450 to 550MHz core clock.
 

sontin

Diamond Member
Sep 12, 2011
I don't remember where I read it, but I seem to recall 275W for the K20. I think that will be difficult without a pretty dramatic drop in clocks.

Pls, don't make stories up.
TDP will be around 225 watts with 1.2 TFLOPS+ of DP.
 

tviceman

Diamond Member
Mar 25, 2008
850mhz is a 20% drop in GPU clocks. (850 - 1058) / 1058

I didn't clarify that. I was calculating from the slower clock's point of view, so 1058MHz is roughly 24.5% faster. But from either point of view, we are still looking at a substantial drop in clocks. 200MHz less is nothing to dismiss when dealing with GPU clock speeds. It could end up being a little lower; again, it's just my guess.

GF114 --> GF110 is not even remotely comparable to what you are proposing.
http://www.gpureview.com/show_cards.php?card1=641&card2=637

GTX580 only had 50% more ROPs, 33% more CUDA cores, a 50% wider memory bus, and the same # of TMUs, and it suffered a 44% power consumption penalty at peak.

GTX560 Ti peak 159W
GTX580 peak 229W (+44%)

"GTX780" you have outlined has 50% more ROPs, 87% more CUDA cores, 2x memory bus width, 87% more TMUs, and suffers a 27-30% power consumption penalty?

You are defying the laws of physics right there!

And, interestingly, looking at the latest benchmarks with the most up-to-date drivers, the GTX580 is about 44% faster than the GTX560 Ti. So despite having 33% more shaders and 50% more ROPs, and significantly fewer transistors per mm^2, it ended up being just as efficient as GF114. FURTHERMORE, if you compare the GTX580 to a first-gen 40nm GTX460, you end up with the GTX580 being more efficient, despite what you say defies the laws of physics.

Then you get to this: a GTX690 with 2 x 294mm^2 dies = $999.
Somehow NV will sell you a 520-550mm^2 die chip for $649? So you are suggesting TSMC will drop wafer prices 35-40% by Spring 2013? OR NV is taking a hit on their gross margins?

Of course not, Russian. Please, please, please stop putting words into my mouth. Draw some reasonable conclusions. Do you think prices are going to stay this high? No. Do you think the GTX680 is currently overpriced? Yes. GK110 is *roughly* going to be the same die size as GF110 (give or take 5%). Do you think AMD will improve their 2nd-gen GPU performance and perf-per-watt metrics? Yes. Nvidia can't sell GK104 at inflated prices forever.

If they sell GK110 for $650, $550, and $450 ($150, $200, and $150 higher than GF110 consumer products' initial MSRPs), and the video cards actually sell, then they would be very happy. Chip costs might be 10% higher, and component costs might be $15 more, but that would still leave them with a bigger profit margin than GF110 in the consumer space. And there is a larger market for a single-GPU $650 video card than for a $1000 dual-GPU video card.

NV may sell you a 520mm^2 28nm chip, but it'll be more reasonable to expect harvested K20 dies without the full 15 SMX units.

OF COURSE I am expecting chips with fused-off SMXs to be sold. Regardless of when they start mass producing GK110, of course this will happen. What I am saying, though, is that if they have been in production for 5 months, yields will probably be good enough that they can launch fully working chips in enough volume to satisfy the demand they would have at the prices I am suggesting.


Look, you seem to think it's impossible for Nvidia to further improve on the performance-per-watt bar set by GK104, and you think that a 20-25% dip in clock speeds won't allow enough of a voltage drop to bring the power draw within a reasonable envelope (<=265 watts). That's despite the fact that GK110 is essentially a 2nd-gen 28nm chip, that they made significant gains within the same architecture on 40nm, and that they made huge gains going from 40nm to 28nm. I think base clocks are going to be 800MHz or slightly higher, and I think TDP is going to be right around 265 watts. I guess we'll just have to agree to disagree.
 

sontin

Diamond Member
Sep 12, 2011
The K5000 has a full GK104 chip at 700MHz with 1250MHz GDDR5 and uses only 122 watts.

Over the last 12 months, Scott from nVidia has said that nVidia will put more cores into the chip and let them run at a "very" low clock. That's the reason they got rid of the shader clock domain.
 

f1sherman

Platinum Member
Apr 5, 2011
Does the article say that they bought the GPUs for double precision performance? How do you know that in a specific application (scientific computing) they aren't using single precision, or that maybe the K20 just works MUCH faster in a specific program that benefits substantially from the CUDA architecture?

ORNL, and even Titan itself, do not run "a specific application".
Titan is expected to take the #1 spot in the TOP500, and in terms of supercomputing I'm pretty sure it runs everything you can think of, and then some.
So no, it's not about a specific application being specially tuned for CUDA.

Furthermore you seem to be a little off regarding DP numbers.
K20 is expected to triple the double precision performance of Tesla M2090 - so something around ~1.8 TFLOPS is expected.

In essence, while AMD is trying to fix its drivers to be competitive with old Fermi,
Nvidia is releasing a full-fledged compute Kepler that is out of AMD's ballpark even in pure theoreticals.


Not talking about performance/watt. For this discussion it's a meaningless metric.


It isn't meaningless, it's at the heart of the matter.

BINGO!

Not to mention that any additional MW is $1,000,000/year.
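
Rough math on that figure (the electricity rate is an assumption):

```python
# Where ~$1M per MW-year comes from; the all-in electricity rate is assumed.
hours_per_year = 24 * 365            # 8,760
cost_per_mwh = 110.0                 # assumed ~$0.11/kWh all-in
print(f"1 MW * {hours_per_year} h * ${cost_per_mwh:.0f}/MWh = ~${hours_per_year * cost_per_mwh:,.0f}/year")
```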
 

Arzachel

Senior member
Apr 7, 2011
The K5000 has a full GK104 chip at 700MHz with 1250MHz GDDR5 and uses only 122 watts.

Over the last 12 months, Scott from nVidia has said that nVidia will put more cores into the chip and let them run at a "very" low clock. That's the reason they got rid of the shader clock domain.

... I believe you are correct. That's bad for enthusiasts, because it means there isn't going to be a non-professional GK110, but still. 2880 shader units running at anywhere from 400 to 500MHz at a 1/2 DP rate should push 1.4-1.8 TFLOPS at a very good TDP, which is in line with the promised tripling of peak DP flops.
 

sontin

Diamond Member
Sep 12, 2011
... I believe you are correct. That's bad for enthusiasts, because it means there isn't going to be a non-professional GK110, but still. 2880 shader units running at anywhere from 400 to 500MHz at a 1/2 DP rate should push 1.4-1.8 TFLOPS at a very good TDP, which is in line with the promised tripling of peak DP flops.

GK110 has only a 1/3 dp ratio.
And why would this be bad for enthusiasts? With 800MHz and 14SMX GK110 will be 40% faster than the GTX680...
 

Arzachel

Senior member
Apr 7, 2011
GK110 has only a 1/3 dp ratio.
And why would this be bad for enthusiasts? With 800MHz and 14SMX GK110 will be 40% faster than the GTX680...

Source? If that thing really does run at 800mhz, then your quoted 225W TDP flies right out of the window. And the Fermi based Teslas all run at 1/2 dp, which would make the change even more baffling.
 

Keysplayr

Elite Member
Jan 16, 2003
Source? If that thing really does run at 800mhz, then your quoted 225W TDP flies right out of the window. And the Fermi based Teslas all run at 1/2 dp, which would make the change even more baffling.

Arzachel, where are your numbers? Where are you pulling or extrapolating your data from? For all you know, it could pull anywhere from 200W to 400W or anything in between. The same goes for the rest of us. This is a conversation better left for after the reviews. We could guess until we're dead and not get it right.
 

RussianSensation

Elite Member
Sep 5, 2003
Pls, don't make stories up.
TDP will be around 225 watts with 1.2 TFLOPS+ of DP.

That doesn't even make sense given NV's historical strategy, going back to G80, of lowering DP compute on the consumer part. Why would this change for the GTX780? The GTX780 will have more DP than the GTX680, but 1.2 TFLOPS? Not a chance.

NV always cripples DP on the consumer part to a fraction of the professional part. This is how they can justify a part of the premium if you want compute.

Also, your 1.2 TFLOPS of compute at a 225W TDP makes it an engineering impossibility on the 28nm node for a single-GPU consumer part. That implies a 1/5th FP32 multiplier on a 1000MHz 2880 SP Kepler, OR a 1/4th multiplier on an 850MHz 2880 SP Kepler. Keep dreaming if you think NV will have a 1000MHz 2880 SP 225W TDP part.

When did NV ever sell you a flagship consumer chip with 1/4th or 1/5th FP32 multiplier? :biggrin:

GTX280 = 1/12th
GTX480 = 1/8th or 1/12th
GTX580 = 1/8th
GTX680 = 1/24th

Nvidia has never sold a consumer part with similar DP compute as the professional part. If you want double precision compute performance at $500 or less, you have to buy AMD cards. This has been the case since at least HD4000 series.
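
The arithmetic behind the 1/4th or 1/5th multiplier claim, assuming 2 FLOPs (FMA) per CUDA core per clock:

```python
# Peak SP = 2 FLOPs (FMA) per core per clock; 1.2 TFLOPS DP is the figure being claimed.
claimed_dp_tflops = 1.2

for mhz in (1000, 850):
    sp_tflops = 2 * 2880 * mhz / 1e6
    print(f"{mhz}MHz: {sp_tflops:.2f} TFLOPS SP -> needs a 1/{sp_tflops / claimed_dp_tflops:.1f} DP rate")
```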

I don't think any actual, real TDP figures have been announced or discussed by Nvidia. Where did you read this?

It's in Amazon's best-seller list of books, titled Nvidia: Defying the Laws of Physics at 28 Nanometers (i.e., a 2880 SP 1000MHz GK110 with 1.2 TFLOPS DP at 225W).

I guess we'll just have to agree to disagree.

Fair enough. I am going with:

12-13 SMX harvested GK110 @ 1000mhz

OR

15 SMX units at very low clocks 725-800mhz

OR

An entirely new chip that builds on GK104 modular architecture.

Btw what do you think AMD will do? They have the same problem, even worse due to the fact that the 7970 GHz Edition is already quite close to the 250W wall. Not much room for improvement I would say.

Not sure. Maybe use what they learned on Pitcairn XT & use that design as a learning basis for the 8970.

Pitcairn has a 25% higher pixel fill-rate than the HD7950 and retains the 32 ROPs of the 7950/7970 flagships.
Pitcairn has 36% lower memory bandwidth than HD7950 (153.6 vs. 240)
Pitcairn has 11% lower texture fill-rate than 7950 (80 vs. 89.6)
http://www.gpureview.com/show_cards.php?card1=677&card2=664

HD7950 is just 8% faster.

My guess is AMD's current chip is pixel fill-rate/ROP limited and has gobs of memory bandwidth it doesn't need, since the bottleneck is elsewhere.

It wouldn't be out of the question for AMD to go to 48 ROPs and redesign the layout using what they learned on Pitcairn. I wouldn't even be surprised if AMD removed some DP performance to accommodate the extra units (they can always reintroduce it with the HD9000 series on 20nm). Also, since the HD7970 is the first flagship part based on GCN, there are bound to be major inefficiencies in that first design in terms of unit balance (ROPs/TMUs/memory bandwidth) and transistor efficiency (which actually decreased since the HD6970).

You can improve rasterization performance by going with 2 Raster engines (with doubled grid setup <--- this strategy was used on HD5870 Cypress and on 6850/6870 Barts) or 3-4 Raster Engines but then chop some compute off.
 