nVidia's next move - flagship using GF104 tech?

Hauk

Platinum Member
Nov 22, 2001
2,806
0
0
nVidia makes a good showing with the revised GF104 architecture. Where could they take it? Could they make a flagship level product that's more powerful and efficient than GTX 480?

There are some very smart people here. It would be great to hear what nVidia's options may be. Is a speedy revision possible, how would they design it, what would the power profile look like, etc. We know ATI is working on their next release. What's nVidia's next move? For reference, here are segments from AT's review. Thanks to AT for the excellent review:

"On GF100, there were 4 GPCs each containing a Raster Engine and 4 SMs. In turn each SM contained 32 CUDA cores, 16 load/store units, 4 special function units, 4 texture units, 2 warp schedulers with 1 dispatch unit each, 1 Polymorph unit (containing NVIDIA’s tessellator) and then the L1 cache, registers, and other glue that brought an SM together."
fullGF100.jpg



"GF104 in turn contains 2 GPCs, which are effectively the same as a GF100 GPC. Each GPC contains 4 SMs and a Raster Engine. However when we get to GF104’s SMs, we find something that has all the same parts as a GF100 SM, but in much different numbers. NVIDIA beefed up the number of various execution units per SM. The 32 CUDA cores from GF100 are now 48 CUDA cores, while the number of SFUs went from 4 to 8 along with the texture units. As a result, per SM GF104 has more compute and more texturing power than a GF100 SM. This is how a “full” GF104 GPU has 384 CUDA cores even though it only has half the number of SMs as GF100."
fullGF104.jpg
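
As a quick check on the figures quoted above, here is a minimal Python sketch of the core-count arithmetic. The function name `total_cuda_cores` is made up for illustration; the unit counts are the ones given in the review excerpts, and a "full" chip is assumed to have nothing disabled.

```python
def total_cuda_cores(gpcs: int, sms_per_gpc: int, cores_per_sm: int) -> int:
    """Total CUDA cores for a full chip (no SMs disabled)."""
    return gpcs * sms_per_gpc * cores_per_sm

# GF100: 4 GPCs x 4 SMs x 32 cores; GF104: 2 GPCs x 4 SMs x 48 cores
gf100 = total_cuda_cores(gpcs=4, sms_per_gpc=4, cores_per_sm=32)
gf104 = total_cuda_cores(gpcs=2, sms_per_gpc=4, cores_per_sm=48)
print(gf100, gf104)  # 512 384
```

So GF104 ends up with three quarters of GF100's cores from half the SMs, which matches the 384 figure in the quote.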
 

Scali

Banned
Dec 3, 2004
2,495
0
0
I think nVidia's best bet would be to use the improved superscalar architecture to build a smaller chip that delivers the same performance as GTX470/480, rather than trying to build a mega-chip like GF100 again.
They already lead in absolute GPU performance. With a cheaper and more power-efficient die, they have a better chance competing on price, and perhaps building a nice dual-GPU card to take on the 5970.

I'm not sure how far away the next process shrink is for TSMC though. Perhaps nVidia wants to wait for that before they refresh their high-end products.
I think the GTX460 and possibly another lower end range will be more important though, as they are higher volume parts. It looks like nVidia will be doing okay in the volume markets for now.
 
Jan 27, 2009
182
0
0
I think that the significant aspect of GF104 is that it is once again a gaming card. The dropping of the ECC (only essential for the Tesla parts) is a good thing. I hope they continue to differentiate designs for the different market sectors.

The past three months have shown that the performance GPU market is hyper-competitive not only in performance/$ but also in performance/Watt. Nvidia had to trim some of the 'fat' from their first Fermi design to get back to a competitive state. GF104 looks like a step toward this.

For me, still not that excited. My current GTX260 SLI rig that has been rocking 18 months is still running amazingly fast at 1920x1200 and I bought my cards cheaper than they currently sell for all the way back then :(
 

Scali

Banned
Dec 3, 2004
2,495
0
0
It seems that the early rumours of a 'GTX475' were actually about the GTX460 1 GB model.
 

Sylvanas

Diamond Member
Jan 20, 2004
3,752
0
0
Some thoughts on the topic:

It's interesting to note Nvidia's newfound interest in ILP, and in extracting ILP at the hardware level at that. Having 2 dispatch units per scheduler was a great addition to an otherwise well-rounded (if perhaps lacking in texture units) modular architecture. But they'd have to go for more than 480 cores if they wanted it to pay off at the high end. If the current GF104 implementation were scaled up to 10 SMs for 480 cores, some of the time it would perform as well as a GTX480, and other times it would perform only a little better than a 316-core part. That is the worst-case scenario: 33% of each GF104 SM is only used on the chance that the next instruction is independent of the preceding one, so 480*0.66 is roughly 316 cores in use 100% of the time. A CUDA core on GF104 takes up the same die space as a CUDA core on GF100, so why go above 480 cores on a high-end part using this architecture, when you're only guaranteed 66% of your execution resources all of the time, and anything more depends on the code? If GF100 scaled above 480 cores it would make use of 100% of its execution resources 100% of the time, so all die space would be utilised optimally, but as we know, in its current state thermals are getting to the upper end of what's acceptable. They need a die shrink for this to work at the high end.
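
The worst-case arithmetic above can be sketched roughly in Python. This takes the post's premise as an assumption: one third of each GF104 SM's cores are only fed when a second, independent instruction is available. The 10-SM chip is hypothetical, and `core_range` is an illustrative name, not anything from Nvidia.

```python
from fractions import Fraction

CORES_PER_SM = 48             # GF104 SM, per the AT review
GUARANTEED = Fraction(2, 3)   # share of cores kept busy with no ILP (the post's premise)

def core_range(sms: int) -> tuple:
    """Return (worst-case, best-case) busy cores for a GF104-style chip."""
    total = sms * CORES_PER_SM
    return int(total * GUARANTEED), total

worst, best = core_range(10)  # hypothetical 10-SM part
print(worst, best)  # 320 480
```

With an exact two thirds the floor comes out at 320 cores; the 316 figure in the post comes from rounding two thirds down to 0.66. Either way the point stands: the guaranteed throughput of such a part sits well below a 480-core GF100.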

GF104 makes sense as a midrange part where a new die (independent of a 'cut down' GF100) is required to utilise a smaller die effectively, which is exactly what Nvidia have done.

EDIT: I forgot to mention the doubling of SFUs and texture units per SM over GF100. This would actually help the case of a 10-SM GF104 part; how much is harder to quantify, but I don't think it would overcome the 66% performance guarantee of 480 CUDA cores. Not to mention all those extra units would probably oversaturate the texturing and transcendental power of the GPU, using die space that could be utilised more effectively for something else. It's important to get the 'balance' right with Fermi. That's something I remember AT commenting on before, and something Nvidia have positioned themselves to do well with two different derivatives with 2 separate goals in mind.
 

Cookie Monster

Diamond Member
May 7, 2005
5,161
32
86
I think they might use this with other enhancements to build a 768CC chip on 28nm process tech to go up against whatever AMD has in store for us next. Until then, a dual-GPU solution using two GF104s might take over the high-end market for nVIDIA, and a GTX470 replacement in the form of a full-fledged GF104 chip on steroids could fill some of the void left behind.

The GF104 is not without its weaknesses, however, such as the bottleneck at the registers and L1 cache, since those stayed the same in quantity. Also, it can only output 2 pixels/clock per SM, for a total of 14 pixels/clock on the GTX460's 7 enabled SMs, which is why the 768MB model is pretty close to its 1GB variant even with an 8 ROP difference.
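
A rough Python sketch of that pixel-rate argument. It assumes a deliberately simple model in which each ROP retires one pixel per clock and the slower of the two stages sets the rate; the 7 enabled SMs and the 24 vs 32 ROP counts are the GTX460 768MB/1GB specs as I understand them, and `pixels_per_clock` is an illustrative name.

```python
PIXELS_PER_SM_PER_CLOCK = 2  # figure given in the post

def pixels_per_clock(enabled_sms: int, rops: int) -> int:
    """Pixel rate is capped by the narrower stage: SM output or ROP count."""
    return min(enabled_sms * PIXELS_PER_SM_PER_CLOCK, rops)

# GTX460: 7 enabled SMs; 24 ROPs (768MB) vs 32 ROPs (1GB)
print(pixels_per_clock(7, 24), pixels_per_clock(7, 32))  # 14 14
```

Under this model both cards are limited by the SMs at 14 pixels/clock, so the extra 8 ROPs on the 1GB card do not raise the pixel rate on their own.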
 

Scali

Banned
Dec 3, 2004
2,495
0
0
I think the more interesting question is: what is AMD's next move?
With GTX460, nVidia has fixed pretty much all of Fermi's weaknesses:
- Price
- Power consumption
- Heat
- Noise

While retaining its strengths:
- Cuda/GPGPU
- Tessellation

AMD can lower the prices of their current products, but in the long run they'll need an answer to Fermi's strengths, so an updated architecture is in order, and better driver/SDK support.
It seems that the ball is in AMD's court now.
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
You mean, something like this?

My guess is that a fully unlocked GF104 (all 384 shaders) with a higher reference clock than the gtx460 will make up the gtx475. I don't think they're going to come out with an all new chip this year that would play second fiddle to the gtx480 in performance.
 

RaistlinZ

Diamond Member
Oct 15, 2001
7,470
9
91
I would like to see a dual GPU 460 card at the $400 price point. Something that could overclock well and come close to a stock 5970 for $300 less.
 

Dribble

Platinum Member
Aug 9, 2005
2,076
611
136
My guess is that a fully unlocked GF104 (all 384 shaders) with a higher reference clock than the gtx460 will make up the gtx475. I don't think they're going to come out with an all new chip this year that would play second fiddle to the gtx480 in performance.

Don't know if those few extra shaders could beat a GTX 470 - the clocks would need to be pretty high, it would be more like a GTX 468 or something :)

I think they are really there for the dual card - I bet a dual 384 shader card was what nvidia worked out they needed to beat the Radeon 5970.
 

Hauk

Platinum Member
Nov 22, 2001
2,806
0
0
Don't know if those few extra shaders could beat a GTX 470 - the clocks would need to be pretty high, it would be more like a GTX 468 or something :)

Yea I don't think a fully functional GF104 would be enough to beat 470.

Regarding number of SMs per GPC, is it limited to four? What if GF104 had two GPCs with six fully functional SMs per GPC?
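
For scale, a small Python sketch of what that hypothetical layout would add up to. Whether a GPC can actually hold more than four SMs is not something the review excerpts settle, so this is arithmetic only; `gf104_style_cores` is an illustrative name.

```python
CORES_PER_SM = 48      # GF104 SM, per the AT review
GTX480_CORES = 480     # shipping GTX 480

def gf104_style_cores(gpcs: int, sms_per_gpc: int) -> int:
    """CUDA cores for a chip built from GF104-style SMs."""
    return gpcs * sms_per_gpc * CORES_PER_SM

actual = gf104_style_cores(2, 4)        # full GF104 as described
hypothetical = gf104_style_cores(2, 6)  # assumed 6 SMs per GPC
print(actual, hypothetical, hypothetical / GTX480_CORES)  # 384 576 1.2
```

On paper that is 576 cores, 20% more than a GTX 480, though raw core count says nothing about how well the extra SMs would be fed.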
 

edplayer

Platinum Member
Sep 13, 2002
2,186
0
0
Don't know if those few extra shaders could beat a GTX 470 - the clocks would need to be pretty high, it would be more like a GTX 468 or something :)


It wouldn't surprise me to see Nvidia replace the GTX 470 with this card. Even if it performed slightly lower, it wouldn't stop them from doing it.
 
May 25, 2003
100
0
0
We are already seeing signs that the 460s will end up being sold over MSRP.

I would love to see ATi finally lower the price of their 5800 series.