[Videocardz] Maxwell 28 nm coming this year.


Mondozei

Golden Member
Jul 7, 2013
This is an old article, but it makes you wonder. http://www.extremetech.com/computin...y-with-tsmc-claims-22nm-essentially-worthless

"20 or 14nm cost barely goes below the previous one, no saving!"

Without even clicking the link, I'm sure the author is Hruska, who is known to be a bit of a bloviator, prone to extreme statements (living up to his employer's name).

And even if we assume that the base scenario is correct and Nvidia doesn't save much money, they don't really have a choice. Whichever foundry gets to 20 nm will get all the customers. Ditto 14 nm.

And at any rate, Intel has been saying the exact opposite about their 14 nm node: they do make savings, and in some cases the costs have come down faster than for 22 nm (if you want a source, just PM Witeken; he seems to have all the articles in his bookmarks, because in every discussion he gives out those sources). So if anything it's a TSMC-specific problem, and for AMD, at least, the jump to 14 nm is solved by going through Samsung.

TL;DR: Not really an industry-wide issue, and AMD won't go via TSMC for 14 nm anyway, so Nvidia could well follow suit.
 

Mondozei

Golden Member
Jul 7, 2013
Maxwell does well on power consumption compared to its predecessor, but it's also quite a bit larger in terms of die area: 1.87B transistors compared to its predecessor's 1.3B. You can't just take a 780 Ti, convert the cores to Maxwell, and get a nice reduction in power consumption plus a performance boost, because it won't fit; it's too big to be fabricated.

If we take BF4 as an example, the 750 Ti achieves 48 fps at high quality (1) and the 650 Ti gets 37.9. So while it gains in power efficiency, it loses out in transistor efficiency. That is actually one of the better cases; in a lot of games its lead is smaller. So while it's 27% faster, it's also 44% larger. This poses a problem when we start talking about large-die chips like the 780, because the spare transistor budget for a 44% increase just isn't there; it would be capped at a similar size to the 780, i.e. 7.1B or so. Given that, the performance of Maxwell would probably be lower than Kepler's, as it's less efficient with its transistors in raw performance.

What it looks like to me is that Maxwell is designed for a new silicon process. They reduced power consumption with the expectation of nearly double the transistor count on a new GPU on a new process; the end result would be similar power consumption to today's, a top GPU around 14B transistors, and a corresponding performance increase somewhere around 50-100%. But it's problematic using this architecture on 28 nm, because the transistor budget means the design is limited: it will be very power efficient, but it won't be faster unless something else changes.
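For anyone who wants to sanity-check the quoted percentages, here's the arithmetic spelled out (a quick sketch using only the numbers quoted above; note the GK107 assumption is questioned in the next reply):

```python
# Back-of-the-envelope check of the quoted 750 Ti vs 650 Ti figures.
# fps values are the BF4 numbers quoted above; transistor counts assume
# GM107 (750 Ti) vs GK107, as the quoted post does.
fps_750ti, fps_650ti = 48.0, 37.9
xtors_gm107, xtors_gk107 = 1.87e9, 1.30e9

perf_gain = fps_750ti / fps_650ti - 1      # ~0.27 -> "27% faster"
size_gain = xtors_gm107 / xtors_gk107 - 1  # ~0.44 -> "44% larger"
print(f"perf: +{perf_gain:.0%}, transistors: +{size_gain:.0%}")
```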

The 650 Ti is based on the GK106 die, not GK107, and GK106 has a transistor count of 2.54B. Going by your numbers, this means the 750 Ti beats the 650 Ti in power AND in transistor count. If you want to compare the 750 Ti to the 650, according to Anand's bench the 750 Ti leads the 650 by about 2x in most if not all games tested.

That's what I had initially thought until I looked up AT's article on the 650 Ti.

So, Saylick (and anyone else who ventures a guesstimate): even if we discount the fact that Maxwell was designed for 20 nm, what would you say is a reasonable estimate of what to expect from Maxwell on 28 nm, given what we know about the 750 Ti, for the replacements of the GTX 760 and 770?
 

Saylick

Diamond Member
Sep 10, 2012
So, Saylick (and anyone else who ventures a guesstimate): even if we discount the fact that Maxwell was designed for 20 nm, what would you say is a reasonable estimate of what to expect from Maxwell on 28 nm, given what we know about the 750 Ti, for the replacements of the GTX 760 and 770?

If we assume the SMM:SMX ratio holds true, i.e. 128 Maxwell cores (1 SMM) provide 90% of the performance of 192 Kepler cores (1 SMX), then you need about 1138 Maxwell cores to tie GK104 and about 2133 cores to tie GK110. The number of cores in GK104 is 1536, and dividing by the equivalent number of Maxwell cores gives you a ratio of 1.35. The same is true for GK110. This is where the 35% increase in instructions per core comes from.
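Spelling that math out (a minimal sketch; the 0.9 factor is nVidia's own SMM-vs-SMX claim, and everything else follows from it):

```python
# 1 SMM (128 Maxwell cores) ~= 90% of 1 SMX (192 Kepler cores), per nVidia.
SMM_CORES, SMX_CORES, SMM_VS_SMX = 128, 192, 0.90

per_core_ratio = SMM_VS_SMX * SMX_CORES / SMM_CORES  # 1.35x per core
gk104_equiv = 1536 / per_core_ratio  # ~1138 Maxwell cores to tie GK104
gk110_equiv = 2880 / per_core_ratio  # ~2133 Maxwell cores to tie GK110
print(f"{per_core_ratio:.2f}x per core, {gk104_equiv:.0f}, {gk110_equiv:.0f}")
```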

Since nVidia also claim that 128 Maxwell cores take up far less room than 128 Kepler cores, you can pack more Maxwell cores into the same area that would have been taken up by an equivalent number of Kepler cores. If 128 Maxwell cores took up just as much space as 128 Kepler cores, you'd see an improvement of only 35% (which is what we were given), but because 128 Maxwell cores take up LESS space than 128 Kepler cores, you're looking at a better than 35% improvement in performance for a given die area dedicated to shaders.

From Anandtech's bench, GK110 is roughly 2.5x faster than the 750 Ti and GK104 is roughly 2x faster than the 750 Ti, but GK110 has 4.5 times the number of shaders and GK104 has 2.4 times. Knocking these quantities down by 1.35 gives us 3.33 and 1.78, respectively. In other words, GK110 should be 3.33 times faster but is really 2.5 times, and GK104 should be 1.78 times faster but is actually 2x as fast, probably due to bandwidth limitations in the 750 Ti.
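The same 1.35x factor applied to the bench numbers, for anyone following along (the observed multipliers are my rough readings of Anandtech's bench, as above):

```python
# Predicted vs observed speedups over the 750 Ti (640 shaders).
shaders = {"750Ti": 640, "GK104": 1536, "GK110": 2880}
observed = {"GK104": 2.0, "GK110": 2.5}  # rough bench ratios quoted above

for chip in ("GK104", "GK110"):
    predicted = (shaders[chip] / shaders["750Ti"]) / 1.35
    print(f"{chip}: predicted {predicted:.2f}x, observed {observed[chip]}x")
# GK104: predicted 1.78x, observed 2.0x
# GK110: predicted 3.33x, observed 2.5x
```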

TL;DR: I expect nVidia to release something with maybe 16 SMMs, or about 2048 Maxwell cores, 128 TMUs, a 256-bit memory bus, and 32 ROPs. It should be decently faster than the GTX 770 but perhaps tied with the GTX 780 Ti until you get into really high resolutions, at which point the 780 Ti takes the lead. I wouldn't be surprised if the TDP is around the GTX 770's as well, i.e. ~180W.
 

USER8000

Golden Member
Jun 23, 2012
If we assume the SMM:SMX ratio holds true, i.e. 128 Maxwell cores (1 SMM) provide 90% of the performance of 192 Kepler cores (1 SMX), then you need about 1138 Maxwell cores to tie GK104 and about 2133 cores to tie GK110. The number of cores in GK104 is 1536, and dividing by the equivalent number of Maxwell cores gives you a ratio of 1.35. The same is true for GK110. This is where the 35% increase in instructions per core comes from.

Since nVidia also claim that 128 Maxwell cores take up far less room than 128 Kepler cores, you can pack more Maxwell cores into the same area that would have been taken up by an equivalent number of Kepler cores. If 128 Maxwell cores took up just as much space as 128 Kepler cores, you'd see an improvement of only 35% (which is what we were given), but because 128 Maxwell cores take up LESS space than 128 Kepler cores, you're looking at a better than 35% improvement in performance for a given die area dedicated to shaders.

From Anandtech's bench, GK110 is roughly 2.5x faster than the 750 Ti and GK104 is roughly 2x faster than the 750 Ti, but GK110 has 4.5 times the number of shaders and GK104 has 2.4 times. Knocking these quantities down by 1.35 gives us 3.33 and 1.78, respectively. In other words, GK110 should be 3.33 times faster but is really 2.5 times, and GK104 should be 1.78 times faster but is actually 2x as fast, probably due to bandwidth limitations in the 750 Ti.

TL;DR: I expect nVidia to release something with maybe 16 SMMs, or about 2048 Maxwell cores, 128 TMUs, a 256-bit memory bus, and 32 ROPs. It should be decently faster than the GTX 770 but perhaps tied with the GTX 780 Ti until you get into really high resolutions, at which point the 780 Ti takes the lead. I wouldn't be surprised if the TDP is around the GTX 770's as well, i.e. ~180W.

The GM107 only has one SMX, meaning it has no interconnects. Interconnect problems are what led to the original GF100 having such high power consumption (IIRC, that is what Nvidia stated). Scaling up the design will probably mean additional power consumption increases.
 

Saylick

Diamond Member
Sep 10, 2012
The GM107 only has one SMX, meaning it has no interconnects. Interconnect problems are what led to the original GF100 having such high power consumption (IIRC, that is what Nvidia stated). Scaling up the design will probably mean additional power consumption increases.

Interconnects weren't the reason Fermi had high power consumption; it was the combination of hot-clocking the shaders and the use of a complex hardware scheduler that made the power consumption very high. Kepler addressed these two issues directly by doing away with the hot clock and going back to a static scheduler. Fixing the issues with their memory controller not clocking high enough probably had an effect on power consumption as well, although I'm not entirely sure whether doing so would help lower the power draw.

With Maxwell, nVidia takes Kepler one step further by partitioning the SMX into 4 blocks, each with its own dispatch, crossbar, and execution units. With the improvements in instructions per clock, 128 Maxwell cores are capable of doing 90% of the work of 192 Kepler cores, so they decided to shrink the SMX down to 128 cores and call it an SMM. Since most of the chip is power gated, unused execution units can be switched off, but the level of granularity depends on how things are connected. With Kepler, if an SMX was at 50% utilization, I presume that meant the ENTIRE crossbar, ALL of the dispatch units, and ALL of the execution units had to be fired up, since all of the resources were grouped into one large partition. With an SMM at 50% utilization, you can switch off 2 of the 4 partitions and cut down on power. Therefore, you save power whenever the chip isn't fully utilized.
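To illustrate the granularity point with a toy model (purely illustrative, not nVidia's actual gating scheme): the coarser the partitioning, the more silicon has to stay powered for a given utilization.

```python
import math

def powered_fraction(utilization: float, partitions: int) -> float:
    """Fraction of the shader block that must stay powered on,
    assuming whole partitions are gated on/off."""
    return math.ceil(utilization * partitions) / partitions

print(powered_fraction(0.5, 1))  # Kepler-style single partition: 1.0
print(powered_fraction(0.5, 4))  # Maxwell-style 4-partition SMM: 0.5
```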

Secondly, there's the addition of a larger L2 cache (4 times the size of Kepler's), which is supposed to help reduce the time spent accessing RAM and thus save on power consumption as well.

Lastly, as Ryan suggests in his Maxwell article, there are small improvements made at the transistor level, taking advantage of the fact that the 28 nm node is already very mature. With 2 designs already done on 28 nm, nVidia has plenty of experience with the node and knows how much more they can squeeze out of it.

All the reasons I've listed above lead me to my estimate that GM104 will look like GK104 from a high level (except with an additional 33% more shaders, going from 1536 to 2048) but will have roughly the same power consumption as GK104, thanks to all of the other improvements made in power efficiency. Of course, I'm just pulling numbers out of a hat here, so take everything with a grain of salt. nVidia could just as easily throw more SMMs into the mix and scale up the design, but that means making a larger die a la GK110. At that point, it comes down to whether or not it's worth releasing a large-die Maxwell on 28 nm. Perhaps it's better to release something at a lower cost which trades blows with GK110 in very specific use cases but doesn't completely destroy it, so as not to cannibalize sales at the high end, i.e. the GTX TITAN Black. nVidia probably wants to milk GK110 for as much money as they can while they still can.
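As a rough check on that 16-SMM guess (all assumptions are from this thread; the 1.35 factor is derived above):

```python
# Hypothetical GM104: 16 SMMs of 128 cores each.
maxwell_cores = 16 * 128                     # 2048
gk104_cores, gk110_cores = 1536, 2880

shader_growth = maxwell_cores / gk104_cores  # ~1.33x vs GK104
kepler_equiv = maxwell_cores * 1.35          # ~2765 Kepler-core equivalents
vs_gk110 = kepler_equiv / gk110_cores        # ~0.96 -> trades blows with GK110
print(f"{shader_growth:.2f}x GK104 shaders, {vs_gk110:.0%} of GK110")
```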
 

blastingcap

Diamond Member
Sep 16, 2010
No need. GK110 dies are still in demand for Tesla/Quadro. GM104 would be a gamer (crippled-DP) GPU.
 

NTMBK

Lifer
Nov 14, 2011
No need. GK110 dies are still in demand for Tesla/Quadro. GM104 would be a gamer (crippled-DP) GPU.

Don't forget the rumours of a GK210. Looks like NVidia is bifurcating its "compute" and "gaming" lines, with a new GPU for HPC (GK210) and a new GPU for gamers (GM2xx).
 

Mand

Senior member
Jan 13, 2014
Don't forget the rumours of a GK210. Looks like NVidia is bifurcating its "compute" and "gaming" lines, with a new GPU for HPC (GK210) and a new GPU for gamers (GM2xx).

Isn't this pretty much contradicted by the Titan and Titan Z, which are a smashing together of the compute and gaming lines?
 

NTMBK

Lifer
Nov 14, 2011
Isn't this pretty much contradicted by the Titan and Titan Z, which are a smashing together of the compute and gaming lines?

Um, no. Titan is a compute card which just happens to be good at gaming. It is not a gaming-oriented design; it is aimed at DP FP for scientific computing. NVidia directly markets the Titan to Tesla developers as a "cheap" developer platform. Why do you think the 780 and 780 Ti exist?

And the divide is getting much more pronounced in the next generation- two different families of GPU (GK210 vs GM2xx).
 

witeken

Diamond Member
Dec 25, 2013
(if you want a source, just PM Witeken; he seems to have all the articles in his bookmarks, because in every discussion he gives out those sources)
I use Google. For example, do an image search for "14nm transistor scaling" and you'll get a bunch of marketing slides (and following the links behind those images will give you the articles).
 

B-Riz

Golden Member
Feb 15, 2011
Question of the day:

How would this thing be named?

nVidia has usually launched the high-end card first, e.g. the GTX 680 and GTX 780, followed by the GTX 670 and GTX 770.

If this is not really a "new new" card, since it's still based on 28 nm, would it be replaced in the future by a 20 nm card with the same performance specs, a true GTX 880?

I see this being a GTX 785 or something...

It feels like AMD's release of the R9 290 and R9 290X (even though Maxwell is a new architecture): a stop-gap before 20 nm is available for the volume needs of video cards.