[Pastebin/Forums] NVIDIA Maxwell GM1xx Specifications Leaked?


Cloudfire777

Golden Member
Mar 24, 2013
FYI:

TSMC is doing risk production on 20nm right now, and is ready to begin volume production of 20nm in February next year.

GPUs on 20nm will be out in 2014 for sure.

I'm thinking GPUs from AMD/Nvidia will be available around May-June, if yields are not catastrophic.
 

blackened23

Diamond Member
Jul 26, 2011
Risk production. LOL.

http://www.tsmc.com/tsmcdotcom/PRListingNewsAction.do?action=detail&newsid=4042&language=E

Issued on: 2009/08/24

The TSMC 28nm development and ramp-up has remained on track since the announcement made in September of 2008. The 28LP process is expected to enter risk production at the end of Q1 of 2010, followed closely by the 28HP risk production at the end of Q2 and the 28HPL risk production in Q3.

The 28nm LP process will serve as a fast time-to-market and low cost technology ideal for cellular and mobile applications. The 28nm HP process is expected to support devices such as CPUs, GPUs, Chipsets, FPGAs, networking, video game consoles, and mobile computing applications that are performance demanding. The 28nm HPL process features low power, low leakage, and medium-high performance. It is aimed to support applications such as cell phone, smart netbook, wireless communication and portable consumer electronics that demand low leakage.

All 28nm TSMC processes feature a comprehensive design infrastructure based on the company’s Open Innovation Platform™ to extend the power of the technology to a broad range of differentiating products.

I love TSMC's marketing. But if you think that everyone will suddenly have 20nm in early 2014, have fun with that.
 

Grooveriding

Diamond Member
Dec 25, 2008
I agree on 20nm coming in a year or so. Anything viable before then, Apple is probably taking.

This could explain why Nvidia has gouged so hard on pricing and products for 28nm, and why AMD has gone so long without a new 28nm lineup until now.

Going back over the past several nodes, what we should have seen was GTX 680 = GTX 660, GTX 780 = GTX 670, Titan = GTX 680 for the first batch, and then GTX 770 = GTX 760, Titan = GTX 770, Titan Ultra = GTX 780 for the second round. Instead we got hosed. Likewise with AMD we would have seen something similar, but they're going for the same move Nvidia did on their 28nm refresh: rebranding everything below their halo card and releasing a new die for the two top-end cards.

So with such a long wait till 20nm, AMD is just about to roll out their new cards to carry them through till 20nm, with performance reportedly better than everything from Nvidia except Titan, which the R9 290X is expected to match. If AMD prices it right, it will do well, push down NV prices, and present solid competition. So I guess the question is: will Nvidia just drop prices and ride it out till 20nm? Or will they actually do yet another refreshed lineup?! Maybe some even further harvested GK110s for some sort of 775. I can't see them getting the traction they did with Titan again if they try another $1000 single-GPU GTX card for a Titan Ultra. Overclocked 780s are already faster than Titan. So release yet another $1000 card that can match or slightly best an overclocked 780?

And just how bad will 20nm be price-wise and in terms of lifecycle? $1500 cards and four years of 20nm GPUs? Fail.
 

PPB

Golden Member
Jul 5, 2013
The main issue is: how long will 20nm last for GPUs, given TSMC's general trend of stretching out node lifecycles and how long 28nm has already lasted?

Would it finally allow a tick-tock model for GPU vendors? Instead of a new architecture and a node shrink in the same cycle, move the architecture revamps to replace the current rebrand... err, refresh model in a node's mid-lifespan, and have node shrinks on their own, with just minor efficiency tweaks (à la GCN 1.0 -> 1.1 with Bonaire and those new P-states that allow finer-grained efficiency)?
 

ShintaiDK

Lifer
Apr 22, 2012
PPB said:

The main issue is: how long will 20nm last for GPUs, given TSMC's general trend of stretching out node lifecycles and how long 28nm has already lasted?

Would it finally allow a tick-tock model for GPU vendors? Instead of a new architecture and a node shrink in the same cycle, move the architecture revamps to replace the current rebrand... err, refresh model in a node's mid-lifespan, and have node shrinks on their own, with just minor efficiency tweaks (à la GCN 1.0 -> 1.1 with Bonaire and those new P-states that allow finer-grained efficiency)?

A tick-tock model doesn't really work for GPUs. New uarchs on average have never made a noteworthy contribution on their own; they can even be slower due to the new features and abilities introduced.

A tick-tock model would also cost money, something AMD, for example, doesn't make on its GPUs. And I doubt nVidia would want to add that expense either.
 

raghu78

Diamond Member
Aug 23, 2012
Some random cooked-up specs. 320-bit for GM104 is so lame. 3840 ALUs is 2.5x the count of GK104's 1536, and to feed that, the memory bus is not even increased to 384-bit. Anyway, not a shred of truth in this.

I am thinking GM104 on 20nm must be taping out in Q4 2013 for a Q3 2014 release, so Nvidia will have to make do with the Kepler family for another 9-12 months. 20nm is not going to be as big a leap from 28nm as 28nm was from 40nm. But TSMC 16FF, that's a massive node.

http://forums.anandtech.com/showpost.php?p=35559109&postcount=22

GM100 is most likely not going to launch anytime in 2014; most probably Q3/Q4 2015 on a TSMC 16FF process. Now that should double GK110 perf. :biggrin:
 

RussianSensation

Elite Member
Sep 5, 2003
As we know, though, Kepler cores are approximately equal to 1/2 of a Fermi core. So Keys got it mostly right: GK104 doubled GF114, GK106 doubled GF116, and GK107 doubled GF118.

That only strengthens my point. Unless we know how Maxwell cores compare to Kepler cores, it's impossible to even start guessing at the number of CUDA cores. Then you have the node and die-size uncertainty.

The HD 6970 was 389mm2 on 40nm, but GCN (a new arch) on 28nm only increased SPs by 33% despite a 365mm2 die. Yet AMD fit 37.5% more SPs in Hawaii with only a 424mm2 die (just a 16% increase).
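
To put that density point in numbers, here's a quick sketch (Python; the SP counts are the reference figures those percentages imply, and Hawaii's 424mm2 was the number being reported at the time):

[code]
# SPs and die sizes (mm^2) as quoted above
chips = {
    "Cayman (HD 6970, 40nm)": (1536, 389),
    "Tahiti (HD 7970, 28nm)": (2048, 365),
    "Hawaii (R9 290X, 28nm)": (2816, 424),
}

for name, (sps, area) in chips.items():
    print(f"{name}: {sps / area:.2f} SPs/mm^2")

print(2048 / 1536 - 1)  # +33% SPs, Tahiti vs Cayman, on a slightly smaller die
print(2816 / 2048 - 1)  # +37.5% SPs, Hawaii vs Tahiti...
print(424 / 365 - 1)    # ...on a die only ~16% bigger, same 28nm node
[/code]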

We have no clue about the relationship of Maxwell cores to Kepler cores, since Maxwell is supposedly a brand-new architecture. We also have no idea if NV will change transistor density significantly vs. Kepler.

If someone told you the flagship Maxwell would have ~4000 or ~6000 CUDA cores, we'd have no way to rebut either claim.
 

BoFox

Senior member
May 10, 2008
NVIDIA Maxwell GM1xx Specifications Leaked?

i came around some maxwell plans from nvidia. quite interesting stuff. there will be two lines; maybe the second one aligns with the finfet stuff, don't know. anyway, that's what i gathered:

the smx structure changes slightly. nv did some optimization so that the dp alus can now also be used for sp: they support all sp instructions and can run in parallel. that means an smx now effectively has 256 alus. technically this reduces maxwell's dp rate to 1:4, but in practice it just boosts sp performance compared to kepler. nv figured out how to gate off the unused parts of the dp alus to keep power down when doing sp work.

but the real changes are in the cache area. that will boost efficiency big time.
first off, the registers are doubled per smx. more threads using a lot of registers can now run in parallel and hide latencies better. the caches got increased as well: the L1 cache, also used as shared memory, is now 128kb (doubled) and can be split between cache and shared memory in steps of 32/96, 64/64, or 96/32. maxwell keeps the 16 tmus per smx.

the gpcs usually consist of 3 smx but got changed quite a bit. there is still that geometry engine and stuff, but each gpc now includes 768kb of l2 cache, backing the r/w L1 as well as the read-only texture L1 in the smxs, and also serving as instruction cache for the smxs. all this gets topped off with a much larger l3 cache than in kepler. now to some numbers for the first line.

gm100:
8 gpc (8 triangles per clock), 24 smx, 384 tmus, 6144 alu, 8mb l3 (and there are also 8 l2s in the gpcs!), 64 rops, 512 bit interface, up to 8 gb @ 6+ ghz
target frequency for gf 930mhz, boost 1GHz
target frequency for tesla 850mhz, gives 2.61 dp tflops, double that of kepler, comes with 16gb

gm104:
5 gpc, 15 smx, 240 tmu, 3840 alu, 4mb l3, 40 rops, 320 bit interface (7 ghz), 2.5gb for cheap models, probably a lot of asymmetric 3gb or (symmetric again) 5gb models, target 1+ ghz, can do dp only with 1:16 rate

gm106:
3 gpc, 9 smx, 144 tmu, 2304 alu, 4mb l3, 24 rops, 192 bit interface, 7ghz, 3gb ram

gm108:
2 gpc, 4 smx, 64 tmu, 1024 alu, 2mb l3, 16 rops, 128bit interface, 2 gb ram

but the really interesting part is the refresh, probably waiting for tsmc's finfets. then 64-bit arm cores developed by nv get integrated on the same die. they can coherently access the common l3 cache. the big thing is that they will be used by the graphics driver to offload some heavy lifting from the system cpu. basically, most of the driver will be running on the gpu itself! nvidia expects this will give them at least the same speedup as amd will get from mantle, but without using a new api, with straight dx11 or opengl code! and it will also help with the new cuda version for maxwell, where one can access both gpu and cpu cores seamlessly.

the specs are planned to stay almost the same for gm110/114/116, except the 110 gets the full 8 arm v8 cores and a doubled l3 (16mb!) compared to the gm100. the finfets may also allow a further speed boost. the 8-arm-core version is actually called gm110soc, so maybe nv will start to market them as standalone processors for hpc. the consumer version is likely cut down to 4 arm cores, the same as gm114 will get (which also gets a doubled l3, 8mb). the gm116 will only get 2 cpu cores on die; i have not seen a gm118 mentioned.

http://pastebin.com/jm93g3YG
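
For what it's worth, the tesla number in that leak is internally consistent. A quick sanity check of the 1:4 DP claim (Python; the K20X specs are the official ones, everything else is straight from the pastebin):

[code]
# Leaked GM100 Tesla: 6144 sp alus at a 1:4 dp rate = 1536 dp lanes,
# each doing one FMA (2 flops) per clock, at the 850 MHz target
print(6144 / 4 * 2 * 0.85 / 1000)   # ~2.61 DP TFLOPS, matching the pastebin

# Official Tesla K20X (GK110): 2688 cores at a 1:3 dp rate, 732 MHz
print(2688 / 3 * 2 * 0.732 / 1000)  # ~1.31 DP TFLOPS, so "double that of
                                    # kepler" checks out
[/code]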

Looking at GTX 680 vs GTX 580, the GTX 680 has over 2x the GFLOPS muscle (largely thanks to 3x the shader processors) and practically identical bandwidth (plus a bit lower pixel fillrate), while boasting over 2.5x as much texturing power.
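
The rough arithmetic behind that, from the reference specs (a quick Python sketch):

[code]
# flops = shaders x 2 x clock; Fermi shaders ran at the 2x hot clock,
# Kepler's at the base clock
print(1536 * 2 * 1.006 / (512 * 2 * 1.544))  # ~1.95x GFLOPS, more with boost

# texturing = TMUs x clock
print(128 * 1.006 / (64 * 0.772))            # ~2.6x texel rate

# bandwidth = bus width / 8 x data rate (GB/s)
print(384 / 8 * 4.008, 256 / 8 * 6.008)      # ~192.4 vs ~192.3: a wash
[/code]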

Then let's assume that GM104 will be the only thing around on 20nm for a while (6 months or so) before GM100 comes out, if NV repeats the same pattern as with Kepler's GK104:

GM104 would still have pretty much the same bandwidth as GK110 (a little less, actually), and only about 1.5 times the TFLOPS muscle at, say, 1.2 GHz compared against a fully unlocked GK110 Titan Ultra. Yet it would hardly give any more texturing power, let alone pixel fillrate. Next-gen console ports are likely to use far bigger textures, so the performance advantage against Titan Ultra would be diminished in those games.
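
Spelling that estimate out (the 1.2 GHz and ~1.05 GHz clocks are my guesses; the GM104 numbers come from the leak):

[code]
# Leaked GM104 vs a hypothetical fully unlocked GK110 "Titan Ultra"
print(3840 * 2 * 1.2 / 1000)   # ~9.2 TFLOPS
print(2880 * 2 * 1.05 / 1000)  # ~6.0 TFLOPS -> roughly 1.5x

# bandwidth: leaked 320-bit @ 7 Gbps vs GK110's 384-bit @ 6 Gbps
print(320 / 8 * 7, 384 / 8 * 6)  # 280 vs 288 GB/s: "a little less"
[/code]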

Maxwell would surely have massive cache enhancements to try to make up for the relative lack of bandwidth, but the L3 cache would probably be aimed at managing the ARM cores.

It remains to be seen, but one thing I would bet my money on is that GM104 won't be half as big a jump over the biggest Kepler as GK104 was over the biggest Fermi.