[Pastebin/Forums] NVIDIA Maxwell GM1xx Specifications Leaked?

ams23

Senior member
Feb 18, 2013
907
0
0

around some maxwell plans from nvidia. quite interesting stuff. there will be two lines, maybe the second one aligns with the finfet stuff, don't know. anyway, that's what i gathered:

the smx structure changes slightly. nv did some optimization so that the dp alus can now also be used for sp: they support all sp instructions and can run in parallel with the sp alus. that means an smx now appears to have 256 alus. technically, that reduces maxwell's dp rate to 1:4, but in reality it just boosts the sp performance in comparison to kepler. nv found out how to gate off the unused parts of the dp alu to keep the power down when doing sp stuff.
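A rough sketch of the ratio arithmetic in that paragraph, in Python. The 192 SP + 64 DP split per SMX is GK110's layout and is only assumed to carry over here; the leak itself gives nothing beyond the 256 total and the 1:4 rate.

```python
from fractions import Fraction

# Assumption: GK110's per-SMX layout (192 SP ALUs + 64 DP ALUs) carries over.
SP_ALUS_PER_SMX = 192
DP_ALUS_PER_SMX = 64


def dp_to_sp_ratio(sp_alus, dp_alus, dp_units_do_sp):
    """Return the DP:SP throughput ratio for one SMX.

    If the DP ALUs can also issue SP instructions, they count toward
    the SP total, which lowers the DP:SP ratio without removing any DP units.
    """
    sp_capable = sp_alus + dp_alus if dp_units_do_sp else sp_alus
    return Fraction(dp_alus, sp_capable)


kepler_ratio = dp_to_sp_ratio(SP_ALUS_PER_SMX, DP_ALUS_PER_SMX, False)
leaked_ratio = dp_to_sp_ratio(SP_ALUS_PER_SMX, DP_ALUS_PER_SMX, True)
print(kepler_ratio, leaked_ratio)
```

Under that assumption the DP hardware is unchanged; the ratio drops from 1:3 to 1:4 only because the SP-capable count grows from 192 to 256.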

but the real changes are in the cache area. that will boost the efficiency big time.
first off, the registers are doubled per smx. more threads using a lot of registers can now run in parallel and better hide the latencies. and the caches got increased as well. the L1 cache also used as shared memory is now 128kb (doubled) and can be split between cache and shared memory in steps of 32/96, 64/64, or 96/32. maxwell keeps the 16 tmus per smx.
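To make the split options concrete, here is a small Python sketch that enumerates them. The 32 KB step is inferred from the three listed configurations, and the leak does not say which number in each pair is cache and which is shared memory, so the pairs are left unlabeled.

```python
L1_TOTAL_KB = 128  # per SMX, as claimed in the leak
STEP_KB = 32       # assumed granularity, inferred from the listed splits


def l1_shared_splits(total_kb=L1_TOTAL_KB, step_kb=STEP_KB):
    """Return every (first_kb, second_kb) split where both parts are non-zero."""
    return [(a, total_kb - a) for a in range(step_kb, total_kb, step_kb)]


splits = l1_shared_splits()
print(splits)
```

With a 32 KB step there are exactly three non-empty splits of 128 KB, which matches the 32/96, 64/64 and 96/32 options quoted above.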

the gpcs usually consist of 3 smx, but got changed quite a bit. there is still that geometry engine and stuff, but each gpc now includes 768kb of l2 cache, backing the r/w-L1 as well as the read only texture L1 in the smxs and also serving as instruction cache for the smx. all this gets topped off with a much larger l3 cache than in kepler. now to some numbers for the first line.

gm100:
8 gpc (8 triangles per clock), 24 smx, 384 tmus, 6144 alu, 8mb l3 (and there are also 8 l2s in the gpcs!), 64 rops, 512 bit interface, up to 8 gb @ 6+ ghz
target frequency for gf 930mhz, boost 1GHz
target frequency for tesla 850mhz, gives 2.61 dp tflops, double that of kepler, comes with 16gb

gm104:
5 gpc, 15 smx, 240 tmu, 3840 alu, 4mb l3, 40 rops, 320 bit interface (7 ghz), 2.5gb for cheap models, probably a lot of asymmetric 3gb or (symmetric again) 5gb models, target 1+ ghz, can do dp only with 1:16 rate

gm106:
3 gpc, 9 smx, 144 tmu, 2304 alu, 4mb l3, 24 rops, 192 bit interface, 7ghz, 3gb ram

gm108:
2 gpc, 4 smx, 64 tmu, 1024 alu, 2mb l3, 16 rops, 128bit interface, 2 gb ram
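A quick sanity check of the numbers above, as a Python sketch. It assumes the "6+ ghz" and "7 ghz" figures are effective per-pin memory data rates in Gbit/s, and that each ALU does one fused multiply-add (2 FLOPs) per clock; neither is stated in the leak. The `Chip` class and helper names are made up for this example.

```python
from dataclasses import dataclass
from typing import Optional

ALUS_PER_SMX = 256  # leak: an smx appears to have 256 alus
TMUS_PER_SMX = 16   # leak: maxwell keeps the 16 tmus per smx


@dataclass
class Chip:
    name: str
    smx: int
    alus: int
    tmus: int
    bus_bits: int
    mem_gbps: Optional[float]  # assumed per-pin data rate; None where the leak gives none


CHIPS = [
    Chip("gm100", 24, 6144, 384, 512, 6.0),
    Chip("gm104", 15, 3840, 240, 320, 7.0),
    Chip("gm106", 9, 2304, 144, 192, 7.0),
    Chip("gm108", 4, 1024, 64, 128, None),
]


def totals_consistent(chip):
    """Check the listed ALU/TMU totals against the per-SMX figures."""
    return (chip.alus == chip.smx * ALUS_PER_SMX
            and chip.tmus == chip.smx * TMUS_PER_SMX)


def bandwidth_gb_s(chip):
    """Peak memory bandwidth in GB/s: bus width in bytes times per-pin rate."""
    if chip.mem_gbps is None:
        return None
    return chip.bus_bits / 8 * chip.mem_gbps


def dp_tflops(alus, clock_mhz, dp_rate=0.25):
    """Peak DP TFLOPS, assuming 2 FLOPs (one FMA) per DP-capable ALU per clock."""
    return alus * dp_rate * 2 * clock_mhz * 1e6 / 1e12


for chip in CHIPS:
    print(chip.name, totals_consistent(chip), bandwidth_gb_s(chip))
print(round(dp_tflops(6144, 850), 2))
```

All four chips are internally consistent with 256 alus and 16 tmus per smx. Under the stated assumptions the bandwidths come out to 384, 280 and 168 GB/s for gm100, gm104 and gm106, and the tesla figure reproduces as 2.61 dp tflops at 850mhz with the 1:4 rate.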

but the refresh is where it gets really interesting, probably waiting for tsmc's finfets. then 64 bit arm cores developed by nv get integrated on the same die. they can coherently access the common l3 cache. the big thing is that they will be used by the graphics driver to offload some heavy lifting from the system cpu. basically most of the driver will be running on the gpu itself! nvidia expects this will give them at least the same speed up as amd will get from mantle, but without using a new api, just with straight dx11 or opengl code! and it will also help with the new cuda version for maxwell, where one can access both gpu as well as cpu cores seamlessly.

the specs are planned to stay almost the same for gm110/114/116, just the 110 gets a full 8 ARMv8 cores and a doubled l3 (16mb!) compared to the gm100. the finfets may also allow a further speed boost. the 8 arm core version is actually called gm110soc, so maybe nv will start to market them as standalone processors for hpc. the consumer version is likely cut down to 4 arm cores, the same as gm114 will get (which also gets a doubled l3 to 8mb). the gm116 will only get 2 cpu cores on die, and i have not seen a gm118 mentioned.

http://pastebin.com/jm93g3YG
 

wand3r3r

Diamond Member
May 16, 2008
3,180
0
0
Is this NV's attempt to mitigate mantle? Looks like they're concerned, although pastebin as a source implies it's just a fan trying to spread fud.

Might as well read tea leaves. Anyways the rumor mill appears to be gearing up to try to discredit the amd launch.
 
Feb 19, 2009
10,457
10
76
Fantasy GPUs on a fantasy node using fantasy finfet technology. Sounds legit.

Perhaps start the wild rumours closer to 20nm release next time, m'kay?
 

BallaTheFeared

Diamond Member
Nov 15, 2010
8,115
0
71
^ Pretty sure they were going for James Clerk Maxwell.


Nvidia has been talking about using an on-card ARM chip to offload cpu driver overhead for a while now.

Besides that what's so unbelievable? Doubling of units isn't anything special and Maxwell is a new uarch so expect changes. Not saying these are it, but keep an open mind ;)
 

Grooveriding

Diamond Member
Dec 25, 2008
9,147
1,329
126
With nonsense like this I wonder if it's a random nvidia fanboy who can't handle an AMD card launching and decides to make up a pile of nonsense and put it on a site like pastebin, or if it's nvidia's viral machine trying to get some attention in the midst of a big AMD launch.

Probably a more relevant discussion point than the random nonsense in the OP from pastebin. :awe:
 

Keysplayr

Elite Member
Jan 16, 2003
21,211
50
91
Well, I know where he got the 6144 alu number.

Picture GK110 with 15SMX enabled is 2880 cuda cores.
Now 15 SMX is an odd duck number. It should be 16SMX with 3072 cores. (akin to GT200 being 240 cores instead of 256 or GF100 being 480 cores instead of 512)
So, a "fully functional GK110" at 3072 cores doubled will give you 6144 cores.
Doubling of the cores moving to a new production process is normal.
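The arithmetic in this post can be sketched in a few lines of Python. The 192 cores per SMX is Kepler's figure; the 16-SMX part is the poster's hypothetical, not a shipping configuration. The last line compares against the leak's own route to the same number (24 smx at 256 alus each).

```python
CORES_PER_KEPLER_SMX = 192  # Kepler: 192 CUDA cores per SMX


def cuda_cores(smx_count, cores_per_smx=CORES_PER_KEPLER_SMX):
    """Total CUDA cores for a given number of enabled SMX units."""
    return smx_count * cores_per_smx


enabled_15 = cuda_cores(15)       # GK110 with 15 SMX enabled
hypothetical_16 = cuda_cores(16)  # the poster's hypothetical 16-SMX part
doubled = 2 * hypothetical_16     # "doubled" for the next generation
leak_total = cuda_cores(24, 256)  # the leak's gm100: 24 smx x 256 alus

print(enabled_15, hypothetical_16, doubled, leak_total)
```

Both routes land on 6144, so the number is equally consistent with "double a 16-SMX Kepler" and with the per-SMX figures given in the pastebin.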
 

Leadbox

Senior member
Oct 25, 2010
744
63
91
So not a "brand new architecture" just a doubling of Kepler with some ARM cores thrown in :eek:
 

ams23

Senior member
Feb 18, 2013
907
0
0
Most of what was speculated here could probably be deduced or guessed at using publicly available information, and the specs are certainly within reason for the most part. That said, there are some parts that don’t make sense to me. I will post more thoughts later in the weekend.

Maxwell should be introduced by March 2014 at GTC 2014, which is less than six months away from today (and which is a full 2 years from the day that Kepler was introduced). Exciting times ahead no doubt.

Of course, the elephant in the room is the Quadro K6000 which has 12 GB RAM, 5.2 TFLOPS single precision throughput, and 225w TDP, which shows that Kepler and the 28nm fabrication process still have some legs.
 

wand3r3r

Diamond Member
May 16, 2008
3,180
0
0
Where did you pull GTC 2014 as the release from? Last rumors I saw were 2H 2014 for 20nm. As far as I'm concerned it's pointless to speculate without any data especially when it might be up to a year away.
 

BallaTheFeared

Diamond Member
Nov 15, 2010
8,115
0
71
So not a "brand new architecture" just a doubling of Kepler with some ARM cores thrown in :eek:

Clearly not, according to this. Did you miss the Double Precision units capable of performing Single Precision ops?

Think of it like AVX: instead of having a worthless Floating Point unit in each of the four cores, you can now use the FP units to do integer work. So when you're running a very intense Single Precision application, Kepler and everything else use only the dedicated SP units, whereas Maxwell (as reported by this leak) would also leverage the power of the additional Double Precision units.

There is also the addition of an ARM cpu on the Graphics Card board, and whatever magic they need to make it work in a way that matters - removing the latency/speed issues sending information across a PCIe bus to the CPU for it to process that information, then send said information back to the GPU before it can do anything.


Seems a lot of the changes with Maxwell are aimed to increase efficiency of the product, clipping CPU overhead at the hardware level instead of through an API (will work in any game, not limited by having to pay companies to use your API). Then there are the units themselves, using what is now wasted die space for Double Precision to enhance Single Precision performance (SP is what games use).

This could easily double the performance of Titan (ignore the cpu overhead part, just the reworked DP units), which is a bit more than we would normally expect but nothing so blatantly stupid as the R9 290X rumors where it would be twice as fast as the 780 on 28nm. This still requires 20nm, which for all we know means we are discussing a rumor about a product over a year away, probably more if Nvidia goes GM104 first then waits a year to release their real flagship product.

I have no idea if it's even possible from a technical standpoint, but we didn't get where we are today doing the same thing over and over, innovation drives this market.

Grain of Salt.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Maxwell is like a year away. Just like any 20nm AMD GPU. So any specs in the next 6 months I would utterly ignore. Nothing but hype and hopes. Not to mention there are no finfets on the 20nm process. Well, depending on how you see TSMC's "16nm".

Utterly fake specs.
 

skipsneeky2

Diamond Member
May 21, 2011
5,035
1
71
And here we go....with the many pages of vaporware.:awe: