NVIDIA Pascal Thread

Sweepr · Apr 8, 2016

From the other thread:

xpea said:
back to topic:
http://www.hardwareluxx.de/index.ph...it-pascal-gpu-und-samsung-gddr5-speicher.html

GP106 spotted in Revision A1 (should be the production one) and made in week 13 of 2016 (we are in week 15). Still hot from the oven!

For such a small chip, qualification should be fast and we can expect to see boards in retail before September.

Then on the Chiphell forum, someone showed a ~300mm² Pascal with 8GB GDDR5 (link doesn't work for me but here it is: https://www.chiphell.com/thread-1563086-1-1.html )

Looks like Nvidia is going to deploy Pascal from top to bottom at a very fast pace (this is where deep pockets and a big R&D budget help)

Head1985 said:
GP106 should be a 180-200mm² SKU
http://www.hardware.fr/news/14589/gtc-200-mm-petit-gpu-pascal.html
GP104: 350-400mm²

GP106

GP104 - ~ 290-300mm²
GP106 - ~195-205mm²

DooKey · Apr 8, 2016

I'm starting to feel that itch for new shinies. Come to me my precious ones.........AMD or NV.....just give me some new shinies to buy!!

Sweepr · Apr 8, 2016

DooKey said:
I'm starting to feel that itch for new shinies. Come to me my precious ones.........AMD or NV.....just give me some new shinies to buy!!

Looks like the fun starts June/July. Exciting times ahead.

Sweepr · Apr 8, 2016

GP104 Die Pictured

According to junmiu @ Chiphell this is GP104 (GM204 successor), and it measures ~290-300mm². At the right there's Samsung K4G80325FB - 1.5V 8Gb 8Gbps (8000MHz) GDDR5. Word on the street is 2560 SPs.

www.samsung.com/semiconductor/global/file/insight/2015/08/PSG2014_2H_FINAL-1.pdf

Adored · Apr 8, 2016

That's identical to GK104's die size, does anyone have the actual measurements of that?

MrTeal · Apr 8, 2016

Wouldn't 2560 CCs make more sense than 3072, given how the cores are arranged in GP100? Granted GP104 might not maintain the same GPC:TPC:SM:CC ratio as GP100, but that does seem likely. GK110 had the same FP32 CUDA Core number per SMX as GK104, just with additional FP64 units. GM204 and GM200 shared the same SMM to CC ratio as well.

Kris194 · Apr 8, 2016

Glo. said:
GTX X80 - 3072 CUDA cores, with updated Maxwell to Pascal Arch, and higher core clocks.
GTX X70 - 2560 CUDA cores.

I don't think so. If they keep the 2:4:6 ratio, it will be more like:

x80 - 2560 CUDA cores
x70 - 2304(?) CUDA cores

To have 3072 cores, GP104 would need 4.8 GPCs, which doesn't make any sense.
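For anyone who wants to check the 4.8 figure, here is a rough sketch of the arithmetic. It assumes GP104 keeps the full GP100 layout (6 GPCs, 60 SMs, 64 FP32 cores per SM, so 640 cores per GPC), which is exactly the open question in this thread. The function name `gpcs_needed` is made up for illustration.

```python
# Full GP100 layout. Applying it to GP104 is an assumption, not a known fact.
GP100_GPCS = 6
GP100_SMS = 60
CORES_PER_SM = 64

SMS_PER_GPC = GP100_SMS // GP100_GPCS        # 10 SMs per GPC
CORES_PER_GPC = SMS_PER_GPC * CORES_PER_SM   # 640 FP32 cores per GPC


def gpcs_needed(cuda_cores, cores_per_gpc=CORES_PER_GPC):
    """Number of GP100-style GPCs needed to reach a given CUDA core count."""
    return cuda_cores / cores_per_gpc


if __name__ == "__main__":
    for cores in (2560, 3072):
        print(f"{cores} CUDA cores -> {gpcs_needed(cores):.1f} GPCs")
```

Under that assumption 2560 cores is a whole number of GPCs and 3072 is not. If GP104 uses a different SM size, the result changes.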

Head1985 · Apr 8, 2016

I think AMD will win this generation. Vega 11 with HBM2 and 4096 SPs will crush this.

Pascal looks pretty lame.
256-bit 8GHz GDDR5, 64 ROPs, 2560 SPs is just lame.
It had better have 3072 SPs or it's crap.

el etro · Apr 8, 2016

Good point on the memory chips, they look awesome and will make the PCB smaller. I just wanted it to be GDDR5X, which would push the bandwidth of the chips higher without the need for wider, hungrier buses.

jpiniero · Apr 8, 2016

If it's really 300 mm2, then my pricing estimates of $399/$649 are likely too low. Maybe it does have 384-bit memory. This does seem like it could be trouble if it has the same core:SM ratio that GP100 does.

ShintaiDK · Apr 8, 2016

Head1985 said:
I think AMD will win this generation. Vega 11 with HBM2 and 4096 SPs will crush this.

Pascal looks pretty lame.
256-bit 8GHz GDDR5, 64 ROPs, 2560 SPs is just lame.
It had better have 3072 SPs or it's crap.

It's Vega 10 that has 4096 SPs, not Vega 11.

Also they may not compete at all.

Polaris 10 is the first one, and that looks to be 2304ish SP and 256bit GDDR5.

Unless GP104 pulls a GDDR5X or HBM2. Then nobody with a GTX970/290 and up is going to upgrade this year.

Glo. · Apr 8, 2016

Head1985 said:
I think AMD will win this generation. Vega 11 with HBM2 and 4096 SPs will crush this.

Pascal looks pretty lame.
256-bit 8GHz GDDR5, 64 ROPs, 2560 SPs is just lame.
It had better have 3072 SPs or it's crap.

If you are correct, that means a 232 mm² GPU with 2560 GCN4 cores will compete in performance with a 300 mm² GPU with 2560 CUDA cores, and will win in a clock-for-clock comparison.

But overall you are correct: the X80 will have 2560 CUDA cores.

MrTeal · Apr 8, 2016

ShintaiDK said:
It's Vega 10 that has 4096 SPs, not Vega 11.

Also they may not compete at all.

Polaris 10 is the first one, and that looks to be 2304ish SP and 256bit GDDR5.

Unless GP104 pulls a GDDR5X or HBM2. Then nobody with a GTX970/290 and up is going to upgrade this year.

Source? That's the same number as Fiji, which seems extremely unlikely for the next gen flagship. It also leaves very little room between 4096 and the proposed P10 at 2304 SP for Vega 11 to slot in at. Hawaii was 41% larger than Tahiti; Fiji was 41% larger than Hawaii. The gap at the start of GCN was even larger, with Tahiti being 60% more shaders than Pitcairn while Pitcairn was twice as many as Cape Verde. Even if you space the 14nm chips out evenly, that would give V11 1/3rd more shaders than P10 and V10 1/3rd more shaders than V11.
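The "1/3rd more" spacing above can be reproduced with a quick sketch. It reads "spaced evenly" as equal ratios between neighbouring chips (geometric spacing), which is an interpretation rather than something stated outright, and it uses the rumoured 2304 and 4096 shader counts from this thread. The helper name `evenly_spaced_tiers` is hypothetical.

```python
def evenly_spaced_tiers(low, high, tiers):
    """Shader counts from low to high with the same ratio between each step."""
    ratio = (high / low) ** (1 / (tiers - 1))
    return [low * ratio ** i for i in range(tiers)]


if __name__ == "__main__":
    # Rumoured endpoints from the thread: P10 at 2304 SP, V10 at 4096 SP.
    p10, v11, v10 = evenly_spaced_tiers(2304, 4096, 3)
    print(f"step between tiers: {v11 / p10 - 1:.1%} more shaders")
    print(f"implied Vega 11: about {round(v11)} SP")
```

With those two endpoints the step works out to one third, putting a middle chip at roughly 3072 SP.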

Adored · Apr 8, 2016

Nvidia aren't going to build a 300mm2 bandwidth-starved GPU on 16FF+. Either it has a 384-bit bus or 256-bit bus with compression + arch enhancements meaning bandwidth constraints are further lessened. I prefer the latter.

Don't forget that Samsung's process is denser than TSMC's as well, so AMD's 232mm2 could be closer to Nvidia's 275mm2. AMD also have a history of wider buses in smaller area, look at Hawaii and Tonga, however the memory amounts in the leaks don't appear to support a 384-bit bus.

ShintaiDK · Apr 8, 2016

MrTeal said:
Source? That's the same number as Fiji, which seems extremely unlikely for the next gen flagship. It also leaves very little room between 4096 and the proposed P10 at 2304 SP for Vega 11 to slot in at. Hawaii was 41% larger than Tahiti; Fiji was 41% larger than Hawaii. The gap at the start of GCN was even larger, with Tahiti being 60% more shaders than Pitcairn while Pitcairn was twice as many as Cape Verde. Even if you space the 14nm chips out evenly, that would give V11 1/3rd more shaders than P10 and V10 1/3rd more shaders than V11.

http://hexus.net/tech/news/graphics/91592-amd-greenland-vega-10-said-4096-stream-processors/

jpiniero · Apr 8, 2016

Adored said:
Nvidia aren't going to build a 300mm2 bandwidth-starved GPU on 16FF+. Either it has a 384-bit bus or 256-bit bus with compression + arch enhancements meaning bandwidth constraints are further lessened. I prefer the latter.

I suppose one option is that both products are cut so that 8 GHz GDDR5 would be enough, and then a followup in 2017 would up the core counts and add GDDR5X. Or it's going to be an epic paper launch just to mess with AMD and Polaris but won't really be available until September or October.

Head1985 · Apr 8, 2016

ShintaiDK said:
http://hexus.net/tech/news/graphics/91592-amd-greenland-vega-10-said-4096-stream-processors/

Vega 10 is speculation. It can't be Vega 10, because then there is no room for Vega 11.
Polaris 10 2560SP
Vega 10 4096SP
Vega 10 cut 3584SP

Where is Vega 11?
Vega 11 3072SP
Vega 11 cut 2560SP? Same SP count as Polaris 10? No way.

DooKey · Apr 8, 2016

Adored said:
Nvidia aren't going to build a 300mm2 bandwidth-starved GPU on 16FF+. Either it has a 384-bit bus or 256-bit bus with compression + arch enhancements meaning bandwidth constraints are further lessened. I prefer the latter.

Don't forget that Samsung's process is denser than TSMC's as well, so AMD's 232mm2 could be closer to Nvidia's 275mm2. AMD also have a history of wider buses in smaller area, look at Hawaii and Tonga, however the memory amounts in the leaks don't appear to support a 384-bit bus.

I hope you're right. Unfortunately we're talking about Glofo executing Samsung tech and they don't have the best reputation for execution...

ShintaiDK · Apr 8, 2016

Head1985 said:
Vega 10 is speculation. It can't be Vega 10, because then there is no room for Vega 11.
Polaris 10 2560SP
Vega 10 4096SP
Vega 10 cut 3584SP

Where is Vega 11?
Vega 11 3072SP
Vega 11 cut 2560SP? Same SP count as Polaris 10? No way.

If Polaris 10 doesn't get GDDR5X, it's bottlenecked. Then a 2500-2800 SP cut-down Vega 11 part is going to be much faster. Not to mention it could have another TMU/ROP layout as well.

swilli89 · Apr 8, 2016

ShintaiDK said:
If Polaris 10 doesn't get GDDR5X, it's bottlenecked. Then a 2500-2800 SP cut-down Vega 11 part is going to be much faster. Not to mention it could have another TMU/ROP layout as well.

You make so many matter of fact statements without knowing so many constants. You don't know Polaris:

uArch changes
sp performance changes
memory compression changes
clockspeed
memory speed

Why don't you let AMD engineers worry about matching up a memory interface with their graphics core?

Adored · Apr 8, 2016

ShintaiDK said:
If Polaris 10 doesn't get GDDR5X, it's bottlenecked. Then a 2500-2800 SP cut-down Vega 11 part is going to be much faster. Not to mention it could have another TMU/ROP layout as well.

Most people would have said the 980 would be totally bottlenecked with bandwidth too but it doesn't appear to be the case. I prefer to compare a Polaris 10 at ~Fury X performance level to the 980 rather than Hawaii.

Yes it means AMD will need to have made another leap in compression or other architectural advances, but I don't feel it's impossible. I fully expect Nvidia to have made a similar leap if not greater.

I think we are all going to be surprised at just how far a 256-bit bus can stretch.

ShintaiDK · Apr 8, 2016

Adored said:
Most people would have said the 980 would be totally bottlenecked with bandwidth too but it doesn't appear to be the case. I prefer to compare a Polaris 10 at ~Fury X performance level to the 980 rather than Hawaii.

Yes it means AMD will need to have made another leap in compression or other architectural advances, but I don't feel it's impossible. I fully expect Nvidia to have made a similar leap if not greater.

I think we are all going to be surprised at just how far a 256-bit bus can stretch.

The 980 is bandwidth bottlenecked.

If you dream of Fury X performance level, then there is a really long way to go to 512GB/sec. And even Fury X benefits from faster memory. So does Hawaii. And now some 192-224GB/sec bus will be enough?
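The bandwidth numbers being thrown around here come from the usual peak formula: bus width in bytes times per-pin data rate. A minimal sketch follows; the function name is made up, the 6 and 7 Gbps GDDR5 rates are assumed as the configurations behind the 192-224GB/sec range, and Fury X is taken as 4096-bit HBM at 1 Gbps per pin. These are theoretical peaks and ignore compression and real-world efficiency.

```python
def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Theoretical peak memory bandwidth in GB/s.

    bus_width_bits: memory interface width in bits
    data_rate_gbps: effective per-pin data rate in Gbit/s
    """
    return bus_width_bits / 8 * data_rate_gbps


if __name__ == "__main__":
    configs = [
        ("256-bit GDDR5 @ 6 Gbps", 256, 6),
        ("256-bit GDDR5 @ 7 Gbps (GTX 980)", 256, 7),
        ("256-bit GDDR5 @ 8 Gbps (rumoured GP104)", 256, 8),
        ("384-bit GDDR5 @ 8 Gbps", 384, 8),
        ("4096-bit HBM @ 1 Gbps (Fury X)", 4096, 1),
    ]
    for name, width, rate in configs:
        print(f"{name}: {peak_bandwidth_gb_s(width, rate):.0f} GB/s")
```

So a 256-bit bus with the 8 Gbps Samsung chips from the leak would peak at 256 GB/s, half of Fury X's raw figure.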

ShintaiDK · Apr 8, 2016

swilli89 said:
You make so many matter of fact statements without knowing so many constants. You don't know Polaris:

uArch changes
sp performance changes
memory compression changes
clockspeed
memory speed

Why don't you let AMD engineers worry about matching up a memory interface with their graphics core?

If it bugs you so much, ignore it.

Adored · Apr 8, 2016

ShintaiDK said:
The 980 is bandwidth bottlenecked.

It might be slightly held back perhaps, but quite far from bottlenecked.

At TPU the 980 is 19% behind the 980 Ti at 1080p, 24% at 1440p and 27% at 4K - http://www.techpowerup.com/reviews/Gigabyte/GTX_980_Ti_XtremeGaming/23.html

Memory amount surely counts in favour of the 980 Ti as well in some games. Is the 980 really being badly bottlenecked with its 256-bit bus?

ShintaiDK · Apr 8, 2016

Adored said:
It might be slightly held back perhaps, but quite far from bottlenecked.

At TPU the 980 is 19% behind the 980 Ti at 1080p, 24% at 1440p and 27% at 4K - http://www.techpowerup.com/reviews/Gigabyte/GTX_980_Ti_XtremeGaming/23.html

Memory amount surely counts as well in some games.

Is the 980 really being badly bottlenecked with its 256-bit bus?

I own a GTX980, the answer is yes.
