1000 shader APU - when?

Mac29 · Nov 16, 2014

Not finding anything on this, even on other sites.

Can anyone venture a guess how long it may take for AMD to make an
APU that delivers 1000 shaders? Their next iteration is supposed to still
use DDR3 so I'm thinking sometime when DDR4 is mainstream but when?

Thanks,

Mac

lyssword · Nov 16, 2014

Yeah, I think it's a memory issue. Even 512 cores are currently limited by slow RAM.

AtenRa · Nov 16, 2014

I'd say it could be in 2016.

VirtualLarry · Nov 16, 2014

Not until we get stacked HBM. Current APUs are already starved for memory bandwidth.

frozentundra123456 · Nov 16, 2014

VirtualLarry said:
Not until we get stacked HBM. Current APUs are already starved for memory bandwidth.

And probably a die shrink or even two. Stacked memory plus twice the number of current shaders would be a huge, hot chip.

Blitzvogel · Nov 16, 2014

frozentundra123456 said:
And probably a die shrink or even two. Stacked memory plus twice the number of current shaders would be a huge, hot chip.

Taking the PS4 APU as a starting point, you could replace the 2 Jaguar clusters with two Carrizo modules, have the full 20 CUs or go with 16 CUs (1024 Shaders), implement 2 GB on board HBM and also have an external DDR4 interface and do it on 20 nm. It would be a compelling low end gaming APU.

monstercameron · Nov 16, 2014

NUSNA_Moebius said:
Taking the PS4 APU as a starting point, you could replace the 2 Jaguar clusters with two Carrizo modules, have 16 CUs (1024 Shaders), implement 2 GB on board HBM and also have an external DDR4 interface and do it on 20 nm. It would be a compelling low end gaming APU.

You could put a lot of Puma cores in place of a Carrizo module.

Roland00Address · Nov 16, 2014

Let me put it this way: there is no real-world difference between the 384 shader parts and the 512 shader parts. Sure, there is a small % increase in fps, but games are playable at the same settings.

The reason this is the case is they are memory bandwidth starved.

Since AMD has their clusters in units of 128 shaders (384 is 128*3, 512 is 128*4) then they would not do 1000 shaders but instead 1024.

So until they can fix the memory bandwidth issue, why would they increase the shaders from 384 to 1024, a factor of 8/3, i.e. about 2.67x (a 167% increase)? No one would double the number of shaders, let alone nearly triple them, until they fix the memory bandwidth issue.
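The arithmetic in this post can be checked with a short Python sketch. It takes the 128-shader cluster size from the post at face value; the function names are made up for illustration.

```python
CLUSTER = 128  # shaders per cluster, as claimed in the post (not verified here)

def nearest_cluster_multiple(target, cluster=CLUSTER):
    """Round a shader target to the nearest whole number of clusters."""
    clusters = max(1, round(target / cluster))
    return clusters * cluster

def scale_factor(old, new):
    """How many times larger the new shader count is than the old one."""
    return new / old

target = nearest_cluster_multiple(1000)   # 1000 / 128 = 7.8125, so 8 clusters
factor = scale_factor(384, target)

print(target)                                                    # 1024
print(f"{factor:.2f}x, i.e. a {100 * (factor - 1):.0f}% increase")  # 2.67x, i.e. a 167% increase
```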

Yuriman · Nov 16, 2014

Unfortunately, building a big, complex memory bus that is connected to both the CPU and GPU is expensive. It's far cheaper/easier to give a large and powerful GPU its own memory bus, considering how different and non-overlapping the bandwidth needs of CPUs and large GPUs are. I expect that beyond the point of providing ample memory bandwidth to the CPU, APUs will always be more expensive than a discrete CPU and GPU at a given performance level, making them uneconomical until we start to see true heterogeneous computing.

So far there have been very few advantages of sticking a CPU and GPU on the same chip aside from package size.

NTMBK · Nov 16, 2014

Probably when Zen comes out in 2016.

SPBHM · Nov 16, 2014

question, does Kaveri use similar memory bandwidth compression methods to Maxwell? because it could bring nice gains keeping the same number of ALUs and memory bandwidth...

jpiniero · Nov 16, 2014

SPBHM said:
question, does Kaveri use similar memory bandwidth compression methods to Maxwell? because it could bring nice gains keeping the same number of ALUs and memory bandwidth...

I think AMD only introduced the memory compression with Tonga.

el etro · Nov 16, 2014

TSMC's 16nm FinFET process claims a power reduction of above 100% over the current 28nm process that Kaveri and probably Carrizo are based on. So on this process you will probably see an APU with this power.

el etro · Nov 16, 2014

SPBHM said:
question, does Kaveri use similar memory bandwidth compression methods to Maxwell? because it could bring nice gains keeping the same number of ALUs and memory bandwidth...

Carrizo will surely use it; the Carrizo GPU will be Tonga-based.

Enigmoid · Nov 16, 2014

jpiniero said:
I think AMD only introduced the memory compression with Tonga.

No, they had it before, but this is a new, better version. It's good but not as good as Maxwell's.

R9-285 - 176 GB/sec.
980 - 224 GB/sec

27% more bandwidth, 47% more fill. I think in games Nvidia's advantage is greater, given that the 980 is ~75% faster in games (1080p or 1440p).

http://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_980/26.html

Toms, take it or leave it.

http://www.tomshardware.com/reviews/graphics-performance-myths-debunked,3739-4.html

Essentially bandwidth utilization; the blue line is when the 750 Ti is normalized to the 650 Ti's performance.

Maxwell is roughly twice as efficient in terms of bandwidth as Kepler.

I realize that was a little off topic, but it shows what is possible and is something that AMD needs to jump on if they want to improve their APUs. Better bandwidth utilization is possible but work needs to be done. With Maxwell-like memory efficiency they could probably feed twice as many shaders (768). 1024 would probably require a bigger bus or HBM.

Another way to improve performance is to allow the IGP to access cache like Intel does.

http://www.notebookcheck.net/Performance-and-Scaling-Overview-of-Intel-HD-Graphics-4000.82847.0.html

HD 4000

Would be much more noticeable at AMD's levels of performance.
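The 768-shader estimate earlier in this post can be written out as back-of-envelope Python. This is only a sketch: it assumes the number of shaders that can be kept fed scales linearly with effective bandwidth, which is a simplification, and it uses the 384-shader parts from this thread as the baseline. The function names are illustrative.

```python
def feedable_shaders(baseline_shaders, efficiency_gain, raw_bandwidth_gain=1.0):
    """Shaders that can be fed if effective bandwidth scales linearly (assumed)."""
    return int(baseline_shaders * efficiency_gain * raw_bandwidth_gain)

def raw_gain_needed(baseline_shaders, target_shaders, efficiency_gain):
    """Extra raw bandwidth still required after the efficiency gain is applied."""
    return target_shaders / (baseline_shaders * efficiency_gain)

# 2x bandwidth efficiency on the same memory bus
print(feedable_shaders(384, 2.0))                  # 768
# What 1024 shaders would still need on top of that
print(f"{raw_gain_needed(384, 1024, 2.0):.2f}x")   # 1.33x
```

Under these assumptions, 1024 shaders would still need roughly a third more raw bandwidth, which is in line with the "bigger bus or HBM" remark.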

monstercameron · Nov 16, 2014

Enigmoid said:
No, they had it before, but this is a new, better version. It's good but not as good as Maxwell's.

R9-285 - 176 GB/sec.
980 - 224 GB/sec

r9-285 256-bit 5.5GHz [ http://www.anandtech.com/show/8460/amd-radeon-r9-285-review ]
gtx 980 256-bit 7GHz [ http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review ]

how did you get to the conclusion that the compression tech is better on maxwell?
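For reference, the two bandwidth figures under discussion follow directly from the bus widths and memory speeds listed above. A minimal Python sketch, assuming the quoted 5.5GHz and 7GHz are effective per-pin data rates in Gbps (the function name is illustrative):

```python
def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_gbps

r9_285 = peak_bandwidth_gb_s(256, 5.5)
gtx_980 = peak_bandwidth_gb_s(256, 7.0)

print(r9_285)    # 176.0
print(gtx_980)   # 224.0
print(f"{(gtx_980 / r9_285 - 1) * 100:.0f}% more bandwidth")   # 27% more bandwidth
```

This only reproduces the raw numbers; it says nothing about how well either compression scheme uses that bandwidth.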

Blitzvogel · Nov 16, 2014

monstercameron said:
You could put a lot of Puma cores in place of a Carrizo module.

Isn't it that 4 Jaguar/Puma cores will fit in a Steamroller module? Seems like a good trade-off when you have much faster speeds in modules.

monstercameron · Nov 16, 2014

NUSNA_Moebius said:
Isn't it that 4 Jaguar/Puma cores will fit in a Steamroller module? Seems like a good trade-off when you have much faster speeds in modules.

The Jaguar cores in Kabini are listed as 3.1mm2, and AMD is quoting that four of these cores will fit into a single Steamroller module. Unfortunately the dimensions of a Steamroller module are not known - a 32nm SOI Bulldozer module clocked in at 30.9 mm2 for example, but no equivalent number is available for 28nm Steamroller. However some quick math shows four Jaguar cores populates 12.4 mm2. This leaves the rest of the core for the L2 cache, IGP and a large amount of IO.

I guess the uncore will also fit into the module's footprint.
http://www.anandtech.com/show/8067/amd-am1-kabini-part-2-athlon-53505150-and-sempron-38502650-tested
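The "quick math" in the quoted passage can be reproduced in a few lines of Python. The only inputs are the figures quoted above; the 32nm Bulldozer module area is used purely as a loose reference point, since the quote itself notes that no 28nm Steamroller figure is available.

```python
JAGUAR_CORE_MM2 = 3.1          # per-core area quoted for Kabini
BULLDOZER_MODULE_MM2 = 30.9    # 32nm SOI Bulldozer module, NOT Steamroller

four_cores_mm2 = 4 * JAGUAR_CORE_MM2
share_of_module = four_cores_mm2 / BULLDOZER_MODULE_MM2

print(f"{four_cores_mm2:.1f} mm2")                          # 12.4 mm2
print(f"{share_of_module:.0%} of a 32nm Bulldozer module")  # 40% of a 32nm Bulldozer module
```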

Enigmoid · Nov 16, 2014

monstercameron said:
r9-285 256-bit 5.5GHz [ http://www.anandtech.com/show/8460/amd-radeon-r9-285-review ]
gtx 980 256-bit 7GHz [ http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review ]

how did you get to the conclusion that the compression tech is better on maxwell?

I explained: better colour fill. Tom's shows 2x bandwidth efficiency, which tracks: the 750M DDR3 is noticeably BW limited, yet with the 850M DDR3 Nvidia managed to squeeze out +70% on a small BW increase (900 MHz DDR3 to 1000 MHz DDR3). The 980 is also 75% faster while only boasting 27% more bandwidth.

Also, core vs. core comparisons are nice, but it's really core + cache, especially as cache designs and amounts can change between designs: 4 Jaguar cores with 2 MB L2 or 2 SR modules with 4 MB L2.

NostaSeronx · Nov 16, 2014

2016 APU Excavator+ - 1024 GCNX cores
2016 APU Cheetah - 1024 GCNX cores

Excavator+ = 14-nm successor to Excavator(28-nm)
Cheetah = 14-nm successor to Puma(28-nm)/Puma+(20-nm)

Zen and K12 have not been finished; anything else is the usual bs from Sunnyvale.

raghu78 · Nov 16, 2014

NostaSeronx said:
2016 APU Excavator+ - 1024 GCNX cores
2016 APU Cheetah - 1024 GCNX cores

Excavator+ = 14-nm successor to Excavator(28-nm)
Cheetah = 14-nm successor to Puma(28-nm)/Puma+(20-nm)

Zen and K12 have not been finished; anything else is the usual bs from Sunnyvale.

Look at the irony of it. You are saying that AMD is lying, as if you always speak the truth. :biggrin:

AMD is yet to reveal their 2016 APU roadmap. Right now we only know their 2015 APUs - Carrizo on 28nm and Nolan (x86-64)/Amur(ARMv8) on 20nm.

On topic I expect AMD's 2016 FINFET APUs to sport a 1024 GCN 2.0 GPU with HBM.

NostaSeronx · Nov 16, 2014

raghu78 said:
Look at the irony of it. You are saying that AMD is lying, as if you always speak the truth.

I tell what is accurate at the given time. If you don't like it, how about you go searching through PDFs, profiles, commentary, analyst interviews.

AMD has not once released a new platform architecture on time. This has been going on for, um, let's see... nine years.

Any information from Sunnyvale is bad information unless they have given up. When they give up they give their work to the Mile High Design Center or the Boston Design Center.

---
We at least know for a fact Basilisk is Excavator.

raghu78 · Nov 16, 2014

NostaSeronx said:
I tell what is accurate at the given time. If you don't like it, how about you go searching through PDFs, profiles, commentary, analyst interviews.

AMD has not once released a new platform architecture on time. This has been going on for, um, let's see... nine years.

Any information from Sunnyvale is bad information unless they have given up. When they give up they give their work to the Mile High Design Center or the Boston Design Center.

We at least know for a fact Basilisk is Excavator.

AMD is paying for its poor execution. But the point is AMD has a chance to rebuild their company with their 2016 CPU architectures. As for AMD's product roadmap, I am not going to enter into an argument with you. You know nothing and just talk rubbish. When AMD discloses their 2016 APU roadmap to the public, that's when it matters. Not some speculation of yours. :whiste:

NostaSeronx · Nov 16, 2014

raghu78 said:
I know nothing and just talk rubbish.

Fixed it for you.

You didn't even post till I posted. Way to single me out raghu78.

III-V · Nov 16, 2014

Enigmoid said:
No, they had it before, but this is a new, better version. It's good but not as good as Maxwell's.

R9-285 - 176 GB/sec.
980 - 224 GB/sec

Enigmoid, Maxwell has twice the number of ROPs. It's not totally surprising that it has higher pixel throughput.
