
1000 shader APU - when?

Mac29

Member
Not finding anything on this, even on other sites.

Can anyone venture a guess how long it may take for AMD to make an APU that delivers 1000 shaders? Their next iteration is supposed to still use DDR3, so I'm thinking sometime once DDR4 is mainstream, but when?


Thanks,

Mac
 
And probably a die shrink or even two. Stacked memory plus twice the number of current shaders would be a huge, hot chip.

Taking the PS4 APU as a starting point, you could replace the two Jaguar clusters with two Carrizo modules, keep the full 20 CUs or go with 16 CUs (1024 shaders), add 2 GB of on-board HBM plus an external DDR4 interface, and build it on 20 nm. It would be a compelling low-end gaming APU.
 


You could put a lot of Puma cores in place of a Carrizo module.
 
Let's put it this way: there are no real-world differences between the 384-shader parts and the 512-shader parts. Sure, there is a small percentage increase in fps, but games are playable at the same settings.

The reason this is the case is they are memory bandwidth starved.

Since AMD builds its GPUs out of GCN compute units of 64 shaders each (384 is 6 × 64, 512 is 8 × 64), they would not do 1000 shaders but 1024 (16 CUs).

So until they fix the memory bandwidth issue, why would they increase the shaders from 384 to 1024, 8/3 (about 2.67x) as many? No one would double the number of shaders, let alone nearly triple it, until they fix the memory bandwidth issue.
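A quick sketch of the shader arithmetic, assuming GCN's 64 shaders per compute unit. The 1280 entry is the full 20-CU PS4-style configuration; the other counts come from the posts above.

```python
# Assumption: GCN packs 64 stream processors ("shaders") into each compute unit (CU).
SHADERS_PER_CU = 64

def cus_needed(shaders: int) -> float:
    """CUs needed for a shader count; fractional if it is not a whole multiple of 64."""
    return shaders / SHADERS_PER_CU

for shaders in (384, 512, 1000, 1024, 1280):
    cus = cus_needed(shaders)
    note = "whole CUs" if shaders % SHADERS_PER_CU == 0 else "not a whole number of CUs"
    print(f"{shaders:5d} shaders -> {cus:g} CUs ({note})")

# Going from today's 384 shaders to 1024 is a multiplier, not a percentage increase.
ratio = 1024 / 384
print(f"1024 / 384 = {ratio:.3f}x, an increase of {(ratio - 1) * 100:.0f}%")
```

1000 shaders would be 15.625 CUs, which is why 1024 is the realistic target.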
 
Unfortunately, building a big, complex memory bus that is connected to both the CPU and GPU is expensive. It's far cheaper and easier to give a large, powerful GPU its own memory bus, considering how different and non-overlapping the bandwidth needs of CPUs and large GPUs are. I expect that, beyond the point of providing ample memory bandwidth to the CPU, APUs will always be more expensive than a discrete CPU and GPU at a given performance level, making them uneconomical until we start to see true heterogeneous computing.

So far there have been very few advantages of sticking a CPU and GPU on the same chip aside from package size.
 
Question: does Kaveri use memory bandwidth compression methods similar to Maxwell's? They could bring nice gains while keeping the same number of ALUs and the same memory bandwidth...
 
I think AMD only introduced the memory compression with Tonga.
 
TSMC claims its 16nm FinFET process more than doubles power efficiency compared with the current 28nm process that Kaveri, and probably Carrizo, is based on. So on that process you will probably see an APU with this much GPU power.
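Process claims get quoted either as a power reduction or as a performance-per-watt multiplier, and the two are easy to mix up: a power reduction can never exceed 100%, since that would mean zero power. A small sketch of the conversion, assuming performance is held constant:

```python
def power_reduction_at_iso_perf(perf_per_watt_gain: float) -> float:
    """Fraction of power saved at the same performance, given a perf/W multiplier.

    Same work at gain-times the perf/W needs 1/gain of the power.
    """
    if perf_per_watt_gain <= 0:
        raise ValueError("perf/W gain must be positive")
    return 1.0 - 1.0 / perf_per_watt_gain

for gain in (1.5, 2.0, 3.0):
    print(f"{gain}x perf/W -> {power_reduction_at_iso_perf(gain):.0%} less power")
```

Under this assumption a 2x perf/W gain is a 50% power cut, and no finite gain ever reaches 100%.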
 
As for whether Kaveri uses Maxwell-style memory compression: Carrizo surely will, since Carrizo's GPU will be Tonga-based.
 
As for AMD only introducing memory compression with Tonga: no, they had it before, but Tonga's is a new, better version. It's good, but not as good as Maxwell's.

R9 285: 176 GB/s
GTX 980: 224 GB/s

[Chart: 3DMark color fill results, R9 285 vs. GTX 980]


27% more bandwidth, 47% more fill rate. I think Nvidia's in-game advantage is even greater, given that the 980 is ~75% faster in games (1080p or 1440p).

http://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_980/26.html
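Restating the figures above as ratios. This is a rough sketch that compares game performance against memory bandwidth only; it ignores ROPs, shader count and clocks.

```python
# Figures as quoted in the post (GB/s); 1.75 is the approximate in-game speedup of the 980.
r9_285_bw = 176.0
gtx_980_bw = 224.0
perf_ratio = 1.75

bw_ratio = gtx_980_bw / r9_285_bw   # how much more raw bandwidth the 980 has
perf_per_bw = perf_ratio / bw_ratio # performance delivered per unit of bandwidth, relative

print(f"Bandwidth advantage: {bw_ratio - 1:.1%}")
print(f"Performance per GB/s, 980 vs. 285: {perf_per_bw:.3f}x")
```

1.75 / (224/176) works out to 1.375, i.e. roughly 37.5% more performance per GB/s for the 980 under this simplification.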

Tom's Hardware, take it or leave it.

http://www.tomshardware.com/reviews/graphics-performance-myths-debunked,3739-4.html

This is essentially bandwidth utilization; the blue line is the 750 Ti normalized to the 650 Ti's performance.

[Chart: frame buffer bandwidth utilization, 750 Ti vs. 650 Ti]


Maxwell is roughly twice as efficient as Kepler in terms of bandwidth.


I realize that was a little off topic, but it shows what is possible, and it's something AMD needs to jump on if they want to improve their APUs. Better bandwidth utilization is possible, but work needs to be done. With Maxwell-like memory efficiency they could probably feed twice as many shaders (768); 1024 would probably require a wider bus or HBM.
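The 768 vs. 1024 estimate can be made explicit with a toy model. This is a hypothetical assumption, not an AMD figure: the number of shaders that can be kept fed scales linearly with effective bandwidth (raw bandwidth times efficiency), and 384 shaders is the baseline that today's raw bandwidth already saturates.

```python
# Hypothetical model: shaders fed is proportional to raw bandwidth x efficiency.
BASELINE_SHADERS = 384  # current APU shader count, already bandwidth-starved

def raw_bandwidth_multiplier(target_shaders: int, efficiency_gain: float) -> float:
    """Raw bandwidth increase needed to feed target_shaders at a given efficiency gain."""
    effective_needed = target_shaders / BASELINE_SHADERS
    return effective_needed / efficiency_gain

for target in (768, 1024):
    need = raw_bandwidth_multiplier(target, efficiency_gain=2.0)
    print(f"{target} shaders with 2x efficiency: {need:.2f}x raw bandwidth")
```

With a 2x efficiency gain, 768 shaders need no extra raw bandwidth, while 1024 still need about a third more, which is where a wider bus or HBM would come in.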

Another way to improve performance is to let the IGP access the CPU's cache, as Intel does.

http://www.notebookcheck.net/Performance-and-Scaling-Overview-of-Intel-HD-Graphics-4000.82847.0.html

[Chart: Intel HD 4000 results in Deus Ex]


The gain would be much more noticeable at AMD's level of performance.
 
Isn't it the case that four Jaguar/Puma cores fit in the area of a single Steamroller module? That seems like a good trade-off, given how much faster the modules are clocked.

The Jaguar cores in Kabini are listed as 3.1 mm², and AMD is quoting that four of these cores will fit into a single Steamroller module. Unfortunately the dimensions of a Steamroller module are not known - a 32nm SOI Bulldozer module clocked in at 30.9 mm² for example, but no equivalent number is available for 28nm Steamroller. However, some quick math shows four Jaguar cores occupy 12.4 mm². This leaves the rest of the core for the L2 cache, IGP and a large amount of IO.
I guess the uncore will also fit into the module's footprint.
http://www.anandtech.com/show/8067/amd-am1-kabini-part-2-athlon-53505150-and-sempron-38502650-tested
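The die-area math from the quote, as a sketch. Note the assumption: the 30.9 mm² figure is for a 32nm Bulldozer module and is used here only as a stand-in, because no 28nm Steamroller number is public.

```python
# Areas quoted in the AnandTech Kabini article (mm^2).
JAGUAR_CORE_MM2 = 3.1
# Stand-in assumption: 32nm SOI Bulldozer module, since no 28nm Steamroller figure exists.
BULLDOZER_MODULE_MM2 = 30.9

four_jaguar = 4 * JAGUAR_CORE_MM2
leftover = BULLDOZER_MODULE_MM2 - four_jaguar

print(f"4 Jaguar cores: {four_jaguar:.1f} mm^2")
print(f"Left over in a Bulldozer-sized module: {leftover:.1f} mm^2")
```

That's 12.4 mm² for four Jaguar cores, leaving about 18.5 mm² of a Bulldozer-sized module for cache and uncore.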
 
R9 285: 256-bit bus, 5.5 GHz effective memory clock [ http://www.anandtech.com/show/8460/amd-radeon-r9-285-review ]
GTX 980: 256-bit bus, 7 GHz effective memory clock [ http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review ]
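The bandwidth figures quoted in the thread follow from bus width times effective per-pin data rate. A minimal sketch; the dual-channel DDR3-2133 line (two 64-bit channels, 128 bits total) is an illustrative APU configuration added here for comparison, not a figure from the posts.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bus width (bits) x effective per-pin rate (Gbit/s) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8

r9_285 = memory_bandwidth_gbs(256, 5.5)    # 176.0 GB/s
gtx_980 = memory_bandwidth_gbs(256, 7.0)   # 224.0 GB/s
# Illustrative assumption: an APU on dual-channel DDR3-2133 (2 x 64-bit channels).
apu_ddr3 = memory_bandwidth_gbs(128, 2.133)  # ~34.1 GB/s

print(f"R9 285: {r9_285:.1f} GB/s, GTX 980: {gtx_980:.1f} GB/s, DDR3-2133 APU: {apu_ddr3:.1f} GB/s")
```

The APU line is shared between CPU and GPU, which is the bandwidth gap the thread keeps coming back to.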

How did you come to the conclusion that the compression tech is better on Maxwell?

I explained it: better colour fill, plus Tom's 2x bandwidth efficiency result. That tracks: the 750M with DDR3 is noticeably bandwidth-limited, yet with the DDR3 850M Nvidia managed to squeeze out +70% on a small bandwidth increase (900 MHz DDR3 to 1000 MHz DDR3). The 980 is also 75% more powerful while having only 27% more bandwidth.

Also, core-vs-core comparisons are nice, but it's really core plus cache, especially as cache designs and amounts change between designs: four Jaguar cores with 2 MB of L2 versus two Steamroller modules with 4 MB of L2.

 
2016 APU Excavator+ - 1024 GCNX cores
2016 APU Cheetah - 1024 GCNX cores

Excavator+ = 14-nm successor to Excavator(28-nm)
Cheetah = 14-nm successor to Puma(28-nm)/Puma+(20-nm)

Zen and K12 have not been finished; anything else is the usual BS from Sunnyvale.
 
Look at the irony of it: you are saying AMD is lying, as if you always speak the truth. :biggrin:

AMD has yet to reveal its 2016 APU roadmap. Right now we only know its 2015 APUs: Carrizo on 28nm and Nolan (x86-64)/Amur (ARMv8) on 20nm.

On topic: I expect AMD's 2016 FinFET APUs to sport a 1024-shader GCN 2.0 GPU with HBM.
 
I tell what is accurate at the given time. If you don't like it, how about you go search through PDFs, profiles, commentary, and analyst interviews yourself?



AMD has not once released a new platform architecture on time. This has been going on for, um, let's see... nine years.

Any information from Sunnyvale is bad information unless they have given up. When they give up, they hand their work to the Mile High Design Center or the Boston Design Center.

---
We at least know for a fact Basilisk is Excavator.
 
AMD is paying for its poor execution. But the point is that AMD has a chance to rebuild the company with its 2016 CPU architectures. As for AMD's product roadmap, I am not going to get into an argument with you. You know nothing and just talk rubbish. When AMD discloses its 2016 APU roadmap to the public, that's when it matters, not some speculation of yours. :whiste:
 
Enigmoid, on the R9 285 (176 GB/s) vs. GTX 980 (224 GB/s) fill comparison: Maxwell has twice the number of ROPs, so it's not totally surprising that it has higher pixel throughput.
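The ROP point can be put in numbers. A sketch of theoretical pixel fill (one pixel per ROP per clock); the ROP counts and clocks below are assumed reference specs, not figures from the thread, so check them against the vendors' spec sheets.

```python
def peak_pixel_fill_gpix(rops: int, core_clock_mhz: float) -> float:
    """Theoretical pixel fill rate in Gpixels/s: one pixel per ROP per clock."""
    return rops * core_clock_mhz / 1000

# Assumed reference specs (verify before relying on them):
r9_285 = peak_pixel_fill_gpix(32, 918)    # R9 285: 32 ROPs, up to 918 MHz
gtx_980 = peak_pixel_fill_gpix(64, 1126)  # GTX 980: 64 ROPs, 1126 MHz base clock

print(f"R9 285: {r9_285:.1f} GP/s, GTX 980: {gtx_980:.1f} GP/s, ratio {gtx_980 / r9_285:.2f}x")
```

These are peak numbers; measured fill tests also depend on memory bandwidth, so the measured gap can be much smaller than the ROP ratio.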
 