AMD Raven Ridge 'Zen APU' Thread


cbn

Lifer
Mar 27, 2009
12,968
221
106
Here is an example of what I hope doesn't happen with Raven Ridge.

In the review, HP equips the laptop with two DDR4 SO-DIMMs, but pairs them with a 15W Bristol Ridge APU and an Oland (Jet) dGPU instead of using a 35W Bristol Ridge APU.

Some quotes from the review:

We found the dedicated Radeon R7 M440 GPU of our test laptop relatively disappointing. It rarely performed faster than the integrated Radeon R5 and, when it did, the IGP was suffering performance drops due to its limiting TDP. Instead, HP should have installed the higher-performance A10-9630P (35-watt). This would have boosted CPU and GPU performance. This statement is supported by the fact that the CrossFire setup did not produce higher frame rates in any of the games we tested.

Performance Consistency - Sustaining High CPU and GPU Loads

In games, the TDP limit is temporarily exceeded. However, after a maximum of two minutes (depending on the previous load and temperatures of the laptop), the 15-watt limit strikes back, dropping clock speeds. For example, in “Diablo III”, the CPU and GPU clock speeds start at 1800 and 550 MHz respectively. As the game runs, the speeds drop to 1100-1200 MHz (CPU) and 380-420 MHz (GPU). The frame rates drop in parallel in our static gaming scene from 72 to 49 fps.
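A rough proportionality check on those Diablo III numbers (my own arithmetic, not the review's): in a GPU-limited scene the frame rate should track the GPU clock roughly linearly, and the throttled clocks line up with the measured fps drop.

```python
# Rough proportionality check (my own arithmetic, not the review's):
# in a GPU-limited scene, fps should track the GPU clock roughly linearly.
base_fps = 72            # measured at the initial 550 MHz GPU clock
base_clk_mhz = 550       # initial GPU clock
throttled_clk_mhz = 400  # midpoint of the observed 380-420 MHz range

estimated_fps = base_fps * throttled_clk_mhz / base_clk_mhz
print(f"estimated throttled fps: {estimated_fps:.0f}")  # ~52 fps vs. the measured 49 fps
```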


We want to praise the AMD IGP. Once again, the manufacturer steals the performance crown in the 15-watt segment. The only Intel IGP capable of surpassing the AMD competitor is the Iris Graphics 540, which is very expensive. We hope that many manufacturers will install Bristol Ridge without a dedicated graphics card and thus make good use of this strong advantage. Dual Graphics can only create a large lead over the competition in synthetic benchmarks. When running demanding applications, such as games, the setup does not offer much better performance.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Something I would like to see AMD do with Raven Ridge is advertise something like "Gaming clocks" for the 35W and 15W APU configurations.

One reason is that, currently, the advertised Bristol Ridge specs look very close together for the two TDP levels:

[Image: Bristol Ridge SKU table]


Yet with the CPU and GPU both active, there will likely be a very large difference in performance, if these tests by The Stilt are any indication: http://forums.anandtech.com/showpost.php?p=37946141&postcount=12
 

dark zero

Platinum Member
Jun 2, 2015
2,655
138
106
Here is an example of what I hope doesn't happen with Raven Ridge.

In the review, HP equips the laptop with two DDR4 SO-DIMMs, but pairs them with a 15W Bristol Ridge APU and an Oland (Jet) dGPU instead of using a 35W Bristol Ridge APU.

Some quotes from the review:
Similar to the crap some OEMs do with Intel... they still underestimate the power of the iGPU and stack an NVIDIA GPU on top of it.

And I wonder why they didn't put an NVIDIA GPU on the AMD CPU, since that could help them more than expected.
 

superstition

Platinum Member
Feb 2, 2008
2,219
221
101
That has nothing to do with throughput; that is a product of decreased latency.
You said bandwidth before.

Regardless, data goes in and out and the L4 makes it go in and out better.

Bandwidth... throughput... whatever you want to call it. The fact is that Broadwell-C shouldn't have been able to beat Skylake, with its amazing, world-changing technological advancement and improved cores, at a lower clock and lower power consumption.
 

DrMrLordX

Lifer
Apr 27, 2000
21,582
10,785
136
The fact is that Broadwell-C shouldn't have been able to beat Skylake, with its amazing, world-changing technological advancement and improved cores, at a lower clock and lower power consumption.

If Intel had just started integrating L4 into all of their top-end consumer CPUs, Skylake would have left Broadwell-C in the dust. But noooo, they can't be bothered to do that.
 
Feb 4, 2009
34,506
15,737
136
So when will we find out if this is a legit step forward for AMD, or a typical overhyped, overpromised AMD release?
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
If Intel had just started integrating L4 into all of their top-end consumer CPUs, Skylake would have left Broadwell-C in the dust. But noooo, they can't be bothered to do that.

There's a possibility that they just can't, at least not in the timeframe that seems "logical" to us, just like AMD/NVIDIA struggle with HBM. Let's look at one example of that: Tick-Tock. The idea is that products turn out better when design targets are met. It would be nice to do it all at once, architecture and process, but that's just not realistic. Tick-Tock succeeded because it realistically split very complex projects into segments. Now the problems seem to go even beyond that.

The design problems in computing seem monumental. In all aspects, all manufacturers are struggling to make small leaps forward. It may be that we forget it's not companies that make the product, but the people working for the company. We are running into human limitations.
 

24601

Golden Member
Jun 10, 2007
1,683
39
86
There's a possibility that they just can't, at least not in the timeframe that seems "logical" to us, just like AMD/NVIDIA struggle with HBM.

The design problems in computing seem monumental. In all aspects, all manufacturers are struggling to make small leaps forward.

It's obvious that they didn't do it because profit margins have always been king for Intel.

Putting an eDRAM chip that's extremely large in mm^2 relative to the consumer SKUs' die area would effectively mean lower yield per wafer, which leads to higher COGS.

It's the same reason they don't just put 6 cores on the consumer SKUs. It would take negligible amounts of extra die space, but it would tank profit margins by giving consumers more bang for the buck, which means that every performance level now has lower gross revenue per sale.
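A rough sketch of how die area drives yield and cost per good die (illustrative only; the die areas and defect density below are my assumptions, not Intel figures):

```python
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300):
    """Standard gross dies-per-wafer approximation (area term minus edge-loss term)."""
    r = wafer_diameter_mm / 2
    return math.floor(math.pi * r**2 / die_area_mm2
                      - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

def poisson_yield(die_area_mm2, defects_per_cm2=0.1):
    """Simple Poisson yield model: probability a die has zero killer defects."""
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100)

# Die areas are illustrative assumptions, not Intel's numbers.
for name, area in [("4C + GT2 CPU die (assumed)", 120),
                   ("4C + GT4e CPU die (assumed)", 230),
                   ("eDRAM die (assumed)", 80)]:
    gross = dies_per_wafer(area)
    good = gross * poisson_yield(area)
    print(f"{name}: ~{gross} gross dies/wafer, ~{good:.0f} good dies/wafer")
```

The extra silicon (bigger GPU slice plus a separate eDRAM die on the package) means fewer good dies per wafer and therefore higher cost per shipped part, which is the margin argument above.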

In a monopoly, capitalism dictates maximum profit extraction.
 

superstition

Platinum Member
Feb 2, 2008
2,219
221
101
Putting an eDRAM chip that's extremely large in mm^2 relative to the consumer SKUs' die area would effectively mean lower yield per wafer, which leads to higher COGS.

It's the same reason they don't just put 6 cores on the consumer SKUs.
The questionable Iris graphics take up a huge portion of Broadwell C.

It's not that there isn't enough space. It's that Intel is withholding the performance to try to entice people to buy again later.
 

24601

Golden Member
Jun 10, 2007
1,683
39
86
The questionable Iris graphics take up a huge portion of Broadwell C.

It's not that there isn't enough space. It's that Intel is withholding the performance to try to entice people to buy again later.

If you question dimensional physics, I'm afraid any further conversation with you would not yield any results.
 

DrMrLordX

Lifer
Apr 27, 2000
21,582
10,785
136
We are running into human limitations.

Be that as it may, they've already done it with Broadwell-C, and they were able to bring the chip to market at an actual retail price of about $400 (MSRP was lower for the i7-5775C, but supply and demand changed that). Had board support been better, it would have (or could have) been a smash hit for gamers.

There's no technical reason why the same could not have been done for Skylake. It might not have been a very big eDRAM, but it doesn't need to be THAT big to make a difference.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Putting an eDRAM chip that's extremely large in mm^2 relative to the consumer SKUs' die area would effectively mean lower yield per wafer, which leads to higher COGS.

It's the same reason they don't just put 6 cores on the consumer SKUs.

The questionable Iris graphics take up a huge portion of Broadwell C.

It's not that there isn't enough space. It's that Intel is withholding the performance to try to entice people to buy again later.

If you question dimensional physics, I'm afraid any further conversation with you would not yield any results.

If the iGPU were smaller than GT4 (say GT2), then Intel could certainly compensate by producing more eDRAM. Then more processors could have the L4 cache to boost CPU performance.

Of course, that defeats the primary purpose of having the L4 cache in the first place (which is to provide enough bandwidth to feed a larger-than-normal iGPU).

I guess it depends on what is valued more: having more processors with better CPU performance, or having a smaller number of processors with better CPU and iGPU performance?

EDIT: Maybe one predictable thing that could happen is that, if Intel is unable to sell enough dual-core GT3e mobile processors, they take the eDRAM that was allocated for those SKUs and make 4C/GT2e parts out of it instead, then take the resulting eDRAM-less dual-core GT3 die and make a desktop chip out of it. But I don't ever see them using the eDRAM meant for the quad-core GT4 to make a quad-core GT2e (for obvious reasons).
 

AtenRa

Lifer
Feb 2, 2009
14,000
3,357
136
Well, at a 15W TDP, an 8-CU Polaris iGPU paired with a dual-core, 4-thread (SMT) Zen CPU could be more than 30% faster in gaming than the current 28nm 8-CU Bristol Ridge.

[Image: Radeon RX 480 presentation slide (via VideoCardz)]




15% higher performance per CU
+
3200 MHz DDR4 memory = 33% higher bandwidth than 2400 MHz
+
14nm FinFET = more than 2x higher efficiency at low power vs 28nm bulk
+
Zen architecture = higher perf/watt than CMT-based Carrizo
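A rough way to stack those factors (my own arithmetic; the bandwidth-scaling factor is an assumption, since games rarely scale 1:1 with memory bandwidth):

```python
# Rough sketch of stacking the claimed gains as multipliers (my own arithmetic;
# the bandwidth_scaling factor is an assumption, not a measured figure).
per_cu_gain = 1.15       # claimed ~15% higher performance per CU (Polaris vs GCN3)
bandwidth_gain = 1.33    # DDR4-3200 vs DDR4-2400
bandwidth_scaling = 0.5  # assume only ~half of the extra bandwidth turns into fps

effective = per_cu_gain * (1 + (bandwidth_gain - 1) * bandwidth_scaling)
print(f"estimated gaming uplift before clock/TDP effects: {effective - 1:.0%}")  # ~34%
# 14nm efficiency and Zen perf/watt would add further gains on top of this,
# via higher sustained clocks at 15W (not modeled here).
```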


What say you?
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Well, at a 15W TDP, an 8-CU Polaris iGPU paired with a dual-core, 4-thread (SMT) Zen CPU could be more than 30% faster in gaming than the current 28nm 8-CU Bristol Ridge.

[Image: Radeon RX 480 presentation slide (via VideoCardz)]

15% higher performance per CU
+
3200 MHz DDR4 memory = 33% higher bandwidth than 2400 MHz
+
14nm FinFET = more than 2x higher efficiency at low power vs 28nm bulk
+
Zen architecture = higher perf/watt than CMT-based Carrizo

What say you?

How much do you think it would throttle the CPU at that TDP?

Considering that the FX-8800P (Carrizo) @ 15W/25W AC boost clocked down to around 1.5 GHz for the CPU and around 350 MHz for the iGPU during The Stilt's GTA V test, I wonder what would happen here? Also notice the A10-9600P (15W) from post #101 clocked down to around 1100-1200 MHz for the CPU during sustained gaming.

Maybe a better comparison would be a 15W Intel Skylake GT3e? Though I wonder, by the time enough of these harvested Raven Ridge processors accumulate, how far along Intel would be with Cannonlake (10nm)?

P.S. One reason I am posing this question is that I see AMD's main strength (even with Zen) as being GPU uarch, and when the TDP drops too low for these big-iGPU APUs, I fear that strength gets minimized.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Regarding the Samsung 8GB HBM2 package mentioned back in post #65: assuming the default configuration for a single 8GB HBM2 stack is 256 GB/s @ 2 Gbps per pin, I wonder how it scales to lower frequencies (in power consumption and timings)?

Say 800 Mbps or 1 Gbps per pin for a more entry-level APU laptop with a single 8GB stack serving as system RAM, and 400-500 Mbps per pin (with even tighter timings) for a high-performance APU laptop with two 8GB stacks (i.e., 16GB total serving as system RAM).

And if that is possible, how much impact could a 2 x 8GB stack configuration with very tight timings (offering 102-128 GB/s of total bandwidth @ 400-500 Mbps per pin) have on the amount of L2/L3 cache the APU needs? Maybe even go slower than 400 Mbps per pin if that would boost the CPU without hurting the iGPU too much?

Also, how would the power consumption of such HBM2 configurations compare to 1 x 8GB or 2 x 8GB DDR4 SO-DIMMs?
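For reference, the bandwidth arithmetic behind those figures is just interface width times per-pin rate; a quick sketch (the pin rates are the hypothetical ones from this post, not announced configurations):

```python
# Bandwidth per stack = interface width (bits) x per-pin data rate / 8.
# Pin rates below are the hypothetical values discussed above, not announced configs.
HBM2_WIDTH_BITS = 1024

def stack_bandwidth_gbs(pin_rate_gbps, stacks=1):
    """Total bandwidth in GB/s for a given per-pin rate (Gbps) and stack count."""
    return HBM2_WIDTH_BITS * pin_rate_gbps / 8 * stacks

print(stack_bandwidth_gbs(2.0))     # 256 GB/s: one stack at the full 2 Gbps/pin spec
print(stack_bandwidth_gbs(0.8))     # 102.4 GB/s: one stack at 800 Mbps/pin
print(stack_bandwidth_gbs(0.4, 2))  # 102.4 GB/s: two stacks at 400 Mbps/pin
print(stack_bandwidth_gbs(0.5, 2))  # 128 GB/s: two stacks at 500 Mbps/pin
```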
 

superstition

Platinum Member
Feb 2, 2008
2,219
221
101
If you question dimensional physics, I'm afraid any further conversation with you would not yield any results.

Hmm..

Here is what I'm talking about.

http://images.anandtech.com/doci/9320/BDW-H-Map_575px.png

That is a lot of space for Iris graphics, space that could be used for something else. I had also suggested that Intel sell one SKU for enthusiasts with the Iris graphics disabled. That would improve effective yields, since parts whose graphics area contains flaws would still be worth selling.

If Intel could sell the 5675C (or even the 5775C) for what it was asking, despite all that space dedicated to graphics, despite the eDRAM, and despite being produced when 14nm yields weren't so good, then it seems quite difficult to argue that Intel can't afford to put eDRAM on a Skylake part, or to sell a Broadwell part with a higher TDP for enthusiasts with the Iris graphics turned off.

Also, when it comes to "dimensional physics": does that require Intel to put in an eDRAM controller and not use it?
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,744
3,077
136
You said bandwidth before.
The more correct term is throughput; bandwidth can mean a very different thing depending on who you're talking to. Most people take bandwidth to mean throughput.


Regardless, data goes in and out and the L4 makes it go in and out better.
Which I never contested.


Bandwidth... throughput... whatever you want to call it. The fact is that Broadwell-C shouldn't have been able to beat Skylake, with its amazing, world-changing technological advancement and improved cores, at a lower clock and lower power consumption.

And it's because of decreased latency for data that was evicted from the L3. It's all about the race to stall: if there is a large enough OoO window, the L4 latency could be the difference between a core stalling and not stalling, which in turn means you get to keep running further and further ahead.

If throughput were such a problem, Intel wouldn't roll with 4-channel memory controllers for 18-core server chips; but they do roll with more L3 cache per core than in the consumer space.
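A toy average-miss-penalty calculation that illustrates the latency point (the latencies and hit rates below are illustrative assumptions, not measured Broadwell-C numbers):

```python
# Toy sketch of the average miss penalty per access with and without an eDRAM L4.
# All latencies and rates are illustrative assumptions, not measured figures.
l4_hit_ns = 40        # assumed eDRAM L4 hit latency
dram_ns = 90          # assumed main-memory latency
l3_miss_rate = 0.008  # 8 misses per 1000 accesses escaping the L3

# Without L4: every L3 miss pays the full DRAM latency.
penalty_no_l4 = l3_miss_rate * dram_ns

# With L4: assume ~3/4 of those misses are caught by the 128MB eDRAM
# (roughly the "8 per 1000 down to 2 per 1000" figure quoted elsewhere in the thread).
l4_hit_rate = 0.75
penalty_l4 = l3_miss_rate * (l4_hit_rate * l4_hit_ns + (1 - l4_hit_rate) * dram_ns)

print(f"avg miss penalty per access: {penalty_no_l4:.2f} ns without L4, "
      f"{penalty_l4:.2f} ns with L4")
```

Fewer and cheaper trips past the L3 keep the OoO window fed, which is the race-to-stall argument rather than a raw throughput one.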
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Hmm..

Here is what I'm talking about.

http://images.anandtech.com/doci/9320/BDW-H-Map_575px.png

That is a lot of space for Iris graphics, space that could be used for something else. I had also suggested that Intel sell one SKU for enthusiasts with the Iris graphics disabled. That would improve effective yields, since parts whose graphics area contains flaws would still be worth selling.

If Intel could sell the 5675C (or even the 5775C) for what it was asking, despite all that space dedicated to graphics, despite the eDRAM, and despite being produced when 14nm yields weren't so good, then it seems quite difficult to argue that Intel can't afford to put eDRAM on a Skylake part, or to sell a Broadwell part with a higher TDP for enthusiasts with the Iris graphics turned off.

Also, when it comes to "dimensional physics": does that require Intel to put in an eDRAM controller and not use it?

Also remember that, depending on how the GT4 die is laid out, Intel could use chop lines to make a smaller die with a smaller iGPU... but still keep the eDRAM controller.
 

superstition

Platinum Member
Feb 2, 2008
2,219
221
101
The more correct term is throughput; bandwidth can mean a very different thing depending on who you're talking to. Most people take bandwidth to mean throughput.

Which I never contested.

And it's because of decreased latency for data that was evicted from the L3. It's all about the race to stall: if there is a large enough OoO window, the L4 latency could be the difference between a core stalling and not stalling, which in turn means you get to keep running further and further ahead.

If throughput were such a problem, Intel wouldn't roll with 4-channel memory controllers for 18-core server chips; but they do roll with more L3 cache per core than in the consumer space.

Ian Cutress said:
This means, to quote, ‘if you have eight cache misses per thousand, you are now down to around two’ – I take this to mean a regular user workload but in a higher throughput environment, it could mean the difference between 2% and 0.5% cache misses out to main memory. Because the move out to main memory is such a latency and bandwidth penalty compared to an on-package transfer between the CPU and L3/L4, even a small decrease in cache misses has performance potential when used in the right context.
: )
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Regarding the discussion of HBM2 (for future APUs) that started early in this thread, I asked the following question in the Memory and Storage forum about possible applications (beyond the iGPU):

http://forums.anandtech.com/showthread.php?t=2480064

P.S. Just to put things in perspective: each stack of HBM2 has a 1024-bit memory interface, while an SO-DIMM or DIMM has a 64-bit memory interface. So an HBM2 stack running at the same total bandwidth as an SO-DIMM would hypothetically run its DRAM 16x slower. And at higher bandwidths (such as 102 GB/s) for a single 8GB HBM2 stack, the DRAM speed works out to the 800 level (the same speed as commonly found DDR2). With two 8GB HBM2 stacks @ 102 GB/s, all that would be needed is DRAM running at the 400 level (the speed of commonly found DDR1).

That, if possible, should offer potential for some very tight timings.

P.S. As an example, the slowest DDR3 speeds I have seen were 800 (the level of commonly found DDR2), and the slowest DDR4 speeds I have seen are 1600 (the level of commonly found DDR3), so I know DRAM has the ability to scale downward to some degree. How far the DRAM found in HBM2 can be slowed down, though, I have no idea.
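A short sketch of that width-ratio arithmetic (my own numbers for the DDR4 comparison; the HBM2 per-pin rates are the hypothetical ones discussed above):

```python
# Per-pin rate (Mbps) needed for HBM2 to hit a given total bandwidth, mirroring the
# 1024-bit vs 64-bit width comparison above. Targets are the hypothetical ones from this post.
def required_pin_rate_mbps(target_gbs, stacks=1):
    return target_gbs * 8 * 1000 / (1024 * stacks)

# Dual-channel DDR4-2400 (2 x 64-bit SO-DIMMs) for comparison: ~38.4 GB/s.
ddr4_2400_dual_channel = 2 * 64 * 2400 / 8 / 1000

print(required_pin_rate_mbps(ddr4_2400_dual_channel))  # ~300 Mbps/pin from one stack
print(required_pin_rate_mbps(102.4))                    # 800 Mbps/pin: the "DDR2-800 level"
print(required_pin_rate_mbps(102.4, stacks=2))          # 400 Mbps/pin: the "DDR-400 level"
```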
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
HBM is out of the question, at least for Raven Ridge and its possible successor in the consumer segment. AMD needs to either implement eDRAM or GDDR5X, or alternatively forget about the APUs completely. HBM is way too complex (expensive), and DDR4 frequencies won't reach high enough to provide sufficient bandwidth on a 128-bit bus.

Sad but true.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
HBM2 is relatively cheap in comparison to LPDDR4 TSV/Wide I/O. Power only becomes an issue if you are aiming for fanless operation.

4GB of HBM2 @ 1 GHz / 128 GB/s = sub-3 watts. The spec is open to a more efficient DRAM shrink, which would bring that down to sub-2 watts.
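Roughly, interface power scales with bandwidth times energy per bit; the pJ/bit figure in this sketch is an assumption backed out of the sub-3 W claim above, not a published HBM2 spec.

```python
# Power ≈ bandwidth (bits/s) x energy per bit. The ~2.9 pJ/bit figure is an assumption
# backed out of the sub-3 W claim above, not a published HBM2 specification.
def hbm_power_watts(bandwidth_gbs, pj_per_bit):
    return bandwidth_gbs * 1e9 * 8 * pj_per_bit * 1e-12

print(hbm_power_watts(128, 2.9))   # ~3.0 W at ~2.9 pJ/bit
print(hbm_power_watts(128, 1.9))   # ~1.9 W with a more efficient DRAM shrink
print(hbm_power_watts(51.2, 2.9))  # ~1.2 W at 400 Mbps/pin x 1024 bits (one stack)
```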