bitsandchips[RUMOR] Bristol Ridge could have a 16CU GPU

csbin

Senior member
Feb 4, 2013
908
614
136
http://www.bitsandchips.it/9-hardware/6675-la-gpu-di-bristol-ridge-potrebbe-avere-16cu-gcn

According to one of our sources, AMD's next APU, Bristol Ridge, could have a 16CU GPU (the same CU count as the Radeon HD 7850), doubling the Compute Units we have seen in Kaveri/Godavari APUs.



Thanks to GloFo's high-density 28nm node, it will be possible to integrate 16CU in the next GPU, but the problem will be the available bandwidth. Even though Bristol Ridge will use a dual-channel DDR4 memory controller, the bandwidth will be 50GB/s at most. In any case, the performance of this kind of GPU will be double that of the Godavari GPU.
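For reference, the "50GB/s at most" figure is roughly what a dual-channel DDR4-3200 setup would give on paper. A minimal sketch of that arithmetic follows; the speed grades listed are assumptions chosen for illustration, since the rumor does not name a memory speed.

```python
def peak_bandwidth_gbs(transfer_rate_mts: int, channels: int = 2,
                       bus_width_bits: int = 64) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8  # 64-bit channel moves 8 bytes per transfer
    return transfer_rate_mts * bytes_per_transfer * channels / 1000


# Speed grades are illustrative assumptions, not taken from the article.
for speed in (2133, 2400, 3200):
    print(f"DDR4-{speed} dual channel: {peak_bandwidth_gbs(speed):.1f} GB/s")
```

These are paper numbers only; real sustained bandwidth is lower, and on an APU it is shared with the CPU cores.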
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Here is a picture of the Carrizo die (which we have been led to believe is the same as Bristol Ridge, but with DDR3 enabled rather than DDR4)

carrizo-die-shot.png


If true, maybe they just extended the GPU portion a bit further and kept everything else the same?

With a DDR4 2400 IMC it would be starved for bandwidth (even with GCN 1.2 DCC increasing bandwidth efficiency ~35% over what Kaveri had).

But then again OEM Kaveris were bandwidth starved at dual channel DDR3 1600 and AMD didn't seem to care.
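A rough way to put numbers on that comparison is to look at paper bandwidth per CU. This is only a sketch: it treats the ~35% DCC figure above as a flat multiplier (a simplification), ignores the CPU's share of the bus, and ignores GPU clock differences.

```python
def peak_bandwidth_gbs(transfer_rate_mts, channels=2, bus_width_bits=64):
    """Theoretical peak DRAM bandwidth in GB/s."""
    return transfer_rate_mts * (bus_width_bits // 8) * channels / 1000


DCC_GAIN = 1.35  # ~35% efficiency gain from the post, modeled as a flat multiplier (assumption)

# OEM Kaveri: 8 CUs on dual-channel DDR3-1600, no DCC.
kaveri_per_cu = peak_bandwidth_gbs(1600) / 8

# Rumored part: 16 CUs on dual-channel DDR4-2400, with DCC.
rumored_per_cu = peak_bandwidth_gbs(2400) * DCC_GAIN / 16

print(f"OEM Kaveri:   {kaveri_per_cu:.2f} GB/s per CU")
print(f"Rumored 16CU: {rumored_per_cu:.2f} GB/s per CU (DCC-adjusted)")
```

Under these assumptions the rumored chip would land at about the same effective bandwidth per CU as a DDR3-1600 OEM Kaveri, which is to say still starved.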
 
Aug 11, 2008
10,451
642
126
How can it "double the performance" when it is bandwidth limited already? Seems like throttling would be a problem as well, since Kaveri/Carrizo already have issues with half the GPU resources. Maybe at 14 nm with HBM they could do something like this, but even with HBM on 28 nm I still don't see how it would work, since Kaveri and Carrizo already throttle with half the CUs. Or maybe they increase the TDP to 125 watts or something.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Entertaining the idea that this rumor is true, I wonder how a quad-core Excavator with 1024sp and a DDR4-2400 IMC would do as a mobile chip?

Here is 15W/25W and 35W/42W Carrizo (done by The Stilt using GTA V) as a baseline:

15W-25W

9fab58c2_GTA-V_15-25W-1600-DAR-CLK.png


35W-42W:

2fc49047_GTA-V_35-42W-2133-DAR-CLK.png


Assuming 45W is available, how much of the 1024sp iGPU could be utilized during gaming?

Would the clocks be too low? Or could a person actually get some decent mileage out of the iGPU (i.e., not a waste of silicon area)?

If decent enough clocks are available and performance per watt improves over 512sp maybe this would not be such a bad idea for a 1366 x 768 laptop?
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,289
136
What if it has a GMI link with another Bristol Ridge die?

8 XV cores and 16 CUs.

This performance improvement over Carrizo family of APUs is the result of AMD engineers “applying more aggressive power management to the 28nm design,” the US-based chipmaker said at ISSCC 2016. “The Bristol Ridge design was a study in using power management to overcome performance limits tied to heat, voltage and current.”
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
Not a chance in hell for this to be true.

Unless this is a semi-custom design for a customer.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
A silly rumor and completely made up. Bristol Ridge has 8 graphics compute units and 8 ROPs, just like Carrizo (for obvious reasons).

AMD isn't that stupid. Doubling the GPU units would be like a third world solution to famine: when you can't feed your two children, make two more.

If Raven Ridge (Zen) implements HBM2 then such a GPU configuration is likely. The odds of Raven Ridge implementing HBM in any form are non-existent, though.

Even the current eight units in Carrizo & Bristol Ridge are overkill, since there is no way to even remotely saturate them fully.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Unless this is a semi-custom design for a customer.

Hard to imagine what kind of a twisted customer would want a semi-custom design based on Carrizo / Bristol Ridge. The 15h architecture is taking its final breaths anyway.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
Hard to imagine what kind of a twisted customer would want a semi-custom design based on Carrizo / Bristol Ridge. The 15h architecture is taking its final breaths anyway.

Just a thought: with a GDDR5 controller, a 4-core, 16 CU die close to 300 mm² would be enough to outperform the PS4 at lower power. I'm not saying this product exists, only what it could be.
 

deasd

Senior member
Dec 31, 2013
603
1,033
136
Neglecting the starved bandwidth is not that unreasonable. Even today's dGPUs can achieve higher performance by overclocking their GDDR, which means bottlenecks are everywhere and shouldn't be demonized.

It only works if AMD (GloFo) can keep the TDP in a proper range. I guess a 16CU config would either exceed a 95W TDP, or the yield and cost are good enough that it could stay around 95W.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Neglecting the starved bandwidth is not that unreasonable. Even today's dGPUs can achieve higher performance by overclocking their GDDR, which means bottlenecks are everywhere and shouldn't be demonized.

It only works if AMD (GloFo) can keep the TDP in a proper range. I guess a 16CU config would either exceed a 95W TDP, or the yield and cost are good enough that it could stay around 95W.

Maybe they could use a second Bristol Ridge variant for laptop.

So instead of having (for example) a 35W/42W Bristol Ridge I (512sp) at 750 MHz during a game, there could also be a 35W/42W Bristol Ridge II (1024sp) at 450 to 500 MHz during a game.

1024sp @ 450 MHz to 500 MHz > 512sp @ 750 MHz

P.S. 1024sp @ 450 MHz to 500 MHz would be hard on the bandwidth... but maybe it works well enough in a 1366 x 768 or 1600 x 900 laptop?
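The inequality above does hold for raw shader throughput. A quick sketch, assuming the usual GCN counting of one fused multiply-add (2 FLOPs) per shader per clock, and ignoring bandwidth, ROPs and everything else that matters in an actual game:

```python
def fp32_gflops(shaders: int, clock_mhz: float) -> float:
    """Peak FP32 throughput, counting one FMA (2 FLOPs) per shader per clock."""
    return shaders * 2 * clock_mhz / 1000


configs = {
    "512sp @ 750 MHz": fp32_gflops(512, 750),
    "1024sp @ 450 MHz": fp32_gflops(1024, 450),
    "1024sp @ 500 MHz": fp32_gflops(1024, 500),
}
for name, gflops in configs.items():
    print(f"{name}: {gflops:.1f} GFLOPS")
```

On paper that is a 20% to 33% advantage for the wide, slow configuration, but it says nothing about whether the memory bus could feed it.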
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Neglecting the starved bandwidth is not that unreasonable. Even today's dGPUs can achieve higher performance by overclocking their GDDR, which means bottlenecks are everywhere and shouldn't be demonized.

Shouldn't be demonized? Even though the issue, which AMD has been unable to solve for five years now, is killing their APUs?

You do realize even the most memory bandwidth starved AMD dGPUs (such as Bonaire based) have > 53MB/s per GFlops dedicated memory bandwidth available?

Meanwhile the fastest Godavari APUs (e.g. A10-7870K) have 38.5 - 43.3MB/s per GFlops shared bandwidth (official / unofficial MEMCLK).

The fastest Bristol Ridge AM4 part will have less than 34MB/s per GFlops of shared bandwidth.

Adding more compute units would be insanity. Ideally Steamroller / Excavator designs would have six units which would be clocked higher.
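The per-GFLOPS figures above can be reproduced with a few lines of arithmetic. This is a sketch under assumptions that are not stated in the post: the HD 7790 (896sp at 1000 MHz, 96 GB/s) stands in for "Bonaire based", the A10-7870K GPU is taken as 512sp at 866 MHz with DDR3-2133 (official) or DDR3-2400 (unofficial), and the top Bristol Ridge AM4 part is taken as 512sp at 1108 MHz with DDR4-2400.

```python
def fp32_gflops(shaders, clock_mhz):
    """Peak FP32 throughput, one FMA (2 FLOPs) per shader per clock."""
    return shaders * 2 * clock_mhz / 1000


def dual_channel_gbs(transfer_rate_mts):
    """Peak bandwidth of two 64-bit DRAM channels in GB/s."""
    return transfer_rate_mts * 8 * 2 / 1000


def mbs_per_gflops(bandwidth_gbs, shaders, clock_mhz):
    """Memory bandwidth available per GFLOPS of shader throughput, in MB/s."""
    return bandwidth_gbs * 1000 / fp32_gflops(shaders, clock_mhz)


# All clocks and memory speeds below are assumptions (see lead-in).
parts = {
    "HD 7790 (Bonaire), dedicated": mbs_per_gflops(96.0, 896, 1000),
    "A10-7870K, DDR3-2133": mbs_per_gflops(dual_channel_gbs(2133), 512, 866),
    "A10-7870K, DDR3-2400": mbs_per_gflops(dual_channel_gbs(2400), 512, 866),
    "Bristol Ridge AM4, DDR4-2400": mbs_per_gflops(dual_channel_gbs(2400), 512, 1108),
}
for name, ratio in parts.items():
    print(f"{name}: {ratio:.1f} MB/s per GFLOPS")
```

With those inputs the ratios come out in line with the numbers quoted above, and the APU figures are shared with the CPU on top of that.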
 

SPBHM

Diamond Member
Sep 12, 2012
5,066
418
126
1024sps could make things very interesting, but, that memory bandwidth... it would probably need an Iris Pro style cache on top of the fast DDR4 to perform close to the way it should
 

hojnikb

Senior member
Sep 18, 2014
562
45
91
1024sps could make things very interesting, but, that memory bandwidth... it would probably need an Iris Pro style cache on top of the fast DDR4 to perform close to the way it should

Or GDDR5
 

NTMBK

Lifer
Nov 14, 2011
10,438
5,787
136
You need some serious shader throughput for those pachinko machines.
 

deasd

Senior member
Dec 31, 2013
603
1,033
136
Shouldn't be demonized? Even though the issue, which AMD has been unable to solve for five years now, is killing their APUs?

You do realize even the most memory bandwidth starved AMD dGPUs (such as Bonaire based) have > 53MB/s per GFlops dedicated memory bandwidth available?

Meanwhile the fastest Godavari APUs (e.g. A10-7870K) have 38.5 - 43.3MB/s per GFlops shared bandwidth (official / unofficial MEMCLK).

The fastest Bristol Ridge AM4 part will have less than 34MB/s per GFlops of shared bandwidth.

Adding more compute units would be insanity. Ideally Steamroller / Excavator designs would have six units which would be clocked higher.

People have been arguing and worrying about the bandwidth problem since Llano.
Let me emphasize again: IF AMD (GloFo) gets a good enough process (cost down, yield up), why not have a bigger die with more CUs?
Your statements and data are just a reference, not a strategy or a manufacturing plan. You and I are in the same position, both doing wishful thinking. I'll wait and see what happens in the next few months.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Why would they add more units to the die when they can't even utilize the units currently present? An A10-7870K paired with DDR3-2400 memory already loses around 25% of its performance potential due to the lack of bandwidth.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
One area that will benefit from doubling the CUs is Compute (OpenCL etc). Compute workloads don't need as much memory bandwidth as games do.
 

NTMBK

Lifer
Nov 14, 2011
10,438
5,787
136
One area that will benefit from doubling the CUs is Compute (OpenCL etc). Compute workloads don't need as much memory bandwidth as games do.

Please, not this myth again...

A lot of the OpenCL benchmarks out there don't need a lot of memory bandwidth. That is because they are bad benchmarks. They are like Dhrystone, and don't exercise the memory subsystem. There are plenty of GPGPU loads out there that depend on memory bandwidth, and are on the GPU for that precise reason.