Question Zen 6 Speculation Thread

Covfefe · Wednesday at 7:34 AM

basix said:
It really depends on the speed of the LPDDR5X modules. 8.5...9.6 Gbps are getting standard. 12.7 Gps is the ceiling. Strix Point mostly uses 7.5 Gbps modules and many notebooks also slow 5.6 Gbps modules.

8.5...9.6 Gbps = 1.14...1.28x bandwidth vs. 7.5 Gbps

10.7...12.7 Gbps = 1.43...1.69x bandwidth

The bigger L1 and L2 caches should be very effective. Compared to Strix Point without MALL it will be a very decent upgrade in bandwidth efficiency.
If then the universal compression comes into play, there should be another bandwidth amplification.

We do not know overall bandwidth efficiency upgrades. But summing all up together it could be on par with RDNA3 + 32 MByte MALL or at least close to it. AT4 could reach N44 performance levels.

RX 9060 XT has 320 GB/s memory bandwith

RX 9060 has 288 GB/s memory bandwith

128bit LPDDR5X with 8.5 Gbps yields in 171 GB/s

128bit LPDDR5X with 9.6 Gbps yields in 192 GB/s

128bit LPDDR5X with 10.7 Gbps yields in 214 GB/s

128bit LPDDR5X with 12.7 Gbps yields in 254 GB/s

192bit LPDDR6 with 10.67 Gbps yields in 284 GB/s (effective) --> probably enough bandwidth for a desktop grade part with high clock rates (AMD should have designed / dimensioned it that way)

192bit LPDDR6 with 12.8 Gbps yields in 340 GB/s (effective)

9.6 Gbps bandwidth for a mobile part might be a little bit tight. That is true. But as I said, we simply do not know RDNA5's overall bandwidth efficiency. Maybe LPDDR6 bandwidth levels are plenty enough and LPDDR5X is then sufficient for mobile parts with restricted TDPs. If 12.7 Gbps LPDDR5X gets used (but this memory is probably too expensive and not used), there should be no big difference in bandwidth compared to 10.67 Gbps LPDDR6.

Maybe I'm missing something, but your LPDDR bandwidth figures all look off.

LPDDR5X-9600 is 9.6*128/8 = 153GB/s.
LPDDR5X-12700 is 12.7*128/8 = 203GB/s.
LPDDR6-10667 is 10.667*192/8 = 256GB/s line speed, or 256 *256/288 = 228GB/s effective bandwidth.

basix · Wednesday at 8:02 AM

You are right, I somehow assumed 128 GB/s at 6.4 Gbps instead of 102.4 GB/s.

I fixed it in the original post.

But the general idea is still valid:
If LPDDR6 with 10.67 Gbps works out for a desktop grade product, LPDDR5X should be OK enough for a mobile part in most cases.

ToTTenTranz · Wednesday at 11:20 AM

branch_suggestion said:
L5X is gonna be around for a very long time, will steadily trickle down the pricing ladder but high volume stuff won't switch to LP6 until 2030+

Why? We're already getting LPDDR6 this year with the Snapdragon 8 Elite Gen 6 Pro / SM8975 and probably Mediatek 9600.

LPDDR5X took less than a year between being in the first smartphone and going into most higher-end laptop chips (Raptor Lake H/U and Phoenix).
Why would the PC market quadruple the latency to adopt the new memory standard, this time?

adroc_thurston · Wednesday at 11:23 AM

ToTTenTranz said:
Why?

PHY area.

ToTTenTranz · Wednesday at 11:34 AM

adroc_thurston said:
PHY area.

Isn't 48bit LPDDR6 going to be at least as fast, if not faster than 64bit LPDDR5X at a lower power? This ought to be a reason to accelerate adoption rate for mid/high end smartphones, not slow it down.

adroc_thurston · Wednesday at 11:49 AM

ToTTenTranz said:
Isn't 48bit LPDDR6 going to be at least as fast, if not faster than 64bit LPDDR5X at a lower power?

No, LP6 barely pushes pin rates up.

ToTTenTranz said:
This ought to be a reason to accelerate adoption rate for mid/high end smartphones, not slow it down.

No, LP6 is not coming to mainstream SoCs for a long long while due to catastrophic PHY area efficiency.

jpiniero · Wednesday at 12:09 PM

Funny, I was thinking that the PS6 Portable would use it.

basix · Wednesday at 12:18 PM

adroc_thurston said:
No, LP6 is not coming to mainstream SoCs for a long long while due to catastrophic PHY area efficiency.

Why are LP6 PHY area inefficient? Compared to LP5X I do not see any obvious reason.

If you want to say that area efficiency is on par with LPDDR5X (bandwidth per area) or not much better, OK.

Doug S · Wednesday at 5:23 PM

Tuna-Fish said:
Forgot to source it. You can find it clearly from Synopsys data sheet for the combo PHY here: https://www.synopsys.com/dw/doc.php/ds/c/dwc_lpddr6_5_5x_5_phy_ds.pdf Bottom of page 2.

You have to give them your phone number and email for the download, but they don't actually check anything except that they are potentially valid, you can just use throwaways if you want to.

OK so that's how I originally thought they worked, then someone here told me I was wrong about that - plus I later saw some article claiming the same. So I assumed they'd figured out some way to do something like a 96 bit wide combo controller that provided either 6 LPDDR5X or 4 LPDDR6 channels.

If that's not the case then for the case of a combo controller then LPDDR6 is better because it makes more efficient use of that resource. If you design with an LPDDR6 only controller though then there's no benefit - because the chip area and shoreline used 96 bits worth of LPDDR6 controller and 96 bits worth of LPDDR5X controller are essentially the same. I know that's not relevant to AMD since they are probably going to be forced to do combo controllers to provide OEM flexibility but not everyone will be forced to go that way.

Tuna-Fish · Wednesday at 7:07 PM

Doug S said:
If you design with an LPDDR6 only controller though then there's no benefit - because the chip area and shoreline used 96 bits worth of LPDDR6 controller and 96 bits worth of LPDDR5X controller are essentially the same.

They are not. The DQ pins are not the only pins on the interface. LPDDR5X has 72 active signals per 32-bit dual channel controller, while LPDDR6 has 84 active signals per 48-bit dual channel (4x half channel) interface. 3/2 data signals but only 7/6 times the pins. Or, 96-bit LPDDR6 uses only 168 signals, while 96-bit LPDDR5X uses 216. Even after you adjust for the 8/9 loss of efficiency from sharing the DQ pins, LPDDR6 comes out ahead.

LPDDR6 is a neat and efficient design.

Tangopiper · Wednesday at 7:24 PM

jpiniero said:
Funny, I was thinking that the PS6 Portable would use it.

192b LPDDR5X is the current suggestion

Det0x · Thursday at 8:21 AM

hmm X3DL3 = 3 layers ? 🧐
wishful thinking -> 48 + 3*48 = 192mb per CCD

What do you guys think ? Why would he write it like that for Zen6 while Zen5 is normal "X3D"

adroc_thurston · Thursday at 8:22 AM

Det0x said:
View attachment 138226

hmm X3DL3 = 3 layers ? 🧐
wishful thinking -> 48 + 3*48 = 192mb per CCD

What do you guys think ? Why would he write it like that for Zen6 while Zen5 is normal "X3D"

no such thing

Det0x · Thursday at 8:28 AM

Some ppl also said no "such thing" to 9950X3D2 also if memory serves me right

But yea, only wishful thinking/hoping from me on this one.. But they could if they really wanted / was under pressure from bLLC size

MS_AT · Thursday at 9:15 AM

Det0x said:
wishful thinking -> 48 + 3*48 = 192mb per CCD

Wouldn't that be less technical feasible? I mean atm they are using 1L 64MB die. I guess stacking 2 of those, if they wanted to stack would have been easier than producing smaller 48MB dies and stacking those 3 times. If for some reason, they would need L3 areas to match, so 48MB in CCD and 48MB in X3D die, then to recoup the capacity I could envision them doing 2L, so 96MB of additional L3. That is 144MB of L3 in total, matching Intel in marketing wars, and easier to produce.

Joe NYC · Thursday at 9:56 AM

Det0x said:
View attachment 138226

hmm X3DL3 = 3 layers ? 🧐
wishful thinking -> 48 + 3*48 = 192mb per CCD

What do you guys think ? Why would he write it like that for Zen6 while Zen5 is normal "X3D"

Most likely L3 as in L3 cache.

BTW, most of the other things in the post are wrong.

adroc_thurston · Thursday at 10:08 AM

MS_AT said:
smaller 48MB dies

The die size is always an exact match of the CCD.

Joe NYC · Thursday at 10:12 AM

Det0x said:
Some ppl also said no "such thing" to 9950X3D2 also if memory serves me right
But yea, only wishful thinking/hoping from me on this one.. But they could if they really wanted / was under pressure from bLLC size

That would certainly be a dream for AMD if they managed to sucker Intel into L3 size competition, where AMD would be stacking cheaper dies with low latency SRAM, while Intel is ballooning the N2 die size and increasing latency.

Covfefe · Thursday at 10:22 AM

adroc_thurston said:
The die size is always an exact match of the CCD.

Nope, it uses structural silicon. AMD gave some details when the 5800X3D launched.

And you can see the structural silicon after delidding. Like this pic of the 7950X3D found in this article.

Hitman928 · Thursday at 10:35 AM

Covfefe said:
Nope, it uses structural silicon. AMD gave some details when the 5800X3D launched.
View attachment 138229
And you can see the structural silicon after delidding. Like this pic of the 7950X3D found in this article.
View attachment 138228

Starting in Zen 5 the cache chiplet is now on bottom and matches the CCD geometry.

adroc_thurston · Thursday at 10:36 AM

Covfefe said:
Nope, it uses structural silicon. AMD gave some details when the 5800X3D launched.
View attachment 138229
And you can see the structural silicon after delidding. Like this pic of the 7950X3D found in this article.
View attachment 138228

Zen 5 moves the cacheslab to the bottom and they're the exact same die size (guess why).

Covfefe · Thursday at 10:48 AM

Hitman928 said:
Starting in Zen 5 the cache chiplet is now on bottom and matches the CCD geometry.

adroc_thurston said:
Zen 5 moves the cacheslab to the bottom and they're the exact same die size (guess why).

One generation of the cache chiplet matching CCD size is not "always an exact match". Zen6 could move the cache back on top, or use structural silicon underneath.

Joe NYC · Thursday at 11:36 AM

Covfefe said:
One generation of the cache chiplet matching CCD size is not "always an exact match". Zen6 could move the cache back on top, or use structural silicon underneath.

How is this: Henceforth, cache die will be underneath.

adroc_thurston · Thursday at 11:52 AM

Covfefe said:
One generation of the cache chiplet matching CCD size is not "always an exact match". Zen6 could move the cache back on top, or use structural silicon underneath.

It is always an exact match since they moved to wafer on wafer stacking.

Z O X · Thursday at 12:19 PM

Covfefe said:
One generation of the cache chiplet matching CCD size is not "always an exact match". Zen6 could move the cache back on top, or use structural silicon underneath.

Why not on both sides for cache galore?

adroc_thurston said:
It is always an exact match since they moved to wafer on wafer stacking.

So it's 48MB L3 in one plane and only 2*64MB underneath the whole CCD area (L3+12 cores)?
Doesn't seem logical or am I missing something...

Edit: correction

Question Zen 6 Speculation Thread

Member

Senior member

Senior member

Diamond Member

Senior member

Diamond Member

Lifer

Senior member

Diamond Member

Golden Member

Member

Golden Member

Diamond Member

Golden Member

Senior member

Diamond Member

Diamond Member

Diamond Member

Member

Diamond Member

Diamond Member

Member

Diamond Member

Diamond Member

Member