AMD Bristol/Stoney Ridge Thread

Page 61

NostaSeronx

Diamond Member
Sep 18, 2011
3,451
953
136
Even though I do think the GloFo Zen 3 is real, I totally agree with this. I'm not sure why AMD thinks this is a market worth spending resources on even if GloFo wafers are cheap. Dali does need to be upgraded if they are going to continue with it though.
Cue Monet.
GloFo Zen3/Monet is definitely not real: 4c Zen3 & 3+ RDNA2 WGPs would come in at a goliath size of 250+ mm2.

Especially when Dali/Pollock is directly succeeded by Mendocino:
Family 17h 20h-2Fh -> Family 17h A0h-AFh
0x820F00 -> 0x8A0F00
FP5/FT5 128-bit/64-bit DDR4-2400 -> FT6 128-bit/64-bit LPDDR5-6400
14nm Dual-core Zen/GCN CUs -> 6nm Quad-core Zen2/RDNA2 CUs

From a cost-perspective there is no expected cost increase going from Dali/Pollock's $300 range to Mendocino's $300 range. However, Mendocino's $200 reach is more cost-prohibitive, hence Mendocino's increased salvaged die capability.

----
However, AMD is technically using different edges between Stoney and Raven2/Mendocino: Stoney was quite literally designed and taped out at the tail end of the node's life.
28nm design in 2011 -> Stoney design start 2014 -> Stoney prod in 2016

For perspective, if it occurred the same with 22FDX:
22FDX(Milestone 6) design in Q4 2016 - Q1 2017 -> Equivalent time frame is 2020 design start -> Production in 2022.

However, it might make sense to design only once trailing-edge IP catches up with leading-edge IP, allowing DDR5/PCIe5/USB4 to be on die while launching with only DDR4/PCIe4/USB3.2. Then, once DDR5/LPDDR5 gets cheaper, it only takes a refresh (same maskset) to enable it.
 
Last edited:

jpiniero

Lifer
Oct 1, 2010
11,322
3,052
136
I'm pretty sure AMD currently (in the current shortage and supply-chain disruption) doesn't think so. But the easy answer is: WSA. $1.6 billion needs to be spent on wafers from GloFo either way; better to spend it on something that can fetch a decent amount even if it's a budget chip. Cue Monet.
Funny thing is, the more profitable idea would be to re-release Polaris and sell mining cards. Patch it to use GDDR6 if necessary.
 

LightningZ71

Golden Member
Mar 10, 2017
1,236
1,250
136
Can you imagine the absolute bank AMD could be making with an RX 560 8GB mining card right now? Baffin is relatively tiny, and a 12nm update that includes the ability to run 8GB of RAM would make for an efficient and cheap mining card.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,451
953
136
Yet another use case where 22FDX can re-use 28nm designs and have lower power consumption than FinFETs. Especially with AMD's slew of body-biasing GPU research/patents that went nowhere in FinFETs.

Lower electrical costs, thus faster return on investment. Especially with 22FDX loving big phat compute chips, and body-biasing decreasing power to a maximum savings of 90%: three shrinks plus one optimization [50%(shrink), 50%(shrink), 50%(shrink), 20%(optimization)] compounding multiplicatively into that power saving.

1x monolithic R9 Nano (175W TDP, ~600 mm2) split into a 4x MCM of sub-200 mm2 dies, with HBM upgraded to HBM2e. While also providing scaling from 4x 1024-ALU down to 1x 1024-ALU cryptomining SKUs. Best-case scenario is the 4x-die SKU consuming ~17.5W for the same throughput as the R9 Nano. There isn't a need for extra-fast HBM2e either; just use higher-capacity, slower salvaged DRAMs to cut cost.
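Those savings compound multiplicatively rather than additively; a quick sanity check of the numbers above (the percentages and the 175W R9 Nano TDP are this post's figures, the code is just arithmetic):

```python
# Successive power reductions compound multiplicatively: each step keeps
# a fraction (1 - r) of the previous power; it is not a simple sum.
def remaining_power(initial_w, reductions):
    """Apply successive fractional power reductions to a starting TDP."""
    power = initial_w
    for r in reductions:
        power *= (1.0 - r)
    return power

nano_tdp = 175.0  # R9 Nano TDP in watts
# Three 50% shrinks plus one 20% optimization -> 10% of original power
best_case = remaining_power(nano_tdp, [0.5, 0.5, 0.5, 0.2])
print(best_case)                 # 17.5 (W)
print(1 - best_case / nano_tdp)  # 0.9, i.e. the 90% maximum saving
```

So the best case for the 4x-die MCM works out to roughly 17.5W at the R9 Nano's throughput, which is where the figure above comes from.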

Then, there is the reduced need for salvages due to lower variability:
22fdx.png - circa 2018
Transistor-side
Reduced variability on masks/process steps wafer-side. Less BTI/HCI effects affecting design-side, etc.

It also follows ETH Zurich's Manticore for AI/compute, but for cryptocurrency, while following the RX 5700XTB, RX 5700B, and RX 5500XTB HBM2 cryptomining *cough* blockchain cards.

It is also the recommended node for cryptocurrency on GloFo's China website.
 
Last edited:

NTMBK

Diamond Member
Nov 14, 2011
9,696
3,536
136
Yet another use case where 22FDX can re-use 28nm designs and have lower power consumption than FinFETs. Especially with AMD's slew of body-biasing GPU research/patents that went nowhere in FinFETs.

Lower electrical costs, thus faster return on investment. Especially with 22FDX loving big phat compute chips, and body-biasing decreasing power to a maximum savings of 90%: three shrinks plus one optimization [50%(shrink), 50%(shrink), 50%(shrink), 20%(optimization)] compounding multiplicatively into that power saving.

1x monolithic R9 Nano (175W TDP, ~600 mm2) split into a 4x MCM of sub-200 mm2 dies, with HBM upgraded to HBM2e. While also providing scaling from 4x 1024-ALU down to 1x 1024-ALU cryptomining SKUs. Best-case scenario is the 4x-die SKU consuming ~17.5W for the same throughput as the R9 Nano. There isn't a need for extra-fast HBM2e either; just use higher-capacity, slower salvaged DRAMs to cut cost.

Then, there is the reduced need for salvages due to lower variability:
View attachment 53357 - circa 2018
Transistor-side
Reduced variability on masks/process steps wafer-side. Less BTI/HCI effects affecting design-side, etc.

It also follows ETH Zurich's Manticore for AI/compute, but for cryptocurrency, while following the RX 5700XTB, RX 5700B, and RX 5500XTB HBM2 cryptomining *cough* blockchain cards.

It is also the recommended node for cryptocurrency on GloFo's China website.
If 22FDX was so good for big compute chips, AMD and Nvidia would be using it. They know full well the offerings from GloFo, and they instead chose to use Samsung and TSMC.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,451
953
136
If 22FDX was so good for big compute chips, AMD and Nvidia would be using it. They know full well the offerings from GloFo, and they instead chose to use Samsung and TSMC.
It isn't a question of whether it was good; it's that it wasn't available while those FinFET designs were being done. Nvidia is too far leading-edge to even bother with GlobalFoundries.

Nvidia has a full stack on 8nm Samsung, with no 28nm/16nm/12nm. So unless Samsung is hiding 8nm FDSOI (STM's 10nm FDSOI node) for Nvidia's use in a low-power Ampere refresh, it is rather unlikely Nvidia would fall back to 28FDS or 18FDS. Samsung has a bigger capital expenditure, so expecting 8nm FDSOI as an 8LPP/8LPU cost-saving option is likely.
fdsoifinfet.jpeg
10nm FDSOI from STM: 0.7V Vdd-nom
10nm/8nm FinFET as fielded by Samsung: 0.75V Vdd-nom, so there is a case for actual 10nm FDSOI reducing the cost and power of Ampere.

10FD's initial transistors are the first time a plus is not needed.
28FD required 28FD+ to get faster than the 20HKMG offerings: it began slower than 20nm, and by production came out faster than 20nm.
14FD required 14FD+ to get faster than the 16/14 FinFET offerings: it began slower than 14nmFF, and by production came out faster.
10FD doesn't require 10FD+ to get faster than the 10/8 FinFET offerings: it begins faster than 10nmFF, and by production is much faster.

|| 22FDX uses 14FD+ transistors, whereas 22FDX+ uses 12FD transistors, which are 10FD transistors but without the 10nm FDSOI Dual-STI bi-directional body-biasing perf booster. 12FDX-2019, meanwhile, uses 12FD transistors and does use Dual-STI bi-directional body-biasing.

However, AMD has precedent for using prior nodes on lower-end parts for extended periods:
Radeon HD 7670 (Turks XT) - 40nm
Radeon HD 7470/Radeon HD 8490/Radeon R5 235X (Caicos) - 40nm
Radeon RX 455 (Bonaire)
Radeon R7 450 (Cape Verde)
Radeon 530/Radeon 610 (Topaz) (2017/2019) <-- New die relative(No UVD/VCE/DCE) to Oland.
Radeon Instinct MI8 (Fiji) (2016)

On the MI level (big GPU) and the Radeon level (small GPU), AMD has an increasingly large unexploited gap. They were still using 28nm parts there, so there is no reason not to expect 22FDX at GlobalFoundries from AMD.

AMD has yet to announce the node for the ULP Arch:
gpuarch.png

If there is a partnership going on with AMD and GlobalFoundries, it is with a node that has scaling internal to GlobalFoundries.
CDNA TSMC 7nm -> CDNA2 TSMC -> CDNA3 TSMC
RDNA TSMC 7nm -> RDNA2 TSMC -> RDNA3 TSMC
XDNA GloFo 22FDX -> XDNA2 GloFo -> XDNA3 GloFo

28nm/22nm PowerVR/Adreno/Mali wipes the floor with 28nm GCN in power efficiency.

Alternatives to GPGPUs on 28nm/22nm offer full-rate FP64 SIMD and INT8 matrix, while low-end 28nm/22nm GPGPUs are adopting ray tracing.

GCN Full-CU per scheduler: 4x Wave16(SIMD) 32-bit (4*64B/4x 16 Load-Store PRF:4 TMUs)
to
XDNA Tiny-CU per scheduler: 1x Wave16(SIMD or Matrix) 64-bit (1x128B/1x 16 Load-Store PRF:4 TMUs:1 RT[or equivalent performing alternative]).
Changing from 32-bit to 64-bit in this way should yield an area cut of ~0.8x on the same node. With 22FDX's 0.5x logic/0.9x memories scaling, it should benefit even more.
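As a back-of-envelope sketch of how those two factors stack (the 0.8x architectural cut and the 0.5x/0.9x node scaling are from the post; the 10 mm2 block size and the 60/40 logic-to-SRAM split are invented placeholders, not from any AMD floorplan):

```python
# Combine an architectural area cut with per-block node scaling.
# Logic and memory shrink differently, so weight them by area fraction.
def scaled_area(area_mm2, logic_frac, logic_scale, mem_scale, arch_cut):
    mem_frac = 1.0 - logic_frac
    node_scale = logic_frac * logic_scale + mem_frac * mem_scale
    return area_mm2 * node_scale * arch_cut

# Hypothetical 10 mm2 28nm CU block, assumed 60% logic / 40% SRAM:
# 0.5x logic, 0.9x memories (22FDX), then the 0.8x architectural cut.
print(scaled_area(10.0, 0.60, 0.5, 0.9, 0.8))  # ~5.3 mm2, roughly half
```

Under those assumed fractions the combined effect is close to halving the block, which is the "should benefit more" argument in numbers.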

There is also a bigger availability of 3rd-party IP/semi-custom options for 22FDX, in turn returning to a world where full semi-custom SoCs aren't largely just Microsoft or Sony.

Geode = Last Time Purchase 2021
Excavator = Last Time Purchase 2021
Jaguar = Last Time Purchase 2023
Embedded will eventually need parts that more effectively succeed Geode/Excavator/Jaguar.
As it stands, the QM215, RK3566, BCM2711, and JH7110 are just better.

The various mentions of a ULP CPU architecture, ULP GPU architecture, and ULP SoC, together with who is working on those projects and the explicit, repeated ultra-low-power mentions, more or less have me thinking 22FDX-ULP [14FD+ wafer/xtors] / 22FDX-ULP+ [12FD wafer/xtors] rather than 12LP+.

Extending from 22FDX/14FD+ to 22FDX+/12FD should be similar to 22GP to 22FFL (basically 14nm++ performance) at Intel.
g.foolcdn.com.png
 
Last edited:

moinmoin

Diamond Member
Jun 1, 2017
3,306
4,551
136
Funny thing is, the more profitable idea would be to re-release Polaris and sell mining cards. Patch it to use GDDR6 if necessary.
In the current state possibly, but you have to plan well in advance and risk supplying the market with an outdated product nobody wants to buy. The current state is an exception, not a new normal to base predictions on; the whole industry is struggling to get back to the usual modus operandi.
 

NTMBK

Diamond Member
Nov 14, 2011
9,696
3,536
136
It isn't a question of whether it was good; it's that it wasn't available while those FinFET designs were being done. Nvidia is too far leading-edge to even bother with GlobalFoundries.

Nvidia has a full stack on 8nm Samsung, with no 28nm/16nm/12nm. So unless Samsung is hiding 8nm FDSOI (STM's 10nm FDSOI node) for Nvidia's use in a low-power Ampere refresh, it is rather unlikely Nvidia would fall back to 28FDS or 18FDS. Samsung has a bigger capital expenditure, so expecting 8nm FDSOI as an 8LPP/8LPU cost-saving option is likely.
NVidia has a much larger product lineup than just cutting-edge parts. They are still cranking out 16nm Switch parts, embedded GPUs and Tegras on older processes, and they brought the Samsung 14nm 1050 Ti back from the dead to fight the GPU shortages. There are parts they could move over to 12FDX to expand volume (e.g. entry-level GPUs), if the process were as good as you claim.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,451
953
136
NVidia has a much larger product lineup than just cutting-edge parts. They are still cranking out 16nm Switch parts, embedded GPUs and Tegras on older processes, and they brought the Samsung 14nm 1050 Ti back from the dead to fight the GPU shortages. There are parts they could move over to 12FDX to expand volume (e.g. entry-level GPUs), if the process were as good as you claim.
The issue isn't whether the process is as good as GlobalFoundries' or Samsung's claims. Rather, it suffers from "it is not out now" syndrome, or "it doesn't scale with whatever generation the shrinks are currently on".

Nvidia would be specific to Samsung not GlobalFoundries, the nodes are:
28FDS [2015] // 28FD Xtors
28FDS Gen2 [2017] // 28FD+ Xtors == no numbers from Samsung whatsoever.
18FDS [2019] // 14FD Xtors

18FDS is not area or performance competitive with 14nm Family nodes at Samsung.
18fds.png

18FDS = 90CPP/64Mx, effectively succeeding 20LPP from Samsung.
14LPP = 78CPP/64Mx, with ~30% more performance at ~0.786x the area of 18FDS.

For a shrink going forward it has to be 10nm FDSOI, which (at STM) targeted 64CPP/48Mx, in between 10LPP's 68CPP/48Mx and 8LPP's 64CPP/44Mx.

Relative to GlobalFoundries, Samsung favors power optimization over performance (compared to STM). Samsung is 1:1 with STMicroelectronics' roadmap but delayed, including by using non-plus transistors first.

18FDS Gen2 at earliest is 2021 and 8FDS Gen1 at 2023.

Switch is EOL on 16nm, btw. NX2/Dane/Shield2, embedded, and Tegras are 8nm, with the rest of the prior stack, rest their souls, stone-cold dead and not being fabbed. So the value stack is still more or less 8nm-only going forward.
----------------------
On the AMD side, they are required to fab at GlobalFoundries, and they don't need the shrink.
28nm IP as-is is good enough, but look at what is available on 22FDX: LDMOS for PMIC integration, RF for RF integration, analog for audio-codec integration. Going from 28nm to 22FDX therefore means lower costs for board manufacturers, though I believe the above will most likely be pushed to semi-custom. The idea is to go from good-enough on 28nm to excelling on 22FDX with newer IP, hence the ULP archs. AMD can do that, and since the bottom of the stack is the QM215, RK3566, BCM2711, JH7110, PicoRio v3, PowerPI, etc., there isn't any push to use leading-edge 7nm/5nm to compete against them. So it is better to opt for 22FDX instead.

Rather than finessing a 150 mm2/170 mm2 14nm/6nm leading-edge die to push for low cost and low power, there are cost savings in refactoring the 107 mm2/125 mm2 28nm trailing-edge dies to push for low cost and low power.

Refactor Jaguar-Excavator into a single ULP CPU arch. With Jaguar's client/embedded-focused FE,Core,LS,FPU,BU,L2 rather than Excavator's HPC/server-focused FE,Core,LS,FPU,BU,L2.
Refactor GCN into the ULP GPU arch. The value-end integrated GPU is going more aggressively half compute [FP64/INT8], half gaming [RT-processor inclusion].
Refactor system IP to fit into sub-100 mm2.
----------------
The rant:
14fdvs14nmpp.png
Upscaled 14FD+ w/ 36 Masks to upscaled 14++ w/ >47 masks.

Then there is 22FDX+ in the second half of 2020, with RF+ in Q1 2021 and improved digital performance: "With both digital and RF enhancements, the new 22FDX RF+ solution is optimized to boost the performance of front-end-module (FEM) designs." With no mention of how much the digital aspect even improved, either...
https://www.globalfoundries.com/sites/default/files/22fdx-product-brief.pdf <== Updated in 2019, still used 2016-2017 transistor performance instead of 2018-2019 transistor performance.

https://gf.com/sites/default/files/2021-10/GF21-22FDX 0927.pdf <== Updated in 2021, still using 2016-2017 numbers; doesn't even mention 22FDX+, which they launched the year before. It also has an error built in: 28FDSOI instead of 28BULK.

Compared to the 28nm generation, it is very easy to get performance numbers for: 28HP -> 28HPP -> 28SHP/28A -> 28HPA. Getting performance numbers for every little transistor improvement for 22FDX is pretty difficult, since GlobalFoundries isn't bragging about them.

The 22FDX risk-production transistor is the one shown above in the GF22FDX-versus-everything comparison and in the PDFs. The 22FDX volume-production transistor appeared in Q3 2019, and the 22FDX+ transistor appeared in Q3 2020. Basically, two major spins of 22FDX have not been benched at all, other than in paywalled IEEE papers.

22FDX RP Xtor has an expected Tsi of 7nm, BOX=20nm.
22FDX VP Xtor has an expected Tsi of 6nm, BOX=20nm.
Which has relatively huge implications for performance, since Tsi and Tsi^2 appear in the electrostatics equations. There are also the source/drain perf booster, the gate implant/oxide perf booster, and the lower-resistance/capacitance MOL contact perf booster, which were not present in 22FDX risk but were there when 22FDX went volume in Q3 2019.

22FDX+ on STMicroelectronics' roadmap has BOX=15nm and Tsi reduced by 10%-15%, which means Tsi is 5.1nm-5.4nm.

Which is involved with this:
KOYu5AD.png

There is also a new "Nano-thin SOI" transistor to better exploit the upcoming Tsi=3.5nm/BOX=10nm, which back in the IBM days demonstrated a planar Lg/effective gate of ~11nm. It is supposed to be introduced last in 22FDX+, but first in 12FDX; effectively CTI from the 90nm/65nm generation = https://www.anandtech.com/show/2018 .. It replaces the Ultra-Thin SOI transistor, which was built for Lg=100-50nm and Tsi=50-10nm.
-----
Back to AMD:
Start in early production for most obsolete implementation or start in mature production for least obsolete implementation.
22FDX 2019 -- DDR4/LPDDR4 only
22FDX 2021-2022 at least has DDR5(Open-Silicon IP)/LPDDR5(Innosilicon{TSS-OPENEDGES} IP).

Even though the A10 Micro-6700T/A8-7410 embedded versions are still technically available for purchase up to 2023, it probably would be more in the open if it also had DDR4 capability.

Dual/Four cores @ 1.5 MB L2, no L3. = $37 to $80 RCPs.
There is still a case for no L3.

Starting off with Jaguar's FPU: a single Jaguar FPU is 0.39 mm2, with a dual-core Jaguar being 5.42 mm2 if you take out the two FPUs. Also, when you have a Jaguar core and a horizontally flipped Jaguar core:
amdcmtgain.jpeg
The FPU areas are directly next to each other: 2x FP-Mul and 2x FP-Add. The only things needed are a bridge unit for FMAs and the addition of AVX2 support. Then, unit-wise, the FPU is similar to Zen's.
amdcmtfpu.png
Was too lazy to hack the rename parts, which probably would be fused like Zen3's 2x FPU unit. Zen3's FPU diagram (Zen3 recap/overview slides) shows a single PRF, but as you can see on die, Zen3 is two 160?-entry PRFs tied to three 256-bit units.
zen3fpu.jpeg

Zen1: a single large 160-entry PRF tied to four 128-bit units:
zenprf.jpeg

Flat shrink via measurement tools.
If the early node promises actually followed the graph: green line 22FDX and purple line 22FDX+ (22FDX-FBB = 22FDX+-ZBB):
perfpowe22fdx.jpeg

Scaling from the >52% slide:
52percent.jpg

From a refactoring and improvement side:
Jaguar being 5-tiled, Zen/Zen2 20-tiled, and Excavator 63-tiled leans towards refactoring up from Jaguar as the cheapest option.
-----
sccmp.png
1x Single Core = 1x Area, 1x Perf
2x Thread in SMT = ~1.05x Area, ~1.3x Perf
2x Core in CMP = 2x Area, ~1.7x perf
2x Cluster in CMT = ~1.5x Area, ~1.8x perf
^-- Actual scaling that CMT was meant to do.
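Dividing those multipliers out gives throughput per unit area, which is the point of the list (the area/perf factors are the ones above, taken at face value):

```python
# Perf-per-area for each threading strategy, normalized to one core.
options = {
    "1x single core":    (1.00, 1.0),  # (area multiplier, perf multiplier)
    "2x threads (SMT)":  (1.05, 1.3),
    "2x cores (CMP)":    (2.00, 1.7),
    "2x clusters (CMT)": (1.50, 1.8),
}
for name, (area, perf) in options.items():
    print(f"{name}: perf/area = {perf / area:.2f}")
# CMT (1.20) clearly beats CMP (0.85) on perf/area, which is the scaling
# argument being made; SMT is the densest of all at ~1.24.
```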
-----
Q4 2009--Q1 2018 only positive N.I. = $7.631B
Q4 2020--Q3 2021 N.I. = $12.778B
Risk is lower than ever, since in a continuous period of less than one year, AMD made more than they made discontinuously over nine years.

22FDX -> 12FDX == Totally a GlobalFoundries route.
12LP+ -> ???? == Not a GlobalFoundries route.

22FDX w/ Zen-like and RDNA-like performance has a shrink available down the road.
12LP+ w/ Picasso-(plus sized) Zen3 and RDNA2 at reduced performance does not have a shrink available.
(Monet = 4C Zen3 + a couple of WGPs, which is in line with:
https://www.amd.com/en/products/apu/amd-ryzen-3-3350u
https://www.amd.com/en/products/apu/amd-ryzen-5-3450u
Which also happens to be Mendocino's target range, with Zen2+RDNA2 on TSMC's cost-effective 6nm.
Cost per layer compared: SADP/SAQP layers = 0.71x/1x each && EUV layer = 0.57x each; basically 7nm-with-EUV (6nm) costs the same as 14nm-DUV (14/12) in HVM.)
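A rough sketch of how those per-layer multipliers combine into a wafer patterning cost (the 0.71x/1x/0.57x multipliers are the post's; the layer counts below are invented purely for illustration, not real stack definitions):

```python
# Relative patterning cost = sum of per-layer cost multipliers
# (SADP 0.71x, SAQP 1.0x, EUV 0.57x, per the post).
def stack_cost(n_sadp=0, n_saqp=0, n_euv=0):
    return 0.71 * n_sadp + 1.0 * n_saqp + 0.57 * n_euv

# Hypothetical stacks: swapping multi-patterned DUV layers for a few
# EUV layers keeps the total cost in the same ballpark.
duv_heavy = stack_cost(n_sadp=10)          # ~7.1
euv_mix   = stack_cost(n_sadp=5, n_euv=6)  # ~6.97
print(duv_heavy, euv_mix)
```

Which is the shape of the "7-with-EUV costs the same as 14-DUV" claim: cheaper-per-layer EUV offsets the extra patterning the denser node needs.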

For GlobalFoundries it is better to have AMD have a CPU-GPU-APU-SCBU product line on their highest returning node, rather than a single APU on their lowest return node. In this case AMD is buying 22FDX for 12FDX down the road. (If extra capacity is needed apart from Fab 1/Bernin II, there is Fab 7/Pasir Ris and Fab 8/O'Fallon.)

Whereas until JFIL springs to life, there is no 7LP coming from buying 12LP+. From a really badly cross-translated news article: Canon will strive to apply the NIL mass-production technology to multiple chip fields, including PC, CPU, DRAM and other chip equipment. Canon stated that this technology can be applied to the manufacturing of 5nm chips within 2025, that is, within the next 4 years.
---
Finding the LV cache: the 22FDX low-power CPU/GPU architectures' L2 caches are apparently using the 0.65V 6T LV cell (in a custom cache design) from that 22FDX v. 22FFL slide.

Three CPU categories and their most recent introduced family: Performance (Zen), Value (Jaguar), Pervasive (Bobcat).
GPUs are split in three as well, with their most recent implementations: Compute (CDNA), Gaming (RDNA), Quiet/Value (not yet revealed).
Value-Pervasive => ULP CPU Arch
Value-Quiet => ULP GPU Arch

Performance is sub-categorized: Zen4(Performance Cache) - Zen4c(Dense Cache)
The ULP arch's cache design is also supposed to be scaled up to Zen[x][?] (low-voltage cache) and CDNA[x][?] (low-voltage cache). I assume these two will be released as reduced-TDP versions for low-cost-cooled server solutions. CPU-side full core count: 85W/99W/115W; GPU-side full CU count: 100W/125W/150W.
TSMC 7nm SRAM Vmin is 0.5V
TSMC 5nm SRAM Vmin is 0.4V
The LV cache is meant to operate near those voltages, but the nearer it operates to Vmin, the larger the cache design gets. If the L2/L3 operate there, so will the CPUs; Zen2 64C/128T seems to operate at 0.85V on average. In this case, with the LV cache plus a shrunk ULP BPU, the core would be pushing 0.65V or lower nominally. The range does appear to go straight to Vmin, opening up 0.4V through sub-0.75V.
96c/192c w/ Reduced Capacity/LV Cache = sub-150W target
96c/192c w/ Performance Cache = 200W+ up to 600W
128c/256c w/ Dense Cache = 200W+ up to 400W?

This hints at the low voltage SKU on 22FDX being 1/4th the TDP of normal voltage SKU of 28nm(Bhavani/Beema and Stoney).
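A first-order check on that 1/4th figure: dynamic power scales roughly with V^2 at fixed capacitance and frequency (leakage and any frequency change are ignored, so this is only a ballpark):

```python
# First-order dynamic power: P ~ C * V^2 * f. At equal C and f, the
# ratio between two operating points is just (v_new / v_old)^2.
def power_ratio(v_new, v_old):
    return (v_new / v_old) ** 2

print(power_ratio(0.65, 0.85))  # ~0.58: LV-cache territory vs 0.85V
print(power_ratio(0.40, 0.85))  # ~0.22: close to the 1/4th-TDP claim
```

So running near Vmin instead of ~0.85V lands in the right neighborhood of the quarter-TDP estimate, before any frequency reduction is even counted.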
 
Last edited:

moinmoin

Diamond Member
Jun 1, 2017
3,306
4,551
136
Interesting quotes relating to this thread:

"A small team spent much of 2012 exploring how to get back on track with high-performance computing. It was clear that derivatives based on the Bulldozer architecture were not going to close the gap with one-year cycles of innovation. The team needed enough time to produce a new baseline architecture."

Bulldozer, Jaguar and system architecture teams were merged for the resulting core to cover everything from server to laptop. "A new independent subteam was created to focus on power through the full development cycle". The team behind the ARM K12 core moved onto Zen 3.

Also explains why innovation cycles will keep being significantly longer than one year. @DrMrLordX
 

NTMBK

Diamond Member
Nov 14, 2011
9,696
3,536
136
Interesting quotes relating to this thread:

"A small team spent much of 2012 exploring how to get back on track with high-performance computing. It was clear that derivatives based on the Bulldozer architecture were not going to close the gap with one-year cycles of innovation. The team needed enough time to produce a new baseline architecture."

Bulldozer, Jaguar and system architecture teams were merged for the resulting core to cover everything from server to laptop. "A new independent subteam was created to focus on power through the full development cycle". The team behind the ARM K12 core moved onto Zen 3.

Also explains why innovation cycles will keep being significantly longer than one year. @DrMrLordX
Funny, he didn't mention all those teams working on FD-SOI Bulldozer derivatives...
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,451
953
136
AMD's Zen-for-HPC design was 2012-2015; after that it was in Si.
AMD's revised Family 15h was 2013-2015: SteamrollerB and Excavator-HDL. Then there is the Ultra Low Power architecture that spun off from that through 2016-2018.

Stoney Ridge: 2014-2015 => Si in 2016

The Bulldozer FDSOI derivative started in 2013, one year after June 2012 (when STM allowed GloFo's customers onto 28FD and 20FD), and ended between the 2014 Tech Seminar (late 2014) and the death of the 20nm APUs (early 2015):
fdsoi.png
fdsoi2.png

The renewed CMT FDSOI design spun off from Stoney Ridge's design period, 2014-2015, with the new angle of targeting GlobalFoundries' Advanced FDSOI, which was named 22FDX in late 2015. Hence why its architecture was developed in the 2016-2018 period.

2012-2013 -> AMD swaps 15h from HPC to LP.
Late 2014 -> GlobalFoundries reveals Advanced FDSOI wrecking 20nm.
1st Half of 2015 -> AMD Cancels 20nm APUs.
2nd Half of 2015 -> GlobalFoundries announces 22FDX.
2016 through 2018 -> New Ultra Low Power CPU/GPU/Cache architecture
2019 through present -> New Ultra Low Power SoC architecture
Late 2020 -> 22FDX+ is announced

Through the duration of Zen's design:
Steamroller, SteamrollerB, Excavator-HDL, Excavator-20nm, "Excavator"-28FD+, 3rd-iteration Bulldozer-14nm, Leopard(20SOC)/Catamount(20SHP), Margay(14nm), Enhanced Jaguar(14nm/XBX&PS4P; its BTB structure and BP unit differ from the prior 28nm Jaguar).
Yes, clearly everyone was shuffled onto AMD's Zen architecture after 2012. Officially, Zen stood alone only after 2015, when all the 20nm designs (Excavator, Catamount, Cortex-A57) were cancelled after production had started; but not really.

ULP CPU architecture design appears after Zen and ULP GPU architecture design appears during Navi's design.

Stoney Ridge -> Pollock = going from 1/7th-1/9th of total product cost to 1/3rd-1/5th of total product cost => lower margins for customers, whereas Stoney had higher margins for customers. Hence why Pollock is relatively immediately replaced by AMD Mendocino | MDN-A0 Zen 2 | Socket FT6 | CPUID 0x8A0F00.

FT4:
Stoney Ridge 2016
Stoney Ridge Refresh 2017
Stoney Ridge Refresh+ 2018
Stoney Ridge Refresh++ 2019(Only A6-9220C/A4-9120C, however USB-IF leak pointed towards A9-9435 and A6-9235).

FT5:
Pollock 2020
Pollock Refresh 2021

FT6:
Mendocino 2022

The above is muddled by Pollock being a die-and-bin harvest while Stoney was low cost as a full die; Mendocino's low-cost parts will also be die-and-bin harvests.

AMD has yet to give Stoney a successor as a Value BGA socket or an Essential PGA/LGA socket contender.

28SLP -> 28HPP -> 28SHP(Kaveri)/28A(Carrizo) -> 28HPA(Bristol/Stoney) -> 20LPM -> 20SHP/20AN(Excavator/Catamount) -> 14LPP(Margay/Zen) -> 6FF(Zen2)
is increasing costs for increased performance...

While 28SLP -> 22FDX is the same cost post-1Q2019 according to GlobalFoundries/ARM:
22fdsoi.png

Which in fact does not include "Low Cost and Highly Manufacturable MOL/BEOL Constructs in 22FDSOI Technology for High Performance and Low Power Applications" :: https://ieeexplore.ieee.org/document/8421481
and other low cost performance boosters applied to 22FDX and recently 22FDX+.

Massive design window: 2013 to present, though officially started after Stoney, 2016+.
Massive product window: four years with no changes to the design, versus AMD's almost-yearly Zen -> Zen+ -> Zen2 -> Zen3.
Meaning there is another massive design window for 12FDX. So when 12FDX gets as cost-effective as 28SLP an even cheaper product can be released.

22FDX M7 in Q2 2017 onwards had D0 approaching 28nm.
fab122fdx.png
With 22FDX hitting 10x its 2017 wafers-per-month rate at the end of 2020, the maximum depreciation of 22FDX occurred recently.

Also: "adaptive body bias capability as tools and IP to implement it properly were scarce until mid-2019." So if ABB is needed, it apparently didn't exist until mid-2019.

There is another facet that is missed: ATTACKING DIFFERENT MARKETS: BULLDOZER AND BOBCAT
It wasn't originally two cores, but THREE cores:
cores.png

K9 team = Performance Core [5 GHz Opteron w/ post-L1i trace cache]
K10 team = Value Core [Bulldozer, originally meant to be Family 11h's successor] (Chuck M. (Value Core), as an AMD Turion X2 alternative ... to Mike B. (Performance Core); M.B. and crew patented collaborative mode (two clusters as one core) in 2007)
K11 team = Pervasive Core [Bobcat; nothing more was meant to be explored on it, other than Jaguar going Value and post-Puma Excavator returning to Value]

AMD Performance became Zen.
However, design-team-wise, Value and Pervasive are still there.

There is also the fourth Cores team, which is paired more aggressively with AMD India, Israel, and the ODCs.

While most posts might imply AMD64...
1990s = RISC-I later parts used for K5.
2000s = MIPS32/64 didn't lead anywhere.
2010s = ARM64 didn't lead anywhere.
2020s = Back to the roots with RISC-V.

Going forward, with the death of x86 in servers sometime in 2023-2024 and in client in 2026-2028, it makes sense to go to RISC-V for low cost rather than ARM. Also, going forward only ARM SVEx and RISC-V "V" are in pre-planning for inserting Posit (hw_unum) extensions.

As is, AMD's RISC-V CPU is RV64GB with no "C" extension, compared to SiFive.
Indirectly it supports "Vector Operations", but it is not clear whether that actually means "V" or "Custom".

There is also a RISC-V GPU, but its ISA capability isn't documented anywhere; only the CPU has its ISA scope revealed.

ULP CPU design, ULP GPU design, RISC-V CPU architecture, RISC-V GPU architecture, ULP cache design, ULP system design, 5G system architect: basically, a lot of work not going towards AMD's Zen/x86.
 
Last edited:
  • Like
Reactions: Zepp

moinmoin

Diamond Member
Jun 1, 2017
3,306
4,551
136
That was specific to CON cores . . .
No, it was AMD CPU teams' modus operandi before Zen, updating their two CPU lines (both construction and cat cores) in a yearly cycle. Zen broke free of that, and it's clear that Zen core updates by now still don't follow a yearly cycle and likely never will thanks to above lesson learned.
 
  • Like
Reactions: Tlh97 and NTMBK

NostaSeronx

Diamond Member
Sep 18, 2011
3,451
953
136
No, it was AMD CPU teams' modus operandi before Zen, updating their two CPU lines (both construction and cat cores) in a yearly cycle. Zen broke free of that, and it's clear that Zen core updates by now still don't follow a yearly cycle and likely never will thanks to above lesson learned.
???
Zen -> 2017
Zen+ -> 2018
Zen2 -> 2019
Zen3 -> 2020
Zen3+(Warhol and Chagall) or Zen4/AM4 -> 2021
Zen4/AM5 -> 2022

Other than the hiccup of 2021, seems pretty yearly.
In fact, the one that was least yearly was Family 15h, where the client side mostly saw refreshes, which Intel eventually did as well.

AMD individuals are mostly misguided, since the fact is HPC/server was killed by the Board of Directors. That is why there was a huge gap between the 2011 Opterons and 2017 EPYC: they had to swallow the change from server (2011) to mobile (2012-2016) and back to server (2017+).

Zen's direction as a mobile core luckily largely failed, which is antagonizing the BoD again. AMD very much needs a mobile/client/embedded architecture that doesn't require a 12W, 3000-second/50-minute boost. Intel's boost in the same market is only 15 seconds.

AMD has to contend with a high-cost Dali/Pollock against low-cost JPL/EKL, then AMD's Monolithic high-cost MDN has to contend with the low-cost tiled ADL-N(CPU/GPU+Chipset) and MTL-N(CPU+GPU+Chipset).

AMD needs to re-introduce the $25, sub-$30 range with a Stoney successor, which can coast in below the competition on power and cost, since performance for AMD costs battery and heatsink expenditures.
Low-cost micro-notebooks are back, low-cost gaming handhelds are back, embedded on land and in space is back, etc. All of which need cheap, reliable, secure processors.

22FDX Fully Optimized is 20% faster than 14LPP Fully Optimized, or 22FDX Fully Optimized is 20% lower power than 14LPP Fully Optimized.
However, that is with the same design. We are looking at a thin-OoO value CMT processor instead: rather than a >192/>128+>128 instruction-window module for both Int and FPU, ideally something like >64-int0 + >64-int1 + >64-fpu01.

It also follows GlobalFoundries push for pervasive.
pervasive.jpg
 
Last edited:

DrMrLordX

Lifer
Apr 27, 2000
19,165
7,926
136
No, it was AMD CPU teams' modus operandi before Zen, updating their two CPU lines (both construction and cat cores) in a yearly cycle. Zen broke free of that, and it's clear that Zen core updates by now still don't follow a yearly cycle and likely never will thanks to above lesson learned.
I will respectfully disagree. You are misinterpreting the quote. It speaks only of the need to abandon the CON cores. Zen1 itself took years of development which is what is indicated in that quote. It really says nothing about AMD's MO post-Zen. How or why iterations upon the Zen design have taken as long as they have is entirely orthogonal to their need to abandon CON (rather than continuing to work on it in cycles; that the cycles were yearly is entirely unrelated to what they do today).
 

moinmoin

Diamond Member
Jun 1, 2017
3,306
4,551
136
I will respectfully disagree. You are misinterpreting the quote. It speaks only of the need to abandon the CON cores. Zen1 itself took years of development which is what is indicated in that quote. It really says nothing about AMD's MO post-Zen. How or why iterations upon the Zen design have taken as long as they have is entirely orthogonal to their need to abandon CON (rather than continuing to work on it in cycles; that the cycles were yearly is entirely unrelated to what they do today).
The quote was: "It was clear that derivatives based on the Bulldozer architecture were not going to close the gap with one-year cycles of innovation."

I say the failure to close the gap is not specific to the Bulldozer architecture but to the one-year cycles of innovation. Those cycles are too short to implement meaningful innovations. You have previously complained that AMD's Zen generation cycles are on the long side. I say they are not slip-ups but intended that way, with today's one-year cycles focused on the market, namely the mainstream APU line, not on the cycles that drive the actual innovations, like changes in the Zen core.
 
  • Like
Reactions: Tlh97

Shivansps

Diamond Member
Sep 11, 2013
3,609
1,284
136
AMD has to contend with a high-cost Dali/Pollock against low-cost JPL/EKL, then AMD's Monolithic high-cost MDN has to contend with the low-cost tiled ADL-N(CPU/GPU+Chipset) and MTL-N(CPU+GPU+Chipset).

AMD needs to re-introduce the $25, sub-$30 range with a Stoney successor, which can coast below it in power and cost, since performance for AMD costs battery and heatsink expenditure.
Low-cost micronotebooks are back, low-cost gaming handhelds are back, embedded on land and in space is back, etc. All of which need cheap, reliable, secure processors.
AMD lost that battle LONG ago, ever since Bay Trail to be exact. It's no wonder why: an E1-3800 ITX board at 15W uses 3x as much power as a J4105 board at 10W that has superior performance; even with the J4105 running at 18W, it still uses half the power.

What are they going to do? I have a hard time believing that Pollock can be competitive with Jasper Lake, or even Gemini Lake, in anything but iGPU performance, and that's still worthless. And since no one seems to be interested in using it, I would think that's the case.

Now, I just don't see how Stoney can accomplish anything here.
 
  • Like
Reactions: Tlh97

DrMrLordX

Lifer
Apr 27, 2000
19,165
7,926
136
You have previously complained about AMD's cycles of Zen gens to be on the long side.
I have, and I will continue to do so, especially since the 12-18 month, no, 15-18 month, no, 18 month cadence is, by this point, dead in the water.

The next Zen is ready, you know, whenever. The consumer market is put in its place. It is still preferable to AMD trying to iterate upon CON, the irony there being that AMD could not possibly have dug themselves out of bankruptcy with core designs named after construction equipment.
 

Zepp

Member
May 18, 2019
100
84
71
AMD has to contend with a high-cost Dali/Pollock against low-cost JPL/EKL
why is Dali considered high-cost?

AMD needs to re-introduce the $25, sub-$30 range with a Stoney successor, which can coast below it in power and cost, since performance for AMD costs battery and heatsink expenditure.
I thought Dali was the successor. Could they not port Stoney to a better node? Construction-core APUs have been on the same node since 2014.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,451
953
136
why is Dali considered high-cost?
Raven2/Dali/Pollock are high-cost because of insertion at a leading-edge node generation plus a larger die.

Bhavani is ~107 mm2 (Insert at 28SHP at leading edge)
Stoney is ~125 mm2 (Insert at improved 28SHP at trailing edge)
Raven2 is ~149 mm2 (Insert while 14LPP was very expensive and still is...)((Extended: Raven and Raven2 were designed simultaneously, but Raven2 launched later))
October 26, 2017(Raven) -> April 16, 2019~November 20, 2019(Raven2)
I thought Dali was the successor. Could they not port Stoney to a better node? construction core APU's have been on the same node since 2014.
Raven2/Dali perf/cost-replaced Bristol Ridge: 250 mm2 down to ~149 mm2, while providing opportunities for better performance at the same power and at lower cost.

Athlon X4 Bristol = $60
A12-9800E = $99
-> Athlon 3000G = $49

Both designs have 128-bit DDR4, both were on AM4.

Pollock is supposed to succeed Stoney, but it largely falls flat, being slower frequency-wise and larger die-size-wise than Stoney.
Stoney 6W = 1866 MHz 64-bit Mem, 1.8/2.7 GHz Single Module, 720 MHz iGPU, ~125 mm2
Pollock 6W = 1600 MHz 64-bit Mem, 1.2/2.3 GHz Dual Core, 600 MHz iGPU, ~149 mm2

Rather than porting to 22FDX and keeping the 125 mm2 (thus getting the cost savings of moving from improved 28SHP to 28SLP cost-parity), every indication is that Stoney's successor is a fully new design, though CPU-wise still sticking with CMT, moving towards a smaller die: from 125 mm2 to 107 mm2 or lower.

The 0.8x area shrink applies to the same design, so 125 * 0.8 = ~100 mm2. However, the new design should be smaller CPU-, GPU-, and system-fabric-wise than Bhavani on 28nm: <107 mm2 * 0.8 = ~85.6 mm2. That inserts it into Bobcat's pervasive market at best and Jaguar's value market at worst, scaled cost-wise, ensuring the invasive $25 price tag carries higher margin on 22FDX/22FDX+ than on 28nm or 14nm: >125 mm2 ~ >$16 per chip on 14nm versus <100 mm2 ~ <$5 per chip on 22FDX.
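The shrink arithmetic above can be sketched as a quick back-of-envelope. The gross-dies-per-wafer formula below is a standard textbook approximation, not something from this thread, and it ignores scribe lines and yield:

```python
import math

def scaled_area(area_mm2, shrink=0.8):
    """Apply a simple full-design area-shrink factor (the 0.8x above)."""
    return area_mm2 * shrink

def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Classic gross-die approximation; ignores scribe lines and yield."""
    r = wafer_diameter_mm / 2
    return int(math.pi * r * r / die_area_mm2
               - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

print(round(scaled_area(125), 1))  # → 100.0 (same-design port of Stoney)
print(round(scaled_area(107), 1))  # → 85.6 (if smaller than Bhavani)
print(gross_dies_per_wafer(100))   # candidate ~100 mm2 dies per 300 mm wafer
```

Smaller dies pay off twice: more candidates per wafer, and a smaller target for random defects.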

Stoney Ridge was inserted on improved 28SHP. 22FDX+, which is improved 22FDX, launched only in Sep. 2020: https://gf.com/press-release/globalfoundries-announces-new-22fdx-platform-extending-fdx-leadership-specialty While 22FDX+ does appear to have Base+ (increased logic/transistor performance), it does not have ULP+ (0.5V/0.65V/0.8V HP-Dense Plus libraries and memory compilers).
 
Last edited:
  • Like
Reactions: Zepp

moinmoin

Diamond Member
Jun 1, 2017
3,306
4,551
136
I have, and I will continue to do so, especially since the 12-18 month, no, 15-18 month, no, 18 month cadence is, by this point, dead in the water.

The next Zen is ready, you know, whenever. The consumer market is put in its place. It is still preferable to AMD trying to iterate upon CON, the irony there being that AMD could not possibly have dug themselves out of bankruptcy with core designs named after construction equipment.
You are asking for the return of the CON cores' one-year update cycles. With such CON cycles AMD would dig itself back into big trouble.
 
  • Like
Reactions: Tlh97

LightningZ71

Golden Member
Mar 10, 2017
1,236
1,250
136
The other approach for AMD would be to take Renoir, axe one CCX, one memory PHY, one memory controller, remove the 16 PCIe lanes for the dGPU, and re-floorplan. That should give you a roughly 100mm^2 die on N7. It should be quite low power, and you'd get a ton of them, roughly 600, per 12-inch wafer. If Tom's Hardware's article on TSMC wafer costs is anything to go by, that's roughly $15.58 per die, packaging and processing excluded. Assuming that they keep the Lucienne power management improvements, and accounting for the smaller uncore, it should be more performant than Dali/Pollock at the same power. Even at a 10% larger die, that's only $19 per die... still not overly expensive. Why chase older architectures on very different nodes?
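The cost arithmetic in the post can be reproduced as a rough sketch. The ~$9,346 N7 wafer price is an assumption on my part (the figure commonly cited from that Tom's Hardware article), not a number quoted here:

```python
# Rough per-die cost from the post's estimate of ~600 candidate
# ~100 mm^2 dies per 300 mm N7 wafer. The wafer price is an assumption.
N7_WAFER_COST_USD = 9346.0  # assumed, per the commonly cited figure
DIES_PER_WAFER = 600        # the post's estimate for a cut-down Renoir

cost_per_die = N7_WAFER_COST_USD / DIES_PER_WAFER
print(round(cost_per_die, 2))  # → 15.58, matching the post

# A 10% larger die means roughly 10% fewer dies per wafer:
cost_larger_die = N7_WAFER_COST_USD / (DIES_PER_WAFER / 1.1)
print(round(cost_larger_die, 2))  # ~17; the post's $19 presumably folds in edge/yield losses
```

Note this excludes packaging, test, and yield, as the post itself says.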
 
  • Like
Reactions: VirtualLarry

NostaSeronx

Diamond Member
Sep 18, 2011
3,451
953
136
Why chase older architectures on very different nodes?
It is for one specific reason: Adaptive Body Biasing.
150nmbulkbodybias.png
Ensuring ~100% yield at the ~100% highest bin is only possible with body biasing.

It also isn't chasing older architectures, but rather modern ARM (Cortex-A510 at similar perf to A73) and RISC-V (Ascalon, Zen-esque). Thin-OoO should push AMD's prior ~14.5 mm2 down to something ~1.5x Jaguar's 3.1 mm2, which becomes ~4.65 mm2, before adding the shrink from 22FDX. That should in turn allow an IPC boost on integer and a doubled IPC boost on float/SIMD over Jaguar, with ABB from 22FDX pushing frequency up or power down. The design only needs to be smaller than Zen on 14LPP, which is 5.5 mm2.
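The core-area budget above reduces to simple arithmetic (all figures are the post's own):

```python
JAGUAR_CORE_MM2 = 3.1      # 28nm Jaguar core area, per the post
ZEN_14LPP_CORE_MM2 = 5.5   # Zen core on 14LPP, the post's ceiling

thin_ooo_mm2 = 1.5 * JAGUAR_CORE_MM2  # the post's ~1.5x-Jaguar sizing guess
print(round(thin_ooo_mm2, 2))             # → 4.65
print(thin_ooo_mm2 < ZEN_14LPP_CORE_MM2)  # → True, fits under the Zen budget
```

So a ~1.5x-Jaguar thin-OoO core would clear the 5.5 mm2 bound even before any 22FDX shrink is applied.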

Design Smarter, Not Smaller => 22FDX >is preferred over> 14LPP/7FF/6FF, if one is going for the cost-effective approach, or in this case pervasive/value.

Moore's Standard only applies to 28nm -> 22FDX -> 12FDX. So ever-decreasing costs are only present at GlobalFoundries.
 
Last edited:

jpiniero

Lifer
Oct 1, 2010
11,322
3,052
136
The other approach for AMD would be to take Renoir, ax one CCX, one memory PHY, one memory controller, remove the 16 PCIe lanes for the dGPU and refloorplan. That should give you a roughly 100mm^2 die on N7. Should be quite low power, and you'll get a ton of them, roughly 600, per 12 inch wafer. If this link is anything to go by, Tom's hardware article on wafer costs at TSMC, then, that's roughly $15.58 per die, packaging and processing excluded. Assuming that they keep the Lucienne power management improvements, and accounting for the smaller uncore, it should be more performant than Dali/Pollock at the same power. Even at 10% larger die, that's only $19 per die... Still not overly expensive. Why chase older architectures on very different nodes?
I don't think they want to spend 7 nm wafers on something like this. Hence the Zen 3 on GloFo 12 talk.
 
