If a Chromebook uses the Samsung Exynos with RDNA2, then it's a Samsung-powered Chromebook, not an AMD one. Samsung is licensing the GPU IP from AMD; it's not an SoC developed by AMD.
Say that to the Google Tensor...
The only things that might make sense for AMD to build on a GF process at this point are super cheap, 3rd world country type APUs they can sell for dirt cheap and still make a buck, or putting the Zen IOD on 12FD-SOI, but that process still isn't ready for production.
Even in worst-case scenarios the 22FDX APUs would still be faster than either the 28nm Bhavani or 28nm Stoney cases. The purpose of switching is to get fully-depleted transistors like FinFETs, but unlike FinFETs the wafer price actually drops per node. AMD can't do aggressive low-cost with Zen because of this. The prices with Zen can only go up.
There is very much a market for low-cost and low-power 22FDX CPU and 22FDX GPU solutions as well.
Microserver/Dense Server => Cost-prohibitive to consumers in 2012~2014, while in 2020+ it isn't cost-prohibitive to consumers anymore.
Cluster board/rack => ~$200-300
4-5 APUs => ~$120-250 (30 best case, 50 worst case)
Included 70-120 watt PSU => 5 at sub ~6w = <30w APU power.
This can be more aggressively scaled for PPD/W rather than expensive CPU+GPU with PPD metric only.
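To make the cluster numbers above easier to check, here is a minimal sketch of the budget arithmetic. The figures are the ones quoted above; the constant and function names are just illustrative, and treating 6 W as the per-APU ceiling is my reading of "sub ~6w".

```python
# Figures quoted in the post; names are illustrative, not from any real tool.
BOARD_COST_USD = (200, 300)   # cluster board/rack (low, high)
APU_PRICE_USD = (30, 50)      # per APU (best case, worst case)
APU_COUNT = (4, 5)            # APUs per board (low, high)
APU_POWER_W = 6               # assumed ceiling per APU ("sub ~6w")
PSU_W = (70, 120)             # included PSU rating (low, high)


def apu_cost_range():
    """Cheapest case: fewest APUs at best price; dearest: most APUs at worst price."""
    return (APU_COUNT[0] * APU_PRICE_USD[0], APU_COUNT[1] * APU_PRICE_USD[1])


def total_cost_range():
    """Board plus APUs, low and high ends."""
    apu_lo, apu_hi = apu_cost_range()
    return (BOARD_COST_USD[0] + apu_lo, BOARD_COST_USD[1] + apu_hi)


def apu_power_budget_w(count=5):
    """Upper bound on combined APU power for `count` APUs."""
    return count * APU_POWER_W


def psu_headroom_w(count=5):
    """PSU watts left over for everything that is not an APU."""
    return tuple(psu - apu_power_budget_w(count) for psu in PSU_W)


print(apu_cost_range(), total_cost_range(), apu_power_budget_w(), psu_headroom_w())
```

With five APUs the APU budget stays at or under 30 W, which leaves most of a 70-120 W PSU for the board, storage and fans.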
The other case is inexpensive, expansive 2D/2.5D/3D gaming, where high-end PCs aren't really needed, since chasing quantity of graphic fidelity comes with a loss of quality in story or gameplay.
GPU arch. = Compute/W and Gaming/W but not necessarily at the same time at low $. (Split-RDNAx&CDNAx isn't feasible. It needs to do both Compute and Gaming well)
CPU arch. = GFlops/W and GOPs/W at low $.
Fam16h was optimized for Gate-last, and Fam15h never removed HPC M-SPACE/Arch FE, BU, FPU. Neither design would port well to 22FDX, so a relatively ground-up design would need to be done for a new Fam w/ CMT; ~$40M + ~$1500 vs ~$100M + ~$4000
$1800/538 (107 mm2 - 28SHP) = ~3.3
$1800/455 (125 mm2 - 28SHP) = ~4.0
$1500/577 (100 mm2 - 22FDX-0.8x) = ~2.6
$1500/653 (90 mm2 - 22FDX-0.7x) = ~2.3
$4000/386 (150 mm2 - Dali) = ~10.4
$4000/256 (225 mm2 - Monet) = ~15.6
Minimum Client SEP from Dali = 4.711538462
22FDX-0.8x = ~$12.25
22FDX-0.7x = ~$10.84
Monet = ~$73.5
Worst case insert at risk 22FDX SEP($3600-14nm FDSOI price @ 2015) = ~29.4 for the 100mm2-0.8x die.
Worst case for 7nm VGH=6nm MDN(FT6 version) = ~121.85
However, if it is that, then we can basically look at Ryzen 5 3400G-3700U/2400G-2700U launch prices for MDN, since it actually provides that level of performance.
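The per-die figures above are just wafer price divided by dies per wafer, and the SEP figures are that per-die cost times the 4.711538462 factor. A rough sketch of that arithmetic follows. The die counts are taken as given (I am not re-deriving them from die area), and the function names are made up for illustration.

```python
# "Minimum Client SEP from Dali" factor, copied from the post.
SEP_MULTIPLIER = 4.711538462

# (label, wafer price in USD, dies per wafer), all as quoted in the post.
CASES = [
    ("28SHP 107 mm2", 1800, 538),
    ("28SHP 125 mm2", 1800, 455),
    ("22FDX-0.8x 100 mm2", 1500, 577),
    ("22FDX-0.7x 90 mm2", 1500, 653),
    ("Dali 150 mm2", 4000, 386),
    ("Monet 225 mm2", 4000, 256),
]


def cost_per_die(wafer_price_usd, dies_per_wafer):
    """Raw silicon cost per die, ignoring yield, test and packaging."""
    return wafer_price_usd / dies_per_wafer


def sep_from_die_cost(die_cost_usd, multiplier=SEP_MULTIPLIER):
    """Scale a per-die cost by the Dali-derived SEP factor."""
    return die_cost_usd * multiplier


for label, price, dies in CASES:
    die_cost = round(cost_per_die(price, dies), 1)  # the post rounds to one decimal
    print(f"{label}: ~${die_cost} per die, SEP ~${sep_from_die_cost(die_cost):.2f}")
```

Note that the factor times the ~$10.4 Dali die cost comes out to about $49. The worst-case 22FDX line only reproduces if the $3600 per-die cost is kept unrounded before multiplying, so the rounding in the list is not fully consistent.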
Oversimplification:
Up to 50,000,000 consumer chips × $25(ASP) = Up to $1,250,000,000
Up to 5,000,000 consumer chips × $250(ASP) = Up to $1,250,000,000
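The two lines above are the same revenue ceiling reached two ways: ten times the volume at a tenth of the ASP. A trivial sketch, with a made-up helper name:

```python
def revenue_usd(units, asp_usd):
    """Revenue ceiling: unit volume times average selling price."""
    return units * asp_usd


high_volume = revenue_usd(50_000_000, 25)   # many cheap chips
low_volume = revenue_usd(5_000_000, 250)    # few expensive chips

print(high_volume, low_volume, high_volume == low_volume)
```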
APU: 1.2W to 25W w/ big seller being the 1.2W to 6W range. 4c/3WGP(6x1 SIMD16 FP64) from 2c/3CU(3x4 SIMD16 FP32) and 4c/2CU(2x4 SIMD16 FP32)
CPU: 1.2W to 25W w/ big seller being the 5W to 10W range. Single-die 4 modules/8 MB L2(25-cycle for all four, 19-cycle for one), and dual-die 8 modules (3x4 PCIe GFX, 1x4 PCIe GPP, 128-bit DDR)
GPU: 1.2W to 25W w/ any range being good. 6WGP(12x1 SIMD16 FP64), single GPU card through quad GPU card => 3x slots from above ~12 dGPUs avg sold.
$25-$35 per chip range; Low-cost Essentials to Compute/W/Dollar scale out.
---
Orange square is two Jaguar cores, did it on the side placed bits like BD-XV with area concerns popping up with private big LSU/L1d.
Some placement is covered in other non-AMD designs to maximize perf/power. I threw most of the repeats in, but there is a case for a single 32KB L1d and 32KB L1i, since L0s are present to cover latency. Because of the repeats, especially in FPU/LSU/dTLB/iTLB/L1d/L1i, those are larger than they would be in an actual design.
MI/CNN/DNN machine-learned synthesis-place-route is relatively mature
22FDX is near its equipment depreciation, no new fab so fab depreciation isn't in, 2017 + 100% depreciation, 2018 + 80% depreciation, 2019 + 60% depreciation, 2020 + 40% depreciation, 2021 + 20% depreciation, 2022 + 0% depreciation.
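The schedule above is a plain five-year straight-line write-down starting in 2017. A small sketch of it, assuming the percentage is the share of equipment cost still being charged in that year (the function name and defaults are mine):

```python
def remaining_depreciation_pct(year, start_year=2017, span_years=5):
    """Percent of equipment depreciation still carried in `year`, straight-line.

    Clamped so years before the start read 100 and years after the end read 0.
    """
    elapsed = min(max(year - start_year, 0), span_years)
    return 100 * (span_years - elapsed) // span_years


for y in range(2017, 2023):
    print(y, f"{remaining_depreciation_pct(y)}%")
```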
Next-gen I/O(DDR5, PCIe Gen5, USB4, etc. on die for cost-crossover refresh), H.266/AV1 hw decode/encode, exhaustive power/architecture design and process improvements, etc. The benefit of inserting into Si at the trailing edge.
22FDX Risk = 2016, Tsi up to 8.5
22FDX Intro-to-Volume = 2017, Tsi up to 7.5
22FDX Actual-to-Volume = 2018-2019, Tsi avg to 6~6.5
Big EDA tools support Adaptive Body Biasing = 2019
Big EDA tools improve performance by ~20% and shorten time to market by half = 2020
22FDX+ is introduced, no mentions of logic perf except for STM's roadmap being half-way between 22FDX and 12FDX = 2020
40% + 20% + 15% = Note the increase from 2017+ 22FDX ZBB [40%] to 2021+ 22FDX+ ZBB [75%], whereas BB was only [70%] in 2019+.
\\\\
12FDX returned to profiles in 2021; from 2019 until very recently, 12FDX wasn't present at all. So, it is very important to check on these:
"The 300 mm pilot line is on track to be completed by the end of the year" -- 2021 Missouri 300mm SOI, GlobalWafers.
Dresden/F1 = 22FDX (now) and 12FDX (soon)
Singapore/F7 = 22FDX (soon)
Malta/F8 = 22FDX (soon - 300mm Missouri) and 12FDX (soon - 300mm Missouri)
Burlington/F9 = 22FDX (soon - 200mm Missouri)
Bernin II - 2020 €25 million($28.3M) in capital expenditure + €10M($11.3M) in 2021(Shared Bernin 1) + €220M[$249M](Shared Between Bernin II and third 300mm FDSOI fab SOITEC) between 2022 through 2026.
Pasir Ris - 2020 €26 million($29.4M) in capital expenditure + €67M($75.8M) in 2021 + €275M($311M) during 2022 through 2026.
$87.1M(2021 and shared between B1,B2, PR1) and going forward $111.5-140M(avg per year) from SOITEC
$210M(2021 and shared between 200mm/300mm), and no details going forward other than $800M in planned wafers bought by GloFo, from GlobalWafers
spent for capacity.
$800M using the 2009 cost reduced FDSOI wafer = 1,600,000 SOI wafers(if at 2009 planned costs) being planned to be bought by GlobalFoundries.
PDSOI was $1000 base, FDSOI entry was $500 back then. If the above is spread out over five years then it should be enough to supply ~27K (relative to 300mm) wafers per month(avg).
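The wafer math above works out like this. It is a sketch only: the $500 FDSOI entry price is the 2009 planned cost quoted above, not a current price, and the five-year spread is the same assumption made in the text.

```python
def wafers_bought(budget_usd, price_per_wafer_usd):
    """Whole wafers a budget covers at a flat per-wafer price."""
    return budget_usd // price_per_wafer_usd


def wafers_per_month(total_wafers, years):
    """Average monthly supply if the purchase is spread evenly over `years`."""
    return total_wafers / (years * 12)


total = wafers_bought(800_000_000, 500)   # $800M at the 2009 FDSOI entry price
monthly = wafers_per_month(total, 5)      # spread over five years

print(total, round(monthly))
```

At the $1000 PDSOI base price the same budget would cover half as many wafers, so the per-month figure is very sensitive to which wafer price actually applies.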
29.2K wafer starts per month at Fab 8 in 2021
+ $1B for another 12.5K wafer starts per month for 2022 and beyond.
+ sub-$10B for a double of the above with Fab 8.2/Fab8 Phase2, unknown introduction.
Since GloFo is not going to update the roadmap, I did it for them.
What FinFET node? We never had one!
Some special customers who were eyeballing 14LPP/12LP/12LP+ on-shore are opting to move to Intel's 22FFL instead. GlobalFoundries does not currently do 22FDX in the states, so it is off-shore.
Worldwide Best-All-Around = TSMC FinFETs
United States = Intel FinFETs
Cheapest WW/US/China = Samsung(US/Korea) and SMIC(China) FinFETs
- Samsung 3nm Taylor-Austin, Texas
- TSMC 5nm North Phoenix, Arizona
- Intel, all of em, Oregon/Arizona
////
22FDX-Y1 -> 22FDX-Y2 -> 22FDX-Y3 -> 22FDX-Y4 -> Yx -> 12FDX-Y1
::
12FDX-RP -> 12FDX-IV -> 12FDX-AV -> 12FDX-BiABB -> 12FDX-PPACY+x -> 12FDX-Y1
By the time 12FDX processors exist for AMD, GlobalFoundries will probably be inserting JFIL(Japan), SRPL(China), DSA(Belgium), or REBL/MAPPER/ISFEA(USA) into 6FDX.
Chartered => 157nm or SFIL for next-gen nodes
Albany => SFIL for next-gen nodes
Both combine in a modernity and GF chooses JFIL for next-gen nodes and removes high straight-line depreciation costs.
However, looking at the EOL lineup: GlobalFoundries 28nm is dead. The only 28nm products still in it are the AMD Embedded G-Series 1st Generation SoC and AMD Embedded G-Series CPUs, which are TSMC. GF 28nm went EOL in January 2021, and last-time-ship for customers of EOL products is January 2022. Thus, end-user last-time-purchase is 2022 for GloFo 28nm.
AMD has completely waived purchases of 14nm/12nm, instead opting to use node freedom to second source those products.
GF-28nm killed in 2021.
GF-14nm/GF-12nm killed in 2022.
Back to why 22FDX;
::ARM::
BCM2711 28nm = 1.5 GHz to 1.8 GHz Cortex-A72 (quad-core) = ~6W TDP
RK356x 22FDX = 1.8 GHz to 2.0 GHz Cortex A55 (quad-core) = 5W TDP
Cortex-A510 is within 10% of Cortex-A73, while Cortex-A73 was an ~10% improvement over A72.
RPi4 can go to RPi5 and get 4x 3 GHz A510 at lower area and RK356x can move to 4x 3 GHz A510 for better perf.
::AMD::
A6-9220C 28nm = 1.8 GHz to 2.7 GHz (dual-core) = 6W TDP
3015e 14nm = 1.2 GHz to 2.3 GHz (dual-core) = 6W TDP w/ 5 min fPPT boost to 18W and 50 min sPPT boost to 12W.
::Intel::
Pentium N6000 = 1.1 GHz to 3.3 GHz (quad-core) = 6W TDP w/ 15 second boost to 18W.
Alderlake-N = 8x Gracemont (octo-core) = 1.8 GHz to 3.4 GHz(sust. quad-core boost) and 3.0 GHz(sust. octo-core boost)
Meteorlake-N = 8x Crestmont (octo-core) = IPC(Arch+)/GHz boost from Intel 4.
AMD at the low-end at GlobalFoundries is basically crushed... big die 14nm isn't competitive enough against small die Intel 7/4 and big die 14nm is too expensive against small die 28nm/22nm node generation.
Hence, why small-die 22FDX is the clear choice going forward.
~6.2 mm2 for two Jaguar-cores on 28nm <-> ~4.0 mm2 for two Enhanced-Jaguar-cores on 16nm. The issue with this singular design is that they don't have the FPU power of the above; 2x64 or 2x128 FMA SVE for ARM and 2x256 FMA for Intel.
So, they need a small core or module that at median provides 2x128 FMA. With 28nm CMT = 4.65 mm2+arbitrary area for 2x128 FMA and 16nm-like Area-opt 22nm CMT = 3 mm2+arbitrary area for 2x128 FMA.
Process Complexity/Cost:
22FDX/22FDX+(w/ in-situ perf boosters) -> 28PolySi -> 28HKMG(SLP) -> 12FDX(w/ gen2.1 in-situ perf boosters) -> 28HKMG(28HPP/28SHP/28SHP+ w/ implant perf boosters) -> 14nm FinFET(Fin complexity, RMG complexity, MOL complexity)
Performance+Power:
28PolySi -> 28HKMG(SLP) -> 28HKMG(28HPP-SHP+) -> 14nm FinFET -> 22FDX/22FDX+ -> 12FDX
14LPP has delay variation at 1.1-up V(Ultra-high-performance) and 0.7-down V(Ultra-low-power) which requires costly implants to fix.
22FDX doesn't have these issues because of its use of in-situ boosters intrinsically.
So, across ultra-wide range workloads 22FDX comes out way cheaper and way faster.
28SLP-A9 1 GHz @ 1.1V
New Range: 28FDS-A9 1 GHz @ 0.65V
Same Range: 28FDS-A9 2.3 GHz @ 1V
New Range: 28FDS-A9 3 GHz @ 1.4V
22FDX was derived from the Tri-gate competitor line(20FD: 0.9Vdd+20nmLg, 14FD: 0.8Vdd+boosters, 14FD+: 0.7Vdd+gen1.1 boosters, 22FDX: 0.65Vdd+gen1.2 boosters)... 20LPM to 20SHP is a 10% perf increase, but 20LPM to 20FD(gen1) is a 20% perf increase.
If the eQuad A9 core was ported to 22FDX, the A9 core would be 4.5 GHz @ 1.3V... which explains 28BLK and 28FDS designs at STMicro were ported to 22FDX.
Body biasing in designs:
Intel's 45nm - 2008
"Dynamic SRAM PMOS forward-body-bias (FBB) and Active-Controlled SRAM VCC in Sleep are integrated in the design to lower Active-VCCmin and Standby Leakage, respectively. FBB improves the Active-VCCmin by up to 75 mV, and Active-Controlled SRAM VCC distribution tightened by 100 mV, both of which result in further power reduction.
The 16 KB Subarray was also used as the building block in on-die 6 MB Cache for Intel Core 2 Duo CPU in 45 nm technology."
Oracle 40nm - 2010
"In addition, the design implements body-bias capability for both PMOS (VNW) and NMOS (VSB)."
Samsung 32nm - 2012
"Data from the monitors is analyzed to identify the process corner, the amount of threshold voltage skew and the on-chip variation and utilized to control body bias and supply voltage for the power planes. It effectively reduces the process window of the silicon samples and minimizes the leakage/performance impact of process variation. We can target the process corner to SS to minimize the overall leakage current and selectively apply forward body bias on the speed critical blocks, or target the processor to the FF corner and apply backward body bias on the leakage critical blocks."
Samsung 28nm - 2013
"In addition, based on the measurements from on-chip performance sensors, reverse and forward body biasing are appropriately applied to compensate against the process variations for reducing leakage, improving performance and yield."
28FDS eQuad was compared to Samsung's 28nm processor in one of the demo videos. 22FDX Rockchip uses body-biasing for A35, A53, A55 cores.
Closest processor TDPs of tier 1:
:Shrink of not-AMD MediaGX processor:
Geode range => 2.8-5.1 watts
:Bobcat w/ iGPU like MediaGX:
Geode2 => 6.4 watts
Geode2 client => 5.9W and 4.5W
:Jaguar w/ iGPU like MediaGX:
Geode3 (dual-core) => 6W
Geode3 client (quad-core) => 8W
:Puma w/ iGPU like MediaGX:
Geode3+ (quad-core) => 5W-7W
Geode3+ client (quad-core) => 4.5W
:Excavator w/ iGPU like MediaGX:
Geode4 (dual-core) => 6W-10W
Geode4 client (dual-core) => 6W
:No successor to reduce power and increase performance at similar lowering price point:
Embedded, Industrial, Mobility tier 1 (Stripped I/O for lower power) => 22FDX is ideal and allows for "Cool_AMD64" back-biasing//body-biasing.
Client, Desktop, Mobility tier 2 (Full I/O due to higher availability of power) => 22FDX can also spin up for high TDP bursts.
22FDX:
Lowest VT + FBB
Lower Mid VT + FBB
Higher Mid VT + RBB
High VT + RBB
12FDX:
Lowest VT + FBB + RBB
Lower Mid VT + FBB + RBB
Higher Mid VT + FBB + RBB
High VT + FBB + RBB
Collected a range of Mullins and Stoney => 0.9V=+70% frequency and 0.5V=-75% power
Frequency of 0.9V for 22FDX design is 3.06 to 3.74 GHz. (+500 MHz to 1.2 GHz for Puma and +300 MHz to 1 GHz for Excavator)
Frequency of 0.5V for 22FDX design is 1.71 to 2.01 GHz. (-75% the power of 28nm 0.9V)
The issue is that the 28nm bin range is absolutely everywhere, whereas 22FDX should be more biased towards the highest bin.
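For anyone wanting to sanity-check the 0.9V range above, this sketch backs out the 28nm baseline that the +70% figure implies. The baseline values are an inference from dividing the quoted 22FDX range by 1.7, not numbers stated anywhere above, and the names are illustrative.

```python
FREQ_GAIN_AT_0V9 = 0.70        # "+70% frequency" at 0.9V, from the post
POWER_CUT_AT_0V5 = 0.75        # "-75% power" at 0.5V, from the post
TARGET_0V9_GHZ = (3.06, 3.74)  # quoted 22FDX range at 0.9V


def implied_baseline_ghz(target_ghz, gain=FREQ_GAIN_AT_0V9):
    """28nm frequency that would scale to `target_ghz` after the gain (inferred)."""
    return target_ghz / (1.0 + gain)


def relative_power(cut=POWER_CUT_AT_0V5):
    """Power at 0.5V as a fraction of the 28nm 0.9V power."""
    return 1.0 - cut


baseline = tuple(round(implied_baseline_ghz(f), 2) for f in TARGET_0V9_GHZ)
print(baseline, relative_power())
```

So the quoted range is consistent with a collected 28nm spread of roughly 1.8 to 2.2 GHz, and the 0.5V point runs at about a quarter of the 28nm 0.9V power.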