AMD Bristol/Stoney Ridge Thread


NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
Note the mention of graphics in there. This is designing cores that are embedded into the GPU, not a standalone CPU product.
I think that implies that, beyond just developing RISC-V CPU work, it might move towards other RISC-V implementations of GPU/NPU/PSP/ACP/etc.

The mention of graphics includes stuff like RV64X[X extension of V extension]/Simty. The mention of compute includes a RVV co-processor in the vein of Centaur's NPU, etc.
Note the distinction between CPUs and Processors:
  • Work with a team of architects for developing new innovative embedded RISC-V CPUs.
  • Understand and improve existing and emerging graphics/compute paradigms and new APIs employing RISC-V Processors.
Now take note of this related quote from Think Silicon: "The usage of a common ISA between the main system CPUs and GPUs will allow new programming paradigms by dynamically balancing computation load between these processing elements."

Today: CPU+Graphics+Compute => Physically Unified (on the same die) yet Logically Separated (differentiated programming models, unique compilers, multiple instruction sets)
Tomorrow: CPU+Graphics+Compute => Physically Unified (on the same die) and Logically Unified (unified programming model, shared compiler, single instruction set)
 
Last edited:

NTMBK

Lifer
Nov 14, 2011
10,237
5,020
136
NostaSeronx said: "I think that implies that, beyond just developing RISC-V CPU work, it might move towards other RISC-V implementations of GPU/NPU/PSP/ACP/etc. …"

Nah, this is the same as Nvidia replacing their FALCON embedded core with a custom RISC-V one. Tiny cores that handle data flow, power management etc within the GPU are moving over to RISC-V so that they can benefit from the software ecosystem and don't need to write their own compilers etc.
 
  • Like
Reactions: amd6502

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
Nah, this is the same as Nvidia replacing their FALCON embedded core with a custom RISC-V one. Tiny cores that handle data flow, power management etc within the GPU are moving over to RISC-V so that they can benefit from the software ecosystem and don't need to write their own compilers etc.
However, that relates to security/systems not graphics/compute.

Falcon and NV-RISCV processor => systems and security
Systems => Debug ucode, temperature, etc
Security => Tegra Security Processor, secure driver context switching, etc.

None of which touch upon existing and emerging graphics/compute paradigms (HLSL/GLSL/SPIR-V) or new APIs beyond DirectX/OpenGL/Vulkan.

Unless AMD plans to remove hardware features from their graphics ISA/HW, there is no reason to have RISC-V microcontrollers where Nvidia places them.

Nvidia:
What will be done:
  • Support development of (bare metal) firmware run on embedded microcontrollers within Nvidia GPUs.
  • Collaborate with the hardware and software teams to architect new features and guide future development.
  • Optimize software to improve system robustness, performance and security.
  • Participate in testing new and existing firmware.
  • Perform system bring-up, debug, and validation.
  • Ensure compliance to functional safety standards (ISO 26262 and ASPICE). This includes defining requirements, architecture and design with end-to-end traceability, performing safety analyses - FMEA/DFA/FTA and ensuring code compliance to MISRA and Cert-C standards.
What we will need to see:
  • Familiarity with computer system architecture, microprocessors, and microcontroller fundamentals (caches, buses, DMA, etc).
If such a project existed, I would suspect similar language to above. Rather than firmware development, it would be hardware development.

For Example: Systems Design Engineer for AMD:
Key Responsibilities
  • Develop and execute feature verification, enablement, and test plans for SOC power and performance management features
  • Develop firmware functions and features for embedded microcontrollers in CPU/GPU
  • Debug, troubleshoot system-level issues related to power and performance features
  • Prototype innovative ideas and designs
  • Support test automation infrastructure and develop test scripts
  • Proactively drive continuous improvement in all areas of activity
Looking at other postings for AMD CPU Design Engineers, they do in fact work on CPU cores.

=> RISC-V CPU in the vein of a microprocessor not a microcontroller, which targets central processor functions. (Ex List: K5/K6/K7/K8-10h/K9^1/Bobcat/Bulldozer/Jaguar/Zen^1/etc)
=> RISC-V Processor in the vein of a microprocessor not a microcontroller, which targets graphical/computational functions. (Ex List: GCN/RDNA/CDNA)
 
Last edited:

NTMBK

Lifer
Nov 14, 2011
10,237
5,020
136
NostaSeronx said: "However, that relates to security/systems not graphics/compute. …"

I'm sure that they will work on a RISC-V CPU core! I'm also sure that it will never be accessible to end users.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
I'm sure that they will work on a RISC-V CPU core! I'm also sure that it will never be accessible to end users.
Again, RISC-V CPUs and processors are not RISC-V microcontrollers. So the given hiring statement is for a RISC-V CPU project, in the vein of the CPU cores in such products as Athlon, Sempron, Duron, FX, Ryzen, Geode, etc.

If AMD was going for a RISC-V microcontroller, they would state it. It also wouldn't be a CPU Design Engineer position.

So, it is definitely a RISC-V CPU core, which will be accessible to those who buy it, with absolute access to a more open BKDG/PPR/etc than x86/ARM cores from AMD.

Premium => Proprietary (orientated towards premium closed-source firmware/microcode // NDA requirement)
Value => Libre (orientated towards free open-source firmware/microcode // No NDA requirement)

The mention of graphics/compute mostly comes from RISC-V capability of having hybrid CPU-GPU architectures. So, basically AMD RISC-V Core A is a traditional CPU, while AMD RISC-V Core B is a Scalar Processor of a larger Vector-focused RISC-V Graphics unit.

1. GlobalFoundries.
2. 22FDX/12FDX => AMD has already done 14LPP/12LP/etc. Malta is running down FinFETs for more mature processes like 90nm FDSOI, 45nm CLO/LPCLO (PDSOI/FDSOI), etc.
3. Needs to compete with or surpass upcoming solutions.

Allwinner/Pine => Really weak CPU, no GPU?
BeagleV => 2x/4x U74 (SiFive-licensed) + Imgtec B-series (4x32) <== Probably TSMC
PicoRio => 4x Low-end custom? RISC-V + Imgtec 7XE GE7800 (2x16) <== TSMC
HiFive Unmatched => 4x U74 + No GPU IP <== TSMC
T-head ICE SoC => 2x C910 + 1x C910 w/ V-extension + 1x GPU core (unknown) <== TSMC

Thus, the lowest-end RISC-V embedded processor would, at minimum to maximum, need to land within the range of these:
Ontario/Zacate[Bobcat-class], Kabini/Steppe Eagle/Crowned Eagle[Jaguar/Puma-class], Brown Falcon/Prairie Falcon[Excavator-class]

After all, the RISC-V competition in the consumer/embedded SBC market is pretty weak.

2x RISC-V ULP CPU cores/ 4x RISC-V ULP CPU cores / 2x RISC-V HP-ULP cores
1x RISC-V GPU cluster / 2x RISC-V GPU cluster / 3x RISC-V GPU cluster

Embedded markets are shared with semi-custom. So, if on GloFo, that means several added options: RHBD capability, RF capability, 3rd-party RISC-V IP, etc. More customers come to GloFo through AMD's Value IP, rather than using AMD's Premium IP, which has been mostly exclusive to TSMC anyway.

RISC-V + FDSOI, with its lower costs and faster time to market, should allow for several versions of such processors:
Drafts => Initial version; Boot, OS, Software development//Identify issues.
Candidates => Mid-timeline version: Addition of third-party IP//Second identification of issues.
Release => Frozen version: Compliant for end-user usage with frozen features.//Issues identified previously shouldn't be in here, etc.
OoO style => Draft v1 -> Candidate v1 -> Draft v2 -> Release v1 -> Candidate v2 -> Draft v3 -> Release v2 -> etc.
This should allow for a similar launch cadence of yearly improved IP.
Ex: => https://www.sifive.com/blog/sifive-core-ip-20g1
and the BeagleV later version, which switched out the Cadence Tensilica VP6 of the early version for an IMG BXE-4-32 MC4 GPU:
^-- These later boards are expected to swap to a new SoC, trading out the VIC-7100 for the VIC-7110. This will be a quad-core SoC, with the aforementioned dedicated video hardware.


I went searching; it seems to be a copy-paste with the exact same meaning:
AMD GPU Architect:
A self-motivated compute architect who is passionate about growing the efficiency of the GPUs. An effective teammate who focuses on collaboration, team building, mentoring, and further team success.
  • Work with a team of architects for developing innovative solutions in the field of graphics and compute.
  • Identify complex technical problems, summarize multiple simpler solutions, and help the team to make advances in PPA.
  • Understand the concepts of Performance/FLOP and Performance/Byte. Use these metrics as a vehicle to identify bottlenecks and solves them to increase the overall GPU efficiency.
  • Explore architectural innovation in fixed function, compute, and memory hierarchy.
  • Communicates ideas with other architects and managers on multiple sites.
Only needs =>
  • Knowledge of modern GPU architectures with an overall 7+ years of experience in architecting GPUs.
Master or PhD degree with emphasis in Electrical engineering, Computer architecture, or Computer science preferred

AMD CPU Design Engineer 2:
A self-motivated CPU enthusiast. An effective team player who focuses on collaboration, team building, mentoring, and furthering team success.
  • Work with a team of architects for developing new innovative embedded RISC-V CPUs.
  • Identify complex technical problems, break them down, summarize multiple possible solutions, and help the team make advances in Performance, Power, and silicon Area (PPA).
  • Understand and improve existing and emerging graphics/compute paradigms and new APIs employing RISC-V Processors.
  • Work with subsystem architects to understand bottlenecks and other problems where an embedded processor will improve the performance.
Only needs => Master's degree preferred in EE and CE or Bachelor's degrees with 1 year of proven experience.

While it shares wording with the GPU Architect position, it appears to share more with a PMTS Silicon Design Engineer:
PMTS Silicon Design Engineer:
A self-motivated graphics enthusiast. An effective team player who focuses on collaboration, team building, mentoring, and furthering team success.
  • Work with a team of architects for developing new innovative algorithms in the field of graphics and compute for low power GPUs
  • Identify complex technical problems, break them down, summarize multiple possible solutions, and help the team make advances in Performance, Power, and silicon Area (PPA).
  • Understand and improve existing and emerging graphics/compute paradigms and new APIs
  • Work with subsystem architects to understand bottlenecks in low power graphics cores/SoCs
How to know it isn't a microcontroller position:
Power Management Firmware Engineer:
AMD's power management design team is seeking an experienced Firmware Design Engineer to chip in to System Management, Power Management and Security firmware for AMD's APU, Server and dGPU products. This position offers a very good growth path in a highly visible role.
  • Assume ownership in development and/or verification of firmware crafted for an embedded microcontroller.
  • Work with HW design and verification engineers to verify firmware features
  • Contribute to architecture of hardware, firmware and power management features

I'll go through these one by one.
  • Work with a team of architects for developing new innovative embedded RISC-V CPUs.
  • Identify complex technical problems, break them down, summarize multiple possible solutions, and help the team make advances in Performance, Power, and silicon Area (PPA).
  • Understand and improve existing and emerging graphics/compute paradigms and new APIs employing RISC-V Processors.
  • Work with subsystem architects to understand bottlenecks and other problems where an embedded processor will improve the performance.
1. Work under a team of architects (CPU DE2 is a fairly junior position) who are designing/developing a new RISC-V CPU.
2. Basically choose PPA options for a given process that the RISC-V CPU will be on.
Let's say 22FDX:
Track Libs: 12T(UHP), 8T(ULP), 7.5T(ULL), 6.75T(AG1)
Types: SDB, DDB, CNRX, etc
ABB: FBB-orientated block, RBB-orientated block, or complex FBB/RBB double block.
etc.
3. Understand/Improve existing/emerging programming models/APIs in regards of employing RISC-V processors for graphics/compute.
4. Makes little sense as written, but it is basically stating: where x86/ARM/GCN/RDNA/CDNA bottleneck or cause other problems, identify whether RISC-V processors would solve them and improve performance.
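As a toy illustration of point 2, here is a sketch of the kind of 22FDX option space being described (track library, device construction, body-bias scheme). The enum/table names mirror the list above; pairing a CPU block with 8T/SDB/FBB and a GPU block with 7.5T/DDB/RBB is an arbitrary example of mine, not a real AMD or GlobalFoundries configuration.

Code:
/* Toy model of the 22FDX PPA choices listed above (track library, device
 * construction, adaptive body bias). Names mirror the post; the per-block
 * pairings are made-up placeholders, not GF PDK or AMD data. */
#include <stdio.h>

static const char *track_libs[] = { "12T (UHP)", "8T (ULP)", "7.5T (ULL)", "6.75T (AG1)" };
static const char *devices[]    = { "SDB", "DDB", "CNRX" };
static const char *abb_modes[]  = { "FBB-orientated", "RBB-orientated", "FBB/RBB double" };

struct block_ppa {
    const char *block;
    int lib, dev, abb;   /* indices into the tables above */
};

int main(void)
{
    struct block_ppa plan[] = {
        { "RISC-V CPU block",  1, 0, 0 },  /* 8T ULP, SDB, FBB-orientated   */
        { "GPU cluster block", 2, 1, 1 },  /* 7.5T ULL, DDB, RBB-orientated */
    };

    for (int i = 0; i < 2; i++)
        printf("%-18s -> %s, %s, %s\n", plan[i].block,
               track_libs[plan[i].lib], devices[plan[i].dev], abb_modes[plan[i].abb]);
    return 0;
}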
 
Last edited:
  • Like
Reactions: Skoynay and amd6502

amd6502

Senior member
Apr 21, 2017
971
360
136
I think Risc-V has huge potential and I hope they can participate in it with their own custom core. If so, hopefully they take the best of Zen together with the best of Piledriver/XV.

It's too bad Stoney was the end of the line for XV. Laura Nyro has an explanation of why they named the project Stoney. I guess an FDX port was too complicated and costly (too difficult to port the high-density GPU part to FDSOI?).

I think Stoney might be out of production now (maybe for some time already, and just running on old stock). Two threads may start to get too limiting, even for low end, unless the frequency or IPC is very large.

Anyways, cost is where Stoney still had a big advantage over Pollock/Dali, which is kind of big at ~145mm2 vs Stoney's ~125mm2 (with under half the transistors, ~1.6B). You'd think with the automotive chip shortages such a product would still have a lot of use, especially if enhanced with FDSOI.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
I guess an FDX port was too complicated and costly (too difficult to port the high-density GPU part to FDSOI?).
It is probably due to lack of differentiation, and that AMD wanted to shift x86 to Zen-only as quickly as possible. 28BLK Stoney Ridge at 125 mm2 = hypothetical 22FDX Stoney Ridge Plus at 125 mm2, since the library allowed for continuous library length: 114CPP 28BLK = 104+10 CPP on 22FDX, 90Mx 28BLK = 90Mx on 22FDX, etc. Very little had to be changed between 28BLK and 22FDX for a 1:1 port, no different than OR-B0 to OR-C0, especially since designs on GF28 had GF's Advanced FDSOI porting in mind.
I think Stoney might be out of production now (maybe for some time already, and just running on old stock).
Bristol and Stoney are both out of production, they went EOL at the same time.

On prior subjects

12FDX:
Will most likely have two CPPs like 22FDX.
10nm FDSOI or 7nm FDSOI FEOL's 64CPP[10nm]/56CPP[7nm]. 10nm CPP is standard(64CPP/56Mx), 7nm CPP is similar to Intel's last planar node(56CPP/56Mx).

SiFive SoC IP:
There is potential for a SiFive<->AMD partnership for RISC-V inter-system interop:
SiFive Shield
SiFive WorldGuard
SiFive Insight
Not sure, however, how far this goes... it relates ARM system IP to RISC-V system IP.

CPU:
Still Clustered Multithreading; no HPC target, so the architecture might be small (no FPU co-processor? // combined ALU/FPU units?). Server is Edge-only, with a low core count compared to EPYC.
RISC-V RV64GC + V?/B/S&V? Crypto/P? <== potential

GPU:
Potentially-derived from both CDNAx and RDNAx; Wave16 w/ 64-bit ALUs(Packed 2x32-bit), etc.
Listed as Ultra-Low-Power GPU compared to both CDNA/RDNA designs being listed as High-Performance GPU. Very big guess that it will use X for Xpress DNA(old IGP mobo reference) or Cross DNA(being both CDNA and RDNA), etc.
 
Last edited:
  • Like
Reactions: amd6502

NTMBK

Lifer
Nov 14, 2011
10,237
5,020
136
amd6502 said: "I think Risc-V has huge potential and I hope they can participate in it with their own custom core. …"

Why waste resources shrinking a bad architecture? AMD invested billions in designing Zen from the ground up, because Bulldozer was awful. As soon as Zen was ready, they dropped the Bulldozer family and never looked back. It's old tech that was never very good to begin with, it's never coming back.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
NTMBK said: "Why waste resources shrinking a bad architecture? …"
However, AMD did in fact do work on a design after Excavator.

2007+ = design of Bulldozer/Piledriver
2009+ = design of Steamroller/Excavator
2011+ = design of BDv5/BDv6 <== this work was paused while Zen was getting staffed; however, not everyone moved to Zen work.

BDv5/BDv6 were finished but never put into a production product. However, there does appear to be a general range given with the above list.

45nm/32nm = Gen1/Gen2
28nm/20nm = Gen3/Gen4
14nm/10nm = Gen5/Gen6

Zen+ to Zen2 => 12LP to 7LP // TSMC 7nm as backup and priority. (~20K at TSMC and ~10K at GlobalFoundries)
Excavator to BDv5 => 28A to 12FDX // No backup, no other fab to push the product to. Since 7nm wasn't a priority, 12FDX thus had to be.

Both 7LP and 12FDX were announced in 2016:
"The company plans to offer both 7nm FinFET and 12FDX technologies simultaneously to address different areas of the market. Where 12FDX's true advantages lie are cost and simplicity, two things that FinFET's design complexity and high expense really cannot contend with."

There is very much a timeline where, instead of getting two Zens with Dali and Pollock, it could have been Zen for Dali and BDv5 for Pollock.

14nm -> 12LP+ = +40% power savings, same GDPW
14nm -> 12FDX = +50% power savings, +22% good dies per wafer.

BDv5 would have also been prepared for the inversion:
Big core design: BDv1 -> BDv2 -> BDv3 -> BDv4 -> Zen
Small core design: Bobcat -> Bobcat+ -> Jaguar -> Jaguar+ -> BDv5

Where BDv3/BDv4 are streamlined derivatives of BDv1/BDv2, BDv5/BDv6 would have been the reworked inversion from HPC/Server-focused cores to Client/Embedded-focused cores.
 
Last edited:

Thunder 57

Platinum Member
Aug 19, 2007
2,675
3,801
136
NostaSeronx said: "However, AMD did in fact do work on a design after Excavator. …"

Are you starting up that fan fiction thing again? You never provide sources, just like with this BDv5/6. AMD started working on Zen in 2012 IIRC; they knew BD wasn't going to be enough.

I mean where do you come up with these numbers? What makes you love BD and FDX so much? The one thing you do bring up that I don't understand is why AMD never used 12nm+.
 

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Super Moderator
Aug 22, 2001
28,487
20,580
146
NTMBK said: "Why waste resources shrinking a bad architecture? …"


j/k of course. The idea of pursuing anything to do with BD is laughable.
 
  • Haha
Reactions: NTMBK

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
Are you starting up that fan fiction thing again? You never provide sources just like this BDv5/6. AMD started working on Zen in 2012 IIRC, they knew BD wasn't going to be enough.
AMD never fully enabled Bulldozer to begin with. We never got access to the collaborative mode with 1*4 ALU/3 AGLU (2 clusters as 1 core); we got stuck with only the respective mode, 2*2 ALU/2 AGLU (2 clusters as 2 cores).
I mean where do you come up with these numbers? What makes you love BD and FDX so much? The one thing you do bring up that I don't understand is why AMD never used 12nm+.
Process numbers come from GlobalFoundries. Bulldozer's full feature set was never launched, only hinted at in patents and AMD's Open64. There is a bunch of research backing FDSOI being faster than FinFETs: at 64CPP, undoped FDSOI becomes faster than super-doped SSRW FinFETs. GlobalFoundries has been quietly releasing A53/A55/A57 cores on 22FDX at higher speeds than 16FF, etc.


A53 on 16FF+ at TSMC ~~ 1.4 GHz @ 0.9V
A53 on 22FDX at GloFo ~~ 2.4 GHz @ 0.9V

So, there is a bunch of stuff like this as well:
[Image: 22FDX vs 14nm comparison chart]

This is with 22FDX being sandbagged without the High Performance SOI channel and transistor as well. They can't sandbag 12FDX, so they delayed it.

Operating on the assumption that BDv5 was switched to a small-core design implies a Bobcat/Jaguar-like number of units and size.

2 cores of Jaguar: 2*3.1 mm2 (low-power cores // synthesized macros to standard cells) vs 2 cores of Excavator: 1*14.48 mm2 (high-performance core // custom macros to standard cells)
Following the standard procedure of BDv1-2 to BDv3-4: four-wide decode into two-wide decode, four-wide FPU to three-wide FPU, smaller design (20 mm2 -> 18 mm2 -> 14.48 mm2); smaller is better, and higher frequency at the same power is also better. It makes more sense that Jaguar would be the basis of BDv5, much like how Zen is based off Bulldozer's SMT portions but trimmed down (4-wide front-end, 4-wide floating point unit, etc).
 
Last edited:
  • Love
Reactions: amd6502

NTMBK

Lifer
Nov 14, 2011
10,237
5,020
136
NostaSeronx said: "AMD never fully enabled Bulldozer to begin with. …"

AMD would have LOVED to have a higher performance CPU in the Bulldozer era. If they had a magical switch to enable high performance mode, they would have flicked it in a heartbeat. Either this mode never existed, or they couldn't get it to work- either they ran into issues at design time and dropped it, or it was bugged to hell and non-functional.

Just because they patent something doesn't mean they ever implemented it. Companies patent a lot of stuff.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
NTMBK said: "AMD would have LOVED to have a higher performance CPU in the Bulldozer era. …"
The feature was in up to January 2009, with "compute core" being used instead of "compute module". Granted, patents are usually implemented in hardware even if we don't see it. The module was obviously built with collaborative mode in mind, especially with how close the cores are to each other. There is stuff in between the cores, but it isn't completely labelled in the IEEE diagrams.

If respective mode was the goal, they would have split the cores between the IFU/BPU like POWER9 does for its SMT4 clusters/SMT4 cores, with equal distance from the LSU to the cache unit for both clusters. The same goes for Steamroller: if collaborative mode wasn't the goal, it should have split the IFU/BPU like POWER10 did.

The blatant optimizations of the architecture;
Bulldozer/Piledriver => Collaborative (intent) or Respective (option)
Steamroller/Excavator => Collaborative (intent remained) with enhanced options {+Opportunistic, +Lock-step, +Run-ahead} or Respective (option).
Opportunistic => Per-cluster decode; T0/T1 goes to whichever of C0/C1 decodes it first.
Lock-step => Per-cluster decode, same-time execution.
Run-ahead => Use cluster 0 for execution and cluster 1 for address generation.
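To make the respective/collaborative distinction concrete, here is a small sketch (my own model of the speculation above, not documented Bulldozer behaviour): two 2-wide integer clusters either run two threads separately or gang up on one thread, changing per-thread issue width.

Code:
/* Sketch of the cluster modes described above (speculative, not documented
 * Bulldozer behaviour): two 2-wide integer clusters either run two threads
 * separately ("respective") or gang up on one thread ("collaborative"). */
#include <stdio.h>

enum cluster_mode { RESPECTIVE, COLLABORATIVE };

struct module {
    int clusters;          /* 2 integer clusters per module */
    int alus_per_cluster;  /* 2 ALUs each                   */
    enum cluster_mode mode;
};

static int threads_supported(const struct module *m)
{
    return m->mode == RESPECTIVE ? m->clusters : 1;
}

static int issue_width_per_thread(const struct module *m)
{
    return m->mode == RESPECTIVE ? m->alus_per_cluster
                                 : m->alus_per_cluster * m->clusters;
}

int main(void)
{
    struct module bd = { 2, 2, RESPECTIVE };
    printf("respective:    %d threads x %d-wide\n",
           threads_supported(&bd), issue_width_per_thread(&bd));

    bd.mode = COLLABORATIVE;
    printf("collaborative: %d thread  x %d-wide\n",
           threads_supported(&bd), issue_width_per_thread(&bd));
    return 0;
}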
 
Last edited:

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
Still waiting on Harvester. And um all the other pieces of equipment.
That one is 28nm:

June 11, 2012
"ST plans to open access to its FD-SOI technology to GLOBALFOUNDRIES’s other customers, giving them the possibility to develop products with the most advanced technology available at both the 28nm and 20nm nodes."

Section Manager Product Development Engineering -- Nov 2010 - July 2012
Manager Product Development Engineering -- July 2012 - August 2015
Lead Product Manager -- 2011 to present
Currently working on x86 architecture product development using 28nm SOI...

In turn, it was BDv4, and it was replaced by Bristol's and Stoney's Excavator mk2.

You are thinking of Cranesomething and Tunnelborer, which are the high-performance cores that would have existed if Zen didn't get in the way; one listed on 14nm bulk FinFET and the other on 14nm SOI FinFET.

We are onto BDv5/BDv6, which are fully synthesized, low-macro-count designs that might be RISC-V only; a client/embedded target point rather than an HPC/server target point.
 
Last edited:

Thunder 57

Platinum Member
Aug 19, 2007
2,675
3,801
136
NostaSeronx said: "That one is 28nm: …"

Zen got in the way? Zen probably saved AMD from going bankrupt!
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
Zen got in the way? Zen probably saved AMD from going bankrupt!
Zen didn't really save AMD; it actually led to that bankruptcy hitch.

No 10-core die, no Steamroller server or Excavator server, thus the Intel server rush. It halted AMD's server FCH from 2014-2015 to 2016-2017, especially with no high-performance core development other than Zen; hence $300M for K12/Zen and $80M/$60M for SR/XV over that period of time. New products sell at higher profits, old products sell at decaying profits.

It isn't particularly Zen that saved AMD; it was the server-orientated chiplets, which could have saved AMD's server market share with the Bulldozer family architecture at a lower overall cost. The bigger the feasible monolithic die, the larger the cost savings chiplets provide.

For example:
Greyhound -> 65nm
Greyhound+ -> 45nm
Husky -> 32nm

Nothing blew up or died technically over those abysmal days.

AMD basically did Greyhound 65nm (Server+Client) -> Greyhound+ & Greyhound++ 55nm (Client-only) and said they were going to do a new ground-up core. Can you imagine if Deneb and Thuban didn't happen?

The big revenue dip only occurred when the server upgrade cycle turned up nada for 2015/2016. Refreshes can only go so far, whereas shrinks can go the distance.

Abu Dhabi - 2012
??? - 2013
Warsaw - 2014 <-- So minute, is it even worth mentioning? The Interlagos/Abu Dhabi 85W HE Opterons are cheaper and have higher half-use turbo than these 99W parts.
??? - 2015
??? - 2016
Naples - 2017

The ??? years really set AMD up to fail, since those are areas where a Steamroller server and an Excavator server could have made bucks like Barcelona -> Shanghai -> Istanbul did.
 
Last edited:

amd6502

Senior member
Apr 21, 2017
971
360
136
NTMBK said: "Why waste resources shrinking a bad architecture? …"

Biggest advantages imho:

1. Cheap nodes and alternate supply that help protect against product shortages, and that might better address certain markets, like low end and sub $300 consumer markets.

2. Cheap nodes to design and test new combinations of design ideas. (Or, alternately, just respin vanilla variants of old designs for absolute minimal investment).

PD was great at certain things. XV was pretty capable too. Had it been given a better cache system (like an added L3), these would have gotten impressive multithread. But even without an L3, as a low-end product coming in well under Dali's roughly 3 billion transistor count (closer to Stoney's 1.6B, at say ~2 billion), it would be really good.

Look at the sub $200 netbooks, notebooks, 2-in-ones; Intel dominates that, because other than Stoney, AMD has basically nothing in that range (other than the occasional bottom barrel Dali die salvage). Zen may not be the best to address Intel's Atom line of cores. Now that Stoney stockpile has vanished, this is becoming very relevant; and even before this, Stoney (with its 2 thread limit) was always just a cheap temporary patchjob to address that market segment. A better solution would be a 4c piledriver shrunk to an energy efficient node.

The main disappointment of XV and steamroller was that the number of transistors was on the high side. But they are still much lower than that of a big core. Piledriver with a few XV energy efficiency tricks on a somewhat modern node I think could pretty nicely address Atom.

It is probably [due] to lack of differentiation and that AMD wanted to shift x86 to Zen-only as quickly as possible.

I think that was Lisa's game, and it kind of worked okay for a while. But we're beginning to see the shortcomings of that strategy.

Same thing with the unified AM4 platform. We need an AM1 successor, whether it be a BGA ultra-small-form-factor variant or a traditional socket like AM1. They should be improving on the best of what the competition has to offer; e.g. something like the Raspberry Pi 400, but with slightly more IO (e.g. an M.2 slot).
 
Last edited:

NTMBK

Lifer
Nov 14, 2011
10,237
5,020
136
amd6502 said: "Biggest advantages imho: …"

Small cores and craptops trashed AMD's reputation. They were the bargain basement brand. AMD sticker on a laptop was Bad News. And what did AMD get for their troubles? Awful margins, and Intel flooding the market with contra-revenue dumped Atom chips. Intel controls their own fabs, and has massive scale- AMD can't compete on low end price without completely trashing their profits. It was a sucker's game, and as soon as AMD had a genuinely competitive product again they stopped playing it.

Good riddance to the AM1 e-waste. Those were throw away systems that were never fit for most people's needs.
 

burninatortech4

Senior member
Jan 29, 2014
671
381
136
NostaSeronx said: "Zen didn't really save AMD; it actually led to that bankruptcy hitch. …"

While this insight is appreciated, I think it is fundamentally flawed.

Yes AMD 'could' have made slightly more money those years with server variants of Steamroller and Excavator. But for what?

a) Sunk cost fallacy in terms of chip development. They needed their limited funds to go to Zen.
b) Broadwell-E and Haswell-E were likely still far superior to a hypothetical Excavator server die (regardless of core count)
c) Excavator server would likely consume too much power when scaled to higher core counts. It was designed for a low power (<35W) target to achieve most of its efficiency gains over Steamroller.

AMD made the right choice dropping them completely; even if it really hurt in the short term.

amd6502 said: "Look at the sub $200 netbooks, notebooks, 2-in-ones; Intel dominates that … A better solution would be a 4c piledriver shrunk to an energy efficient node."

Athlon Silver 3050e is actually a very compelling chip for <$200 netbooks. It's everything that Stoney wished it was and couldn't be.
 
Last edited:
  • Like
Reactions: lightmanek

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
amd6502 said: "Look at the sub $200 netbooks, notebooks, 2-in-ones; Intel dominates that … A better solution would be a 4c piledriver shrunk to an energy efficient node."
Going into this, AMD's A6-9220C Stoney is still technically faster and cheaper than AMD's Athlon 3015Ce Pollock.

AMD Pollock 3015Ce:
6W ~ 1.2GHz to 2.3 GHz
3 CUs @ 600 MHz
64-bit DDR4 @ 1.6 GHz
150 mm2 die on 14nm.

AMD Stoney A4-9120C:
6W ~ 1.6 GHz to 2.4 GHz
3 CUs @ 600 MHz
64-bit DDR4 @ 1.86 GHz
125 mm2 die on 28nm

AMD Stoney A6-9220C
6W ~ 1.8 GHz to 2.7 GHz
3 CUs @ 720 MHz
64-bit DDR4 @ 1.86 GHz
125 mm2 die on 28nm

It makes sense to insert an actual successor to Stoney when Mendocino comes out.
c) Excavator server would likely consume too much power when scaled to higher core counts. It was fundamentally designed for a low power (<35W) target to achieve most of its efficiency gains over Steamroller.
Opteron 16-core 85W Interlagos => ~5.3W per core(10.6W per module).

AMD Interlagos press deck B addendum:
A Bulldozer module in the 85W Interlagos HE consumes 10.6W under load.

"AVFS to Optimize Performance Per Watt" slide:
An Excavator AVFS module consumes at minimum 2.5W under load.

I believe the frequency-normalized 0.5 is that 2 GHz range, whereas frequency-normalized 0.9-1.0 is the 3.5-4.0 GHz range.

Opteron 16-core XV variant from 40h-4Fh HE => 8M * 2.5W at ~2 GHz => 20W power consumption.
Opteron-2Die 32-core XV variant from 40h-4Fh HE => 16M * 2.5W at ~2 GHz => 40W power consumption
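A quick check of that arithmetic, treating the 10.6W/module and 2.5W/module figures as given (they are this post's readings of AMD's slides, not datasheet values):

Code:
/* Back-of-envelope check of the module power figures above. The per-module
 * inputs are readings of AMD slides, not datasheet values. */
#include <stdio.h>

int main(void)
{
    double interlagos_tdp = 85.0;                 /* 85W HE Opteron, 8 PD modules   */
    double pd_module      = interlagos_tdp / 8.0; /* ~10.6W per Piledriver module   */
    double xv_module      = 2.5;                  /* Excavator AVFS minimum, ~2 GHz */

    printf("Piledriver HE module:        ~%.1f W\n", pd_module);
    printf("16-core XV (8 modules):      ~%.0f W for the cores\n", 8.0 * xv_module);
    printf("32-core XV 2-die (16 mod.):  ~%.0f W for the cores\n", 16.0 * xv_module);
    return 0;
}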

It is unlikely that such an Opteron would have burned excessive power, since Opterons generally operated at low frequencies.

The only case of sub-100W for AMD is the limited run of Snowy Owl: https://en.wikipedia.org/wiki/Epyc#First_generation_Epyc_(Snowy_Owl)
It also didn't launch till 2018. From 2013 to 2018 there were no new low-power server parts. AMD went from buying SeaMicro in 2012 for $334M to not even releasing a post-Piledriver Opteron, then winding SeaMicro down in 2015. Basically any investment in those products was wasted throughout. They basically self-Osborned themselves into near bankruptcy waiting for Zen EPYC.

Consistent adoption rate:
Barcelona 2007-2008
Shanghai 2008-2009
Magny-Cours 2010
Interlagos 2011
Abu Dhabi 2012
No new high-MSRP products, plus all that money spent getting people to adopt Bulldozer/Piledriver Opterons, with momentum supposedly going into Steamroller/Excavator Opterons in HPC/server/cloud. Also, there is this awkward moment (both processors released in 2012, btw): https://www.principledtechnologies.com/clients/reports/Dell/R815_power_TCO_0113

AMD Server - Cloud Computing;
October 2012 = 28nm HPC CPU in 2H 2014, with ominous red 20nm in the bottom right corner.

*incoming whiplash*
AMD 2013-2014 Server Roadmap:
June 2013 = no 28nm HPC in 2H 2014; instead, a two-part SKU refresh in 1H 2014 that isn't even better than Abu Dhabi HE.
Awful margins, and Intel flooding the market with contra-revenue dumped Atom chips.
Actually, AMD introduced Bhavani (AM1) expecting to get contra-revenued.

107 mm2 on 28nm versus 102 mm2 on 22nm (Bay Trail) and 87 mm2 on 14nm (Cherry Trail). In both cases AMD caused Intel to lose revenue. However, Bhavani wasn't the nuke; Beema/Mullins was.

Awful margins don't really mean anything when they sold a 315 mm2 die for $200 versus 107-125 mm2 dies at $25-50:

FX-8 @ $199 @ 315 mm2
Ath AM1 @ $55 @ 107 mm2

315 / 107 = 2.94; put that in an exponent, 2^2.94 ≈ 7.7; use that to divide $199 => ~$26. So, relatively speaking, AMD made a ~$30 higher margin on the 4-core Jaguar than on the FX.
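Reproducing that back-of-envelope math (the 2^(area ratio) scaling is this post's heuristic, not a real yield or cost model):

Code:
/* Reproduces the rough margin comparison above. The assumption that a "fair"
 * price scales as 2^(area ratio) is a heuristic, not a real cost model. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double fx_area   = 315.0, jag_area  = 107.0; /* mm^2 */
    double fx_price  = 199.0, jag_price = 55.0;  /* USD  */

    double ratio  = fx_area / jag_area;          /* ~2.94 */
    double factor = pow(2.0, ratio);             /* ~7.7  */
    double scaled = fx_price / factor;           /* ~$26: FX price "at Jaguar size" */

    printf("area ratio %.2f -> factor %.1f -> scaled price $%.0f\n", ratio, factor, scaled);
    printf("Jaguar at $%.0f is ~$%.0f above that, per the post's math\n",
           jag_price, jag_price - scaled);
    return 0;
}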
 
Last edited:

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
Not sure where you're getting faster from. Gimped Zen is still Zen.

Passmark

Athlon 3015e 1403/2674
A6-9220C 1020/1163
The "faster" comes from being a 14.48 mm2 module but still having higher frequency.

[Image: Zen efficiency gain chart]
A6-9220c doesn't have efficient power delivery, improved AVFS via PurePower, process improvement(fully depleted-orientated), and architectural reworkings.

With an optimized port to 22FDX/12FDX, higher frequency is basically more achievable, which was why there was a move from Carrizo-L (2.5 GHz, 25W) to Stoney (3.5 GHz, 25W); A8-7410 at 804 ST Passmark vs A9-9410 at 1339 ST Passmark.

Basically, an x86 CMT core post-inversion would continue the trend of higher frequency at the same power for both CPU & GPU as memory speed rises.
[Image: 22FDX complex IP slide]
Circa Jan 2016.

Then, the "cheaper" part comes from Stoney slotting in below Raven2/Dali/Pollock, since that product line's die is larger, and the process is more complex and thus more expensive per yielded wafer compared to 12FDX/22FDX. Design-wise, FinFETs are very limited on LVT/RVT optimization, requiring extended masks/process steps that aren't needed in FDSOI.
 
Last edited:

DrMrLordX

Lifer
Apr 27, 2000
21,632
10,845
136
AMD doesn't have the wafer capacity to waste time with crap like a successor to AM1 or anything else in the bargain bin, which sucks for people that actually like the bargain bin, but what can you do?

Meanwhile AMD is cranking out 96c server monstrosities, to be followed up by 128c server monstrosities, that sell for $$$$$$$$. The investors will be pleased. If you are a fan of the firesale sub-$200 cheapo systems that AMD used to have in the past... you're out of luck! Practically nobody is making those now.