Still in progress.
Movement from CMT (one core per thread) to DMT (anywhere from many cores per many threads to many cores tied to a single thread)

^Closest existing architecture to the threading style is P9.
Current-gen 22FDX (traditional channel lattice / gate-first / 12nm SOI thickness / 20nm BOX) to next-gen 22FDX or 12FDX (novel channel superlattice / gate-last / 5nm SOI thickness / 15nm BOX) // SOI thickness is the thickness of the silicon layer above the buried oxide (BOX); current gen is thinned to ~8.5nm, which is far from the sub-6nm required for high performance.
Only solution that I have gathered from small snippets of text.
Single module
Next-gen graphics
AI/Vision engine
LPDDR memory only
Anything K8-2 and K10 related is pretty much locked to GlobalFoundries:
K8-2
https://patents.google.com/patent/US6553482B1 expired, but Current Assignee: GlobalFoundries Inc
K8-2
https://patents.google.com/patent/US6240503B1 expired, but Current Assignee: Advanced Micro Devices Inc
K10
https://patents.google.com/patent/US7793080B2 not-expired, but Current Assignee: GlobalFoundries Inc
https://patents.google.com/patent/US6240503B1 =>
"According to one embodiment, address generation units 34A and 34C are used for load memory operations and address generation units 34B and 34D are used for store memory operations. Functional units 32A and 32D are integer functional units configured to perform integer arithmetic/logical operations and execute branch instructions. Functional units 32B and 32E are multimedia execution units configured to execute multimedia instructions, and functional units 32C and 32F are floating point units configured to execute floating point instructions."
"It is noted that, while in the present embodiment the instruction queue is physically divided into instruction queues 36A-36B, other embodiments may divide the instruction queue into even larger numbers of physical queues which may operate independently. For example, an embodiment employing four instruction queues might be employed (with four register files and four execution cores)."
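A rough way to picture the two quoted passages, as a minimal Python sketch. The unit roles are copied from the first quote. The split of units between instruction queues 36A and 36B is my assumption (the quoted text does not say which units sit behind which queue), and the names UNIT_ROLES, QUEUE_UNITS, units_for and roles_per_queue are made up for illustration.

```python
# Roles as listed in the quoted embodiment of US6240503B1.
UNIT_ROLES = {
    "34A": "load", "34C": "load",
    "34B": "store", "34D": "store",
    "32A": "integer/branch", "32D": "integer/branch",
    "32B": "multimedia", "32E": "multimedia",
    "32C": "floating point", "32F": "floating point",
}

# ASSUMPTION: each physical instruction queue feeds one execution core,
# and the units are split evenly (A-C / A-B behind 36A, D-F / C-D behind 36B).
# The quoted text does not state this grouping.
QUEUE_UNITS = {
    "36A": ["32A", "32B", "32C", "34A", "34B"],
    "36B": ["32D", "32E", "32F", "34C", "34D"],
}


def units_for(role, queue):
    """Return the units behind `queue` that can execute `role`."""
    return [u for u in QUEUE_UNITS[queue] if UNIT_ROLES[u] == role]


def roles_per_queue():
    """Return the sorted set of roles available behind each queue."""
    return {
        queue: sorted({UNIT_ROLES[u] for u in units})
        for queue, units in QUEUE_UNITS.items()
    }


if __name__ == "__main__":
    for queue, roles in roles_per_queue().items():
        print(queue, roles)
```

Under that assumed grouping, both queues end up with one unit of every role, which is what makes the "four queues, four register files, four execution cores" variant in the second quote look like a straightforward scaling of the same cluster.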
Generally, there is enough leeway to send AMD's Neoverse Nx equivalent to TSMC and AMD's Neoverse Ex equivalent to GlobalFoundries.
Pre-2006 => K9 canned
Post-2012 => Zen taking K9 w/ micro-op cache replacing the trace cache and a low-power overhaul replacing its 5 GHz push.
Pre-2002 => K8-2 canned
Pre-2006 => Bulldozer takes K8-2 and further clusters it and becomes K10 officially.
Now we just have to wait for the third overhaul, since Family 19h is the third full-overhaul architecture from K9. With the growth AMD is getting again, it isn't a bad time to bring out two cores like old times.