waiwatiwaitwaitwaitwatiwait: is someone saying that there is somewhat active development of construction core processors on less than bleeding edge nodes going on?
The core after Excavator, and Excavator's shrinks, were dropped from the high-performance lineup, with Zen completely taking over the high-performance side.
First instance:
Next-generation "Low Power" x86 cores. Not Leopard or Margay (officially cancelled in early 2015); this development appeared around late 2015, around the time the Zen design was completed.
Second instance:
Around the 12FDX/7LP announcement in late 2016, this switched to the second overarching concealed codename project: Next-generation "Ultra Low Power" x86 cores.
Zen2 as the big core at 7LP and ULP1 as the small core at 12FDX.
The work appears to cover both "22FDX" and "12FDX", but more recent changes at AMD|GF appear to focus on "12FDX". Also, the target for the 3rd form of the Bulldozer/Roadwork/Construction cores was the 14nm generation in general. Early simulations would have used the expected 14nm design rules. The only 22FDX product that ever neared release was a refresh of the 60 GHz Nitero chip.
AMD didn't adopt 28HP; it waited for 28SHP for production parts. The equivalent here is waiting for high-mobility strained FDSOI wafers, which appear first in 12FDX and will be backported to 22FDX later.
Around the 2016-2018 timeframe, the following all popped up associated with the NG ULP cores:
ULP CPU Cores
ULP GPU Cores
ULP SoC architecture
ULV Branch Predictor
ULV Cache Design
etc.
The general evolution of the earlier Low Power cores indicates a rise in frequency: Bobcat = 1.75 GHz -> Jaguar = 2.4 GHz. So an ultra low power design with low delay in the architecture and low delay in the process means especially low FO4/gate/wire delay. The power consumption is handled by the Vdd scaling offered by FDSOI. Small islands of repeated schedulers, PRFs and execution units are better for this than a single large island of schedulers, PRFs and execution units like the one in Zen.

(I can't read spaghetti, but this should provide the idea... Clustered = Non-monolithic, Standard SMT = Monolithic)
Architectural threading:
1998-2004 = Clustered Microarchitecture (Original ground-up K8 design - David Witt/James Keller patents)
2005-2007 = Cluster-based Multithreading
2008-2012 = Chip-level Multithreading
2013-now = Cluster-based Multithreading
Clustered Microarchitecture = Shared Retire/Rename because it is a single processor core.
Cluster-based Multithreading = Shared Retire/Rename because it is a single processor core, but with SMT enabled for increased utilization of the second execution core.
Chip-level Multithreading = Two Retire/Rename because it is two processor cores.
Repeat of Cluster-based MT ...
In Cluster-based Multithreading the retire/rename isn't fully duplicated over both schedulers/execution/PRFs, it is one unit with SMT.
Basically, this line:
"The core hardware can stay lean by supporting execution resources and bandwidth for a single thread, instead of scaling up to cover SMT throughput. As a result, the core remains small and enables a higher-frequency design"
but add [execution] to core.
Scale-out for SMT(TLP), rather than scale-up for SMT(TLP).
This in turn is based on an earlier single-threaded architecture:
"High Frequency, Wide Issue Microprocessor" from 1997:
Under the K8 patents, the register file is duplicated and the L0i removed. (J.K.'s patents also show a smaller core variant with just one integer core.)
Under the K9 patents, the AGUs were shifted to Instruction Window 2, which connects directly to the Load/Store unit, the way Bobcat does it.
In the above architecture, both execution cores can each run their own slice of a single thread. AMD's processors do happen to be OoO.
Scale-out for ILP, rather than scale-up for ILP.
Now combine the two:
Thread0 => Cluster0+Cluster1
or
Thread1 => Cluster0+Cluster1
or
Thread0+Thread1 or Thread1+Thread0 => Cluster0+Cluster1
Dynamically load the core with TLP or ILP. The core is most efficient when it is off or fully loaded.
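To make the mapping above concrete, here is a minimal sketch of how the clusters could be handed out. This is only my reading of the three cases, not anything from AMD: the function name assign_clusters is made up, and the two-thread case assumes a positional split (first thread gets Cluster0, second gets Cluster1), which is one way to read "Thread0+Thread1 or Thread1+Thread0".

```python
from typing import Dict, List

# Cluster names taken from the mapping above.
CLUSTERS = ("Cluster0", "Cluster1")


def assign_clusters(active_threads: List[str]) -> Dict[str, List[str]]:
    """Hypothetical allocation of two execution clusters to active threads.

    - no thread:   core is off, nothing is assigned
    - one thread:  it spreads over both clusters (scale-out for ILP)
    - two threads: one cluster each, in the order given (scale-out for TLP);
                   the positional split is an assumption of this sketch
    """
    if len(active_threads) == 0:
        return {}
    if len(active_threads) == 1:
        return {active_threads[0]: list(CLUSTERS)}
    if len(active_threads) == 2:
        return {t: [c] for t, c in zip(active_threads, CLUSTERS)}
    raise ValueError("this sketch only models a two-thread core")


if __name__ == "__main__":
    for threads in ([], ["Thread0"], ["Thread1"], ["Thread1", "Thread0"]):
        print(threads, "=>", assign_clusters(threads))
```

Either way, both clusters are busy whenever anything is running, which is the "off or fully loaded" point.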
Now all we have to do is wait for that "Ultra Low Power" x86 core to come out. Knowing where the origin is and knowing where the project is going: Safe to conclude that it will be FX at a lower TDP and price. (If FX used the originally planned K10 core)
Bobcat vs Bulldozer:
4.9 mm2 * 2 (Double core count) * 2 (Half the lean for High-Performance) => ~19.6 mm2 ~~ near actual of 21 mm2.
Jaguar vs Excavator (same node, same libs)
3.1 mm2 * 2 (Double core count) * 2 (Half the lean for High-Performance) => ~12.4 mm2 ~~ near actual of 14.48 mm2.
Jaguar vs Zen vs Ultra low power (similar node, different libs)
1.8 mm2 * 1.5 (Double execution core count) * 1.5 (increased frequency with extra-lean) * 0.9 => ~3.645 mm2 ~~ likely very near actual mm2.
Zen = 5.5 mm2 ... ULP same ILP/TLP and higher frequency at less area. Higher frequency comes from the superior architecture design and process node.
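If you want to check the area math, here is a quick sketch that redoes the three estimates from the numbers quoted above. The helper names (scaled_area, gap_percent) are mine. Note that the third estimate only lands on ~3.645 mm2 if the 0.9 is applied as a multiplier; dividing by 0.9 gives 4.5 mm2 instead.

```python
def scaled_area(base_mm2: float, *factors: float) -> float:
    """Multiply a base core area by each scaling factor in turn."""
    area = base_mm2
    for factor in factors:
        area *= factor
    return area


def gap_percent(estimate: float, actual: float) -> float:
    """How far the estimate falls short of the actual area, in percent."""
    return (actual - estimate) / actual * 100.0


# Figures quoted in the post: base area, scaling factors, actual area.
bulldozer_est = scaled_area(4.9, 2, 2)      # Bobcat -> Bulldozer, actual 21 mm2
excavator_est = scaled_area(3.1, 2, 2)      # Jaguar -> Excavator, actual 14.48 mm2
ulp_est = scaled_area(1.8, 1.5, 1.5, 0.9)   # Jaguar -> ULP, no actual known

if __name__ == "__main__":
    print(f"Bulldozer estimate: {bulldozer_est:.2f} mm2, "
          f"gap to 21 mm2: {gap_percent(bulldozer_est, 21.0):.1f}%")
    print(f"Excavator estimate: {excavator_est:.2f} mm2, "
          f"gap to 14.48 mm2: {gap_percent(excavator_est, 14.48):.1f}%")
    print(f"ULP estimate: {ulp_est:.3f} mm2 vs Zen at 5.5 mm2")
```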
Curious question: Does FX-8350 work better with Nvidia drivers or AMD drivers? Ironic if Nvidia drivers perform better. That would be like AMD treating their child as a bastard.
Questionable answer: FX-family processors work best under open-source operating systems and open-source drivers.