It's a matter of FO4 and clock.
You are really misinformed here.
Do you know how IPC is increased?
Do you know the process of increasing IPC increases structure SIZE and POWER?
Do you really think +40% comes free?
Do you realize how adding so many transistors in a smaller area with higher work done per cycle creates huge thermal density problems?
And how they affect the chip?
ALL of these factors are interlinked.
Forget process for now (capacitance, Vth and wire delay).
You either aim for a high IPC/low clocks design or a low IPC/high clock design.
More work per logic COSTS size/power!
You cannot have both; the physics does not allow it!
Even when you have an excellent balance, it is about the process, the yields, and the market. Process factors are so tricky that many failures never get a known cause, just 'poor reliability'.
I questioned you about Vth, capacitance and the rest of GloFo PROCESS qualities as soon as you were 'developing' them, and they still remain unanswered to date.
Show me ONE x86 desktop/server MFG that released a new uarch with MUCH higher IPC + higher core count + higher clocks + much lower power from their previous chip, in the past 15 years.
Just one.
Just design a very high FO4, high IPC CPU, then break the stages with flip-flops and increase the number of pipeline stages. You lose something to the branch mispredict penalty, but if branch prediction is good, it's negligible. So you have a high IPC, low FO4 design.
OMG.
You make all the world's best, most intellectual researchers seem so backwards.
More stages -> less logic per stage -> deeper pipeline -> higher latencies -> lower IPC, in exchange for higher possible clocks.
Have a read of some FO4/IPC basics, because your reasoning is nonsensical and scientifically impossible:
http://www.cs.utexas.edu/users/skeckler/pubs/isca00.pdf
Table 7 shows you exactly what I've described.
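For anyone following along, the trade-off being argued here can be sketched with a toy Python model. It is only an illustration under made-up assumptions, not a reproduction of the paper's Table 7: the total logic depth (180 FO4), latch overhead (3 FO4 per stage), FO4 delay (10 ps), base CPI and mispredict rate are all hypothetical numbers, and a mispredict is assumed to cost one cycle per pipeline stage.

```python
def pipeline_model(stages, total_logic_fo4=180.0, latch_overhead_fo4=3.0,
                   fo4_delay_ps=10.0, base_cpi=0.5,
                   mispredicts_per_instr=0.005):
    """Toy model (all defaults are hypothetical).

    Returns (clock_ghz, ipc, billions of instructions per second).
    """
    # Splitting the same logic over more stages shortens each stage,
    # but every stage still pays the fixed flip-flop/latch overhead.
    fo4_per_stage = total_logic_fo4 / stages + latch_overhead_fo4
    cycle_ps = fo4_per_stage * fo4_delay_ps
    clock_ghz = 1000.0 / cycle_ps

    # Assumption: a branch mispredict flushes the pipeline, so its cost
    # in cycles grows with the number of stages.
    cpi = base_cpi + mispredicts_per_instr * stages
    ipc = 1.0 / cpi
    return clock_ghz, ipc, clock_ghz * ipc


for stages in (10, 20, 40):
    clock, ipc, perf = pipeline_model(stages)
    print(f"{stages:2d} stages: {clock:.2f} GHz, IPC {ipc:.2f}, {perf:.2f} G instr/s")
```

In this sketch, more stages always raise the clock and always lower IPC unless `mispredicts_per_instr` is zero; how much IPC is lost depends entirely on the assumed mispredict rate, which is exactly the point in dispute.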
Let me be more clear:
Let's suppose BD/XV FO4 is 17 (the number is unimportant). It reaches 4.9GHz at 1.43V on the 28nm BULK and HDL libraries, with all 4 cores, 2 modules.
Now we have a Zen core that has (more or less) the same FO4 and (more or less) the same transistor count as an XV module (2 cores), which is half that of the overclocked XV described above.
With the same FO4 and half the transistors, a Zen core is on 14nm FF, which has:
1) Less Vth
2) Less leakage
3) Less parasitic capacitance
4) More transconductance (transistor strength)
Intel has a far superior process for all intents and purposes. Why did they not see superstellar gains?
Show me where they doubled cores, increased clocks, doubled caches and structures (added much higher IPC) AND decreased power at the top of their line.
I know that with double the transistor density, heat can become a problem, but the gains are so high that we should have an increase anyway... +50% transconductance, -20% capacitance, -83% leakage, lower Vth... At the same frequency the power can be half per transistor... And the official AMD statement of the same energy per cycle per core says just that.
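A quick sanity check on the "half the power per transistor" figure, as a rough Python sketch. It uses the textbook switching-power relation P ~ C * V^2 * f and assumes the activity factor is unchanged; leakage is ignored. The -20% capacitance and the 1.43V reference are the numbers quoted in this thread; the function names are just for illustration.

```python
import math


def dynamic_power_ratio(cap_ratio, volt_ratio, freq_ratio=1.0):
    """Relative switching power from P ~ C * V^2 * f (same activity factor)."""
    return cap_ratio * volt_ratio ** 2 * freq_ratio


def voltage_ratio_for_target(power_ratio, cap_ratio, freq_ratio=1.0):
    """Voltage scaling needed to hit a target relative switching power."""
    return math.sqrt(power_ratio / (cap_ratio * freq_ratio))


cap_ratio = 0.80   # -20% capacitance, figure quoted in the thread
old_voltage = 1.43  # volts, the 28nm overclock example quoted in the thread

# Same voltage and frequency: the capacitance drop alone gives this ratio.
print(f"capacitance only: {dynamic_power_ratio(cap_ratio, 1.0):.2f}x power")

# Voltage scaling needed for half the switching power at the same frequency.
v_ratio = voltage_ratio_for_target(0.5, cap_ratio)
print(f"voltage ratio for 0.5x power: {v_ratio:.3f}")
print(f"that is about {old_voltage * v_ratio:.2f} V instead of {old_voltage} V")
```

So under these assumptions the capacitance reduction alone does not halve switching power at the same frequency; the rest has to come from running at a lower voltage, and whether the process allows that at the target clock is the open question.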
AMD never clarified anything about WHICH Core at WHAT MHz/Voltage during WHICH workload under WHAT conditions for WHICH part of the traditional curve has the same energy per cycle.
In other words, it can really be used for nothing.