The Longmont/Mile High CPU cores were killed off by him, and this was after he had given them the go-ahead. It is basically like Intel seeing Pentium M in Israel, then having the CEO go ballistic and shut that CPU design center down.
2002-2008:
-> No next-generation (non-K7-related) high-performance core ever launched.
-> The CEO, in a fit, shut down the low-power cores for the next-gen Personal Internet Communicator.
Now everyone and their family has at least one "MiniPC" somewhere....
It would have been nice to have gotten the Longmont "Bulldozer" core in 2009 on 45nm in a MiniPC (>10W target), rather than waiting for the Sunnyvale "Bulldozer" core in 2011 with no MiniPC (>45W target).
Longmont "Bulldozer" core => APU First, Server Second
A single 45nm Bulldozer processor is smaller than a 45nm Greyhound processor.
10W first, then scale up to many cores.
2009 Server CPU target => From what could be finessed out of AMD, an 8-core 45nm LM "Bulldozer" would be smaller than a 4-core 65nm "Greyhound". Cost-effectiveness (small die) and power efficiency (small core, SINGULAR core) were the goals under the 2005-2007 Longmont-Mile High design.
1st Bulldozer Design => Premium "Internet Box"/PIC product with Fusion.
2nd Bulldozer Design => Many-core HPC product for the standard market. 8/16-core HEDT/Server first, then a 4-core client product with a really, really small die.
Sunnyvale "Bulldozer" module => Server First, APU Last
A single 32nm Bulldozer processor is larger than a 32nm Greyhound processor.
140W first, then lower.
The original Bulldozer by PH/CRM/BB was built from scratch off of Bobcat, while its HR/DM/MB remake was based on Andy Glew's K10. The current "Bulldozer" stole its name from the original core.
Mobile first codename:
"Bulldozer is the code name for the Fusion chip that will be designed for everything from handhelds to servers."
Mobile first platform:
"Bulldozer will be part of the "Falcon" PC platform that also includes an integrated memory controller, a graphics processor, cache memory and a PCI Express controller."
[In 2007 BD was going to be a 45nm product. When we decided to make the change, do it right and move it to 32[n]m everything changed.
Anything that was written about BD in 2007 is probably not accurate.]
45nm processor != 32nm processor
Initial patent w/ Bobcat(unchanged from early cancelation and revival):
It should be noted, just to reiterate this again:
Execution core => Scheduler + Function Units => Bobcat has 3 schedulers, each tied to specific functions, and thus has three execution cores.
Initial architecture details of Bulldozer:
The original Bulldozer followed the same example as Bobcat: three separate instances, one each for Integer Execution, Address Execution, and Floating Point Execution. However, the Integer and Floating Point executions were double-instanced, and Address Execution was widened for the higher memory IPC. Address Execution was not duplicated, so that it could centrally address all memory<->reg operations for all clusters (2x Int/2x FPU). As the original Bulldozer had a single-threaded focus, most new Media/HPC (SSE5) workloads were expected to run in single-threaded mode, while existing code optimized for K8/Greyhound (<SSE4) was expected to run on the Lo-/Hi- clusters. The aim was to get the highest single-threaded and multi-threaded core performance.
New workloads => Use both for single-threaded workloads // More ILP for Integer and more DLP for SIMD.
Old workloads => Use both as multi-threaded workloads // Instead of L2<->L2 like that of K10-BD/K8H-GH, the original LM-Bulldozer core did sliding L1d<->L1d transfers for multithreaded, via central address execution.
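As a rough sketch of the two modes described above, the split could be modeled like this. The names `Workload`, `uses_sse5`, `clusters_per_thread`, and `threads_per_core` are my own illustrative assumptions and do not come from any AMD documentation; the only thing taken from the post is "new workloads use both clusters as one thread, old workloads use them as two threads".

```python
from dataclasses import dataclass

# Lo-/Hi- clusters per core, as described in the post.
CLUSTERS_PER_CORE = 2


@dataclass(frozen=True)
class Workload:
    name: str
    # Hypothetical flag: True for "new" Media/HPC (SSE5) code,
    # False for existing K8/Greyhound-optimized code.
    uses_sse5: bool


def clusters_per_thread(w: Workload) -> int:
    """New workloads get both clusters for one thread; old ones get one cluster each."""
    return CLUSTERS_PER_CORE if w.uses_sse5 else 1


def threads_per_core(w: Workload) -> int:
    """Number of threads the core would run for this kind of workload."""
    return CLUSTERS_PER_CORE // clusters_per_thread(w)


new_code = Workload("media-encode", uses_sse5=True)
old_code = Workload("legacy-app", uses_sse5=False)
print(new_code.name, threads_per_core(new_code))
print(old_code.name, threads_per_core(old_code))
```

This says nothing about how the hardware would actually have picked a mode; it only restates the new/old workload mapping in a checkable form.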
Taking the above(+post-mortem investigation of 2005-2007 interviews post-Moore's CMT slides and post-Hester's Bulldozer slides):
FU126A => Two instances of 2 ALUs
FU126B => Two instances of 2 FPUs
FU126C => One instance of 3 AGUs : 1x L1D
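To make the unit counts above easier to check, here is a small tally of the three FU126 groups. The class and field names (`FunctionUnitGroup`, `instances`, `units_per_instance`) are hypothetical and only mirror the list as written; nothing here is from a patent or AMD source beyond the numbers already quoted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionUnitGroup:
    label: str               # label as used in the list above
    kind: str                # ALU, FPU, or AGU
    instances: int           # how many copies of the group exist
    units_per_instance: int  # function units inside each copy

    @property
    def total_units(self) -> int:
        return self.instances * self.units_per_instance


# Values copied from the FU126A/B/C lines above.
LM_BULLDOZER = [
    FunctionUnitGroup("FU126A", "ALU", instances=2, units_per_instance=2),
    FunctionUnitGroup("FU126B", "FPU", instances=2, units_per_instance=2),
    FunctionUnitGroup("FU126C", "AGU", instances=1, units_per_instance=3),
]


def totals(groups):
    """Total function units per kind across all instances."""
    return {g.kind: g.total_units for g in groups}


print(totals(LM_BULLDOZER))
```

Under these numbers the core would carry 4 ALUs, 4 FPUs, and 3 AGUs, with the AGUs being the only group that is not duplicated.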
The 50% area overhead was not measured against Greyhound, but rather against Bobcat, since the original Bulldozer design started from Bobcat.
Actual Bulldozer: Synthesized macros via Longmont/Mile High (Low power/High Density) -> K10 Bulldozer: Template-based full custom macros via Sunnyvale (High performance/No density)