How bad was AMD Bulldozer and its variants

Page 3

WhoBeDaPlaya

Diamond Member
Sep 15, 2000
7,415
404
126
BD was fine if the price was right.
Still have quite a few 8320s running in non-gaming systems.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,290
136
Stars cores were small. The comparison is relevant based on cost.
An 8150 was 315 mm² on 32nm.
Thuban X6 was 346 mm² on 45nm.
A Zen core is 193 mm² btw.
BD was an epic failure, mostly because of its poor efficiency and huge size rather than the lackluster performance it was famous for. It was worse than its reputation.
Llano(32nm) Stars => 32 mm² for two cores & 2 MB L2.
Trinity(32nm) Piledriver => <31.9 mm² for two cores & 2MB L2.
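
For a rough sense of scale, the figures quoted above can be turned into area per core. This is only a sketch: the areas are the ones given in this thread, the core counts are the marketed ones (8 for the FX-8150, i.e. four two-core modules, is an assumption about how to count), and the whole-die numbers include L3 and uncore, so they are not comparable to the core-plus-L2 figures for Llano and Trinity. The names `area_per_core` and `report` are made up for illustration.

```python
# Area-per-core arithmetic using only the figures quoted in this thread.
# Whole-die entries include L3/uncore; "core + L2" entries do not, so compare
# only within each group. Core counts are the marketed ones (assumption).
WHOLE_DIE = {
    "FX-8150 (32nm)": (315.0, 8),
    "Thuban X6 (45nm)": (346.0, 6),
}
CORE_PLUS_L2 = {
    "Llano Stars (32nm)": (32.0, 2),
    "Trinity Piledriver (32nm)": (31.9, 2),  # quoted as "<31.9", so an upper bound
}


def area_per_core(area_mm2: float, cores: int) -> float:
    """Return die (or block) area divided by the number of cores."""
    return area_mm2 / cores


def report(group: dict) -> dict:
    """Map each entry name to its area per core, rounded to two decimals."""
    return {name: round(area_per_core(a, n), 2) for name, (a, n) in group.items()}


if __name__ == "__main__":
    for title, group in (("whole die", WHOLE_DIE), ("core + L2", CORE_PLUS_L2)):
        for name, value in report(group).items():
            print(f"{title:10s} {name:26s} {value:6.2f} mm^2 per core")
```

On these numbers the two-core blocks come out nearly identical in area, which is the point being made here, while the whole-die figures mostly reflect how much L3 and uncore each chip carries.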

Then point out Stars doesn't use the same IP or PDK Lib as Bulldozer.
Stars => 2K BTB(10-cycle mispredict) // Piledriver => 0.5K L1 BTB(16-cycle mispredict) & 5K L2 BTB(20-cycle mispredict) & Dual Branch Prediction(Perceptron(Piledriver) & Native(Bulldozer)) + 0.5~0.6K Return/Call/Direct/etc Index(15-cycle mispredict; this is the only IP shared with Stars(10-cycle mispredict))
Stars => 3-wide decode // Piledriver => 4-wide decode(+more decode stages/complexity)
Stars => 3x12-wide schedulers - small retire/rename // Piledriver => 2x40-wide scheduler + 2 large retire/rename.
Stars => 3 execution buses(3 ALU to 3 AGU combinations, but never 3 ALU + 3 AGU) // Piledriver => 4 execution buses. (Always 2 EX + 2 AGLU) - 2x this
Stars => 6T SRAM, Smaller LSQ, Cache Unit, a single core etc. // Piledriver => 8T SRAM for L1s, Doubled and Wider LSQ, Doubled Cache Unit, two cores, etc.
Stars => Integer & Floating Point 3-pipe(FADD+FMUL+FSTO) // Piledriver => 2-pipe FP FMAC(Integer Mul + Rotates/Packs/Permutes) + 2-pipe Integer MMX(Adds + Stores)

And Piledriver isn't even larger than two Stars cores. Heck, it doesn't even use extremely optimized macros (synthesized macros). Each macro had to be HAND-TWEAKED (aka hand custom), while Llano/Bobcat/Jaguar/Zen used computer-tweaked macros. (Steamroller/Excavator were synthesized to the previously existing macros. It wasn't a top-to-bottom overhaul, just cleaning up the macro dust.)

It's a miracle that it even worked as well as it did. Compared to Agena and Summit Ridge, it was blessed by the gods.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,596
136
It's not a blame game against the engineers making the stuff on the ground, because they also made Zen. And yeah, BD really taught AMD a bit about efficiency, which we already saw in Excavator or even the clock mesh in PD. Hard lessons.

But BD didn't arrive as 2 cores on 32 mm² sans L3. It came as 315 mm². That's the reality that hit the market. Yes, it was evident from the die shots that most of the area was wasted. But it's just one more technical reason in a long string of things that went completely wrong. Tons of top-level design decisions were incompetent. You can't hand-optimize your way out of that.
It probably would have been better if it had been worse, so it could have been scrapped at 32nm tapeout! It was that bad.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,290
136
It's not a blame game against the engineers making the stuff on the ground, because they also made Zen. And yeah, BD really taught AMD a bit about efficiency, which we already saw in Excavator or even the clock mesh in PD. Hard lessons.
Except AMD didn't make Zen. It was developed first at Apple, where it was shot down. SMT consumed too much power in a mobile phone.
But BD didn't arrive as 2 cores on 32 mm² sans L3. It came as 315 mm². That's the reality that hit the market. Yes, it was evident from the die shots that most of the area was wasted. But it's just one more technical reason in a long string of things that went completely wrong. Tons of top-level design decisions were incompetent. You can't hand-optimize your way out of that.
It probably would have been better if it had been worse, so it could have been scrapped at 32nm tapeout! It was that bad.
Sandy Bridge-EP-4 was 270 mm². So large dies weren't exclusive to AMD. AMD at the time was using larger components that were more optimal in the 0.7-1.1 V range. It isn't really all that shocking that FX was at 315 mm² and later 319 mm² with Piledriver. AMD should have launched with the 45nm PDSOI version of K10 @ Phenom FX & APU Falcon. Back in 2008/2009, this would have shown the progress from standard-voltage to low-voltage design, with the 45nm version of Bulldozer being effectively smaller than 32nm Bulldozer, etc.
Verification of Bulldozer cores (microprocessor based on K10 micro architecture & M-SPACE design methodology)
The design and verification of the 45 nm clock skew and clock tree test chip for the K10 (Bulldozer)
For all intents and purposes the 32nm PDSOI version should have consumed significantly more power than it did. More pipeline stages with each stage being wider (~2.3x to be approximately exact) than K8! (the ! is the actual name for K8 cores with 128-bit LSQ & 128-bit FPU). So, the fact that FX-8150 and later FX-8350 outclock Phenom II X6/Phenom II X4 is pretty significant. Especially when you consider that 32nm PDSOI had very little power/perf improvement over 45nm PDSOI in initial yields (2010-2012; 2013 being the year 32nm PDSOI finally got faster than 45nm PDSOI).

Compare:
http://www.cpu-world.com/CPUs/K10/AMD-Phenom II Quad-Core Mobile P960 - HMP960SGR42GM.html
http://www.cpu-world.com/CPUs/K10/AMD-A8-Series A8-3520M.html
http://www.cpu-world.com/CPUs/Bulldozer/AMD-A8-Series A8-4500M.html

Late PS:
Originally, K9(4-wide(4 ALUs)/<5 GHz) was fused with K10. In fact, with the release of Core 2 and later Core i. The two cores in Bulldozer were originally planned to be a 4 ALUs/2 Load/1 Store config, which shared heavily with Bobcat's IP units: One Complex I-pipe, Two Simple I/Ld pipes, One Complex I/St pipe. (Sometime in late 2009-2010 those goals were swapped and boom, reuse of Alpha 21264.) Bobcat => High performance to Low Power & Easy to reuse IP. Both were announced at the same time. So, the lack of reuse is pretty glaring.
 

Topweasel

Diamond Member
Oct 19, 2000
5,437
1,659
136
{{Citation needed}}
That theory is ridiculous and is completely based on Keller working at Apple for the A5X design. Apple doesn't have and will never have an x86 license. There is no reason they would have Keller designing an x86 CPU that they could never use.

Keller was originally brought back to AMD after they bought that ARM-based server company and wanted to develop an ARM and ARM/x86 hybrid (not like they did with the PSP). They moved him over to x86 quickly after that to work on what would become Zen. But Zen has nothing to do with Apple or anything Keller did at Apple.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,290
136
{{Citation needed}}
Apple A7 has SMT logic in Cyclone which was disabled/fused-off at release. With A8 that logic was completely removed. That defunct path was moved to AMD via IP theft(early A7/Cyclone concept) by Keller. Zen is based off Cyclone from A7, but not the subsequent releases in A8/A9/A10. K12 which is ARM was a complete rip of the core, Zen was considered the not so desired sister-core to it. Except, Zen was different enough to bypass any engineering scrutiny because it's x86. Which, ultimately leads to Zen being released and K12 being killed off due to a cease and desist order by Apple.

So, ideally Bulldozer was good because in most part it was built internally. Zen is awful because it was built from another company's existing core. While components are different, largely because Zen is built from shrunk Excavator IP/patents/etc. So, really have fun with your recycled Excavator cores. (The ARM version had none of this differing from Cyclone, only the x86 ver.)

Trolling is not allowed
Markfw
Anandtech Moderator
 

TheELF

Diamond Member
Dec 22, 2012
4,027
753
126
^^ pretty doubtful AMD would "steal" or circumvent licences.
Especially since IBM is practically giving away SMT and power licences in general.
https://en.wikipedia.org/wiki/OpenPOWER_Foundation
IBM is using the word open to describe this project in three ways.

  1. They are licensing the microprocessor technology openly to its partners. They are sharing the blueprints to their hardware and software to their partners, so they can hire IBM or other companies to manufacture processors or other related chips.
  2. They will collaborate openly in an open-collaboration business model where participants share technologies and innovations with each other.
  3. Advantages via open source software such as the Linux operating system.
Hey, finished blueprints AND open software (other suckers doing work for them)
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,596
136
2 wide core with a half frontend, dressed up in a long pipeline.
If stealing from Apple is what is needed to get a new design strategy, it's fine by me. I just don't find it likely or needed.
 

MentalIlness

Platinum Member
Nov 22, 2009
2,383
11
76
I won't be upgrading until a few months after Vega hits. But my FX-8120 has served me perfectly since it was purchased at launch. Never had one issue with the games or programs that I use. Been oc'd for 3 years now.

When I do upgrade though, it'll likely be a 1600X
 

DrMrLordX

Lifer
Apr 27, 2000
22,939
13,024
136
That defunct path was moved to AMD via IP theft(early A7/Cyclone concept) by Keller.

Careful. That's borderline libel there. Hell, nothing borderline about it . . .

Which, ultimately leads to Zen being released and K12 being killed off due to a cease and desist order by Apple.

Can you produce copies of this order?

So, ideally Bulldozer was good because in most part it was built internally.

. . . what?

Zen is awful because it was built from another company's existing core.

Gee, I guess k6 was awful too . . .
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,290
136
. . . what?
It was built ground-up internally by teams already at AMD. AMD received baseline MCMT & concept Bulldozer between 2002-2004. First Bulldozer test-chip was taped out in 2006. Which, then would go through a very weird development trend.
Gee, I guess k6 was awful too . . .
NexGen was purchased by AMD. So, effectively it was then purely AMD in origin. It also led to a superior design within one-ish year of the K5 shipping.

Did AMD buy Apple or the team/patents/licenses/IP? No.

If you want to go that way... K5(15h) -> K6(17h) -> K7(future architecture). So, whatever is after 17h will be better than 15h & 17h combined. // I can wait.
---
Meanwhile went looking through the data.

Bulldozer? Bad.
Piledriver? Good enough.
Steamroller? Better than enough.
Excavator? Worth it till Kaby Lake came out. (2015/2016 good years, 2017 death of all that is AMD.)
// Excavator is worth it if you can get it for less than Apollo Lake;
https://browser.primatelabs.com/v4/cpu/search?utf8=✓&q=a9-9400
https://browser.primatelabs.com/v4/cpu/search?utf8=✓&q=Celeron+J3455
(I know I am comparing but I am pretty sure Stoney Ridge costs less than the dual core Apollo Lake.)
 

DrMrLordX

Lifer
Apr 27, 2000
22,939
13,024
136
It was built ground-up internally by teams already at AMD. AMD received baseline MCMT & concept Bulldozer between 2002-2004. First Bulldozer test-chip was taped out in 2006. Which, then would go through a very weird development trend.
NexGen was purchased by AMD. So, effectively it was then purely AMD in origin. It also led to a superior design within one-ish year of the K5 shipping.

You're missing the point. But hey, I'll let Keller himself go after you from this point since you're the one accusing him of IP theft, so . . . that's all I have to say about that.
 

NTMBK

Lifer
Nov 14, 2011
10,452
5,839
136

Jesus H Christ Seronx, you've really gone off the deep end this time.
 

NTMBK

Lifer
Nov 14, 2011
10,452
5,839
136
I don't know where Seronx has been getting his information at

crack-pipe.jpg
 

Panino Manino

Golden Member
Jan 28, 2017
1,144
1,383
136
@NostaSeronx
I'm curious, what do you think about the Mongoose architecture? "Obviously" all signs point to Samsung having "stolen" it from AMD?
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,290
136
@NostaSeronx
I'm curious, what do you think about the Mongoose architecture? "Obviously" all signs point to Samsung having "stolen" it from AMD?
I feel the same way about Mongoose. At least, Samsung evolved the Cat design rather than doing a 1:1 rip with Apple, but with Excavator IP.

Mongoose over Jaguar:
1CALU-2 ALUs vs 1CALU-1 ALU
2 Branch Units vs none.
1 Store Data vs 0 Store Data
1 FMAC vs no FMAC (technically, could support native AVX2-128b. Which isn't in Cheetah(Sony) or Tiger(Microsoft))
Shorter pipeline/less pipeline stages.
Vastly more optimal L2 physical design.

Aspen project(K12/Zen/Zen 2) over Cyclone.
Basically, almost identical, little to no improvement. With none of the improvements in Cyclone+, Twister, Hurricane/Zephyr. Essentially, Cyclone with some AMD exclusive IP from Excavator bolted on.

Essentially, Person A sees Person B doing some awesome martial arts.

Samsung A sees Person B, I could do that. Wait, I can make it even better. Adds some moves to that style.
AMD A sees Person B, ah ha this would be something nice to rip. *sinister laugh*, Doesn't add anything to the style.
 

mtcn77

Member
Feb 25, 2017
105
22
91
I'd say Bulldozer and all its variants had good performance, but total junk features. Boost and throttle capabilities didn't add value; if you turned them off you had a much better basis. Intel chips aren't different in that regard, but at least they boost when met with single-thread loads. On Bulldozer, single-core (dual, in light of the modular design) boosting occurs when multi-threading is turned off completely, when the core exceeds a major offset of +0.075 V. It also doesn't know when not to boost, which is a very important power metric. It completely deters the user from doing so, since multithreading for all it's worth is better than the sparse benefit of single-threaded boosting, imo.
 

majord

Senior member
Jul 26, 2015
509
711
136
This is getting pretty hilarious.

One thing Nosta has brought up though is the comparison of 32nm SOI to 45nm. It really was a dog for some time. Llano was a disaster IMO, with FMAX well below 45nm, and perf/watt barely any better even at lower freq/power levels. So if you actually compare Llano with Piledriver-based APUs, the latter looks more impressive.

AMD moving to bulk TSMC process wasn't exactly a step in the right direction for a high freq architecture either. tldr, architecture alone is not to blame.
 

CatMerc

Golden Member
Jul 16, 2016
1,114
1,153
136
It's rather hilarious that a design that is apparently ripped off from Apple is so close in form to Intel's.

Maybe, just MAYBE, due to the maturity of CPU design as an engineering task, we're seeing the effects of that in all of the best CPUs being similar in design.
Because ya know, when two engineering teams are faced with the same problem (high performance core), have the same knowledge behind them (engineering cross-pollination from engineers moving from company to company), and have ballpark similar requirements and limitations (scale from mobile to desktop, have X transistor budget and die size, run popular software on the market, etc.),
they tend to create similar solutions.


Nah, it's totally IP theft. :rolleyes:
 

Ancalagon44

Diamond Member
Feb 17, 2010
3,274
202
106
I have it on good authority that Intel has been so successful because they keep kidnapped aliens in their factories. They've just been stealing alien technology for 20 years, that's why they are so good.

Support human. Support AMD.
 

majord

Senior member
Jul 26, 2015
509
711
136
I would argue Zen and SKL are still very different. Sure, at a very high level they're similar, but so were P6 and K7.

And they still differentiate themselves in the same ways when it comes to fundamentals, e.g. AMD's continued separation of INT and FP, pipe vs port execution, distributed Int schedulers vs unified (similar to K7-10, albeit those were tied to the ALU/AGU pair scheme), full complex x86 decoders.
 

amd6502

Senior member
Apr 21, 2017
971
360
136
I disagree on all three counts:

But BD didn't arrive as 2 cores on 32 mm² sans L3. It came as 315 mm². That's the reality that hit the market.

The saving grace for BD1 and BD2 was their multithreading and having 8 cores. BD1 (and possibly PD) was the first consumer octacore to hit the market, and it had a great niche to fill. A quadcore BD1 as flagship would have been a real joke going against Phenom II quads and hexacores.

One thing Nosta has brought up though is the comparison of 32nm SOI to 45nm. It really was a dog for some time. Llano was a disaster IMO, with FMAX well below 45nm, and perf/watt barely any better even at lower freq/power levels. So if you actually compare Llano with Piledriver-based APUs, the latter looks more impressive.

I don't know anything about Llano desktop, but Llano laptop was marvelous; I suppose K10 was meant for lower frequencies and power profiles.

It's natural to expect a performance increase going to a new generation. And however small, that's what we got going from Llano to BD2 APUs.

I'd say Bulldozer and all its variants had good performance, but total junk features. Boost and throttle capabilities didn't add value, if you turned them off you had a much better basis... On Bulldozer, single-core - dual, in light of modular design - boosting occurs when....

Boost and single thread is what saved the dozers; the unimpressive IPC can only be offset by frequency gains.

You could clock the individual cores separately (and thus get very effective single core boosts) starting with Steamroller. Piledriver still had a shared decoder+dispatcher, which might have been what prevented a single core from boosting alone. Steamroller eliminated this shared front end and you can set different frequencies for cores on the same module; had Steamroller kept the shared front end, it might have been (significantly?) more power efficient. Steamroller is not impressive at all power-efficiency-wise (neither desktop nor laptop) but overall they made for decent quadcores.

Turbo boost can be annoying at times (on 95W and up power profiles) and I've turned this feature off at times for the duration of a summer. Mostly a poor OS setup or frequency governor is to blame. On Linux, BOINC loads go from being a nuisance (spinning up the fan) to being nearly unnoticeable if you edit the file /sys/devices/system/cpu/cpufreq/ondemand/ignore_nice_load and set it to 1.

This executes niced tasks at low power efficient frequencies rather than maximal frequency and makes a world of difference.
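
For anyone who wants to script that, here is a minimal sketch. It assumes the ondemand governor is active and that the tunable lives at the path given above (on some kernels or drivers it may live elsewhere, in which case the file simply won't exist). The function names `set_ignore_nice_load` and `get_ignore_nice_load` are made up for illustration.

```python
from pathlib import Path

# Path as quoted in the post. It exists only while the ondemand governor is
# in use, and its location is an assumption that may not hold on every kernel.
IGNORE_NICE_LOAD = Path("/sys/devices/system/cpu/cpufreq/ondemand/ignore_nice_load")


def set_ignore_nice_load(enabled: bool, path: Path = IGNORE_NICE_LOAD) -> bool:
    """Write 1 (ignore niced load) or 0 to the tunable.

    Returns False without writing if the file does not exist.
    """
    if not path.exists():
        return False
    path.write_text("1\n" if enabled else "0\n")
    return True


def get_ignore_nice_load(path: Path = IGNORE_NICE_LOAD):
    """Return the current value as an int, or None if the file is missing."""
    if not path.exists():
        return None
    return int(path.read_text().strip())
```

Calling `set_ignore_nice_load(True)` needs root on a real system, and like any sysfs write it does not survive a reboot, so it has to be reapplied at boot if you want it permanently.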
 

mtcn77

Member
Feb 25, 2017
105
22
91
GeAPM setting cannot be switched off directly after selecting a profile in msrtweaker dos utility, since it needs to be done in Stilt's Devastator Powertune if it constantly shuffles between profiles as it does in its usual operation, so I suppose you aren't correct on this one... Stilt has tested how much it impacts to throttle to p3 from p0 and it was 1%.