How bad was AMD Bulldozer and its variants

Wolverine607 · May 18, 2017

Like for 1 core of it on a clock per clock basis at the same speed.

Say one core of an AMD FX CPU at 4GHz vs an Intel Core i5 Sandy Bridge 2500K at 4GHz with no turbo or throttling.

Wasn'y Bulldozr so bad that at the same clock speed of 4GHz, it was way under even half the performance of even Sandy Bridge for a single threaded app or a multi-threaded app running on an Intel Sandy Bridge E using 8 cores at same exact clock speed?

Or was it not that bad but more like half or only 60% as fast?

BigDaveX · May 18, 2017

It depended on the situation. It could be only about half as fast as Sandy Bridge per clock, but that was usually only in worst-case scenarios, such as apps that used old compilers and/or had lots of branchy code. Generally speaking it was more like 65-75% the single-thread performance, depending on the scenario.

Topweasel · May 18, 2017

BigDaveX said:
It depended on the situation. It could be only about half as fast as Sandy Bridge per clock, but that was usually only in worst-case scenarios, such as apps that used old compilers and/or had lots of branchy code. Generally speaking it was more like 65-75% the single-thread performance, depending on the scenario.

But that is at its best case when it finally started to see positive IPC. When Bulldozer hit, it was about about 5% slower per clock than what was it at the time Husky? Or Stars? Anyways the cores from Llano and Thuban. Which put it about 60% down on Sandy Bridge. It's advantage was much higher clocks. So outright performance was about 65% of Sandy Bridge (or down 35%).

Roland00Address · May 18, 2017

It was slower performance for the similar Core 2 Quad in both single and multi performance if you compare a Core 2 Quad with a FX 4100 (3.6 Ghz base, 3.7 2 module turbo, 3.8 ghz 1 module turbo). At a stock clock the Core 2 Quad at 3 ghz (lets say the Intel Core 2 Quad 9650) is trading blows with the FX4100, but over clock the Q9650 to 3.6 ghz and the Core 2 Quad is fast in most situations both single thread and multi thread.

Now the Q9650 came out in 2008, but other 45nm quads came out in 2007. It was a 45nm processor and it was an old architecture in 2008 with the 1st generation of Core with the integrated IMC (no FSB) coming also out in 45nm in 2008 with the i7 920.

Now compare this to AMD who with their 32 nm product coming out in 2011. Aka it took 4 years for AMD to reach parity with the Core 2 Quad in similar performance and a similar ish clock speed.

The thing is the AMD Thuban and Phenom IIs also did this several years earlier and actually in many things performed better than the bulldozer architecture supposedly they replaced.

-----

Now with the new atoms / pentiums that are using Intel's Goldmount if we were to increase the theoretical performance via 50% (pretend that Apollo Lake could scale to 3.6 ghz instead of 2.4 ghz) we would not be as quiet as fast as the AMD FX 4100, with the AMD being faster...but that is not saying much for we are talking about intel's "free cores" effectively where you are paying not really for the cpu core but the rest of the silicon and Intel goal is to make the cpu core and l2 cache being as cheap as possible for you still have to make the rest of the silicon to make a soc work and your goal is to make the total die size as cheap as possible.

Wolverine607 · May 18, 2017

Roland00Address said:
It was slower performance for the similar Core 2 Quad in both single and multi performance if you compare a Core 2 Quad with a FX 4100 (3.6 Ghz base, 3.7 2 module turbo, 3.8 ghz 1 module turbo). At a stock clock the Core 2 Quad at 3 ghz (lets say the Intel Core 2 Quad 9650) is trading blows with the FX4100, but over clock the Q9650 to 3.6 ghz and the Core 2 Quad is fast in most situations both single thread and multi thread.

Now the Q9650 came out in 2008, but other 45nm quads came out in 2007. It was a 45nm processor and it was an old architecture in 2008 with the 1st generation of Core with the integrated IMC (no FSB) coming also out in 45nm in 2008 with the i7 920.

Now compare this to AMD who with their 32 nm product coming out in 2011. Aka it took 4 years for AMD to reach parity with the Core 2 Quad in similar performance and a similar ish clock speed.

The thing is the AMD Thuban and Phenom IIs also did this several years earlier and actually in many things performed better than the bulldozer architecture supposedly they replaced.

-----

Now with the new atoms / pentiums that are using Intel's Goldmount if we were to increase the theoretical performance via 50% (pretend that Apollo Lake could scale to 3.6 ghz instead of 2.4 ghz) we would not be as quiet as fast as the AMD FX 4100, with the AMD being faster...but that is not saying much for we are talking about intel's "free cores" effectively where you are paying not really for the cpu core but the rest of the silicon and Intel goal is to make the cpu core and l2 cache being as cheap as possible for you still have to make the rest of the silicon to make a soc work and your goal is to make the total die size as cheap as possible.

Was its parity 4 years after Core 2 Quad release in 2007 overall or on a same clock speed per core basis. Like say there was a 4 core Bulldozer at 3.5GHz vs a 4 core Core 2 Quad Yorkfield at 3.5GHz. Was the Core 2 Quad still faster and by how much at the same clock and same core count with no hyperthreading?

NostaSeronx · May 18, 2017

How bad was it?
Not bad at all, what the hell you talking about?

How bad was it if you actually look at the process into functionally release subpar-Bulldozer ver Charlie?
Oh what the hell AMD. You could have done better than THAT!

2007 -> Bulldozer on 45nm first eng samp tapes out.
2008- > Dirk Meyer, lets delay it to 32nm. So, we can switch to power-hungry 8T SRAM, Phat FPU, removal of arithmatic units in AGLUs, lower clock speed.
2009 -> GlobalFoundries delays 32nm. Dirk Meyer herp derp BOBCAT, BOBCAT, BOBCAT!!!!
2011 -> Bulldozer finally releases. Its broken, but hey at least Bobcat launched on time.

Really, the only functional dozers were Steamroller & Excavator. That was after Zen took mainstream, which in turn cancelled the actual successors to BD/PD. (Mind you, AMD is still working on 15h to this day.)

MajinCry · May 18, 2017

For open world games, the biggest limiter in performance is the number of draw calls being issued. AMD's CPUs are bad at handling them, with Ryzen (w/ fast RAM and the game running on one ccx) getting around Sandybridge level draw call perf.

Got a nice thread with the results from various users: https://forums.anandtech.com/threads/part-2-measuring-cpu-draw-call-performance.2499609/

Clock for clock, comparing against Piledriver at draw calls:

Lynnfield is 54% faster
Sandybridge is 35-39% faster
Haswell is 47% faster
Skylake is 77% faster

DrMrLordX · May 19, 2017

OP apparently doesn't want to look at the last 6 years of benchmarks to find out the answer.

rvborgh · May 19, 2017

On Cinebench Bulldozer core (1/2 module) was around 71% of a K10 core's performance at same GHz. Piledrive rose to around 83%, Steamroller a bit better still until Excavator where they got the IPC just about equal with K10 ("stars").

Seems to me that they'd have had better results if they had just widened K10, and fixed the K10's retire bottleneck, and applied all that work to optimizing the existing K10 (while they worked on Ryzen).

BD was pretty much a disaster IMHO... Ryzen is the complete opposite.

PS: btw, i ran a "simulated" 8 core K10 chip on my 48 core Opteron system... @ 3.5 GHz.

https://www.youtube.com/watch?v=THhZTmkrxzQ

CB R11.5 8.09 and that was with the existing core. Piledriver at 4 GHz scores high 6s.

IEC · May 19, 2017

They were terrible. That's why I didn't buy ANY processors from AMD for over a decade. Until Ryzen.

Atari2600 · May 19, 2017

rvborgh said:
Seems to me that they'd have had better results if they had just widened K10, and fixed the K10's retire bottleneck, and applied all that work to optimizing the existing K10 (while they worked on Ryzen).

+1

Thuban and Deneb were on 45nm compared to BD on 32nm - and the transistor count for both cores were similar.

Given the process improvement allowed a ~30W drop in thermals relative to Thuban, and 45nm was shown to clock to 3.7GHz for the PhII x4 980 then you gotta imagine more than a few hundred MHz could have been added to a shrunk Thuban.

Taking the anandtech launch review of the time, just a bog standard Thuban at 3.7 GHz would have been quicker than BD in pretty much every application tested.

A straight shrink of Deneb onto 32nm would have outperformed BD pretty much across the board. A postulated PhenomIII where they addressed known bottlenecks would likely have been significantly quicker. The entire Bulldozer design philosophy was a real mistake.

scannall · May 19, 2017

IEC said:
They were terrible. That's why I didn't buy ANY processors from AMD for over a decade. Until Ryzen.

I did build a couple of HTPC's that used the A4, and they were great for that. And my laptop with an A8 4500m still works great. But for any heavy lifting, they were just too far behind.

rvborgh · May 19, 2017

rather good description of why:

https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/page-33#post-38796242

krumme · May 19, 2017

rvborgh said:
rather good description of why:

https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/page-33#post-38796242

A stellar technical explanation. Funny and sad at the same time. But i still wonder to this day why and how amd ended in such a technical mess?

itsmydamnation · May 19, 2017

krumme said:
A stellar technical explanation. Funny and sad at the same time. But i still wonder to this day why and how amd ended in such a technical mess?

He just skipped all the good parts about bulldozer.
BD at every pipeline stage of the core was significantly more modern then STAR's:
Much Improved Branch predictors
Decoupling in the front end
Significantly bigger TLB's
Physical Register file instead of ROB's
significantly bigger load store queue
significantly better at memory disambiguation

If bulldozer had of shipped with 32kb L1 that wasn't write through, a uop cache (wouldn't have needed to duplicate the decode like they did in SR) and a 3rd ALU, it probably would have performed significantly better at the same power and still hit close to the same clocks.

BD reaks of development hell, the rumors you see float around occasionally seem to blade Dirk Meyers for forcing a design target from the top down.

NostaSeronx · May 19, 2017

itsmydamnation said:
....
1. If bulldozer had of shipped with 32kb L1 that wasn't write through,
2. a uop cache (wouldn't have needed to duplicate the decode like they did in SR) and
3. a 3rd ALU.

1. 8T SRAM came at a premium, so it would have had to been delayed to 22nm PDSOI. The write policy is because of the intrinsic issue with early Andy Glew MCMT at amd. It was never meant to use L1 as private caches. Instead, smaller L0 data cache with a unified L1 cache.
Andy Glew -

I.e. I think MCMT is clearly justified with a 4K or 8K L0 cache.
But probably not justified with a 32K L1 cache. 16K? I am sure that
AMD must have simulated it.

2. The duplicated decode is for the increased instruction window length. BD/PD = 16B Windows, SR/XV = 32B Windows. So, 4 decode is fine for BD, but 8 decode is needed for SR. Keep the window length for decode the same, means duplicating the decode. As well removing ThreadA/ThreadB SMT and replacing it with WindowA/WindowB logic at that stage. The loop cache is a good enough (macro)/u-op cache.
3. There are four ALUs in every Bulldozer variant currently. 2 ALUs in the EX pipelines and 2 Simple ALUs in the AGLU pipelines.

Bulldozer's issue was development intrinsic at AMD. Can someone release a better MCMT core? Yes, definitely. (tunnelborer, digger, harvester, etc)

Just to note because of Zen, we never got to see Steamroller Rev A. (Instead, Steamroller Rev B was launched which was a mobile/consumer oriented version.)

itsmydamnation · May 19, 2017

You present a lot of rumor as fact:

1. Andy glew left well before bulldozer that was released.
2. There is nothing that needs a higher instruction window for SR, it was removing a decode bottleneck because the unit was round robin. Zen can decode upto 32bytes a cycle and yet according to agner rarely gets above 17, a uop cache would serve the exact same purpose and do it at lower power just like Zen,SB,Haswell,skylake etc.
3. The L1 arrays aren't even that big in BD, we are talking 1 or 2mm a module,
4. There are very few ALU instructions(like none) on the AGU's, no adds,muls,div,and,or,xor,etc
5. BD had all non memory functions on only 2 ALU's this creates unit contention like Mul and branch on the same port.

just to note because of Zen AMD is now competitive again.........

krumme · May 20, 2017

I recall Keller described bd design as something like:

"trying to make the ocean boil"

I think perhaps with k7 meyers were in a sense that they did accomplish something like that (big) and therefore could do it again. Forcing an impossible design structure top down.

But i like to see this in a broader sense;

Amd perhaps also have have an engineering culture thar promotes such designs. A kind of wish to do everything.
It seems to me it can also be seen in their product portfolio especially for gpu that seems to me to be way to wide.
Ironically Raja said to the investers meeting something like "i know a lot of you want us to cover more segments"
Imo he is applying meaning to his own doubt if they are targeting to broad.

Wolverine607 · May 20, 2017

I did look at some benchmarks but they were all over the place showing the bottom line it was bad. Just curious how bad assuming same core count and same threads and same GHz speed which would put it against a Sandy Bridge E or Haswell E 8 corre with HT disabled purposely clocked at same speed. It seemed so bad that they were even less than half speed and is why AMD had 8 core chips priced so low compared to Intel's 4 core chips because performance per core was beyond embarrassingly bad.

Ryzen is a massive improvement, but still gets beat bad by Intel at the same clock speed evne in non-gaming related tasks at the same clock speed. Just is not near the embarrassment that Bulldozer variants were and puts AMD in the game on a price per dollar segment again like back in the Core 2 vs AMD K10 days where AMD had a market on a dollar value, but still got beat by like 20-30% by Intel. Where as when Bulldozer was released it was so bad Intel was beating it by 100% or more on a per core and same thread count a the same clock speed.

The reason why AMD Ryzen competes very close to Intel in may non-gaming benchmarks is because of higher clocks vs Intel's 8 core CPUs. If you look at AD Ryzen CPUs close in GHz speed to Ryzen 8 core, it gets beat handily by 20% and even worse in gaming it seems.

Trolling is not allowed.
Markfw
Anandtech Moderator

Ken g6 · May 20, 2017

I only remember recommending a Bulldozer-style AMD chip once, to a guy who already had an AM3+ mobo, and wanted to improve his h.265 video encoding speed over his Phenom. In every other situation, it was worse than the Intel offering, or sometimes even AMD's previous processors.

scannall · May 20, 2017

Wolverine607 said:
I did look at some benchmarks but they were all over the place showing the bottom line it was bad. Just curious how bad assuming same core count and same threads and same GHz speed which would put it against a Sandy Bridge E or Haswell E 8 corre with HT disabled purposely clocked at same speed. It seemed so bad that they were even less than half speed and is why AMD had 8 core chips priced so low compared to Intel's 4 core chips because performance per core was beyond embarrassingly bad.

Ryzen is a massive improvement, but still gets beat bad by Intel at the same clock speed evne in non-gaming related tasks at the same clock speed. Just is not near the embarrassment that Bulldozer variants were and puts AMD in the game on a price per dollar segment again like back in the Core 2 vs AMD K10 days where AMD had a market on a dollar value, but still got beat by like 20-30% by Intel. Where as when Bulldozer was released it was so bad Intel was beating it by 100% or more on a per core and same thread count a the same clock speed.

The reason why AMD Ryzen competes very close to Intel in may non-gaming benchmarks is because of higher clocks vs Intel's 8 core CPUs. If you look at AD Ryzen CPUs close in GHz speed to Ryzen 8 core, it gets beat handily by 20% and even worse in gaming it seems.

In productivity benchmarks, the Ryzen 7 meets or beats the Intel equivalent 6900k. Productivity Benchmarks. Which for my uses, is more important. And the gaming I do is at 4k. Again, the 1080 gaming benchmarks don't come into play at all.

Wolverine607 · May 20, 2017

scannall said:
In productivity benchmarks, the Ryzen 7 meets or beats the Intel equivalent 6900k. Productivity Benchmarks. Which for my uses, is more important. And the gaming I do is at 4k. Again, the 1080 gaming benchmarks don't come into play at all.

Yeah it beats it by a little and 1800X is clocked 400-500Mhz higher than the 6900K at stock clock for non turbo and turbo. How would it fare a the same clock speed.

But yes if you do not overclock and intend to leave it at stock speed, the Ryzen 1800X is a much better buy especially for the price than the 6900K if gaming is not your primary usage.

I am no Intel fanboy, just stating the facts. I wish AMD could beat Intel or be neck and neck with them like back in the AMD Athlon 64 X2 days were AMD spanked Intel or in the P4 Northwood and Athlon XP Barton and Athlon 64 days where they were neck and neck.

SlowSpyder · May 20, 2017

I really like my FX. Great hobbyist part, and handles my day to day use without a sweat.

edcoolio · May 20, 2017

SlowSpyder said:
I really like my FX. Great hobbyist part, and handles my day to day use without a sweat.

Agreed.

Shivansps · May 20, 2017

They were horrible, some workloads were slower than Phenom 2, and power/temperature are trought the roof compared to Sandy/Ivy Bridge.

In the recent years FX8350 has managed to catch up with the 2500K in some heavy MT workloads, still the FX8350 launched after Ivy.

How bad was AMD Bulldozer and its variants

Member

Senior member

Diamond Member

Platinum Member

Member

Diamond Member

Platinum Member

Lifer

Member

Elite Member

Golden Member

Golden Member

Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Member

Programming Moderator, Elite Member

Golden Member

Member

Lifer

Senior member

Diamond Member