Setting performance expectations for Bulldozer (client)

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
The point of Bulldozer isn't to have super high single threaded performance, or all out multi-threaded performance damning everything else. It's to provide reasonable performance in the former, with focus on the latter. It's time to analyze the architecture.

"Single-thread": When we say one architecture has performance advantage in "single-thread", what are we referring to here? True single thread performance using applications that only use 1 thread, or per core performance, which shows how the cores interact with each other?

Pure single thread performance is irrelevant. The gains in pure single thread performance from a new architecture nowadays are not worth it. If we put AMD's Deneb/Thuban core at 1.0, here's what the single thread performance of modern CPU architectures looks like.

Deneb/Thuban=1.0
Core 2 Penryn=1.05
Core ix Nehalem/Westmere=1.10
Core ix Sandy Bridge=1.2

Even back in the days of Athlon vs the first Pentium IIIs, where the latter was thought to be legendary, the performance differences weren't much more than 10% on average. If it wasn't for quarterly increases in clock speed, the performance gains wouldn't have been worth it. Multi-thread is the new clock speed.

Multi-Core performance:
If you look at recent benchmarks though, Sandy Bridge looks to have far better than 20% gain in "single thread" performance over Deneb/Thuban. That's not because there's something magical going on, it's just that Sandy Bridge's multi-core capabilities are superior to Deneb/Thuban too.

That's where the "per-core" performance comes in. Note again this is different from pure single thread performance.

On paper we expect 2x the clock to bring 2x the performance, and the same with 2x the cores. But just as a car with an engine that is 2x more powerful doesn't reach 2x the top speed, a microprocessor doesn't scale that way either. You need the supporting components to keep up.

The multi-thread advantage of Sandy Bridge over Deneb/Thuban comes from a number of things:
-Two level BTBs
-Memory controller with better latency/bandwidth
-Higher bandwidth caches
-Better handling of the data in the LLC with multiple cores
-SMT(Hyperthreading)

That's how a 1.2x advantage in single thread performance turns into 1.5-1.6x advantage in per core performance(or per chip performance).
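
To make that arithmetic explicit, here is a small sketch that backs out how much of the gap would have to come from the multi-core features listed above. The inputs are the estimates from this post, not measurements, and the function name is just illustrative.

```python
def implied_multicore_factor(single_thread_ratio, per_core_ratio):
    """Extra factor the multi-core features must contribute on top of the core itself."""
    return per_core_ratio / single_thread_ratio


single_thread = 1.2  # Sandy Bridge vs Deneb/Thuban single thread estimate from this post

for per_core in (1.5, 1.6):  # per core advantage range from this post
    extra = implied_multicore_factor(single_thread, per_core)
    print(f"{per_core:.1f}x per core implies {extra:.2f}x from multi-core features")
```

In other words, under these estimates roughly a 1.25-1.33x factor would be coming from the BTBs, memory controller, caches, LLC handling and SMT rather than from raw single thread speed.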

Bulldozer changes:

Look at the die!: http://forums.anandtech.com/showthread.php?t=2146715

I've compared Westmere/Sandy Bridge/Bulldozer/Llano cores to each other. In most cases, the core size determines how many resources are in the architecture. Westmere/Sandy Bridge cores are both almost 2x larger than Llano, hence the significant performance advantage.

With Bulldozer's extra integer "core" and Sandy Bridge's L2 cache taken out of the comparison, Sandy Bridge has only a 5-10% larger core, or in this case more resources. That tells me that while Sandy Bridge will still have an advantage in both single thread and per core performance, Bulldozer itself is a substantial improvement, and is probably on par with Westmere.

Now being on par with Westmere isn't a bad thing at all, because as I said in the first few paragraphs, single thread performance gains are hard to get and the rewards are small.

Bulldozer adds:
-2 level BTB
-Significantly better memory controller
-L/S units with some restrictions lifted
-Module-based multi-threading which is superior to SMT

Oh, and it looks like the pipeline depth will end up similar to the Nehalem/Sandy Bridge architecture at ~15 stages. (Basically, the main reason new processors need better branch prediction is to make up for the higher cost of mispredictions in a longer pipeline.)

Bulldozer performance estimates

It's pretty much guaranteed that Bulldozer will be a decent amount faster than the 4 core Sandy Bridge in well multi-threaded apps. There's little doubt about it. Its best gains will come in multimedia apps (like video editing apps) that take advantage of the FMAC, and in those applications there will be an additional gain of up to 2x. How many programs can take advantage of the FMAC right off the bat is an unknown. If they need to be recompiled, it's probably easier than for AVX.

I mean, how many PC users need more CPU power in doing things other than video editing and 3D rendering anyway? The exact applications Bulldozer will shine at. Games? Give me a break. At the resolutions most people run, it'll be GPU bound. Sandy Bridge might turn out to be 5-10% faster in low-thread count games at mostly CPU bound resolutions.

In fact, I wouldn't be surprised if it performs like the 6 core Sandy Bridge in multi-threaded apps, which are the type of applications where you need 6 cores rather than 4.

If you've taken a look at the Sandy Bridge and Westmere core, you'll notice that the L2 cache is not only small, but is very close to the CPU core. The space Intel uses for the L2 cache is not far off from the space Bulldozer requires for the extra integer "core".

-Small fast L2 cache in Westmere/Sandy Bridge = single thread focus
-Extra, small integer core in Bulldozer = multi-thread focus

Take 5-10% performance off Lynnfield and double the number of cores. That'd probably make it faster than the 980X. ~20% for max Turbo is not unexpected for Bulldozer, and we can probably see 10% gains for most applications. If Orochi comes at a 3.5GHz base clock, Intel better have higher clocked Sandy Bridge-E chips to compete!
 

fuzzymath10

Senior member
Feb 17, 2010
520
2
81
There are still a lot of cases where we are pegging the CPU at 25% or 50% (to use 4-core as an example).

My Core i5 2400 is about 75-100% faster than my Q8200 core for core, but I would never want to use the equivalent of a 6 or 8-core Q8200 because when I'm pegged to only 1 or 2 cores, I'm still screwed.

Two examples relevant to me:

1) SC2 instantly ran at higher frame rates after upgrading the CPU and removing my 15% video card overclock and running at higher resolution (900p to 1080p). No GPU bottleneck here--not by a long shot.

2) Many complex Excel macros do not distribute well, and only run on one core. With or without turbo boost, this makes my i5 roughly twice as fast as my Q8200.

Basically, I'm under the impression that for a given amount of computing power, it's best spread across fewer "cores" or whatever appear as discrete units to the operating system, and that splitting it out is driven by technical limitations (power, clock speed, etc.) with marketing capitalizing on another way to compare platforms (you can never have enough "cores").

The only time distributing the power is remotely useful is in relevant software supporting such parallelism, or if you simply run a ton of single/dual core applications at once. For one intensive program running, it just increases idle resources which could otherwise be utilized.
 

Arkadrel

Diamond Member
Oct 19, 2010
3,681
2
0
A CPU with 16MB of cache :) and 8 "cores".
Running at 3.5GHz and turbo'ing +500MHz or more.

8 cores that for most things will be running at 4GHz+, without you having to OC them or anything, right off the bat.

OEMs have to love that; MHz sells, and people can understand cache sizes.
I think Bulldozer will be a hit because of that.


My guess:

1) Sandy Bridge at equal MHz will be slightly faster in 4 threads or less.
2) Sandy Bridge at equal MHz will be a lot slower in applications that use more than 4 threads (6-8 threads).
3) We will see Bulldozers OC like champs, probably hitting 5GHz on air.
4) More and more games/applications will use more threads, making Bulldozer more future proof.
 

ShadowVVL

Senior member
May 1, 2010
758
0
71
Yeah, I have high expectations for the 8c BD. I'm hoping to see 5.5GHz on an aftermarket cooler.
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Arkadrel said:
My guess:

1) Sandy Bridge at equal MHz will be slightly faster in 4 threads or less.
2) Sandy Bridge at equal MHz will be a lot slower in applications that use more than 4 threads (6-8 threads).
3) We will see Bulldozers OC like champs, probably hitting 5GHz on air.
4) More and more games/applications will use more threads, making Bulldozer more future proof.

It looks like you are comparing a 4 core SB to an 8 core BD (statement #2).

Maybe a 6 core Westmere would be a better comparison.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Improvements to single-thread performance are the tide that lifts all boats. They improve the performance of any and all apps that are not expressly I/O bound.

Improvements to multi-thread throughput only improve multi-threaded apps within the limits set forth by Amdahl's law and the inter-processor communication contention as detailed by Almasi and Gottlieb.
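
For anyone who wants to play with the first of those limits, here is a minimal sketch of Amdahl's law. It deliberately leaves out the communication contention term, so treat the results as an optimistic ceiling; the parallel fractions in the loop are arbitrary illustrative values, not measurements of any real app.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup of a fixed workload when only part of it scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Illustrative parallel fractions (assumed, not measured).
for p in (0.5, 0.9, 0.99):
    speedups = [round(amdahl_speedup(p, n), 2) for n in (2, 4, 8, 16)]
    print(f"parallel fraction {p}: {speedups}")
```

An app that is only half parallel tops out below 1.8x on 8 cores and can never reach 2x no matter how many cores you add, whereas a faster single thread speeds up the serial half as well.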

I can appreciate the relevance of the fact that one is easier (less costly) to improve upon relative to the other, but that truth doesn't make the former irrelevant.

We'd all be running 1GHz 16-core throughput demons and SUN's Niagara would rule the server world and things like Turbo-Core would not exist if single-threaded performance were irrelevant...and Nvidia would dominate the desktop marketspace with their gigathread GPU's.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
IDC: Nice try. Niagara probably performs worse than Atom, and individual GPU cores are probably worth less than 200MHz cellphone chips. Did I mention that if we take extra threads out of the equation, there's only a 20% difference between AMD and Intel, which will shrink with Bulldozer?

They mentioned 3.5GHz clock speeds with 500MHz all-core Turbo at least(because that was on a 16 core chip).

4GHz 8 core Lynnfield vs 3.4GHz 4 core Sandy Bridge. Which looks better?
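
A back-of-the-envelope version of that comparison, as a rough sketch only: it multiplies cores by clock by the relative single thread figures from my first post (Nehalem-class = 1.10, Sandy Bridge = 1.2). It assumes perfect scaling across cores and ignores Hyper-Threading and Turbo, so it is an upper bound, not a prediction.

```python
def naive_throughput(cores, clock_ghz, relative_ipc):
    """Upper-bound throughput: perfect multi-core scaling, no SMT, no Turbo."""
    return cores * clock_ghz * relative_ipc


# Relative IPC values are the estimates from the first post in this thread.
lynnfield_x8 = naive_throughput(cores=8, clock_ghz=4.0, relative_ipc=1.10)
sandy_bridge = naive_throughput(cores=4, clock_ghz=3.4, relative_ipc=1.20)

print(f"8 core Lynnfield-class at 4GHz: {lynnfield_x8:.2f}")
print(f"4 core Sandy Bridge at 3.4GHz:  {sandy_bridge:.2f}")
print(f"ratio: {lynnfield_x8 / sandy_bridge:.2f}x")
```

Real multi-threaded apps will land well under that ratio once scaling losses and Hyper-Threading on the Intel side are counted, but it shows how much headroom the extra cores and clock give.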
 

Castiel

Golden Member
Dec 31, 2010
1,772
1
0
I honestly couldn't care less how an 8 core BD performs. What I want to see is a lot better IPC over Thuban.

All I seem to hear is 8 core this and 8 core that. Yeah, lots of the games I play really support 8 cores. :rolleyes:
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
IntelUser2000 said:
Did I mention If we take extra threads out of the equation, there's only 20% difference between AMD and Intel that'll be shrunk with Bulldozer?

Comparing the single-threaded Cinebench results of a 2600K at 3.8GHz and the 1100T at 3.3GHz, the Intel CPU ends up about 27.5% faster once we take the different frequencies into account.
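
For reference, this is the kind of clock normalization I mean, as a small sketch. I'm not reproducing the actual Cinebench scores here; the code only works backwards from the 27.5% per-clock figure and the two clocks above to show what raw score gap that implies. The function names are just illustrative.

```python
def per_clock_ratio(score_a, clock_a_ghz, score_b, clock_b_ghz):
    """Ratio of score-per-GHz between chip A and chip B."""
    return (score_a / clock_a_ghz) / (score_b / clock_b_ghz)


def implied_raw_ratio(clock_normalized_ratio, clock_a_ghz, clock_b_ghz):
    """Raw score ratio that a given per-clock ratio implies at these clocks."""
    return clock_normalized_ratio * clock_a_ghz / clock_b_ghz


# 1.275 is the per-clock advantage quoted above; 3.8 and 3.3 are the clocks quoted above.
raw = implied_raw_ratio(1.275, clock_a_ghz=3.8, clock_b_ghz=3.3)
print(f"implied raw score ratio: {raw:.3f}")

# Round trip: feeding that raw ratio back in recovers the per-clock figure.
check = per_clock_ratio(raw, 3.8, 1.0, 3.3)
print(f"per-clock ratio: {check:.3f}")
```

So a 27.5% per-clock advantage at those clocks corresponds to a raw single-threaded gap of roughly 47%.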

But we'll see how Bulldozer does on that front. It just doesn't seem like increasing single threaded performance was the most important bullet point for AMD when designing Bulldozer. Pretty much all the software I use doesn't scale especially well beyond 4 cores (yep, I really should spend more time encoding stuff, I know :( )
 

Arkadrel

Diamond Member
Oct 19, 2010
3,681
2
0
Castiel said:
I honestly couldn't care less how an 8 core BD performs. What I want to see is a lot better IPC over Thuban.

All I seem to hear is 8 core this and 8 core that. Yeah, lots of the games I play really support 8 cores. :rolleyes:


Games that use 6 threads/cores:
http://www.pcgameshardware.com/aid,...already-benefit-from-six-cores-CPUs/Practice/

Medal of Honor
Civilization 5
Ruse
Dead Rising 2
Dragon Age Origins
Arcania - Gothic 4
F1 2010
Lost Planet 2
Anno 1404
Metro 2033
Prince of Persia
ArmA 2
Battlefield: Bad Company 2
GTA4
Colin McRae Dirt 2
Resident Evil 5
Splinter Cell Conviction



Games that use 6 threads/cores:
http://www.overclock.net/amd-cpus/877244-6-core-supported-games.html

Quotes from members, looking through the thread:

Black ops (chewed up all 8 threads of my xeon)
FSX (chews up all 8 threads of my xeon as well)
Napoleon: Total War (supports 6-cores.)
ARMA 2 ( 8 core support)
Flight Sim X (6 cores)
Final Fantasy XIV (6 cores)
Black Ops (8 cores)
Battlefield: Bad Company 2 (8 cores)
Mass Effect 2 (6 cores)
dirt2 (6 cores)
Crysis 2 (supposedly 6+ core support)




Do you play any of the games above? Or plan on playing any of the newer released games? 4-thread capable CPUs might not be enough anymore if you want the fastest gaming experience you can get.

In that sense it makes sense that Bulldozer comes in 8 core versions.
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
Arkadrel said:
Do you play any of the games above?

Yup, and you know what's funny? Most of them ran perfectly fine on my E8400 at 4.2GHz. Just because a game can use 4 threads doesn't mean you need a 4 core CPU, at least as long as you don't care whether a game runs at 60 or 150fps. The number of games where a quad core is necessary, in the sense of getting 60 instead of 30fps or similar, can be counted on one hand... and it took us about 7 years to get there.
 

podspi

Golden Member
Jan 11, 2011
1,982
102
106
I could be entirely wrong, if TurboCORE ends up being terribly effective, but it seems to me that Bulldozer is oriented more towards the server/HPC market rather than the home market. That being said, BD single-thread performance should be "fast enough". Modern processors are more than powerful enough for web browsing, games are almost always GPU-bottlenecked at resolutions people actually play at, and many general productivity operations are IO-bound (unless you have an SSD).

Operations that require serious computational power are increasingly being thrown at the GPU, or can be multithreaded.

Oh, and as JFAMD has said pretty consistently, IPC increases! Even a Deneb X8 @ 3.5ghz would be a formidable chip. If BD manages to increase IPC by ~ 10%, and can operate around 3.5ghz to 4ghz, it should pretty much be a winner in throughput. I expect if nothing else, AMD has a server and HPC winner on their hands. I don't think the mainstream market actually cares. Look at the excitement generated by Llano (although I haven't heard about any design wins yet), we know it won't be a star in singlethreaded non-GPU workloads. But mainstream users just don't care...
 

StinkyPinky

Diamond Member
Jul 6, 2002
6,977
1,276
126
Do you play any of the games above? Or plan on playing any of the newer released games? 4-thread capable CPUs might not be enough anymore if you want the fastest gaming experience you can get.

In that sense it makes sense that Bulldozer comes in 8 core versions.

Come on. Mass Effect 2 ran perfectly fine on my old Q6600. Gothic 4??? My Q6600 played that on max settings. What else do you need from it? 400 fps? Same as Dragon Age.

Civ 5 may be interesting as that is a CPU bound game. I haven't played it on my 2500k but it killed my Q6600 near the end game.

A great way to benchmark would be to use a save game with 12 civs and see how long a turn takes on different CPUs. It would be a good real-world comparison.
 

gevorg

Diamond Member
Nov 3, 2004
5,070
1
0
Bulldozer will most likely pwn Sandy Bridge in multi-threaded apps/games, i.e. it is more future proof. If it can achieve this at lower price points, it will be an epic comeback for AMD. But don't get too excited since Ivy Bridge will be released within months after Bulldozer. Hope AMD can keep up. :)
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
Man these are some high expectations you guys are setting for BD....

No kidding.

gevorg said:
But don't get too excited since Ivy Bridge will be released within months after Bulldozer. Hope AMD can keep up. :)

Forget IB. When SB-E (s2011) comes out with an 8 core chip, we will know more.
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
We won't know till we have pricing.

I can guarantee that AMD isn't going to cut their own throat on pricing. They will price these as high as they feel they can get away with - as they should.
 

drizek

Golden Member
Jul 7, 2005
1,410
0
71
s2011 will not be price competitive with Bulldozer. I'm sure Intel can make a really fast 8c/16t chip, but with the CPU at a thousand bucks and a motherboard at $300, who cares?
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Do you play any of the games above? Or plan on playing any of the newer released games? 4-thread capable CPUs might not be enough anymore if you want the fastest gaming experience you can get.

In that sense it makes sense that Bulldozer comes in 8 core versions.

It's not just number of threads, it's thread dependencies that matter. I'd bet that most if not all of the games you listed have 1 thread that is cpu bound, and all the other threads basically wait for that thread.

In other words, in highly threaded games IPC still matters.
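
Here is a toy model of what I mean, purely as a sketch. All the millisecond figures are made up for illustration, and it assumes at least 2 cores, one main thread that the frame has to wait for, and identical worker tasks that share the remaining cores.

```python
def frame_time_ms(main_thread_ms, worker_ms_each, workers, cores):
    """Frame time when workers run in parallel but the frame waits on the slowest path."""
    # Assumption: the main thread owns one core and workers share the rest (cores >= 2).
    free_cores = max(cores - 1, 1)
    rounds = -(-workers // free_cores)  # ceiling division: batches of worker tasks
    return max(main_thread_ms, rounds * worker_ms_each)


# Hypothetical game: 16 ms of main-thread work plus 6 worker tasks of 4 ms each.
for cores in (4, 8):
    print(f"{cores} cores: {frame_time_ms(16, 4, 6, cores)} ms per frame")

# Same game on 4 cores where the main thread finishes in 12 ms instead of 16 ms.
print(f"4 faster cores: {frame_time_ms(12, 4, 6, 4)} ms per frame")
```

In this model going from 4 to 8 cores changes nothing because the frame is still waiting on the main thread, while speeding up that one thread shortens every frame.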
 

StinkyPinky

Diamond Member
Jul 6, 2002
6,977
1,276
126
s2011 will not be price competitive with Bulldozer. I'm sure Intel can make a really fast 8c/16t chip, but with the CPU at a thousand bucks and a motherboard at $300, who cares?

Pretty sure Ivy Bridge is going to have 6c/12t cpus at a good price range. And being 22nm it should clock like a bitch.

If BD is as good as everyone is saying then I suggest IB will be bumped up by a couple of months. I'll be waiting before I return my p67 board for a refund just so I know which sandy bridge mobos support IB.
 

drizek

Golden Member
Jul 7, 2005
1,410
0
71
Pretty sure Ivy Bridge is going to have 6c/12t cpus at a good price range. And being 22nm it should clock like a bitch.

If BD is as good as everyone is saying then I suggest IB will be bumped up by a couple of months. I'll be waiting before I return my p67 board for a refund just so I know which sandy bridge mobos support IB.

I was referring to SB-E, not IB.

IntelUser says that Bulldozer is as good as Westmere, which means it could be not too far off from IB, depending on how aggressive both companies are with clocks.