Fudzilla: Bulldozer performance figures are in

Status
Not open for further replies.

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
The main thing I worry about with these first BDs is that it will be difficult to push them beyond their rated TDPs. I'll be limited by my cooling to 150-170W but my board should be able to handle a ~200W OCed chip if I get a cooler to match.
 

cantholdanymore

Senior member
Mar 20, 2011
447
0
76
From AMD's Twitter, 12 min ago:

AMD_Unprocessed AMD



Did you hear? We broke a Guinness World Record for the “Highest Frequency of a Computer Processor”, with our 8 core FX desktop processor!
 

cantholdanymore

Senior member
Mar 20, 2011
447
0
76
You're right, too many BD threads; but I thought this was the successor of the successor of the thread that was blocked that was the original thread that...
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
Yay, they broke a record with a CPU no consumer can buy!!
Even if one could, a suicide run on 1 module with LN2 isn't especially useful for anything but bragging rights.

Also if that had any meaning at all, it'd mean that so far the old pentiums were the fastest CPUs around :p
 

garagisti

Senior member
Aug 7, 2007
592
7
81
I had hoped that this already ended when learning about BD's IPC at Hot Chips 2010 and ISSCC 2011 (?constant IPC?).

All falling on deaf ears Matt. :D JF cried himself hoarse to not much result really. People are still saying on forums that IPC is same or worse than Phenom II, which is bemusing really.

Anyways, what will be your concise opinion of IPC improvements in percentage over Phenom II, unless they've found some other TLB bug... I'd tend to opine about 10-20% in most scenarios unless using fancy coding where gains could be more.
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
I had hoped that this already ended when learning about BD's IPC at Hot Chips 2010 and ISSCC 2011 (?constant IPC?).

I don't understand your point.

There is non-linear scaling with overclocking, but SB's IPC at 4.7 GHz is still around 40% higher than Phenom II's at 4.7 GHz, just as it is 40% higher at 3.3 GHz.

This 'attack' on IPC has become a recent phenomenon from AMD's side. Should we revisit AMD's historical roots, when their IPC-superior CPUs were actually good?

Athlon XP+
Athlon 64
Athlon X2 / FX

It's interesting how AMD keeps dismissing IPC as irrelevant in the last 5 years given that only a handful of code exists that uses 6-8 threads. I find it very ironic because in the last 10 years a CPU with superior IPC has shown to be superior most of the time.

Not only that: with 50% higher IPC, a 200 MHz overclock, for instance, is worth a 300 MHz overclock on the slower chip.

When both CPUs are made on 32nm process, then BD can't overcome IPC disadvantage through overclocking either.

It's sad to see that everything Athlon 64 stood for is what SB is today, while AMD went backwards towards Pentium 4 era of throwing more "specs" (cores) to try to beat efficiency. It's even more ironic considering AMD's GPU division is doing the exact opposite of their CPU division.

Also, isn't it better to get 40-50% faster performance per clock so that you can reduce power consumption since you won't have to clock your processor's frequency as high? Isn't this what gave Athlon 64 the edge over P4?
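The clock-speed arithmetic in this post can be checked with a quick sketch. It assumes the usual first-order model in which performance is proportional to IPC times frequency. The function names and the normalized IPC values (1.0 for the baseline chip, 1.5 for a chip with 50% higher IPC) are illustrative assumptions, not measured figures.

```python
def equivalent_overclock_mhz(overclock_mhz: float, ipc_fast: float, ipc_slow: float) -> float:
    """Clock increase the slower-IPC chip needs in order to match the gain
    from `overclock_mhz` on the faster-IPC chip (performance ~ IPC x clock)."""
    return overclock_mhz * ipc_fast / ipc_slow


def clock_for_equal_performance(freq_mhz: float, ipc_slow: float, ipc_fast: float) -> float:
    """Clock at which the faster-IPC chip matches the slower-IPC chip
    running at `freq_mhz`."""
    return freq_mhz * ipc_slow / ipc_fast


if __name__ == "__main__":
    # Normalized, hypothetical IPC values: 1.0 baseline, 1.5 for "50% higher IPC".
    print(equivalent_overclock_mhz(200, ipc_fast=1.5, ipc_slow=1.0))
    print(clock_for_equal_performance(4000, ipc_slow=1.0, ipc_fast=1.5))
```

Under this model a chip with 50% higher IPC needs only about two thirds of the clock for the same performance, which is the power argument being made here. Real workloads are also limited by memory and I/O, so actual scaling will be weaker than this.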
 
Last edited:

Arzachel

Senior member
Apr 7, 2011
903
76
91
I don't understand your point.

There is non-linear scaling with overclocking, but SB's IPC at 4.7 GHz is still around 40% higher than Phenom II's at 4.7 GHz, just as it is 40% higher at 3.3 GHz.

This 'attack' on IPC has become a recent phenomenon from AMD's side. Should we revisit AMD's historical roots, when their IPC-superior CPUs were actually good?

Athlon XP+
Athlon 64
Athlon X2 / FX

It's interesting how AMD keeps dismissing IPC as irrelevant in the last 5 years given that only a handful of code exists that uses 6-8 threads. I find it very ironic because in the last 10 years a CPU with superior IPC has shown to be superior most of the time.

Not only that: with 50% higher IPC, a 200 MHz overclock, for instance, is worth a 300 MHz overclock on the slower chip.

When both CPUs are made on 32nm process, then BD can't overcome IPC disadvantage through overclocking either.

It's sad to see that everything Athlon 64 stood for is what SB is today, while AMD went backwards towards Pentium 4 era of throwing more "specs" (cores) to try to beat efficiency. It's even more ironic considering AMD's GPU division is doing the exact opposite of their CPU division.

Diminishing returns. This is the thing that you seem to be unable to grasp.

Everything is a tradeoff and the more you focus on one specific aspect that contributes to the overall performance, the more costly it becomes to push it further. Intel can afford it, because otherwise they would drown in all the money they have, but AMD has no such luxury.
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
All falling on deaf ears Matt. :D JF cried himself hoarse to not much result really. People are still saying on forums that IPC is same or worse than Phenom II, which is bemusing really.
Well on forums and also from official people at AT, to cite:

JarredWalton said:
I suspect that clock for clock, a single BD core will be slower than current K10.5 stuff, but you'll get more cores


But the point was that the interesting number isn't the 8ghz in a suicide run, but the 4.8ghz with the watercooling setup, because that should basically be about the upper boundary one should expect for homeuse (if those were final chips) - assuming that one could run that setup stable (heaven benchmark that stresses 2-4cores isn't what I'd call a guarantee for that) for some time without killing/degrading the chip.
 
Last edited:

intangir

Member
Jun 13, 2005
113
0
76
All falling on deaf ears Matt. :D JF cried himself hoarse to not much result really. People are still saying on forums that IPC is same or worse than Phenom II, which is bemusing really.

Anyways, what will be your concise opinion of IPC improvements in percentage over Phenom II, unless they've found some other TLB bug... I'd tend to opine about 10-20% in most scenarios unless using fancy coding where gains could be more.

No, he's saying that AMD already came out and said that the project goals for Bulldozer were 25%+ higher frequency with the same IPC as previous AMD chips. So all this talk of Nehalem-like IPC should have stopped in February.

http://ieeexplore.ieee.org/Xplore/l...746228.pdf?arnumber=5746228&authDecision=-203

Compared to previous AMD x86-64 cores, project goals reduce the number of F04 inverter delays per cycle by more than 20%, while maintaining constant IPC, to achieve higher frequency and performance in the same power envelope, even with increased core counts.
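For anyone wondering how "more than 20% fewer FO4 delays per cycle" turns into "25%+ higher frequency", here is a minimal sketch. It assumes cycle time is proportional to the number of FO4 inverter delays per cycle and that the delay of a single FO4 inverter is unchanged (same process), which is a simplification. The function name is made up for illustration.

```python
def frequency_gain_from_fo4_reduction(fo4_reduction: float) -> float:
    """Fractional clock gain when the FO4 delays per cycle shrink by
    `fo4_reduction` (0.20 means 20% fewer delays per cycle).

    Assumes cycle time scales linearly with FO4 delays per cycle and that
    the delay of one FO4 inverter stays the same.
    """
    if not 0.0 <= fo4_reduction < 1.0:
        raise ValueError("fo4_reduction must be in [0, 1)")
    # New cycle time = (1 - reduction) x old cycle time, and frequency = 1 / cycle time.
    return 1.0 / (1.0 - fo4_reduction) - 1.0


if __name__ == "__main__":
    gain = frequency_gain_from_fo4_reduction(0.20)
    print(f"{gain:.0%} higher frequency")
```

A 20% shorter cycle gives 1 / 0.8 = 1.25 times the clock, so "more than 20%" fewer delays lines up with the "25%+" figure.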
 
Last edited:

inf64

Diamond Member
Mar 11, 2011
3,884
4,692
136
Well on forums and also from official people at AT, to cite:




But the point was that the interesting number isn't the 8ghz in a suicide run, but the 4.8ghz with the watercooling setup, because that should basically be about the upper boundary one should expect for homeuse (if those were final chips) - assuming that one could run that setup stable (heaven benchmark that stresses 2-4cores isn't what I'd call a guarantee for that) for some time without killing/degrading the chip.
You forgot to quote last paragraph from Jarred's post :
AMD might still pull this off, but considering the lack of benchmark information I remain skeptical. (We saw running K8 long before launch way back in the day -- http://www.anandtech.com/show/883 -- and I seem to recall benches getting leaked at least several months before launch on some sites.) Just my feelings on the subject right now, as someone that hasn't seen any actual real data on BD performance -- leaked or otherwise.
So he has no idea about how Bulldozer performs and he admits this...

No, he's saying that AMD already came out and said that the project goals for Bulldozer were 20% higher frequency with the same IPC as previous AMD chips. So all this talk of Nehalem-like IPC should have stopped in February.

http://ieeexplore.ieee.org/Xplore/l...746228.pdf?arnumber=5746228&authDecision=-203

I think that the quote refers to the threaded (throughput) performance of Bulldozer. So you get the same IPC as Phenom/Magny-Cours in a fully loaded Bulldozer-based chip. But you still get more cores (33% more, to be exact) and therefore your throughput goes up. Then you have more clock speed headroom and your performance goes up again.
Also note that AMD states 80% of CMP design. I assume this refers (as JF-AMD stated in his XS post) to 1.8x performance over a single core running in a module. So you have constant IPC (or ~Phenom II) with a fully loaded module, which carries the sharing penalty. A single thread will therefore perform better, and this is before Turbo kicks in. How much better? From 0% to 11%, probably varying a lot depending on the micro-benchmark.
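A small sketch of where the 11% and 33% figures plausibly come from. It assumes the 1.8x number means two threads in one module deliver 1.8 times the throughput of one thread running alone in that module, and that "33% more cores" compares 8 cores against a 6-core Phenom II. Both are readings of the post, not AMD statements, and the function names are illustrative.

```python
def per_thread_speed_when_shared(module_scaling: float) -> float:
    """Per-thread throughput with both cores of a module busy, relative to
    one thread running alone in the module."""
    return module_scaling / 2.0


def single_thread_uplift(module_scaling: float) -> float:
    """How much faster a lone thread runs compared with a thread that
    shares the module, as a fraction."""
    return 1.0 / per_thread_speed_when_shared(module_scaling) - 1.0


def core_count_increase(new_cores: int, old_cores: int) -> float:
    """Fractional increase in core count."""
    return new_cores / old_cores - 1.0


if __name__ == "__main__":
    print(f"per-thread speed when sharing: {per_thread_speed_when_shared(1.8):.2f}")
    print(f"single-thread uplift: {single_thread_uplift(1.8):.1%}")
    print(f"more cores (8 vs 6): {core_count_increase(8, 6):.1%}")
```

With 1.8x scaling each shared thread runs at 0.9 of its solo speed, so a lone thread is at most 1 / 0.9, or about 11%, faster. That is the upper end of the 0% to 11% range.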
 

garagisti

Senior member
Aug 7, 2007
592
7
81
Originally Posted by JarredWalton
I suspect that clock for clock, a single BD core will be slower than current K10.5 stuff, but you'll get more cores
He suspects... What does that tell you? He hasn't signed any NDA, and has no figures or chips. He may as well claim to know the winner of the 2012 election. In short, it amounts to plain speculation, and nothing more. Certainly not anything worthwhile. I tend to think JF, who works for AMD and has access to both chips and engineers, would know more about the processor than he does.

But the point was that the interesting number isn't the 8ghz in a suicide run, but the 4.8ghz with the watercooling setup, because that should basically be about the upper boundary one should expect for homeuse (if those were final chips) - assuming that one could run that setup stable (heaven benchmark that stresses 2-4cores isn't what I'd call a guarantee for that) for some time without killing/degrading the chip.
In many tests at [H], a good air cooling setup was found to be as good as or better than most ready-to-go water cooling setups available. They used an Antec solution, which someone on another thread mentioned is only as good as a Noctua NH-D14. If you didn't notice, with a TDP of 125W (AMD's TDP ≠ Intel's TDP), they already hit 4+ GHz on turbo. So yes, you could potentially have all 8 cores running at about 4-4.5 GHz 24/7 (with a good air-cooling setup) in most cases, unless you get a bum chip. In the best-case scenario, they're clocking it at a max of 5.5 GHz, which I don't think many would want to run even their SB at, so complaining is moot.
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
No, he's saying that AMD already came out and said that the project goals for Bulldozer were 25%+ higher frequency with the same IPC as previous AMD chips. So all this talk of Nehalem-like IPC should have stopped in February.

http://ieeexplore.ieee.org/Xplore/l...746228.pdf?arnumber=5746228&authDecision=-203

This quote is misunderstood :

Compared to previous AMD x86-64 cores, project goals reduce the number of F04 inverter delays per cycle by more than 20%, while maintaining constant IPC, to achieve higher frequency and performance in the same power envelope, even with increased core counts.

When they say constant IPC, it's not with respect to the previous uarch,
but in reference to BD being designed to achieve a constant IPC when
running a task, without periods of collapsing IPC rate.
 

intangir

Member
Jun 13, 2005
113
0
76
This quote is misunderstood :

When they say constant IPC, it's not with respect to the previous uarch,
but in reference to BD being designed to achieve a constant IPC when
running a task, without periods of collapsing IPC rate.

If you read the paper, it's obvious that it's comparing two microarchitectures. But you believe what you want to believe, I suppose.

No microarchitect would design a chip to have IPC insensitive to workload. It's not a measure with any real-world value.
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
Diminishing returns. This is the thing that you seem to be unable to grasp.

Everything is a tradeoff and the more you focus on one specific aspect that contributes to the overall performance, the more costly it becomes to push it further. Intel can afford it, because otherwise they would drown in all the money they have, but AMD has no such luxury.

How am I unable to grasp the point of diminishing returns? I am talking about the overall foundation behind a CPU architecture. If you want to improve performance per clock per watt, you add things to the processor that improve its throughput efficiency per clock. I believe Intel has a rule where they'll add something to the CPU architecture only if it adds 2% more performance for a 1% increase in power. They will leave something on the table if it gets only 1% more performance for 1% more power.

You get diminishing returns whether you focus on more cores or on adding everything you can to increase IPC. That part is obvious and I understand it completely. It appears to me that AMD focused almost entirely on adding more cores during a time when most PC code doesn't scale beyond 4 threads.

They simply mistimed the adoption of multi-threaded code by what appears to be a full generation. When they sat down in the boardroom and discussed which direction their next CPU should go to, they all voted multi-core, not a balanced approach. It is now starting to smell like they have slow 6-8 core CPUs in a marketplace where quad core CPUs won't become mainstream until 2014-2015. In other words, most consumers would rather purchase fast and power efficient dual and quad core CPUs and here AMD wants to sell them 6-8 core CPUs. The market/consumer trends are working completely against AMD's current CPU strategy.

http://www.xbitlabs.com/news/cpu/di...re_Quad_Core_Processors_in_2015_Analysts.html

My personal opinion is that in the PC period from 2011 - 2013, AMD would have been FAR better off releasing a 20-30% faster IPC 4-6 core Phenom II style processor than a 6-8 core Bulldozer style processor with barely any increase in IPC over Phenom II (as a result of the 10-20% penalty incurred due to module design).

If you have a processor with superior IPC, you don't need 4.0ghz clock speeds. As such:

1) You need less voltage to scale higher frequencies, and thus you have lower power consumption;
2) You likely get improved yields since it's harder to get 8 working cores at 4.0ghz vs. 8 cores at 3.0ghz
3) Your CPU line refreshes need only 100-200mhz clock speed increase not 200-300 mhz increases to get similar performance gains. This puts less pressure in the next 3-4 quarters on achieving higher performance through even higher frequencies (because achieving higher yields at 4.5ghz is even harder).
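Point 1 leans on the standard first-order relation for CMOS dynamic power, P ~ C x V^2 x f. A rough sketch follows. The voltages and clocks are hypothetical values picked for illustration, leakage power is ignored, and the function name is made up.

```python
def relative_dynamic_power(voltage: float, freq_ghz: float,
                           ref_voltage: float, ref_freq_ghz: float) -> float:
    """Dynamic power relative to a reference operating point, using
    P ~ C * V^2 * f with the same switched capacitance C for both."""
    return (voltage / ref_voltage) ** 2 * (freq_ghz / ref_freq_ghz)


if __name__ == "__main__":
    # Hypothetical operating points: 3.0 GHz at 1.20 V versus 4.0 GHz at 1.35 V.
    ratio = relative_dynamic_power(1.20, 3.0, ref_voltage=1.35, ref_freq_ghz=4.0)
    print(f"dynamic power at the lower clock: {ratio:.0%} of the higher-clock figure")
```

Because voltage enters squared, a chip that reaches the same performance at a lower clock and lower voltage saves more power than the clock reduction alone would suggest.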

I just don't understand their strategy this round unless they decided to focus on servers and workstation users; and use marketing to sell more cores to avg. user who may think that their cores are not any worse than Intel's.
 
Last edited:

Arzachel

Senior member
Apr 7, 2011
903
76
91
I'm quite sure "constant" in this context doesn't mean "in line with PII". Instead, it probably means "not fluctuating".

Reminds me of the Ye Olden BD thread where someone (JFAMD maybe?) explained how BD should have higher integer performance than PII despite having fewer execution units per core, because usually only 1 of the 3 gets used per clock. BD should use both most of the time.

This is all from memory, so feel free to correct me if I remember wrong.
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
If you read the paper, it's obvious that it's comparing two microarchitectures. But you believe what you want to believe, I suppose.

No microarchitect would design a chip to have IPC insensitive to workload. It's not a measure with any real-world value.

Not at all...
A chip that has a peak IPC of 2 but can sustain an average IPC of 1.5
will be better than a chip that is capable of a peak IPC of 3 but that will
do only an average of 1 with sustained loading...
 

intangir

Member
Jun 13, 2005
113
0
76
Not at all...
A chip that has a peak IPC of 2 but can sustain an average IPC of 1.5
will be better than a chip that is capable of a peak IPC of 3 but that will
do only an average of 1 with sustained loading...

Yes, but the proper measure of that is average IPC over many workloads. They wouldn't use IPC variability as a metric (how would you measure that? standard deviation?), because no buyer actually cares about that.

The only sensible interpretation of "constant" is "the same with respect to previous architectures".
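To make the "average versus variability" distinction concrete, here is a sketch using Python's statistics module. The per-workload IPC samples are invented purely so that the peaks and averages match the two chips in the quoted example. They are not measurements of any real CPU.

```python
from statistics import mean, pstdev

# Hypothetical per-workload IPC samples (illustration only).
chip_a = [2.0, 1.5, 1.5, 1.0]    # peak 2, average 1.5
chip_b = [3.0, 0.5, 0.25, 0.25]  # peak 3, average 1.0


def summarize(samples: list[float]) -> dict[str, float]:
    """Peak, average and spread (population standard deviation) of IPC
    across a set of workloads."""
    return {
        "peak": max(samples),
        "average": mean(samples),
        "spread": pstdev(samples),
    }


if __name__ == "__main__":
    for name, samples in (("chip A", chip_a), ("chip B", chip_b)):
        print(name, summarize(samples))
```

The average is what a benchmark suite ends up reporting. The spread is the number that would have to be quoted if "constant" really meant "not fluctuating".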
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
Yes, but the proper measure of that is average IPC over many workloads. They wouldn't use IPC variability as a metric (how would you measure that? standard deviation?), because no buyer actually cares about that.

The only sensible interpretation of "constant" is "the same with respect to previous architectures".

I think a lot of server chip buyers care about relative IPC consistency. It's not a strict numerical metric but a description of behavior, i.e. we aren't designing to look good in current benchmark software. Can't be certain what was meant by "constant" though without more context or a direct clarification from AMD.
 

intangir

Member
Jun 13, 2005
113
0
76
I think a lot of server chip buyers care about relative IPC consistency. It's not a strict numerical metric but a description of behavior, i.e. we aren't designing to look good in current benchmark software. Can't be certain what was meant by "constant" though without more context or a direct clarification from AMD.

No, no they don't. They only care about worst-case latencies, which are fundamentally limited by single-threaded performance.

But like I said, people will believe what they want to believe. Just ask Dresdenboy how he interprets that statement. See what he says. I've said my piece, and saying more would just get me frustrated.
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
Yes, but the proper measure of that is average IPC over many workloads. They wouldn't use IPC variability as a metric (how would you measure that? standard deviation?), because no buyer actually cares about that.

The only sensible interpretation of "constant" is "the same with respect to previous architectures".

He's not comparing two uarchs; the term "constant" is a property of the
design he is talking about.

Average IPC will be the one inherently measured by benchmarks...
 