Fudzilla: Bulldozer performance figures are in

Vesku · Sep 12, 2011

The main thing I worry about with these first BDs is that it will be difficult to push them beyond their rated TDPs. I'll be limited by my cooling to 150-170W but my board should be able to handle a ~200W OCed chip if I get a cooler to match.

cantholdanymore · Sep 13, 2011

From AMD's Twitter 12 min ago

AMD_Unprocessed AMD

Did you hear? We broke a Guinness World Record for the “Highest Frequency of a Computer Processor”, with our 8 core FX desktop processor!

BlueBlazer · Sep 13, 2011

cantholdanymore said:
From AMD's Twitter 12 min ago

AMD_Unprocessed AMD

Did you hear? We broke a Guinness World Record for the Highest Frequency of a Computer Processor, with our 8 core FX desktop processor!

Where were you? Too late... Already posted at two other threads (you can find them easily).

cantholdanymore · Sep 13, 2011

You're right, too many BD threads; but I thought this was the successor of the successor of the thread that was blocked that was the original thread that...

Edrick · Sep 13, 2011

cantholdanymore said:
Did you hear? We broke a Guinness World Record for the “Highest Frequency of a Computer Processor”, with our 8 core FX desktop processor!

Yay, they broke a record with a CPU no consumer can buy!!

Voo · Sep 13, 2011

Edrick said:
Yay, they broke a record with a CPU no consumer can buy!!

Even if one could - a suicide run on 1 module with LN2 isn't especially useful for anything but bragging rights.

Also, if that had any meaning at all, it'd mean that so far the old Pentiums were the fastest CPUs around.

Dresdenboy · Sep 13, 2011

Voo said:
Even if one could - a suicide run on 1 module with LN2 isn't especially useful for anything but bragging rights.

Also, if that had any meaning at all, it'd mean that so far the old Pentiums were the fastest CPUs around.

And where is yours?

Dresdenboy · Sep 13, 2011

RussianSensation said:
That should pretty much end the argument of Nehalem-like IPC.

I had hoped that this already ended when learning about BD's IPC at Hot Chips 2010 and ISSCC 2011 ("constant IPC").

garagisti · Sep 13, 2011

Dresdenboy said:
I had hoped that this already ended when learning about BD's IPC at Hot Chips 2010 and ISSCC 2011 ("constant IPC").

All falling on deaf ears Matt.

JF cried himself hoarse to not much result really. People are still saying on forums that IPC is same or worse than Phenom II, which is bemusing really.

Anyways, what will be your concise opinion of IPC improvements in percentage over Phenom II, unless they've found some other TLB bug... I'd tend to opine about 10-20% in most scenarios unless using fancy coding where gains could be more.

RussianSensation · Sep 13, 2011

Dresdenboy said:
I had hoped that this already ended when learning about BD's IPC at Hot Chips 2010 and ISSCC 2011 ("constant IPC").

I don't understand your point.

There is non-linear scaling with overclocking, but the IPC of SB at 4.7ghz is still around 40% higher than Phenom II's at 4.7ghz, just as it is 40% higher at 3.3ghz.

This 'attack' on IPC has become a recent phenomenon from AMD's side. Should we revisit AMD's historical roots, when their CPUs with superior IPC were actually good?

Athlon XP+
Athlon 64
Athlon X2 / FX

It's interesting how AMD has kept dismissing IPC as irrelevant over the last 5 years, given that only a handful of programs exist that use 6-8 threads. I find it very ironic, because in the last 10 years the CPU with superior IPC has proven to be superior most of the time.

Not only that: with 50% higher IPC, for instance, a 200mhz overclock is worth as much as a 300mhz overclock on the slower chip.

When both CPUs are made on 32nm process, then BD can't overcome IPC disadvantage through overclocking either.

It's sad to see that everything Athlon 64 stood for is what SB is today, while AMD went backwards towards Pentium 4 era of throwing more "specs" (cores) to try to beat efficiency. It's even more ironic considering AMD's GPU division is doing the exact opposite of their CPU division.

Also, isn't it better to get 40-50% faster performance per clock so that you can reduce power consumption since you won't have to clock your processor's frequency as high? Isn't this what gave Athlon 64 the edge over P4?
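
For what it's worth, the arithmetic behind the "200mhz = 300mhz" point can be sketched like this. It assumes single-threaded performance is simply IPC times clock, which ignores memory effects and the non-linear scaling mentioned above. The function names and the 1.5 ratio are illustrative assumptions, not measured figures.

```python
def relative_performance(ipc, clock_mhz):
    """Simplified model: performance is proportional to IPC times clock."""
    return ipc * clock_mhz


def equivalent_clock_gain_mhz(clock_gain_mhz, ipc_ratio):
    """Clock gain the lower-IPC chip needs to match a clock gain on the
    higher-IPC chip, where ipc_ratio = higher IPC / lower IPC."""
    return clock_gain_mhz * ipc_ratio


if __name__ == "__main__":
    # Hypothetical 50% IPC advantage: +200 MHz on the faster-IPC chip
    # is worth the same as this many MHz on the slower one.
    print(equivalent_clock_gain_mhz(200, 1.5))  # 300.0
```

Under this model a fixed IPC ratio gives the same relative advantage at any shared clock, which is the 3.3ghz vs. 4.7ghz point.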

Arzachel · Sep 13, 2011

RussianSensation said:
I don't understand your point.

There is non-linear scaling with overclocking, but the IPC of SB at 4.7ghz is still around 40% higher than Phenom II's at 4.7ghz, just as it is 40% higher at 3.3ghz.

This 'attack' on IPC has become a recent phenomenon from AMD's side. Should we revisit AMD's historical roots, when their CPUs with superior IPC were actually good?

Athlon XP+
Athlon 64
Athlon X2 / FX

It's interesting how AMD has kept dismissing IPC as irrelevant over the last 5 years, given that only a handful of programs exist that use 6-8 threads. I find it very ironic, because in the last 10 years the CPU with superior IPC has proven to be superior most of the time.

Not only that: with 50% higher IPC, for instance, a 200mhz overclock is worth as much as a 300mhz overclock on the slower chip.

When both CPUs are made on 32nm process, then BD can't overcome IPC disadvantage through overclocking either.

It's sad to see that everything Athlon 64 stood for is what SB is today, while AMD went backwards towards Pentium 4 era of throwing more "specs" (cores) to try to beat efficiency. It's even more ironic considering AMD's GPU division is doing the exact opposite of their CPU division.

Diminishing returns. This is the thing that you seem to be unable to grasp.

Everything is a tradeoff and the more you focus on one specific aspect that contributes to the overall performance, the more costly it becomes to push it further. Intel can afford it, because otherwise they would drown in all the money they have, but AMD has no such luxury.

Voo · Sep 13, 2011

garagisti said:
All falling on deaf ears Matt. JF cried himself hoarse to not much result really. People are still saying on forums that IPC is same or worse than Phenom II, which is bemusing really.

Well on forums and also from official people at AT, to cite:

JarredWalton said:
I suspect that clock for clock, a single BD core will be slower than current K10.5 stuff, but you'll get more cores

But the point was that the interesting number isn't the 8ghz in a suicide run, but the 4.8ghz with the watercooling setup, because that should basically be about the upper boundary one should expect for home use (if those were final chips) - assuming that one could run that setup stable (the Heaven benchmark, which stresses 2-4 cores, isn't what I'd call a guarantee for that) for some time without killing/degrading the chip.

intangir · Sep 13, 2011

garagisti said:
All falling on deaf ears Matt. JF cried himself hoarse to not much result really. People are still saying on forums that IPC is same or worse than Phenom II, which is bemusing really.

Anyways, what will be your concise opinion of IPC improvements in percentage over Phenom II, unless they've found some other TLB bug... I'd tend to opine about 10-20% in most scenarios unless using fancy coding where gains could be more.

No, he's saying that AMD already came out and said that the project goals for Bulldozer were 25%+ higher frequency with the same IPC as previous AMD chips. So all this talk of Nehalem-like IPC should have stopped in February.

http://ieeexplore.ieee.org/Xplore/l...746228.pdf?arnumber=5746228&authDecision=-203

Compared to previous AMD x86-64 cores, project goals reduce the number of FO4 inverter delays per cycle by more than 20%, while maintaining constant IPC, to achieve higher frequency and performance in the same power envelope, even with increased core counts.

inf64 · Sep 13, 2011

Voo said:
Well on forums and also from official people at AT, to cite:

But the point was that the interesting number isn't the 8ghz in a suicide run, but the 4.8ghz with the watercooling setup, because that should basically be about the upper boundary one should expect for home use (if those were final chips) - assuming that one could run that setup stable (the Heaven benchmark, which stresses 2-4 cores, isn't what I'd call a guarantee for that) for some time without killing/degrading the chip.

You forgot to quote the last paragraph from Jarred's post:

AMD might still pull this off, but considering the lack of benchmark information I remain skeptical. (We saw running K8 long before launch way back in the day -- http://www.anandtech.com/show/883 -- and I seem to recall benches getting leaked at least several months before launch on some sites.) Just my feelings on the subject right now, as someone that hasn't seen any actual real data on BD performance -- leaked or otherwise.

So he has no idea about how Bulldozer performs and he admits this...

intangir said:
No, he's saying that AMD already came out and said that the project goals for Bulldozer were 20% higher frequency with the same IPC as previous AMD chips. So all this talk of Nehalem-like IPC should have stopped in February.

http://ieeexplore.ieee.org/Xplore/l...746228.pdf?arnumber=5746228&authDecision=-203

I think that the quote refers to the threaded (throughput) performance of Bulldozer. So you get the same IPC as Phenom/Magny-Cours in a fully loaded Bulldozer-based chip. But you still get more cores (33% more, to be exact) and therefore your throughput goes up. Then you have more clock speed headroom and your performance goes up again.
Also note that AMD states 80% of a CMP design. I assume this refers (as JF-AMD stated in his XS post) to 1.8x the performance of a single core running in the module. So you have constant IPC (or ~ Phenom II) with a fully loaded module, which carries the sharing penalty. A single thread will therefore perform better, and this is before Turbo kicks in. By how much? From 0% to 11%, probably varying a lot depending on the micro benchmark.
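
The 0% to 11% range can be reproduced with a back-of-the-envelope sketch. It assumes (and neither number is confirmed) that a fully loaded module delivers 1.8x the throughput of one thread running alone in it, and that per-core IPC in a loaded module equals Phenom II's. The function names are made up for illustration.

```python
def single_thread_ipc_gain(module_scaling=1.8, loaded_ipc_vs_phenom2=1.0):
    """Upper bound on single-thread IPC gain over Phenom II.

    module_scaling: throughput of two threads in a module relative to one
    thread running alone (assumed 1.8, i.e. "80% of CMP").
    loaded_ipc_vs_phenom2: per-core IPC in a loaded module relative to
    Phenom II (assumed 1.0, i.e. "constant IPC").
    """
    per_core_share = module_scaling / 2.0  # each loaded core runs at 0.9x
    return loaded_ipc_vs_phenom2 / per_core_share - 1.0


def extra_cores_fraction(new_cores=8, old_cores=6):
    """Fractional increase in core count, e.g. 8 cores vs. 6."""
    return new_cores / old_cores - 1.0


if __name__ == "__main__":
    print(f"single-thread IPC gain: {single_thread_ipc_gain():.1%}")
    print(f"extra cores: {extra_cores_fraction():.1%}")
```

With the defaults this gives roughly 11% for a lone thread and 33% more cores; if the sharing penalty turns out smaller than assumed, the single-thread gain shrinks toward 0%.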

garagisti · Sep 13, 2011

Voo said:
Originally Posted by JarredWalton
I suspect that clock for clock, a single BD core will be slower than current K10.5 stuff, but you'll get more cores

He suspects... What does that tell you? He hasn't signed any NDA, and has no figures or chips. He may as well suspect he knows the winner of the 2012 election. In short, it amounts to plain speculation and nothing more. Certainly not anything worthwhile. I'd tend to think JF, who works for AMD and has access to both chips and engineers, would know more than him about the processor.

Voo said:
But the point was that the interesting number isn't the 8ghz in a suicide run, but the 4.8ghz with the watercooling setup, because that should basically be about the upper boundary one should expect for home use (if those were final chips) - assuming that one could run that setup stable (the Heaven benchmark, which stresses 2-4 cores, isn't what I'd call a guarantee for that) for some time without killing/degrading the chip.

In many tests at [H], a good air cooling setup was found to be as good as or better than most ready-to-go water cooling setups available. They used an Antec solution, which someone on another thread mentioned is only as good as a Noctua NH-D14. If you didn't notice, with a TDP of 125W (AMD's TDP /= Intel's TDP), they already hit 4+ GHz on turbo. So yes, you could potentially have all 8 cores running at about 4-4.5 GHz 24/7 (with a good air-cooling setup) in most cases, unless you get a bum chip. In the best-case scenario, they're clocking it at a max of 5.5 GHz, which I don't think many would want to run even their SB at, so complaining is moot.

Abwx · Sep 13, 2011

intangir said:
No, he's saying that AMD already came out and said that the project goals for Bulldozer were 25%+ higher frequency with the same IPC as previous AMD chips. So all this talk of Nehalem-like IPC should have stopped in February.

http://ieeexplore.ieee.org/Xplore/l...746228.pdf?arnumber=5746228&authDecision=-203

This quote is misunderstood:

Compared to previous AMD x86-64 cores, project goals reduce the number of FO4 inverter delays per cycle by more than 20%, while maintaining constant IPC, to achieve higher frequency and performance in the same power envelope, even with increased core counts.

When they say constant IPC, it's not with respect to the previous uarch,
but in reference to BD being designed to achieve a constant IPC when
running a task, without periods of collapsing IPC rate.

intangir · Sep 13, 2011

Abwx said:
This quote is misunderstood:

When they say constant IPC, it's not with respect to the previous uarch,
but in reference to BD being designed to achieve a constant IPC when
running a task, without periods of collapsing IPC rate.

If you read the paper, it's obvious that it's comparing two microarchitectures. But you believe what you want to believe, I suppose.

No microarchitect would design a chip to have IPC insensitive to workload. It's not a measure with any real-world value.

RussianSensation · Sep 13, 2011

Arzachel said:
Diminishing returns. This is the thing that you seem to be unable to grasp.

Everything is a tradeoff and the more you focus on one specific aspect that contributes to the overall performance, the more costly it becomes to push it further. Intel can afford it, because otherwise they would drown in all the money they have, but AMD has no such luxury.

How am I unable to grasp the point of diminishing returns? I am talking about the overall foundation behind a CPU architecture. If you want to improve performance per clock per watt, you add things to the processor that improve its throughput efficiency per clock. I believe Intel has a rule where they'll add something to the CPU architecture only if it adds 2% more performance for a 1% increase in power. They will leave something on the table if it only gets 1% more performance for 1% more power.

You get diminishing returns whether you focus on more cores or on adding everything you can to increase IPC. That part is obvious and I understand it completely. It appears to me that AMD focused almost entirely on adding more cores during a time when most PC code doesn't scale beyond 4 threads.

They simply mistimed the adoption of multi-threaded code by what appears to be a full generation. When they sat down in the boardroom and discussed which direction their next CPU should go to, they all voted multi-core, not a balanced approach. It is now starting to smell like they have slow 6-8 core CPUs in a marketplace where quad core CPUs won't become mainstream until 2014-2015. In other words, most consumers would rather purchase fast and power efficient dual and quad core CPUs and here AMD wants to sell them 6-8 core CPUs. The market/consumer trends are working completely against AMD's current CPU strategy.

http://www.xbitlabs.com/news/cpu/di...re_Quad_Core_Processors_in_2015_Analysts.html

My personal opinion is that in the PC period from 2011 - 2013, AMD would have been FAR better off releasing a 20-30% faster IPC 4-6 core Phenom II style processor than a 6-8 core Bulldozer style processor with barely any increase in IPC over Phenom II (as a result of the 10-20% penalty incurred due to module design).

If you have a processor with superior IPC, you don't need 4.0ghz clock speeds. As such:

1) You need less voltage to scale higher frequencies, and thus you have lower power consumption;
2) You likely get improved yields, since it's harder to get 8 working cores at 4.0ghz vs. 8 cores at 3.0ghz;
3) Your CPU line refreshes need only 100-200mhz clock speed increase not 200-300 mhz increases to get similar performance gains. This puts less pressure in the next 3-4 quarters on achieving higher performance through even higher frequencies (because achieving higher yields at 4.5ghz is even harder).
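
Point 1 can be illustrated with the usual first-order approximation that dynamic power scales with voltage squared times frequency (P ∝ C·V²·f). This is only a rough sketch: it ignores leakage, and the voltage and clock ratios below are made-up illustrative values, not figures for any real chip.

```python
def relative_dynamic_power(voltage_ratio, frequency_ratio):
    """First-order dynamic power relative to a baseline chip.

    Assumes P is proportional to C * V^2 * f with unchanged capacitance,
    and ignores static (leakage) power entirely.
    """
    return voltage_ratio ** 2 * frequency_ratio


if __name__ == "__main__":
    # Hypothetical: a higher-IPC design reaches the same performance at
    # 75% of the clock, which lets it run at 90% of the voltage.
    print(f"{relative_dynamic_power(0.9, 0.75):.4f}")
```

The voltage term is squared, so even a modest voltage reduction compounds with the lower clock.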

I just don't understand their strategy this round unless they decided to focus on servers and workstation users; and use marketing to sell more cores to avg. user who may think that their cores are not any worse than Intel's.

Arzachel · Sep 13, 2011

I'm quite sure "constant" in this context doesn't mean "in line with PII". Instead, it probably means "not fluctuating".

Reminds me of the Ye Olden BD thread where someone (JFAMD maybe?) explained how BD should have higher integer performance than PII despite having fewer execution units per core, because usually only 1 of the 3 gets used per clock. BD should use both most of the time.

This is all from memory, so feel free to correct me if I remember wrong.

Abwx · Sep 13, 2011

intangir said:
If you read the paper, it's obvious that it's comparing two microarchitectures. But you believe what you want to believe, I suppose.

No microarchitect would design a chip to have IPC insensitive to workload. It's not a measure with any real-world value.

Not at all...
A chip that has a peak IPC of 2 but can sustain an average IPC of 1.5
will be better than a chip that is capable of a peak IPC of 3 but that will
do only an average of 1 with sustained loading...

Abwx · Sep 13, 2011

Arzachel said:
I'm quite sure "constant" in this context doesn't mean "in line with PII". Instead, it probably means "not fluctuating".

That's it...

intangir · Sep 13, 2011

Abwx said:
Not at all...
A chip that has a peak IPC of 2 but can sustain an average IPC of 1.5
will be better than a chip that is capable of a peak IPC of 3 but that will
do only an average of 1 with sustained loading...

Yes, but the proper measure of that is average IPC over many workloads. They wouldn't use IPC variability as a metric (how would you measure that? standard deviation?), because no buyer actually cares about that.

The only sensible interpretation of "constant" is "the same with respect to previous architectures".
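
To make the two candidate metrics concrete, here is a minimal sketch that computes both average IPC and its spread (population standard deviation) over a set of per-workload samples. The sample values are invented purely to mirror the "peak 2 / average 1.5" vs. "peak 3 / average 1" example; they are not measurements of any chip.

```python
import statistics


def summarize_ipc(samples):
    """Return (mean IPC, population standard deviation) for the samples."""
    return statistics.mean(samples), statistics.pstdev(samples)


# Hypothetical per-workload IPC samples, for illustration only.
steady_chip = [1.4, 1.5, 1.6, 1.5]       # modest peak, little variation
bursty_chip = [3.0, 0.5, 0.5, 0.5, 0.5]  # high peak, collapses under load

if __name__ == "__main__":
    for name, samples in (("steady", steady_chip), ("bursty", bursty_chip)):
        mean_ipc, spread = summarize_ipc(samples)
        print(f"{name}: mean IPC {mean_ipc:.2f}, std dev {spread:.2f}")
```

In this made-up data the steadier chip also has the higher mean, so the average alone already ranks the two chips; the standard deviation only describes how that average was reached.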

Vesku · Sep 13, 2011

intangir said:
Yes, but the proper measure of that is average IPC over many workloads. They wouldn't use IPC variability as a metric (how would you measure that? standard deviation?), because no buyer actually cares about that.

The only sensible interpretation of "constant" is "the same with respect to previous architectures".

I think a lot of server chip buyers care about relative IPC consistency. It's not a strict numerical metric but a description of behavior, i.e. we aren't designing to look good in current benchmark software. Can't be certain what was meant by "constant" though without more context or a direct clarification from AMD.

intangir · Sep 13, 2011

Vesku said:
I think a lot of server chip buyers care about relative IPC consistency. It's not a strict numerical metric but a description of behavior, i.e. we aren't designing to look good in current benchmark software. Can't be certain what was meant by "constant" though without more context or a direct clarification from AMD.

No, no they don't. They only care about worst-case latencies, which are fundamentally limited by single-threaded performance.

But like I said, people will believe what they want to believe. Just ask Dresdenboy how he interprets that statement. See what he says. I've said my piece, and saying more would just get me frustrated.

Abwx · Sep 13, 2011

intangir said:
Yes, but the proper measure of that is average IPC over many workloads. They wouldn't use IPC variability as a metric (how would you measure that? standard deviation?), because no buyer actually cares about that.

The only sensible interpretation of "constant" is "the same with respect to previous architectures".

He's not comparing two uarchs; the term "constant" is a property of the
design he is talking about.

Average IPC will be the one inherently measured by benchmarks...
