Rumour: Bulldozer 50% Faster than Core i7 and Phenom II.


Voo

Golden Member
Feb 27, 2009
1,684
0
76
They can't play too terribly loose with their definitions. Any chip that regularly exceeds its rated TDP is going to burn itself out or shorten its lifespan if it's paired with a cooling system that can't dissipate the heat. TDP is usually given as the most heat the chip will ever produce at stock settings.
Well, as a matter of fact, they do. Intel defines TDP as a reasonable maximum its CPUs can reach, while AMD CPUs hardly ever reach their rated TDPs.

Since you were already comparing P4s to Athlons I'll just use that old cb article to demonstrate that point: http://www.computerbase.de/artikel/...cht-energieverbrauch-aktueller-prozessoren/2/

The Athlon 64 2800+ (Clawhammer) had a TDP of 89 W, while the Pentium 4 3.40 GHz (Northwood) had a 103 W TDP. Under load the measured difference between the two is 65 W, while the TDP figures differ by only 14 W.
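
To make the mismatch concrete, here is that comparison as a few lines of Python (a minimal sketch; the wattages are the ones quoted above, nothing else is assumed):

Code:
# Rated TDPs vs. the measured load difference from the ComputerBase test.
tdp_athlon64 = 89    # W, AMD's rated TDP for the Athlon 64 2800+
tdp_p4 = 103         # W, Intel's rated TDP for the P4 3.40 GHz Northwood
measured_gap = 65    # W, difference actually measured under load

tdp_gap = tdp_p4 - tdp_athlon64
print(f"TDP gap: {tdp_gap} W, measured gap: {measured_gap} W")
# TDP gap: 14 W, measured gap: 65 W -- the rated numbers are no proxy
# for real power draw.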

Or, if you prefer it from Johan, he even has an article on exactly that topic - http://www.anandtech.com/show/2807/2
I'll quote the first sentence:
The only thing clear about the TDP numbers of AMD and Intel is that they are confusing and not comparable


Comparing the TDP values of Intel and AMD to find out about power draw is skewed at best and completely wrong more often than not - TDPs aren't even meant to be interpreted that way. ACP would be more useful, since its definition is closer to what we actually want to know, but even if Intel published such a number as well, I still wouldn't trust it without an unbiased third party running the tests.
 

maddie

Diamond Member
Jul 18, 2010
5,178
5,576
136
Sure, but we are talking high performance X86 here. Nobody has a starting point that's at half the power consumption of the other guy. The only way to achieve a 50% power advantage would be fundamental changes in transistor design. And AMD isn't going to be the guy to do that, they don't have the cash - or the development fab.
What in the world is a "high performance X86 design"? That term sounds very arbitrary, artificial and limited.
Surely there can be several ways to design a high performance X86 CPU? There is no way one can claim that all X86 designs are similar in power efficiency, as you very much seem to be implying.
The reason bigger-spending companies advance faster is a partially random process: they cover more bets. SOMETIMES one can get a breakthrough without great expenditure. To claim that smaller companies cannot jump ahead is to completely ignore how innovation occurs.
Yes, BUT.

Surely you're not saying that all processors of similar speed (GHz) consume identical power?
Each design has a different speed for a given energy use.

I find your arguments rather myopic.

Jeez, people can't read.

I'll say it again.

Within the context of high performance X86 CPUs.

Also, I believe you took my quote out of its context, which was that regardless of design, increasing the clock will increase power consumption proportionately.


Certainly, no one here is refuting the statement "for a given design, power use increases with clock speed increases", but you are comparing totally different designs and saying that AMD cannot achieve much higher clocks and still stay within comparable power limits.

For some reason, circuit design appears to be irrelevant to you and only transistor design important.

You were definitely not taken out of context.

Please remember that written communication has two parties: the one who reads AND the one who writes.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
About TDP: that's not true for Intel chips from Core 2 onward. When Core 2 first came out, it stayed well under its TDP under most loads. The only program that would make it reach TDP was a power-virus-like program called Intel TAT (Thermal Analysis Tool).

So TDP wasn't max power then. It is now. Nowadays, AMD's TDP ≈ Intel's TDP.

http://www.xbitlabs.com/articles/cpu/display/core2duo-shootout_11.html#sect0
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Certainly, no one here is refuting the statement "for a given design, power use increases with clock speed increases", but you are comparing totally different designs and saying that AMD cannot achieve much higher clocks and still stay within comparable power limits.

Ok, do this.

Find a CPU created by anybody - x86 or not - that has roughly equivalent general purpose performance to a current generation Intel CPU with significantly lower power consumption.

Go for it, start researching Power, Sparc, AMD, whatever. Or take my word for it - everybody is operating their top tiers around 100 W. Power6 was pushing 200 W at 5 GHz; Power7 currently tops out at 4.25 GHz somewhere in the 150 W area (below that, I believe).

Performance = transistors = heat. Period. It's why AMD is reducing functional units. Fewer functional units = fewer transistors = less heat (and a smaller die).

Don't take my word for it, ask JF. The reason the FP unit is shared between cores? Transistor budget. Although I don't think thermal output is their driving goal: just as in their graphics division, I believe that when Bulldozer was conceived in the Ruiz days, manufacturing cost (die size) was the single largest driver of their transistor budget.
 

Mopetar

Diamond Member
Jan 31, 2011
8,510
7,766
136
Or, if you prefer it from Johan, he even has an article on exactly that topic - http://www.anandtech.com/show/2807/2
I'll quote the first sentence:


Comparing the TDP values of Intel and AMD to find out about power draw is skewed at best and completely wrong more often than not - TDPs aren't even meant to be interpreted that way. ACP would be more useful, since its definition is closer to what we actually want to know, but even if Intel published such a number as well, I still wouldn't trust it without an unbiased third party running the tests.

As I said, since both companies use some kind of turbo, both are capable of running closer to their rated TDP under heavy load. If one has more headroom under normal operation, it will just boost higher or longer.

I suspect Intel runs closer to its rated amount simply because it doesn't increase the clock as much as AMD says BD will. Whether that means they test their chips differently or just get better average utilization out of them doesn't matter once you let both chips run wild.
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
As I said, since both companies use some kind of turbo, both are capable of running closer to their rated TDP under heavy load. If one has more headroom under normal operation, it will just boost higher or longer.

Company A markets a CPU with a base frequency of 2 GHz that will turbo to 3 GHz.

Company B markets a CPU with a base frequency of 2.5 GHz that will turbo to 3 GHz.

Oh, and by the way, under light loads they both also underclock below their base frequency.

Which CPU has more headroom?

If I decide to market CPU B as a 1.8 GHz CPU, does it now have more headroom?
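
A minimal sketch of the rhetorical point, assuming only the hypothetical clocks above ("headroom" taken as the turbo gain over the marketed base clock):

Code:
# "Headroom" as a percentage of the marketed base clock.
def headroom(base_ghz, turbo_ghz):
    return (turbo_ghz - base_ghz) / base_ghz * 100

print(f"CPU A (2.0 -> 3.0 GHz):       {headroom(2.0, 3.0):.0f}%")  # 50%
print(f"CPU B (2.5 -> 3.0 GHz):       {headroom(2.5, 3.0):.0f}%")  # 20%
print(f"CPU B relabeled (1.8 -> 3.0): {headroom(1.8, 3.0):.0f}%")  # 67%
# Same silicon, different sticker: the "headroom" figure is whatever
# marketing decides the base clock should be.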
 

maddie

Diamond Member
Jul 18, 2010
5,178
5,576
136
Ok, do this.

Find a CPU created by anybody - x86 or not - that has roughly equivalent general purpose performance to a current generation Intel CPU with significantly lower power consumption.

Go for it, start researching Power, Sparc, AMD, whatever. Or take my word for it - everybody is operating their top tiers around 100 W. Power6 was pushing 200 W at 5 GHz; Power7 currently tops out at 4.25 GHz somewhere in the 150 W area (below that, I believe).

Performance = transistors = heat. Period. It's why AMD is reducing functional units. Fewer functional units = fewer transistors = less heat (and a smaller die).

Don't take my word for it, ask JF. The reason the FP unit is shared between cores? Transistor budget. Although I don't think thermal output is their driving goal: just as in their graphics division, I believe that when Bulldozer was conceived in the Ruiz days, manufacturing cost (die size) was the single largest driver of their transistor budget.

Amazing. You claim for Intel the same thing I've been saying might apply to Bulldozer. Current Phenom IIs do use more power than Intel for a given throughput.

Why can't Bulldozer do the same to Intel? I'm not saying it will, but why not? Are you privy to confidential information?

They redesigned the layout to optimize throughput without a proportional increase in size and power use.

SO WHAT if they're using less circuitry in total, as long as they get the desired output? I call that efficiency and intelligent design.

You were the one saying that improved transistor design is practically the only route to power savings at a given frequency.
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Amazing. You claim for Intel the same thing I've been saying might apply to Bulldozer. Current Phenom IIs do use more power than Intel for a given throughput.

Why can't Bulldozer do the same to Intel? I'm not saying it will, but why not? Are you privy to confidential information?

They redesigned the layout to optimize throughput without a proportional increase in size and power use.

SO WHAT if they're using less circuitry in total, as long as they get the desired output? I call that efficiency and intelligent design.

You were the one saying that improved transistor design is practically the only route to power savings at a given frequency.

Here's what I said:

The only way to achieve a 50% power advantage would be fundamental changes in transistor design.

I stand by that. Save this post. Come back to it after Bulldozer is released. End of conversation until that time.
 

HW2050Plus

Member
Jan 12, 2011
168
0
0
Theoretically the Intel compiler could do this, but it did not in my case because of the compiler settings I used (which doesn't mean the compiler never does it). It actually thinks it is compiling for an Intel CPU (forced by the architecture settings), and there was no additional code. I know this because I inspect the disassembly of the performance-critical parts (I do, e.g., SSE optimizations and have to investigate this thoroughly, because performance is critical for this application). After the bad results I tried different compiler settings; the slowdown on AMD CPUs varies with them, but it never disappears.

However, newer versions of Intel's VTune profiler do have such code and will simply refuse to do a profiling run on a computer with an AMD CPU (older versions worked fine). I think that is why AMD released its free profiler, AMD CodeAnalyst, afterwards. As VTune customers, my company was very displeased, since many of our development machines ran on AMD CPUs and we therefore had to switch profilers (we went to Rational Quantify).
UPDATE

A member of the forums here sent me a PM with a link to evidence that the Intel compiler really does add code to check whether it should use one code path or another, and the good one is used only on Intel CPUs. It just wasn't the case for my program, because I used the /arch=SSE compiler flag and used SSE instructions directly, where the compiler has no choice.

This is really very bad, and I want to share the link I received by PM:

http://www.agner.org/optimize/blog/read.php?i=49

There is also information available on how to trick the Intel compiler so that the good code path is executed on AMD CPUs as well.

This is really another Intel special, since the additional code and branching reduces performance for Intel customers as well - all that just to hamper the competition even more.

However, that makes icc-compiled binaries easy to detect: scan the executable, the way a virus scanner would, for this icc-specific addition. Anandtech should check all their benchmark programs!
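
A rough sketch of such a scan in Python, assuming only what the linked Agner Fog post describes: the icc dispatcher compares the CPUID vendor string against "GenuineIntel", so that literal string tends to be present in affected binaries. A hit only means "look closer", since plenty of programs read the vendor string for harmless reasons:

Code:
import sys

# Heuristic: look for the "GenuineIntel" vendor string that the icc CPU
# dispatcher compares against. Presence is a hint, not proof.
def has_vendor_check(path):
    with open(path, "rb") as f:
        return b"GenuineIntel" in f.read()

if __name__ == "__main__":
    for exe in sys.argv[1:]:
        verdict = "possible icc dispatching" if has_vendor_check(exe) else "no vendor string found"
        print(f"{exe}: {verdict}")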

And this bad thing would even apply to, and hamper the results of, the upcoming AMD Bulldozer. Even worse, it will also affect future Intel CPUs, according to the source linked above!

Also, there are compiler benchmarks which clearly show how good the Microsoft compiler and gcc are on both CPU architectures, while icc is very bad on AMD and good only on Intel CPUs.
 

HW2050Plus

Member
Jan 12, 2011
168
0
0
No they aren't. Power consumption scales with clock speed, unless AMD has found a way around physical laws.
That is true, especially for CMOS.

But you forget a lot of important facts here. First, CMOS switching loss (for a very short moment while the complementary n- and p-MOS transistors are both switching, the stage acts like a short circuit) is not the only power loss in a chip.

Much comes from various leakage paths through the substrate, etc.

All of the above can be changed through process parameters, e.g. the high-k dielectric Intel uses, which will also be used in the 32 nm AMD process for Bulldozer, plus additional features like SOI, which is unique to AMD.

And what is even more important: a transistor switch costs power in close proportion to frequency (especially in CMOS).

But for chip power consumption it matters a great deal how many transistors are switching. Just by increasing power gating you can save enormous amounts of power - otherwise chips like Nehalem, Phenom or Sandy Bridge simply would not fit in their TDPs. So there is always room to improve power gating.
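
The first-order relation behind this point, as a tiny sketch using the standard CMOS dynamic-power formula (the capacitance and voltage values below are invented purely for illustration):

Code:
# Dynamic power in CMOS: P = alpha * C * V^2 * f, where alpha is the
# activity factor -- the fraction of the switched capacitance actually
# toggling each cycle. C and V here are made-up illustrative numbers.
def dynamic_power(alpha, c_farads, v_volts, f_hz):
    return alpha * c_farads * v_volts**2 * f_hz

C, V, F = 1e-8, 1.2, 3.5e9                # hypothetical totals for one core
busy  = dynamic_power(0.20, C, V, F)      # 20% of the logic toggling
gated = dynamic_power(0.10, C, V, F)      # gating idle blocks halves alpha
print(f"busy: {busy:.1f} W, gated: {gated:.1f} W")  # busy: 10.1 W, gated: 5.0 W
# Halving the switching activity halves dynamic power at the same clock:
# *how many* transistors switch matters as much as how fast they switch.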

Moreover, there was IBM research into which type of design gives the most performance per watt, and AMD's Bulldozer has chosen exactly the design parameters that came out best in that research.

Research report: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.85.9109&rep=rep1&type=pdf

Anyway, all you ever need is a high performance chip. With that you can choose either higher performance, or lower power at a given performance level.

So you could deliver an incarnation with many cores and high frequency, accept a high TDP, and get best-in-class performance. Or you take a part binned for a certain performance level, reduce core count and/or frequency, and deliver the specified performance at minimum power consumption.

As always with CPUs from either Intel or AMD, we will likely see both: some special high performance parts with a high TDP and some power efficient parts with a low TDP.

And back to what AMD did: they use a high frequency design! That is different from just pushing the frequency up. A high frequency DESIGN means more power loss per switched transistor, but by design you switch fewer transistors (you also do less per cycle). If you just push the frequency, you get the higher power loss per transistor with no change in the number of transistors switched.

It is a difficult balance, and you have to do a high frequency design right to get more performance at reasonable power consumption. And here that IBM research comes into play again, so we could guess that AMD did it right.

Even Intel got it quite right with e.g. Northwood, though it was somewhat more aggressive and failed mainly with later incarnations (Penryn, etc.). There were many other problems besides that made the P4 fail.
 

Mopetar

Diamond Member
Jan 31, 2011
8,510
7,766
136
But for chip power consumption it matters a great deal how many transistors are switching. Just by increasing power gating you can save enormous amounts of power - otherwise chips like Nehalem, Phenom or Sandy Bridge simply would not fit in their TDPs. So there is always room to improve power gating.

JFAMD has stated on his blog (which, according to the blog itself, has to have posts approved by the legal dept. to ensure he isn't making false claims) that AMD rates chip TDP based on testing that stresses the whole chip. While some real-world software might manage that, it would need to be heavily optimized; most software won't.

TDP is the maximum amount of waste heat the chip will produce at stock settings. If it produced any more, the cooling system might not be able to handle it, leading to potential system crashes, reduced chip life, or outright chip failure.

Power gating just turns off, or reduces power to, parts of the chip that aren't being used. What this really buys you is that a single core can boost much higher than if all cores were running at once: if most cores are off or receiving less power, they aren't producing waste heat, so other parts of the chip are free to go overboard to some extent. Conceivably a 16-core chip using only one core should be able to boost by an incredible amount before hitting the chip TDP, but eventually the voltage required to clock that high will start damaging the chip. Power gating doesn't let a chip exceed its TDP target; it lets a chip that isn't running near its TDP reduce its actual power draw. That might not make a big difference to you or me, but it's a huge cost saver for data centers.
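
A toy model of that budget argument (every number below is invented, and real boost logic is far more involved):

Code:
# Fixed whole-chip thermal budget; gating idle cores frees it up for the
# cores still running. Numbers are purely illustrative.
TDP_W = 125.0      # hypothetical whole-chip budget
UNCORE_W = 20.0    # hypothetical caches, memory controller, I/O
TOTAL_CORES = 16

def per_core_budget(active_cores):
    return (TDP_W - UNCORE_W) / active_cores

print(f"{TOTAL_CORES} cores active: {per_core_budget(TOTAL_CORES):.1f} W per core")  # 6.6 W
print(f" 1 core active: {per_core_budget(1):.1f} W per core")                        # 105.0 W
# One active core could in principle take the whole budget, but as noted
# above, the voltage needed to clock that high becomes the real limit
# long before the thermal budget does.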


Even Intel got it quite right with e.g. Northwood, though it was somewhat more aggressive and failed mainly with later incarnations (Penryn, etc.). There were many other problems besides that made the P4 fail.

Do you mean Prescott? Penryn is based on the Core architecture.
 

maddie

Diamond Member
Jul 18, 2010
5,178
5,576
136
UPDATE

A member of the forums here sent me a PM with a link to evidence that the Intel compiler really does add code to check whether it should use one code path or another, and the good one is used only on Intel CPUs. It just wasn't the case for my program, because I used the /arch=SSE compiler flag and used SSE instructions directly, where the compiler has no choice.

This is really very bad, and I want to share the link I received by PM:

http://www.agner.org/optimize/blog/read.php?i=49

There is also information available on how to trick the Intel compiler so that the good code path is executed on AMD CPUs as well.

This is really another Intel special, since the additional code and branching reduces performance for Intel customers as well - all that just to hamper the competition even more.

However, that makes icc-compiled binaries easy to detect: scan the executable, the way a virus scanner would, for this icc-specific addition. Anandtech should check all their benchmark programs!

And this bad thing would even apply to, and hamper the results of, the upcoming AMD Bulldozer. Even worse, it will also affect future Intel CPUs, according to the source linked above!

Also, there are compiler benchmarks which clearly show how good the Microsoft compiler and gcc are on both CPU architectures, while icc is very bad on AMD and good only on Intel CPUs.
Are they still allowed to do this? It sounds like monopolistic practice to me. Does anyone know the terms of the recent AMD-Intel settlement over their illegal business practices?
 

HW2050Plus

Member
Jan 12, 2011
168
0
0
Do you mean Prescott? Penryn is based on the Core architecture.
Thanks for the correction. Yes, Prescott of course. There are too many code names around, and I had a long working day today ...


Regarding power gating: you can gate a full core, you can gate an execution unit, or you can gate just a very small unused part (I wrote "unused latch" before, but gating a single unused latch is really pointless). What you said, as far as I understood it, relates to gating full cores.

Maybe I was not precise; what I meant goes more under the name of clock gating.

You can gate a lot even while the CPU is running at full speed on all cores. However, gating is still difficult, since it inserts an additional gate into what may be a speed path.
 

HW2050Plus

Member
Jan 12, 2011
168
0
0
I do not know if this is real or can be trusted, but it looks more reliable than the last "leaks":

http://www.***************/amd-bulldozer-benchmarks-leaked/#comment-525

A BD 4M/8T would be 35% faster than a 4C/8T Sandy Bridge 2600K in the 3DMark Vantage CPU benchmark, according to the leak.

Some of the comments there point to strange results and claim it is fake.

I don't know why the forum modified the link.
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
Obviously the forum is censoring the domain, so be creative in writing it down if you want to share the link, like so:

http://www.rumor pedia.net/amd-bulldozer-benchmarks-leaked/ (remove the space between rumor and pedia)

EDIT: Just read the supposed leak. It says:
3.5 GHz Bulldozer matches 4.0 GHz Sandy Bridge.
That's in the 3DMark Vantage CPU bench.

Personally, I believe some people just have way too much time on their hands to be imaginative enough to fabricate something like this.
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
I do not know if this is real or can be trusted, but it looks more reliable than the last "leaks":

http://www.***************/amd-bulldozer-benchmarks-leaked/#comment-525

A BD 4M/8T would be 35% faster than a 4C/8T Sandy Bridge 2600K in the 3DMark Vantage CPU benchmark, according to the leak.

Some of the comments there point to strange results and claim it is fake.

I don't know why the forum modified the link.

As I said in the first page of this thread:

Hey, an eight core CPU is faster than a four core CPU! Who woulda thought??

Really, would you compare that 2600K to an Athlon X2??
 

sawtx

Member
Dec 9, 2008
93
0
61
As I said in the first page of this thread:

Hey, an eight core CPU is faster than a four core CPU! Who woulda thought??

Really, would you compare that 2600K to an Athlon X2??

If it's similarly priced, sure. The number of cores shouldn't really matter to the end user, especially if the thread counts are the same.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
Ok, let's take the x264 HD bench, whose second pass is highly multithreaded:

[chart: x264 HD benchmark second-pass results - 35043.png]


Have a look at the Intel Core i7 975 (4 cores, 8 threads) vs the Intel Core i7 980X (6 cores, 12 threads).
The 980X has 50% more cores/threads than the 975, both run at the same frequency with the same L2 and L3 per core, and they share the same microarchitecture.

Core i7 975 = 32.3
Core i7 980X = 46.1
42.7% faster with 50% more cores

The same goes for the AMD Phenom II X4 970 vs the Phenom II X6 1100T: 50% more cores, almost the same frequency, and the same L3 for both.

Phenom 970 = 21.4
Phenom 1100T = 31.5
47.2% faster with 50% more cores

I would venture that a 4-module Bulldozer will score higher than an Intel Core i7 980X/990X.
So in that benchmark, the rumor that Bulldozer will be 50% faster than a Core i7 (4 cores, 8 threads) and a Phenom X6 could be true.
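
For anyone checking the arithmetic, the percentages follow directly from the quoted scores:

Code:
# Recomputing the scaling figures from the x264 HD scores quoted above.
def speedup_pct(fast, slow):
    return (fast / slow - 1) * 100

print(f"i7 980X vs i7 975:  +{speedup_pct(46.1, 32.3):.1f}%")  # +42.7%
print(f"X6 1100T vs X4 970: +{speedup_pct(31.5, 21.4):.1f}%")  # +47.2%
# Both sit close to the ideal +50% from 50% more cores, i.e. the second
# pass of this benchmark scales almost linearly with core count.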
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
A BD 4M/8T would be 35% faster than a 4C/8T Sandy Bridge 2600K in the 3DMark Vantage CPU benchmark, according to the leak.
They did not say it was a 4-module Zambezi. They said it was a quad core.

I'm not saying it's legit (I definitely think it isn't). Just pointing out that a quad-core Zambezi means exactly that: 4 cores (only 2 modules). It helps to just forget about the modules, because the OS supposedly never sees them and simply reports the number of integer cores as cores.

This detail is what makes the entire leak definitely fake to me. If the leak were anywhere near legit, there would have been no rookie mistake about what a quad-core BD means, and therefore the ridiculous comparison to a 2600K (which the quad-core Zambezi is shown to dominate) would never have been made in the first place. A 4-core/4-thread Zambezi spanking a 4-core/8-thread 2600K in a multithreaded benchmark, and at a lower clock? Absolute rubbish.

This, like most other leaks, is just page-hit material. JFAMD has already spelled it out: under his watch, no leaks, no benchies, nothing until launch. Everything else is just fiction.
 

LiuKangBakinPie

Diamond Member
Jan 31, 2011
3,903
0
0
Ok, let's take the x264 HD bench, whose second pass is highly multithreaded:

[chart: x264 HD benchmark second-pass results - 35043.png]


Have a look at the Intel Core i7 975 (4 cores, 8 threads) vs the Intel Core i7 980X (6 cores, 12 threads).
The 980X has 50% more cores/threads than the 975, both run at the same frequency with the same L2 and L3 per core, and they share the same microarchitecture.

Core i7 975 = 32.3
Core i7 980X = 46.1
42.7% faster with 50% more cores

The same goes for the AMD Phenom II X4 970 vs the Phenom II X6 1100T: 50% more cores, almost the same frequency, and the same L3 for both.

Phenom 970 = 21.4
Phenom 1100T = 31.5
47.2% faster with 50% more cores

I would venture that a 4-module Bulldozer will score higher than an Intel Core i7 980X/990X.
So in that benchmark, the rumor that Bulldozer will be 50% faster than a Core i7 (4 cores, 8 threads) and a Phenom X6 could be true.
If you want to show multithreaded benches, show recent benches with retail chips, not cherry-picked ones like the above. And that encoder is from the CS4 days: outdated, and it doesn't make good use of multi-core systems.

http://ppbm5.com/Benchmark5.html
^ that's recent, with retail chips at clocks the systems run on 24/7
 