Rumour: Bulldozer 50% Faster than Core i7 and Phenom II.

JFAMD · Feb 12, 2011

maddie said:
I don't understand your reasoning. Why would the synthetic benchmark not affect various cpu in a similar manner? Thus the comparison between them should be valid.

Only if the benchmark was optimized for a particular design could you say the delta would change.

You are not using that argument.

Yes, that would mean that it would impact all CPUs the same way. However, what is at question is whether what it is measuring will actually manifest itself in real world environments.

You can run the STREAM benchmark to show memory bandwidth. Then you can run two computers to see the difference that STREAM gives you in running microsoft word. you will see no difference.

Synthetic benchmarks are one indication of performance, but not the "be all, end all" answer on performance. I'd be willing to bet that you could put two computers side by side with a 20% delta in a synthetic benchmark and I would be willing to bet that 80-90% of the people could not tell the difference if they were to sit down and use the two, even for gaming.

Too much emphasis is put on benchmarks, not enough emphasis is put on actual real life performance. This is because benchmarks are easier to compare and real life performance is subjective.

HW2050Plus · Feb 12, 2011

maddie said:
I don't understand your reasoning. Why would the synthetic benchmark not affect various cpu in a similar manner? Thus the comparison between them should be valid.

Only if the benchmark was optimized for a particular design could you say the delta would change.

You are not using that argument.

Yes that is not easy to understand and obviously it needs some explanation.

Let us make an example:
If you get 20% more performance in a synthetic benchmark that will reduce to let's say 14% in a real world benchmark. The numbers are just example and not a fixed rule.

The reason for this - as I stated already - is that synthetic benchmarks are coded better than real applications. So all applications will perform less than synthetic ones. This in general would not change the relative performance in theory so far, but with the worse programming other issues come into play.

The worse code of real world applications reduces the ability to make use of the capabilities of a faster processor. Most importantly you have more scaling issues in real world applications. Next important one is IO and memory limitations. Most synthetic benchmarks do not stress e.g. the memory subsystem.

Even memory benchmarks are not suited. Memory benchmarks are to test memory and the memory subsystem. For that they use special code to deliver utmost performance on memory (to exclude CPU impact). But a real world application e.g. uses a much worse routine for memory operations, so they do not get the maximum performance. Yet again this obviously only reduces absolut numbers not the deltas between them, but yet again the bad code comes with another effect of worse scaling/usage of capabilities.

The passmark benchmark is not as synthetic as e.g. Memtests, Drhystone, etc. but much more synthetic than SPEC benchmark. SPEC is the most honest benchmark out there because it uses real world applications available as source code and allows for maximum compiler optimization. So the SPEC benchmark gives you the real ability of a processor achieveable with real compilers out there. It just excludes the disabilities of software vendors.

Application benchmark sets e.g. here at Anandtech differ from SPEC results because for those a certain mix of applications is used and those applications are likly to favor one or the other CPU, just because they are compiled with same compiler (and compiler settings) for both CPUs.

Anandtech does it like this because their audience is consumers using consumer applications. Consumers are interested in what performance do I get for the applications I use. The audience for SPEC on the other hand is professionals using professional applications and those are interested in how much performance can I get from a certain CPU.

It's just different questions:
So Anandtech benchmark set -> How fast are those applications running
SPEC benchmark -> How fast is the CPU
Synthetic benchmarks -> How fast is the CPU excluding some issues (e.g. code quality, memory impact)

And therefore you should not expect that the relative values are just the same. To say it clear: If that benchmark ist true and Bulldozer is ~70% faster in this as e.g. 980X then you will see less difference in a real world application, maybe 40% just to give a sample number for a sample real world benchmark.

Just to tell you a story of my own: I wrote some consumer application (very integer and FPU and SSE intense, very low memory impact but high cache impact). I compiled this with high optimizations with Visual Studio compiler.
Result was 100 on AMD and 100 on Intel. Then I compiled with Intel compiler and result was around 72 on AMD and 110 on Intel. That really puzzled me because I expected something like 110 on AMD and 120 on Intel assuming that it favors Intel specifcs. But it was a lot slower on AMD than default compiler. So there is really a lot wrong with Intel compilers. I have evidence that Intel compiler forfeits performance improvements to be able to hurt AMD CPUs more and that is e.g. with the use of lea instructions. Intel compiler still uses lea if an add instruction would be sufficient. However using lea hurts AMD more than Intel. And that is only one of many examples you see only in the Intel compiler (others are mul replacement rules (AMD is very fast in muls), usage of push/pop etc.). Those icc problems started in around 2007 and got worse esp. since icc 10 (old Intel compilers (version 5-7) have been quite well for both architectures). That makes the situation especially worse as because of the historically good results on both CPUs in many companies the icc compiler was used.

To make an application look good on an Intel CPU and bad on AMD CPU you just take icc (version 10 or above) and that's it, there is no need to - as often written - "optimize software" for a certain CPU, on the other hand you take Pathscale or PGI if you want to favor AMD (though those do not produce special bad code for Intel). It is especially funny that with the AMD Bulldozer many of those Intel icc tricks will then work in favor (or let's say not as a disadvantage) of Bulldozer. You could see some surprises with AMD Bulldozer on benchmarks you recently thought that those will esp. favor Intel CPUs.

In addition e.g. with the application I mentioned above I found out a very strong weakpoint of all AMD CPUs so far: They have really high performance capabilities regarding throughput but it is almost impossible to make use of it. I even tried hand optimized assemler code (if CPU detected as AMD) but it's just not possible to fully utilize their capabilities with real code doing something useful. And Bulldozer is just a great attempt to overcome this issue. They offer less throughput (which does not hurt because was unused anyway) but increased effiency (I mean performance and die area efficiency). Together with the high frequency and longer pipeline design this completely changes as well how to optimze software and that is why those current icc tricks will no longer work for Bulldozer.

To summarize, AMD optimized their CPUs in the past for utmost theoretical performance. They could never reach that because either of practical programming issues and/or some kick by icc tricks.

Bulldozer architecture on the other hand is fully optimized to get utmost practical performance though if you look at paper specs they dropped a lot of theoretical features (e.g. the 3-way AGU and 3-way FLOAT unit). I mean they got 2 cores with almost the die area of one previous core. In my opinion this is the largest CPU design change ever in the x86 business since Pentium, more than P4 was and more than Core was.

Emulex · Feb 12, 2011

i'm interested in reverse hyper-threading. hyper-v does this inherently but esxi doesn't really do this. plus licensing costs. 6 core is the limit before you have to upgrade to enteprise plus. i'd rather have 12 real cores present at 6 and do double the duty

Ajay · Feb 12, 2011

Voo said:
They narrowing Q1-Q2 to Q2 2011 and shows confidence? But yeah I don't think we'll have delays there, but then it's not as if they could fix much in the last few months even if performance wasn't what they wanted

Versus say "sometime this summer". I guess I'm looking at this with a glass 1/2 full attitude.

Mopetar · Feb 12, 2011

The other problem with benchmarks is that they often don't give a very complete picture. One processor could receive higher benchmark scores than another in 9 out of 10 benchmarks, but the real world performance could be identical if that 10th benchmark is an area that bottlenecks the first chip.

It would be a little like using 580s in SLI with a single core CPU for gaming. The CPU performance will be a significant bottleneck when running games. A CPU can have similar issues that can make it look amazing in certain benchmarks, but not have significantly better performance.

Castiel · Feb 12, 2011

formulav8 said:
Most people who complain about PII performance is nothing more than bench babes. PII can do everything I7 can do, just not as fast. And as someone said earlier. most things you would not even be able to tell a difference. Blah

Denab was slower then the Core 2 Quad's clock for clock for the most part. Yes it could do things an I7 could do but what was the point of buying it if a Lynnfield did them better for alittle more money?

maddie said:
You do realize don't you, that stressing IPC as the sole relevant factor affecting throughput sounds rather ignorant?

IPC plays a BIG part

drizek · Feb 12, 2011

I'm pretty sure PII sold better than Lynnfield. No data to back that up though.

Castiel · Feb 12, 2011

drizek said:
I'm pretty sure PII sold better than Lynnfield. No data to back that up though.

PII is cheaper then Lynnfield so i imagine it did.

podspi · Feb 12, 2011

Does anyone know if this MSI board has the new AM3+ socket? AM3 and AM3+ are not pin compatible, so if this board has a AM3 socket, Bulldozer isn't going to fit.

I agree though, if it does happen to have an AM3+ socket, it is tempting. The only feature of BD that I can imagine not working is TurboCORE 2.0 though...

busydude · Feb 12, 2011

podspi said:
Does anyone know if this MSI board has the new AM3+ socket?

Nop.

Its still AM3 socket. AM3+ has an extra pin.. I don't think it will be compatible.

Markfw · Feb 12, 2011

busydude said:
Nop.

Its still AM3 socket. AM3+ has an extra pin.. I don't think it will be compatible.

Well, looking at that picture, that sure smells of false advertising then.

996GT2 · Feb 12, 2011

drizek said:
I'm pretty sure PII sold better than Lynnfield. No data to back that up though.

I highly doubt that. Just think of Dell, HP, and all of the other big OEMs. Most of their machines are using Intel CPUs and they go through a LOT more processors than the enthusiast community ever will.

mosox · Feb 12, 2011

Maybe they used some PCBs made for the 9xx chipsets. According to some rumors on the net an AM3 version Bulldozer can be done with a small performance hit, the poster said 5% if I remember well.

Might be true since the AM3 CPUs work in AM3+ sockets there's no big electrical difference and the pinout is the same. So it all should boil down to that extra pin and whether AMD made them incompatible on purpose - Intel style, or that extra pin is really that important.

Anyway sooner or later someone will "customize" his new Bulldozer CPU or his AM3 socket and try it.

Phynaz · Feb 12, 2011

JFAMD said:
Too much emphasis is put on benchmarks, not enough emphasis is put on actual real life performance. This is because benchmarks are easier to compare and real life performance is subjective.

So when I have one server that is capable of processing more TPS than another it is not really faster, I just think it is?

I'll remember this when you throw benchmark numbers out there for Bulldozer.

Phynaz · Feb 12, 2011

mosox said:
So it all should boil down to that extra pin and whether AMD made them incompatible on purpose

Do not assume that just because there is one extra pin that is the only difference. Electrical specs of many of the others could have changed also. Many pins in a CPU are marked as reserved, those could be used by AM3+.

drizek · Feb 12, 2011

996GT2 said:
I highly doubt that. Just think of Dell, HP, and all of the other big OEMs. Most of their machines are using Intel CPUs and they go through a LOT more processors than the enthusiast community ever will.

Maybe. I was actually thinking of Bloomfield rather than Lynnfield. Phenom II came out a couple of months after bloomfield, right?

Idontcare · Feb 12, 2011

podspi said:
Does anyone know if this MSI board has the new AM3+ socket? AM3 and AM3+ are not pin compatible, so if this board has a AM3 socket, Bulldozer isn't going to fit.

I agree though, if it does happen to have an AM3+ socket, it is tempting. The only feature of BD that I can imagine not working is TurboCORE 2.0 though...

I think you meant to say that an AM3+ CPU is not compatible with an AM3 socket.

The AM3+ socket is suppose to be backwards compatible with any AM3 CPU.

Voo · Feb 12, 2011

Phynaz said:
So when I have one server that is capable of processing more TPS than another it is not really faster, I just think it is?

I'll remember this when you throw benchmark numbers out there for Bulldozer.

No he's saying that most benchmarks are far from being indicative for real life performance. Especially those synthetic that test only one specific part of the cpu (fp, int, memory subsystem,..)

JFAMD · Feb 12, 2011

Phynaz said:
So when I have one server that is capable of processing more TPS than another it is not really faster, I just think it is?

I'll remember this when you throw benchmark numbers out there for Bulldozer.

Voo said:
No he's saying that most benchmarks are far from being indicative for real life performance. Especially those synthetic that test only one specific part of the cpu (fp, int, memory subsystem,..)

Yes, it is a complicated statement. The problem here is that in the client world there are dozens of applications that people run, so it is more difficult to say that one system is faster than another. Especially when people pull one benchmark and make a broad sweeping statement. Most statements don't end in "...for this application." They just say "x is faster than y."

When it comes to database tests, those are an indicator, but they are not the ultimate indicator. Take TPC-C for instance. Typically it is run on a server with hundreds of hard drives with perhaps 500MB partitions - it is all about striping because you get the best I/O performance that way. But nobody would ever configure a server that way.

Will we run benchmarks? Yes. Will we make competitive claims on those benchmarks? Absolutely. That doesn't mean I have to like benchmarks. There are a lot of things in every job that you have to put up with. I also have to sit in a cubicle and fly in coach to places like Beijing. (Oh, and the worst thing you can hear as they are getting the plane ready is "ok, it is safe to take the prisoner on board" - true story from last Thursday's flight.)

Everyone in the business has to use benchmarks, it is the nature of the beast.

But I can guarantee you that when I sit at a table across from a customer we are discussing their applications and how I can help them with their applications, not how I can tweak a benchmark with a configuration that they would never buy.

hamunaptra · Feb 13, 2011

HW2050Plus said:
Yes that is not easy to understand and obviously it needs some explanation.

Let us make an example:
If you get 20% more performance in a synthetic benchmark that will reduce to let's say 14% in a real world benchmark. The numbers are just example and not a fixed rule.

The reason for this - as I stated already - is that synthetic benchmarks are coded better than real applications. So all applications will perform less than synthetic ones. This in general would not change the relative performance in theory so far, but with the worse programming other issues come into play.

The worse code of real world applications reduces the ability to make use of the capabilities of a faster processor. Most importantly you have more scaling issues in real world applications. Next important one is IO and memory limitations. Most synthetic benchmarks do not stress e.g. the memory subsystem.

Even memory benchmarks are not suited. Memory benchmarks are to test memory and the memory subsystem. For that they use special code to deliver utmost performance on memory (to exclude CPU impact). But a real world application e.g. uses a much worse routine for memory operations, so they do not get the maximum performance. Yet again this obviously only reduces absolut numbers not the deltas between them, but yet again the bad code comes with another effect of worse scaling/usage of capabilities.

The passmark benchmark is not as synthetic as e.g. Memtests, Drhystone, etc. but much more synthetic than SPEC benchmark. SPEC is the most honest benchmark out there because it uses real world applications available as source code and allows for maximum compiler optimization. So the SPEC benchmark gives you the real ability of a processor achieveable with real compilers out there. It just excludes the disabilities of software vendors.

Application benchmark sets e.g. here at Anandtech differ from SPEC results because for those a certain mix of applications is used and those applications are likly to favor one or the other CPU, just because they are compiled with same compiler (and compiler settings) for both CPUs.

Anandtech does it like this because their audience is consumers using consumer applications. Consumers are interested in what performance do I get for the applications I use. The audience for SPEC on the other hand is professionals using professional applications and those are interested in how much performance can I get from a certain CPU.

It's just different questions:
So Anandtech benchmark set -> How fast are those applications running
SPEC benchmark -> How fast is the CPU
Synthetic benchmarks -> How fast is the CPU excluding some issues (e.g. code quality, memory impact)

And therefore you should not expect that the relative values are just the same. To say it clear: If that benchmark ist true and Bulldozer is ~70% faster in this as e.g. 980X then you will see less difference in a real world application, maybe 40% just to give a sample number for a sample real world benchmark.

Just to tell you a story of my own: I wrote some consumer application (very integer and FPU and SSE intense, very low memory impact but high cache impact). I compiled this with high optimizations with Visual Studio compiler.
Result was 100 on AMD and 100 on Intel. Then I compiled with Intel compiler and result was around 72 on AMD and 110 on Intel. That really puzzled me because I expected something like 110 on AMD and 120 on Intel assuming that it favors Intel specifcs. But it was a lot slower on AMD than default compiler. So there is really a lot wrong with Intel compilers. I have evidence that Intel compiler forfeits performance improvements to be able to hurt AMD CPUs more and that is e.g. with the use of lea instructions. Intel compiler still uses lea if an add instruction would be sufficient. However using lea hurts AMD more than Intel. And that is only one of many examples you see only in the Intel compiler (others are mul replacement rules (AMD is very fast in muls), usage of push/pop etc.). Those icc problems started in around 2007 and got worse esp. since icc 10 (old Intel compilers (version 5-7) have been quite well for both architectures). That makes the situation especially worse as because of the historically good results on both CPUs in many companies the icc compiler was used.

To make an application look good on an Intel CPU and bad on AMD CPU you just take icc (version 10 or above) and that's it, there is no need to - as often written - "optimize software" for a certain CPU, on the other hand you take Pathscale or PGI if you want to favor AMD (though those do not produce special bad code for Intel). It is especially funny that with the AMD Bulldozer many of those Intel icc tricks will then work in favor (or let's say not as a disadvantage) of Bulldozer. You could see some surprises with AMD Bulldozer on benchmarks you recently thought that those will esp. favor Intel CPUs.

In addition e.g. with the application I mentioned above I found out a very strong weakpoint of all AMD CPUs so far: They have really high performance capabilities regarding throughput but it is almost impossible to make use of it. I even tried hand optimized assemler code (if CPU detected as AMD) but it's just not possible to fully utilize their capabilities with real code doing something useful. And Bulldozer is just a great attempt to overcome this issue. They offer less throughput (which does not hurt because was unused anyway) but increased effiency (I mean performance and die area efficiency). Together with the high frequency and longer pipeline design this completely changes as well how to optimze software and that is why those current icc tricks will no longer work for Bulldozer.

To summarize, AMD optimized their CPUs in the past for utmost theoretical performance. They could never reach that because either of practical programming issues and/or some kick by icc tricks.

Bulldozer architecture on the other hand is fully optimized to get utmost practical performance though if you look at paper specs they dropped a lot of theoretical features (e.g. the 3-way AGU and 3-way FLOAT unit). I mean they got 2 cores with almost the die area of one previous core. In my opinion this is the largest CPU design change ever in the x86 business since Pentium, more than P4 was and more than Core was.

Very good read dude..I love lots of technical explanation =)
I am wondering is there any way to find out what a program was compiled with?
I sure hope this it he approach of BD that you mention and that intel cant use anymore nasty tricks in their compiler cuz AMD addressed those weakspots intel was "attacking" with its compiler....muahaha take that intel!
If true..bd keeps getting better and better.

drizek · Feb 13, 2011

Idontcare said:
I think you meant to say that an AM3+ CPU is not compatible with an AM3 socket.

The AM3+ socket is suppose to be backwards compatible with any AM3 CPU.

He's talking about an MSI board with an AM3 socket that is marketed as being "AM3+ CPU compatible".

HW2050Plus · Feb 13, 2011

hamunaptra said:
I am wondering is there any way to find out what a program was compiled with?

Easiest is to ask the vendor what compiler they used.

For SPEC benchmarks compiler used and compiler settings are disclosed in the audit.

You could also get this by investigation of the disassembly but this forensic approach is neither easy nor fast. E.g. you could search for this "lea"-treatement I mentioned to identify an Intel compiler, however as there are also many "normal" lea-instructions it is some work. And you need to be an expert in that to get this sorted out.

You could even determine the major version of used compiler but would create even more work. And determining minor version of compiler is likly mostly impossible.

Determining the used compiler would be quite easy if you have the source code as well. And then you do not need to be an expert. Just compile it with a certain compiler and compare the resulting executable with what you have.

endlessmike133 · Feb 13, 2011

HW2050Plus your argument makes me want to wait until Bulldozer is released...when is it expected to release again?

ShadowVVL · Feb 13, 2011

Q2/h1 so july or august i think unless they run into a issue.

JFAMD · Feb 13, 2011

July and august are in Q3

Rumour: Bulldozer 50% Faster than Core i7 and Phenom II.

Senior member

Member

Diamond Member

Lifer

Diamond Member

Golden Member

Golden Member

Golden Member

Golden Member

Diamond Member

Moderator Emeritus, Elite Member

Diamond Member

Senior member

Lifer

Lifer

Golden Member

Elite Member

Golden Member

Senior member

Senior member

Golden Member

Member

Senior member

Senior member

Senior member