How about GCC and AMD/Intel? Lots of compiler benchmarks put it on roughly equal footing with icc for compiled code. Have you seen any big differences across architectures with GCC?
gcc and the Microsoft compiler are not affected. They produce very good code for both architectures. Even the old Intel compilers produced good code for either architecture (though they were always a bit faster on Intel, but not in a strange way). However, the "newer" Intel compilers show these problems. I mean versions 5 to 7 were okay, 8-9 slowly got strange, and 10 was really very bad for AMD. And the Microsoft compiler and gcc continuously closed the performance gap (in the old days you could gain 20% just by compiling with icc, but that is no longer the case, and icc is slower when run on AMD).
Also, for your code above, how do you know that Intel wasn't taking a different code path for AMD CPUs (something they have become notorious for doing) rather than just using instructions that hurt AMD more than they hurt Intel?
Theoretically the Intel compiler could do this, but it did not, because of the compiler settings I used (which does not mean the compiler doesn't do it elsewhere). With those settings it actually thinks it is compiling for an Intel CPU (forced by the architecture settings), and there was no additional dispatch code. I know this because I inspect the disassembly of the performance-critical parts (I do this anyway because I do e.g. SSE optimizations and need to investigate them thoroughly, since performance is critical for this application). After the bad results I tried different compiler settings; the slowdown for AMD CPUs varies with them, but it does not disappear.
However, the newer versions of Intel's profiler VTune do contain such a CPU check and will simply refuse to do a profiling run on a computer with an AMD CPU (older versions worked fine). I think that was the reason why AMD released its free profiler "AMD CodeAnalyst" afterwards. My company, as a customer of Intel VTune, was very displeased by that, since many of our development machines ran on AMD CPUs and we therefore had to switch profilers (we switched to Rational Quantify)!
This business with different code paths is more a matter for certain applications than for the compiler itself. E.g. an application could use vanilla code and use SSE only if an Intel CPU has been detected.
But a compiler can't do that if I put the SSE instructions in the source code myself (which is what I did, just as an example).
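For illustration, here is a hedged sketch of what "SSE instructions in the source code" means (the function add4 is my own example, not from the application discussed): with intrinsics spelled out like this, the compiler must emit exactly these SSE instructions on every CPU and has no room to substitute a per-vendor fallback.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Adds four pairs of floats with a single SSE instruction.
   Because the intrinsics appear in the source, the compiler emits
   these SSE instructions regardless of the CPU vendor. */
void add4(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);             /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  /* 4 additions at once */
}
```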
Yes, I was wondering this as well. How do the two compare with GCC? How fair is GCC between Intel and AMD?
I am not familiar with the latest gcc 4.x versions, but I cannot imagine that they are not fair. You could maybe use some compiler options which work better for one vendor or the other, but that is something you can freely choose, and all of those things have very minor effects. Again, there is nothing inherent in the compiler, and I would declare gcc and the Microsoft compiler 100% fair.
I mean, it is really nothing special to be "fair". It takes a lot of work to find things that hurt your competitor without hurting yourself. Obviously only Intel had enough funds to do so. And it does not give Intel users any advantage!
Pretty much the same approach Intel took when they launched Banias/Dothan and its derivatives like Conroe/Penryn and Nehalem/Lynnfield. AMD's approach is similar to what Intel did with their NetBurst architecture, except that the Stars architecture is much better than NetBurst.
That is a bit oversimplified. They did a lot more than that: they changed almost every aspect of the design compared to K7-K10.5, so it's not a Stars core design anymore. The high-speed design has a similarity, but it works differently and much better than NetBurst, because e.g. Bulldozer does not have such extremely long pipelines. And the core split might be compared with Hyper-Threading, but again AMD's approach of splitting cores is far better (an 80-100% boost for the module technology versus -5 to 30% for Hyper-Threading).
So Bulldozer is a complete core overhaul + frontend/backend overhaul + cache-system overhaul + predictor/prefetcher overhaul. So far those are more conventional changes (in the direction of what Intel did with Conroe), but larger than those of the K8-to-K10.5 architecture switch; only the K7-to-K8 switch was nearly that large.
The two big changes on top of that are the high frequency design and the "module technology".
But if you simplify and want to condense it into a single statement, then you could call it "NetBurst-like", though this is somewhat misleading regarding the results and the techniques in detail. It is much more like what IBM did with POWER7, and the real credit for the innovation should maybe go to IBM.
Intel could do the same for their future CPUs. However, regarding the module technology, this would be extremely difficult for Intel because of, let's say, the way Intel does x86. AMD could do it because their design is optimized for throughput: they had so many parallel units that they could just split them and still have enough (they even added units, btw). On the other hand, it would be difficult to add Hyper-Threading to any AMD CPU architecture. That is why we never saw a Hyper-Threading CPU from AMD, even though it is quite an old technology by now.
Therefore this "module tech" vs. Hyper-Threading advantage could last for quite a long while, and that advantage is really huge. It will be interesting to see if and when Intel manages to provide a similar technology in their upcoming CPUs. Maybe Intel might counter with a Super-HT approach where they could fully use their latency advantage, and "Reverse-HT" would be included. But that is just personal speculation of mine.