Fudzilla: Bulldozer performance figures are in

BlueBlazer · Sep 8, 2011

Dresdenboy said:
You should read more carefully. This is not a bug; it's a behaviour resulting from some aliasing (caused by the way it's implemented), and the patch is a way to prevent it. I'm not sure if I tweeted about that. So in CPU-heavy benchmarks, performance might drop to ~97%.

If you noticed the summary in there, from amd.com:

This patch provides performance tuning for the "Bulldozer" CPU. With its
shared instruction cache there is a chance of generating an excessive
number of cache cross-invalidates when running specific workloads on the
cores of a compute module.

IMHO the "patch" here seems to imply that there's some sort of issue (or bug) with the cache behaviour during virtual address aliasing. However, whether this issue affects other unpatched operating systems (perhaps Windows) is unknown (speculation). And later, from Linus:

You guys do realize that we had to disable ASLR on many machines?

Not sure how disabling this feature also affects other operating systems. :hmm:

Dresdenboy said:
Of course, I missed that. I was on vacation from 08/15 to 08/29. Anyway, thanks for your analysis; I'll have a look at it. With "[f/le]aked" I tried to show that I had no way of knowing whether these results were faked or leaked, and I even found something strange at first sight.

Fortunately, I knew the original source of those fakes and followed the thread (many posts later). That's when other members of the forum alerted everyone to the poster's previous fakery (and exposed his photoshopped mistakes).

Idontcare said:
Seriously though, in this day and age with firewalled routers and AV/spamblocker software I don't understand how you are possibly leaving yourself exposed to internet-based threat vectors. How the heck are they getting in?

Exploits, either through the browser itself (especially for IE users) or through browser plugins or ActiveX controls. I got hit once with a Java trojan (because I was too lazy to update Java), but luckily the AV caught it. Other well-known vectors are e-mail clients (especially Outlook) and USB storage drives (especially ones from an infected machine).

LOL_Wut_Axel · Sep 8, 2011

Nemesis 1 said:
Yeah, I want to see ALL 8 cores on BD running @ 4.5GHz. The 4-core blows its wad at a 100MHz turbo. You act like all 8 BD cores are going to turbo up. They are not. See, now we have to listen to you babble for another 30+ days before we can muzzle you.

Right, so first your argument is that a Core i5-2500K at 4.5GHz can reach around 7.2 points on Cinebench and then when I say that an 8-core FX will probably get a lot more at the same clock speed, you mention Turbo and some other irrelevant things. All the FX CPUs should have around the same overclocking headroom as Sandy Bridge given that they're on the same process node and the die size is not huge (see Phenom II X4 vs Core 2 Quad 45nm and Phenom II X6 vs Nehalem Core i7 45nm).

Nemesis 1 · Sep 8, 2011

LOL_Wut_Axel said:
Right, so first your argument is that a Core i5-2500K at 4.5GHz can reach around 7.2 points on Cinebench and then when I say that an 8-core FX will probably get a lot more at the same clock speed, you mention Turbo and some other irrelevant things. All the FX CPUs should have around the same overclocking headroom as Sandy Bridge given that they're on the same process node and the die size is not huge (see Phenom II X4 vs Core 2 Quad 45nm and Phenom II X6 vs Nehalem Core i7 45nm).

No, no, no, you're not real good at this. I was comparing the 2500K to the 12-core AMD at 2.6GHz; the 2500K is at 4.5GHz. 12 cores against 4 means three AMD cores per Intel core: 2.6 + 2.6 + 2.6 = 7.8GHz, versus 4.5GHz per core on the 4 Intel cores. You like that math? Looks great to me.

Now, if you want to go with the 8-core BD, we put that up against the 4-core 2600K, not the 2500K.

I will bet you whatever you want that an 8-core BD will not come close to a 2600K when ALL CORES ARE overclocked. Put up or shut up. I am referring to clock speed, of course.
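The raw clock arithmetic in the post above can be sketched as follows. This is only a back-of-the-envelope illustration, assuming (as the post does) that summed clock speed is the yardstick; it ignores IPC, memory bandwidth, and multi-threaded scaling losses, and `core_ghz` is a hypothetical helper.

```python
def core_ghz(cores: int, clock_ghz: float) -> float:
    """Aggregate clock across all cores (hypothetical helper; ignores IPC and scaling losses)."""
    return cores * clock_ghz


magny_cours = core_ghz(12, 2.6)  # 12-core Opteron (Magny-Cours) at 2.6 GHz
i5_2500k = core_ghz(4, 4.5)      # 4-core i5-2500K overclocked to 4.5 GHz

# 12 / 4 = 3 AMD cores per Intel core, so 3 * 2.6 = 7.8 GHz per Intel core's share
per_intel_core = magny_cours / 4

print(f"{magny_cours:.1f} vs {i5_2500k:.1f} core-GHz; "
      f"{per_intel_core:.1f} vs 4.5 GHz per Intel core")
```

Summed GHz says nothing about per-clock throughput, which is exactly what the rest of the thread argues about.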

LOL_Wut_Axel · Sep 8, 2011

Nemesis 1 said:
No, no, no, you're not real good at this. I was comparing the 2500K to the 12-core AMD at 2.6GHz; the 2500K is at 4.5GHz. 12 cores against 4 means three AMD cores per Intel core: 2.6 + 2.6 + 2.6 = 7.8GHz, versus 4.5GHz per core on the 4 Intel cores. You like that math? Looks great to me.

Now, if you want to go with the 8-core BD, we put that up against the 4-core 2600K, not the 2500K.

I will bet you whatever you want that an 8-core BD will not come close to a 2600K when ALL CORES ARE overclocked. Put up or shut up. I am referring to clock speed, of course.

What's the point of comparing a Magny Cours 12-core at 2.6GHz versus an i5-2500K at 4.5GHz on Cinebench, anyway? We all know Cinebench performance numbers increase hugely at higher clock speeds. As a matter of fact, a Phenom II X6 at 4GHz scores 7 points, and at 4.2GHz 7.3 points. That already matches the 2500K at 4.5GHz. Take the same IPC, add two cores, add 300MHz more than before (to 4.5GHz), and you should get a score higher than 9 points, easily surpassing a 2600K at 4.5GHz.
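A naive version of this scaling estimate, as a sketch: it assumes Cinebench scores scale linearly with core count, clock speed, and IPC, which is optimistic (real multi-threaded scaling is rarely perfect, and Bulldozer's modules share front-end and FPU resources). The 7.3-point baseline is the Phenom II X6 at 4.2GHz figure quoted in the post; `scaled_score` is a hypothetical helper.

```python
def scaled_score(base_score: float, base_cores: int, base_ghz: float,
                 cores: int, ghz: float, ipc_gain: float = 0.0) -> float:
    """Naively scale a multi-threaded benchmark score.

    Assumes score is proportional to cores * clock * IPC, i.e. perfect
    thread scaling and no shared-resource penalty.
    """
    return base_score * (cores / base_cores) * (ghz / base_ghz) * (1.0 + ipc_gain)


# Phenom II X6 at 4.2 GHz scoring 7.3 points, scaled to 8 cores at 4.5 GHz, same IPC
estimate = scaled_score(7.3, 6, 4.2, 8, 4.5)
print(f"8 cores @ 4.5 GHz, same IPC: {estimate:.2f} points")
```

Passing `ipc_gain=0.1` models the "add 10% IPC" scenario from the next paragraph and raises the naive estimate to roughly 11.5 points; any real shared-module penalty would pull both numbers down.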

Even with the same IPC as Llano, Bulldozer would be faster than Sandy Bridge in heavily multi-threaded applications. Add 10% IPC, and the difference starts to become more noticeable. In single-threaded workloads it'll suck unless it can get near-Nehalem IPC, though.

carnage10 · Sep 8, 2011

LOL_Wut_Axel said:
All the FX CPUs should have around the same overclocking headroom as Sandy Bridge given that they're on the same process node and the die size is not huge (see Phenom II X4 vs Core 2 Quad 45nm and Phenom II X6 vs Nehalem Core i7 45nm).

I dunno... The paltry 100MHz turbo increase on the FX-4170, plus the rumors that the delays are partly due to AMD not being able to reach the clocks they wanted, lead me to believe that 4.3GHz is fast approaching the maximum overclocking headroom at a reasonable voltage/temp on the current BD chips. I think with big voltage bumps some may hit 4.5 or maybe 4.6GHz under water, but I highly doubt we'll see most chips hitting 4.8GHz+ (which almost all 2500Ks/2600Ks seem to be able to reach), let alone any 5 or 5.1GHz BD chips.

I think in most cases BD will be more like my old Newcastle/Clawhammer 3800+, where no matter how much voltage I'd pump into it, it just wouldn't OC past 2.5GHz (+100MHz).

LOL_Wut_Axel · Sep 8, 2011

carnage10 said:
I dunno... The paltry 100MHz turbo increase on the FX-4170, plus the rumors that the delays are partly due to AMD not being able to reach the clocks they wanted, lead me to believe that 4.3GHz is fast approaching the maximum overclocking headroom at a reasonable voltage/temp on the current BD chips. I think with big voltage bumps some may hit 4.5 or maybe 4.6GHz under water, but I highly doubt we'll see most chips hitting 4.8GHz+ (which almost all 2500Ks/2600Ks seem to be able to reach), let alone any 5 or 5.1GHz BD chips.

I think in most cases BD will be more like my old Newcastle/Clawhammer 3800+, where no matter how much voltage I'd pump into it, it just wouldn't OC past 2.5GHz (+100MHz).

No. It simply means that they won't clock it higher because they need to pass their stability validations at a given voltage, nothing else. Even if turbo only reaches 4.2GHz, that means they'll have at least 300MHz of headroom over it. Given that clock speeds are already so high to begin with, reaching 4.5GHz should be a piece of cake.

AMD isn't gonna clock their chips at the max they can before they run into stability issues. Also, the fact that there are no 140W TDP 8-core chips and the 3.6GHz/4.2GHz Turbo one is at 125W should tell you they have decent headroom and aren't limited by heat.

Dresdenboy · Sep 8, 2011

BlueBlazer said:
If you noticed the summary in there, from amd.com... IMHO the "patch" here seems to imply that there's some sort of issue (or bug) with the cache behaviour during virtual address aliasing. However, whether this issue affects other unpatched operating systems (perhaps Windows) is unknown (speculation). And later, from Linus... Not sure how disabling this feature also affects other operating systems. :hmm:

OK, I have to go a bit deeper into the matter, any virtual regiments aside.

Patching software just means applying changed code to do something differently than originally implemented. This can be a workaround for a HW bug (maybe this is why you think every patch is related to a bug) or a fix for a SW bug, but it can also be an improvement or even a new feature.

In the case of Barcelona, the published patch was a workaround plus the disabling of a feature, addressing a real HW bug in the form of the well-known TLB bug. It was a bug because it could cause a crash under certain circumstances.

In the case of this BD-related patch, it's about handling something differently in the kernel to avoid situations where performance could be reduced by a couple of percent, as can be clearly seen here:

This patch provides performance tuning for the "Bulldozer" CPU. With its
shared instruction cache there is a chance of generating an excessive
number of cache cross-invalidates when running specific workloads on the
cores of a compute module.

http://www.spinics.net/lists/linux-tip-commits/msg13140.html

Although it's just some "tuning", it affects many more situations than, e.g., optimizations like using ADD/SHL instead of a slower MUL instruction depending on the operands.
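As an illustration of the kind of micro-optimization being contrasted here: a compiler can replace a multiplication by a constant with shifts and adds, e.g. `x * 10 == (x << 3) + (x << 1)`. The sketch below only demonstrates the arithmetic identity in Python; whether a real compiler emits SHL/ADD, LEA, or a plain MUL depends on the target CPU and its cost model. Both function names are hypothetical.

```python
def mul_by_10_shift_add(x: int) -> int:
    """x * 10 computed as x * 8 + x * 2, i.e. (x << 3) + (x << 1)."""
    return (x << 3) + (x << 1)


def mul_by_const_shift_add(x: int, c: int) -> int:
    """Compute x * c for a non-negative constant c using only shifts and adds.

    One shifted copy of x is added per set bit of c; real compilers pick
    smarter sequences (subtraction, LEA on x86) based on a cost model.
    """
    if c < 0:
        raise ValueError("c must be non-negative")
    result, shift = 0, 0
    while c:
        if c & 1:
            result += x << shift
        c >>= 1
        shift += 1
    return result


print(mul_by_10_shift_add(7), mul_by_const_shift_add(7, 10))  # 70 70
```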

The shared I-cache of a BD module is not a bug but a conceptual detail. And how it handles virtual addresses is also not a bug but an implementation detail.

BTW, I found the post where I first read about it:
http://www.planet3dnow.de/vbulletin/showpost.php?p=4487776&postcount=4655
The OP already quoted the performance impact I used to get to 97%:

> Out of curiosity, what's the performance impact if the workaround is
> not enabled?
Up to 3% for a CPU-intensive style benchmark, and it can vary highly in a microbenchmark depending on workload and compiler.

The 3% number is also clearly mentioned in the patch as of 08/05 here:

Code:

+	align_va_addr=	[X86-64]
+			Align virtual addresses by clearing slice [14:12] when
+			allocating a VMA at process creation time. This option
+			gives you up to 3% performance improvement on AMD F15h
+			machines (where it is enabled by default) for a
+			CPU-intensive style benchmark, and it can vary highly in
+			a microbenchmark depending on workload and compiler.
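The effect of `align_va_addr` can be sketched as follows. Bits [14:12] lie just above the 4 KiB page offset, so two processes that map the same shared library at page-aligned addresses can still differ in those bits. Assuming, as the patch description implies, that the module's shared I-cache uses these bits for index generation, the same physical code can then land in different cache sets; clearing the slice forces it to the same value for every mapping. The helpers and addresses below are hypothetical, and this simply masks the bits, whereas the actual kernel code may round the address instead.

```python
PAGE_SHIFT = 12                    # 4 KiB pages
ALIAS_MASK = 0x7 << PAGE_SHIFT     # bits [14:12] -> 0x7000


def alias_slice(addr: int) -> int:
    """Return the 3-bit value held in bits [14:12] of a virtual address."""
    return (addr >> PAGE_SHIFT) & 0x7


def align_va(addr: int) -> int:
    """Clear bits [14:12] (hypothetical helper, not the kernel implementation)."""
    return addr & ~ALIAS_MASK


# Made-up page-aligned bases where two processes map the same library
a = 0x7F3A_1234_5000
b = 0x7F9C_8876_3000

print(hex(alias_slice(a)), hex(alias_slice(b)))                      # 0x5 0x3
print(hex(alias_slice(align_va(a))), hex(alias_slice(align_va(b))))  # 0x0 0x0
```

Because both bases are page-aligned, clearing the slice leaves them 32 KiB-aligned, so the low 15 bits of every instruction address in the library match across processes.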

It might be interesting to know how the situation is on Windows-based systems. Any OS guys around?

Nemesis 1 · Sep 8, 2011

LOL_Wut_Axel said:
What's the point of comparing a Magny Cours 12-core at 2.6GHz versus an i5-2500K at 4.5GHz on Cinebench, anyway? We all know Cinebench performance numbers increase hugely at higher clock speeds. As a matter of fact, a Phenom II X6 at 4GHz scores 7 points, and at 4.2GHz 7.3 points. That already matches the 2500K at 4.5GHz. Take the same IPC, add two cores, add 300MHz more than before (to 4.5GHz), and you should get a score higher than 9 points, easily surpassing a 2600K at 4.5GHz.

Even with the same IPC as Llano, Bulldozer would be faster than Sandy Bridge in heavily multi-threaded applications. Add 10% IPC, and the difference starts to become more noticeable. In single-threaded workloads it'll suck unless it can get near-Nehalem IPC, though.

I would very much like to see an X6 stable at 4.2GHz.

I can boot and run a lot of benchmarks at unstable OCs. We're talking 24/7 overclocks on air here. You also keep comparing Intel's 32nm to AMD's gate-first process like they're the same. You have been pulling a lot of OC numbers that 90% of AMD cores can't reach, and none of them stable. I think it was you saying Llano is getting 4GHz OCs. I wouldn't buy an AMD myself, but Bob has them in the shop; that's where I've been the last 90 min.

He thinks you're drunk. YOU load all 4 cores on Llano and run a game; it will not do what you say. Fact is, it's not even close to stable. Next you're going to go gas on me here shortly. Show me a link to a stable 24/7 X6 OC @ 4.2GHz. Hell, I will take a stable 4GHz X6 overclock. You assume for some wild reason that AMD 32nm and Intel 32nm are the same. Sorry, Intel isn't fabbing AMD chips. You act like 2 extra cores is nothing. Just make the wager and stop the talk.

Nemesis 1 · Sep 8, 2011

LOL_Wut_Axel said:
No. It simply means that they won't clock it higher because they need to pass their stability validations at a given voltage, nothing else. Even if turbo only reaches 4.2GHz, that means they'll have at least 300MHz of headroom over it. Given that clock speeds are already so high to begin with, reaching 4.5GHz should be a piece of cake.

AMD isn't gonna clock their chips at the max they can before they run into stability issues. Also, the fact that there are no 140W TDP 8-core chips and the 3.6GHz/4.2GHz Turbo one is at 125W should tell you they have decent headroom and aren't limited by heat.

The 4-core BD pulls a 125W TDP at 4.2GHz and Zoooms all the way up to 4.3GHz under turbo. You act like none of us here can read the same information as you and comprehend what we read.

LOL_Wut_Axel · Sep 8, 2011

Nemesis 1 said:
I would very much like to see an X6 stable at 4.2GHz.

I can boot and run a lot of benchmarks at unstable OCs. We're talking 24/7 overclocks on air here. You also keep comparing Intel's 32nm to AMD's gate-first process like they're the same. You have been pulling a lot of OC numbers that 90% of AMD cores can't reach, and none of them stable. I think it was you saying Llano is getting 4GHz OCs. I wouldn't buy an AMD myself, but Bob has them in the shop; that's where I've been the last 90 min.

He thinks you're drunk. YOU load all 4 cores on Llano and run a game; it will not do what you say. Fact is, it's not even close to stable. Next you're going to go gas on me here shortly. Show me a link to a stable 24/7 X6 OC @ 4.2GHz. Hell, I will take a stable 4GHz X6 overclock. You assume for some wild reason that AMD 32nm and Intel 32nm are the same. Sorry, Intel isn't fabbing AMD chips. You act like 2 extra cores is nothing. Just make the wager and stop the talk.

I was making an argument about what performance would be at that clock speed and how much clock speed can affect the performance numbers in Cinebench, not about whether it's stable or not. Regardless, there are some Phenom II X6s running stable at 4.2GHz. BD should reach 4.5GHz easily.

Looks like you're grasping at straws.

LOL_Wut_Axel · Sep 8, 2011

Nemesis 1 said:
The 4-core BD pulls a 125W TDP at 4.2GHz and Zoooms all the way up to 4.3GHz under turbo. You act like none of us here can read the same information as you and comprehend what we read.

Because you don't understand how CPU stability validation works. There's a target AMD needs to reach at a set voltage. You act as if they'll just push the CPUs with more voltage and frequencies while forgetting about power efficiency and the TDP.

AtenRa · Sep 8, 2011

Most Phenom II X6 1090Ts and 1100Ts will be stable at 4/4.2GHz for 24/7 use.

Comparing the Phenom II X6 with Llano and Bulldozer in terms of OC headroom is apples to oranges.

Even though Llano is made on the same 32nm process, you cannot say that BD will not OC because Llano doesn't pass 4GHz.

LOL_Wut_Axel · Sep 8, 2011

AtenRa said:
Most Phenom II X6 1090Ts and 1100Ts will be stable at 4/4.2GHz for 24/7 use.

Comparing the Phenom II X6 with Llano and Bulldozer in terms of OC headroom is apples to oranges.

Even though Llano is made on the same 32nm process, you cannot say that BD will not OC because Llano doesn't pass 4GHz.

We don't know yet, but it definitely should. The unlocked A8-3870 is coming out. :thumbsup:

BlueBlazer · Sep 8, 2011

Dresdenboy said:
It might be interesting, how the situation is on Windows based systems. Any OS guys around?

That might be interesting. It seems (rumorville) that yet another new stepping, "B2G", is coming up (previously there was the "B2F" stepping), perhaps to correct this issue rather than having to apply patches (speculation) >> AMD new discussion topic - Part 33:

As I mentioned before, the Zambezi launch date has been delayed but is now fixed (though I haven't given the exact date).

I knew about the delay internally because we do not have final silicon yet, the media samples are still not well defined, and distributors have nothing concrete to pre-order (plus other signals that I won't post). This whole process takes place within a month and a half of launch, so if any of these items is missing, it's a paper launch with no samples. Incidentally, I still have a few indicators telling me that something is still not going well.

Mass production of the final B2G stepping should start soon, but it is not yet known when it will actually be available.

Benchmark indications:
compared 2600K
compared 1100T :\

masterbm · Sep 8, 2011

Hmm, this seems similar to the stuff I heard when Phenom came out. Until AnandTech or another site (let's say one that begins with a T) reviews it, I will not take it as fact.

Nemesis 1 · Sep 8, 2011

LOL_Wut_Axel said:
Because you don't understand how CPU stability validation works. There's a target AMD needs to reach at a set voltage. You act as if they'll just push the CPUs with more voltage and frequencies while forgetting about power efficiency and the TDP.

You say I don't understand. You're a funny whatever, teen maybe. Do yourself a favor and stop while you're behind. Your Titanic may not make it away from the port.

Riek · Sep 9, 2011

Nemesis 1 said:
I don't understand. You're a funny whatever, teen maybe. Do yourself a favor and stop while you're behind. Your Titanic may not make it away from the port.

Don't blame him. It's hard to argue with people whose Titanic has already hit the ocean's bottom.

I knew about the delay internally because we do not have final silicon yet, the media samples are still not well defined, and distributors have nothing concrete to pre-order (plus other signals that I won't post). This whole process takes place within a month and a half of launch, so if any of these items is missing, it's a paper launch with no samples. Incidentally, I still have a few indicators telling me that something is still not going well.

Translated correctly: he says that he expected delays because they didn't have the final silicon yet, plus other factors he cannot elaborate on, which should all be in place about 1 month before release. He is a reliable source, though.

Nemesis 1 · Sep 9, 2011

Riek said:
Don't blame him. It's hard to argue with people whose Titanic has already hit the ocean's bottom.

Translated correctly: he says that he expected delays because they didn't have the final silicon yet, plus other factors he cannot elaborate on, which should all be in place about 1 month before release. He is a reliable source, though.

Who is a reliable source? Oh, I fixed my post above yours so you could read it correctly.

CHADBOGA · Sep 9, 2011

grimpr said:
Hans De Vries claims that the Prefetchers are also not enabled on the BD ES chip.

http://semiaccurate.com/forums/showpost.php?p=132211&postcount=788

Hans has unfortunately drunk too much of the Green Kool-Aid and has become a radicalised AMDroid.

grimpr said:
You're quoting and taking Elmer Phud seriously? :thumbsdown::thumbsdown: I forgive you, since you obviously don't know who he is.

He is a genius and a great, great man.

Dresdenboy said:
I was on the SI and IH boards years ago, and while I liked discussing things with a rather strongly Intel-biased guy like Elmer, one day I concluded that it was impossible to have a neutral discussion.

Elmer has grown weary of the nonsense so many AMD fanboys go on with, so he might be a tad standoffish towards you initially, but if he knows who you are, I suspect he would enjoy conversing with you and fruitful conversations could be had for both parties.

inf64 · Sep 9, 2011

From Anand's twitter account:

anandshimpi
B2.G is where AMD is at today, no word on whether or not that's final, it's close though

anandshimpi

This is why we never did an early preview of Bulldozer on AT, no sense in putting out numbers that may not be representative

anandshimpi

And I don't believe the final decision has been made to go to market (desktop) with B2.G either, will know for sure in the coming weeks

anandshimpi

I'm not saying anything about absolute performance, just keep in mind that silicon that's older than ~2 weeks isn't production worthy

anandshimpi

Beware of any leaked Bulldozer benchmarks, unless you're running B2.G you're not looking at shipping performance

So yeah, all those results showing BD ES slower than Bobcat/Llano are useless.

classy · Sep 9, 2011

inf64 said:
From Anand's twitter account:
So yeah, all those results showing BD ES slower than Bobcat/Llano are useless.

I believe this thing is going to shock some folks. I think the real value is that it's an architecture they can build on and improve over time, unlike K8/K10, from which they have squeezed out all that architecture had to offer.

lifeblood · Sep 9, 2011

inf64 said:
From Anand's twitter account:
So yeah, all those results showing BD ES slower than Bobcat/Llano are useless.

This could explain the silence from AMD. It sounds like they had bugs and they're tweaking things that have a significant impact on benchmarks. If that were the case, I would not release any information either. Hopefully it will all come together in the end and a worthy product will roll off the production line.

From Anand's tone, it does not sound like BD is a disaster. It may not be what we hope for, but it doesn't sound terribly bad either.

uribag · Sep 9, 2011

inf64 said:
From Anand's twitter account:
So yeah, all those results showing BD ES slower than Bobcat/Llano are useless.

Informal,

Do you know if those Interlagos that are already shipping are B2G?

inf64 · Sep 9, 2011

uribag said:
Informal,

Do you know if those Interlagos that are already shipping are B2G?

I'm not sure but they should be. Launch is still 2 weeks or so away so I think we won't see any numbers anytime soon.

podspi · Sep 9, 2011

Yeah, I don't understand. How can a revision with higher IPC be coming down the pipeline if AMD is already shipping (more expensive, lower-clocked) Opterons?

Are they just expecting server customers to accept lower performance? Or are there actually two dies for BD (instead of just the one)?
