Fudzilla: Bulldozer performance figures are in

Status
Not open for further replies.

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
Oops. Cray thought they were buying 16 core BD CPUs.

The Orochi die has only 8 cores; Interlagos is a dual-Orochi package.

The new Crays will be relying heavily on NVIDIA Teslas, so the CPUs hardly matter any more. :D

The XE6s will replace Magny-Cours with Interlagos.

The XE6 is the CPU-only one, you know.

The integer cores lose an ALU, but according to AMD that shouldn't be anything significant because the architecture itself is faster.
An ALU+AGU pair was lost, and according to AMD that lowers power consumption and heat dissipation, allowing for higher clocks.

What's more worrying is the FPUs, since those are at a much bigger disadvantage than the integer cores when it comes to sharing resources. Things like POV-Ray and Cinebench 11.5 care a lot more about FP performance.

The thing that is hurting performance on the floating-point units is not what you think it is.
 

-Slacker-

Golden Member
Feb 24, 2010
1,563
0
76
Sounds crazy, but just wait for benchmarks. It's true, unfortunately. Pretty disappointing that they enter 2012 with a Bobcat level of x86 performance instead of a K10++ on steroids that could fight at least Nehalem/Westmere. This way they will barely beat Magny-Cours in the server space. Crazy stuff.

Ah, this sounds like a confident claim. That means you've already tested a retail version of the cpu, but you're still under NDA. Then we'll await your test results on the 12th.

Unless...
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
All they do is advertise the OC capability. From what we found out from different leaks, since AMD will not give away any benchmarks of this piece of technical marvel, IPC is crap, clock for clock comparison with Sandy Bridge is abysmal, some AMD haters say that it's worse than Thuban but hey, you can overclock like crazy for just 250$. 4.5GHz on air? My 2500K is doing the same with a 20$ HSF and it costs 220$ so what's the big deal?

Well, I'm not denying anything you've said about BD. It could all be true; we just don't know. When you compare your 2500K to BD as if they were the same thing, you are forgetting that BD is 8 cores, not 4. I agree that if it gives no better performance then that's irrelevant, but that remains to be seen. It's not irrelevant when it comes to overclocking, though. 8 cores running at 4.5GHz-5GHz is impressive, and you can't compare it to 4 cores being able to run at the same speed.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
Sorry to bring this up here, guys, but I didn't want to start a new BD thread.

I was thinking about BD's die size, and 320mm² doesn't look right to me.

AMD wanted to minimize die size with the CMT design, and if BD is 320mm² then it is bigger than both the quad-core Phenom I (450M transistors @ 285mm², 65nm process) and the quad-core Phenom II (758M transistors @ 258mm², 45nm process).

With the 32nm process they should be able to double the transistor count of 45nm and keep the same die size, so if BD has 1.5B transistors it should be around 250-260mm², and because of the CMT design maybe even smaller, 240mm²??

If the 320mm² is correct, then how did the CMT design help with die size?? Unless BD has more than 1.5B transistors, I don't believe the die will be bigger than the quad-core Phenom II.

Could it be that the 320mm² is in fact 220-230mm²??
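
The arithmetic behind that 250-260 estimate can be checked with a rough sketch. It uses only the transistor counts and die areas quoted in the post, and it assumes, as the post does, that a full node shrink from 45nm to 32nm doubles density and that Bulldozer has 1.5B transistors. The function and variable names are made up for illustration, and real dies do not scale this cleanly.

```python
def density_m_per_mm2(transistors_millions: float, area_mm2: float) -> float:
    """Average transistor density in millions of transistors per mm^2."""
    return transistors_millions / area_mm2


# Figures quoted in the post (millions of transistors, die area in mm^2).
phenom_i = density_m_per_mm2(450, 285)    # 65nm quad-core Phenom I
phenom_ii = density_m_per_mm2(758, 258)   # 45nm quad-core Phenom II

# Assumption from the post: 32nm doubles the 45nm density.
projected_32nm = 2 * phenom_ii

# Assumption from the post: Bulldozer has 1.5B (1500M) transistors.
bd_transistors_m = 1500
expected_area = bd_transistors_m / projected_32nm

# Conversely, how many transistors would fill 320 mm^2 at that density?
implied_transistors_m = 320 * projected_32nm

print(f"Phenom I density:      {phenom_i:.2f} M/mm^2")
print(f"Phenom II density:     {phenom_ii:.2f} M/mm^2")
print(f"Projected 32nm:        {projected_32nm:.2f} M/mm^2")
print(f"Expected BD area:      {expected_area:.0f} mm^2")
print(f"Transistors in 320mm2: {implied_transistors_m:.0f} M")
```

Under those assumptions the expected area lands inside the 250-260 range from the post, so a 320mm² die would imply either more transistors or a lower density than simple scaling predicts.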
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Sorry to bring this up here, guys, but I didn't want to start a new BD thread.

I was thinking about BD's die size, and 320mm² doesn't look right to me.

AMD wanted to minimize die size with the CMT design, and if BD is 320mm² then it is bigger than both the quad-core Phenom I (450M transistors @ 285mm², 65nm process) and the quad-core Phenom II (758M transistors @ 258mm², 45nm process).

With the 32nm process they should be able to double the transistor count of 45nm and keep the same die size, so if BD has 1.5B transistors it should be around 250-260mm², and because of the CMT design maybe even smaller, 240mm²??

If the 320mm² is correct, then how did the CMT design help with die size?? Unless BD has more than 1.5B transistors, I don't believe the die will be bigger than the quad-core Phenom II.

Could it be that the 320mm² is in fact 220-230mm²??

Remember, transistors have two dimensions that are critical in arriving at total drive current - which is critical in determining clockspeed.

GloFo went gate-first integration, an integration process that inherently results in lower I-drives (voltage and width normalized) than a gate-last integration.

AMD has two design choices if they want higher clocks: make the transistors wider in the layout, or increase the operating voltage (or a combination of both).

This is why the xtor layout on llano can be so dense for the slower clocked GPU part versus the less dense areas of the higher clocked logic cores. (this is also why the slower L3$ is also a more dense sram cell)

Bulldozer needs to clock even higher than Llano, so the xtor density must be reduced to accommodate the wider transistors this requires.

The trade-offs are made knowingly, which is to say GloFo knew what the upsides and downsides were before committing to them, as did their customers.

Bulldozer being a bigger die than one might expect is actually to be expected once you account for the integration choices and microarchitecture design targets.
 

Abwx

Lifer
Apr 2, 2011
10,949
3,462
136
Given the 315mm² die, and even accounting for the server-only circuits that are disabled in the desktop version, BD should be 30 to 50% faster than an X6 in multithreaded tasks for the new uarch to make sense, so it's troubling that the first numbers are way below AMD's own claims.

As already pointed out, the current leaks show amazingly low L1 write/copy bandwidth, two to three times slower than an X6. Assuming the leaks are legit, it remains to be seen whether the current version is literally, if unintentionally, crippled.
 

piesquared

Golden Member
Oct 16, 2006
1,651
473
136
Sorry to bring this up here, guys, but I didn't want to start a new BD thread.

I was thinking about BD's die size, and 320mm² doesn't look right to me.

AMD wanted to minimize die size with the CMT design, and if BD is 320mm² then it is bigger than both the quad-core Phenom I (450M transistors @ 285mm², 65nm process) and the quad-core Phenom II (758M transistors @ 258mm², 45nm process).

With the 32nm process they should be able to double the transistor count of 45nm and keep the same die size, so if BD has 1.5B transistors it should be around 250-260mm², and because of the CMT design maybe even smaller, 240mm²??

If the 320mm² is correct, then how did the CMT design help with die size?? Unless BD has more than 1.5B transistors, I don't believe the die will be bigger than the quad-core Phenom II.

Could it be that the 320mm² is in fact 220-230mm²??

One possibility is that the wafer shown was for the original 45nm version.
 

Abwx

Lifer
Apr 2, 2011
10,949
3,462
136
Found this posted at hardware.fr..

Actually, we already have such an issue known for Bulldozer, and NO bench-marked system has the patch installed!

The shared L1 cache is causing cross-invalidations across threads, so that the prefetched data is incorrect in too many cases and must be fetched again. The fix is a "simple" memory alignment and (possibly) tagging system in the Windows/Linux kernel.

I reviewed the code for the Linux patch and was astonished by just how little I know of the Linux kernel... lol! In any event, it could easily cost 10% in terms of single threaded performance, possibly more than double that in multi-threaded loads on the same module due to the increased contention and randomness of accesses.

Not sure if ordained reviewers have been given access to the MS patch, but I'd imagine (and hope) so! Last I saw, the Linux kernel patch was still being worked on by AMD (publicly) and Linus was showing some distaste for the method used to address the issue. One comment questioned the performance cost but had received no replies... but you don't go re-working kernel memory mapping for anything less than 5-10%... just not worth it!


http://www.xtremesystems.org/forum [...] ost4969164
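
To make the "memory alignment" idea in the quoted post concrete, here is a minimal sketch of how a set-indexed cache picks a set from a virtual address, and how forcing the index bits above the page offset to match makes two mappings land in the same set. The cache geometry (64 KiB, 2-way, 64-byte lines, 4 KiB pages) and every name in the code are assumptions chosen for illustration; this is not the actual kernel patch, and the post itself does not specify the mechanism in this detail.

```python
PAGE_BYTES = 4096          # 4 KiB pages
LINE_BYTES = 64            # assumed cache-line size
CACHE_BYTES = 64 * 1024    # assumed L1 size
WAYS = 2                   # assumed associativity

WAY_BYTES = CACHE_BYTES // WAYS    # address range covered by the set index (32 KiB)
SETS = WAY_BYTES // LINE_BYTES     # number of sets (512)
# Index bits that sit above the page offset, so the OS controls them: 0x7000.
ALIAS_MASK = (WAY_BYTES - 1) & ~(PAGE_BYTES - 1)


def set_index(vaddr: int) -> int:
    """Cache set selected by a virtual address."""
    return (vaddr // LINE_BYTES) % SETS


def align_to_reference(addr: int, reference: int) -> int:
    """Smallest page-aligned address >= addr whose alias bits match reference."""
    candidate = (addr & ~(WAY_BYTES - 1)) | (reference & ALIAS_MASK)
    if candidate < addr:
        candidate += WAY_BYTES
    return candidate


# Two hypothetical virtual mappings of the same shared code page,
# e.g. one per thread running on the two cores of a module.
map_a = 0x7F0000015000
map_b = 0x7F0000003000

print("before:", set_index(map_a), set_index(map_b))

# Move the second mapping so its alias bits agree with the first.
map_b_fixed = align_to_reference(map_b, map_a)
print("after: ", set_index(map_a), set_index(map_b_fixed))
```

In this sketch the two original mappings select different sets for the same underlying page, which is the kind of aliasing that could make one thread's fill evict or invalidate the other's copy. After alignment both mappings index the same set, at the cost of some freedom in where the mapping can be placed.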
 

frostedflakes

Diamond Member
Mar 1, 2005
7,925
1
0
How do "cross invalidations across threads" negatively affect single threaded performance? :hmm:

Anyway, hopefully something that can be remedied with a hotfix is wrong, because the performance numbers we've been seeing are pretty underwhelming. I have a feeling this is just wishful thinking, but who knows, we'll have the full story in a couple more days I guess.
 

Abwx

Lifer
Apr 2, 2011
10,949
3,462
136
How do "cross invalidations across threads" negatively affect single threaded performance? :hmm:

There is no such thing as a purely single-threaded situation. Even in so-called single-threaded benchmarks, the OS will use the available cores to execute whatever is data-independent of the main thread.

So if the OS uses a second core, even lightly, chances are it will be the second core of the module whose first core is executing the main thread, leading to the problem highlighted in this XS post.
 

frostedflakes

Diamond Member
Mar 1, 2005
7,925
1
0
10-20% performance improvement seems too good to be true. I think some people are just willing to cling onto anything that could suggest BD isn't the flop benchmarks so far are making it out to be. But like I said, guess we'll see soon enough.

Is there more info on this issue and the fix? The poster mentioned that AMD has been working publicly on a patch for the Linux kernel; does anyone have a link to the public discussions on it?
 

Crap Daddy

Senior member
May 6, 2011
610
0
0
What's going on? Now we need patches, driver updates and such for CPUs? The review samples were sent out long ago, and the reviews are probably done and waiting for the NDA to lift. Based on what we have right now, the early leaks and the latest thorough reviews by Lab501 and Donanimhaber, the show is over. Bulldozer as it is right now is a big disappointment.
 

Riek

Senior member
Dec 16, 2008
409
14
76
What's going on? Now we need patches, driver updates and such for CPUs? The review samples were sent out long ago, and the reviews are probably done and waiting for the NDA to lift. Based on what we have right now, the early leaks and the latest thorough reviews by Lab501 and Donanimhaber, the show is over. Bulldozer as it is right now is a big disappointment.

Performance is so far below what the design led people to expect that the speculation points to a huge design fault, in which case everybody hopes it can be resolved with software. Since the samples are already with reviewers, software is the only thing that can change at the moment, so it is more hope than anything.
It's a bit like the 69xx debacle, where everybody hoped the Cat 11 drivers would make performance skyrocket. Although the 69xx cards are probably the best buys, everybody expected them to beat the 580 from Nvidia.

The difference here, however, is that BD has the bigger die and the higher power consumption, whereas the 69xx had huge advantages in both of those areas.
 

wonderbread57

Junior Member
Oct 4, 2011
22
0
0
Performance is so far below what the design led people to expect that the speculation points to a huge design fault, in which case everybody hopes it can be resolved with software. Since the samples are already with reviewers, software is the only thing that can change at the moment, so it is more hope than anything.
Graphics cards have software drivers; what do CPUs have that can be improved with software, other than using the latest compilers?
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
Given the 315mm² die, and even accounting for the server-only circuits that are disabled in the desktop version, BD should be 30 to 50% faster than an X6 in multithreaded tasks for the new uarch to make sense, so it's troubling that the first numbers are way below AMD's own claims.

As already pointed out, the current leaks show amazingly low L1 write/copy bandwidth, two to three times slower than an X6. Assuming the leaks are legit, it remains to be seen whether the current version is literally, if unintentionally, crippled.

Obviously it's hard to tell with leaks, but I've never seen false leaks that were this far below expectations. I think that the leakers themselves were shocked by the data.
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
10-20% performance improvement seems too good to be true. I think some people are just willing to cling onto anything that could suggest BD isn't the flop benchmarks so far are making it out to be. But like I said, guess we'll see soon enough.

Is there more info on this issue and the fix? The poster mentioned that AMD has been working publicly on a patch for the Linux kernel; does anyone have a link to the public discussions on it?

Actually, I asked jfamd about this exact issue 6 months ago, and iirc he said that they were working on it. Apparently they didn't work on it fast enough, and something like this would explain the counterintuitive drop in performance relative to the Phenom II X6.
 