Fudzilla: Bulldozer performance figures are in

Status
Not open for further replies.

Abwx

Lifer
Apr 2, 2011
That slide is very vague. "128 bit FP" is what exactly? FMA4-optimized? Or legacy SIMD? The leaked C11.5 numbers show that in legacy SIMD the "8C" Bulldozer is not faster than the X6, which runs at a lower clock.

According to this slide, the P2 X6 can do 48 FLOPs/cycle, which is quite respectable and leads me to think that CB can extract only half of BD's throughput, i.e. 32 FLOPs/cycle, which would explain why the score is close to an SB 2600K, which also has 32 FLOPs/cycle.

So CB undoubtedly has no FMA support, but BD still manages to do as well as the X6, which means a far more efficient FPU in terms of latency and execution speed.

There's a little more to it when you consider that each of BD's FPUs can execute both FADDs and FMULs, while SB and the X6 have one FPU for FADD and another for FMUL, and the code is probably optimized for the latter case.
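The "half of BD's throughput" argument hinges on FMA counting as two operations. A minimal Python sketch of that arithmetic, assuming 4 modules with two 128-bit FMAC pipes each (the figure AtenRa gives later in the thread) and single-precision data; `bulldozer_peak_sp` is a hypothetical helper, not anything from AMD:

```python
# Peak single-precision (SP) FLOPs per cycle for a 4-module, "8-core" Bulldozer.
# Assumptions: two 128-bit FMAC pipes per module (as stated in the thread),
# and a fused multiply-add counts as two floating-point operations.

LANES_128_SP = 128 // 32  # a 128-bit register holds four 32-bit floats


def bulldozer_peak_sp(modules: int, use_fma: bool) -> int:
    """Hypothetical helper: modules x 2 FMAC pipes x 4 lanes x ops per lane."""
    ops_per_lane = 2 if use_fma else 1  # FMA = one multiply plus one add
    return modules * 2 * LANES_128_SP * ops_per_lane


if __name__ == "__main__":
    with_fma = bulldozer_peak_sp(4, use_fma=True)
    without_fma = bulldozer_peak_sp(4, use_fma=False)
    print(f"with FMA: {with_fma}, without FMA: {without_fma}")
    print(f"share of peak reachable without FMA: {without_fma / with_fma:.0%}")
```

Under these assumptions the two figures come out to 64 and 32, so a benchmark built without FMA could reach at most half of the chip's theoretical peak.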
 

JFAMD

Senior member
May 16, 2009
A guy at Hardware.fr provided this link with what seems
to be the AMD official numbers...

If they were official AMD numbers, they would be on AMD.com.

That slide is very vague. "128 bit FP" is what exactly? FMA4-optimized? Or legacy SIMD? The leaked C11.5 numbers show that in legacy SIMD the "8C" Bulldozer is not faster than the X6, which runs at a lower clock.

That is my slide. 128-bit FP is today's SSE.

According to this slide, the P2 X6 can do 48 FLOPs/cycle, which is quite respectable and leads me to think that CB can extract only half of BD's throughput, i.e. 32 FLOPs/cycle, which would explain why the score is close to an SB 2600K, which also has 32 FLOPs/cycle.

No, that is a server slide (I made it) and that refers to the current 12-core Magny Cours.
 

Abwx

Lifer
Apr 2, 2011
If they were official AMD numbers, they would be on AMD.com.

Let's say unofficially official...


No, that is a server slide (I made it) and that refers to the current 12-core Magny Cours.

The slide says "Phenom II X6" and "Sandy Bridge"; are you sure we're talking about the same one?...

[Image: the slide under discussion]
 

JFAMD

Senior member
May 16, 2009
Let's say unofficially official...


The slide says "Phenom II X6" and "Sandy Bridge"; are you sure we're talking about the same one?...


1. Let's not.

2. Crap, someone took my slide and changed the titles but did not bother to change the numbers at the bottom.

Phenom has 6 cores; each can handle one 128-bit FP operation, which is 4x32-bit. That should be 6x4 = 24. Magny Cours has 12 cores, or 12x4 = 48.


Speaking about servers, you mentioned over at Hardforum......And yet, this "Opteron 6282SE" SKU shows up on this server configuration list? :hmm:

I will retract the 6282 comment; there are some model number changes. But I will tell you that the 2GHz comment in the original statement stands.
 

Abwx

Lifer
Apr 2, 2011

Phenom has 6 cores; each can handle one 128-bit FP operation, which is 4x32-bit. That should be 6x4 = 24. Magny Cours has 12 cores, or 12x4 = 48.

Phenom has two FPUs per core, one to execute FADD and one dedicated to FMUL, both 128 bits wide. Hence, if they can work simultaneously, they can provide 8 single-precision FLOPs/cycle/core, so the 48 FLOPs/cycle for an X6 is quite plausible.
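The disagreement is about how to count, not about the hardware. A small Python sketch comparing the two conventions, assuming (as both posters do) one 128-bit FADD pipe and one 128-bit FMUL pipe per K10 core and single-precision data; the function names are illustrative only:

```python
# Two counting conventions for peak SP FLOPs/cycle on K10-class cores.
# Assumption (from the thread): each core has one 128-bit FADD pipe and
# one 128-bit FMUL pipe.

LANES_128_SP = 128 // 32  # four 32-bit floats per 128-bit operation


def one_op_per_core(cores: int) -> int:
    """JFAMD's count: one 128-bit FP operation per core per cycle."""
    return cores * LANES_128_SP


def add_and_mul_per_core(cores: int) -> int:
    """Abwx's count: a 128-bit FADD and a 128-bit FMUL issued in the same cycle."""
    return cores * 2 * LANES_128_SP


if __name__ == "__main__":
    for name, cores in [("Phenom II X6", 6), ("Magny-Cours", 12)]:
        print(f"{name}: {one_op_per_core(cores)} vs {add_and_mul_per_core(cores)}")
```

Both conventions produce a 48, but for different chips (Magny-Cours under the first, the X6 under the second), which is why a relabeled slide with unchanged numbers is easy to misread.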
 

AtenRa

Lifer
Feb 2, 2009
John counts each FP unit (the FADD and FMUL pipes together) as one 128-bit FP operation.

So each FP unit can do one 128-bit FP operation (4x 32-bit FLOPs) or one 256-bit FP operation (8x 32-bit FLOPs).

Phenom II X6 has 6 cores and each core can do 4x 32-bit FLOPs: 4 x 6 = 24 FLOPs.
If the original slide was referring to the 12-core Magny Cours, then it would be 2x 24 = 48 FLOPs.

Sandy Bridge has 4 cores and each core can do 4x 32-bit FLOPs: 4 x 4 = 16 FLOPs.
If the original slide was for servers, it was referring to an 8-core SB, so it would be 2x 16 = 32 FLOPs, or 64 FLOPs with 256-bit AVX.

Bulldozer has 4 modules and each module has 2x 128-bit FMACs, that is 2x (32-bit x 4) = 8 FLOPs per module x 4 = 32 FLOPs.
If the original slide was meant for the 16-core BD Interlagos, then it would be 2x 32 = 64 FLOPs, or 64 FLOPs with 256-bit AVX (each module can only do one 256-bit AVX operation).

Edit: 128-bit AVX will again be the same as 128-bit legacy = 32 FLOPs (desktop) or 64 FLOPs for Interlagos ;)
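AtenRa's table follows a single rule: each FP pipe issues one SIMD operation per cycle, and FLOPs equal the number of 32-bit lanes (no FMA doubling). A Python sketch that regenerates his figures under exactly that rule; the pipe counts per core or module are taken from his post, and the helper names are hypothetical:

```python
# Reproduces AtenRa's per-chip counts: each FP pipe issues one SIMD
# operation per cycle, and FLOPs = number of 32-bit lanes (no FMA doubling).


def sp_lanes(width_bits: int) -> int:
    """32-bit lanes in a vector of the given width."""
    return width_bits // 32


def peak(units: int, pipes_per_unit: int, width_bits: int) -> int:
    """units (cores or modules) x pipes per unit x 32-bit lanes per op."""
    return units * pipes_per_unit * sp_lanes(width_bits)


# (label, cores or modules, pipes at 128-bit, pipes at 256-bit or None)
CHIPS = [
    ("Phenom II X6", 6, 1, None),         # no AVX
    ("Magny-Cours 12-core", 12, 1, None),
    ("Sandy Bridge 4-core", 4, 1, 1),
    ("Sandy Bridge 8-core", 8, 1, 1),
    ("Bulldozer 4-module", 4, 2, 1),      # two 128-bit FMACs, one 256-bit op
    ("Interlagos 8-module", 8, 2, 1),
]

if __name__ == "__main__":
    for label, units, p128, p256 in CHIPS:
        sse = peak(units, p128, 128)
        avx = peak(units, p256, 256) if p256 is not None else "n/a"
        print(f"{label}: 128-bit {sse}, 256-bit {avx}")
```

The Bulldozer rows show why 128-bit and 256-bit code land on the same number here: a module either runs two 128-bit operations or one 256-bit operation per cycle.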
 

sm625

Diamond Member
May 6, 2011
JFAMD, I know you work on the server side, but I was wondering if you could give us an educated guess as to what kind of gaming performance increase we will see once they start using FMA in the graphics driver. In all these threads I haven't seen anyone mention the impact the new instructions could have on video drivers. Yet since ATI and AMD are now one, you'd expect those new instructions to be utilized right away.
 

Abwx

Lifer
Apr 2, 2011
John counts each FP unit (the FADD and FMUL pipes together) as one 128-bit FP operation.

So each FP unit can do one 128-bit FP operation (4x 32-bit FLOPs) or one 256-bit FP operation (8x 32-bit FLOPs).

Phenom II X6 has 6 cores and each core can do 4x 32-bit FLOPs: 4 x 6 = 24 FLOPs.
If the original slide was referring to the 12-core Magny Cours, then it would be 2x 24 = 48 FLOPs.

Sandy Bridge has 4 cores and each core can do 4x 32-bit FLOPs: 4 x 4 = 16 FLOPs.
If the original slide was for servers, it was referring to an 8-core SB, so it would be 2x 16 = 32 FLOPs, or 64 FLOPs with 256-bit AVX.

Bulldozer has 4 modules and each module has 2x 128-bit FMACs, that is 2x (32-bit x 4) = 8 FLOPs per module x 4 = 32 FLOPs.
If the original slide was meant for the 16-core BD Interlagos, then it would be 2x 32 = 64 FLOPs, or 64 FLOPs with 256-bit AVX (each module can only do one 256-bit AVX operation).

Edit: 128-bit AVX will again be the same as 128-bit legacy = 32 FLOPs (desktop) or 64 FLOPs for Interlagos ;)

These numbers for SB and the X6 would be right only if these CPUs were not capable of executing an FADD and an FMUL simultaneously, which is doubtful...
 

trollolo

Senior member
Aug 30, 2011
Why are you kids even still arguing about this? AMD has more cores, which means Intel can't compete.
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
It's true, AMD is going to give us a $1000 processor for a mere $200. What's not to love about that?

:awe:

Seriously, though. I still think the same: a dog in single-threaded and mildly multi-threaded (games), competitive in multi-threaded. Thuban Part 2.
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
Because a 6-core Phenom II X6 is SOOOO much better than a 4-core i5 2500K *rolls eyes*



It doesn't fare badly in multi-threaded work, though:

[Benchmark charts: Lightroom, x264, Premiere, Cinebench, 3ds Max, plus one untitled chart]


I'm only expecting the FX-8120 to do a bit better than the Phenom II X6 1090T. 15% better in multi-threaded, perhaps.
 

3DVagabond

Lifer
Aug 10, 2009


It doesn't fare badly in multi-threaded work, though:

I'm only expecting the FX-8120 to do a bit better than the Phenom II X6 1090T. 15% better in multi-threaded, perhaps.

Unfortunately, unless it beats it in gaming, the average person on these boards will see it as a fail. They'll say, who cares if it beats it in x, y, and z? Those are just for fringe users. :rolleyes:

It looks like it's going to be a good "catch up" for AMD. I don't know why people can't comprehend that if AMD can't catch up to and pass Intel in one giant Hail Mary pass, it's not the end of the world. Intel is the 800 lb. gorilla. Bulldozer was actually supposed to be out a long time ago, but has been delayed. It looks like it's going to be competitive with Intel's current i7 chips. Not too shabby, IMO. Hopefully it can mature and improve so AMD won't be left behind again. We need competition to drive the technology.
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
Unfortunately, unless it beats it in gaming, the average person on these boards will see it as a fail. They'll say, who cares if it beats it in x, y, and z? Those are just for fringe users. :rolleyes:

It looks like it's going to be a good "catch up" for AMD. I don't know why people can't comprehend that if AMD can't catch up to and pass Intel in one giant Hail Mary pass, it's not the end of the world. Intel is the 800 lb. gorilla. Bulldozer was actually supposed to be out a long time ago, but has been delayed. It looks like it's going to be competitive with Intel's current i7 chips. Not too shabby, IMO. Hopefully it can mature and improve so AMD won't be left behind again. We need competition to drive the technology.

True, but the problem is what I've outlined earlier: for desktop workloads, it's better to have four fast cores than eight slow ones. If you have a CMP quad-core with 30% higher IPC than a CMT eight-core, then at equal clocks it'll be 30% faster in single-threaded work and 20-25% faster in mainstream gaming (1920x1080), since games only use two to four threads. Then there are the multi-threaded programs, and that's where it has a chance to be competitive. How much CMT reduces performance compared to CMP when all resources are in use, we don't know exactly yet.

As I've said many times, look at the clock speeds, number of cores, and pricing, and you'll see it's still at a significant disadvantage in IPC.
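The percentages above come from a simple IPC x clock x active-cores model. A toy Python sketch of it, assuming equal clocks, the post's 30% IPC advantage for the quad-core, and a made-up 0.8 per-core efficiency when both cores of a CMT module are busy (the post itself says this penalty is not yet known); `throughput` and `pair_scaling` are hypothetical names:

```python
# Toy throughput model: relative performance = IPC x clock x effective cores.
# Assumptions (hypothetical, not from AMD or Intel): cores are paired into
# modules; when both cores of a module are busy, each runs at `pair_scaling`
# of its solo speed. pair_scaling=1.0 models a plain CMP design.


def throughput(threads: int, cores: int, ipc: float, ghz: float,
               pair_scaling: float = 1.0) -> float:
    active = min(threads, cores)
    modules = cores // 2
    # Assume the scheduler gives each module one thread before doubling up.
    shared_modules = max(0, active - modules)
    solo_cores = active - 2 * shared_modules
    effective = solo_cores + 2 * shared_modules * pair_scaling
    return ipc * ghz * effective


if __name__ == "__main__":
    # Equal clocks; the quad-core has 30% higher IPC (the post's premise).
    # The 0.8 sharing penalty for the CMT eight-core is a placeholder guess.
    for t in (1, 2, 4, 8):
        quad = throughput(t, cores=4, ipc=1.3, ghz=1.0)
        octo = throughput(t, cores=8, ipc=1.0, ghz=1.0, pair_scaling=0.8)
        print(f"{t} threads: quad/eight-core = {quad / octo:.2f}")
```

The model reproduces the 30% gap up to four threads; it has no notion of GPU limits or memory stalls, so it shows the IPC difference undiluted rather than the 20-25% quoted for games. At eight threads the eight-core pulls ahead here, but that crossover depends entirely on the assumed sharing penalty.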
 

Zstream

Diamond Member
Oct 24, 2005
True, but the problem is what I've outlined earlier: for desktop workloads, it's better to have four fast cores than eight slow ones. If you have a CMP quad-core with 30% higher IPC than a CMT eight-core, then at equal clocks it'll be 30% faster in single-threaded work and 20-25% faster in mainstream gaming (1920x1080), since games only use two to four threads. Then there are the multi-threaded programs, and that's where it has a chance to be competitive. How much CMT reduces performance compared to CMP when all resources are in use, we don't know exactly yet.

As I've said many times, look at the clock speeds, number of cores, and pricing, and you'll see it's still at a significant disadvantage in IPC.

This is only true because of the way we have developed software. If you can optimize for more cores, it's generally best to do so.
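Whether "more cores" wins depends on how much of a program actually parallelizes. Amdahl's law, a standard model that nobody in the thread names, makes that concrete. A sketch assuming ideal scaling of the parallel part, the thread's hypothetical 30% per-core advantage for the quad-core, and no CMT sharing penalty; the helper names are illustrative:

```python
# Amdahl's law with a per-core speed factor: time for a unit of work whose
# parallelizable fraction is p, on n cores that each run at `speed` relative
# to a baseline core. Ideal scaling and no CMT sharing penalty are assumed.


def amdahl_time(p: float, n: int, speed: float = 1.0) -> float:
    """Serial part plus parallel part split over n cores, scaled by core speed."""
    return ((1.0 - p) + p / n) / speed


def break_even_fraction(speed_few: float, n_few: int, n_many: int) -> float:
    """Parallel fraction p at which n_many baseline cores tie n_few faster cores.

    Solves (1 - p*a) / speed_few = 1 - p*b with a = 1 - 1/n_few, b = 1 - 1/n_many.
    """
    a = 1.0 - 1.0 / n_few
    b = 1.0 - 1.0 / n_many
    return (speed_few - 1.0) / (speed_few * b - a)


if __name__ == "__main__":
    for p in (0.5, 0.75, 0.9, 0.99):
        quad = amdahl_time(p, 4, speed=1.3)
        octo = amdahl_time(p, 8)
        faster = "quad-core" if quad < octo else "eight-core"
        print(f"p={p}: {faster} finishes first")
    print(f"break-even p = {break_even_fraction(1.3, 4, 8):.3f}")
```

Under these assumptions the eight-core only pays off once roughly three quarters of the work is parallel, which is the crux of the disagreement between "optimize for more cores" and "most programs will never be multi-threaded".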
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
This is only true because of the way we have developed software. If you can optimize for more cores, it's generally best to do so.


But unfortunately, that's the reality we live with right now. Even if the software were compiled to run eight threads in parallel, if the quad-core has 30% higher IPC and uses CMP, an eight-core with CMT might not beat it by a noticeable amount.

If there were a chance of Bulldozer being like Gulftown, I'd be cheering. It'll probably have noticeably lower IPC, though, so I'm not.

AMD's not gonna give us $585 worth of performance for $210-250. If it had similar IPC to Gulftown and was around the same speed in multi-threaded we'd be seeing a $320-350 price minimum.
 

Zstream

Diamond Member
Oct 24, 2005
But unfortunately, that's the reality we live with right now. Even if the software were compiled to run eight threads in parallel, if the quad-core has 30% higher IPC and uses CMP, an eight-core with CMT might not beat it by a noticeable amount.

If there were a chance of Bulldozer being like Gulftown, I'd be cheering. It'll probably have noticeably lower IPC, though, so I'm not.

AMD's not gonna give us $585 worth of performance for $210-250. If it had similar IPC to Gulftown and was around the same speed in multi-threaded we'd be seeing a $320-350 price minimum.

Of course, I never denied that software is currently coded this way. However, Windows 8 is changing the way software functions. In fact, IE 10 is multi-threaded: JavaScript functions, etc...

To respond to the price statement: I agree, but I have to point out that we have not seen how an optimized software package performs on Bulldozer. Until then, it's all speculation. Based on the architecture, I still say it will be a SQL and VM monster.
 

3DVagabond

Lifer
Aug 10, 2009
True, but the problem is what I've outlined earlier: for desktop workloads, it's better to have four fast cores than eight slow ones. If you have a CMP quad-core with 30% higher IPC than a CMT eight-core, then at equal clocks it'll be 30% faster in single-threaded work and 20-25% faster in mainstream gaming (1920x1080), since games only use two to four threads. Then there are the multi-threaded programs, and that's where it has a chance to be competitive. How much CMT reduces performance compared to CMP when all resources are in use, we don't know exactly yet.

As I've said many times, look at the clock speeds, number of cores, and pricing, and you'll see it's still at a significant disadvantage in IPC.

IPC is only one measure of performance. I'm not trying to dismiss its importance, nor am I trying to say that single-core performance is not important. AMD simply releasing a CPU that is competitive with Intel is a major accomplishment considering the resources of the two companies, and I don't necessarily agree with your (or others') definition of desktop workloads.

We'd all still be using P4s if Intel had no competition in the desktop market, and we'd be paying $1000 for one. A chip from AMD that is on par performance-wise with the current crop of i7 chips is a very good thing. I understand that AMD is doing it with more cores (if you accept that they are actually cores) and higher clocks. It's not as impressive as doing it Intel's way, with high IPC and state-of-the-art, cutting-edge fabs, but if it's effective and it works with AMD's current resources, what's the big problem?

Everyday programs that lots of people use are very quickly becoming multi-threaded. Especially tasks which are very time consuming. The tasks that typically only require one core aren't generally very time consuming and are very doable with a slower core (as long as it's not ridiculously slow), as well.
 

reb0rn

Senior member
Dec 31, 2009
You think you could just code or re-encode a program to be multi-threaded like that??
Most programs will never be multi-threaded.
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
Of course, I never denied that software is currently coded this way. However, Windows 8 is changing the way software functions. In fact, IE 10 is multi-threaded: JavaScript functions, etc...

To respond to the price statement: I agree, but I have to point out that we have not seen how an optimized software package performs on Bulldozer. Until then, it's all speculation. Based on the architecture, I still say it will be a SQL and VM monster.

The thing is, even if we're to assume everything is multi-threaded:

You have CPU X with eight cores, a CMT architecture, and a 3.6GHz clock speed. You have CPU Y with four cores, a CMP architecture, and a 3.3GHz clock speed. If your IPC is too low, you won't get that much multi-threaded speed. The Phenom II X6 1100T has two more cores than the Core i5-2500K and can only match it in multi-threaded workloads.
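The CPU X vs CPU Y thought experiment reduces to a break-even IPC ratio for fully threaded work. A Python sketch, assuming throughput scales as cores x clock x IPC and treating CMT overhead on CPU X as a hypothetical efficiency factor (the thread does not know its real value); `breakeven_ipc_ratio` is an illustrative name:

```python
# Break-even IPC for the CPU X vs CPU Y thought experiment, fully threaded.
# Assumption: throughput = cores x clock x IPC x scaling, where `scaling_x` is a
# hypothetical CMT efficiency for CPU X (1.0 = no penalty).


def breakeven_ipc_ratio(cores_x: int, ghz_x: float, cores_y: int, ghz_y: float,
                        scaling_x: float = 1.0) -> float:
    """IPC_Y / IPC_X at which CPU Y's all-core throughput equals CPU X's."""
    return (cores_x * ghz_x * scaling_x) / (cores_y * ghz_y)


if __name__ == "__main__":
    for s in (1.0, 0.9, 0.8):
        r = breakeven_ipc_ratio(8, 3.6, 4, 3.3, scaling_x=s)
        print(f"CMT scaling {s:.1f}: CPU Y needs {r:.2f}x CPU X's IPC to tie")
```

Equivalently, under these assumptions CPU X stays ahead in fully threaded work as long as its IPC is above roughly 46% (no CMT penalty) to 57% (0.8 scaling) of CPU Y's; below that, "your IPC is too low".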
 

3DVagabond

Lifer
Aug 10, 2009
The thing is, even if we're to assume everything is multi-threaded:

You have CPU X with eight cores, a CMT architecture, and a 3.6GHz clock speed. You have CPU Y with four cores, a CMP architecture, and a 3.3GHz clock speed. If your IPC is too low, you won't get that much multi-threaded speed. The Phenom II X6 1100T has two more cores than the Core i5-2500K and can only match it in multi-threaded workloads.

That's all it needs to do. A different approach to the same end. The problem comes when it's too slow in single-threaded applications. We'll have to see, but BD, with its aggressive turbo and fatter front end, is likely to solve that. If it can perform as well at the same power and cost, why isn't that acceptable?
 