YABulldozerT: AMD FX Processor Prices Lower Than Expected

Page 5 - AnandTech community

grimpr

Golden Member
Aug 21, 2007
More yoda talk from mainland China.

Know the truth

1) The full Bulldozer product line has been announced; the FX-4170 runs at up to 4.2GHz. http://diybbs.zol.com.cn/11/11_100683.html
2) The Bulldozer B2-stepping scores from coolaler and corescn (after switching to a replacement 970 motherboard) are genuine, given the premise stated in "I know the results."
2.1) coolaler tested two sets of NB frequencies and found that WPrime results at NB 2.2GHz and 2.6GHz were better than at NB 2.0GHz and 2.4GHz (see the known B2-stepping bug above).
2.2) Another test showed that the FX-8150 at default clocks with 1866MT/s memory scores 5.95 in Cinebench R11.5. After overclocking to 4.8GHz with the NB at 2.2GHz, the Cinebench R11.5 score rises to 7.8. (I can only vouch for the authenticity of the default-clock result.)
3) The C0 stepping improves overclocking ability and lowers the TDP of some products.
4) shopblt.com and provantage.com have started taking Bulldozer pre-orders; whether the listings are genuine is unknown.
5) The so-called official Bulldozer results from AMD: http://diybbs.zol.com.cn/11/11_100797.html
6) AGESA is the code needed to develop the BIOS. Some current BIOSes do not use the latest AGESA version, which will ultimately limit Bulldozer's performance.
7) Retail chips (B2G) have started production, with higher performance than the engineering samples (B2). (Please read my microblog.)
8) One claim puts a final chip's Fritz Chess score as high as 17000; its authenticity remains to be verified. (Refer to item 2.)
9) All reports point to October 13 as the date the NDA lifts, but their authenticity is hard to trace; it may turn out like the earlier September 6 and August 26 rumors. In the end, only AMD determines the release.

http://translate.google.com/transla...&u=http://diybbs.zol.com.cn/11/11_100864.html
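A quick sanity check on item 2.2: if the reported scores are real, how closely did Cinebench R11.5 track the overclock? This is a rough sketch that assumes a 3.6GHz stock base clock for the FX-8150 (not stated in the post) and ignores Turbo Core and the NB increase, so treat the result as approximate.

```python
# Rough scaling check for the Cinebench R11.5 numbers in item 2.2.
# ASSUMPTION: 3.6GHz stock base clock; Turbo Core and the NB bump are ignored.
DEFAULT_SCORE = 5.95    # reported, FX-8150 at default clocks, 1866MT/s memory
OC_SCORE = 7.8          # reported, 4.8GHz core with NB at 2.2GHz
ASSUMED_BASE_GHZ = 3.6  # assumption, not from the post
OC_GHZ = 4.8


def scaling_efficiency(score_a: float, score_b: float,
                       clock_a: float, clock_b: float) -> float:
    """Score gain divided by clock gain; 1.0 means perfectly linear scaling."""
    return (score_b / score_a) / (clock_b / clock_a)


score_gain = OC_SCORE / DEFAULT_SCORE
clock_gain = OC_GHZ / ASSUMED_BASE_GHZ
efficiency = scaling_efficiency(DEFAULT_SCORE, OC_SCORE, ASSUMED_BASE_GHZ, OC_GHZ)
print(f"score gain {score_gain:.3f}x, clock gain {clock_gain:.3f}x, "
      f"efficiency {efficiency:.2f}")
```

Near-linear scaling under these assumptions only means the two numbers are consistent with each other; it says nothing about whether either one is genuine.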
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
But those cores have the module penalty, right?

So those 16 cores, at 80%, are only 12.8 cores, so that 35% would have to come from IPC and clock speed.

But since the clock speed is the same, either the IPC is way higher or the performance won't be up 35%.

They said "up to" 35% higher performance. If they got that "up to" figure using an application that only requires integer execution then they may lose very little performance, if any.
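The quoted arithmetic can be written out to make its assumptions explicit. This sketch assumes 12 Magny-Cours cores versus 16 Bulldozer cores and AMD's "up to 35%" figure, and applies the 80% module factor two ways: flat across all 16 cores (as in the quoted post) and as 100% + 80% per two-core module (the 180% reading that comes up later in the thread). Both are interpretations, not AMD data.

```python
def required_per_core_gain(bd_effective_cores: float,
                           mc_cores: int = 12,
                           claimed_uplift: float = 1.35) -> float:
    """Per-core (IPC x clock) gain needed for 16 BD cores to reach the
    claimed uplift over 12 MC cores, given an effective BD core count."""
    return mc_cores * claimed_uplift / bd_effective_cores


# Reading 1 (quoted post): every one of the 16 cores runs at 80%.
flat = 16 * 0.80                 # 12.8 effective cores
# Reading 2: 8 modules, each worth 100% + 80% of a core.
per_module = 8 * (1.0 + 0.80)    # 14.4 effective cores

for name, eff in (("flat 80%", flat), ("100% + 80% per module", per_module)):
    print(f"{name}: {eff:.1f} effective cores, "
          f"needs {required_per_core_gain(eff):.3f}x per core")
```

Under the flat reading the per-core gain needed is roughly 27%; under the per-module reading it drops to about 12.5%, which is why the interpretation of the 80% matters so much for this argument.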
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
You could say the same thing about the Apple A6. How many transistors does it have? How much does it cost to make? We know nothing about it, yet there must be 10 million of those chips somewhere.

Err, no you couldn't.
 

The J

Senior member
Aug 30, 2004
:hmm: about as wise as seeking information on pre-release hardware from a troll-laden forum in the first place? :|

We all get what we pay for here, the info is free and it's worth every penny :D
Good point and agreed! :biggrin:

Even if we ignore all the pricing rumors, if BD is so competitive, why was it delayed so many times, why did it require so many revisions to get higher frequencies, why is it arriving 9 months after SB, why hasn't there been a single positive leak from anyone?
I don't mean to imply anything about BD's competitiveness; I'm as skeptical as anyone else (though I remain somewhat hopeful). It just seems like people have been appropriately skeptical of performance leaks (both good and bad), but have accepted these prices without question. Even those who believe the benchmark leaks are fake/not representative are formulating their arguments on the basis that these prices are correct ("AMD has a history of pricing this way...", and so on) rather than simply questioning the prices themselves.

This was true across a few different forums (at least what I saw, anyway), which is why I wanted to know if the BLT website is known for being reliable. I thought maybe they are given the amount of trust being placed in them, but perhaps they're just following the rumors like the rest of us.

Maybe the trend would have been different if the prices were leaked before a lot of the benches?

It's not to say that the pricing is reliable, but any bit of information brings more clues and answers. I was also referring to the price of Opteron 6272 being lower than Opteron 6176 SE earlier from another site. :hmm:
Your first sentence there makes sense and I think is the explanation I was looking for.

I did sort of figure that AMD would try to use their "it's got 8 cores!!" marketing to fetch a higher price from the early adopters and overclockers, but maybe that is what they're doing and prices will drop even lower soon after availability.
 

BlueBlazer

Senior member
Nov 25, 2008
It's a collection of information that's already been posted/discussed earlier. The Fritz chess score was debunked. If I'm not mistaken, that "C0 stepping" speculation came from the SemiAccurate forums. Another detail not posted here is that the Cinebench score of 7.8 was with an FX-8150 overclocked to 4.8GHz, as mentioned on the XS forums (by a poster from Vietnam who supposedly has one). :hmm:
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011

You're forgetting the part about pricing coming from stores that are gonna sell the CPU. Pre-order pricing is always very close to launch pricing.

So yes, it's pretty much a fact that the FX-8150 will cost ~$250-275.

But by all means, be optimistic. The reality of it is that it's not as competitive as they hoped for, so they're pricing lower and trying desperately to raise clock speeds. Bulldozer, like Thuban, will be good for multi-threaded applications and at a big deficit in anything that's not.
 

njdevilsfan87

Platinum Member
Apr 19, 2007
I'm disappointed, but not at the same time. I've had the need to load up more cores lately (E6600 is about to be upgraded to a used 45nm Xeon quad) so if they release something that's 8-cores for $160... I could get myself a multi-tasking monster for pretty cheap if all I'm going to need is the chip and motherboard.
 

GaiaHunter

Diamond Member
Jul 13, 2008
They said "up to" 35% higher performance. If they got that "up to" figure using an application that only requires integer execution then they may lose very little performance, if any.

They said "up to 80%" performance of a traditional dual-core design.

The integer portion seems to be the one more prone to lose performance - the FPU has more theoretical capability than 2x K10.5's FPU.
 

The J

Senior member
Aug 30, 2004
You're forgetting the part about pricing coming from stores that are gonna sell the CPU. Pre-order pricing is always very close to launch pricing.

So yes, it's pretty much a fact that the FX-8150 will cost ~$250-275.

But by all means, be optimistic. The reality of it is that it's not as competitive as they hoped for, so they're pricing lower and trying desperately to raise clock speeds. Bulldozer, like Thuban, will be good for multi-threaded applications and at a big deficit in anything that's not.

That's what I wanted to be sure of, thanks! I'll continue being cautiously optimistic! :p
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
They said "up to 80%" performance of a traditional dual-core design.

The integer portion seems to be the one more prone to lose performance - the FPU has more theoretical capability than 2x K10.5's FPU.

Actually, it's the other way around. A module has two full, normal integer cores that do not share any resources. When it comes to that it's the same as all other CPUs in execution. Where it changes is the FPU, since if only a single thread is executed on the module it'll use all the resources. That means it'll be able to handle 256-bit AVX (used for encryption) and FP SSE. If there's two threads being executed, then each one is forced to share resources. How much real-world performance they lose from it, we don't know. Every module has 2x 128-bit FMAC.

Also, they meant to say 180% scaling in comparison to two typical cores, or 200%.
 

NostaSeronx

Diamond Member
Sep 18, 2011
They said "up to" 35% higher performance. If they got that "up to" figure using an application that only requires integer execution then they may lose very little performance, if any.

The 35% was for the HPC market, and it's a downplayed number because it refers to current apps

33% more cores + 2% improvement (0.125% improvement per core)
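One way to check this split: the core count alone gives 16/12 = 1.333x, about 33%, and adding percentages leaves roughly 2% for everything else. Because a per-core improvement applies to every core at once, combining the two multiplicatively is arguably the better model, and that puts the needed per-core gain at about 1.25% rather than 2% divided by 16. This is only a sketch of the arithmetic, assuming the 12-core vs 16-core comparison.

```python
MC_CORES, BD_CORES = 12, 16
CLAIMED_UPLIFT = 0.35  # AMD's "up to 35%"

core_ratio = BD_CORES / MC_CORES   # ~1.333
core_uplift = core_ratio - 1       # ~0.333 from core count alone

# Additive split, as in the post: leftover gain divided evenly across cores.
residual_additive = CLAIMED_UPLIFT - core_uplift   # ~0.017, rounded to 2% in the post
per_core_additive = 0.02 / BD_CORES                # 0.00125, the post's 0.125%

# Multiplicative split: cores x per-core speed must equal 1.35x the old total.
per_core_multiplicative = (1 + CLAIMED_UPLIFT) / core_ratio - 1   # ~0.0125

print(f"core count: +{core_uplift:.1%}, residual: +{residual_additive:.1%}")
print(f"per core, additive: {per_core_additive:.3%}, "
      f"multiplicative: {per_core_multiplicative:.2%}")
```

Either way the conclusion that nearly all of the 35% comes from the extra cores holds; the two models only disagree on how small the per-core contribution is.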

Unfortunately, HPC is one of the few places where developers program to the metal

The applications that will be built for Interlagos will use its FMA and 256bit capabilities

They said "up to 80%" performance of a traditional dual-core design.

The integer portion seems to be the one more prone to lose performance - the FPU has more theoretical capability than 2x K10.5's FPU.

Bulldozer CMP = 100% performance
Bulldozer CMT = 80% performance with less mm^2 and less power(watts)

The 80% number wasn't aimed at current products but at a hypothetical Bulldozer using CMP

If there's two threads being executed, then each one is forced to share resources. How much real-world performance they lose from it, we don't know. Every module has 2x 128-bit FMAC.

If two threads both need to execute 256-bit ops on the floating-point unit, an extra cycle will occur for the extra 256-bit op
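A toy model of that claim, using the thread's figure of two 128-bit FMACs per module. It assumes each 256-bit op is split into two 128-bit halves and that the module issues at most one half per FMAC per cycle; pipelining, latency, and scheduling are ignored, so this only illustrates where the "extra cycle" would come from.

```python
import math

FMACS_PER_MODULE = 2   # "Every module has 2x 128-bit FMAC" (from the thread)
FMAC_WIDTH_BITS = 128


def cycles_for(op_widths_bits: list[int]) -> int:
    """Toy model: split each op into 128-bit halves, then issue up to
    one half per FMAC per cycle (two per cycle for the whole module)."""
    halves = sum(math.ceil(bits / FMAC_WIDTH_BITS) for bits in op_widths_bits)
    return math.ceil(halves / FMACS_PER_MODULE)


one_thread = cycles_for([256])         # one 256-bit op, module to itself
two_threads = cycles_for([256, 256])   # each core issues one 256-bit op
print(one_thread, two_threads)
```

In this model two 128-bit ops from different threads still fit in a single cycle; only two simultaneous 256-bit ops force the second cycle.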

Also, they meant to say 180% scaling in comparison to two typical cores, or 200%.
160% vs 200% to be exact

160% w/ Bulldozer Module vs 200% w/ Bulldozer Dual-core w/ Separate Fetch/Decode and Floating Point and L2
 

lau808

Senior member
Jun 25, 2011
Not 160%, 180% of 2 full cores. I'd link it, but it's in all these damn threads, so I'm not gonna bother looking for it
 

NostaSeronx

Diamond Member
Sep 18, 2011
Not 160%, 180% of 2 full cores. I'd link it, but it's in all these damn threads, so I'm not gonna bother looking for it

[Images: AMD Bulldozer slides (bulldozer-5.jpg, bulldozer-6.jpg)]


Do I need to point out where it says 80% of CMP?

Dual-Core CMP => 100%, 200%
Dual-Core CMT => 80%, 160%

It's not rocket science

Also, it's not a static 80%; it's an average of 80%
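The disagreement here is mostly about what the 80% multiplies. A short sketch comparing the two readings in this thread: each CMT core at 80% of a CMP core (160% per module, as argued in this post) versus a first core at 100% plus a second at 80% (180%, as LOL_Wut_Axel and lau808 read it). The numbers are the thread's; the per-core split is an assumption in both cases.

```python
def module_throughput(first_core: float, second_core: float) -> float:
    """Two-thread throughput of one module, in units of one CMP core."""
    return first_core + second_core


CMP_PAIR = module_throughput(1.00, 1.00)   # 2.0: two independent cores

# Reading A: each CMT core averages 80% of a CMP core -> 160%.
reading_a = module_throughput(0.80, 0.80)
# Reading B: the first core is unaffected, the second adds 80% -> 180%.
reading_b = module_throughput(1.00, 0.80)

for name, value in (("A (80% + 80%)", reading_a), ("B (100% + 80%)", reading_b)):
    print(f"{name}: {value:.0%} of one core, {value / CMP_PAIR:.0%} of a CMP pair")
```

Reading A puts a module at 80% of a true dual-core, reading B at 90%, so the two interpretations differ by a full 10 points of throughput per module.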
 

Idontcare

Elite Member
Oct 10, 1999
Dual-Core CMP => 100%, 200%
Dual-Core CMT => 80%, 160%

Owing to the shared-resource design, I thought the single-thread performance was supposed to be better than 1/2 the dual-thread performance on the dual-core CMT microarchitecture?

Something like:
Dual-Core CMP => 1x, 2x
Dual-Core CMT => 1.1x, 1.6x
Because how could a single thread on a single core outperform a single thread on a dual-core module when that thread now has access to 2x the resources in those instances where a second thread is not scheduled on the module?

Or are they saying that the comparator CMP architecture is to take the CMT module, remove ONLY the int core, leave all the shared resources as is, and then make two cores out of that?
 

lau808

Senior member
Jun 25, 2011
I saw a statement where the 2nd core will provide 80% of the performance of the 1st core. Don't even remember the exact wording, but I'm too lazy to look for it, so nvm
 

NostaSeronx

Diamond Member
Sep 18, 2011
Owing to the shared-resource design, I thought the single-thread performance was supposed to be better than 1/2 the dual-thread performance on the dual-core CMT microarchitecture?

My understanding is that this percentage of the average 80% of CMP only deals with the dual-core design
Because how could a single thread on a single core outperform a single thread on a dual-core module when that thread now has access to 2x the resources in those instances where a second thread is not scheduled on the module?

You are talking about Throughput, I think

Or are they saying that the comparator CMP architecture is to take the CMT module, remove ONLY the int core, leave all the shared resources as is, and then make two cores out of that?

I would say....
Bulldozer CMP
2 x 1x32KB L1i
2 x 1x16KB L1d
2 x 4-way decode(might be 2-way but not sure)
2 x Floating Point Coprocessor
2 x Integer Schedulers
2 x Floating Point Schedulers
2 x 1MB L2
2 Cores

Vs
Bulldozer CMT
1 x 64KB L1i
2 x 16KB L1d
1 x 4-way decode
1 x FpCp
2 x Int Schs
1 x Fp Schs
1 x 2MB L2
2 Cores
 

GaiaHunter

Diamond Member
Jul 13, 2008
Actually, it's the other way around. A module has two full, normal integer cores that do not share any resources.

The integer cores do share resources, just look at AMD slides posted a few posts above.


When it comes to that it's the same as all other CPUs in execution. Where it changes is the FPU, since if only a single thread is executed on the module it'll use all the resources. That means it'll be able to handle 256-bit AVX (used for encryption) and FP SSE. If there's two threads being executed, then each one is forced to share resources. How much real-world performance they lose from it, we don't know. Every module has 2x 128-bit FMAC.
Considering that MC can do 0 (ZERO) AVX instructions and 0 (ZERO) 256-bit instructions, a BD module should lose no FP performance compared to 2 MC cores. In fact, a BD module should gain performance compared to 2 K10.5 cores if such instructions are used.

Also, they meant to say 180% scaling in comparison to two typical cores, or 200%.

I've read both, but the 80% is the only one showing up on AMD marketing slides.
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
The 35% was for the HPC market which is a downplayed number because it was talking about current apps

33% more cores + 2% improvement (0.125% improvement per core)

Unfortunately, HPC is one of the few places where developers program to the metal

The applications that will be built for Interlagos will use its FMA and 256bit capabilities



Bulldozer CMP = 100% performance
Bulldozer CMT = 80% performance with less mm^2 and less power(watts)

The 80% number wasn't aimed at current products but at a hypothetical Bulldozer using CMP



If two threads are required to execute on the Floating Point aka 256bit....an extra cycle will occur for the extra 256bit op


160% vs 200% to be exact

160% w/ Bulldozer Module vs 200% w/ Bulldozer Dual-core w/ Separate Fetch/Decode and Floating Point and L2

No, 180%. AMD means 100% for the first core, and 80% performance added from the second core. Therefore, 180% overall. Of course, this is probably an average, and it may very well be that AMD is using this estimate taking into account how much performance will be lost in FP workloads in comparison to having it completely dedicated and using it to round out. It may very well be, like another poster said: 200% integer performance, 160% floating point performance when two threads are being run.
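The "200% integer, 160% floating point" idea can be turned into a simple blend. This sketch assumes module scaling is a linear mix of those two figures weighted by the share of integer work, which is a simplification (real workloads interleave both and also contend for the shared front end). Under that assumption an even 50/50 mix lands exactly on 180%, which is one way the 160% and 180% numbers could both fit an "average" figure.

```python
def blended_scaling(int_share: float,
                    int_scaling: float = 2.0,
                    fp_scaling: float = 1.6) -> float:
    """Two-thread module scaling for a workload mix, assuming a linear
    blend of the integer-only and FP-only figures quoted in the thread."""
    if not 0.0 <= int_share <= 1.0:
        raise ValueError("int_share must be between 0 and 1")
    return int_share * int_scaling + (1.0 - int_share) * fp_scaling


for share in (1.0, 0.5, 0.0):
    print(f"{share:.0%} integer -> {blended_scaling(share):.0%} of one core")
```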

And the number said by AMD when it comes to performance improvements in HPC from Interlagos in comparison to Magny Cours is "UP TO 35% higher", which means it's a best-case scenario. It could definitely be true in integer workloads, though. Needless to say, there's been more info in terms of servers because that's what Bulldozer is mainly aimed at.
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
The integer cores do share resources, just look at AMD slides posted a few posts above.



Considering that MC can do 0 (ZERO) AVX instructions and 0 (ZERO) 256-bit instructions, a BD module should lose no FP performance compared to 2 MC cores. In fact, a BD module should gain performance compared to 2 K10.5 cores if such instructions are used.



I've read both, but the 80% is the only one showing up on AMD marketing slides.

The FPUs share resources, the integer units do not. I've looked at that slide, and that's exactly what it says.
 

GaiaHunter

Diamond Member
Jul 13, 2008
The FPUs share resources, the integer units do not. I've looked at that slide, and that's exactly what it says.

I guess they didn't write shared fetch and shared decode, although as you can see they are shared, but they did write "shared L2$"

Additionally I didn't say the FPU isn't shared by the 2 threads and I also said MC can do 0 256-bit FP instructions and 0 AVX.
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
I guess they didn't write shared fetch and shared decode, although as you can see they are shared, but they did write "shared L2$"

Additionally I didn't say the FPU isn't shared by the 2 threads and I also said MC can do 0 256-bit FP instructions and 0 AVX.

It's 2MB L2 cache per module, which is shared by both cores. It does not result in any performance deficit. It's akin to Core 2 Duo and its 6MB L2 cache, which both cores have access to. All the cores also have access to shared 8MB L3 cache, too. Many architectures have worked like this when it comes to cache in the past.

As for the FPU, what's being compared is not the performance improvement from K10.5 in FP SSE and 256-bit AVX (this one in particular K10.5 can't do at all), but how much performance it loses in comparison to having everything in the unit dedicated. Of course, FP SSE performance should be improved in comparison to K10.5 when only one thread is being run in the module.
 

NostaSeronx

Diamond Member
Sep 18, 2011
As for the FPU, what's being compared is not the performance improvement from K10.5 in FP SSE and 256-bit AVX (this one in particular K10.5 can't do at all), but how much performance it loses in comparison to having everything in the unit dedicated. Of course, FP SSE performance should be improved in comparison to K10.5 when only one thread is being run in the module.

It's unknown if the Floating Point Coprocessor is shared at all

It has one big 60-entry Floating Point Scheduler: Core A tasks the first 30 entries and Core B tasks the second 30

This happens on separate cycles

The Floating Point Coprocessor works pretty much like Intel's Hyper-Threading, which isn't a bad thing; it just means that if Core A tasks a 256-bit FP op and Core B tasks a 256-bit Int op, they can occur simultaneously

I think a lot of people are ignoring the fact that Vertical Multithreading (VMT) means that Core A and Core B have fetches, decodes and tasking abilities on separate cycles

The Front End is VMT, and the Floating Point Front End is also VMT,
but the cores and the floating point unit exploit SMT
(Both cores can execute at the same time, and all the resources in the FPCP can execute at the same time)
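To make the alternating-cycle idea concrete, here is a toy model of a vertically multithreaded front end as this post describes it: each cycle, fetch/decode serves one core, rotating among the cores that currently have a thread. It assumes strict round-robin with no stalls or priority, which real hardware need not follow. It also shows why a lone thread would get every front-end cycle, the single-thread advantage raised elsewhere in the thread.

```python
from collections import Counter


def vmt_front_end(active_cores: list[str], cycles: int) -> Counter:
    """Toy VMT front end: each cycle, fetch/decode serves exactly one core,
    rotating round-robin among the cores that currently have a thread."""
    served = Counter()
    for cycle in range(cycles):
        if active_cores:
            served[active_cores[cycle % len(active_cores)]] += 1
    return served


print(vmt_front_end(["A", "B"], 8))   # both cores busy: cycles alternate
print(vmt_front_end(["A"], 8))        # only core A has a thread
```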

I can't wait till Agner Fog talks about this and makes sense out of this lol
 

GaiaHunter

Diamond Member
Jul 13, 2008
As for the FPU, what's being compared is not the performance improvement from K10.5 in FP SSE and 256-bit AVX (this one in particular K10.5 can't do at all), but how much performance it loses in comparison to having everything in the unit dedicated. Of course, FP SSE performance should be improved in comparison to K10.5 when only one thread is being run in the module.

Didn't this chain of talk spawn from a comparison of performance between 12-core MC and 16-core BD?
 

Idontcare

Elite Member
Oct 10, 1999
No, 180%. AMD means 100% for the first core, and 80% performance added from the second core. Therefore, 180% overall. Of course, this is probably an average, and it may very well be that AMD is using this estimate taking into account how much performance will be lost in FP workloads in comparison to having it completely dedicated and using it to round out. It may very well be, like another poster said: 200% integer performance, 160% floating point performance when two threads are being run.

And the number said by AMD when it comes to performance improvements in HPC from Interlagos in comparison to Magny Cours is "UP TO 35% higher", which means it's a best-case scenario. It could definitely be true in integer workloads, though. Needless to say, there's been more info in terms of servers because that's what Bulldozer is mainly aimed at.

So it would be something more like:
Dual-Core CMP => 1x, 2x
Dual-Core CMT => 1x, 1.8x
But I still don't get how both the CMT and CMP designs would yield the same 1x (i.e. the same IPC).

Shouldn't the CMT architecture have an advantage over the CMP architecture when there is only one thread running owing to the fact that the thread running on the CMT architecture has more resources available to it because the otherwise shared resources are not being shared at that point?

Is this not the case?