AMD sheds light on Bulldozer, Bobcat, desktop, laptop plans


tatertot

Member
Nov 30, 2009
29
0
0
I think the difference between 50% and 5% might be the difference between marketing and engineering. Engineers tend to be very literal.

If 2 cores get you 180% of the performance of 1, then in simple terms, that extra core is the 50% of extra area that gets you the extra 80%.

What I asked the engineering team was "what is the actual die space of the dedicated integer portion of the module"? So, for instance, if I took an 8 core processor (with 4 modules) and removed one integer core from each module, how much die space would that save. The answer was ~5%.

Simply put, in each module, there are plenty of shared components. And there is a large cache in the processor. And a northbridge/memory controller. The dies themselves are small in relative terms.

JF,

I think you should talk to the engineers again, I suspect they misunderstood your question slightly.

~5% sounds about right for the overall die area cost of the extra integer execution units in ONE of the 4 BD modules, not all 4 of them.

It's doubtful that Chuck got it wrong regarding the area cost relative to the module itself. :)

So let's take that as a given and run some fictitious numbers:

Let's say that a BD module is 30 mm2, and that if it didn't have the extra integer resources and L1D it would be 20 mm2.

This is consistent with Chuck saying 50% more space: 20 * 1.5 = 30

So let's say we have 4 modules on die: 120 mm2.

And let's say that the uncore (L2/L3/mem controller) is another 80 mm2.

So the overall die is 120 + 80 = 200 mm2.

Then what do we see?

The extra 10mm2 for ONE of the modules is (10/200) or 5% of the total die area.

But you asked about ALL 4 modules, so that's 20% of the total die area.

And that actually sounds reasonable, and fits with Chuck's 50% figure when not counting the uncore.
--
If your 5% overall figure were correct in this example, then the 30 mm2 BD module would only shrink to 27.5 mm2 without the extra integer units and L1D.

That would mean their cost was 30 / 27.5, or only an extra 9%, vs the 50% Chuck Moore wrote up in his presentation, relative to the core area.

And that's just not believable for another set of integer units, L1D etc.

So, I'd poll those engineers again. :)
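
The arithmetic in this post can be checked with a short sketch. All areas are the fictitious numbers from the post, not real Bulldozer measurements, and the variable names are made up for illustration.

```python
# Fictitious areas from the post above (mm^2); not real Bulldozer measurements.
MODULE = 30.0           # one module, both integer cores included
MODULE_NO_EXTRA = 20.0  # same module without the second integer core and L1D
MODULES = 4
UNCORE = 80.0           # L2/L3/memory controller

extra = MODULE - MODULE_NO_EXTRA   # extra area per module
die = MODULES * MODULE + UNCORE    # total die area

share_one = extra / die            # one module's extra core vs. the whole die
share_all = MODULES * extra / die  # all four extra cores vs. the whole die
overhead = extra / MODULE_NO_EXTRA # extra area vs. the stripped-down module

# Reverse check: suppose all four extra cores together were only 5% of the die.
extra_if_5pct = 0.05 * die / MODULES
implied_overhead = extra_if_5pct / (MODULE - extra_if_5pct)

print(f"one extra core vs. die:   {share_one:.1%}")
print(f"four extra cores vs. die: {share_all:.1%}")
print(f"overhead vs. module:      {overhead:.1%}")
print(f"overhead implied by 5%:   {implied_overhead:.1%}")
```

With these inputs, the 5% and 50% figures can only both hold if the 5% refers to a single module's extra core, which is the point the post is making.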
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
for comparison, 1 module = 1 [intel] core. Both run 2 threads. They're positioning 1 module parts against duals, 2 module parts against quads, and 4 module parts against octos. Hex is pretty much a one time thing that won't help anybody.

The only reason we compare Intel Core i3 to Athlon II X4 is because Intel's cores are stronger.

If this AMD product is to be an improvement, a one-module Bulldozer (dual core) needs to be better than the fastest Intel dual core (which includes hyper-threading).
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
So what is the consensus on the Bulldozer's core strength?

How much faster could a One module Bulldozer be compared to a Phenom II X2?
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
It's doubtful that Chuck got it wrong regarding the area cost relative to the module itself. :)

Your analysis, both before and after this particular sentence, may be 100% correct, but it is still irrelevant to the context of the discussion in which the 5% number was given.

You are referring to the area of the module, sans L3$/IMC/NB/etc, whereas JF quite distinctly and repeatedly made clear that he was referring to the area of the entire die.
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,732
432
126
JF,

I think you should talk to the engineers again, I suspect they misunderstood your question slightly.

~5% sounds about right for the overall die area cost of the extra integer execution units in ONE of the 4 BD modules, not all 4 of them.

It's doubtful that Chuck got it wrong regarding the area cost relative to the module itself. :)

So let's take that as a given and run some fictitious numbers:

Let's say that a BD module is 30 mm2, and that if it didn't have the extra integer resources and L1D it would be 20 mm2.

This is consistent with Chuck saying 50% more space: 20 * 1.5 = 30

So let's say we have 4 modules on die: 120 mm2.

And let's say that the uncore (L2/L3/mem controller) is another 80 mm2.

So the overall die is 120 + 80 = 200 mm2.

Then what do we see?

The extra 10mm2 for ONE of the modules is (10/200) or 5% of the total die area.

But you asked about ALL 4 modules, so that's 20% of the total die area.

And that actually sounds reasonable, and fits with Chuck's 50% figure when not counting the uncore.
--
If your 5% overall figure were correct in this example, then the 30 mm2 BD module would only shrink to 27.5 mm2 without the extra integer units and L1D.

That would mean their cost was 30 / 27.5, or only an extra 9%, vs the 50% Chuck Moore wrote up in his presentation, relative to the core area.

And that's just not believable for another set of integer units, L1D etc.

So, I'd poll those engineers again. :)

Let's say each module is 100mm^2. 4 modules are 400mm^2. Let's say an Int core is 5% of the area of a module, or 5mm^2. 4 Int cores are 20mm^2. 20mm^2/400mm^2 is still 5%.

But it is still 50% of the Integer core area, since 8 cores would be 40mm^2 and 4 would be 20mm^2.

Edit: I guess this is semantics again. As IDC says, module vs die.

If we think of the module as the entire die, 1 integer core is 5%.

If we think of the module as the integer+fp portion as represented in the schematics, then 1 integer core is 50% of it.
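
A small sketch of the same point: the removed area is identical, only the denominator changes. The numbers are the made-up ones from this post, and `share` is a hypothetical helper.

```python
def share(part_mm2: float, whole_mm2: float) -> float:
    """Fraction of `whole_mm2` taken up by `part_mm2`."""
    return part_mm2 / whole_mm2

# Made-up areas from the post (mm^2).
MODULE = 100.0
MODULES = 4
INT_CORE = 0.05 * MODULE                # one integer core, 5% of a module

removed = MODULES * INT_CORE            # drop one integer core per module
all_modules = MODULES * MODULE          # area of the four modules
all_int_cores = 2 * MODULES * INT_CORE  # area of all eight integer cores

vs_modules = share(removed, all_modules)
vs_int_cores = share(removed, all_int_cores)

print(f"removed cores vs. module area:       {vs_modules:.0%}")
print(f"removed cores vs. integer core area: {vs_int_cores:.0%}")
```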
 

Fox5

Diamond Member
Jan 31, 2005
5,957
7
81
The only reason we compare Intel Core i3 to Athlon II X4 is because Intel's cores are stronger.

If this AMD product is to be an improvement, a one-module Bulldozer (dual core) needs to be better than the fastest Intel dual core (which includes hyper-threading).


Are the i3's 45nm? What's their die size? They still have the full 6MB of cache right? They could actually be bigger chips than the Athlon II X4s then.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Are the i3's 45nm? What's their die size? They still have the full 6MB of cache right? They could actually be bigger chips than the Athlon II X4s then.

Less than 80mm2 die size(+ or - 2mm2) at 32nm, for the 4MB L3 variant.

Maybe the cores/modules confusion was made by AMD as a slap-in-the-face to the people comparing equal "core" counts. They are saying "Wake up, overall performance is what matters." ;)
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,732
432
126
So what is the consensus on the Bulldozer's core strength?

How much faster could a One module Bulldozer be compared to a Phenom II X2?

Considering 4 Bulldozer modules (8 cores) are about 60 to 80% faster than one six-core Opteron 6100 CPU in SPECInt_rate, then it should be between 20-35%.

But, take this information with a truck load of salt. Not sure if it is at the same clock speeds or same power.
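
For reference, the 20-35% range follows from dividing the claimed throughput gain by the change in core count. This is only a sketch: it assumes throughput scales linearly with core count and that clocks are equal, neither of which is confirmed in the thread. `per_core_gain` is a hypothetical helper.

```python
def per_core_gain(total_speedup: float, new_cores: int, old_cores: int) -> float:
    """Per-core throughput gain implied by a whole-chip speedup.

    Assumes throughput scales linearly with core count (an assumption,
    not something the thread confirms).
    """
    return total_speedup * old_cores / new_cores - 1.0

# Claimed: 8 Bulldozer cores are 60% to 80% faster than a six-core chip.
low = per_core_gain(1.6, new_cores=8, old_cores=6)
high = per_core_gain(1.8, new_cores=8, old_cores=6)

print(f"implied per-core gain: {low:.0%} to {high:.0%}")
```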
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,732
432
126
Yeah, one of the parts to the rumour was that it would work different than turbo boost.

4d7f6fd1-8a8f-436f-84b6-c265cf6c9587.jpg


9a18c571-b6e9-45fe-a169-92b5f80cee21.jpg
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Considering 4 Bulldozer modules (8 cores) are about 60 to 80% faster than one six-core Opteron 6100 CPU in SPECInt_rate, then it should be between 20-30%.

But, take this information with a truck load of salt. Not sure if it is at the same clock speeds or same power.

8 bulldozer cores result in 60-80% higher specInt_rate over a 6-core istanbul?

(I am assuming these are clock-normalized scores so we are just assessing the microarchitectural improvements between the two processor families)

That seems too good to be true.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
8 bulldozer cores result in 60-80% higher specInt_rate over a 6-core istanbul?

(I am assuming these are clock-normalized scores so we are just assessing the microarchitectural improvements between the two processor families)

That seems too good to be true.

This would be 20% to 35% higher specInt_rate per core for Bulldozer compared to Istanbul.

Is this measurement "specInt_rate" a good way to assess improvement in instructions per clock?
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Here are some SpecCPU2006 scores, using Int_Rate

Intel Core i7 920(4 cores/8 thread/2.66GHz): 102 Base/109 Peak
Xeon X5550(8 cores/16 thread/2.66GHz): 226 Base/242 Peak

AMD Opteron 2435(6 cores/2.6GHz): 82.0/104
AMD Opteron 8435(12 cores/2.6GHz): 159/203

AMD Shanghai Opteron(4 cores/2.7GHz): 56.1/67.5
AMD Shanghai Opteron(8 cores/2.7GHz): 110/132

1.6x 2435's score would put it at 131.2/166.4, behind 8 core Nehalem, but significantly ahead of 4 core.

Lots of smaller "cores" to save die space and power, yet still pull its weight in very well threaded apps?
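
The projection above can be reproduced with a quick sketch. The scores are the ones listed in this post (base, peak); the 1.6x factor is the low end of the rumoured 60-80% gain, and the constant names are made up for illustration.

```python
# SPECint_rate2006 (base, peak) as quoted in the post above.
OPTERON_2435 = (82.0, 104.0)    # 6 cores, 2.6GHz
CORE_I7_920 = (102.0, 109.0)    # 4 cores / 8 threads
XEON_X5550_8C = (226.0, 242.0)  # 8 cores / 16 threads

SCALE = 1.6  # low end of the rumoured 60-80% improvement

# Scale the six-core Opteron scores to get the projected 8-"core" result.
projected = tuple(round(SCALE * s, 1) for s in OPTERON_2435)

ahead_of_quad = all(p > q for p, q in zip(projected, CORE_I7_920))
behind_octo = all(p < o for p, o in zip(projected, XEON_X5550_8C))

print("projected base/peak:", projected)
print("ahead of 4-core Nehalem:", ahead_of_quad)
print("behind 8-core Nehalem:", behind_octo)
```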
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,732
432
126
8 bulldozer cores result in 60-80% higher specInt_rate over a 6-core istanbul?

(I am assuming these are clock-normalized scores so we are just assessing the microarchitectural improvements between the two processor families)

That seems too good to be true.

This would be 20% to 35% higher specInt_rate per core for Bulldozer compared to Istanbul.

Is this measurement "specInt_rate" a good way to assess improvement in instructions per clock?

By dresdenboy

http://www.semiaccurate.com/forums/showpost.php?p=13883&postcount=12


Dresden boy
Originally Posted by nvo View Post
One thing that did not escape my notice is that AMD has actually doubled the throughput of both integer and floating point in the new core (at least the theoretical throughput). By implementing fused-multiply-accumulate in its new FPU, AMD has doubled the theoretical throughput of FP instructions on its Bulldozer core, and by implementing two integer clusters per core, AMD has doubled its theoretical integer throughput

Compared to Intel's Nehalem core:

Bulldozer
2x 4 instructions per clock
up to 2x 4x 64-bit Integer instructions per clock
up to 4x 64-bit FP Multiplies and Adds per clock

Nehalem:
4 instructions per clock
up to 4x 64-bit Integer instructions per clock
up to 4x 64-bit FP Multiplies or Adds per clock
With a probability >95% BD will not only be able to do 2 x 128 bit or 1 x 256 bit FMAC per BD module, but it could also do 2 x 128 bit or 1 x 256 bit FADD together with 2 x 128 or 1 x 256 bit FMUL (independent of the adds) per cycle.

From http://it.anandtech.com/IT/showdoc.aspx?i=3681&p=3

• Two integer clusters share fetch and decode logic but have their own dedicated Instruction and Data cache
• Integer clusters can not be shared between threads: integer cores act like a Chip Multi Processing (CMP) CPU.
• The extra integer core (schedulers, D-cache and pipelines) adds only 5% die space
• L1-caches are similar to Barcelona/Shanghai (64 KB 2-way? Not confirmed)
• Up to 4 modules share a L3-cache and Northbridge
• Two times 4 Bulldozer modules (2 x 8 "cores" or 16 cores) are about 60 to 80% faster than the twelve core Opteron 6100 CPU in SPECInt_rate.
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,732
432
126
So what are the cliffs? How is it different and what makes it different good or different bad?

Well Turbo mode is there.

I want the cliff notes too. :p

But apparently both C6/CC6 are the new stuff and are related to power management, including complete power down.

But I'm completely ignorant on this matter.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Let's do some elementary school math, shall we? :)

"Cores"=The mini-cores in Bulldozer
Cores=Regular cores(confused yet?)

If AMD's earlier presentation of "50% greater die size for 80% greater performance" is true, then each Bulldozer module is the size of a Nehalem core, since Nehalem cores are currently 50% larger than AMD's.

If this is true:
8 "cores" = 1.7 x 6 cores > 4 Nehalem cores

It's not really beating them in top-bin performance per se, but beating them in performance/watt/dollar. The 4 Nehalem core equivalent (in die size), which AMD calls 8 cores, will beat the 4 core Nehalem significantly in multi-threaded apps, but is HALF the size of an 8 core Nehalem, which performs better than the 8 "core" AMD.

Performance
4 Intel cores<<8 "Cores"<8 Intel cores

Die size
8 Intel cores>>8 "Cores"=4 Intel cores


Is a "Core" faster than AMD Core?

The following should more than answer that question

AMD Shanghai Opteron(4 cores/2.7GHz): 56.1/67.5
AMD Shanghai Opteron(8 cores/2.7GHz): 110/132

8 "Cores" 131.2/166.4

More efficient per die size and power? Power, I do not know, but usually it follows core size. As for die size:

1x current AMD core=0.66x Intel core
2x current AMD core=1.33x Intel core

Bulldozer=Faster than 2x AMD core at 1x Intel core size
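
The size ratios can be written out as a rough sketch. Both 1.5 factors are the post's premises (a Nehalem core being 50% larger than a current AMD core, and a module costing 50% more area than one core), not measured values, and the names are hypothetical.

```python
# Relative areas, normalised to one Intel (Nehalem) core = 1.0.
INTEL_CORE = 1.0
INTEL_OVER_AMD = 1.5   # premise: Nehalem core is 50% larger than an AMD core
MODULE_OVER_AMD = 1.5  # premise: a module is 50% larger than one AMD core

amd_core = INTEL_CORE / INTEL_OVER_AMD  # about 0.67 (the post rounds to 0.66)
two_amd_cores = 2 * amd_core            # about 1.33
bd_module = MODULE_OVER_AMD * amd_core  # about 1.0

print(f"1x AMD core      = {amd_core:.2f}x Intel core")
print(f"2x AMD cores     = {two_amd_cores:.2f}x Intel core")
print(f"1x BD module     = {bd_module:.2f}x Intel core")
```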
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Let's do some elementary school math, shall we? :)

"Cores"=The mini-cores in Bulldozer
Cores=Regular cores(confused yet?)

If AMD's earlier presentation of "50% greater die size for 80% greater performance" is true, then each Bulldozer module is the size of a Nehalem core, since Nehalem cores are currently 50% larger than AMD's.

If this is true:
8 "cores" = 1.7 x 6 cores > 4 Nehalem cores

It's not really beating them in top-bin performance per se, but beating them in performance/watt/dollar. The 4 Nehalem core equivalent (in die size), which AMD calls 8 cores, will beat the 4 core Nehalem significantly in multi-threaded apps, but is HALF the size of an 8 core Nehalem, which performs better than the 8 "core" AMD.

Performance
4 Intel cores<<8 "Cores"<8 Intel cores

Die size
8 Intel cores>>8 "Cores"=4 Intel cores


Is a "Core" faster than AMD Core?

The following should more than answer that question

AMD Shanghai Opteron(4 cores/2.7GHz): 56.1/67.5
AMD Shanghai Opteron(8 cores/2.7GHz): 110/132

8 "Cores" 131.2/166.4

More efficient per die size and power? Power, I do not know, but usually it follows core size. As for die size:

1x current AMD core=0.66x Intel core
2x current AMD core=1.33x Intel core

Bulldozer=Faster than 2x AMD core at 1x Intel core size

So four 32nm Bulldozer mini-cores (dual module) would be equal in size to 32nm Intel Core i3?

This could work if AMD had a really strong Turbo mode.
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,732
432
126
Let's do some elementary school math, shall we? :)

"Cores"=The mini-cores in Bulldozer
Cores=Regular cores(confused yet?)

If AMD's earlier presentation of "50% greater die size for 80% greater performance" is true, then each Bulldozer module is the size of a Nehalem core, since Nehalem cores are currently 50% larger than AMD's.

Actually they don't claim die size. AMD claims a 50% area investment: http://www.amd.com.cn/chcn/assets/content_type/DownloadableAssets/Chuck_Moore_6-10-05.pdf

JFAMD says an integer core is 5% of total die size in an 8-core Bulldozer, or 4 modules if you prefer.

But as IDC said in a post long before, those shared resources could stay or not stay in there.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91

gaia I think if you scrutinize the info in the quotes you gave a little more closely you'll see they are all conflating modules with cores and equating the doubling of resources within a module as being equivalent of beefing up each core, and then multiplying (incorrectly) the module's execution capabilities by the publicized core count.

In other words, it would appear that all the quotes you posted are double counting the processing capabilities of a Bulldozer "core" and then erroneously comparing them to a Nehalem core (or Istanbul core, as we are attempting to do).

Do you see that too?
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
So "Bulldozer" (which I know very little about) sounds like a design strategy to preserve performance per watt, mostly because AMD always lags behind Intel in manufacturing process.

What would happen if AMD could beat Intel to the smaller manufacturing node with this design or a variant of this design? How well could it scale with additional voltage at a smaller node if Bulldozer was originally meant to be a power saving architecture? Could development of Future Turbo mode strategies help the potential here?
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,732
432
126
gaia I think if you scrutinize the info in the quotes you gave a little more closely you'll see they are all conflating modules with cores and equating the doubling of resources within a module as being equivalent of beefing up each core, and then multiplying (incorrectly) the module's execution capabilities by the publicized core count.

In other words, it would appear that all the quotes you posted are double counting the processing capabilities of a Bulldozer "core" and then erroneously comparing them to a Nehalem core (or Istanbul core, as we are attempting to do).

Do you see that too?

Yes.

But I don't think the current Phenom II can do 4 instructions per clock.

Still, 1 module is equal to 2 Nehalem cores in integer instructions, and at least equal to 2 Nehalem cores in FP due to fused multiply-add(?).

But I need to go to bed. My last post was just crap.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Cores are the most important and do the most work in a CPU. It's also the part where most effort is put in and most power is used. Even regarding yield, redundant parts like caches don't factor a lot into it because one part failing isn't as critical as a part in a core failing.

Another way to think of the "Cores" is this. Often in discussions the prospect of AMD having Hyperthreading is talked about. What if rather than bringing 25-30% increase as with Hyperthreading, it'll do 80% instead? And rather than calling that 2 threads/core, they just call it "Dual core". It's kinda fuzzy, but it'll do what's intended.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Looks like the AMD version of Turbo will be called "APM Boost" and both BD and Llano will have it:
kaigai3.jpg



kaigai2.jpg


kaigai6.jpg
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Another way to think of the "Cores" is this. Often in discussions the prospect of AMD having Hyperthreading is talked about. What if rather than bringing 25-30% increase as with Hyperthreading, it'll do 80% instead? And rather than calling that 2 threads/core, they just call it "Dual core". It's kinda fuzzy, but it'll do what's intended.

So with Bulldozer one thread is 25% stronger than the other right?

But since the second thread is "close enough" in power we just call the Bulldozer module "dual core" instead of single core with super strong hyperthreading.