AMD sheds light on Bulldozer, Bobcat, desktop, laptop plans

ShawnD1 · Nov 30, 2009

piesquared said:
Yeah, one of the parts to the rumour was that it would work different than turbo boost.

If you're hard core and have a black edition, you can manually do this. The cores can be independently clocked, right?

tatertot · Nov 30, 2009

JFAMD said:
I think the difference between 50% and 5% might be the difference between marketing and engineering. Engineers tend to be very literal.

If 2 cores get you 180% performance of 1, then in simple terms, that extra core is 50% that gets you the 80%.

What I asked the engineering team was "what is the actual die space of the dedicated integer portion of the module"? So, for instance, if I took an 8 core processor (with 4 modules) and removed one integer core from each module, how much die space would that save. The answer was ~5%.

Simply put, in each module, there are plenty of shared components. And there is a large cache in the processor. And a northbridge/memory controller. The dies themselves are small in relative terms.

JF,

I think you should talk to the engineers again, I suspect they misunderstood your question slightly.

~5% sounds about right for the overall die area cost of the extra integer execution units in ONE of the 4 BD modules, not all 4 of them.

It's doubtful that Chuck got it wrong regarding the area cost relative to the module itself.

So let's take that as a given and run some fictitious numbers:

Let's say that a BD module is 30 mm2, and that if it didn't have the extra integer resources and L1D it would be 20 mm2.

This is consistent with Chuck saying 50% more space: 20 * 1.5 = 30

So let's say we have 4 modules on die: 120 mm2.

And let's say that the uncore (L2/L3/mem controller) is another 80 mm2.

So the overall die is 120 + 80 = 200 mm2.

Then what do we see?

The extra 10mm2 for ONE of the modules is (10/200) or 5% of the total die area.

But you asked about ALL 4 modules, so that's 20% of the total die area.

And that actually sounds reasonable, and fits with Chuck's 50% figure when not counting the uncore.
--
If your 5% overall figure were correct in this example, then the 30 mm2 BD module would only shrink to 27.5 mm2 without the extra integer units and L1D.

That would mean their cost was 30 / 27.5, or only an extra 9%, vs the 50% Chuck Moore wrote up in his presentation, relative to the core area.

And that's just not believable for another set of integer units, L1D etc.

So, I'd poll those engineers again.
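For anyone who wants to poke at these numbers, here is a minimal Python sketch of the same arithmetic. Every area in it is the made-up figure from the post above, not a measurement, and the function names are only illustrative.

```python
# Hypothetical figures from the post above; none of these are real die measurements.
MODULE_MM2 = 30.0          # one Bulldozer module with both integer cores
MODULE_SINGLE_MM2 = 20.0   # the same module without the second integer core and L1D
MODULES = 4
UNCORE_MM2 = 80.0          # L2/L3/memory controller, as assumed in the post


def area_shares():
    """Return the extra integer area relative to the module base and to the die."""
    extra_per_module = MODULE_MM2 - MODULE_SINGLE_MM2
    die = MODULES * MODULE_MM2 + UNCORE_MM2
    return {
        "die_mm2": die,
        "extra_vs_single_core_module": extra_per_module / MODULE_SINGLE_MM2,
        "one_module_extra_vs_die": extra_per_module / die,
        "all_modules_extra_vs_die": MODULES * extra_per_module / die,
    }


def module_overhead_if_die_share(die_share):
    """Module-level overhead implied if ALL extra integer cores total `die_share` of the die."""
    die = MODULES * MODULE_MM2 + UNCORE_MM2
    extra_per_module = die_share * die / MODULES
    return extra_per_module / (MODULE_MM2 - extra_per_module)


if __name__ == "__main__":
    for name, value in area_shares().items():
        print(name, value)
    print("overhead if 5% of die:", module_overhead_if_die_share(0.05))
```

With these inputs the second integer core costs 50% relative to the single-core module, 5% of the die for one module, and 20% for all four. Reading the 5% figure as covering all four modules implies only about a 9% module-level overhead.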

cbn · Nov 30, 2009

ilkhan said:
for comparison, 1 module = 1 [intel] core. Both run 2 threads. They're positioning 1 module parts against duals, 2 module parts against quads, and 4 module parts against octos. Hex is pretty much a one time thing that won't help anybody.

The only reason we compare Intel Core i3 to Athlon II X4 is because Intel's cores are stronger.

If this AMD product is to be an improvement, a one-module Bulldozer (dual core) needs to be better than the fastest Intel dual core (which includes hyper-threading).

cbn · Nov 30, 2009

So what is the consensus on the Bulldozer's core strength?

How much faster could a One module Bulldozer be compared to a Phenom II X2?

Idontcare · Nov 30, 2009

tatertot said:
It's doubtful that Chuck got it wrong regarding the area cost relative to the module itself.

Your analysis, both before and after this particular sentence, may be 100% correct, but it is still irrelevant to the context of the discussion in which the 5% number was given.

You are referring to the area of the module, sans L3$/IMC/NB/etc., whereas JF quite distinctly and repeatedly made clear that he was referring to the area of the entire die.

GaiaHunter · Nov 30, 2009

tatertot said:
JF,

I think you should talk to the engineers again, I suspect they misunderstood your question slightly.

~5% sounds about right for the overall die area cost of the extra integer execution units in ONE of the 4 BD modules, not all 4 of them.

It's doubtful that Chuck got it wrong regarding the area cost relative to the module itself.

So let's take that as a given and run some fictitious numbers:

Let's say that a BD module is 30 mm2, and that if it didn't have the extra integer resources and L1D it would be 20 mm2.

This is consistent with Chuck saying 50% more space: 20 * 1.5 = 30

So let's say we have 4 modules on die: 120 mm2.

And let's say that the uncore (L2/L3/mem controller) is another 80 mm2.

So the overall die is 120 + 80 = 200 mm2.

Then what do we see?

The extra 10mm2 for ONE of the modules is (10/200) or 5% of the total die area.

But you asked about ALL 4 modules, so that's 20% of the total die area.

And that actually sounds reasonable, and fits with Chuck's 50% figure when not counting the uncore.
--
If your 5% overall figure were correct in this example, then the 30 mm2 BD module would only shrink to 27.5 mm2 without the extra integer units and L1D.

That would mean their cost was 30 / 27.5, or only an extra 9%, vs the 50% Chuck Moore wrote up in his presentation, relative to the core area.

And that's just not believable for another set of integer units, L1D etc.

So, I'd poll those engineers again.

Let's say each module is 100mm^2. 4 modules are 400mm^2. Let's say an Int core is 5% of the area of a module, or 5mm^2. 4 Int cores are 20mm^2. 20mm^2/400mm^2 is still 5%.

But it still is 50% of the Integer core area since 8 cores would be 40mm^2 and 4 will be 20mm^2.

Edit: I guess this is semantics again. As IDC says, module vs. die.

If we think of the module as the entire die, 1 integer core is 5%.

If we think of the module as the integer+FP portion as represented in the schematics, then 1 integer core is 50% of it.
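The "semantics" point can be checked in a few lines. This is only a sketch using the hypothetical 100mm^2 module from this post; the constant and function names are made up for illustration.

```python
MODULE_MM2 = 100.0      # hypothetical module area from the post
INT_CORE_SHARE = 0.05   # one integer core as a share of a module (assumed in the post)
MODULES = 4


def int_core_shares():
    """Same removed area, measured against two different denominators."""
    int_core = MODULE_MM2 * INT_CORE_SHARE    # 5 mm^2 per integer core
    removed = MODULES * int_core              # one integer core removed per module
    all_int_cores = 2 * MODULES * int_core    # all 8 integer cores
    total = MODULES * MODULE_MM2              # all 4 modules
    return removed / total, removed / all_int_cores


if __name__ == "__main__":
    vs_modules, vs_int_area = int_core_shares()
    print(f"{vs_modules:.0%} of the module area, {vs_int_area:.0%} of the integer core area")
```

The same 20mm^2 comes out as 5% or 50% depending only on what it is divided by.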

Fox5 · Nov 30, 2009

Computer Bottleneck said:
The only reason we compare Intel Core i3 to Athlon II X4 is because Intel's cores are stronger.

If this AMD product is to be an improvement, a one-module Bulldozer (dual core) needs to be better than the fastest Intel dual core (which includes hyper-threading).

Are the i3's 45nm? What's their die size? They still have the full 6MB of cache right? They could actually be bigger chips than the Athlon II X4s then.

IntelUser2000 · Nov 30, 2009

Fox5 said:
Are the i3's 45nm? What's their die size? They still have the full 6MB of cache right? They could actually be bigger chips than the Athlon II X4s then.

Less than 80mm2 die size(+ or - 2mm2) at 32nm, for the 4MB L3 variant.

Maybe the cores/modules confusion was made by AMD as a slap-in-the-face to the people comparing equal "core" counts. They are saying "Wake up, overall performance is what matters."

GaiaHunter · Nov 30, 2009

Computer Bottleneck said:
So what is the consensus on the Bulldozer's core strength?

How much faster could a One module Bulldozer be compared to a Phenom II X2?

Considering 4 Bulldozer modules (8 cores) are about 60 to 80% faster than one six-core Opteron 6100 CPU in SPECInt_rate, it should be between 20 and 35% faster per core.

But, take this information with a truck load of salt. Not sure if it is at the same clock speeds or same power.

GaiaHunter · Nov 30, 2009

piesquared said:
Yeah, one of the parts to the rumour was that it would work different than turbo boost.

Idontcare · Nov 30, 2009

piesquared said:
Yeah, one of the parts to the rumour was that it would work different than turbo boost.

GaiaHunter said:
*snip cool slides*

So what are the cliffs? How is it different and what makes it different good or different bad?

Idontcare · Nov 30, 2009

GaiaHunter said:
Considering 4 Bulldozer modules (8 cores) are about 60 to 80% faster than one six-core Opteron 6100 CPU in SPECInt_rate, then it should be between 20-30%.

But, take this information with a truck load of salt. Not sure if it is at the same clock speeds or same power.

8 bulldozer cores result in 60-80% higher specInt_rate over a 6-core istanbul?

(I am assuming these are clock-normalized scores so we are just assessing the microarchitectural improvements between the two processor families)

That seems too good to be true.

cbn · Nov 30, 2009

Idontcare said:
8 bulldozer cores result in 60-80% higher specInt_rate over a 6-core istanbul?

(I am assuming these are clock-normalized scores so we are just assessing the microarchitectural improvements between the two processor families)

That seems too good to be true.

This would be 20% to 35% higher specInt_rate per core for Bulldozer compared to Istanbul.

Is this measurement, "specInt_rate", a good way to assess improvement in instructions per clock?
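The per-core figure follows from dividing the whole-chip speedup by the change in core count. A quick sketch, assuming throughput scales linearly with core count and that the clocks are comparable, neither of which has been confirmed in this thread (`per_core_uplift` is just an illustrative name):

```python
def per_core_uplift(total_speedup, new_cores, old_cores):
    """Per-core throughput gain implied by a whole-chip speedup.

    Assumes linear scaling with core count; says nothing about clocks or IPC.
    """
    return total_speedup * old_cores / new_cores - 1.0


if __name__ == "__main__":
    for speedup in (1.6, 1.8):
        gain = per_core_uplift(speedup, new_cores=8, old_cores=6)
        print(f"{speedup:.1f}x overall -> {gain:.0%} per core")
```

Because SPECint_rate measures throughput across many copies of the benchmark, the result also folds in clock speed, cache and memory bandwidth, so it is at best a loose proxy for instructions per clock.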

IntelUser2000 · Nov 30, 2009

Here are some SpecCPU2006 scores, using Int_Rate

Intel Core i7 920(4 cores/8 thread/2.66GHz): 102 Base/109 Peak
Xeon X5550(8 cores/16 thread/2.66GHz): 226 Base/242 Peak

AMD Opteron 2435(6 cores/2.6GHz): 82.0/104
AMD Opteron 8435(12 cores/2.6GHz): 159/203

AMD Shanghai Opteron(4 cores/2.7GHz): 56.1/67.5
AMD Shanghai Opteron(8 cores/2.7GHz): 110/132

1.6x 2435's score would put it at 131.2/166.4, behind 8 core Nehalem, but significantly ahead of 4 core.

Lots of smaller "cores" to save die space and power, yet still pull its weight in very well threaded apps?
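The projection above is just the Opteron 2435 result multiplied by the claimed speedup. A small sketch, using only the scores quoted in this post and treating the 60 to 80% claim as given (the dictionary keys and `projected` are illustrative names):

```python
# SPECint_rate2006 (base, peak) as quoted in the post above.
SCORES = {
    "Core i7 920, 4 cores": (102.0, 109.0),
    "Xeon X5550, 8 cores": (226.0, 242.0),
    "Opteron 2435, 6 cores": (82.0, 104.0),
}


def projected(scale, reference="Opteron 2435, 6 cores"):
    """Scale a reference (base, peak) pair by a claimed speedup factor."""
    base, peak = SCORES[reference]
    return round(base * scale, 1), round(peak * scale, 1)


if __name__ == "__main__":
    for scale in (1.6, 1.8):
        print(f"{scale:.1f}x Opteron 2435:", projected(scale))
```

Both ends of the claimed range land between the 4-core and 8-core Nehalem results on base and peak alike.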

GaiaHunter · Nov 30, 2009

Idontcare said:
8 bulldozer cores result in 60-80% higher specInt_rate over a 6-core istanbul?

(I am assuming these are clock-normalized scores so we are just assessing the microarchitectural improvements between the two processor families)

That seems too good to be true.

Computer Bottleneck said:
This would be 20% to 35% higher specInt_rate per core for Bulldozer compared to Istanbul.

Is this measurement, "specInt_rate", a good way to assess improvement in instructions per clock?

By dresdenboy

http://www.semiaccurate.com/forums/showpost.php?p=13883&postcount=12

Dresden boy

Originally Posted by nvo View Post
One thing that did not escape my notice is that AMD has actually doubled the throughput of both integer and floating point in the new core (at least the theoretical throughput). By implementing fused-multiply-accumulate in its new FPU, AMD has doubled the theoretical throughput of FP instructions on its Bulldozer core, and by implementing two integer clusters per core, AMD has doubled its theoretical integer throughput.

Compared to Intel's Nehalem core:

Bulldozer
2x 4 instructions per clock
up to 2x 4x 64-bit Integer instructions per clock
up to 4x 64-bit FP Multiplies and Adds per clock

Nehalem:
4 instructions per clock
up to 4x 64-bit Integer instructions per clock
up to 4x 64-bit FP Multiplies or Adds per clock


With a probability >95%, BD will not only be able to do 2 x 128 bit or 1 x 256 bit FMAC per BD module, but it could also do 2 x 128 bit or 1 x 256 bit FADD together with 2 x 128 or 1 x 256 bit FMUL (independent from the adds) per cycle.

From http://it.anandtech.com/IT/showdoc.aspx?i=3681&p=3

• Two integer clusters share fetch and decode logic but have their own dedicated Instruction and Data cache
• Integer clusters can not be shared between threads: integer cores act like a Chip Multi Processing (CMP) CPU.
• The extra integer core (schedulers, D-cache and pipelines) adds only 5% die space
• L1-caches are similar to Barcelona/Shanghai (64 KB 2-way? Not confirmed)
• Up to 4 modules share a L3-cache and Northbridge
• Two times 4 Bulldozer modules (2 x 8 "cores" or 16 cores) are about 60 to 80% faster than the twelve core Opteron 6100 CPU in SPECInt_rate.

GaiaHunter · Nov 30, 2009

Idontcare said:
So what are the cliffs? How is it different and what makes it different good or different bad?

Well Turbo mode is there.

I want the cliff notes too.

But apparently both C6/CC6 are the new stuff and are related to power management, including complete power down.

But I'm completely ignorant on this matter.

IntelUser2000 · Nov 30, 2009

Let's do some elementary school math, shall we?

"Cores"=The mini-cores in Bulldozer
Cores=Regular cores(confused yet?)

If AMD's earlier presentation of "50% greater die size for 80% greater performance" is true, then each Bulldozer module is the size of a Nehalem core, since Nehalem cores are currently 50% larger than AMD's.

If this is true:
8 "cores" = 1.7 x 6 cores > 4 Nehalem cores

It's not really beating them in top-bin performance per se, but in performance per watt and per dollar. The 4-Nehalem-core equivalent (in die size), which AMD calls 8 cores, will beat the 4-core Nehalem significantly in multi-threaded apps while being HALF the size of an 8-core Nehalem, which performs better than the 8-"core" AMD part.

Performance
4 Intel cores<<8 "Cores"<8 Intel cores

Die size
8 Intel cores>>8 "Cores"=4 Intel cores

Is a "Core" faster than AMD Core?

The following should more than answer that question

AMD Shanghai Opteron(4 cores/2.7GHz): 56.1/67.5
AMD Shanghai Opteron(8 cores/2.7GHz): 110/132

8 "Cores" 131.2/166.4

More efficient per die size and power? Power, I do not know, but usually it follows core size. As for die size:

1x current AMD core=0.66x Intel core
2x current AMD core=1.33x Intel core

Bulldozer=Faster than 2x AMD core at 1x Intel core size
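The die-size side of this comparison can be written out in relative units. This is a sketch only: it takes the post's premise that a Nehalem core is 50% larger than a current AMD core, and it further assumes the "50% more area" figure applies to a whole current AMD core. All names are hypothetical.

```python
AMD_CORE = 1.0                # area unit: one current AMD core
INTEL_CORE = 1.5 * AMD_CORE   # the post's premise: a Nehalem core is 50% larger
MODULE = 1.5 * AMD_CORE       # assumption: second integer core adds 50% to an AMD core


def in_intel_cores(area):
    """Express an area given in AMD-core units as a multiple of one Intel core."""
    return area / INTEL_CORE


if __name__ == "__main__":
    print("1 AMD core      :", round(in_intel_cores(AMD_CORE), 2))
    print("2 AMD cores     :", round(in_intel_cores(2 * AMD_CORE), 2))
    print("1 BD module     :", round(in_intel_cores(MODULE), 2))
    print("4 BD modules    :", round(in_intel_cores(4 * MODULE), 2))
```

Under those assumptions one module occupies the area of one Intel core, so four modules (8 "cores") match four Intel cores, which is the die-size line in the post. None of this accounts for cache or uncore area.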

cbn · Nov 30, 2009

IntelUser2000 said:
Let's do some elementary school math, shall we?

"Cores"=The mini-cores in Bulldozer
Cores=Regular cores(confused yet?)

If AMD's earlier presentation of "50% greater die size for 80% greater performance" is true, then each Bulldozer module is the size of a Nehalem core, since Nehalem cores are currently 50% larger than AMD's.

If this is true:
8 "cores" = 1.7 x 6 cores > 4 Nehalem cores

It's not really beating them in top-bin performance per se, but in performance per watt and per dollar. The 4-Nehalem-core equivalent (in die size), which AMD calls 8 cores, will beat the 4-core Nehalem significantly in multi-threaded apps while being HALF the size of an 8-core Nehalem, which performs better than the 8-"core" AMD part.

Performance
4 Intel cores<<8 "Cores"<8 Intel cores

Die size
8 Intel cores>>8 "Cores"=4 Intel cores

Is a "Core" faster than AMD Core?

The following should more than answer that question

AMD Shanghai Opteron(4 cores/2.7GHz): 56.1/67.5
AMD Shanghai Opteron(8 cores/2.7GHz): 110/132

8 "Cores" 131.2/166.4

More efficient per die size and power? Power, I do not know, but usually it follows core size. As for die size:

1x current AMD core=0.66x Intel core
2x current AMD core=1.33x Intel core

Bulldozer=Faster than 2x AMD core at 1x Intel core size

So four 32nm Bulldozer mini-cores (dual module) would be equal in size to 32nm Intel Core i3?

This could work if AMD had a really strong Turbo mode.

GaiaHunter · Nov 30, 2009

IntelUser2000 said:
Let's do some elementary school math, shall we?

"Cores"=The mini-cores in Bulldozer
Cores=Regular cores(confused yet?)

If AMD's earlier presentation of "50% greater die size for 80% greater performance" is true, then each Bulldozer module is the size of a Nehalem core, since Nehalem cores are currently 50% larger than AMD's.

Actually they don't claim die size. AMD claims a 50% area investment: http://www.amd.com.cn/chcn/assets/content_type/DownloadableAssets/Chuck_Moore_6-10-05.pdf .

JFAMD says an integer core is 5% of the total die size in an 8-core Bulldozer, or 4 modules if you prefer.

But as IDC said in a post long before, those shared resources could stay or not stay in there.

Idontcare · Nov 30, 2009

GaiaHunter said:
By dresdenboy

http://www.semiaccurate.com/forums/showpost.php?p=13883&postcount=12

From http://it.anandtech.com/IT/showdoc.aspx?i=3681&p=3

gaia, I think if you scrutinize the info in the quotes you gave a little more closely, you'll see they are all conflating modules with cores. They treat the doubling of resources within a module as equivalent to beefing up each core, and then (incorrectly) multiply the module's execution capabilities by the publicized core count.

In other words, it would appear all the quotes you quoted are double counting the processing capabilities of a Bulldozer "core" and then erroneously comparing them to a Nehalem core (or an Istanbul core, as we are attempting to do).

Do you see that too?

cbn · Dec 1, 2009

So "Bulldozer" (which I know very little about) sounds like a design strategy to preserve performance per watt, mostly because AMD always lags behind Intel in manufacturing process.

What would happen if AMD could beat Intel to the smaller manufacturing node with this design or a variant of this design? How well could it scale with additional voltage at a smaller node if Bulldozer was originally meant to be a power saving architecture? Could development of Future Turbo mode strategies help the potential here?

GaiaHunter · Dec 1, 2009

Idontcare said:
gaia, I think if you scrutinize the info in the quotes you gave a little more closely, you'll see they are all conflating modules with cores. They treat the doubling of resources within a module as equivalent to beefing up each core, and then (incorrectly) multiply the module's execution capabilities by the publicized core count.

In other words, it would appear all the quotes you quoted are double counting the processing capabilities of a Bulldozer "core" and then erroneously comparing them to a Nehalem core (or an Istanbul core, as we are attempting to do).

Do you see that too?

Yes.

But I don't think the current Phenom II can do 4 instructions per clock.

Still, 1 module is equal to 2 Nehalem cores in integer instructions and at least equal to 2 Nehalem cores in FP due to fused multiply-add(?).

But I'm needing bed. My last post was just crap.

IntelUser2000 · Dec 1, 2009

Cores are the most important part of a CPU and do most of the work. They are also where most of the design effort goes and most of the power is used. Even regarding yield, redundant parts like caches don't factor into it much, because one part of a cache failing isn't as critical as a part of a core failing.

Another way to think of the "Cores" is this. Often in discussions the prospect of AMD having Hyperthreading is talked about. What if rather than bringing 25-30% increase as with Hyperthreading, it'll do 80% instead? And rather than calling that 2 threads/core, they just call it "Dual core". It's kinda fuzzy, but it'll do what's intended.

Idontcare · Dec 1, 2009

Looks like the AMD version of Turbo will be called "APM Boost" and both BD and Llano will have it:

cbn · Dec 1, 2009

IntelUser2000 said:
Another way to think of the "Cores" is this. Often in discussions the prospect of AMD having Hyperthreading is talked about. What if rather than bringing 25-30% increase as with Hyperthreading, it'll do 80% instead? And rather than calling that 2 threads/core, they just call it "Dual core". It's kinda fuzzy, but it'll do what's intended.

So with Bulldozer one thread is 25% stronger than the other right?

But since the second thread is "close enough" in power we just call the Bulldozer module "dual core" instead of single core with super strong hyperthreading.
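The "25% stronger" figure depends on how the 1.8x two-thread throughput is split between the threads. A small sketch of two possible readings; which one applies (if either) is an assumption, not something confirmed in this thread, and `two_thread_split` is an illustrative name.

```python
def two_thread_split(total, symmetric):
    """Split a two-thread throughput `total` (one thread alone = 1.0) across both threads.

    symmetric=False: the first thread keeps full speed and the second gets the rest.
    symmetric=True:  both threads slow down equally because of the shared resources.
    """
    if symmetric:
        return total / 2, total / 2
    return 1.0, total - 1.0


if __name__ == "__main__":
    for symmetric in (False, True):
        first, second = two_thread_split(1.8, symmetric)
        print(f"symmetric={symmetric}: {first:.2f} and {second:.2f}, ratio {first / second:.2f}")
```

In the first reading one thread is 25% stronger than the other (1.0 vs 0.8). In the second, both run at 0.9 and neither is stronger, which is closer to the description of the integer cores as dedicated per thread.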
