If you're hardcore and have a Black Edition, you can manually do this. The cores can be independently clocked, right?
Yeah, one of the parts to the rumour was that it would work differently than Turbo Boost.
I think the difference between 50% and 5% might be the difference between marketing and engineering. Engineers tend to be very literal.
If 2 cores get you 180% of the performance of 1, then in simple terms, that extra core is the 50% that gets you the 80%.
What I asked the engineering team was: "What is the actual die space of the dedicated integer portion of the module?" So, for instance, if I took an 8-core processor (with 4 modules) and removed one integer core from each module, how much die space would that save? The answer was ~5%.
Simply put, in each module there are plenty of shared components. And there is a large cache in the processor, and a northbridge/memory controller. The integer cores themselves are small in relative terms.
For comparison, 1 module = 1 [Intel] core. Both run 2 threads. They're positioning 1-module parts against duals, 2-module parts against quads, and 4-module parts against octos. Hex is pretty much a one-time thing that won't help anybody.
JF,
I think you should talk to the engineers again; I suspect they misunderstood your question slightly.
~5% sounds about right for the overall die-area cost of ONE of the 4 BD modules' extra integer execution units, not all 4 of them.
It's doubtful that Chuck got it wrong regarding the area cost relative to the module itself.
So let's take that as a given and run some fictitious numbers:
Let's say that a BD module is 30 mm2, and that if it didn't have the extra integer resources and L1D it would be 20 mm2.
This is consistent with Chuck saying 50% more space: 20 * 1.5 = 30
So let's say we have 4 modules on die: 120 mm2.
And let's say that the uncore (L2/L3/mem controller) is another 80 mm2.
So the overall die is 120 + 80 = 200 mm2.
Then what do we see?
The extra 10mm2 for ONE of the modules is (10/200) or 5% of the total die area.
But you asked about ALL 4 modules, so that's 20% of the total die area.
And that actually sounds reasonable, and fits with Chuck's 50% figure when not counting the uncore.
--
If your 5% overall figure were correct in this example, then the 30 mm2 BD module would only shrink to 27.5 mm2 without the extra integer units and L1D.
That would mean their cost was 30 / 27.5, or only an extra 9%, vs the 50% Chuck Moore wrote up in his presentation, relative to the core area.
And that's just not believable for another set of integer units, L1D etc.
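To make that reconciliation easy to replay, here's a throwaway Python sketch using the same fictitious numbers from above (30/20 mm2 modules, 80 mm2 uncore); none of these are real Bulldozer measurements:

```python
# Fictitious die-area numbers from the post above; illustrative only.
module_full_mm2   = 30.0   # BD module with both integer cores + L1D
module_single_mm2 = 20.0   # same module with the extra integer core/L1D removed
uncore_mm2        = 80.0   # assumed L2/L3/memory controller area
modules           = 4

extra_per_module = module_full_mm2 - module_single_mm2       # 10 mm2
die_mm2          = modules * module_full_mm2 + uncore_mm2    # 200 mm2

print(f"Extra area vs. the module itself:       {extra_per_module / module_single_mm2:.0%}")  # ~50%
print(f"ONE module's extra vs. the whole die:   {extra_per_module / die_mm2:.0%}")            # ~5%
print(f"All 4 modules' extra vs. the whole die: {modules * extra_per_module / die_mm2:.0%}")  # ~20%
```

Run as-is it prints 50%, 5% and 20%, which is the whole disagreement in three numbers.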
So, I'd poll those engineers again.
The only reason we compare the Intel Core i3 to the Athlon II X4 is that Intel's cores are stronger.
If this AMD product is to be an improvement, a one-module Bulldozer (dual core) needs to be better than the fastest Intel dual core (which includes Hyper-Threading).
Are the i3s 45nm? What's their die size? They still have the full 6MB of cache, right? They could actually be bigger chips than the Athlon II X4s, then.
So what is the consensus on the Bulldozer's core strength?
How much faster could a one-module Bulldozer be compared to a Phenom II X2?
*snip cool slides*
Considering that 4 Bulldozer modules (8 cores) are about 60 to 80% faster than one six-core Opteron 6100 CPU in SPECint_rate, the per-core gain should be between 20-30%.
But take this information with a truckload of salt; I'm not sure if it is at the same clock speeds or the same power.
8 Bulldozer cores result in 60-80% higher SPECint_rate than a 6-core Istanbul?
(I am assuming these are clock-normalized scores so we are just assessing the microarchitectural improvements between the two processor families)
That seems too good to be true.
This would be 20% to 35% higher SPECint_rate per core for Bulldozer compared to Istanbul.
Is this measurement, SPECint_rate, a good way to assess improvement in instructions per clock?
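The per-core figure at least falls straight out of the claimed range; here's a quick sketch assuming comparable clocks, which is exactly the open question:

```python
# Per-core scaling implied by the rumoured SPECint_rate range.
# Assumes comparable clocks and ignores shared-cache/memory effects.
bd_cores, istanbul_cores = 8, 6

for overall in (1.6, 1.8):                      # "60 to 80% faster" overall
    per_core = overall * istanbul_cores / bd_cores
    print(f"{overall:.1f}x overall -> {per_core - 1:.0%} more throughput per core")
```

That prints 20% and 35% for the two ends of the range.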
Originally Posted by Dresden boy: With a probability >95% BD will not only be able to do 2 x 128-bit or 1 x 256-bit FMAC per BD module, but it could also do 2 x 128-bit or 1 x 256-bit FADD together with 2 x 128-bit or 1 x 256-bit FMUL (independent from the adds) per cycle.
Originally Posted by nvo: One thing that did not escape my notice is that AMD has actually doubled the throughput of both integer and floating point in the new core (at least the theoretical throughput). By implementing fused multiply-accumulate in its new FPU, AMD has doubled the theoretical throughput of FP instructions on its Bulldozer core, and by implementing two integer clusters per core, it has doubled its theoretical integer throughput.
Compared to Intel's Nehalem core:
Bulldozer
2x 4 instructions per clock
up to 2x 4x 64-bit Integer instructions per clock
up to 4x 64-bit FP Multiplies and Adds per clock
Nehalem:
4 instructions per clock
up to 4x 64-bit Integer instructions per clock
up to 4x 64-bit FP Multiplies or Adds per clock
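To see why FMA doubles the headline FP number, here's a back-of-the-envelope sketch; the Bulldozer 2 x 128-bit FMAC configuration is the rumoured figure from the quotes above, not a confirmed spec:

```python
# Back-of-the-envelope peak double-precision FLOPs per clock.
# Bulldozer module: rumoured 2 x 128-bit FMAC units, each FMA = 2 FLOPs (mul + add).
# Nehalem core: 1 x 128-bit FMUL + 1 x 128-bit FADD per clock, 1 FLOP each.

DOUBLES_PER_128BIT = 128 // 64   # 2 lanes of 64-bit doubles per 128-bit unit

bd_module_flops    = 2 * DOUBLES_PER_128BIT * 2       # = 8
nehalem_core_flops = (1 + 1) * DOUBLES_PER_128BIT     # = 4

print(f'Bulldozer module: {bd_module_flops} DP FLOPs/clock ("multiplies and adds")')
print(f'Nehalem core:     {nehalem_core_flops} DP FLOPs/clock ("multiplies or adds")')
```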
• Two integer clusters share fetch and decode logic but have their own dedicated Instruction and Data cache
• Integer clusters can not be shared between threads: integer cores act like a Chip Multi Processing (CMP) CPU.
• The extra integer core (schedulers, D-cache and pipelines) adds only 5% die space
• L1-caches are similar to Barcelona/Shanghai (64 KB 2-way? Not confirmed)
• Up to 4 modules share a L3-cache and Northbridge
• Two times 4 Bulldozer modules (2 x 8 "cores", or 16 cores) are about 60 to 80% faster than the twelve-core Opteron 6100 CPU in SPECint_rate.
So what are the cliffs? How is it different and what makes it different good or different bad?
Let's do some elementary school math, shall we?
"Cores"=The mini-cores in Bulldozer
Cores=Regular cores(confused yet?)
If AMD's earlier presentation of "50% greater die size for 80% greater performance" is true, then it means each Bulldozer module is the size of a Nehalem core, since Nehalem cores are currently 50% larger than AMD's.
If this is true:
8 "cores" = 1.7 x 6 cores > 4 Nehalem cores
It's not really beating them in the top bin / absolute performance per se, but beating them in performance/watt/dollar. A 4-Nehalem-core equivalent (in die size), which AMD calls 8 cores, will beat the 4-core Nehalem significantly in multi-threaded apps, while being HALF the size of an 8-core Nehalem, which performs better than the 8-"core" AMD.
Performance
4 Intel cores<<8 "Cores"<8 Intel cores
Die size
8 Intel cores>>8 "Cores"=4 Intel cores
Is a "Core" faster than an AMD core?
The following should more than answer that question:
AMD Shanghai Opteron(4 cores/2.7GHz): 56.1/67.5
AMD Shanghai Opteron(8 cores/2.7GHz): 110/132
8 "Cores" 131.2/166.4
More efficient per die size and power? Power, I do not know, but usually it follows core size. As for die size:
1x current AMD core=0.66x Intel core
2x current AMD core=1.33x Intel core
Bulldozer=Faster than 2x AMD core at 1x Intel core size
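Plugging those scores and the rough area ratios into a quick sketch shows the same point in throughput-per-area terms (the 0.66x core-size ratio and the module ~= Nehalem-core size are the poster's estimates above, not measurements):

```python
# Scores quoted in the post, plus the poster's rough area ratios
# expressed in "Intel core"-sized units of die area.
scores = {
    "Shanghai Opteron 4C @ 2.7GHz": (56.1, 67.5, 4 * 0.66),   # 4 AMD cores ~= 2.64 units
    "Shanghai Opteron 8C @ 2.7GHz": (110.0, 132.0, 8 * 0.66),
    'Bulldozer 8 "cores"':          (131.2, 166.4, 4 * 1.0),  # 4 modules ~= 4 units
}

for name, (base, peak, rel_area) in scores.items():
    print(f"{name:30s} base/area = {base / rel_area:5.1f}  peak/area = {peak / rel_area:5.1f}")
```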
gaia, I think if you scrutinize the info in the quotes you gave a little more closely, you'll see they are all conflating modules with cores, equating the doubling of resources within a module with a beefing-up of each core, and then (incorrectly) multiplying the module's execution capabilities by the publicized core count.
In other words, it would appear all the quotes you gave are double counting the processing capabilities of a Bulldozer "core" and then erroneously comparing them to a Nehalem core (or an Istanbul core, as we are attempting to do here).
Do you see that too?
Another way to think of the "Cores" is this. Often in discussions the prospect of AMD adding Hyper-Threading comes up. What if, rather than bringing a 25-30% increase as with Hyper-Threading, it does 80% instead? And rather than calling that 2 threads/core, they just call it "dual core". It's kinda fuzzy, but it'll do what's intended.
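A tiny sketch of that framing, treating the second integer core as "Hyper-Threading that scales to 80%" (the 1.3x SMT figure is the ballpark quoted in this thread, not a measured number):

```python
# One module running 2 threads, relative to a single thread on the same module.
smt_scaling = 1.30   # ballpark Hyper-Threading gain mentioned in the thread (~25-30%)
cmt_scaling = 1.80   # AMD's claimed "second integer core gets you ~80% of a full core"

print(f"SMT-style (2 threads/core):  {smt_scaling:.2f}x")
print(f"CMT-style (BD 'dual core'):  {cmt_scaling:.2f}x")
print(f"CMT advantage over SMT:      {cmt_scaling / smt_scaling - 1:.0%}")
```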
