Modules are more effective than hyperthreading, right?


Soulkeeper

Diamond Member
Nov 23, 2001
It appears that less than 30% of the transistors on the Bulldozer die are part of the 8 modules themselves:

modules: 27.67% (~553.38M)
L2: 16.00% (~320.19M)
L3: 12.66% (~253.28M)
other: 43.67% (~873.40M)

These are rough calculations I've done from the die-shot images in the reviews on the web. The image was low resolution (457x433).

The caches take up more area than the modules, but the big surprise is "other", i.e. the IMC, HyperTransport, etc.
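A quick sketch of how an estimate like this works: measure each region's pixel area in the die shot, treat it as an area fraction, and multiply by the total transistor count. The 2.0B total is AMD's originally announced figure; the uniform-density assumption is mine and is certainly only approximate (caches are far denser than logic).

```python
# Rough transistor-budget estimate from die-shot area fractions.
# Assumes uniform transistor density across the die (a big simplification).

TOTAL_TRANSISTORS = 2.0e9  # assumed: AMD's originally announced count

# area fractions as measured from the low-res (457x433) die shot
area_fractions = {
    "modules": 0.2767,
    "L2":      0.1600,
    "L3":      0.1266,
    "other":   0.4367,  # IMC, HyperTransport links, I/O, etc.
}

for region, frac in area_fractions.items():
    est_millions = frac * TOTAL_TRANSISTORS / 1e6
    print(f"{region:>8}: {frac:6.2%}  ~{est_millions:.1f}M transistors")
```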

Makes me wonder ...
 

Vesku

Diamond Member
Aug 25, 2005
A lot of the transistor count went into doubling the total cache. As has been mentioned, the AMD talk referenced a roughly 12% area increase for going to a module design over a single-core design, and it's thought that this 12% figure is inclusive of the L2 cache.
 

ocre

Golden Member
Dec 26, 2008
They're not able to feed the cores fully, so in practice the second core in a module only adds 30-59%, which explains a lot. At the 30% end it's just horrible. This also explains why they're so close to Thuban 6-core performance [4 x 1.6 = 6.4].
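The arithmetic behind that bracketed figure can be sketched as follows. This reads the thread's 30-59% range as the extra throughput the second core in a module contributes (my interpretation, not an official AMD number):

```python
# Core-equivalents for a module design where the second core in each
# module adds only a fraction of a full core's throughput.

def effective_cores(modules: int, second_core_gain: float) -> float:
    """Core-equivalents when each module's 2nd core adds `second_core_gain`."""
    return modules * (1.0 + second_core_gain)

# four Bulldozer modules at various second-core gains
for gain in (0.30, 0.59, 0.60):
    print(f"gain {gain:.0%}: {effective_cores(4, gain):.1f} core-equivalents")
```

At a ~60% gain, 4 x 1.6 = 6.4 core-equivalents, which lines up with the Thuban 6-core comparison above.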



This is a major issue with BD: they cannot keep the cores working, and they waste many, many cycles. Long ago, when AMD was on top, Intel spent invaluable time finding better ways to keep the cores busy. Without an IMC, Intel found clever ways to manage data well. Hyperthreading was born from Intel not wanting to waste cycles, but hyperthreading was just one part of Intel's push to increase performance by having fewer wasted cycles. The results of that work multiplied once they implemented an on-chip integrated memory controller (IMC).

AMD's route was different. Their IMC gave them plenty of bandwidth to work with, so they didn't see the importance of this the way Intel did. I would say AMD took their IMC for granted while Intel found ingenious ways to handle, predict, and process data on a much less forgiving FSB. Intel's aim was to eliminate wasted cycles, and they are hands down the best at doing this.

It also seems AMD threw in too much cache, and the latency is not as good as in older designs.

They are trying to keep their cores busy; wasted cycles are wasted energy, nothing but waste. All that cache is there to keep as much data as possible on chip, because that's more forgiving in their inefficient system. It's not that they want it, it's that they are having a very hard time getting the data to flow efficiently. On well-written code, BD does very well. That is what AMD wants: code that doesn't show BD's true weakness. With sloppy code (and most code is sloppy), BD can't keep the cores fed, and it wastes more cycles than it has data for. Benchmarks and straightforward conversions are the only realistic places to expect well-written code, with a ton of unchanging data that can be threaded evenly in perfect order. It's not feasible to expect such pretty coding in general.

Windows 8 will schedule better for BD, but AMD is still lacking on the hardware side. They need to put a lot of money and research time into data throughput; they need to eliminate wasted cycles. It's not a good idea to expect the software to cater to your design; cater your design to the software that exists now. BD would be much more consistent if only it could keep the cores busy with data.
 

Vesku

Diamond Member
Aug 25, 2005
I'd say the cache amounts were more targeted at getting back into the server market. I'm only vaguely familiar with HPC, but those users definitely have more control over getting their working sets to fit in cache.