Setting performance expectations for Bulldozer(client)

itsmydamnation

Diamond Member
Feb 6, 2011
3,078
3,911
136
I can only assume you started looking into CPUs very recently.

Around the 486 DX33 days. Your point? Look at the 7 MB of extra L2/L3 that the 8-core BD carries compared to the 4-core SB. On BD, a 2 MB block of cache is somewhere around 13-15 mm^2.

Removing half of the L2/L3 cache, in whatever pattern gives the best performance (I would imagine removing all of the L3 would be simplest), would buy around 60 mm^2. That takes 294 mm^2 down to around 234 mm^2, much closer to SB. Sure, SB has a GPU attached, but there is also plenty of empty space on the Bulldozer SoC die shot that could be reduced.
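
A quick sketch of the arithmetic behind that estimate, in Python. The 13-15 mm^2 per 2 MB block and the 294 mm^2 die size are the figures quoted in the post; the count of four 2 MB L3 blocks (8 MB of L3 in total) is an assumption, and the function name is purely illustrative.

```python
# Figures quoted in the post; the block count is an assumption.
DIE_AREA_MM2 = 294          # estimated 8-core Bulldozer die area
L3_BLOCKS = 4               # assumed: 8 MB of L3 laid out as four 2 MB blocks
BLOCK_AREA_MM2 = (13, 15)   # low/high area estimate per 2 MB block


def die_area_without_l3(die_area, blocks, block_area):
    """Return the die area left after removing `blocks` cache blocks."""
    return die_area - blocks * block_area


low_saving = L3_BLOCKS * BLOCK_AREA_MM2[0]
high_saving = L3_BLOCKS * BLOCK_AREA_MM2[1]
smallest_die = die_area_without_l3(DIE_AREA_MM2, L3_BLOCKS, BLOCK_AREA_MM2[1])
largest_die = die_area_without_l3(DIE_AREA_MM2, L3_BLOCKS, BLOCK_AREA_MM2[0])

print(f"L3 saving: {low_saving}-{high_saving} mm^2")
print(f"Die without L3: {smallest_die}-{largest_die} mm^2")
```

The upper end of the range reproduces the roughly 60 mm^2 saving and the 234 mm^2 result from the post. This only subtracts SRAM area and ignores any layout rework, so treat it as a best case.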

It's not like AMD hasn't done that before.

Edit: I even once owned a 200 MHz Cyrix 686.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
According to this:
http://forums.anandtech.com/showthread.php?t=2146715
it's 12-13 mm^2 for the L2; the L3 looks a little bigger, just eyeballing it: 14-15 mm^2.

Edit: the L3 figure is per 2 MB block.

You are measuring it wrong (cue Apple). :)

Take the number of 30.9 mm^2 on a module basis and compare it with the L3 cache. By the way, the white border is not the one you measure; the one inside it is the SRAM. You can see from the modified GlobalFoundries pic that the white border is just an outline added by the AMD people who sent out the Bulldozer die shots.

You can simply do it in Paint.

I can only assume you started looking into CPUs very recently.

Oh, and this still applies, I'm just not going to tell you why. :D

but there is also plenty of empty space on the bulldozer SOC die shot that could also be reduced.

There's not much empty space. Some are omitted, like the center where the crossbar/router is supposed to be on the left side.
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,078
3,911
136
You are measuring it wrong (cue Apple). :)

Take the number of 30.9 mm^2 on a module basis and compare it with the L3 cache. By the way, the white border is not the one you measure; the one inside it is the SRAM. You can see from the modified GlobalFoundries pic that the white border is just an outline added by the AMD people who sent out the Bulldozer die shots.

You can simply do it in Paint.
They're not my numbers, they're Hiroshige's.


Oh, and this still applies, I'm just not going to tell you why. :D

I guess I think the same thing every time I see posts on core IP networking topics :D. I'm also an optimist by nature.
 

OCGuy

Lifer
Jul 12, 2000
27,224
37
91
Silence from the AMD corner....I would not be thrilled if I was a stockholder.
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,078
3,911
136
They ARE your numbers because he only estimated 300mm2 as the die size, and said nothing about cache area.

He gave a module + L2 size and a module size. Module + L2 size - module size = L2 size. They are still his numbers. If the L2 is a different size, then either his numbers or his names for the numbers aren't accurate.
 

OneEng1

Junior Member
Apr 3, 2010
9
0
0
The die size of the 8c BD should be around 295mm^2 making it quite a bit larger than the 216mm^2 4c SB. It should also be unrivaled in performance on the desktop.

If a 6c BD were to be made available (it makes perfect sense to do this IMHO), it would likely be the same mask as the 8c with one module inoperative (ie no die size savings). This processor would likely be a good match for 4c SB Core i7 (SMT enabled).

The 4c BD is likely a separate mask. This one should be around the same size as SB 4c and should be a match for Core i5.

I am not aware of a 2c BD, but if they made one, it seems like it would compete with Llano (not all that smart).
 

podspi

Golden Member
Jan 11, 2011
1,982
102
106
I am not aware of a 2c BD, but if they made one, it seems like it would compete with Llano (not all that smart).


I bet there will be one, depending on yields. I think a 1M/2C with high clocks would be perfect for cheap office machines, just as current DC are now.

If AMD could get away without the L3 cache, it would be ridiculously cheap to manufacture as well...
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
He gave a module + L2 size and a module size. Module + L2 size - module size = L2 size. They are still his numbers. If the L2 is a different size, then either his numbers or his names for the numbers aren't accurate.

I don't think they can reduce the L3 caches without a layout change or simply disabling them. The first of which isn't going to happen. Hell, it barely happens on a lithography shrink.
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,078
3,911
136
I don't think they can reduce the L3 caches without a layout change or simply disabling them. The first of which isn't going to happen. Hell, it barely happens on a lithography shrink.

They did it for Athlon II. Yes, it would be a lot of work, but when I was watching the Bulldozer Hot Chips presentation it was very clear that the L2 is part of the core/module design and the L3 is part of the SoC design, and the presenter said the configuration of the L3 is purely a choice related to the SoC.

If memory latency and prefetch are a lot better in Bulldozer than in STARS, and given that the L3 acts as an eviction cache, would it even make that big of a difference to consumer workloads?

If AMD is going to be on 32nm for a long period while Intel is on 22nm, it makes more sense to do this, hopefully with no layout change to the "core" itself. I guess the actual wiring would change a lot.

Edit:

Thinking about it, having two different 8-core designs might be a bit of a waste of resources if they plan on actually delivering continued core improvements. So maybe the quad core could go without L3 to make it as small as possible. Depending on the 8-core's performance, we might be paying a packet for them anyway, and relative to AMD's current position they might not care about the extra die size.

I guess the other thing is that, back in the day, AMD still had the fastest CPU even though they were a process behind. Intel already knows what they're doing for 22nm. If AMD can deliver with Bulldozer, they still might have a performance advantage over IB even though IB would be much smaller.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
They did it for Athlon II. Yes, it would be a lot of work, but when I was watching the Bulldozer Hot Chips presentation it was very clear that the L2 is part of the core/module design and the L3 is part of the SoC design, and the presenter said the configuration of the L3 is purely a choice related to the SoC.

The layout change is really simple in this case:

Athlon II: http://www.anandtech.com/show/2775
Phenom II: http://www.anandtech.com/show/2702/3

Notice the outer edges where the I/O (memory controller, connection to the chipset, HyperTransport) is.

-If you look at the Phenom II X4 die shot, look at the I/O, and assume that there are only 2 cores, the dark green I/O is exactly as long as 2 cores + the extra 512KB L2 cache per core; there's no empty spot. Of course, in this case, the extra 512KB L2 per core is due to having 2x the cores, or 2 extra cores.
-On the Athlon II X2 shot, you can see the same patterned I/O is about the length of the core, and Athlon II X2 has 1MB L2 per core, exactly what would be required to fit the length.
-The I/O connections that are at the bottom of the dark green one in the Phenom II X4 shot are now at the bottom outer layer of the Athlon II X2 shot.

Unlike the Bulldozer die, all they needed to do to make a dual-core 0MB L3 device was omit the entire L3 cache block, conveniently located at the bottom, and the 2 cores, then rotate some portion of the I/O by 90 degrees, and you've got yourself an Athlon II X2.

Do you see the L3 caches all combined together in one spot for easy omission?

THAT kind of layout change doesn't happen on a lower-end crippled device.

Pic explaining the above:
will6u.jpg
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Silence from the AMD corner....I would not be thrilled if I was a stockholder.

Yeah, AMD is in a pretty precarious position here.

If you have a stellar part do you do what Intel did and let people at it three months ahead of time to freeze out your competitor? The problem is you can (will) freeze yourself out at the same time, and AMD can't take the revenue loss.

On the other hand, silence implies you have nothing to hype. That doesn't inspire confidence either.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
On the other hand, silence implies you have nothing to hype. That doesn't inspire confidence either.

One is a risk to revenue and cashflow...a very real concern for any company.

The other is a risk to shareholders, whose net value has little to no impact on your own actual financials.

Without question AMD is best off keeping silent until they are ready to deliver on volume orders. Without question their shareholders will feel to the contrary.

It is obvious what the shareholders would want, they have the shortest of short term mindsets of everyone with a stake in the pot. But if I was a salaried employee at AMD or GloFo I'd be praying that my superiors would shut the hell up until we were ready to deliver.
 

OneEng1

Junior Member
Apr 3, 2010
9
0
0
Seems to me that a 2c BD would squarely compete with Llano. It is never a great idea to compete with yourself ;)

As far as IB vs BD, I wouldn't bet the farm that BD will outperform IB. Considering the deficit that K10.5 currently has against Sandy Bridge, it seems pretty optimistic to expect BD to not only catch up to and exceed SB, but to do it so definitively that even IB can't catch up.

Sure, AMD was able to do this with K8 vs P4; however, at that time, Intel was off thinking that physics didn't apply to them anymore since they were (after all) Intel.

As it turns out, heat density is a problem even if you happen to be Intel. P4's ultra deep pipeline and double clocked ALU's really turned and burned up the temps. K8 was a fundamentally better design, as was PIII/Centrino (where Core 2 got its roots from).

Once Intel determined that clock speed wasn't going to sell poor performing processors anymore and got their act together with Core 2, everything changed.

The Nehalem based architecture is not fundamentally inferior to Bulldozer as P4 was to K8. Intel has done a remarkable job refining their single threaded efficiency with SB. On the same die size, I suspect that BD will exceed the performance of SB.

Once IB is introduced, I suspect that the tables will turn and IB will exceed the performance of BD at the same die size.

In the server market, BD may be able to maintain its lead. As far as handling well threaded code goes, BD is an architecture designed to move information better than SB IMHO. Even IB may have difficulty eclipsing BD in this space. AMD could likely create a 2x10c part on a single socket (20c total) with little trouble on 32nm. It would be big, but it would be very powerful.

On the desktop, I feel pretty sure that IB is going to be hard to beat.
 

drizek

Golden Member
Jul 7, 2005
1,410
0
71
AMD is going to release 10-core bulldozer next year, and they will glue two of them together to make 20-core server parts, like they did with Thuban.
 

OneEng1

Junior Member
Apr 3, 2010
9
0
0
AMD is going to release 10-core bulldozer next year, and they will glue two of them together to make 20-core server parts, like they did with Thuban.

It does make me wonder if they should have opted for 3 channel or 4 channel memory. With 20 cores, are they going to have enough bandwidth to feed the beast? I realize that they have 2 banks of dual channel memory, but still ...... 20 cores?
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
It does make me wonder if they should have opted for 3 channel or 4 channel memory. With 20 cores, are they going to have enough bandwidth to feed the beast? I realize that they have 2 banks of dual channel memory, but still ...... 20 cores?
They already use 4 channels. Even so, 5 threads per channel could be pushing it.
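
To put rough numbers on that, here is a small Python sketch. The 20-core and 4-channel figures come from the posts above; the DDR3-1333 per-channel peak is only an illustrative assumption about what memory such a part might use, and the function names are hypothetical.

```python
def cores_per_channel(cores: int, channels: int) -> float:
    """Average number of cores sharing one memory channel."""
    return cores / channels


def peak_gbps_per_core(cores: int, channels: int, gbps_per_channel: float) -> float:
    """Theoretical peak bandwidth per core if every core pulls memory at once."""
    return channels * gbps_per_channel / cores


# Assumption: DDR3-1333 on a 64-bit channel, i.e. 1333 MT/s * 8 bytes per transfer.
DDR3_1333_GBPS = 1333e6 * 8 / 1e9

print("Cores per channel:", cores_per_channel(20, 4))
print("Peak GB/s per core:", round(peak_gbps_per_core(20, 4, DDR3_1333_GBPS), 2))
```

These are theoretical peaks only; real sustained bandwidth is lower, and how much each core actually needs depends heavily on the workload.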
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
AMD is going to release 10-core bulldozer next year, and they will glue two of them together to make 20-core server parts, like they did with Thuban.

20 cores.....wow.....I can now finally run Photoshop. :rolleyes:

Give me 8 fast cores over 20 slower cores any day of the week. I am not sure why people get all excited over more cores without caring about speed or IPC performance or wattage.
 

Arkadrel

Diamond Member
Oct 19, 2010
3,681
2
0
20 cores.....wow.....I can now finally run Photoshop. :rolleyes:

Give me 8 fast cores over 20 slower cores any day of the week. I am not sure why people get all excited over more cores without caring about speed or IPC performance or wattage.


20 cores would be a server part... for people who run virtual machines and assign 1 core to each virtual machine server.

I agree though, for the desktop consumer, most games don't even use 6-8 threads yet... so having 20 cores wouldn't do much for the gamer, but like the guy said, it's a server part.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
20 cores.....wow.....I can now finally run Photoshop. :rolleyes:

Give me 8 fast cores over 20 slower cores any day of the week.
Photoshop? Chances are good that its lone VGA output will only be used once during its entire lifetime, to configure the BIOS, and install the OS.

I am not sure why people get all excited over more cores without caring about speed or IPC performance or wattage.
Because Apache, MySQL, DB2, Oracle whatever-the-name-is-now, and Postgres can all make great use of more cores almost as well as faster cores, and are all very low IPC, throwing a wrench in your assumptions. In fact, most server software that isn't HPC is low IPC. Performance per watt can be quite good.

As such, the desktop part would top off at 10 cores, clocked much faster than the 20-core server part.