Rumour: Bulldozer 50% Faster than Core i7 and Phenom II.

videoclone · May 12, 2011

OCGuy said:
You still haven't read the post. Sorry, that is much more believable to me than your anecdotal evidence.

Down clocking the CPU made no difference.

OK, to help those people who DO NOT play Supreme Commander 2, and as someone with over 800 hours in game, let me explain.

The game has a LIVE CPU benchmark indicator in game (a SIM rating is displayed, much like ping). The higher your SIM, the better you're running the game.

As you play, your SIM rating drops as the game progresses and more units and structures are created. Other players in the game also have SIM ratings; theirs can be higher or lower depending on their CPU.

An overclocked CPU will give you a few extra SIM over someone with the same CPU at default clock speed! They will start to slow down the game before you do.

A guy with an i7 2600 + HD5770 1GB has exactly the same SIM from the start of the game to the end as someone with an i7 2600 + HD6950 2GB.

Everyone knows that RTS games (millions of little units running around doing AI things) are all about the CPU, and FPS games (pretty graphics and fast action) are all about the video card.

I will buy whatever is the fastest CPU for SupCom 2, and at the moment it's the i7 2600. I'm hoping Bulldozer is a faster gaming CPU than SB.

A friend has an i7 980X, and I have a higher SIM rating than he does every time we play... WHY? Because the i7 2600 is a faster gaming CPU, and the reviews prove it.

Idontcare · May 12, 2011

videoclone said:
Ok to help those people who DO NOT play Supreme Commander 2 and with over 800 hours in game let me explain.

< snip super helpful post contents for sake of brevity >

Dude, thank YOU so much for taking the time and breaking it down like this for the rest of us who just weren't getting it :thumbsup:

Seriously, I was clueless on the back and forth above.

I wish more posters would be like you and take a little time to bring the rest of us hapless thread readers up to speed on occasion. Thanks again!

Mopetar · May 12, 2011

Do you know if the game continues to scale with more cores or if it is hard-limited to a certain number? There are certainly games that benefit from additional cores (Civ V comes to mind), but there are also a lot that are CPU-limited yet don't appear to benefit from additional cores. I'm not too familiar with how well this particular game scales, but if it's artificially limited, BD may not perform better.

I think developers are getting better at utilizing the growing number of CPU cores in modern systems, but there are still a lot of games that don't benefit from more cores.

Edit: After doing a little bit of searching, there are some posts claiming Supreme Commander 2 scales up to 12 threads. Would like something more substantive, but this is a game that may benefit quite well from BD.

iCyborg · May 12, 2011

Mopetar said:
Edit: After doing a little bit of searching, there are some posts claiming Supreme Commander 2 scales up to 12 threads. Would like something more substantive, but this is a game that may benefit quite well from BD.

That is quite at odds with videoclone's claim that i7 980X loses to 2600 though...

Mopetar · May 13, 2011

iCyborg said:
That is quite at odds with videoclone's claim that i7 980X loses to 2600 though...

I'd hoped to find more substantial support one way or the other, but there were a few forum posts on various sites that seemed to indicate that it could scale well enough for the current crop of CPUs. I suppose until someone posts some actual benchmarks, we should take all anecdotal evidence with a grain of salt. I'll see if I can find some that indicate one way or another.

videoclone · May 13, 2011

iCyborg said:
That is quite at odds with videoclone's claim that i7 980X loses to 2600 though...

The developers worked on this game with dual and quad cores in mind. It has full support for multi-core CPUs, but just how many cores, I'm not sure.

Yeah, the 6 cores on the i7 980X would be better than the 4 on the i7 2600, but I think the more efficient CPU is helping balance out the difference.

Plus, as seen in reviews, Sandy Bridge seems to excel in video games.
Just check out these benchmarks:
http://www.anandtech.com/show/4083/...core-i7-2600k-i5-2500k-core-i3-2100-tested/20

videoclone · May 13, 2011

Mopetar said:
I'd hoped to find more substantial support one way or the other, but there were a few forum posts on various sites that seemed to indicate that it could scale well enough for the current crop of CPUs. I suppose until someone posts some actual benchmarks, we should take all anecdotal evidence with a grain of salt. I'll see if I can find some that indicate one way or another.

Look above this post at the AnandTech link... The i7 2600 is clearly the faster GAMING CPU in both RTS and FPS (Dawn of War II and StarCraft II being the CPU-hungry RTS examples).

We just need Bulldozer to beat those numbers to make it a VERY interesting 2011

Idontcare said:
Dude, thank YOU so much for taking the time and breaking it down like this for the rest of us who just weren't getting it :thumbsup:

Seriously, I was clueless on the back and forth above.

I wish more posters would be like you and take a little time to bring the rest of us hapless thread readers up to speed on occasion. Thanks again!

Thanks, and nice moderating job, by the way ^_~

iCyborg · May 13, 2011

videoclone said:
Yeah, the 6 cores on the i7 980X would be better than the 4 on the i7 2600, but I think the more efficient CPU is helping balance out the difference.

Look above this post at the AnandTech link... The i7 2600 is clearly the faster GAMING CPU in both RTS and FPS (Dawn of War II and StarCraft II being the CPU-hungry RTS examples).

We just need Bulldozer to beat those numbers to make it a VERY interesting 2011

I don't think that's what's balancing it out. StarCraft doesn't include the 980X and 975, but the graph for Dawn of War 2 clearly suggests that the game doesn't scale beyond 4 cores, since the 975 actually beats the 980X despite having the same clock, the same architecture, and two fewer cores.

So, yes, the 2600K beats the 980X, but it also beats the 975 by only 6% in DoW2. With a slightly higher clock, better Turbo, and medium settings at a mediocre resolution, it's hardly an impressive showing. You can see in the encoding-type tests what happens when apps scale with cores.

For the same reasons, BD doesn't look like something you should be looking forward to, since AMD seems to be going for core count there. I'd be very pleasantly surprised if BD can match SB in single-threaded performance (at similar prices, not clock-for-clock), and if it can OC as well.
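A minimal sketch of the scaling argument above, assuming (hypothetically) a game that can keep at most four threads busy and otherwise speeds up linearly with clock and usable cores. Both chips are modeled at the same 3.33 GHz stock clock, as the post notes; the function name and the linear model are illustrative only, not derived from any benchmark.

```python
def relative_throughput(cores: int, clock_ghz: float, max_threads: int) -> float:
    """Toy model: clock speed times the number of cores the workload can use.

    Assumes perfect linear scaling up to `max_threads` and none beyond it,
    which real games only approximate.
    """
    usable_cores = min(cores, max_threads)
    return clock_ghz * usable_cores


# Same clock and architecture, 4 vs 6 cores (i7 975-like vs i7 980X-like).
CLOCK_GHZ = 3.33

# Game capped at 4 threads: the two extra cores contribute nothing.
capped_975 = relative_throughput(cores=4, clock_ghz=CLOCK_GHZ, max_threads=4)
capped_980x = relative_throughput(cores=6, clock_ghz=CLOCK_GHZ, max_threads=4)
print(capped_975 == capped_980x)  # True

# Encoder-style workload that scales to 12 threads: about 1.5x for 6 cores.
scaled_975 = relative_throughput(cores=4, clock_ghz=CLOCK_GHZ, max_threads=12)
scaled_980x = relative_throughput(cores=6, clock_ghz=CLOCK_GHZ, max_threads=12)
print(round(scaled_980x / scaled_975, 2))  # 1.5
```

Under such a cap only per-core speed matters, which is why a design that adds cores rather than per-core speed would not help that kind of game. The model predicts a tie; it does not capture the real graph showing the 975 slightly ahead.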

videoclone · May 13, 2011

You should look at all the gaming benchmarks in general anyway.
We can only hope that it will match or beat it in single-threaded performance! (on price!)
Either way, we won't be waiting very long to find out.

DirkGently1 · May 13, 2011

Tuna-Fish said:
Supreme Commander and Shogun II are both notoriously cpu-hungry. There simply isn't a processor in the world that will run either perfectly in very complex situations.

Perhaps in 4 years?

If game companies are releasing games that can't be played on current hardware, I guess that is something that needs looking at. It's a bit ludicrous, really. Are you saying that the games can't be played, or just that there is a CPU bottleneck? I find it hard to believe that an SB @ 3.2 GHz+ with Tri-Fire or 3-way SLI, for example, will not be getting 60 FPS. Am I wrong?

Tuna-Fish · May 13, 2011

OCGuy said:
You still haven't read the post. Sorry, that is much more believable to me than your anecdotal evidence.

Down clocking the CPU made no difference.

The load Supreme Commander places on your CPU depends very much on the map and the amount of units.

In that test, they were hitting a flat 100 fps, which means they couldn't possibly have been playing a very large map. Ask anyone who has played any amount of Supreme Commander: you cannot expect to be anywhere near 100 fps in the late game. Just another case of benchmarks being done by idiots.

DirkGently1 said:
If game companies are releasing games that can't be played on current hardware, I guess that is something that needs looking at. It's a bit ludicrous, really. Are you saying that the games can't be played, or just that there is a CPU bottleneck? I find it hard to believe that an SB @ 3.2 GHz+ with Tri-Fire or 3-way SLI, for example, will not be getting 60 FPS. Am I wrong?

Supreme Commander can be played just fine when you limit the players/units/map size. It's just that it's the "heir" of Total Annihilation, and the game mechanics make it more fun as you add more units. The game company just didn't see a reason to put arbitrary limits on the maximum number of units, so you have to know how large a game your rig can handle before you join one.

There is no need for tri-SLI or anything similar; the game isn't really that heavy on the GPU. But as the load it places on your CPU goes up with the number of units, and players generally want to play with more of them, there just isn't enough CPU power out there.

Arkadrel · May 13, 2011

Oh man! I remember TA (Total Annihilation), that was epic for LAN parties.

But 2 hours into a match, the airplane wars (with enough players) would make most of our PCs at LAN parties lag. That game has to be one of the most CPU-intensive games ever, once everyone has max units and is going for an air battle.

HW2050Plus · May 13, 2011

Dresdenboy said:
Avoiding the work of doing an in-depth analysis involving details of manufacturing (from mask costs over foundry-fabbing and their fixed/variable costs - now per die, to stepper throughput issues of certain die widths and lengths), of development issues and so on, I'll try to keep it short with my thoughts:
- How much additional costs are caused by the unused space and what's the effect on margins?
- Will those 8C dies be sold to lower ASP market segments?
- Is production capacity limited in regard of anticipated volumes?

I do see die size utilization as a performance issue. As you know, you cannot make a chip as large as you want. There are many limits: speed paths, TDP, and finally cost.

AMD will not suffer from the initial Zambezi die size. But they will already suffer with Interlagos, and they will also suffer in the desktop area in 2012 at the latest, when Sandy Bridge-E parts are available.

Let me explain what I mean with an example, so you understand where I see the problem:
Intel 4C/8T: ~150-160 mm²
AMD 4M/8C: ~280 mm²
Intel 6C/12T: ~220 mm²
Intel 8C/16T: ~290 mm²
(theor. AMD 8M/16C: 560 mm²)

All of that is still on 32 nm! This means that with roughly the same die area, Intel's architecture is much faster than AMD Bulldozer.
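To make the die-area side of this argument concrete, here is a small sketch that divides the estimates quoted above by core and thread counts. It assumes midpoints for ranges and counts integer cores as "cores" for the AMD part; all values are forum estimates (the Intel 4C figure excludes the integrated GPU, and several numbers are disputed later in the thread). Area per thread says nothing about performance per thread, which is the actual point of contention.

```python
from typing import Dict, Tuple

# (die area in mm^2, cores, hardware threads) as estimated in the post above.
# Intel 4C/8T uses the midpoint of the quoted ~150-160 mm^2 range.
CHIPS: Dict[str, Tuple[float, int, int]] = {
    "Intel 4C/8T": (155.0, 4, 8),
    "AMD 4M/8C": (280.0, 8, 8),
    "Intel 6C/12T": (220.0, 6, 12),
    "Intel 8C/16T": (290.0, 8, 16),
}


def area_density(area_mm2: float, cores: int, threads: int) -> Tuple[float, float]:
    """Return (mm^2 per core, mm^2 per hardware thread)."""
    return area_mm2 / cores, area_mm2 / threads


for name, (area, cores, threads) in CHIPS.items():
    per_core, per_thread = area_density(area, cores, threads)
    print(f"{name:13s} {per_core:5.1f} mm^2/core  {per_thread:5.1f} mm^2/thread")
```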

I mean, AMD Bulldozer will surpass the Intel SB 4C in performance, but Intel can just double the core count, and AMD is in the same position as now: significantly behind the competition. Okay, you could ask why I'm crying when it's the same as now, but the difference is that AMD issued a brand-new architecture and it didn't change anything. That will affect average selling prices. The problem with performance per die size is not the die size, it is the performance.

And in 2012 Intel will move to a 22 nm die shrink.

Now, die size is not everything, because when you are TDP-limited, a small die size does not help. But there is the same bad news there. Intel can do their SB 8C/16T at roughly the maximum TDP they have (still on 32 nm). AMD obviously has problems there too, because for the high-end 4M/8C part they have already specified a 125 W TDP. So AMD would also have to drop frequency significantly when adding more cores (the Interlagos problem).

As you can see, although they surpass Sandy Bridge 4C, this is just not enough to be competitive, and by that I mean competitive enough for AMD to raise average selling prices. Again, 2H2011 might go well for AMD because of Bulldozer, but in 2012 it will look the same as now or (hopefully not) even worse for AMD.

That is my point and not the manufacturing ability.

So Bulldozer should fix three design issues:
a) CMT -> This is just not effective enough regarding die size.
b) High-frequency design -> Obviously not effective regarding TDP (AMD even has SOI and a higher TDP).
c) High uncore die consumption

So my proposal for AMD for Bulldozer II:
to solve a):
BD brings a lot of fine things with it: a decoder capable of decoding 4 ops per cycle. This is an advantage over Intel. So just add another one to feed the other integer core. That takes ~6 mm² per core. A lot, but you gain a lot.
Now also widen up to 4 ALU + 2 AGLU. That costs very little, ~2 mm² per core. The scheduler must also be changed, of course, but that is no issue, since the scheduler step back was related to b), and as we fix b) that should be no issue. The result could be a core that is equal to or faster than an SB core. On top of that, utilizing the amazing decoder, they can add SMT. So another SB feature included.

b) Remove the high-speed design, fix latencies (esp. INT-SSE), etc. That will fix the TDP issues.

c) Optimize the uncore: that will result in an even smaller chip, even though we added more transistors with a).

The result could be a BD II that is able to compete with (equal or beat) an SB-E chip. That will lift AMD to parity with Intel.

That is what AMD should do without delay, so as not to lose the ability to compete and go the way the other x86 companies did (Cyrix, IBM, etc.). These companies also lost the performance market first, and then their whole business.

APUs alone will not do it for AMD as you do not earn enough to finance the business.

Martimus · May 13, 2011

HW2050Plus said:
I do see die size utilization as a performance issue. As you know, you cannot make a chip as large as you want. There are many limits: speed paths, TDP, and finally cost.

AMD will not suffer from the initial Zambezi die size. But they will already suffer with Interlagos, and they will also suffer in the desktop area in 2012 at the latest, when Sandy Bridge-E parts are available.

Let me explain what I mean with an example, so you understand where I see the problem:
Intel 4C/8T: ~150-160 mm²
AMD 4M/8C: ~280 mm²
Intel 6C/12T: ~220 mm²
Intel 8C/16T: ~290 mm²
(theor. AMD 8M/16C: 560 mm²)

All of that is still on 32 nm! This means that with roughly the same die area, Intel's architecture is much faster than AMD Bulldozer.

http://www.anandtech.com/show/4118/a-closer-look-at-the-sandy-bridge-die

4C Sandybridge is 216mm^2, not 150mm^2.
6C Gulftown is 240mm^2, not 220mm^2.

The rest don't exist yet.

Also, you are making assumptions about Bulldozer's performance sight unseen, then analyzing these fictional performance figures to judge how efficient the die is. You have no idea whether an equivalent-sized SB or BD processor would be superior, since we only have data from one of these chips.

Arkadrel · May 13, 2011

An 8c Bulldozer chip seems to be around 280-290 mm^2 by most people's guesses (measuring pics).
A 4c Bulldozer chip, you could then reason, would be HALF of that.

So 280-290 / 2 => 140-145 mm^2 for a 4c Bulldozer chip.
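The halving above can be written out as a one-line estimate. This sketch adds an optional, hypothetical `uncore_mm2` parameter for area that would not shrink with fewer modules (memory controller, I/O and so on). Left at zero, it reproduces the 140-145 mm^2 figure; the 80 mm^2 value below is purely illustrative, not an estimate of Bulldozer's real uncore.

```python
def half_die_estimate(full_die_mm2: float, uncore_mm2: float = 0.0) -> float:
    """Estimate a 2-module (4-core) die from a 4-module (8-core) die.

    Only the module part is halved; `uncore_mm2` (hypothetical) is area
    assumed to stay the same size on the smaller die.
    """
    module_part = full_die_mm2 - uncore_mm2
    return module_part / 2 + uncore_mm2


for full_die in (280.0, 290.0):
    plain = half_die_estimate(full_die)                         # straight halving
    with_uncore = half_die_estimate(full_die, uncore_mm2=80.0)  # illustrative only
    print(full_die, plain, with_uncore)
```

Since the result equals half the full die plus half the fixed uncore, any non-zero uncore pushes the 4-core estimate above a straight half, so 140-145 mm^2 is best read as a lower bound.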

Anandtech says:
4c Sandy Bridge = 216mm^2
4c bulldozer estimation guess = 145mm^2

Of course, the Sandy Bridge will have 8 threads...
and in stuff that uses 8 threads, the 4c Sandy Bridge will most likely beat the 4-core Bulldozer.
That is why the Bulldozer architecture shares stuff: to add a 2nd core without taking up much space.

If AMD has any chance of catching Intel, it will be through very smart designs; to be honest, I'm not sure that is enough myself.

The Bulldozer chips will probably be sold a bit cheaper and live off that.
I'd like to think Bulldozer will be faster than Sandy Bridge... but it probably won't be; regardless, it's a big step up from their Phenom IIs.

Links to guys measuring pictures for predictions of the Bulldozer 8-core chip size:
http://citavia.blog.de/2011/03/01/isscc-2011-news-and-bulldozer-die-size-10726253/
http://www.semiaccurate.com/forums/showthread.php?p=100377#post100377

@HW2050Plus

Comparing 1 module of a Bulldozer vs 1 Sandy Bridge thread (not even a core)
is probably wishful thinking.

I seriously doubt Sandy Bridge will be twice as powerful per core as Bulldozer.
It probably makes more sense if you compare them core to core. Although that way, performance-wise, Intel chips will probably be a bit faster in multi-threaded stuff because of Hyper-Threading giving them extra threads to work with.

podspi · May 13, 2011

HW2050Plus said:
I do see die size utilization as a performance issue. As you know, you cannot make a chip as large as you want. There are many limits: speed paths, TDP, and finally cost.

AMD will not suffer from the initial Zambezi die size. But they will already suffer with Interlagos, and they will also suffer in the desktop area in 2012 at the latest, when Sandy Bridge-E parts are available.

...<snip>

APUs alone will not do it for AMD as you do not earn enough to finance the business.

Performance per mm^2 is important, but we don't know BD's performance yet. We also don't really know BD's size (although there are a lot of estimates I'd be inclined to believe). Here is what I think is going on.

First, I think we should all realize that Zambezi is going to be a very low volume product for AMD. I think this actually is going to cause die size to be larger than it would be otherwise, because AMD is not going to bother making a specific consumer die. So the "unoptimized uncore" is probably due to all of the extra things for upcoming BD-based Opterons coming in Q3. We will (probably) have to wait until Trinity to see a consumer-optimized BD CPU. I also don't think AMD is going to be hurting too badly from the size of Interlagos, since it is an MCM.

Second, I think APUs will be more than enough. I know people who purchased Bobcat-based systems who are satisfied with them. Llano is going to be more than capable for the average user and, with Hybrid CrossFire, is IMHO going to be the mobile gaming CPU to get. These are incredibly high-volume products that will make AMD a lot of money.

Hopefully in Q3 the BD-based Opterons will also be successful, but since we have no idea how they are going to perform, that is up in the air at the moment.

Riek · May 13, 2011

HW2050Plus said:
I do see die size utilization as a performance issue. As you know, you cannot make a chip as large as you want. There are many limits: speed paths, TDP, and finally cost.

AMD will not suffer from the initial Zambezi die size. But they will already suffer with Interlagos, and they will also suffer in the desktop area in 2012 at the latest, when Sandy Bridge-E parts are available.

Let me explain what I mean with an example, so you understand where I see the problem:
Intel 4C/8T: ~150-160 mm²
AMD 4M/8C: ~280 mm²
Intel 6C/12T: ~220 mm²
Intel 8C/16T: ~290 mm²
(theor. AMD 8M/16C: 560 mm²)

All of that is still on 32 nm! This means that with roughly the same die area, Intel's architecture is much faster than AMD Bulldozer.

More like: Intel SB 4C ~180 mm²
Gulftown 6-core ~240 mm²
8-core doesn't exist yet; will it exist?? Leaked roadmaps show a 6-core SB-E with massively more cache (>300 mm²)
AMD 4M/8C = unknown yet, but calculations show around ~280 mm²

That puts it between Gulftown and a 6-core SB-E, which is not that bad... given that those 6-core Intels run up against the 130 W TDP limit at around 3.33 GHz.

I mean, AMD Bulldozer will surpass the Intel SB 4C in performance, but Intel can just double the core count, and AMD is in the same position as now: significantly behind the competition. Okay, you could ask why I'm crying when it's the same as now, but the difference is that AMD issued a brand-new architecture and it didn't change anything. That will affect average selling prices. The problem with performance per die size is not the die size, it is the performance.

And in 2012 Intel will move to a 22 nm die shrink.

There are more aspects to cost than die size... what about setup costs and returns? What about the time needed to launch your device?

Now, die size is not everything, because when you are TDP-limited, a small die size does not help. But there is the same bad news there. Intel can do their SB 8C/16T at roughly the maximum TDP they have (still on 32 nm). AMD obviously has problems there too, because for the high-end 4M/8C part they have already specified a 125 W TDP. So AMD would also have to drop frequency significantly when adding more cores (the Interlagos problem).

Seriously?? Intel's top-end 130 W 32 nm Westmere-EX runs at around 2.4 GHz.
The 130 W Gulftown runs at around 3.46 GHz with 6 cores.
The 3.4 GHz 4-core SB with a 95 W TDP is running near its maximum (in AVX code).
No, they cannot release an 8-core SB-E @ 3.4 GHz without serious binning.
Also, the BD 4M/8C comes with TDPs of 95 W and 125 W.
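Dividing the TDPs quoted above by core count gives a rough per-core budget. This is only a sketch under loud caveats: TDP is a thermal design figure rather than measured power, the Sandy Bridge number also covers its integrated GPU, and a Bulldozer "core" here is one integer core sharing an FPU with its module partner, so the rows are not strictly comparable.

```python
from typing import Dict, Tuple

# (TDP in watts, cores) for the parts mentioned in the post above.
PARTS: Dict[str, Tuple[float, int]] = {
    "Gulftown 6C, 3.46 GHz": (130.0, 6),
    "Sandy Bridge 4C, 3.4 GHz": (95.0, 4),
    "Bulldozer 4M/8C, 95 W": (95.0, 8),
    "Bulldozer 4M/8C, 125 W": (125.0, 8),
}


def watts_per_core(tdp_w: float, cores: int) -> float:
    """Evenly split the TDP across cores (ignores uncore and iGPU power)."""
    return tdp_w / cores


for name, (tdp, cores) in PARTS.items():
    print(f"{name:24s} {watts_per_core(tdp, cores):5.1f} W/core")
```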

As you can see, although they surpass Sandy Bridge 4C, this is just not enough to be competitive, and by that I mean competitive enough for AMD to raise average selling prices. Again, 2H2011 might go well for AMD because of Bulldozer, but in 2012 it will look the same as now or (hopefully not) even worse for AMD.

That is my point and not the manufacturing ability.

Try looking at it from a server perspective, where the roles become completely reversed from what you believe.

If the BD 8c manages to be around the Gulftown 6c, they are fine. The difference in die size is almost nothing, and they will have the die size advantage against the 6-core SB-E.
22 nm won't come to servers/the high end for a long time yet.

So Bulldozer should fix three design issues:
a) CMT -> This is just not effective enough regarding die size.
b) High-frequency design -> Obviously not effective regarding TDP (AMD even has SOI and a higher TDP).
c) High uncore die consumption

a) You don't know.
b) How so, not effective? A 95 W TDP for 8 cores, with performance above the SB 2600K (a guess), which also has a 95 W TDP (and gets close to it with AVX code).
c) Looking at the released manipulated die shots... yes, but was that final? It will probably be improved when the 5-module BD+ arrives (which includes an FPU change).

So my proposal for AMD for Bulldozer II:
to solve a):
BD brings a lot of fine things with it: a decoder capable of decoding 4 ops per cycle. This is an advantage over Intel. So just add another one to feed the other integer core. That takes ~6 mm² per core. A lot, but you gain a lot.
Now also widen up to 4 ALU + 2 AGLU. That costs very little, ~2 mm² per core.

CMT was done for a reason, and you want to make it two separate chips again. Different decoders = different predictors = completely different front ends = a lot more die space than performance gain. (But I understand you are still stuck on the decoding limitation in your head.)
Adding 2 ALUs per core would be complete idiocy: the gain would be below 10%, and the cost is not only in core space but also in the scheduler, in retire, in load/store, and in the datapaths.

The scheduler must also be changed, of course, but that is no issue, since the scheduler step back was related to b), and as we fix b) that should be no issue. The result could be a core that is equal to or faster than an SB core. On top of that, utilizing the amazing decoder, they can add SMT. So another SB feature included.

So basically your idea to improve BD is to remove the shared decoders, widen the execution resources, and use SMT to use those added resources? So BD should become like SB and the traditional core design? How revolutionary -_-

b) Remove the high-speed design, fix latencies (esp. INT-SSE), etc. That will fix the TDP issues.

There are no TDP issues. The latencies are fine for integer and SSE. A high-speed design has its advantages.

c) Optimize the uncore: that will result in an even smaller chip, even though we added more transistors with a).

If they have an uncore die size issue (which you think based on photoshopped dies), then they can fix it when they implement FMA3 and move to the 5-module design.

The result could be a BD II that is able to compete with (equal or beat) an SB-E chip. That will lift AMD to parity with Intel.

The result will indeed be that BD II = an SB core, since that is what you want to design; you will have 2 chips with basically the same advantages and disadvantages. BD will compete with SB-E in terms of the power/die-size/performance relation. (The 6-core SB-E will already be bigger than BD 4M.)

That is what AMD should do without delay, so as not to lose the ability to compete and go the way the other x86 companies did (Cyrix, IBM, etc.). These companies also lost the performance market first, and then their whole business.

APUs alone will not do it for AMD as you do not earn enough to finance the business.

I hope you are being sarcastic in your whole post... BD has a clearly server-oriented design, which is looking to be a lot better in $/W/mm than any Intel product there. The design will be less efficient on the desktop due to the nature of the workload, but it will be a lot more efficient than what they have now.

Tuna-Fish · May 13, 2011

HW2050Plus said:
A lot, but you gain a lot.
Now also widen up to 4 ALU + 2 AGLU. That costs very little, ~2 mm² per core.

WHAAAT!?

Do you have any idea about processor design? At all? You want to add 4 more read ports and 2 more write ports to the register file, plus all the extra forwarding for two more units, and you say it would cost "very little"?

Just STFU when you clearly have no clue whatsoever what you are talking about.

It might well not be possible to add that much extra access to the register file without making it much slower.

And all that for 2 more ALUs? Which, for x86, would absolutely not give you back the performance you just lost to a lower clock speed?

What are you thinking?
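The port count in this post can be checked with simple arithmetic. The sketch assumes each ALU needs two register-file read ports and one write port, ignores the AGLU and load-result ports, and applies a common rule of thumb: a multiported register-file cell's area grows roughly with the square of its port count, since each port adds a wordline and bitline(s) to every cell. Because the AGLU ports are left out, the ratio overstates the real growth; it shows the trend, not Bulldozer's actual numbers.

```python
from typing import Tuple


def alu_ports(n_alus: int) -> Tuple[int, int]:
    """(read ports, write ports) to feed n_alus two-operand ALUs."""
    return 2 * n_alus, n_alus


def relative_cell_area(ports_before: int, ports_after: int) -> float:
    """Rule of thumb: register-file cell area scales ~quadratically with ports."""
    return (ports_after / ports_before) ** 2


reads_now, writes_now = alu_ports(2)   # 2 ALUs per Bulldozer integer core
reads_new, writes_new = alu_ports(4)   # the proposed 4 ALUs

extra_reads = reads_new - reads_now     # 4 more read ports
extra_writes = writes_new - writes_now  # 2 more write ports
area_ratio = relative_cell_area(reads_now + writes_now, reads_new + writes_new)

print(extra_reads, extra_writes, area_ratio)  # 4 2 4.0
```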

HW2050Plus · May 13, 2011

Martimus said:
4C Sandybridge is 216mm^2, not 150mm^2.

You're making the mistake of forgetting to subtract the die area consumed by the integrated graphics unit.

Martimus said:
6C Gulftown is 240mm^2, not 220mm^2.

Oh, what a difference...

Arkadrel said:
An 8c Bulldozer chip seems to be around 280-290 mm^2 by most people's guesses (measuring pics).
A 4c Bulldozer chip, you could then reason, would be HALF of that.

So 280-290 / 2 => 140-145 mm^2 for a 4c Bulldozer chip.

Yes, a 4C Bulldozer, which competes with a dual-core Sandy Bridge in performance. I take two chips with similar performance (as far as that is possible) and compare their die sizes. So what is your point?

Arkadrel said:
I seriously doubt Sandy Bridge will be twice as powerful per core as Bulldozer.

2 Bulldozer cores will be a little bit faster overall than 1 Sandy Bridge core, though in some individual benchmarks BD will come out below that. So I claim that in at least one benchmark, 2 BD cores will certainly be slower than an SB core (HT on).

Arkadrel said:
Performance per mm^2 is important, but we don't know BD's performance yet. We also don't really know BD's size (although there are a lot of estimates I'd be inclined to believe). Here is what I think is going on.

We have enough information to make an educated guess about Bulldozer's performance. We also have the new die picture, at least without apparent obfuscation. And the die size fits with what we know from Deneb & Co. (poor die size -> uncore).

Riek said:
8-core doesn't exist yet; will it exist?? Leaked roadmaps show a 6-core SB-E with massively more cache (>300 mm²)

Sure it will exist, and it is on the roadmaps. The question is whether Intel will also release a desktop 8C part. It was on their desktop roadmap but disappeared. Maybe, though this is speculation, that is because Bulldozer is too weak, so a 6-core part is enough to be completely superior on the desktop.

Riek said:
Seriously?? Intel's top-end 130 W 32 nm Westmere-EX runs at around 2.4 GHz.
The 130 W Gulftown runs at around 3.46 GHz with 6 cores.
The 3.4 GHz 4-core SB with a 95 W TDP is running near its maximum (in AVX code).
No, they cannot release an 8-core SB-E @ 3.4 GHz without serious binning.
Also, the BD 4M/8C comes with TDPs of 95 W and 125 W.

They now have a 95 W TDP including a graphics core, and they really don't get that close to it even with the graphics core. With AVX, as you suggest, that might be, but as there is currently no AVX code out, I do not mind if the Turbo is lower then.

Maybe they release the 6-core at a 130 W TDP only to be able to release an 8C part, if needed, without a motherboard redesign. That might be the reason for requesting such a high design power.

Riek said:
If the BD 8c manages to be around the Gulftown 6c, they are fine. The difference in die size is almost nothing, and they will have the die size advantage against the 6-core SB-E.
22 nm won't come to servers/the high end for a long time yet.

They don't need 22 nm, because SB-E will come on 32 nm with 8 cores, and it will be faster than the 16-core Interlagos.
Your comparison of Gulftown/Westmere vs. Bulldozer is a comparison of the past. SB-E vs. BD is what will compete in 2012.

Riek said:
a) you don't know.

Oh yes, I know. CMT means that 2 cores have to share a single decoder. That is the issue. You invest a lot of silicon for extremely little gain. Just compare the die size of a Llano core (~9.6 mm²) with that of a BD module (~18.9 mm²). As two Llano cores are faster than a BD module, AMD even ends up with worse performance per die size than they had with their current chips. So in fact, regarding performance vs. die size, Bulldozer is even a step back for AMD, increasing the distance to Intel. Yes, they have some additional features like AVX and FO4, but that won't really help AMD.
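The area comparison in this post can be turned into a break-even figure. Using the poster's estimates (~9.6 mm² per Llano core, ~18.9 mm² per BD module; whether both figures include the same caches is not stated), here is a sketch of how fast a module must be, relative to two Llano cores, to match their performance per mm²:

```python
LLANO_CORE_MM2 = 9.6   # poster's estimate, one Llano core
BD_MODULE_MM2 = 18.9   # poster's estimate, one Bulldozer module (2 integer cores)


def break_even_fraction(module_mm2: float, core_mm2: float, cores: int = 2) -> float:
    """Share of `cores` Llano cores' combined throughput a module must reach
    to have equal performance per mm^2."""
    return module_mm2 / (core_mm2 * cores)


needed = break_even_fraction(BD_MODULE_MM2, LLANO_CORE_MM2)
print(f"{needed:.3f}")  # 0.984
```

Since the two areas are nearly equal, the argument rests almost entirely on the performance claim: a module slower than about 98.4% of two Llano cores has lower performance per mm², while a faster one does not.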

What is good is the shared FPU, because an FPU costs a lot of die area.

Riek said:
b) howso not effective? 95W TDP 8cores and performance above SB 2600K (guess) which has a TDP of 95 also (and reaches it close with AVX code).

First, the top 4M/8C part will come as a 125 W TDP part. Next, the SB 2600K consumes much less than 95 W. You say not in AVX; I say who cares, as no AVX code is out. And then, particularly in AVX, we have to see whether this high consumption in AVX (which you claim) doesn't come with exceptional performance.

Riek said:
c) looking at the released manipulated screens... yes but was that final? and will probably be improved when the 5module BD+ arrives (includes fpu change).

One more module, and what will Intel have by then?

Riek said:
Adding 2 ALUs per core would be complete idiocy: the gain would be below 10%, and the cost is not only in core space but also in the scheduler, in retire, in load/store, and in the datapaths.

For you: ~2-3 mm² more per module for that, which would add up to 12 mm² more for the whole chip on a 4M/8C. You should realize what the die space is actually consumed by. It is not integer. Roughly 30% is for decoders, about 35-40% for FPU/SSE, and 30% for integer/pipelines/scheduler/L1 cache. Now let's take BD: 18.9 mm² for the module -> ~6 mm² for integer/pipelines/scheduler/L1 cache, which makes ~3 mm² for a single core in total. So for all of your integer stuff, you have around 24 mm² for integer performance on your 280 mm² chip. A single pipeline + register file path + scheduler part + ALU will be around ~0.75 mm². That is the cheapest way of all to crank up performance.

That is why I would keep the shared FPU: the FPU consumes many times more die space.
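The area arithmetic in this post can be reproduced in a few lines. The module size and the ~30% integer share are the poster's estimates, not AMD figures; the unrounded results come out slightly below the post's rounded ~6, ~3 and ~24 mm².

```python
MODULE_MM2 = 18.9      # poster's estimate for one BD module
INTEGER_SHARE = 0.30   # integer/pipelines/scheduler/L1 share, per the post
MODULES = 4            # 4M/8C part
CHIP_MM2 = 280.0       # estimated full die


def integer_area(module_mm2: float, share: float, modules: int) -> dict:
    """Split the integer-area estimate into per-module, per-core and chip totals."""
    per_module = module_mm2 * share
    return {
        "per_module": per_module,
        "per_core": per_module / 2,      # 2 integer cores per module
        "chip": per_module * modules,
    }


est = integer_area(MODULE_MM2, INTEGER_SHARE, MODULES)
# With these inputs: per_module ~5.7, per_core ~2.8, chip ~22.7 mm^2

# The proposed extra ALUs: upper end of the post's 2-3 mm^2 per module.
added_mm2 = 3.0 * MODULES              # 12 mm^2
added_fraction = added_mm2 / CHIP_MM2  # ~4.3% of the whole die
print(est, added_mm2, round(added_fraction, 3))
```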

Riek said:
So basically your idea to improve BD is to remove the shared decoders, widen the execution resources, and use SMT to use those added resources? So BD should become like SB and the traditional core design? How revolutionary -_-

Exactly. Yes, it is not revolutionary. But guess how the market leader gets its exceptional performance? What did they do with Core? Why has Intel been ahead of AMD in performance for 5 years? Guess what they did?
They took the good old PIII, widened the execution width (port 5, AGU with Sandy Bridge, macro-op fusion, etc.) and reduced instruction latencies, paired with SMT, prefetching, and fast caches. So yes, not revolutionary, but extremely successful in every regard.

AMD is once again the first to do the near impossible: a wide x86 decoder. Fantastic! But they split this power into two halves. What a waste!

Intel is very decoder-limited, so this could have become a major advantage. Intel is struggling and added a loop trace cache in Sandy Bridge to compensate for these limitations. AMD comes out with a full solution but just wastes it!

As I said, Bulldozer will bring a great decoder, a good FPU, and finally prefetchers and long L/S queues. But all these great things are completely annihilated by (integer) CMT and the high-frequency design. If they remove those two points, which just break Bulldozer (again slower than its predecessor per die size), then everything could be really fine. If those are fixed, AMD could become roughly on par with Intel in overall performance, performance per die size and performance per Watt. But with the first BD incarnation they will fall even further behind than they are now. This is only covered up by the 32 nm transition and the ability to ship 8 cores.

And to come back to this once more:

Riek said:
So basically your idea to improve BD is to remove the shared decoders, widen the execution resources and use SMT to use those added resources? So BD should become like SB and the traditional core design? How revolutionary -_-

Releasing a P4-like design is not revolutionary either. CMT as such is okay, but CMT is just stupid on x86, because x86 is so decoder-limited. CMT on the FPU is clever, because many applications don't use the FPU and the FPU costs a lot of die space. But on integer? Double the cores, but then halve each core itself? What is the logic in that? Yes, there is a gain, but you could achieve that with SMT as well at almost no cost.

HW2050Plus · May 13, 2011

Tuna-Fish said:
WHAAAT!?

Do you have any idea about processor design? At all? You want to add 4 more read ports and 2 more write ports to the register file, plus all the extra forwarding to two more units, and you say it would cost "very little"?

Just STFU when you clearly have no clue whatsoever what you are talking about.

It might well not be possible to add that much extra access to the register file without making it much slower.

And all that for 2 more ALUs? Which, for x86, would absolutely not give you back the performance you just lost to a lower clock speed?

What are you thinking?

I think you have little knowledge.

Think again about what an incredible success Intel had just by adding Port 5, and how little it cost.

I also named the die sizes of chips where it is done that way and how small they are, and Bulldozer, where it was done the other way round, and how bad its performance vs. die size gets.
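Tuna-Fish's port count follows from a simple rule: if each ALU reads two source registers and writes one result per cycle, two more ALUs need four more read ports and two more write ports. The sketch below counts ports that way and applies a commonly used first-order rule of thumb that register-file area grows roughly with the square of the port count (each port adds a wordline and a bitline to every cell). The 2-read/1-write assumption and the two-ALU baseline are hypothetical simplifications: AGU, load-return and forwarding ports are ignored, so the numbers only illustrate the trend.

```python
# Rough register-file cost of adding ALUs (illustrative assumptions only).


def rf_ports(alus, reads_per_alu=2, writes_per_alu=1):
    """(read ports, write ports) needed to feed `alus` ALUs every cycle."""
    return alus * reads_per_alu, alus * writes_per_alu


def relative_rf_area(base_ports, new_ports):
    """First-order estimate: register-file area scales with ports squared."""
    return (new_ports / base_ports) ** 2


if __name__ == "__main__":
    base_r, base_w = rf_ports(2)   # hypothetical baseline: 2 ALUs
    wide_r, wide_w = rf_ports(4)   # 2 more ALUs, as discussed in the thread
    print("extra read ports: ", wide_r - base_r)
    print("extra write ports:", wide_w - base_w)
    print("relative RF area: ", relative_rf_area(base_r + base_w, wide_r + wide_w))
```

In this toy model the ALU-facing ports double and the register file roughly quadruples in area; how much that matters against the ~0.75 mm² per-path estimate depends on how large the register file is relative to the rest of the pipeline, which the thread does not settle.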

Martimus · May 13, 2011

HW2050Plus said:
I also named the die sizes of chips where it is done that way and how small they are, and Bulldozer, where it was done the other way round, and how bad its performance vs. die size gets.

I am going to say this again: you are making assumptions about Bulldozer's performance sight unseen, analyzing the results of these fictional performance figures, and making a judgement on how efficient the die is.

This makes your arguments seem very silly, to say the least. The truth is that neither you nor I know anything about the performance of BD versus the die size. Saying otherwise just seems foolish to me.

Dribble · May 13, 2011

Martimus said:
I am going to say this again: you are making assumptions about Bulldozer's performance sight unseen, analyzing the results of these fictional performance figures, and making a judgement on how efficient the die is.

This makes your arguments seem very silly, to say the least. The truth is that neither you nor I know anything about the performance of BD versus the die size. Saying otherwise just seems foolish to me.

lol, this is page 83 of a thread full of assumptions. If that's your viewpoint, then I think you are about 82 pages late in expressing it.

Martimus · May 13, 2011

Dribble said:
lol, this is page 83 of a thread full of assumptions. If that's your viewpoint, then I think you are about 82 pages late in expressing it.

Good point. Touché

It is kind of funny how this thread has stayed alive for months and thousands of posts with almost no actual content.

Edrick · May 13, 2011

How many more days until BD is released?

I don't know which I am more excited about, the actual release of the CPU or the death of this thread.

Terzo · May 13, 2011

Edrick said:
How many more days until BD is released?

I don't know which I am more excited about, the actual release of the CPU or the death of this thread.

I think rumors indicate it will be revealed at E3 (June 7-9) and released on June 20.
