4C Sandybridge is 216mm^2, not 150mm^2.
You are making the mistake of forgetting to subtract the die area consumed by the integrated graphics unit.
6C Gulftown is 240mm^2, not 220mm^2.
Oh what difference ...
An 8C Bulldozer chip seems to be around 280-290mm^2 by most people's guesses (measuring pics).
A 4C Bulldozer chip, you could then reason, would be HALF of that.
so 280-290 /2 => 140-145mm^2 for a 4c bulldozer chip.
Yes, a 4C Bulldozer, which competes with a dual-core Sandy Bridge on performance. I am taking two chips that perform similarly (as far as you can do that) and comparing their die sizes. So what is your point?
I seriously doubt Sandy Bridge will be twice as powerful per core as Bulldozer is.
2 Bulldozer cores will be a little bit faster overall than 1 Sandy Bridge core, though in some individual benchmark results BD will also come out below that. So I claim that in at least one benchmark 2 BD cores will certainly be slower than one SB core (HT on).
Performance per mm^2 is important, but we don't know BD's performance yet. We also don't really know BD's size (although there are a lot of estimates I'd be inclined to believe). Here is what I think is going on.
We have enough information to make an educated guess about Bulldozer's performance. We also have the new die picture, without any apparent obfuscation. And the die size fits with what we know from Deneb & Co (bad die size -> uncore).
An 8-core doesn't exist yet; will it exist?? Leaked roadmaps show a 6-core SB-E with massively more cache (>300mm^2).
Sure it will exist, and it is on the roadmaps. The question is whether Intel will also release a desktop 8C part. It was on their desktop roadmap but disappeared. Maybe, and this is speculation, that is because Bulldozer is too weak, so a 6-core part is already enough to come out completely superior on the desktop.
Seriously?? Intel's top end is 130W!! 32nm Westmere-EX is running at around 2.4GHz.
Gulftown at 130W is running at around 3.46GHz with 6 cores.
The 4-core SB at 3.4GHz with a 95W TDP is running near its maximum (in AVX code).
No they cannot release an SB-E 8core @ 3.4GHz without serious binning.
Also the BD 4M/8C has TDP of 95W and 125W.
They now have a 95W TDP including a graphics core, and they really don't get anywhere near it even with the graphics core. With AVX, as you claim, that might be so, but as there is currently no AVX code out I do not mind if the Turbo is lower in that case.
Maybe they will issue the 6-core as a 130 Watt TDP part, but only to be able to issue an 8C later - if needed - without requiring a motherboard redesign. That might be the reason for specifying such a high design power.
If the BD 8C manages to land around the Gulftown 6C they are fine. The difference in die size is almost nothing, and BD will have the die size advantage against the 6-core SB-E version.
22nm won't come to server/high end for a long time seen from now.
They don't need 22 nm because SB-E will come on 32 nm and have 8 Cores and they will be faster than Interlagos 16 Core.
Your comparison of Gulftown/Westmere vs. Bulldozer is a comparison with the past. SB-E vs. BD is what will compete in 2012.
Oh yes, I know. CMT means that 2 cores have to share a single decoder. That is the issue. You invest a lot of silicon for extremely little gain. Just compare the die size of a Llano core (~9.6 mm²) with that of a BD module (~18.9 mm²). As two Llano cores are faster than a BD module, AMD even ends up with lower performance per die area than they had with their current chips. So in fact, regarding performance vs. die size, Bulldozer is even a step back for AMD, increasing the distance to Intel. Yes, they have some additional features like AVX, FO4, but that won't really help AMD.
What is good is the shared FPU, because an FPU costs a lot of die area.
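To put numbers on the Llano vs. BD module comparison above, here is a small sketch using only the two area estimates quoted in the post. The names are just illustrative, and the areas are estimates from die shots, not official figures. It shows how much of the throughput of two Llano cores one BD module would need to deliver to break even on performance per die area.

```python
# Area estimates quoted in the post (approximate, not official figures).
LLANO_CORE_MM2 = 9.6
BD_MODULE_MM2 = 18.9

def break_even_throughput(module_mm2: float, core_mm2: float, cores: int = 2) -> float:
    """Fraction of the combined throughput of `cores` older cores that one
    module must reach to match them on performance per mm^2."""
    return module_mm2 / (cores * core_mm2)

two_llano_mm2 = 2 * LLANO_CORE_MM2
ratio = break_even_throughput(BD_MODULE_MM2, LLANO_CORE_MM2)

print(f"Two Llano cores: {two_llano_mm2:.1f} mm^2, one BD module: {BD_MODULE_MM2:.1f} mm^2")
print(f"Break-even: BD module needs {ratio:.1%} of the throughput of two Llano cores")
```

Since the two areas are nearly identical, any noticeable performance deficit of the module against two Llano cores translates almost one to one into a performance-per-area deficit.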
b) How so not effective? A 95W TDP with 8 cores and performance above the SB 2600K (a guess), which also has a TDP of 95W (and comes close to reaching it with AVX code).
First, the top 4M/8C part will come as a 125 W TDP part. Next, the SB 2600K consumes much less than 95W. You claim that is not so in AVX - I say who cares, as no AVX code is out. And then, particularly for AVX, we have to see whether this high consumption you claim does not simply come with exceptional performance.
c) Looking at the released, manipulated screens... yes, but was that final? And it will probably be improved when the 5-module BD+ arrives (which includes an FPU change).
One module more and what has Intel then?
Adding 2 ALUs per core would be complete idiocy; the gain would be below 10%, and the cost is not only in core space but also in the scheduler, in retire, in load/store, in the datapaths...
For you: ~2-3 mm² more for that per module. That would add up to 12 mm² more for the whole chip on a 4M/8C. You should realize what the die space is actually consumed by. It is not integer. Roughly 30% goes to the decoders, about 35-40% to FPU/SSE and 30% to integer/pipelines/scheduler/L1 cache. Now let's take BD: 18.9 mm² per module -> ~6 mm² for integer/pipelines/scheduler/L1 cache - which makes ~3 mm² in total for a single core. So for all of your integer stuff you have, on your 280 mm² chip, around 24 mm² spent on integer performance. A single pipeline + register file path + scheduler part + ALU will be around ~0.75 mm². That is the cheapest way of all to crank up performance.
That is why I would keep the shared FPU: the FPU consumes many times more die space.
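The integer-area estimate above can be reproduced with a few lines of arithmetic. This is only a sketch of the reasoning in the post: every input is one of the rough estimates given there (18.9 mm² per module, ~30% integer share, ~0.75 mm² per extra pipeline, ~280 mm² chip), and the variable names are made up for illustration. The post rounds the intermediate results up to ~6, ~3 and ~24 mm².

```python
# Rough estimates taken from the post; none of these are official figures.
MODULE_MM2 = 18.9        # one BD module
INTEGER_SHARE = 0.30     # share for integer/pipelines/scheduler/L1 cache
CORES_PER_MODULE = 2
MODULES = 4              # 4M/8C chip
CHIP_MM2 = 280.0         # lower end of the 280-290 mm^2 guess
EXTRA_PIPE_MM2 = 0.75    # one pipeline + register file path + scheduler part + ALU
EXTRA_PIPES_PER_CORE = 2 # the "2 ALUs per core" proposal being discussed

integer_per_module = MODULE_MM2 * INTEGER_SHARE             # post rounds to ~6
integer_per_core = integer_per_module / CORES_PER_MODULE    # post rounds to ~3
integer_per_chip = integer_per_module * MODULES             # post rounds to ~24

extra_per_module = EXTRA_PIPES_PER_CORE * EXTRA_PIPE_MM2 * CORES_PER_MODULE
extra_per_chip = extra_per_module * MODULES
extra_fraction = extra_per_chip / CHIP_MM2

print(f"Integer area: {integer_per_core:.2f} mm^2/core, {integer_per_chip:.2f} mm^2/chip")
print(f"Two extra pipelines per core: +{extra_per_module:.1f} mm^2/module, "
      f"+{extra_per_chip:.1f} mm^2/chip ({extra_fraction:.1%} of the die)")
```

Under these assumptions the extra pipelines cost about 3 mm² per module and 12 mm² per chip, which matches the upper end of the "~2-3 mm² per module" figure.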
So basically your idea to improve BD is to remove the shared decoders, widen the execution resources and use SMT to make use of those added resources? So BD should become like SB and the traditional core design? How revolutionary -_-
Exactly. Yes, it is not revolutionary. But guess what the market leader does to get its exceptional performance? What did they do with Core? Why has Intel been ahead of AMD in performance for 5 years now? Guess what they did?
They took the good old PIII, widened the execution width (port 5, AGU with Sandy Bridge, Macro Op Fusion, etc.) and reduced instruction latencies. They paired that with SMT, prefetching and fast caches. So yes, not revolutionary, but extremely successful in every regard.
AMD is again the first to do the near impossible - a wide x86 decoder. Fantastic! But they split this power into two halves. What a waste!
Intel is very decoder limited so this could have become a major advantage. Intel is struggling and added a loop trace cache in Sandy Bridge to compensate for these limitations. AMD comes out with a full solution but just wastes it!
As I said, Bulldozer will bring a great decoder, a good FPU, and finally also prefetchers and long L/S queues. But all these great things are completely annihilated by (integer) CMT and the high frequency design. If they remove those two points, which simply break Bulldozer (again slower than its predecessor per die size), then everything could be really fine. If those are fixed, AMD could end up roughly on par with Intel regarding overall performance, performance per die size and performance per Watt. But with the first BD incarnation they will lag even further behind than they do now. This is only masked by the 32 nm transition and the ability to issue 8 cores.
And to come back again:
So basically your idea to improve BD is to remove the shared decoders, widen the execution resources and use SMT to make use of those added resources? So BD should become like SB and the traditional core design? How revolutionary -_-
Issuing a P4-like design is not revolutionary either. CMT as such is okay. But CMT is just stupid on x86, as x86 is so heavily decoder limited. CMT on the FPU is clever, because many applications don't use the FPU and the FPU costs a lot of die space. But on integer? Double the cores, but then halve the core itself? Where is the logic in that? Yes, there is a gain, but you could achieve that with SMT as well, at almost no cost.