
Rumour: Bulldozer 50% Faster than Core i7 and Phenom II.

Page 48
Status
Not open for further replies.
While this is true, I also believe that AMD is smart enough to realize that if BD is really going to stomp all over the 2600K, they need to let everyone know about it RIGHT NOW.

Not at all. If they say Bulldozer is amazing in the next few days/weeks or so, and then screw up shortly after that and miss execution, they'll lose sales. They would also lose sales on current processors from people who waited. It might not be big, but there's no real gain.

HW2050Plus:

You can't judge it as bad engineering based on die sizes. Think of it this way: if a Bulldozer module were twice as large, at 60mm2, the whole chip might be 30mm2 larger per module it contains. You just can't know. Only the engineers do.
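A quick sketch of the arithmetic in the post above, assuming (as the post does) that only the modules change size while the L3, uncore and pads stay fixed. The function name is made up for illustration; the 4-module count is what an 8-core Bulldozer part uses.

```python
def die_growth_mm2(module_count: int, old_module_mm2: float, new_module_mm2: float) -> float:
    """Extra die area if each module grows from old_module_mm2 to new_module_mm2.

    Hypothetical helper: assumes everything outside the modules (L3, uncore,
    pads) keeps exactly the same size.
    """
    return module_count * (new_module_mm2 - old_module_mm2)


# 8 cores = 4 modules; 30 -> 60 mm² per module is the hypothetical from the post.
print(die_growth_mm2(4, 30.0, 60.0))
```

So doubling a 30mm2 module would add about 120mm2 to an 8-core die, which is the kind of swing that makes judging "good" or "bad" engineering from total die size alone so shaky.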
 
Who's going to wait on an 1100T for a BD? If BD is faster than SB, then the 1100T will just be a bug on its windshield. Any AMD enthusiast who is buying an 1100T right now is not very likely to wait 3 months to spend double the money for 50% more performance; if he needs that performance, he'd just jump ship and get an SB right now. UNLESS, of course, he has reason to think that BD will be as fast as or faster than SB. Or are you talking about Athlon II 555 buyers instead deciding to spend 4x as much on BD? That just doesn't make sense. AMD has lost the vast majority of its enthusiast loyalists (including yours truly) over the past 5 years of futility. If that futility is about to turn into dominance, or even parity, then a little inside info would most likely convince many enthusiasts who are in the market right now to hold off for a few months. And this instance is particularly important because they can actually capitalize on an Intel mistake.
 

I don't necessarily disagree with your opinion, but enthusiasts make an insignificant impact on AMD's and Intel's revenue. Don't get me wrong, I would also like to get some insider info myself, but we are at least 3 months away from launch. Do you think AMD can handle being cash-strapped for 3 months because it hyped up the product for enthusiasts?
 
If Bulldozer is fast, then AMD will be supply-constrained and will want to focus on getting as much product as possible into the server side, where margins are much higher. From this end, there is no point in stopping Phenom II sales right now by saying Bulldozer smashes everything.
 
If AMD can make the 8-core Bulldozer at around 290mm^2, I wonder if their 4-core versions will be ~150-ish mm^2. Also, how much slower would these 4-core parts be than Sandy Bridge?
 


Bingo. Everyone keeps on comparing the Bulldozer to SB, while forgetting that BD is going to be a high-end 8-core CPU.

Even if IPC is equal to Thuban, BD is going to be faster than a 2600K in multithreaded (8+ threads of course) workloads.

A 4-core BD should be much smaller.
 
I don't understand why you are imposing a binary "either or" logic tree here.

Engineering involves making tradeoffs, R&D investments, timeline for deliverables, risk management, etc.

When it comes to uncore stuff, one thing that Intel certainly has more experience with is their power-gating and turbo-clocking circuits. We can't really expect AMD to go from time-zero to being on-par with Intel all in one step when it comes to circuit layout and areal management of these features.

I also don't understand why you are so insistent on the Orochi die released at ISSCC being obfuscated...when this happened last autumn AMD was right quick with making sure everyone knew that it was. They have not said anything this time around.

Why release two differently obfuscated diemaps? Does not compute. If they wanted nothing but obfuscated diemaps to be out in the wild then they'd be releasing them left and right...or only releasing the same one but doing it over and over.

I believe the newest die shot is the real McCoy. Accepting that it is, but concluding it must be bad engineering, is just needlessly silly. AMD does not have the resources to engineer products the way Intel does; there is no need to cast negative judgement on their design engineers just because the company's bank account has less in it.
If they needed 20% more die space for that, then I'd be with you, but if they need 120% (!) more for the same, then only these two answers are possible. And what you say is not correct. AMD was first with an integrated memory controller and interconnect, so if anything Intel should have had more problems, since they only did it for the first time with Nehalem. And regarding design, the uncore is the simplest stuff on the CPU, which means it also consumes the least space (see Intel). And if I look at the die shots of Shanghai and Deneb, you can see how densely AMD can pack the design/uncore stuff. So why should they fail with Zambezi? As I said, it is either obfuscation or an engineering failure, and their engineers have already proved that they can do it right.

They did the module/core so well, so why should they fail so terribly with the much easier uncore stuff? And by uncore I mean not the cores, not the L2 or L3, but the rest. How can this rest consume almost 50% of the die space? And even more, why spend any engineering resources on this complex CMT design if just doubling the cores and investing the effort in the uncore would bring so much more benefit? It is just logical thinking that sets off my alarm bells that something is wrong.

And then, if you also take into account that AMD's SRAM cell size is 0.149 µm² compared to Intel's 0.171 µm² (both on their 32 nm processes), it is even more suspicious. So: smaller cores, smaller SRAM, and still ending up with double(!) the die size?
http://www.realworldtech.com/includes/images/articles/iedm08-16.png
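To put the quoted cell sizes in perspective, here is a rough sketch of how much silicon the bit cells of an 8 MB L3 would take at each size. It counts bare cells only; decoders, sense amps, tags and redundancy are ignored, so real arrays are considerably larger. The function and constant names are hypothetical.

```python
AMD_CELL_UM2 = 0.149    # AMD 32 nm SRAM cell size quoted in the post
INTEL_CELL_UM2 = 0.171  # Intel 32 nm SRAM cell size quoted in the post


def raw_cell_area_mm2(capacity_bytes: int, cell_um2: float) -> float:
    """Area of the bit cells alone (no decoders, sense amps, tags or redundancy)."""
    bits = capacity_bytes * 8
    return bits * cell_um2 / 1_000_000  # 1 mm² = 1,000,000 µm²


l3_bytes = 8 * 1024 * 1024  # 8 MB L3, as on the 8-core Zambezi
amd = raw_cell_area_mm2(l3_bytes, AMD_CELL_UM2)
intel = raw_cell_area_mm2(l3_bytes, INTEL_CELL_UM2)
print(f"AMD cells: {amd:.2f} mm², Intel cells: {intel:.2f} mm², ratio {amd / intel:.3f}")
```

With these inputs the AMD cells come to about 10.0 mm² versus about 11.5 mm² for Intel's, roughly 13% less. That supports the direction of the argument, but it says nothing about the array overhead that sits around the cells.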

And regarding the obfuscation: they did it once, they did obfuscate the Bulldozer die. It was so obvious that everyone detected it. So why not do it again, making it a little more difficult to see? Or wait, no, they did not do it again; they just removed the obvious obfuscation and left the less obvious obfuscation as is. It is still the same picture, and the proportions are still the same as in the previously obfuscated one.

The die size is one of the key pieces of information that Intel wants to have, as it is the factor that determines production cost. So it is the most important commercial information.

And you have my answers for both cases. I am not the one making a binary tree of it; it is binary simply because this die shot is either true or false.

I'll come back when the real die size numbers are out, if you just don't want to see this discrepancy (AMD is capable of making smaller cores but ends up with double the die size of Intel!).

Update:
To shed some more light on this, I investigated the Deneb die.
There you have for Cores/L2/L3 ~163 mm²
BD has for Cores/L2/L3 ~172 mm²
Uncore size for Deneb is 95 mm² in 45 nm
BD uncore should be similar but on 32 nm which would mean ~70 mm² (very conservative and taking into account that pads don't shrink).

Deneb already shows that AMD has a problem in the Uncore design because that 95 mm² is way too much (compared with Intel offering more functionality in less area).

Now this bad Uncore design would lead to ~240 mm² for Zambezi.
And the Deneb die also shows four HT links, so very likely they will not be removed in Zambezi either.
If we assume that the die picture is right and we have this large 280 mm² Zambezi part, that means the uncore increased further, to 108 mm².

With that comparison, leaving the missing ~40 mm² aside, it could be true that the ISSCC die shot is real.
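The update's bookkeeping, sketched in Python with the poster's own die-shot measurements and estimates plugged in (none of these are official AMD figures). For comparison it also prints what an ideal 45 nm to 32 nm shrink of Deneb's 95 mm² uncore would give; real processes never scale that well, especially pads and analog I/O, which is why ~70 mm² is described above as conservative. The helper name is hypothetical.

```python
# Figures from the post (poster's measurements/estimates, not official numbers).
DENEB_CORES_CACHE_MM2 = 163   # Deneb cores + L2 + L3, 45 nm
DENEB_UNCORE_MM2 = 95         # Deneb uncore, 45 nm
BD_CORES_CACHE_MM2 = 172      # Bulldozer modules + L2 + L3, 32 nm
BD_UNCORE_EST_MM2 = 70        # poster's conservative 32 nm uncore estimate


def ideal_shrink(area_mm2: float, old_nm: float, new_nm: float) -> float:
    """Area after perfect linear scaling; real layouts (pads, analog, I/O) shrink less."""
    return area_mm2 * (new_nm / old_nm) ** 2


expected_die = BD_CORES_CACHE_MM2 + BD_UNCORE_EST_MM2  # the ~240 mm² Zambezi
implied_uncore = 280 - BD_CORES_CACHE_MM2              # uncore if the die really is 280 mm²
unexplained = implied_uncore - BD_UNCORE_EST_MM2       # the "missing ~40 mm²"

print("expected die:", expected_die, "mm²")
print("uncore implied by 280 mm²:", implied_uncore, "mm²")
print("unexplained area:", unexplained, "mm²")
print(f"ideal 45->32 nm shrink of Deneb uncore: {ideal_shrink(DENEB_UNCORE_MM2, 45, 32):.1f} mm²")
```

This gives 242 mm² expected, 108 mm² of implied uncore, 38 mm² unexplained, and about 48 mm² for a perfect shrink of the Deneb uncore.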

This is a real surprise, in my opinion. It would mean that AMD could not compete on performance when, a little later, the Core i7 Extreme on LGA2011, aka Sandy Bridge EN, arrives. Intel just doubles its core count to 8, and they have the performance crown back in 3 months.

Until now I thought this would not be a problem, since AMD could just release an 8-module single-die part against it, but with such a large uncore area they cannot. AMD cannot compete because of its incredibly large uncore area. It is unbelievable, and yet some folks here say they do not have bad engineers. My strong advice to AMD: Get your Uncore fixed!

Now I really have to ask: what was all the effort with CMT for?

If AMD got their uncore right, they could just double the core count when Intel does, but this option is nullified by the wasted uncore area.

I am still hoping that the die picture is obfuscated, but my investigation of Deneb, which showed poor uncore area efficiency, is not very promising.

Let us hope that it is at least smaller than this 280 mm². With 240 mm² the situation would be better, since AMD could then release at least a 6-module/12-core part. But an 8-module/16-core part would be necessary. Yes, I know they can do it by putting two 4-module dies together as they do with Interlagos. That is nice, but we will not see those parts on consumer motherboards.

So we will likely see a mixed picture: in the mainstream, a Zambezi which will be superior to the 4-core Sandy Bridge, and in the high end, a Sandy Bridge EN which will be even better than Zambezi.

Normally you could say that's fine, because the mainstream market is what counts and AMD will be very good there. On the other hand, Intel Extreme parts have never before given such a large performance boost. They won't just give a 10-15% advantage over mainstream; this time they will give a whopping 100%, minus application scaling.
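One common way to put numbers on "100% minus application scaling" is Amdahl's law. This is my substitution, not something the poster used, and the sketch assumes the 4-core and 8-core parts have identical per-core performance and clocks, which real Extreme parts won't exactly match. Function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one core when parallel_fraction of the work scales perfectly."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def gain_8_over_4(parallel_fraction: float) -> float:
    """How much faster an 8-core part is than an otherwise identical 4-core part."""
    return amdahl_speedup(parallel_fraction, 8) / amdahl_speedup(parallel_fraction, 4)


for p in (1.0, 0.95, 0.9, 0.75, 0.5):
    print(f"parallel fraction {p:.2f}: 8 cores = {gain_8_over_4(p):.2f}x the 4-core part")
```

Only a perfectly parallel workload (p = 1.0) gets the full 2x; at p = 0.9 the gain drops to about 1.53x, and at p = 0.5 it is barely over 1.1x.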

Therefore I am looking forward to Zambezi and I am looking forward to Core i7 Extreme. AMD Zambezi could be 40% faster than SB2600 in May/June and SB EN could be 80% faster than SB2600 in October or so. And maybe at the year end we see a Zambezi 12C ... What a race in 2H/2011!

And again: AMD get your Uncore fixed!
 
If AMD can make the 8-core Bulldozer at around 290mm^2, I wonder if their 4-core versions will be ~150-ish mm^2. Also, how much slower would these 4-core parts be than Sandy Bridge?

If 8-core is 290mm2 then 4-core would be 290-2*30.8-X where X=whatever they gain by cutting out some amount of L3$.

So how much L3$ are you thinking they would cut? Half of it? All of it?
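The formula from the post above as a small sketch. The 290 mm² and 30.8 mm² figures are the ones quoted in the thread; X is left as an input because the L3 savings are exactly the unknown being asked about, and everything other than the two removed modules and the L3 cut is assumed unchanged. Function names are hypothetical.

```python
def four_core_area_mm2(eight_core_mm2: float = 290.0,
                       module_mm2: float = 30.8,
                       l3_savings_mm2: float = 0.0) -> float:
    """Post's estimate: drop two modules from the 8-core die, minus X saved on L3."""
    return eight_core_mm2 - 2 * module_mm2 - l3_savings_mm2


def l3_savings_needed(target_mm2: float) -> float:
    """How large X would have to be for the 4-core die to hit target_mm2."""
    return four_core_area_mm2() - target_mm2


print(f"4-core with no L3 cut: {four_core_area_mm2():.1f} mm²")
print(f"X needed to reach ~150 mm²: {l3_savings_needed(150):.1f} mm²")
```

With no L3 cut the estimate is about 228 mm², so hitting the ~150 mm² guess would need X of roughly 78 mm².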
 
AMD is not comparing these modules to K10 cores. 100% CMP is achieved if each core (not K10 core) in a module has its own dedicated resources. It was a tradeoff they made, sacrificing a little performance to save die space. JFAMD said IPC is greater.

I was talking about two (2) K10 cores in a module; that would be 100% CMP (no shared resources), and if Llano's core can be taken as a reference, then the module would have 100% CMP throughput at almost the same size.

What busydude said, plus the fact that we don't really know what size the K10 cores would be if they had been shrunk with an expectation that they'd be reaching 4GHz+ clockspeeds.

http://translate.googleusercontent....le.com&usg=ALkJrhgDYyMm1DBiKbBqpH0ZTtZhH91miA



Granted, we don’t know the exact die size if Deneb/Thuban were shrunk to 32nm, but I will guess that Llano’s core is an enhanced K10 Thuban core at 32nm, because AMD wouldn’t have the time or the resources to design a new (third) core. So Llano’s core could be very close to what a K10 Thuban core would be at 32nm.

At 32nm SOI HKMG, the Thuban core would be able to reach more than 3.5GHz even if it were only shrunk. And if Llano’s core has the same performance as Thuban’s core, then AMD could have made an 8-core (100% CMP, no sharing) at less than 200mm2.

Yes, I know IPC wouldn’t have changed from Thuban (45nm) (let’s say it could gain +5% with the enhancements), but they would have made an 8-core CPU at less than 200mm2. Would it be SB-competitive? At 4GHz it could be, and they could price it lower too 😉

Perhaps this could be Athlon III ?? 😛
 

I liken AMD shrinking K10 for Llano on 32nm to Intel shrinking the Pentium 4 for Cedar Mill on 65nm, even though they had Core 2 Duo in the works at the same time.

It makes for a good backup plan should BD bomb for any reason. Should BD take off, like Core 2 Duo did for Intel, then Llano gets relegated to the low-end ASP markets, just as Cedar Mill was.

I question the wisdom of diluting such a small resource pool of R&D dollars which AMD's management did in order to develop BD and Llano at the same time as supporting Bobcat development...but they know their business far better than I'll ever understand it in my lifetime so it must have made sense on a whole number of levels of risk management and cost-benefits analyses.

(or maybe not, Dirk was asked to walk the plank for some good reason...and the timing...)
 
The only logical explanation I can derive is that maybe AMD concluded that Stars is a dead end and not worth investing R&D into. As I have mentioned previously, Llano is just a stopgap solution. According to the roadmaps, their future products are based on either Bobcat or Bulldozer.
 
Did anyone read the AMD blog where the writer argues that x86 benchmarks are not relevant now? I found that amusing; it felt like a cop-out on AMD's part.
 
And again: AMD get your Uncore fixed!

It all depends upon how they're designing. My guess from all their die shots is that they're optimizing logic die-space at the cost of wiring. It results in a smaller 'core' that's easier to structurally design, but requires more spacing between the different 'components' for wiring and necessary repeaters to get the resulting mess straightened out. The 'cons' of this approach are slightly higher die space (gains from denser core logic don't offset the necessary spacing) and potentially more delay in inter-'component' signals, but is -far- easier to design.

By comparison, Intel's Sandybridge is an -excellent- demonstration of minimizing 'wasted' die space. It's rather clear that they're minimizing the use of upper metal layers for local logic signals in order to have them free for cross-chip type signals. And as I believe was stated in a review, that ring bus that does most of the heavy lifting is pretty much entirely on top of the L3 cache itself. Sure it means that the logic itself can't be quite as dense, but over all it does save die space... the only problem being that it's easily twice the effort to implement a design in this fashion, probably even more than that. But for Intel the extra design time makes sense.
 
The only logical explanation I can derive is that maybe AMD concluded that Stars is a dead end and not worth investing R&D into. As I have mentioned previously, Llano is just a stopgap solution. According to the roadmaps, their future products are based on either Bobcat or Bulldozer.

I think Llano is anything but a stopgap solution. Much of what they learn from Llano will undoubtedly be invaluable for the next-generation Fusion products. Take a look at the demo they showed recently: Llano is no slouch. Maybe the Fusion memory controller will be carried over, etc. It's not like the thing doesn't carry a Fusion badge.
 
I question the wisdom of diluting such a small resource pool of R&D dollars which AMD's management did in order to develop BD and Llano at the same time as supporting Bobcat development...but they know their business far better than I'll ever understand it in my lifetime so it must have made sense on a whole number of levels of risk management and cost-benefits analyses.

(or maybe not, Dirk was asked to walk the plank for some good reason...and the timing...)


Wasn't Llano supposed to be out already, but delayed due to yield issues? If Llano had come out at around the same time as SB, the mainstream laptop market would be very different today.

That being said, as I've said before in other threads, Llano is so late that it already looks long in the tooth. Of course, as AMD has shown in its recent demo, x86 performance (especially a K10 quad @ >= 3GHz) is more than good enough. To be honest, my office machine is a lowly P4 running at 3GHz, and it's fast enough for Microsoft Word and these forums... :biggrin:
 
I question the wisdom of diluting such a small resource pool of R&D dollars which AMD's management did in order to develop BD and Llano at the same time as supporting Bobcat development...but they know their business far better than I'll ever understand it in my lifetime so it must have made sense on a whole number of levels of risk management and cost-benefits analyses.

(or maybe not, Dirk was asked to walk the plank for some good reason...and the timing...)

I think they are being quite efficient. Llano is just made of leftover parts from Phenom and Radeon. Bulldozer is where the real CPU innovation is, and Llano's successors will be running Bulldozer cores. All they're really doing is building some chips with integrated video and some without.

Besides, architecture was never really AMD's weakness. AMD's biggest problem right now is that Intel is less than a year away from 22nm while they haven't even shipped 32nm yet.
 
Llano may reflect AMD's unwillingness to put all of their eggs in one basket. Bulldozer may be one of those make-or-break products, but it does seem strange that they wouldn't transition to their new architecture on all platforms, especially if they're behind.

Perhaps it's just a matter of slipped ship dates and revamped schedules, but it does seem odd to mar an otherwise great product with a CPU core which is all but deprecated. The recent Llano demo/promo video showcased an area where the chip will excel, but it hardly paints a complete picture.

Regardless, I would have rather seen AMD go big or go home. They've certainly been on the ropes for a while now, but it almost feels as though they have nothing to lose. I'd rather see them go down swinging, if that's how it has to end.
 
Wasn't Llano supposed to be out already, but delayed due to yield issues? If Llano had come out at around the same time as SB, the mainstream laptop market would be very different today.

That being said, as I've said before in other threads, Llano is so late that it already looks long in the tooth. Of course, as AMD has shown in its recent demo, x86 performance (especially a K10 quad @ >= 3GHz) is more than good enough. To be honest, my office machine is a lowly P4 running at 3GHz, and it's fast enough for Microsoft Word and these forums... :biggrin:

I was just thinking about how back in the day I was so impressed with getting to use a dual-socket 1.6GHz Opteron workstation for Maya and Photoshop. My MacBook Air is more powerful than that monstrous Lian-Li tower was. Sure, that was 7 years ago, but it was still able to easily handle tasks far more demanding than what the vast majority of people would perform. A 3GHz quad core with a good IGP would run circles around it. Llano isn't just a "good enough for Facebook" CPU, it's a "good enough for Toy Story 4" CPU. Sure, you wouldn't in a million years use one when you have the option of a 32-core Bulldozer workstation, but it is still far more than what the vast majority of people would ever need.
 
The claims by AMD are promising, but I think the only thing that is preventing me from jumping onto the bulldozer bandwagon is the imminent lack of SLI motherboards for AM3+.

Been on AMD since the K6 days; I would be sad to jump to Intel just for SLI when their chips finally seem to be on par with or better than i7s.
 
It all depends upon how they're designing. My guess from all their die shots is that they're optimizing logic die-space at the cost of wiring. It results in a smaller 'core' that's easier to structurally design, but requires more spacing between the different 'components' for wiring and necessary repeaters to get the resulting mess straightened out. The 'cons' of this approach are slightly higher die space (gains from denser core logic don't offset the necessary spacing) and potentially more delay in inter-'component' signals, but is -far- easier to design.

That might be possible, but after all they already use 11 metal layers, which is already more than Intel uses. I mean, they already have some additional metal layers to sort out the wiring (if they use 9 layers for the core, they could use another 2 to pass wires over the core), so I am not so sure this is the root cause. The easiest answer would be that they did not get their uncore logic dense. Usually you have to invest some resources to optimize that, which is why I said they should do it. There is also still optimization potential in the wiring. I mean, 11 metal layers is really a lot. I don't know how many Intel uses, but I think it's only 9. So AMD even has an advantage there and obviously cannot use it. Yes, I know the uppermost layers don't add as much as the lowest ones do, but it is still more.

By comparison, Intel's Sandybridge is an -excellent- demonstration of minimizing 'wasted' die space. It's rather clear that they're minimizing the use of upper metal layers for local logic signals in order to have them free for cross-chip type signals. And as I believe was stated in a review, that ring bus that does most of the heavy lifting is pretty much entirely on top of the L3 cache itself. Sure it means that the logic itself can't be quite as dense, but over all it does save die space... the only problem being that it's easily twice the effort to implement a design in this fashion, probably even more than that. But for Intel the extra design time makes sense.
Yes, Sandy Bridge is a real beauty compared to the BD die. The ring bus, however, is more useful for implementation, especially if you consider a flexible number of cores. I don't think the ring bus was chosen for design purposes, though it might help.
 
I was just thinking about how back in the day I was so impressed with getting to use a dual-socket 1.6GHz Opteron workstation for Maya and Photoshop. My MacBook Air is more powerful than that monstrous Lian-Li tower was. Sure, that was 7 years ago, but it was still able to easily handle tasks far more demanding than what the vast majority of people would perform. A 3GHz quad core with a good IGP would run circles around it. Llano isn't just a "good enough for Facebook" CPU, it's a "good enough for Toy Story 4" CPU. Sure, you wouldn't in a million years use one when you have the option of a 32-core Bulldozer workstation, but it is still far more than what the vast majority of people would ever need.


Yeah, almost everybody is more concerned with specific use cases ("Will I be able to watch Blu-ray?" or "Will I be able to play game X?"), weight, and battery life than with how fast it benches. I think as enthusiasts we sometimes lose sight of what these chips are, what they're used for, and what drives sales.


Llano doesn't have to be the best in CPU or GPU (though I think it'll have the fastest GPU when it launches) -- it just has to deliver "the experience" at a competitive price point to Intel, and OEMs will snatch it up.
 
The claims by AMD are promising, but I think the only thing that is preventing me from jumping onto the bulldozer bandwagon is the imminent lack of SLI motherboards for AM3+.

Been on AMD since the K6 days; I would be sad to jump to Intel just for SLI when their chips finally seem to be on par with or better than i7s.

It is quite easy to hack SLI onto a non-SLI motherboard. All of the hardware connections are there; you just need a small software tweak.
 