Rumour: Bulldozer 50% Faster than Core i7 and Phenom II.

Cerb · Mar 3, 2011

Pentium could be beat, not running recompiled/specialized code, by 486 variants and tweaks (AMD and IBM, not sure if Cyrix' were unique, or 486-derived).

Pentium division bug.

The PPro had poor performance outside of a 32-bit protected environment (NT4 flew on those things, though!).

Several cases of Pentium II and III cores coming late to market. Luckily, most scaled in clocks well enough to not be too big of a deal, and AMD was even later and slower with competition.

The i740. Like Itanium, it was a good implementation of a bad idea.

Rambus. Even discounting the Intel hubris issues of Netburst, anything without an i850E ($$$ RDRAM) wasn't much faster than a P3, and you were consistently better off with AMD. If you could afford a 760 board, Athlon XP was a no-brainer, once they were released. It took far too long for Intel to come out with good DDR chipsets, entirely due to contract issues.

Prescott. Not bad, but even lower IPC, and higher power per clock, when people clearly wanted less power, and greater IPC.

SB chipset SATAs.

In the opposite corner:

The entire series of K6 can be summed up by being too late. Hot, too, but it seems every one of them was late enough to market that they were just competitive by the time they got out in volume. Great value (I fondly and bitterly remember my K6-2 350/Voodoo2 rig), but very much stuck in the low-cost rut, like the current Athlon II CPUs are. Not only that, but thinking about the K6 ALi chipset mobos, right now, gets my stomach in a knot. I'd love to find a few and bash them in, Office Space printer style.

Athlons had issues too, not so much from delays (which weren't nearly the problem they were for the K6, as the chips performed well and Intel had its own delays) as from chipsets. You couldn't always find an AMD 750, and there were quirks to worry about with others. As time went on, 60mm fans cooling 50-70W CPUs proved to be a liability, as well.

Athlon XPs ended up with the same problem, while competitive. The SiS 735-748 were nice bright spots, but as the platform got old, they got harder to find. The same later became true until well into socket AM2's lifetime.

The Phenom bug hurt AMD's server share, then Intel came out with very nice Xeons. I've read rumors that there have also been anti-competitive practices involved in keeping the Phenom II gen Opterons from gaining a good foothold, but I'm not sure how true it is, given that Intel had no problems at all selling cheap 2-socket Xeons.

While not a late-to-market problem, the Cyrix-based AMD Geodes are real POSes. Throw that special random number generator on something better (like a Bobcat-based SoC?), and move on, please!

...and that's just off the top of my head, without getting into VIA and nVidia AMD or Intel chipsets (AMD/nV was often a lesser evil, but Intel/nV was just people being masochists for SLI), and I still wonder why SiS couldn't get the quirks out of their P4 chipsets, when the Athlon ones generally were flawless. So, I'm going to wait for launch, and see what pricing and performance are like. I have high expectations for server use, but not so sure about desktop and notebook BDs. There's no good reason they shouldn't be 30-50% faster than Phenom II, but that only barely gets them into Intel's current midrange, and it will be highly-tweaked productivity apps where the lower peak IPC could be really detrimental, as it will necessarily mean taking more clock cycles to execute code that can take advantage of bursts of >2 IPC (OTOH, high clocks and an excellent cache system could be enough to offset that, so you never know).

Nemesis 1 · Mar 3, 2011

Well, I am actually happy about the SB SATA issue. It gives the AMD guys something to crow about. It also showed the speed Intel moves at when a problem is found. I like how so many pick this mistake up to bash Intel. It's even more interesting that most of these bashers don't run multiple drives and choose not to use the SATA3 ports to begin with. The other interesting thing is that most of the people complaining run AMD hardware. It's a dead horse now, but people such as yourself will rally around it. Nice life ya have.

IntelUser2000 · Mar 3, 2011

HW2050Plus said:
With that picture you get 275-285 mm² by measuring pixel count (I used gimp for that).

Really. The whole article that linked the pic is about the new die pic and how it's a real one because it's different from the other more colorful one.

And yes, it does make sense to use the same die as the server part since 300mm2 isn't that big for a high-end enthusiast part anymore. Lynnfield/Thuban/Agena ring a bell?

HW2050Plus · Mar 3, 2011

IntelUser2000 said:
Really. The whole article that linked the pic is about the new die pic and how it's a real one because it's different from the other more colorful one.

And yes, it does make sense to use the same die as the server part since 300mm2 isn't that big for a high-end enthusiast part anymore. Lynnfield/Thuban/Agena ring a bell?

Thanks for that info.

But regarding the server/consumer dies, it is a question of money: do the savings in die area pay for the cost of another mask set? And you really do save a lot of money with a smaller die size. I do not think another mask set is that expensive, because otherwise nobody would do mask revisions, and as we know mask revisions are a frequent task.
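To put rough numbers on that trade-off, here is a minimal break-even sketch. The 280 mm² and 240 mm² die sizes are the figures being argued about in this thread; the wafer cost and mask-set cost are purely hypothetical placeholders, and the dies-per-wafer formula is the usual first-order approximation that ignores yield and scribe lines.

```python
import math


def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """First-order gross dies per wafer (no yield, no scribe lines)."""
    radius = wafer_diameter_mm / 2.0
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(gross - edge_loss)


def break_even_units(mask_set_cost, wafer_cost, big_die_mm2, small_die_mm2):
    """Units that must be sold before a second, smaller die pays for its masks."""
    cost_big = wafer_cost / dies_per_wafer(big_die_mm2)
    cost_small = wafer_cost / dies_per_wafer(small_die_mm2)
    saving_per_die = cost_big - cost_small
    return math.ceil(mask_set_cost / saving_per_die)


# Hypothetical costs, for illustration only (not real AMD/GloFo figures).
HYPOTHETICAL_MASK_SET_COST = 2_000_000.0
HYPOTHETICAL_WAFER_COST = 5_000.0

units = break_even_units(
    HYPOTHETICAL_MASK_SET_COST, HYPOTHETICAL_WAFER_COST, 280.0, 240.0
)
print(f"Break-even volume under these assumptions: {units} dies")
```

With other cost assumptions the break-even volume moves a lot, so this only shows the shape of the calculation, not an answer.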

Anyway again, even now with 280 mm² I again claim that anything above let's say 240 mm² I would regard as bad engineering from AMD. I mean Intel manages to consume ~60% less for their uncore compared to AMD (if 280 mm² is true) and they also now have QMI and memory controller on die.

And I am sure that 2600K die has no 4 QMI links.

Idontcare · Mar 3, 2011

HW2050Plus said:
Not really, this is the old source, see the link to the origin from 2010-10-19 if you look at the original source:
http://pc.watch.impress.co.jp/img/pcw/docs/408/107/html/02.jpg.html
So there is nothing new from ISSCC as far as I can see.

Did you really just not bother at all to look at the photo I embedded and linked? :\ C'mon dude, at least make an effort here.

Ajay · Mar 3, 2011

HW2050Plus said:
Anyway again, even now with 280 mm² I again claim that anything above let's say 240 mm² I would regard as bad engineering from AMD. I mean Intel manages to consume ~60% less for their uncore compared to AMD (if 280 mm² is true) and they also now have QMI and memory controller on die.

Well the bottom line isn't the actual physical characteristics of the die. As a consumer I'll be interested in Bulldozer if it has comparable (or better) performance per dollar compared to Intel - especially for applications that are the most important to me. Oh, and that it overclocks well (that's just how I roll).

AtenRa · Mar 3, 2011

I have to admit that BD will not be able to compete against an 8 core 16 threads SB-E but i will guess that it will be very competitive against a 6 core 12 threads SB-E.

Just because you don't see something in the uncore of the BD die doesn't mean it's empty space (not always). Even if it is empty space it could be designed that way for the Dual module 4 core BD.

From the pictures we have, it seems that the die size is close to 290-300mm2.

We don't know yet if a 5 module 10 core BD will be available for desktop in 2012.

300mm2 for an 8 core CPU with 16MB caches at 32nm is very nice engineering on AMD's and GloFo's part.

BD module with 2MB L2 cache is 30,9mm2

SB core with 2MB L3 is 29,5mm2
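A quick way to read those two numbers side by side is area per hardware thread. This is only a sketch: the two areas are the ones quoted above, and the assumption that each block serves two threads (two integer cores in a BD module, one SB core with Hyper-Threading) is mine.

```python
BD_MODULE_MM2 = 30.9  # from the post: BD module including 2MB L2
SB_CORE_MM2 = 29.5    # from the post: SB core including 2MB L3
THREADS_PER_BLOCK = 2  # assumption: 2 threads per BD module and per SB core


def area_per_thread(block_mm2, threads):
    """Silicon area attributed to each hardware thread in a block."""
    return block_mm2 / threads


bd_per_thread = area_per_thread(BD_MODULE_MM2, THREADS_PER_BLOCK)
sb_per_thread = area_per_thread(SB_CORE_MM2, THREADS_PER_BLOCK)
ratio = BD_MODULE_MM2 / SB_CORE_MM2

print(f"BD: {bd_per_thread:.2f} mm2/thread, SB: {sb_per_thread:.2f} mm2/thread")
print(f"BD module is {100.0 * (ratio - 1.0):.1f}% larger than the SB core slice")
```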

HW2050Plus · Mar 3, 2011

Idontcare said:
Did you really just not bother at all to look at the photo I embedded and linked? :\ C'mon dude, at least make an effort here.

I used that photo to measure the 280 mm². I took the lower photo where the statement that it is a new one from ISSCC was missing. IntelUser2000 pointed that out already.

Idontcare · Mar 3, 2011

AtenRa said:
BD module with 2MB L2 cache is 30,9mm2

Do we have the number for just the core, sans the cache, to compare with the numbers in the chip-architect collage?

AtenRa · Mar 3, 2011

Idontcare said:
Do we have the number for just the core, sans the cache, to compare with the numbers in the chip-architect collage?

I remember Hans or Dresdenboy saying close to 18-19mm2, I will try to find the link

AtenRa · Mar 3, 2011

Actually it was from Hiroshige Goto

http://translate.google.com/transla...co.jp/docs/column/kaigai/20110301_430044.html

From the picture, the module excluding the 2MB of L2 looks to be about 18 mm².

HW2050Plus · Mar 3, 2011

AtenRa said:
Just because you don't see something in the uncore of the BD die doesn't mean it's empty space (not always). Even if it is empty space it could be designed that way for the Dual module 4 core BD.

And what functionality should that be that takes so much more die size than what Sandy Bridge needs for the same? And if there is some (except for the hypertransport links I already pointed out) then is it worth the die size consumption?

Therefore it is either bad engineering (if you need ~120% more space to do the same your competitor does) or the new picture is still obfuscated.

Difficult to say how much is consumed by the HT links but I would say no more than 6-7 mm² per Hypertransport link. So if you would strip 3 HT links off then you end up with ~20 mm² less (then a total of ~260 mm²). Still too much to come anything close to Sandy Bridge.
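The HT-link estimate above is easy to check. A small sketch, using only the figures from this post (280 mm² total, 3 links removed, 6-7 mm² per link); the per-link area is itself a guess, so the result is a range rather than a measurement.

```python
def die_without_ht_links(total_mm2, links_removed, mm2_per_link):
    """Die area left after removing a number of HyperTransport links."""
    return total_mm2 - links_removed * mm2_per_link


TOTAL_MM2 = 280.0   # die size measured from the picture, per the thread
LINKS_REMOVED = 3   # links assumed unnecessary on a desktop part

smallest = die_without_ht_links(TOTAL_MM2, LINKS_REMOVED, 7.0)
largest = die_without_ht_links(TOTAL_MM2, LINKS_REMOVED, 6.0)

print(f"Desktop die without 3 HT links: {smallest:.0f}-{largest:.0f} mm2")
```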

I mean the number is so bad, that Intel could put two Sandy Bridges in the same size! And all that not because of the core or cache is large but because AMD made a blooper in the uncore.

All that just makes no sense. You put incredible effort into making smart small cores and then you waste it all in the uncore logic. The Bulldozer module is just 18.0 mm² in size. All this CMT would simply be rendered useless if Intel could just double cores in the same die space because they don't suck in the uncore. And where should this immense waste come from? They didn't have this waste with Thuban/Deneb! And Sandy Bridge has Turbo and C6 as well.

No I just do not believe that this high number of ~280 mm² is real.

@AtenRa
I would call it good if they are at least as good as Intel and that would be at around ~ 200 mm² for Zambezi. That would still be larger than Intel's but Zambezi has 6 MB more cache as we know. So it would be roughly equal then.

Don't know the exact Sandy Bridge numbers but without GPU SB should be around 150 mm². Add 4 more cores and cache and you have 4 * 29.5 mm² in addition, that would be 270 mm²! Even less than AMD Bulldozer if the die shot is correct, but then Sandy Bridge would have 24 MB of cache and 8 full real cores that are much larger than AMD's.
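Spelled out, the arithmetic behind that estimate looks like this. It is a sketch under the post's own assumptions: the 150 mm² GPU-less Sandy Bridge base is a guess, and 29.5 mm² is the core-plus-2MB-L3 slice quoted earlier in the thread.

```python
def scaled_sb_area(base_mm2, extra_cores, core_slice_mm2):
    """Estimated die area after adding more core+L3 slices to a base die."""
    return base_mm2 + extra_cores * core_slice_mm2


SB_BASE_MM2 = 150.0      # guessed 4-core Sandy Bridge without GPU
SB_CORE_SLICE_MM2 = 29.5  # core with 2MB L3, from the thread
BD_DIE_MM2 = 280.0        # Bulldozer die size measured from the picture

sb_8core = scaled_sb_area(SB_BASE_MM2, 4, SB_CORE_SLICE_MM2)
print(f"Hypothetical 8-core SB: {sb_8core:.0f} mm2 vs BD at {BD_DIE_MM2:.0f} mm2")
```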

Yet again that is why this shot can't be correct and must have been obfuscated.

AtenRa · Mar 3, 2011

I can’t say anything about the Bulldozer die size but I can say about the core sizes because we do have the actual size from AMD (30,9mm2 including 2MB L2).

An observation,

Llano's core is a K10 (Deneb) derivative manufactured at 32nm and it has a die size of 9,69mm2 excluding 1MB L2 Cache. That is half the Deneb at 45nm.

If AMD would use two of these cores in one module then they would have a module at 19,38mm2 and 100% CMP.

BD module (Dual Core) is close to 18-19mm2 excluding L2 Cache, almost the same size. But the idea of sharing the front end and FP units is to save die size. So why do we have almost the same size with two (2)K10. Why did AMD spend money and time to create the BD module when they could install two (2) K10 cores and have almost the same core size and 100% CMP?
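The comparison above in numbers, as a small sketch. All areas are the ones quoted in this post (9,69 mm² per Llano core, 18-19 mm² per BD module, both excluding L2); nothing else is assumed.

```python
import math

LLANO_CORE_MM2 = 9.69                 # K10-derived core at 32nm, excl. L2
BD_MODULE_RANGE_MM2 = (18.0, 19.0)    # BD module excl. L2, per the thread

two_k10_mm2 = 2 * LLANO_CORE_MM2

# How much larger two Llano cores would be than one BD module,
# at each end of the quoted 18-19 mm2 range.
excess = [two_k10_mm2 / bd - 1.0 for bd in BD_MODULE_RANGE_MM2]

print(f"Two Llano cores: {two_k10_mm2:.2f} mm2")
print(f"That is {100 * min(excess):.1f}% to {100 * max(excess):.1f}% "
      f"larger than one BD module")
```

So on these figures the module saves only a few percent of area over two K10-class cores, which is the point of the question.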

busydude · Mar 3, 2011

AtenRa said:
Why did AMD spend money and time to create the BD module when they could install two (2) K10 cores and have almost the same core size and 100% CMP?

AMD are not comparing these modules to K10 cores. 100% CMP is achieved if each core(Not K10 core) in a module has its own dedicated resources. It was a tradeoff they made sacrificing little performance to save die space. JFAMD said.. IPC is greater.

AtenRa · Mar 3, 2011

I could be wrong but lets have it,

If the above picture is correct in relation to sizes, then I will guess that BDs single thread will skyrocket compared to K10 Deneb/Thuban. Why ??

The BD front end is huge in contrast to Llano's K10 and so is the FP unit. Remember in BD, if we only have one thread it will be able to use all the resources of the module (except the second Integer Execution Unit).

Intel's single thread performance strength comes from the front end, and IMO that's what AMD has done with BD: the Integer execution units are not bigger than K10's (although they have been upgraded), but the front end has been oversized with L1 and L2 BTBs, prediction, prefetch, a 4-way decoder, better OoO, microcode (uops), etc. Of course the front end has to be able to handle 2 threads, which is why it is bigger too.

The FP unit is much bigger than K10's and we do know that it can do Fused Multiply/Add (FMAC) and each FP unit per module can do one 256-bit AVX per cycle. Again the FP will have to be able to SMT two threads so we can say it has to be bigger for that reason too.

Im open to suggestions

Idontcare · Mar 3, 2011

AtenRa said:
BD module (Dual Core) is close to 18-19mm2 excluding L2 Cache, almost the same size. But the idea of sharing the front end and FP units is to save die size. So why do we have almost the same size with two (2)K10. Why did AMD spend money and time to create the BD module when they could install two (2) K10 cores and have almost the same core size and 100% CMP?

What busydude said, plus the fact that we don't really know what size the K10 cores would be if they had been shrunk with an expectation that they'd be reaching 4GHz+ clockspeeds.

Remember if you intend to go for lower clockspeeds (lower power consumption) then that means you need less drive current to do things, which means you can get away with making your xtor smaller and the logic scaling all the higher.

I don't think you are going to arrive at a meaningful conclusion by comparing Llano cores to Orochi cores...the layout has been optimized with two totally different targets in mind.

bryanW1995 · Mar 3, 2011

busydude said:
Because you are in Namibia..

The image says BD's initial production is going to begin in Apr'11 and Llano's in Jun'11.

BD release date has been at least a little consistent.. when compared to Llano.

that slide clearly states it is from oct 2010. jfamd has since stated that all he can say is "summer 2011".

Idontcare · Mar 3, 2011

HW2050Plus said:
Therefore it is either bad engineering (if you need ~120% more space to do the same your competitor does) or the new picture is still obfuscated.

I don't understand why you are imposing a binary "either or" logic tree here.

Engineering involves making tradeoffs, R&D investments, timeline for deliverables, risk management, etc.

When it comes to uncore stuff, one thing that Intel certainly has more experience with is their power-gating and turbo-clocking circuits. We can't really expect AMD to go from time-zero to being on-par with Intel all in one step when it comes to circuit layout and areal management of these features.

I also don't understand why you are so insistent on the Orochi die released at ISSCC being obfuscated...when this happened last autumn AMD was right quick with making sure everyone knew that it was. They have not said anything this time around.

Why release two differently obfuscated diemaps? Does not compute. If they wanted nothing but obfuscated diemaps to be out in the wild then they'd be releasing them left and right...or only releasing the same one but doing it over and over.

I believe the newest die shot is the real McCoy. Accepting that it is such but concluding it must be bad engineering is just needlessly silly. AMD does not have the resources to engineer products the likes of Intel, there is no need to cast negative judgement on their design engineers just because the company's bank account has less in it.

Ajay · Mar 3, 2011

AtenRa said:
I have to admit that BD will not be able to compete against an 8 core 16 threads SB-E but i will guess that it will be very competitive against a 6 core 12 threads SB-E.

Well, for the first case that makes sense, since there will be 8 'full' cores + HT (though I thought SB-E was only going to be 6C).

In the second case, we don't really know - since we don't have any benchmarks yet (and at the rate things are going, I don't think we'll have any leaked numbers for a couple of months).

Just last week JFAMD made a comment about not having final silicon yet, so BD is still a mystery (though AMD must have a pretty good idea of what the throughput/frequency is).

bryanW1995 · Mar 3, 2011

podspi said:
Truth. One actually happened and one is just a rumor from Fudzilla D:

It doesn't even make sense. 890 is compatible with BD, and we've already seen the physical socket, hell we've already seen BD running.

I think AMD is being quiet about Bulldozer because they want to give Intel as little time as possible to react, similar to their secrecy over eyefinity. If release is really mid-June, that's three and a half months. As we've seen from the SB-chipset fiasco, (unless you believe the conspiracy theories that they knew, which I don't) Intel is very, very good at execution. If AMD shows BD destroying SB Intel could push forward 6 or 8 core SB, or move IB forward. AMD needs as much time as possible to reap whatever reward they can from Bobcat, Llano, and Bulldozer -- hence the secrecy.

While this is true, I also believe that AMD is smart enough to realize that if BD is really going to stomp all over 2600k they need to let everyone know about it RIGHT NOW. Not next week, or next month when the sata issue is a thing of the past, but immediately. A nice demo of a BD running at 5ghz and smoking a 2600k would prompt a LOT of people to delay an intel purchase, even if intel does throw out a 2700/2800k (which they are almost certain to do around the BD launch anyway). The most likely reason that AMD isn't crowing about how great BD is (on the consumer side at least) is that BD isn't as great as many of us want/hope it to be. Isn't it also telling that John Fruehe is relatively active here but we don't have his equivalent on the consumer side openly frequenting our boards?

Idontcare · Mar 3, 2011

bryanW1995 said:
Isn't it also telling that John Fruehe is relatively active here but we don't have his equivalent on the consumer side openly frequenting our boards?

IMO this has more to do with the individual than the position they hold as a matter of employment.

Look at myself, how many other CMOS process development engineers from any company do you see around here? I worked with some 120-150 process development engineers and there was only one (1!) other engineer that had ever even heard of AnandTech, but he was not a forum member.

I would not read too much into JF's presence here other than to say that we all must have something in common as we are all drawn here despite our diverse backgrounds and walks of life.

That is something we should celebrate, not something we should analyze or critique.

bryanW1995 · Mar 3, 2011

Idontcare said:
I've been hearing about Ivy Bridge for 3 yrs now, and Sandy even longer before that, and Haswell for nearly as long by now.

I don't get this sense of entitlement that some people seem to have in regards to AMD, Bulldozer, and information from one about the other.

Some of you guys are acting like spurned or jilted lovers.

Frankly it is bizarre.

Obviously you don't spend very much time in VC&G...

Idontcare · Mar 3, 2011

bryanW1995 said:
Obviously you don't spend very much time in VC&G...

touché :thumbsup:

bryanW1995 · Mar 3, 2011

HW2050Plus said:
No chance for this one. If Intel brings an 8 core / 16 thread Sandy Bridge then it will be the top performer and no 4 module / 8 thread Bulldozer will compete with half of the threads. That could only work if Sandy Bridge is clocked very low (for TDP reasons).

However when is that Sandy Bridge part coming? By the time it comes it is likely that AMD will have a 6 module / 12 core part ready, which should be able to surpass it.

Btw. I heard that for the high speed design the number of pipeline stages was increased to 15 for Bulldozer. Can anyone confirm that? That would be really good for a high speed design compared with e.g. the 31 stages of Prescott. Afaik Deneb has 12 and Sandy Bridge 14.

well, they'll have 16 core server cpus so it shouldn't be too hard to send some of those to the retail side, especially if yields are decent and/or they can charge $1000 for them like intel does on their EE chips.

busydude · Mar 3, 2011

bryanW1995 said:
that slide clearly states it is from oct 2010. jfamd has since stated that all he can say is "summer 2011".

I was just relaying information to Skurge from that particular slide, since he didn't have access to it.
