I do see die size utilization as a performance issue. As you know, you cannot make a chip as large as you want. There are many limits: speed paths, TDP and finally costs.
As long as a BD module's area w/ L2 is about the same as an SB core's w/ L3 (and the core sizes are comparable too), this shouldn't be a performance issue. The Orochi die size might affect latencies between modules and L3 subcaches, IMC and HT PHYs, but so does a ring bus (where data moves from stop to stop until reaching its destination). And if I look at both dies side by side, the likely signal traveling distances between components on the Orochi die don't look that long.
There is no direct relation between die size and TDP (which is actually a specified value the chips are binned for). I could make a 1000 mm² die with one BD module at a given clock speed, NB, IMC and DDR pads. It wouldn't have higher power consumption than the same components placed on a <100 mm² die. Instead, the larger die might even provide better hot spot separation.
The observed higher power consumption of larger dies actually comes from more transistors being used: the transistors have leakage and switching power, not the blank silicon around them.
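To put a rough formula behind that (the constants below are placeholders I picked to land in a plausible range, not measured values), a minimal sketch:

```python
# Toy power model: chip power = switching + leakage of the transistors that are
# actually there, independent of how much blank silicon surrounds them.
def chip_power(n_transistors, c_per_tr, v_dd, f_hz, activity, leak_per_tr):
    dynamic = activity * n_transistors * c_per_tr * v_dd**2 * f_hz  # P = a*C*V^2*f
    leakage = n_transistors * leak_per_tr
    return dynamic + leakage

# The same transistor budget on a 100 mm^2 or a 1000 mm^2 die gives the same number;
# die area only enters via hot-spot density and cooling, not via this equation.
print(chip_power(1e9, 1e-15, 1.2, 3.6e9, 0.02, 20e-9), "W")  # ~124 W with these guesses
```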
AMD will not suffer from the initial Zambezi die size. But they already suffer when it comes to Interlagos, and they will also suffer in the desktop space by 2012 at the latest, when Sandy Bridge E parts become available.
Interlagos is not just larger in overall die size (with the benefit of improved binning opportunities due to being an MCM) and thus maybe $40 more expensive to make; it also offers more performance in a single socket (this is also about density).
Let me just explain what I mean so you understand where I am seeing the problem with an example:
Intel 4C/8T: ~150-160 mm²
AMD 4M/8C: ~280 mm²
Intel 6C/12T: ~220 mm²
Intel 8C/16T: ~290 mm²
(theor. AMD 8M/16C: 560 mm²)
All that still on 32 nm! This means that, at roughly the same die area, Intel's architecture is much faster than AMD's Bulldozer.
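To make that concrete, here is the same list reduced to die area per hardware thread (just the rough numbers from above, taking the midpoint of the Intel 4C range; this says nothing about actual per-thread performance):

```python
# Area per thread from the rough die sizes listed above (mm^2, ballpark figures).
parts = {
    "Intel 4C/8T":         (155, 8),
    "AMD 4M/8C":           (280, 8),
    "Intel 6C/12T":        (220, 12),
    "Intel 8C/16T":        (290, 16),
    "AMD 8M/16C (theor.)": (560, 16),
}
for name, (area_mm2, threads) in parts.items():
    print(f"{name:22s} {area_mm2 / threads:5.1f} mm^2 per thread")
```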
As others already said, we don't know anything about BD's performance yet. And 32 nm processes are not that easily comparable: gate-first HKMG vs. gate-last, bulk vs. SOI, and a lot of other differences (stress techniques, metal gate materials, geometries, etc.). Think of Ontario produced on TSMC's 40 nm process compared to the 45 nm Atom.
An 8C BD is a throughput machine, but a 4C variant (6C might be harvested from 8C dies) could be ~150 mm² as well and might clock ~30-40% higher with twice the TDP available per module (roughly a square-root relation) -> but yields would need to be better because it would be a volume product.
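What I mean by the square-root relation, as a tiny sketch (the f² power assumption is a rough simplification, not a measured law):

```python
# Once voltage has to track frequency, power grows roughly with f^2,
# so the achievable clock scales roughly with sqrt of the TDP share.
def clock_scale(tdp_ratio, power_exp=2.0):
    return tdp_ratio ** (1.0 / power_exp)

print(clock_scale(2.0))  # twice the TDP per module -> ~1.41x clock (~+40%)
print(clock_scale(0.5))  # half the TDP per module  -> ~0.71x clock (~-30%)
```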
I mean AMD's Bulldozer will surpass Intel's SB 4C in performance, but Intel can just double the core count and AMD is in the same position as now: significantly behind the competition. Okay, you could ask why I'm complaining when that's the same situation as today, but the difference is that AMD introduced a brand new architecture and did not change anything by that. That will affect average selling prices. The problem with performance per die size is not the die size, it is the performance.
You mean doubling the core count is that simple? With roughly half the TDP available per core, base clocks would have to go down (as you also said), and its performance profile would look different, only mitigated by its advanced Turbo mechanisms.
Performance per die size indirectly translates to a price/performance metric.
Now die size is not everything, because when you are TDP limited a small die does not help. But there is the same bad news there: Intel can do their SB 8C/16T within roughly the maximum TDP they have (still on 32 nm). AMD obviously has problems there as well, because for the high-end 4M/8C part they already specified a 125 W TDP. So AMD would also have to drop frequency significantly when adding more cores (the Interlagos problem).
Doubling the cores has usually meant dropping the clock frequency by about 30%. Thanks to the most recent Turbo boost modes, the highest boosted frequency might actually stay at the same level (or even increase, if the TDP increases too). But as long as we don't know whether it takes a 125 W BD to be faster than a 95 W SB or whether a 95 W BD is enough, we can only speculate here.
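Roughly what the boost modes buy in budget terms (the TDP split and the uncore share below are invented, purely for illustration):

```python
# Within a fixed socket TDP, each of 8 fully loaded cores gets only a small slice,
# but a 2-core boost state can hand almost half the budget to each active core
# and therefore hold (or exceed) the old peak clocks.
TDP_W = 125.0                                    # hypothetical socket budget
UNCORE_W = 25.0                                  # hypothetical NB/IMC/L3 share
per_core_all_loaded = (TDP_W - UNCORE_W) / 8     # ~12.5 W per core
per_core_two_boosted = (TDP_W - UNCORE_W) / 2    # ~50 W per boosted core
print(per_core_all_loaded, per_core_two_boosted)
```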
BTW, the 8C SB-EP/EX (~400 mm² according to the ISSCC 2011 die photo presentation) also needs to set aside some TDP for the additional memory channels and PCIe lanes. OTOH, its TDP can go up to 150 W. This might become a top performer, but surely not at price points <$500. And there is the problem again: are we looking for the top-performing CPU? x86 or non-x86? 2-, 3- or 4-channel memory? Desktop or server? Same price or max price?
As you see, even though they surpass Sandy Bridge 4C, this is just not enough to be competitive. And by competitive I mean that AMD is able to raise average selling prices. Again, 2H 2011 might go well for AMD because of Bulldozer, but in 2012 it will look the same for AMD as it does now, or (hopefully not) even worse.
IB is said to be 20% faster at the same TDP. This is in line with Intel's data about their Tri-Gate process. Die size and idle power (if Vcc is still provided, i.e. no power gating) will shrink as well, and so will costs.
AMD's options are:
- µarch improvements ("enhanced BD", some interesting patented options weren't implemented in BD1)
- natural process maturity and process improvements at GF (similar to CTI), something like low-k for Thuban or smaller steps like adding a new way of applying stress etc.
- adding a few cores (like 10 with Komodo -> 25% more) with a small drop (if any at all) in base frequency
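On the last option, a quick toy calculation (clocks invented, near-linear thread scaling assumed):

```python
# 25% more cores with a small base-clock drop still nets a gain on threaded loads.
def mt_throughput(cores, clock_ghz):
    return cores * clock_ghz        # assumes near-perfect multithreaded scaling

orochi = mt_throughput(8, 3.6)      # hypothetical 8-core baseline
komodo = mt_throughput(10, 3.4)     # +25% cores, slightly lower base clock
print(komodo / orochi)              # ~1.18x in this toy model
```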
BTW, is Intel improving their processes in the same way, or do they make them as good as possible (under cost considerations) from the start, so that any improvements are left for the next smaller process node?
So Bulldozer should fix these design issues:
a) CMT -> This is just not effective enough regarding die size.
b) High Frequency Design -> Obviously not effective regarding TDP (I mean AMD even has SOI and a higher TDP).
c) High uncore die consumption
There is no proof and not even a hint that a) or b) are true, while c) has an obvious effect on die size. Further, the die size is not directly related to the module size; there is a lot more to consider. Better compare module sizes with core sizes than just die sizes. Otherwise it is like judging the efficiency of an engine by measuring the car's length.
While we are at it:
High Frequency Design (it's not extreme, with a ~20-25% frequency increase) can be used in two ways: clock the logic faster at the same voltage, or keep the clock frequency while lowering the voltage (which influences the transistors' switching speed). According to Mike Butler, they increased frequency without increasing power consumption while keeping IPC on par with their previous architecture (10h/12h).
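As a toy model of those two options (my simplification; the assumption that voltage can drop in proportion to the headroom is optimistic, real Vmin curves are flatter):

```python
# Two ways to spend a high-frequency design's ~20% headroom, with P ~ f*V^2.
def dynamic_power(f, v):
    return f * v * v                        # normalized units

p_base = dynamic_power(1.0, 1.0)            # reference design
p_fast = dynamic_power(1.2, 1.0)            # option 1: +20% clock, same voltage
p_low  = dynamic_power(1.0, 1.0 / 1.2)      # option 2: same clock, lower voltage
print(p_fast / p_base, p_low / p_base)      # ~1.2x vs ~0.69x the reference power
```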
So my proposal for AMD for Bulldozer II:
to solve a):
BD brings a lot of fine things with it: a decoder capable of decoding 4 ops per cycle. This is an advantage over Intel. So just add another one to feed the other integer core. That takes ~6 mm² per core. A lot, but you gain a lot.
Now also widen the core to 4 ALUs + 2 AGLUs. That costs very little, ~2 mm² per core. The scheduler must of course be changed as well, but that is no problem: the scheduler step-back was related to b), and since we fix b) anyway, it goes away. The result could be a core that is equal to or faster than an SB core. On top of that, utilizing the amazing decoder, they can add SMT. So another SB feature included.
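Adding up the area cost of that, using my own rough per-core figures from above (guesses, not measurements):

```python
# ~6 mm^2 for the extra decoder plus ~2 mm^2 for the wider ALU/AGLU set, per core,
# on a ~280 mm^2 4M/8C die.
extra_per_core_mm2 = 6.0 + 2.0
cores = 8
die_mm2 = 280.0
added = extra_per_core_mm2 * cores
print(added, (die_mm2 + added) / die_mm2)   # ~64 mm^2 extra, ~1.23x the die size
```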
As others already wrote, you might see this stuff in a too simple way. You need to think of signal traveling times and such things. Adding port 5 to Intel's core also meant more complex scheduling, bypass logic, and read/write ports (ROB, RRF). Such changes increase cycle time and might require some trade-offs. Adding further execution units and issue ports to BD also needs more bypass logic, more register ports, more complex scheduler logic (complexity grows roughly quadratically), flag-processing logic, etc.
So while you "fix" b) you will create a bigger, slower-clocked core and need to rebalance everything. The need to exploit higher ILP with a fatter core can actually only be addressed by adding SMT to the integer cores (but BD's huge front end, L2 cache and FPU already process 2 threads). You avoid that by having a narrower core churning faster. Example: 4-wide ALUs. To make efficient use of them, there need to be many opportunities to find 3 or 4 independent (not waiting for other ops' results) ALU ops to execute. Then remember that x86 has 8 GPRs in 32-bit mode and 16 in 64-bit mode, one of which is not generally available because it serves as the stack pointer. There is not much room for ILP. With 2-wide ALUs you only have to find 2 independent instructions. If they run faster, they might deliver results to dependent ops after 60-70% of the time (a 4-wide core might need to run at 60-70% of the frequency at the same power consumption).
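To illustrate that trade-off (the 0.65x clock factor and the ILP value are picked for illustration only):

```python
# With limited ILP, a narrow but faster core finishes a dependent instruction
# stream sooner than a wide core forced to a lower clock at equal power.
def exec_time(n_ops, ilp, width, clock):
    cycles = n_ops / min(ilp, width)    # issue rate limited by ILP or by width
    return cycles / clock

ops = 100
print(exec_time(ops, 2, 2, 1.0))        # 2-wide core at full clock   -> 50.0
print(exec_time(ops, 2, 4, 0.65))       # 4-wide core at 0.65x clock  -> ~76.9
```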
BTW, the IRF is already replicated in BD so that it only needs 4R/2W ports and has short routing distances (dunno if they want to clock it at 4 GHz at 1.2 V or …). So for your idea there would need to be yet another IRF. Further, the whole integer core is only 2.37 mm² according to an AMD paper.
b) Remove the high-speed design, fix latencies (esp. INT-SSE), etc. That will fix the TDP issues.
And what will we do with b) if there are no TDP issues?
c) Optimize the uncore: that will result in an even smaller chip, even though we added more transistors with a).
This is a real option, even w/o a).