I assume the code fits entirely within the L1 caches and never stresses the core interconnect, the L2 and L3 caches, or the memory subsystem. Because it fits so easily in cache, there are almost no stalls for Hyper-Threading to hide, and the two threads just contend for the same execution resources.
Hyper-threading usually sees more of a gain when there are cache misses or branch mispredictions: one thread is fishing in the L2/L3 caches, waiting for data from memory, or having its pipeline flushed, and the other thread has the opportunity to take over the execution resources in the meantime.
The other case is when the current thread isn't using all of the core's physical resources (ALUs, load/store units, etc.), so the hyper-thread can use them in parallel.
The >30% gain from the benchmark is ridiculously good performance from hyper-threading. I'm not overly familiar with the benchmark or with every detail of the SB (Sandy Bridge) architecture, so it's possible that this particular benchmark happens to work well with hyper-threading, or that the SB design leaves enough execution resources idle under a single thread, because the benchmark can't make full use of the hardware on its own, for the second hyper-thread to soak them up.
