My (conspiracy) theory on bulldozer benches so far...

deltbrah · Sep 18, 2011

After reading some of the leaked benchmarks that have been appearing lately for Bulldozer (FX-8150), it appears that the 8150 has lower single-thread IPC (clock for clock) than Phenom II (Super Pi = 21 sec at 3.6 GHz, maybe with turbo on).

My theory is that the chips being benchmarked (engineering samples?) are restricted in the following way. When a module runs a single thread, that thread should be able to use all available resources within the module (the module essentially becomes a 4-issue core). However, I think that in these benchmarks there might only be a static allocation of resources (so in the leaked benchmarks, a module acts as a 2-issue core). Somebody on these threads also mentioned that, clock for clock, Bulldozer appears to have the IPC of the Zacate chips (which are 2-issue out-of-order cores).

Your thoughts?

Also, on a side note, I think a better way to look at Bulldozer is not as an 8-core processor, but as a quad-core processor where AMD has implemented a more elegant solution than HT for running simultaneous threads in each core.

PreferLinux · Sep 18, 2011

deltbrah said:
After reading some of the leaked benchmarks that have been appearing lately for Bulldozer (FX-8150), it appears that the 8150 has lower single-thread IPC (clock for clock) than Phenom II (Super Pi = 21 sec at 3.6 GHz, maybe with turbo on).

My theory is that the chips being benchmarked (engineering samples?) are restricted in the following way. When a module runs a single thread, that thread should be able to use all available resources within the module (the module essentially becomes a 4-issue core). However, I think that in these benchmarks there might only be a static allocation of resources (so in the leaked benchmarks, a module acts as a 2-issue core). Somebody on these threads also mentioned that, clock for clock, Bulldozer appears to have the IPC of the Zacate chips (which are 2-issue out-of-order cores).

Your thoughts?

Also, on a side note, I think a better way to look at Bulldozer is not as an 8-core processor, but as a quad-core processor where AMD has implemented a more elegant solution than HT for running simultaneous threads in each core.

They won't, ever. One thread only ever gets one int core.

videoclone · Sep 18, 2011

AMD lengthened the instruction pipeline of the new Bulldozer CPU, much like Intel did with the Pentium 4.

This is the single worst thing they could have done, and it is the reason Bulldozer is falling short of expectations (going by leaked benches of ES chips, it is slower than their 2-year-old 6-core CPUs). It is also why AMD's CEO got fired for letting this happen.

This longer pipeline nullified the improvements to the CPU, much like with Intel's Pentium 4: every MHz was worth less, and no amount of cache or design improvement could fix that. Then Intel canned the longer pipeline for a more efficient Pentium Pro-style approach with the Core series processors, and the rest is history.

(PS: the Core series design had an even shorter pipeline than the AMD Athlon 64 it was up against, and mopped the floor with it.) Do you see where I'm going with this?

Just my thoughts.

NostaSeronx · Sep 18, 2011

videoclone said:
AMD lengthened the instruction pipeline of the new Bulldozer CPU, much like Intel did with the Pentium 4.

12? to 17~? is not that big <-- Bulldozer
12? to 31 is big <-- Pentium 4

and High Clocks

FO4 8? <-- Pentium 4
FO4 16-19~? <-- Bulldozer from 22-24~? of Phenom II

(Dresdenboy help? explain? or someone else)

videoclone said:
This is the single worst thing they could have done, and it is the reason Bulldozer is falling short of expectations (going by leaked benches of ES chips, it is slower than their 2-year-old 6-core CPUs). It is also why AMD's CEO got fired for letting this happen.

It could be several other reasons than the pipeline length

videoclone said:
This longer pipeline nullified the improvements to the CPU, much like with Intel's Pentium 4: every MHz was worth less, and no amount of cache or design improvement could fix that. Then Intel canned the longer pipeline for a more efficient Pentium Pro-style approach with the Core series processors, and the rest is history.

It wasn't the pipeline length that made the Pentium 4 a bad idea; it was other bad design decisions.

videoclone said:
(PS: the Core series design had an even shorter pipeline than the AMD Athlon 64 it was up against, and mopped the floor with it.) Do you see where I'm going with this?

I thought they had the same length or maybe they had 10, I don't really pay attention to Intel

Elixer · Sep 18, 2011

deltbrah said:
your thoughts?

If you really wanted a conspiracy theory, you would have posted about AGESA settings being used to make it look worse than it really is. :whistle:

BD231 · Sep 18, 2011

deltbrah said:
Also on a side note, I think a better way to look at bulldozer is not a 8 core processor, but a quad core processor where AMD has implemented a more elegant solution to run simultaneous threads in each core than HT.

From what I've gathered from these threads, it looks like it's not going to be a crappy part from a technology standpoint. We've all wanted a new process from AMD for more frequency headroom, so at the very least they're moving forward on that front, but nothing has shown me these CPUs are going to be hugely different in performance where the added hardware cannot be utilized. My biggest fear out of all the speculation right now is that AMD only slapped together existing CPU designs in this variation. It's downright tacky if they did ... but people swear up and down Bulldozer is a new architecture from the ground up. I guess we'll see.

My biggest question would have to be: why didn't AMD have the decency to label it appropriately, like Intel did with their HT tech? It seems like a deceptive move if they come on strong stating they have eight-core parts in the form of these octo-quads.

I prefer my added CPUs to be full-fledged CPUs, just sayin.

videoclone · Sep 18, 2011

NostaSeronx said:
I thought they had the same length or maybe they had 10, I don't really pay attention to Intel

Yeah, Core 1 had 14 stages
Athlon 64 had 12 stages
Athlon 64 FX started to get to 15 stages; that's about the time Core 1 / Core 2 showed up, so I was pointing to the 2006 AMD 64s

Accord99 · Sep 19, 2011

deltbrah said:
Also on a side note, I think a better way to look at bulldozer is not a 8 core processor, but a quad core processor where AMD has implemented a more elegant solution to run simultaneous threads in each core than HT.

No, AMD's implementation of CMT is an inferior, less elegant version of a wide core with HT. For a BD module and a SB core with Hyperthreading, with two threads both a BD module and SB core can be considered two "Wimpy" cores. However, with one thread, SB becomes a "Brawny" core as the one thread has access to every resource of the core. With one thread on a BD module, it's still only running on a "Wimpy" core and the execution resources of the second integer core are inaccessible to the thread.

Riek · Sep 19, 2011

videoclone said:
Yeah, Core 1 had 14 stages
Athlon 64 had 12 stages
Athlon 64 FX started to get to 15 stages; that's about the time Core 1 / Core 2 showed up, so I was pointing to the 2006 AMD 64s

Nehalem and Sandy bridge increased the pipeline length from Core Duo.

A longer pipeline is needed to enhance performance. The P4 was revolutionary but well before its time. I wouldn't be surprised if we see >25-stage pipelines in some years.

Since the early x86 days, almost every new architecture has increased its pipeline length compared to the previous generation. Increasing pipeline length is necessary. Maybe they needed an in-between architecture to gain more experience with a huge front end (which they didn't have previously); if that's the case, then it will take BD2 to get things going for them.

AtenRa · Sep 19, 2011

Accord99 said:
No, AMD's implementation of CMT is an inferior, less elegant version of a wide core with HT. For a BD module and a SB core with Hyperthreading, with two threads both a BD module and SB core can be considered two "Wimpy" cores. However, with one thread, SB becomes a "Brawny" core as the one thread has access to every resource of the core. With one thread on a BD module, it's still only running on a "Wimpy" core and the execution resources of the second integer core are inaccessible to the thread.

80%(avg) with CMT for second thread vs 30%(Max) with HT and it is inferior in Multithreading ?? get serious plz.

Just do a CMT with two SB cores(no HT) and you have more performance than one SB Core with HT 😉

Accord99 · Sep 19, 2011

AtenRa said:
80%(avg) with CMT for second thread vs 30%(Max) with HT and it is inferior in Multithreading ?? get serious plz.

80% gain from what? That's the question that AMD is not answering.

Right now the 2600K with >=8 threads has comparable throughput to a hypothetical 8-core Phenom II/Llano at 2.9 GHz. With <=4 threads, it is equivalent to a 4.8 GHz Phenom II X4. Its die size is small, and power consumption is roughly the same as the much slower 2.9 GHz Llano. That's what I call elegant.

AtenRa said:
Just do a CMT with two SB cores(no HT) and you have more performance than one SB Core with HT

Too bad that's not what BD is going to be.
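
A rough way to see what those two equivalences imply for Hyper-Threading: under the simplifying assumption (not stated in the post) that Phenom II throughput scales linearly with core count and clock, the 8-thread and 4-thread figures can be turned into an implied HT gain. The function name and the linear-scaling model are illustrative only.

```python
def implied_ht_gain(mt_cores, mt_clock_ghz, st_cores, st_clock_ghz):
    """Implied multithreading gain from two 'Phenom II equivalent' figures.

    Assumes throughput ~ cores * clock, which is a simplification.
    """
    throughput_8_threads = mt_cores * mt_clock_ghz   # 2600K with >= 8 threads
    throughput_4_threads = st_cores * st_clock_ghz   # 2600K with <= 4 threads
    return throughput_8_threads / throughput_4_threads - 1.0


# Figures quoted in the post: 8 cores at 2.9 GHz vs 4 cores at 4.8 GHz
gain = implied_ht_gain(8, 2.9, 4, 4.8)
print(f"{gain:.1%}")  # 20.8%
```

So the quoted figures are consistent with HT adding roughly a fifth more throughput on top of four fully loaded cores, at least under this crude model.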

BlueBlazer · Sep 19, 2011

As an ex-AMD engineer quoted.......

On paper bulldozer is a lovely chip. Bulldozer was on the drawing board (people were even working on it) even back when I was there. All I can say is that by the time you see silicon for sale, it will be a lot less impressive, both in its own terms and when compared to what Intel will be offering. (Because I have no faith AMD knows how to actually design chips anymore). I don't really want to reveal what I know about Bulldozer from my time at AMD.

Theoretical performance versus actual performance may be a different story. :hmm:

AtenRa · Sep 19, 2011

Accord99 said:
80% gain from what? That's the question that AMD is not answering.

Right now the 2600K with >=8 threads has comparable throughput to a hypothetical 8-core Phenom II/Llano at 2.9 GHz. With <=4 threads, it is equivalent to a 4.8 GHz Phenom II X4. Its die size is small, and power consumption is roughly the same as the much slower 2.9 GHz Llano. That's what I call elegant.

Too bad that's not what BD is going to be.

Try reading what I said once again and I'm sure you will understand. Just to help you a bit:

Take out of the equation the performance comparison between SB and BD, just try and see what an SB with CMT would be vs an SB with HT 😉

Accord99 · Sep 19, 2011

AtenRa said:
Take out of the equation the performance comparison between SB and BD, just try and see what an SB with CMT would be vs an SB with HT 😉

And imagine that CMT'ed SB implemented instead with full SMT: an 8-issue, quad-threaded CPU that can allocate all its resources among 1-4 threads. That would be even better.

NostaSeronx · Sep 19, 2011

AMD's Solution to die reduction
80% -> 80% -> 80% -> 80%
aka
100% -> 60% -> 95% -> 65% -> 95% -> 65% -> 95% -> 65%
or for comparison
100% -> 95% -> 95% -> 95% -> 65% -> 65% -> 65% -> 65%
6.4 cores without any help

Intel's Solution is
100% -> 95% -> 95% -> 95% -> 18% -> 18% -> 18% -> 18%
4.57 cores without any help

Oh look, the equations work out

Ignore the scores look at the MP Ratios only

AMD's Solution
100% -> 95% -> 95% -> 95% -> 65% -> 65% -> 65% -> 65%

Intel's Solution
100% -> 95% -> 95% -> 95% -> 18% -> 18% -> 18% -> 18%

Final die IPC is unknown and shouldn't be compared to engineering samples, which are for validation, aka finding errors in the die or microcode
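
For anyone checking the "cores without any help" figures in the post above, here is a minimal sketch of the arithmetic: each thread's contribution is taken as a percentage of one full core and the contributions are summed. The percentages are the ones given in the post; the function and variable names are made up for illustration.

```python
def effective_cores(per_thread_percent):
    """Sum per-thread contributions (percent of one full core) into core equivalents."""
    return sum(per_thread_percent) / 100.0


# Per-thread scaling figures as listed in the post (8 threads each)
amd_cmt = [100, 95, 95, 95, 65, 65, 65, 65]
intel_smt = [100, 95, 95, 95, 18, 18, 18, 18]

print(effective_cores(amd_cmt))    # 6.45
print(effective_cores(intel_smt))  # 4.57
```

The CMT list sums to 6.45, which the post gives as 6.4; the interleaved 100/60/95/65 variant in the same post sums to exactly 6.40. The Intel figure matches at 4.57.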

BlueBlazer · Sep 19, 2011

Accord99 said:
And imagine that CMT'ed SB implemented instead with full SMT: an 8-issue, quad-threaded CPU that can allocate all its resources among 1-4 threads. That would be even better.

That would increase its die size, since it would require duplicating not only the 4-issue-wide execution units but also the other architectural enhancements of its core. Furthermore, the tight integration of execution units, out-of-order logic, schedulers and other parts gives it better performance. Breaking up this synergy may actually slightly degrade its overall performance (or make it worse). I guess the engineers at Intel decided early on not to pursue this course of development (judging by the Ivy Bridge enhancements revealed). :hmm:

marino.DV · Sep 19, 2011

BlueBlazer said:
That would increase its die size, since it would require duplicating not only the 4-issue-wide execution units but also the other architectural enhancements of its core. Furthermore, the tight integration of execution units, out-of-order logic, schedulers and other parts gives it better performance. Breaking up this synergy may actually slightly degrade its overall performance (or make it worse). I guess the engineers at Intel decided early on not to pursue this course of development (judging by the Ivy Bridge enhancements revealed). :hmm:

I agree with this !

BlueBlazer · Sep 19, 2011

NostaSeronx said:
Oh look, the equations work out

Ignore the scores look at the MP Ratios only

That's with all Bulldozer "cores" at 4.22GHz (while the other is Sandy Bridge at 3.4GHz base/3.8GHz turbo), which is not correct. You are not comparing them properly. Anyway, that looks like (fraudster) OBR's score! 😛

NostaSeronx · Sep 19, 2011

BlueBlazer said:
That's with all Bulldozer "cores" at 4.22GHz (while the other is Sandy Bridge at 3.4GHz base/3.8GHz turbo), which is not correct. You are not comparing them properly. Anyway, that looks like (fraudster) OBR's score! 😛

Notice I said ignore the scores but look at the MP Ratio it works with my equation

ES Performance from before June will not equal Performance of the Final Die coming soon

Remember this is a new Architecture it's going to have a lot of faults to fix

If they can get the single-core performance up by at least 25% (if any circuits were colliding) without increasing the clock, it could be a bunker buster

Accord99 · Sep 19, 2011

NostaSeronx said:
Ignore the scores look at the MP Ratios only

The scores in the end are the only thing that matters.

So you basically highlight the difference of AMD's dual narrow core CMT versus Intel's wide core SMT. Seeing as the 2600K scores ~6.9 at only 3.5 GHz, it can be said that the 8150 BD is always 8 "Wimpy" while 2600K SB can either be 8 "Wimpy" cores or 4 "Brawny" cores. The difference in scaling ratio comes from the fact that even with one thread BD remains wimpy.
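
To put the clock-speed point in numbers: a minimal sketch, assuming the benchmark score scales linearly with frequency (real scaling is usually somewhat worse) and using only figures quoted in this thread (~6.9 at 3.5 GHz for the 2600K, 4.22 GHz for the Bulldozer run). The helper names are hypothetical.

```python
def score_per_ghz(score, clock_ghz):
    """Normalize a benchmark score by clock speed."""
    return score / clock_ghz


def score_needed_to_match(ref_score, ref_clock_ghz, target_clock_ghz):
    """Score a chip at target_clock_ghz needs to match the reference per clock."""
    return score_per_ghz(ref_score, ref_clock_ghz) * target_clock_ghz


# 2600K: ~6.9 at 3.5 GHz (from the post); Bulldozer run: 4.22 GHz (from the thread)
needed = score_needed_to_match(6.9, 3.5, 4.22)
print(f"{needed:.2f}")  # 8.32
```

In other words, under this linear assumption an 8150 at 4.22 GHz would need about 8.3 in the same test just to equal the 2600K clock for clock.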

Accord99 · Sep 19, 2011

BlueBlazer said:
That would increase its die size, since it would require duplicating not only the 4-issue-wide execution units but also the other architectural enhancements of its core. Furthermore, the tight integration of execution units, out-of-order logic, schedulers and other parts gives it better performance. Breaking up this synergy may actually slightly degrade its overall performance (or make it worse). I guess the engineers at Intel decided early on not to pursue this course of development (judging by the Ivy Bridge enhancements revealed). :hmm:

That might be true, but then those are possibly problems that AMD also had to deal with in BD, seeing as they had to narrow their cores down.

Accord99 · Sep 19, 2011

NostaSeronx said:
If they can get the single-core performance up by at least 25% (if any circuits were colliding) without increasing the clock, it could be a bunker buster

It sure would, it might even be competitive with a hypothetical 5-core Sandy Bridge with HT.

NostaSeronx · Sep 19, 2011

Accord99 said:
So you basically highlight the difference of AMD's dual narrow core CMT versus Intel's wide core SMT. Seeing as the 2600K scores ~6.9 at only 3.5 GHz, it can be said that the 8150 BD is always 8 "Wimpy" while 2600K SB can either be 8 "Wimpy" cores or 4 "Brawny" cores. The difference in scaling ratio comes from the fact that even with one thread BD remains wimpy.

The Bulldozer cores are about the same size as Intel's core

It could be many issues that are affecting Single Core Performance

We don't know the full story

It could be many things especially that AMD K15h isn't supported in Cinebench R11.5
While K7->K12 is the same and K15h is different
and Intel's architecture maintain and improve the quota

Since Core 2 Intel has maintained 3 ALUs and 2+1 AGUs, that is why people see consistent improvements when you go from an intel core to an intel core

AMD K10 -> K15

Dropping an ALU/AGU pair and providing dedicated pipelines

Accord99 said:
It sure would, it might even be competitive with a hypothetical 5-core Sandy Bridge with HT.

The man in my ear says we will see a 50% improvement from Cinebench R11.5 to Cinebench R11.6 (R12); now it gets the same score as Intel's i7 3960X. I wonder why

BlueBlazer · Sep 19, 2011

NostaSeronx said:
Notice I said ignore the scores but look at the MP Ratio it works with my equation

Take OBR's postings with lots of salt. After all he "punk'd" lots of news sites with his fake graphs. 😛

NostaSeronx said:
ES Performance from before June will not equal Performance of the Final Die coming soon

Remember this is a new Architecture it's going to have a lot of faults to fix

The improvements I've seen (IMHO valid leaks, but not OBR's) from the B0 stepping (earliest) to the B2 stepping (latest) are hardly anything at all in this Cinebench benchmark (extrapolating the results hypothetically with frequency). Whatever fixes are forthcoming will take time (silicon re-spins, packaging, testing, validation and debugging, which cause delays). 😉

Accord99 · Sep 19, 2011

NostaSeronx said:
The man in my ear says we will see a 50% improvement from Cinebench R11.5 to Cinebench R11.6 (R12); now it gets the same score as Intel's i7 3960X. I wonder why

They're running Interlagos? That's probably the only way AMD will be competitive with the throughput of 6-core SB.
