Rumour: Bulldozer 50% Faster than Core i7 and Phenom II.

Status
Not open for further replies.

Arkadrel

Diamond Member
Oct 19, 2010
No, ShadowVVL:

An 8-core BD is not like a 4-core with Hyper-Threading.
They are 8 "real" cores: they will show up as 8 cores, and they'll handle 8 threads.

So Task Manager would show 8 cores.
 

Ajay

Lifer
Jan 8, 2001
ShadowVVL said:
Well, I was just wondering: is the 8-core BD just like a quad core with Hyper-Threading, only using half cores instead of virtual threads? Or will it be 8 cores and 8 half cores, where one and a half show up as 1 core in Task Manager?

You will see 8 cores in the Windows Task Manager. There are two 'integer' cores per module, which is what Windows will show. The floating point unit can be shared, or used by one thread for 256-bit operations. IIRC, because some resources are shared between the two cores, the average performance will be 80% of that of two fully independent cores.
 

Arkadrel

Diamond Member
Oct 19, 2010
Does the performance estimate for 2 threads running on the same Bulldozer module mean that they may achieve UP TO 80% of the performance of the same threads running on different modules (I mean the range from 50% (no or negative performance gain) to 80%)? - Anton
It means (in rough general terms):
1 thread on 1 module = 100%
2 threads on 1 module = 180%
2 threads on 2 modules = 200%

Compare that with Hyper-Threading:
1 thread on 1 core = 100%
2 threads on 1 core = 120%

There are 2 ways to solve throughput problems. One is to throw more threads at the problem, one is to throw more execution resources at the problem. HT is like adding more checkout lines at the grocery store to take care of more customers, but making the cashiers jump back and forth between the checkouts instead of putting one cashier on each checkout. - John Fruehe
Quote from: http://blogs.amd.com/work/2010/10/04/20-questions-part-4/comment-page-2/#comments
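A small sketch of the arithmetic behind Fruehe's rough numbers, assuming the 100/180/200/120% figures quoted above are taken at face value (they are ballpark ratios, not measurements). It shows that 180% for two threads on one module works out to about 90% per thread, while 120% on a Hyper-Threaded core is about 60% per thread. The constant and function names are made up for illustration.

```python
# Ballpark throughput ratios from the quoted AMD blog answer (not measurements).
SINGLE = 1.0          # 1 thread on 1 module (or 1 core) = 100%
MODULE_SHARED = 1.8   # 2 threads on 1 Bulldozer module = 180%
TWO_MODULES = 2.0     # 2 threads on 2 modules = 200%
HT_CORE = 1.2         # 2 threads on 1 Hyper-Threaded core = 120%


def per_thread(total: float, threads: int = 2) -> float:
    """Average fraction of single-thread speed each thread gets."""
    return total / threads


def second_thread_gain(total: float) -> float:
    """Extra throughput added by the second thread over one thread alone."""
    return total - SINGLE


for name, total in [("1 BD module", MODULE_SHARED),
                    ("2 BD modules", TWO_MODULES),
                    ("1 HT core", HT_CORE)]:
    print(f"{name}: {per_thread(total):.0%} per thread, "
          f"+{second_thread_gain(total):.0%} from the 2nd thread")
```

The per-thread column is just the total divided by two; the second column is what the extra thread buys you.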


@ShadowVVL, after I saw your question and Ajay's answer, it made me think about the blog and what I read there... Not sure I could answer it better, so I copy-pasted John Fruehe answering a similar question.

The one cashier per checkout is the module approach. ^^
 

ShadowVVL

Senior member
May 1, 2010
Kind of makes sense.

I was thinking it was gonna use one and a half cores per module to run something like Hyper-Threading.

Still a bit over my head, but no biggie; as long as it's fast in both single-threaded and multithreaded work, that's what counts.


BD will be a CPU, not an APU, right?
 

Arkadrel

Diamond Member
Oct 19, 2010
BD will be many things... later on in APUs (next-gen ones will have BD cores), but first:

Zambezi (4 modules, 8 cores, 8 threads) (2-4? memory channels) (client)
Valencia (4 modules, 8 cores, 8 threads) (2 memory channels) (server)
Interlagos (8 modules, 16 cores, 16 threads) (4 memory channels) (server)


ShadowVVL, for simplicity's sake, just think of it as 2 "cores": they are 2 full 'integer' cores per module (none of that one-and-a-half stuff).

Also, core for core, they will be faster than what AMD currently has, and they're designed to handle much higher clock speeds (MHz-wise). So we might actually see AMD stock CPUs over 4 GHz.
 

ShadowVVL

Senior member
May 1, 2010
OK, thank you, that clears it up.

I kind of like APUs a lot. I'm a fan of small PCs like mini-ITX or 1U rack mounts, but I don't think I could fit a graphics card in them or keep it very cool.

Also, there's no way to fit a 500 W PSU in there either.

Wow, 8 cores on only 2 sticks of RAM.

I thought client was server, or is it not?

So 5.5 GHz maybe, or 5.2.

Depending on the price and performance, I might go with Llano for my micro workstation and BD for gaming and a high-end workstation.
 

ElFenix

Elite Member
Super Moderator
Mar 20, 2000
Sure, if we had 8-core and 4-core CPUs which had the same die size, same TDP, same speed, same price, same IMC, etc. But we don't. :p

I don't care about die size either. Only Intel and AMD care about die size.

Price, performance, and watts are the only things that affect a consumer. In car terms, that would be... price, 0-60, and fuel mileage. So :p
 

ydnas7

Member
Jun 13, 2010
BD is not what i will describe below, but it is comparable in concept against this.

1 x SB core (and cache) has the same die area as 2 x Llano cores (and cache).

Imagine if the fictitious Llano module (2 cores) could share its L1 and L2 cache when fewer threads were in use; that would be a speedup (or clawback).

Intel maximises speed but uses SMT to claw back some increased throughput.
AMD maximises throughput but uses anti-SMT to claw back some increased speed.

So AMD and Intel are diverging, and some applications will be more natural to one choice than the other. As Moore's law increases the cores available, Intel's solution will age more gracefully than AMD's. (In the future, 8 full-strength Intel cores will be more useful than 16 mid-strength AMD cores for a single user.)

It's academic now, but 2 Bulldozer modules would be about the same area as 2 Westmere/SB cores. Just as dual-core Sandy Bridge is much better than the 'dales, so too would a 2-module Bulldozer have been very competitive with (and probably better balanced than) the 'dales.
 

Arkadrel

Diamond Member
Oct 19, 2010
@Ydnas7

http://en.wikipedia.org/wiki/Hyper-threading

According to Intel the first implementation only used 5% more die area than the comparable non-hyperthreaded processor, but the performance was 15–30% better.

Simultaneous multithreading cannot improve performance if any of the shared resources are limiting bottlenecks for the performance. In fact, some applications run slower when simultaneous multithreading is enabled. Critics argue that it is a considerable burden to put on software developers that they have to test whether simultaneous multithreading is good or bad for their application in various situations and insert extra logic to turn it off if it decreases performance. Current operating systems lack convenient API calls for this purpose and for preventing processes with different priority from taking resources from each other.

It's very hard to code for, and gains can be negative (a loss of performance).
*If* any of the shared resources are limiting bottlenecks... so how effective this is depends on how well or badly the software is coded. Not sure I'd call this an elegant design; it's more like making the best of a bad situation with said software.


In 2006, hyper-threading was criticised for being energy-inefficient. For example, specialist low-power CPU design company ARM has stated SMT can use up to 46% more power than dual core designs.
It's not an efficient way of doing throughput, compared to using 2 "real" cores.


ydnas7 said:
So AMD and Intel are diverging, and some applications will be more natural to one choice than the other. As Moore's law increases the cores available, Intel's solution will age more gracefully than AMD's. (In the future, 8 full-strength Intel cores will be more useful than 16 mid-strength AMD cores for a single user.)
...clock speed is hitting a wall. You won’t see major improvements in clock speed, and IPC is hitting its limits as well. The best increase in performance will come from more cores; software developers know this, and you are seeing more and more support every day. Multi-core is looking forward; single-threaded apps that rely only on clock speed are looking backwards.


More and more, you'll see software developers optimise for more and more cores, so the future does lie in having a lot of them. I really don't think Hyper-Threading will age more gracefully than using 2 real cores instead.
 

Ajay

Lifer
Jan 8, 2001
Arkadrel said:
In 2006, hyper-threading was criticised for being energy-inefficient. For example, specialist low-power CPU design company ARM has stated SMT can use up to 46% more power than dual core designs.

46% more power than a dual core design - WTH was ARM talking about :confused:

I have seen research in the past pointing to the fact that SMT scales well to about 4 hardware threads, but a single HT core using 46% more power than two cores - nuts!
 

hamunaptra

Senior member
May 24, 2005
To clarify, clock speed is not hitting a wall; it's simply that current x86 CPU designs focus on IPC more so than on hitting a high clock speed.
Example: IBM's POWER7 processors scale to 4-5 GHz nowadays on servers.

It is a FACT that one of the distinct design features of BD is that it is a high-speed microarchitecture design.

From what I understand, AMD has been working on the BD architecture for years and years now; it's one of their most researched/designed archs in probably 6-8 years.
It departs in so many ways from traditional x86 design and implementation standpoints.

If AMD can pull off higher clocks at launch with NO issues while having a LOT of clock headroom, they will have a winner.
Given it's been stated that each BD core performs better in every way than current Thuban cores, add in amazingly high clocks while maintaining a 130 W or lower TDP... they will pretty much own Intel's current offerings left and right.

If it scales well and can clock high for future purposes, all they've got to do to keep up with Ivy Bridge is raise the clocks more and they're golden.

If the need to switch between FMA4 and FMA3 arises in the near future, I've read that changing this feature in the FPU cluster is a VERY EASY thing to do.
No one knows which will be more popular yet. Kind of like IA-64 vs AMD's x86-64... no one knew what would be the new standard...

AMD basically sat back and said: well, we can either design an arch that does high IPC, to keep attacking with the same strategy Intel does, or we can give and take: a little less IPC, but a lot more clock speed and power savings... the latter is what they chose.

Everything hinges on PRICE, launch CLOCK SPEED, and CLOCK HEADROOM.
They already claim amazing performance will be delivered for the power consumption... so they've already scored on that vector.
The 3 I've listed are yet to be seen.

Also, SMT is alright; it really depends on your workload type. I'd say maybe 5-10% of things suffer from it, 10-50% perform the same, 50-80% have a little increase, and 80-100% show up to a 30% increase.
Power consumption:
I've noticed my i7 920 C0 @ 1.05 V without HT draws 50 W under a LinX load... with HT enabled that jumps up to 55-60 W.
LinX does see a decent improvement with HT enabled, and so do F@H-type applications.
 

dmens

Platinum Member
Mar 18, 2005
HT is like adding more checkout lines at the grocery store to take care of more customers, but making the cashiers jump back and forth between the checkouts instead of putting one cashier on each checkout. - John Fruehe

Heh, horrid analogy. You can hire another cashier with the die area you save versus adding an entire core. You can do a 4T SMT core for far less area cost than another core, never mind 2T. Plus, SMT mainly exploits bottlenecks, i.e. some customer decides to read a magazine at the checkout. Good luck fixing that problem when your one "cashier" is stuck.

Trying to argue that 2x area cost for a 2x average speedup is better than 1.05x area cost for a 1.1x average speedup = fail. Especially when SMT is configurable by the OS.
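Putting dmens' numbers into a performance-per-area ratio makes the trade-off concrete. This is a minimal sketch that assumes speedup and area scale exactly as stated (2x/2x for a second core, 1.1x/1.05x for SMT) and ignores power, cache, and everything else; the function name is hypothetical.

```python
def perf_per_area(speedup: float, area: float) -> float:
    """Throughput per unit of die area, both relative to one plain core (= 1.0)."""
    return speedup / area


# dmens' figures, taken at face value.
second_core = perf_per_area(speedup=2.0, area=2.0)   # add a whole extra core
smt_core = perf_per_area(speedup=1.1, area=1.05)     # add 2-way SMT to one core

print(f"extra core: {second_core:.3f} perf/area, 2.0x throughput")
print(f"SMT:        {smt_core:.3f} perf/area, 1.1x throughput")
```

On efficiency alone SMT comes out slightly ahead (about 1.05 vs 1.00), while the second core still delivers more absolute throughput (2.0x vs 1.1x); which of those matters more is essentially what the thread is arguing about.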

Arkadrel said:
For example, specialist low-power CPU design company ARM has stated SMT can use up to 46% more power than dual core designs.

Um OK?
 

nyker96

Diamond Member
Apr 19, 2005
hamunaptra said:
AMD basically sat back and said: well, we can either design an arch that does high IPC, to keep attacking with the same strategy Intel does, or we can give and take: a little less IPC, but a lot more clock speed and power savings... the latter is what they chose.

Nice post btw, but I have a question on this part for you. Doesn't achieving power savings mean achieving higher IPC? How can increasing IPC result in a loss of power savings?
 

hamunaptra

Senior member
May 24, 2005
Ajay said:
46% more power than a dual core design - WTH was ARM talking about :confused:

I have seen research in the past pointing to the fact that SMT scales well to about 4 hardware threads, but a single HT core using 46% more power than two cores - nuts!

Well, it depends on how many stages your pipeline has and how saturated you can keep them.

If your pipeline design has few stages and is very efficient at keeping them full, then SMT will bring VERY LITTLE extra power draw / performance increase.

If your pipeline has TONS of stages and SMT can keep them more saturated, well... you will have more performance, and since those stages aren't sitting around doing nothing anymore, you will have more power consumption.

Also, if you use SMT with more threads (e.g. POWER7's 4 threads per core), you're hoping to hell you are running that many threads to keep ALL stages pretty much 100% saturated.
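The pipeline-saturation argument can be illustrated with a toy model, under loudly stated assumptions: core power is a fixed static part plus a dynamic part proportional to how busy the pipeline stages are, and throughput is proportional to that same utilization. The wattages and utilization figures below are invented for illustration, not measured from any real CPU, and the names are hypothetical.

```python
from __future__ import annotations

# Toy model only: all wattages and utilization figures are invented.
STATIC_W = 10.0         # hypothetical power paid regardless of load (leakage, clocks)
DYNAMIC_W_FULL = 40.0   # hypothetical switching power with every stage busy


def core_power(utilization: float) -> float:
    """Static power plus dynamic power proportional to pipeline-stage utilization."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    return STATIC_W + DYNAMIC_W_FULL * utilization


def smt_effect(util_1t: float, util_2t: float) -> tuple[float, float]:
    """(throughput ratio, power ratio) of 2-thread SMT vs. 1 thread.

    Assumes throughput is proportional to stage utilization.
    """
    perf_ratio = util_2t / util_1t
    power_ratio = core_power(util_2t) / core_power(util_1t)
    return perf_ratio, power_ratio


cases = [
    ("deep pipeline, 1 thread leaves many stages idle", 0.50, 0.75),
    ("short pipeline, 1 thread already keeps it busy", 0.90, 0.95),
]
for label, u1, u2 in cases:
    perf, power = smt_effect(u1, u2)
    print(f"{label}: {perf:.2f}x throughput for {power:.2f}x power")
```

In the deep-pipeline case SMT buys a lot of throughput along with a noticeable power increase; in the short-pipeline case it buys little of either, which is the pattern described in the post.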
 

tweakboy

Diamond Member
Jan 3, 2010
www.hammiestudios.com
I agree with this. Bulldozer has 12 cores; Intel has a lonely 4 cores, 8 logical.

8 to 12 cores will beat 4 cores any day of the week.

Wake up, Intel: make an 8-core desktop CPU, or a 12-core or 16-core one, then I'll upgrade. Thank you.
 

OCGuy

Lifer
Jul 12, 2000
tweakboy said:
I agree with this. Bulldozer has 12 cores; Intel has a lonely 4 cores, 8 logical.

8 to 12 cores will beat 4 cores any day of the week.

Wake up, Intel: make an 8-core desktop CPU, or a 12-core or 16-core one, then I'll upgrade. Thank you.

What do you do that takes advantage of more than 4 cores?

Intel has a 6-core desktop CPU that is much faster than your current quad, and you didn't buy it.
 

hamunaptra

Senior member
May 24, 2005
929
0
71
nyker96 said:
Nice post btw, but I have a question on this part for you. Doesn't achieving power savings mean achieving higher IPC? How can increasing IPC result in a loss of power savings?

Because when you add more execution resources, plus more dedicated hardware to feed them, power consumption traditionally goes up substantially.
It's all a balancing act.
AMD basically realized the following:
Today's modern approach could have meant BD looked like this:
1 module = 2 cores, with fully dedicated resources for each core, such as instruction fetch, decoder stages, and branch predictors, and each core having its own FPU pipe
OR
1 module = 2 cores that sacrifice some of those dedicated resources for shared ones, such as
shared instruction fetch, decoder stages, branch predictors, L2 cache, and an FP cluster that is its own entity, completely separate from the cores, but can be used flexibly by the cores any time it's needed.


In design option 1, we generally get greater IPC, because the cores aren't fighting over resources; each has all its own stages. But AMD saw that this used a ton more die space and power than what they could do with option 2.

In design option 2, AMD has stated the cores in the module only suffer a 10% performance (IPC) loss when both are being heavily utilized simultaneously. But in the process they were able to save HUGE amounts of die space and maintain a much lower power envelope. While they were at it, they made other design decisions (probably because they saw how much power they could save): they sacrificed a tad more IPC to be able to scale the clocks really high.

Let's go with this example...
Given a 130 W TDP, they could go one of two ways:
With independent resources in the cores: 30% higher IPC and 30% higher clock speed, with medium headroom for future clocks
OR
By sharing the resources: 20% higher IPC and 50% higher clock speed, with higher headroom for future clocks

Which design would you have gone after?
'Cause after all, this was the exact design decision AMD was faced with at some point in BD's R&D.
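Since single-thread performance is roughly IPC x clock speed, the two hypothetical options in this example can be compared directly. A minimal sketch, using only the made-up percentages from the post and ignoring headroom, memory-bound workloads, and multithreaded scaling; the function name is hypothetical.

```python
def relative_performance(ipc_gain: float, clock_gain: float) -> float:
    """Single-thread performance ~ IPC x clock, relative to today's core (= 1.0)."""
    return (1 + ipc_gain) * (1 + clock_gain)


# The made-up options from the post, both at the same 130 W TDP.
independent = relative_performance(ipc_gain=0.30, clock_gain=0.30)  # option 1
shared = relative_performance(ipc_gain=0.20, clock_gain=0.50)       # option 2

print(f"independent resources: {independent:.2f}x")
print(f"shared resources:      {shared:.2f}x")
```

Under those assumptions the shared-resource option comes out ahead (about 1.80x vs 1.69x) despite its lower IPC, because the clock gain more than makes up the difference.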
 

Vesku

Diamond Member
Aug 25, 2005
Just hit me that this leak is merely confirmation of what AMD has said about Bulldozer all along. 33% more cores for 50% more performance, 8 core BD is showing about 50% more performance than 6 core Phenom II. I'll probably pick one up if they can deliver it at a similar cost to their current x6 lineup. Oh, and that it retains the AMD benefit of not segregating chips based on virtualization and other features. Can see myself pairing the 8 cores with 16 or so GB of ECC memory and being a happy camper.
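The "33% more cores for 50% more performance" claim implies a per-core improvement. Assuming the leaked benchmark scales perfectly with core count (a big assumption), the arithmetic is sketched below; the function name is hypothetical.

```python
def per_core_gain(core_ratio: float, total_perf_ratio: float) -> float:
    """Per-core speedup implied by a total speedup, assuming perfect core scaling."""
    return total_perf_ratio / core_ratio


core_ratio = 8 / 6   # 8-core BD vs 6-core Phenom II: ~33% more cores
perf_ratio = 1.5     # ~50% more performance in the leak

print(f"implied per-core speedup: {per_core_gain(core_ratio, perf_ratio):.3f}x")
```

So, under that assumption, each BD core would be roughly 12.5% faster than a Phenom II core in that test, from IPC and clock combined. If the benchmark does not scale perfectly with cores, the per-core gain would have to be larger to reach the same total.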
 

hamunaptra

Senior member
May 24, 2005
Vesku said:
Just hit me that this leak is merely confirmation of what AMD has said about Bulldozer all along. 33% more cores for 50% more performance, 8 core BD is showing about 50% more performance than 6 core Phenom II. I'll probably pick one up if they can deliver it at a similar cost to their current x6 lineup. Oh, and that it retains the AMD benefit of not segregating chips based on virtualization and other features. Can see myself pairing the 8 cores with 16 or so GB of ECC memory and being a happy camper.

If AMD is able to surpass the performance of all current Intel chips at launch, then AMD will probably be charging appropriate prices as well. So expect prices to go up if AMD easily surpasses Intel's performance.
In that case, what we can hope for is that AMD's mid-range offerings are wide open for tweaking, or that the platform itself is very scalable (bus speeds and so forth), in which case we can still get these awesome chips, achieve awesome clocks, and still easily beat Intel's high end.

If AMD's performance target is not met, then we can expect prices to be at or below where they are now... that is for sure.
 

maddie

Diamond Member
Jul 18, 2010
OCGuy said:
What do you do that takes advantage of more than 4 cores?

Intel has a 6-core desktop CPU that is much faster than your current quad, and you didn't buy it.


All I see are reviews that test single programs.

When I, and probably many others here, use a computer, I will have the following open: several Firefox windows each containing 20+ tabs, an antivirus, several work programs, several PDF files, and at times a modern 3D game.

Now tell me that "few programs use 4+ cores" applies to me.
 

beginner99

Diamond Member
Jun 2, 2009
Arkadrel said:
It's not an efficient way of doing throughput, compared to using 2 "real" cores.

That quote was from 2006, and Intel's policy since the P4 has always been at least a 1% performance gain per 1% more power usage. If it were inefficient, Atom surely would not use it.
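The "1% performance per 1% power" rule described here can be written as a simple check. As an illustration, this sketch plugs in hamunaptra's own i7 920 LinX readings from earlier in the thread (50 W without HT, 55-60 W with HT); the function names are hypothetical, and the rule itself is taken as stated in the post, not verified against any Intel document.

```python
def gain(before: float, after: float) -> float:
    """Fractional increase from before to after (0.10 == 10%)."""
    return after / before - 1


def meets_one_for_one(perf_before: float, perf_after: float,
                      power_before: float, power_after: float) -> bool:
    """True if the performance gain is at least as large as the power gain."""
    return gain(perf_before, perf_after) >= gain(power_before, power_after)


# hamunaptra's i7 920 LinX readings: 50 W with HT off, 55-60 W with HT on.
for watts_with_ht in (55.0, 60.0):
    needed = gain(50.0, watts_with_ht)
    print(f"50 W -> {watts_with_ht:.0f} W: HT needs at least +{needed:.0%} throughput")
```

In other words, for HT to pass that rule on that load, LinX would have to run roughly 10-20% faster with HT enabled.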

And in certain cases the benefit can be way more than 20%, but it depends on the workload.

If BD does scale that well, I just hope it also has decent single-threaded performance, meaning a lot more than Phenom II. Phenom II is probably even slower than Core 2 Duo clock for clock, not to mention SB.
 

SickBeast

Lifer
Jul 21, 2000
OCGuy said:
What do you do that takes advantage of more than 4 cores?

Intel has a 6-core desktop CPU that is much faster than your current quad, and you didn't buy it.

By your logic, everyone with a quad core should just sit tight and forget about upgrading their system because there is no need for more cores. :colbert:

The fact is, Intel and AMD have hit the wall when it comes to how fast they can make a single core run. The best we can get as consumers now is more cores on a CPU.

8 cores will perform better than 4 cores. Programs are becoming more and more highly threaded. I'm actually amazed at how many games take advantage of a quad core.

Besides that, there are tons of programs out now that can pretty much take advantage of an infinite number of cores.
 