Some Bulldozer and Bobcat articles have sprung up


JFAMD

Senior member
May 16, 2009
565
0
0
If I read the small print correctly, AMD isn't even banking on improved IPC.
http://www.anandtech.com/Gallery/Album/754#6


To me that reads like: "Okay, there will be a loss on serial single-threaded workloads, but we managed to keep this loss under control, so it's not going to be significant."
So the approach seems to favour parallel workloads, more cores per die, less transistors per core... that sort of thing. But not better IPC.

This is where reading the small print and drawing conclusions vs. asking the question will get you in trouble.

The comparison is NOT against previous generation. The comparison is two threads running on one module vs. 2 threads running on two complete dedicated cores.

Very minor impact in overall performance per thread, huge impact in power savings and die level savings (cost).

I have said it several times. IPC WILL BE HIGHER.
 

JFAMD

Senior member
May 16, 2009
565
0
0
Thanks for the info. I have to say, this is a surprise. What happened to the "4/8 CPU" mentioned in an old slide before? You said to interpret that as "there will be 4-core and 8-core variants available" instead of "4c/8t", since there is no HT on BD. The slide in question is below, but I can't link to your response - I forgot which forum it was: either here, at SA, or at AMDZone, I suppose.
desktoproadmap.jpg



I hope Scali's response has already enlightened you. For the past few weeks (or has it been months already?) it has been getting tiresome having to explain "HT core", "logical core", and "real core" over and over to people who think hyperthreading produces "one real + one hyperthreaded/logical core", and similar fallacies regarding "logical cores", whether in an HT context or not.

I can't speak to the different implementations. Just as we are doing 12 and 16 cores on Interlagos they could well be doing a 4 core Zambezi. The only data that I will comment on is the max cores, which is 8. I am not a client guy.
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
HT can only carry you so far and OoO will only carry you so far.
You say that as if only Intel had OoO, and that it is a minuscule contribution. The fact that Atom is reviled by all should tell you otherwise, along with the fact that AMD CPUs are also OoO.

It sounds like AMD was well aware of the fact that most CPU cores go underutilized "thanx microsoft", and I would imagine that before they taped out they looked at current Intel CPUs and made the call as to whether the Intel HT process was going to be better or worse. They could have fixed it before tape out. So that should tell you a bunch right there.
A lot here doesn't make sense:

1) "thanx microsoft"? Are we blaming Microsoft now for CPU woes? I suppose it is also Microsoft's fault that StarCraft II reportedly only uses two cores?

2.) AMD was aware that most cores go underutilized so they made an architecture that is optimized for multi-threaded scenarios instead of single-threaded? Don't you just think that is a bit of "We need X, therefore let's build Y instead of X"? Wouldn't AMD have optimized for single-threaded performance if they were aware that most cores go underutilized anyway because single-threaded performance was all that still mattered?

3.) (off topic) You could try to double check your spelling, and use proper punctuation, and try to state your thoughts clearer. When you argue like you do, and don't bother to fix these small things, it makes responding to you harder, and makes arguments longer than necessary. I am not a native English speaker myself, English is only my second language, and I speak it probably 5-10% of a day only, but I do my best, anyway. Peace :)
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
The comparison is NOT against previous generation. The comparison is two threads running on one module vs. 2 threads running on two complete dedicated cores.
Exactly what I said. Thank you for confirming, because until you did, all I had as basis was my own deductive reasoning, which means squat as a proof.

I have said it several times. IPC WILL BE HIGHER.
Thank you for confirming this as well. Anand's wording was vague enough as to be debatable, and you stating it this clearly is very much appreciated.

The only data that I will comment on is the max cores, which is 8.
Great, I wasn't arguing against that. I merely remembered the slide, and your response to it, which then became my basis for saying Zambezi will come out as a quad-core and an octo-core. Never did (and never meant to) imply that Zambezi was "only" a quad-core.
 

ModestGamer

Banned
Jun 30, 2010
1,140
0
0
You say that as if only Intel had OoO, and that it is a minuscule contribution. The fact that Atom is reviled by all should tell you otherwise, along with the fact that AMD CPUs are also OoO.


A lot here doesn't make sense:

1) "thanx microsoft"? Are we blaming Microsoft now for CPU woes? I suppose it is also Microsoft's fault that StarCraft II reportedly only uses two cores?

2.) AMD was aware that most cores go underutilized so they made an architecture that is optimized for multi-threaded scenarios instead of single-threaded? Don't you just think that is a bit of "We need X, therefore let's build Y instead of X"? Wouldn't AMD have optimized for single-threaded performance if they were aware that most cores go underutilized anyway because single-threaded performance was all that still mattered?

3.) (off topic) You could try to double check your spelling, and use proper punctuation, and try to state your thoughts clearer. When you argue like you do, and don't bother to fix these small things, it makes responding to you harder, and makes arguments longer than necessary. I am not a native English speaker myself, English is only my second language, and I speak it probably 5-10% of a day only, but I do my best, anyway. Peace :)

1. Microsoft is the problem with multithreading performance. The OS should be handling those tasks. The application should be oblivious to the core count. Look at the BeOS and HaikuOS websites.

2. They are attacking both fronts. They are implementing SMT "HT" in hardware vs. in a quasi emulated core sense. It should be faster, all things considered.

3. I am a horriable speller and I can't type at 10% of the speed I think. I could read what I wrote twice and not see the mising words. If you don't like my posts.

Don't reply.
 

extra

Golden Member
Dec 18, 1999
1,947
7
81
If I read the small print correctly, AMD isn't even banking on improved IPC.
http://www.anandtech.com/Gallery/Album/754#6


To me that reads like: "Okay, there will be a loss on serial single-threaded workloads, but we managed to keep this loss under control, so it's not going to be significant."
So the approach seems to favour parallel workloads, more cores per die, less transistors per core... that sort of thing. But not better IPC.

Don't try to spin it some weird way. AMD has said, unequivocally, that Bulldozer will have higher single-thread performance than Phenom II.
 

khon

Golden Member
Jun 8, 2010
1,319
124
106
So now we know what the BD design looks like - information that would have been available well before it taped out. But we know it has taped out, so when are we going to see information on what the performance is like?

The only thing I've seen so far was the claim of a 50% increase in performance for a 33% increase in cores. Which I guess would mean that an 8-core BD is roughly 50% faster than a 6-core Thuban in a fully threaded workload. That's obviously a nice improvement, but it's not going to be enough to compete against LGA2011 SB models, so is AMD once again leaving the high end to Intel and settling for competing with LGA1155 SB mainstream models?

Not that I'd mind getting an 8-core BD for about the same price as a 4C/8T SB, especially if they support more than 16 PCI-E lanes for graphics.
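The arithmetic behind that estimate is quick to sketch (a back-of-envelope only; it takes the quoted 50% figure at face value and assumes perfect scaling with core count):

```python
# Implication of "+50% performance from +33% cores" (6-core Thuban -> 8-core BD),
# assuming the workload scales perfectly with core count.
cores_thuban, cores_bd = 6, 8
core_ratio = cores_bd / cores_thuban          # ~1.333, i.e. +33% cores
claimed_speedup = 1.50                        # the claimed overall gain
per_core_gain = claimed_speedup / core_ratio  # implied per-core throughput change
print(round(core_ratio - 1, 3), round(per_core_gain, 3))  # 0.333 1.125
```

Under those assumptions, the claim also implies roughly 12.5% more throughput per core at equal clocks.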
 

Scali

Banned
Dec 3, 2004
2,495
0
0
I really really really doubt AMD dropped the ball here. They have been releasing blah products for a few years now while throwing massive R&D money at this design and overall architecture.

Be careful there...
You're saying "Because AMD has sucked for the past few years, they are going to be successful now".
I think in general you'll find the opposite to be true. If a company has sucked for a number of years, they can't just bounce back. They worked for years on the Barcelona architecture as well, and that was when they were NOT sucking (so they had more profits, and more to invest in R&D), and we all know how much they didn't drop the ball there.
There's no 'secret sauce' as I said many times before.

You're also stating that things on the chip are anemic. How do you know this?

As I said, we've had CPUs with 3 or more ALUs and AGUs for years (Pentium III was the last dual ALU one pretty much), and now AMD is going back to 2 ALUs and AGUs.

It sucks when the CPU has to do the job the OS should be doing as far as threading

Complete nonsense.
An OS can only run threads. The software developer has to optimize the code for threading. All modern OSes support threading. Nothing any HT or similar CPU does is 'the work of the OS'.
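A minimal sketch of that division of labour (Python used for illustration; in CPython the GIL limits the real speedup for pure-Python work, but the point here is who creates the parallelism, not the timing):

```python
import threading

N = 1_000_000

# Serial version: one thread. The OS will happily run it,
# but it cannot split the loop across cores on its own.
def serial_sum():
    return sum(i * i for i in range(N))

# Parallel version: the *developer* partitions the work into threads.
# The OS's only job is to schedule the threads that now exist.
def parallel_sum(workers=4):
    partial = [0] * workers
    def part(k):
        lo, hi = k * N // workers, (k + 1) * N // workers
        partial[k] = sum(i * i for i in range(lo, hi))
    threads = [threading.Thread(target=part, args=(k,)) for k in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)

print(serial_sum() == parallel_sum())  # True: same answer, different structure
```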
 

ModestGamer

Banned
Jun 30, 2010
1,140
0
0
Be careful there...
You're saying "Because AMD has sucked for the past few years, they are going to be successful now".
I think in general you'll find the opposite to be true. If a company has sucked for a number of years, they can't just bounce back. They worked for years on the Barcelona architecture as well, and that was when they were NOT sucking (so they had more profits, and more to invest in R&D), and we all know how much they didn't drop the ball there.
There's no 'secret sauce' as I said many times before.



As I said, we've had CPUs with 3 or more ALUs and AGUs for years (Pentium III was the last dual ALU one pretty much), and now AMD is going back to 2 ALUs and AGUs.



Complete nonsense.
An OS can only run threads. The software developer has to optimize the code for threading. All modern OSes support threading. Nothing any HT or similar CPU does is 'the work of the OS'.


Your ignorance bleeds through your post.

Threading of an application in an OS should be handled by the OS. The applications need not know what hardware there is. That is the job of the OS. Go look at BeOS and HaikuOS. They show how this should be done.

Well, I doubt AMD is going bust over Bulldozer, but my gut tells me that if they didn't think they had a winner, it would not have taped out. The first iteration might be a bit shaky or not exactly what they hoped for, but the refresh will most likely address those issues.


Ford sucked for years. Look at them today.
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
1. Microsoft is the problem with multithreading performance. The OS should be handling those tasks. The application should be oblivious to the core count. Look at the BeOS and HaikuOS websites.
Applications are oblivious to the core count. They spawn threads (they don't request cores or anything), and the OS caters to their needs by scheduling CPU time for those threads; now that CPUs have multiple cores, threads are scheduled onto the available cores as appropriate. I can spawn 16 threads in a program I create, and those 16 threads will be handled by the OS even on a quad-core, or even a single-core, CPU. It's basic multi-tasking, and OSes have that down pat. But if I create a program that only ever uses one single thread, then my quad core will perform just as fast as if it were only a single-core CPU.

The problem is that applications don't request/spawn/need many threads at all when the processing needs are serial in nature. In fact, in such a scenario, they can only really use one. That is not Microsoft's fault.

It will be impossible for the OS to "multi-thread" an application that does not work on anything more than a single thread. The OS has no way to transform a serial workload into a parallel workload, especially for a program it knows nothing about. At least, not by non-magic means, and if it were actually done without magic, that would be a major breakthrough in parallel/multi-threaded programming. I would certainly want in on that, because as it is now, I have to go the painstaking route of optimizing my programs to use multiple threads, and it is no easy task figuring out the best way to parallelize what used to be (or naturally is) serial code - if it is possible at all. Sometimes a program is simply 90% serial, and parallelizing it is impossible or impractical given the costs (code complexity, which drives up development, debugging, and maintenance costs) versus the gain.
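The point that thread count is the program's choice, not the hardware's, is easy to demonstrate (a minimal sketch):

```python
import os
import threading

def worker(i, out):
    out[i] = sum(range(10_000))  # some independent busywork

n_threads = 16                   # picked by the program, not by the hardware
out = [0] * n_threads
threads = [threading.Thread(target=worker, args=(i, out)) for i in range(n_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The OS time-slices all 16 threads onto however many cores actually exist.
print(os.cpu_count(), all(v == 49995000 for v in out))
```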

2. They are attacking both fronts. They are implementing SMT "HT" in hardware vs. in a quasi emulated core sense. It should be faster, all things considered.
I do not know where to start here. Calling Intel's HT implementation a "quasi emulated core" just makes me wonder whether you actually understand the topic (but just like calling it what it isn't), or whether you actually don't (hence you come up with nonsensical descriptions).

3. I am a horriable speller and I can't type at 10% of the speed I think. I could read what I wrote twice and not see the mising words. If you don't like my posts. Don't reply.
Sorry, perhaps I should not have brought it up. I hope you are not mad.



I give up on this topic. For one thing, all of what we are talking about now is actually off-topic. The real place for this is another thread (or, if #1, then that thread should be in Programming or OS subforums). So have your say if you please, then we'll let it go so as not to continue with the derailment.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Doesn't cache usually take up a significant portion of the die? Yes, it will be smaller like you said, but it seems like having a lower clock speed would provide a significant power savings.

Look at the die size differences between Deneb and Propus and compare the power consumption. To be sure, cache does not consume zero power, but it is about the lowest-power circuitry you can put on a CPU. About the only drawback is the required die area (which translates to higher production cost).
 

Scali

Banned
Dec 3, 2004
2,495
0
0
Your ignorance bleeds through your post.

Lol yea, right.

Threading of an application in an OS should be handled by the OS. The applications need not know what hardware there is. That is the job of the OS. Go look at BeOS and HaikuOS. They show how this should be done.

And you call ME ignorant?
I don't know what you do for a living, but I write multithreaded software for a living.
I think I have a pretty good idea of where CPU, OS and applications fit into the picture.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
With a 4-way Decoder it means it can issue 4 Instructions per Thread per Cycle vs 3-way Decoder in Phenom II (3 Instructions per Thread per Cycle). Correct me if I'm wrong.

Phenom II has 3 Execution Ports inside the INT execution Unit (3 ALUs + 3 AGUs/Ld/ST ) but Bulldozer has 4 Execution Ports inside the INT execution Unit ( 2 ALUs + 2 AGUs ) plus one Ld/ST unit. So BD can execute 4 Instructions per Thread per Cycle and Phenom II can execute 3 Instructions per Thread per Cycle, in other words, IPC will go up and not down.

Feel free to correct me where I'm wrong.
 

StrangerGuy

Diamond Member
May 9, 2004
8,443
124
106
Yea I know... Barcelona will also be 40% faster than Kentsfield.

Hey Mr. know-it-all, ever since the K7 debut AMD has always increased their IPC until now, excluding the TLB-bugged Phenoms. Oh wait, maybe you don't.
 

Scali

Banned
Dec 3, 2004
2,495
0
0
With a 4-way Decoder it means it can issue 4 Instructions per Thread per Cycle vs 3-way Decoder in Phenom II (3 Instructions per Thread per Cycle). Correct me if I'm wrong.

The decoder is shared...
So it can decode 4 instructions per 2 threads per cycle.

Phenom II has 3 Execution Ports inside the INT execution Unit (3 ALUs + 3 AGUs/Ld/ST ) but Bulldozer has 4 Execution Ports inside the INT execution Unit ( 2 ALUs + 2 AGUs ) plus one Ld/ST unit. So BD can execute 4 Instructions per Thread per Cycle and Phenom II can execute 3 Instructions per Thread per Cycle, in other words, IPC will go up and not down.

In theory yes... But the question is whether a balance of 2+2 ALUs and AGUs is good.
It can execute at most 2 ALU instructions per cycle.
If most instructions are ALU rather than AGU, then you'd be better off with the 3 ALU ports of the Phenom II.

That's the thing... the CPU cannot choose the instructions to execute. It will just have to execute whatever is being fed to it by the software.
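That port-balance argument can be made concrete with a toy throughput bound (a sketch only: it ignores dependencies, decode width, and everything else in a real pipeline, and it treats Phenom II's three shared AGU/Ld/St ports as plain AGU ports):

```python
# Minimum cycles to execute an instruction trace, limited only by port counts:
# each cycle issues at most alu_ports ALU ops and agu_ports AGU ops.
def port_bound_cycles(n_instr, alu_frac, alu_ports, agu_ports):
    alu_ops = n_instr * alu_frac
    agu_ops = n_instr * (1 - alu_frac)
    return max(alu_ops / alu_ports, agu_ops / agu_ports)

for mix in (0.50, 0.70):  # fraction of the trace that is ALU work
    bd  = port_bound_cycles(1200, mix, alu_ports=2, agu_ports=2)  # Bulldozer-style
    k10 = port_bound_cycles(1200, mix, alu_ports=3, agu_ports=3)  # Phenom II-style
    print(f"ALU mix {mix:.0%}: 2+2 ports -> {bd:.0f} cycles, 3+3 -> {k10:.0f}")
```

The more ALU-heavy the mix, the harder two ALU ports become the bottleneck, which is exactly the concern above.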
 

Scali

Banned
Dec 3, 2004
2,495
0
0
Hey Mr. know-it-all, ever since the K7 debut AMD has always increased their IPC until now, excluding the TLB-bugged Phenoms. Oh wait, maybe you don't.

AMD has only added and improved execution units since K7... this is the first time they're going to REMOVE them... Might want to pay attention to the topic at hand before shooting your mouth off.
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
With a 4-way Decoder it means it can issue 4 Instructions per Thread per Cycle vs 3-way Decoder in Phenom II (3 Instructions per Thread per Cycle). Correct me if I'm wrong.

Phenom II has 3 Execution Ports inside the INT execution Unit (3 ALUs + 3 AGUs/Ld/ST ) but Bulldozer has 4 Execution Ports inside the INT execution Unit ( 2 ALUs + 2 AGUs ) plus one Ld/ST unit. So BD can execute 4 Instructions per Thread per Cycle and Phenom II can execute 3 Instructions per Thread per Cycle, in other words, IPC will go up and not down.

Feel free to correct me where I'm wrong.

BD can retire 4 instructions per clock.
 

zephyrprime

Diamond Member
Feb 18, 2001
7,512
2
81
1. Microsoft is the problem with multithreading performance. The OS should be handling those tasks. The application should be oblivious to the core count. Look at the BeOS and HaikuOS websites.
Applications are oblivious to core counts in Windows and every other major operating system as well. MS is NOT the problem with multithreading performance. Windows has supported multithreading since the first version of NT more than 17 years ago.
 

heyheybooboo

Diamond Member
Jun 29, 2007
6,278
0
0
Applications are oblivious to core counts in Windows and every other major operating system as well. MS is NOT the problem with multithreading performance. Windows has supported multithreading since the first version of NT more than 17 years ago.

Ding! Ding! Ding!

They have not dealt with NUMA so well, however, but are slowly coming around ...
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Applications are oblivious to core counts in Windows and every other major operating system as well. MS is NOT the problem with multithreading performance. Windows has supported multithreading since the first version of NT more than 17 years ago.

It's very naive to put the fault entirely on software. Software development costs have increased exponentially, and for a lot of developers multi-threaded development is probably not at the top of their list.

With finite amount of resources, time, and manpower, focusing on optimization means taking off resources from what really matters on software, which is usability.

Servers are mostly throughput oriented, while the PC is latency and response time dominant. There is of course overlap between the two, but the statement is true in general. I'd like to say it's not all about the PC, but the discussion about Windows means the main focus for lots of users in this thread is the PC.

Imagine if Blizzard spent their resources on Starcraft II for multi-threading and graphics rather than things like gameplay. Sure, it might have been technically impressive, but I doubt it would have sold as much as it did.
 

Cogman

Lifer
Sep 19, 2000
10,277
125
106
Applications are oblivious to core counts in Windows and every other major operating system as well. MS is NOT the problem with multithreading performance. Windows has supported multithreading since the first version of NT more than 17 years ago.

I want to correct this: application DEVELOPERS are oblivious to core counts. Windows and every other major OS that I know of provides a method for an application developer to find out how many cores are in a system. Application developers, however, choose not to use it.

But I agree. MS, and just about any OS, is not to blame for crappy multithreaded applications. Honestly, their threading APIs are dead simple to use. However, application developers are scared of them. Race conditions, deadlocking, etc. (which are avoided by proper coding practices) are what scare them.
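For reference, asking the OS for the core count and sizing a worker pool to it takes only a few lines (Python shown for brevity; the Win32 equivalent is `GetSystemInfo`):

```python
import os
from concurrent.futures import ThreadPoolExecutor

cores = os.cpu_count() or 1  # the OS reports the core count to anyone who asks

# A developer who chooses to care can size a worker pool accordingly:
with ThreadPoolExecutor(max_workers=cores) as pool:
    squares = list(pool.map(lambda x: x * x, range(8)))
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49]
```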
 

Cogman

Lifer
Sep 19, 2000
10,277
125
106
It's very naive to put the fault entirely on software. Software development costs have increased exponentially, and for a lot of developers multi-threaded development is probably not at the top of their list.

With finite amount of resources, time, and manpower, focusing on optimization means taking off resources from what really matters on software, which is usability.

Servers are mostly throughput oriented, while the PC is latency and response time dominant. There is of course overlap between the two, but the statement is true in general. I'd like to say it's not all about the PC, but the discussion about Windows means the main focus for lots of users in this thread is the PC.

Imagine if Blizzard spent their resources on Starcraft II for multi-threading and graphics rather than things like gameplay. Sure, it might have been technically impressive, but I doubt it would have sold as much as it did.

Blizzard DID spend resources on multi-threading and graphics. Yes, it added to the overhead. But it also added to the quality of the application.

Not every application needs to be threaded, nor does every application need an OpenGL interface. However, where they do, it seems developers often take the lazy way out.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
If Bulldozer has significantly better performance, then I can see AMD raising the prices, and then a 2 module Bulldozer will compete with current 4-core CPUs...
If not, then AMD may have to continue the Thuban trick: more cores at a lower price. In which case we might see 4 module Bulldozers competing with 4-core CPUs.

I sure hope AMD doesn't continue with the "Thuban Trick".

However, could it be BD is compact and efficient enough to pull it off without costing too much to build?
 

khon

Golden Member
Jun 8, 2010
1,319
124
106
I sure hope AMD doesn't continue with the "Thuban Trick".

However, could it be BD is compact and efficient enough to pull it off without costing too much to build?

An 8-core BD could well be smaller than a 4-core SB, since adding the four secondary cores apparently only increases size by 5%, and it doesn't have the IGP that SB does.