AMD sheds light on Bulldozer, Bobcat, desktop, laptop plans


GaiaHunter

Diamond Member
Jul 13, 2008
3,732
432
126
source?
Each module has 2 integer cores; that doesn't necessarily mean each int core has double the integer resources. (It could be that that's the case, I certainly don't know.)

Bulldozer (Module)
2x 4 instructions per clock
up to 2x 4x 64-bit Integer instructions per clock
up to 4x 64-bit FP Multiplies and Adds per clock

Nehalem:
4 instructions per clock
up to 4x 64-bit Integer instructions per clock
up to 4x 64-bit FP Multiplies or Adds per clock

Bulldozer (Module) information comes from Chuck Moore's presentation found in http://phx.corporate-ir.net/phoenix.zhtml?p=irol-eventDetails&c=74093&eventID=2345413 .

I think that those are the correct Nehalem numbers - correct me if I'm wrong.

Additionally, since it's Multiplies and Adds as opposed to Multiplies or Adds, does that make the FP capability of a Bulldozer module up to twice as high?
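To make that question concrete, here is a rough Python sketch of the arithmetic. It uses only the per-clock figures quoted above (not measurements), and it assumes that "Multiplies and Adds" means both can issue in the same clock while "Multiplies or Adds" means only one kind can. The dictionary layout and function names are made up for illustration.

```python
# Peak per-clock figures as quoted in the post (slide numbers, not measured data).
PEAK = {
    "bulldozer_module": {"int_ops": 2 * 4, "fp_mul": 4, "fp_add": 4, "mul_and_add": True},
    "nehalem_core": {"int_ops": 4, "fp_mul": 4, "fp_add": 4, "mul_and_add": False},
}


def peak_fp_ops(unit):
    """Peak 64-bit FP ops per clock under the 'and' versus 'or' reading."""
    if unit["mul_and_add"]:
        # Assumption: multiplies and adds can issue together in one clock.
        return unit["fp_mul"] + unit["fp_add"]
    # Assumption: only one kind of FP op can reach its peak in a given clock.
    return max(unit["fp_mul"], unit["fp_add"])


def module_to_core_ratio(metric):
    """Ratio of one Bulldozer module to one Nehalem core for a given metric."""
    return metric(PEAK["bulldozer_module"]) / metric(PEAK["nehalem_core"])


int_ratio = module_to_core_ratio(lambda unit: unit["int_ops"])
fp_ratio = module_to_core_ratio(peak_fp_ops)
print(int_ratio, fp_ratio)  # 2.0 2.0
```

Under that reading a module comes out at twice the peak integer and twice the peak FP rate of one Nehalem core; if the "and"/"or" interpretation is wrong, the FP ratio drops to 1.0.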

And compared to Phenom it seems to be even more than double the integer resources - I think Phenom is 3 instructions per clock?

Edit: Dresdenboy says "With a probability >95% BD will not only be able to do 2 x 128 bit or 1 x 256 bit FMAC per BD module, but it could also do 2 x 128 bit or 1 x 256 bit FADD together with 2 x 128 or 1 x 256 bit FMUL (independent from the adds) per cycle".
 

GaiaHunter

Gaia you are comparing module to core, ilkhan is comparing core to core.

I understand that.

But if a module is equal to twice the integer resources (and the FP resources seem to be 2x too compared to today's Nehalem, although I could just be interpreting the information the wrong way), why can't a module be compared to 2 cores?

If AMD hadn't come up with those modules and just said BD had 8 integer cores and 4 FPUs, would you call it a quad-core or an octo-core?

Or if the bulldozer module was 2 integer cores and 2 FPU sharing cache would it be a dual-core then? Even if a huge percentage of the time, we the users, aren't using a fraction of those FPU resources?

I bet the penalty of thread scaling comes more from other shared resources than from FPU sharing.

I think we are just seeing this as a more expensive hyper-threading, and since we, regular desktop consumers, most of the time don't see any benefit from it, we're having difficulty seeing how a "4 cores + 4 HT cores" chip can be compared to 8 cores.

What we need to think is:

A 4 module, 8 core BD is an octo-core without HT. So, yes, an octo-core Nehalem sharing cache with HT will be faster. But a quad-core Nehalem with HT will be slower and another octo-core might be slower/equal/faster depending on the situation.

A 2 module, 4 core BD is a quad-core without HT. So, yes, a quad-core Nehalem with HT will be faster with more than 4 threads. But at 4 threads it can be faster/equal/slower depending on the situation.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
I'm telling ya, ATI mentality of "smaller die" chip must have taken over. :D

They don't want to make 500mm2 chips, server or not. If my estimate of 1 Nehalem core = 1 BD module holds, an 8 module version will reach over 330mm2 even on a 32nm process. If they want to do 16 modules, it'll be greater than 660mm2 even if they add nothing for better scalability.
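The estimate above is plain linear scaling, which can be sketched in a few lines of Python. The 330mm2 figure for 8 modules is the poster's own estimate; the function name and the assumption that area grows strictly in proportion to module count (nothing extra for interconnect or cache) are illustrative only.

```python
def scaled_area_mm2(base_modules, base_area_mm2, target_modules):
    """Scale die area linearly with module count (no extra uncore added)."""
    return base_area_mm2 * target_modules / base_modules


# Poster's estimate: an 8 module chip at roughly 330 mm^2 on 32nm.
per_module = scaled_area_mm2(8, 330.0, 1)
sixteen_modules = scaled_area_mm2(8, 330.0, 16)
print(per_module, sixteen_modules)  # 41.25 660.0
```

Any real 16 module design would likely need more than this lower bound, which is the point being made about staying away from very large dies.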

With BD they can just MCM the 8 module chip if they want a 32 "core" version.

If the single thread couldn't use both of the integer cores, there wouldn't have been the need to unify the fetch and the decode units, think about that guys.
 

ilkhan

Golden Member
Jul 21, 2006
1,117
1
0
But if a module is equal to twice the integer resources (and the FP resources seem to be 2x too compared to today's Nehalem, although I could just be interpreting the information the wrong way), why can't a module be compared to 2 cores?
Because that's not how AMD wants to market it?

If AMD wanted to call each module a core, yeah, it'd have double the int resources, their own version of SMT, and a fairly nifty FP setup to boot. But they want to call each module 2 cores. It's not reality (to my mind) but it is how they want to play it.

NOR would each BD module have 2x the int resources compared to 2 nehalem cores. They'd each have 2 (but the Intel would have 2 sets of FP resources, as well).

IU2K - the 16 core BD is 2 4-module chips MCMed. To get to 32 "cores" they'd need 4x 4 modules MCMed.
 

IntelUser2000

Really? MCM as in Clovertown-style? The way the L2 and the L3 caches are shown, it seems it'll be similar to Dunnington, sharing the LLC.

http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=3683&p=2

A hypothetical eight-core Bulldozer. Presumably the L3 cache would be shared by all four modules.

That's technically not an MCM. The MCM I'm talking about is the way they do it in Magny Cours, where the two dies are really separate.

The FPU supports FMAC, which means a module can be compared to 2 cores, scaling be damned. :p
 

GaiaHunter

If AMD wanted to call each module a core, yeah, it'd have double the int resources, their own version of SMT, and a fairly nifty FP setup to boot. But they want to call each module 2 cores. It's not reality (to my mind) but it is how they want to play it.

But if a module has twice the resources, calling it a core would make a BD core 2x more powerful than a Nehalem core. Except it won't use both integer cores for the same thread.


NOR would each BD module have 2x the int resources compared to 2 nehalem cores. They'd each have 2 (but the Intel would have 2 sets of FP resources, as well).

No but a module has 2x the integer resources of a single Nehalem core and the same integer resources as 2x Nehalem cores. So 1 Module = 2x Nehalem cores.

1 Nehalem core has the same FP resources, but it's ADD or Multiply. ADD and Multiply counts as 2x if I'm not mistaken.

Even if a BD module has the same integer resources and half the FP resources of 2 Nehalem cores, is FP that important compared to integer? Or do you need so much FP?
 

deputc26

Senior member
Nov 7, 2008
548
1
76
And how do you come to the conclusion that it would be worse than hyperthreading at that point. You believe that putting that same load on a single core will net better results? How?

If you view a module as 2 cores, then what IDC is saying makes sense. With the FP unit fully tied up by one thread you lose the 80% performance boost of the second thread on the module. On an i7 you would (I think; I am not sure how the i7's FPU handles 256-bit instructions, or if it even can) lose only the ~20% benefit of SMT, which is considered a "bonus" that is not guaranteed to be there anyway. If you count a module as one core instead of two then you will lose the same amount of performance (obviously), but the operator will not feel cheated by his "8-core" only crunching 4 threads. Calling a proc an octo-core is an unspoken promise that the proc will be able to run 8 threads... always, which (though very close) is not the case for Bulldozer (as in IDC's example).
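A toy Python model of the paragraph above, for anyone who wants to play with the numbers. It takes the ~80% and ~20% second-thread gains from the post at face value, measures everything in units of "one thread running alone", and assumes (purely for illustration) that a saturated FP unit removes the whole second-thread gain and that one thread on a module equals one thread on an i7 core. The function name is hypothetical.

```python
def relative_throughput(units, second_thread_gain, fp_saturated):
    """Throughput of `units` modules/cores, in units of one lone thread."""
    per_unit = 1.0 if fp_saturated else 1.0 + second_thread_gain
    return units * per_unit


bd_normal = relative_throughput(4, 0.8, fp_saturated=False)  # 4 modules, 8 threads
bd_fp_bound = relative_throughput(4, 0.8, fp_saturated=True)
i7_normal = relative_throughput(4, 0.2, fp_saturated=False)  # 4 cores, 8 threads
i7_fp_bound = relative_throughput(4, 0.2, fp_saturated=True)

# Fraction of throughput lost when the FP unit is fully tied up by one thread.
bd_loss = 1 - bd_fp_bound / bd_normal
i7_loss = 1 - i7_fp_bound / i7_normal
print(round(bd_loss, 3), round(i7_loss, 3))
```

With these inputs the "8-core" loses a bit under half of its throughput in the FP-bound case while the i7 loses about a sixth, even though both end up at the same 4 units; that gap in expectations is what the naming argument is about.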

Calling a module 2 cores is a mild overstatement while calling a module 1 core is an understatement. The understatement will make AMD appear to over-deliver and exceed expectations while the overstatement will result in AMD under-delivering in the eyes of most opinion leaders.

Do you want the headlines to read: "AMD Launches New 8-Core CPU... but it's barely faster than an Intel Quad"

Or: "AMD Launches New Quad-Core CPU... faster than Intel's"

Of course there are other factors and price is the most relevant metric in regards to which SKUs are in competition but people will still naturally want to compare procs of like core count, and this is going to make AMD look weak with their current interpretation of "core".
 

ilkhan

But if a module has twice the resources, calling it a core would make a BD core 2x more powerful than a Nehalem core. Except it won't use both integer cores for the same thread.
So? The "core" would be working on 2 threads, same as an Intel core does.

I wonder how sandy bridge will update HT.
 

IntelUser2000

(FP seems like something of a minefield to discuss, so I'll leave that out)

Here's how Bulldozer works

[Image: diagram not preserved]

Do you want the headlines to read: "AMD Launches New 8-Core CPU... but it's barely faster than an Intel Quad"

Or: "AMD Launches New Quad-Core CPU... faster than Intel's"

It doesn't matter. Core sizes are increasing every generation. The 8 core DP/high-end consumer variant of Sandy Bridge will have a die size upward of 300mm2 (possibly 350mm2). I can't imagine how big a 12-core MP variant would be, with all the enhancements done on MP chips for better scaling. 700mm2? If AMD fights a die size war, it'll be a losing battle. Even if they are at a similar process generation, Intel has a much better chance of succeeding with larger die chips.

I concluded that a 4 module Bulldozer would outperform a 4 core Nehalem by 30-40% in well threaded apps right? That's something very significant. 1.4x performance/mm2 and possibly performance/watt.
 

ilkhan

yeah, I got that.

I have no doubt that for an equal number of threads BD will be faster than nehalem. The question is, is AMD going to be selling 32nm 4 cores (2 modules) to compete with Intel's 22nm dual cores? If AMD can do that, they're in a good position. If not...we'll see.
 

IntelUser2000

Does AMD want to be rick-rolled or be hit by a train? Just kidding... :D

I think this'll at least guarantee their survival. They are focusing on all markets unlike before, from ultra-mobile to servers, 2W to 125W (when have they ever made a CPU that went even below 15W?).
 

JFAMD

Senior member
May 16, 2009
565
0
0
(FP seems like something of a minefield to discuss, so I'll leave that out)

Here's how Bulldozer works

[Image: diagram not preserved]

I can't imagine how big a 12-core MP variant would be, with all the enhancements done on MP chips for better scaling. 700mm2? If AMD fights a die size war, it'll be a losing battle.

Your diagram is wrong. In the single threaded environment you have 1 thread coming in, but 2 threads moving through the module. 1 thread would come in and only 1 of the 2 integer cores would be active for that cycle. So 1 red arrow going down.

As for MP, our DP and MP parts are exactly the same size. The only difference is the number of HT links, and that is a fusing option.
 

IntelUser2000

JFAMD - Thanks. Though I was trying to convey that both of the integer cores are able to combine for executing one thread. See what I'm trying to do here? On dual thread, it's totally separate, but on single thread it can go wherever it wants, no, not really, ah, I give up.

The DP/MP is for Intel, BTW.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Though I was trying to convey that both of the integer cores are able to combine for executing one thread.

That's not what happens though...you are applying the old Dresdenboy interpretation of clustered integer cores which turned out to not be correct.

The integer cores within a module do not "gang up" when only one thread is present.

However the 128bit FPU per core within the module are designed to "gang up" whenever a 256bit instruction comes down the pipe of any thread within the module.
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
I am saying that calling it a quad would also be justified (I am assuming that a module is capable of devoting all resources to a single thread, which I am not 100% sure of)

If the single thread couldn't use both of the integer cores, there wouldn't have been the need to unify the fetch and the decode units, think about that guys.

The problem here is that it isn't that simple, unless you are treating it as just HT with a different name.

Loosely speaking, in HT "2 virtual cores" can combine into "one big core" for single threads because it is actually just optimizing the use of one physical core's pipeline. If there's only a single thread, no problem at all. Of course, it's actually because it's "one big core" in the first place.

No doubt it would be the best thing since ham and eggs if AMD somehow accomplishes this feat (I am doubting it now, but I am actually hoping to be wrong about it). I have no idea right now why the fetch/decode was unified, and that's a very astute observation.

Even if AMD can combine a module's two integer cores to handle one thread, it will be limited in use to absolutely single-threaded environments. Since the CPU appears as 8 cores to everybody (hardware, OS, application), the thread just gets assigned to a random core. If the environment is purely single-threaded, then sure, we are guaranteed that the partner of that core is also available, and so it is feasible that it could help (had AMD been able to design the chip that way, but again I doubt it).

But what happens if there is more than one thread? The threads can be assigned to just one module, since the hardware/OS/app only sees 8 cores and has no sense of modules. So even in a 2 module Bulldozer, two threads may very well just take up 1 of the modules entirely, so it looks like we "lose" a lot of performance, or that Bulldozer has very poor scaling.
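How often could that happen? A small Python sketch that just counts placements, assuming (hypothetically) a scheduler that knows nothing about modules and is equally likely to pick any set of distinct cores. Real OS schedulers are not uniform random, so treat this as an illustration of the concern rather than a prediction; the function name is made up.

```python
from itertools import combinations


def crowded_placements(modules, cores_per_module=2, threads=2):
    """Count placements where at least two threads end up sharing a module."""
    cores = [(m, c) for m in range(modules) for c in range(cores_per_module)]
    placements = list(combinations(cores, threads))
    crowded = [p for p in placements if len({m for m, _ in p}) < threads]
    return len(crowded), len(placements)


print(crowded_placements(2))  # (2, 6)
print(crowded_placements(4))  # (4, 28)
```

So with two threads and module-blind placement, both land on the same module in 2 of 6 cases on a 2 module chip and 4 of 28 on a 4 module chip. A scheduler that is aware of modules could avoid those cases entirely.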
 

GaiaHunter

So? The "core" would be working on 2 threads, same as an Intel core does.

But for that situation BD CMT is vastly superior to Nehalem SMT.

Then it is about Die size and price.

yeah, I got that.

I have no doubt that for an equal number of threads BD will be faster than nehalem. The question is, is AMD going to be selling 32nm 4 cores (2 modules) to compete with Intel's 22nm dual cores? If AMD can do that, they're in a good position. If not...we'll see.

Again it is all about the execution, die sizes, power envelopes and prices.

Another thing to take in consideration is how many threads will consumers be running, especially at desktop side?

I need 4 threads - dual-core Intel w/t HT, 4 cores AMD or 4 cores Intel. 2 cores Intel<<<4 cores AMD< 4 cores Intel.

At 8 and 16 threads, in the desktop side (AMD announced only up to 8 cores Zambezi), Intel 8cores/16T are the performance winners, but at what price over 8 cores BD? Will we need 16 threads?

As a "guesstimate", assuming IPC will be equal between Nehalem and BD, an 8 core BD could reach up to 90% of the performance of an 8 core Nehalem at 8 threads, with something between 2/3 and 3/4 of the die size.
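Turning that guess into performance per die area is one division. The Python sketch below uses only the numbers from the guess above (up to 90% of the performance at 2/3 to 3/4 of the size); exact fractions are used so the results are not blurred by floating point, and the function name is illustrative.

```python
from fractions import Fraction


def perf_per_area(relative_perf, relative_area):
    """Performance per unit of die area, relative to the reference chip."""
    return relative_perf / relative_area


perf = Fraction(9, 10)                        # up to 90% of the 8 core Nehalem
at_three_quarters = perf_per_area(perf, Fraction(3, 4))
at_two_thirds = perf_per_area(perf, Fraction(2, 3))
print(float(at_three_quarters), float(at_two_thirds))  # 1.2 1.35
```

That is a 1.2x to 1.35x advantage in performance per mm2 if the guess holds, and it shrinks in proportion if the real performance lands below the 90% ceiling.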

I agree with IU2000, ATI mentality.
 

jvroig

Wow, the thread advanced with 4 posts as I was (slowly) typing my previous message.

That's not what happens though...you are applying the old Dresdenboy interpretation of clustered integer cores which turned out to not be correct. The integer cores within a module do not "gang up" when only one thread is present.

1 thread would come in and only 1 of the 2 integer cores would be active for that cycle. So 1 red arrow going down.

I guess the matter is laid to rest.

What then is the use of fetch/decode being unified, as IntelUser2000 pointed out?
 

Idontcare

What then is the use of fetch/decode being unified, as IntelUser2000 pointed out?

It reduces the die-area of the module/cores at the expense of reducing the performance somewhat.

It also enables a single-threaded app to operate a bit faster because the thread then has access to 100% of the shared resources versus just its half when a second thread is active on the module.
 

jvroig

I agree with IU2000, ATI mentality.
You say that like it's a bad thing. ATi did pretty well. I'm guessing it is a smart move on AMD's part, because a race to bigger and bigger die sizes is only feasible for Intel to win given their production and shipment volume.

Then it is about Die size and price.
+1. It is always about price. Right now we have an Athlon II X4 that is compared to a Core 2 Duo. It's an unfair comparison on the basis of "cores", but it's the price point that declares the comparison. As long as Bulldozer is priced accordingly, it will compete with Intel's offering that it can match or beat.

At 8 and 16 threads, in the desktop side (AMD announced only up to 8 cores Zambezi), Intel 8cores/16T are the performance winners, but at what price over 8 cores BD? Will we need 16 threads?
Eventually we should. A major concern in software development now is multi-threading. I have no idea how long it will take to become the norm. But just as multi-core is the future, so are multi-threaded apps.

Of course, I get what you mean, and you probably mean "do we need 16 threads on the desktop by the time these products arrive", and given it is just 2 years, then no, maybe not. It'll probably be the same old story of a few games, and always video encoding software and the like that uses up all available cores.

Then again, even now, high core-count products remain shipped to enthusiasts who need to use those high core-count chips. Workstations or farms for rendering would no doubt appreciate having 16-threaded power. For regular desktops, most users will still want just a dual-core, and maybe a quad-core for serious gaming since games seem to be catching up on using multiple cores faster than most normal desktop apps.
 

JFAMD

What then is the use of fetch/decode being unified, as IntelUser2000 pointed out?

Saves space, saves power, really doesn't impact performance much. It's a real easy tradeoff.

As to the other questions above, I cannot be more clear - you can't merge 2 integer cores together. But, by 2011, the prevalence of highly threaded software will be even greater than today. For servers today, quad core is the mainstream.

When Windows 95 came out people complained that there were some Win 3.1 programs that wouldn't run on it. Technology marched on. With Windows 2000 people were no longer complaining about old DOS programs and Win 3.1 programs because technology had moved on. I feel pretty confident that the argument in 2011 is not going to be about single core performance; it's going to be more about scheduler efficiency.

And for that matter, have we decided how many angels are on the head of this pin?

My guess is that in 2011 the real discussion is going to be the price, the performance and the power consumption of competing processors. That is all people really care about.
 

GaiaHunter

You say that like it's a bad thing. ATi did pretty well. I'm guessing it is a smart move on AMD's part, because a race to bigger and bigger die sizes is only feasible for Intel to win given their production and shipment volume.

Quite the opposite - actually it even makes more sense in AMD vs Intel than AMD/ATi vs NVIDIA, since Intel is even bigger and has even a bigger slice of market share.

If you need to sell your products at a lower price than the competition (and even if AMD or ATi products had exactly the same performance as their competition, most people would choose the current market leader), or just can't take market share fast enough for one reason or another, then it seems perfectly wise to make your product as small/inexpensive as possible: you increase the profit on the units you can actually sell, and you hope that cheaper prices will push someone who was undecided into buying, slowly gaining some market share.

In my case if the 9800GTX+ was available at the same price of the 4850 when I bought the 4850, I would have bought a 9800GTX+ instead.

Likewise, I'm typing from this dual-core - my needs are more or less served at the moment, but I could use something more in some situations. If going the quad route meant paying €200-350+ for a quad-core system, I would be "buy, not buy, buy, not buy...". At something like €100-175 it is a lot easier to choose to satisfy the occasional need for more performance. If the price was even lower, it would be even easier to do so.
 

jvroig

Quite the opposite - actually it even makes more sense in AMD vs Intel than AMD/ATi vs NVIDIA, since Intel is even bigger and has even a bigger slice of market share.
Ah, then we are in agreement. I must have misread your statement to be a negative (like "Damn AMD, going the ATi route.. FAIL!!!"). Apologies. :)