AMD sheds light on Bulldozer, Bobcat, desktop, laptop plans


Phynaz

Lifer
Mar 13, 2006
10,140
819
126
They're talking about OpenCL... so I would disagree with your statement, as they are almost ready now and developers already have initial SDKs...

Let me know when Visual Studio or the Intel or the Micro-Focus compilers support OpenCL. Then add a couple of years for OS and applications support.

To put it another way, both Nvidia and ATI have had SDK's out there for their own implementations of GPGPU for a couple of years now, and I will not run out of fingers counting supporting applications.

Oh, and I think that I read that the SDKs are out of date anyway, because the OpenCL spec just revved; I think Apple completed it or something.
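Since the compiler-support argument keeps coming up, here is a hedged sketch of what the OpenCL programming model actually looks like: the canonical vector-add kernel, carried as OpenCL C source inside a host-side string, plus a pure-Python stand-in for what it computes. The names are illustrative, and actually building the kernel requires a vendor OpenCL runtime/SDK, not Visual Studio or a C compiler by itself, which is the point being made above.

```python
# Illustrative only: the canonical OpenCL "vector add" kernel, held as a
# plain string the way a host program hands it to the OpenCL runtime.
# Compiling it needs a vendor SDK/driver (AMD, NVIDIA, Apple), not the
# host C/C++ compiler itself.
VECTOR_ADD_KERNEL = """
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *out)
{
    int i = get_global_id(0);   /* one work-item per array element */
    out[i] = a[i] + b[i];
}
"""

def vec_add_reference(a, b):
    """Pure-Python reference for what the kernel above computes."""
    return [x + y for x, y in zip(a, b)]

print(vec_add_reference([1.0, 2.0], [3.0, 4.0]))  # [4.0, 6.0]
```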
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
I believe that it means that every 2 cores will share one cache, and that the shared cache pairs will be using direct connect architecture with each other. So an 8 core will be 4 dual cores directly connected to each other on the DCA Bus...
This is similar to Intel's shared cache on the Core Duo, except that Intel had to connect the cache through an off-die FSB rather than directly connect to another cache on the die (much higher latency).

Huh? You mean the MCM implementations like Kentsfield or Core Duo as in CORE DUO? If you mean the former, you are correct. If you mean the latter, read on.

The reason AMD has the DCA is because they used a separate LLC (Last Level Cache), which in their case was the L2 cache. Remember when they had 1MB of L2 cache per core in a dual core configuration? Well, the DCA was needed to transfer data between the cores.

Shared cache architectures like the Core Duo's negate the need for a DCA, because all inter-core traffic happens in the cache. The arbiter, or router, is only needed if data needs to go off chip, as in multi-processor configurations.

"K10"(and Bulldozer too!) has a shared LLC via the L3, so the need for internal communication via the router is reduced or disappears altogether.
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,732
432
126
I believe that it means that every 2 cores will share one cache, and that the shared cache pairs will be using direct connect architecture with each other. So an 8 core will be 4 dual cores directly connected to each other on the DCA Bus...
This is similar to Intel's shared cache on the Core Duo, except that Intel had to connect the cache through an off-die FSB rather than directly connect to another cache on the die (much higher latency).

But what is a "Bulldozer core"?

I doubt they will be launching a dual core in 2011 (even if it has 4 integer cores) or a quad core (even if it has 8 integer cores), when they will be releasing 6-core and 12-core chips next year.

From Anand http://anandtech.com/cpuchipsets/showdoc.aspx?i=3674

bulldozer.jpg
This is a single Bulldozer core, but notice that it has two independent integer clusters, each with its own L1 data cache. The single FP cluster shares the L1 cache of the two integer clusters.

Within each integer “core” are four pipelines, presumably half for ALUs and half for memory ops. That’s a narrower width than a single Phenom II core, but there are two integer clusters on a single Bulldozer core.

It clearly states that 1 Bulldozer core includes those 2 smaller cores (which seems to refer to the integer cores).
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Might be misinterpreting your post, and if so I apologize, but won't Bulldozer CPUs also have a GPU on the same die, using Fusion as the controller?
*
*
*
What will be the value of Larrabee, and how will NVIDIA GPUs work with an Intel APU?

2011 seems like it could be an interesting year. 2010, though, seems one-sided - again...

You mean on-die GPU for high-end? I'm not so sure about that. Maybe the generation after that but it seems Fusion at Bulldozer timeframe is mostly about upping IGP performance.

The way the FP is arranged makes me think that, pure FP-wise, Bulldozer will be better at 128-bit FP and Sandy Bridge at 256-bit FP, IF Sandy Bridge is using a single 256-bit FPU. Of course, performance is almost impossible to predict. Up until now, we had more confirmed info about Sandy Bridge than Bulldozer. Now it's the opposite.
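The 128-bit vs 256-bit trade-off above can be put into back-of-envelope numbers. This is a sketch under the thread's own assumptions (two 128-bit FMACs per Bulldozer module, a single 256-bit FP unit in Sandy Bridge), not confirmed specs, and it ignores scheduling, latency, and clocks entirely:

```python
# Naive issue-rate model: each unit retires at most one vector op per
# clock; units narrower than the op must gang up to execute it.
def ops_per_clock(num_units, unit_bits, op_bits):
    if op_bits <= unit_bits:
        return num_units
    return num_units * unit_bits // op_bits

print(ops_per_clock(2, 128, 128))  # Bulldozer module, 128-bit code -> 2
print(ops_per_clock(1, 256, 128))  # Sandy Bridge,     128-bit code -> 1
print(ops_per_clock(2, 128, 256))  # Bulldozer module, 256-bit code -> 1
print(ops_per_clock(1, 256, 256))  # Sandy Bridge,     256-bit code -> 1
```

Under this toy model Bulldozer wins on 128-bit code and the two tie on 256-bit; real scheduling could favor either design, which is why the post above hedges.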
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,732
432
126
You mean on-die GPU for high-end? I'm not so sure about that. Maybe the generation after that but it seems Fusion at Bulldozer timeframe is mostly about upping IGP performance.

I mean that Bulldozer will always have an on-die GPU to utilize as a GPGPU and eventually for basic GPU functions, like today's IGPs, just faster on the graphics side.

Honestly, I don't see a reason to glue an IGP onto the same die if not for that reason.

People can say it is to save costs, but on a high-end platform, especially a desktop where gaming performance can be important, I don't see the need for an IGP, and having no IGP at all (on die or otherwise) would be cheaper.

Again from Anand

Much heavy FP work is expected to be moved to the GPU anyway, there’s little sense in duplicating FP hardware on the Bulldozer core when it has a fully capable GPU sitting on the same piece of silicon. Presumably the Bulldozer cores and the GPU will share the L3 cache. It’s really a very elegant design and the basis for what AMD, Intel and NVIDIA have been talking about for years now. The CPU will do what it does best while the GPU does what it is good at.

Or are versions of Bulldozer without an on-die GPU planned?

Then for High-end Graphics you still have a discrete GPU - at least for now.

EDIT: Actually this keeps seeming better and better, if AMD can tap all the potential of Fusion and if Bulldozer delivers on the CPU side.

Consider possibilities like Hydra: boosting the discrete GPU's performance with the on-die one; shutting down the discrete GPU when it isn't being used, to save power; tapping the discrete GPU's resources when they can be used for something; etc.

Sounds very good if AMD can deliver.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
But what is a "Bulldozer core"?

I doubt they will be launching a dual core in 2011 (even if it has 4 integer cores) or a quad core (even if it has 8 integer cores), when they will be releasing 6-core and 12-core chips next year.

From Anand http://anandtech.com/cpuchipsets/showdoc.aspx?i=3674


It clearly states that 1 Bulldozer core includes those 2 smaller cores (which seems to refer to the integer cores).

Yes this is the source of my current confusion...is this a dual-core or a quad-core Bulldozer CPU:
2BulldozerCores.jpg


and likewise, is this a quad-core or an octo-core Bulldozer:
4BulldozerCores.jpg


I couldn't care less what AMD is calling these junior cores to avoid using the term hyperthreading... what I care about is, when AMD says Zambezi will be quad- and octo-core, are they talking about four BD cores capable of 8 threads, or four Bulldozer mini-cores capable of four threads?
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,732
432
126
Yes this is the source of my current confusion...is this a dual-core or a quad-core Bulldozer CPU:
*
*
and likewise is this a quadcore or is this an octo-core Bulldozer:
*
*
I couldn't care less what AMD is calling these junior cores to avoid using the term hyperthreading... what I care about is, when AMD says Zambezi will be quad- and octo-core, are they talking about four BD cores capable of 8 threads, or four Bulldozer mini-cores capable of four threads?

While I'm not 100% sure, I would call that a dual and a quad.

I'm not sure what miracles that architecture will pull, but will 2C/4T and 4C/8T have a hope in hell of being faster than whatever Intel has by then? Because Sandy Bridge will go up to 8C/16T.

Actually, will 2C/4T and 4C/8T be able to beat 6 or 12 Phenom II cores in heavily threaded scenarios? Even considering clock speed and architecture advantages? And by how much, to make any difference? Or will AMD satisfy itself with fighting the current Nehalem?

Even without doing any math, that seems to me quite a BIG leap!
 

Cookie Monster

Diamond Member
May 7, 2005
5,161
32
86
There's no way they'll be able to pair an end-2010 GPU with Llano, unless Llano is releasing sometime like, say, September. From what I heard, the GPU performance will be at Radeon 4700-ish levels and it will feature a 5x00-derivative core.

The GPU performance looks quite impressive, although it's not going to be up to the HD47xx level. It looks like it has either 4~6 SIMD cores from the die shot. I'm guessing its performance would be more in line with the HD43xx~45xx series.

But when compared to everything else in the low end, this might pose a big problem for the other competitors.

I mean wow! maybe this will bring back PC gaming :p
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,732
432
126
Well, I asked in the article comments and Anand kindly replied.

How does AMD count the cores? by GaiaHunter, an hour ago
There has been some discussion about this on the forums.

When AMD says 4 Bulldozer cores, does it mean 4 cores/8 threads, or does it mean a pair of Bulldozer cores, each with its "2 tightly linked together cores", able to do 4 threads?

Thank you.


RE: How does AMD count the cores? by Anand Lal Shimpi, 45 minutes ago
I believe it means 4 cores/8 threads, but I will ask AMD to confirm.

Take care,
Anand

RE: How does AMD count the cores? by Anand Lal Shimpi, 6 minutes ago
Confirmed. 4 cores/8 threads, each Bulldozer core can handle two threads.

Take care,
Anand

So it is 4 cores/8 threads and 8 cores/16 threads.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
While I'm not 100% sure, I would call that a dual and a quad.

I'm not sure what miracles that architecture will pull, but will 2C/4T and 4C/8T have a hope in hell of being faster than whatever Intel has by then? Because Sandy Bridge will go up to 8C/16T.

Actually, will 2C/4T and 4C/8T be able to beat 6 or 12 Phenom II cores in heavily threaded scenarios? Even considering clock speed and architecture advantages? And by how much, to make any difference? Or will AMD satisfy itself with fighting the current Nehalem?

Even without doing any math, that seems to me quite a BIG leap!

You know what, what didn't make sense to me was how in the hell they would achieve the 16 cores they hope to reach on the server versions if each "cluster" is indeed a core. Even assuming just twice as many cores and caches, it would have ballooned to over 600mm2. But I forgot one statement I heard, and your logic reminded me of it. Thanks.

What I heard was that the initial Bulldozer "16 core" versions are actually two 8-core dies on an MCM. The conflict with the 16-core figure was what initially made me think each mini-core would count as a "core". Maybe you are right. A cluster is a core.

Cookie Monster: Are you sure about the performance, or is it a guess? Well, the "source" I heard it from didn't seem too reliable, but it seemed to have merit. Clarkdale's IGP is supposed to perform at least on par with the 785G if not better, and the CPU assisting the GPU could be one of the big contributors to that. Maybe in reality it's up to HD4500 level. We'll see.

Update: After reading Anand's reply, another thing occurred to me. If they needed to claim they have multi-threading of some sort, the mini-core = core approach wouldn't have worked. Man, I can't think today. Good to know.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Gaia, I think that regarding the on-die IGP it's important to read the whole paragraph.

"there’s little sense in duplicating FP hardware on the Bulldozer core when it will eventually have a fully capable GPU sitting on the same piece of silicon (not in the first generation unfortunately as Llano doesn't use Bulldozer cores). Presumably the Bulldozer cores and its eventual GPU pair will share the L3 cache. It’s really a very elegant design and the basis for what AMD, Intel and NVIDIA have been talking about for years now. The CPU will do what it does best while the GPU does what it is good at."

The words "eventually", "basis", and "presumably" would possibly mean that the first incarnation will be low-end graphics rather than a CPU-assisting GPU.

Update: Here, take a look at the second pic: http://www.semiaccurate.com/forums/showthread.php?t=1076

That's the full die shot. It looks like the larger pic has the GPU cropped out to show the CPU die. According to Xbitlabs, the GPU features 480 SPs.

Considering the CPU is L3-less and Phenom II-based, the 1 billion transistor count would mean the majority of the transistors are in the GPU part (a Phenom II without L3 has 300 million transistors). And it seems GPUs are much denser than CPU cores, per transistor! Meaning GPU cores are simpler than CPU logic and more in line with SRAM.
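That transistor-budget reasoning is simple arithmetic; a quick sketch using the (unconfirmed) figures quoted in this thread:

```python
# Rough Llano budget, using the numbers cited above - take with salt:
total_transistors = 1_000_000_000  # reported for the whole Llano die
phenom_ii_no_l3   =   300_000_000  # Phenom II cores without L3
gpu_and_uncore    = total_transistors - phenom_ii_no_l3

print(f"~{gpu_and_uncore / total_transistors:.0%} of the transistor "
      f"budget is left for GPU/uncore")  # ~70%
```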
 

redpriest_

Senior member
Oct 30, 1999
223
0
0
OMG, Bulldozer is horribly misbalanced.

I didn't think AMD could shock me any more, but they did.

They had better wish for two things to happen in the next two years:

1) General purpose FP loads are moved to GPU's.

2) Applications become much, much more threaded.

If that doesn't happen, then AMD will be in big, big trouble.

Can I ask where you are getting this idea from? Keep in mind this is a vastly simplified overview of the microarchitecture, not a full disclosure; I imagine that will come sometime later. Let me add that just because it's not there doesn't mean it doesn't exist. I think you will be pleasantly surprised when it's released.
 

WhoBeDaPlaya

Diamond Member
Sep 15, 2000
7,415
404
126
Wait... so now we are finding out that AMD's "the future is fusion" marketing strategy is really more like "back to the future", with CPUs being integer processors and the FPU being shoveled into math coprocessors?
Picked the wrong example, good sir. AMD's 386DX40 ruled all ;)
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Ha ha, touché!

TBH I really look forward to Fusion; I don't care where the computations take place within my computer so long as it is cheap and amenable to overclocking so I can satiate my hobby desires.

If AMD wants to implement an ISA that includes FPU processing in an APU, then I am all for it, provided I can get compiler support to port my code. If it turns out to be another 3DNow! adventure then it's not going to be so great, but only time will tell. NVIDIA and CUDA are definitely making advances here well before AMD gets its feet on the ground, which only makes AMD's job all the more challenging.
 

Cogman

Lifer
Sep 19, 2000
10,286
147
106
This is definitely an interesting move. And all in the name of faster multi-threaded processing.

Given a single fetch stage, isn't that going to count as one processor to the OS? I mean, fetch can't (AFAIK) load different instructions from different memory locations... Or can it?

The focus on integer math as opposed to floating point math doesn't surprise me one bit. Most applications make VERY light use of floating point math (i.e., practically non-existent). So why would you spend the time to make it just as fast as your integer processing? Couple that with the fact that floating point math takes more transistors, and you have your reason for AMD's heavy focus on integer processing.

What I'll be interested to know is if different CPU cores are going to "share" the workload, i.e., "Hey, I'm out of integer units, let me use some of yours." That would truly be an interesting design feature. I doubt that is the case, as AMD said it was trying to keep single-threaded apps from suffering, but still, it would be interesting.
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
Given a single fetch stage, isn't that going to count as one processor to the OS?
Whatever the theory, it's already been said that 1 physical BD core will appear as two cores to the OS, similar to the effect of Intel's hyperthreading.
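In practice that just means the scheduler enumerates logical processors, not physical modules. A small sketch (the helper function is hypothetical, and the counts assume the 2-threads-per-module convention discussed in this thread):

```python
import os

# The OS scheduler sees logical processors, whatever the hardware calls
# them. On whatever machine this runs, report what the OS enumerates:
print(f"This OS sees {os.cpu_count()} logical processors")

def os_visible_cpus(modules, threads_per_module=2):
    """Hypothetical helper: logical CPUs a Bulldozer-style chip would
    expose, with each module appearing as two cores to the OS."""
    return modules * threads_per_module

print(os_visible_cpus(4))  # a 4-module chip -> 8 logical CPUs
print(os_visible_cpus(8))  # an 8-module chip -> 16 logical CPUs
```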
 

Cogman

Lifer
Sep 19, 2000
10,286
147
106
BTW, if AMD plays its cards right, Bobcat looks well poised to clean the clock of Intel's Atom. An out-of-order processor is going to run circles around Intel's in-order Atom.

Hopefully AMD has the brains not to saddle Bobcat with a northbridge that consumes more power than the chip itself (boneheaded move, Intel).
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
This is definitely an interesting move. And all in the name of faster multi-threaded processing.

Given a single fetch stage, isn't that going to count as one processor to the OS? I mean, fetch can't (AFAIK) load different instructions from different memory locations... Or can it?

The focus on integer math as opposed to floating point math doesn't surprise me one bit. Most applications make VERY light use of floating point math (i.e., practically non-existent). So why would you spend the time to make it just as fast as your integer processing? Couple that with the fact that floating point math takes more transistors, and you have your reason for AMD's heavy focus on integer processing.

What I'll be interested to know is if different CPU cores are going to "share" the workload, i.e., "Hey, I'm out of integer units, let me use some of yours." That would truly be an interesting design feature. I doubt that is the case, as AMD said it was trying to keep single-threaded apps from suffering, but still, it would be interesting.

JH has said not to read too much into the implied architectural limitations one might divine from sifting the tea leaves of the PowerPoint slideware and diagrams (I am paraphrasing, of course), and I completely agree with him.

It might appear to you and me that the fetch unit is simplistic and potentially limiting, but we would be doing ourselves a disservice to assume the power-point slide fully represents the attributes and capabilities of the fetch unit. (or any other architectural unit for that matter)

We take these diagrams to mean "at a minimum this much can be assumed to be true/guaranteed". With that initial condition, plus a few boundary conditions, we can set about parsing thru the ODE's that underlie our organic-based speculation processing units.
 

nonameo

Diamond Member
Mar 13, 2006
5,902
2
76
BTW, if AMD plays its cards right, Bobcat looks well poised to clean the clock of Intel's Atom. An out-of-order processor is going to run circles around Intel's in-order Atom.

Hopefully AMD has the brains not to saddle Bobcat with a northbridge that consumes more power than the chip itself (boneheaded move, Intel).

I don't know, I don't think it was boneheaded... it got rid of old chipsets, and they really had no competition, so who was going to offer anything better?
 

the kernel

Junior Member
Jul 1, 2008
19
0
0
Let me know when Visual Studio or the Intel or the Micro-Focus compilers support OpenCL. Then add a couple of years for OS and applications support.

To put it another way, both Nvidia and ATI have had SDK's out there for their own implementations of GPGPU for a couple of years now, and I will not run out of fingers counting supporting applications.

Oh, and I think that I read that the SDKs are out of date anyway, because the OpenCL spec just revved; I think Apple completed it or something.

Indeed, there are two problems with this that I see:

1) Current GPU compute resources are only useful for certain types of very select workloads; for most applications they are not going to be useful.

2) Software development optimizes for Intel, not for AMD, since AMD only has 13% of the market.

I think it's clear what the intention behind Bulldozer is...this is another design aimed squarely at the enterprise market. HPC workloads are much more compatible with this sort of optimization and have developers willing to tune for a specific architecture so this should work out splendidly there. However, I don't think the integrated GPU approach is going to mean much for the consumer market except as a cost cutting measure.

One could argue that since Intel is going the same route with Larrabee and integration that this would also benefit AMD but I'm not convinced of that at all. Larrabee is a much more general purpose architecture and it doesn't follow that just because both use OpenCL that code will optimize for both equally--these are radically different approaches.
 

ilkhan

Golden Member
Jul 21, 2006
1,117
1
0
The core count thing can be simplified as "are they counting cores based on FP or int units?" I'm guessing FP units, as each core showing up as 2 in Task Manager makes more sense that way. So 4/8 would be 4 cores, 8 threads. It's a high-end chip though, so they may do 8 cores (16 threads), but I doubt it.

Llano is the 32nm Phenom II shrink + on-die graphics. PII-based, not Bulldozer-based. The timeframe and specs look to put it in competition with Sandy Bridge, which is also 4 cores with an on-die GPU. Bottom right box in their roadmap: (OMG, YAY for inline images!)
desktoproadmap.jpg
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
Wait... so now we are finding out that AMD's "the future is fusion" marketing strategy is really more like "back to the future", with CPUs being integer processors and the FPU being shoveled into math coprocessors?

Floating point calculations are still being performed on the CPU; there is just more parallel integer computing power available, which from what I know about multithreading so far is about right for the average workload. I don't see why this is a big issue, as it is easier to break up integer-intensive code than floating-point-intensive code for multithreading. It isn't like the FPU is slower than the rest of the chip; there are just fewer of them. (And fewer of them are likely to be used as well, so this seems like a natural evolution, to me at least.)

Chances are I am completely wrong about this, but I am sure you will let me know if I am.
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,732
432
126
The core count thing can be simplified as "are they counting cores based on FP or Int units?". Im guessing FP units, as each core showing as 2 in task manager makes more sense that way. So 4/8 would be 4 cores, 8 threads. Its a high end chip though, so they may do 8 cores (16 threads), but I doubt it.
This is really confusing in terms of definitions, but the building blocks are "Bulldozer Modules", each able to do 2 threads thanks to their "mini cores".

You can read that in the Server Platforms PDF from their Analyst Day found in http://phx.corporate-ir.net/phoenix.zhtml?c=74093&p=irol-analystday .

The definition we currently give to core is filled by the "Bulldozer Module".

When they say 4/8 cores, they mean these modules; in the Client Platform PDF, the "Scorpius Platform" is described as having 4-8 (I read this as 4 to 8) 32nm Bulldozer cores.

Now, it gets confusing, because AMD describes their way of doing "hyperthreading" as 1 "mini-core" (as IDC said, if anyone knows a better word to describe it, please be my guest) per thread.

As AMD told Anand, when they say 1 Bulldozer Core, they are referring to the unit capable of doing 2 threads, not the individual "mini-core".

From all this, I infer that AMD will release a CPU with 8 of those Bulldozer cores or modules per die for the desktop, and up to 16 of them for servers.

What I still don't understand is how AMD will build their dies - are these quad-cores, with an octo-core being 2 quads MCM'ed? Or can AMD just add these modules individually, as in "want 12 cores, just put in 12 modules; want 16, put in 16", instead of having to MCM 3 or 4 quads?

"Valencia" is 6 cores and 8 cores. Does that mean 2 MCM'ed quads with 2 disabled cores?
 

ilkhan

Golden Member
Jul 21, 2006
1,117
1
0
This is really confusing in terms of definitions, but the building blocks are "Bulldozer Module", each able to do 2 threads, thanks to their "mini cores"
...
When they say 4/8 cores (and in the Client Platform PDF, the "Scorpius Platform" is described as being 4-8 (I read this as 4 to 8) 32nm Bulldozer cores.
...
As AMD told Anand, when they say 1 Bulldozer Core, they are referring to the unit capable of doing 2 threads, not the individual "mini-core".
I'm reading it as this:
captureajc.jpg
is a Bulldozer "module" / Bulldozer core, each of which has 2 int groups and an FP group and shows up as 2 cores in Task Manager. L2 is "shared" within the core; L3 is shared within the die.

(My definition was to count the FP groups to count cores, as there's only one per Bulldozer "module".)

As has been mentioned above, I don't think we'll see any BD CPUs without an on-die GPU. It may be used solely for FP, but it'll be present.
 