AMD sheds light on Bulldozer, Bobcat, desktop, laptop plans


Cogman

Lifer
Sep 19, 2000
10,277
125
106
JH has said not to read too much into the implied architectural limitations one might divine from sifting the tea leaves of the power-point slideware and diagrams (I am paraphrasing of course) and I completely agree with him.

It might appear to you and me that the fetch unit is simplistic and potentially limiting, but we would be doing ourselves a disservice to assume the power-point slide fully represents the attributes and capabilities of the fetch unit. (or any other architectural unit for that matter)

We take these diagrams to mean "at a minimum this much can be assumed to be true/guaranteed". With that initial condition, plus a few boundary conditions, we can set about parsing through the ODEs that underlie our organic-based speculation processing units.
Point taken.

I guess the one thing that we can tell from the slides and comments is that Bulldozer is going to be a significant change from the Phenom architecture. Time will tell whether it will be a good change.
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,628
158
106
I'm reading it as this:
captureajc.jpg
is a bulldozer "module" / bulldozer core. Each of which has 2 int groups and a FP group and shows up as 2 cores in task manager. L2 is "shared" within the core, L3 is shared within the die.

(my definition was to count the FP groups to count cores, as there's only one per bulldozer "module")

As has been mentioned above, I don't think we'll see any BD CPUs without an on die GPU. It may be used solely for FP, but it'll be present.

So far we agree.

What I don't agree with is when you say 4/8 on that image means 4C/8T. I believe they are saying Zambezi will be released in 4- and 8-core configurations, or in other words 4C/8T and 8C/16T.
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,628
158
106
It looks like based on this post on the semi-accurate forums that 1 "bulldozer module" is two "cores", since John Fruehe says Interlagos (the successor to magny-cours) will have 8 modules/16 cores.

http://www.semiaccurate.com/forums/showthread.php?p=12167#post12167

Well, this made me register there to ask whether Zambezi will max out at 4 modules or 8.

Frigging modules this, cores that and threads even.

EDIT: http://www.semiaccurate.com/forums/showpost.php?p=12362&postcount=308

Meh, he doesn't know. :(
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
So far we agree.

What I don't agree with is when you say 4/8 on that image means 4C/8T. I believe they are saying Zambezi will be released in 4- and 8-core configurations, or in other words 4C/8T and 8C/16T.

The marketing slide referencing Zambezi quite clearly states it is 4/8 CPU Cores...no mention of core versus thread there whatsoever.

So I agree, the question on all our minds is whether these 4-core and 8-core Zambezis are going to be 4C/8T or 4C/4T chips (and likewise 8C/16T or 8C/8T). If we knew that answer then we'd know whether or not AMD is counting mini-cores as full-fledged cores.

One thing I think that is very obvious between Anand's article and JH's comments on Semiaccurate is that AMD doesn't have a clear marketing handle on how they want to characterize these new Bulldozer "modules".

Let's all pray to our respective god(s) that someone with authority within AMD elects to make it a priority to officially clarify the confusion that they have created.
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
I believe Anand cleared that up in a comment (by asking AMD) that a 4 core Zambezi means 4 core / 8 threads while an 8-core Zambezi means 8 core / 16 threads. In that conversation in the comments, GaiaHunter asked if a 4-core BD means 4 Core / 8 threads OR 2 Core / 4 thread (meaning 4 "mini-cores"). Anand clarified, after talking to AMD according to him, that a 4 core Bulldozer means 4 cores / 8 threads.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Thanks, I must have missed it or forgot it already.

And Interlagos is MCM'ed Bulldozer dies, yes? So a 16-core Interlagos will be two 8C/16T Bulldozer dies MCM'ed into the same package, much as Magny-Cours is two Istanbul dies MCM'ed into the same package? If true, then it seems reasonable to me.

This will be quite interesting, Sandy vs. Bulldozer.
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,628
158
106
Sincerely, IDC and jvroig, I'm not sure if they know and/or if their marketing and engineering departments are on the same page.

Until I see something from a reputable source stating "Zambezi will do x threads", I'm done guessing.

Dresdenboy, for example, http://citavia.blog.de/2009/11/ , says a Zambezi 8 core will have 4 of these blocks.
The whole thing is a CMT capable processor as speculated before. And if we look at core counts of Bulldozer based MPUs we should remember, that 2 such cores are accompanied by 1 FPU and an 8 core Zambezi actually contains 4 of these blocks shown on the Bulldozer slide.

I'm starting to think I've been reading this all wrong.

I thought AMD was trying to catch Intel by adding their own type of hyperthreading, throwing more hardware at it and maybe squeezing more performance from it to balance the extra space spent. I was assuming this because, and I think we can all agree, AMD's processes are generally inferior to Intel's.

Now, I'm starting to think AMD is actually trying to use less hardware. I guess they're trying to replicate the ATI vs. NVIDIA situation with RV770 vs. GT200.

In a simplified way, they seem to be taking 2 of their cores, shaving off what they see as useless, and ending up with a smaller twin core that should not be inferior (or at least not inferior in any way of much relevance, maybe countered by the GPU present on the same die?) to the starting dual-core.

Not sure how it will work, especially because Intel has/will have the process advantage (?) in this situation, but it seems quite an aggressive move that might be rewarded with higher profit margins if it works as planned by AMD: selling a bigger die for a smaller price seems a lot worse than selling a smaller core for a smaller price.

Let's hope it works out for AMD and for our consumer pockets.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
In a simplified way, they seem to be taking 2 of their cores, shaving off what they see as useless, and ending up with a smaller twin core that should not be inferior (or at least not inferior in any way of much relevance, maybe countered by the GPU present on the same die?) to the starting dual-core.

GaiaHunter the possibility of what you just stated had entirely eluded me until I read your post. And I think quite possibly, quite possibly indeed, you just made baby jesus cry if what you posted is true.

So instead of getting potentially silly stupid thread processing power with an 8C/16T zambezi we are going to get zamboozled into buying mini-core counts masquerading as full-fledged cores. Not only will 8C != 16 threads, but those 8C cores will equal 8 threads && be sharing resources (fetch, etc) to boot.

If this is true then man what a let down.
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,628
158
106
If this is true then man what a let down.

That was pretty much my reaction too - let's just hope they know what they are doing and that all the other optimizations and changes will make up for it.

On better news, they seem to have their own Turbo Boost. (Feeling better now? :p )
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Ouch :( I hope somebody can confirm or refute this fast, preferably officially. Don't we have an AMD employee in the forum somewhere?

There are plenty around...but they aren't authorized to speak on behalf of AMD in any official capacity so don't hold your breath waiting for that confirmation/refutation statement.

The closest we have to official spokespeople who interact with the forums is Francois Piednoel for Intel (he frequents both the XS forum and the aceshardware forum) and John Fruehe for AMD (he frequents the AMDzone forum and the semiaccurate forum).

So...sadly, until we hear otherwise, it looks like this is what we have for our 4C and 8C Zambezis in 2011:

quad-core Bulldozer CPU:
2BulldozerCores.jpg


octo-core Bulldozer CPU:
4BulldozerCores.jpg
 

ilkhan

Golden Member
Jul 21, 2006
1,117
1
0
GaiaHunter the possibility of what you just stated had entirely eluded me until I read your post. And I think quite possibly, quite possibly indeed, you just made baby jesus cry if what you posted is true.

So instead of getting potentially silly stupid thread processing power with an 8C/16T zambezi we are going to get zamboozled into buying mini-core counts masquerading as full-fledged cores. Not only will 8C != 16 threads, but those 8C cores will equal 8 threads && be sharing resources (fetch, etc) to boot.

If this is true then man what a let down.
I don't think that AMD is that dumb. As I said above, I think that they are counting the 2int/1fp unit as a core, meaning a 4c Zambezi chip would have 8 int units and 4 fp, with extra fp resources shared from the on-die GPU. That 4-core Zambezi chip would have 8 threads, the same 4/8 arrangement as Bloomfield/Lynnfield/Sandy/etc.

BD vs Sandy is going to be VERY interesting. Actually, BD won't be out until 2011. If it's late enough in 2011, the main opposition will be the tail end of Sandy and the start of the Ivy generation.

I'll quote from the 2011 BD/BC article to reiterate.
"A single Bulldozer core will appear to the OS as two cores, just like a Hyper Threaded Core i7. The difference is that AMD is duplicating more hardware in enabling per-core multithreading. The integer resources are all doubled, including the schedulers and d-caches. It’s only the FP resources that are shared between the threads. The benefit is you get much better multithreaded integer performance, the downside is a larger core."
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
I would have to agree that a quad-core bulldozer core means 8 int schedulers and 4 FP schedulers, so 4 cores, 8 threads.

I reviewed Anand's articles about Bulldozer (here and here), and he consistently referred to 1 Bulldozer core to mean exactly 2 int schedulers sharing 1 fp scheduler. In fact, he claimed that was AMD's approach to hyperthreading, throwing more hardware at the problem. I have no idea why Matthias/Dresdenboy says otherwise, but that doesn't make sense from the perspective that AMD is trying to create its own "HT" equivalent. Even with architectural optimizations that will undoubtedly be present, going the "Matthias" way instead of the "Anand" way would mean still being at a distinct disadvantage in heavily threaded situations. Given that the i7 already has HT down to a tee, it just doesn't make sense to me that a chip due almost 2 years in the future won't even have anything remotely similar, and would instead go the opposite route: a little crippled by even sharing fetch resources.

Does it make sense to anybody? Because it certainly doesn't to me. Or perhaps the problem is there is still too little info about it, but from Anand's article, it was quite clear that duplicating int schedulers was the way AMD plans to implement its own "HT" equivalent.
 
May 11, 2008
19,547
1,192
126
Yahoo, I was hoping AMD would make an Atom competitor. It's named Bobcat, and it might actually be an improved Athlon Barton-style design on a modern process. I do wonder whether they are going for an onboard memory controller or a chipset-based memory controller.
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
You know, that might be a good plan for AMD. Maybe they don't have to create the fastest processor on the planet, but maybe they can beat Intel in making the best low-power processor for netbooks and smartphones. Those are growing markets, and making the ideal processor for those markets can mean a lot of dollars that so far AMD isn't getting a share of.
 

evolucion8

Platinum Member
Jun 17, 2005
2,867
3
81
I would have to agree that a quad-core bulldozer core means 8 int schedulers and 4 FP schedulers, so 4 cores, 8 threads.

I reviewed Anand's articles about Bulldozer (here and here), and he consistently referred to 1 Bulldozer core to mean exactly 2 int schedulers sharing 1 fp scheduler. In fact, he claimed that was AMD's approach to hyperthreading, throwing more hardware at the problem. I have no idea why Matthias/Dresdenboy says otherwise, but that doesn't make sense from the perspective that AMD is trying to create its own "HT" equivalent. Even with architectural optimizations that will undoubtedly be present, going the "Matthias" way instead of the "Anand" way would mean still being at a distinct disadvantage in heavily threaded situations. Given that the i7 already has HT down to a tee, it just doesn't make sense to me that a chip due almost 2 years in the future won't even have anything remotely similar, and would instead go the opposite route: a little crippled by even sharing fetch resources.

Does it make sense to anybody? Because it certainly doesn't to me. Or perhaps the problem is there is still too little info about it, but from Anand's article, it was quite clear that duplicating int schedulers was the way AMD plans to implement its own "HT" equivalent.

Well, I'm not an expert on this. Sharing fetch resources is a bit out of the ordinary, but if they have enough fetch resources that it won't stall, it shouldn't be a problem; they're probably doing that to save some die space. Hyper-Threading duplicates registers and schedulers and shares execution resources. Currently most calculations are int-based rather than FP-based, so since AMD will fuse the GPU, which has a lot of FP units, with the CPU, it makes more sense to duplicate int units than FP units. Hyper-Threading showed gains of up to 50% in the best-case scenario; if AMD's approach is done properly, it should be even better. (Hopefully; we need competition!)
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
Well, I'm not an expert on this. Sharing fetch resources is a bit out of the ordinary, but if they have enough fetch resources that it won't stall, it shouldn't be a problem; they're probably doing that to save some die space. Hyper-Threading duplicates registers and schedulers and shares execution resources. Currently most calculations are int-based rather than FP-based, so since AMD will fuse the GPU, which has a lot of FP units, with the CPU, it makes more sense to duplicate int units than FP units. Hyper-Threading showed gains of up to 50% in the best-case scenario; if AMD's approach is done properly, it should be even better. (Hopefully; we need competition!)

From what you say, what I understand is you are thinking of it in terms of "1 Bulldozer core = (2 Int units + 1 FP unit)". Am I correct? Because other than that, there doesn't seem to be any other "AMD approach" to hyperthreading-like functionality.
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,628
158
106
I don't think that AMD is that dumb. As I said above, I think that they are counting the 2int/1fp unit as a core, meaning a 4c Zambezi chip would have 8 int units and 4 fp, with extra fp resources shared from the on-die GPU. That 4-core Zambezi chip would have 8 threads, the same 4/8 arrangement as Bloomfield/Lynnfield/Sandy/etc.

I started with that impression as well, following logic similar to yours. I asked Anand, and he said the same.

But John Fruehe "JF-AMD" on the semi-accurate forums states that their "Interlagos" will be 16 cores = 16 threads and keeps insisting that AMD has no HT, since HT is only a 30%-40% improvement while a core is a 100% improvement.

At this point however, I'm not sure, with all this conflicting information.
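
To make the two readings concrete, here is a minimal sketch in Python. The `Chip` tuple and the function names are made up for illustration and are not anything AMD has published. The only inputs taken from this thread are that a module has 2 integer groups and 1 shared FP group, that the reading attributed to Anand counts a module as one core exposing two threads, and that the reading attributed to JF counts each integer group as a core exposing one thread.

```python
from collections import namedtuple

# Hypothetical container for illustration only.
Chip = namedtuple("Chip", "modules int_groups fp_groups threads")

INT_GROUPS_PER_MODULE = 2  # per the Bulldozer slide discussed in the thread
FP_GROUPS_PER_MODULE = 1   # shared between the two integer groups


def anand_reading(marketed_cores):
    """One marketed 'core' = one module (2 int + 1 FP), exposing 2 threads."""
    modules = marketed_cores
    return Chip(modules,
                modules * INT_GROUPS_PER_MODULE,
                modules * FP_GROUPS_PER_MODULE,
                modules * 2)


def jf_reading(marketed_cores):
    """One marketed 'core' = one integer group, exposing 1 thread."""
    if marketed_cores % INT_GROUPS_PER_MODULE:
        raise ValueError("core count must be a multiple of 2 under this reading")
    modules = marketed_cores // INT_GROUPS_PER_MODULE
    return Chip(modules,
                marketed_cores,
                modules * FP_GROUPS_PER_MODULE,
                marketed_cores)


for cores in (4, 8, 16):
    print(cores, "cores:", anand_reading(cores), "vs", jf_reading(cores))
```

Under these assumptions an "8-core" chip in the JF reading is the same hardware as a "4-core" chip in the Anand reading (4 modules, 8 integer groups, 4 FP groups, 8 threads); only the label changes. A "16-core" part in the JF reading comes out at 8 modules, which lines up with the "8 modules/16 cores" figure quoted for Interlagos.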
 

ilkhan

Golden Member
Jul 21, 2006
1,117
1
0
I started with that impression as well, following logic similar to yours. I asked Anand, and he said the same.

But John Fruehe "JF-AMD" on the semi-accurate forums states that their "Interlagos" will be 16 cores = 16 threads and keeps insisting that AMD has no HT, since HT is only a 30%-40% improvement while a core is a 100% improvement.

At this point however, I'm not sure, with all this conflicting information.
I think "JF-AMD" is counting int groups as cores (instead of threads). AMD won't be seeing 100% improvement, but they should get more bang per thread than Intel gets (since they're throwing more execution width at the problem, and Intel is just duplicating/sharing per-execution stuff). If that makes sense, anyway. 2 mini-cores (int group and shared FP group) per core, 2 threads per core.

Interlagos will be two 8-core chips MCMed together, just like Magny-Cours will be two 6-cores stuck together. Or there may be no actual 8-core (16-thread) chips, and they are going to stick two 4-core / 8-thread chips together. Being with the company doesn't mean they have accurate information.

Don't forget AMD will have Llano for the low/mid-range market as well: 32nm and with an on-die GPU, but with a K10.5 core design instead of BD. BD will trickle down, but it'll take a while.
 

Soleron

Senior member
May 10, 2009
337
0
71
I think "JF-AMD" is counting int groups as cores (instead of threads). AMD won't be seeing 100% improvement, but they should get more bang per thread than Intel gets (since they're throwing more execution width at the problem, and Intel is just duplicating/sharing per-execution stuff). If that makes sense, anyway. 2 mini-cores (int group and shared FP group) per core, 2 threads per core.

No, JF is counting two int units in a module as two cores. He said 80% improvement over one core, but the same die size increase in relative terms as HT adds. This is looking like a major improvement in perf/mm^2 if nothing else.

He also said the single FP will have better performance than two Shanghai FP units.

EDIT: Actually I have some evidence directly from the presentation, thanks to abinstein on amdzone.

http://phx.corporate-ir.net/phoenix.zhtml?p=irol-eventDetails&c=74093&eventID=2457769

- Bulldozer will have better single-thread performance than today's processors (slide 4, 9:50 to 10:50)
- Each Bulldozer module is an optimized dual core (slide 9, 45:50)
- Each Bulldozer "core" is capable of 2 loads/cycle; each is a 4-way out-of-order machine (46:30)
- A Bulldozer module is no bigger in area than Intel's hyperthreading design (47:40)
- A Bulldozer module can achieve ~80% speedup when running 2 threads, versus ~25% from hyperthreading (48:00)
- Multiple Bulldozer modules can share the L2 cache, and multiple of those (modules? L2s?) can share the L3 and NB (48:20)
- Each INT scheduler can issue 4 inst./cycle; the FP scheduler can issue 4 inst./cycle (48:50 to 49:50)
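
A rough back-of-the-envelope sketch of what the two speedup figures would mean, in Python. The ~80% and ~25% numbers are the ones quoted from the presentation in the list above. The 5% area overhead is only a placeholder assumption (the presentation is quoted as saying the module is no bigger than a hyperthreading design, without giving a figure), and the function names are made up for illustration.

```python
def two_thread_throughput(speedup):
    """Throughput of two threads, relative to one thread running alone (1.0)."""
    return 1.0 + speedup


def perf_per_area(throughput, area_overhead):
    """Throughput divided by relative area, with the one-thread design at 1.0."""
    return throughput / (1.0 + area_overhead)


BULLDOZER_SPEEDUP = 0.80  # ~80%, as quoted at 48:00
HT_SPEEDUP = 0.25         # ~25%, as quoted at 48:00
ASSUMED_OVERHEAD = 0.05   # assumption: same relative area cost for both designs

bd = two_thread_throughput(BULLDOZER_SPEEDUP)
ht = two_thread_throughput(HT_SPEEDUP)

print(f"two-thread throughput: module {bd:.2f}x, HT core {ht:.2f}x")
print(f"module vs HT at equal area: {bd / ht:.2f}x")
print(f"perf/area: module {perf_per_area(bd, ASSUMED_OVERHEAD):.2f}, "
      f"HT {perf_per_area(ht, ASSUMED_OVERHEAD):.2f}")
```

If the area cost really is the same for both, the overhead cancels out and the module's advantage on two threads is 1.80 / 1.25 = 1.44x whatever the actual overhead is. That only holds if both quoted speedups are measured against comparable single-thread baselines, which the presentation notes here don't establish.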
 

piesquared

Golden Member
Oct 16, 2006
1,651
473
136
I'm continuously amused by this misconception, which has started from somewhere, that AMD's solution was to "throw more hardware at it". Always good for a chuckle. Are people supposed to believe that hyperthreading is just a software trick and required no additional die space? So it appears as though intel's approach may be the one throwing more die space at it.
 

Soleron

Senior member
May 10, 2009
337
0
71
So it appears as though intel's approach may be the one throwing more die space at it.

Well, the same die space. Just getting less out of it, and sometimes reducing the performance instead of increasing it (rare, yes, but not what you need on critical systems where a consistent level of performance is desired).

There must be some engineering reason why; Intel's engineers can see the same facts as AMD. Maybe Intel's architecture wouldn't benefit from the new setup? Or they plan some major revision to HT in the future to make it closer to AMD's increase?
 

Fox5

Diamond Member
Jan 31, 2005
5,957
7
81
I'm continuously amused by this misconception, which has started from somewhere, that AMD's solution was to "throw more hardware at it". Always good for a chuckle. Are people supposed to believe that hyperthreading is just a software trick and required no additional die space? So it appears as though intel's approach may be the one throwing more die space at it.

I don't exactly understand the difference.

Is Bulldozer basically creating a mixed-mode dual core/single core? As in, it can function as two simple cores (two-wide int execution each), or one 4-wide int execution single core, per core? So rather than trying to mix two threads into the same execution pipeline, it just splits the execution units in two, gives them their own pipeline, and calls it a day? If so, it seems like hyperthreading would be better with asymmetric loads, while Bulldozer will be better when the loads are close to equal, unless Bulldozer is able to determine whether it's more beneficial to run as a single versus dual core. If it's an always-on (or always-off) type of thing like hyperthreading, it seems like this feature would be best turned off on the desktop.
 

evolucion8

Platinum Member
Jun 17, 2005
2,867
3
81
From what you say, what I understand is you are thinking of it in terms of "1 Bulldozer core = (2 Int units + 1 FP unit)". Am I correct? Because other than that, there doesn't seem to be any other "AMD approach" to hyperthreading-like functionality.

HT takes less than 5% of the total transistor budget because it only duplicates register states and some schedulers, so its approach is to maximize usage of the execution engine. AMD's approach takes more die space and gives you more execution resources.

From what you say, what I understand is you are thinking of it in terms of "1 Bulldozer core = (2 Int units + 1 FP unit)". Am I correct? Because other than that, there doesn't seem to be any other "AMD approach" to hyperthreading-like functionality.

Yep, you are correct, and it's a totally different approach from Intel's, and it's the only one known for now.