Some Bulldozer and Bobcat articles have sprung up


Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Also, Quad channel DDR3 perhaps? :)

Meh, I kinda like that dual-channel AMD mobos don't cost an arm and a leg like the triple-channel Intel mobos...you want to pay for a quad-channel mobo and use it with consumer apps that won't benefit from anything that exceeds dual-channel bandwidth?

Was just about to drop the $ for a 6 core Thuban, but now I may wait until next summer and see what my $ will get me.

marketing guy's nightmare...over-sell the vapor/paper and you lose real sales on your product that is in the channel right now
 

Scali

Banned
Dec 3, 2004
2,495
1
0
I also enjoy alternative (not meant in a bad way) development of solutions. Look at each Bulldozer module. It could have been 1 integer unit and one FPU unit and AMD could have perhaps developed some sort of SMT or other type of hyper threading to make it work. Instead, they recognized that 80-90% of workloads are heavy integer so they added a second integer unit which, if the integer and its components are good, should provide better performance in threaded applications than a SMT or hyper threading scenario (2 Bulldozer modules with 4 integer cores vs. 2 core sandy bridge with 4 threads, if all else is equal, the 4 hardware integer units should have an advantage most of the time).

AMD currently has 3 ALUs per core (as does Intel).
Now they're going to 2 ALUs per core. So they didn't add an integer unit, they removed one (likewise, from 3 address generators to 2, so with AMD counting both as 'integer unit', they went from 6 units to 4 units per core).
So I think they're going for a tradeoff: less performance per core, but more cores per die.

At this point it's difficult to say whether Bulldozer or Sandy Bridge's HT will deliver more integer performance per die area.
The idea of HT is the same however... According to Intel, adding HT only increased the total transistor count by about 5%, but performance increase was around 10-30% on average.
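As a rough back-of-the-envelope check on those figures, here is a small sketch. It takes the quoted ~5% transistor cost and 10-30% speedup at face value, and it treats transistor count as a stand-in for die area, which is a simplifying assumption rather than anything Intel stated. The function name is made up for illustration.

```python
# Back-of-the-envelope: performance per transistor with HT enabled,
# relative to the same core without HT (baseline = 1.0).
# The 5% and 10-30% figures are the ones quoted above; equating
# transistor count with die area is a simplifying assumption.

def perf_per_transistor(speedup: float, extra_transistors: float) -> float:
    """Relative performance per transistor versus a non-HT baseline."""
    return (1.0 + speedup) / (1.0 + extra_transistors)

low = perf_per_transistor(0.10, 0.05)   # pessimistic end of the quoted range
high = perf_per_transistor(0.30, 0.05)  # optimistic end of the quoted range
print(f"{low:.3f} to {high:.3f}")
```

Under those assumptions HT comes out at roughly 1.05x to 1.24x performance per transistor, so it pays for itself even at the low end of the range.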

At least the question of "How will they feed 4 integer units with one thread?" is now solved: Only two of them are ALUs. So they only feed 2 ALUs, which makes more sense.
 

JFAMD

Senior member
May 16, 2009
565
0
0
After hot chips tonight I can answer this better. Let's just say that people counting ALUs and making assumptions are not really getting it right. You're thinking old architecture not new architecture.
 

Ben90

Platinum Member
Jun 14, 2009
2,866
3
0
Meh, I kinda like that dual-channel AMD mobos don't cost an arm and a leg like the triple-channel Intel mobos...you want to pay for a quad-channel mobo and use it with consumer apps that won't benefit from anything that exceeds dual-channel bandwidth?
I was under the impression that the high cost of X58 mobos was due in part to the chipset itself costing $70. I would imagine low/mid consumer boards would still only have 4 slots.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
After hot chips tonight I can answer this better. Let's just say that people counting ALUs and making assumptions are not really getting it right. You're thinking old architecture not new architecture.

You may have a new architecture, but it's not going to be that different in IPC.
If you compare the P6 to the K7, you see that adding an extra ALU and AGU doesn't really make much of a difference in IPC (integer performance was pretty much tied, it was the pipelined FPU where the K7 excelled).
Likewise, removing it isn't going to impact much either. Good trade-off.
Intel probably kept the third ALU in there because they have HT, and you don't.
 

kalniel

Member
Aug 16, 2010
52
0
0
marketing guy's nightmare...over-sell the vapor/paper and you lose real sales on your product that is in the channel right now
Only if you're competing with yourself. If you cause a competitor to lose a sale while waiting for your next chip then it's good :p
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Only if you're competing with yourself. If you cause a competitor to lose a sale while waiting for your next chip then it's good :p

I was specifically quoting an individual who stated that was exactly what they were doing with their purchase decision...naturally my statement doesn't apply to purchasing decisions that I wasn't quoting or responding to.
 

blckgrffn

Diamond Member
May 1, 2003
9,687
4,348
136
www.teamjuchems.com
Hey, I am excited about Bobcat, folks! I think this is going to be the "fast enough" CPU for the masses where Atom just doesn't *quite* make it. If it is as fast as a 3800+ X2 or so at most tasks, that's going to be a huge win and should find itself in more places than netbooks, especially if there is a four-core implementation.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
First, I would like to thx JFAMD for the info

As for the ALUs and AGUs

Actually they now have 4 ALUs + 4 AGUs in one Module (Bulldozer), two of each per Integer Unit but they had 3 ALUs + 3 AGUs per Core in the K10 Architecture.
So if you take a 4 module Bulldozer and a 4 core Phenom II you have (4 ALUs + 4 AGUs) x 4 vs (3 ALUs + 3 AGUs) x 4 or x6 for Thuban. 16 vs 12(Deneb) vs 18 Thuban
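The arithmetic behind those totals can be sketched as below. The per-module and per-core ALU counts are the ones given in this post; the constant and function names are just for illustration, and the AGU totals come out the same because each design has as many AGUs as ALUs.

```python
# Total ALU counts per chip, using the per-module / per-core figures
# stated in the post above. AGU totals are identical in each case.
ALUS_PER_BULLDOZER_MODULE = 4  # 2 per integer unit, 2 integer units per module
ALUS_PER_K10_CORE = 3

def total_alus(units: int, alus_per_unit: int) -> int:
    """Multiply the number of modules (or cores) by the ALUs in each."""
    return units * alus_per_unit

counts = {
    "Bulldozer, 4 modules": total_alus(4, ALUS_PER_BULLDOZER_MODULE),
    "Deneb, 4 cores": total_alus(4, ALUS_PER_K10_CORE),
    "Thuban, 6 cores": total_alus(6, ALUS_PER_K10_CORE),
}
for chip, n in counts.items():
    print(f"{chip}: {n} ALUs")
```

Changing the first argument of `total_alus` makes it easy to try other module or core counts.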

I believe the difference is in the MacroOP and MicroOP and the integer pipeline, combined with better predictor + 4 way Decoder and OoO, the integer pipe will be more busy and well fed than K10 (Phenom).
 

Soleron

Senior member
May 10, 2009
337
0
71
First, I would like to thx JFAMD for the info

As for the ALUs and AGUs

Actually they now have 4 ALUs + 4 AGUs in one Module (Bulldozer), two of each per Integer Unit but they had 3 ALUs + 3 AGUs per Core in the K10 Architecture.

But each thread can only use two ALUs and two AGUs now. We know single-threaded performance is higher than Deneb though, probably from other stuff.
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
So if you take a 4 module Bulldozer and a 4 core Phenom II you have (4 ALUs + 4 AGUs) x 4 vs (3 ALUs + 3 AGUs) x 4 or x6 for Thuban. 16 vs 12(Deneb) vs 18 Thuban
A 4 module Bulldozer does not go against a 4 core Phenom II. Rather, it would be a 2 module Bulldozer (thus, quad-core vs quad-core), which would then make it 8 vs 12 (Deneb), instead of 16 vs 12 as you stated. 16 vs 12 does not make sense at all as a comparison, especially when a quad-core is available as well to compare to the old quad-core. Bottom line: if you bought a quad-core Deneb now, and then bought a quad-core Zambezi (first Bulldozer desktop chip) later next year, you would actually be getting fewer ALUs and AGUs.

Which of course doesn't mean anything by itself. Even if you had made the proper comparison (quad vs quad instead of octo vs quad), the comparison would still be rather pointless.

Counting ALUs and AGUs will only get you so much. It all depends on how those units are fed, which is the job of the rest of the architecture (the ALUs and AGUs being only a part of it). We can stack ALU upon ALU, and it won't make much of a difference if the rest of the architecture had no way to feed all of them realistically. But I see you realize this, because further along you mentioned this:
I believe the difference is in the MacroOP and MicroOP and the integer pipeline, combined with better predictor + 4 way Decoder and OoO, the integer pipe will be more busy and well fed than K10 (Phenom).
... which makes me wonder why you bothered comparing an octo-core Bulldozer against a quad-core Deneb to count ALUs and AGUs in the first place.
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
I wouldn't be surprised if it's really a hot chip.
Where did this come from? Have you inferred this from any relevant information they've disseminated (I wouldn't know, I have not read about all the official disclosures yet), or are you just making a jab at the conference name, "Hot Chips"?
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
I am excited for Bulldozer just in a general sense of enjoying new technology that pushes the boundaries.

I also enjoy alternative (not meant in a bad way) development of solutions. Look at each Bulldozer module. It could have been 1 integer unit and one FPU unit and AMD could have perhaps developed some sort of SMT or other type of hyper threading to make it work. Instead, they recognized that 80-90% of workloads are heavy integer so they added a second integer unit which, if the integer and its components are good, should provide better performance in threaded applications than a SMT or hyper threading scenario (2 Bulldozer modules with 4 integer cores vs. 2 core sandy bridge with 4 threads, if all else is equal, the 4 hardware integer units should have an advantage most of the time).


Definitely looking forward to the comparisons. Was just about to drop the $ for a 6 core Thuban, but now I may wait until next summer and see what my $ will get me.

bulldozer will be on am3 so just get the thuban now and, if the performance jump is significant enough, upgrade to BD next fall.


anarchist 420 said:
I wouldn't be surprised if it's really a hot chip.

jvroig said:
Where did this come from? Have you inferred this from any relevant information they've disseminated (I wouldn't know, I have not read about all the official disclosures yet), or are you just making a jab at the conference name, "Hot Chips"?

at the risk of sounding presumptuous, I believe that is called "humor".
 

Eeqmcsq

Senior member
Jan 6, 2009
407
1
0
So if I'm reading this correctly, there will be more new information presented later today? That would explain why much of the information available in these articles is stuff we've already heard of.
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
Yes, that's how I understood it. The article (I'm specifically referring to the AT one, haven't checked the others yet) is supposed to be just a sneak peek, by its own admission.

That, and JFAMD stating more info later.

Sounds good to me. Now, if I had that remote control Adam Sandler had, I would fast forward by about 7 hours
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
at the risk of sounding presumptuous, I believe that is called "humor".
Hence, a jab at the conference name. It was 90+ percent sure to be just a not-funny-at-all attempt at humor, but on the off-chance that he might be at all serious, I bothered asking, rather than just pointing out "that's obviously just an attempt at humor, taking a jab at the conference name, right?"
 

evolucion8

Platinum Member
Jun 17, 2005
2,867
3
81
I am excited for Bulldozer just in a general sense of enjoying new technology that pushes the boundaries.

I also enjoy alternative (not meant in a bad way) development of solutions. Look at each Bulldozer module. It could have been 1 integer unit and one FPU unit and AMD could have perhaps developed some sort of SMT or other type of hyper threading to make it work. Instead, they recognized that 80-90% of workloads are heavy integer so they added a second integer unit which, if the integer and its components are good, should provide better performance in threaded applications than a SMT or hyper threading scenario (2 Bulldozer modules with 4 integer cores vs. 2 core sandy bridge with 4 threads, if all else is equal, the 4 hardware integer units should have an advantage most of the time).


Definitely looking forward to the comparisons. Was just about to drop the $ for a 6 core Thuban, but now I may wait until next summer and see what my $ will get me.

I agree with you. Since most of the calculations are done on the integer unit, it makes sense to double them. We don't know how much AMD beefed up each unit to increase IPC, but AMD's answer to Intel's HT is to throw more hardware at the problem. Hyper-Threading benefits from the bubbles in the execution pipeline, but what if a single thread is able to fill the pipeline? Or what if both threads stall within the pipeline? Hyper-Threading may actually cause a performance drop, plus HT is known to have the cache pollution issue. And AMD has pretty strong FPU performance, so it makes sense to beef up the integer unit.
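To make the "bubbles" argument concrete, here is a deliberately crude toy model, not a simulator. It assumes each thread can be summarized by a single number, the fraction of a core's issue slots it fills when running alone, and that two threads sharing a core can never do more than one core's worth of work. Both assumptions and both function names are mine, not anything from AMD or Intel.

```python
# Toy model of the trade-off described above; not a simulator.
# Each thread is described by the fraction of a core's issue slots it could
# fill when running alone (0.0 = always stalled, 1.0 = no bubbles).

def shared_core(u1: float, u2: float) -> float:
    """Two threads on one SMT core: useful work is capped at one core's worth."""
    return min(1.0, u1 + u2)

def separate_cores(u1: float, u2: float) -> float:
    """Two threads on two dedicated cores: no cap from sharing."""
    return u1 + u2

for u in (0.4, 0.6, 1.0):
    print(f"u={u}: shared={shared_core(u, u):.1f}  separate={separate_cores(u, u):.1f}")
```

In this model the shared core only keeps up with dedicated hardware while the two threads together leave enough bubbles; once a single thread can fill the pipeline, the second thread adds nothing. The model cannot show an outright slowdown, since effects like cache pollution are not represented at all.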

A pretty basic example of the hardware solution in a heavily multi-threaded scenario: the AnandTech moderator Mark stated that in F@H, the X6 1090T was actually faster than his heavily overclocked i7 930 or 940 (I don't remember which one), because you have 4 physical cores with 4 logical cores sharing execution resources compared to 6 cores with their individual execution resources. If only AMD had better IPC, the performance advantage would be even bigger.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
Does anyone know the difference in performance between an FP unit with FMAC and an FP unit with FADD + FMUL?

Do we get a latency penalty with the FMAC vs FADD + FMUL, or is it faster to have FMACs?
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
An 8-core BD could potentially be outperformed by an X6 in stuff like media encoding! And it looks like it will be completely smoked by Sandy Bridge. I hope the other improvements will make up for this deficit in FPU throughput or AMD will need software to be recompiled to use its FMAC units just to keep up with the older generation.

That is interesting, what other tasks could be affected by relative lack of FPU?

P.S. I don't think many people would be concerned about CPU media encoding provided these tasks were better handled on a Discrete Video card or fusion iGPU.
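For anyone wanting to see where the "8-core BD vs X6" worry comes from, here is a rough peak-throughput sketch. The pipe counts are assumptions based on my reading of the public disclosures, not numbers from this thread: two 128-bit FMAC pipes per Bulldozer module (shared by both cores), and one 128-bit FADD plus one 128-bit FMUL pipe per K10 core. It counts 128-bit FP operations per cycle across the chip, with a fused multiply-add counted as two, and it ignores clock speed, scheduling, and memory entirely.

```python
# Rough peak FP throughput per cycle, in 128-bit operations.
# ASSUMPTIONS (not from this thread): 2 FMAC pipes per Bulldozer module,
# 1 FADD + 1 FMUL pipe per K10 core. Clock speed is ignored.

def k10_peak(cores: int) -> int:
    """One add and one multiply per core per cycle (needs a balanced mix)."""
    return cores * 2

def bulldozer_peak(modules: int, use_fma: bool) -> int:
    """Each FMAC pipe does one op per cycle; an FMA counts as mul + add."""
    pipes = modules * 2
    return pipes * (2 if use_fma else 1)

thuban = k10_peak(6)                          # 6-core Phenom II X6
bd_legacy = bulldozer_peak(4, use_fma=False)  # 4 modules, existing binaries
bd_fma = bulldozer_peak(4, use_fma=True)      # 4 modules, recompiled for FMA
print(thuban, bd_legacy, bd_fma)
```

Under these assumptions a 4-module Bulldozer trails the X6 on existing code and only pulls ahead once software is recompiled to issue fused operations, which matches the concern in the quoted post.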
 

jones377

Senior member
May 2, 2004
464
65
91
That is interesting, what other tasks could be affected by relative lack of FPU?

P.S. I don't think many people would be concerned about CPU media encoding provided these tasks were better handled on a Discrete Video card or fusion iGPU.

Wake me up when x264 supports GPGPU... Things like Cinebench would be affected too. I think AMD will need software to be compiled to use its FMAC units, but since AMD's FMAC won't be compatible with Intel's future FMAC, I don't hold much hope for support outside of HPC until both AMD and Intel support the same standard.
 

zokudu

Diamond Member
Nov 11, 2009
4,364
1
81
That is interesting, what other tasks could be affected by relative lack of FPU?

P.S. I don't think many people would be concerned about CPU media encoding provided these tasks were better handled on a Discrete Video card or fusion iGPU.

I think this is AMD's strategy. Its mid and entry level are becoming AMD Fusion with an IGP already included. I guess they're assuming that if you're looking at Bulldozer, you'll have a discrete card capable of offloading most FP tasks to a GPU. This is their idea of the future.
 

TuxDave

Lifer
Oct 8, 2002
10,571
3
71
A 4 module Bulldozer does not go against a 4 core Phenom II. Rather, it would be a 2 module Bulldozer (thus, quad-core vs quad-core), which would then make it: 8 vs 12(Deneb), instead of 16 vs 12 as you stated.

I guess I see it a different way. Each core or half module is smaller now since it cut down the number of IEU per core and maybe managed to save some area for the quasi-merged FP unit/scheduler across cores. So it could be that in terms of die area, 1 module is roughly the same area as 1 core and assuming AMD is pricing based on area then for the same money you can get 4 modules of Bulldozer against a 4 core Phenom II.

I could be completely wrong since I haven't looked at any of the die area numbers yet.