
Some Bulldozer and Bobcat articles have sprung up


kalniel

Member
Aug 16, 2010
52
0
0
overclock potential for both intel and amd is important b/c it shows what the arch is capable of. look at it this way: if amd cherry picked some thubans for 4ghz intel would just bump up the stock clocks on gulftown a couple of notches so they could maintain their lead. amd realizes this so they stay down in the sub-$200 price ranges (other than 1090t). however, if BD is capable of, say, 4.6 ghz with a little bit of volume at launch and SB won't do over 4.1 ghz, then amd is going to pound out something that they don't think intel will be able to match and we'll end up with a performance competition again. Remember how poorly the original amd athlon x2 cpus oc'd? the lower end cpus did ok, but if you had an opty 185 or whatever then you weren't getting much more out of that sucker. how much oc can you get on air out of a 975x? 30% or so? maybe 25-30% on a 980x as well. if BD is competitive those oc numbers could go back to just a few again.
I would clarify that OC potential of top bin parts would demonstrate arch capability. OC potential of lower bins just indicates that yields are good enough that market factors are influencing the binning process rather than voltage/speed limitations.
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
This is probably a dumb question...but has anybody reported the power delta when overclocking the NB? Ok, having looked at the article now, bumping the NB by 20% or 40% didn't buy very much performance... the best case was ~15% improvement for a 40% overclock but most gains were much smaller. It's a shame power consumption wasn't also reported. It would be interesting to see if/how the scaling differs for L3-less products.

I'm with you, a 5-15% performance improvement from a 40% overclock seems really low. But the way I see it, the NB in K10 processors should have always been running at a 1:1 ratio with the CPU clock. AMD needed every ounce of extra performance to be able to catch up with Intel's offerings, so even a small 5% increase helps. But for various reasons, they kept the frequency low at 2GHz.

And it's likely BD will have a higher-clocked NB, possibly running at a 1:1 ratio with the CPU core. If I'm not mistaken, the L1 and L2 caches in BD were cut in half (for lower latency?) compared to K10 processors. In that case a higher-clocked NB/L3 cache would be more desirable, and the performance gain would be more significant. In Anand's test, Deneb showed a linear increase from 2GHz to 2.8GHz NB clocks, then the performance gain tapered off at 3GHz. BD may be different in that it could end up benefiting more from extreme NB clocks.
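
To put the "5-15% from a 40% overclock" figure in perspective, here is a rough sketch that turns it into a scaling ratio (performance gain divided by clock gain). The function name is made up for illustration, and the only inputs are the percentages quoted above, so treat it as back-of-the-envelope arithmetic rather than a measurement.

```python
def scaling_efficiency(perf_gain_pct: float, clock_gain_pct: float) -> float:
    """Fraction of a clock increase that shows up as a performance increase."""
    if clock_gain_pct <= 0:
        raise ValueError("clock_gain_pct must be positive")
    return perf_gain_pct / clock_gain_pct


# Figures quoted in the thread: 5-15% gain from a 40% NB overclock.
best_case = scaling_efficiency(15, 40)
worst_case = scaling_efficiency(5, 40)
print(f"best case: {best_case:.3f}, worst case: {worst_case:.3f}")
```

A ratio of 1.0 would mean performance tracks the NB clock perfectly. By this arithmetic even the best case recovers well under half of the clock increase.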
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
I'm with you, a 5-15% performance improvement from a 40% overclock seems really low. But the way I see it, the NB in K10 processors should have always been running at a 1:1 ratio with the CPU clock. AMD needed every ounce of extra performance to be able to catch up with Intel's offerings, so even a small 5% increase helps. But for various reasons, they kept the frequency low at 2GHz.

And it's likely BD will have a higher-clocked NB, possibly running at a 1:1 ratio with the CPU core. If I'm not mistaken, the L1 and L2 caches in BD were cut in half (for lower latency?) compared to K10 processors. In that case a higher-clocked NB/L3 cache would be more desirable, and the performance gain would be more significant. In Anand's test, Deneb showed a linear increase from 2GHz to 2.8GHz NB clocks, then the performance gain tapered off at 3GHz. BD may be different in that it could end up benefiting more from extreme NB clocks.

L1 is smaller (well a portion of it is smaller), but the L2 is expected to be larger (at 2MB per module - shared between the two cores), and the latency is much higher than Deneb at 18-20 cycles, versus 10 for Deneb (which is another indication of the high clockspeed target). Also the L2 cache is said to be inclusive rather than exclusive in BD. There are a lot of reasons for this, but for a more detailed explanation I suggest reading this: http://www.realworldtech.com/page.cfm?ArticleID=RWT082610181333&p=9
 

Janooo

Golden Member
Aug 22, 2005
1,067
13
81
L1 is smaller (well a portion of it is smaller), but the L2 is expected to be larger (at 2MB per module - shared between the two cores), and the latency is much higher than Deneb at 18-20 cycles, versus 10 for Deneb (which is another indication of the high clockspeed target). Also the L2 cache is said to be inclusive rather than exclusive in BD. There are a lot of reasons for this, but for a more detailed explanation I suggest reading this: http://www.realworldtech.com/page.cfm?ArticleID=RWT082610181333&p=9
I understand AMD wants to have more cores for marketing reasons.
Personally, I consider one module an old core because it shares the cache and it's only ~10% bigger.
They should have kept the name 'core' and made a big deal out of the fact that AMD's hyperthreading is much better than Intel's.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
I understand AMD wants to have more cores for marketing reasons.
Personally, I consider one module an old core because it shares the cache and it's only ~10% bigger.
They should have kept the name 'core' and made a big deal out of the fact that AMD's hyperthreading is much better than Intel's.

That's a really good point, could have made for some interesting times.
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
I'm going to save jvroig from going apeshit here:

it's a lot more than 10% bigger, think more like 50%. If you want the details jvroig posted a great explanation 5 or 6 pages ago. A BD module really is 2 cores, it is going to have better ipc than k10, it's just only going to be more like 80% of peak performance of a group of theoretical BD "single cores" in highly multithreaded scenarios.
 

Janooo

Golden Member
Aug 22, 2005
1,067
13
81
I'm going to save jvroig from going apeshit here:

it's a lot more than 10% bigger, think more like 50%. If you want the details jvroig posted a great explanation 5 or 6 pages ago. A BD module really is 2 cores, it is going to have better ipc than k10, it's just only going to be more like 80% of peak performance of a group of theoretical BD "single cores" in highly multithreaded scenarios.
I am talking about the die size.
I have trouble finding the above-mentioned post. Can you help?
Anand says 12%. What am I missing?
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
I am talking about the die size.
I have trouble finding the above-mentioned post. Can you help?
Anand says 12%. What am I missing?

the 12.5% is only if you tear out the integer ALUs but not their share of the shared resources that are only present in the module because of the second ALU core

The question is what part of the module is the core...the core isn't just the integer ALU circuits, it's a bunch of other stuff that makes up the rest of the pipeline as well...and none of that other stuff is included in the 12.5% number.
 

Janooo

Golden Member
Aug 22, 2005
1,067
13
81
the 12.5% is only if you tear out the integer ALUs but not their share of the shared resources that are only present in the module because of the second ALU core

The question is what part of the module is the core...the core isn't just the integer ALU circuits, it's a bunch of other stuff that makes up the rest of the pipeline as well...and none of that other stuff is included in the 12.5% number.
Interesting...
When I read Anand's article (especially the slide) it appears to me 12% is it. Nothing more.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Interesting...
When I read Anand's article (especially the slide) it appears to me 12% is it. Nothing more.

and it's right, 12% is "it"...provided by "it" you understand they are talking about the die-area of the integer ALU and nothing more...if you take "it" to mean the die-area of the second core then that is when the number no longer applies.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
Ugh, a lot of this confusion would go away if AMD would back away from trying to dominate the number of "cores" part of CPU marketing. Look what it did to Intel in the GHz war. I trust AMD to be a bit more careful in that regard, especially with their renewed focus on engineering but it can still be frustrating. Maybe call the integer units "cortexes" or something.
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
L1 is smaller (well a portion of it is smaller), but the L2 is expected to be larger (at 2MB per module - shared between the two cores), and the latency is much higher than Deneb at 18-20 cycles, versus 10 for Deneb (which is another indication of the high clockspeed target). Also the L2 cache is said to be inclusive rather than exclusive in BD. There are a lot of reasons for this, but for a more detailed explanation I suggest reading this: http://www.realworldtech.com/page.cfm?ArticleID=RWT082610181333&p=9

That's an interesting article Martimus, thanks for the link. Though a cache configuration of 8MB L2/8MB L3 for a 4-module BD seems strange; in a perfect world the shared L3 should be larger. I'm thinking a 4MB L2/8MB L3 cache configuration would be more accurate.

It's true that Phenom 1 had 2MB for both L2/L3 caches, but AMD was forced to do that because of the size constraints of the architecture @ 65nm. I don't see them facing this problem at 32nm.
 

Triskain

Member
Sep 7, 2009
63
33
91
L1 is smaller (well a portion of it is smaller), but the L2 is expected to be larger (at 2MB per module - shared between the two cores), and the latency is much higher than Deneb at 18-20 cycles, versus 10 for Deneb (which is another indication of the high clockspeed target). Also the L2 cache is said to be inclusive rather than exclusive in BD. There are a lot of reasons for this, but for a more detailed explanation I suggest reading this: http://www.realworldtech.com/page.cfm?ArticleID=RWT082610181333&p=9

The difference in latency isn't that big actually, PhII's L2 cache has a latency of 15 cycles, not 10, as written here.
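
Cycle counts alone don't settle it either, since a cycle is shorter at a higher clock. A minimal sketch of the conversion, using the cycle counts from this thread (15 for PhII, 18-20 for BD) and clock speeds that are purely assumptions picked for illustration (3.0GHz and 4.0GHz), not announced figures:

```python
def latency_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a cache latency in core cycles to nanoseconds at a given clock."""
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    return cycles / clock_ghz


# Cycle counts are from the thread; both clock speeds are hypothetical.
phenom_ii = latency_ns(15, 3.0)      # assumed 3.0 GHz
bd_low = latency_ns(18, 4.0)         # assumed 4.0 GHz
bd_high = latency_ns(20, 4.0)        # assumed 4.0 GHz
print(f"PhII: {phenom_ii:.2f} ns, BD: {bd_low:.2f}-{bd_high:.2f} ns")
```

With those assumed clocks the wall-clock latencies come out about the same, which is the sense in which a longer L2 latency in cycles goes with a high clockspeed target. Different clock assumptions would shift the comparison.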
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106

That is a very interesting picture, and seems to correlate to the L2 and L3 cache being the same size (2MB each for each module), although reading the X-Bit article (http://www.xbitlabs.com/news/cpu/di...Core_Orochi_Processor_for_the_First_Time.html), they believe it is heavily photoshopped for competitive reasons (the cores and caches are all different sizes on the same die).

For those that don't know, 'Orochi' is the codename for the Bulldozer core.

Also, for those that can't reach the link from jones77, I have included the die shot here:
amd_orochi_august2010.jpg
 

JFAMD

Senior member
May 16, 2009
565
0
0
While you believe it "correlates" data, you are making that assumption based on a die shot that actually has things purposely modified to mask their true size.

Not sure how you can correlate anything based on that.
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
While you believe it "correlates" data, you are making that assumption based on a die shot that actually has things purposely modified to mask their true size.

Not sure how you can correlate anything based on that.

Thank you for verifying that the die shot was photoshopped. I appreciate that, considering the other source that suggested it was based purely on conjecture. I like hearing the data directly from the horse's mouth.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
While you believe it "correlates" data, you are making that assumption based on a die shot that actually has things purposely modified to mask their true size.

Not sure how you can correlate anything based on that.

While I can appreciate the desire and need for the secrecy I don't get the point of showing a dieshot that is intentionally blurred and distorted.

The purpose of showing a dieshot is to show it. What you are showing there is not a dieshot of Orochi/BD...it is art.

The artist may have started their creation by using a canvas that was a dieshot of Orochi, but when the artist finished their masterpiece they did not leave the audience with a dieshot of Orochi.

So what value was delivered to the audience in showing us this artwork? AMD has already said Bulldozer taped out and you guys have silicon in hand...so we already knew wafers existed with bulldozer dies on them.

Showing the audience an artistic rendition of the bulldozer dieshot accomplishes what?

That is a very interesting picture, and seems to correlate to the L2 and L3 cache being the same size (2MB each for each module)

Even if the picture really were a dieshot you still have to be careful about using die-area data to project L2$ and L3$ sizes because the sram cell size itself changes depending on the desired sram density, clockspeed, latency and Vmin sensitivity.
 

JFAMD

Senior member
May 16, 2009
565
0
0
The goal was not to show off the die. The goal was to support our partner at their event.

We don't share dies prior to launch. When the request came in to show it from GF, we needed a way to accommodate them yet still stay within our process.

Those that think this was either "AMD driven" or an "AMD disclosure" are off base. We do that at launch.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Ah, I'm with you now. Makes perfect sense. Your first two sentences, pwned my entire train of thought :thumbsup: thx for clarifying!
 

Janooo

Golden Member
Aug 22, 2005
1,067
13
81
Hi, Janooo. Post #241 of this thread. I think I made a follow-up comment after that post, but most issues, I believe, are covered by that post regarding the "5%, 12%, 50%" figures
Thanks. It's an interesting post.

At the end of the post:
Where did we get this 50%? From AMD themselves, when they corrected Anand, because Anand thought from 1 core, you need 5% more and you get a dual core. Which is not true, and if you've gotten this far into my post, you probably understand now that from 1 core, you need 50% more to make a dual core - exactly what AMD told Anand as a correction.
Would somebody point me to the AMD correction? A link?
Thanks.
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
Here it is: http://www.anandtech.com/show/2881

It was also posted in Post #244 (just shortly after the post, Martimus asked for the link as well), and also linked to before already in the thread. However, with the way this thread exploded, it's getting harder and harder to sort through everything.
 

Janooo

Golden Member
Aug 22, 2005
1,067
13
81
Here it is: http://www.anandtech.com/show/2881

It was also posted in Post #244 (just shortly after the post, Martimus asked for the link as well), and also linked to before already in the thread. However, with the way this thread exploded, it's getting harder and harder to sort through everything.
Thanks.
Forgive me if it was asked.
Let's assume single Phenom II core with 2MB L2 cache is 100%.
What die size percentage is cache and what is non-cache area?

It seems like 1MB is 50% at this page (the second die picture).
http://www.anandtech.com/show/2836
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
Thanks.
Forgive me if it was asked.
Let's assume single Phenom II core with 2MB L2 cache is 100%.
What die size percentage is cache and what is non-cache area?

Just to give a very rough idea, without L3 cache, a single core PII with IMC/NB and Hyper Transport links would account for ~58% of the die size, with ~42% for the 2MB L2 cache area.

Adding a second core to the above mentioned configuration (without adding more L2 cache) would only drop the 2MB L2 die area to ~30%, and ~70% would be for the dual cores and everything else.

BTW, it's useless to try to apply these numbers to BD, because BD is a completely new architecture on a 32nm process. I got my rough calculation from an Athlon II X2 die shot.
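
For anyone who wants to check the arithmetic, here is a rough sketch of what those percentages imply. It takes the single-core die as 100 units (58 for core plus IMC/NB/HT links, 42 for the 2MB L2) and solves for how much area a second core must add for the unchanged L2 to drop to 30% of the total. The function name is made up for illustration, and the inputs are only the rough estimates above.

```python
def implied_added_area(cache_area: float, other_area: float,
                       new_cache_share: float) -> float:
    """Area that must be added (cache unchanged) for the cache to shrink
    to new_cache_share of the enlarged die."""
    if not 0 < new_cache_share < 1:
        raise ValueError("new_cache_share must be between 0 and 1")
    new_total = cache_area / new_cache_share
    return new_total - (cache_area + other_area)


# Single-core die = 100 units, split per the rough estimate above.
cache, other = 42.0, 58.0
added = implied_added_area(cache, other, 0.30)
new_total = cache + other + added
print(f"implied second-core area: ~{added:.0f} units")
print(f"cache share: {cache / new_total:.0%}, "
      f"everything else: {(other + added) / new_total:.0%}")
```

The implied second core works out to about 40 units, less than the 58 units of the first core plus uncore, which would be consistent with the IMC/NB and HT links not being duplicated. As noted, none of this carries over to BD.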