
Some Bulldozer and Bobcat articles have sprung up


brybir

Senior member
Jun 18, 2009
241
0
0
So a former engineer who no longer works on the project says "5% less" and a current engineer who IS working on the project tells me "more IPC."

Why do you choose to believe the one with less access to today's data?


Well I think the problem most of the time is that we get fed a LOT of marketing that makes almost every product sound amazing, then come release date we are left sort of disappointed. So, we like to latch on to statements that are a bit pessimistic because at least historically, the pessimists seem to be on the correct edge of real performance.

Another issue is that we keep hearing, from wherever, that BD is a core designed around server workloads. I personally have no knowledge about that, but on release day you can be sure I don't care about server performance, but how it performs in games, F@H etc. in the consumer space. Once we see those numbers, conducted by the likes of Anand or Hexus, maybe I will change my mind, but until then, I choose to remain a pessimist and assume that it's not going to be as good as the marketing tells us it should be.
 

evolucion8

Platinum Member
Jun 17, 2005
2,867
3
81
If they can do 3.4GHz stock, and 3.6GHz turbo, on a 6-core Thuban at 45nm w/o HKMG, I am going to be floored and gobsmacked if they can't do 4GHz with a 4-module BD on 32nm w/HKMG while operating within the same thermals.

Consider that JF has already said Interlagos is going to deliver 50% more performance with 33% more cores while operating within the same thermals as Magny-Cours.

People OC their 45nm X6's to 4GHz routinely. I expect to see 5GHz with BD a year after intro.

Yeah, and with air. I grew tired of seeing 2.8-3.2GHz stock-clocked CPUs for more than 5 years and people not being able to go beyond 4.4GHz on air cooling (5GHz+ with liquid nitrogen/helium for cooling is nice but not practical for everyday use). That would be :awe: If Phenom II was able to reach 6.5GHz on liquid nitro/helium, BD should go higher than 8GHz!! (We shall see though)
 

khon

Golden Member
Jun 8, 2010
1,318
124
106
I expect to see 5GHz with BD a year after intro.

We've been stuck between 3 and 4GHz for 7 years now, and Intel has shown no indication that they will break that barrier any time soon, yet you expect AMD to jump all the way to 5GHz in a year?
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Well I think the problem most of the time is that we get fed a LOT of marketing that makes almost every product sound amazing, then come release date we are left sort of disappointed. So, we like to latch on to statements that are a bit pessimistic because at least historically, the pessimists seem to be on the correct edge of real performance.

Another issue is that we keep hearing, from wherever, that BD is a core designed around server workloads. I personally have no knowledge about that, but on release day you can be sure I don't care about server performance, but how it performs in games, F@H etc. in the consumer space. Once we see those numbers, conducted by the likes of Anand or Hexus, maybe I will change my mind, but until then, I choose to remain a pessimist and assume that it's not going to be as good as the marketing tells us it should be.

Consider that Barcelona, Shanghai, and Istanbul all came out months before their desktop consumer-space comparables.

Whether you like it or not, you are likely to see server benching done on Interlagos before you see benching done on Zambezi.

And whether AMD likes it or not, we are likely to see some halfway bastardized pseudo-Zambezi benchmarking done on a server/workstation system before we see Zambezi benched as well.

Not that I'm complaining, it is all entertaining.
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
We've been stuck between 3 and 4GHz for 7 years now, and Intel has shown no indication that they will break that barrier any time soon, yet you expect AMD to jump all the way to 5GHz in a year?

AMD is purposefully designing the chip for high clockspeeds. A lot of the details we have seen show that they are doing this (we have discussed the reasons already, but this site has a lot of references as to why: http://www.realworldtech.com/page.cfm?ArticleID=RWT082610181333)

On the other hand, we know that the wire speeds will be slower than initial projections due to foundry issues. However, they aren't necessarily bad, just not as good as expected, and they are better than current 45nm SOI, which allows a 6-core Thuban to reach >4GHz on air.

Everything I have seen points to increased clockspeeds. I am being conservative at 4GHz, but I trust IDC when he says he expects 5GHz by the end of 2012. Of course by then, BD will be competing with Ivy Bridge on 22nm, which may make the high clockspeed moot.
 

brybir

Senior member
Jun 18, 2009
241
0
0
Consider that Barcelona, Shanghai, and Istanbul all came out months before their desktop consumer-space comparables.

Whether you like it or not, you are likely to see server benching done on Interlagos before you see benching done on Zambezi.

And whether AMD likes it or not, we are likely to see some halfway bastardized pseudo-Zambezi benchmarking done on a server/workstation system before we see Zambezi benched as well.

Not that I'm complaining, it is all entertaining.


I am okay with it, not that I have a choice in the matter in any event. I just get tired of getting excited about some cool new tech and it turning out to be *meh* when all is said and done. So I just take the wait-and-see approach in real-world scenarios.

I think personally I am more interested in the development of the underlying technology, but I don't have a sufficient technical background to truly grasp the significance of what is going on. For example, we keep talking about aggressive branch prediction. I think that is cool in the sense that I vaguely understand the idea behind branch prediction, and think it's exciting to see what people can do with what amounts to billions of on/off switches with a bunch of electrons flying around real fast. So, I like to hear about the branch updates and stuff, but a lot of it turns into speculation and argument...which sort of ruins it for me.

That is why I LOVE the AnandTech articles where he goes deep into the tech, which I assume comes from close collaboration with engineers and such, and provides that cool detail AND relates it to tangible increases so I can say....cool, BD is 15% faster, and half of that is because of this neat new branch prediction, which increases the speed because....

I think I've gotten a bit off topic in any event, so I will end this comment by saying I look forward to BD's release!
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
So a former engineer who no longer works on the project says "5% less" and a current engineer who IS working on the project tells me "more IPC."

Why do you choose to believe the one with less access to today's data?

Sorry, I forgot that you actually did say "more IPC" instead of just "higher performance". I haven't accused you of anything, John; in fact I believe that you're here trying to help. I just forgot that you specifically mentioned IPC until I went back to look.

That would be even more impressive if we have better IPC AND higher clocks. That could change the equation significantly, though if BD has been designed from the ground up as a server CPU then it does make sense that they would put more effort into optimizing multi-threaded performance/thermals than into maximizing IPC. I personally don't care if it's 5GHz with the same IPC or 4GHz with 25% higher IPC, I just want an option to at least be able to consider AMD next year instead of having to choose Intel by default.
 

busydude

Diamond Member
Feb 5, 2010
8,793
5
76
In light of the very recent article on Thuban, I wonder what the 'uncore' speeds would be, assuming high core clock speeds of BD.

The increase in CPU-NB speeds (overclock-stable) has been almost linear with each stepping:

C2 stepping - 2.6 GHz
C3 stepping - 2.8 GHz
E0 stepping - 3.0 GHz

and the performance increase is substantial too, at least for lightly threaded applications.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
We've been stuck between 3 and 4GHz for 7 years now, and Intel has shown no indication that they will break that barrier any time soon, yet you expect AMD to jump all the way to 5GHz in a year?

Intel has had no need to push stock clockspeeds. I would not use their product specs and timeline as a proxy for the capability of the underlying process technology and architecture.

Notice any plans for 130W TDP Sandy Bridge SKUs? Rather than worry about using up their 130W budget to make >4GHz Sandy SKUs, they have instead decided to max out the SKUs with 95W TDPs.

AMD has been steadily pushing clockspeeds upwards - what did Barcelona on 65nm debut at? They have to; they have a need to push stock clockspeeds.

And I'm not saying they are "jumping" to 5GHz in a year. I said a year after BD debuts, and to be more specific I mean Zambezi, not Interlagos. So I'm talking consumer SKUs circa Q4 2012 - Q1 2013.

Just look at Deneb X4 clockspeeds, TDP, release date versus Thuban X6 clockspeeds, TDP, release date (about a year later).

Nothing is telling me it's outside the realm of the plausible. But so too is Intel releasing 5GHz 22nm Ivys in Q4 2012, as Martimus alluded to. We will have simply reached that point, 2yrs from now, where node shrinks and power consumption will make 5GHz reasonable.

Back in 2006 I was running my 65nm quad (QX6700) at 4GHz but it required vapor-phase cooling. Now you can hit that with stock volts and air cooling. Extrapolate that out to two years from now. 5GHz on air is not exactly shooting for the moon.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
I am okay with it, not that I have a choice in the matter in any event. I just get tired of getting excited about some cool new tech and it turning out to be *meh* when all is said and done. So I just take the wait-and-see approach in real-world scenarios.

I think personally I am more interested in the development of the underlying technology, but I don't have a sufficient technical background to truly grasp the significance of what is going on. For example, we keep talking about aggressive branch prediction. I think that is cool in the sense that I vaguely understand the idea behind branch prediction, and think it's exciting to see what people can do with what amounts to billions of on/off switches with a bunch of electrons flying around real fast. So, I like to hear about the branch updates and stuff, but a lot of it turns into speculation and argument...which sort of ruins it for me.

That is why I LOVE the AnandTech articles where he goes deep into the tech, which I assume comes from close collaboration with engineers and such, and provides that cool detail AND relates it to tangible increases so I can say....cool, BD is 15% faster, and half of that is because of this neat new branch prediction, which increases the speed because....

I think I've gotten a bit off topic in any event, so I will end this comment by saying I look forward to BD's release!

I'm with you there, all the way around. Hydra anyone?

And the conversation, YES :thumbsup: - we are all here to derive some enjoyment from interacting, and that can be all the more challenging to achieve if negativity abounds, even if the thread itself is data-rich.

We all have stressful enough lives in the real world between family and coworkers; who wants to spend a portion of their free time immersed in even more of it?
 

khon

Golden Member
Jun 8, 2010
1,318
124
106
Just look at Deneb X4 clockspeeds, TDP, release date versus Thuban X6 clockspeeds, TDP, release date (about a year later).

Deneb X4 actually has a higher maximum clock speed than Thuban X6; they just used the lower power consumption to fit another 2 cores into the same TDP.

Who's to say BD won't do the same thing, especially since AMD seems to be targeting the server market?
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,700
406
126
Deneb X4 actually has a higher maximum clock speed than Thuban X6; they just used the lower power consumption to fit another 2 cores into the same TDP.

Who's to say BD won't do the same thing, especially since AMD seems to be targeting the server market?

I'm under the impression that both X4 965 (C3) and X6 1055/1090 hit around 4GHz, give or take.

Of course the X6 is E0 stepping so a native X4 E0 could potentially be faster.

Do you have data showing otherwise?
 

busydude

Diamond Member
Feb 5, 2010
8,793
5
76
I'm under the impression that both X4 965 (C3) and X6 1055/1090 hit around 4GHz, give or take.

Of course the X6 is E0 stepping so a native X4 E0 could potentially be faster.

Do you have data showing otherwise?

I have not seen power consumption figures at 4GHz. I guess the equation will be different, favoring Denebs, if you consider power draw.
 

khon

Golden Member
Jun 8, 2010
1,318
124
106
I'm under the impression that both X4 965 (C3) and X6 1055/1090 hit around 4GHz, give or take.

Of course the X6 is E0 stepping so a native X4 E0 could potentially be faster.

Do you have data showing otherwise?

I was talking stock speeds, not overclock potential.
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
Overclock potential for both Intel and AMD is important because it shows what the arch is capable of. Look at it this way: if AMD cherry-picked some Thubans for 4GHz, Intel would just bump up the stock clocks on Gulftown a couple of notches so they could maintain their lead. AMD realizes this, so they stay down in the sub-$200 price ranges (other than the 1090T). However, if BD is capable of, say, 4.6GHz with a little bit of volume at launch and SB won't do over 4.1GHz, then AMD is going to pound out something that they don't think Intel will be able to match and we'll end up with a performance competition again. Remember how poorly the original AMD Athlon X2 CPUs OC'd? The lower-end CPUs did OK, but if you had an Opty 185 or whatever then you weren't getting much more out of that sucker. How much OC can you get on air out of a 975X? 30% or so? Maybe 25-30% on a 980X as well. If BD is competitive, those OC numbers could go back to just a few percent again.
 

khon

Golden Member
Jun 8, 2010
1,318
124
106
Overclock potential for both Intel and AMD is important because it shows what the arch is capable of. Look at it this way: if AMD cherry-picked some Thubans for 4GHz, Intel would just bump up the stock clocks on Gulftown a couple of notches so they could maintain their lead. AMD realizes this, so they stay down in the sub-$200 price ranges (other than the 1090T). However, if BD is capable of, say, 4.6GHz with a little bit of volume at launch and SB won't do over 4.1GHz, then AMD is going to pound out something that they don't think Intel will be able to match and we'll end up with a performance competition again. Remember how poorly the original AMD Athlon X2 CPUs OC'd? The lower-end CPUs did OK, but if you had an Opty 185 or whatever then you weren't getting much more out of that sucker. How much OC can you get on air out of a 975X? 30% or so? Maybe 25-30% on a 980X as well. If BD is competitive, those OC numbers could go back to just a few percent again.

Probably true.

It's fairly trivial to get an i5 quadcore up to 4GHz on air, yet Intel doesn't actually sell any of them with a stock speed above 2.8GHz, since they're faster than all the AMD quadcores anyway.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Have you checked Anand's latest Thuban article?

http://www.anandtech.com/show/3877/...investigation-of-thuban-performance-scaling/7

The OCed NB produced some pretty good results in some cases; SC2 got a 16% increase in performance from the higher NB clock alone. AMD had to lower the NB/L3 clocks to keep thermals down, and that slightly handicapped Deneb's performance in many situations.

This is probably a dumb question...but has anybody reported the power delta when overclocking the NB? Ok, having looked at the article now, bumping the NB by 20% or 40% didn't buy very much performance... the best case was ~15% improvement for a 40% overclock but most gains were much smaller. It's a shame power consumption wasn't also reported. It would be interesting to see if/how the scaling differs for L3-less products.

The only thing that went down is the number of ALUs. A third ALU only gives a small performance boost (only <5% of code makes use of it).
Considering the biggest change is the AGUs, and that they can work simultaneously with the ALUs in BD, that can already result in a gain large enough to offset the <5% from the 3rd ALU. I'm pretty confident that the IPC on average will be higher than K10's. Where I agree with others is that the potential peak of K10 will be higher due to the 3rd ALU. But in real applications I see enough improvement in BD to keep its ALUs better occupied (remember it will also support op fusion).

Are you saying that you couldn't simultaneously use all 3 ALUs and 3 AGUs in K7/K8? I always assumed you could do that... my reading of Agner Fog's microarchitecture descriptions unfortunately doesn't make it clear... it sounds like the retirement limit makes it too difficult to measure in practice. The way he talks about macro operations makes me think it should be possible to retire 2 instructions that mix a load and an operation (something like add r32, m32) and one ALU or LEA instruction every cycle, which would be using 3 ALUs and 2 AGUs every cycle (so, 5 of the 6). I vaguely remember trying to do that once and failing, but I don't remember why. This old K7 paper says the integer register file / future file had 9 read ports... maybe that ends up being a limitation when you try to execute that many ops in one cycle?</irrelevant ramble>
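
For the curious, here is a minimal sketch of the kind of experiment described above (assuming x86-64 with GCC/Clang-style inline asm; the instruction mix, loop count, and use of the TSC are just illustrative, and the loop's own add/compare/branch instructions also take issue slots, so treat the result as a rough indication only):

/* Two load-op instructions plus one register-only ALU op per group:
   in K7/K8 terms that would want two AGUs and three ALUs per cycle. */
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>

int main(void)
{
    int buf[2] = {1, 2};
    int a = 0, b = 0, c = 0;
    const int *p0 = &buf[0], *p1 = &buf[1];
    const long iters = 100000000L;

    uint64_t t0 = __rdtsc();
    for (long i = 0; i < iters; i++) {
        __asm__ volatile(
            "addl (%[m0]), %[a]\n\t"  /* load + add: one AGU, one ALU   */
            "addl (%[m1]), %[b]\n\t"  /* load + add: second AGU and ALU */
            "addl $1, %[c]\n\t"       /* register-only add: third ALU   */
            : [a] "+r"(a), [b] "+r"(b), [c] "+r"(c)
            : [m0] "r"(p0), [m1] "r"(p1)
            : "memory");
    }
    uint64_t t1 = __rdtsc();

    /* On K8-era parts the TSC runs at roughly core clock, so this is an
       approximate cycles-per-3-instruction-group figure (plus loop overhead). */
    printf("%.2f cycles per group (a=%d b=%d c=%d)\n",
           (double)(t1 - t0) / (double)iters, a, b, c);
    return 0;
}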

I think personally I am more interested in the development of the underlying technology, but I don't have a sufficient technical background to truly grasp the significance of what is going on. For example, we keep talking about aggressive branch prediction. I think that is cool in the sense that I vaguely understand the idea behind branch prediction, and think it's exciting to see what people can do with what amounts to billions of on/off switches with a bunch of electrons flying around real fast. So, I like to hear about the branch updates and stuff, but a lot of it turns into speculation and argument...which sort of ruins it for me.

That is why I LOVE the AnandTech articles where he goes deep into the tech, which I assume comes from close collaboration with engineers and such, and provides that cool detail AND relates it to tangible increases so I can say....cool, BD is 15% faster, and half of that is because of this neat new branch prediction, which increases the speed because....

It's a little disappointing that we never get to see any apples-to-apples comparisons showing the impact of each feature independently. How much of the performance change will come from the branch predictor, versus the integer execution units, or floating point unit, or data cache, etc? Obviously the solution is a free performance model or cycle-accurate simulator that can run real code to experiment with :)
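
In that spirit, here is a toy sketch (purely illustrative, not a model of any real CPU) of the sort of one-feature-at-a-time comparison a public performance model would allow: run the same synthetic branch trace through a static predictor and a 2-bit saturating-counter predictor and compare misprediction rates. The table size, trace length, and branch bias are made-up numbers.

#include <stdio.h>
#include <stdlib.h>

#define TABLE_SIZE   1024
#define NUM_BRANCHES 1000000L

int main(void)
{
    /* 2-bit saturating counters, indexed by a fake branch address. */
    unsigned char counters[TABLE_SIZE] = {0};
    long miss_static = 0, miss_2bit = 0;

    srand(1);
    for (long i = 0; i < NUM_BRANCHES; i++) {
        unsigned addr = (unsigned)(rand() % TABLE_SIZE);
        /* Most branches biased taken, a minority biased not-taken,
           to loosely mimic loop-heavy code. */
        int taken = (addr % 8 == 0) ? (rand() % 4 == 0) : (rand() % 4 != 0);

        /* Predictor A: static "always predict taken". */
        if (!taken)
            miss_static++;

        /* Predictor B: per-address 2-bit saturating counter. */
        int predict_taken = (counters[addr] >= 2);
        if (predict_taken != taken)
            miss_2bit++;
        if (taken && counters[addr] < 3)
            counters[addr]++;
        else if (!taken && counters[addr] > 0)
            counters[addr]--;
    }

    printf("always-taken : %.2f%% mispredicted\n",
           100.0 * miss_static / NUM_BRANCHES);
    printf("2-bit counter: %.2f%% mispredicted\n",
           100.0 * miss_2bit / NUM_BRANCHES);
    return 0;
}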
 

TuxDave

Lifer
Oct 8, 2002
10,571
3
71
Well I think the problem most of the time is that we get fed a LOT of marketing that makes almost every product sound amazing, then come release date we are left sort of disappointed. So, we like to latch on to statements that are a bit pessimistic because at least historically, the pessimists seem to be on the correct edge of real performance.

Problem is, what is "real performance"? There isn't an absolute IPC improvement that anyone can quote for a processor. It's more of an average across a wide range of programs. And the problem gets even more complicated because you have legacy IPC - basically, how much better the processor does on applications that aren't recompiled to take advantage of the new instructions - versus future IPC - basically, how it does if programs are correctly recompiled to take advantage of the new instructions.

I think generally marketing and engineers prefer to quote the latter numbers because they're putting in all this great hardware and if they don't get any credit for it, that pretty much makes you feel like you spent all this time for nothing.

At least that's what gets me by in my job. Hoping that programmers recompile and show off that the stuff we're putting in is super awesome.
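
As a concrete (and purely illustrative) example of the legacy-vs-recompiled distinction: the same trivial source built two ways, once for an old baseline and once for the new core. The flags assume a GCC recent enough to know about Bulldozer; exact option names vary by compiler and version.

/*
 *   gcc -O3 -march=k8     saxpy.c -o saxpy_legacy   (old code paths only)
 *   gcc -O3 -march=bdver1 saxpy.c -o saxpy_bd       (may use AVX/FMA4/XOP)
 *
 * The "legacy IPC" question is how the first binary runs on the new chip;
 * the "future IPC" question is how the second one does.
 */
#include <stdio.h>

#define N 10000000L

static float x[N], y[N];

/* Simple enough that the compiler can auto-vectorize it; what it emits
   depends entirely on the -march target it is given. */
static void saxpy(float a, const float *xs, float *ys, long n)
{
    for (long i = 0; i < n; i++)
        ys[i] = a * xs[i] + ys[i];
}

int main(void)
{
    for (long i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }
    saxpy(2.0f, x, y, N);
    printf("y[0] = %f\n", y[0]);
    return 0;
}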

This old K7 paper says the integer register file / future file had 9 read ports... maybe that ends up being a limitation when you try to execute that many ops in one cycle?</irrelevant ramble>

Not necessarily. You could "theoretically" fill a bunch of execution units without requiring many integer read ports if all your uops had sources coming from uops in execution. So the better your scheduler is, the less read-port limited you are!
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Not necessarily. You could "theoretically" fill a bunch of execution units without requiring many integer read ports if all your uops had sources coming from uops in execution. So the better your scheduler is, the less read-port limited you are!

Yeah, I was wondering about that... that would depend on very carefully arranging instructions, and there actually being enough bypassing to move the results around as needed.
 

TuxDave

Lifer
Oct 8, 2002
10,571
3
71
Yeah, I was wondering about that... that would depend on very carefully arranging instructions, and there actually being enough bypassing to move the results around as needed.

If you throw a bunch of independent uops that don't share sources at it, then you'd get in trouble. But there are large workloads/flows that are basically a huge chain of dependent uops, so you automatically get the instructions arranged and all you have to do is make sure the hardware is there to bypass them as fast as you can.
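
A rough sketch of what that looks like from software (illustrative only, and only loosely related to read-port counts, which you can't observe directly): the single-accumulator loop is one long chain of dependent adds, so each add's input comes straight off the previous result via the bypass network, while the four-accumulator loop keeps independent adds in flight. Build without auto-vectorization (e.g. -O1) so the loops stay scalar; the array is kept small so it sits in L1 and memory doesn't dominate.

#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>

#define N    4096      /* small enough to stay in L1 cache */
#define REPS 10000

static int data[N];

int main(void)
{
    for (int i = 0; i < N; i++) data[i] = i;

    /* One dependent chain: each add waits on the previous one. */
    long long sum = 0;
    uint64_t t0 = __rdtsc();
    for (int r = 0; r < REPS; r++)
        for (int i = 0; i < N; i++)
            sum += data[i];
    uint64_t t1 = __rdtsc();

    /* Four independent chains: more adds per cycle, more register reads. */
    long long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int r = 0; r < REPS; r++)
        for (int i = 0; i < N; i += 4) {
            s0 += data[i];
            s1 += data[i + 1];
            s2 += data[i + 2];
            s3 += data[i + 3];
        }
    uint64_t t2 = __rdtsc();

    printf("dependent:   %llu cycles (sum=%lld)\n",
           (unsigned long long)(t1 - t0), sum);
    printf("independent: %llu cycles (sum=%lld)\n",
           (unsigned long long)(t2 - t1), s0 + s1 + s2 + s3);
    return 0;
}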
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
This is probably a dumb question...but has anybody reported the power delta when overclocking the NB? Ok, having looked at the article now, bumping the NB by 20% or 40% didn't buy very much performance... the best case was ~15% improvement for a 40% overclock but most gains were much smaller. It's a shame power consumption wasn't also reported. It would be interesting to see if/how the scaling differs for L3-less products.

Someone in this forum actually did test this, using Phenoms and Athlons to compare NB overclocking with and without L3 cache. I'll see if I can find it for you.


Are you saying that you couldn't simultaneously use all 3 ALUs and 3 AGUs in K7/K8? I always assumed you could do that... my reading of Agner Fog's microarchitecture descriptions unfortunately doesn't make it clear... it sounds like the retirement limit makes it too difficult to measure in practice. The way he talks about macro operations makes me think it should be possible to retire 2 instructions that mix a load and an operation (something like add r32, m32) and one ALU or LEA instruction every cycle, which would be using 3 ALUs and 2 AGUs every cycle (so, 5 of the 6). I vaguely remember trying to do that once and failing, but I don't remember why. This old K7 paper says the integer register file / future file had 9 read ports... maybe that ends up being a limitation when you try to execute that many ops in one cycle?</irrelevant ramble>

I think the issue with 3 AGUs is that K8's data cache was only dual-ported, which caused sequencing issues. (http://groups.google.de/group/comp.arch/browse_thread/thread/45018bf3214f6049?hl=de# check the post from 26 Aug, 21:37)



It's a little disappointing that we never get to see any apples-to-apples comparisons showing the impact of each feature independently. How much of the performance change will come from the branch predictor, versus the integer execution units, or floating point unit, or data cache, etc? Obviously the solution is a free performance model or cycle-accurate simulator that can run real code to experiment with :)

I agree. I know it would help me understand how each portion of the processor works in relation with the others much better.
 

TuxDave

Lifer
Oct 8, 2002
10,571
3
71
I agree. I know it would help me understand how each portion of the processor works in relation with the others much better.

It's potentially very complicated. Sometimes some features are crippled if another feature isn't enabled.
 

DrMrLordX

Lifer
Apr 27, 2000
22,945
13,028
136
This is probably a dumb question...but has anybody reported the power delta when overclocking the NB?

Actually yes, in a thread months ago that I may have started or in which I participated . . . we did some NB overclocking on L3-less chips (this was still when I was beating my poor 635 to death) and the subject of power consumption/thermals due to NB speed came up.

Someone used a Kill A Watt or similar meter to record power consumption and found that maxing out the NB speed on their L3-less K10.5 produced something like 6-8W of extra power consumption, or something thoroughly trivial for an overclocked, overvolted processor. If I didn't know better, I'd say it was either jvroig or . . . formulav8 that did it. Maybe.

If I had a Kill A Watt handy, I'd run some tests on this wimpy little Sargas, but I don't.

All that I can report is that moving NB voltages and NB speeds upwards does practically nothing to my reported CPU temperatures on either this Sargas chip or on my now-dead Propus. I've pushed up to 1.51v CPU-NB with various other escalated voltages (NB, HT) without producing notable temperature increases. This may be a result of the cooling involved, but I can't say for sure. Another thing I've noticed is that NB stability is extremely sensitive at high NB clocks when core temps start to rise (particularly from increased vcore), which seems to imply that the NB is not quite capable of generating enough of its own heat to create the voltage/stability feedback loop that you normally get with cores when raising vcore. Or, in other words, I can get the NB to start misbehaving by raising vcore without increasing core clockspeeds, but I can't make the NB start wigging out by raising CPU-NB (leaving the NB multi untouched in both scenarios).
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
From reading that comp.arch thread, the former AMD engineer said their internal goals were 20-25% higher clocks with 5% less IPC. So figure 15-20% more performance/core as a best-case scenario (1.20-1.25 × 0.95 ≈ 1.14-1.19); that will leave them 10-15% behind SB on up to 4 cores. Figure breakeven is between 5/6, then 7-8 cores give them the advantage. Very server friendly, not so much for desktop users :(

Their only chance on desktop vs 2011 is to go to 12/16 cores.
One problem there is that that still doesn't leave them in great shape for servers. If they move from 9% (the last number I read) of the server market to 15%, then drop back to <10% as soon as Intel gets done with dinner, why go to the trouble?

For current servers, and future desktops/notebooks*, more cores matter more than faster single cores. However, higher-speed cores at the expense of IPC would be suicidal. It would just make it too easy for them to trip over themselves, or for Intel to pull out something unexpected. At the same time, even AMD has been willing to admit that they have had serious problems filling their K8 and newer cores' instruction units, so there's still no reason to believe that the fewer resources will result in lower performance (but how much higher...er...???). Working more on highly-threaded situations than single-threaded makes perfect sense, but single-thread performance still needs to keep going up on a per-clock basis.

* So, a distilled thesis for gaming CPUs, without being mixed with a confrontation with Scali, would be that as desktops and consoles have gotten hardware features that previously only servers had, desktop software, including games, is coming to resemble server software in many ways, including threading out. The two's needs are still different, but a common x86 server and a common x86 gaming PC right now differ most in how many cores they need and in graphics cards; desktop core counts are plural now, and only going up. Throw in typical AMD pricing, assuming they don't get the single-thread crown, and...