Llano and Bulldozer performance speculation thread

podspi

Golden Member
Jan 11, 2011
First-time poster, long-time lurker here. Given all the recent bad news (and rumors) circulating around AMD, I thought it would be interesting to talk about what the performance of these future parts might be, given what little we know about them.

I've given Bulldozer a lot more thought than Llano, so I'll start with that. It goes without saying that everything I'm about to say is a gross over-generalization/ballpark estimate.

First of all, if we normalize a Stars core's IPC to 1, I think it would be safe to put Nehalem and SB IPC at ~1.4 and ~1.6, respectively.

Now, John Fruehe has said that we can expect 50% more throughput from 33% more cores with Interlagos. He has also been quick to point out that you can't compare server and client workloads, a point I will address shortly.

So before we continue, I'll make one huge, heroic assumption: that Interlagos will operate at the same frequency as the MC chip John Fruehe was referring to. Assuming this is the case, I arrive at the same ~12.5% increase in IPC that has been posted over and over again (1.5 / 1.33 ≈ 1.125). John is correct, though: on the client side there will likely be fewer threads, at higher clock speeds. If you recall, AMD has said that a single thread running on a module will incur a penalty of ~20% if another thread is running on that module. Backing this out (1.125 / 0.8), BD performance of a single thread on a single module should be ~1.4, or about identical to Nehalem.
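The "backing out" arithmetic can be written as a couple of one-line functions. This is a minimal sketch, assuming (as the post does) equal clocks, Fruehe's 50% throughput figure, the ~20% sharing penalty, and reading "33% more cores" as 16 vs. 12 cores; the function names are made up for illustration.

```python
# Back-of-envelope version of the "backing out" step. Every input is an
# assumption taken from the post, not a measurement.

def per_thread_ipc(throughput_gain: float, core_ratio: float) -> float:
    """IPC per thread relative to Stars (= 1.0), assuming equal clocks."""
    return throughput_gain / core_ratio

def solo_ipc(shared_ipc: float, sharing_penalty: float) -> float:
    """Undo the module-sharing penalty: shared = solo * (1 - penalty)."""
    return shared_ipc / (1.0 - sharing_penalty)

# "50% more throughput from 33% more cores" (read here as 16 vs. 12 cores)
shared = per_thread_ipc(1.50, 16 / 12)
# ~20% penalty when a second thread shares the module
solo = solo_ipc(shared, 0.20)

print(f"Bulldozer (2-thread/Mod): {shared:.3f}")  # 1.125
print(f"Bulldozer (1-thread/Mod): {solo:.3f}")    # 1.406
```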


So to recap:
Stars: 1
Nehalem: 1.4
Sandy Bridge: 1.6
Bulldozer (2-thread/Mod): 1.125
Bulldozer (1-thread/Mod): 1.4


If this happens to be the case, how do things look for Bulldozer? Using this (very) simple model of performance, we'd rate a Gulftown CPU @ 8.4 (6 x 1.4) and a 4-module BD @ 9 (8 x 1.125) for multithreaded code (and identical for single-threaded code). That is consistent with the rumor that Bulldozer will be very close to Gulftown performance, and not completely blown away by (though solidly losing to) a hypothetical 6-core Sandy Bridge processor (6 x 1.6 = 9.6).
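For anyone who wants to plug in their own numbers, here is the same model as a tiny Python sketch. The scores are unitless and only mean something relative to each other; the model is exactly as crude as described, ignoring clocks, turbo, and memory.

```python
# The (very) simple throughput model: score = threads x per-thread IPC.
# Clock speed, turbo, memory and Hyper-Threading details are all ignored.

def throughput(threads: int, ipc: float) -> float:
    return threads * ipc

scores = {
    "Gulftown (6 cores x 1.4)": throughput(6, 1.4),
    "4-module BD (8 threads x 1.125)": throughput(8, 1.125),
    "hypothetical 6-core SB (6 x 1.6)": throughput(6, 1.6),
}

for name, score in scores.items():
    print(f"{name}: {score:.1f}")
# Gulftown: 8.4, 4-module BD: 9.0, hypothetical 6-core SB: 9.6
```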

Of course, this completely ignores the actual clocking of the processors, while assuming again that Interlagos operates at the same clock speed as MC. That being said, I don't think anything I've said sounds too outlandish: Nehalem has been out for a while, and if AMD wasn't planning (or able) to hit Nehalem-level IPC, or to clock significantly higher, Bulldozer would be, for all intents and purposes, a failed project.



All of that being said, I think the worst case scenario is that when the modules are fully loaded, Bulldozer will be able to deliver Stars-level performance. If this is the case, then the numbers change to:

Stars: 1
Nehalem: 1.4
Sandy Bridge: 1.6
Bulldozer (2-thread/Mod): 1
Bulldozer (1-thread/Mod): 1.25

In this case, Bulldozer is still 'close' to Gulftown for multithreaded workloads, and could conceivably compete in single-threaded performance if AMD manages to push the clocks high enough (4.5-5 GHz).
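One rough way to sanity-check the 4.5-5 GHz figure is to ask what clock the worst-case 1.25 per-clock figure needs to match a given competitor, treating single-thread speed as IPC x clock. A minimal sketch; the competitor clocks (i7-980X at its 3.33 GHz base, Sandy Bridge at a 3.8 GHz turbo) are assumptions chosen for illustration.

```python
# Worst case: 2-thread/Mod = Stars (1.0); undoing the assumed ~20% sharing
# penalty gives 1-thread/Mod = 1.0 / 0.8 = 1.25.
BD_SOLO_WORST = 1.0 / (1.0 - 0.20)

def clock_to_match(target_ipc: float, target_ghz: float, own_ipc: float) -> float:
    """Clock (GHz) at which own_ipc * clock equals target_ipc * target_ghz.

    Treats single-thread performance as per-clock IPC x clock, nothing more.
    """
    return target_ipc * target_ghz / own_ipc

# Reference clocks are illustrative assumptions, not part of the estimate.
print(f"match Nehalem @ 3.33 GHz: {clock_to_match(1.4, 3.33, BD_SOLO_WORST):.2f} GHz")
print(f"match SB @ 3.8 GHz:       {clock_to_match(1.6, 3.8, BD_SOLO_WORST):.2f} GHz")
```

Under these assumptions, matching a 3.33 GHz Nehalem takes about 3.73 GHz, while matching Sandy Bridge at 3.8 GHz takes about 4.86 GHz, which is where the 4.5-5 GHz range comes in.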



Unless Bulldozer somehow manages to deliver worse performance than current AMD processors, it should be competitive enough to at least keep AMD going until they can get Enhanced Bulldozer out in 2012 (pretty quick, considering BD is coming out in Q2).


As for Llano, I think it is going to surprise everyone with how competitive it actually is. If it can hit the mid-4 GHz range on turbo with one or two cores and the graphics performance is there, I think it'll be competitive with the new SB mobile CPUs. Of course, nothing is stopping Intel from ramping up clock speeds and pulling the rug out from under Llano, but given how often laptop models are refreshed, this probably isn't too large an issue. It is probably also a huge reason why AMD has been as silent as possible about the performance of this part.


So, thoughts? Think I'm right, wrong, crazy? :D
 

IntelUser2000

Elite Member
Oct 14, 2003
podspi said: "If you recall, AMD has said that a single thread running on a module will incur a penalty of ~ 20% if another thread is running on that module"

AMD didn't say that. They said the maximum gain from doubling the cores is 100%, enabling Hyperthreading gives 30%, and Bulldozer should be closer to doubling the cores, but there is a 20% penalty because there is still a little bit of sharing going on.

In summary:
2x cores: 100% gain
Hyperthreading: 30% gain
Bulldozer: 80% gain
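On these numbers the 20% is a penalty on the module's combined throughput rather than on each thread. A minimal sketch, assuming the two threads split the gain evenly (a simplification): an 80% gain leaves each Bulldozer thread at about 90% of its solo speed, whereas the opening post's 1.125 vs. 1.4 implies about 80%.

```python
# Two threads on one "unit" (two full cores, one HT core, or one BD module),
# using the gains listed above. Each thread's share = (1 + gain) / 2.

def per_thread_share(gain: float) -> float:
    """Speed of each of two threads, relative to one thread running alone."""
    return (1.0 + gain) / 2.0

GAINS = {"2x cores": 1.00, "Hyperthreading": 0.30, "Bulldozer": 0.80}

for name, gain in GAINS.items():
    print(f"{name}: 2 threads = {1.0 + gain:.2f}x, each at {per_thread_share(gain):.0%} of solo")
# 2x cores: 100%, Hyperthreading: 65%, Bulldozer: 90%
```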

podspi said: "Backing this out, BD performance of a single-thread on a single module should be about ~ 1.4, or about identical to Nehalem."

The presence of multithreaded applications complicates this matter. Nehalem's performance-per-clock advantage over Stars is only 15% or so; the 40% figure is mainly due to Nehalem having Hyperthreading.
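Assuming the per-clock and Hyperthreading effects simply multiply (a simplification), the split implied by these two figures can be computed directly:

```python
# Split Nehalem's ~1.4x multithreaded lead over Stars into a per-clock part
# and a Hyperthreading part, assuming the two factors multiply.
TOTAL_LEAD = 1.40       # Nehalem figure from the opening post
PER_CLOCK_LEAD = 1.15   # "only 15% or so" per clock

ht_factor = TOTAL_LEAD / PER_CLOCK_LEAD
print(f"implied Hyperthreading factor: {ht_factor:.2f}x")  # 1.22x
```

That is, a ~15% per-clock lead leaves roughly a 22% factor for Hyperthreading to account for.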
 

JFAMD

Senior member
May 16, 2009
First, you never get a 100% performance increase from the second core; most of the time it scales to about 90-95%.

Secondly, the reason that you cannot use the server performance estimate to compare to client is that server is a THROUGHPUT number and client is a SPEED number. That is like saying a car can pull this much weight on a trailer, so it should be able to achieve this speed on the open road. It is apples and oranges. I have explained this hundreds of times, and people keep coming back with this 12% number that I keep telling them is really wrong.
 

podspi

Golden Member
Jan 11, 2011
John, throughput and speed on a CPU are related to IPC just as the speed a car can run at and the amount of weight it can pull are related to horsepower. While it's clear you can't calculate one from the other in a straightforward manner (at least not practically; in the horsepower case you should be able to calculate the theoretical limits, of course), this exercise was meant more as a lower bound on what we should expect from Bulldozer (and to a lesser degree Llano).


Of course, if you say what I've done is wrong, it really doesn't make sense to argue with you :-D . I realize you won't be able to say whether I've under- or over-shot things, but I'm hoping that if I'm wrong, it's because it'll be faster than what I'm saying, not slower. Also, I think that for client code that actually utilizes 8 integer threads, this analysis is more applicable, though if it isn't, I'd love to know why.
 

nyker96

Diamond Member
Apr 19, 2005
From the other thread, it looks like there's some leaked info from an industry insider which suggests the 8-core BD is close to, but not beating, the performance of a 6-core 980X. I'd say this is an excellent result; Gulftown is a formidable chip in its own right, and coming close to it means AMD has made significant improvements in its architecture. Of course, Sandy is one notch above Gulftown, which puts Intel slightly ahead still. But we are finally seeing AMD with a more modern CPU architecture that is able to compete with Intel, if not be King of the Hill. As long as BD has good features like AMD-V, AVX etc. and a nice, decent price, I think it can sell quite well.
 

RyanGreener

Senior member
Nov 9, 2009

Yup. I'm rooting for AMD. I've owned both AMD and Intel, and I'd love for there to be more competition, because in the end it forces better technology/prices....(usually :D)
 

velis

Senior member
Jul 28, 2005
Not to crap on AMD or something, but I hope Meyer's abrupt departure doesn't have anything to do with expected BD performance. I fear that even if IPC target has been met, frequency target wasn't or is at least showing possible severe issues ATM.

JFAMD said: "First, you never get 100% performance increase off the second core, most of the time it is scaling to about 90-95%."
Ahem. As much as I respect you, I have to disagree if you're talking about algorithm scaling. The 20% penalty discussed here applies to multiple independent algorithms: it is the penalty the cores incur by themselves when running 2 threads, as opposed to running one. If these particular cores are running one multithreaded algorithm, an additional 5-10% penalty (depending on the algorithm's threading efficiency) should be multiplied in on top of the 20% discussed here. IMHO.
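Multiplying the penalties as described is easy to sketch; the 20% and 5-10% inputs are the figures from the post, and the function name is hypothetical.

```python
# The module-sharing penalty and the algorithm's own threading
# inefficiency compound multiplicatively.

def effective_share(module_penalty: float, threading_penalty: float) -> float:
    """Per-thread speed relative to one thread running alone."""
    return (1.0 - module_penalty) * (1.0 - threading_penalty)

for extra in (0.05, 0.10):
    share = effective_share(0.20, extra)
    print(f"20% module + {extra:.0%} threading -> {share:.2f} of solo "
          f"({1 - share:.0%} total penalty)")
# 0.76 of solo (24% total), 0.72 of solo (28% total)
```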

I also suspect the estimated 20% performance loss when running 2T/M was calculated with an "average" mix of int/fp ops, since the single shared FPU is the major limiting factor here? Or did I forget something again?

I must admit I was hoping for 1 thread/module to bring minor performance improvements from being able to use additional free ALUs, but I guess it was too much to ask for the first round. Probably in one of the next revisions.

I would like to remind those who are hoping for 5GHz+ frequencies that AMD has lately been stuck around 3GHz, and the BD core redesign doesn't suggest (to me at least) that they got rid of the parts that were causing the frequency scaling issues. Also, please note that Intel tried to play this game with Netburst and failed. That's not to say that I wouldn't just LOVE to see a 6GHz 8-core BD chip with Nehalem-level IPC. I've been waiting for it for the last 3 years :D
 

podspi

Golden Member
Jan 11, 2011
From what I've read, BD has a deeper pipeline than the current Phenom line, so that, along with the new manufacturing process, should allow AMD to break through the 4GHz barrier they've been hovering around.


I'd also like to point out that IBM has had great success with their POWER line that operates at relatively high frequencies. Netburst was just a disaster...
 
Dec 30, 2004
velis said: "I would like to remind those that are hoping for 5GHz+ frequency figures that AMD has lately been stuck around 3GHz..."

Uhhh, if by 3GHz you mean 4GHz, then yeah. Going to 32nm is going to help with that; we'll probably all hit 4GHz IMO.
 

maniac5999

Senior member
Dec 30, 2009
podspi said: "From what I've read, BD has a deeper pipeline than the current Phenom line, so that along with the new manufacturing process should allow AMD to break through the 4ghz barrier they've been hovering around.

I'd also like to point out that IBM has had great success with their POWER line that operates at relatively high frequencies. Netburst was just a disaster..."

Designing for higher clockspeeds isn't inherently a bad thing. For over 15 years, higher clockspeed virtually always won at the end of the day. What happened with Netburst was that Intel seemed focused on higher clockspeed at the expense of all else. Intel seemed to decide that clockspeed sold chips, and that they'd prefer a higher-clockspeed arch. to a better-IPC arch. (P3). At the same time, AMD released a far superior low-speed, high-IPC arch. in K8 that just made Intel look silly.

Remember, Intel responded with Conroe, which was a similarly designed low-speed, high-IPC core. Since then, both have just used derivatives of those two cores (up until Sandy Bridge and BD).

So, what we've had for the last half decade is a battle between two low-speed architectures. Remember, Conroe debuted at 2.66GHz, and K8 at 2.2GHz. While both have been able to scale their designs up a good bit (E8600 and i7 980X @ 3.33GHz, K8 6400+ @ 3.2GHz), no one has been able to hit the 3.8GHz that Netburst hit on 90nm. I would suspect that with an arch. designed from the ground up with more of a focus on speed, something in the 4.5GHz range would be possible today, with reasonably certain overclocking to 5GHz+. Because of this, I have high hopes for BD if it is indeed designed for high clockspeed, because my gut says that's currently where the low-hanging fruit is, relative to trying for further IPC improvements.

*Note: SB is getting close to 3.8GHz (the 2600 can turbo to that speed), but it's the most significant revision to the Conroe core since its release back in 2006.
 

podspi

Golden Member
Jan 11, 2011
Yeah, it does look like Meyer's departure is unrelated to Llano and Bulldozer. If that is the case, unless they're gearing up for a quick sale, I think the board made a huge mistake. AMD is launching 3 major new products this year; that's not a good time to oust the CEO.

Back on topic, given the current clock speeds of K10 on 45nm, I think Bulldozer should be able to hit 5ghz (assuming yields are going well). As I said before, even Llano should be able to hit mid 4ghz (again assuming yields are good).
 

HW2050Plus

Member
Jan 12, 2011
podspi said:
So to recap:
Stars: 1
Nehalem: 1.4
Sandy Bridge: 1.6
So let's assume the above numbers (which I do not agree with).

Then Bulldozer gets an additional 80% from the BD module design.

So what you can roughly expect:
Bulldozer: ~1.8 (per module)
Bulldozer: ~1.0 (per core)
Sandy Bridge: ~ 1.6 (per HT-pair)
Sandy Bridge: ~ 1.2 (per core)
Nehalem: ~ 1.4 (per HT-pair)
Nehalem: ~ 1.1 (per core)
Stars: ~ 1.0 (per HT-pair -> has none)
Stars: ~ 1.0 (per core)
Future SB successor with module tech: ~ 2.1 (per module)
Future SB successor with module tech: ~ 1.2 (per core)

So Bulldozer will be somewhat faster than Sandy Bridge. By how much must be shown by actual benchmark results.
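Taking the per-module and per-HT-pair figures at face value, "somewhat faster" can be put as a number. A minimal sketch, assuming a 4-module Bulldozer against a 4-core Sandy Bridge (the pairing is an illustrative choice, not part of the estimate).

```python
# Per-unit estimates from the list above (relative to one Stars core).
PER_UNIT = {
    "Bulldozer module": 1.8,
    "Sandy Bridge HT pair": 1.6,
}

def chip_score(units: int, per_unit: float) -> float:
    return units * per_unit

# Illustrative matchup (an assumption): 4 BD modules vs. 4 SB cores.
bd = chip_score(4, PER_UNIT["Bulldozer module"])
sb = chip_score(4, PER_UNIT["Sandy Bridge HT pair"])
print(f"BD {bd:.1f} vs SB {sb:.1f}: BD ahead by {bd / sb - 1:.1%}")
# BD 7.2 vs SB 6.4: BD ahead by 12.5%
```

On these numbers the gap is simply 1.8 / 1.6 = 1.125, i.e. 12.5% per module versus per HT pair, whatever the unit count.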
 

Kuzi

Senior member
Sep 16, 2007
podspi said:
So to recap:
Stars: 1
Nehalem: 1.4
Sandy Bridge: 1.6
Bulldozer (2-thread/Mod): 1.125
Bulldozer (1-thread/Mod): 1.4

As IntelUser2000 explained, Nehalem's IPC is not 40% higher than Phenom II's. I think it's more accurate to say: Stars 1, Nehalem 1.2, SB 1.4.

podspi said: "All of that being said, I think the worst case scenario is that when the modules are fully loaded, Bulldozer will be able to deliver Stars-level performance."

A better way to think of it is that the performance gain from running 2 threads on 1 module would be slightly less than when running 2 threads on two "Full" cores (or on two different modules). But I still believe that BD will have enough IPC improvements that even in worst case scenarios 2 threads running on a module would be faster than 2 Stars cores.

podspi said: "In this case, Bulldozer is still 'close' to Gulftown for multi-threaded workloads, and could conceivably compete in single-threaded performance if AMD manages to push the clocks high enough (4.5-5ghz)."

My guess/estimate for Bulldozer's IPC improvement over Stars would be 10-15%, and when all modules are fully loaded, the advantage drops to 5-10% (still faster). If we assume such numbers are correct, then BD's IPC would end up being slightly lower than Nehalem's.

To compensate BD would have to be clocked higher than competing Intel processors. I'll give an example of how BD would compare in such a case:

Relative Performance normalized to a Stars/K10.5 core:
Phenom 2 945 (3GHz) (8MB total cache) (die: 258mm^2) running 4 threads = 4
Core i7-950 (3GHz) (9MB total cache) (die: 263mm^2) running 4 threads = 1.2*4 = 4.8
Core i7-980X (3.33GHz) (13.5MB total cache) (die 248mm^2) running 4 threads = 1.332*4 = 5.32
Core i7-2600K (3.4GHz) (9MB total cache) (die: 216mm^2) running 4 threads = 1.586*4 = 6.34
2-Module BD (3GHz) (8MB total cache) (die: ~140mm^2) running 4 threads = 1.05*4 = 4.2
2-Module BD (3.6GHz) (8MB total cache) (die: ~140mm^2) running 4 threads = 1.26*4 = 5.04
2-Module BD (3.8GHz) (8MB total cache) (die: ~140mm^2) running 4 threads = 1.33*4 = 5.32
4-Module BD (3GHz) (16MB total cache) (die: ~280mm^2) running 4 threads = 1.1*4 = 4.4
4-Module BD (3.6GHz) (16MB total cache) (die: ~280mm^2) running 4 threads = 1.32*4 = 5.28

A few things to note here:
- The 2-Module BD should be tiny
- BD shouldn't need 4.5GHz to compete with i7/Gulftown on single-thread performance; a frequency between 3.4-3.8GHz should do.
- SB will be untouchable when running 4 threads or less.
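The 4-thread list above follows one rule: scale each IPC estimate linearly with clock from a 3GHz baseline, then multiply by the thread count. A minimal Python sketch of that rule, assuming perfect clock scaling (no memory or cache effects), recomputes the entries.

```python
# 4-thread list: per-core score = IPC at 3 GHz x (clock / 3 GHz),
# chip score = per-core score x threads. IPC inputs are the estimates above:
# Stars 1.0, Nehalem 1.2, SB 1.4, BD 1.05 (2 threads/module) or 1.1 (1 thread/module).

def score(ipc_at_3ghz: float, ghz: float, threads: int) -> float:
    return ipc_at_3ghz * (ghz / 3.0) * threads

ROWS = [
    ("Phenom II 945", 1.00, 3.0),
    ("Core i7-950", 1.20, 3.0),
    ("Core i7-980X", 1.20, 3.33),
    ("Core i7-2600K", 1.40, 3.4),
    ("2-Module BD", 1.05, 3.0),
    ("2-Module BD", 1.05, 3.6),
    ("2-Module BD", 1.05, 3.8),
    ("4-Module BD", 1.10, 3.0),
    ("4-Module BD", 1.10, 3.6),
]

for name, ipc, ghz in ROWS:
    print(f"{name} ({ghz}GHz), 4 threads: {score(ipc, ghz, 4):.2f}")
```

This reproduces the list; the only differences are in the last digit for the 980X (5.33 vs. 5.32) and the 2600K (6.35 vs. 6.34), where the list truncates rather than rounds.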

Relative performance when running 8 threads:
Core i7-950 (3GHz) (9MB total cache) (30% from HT) = 4.8+30% = 6.24
Core i7-2600K (3.4GHz) (9MB total cache) (40% from HT) = 6.34+40% = 8.87
Core i7-980X (3.33GHz) (13.5MB total cache) (30% from HT on the 2 cores running a second thread) = 1.332*6 + 2*0.3*1.332 = 7.99+0.80 = 8.79
4-Module BD (3GHz) (16MB total cache) = 1.05*8 = 8.4
4-Module BD (3.4GHz) (16MB total cache) = 1.19*8 = 9.52
4-Module BD (3.6GHz) (16MB total cache) = 1.26*8 = 10.08
4-Module BD (3.8GHz) (16MB total cache) = 1.33*8 = 10.64

What to note here:
- Nehalem based i7 looks weak in this case.
- SB and Gulftown offer comparable performance when running 8 threads.
- A 4-Module BD clocked between 3-3.2GHz should compete well against Gulftown and SB when running 8 threads.
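The 8-thread list can be sketched the same way, with one extra rule for Intel chips: threads beyond the physical core count land on Hyperthreading siblings, and each core that gets a second thread adds a fraction `ht_gain` of a core. The helper names are hypothetical; the IPC and HT figures are the ones used above.

```python
# 8-thread list. Each core that gets a second (Hyper-Threading) thread
# adds `ht_gain` of a core; BD's 1.05 already includes module sharing.

def per_core(ipc_at_3ghz: float, ghz: float) -> float:
    return ipc_at_3ghz * ghz / 3.0

def score_threads(core_score: float, cores: int, threads: int, ht_gain: float = 0.0) -> float:
    busy = min(cores, threads)                       # cores with one thread
    doubled = max(0, min(cores, threads - cores))    # cores with a second thread
    return core_score * (busy + doubled * ht_gain)

ROWS = [
    ("Core i7-950 (3GHz)", per_core(1.2, 3.0), 4, 0.30),
    ("Core i7-2600K (3.4GHz)", per_core(1.4, 3.4), 4, 0.40),
    ("Core i7-980X (3.33GHz)", per_core(1.2, 3.33), 6, 0.30),
    ("4-Module BD (3GHz)", per_core(1.05, 3.0), 8, 0.0),
    ("4-Module BD (3.4GHz)", per_core(1.05, 3.4), 8, 0.0),
    ("4-Module BD (3.6GHz)", per_core(1.05, 3.6), 8, 0.0),
    ("4-Module BD (3.8GHz)", per_core(1.05, 3.8), 8, 0.0),
]

for name, core_score, cores, ht_gain in ROWS:
    print(f"{name}, 8 threads: {score_threads(core_score, cores, 8, ht_gain):.2f}")
```

The 980X only reaches 8.79 if the 30% HT gain applies to just the 2 cores carrying a second thread (8 threads on 6 cores), which is how this sketch models it; the 2600K comes out at 8.89 rather than 8.87 because the sketch does not truncate the 4-thread score first.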

// End of Speculation :)
 
Dec 30, 2004
My guess is they wouldn't bother leaking this to stave off Intel purchases if they didn't actually have the performance, because people would just put off their purchase and then go Intel anyways when BD benchmarks are released.
 
Dec 30, 2004
See above: it's not quite 40%, but it's well above 15% or 20%. People are either seriously underselling Nehalem or overrating Phenom II IPC. Seriously, it's at the same level as 65nm Core 2 Quads, aka Kentsfield... http://www.anandtech.com/bench/Product/53?vs=86

I think a more realistic scaling would be

Phenom II: 1.0
Nehalem: 1.3 - 1.35x
Sandy Bridge: 1.4 - 1.45x (SB is not 20% faster than Nehalem per clock, more like 10 - 15%)

I think the clocking ability of Bulldozer will be just as important as, if not more important than, its IPC level. If it can get to 4GHz+, even with a moderate IPC increase over PH II, it'll put on a decent show against SB/Gulftown, but perhaps not against 6/8-core Ivy Bridge, which will most likely be a step above.

All that being said, it's not inconceivable for AMD to have the performance crown for the first time since 2005, however short-lived it may be... they probably have a window of about a quarter or so, unless Intel pushes ahead the release of S2011 / 6-core SB if Bulldozer, well, bulldozes 4-core SB. :)

One final caveat - Intel is currently seriously sandbagging on SB frequencies. In fact, with the overclockability seen in every Intel CPU since Core 2, I'd say they have been sandbagging for the past 4 years! In a pinch Intel could easily release a 4GHz+ SKU to remain competitive. Most 2500K/2600Ks hit 4.2 - 4.3GHz on stock volts, and 4.5GHz+ with slight voltage bumps.

not to mention sandbagging the cache latencies in the dual cores.

Sandybridges are made of sandbags!
 

nyker96

Diamond Member
Apr 19, 2005
maniac5999 said: "So, What we've had for the last half decade is a battle between two low-speed architectures..."

It's very true: two low-speed designs battling it out, but of course Intel's got the better design. But now that process technology is catching up, even the low-speed designs can be clocked higher; it's almost 5GHz now, amazing!
 

BD231

Lifer
Feb 26, 2001
I have my doubts, but AMD has blindsided us with some decent ingenuity in the past, and we really have no reason to think AMD won't release a viable product. I doubt they'd be that helpless this far down the line, and while Intel can spank AMD's current offerings 2 generations removed, it's not like AMD hasn't had viable products for us.
 

JFAMD

Senior member
May 16, 2009
565
0
0
"My guess is they wouldn't bother leaking this to stave off Intel purchases if they didn't actually have the performance, because people would just put off their purchase and then go Intel anyways when BD benchmarks are released."

It's just standard policy not to release benchmarks prior to launch. And I don't approve leaks.
 

RyanGreener

Senior member
Nov 9, 2009
550
0
76
JFAMD said: "It's just standard policy not to release benchmarks prior to launch. And I don't approve leaks."

When it is said that these are coming out in Q2, does that mean the beginning or the end of Q2? I'm interested to see how long we'll have to wait before we can see any results.
 
Dec 30, 2004
JFAMD said: "It's just standard policy not to release benchmarks prior to launch. And I don't approve leaks."

Something could be standard policy but this is a non-standard scenario :)
 

Idontcare

Elite Member
Oct 10, 1999
JFAMD said: "It's just standard policy not to release benchmarks prior to launch. And I don't approve leaks."

Just quoting John to add my usual spiel to his comments: at best, any would-be leaked performance results are going to be lower limits (underestimates) of actual performance, because the platform itself is naturally still undergoing optimizations and tweaks, let alone any steppings of the CPU itself.

I personally deplore "leaks" as they generally add very little tangible or genuine value for the reader. Sure, they whet our appetite and give us something to be distracted by while time rolls on, but when was the last time any of us saw leaks of a CPU that meant a damn to us months later when the chip actually launched?

For example I like Coolaler's leaks as much as the next guy, but all the Prime95 leaks and SPi 1M benches in the world didn't really tell us much about the price/performance or performance/watt story of Sandy Bridge, let alone even nail the performance number itself for reasons mentioned at the outset of this post.
 

JFAMD

Senior member
May 16, 2009
565
0
0
RyanGreener said: "When it is said that these are coming out in Q2, does that count as the beginning or end of Q2? Interested to see how long we'll have to wait before we can see any results."

Q2 is April 1 through the end of June. We don't get any more granular. I have not seen (in my time) month granularity unless it happened to be the last month of the quarter, and at that point it was pretty obvious, so we switched to "this month" instead of giving the quarter.

"Something could be standard policy but this is a non-standard scenario :)"

Actually not. What drives this is not wanting to disrupt the current business for our partners.

If you think about it, this is not just about AMD but also about partners. People have forecasts and commitments. Suddenly throwing out some numbers into the market causes churn. It causes stalled sales. It causes returns. It causes all kinds of problems.

The number of incremental sales is much smaller than the overall big picture impact to both our business and our partners' business.