With the current rate of Intel CPU performance increases, could AMD be catching up?


Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
In regard to the bolded part, the reality check equals a big "ugh" here :(

Consider this - if you took the AMD that exists today (financially, human resources, market presence, etc.) and time-ported it back to circa 2006, when Bulldozer was just starting development, and tasked present-day AMD with developing what would one day become Bulldozer (at the same development costs that existed back then), there is virtually no chance AMD would have had the resources to even develop the Bulldozer (and Piledriver) that exists today.

They worked on borrowed time and money just to develop Bulldozer. That was their hail-mary attempt after Core 2 Duo devastated their Phenom/Opteron revenue.

Now you want to talk about the prospects of an even less capable AMD (fewer employees, fewer resources, etc.), facing an environment where IC design and validation is not just more expensive but a LOT more expensive (future nodes)...and you want to speculate on the likelihood that that AMD is going to come out with a "next-gen" microarchitecture that will supplant the Bulldozer lineage?

I don't see how it is possible. The math just doesn't add up. I think we'll see AMD fall back to evolving their Bobcat successors and become VIA by a different name.

Yes, Kaveri is probably it for big cores. From what I've read, AMD is borrowing money to feed their R&D budgets right now. The only way AMD could afford to design another big core (still evolutionary, not really "next-gen" from the ground up - but still a major redesign in the core) would be to drop the ARM nonsense, but I'm afraid that won't happen, since it's Papermaster's baby.

So, I agree with you, AMD is going to become like VIA (as ShintaiDK suggested last year), though I think they will have a stronger product portfolio than VIA.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Johan De Gelas here at AnandTech found it to be primarily a matter of three things: low clock speeds (relative to the pipeline length), an L1 instruction cache that is too small, and a branch misprediction penalty that is too large. Cache latency was not a major factor.

http://www.anandtech.com/show/5057/the-bulldozer-aftermath-delving-even-deeper

You've referred to this before to refute what others claimed can be bottlenecks for BD, but rather than explain again why I think you're overgeneralizing his limited results let me just refer you to what he said on the last page:

We do agree that it is a serious problem for desktop applications as most of our profiling shows that games and other consumer applications are much more sensitive to L2 cache latency. It was after all one of the reasons why Nehalem was not much faster than the older Penryn based CPUs. Lowly threaded desktop applications run best in a large, low latency L2 cache. But for server applications, we found worse problems than the L2 cache.

His investigation was mainly in server and HPC applications. He says flat out that he doesn't think his conclusions apply to desktop applications.
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,423
663
126
Now you want to talk about the prospects of an even less capable AMD (fewer employees, fewer resources, etc.), facing an environment where IC design and validation is not just more expensive but a LOT more expensive (future nodes)...and you want to speculate on the likelihood that that AMD is going to come out with a "next-gen" microarchitecture that will supplant the Bulldozer lineage?

Just wondering what you mean by "next-gen microarchitecture"? Do you mean:

a) A similar huge leap/redesign as was done when going from e.g. the NetBurst->Conroe (Core) architecture?
or
b) Just a "normal" next-gen architecture such as when going from e.g. SB->Haswell?

If you're talking about a), then not even Intel is planning any such architectural change as far as I'm aware. So is there any reason AMD will have to? Isn't there enough room for improvement for AMD by doing b) only? I.e. is AMD's current architectural base so seriously flawed that a) is needed?

Reading the past few posts I don't get that impression. Instead there seem to be several obvious tweaks that could be made to their current architecture that would improve performance quite a lot without requiring a completely redesigned microarchitecture as in a). But perhaps I misinterpreted that? :hmm:
 

mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
If you're talking about a), then not even Intel is planning any such architectural change as far as I'm aware. So is there any reason AMD will have to? Isn't there enough room for improvement for AMD by doing b) only? I.e. is AMD's current architectural base so seriously flawed that a) is needed?

They *do* need a redesign if they are to stay in the game, if not for the atrocious performance/watt, then because of the even worse performance/area of the Bulldozer architecture. Within the CPU area of Trinity, at the same node, Intel could fit two cores *and* a GPU. Within the area of Bulldozer, Intel could fit one and a half SNB dies. Go to the next node and things become even worse: a bigger GPU and two cores in less area than the CPU portion of Trinity, or two 4C IVB dies in a Bulldozer/Vishera die.

This will be reflected in the COGS. AMD's COGS for the same node at the same maturity point should be higher than Intel's, and if Intel's yields are better, things become even worse for AMD.
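To put very rough numbers on the cost angle, here's a back-of-the-envelope dies-per-wafer sketch. The die areas and wafer cost below are assumed, illustrative figures (not actual Trinity or SNB numbers), and yield is ignored:

```python
# Rough cost-per-die sketch for the COGS argument above.
# Die areas and wafer cost are assumptions for illustration, not measured figures.
import math

WAFER_DIAMETER_MM = 300
WAFER_COST_USD = 5000        # assumed wafer cost, illustrative only

def dies_per_wafer(die_area_mm2: float) -> int:
    """Classic dies-per-wafer approximation (ignores defect density and yield)."""
    d = WAFER_DIAMETER_MM
    return int(math.pi * (d / 2) ** 2 / die_area_mm2
               - math.pi * d / math.sqrt(2 * die_area_mm2))

for name, area in [("smaller die (assumed 120 mm^2)", 120.0),
                   ("larger die (assumed 180 mm^2)", 180.0)]:
    n = dies_per_wafer(area)
    print(f"{name}: ~{n} dies/wafer, ~${WAFER_COST_USD / n:.2f} per die")
```

Even with equal yields, the larger die costs roughly 50% more per unit to produce; worse yields on the larger die only widen that gap.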

So yes, they do need a redesign, but they can't afford it, hence IDC's conclusion that they are going to become VIA on steroids.
 

Piroko

Senior member
Jan 10, 2013
905
79
91
I'm not so sure whether this is a problem related to Bulldozer or to the manufacturing process. From a transistor-count view Trinity is competitive; we might be seeing the result of GLF stretching the numbers of their process.

Also, Bobcat was positively tiny on TSMC's 40nm process, so...
 

mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
I'm not so sure whether this is a problem related to Bulldozer or to the manufacturing process. From a transistor-count view Trinity is competitive; we might be seeing the result of GLF stretching the numbers of their process.

I don't think this comes from GLF in any way, as the CPU portion of Llano isn't much bigger than what a theoretical Deneb die shrink sans L3 would look like, meaning that there doesn't seem to be any problem with GLF's 32nm.

Also this:

http://www.tomshardware.com/news/Steamroller-High_Density_Libraries-hot-chips-cpu-gpu,17218.html

That shouldn't be possible in a highly hand-optimized design. It points to design inefficiencies: synthesis with high-density libraries removing the die-area penalty that synthesis introduced in the first place, at the cost of further clock speed and efficiency penalties.
 

pyjujiop

Senior member
Mar 17, 2001
243
0
76
AMD closed the gap a tiny bit with Piledriver relative to IB, but at that rate, they'd probably catch Intel in about 25 years.
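For what it's worth, a toy compounding model reproduces that kind of timescale; the starting gap and the yearly improvement rates below are assumptions chosen only to illustrate the arithmetic, not measured figures:

```python
# Toy compounding model: how long it takes to close an assumed performance gap
# if AMD improves slightly faster per year than Intel. All numbers are assumptions.
intel_perf, amd_perf = 1.00, 0.70   # assumed relative single-thread performance today
intel_gain, amd_gain = 1.08, 1.095  # assumed yearly improvement factors

years = 0
while amd_perf < intel_perf and years < 100:
    intel_perf *= intel_gain
    amd_perf *= amd_gain
    years += 1

print(f"Gap closes after ~{years} years")  # ~26 years with these assumed numbers
```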

AMD has actually beaten Intel twice, first with the original Athlon in 1999, and then with the introduction of the A64 in 2003. Both times, Intel won back the performance crown fairly quickly, though. AMD could conceivably come up with another design that's a quantum leap forward, but I think they'd still have a hard time beating Intel now, because:

1. The Intel of today executes a lot better than they did back then, when the biggest headlines from them were about fiascos like RDRAM, the i820 bug, and the Coppermine 1133 recall.
2. Intel's manufacturing capability is far beyond anything that is available to AMD. The third-party foundries that AMD relies on are technologically inferior to Intel's own production facilities. Because of this, AMD products will always have higher power draw that limits their ability to just bump up clock speeds.

Even if AMD pulled another rabbit out of its hat, Intel would probably just move its ultra high-end parts downmarket, like they did in the days of the Pentium 4 Emergency Edition. Only now, their high-end part isn't a crappy NetBurst egg-fryer.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
AMD closed the gap a tiny bit with Piledriver relative to IB, but at that rate, they'd probably catch Intel in about 25 years.

AMD has actually beaten Intel twice, first with the original Athlon in 1999, and then with the introduction of the A64 in 2003. Both times, Intel won back the performance crown fairly quickly, though. AMD could conceivably come up with another design that's a quantum leap forward, but I think they'd still have a hard time beating Intel now, because:

1. The Intel of today executes a lot better than they did back then, when the biggest headlines from them were about fiascos like RDRAM, the i820 bug, and the Coppermine 1133 recall.
2. Intel's manufacturing capability is far beyond anything that is available to AMD. The third-party foundries that AMD relies on are technologically inferior to Intel's own production facilities. Because of this, AMD products will always have higher power draw that limits their ability to just bump up clock speeds.

Even if AMD pulled another rabbit out of its hat, Intel would probably just move its ultra high-end parts downmarket, like they did in the days of the Pentium 4 Emergency Edition. Only now, their high-end part isn't a crappy NetBurst egg-fryer.

Remember the headroom. 77W IB vs a 140W+ PD. AMD is already way beyond spec and beyond P4 in consumption.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Just wondering what you mean by "next-gen microarchitecture"? Do you mean:

a) A similar huge leap/redesign as was done when going from e.g. the NetBurst->Conroe (Core) architecture?
or
b) Just a "normal" next-gen architecture such as when going from e.g. SB->Haswell?

If you're talking about a), then not even Intel is planning any such architectural change as far as I'm aware. So is there any reason AMD will have to? Isn't there enough room for improvement for AMD by doing b) only? I.e. is AMD's current architectural base so seriously flawed that a) is needed?

Reading the past few posts I don't get that impression. Instead there seem to be several obvious tweaks that could be made to their current architecture that would improve performance quite a lot without requiring a completely redesigned microarchitecture as in a). But perhaps I misinterpreted that? :hmm:

Well, if we take Rory's statements as being true then he has already written off AMD doing either (a) or (b).

Big cores and leading-edge process nodes are both written off as being the old way the old AMD approached the market. It would make him a liar if he told the analysts that but then internally kept spending on and prioritizing the development of big cores on leading-edge nodes.

Now the reality is Rory had little choice but to make that his strategy, because the cost associated with developing big cores is immense, and it gets even more immense as you go to newer nodes. AMD simply doesn't have the cash to do that anymore; it's not a matter of willpower and desire.

If you listened to that last conference call that mrmt linked, with Rory talking about where AMD is going, I think it is pretty clear that Rory is prioritizing AMD's remaining resources towards furthering the Bobcat/Jaguar APU lineage going forward.

It will still be competitive; no one else is capable of marrying x86 compatibility to GPU capability like AMD can. But the x86 compatibility puts them into VIA-like territory: niche market spaces (like the PS4) that others can't get into, while the GPU IP keeps them out of reach of pretty much all others (they can easily defend themselves from would-be intruders into their niche market spaces once they are established in them).

The question is what sort of TAM does that leave AMD to play in? Is it enough TAM to enable a $6B/yr revenue or is the TAM from all those niche markets only large enough to support a $2B/yr revenue model?

It is obvious that AMD is going to become a smaller fish as they try to find ponds in which they can survive. But how much smaller is that going to be?

As for (a) and (b) above, I personally doubt we'll see AMD finish out Excavator, and I also don't see how AMD could hope to find the cash needed to support the R&D of building a next-gen core after Steamroller. I think Kaveri is going to be it for big cores from AMD unless someone swoops in and hands them $10B or so.
 

Pilum

Member
Aug 27, 2012
182
3
81
Having only two integer pipelines per core is yet another big negative (down from 3 in Thuban). I have no idea if AMD will stick to the BD architecture after Kaveri, but they will likely wait till 20nm, IMHO, to release a redesigned, rather than tweaked, core. That is if they are still around and have the R&D budget to do a major redesign.

Three int ports per core would be a nice improvement in peak performance and especially helpful in handling today's bursty compiled code profiles/traces.
Maybe they'll get the AGLUs to work as advertised. Remember that AMD indicated that the AGUs in BD were serving double-time as ALUs for simple operations (ADD, INC). While this turned out not to be the case in BD and PD, maybe they can achieve this in SR. This would reduce a lot of pressure on the 2 proper ALUs.

It's hard to judge how this would turn out in practice; I guess it might result in the effective performance of a 3-wide ALU issue core.

But of course this would require a full-fledged Result Forwarding Network between the ALUs and AGUs, and to my knowledge these are rather expensive in area and power. Whether this is a feasible option for a design that is already beyond its power wall remains to be seen.
 

Pilum

Member
Aug 27, 2012
182
3
81
But note that the Intel chips also have a Turbo frequency about 0.5 GHz higher than the base frequency. So either they'd have to skip that, or they'd be at 5.0 GHz in Turbo. That would mean a TDP @ 200 W... :eek:
IVB has two turbo settings: +200 MHz for 4-core/3-core loads, and +400 MHz for 2-core/1-core loads. So the TDP would be ~145 W. That's very close to the practical thermal power of the 8350, so this really shouldn't be a problem for people who can accept AMD's current products. :)
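For anyone wondering how such ballpark TDP figures get estimated, here's a minimal sketch assuming dynamic power scales roughly with f*V^2; the target clocks and voltage bumps are assumptions for illustration, so it won't reproduce any particular figure above exactly:

```python
# Rough dynamic-power scaling sketch: P ~ f * V^2.
# Target clocks and the assumed voltage increases below are illustrative only.
BASE_TDP_W = 77.0   # rated TDP of the 77 W Ivy Bridge part
BASE_CLOCK = 3.5    # GHz, assumed reference clock

def scaled_tdp(target_clock_ghz: float, voltage_bump: float) -> float:
    """Estimate power after raising clock and core voltage."""
    return BASE_TDP_W * (target_clock_ghz / BASE_CLOCK) * voltage_bump ** 2

for clock, vbump in [(4.6, 1.10), (5.0, 1.15)]:
    print(f"~{scaled_tdp(clock, vbump):.0f} W at {clock} GHz (assumed Vcore x{vbump})")
```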
 

Homeles

Platinum Member
Dec 9, 2011
2,580
0
0
You've referred to this before to refute what others claimed can be bottlenecks for BD, but rather than explain again why I think you're overgeneralizing his limited results let me just refer you to what he said on the last page:

His investigation was mainly in server and HPC applications. He says flat out that he doesn't think his conclusions apply to desktop applications.
How do they not? He may not have looked into desktop applications, but all of the weaknesses that he found are at a lower level than the L2. Better branch prediction and higher clock speed are universally beneficial. And a larger L1 I-cache should help negate the performance hit of going from one thread/module to two.
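To make the "universally beneficial" point concrete, here's a minimal CPI sketch showing how a shorter misprediction penalty plus a higher clock help any workload with branches in it; every input below is an assumed, illustrative number:

```python
# Minimal CPI model: average time per instruction including a branch
# misprediction penalty. All inputs are assumed, illustrative numbers.
def ns_per_instruction(base_cpi, branch_freq, mispredict_rate,
                       penalty_cycles, clock_ghz):
    cpi = base_cpi + branch_freq * mispredict_rate * penalty_cycles
    return cpi / clock_ghz

slow = ns_per_instruction(1.0, 0.20, 0.05, 20, 4.0)  # long penalty, lower clock
fast = ns_per_instruction(1.0, 0.20, 0.05, 14, 4.4)  # shorter penalty, higher clock
print(f"{slow:.3f} vs {fast:.3f} ns/instruction "
      f"(~{(1 - fast / slow) * 100:.0f}% faster)")
```

Nothing in that arithmetic cares whether the branches come from a game or a database, which is the point.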

That's not to say that the uarch won't be freed from one bottleneck just to run into another. But the discussion we were having (unless I missed something?) was platform and application agnostic. I don't disagree that a slow, oversized L2 is not the best idea if you're "running lightly threaded applications." But our discussion had nothing to do with that.

I see the Bulldozer uarch as a workhorse anyway. Intel's architectures, by contrast, are racehorses. Obviously the workhorse will never be as fast as a racehorse when you're running laps around the track, but in this case, the racehorse is better at plowing the fields as well. That's a problem.
 
Last edited:

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
I thought the discussion was "why is BD slow." mrmt listed a bunch of complaints Darek Mihocka had. You can find a totally separate set from Agner Fog and various other people. You responded to say that it actually only had three real weaknesses because that's what Johan De Gelas stressed in his article, but this is ignoring that he deliberately only tried to draw conclusions for server and HPC workloads.

I'm not sure how lightly threaded performance was never part of the discussion, which as far as I could tell was only a vague question on AMD's ability to "catch up." But do note that the original post at least said desktop CPUs and not server CPUs.

I think you're trying to say that AMD made some deliberate tradeoffs where they sacrificed performance in some types of applications to win it back in others. It's going to be difficult for anyone to say what is and isn't a smart tradeoff, but if you're even going to try, someone first has to decide which programs are interesting, and that isn't something the original poster did.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
Maybe they'll get the AGLUs to work as advertised. Remember that AMD indicated that the AGUs in BD were serving double-time as ALUs for simple operations (ADD, INC). While this turned out not to be the case in BD and PD, maybe they can achieve this in SR. This would reduce a lot of pressure on the 2 proper ALUs.

It's hard to judge how this would turn out in practice; I guess it might result in the effective performance of a 3-wide ALU issue core.

But of course this would require a full-fledged Result Forwarding Network between the ALUs and AGUs, and to my knowledge these are rather expensive in area and power. Whether this is a feasible option for a design that is already beyond its power wall remains to be seen.

That would be a better solution in terms of die area and power, as you point out. There is only an insignificant number of instructions that lead to 3 int operations being carried out in PD (and then the second core is effectively starved).

What I outlined was basically using smaller nodes to move CMT towards a more traditional big core. Likewise, the shared FPU needs to go wider to 2x256b and add AVX2, simply for compatibility and a performance bump (though not as big a boost as Haswell will give Intel, since AMD is already using FMA). I think that a 2x5-port scheduler would be less complex than Haswell's 1x8-port one. In this type of scenario a two-module/four-core 'Excavator + GCN 2.0' could have been a formidable desktop processor @ 20nm. A four- to six-module Opteron may have been a good server CPU.

But this is just idle speculation. AMD will have to abandon big cores, as has been explained throughout this thread. Without the ARM development activities (a waste, IMHO) we may have seen Excavator - but unless AMD's financial fortunes changed, that would have simply extended their big-core line for one more generation.

Bottom line is that Bulldozer needed to deliver as promised (JFAMD) for AMD to gain market share on Intel and sustain its large-core development.
 

vampirr

Member
Mar 7, 2013
132
0
0
AMD's Piledriver "Vishera" (Bulldozer 2.0) chips are 32nm processors, and they can beat Intel's Ivy Bridge and the soon-to-be-released Haswell, which are 22nm. The FX-8350 can beat the i5-3570K; the only problem is Piledriver's process node.

Steamroller will be out by the end of this year (Q3/Q4). It will be a 28nm processor, so there will be room for better performance, tweaks and improvements, and also lower TDP and power consumption, and there are a couple of options AMD can pursue...

The Way of Intel:
AMD decides to considerably lower TDP/power consumption with 28nm Steamroller, with a 5-10% performance gain over Piledriver.

The Hybrid:
Lower TDP/power consumption with a 15-20% performance gain over Piledriver, ending up on the same level as Ivy Bridge and beating most Ivy Bridge models.

The Way of AMD:
Take a risk, go all out... Aim for a 25-30% performance gain over Piledriver, beating Ivy Bridge and most Haswell SKUs at the same TDP/power consumption.

^This is what AMD can do - three options... The Way of Intel is the most unlikely.^
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
The execution unit arrangement goes well beyond having 2 ALUs instead of 3, even if that's how AMD publicly described it. A lot of the arguments JFAMD posted on the AMD blog and elsewhere were not really accurate descriptions of either BD or K10.

BD has two sort-of K8-style integer units. Those do both ALU and AGU calculations. Only one of them can do branches, and you need to schedule to both of them to do stores (one to calculate the address and the other to perform the store; it's similar to Intel having separate store-address and store-data ports). So it's not just that you can't do 3 ALU operations simultaneously, but that you can't do 2 load/ALU + 1 branch, or 1 store + anything. This has an impact on a lot of real code. Sure, it could be masked if you have a bigger bottleneck earlier, like in the decode throughput or L1 way collisions while sharing the module. But of course this is never going to be true for all code, nor are all users going to fully occupy 4 modules (which is a far cry from saying they're only using lightly threaded code).
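To illustrate how those pairing rules eat into throughput, here's a toy scheduling sketch. It is deliberately simplified (no dependencies, no latencies, not cycle-accurate) and only encodes the rules described above: a store ties up both pipes for a cycle, and only one of the two ops issued per cycle can be a branch:

```python
# Toy 2-pipe issue model following the pairing rules described above.
# Simplified illustration only; not a cycle-accurate Bulldozer model.
def cycles_needed(instr_mix):
    i, used_cycles = 0, 0
    while i < len(instr_mix):
        a = instr_mix[i]
        i += 1
        if a != "store" and i < len(instr_mix):
            b = instr_mix[i]
            # Two ops can dual-issue unless one is a store or both are branches.
            if b != "store" and not (a == "branch" and b == "branch"):
                i += 1
        used_cycles += 1
    return used_cycles

mix = ["alu", "alu", "store", "alu", "branch", "alu", "store", "load"]
c = cycles_needed(mix)
print(f"{len(mix)} instructions in {c} cycles ({len(mix) / c:.2f} per cycle)")
```

With that assumed mix the cluster averages well under two instructions per cycle even with nothing else in the way.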
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Steamroller will be out by the end of this year (Q3/Q4). It will be a 28nm processor, so there will be room for better performance, tweaks and improvements, and also lower TDP and power consumption, and there are a couple of options AMD can pursue...

Piledriver is 32nm HKMG w/SOI. 28nm is HKMG bulk-Si.

What you posted is only true if you feel SOI adds no value above and beyond what 32nm HKMG bulk-Si would offer, such that shrinking to 28nm HKMG bulk-Si would offer a shrink benefit.

I foresee an areal shrink benefit at 28nm (it lowers production cost for AMD), but not a power-reduction or clock-speed increase benefit.
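For a sense of scale, an idealized linear shrink gives the following (real layouts rarely scale this perfectly, so treat it as a best-case estimate of the area saving):

```python
# Idealized area scaling for a 32nm -> 28nm linear shrink.
# Real designs rarely shrink this well; treat it as a best-case estimate.
old_node, new_node = 32.0, 28.0
area_ratio = (new_node / old_node) ** 2
print(f"Ideal area: {area_ratio:.2f}x the original (~{(1 - area_ratio) * 100:.0f}% smaller)")
```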
 

Homeles

Platinum Member
Dec 9, 2011
2,580
0
0
I thought the discussion was "why is BD slow." mrmt listed a bunch of complaints Darek Mihocka had. You can find a totally separate set from Agner Fog and various other people. You responded to say that it actually only had three real weaknesses because that's what Johan De Gelas stressed in his article, but this is ignoring that he deliberately only tried to draw conclusions for server and HPC workloads.
Did you not read anything I had written?
I think you're trying to say that AMD made some deliberate tradeoffs where they sacrificed performance in some types of applications to win it back in others. It's going to be difficult for anyone to say what is and isn't a smart tradeoff, but if you're even going to try, someone first has to decide which programs are interesting, and that isn't something the original poster did.
The problem is that they sacrificed both of those "types of applications."
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
If you listened to that last conference call that mrmt linked, with Rory talking about where AMD is going, I think it is pretty clear that Rory is prioritizing AMD's remaining resources towards furthering the Bobcat/Jaguar APU lineage going forward.

They have already said they will continue in the server market, and I don't believe they were only talking about small-core designs.

It is one thing that they changed their focus to smaller cores, and another that they will never design a big core again.

The SR design is finished; Excavator will be an update, like PD is to Bulldozer. It is 99% sure those two will be released in the next couple of years. They spent a lot on the Bulldozer architecture (CMT etc.) and they are going to keep it for 4-5 years.

Like Intel, AMD is focusing on low-power mobile products and servers; they simply are not competing for high-end desktops. They can release an SR on the desktop, like BD and PD, simply because they will produce it for the server market.
 

Homeles

Platinum Member
Dec 9, 2011
2,580
0
0
AMD has been suspiciously quiet. I'm chalking it up to being a bad thing (e.g. the scenario that IDC has painted), rather than a good thing.
 

mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
AMD has been suspiciously quiet. I'm chalking it up to being a bad thing (e.g. the scenario that IDC has painted), rather than a good thing.

Quiet? The CEO himself is giving speeches at conferences. I wouldn't call that quiet.
 

mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
That's not the kind of noise we want to hear. Where are the roadmaps?

They gave a roadmap for Kabini, Temash and for ARM server chips - the products that serve the areas deemed essential in AMD's new strategy. What kind of noise do you want beyond this?

If you are expecting a big-bang announcement of the end of their big-core line, forget it. They still need to fill the WSA quarterly quota, meaning that they will order big cores until they can find someone to buy them.

Think of it as a slow death, not a quick one.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Did you not read anything I had written?

I read everything you wrote and I'm really struggling to determine what your problem is. What I said was an exact description of what happened. mrmt said one person had various criticisms; you replied to say that Johan De Gelas determined that the issues with BD were primarily due to three problems. Nowhere did anyone say anything about desktop or server apps.

You've played this game before: any time someone says that BD has any weaknesses, you link that article to say those weaknesses aren't actually weaknesses at all.