With the current rate of Intel CPU performance increases, could AMD be catching up?

Homeles · Mar 22, 2013

any time someone says that BD has any weaknesses you link that article to say those weaknesses aren't actually weaknesses at all.

So you didn't read what I said. Got it.

Fjodor2001 · Mar 22, 2013

Idontcare said:
Well, if we take Rory's statements as being true then he has already written off AMD doing either (a) or (b).

Big cores and leading edge process nodes are both written off as being the old way the old AMD approached the market. It would make him a liar if he tells the analysts that but then internally he keeps spending and prioritizing the development of big cores on leading edge nodes.

Now the reality is Rory had little choice but to make that his strategy because the cost associated with developing big cores is immense, and it gets even more immense as you go to newer nodes. AMD simply doesn't have the cash to do that anymore, its not a matter of willpower and desire.

If you listened to that last conference call that mrmt linked up with Rory talking about where AMD is going I think it is pretty clear that Rory is prioritizing AMD's remaining resources towards furthering the bobcat/jaguar APU lineage going forward.

It will still be competitive, no one else is capable of marrying x86 compatibility to GPU capability like AMD can. But the x86 compatibility puts them into Via territory for niche market spaces (like the PS4) that others can't get into while the GPU IP keeps them out of reach from pretty much all others (they can easily defend themselves from would-be intruders into their niche market spaces once they are established in them).

The question is what sort of TAM does that leave AMD to play in? Is it enough TAM to enable a $6B/yr revenue or is the TAM from all those niche markets only large enough to support a $2B/yr revenue model?

It is obvious that AMD is going to become a smaller fish as they try to find ponds in which they can survive. But how much smaller is that going to be?

As for (a) and (b) above, I personally doubt we'll see AMD finish out Excavator, and I also don't see how AMD could hope to find the cash needed to support the R&D of building next-gen after Steamroller. I think Kaveri is going to be it for big cores from AMD unless someone swoops in and hands them $10B or so.

AMD has obviously been able to bring out both Llano, Trinity and soon Kaveri despite the economic constraints you mentioned. Why shouldn't they be able to continue like that going forward?

Also, according AtenRa the statement from Rory does not necessarily have to be interpreted as AMD leaving the big core x86 desktop CPU market segment.

Exophase · Mar 22, 2013

Homeles said:
So you didn't read what I said. Got it.

Quit screwing around and playing games. If you have a point to make make it. And stop telling me I didn't read for describing exactly what happened.

mrmt says that Darek gave reasons for why BD is so slow:

Darek Mihocka also did some low level analysis of Bulldozer and he gave us a hint on why Bulldozer is so slow:

You said that it was actually primarily only due to three reasons:

Johan De Galas here at Anandtech found it to be primarily comprised of three things: low clock speeds (relative to the pipeline length), L1 instruction cache is too small, branch misprediction penalty is too large. Cache latency was not a major factor.

Then I said:

mrmt said one person had various criticisms, you replied to say that Johan De Gelas determined that there issues with BD were primarily due to three problems. No where did anyone say anything about desktop or server apps.

Then you have the gall to repeatedly telling me I'm not reading. Maybe the problem is I can't read whatever point it is you're not expressing. If it's that stuff about its problems being more "low level" than L2 latency and about improving L2 latency being "treating the symptom" that's just a load of crap. L2 latency to L1 data cache actually does matter and increasing the size of L1 icache 1.5x or whatever it is won't always reduce the miss rate as much or even at all.

AtenRa · Mar 22, 2013

mrmt said:
With the CPU area of Trinity, at the same node, Intel could fit two cores *and* a GPU.

AMD Module (excluding L2) at 32nm = 19,42mm2 with 67M tranistors

Intel SandyBridge Core (+HT) (excluding L2) at 32nm = 16,5mm2 with 55M transistors.

The difference in size is only 18% or 2.92mm2, that translates to Trinity being faster in the majority of the MT apps against the Core i3(SB).

mrmt said:
With the area of Bulldozer, Intel could fit one and a half SNB.

Again, If you compare the Core i7 3820 (294mm2) at 32nm against the FX8350 (315mm2) at 32nm, you will find that at almost the same die size the MT performance is almost the same.
The FX8350 has 8+8 = 16MB L2+L3 Cache vs 1+10 = 11MB L2+L3 Cache for the Core i7 3820.

http://www.anandtech.com/show/5276/intel-core-i7-3820-review-285-quadcore-sandy-bridge-e

2is · Mar 22, 2013

Why would you compare a 3820? How about a LGA 1155 processor which is a much more accurate comparison both in terms of cost and performance which comes in at 216mm2

Only reason to compare a 3820 is to make AMD's die sizes not seem as big as they really are.

AtenRa · Mar 22, 2013

2is said:
Why would you compare a 3820? How about a LGA 1155 processor which is a much more accurate comparison both in terms of cost and performance which comes in at 216mm2

Only reason to compare a 3820 is to make AMD's die sizes not seem as big as they really are.

Why ?? because that is the only Intel CPU to compare it apples to apples. Almost same transistor count for the CPU without any iGPUs (FX has 1.2B transistors vs 1.27B transistors for the Core i7 3820). Both were made for the Server first and Desktop secondly.

Blandge · Mar 22, 2013

AtenRa said:
Why ?? because that is the only Intel CPU to compare it apples to apples. Almost same transistor count for the CPU without any iGPUs (FX has 1.2B transistors vs 1.27B transistors for the Core i7 3820). Both were made for the Server first and Desktop secondly.

Important to note that the 3820 has 2 extra memory controllers and 24 more lanes of PCIe.

2is · Mar 22, 2013

AtenRa said:
Why ?? because that is the only Intel CPU to compare it apples to apples. Almost same transistor count for the CPU without any iGPUs (FX has 1.2B transistors vs 1.27B transistors for the Core i7 3820). Both were made for the Server first and Desktop secondly.

Right, because you buy your processors based on transistor count right? And all the comparison reviews? They chose their comparison reviews based on transistor count to right?

I'm going to let you in on something. When people buy a processor their main factors are going to be price and performance. NO one I know buys a CPU based on transistor count. Hence, your comparison is anything but apples to apples, and you know it.

The ONLY reason you compared a 2011 processor is to make AMD's die size look more efficient than it really is. There really is no other reason, at least none that makes sense. Transistor count... Jesus.

AtenRa · Mar 22, 2013

Blandge said:
Important to note that the 3820 has 2 extra memory controllers and 24 more lanes of PCIe.

Yes, and the FX8350 has 7MB more L2 cache(8+8 vs 1+10) and 4 HyperTransports(two of them are disabled for the desktop CPUs).

AtenRa · Mar 22, 2013

2is said:
Right, because you buy your processors based on transistor count right? And all the comparison reviews? They chose their comparison reviews based on transistor count to right?

I'm going to let you in on something. When people buy a processor their main factors are going to be price and performance. NO one I know buys a CPU based on transistor count. Hence, your comparison is anything but apples to apples, and you know it.

The ONLY reason you compared a 2011 processor is to make AMD's die size look more efficient than it really is. There really is no other reason, at least none that makes sense. Transistor count... Jesus.

We are not talking about purchasing the CPU, we are talking about die sizes, transistors and microarchitectures. So we compare at the same transistor count, same die sizes and same node process etc.

Maybe it is you that trying to make the Intel 1155 look more than it is.

2is · Mar 22, 2013

AtenRa said:
We are not talking about purchasing the CPU, we are talking about die sizes, transistors and microarchitectures. So we compare at the same transistor count, same die sizes and same node process etc.

Maybe it is you that trying to make the Intel 1155 look more than it is.

Yes, and Intel's comparable SB processor has a die size 216mm2. Period.

You're the only one using LGA2011 as a comparison, that should tell you something, and it isn't that you're right and everyone else is wrong.

Blandge · Mar 22, 2013

AtenRa said:
Yes, and the FX8350 has 7MB more L2 cache(8+8 vs 1+10) and 4 HyperTransports(two of them are disabled for the desktop CPUs).

Do fused off parts of the die get reported in transistor count?

AtenRa · Mar 22, 2013

2is said:
Yes, and Intel's comparable SB processor has a die size 216mm2. Period.

You're the only one using LGA2011 as a comparison, that should tell you something, and it isn't that you're right and everyone else is wrong.

Comparable to what ??? die size ?? transistor count ??? L caches ??

We are not comparing as consumers here, we are comparing CPUs apples to apples technically.

AtenRa · Mar 22, 2013

Blandge said:
Do fused off parts of the die get reported in transistor count?

AMD use the same transistor count both for the desktop and server parts, so i believe they count them.

Vesku · Mar 22, 2013

The comparison of 8350 to 3820 just shows how AMD went wide rather than spend time and $$$ on the core as Intel. Granted part of AMD's R&D money went down the drain when they re-targeted from 45nm to 32nm. Will be interesting to see how Steamroller performs as just from the preview information it looks like what JFAMD had us thinking Bulldozer would be. Almost dejavu of Phenom-Phenom II at this point.

Sidenote, looks to me that even 32nm vs 32nm Intel has a power efficiency advantage vs GloFo. Especially notable when comparing OC power consumption.

jpiniero · Mar 22, 2013

To answer the OP, yes. But it's not a game worth winning; at a time when that market is shrinking. On top of that, no OEM is going to want those 140W heat monsters. If SR's design really is finished, then they may as well release it; but anything further would be foolish. For them to survive, they have to make Jaguar work.

Pilum · Mar 22, 2013

AtenRa said:
AMD Module (excluding L2) at 32nm = 19,42mm2 with 67M tranistors

Intel SandyBridge Core (+HT) (excluding L2) at 32nm = 16,5mm2 with 55M transistors.

The difference in size is only 18% or 2.92mm2, that translates to Trinity being faster in the majority of the MT apps against the Core i3(SB).

Too bad that MT performance still doesn't matter for most use cases, especially on low-cost CPUs. People who do 3D rendering, number crunching, video transcoding/editing, etc. will rarely do that on low-end systems. And for most other tasks, single-threaded performance is still far more relevant.

Homeles · Mar 22, 2013

Exophase said:
Then you have the gall to repeatedly telling me I'm not reading. Maybe the problem is I can't read whatever point it is you're not expressing. If it's that stuff about its problems being more "low level" than L2 latency and about improving L2 latency being "treating the symptom" that's just a load of crap. L2 latency to L1 data cache actually does matter and increasing the size of L1 icache 1.5x or whatever it is won't always reduce the miss rate as much or even at all.

Still got it wrong. The "symptom" comment had nothing to do with the L1 I-cache.

Really, you're just putting words in my mouth. Knock it off already.

Idontcare · Mar 22, 2013

Fjodor2001 said:
AMD has obviously been able to bring out both Llano, Trinity and soon Kaveri despite the economic constraints you mentioned. Why shouldn't they be able to continue like that going forward?

Because the constraints are not the same. The AMD that was flush enough with revenue and cash to keep R&D fed for years to create those chips doesn't exist anymore. It doesn't have the same revenue, it doesn't even have the same employee count.

It has less of everything today versus back in the day when it designed the products you list.

Further the cost to develop next-gen products will be nothing like what it cost to develop the products you listed. It will be vastly more expensive. Node-over-node IC design costs tend to increase by around 30% per node.

AMD's future is a simply straightforward consequence of the math.

Via ran into the exact same unavoidable consequences of the math, albeit much sooner because they had less to begin with. But AMD is now in no different a situation.

Fjodor2001 said:
Also, according AtenRa the statement from Rory does not necessarily have to be interpreted as AMD leaving the big core x86 desktop CPU market segment.

Rory has no choice but to use every bit of marketing double-speak possible in this matter because he (1) still wants to sell piledriver and steamroller-based Opterons to the market, and (2) wants to assure the shareholders that he's not going to keep throwing good money after bad.

If the customers that Rory wants to convince to buy (1) learn about (2) then those customers won't buy (1). They won't buy into an EOL roadmap. That is the same issue Itanium and HP have at the moment, and why Oracle is making as much of it as they can.

Rory can't risk his potential customers getting cold-feet and hastening AMD's exit from the server market. In the meantime he doesn't have the cash or the shareholder support to invest $2-$4B into developing a next-gen Operaton microarchitecture. The money simply isn't there, nor are the priorities.

And he tries so hard to communicate the new direction for AMD to the analysts. But he can't spell it all out. He needs revenue to keep coming in to pay for the transition.

I think its great that people hold out hope for an AMD turnaround. Nothing wrong with seeing the glass as half full. But the numbers don't add up. You have to vastly undercount how much it will cost AMD to develop a 20nm big-core IC in order to convince yourself that they are doing that right now with their current R&D expenditures (combined with all the other stuff those R&D dollars are supposedly funding per Rory and Co.).

Exophase · Mar 23, 2013

Homeles said:
Still got it wrong. The "symptom" comment had nothing to do with the L1 I-cache.

Really, you're just putting words in my mouth. Knock it off already.

Maybe I'd get it right if you'd actually explain yourself instead of constantly doing nothing than repeating "wrong."

So okay, here's what you said.

I'm sure a lower latency L2 would negate the issue, but that would be treating the symptom and not the problem in this circumstance

This came about because you said L1 icache size, branch prediction rates, and clock speed was too low, and that L2 latency isn't a problem. IDC said that if L1 icache hit rate is a sensitivity then L2 latency has to play a role. Yes, I know that later you said L2 latency is a problem, but that's in direct contradiction of when you said that L2 latency doesn't play a major factor for BD being slow. And don't tell me I'm putting words in your mouth, that is exactly what you said, I'll quote it AGAIN:

Johan De Galas here at Anandtech found it to be primarily comprised of three things: low clock speeds (relative to the pipeline length), L1 instruction cache is too small, branch misprediction penalty is too large. Cache latency was not a major factor.

Or do you want to tell me the "it" in that sentence isn't actually the "why Bulldozer is so slow" that mrmt said and you were responding to?

You said that improving L2 latency is treating the symptom and not the problem. How on earth could I interpret this as anything other than L1 icache SIZE being the supposed problem? If it had nothing to do with L1 icache then what did it have to do with?

Phynaz · Mar 23, 2013

AtenRa said:
AMD use the same transistor count both for the desktop and server parts, so i believe they count them.

This is the same company that couldn't figure out how to count the transistors in Bulldozer.

Sleepingforest · Mar 23, 2013

AtenRa said:
Why ?? because that is the only Intel CPU to compare it apples to apples. Almost same transistor count for the CPU without any iGPUs (FX has 1.2B transistors vs 1.27B transistors for the Core i7 3820). Both were made for the Server first and Desktop secondly.

Wait, this is not true at all. The FX line is designed for gamers originally, while LGA2011 is for "extreme enthusiasts". If those CPUs were marketed for server, then Intel and AMD wouldn't need their Xeon and Opteron lines. Sandy Bridge i7-2700K has 1.16 billion transistors, which is closer to the 1.2 billion of 8-core FX than the 2011 6-core is. Furthermore, the two are closer in capability--both overclock and have the same number of cores/threads. If anyone is trying to weasel out a non-apples-to-apples comparison, it is you.

AtenRa · Mar 23, 2013

Pilum said:
Too bad that MT performance still doesn't matter for most use cases, especially on low-cost CPUs. People who do 3D rendering, number crunching, video transcoding/editing, etc. will rarely do that on low-end systems. And for most other tasks, single-threaded performance is still far more relevant.

You may not like it but not everyone has deep pockets for 1000 dollars CPUs. Also, lots of people play games and using lots of different applications that are MT.

The big difference is that using a faster(30%) CPU in single Threaded apps like iTunes will save you a couple of seconds, but using a 30% faster CPU in heavy MT scenarios like Video Transcoding will save you minutes/hours.

AtenRa · Mar 23, 2013

Sleepingforest said:
Wait, this is not true at all. The FX line is designed for gamers originally, while LGA2011 is for "extreme enthusiasts". If those CPUs were marketed for server, then Intel and AMD wouldn't need their Xeon and Opteron lines. Sandy Bridge i7-2700K has 1.16 billion transistors, which is closer to the 1.2 billion of 8-core FX than the 2011 6-core is. Furthermore, the two are closer in capability--both overclock and have the same number of cores/threads. If anyone is trying to weasel out a non-apples-to-apples comparison, it is you.

You are completely wrong, the FX and 2011 was designed to be a server CPU first. The Xeon and Opteron are using the same dies as the desktop counterparts. SandyBridge has 1.16M transistors including the iGPU, the CPU part has less than a Mill transistors. The FX and 2011 CPUs doesnt have an iGPU.

The problem is that people forget or doesnt know that Intel uses more than two CPU dies for their desktop products, for the Sandybridge they used three(3) dies for the desktop 1155 (Dual + GT1, Dual + GT2 and Quad + GT2) and two for the 2011(Quad core and 8 core), that is total of 5. AMD uses only two, one for every Trinity APU sold and another one for the FX line. So, comparing (Technically) the biggest AMD die to an Intel middle sized die is not apples to apples.

From the consumer point of view yes we can compare the FX to 1155 socket CPUs.

AtenRa · Mar 23, 2013

Phynaz said:
This is the same company that couldn't figure out how to count the transistors in Bulldozer.

Then we have to believe that Intel doesnt know how to count transistors as well :whiste:

http://www.anandtech.com/show/4818/counting-transistors-why-116b-and-995m-are-both-correct

With the current rate of Intel CPU performance increases, could AMD be catching up?

Platinum Member

Diamond Member

Diamond Member

Lifer

Diamond Member

Lifer

Member

Diamond Member

Lifer

Lifer

Diamond Member

Member

Lifer

Lifer

Diamond Member

Lifer

Member

Platinum Member

Elite Member

Diamond Member

Lifer

Platinum Member

Lifer

Lifer

Lifer