Randy Allen of AMD - what went wrong?

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
http://www.youtube.com/watch?v=G_n3wvsfq4Y

I never saw this YouTube video of Randy actually talking up the pre-release performance of K10; I had only read the snippets in articles across the web.

But now having seen the man in action, talking the talk as he does, it's clear the guy is not entirely a babbling idiot. He seems decently intelligent and cognizant of what he thinks he's talking about.

So what struck me is how did they get it so wrong (the performance expectations)?

Another thing that struck me is that Randy makes it pretty clear (if you sit through the entire 9min video) that he/AMD felt Barcelona was THE flagship product of 65nm. This is stated more than once. Which again makes me wonder what went wrong between the expectations of 65nm versus the reality? It kinda gave me 90nm Prescott déjà vu, where you have chips leaving the fab and the process guys and the design guys are pointing the power-consumption finger at each other.

The last thing, tucked away in the video starting around minute 8, is priceless commentary on the apparent lack of understanding of what Nehalem was going to be. Mind you, this video was prepared in 2007, but when you listen to the guy display such an air of confidence regarding AMD's assured dominance for years to come because of Intel's FSB (plus the comments he makes about Intel going monolithic with no architectural changes), it again just makes you wonder how on earth AMD had it all so wrong as recently as 12 months ago.

I know this "K10 will beat Clovertown by >40%" quote drew a lot of laughs for quite a while, but now that the dust has settled and many months have passed, has there ever been much reported in the way of a post-mortem to figure out just how the heck AMD convinced itself that it was going to achieve such results?

And why was AMD so unaware of what Nehalem was bringing to the table just one year ago? Was there really so little info available back then? (I thought there was a lot of knowledge about Nehalem from last summer; wasn't the "I am Nehalem" demo done at IDF nearly a full year ago?)
 

Cookie Monster

Diamond Member
May 7, 2005
5,161
32
86
From my knowledge, there were several things AMD couldn't achieve in silicon (though most likely achievable in software simulation). The first was the L3 cache being clocked at a much higher frequency than the core clock speed, one of the features AMD touted and hyped up. The reality is that it was clocked a lot lower than its aimed target due to power/heat. In fact, we are talking about a target of around ~3GHz +/- 400MHz for the L3/NB clock depending on model, which is almost double what we have now. Second was the core clock frequency. I'm almost certain the engineers at AMD were targeting at or over 3.0GHz. The reality is that they could only top out at 2.3GHz at the time of release, and were also hindered by the TLB bug.
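If those targets are right, even a naive back-of-envelope calculation shows how much of a claimed lead the clock shortfall alone could erase. A minimal sketch, assuming the ~3.0GHz design target and 2.3GHz launch clock mentioned above, plus performance that scales linearly with frequency (an optimistic assumption, since memory-bound work scales worse than linearly):

```python
# Naive, clock-proportional sanity check: how much of a projected lead
# evaporates if silicon ships well below its target clocks. Assumes
# performance scales linearly with frequency, so treat the result as
# an upper bound on what survives.

target_clock_ghz = 3.0    # rumored design target for K10 (per the post above)
shipped_clock_ghz = 2.3   # actual clock at launch
claimed_lead = 1.40       # "40% faster than Clovertown", presumably at target clock

# Lead that survives the clock shortfall under linear scaling:
surviving_lead = claimed_lead * (shipped_clock_ghz / target_clock_ghz)
print(f"surviving lead: {surviving_lead:.2f}x")  # ≈ 1.07x, i.e. ~7%
```

Under those (assumed) numbers, a genuine 40% lead at 3.0GHz shrinks to roughly 7% at 2.3GHz before you even account for the slower L3/NB clock, which would go some way toward explaining the gap between the rhetoric and the reviews.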

It's true that Agena is faster clock-for-clock than the last generation (though not by a significant margin), but I'm also puzzled by AMD's confidence before the launch of their hyped-up K10. It would be interesting to see what they were actually aiming for and what design choices went wrong, but realistically that won't come out any time soon. I guess Deneb is the missing link that will show where K10 fell short.

Reminds me of the long confession made by nVIDIA on NV30 (and also sir Eric on R600) where they explained what went wrong and why. (The decision was made that NV30 would use a "powerful fan" to cool the GPU. This resulted in the creation of the dust buster. It's also entertaining that it was the CEO himself who went with the idea!)
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Originally posted by: Cookie Monster
From my knowledge, there were several things AMD couldn't achieve in silicon (though most likely achievable in software simulation). The first was the L3 cache being clocked at a much higher frequency than the core clock speed, one of the features AMD touted and hyped up. The reality is that it was clocked a lot lower than its aimed target due to power/heat. In fact, we are talking about a target of around ~3GHz +/- 400MHz for the L3/NB clock depending on model, which is almost double what we have now. Second was the core clock frequency. I'm almost certain the engineers at AMD were targeting at or over 3.0GHz. The reality is that they could only top out at 2.3GHz at the time of release, and were also hindered by the TLB bug.

True, but even if you take a Phenom and intentionally underclock the cores to make them synchronous with the L3$, and then compare that to an underclocked Kentsfield (making them clock-equivalent), you still don't see K10 besting the Kentsfield by 40% across a wide range of workloads.

AMD had Kentsfield systems in hand for >8 months at the time, so they had to know its performance intimately by that point. And surely they knew the performance of K10 with a synched L3$ relative to Kentsfield, even if those early K10 samples had low clocks.

The TLB bug did not cause a performance issue; the TLB bug patch caused the performance issue. The patch did not exist at the time AMD was confident it had a 40% lead on Clovertown, so the TLB bug doesn't explain the gap.

I just don't see how AMD could have possibly convinced itself that K10 was so vastly superior to Clovertown, but clearly it did, and was so convinced that it made such optimistic public statements. I mean, even Harpertown wasn't better than Clovertown by 40%; how was 65nm K10 going to make such a leap?
 

Zstream

Diamond Member
Oct 24, 2005
3,396
277
136
Barcelona is great from a server perspective. The server market is more interesting to me as it really shows the power of the CPU. If you can show me a 16-way CPU benchmark with Intel soundly winning, please let me know.
 

myocardia

Diamond Member
Jun 21, 2003
9,291
30
91
Originally posted by: Zstream
Barcelona is great from a server perspective. The server market is more interesting to me as it really shows the power of the CPU. If you can show me a 16-way CPU benchmark with Intel soundly winning, please let me know.

Assuming you mean a 16P vs a 16P system, that would have nothing to do with "power of the CPU", only the speed of the interconnects involved. 1P vs 1P involves CPU power, and only CPU power. ;)
 

Zstream

Diamond Member
Oct 24, 2005
3,396
277
136
Umm, no. The interconnect is part of the engineering design of the CPU.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: myocardia
Originally posted by: Zstream
Umm, no. The interconnect is part of the engineering design of the CPU.

Which still has zero to do with CPU prowess.;)

This is a pointless debate. You might as well argue about the performance of the lower 8 bits of the ALUs of the various designs. If you're interested in single-socket performance, you look at single-socket performance evaluations. If you're interested in high-bandwidth multi-socket performance, it really doesn't matter who has a "better core" as rated by enthusiasts stuck at single sockets or using low-communication workloads like raytracing to "benchmark" more-socket systems. And FWIW, an integrated northbridge is part of "the CPU" by any reasonable definition.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Originally posted by: Zstream
Barcelona is great from a server perspective. The server market is more interesting to me as it really shows the power of the CPU. If you can show me a 16-way CPU benchmark with Intel soundly winning, please let me know.

I'm not picky, my question was on what basis did AMD internally evaluate K10 and Clovertown to make the sweeping conclusion that a 40% performance improvement across a wide-range of applications would be realized.

We can start with your request though. I'd love to see a 16-way CPU benchmark with AMD soundly winning by 40% or more, as that would at least go some distance toward explaining how AMD management gained the confidence to make bold statements.

Do you have a link to such benchmark results?
 

myocardia

Diamond Member
Jun 21, 2003
9,291
30
91
Originally posted by: Idontcare
We can start with your request though, I'd love to see a 16 way cpu benchmark with AMD soundly winning by 40% or more as that would at least go some distance to explaining how AMD management gained their confidence to make bold statements.

Do you have a link to such benchmark results?

AMD loses in 1P comparisons (which we all knew, right?), loses by considerably less in 2P comparisons, and wins in anything above 2P. A 16P vs 16P comparison would have AMD leading by a minimum of 40%, I would guess. I'll see if I can find the link to the 4P AMD (4 duals) vs 2P Intel (dual quad) comparison that I saw about a year ago.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Originally posted by: myocardia
AMD loses in 1P comparisons (which we all knew, right?)

And surely AMD knew this too when they made their 40% statement, right? It's not like Randy goes on and on about the performance of Barcelona as being relevant only in the 4S/4P markets.

In his mind he clearly had seen a run-down of performance across a bunch of applications in which internal testing showed K10 thumping Clovertown. So what happened to K10 after that internal review that invalidated the performance results Randy was speaking to?

So I must question whether the 1P comparisons we enthusiasts have been exposed to are the same 1P comparisons that AMD management was assessing before Randy talked about it.

Is this a case of "most desktop apps are integer performance limited whereas server and HPC apps are FPU performance limited"?

The Skulltrail review did not show Yorkfield having drastic issues scaling multithreaded desktop apps across two sockets. But maybe server apps do have scaling issues even as early as 2 -> 4 cores within a single socket, which AMD saw in internal testing and which convinced them they had a monster of a chip on their hands?
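One toy model of that "monster in multi-socket, ordinary in single-socket" possibility: treat memory-bound threads as delivering full throughput until they saturate the platform's bandwidth, and compare a shared front-side bus against per-socket integrated memory controllers. All the figures here are invented for illustration, not measured:

```python
# Toy model: effective core count for a memory-bound workload on a shared
# front-side bus (fixed bandwidth) vs. per-socket integrated memory
# controllers (aggregate bandwidth grows with socket count).
# Every number below is an assumption chosen for illustration.

def effective_cores(n_cores, demand_per_core_gbs, total_bw_gbs):
    """Cores contribute fully until their combined demand saturates memory."""
    return min(n_cores, total_bw_gbs / demand_per_core_gbs)

CORES_PER_SOCKET = 4
for sockets in (1, 2, 4):
    n = sockets * CORES_PER_SOCKET
    fsb  = effective_cores(n, 2.0, 8.0)            # assumed: one shared 8 GB/s bus
    numa = effective_cores(n, 2.0, 8.0 * sockets)  # assumed: 8 GB/s per socket
    print(f"{sockets}S: FSB-limited to {fsb:.0f} effective cores, NUMA scales to {numa:.0f}")
```

In this sketch, at 1S both designs look identical, so a single-socket desktop review would be blind to the difference; at 4S the shared bus is stuck at four effective cores while the integrated-controller design scales to all sixteen. That has the same shape as the thread's puzzle, though the real gap obviously depends on how memory-bound the workload actually is.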

We do have some 4S/4P benchmarks from Johan to go by: Sixteen Cores, Four Sockets

Here we see a definite thumping of Intel by AMD at near-equivalent clockspeeds (2.3GHz K10 vs. 2.4GHz): http://images.anandtech.com/gr...050208123441/17057.png

(For what it's worth, here's Johan's original Clovertown vs. Barcelona review: http://www.anandtech.com/cpuch...howdoc.aspx?i=2897&p=1)
 

Accord99

Platinum Member
Jul 2, 2001
2,259
172
106
Originally posted by: Idontcare


Here we see a definite thumping of Intel by AMD at near-equivalent clockspeeds (2.3GHz K10 vs. 2.4GHz): http://images.anandtech.com/gr...050208123441/17057.png
On the other hand, when comparing official scores for SPECjbb2005 the fastest 4S Xeon system outscores the fastest 4S Opteron system.

http://www.spec.org/osg/jbb200...005-20080506-00485.txt
http://www.spec.org/osg/jbb200...005-20080522-00492.txt


In his mind he clearly had seen a run-down of performance across a bunch of applications in which internal testing showed K10 thumping Clovertown. So what happened to K10 after that internal review that invalidated the performance results Randy was speaking to?
Personally, I think AMD knew with the official launch of Clovertown that they were in trouble. They only used SPECfp_rate to show a 40% advantage, a benchmark well known to be heavily bandwidth-dependent (a QuadFX system would outscore a QX6700 by about 40% in this benchmark even though it loses in virtually every real-world benchmark).
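The SPECfp_rate point generalizes: in a roofline-style model, a low-arithmetic-intensity benchmark is capped by memory bandwidth, so a platform with more memory channels can win it while losing compute-bound work. A sketch with invented numbers (the peak-GFLOPS and bandwidth figures below are illustrative, not specs for any of these chips):

```python
# Roofline-style sketch: attainable throughput is capped either by peak
# compute or by memory bandwidth times arithmetic intensity.
# Platform figures are invented purely for illustration.

def attainable_gflops(peak_gflops, peak_bw_gbs, flops_per_byte):
    """Roofline model: min(compute ceiling, bandwidth ceiling)."""
    return min(peak_gflops, peak_bw_gbs * flops_per_byte)

# Hypothetical: a faster single-socket chip on one memory controller vs.
# a slower dual-socket setup with twice the aggregate bandwidth.
platforms = {"1S high-clock": (40.0, 10.0), "2S high-bandwidth": (35.0, 20.0)}

for name, (gflops, bw) in platforms.items():
    dense     = attainable_gflops(gflops, bw, 8.0)  # compute-bound kernel
    streaming = attainable_gflops(gflops, bw, 0.5)  # bandwidth-bound kernel
    print(f"{name}: dense {dense:.0f} GFLOPS, streaming {streaming:.0f} GFLOPS")
```

With these assumed numbers, the bandwidth-rich platform doubles the streaming score (10 vs. 5 GFLOPS) while still losing the dense one (35 vs. 40), which is the same shape as QuadFX beating a QX6700 on SPECfp_rate but losing real-world tests.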

Otherwise, they were quiet in not revealing any other benchmark scores, until that accidental POV-Ray score release when they were demoing the 4-socket Barcelona system.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Originally posted by: Accord99
Originally posted by: Idontcare
Here we see a definite thumping of Intel by AMD at near-equivalent clockspeeds (2.3GHz K10 vs. 2.4GHz): http://images.anandtech.com/gr...050208123441/17057.png
On the other hand, when comparing official scores for SPECjbb2005 the fastest 4S Xeon system outscores the fastest 4S Opteron system.

http://www.spec.org/osg/jbb200...005-20080506-00485.txt
http://www.spec.org/osg/jbb200...005-20080522-00492.txt

Yeah, Johan mentioned this discrepancy in his test results, which clearly isn't helpful when the "data" is so wishy-washy.

But if we concede this benchmark to Intel's 4S platform... then where are the benchmarks everyone alludes to as showing AMD's performance dominance in the 4S market? Is it just SPECfp?

Originally posted by: Accord99
Originally posted by: Idontcare
In his mind he clearly had seen a run-down of performance across a bunch of applications in which internal testing showed K10 thumping Clovertown. So what happened to K10 after that internal review that invalidated the performance results Randy was speaking to?
Personally, I think AMD knew with the official launch of Clovertown that they were in trouble. They only used SPECfp_rate to show a 40% advantage, a benchmark well known to be heavily bandwidth-dependent (a QuadFX system would outscore a QX6700 by about 40% in this benchmark even though it loses in virtually every real-world benchmark).

Otherwise, they were quiet in not revealing any other benchmark scores, until that accidental POV-Ray score release when they were demoing the 4-socket Barcelona system.

If this were true, it would mean AMD was intentionally placing misleading statements out into the public... a very big no-no in the age of shareholder lawsuits.

I can't believe AMD would risk such a thing for two reasons - the lawsuit liability first, and secondly because what would the point be of lying to your customers and shareholders on something that would be unavoidably proved out once the first K10 chips hit the market?

I mean, I am willing to accept that this is what happened, but to me that is the trivial solution merely because it is the obvious possibility. And the trivial solution carries with it a whole bag of unlikely side-assumptions: that AMD was willing to take on the liability of shareholder lawsuits for misleading statements, even though the truth would unavoidably come out once the first K10 chips hit the market, regardless of what was said beforehand.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,315
10,031
126
I saw that video and that's just nuts. It appears that AMD really believed they had a 40% improvement over Intel. The contrast with the real-world results is staggering. From the video, it appears that Barcelona isn't a bad chip, but it seems that all of their "microarchitectural tweaks" amounted to a hill of beans.

Thanks for the interesting video link, OP.
 

zephyrprime

Diamond Member
Feb 18, 2001
7,512
2
81
This just goes to show you that anything from the company's mouth is naught but marketing.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Originally posted by: zephyrprime
This just goes to show you that anything from the company's mouth is naught but marketing.

Well, they'd be piss-poor stewards of their shareholders' equity if they operated any differently... Intel is no different (thankfully for its shareholders, though Larrabee is a great example) and Nvidia talks the same talk too.

What my mind is boggling to figure out is how AMD managed to compile enough internal data points on K10 vs. Clovertown to conclude "40% performance advantage across a wide range of applications". That's not something you say when all you've got on that list is SPECfp. Surely they had more applications; you don't blatantly lie to the public and escape shareholder lawsuits over forward-looking statements (SOX, etc.), so I just can't accept the idea that these guys literally lied to the public.

Something must have happened to cause the Barcelona that came to market to markedly under-perform relative to the Barcelona that all the internal benchmarking was done on in the months prior.

Perhaps it really was the decision to go with asynchronous L3$ when the clockspeeds just didn't come out of the fab correctly?
 

dmens

Platinum Member
Mar 18, 2005
2,271
917
136
Originally posted by: Idontcare
Perhaps it really was the decision to go with asynchronous L3$ when the clockspeeds just didn't come out of the fab correctly?

Even if the L3 came back half as fast, I really don't think it would cause a 40% dropoff in performance across more than a couple of benchmarks.
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
I remember reading some statements by Gary Key saying the L3 needed to be above 2.5GHz to really start to shine. IIRC, he said it started to scale much better vs. Conroe as the L3 clock speed went up, though I don't remember the details. What we have in Barcelona is simply an engineering failure from AMD; it had a ton of potential, but they were unable to realize it with the engineering talent they had available to work on it. Perhaps 45nm will show us what K10 should have been and meet/beat Yorkfield... just in time to get swatted by Nehalem.
 

Zap

Elite Member
Oct 13, 1999
22,377
2
81
Well, it was probably a combination of three things...

1) They didn't know at the time that their product wasn't going to scale so well.

2) They didn't have official proof of future Intel chip performance.

3) Hector ordered, ahem, "better marketing" to make the company look better for investors.

Is this true? About as much as those 40% performance margins. ;) How did I come up with these three things that went wrong? From these three things...

1) It is 2am.

2) I generally don't get enough sleep.

3) I have a good imagination.

:laugh:

Originally posted by: Cookie Monster
The decision was made that NV30 would use a "powerful fan" to cool the GPU. This resulted in the creation of the dust buster. It's also entertaining that it was the CEO himself who went with the idea!

Well, several years ago there wasn't as much awareness of extraneous computer noise, so it wasn't unreasonable (if the situation had never arisen previously) to say "just put a more powerful fan on it."