What made AMD stray from K10?

pantsaregood

Senior member
Feb 13, 2011
993
37
91
There's been plenty of conjecture on what an upgraded K10 could do, but I've always thought there had to be a reason AMD abandoned it.

Llano brought roughly a 5% increase in IPC as a result of core tweaks. Overclocking the CPU-NB to 2.6 GHz yields around a 7.5% performance gain on average.

Combine those on a single CPU and you have something about 13% faster than a Phenom II at equal clocks.

What about clock speed? At 45nm, K10 regularly hit 4 GHz. Could K10 not reliably scale further? Had clocks increased by 20% on a 32nm shrink, parts would ship at 4 GHz for six-core models and 4.5 GHz for quad-cores.

A 20% clock increase would bring the total increase in speed to 35%.
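
A quick sanity check on how those numbers compound (my own back-of-the-envelope, assuming the gains stack multiplicatively and independently):

```python
# Back-of-the-envelope: do the claimed K10 gains really stack to ~35%?
ipc_core_tweaks = 1.05   # ~5% IPC from Llano's core tweaks
ipc_fast_nb     = 1.075  # ~7.5% from running the CPU-NB/L3 at 2.6 GHz
clock_bump      = 1.20   # hypothetical 20% clock gain from a 32nm shrink

ipc_total = ipc_core_tweaks * ipc_fast_nb  # ~1.13, i.e. ~13% IPC
overall   = ipc_total * clock_bump         # ~1.35, i.e. ~35% overall

print(f"IPC gain:     {ipc_total - 1:.1%}")  # IPC gain:     12.9%
print(f"Overall gain: {overall - 1:.1%}")    # Overall gain: 35.5%
```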

Now: is there some obvious reason AMD opted against this? I understand that including extra instruction sets (AVX, SSE4, AES) and additional cache requires a larger die, as does adding more cores. What would the drawbacks to those changes be, though? A 35% jump in performance would've put AMD in a pretty competitive place.
 

nehalem256

Lifer
Apr 13, 2012
15,669
8
0
There's been plenty of conjecture on what an upgraded K10 could do, but I've always thought there had to be a reason AMD abandoned it.

Llano brought roughly a 5% increase in IPC as a result of core tweaks. Overclocking the CPU-NB to 2.6 GHz yields around a 7.5% performance gain on average.

Combine those on a single CPU and you have something about 13% faster than a Phenom II at equal clocks.

What about clock speed? At 45nm, K10 regularly hit 4 GHz. Could K10 not reliably scale further? Had clocks increased by 20% on a 32nm shrink, parts would ship at 4 GHz for six-core models and 4.5 GHz for quad-cores.

A 20% clock increase would bring the total increase in speed to 35%.

Now: is there some obvious reason AMD opted against this? I understand that including extra instruction sets (AVX, SSE4, AES) and additional cache requires a larger die, as does adding more cores. What would the drawbacks to those changes be, though? A 35% jump in performance would've put AMD in a pretty competitive place.

I assume they were hoping to decrease the amount of die area per core so they could cram more GPU area onto their APUs.

Since it looks like Piledriver made significant advances in single-thread performance over Bulldozer, even while removing the L3 cache, I do not think the inherent design is as flawed as it originally looked.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
Reading this, it would seem that Bulldozer just needs a few tweaks. Piledriver seems to be going in the right direction.
 

pantsaregood

Senior member
Feb 13, 2011
993
37
91
Piledriver doesn't really explain much. Bulldozer was slower than K10 by a fair margin in IPC. Even if the 15% IPC improvement THG shows actually comes true, that's only making up the ground they lost going from K10 to Bulldozer. It may be better, but only when compared to Bulldozer.
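
To illustrate with assumed round numbers (the size of the K10-to-Bulldozer IPC regression here is my own ballpark, not a measurement):

```python
# If Bulldozer's IPC landed ~13% below K10 (assumed figure), a 15%
# Piledriver improvement only gets AMD back to roughly break-even:
k10_ipc        = 1.00
bulldozer_ipc  = 0.87                  # assumed ~13% regression vs. K10
piledriver_ipc = bulldozer_ipc * 1.15  # THG's reported ~15% gain

print(f"Piledriver vs. K10 IPC: {piledriver_ipc:.2f}x")  # ~1.00x, break-even
```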
 

Ventanni

Golden Member
Jul 25, 2011
1,432
142
106
Although they're completely different architectures, the way I look at it is: take the Radeon 2900, right? While somewhat revolutionary in its own right, it was a downright dog overall. It was slower, ran hotter, and consumed more power than its competition. By the 3000 series, you began to see the 2900's performance without the heat and power consumption, and by the 4000 series you began to see a real player in the market. By the 5000 series, we saw a really great, efficient, and powerful architecture (that has nothing to do with me owning a 5850, btw).

Bulldozer is a dog. It runs hot, performs nowhere near its competition, and consumes way more power than it should. Piledriver is already starting to show us approximately what Bulldozer is capable of. I don't think the architectural change is inherently flawed. I just think it's a result of AMD having nowhere near the R&D capabilities that Intel does. Since the Core 2 release, all CPU architectures from Intel have been practically flawless (including Ivy Bridge aside from its overclockability).

Give AMD some time, and I think they'll show us a pretty darn good architecture out of Piledriver.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
There's been plenty of conjecture on what an upgraded K10 could do, but I've always thought there had to be a reason AMD abandoned it.

Llano brought roughly a 5% increase in IPC as a result of core tweaks. Overclocking the CPU-NB to 2.6 GHz yields around a 7.5% performance gain on average.

Combine those on a single CPU and you have something about 13% faster than a Phenom II at equal clocks.

What about clock speed? At 45nm, K10 regularly hit 4 GHz. Could K10 not reliably scale further? Had clocks increased by 20% on a 32nm shrink, parts would ship at 4 GHz for six-core models and 4.5 GHz for quad-cores.

A 20% clock increase would bring the total increase in speed to 35%.

Now: is there some obvious reason AMD opted against this? I understand that including extra instruction sets (AVX, SSE4, AES) and additional cache requires a larger die, as does adding more cores. What would the drawbacks to those changes be, though? A 35% jump in performance would've put AMD in a pretty competitive place.

^ said the board of directors to Dirk Meyer as they showed him the door...
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
What made AMD stray from K10? The short answer is that it wasn't working.

When you try to do the same thing as Intel year after year - focusing on increasing IPC and high-end performance - and keep falling behind, it's probably time to change your strategy. AMD doesn't (or at least didn't) want to settle for being #2, and they weren't going to surpass Intel trying to build CPUs like Intel. They may have reached parity on an Intel-like design (K7), but it was by going in a different direction than Intel that they achieved dominance (K8), even if it was really Intel who shifted tracks.
 

pantsaregood

Senior member
Feb 13, 2011
993
37
91
I don't understand why increasing IPC wasn't working, though. Llano brought a 5% increase in IPC, and faster L3 cache brought a 7.5% increase. Bringing those improvements together would yield a 13% improvement overall - still slower than Sandy Bridge, but pretty close to Nehalem.

A 20% increase in clock speed in conjunction with those improvements would've been enough to compete with Sandy Bridge. A quad-core unit would keep up with Sandy Bridge i5s fairly well, and a six-core unit could potentially win out against quad-core i7s at times.

Piledriver fails to impress me because it is only making up ground that AMD lost with the Bulldozer architecture. If Piledriver performs as the Trinity reports expect it to, then AMD is performing exactly where they were in late 2010 - the catch, of course, being that 1090T vs. i7 950 was a good fight back then, and Intel has moved on since.
 

JustMe21

Senior member
Sep 8, 2011
324
49
91
The way I understood it was that it was getting harder to clock the K10 any higher. Bulldozer was a bit of a gamble that we'd have a lot more multithreaded software in use today and that the design could scale to higher speeds, but it does feel like it got rushed along like the original Phenom did.
 

Homeles

Platinum Member
Dec 9, 2011
2,580
0
0
It's my opinion that AMD was trying to accomplish too much on the 32nm node. Bulldozer came too early. Also, AMD's obsession with monolithic dies has been a huge setback for them. Phenom, for instance, suffered the same issues. It couldn't reach high clock speeds, partially as a result of sticking 4 cores on the same die.
 

nehalem256

Lifer
Apr 13, 2012
15,669
8
0
I don't understand why increasing IPC wasn't working, though. Llano brought a 5% increase in IPC, and faster L3 cache brought a 7.5% increase. Bringing those improvements together would yield a 13% improvement overall - still slower than Sandy Bridge, but pretty close to Nehalem.

Overclocking the L3 cache brought a 7.5% increase in performance. AMD decided not to ship their chips with the L3 cache overclocked, though. Also remember that APUs do not have L3 cache, and AMD cares much more about them.

A 20% increase in clock speed in conjunction with those improvements would've been enough to compete with Sandy Bridge. A quad-core unit would keep up with Sandy Bridge i5s fairly well, and a six-core unit could potentially win out against quad-core i7s at times.

Llano, which is a die-shrunk Phenom, does not have a 20% increase in clock speed. And it is not like Sandy Bridge is suffering from a lack of frequency headroom. If AMD magically turned up the clock speed, it seems nearly certain that Intel could match it clock for clock.
 

pantsaregood

Senior member
Feb 13, 2011
993
37
91
Llano was never designed to clock 20% higher, or higher at all for that matter. A large portion - if not a majority - of 45nm K10s can run at 4.0 GHz. 980s hitting 4.2 GHz isn't particularly rare, either. Had the die shrink focused on improved clocks, it surely could've managed some sort of increase.

AMD didn't ship Phenom II with the L3 overclocked. What I'm asserting is that increasing the L3 speed on newer iterations of K10 would've been a cheap and effective boost in IPC.

You are right, though. Sandy Bridge and Ivy Bridge both have plenty of headroom. I don't think I've heard of either failing to hit 4.0 GHz.

As for Bulldozer feeling rushed and the comparison to the original Phenom: I don't think that's quite a fair comparison. 65nm Phenom gets a much worse reputation than it deserves. Phenom II's slight IPC increase from Phenom came from increased L3 size, not from improved cores. The TLB bug was awful, but Phenom's inability to compete properly with Core 2 Quad was a result of being unable to get clocks up. I had a Phenom X4 9750 that would go from rock stable at 2.8 GHz to BSODing while loading Windows at 2.9 GHz, and no amount of voltage would stabilize it. I actually blew up a motherboard running 1.7v through that thing.
 

Smoblikat

Diamond Member
Nov 19, 2011
5,184
107
106
Llano was never designed to clock 20% higher, or higher at all for that matter. A large portion - if not a majority - of 45nm K10s can run at 4.0 GHz. 980s hitting 4.2 GHz isn't particularly rare, either. Had the die shrink focused on improved clocks, it surely could've managed some sort of increase.

AMD didn't ship Phenom II with the L3 overclocked. What I'm asserting is that increasing the L3 speed on newer iterations of K10 would've been a cheap and effective boost in IPC.

You are right, though. Sandy Bridge and Ivy Bridge both have plenty of headroom. I don't think I've heard of either failing to hit 4.0 GHz.

As for Bulldozer feeling rushed and the comparison to the original Phenom: I don't think that's quite a fair comparison. 65nm Phenom gets a much worse reputation than it deserves. Phenom II's slight IPC increase from Phenom came from increased L3 size, not from improved cores. The TLB bug was awful, but Phenom's inability to compete properly with Core 2 Quad was a result of being unable to get clocks up. I had a Phenom X4 9750 that would go from rock stable at 2.8 GHz to BSODing while loading Windows at 2.9 GHz, and no amount of voltage would stabilize it. I actually blew up a motherboard running 1.7v through that thing.

Haha, that's an insane amount of volts :p

I agree though: just because overclocking made the L3 cache faster doesn't mean you should have to OC it. My GTX 470's stock clock is 600 MHz; I could OC it to 700 and it still wouldn't even compete with the 480. They purposely lowered its clocks, so now it technically has a high percentage of headroom. All that means is that I need to OC to get it where it should have been in the first place. So saying OCing the L3 brings a 7.5% increase is true, considering the L3 is 7.5% slower than it should actually be. Though I do think that a Phenom II X8 at 32nm would have been the best chip. The 1090T generally hit at least 3.8 and competed with the i7 920.
 

cytg111

Lifer
Mar 17, 2008
25,230
14,719
136
There's been plenty of conjecture on what an upgraded K10 could do, but I've always thought there had to be a reason AMD abandoned it.

Llano brought roughly a 5% increase in IPC as a result of core tweaks. Overclocking the CPU-NB to 2.6 GHz yields around a 7.5% performance gain on average.

Combine those on a single CPU and you have something about 13% faster than a Phenom II at equal clocks.

What about clock speed? At 45nm, K10 regularly hit 4 GHz. Could K10 not reliably scale further? Had clocks increased by 20% on a 32nm shrink, parts would ship at 4 GHz for six-core models and 4.5 GHz for quad-cores.

A 20% clock increase would bring the total increase in speed to 35%.

Now: is there some obvious reason AMD opted against this? I understand that including extra instruction sets (AVX, SSE4, AES) and additional cache requires a larger die, as does adding more cores. What would the drawbacks to those changes be, though? A 35% jump in performance would've put AMD in a pretty competitive place.

- Excellent question. I can only imagine it is a combination of two things:

1. An unanticipated hiccup in production somewhere
2. AMD thought it was entering "The Core Race(tm)"

I think number one is pretty plausible, but number two is just beyond my grasp. Alright, you would have to look at what segments AMD makes its money from - as a server architecture it might make sense, I don't know - but mainstream does not need 8 cores; we're still struggling to use 4... 2 even, at times. When I think of the 8 core / module / whatever AMD chip, I think of that "make moar coars" drawing that's been posted hundreds of times here. It does not make sense.
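
To put rough numbers on that (classic Amdahl's law; the 60% parallel fraction is purely an assumed figure for illustration):

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n)
# p = parallelizable fraction of the workload, n = core count
def speedup(p: float, n: int) -> float:
    return 1.0 / ((1.0 - p) + p / n)

p = 0.60  # assume a typical desktop workload is only ~60% parallel
for n in (2, 4, 8):
    print(f"{n} cores: {speedup(p, n):.2f}x")
# 2 cores: 1.43x
# 4 cores: 1.82x
# 8 cores: 2.11x  <- doubling from 4 to 8 cores buys surprisingly little
```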
 

cytg111

Lifer
Mar 17, 2008
25,230
14,719
136
Reading this, it would seem that Bulldozer just needs a few tweaks. Piledriver seems to be going in the right direction.

Also, was this ever delved into?

"But what about the fourth show stopper? That is probably one of the most interesting ones because it seems to show up (in a lesser degree) in Sandy Bridge too. However, we're not quite ready with our final investigations into this area, so you'll have to wait a bit longer. To be continued...."
 

nehalem256

Lifer
Apr 13, 2012
15,669
8
0
- Excellent question. I can only imagine it is a combination of two things:

1. An unanticipated hiccup in production somewhere
2. AMD thought it was entering "The Core Race(tm)"

I think number one is pretty plausible, but number two is just beyond my grasp. Alright, you would have to look at what segments AMD makes its money from - as a server architecture it might make sense, I don't know - but mainstream does not need 8 cores; we're still struggling to use 4... 2 even, at times. When I think of the 8 core / module / whatever AMD chip, I think of that "make moar coars" drawing that's been posted hundreds of times here. It does not make sense.

2 is likely a good part of it. More cores are good for server workloads. Being able to cram more cores into a smaller die area, as the module approach allows, is also good for APUs. And it gives them a potential marketing advantage in laptops, as their chips are quad-core whereas Intel predominantly sells dual-core + HT chips.

The area where their approach is least optimal is really for desktop users, which AMD may have felt was a dying market.
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
I think number one is pretty plausible, but number two is just beyond my grasp. Alright, you would have to look at what segments AMD makes its money from - as a server architecture it might make sense, I don't know - but mainstream does not need 8 cores; we're still struggling to use 4... 2 even, at times. When I think of the 8 core / module / whatever AMD chip, I think of that "make moar coars" drawing that's been posted hundreds of times here. It does not make sense.

It was originally intended as a server architecture. I don't think there's been a single "desktop" architecture or desktop-focused chip since the Q6600. What the desktop gets is either mobile-based chips (APUs/Ivy/SB) or server chips (2011/AM3+). From that perspective it looks a bit more sane.

K10 wasn't a high-clocking architecture. On 32nm, the Husky/Stars cores hit their ceiling at 3.7 GHz (and some of them had fused-off VLIW5 shaders). A node shrink might usually mean higher clocks, but it's never a sure thing and depends highly on the process and the architecture itself. K10 was already hitting its limits.

AMD would never catch Intel in their IPC race. That's what started Bulldozer in the first place. AMD came to the realization that they'd never catch up to Intel with respect to IPC and/or single-threaded performance and that they'd fall further behind on the fab end. This article gets linked a lot, but that's because it delves into the architecture and its intended purpose quite heavily and is certainly worth reading. The halving of the FPUs also points in the general direction of GPGPU but that's another topic entirely.

AMD could have extended the life of K10/K10.5, but it wouldn't have done them any good. They needed something new, it's just that whatever they were planning didn't come to fruition because somewhere along the line some mistakes were clearly made.
 

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,002
126
I wonder if AMD was worried about threaded performance when compared to Intel. For AMD to increase threaded performance, it meant attaching a whole other core; for Intel, it meant turning on Hyper-Threading. Bulldozer, to me, looks like an attempt at an elegant solution for handling more threads. And I think with some tweaks to get IPC and clock speed up, while hopefully not increasing power consumption (as that is already AMD's sore spot in my opinion), they'll have an OK part.
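
A toy comparison of the three ways to add a second thread (the scaling and area figures are rough ballpark claims from public marketing, not measurements - AMD claimed ~80% of a second full core per module for ~12% extra area, and Hyper-Threading's second thread is often cited around 20-30%):

```python
# Toy model: throughput from adding a 2nd thread, one full core = 1.00
full_core_2nd = 1.00  # bolt on a complete second core (Phenom-style)
cmt_2nd       = 0.80  # AMD's claimed scaling for a module's 2nd core
smt_2nd       = 0.25  # Hyper-Threading's 2nd thread, often ~20-30%

print(f"two full cores: {1.0 + full_core_2nd:.2f}x  (~2x the core area)")
print(f"one CMT module: {1.0 + cmt_2nd:.2f}x  (claimed ~1.12x the area)")
print(f"one core + HT:  {1.0 + smt_2nd:.2f}x  (~1.05x the area)")
```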

But you have to wonder how things would be today if AMD had released a tweaked Phenom-based part that had, say, ~15+% better IPC cores than Deneb/Thuban, maybe more or faster L2/L3, a faster northbridge, ~4.5GHz, and six to eight cores. Maybe that is easier said than done.
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
But you have to wonder how things would be today if AMD had released a tweaked Phenom-based part that had, say, ~15+% better IPC cores than Deneb/Thuban, maybe more or faster L2/L3, a faster northbridge, ~4.5GHz, and six to eight cores. Maybe that is easier said than done.

They wouldn't because such a part would be impossible unless they completely redesigned the K10 cores and layout, which would mean an entirely new microarchitecture anyway :p

Bulldozer didn't come out of some random AMD intern's sketch on a handkerchief; it came out of necessity. They were losing server market share at an accelerating pace after the Barcelona failure (and even before that) and had to do something to address it. Keeping the same baseline K# architecture wouldn't work.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
There's been plenty of conjecture on what an upgraded K10 could do, but I've always thought there had to be a reason AMD abandoned it.

Llano brought roughly a 5% increase in IPC as a result of core tweaks. Overclocking the CPU-NB to 2.6 GHz yields around a 7.5% performance gain on average.

Combine those on a single CPU and you have something about 13% faster than a Phenom II at equal clocks.
(...)
A 20% increase in clock speed in conjunction with those improvements would've been enough to compete with Sandy Bridge.
Where are you going to get those extra clocks? Shrink it and pray? Yeah, that worked out really well. Llano was a good stop-gap, but if that were the future for AMD, they might as well give up. Then, how are they going to merge the K7 and the GPU's FUs?

Also, now that you accept that Intel will have the #1 performance spot, where do you go?

BD as we got it isn't the best thing they could have made, but it might have been the best choice at the time. If they kept trying to make it perfect but never sold a CPU, they wouldn't be all that well off. Also, check out more recent reviews with gaming benches: it won't knock your socks off, but Trinity isn't all that bad. If they can keep fixing it up, they should have a nice CPU for some years to come.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
But you have to wonder how things would be today if AMD had released a tweaked Phenom-based part that had, say, ~15+% better IPC cores than Deneb/Thuban, maybe more or faster L2/L3, a faster northbridge, ~4.5GHz, and six to eight cores. Maybe that is easier said than done.

It's always easy to be clever after the fact. But Intel's bet on fat, fast cores paid off; it's been an utter disaster with the thinner-cores approach that AMD took.

One has to wonder if that's also related to the heavily limited R&D budget AMD has. But looking at the case, one can only wonder why AMD didn't see the light from Intel and run the uncore at core speed, at least. The entire uncore part from AMD doesn't look thought through.

And what was AMD even thinking when they reused the AM3+ socket for the new uarch? Again, either lack of R&D funds or simply lack of vision. The FM sockets to the rescue there.

And then there is the "APU" part. It's brutally bolted on, unlike Intel's more thoughtful design with much better integration. Yet GPU integration should be something AMD puts at very high priority; it's one of the only advantages they still have. But I guess it will be gone in ½-1½ years. Another wasted opportunity.
 

infoiltrator

Senior member
Feb 9, 2011
704
0
0
Speeds were increasing, current was increasing, heat was increasing; factory clocks were nearing the limit of consumer overclocks. Multiple cores were great, but in gaming and single-threaded apps they were falling behind the four-core chips.

Factor in needing to spend more on OEM heatsinks to "keep up," and a "gamble" on Bulldozer makes sense. It took longer and accomplished less than hoped for. Refinement/evolution is the best hope for AMD.

K10 was bumping an "upper limit," at least in theory.
The new architecture "fits" servers just fine, and with fewer total "complete" cores it should cost less to produce.
Any company that loses sight of profit is likely doomed. K10 was shrinking profits with no relief in sight.
Fanboys have helped fuel Bulldozer sales, as well as functionality.

By the pure numbers I'll spend for Intel; on the other hand, I have a stock of AM3 motherboards and lesser AMD AM3 chips to use up (830/840/640/620).
I also have an E6700 3.2 and an MSI P31 motherboard to build - bargain hunting.
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
And then there is the "APU" part. It's brutally bolted on, unlike Intel's more thoughtful design with much better integration.



Friend, I do believe you're talking poopoo.

[Chart: Photoshop CS6 Liquify benchmark]


[Chart: Photoshop CS6 Blur CMYK benchmark]


http://www.tomshardware.com/reviews/photoshop-cs6-gimp-aftershot-pro,3208-13.html

Unlike Intel's approach, which is MIC and hinges on whether it succeeds in the HPC space, AMD has already made some decent strides in this regard. The charts above are the Llano figures; the ones below are Trinity.

[Chart: Trinity OpenCL benchmark]

*Note that Intel's approach here isn't the same as AMD's, in that Intel has dedicated hardware specifically for Quick Sync that isn't tied to rendering images, meaning it's not really part of the GPU. It's more GPU + Quick Sync.

[Chart: Trinity OpenCL benchmark (continued)]


Note the bump in performance between the CPU-only and OpenCL figures for AMD. The OpenCL path on Intel's chips runs on the CPU only and sees much smaller gains as a result.

This truly is the holy grail for what AMD is hoping to deliver with heterogeneous compute in the short term. The Sandy Bridge comparison is particularly telling. What once was a significant performance advantage for Intel, shrinks to something unnoticeable. If AMD could achieve similar gains in other key applications, I think more users would be just fine in ignoring the CPU deficit and would treat Trinity as a balanced alternative to Intel. The Ivy Bridge gap is still more significant but it's also a much more expensive chip, and likely won't appear at the same price points as AMD's A10 for a while.

http://www.anandtech.com/show/5835/testing-opencl-accelerated-handbrakex264-with-amds-trinity-apu

Intel's current approach is that the GPU only produces images. That's it. The Quick Sync portion is hardware dedicated to transcoding. AMD offers GPU-focused hardware that is also capable of compute (and transcoding).
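
To make "capable of compute" concrete, here's a minimal OpenCL sketch in Python (using the pyopencl bindings; a trivial vector add, not the actual Photoshop or HandBrake kernels) of the kind of data-parallel work that gets offloaded to the APU's GPU:

```python
import numpy as np
import pyopencl as cl

a = np.random.rand(50_000).astype(np.float32)
b = np.random.rand(50_000).astype(np.float32)

ctx = cl.create_some_context()   # picks an available device (iGPU, etc.)
queue = cl.CommandQueue(ctx)

mf = cl.mem_flags
a_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_g = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

prg = cl.Program(ctx, """
__kernel void add(__global const float *a,
                  __global const float *b,
                  __global float *out) {
    int gid = get_global_id(0);   // one work-item per array element
    out[gid] = a[gid] + b[gid];
}
""").build()

prg.add(queue, a.shape, None, a_g, b_g, out_g)

out = np.empty_like(a)
cl.enqueue_copy(queue, out, out_g)  # read the result back to the host
assert np.allclose(out, a + b)
```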

Give credit where credit is due: AMD has made some decent strides in GPU-accelerated computing, and comparatively Intel has brought absolutely nothing (literally nothing) to the table with their HDxxxx-class on-die GPUs.

- Don't forget better frame rates in mobile (and much better in desktop)
- Faster response time meaning less latency
- Better image quality as well


And all of these to go along with the GPGPU capabilities.

For all the crap AMD gets over Bulldozer, which is rightfully deserved, they should get equal praise for their APUs.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Friend, I do believe you're talking poopoo.

I believe you should read it again, because you completely missed it. I even said AMD still had the advantage. So the poopoo is on you.

I never said Intel's solution was faster, but it is much better integrated.

Intel's iGPU has direct access to the L3 cache and can communicate directly over the ring bus, while AMD's solution sits on a bus (one would guess PCIe- or HT-like in nature).

[Diagram: Trinity unified northbridge]


Vs.
[Diagram: Intel Ivy Bridge HD 4000 architecture]



Also, it's nonsense to say Intel's iGPU doesn't support compute, because it does.
[Charts: iGPU compute benchmarks]
 