AMD vs Intel at the high end in the future

Kuzi

Senior member
Sep 16, 2007
572
0
0
Since the Phenom II was released AMD's sales have been doing better, but the problem is that they are not making much money from selling their CPUs, or their GPUs for that matter.

AMD has to compete with Intel at the top end if they want to start making money, like they did in the Athlon 64 days. It seems now that Intel has gone into overdrive and AMD will have a very hard time catching up. In terms of process technology I think AMD can slowly catch up to Intel, to the point of being only 6 to 9 months behind. I think that's what will happen for the 32nm process generation.

All hopes for AMD are on Bulldozer, which I'm guessing won't be released until late 2010 or early 2011. It's a completely new architecture, which is a good thing, because AMD finally moves away from the K6/K7/K8/K10/K10.5, which were great but have become a little long in the tooth. Now if we look at the current Phenom II, while great for the price, it's still maybe 10% slower clock for clock compared to Core 2. If we look at Nehalem, it's about 15-20% faster than Phenom II at the same clock speed. This is without even counting gains from Hyper-Threading.

Westmere is basically Nehalem on the newer 32nm process, which can mean higher clocks and lower power draw. And by the time Bulldozer is released it will have to compete not only with Westmere but also Sandy Bridge which should be released around the same time as Bulldozer (late 2010/early 2011).

If we assume only an incremental improvement from Westmere to Sandy Bridge, similar to the difference between Core 2 and Nehalem, then we can say it will be about 10% faster than Westmere. For Bulldozer to compete with that it will have to be about 30% faster than the current Phenom II, which is very hard to get out of the x86 architecture at this time.
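
To put rough numbers on that estimate, here is a small sketch that compounds the per-clock gaps guessed at above. The function name and the choice to multiply the percentages rather than add them are my own assumptions, and the inputs are only the guesses from this post (Nehalem 15-20% ahead of Phenom II, Sandy Bridge maybe 10% ahead of Westmere), not measurements.

```python
def required_ipc_gain(nehalem_over_phenom2, sandy_over_westmere):
    """Compound two relative per-clock gains into the gain needed over Phenom II.

    Assumes Westmere has the same per-clock performance as Nehalem, since it is
    described above as basically Nehalem on a newer process.
    """
    return (1 + nehalem_over_phenom2) * (1 + sandy_over_westmere) - 1


# Low and high ends of the Nehalem-vs-Phenom II guess, with a 10% Sandy Bridge step.
for gap in (0.15, 0.20):
    needed = required_ipc_gain(gap, 0.10)
    print(f"Nehalem lead {gap:.0%} -> Bulldozer needs about {needed:.1%} over Phenom II")
```

Under those assumptions the requirement lands between about 26.5% and 32%, which is where the "about 30%" figure comes from.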

Even that jump in performance will not be enough, and Bulldozer will need clock speeds of 3.6GHz or higher. Then there's the Hyper-Threading issue, which will become more important as software becomes more threaded. AMD will not be able to compete against Intel CPUs in the future in highly threaded programs, and adding more cores helps but is not an efficient way to fight Intel's HT. Future AMD CPUs will need to have some form of Hyper-Threading in order to compete.

So to summarize, AMD can have a chance with Bulldozer if:

Clock per clock performance is at least 25-30% faster than Phenom II
It has at least 8 cores running at a minimum of 3.6GHz
They add Hyper-Threading

Is this possible for AMD to do?

 

LoneNinja

Senior member
Jan 5, 2009
825
0
0
Bulldozer should have a 16core variant upon release from what I've read. AMD is also planning to beat Intel in a core race, and if they follow the roadmaps they will. Q1 2010 is supposed to be the launch of what I believe they call "Magny-Cours" and it'll feature up to 12 cores. Don't forget, we've also got the 6core Istanbul releasing in June. Personally I don't see why they are trying to ramp the cores up so rapidly now, unless this is all server exclusive right now. The only way I can max out my quad is with 3ds max rendering, I've got nothing else that will do that.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Originally posted by: Kuzi
So to summarize, AMD can have a chance with Bulldozer if:

Clock per clock performance is at least 25-30% faster than Phenom II
It has at least 8 cores running at a minimum of 3.6GHz
They add Hyper-Threading

Is this possible for AMD to do?

Hi Kuzi, the simple and short answer is yes they can.

The reason we know it can be done is because Intel is doing it...so the proof that x86 can be taken to those performance levels already exists.

The challenge as I see it is not "can they do xyz by 2011" but "can they do xyz by 2011 with 25% of the resources that Intel has to accomplish xyz?".

The difference comes down to the people. Motivation, passion, morale, incentive, etc.

So we really are engaging in speculation as to the odds or likelihood that AMD will deliver a competitive product with Bulldozer while operating in a resource restricted environment relative to the competition.

(the same could be said of every past/by-gone x86 competitor, from TI to centaur to cyrix to IBM to transmeta, etc, including Via today...all faded away (mostly) as they could not stand up to even AMD's R&D budget of the time)

Regarding your comments on Hyperthreading...I don't view this as THE necessity, what we really are getting at here is that the chip (CPU, fusion GPGPU product, whatever is under the IHS) that plugs into the socket needs to be capable of processing a competitive number of active threads...something SUN dubbed as "throughput computing" a while back IIRC...and we don't really care if bulldozer processes 12 threads by way of 12 cores or 6 cores with dual-thread SMT or 4 cores with tri-thread SMT or 3 cores with quad-thread SMT, etc.

Hyperthreading is an engineering architecture tradeoff, just one of many means to the same end.
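
As a quick illustration of that point, the sketch below enumerates the core and SMT combinations that reach the same thread count. The function name and the cap of four threads per core are assumptions made for the example, and nothing here says anything about what Bulldozer will actually ship with.

```python
def core_smt_options(target_threads, max_smt_ways=4):
    """Return (cores, threads_per_core) pairs whose product equals target_threads."""
    return [
        (target_threads // ways, ways)
        for ways in range(1, max_smt_ways + 1)
        if target_threads % ways == 0
    ]


# The 12-thread example from the post above.
for cores, ways in core_smt_options(12):
    print(f"{cores} cores x {ways} thread(s) per core = {cores * ways} threads")
```

For 12 threads this lists 12x1, 6x2, 4x3 and 3x4, the same four configurations named above. Which one is best is a die-area, power and per-thread performance tradeoff that a thread count alone does not capture.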

Clockspeed...likely needs to be 4.5-5GHz. Initial 32nm westmere reports are showing upwards of 5GHz for clarkdale on-air overclocks. 32nm will only get more mature from here, and Sandy Bridge will only likely take it one step further when it comes to stable clockspeeds on 32nm.

Clock-for-clock...i.e. IPC metrics...30% higher single-thread IPC is a MUST to be competitive with Sandy Bridge.

Is it possible? Yes, we are speculating after-all that Sandy Bridge itself will have this level of performance so we have already conceded it is possible for an x86 architecture to reach this level of IPC.

But is it probable? Ugh...I'd love to say yes, but look at history. When has AMD beat Intel? When AMD delivered a vastly vastly superior solution on the 1/4 budget they have, or when Intel elected to rest on their laurels and pursue netburst for business reasons instead of technology reasons and thus left open the door of opportunity for a merely superior (but not vastly superior) architecture to step into the limelight for a few years?

I look at AMD's past (hard fought) victory NOT as AMD getting ahead by one step because they moved ahead by two steps at once whilst Intel merely stood still; but rather I view it as AMD got ahead by 2 steps because Intel took one step back whilst AMD merely took one step forward. Not to say AMD doesn't get the credit for shaping the x86 market we benefit from today, but just saying had the decision to go Netburst been made by Andy Grove or Otellini instead of Barrett then I think netburst would have been avoided and the legacy of AMD's K8 would not stand out as much as it rightfully does.

Why do I bring this all up...because folks like to invoke the K8 era as an example of how AMD could trump Intel with Bulldozer...they still can do it, sky is the limit, but to do it in the manner that K8 trumped Intel in the past would require Otellini to make an equally disastrous marketing-based decision regarding the direction of Sandy Bridge's architecture relative to the currently stellar Nehalem architecture. Again it is possible Otellini will do this, but it is not likely IMO. This makes AMD's challenge with Bulldozer all the more difficult.
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
Originally posted by: LoneNinja
Bulldozer should have a 16core variant upon release from what I've read. AMD is also planning to beat Intel in a core race, and if they follow the roadmaps they will. Q1 2010 is supposed to be the launch of what I believe they call "Magny-Cours" and it'll feature up to 12 cores. Don't forget, we've also got the 6core Istanbul releasing in June. Personally I don't see why they are trying to ramp the cores up so rapidly now, unless this is all server exclusive right now. The only way I can max out my quad is with 3ds max rendering, I've got nothing else that will do that.

Yes, Magny-Cours is a server part, and being at 45nm means it will be huge in size, but that is fine because AMD can price these very high. My post was mostly talking about the Desktop though.

More cores is not the only answer though, and AMD needs higher IPC to compete in the future. Bulldozer with 16 cores will be huge too, even at 32nm, and all these cores will need a very large L3 cache pool in order to function optimally. My guess is the 16 core variant @ 32nm will be for servers only, and 4 and 8 core versions will be for the Desktop.

Now if we take a Sandy Bridge with 16 cores, that will mean 32 virtual cores with HT. A 16 core Bulldozer will have no chance against that in highly threaded apps, that's why AMD needs HT also (or something similar). Hopefully they have decided to add some form of HT into Bulldozer.
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
Nice post IDC :)

Originally posted by: Idontcare
Hi Kuzi, the simple and short answer is yes they can.

The reason we know it can be done is because Intel is doing it...so the proof that x86 can be taken to those performance levels already exists.

The challenge as I see it is not "can they do xyz by 2011" but "can they do xyz by 2011 with 25% of the resources that Intel has to accomplish xyz?".

True Intel did it, but not all in one jump, they slowly added performance, Core->Core 2->Nehalem etc at a time when AMD was sleeping. I agree about the resources part and that's a big difference there between Intel and AMD.

Originally posted by: Idontcare
Regarding your comments on Hyperthreading...I don't view this as THE necessity, what we really are getting at here is that the chip (CPU, fusion GPGPU product, whatever is under the IHS) that plugs into the socket needs to be capable of processing a competitive number of active threads...something SUN dubbed as "throughput computing" a while back IIRC...and we don't really care if bulldozer processes 12 threads by way of 12 cores or 6 cores with dual-thread SMT or 4 cores with tri-thread SMT or 3 cores with quad-thread SMT, etc.

Hyperthreading is an engineering architecture tradeoff, just one of many means to the same end.

Ok, so in the end they do need to add something similar to HT in a way so that their CPUs can process more threads on the same core.


Originally posted by: Idontcare
Clockspeed...likely needs to be 4.5-5GHz. Initial 32nm westmere reports are showing upwards of 5GHz for clarkdale on-air overclocks. 32nm will only get more mature from here, and Sandy Bridge will only likely take it one step further when it comes to stable clockspeeds on 32nm.

5GHz seems really high, especially when talking about new processors with at least 8 cores. At least we know AMD will finally have HKMG (high-k/metal gate) for the 32nm process, so that should give them a very nice jump in frequency.


Originally posted by: Idontcare
Clock-for-clock...i.e. IPC metrics...30% higher single-thread IPC is a MUST to be competitive with Sandy Bridge.

Is it possible? Yes, we are speculating after-all that Sandy Bridge itself will have this level of performance so we have already conceded it is possible for an x86 architecture to reach this level of IPC.

This is in my view the hardest part for AMD to achieve with Bulldozer. I don't believe they can do it, at least not with the first iteration of Bulldozer. But we can always hope.


Originally posted by: Idontcare
But is it probable? Ugh...I'd love to say yes, but look at history. When has AMD beat Intel? When AMD delivered a vastly vastly superior solution on the 1/4 budget they have, or when Intel elected to rest on their laurels and pursue netburst for business reasons instead of technology reasons and thus left open the door of opportunity for a merely superior (but not vastly superior) architecture to step into the limelight for a few years?

I look at AMD's past (hard fought) victory NOT as AMD getting ahead by one step because they moved ahead by two steps at once whilst Intel merely stood still; but rather I view it as AMD got ahead by 2 steps because Intel took one step back whilst AMD merely took one step forward. Not to say AMD doesn't get the credit for shaping the x86 market we benefit from today, but just saying had the decision to go Netburst been made by Andy Grove or Otellini instead of Barrett then I think netburst would have been avoided and the legacy of AMD's K8 would not stand out as much as it rightfully does.

Why do I bring this all up...because folks like to invoke the K8 era as an example of how AMD could trump Intel with Bulldozer...they still can do it, sky is the limit, but to do it in the manner that K8 trumped Intel in the past would require Otellini to make an equally disastrous marketing-based decision regarding the direction of Sandy Bridge's architecture relative to the currently stellar Nehalem architecture. Again it is possible Otellini will do this, but it is not likely IMO. This makes AMD's challenge with Bulldozer all the more difficult.

I agree AMD will have a tough time beating Sandy Bridge, which will be an evolutionary step over Nehalem. I think AMD can catch up to at least Westmere in performance, especially if they add some form of Simultaneous Multithreading to Bulldozer and can get high clocks out of their 32nm process.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Originally posted by: Kuzi
Yes, Magny-Cours is a server part, and being at 45nm means it will be huge in size, but that is fine because AMD can price these very high. My post was mostly talking about the Desktop though.

And considering the TDP vs. GHz envelope that a mere four-cores require on AMD's 45nm we can be fairly confident that we won't be seeing 3GHz 12-core Magny-Cours coming anytime soon.

A 2.2GHz Magny-Cours at 140W TDP would be really impressive.

Originally posted by: Kuzi
More cores is not the only answer though, and AMD needs higher IPC to compete in the future. Bulldozer with 16 cores will be huge too, even at 32nm, and all these cores will need a very large L3 cache pool in order to function optimally. My guess is the 16 core variant @ 32nm will be for servers only, and 4 and 8 core versions will be for the Desktop.

At least 2MB/core (or thread really), so we'd be expecting around 32MB of L3$ for a 16core Bulldozer.

That would be aggressive for 22nm, let alone 32nm.
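
The sizing rule above is simple enough to write down as a sketch. The function and parameter names are made up for illustration, and the 2MB figure is just the rule of thumb from this post.

```python
def l3_size_mb(cores, mb_per_thread=2, threads_per_core=1):
    """L3 size in MB under a fixed MB-per-thread rule of thumb."""
    return cores * threads_per_core * mb_per_thread


print(l3_size_mb(16))                      # 16 single-thread cores
print(l3_size_mb(8, threads_per_core=2))   # 8 cores with dual-thread SMT
```

Both cases come out to 32MB, which is the point of the "(or thread really)" aside: the cache budget follows the thread count, however the threads are provided.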

Originally posted by: Kuzi
Now if we take a Sandy Bridge with 16 cores, that will mean 32 virtual cores with HT. A 16 core Bulldozer will have no chance against that in highly threaded apps, that's why AMD needs HT also. Hopefully they have decided to add some form of HT into Bulldozer.

It's not just cores vs. hyperthreading, it's really about die-size (area) needed per thread...nehalem cores are considerably larger than K10.5 cores, bulldozer can have 16 tiny cores and still be competitive with an 8core dual-thread SMT SandyBridge of equivalent die-size.

More cores can be a superior approach for cache sensitive apps as the dedicated cache (L1 and L2) doesn't get cut in half by a second thread trying to operate in the shadow of the first thread.
 

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
21,025
3,497
126
Originally posted by: Kuzi

So to summarize, AMD can have a chance with Bulldozer if:

Clock per clock performance is at least 25-30% faster than Phenom II
It has at least 8 cores running at a minimum of 3.6GHz
They add Hyper-Threading

Is this possible for AMD to do?

They would also need turbo, which is the reverse of SpeedStep.

But you're also talking about a very small share of the computer world.

Remember, for every consumer PC you see there are about 5-10 enterprise PCs on the other side.

So the better question to ask is when you expect your typical office workplace to be swapping all their machines for AMDs.

Sad truth is I can't think of many, even back when AMD was preferred over the P4 D.

Also, more Gainestown will be sold than Bloomfield. This is a fact, and the Gainestown platform has AMD as well as all the other chip vendors shitting bricks.
 

ilkhan

Golden Member
Jul 21, 2006
1,117
1
0
Are we really expecting sandy to reach 16 cores? Nehalem is hitting 8 cores, and while sandy is going to last until 3 years after that introduction (start of 8 core nehalem until Ivy), do we really expect Intel to put 16 cores on a 32nm process? It'd be HUGE. I wouldn't expect 16 cores until Ivy, unless you can inform me otherwise.

Bulldozer with 16 cores might be able to compete with an 8c16t Sandy.
 

Flipped Gazelle

Diamond Member
Sep 5, 2004
6,666
3
81
Originally posted by: ilkhan
Are we really expecting sandy to reach 16 cores? Nehalem is hitting 8 cores, and while sandy is going to last until 3 years after that introduction (start of 8 core nehalem until Ivy), do we really expect Intel to put 16 cores on a 32nm process? It'd be HUGE. I wouldn't expect 16 cores until Ivy, unless you can inform me otherwise.

Bulldozer with 16 cores might be able to compete with an 8c16t Sandy.

Are you opposed to CPUs the size of a slice of Kraft American Cheese? :laugh:
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
Originally posted by: Idontcare
It's not just cores vs. hyperthreading, it's really about die-size (area) needed per thread...nehalem cores are considerably larger than K10.5 cores, bulldozer can have 16 tiny cores and still be competitive with an 8core dual-thread SMT SandyBridge of equivalent die-size.

More cores can be a superior approach for cache sensitive apps as the dedicated cache (L1 and L2) doesn't get cut in half by a second thread trying to operate in the shadow of the first thread.

True, but many tiny cores won't help with software that uses only 2 or 4 threads (most games, most windows apps).

IIRC, Nehalem cores are about 50% larger than those found in K10.5, and Sandy Bridge cores will probably be larger than Westmere ones. So AMD has a lot of room to increase the size of each Bulldozer core, although the cache density advantage is on Intel's side.
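
A rough sketch of the area argument, using only the "about 50% larger" recollection above. The function name is hypothetical, and the model counts core area only, ignoring caches, memory controller and the rest of the uncore, so treat it as back-of-the-envelope.

```python
import math


def small_cores_in_same_area(big_cores, big_to_small_area_ratio):
    """How many small cores fit in the silicon area taken by `big_cores` big cores.

    Core area only: caches, memory controller and other uncore are ignored.
    """
    return math.floor(big_cores * big_to_small_area_ratio)


big_cores = 8
smt_threads = big_cores * 2  # 8 cores with dual-thread SMT
small_cores = small_cores_in_same_area(big_cores, 1.5)  # 1.5 = "about 50% larger"
print(f"{big_cores} big cores with SMT: {smt_threads} threads")
print(f"Same core area in small cores: {small_cores} cores, one thread each")
```

With the 50% figure, the area of 8 Nehalem-sized cores holds about 12 K10.5-sized cores. Matching 16 hardware threads one core per thread would need cores closer to half the size of Nehalem's (a ratio of 2.0 in the function above).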
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Originally posted by: Kuzi
Originally posted by: Idontcare
It's not just cores vs. hyperthreading, it's really about die-size (area) needed per thread...nehalem cores are considerably larger than K10.5 cores, bulldozer can have 16 tiny cores and still be competitive with an 8core dual-thread SMT SandyBridge of equivalent die-size.

More cores can be a superior approach for cache sensitive apps as the dedicated cache (L1 and L2) doesn't get cut in half by a second thread trying to operate in the shadow of the first thread.

True, but many tiny cores won't help with software that uses only 2 or 4 threads (most games, most windows apps).

IIRC, Nehalem cores are about 50% larger than those found in K10.5, and Sandy Bridge cores will probably be larger than Westmere ones. So AMD has a lot of room to increase the size of each Bulldozer core, although the cache density advantage is on Intel's side.

I agree with everything you say...the cache advantage in Intel's favor could evaporate at 32nm if the speculation by Hans over on Ace's comes true.
 

faxon

Platinum Member
May 23, 2008
2,109
1
81
wow that's really impressive. having a 2.5x smaller SRAM cell will really help in ratcheting up the cache sizes that AMD's cores seem to be so hungry for. just look at the difference between PH1 and 2 and you get the idea. it will also help in putting enough L3$ on an 8 or 16 core monolithic die to feed every core properly, something AMD has obviously struggled with in the past. will be interesting to see how this rolls out in 2-3 years
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Just to clarify, the comparison to be made at 32nm is the following:

Intel 32nm sram = 0.171 µm²

AMD 32nm sram = 0.149 µm²

What we don't know is (a) functional clockspeed versus Vcc, and (b) parametric "yieldability", aka manufacturability.

The Idrives are comparable for PMOS (1.21 and 1.22 mA/µm respectively) and NMOS (1.55 mA/µm for both)...which should give us some confidence the raw capability exists in these process nodes for both to reach equivalent sram clockspeeds, provided all other architecture caveats to the sram implementation were identical (which they never are). But this does mean that whatever the clockspeeds end up being, they won't be slower than the competition for reasons of process tech deficiency.

The second question, manufacturability, is more critical in my opinion in terms of defining whether the smaller sram will actually yield better margins (as in profit margins) for AMD. If inherent process variability should happen to cause considerable yield issues with that particular smaller sram design then having a small sram cell is pretty much pointless from a technological advantage standpoint.

It needs to be yieldable at the Vcc and GHz needed to support sellable and performance competitive SKUs against Intel. I'm quite excited actually to find out how this story turns out.
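
For a sense of scale, here is a minimal sketch that converts the two cell sizes into raw bit-cell area for the 32MB L3 estimated earlier in the thread. The names are hypothetical, and the result covers the data bit cells only: tags, ECC, sense amps, decoders and redundancy are all left out, so real arrays would be noticeably larger.

```python
BITS_PER_MB = 8 * 1024 * 1024   # data bits in one MB
UM2_PER_MM2 = 1_000_000         # square microns in one square millimetre


def raw_cell_area_mm2(cache_mb, cell_um2):
    """Area of the data bit cells alone, in mm^2 (no tags, ECC or periphery)."""
    return cache_mb * BITS_PER_MB * cell_um2 / UM2_PER_MM2


for vendor, cell_um2 in (("Intel", 0.171), ("AMD", 0.149)):
    area = raw_cell_area_mm2(32, cell_um2)
    print(f"{vendor} 32nm: {area:.1f} mm^2 of raw cells for 32MB")
```

That works out to roughly 45.9 mm² against 40.0 mm², about 13% less raw cell area for the smaller cell. Whether that turns into a real advantage depends on the yield and Vcc/clockspeed questions above.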
 

lopri

Elite Member
Jul 27, 2002
13,310
687
126
Originally posted by: Kuzi
If we assume only an incremental improvement from Westmere to Sandy Bridge, similar to the difference between Core 2 and Nehalem,
?

Conroe -> Nehalem is by no means 'incremental'. And remember that you can do it only once.
 

lopri

Elite Member
Jul 27, 2002
13,310
687
126
OK, having said that, there is nothing stopping Intel/AMD from pulling a Conroe/Hammer again.
 

nyker96

Diamond Member
Apr 19, 2005
5,630
2
81
I am quite surprised that I got through OP's post in one sitting. Maybe as I age my attention span is getting worse these days. Anyhow, I think AMD isn't just competing on core count/efficiency, there are also features. I believe AMD has some nice features coming up like their co-processors (cp). It could be very useful, say, for people who do engineering work to add an engineering cp. I do a lot of video compression from time to time, and if they had a cp for that, I'd love it. I mean, 90% of the time my cpu is just sitting idle; only in the 10% of the time when I load it with some x264 encodes does it become busy. If AMD has a nice cp that can speed up x264 by like 5 times, I believe for all intents and purposes it would present a better value to me.

Another thing is, the high end chips, while they have a huge profit margin, probably just account for like 10% of the market, so I don't think it's so huge for the balance sheet of either company. Intel actually makes a ton of cash from the Atom. It's very cheap to make.
 

Fox5

Diamond Member
Jan 31, 2005
5,957
7
81
An i7 core has a healthy ipc advantage over phenom II, but the actual cores (excluding cache) are significantly larger. AMD has more room to add cores to compete, though if L3 cache needs to increase along with that it kind of sucks for them in terms of die size. Maybe AMD could adopt the fastest of ddr3 memory as standard for a line of processors without L3 cache. They could really load up the die with cores then.

I doubt AMD will win clock for clock compared to Intel. Maybe they can come close though. Barring some kind of breakthrough on AMD's side allowing for high clock speeds or better memory controller/cache design, I don't think there's anything AMD can do (where's that ati synergy?). Heh, maybe go down the netburst/nvidia g80 route and have certain parts of the cpu run at twice its clock speed.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Originally posted by: nyker96
I am quite surprised that I got through OP's post in one sitting. Maybe as I age my attention span is getting worse these days. Anyhow, I think AMD isn't just competing on core count/efficiency, there are also features. I believe AMD has some nice features coming up like their co-processors (cp). It could be very useful, say, for people who do engineering work to add an engineering cp. I do a lot of video compression from time to time, and if they had a cp for that, I'd love it. I mean, 90% of the time my cpu is just sitting idle; only in the 10% of the time when I load it with some x264 encodes does it become busy. If AMD has a nice cp that can speed up x264 by like 5 times, I believe for all intents and purposes it would present a better value to me.

Another thing is, the high end chips, while they have a huge profit margin, probably just account for like 10% of the market, so I don't think it's so huge for the balance sheet of either company. Intel actually makes a ton of cash from the Atom. It's very cheap to make.

As I read your post it evokes images of what I think Intel is intending to do with Larrabee and Sandy Bridge.
 

faxon

Platinum Member
May 23, 2008
2,109
1
81
Originally posted by: Idontcare
Originally posted by: nyker96
I am quite surprised that I got through OP's post in one sitting. Maybe as I age my attention span is getting worse these days. Anyhow, I think AMD isn't just competing on core count/efficiency, there are also features. I believe AMD has some nice features coming up like their co-processors (cp). It could be very useful, say, for people who do engineering work to add an engineering cp. I do a lot of video compression from time to time, and if they had a cp for that, I'd love it. I mean, 90% of the time my cpu is just sitting idle; only in the 10% of the time when I load it with some x264 encodes does it become busy. If AMD has a nice cp that can speed up x264 by like 5 times, I believe for all intents and purposes it would present a better value to me.

Another thing is, the high end chips, while they have a huge profit margin, probably just account for like 10% of the market, so I don't think it's so huge for the balance sheet of either company. Intel actually makes a ton of cash from the Atom. It's very cheap to make.

As I read your post it evokes images of what I think Intel is intending to do with Larrabee and Sandy Bridge.

i was thinking the same thing :laugh:
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
Originally posted by: Idontcare
Just to clarify, the comparison to be made at 32nm is the following:

Intel 32nm sram = 0.171 µm²

AMD 32nm sram = 0.149 µm²

That would be great news if AMD could get the SRAM density @ 32nm to be similar or better than Intel. This way they end up with smaller and cheaper CPUs.

 

Kuzi

Senior member
Sep 16, 2007
572
0
0
Originally posted by: lopri
Conroe -> Nehalem is by no means 'incremental'. And remember that you can do it only once.

If you count HT, TurboBoost and server performance, then yes Nehalem is a great jump.

But what I meant was single core IPC, without HT and TurboBoost, at the same clocks as Penryn. I think the difference on average was like 7% faster than Penryn. Correct me if I'm wrong, too lazy to check now :)
 

exar333

Diamond Member
Feb 7, 2004
8,518
8
91
Originally posted by: Kuzi
Originally posted by: lopri
Conroe -> Nehalem is by no means 'incremental'. And remember that you can do it only once.

If you count HT, TurboBoost and server performance, then yes Nehalem is a great jump.

But what I meant was single core IPC, without HT and TurboBoost, at the same clocks as Penryn. I think the difference on average was like 7% faster than Penryn. Correct me if I'm wrong, too lazy to check now :)

So do you call the shift from X2 to Phenom incremental as well? It was a whole new architecture.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Kuzi I hope you are the kind of person who takes my "point, counterpoint" dialogue in stride and enjoys it. I think you do, I think you are that type of person. I certainly enjoy it myself, but please let me know if my counter-pointing you is getting annoying or irritating as I certainly don't want to tread there.

Having said that, I'll continue to pester you by providing "counter-points" to your "points" :D

Originally posted by: Kuzi
Originally posted by: lopri
Conroe -> Nehalem is by no means 'incremental'. And remember that you can do it only once.

If you count HT, TurboBoost and server performance, then yes Nehalem is a great jump.

But what I meant was single core IPC, without HT and TurboBoost, at the same clocks as Penryn. I think the difference on average was like 7% faster than Penryn. Correct me if I'm wrong, too lazy to check now :)

It is true that Nehalem's single-threaded IPC took a relatively minor "half-step" forward over penryn but I'd like to point out that the IPC/watt was the far more impressive aspect of the Nehalem architecture IMO.

I'm sure if the Intel engineers had been budgeted with a "boost single-thread IPC all you can while keeping IPC/watt at the same level as Penryn" then we'd have seen much more impressive gains in IPC at equally less impressive (or non-existent) reduction in power-consumption and performance/watt metrics.

I don't think you can cast Nehalem's IPC gains relative to Penryn's as a single evaluation metric that can be used to then project a trend regarding the pace or trajectory of future single-threaded IPC advancements.

I am thinking specifically of the data as crunched in this post.

The IPC metric needs to be normalized wrt power-consumption (at a minimum) and probably xtor budget as well before we can make claim to any insight or estimation as to whether the rate of single-threaded IPC improvements is slowing down, staying constant, or even possibly picking up speed with all these ISA and architecture enhancements flowing out of the Tick-Tock model.
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
Originally posted by: ExarKun333
So do you call the shift from X2 to Phenom incremental as well? It was a whole new architecture.

Phenom was really a highly tweaked x2 (K8), and with L3 cache added and 4 cores instead of two. The x2 was basically an Athlon XP (K7) but with an integrated memory controller, 64bit support, and two cores.

The Phenom was about 15% faster than the x2, which was a nice bump in performance, but it is still the same architecture with many tweaks and improvements. AMD has been using the same CPU architecture for over a decade, it is time for a change :)
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
Originally posted by: Idontcare
Kuzi I hope you are the kind of person who takes my "point, counterpoint" dialogue in stride and enjoys it. I think you do, I think you are that type of person. I certainly enjoy it myself, but please let me know if my counter-pointing you is getting annoying or irritating as I certainly don't want to tread there.

Having said that, I'll continue to pester you by providing "counter-points" to your "points" :D

Not at all IDC, I enjoy reading your posts and find them very informative.


Originally posted by: Idontcare
It is true that Nehalem's single-threaded IPC took a relatively minor "half-step" forward over penryn but I'd like to point out that the IPC/watt was the far more impressive aspect of the Nehalem architecture IMO.

I'm sure if the Intel engineers had been budgeted with a "boost single-thread IPC all you can while keeping IPC/watt at the same level as Penryn" then we'd have seen much more impressive gains in IPC at equally less impressive (or non-existent) reduction in power-consumption and performance/watt metrics.

I don't think you can cast Nehalem's IPC gains relative to Penryn's as a single evaluation metric that can be used to then project a trend regarding the pace or trajectory of future single-threaded IPC advancements.

I am thinking specifically of the data as crunched in this post.

The IPC metric needs to be normalized wrt power-consumption (at a minimum) and probably xtor budget as well before we can make claim to any insight or estimation as to whether the rate of single-threaded IPC improvements is slowing down, staying constant, or even possibly picking up speed with all these ISA and architecture enhancements flowing out of the Tick-Tock model.

I see your point here, thanks for the link. When taking the performance/watt numbers, the i7 looks very good against Core 2. As you said, Intel's engineers had to balance IPC/Watt, and they could not just add IPC at the expense of power consumption. I'm sure AMD, ATI, NV have to deal with similar problems too.

I believe HT is giving Nehalem this huge advantage over Penryn. HT is probably one of the most important features added to i7 that gives it this large IPC/Watt improvement, especially since the HT circuitry itself is not taking much space on the CPU.

Now that brings me back to my main point, that Bulldozer must have some form of simultaneous multi-threading in order to boost performance and IPC/Watt numbers. Adding more cores boosts IPC at the expense of size, heat, and power consumption. Intel found an elegant approach a long time ago, and that is Hyper-Threading. In the Pentium 4 days HT was used too early for its time, and on a relatively bad architecture, to really be of much benefit.