AMD vs Intel at the high end in the future


IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
I think the lowered L1 cache latency will be more relevant for single-threaded performance, which is what most people care about. From what I read, the not-better gaming performance is a very big thing. It's partially attributed to the extra hop for PCI Express, but after looking at comparisons between AMD Agena and Windsor for gaming benchmarks, it's not entirely convincing either: http://www.tomshardware.com/re...lon-64-x2,1746-10.html

 

ilkhan

Golden Member
Jul 21, 2006
1,117
1
0
For desktop use most people shouldn't have any problem turning HT off on Westmere. 6 threads are enough for almost anything consumers will want to do (excepting some specific circumstances), which makes the HT/split-cache problem irrelevant.

And honestly I couldn't care less about the 5th and 6th cores, I just want more MHz (my goal is still SupCom, and 4 threads are enough for it, but each thread needs to be faster).
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: ilkhan
For desktop use most people shouldn't have any problem turning HT off on Westmere. 6 threads are enough for almost anything consumers will want to do (excepting some specific circumstances), which makes the HT/split-cache problem irrelevant.

And honestly I couldn't care less about the 5th and 6th cores, I just want more MHz (my goal is still SupCom, and 4 threads are enough for it, but each thread needs to be faster).

Except for Gulftown, which isn't expected to debut until Q2/2010, the Westmere chips we will see are all dual-core.

The idea is that hyperthreading will be good enough in Westmere to make it performance-competitive with AMD's PhII quad-cores at the time.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
I mentioned in a new thread: http://forums.anandtech.com/me...=2302652&enterthread=y

Intel is being very secretive right now. Even the usual sites that get the news don't have it. Probably more SKUs will be released. It just does not make sense that they would be willing to lose the quad core + IGP market, especially when quad cores will no longer be the top end after Gulftown.
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
Bigger cache size is not always better for L1 and L2 caches, especially for CPUs that have an IMC and L3 cache (Nehalem&Phenom). The L1 cache in Nehalem may be a bit slower than in Core 2 processors, but the L2 cache while only 256K in size, is about 50% faster than it was in Core 2 processors. Overall the cache subsystem in Nehalem is pretty good, a bit better than what Phenom II has now.

As you say IntelUser2000, Intel may improve the cache subsystem for Westmere and gain a little performance there. For AMD, I think they will go the Nehalem route with Bulldozer and maybe lower the L2 cache size to 256K, to lower the latency. Unless they can keep the same 512K L2 cache per core and improve its performance, which is also possible, because AMD has more room for improvement than Intel in this area.
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
Originally posted by: Idontcare
Except for Gulftown, which isn't expected to debut until Q2/2010, the Westmere chips we will see are all dual-core.

Really? I guess I wasn't paying attention, what about Quad Westmere?

The idea is that hyperthreading will be good enough in Westmere to make it performance-competitive with AMD's PhII quad-cores at the time.

That would be so dangerous for AMD, if a dual core Westmere is competitive with PII. A dual core Westmere die at 32nm would be so tiny and cheap to produce compared to PII that AMD would have no chance with a price war there.

Again, another reason why Bulldozer really needs some form of SMT to compete.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
See, the Nehalem/Westmere generation is very confusing. Westmere will be on a new process, but nothing indicates a cache size increase. Enhancements are still supposed to come, though: the IDF 2008 presentation says Westmere will feature "Cache enhancements". I'm thinking this is latency and/or bandwidth, probably latency.

That would be so dangerous for AMD, if a dual core Westmere is competitive with PII. A dual core Westmere die at 32nm would be so tiny and cheap to produce compared to PII that AMD would have no chance with a price war there. Again, another reason why Bulldozer really needs some form of SMT to compete.

Update: Another thing to watch for in 2009 Nehalem will be Turbo Mode for markets where overclocking is restricted, like notebooks. Naturally, notebook systems will benefit more because of the form factor.

Current Bloomfield Nehalem has a PCU and the power gate transistor, but the power gating only works on the cores, not the uncore. The power gating allows cores to reach C6 state, which is an extremely low power state. In mainstream Nehalem like Lynnfield, power gating will extend to the uncore.

Turbo Mode from what it looks like will enable the quad cores to be not worse off than dual cores in all usage scenarios. Dual core mode on Lynnfield will allow clock speeds akin to the fastest Core 2 Duo, which is at 3.33-3.5GHz. Power consumption will be kept in control by the power gate transistor, which means in notebooks, quad core notebooks will have similar power consumption to the dual core notebooks in dual core operation, and similar performance thanks to Turbo Mode.
 

ilkhan

Golden Member
Jul 21, 2006
1,117
1
0
Kuzi: Intel hasn't talked about a quad westmere part, at all. Not mobile, not desktop, not server. Nothing.

IntelUser2000: I'm with you. If there's no size increase, I expect latency to drop.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Did anyone notice that Lynnfield has its own wafer?? They really do mean it when Nehalem is described as "scalable". So we have a couple of dies based on one architecture.

-Bloomfield: Triple Channel DDR3 on CPU die
-Lynnfield/Clarksfield: Dual Channel DDR3/PCI Express on CPU die(not on MCM)
-Clarkdale/Arrandale: Dual Channel DDR3/PCI Express/IGP on MCM
-Nehalem-EX: 8 core/Quad Channel DDR3/4x QPI on CPU die
-Jasper Forest: This is essentially Lynnfield, but was shown with its own wafer for some reason. It's a quad core Nehalem for embedded markets.
 

ShawnD1

Lifer
May 24, 2003
15,987
2
81
Originally posted by: Kuzi
The idea is that hyperthreading will be good enough in Westmere to make it performance-competitive with AMD's PhII quad-cores at the time.

That would be so dangerous for AMD, if a dual core Westmere is competitive with PII. A dual core Westmere die at 32nm would be so tiny and cheap to produce compared to PII that AMD would have no chance with a price war there.

Again, another reason why Bulldozer really needs some form of SMT to compete.

Intel's dual cores are already able to compete against AMD's quad cores (in certain tasks). That doesn't matter though since they still don't have a price advantage when doing this:
$150 Phenom 9950 link
$168 C2D E8400 link
$170 Phenom II 810 link

These processor families are fairly close at things like Photoshop and some games. The Phenoms start to pull away from the C2D when doing heavily threaded things like video encoding or CPU rendering. link

Then there's also market perception. This would actually work in AMD's favor since 4 cores sounds better than 2. It doesn't matter if they have the same performance, 4 is bigger than 2 so it must be better :D
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
Originally posted by: ShawnD1
Intel's dual cores are already able to compete against AMD's quad cores (in certain tasks). That doesn't matter though since they still don't have a price advantage when doing this:
$150 Phenom 9950 link
$168 C2D E8400 link
$170 Phenom II 810 link

These processor families are fairly close at things like Photoshop and some games. The Phenoms start to pull away from the C2D when doing heavily threaded things like video encoding or CPU rendering. link

The thing is a dual core Westmere "may" only be ~ 82 mm^2 in size, a Phenom II Quad is 245 mm^2, that is 3 times larger. So Intel can produce many more CPUs per wafer which allows them to lower the price.

The other problem AMD may face is the clock speed of Westmere. If we assume clocks for Westmere @ 3.6GHz and higher going against a PII @ 3.2/3.4GHz, Westmere will be faster in basically everything, even apps that use more than 2 threads.

It does not look good for a company if the competitor can release a chip only a third the size, while also being faster and more power efficient.
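To put rough numbers on the per-wafer argument above, here is a minimal sketch using a common textbook approximation for gross dies per wafer. The 300 mm wafer diameter, the function name and the choice of formula are assumptions for illustration only; the 82 mm^2 and 245 mm^2 die sizes are the speculative figures quoted above, and yield, scribe lines and edge exclusion are ignored.

```python
import math

def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Approximate gross dies per wafer (no yield or scribe-line modelling)."""
    radius = wafer_diameter_mm / 2.0
    wafer_area = math.pi * radius ** 2
    # Second term approximates the partial dies lost along the wafer edge.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)

westmere_2c = gross_dies_per_wafer(82.0)   # speculative dual-core Westmere size
phenom_ii = gross_dies_per_wafer(245.0)    # Phenom II quad size quoted above
ratio = westmere_2c / phenom_ii

print(westmere_2c, phenom_ii, round(ratio, 2))
```

Under these assumptions the ratio comes out slightly above the plain 3x area ratio, because smaller dies waste less of the wafer edge. Real cost per good die would also depend on yield, which is not modelled here.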

Then there's also market perception. This would actually work in AMD's favor since 4 cores sounds better than 2. It doesn't matter if they have the same performance, 4 is bigger than 2 so it must be better :D

There is also the clock speed perception which helped Intel during the Pentium 4 days, and Westmere should have a lot of that :D
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: Kuzi
It does not look good for a company if the competitor can release a chip only a third the size, while also being faster and more power efficient.

Yep, that's why it is all the more critical that GlobalFoundries "picks up the pace" of their node cadence.

Being (a) smaller in revenue/sales/earnings, and (b) perpetually one year behind on taking advantage of node shrinks, means AMD's future is kind of a foregone conclusion. This is one that even Nostradamus could get right :laugh:
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
The die size of 32nm dual core Westmere looks similar to Penryn. I reckon it will end up somewhere in the low 100s of mm^2. The Phenom II has a die size of 258mm2, extremely similar to Nehalem's 263mm2.

One thing that won't be 100% favorable to Intel is that every dual core will feature an MCM'd IGP. The IGP's die size is larger than the CPU core.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: IntelUser2000
The die size of 32nm dual core Westmere looks similar to Penryn. I reckon it will end up somewhere in the low 100s of mm^2. The Phenom II has a die size of 258mm2, extremely similar to Nehalem's 263mm2.

One thing that won't be 100% favorable to Intel is that every dual core will feature an MCM'd IGP. The IGP's die size is larger than the CPU core.

Wouldn't that be a pretty poor scaling factor for dual-core Westmere if it ends up >100mm^2?

I expect ~80mm^2.

And the 45nm IGP for MCM with westmere is larger than the entire westmere chip, not just a core. Maybe you meant to say that instead of "larger than the CPU core"? A cpu core on westmere will be what, maybe 25mm^2 at most?
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Originally posted by: Idontcare

Wouldn't that be a pretty poor scaling factor for dual-core Westmere if it ends up >100mm^2?

I expect ~80mm^2.

And the 45nm IGP for MCM with westmere is larger than the entire westmere chip, not just a core. Maybe you meant to say that instead of "larger than the CPU core"? A cpu core on westmere will be what, maybe 25mm^2 at most?

The scaling is actually quite poor. In theory each process generation affords twice the transistor density (0.7 x 0.7 = ~0.5), but real scaling is worse: closer to the "pure number" reduction (i.e. 65nm to 45nm is ~0.7) than to the square of it.

http://www.chip-architect.com/...19_Various_Images.html

Throughout 0.13u to 45nm, from Pentium M to 45nm Penryn, the scaling has been approximately 0.7 for logic. Now for SRAM it follows the ideal scaling.

Back in the P4 days, logic scaling has been closer to the 50% reduction, but that has gone away since Pentium M. Probably has to do with optimizing for power and heat control.
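The difference between the two scaling views can be sketched as below. The function names and the 100 mm^2 logic block are hypothetical, chosen only to illustrate the arithmetic; the node names are taken at face value as feature sizes, which is itself a simplification.

```python
def linear_shrink(old_nm: float, new_nm: float) -> float:
    """Ratio of nominal feature sizes between two nodes."""
    return new_nm / old_nm

def ideal_area_shrink(old_nm: float, new_nm: float) -> float:
    """Ideal areal scaling: both dimensions shrink by the linear factor."""
    return linear_shrink(old_nm, new_nm) ** 2

# Hypothetical 100 mm^2 logic block on the older node, for illustration only.
LOGIC_AREA_MM2 = 100.0

for old_nm, new_nm in [(65, 45), (45, 32)]:
    linear = linear_shrink(old_nm, new_nm)
    ideal = ideal_area_shrink(old_nm, new_nm)
    print(
        f"{old_nm}nm -> {new_nm}nm: "
        f"ideal area {LOGIC_AREA_MM2 * ideal:.1f} mm^2, "
        f"'pure number' area {LOGIC_AREA_MM2 * linear:.1f} mm^2"
    )
```

For 65nm to 45nm the ideal areal factor is roughly 0.48, while the "pure number" factor is roughly 0.69, which is the gap being argued about here.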

For the second point, yea you are right. I have not worded it right. To rephrase it: The GMCH portion of the MCM in Clarkdale/Arrandale is larger than the CPU portion.
 

ilkhan

Golden Member
Jul 21, 2006
1,117
1
0
I wonder how much power a single core Westmere, with 256K L2, no L3, a single channel memory controller, and 4 PCI-E lanes with an onboard 32nm GPU would use at 2GHz. More than Atom, but probably not too much more for the total platform vs Atom/945GM.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Originally posted by: ilkhan
I wonder how much power a single core Westmere, with 256K L2, no L3, a single channel memory controller, and 4 PCI-E lanes with an onboard 32nm GPU would use at 2GHz. More than Atom, but probably not too much more for the total platform vs Atom/945GM.

Are you talking about the Nettop Atom/945 or the Netbook Atom/945? Since you said 945GM I assume the latter. The 945GM on Netbooks is one of the most power efficient chips out there. Nettop Atom 945G uses 22W, but Netbook 945GMS only uses 4.5W.

In comparison, the lowest power 65nm GMCH uses 8W in the form of GS45, and the 90nm low power GM965 is 9.5W.
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
Originally posted by: IntelUser2000
The die size of 32nm dual core Westmere looks similar to Penryn. I reckon it will end up somewhere in the low 100s of mm^2. The Phenom II has a die size of 258mm2, extremely similar to Nehalem's 263mm2.

One thing that won't be 100% favorable to Intel is that every dual core will feature an MCM'd IGP. The IGP's die size is larger than the CPU core.

Unless Intel adds some enhancements (more transistors) and/or more L2 cache etc to Westmere cores, I don't see the die size going over 85 mm^2.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Originally posted by: Kuzi

Unless Intel adds some enhancements (more transistors) and/or more L2 cache etc to Westmere cores, I don't see the die size going over 85 mm^2.

Hmm. I guess that would also depend largely on what Havendale's die size would have been, which no one outside of Intel knows.

One thing is sure, pricing for Clarkdale is going to be really attractive.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: IntelUser2000
Originally posted by: Idontcare

Wouldn't that be a pretty poor scaling factor for dual-core Westmere if it ends up >100mm^2?

I expect ~80mm^2.

And the 45nm IGP for MCM with westmere is larger than the entire westmere chip, not just a core. Maybe you meant to say that instead of "larger than the CPU core"? A cpu core on westmere will be what, maybe 25mm^2 at most?

The scaling is actually quite poor. In theory each process generation affords twice the transistor density (0.7 x 0.7 = ~0.5), but real scaling is worse: closer to the "pure number" reduction (i.e. 65nm to 45nm is ~0.7) than to the square of it.

http://www.chip-architect.com/...19_Various_Images.html

Throughout 0.13u to 45nm, from Pentium M to 45nm Penryn, the scaling has been approximately 0.7 for logic. Now for SRAM it follows the ideal scaling.

Back in the P4 days, logic scaling has been closer to the 50% reduction, but that has gone away since Pentium M. Probably has to do with optimizing for power and heat control.

For the second point, yea you are right. I have not worded it right. To rephrase it: The GMCH portion of the MCM in Clarkdale/Arrandale is larger than the CPU portion.

You are aware I spent more than a decade developing technology nodes? (from 0.5um to 32nm) Thanks for educating me on the reality of scaling though :laugh:

At any rate, Nehalem is 263mm^2 and has two QPI's that are not included in 2C westmere. However westmere will include on-die PCIe...so we aren't sure how much of the areal savings from ditching the QPI will be spent on PCIe. But let's assume what is likely a worst case and say PCIe requires the same die-space as one QPI, so the tradeout is a wash.

Cut Nehalem down to a 2c chip, 50% of 263mm^2 = 132mm^2. A really crappy shrink would be 70%...that means 92mm^2. Now given that areal scaling of Intel's 45nm was limited by the double-patterning limitations of 193nm dry litho, the migration to 32nm with 193nm immersion litho is going to give a better-than-trend increase in density, but let's say Intel doesn't figure out how to do that and so they merely deliver a 60% areal shrink going to 32nm with immersion litho.

A 60% areal shrink would net a westmere of 78.9mm^2. Give it some room for I/O pads, etc that don't usually shrink as much and figure around 80mm^2. Again, IMO this is an upper limit as it really assumes Intel does not fully leverage the better than trend litho enhancements that immersion litho could provide their 32nm node over the gate and metal pitch used in their 45nm double-pattern integration.
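The estimate above is simple enough to write out as a short sketch. The function and variable names are hypothetical; the 263 mm^2 Nehalem die size, the 50% cut to two cores and the 70% / 60% areal shrink factors are the ones used in this post, and the QPI-for-PCIe swap is treated as a wash, as stated above.

```python
NEHALEM_AREA_MM2 = 263.0  # quad-core Nehalem die size quoted in this thread

def estimate_dual_core_area(
    quad_area_mm2: float,
    areal_shrink: float,
    core_fraction: float = 0.5,
) -> float:
    """Scale a quad-core die to a dual-core die, then apply an areal node shrink.

    areal_shrink is the remaining area fraction after the shrink
    (0.70 means the shrunk die is 70% of its pre-shrink area).
    """
    return quad_area_mm2 * core_fraction * areal_shrink

pessimistic = estimate_dual_core_area(NEHALEM_AREA_MM2, areal_shrink=0.70)
expected = estimate_dual_core_area(NEHALEM_AREA_MM2, areal_shrink=0.60)

print(f"70% areal shrink: {pessimistic:.1f} mm^2")
print(f"60% areal shrink: {expected:.1f} mm^2")
```

This ignores I/O pads and anything else that does not shrink with the node, which is why the final figure is rounded up to around 80mm^2.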

If Westmere is 100mm^2 that would be a disaster from a node entitlement standpoint. At TI our absolute worst node shrink was a 72% areal shrink in the logic areas at the 90nm node because we went with single-damascene (the only company in the industry to do that) and we did not want to challenge our metal pitch over concerns of TDDB reliability. That 72% areal shrink was heavily criticized internally as it translated into nearly zero net cost savings for product shrinks once the cost of the team doing the shrink plus maskset costs were factored in.

You simply MUST hit better than 70% areal shrink to make the node transition cost effective. A 100mm^2 westmere would be a disastrously bad areal shrink on Intel's behalf.

I expect ~80mm^2 for 2C westmere.
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
Originally posted by: IntelUser2000
Hmm. I guess that would also depend largely on what Havendale's die size would have been, which no one outside of Intel knows.

Intel canceled Havendale (CPU+GPU) for the 45nm process, but will release a 32nm version, do you know if the GPU in those will be something new, or will it still use the same horrendous IGP stuff Intel uses in Mobos?

It will be interesting to see how Havendale fares against Fusion (Ontario), which was also delayed until AMD gets to 32nm. The IGP in Fusion will surely have good performance, most likely better than Havendale, but the IPC per core may be lower.

Since Fusion got delayed, hopefully AMD uses Bulldozer cores in it and not K10.5 cores. These Hybrid chips should be great for Notebooks.
 

ilkhan

Golden Member
Jul 21, 2006
1,117
1
0
As far as we know the GPU in clarkdale/arrandale is an improved and shrunk X4500HD. It'll do its job just fine, but won't game very well at all.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: Kuzi
Originally posted by: IntelUser2000
Hmm. I guess that would also depend largely on what Havendale's die size would have been, which no one outside of Intel knows.

Intel canceled Havendale (CPU+GPU) for the 45nm process, but will release a 32nm version, do you know if the GPU in those will be something new, or will it still use the same horrendous IGP stuff Intel uses in Mobos?

It will be interesting to see how Havendale fares against Fusion (Ontario), which was also delayed until AMD gets to 32nm. The IGP in Fusion will surely have good performance, most likely better than Havendale, but the IPC per core may be lower.

Since Fusion got delayed, hopefully AMD uses Bulldozer cores in it and not K10.5 cores. These Hybrid chips should be great for Notebooks.

This AT article on Intel's stated 32nm plans contains a lot of nuggets.

I went to it to pull this up for your question regarding Clarkdale's IGP:

Clarkdale/Arrandale have 32nm CPUs but their on-package GPUs are still built on Intel's 45nm process; these are the GPUs that were supposed to be used for Havendale! It won't be until 2010 with Sandy Bridge that we see a 32nm CPU and 32nm GPU on the same package.

And I realized it also contains this info regarding scaling (to address IntelUser2000's concerns on 32nm scaling):
At 32nm the transistors are approximately 70% the size of Intel's 45nm hk + mg transistors, allowing Intel to pack more in a smaller area.
Click me to see Intel's slide with this stated

http://www.anandtech.com/cpuch...howdoc.aspx?i=3513&p=2

And if you look at their SRAM scaling trend you can see that 32nm SRAM scaling is actually better than 45nm, as expected considering 45nm scaling from 65nm was handicapped by the limitations of double-patterning dry litho, whereas 32nm scaling from 45nm gets to take advantage of immersion litho. Had 45nm used immersion litho, its scaling over 65nm would have been a little better than it was, and the scaling to 32nm would then look a little less impressive than it does.

Having said that, I also realized upon revisiting AT's document that the last page of the article discusses Westmere's new instructions, and I had not considered the fact that Westmere's cores are going to contain some "bloat" as the ISA gets expanded a little bit. So 80mm^2 might be too small if one allows for the possibility of the cores themselves growing by a modest but reasonable 5mm^2 each.

So maybe 90mm^2 for westmere?
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: ilkhan
That quote is confusing. Sandy gets on-die GPU (Cougar Point), yes?

That's what I thought, but you are right the AT quote leaves open the possibility of Sandy being an MCM'ed IGP as well. Good catch, hadn't sunk in when I read it before.

I guess it raises the question: what advantage would there be in having the graphics integrated monolithically with the CPU?

We know the disadvantages in terms of yield and costs that come with increasing the die size...but is it really an advantage to make it monolithic if the communications pathway between the two discrete ICs is never saturated in practice?

I'm having my doubts.