ITT: We discuss processors for Steam & whether Westmere 2C/4T should be resurrected?


witeken

Diamond Member
Dec 25, 2013
3,899
193
106
Intel isn't going to invest in old and obsolete process nodes and architectures. Those only have disadvantages.

If Intel wants to make ARM irrelevant, they have to do the opposite.
 

Zodiark1593

Platinum Member
Oct 21, 2012
2,230
4
81
Since it is a desktop, not having the latest node shouldn't be a problem. (In fact, 32nm on desktop clocks quite high, although cooling could become an issue if frequencies were pushed to extremes.)

Now as far as using newer uarchs on 22nm and beyond, I would be concerned about the amount of logic that would need to be disabled to make an extreme budget gamer desktop chip ($15 and below, including PCH). Sure your cost per xtor is somewhat lower on advanced nodes, but then the chip has more total xtors and more of them are being disabled to create the differentiation. Sure some of these disabled units will come from defects, but how much volume is that really going to add? I would think it would not be much and most of the volume necessary would have to be created from disabling perfectly good logic.

Alternatively, there are always chips like Braswell (quad core 14nm atom with 16 Gen 8 EUs, optimized for mobile xtors, SOC). And while I think something like this is fine for high end tablet (it is a tablet chip re-purposed for desktop, after all) I have to believe it is less than optimum for x86 gamer desktop for many reasons.

Here are some of them:

1. Quad small (ie, atom) core: This is not a good idea for x86 gamer desktop because most of the existing x86 games suitable for its low voltage 16EU iGPU would be single or dual thread games. Two large cores would have been a better use of silicon die area here if it were designed from the ground up as a specialized budget desktop gamer chip.

2. Optimized for mobile xtors: While the low leakage rate is great for mobile, on the desktop the low drive current and low max frequencies make for a poorer value. For optimum value on desktop, I would like to see a die optimized for higher voltage/frequency per mm2 silicon area.

3. SOC: While integrating PCH is beneficial for saving space in the tight confines of a phone or 8" tablet, I have read it does nothing (or very very little) for performance. In fact, in some cases integrating the PCH can bloat the die to the point where some CPU and GPU die area need to be sacrificed in order to keep costs down.
If $15 is your target for a big core chip, all I can say is it's not gonna happen. This price point is what Atom is for. Flat out, you will not get anything more than mobile gaming from a $15 chip.

To put the Westmere resurrection to rest, I'd have to point out that die space relates directly to cost: the bigger the die, the more expensive it is to produce. A 32nm Westmere chip is likely to be at least as expensive as, if not more expensive than, a 2C 22nm Haswell chip. The price difference for Intel will expand even more with Broadwell. It makes zero sense to bring back 32nm for their processors.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
To put the Westmere resurrection to rest, I'd have to point out that die space relates directly to cost: the bigger the die, the more expensive it is to produce. A 32nm Westmere chip is likely to be at least as expensive as, if not more expensive than, a 2C 22nm Haswell chip.

A dual core Haswell is 130mm2 on 22nm ----> http://www.anandtech.com/show/7744/intel-reveals-new-haswell-details-at-isscc-2014

Whereas, 2C/4T Westmere is only 81mm2 on 32nm.

So that is a massive difference in die sizes, not to mention a good difference in process tech.

P.S. Even if the Haswell were only 81mm2 on 22nm (just for the sake of argument), the 81mm2 Westmere on 32nm would be cheaper to make. So we just can't go purely by die sizes when trying to estimate cost.
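The P.S. above can be sketched numerically. This is only a rough illustration: the dies-per-wafer formula is a common first-order estimate, and the two wafer prices are made-up placeholders (nothing in this thread gives Intel's real wafer costs, and whether 32nm wafers are actually cheaper is the point under dispute). The function and constant names are mine.

```python
import math

WAFER_DIAMETER_MM = 300.0  # 300 mm wafers, as listed for most fabs in this thread


def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=WAFER_DIAMETER_MM):
    """First-order estimate: wafer area / die area, minus an edge-loss term.

    Ignores scribe lines, defects and die aspect ratio.
    """
    radius = wafer_diameter_mm / 2.0
    whole = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(whole - edge_loss)


def cost_per_die(die_area_mm2, wafer_cost):
    """Wafer cost spread over the gross die count (no yield loss modeled)."""
    return wafer_cost / gross_dies_per_wafer(die_area_mm2)


# HYPOTHETICAL wafer prices, chosen only to show the mechanism.
HYPOTHETICAL_WAFER_COST = {"32nm": 3000.0, "22nm": 4000.0}

if __name__ == "__main__":
    for label, area, node in [
        ("2C/4T Westmere, 81mm2 on 32nm", 81.0, "32nm"),
        ("2C Haswell, 130mm2 on 22nm", 130.0, "22nm"),
        ("imaginary 81mm2 die on 22nm", 81.0, "22nm"),
    ]:
        dies = gross_dies_per_wafer(area)
        cost = cost_per_die(area, HYPOTHETICAL_WAFER_COST[node])
        print(f"{label}: {dies} gross dies, {cost:.2f} per die")
```

With equal area the die count is identical, so the per-die cost just follows the wafer price. If the assumed prices were reversed, the conclusion would reverse too.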
 

Zodiark1593

Platinum Member
Oct 21, 2012
2,230
4
81
A dual core Haswell is 130mm2 on 22nm ----> http://www.anandtech.com/show/7744/intel-reveals-new-haswell-details-at-isscc-2014

Whereas, 2C/4T Westmere is only 81mm2 on 32nm.

So that is a massive difference in die sizes, not to mention a good difference in process tech.

P.S. Even if the Haswell were only 81mm2 on 22nm (just for the sake of argument), the 81mm2 Westmere on 32nm would be cheaper to make. So we just can't go purely by die sizes when trying to estimate cost.
According to that diagram you posted, the iGPU on dual core Haswell accounts for more than half of the die itself.

In addition, even if you can make Westmere itself cheaper to make than a Haswell die, there is the extra cost of implementing the iGPU and associated chipset as a separate die. :rolleyes:
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
I wish I had better access to information regarding Intel's fabs, but according to the following list here are the fabs currently in use:

http://en.wikipedia.org/wiki/List_of_Intel_manufacturing_sites

Fab sites:

Fab | Location | Wafer size | Process
D1X | Hillsboro, Oregon, USA | 300 mm | 14 nm
D1D | Hillsboro, Oregon, USA | 300 mm | 14 nm
D1C | Hillsboro, Oregon, USA | 300 mm | 22/14 nm
Fab 12 | Chandler, Arizona, USA | 300 mm | 65 nm
Fab 32 | Chandler, Arizona, USA | 300 mm | 22/14 nm
Fab 42 | Chandler, Arizona, USA | 450 mm | 14 nm
Fab 11 | Rio Rancho, New Mexico, USA | 300 mm | 45/32 nm
Fab 11X | Rio Rancho, New Mexico, USA | 300 mm | 45/32 nm
Fab 17 | Hudson, Massachusetts, USA | 200 mm | 130 nm
Fab 24 | Leixlip, Ireland | 300 mm | 14 nm
Fab 28 | Kiryat Gat, Israel | 300 mm | 22 nm
Fab 68 | Dalian, China | 300 mm | 65 nm

Perhaps, if there were a need, some of the 65nm fabs could at least partly transition to 45nm to increase capacity.

Terrible list tho. Fab 17 for example is closed. And D1X is 2 fabs, both 450mm capable. And so on and on.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
A dual core Haswell is 130mm2 on 22nm ----> http://www.anandtech.com/show/7744/intel-reveals-new-haswell-details-at-isscc-2014

Whereas, 2C/4T Westmere is only 81mm2 on 32nm.

So that is a massive difference in die sizes, not to mention a good difference in process tech.

P.S. Even if the Haswell were only 81mm2 on 22nm (just for the sake of argument), the 81mm2 Westmere on 32nm would be cheaper to make. So we just can't go purely by die sizes when trying to estimate cost.

You compare apples and oranges for that matter.

You also forgot this part for the Westmere if you somehow wish to compare them:
[image: clarkdale_dice.jpg]
 

kimmel

Senior member
Mar 28, 2013
248
0
41
I think this thread exemplifies why Atom is taking over quite a bit of volume for Intel. People are seriously asking for Westmere? When in the next couple of years both ARM and Atom could possibly be within spitting distance at massively lower power and cost.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Is that including both the core and the IGP die for westmere? The Haswell die includes both the cores and the IGP.

No it doesn't.

I only included the die size for 2C/4T Westmere.

Westmere does not include the 45nm Iron Lake GMA graphics chip, which is 114mm2.

Clarkdale = Westmere + Iron Lake
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
You compare apples and oranges for that matter.

You also forgot this part for the Westmere if you somehow wish to compare them:
[image: clarkdale_dice.jpg]

The comparison was Haswell dual core die size vs. dual core Westmere, not Haswell dual core vs. Clarkdale (which is dual core Westmere + Iron Lake).
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
The comparison was Haswell dual core die size vs. dual core Westmere, not Haswell dual core vs. Clarkdale (which is dual core Westmere + Iron Lake).

And that's an irrelevant comparison. You can also make a dualcore Haswell or Broadwell without IGP, memory controller and so on.
If Haswell were made like the Westmere you link, it would be what, 50mm2? Broadwell, 30mm2?

If you claim it's the same die size in terms of cost, you also have to include everything. Your Westmere is useless without the MCP.
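As a quick bookkeeping sketch of the "include everything" argument, here is the total silicon per package using only die sizes quoted in this thread. The 114 figure is the Iron Lake die size cbn gives earlier, read as mm2. Keep in mind the dies sit on different nodes (32nm, 45nm, 22nm), so area alone is not cost; the variable names are mine.

```python
# Die areas in mm^2, as quoted by posters in this thread.
WESTMERE_2C_CPU_DIE = 81    # 32nm CPU die
IRON_LAKE_GPU_IMC_DIE = 114  # 45nm graphics + memory controller die
HASWELL_2C_DIE = 130        # 22nm, cores and iGPU on one die

# Clarkdale is the CPU die plus the on-package graphics/memory controller die.
clarkdale_total = WESTMERE_2C_CPU_DIE + IRON_LAKE_GPU_IMC_DIE
ratio = clarkdale_total / HASWELL_2C_DIE

print(f"Clarkdale package silicon: {clarkdale_total} mm2")
print(f"Dual core Haswell die:     {HASWELL_2C_DIE} mm2")
print(f"Clarkdale / Haswell area:  {ratio:.2f}x")
```

Counted this way the Clarkdale package carries more total silicon than the dual core Haswell die, even though the Westmere CPU die by itself is smaller.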
 

Zodiark1593

Platinum Member
Oct 21, 2012
2,230
4
81
The comparison was Haswell dual core die size vs. dual core Westmere, not Haswell dual core vs. Clarkdale (which is dual core Westmere + Iron Lake).
Regardless, you still need an additional die alongside Westmere for your iGPU, Memory controller, etc. This additional die adds significant cost, and there's still the issue of memory latency.

Unless Intel has a warehouse full of Westmere chips, cost savings not found.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
And that's an irrelevant comparison.

Remember, this thread is about relaunching 2C/4T Westmere with a new on package graphics/memory controller chip.

So knowing the cost of 2C/4T Westmere is very relevant when making a comparison to a CPU with iGPU as a single chip.

If 2C/4T Westmere were actually more expensive than Haswell dual core, I wouldn't have even made this thread.

You can also make a dualcore Haswell or Broadwell without IGP, memory controller and so on.

Intel doesn't make the Haswell or Broadwell chips that way.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Regardless, you still need an additional die alongside Westmere for your iGPU, Memory controller, etc. This additional die adds significant cost

The additional cost will depend on what is used for the on package graphics and memory controller chip.

Here are some possibilities:

1. Nvidia Kepler on TSMC 28nm
2. Nvidia Maxwell on TSMC 28nm
3. Nvidia Kepler on Intel 22nm
4. Nvidia Maxwell on Intel 22nm
5. Intel Gen 7 or 7.5 on 22nm
6. Imagination Tech Series 6 on Intel 22nm (Merrifield uses this IP)
7. Intel Gen 8 on 14nm

P.S. Remember that with advanced nodes, the defect rate on wafers rises exponentially with area. So in some cases I think making the graphics/memory controller a separate chip could help reduce costs.
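To illustrate the P.S., here is a minimal sketch using the simple Poisson yield model, where the fraction of defect-free dies is exp(-area x defect density). Under this model it is the yield that falls off exponentially with area for a fixed defect density. The defect density below is a made-up placeholder, both dies are treated as if they were on the same process, and packaging cost is ignored, so take it as a mechanism demo only.

```python
import math


def poisson_yield(die_area_mm2, defects_per_cm2):
    """Poisson yield model: probability that a die has zero defects."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_die(die_area_mm2, defects_per_cm2):
    """Wafer area (mm^2) consumed on average per working die."""
    return die_area_mm2 / poisson_yield(die_area_mm2, defects_per_cm2)


D0 = 0.5  # HYPOTHETICAL defect density in defects/cm^2, not a real fab number

# Die sizes quoted in this thread for the two Clarkdale dies.
cpu_mm2, gpu_mm2 = 81.0, 114.0

# One big die versus the same logic split across two smaller dies.
monolithic = silicon_per_good_die(cpu_mm2 + gpu_mm2, D0)
split = silicon_per_good_die(cpu_mm2, D0) + silicon_per_good_die(gpu_mm2, D0)

print(f"yield, 195mm2 monolithic die: {poisson_yield(cpu_mm2 + gpu_mm2, D0):.1%}")
print(f"yield, 81mm2 die:             {poisson_yield(cpu_mm2, D0):.1%}")
print(f"yield, 114mm2 die:            {poisson_yield(gpu_mm2, D0):.1%}")
print(f"silicon per good unit: monolithic {monolithic:.0f} mm2, split {split:.0f} mm2")
```

In this model the split always spends less wafer area per working unit than the monolithic die. Whether that saving survives the extra packaging and the on package link is a separate question.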

and there's still the issue of memory latency.

Hopefully the latency issue mentioned in the OP would be fixed by using QPI rather than FSB.
 

TuxDave

Lifer
Oct 8, 2002
10,572
3
71
Remember, this thread is about relaunching 2C/4T Westmere with a new on package graphics/memory controller chip.

So knowing the cost of 2C/4T Westmere is very relevant when making a comparison to a CPU with iGPU as a single chip.

If 2C/4T Westmere were actually more expensive than Haswell dual core, I wouldn't have even made this thread.

Intel doesn't make the Haswell or Broadwell chips that way.

I probably mentioned this before, but resurrecting Westmere is probably more expensive than fusing off a Haswell or taking an Atom IP.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
I probably mentioned this before, but resurrecting Westmere is probably more expensive than fusing off a Haswell

Back in post #25 of this thread you mentioned you thought it would be better to use fuses on a current generation processor than to shrink Westmere down to 22nm:

Assuming the project was successful, Intel could always shrink down Westmere to a smaller node (at some later point in time).

I think once you open that door, you're better off using fuses on a current gen processor. The amount of design resources to build another core derivative (even if it's logically the same as a previous gen) is enormous.

So I took that as quite a bit different than using Westmere "As is".

So moving on to the idea of using a fused off Haswell dual core as an extreme budget gaming processor we are looking at a 130mm2 die size on 22nm. This is a good deal larger than the 102mm2 on 22nm for Bay Trail which Intel currently sells for as low as $17 according to Intel Ark:

http://ark.intel.com/products/80274/Intel-Atom-Processor-Z3735F-2M-Cache-up-to-1_83-GHz

http://ark.intel.com/products/80275/Intel-Atom-Processor-Z3735G-2M-Cache-up-to-1_83-GHz

Also in addition to whatever your fused off Haswell would cost, we would also need one of the 32nm PCHs.

Now with that information and data in mind, realize I am hoping to see a Rockchip level x86 desktop big core gaming processor released. So with that mentioned, what kind of specs and pricing do you have in mind for a dual core Haswell? What would be your time frame for releasing your chip?

P.S. I am not against your idea of using fused off Haswell. Quite the contrary, I believe Intel could make some really interesting budget level SKUs, including some feature disabled 2C/4T GT2 configurations that they haven't released before. However, I question whether they would be cheap enough for Rockchip level big core gear. Also it seems that Intel's graphics uarchs improve much faster than their cpu uarchs, so if these chips get too old I have to think a point comes when mixing an old Intel big core (eg, Westmere) with a new GPU uarch (eg, Nvidia, Gen 8, etc) could be a better value.

or taking an Atom IP.

I think atom is just too slow for a gaming desktop.

....And as I mentioned back in post #49, I question the value of atom even more when it is in the form of a desktop SOC like Braswell, for at least the three reasons listed:

1. Quad small (ie, atom) core: This is not a good idea for x86 gamer desktop because most of the existing x86 games suitable for its low voltage 16EU iGPU would be single or dual thread games. Two large cores would have been a better use of silicon die area here if it were designed from the ground up as a specialized budget desktop gamer chip.

2. Optimized for mobile xtors: While the low leakage rate is great for mobile, on the desktop the low drive current and low max frequencies make for a poorer value. For optimum value on desktop, I would like to see a die optimized for higher voltage/frequency per mm2 silicon area.

3. SOC: While integrating PCH is beneficial for saving space in the tight confines of a phone or 8" tablet, I have read it does nothing (or very very little) for performance. In fact, in some cases integrating the PCH can bloat the die to the point where some CPU and GPU die area need to be sacrificed in order to keep costs down.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
P.S. Remember that with advanced nodes, the defect rate on wafers rises exponentially with area. So in some cases I think making the graphics/memory controller a separate chip could help reduce costs.

22nm is cheaper than 32nm in all metrics for Intel.

Hopefully the latency issue mentioned in the OP would be fixed by using QPI rather than FSB.

Westmere already used QPI for the MCP.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Intel doesn't make the Haswell or Broadwell chips that way.

It's cheaper to make a new die than it is to resurrect 32nm Westmeres. And you would end up with 30-50mm2 CPUs with much lower TDP, lower cost and so on.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
cbn said:
P.S. Remember that with advanced nodes, the defect rate on wafers rises exponentially with area. So in some cases I think making the graphics/memory controller a separate chip could help reduce costs.

22nm is cheaper than 32nm in all metrics for Intel.

If this were true I think we would have seen Intel integrate PCHs on the low voltage 22nm Haswell mobile chips, but we didn't. Instead the PCH stayed on 32nm rather than being integrated on 22nm.

Same thing for 14nm Broadwell ULV chips, the PCH isn't even 22nm. They still kept it at 32nm.



(Broadwell ULT/ULX on 14nm with 32nm PCH, left and Haswell ULT/ULX on 22nm with 32nm PCH, right)
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Westmere already used QPI for the MCP.

Yes, Anand mentioned the chipset used QPI to connect to the Clarkdale multichip package (MCP), but the on package memory controller is mentioned as optimized for FSB architectures (which, according to the article, made matters worse beyond simply moving it off the cpu die).

http://www.anandtech.com/show/2901/2

Memory Performance - Not Very Nehalem

Let’s start at the obvious place, memory performance. Nehalem moved the memory controller on-die, but Clarkdale pushes it off again and over to an on-package 45nm graphics core.

To make matters worse, the on-package chipset is a derivative of the P45 lineage. It’s optimized for FSB architectures, not the QPI that connects the chipset to Clarkdale. Let’s look at the numbers first:

Processor | L1 Latency | L2 Latency | L3 Latency
Intel Core i7-975 | 4 clocks | 10 clocks | 34 clocks
Intel Core i5-750 | 4 clocks | 10 clocks | 34 clocks
Intel Core i5-661 | 4 clocks | 10 clocks | 39 clocks
AMD Phenom II X4 965 | 3 clocks | 15 clocks | 57 clocks
Intel Core 2 Duo E8600 | 3 clocks | 15 clocks | n/a



L1 and L2 cache latency is unchanged. Nehalem uses a 4-cycle L1 and a 10-cycle L2, and that’s exactly what we get with Clarkdale. L3 cache is a bit slower than the Core i7 975, which makes sense because the Core i5 661 has a lower un-core clock (2.40GHz vs. 2.66GHz for the high end Core i7s). Intel says that all Clarkdale Core i5s use the same 2.40GHz uncore clock, while the i3s run it at 2.13GHz and the Clarkdale Pentiums run it at 2.0GHz.

Processor | Memory Latency | Read Bandwidth | Write Bandwidth | Copy Bandwidth
Intel Core i7-975 | 45.5 ns | 14379 MB/s | 15424 MB/s | 16291 MB/s
Intel Core i5-750 | 51.5 ns | 15559 MB/s | 12432 MB/s | 15200 MB/s
Intel Core i5-661 | 76.4 ns | 9796 MB/s | 7599 MB/s | 9354 MB/s
AMD Phenom II X4 965 | 52.3 ns | 8425 MB/s | 6811 MB/s | 10145 MB/s
Intel Core 2 Duo E8600 | 68.6 ns | 7975 MB/s | 7062 MB/s | 7291 MB/s


Here’s where things get disgusting. Memory latency is about 48% higher than on Lynnfield (76.4 ns vs. 51.5 ns). That’s just abysmal. It’s also reflected in the memory bandwidth scores. While Lynnfield can manage over 15GB/s from its dual-channel memory controller, Clarkdale can’t break 10. Granted this is higher than the Core 2 platforms, but it’s not great.

What we’re looking at is a Nehalem-like CPU architecture coupled with a 45nm P45 chipset on-package. And it doesn’t look very good. If anything was going to hurt Clarkdale’s performance, it’d be memory latency.
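For reference, the percentage gaps implied by the memory latency column can be recomputed directly. This is just arithmetic on the table's figures (the Core i5-661 is the Clarkdale part, the Core i5-750 is Lynnfield); the helper name is mine.

```python
# Memory latency in ns, copied from the table quoted above.
MEMORY_LATENCY_NS = {
    "Intel Core i7-975": 45.5,
    "Intel Core i5-750": 51.5,       # Lynnfield
    "Intel Core i5-661": 76.4,       # Clarkdale
    "AMD Phenom II X4 965": 52.3,
    "Intel Core 2 Duo E8600": 68.6,
}


def percent_higher(value, baseline):
    """How much higher `value` is than `baseline`, in percent."""
    return (value / baseline - 1.0) * 100.0


clarkdale_ns = MEMORY_LATENCY_NS["Intel Core i5-661"]
for name, latency_ns in MEMORY_LATENCY_NS.items():
    if name == "Intel Core i5-661":
        continue
    gap = percent_higher(clarkdale_ns, latency_ns)
    print(f"Core i5-661 vs {name}: {gap:.0f}% higher latency")
```

By this arithmetic the Clarkdale part comes out roughly 48% above the Core i5-750 and about 11% above the Core 2 Duo E8600.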
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Yes, Anand mentioned the chipset used QPI to connect to the Clarkdale multichip package (MCP), but the on package memory controller is mentioned as optimized for FSB architectures (which, according to the article, made matters worse beyond simply moving it off the cpu die).

http://www.anandtech.com/show/2901/2

I am not sure what you (and Anand) are trying to show. No matter what, the extra QPI hop will make it worse than any native implementation. And it's not going to change no matter what you replace the MCP with.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
If this were true I think we would have seen Intel integrate PCHs on the low voltage 22nm Haswell mobile chips, but we didn't. Instead the PCH stayed on 32nm rather than being integrated on 22nm.

It's at 32nm for capacity reasons.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
I am not sure what you (and Anand) are trying to show. No matter what, the extra QPI hop will make it worse than any native implementation. And it's not going to change no matter what you replace the MCP with.

Whatever was going on with Clarkdale's memory controller, it wasn't good. According to the chart, the on package memory controller actually had worse memory latency than even an E8600 Core 2 Duo (which had its memory controller on the Northbridge, a greater distance away from the cpu than an on package memory controller).
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Whatever was going on with Clarkdale's memory controller, it wasn't good. According to the chart, the on package memory controller actually had worse memory latency than even an E8600 Core 2 Duo (which had its memory controller on the Northbridge, a greater distance away from the cpu than an on package memory controller).

But the E8600 also had 50% more cache, in a fast L2, versus the Westmere with its slower L3. Not to mention that the Core 2 runs its cache much faster than the Westmere uncore.