If $15 is your target for a big-core chip, all I can say is it's not gonna happen. This price point is what Atom is for. Flat out, you will not get anything more than mobile gaming from a $15 chip.
Since it is a desktop, not having the latest node shouldn't be a problem. (In fact, 32nm on desktop clocks quite high, although cooling could become an issue if frequencies were pushed to extremes.)
Now, as far as using newer uarchs on 22nm and beyond, I would be concerned about the amount of logic that would need to be disabled to make an extreme-budget gamer desktop chip ($15 and below, including PCH). Sure, your cost per xtor is somewhat lower on advanced nodes, but then the chip has more total xtors and more of them are being disabled to create the differentiation. Some of these disabled units will come from defects, but how much volume is that really going to add? I would think not much, and most of the volume necessary would have to be created by disabling perfectly good logic.
Alternatively, there are always chips like Braswell (quad-core 14nm Atom with 16 Gen 8 EUs, optimized for mobile xtors, SoC). And while I think something like this is fine for a high-end tablet (it is a tablet chip re-purposed for desktop, after all), I have to believe it is less than optimal for an x86 gamer desktop, for many reasons.
Here are some of them:
1. Quad small (i.e., Atom) core: This is not a good idea for an x86 gamer desktop because most of the existing x86 games suitable for its low-voltage 16EU iGPU would be single- or dual-thread games. Two large cores would have been a better use of silicon die area here if it were designed from the ground up as a specialized budget desktop gamer chip.
2. Optimized for mobile xtors: While the low leakage rate is great for mobile, on the desktop the low drive current and low max frequencies make for a poorer value. For optimum value on desktop, I would like to see a die optimized for higher voltage/frequency per mm2 silicon area.
3. SOC: While integrating PCH is beneficial for saving space in the tight confines of a phone or 8" tablet, I have read it does nothing (or very very little) for performance. In fact, in some cases integrating the PCH can bloat the die to the point where some CPU and GPU die area need to be sacrificed in order to keep costs down.
To put the Westmere resurrection to rest, I'd have to point out that die space relates directly to cost: the bigger the die, the more expensive it is to produce. A 32nm Westmere chip is likely to be at least as expensive, if not more so, than a 2C 22nm Haswell chip.
Is that including both the core and the IGP die for Westmere? The Haswell die includes both the cores and the IGP.
A dual core Haswell is 130mm2 on 22nm ----> http://www.anandtech.com/show/7744/intel-reveals-new-haswell-details-at-isscc-2014
Whereas, 2C/4T Westmere is only 81mm2 on 32nm.
According to that diagram you posted, the iGPU on dual core Haswell accounts for more than half of the die itself.
So that is a massive difference in die sizes, not to mention a good difference in process tech.
P.S. Even if the Haswell were only 81mm2 on 22nm (just for the sake of argument), the 81mm2 Westmere on 32nm would be cheaper to make. So we just can't go purely by die sizes when trying to estimate cost.
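The cost argument in this P.S. can be sketched numerically. Below is a rough model using the classic dies-per-wafer approximation and a simple Poisson yield model. The wafer costs and defect densities are made-up placeholders (Intel does not publish these), so only the shape of the result matters, not the dollar figures.

```python
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300):
    """Classic dies-per-wafer approximation (ignores scribe lines and edge exclusion)."""
    r = wafer_diameter_mm / 2
    return math.floor(
        math.pi * r ** 2 / die_area_mm2
        - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    )

def yield_rate(die_area_mm2, defects_per_mm2):
    """Simple Poisson yield model: Y = exp(-D * A)."""
    return math.exp(-defects_per_mm2 * die_area_mm2)

def cost_per_good_die(wafer_cost, die_area_mm2, defects_per_mm2):
    good_dies = dies_per_wafer(die_area_mm2) * yield_rate(die_area_mm2, defects_per_mm2)
    return wafer_cost / good_dies

# Hypothetical inputs: assume the newer node has a pricier wafer and,
# early in its life, a higher defect density.
haswell_22nm = cost_per_good_die(wafer_cost=5000, die_area_mm2=130, defects_per_mm2=0.002)
westmere_32nm = cost_per_good_die(wafer_cost=3500, die_area_mm2=81, defects_per_mm2=0.001)
print(f"130mm2 die on 22nm: ${haswell_22nm:.2f} per good die")
print(f" 81mm2 die on 32nm: ${westmere_32nm:.2f} per good die")
```

Under these illustrative assumptions the smaller 32nm die comes out cheaper per good die, which is the point being argued; with a mature, cheap enough 22nm wafer the conclusion could flip, which is why die size alone doesn't settle the cost question.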
I wish I had better access to information regarding Intel's fabs, but according to the following list here are the fabs currently in use:
http://en.wikipedia.org/wiki/List_of_Intel_manufacturing_sites
Fab sites:
D1X Hillsboro, Oregon, USA 300 mm, 14 nm
D1D Hillsboro, Oregon, USA 300 mm, 14 nm
D1C Hillsboro, Oregon, USA 300 mm, 22/14 nm
Fab 12 Chandler, Arizona, USA 300 mm, 65 nm
Fab 32 Chandler, Arizona, USA 300 mm, 22/14 nm
Fab 42 Chandler, Arizona, USA 450 mm, 14 nm
Fab 11 Rio Rancho, New Mexico, USA 300 mm, 45/32 nm
Fab 11X Rio Rancho, New Mexico, USA 300 mm, 45/32 nm
Fab 17 Hudson, Massachusetts, USA 200 mm, 130 nm
Fab 24 Leixlip, Ireland 300 mm, 14 nm
Fab 28 Kiryat Gat, Israel 300 mm, 22 nm
Fab 68 Dalian, China 300 mm, 65 nm
Perhaps, if there were a need, some of the 65nm fabs could at least partly transition to 45nm to increase capacity.
You are comparing apples and oranges there.
You also forgot the extra on-package die (the graphics/memory controller chip) for the Westmere if you somehow wish to compare them.
The comparison was Haswell dual core die size vs. dual core Westmere, not Haswell dual core vs. Clarkdale (which is dual core Westmere + Iron Lake).
Regardless, you still need an additional die alongside Westmere for your iGPU, memory controller, etc. This additional die adds significant cost, and there's still the issue of memory latency.
And that's an irrelevant comparison.
You can also make a dual-core Haswell or Broadwell without the IGP, memory controller, and so on.
Remember, this thread is about relaunching 2C/4T Westmere with a new on-package graphics/memory controller chip.
So knowing the cost of 2C/4T Westmere is very relevant when making a comparison to a CPU with iGPU as a single chip.
If 2C/4T Westmere were actually more expensive than Haswell dual core, I wouldn't have even made this thread.
Intel doesn't make the Haswell or Broadwell chips that way.
I probably mentioned this before, but resurrecting Westmere is probably more expensive than fusing off a Haswell
Assuming the project was successful, Intel could always shrink down Westmere to a smaller node (at some later point in time).
I think once you open that door, you're better off using fuses on a current gen processor. The amount of design resources to build another core derivative (even if it's logically the same as a previous gen) is enormous.
or taking an Atom IP.
P.S. Remember that with advanced nodes, the defect rate on wafers rises exponentially with die area. So in some cases I think having the graphics/memory controller as a separate chip could help reduce costs.
Hopefully the latency issue mentioned in the OP would be fixed by using QPI rather than FSB.
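The P.S. above can be illustrated with the same kind of Poisson yield model (Y = e^(-D·A)); the defect density and die areas here are made-up placeholders. Splitting one large die into two smaller ones lowers the silicon cost per good product, because a defect only scraps the die it lands on rather than the whole chip:

```python
import math

D = 0.002  # defects per mm^2 -- hypothetical value, for illustration only

def silicon_cost(area_mm2):
    """Relative silicon cost per good die under a Poisson yield model: area / yield."""
    return area_mm2 / math.exp(-D * area_mm2)

# One 160mm2 monolithic SoC vs. a 100mm2 CPU die plus a 60mm2
# graphics/memory controller die (areas chosen only to show the effect).
monolithic = silicon_cost(160)
split = silicon_cost(100) + silicon_cost(60)
print(f"monolithic: {monolithic:.1f}  split: {split:.1f}")  # split comes out lower
```

This ignores the extra packaging and test cost of a two-die MCP, and of course the latency penalty discussed in this thread, so it only shows why the split *can* win on silicon cost, not that it always does.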
22nm is cheaper than 32nm in all metrics for Intel.
Westmere already used QPI for the MCP.
Memory Performance - Not Very Nehalem
Let's start at the obvious place, memory performance. Nehalem moved the memory controller on-die, but Clarkdale pushes it off again and over to an on-package 45nm graphics core.
To make matters worse, the on-package chipset is a derivative of the P45 lineage. It's optimized for FSB architectures, not the QPI that connects the chipset to Clarkdale. Let's look at the numbers first:
| Processor | L1 Latency | L2 Latency | L3 Latency |
| --- | --- | --- | --- |
| Intel Core i7-975 | 4 clocks | 10 clocks | 34 clocks |
| Intel Core i5-750 | 4 clocks | 10 clocks | 34 clocks |
| Intel Core i5-661 | 4 clocks | 10 clocks | 39 clocks |
| AMD Phenom II X4 965 | 3 clocks | 15 clocks | 57 clocks |
| Intel Core 2 Duo E8600 | 3 clocks | 15 clocks | — |
L1 and L2 cache latency is unchanged. Nehalem uses a 4-cycle L1 and a 10-cycle L2, and that's exactly what we get with Clarkdale. L3 cache is a bit slower than the Core i7 975, which makes sense because the Core i5 661 has a lower un-core clock (2.40GHz vs. 2.66GHz for the high-end Core i7s). Intel says that all Clarkdale Core i5s use the same 2.40GHz uncore clock, while the i3s run it at 2.13GHz and the Clarkdale Pentiums run it at 2.0GHz.
| Processor | Memory Latency | Read Bandwidth | Write Bandwidth | Copy Bandwidth |
| --- | --- | --- | --- | --- |
| Intel Core i7-975 | 45.5 ns | 14379 MB/s | 15424 MB/s | 16291 MB/s |
| Intel Core i5-750 | 51.5 ns | 15559 MB/s | 12432 MB/s | 15200 MB/s |
| Intel Core i5-661 | 76.4 ns | 9796 MB/s | 7599 MB/s | 9354 MB/s |
| AMD Phenom II X4 965 | 52.3 ns | 8425 MB/s | 6811 MB/s | 10145 MB/s |
| Intel Core 2 Duo E8600 | 68.6 ns | 7975 MB/s | 7062 MB/s | 7291 MB/s |
Here's where things get disgusting. Memory latency is about 76% higher than on Lynnfield. That's just abysmal. It's also reflected in the memory bandwidth scores. While Lynnfield can manage over 15GB/s from its dual-channel memory controller, Clarkdale can't break 10. Granted this is higher than the Core 2 platforms, but it's not great.
What we're looking at is a Nehalem-like CPU architecture coupled with a 45nm P45 chipset on-package. And it doesn't look very good. If anything was going to hurt Clarkdale's performance, it'd be memory latency.
Yes, Anand mentioned the chipset used QPI to connect to the Clarkdale multi-chip package (MCP), but the on-package memory controller is described as optimized for FSB architectures (which, according to the article, made matters worse beyond simply moving it off the CPU die).
http://www.anandtech.com/show/2901/2
If this were true, I think we would have seen Intel integrate the PCH on the low-voltage 22nm Haswell mobile chips, but we didn't. Instead the PCH stayed on 32nm rather than being integrated on 22nm.
I am not sure what you (and Anand) are trying to show. No matter what, the extra QPI hop will make it worse than any native implementation. And it's not going to change no matter what you replace the MCP with.
Whatever was going on with Clarkdale's memory controller, it wasn't good. According to the chart, the on-package memory controller actually had worse memory latency than even an E8600 Core 2 Duo (which had its memory controller on the northbridge, a greater distance away from the CPU than an on-package memory controller).
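As a quick check on that chart, Clarkdale's latency penalty can be recomputed against either native-IMC Nehalem part; the exact percentage depends on which one you take as the baseline. This is just arithmetic on the quoted table numbers:

```python
# Memory latency figures taken from the quoted AnandTech table (ns).
latency_ns = {
    "Core i7-975 (Bloomfield)": 45.5,
    "Core i5-750 (Lynnfield)": 51.5,
    "Core i5-661 (Clarkdale)": 76.4,
}

clarkdale = latency_ns["Core i5-661 (Clarkdale)"]
for baseline in ("Core i7-975 (Bloomfield)", "Core i5-750 (Lynnfield)"):
    penalty = clarkdale / latency_ns[baseline] - 1
    print(f"vs {baseline}: {penalty:.0%} higher")  # 68% and 48% higher, respectively
```

Either way, Clarkdale sits well above every platform in the table with a native or northbridge memory controller, which is the point being made.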