Intel launches Haswell Xeon E5

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
Intel Xeon E5 Version 3: Up to 18 Haswell EP Cores

While some sites previously reported that an "unknown source" told them Intel was cooking up a 14-core Haswell EP Xeon chip, and that the next generation 14 nm Xeon E5 "Broadwell" would be an 18-core design, the reality is that Intel has an 18-core Haswell EP design, and we have it for testing. This is yet another example of truth beating fiction.

HaswellEP_DieConfig.png


The Xeon E5-2650L v3, however, is the true star of this review. It is power efficient (obviously), and contrary to previous low-power offerings it still delivers good response times. Perhaps more surprising is that it even performs well in our FP-intensive applications.

At the other end of the spectrum, the Xeon E5-2699 v3 is much more power hungry than we are used to from a high-end part. It shines in SAP, where hardware costs are dwarfed by the consulting invoices, and delivers maximum performance in HPC. However, the peak power draw of this CPU is nothing to laugh at. Of course, the HPC crowd is used to power hogs (e.g. GPGPU), but there's a reason Intel doesn't usually offer >130W TDP processors.

Considering the new Haswell EP processors will require a completely new platform – motherboards, memory, and processors all need to be upgraded – at least initially the parts will mostly be of interest to new server buyers. There are also businesses that demand the absolute fastest servers available and they'll be willing to upgrade, but for many the improvements with Haswell EP may not be sufficient to entice them into upgrading. The 14 nm Broadwell EP will likely be a better time to update servers, but that's still a year or so away.

58017s.png



18 Haswell cores, 5.69B transistors and a die area of 662mm², which makes the GTX Titan look small in comparison. A great way to start IDF :).
 

Ajay

Lifer
Jan 8, 2001
15,451
7,861
136
18 Haswell cores, 5.69B transistors and a die area of 662mm², which makes the GTX Titan look small in comparison. A great way to start IDF :).

Holy smokes, that's a huge die!!! These Haswell-EPs are impressive, but I wonder how many customers will wait for BW-EP (which will have a smaller die and much lower power consumption). They'll probably be socket compatible, so really it's a win/win.

Also interesting is the 160 W TDP limit on the MCC CPUs - I'm guessing there are going to be some very fast workstation processors.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
18 Haswell cores, 5.69B transistors and a die area of 662mm², which makes the GTX Titan look small in comparison. A great way to start IDF :).

That is the largest die size I have ever seen listed.

18 cores is a lot!

So with that core count total in mind for a single die, how many cores are needed or used in each multi-socket server for a high end cluster?
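To put rough numbers on that, here's a quick sketch — the node count is made up purely for illustration, not any real system:

```python
# Illustrative cluster-size arithmetic using the 18-core part from
# this thread. The node count is a hypothetical example.
sockets_per_node = 2      # typical dual-socket E5 node
cores_per_socket = 18     # E5-2699 v3
nodes = 100               # made-up cluster size

total_cores = nodes * sockets_per_node * cores_per_socket
print(total_cores)  # 3600 physical cores (7200 threads with HT)
```

Even a modest 100-node cluster of dual-socket boxes lands in the thousands of cores with this part.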
 

Sweepr

Diamond Member
May 12, 2006
5,148
1,142
131
Massive die and insane MT performance from the 18-core model. Around 50% faster than the previous 2.7GHz 12-core Ivy Bridge-EP, an impressive feat considering they managed that on the same 22nm process. Honestly never expected they would go >15 cores (Ivy Bridge-EX has 15 cores).
 

BigDaveX

Senior member
Jun 12, 2014
440
216
116
Dayy-um. I'm almost scared to think how many cores Broadwell-EP is likely to be packing. 24, anyone?

What's worse is seeing the Opterons just get pulverized in this test. I mean, I wouldn't expect them to come close to beating Haswell-EP anyway, given that at the high end Intel now has more cores than AMD has modules... but it looks like AMD's best Opterons are still only roughly competitive with Westmere, to say nothing of the three Intel generations that have come and gone since then.

EDIT: Someone should install Windows 7 on a system with two E5-2699 v3 chips, just to see what its task manager looks like on a system with 72 logical cores!
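The arithmetic behind that 72 figure is just:

```python
# Logical CPUs the OS would see in a dual-socket E5-2699 v3 box.
sockets = 2
cores_per_socket = 18   # per the die config in this thread
threads_per_core = 2    # Hyper-Threading

logical_cpus = sockets * cores_per_socket * threads_per_core
print(logical_cpus)  # 72
```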
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
but I wonder how many customers will wait for BW-EP (which will have a smaller die and much lower power consumption).

The thing I found impressive is that the market that buys these things essentially moves 100% to the new generation within 6-9 months of product introduction. That's a big contrast to the consumer market, where even 3 years later people might still be buying Sandy Bridge based products.

The ROI is really fast though, so it's worth it for them.

That is the largest die size I have ever seen listed.
http://arstechnica.com/gadgets/2008/02/intel-shows-off-tukwila-first-2-billion-transistor-cpu/

65nm Itanium (Tukwila):
21.5 x 32.5 mm

699mm². It sounds like the package size of some processors. :D
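Checking that area figure:

```python
# Tukwila die dimensions as quoted above.
width_mm, height_mm = 21.5, 32.5
area_mm2 = width_mm * height_mm
print(round(area_mm2))  # 699
```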
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Wow. Intel is finally executing as they should. The chip is an all-out effort: an insane number of cores for one chip, and for an -EP platform to rival IBM's big iron CPUs in die area is insanity :)

And feature-wise they are not holding back the horses either; seeing that they allow NUMA to be enabled inside a single chip is amazing engineering.

Intel probably promoted those marketing/product management douchebags responsible for all the CPU feature segmentation (VT-d, TSX, etc.) and online CPU upgrades out of their jobs, and in their place came the guys responsible for the cool stuff, like overclockable Pentiums and Xeon workstation chips with high clocks and/or large L3 caches.
 

Grooveriding

Diamond Member
Dec 25, 2008
9,108
1,260
126
That die is massive... Imagine a GPU that big on that process. I wish Intel would make GPUs. Is it a safe assumption that no one else is capable of producing anything like that?
 

Ajay

Lifer
Jan 8, 2001
15,451
7,861
136
The thing I found impressive is that the market that buys these things essentially moves 100% to the new generation within 6-9 months of product introduction. That's a big contrast to the consumer market, where even 3 years later people might still be buying Sandy Bridge based products.

The ROI is really fast though, so it's worth it for them.

Interesting. No wonder Intel loves its dominance in the server market!
 


mavere

Member
Mar 2, 2005
186
0
76
The 4-4-4-6 config came as a surprise, but it looks like a pretty good use of die space. Of course, some primitive part of me finds the unbalanced stacks mildly infuriating ;).

3IOMhNu.jpg
 

Enigmoid

Platinum Member
Sep 27, 2012
2,907
31
91
6650_01_intel_haswell_ep_xeon_e5_2600_v3_server_family_processor_overview.jpg


Those are massive dies.

Can someone explain why they use areas of the wafer that clearly cannot yield a usable chip?
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Can someone explain why they use areas of the wafer that clearly cannot yield a usable chip?

They aren't really "wasting" anything, if that's what you are wondering. The wafer is round because the silicon ingots it's cut from are cylinders (which are the easiest shape to form), so there's bound to be "waste".

It's probably easier to print to the edges than to take the extra effort required to determine which parts of the wafer you don't etch.

Apparently printing to the "edge" helps with yield on the actual usable dies as well.

The key point is that companies like Intel operate at truly MASS scale, and the things that matter to their bottom line will be a lot different than for those working at tiny scale, or on hobbyist-level projects.
 

Maximilian

Lifer
Feb 8, 2004
12,603
9
81
Octadeca-core is what that 18-core beast is called :eek:

What do they do with the bits of the wafer at the edges? The partial CPUs?
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
That die is massive... Imagine a GPU that big on that process.. I wish Intel would make GPUs. Is it a safe assumption no one else is capable of producing anything like that ?

Die size is reticle (litho) limited in a hardware sense. Any fab with the same litho tools (same hardware-limited reticle size) could/can produce a wafer with dies the same size as Intel.

That is the trivial/academic answer to your question, but I suspect you meant to ask a slightly different question.

Rather, I suspect you meant to ask if it was safe to assume no one else is capable of yielding chips of that size while managing to have the performance/capabilities of those chips be such that the chips can command the price-point necessary for the company to make sufficiently large profits once the costs of the chip (which includes yields, both functional and parametric) are appropriately accounted for...to which I would answer with a strong "YES!"

To have a chip be as large as the 2699V3, yielding in high enough quantities for the binned SKU (both functional and parametric), while simultaneously providing enough performance and value-add such that the end-user is willing to pony up the dosh to pay for it (and thus make it worth Intel's time developing and producing) is a feat that pretty much no one else can approach (save for possibly IBM, but that is a situation where the economic picture of their large-die SKUs is perhaps obfuscated by their integrated hardware/software/service contract sales model).
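The yield side of that argument can be sketched with the textbook Poisson defect model — the defect density below is purely illustrative, not a real Intel figure:

```python
import math

def poisson_yield(die_area_mm2, d0_per_cm2):
    """Fraction of dies with zero random defects under the classic
    Poisson yield model: Y = exp(-A * D0), with die area A in cm^2."""
    return math.exp(-(die_area_mm2 / 100.0) * d0_per_cm2)

d0 = 0.1  # defects per cm^2 -- made-up number for illustration only
print(f"160 mm^2 die: {poisson_yield(160, d0):.2f}")
print(f"662 mm^2 die: {poisson_yield(662, d0):.2f}")
```

With these made-up numbers the 662mm² die yields around 0.52 vs 0.85 for a mainstream-sized 160mm² die, before parametric binning even enters the picture — though core harvesting (selling a die with a defective core disabled as a lower SKU) claws some of that back.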

Those are massive dies.

Can someone explain why they use areas of the wafer that clearly cannot yield a usable chip?

Partial dies along the edge of the wafer are still printed and etched as key means of decreasing functional yield loss for the rest of the wafer.

A lot of particles are generated from film delamination along the edge of the wafer. If you don't pattern and etch those partial die along the perimeter of the wafer then the thicknesses of the film stacks along the edge of the wafer become asymmetric with respect to the areas that are being patterned and etched as full die. A big issue when you go to CMP.

Much easier to just pattern and fill those partial dies, eliminating proximity effects and fill effects, for the neighboring dies that you do want to sell.

It isn't necessary, you can do blanket field/shot exposure (or not) on those partial dies if you desired. But you'll take a yield hit from the associated rise in defectivity across the rest of the wafer when those partial dies become defect generators at later stages of processing.
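To see how much of a 300 mm wafer those partial edge sites represent, a common first-order dies-per-wafer estimate can be used — a rough sketch only, using the 662mm² figure from this thread:

```python
import math

def gross_dies_per_wafer(wafer_diam_mm, die_area_mm2):
    """First-order dies-per-wafer estimate for roughly square dies:
    DPW ~= pi*(d/2)^2/S - pi*d/sqrt(2*S).
    The second term approximates whole dies lost at the wafer edge."""
    d, s = wafer_diam_mm, die_area_mm2
    return math.pi * (d / 2) ** 2 / s - math.pi * d / math.sqrt(2 * s)

print(int(gross_dies_per_wafer(300, 662)))  # roughly 80 whole 662 mm^2 dies
```

Of the ~107 die sites the raw wafer area alone would suggest, roughly a quarter end up as partial sites at the edge — exactly the region the patterning/CMP discussion above is about.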
 

kimmel

Senior member
Mar 28, 2013
248
0
41
Die size is reticle (litho) limited in a hardware sense. Any fab with the same litho tools (same hardware-limited reticle size) could/can produce a wafer with dies the same size as Intel.

Does thermal expansion of such a large die also have the potential to limit sizes?
 

Grooveriding

Diamond Member
Dec 25, 2008
9,108
1,260
126
Die size is reticle (litho) limited in a hardware sense. Any fab with the same litho tools (same hardware-limited reticle size) could/can produce a wafer with dies the same size as Intel.

That is the trivial/academic answer to your question, but I suspect you meant to ask a slightly different question.

Rather, I suspect you meant to ask if it was safe to assume no one else is capable of yielding chips of that size while managing to have the performance/capabilities of those chips be such that the chips can command the price-point necessary for the company to make sufficiently large profits once the costs of the chip (which includes yields, both functional and parametric) are appropriately accounted for...to which I would answer with a strong "YES!"

To have a chip be as large as the 2699V3, yielding in high enough quantities for the binned SKU (both functional and parametric), while simultaneously providing enough performance and value-add such that the end-user is willing to pony up the dosh to pay for it (and thus make it worth Intel's time developing and producing) is a feat that pretty much no one else can approach (save for possibly IBM, but that is a situation where the economic picture of their large-die SKUs is perhaps obfuscated by their integrated hardware/software/service contract sales model).


Thank you for the succinct explanation. That was what I was actually thinking when I asked. Looking at those wafers is super impressive when you see how much area those massive 18 core monsters take up in relation to available area of the wafer.
 

WhoBeDaPlaya

Diamond Member
Sep 15, 2000
7,414
401
126
Analog/RF designers jump through a LOT of hoops to address those processing issues, e.g. advanced layout / common-centroid techniques (beyond simple AB|BA) to eliminate 3rd- and even 4th-order mismatches.
 

NTMBK

Lifer
Nov 14, 2011
10,237
5,020
136
That on-die fabric is getting seriously complex! The "cluster on die" mode is pretty interesting: just treat the two rings as separate NUMA nodes, except with extremely high-speed on-chip switches between them instead of off-die QPI links.

I wonder where the fabric design will go next? Will they go back to a single, larger bidirectional ring? More "clusters" of cores on their own rings, connected in a network by switches? Or something more like Knights Landing's 2D-mesh fabric?
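As a toy illustration of the ring trade-off, here's a pure hop-count model — it ignores switch crossings, arbitration, and real latencies, so treat the numbers as illustrative only:

```python
# Toy hop-count comparison: one big bidirectional ring vs. two smaller
# rings bridged by a switch (cluster-on-die style). Topology counting
# only -- no modeling of actual latencies or contention.

def avg_hops_single_ring(n):
    """Average shortest-path hop count between two distinct stops
    on a bidirectional ring with n stops."""
    hops = [min(k, n - k) for k in range(1, n)]
    return sum(hops) / len(hops)

print(avg_hops_single_ring(18))  # one 18-stop ring: ~4.76 avg hops
print(avg_hops_single_ring(9))   # within one 9-stop ring: 2.5 avg hops
```

Local traffic gets much cheaper on the smaller rings; the price is that cross-cluster traffic pays the switch crossing — which is exactly the NUMA trade-off cluster-on-die mode exposes to the OS.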