14C/28T Haswell-EP Xeon Surfaces in Malaysia

Sweepr

Diamond Member
May 12, 2006
5,148
1,143
136
Intel’s Haswell-X Xeon EP Processor has surfaced in Penang, Malaysia during the company’s “Design in Asia” tour.

The Intel Haswell-X Xeon EP processor made an appearance in a visit from VR-Zone’s Nebojsa Novakovic to one of Intel’s facilities in Penang, Malaysia during Intel’s “Design in Asia” tour. The processor appears to be the Socket 2011-3 Haswell EP or Xeon E5 v3 and features 14 cores, a 35 MB cache, twin 9.6 GT/s QPI channels, and support for quad channel DDR4-2133 memory.

Even though the processor is still at least a year from commercial announcement, and the Ivy Bridge EP chips will only be arriving this summer, it seems that Intel’s Haswell Xeons have already reached at least the QS phase. Selected customers may be receiving fully working ES versions by the end of the year.

www.tomshardware.com/news/Haswell-X-Xeon-EP-Intel,23477.html

Published two days ago but no one's posted here yet.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
IMG_4735.jpg
 

CakeMonster

Golden Member
Nov 22, 2012
1,662
843
136
Is the HW architecture really able to handle so many threads and the overhead and crowding of data it would create? Or are there more fundamental changes in the "EP" chips?
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Is the HW architecture really able to handle so many threads and the overhead and crowding of data it would create? Or are there more fundamental changes in the "EP" chips?

Are you thinking of hyperthreading efficiency here or more just the regular data I/O congestion concerns?
 

Ferzerp

Diamond Member
Oct 12, 1999
6,438
107
106
...and all those cinebench boxes rendering ^_^

Cinebench doesn't understand processor groups, and so is pointless on these extra-wide systems. Actually, the quad-capable E5s are a strange duck anyway. It's often cheaper to configure two duals, and you don't get the extra RAS features of the E7.
 

StinkyPinky

Diamond Member
Jul 6, 2002
6,992
1,284
126
An 8-core Haswell-E would do nicely and is probably more realistic for most of us :hmm:
 

CakeMonster

Golden Member
Nov 22, 2012
1,662
843
136
Are you thinking of hyperthreading efficiency here or more just the regular data I/O congestion concerns?

I was mainly thinking of IO congestion. Is the design ready and is it as simple as the above mentioned bandwidth increases or will there be more penalties?
 

NTMBK

Lifer
Nov 14, 2011
10,523
6,047
136
I want to see 28 threads contend for a lock. Although I guess that shouldn't be as much of a problem with TSX.

The 32-thread dual-socket workstations we have in the office do just fine, even without TSX. ;)
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
I was mainly thinking of IO congestion. Is the design ready and is it as simple as the above mentioned bandwidth increases or will there be more penalties?

With the optimized prefetchers and huge fast caches that Intel has, I do expect them to have the IO situation hammered out.

I have no proof of this mind you, just confidence in their product engineers.
 

psyq321

Junior Member
Jun 18, 2012
11
1
71
Properly written code should not have problems scaling this big - the point is in the "properly" :) I am currently testing my stuff on dual 12-core Ivy EP and there are no issues with threading performance - then again, my code is mostly lockless and in rare cases it stays as little as possible in the critical section.
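
To illustrate the "stay in the critical section as little as possible" point, here is a minimal Python sketch. It is an illustration only, not the code described above: `parallel_sum` and its chunking scheme are hypothetical. Each worker does its work on thread-private data and takes the lock exactly once to merge its result.

```python
import threading


def parallel_sum(values, num_threads=4):
    """Sum `values` with worker threads that each take the lock only once."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        # All real work happens on thread-private data, outside the lock.
        local = sum(chunk)
        # The critical section is a single addition per thread.
        with lock:
            total += local

    # Strided slices give each thread a disjoint share of the input.
    chunks = [values[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

With 28 threads, the lock is acquired 28 times in total no matter how large the input is, so contention stays negligible. In standard CPython the GIL prevents a real speedup here, so read this as a picture of the locking pattern rather than a benchmark.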

The problem with more than 32 physical / 64 logical cores on Windows is processor group support. The code has to be explicitly aware of CPU groups; otherwise it cannot use more than 64 logical cores in x64 mode (32 logical cores in 32-bit mode), and the CPU scheduler will "pin" all its threads to a single CPU group, which is at most 64 logical cores wide.

If the Windows software you use is not aware of CPU groups, then the only option is to launch many instances of it and hope that the Windows scheduler spreads them evenly across the 64-core groups. Then there is the issue of NUMA memory accesses, etc.
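
For a sense of scale, here is a rough model of the grouping described above. The assumptions are for illustration only: each socket is one NUMA node, a node is never split across groups, and nodes are packed greedily in order. The real Windows assignment can differ, and `pack_into_groups` and `reachable_fraction` are hypothetical helper names.

```python
GROUP_LIMIT = 64  # maximum logical processors in one Windows processor group


def pack_into_groups(node_sizes, limit=GROUP_LIMIT):
    """Greedily pack whole NUMA nodes into groups of at most `limit` CPUs.

    Simplified model: nodes are taken in order and never split.
    Returns the number of logical processors in each group.
    """
    groups = []
    for size in node_sizes:
        if size > limit:
            raise ValueError("a single node exceeds the group limit")
        if groups and groups[-1] + size <= limit:
            groups[-1] += size  # node still fits in the current group
        else:
            groups.append(size)  # start a new group with this node
    return groups


def reachable_fraction(node_sizes):
    """Best-case share of the machine seen by a process stuck in one group."""
    groups = pack_into_groups(node_sizes)
    return max(groups) / sum(groups)
```

Under this model, a hypothetical four-socket box built from 14C/28T chips (`[28] * 4`) ends up as two groups of 56 logical processors, so a group-unaware process sees at most half the machine. A two-socket box (`[28, 28]`) fits in a single group and is unaffected.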

In short, to unlock performance on systems with more than 32 physical cores the software must be optimized for this:

1. Use NUMA-aware memory allocation and respect NUMA boundaries when doing work

2. On Windows (processor groups were introduced with Windows 7 / Server 2008 R2), it must also use the CPU-group-aware Win32 APIs and set the thread affinities explicitly.
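
The two points above can be sketched as a placement plan: one worker per logical CPU, each with an explicit processor group, a bit index inside that group, a NUMA node, and a contiguous slice of the work, so that every node owns one contiguous block of data. This is a hedged sketch under the same simplified packing assumption (whole nodes, at most 64 logical CPUs per group). `Placement` and `plan_workers` are hypothetical names, and the Python only computes the plan. A real Windows program would feed it into calls such as `SetThreadGroupAffinity` and `VirtualAllocExNuma`.

```python
from typing import List, NamedTuple


class Placement(NamedTuple):
    node: int          # NUMA node whose memory the worker should use
    group: int         # processor group the thread would be pinned to
    cpu_in_group: int  # bit index inside that group's affinity mask
    items: range       # slice of the work owned by this worker


def plan_workers(node_sizes: List[int], num_items: int,
                 limit: int = 64) -> List[Placement]:
    """Plan one worker per logical CPU without splitting nodes across groups."""
    total_cpus = sum(node_sizes)
    plan = []
    group, used_in_group, next_item, cpus_seen = 0, 0, 0, 0
    for node, size in enumerate(node_sizes):
        if size > limit:
            raise ValueError("a single node exceeds the group limit")
        if used_in_group + size > limit:
            # This node does not fit; open the next processor group.
            group += 1
            used_in_group = 0
        for local_cpu in range(size):
            cpus_seen += 1
            # Proportional split keeps the slices contiguous and balanced.
            end = num_items * cpus_seen // total_cpus
            plan.append(Placement(node, group, used_in_group + local_cpu,
                                  range(next_item, end)))
            next_item = end
        used_in_group += size
    return plan
```

Because workers on the same node get adjacent slices, the data for each node can be allocated on that node and touched only by its own threads, which is the "respect NUMA boundaries" part of point 1.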

I suspect that most software outside of specific enterprise/HPC apps is not CPU-group aware, and a subset of that software is not even NUMA-aware. Using these apps to benchmark high-core-count platforms will simply give flawed results.

In fact, I believe Anandtech already did a test of the Xeon E5-4600 using software that is not CPU-group aware, and this resulted in rather lousy results. And that is still Sandy Bridge EP 4S. Once Ivy Bridge EX (and, later, Haswell EX) becomes available, this problem will be amplified.