Updated Knights Landing (KNL) Info.


Homeles

Platinum Member
Dec 9, 2011
2,580
0
0
The distance is just one part of the latency. There's also the latency of addressing lines in the arrays inside the memory chips. GDDR5 is not great at that.

However, with so many threads, plus decent amounts of cache, RAM latency is not likely a high priority.
From my understanding, the latency difference between DDR3 ICs and GDDR5 ICs is practically nonexistent. The difference comes in the way the memory controllers are optimized for bandwidth, latency be damned, on a GPU (or presumably on the Xeon Phis). Since the memory controller variable stays constant, but the physical distance is decreasing, the latency should decrease.
 

Hans de Vries

Senior member
May 2, 2008
347
1,177
136
www.chip-architect.com
You may want to read your linked article again. It works exactly as it should. (DDR4 is regular slotted mobo DDR4 DIMMs.)

knights_landing.jpg


The HMC serves as a big cache.

One may hope it's just the journalist's unlucky quote:

eetimes said:
"Our HMC will be packaged with a very optimized interface, so they can be placed very close to the Xeon Phi using DDR4 channels," Mike Black, HMC technology strategist at Micron, told us. "And then all of that will be put into a common package that then drops into a single socket on the board."
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
From my understanding, the latency difference between DDR3 ICs and GDDR5 ICs is practically nonexistent.
At similar speeds, it shouldn't be, and/or GDDR5 may even be faster (i.e., OCed DDR3 vs. stock GDDR5). They rarely run at similar speeds, though. Common GDDR5 has been pushing 2GHz, while common DDR3 is still a bit below 1GHz.
Since the memory controller variable stays constant, but the physical distance is decreasing, the latency should decrease.
Only if a lot of the latency is from the trace length, which it typically hasn't been--system RAM is usually only slightly higher in latency than the chips on the DIMM. Not having to worry about DIMMs should allow for a substantial real latency reduction, though (no bank selecting), along with high concurrency (greater chance of your data being available on some channel/link somewhere, and if not, a great chance that memory utilization is good enough to not worry about it), which should matter much more for ~200 threads.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
Intel press release about it:
http://newsroom.intel.com/community...building-block-for-high-performance-computing

Also included the new Omni Scale Fabric that is integrated into KNL and future 14nm Xeons (Skylake).

And this is classic Intel here. To some extent, commoditize system components, interconnects, etcetera. Intel does a ton of work to establish its HPC platform as a standard. If it is as good as advertised, why would Cray and others continue to develop their own fabrics long term when they can build systems 'off the shelf' using Intel's fabric and standardized components?

Nvidia has a huge lead, in large part due to its copious CUDA libraries for C and Fortran, but KNL and Omni Scale must be giving them fits. Intel, from what I've read, has also been exceedingly generous with Xeon Phi pricing in order to gain market share.

The big question for me is 'how open is Omni Scale?' Will integrators be able to use NV products? Will NV need access to Intel IP?
 

naukkis

Golden Member
Jun 5, 2002
1,020
853
136
At similar speeds, it shouldn't be, and/or GDDR5 may even be faster (IE, OCed DDR3 v. stock GDDR5). They rarely run at similar speeds, though. Common GDD5 has been pushing 2GHz, while common DDR3 is still a bit below 1GHz.

They have similar latencies at common speeds: DDR3 CL10 @ 1GHz has the same absolute latency as GDDR5 CL20 @ 2GHz. The faster the memory clock, the more cycles the latency spans, so the absolute latency works out about the same.
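The arithmetic behind that claim is just CAS cycles divided by the command clock; a quick sketch (the part numbers are illustrative, not specific SKUs):

```python
def cas_latency_ns(cl_cycles: int, clock_ghz: float) -> float:
    """Convert a CAS latency in clock cycles to absolute nanoseconds."""
    return cl_cycles / clock_ghz

# DDR3 CL10 at a 1 GHz command clock vs. GDDR5 CL20 at 2 GHz:
ddr3 = cas_latency_ns(10, 1.0)   # 10.0 ns
gddr5 = cas_latency_ns(20, 2.0)  # 10.0 ns

print(ddr3, gddr5)  # identical absolute latency
```

Double the clock, double the cycle count, and the nanoseconds cancel out.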
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
If I had the time and the money I'd love to sit down with one of these and see what kind of numbers you could get for multi-algorithm CPU/GPU mixed Cryptocoin mining. That would be no small effort though, given the tailoring you would need to do for the various algorithms and the specific traits of the architecture. Still, interesting to think about.
 

NTMBK

Lifer
Nov 14, 2011
10,455
5,842
136
That's a heck of a lot of changes... sounds like it's based on Silvermont the same way Sandy Bridge was based on Nehalem!
 

Nothingness

Diamond Member
Jul 3, 2013
3,310
2,383
136
That's a heck of a lot of changes... sounds like it's based on Silvermont the same way Sandy Bridge was based on Nehalem!
Indeed. I guess they just kept some of the integer execution units and the integer decoders :biggrin:
 

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
That's a heck of a lot of changes... sounds like it's based on Silvermont the same way Sandy Bridge was based on Nehalem!

Indeed. I guess they just kept some of the integer execution units and the integer decoders :biggrin:

Yeah, I hope we get to see an architectural breakdown. I'm really wondering what Intel did with the 4-way SMT in terms of xtor resource allocation. Did they keep it light like HT or put more muscle behind it for higher throughput? It could be more like HT, with a modified scheduler, just to manage, in hardware, more threads in flight.
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
Yeah, I hope we get to see an architectural breakdown.
I'd rather have an architectural breakdown of Broadwell/Airmont, Gen8/9 and Skylake. The last time we got one was almost 2 years ago (Oct 5 '12 for Haswell).
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
From my understanding, the latency difference between DDR3 ICs and GDDR5 ICs is practically nonexistent. The difference comes in the way the memory controllers are optimized for bandwidth, latency be damned, on a GPU (or presumably on the Xeon Phis). Since the memory controller variable stays constant, but the physical distance is decreasing, the latency should decrease.

Ding ding ding! This is the correct way to think about the differences between DDRx and GDDR5. GDDR5 chips themselves are actually slightly (very slightly) better in terms of latency than DDRx chips, and the reason that many people seem to think that it has worse latency all comes down to the memory controller.

Bandwidth and latency are fundamental trade-offs when designing a memory controller. GPUs care more about bandwidth, so many armchair architects get the idea that GDDR5 is bad at latency, and in many applications CPUs care more about latency, so many armchair architects get the idea that DDRx is good at latency. Really these are just memory controller design decisions that the designers made.

If you want to learn a bit more about a memory controller design that tries to strike a balance between "fair," "high bandwidth," and "low latency," I would recommend reading this paper:
http://users.ece.cmu.edu/~omutlu/pub/tcm_micro10.pdf
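The trade-off described above can be sketched with a toy scheduler (this is not any real controller, just the well-known idea of preferring row-buffer hits over strict oldest-first service; the timings are made up):

```python
ROW_HIT_NS, ROW_MISS_NS = 10, 40  # illustrative access times

def service(requests, reorder_for_hits):
    """Serve (id, row) requests; return completion time per request id."""
    time, open_row, latencies = 0, None, {}
    pending = list(requests)
    while pending:
        if reorder_for_hits:
            # Bandwidth-oriented: prefer a request hitting the open row.
            hits = [r for r in pending if r[1] == open_row]
            req = hits[0] if hits else pending[0]
        else:
            req = pending[0]  # latency-oriented: strict oldest-first
        pending.remove(req)
        time += ROW_HIT_NS if req[1] == open_row else ROW_MISS_NS
        open_row = req[1]
        latencies[req[0]] = time
    return latencies

reqs = [(0, "A"), (1, "B"), (2, "A"), (3, "A")]
print(service(reqs, reorder_for_hits=False))  # total 130 ns
print(service(reqs, reorder_for_hits=True))   # total 100 ns, but request 1 waits longer
```

Reordering for row hits finishes the whole batch sooner (more bandwidth) while the unlucky row-B request waits longer (worse tail latency), which is exactly the knob a GPU-style controller turns one way and a CPU-style controller the other.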
 

mavere

Member
Mar 2, 2005
196
14
81
Really these are just memory controller design decisions that the designers made.

Sure, but in the absence of any actual claims from the vendor, there's no point in tip-toeing around academic pedantries when pragmatic reality basically guarantees that a design great at one thing would only be middling (or worse) at the other.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,698
4,660
75
It seems to me that future CPUs are going to get more and more complicated. I'm starting to imagine a future Intel CPU. It will have 2 big hyper-threaded main cores, maybe a dozen Knights Landing-like small cores, an IGP vector processor, and an FPGA, all on one die.
 
Mar 10, 2006
11,715
2,012
126
Knights Landing looks extremely impressive. Kudos to Intel for this one.

It will be interesting to see what NVIDIA fires back with! Interesting times we live in.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
I'd rather haven an architectural breakdown of Broadwell/Airmont, Gen8/9 and Skylake. The last time we got one was almost 2 years ago (Oct 5 '12 for Haswell).

Best we may get is something on RWT, if Kanter has the time considering he has a new job and all. He did 8 pages on Silvermont, plus the discussions over there usually bring some additional clarity (if one has the patience).
 

Nothingness

Diamond Member
Jul 3, 2013
3,310
2,383
136
Best we may get is something on RWT, if Kanter has the time considering he has a new job and all. He did 8 pages on Silvermont, plus the discussions over there usually bring some additional clarity (if one has the patience).
I'm afraid it's not a question of time; I don't think David Kanter will be legally allowed to write such lengthy articles on RWT any more. I'm lucky enough to have a Microprocessor Report subscription :)

PS - For reference, here's DK's original announcement:
As of January, 2014 I joined the Linley Group as an analyst and senior editor of the Microprocessor Report (MPR), where I am responsible for PC and server processors, and will be lending a hand with graphics, power management, and mobile devices. I will continue to write shorter articles at RWT and share my thoughts on the industry, but in a fashion that does not conflict with my editorial responsibilities.
 

NTMBK

Lifer
Nov 14, 2011
10,455
5,842
136
Knights Landing looks extremely impressive. Kudos to Intel for this one.

It will be interesting to see what NVIDIA fires back with! Interesting times we live in.

Nvidia will have to wait until 2016 for "Pascal".
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
Which manufacturing node will Pascal be made on? Nvidia's slide doesn't list FinFET as a new feature:

PascalRoadmap.jpg