Intel announcements for AI: Nervana 100x faster than GPU, Knights Crest & Mill 4x faster, SKL mid-17


Diamond Member
Dec 25, 2013
Intel Unveils Strategy for State-of-the-Art Artificial Intelligence

Brian Krzanich Editorial: The Intelligence Revolution – Intel’s AI Commitments to Deliver a Better World

So at the tail end of SC16 (Supercomputing 2016, November 13-18, Salt Lake City), at Intel's first AI day, Intel just made a slew of announcements. Basically, as CEO Brian Krzanich outlined in his second (!) essay of the week (after the self-driving cars one [0]), Intel, after discontinuing their mobile efforts earlier this year, now sees the nascent field of Artificial Intelligence / Machine Learning / Deep Learning as the next revolution in the... well, world. So with a lot of superlatives, Intel just released their plans to -- well, what do you expect -- lead the industry with Intel Architecture.

So IA for AI.

To summarize ('cause you probably don't want to bother reading the marketing press material):
  1. After buying Altera last year [1][2] to accelerate FPGAs for IoT and the data center, and eventually to integrate them into Xeon silicon, Intel has now announced the Intel Nervana platform for AI. Nervana makes not general-purpose GPUs like Nvidia, but specialized coprocessors for neural networks. Intel promises to improve deep learning speed (the training part, not the second, inference part [3][4][5]) by a breakthrough 100x over the next three years compared to current GPUs. (Incidentally, the last time Intel announced a breakthrough technology was only last year, when IMFT announced the fast, dense and non-volatile 3D XPoint memory [6], which will be qualified in Q4'16 for a 2017 launch. And the year before that it was silicon photonics, then FinFET, etc.)

    Moving on to the details of how Nervana IP will be integrated in the Intel platform / roadmap.

  2. It's called Lake Crest. First silicon will be tested in H1'17 (as earlier planned, so on track) and will be shipped to key customers in H2'17.
  3. Announcement of Knights Crest on the roadmap which will integrate Xeon processors with Nervana tech. Optimized for neural networks: high performance deep learning. High compute density with a high-bandwidth interconnect. Reiterates the 100x performance before end of decade (hopefully not like how 450mm wafers were promised before end of decade) to accelerate pace of AI innovation.
  4. Diane Bryant expects the 14nm Knights Mill (the earlier-announced deep learning-focused successor of 14nm Knights Landing, which goes up to 7 TFLOPS single precision with 72 Silvermont-based cores with AVX-512 at 1.5GHz) to be 4x faster than "the previous generation" for deep learning. Given that half precision (16-bit) delivers 2x performance, this means they also have 8-bit support or have done other things to further improve performance. Knights Mill has a 2017 launch.
  5. Intel has begun shipping Xeon Skylake (Purley) to "select cloud service providers". It has AVX-512 support (up from 256-bit AVX2) as well. Available mid-2017. For inference, it will use an Arria 10 PCIe card.
  6. Intel has the Saffron Technology platform for business insights, which is suited for small devices such as those in IoT.
  7. They have RealSense, which combines well with the VPU (vision processing unit) from Movidius, which was also acquired a few months ago. Intel wants to give computers a "cortex" with the VPU, and wants to integrate both technologies.
  8. Earlier today Intel announced an alliance with Google [20][21].
  9. They created the Nervana AI board with some university profs and some software tools to make AI more accessible and to work together as an industry.
  10. Of course they have done investments in deep learning applications to improve the world. I suppose.
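The arithmetic behind point 4 (2x from 16-bit, a possible further 2x from 8-bit) follows directly from how many elements fit in a fixed-width vector register. A minimal Python sketch of that scaling (the 512-bit width matches AVX-512; everything else is plain arithmetic):

```python
REGISTER_BITS = 512  # AVX-512 vector register width

def lanes(element_bits: int) -> int:
    """Elements processed per vector instruction at a given precision."""
    return REGISTER_BITS // element_bits

fp32 = lanes(32)  # 16 lanes: baseline single-precision throughput
fp16 = lanes(16)  # 32 lanes: 2x the FP32 rate
int8 = lanes(8)   # 64 lanes: 4x the FP32 rate

print(fp32, fp16, int8)            # 16 32 64
print(fp16 // fp32, int8 // fp32)  # 2 4
```

This is only the register-width part of the story, of course; actual speedups also depend on having execution units that operate on the narrower types.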
Witeken's take (or must one be an analyst to do that :D)
Today's announcement isn't so much about new, previously unknown information as it is about putting all of the behind-the-scenes work Intel has done to expand their portfolio into one press package. In this way, they can really drive home the significance of these investments and their role in the bigger picture of Intel's data center strategy (where R&D spending has gone up by >$1 billion in recent years as money has been freed up from the dwindling mobile investments).

Now that the big picture is becoming clearer, with many of the pieces falling into place in the last one or two years (Xeon Phi, Omni-Path, silicon photonics, Altera, IoT investments, Purley, 3D NAND, 3D XPoint, VISC, Movidius' VPU, Nervana, and probably a bunch of software around all that), it's clear that Intel is becoming first and foremost a data center company, with the PC market providing the necessary base volume and IP.

Further details
1. Nervana's acquisition was announced at IDF in August this year [7]. Nervana also has the Neon cloud service, which goes up against Nvidia's CUDA software. Their specialized neural network chip should keep Nvidia at bay and prevent Intel from losing market share as Nvidia grows (which it did by quite a bit: Nvidia's data center business nearly tripled in Q3'16 to about $250M). Intel's total data center business, for comparison, has been growing at ≥10% per year on average for years now and sits at $16B annual revenue. Their cloud segment is rapidly growing at 30-40% per year, fueled in large part by the "super seven" [8], and accounts for roughly a third of that revenue.
Nervana's 8-terabit-per-second Engine chip is a silicon-interposer-based multi-chip module with terabytes of 3D memory surrounding a 3D torus fabric of connected neurons, each using low-precision floating-point units (FPUs). As such, it can pack many more deep learning calculations per second into a smaller silicon chip than competitors' general-purpose GPUs, according to Freund.

Intel's acquisition of Nervana [9] is reminiscent of their earlier, $250M acquisition of Soft Machines, maker of the claimed-to-be-revolutionary VISC ISA [10][11], which was done in all secrecy [12][13][14]: in both cases, they're acquiring a start-up promising to change the field in a big, disruptive way.

The first chip will be a 28nm TSMC part (Nervana was a startup until not long ago, so no surprise there). But they no doubt plan to transition to Intel fabs. In this way the story is similar to the Infineon acquisition (a 14nm part is still pending, probably for a 2018 release), or the Altera foundry deal / acquisition (the part is delayed, but according to BK because of pre-M&A Altera investment decisions). So hopefully this time they do a better job of getting the IP manufactured in-house, as the Intel foundry is maturing (Altera's 14nm Stratix 10 should be qualified this quarter, IIRC).

But it's also like Altera in the sense that they plan to integrate the chip on die. If you had to summarize the semiconductor industry in one word, integration is definitely it, so this makes sense. Everything used to be separate but has become, and is still becoming, just one chip -- the SoC: the audio, the video, the connectivity, the ISP (cf. Skylake), the wireless bits (modem, with Wi-Fi rumored), the graphics (cf. Larrabee), the cache. And in the future: Wi-Fi, silicon photonics, Omni-Path Architecture (the InfiniBand competitor that started shipping with Knights Landing this year, in e.g. supercomputers), the FPGA and the neural processor. And things that are not yet integrated are getting closer to the CPU: non-volatile memory (3D XPoint), and DRAM through 2.5D or 3D interconnects.

Nervana's chip is meant for training, the first part of deep learning. Nvidia also claims Pascal will do the second part, inference. Intel, BTW, at the moment has the biggest market share in AI (I read 97%, but most of the data center is not AI). The chip doesn't have standard floating point units, which obviously makes comparisons harder. Nervana's CEO said their 28nm chip does 55 teraflops. Nvidia's 28nm Kepler did 6 teraflops single precision (32-bit). Pascal goes up to ~10 teraflops, and with half precision (16-bit) support is capable of 20 teraflops on the 16nm process. But since Nervana's format is not real floating point, the chip doesn't really do much besides machine learning. They call the format "flexpoint". It is a tensor-based architecture.
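Nervana hadn't published the details of "flexpoint" at the time of writing, but a common low-precision scheme along these lines is block floating point: integer mantissas that share a single exponent per tensor, so the multiply-accumulate datapath is pure integer hardware. A hypothetical Python sketch of that idea (the function names and the 16-bit mantissa width are my assumptions, not Nervana's spec):

```python
import math

def to_blockfp(values, mantissa_bits=16):
    """Quantize a tensor to integer mantissas sharing one exponent."""
    peak = max(abs(v) for v in values)
    # Pick the exponent so the largest value fits in the mantissa range.
    exp = math.ceil(math.log2(peak)) - (mantissa_bits - 1) if peak > 0 else 0
    scale = 2.0 ** exp
    mantissas = [round(v / scale) for v in values]
    return mantissas, exp

def from_blockfp(mantissas, exp):
    """Reconstruct approximate values from the shared-exponent format."""
    scale = 2.0 ** exp
    return [m * scale for m in mantissas]

vals = [0.15, -2.5, 3.75, 0.0003]
m, e = to_blockfp(vals)
approx = from_blockfp(m, e)
# Values near the tensor's peak survive almost exactly; tiny values lose
# relative precision -- a trade-off deep learning training tolerates well.
```

The hardware win is that the per-element arithmetic is integer-only, with the exponent handled once per tensor rather than once per element as in IEEE floating point.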

But the most important aspect, they say, is the interconnect. The aggregate bandwidth is 2.4 terabits/s, but it has a few other tricks up its sleeve as well.
The real star of the Nervana chip story is the interconnect, which, as one might imagine, the company was reticent to describe in detail. This is where the engineering expertise of its seasoned team of hardware engineers (Rao himself has designed six chips for Sun, among others) comes in. Rao describes the interconnect approach as a modular architecture with a fabric on the chip that extends to high speed serial links to other chips so from a programming perspective, it looks the same to talk between chips or to different units on a single chip. While not revolutionary in itself, he says this let the team build the software stack in a much different way than with a GPU, for instance. “With a GPU, there’s a distinct difference in communication on and off the chips—you have to memory map the I/O, move things around the memory hierarchy, and this involves more complicated steps, adds latency, and prevents things like model parallelism.”

As one might expect then, the architecture is completely non-coherent. There are no caches; there are software-managed resources on the die. This means there is no concept of cache coherency on the chip, and since everything is managed by software, all data movement between chips is software-driven -- so no cache hierarchy, and instead a direct, clean, explicit message-passing approach from one functional unit on the die to another.
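A toy model of that contrast, with all names hypothetical: instead of a coherent cache hierarchy, each functional unit owns a scratchpad, and data moves only when software explicitly sends a message, whether the destination sits on the same die or across a serial link.

```python
from queue import Queue

class Unit:
    """A functional unit with software-managed local memory (toy model)."""
    def __init__(self, name):
        self.name = name
        self.local = {}       # software-managed scratchpad, no caches
        self.inbox = Queue()  # stands in for the hardware message channel

    def send(self, dest, key, payload):
        # No coherence protocol: the programmer/compiler decides what
        # moves where, and when.
        dest.inbox.put((key, payload))

    def receive(self):
        key, payload = self.inbox.get()
        self.local[key] = payload

a, b = Unit("mac_array_0"), Unit("mac_array_1")
a.local["weights"] = [1, 2, 3]
a.send(b, "weights", a.local["weights"])  # explicit, software-scheduled
b.receive()
assert b.local["weights"] == [1, 2, 3]
```

The point Rao makes is that because on-die and off-die transfers look identical to software, model parallelism across chips falls out naturally, with no memory-mapped I/O detour.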
The Nervana deal is reported (and confirmed) to be in excess of $350M [15]. In the interview with NextPlatform, CEO Rao is eager to use Intel's fabs to get to more competitive ("14nm") nodes, and to use 3D XPoint. They have Google as a customer, analogous to how Microsoft was a customer of Altera.

2. None.

3. Just want to add that in recent times Intel has been accused of opportunistically quoting order-of-magnitude improvement estimates. For instance the claimed 1000x faster 3D XPoint: people have failed to realize that "faster" there literally means lower latency.
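With made-up illustrative numbers (not Intel's spec sheet), the distinction is easy to see: a ~1000x latency advantage can coexist with a much smaller bandwidth advantage, so a big sequential copy barely speeds up.

```python
nand_latency_us   = 100.0  # rough NAND read latency (illustrative)
xpoint_latency_us = 0.1    # ~1000x lower latency: the headline claim

print(round(nand_latency_us / xpoint_latency_us))  # 1000

# A large sequential copy is bandwidth-bound, not latency-bound:
nand_bw_gbs, xpoint_bw_gbs = 2.0, 4.0  # made-up sustained throughputs
copy_gb = 10
speedup = (copy_gb / nand_bw_gbs) / (copy_gb / xpoint_bw_gbs)
print(speedup)  # 2.0
```

So both "1000x" and "2x" can be true at once; it just depends on whether the workload issues small dependent reads or streams large blocks.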

4. Knights Mill was announced earlier in 2016 as a deep learning-focused Xeon Phi [16][17]. It was made to go in between Knights Landing and its 10nm Knights Hill successor. Earlier at SC16, Intel provided a few new details: Knights Mill will be available in 2017, and it will be a host CPU. And given the claimed 4x performance, it seems most likely to me that it doesn't just add 16-bit support, but also 8-bit.

Actually, another viable option to get to the claimed 4x performance without 8-bit would be to switch to high-density libraries. This becomes awfully clear if you compare transistor counts of GP100 (15B) vs. Knights Landing (8B), even though Intel actually uses a ~40% denser process. So if they could get up to 21B transistors (say, 16B for 2x flops) in the same area as GP100, that might do the trick together with half precision. So yeah, I have never really understood the design decision to let a low-frequency part use low-density transistors (Bay Trail used HD libraries, AFAIK).
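The back-of-the-envelope math above, spelled out (transistor counts as given in the paragraph; the 40% density figure is the claim as stated, not a measured number):

```python
gp100_transistors = 15e9   # Nvidia GP100, 16nm
knl_transistors   = 8e9    # Knights Landing, 14nm
intel_density_adv = 1.40   # the ~40% denser-process claim from the text

# Filling GP100's die area at Intel's claimed density would allow roughly:
potential = gp100_transistors * intel_density_adv
print(round(potential / 1e9))  # 21 (billion transistors)

# Knights Landing uses well under half of that budget:
print(round(potential / knl_transistors, 2))  # 2.62
```

In other words, the transistor budget alone leaves ~2.6x headroom over Knights Landing before any precision tricks are applied.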

4.5. So that's all training. For inference, Intel has demonstrated at SC16 a Broadwell-EP with an on-package integrated Arria 10. They will also ship an Arria 10 FPGA PCIe "deep learning inference card", coming in 2017 [18].

5. Purley will have Omni-Path Architecture (OPA) integrated. This won't be on die (yet), but, like the Xeon Phi-F, on package (I suppose). There are already Omni-Path parts out there, but the version with integrated OPA has only started shipping recently. In October, BK said the initial Purley parts won't support 3D XPoint yet, though. Purley details already leaked many moons ago [19].

7. Further reading:

[0] BK blog self driving cars (autos):
Even earlier blog:

6+. It's clear Intel has been on an M&A spree this year, both for data center companies as well as IoT. For the IoT story, I will drop a few links below.


To conclude, if you own NVDA stock because of their deep learning efforts, you might want to reconsider your options, since they're about to get serious competition -- if the previously announced half-precision Knights Mill wasn't already enough :). Remember how, in the midst of the bitcoin craze, GPUs were abandoned rather sooner than later because it turned out there were chips (ASICs) that were a ton faster still? (So Nvidia might become irrelevant in AI before they've well and fully entered the deep learning market. Or maybe Intel will have wasted another "$10B", like their mobile bet.)


Editorials (a guide for the perplexed)
Brian Krzanich (CEO): The Intelligence Revolution – Intel’s AI Commitments to Deliver a Better World
Diane Bryant (EVP data center group): The Foundation of Artificial Intelligence
Doug Fischer (SVP IoT group): ‘Upstreaming’ Artificial Intelligence: Making AI Available for All

I just put this here for completeness sake: Intel and the Royal Shakespeare Company Collaborate with The Imaginarium Studios to Create Ariel as a Digital Avatar for the First Time

Sources / further reading
Press kit:

[0] BK blog 1:
[1] Press overview:
[2] Slide deck:
[3] From, ironically, Nvidia:
[4] From Nvidia:
[5] Again Nvidia:
[6] The latest article by Stephen Breezy, who has followed this technology for a long time (although some articles are more speculative); I suggest following the links in the article to earlier ones. For the rest, a ton has already been written about it, so I won't dwell on it further here. But clearly this could become a key product for Intel's data center portfolio.

[7] EETimes coverage:
[8] Super seven:
[9] Nervana article before M&A with some in-depth info:
[10] Soft Machines PR:
[11] AnandTech VISC article:
[12] The Register original source:
[13] SiliconAngle:
[14] AnandTech Forums topic:
[15] Nextplatform CEO Nervana interview:

[16] Knights Mill:
[17] More KM:
[18] Broadwell+Arria 10:
[19] Purley:
[20] Google blog:
[21] Intel blog:
Hope you appreciate the topic start. So yeah, this is the stuff that keeps me up at night... literally :/.

According to Wired, Nervana is also for inference.


Golden Member
Nov 28, 2013
This sort of very specialised competition was always going to come to this space - it's a potentially utterly massive market. You can see that by the way that NV have been specialising their GPUs quite a bit. At some point the designs for their AI training stuff might diverge even further from the gaming stuff.

It's also a very much more natural sort of market for Intel to make progress in than IoT, or indeed mobile. Big, high-tech, expensive chips with big profit margins etc. No reason not to expect them to do at least quite well, I'd think.

We'll see.


Mar 13, 2006
Good to see Intel developing a strategy. The deep learning field is undergoing huge growth. Having seen some of the things that are being done in the field, I'm super excited.
Mar 10, 2006
Good to see Intel developing a strategy. The deep learning field is undergoing huge growth. Having seen some of the things that are being done in the field, I'm super excited.

It's a growing market that gobbles up any and all kinds of computing power that it can.

The problem that Intel has faced in PCs is that at some point, the average joe isn't willing to pay for more performance because the ROI isn't there for him.

In areas like deep learning, there is real money to be saved (TCO reduction) as performance continues to go up, which drives demand for higher performance chips from the likes of Intel, NVIDIA, and others.

This is a perfect fit for Intel. Now they need to make sure to capitalize on it.


Diamond Member
Dec 25, 2013
A bit strange that AnandTech, EETimes (edit: oops), ExtremeTech, Spectrum and NextPlatform have no article about it (yet?).
Isn't nVidia doing something similar?
Nvidia is also into deep learning with software and Pascal, but they're just taking it from their existing general purpose GPU.

Intel has an even more general-purpose many-core chip, and now they have a specialized chip as well.


Senior member
Dec 22, 2013
Isn't nVidia doing something similar?

Nothing announced as far as an FPGA is concerned. IMO, currently, the GPU exists in an awkward position between a CPU and an FPGA/ASIC. The direction is going more general-purpose (closer to a CPU), but the effect will be that FPGAs and ASICs will be increasingly better for fixed workloads. I guess you could look at crypto-mining as an example: first there were CPUs, then GPUs, now ASICs. Obviously for real-world workloads, the FPGA has a huge advantage over the ASIC in that your algorithms or loadout can change (whereas current crypto-mining stays the same).


Diamond Member
Dec 25, 2013
Intel claims Lake Crest (the 28nm Nervana part) already has more raw compute "than today's state-of-the-art [16nm] GPUs"

So the 100x in 3 years is just a bonus.