XBitlabs: Intel Shows Off "Knights Corner" MIC Compute Accelerator, Beats Nvidia's Fe


Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
Sure Knights Corner might be competitive at 35W, but what if it's 65W? 95W? 130W? 300W?

I have the same question. Nowhere besides X-bit labs is anyone saying that KC has 10x the DP/watt of Fermi. These claims are BS.

Actually, even if they were true, it's not that impressive.

Its architecture's goal is pure performance: it doesn't have z-buffers, geometry engines and other stuff that would be sucking power while doing nothing.
Let's not forget the huge node advantage.
 

ocre

Golden Member
Dec 26, 2008
1,594
7
81
Where I work, we paid tens of thousands of dollars and many months to get a software version made that could work on a CUDA solution so our physics modelers could have something to put on their desk instead of sharing time on a cluster.

Do not underestimate this being an x86 solution that wouldn't need new software.

I'm sure performance per watt is better than Fermi, but I think that's relatively minor compared to being x86.

I do see the appeal, don't get me wrong.

I think if Intel could've delivered this earlier, it would've killed Nvidia's dream. But you know what? Nvidia kinda created this market all on their own. Intel just wants to stop Nvidia and steal the market Nvidia worked so hard to make. Shame shame Intel, bad bad!

Nvidia has gained a lot of traction; the longer Intel drags its feet, the more of the market Nvidia will have cornered. I do see the threat, and I am sure Nvidia does too. It might get interesting. But like mayfield, it might be a bigger bite than Intel can chew. I am waiting on the chips before I spend too much time on this one.
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
The amount of power draw required for Intel's MIC cards to attain the performance figures they are reaching will probably make the GTX 480 look like a tree-hugging, power-sipping hippie party.
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
Back in 2007 Intel demo'd its Polaris chip (http://en.wikipedia.org/wiki/Teraflops_Research_Chip). This was built on 45nm process and had 80 cores. It was the first true single chip Teraflop processor. And it only consumed 62W of power.

Now Knights Corner is based on 22nm tri-gate technology, has only 50 cores, and can reach 1 Teraflop. I have not seen the power consumption yet, but I have to believe it is well under 60W (I would guess 20-30W).

The fastest Fermi chips are about half that speed and consume about 250W. Let's say Kepler can double the speed or halve the power consumption. So now we are at 1TF @ 250W, or 125W at Fermi speeds, @ 32nm.

I call this a win for Intel. Especially since it will work with most existing x86 compilers.
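
The comparison in this post comes down to performance per watt. A minimal sketch of that arithmetic, using only the figures stated in the post (the 30W Knights Corner number is the poster's guess, not a measured value, and the function name is made up for illustration):

```python
def gflops_per_watt(tflops: float, watts: float) -> float:
    """Convert a TFLOPS figure and a power figure into GFLOPS per watt."""
    return tflops * 1000.0 / watts

# Figures as stated in the post above; none of them are measurements.
fermi = gflops_per_watt(0.5, 250)                 # fastest Fermi, whole board
kepler_double_speed = gflops_per_watt(1.0, 250)   # "double the speed" scenario
kepler_half_power = gflops_per_watt(0.5, 125)     # "halve the power" scenario
knights_corner_guess = gflops_per_watt(1.0, 30)   # poster's 20-30W guess, upper end

print(f"Fermi:                 {fermi:.1f} GFLOPS/W")
print(f"Kepler (2x speed):     {kepler_double_speed:.1f} GFLOPS/W")
print(f"Kepler (half power):   {kepler_half_power:.1f} GFLOPS/W")
print(f"Knights Corner guess:  {knights_corner_guess:.1f} GFLOPS/W")
```

Both Kepler scenarios land on the same efficiency, which is why the post treats them as interchangeable.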
 

Genx87

Lifer
Apr 8, 2002
41,091
513
126
Back in 2007 Intel demo'd its Polaris chip (http://en.wikipedia.org/wiki/Teraflops_Research_Chip). This was built on 45nm process and had 80 cores. It was the first true single chip Teraflop processor. And it only consumed 62W of power.

Now Knights Corner is based on 22nm tri gate technology and only has 50 cores and can reach 1 Teraflop. I have not seen the power consumption yet, but I have to believe it is well under 60W (I would guess 20-30W)

The fastest Fermi chips are about half that speed and consume about 250W. Lets say Kepler can double the speed or half the power consumption. So now we are at 1TF @ 250W or 125W at Fermi speeds @ 32nm.

I call this a win for Intel. Especially since it will work with most existing x86 compilers.


See my post above. Nvidia is predicting a quadrupling of DP/watt for Kepler.
Maxwell, which will be on the same process as Intel's chip, is expected to provide 14x the DP/watt of Fermi, or about 9.2 TFLOPS at the same power.
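
The 14x and 9.2 TFLOPS figures together imply a Fermi baseline, which can be worked backwards. A rough sketch, assuming the scaling factors apply to double-precision throughput at constant board power (the variable names are illustrative only):

```python
# Figures quoted in the post; the scaling factors are Nvidia's own projections.
MAXWELL_TFLOPS_SAME_POWER = 9.2
MAXWELL_FACTOR = 14.0
KEPLER_FACTOR = 4.0

# Fermi DP throughput implied by "14x gives about 9.2 TFLOPS".
implied_fermi_tflops = MAXWELL_TFLOPS_SAME_POWER / MAXWELL_FACTOR

# What the 4x Kepler projection would give at the same power.
kepler_tflops_same_power = implied_fermi_tflops * KEPLER_FACTOR

print(f"Implied Fermi baseline: {implied_fermi_tflops:.3f} TFLOPS")
print(f"Kepler at same power:   {kepler_tflops_same_power:.2f} TFLOPS")
```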
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
Back in 2007 Intel demo'd its Polaris chip (http://en.wikipedia.org/wiki/Teraflops_Research_Chip). This was built on 45nm process and had 80 cores. It was the first true single chip Teraflop processor. And it only consumed 62W of power.

Now Knights Corner is based on 22nm tri gate technology and only has 50 cores and can reach 1 Teraflop. I have not seen the power consumption yet, but I have to believe it is well under 60W (I would guess 20-30W)

The fastest Fermi chips are about half that speed and consume about 250W. Lets say Kepler can double the speed or half the power consumption. So now we are at 1TF @ 250W or 125W at Fermi speeds @ 32nm.

I call this a win for Intel. Especially since it will work with most existing x86 compilers.

Did you call this a win too? http://www.youtube.com/watch?v=ynjYuS1J3jI
http://www.hardwarecentral.com/news...s-Larrabee-Hits-1TFlop-of-Computing-Speed.htm
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
The fastest Fermi chips are about half that speed and consume about 250W. Lets say Kepler can double the speed or half the power consumption. So now we are at 1TF @ 250W or 125W at Fermi speeds @ 32nm.

I call this a win for Intel. Especially since it will work with most existing x86 compilers.

You are counting the 250W of the entire graphics card, that is, the PCB with all the capacitors etc. and 1.5GB of GDDR5 memory. We have no idea what power the GF100/110 chip alone draws.

One more thing: HD5870 (Evergreen) had more 32-bit GFLOPS than GF100 (2720 vs 1344), but Fermi was faster in HPC.
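
For reference, paper figures like 2720 and 1344 come from a simple product of ALU count, clock, and operations per cycle. A sketch of that calculation; the ALU counts and clocks below are assumptions taken from the public spec sheets for the Radeon HD 5870 and GeForce GTX 480, not from this thread:

```python
def peak_gflops(alus: int, clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical peak: every ALU issues one multiply-add (2 FLOPs) per cycle."""
    return alus * flops_per_cycle * clock_ghz

# Assumed specs (not stated in the thread):
hd5870 = peak_gflops(alus=1600, clock_ghz=0.850)   # 1600 stream processors at 850 MHz
gtx480 = peak_gflops(alus=480, clock_ghz=1.401)    # 480 CUDA cores at 1401 MHz shader clock

print(f"HD 5870 peak SP: {hd5870:.0f} GFLOPS")
print(f"GTX 480 peak SP: {gtx480:.0f} GFLOPS")
```

The formula says nothing about how often real code can keep every ALU busy, which is the point being made about HPC results.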
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
You count 250W of the entire Graphic Card, that is the PCB with all the capacitors etc etc and 1.5GB of GDDR5 memory. We have no idea what power usage the GF110/110 alone has.

True, so that number will be lower. But by how much?
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
Nvidia is predicting a quadrupling of DP\watt for Kepler. Maxwell which will be on the same process at Intels chip is expected to provide 14x DP\watt of Fermi or about 9.2TFLOPS at the same power.

Sorry if I do not take Nvidia at face value for their predictions of future products that are years off. 28nm is already causing delays for them, we expect 20nm to be smoother? And by that time Intel will be on 14nm or smaller.
 

Obsoleet

Platinum Member
Oct 2, 2007
2,181
1
0
Sorry if I do not take Nvidia at face value for their predictions of future products that are years off. 28nm is already causing delays for them, we expect 20nm to be smoother? And by that time Intel will be on 14nm or smaller.

Exactly.
Playing the "futures" game is usually a mess, but judge from what Intel has previously demonstrated: a 45nm/80-core teraflop processor at 62 watts, vs. a 22nm/50-core chip reaching the same teraflop potential.

Larrabee 1.0 - 45nm, 80 cores, 1 teraflop, 62 watts
Fermi - 250 watts, 0.5 teraflop
Knights Corner - 22nm, 50 cores, 1 teraflop, ~30 watts estimated
Kepler - 250 watts, 1 teraflop

That's a nasty pattern. What the heck is Nvidia going to counter with, Tegra..........?
 
Feb 19, 2009
10,457
10
76
That's what I find impressive, so much performance for so little power. It's scary if it's true.

Intel will have a tech/node edge for a long time to come as long as NV relies on TSMC.
 

ocre

Golden Member
Dec 26, 2008
1,594
7
81
Exactly.
While playing the "futures" game is usually a mess.. judging from what Intel has previously demonstrated a 45nm/80core teraflop processor@62watts.. vs a 22nm/50core reaching the same teraflop potential

Larrabee 1.0- 45nm, 80cores, 1 teraflop power, 62watts.
Fermi- 250watts, 0.5 teraflop
Knights Corner- 22nm, 50cores, 1 teraflop, ~30watts estimated
Kepler- 250watts, 1 teraflop

That's a nasty pattern. What the heck is Nvidia going to counter with, Tegra..........?

I guess Larrabee was just too amazing, so they decided to drop it. 1 teraflop at 60 watts. Wow, they could've had 4 teraflops at 240 watts. Give me a break!

There is a lot more to it than that. Even if those figures are right, the chip must not have been realistic, or it would've already been here and killed Nvidia's and AMD's GPUs. I just can't see how it didn't come out, seeing as your teraflop figures are so spectacular. I am inclined to believe that the cherry-picked scenario that measures those teraflops cannot be a realistic measure of performance across the board...... There is something extremely off here. Doesn't it look like Larrabee would be here now if those numbers really held any weight?
 

Cookie Monster

Diamond Member
May 7, 2005
5,161
32
86
First of all, Larrabee 1.0 is not the 80-core chip. The cores found on the Polaris chip are much simpler than those of Larrabee (which are similar to the cores found on Atom processors, I believe), hence being able to fit 80 of them using their 45nm HKMG process. I don't even think it's practically usable, but Intel's intentions were rather clear with that project. It was simply a demo project to show the possibility of the whole "many core" initiative going on over at Intel.

People forget why Larrabee 1.0 was canned in the first place. It was rumoured to measure around 600mm^2 (estimated from a wafer shot) @ 45nm, and I would hazard a guess that the power consumption wasn't something Intel desired, seeing as even if one Larrabee core consumes 1W, that would mean roughly ~320W since it was going to have 32 cores. And this is just the cores themselves.

On top of that, they now show KC using the 22nm process (2 process nodes on from the original Larrabee) to finally have something desirable to show, when most likely GCN and Kepler are going to perform much better with a node disadvantage.

I guess one of Intel's strengths is paying off here (node advantage), but I do wonder if it will be enough.
 

acx

Senior member
Jan 26, 2001
364
0
71
There's a lot of mixing up of what's what in this thread. Here's a clarification of what we know and what the slides at X-bit labs show.

Chip: Intel Polaris
Timeframe: 2007
Process Node: 65nm
Power: 62W (unknown whether for chip or for board)
Cores: 80 simple floating point unit + register + control, not x86
Performance: claimed 1 teraflop, unknown single or double precision
Commercial Availability: None
Status: Unknown
Purpose: research and development experiment of stacked dies, on chip routing network

Chip: Intel Larrabee
Timeframe: 2009
Process Node: unknown, assumed 45nm
Power: unknown
Cores: 32-64 x86 derived from P54C Pentium
Performance: claimed 1 teraflop single precision on SGEMM
Commercial Availability: None
Status: Terminated
Purpose: GPGPU, graphics acceleration

Chip: Intel Knight's Ferry
Timeframe: 2009-2010
Process Node: 45nm
Power: unknown
Cores: 32 x86 @ 1.2 Ghz from Intel MIC architecture, possibly derived from Larrabee
Performance: claimed > 1 teraflop single precision on SGEMM
Commercial Availability: prototype development chip/board for Intel MIC architecture
Status: ongoing
Purpose: high performance computing accelerator
Source: http://download.intel.com/pressroom/archive/reference/ISC_2010_Skaugen_keynote.pdf

Chip: Intel Knight's Corner
Timeframe: 2011+
Process Node: 22nm
Power: unknown
Cores: 50+ x86 derived from Knight's Ferry
Performance: claimed 1 teraflop double precision on DGEMM on prototype boards
Commercial Availability: planned availability in 2012 or later, planned to be placed into the 10 petaflop TACC supercomputer in 2013
Purpose: high performance computing accelerator

Chip: NVIDIA GF100
Timeframe: 2010-2011
Process Node: 40nm
Power: 225W TDP claimed for board
Cores: 448 CUDA cores
Performance: theoretical 1.03 teraflops single precision, 0.515 teraflops double precision, 635 gigaflops in SGEMM, 305 gigaflops in DGEMM (source: http://www.netlib.org/utk/people/JackDongarra/SLIDES/gpu-0711.pdf)
Commercial Availability: now as NVIDIA Tesla M2070 and M2050
Status: ongoing
Purpose: GPGPU, graphics acceleration

Chip: AMD RV870
Timeframe: 2010-2011
Process Node: 40nm
Power: 225W TDP claimed for board
Cores: 320 SIMD cores (1600 processing elements)
Performance: theoretical 2.72 teraflop single precision, 0.544 double precision, claimed 87% peak efficiency on DGEMM for a 473 gigaflops double precision (source: http://www.cse.scitech.ac.uk/disco/mew21/presentations/AMD.pdf)
Commercial Availability: now as AMD Radeon 5870
Status: ongoing
Purpose: GPGPU, graphics acceleration

Chip: Fujitsu SPARC64 VIIIfx
Timeframe: 2011
Process Node: 45nm
Power: 58W for chip?
Cores: 8
Performance: theoretical 128 gigaflops double precision, 93% efficiency in Linpack 119 gigaflops double precision, 111 gigaflops DGEMM (source: http://icl.cs.utk.edu/hpcc/hpcc_record.cgi?id=459)
Commercial Availability: Deployed now in 10 petaflop RIKEN K supercomputer, follow on 16 core chip to be commercially available in 2012
Status: ongoing
Purpose: high performance supercomputing

Chip: (Future Exascale Processor)
Timeframe: unknown, possibly 2018 based on previous stated goals from Intel
Process Node: unknown
Power: Projected 5W for computation, 20W for an entire system (which includes computation, disk/storage, memory, communication, and miscellaneous)
Cores: unknown
Performance: unknown, performance target of 1 exaflop double precision on 20 megawatts of power for the entire supercomputer
Commercial Availability: unknown, targeting 2018
Status: ongoing
Purpose: high performance supercomputing
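
The exascale target above translates directly into an efficiency requirement. A quick sketch of that conversion, with the GF100 DGEMM figure from the list used as a present-day reference point (dividing by the 225W board TDP is an assumption, since TDP is not measured power):

```python
# Target stated in the list: 1 exaflop double precision on 20 megawatts.
target_flops = 1e18
power_budget_watts = 20e6

required_gflops_per_watt = target_flops / power_budget_watts / 1e9

# Reference point from the list: GF100 at 305 GFLOPS DGEMM, 225W board TDP.
gf100_dgemm_gflops_per_watt = 305 / 225
gap = required_gflops_per_watt / gf100_dgemm_gflops_per_watt

print(f"Required:    {required_gflops_per_watt:.0f} GFLOPS/W")
print(f"GF100 DGEMM: {gf100_dgemm_gflops_per_watt:.2f} GFLOPS/W")
print(f"Gap:         about {gap:.0f}x")
```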

Here is Michael Feldman's take on Intel's SC11 Knights Corner presentation: http://www.hpcwire.com/hpcwire/2011...ark_of_one_teraflop_with_knights_corner_.html
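
To put the theoretical and measured numbers in the list side by side, here is a small sketch that computes DGEMM efficiency (achieved divided by theoretical peak) for the chips where both double-precision figures are given. The values are copied from the list; the RV870 DGEMM number is the vendor's claimed figure rather than an independent measurement.

```python
# Double-precision figures in GFLOPS, as listed above.
chips = {
    "NVIDIA GF100":           {"peak": 515.0, "dgemm": 305.0},
    "AMD RV870":              {"peak": 544.0, "dgemm": 473.0},
    "Fujitsu SPARC64 VIIIfx": {"peak": 128.0, "dgemm": 111.0},
}

def efficiency(peak_gflops: float, achieved_gflops: float) -> float:
    """Fraction of theoretical peak that the benchmark actually reached."""
    return achieved_gflops / peak_gflops

results = {name: efficiency(c["peak"], c["dgemm"]) for name, c in chips.items()}

for name, eff in results.items():
    print(f"{name}: {eff:.0%} of peak on DGEMM")
```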
 

Riek

Senior member
Dec 16, 2008
409
15
76
^ This. A little too much confusion in this topic about TDP and other things.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
That's what i find impressive, so much performance for so little power. It's scary if its true.

Intel will have a tech/node edge for a long time to come as long as NV relies on TSMC.

As we saw with Larrabee, Atom, and Itanium...Intel may have the opportunity to leverage their process node edge towards gaining an advantage in their respective product markets, but they tend to fall on their sword more often than not when it comes to the execution.

Just because Intel's 22nm is production worthy and IB will be shipping doesn't mean they actually have Knights Corner on a release timeline that will place it in the market in competition with 28nm. Just look at the gap in the 32nm Atom launch.

IMO Nvidia has a much more consistent launch cadence for their GPUs than Intel does for anything that is not a mainstream desktop CPU.
 

masteryoda34

Golden Member
Dec 17, 2007
1,399
3
81
There is a lot more to real world performance besides theoretical FLOPS.

The FLOPS on paper for AMD GPUs are much better than the FLOPS for Nvidia GPUs. Does that translate to an actual performance advantage? Not really.
 

ocre

Golden Member
Dec 26, 2008
1,594
7
81
There is a lot more to real world performance besides theoretical FLOPS.

The FLOPS on paper for AMD GPU's is much better than the FLOPS for Nvidia GPU's. Does that translate to actual performance advantage? Not really.

Exactly.

And let's all get excited when the chip finally comes out. Until then everything is subject to........
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Back in 2007 Intel demo'd its Polaris chip (http://en.wikipedia.org/wiki/Teraflops_Research_Chip). This was built on 45nm process and had 80 cores. It was the first true single chip Teraflop processor. And it only consumed 62W of power.

Now Knights Corner is based on 22nm tri gate technology and only has 50 cores and can reach 1 Teraflop. I have not seen the power consumption yet, but I have to believe it is well under 60W (I would guess 20-30W)

The fastest Fermi chips are about half that speed and consume about 250W. Lets say Kepler can double the speed or half the power consumption. So now we are at 1TF @ 250W or 125W at Fermi speeds @ 32nm.

I call this a win for Intel. Especially since it will work with most existing x86 compilers.

You're in a debate you can't sustain. I am with you on this, but now it's both the green and red teams you're up against. Best just to wait till Haswell; after the fact you can nudge a few with quotes. But you're in a debate that goes nowhere until Intel puts all of its cards into one hand: AVX/LrBi/FMA3. Our first real glimpse will be Haswell, then Rockwell should be the real deal.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
As we saw with Larrabee, Atom, and Itanium...Intel may have the opportunity to leverage their process node edge towards gaining an advantage in its respective product markets, but they tend to fall on their sword more often than not when it comes to the execution.

Just because Intel's 22nm is production worthy and IB will be shipping doesn't mean they actually have Kights Corner on a release timeline that will place it in the the market in competition with 28nm. Just look at the gap in 32nm Atom launch.

IMO Nvidia has a much more consistent launch cadence for their GPU's than Intel does for anything that is not a mainstream desktop CPU.

I'm not sure. Are you saying we won't see KC until 2013? Maybe. I think if we see it at the January show, we'll see it real soon. It is a high-margin sector and we all know how Intel feels about margins. The ease with which Intel can recompile or just run whatever it wants is extraordinary and is a game changer. We'll see results on HTC benchmarks real soon.
 

Lonbjerg

Diamond Member
Dec 6, 2009
4,419
0
0
I am amazed that the 1 teraFLOP is not peak performance...like NVIDIA and AMD use...but sustained DGEMM.

Now that is some *beep*whooping...seriously.
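
For anyone unsure what "sustained DGEMM" means in practice: the figure is the floating-point work of a matrix multiply divided by the time it actually took. A minimal sketch using the usual 2*m*n*k operation count for an (m x k) by (k x n) multiply; the matrix size and timing below are hypothetical and are not Intel's demo numbers.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """Multiply-add count for C = A @ B with A (m x k) and B (k x n): 2*m*n*k FLOPs."""
    return 2 * m * n * k

def sustained_gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Sustained rate: work done divided by measured wall-clock time."""
    return gemm_flops(m, n, k) / seconds / 1e9

# Hypothetical run: a 10,000 x 10,000 double-precision multiply finishing in 2 seconds.
example = sustained_gflops(10_000, 10_000, 10_000, seconds=2.0)
print(f"{example:.0f} GFLOPS sustained")
```

A peak number, by contrast, never involves a stopwatch; it is computed from the spec sheet.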
 
Feb 19, 2009
10,457
10
76
There is a lot more to real world performance besides theoretical FLOPS.

The FLOPS on paper for AMD GPU's is much better than the FLOPS for Nvidia GPU's. Does that translate to actual performance advantage? Not really.

Yes it does. But it requires software that caters to the radeon architecture.

Case in point: Bitcoin mining and decryption/encryption HPC software, where the huge SP FLOPS advantage on Radeons blows everything away.

For general HPC, it's pretty even, but NV has the CUDA advantage.

Intel's big advantage is native x86 support. When companies and research corporations spend a lot of money on recompiling and optimizing their software, that is a huge edge for Intel's MIC.

Oh, it's not just 1 tflops, that's the minimum achieved on debug setup: http://semiaccurate.com/2011/11/17/intel%E2%80%99s-22nm-knights-corner/
Looks like it has a lot more performance to give.
 

rgallant

Golden Member
Apr 14, 2007
1,361
11
81
Kepler is expected to quadruple the performance\watt of Fermi. So either we can expect a Kepler GPU to provide 2.7TFLOPS of performance at the same level of power consumption of Fermi, or provide 655GFLOPS at 1/4th the power.

But remember, this is on a process one node larger than Intel. Meaning, it is possible this thing isnt impressive at all.

A more appropriate comparison would be with Maxwell as it will be on the same node. That is expected to deliver 14x performance\watt of Fermi. An estimated 9.2TFLOPS at the same power.
What does Tesla consume? 225Watts? If Intel hits their power threshold it would take over 9 of these to equal one Tesla or about 180-200 watts. So it is possible with the infrastructure required to hook 9 of these up that 25-45 watt difference will be nullified.
- A chip in the hand is worth 4 in the bush [or ground/sand].
- I see a KC, don't see a Maxwell - it's cake.
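
The arithmetic in the quoted comparison can be checked with a few lines. This is only a sketch of the numbers as quoted (9.2 TFLOPS projected for Maxwell, 1 TFLOPS per Knights Corner, 225W for a Tesla board, 180-200W for the group of Intel chips); the per-chip wattage is derived from those figures and is not an Intel specification.

```python
import math

# Figures from the quoted post.
maxwell_tflops = 9.2
kc_tflops = 1.0
tesla_watts = 225
kc_group_watts = (180, 200)   # quoted total for "over 9" chips

chips_needed = maxwell_tflops / kc_tflops        # fractional count
whole_chips = math.ceil(chips_needed)            # whole chips to match or exceed

# Per-chip power implied by spreading the quoted total over 9 chips.
per_chip_watts = tuple(w / 9 for w in kc_group_watts)

# Remaining headroom against one Tesla board, as (low, high).
margin_watts = (tesla_watts - kc_group_watts[1], tesla_watts - kc_group_watts[0])

print(f"Chips needed: {chips_needed:.1f} (so {whole_chips} whole chips)")
print(f"Implied per-chip power: {per_chip_watts[0]:.1f}-{per_chip_watts[1]:.1f} W")
print(f"Margin vs Tesla: {margin_watts[0]}-{margin_watts[1]} W")
```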