Knights Landing package pictures


mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
Well, we know it isn't much over 3TFLOPs... if it was >3.5TFLOPs, that's what they would put on the slide. ;) It's enough for a rough ballpark figure, and it means we know it isn't going to be some crazy >2GHz thing. I'm sure my clock speed estimate won't turn out to be precisely correct, but I'm pretty confident that it's not that far off.

Perfect. I would like to add that at this stage this 3TFLOPS mark might not even be achievable depending on how much "optimism" is included in these expectations.
 

NTMBK

Lifer
Nov 14, 2011
10,523
6,048
136
Perfect. I would like to add that at this stage this 3TFLOPS mark might not even be achievable depending on how much "optimism" is included in these expectations.

Nah, I reckon at this point they'll have enough samples back to be confident about the clocks. There'll be at least one top-of-the-line part with the full 72 cores and top clocks. I just wonder how many cut-down SKUs they will make; with a die that large, there must be plenty of chips with a faulty core (or 10).
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
The last one went from 57 to 61 cores. So I would guess the delta would be around the same. Say ~5-6 cores.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
sure, but the slide says in fine print (0) "Over 3 Teraflops [...]", so it doesn't make much sense to make overly precise predictions when the claim isn't 3 TFlops (DP theoretical peak) but > 3 TFlops

No offense, but I already gave my reasoning in post #66, in the latter part of it. When I post, I try to read at least a page back so I don't ask redundant questions: http://forums.anandtech.com/showpost.php?p=37293958&postcount=66

If we assume certain SKUs of Knights Landing achieve the stated peak goal of 16 DP GFLOPS/W, and the top SKU is stated to be 215W, we get 3.4TFLOPs of performance. That results in 1.45-1.5GHz.
Based on leaks:
-160-215W SKUs
-14-16 DP GFLOPS/W

So at BEST, we get 3.44TFLOPs, which works out to ~1.5GHz. Then 3+ TFLOPS would mean anywhere between 3.01-3.44TFLOPs. Not that ambiguous, is it?
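
The arithmetic above can be sketched roughly as below. The perf/W and TDP figures are the leaked ones quoted in this post; the per-core throughput (72 active cores, two 512-bit vector units per core, 8 DP lanes each, FMA counted as 2 FLOPs) is an assumption added here for illustration, not something the leaks state.

```python
# Back-of-envelope check of the peak-TFLOPS and clock estimates.
# ASSUMPTIONS (not from the leaks): 72 active cores, 2 vector units per core,
# 8 double-precision lanes per 512-bit unit, and FMA counted as 2 FLOPs.
CORES = 72
VPUS_PER_CORE = 2
DP_LANES = 512 // 64      # 8 doubles per 512-bit register
FLOPS_PER_FMA = 2

def peak_tflops(gflops_per_watt, tdp_watts):
    """Peak DP TFLOPS implied by an efficiency target and a TDP."""
    return gflops_per_watt * tdp_watts / 1000.0

def clock_ghz(tflops, cores=CORES):
    """Clock needed to reach a given peak, under the assumptions above."""
    flops_per_cycle = cores * VPUS_PER_CORE * DP_LANES * FLOPS_PER_FMA
    return tflops * 1e12 / flops_per_cycle / 1e9

best = peak_tflops(16, 215)                              # best-case leaked figures
print(f"best case: {best:.2f} TFLOPS")                   # 3.44
print(f"clock at best case: {clock_ghz(best):.2f} GHz")  # 1.49
print(f"clock at 3.01 TFLOPS: {clock_ghz(3.01):.2f} GHz")  # 1.31
```

Under these assumptions the "3+ TFLOPS" range maps to roughly 1.3-1.5GHz; a part with fewer active cores would need a proportionally higher clock for the same peak.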

or do you have a source where *scalar* performance and/or *effective* performance is claimed ?
"3+ TeraFLOPS of double-precision peak theoretical performance per single socket node

3x Single-Thread Performance compared to Knights Corner"

That's pretty clear. You don't say "Single-Threaded Performance" and point out fairly limited gains like "Vector performance". Pentium 4 had quite nice SSE performance; did people care? Did Intel say "2x single-threaded performance" because of AVX2 with FMA in Haswell? No, they didn't.

http://vr-zone.com/articles/xeon-phi-knights-series-continues-landing-2015/64112.html

On the link above, in the first slide it states vector performance and single thread as separate.

Also, think of it logically. If they didn't think of single-threaded performance the way we think of "single-threaded performance", they wouldn't have needed to use Silvermont cores. Why did they opt for such a high-IPC core on a "many simple cores" device? Why not 72 P54C cores? Why not 96 P54C cores? Why do they want to use Goldmont, which is supposedly a much higher-performing core, in the Xeon Phi coming after that? They did that for a reason.

Intel must have reasoned, or even found out that single-thread performance is critical to what they are aiming for.
 

bronxzv

Senior member
Jun 13, 2011
460
0
71
No offense, but I already gave my reasoning in post #66, in the latter part of it.

sorry if I have missed a post of yours but my answer was to NTMBK

btw I see that in an old post of yours you were writing this: "Perhaps maybe for some reason they'll boost SP by 2x and result in 3.25 TFLOP DP/13 TFLOP SP?" this nonsense shows well that you lack a basic understanding of AVX-512, which has been fully disclosed for ages now, so why should we think that you have anything serious to teach us about KNL?
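
For reference, the lane arithmetic behind this objection can be sketched as below. A 512-bit register holds 8 doubles or 16 floats; the sketch additionally assumes SP and DP vector instructions issue at the same rate, which is an assumption made here and not something stated in the thread.

```python
# Lane-count sketch for a 512-bit vector register.
# ASSUMPTION: SP and DP vector FMAs issue at the same rate, so peak
# throughput scales only with the number of lanes per register.
REGISTER_BITS = 512

def lanes(element_bits):
    """Number of elements of the given width that fit in one register."""
    return REGISTER_BITS // element_bits

dp_lanes = lanes(64)   # 8 doubles
sp_lanes = lanes(32)   # 16 floats

def sp_peak_from_dp(dp_tflops):
    """SP peak implied by a DP peak when only the lane count differs."""
    return dp_tflops * sp_lanes / dp_lanes

print(sp_peak_from_dp(3.25))   # 6.5, not 13
```

Under that assumption, 3.25 TFLOPS DP corresponds to 6.5 TFLOPS SP; 13 TFLOPS SP would need a 4:1 ratio.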

That's pretty clear. You don't say "Single-Threaded Performance" and point out fairly limited gains like "Vector performance".

3x isn't "fairly limited" in my book

as per fine print (6) which says "Projected peak theoretical single-thread performance [...]" my explanation is a sensible one IMHO since it isn't hard to imagine 1.8 GHz for a *single thread* on KNL

btw, nobody would talk about "peak theoretical performance" for an effective IPC increase (your wall of text / theory) when comparing 2 cores with the same issue width, would they?
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Also, think of it logically. If they didn't think of single-threaded performance the way we think of "single-threaded performance", they wouldn't have needed to use Silvermont cores. Why did they opt for such a high-IPC core on a "many simple cores" device? Why not 72 P54C cores? Why not 96 P54C cores? Why do they want to use Goldmont, which is supposedly a much higher-performing core, in the Xeon Phi coming after that? They did that for a reason.

Intel must have reasoned, or even found out that single-thread performance is critical to what they are aiming for.

Do you realise that if you take a simple in-order core like the P54C and bolt a 512-bit SIMD unit onto it, things will go *underutilized* in most real-world scenarios? While AVX-512 is nice and dandy, once you hit cache misses you are looking at hundreds of cycles of waiting for memory. Same with the SIMD unit: it will sit idle.

So Intel needed OoO, and to increase overall utilization even more they went with 4xHT. That means you need a beefy CPU, and that is where your 3x ST performance improvement comes from: from having a modern out-of-order core with SMT.

And you should really look up the basics of SSE/AVX to stop misleading people. The beauty of KNL is that it is your garden-variety x86 CPU with massive SIMD capabilities. You don't need any magic to start working with it, but it will for sure take hand-tuned assembly and threading to make the most of it.
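
The utilization argument in this post can be illustrated with a toy model. Everything here is hypothetical: the cycle counts are made up, and the model ignores out-of-order execution, prefetching and memory bandwidth limits. It only shows why extra hardware threads help hide memory stalls.

```python
# Toy latency-hiding model (hypothetical numbers, not measurements).
# Each thread alternates between `compute_cycles` of SIMD work and
# `stall_cycles` waiting on memory. With several hardware threads, one
# thread's stall can be overlapped with another thread's compute.

def simd_utilization(compute_cycles, stall_cycles, threads):
    """Fraction of cycles the SIMD unit is busy, capped at 100%."""
    single_thread = compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, threads * single_thread)

# Example: 100 cycles of work per 300-cycle memory stall.
for n in (1, 2, 4):
    print(n, "thread(s):", simd_utilization(100, 300, n))
```

In this model a single thread keeps the SIMD unit busy only a quarter of the time, while four threads are enough to cover the stalls completely. Real hardware will fall short of that because threads compete for caches and bandwidth.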
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Actually KNC already had 4 way SMT.

How else could they extract performance out of a 1993-heritage P54C in-order core bolted to a 512-bit SIMD unit? It was an accelerator, designed to be a GPU with additional hardware, while KNL is a general-purpose x86 boosted with 4-thread HT support to increase utilization of its two 512-bit SIMD units.
 

csbin

Senior member
Feb 4, 2013
908
614
136
Intel confirmed 72 cores for Knights Landing

http://www.computerbase.de/2015-04/xeon-phi-intel-bestaetigt-72-kerne-fuer-knights-landing/

[Image: AO8IJ.png]


72 cores based on the Silvermont architecture