Discussion Intel current and future Lakes & Rapids thread

JoeRambo · Oct 22, 2019

DrMrLordX said:
ICELAKE_D is probably:

INTEL_FAM6_ICELAKE_D is for Xeon D products.

For example, ICELAKE_L covers the existing mobile Icelake products like the 1065G7, which have model number 0x7E.

Desktop is INTEL_FAM6_ICELAKE, model 0x7D.
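To check which bucket a given chip falls into, here is a minimal Python sketch of how a CPUID leaf 1 signature (EAX) splits into family, model and stepping. For family 6 the extended-model nibble is prepended to the base model, which is how 0x7E and 0x7D arise. The 0x706E5 signature for the i7-1065G7 and the 0x6A/0x6C entries for ICELAKE_X/ICELAKE_D are my own recollection of Linux's intel-family.h naming, not something stated in this thread, so treat them as assumptions.

```python
# Sketch: decode an x86 CPUID leaf-1 EAX signature into (family, model, stepping).
# The model-name table is an assumption modelled on Linux's intel-family.h names.
ICELAKE_MODELS = {
    0x6A: "INTEL_FAM6_ICELAKE_X",  # assumed: Ice Lake server
    0x6C: "INTEL_FAM6_ICELAKE_D",  # assumed: Xeon D
    0x7D: "INTEL_FAM6_ICELAKE",    # desktop, per the post above
    0x7E: "INTEL_FAM6_ICELAKE_L",  # mobile (e.g. 1065G7), per the post above
}


def decode_signature(eax: int) -> tuple:
    """Return (family, model, stepping) using the 'display' family/model rules."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended-model nibble is only prepended for families 6 and 15.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


fam, model, step = decode_signature(0x706E5)  # assumed i7-1065G7 signature
print(f"family={fam:#x} model={model:#x} stepping={step}")
print(ICELAKE_MODELS.get(model, "unknown") if fam == 6 else "not family 6")
```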

jpiniero · Oct 22, 2019

Intel Core i7-1065G7 Ice Lake Linux Performance Benchmarks Review - Phoronix (www.phoronix.com)

More Icelake benchmarks on Linux from Phoronix. Firefox on Linux seems to love Icelake.

liahos1 · Oct 22, 2019

jpiniero said:
Intel Core i7-1065G7 Ice Lake Linux Performance Benchmarks Review - Phoronix (www.phoronix.com)

More Icelake benchmarks on Linux from Phoronix. Firefox on Linux seems to love Icelake.

Looks like a damned good part!

Carfax83 · Oct 22, 2019

For the more knowledgeable folk, how does Sunny Cove compare to Apple's latest and greatest in terms of efficiency and performance?

I get the general impression from reading this forum regularly that Apple's microarchitecture is far superior to Intel's and AMD's current desktop lineups from a performance-per-watt perspective.

Does Sunny Cove change that sentiment at all, since it's on a more comparable process node?

Bouowmx · Oct 22, 2019

Carfax83 said:
For the more knowledgeable folk

I'm not one of the knowledgeable folk, but I'd say you can compare AnandTech's SPEC2006 results.

Intel Core i7-1065G7, i7-8550U, i9-9900K, and AMD Ryzen 9 3900X: https://www.anandtech.com/show/14664/testing-intel-ice-lake-10nm/4

Apple A13 and other recent mobile flagships: https://www.anandtech.com/show/14892/the-apple-iphone-11-pro-and-max-review/4

Carfax83 · Oct 22, 2019

Bouowmx said:
I'm not one of the knowledgeable folk, but I'd say you can compare AnandTech's SPEC2006 results.

Yeah, I've seen all those, but I have next to no knowledge about the SPEC benchmark. I know it's an industry benchmark, but I have no idea how it translates to actual performance in consumer workloads.

That's why I wanted some feedback from the forum experts. Honestly, I'm not a big fan of Apple at all, so it irritates me that they are beating Intel and AMD in microarchitecture performance. At the very least I want to know that Intel isn't taking it lying down.

Thunder 57 · Oct 22, 2019

Carfax83 said:
Yeah, I've seen all those, but I have next to no knowledge about the SPEC benchmark. I know it's an industry benchmark, but I have no idea how it translates to actual performance in consumer workloads.

That's why I wanted some feedback from the forum experts. Honestly, I'm not a big fan of Apple at all, so it irritates me that they are beating Intel and AMD in microarchitecture performance. At the very least I want to know that Intel isn't taking it lying down.

They are very good at single-core / low-core-count workloads. I very much question whether they can scale it up. They would need a power-hungry interconnect like Intel's mesh or AMD's Infinity Fabric. Just look at the following:

Worst case, AMD's cores are using 2.78 watts each (89 / 32). The rest is all IF. See other examples here. I doubt Apple, with little experience in an interconnect like this, could do any better. If they could, you would think they'd want a piece of the lucrative server market.

That's what I get for rushing into math. At lower core counts it looks like they use up to 8 watts each, presumably because they are boosting higher.
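The worst-case per-core figure is just the summed core power divided by the core count; whatever the package draws beyond that is what the post attributes to IF. A minimal Python sketch of that split, where `package_watts` is a hypothetical input because the package total from the original chart is not reproduced in the text:

```python
# Sketch of the per-core power split described above. The 89 W core total and
# 32-core count are from the post; package_watts is a hypothetical parameter.


def per_core_watts(core_watts_total: float, cores: int) -> float:
    """Average power per core when all cores share core_watts_total."""
    return core_watts_total / cores


def uncore_watts(package_watts: float, core_watts_total: float) -> float:
    """Power the package draws beyond the cores (interconnect, I/O, memory controllers)."""
    return package_watts - core_watts_total


print(f"{per_core_watts(89, 32):.2f} W per core")  # 2.78 W per core
```

The same division explains the follow-up correction: with fewer active cores boosting higher, the per-core share rises even if the package total does not.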

beginner99 · Oct 23, 2019

liahos1 said:
Looks like a damned good part!

Is it really? The best test shows 32% better performance/watt; on a new node with a new uArch, that isn't exactly earth-shattering.

tamz_msc · Oct 23, 2019

Carfax83 said:
Yeah, I've seen all those, but I have next to no knowledge about the SPEC benchmark. I know it's an industry benchmark, but I have no idea how it translates to actual performance in consumer workloads.

?
All the workloads in SPEC are based on real-world applications, some of which include consumer workloads - like gcc, h264, povray etc.

Carfax83 · Oct 23, 2019

Thunder 57 said:
They are very good in single core / low core counts.

Yep, and that's what I'm interested in the most. I'm curious as to how Intel's latest core compares to the A13 on a single core perspective.

The multicore aspect doesn't really interest me as much. I guess I'm still in shock and awe that the A13 can have such high single-threaded performance, at least according to SPEC.

Carfax83 · Oct 23, 2019

tamz_msc said:
?
All the workloads in SPEC are based on real-world applications, some of which include consumer workloads - like gcc, h264, povray etc.

Yes, but it doesn't tell you everything unless you are very familiar with it. For example, is the h264 benchmark in SPEC optimized for modern SIMD instructions, or is it running in scalar mode only?

If it's not SIMD-optimized, I suppose that would explain a lot, like how the A13 could match the 9900K. I'm assuming the benchmark came out in 2006, so it's very old and does not use advanced SIMD instructions like AVX2.

Nothingness · Oct 23, 2019

Carfax83 said:
Yes, but it doesn't tell you everything unless you are very familiar with it. For example, is the h264 benchmark in SPEC optimized for modern SIMD instructions, or is it running in scalar mode only?

SPEC is supposed to be portable which means no intrinsics or assembly. Your only hope to see SIMD instructions being used is if the compiler could identify some vectorizable code.

If it's not SIMD-optimized, I suppose that would explain a lot, like how the A13 could match the 9900K. I'm assuming the benchmark came out in 2006, so it's very old and does not use advanced SIMD instructions like AVX2.

On hand-written SIMD code I'd indeed expect the 9900K to lead due to the width of AVX2. But the number of end-user apps where this would make a difference is likely not that large.

EDIT: You should read this document that discusses the move from 128-bit to 256-bit SIMD: https://mailman.videolan.org/pipermail/x264-devel/attachments/20130423/ffd6bfb6/attachment-0001.pdf

Furthermore, keep in mind that x264 is not 100% assembly code. About half of x264's running time is spent in either optimized C code or mostly-non-SIMD assembly code, where AVX2 will not provide any gains. Even if all the SIMD assembly functions doubled in speed, x264 would only get 33% faster because of this bottleneck.
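The 33% figure in that quote is Amdahl's law at work. Taking the quote's own assumptions (about half of x264's run time is SIMD assembly, and that half becomes exactly twice as fast), a short Python check:

```python
# Amdahl's law: overall speedup when only part of the run time is accelerated.


def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Speedup of the whole program when `accelerated_fraction` of its original
    run time becomes `factor` times faster and the rest is unchanged."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


# x264 case from the quote: ~half the time is SIMD assembly, and that half doubles.
print(f"{amdahl_speedup(0.5, 2.0):.3f}x")  # 1.333x, i.e. ~33% faster
# Even near-infinitely fast SIMD approaches, but never exceeds, 2x here.
print(f"{amdahl_speedup(0.5, 1e12):.3f}x")
```

The second line is the more telling one: with half the run time untouched, no SIMD width (AVX2 or otherwise) can make x264 more than twice as fast.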

Carfax83 · Oct 23, 2019

Nothingness said:
SPEC is supposed to be portable which means no intrinsics or assembly. Your only hope to see SIMD instructions being used is if the compiler could identify some vectorizable code.

Yeah, I was doing some research and found that out: the benchmark itself is compiled from source, not just run from an exe.

On hand-written SIMD code I'd indeed expect the 9900K to lead due to the width of AVX2. But the number of end-user apps where this would make a difference is likely not that large.

I was doing some digging on SPEC's website, looking at the scores of various submitted results. For the 6700K, the highest score I could find for the h264 benchmark was 79.2.

Now in AnandTech's testing, the 9900K scored 80.36, which is nowhere near what you'd expect considering the 9900K has a much higher single-core clock speed than the 6700K: 5 GHz vs 4.2 GHz at stock clocks.

So this, of course, makes me very suspicious that the benchmark itself just doesn't scale well.
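The mismatch can be made concrete. If the h264 score scaled linearly with single-core clock (an assumption: memory latency does not scale with core clock, and the two results also used different OSes and compilers), the 6700K's 79.2 would extrapolate to roughly 94 at 5 GHz. A quick Python check using only the numbers quoted in the post:

```python
# Naive check: what would the 6700K's h264 score be if it scaled linearly with clock?
# Scores and clocks are the ones quoted in the post; linear scaling is an assumption.
score_6700k, clock_6700k = 79.2, 4.2   # best SPEC submission found, GHz
score_9900k, clock_9900k = 80.36, 5.0  # AnandTech result, GHz

expected_9900k = score_6700k * clock_9900k / clock_6700k
print(f"expected if clock-bound: {expected_9900k:.1f}")  # 94.3
print(f"observed: {score_9900k:.1f}")
print(f"score per GHz: 6700K {score_6700k / clock_6700k:.2f}, "
      f"9900K {score_9900k / clock_9900k:.2f}")
```

The per-GHz line shows the 9900K result delivering less per clock than the 6700K submission, which is consistent with the compiler and OS differences raised in the replies rather than proof that the test itself scales poorly.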

tamz_msc · Oct 23, 2019

Carfax83 said:
Now in AnandTech's testing, the 9900K scored 80.36, which is nowhere near what you'd expect considering the 9900K has a much higher single-core clock speed than the 6700K: 5 GHz vs 4.2 GHz at stock clocks.

Different OS and compilers. The Intel compiler gives better results than gcc by about 10 percent.

Nothingness · Oct 23, 2019

Carfax83 said:
I was doing some digging on SPEC's website, looking at the scores of various submitted results. For the 6700K, the highest score I could find for the h264 benchmark was 79.2.

Now in AnandTech's testing, the 9900K scored 80.36, which is nowhere near what you'd expect considering the 9900K has a much higher single-core clock speed than the 6700K: 5 GHz vs 4.2 GHz at stock clocks.

So this, of course, makes me very suspicious that the benchmark itself just doesn't scale well.

The Intel compiler is better on some of the SPEC subtests; sometimes it's close to pure cheating (libquantum), but in the case of h264 I would not be surprised if the gain were due to better vectorization by icc.

EDIT: I looked this up; the gain on h264 from icc is also partly due to automatically reducing pointers to 32 bits (the -auto-p32 option). According to David Kanter, that option brings a 10-20% gain on that test.

Ajay · Oct 23, 2019

Nothingness said:
EDIT: I looked this up; the gain on h264 from icc is also partly due to automatically reducing pointers to 32 bits (the -auto-p32 option). According to David Kanter, that option brings a 10-20% gain on that test.

Geez, that’s a surprisingly large gain just from a reduction in pointer width. There must be an unusually large amount of pointer math being done.

Nothingness · Oct 23, 2019

Ajay said:
Geez, that’s a surprisingly large gain just from a reduction in pointer width. There must be an unusually large amount of pointer math being done.

It's not the pointer math that matters; it's that the pointers stored in the caches are twice as large, resulting in more cache thrashing 😉
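To put numbers on the cache argument: a record that mostly holds pointers roughly doubles in size going from 32-bit to 64-bit pointers, so only half as many fit in a given cache. A minimal Python sketch, where the record layout (two pointers plus a 4-byte payload) and the 32 KiB cache size are hypothetical illustrations, not taken from the h264 source; padding to pointer alignment mimics typical C struct layout:

```python
# Sketch: how pointer width changes the cache footprint of a pointer-heavy record.
# Record layout and cache size are illustrative assumptions.


def record_bytes(n_pointers: int, payload_bytes: int, pointer_bytes: int) -> int:
    """Size of a record holding n pointers plus a payload, padded to pointer alignment."""
    raw = n_pointers * pointer_bytes + payload_bytes
    return -(-raw // pointer_bytes) * pointer_bytes  # round up to a multiple


CACHE_BYTES = 32 * 1024  # assumed L1 data cache size

for ptr in (4, 8):
    size = record_bytes(n_pointers=2, payload_bytes=4, pointer_bytes=ptr)
    print(f"{ptr * 8}-bit pointers: {size} B/record, "
          f"{CACHE_BYTES // size} records fit in {CACHE_BYTES // 1024} KiB")
```

With these assumptions the record grows from 12 to 24 bytes, so the number of records resident in the cache halves (2730 vs 1365), which is the thrashing effect described above.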

Ajay · Oct 23, 2019

Nothingness said:
It's not the pointer math that matters; it's that the pointers stored in the caches are twice as large, resulting in more cache thrashing 😉

Ah, thanks - I was trying to figure out how to make sense of this.

DrMrLordX · Oct 23, 2019

Nothingness said:
The Intel compiler is better on some of the SPEC subtests; sometimes it's close to pure cheating (libquantum)

Your point bears repetition. One of the problems with SPEC is that, over many years, compiler and hardware optimizations could be (and have been) made to cheat at SPEC.

NostaSeronx · Oct 24, 2019

I think it should be noted:
SunnycoveX uses the 8.5T (7.7T) library, with critical paths at 10.2T (9.3T), while both of the Cannonlake/Icelake products that launched use the 6.8T (6.2T) library.

Desktop (HEDT) and server (Xeon Scalable) parts could potentially run as fast as, or even faster than, Zen 2.
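The paired figures are consistent with one set of cell heights counted against two metal pitches in a 40:44 ratio. The sketch below assumes pitches of 40 nm and 44 nm (my assumption, not stated in the post) and shows that each pair maps to the same cell height, e.g. 6.8 × 40 nm = 272 nm ≈ 6.2 × 44 nm. A larger track count means a taller cell, which generally allows larger transistors and higher clocks at the cost of density.

```python
# Sketch: the parenthesised track counts look like the same cell heights measured
# against a second, wider metal pitch. Both pitches below are assumptions.
PITCH_A_NM = 40.0  # assumed pitch behind the first number (e.g. 6.8T)
PITCH_B_NM = 44.0  # assumed pitch behind the parenthesised number (e.g. 6.2T)


def cell_height_nm(tracks: float, pitch_nm: float) -> float:
    """Cell height as track count times track pitch."""
    return tracks * pitch_nm


def retrack(tracks: float, from_pitch: float, to_pitch: float) -> float:
    """Express the same cell height as a track count at a different pitch."""
    return cell_height_nm(tracks, from_pitch) / to_pitch


for t in (6.8, 8.5, 10.2):
    h = cell_height_nm(t, PITCH_A_NM)
    print(f"{t}T -> {h:.0f} nm -> {retrack(t, PITCH_A_NM, PITCH_B_NM):.1f}T")
```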

Dayman1225 · Oct 25, 2019

Tigerlake 4+2 with LPDDR5

https://twitter.com/x/status/1187725400178225152

yeshua · Oct 26, 2019

momomo_us on Twitter has leaked some new info about future Cooper Lake/Ice Lake server parts:

・Cooper Lake
Whitley , 14nm , Socket P+
48C , 2S , 300W , 8ch DDR4-3200
C62x chipset , PCIe Gen3 64 Lanes
Q2 20

・Ice Lake
Whitley , 10nm , Socket P+
38C , 2S , 270W , 8ch DDR4-3200
C62x chipset , PCIe Gen4 64 Lanes
Q3 20

This timing doesn't inspire much confidence. Intel might as well skip their 10nm node altogether, since it still doesn't quite work: of the eleven announced Ice Lake U CPUs, only a few are available at e-tail/retail. They haven't even bothered to add the i7-1068G7 to their ARK database.

mikk · Oct 26, 2019

yeshua said:
of the eleven announced Ice Lake U CPUs, only a few are available at e-tail/retail. They haven't even bothered to add the i7-1068G7 to their ARK database.

To be honest, ICL-U availability by the end of October is better than expected, and it keeps improving almost every day in Germany. The prices are also lower than some had feared. ICL-Y is indeed not available yet, but do you really think that's a big deal right now? I mean, this is basically ICL-U with a lower TDP... market demand for Y SKUs is very thin in general, and OEMs can use U SKUs with TDP-down.

About Ice Lake-SP... it has been known for ages that ICL-SP is something of a lower-end server generation because of the 10nm issues. In fact, the 38C confirmation is a nice surprise; some time ago we thought Ice Lake-SP topped out at only 26C. The big 10nm server thing is coming with Sapphire Rapids.

NostaSeronx · Oct 26, 2019

yeshua said:
This timing doesn't inspire too much confidence. Intel might as well skip their 10nm node altogether since it still doesn't quite work

mikk said:
About Ice Lake-SP... it has been known for ages that ICL-SP is something of a lower-end server generation because of the 10nm issues. In fact, the 38C confirmation is a nice surprise; some time ago we thought Ice Lake-SP topped out at only 26C. The big 10nm server thing is coming with Sapphire Rapids.

Icelake Server is not using the same core as Icelake Client. The server core is SunnycoveX, which might have more AVX-512 units than even the client model.

Icelake Client is on the legacy 10nm node, i.e. the same node as Cannonlake. Icelake Server is moving to the Tigerlake-S(/H)/Tigerlake-U(/Y) node, which is the legacy 10nm++ node.

Sapphire Rapids is most likely skipping the 10nm node altogether. Willowcove client is 10nm++, which means either 10nm+++ or 7nm for WillowcoveX (the server core).

DrMrLordX · Oct 26, 2019

38C, 270W on 10nm? What?
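The surprise is easier to see per core. Dividing the leaked TDPs by core count (a crude metric, since uncore, I/O and memory controllers share the same budget), the 10nm Ice Lake part is budgeted more watts per core than the 14nm Cooper Lake part. A minimal Python sketch using only the leaked specs; the DDR4 bandwidth line assumes the standard 64-bit (8-byte) channel width:

```python
# Per-core TDP and peak DRAM bandwidth for the leaked Whitley parts.
# Core counts, TDPs and memory configs come from the leak; spreading TDP evenly
# across cores is a crude assumption.
from dataclasses import dataclass


@dataclass
class ServerPart:
    name: str
    cores: int
    tdp_w: float
    mem_channels: int
    mem_mts: int  # DDR4 transfer rate in MT/s

    def watts_per_core(self) -> float:
        return self.tdp_w / self.cores

    def peak_bandwidth_gbs(self) -> float:
        # Each DDR4 channel is 64 bits (8 bytes) wide; MT/s * bytes -> MB/s -> GB/s.
        return self.mem_channels * self.mem_mts * 8 / 1000


parts = [
    ServerPart("Cooper Lake (14nm)", cores=48, tdp_w=300, mem_channels=8, mem_mts=3200),
    ServerPart("Ice Lake (10nm)", cores=38, tdp_w=270, mem_channels=8, mem_mts=3200),
]

for p in parts:
    bw = p.peak_bandwidth_gbs()
    print(f"{p.name}: {p.watts_per_core():.2f} W/core, "
          f"{bw:.1f} GB/s total, {bw / p.cores:.2f} GB/s per core")
```

By this measure the Ice Lake part gets about 7.1 W per core against 6.25 W for Cooper Lake, with the same 8-channel DDR4-3200 bandwidth shared by fewer cores.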
