Discussion Intel current and future Lakes & Rapids thread


Carfax83

Diamond Member
For the more knowledgeable folk, how does Sunny Cove compare to Apple's latest and greatest in terms of efficiency and performance?

I get the general impression from reading this forum on a regular basis that Apple's microarchitecture is far superior to Intel and AMDs current desktop lineup from a performance per watt perspective.

Does Sunny Cove change that sentiment at all, since it's on a more comparable process node?
 

Carfax83

Diamond Member
Not a knowledgeable folk myself, but I'd say you can compare AnandTech's SPEC2006 results.

Yeah I've seen all those, but I have next to no knowledge about the Spec benchmark. I know it's an industry benchmark, but as to how it translates to actual performance in consumer workloads, I have no idea.

That's why I wanted some feedback from the forum experts. Honestly, I'm not a big fan of Apple at all, so it irritates me that they are beating Intel and AMD in microarchitecture performance. I at least want to know that Intel isn't taking it lying down :innocent:
 

Thunder 57

Platinum Member
Yeah I've seen all those, but I have next to no knowledge about the Spec benchmark. I know it's an industry benchmark, but as to how it translates to actual performance in consumer workloads, I have no idea.

That's why I wanted some feedback from the forum experts. Honestly, I'm not a big fan of Apple at all, so it irritates me that they are beating Intel and AMD in microarchitecture performance. I at least want to know that Intel isn't taking it lying down :innocent:

They are very good in single core / low core counts. Whether they can scale it up, I very much doubt. They would need a power-hungry interconnect like Intel's mesh or AMD's Infinity Fabric. Just look at the following:

[Chart: Infinity Fabric power vs. core power on AMD EPYC]


Worst case, AMD's cores are using 2.78 Watts each (89 W / 32 cores). The rest is all IF. See other examples here. I doubt Apple, with little experience in an interconnect like this, could do any better. If they could, it would seem they would want a piece of the lucrative server market.

That's what I get for rushing into math. At lower core counts it looks like they use up to 8 Watts each, surely because they are boosting higher.
 

tamz_msc

Diamond Member
Yeah I've seen all those, but I have next to no knowledge about the Spec benchmark. I know it's an industry benchmark, but as to how it translates to actual performance in consumer workloads, I have no idea.
?
All the workloads in SPEC are based on real-world applications, some of which include consumer workloads - like gcc, h264, povray etc.
 

Carfax83

Diamond Member
They are very good in single core / low core counts.

Yep, and that's what I'm most interested in. I'm curious how Intel's latest core compares to the A13 from a single-core perspective.

The multicore aspect doesn't really interest me as much. I guess I'm still in shock or awe that the A13 can have such high single-threaded performance; at least according to Spec.
 

Carfax83

Diamond Member
?
All the workloads in SPEC are based on real-world applications, some of which include consumer workloads - like gcc, h264, povray etc.

Yes but it doesn't tell you everything unless you are very familiar with it. For example, is the h264 benchmark in Spec optimized for modern SIMD instructions or is it running in scalar mode only?

If it's not SIMD optimized, I suppose that would explain a lot, like how the A13 could match the 9900K. I'm assuming the benchmark came out in 2006, so it's very old and does not use advanced SIMD instructions like AVX2.
 

Nothingness

Platinum Member
Yes but it doesn't tell you everything unless you are very familiar with it. For example, is the h264 benchmark in Spec optimized for modern SIMD instructions or is it running in scalar mode only?
SPEC is supposed to be portable, which means no intrinsics or assembly. Your only hope of seeing SIMD instructions used is if the compiler can identify some vectorizable code.
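As an illustration (my own sketch, not actual SPEC source), the portable-C style means SIMD only shows up when the compiler auto-vectorizes a loop like this one:

```c
#include <stddef.h>

/* Portable C in the SPEC style: no intrinsics, no assembly. A compiler
 * at -O3 (plus e.g. -mavx2 on x86) can typically auto-vectorize this
 * loop, because `restrict` promises the arrays don't alias. */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n) {
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Whether that vectorization actually happens depends entirely on the compiler and flags, which is part of why submitted SPEC scores vary so much between toolchains.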

If it's not SIMD optimized, I suppose that would explain a lot, like how the A13 could match the 9900K. I'm assuming the benchmark came out in 2006, so it's very old and does not use advanced SIMD instructions like AVX2.
On hand-written SIMD code I'd indeed expect the 9900K to lead due to the width of AVX2. But the set of end-user apps where this would make a difference is likely not that large.

EDIT: You should read this document that discusses the move from 128-bit to 256-bit SIMD: https://mailman.videolan.org/pipermail/x264-devel/attachments/20130423/ffd6bfb6/attachment-0001.pdf
Furthermore, keep in mind that x264 is not 100% assembly code. About half of x264's running time is spent in either optimized C code or mostly-non-SIMD assembly code, where AVX2 will not provide any gains. Even if all the SIMD assembly functions doubled in speed, x264 would only get 33% faster because of this bottleneck.
 

Carfax83

Diamond Member
SPEC is supposed to be portable which means no intrinsics or assembly. Your only hope to see SIMD instructions being used is if the compiler could identify some vectorizable code.

Yeah, I was doing some research and found that out: the benchmark itself is compiled from source, not just run from an exe.

On hand-written SIMD code I'd indeed expect the 9900K to lead due to the width of AVX2. But the end-user apps where this would make a difference is likely not that large.

I was doing some digging on Spec's website, looking at the different scores for various submitted results. For the 6700K, the highest score I could find for the h264 benchmark was 79.2.

Now in AnandTech's testing, the 9900K scored 80.36, which is nowhere near what you'd expect considering the 9900K has a much higher single-core clock speed than the 6700K: 5GHz vs 4.2GHz at stock clocks.

So this, of course, makes me very suspicious that the benchmark itself just doesn't scale well.
 

tamz_msc

Diamond Member
Now in AnandTech's testing, the 9900K scored 80.36, which is nowhere near what you'd expect considering the 9900K has a much higher single-core clock speed than the 6700K: 5GHz vs 4.2GHz at stock clocks.
Different OS and compilers. The Intel compiler gives better results than gcc by about 10 percent.
 

Nothingness

Platinum Member
I was doing some digging on Spec's website, looking at the different scores for various submitted results. For the 6700K, the highest score I could find for the h264 benchmark was 79.2.

Now in AnandTech's testing, the 9900K scored 80.36, which is nowhere near what you'd expect considering the 9900K has a much higher single-core clock speed than the 6700K: 5GHz vs 4.2GHz at stock clocks.

So this, of course, makes me very suspicious that the benchmark itself just doesn't scale well.
The Intel compiler is better for some of the SPEC subtests; sometimes it's close to pure cheating (libquantum), but in the case of h264 I would not be surprised if it's due to better vectorization by icc.

EDIT: I looked this up; the gain on h264 from icc is also in part due to automatic reduction of pointers to 32-bit (the -auto-p32 option). According to David Kanter, that option brings a 10-20% gain on that test.
 

Ajay

Lifer
EDIT: I looked this up; the gain on h264 from icc is also in part due to automatic reduction of pointers to 32-bit (the -auto-p32 option). According to David Kanter, that option brings a 10-20% gain on that test.

Geez, that’s a surprisingly large gain just from a reduction in pointer width. There must be an unusually large amount of pointer math being done.
 

Nothingness

Platinum Member
Geez, that’s a surprisingly large gain just from a reduction in pointer width. There must be an unusually large amount of pointer math being done.
It's not the pointer math that matters; it's the pointers stored in the caches being twice as large, resulting in more cache thrashing ;)
 

DrMrLordX

Lifer
The Intel compiler is better for some of the SPEC subtests; sometimes it's close to pure cheating (libquantum)

Your point bears repeating. One of the problems with SPEC is that it has been around for so many years that compiler and hardware optimizations can be (and have been) made specifically to game it.
 

NostaSeronx

Diamond Member
I think it should be noted:
SunnycoveX is using the 8.5T (7.7T) cell library, with critical paths on 10.2T (9.3T), while the Cannonlake/Icelake products that launched were using the 6.8T (6.2T) library.

Desktop (HEDT) and Server (Xeon Scalable) parts will potentially run as fast as, or even faster than, Zen 2.
 

yeshua

Member
momomo_us on twitter has leaked some new info about future server Cooper Lake/Ice Lake parts:

・Cooper Lake
Whitley , 14nm , Socket P+
48C , 2S , 300W , 8ch DDR4-3200
C62x chipset , PCIe Gen3 64 Lanes
Q2 20

・Ice Lake
Whitley , 10nm , Socket P+
38C , 2S , 270W , 8ch DDR4-3200
C62x chipset , PCIe Gen4 64 Lanes
Q3 20



This timing doesn't inspire too much confidence. Intel might as well skip their 10nm node altogether since it still doesn't quite work: out of eleven announced Ice Lake U CPUs, just a few are available in e-tail/retail. They haven't even bothered to add the i7-1068G7 to their Ark DB.
 

mikk

Diamond Member
out of eleven announced Ice Lake U CPUs, just a few are available in e-tail/retail. They haven't even bothered to add the i7-1068G7 to their Ark DB.


To be honest, availability is better than expected for ICL-U by the end of October; it is improving almost every day in Germany. Also, prices are lower than some had feared. ICL-Y is indeed not available yet, but do you really think that's a big deal by now? It's basically ICL-U with a lower TDP... market demand is very thin for Y-SKUs in general, and OEMs can use U-SKUs with TDP down instead.

About Icelake-SP... it has been known for ages that ICL-SP is something of a lower-end server generation because of the 10nm issues; in fact the 38C confirmation is a nice surprise, since some time ago we thought Icelake-SP topped out at only 26C. The big 10nm server thing is coming with Sapphire Rapids.
 

NostaSeronx

Diamond Member
This timing doesn't inspire too much confidence. Intel might as well skip their 10nm node altogether since it still doesn't quite work
About Icelake-SP... it has been known for ages that ICL-SP is something of a lower-end server generation because of the 10nm issues; in fact the 38C confirmation is a nice surprise, since some time ago we thought Icelake-SP topped out at only 26C. The big 10nm server thing is coming with Sapphire Rapids.
Icelake Server is not using the same core as Icelake Client. The server core is SunnycoveX, which might have more AVX512 units than even the client model.

Icelake Client is on the legacy 10nm node, aka the same node as Cannonlake. Icelake Server is going for the Tigerlake-S(/H)/Tigerlake-U(/Y) node, which is the legacy 10nm++ node.

Sapphire Rapids is most likely skipping the 10nm node altogether. Willowcove client is 10nm++, so WillowcoveX (the server core) is either 10nm+++ or 7nm.