Discussion Intel current and future Lakes & Rapids thread


coercitiv

Diamond Member
Jan 24, 2014
6,199
11,895
136
Exactly, Gracemont with IPC around Skylake would be pretty powerful, and IMO it's not correct to call it a LITTLE core (ARM's LITTLE cores are typically very low-IPC in-order cores). It's more like a MIDDLE core: saving a lot of die space and energy while delivering 80% of the IPC of Sunny Cove (or 60% of Golden Cove).
MIDDLE core.... this is comedy gold.

big and LITTLE are names based on relative die area, not performance. We already have a clear visual indicator of what that means in Lakefield, with the 4-core Tremont cluster being only slightly bigger than a single Sunny Cove core.

8x Golden Cove (IPC +40% over SKL) + 8x cores of SKL IPC .... this thing could be pretty competitive against Zen3 and Zen4.
Pretty competitive?! High core count configurations in desktops are meant for throughput. In highly multi-threaded workloads this 8+8 config will behave like a 16-core ICL at best. It would be competitive... this year.

The only way this addition of small cores would make real sense would be with a different ratio, so that final throughput does not dilute the big core potential, but enhances it instead. Something like 8+16 could make sense from a performance perspective.
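
A rough back-of-the-envelope check of the numbers quoted in this exchange, as a sketch in C. It assumes equal clocks, perfect multi-threaded scaling and no SMT, none of which anyone in the thread claims. The function names are hypothetical. IPC is expressed relative to Skylake = 1.0, using the "+40% over SKL" figure for Golden Cove and the "80% IPC of Sunny Cove" figure for the small core from the quoted post.

```c
#include <assert.h>

/* Sum of per-core IPC, relative to Skylake = 1.0.
 * Assumption: equal clocks and perfect scaling (a simplification). */
double aggregate_ipc(int big_cores, double big_ipc,
                     int small_cores, double small_ipc)
{
    return big_cores * big_ipc + small_cores * small_ipc;
}

/* Express an aggregate as "equivalent cores" of some reference core. */
double equivalent_cores(double aggregate, double reference_ipc)
{
    assert(reference_ipc > 0.0);
    return aggregate / reference_ipc;
}
```

With those inputs, 8+8 comes to 8*1.4 + 8*1.0 = 19.2 Skylake-core equivalents. Taking Sunny Cove as 1/0.8 = 1.25x Skylake, that is 19.2/1.25 = 15.36 Sunny Cove cores, which is in line with "a 16-core ICL at best". An 8+16 config would give 27.2 on the same scale.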
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
MIDDLE core.... this is comedy gold.

big and LITTLE are names based on relative die area, not performance. We already have a clear visual indicator of what that means in Lakefield, with the 4-core Tremont cluster being only slightly bigger than a single Sunny Cove core.
Look at ARM big.LITTLE today. It's more complicated now. Snapdragon 865 consists of:
1x Prime core - Cortex-A77 (2.8 GHz + 512 KB L2 cache) ... for top ST perf
3x Gold core - Cortex-A77 (2.4 GHz + 256 KB L2 cache) ... for max MT perf
4x Silver core - Cortex-A55 (1.8 GHz + 128 KB L2 cache) ... for off-screen background work and idle

Intel's approach is more like Prime+Gold than traditional big.LITTLE (otherwise Intel would use tiny in-order Atom cores instead of the much bigger Tremont/Gracemont). It's more like gluing together Bulldozer + Bobcat cores (both 2x ALU OoO, one a high-performance fat core and the other a high-efficiency skinny core, but still close in terms of IPC).
 

Nothingness

Platinum Member
Jul 3, 2013
2,410
745
136
Lakefield has no ISA parity. Wikichip says AVX won't work on Core chips for Lakefield.
You mean they fused off AVX and AVX-512 on the Sunny Cove core? Sweet. I love Intel.

I wonder if ISA parity really is needed. Aren't the executables tagged with ISA requirements? Can't the processes be pinned to the CPU that has the right support in the SoC?

If Gracemont has support for AVX-512, that will change. It doesn't need full width. 2x cycles using 256-bit units or 4x cycles with 128-bit units will work.
You're preaching to the choir. I think having support for all of the ISA is paramount as it helps adoption. But given Intel's past behavior I doubt they'll go that route.
 

Gideon

Golden Member
Nov 27, 2007
1,637
3,673
136
Didn't Atom's core have AVX512 support already in Knights landing?
Yes, but with minimal overlap (mostly not the same instructions). AVX512 is quite a mess:

[Image: zCuHmGTYoe4taYzpNFmsYtK6L5loBvTqjVdMxkQmw6U.jpg]
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
I wonder if ISA parity really is needed. Aren't the executables tagged with ISA requirements? Can't the processes be pinned to the CPU that has the right support in the SoC?

Executables are not tagged. It is the responsibility of the programmer to deliver correct code to the cores by querying features.
In fact a single executable can support multiple code paths for different feature levels.
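
To make the "multiple code paths" idea concrete, here is a minimal sketch in C of runtime dispatch on x86. It assumes GCC or Clang, whose `__builtin_cpu_supports` builtin wraps the CPUID query. The names `sum_generic`, `sum_avx2_placeholder` and `select_sum` are hypothetical, and the "AVX2" variant is only a stand-in so the sketch builds anywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline implementation that runs on any core. */
static long sum_generic(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Stand-in for a vectorized version. Real code would build this one
 * with AVX2 enabled; here it just forwards to the baseline. */
static long sum_avx2_placeholder(const int *v, size_t n)
{
    return sum_generic(v, n);
}

typedef long (*sum_fn)(const int *, size_t);

/* Query the CPU features once and pick a code path. */
sum_fn select_sum(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2_placeholder;
#endif
    return sum_generic;
}
```

Note that the answer reflects whichever core happened to run `select_sum`. If the thread later migrates to a core without that feature, the stored pointer is stale, which is exactly the problem with mismatched ISAs discussed in the replies.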
 

yeshua

Member
Aug 7, 2019
166
134
86
IMO x86 desktop CPUs don't need BIG and little cores, and even laptop parts could do just fine with, e.g., as many BIG cores as possible and just two little cores for light tasks to increase battery life. Even that seems like a stretch, considering that my Skylake-U CPU from 2015 consumes less than 1.5 W when it's idling on the desktop. Is it really worth adding special cores to reduce this figure even further?
 

DrMrLordX

Lifer
Apr 27, 2000
21,629
10,841
136
You mean they fused off AVX and AVX-512 on the Sunny Cove core? Sweet. I love Intel.

I wonder if ISA parity really is needed. Aren't the executables tagged with ISA requirements? Can't the processes be pinned to the CPU that has the right support in the SoC?

Executables supporting AVX-based ISA extensions have probably not been compiled with heterogeneous core configurations in mind. I'm guessing Intel ran into problems in testing and decided to fix the problem in hardware rather than try to have everyone recompile everything.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Executables are not tagged. It is the responsibility of the programmer to deliver correct code to the cores by querying features.
In fact a single executable can support multiple code paths for different feature levels.

That sounds like a nightmare scenario. The scheme was designed for a strictly homogeneous scenario, where features are detected once at startup and the code then runs using the features detected.
This scheme does not help with big.LITTLE in any way.

1) Processes get thrown around between CPUs, and if the features differ, a crash will obviously happen.
2) If vendors "fix" (1) by pinning a process to just one type of core, the following happens: "hey, why is my Cinebench 10x slower than on AMD five times out of six?"
3) Whitelisting processes to be able to run on the little cores is frankly a bad idea: limited impact for the money spent, and it would end up limited to OS vendors and a handful of apps in the field.

So that leaves us with 2 solutions:
1) The HW way Apple went
2) Heterogeneous cores ((1) can also have heterogeneous cores) and hoping for the invention of a time machine, so that process scheduling can make decisions using information from the future.
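
For reference, the pinning "fix" from point (2) of the first list looks roughly like this on Linux. This is only a sketch: `sched_getaffinity` and `sched_setaffinity` are the real glibc calls, but the helper name is hypothetical, and pinning to the first allowed CPU is an arbitrary choice for illustration. A real implementation would have to know which CPUs belong to which core type.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Pin the calling thread to the first CPU it is currently allowed on.
 * Returns that CPU's index, or -1 on failure. Linux-specific. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed;
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            /* From here on the scheduler may not migrate this thread. */
            if (sched_setaffinity(0, sizeof(one), &one) != 0)
                return -1;
            return cpu;
        }
    }
    return -1;
}
```

Once pinned, feature detection stays valid for the life of the thread, but the scheduler can no longer move the work to a faster core, which is the trade-off the post complains about.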
 

Nothingness

Platinum Member
Jul 3, 2013
2,410
745
136
Executables are not tagged.
Is that a limitation of Windows executable format or of x86? AArch64 ELF has HWCAP.

It is the responsibility of the programmer to deliver correct code to the cores by querying features.
In fact a single executable can support multiple code paths for different feature levels.
Then you can't migrate and benefit from a better ISA. Well I guess this clearly means you need the same ISA on all cores.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Is that a limitation of Windows executable format or of x86? AArch64 ELF has HWCAP.

elf_hwcap is a global which is initialized by the kernel at load time. The user application then queries hwcap at runtime by calling

Code:
unsigned long hwcap = getauxval(AT_HWCAP); /* declared in <sys/auxv.h> */

From there on, the application typically initializes function pointers with the matching implementation.

There is nothing static in the elf file.

The biggest difference here between x86 and ARM is that on ARM there is no trivial way to figure out HW capabilities at EL0, so the kernel needs to expose these capabilities. On Linux you can use the code snippet posted above; on Windows there is no equivalent I am aware of, but you can safely assume ARMv8.2.
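
A slightly fuller sketch of the Linux pattern described above, in C: read `AT_HWCAP` once, then initialize a function pointer. `getauxval` and, on AArch64, `HWCAP_CRC32` are real. The `checksum_*` names and the byte-sum routine are hypothetical placeholders, and on non-AArch64 builds the feature bit is defined as 0 so the generic path is always chosen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>   /* getauxval, AT_HWCAP (Linux) */

#if defined(__aarch64__)
#include <asm/hwcap.h>  /* HWCAP_* bit definitions */
#define CRC_FEATURE_BIT HWCAP_CRC32
#else
#define CRC_FEATURE_BIT 0UL  /* assumption: nothing to test elsewhere */
#endif

typedef uint32_t (*checksum_fn)(const uint8_t *, size_t);

/* Baseline: plain byte sum, works everywhere. */
static uint32_t checksum_generic(const uint8_t *p, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Stand-in for a variant that would use the optional instructions. */
static uint32_t checksum_accel_placeholder(const uint8_t *p, size_t n)
{
    return checksum_generic(p, n);
}

static checksum_fn checksum_impl;  /* set once, then reused */

void init_dispatch(void)
{
    unsigned long hwcap = getauxval(AT_HWCAP);
    checksum_impl = (hwcap & CRC_FEATURE_BIT) ? checksum_accel_placeholder
                                              : checksum_generic;
}

uint32_t checksum(const uint8_t *p, size_t n)
{
    if (!checksum_impl)
        init_dispatch();
    return checksum_impl(p, n);
}
```

Nothing here is recorded in the ELF file itself; the choice is made by the running process from what the kernel reports.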

Then you can't migrate and benefit from a better ISA. Well I guess this clearly means you need the same ISA on all cores.

Precisely.
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
Exactly, Gracemont with IPC around Skylake would be pretty powerful, and IMO it's not correct to call it a LITTLE core (ARM's LITTLE cores are typically very low-IPC in-order cores). It's more like a MIDDLE core: saving a lot of die space and energy while delivering 80% of the IPC of Sunny Cove (or 60% of Golden Cove).

8x Golden Cove (IPC +40% over SKL) + 8x cores of SKL IPC .... this thing could be pretty competitive against Zen3 and Zen4.
Exactly the wishful thinking we all love you for :)
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
It is the responsibility of the programmer to deliver correct code to the cores by querying features.
Now imagine if AMD would launch this big.LITTLE desktop CPU and you're gonna see immediately how ridiculous your sentence sounds all of a sudden.
 

Nothingness

Platinum Member
Jul 3, 2013
2,410
745
136
elf_hwcap is a global which is initialized by the kernel at load time. The user application then queries hwcap at runtime by calling

Code:
unsigned long hwcap = getauxval(AT_HWCAP);

From there on, the application typically initializes function pointers with the matching implementation.

There is nothing static in the elf file.
Oh I thought it was a property in the ELF file. Thanks for correcting my misunderstanding!

The biggest difference here between x86 and ARM is that on ARM there is no trivial way to figure out HW capabilities at EL0, so the kernel needs to expose these capabilities.
Yes the /proc/cpuinfo output on ARM is regularly being discussed to get around that...

On Linux you can use the code snippet posted above; on Windows there is no equivalent I am aware of, but you can safely assume ARMv8.2.
Since Windows can run on Raspberry you can't assume ARMv8.2 :) Joke aside I hope MS will support more than Qualcomm chips.
 

Nothingness

Platinum Member
Jul 3, 2013
2,410
745
136
Now imagine if AMD would launch this big.LITTLE desktop CPU and you're gonna see immediately how ridiculous your sentence sounds all of a sudden.
I fail to see what is ridiculous (well except that big.LITTLE for desktop sounds ridiculous of course).
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
I fail to see what is ridiculous (well except that big.LITTLE for desktop sounds ridiculous of course).
The notion that efficient programming is the responsibility of the software developer. But anytime AMD is first with a technology or a different way of executing, they're almost always blamed for it. Mantle/DX12, async computing etc.

The 'ridiculous' is just meant in light of that. For what it's worth, I totally agree that it should really be the responsibility of the software developers to not be lazy all the time.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Since Windows can run on Raspberry you can't assume ARMv8.2 :) Joke aside I hope MS will support more than Qualcomm chips.

Technically Windows on ARM can run on a variety of ARM SoCs as is as long as they have proper EFI and device recovery support. It pretty much is a generic ARM64 implementation of Windows (plus x86 emulation/WoW on top).
Practically, it does require device driver support from the OEM when you want to have anything more than basic I/O. So I assume Microsoft is not the limiting factor here.
 

Markeyse

Member
Feb 9, 2020
112
13
41
I just want them to release a Workstation/Server style CPU with almost 4GHz of clock and about 72 lanes of PCIe! That ain't too much to ask for.
 

jpiniero

Lifer
Oct 1, 2010
14,591
5,214
136
Maybe something like this for desktop?

i9: 8 cores + 2 clusters
i7: 8 cores + 1 cluster
i5: 6 cores + 1 cluster or 6 cores
i3: 4 cores + 1 cluster or 4 cores
Pentium: 2 cores + 2 clusters or 2 cores
Celeron: 2 cores

The 6/4 cores + 2 clusters would be reserved for mobile.
 

scannall

Golden Member
Jan 1, 2012
1,946
1,638
136
You have to wonder if at some point down the road it would be better to use big/LITTLE instead of SMT. No side channel attacks if there is no side channel. I don't think software is there yet for it, but it shouldn't be too hard to add in.
 

Spartak

Senior member
Jul 4, 2015
353
266
136
Not sure if the below slide has been discussed yet? Couldn't find anything about it in the 3-4 days after the article so apologies if it came up later.

But as much as I was counting on Rocket Lake to be a 14nm Willow Cove backport, it seems to be the final iteration of Coffee Lake, not even with AVX512 included.
[Attached slide image]


The 15W / 125W TDP split makes sense if you see it as a powerful CPU for ultramobile where there is no room for a dGPU and reuse it for higher-end/SFF desktops for non-gamers.

I was looking forward to upgrading to Rocket Lake around April next year, but I guess I'll have to wait at least another year for Alder Lake or be swayed by Zen in the meantime.

My estimate right now:
Comet Lake S - 10C - Skylake - 14nm - April 2020
Rocket Lake S - 8C - Skylake - 14nm - Nov 2020 ~ April 2021
Alder Lake S - 8C/8c - Willow Cove - 7nm/10nm? - April ~ Nov 2022
 

exquisitechar

Senior member
Apr 18, 2017
657
871
136
Not sure if the below slide has been discussed yet? Couldn't find anything about it in the 3-4 days after the article so apologies if it came up later.

But as much as I was counting on Rocket Lake to be a 14nm Willow Cove backport, it seems to be the final iteration of Coffee Lake, not even with AVX512 included.
[Attached slide image]


The 15W / 125W TDP split makes sense if you see it as a powerful CPU for ultramobile where there is no room for a dGPU and reuse it for higher-end/SFF desktops for non-gamers.

I was looking forward to upgrading to Rocket Lake around April next year, but I guess I'll have to wait at least another year for Alder Lake or be swayed by Zen in the meantime.

My estimate right now:
Comet Lake S - 10C - Skylake - 14nm - April 2020
Rocket Lake S - 8C - Skylake - 14nm - Nov 2020 ~ April 2021
Alder Lake S - 8C/8c - Willow Cove - 7nm/10nm? - April ~ Nov 2022
That leak was incorrect, it was corrected later. Rocket Lake has AVX-512 support. That being said, being swayed by Zen 3/4 would make sense either way (until 7nm desktop, at least). ;)
Alder is 10nm, and it should be Golden Cove, no?
 

Spartak

Senior member
Jul 4, 2015
353
266
136
That leak was incorrect, it was corrected later. Rocket Lake has AVX-512 support. That being said, being swayed by Zen 3/4 would make sense either way (until 7nm desktop, at least). ;)
Alder is 10nm, and it should be Golden Cove, no?

Can you point to the correction? Curious if the stated VRM is wrong as well.

I have trouble seeing a 10nm desktop product ever coming to fruition, so I think it will be skipped. OTOH, I find it hard to believe they'll skip two generations of micro-architecture for the desktop. But the truth will probably be somewhere in the middle, so the most likely outcome would be 10nm/Willow Cove or 7nm Golden Cove. If it's indeed 10nm Golden Cove, clocks will be brutally low.
 