Discussion Intel current and future Lakes & Rapids thread


Abwx

Lifer
Apr 2, 2011
10,940
3,445
136
I am comparing my own Skylake with a fixed 5.1GHz clock.

Scoring so badly in Navigation and Text Compression means any workload that hits memory is badly impacted.

Btw I am not claiming this is due to pre-release status or anything like that. Early DDR5 could easily be this bad. Even 4800CL40 or so is disaster levels of memory latency.

Including those two (and two other subscores that do not perform well), ADL is still about 17% faster than your CML clock for clock, and close to 20% faster than the original SKL, since IPC was slightly improved by CML.
Aren’t the non-K variants locked to 65W?

I also find the L2 cache setup to be curious…

EDIT: could this chip possibly have 4 big cores and 8 small???

And it still performs better at 4.2GHz than a 5.1GHz 10C CML..?
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
Including those two (and two other subscores that do not perform well), ADL is still about 17% faster than your CML clock for clock, and close to 20% faster than the original SKL, since IPC was slightly improved by CML.


And it still performs better at 4.2GHz than a 5.1GHz 10C CML..?

The clocks of the chip were 4.8GHz, not 4.2GHz, and yes, absolutely possible. A quad-core 1195G7 scores around 5500. Add 15%, and then you have the small cores…

EDIT: I think people think this chip has 8 big and no small, but the cache implies otherwise.
 

Abwx

Lifer
Apr 2, 2011
10,940
3,445
136
The clocks of the chip were 4.8GHz, not 4.2GHz, and yes, absolutely possible. A quad-core 1195G7 scores around 5500. Add 15%, and then you have the small cores…

EDIT: I think people think this chip has 8 big and no small, but the cache implies otherwise.

The data and instruction cache figures are accurate for the big core (L2 should be 1.25MB/core); the small core has a 64kB instruction cache.


 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136

The PL1 for non-K variants is 65W. I was just curious if it could be changed, since I have only ever owned K variants…not that I ever buy Intel these days.
The data and instruction cache figures are accurate for the big core (L2 should be 1.25MB/core); the small core has a 64kB instruction cache.



EDIT: The L2 count rather, sorry! tabbing between things and I got mixed up. It says 2x.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
The PL1 for non-K variants is 65W. I was just curious if it could be changed, since I have only ever owned K variants…not that I ever buy Intel these days.


EDIT: The L2 count rather, sorry! tabbing between things and I got mixed up. It says 2x.

You are right. Each cluster of 4 cores gets 1.25MB of cache.
 

Mopetar

Diamond Member
Jan 31, 2011
7,835
5,981
136
I wouldn't read anything into slides labeled "for illustrative purposes only" because they contain no hard data.

However on the face of it something has gone horribly wrong at Intel if the energy efficient Atom core isn't energy efficient and actually worse than the performance core.

Instead, I think the way to "read" the charts is that the efficiency cores don't scale past a certain amount of power. Take Apple's M1 SoC which does quite well, but you can't feed it the same 100W+ that x86 desktop CPUs are designed to handle.

Once you push any CPU core to a certain point it falls off hard and an efficiency core going to 4 GHz is a bit surprising. As others have pointed out the base clock will be much lower and so will the sweet spot of where it gets peak performance per watt.
 

Doug S

Platinum Member
Feb 8, 2020
2,254
3,485
136
Instead, I think the way to "read" the charts is that the efficiency cores don't scale past a certain amount of power. Take Apple's M1 SoC which does quite well, but you can't feed it the same 100W+ that x86 desktop CPUs are designed to handle.

Once you push any CPU core to a certain point it falls off hard and an efficiency core going to 4 GHz is a bit surprising. As others have pointed out the base clock will be much lower and so will the sweet spot of where it gets peak performance per watt.

You can call anything an "efficiency" core. It just needs to be more efficient than what you call your "performance" core.

If you strip out SMT, strip out AVX512, reduce the size of caches, buffers, rename registers etc., use narrower decode/execution/load/store, and redo the timing so your FO4 delays limit you to a lower max clock rate, then you have a core that is smaller, has a lower IPC, and is more energy efficient than your big core.

How much more efficient depends on how cut down it is. If your goal is maximum efficiency then it won't contribute much to multithreaded loads, but when your big cores are shut down you'll have huge power savings. If you want it to be a big contributor to multithreaded loads (and indeed it seems like that was one of Intel's design goals) you have to aim for less efficiency to get there. It's a compromise: you can't have both massive efficiency and a major contribution to multithreaded loads.
 

coercitiv

Diamond Member
Jan 24, 2014
6,187
11,859
136
However on the face of it something has gone horribly wrong at Intel if the energy efficient Atom core isn't energy efficient and actually worse than the performance core.

Instead, I think the way to "read" the charts is that the efficiency cores don't scale past a certain amount of power. Take Apple's M1 SoC which does quite well, but you can't feed it the same 100W+ that x86 desktop CPUs are designed to handle.
I'm pretty sure E-cores will be energy efficient in mobile, but on the desktop their aim will be to maximize area efficiency instead.

Even if the P-core is twice as power efficient as the E-core at high clocks, the 4x E-core complex is still twice as area efficient.
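
To make that trade-off concrete, here is a rough sketch of the arithmetic. Every number in it is an illustrative placeholder picked to match the claim (equal area for one P-core and one 4x E-core cluster, the cluster delivering 2x the multithreaded throughput at 4x the power); none of it is measured data, and the `Block` type is just a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A chunk of silicon described in relative (unitless) terms."""
    name: str
    perf: float   # relative multithreaded throughput
    area: float   # relative die area
    power: float  # relative power draw at high clocks

    @property
    def perf_per_area(self) -> float:
        return self.perf / self.area

    @property
    def perf_per_watt(self) -> float:
        return self.perf / self.power


# Placeholder assumptions, NOT measurements:
# the cluster fits in the same area as one P-core, gives 2x the throughput,
# and burns 4x the power when pushed to high clocks.
p_core = Block("1x P-core", perf=1.0, area=1.0, power=1.0)
e_cluster = Block("4x E-core cluster", perf=2.0, area=1.0, power=4.0)

for block in (p_core, e_cluster):
    print(f"{block.name}: perf/area={block.perf_per_area:.2f}, "
          f"perf/W={block.perf_per_watt:.2f}")
```

With these placeholders the P-core wins on perf/W by 2x while the cluster wins on perf/area by 2x, which is the point: on desktop, where area is the tighter budget, the cluster can still be the better use of silicon.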
 

uzzi38

Platinum Member
Oct 16, 2019
2,624
5,894
146
This is much closer to where I'd expect final silicon to perform. Could still be a touch slow, but any remaining difference might be worth just chalking up to poor DDR5 or something.

Here's my 5950X for comparison: https://browser.geekbench.com/v5/cpu/6576128

Idk what PBO setting I ran the test at, so ignore the MT score. Looking at the ST scores, we have:

5950X vs 12900K
Crypto: 4083 vs 4990
INT: 1459 vs 1614
FP: 1872 vs 1980
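
For anyone who wants the gaps as percentages, a quick sketch using only the three subscore pairs listed above (the helper name `uplift_percent` is made up for this example):

```python
# Single-thread Geekbench 5 subscores from the post: (5950X, 12900K).
scores = {
    "Crypto": (4083, 4990),
    "INT": (1459, 1614),
    "FP": (1872, 1980),
}


def uplift_percent(baseline: float, candidate: float) -> float:
    """How much faster `candidate` is than `baseline`, in percent."""
    return (candidate / baseline - 1.0) * 100.0


uplifts = {name: uplift_percent(a, b) for name, (a, b) in scores.items()}

for name, pct in uplifts.items():
    print(f"{name}: 12900K ahead by {pct:.1f}%")
```

This is a single pair of runs with unknown memory settings on both sides, so the percentages are only as good as those two results.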
 

Timorous

Golden Member
Oct 27, 2008
1,608
2,753
136
This is much closer to where I'd expect final silicon to perform. Could still be a touch slow, but any remaining difference might be worth just chalking up to poor DDR5 or something.

Here's my 5950X for comparison: https://browser.geekbench.com/v5/cpu/6576128

Idk what PBO setting I ran the test at, so ignore the MT score. Looking at the ST scores, we have:

5950X vs 12900K
Crypto: 4083 vs 4990
INT: 1459 vs 1614
FP: 1872 vs 1980

That does not seem great when the 11900K can get:
Crypto: ~6,000
INT: ~1,600
FP: ~1,900
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Memory tuning matters. Text Compression / Navigation scores, while improved vs the previous score, are still horribad.

It looks like GC (Golden Cove) will score ~2100 in GB5 ST when paired with DDR5-6400 or faster.
 

Timorous

Golden Member
Oct 27, 2008
1,608
2,753
136
Comparing highly tuned scores with obviously not tuned ones is gonna give you a bad time.

There are higher 11900K results. Those ones had 3600 RAM with unknown timings. Besides, 1850-1900 total matches the pre-release scores for the 11900K that were leaking.

Geekbench 5 stock result from a review: 1870

March runs - anything sub 1800 is using sub-3200 RAM.

I expect the AL (Alder Lake) results are using sub-optimal RAM, but those INT and FP scores are not great vs RL (Rocket Lake).
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
On a properly behaving system, the MT/ST score ratio is somewhat lower than the core count.

So if you see 7x gains, that's a CPU with 8 cores. With Hyperthreading, it's slightly over 8x.

Now for the Intel chips, that seems to hold true for earlier Skylake parts, like up to Kaby Lake. The ratio falls a bit after that. Probably because it's overextending 14nm?

Based on that, the 17K score for that chip means it only has the 8 Golden Cove cores enabled and Hyperthreading is on.
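
A minimal sketch of that rule of thumb, for anyone who wants to plug in their own numbers. The 17K MT figure is the one mentioned above; the ST score of 1900 is a placeholder assumption for illustration, not a value from the leak, and both function names are made up here.

```python
def mt_st_ratio(mt_score: float, st_score: float) -> float:
    """Multi-thread score divided by single-thread score."""
    return mt_score / st_score


def scaling_per_core(mt_score: float, st_score: float, cores: int) -> float:
    """MT/ST ratio normalised by a candidate core count.

    Reading it with the rule of thumb from the post: a bit under 1.0 fits
    that many cores without Hyperthreading, slightly over 1.0 fits
    Hyperthreading being on, and far below 1.0 means the candidate core
    count is probably too high.
    """
    return mt_st_ratio(mt_score, st_score) / cores


MT_SCORE = 17000         # the "17K" multi-thread score from the post
ASSUMED_ST_SCORE = 1900  # placeholder assumption, not from the leak

ratio = mt_st_ratio(MT_SCORE, ASSUMED_ST_SCORE)
print(f"MT/ST ratio: {ratio:.2f}")
for cores in (8, 16):
    per_core = scaling_per_core(MT_SCORE, ASSUMED_ST_SCORE, cores)
    print(f"{cores} cores -> {per_core:.2f} per core")
```

Under that assumed ST score the ratio lands a little above 8, which is what you would expect from 8 cores with Hyperthreading rather than from 8 big plus 8 small cores all contributing.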

Here is my max-tuned 5950X vs Alder Lake.
Not sure this will be enough against Zen 3D considering how well the Rocket Lake 11900K scores in this synthetic benchmark compared to "real-world performance".. :grimacing:

It's not just due to AVX. The Intel chips do better relative to AMD on Geekbench.

I speculate maybe it has to do with optimizations they have done back in the days when they wanted to compete in the portable space. I noticed that with the Bay Trail graphics doing much better on the mobile oriented ones such as GFXBench relative to other PC oriented benches.

Also, the Amber Lake chip does pretty damn well on Geekbench. In the real world, it's actually quite horrible. The U chips are something like 50%+ faster in single thread, but in Geekbench ST the differences are about 10%. Obviously for certain scenarios Geekbench is second only to Dhrystone in how bad it is.
 

naukkis

Senior member
Jun 5, 2002
705
576
136
Now for the Intel chips, that seems to hold true for earlier Skylake parts, like up to Kabylake. The ratio falls a bit after that. Probably because it's overextending 14nm?

It's because for ST they are boosting to way higher clocks than can be sustained in MT. Intel's slides are all about that situation. The bigger they make the performance cores, the less MT scalability they get from a given power envelope. With Golden Cove they need those efficient cores to be competitive in MT workloads.
 

Gideon

Golden Member
Nov 27, 2007
1,625
3,650
136
I was thinking what Intel could do about their Alder Lake AVX-512 situation. They need to support it in the future but there is a nasty tradeoff:

  • On the one hand they absolutely need to support all the big-core instructions on the small cores for hybrid to work well.
  • On the other hand they really don't want to bloat up every single small core with full AVX-512 units, AI instructions, etc. Especially as Raptor Lake will have 16 of those.

The solution is quite obvious: CMT (just as ARM is doing)

If a workload maxes out the AVX-512 units on every core, it is best run on the big cores anyway. If those are full, then having fewer full-fat FPUs on the small cores would help with thermals and hotspots.

CMT is sort-of a win-win:
  • Some tasks using AVX-512 erratically or lightly (speeding up memory operations, etc) do not need to be moved to big cores and do fine with a shared FPU.
  • FP heavy tasks will still do decently on little cores (as there are 16 small cores and 8 FPUs). Overall there would be a similar uplift as 8C Comet Lake -> 8C Rocket Lake.
  • Power and heat are much more manageable due to having fewer FPUs. These can also be placed a bit more sparsely than they otherwise could.
I'm not sure if Intel will pull it off by Raptor Lake, but I'm quite confident that's the way they're going. And they absolutely need AVX-512, if for no other reason than the massively wasted die space on the big cores.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
The best way to solve AVX-512 is to drop 512-bit support and introduce AVX-256 by plundering the AVX-512 instruction set for useful instructions while keeping the 256-bit vector width.

They have made a halfhearted effort in Gracemont already, by taking marketing department favourites like AVX512-VNNI and turning them into AVX256-VNNI so they can make a slide about how they beat the "competition" in benchmarks that are useless for 99.99% of people.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
FYI regarding the memory: the 12900K result is with DDR5, but with standard JEDEC timings (CL40).

This chip is going to be a beast. Intel just has to not mess things up by trying to up the price. They need to sell the chip for around the same price as the 11900K. If they do that, they will have a winner on their hands.
 