Discussion Intel current and future Lakes & Rapids thread

Page 828

exquisitechar

Senior member
Apr 18, 2017
722
1,019
136
Only 10% SC uplift, for real? It can't be just 10% IPC uplift... maybe there is a big peak clock speed regression from 6 GHz to something like 5.5 GHz or even lower. MT performance looks better to me if there is no SMT; without SMT they are losing something like 20% performance in some of the better multithreaded benchmarks. It's a mediocre improvement, however, considering it uses Lion Cove + Skymont + 3nm.
Wasn’t there a rumor that it’s using N3B? Maybe that’s the reason for low clock speeds and low ST performance. Hopefully, it’s not the core itself being a disappointment.
 

moinmoin

Diamond Member
Jun 1, 2017
5,242
8,456
136
That L0-L1-L2-L3 change is really strange, I don't quite get it. (...)
P.S. The more I think about it: this whole "L0 naming scheme" is inflicted by Intel's byzantine internal corporate politics and marketing idiots, who (...) went on to obfuscate things with an "L0 level" of caching.
Yeah, that change surely is primarily driven by marketing. Now, thanks to L0, every level is bigger than that of the competition!!1!
 

Joe NYC

Diamond Member
Jun 26, 2021
3,651
5,198
136
Intel traditionally is very good at filling their product range with different core gens that all at least decently serve the respective markets. They will try to do that with coming gens as well.

Intel's biggest competitive advantage against AMD hasn't been performance (which only really matters in high end, everything else is nerfed anyway) but quantity, and I expect Intel to deliver at least that in the future as well.
I think it is still incumbency on the client side and in the enterprise part of the datacenter.

Incumbency with OEMs and IT purchasing types.

A product with equal performance and equal price would split 50:50 without incumbency; with incumbency, it goes 80% Intel / 20% AMD.
 
  • Like
Reactions: Tlh97 and moinmoin

mikk

Diamond Member
May 15, 2012
4,296
2,382
136
Wasn’t there a rumor that it’s using N3B? Maybe that’s the reason for low clock speeds and low ST performance. Hopefully, it’s not the core itself being a disappointment.

It could be, but maybe it's not the only issue they are facing, given that they apparently disabled SMT.
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
30-40% higher ST than Raptor Lake. High Confidence.

Lulz.
There must be something that actually is 30-40%, but MLID is known to mess up the numbers since he obviously has no knowledge of the underlying physics at work. It could be 30-40% better efficiency at ST iso-perf, which he confuses with 30-40% better ST perf.

Edit: There's a summary of all the slides at Computerbase:

 
Last edited:

uzzi38

Platinum Member
Oct 16, 2019
2,746
6,653
146
There must be something that actually is 30-40%, but MLID is known to mess up the numbers since he obviously has no knowledge of the underlying physics at work. It could be 30-40% better efficiency at ST iso-perf, which he confuses with 30-40% better ST perf.

Edit: There's a summary of all the slides at Computerbase:

Ooooooorrrrrrrrrr the 30-40% claim is just totally made up (par for the course really)
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
Cloud guys loved the new server L3 scheme, because workloads were mostly contained in L2 and L3 cross-pollution was no longer a consideration for predictable perf. Said cloud guys moved on to love AMD's Zen 3 chiplets even more, as they had an actually functional L3 instead.
Huh? Cloud loves large caches. AMD's scheme is a bit of a pain for them with fragmentation across CCDs, and something they're likely to work on in future gens. In any case, I don't see a relation here. This is all presumably core-exclusive cache. Shouldn't change anything past the core boundary.

Also, the doubling of L2 with WLC was like moving from 15->16 cycles. Not sure where you're seeing such a large increase in latency.
P.S. The more i think about it: this whole "L0 naming scheme" is inflicted by Intel's byzantine internal corporate politics and marketing idiots
What else would you call it? Seems to make more sense than giving everything else a +1. Especially if Atom doesn't have the same change.
There must be something that makes actually 30-40% but MLID is known to mess the numbers since he obviously has no knowledge of the underlying physics at work
MLID is known to lie and fabricate numbers out of nowhere. Reality is under no obligation to conform to his nonsense. As we see time and time again.
 
  • Like
Reactions: Tlh97 and clemsyn

H433x0n

Golden Member
Mar 15, 2023
1,224
1,606
106
Tick-tock, remember? Back then, same node (theoretically) meant bigger architectural changes. Zen 3 was on the same node.

With just one internal slide you went from extremely bullish on Intel to grumpy bear mode. We have plenty of time for either outcome until 2024, or even 2025.

I’m still bullish on the fabs.

If anything this fiasco has shown Intel’s fabs were probably underrated a bit. There seems to be some synergy we haven’t taken into account between Intel’s fabs and the design teams.
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
Ooooooorrrrrrrrrr the 30-40% claim is just totally made up (par for the course really)

If we take the most favourable ST number, which is Geekbench ST, then 13% better perf easily amounts to 30% better efficiency at iso-perf, so it's not difficult to find something that falls within a 30-40% improvement.

For Geekbench MC the number is roughly 40% better efficiency at iso-perf, but let's say he was talking about ST only.
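For what it's worth, the arithmetic behind "13% better perf → ~30% better efficiency at iso-perf" can be sketched with a toy dynamic-power model; the cubic power-vs-frequency exponent below is an idealized textbook assumption (P ∝ V²f with V roughly tracking f), not a measured figure for any of these chips:

```python
# Toy model: a chip that is `perf_uplift` faster at equal clocks can be
# clocked down to match the baseline's performance, and dynamic power
# then shrinks roughly with frequency cubed (idealized assumption).

def power_saving_at_isoperf(perf_uplift: float, exponent: float = 3.0) -> float:
    """Fractional power saved at iso-performance.
    perf_uplift: e.g. 0.13 for a +13% performance advantage."""
    clock_scale = 1.0 / (1.0 + perf_uplift)   # slow down to match baseline perf
    return 1.0 - clock_scale ** exponent       # fraction of power saved

print(f"{power_saving_at_isoperf(0.13):.0%}")  # ~31% less power at iso-perf
```

Under that assumption a 13% perf lead does indeed translate into roughly 30% lower power at the same performance, which is the kind of number that can get misreported as a "30-40% perf" gain.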
 

Khato

Golden Member
Jul 15, 2001
1,279
361
136
Wasn’t there a rumor that it’s using N3B? Maybe that’s the reason for low clock speeds and low ST performance. Hopefully, it’s not the core itself being a disappointment.
Indeed it is. Would be most amusing if Intel's process woes followed it to TSMC.

No guarantee of accuracy, but the yield numbers this article claims aren't exactly great - https://news.kmib.co.kr/article/view.asp?arcid=0018468798 (I expect some amount of bias toward Samsung given it's a Korean site.)
 

Joe NYC

Diamond Member
Jun 26, 2021
3,651
5,198
136
I’m still bullish on the fabs.

If anything this fiasco has shown Intel’s fabs were probably underrated a bit. There seems to be some synergy we haven’t taken into account between Intel’s fabs and the design teams.
It's probably using TSMC N3.
 

H433x0n

Golden Member
Mar 15, 2023
1,224
1,606
106
It's probably using TSMC N3.
I know, that's what I meant. Based on these preliminary results, it appears that Intel's design teams do not mesh well with it. I'm not saying that TSMC N3B is a bad node, but their designs seem to perform much better (in both frequency and perf/watt) on Intel's processes.

That's why I've said they've got to either tweak the process (unlikely that's possible at this point), change to an internal process (also unlikely), or move to an MTL refresh in a similar manner to what they did with RPL and RPL-R.

Edit:

[Image: tech_insights_node_comparison.png]

Suddenly this chart seems much more plausible. It's just like how the very first iteration of Intel's 10nm actually performed *worse* than 14nm+++.
 
Last edited:

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Huh? Cloud loves large caches. AMD's scheme is a bit of a pain for them with fragmentation across CCDs, and something they're likely to work on in future gens. In any case, I don't see a relation here. This is all presumably core-exclusive cache. Shouldn't change anything past the core boundary.

Also, the doubling of L2 with WLC was like moving from 15->16 cycles. Not sure where you're seeing such a large increase in latency.
First, the cloud question:
Yeah, that is my claim too. Cloud loves chips with large local per-core caches (Intel+ and AMD+), and they love that when a workload misses that L2, it does not light up the full chip, it does not light up a square mile of silicon (Intel-, AMD+). Workloads like VMs can rely on AMD's excellent and local L3, while on Intel each such miss means a journey via the mesh to some L3 slice, and missing that means even more mesh travelling. I mean, we are talking about a chip with 50ns L3; we shouldn't be.
AMD's problem is more when there are inter-core comms, but Intel's latest gen is barely better due to horrible mesh speeds/BW combined with a small L3.

Second, the cache question:

L1 is 5 clocks; the L2 of old used to be 12 cycles. Now it is 16 cycles, so 33% more. The only way to grow it and keep perf and power efficiency is to go the Apple way: they have an ~18-cycle L2 cache of 16MB shared by 4 cores.
So the problem Intel faces is that they can't do that. Their L1 is 48KB and Apple's is 128KB. Their solution (with the level naming suggested by marketing): have an L0 of 48KB, an L1 of 256KB, and an L2 of some 4MB per core, shared in the core complex. The SLC is just a different level again.
See what they did there? They need to insert one more level to reach Apple's levels of caching efficiency, because their L1 is small and if their L2 latency degrades into the 20-25 cycle zone, performance and power efficiency will suffer big time.
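The tradeoff can be illustrated with a toy average-memory-access-time (AMAT) model; all hit rates and the hypothetical 25-cycle big-L2 latency below are made-up illustration numbers, not measured figures for any real core:

```python
# Toy serial-lookup AMAT model: every access pays the latency of each
# level it reaches, and a miss falls through to the next level. The last
# level is assumed to always hit. All numbers are illustrative only.

def amat(levels):
    """levels: list of (latency_cycles, hit_rate), closest level first."""
    total, reach = 0.0, 1.0
    for latency, hit_rate in levels:
        total += reach * latency      # fraction of accesses reaching this level pays its latency
        reach *= (1.0 - hit_rate)     # fraction that misses and falls through
    return total

# A small fast L1 in front of one big-but-slow L2, vs. inserting an extra
# mid-size level (the "L0/L1/L2" scheme discussed above):
two_level   = amat([(5, 0.90), (25, 0.95), (50, 1.0)])
three_level = amat([(5, 0.90), (16, 0.80), (25, 0.97), (50, 1.0)])
print(f"two-level: {two_level:.2f} cycles, three-level: {three_level:.2f} cycles")
```

Under these made-up hit rates the extra level wins (about 7.1 vs 7.8 cycles), which is the point of the argument: an intermediate level hides the latency growth of an ever-larger outer cache.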

What else would you call it? Seems to make more sense than giving everything else a +1. Especially if Atom doesn't have the same change.

What about calling it what it really is: L1, L2, L3 and SLC ?
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
No.
Many ways to skin the cat.

Also, it's really the old Intel way; they had large, shared 12- or 13-cycle L2s before Nehalem.

They did not target 6 GHz back then, and only 2 cores shared the L2 in Penryn, without monstrous BW requirements. Quite a different task vs. current cores.
But you are right, we might call it the "old Intel way" as well.
 

eek2121

Diamond Member
Aug 2, 2005
3,414
5,051
136
This thread has gone off the deep end I see. 🍿

fun ARL is a confirmed clock regression over MTL, and it barely improves perf by Intel's own admission
BTW hilarious how you use my old MTL number and not the new one: View attachment 83119
You are funny.
Wasn’t there a rumor that it’s using N3B? Maybe that’s the reason for low clock speeds and low ST performance. Hopefully, it’s not the core itself being a disappointment.
There are rumors and an apparent agenda from a small number of folks who claim Intel is using TSMC for Arrow Lake (for the compute cores). I have yet to see anything credible. Intel has repeatedly confirmed that 20A/18A are not only on track, but ahead of schedule.

Ooooooorrrrrrrrrr the 30-40% claim is just totally made up (par for the course really)
Of course. Have you guys seen the IPC uplift from Zen 3 -> Zen 4? It turns out that making chips is hard... Luckily, that chart, posted in a rumor section, is being misinterpreted. There are a few things not represented there.
 
  • Like
Reactions: Tlh97

adroc_thurston

Diamond Member
Jul 2, 2023
7,148
9,918
106
Profanity is not permitted in the tech forums
They did not target 6Ghz back then and just 2 cores sharing L2 in Penryn as well.
Eh, Merom was a very mean device for its era!
That big, low latency L2 really killed poor AMD.
Intel has repeatedly confirmed that 20A/18A are not only on track, but ahead of schedule.
They also said 10nm was on track and...
There are a few things not represented there.
Yea it's shit.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,651
5,198
136
There are rumors and an apparent agenda from a small number of folks who claim Intel is using TSMC for Arrow Lake (for the compute cores). I have yet to see anything credible. Intel has repeatedly confirmed that 20A/18A are not only on track, but ahead of schedule.
Intel also said, a long time ago, that "Intel 4" was manufacturing-ready, but there are still no products out that are using it.

How would it differ from Intel saying 20A/18A is ready, but Arrow Lake uses TSMC?
 
  • Like
Reactions: Tlh97 and Exist50

Thunder 57

Diamond Member
Aug 19, 2007
4,035
6,748
136
FWIW, I also believe ARL will be using Intel fabs for the compute tile. It would be a huge blow for them to outsource it to TSMC, and it would probably turn off a lot of investors and those interested in IFS for future products.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
Cloud loves chips with large local per-core caches (Intel+ and AMD+), and they love that when a workload misses that L2, it does not light up the full chip, it does not light up a square mile of silicon (Intel-, AMD+).
Cloud likes pretty large local caches, true, but they also like large, unified, shared caches. They don't care if the whole chip lights up if that gives them performance, and more importantly, consistency. No one's buying a chip based on L3 latency.
Workloads like VMs can rely on AMD's excellent and local L3
VMs are the worst possible example. Once you spill over one CCD, you now have two separate cache pools to deal with. It's a fragmentation problem. If AMD introduces some sort of L4/SLC, it would be a huge boon for the cloud market for that reason alone. Obviously, it's something they're able to deal with, but it's not the ideal either.
L1 is 5 clocks; the L2 of old used to be 12 cycles. Now it is 16 cycles, so 33% more.
L2 for what gen? It's been creeping up, but the doubling from SNC to WLC only added a single cycle, which was my point.
What about calling it what it really is: L1, L2, L3 and SLC ?
So then what about Atom? You say it has just L2, L3, and SLC? I don't think that's any more sensible than leaving the existing designations intact, and just adding one more at the bottom for LNC.

And it's not an Apple-style SLC either. Intel's L3 is just a cache for the ring-attached cores (and historically graphics). Though I agree an Apple-style SLC would be nice to see. Maybe something on the SoC die.

But either way, I don't see why marketing would have any input. Consumers mostly don't even know cache exists, much less the intricacies of different implementations. AMD pulled a smart move by just totaling it all up as "game cache".
 
  • Like
Reactions: Joe NYC

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
Intel also said, a long time ago, that "Intel 4" was manufacturing-ready, but there are still no products out that are using it.

How would it differ from Intel saying 20A/18A is ready, but Arrow Lake uses TSMC?
Yeah, words are cheap. No one's going to believe them until the silicon is in hand, and for good reason. Once burned, twice shy.
 

ondma

Diamond Member
Mar 18, 2018
3,310
1,693
136
Intel also said, a long time ago, that "Intel 4" was manufacturing-ready, but there are still no products out that are using it.

How would it differ from Intel saying 20A/18A is ready, but Arrow Lake uses TSMC?
Yea, but MTL is using Intel 4, right? Should be out in a few months at most.

I still don't know what to make of ARL. There is a lot of possible FUD and Intel hate being thrown around, but with this many rumors, I have to think that ARL, in whatever form it finally appears, will be a disappointment compared to what I was expecting (a 20% or so increase in ST and MT performance with a significant increase in perf/watt). Sad, really; I don't understand the glee some seem to have over Intel's problems. Even if one hates Intel for some reason, it is still in everyone's best interest for them to have a competitive product, if only to keep AMD under pressure to keep prices down and continue increasing performance.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
VMs are the worst possible example. Once you spill over one CCD, you now have two separate cache pools to deal with. It's a fragmentation problem.

Actually, no. The reality is that inter-core communications and the problem of separate cache domains are really way overblown; they matter most in enterprise DBMS and some very niche workloads that have problems with cache line sharing and so on.
Some workloads look like they would benefit a lot from a unified memory subsystem, like, say, JVM workloads? Well, they also don't, since quite a bit of their performance is dominated by garbage collection, which is in turn done in chunks that are helped immensely by AMD's local L3. They perform OK on Intel too, due to 2MB of L2 trumping 1MB of L2, but the L3 is not a factor on Intel and is a factor on AMD.
Looking at 32C Zen 4 vs 32C is very telling about what is going on.
So then what about Atom? You say it has just L2, L3, and SLC?

I don't get why Atom is a big deal at all. It has its own caching in its own tile: L1 and a shared L2, backed by said SLC.


And it's not an Apple-style SLC either. Intel's L3 is just a cache for the ring-attached cores (and historically graphics). Though I agree an Apple-style SLC would be nice to see. Maybe something on the SoC die.

They will move to such a scheme, in my opinion. Their uncore is their weak point everywhere. AMD is destroying them with their excellent L3, which is synchronous with the cores, local to the CCD, and rather large even before the bolt-on cache.
On desktop/mobile, their async uncore, running at low clocks and with not that good performance, is no good and a terrible pain when trying to connect multiple chiplets. Once you disconnect graphics and IO and have pure compute, there are opportunities to move from the current async ring to something tighter for the performance cores.
On server, the mesh is a disaster; everything about it is bad: energy, bandwidth, latency. The less the cores use it, the more Intel wins.

Cinebench and marketing cores don't need to share a low-latency L3 with the big performance cores, nor do they need that much bandwidth or that low latency.
 
  • Like
Reactions: Tlh97 and moinmoin