Discussion Intel current and future Lakes & Rapids thread

Page 828

exquisitechar

Senior member
Apr 18, 2017
722
1,019
136
Only 10% SC uplift, for real? It can't be just 10% IPC uplift... maybe there is a big peak clock speed regression from 6 GHz to something like 5.5 GHz or even lower. MT performance looks better to me if there is no SMT; without SMT they are losing something like 20% performance in some of the better multithreaded benchmarks. It's a mediocre improvement, however, considering it uses Lion Cove + Skymont + 3nm.
Wasn’t there a rumor that it’s using N3B? Maybe that’s the reason for low clock speeds and low ST performance. Hopefully, it’s not the core itself being a disappointment.
 

moinmoin

Diamond Member
Jun 1, 2017
5,242
8,456
136
That L0-L1-L2-L3 change is really strange, I don't quite get it. (...)
P.S. The more I think about it: this whole "L0 naming scheme" is inflicted by Intel's byzantine internal corporate politics and marketing idiots, who (...) went on to obfuscate things with an "L0 level" of caching.
Yeah, that change surely is primarily driven by marketing. Now, thanks to L0, every level is bigger than that of the competition!!1!
 

Joe NYC

Diamond Member
Jun 26, 2021
3,651
5,198
136
Intel traditionally is very good at filling their product range with different core gens that all at least decently serve the respective markets. They will try to do that with coming gens as well.

Intel's biggest competitive advantage against AMD hasn't been performance (which only really matters in high end, everything else is nerfed anyway) but quantity, and I expect Intel to deliver at least that in the future as well.
I think it is still incumbency on the client side and in the enterprise part of the datacenter.

Incumbency with OEMs and IT purchasing types.

A product with equal performance and equal price would split 50:50 without incumbency; with incumbency, it goes 80% Intel / 20% AMD.
 
  • Like
Reactions: Tlh97 and moinmoin

mikk

Diamond Member
May 15, 2012
4,296
2,382
136
Wasn’t there a rumor that it’s using N3B? Maybe that’s the reason for low clock speeds and low ST performance. Hopefully, it’s not the core itself being a disappointment.

It could be, but maybe it's not the only issue they are facing, given that they apparently disabled SMT.
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
30-40% higher ST than Raptor Lake. High Confidence.

Lulz.
There must be something that actually is 30-40%, but MLID is known to mess up the numbers since he obviously has no knowledge of the underlying physics at work. It could be 30-40% better efficiency at ST iso-perf, which he confuses with 30-40% better ST perf.

Edit: There's a summary of all the slides at Computerbase:

 
Last edited:

uzzi38

Platinum Member
Oct 16, 2019
2,746
6,653
146
There must be something that actually is 30-40%, but MLID is known to mess up the numbers since he obviously has no knowledge of the underlying physics at work. It could be 30-40% better efficiency at ST iso-perf, which he confuses with 30-40% better ST perf.

Edit: There's a summary of all the slides at Computerbase:

Ooooooorrrrrrrrrr the 30-40% claim is just totally made up (par for the course really)
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
Cloud guys loved the new server L3 scheme, because workloads were mostly contained in L2 and L3 cross-pollution was no longer a consideration for predictable perf. Said cloud guys moved on to love AMD's Zen 3 chiplets even more, as they had an actually functional L3 instead.
Huh? Cloud loves large caches. AMD's scheme is a bit of a pain for them with fragmentation across CCDs, and something they're likely to work on in future gens. In any case, I don't see a relation here. This is all presumably core-exclusive cache. Shouldn't change anything past the core boundary.

Also, the doubling of L2 with WLC was like moving from 15->16 cycles. Not sure where you're seeing such a large increase in latency.
P.S. The more i think about it: this whole "L0 naming scheme" is inflicted by Intel's byzantine internal corporate politics and marketing idiots
What else would you call it? Seems to make more sense than giving everything else a +1. Especially if Atom doesn't have the same change.
There must be something that makes actually 30-40% but MLID is known to mess the numbers since he obviously has no knowledge of the underlying physics at work
MLID is known to lie and fabricate numbers out of nowhere. Reality is under no obligation to conform to his nonsense. As we see time and time again.
 
  • Like
Reactions: Tlh97 and clemsyn

H433x0n

Golden Member
Mar 15, 2023
1,224
1,606
106
Tick-tock, remember? Back then, same node (theoretically) meant bigger architectural changes. Zen 3 was on the same node.

With just one internal slide you went from extremely bullish on Intel to grumpy bear mode. We have plenty of time for either outcome until 2024, or even 2025.

I’m still bullish on the fabs.

If anything this fiasco has shown Intel’s fabs were probably underrated a bit. There seems to be some synergy we haven’t taken into account between Intel’s fabs and the design teams.
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
Ooooooorrrrrrrrrr the 30-40% claim is just totally made up (par for the course really)

If we take the most favourable ST number, which is Geekbench ST, then 13% better perf easily amounts to 30% better efficiency at iso-perf, so it's not difficult to find something that falls within a 30-40% improvement.

For Geekbench MC the number is roughly 40% better efficiency at iso-perf, but let's say he was talking about ST only.
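For what it's worth, the arithmetic behind "13% better perf → ~30% better efficiency at iso-perf" can be sketched with a toy dynamic-power model; the cubic power-vs-frequency exponent below is an idealized textbook assumption (P ∝ V²f with V roughly tracking f), not a measured figure for any of these chips:

```python
# Toy model: a chip that is `perf_uplift` faster at equal clocks can be
# clocked down to match the baseline's performance, and dynamic power
# then shrinks roughly with frequency cubed (idealized assumption).

def power_saving_at_isoperf(perf_uplift: float, exponent: float = 3.0) -> float:
    """Fractional power saved at iso-performance.
    perf_uplift: e.g. 0.13 for a +13% performance advantage."""
    clock_scale = 1.0 / (1.0 + perf_uplift)   # slow down to match baseline perf
    return 1.0 - clock_scale ** exponent       # fraction of power saved

print(f"{power_saving_at_isoperf(0.13):.0%}")  # ~31% less power at iso-perf
```

Under that assumption a 13% perf lead does indeed translate into roughly 30% lower power at the same performance, which is the kind of number that can get misreported as a "30-40% perf" gain.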
 

Khato

Golden Member
Jul 15, 2001
1,279
361
136
Wasn’t there a rumor that it’s using N3B? Maybe that’s the reason for low clock speeds and low ST performance. Hopefully, it’s not the core itself being a disappointment.
Indeed it is. Would be most amusing if Intel's process woes followed it to TSMC.

No guarantee of accuracy, but the yield numbers this article claims aren't exactly great - https://news.kmib.co.kr/article/view.asp?arcid=0018468798 (I expect some amount of bias toward Samsung given it's a Korean site.)
 

Joe NYC

Diamond Member
Jun 26, 2021
3,651
5,198
136
I’m still bullish on the fabs.

If anything this fiasco has shown Intel’s fabs were probably underrated a bit. There seems to be some synergy we haven’t taken into account between Intel’s fabs and the design teams.
It's probably using TSMC N3.
 

H433x0n

Golden Member
Mar 15, 2023
1,224
1,606
106
It's probably using TSMC N3.
I know, that's what I meant. Based on these preliminary results, it appears that Intel's design teams do not mesh well with it. I'm not saying that TSMC N3B is a bad node, but their designs seem to perform much better (in both frequency and perf/watt) on Intel's processes.

That's why I've said they've got to either tweak the process (unlikely that's possible at this point), change to an internal process (also unlikely), or move to an MTL refresh in a similar manner to what they did with RPL and RPL-R.

Edit:

[Image: tech_insights_node_comparison.png]

Suddenly this chart seems much more plausible. It's just like how the very first iteration of Intel's 10nm actually performed *worse* than 14nm+++.
 
Last edited:

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Huh? Cloud loves large caches. AMD's scheme is a bit of a pain for them with fragmentation across CCDs, and something they're likely to work on in future gens. In any case, I don't see a relation here. This is all presumably core-exclusive cache. Shouldn't change anything past the core boundary.

Also, the doubling of L2 with WLC was like moving from 15->16 cycles. Not sure where you're seeing such a large increase in latency.
First, the cloud question:
Yeah, that is my claim too. Cloud loves chips with large local per-core caches (Intel+ and AMD+), and they love that when a workload misses that L2, it does not light up the full chip, it does not light up a square mile of silicon (Intel-, AMD+). Workloads like VMs can rely on AMD's excellent and local L3, while on Intel each such miss means a journey via the mesh to some L3 slice, and missing that means even more mesh travelling. I mean, we are talking about a chip with 50ns L3; we shouldn't be.
AMD's problem is more when there are inter-core comms, but Intel's latest gen is barely better due to horrible mesh speeds/BW combined with a small L3.

Second, the cache question:

L1 is 5 clocks; the L2 of old used to be 12 cycles. Now it is 16 cycles, so 33% more. The only way to grow it and keep perf and power efficiency is to go the Apple way: they have an ~18-cycle L2 cache of 16MB shared by 4 cores.
So the problem Intel faces is that they can't do that. Their L1 is 48KB and Apple's is 128KB. Their solution (with the level naming suggested by marketing): have an L0 of 48KB, an L1 of 256KB, and an L2 of some 4MB per core, shared in the core complex. The SLC is just a different level again.
See what they did there? They need to insert one more level to reach Apple's levels of caching efficiency, because their L1 is small and if their L2 latency degrades into the 20-25 cycle zone, performance and power efficiency will suffer big time.
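The tradeoff can be illustrated with a toy average-memory-access-time (AMAT) model; all hit rates and the hypothetical 25-cycle big-L2 latency below are made-up illustration numbers, not measured figures for any real core:

```python
# Toy serial-lookup AMAT model: every access pays the latency of each
# level it reaches, and a miss falls through to the next level. The last
# level is assumed to always hit. All numbers are illustrative only.

def amat(levels):
    """levels: list of (latency_cycles, hit_rate), closest level first."""
    total, reach = 0.0, 1.0
    for latency, hit_rate in levels:
        total += reach * latency      # fraction of accesses reaching this level pays its latency
        reach *= (1.0 - hit_rate)     # fraction that misses and falls through
    return total

# A small fast L1 in front of one big-but-slow L2, vs. inserting an extra
# mid-size level (the "L0/L1/L2" scheme discussed above):
two_level   = amat([(5, 0.90), (25, 0.95), (50, 1.0)])
three_level = amat([(5, 0.90), (16, 0.80), (25, 0.97), (50, 1.0)])
print(f"two-level: {two_level:.2f} cycles, three-level: {three_level:.2f} cycles")
```

Under these made-up hit rates the extra level wins (about 7.1 vs 7.8 cycles), which is the point of the argument: an intermediate level hides the latency growth of an ever-larger outer cache.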

What else would you call it? Seems to make more sense than giving everything else a +1. Especially if Atom doesn't have the same change.

What about calling it what it really is: L1, L2, L3 and SLC ?
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
No.
Many ways to skin the cat.

Also, it's really the old Intel way; they had large, shared 12- or 13-cycle L2s before Nehalem.

They did not target 6 GHz back then, and only 2 cores shared the L2 in Penryn, without monstrous BW requirements. Quite a different task vs. current cores.
But you are right, we might call it the "old Intel way" as well.
 

eek2121

Diamond Member
Aug 2, 2005
3,414
5,051
136
This thread has gone off the deep end I see. 🍿

fun ARL is a confirmed clock regression over MTL, and it barely improves perf by Intel's own admission
BTW hilarious how you use my old MTL number and not the new one: View attachment 83119
You are funny.
Wasn’t there a rumor that it’s using N3B? Maybe that’s the reason for low clock speeds and low ST performance. Hopefully, it’s not the core itself being a disappointment.
There are rumors and an apparent agenda from a small number of folks who claim Intel is using TSMC for Arrow Lake (for the compute cores). I have yet to see anything credible. Intel has repeatedly confirmed that 20A/18A are not only on track, but ahead of schedule.

Ooooooorrrrrrrrrr the 30-40% claim is just totally made up (par for the course really)
Of course. Have you guys seen the IPC uplift from Zen 3 -> Zen 4? It turns out that making chips is hard... Luckily, that chart, posted in a rumor section, is being misinterpreted. There are a few things not represented there.
 
  • Like
Reactions: Tlh97

adroc_thurston

Diamond Member
Jul 2, 2023
7,148
9,918
106
Profanity is not permitted in the tech forums
They did not target 6Ghz back then and just 2 cores sharing L2 in Penryn as well.
Eh, Merom was a very mean device for its era!
That big, low latency L2 really killed poor AMD.
Intel has repeatedly confirmed that 20A/18A are not only on track, but ahead of schedule.
They also said 10nm was on track and...
There are a few things not represented there.
Yea it's shit.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,651
5,198
136
There are rumors and an apparent agenda from a small number of folks who claim Intel is using TSMC for Arrow Lake (for the compute cores). I have yet to see anything credible. Intel has repeatedly confirmed that 20A/18A are not only on track, but ahead of schedule.
Intel also said, a long time ago, that "Intel 4" was manufacturing-ready, but there are still no products out that are using it.

How would it differ from Intel saying 20A/18A is ready, but Arrow Lake uses TSMC?
 
  • Like
Reactions: Tlh97 and Exist50

Thunder 57

Diamond Member
Aug 19, 2007
4,035
6,748
136
FWIW, I also believe ARL will be using Intel fabs for the compute tile. It would be a huge blow for them to outsource it to TSMC, and it would probably turn off a lot of investors and those interested in IFS for future products.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
Cloud loves chips with large local per-core caches (Intel+ and AMD+), and they love that when a workload misses that L2, it does not light up the full chip, it does not light up a square mile of silicon (Intel-, AMD+).
Cloud likes pretty large local caches, true, but they also like large, unified, shared caches. They don't care if the whole chip lights up if that gives them performance, and more importantly, consistency. No one's buying a chip based on L3 latency.
Workloads like VMs can rely on AMD's excellent and local L3
VMs are the worst possible example. Once you spill over one CCD, you now have two separate cache pools to deal with. It's a fragmentation problem. If AMD introduces some sort of L4/SLC, it would be a huge boon for the cloud market for that reason alone. Obviously, it's something they're able to deal with, but it's not the ideal either.
L1 is 5 clocks; the L2 of old used to be 12 cycles. Now it is 16 cycles, so 33% more.
L2 for what gen? It's been creeping up, but the doubling from SNC to WLC only added a single cycle, which was my point.
What about calling it what it really is: L1, L2, L3 and SLC ?
So then what about Atom? You say it has just L2, L3, and SLC? I don't think that's any more sensible than leaving the existing designations intact, and just adding one more at the bottom for LNC.

And it's not an Apple-style SLC either. Intel's L3 is just a cache for the ring-attached cores (and historically graphics). Though I agree an Apple-style SLC would be nice to see. Maybe something on the SoC die.

But either way, I don't see why marketing would have any input. Consumers mostly don't even know cache exists, much less the intricacies of different implementations. AMD pulled a smart move by just totaling it all up as "game cache".
 
  • Like
Reactions: Joe NYC

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
Intel also said, a long time ago, that "Intel 4" was manufacturing-ready, but there are still no products out that are using it.

How would it differ from Intel saying 20A/18A is ready, but Arrow Lake uses TSMC?
Yeah, words are cheap. No one's going to believe them until the silicon is in hand, and for good reason. Once burned, twice shy.
 

ondma

Diamond Member
Mar 18, 2018
3,310
1,693
136
Intel also said, a long time ago, that "Intel 4" was manufacturing-ready, but there are still no products out that are using it.

How would it differ from Intel saying 20A/18A is ready, but Arrow Lake uses TSMC?
Yea, but MTL is using Intel 4, right? Should be out in a few months at most.

I still don't know what to make of ARL. There is a lot of possible FUD and Intel hate being thrown around, but with this many rumors, I have to think that ARL, in whatever form it finally appears, will be a disappointment compared to what I was expecting (a 20% or so increase in ST and MT performance with a significant increase in perf/watt). Sad, really; I don't understand the glee some seem to have over Intel's problems. Even if one hates Intel for some reason, it is still in everyone's best interest for them to have a competitive product, if only to keep AMD under pressure to keep prices down and continue increasing performance.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
VMs are the worst possible example. Once you spill over one CCD, you now have two separate cache pools to deal with. It's a fragmentation problem.

Actually, no. The reality is that inter-core communications and the problem of separate cache domains are really way overblown; they matter most in enterprise DBMS and some very niche workloads that have problems with cache line sharing and so on.
Some workloads look like they would benefit a lot from a unified memory subsystem, like, say, JVM workloads? Well, they also don't, since quite a bit of their performance is dominated by garbage collection, which is in turn done in chunks that are helped immensely by AMD's local L3. They perform OK on Intel too, due to 2MB of L2 trumping 1MB of L2, but the L3 is not a factor on Intel and is a factor on AMD.
Looking at 32C Zen 4 vs 32C is very telling about what is going on.
So then what about Atom? You say it has just L2, L3, and SLC?

I don't get why Atom is a big deal at all. It has its own caching in its own tile: L1 and a shared L2, backed by said SLC.


And it's not an Apple-style SLC either. Intel's L3 is just a cache for the ring-attached cores (and historically graphics). Though I agree an Apple-style SLC would be nice to see. Maybe something on the SoC die.

They will move to such a scheme, in my opinion. Their uncore is their weak point everywhere. AMD is destroying them with their excellent L3, which is synchronous with the cores, local to the CCD, and rather large even before the bolt-on cache.
On desktop/mobile, their async uncore, running at low clocks and with not that good performance, is no good and a terrible pain when trying to connect multiple chiplets. Once you disconnect graphics and IO and have pure compute, there are opportunities to move from the current async ring to something tighter for the performance cores.
On server, the mesh is a disaster; everything about it is bad: energy, bandwidth, latency. The less the cores use it, the more Intel wins.

Cinebench and marketing cores don't need to share a low-latency L3 with the big performance cores, nor do they need that much bandwidth or that low latency.
 
  • Like
Reactions: Tlh97 and moinmoin