Discussion Intel current and future Lakes & Rapids thread


mikk

Diamond Member
May 15, 2012
4,296
2,382
136
FWIW, I also believe ARL will be using Intel fabs for the compute tile. It would be a huge blow for them to outsource it to TSMC, and it would probably turn off a lot of investors and those interested in IFS for future products.


Best hope would be partially 20A and partially TSMC, but I'm not sure. Remember, Raichu said last month that ARL-U will be based on a refreshed MTL. Why would they do this if 20A/N3 works so well? It's like they are trying to minimize the need for 20A/N3 by cutting the originally planned core counts, or dropping some of the compute tiles entirely. Or maybe something design-related failed somehow.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
Actually not. The reality is that inter-core communication and the problem of separate cache domains are really way overblown; they matter most in enterprise DBMS and some very niche workloads that have problems with cache line sharing and so on.
It certainly does matter. It drastically changes how you bin-pack VMs, and performance considerations for >8c ones. Again, it's a very manageable problem, but definitely one cloud providers would want to avoid. Just the same as how it would be nice if Intel's mesh were lower latency. There's no objectively superior solution here.
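To make the bin-packing point concrete, here's a toy first-fit-decreasing sketch. This is purely illustrative, not any cloud provider's actual scheduler; the 8-core domain size and the greedy policy are just assumptions for the example.

```python
# Toy sketch: bin-pack VM vCPU counts onto cache domains (e.g. 8-core CCXs).
# A VM that fits in one domain shares a single L3; a larger VM must be
# split across domains and pay cross-domain latency.

def pack_vms(vm_sizes, domain_cores=8, num_domains=4):
    free = [domain_cores] * num_domains
    placement = {}  # vm index -> list of (domain, cores) slices
    # Place the biggest VMs first to reduce fragmentation.
    for vm in sorted(range(len(vm_sizes)), key=lambda v: -vm_sizes[v]):
        need = vm_sizes[vm]
        # Prefer a single domain so the VM stays inside one cache domain.
        whole = next((d for d in range(num_domains) if free[d] >= need), None)
        if whole is not None:
            free[whole] -= need
            placement[vm] = [(whole, need)]
            continue
        # Otherwise split across the emptiest domains (the >8c case).
        slices = []
        for d in sorted(range(num_domains), key=lambda d: -free[d]):
            if need == 0:
                break
            take = min(free[d], need)
            if take:
                free[d] -= take
                need -= take
                slices.append((d, take))
        if need:
            return None  # out of cores entirely
        placement[vm] = slices
    return placement

# A 12c VM must span two 8c domains; the 8c and 4c VMs each stay in one.
plan = pack_vms([12, 4, 4, 8])
print(plan)
```

The point is simply that any VM larger than one domain has to be split and take the cross-domain latency hit, which is exactly what schedulers try to avoid.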
I don't get why Atom is a big deal at all. It has its own caching in its own tile: L1 and shared L2, backed by said SLC.
Currently, the L1 and L2 are analogous structures between Core and Atom. Under your naming, Atom's L2 would be equivalent to Core's L3. Just seems more awkward than the alternative.
On desktop/mobile, their async uncore, running at low clocks with not-that-good performance, is no good and is a terrible pain when trying to connect multiple chiplets. Once you disconnect graphics and IO and have pure compute, there are opportunities to move from the current async ring to something tighter for performance cores.
AMD's 8c CCX interconnect is also a ring bus, though I don't recall anyone publishing info on what frequency it runs at. But their approach is broadly similar to Intel's, so I don't think that line of thinking makes much sense. But we can wait and see what fabric they use for their hybrid chips to compare.

And did you just refer to Atom as "marketing cores"...?
 

Thunder 57

Diamond Member
Aug 19, 2007
4,035
6,748
136
Yea, but MTL is using Intel 4, right? Should be out in a few months at most.

I still don't know what to make of ARL. There is a lot of possible FUD and Intel hate being thrown around, but with this many rumors, I have to think that ARL, in whatever form it finally appears, will be a disappointment compared to what I was expecting (a 20% or so increase in ST and MT performance with a significant increase in perf/watt). Sad really; I don't understand the glee some seem to have at Intel's problems. Even if one hates Intel for some reason, it is still in everyone's best interest for them to have a competitive product, if only to keep AMD under pressure to keep prices down and continue to increase performance.

I don't think it's all outright hate. Probably some contempt for the years of abusing their leadership. Maybe some just want Intel to keep messing up until the market is closer to 50/50. And of course some of it is simply irrational hate.

Best hope would be partially 20A and partially TSMC, but I'm not sure. Remember, Raichu said last month that ARL-U will be based on a refreshed MTL. Why would they do this if 20A/N3 works so well? It's like they are trying to minimize the need for 20A/N3 by cutting the originally planned core counts, or dropping some of the compute tiles entirely. Or maybe something design-related failed somehow.

I was just stating my opinion. I can't get a good read on what Intel is up to lately. Did you mean Intel 3? Because N3 seems to be working well enough. I think 20A/N3 was their goal. It would be a shame if they had to drop 20A for N3 or Intel 4. If they did, that decision would have to have been made some time ago by now, right?
 
  • Like
Reactions: Tlh97

moinmoin

Diamond Member
Jun 1, 2017
5,242
8,456
136
I think it is still incumbency on client side and enterprise part of datacenter.

Incumbency with OEMs and IT purchasing types.

A product with equal performance and equal price would go 50:50 without incumbency, and with incumbency, it goes 80% Intel 20% AMD.
Yes, of course that plays a role as well.

But we need to keep in mind that AMD at this point is much more margin-driven than Intel, as funny as that sounds. So if AMD can, it will take all possible big-margin markets. While those are a significant part of the total market by revenue, by unit volume they are still a small slice of it. And while AMD will continue to reliably participate in lower-margin segments to keep a foot in, they'll likely leave plenty of the market up for grabs.

Intel on the other hand will want to keep quantity up to keep their fabs (and respective obligations with TSMC) filled. Only once they get outside fab customers with significant volume may this change.
 
  • Like
Reactions: Tlh97 and Joe NYC

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
AMD's 8c CCX interconnect is also a ring bus, though I don't recall anyone publishing info on what frequency it runs at. But their approach is broadly similar to Intel's, so I don't think that line of thinking makes much sense

It is clocked at core speeds and is low latency and high bandwidth, and said bandwidth is cumulative for each CCD. Intel's ring has asynchronous clocks and multiple additional ring stops for IO, MC, Atom, and GPU. So total BW is much more limited, and latency suffers due to crossing clock domains. It's ~300 GB/s vs ~2 TB/s for an 8C setup on client.
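For a sense of where a "cumulative" figure like that comes from, a back-of-envelope sketch. All widths and clocks below are assumptions for illustration, not measured specs of either design.

```python
# Back-of-envelope only: per-core width and clocks are illustrative guesses.
BYTES_PER_CYCLE_PER_CORE = 32   # assumed L3 read width per core
CORES = 8

# Ring clocked with the cores: each core's L3 slice adds bandwidth, so the
# total scales with core count ("cumulative per CCD").
core_clock_ghz = 5.0
cumulative_gbs = BYTES_PER_CYCLE_PER_CORE * CORES * core_clock_ghz
print(cumulative_gbs)   # 1280.0 GB/s, same order as the quoted ~2 TB/s

# One shared async path at a lower uncore clock: bandwidth does not scale
# with core count, so all cores contend for a few hundred GB/s.
uncore_clock_ghz = 3.0
shared_gbs = BYTES_PER_CYCLE_PER_CORE * uncore_clock_ghz * 3  # a few stops' worth
print(shared_gbs)       # 288.0 GB/s, same order as the quoted ~300 GB/s
```

The exact figures don't matter; the structural point is that a per-core-slice design scales aggregate bandwidth with core count while a single shared path doesn't.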
I think at some point increasing core-private caches starts to make less sense. Intel is already using that strategy for the Atom core cluster sharing an L2, so why not extend it to performance cores? They obviously can't hit latency as low as Apple's due to targeting much higher clocks, and they can't afford much more latency either, as the L1 is anemic.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
It is clocked at core speeds
Do you have a source for that? The different cores can run at different individual speeds, so I don't see how a "core speed" ring would make sense. And having DVFS for the ring would be beneficial for power, so if they don't have it today, expect them to add it eventually.
Intel's ring has asynchronous clocks and multiple additional ring stops for IO, MC, Atom, and GPU
With MTL, it's just the cores. No GPU, IO, etc. Well, and the attach to the other die/fabric, but same with AMD. Doubt it helps any.
I think at some point increasing core-private caches starts to make less sense. Intel is already using that strategy for the Atom core cluster sharing an L2, so why not extend it to performance cores?
I think there's merit to that idea, but I'm not sure how well it would work for ST performance. For reference, ARM used to have cores within a cluster share an L2, but with DynamiQ, they made that L2 private per core.
So far we’ve discussed DynamIQ’s flexibility and scalability features, but it also improves CPU performance through a new cache topology. With bL, CPUs inside a cluster had access to a shared L2 cache; however, DynamIQ compatible CPUs (currently limited to A75/A55) have private L2 caches that operate at the CPU core’s frequency. Moving the L2 closer to the core improves L2 cache latency by 50% or more. DynamIQ also adds another level of cache: The optional shared L3 cache lives inside the DSU and is 16-way set associative. Cache sizes are 1MB, 2MB, or 4MB, but may be omitted for certain applications like networking. The L3 cache is technically pseudo-exclusive, but ARM says it’s really closer to being fully-exclusive, with nearly all of the L3’s contents not appearing in the L2 and L1 caches.

For Atom, it seems like more of an area efficiency play than a performance one. Also ties in with the whole 4c to a ring stop idea. Are you suggesting a similar idea for the big cores? Multiple sharing a ring stop and fewer total stops?

Though on that topic, anyone have data handy for Atom vs Core L2 latency? Would make for a good comparison point.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,651
5,198
136
Yea, but MTL is using intel 4, right? Should be out in a few months at most.

Yes, it is.

The weird part is that there is no follow-up from Intel. None of the client products are using "Intel 3". And of all the other MTL dies going to TSMC, none are coming back to Intel.

Why not put the GPU die on Intel 3, instead of TSMC N5 for Arrow Lake? "Intel 3" is being used only in servers, where Intel is losing market share, and not in client, where Intel has been maintaining its share.

It seems to me like Intel is hiding something about its fabs or the process nodes. Why spam fabs all over the world without a single large outside customer and, at the same time, move Intel's own production from its fabs to TSMC?
 
Last edited:

Joe NYC

Diamond Member
Jun 26, 2021
3,651
5,198
136
Yes, of course that plays a role as well.

But we need to keep in mind that AMD at this point is much more margin-driven than Intel, as funny as that sounds. So if AMD can, it will take all possible big-margin markets. While those are a significant part of the total market by revenue, by unit volume they are still a small slice of it. And while AMD will continue to reliably participate in lower-margin segments to keep a foot in, they'll likely leave plenty of the market up for grabs.

Intel on the other hand will want to keep quantity up to keep their fabs (and respective obligations with TSMC) filled. Only once they get outside fab customers with significant volume may this change.

I was somewhat disagreeing with part of your post, where you said that "quantity" or capacity is Intel's competitive advantage.

I think that only plays a role in a tiny phase of the semi cycle when there is a severe shortage. At all other times, TSMC can ramp up or re-allocate capacity to satisfy all customers.

Just look at NVidia, for example. NVidia was able to ramp up from almost nothing (a single A100 product at TSMC) to all of the client GPUs, all of the server GPUs, and also the new Tulip mania for the AI cards.

Nothing would be stopping TSMC from providing all the capacity AMD needs to take over Intel's market share, since that share gain would be permanent for TSMC, while Intel can (and in fact is planning to) screw TSMC and move the production to its own fabs.

I will only believe that we have a level playing field when 2 equally competitive products, priced at the same price, split market 50:50. Intel gaining market share in client last 3 quarters, to a grotesque degree / market share breakdown is an indication that something else, something fishy is going on...
 
  • Like
Reactions: Tlh97 and moinmoin

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
The weird part is that there is no follow-up from Intel. None of the client products are using "Intel 3". And of all the other MTL dies going to TSMC, none are coming back to Intel.
Someone mentioned Raichu saying that ARL-U was a refreshed MTL. That might be an opportunity, assuming it's not just a straight-up rebrand, which is very possible.
Why not put the GPU die on Intel 3, instead of TSMC N5 for Arrow Lake?
Going off of rumors, isn't ARL's GPU still 12.7-based? Sounds like they might be straight up reusing it from MTL. Or maybe a minor tweak at best (Alchemist+?). Which would be weird considering LNL seems to be on 12.9/Battlemage.
I will only believe that we have a level playing field when 2 equally competitive products, priced at the same price, split market 50:50. Intel gaining market share in client last 3 quarters, to a grotesque degree / market share breakdown is an indication that something else, something fishy is going on...
I think a more interesting question is comparing marketshare in the segments where they compete. AMD has some big holes in the lineup that PHX2 and the lower end Ryzen 7000 chips should help fill. Not a high margin market, but certainly high volume.
 

A///

Diamond Member
Feb 24, 2017
4,351
3,160
136
maybe pat can repent for his sins by praying more. really if this is all intel's got i feel like a damn fool for having a gram of confidence in the berk even ultimately arl was not his baby. I pray this is testament to another blatantly stupid write up by igor based on nothing but i fear the tables have turned and we'll be stuck with amd banging up on our balls for the next decade.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,651
5,198
136
Someone mentioned Raichu saying that ARL-U was a refreshed MTL. That might be an opportunity, assuming it's not just a straight-up rebrand, which is very possible.

Going off of rumors, isn't ARL's GPU still 12.7-based? Sounds like they might be straight up reusing it from MTL. Or maybe a minor tweak at best (Alchemist+?). Which would be weird considering LNL seems to be on 12.9/Battlemage.

Still, it is a lot of dies. Maybe 30-50 million per quarter.

Why not replace the TSMC N5 with "Intel 3"? If for nothing else, at least to stop taking underutilization charges for Intel's fabs being half empty.

Something is not adding up here; hopefully some new information will surface to make sense of this.

I think a more interesting question is comparing marketshare in the segments where they compete. AMD has some big holes in the lineup that PHX2 and the lower end Ryzen 7000 chips should help fill. Not a high margin market, but certainly high volume.
I was specifically talking about a product, not an entire product line.

Take DDR5 desktop processors. Roughly equally competitive (competitiveness could even be tilting to AMD), roughly equally priced. In a normal market, it would result in a 50:50 market share.

Yet, Intel somehow manages to turn every market into an abnormal one, and the die always falls on 6 for Intel, 1 for AMD. Regardless of the merit of the products.
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
Yet, Intel somehow manages to turn every market into an abnormal one, and the die always falls on 6 for Intel, 1 for AMD. Regardless of the merit of the products.

Because as long as they are not competitive in perf and perf/watt by a big amount, they sell to OEMs at zero-margin prices to keep AMD from gaining market share. That's the reason why they fell into the red recently, or make minimal profits despite huge quantities.
 
Last edited:
  • Like
Reactions: Tlh97 and Joe NYC

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
Why not replace the TSMC N5 with "Intel 3"? If for nothing else, at least to stop taking underutilization charges for Intel's fabs being half empty.
Just spitballing here, but their layoffs in graphics might have a lot to do with that, on top of any fab considerations. Can't make a new die if you fired all the people who'd build it.
Take DDR5 desktop processors. Roughly equally competitive (competitiveness could even be tilting to AMD), roughly equally priced. In a normal market, it would result in a 50:50 market share.
Probably a combination of supply, marketing, pricing, and OEM relationships (including platform support). In mobile in particular, AMD's really been held back by their slower design win ramp with OEMs. Most of the early launch designs for mobile processors are basically Intel reference platforms/co-design. AMD's done some of that (Asus Zephyrus), but at a much smaller scale. They've been building that expertise up now that they have the money for it, but it'll take time.
 

H433x0n

Golden Member
Mar 15, 2023
1,224
1,606
106
Take DDR5 desktop processors. Roughly equally competitive (competitiveness could even be tilting to AMD), roughly equally priced. In a normal market, it would result in a 50:50 market share.

Yet, Intel somehow manages to turn every market into an abnormal one, and the die always falls on 6 for Intel, 1 for AMD. Regardless of the merit of the products.
I think you’re reading too much into this. AMD does fine on desktop. Perhaps not DDR5 desktop in particular but they still sell a lot of Zen 3.

They don’t buy enough wafers to even get close to that 50% market share figure. In 2022, Intel almost purchased/allocated more wafers from TSMC than AMD did. The volume just isn’t there from AMD.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Do you have a source for that? The different cores can be different individual speeds, so I don't see how a "core speed" ring would make sense. And having DVFS for the ring would be beneficial for power, so if they don't have it today, expect them to add it eventually.

It's Chips and Cheese, the best site for actual tech investigations. In an earlier article they were discussing this point.


They've also just released a super excellent article about L3 caching in Genoa-X, and Intel's incompetent design shows up in some graphs too. The scale of the disaster is awe-inspiring, really.


I think the real weakness now is that ~70 GB/s memory BW for a single CCD; if AMD manages to design something that is less infinite in name but gives more BW at palatable energy efficiency, their numbers will improve even more.
But even this is made almost a non-problem by that L3 cache they have...
 
  • Haha
Reactions: Joe NYC

Henry swagger

Senior member
Feb 9, 2022
511
313
106
Eh, Merom was a very mean device for its era!
That big, low latency L2 really killed poor AMD.

They also said 10nm was on track and...

Yea it's shit.
Get out of this thread clown.. this is a intel section amd and lisa dnot know you sir lol



Personal attacks on forum members not allowed.


esquared
Anandtech Forum Director
 
Last edited by a moderator:

Joe NYC

Diamond Member
Jun 26, 2021
3,651
5,198
136
I think you’re reading too much into this. AMD does fine on desktop. Perhaps not DDR5 desktop in particular but they still sell a lot of Zen 3.

They don’t buy enough wafers to even get close to that 50% market share figure. In 2022, Intel almost purchased/allocated more wafers from TSMC than AMD did. The volume just isn’t there from AMD.
I think this was a circular argument.

There is a non-market mechanism that Intel uses to make sure that market participants (OEMs) don't purchase 50% of an equally performing and equally priced product.

There is no limit to how much product AMD (via TSMC) can supply.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
It's Chips and Cheese, the best site for actual tech investigations. In an earlier article they were discussing this point.
I found them using that term here, but without any further elaboration. Do you have such a link? Perhaps it's something they covered in an older article.
They've also just released a super excellent article about L3 caching in Genoa-X, and Intel's incompetent design shows up in some graphs too. The scale of the disaster is awe-inspiring, really.
Don't spend too much time obsessing over microbenches. SPR obviously has a laundry list of problems, but pigeonholing on L3 latency while ignoring capacity, distribution, and power isn't particularly helpful. This kind of discussion reminds me of similar hand-wringing done about memory latency on various products (Alder Lake, Matisse, etc), which never really turned out to be meaningful. Let's see what EMR does there first.

Or if you really want to mock Intel's uncore design, I think Meteor Lake should prove a far more deserved opportunity. I think we'll all have fun there.
 
  • Like
Reactions: controlflow

moinmoin

Diamond Member
Jun 1, 2017
5,242
8,456
136
I was somewhat disagreeing with part of your post, where you said that "quantity" or capacity is Intel's competitive advantage.
Maybe it's more fitting to say it's Intel's curse. Intel can't run its fabs empty. Maybe at some point there will be a big IFS customer and Intel alone won't need to fill the fabs anymore, but that point in time is not yet visible.

Intel gaining market share in client last 3 quarters, to a grotesque degree / market share breakdown is an indication that something else, something fishy is going on...
Nothing fishy really. Intel flooded the market (since they have to fill their fabs anyway), whereas AMD could purely look at the margins and nope out for as long as necessary for this wave to end.

While it's true that TSMC is perfectly capable of ramping, such cases are combined with heavy investments from the likes of Apple, Nvidia, or even Intel. Apple and Nvidia both don't do it for the masses but essentially only serve high-margin markets. AMD does a mix and tries to reduce the low-margin part of that mix. So as a result, AMD continues to be very conservative with the quantities it ships. And that's Intel's opportunity: Intel has capacity it must use anyway, so even at a lower margin and with an otherwise less competitive position, it can still control significant chunks of the market until better days (be those for IFS or for in-house chip designs).
 
  • Like
Reactions: Joe NYC

DrMrLordX

Lifer
Apr 27, 2000
22,905
12,976
136
It seems to me like Intel is hiding something about its fabs or the process nodes.
I'm just guessing here, but it's likely that Intel 4 and 3 (and possibly 20A and 18A) aren't going to be anywhere near as good or as on-time as advertised by Gelsinger.
 

mikk

Diamond Member
May 15, 2012
4,296
2,382
136
A pity they don't work on an MTL refresh using Intel 3, or is there a chance an MTL refresh could use Intel 3? I guess not. If they really intend to use Intel 4 MTL for ARL-U, we might see ARL with a big variation of process nodes and architectures. Some people are saying 20A is still planned for some SKUs. Maybe N3B for 8+16, 20A for 6+8, and Intel 4 for 2+8.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
I found them using that term here, but without any further elaboration. Do you have such a link? Perhaps it's something they covered in an older article.

It scales with CPU clock speed; it would make little sense to have it async. How else could they achieve such good latency in both clocks and ns?
And I also think the fact that X3D variants don't clock as high is a giveaway of sync clocks.

It is in contrast with Intel, who had the uncore in sync until Haswell. Raptor Lake moved performance forward due to its 2MB of L2 reducing the dependency on L3.
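The clocks-vs-ns argument is easy to sanity-check with arithmetic. All the cycle counts and clocks below are hypothetical, just to show the shape of the comparison:

```python
# Latency in ns = cycles / frequency (GHz). A fabric clocked with the cores
# keeps ns latency low at high clocks, while an async uncore adds a chunk
# of cycles spent at a lower clock.
def to_ns(cycles, ghz):
    return cycles / ghz

# Sync fabric: 50 cycles, all at the 5 GHz core clock.
sync_l3_ns = to_ns(50, 5.0)                      # 10.0 ns
# Async fabric: 40 core cycles plus 30 uncore cycles at 3 GHz
# (the domain-crossing penalty itself is ignored here).
async_l3_ns = to_ns(40, 5.0) + to_ns(30, 3.0)    # 8.0 + 10.0 = 18.0 ns
print(sync_l3_ns, async_l3_ns)
```

So with similar total cycle counts, the design whose fabric runs at core clock looks good in both cycles and nanoseconds, which is the tell being argued about here.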

Don't spend too much time obsessing over microbenches. SPR obviously has a laundry list of problems, but pigeonholing on L3 latency while ignoring capacity, distribution, and power isn't particularly helpful. This kind of discussion reminds me of similar hand-wringing done about memory latency on various products (Alder Lake, Matisse, etc), which never really turned out to be meaningful. Let's see what EMR does there first.

Or if you really want to mock Intel's uncore design, I think Meteor Lake should prove a far more deserved opportunity. I think we'll all have fun there.

Not obsessing about microbenches, but (completely ignoring power, area, beancounter headaches, etc.) the problem for Intel is that these issues are already showing up in real-world benches. Intel has a really powerful core with great L1 and L2 caches that is let down by the rest of the memory subsystem, from both a latency and a bandwidth point of view. That has implications everywhere: in mobile, desktop, and servers.
I am especially mad about servers, since we mostly buy 32C systems, and SPR was nothing short of a slow-motion unfolding disaster from 2021.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
It scales with CPU clock speed; it would make little sense to have it async. How else could they achieve such good latency in both clocks and ns?
With different per-core clock speeds, how can it be synchronous? That's what I'm trying to dig into here. IIRC, Haswell added separate DVFS to Intel's ring bus, but it was still asynchronous before then.
Not obsessing about microbenches, but (completely ignoring power, area, beancounter headaches, etc.) the problem for Intel is that these issues are already showing up in real-world benches. Intel has a really powerful core with great L1 and L2 caches that is let down by the rest of the memory subsystem, from both a latency and a bandwidth point of view.
I have the opposite impression. GLC is a terrible core. Bloated and power hungry compared to its competitors. Add on a less than ideal process and core count deficit, plus a sprinkling of bugs, and you can more or less explain SPR's problems without needing to look at the memory subsystem. This has been a topic of discussion since the mesh was introduced, but it never seemed all that significant.