
Discussion Intel current and future Lakes & Rapids thread


IntelUser2000

Elite Member
Oct 14, 2003
7,412
2,099
136
Hmm, haven't followed the 'mont cores at all really, but why have both a shared L2$ and shared L3$ per four core cluster?
L2$ is probably inclusive and L3$ victim; sharing L2$ seems pretty old skool.
It's likely just an easy way of integrating a Gracemont cluster into Alder Lake and its ring. Having a dual/quad core cluster share L2 has been true all the way back to Silvermont in 2013.
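For anyone unfamiliar with the distinction, here's a quick toy model of a small L2 backed by a victim L3, i.e. an L3 that is filled only by L2 evictions rather than on every miss. Purely illustrative: the `LRUCache` class, the capacities, and the replacement policy are all made up, not Gracemont's actual design.

```python
from collections import OrderedDict

class LRUCache:
    """Tiny fully-associative LRU cache of line addresses."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def lookup(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # refresh LRU position
            return True
        return False

    def insert(self, addr):
        """Insert a line; return the evicted victim, if any."""
        self.lines[addr] = True
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            return self.lines.popitem(last=False)[0]  # evict LRU line
        return None

l2, l3 = LRUCache(2), LRUCache(4)

def access(addr):
    """Returns which level served the access: 'L2', 'L3', or 'MEM'."""
    if l2.lookup(addr):
        return 'L2'
    hit = 'L3' if l3.lookup(addr) else 'MEM'
    victim = l2.insert(addr)   # fill L2 from L3 or memory
    if victim is not None:
        l3.insert(victim)      # victim L3: only L2 evictions land here
    return hit

access(0); access(1)   # both miss all the way to memory, fill L2
access(2)              # pushes line 0 out of L2 into the victim L3
result = access(0)     # now served by the L3 instead of memory
```

The point of the arrangement is that recently evicted L2 lines get a second chance in the L3 without the L3 duplicating everything the L2 already holds.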

The low cost/low power versions that will be branded Celerons and Pentiums will likely have the same configuration, just without the L3 cache. The Grand Ridge base station SoC also does not have L3 cache.

What is also interesting is the 64KB L1I cache for Gracemont. Intel probably realized that without a uop cache the L1I is the glass jaw of performance, and is increasing it from 32KB in Tremont to 64KB in Silvermont.
Or, doubling L1I is a perfect low hanging fruit improvement in a dual decode cluster architecture. The L1 Instruction cache feeds the two decoders. In a scenario where maximum decode width is utilized, in Tremont it's similar to halving L1I size per decoder.

Also:
While Tremont microarchitecture did not build a dynamic mechanism to load balance the decode clusters, future generations of Intel Atom processors will include hardware to recognize and mitigate these cases without the need for explicit insertions of taken branches into the assembly code.
Further reinforcing the fact that it's not based on an architecture that should have retired (Skylake), but an entirely new one.
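That quote about load-balancing the decode clusters can be illustrated with a toy model: if clusters only hand off at taken branches, one long straight-line block keeps a single 3-wide cluster busy while the other idles. The round-robin hand-off and all the numbers here are my assumptions for illustration, not Intel's documented behavior.

```python
import math

def avg_decode_width(blocks, clusters=2, width=3):
    """Average instructions decoded per cycle when each basic block
    (ending in a taken branch) is handed to the next cluster
    round-robin and the clusters run in parallel."""
    busy = [0] * clusters
    for i, n in enumerate(blocks):
        busy[i % clusters] += math.ceil(n / width)  # cycles for this block
    return sum(blocks) / max(busy)  # clusters overlap in time

# 300 instructions of straight-line code: one cluster does all the work
straight_line = avg_decode_width([300])   # -> 3.0
# the same 300 instructions with a taken branch every 6 instructions:
# both clusters stay busy and effective decode width doubles
branchy = avg_decode_width([6] * 50)      # -> 6.0
```

This is why Tremont's guidance involved inserting taken branches into hot straight-line code, and why hardware that balances the clusters automatically removes that burden from the compiler.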
 
Last edited:

JoeRambo

Golden Member
Jun 13, 2013
1,092
904
136
Or, doubling L1I is a perfect low hanging fruit improvement in a dual decode cluster architecture. The L1 Instruction cache feeds the two decoders. In a scenario where maximum decode width is utilized, in Tremont it's similar to halving L1I size per decoder.
I think it is not only the "halving" that is in question here. They probably did simulations and found that things like L1I cache read ports and cache bank conflicts due to address bit "collisions" also have an impact.
The most recent x86 CPU with a 64KB L1I was Zen 1, but it was only 4-way associative, which made very little sense. They fixed that in Zen 2 with a downsize to 32KB, but finally 8-way associative.
There is more to L1I caches, due to the way code branches call into various addresses.
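The associativity point can be made concrete with some quick set-index math (textbook geometry with 64-byte lines; the addresses and strides are arbitrary examples, not taken from any real binary):

```python
def cache_geometry(size_bytes, ways, line_bytes=64):
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * line_bytes)

def set_index(addr, sets, line_bytes=64):
    """Which set a byte address maps to."""
    return (addr // line_bytes) % sets

zen1_sets = cache_geometry(64 * 1024, 4)   # 64KB 4-way -> 256 sets
zen2_sets = cache_geometry(32 * 1024, 8)   # 32KB 8-way ->  64 sets

# Five branch targets spaced exactly sets*line_size bytes apart all
# land in the same set; a 4-way cache can hold only 4 of them at once,
# so they evict each other even though the cache is mostly empty.
stride = zen1_sets * 64
targets = [0x40_0000 + i * stride for i in range(5)]
colliding_sets = {set_index(t, zen1_sets) for t in targets}
```

Higher associativity means more lines can coexist in one set before this kind of address-bit collision starts thrashing, which is exactly where a big but low-associativity L1I falls over.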

I think Intel realized that already when building Tremont. Right now both clusters are used only when the stars align and there is a branch in the right place, but they probably also ran sims and realised that the L1I would hold them back if they used the clusters more often, due to the reasons above. And they are fixing it with Gracemont and throwing more hw at using all decoders more of the time.

My bet is that the decoders work up to 6-wide and can use half of that capacity up to a branch. If the rest of the chip is widened, that will result in a very sizable performance increase.
 
Last edited:

Exist50

Senior member
Aug 18, 2016
255
297
136
I think it is not only the "halving" that is in question here. They probably did simulations and found that things like L1I cache read ports and cache bank conflicts due to address bit "collisions" also have an impact.
The most recent x86 CPU with a 64KB L1I was Zen 1, but it was only 4-way associative, which made very little sense. They fixed that in Zen 2 with a downsize to 32KB, but finally 8-way associative.
There is more to L1I caches, due to the way code branches call into various addresses.

I think Intel realized that already when building Tremont. Right now both clusters are used only when the stars align and there is a branch in the right place, but they probably also ran sims and realised that the L1I would hold them back if they used the clusters more often, due to the reasons above. And they are fixing it with Silvermont and throwing more hw at using all decoders more of the time.

My bet is that the decoders work up to 6-wide and can use half of that capacity up to a branch. If the rest of the chip is widened, that will result in a very sizable performance increase.
Gracemont, not Silvermont. I only mention it because this is the second comment with the swap :p.
 

DrMrLordX

Lifer
Apr 27, 2000
17,132
6,132
136
Not sure if it was posted already, but it seems people have figured out the cache composition for Alder Lake (due to a lucky leak of a GB5 OpenCL bench getting scheduled on the small cores and revealing the structure).
Good catch. Also Silvermont lol, that's been awhile.

Is the iGPU on Alder Lake also going to share the L3? If so, it's seeming a lot like SLC from an Apple design.

ASML sold 7 EUV machines in Q1, Intel bought ... none of them

But I guess we'll hear soon that: "7nm is on schedule and beating internal estimates"
I mean, Intel has all the EUV machines they need alright, right?!?!?
 

LP-ZX100C

Junior Member
Mar 16, 2021
6
10
41
I mean, Intel has all the EUV machines they need alright, right?!?!?
So, after all, it's in line with what Charlie from SemiAccurate said in a call with Susquehanna about Intel buying tools: Intel doesn't believe in their own roadmaps either.
 

Hulk

Platinum Member
Oct 9, 1999
2,997
436
126
Is machine learning used when designing processors? Seems like it would be a great application. Basically tell the "machine" you have this many transistors and you need to run this code as fast as possible (from an IPC point of view), find the best way to do it. Obviously that is greatly simplified but you know what I mean.
 

DrMrLordX

Lifer
Apr 27, 2000
17,132
6,132
136
Is machine learning used when designing processors? Seems like it would be a great application. Basically tell the "machine" you have this many transistors and you need to run this code as fast as possible (from an IPC point of view), find the best way to do it. Obviously that is greatly simplified but you know what I mean.
Machines designing machines is still a ways off. We don't have The Architect quite yet.
 

moinmoin

Platinum Member
Jun 1, 2017
2,440
3,047
106
Is machine learning used when designing processors? Seems like it would be a great application. Basically tell the "machine" you have this many transistors and you need to run this code as fast as possible (from an IPC point of view), find the best way to do it. Obviously that is greatly simplified but you know what I mean.
As @DrMrLordX said, AI designing processors is far off. What's more and more automated is the layout on silicon. I think AWS referred to its Graviton ARM server chips as being optimized using AI.
 

mikk

Diamond Member
May 15, 2012
3,120
938
136
Q1 Earnings Call:

As a sign of our improving execution, we qualified Tiger Lake-H ahead of schedule in 10 nanometers, and we expect 10-nanometer unit volumes to cross over 14 nanometers in the second half of the year.

10nm ULV market volume is already higher than 14nm; Tiger Lake-U is still growing while CML-U keeps declining. On ADL/SPR/MTL:

In the PC business, we will follow the successful launches of Tiger Lake and Rocket Lake with Alder Lake, which is currently sampling and will ship in the second half of this year. Within the next couple of weeks, we’ll tape in the compute tile for Meteor Lake, our first 7-nanometer CPU for 2023. In the data center, we will follow the strong ramp of Ice Lake with Sapphire Rapids, which is scheduled to reach production around the end of this year, and ramp in the first half of 2022.
 

DrMrLordX

Lifer
Apr 27, 2000
17,132
6,132
136
Not sure if I agree with Dr. Cutress. Alder Lake-S with DDR4 is better than 3-6 months more of Rocket Lake-S.
 

tomatosummit

Member
Mar 21, 2019
48
27
61
I somewhat agree with him, but it's more of a lesser-evil thing and the market is different now compared to the Skylake launch. Haswell-E started off the consumer DDR4 market with its own foibles, but the enthusiast segment will more easily adopt a newer memory standard. Although I see mobo manufacturers just releasing a variety of boards: DDR4 at the lower budget, RGB-extreme DDR5 boards, and some combos.
This is the year of everything being delayed though, so the DDR5 ramp might be late and cause additional problems, especially as we have so little information on Epyc 4 and Sapphire Rapids and their launch windows.

Regarding his "20% IPC" statement, isn't that just people misreporting Intel's own "20% more single-thread performance" claim? They haven't specified IPC themselves yet.
 

jpiniero

Diamond Member
Oct 1, 2010
9,197
1,815
126
I somewhat agree with him, but it's more of a lesser-evil thing and the market is different now compared to the Skylake launch. Haswell-E started off the consumer DDR4 market with its own foibles, but the enthusiast segment will more easily adopt a newer memory standard. Although I see mobo manufacturers just releasing a variety of boards: DDR4 at the lower budget, RGB-extreme DDR5 boards, and some combos.
A combo board is asking too much, DDR5 is pretty different from DDR4. I don't see DDR5 being unavailable as being a problem as long as Alder Lake's DDR4 support isn't a big regression.
 

tomatosummit

Member
Mar 21, 2019
48
27
61
A combo board is asking too much, DDR5 is pretty different from DDR4.
Asrock will be taking that bet.
If DDR4+5 is the real reason for the extra 500 pins on the socket, then it would be trivial to add. Each DIMM socket would be wired to unique pin sets on the socket; the only complications I see would be electrical interference and convincing accounting.
 

Mopetar

Diamond Member
Jan 31, 2011
5,756
2,520
136
I had assumed that Zen 4 would be using DDR5, so it makes sense for Intel to also provide support for it. He does mention that it's a chicken-and-egg problem, but the surest way to get something made is by creating demand for a product.
 

eek2121

Senior member
Aug 2, 2005
884
965
136
I am sorry, this post has been a bit delayed, day job and all.

Yea I mean, if you look at Tiger Lake, 15W barely serves 4 cores. AMD has 8 because it's not only smaller but more power efficient too.

The decisions are probably arbitrary as well. Marketing, profits, performance all play a role. 8+8+2 is possible too, but will it make sense as a brand, perf/watt, and in revenue terms?

Golden Cove, unless they change the design paradigm, likely won't be more efficient in MT. The growth in transistors doesn't result in corresponding increases in performance.

I won't bet on ESF being as big of a gain as SF. The 10nm Ice Lake process sucked, so SF had a lot more potential. SF brought 20% gains. ESF, maybe 7-10% on top of that?
For someone named IntelUser2000 you seem to be pessimistic regarding Intel. When Intel finalized TGL-U, 10nm was still an uncertainty and Renoir was not even announced or launched. More on this in a moment.

Is there some evidence DDR5 won't be ready this year? More and more news are coming about DDR5, recently from Micron/Crucial: https://wccftech.com/crucial-ready-mainstream-ddr5-memory-modules-sodimm-udimm-4800-mhz-32-gb/
DDR5 will likely launch in Q4. If ADL-S is DDR5-only at launch, it will likely launch in January...a year after Rocket Lake. We will probably see a server variant, and if we are lucky mobile, but I strongly suspect desktop will launch in January. I hope I am wrong. Believe it or not, I LOVE to be wrong.

Now, back to TGL-U: when TGL was being developed, 10nm had poor yields, 10SF had just completed testing, and the best AMD had was the Ryzen 3000 mobile chips. TGL-U, as it stands, would walk all over the Ryzen of the time, and Intel strongly suspected that it would also walk all over Zen 2. They were right. The issue is that the U parts were designed around quad cores. Intel did not focus on 8-core parts due to the 10nm issues at the time.

When the chip launched, a bunch of people piled on and said that the flagship TGL-U chips used too much power because they briefly spiked very high. Anyone that argues TDP and power usage in the same sentence needs to go back and understand that TDP != power usage. Tiger Lake actually appears to obey "TDP" for the first time ever. That is, a TGL-U chip will burst super high, but quickly drop once TDP is met. There is no issue with a chip consuming even 200W of power as long as the 15W TDP is met (I feel like yelling at AMD, Intel, NVIDIA, shoot, anyone making "processors" for allowing this to be such an open definition).
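That burst-then-settle behavior can be sketched with a crude PL1/PL2-style power limiter. All the numbers and the averaging scheme below are invented for illustration; this is not Tiger Lake's actual limits or algorithm.

```python
def power_trace(seconds, pl1=15.0, pl2=55.0, tau=8.0):
    """Per-second package power: run at PL2 while the exponentially
    weighted average power stays under PL1, then clamp to PL1."""
    avg = 0.0
    trace = []
    for _ in range(seconds):
        power = pl2 if avg < pl1 else pl1
        avg += (power - avg) / tau   # moving average with time constant tau
        trace.append(power)
    return trace

trace = power_trace(30)
# bursts well above the 15 W "TDP" at first...
burst = trace[0]        # 55.0
# ...then settles to TDP once the average power catches up
settled = trace[-1]     # 15.0
```

The instantaneous draw far exceeds "TDP" during the burst, yet the long-run average is pinned at 15 W, which is exactly why quoting a power spike against the TDP number is comparing two different things.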

AMD originally designed Renoir around 6 cores with much higher clocks. They quickly discovered they could lower clocks, but increase core counts. That was their strategy.

If you compare a Ryzen 4800U with 4 cores disabled vs. a top-of-the-line TGL-U chip, the TGL-U chip will win. If you compare Intel's offering to a 6-core AMD chip, the Intel offering will win in most cases or be neck and neck in the rest.

That is pretty much all I have to say. Now, before you begin your attack, I have a ton of hardware in my household, and you should know that the only Intel CPU that exists here is in an aging machine my spouse uses. I hold no Intel stock, and I held AMD stock on and off for years until tonight, when I sold at $89. Please note that I will be buying back in, but I know the market, and I know that the stock will dump (it already has :D)

AMD has a fantastic product. However, TGL-H actually competes VERY well with Cezanne (the 11800H beats the 5900HX in Geekbench, Cinebench, and a few other benchmarks that I'm privy to; don't bother asking, the chip is dropping soon), and that is why I think Alder Lake will surprise us. If you have actual evidence to the contrary I'd love to debate it, as it is likely something I've not seen. Until then, as I see leaks I will post them here. Love you all! :)

EDIT: My actual thoughts are: Alder Lake desktop in January, mobile in Q2, and workstation/server variants between the two. I think Alder Lake will have a Xeon offering, but Sapphire Rapids/Ice Lake will rule the roost at the high end. ADL-S will be mid-range to low end. I know not much has leaked out regarding that, but IMO that is a sensible play.
 

zir_blazer

Golden Member
Jun 6, 2013
1,017
236
116
The only time that I recall AMD beating Intel to adopting a new RAM type was with the original DDR, and that is because Intel tried to force Rambus onto everyone and failed catastrophically. In all the other DDR generations, the new DDRx was far more expensive at the same capacity and usually equal or just slightly below/above in performance, because the speeds and timings were horrible. I'm not positive that AMD takes the first shot, except maybe in server (like Intel with DDR4 and Haswell-E) if DDR5 allows for greater densities, or via a dual controller like AM3 doing both DDR2 and DDR3, or even Intel Skylake supporting DDR3 and DDR4.
It makes sense to me that AMD is already prepared for the case where poor RAM adoption forces a new AM4 release, unless they want to upgrade the socket anyway and have motherboards with one of the two slot types. The question is whether the IO dies can already do both modes or you need a new one.

And Alder Lake big.LITTLE... The horror. I'm expecting something worse than AMD's CMT in Bulldozer, unless Intel is working VERY closely with Microsoft to make sure that the CPU scheduler will not wreck the party. For me it is a bad idea, since implementing it properly seems far more complex than it looks. And cores not being equal means that things like erratas and bugs affect them in different ways. Imagine if a piece of code produces one result when executed on a big core but another on the smaller core. Harder to debug.
 

Hulk

Platinum Member
Oct 9, 1999
2,997
436
126
I thoroughly enjoyed watching Ian's entire 26-minute video, which is rare for me with a video that long. I have been thinking the same thing as Ian regarding the stated 20% IPC improvement for Golden Cove (I don't even remember the origin of this) and the scheduler issues with big/little and how they will be handled. Namely, at an architectural level Intel has not told us, even in general terms, how it plans on improving IPC by 20% over Sunny Cove or how the heterogeneous cores will be effectively utilized by Windows.

The DDR4/5 issue I think is a little more out of Intel's hands. They are still quite the juggernaut in terms of sheer volume but perhaps not as able to "force" massive swings in the adoption of technology on their schedule.

It does kind of put them in a bind. If they wait on DDR5 because it is critical to Alder Lake performance vs Zen 3 then they remain behind Zen 3 while AMD sits back and quietly continues to ship Zen 3 while preparing for the release of Zen 4 when DDR5 is ready for widespread adoption. The current Rocket Lake vs. Zen 3 is not a great situation for Intel.

On the other hand, if they put out Alder Lake predominantly with DDR4 (meaning that's how users will have to set it up due to a lack of cost-effective DDR5 memory) and performance suffers dramatically, then they will get blasted by reviewers, may still be behind Zen 3, and will have this messy development of ADL as the shift to DDR5 occurs. By the time DDR5 is widespread, Zen 4 will most likely be released and quite possibly giving AMD a continuing performance/efficiency lead.

I'm thinking their best play here is to release ADL with DDR4/5 support and make sure review kits are DDR5. Assuming DDR5 performance is impressive, if DDR4 performance isn't, they can simply ignore the DDR4 performance metrics, and we know Intel is good at making believe certain known facts don't exist, like the fact that the 5800X is a better performer than the 11900K while being cheaper and more power efficient.

Or, the old Intel may return and blow our minds in Core2Duo fashion circa 2006. Hopeful for that but I don't think it likely.
 

Asterox

Senior member
May 15, 2012
502
728
136
Is there some evidence DDR5 won't be ready this year? More and more news are coming about DDR5, recently from Micron/Crucial: https://wccftech.com/crucial-ready-mainstream-ddr5-memory-modules-sodimm-udimm-4800-mhz-32-gb/
It will be available in small quantities (even less available than the RTX 3080 at launch), and very expensive no doubt. You can expect a minimum of $200 for a "standard green naked non-RGB" 16GB DDR5 stick.

The future is pretty gray, and a shortage of GPU and CPU hardware will continue through 2022.
 

Shivansps

Diamond Member
Sep 11, 2013
3,221
837
136
Not sure if I agree with Dr. Cutress. Alder Lake-S with DDR4 is better than 3-6 months more of Rocket Lake-S.
I don't think memory support is a big deal; they could do the same thing they did with Skylake and Kaby Lake: have both DDR4 and DDR5 controllers, so in the first year there will be boards with DDR4 and others with DDR5. The problem is the number of pins that aren't going to be used later on; in the end they never removed the DDR3 controller from LGA1151 CPUs, Coffee Lake still had it.
 
