Discussion Intel current and future Lakes & Rapids thread


itsmydamnation

Platinum Member
Feb 6, 2011
That could really bite AMD in the ass if true. We always knew AMD would go to 1MB of L2 but now it might have to compete with cores that come with twice that?
Why is 48KB of L1 vs 32KB biting them now? How about 1.25MB vs 512KB of L2 right now? How about a 512- vs 256-entry ROB?

There is no free lunch: bigger caches cost latency, power and space, and AMD has better predictors/prefetchers than Intel right now, let alone when Zen 4 lands.
 

Thunder 57

Platinum Member
Aug 19, 2007
Why is 48KB of L1 vs 32KB biting them now? How about 1.25MB vs 512KB of L2 right now? How about a 512- vs 256-entry ROB?

There is no free lunch: bigger caches cost latency, power and space, and AMD has better predictors/prefetchers than Intel right now, let alone when Zen 4 lands.

I understand that. AMD has had a lot of trouble getting caches right in the past, though. Even Zen came out with, I think, a 17-cycle L2 latency, which they got down to 11-12 cycles with Raven Ridge, Threadripper, and Zen+. I was expecting a 1MB L2 with Zen 3. I suppose it didn't make sense to spend die space on it, though, since it was still on 7nm.

On a side note, too bad Ian won't be around here to do a real deep dive on the next CPUs. How many reviews do you think will know or say anything about cache latency? I love the uarch info. Most people just post specs and run benchmarks.
 

lobz

Platinum Member
Feb 10, 2017
Why is 48KB of L1 vs 32KB biting them now? How about 1.25MB vs 512KB of L2 right now? How about a 512- vs 256-entry ROB?

There is no free lunch: bigger caches cost latency, power and space, and AMD has better predictors/prefetchers than Intel right now, let alone when Zen 4 lands.
That is a huge thing I rarely see mentioned. Intel was always the king of prediction and prefetching until now.
 

Mopetar

Diamond Member
Jan 31, 2011
On a side note, too bad Ian won't be around here to do a real deep dive on the next CPUs. How many reviews do you think will know or say anything about cache latency? I love the uarch info. Most people just post specs and run benchmarks.

I think that Ian did mention he'd be working freelance, so presumably any site that wants to offer that kind of analysis could bring him on for such an article.
 

JoeRambo

Golden Member
Jun 13, 2013
That is a huge thing I rarely see mentioned. Intel was always the king of prediction and prefetching until now.

I think with Golden Cove (GC) they are at Zen 3 level. They were stuck with 2013-2015 era ideas for too long with Skylake (SKL) and Ice Lake (ICL), while AMD advanced with Zen 2 and Zen 3.

On the cache side, I think good things will happen with Raptor Lake (RPL): since their L2 is already slow, the increase to 2MB should not bring more drawbacks, only hit-rate benefits. A larger L2 matters more for Intel, since their L3 is slow and bandwidth-constrained; touching it less is a package-level win, potentially helping the performance of other cores too.
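A back-of-the-envelope way to see why a bigger L2 at roughly the same latency is "only hit-rate benefits" is the average memory access time (AMAT) for an access that has already missed L1: AMAT = t_L2 + m_L2 × (t_L3 + m_L3 × t_mem), where t is hit latency and m is the local miss rate of each level. The sketch below plugs in purely hypothetical latencies and miss rates (made-up numbers, not measurements of Golden Cove or Raptor Cove) just to show the shape of the trade-off.

```python
# Minimal AMAT sketch for L1 misses.
# All latencies and miss rates below are hypothetical placeholders, not measurements.

def amat_after_l1_miss(t_l2, miss_l2, t_l3, miss_l3, t_mem):
    """Average cycles to service an L1 miss in an L2 -> L3 -> DRAM hierarchy.

    t_*    : hit latency of each level, in cycles
    miss_* : local miss rate of that level (fraction of its lookups that miss)
    """
    return t_l2 + miss_l2 * (t_l3 + miss_l3 * t_mem)


# Hypothetical "smaller L2" vs "bigger L2 at the same latency".
small_l2 = amat_after_l1_miss(t_l2=15, miss_l2=0.40, t_l3=60, miss_l3=0.30, t_mem=300)
big_l2 = amat_after_l1_miss(t_l2=15, miss_l2=0.30, t_l3=60, miss_l3=0.30, t_mem=300)

print(f"smaller L2: {small_l2:.1f} cycles, bigger L2: {big_l2:.1f} cycles")
```

With these made-up inputs the bigger L2 also cuts L3 lookups per L1 miss from 0.40 to 0.30, which is the "touching L3 less" effect; if the bigger L2 also got slower, t_l2 would rise and eat into the gain, which is the "no free lunch" side of the argument.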

A Zen 4 core will have access to 1MB of L2 backed by 32MB of L3.
A Raptor Lake big core will have access to 2MB of L2 backed by 36MB of L3.

So Intel has achieved caching parity in this matchup, at least in mainstream CPUs. One has to wonder if AMD missed an opportunity to increase L3 to at least 48MB. I know forum warriors will be quick to point out the X3D technology, but a cache advantage was an important part of Zen's success.
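The parity claim is easy to sanity-check as raw capacity. This snippet just adds the private L2 to the L3 a single big core can reach, using the figures from the post. It ignores latency, inclusivity and the fact that the L3 is shared with other cores, so the sum is at best an upper bound on unique capacity, not a performance predictor.

```python
def reachable_mb(l2_mb, l3_mb):
    """Private L2 plus the shared L3 one core can reach.

    Capacity bookkeeping only: ignores latency, inclusivity and L3 sharing,
    so this is an upper bound on unique cached data, not a performance metric.
    """
    return l2_mb + l3_mb


zen4 = reachable_mb(l2_mb=1, l3_mb=32)   # one Zen 4 core on its CCD (figures from the post)
rpl_p = reachable_mb(l2_mb=2, l3_mb=36)  # one Raptor Lake P-core on the 13900K die

print(f"Zen 4 core: {zen4}MB, Raptor Lake P-core: {rpl_p}MB")
```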
 

moinmoin

Diamond Member
Jun 1, 2017
When talking about cache sizes, we have to keep in mind that, unlike AMD, Intel likes to segment them by model even when using the same die. So if they don't change that approach, even if the top variants are competitive or ahead on cache size, lower variants may fall behind again. The same goes for AMD using mobile APUs with smaller caches to fill the lower end of its desktop range. Considering the impact of cache sizes on gaming, all this segmentation should be looked at far more closely in benchmarks than it has been so far.
 

JoeRambo

Golden Member
Jun 13, 2013
When talking about cache sizes, we have to keep in mind that, unlike AMD, Intel likes to segment them by model even when using the same die.

Yeah, and the L2 increase will help them proportionally more. So even if the 13400F still ends up with 18MB of L3, it will use it less.
 

Thunder 57

Platinum Member
Aug 19, 2007
I think that Ian did mention he'd be working freelance, so presumably any site that wants to offer that kind of analysis could bring him on for such an article.

Well, Ian has already written for Chips and Cheese, and I think he plans on staying on YouTube, so I'm sure we'll hear from him somewhere.

Why is 48KB of L1 vs 32KB biting them now? How about 1.25MB vs 512KB of L2 right now? How about a 512- vs 256-entry ROB?

There is no free lunch: bigger caches cost latency, power and space, and AMD has better predictors/prefetchers than Intel right now, let alone when Zen 4 lands.

Yeah, and the L2 increase will help them proportionally more. So even if the 13400F still ends up with 18MB of L3, it will use it less.

Speaking of Chips and Cheese, they have an article on Golden Cove's caches. Some of it is purely for fun, but they loved Intel's decision to go with the larger L2 and suggested the smaller L2 could be hurting AMD. Of course, these are completely different uarchs.

"A fair number of traces end up benefiting from lower L2 latency, even if the cache is smaller. But those wins are smaller and less numerous than the losses, so it’s safe to say Golden Cove’s large L2 is a bright spot in its caching setup. Its latency is still low enough for the core to absorb the occasional L1 refill from L2. And it’s large enough to insulate the core from Alder Lake’s brutally high latency L3.

Armchair quarterback comment: Well played, Intel. Keep using those big, relatively fast L2 caches."
 

JoeRambo

Golden Member
Jun 13, 2013
@Thunder 57 Chips and Cheese is widely known in this thread, and that exact article has already been linked multiple times :) They are one of the few left who do technical analysis instead of reciting marketing material and running benchmarks without commentary or analysis.
 

Thunder 57

Platinum Member
Aug 19, 2007
@Thunder 57 Chips and Cheese is widely known in this thread, and that exact article has already been linked multiple times :) They are one of the few left who do technical analysis instead of reciting marketing material and running benchmarks without commentary or analysis.

Must've missed it or ignored it? I had only visited them a few times before last night. I still have a few tabs open that I need to go through. Good site. After reading that, I didn't feel so odd about questioning AMD's decision to stick with 512KB and now go to only 1MB of L2. That's why I made sure to link it, even if it had already been linked and I missed it. It's a long thread, after all.
 

nicalandia

Diamond Member
Jan 10, 2019
Funny you say that, considering RL is rumoured to include a sizeable cache bump, which is supposed to significantly improve gaming performance right? Can't have it both ways. Can't beat AMD with the uber caches? Join em ;)
The Raptor Lake 13900K will have a total of 68MiB of cache, but the biggest bump is on the E-cores. The P-core cluster, where the gaming takes place, is only getting 6 additional MiB of L2, and the L3 remains the same for the P-cores. Don't expect much extra gaming performance.
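The 68MiB figure can be reproduced from per-block sizes. The sketch assumes the 13900K layout of 8 P-cores with 2MiB of L2 each, 16 E-cores in four 4-core clusters sharing 4MiB of L2 per cluster, and 36MiB of L3, compared with the 12900K's 8 P-cores at 1.25MiB, two E-core clusters at 2MiB and 30MiB of L3. L1 caches are left out, matching how the 68MiB total is usually quoted.

```python
# Sum L2 + L3 for a hybrid Intel part; sizes in MiB, layouts as assumed above.

def total_l2_l3(p_cores, l2_per_p, e_clusters, l2_per_e_cluster, l3):
    p_l2 = p_cores * l2_per_p                 # private L2 per P-core
    e_l2 = e_clusters * l2_per_e_cluster      # L2 shared by each 4-core E-cluster
    return {"P-core L2": p_l2, "E-core L2": e_l2, "L3": l3, "total": p_l2 + e_l2 + l3}


adl_12900k = total_l2_l3(p_cores=8, l2_per_p=1.25, e_clusters=2, l2_per_e_cluster=2, l3=30)
rpl_13900k = total_l2_l3(p_cores=8, l2_per_p=2, e_clusters=4, l2_per_e_cluster=4, l3=36)

for key in adl_12900k:
    delta = rpl_13900k[key] - adl_12900k[key]
    print(f"{key}: {adl_12900k[key]} -> {rpl_13900k[key]} MiB (+{delta})")
```

Under these assumptions the breakdown is +6MiB of P-core L2, +12MiB of E-core L2 and +6MiB of L3, which is where the "biggest bump is on the E-cores" observation comes from.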
 

epsilon84

Golden Member
Aug 29, 2010
The Raptor Lake 13900K will have a total of 68MiB of cache, but the biggest bump is on the E-cores. The P-core cluster, where the gaming takes place, is only getting 6 additional MiB of L2, and the L3 remains the same for the P-cores. Don't expect much extra gaming performance.

So a 60% increase in L2 cache from 10MB to 16MB is supposed to have little impact on gaming performance? What do you base this on?

Comet Lake showed significant gains in many games from an additional 8MB (67%) of L3 cache, going from 12MB on the 10600K to 20MB on the 10900K, with frequency and active cores locked (4.5GHz, 6 cores active).


In the best-case scenario, the jump from 12MB to 20MB of L3 yielded an additional 18.5% in R6S. Is it inconceivable that a similar increase, but in much faster L2 cache, could see similar or even better gains in RPL?
 

tamz_msc

Diamond Member
Jan 5, 2017
In the best-case scenario, the jump from 12MB to 20MB of L3 yielded an additional 18.5% in R6S. Is it inconceivable that a similar increase, but in much faster L2 cache, could see similar or even better gains in RPL?
Comet Lake's L3$ is inclusive and does prefetching, while Alder Lake's L3$ is a victim cache and doesn't. Using Comet Lake results to guess the impact of a larger L2$ in gaming isn't a worthwhile exercise.
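To make the inclusive-versus-victim distinction concrete, here is a toy two-level LRU model. It is a simplified illustration, not a model of Comet Lake or Alder Lake: capacities are in cache lines, there is no associativity, prefetching is not modeled at all, and the victim mode is fully exclusive, which real non-inclusive L3s only approximate. In inclusive mode every miss fills both L2 and L3 and an L3 eviction back-invalidates L2; in victim mode misses fill only L2 and L3 is filled by L2 evictions. A looping working set bigger than L3 but smaller than L2 + L3 shows the capacity difference.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Toy fully associative L2 + L3 with LRU replacement (capacities in lines).

    Simplifications: no associativity, no prefetching, victim mode fully exclusive.
    """

    def __init__(self, l2_lines, l3_lines, policy):
        assert policy in ("inclusive", "victim")
        self.l2 = OrderedDict()  # last item is most recently used
        self.l3 = OrderedDict()
        self.l2_cap, self.l3_cap, self.policy = l2_lines, l3_lines, policy
        self.stats = {"l2_hit": 0, "l3_hit": 0, "miss": 0}

    def _insert_l3(self, line):
        self.l3[line] = None
        self.l3.move_to_end(line)
        if len(self.l3) > self.l3_cap:
            evicted, _ = self.l3.popitem(last=False)
            if self.policy == "inclusive":
                self.l2.pop(evicted, None)  # back-invalidate to keep L2 a subset of L3

    def _insert_l2(self, line):
        self.l2[line] = None
        self.l2.move_to_end(line)
        if len(self.l2) > self.l2_cap:
            evicted, _ = self.l2.popitem(last=False)
            if self.policy == "victim":
                self._insert_l3(evicted)  # L3 only receives lines cast out of L2

    def access(self, line):
        if line in self.l2:
            self.stats["l2_hit"] += 1
            self.l2.move_to_end(line)
        elif line in self.l3:
            self.stats["l3_hit"] += 1
            if self.policy == "victim":
                del self.l3[line]  # exclusive: the line moves up into L2
            else:
                self.l3.move_to_end(line)
            self._insert_l2(line)
        else:
            self.stats["miss"] += 1
            if self.policy == "inclusive":
                self._insert_l3(line)  # inclusive fills L3 on every miss
            self._insert_l2(line)


def run(policy, l2_lines=2, l3_lines=4, working_set=5, passes=20):
    cache = TwoLevelCache(l2_lines, l3_lines, policy)
    for _ in range(passes):
        for line in range(working_set):  # cyclic loop over the working set
            cache.access(line)
    return cache.stats


print("inclusive:", run("inclusive"))
print("victim:   ", run("victim"))
```

With 2 lines of L2, 4 lines of L3 and a 5-line loop, the inclusive hierarchy misses on every access while the victim hierarchy only takes the 5 cold misses, because its unique capacity is roughly L2 + L3 instead of just L3. That capacity difference is one reason scaling results from one design don't transfer cleanly to the other.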
 

nicalandia

Diamond Member
Jan 10, 2019
So a 60% increase in L2 cache from 10MB to 16MB is supposed to have little impact on gaming performance? What do you base this on?
What makes you believe that adding 0.75MiB of INCLUSIVE L2$ per core (on the P-cores) will make any substantial difference in gaming?
 

epsilon84

Golden Member
Aug 29, 2010
Comet Lake's L3$ is inclusive and does prefetching, while Alder Lake's L3$ is a victim cache and doesn't. Using Comet Lake results to guess the impact of a larger L2$ in gaming isn't a worthwhile exercise.

Would looking at how ADL responds to increasing L3 sizes be a more accurate way to estimate any potential uplift?
20MB L3 -> 30MB L3 nets an ~8% gain, though that is only on 4 cores, so I'm not sure how that would translate to 6/8P core configs.


 

nicalandia

Diamond Member
Jan 10, 2019
Would looking at how ADL responds to increasing L3 sizes be a more accurate way to estimate any potential uplift?
20MB L3 -> 30MB L3 nets an ~8% gain, though that is only on 4 cores, so I'm not sure how that would translate to 6/8P core configs.

It's a pretty good way, but only for the L3 victim cache size, since in gaming the P-cores are doing all of the work and actually taking advantage of the total 30MiB of L3$.

In that test they measured the difference between 20MiB and 30MiB, that is a 50% increase of L3 for a total of 8% performance.

Let us try to do that with Raptor Lake. The L3$ in the 13900K/F/S (only the top SKU, as the 13700K will remain 8P/8E) was increased from 30MiB to 36MiB (due to 8 additional E-cores). That amounts to a 20% increase, for a total gaming performance boost of about 3% (theoretical)?


That's theoretical, as many tests have shown that Alder Lake's gaming performance remains the same without the E-cores.
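The ~3% figure comes from scaling the ADL result linearly: +50% L3 gave ~8%, so +20% L3 would give 8% × 20/50 = 3.2%. A minimal sketch of that arithmetic follows; the linear relationship between cache size and gaming uplift is the assumption being made in this post, not something the benchmark data itself establishes.

```python
def pct_increase(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100


def linear_gain_estimate(ref_size_increase_pct, ref_perf_gain_pct, new_size_increase_pct):
    """Scale a measured perf gain linearly with the relative cache-size increase.

    This linear scaling is a naive assumption, used only to reproduce the estimate.
    """
    return ref_perf_gain_pct * new_size_increase_pct / ref_size_increase_pct


adl_step = pct_increase(20, 30)  # ADL test: 20MiB -> 30MiB of L3
rpl_step = pct_increase(30, 36)  # 13900K: 30MiB -> 36MiB of L3
estimate = linear_gain_estimate(adl_step, 8.0, rpl_step)

print(f"L3 +{adl_step:.0f}% gave ~8%; L3 +{rpl_step:.0f}% -> ~{estimate:.1f}% (linear guess)")
```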
 

Zucker2k

Golden Member
Feb 15, 2006
It's a pretty good way, but only for the L3 victim cache size, since in gaming the P-cores are doing all of the work and actually taking advantage of the total 30MiB of L3$.

In that test they measured the difference between 20MiB and 30MiB, that is a 50% increase of L3 for a total of 8% performance.

Let us try to do that with Raptor Lake. The L3$ in the 13900K/F/S (only the top SKU, as the 13700K will remain 8P/8E) was increased from 30MiB to 36MiB (due to 8 additional E-cores). That amounts to a 20% increase, for a total gaming performance boost of about 3% (theoretical)?


That's theoretical, as many tests have shown that Alder Lake's gaming performance remains the same without the E-cores.
The 12700K only has 4 E-cores, so upping the 13700K to 8 E-cores means both L2$ and L3$ will increase.
 

Saylick

Diamond Member
Sep 10, 2012

nicalandia

Diamond Member
Jan 10, 2019

moinmoin

Diamond Member
Jun 1, 2017
Underfox has an interesting tweet about an Intel patent that got accepted this year... He says it's for Ocean Cove but his observations about what's being shown in the patent are shocking. What do you guys make of this?
Here's the patent itself:
That one just has to collapse at the first whiff of a prior-art objection, right? Right?? I mean, it's almost as if all they did for the graphs was run AMD's slides through a b/w conversion. :oops:

(And pretty sure AMD was aware of that patent and stopped its detailed slides at that point. So that change was Intel's fault all along.)
 

Cardyak

Member
Sep 12, 2018
Underfox has an interesting tweet about an Intel patent that got accepted this year... He says it's for Ocean Cove but his observations about what's being shown in the patent are shocking. What do you guys make of this?
Here's the patent itself:

I wouldn't read too much into this. Underfox is a very reputable source when it comes to unearthing patents, but in this particular instance I believe the information is out of date.

These patents were no doubt granted, but Intel cancelled Ocean Cove at least a year ago.

Ocean Cove was meant to be the next natural progression of Intel's microarchitecture, sticking to the iterative engineering formula they have employed internally for decades (wider, deeper, smarter). However, Jim Keller et al. proposed a more radical departure from previous designs instead, so Ocean Cove was cancelled and replaced with this new radical design (most likely for Lunar Lake and beyond).

I know *very* little about this new design. But what I do know is that it completely changes the way we think about physical cores: it is based around the idea of abandoning big-core designs such as Golden Cove, Redwood Cove, etc., and only having small cores. The "big" cores will be created dynamically, as and when needed, by stitching together little cores.

So instead of something like Alder Lake, where you have 8 Golden Cove cores and 8 Gracemont cores, the next-generation design would simply have something like 48 ****mont cores. The configuration of these cores is then managed by some process (I'm not sure yet whether it's hardware- or software-based).

You could have configs that change on the fly depending on what the user needs:

- 48 mont cores, and 0 cove cores (maximum multi-threading performance)
- 32 mont cores and 8 cove cores (via stitching together 16 mont cores into pairs)
- 16 mont cores and 16 cove cores (via stitching together 32 mont cores into pairs)

In the long term you could even merge the mont cores together into larger groups, and have something along the lines of:

- 32 mont cores and 4 cove cores, and 2 super-wide cove cores (via stitching together 8 mont cores into pairs, and another 8 mont cores into 2 groups of 4)

Essentially, the idea is: why constrain yourself to a particular configuration right out of the factory when you can dynamically resize cores by merging smaller components together as and when needed?
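The configurations listed above are bookkeeping over a fixed pool of small cores. The toy function below is purely hypothetical and based only on the rumor as described in this post (no real Intel mechanism, scheduler or API is implied): it checks that a requested mix of fused cores fits in a 48-core pool and reports how many small cores stay standalone.

```python
def stitch(total_small_cores, group_sizes):
    """Hypothetical bookkeeping for fusing small cores into bigger 'virtual' cores.

    group_sizes maps 'small cores per fused core' -> 'number of fused cores'.
    Returns (standalone small cores, total fused cores).
    """
    used = sum(size * count for size, count in group_sizes.items())
    if used > total_small_cores:
        raise ValueError(f"needs {used} small cores, only {total_small_cores} available")
    return total_small_cores - used, sum(group_sizes.values())


# The configurations from the post, with a 48-core pool.
print(stitch(48, {}))            # (48, 0): all small cores
print(stitch(48, {2: 8}))        # (32, 8): 16 cores fused in pairs
print(stitch(48, {2: 16}))       # (16, 16): 32 cores fused in pairs
print(stitch(48, {2: 4, 4: 2}))  # (32, 6): 4 pair-cores plus 2 four-wide cores
```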

Just imagine merging 4 Gracemont cores together horizontally into one super-wide core, and then imagine doing that with future, more advanced ****mont cores.

THAT is Intel's vision of the future.